LLM News

Every LLM release, update, and milestone.

research

REFLEX framework gives LLMs metacognitive reasoning for zero-shot robot planning

Researchers present REFLEX, a framework that equips LLM-powered robotic agents with metacognitive capabilities—skill decomposition, failure reflection, and solution synthesis—to perform complex tasks in zero-shot and few-shot settings. The system significantly outperforms existing baselines and demonstrates that LLMs can generate creative solutions that diverge from ground truth while still completing tasks successfully.
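The loop the article describes can be pictured as decompose, execute, and on failure reflect and re-synthesize. A minimal sketch of that control flow, with entirely hypothetical function names and stubbed behavior (this is not REFLEX's actual API):

```python
# Hedged sketch of a metacognitive plan-execute-reflect loop.
# All names and the stub failure logic are illustrative assumptions.

def decompose(task):
    # Skill decomposition: break the task into primitive skills.
    return [f"{task}:step{i}" for i in range(3)]

def execute(skill, attempt):
    # Stub executor: fail the first attempt at step1 to trigger reflection.
    return not (skill.endswith("step1") and attempt == 0)

def reflect(skill):
    # Failure reflection: produce a revised version of the failed skill.
    return skill + ":revised"

def solve(task, max_attempts=2):
    plan = decompose(task)
    completed = []
    for skill in plan:
        for attempt in range(max_attempts):
            if execute(skill, attempt):
                completed.append(skill)
                break
            skill = reflect(skill)  # synthesize a new solution and retry
        else:
            return None  # skill still failing after reflection
    return completed

print(solve("pour-water"))
```

The "creative solutions that diverge from ground truth" finding corresponds to the revised skill here: the agent completes the task via a plan that differs from the original decomposition.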

research

Researchers identify divergence term selection as key to preventing LLM performance collapse in RL fine-tuning

A new paper identifies a fundamental flaw in standard reinforcement learning fine-tuning approaches for large language models: the choice of divergence term directly causes the degradation of multi-attempt performance (Pass@k) despite single-attempt improvements. Researchers propose Diversity-Preserving Hybrid RL (DPH-RL), which uses mass-covering f-divergences to maintain broad solution coverage and prevent catastrophic forgetting.
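The intuition behind the divergence choice can be shown numerically. A mode-seeking penalty (reverse KL, the common RLHF default) charges little when the fine-tuned policy abandons solution modes of the reference policy, while a mass-covering direction (forward KL, one member of the f-divergence family the paper points to) charges heavily for it. A minimal illustration, not code from the paper:

```python
import numpy as np

# Illustrative assumption: "ref" is a reference policy spread over four
# distinct solution modes; "collapsed" is a fine-tuned policy that has
# concentrated almost all mass on one mode (diversity collapse).
ref = np.array([0.25, 0.25, 0.25, 0.25])
collapsed = np.array([0.97, 0.01, 0.01, 0.01])

def kl(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return float(np.sum(p * np.log(p / q)))

reverse_kl = kl(collapsed, ref)  # mode-seeking: KL(pi || ref)
forward_kl = kl(ref, collapsed)  # mass-covering: KL(ref || pi)

# The mass-covering direction penalizes dropping modes far more, so an
# RL objective regularized with it preserves solution coverage (Pass@k).
print(reverse_kl, forward_kl)
```

With these numbers the mass-covering penalty is roughly 1.7x the mode-seeking one, which is the pressure DPH-RL is described as exploiting to prevent collapse.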

product update

Meta pays News Corp up to $50M annually for AI training data in multi-year deal

Meta has committed to paying News Corp up to $50 million annually in a multi-year agreement for AI training data and content licensing. The deal represents Meta's continued strategy of securing high-quality publishing content for its AI models. The arrangement raises questions about the sustainability of individual content licensing deals versus industry-wide data standards.

research · Apple

Apple Intelligence generates stereotyped summaries across hundreds of millions of devices

Apple Intelligence, which automatically summarizes notifications and messages on hundreds of millions of devices, systematically generates stereotyped and hallucinated content according to an independent AI Forensics investigation. The analysis of over 10,000 AI-generated summaries reveals bias baked into the feature that pushes problematic assumptions to users unprompted.

2 min read · via the-decoder.com