LLM News

Every LLM release, update, and milestone.

research

Protein function prediction requires tool use, not just reasoning, new research shows

A new study challenges the assumption that chain-of-thought reasoning translates directly to biological domains. Researchers found that text-only reasoning for protein function prediction reproduces superficial patterns rather than producing new biological knowledge. A tool-augmented agent called PFUA achieves a 103% average performance improvement by integrating domain-specific tools that supply verifiable intermediate evidence.
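The core idea, collecting tool evidence before reasoning, can be sketched in a few lines. Everything below is hypothetical: the tool names, prompt format, and `reason` stub are illustrations, not PFUA's actual pipeline.

```python
from typing import Callable

# Minimal sketch of a tool-augmented prediction loop. Tool names and
# prompt wording are stand-ins, not the paper's implementation.
def predict_function(sequence: str,
                     tools: dict[str, Callable[[str], str]],
                     reason: Callable[[str], str]) -> str:
    """Gather verifiable intermediate evidence from domain tools,
    then let the model reason over that evidence rather than over
    text patterns alone."""
    evidence = [f"[{name}] {tool(sequence)}" for name, tool in tools.items()]
    prompt = ("Protein sequence: " + sequence + "\n"
              + "\n".join(evidence)
              + "\nPredict the protein's function from the evidence above.")
    return reason(prompt)

# Stand-in tools and an echoing stub in place of a real LLM call:
tools = {
    "homology_search": lambda s: f"top hit: kinase-like domain ({len(s)} aa query)",
    "motif_scan": lambda s: "ATP-binding motif found" if "GK" in s else "no motif",
}
answer = predict_function("MSGKLT", tools, reason=lambda prompt: prompt)
```

The point of the shape: each tool output is labeled and verifiable on its own, so the final reasoning step is grounded in evidence rather than free-form text generation.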

research

Study shows RL training enables LLMs to abstain on unanswerable temporal questions, outperforming GPT-4o

A new arXiv study presents the first systematic evaluation of training large language models to abstain—refuse to answer—on temporal questions they cannot reliably answer. Using reinforcement learning with abstention-aware rewards, researchers achieved 3.46-5.80% higher accuracy on temporal QA benchmarks than GPT-4o, while improving true positive rates on unanswerable questions by 20%.

2 min read · via arxiv.org
research

Reasoning models fail at theory of mind tasks despite math excellence

A systematic study of nine advanced language models reveals that large reasoning models, designed to excel at step-by-step math and coding, underperform or merely match non-reasoning models on theory of mind tasks. The research identifies a critical weakness: longer reasoning chains actively harm social-reasoning performance, suggesting that current reasoning architectures do not transfer to socio-cognitive skills.

research

LaDiR uses latent diffusion to improve LLM reasoning beyond autoregressive limits

Researchers propose LaDiR, a framework that replaces traditional autoregressive decoding with latent diffusion models to improve LLM reasoning. The approach encodes reasoning steps into compressed latent representations and uses bidirectional attention to refine solutions iteratively, enabling parallel exploration of diverse reasoning paths.

2 min read · via arxiv.org
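A toy illustration of the iterative-refinement idea (not LaDiR itself): instead of committing to tokens one at a time, a whole candidate latent is denoised in repeated passes, so every position can be revised in parallel. The denoiser below is a stand-in, not a trained diffusion model.

```python
# Toy illustration of iterative latent refinement. The denoiser and
# latent codes are stand-ins; LaDiR's actual VAE/diffusion components
# are not reproduced here.
def refine(latent, denoise, steps=20):
    """Apply a denoising update to every latent dimension at once,
    `steps` times; each pass can revise earlier decisions, unlike
    left-to-right autoregressive decoding."""
    for _ in range(steps):
        latent = [denoise(x) for x in latent]
    return latent

# Stand-in denoiser: pulls each coordinate halfway toward the nearest
# integer, mimicking diffusion toward a clean latent code.
denoise = lambda x: x + 0.5 * (round(x) - x)
noisy = [0.2, 0.9, 1.6]
clean = refine(noisy, denoise)
```

The contrast with autoregressive decoding is that no coordinate is ever frozen: every refinement pass sees and can adjust the full solution.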
research

Alignment tuning shrinks LLM output diversity by 2-5x, new research shows

A new arXiv paper introduces the Branching Factor (BF), a metric quantifying output diversity in large language models, and finds that alignment tuning reduces this diversity by 2-5x overall—and up to 10x at early generation positions. The research suggests alignment doesn't fundamentally change model behavior but instead steers outputs toward lower-entropy token sequences already present in base models.
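One natural way to compute an effective branching factor is as the exponential of the mean next-token entropy, a perplexity-style estimate; the paper's exact BF definition may differ from this sketch.

```python
import math

def branching_factor(prob_dists):
    """Effective number of next-token choices: exp of the mean Shannon
    entropy across generation positions (a perplexity-style estimate;
    the paper's exact BF definition may differ)."""
    entropies = [-sum(p * math.log(p) for p in dist if p > 0)
                 for dist in prob_dists]
    return math.exp(sum(entropies) / len(entropies))

# A uniform 4-way distribution has an effective branching factor of 4;
# a peaked (aligned-model-like) distribution collapses toward 1.
diverse = [[0.25] * 4, [0.25] * 4]
peaked = [[0.97, 0.01, 0.01, 0.01], [0.97, 0.01, 0.01, 0.01]]
```

Under this formulation, the paper's finding reads as: alignment tuning sharpens the next-token distributions (lowers their entropy) without introducing behaviors absent from the base model.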


research · ByteDance

ByteDance study: reasoning models know when to stop, but sampling methods force continued thinking

A new ByteDance study reveals that large reasoning models actually know when they have reached the correct answer, but common sampling methods prevent them from stopping. The models keep cross-checking and reformulating even after solving the problem correctly.
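The finding suggests a simple decoding fix: stop once the model's extracted answer stabilizes instead of sampling to the length limit. The sketch below is an assumed mechanism consistent with that finding, not ByteDance's actual method; `step_fn` and `extract_answer` are hypothetical hooks.

```python
# Sketch of answer-stability early stopping (assumed mechanism, not the
# paper's method).
def decode_with_early_stop(step_fn, extract_answer, max_steps=50, patience=2):
    """step_fn(trace) -> next reasoning step (string).
    extract_answer(trace) -> current best answer, or None.
    Stop once the extracted answer is unchanged for `patience`
    consecutive steps, instead of sampling until max length."""
    trace, last, stable = [], None, 0
    for _ in range(max_steps):
        trace.append(step_fn(trace))
        ans = extract_answer(trace)
        if ans is not None and ans == last:
            stable += 1
            if stable >= patience:
                return ans, len(trace)   # model already converged: stop
        else:
            stable = 0
        last = ans
    return last, len(trace)

# Toy model: reaches "42" at step 3 and never changes its mind.
steps = iter(["let x ...", "so x = 42?", "answer: 42",
              "check: 42", "recheck: 42", "again: 42"])
ans, n = decode_with_early_stop(
    lambda t: next(steps),
    lambda t: t[-1].split(":")[-1].strip() if ":" in t[-1] else None)
```

Here decoding halts after five steps rather than running out the budget, cutting exactly the kind of redundant cross-checking the study describes.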