LLM News

Every LLM release, update, and milestone.

Filtered by: mathematical-reasoning
research

New test-time training method improves LLM reasoning through self-reflection

Researchers propose TTSR, a test-time training framework where a single LLM alternates between Student and Teacher roles to improve its own reasoning. The method generates targeted variant questions based on analyzed failure patterns, showing consistent improvements across mathematical reasoning benchmarks without relying on unreliable pseudo-labels.
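
As a rough illustration of the alternating Student/Teacher loop described above, here is a minimal sketch. All function names (`solve`, `analyze_failures`, `make_variants`) and the toy dictionary "model" are hypothetical stand-ins; the paper's actual prompting and update rules differ.

```python
# Minimal sketch of a Student/Teacher test-time training round in the
# spirit of TTSR. The "model" is a toy lookup table for illustration only.

def solve(model, question):
    # Student role: attempt the question (placeholder heuristic).
    return model.get(question)

def analyze_failures(attempts):
    # Teacher role: collect questions the model got wrong.
    return [q for q, (pred, gold) in attempts.items() if pred != gold]

def make_variants(question):
    # Teacher role: generate targeted variant questions (placeholder).
    return [f"{question} (variant {i})" for i in range(2)]

def ttsr_round(model, train_items):
    attempts = {q: (solve(model, q), a) for q, a in train_items.items()}
    failures = analyze_failures(attempts)
    # Self-generated training data targets the observed failure patterns,
    # avoiding reliance on external pseudo-labels.
    return [v for q in failures for v in make_variants(q)]

model = {"2+2": "4", "3*3": "6"}   # toy "model": memorized (wrong) answers
items = {"2+2": "4", "3*3": "9"}   # gold answers
variants = ttsr_round(model, items)
print(variants)                    # variants target only the failed item
```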

research

New RLVR method reformulates reward-based LLM training as classification problem

A new research paper proposes Rewards as Labels (REAL), a framework that reframes reinforcement learning with verifiable rewards as a classification problem rather than scalar weighting. The method addresses fundamental gradient optimization issues in current GRPO variants and demonstrates measurable improvements on mathematical reasoning benchmarks.
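
To make the reframing concrete, the toy sketch below contrasts a GRPO-style scalar-weighted loss with a classification (rewards-as-labels) loss on a single sampled response. The exact REAL objective in the paper differs; both loss functions here are invented for illustration.

```python
import math

def scalar_weighted_loss(logp, advantage):
    # GRPO-style view: scale the response log-likelihood by a scalar advantage.
    return -advantage * logp

def reward_as_label_loss(logp, reward):
    # Classification view: the verifiable reward in {0, 1} is the label,
    # and the model's sequence probability p = exp(logp) is the prediction,
    # scored with binary cross-entropy.
    p = math.exp(logp)
    return -(reward * math.log(p) + (1 - reward) * math.log(1 - p))

logp = math.log(0.7)   # model assigns probability 0.7 to its sampled response
pos = reward_as_label_loss(logp, reward=1)   # correct response: push p up
neg = reward_as_label_loss(logp, reward=0)   # wrong response: push p down
print(round(pos, 4), round(neg, 4))
```

For a rewarded response the two views coincide, but for an unrewarded one the classification loss actively pushes probability mass away rather than merely down-weighting the gradient.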

research

NeuroProlog framework combines neural networks with symbolic reasoning to fix LLM math errors

Researchers introduce NeuroProlog, a neurosymbolic framework that compiles math word problems into executable Prolog programs with formal verification guarantees. A multi-task "Cocktail" training strategy achieves significant accuracy improvements on GSM8K: +5.23% on Qwen-32B, +3.43% on GPT-OSS-20B, and +5.54% on Llama-3B compared to single-task baselines.
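
The compile-then-verify idea can be sketched as follows. The embedded Prolog program and the Python stand-in that "executes" it are toy illustrations, not the paper's pipeline, and the word problem is invented.

```python
# Toy illustration of compiling a math word problem into an executable,
# verifiable program, in the spirit of NeuroProlog.

problem = "Ann has 3 apples and buys 2 more. How many apples does she have?"

# What an LLM-compiled Prolog program might look like (illustrative only):
prolog_program = """
has(ann, apples, 3).
buys(ann, apples, 2).
answer(N) :- has(ann, apples, A), buys(ann, apples, B), N is A + B.
"""

# Minimal Python stand-in for running the compiled program: the extracted
# facts and the arithmetic rule are interpreted directly.
facts = {"has": 3, "buys": 2}
answer = facts["has"] + facts["buys"]

# Verification step: the executed answer must satisfy the compiled rule,
# giving a formal check instead of trusting free-form LLM arithmetic.
assert answer == facts["has"] + facts["buys"]
print(answer)   # 5
```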

research

LaDiR uses latent diffusion to improve LLM reasoning beyond autoregressive limits

Researchers propose LaDiR, a framework that replaces traditional autoregressive decoding with latent diffusion models to improve LLM reasoning. The approach encodes reasoning steps into compressed latent representations and uses bidirectional attention to refine solutions iteratively, enabling parallel exploration of diverse reasoning paths.
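
The contrast with autoregressive decoding can be sketched with a toy denoising loop: a "reasoning latent" starts noisy and every position is refined in parallel over several steps, whereas autoregressive decoding commits to each token once. The denoiser below is a hand-written stand-in; LaDiR uses a learned latent diffusion model.

```python
import random

# Toy sketch of iterative latent refinement in the spirit of LaDiR.
random.seed(0)
target = [0.5, -1.0, 2.0]                            # latent encoding a solution
latent = [t + random.gauss(0, 1.0) for t in target]  # noisy initial latent

for step in range(10):
    # Each step moves the whole latent toward the target in parallel:
    # every position remains revisable, unlike left-to-right decoding.
    latent = [l + 0.5 * (t - l) for l, t in zip(latent, target)]

error = max(abs(l - t) for l, t in zip(latent, target))
print(error < 0.01)   # the latent has converged close to the target
```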

via arxiv.org
research

New RL framework CORE helps LLMs bridge gap between solving math problems and understanding concepts

Researchers have identified a critical gap in how large language models learn mathematics: they can solve problems but often don't understand the underlying concepts. A new reinforcement learning framework called CORE addresses this by using explicit concept definitions as training signals, rather than just reinforcing correct final answers.
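
A rough sketch of the reward shaping attributed to CORE: score both the final answer and the model's stated concept definition, rather than the answer alone. The keyword-overlap surrogate, the weight `alpha`, and the example strings are all invented for illustration.

```python
# Hedged sketch of a concept-aware reward, in the spirit of CORE.

def answer_reward(pred, gold):
    # Standard verifiable reward: 1 for a correct final answer, else 0.
    return 1.0 if pred == gold else 0.0

def concept_reward(stated_def, reference_def):
    # Toy surrogate: fraction of reference keywords the definition covers.
    ref = set(reference_def.lower().split())
    got = set(stated_def.lower().split())
    return len(ref & got) / len(ref)

def core_reward(pred, gold, stated_def, reference_def, alpha=0.5):
    # Blend answer correctness with the explicit concept-definition signal.
    return ((1 - alpha) * answer_reward(pred, gold)
            + alpha * concept_reward(stated_def, reference_def))

r = core_reward(
    pred="12", gold="12",
    stated_def="the slope measures rate of change",
    reference_def="slope measures the rate of change of a function",
)
print(round(r, 3))   # 0.875: correct answer, partial concept coverage
```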
