LLM News

Every LLM release, update, and milestone.

Filtered by: mathematical-reasoning
research

New test-time training method improves LLM reasoning through self-reflection

Researchers propose TTSR, a test-time training framework where a single LLM alternates between Student and Teacher roles to improve its own reasoning. The method generates targeted variant questions based on analyzed failure patterns, showing consistent improvements across mathematical reasoning benchmarks without relying on unreliable pseudo-labels.
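
As a rough illustration of the alternating Student/Teacher loop described above, here is a minimal sketch. All function names (`solve`, `analyze_failures`, `make_variants`) and the toy dictionary "model" are hypothetical stand-ins; the paper's actual prompting and update rules differ.

```python
# Minimal sketch of a Student/Teacher test-time training round in the
# spirit of TTSR. The "model" is a toy lookup table for illustration only.

def solve(model, question):
    # Student role: attempt the question (placeholder heuristic).
    return model.get(question)

def analyze_failures(attempts):
    # Teacher role: collect questions the model got wrong.
    return [q for q, (pred, gold) in attempts.items() if pred != gold]

def make_variants(question):
    # Teacher role: generate targeted variant questions (placeholder).
    return [f"{question} (variant {i})" for i in range(2)]

def ttsr_round(model, train_items):
    attempts = {q: (solve(model, q), a) for q, a in train_items.items()}
    failures = analyze_failures(attempts)
    # Self-generated training data targets the observed failure patterns,
    # avoiding reliance on external pseudo-labels.
    return [v for q in failures for v in make_variants(q)]

model = {"2+2": "4", "3*3": "6"}   # toy "model": memorized (wrong) answers
items = {"2+2": "4", "3*3": "9"}   # gold answers
variants = ttsr_round(model, items)
print(variants)                    # variants target only the failed item
```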

research

New RLVR method reformulates reward-based LLM training as classification problem

A new research paper proposes Rewards as Labels (REAL), a framework that reframes reinforcement learning with verifiable rewards as a classification problem rather than scalar weighting. The method addresses fundamental gradient optimization issues in current GRPO variants and demonstrates measurable improvements on mathematical reasoning benchmarks.
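
To make the reframing concrete, the toy sketch below contrasts a GRPO-style scalar-weighted loss with a classification (rewards-as-labels) loss on a single sampled response. The exact REAL objective in the paper differs; both loss functions here are invented for illustration.

```python
import math

def scalar_weighted_loss(logp, advantage):
    # GRPO-style view: scale the response log-likelihood by a scalar advantage.
    return -advantage * logp

def reward_as_label_loss(logp, reward):
    # Classification view: the verifiable reward in {0, 1} is the label,
    # and the model's sequence probability p = exp(logp) is the prediction,
    # scored with binary cross-entropy.
    p = math.exp(logp)
    return -(reward * math.log(p) + (1 - reward) * math.log(1 - p))

logp = math.log(0.7)   # model assigns probability 0.7 to its sampled response
pos = reward_as_label_loss(logp, reward=1)   # correct response: push p up
neg = reward_as_label_loss(logp, reward=0)   # wrong response: push p down
print(round(pos, 4), round(neg, 4))
```

For a rewarded response the two views coincide, but for an unrewarded one the classification loss actively pushes probability mass away rather than merely down-weighting the gradient.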

research

NeuroProlog framework combines neural networks with symbolic reasoning to fix LLM math errors

Researchers introduce NeuroProlog, a neurosymbolic framework that compiles math word problems into executable Prolog programs with formal verification guarantees. A multi-task "Cocktail" training strategy achieves significant accuracy improvements on GSM8K: +5.23% on Qwen-32B, +3.43% on GPT-OSS-20B, and +5.54% on Llama-3B compared to single-task baselines.
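
The compile-then-verify idea can be sketched as follows. The embedded Prolog program and the Python stand-in that "executes" it are toy illustrations, not the paper's pipeline, and the word problem is invented.

```python
# Toy illustration of compiling a math word problem into an executable,
# verifiable program, in the spirit of NeuroProlog.

problem = "Ann has 3 apples and buys 2 more. How many apples does she have?"

# What an LLM-compiled Prolog program might look like (illustrative only):
prolog_program = """
has(ann, apples, 3).
buys(ann, apples, 2).
answer(N) :- has(ann, apples, A), buys(ann, apples, B), N is A + B.
"""

# Minimal Python stand-in for running the compiled program: the extracted
# facts and the arithmetic rule are interpreted directly.
facts = {"has": 3, "buys": 2}
answer = facts["has"] + facts["buys"]

# Verification step: the executed answer must satisfy the compiled rule,
# giving a formal check instead of trusting free-form LLM arithmetic.
assert answer == facts["has"] + facts["buys"]
print(answer)   # 5
```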

research

LaDiR uses latent diffusion to improve LLM reasoning beyond autoregressive limits

Researchers propose LaDiR, a framework that replaces traditional autoregressive decoding with latent diffusion models to improve LLM reasoning. The approach encodes reasoning steps into compressed latent representations and uses bidirectional attention to refine solutions iteratively, enabling parallel exploration of diverse reasoning paths.
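
The contrast with autoregressive decoding can be sketched with a toy denoising loop: a "reasoning latent" starts noisy and every position is refined in parallel over several steps, whereas autoregressive decoding commits to each token once. The denoiser below is a hand-written stand-in; LaDiR uses a learned latent diffusion model.

```python
import random

# Toy sketch of iterative latent refinement in the spirit of LaDiR.
random.seed(0)
target = [0.5, -1.0, 2.0]                            # latent encoding a solution
latent = [t + random.gauss(0, 1.0) for t in target]  # noisy initial latent

for step in range(10):
    # Each step moves the whole latent toward the target in parallel:
    # every position remains revisable, unlike left-to-right decoding.
    latent = [l + 0.5 * (t - l) for l, t in zip(latent, target)]

error = max(abs(l - t) for l, t in zip(latent, target))
print(error < 0.01)   # the latent has converged close to the target
```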

via arxiv.org
research

New RL framework CORE helps LLMs bridge gap between solving math problems and understanding concepts

Researchers have identified a critical gap in how large language models learn mathematics: they can solve problems but often don't understand the underlying concepts. A new reinforcement learning framework called CORE addresses this by using explicit concept definitions as training signals, rather than just reinforcing correct final answers.
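
A rough sketch of the reward shaping attributed to CORE: score both the final answer and the model's stated concept definition, rather than the answer alone. The keyword-overlap surrogate, the weight `alpha`, and the example strings are all invented for illustration.

```python
# Hedged sketch of a concept-aware reward, in the spirit of CORE.

def answer_reward(pred, gold):
    # Standard verifiable reward: 1 for a correct final answer, else 0.
    return 1.0 if pred == gold else 0.0

def concept_reward(stated_def, reference_def):
    # Toy surrogate: fraction of reference keywords the definition covers.
    ref = set(reference_def.lower().split())
    got = set(stated_def.lower().split())
    return len(ref & got) / len(ref)

def core_reward(pred, gold, stated_def, reference_def, alpha=0.5):
    # Blend answer correctness with the explicit concept-definition signal.
    return ((1 - alpha) * answer_reward(pred, gold)
            + alpha * concept_reward(stated_def, reference_def))

r = core_reward(
    pred="12", gold="12",
    stated_def="the slope measures rate of change",
    reference_def="slope measures the rate of change of a function",
)
print(round(r, 3))   # 0.875: correct answer, partial concept coverage
```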
