Apple researchers combine diffusion and autoregressive techniques to improve LLM reasoning accuracy

TL;DR

Apple researchers, working with UC San Diego, have published LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning, a framework that combines diffusion models with autoregressive generation. The system runs multiple reasoning paths in parallel during inference, each exploring a different possibility, before it generates a final answer.

Apple researchers, in collaboration with the University of California, San Diego, have published a revised study detailing LaDiR (Latent Diffusion Enhances LLMs for Text Reasoning), a framework that improves large language model performance on math reasoning, code generation, and planning tasks.

How LaDiR works

LaDiR combines two distinct approaches to text generation. During the reasoning phase, it uses a diffusion model, which refines many tokens in parallel over several iterations. For the final output, it switches to autoregressive generation, which produces tokens one at a time.

The framework runs multiple reasoning paths simultaneously during inference. Each path begins with random noise and gradually refines into coherent reasoning steps through a diffusion process. A built-in mechanism encourages these parallel paths to explore different possibilities rather than converging prematurely on the same solution.
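The idea of parallel paths that start from noise and are kept apart can be illustrated with a toy sketch. This is not LaDiR's actual denoiser or diversity mechanism, which the article does not specify. It assumes each path is a small latent vector, uses a fixed `target` as a hypothetical stand-in for a trained denoiser's prediction, and models the diversity mechanism as a simple repulsion term between paths. All names and parameters here are illustrative.

```python
import math
import random

def mean_pairwise_distance(paths):
    """Average Euclidean distance over all pairs of latent vectors."""
    pairs = [(a, b) for i, a in enumerate(paths) for b in paths[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def refine_paths(num_paths=4, dim=3, steps=20, pull=0.2, repulsion=0.1, seed=0):
    """Refine several latent 'reasoning paths' in parallel, starting from noise."""
    rng = random.Random(seed)
    # Every path starts as pure Gaussian noise.
    paths = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_paths)]
    # Hypothetical stand-in for what a trained denoiser would predict.
    target = [1.0] * dim
    for _ in range(steps):
        updated = []
        for i, x in enumerate(paths):
            others = [p for j, p in enumerate(paths) if j != i]
            new_x = []
            for k in range(dim):
                # Denoising step: move this coordinate toward the prediction.
                denoise = pull * (target[k] - x[k])
                # Assumed diversity term: push away from the other paths.
                push = repulsion * sum(x[k] - o[k] for o in others) / len(others)
                new_x.append(x[k] + denoise + push)
            updated.append(new_x)
        paths = updated
    return paths

plain = refine_paths(repulsion=0.0)
diverse = refine_paths(repulsion=0.1)
print(mean_pairwise_distance(plain), mean_pairwise_distance(diverse))
```

Both runs start from the same noise because they share a seed. In this toy setup the run with repulsion ends with its paths further apart, which is the behavior the diversity mechanism is described as encouraging.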

Once sufficient reasoning is complete, the system switches to autoregressive mode to generate the final answer token by token.
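The handoff between the two phases can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the fixed number of diffusion steps, and the toy stand-ins for the denoiser and decoder are all hypothetical, and the article does not say how LaDiR decides that reasoning is complete.

```python
import random

def reason_then_answer(denoise_step, next_token, block_len=4, diffusion_steps=10,
                       max_tokens=8, eos="<eos>", seed=0):
    """Phase 1 refines a whole latent block per step; phase 2 emits one token per step."""
    rng = random.Random(seed)
    # The reasoning block starts as noise.
    latent = [rng.gauss(0.0, 1.0) for _ in range(block_len)]
    # Phase 1: each call updates every position of the block together.
    for step in range(diffusion_steps):
        latent = denoise_step(latent, step, diffusion_steps)
    # Phase 2: each call yields a single token, conditioned on the
    # finished reasoning and on the tokens produced so far.
    answer = []
    while len(answer) < max_tokens:
        token = next_token(latent, answer)
        if token == eos:
            break
        answer.append(token)
    return latent, answer

# Toy stand-ins (hypothetical, not trained models).
def toy_denoise(latent, step, total):
    # Scale the remaining noise down so it reaches zero on the last step.
    remaining = total - step
    return [v * (remaining - 1) / remaining for v in latent]

def toy_next_token(latent, prefix):
    # Emit a fixed answer one token at a time, then the end marker.
    fixed = ["4", "2", "<eos>"]
    return fixed[len(prefix)]

latent, answer = reason_then_answer(toy_denoise, toy_next_token)
print(answer)
```

The point of the sketch is the difference in loop shape: the first loop runs a fixed number of steps regardless of how long the reasoning block is, while the second loop runs once per output token.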

LaDiR is not a standalone model but a framework that modifies how existing language models reason through problems.

Benchmark performance

Researchers tested LaDiR on Meta's LLaMA 3.1 8B for math reasoning and puzzle planning, and on Qwen3-8B-Base for code generation.

On math benchmarks, LaDiR achieved higher accuracy than existing approaches and demonstrated stronger performance on out-of-distribution tasks. For code generation on HumanEval, LaDiR outperformed standard fine-tuning, particularly on harder problems.

In puzzle-style planning tasks like the Countdown game, LaDiR explored a wider range of valid answers than baseline models and found correct solutions more reliably than general-purpose baselines. However, it fell short of specialized, task-specific models on single-attempt accuracy.

What this means

LaDiR represents a hybrid approach that leverages the parallel exploration capabilities of diffusion models while maintaining the sequential precision of autoregressive generation. By running multiple reasoning paths simultaneously, the framework can explore a broader solution space before committing to a final answer. The benchmark results suggest this approach is particularly effective for complex reasoning tasks where considering multiple possibilities improves accuracy, though specialized models still hold advantages for specific use cases. The framework's applicability to existing models like LLaMA and Qwen indicates it could be adopted across different base architectures.
