Apple researchers combine diffusion and autoregressive techniques to improve LLM reasoning accuracy
Apple researchers, alongside UC San Diego, have published LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning, a framework that combines diffusion models with autoregressive generation. The system runs multiple reasoning paths in parallel during inference, each exploring different possibilities before generating a final answer.
Apple researchers, in collaboration with the University of California, San Diego, have published a revised study detailing LaDiR (Latent Diffusion Enhances LLMs for Text Reasoning), a framework that improves large language model performance on math reasoning, code generation, and planning tasks.
How LaDiR works
LaDiR combines two distinct approaches to text generation. During the reasoning phase, it uses a diffusion model, which iterates over many tokens in parallel. For the final output, it switches to autoregressive generation, which produces tokens one at a time.
The framework runs multiple reasoning paths simultaneously during inference. Each path begins with random noise and gradually refines into coherent reasoning steps through a diffusion process. A built-in mechanism encourages these parallel paths to explore different possibilities rather than converging prematurely on the same solution.
Once sufficient reasoning is complete, the system switches to autoregressive mode to generate the final answer token by token.
LaDiR is not a standalone model but a framework that modifies how existing language models reason through problems.
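To make the two phases concrete, here is a minimal toy sketch in Python. It is not LaDiR's implementation. Several paths start as Gaussian noise and are refined together, with a pull toward a fixed `target` vector (a stand-in for what a learned denoiser would predict) and a simple repulsion term (a stand-in for the diversity mechanism). A toy decoder then emits tokens one at a time. All function names, parameters, and the repulsion formula are assumptions made for illustration.

```python
import math
import random


def refine_paths(num_paths, dim, target, steps, step_size, repulsion, seed=0):
    """Toy parallel refinement: every path starts as noise and is updated together."""
    rng = random.Random(seed)
    # Each reasoning path starts from Gaussian noise.
    paths = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_paths)]
    for _ in range(steps):
        updated = []
        for i, p in enumerate(paths):
            # Pull toward `target`, a stand-in for a learned denoiser's prediction.
            move = [step_size * (t - x) for x, t in zip(p, target)]
            # Push away from the other paths (hypothetical diversity term).
            for j, q in enumerate(paths):
                if i == j:
                    continue
                dist2 = sum((a - b) ** 2 for a, b in zip(p, q))
                weight = repulsion * math.exp(-dist2)
                move = [m + weight * (a - b) for m, a, b in zip(move, p, q)]
            updated.append([x + m for x, m in zip(p, move)])
        paths = updated
    return paths


def mean_pairwise_distance(paths):
    """Average Euclidean distance between all pairs of paths."""
    dists = [
        math.dist(paths[i], paths[j])
        for i in range(len(paths))
        for j in range(i + 1, len(paths))
    ]
    return sum(dists) / len(dists)


def decode_token_by_token(latent, vocab):
    """Toy autoregressive decoder: each token depends on the latent and the previous token."""
    tokens, prev = [], 0
    for value in latent:
        prev = (prev + round(value)) % len(vocab)
        tokens.append(vocab[prev])
    return tokens


if __name__ == "__main__":
    target = [4.0, -4.0, 4.0, -4.0]
    diverse = refine_paths(4, 4, target, steps=40, step_size=0.2, repulsion=0.3)
    collapsed = refine_paths(4, 4, target, steps=40, step_size=0.2, repulsion=0.0)
    print("spread with repulsion:   ", round(mean_pairwise_distance(diverse), 3))
    print("spread without repulsion:", round(mean_pairwise_distance(collapsed), 3))
    print("decoded:", decode_token_by_token(diverse[0], list("abcdefgh")))
```

With the repulsion term set to zero, every path is drawn to the same point and the spread shrinks toward zero. With it enabled, the paths settle at different points around the target. This mirrors, in a very simplified way, the idea of keeping parallel reasoning paths from converging prematurely.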
Benchmark performance
Researchers tested LaDiR on Meta's LLaMA 3.1 8B for math reasoning and puzzle planning, and on Qwen3-8B-Base for code generation.
On math benchmarks, LaDiR achieved higher accuracy than existing approaches and demonstrated stronger performance on out-of-distribution tasks. For code generation on HumanEval, LaDiR outperformed standard fine-tuning, particularly on harder problems.
In puzzle-style planning tasks like the Countdown game, LaDiR explored a wider range of valid answers and found correct solutions more reliably than general-purpose baselines. However, it fell short of specialized, task-specific models on single-attempt accuracy.
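The difference between single-attempt accuracy and the range of valid answers can be illustrated with a small Python sketch. It assumes a common form of the Countdown puzzle: combine some of the given numbers, each used at most once, with +, -, * and / to reach a target. The rules, the candidate expressions, and the helper names are all hypothetical and are not taken from the study's evaluation code.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node, used):
    """Evaluate a parsed arithmetic expression exactly, recording the numbers it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left = evaluate(node.left, used)
        right = evaluate(node.right, used)
        return OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def is_valid(expression, numbers, target):
    """True if the expression uses each given number at most once and equals the target."""
    used = []
    try:
        value = evaluate(ast.parse(expression, mode="eval").body, used)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False
    # The Counter difference is empty when `used` is a sub-multiset of `numbers`.
    return not (Counter(used) - Counter(numbers)) and value == target


def summarize(candidates, numbers, target):
    """Return (first attempt correct?, number of distinct valid expressions)."""
    first_correct = bool(candidates) and is_valid(candidates[0], numbers, target)
    distinct_valid = {
        c.replace(" ", "") for c in candidates if is_valid(c, numbers, target)
    }
    return first_correct, len(distinct_valid)


if __name__ == "__main__":
    # Hypothetical sampled answers for numbers 3, 5, 7, 2 and target 24.
    candidates = ["(7 - 3) * (5 + 2)", "(5 + 7) * 2", "(5+7)*2", "3 * 7 + 5 - 2"]
    print(summarize(candidates, [3, 5, 7, 2], 24))
```

In this made-up sample the first attempt is wrong, so single-attempt accuracy is zero for this puzzle, yet two distinct valid expressions appear among the candidates. A system can therefore score well on answer diversity while still trailing on single-attempt accuracy.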
What this means
LaDiR is a hybrid approach: it uses the parallel exploration of diffusion models for reasoning and keeps the sequential precision of autoregressive generation for the final answer. By running multiple reasoning paths simultaneously, the framework can explore a broader solution space before committing to an answer. The benchmark results suggest this is particularly effective for complex reasoning tasks where considering multiple possibilities improves accuracy, though specialized models still hold the advantage on single-attempt accuracy in tasks like Countdown. Because the framework was applied to both LLaMA and Qwen base models, it may be adoptable across different base architectures.