LaDiR uses latent diffusion to improve LLM reasoning beyond autoregressive limits
Researchers propose LaDiR, a framework that replaces traditional autoregressive decoding with latent diffusion models to improve LLM reasoning. The approach encodes reasoning steps into compressed latent representations and uses bidirectional attention to refine solutions iteratively, enabling parallel exploration of diverse reasoning paths.
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
A new research paper presents LaDiR (Latent Diffusion Reasoner), a framework addressing a fundamental constraint in large language models: autoregressive decoding limits the ability to revisit and refine earlier tokens holistically.
How LaDiR Works
The framework operates in two stages:
Stage 1: Latent Space Construction
A Variational Autoencoder (VAE) encodes text reasoning steps into compressed blocks of "thought tokens." This creates a structured latent reasoning space that preserves semantic information while reducing representation size, enabling more efficient computation.
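The compression idea can be illustrated with a minimal numpy sketch. Everything here is a toy stand-in, not LaDiR's actual architecture: the projections `W_mu` and `W_logvar` are random rather than learned, each reasoning step is collapsed to a single thought token by mean-pooling, and the latent is sampled with the standard VAE reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

d, latent_dim = 16, 4
# Hypothetical encoder projections (random stand-ins for learned weights).
W_mu = rng.standard_normal((d, latent_dim)) / np.sqrt(d)
W_logvar = rng.standard_normal((d, latent_dim)) / np.sqrt(d)

def encode_step(token_embs: np.ndarray) -> np.ndarray:
    """Toy VAE-style encoder: compress one reasoning step (a block of
    token embeddings) into one latent 'thought token' by mean-pooling,
    then sampling z = mu + sigma * eps (reparameterization trick)."""
    pooled = token_embs.mean(axis=0)                 # (d,)
    mu, logvar = pooled @ W_mu, pooled @ W_logvar    # (latent_dim,)
    eps = rng.standard_normal(latent_dim)
    return mu + np.exp(0.5 * logvar) * eps

# Three reasoning steps of 10 tokens each (dim 16) compress to
# three thought tokens of dim 4: a large reduction in sequence length.
steps = [rng.standard_normal((10, d)) for _ in range(3)]
latents = np.stack([encode_step(s) for s in steps])
print(latents.shape)  # (3, 4)
```

The point of the sketch is the shape change: a 30-token trace becomes 3 latent vectors, which is what makes the later diffusion stage cheap to run over the whole trajectory.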
Stage 2: Iterative Refinement
A latent diffusion model learns to denoise these latent thought blocks using blockwise bidirectional attention masks. Unlike standard left-to-right generation, this bidirectional approach allows the model to see and revise the entire reasoning trajectory simultaneously.
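One plausible reading of "blockwise bidirectional attention" (a common design in block-diffusion models, sketched here as an assumption rather than LaDiR's confirmed mask) is: full bidirectional attention among the thought tokens within a block, with causality enforced only at the block level.

```python
import numpy as np

def blockwise_bidirectional_mask(n_blocks: int, block_size: int) -> np.ndarray:
    """Boolean attention mask: mask[i, j] == True means token i may
    attend to token j. Tokens inside a block all see each other
    (bidirectional), while blocks attend only to themselves and
    earlier blocks (causal at block granularity)."""
    n = n_blocks * block_size
    block_id = np.arange(n) // block_size
    return block_id[:, None] >= block_id[None, :]

mask = blockwise_bidirectional_mask(n_blocks=3, block_size=2)
print(mask.astype(int))
# Within block 0, tokens 0 and 1 see each other both ways,
# which a token-level causal mask would forbid.
```

Compared with a standard causal mask, the key difference is that entry `mask[0, 1]` is `True`: the first token of a block can condition on the later tokens of the same block while it is being denoised.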
Key Technical Differences
Traditional autoregressive LLMs commit to each token in sequence, with no opportunity to revisit earlier choices. LaDiR enables:
- Parallel generation of multiple diverse reasoning trajectories at test time
- Holistic planning and revision across the entire reasoning process
- Adaptive compute allocation where the model can spend more steps refining difficult problems
- Longer planning horizons without the token commitment of standard decoding
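The first and third bullets can be sketched together in a toy refinement loop. This is not LaDiR's diffusion sampler: the "denoiser" below simply nudges latents toward a fixed target, and convergence-based early stopping stands in for adaptive compute. It only illustrates the control flow of refining several trajectories in parallel while spending more steps on the ones that have not settled.

```python
import numpy as np

rng = np.random.default_rng(1)

def refine(z: np.ndarray, target: np.ndarray, step: float = 0.5) -> np.ndarray:
    """Toy denoiser: move latents a fraction of the way toward a
    'clean' target. Stand-in for a learned diffusion denoiser."""
    return z + step * (target - z)

def sample_trajectories(k: int, dim: int, max_steps: int = 50, tol: float = 1e-3):
    """Refine k independent latent trajectories in parallel, giving
    extra refinement steps only to trajectories that have not yet
    converged (a crude form of adaptive compute allocation)."""
    target = rng.standard_normal(dim)
    z = rng.standard_normal((k, dim))          # k parallel trajectories
    steps_used = np.zeros(k, dtype=int)
    for _ in range(max_steps):
        delta = refine(z, target) - z
        active = np.linalg.norm(delta, axis=1) > tol
        if not active.any():
            break                              # all trajectories settled
        z[active] += delta[active]
        steps_used[active] += 1
    return z, steps_used

z, steps_used = sample_trajectories(k=4, dim=8)
print(steps_used)  # noisier starting points consume more steps
```

Each row of `z` is one candidate reasoning trajectory; in the real framework these would be decoded back to text by the VAE, and diversity would come from the independent noise initializations rather than from left-to-right sampling temperature.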
Evaluation Results
Empirical testing on mathematical reasoning and planning benchmarks shows LaDiR consistently outperforms:
- Existing autoregressive reasoning methods
- Pure diffusion-based approaches
- Other latent reasoning frameworks
Improvements span three dimensions: accuracy gains, greater diversity in generated solutions, and improved interpretability of the reasoning process.
What This Means
LaDiR demonstrates that diffusion models—already proven effective for image generation—can be adapted to improve language model reasoning. The key insight is that treating reasoning as a latent refinement problem rather than sequential token generation removes a structural limitation of current LLMs. This opens a new paradigm where models can plan and revise at a higher level of abstraction before committing to token sequences. The framework's adaptive compute allocation is particularly relevant for reasoning tasks where problem difficulty varies significantly.