Researchers develop pruning method that challenges attention-sink assumptions in diffusion language models
A new pruning method challenges the conventional wisdom, inherited from autoregressive LLMs, that attention-sink tokens must be preserved during pruning. The researchers demonstrate that attention sinks in diffusion language models are substantially less stable than in AR models, enabling more aggressive pruning without retraining.