LLM News

Every LLM release, update, and milestone.

Filtered by: diffusion-language-models
research

Researchers develop pruning method that challenges attention-sink assumptions in diffusion language models

A new pruning method challenges the conventional wisdom, inherited from autoregressive LLMs, that attention-sink tokens must be preserved. The researchers demonstrate that attention sinks in diffusion language models are substantially less stable than in autoregressive models, enabling more aggressive pruning without retraining.
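The summary hinges on measuring how stable a token's attention-sink behavior is across the diffusion model's denoising steps. The item does not describe the paper's actual procedure, so the following is only an illustrative sketch under assumed conventions: `attn` holds attention maps for each denoising step, and `sink_threshold` is a hypothetical cutoff, not a value from the paper.

```python
import numpy as np

def sink_stability(attn, sink_threshold=0.3):
    """Illustrative sink-stability measurement (not the paper's method).

    attn: array of shape (num_steps, num_heads, seq, seq), one attention
    map per denoising step. A token is flagged as a sink when it receives
    a large share of attention on average; the variance of that share
    across steps indicates how stable the sink behavior is.
    """
    # Attention mass each key token receives, averaged over heads and queries.
    received = attn.mean(axis=(1, 2))        # shape: (num_steps, seq)
    mean_score = received.mean(axis=0)       # average sink score per token
    variance = received.var(axis=0)          # instability across steps
    is_sink = mean_score > sink_threshold
    return mean_score, variance, is_sink
```

Under this sketch, a high `variance` for a flagged sink token is the signal the summary describes: the token only acts as a sink at some denoising steps, unlike the persistent sinks typical of autoregressive models.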

research

New pruning technique cuts diffusion language model inference costs by identifying unstable attention sinks

Researchers have identified a fundamental difference in how attention mechanisms behave in diffusion language models versus traditional autoregressive LLMs, enabling a new pruning strategy that removes unstable attention sinks without retraining. The finding challenges pruning assumptions inherited from autoregressive models and promises better quality-efficiency trade-offs at inference time.
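The claimed quality-efficiency trade-off comes from pruning tokens whose sink behavior is inconsistent across denoising steps. As before, this is only a hedged sketch of one plausible scoring rule, not the paper's algorithm: `keep_ratio` and `instability_weight` are hypothetical knobs, and the utility function (mean received attention minus an instability penalty) is an assumption for illustration.

```python
import numpy as np

def prune_unstable_sinks(attn, keep_ratio=0.5, instability_weight=1.0):
    """Illustrative pruning rule (not the paper's algorithm).

    attn: attention maps of shape (num_steps, num_heads, seq, seq).
    Keeps the top keep_ratio fraction of tokens, scoring each token by
    its average received attention minus a penalty for how much that
    attention fluctuates across denoising steps.
    """
    received = attn.mean(axis=(1, 2))            # (num_steps, seq)
    mean_score = received.mean(axis=0)           # consistent importance
    instability = received.std(axis=0)           # step-to-step fluctuation
    utility = mean_score - instability_weight * instability
    k = max(1, int(round(keep_ratio * attn.shape[-1])))
    keep = np.sort(np.argsort(utility)[-k:])     # indices of retained tokens
    return keep
```

With this rule, a token that soaks up attention only at some denoising steps (an unstable sink) scores lower than a token attended to consistently, so it is pruned first, which matches the intuition the summary attributes to the paper.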