LLM News

Every LLM release, update, and milestone.

research

Researchers identify 'Lazy Attention' problem in multimodal AI training, boost reasoning by 7%

A new paper on arXiv identifies a critical flaw in how multimodal large reasoning models start training: they fail to attend properly to visual tokens, a phenomenon the researchers call Lazy Attention Localization. The team proposes AVAR, a framework that corrects this through visual-anchored data synthesis and attention-guided objectives, achieving a 7% average improvement across seven multimodal reasoning benchmarks when applied to Qwen2.5-VL-7B.
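As a toy illustration of how "lazy attention" toward visual tokens could be diagnosed (this is a minimal sketch, not the paper's AVAR method; the function name and threshold are assumptions), one can measure what fraction of a row-stochastic attention matrix's mass lands on visual-token positions:

```python
import numpy as np

def visual_attention_share(attn, visual_positions):
    """Fraction of total attention mass that lands on visual tokens.

    attn: (num_queries, num_keys) row-stochastic attention matrix.
    visual_positions: key indices corresponding to visual tokens.
    (Hypothetical diagnostic, not taken from the AVAR paper.)
    """
    mass_on_visual = attn[:, visual_positions].sum()
    return float(mass_on_visual / attn.sum())

# Toy example: 4 text queries over 6 keys, keys 0-2 are visual tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

share = visual_attention_share(attn, [0, 1, 2])
print(round(share, 3))
```

A model whose `share` stays far below the visual tokens' proportion of the sequence would, under this toy criterion, be under-attending to the image.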

research

DynFormer rethinks Transformers for physics simulations, cutting PDE solver errors by 95%

Researchers propose DynFormer, a Transformer variant designed specifically for solving partial differential equations (PDEs) that models physical systems at multiple scales simultaneously. By replacing uniform attention with specialized modules for different physical scales, DynFormer achieves up to 95% error reduction compared to existing neural operator baselines while consuming significantly less GPU memory.
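To make the multi-scale idea concrete, here is a toy sketch (an assumption for illustration, not DynFormer's actual architecture) that builds two key sets from a 1-D field sample: fine-scale keys at every grid point, and coarse-scale keys from average-pooled patches:

```python
import numpy as np

def multiscale_keys(field, pool=4):
    """Build fine- and coarse-scale key sets from a 1-D field sample.

    Coarse keys are average-pooled patches -- a toy stand-in for
    scale-specific attention modules; the real DynFormer design
    is not reproduced here.
    """
    fine = field.reshape(-1, 1)                              # one key per grid point
    coarse = field.reshape(-1, pool).mean(axis=1, keepdims=True)  # one key per patch
    return np.concatenate([fine, coarse], axis=0)

# 16-point sample of a smooth field, pooled into 4 coarse patches.
field = np.sin(np.linspace(0, np.pi, 16))
keys = multiscale_keys(field)
print(keys.shape)  # (20, 1)
```

Attending over both key sets lets fine keys capture local gradients while coarse keys summarize large-scale structure, which is the intuition behind replacing uniform attention with scale-specialized modules.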

research

Researchers develop pruning method that challenges attention-sink assumptions in diffusion language models

A new pruning method challenges the conventional wisdom, inherited from autoregressive LLMs, that attention-sink tokens must be preserved. The researchers demonstrate that attention sinks in diffusion language models are substantially less stable than in AR models, enabling more aggressive pruning without retraining.
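A minimal sketch of what "sink stability" could mean in practice (the function and threshold are hypothetical; the paper's actual stability measure is not specified here): flag a key as a sink at each denoising step if it receives an outsized share of incoming attention, and call it stable only if it stays a sink at every step.

```python
import numpy as np

def stable_sinks(attn_per_step, threshold=0.2):
    """Flag tokens that remain attention sinks across all steps.

    attn_per_step: (steps, queries, keys) row-stochastic matrices.
    A key is a 'sink' at a step if its mean incoming attention mass
    exceeds `threshold`. (Toy criterion for illustration only.)
    """
    incoming = attn_per_step.mean(axis=1)   # (steps, keys): mean mass per key
    is_sink = incoming > threshold          # sink flag at each step
    return is_sink.all(axis=0)              # stable = sink at every step

# Step 0: uniform attention (every key a mild sink);
# step 1: each query attends to one key, key 3 receives nothing.
steps = np.stack([np.full((3, 4), 0.25), np.eye(4)[:3]])
stable = stable_sinks(steps)
print(stable.tolist())  # [True, True, True, False]
```

Under this toy criterion, keys whose sink status flickers across steps (like key 3 above) are candidates for aggressive pruning, mirroring the paper's observation that diffusion-LM sinks are less stable than autoregressive ones.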
