LLM News

Every LLM release, update, and milestone.

research

SureLock cuts masked diffusion language model decoding compute by 30-50%

Researchers propose SureLock, a technique that cuts decoding FLOPs by 30-50% on LLaDA-8B by skipping attention and feed-forward computations for tokens whose predictions have converged. The method caches key-value pairs for these locked positions and recomputes only the unlocked tokens, reducing per-iteration complexity from O(N²d) to O(MNd), where N is the sequence length, M is the number of still-unlocked tokens, and d is the hidden dimension.
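To illustrate the locked-token idea, here is a minimal single-head numpy sketch, not SureLock's actual implementation: locked positions keep their cached key-value pairs and are left untouched, while queries, scores, and outputs are computed only for the M unlocked positions, giving the O(MNd) per-iteration attention cost described above. All names (`attention_step`, the cache layout) are illustrative assumptions.

```python
import numpy as np

def attention_step(x, locked, kv_cache, Wq, Wk, Wv):
    """One decode iteration: compute attention only for unlocked positions,
    reusing cached K/V for locked (converged) tokens."""
    N, d = x.shape
    unlocked = ~locked
    # Refresh K/V only where tokens may still change; locked rows stay cached.
    kv_cache["k"][unlocked] = x[unlocked] @ Wk
    kv_cache["v"][unlocked] = x[unlocked] @ Wv
    # Queries only for the M unlocked positions: O(M*N*d) instead of O(N^2*d).
    q = x[unlocked] @ Wq                       # (M, d)
    scores = q @ kv_cache["k"].T / np.sqrt(d)  # (M, N): unlocked attend to all
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = x.copy()                             # locked outputs pass through
    out[unlocked] = weights @ kv_cache["v"]    # (M, d)
    return out

# Toy demo: warm the cache with one full pass, then lock half the tokens.
rng = np.random.default_rng(0)
N, d = 6, 4
x = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
cache = {"k": x @ Wk, "v": x @ Wv}
locked = np.array([True, False, True, False, False, True])
y = attention_step(x, locked, cache, Wq, Wk, Wv)
```

In a full model the same skip would apply to the feed-forward block as well, and the score matrix stays (M, N) because unlocked tokens still attend to the cached keys of locked ones.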