LLM News

Every LLM release, update, and milestone.

research

SureLock cuts masked diffusion language model decoding compute by 30-50%

Researchers propose SureLock, a technique that cuts decoding FLOPs by 30-50% on LLaDA-8B by skipping attention and feed-forward computations for tokens whose predictions have converged. The method caches key-value pairs for these locked positions and recomputes only the unlocked tokens, reducing per-iteration complexity from O(N²d) to O(MNd), where N is the sequence length, M is the number of still-unlocked tokens, and d is the hidden dimension.
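To illustrate the locked-token idea, here is a minimal single-head numpy sketch, not SureLock's actual implementation: locked positions keep their cached key-value pairs and are left untouched, while queries, scores, and outputs are computed only for the M unlocked positions, giving the O(MNd) per-iteration attention cost described above. All names (`attention_step`, the cache layout) are illustrative assumptions.

```python
import numpy as np

def attention_step(x, locked, kv_cache, Wq, Wk, Wv):
    """One decode iteration: compute attention only for unlocked positions,
    reusing cached K/V for locked (converged) tokens."""
    N, d = x.shape
    unlocked = ~locked
    # Refresh K/V only where tokens may still change; locked rows stay cached.
    kv_cache["k"][unlocked] = x[unlocked] @ Wk
    kv_cache["v"][unlocked] = x[unlocked] @ Wv
    # Queries only for the M unlocked positions: O(M*N*d) instead of O(N^2*d).
    q = x[unlocked] @ Wq                       # (M, d)
    scores = q @ kv_cache["k"].T / np.sqrt(d)  # (M, N): unlocked attend to all
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = x.copy()                             # locked outputs pass through
    out[unlocked] = weights @ kv_cache["v"]    # (M, d)
    return out

# Toy demo: warm the cache with one full pass, then lock half the tokens.
rng = np.random.default_rng(0)
N, d = 6, 4
x = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
cache = {"k": x @ Wk, "v": x @ Wv}
locked = np.array([True, False, True, False, False, True])
y = attention_step(x, locked, cache, Wq, Wk, Wv)
```

In a full model the same skip would apply to the feed-forward block as well, and the score matrix stays (M, N) because unlocked tokens still attend to the cached keys of locked ones.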