Research: Token-wise KV cache compression cuts memory to 6% while retaining 94% performance
Researchers propose DynaKV, a post-training framework that dynamically allocates compression rates to individual tokens based on their semantic importance. On the LongBench benchmark, the method retains 94% of baseline performance while shrinking the KV cache to just 6% of its original size.
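The blurb doesn't describe DynaKV's actual algorithm, but the core idea it names — giving each token its own compression rate under a global memory budget, with more important tokens compressed less — can be sketched as a greedy per-token bit-width allocator. This is a minimal illustration, not DynaKV's method; the function name, the bit-width levels, and the importance scores are all hypothetical:

```python
def allocate_bits(importance, mean_bits_budget=1.0, levels=(8, 4, 2, 0)):
    """Assign a quantization bit-width to each token's KV entry so the
    average bits/token stays within `mean_bits_budget`.

    Tokens are visited in order of descending importance; each gets the
    widest bit-width that still fits the remaining budget (0 bits = evict).
    With fp16 baselines, a budget of ~1 bit/token corresponds to roughly
    6% of the original cache size (1/16).
    """
    n = len(importance)
    remaining = mean_bits_budget * n  # total bit budget across all tokens
    bits = [0] * n
    for i in sorted(range(n), key=lambda i: importance[i], reverse=True):
        for b in levels:  # try widths from largest to smallest
            if b <= remaining:
                bits[i] = b
                remaining -= b
                break
    return bits
```

For example, with importance scores `[0.9, 0.1, 0.5, 0.8]` and a mean budget of 4 bits/token, the two most important tokens get 8-bit entries and the rest are evicted. In a real system the importance signal would come from something like accumulated attention weights, and the allocation would be recomputed as generation proceeds.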