LLM News

Every LLM release, update, and milestone.

Filtered by: post-training
research

Reinforcement fine-tuning preserves model knowledge better than supervised fine-tuning, study finds

A new study on Qwen2.5-VL finds that reinforcement fine-tuning (RFT) significantly outperforms supervised fine-tuning (SFT) at preserving a model's existing knowledge during post-training adaptation. While SFT enables faster task learning, it causes catastrophic forgetting; RFT learns more slowly but retains prior knowledge because it only reinforces samples the base model already assigns meaningful probability, keeping updates close to the base distribution.
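The forgetting gap can be illustrated with a toy experiment. The sketch below (an illustrative construction, not the study's actual setup) compares one SFT step, which pushes a categorical "model" toward an external target it considers unlikely, against one REINFORCE-style RFT step, which only reinforces tokens sampled from the model itself. The RFT update drifts far less from the base distribution, measured by KL divergence:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy 4-token "vocabulary"; the base model strongly prefers token 0.
logits = np.array([2.0, 0.5, 0.0, -1.0])
lr = 0.5

def sft_step(logits, target, lr):
    """Cross-entropy step toward an external one-hot target,
    regardless of how unlikely the base model finds it."""
    p = softmax(logits)
    one_hot = np.zeros_like(p)
    one_hot[target] = 1.0
    return logits - lr * (p - one_hot)   # gradient of CE w.r.t. logits

def rft_step(logits, reward_fn, lr, n=256, rng=rng):
    """REINFORCE step on the model's OWN samples: only tokens the
    model already emits can be reinforced, so updates stay close
    to the base distribution."""
    p = softmax(logits)
    samples = rng.choice(len(p), size=n, p=p)
    grad = np.zeros_like(logits)
    for s in samples:
        g = -p.copy()
        g[s] += 1.0                      # gradient of log p(s)
        grad += reward_fn(s) * g
    return logits + lr * grad / n

base = softmax(logits)
sft_logits = sft_step(logits, target=3, lr=lr)   # target the rarest token
rft_logits = rft_step(logits, reward_fn=lambda s: 1.0 if s == 0 else 0.0, lr=lr)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# RFT stays much closer to the base distribution than SFT does.
print(kl(base, softmax(sft_logits)), kl(base, softmax(rft_logits)))
```

The single mechanism on display, that policy-gradient updates are restricted to the model's own sample support, is the study's proposed explanation for why RFT forgets less.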

research

Self-confidence signals enable unsupervised reward training for text-to-image models

Researchers introduce SOLACE, a post-training framework that replaces external reward models with an internal self-confidence signal derived from how accurately a text-to-image model recovers injected noise. The method enables fully unsupervised optimization and shows measurable improvements in compositional generation, text rendering, and text-image alignment.
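One way such a signal could work, sketched below under assumptions of our own (an ε-prediction denoiser and a mean-squared noise-recovery error; SOLACE's actual formulation may differ): inject known noise into a sample, ask the model to predict that noise back, and use the negative prediction error as the reward. Samples the model "believes in" are easier to denoise and therefore score higher, with no external reward model involved. The `toy_denoiser` below is a hypothetical stand-in for a real text-to-image model:

```python
import numpy as np

rng = np.random.default_rng(1)

def self_confidence_reward(denoiser, x0, sigma, rng, n_probes=8):
    """Reward = negative noise-recovery error, averaged over probes.
    The model scores its own sample x0 by how accurately it can
    predict the noise injected into it."""
    errs = []
    for _ in range(n_probes):
        eps = rng.standard_normal(x0.shape)
        x_noisy = x0 + sigma * eps
        eps_hat = denoiser(x_noisy, sigma)
        errs.append(np.mean((eps_hat - eps) ** 2))
    return -float(np.mean(errs))

def toy_denoiser(x_noisy, sigma):
    """Hypothetical denoiser whose learned prior puts clean images
    at zero: eps_hat = (x_noisy - predicted_x0) / sigma, x0_hat = 0."""
    return x_noisy / sigma

on_manifold = np.zeros((8, 8))                   # matches the denoiser's prior
off_manifold = rng.standard_normal((8, 8)) * 3.0  # far from the prior

r_on = self_confidence_reward(toy_denoiser, on_manifold, 0.5, rng)
r_off = self_confidence_reward(toy_denoiser, off_manifold, 0.5, rng)
print(r_on > r_off)  # True: in-distribution samples earn higher reward
```

Because the reward is computed entirely from the model's own predictions, it can be optimized without labels or a separately trained reward model, which is what makes the framework fully unsupervised.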