LLM News

Every LLM release, update, and milestone.

research

Researchers identify divergence term selection as key to preventing LLM performance collapse in RL fine-tuning

A new paper identifies a fundamental flaw in standard reinforcement learning fine-tuning of large language models: the choice of divergence term in the training objective directly drives the degradation of multi-attempt performance (Pass@k), even as single-attempt performance (Pass@1) improves. The researchers propose Diversity-Preserving Hybrid RL (DPH-RL), which uses mass-covering f-divergences to maintain broad solution coverage and prevent catastrophic forgetting.
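To see why the divergence choice matters, here is a toy sketch (not the paper's implementation) contrasting reverse KL, the mode-seeking penalty common in RLHF-style objectives, with forward KL, a mass-covering member of the f-divergence family. The distributions below are hypothetical policies over three solution modes; the point is that forward KL heavily penalizes a policy that drops modes the reference still covers, while reverse KL penalizes it far less.

```python
import math

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i), with 0 * log 0 taken as 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical reference policy spreading mass over three solution modes.
ref = [0.5, 0.3, 0.2]

# A fine-tuned policy that collapsed onto a single mode (lost diversity).
collapsed = [0.98, 0.01, 0.01]

# A fine-tuned policy that kept broad coverage of the reference's modes.
broad = [0.6, 0.25, 0.15]

# Reverse KL, KL(policy || ref): the mode-seeking direction. It penalizes
# the collapsed policy only where the policy itself places mass.
rev_collapsed = kl(collapsed, ref)

# Forward KL, KL(ref || policy): the mass-covering direction. It penalizes
# the policy wherever the reference has mass the policy has abandoned.
fwd_collapsed = kl(ref, collapsed)
fwd_broad = kl(ref, broad)

print(f"reverse KL, collapsed policy: {rev_collapsed:.3f}")
print(f"forward KL, collapsed policy: {fwd_collapsed:.3f}")
print(f"forward KL, broad policy:     {fwd_broad:.3f}")
```

On these numbers the forward-KL penalty on the collapsed policy is roughly twice the reverse-KL penalty, while the broad policy pays almost nothing under either, which is the intuition behind preferring mass-covering divergences when multi-attempt (Pass@k) coverage matters.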