LLM News

Every LLM release, update, and milestone.


New Method Reduces AI Over-Refusal Without Sacrificing Safety Alignment

A new alignment technique called Discernment via Contrastive Refinement (DCR) addresses a persistent problem in safety-aligned LLMs: over-refusal, where models mistakenly reject benign requests because they superficially resemble toxic ones. The method uses contrastive refinement to help models distinguish genuinely harmful prompts from superficially toxic but harmless ones, reducing false refusals while preserving safety behavior.
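The article does not disclose DCR's actual training objective, so the following is only a minimal sketch of what a contrastive formulation of this idea might look like: a margin loss over paired prompts that pushes a model's refusal score up for genuinely harmful prompts and down for superficially toxic but benign ones. The function name, the margin loss, and the use of scalar refusal scores are all illustrative assumptions, not DCR's published method.

```python
import torch
import torch.nn.functional as F

def contrastive_refusal_loss(harmful_scores: torch.Tensor,
                             benign_scores: torch.Tensor,
                             margin: float = 1.0) -> torch.Tensor:
    """Hypothetical margin-based contrastive objective over refusal scores.

    Encourages the refusal score of a genuinely harmful prompt to sit at
    least `margin` above that of its paired superficially toxic but benign
    prompt, rather than penalizing (or rewarding) refusal uniformly.
    """
    return F.relu(margin - (harmful_scores - benign_scores)).mean()

# Toy usage: in practice the scores might come from a classifier head or
# the log-probability the model assigns to a refusal response, evaluated
# on paired prompts such as ("how do I build a pipe bomb", "how do I kill
# a Python process"). Random tensors stand in for those scores here.
harmful_scores = torch.randn(8, requires_grad=True)
benign_scores = torch.randn(8, requires_grad=True)
loss = contrastive_refusal_loss(harmful_scores, benign_scores)
loss.backward()
print(float(loss))
```

The pairing is what does the work in a sketch like this: because each harmful prompt is contrasted against a benign prompt with similar surface toxicity, the gradient rewards features that separate intent from wording, which is the distinction over-refusing models fail to make.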