LLM News

Every LLM release, update, and milestone.

research

Research: Contrastive refinement reduces AI model over-refusal without sacrificing safety

Researchers propose DCR (Discernment via Contrastive Refinement), a pre-alignment technique that reduces the tendency of safety-aligned language models to reject benign prompts while preserving rejection of genuinely harmful content. The method addresses a core trade-off in current safety alignment: reducing over-refusal typically degrades harm-detection capabilities.

research

New Method Reduces AI Over-Refusal Without Sacrificing Safety Alignment

A new alignment technique called Discernment via Contrastive Refinement (DCR) addresses a persistent problem in safety-aligned LLMs: over-refusal, where models refuse benign requests that merely sound toxic. The method uses contrastive refinement to help models better distinguish genuinely harmful prompts from superficially toxic ones, reducing unnecessary refusals while preserving safety.
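
To make the contrastive idea concrete, here is a minimal, illustrative sketch of one way such an objective could look: a pairwise margin loss that pushes a model's refusal score up on genuinely harmful prompts and down on paired benign prompts that merely sound toxic. This is an assumption for illustration only, not the authors' published DCR implementation; the refusal-score inputs and all names here are hypothetical.

```python
# Hedged sketch: a pairwise contrastive (margin ranking) loss over paired prompts.
# Assumes some model head produces a scalar "refusal score" per prompt; the real
# DCR training objective and data pipeline are not specified in this summary.
import torch
import torch.nn.functional as F

def contrastive_refinement_loss(
    refusal_score_benign: torch.Tensor,   # scores for benign-but-toxic-sounding prompts
    refusal_score_harmful: torch.Tensor,  # scores for genuinely harmful prompts
    margin: float = 1.0,
) -> torch.Tensor:
    """Harmful prompts should out-score their paired benign prompts by at least `margin`."""
    return F.relu(margin - (refusal_score_harmful - refusal_score_benign)).mean()

# Toy usage with random scores standing in for a model's refusal-head outputs.
benign = torch.randn(8, requires_grad=True)
harmful = torch.randn(8, requires_grad=True)
loss = contrastive_refinement_loss(benign, harmful)
loss.backward()
print(float(loss))
```

The design intuition behind a pairwise objective like this is that it targets the decision boundary directly: the model is trained on matched pairs that differ in actual harmfulness but not in surface toxicity, which is the distinction over-refusing models fail to make.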