LLM News

Every LLM release, update, and milestone.

Filtered by: llm-alignment
research

Research: Contrastive refinement reduces AI model over-refusal without sacrificing safety

Researchers propose DCR (Discernment via Contrastive Refinement), a pre-alignment technique that reduces the tendency of safety-aligned language models to reject benign prompts while preserving rejection of genuinely harmful content. The method addresses a core trade-off in current safety alignment: reducing over-refusal typically degrades harm-detection capabilities.
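To make the contrastive idea concrete, here is a minimal sketch (not the paper's code; all names and the margin hyperparameter are illustrative assumptions) of a loss that jointly penalizes refusing benign prompts and complying with harmful ones, so one side cannot improve at the expense of the other:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a contrastive refinement objective: the model's
# refusal score should be low on benign prompts and high on genuinely
# harmful ones. `refusal_logits_*` stand in for whatever refusal score
# a safety-aligned model produces; `margin` is illustrative.

def contrastive_refinement_loss(
    refusal_logits_benign: torch.Tensor,   # (batch,) refusal scores on benign prompts
    refusal_logits_harmful: torch.Tensor,  # (batch,) refusal scores on harmful prompts
    margin: float = 1.0,
) -> torch.Tensor:
    # Penalize refusing benign prompts (over-refusal) ...
    over_refusal = F.softplus(refusal_logits_benign)
    # ... and penalize complying with harmful prompts (under-refusal).
    under_refusal = F.softplus(-refusal_logits_harmful)
    # A margin term keeps the two score distributions separated, so
    # reducing over-refusal cannot collapse harm detection.
    separation = F.relu(margin - (refusal_logits_harmful - refusal_logits_benign))
    return (over_refusal + under_refusal + separation).mean()

# Toy usage with random scores standing in for model outputs.
benign = torch.randn(8)
harmful = torch.randn(8) + 2.0
print(contrastive_refinement_loss(benign, harmful))
```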

research

Researchers develop inference-time personality sliders for LLMs without retraining

Researchers have developed a parameter-efficient method to control LLM personality traits at inference time using Sequential Adaptive Steering (SAS), which orthogonalizes steering vectors so that multiple traits can be adjusted simultaneously without interfering with one another. The approach lets users modulate each of the Big Five personality dimensions by setting a numerical coefficient, with no model retraining.
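A rough illustration of why orthogonalization matters, assuming hypothetical names like `trait_vectors` and `coeffs` rather than the paper's actual implementation: Gram-Schmidt makes each steering direction orthogonal to the earlier ones, so scaling one trait's coefficient does not bleed into the others.

```python
import torch

# Illustrative sketch (not the paper's code) of steering hidden states
# with orthogonalized trait vectors.

def orthogonalize(vectors: torch.Tensor) -> torch.Tensor:
    """Gram-Schmidt over rows: returns orthonormal steering directions."""
    basis = []
    for v in vectors:
        for b in basis:
            v = v - (v @ b) * b          # remove component along earlier traits
        basis.append(v / v.norm())
    return torch.stack(basis)

def steer(hidden: torch.Tensor, trait_vectors: torch.Tensor,
          coeffs: torch.Tensor) -> torch.Tensor:
    """Add a weighted sum of orthogonal trait directions to hidden states."""
    directions = orthogonalize(trait_vectors)
    return hidden + coeffs @ directions  # (hidden_dim,) offset broadcast over tokens

# Toy usage: five Big Five directions in a 16-dim hidden space,
# dialing up one trait's coefficient while leaving the rest at zero.
hidden = torch.randn(4, 16)              # (seq_len, hidden_dim)
traits = torch.randn(5, 16)              # one raw vector per Big Five trait
coeffs = torch.zeros(5)
coeffs[2] = 1.5
print(steer(hidden, traits, coeffs).shape)
```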