LLM News | TPS

research

New safety steering technique reduces unsafe T2I outputs without degrading image quality

Researchers introduce Conditioned Activation Transport (CAT), an inference-time steering technique that reduces unsafe content generation in text-to-image (T2I) models while avoiding the image-quality degradation seen with earlier linear steering approaches. The method relies on a contrastive dataset of 2,300 safe/unsafe prompt pairs and uses geometry-based conditioning so that only activations falling in unsafe regions are modified.
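To make the idea of conditioning concrete, here is a minimal toy sketch of conditional activation steering in plain Python. It is not the CAT method itself, whose transport map and conditioning rule are not described in this summary. It assumes a simple difference-of-means direction fitted from contrastive safe/unsafe activations and a midpoint threshold as the "geometry" check; the function names, the threshold rule, and the 2-D toy vectors are all hypothetical.

```python
import math

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_direction(safe_acts, unsafe_acts):
    """Fit a unit safe->unsafe direction from contrastive activations.

    Assumption: a difference-of-means direction with a midpoint threshold,
    used here as a stand-in for whatever geometry CAT actually uses.
    """
    mu_safe = mean(safe_acts)
    mu_unsafe = mean(unsafe_acts)
    diff = [u - s for u, s in zip(mu_unsafe, mu_safe)]
    norm = math.sqrt(dot(diff, diff))
    unit = [d / norm for d in diff]
    threshold = norm / 2.0  # halfway between the two class means
    return mu_safe, unit, threshold

def conditioned_steer(act, mu_safe, unit, threshold):
    """Steer only activations that project past the threshold.

    Activations on the safe side are returned unchanged; an unconditional
    linear shift would instead move every activation.
    """
    centered = [a - m for a, m in zip(act, mu_safe)]
    proj = dot(centered, unit)
    if proj <= threshold:
        return list(act)  # safe region: leave untouched
    # Unsafe region: remove the component along the unsafe direction.
    return [a - proj * u for a, u in zip(act, unit)]

# Toy 2-D "activations" standing in for real model activations.
safe_acts = [[0.0, 1.0], [0.0, -1.0]]
unsafe_acts = [[4.0, 1.0], [4.0, -1.0]]
mu_safe, unit, threshold = fit_direction(safe_acts, unsafe_acts)

unchanged = conditioned_steer([1.0, 0.5], mu_safe, unit, threshold)
steered = conditioned_steer([3.0, 0.5], mu_safe, unit, threshold)
```

In this toy setup the first activation lies on the safe side of the threshold and passes through unmodified, while the second is projected back toward the safe mean along the fitted direction.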

March 5, 2026 · 1:08 AM · 2 min read

text-to-image safety activation-steering

via arxiv.org