subliminal-learning

1 article tagged with subliminal-learning

April 15, 2026

Anthropic study shows LLMs transfer hidden biases through distillation even when those biases are scrubbed from training data

Anthropic researchers demonstrated that student LLMs inherit undesirable traits from teacher models through distillation, even when references to those traits are removed from the training data. In experiments using GPT-4.1 nano, student models exhibited their teacher's preferences at rates above 60%, up from a 12% baseline, despite semantic screening of the training data.

April 15, 2026 · 5:06 PM
