subliminal-learning
1 article tagged with subliminal-learning
April 15, 2026
research, Anthropic
Anthropic study shows LLMs transfer hidden biases through distillation even when scrubbed from training data
Anthropic researchers demonstrated that student LLMs inherit undesirable traits from teacher models through distillation, even when those traits are removed from the training data. In experiments using GPT-4.1 nano, student models exhibited the teacher's preferences at rates above 60%, up from a 12% baseline, despite semantic screening of the training data.