Anthropic reverses stealth policy that secretly downgraded Claude Fable 5 for AI research tasks
Anthropic is making visible its policy of restricting Claude Fable 5 for certain AI development tasks, after researchers discovered the model was secretly rerouting requests to lesser models without disclosure. The company apologized for the lack of transparency but maintained the underlying restrictions.
Anthropic reverses stealth policy that secretly downgraded Claude Fable 5 for AI research tasks
Anthropic is reversing course on a policy that secretly degraded Claude Fable 5's performance for AI research tasks, the company told Wired. The model was quietly rerouting certain requests to lesser models without documenting the restrictions.
What happened
Researchers using Claude Fable 5 — a model based on Anthropic's Mythos system — discovered the model would degrade or refuse responses for specific tasks including:
- Training competing LLMs
- Debugging AI code
- Optimizing neural architecture
The restrictions were not disclosed in the model's documentation. Researchers only discovered the degradation after burning tokens and costs on a model that didn't perform as expected.
"Degrading performance on ML research without telling the user is shockingly hostile and a terrible look," said research fellow and Substack author Dean W. Ball.
Anthropic's response
The company acknowledged the mistake in a statement: "We're changing Fable 5's safeguards for frontier LLM development to make them visible. We made the wrong tradeoff and we apologize for not getting the balance right."
According to Wired, Anthropic will now alert users when it suspects they are trying to use Claude to build highly capable AI, either refusing the request or notifying them of rerouting to a less capable model.
The underlying restrictions remain in place — only the disclosure approach has changed.
Why it matters
The controversy struck at Anthropic's positioning as a researcher-friendly alternative to OpenAI. The company has emphasized its commitment to working closely with the academic community and operating with greater transparency than competitors.
The stealth restrictions created a trust problem: researchers couldn't determine if their prompts were failing due to model limitations or undisclosed policy interventions. This makes reproducible research and accurate benchmarking impossible.
What this means
Anthropic's policy reversal shows the tension between AI safety measures and research transparency. While companies may want to prevent their models from training competing systems, implementing restrictions without disclosure undermines the trust essential to research partnerships. The incident demonstrates that even companies positioning themselves as ethical alternatives face scrutiny when policies affect researchers' ability to understand what tools they're actually using.
Related Articles
Anthropic reverses course on invisible Claude Fable distillation guardrails after researcher backlash
Anthropic is making its anti-distillation safeguards visible in Claude Fable 5 after backlash over silently degrading responses when it detected attempts to use the model for training competing systems. Queries suspected of distillation will now be routed to Claude Opus 4.8 with explicit user notification, matching how the company handles other high-risk areas.
Anthropic Reverses Policy That Silently Throttled AI Researchers Using Claude
Anthropic has reversed a controversial policy that would have made Claude Fable and Mythos models silently throttle responses to researchers working on frontier AI development. The policy, disclosed in the system card, would have identified and limited effectiveness of requests targeting LLM development without notifying users.
Anthropic's Claude Fable 5 Will Silently Degrade Responses on AI Research Topics
Anthropic's 319-page system card for Fable 5 and Mythos 5 reveals the company will silently limit the model's effectiveness on queries related to frontier AI development, including pretraining pipelines and ML accelerator design. Unlike other safety interventions, users will not be notified when these degradations occur.
Anthropic's Fable cybersecurity model blocks routine security work, researchers say
Anthropic released Fable, a public version of its cybersecurity model Mythos, but security researchers report the model's guardrails are blocking routine tasks. The model flags requests as cybersecurity-related even for reading blog posts or requesting code reviews, downgrading to Claude Opus 4.8 when triggered.
Comments
Loading...