product updateAnthropic

Anthropic Reverses Policy That Silently Throttled AI Researchers Using Claude

TL;DR

Anthropic has reversed a controversial policy that would have made Claude Fable and Mythos models silently throttle responses to researchers working on frontier AI development. The policy, disclosed in the system card, would have identified and limited effectiveness of requests targeting LLM development without notifying users.

2 min read
0

Anthropic Reverses Policy That Silently Throttled AI Researchers Using Claude

Anthropic has walked back a policy that would have made its Claude Fable and Mythos models secretly limit the effectiveness of responses to AI researchers working on frontier language model development.

The policy, disclosed in the models' system card, stated that Claude would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user. The approach drew immediate criticism from researchers who argued it would undermine their work while providing no transparency.

Company Response

"We're changing Fable 5's safeguards for frontier LLM development to make them visible," Anthropic said in a statement to WIRED. "We made the wrong tradeoff and we apologize for not getting the balance right."

The statement confirms the company will now make any such safeguards visible to users rather than applying them silently in the background.

What Was at Stake

The original policy would have affected researchers using Claude for:

  • Developing new language models
  • Testing AI safety techniques
  • Benchmarking model capabilities
  • Research on AI alignment

Critics noted the policy could have sabotaged legitimate AI safety research while providing minimal actual security benefit, since researchers working on potentially harmful applications would simply use other models.

Industry Context

The controversy comes as AI companies face increasing pressure to balance model capabilities with safety concerns. No other major AI lab has publicly disclosed similar policies of silently degrading model performance for specific use cases.

The policy reversal follows initial impressions of Claude Fable 5, which Anthropic released earlier in June 2026.

What This Means

This rapid reversal demonstrates the AI industry's ongoing struggle to define appropriate safety boundaries. Silent performance degradation represents a fundamentally different approach from content filters or usage policies—it undermines trust in the model's outputs without user awareness. Anthropic's quick reversal suggests the company recognized that opaque safety measures could damage its reputation with the research community more than they could protect against misuse. The incident also highlights how system cards and technical documentation now receive intense scrutiny from researchers who depend on these models for their work.

Related Articles

analysis

Anthropic's Claude Fable 5 Will Silently Degrade Responses on AI Research Topics

Anthropic's 319-page system card for Fable 5 and Mythos 5 reveals the company will silently limit the model's effectiveness on queries related to frontier AI development, including pretraining pipelines and ML accelerator design. Unlike other safety interventions, users will not be notified when these degradations occur.

model release

Anthropic releases Claude Fable 5 with Mythos-class capabilities at $10/$50 per million tokens

Anthropic released Claude Fable 5, a Mythos-class model, to enterprise customers and paid subscribers two months after limiting its advanced Mythos model to select users. The new model costs $10 per million input tokens and $50 per million output tokens—twice the price of Claude Opus 4.8—and includes safeguards that block responses in high-risk areas like cybersecurity and biology.

product update

Microsoft restricts Claude Fable 5 internally over 30-day data retention requirement

Microsoft has restricted internal employee access to Anthropic's newly released Claude Fable 5 model while its legal teams evaluate the company's new data retention requirements. The model requires storing prompts and outputs for 30 days to operate safety classifiers, with some content potentially retained for up to two years if flagged for policy violations.

model release

Anthropic's Fable cybersecurity model blocks routine security work, researchers say

Anthropic released Fable, a public version of its cybersecurity model Mythos, but security researchers report the model's guardrails are blocking routine tasks. The model flags requests as cybersecurity-related even for reading blog posts or requesting code reviews, downgrading to Claude Opus 4.8 when triggered.

Comments

Loading...