Anthropic's Fable cybersecurity model blocks routine security work, researchers say

TL;DR

Anthropic released Fable, a public version of its cybersecurity model Mythos, but security researchers report that the model's guardrails are blocking routine tasks. The model flags even requests to read a blog post or review code as cybersecurity-related and, when triggered, downgrades to Claude Opus 4.8.

June 10, 2026 · 3:50 PM · 2 min read

On Tuesday, Anthropic released Fable, a limited public version of its cybersecurity model Mythos, but security researchers report that the model's guardrails are blocking legitimate work.

"[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post," said Valentina "Chompie" Palmiotti, a security researcher at IBM X-Force.

How the guardrails work

When triggered, Fable pauses the chat and displays a message that "safety measures flagged this message for cybersecurity or biology topics." The model then downgrades to Claude Opus 4.8. The restrictions are meant to stop Fable from being used to develop malware or compromise software; similar restrictions on biology topics aim to prevent its use in developing biological weapons.

Matt Suiche, a cybersecurity veteran and member of the technical staff at AI cybersecurity startup Tolmo, told TechCrunch that the system appears to be keyword-based. "If you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded," Suiche said. "Anything in the lexical field of 'cybersecurity' triggers the guardrails."

Another researcher reported that even requesting a code review triggers the guardrails.
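Suiche's description suggests why innocuous prompts get caught: a filter that matches words rather than intent cannot tell a request to write secure code apart from a request to attack software. The Python sketch below is purely illustrative. The term list, the `naive_flag` and `route` functions, and the model labels are hypothetical and do not describe how Anthropic's classifiers actually work.

```python
import re

# Hypothetical watch list for illustration only; NOT Anthropic's actual terms.
SECURITY_TERMS = {"secure", "security", "exploit", "malware", "vulnerability", "attack"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_flag(prompt: str, terms: set[str] = SECURITY_TERMS) -> bool:
    """Return True if any token matches a watched term.

    This is a purely lexical check: it ignores context and intent,
    so defensive and offensive requests are treated identically.
    """
    return any(token in terms for token in tokenize(prompt))


def route(prompt: str) -> str:
    """Send flagged prompts to a fallback model (hypothetical labels)."""
    return "fallback-model" if naive_flag(prompt) else "primary-model"


if __name__ == "__main__":
    for prompt in [
        "Write secure code for a login form",          # defensive, still flagged
        "Summarize this blog post on exploit mitigations",  # reading, still flagged
        "Sort a list of integers in Python",           # no watched terms
    ]:
        print(f"{route(prompt):15} <- {prompt}")
```

Because the check only looks for words, a best-practices request that happens to contain "secure" is routed exactly like a malicious one, which matches the false positives researchers describe.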

Access to Mythos remains restricted

Anthropic released Mythos in April through Project Glasswing, restricting access to a limited number of companies and organizations working to secure critical software and infrastructure. Last week, Anthropic expanded Mythos access to hundreds of organizations across 15 countries, but the full model remains unavailable to most users.

Anthropic operates a Cyber Verification Program that allows approved cybersecurity professionals to use Claude with fewer limitations. OpenAI maintains a similar program called Trusted Access for Cyber.

What this means

The overly broad guardrails on Fable highlight the challenge of releasing capable AI models for specialized domains. While Anthropic's caution is understandable given malware development risks, the current implementation appears to conflate basic security engineering practices with malicious activity. Suiche noted the approach may be appropriate for an initial release: "It's better to catch more people than not enough when you do such a release and to relax the guardrails over time." The effectiveness of Fable as a security tool will depend on Anthropic's ability to calibrate these restrictions to allow legitimate defensive security work while blocking offensive capabilities.

Source: techcrunch.com

