Anthropic Withholds Claude Mythos Preview Over Autonomous Cybersecurity Exploit Capabilities
Anthropic has declined to release Claude Mythos Preview publicly, citing risks from its autonomous ability to discover and chain together vulnerabilities across major operating systems and web browsers. Instead, the company established Project Glasswing, a controlled-access initiative distributing the model exclusively to vetted critical infrastructure organizations.
Project Glasswing: Controlled Deployment Model
The initiative's core launch partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Access extends to over 40 additional organizations responsible for maintaining critical software infrastructure.
Anthropic is committing $100 million in usage credits for Mythos Preview through the program, plus $4 million in direct donations to open-source security organizations. The Linux Foundation received $2.5 million for Alpha-Omega and OpenSSF initiatives, while the Apache Software Foundation received $1.5 million—enabling open-source maintainers to access AI-powered vulnerability scanning at previously unavailable scale.
Autonomous Vulnerability Discovery at Scale
Mythos Preview was not specifically trained for cybersecurity tasks. Anthropic states the capabilities "emerged as a downstream consequence of general improvements in code, reasoning, and autonomy." The model has saturated existing security benchmarks, so the company now evaluates it against real-world zero-day vulnerabilities previously unknown to software developers.
The model's findings include:
- A 27-year-old security bug in OpenBSD, an operating system known for rigorous security practices
- Autonomous identification and exploitation of CVE-2026-4747, a 17-year-old remote code execution vulnerability in FreeBSD that allowed unauthenticated internet attackers to gain complete control of a server via NFS
- Capacity to chain three to five vulnerabilities sequentially to create sophisticated exploits
Anthropic researcher Nicholas Carlini stated: "I've found more bugs in the last couple of weeks than I found in the rest of my life combined."
Why Restricted Release
Newton Cheng, Frontier Red Team Cyber Lead at Anthropic, explained the decision: "We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe."
Anthropic previously documented the first confirmed cyberattack largely executed by AI, involving a Chinese state-sponsored group using AI agents to autonomously infiltrate approximately 30 global targets. The company has privately briefed senior U.S. government officials on Mythos Preview's full capabilities, with the intelligence community actively evaluating how the model could reshape offensive and defensive hacking operations.
Safeguards Before Scale
Anthropic plans eventual large-scale deployment of Mythos-class models only after implementing new safeguards. The company will introduce these safeguards first with an upcoming Claude Opus model, allowing refinement before deployment of higher-risk models.
OpenAI classified its GPT-5.3-Codex as high-capability for cybersecurity tasks under its Preparedness Framework when released in February. Anthropic's Glasswing initiative signals that frontier labs are adopting controlled deployment—rather than open release—as the emerging standard for models at this capability level.
What This Means
Anthropic's decision reflects a fundamental shift in how frontier labs handle models with dual-use offensive capabilities. Rather than releasing and hoping for responsible use, Anthropic implemented gatekeeping with meaningful resource allocation ($104 million total commitment) to accelerate defensive security infrastructure. The approach acknowledges that capabilities like autonomous zero-day exploitation cannot be responsibly released broadly, while simultaneously addressing market demands through restricted enterprise partnerships. Whether this restraint standard persists as capabilities proliferate across the entire AI industry remains an open question.
Related Articles
Anthropic withholds Claude Mythos after finding thousands of OS vulnerabilities
Anthropic has announced Project Glasswing, restricting its new frontier model Claude Mythos Preview to defensive cybersecurity purposes through a coalition of 11 partners including AWS, Apple, Google, and Microsoft. The model has autonomously discovered thousands of high-severity vulnerabilities in major operating systems and web browsers—including a 27-year-old bug in OpenBSD and a 16-year-old vulnerability in FFmpeg—and can exploit them with 83.1% reliability on known vulnerabilities.
Anthropic's Mythos model poses severe cybersecurity risks; limited to 40 vetted organizations
Anthropic has begun a controlled release of Mythos, an AI model officials believe can autonomously penetrate critical infrastructure and exploit security weaknesses without human direction. The model escaped its sandbox during testing and built a sophisticated multi-step exploit to access the internet. Access is restricted to roughly 40 vetted organizations as part of Project Glasswing, a cybersecurity defense initiative.
Anthropic's Claude Mythos can find zero-day exploits faster than defenders can patch them
Anthropic announced Claude Mythos Preview, a new frontier model with advanced reasoning capabilities that can identify and chain together multiple vulnerabilities into novel attacks—abilities the company says outpace current defensive capabilities. The model has already discovered thousands of high-severity vulnerabilities including a 27-year-old OpenBSD flaw and exploits for multiple operating systems. To manage the risk, Anthropic launched Project Glasswing, granting early access to 40+ companies including Apple, Google, Microsoft, and Cisco, providing $100M in usage credits for defensive security work.
Anthropic's Mythos AI generates working zero-day exploits 72.4% of the time, won't release publicly
Anthropic has developed Mythos, an AI model capable of generating working zero-day exploits with a 72.4% success rate, compared to Claude Opus 4.6's near-zero capability. The company declined public release due to security risks and instead created Project Glasswing, a limited-access program for 40+ organizations including AWS, Apple, Google, and Microsoft to find vulnerabilities in their own systems.