Anthropic withholds Claude Mythos Preview from public release due to autonomous cybersecurity exploit capabilities

TL;DR

Anthropic has declined to publicly release Claude Mythos Preview, its most capable AI model, citing critical cybersecurity risks. Instead, the company launched Project Glasswing, providing controlled access to 50+ organizations including AWS, Apple, Google, and Microsoft, along with $100 million in usage credits and $4 million in direct donations to open-source security initiatives.

Anthropic has declined to release Claude Mythos Preview publicly, citing risks from its autonomous ability to discover and chain together vulnerabilities across major operating systems and web browsers. Instead, the company established Project Glasswing, a controlled-access initiative distributing the model exclusively to vetted critical infrastructure organizations.

Project Glasswing: Controlled Deployment Model

The initiative's core launch partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Access extends to over 40 additional organizations responsible for maintaining critical software infrastructure.

Anthropic is committing $100 million in usage credits for Mythos Preview through the program, plus $4 million in direct donations to open-source security organizations. Of the donations, the Linux Foundation received $2.5 million for its Alpha-Omega and OpenSSF initiatives, and the Apache Software Foundation received $1.5 million. The funding is intended to give open-source maintainers access to AI-powered vulnerability scanning at a scale that was previously unavailable to them.

Autonomous Vulnerability Discovery at Scale

Mythos Preview was not specifically trained for cybersecurity tasks. Anthropic states the capabilities "emerged as a downstream consequence of general improvements in code, reasoning, and autonomy." The model has saturated existing security benchmarks, so the company now evaluates it on real-world zero-day vulnerabilities, meaning flaws that were unknown to the software's developers until the model found them.

The model's findings include:

  • A 27-year-old security bug in OpenBSD, an operating system known for rigorous security practices
  • Autonomous identification and exploitation of CVE-2026-4747, a 17-year-old remote code execution vulnerability in FreeBSD enabling unauthenticated internet users to obtain complete server control via NFS
  • Capacity to chain three to five vulnerabilities sequentially to create sophisticated exploits

Nicholas Carlini, an Anthropic researcher, said: "I've found more bugs in the last couple of weeks than I found in the rest of my life combined."

Why the Release Is Restricted

Newton Cheng, Frontier Red Team Cyber Lead at Anthropic, explained the decision: "We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe."

Anthropic previously documented the first confirmed cyberattack largely executed by AI, involving a Chinese state-sponsored group using AI agents to autonomously infiltrate approximately 30 global targets. The company has privately briefed senior U.S. government officials on Mythos Preview's full capabilities, with the intelligence community actively evaluating how the model could reshape offensive and defensive hacking operations.

Safeguards Before Scale

Anthropic plans eventual large-scale deployment of Mythos-class models only after implementing new safeguards. The company will introduce these safeguards first with an upcoming Claude Opus model, allowing refinement before deployment of higher-risk models.

OpenAI classified GPT-5.3-Codex as high-capability for cybersecurity tasks under its Preparedness Framework when the model was released in February. Anthropic's Glasswing initiative signals that frontier labs are adopting controlled deployment, rather than open release, as the emerging standard for models at this capability level.

What This Means

Anthropic's decision reflects a fundamental shift in how frontier labs handle models with dual-use offensive capabilities. Rather than releasing the model and hoping for responsible use, Anthropic gated access and backed the program with a $104 million commitment ($100 million in usage credits plus $4 million in donations) to accelerate defensive security work. The approach acknowledges that capabilities like autonomous zero-day exploitation cannot be responsibly released broadly, while still meeting market demand through restricted enterprise partnerships. Whether this standard of restraint holds as similar capabilities spread across the AI industry remains an open question.
