Anthropic withholds Claude Mythos after finding thousands of OS vulnerabilities

TL;DR

Anthropic has announced Project Glasswing, restricting its new frontier model Claude Mythos Preview to defensive cybersecurity purposes through a coalition of 11 partners including AWS, Apple, Google, and Microsoft. The model has autonomously discovered thousands of high-severity vulnerabilities in major operating systems and web browsers, including a 27-year-old bug in OpenBSD and a 16-year-old vulnerability in FFmpeg, and scores 83.1% on the CyberGym vulnerability-reproduction benchmark.

Anthropic has restricted its new frontier model, Claude Mythos Preview, exclusively to defensive cybersecurity use, citing thousands of high-severity vulnerabilities the model has autonomously discovered in operating systems and web browsers. It is the industry's first major withholding decision since OpenAI's GPT-2 announcement in 2019, but this time the decision is backed by concrete evidence.

Project Glasswing: A Controlled Deployment

The initiative, called Project Glasswing, deploys Claude Mythos Preview through a carefully curated coalition of 11 organizations: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

Anthropic is committing up to $100 million in usage credits and donating $4 million directly to open-source security organizations: $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5 million to the Apache Software Foundation. Over 40 additional organizations will receive access to scan critical software infrastructure.

After the initial credits expire, Mythos Preview will be available to authorized partners at $25 per million input tokens and $125 per million output tokens.
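As a rough illustration of what those rates imply, the sketch below estimates the cost of a single job. The two prices come from the figures above; the function name and the token counts in the example are hypothetical, and it assumes simple linear per-token billing with no discounts or minimums.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 25.0
OUTPUT_USD_PER_MTOK = 125.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one job, assuming linear per-token billing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical scan: 2M input tokens read, 400k output tokens written.
example_cost = estimate_cost_usd(2_000_000, 400_000)
print(f"${example_cost:.2f}")  # prints $100.00
```

At these rates output tokens cost five times as much as input tokens, so in this example 400,000 output tokens cost as much as 2 million input tokens.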

Proof: Decades-Old Vulnerabilities Found Autonomously

Unlike OpenAI's 2019 GPT-2 decision—which critics dismissed as a PR stunt—Anthropic is backing its restriction with concrete vulnerability discoveries:

OpenBSD TCP SACK Bug: Mythos Preview discovered a 27-year-old vulnerability in the TCP SACK implementation that allowed attackers to crash any OpenBSD machine through a remote connection. The bug arose from a subtle combination of missing validation and integer overflow.

FFmpeg H.264 Vulnerability: A 16-year-old vulnerability in the codec remained undetected even though automated testing had exercised the affected line of code five million times.

FreeBSD NFS Server (CVE-2026-4747): A 17-year-old vulnerability that Mythos Preview not only discovered but independently exploited.

Exploit Capability: The Real Concern

Mythos Preview's distinguishing feature isn't just vulnerability discovery; it's reliable exploitation. On a benchmark built from vulnerabilities in Firefox 147, Mythos Preview produced working exploits 181 times, compared with just 2 for its predecessor, Claude Opus 4.6.

On the CyberGym benchmark measuring reliable vulnerability reproduction in open-source software, Mythos Preview scored 83.1% versus 66.6% for Opus 4.6. In internal tests against 1,000 open-source projects, Mythos Preview achieved full control-flow hijack on ten fully patched targets—Opus 4.6 succeeded exactly once.

Broader Capability Improvements

Beyond cybersecurity, Mythos Preview shows significant advances:

  • SWE-bench Verified (software engineering): 93.9% vs. Opus 4.6's 80.8%
  • GPQA Diamond (graduate-level science): 94.6% vs. 91.3%
  • USAMO 2026 (US Mathematical Olympiad): 97.6% vs. 42.3%

Staged Deployment Strategy

Anthropic plans a deliberate rollout: it will first introduce the necessary safeguards through Claude Opus 4.6, the less risky model, and refine them there before gradually making Mythos-class models broadly available. Security professionals affected by the restrictions can apply for the "Cyber Verification Program."

This approach echoes Jack Clark's 2019 testimony before Congress about staged releases—but inverted. Clark, who managed GPT-2's release and later co-founded Anthropic, outlined the principle that the research community could collectively manage risks. This time, Anthropic is betting that concentrating the most dangerous capabilities within a security-vetted coalition is the responsible approach.

What This Means

For the first time, an AI company is citing demonstrated capability to justify restriction rather than imposing restriction as a precaution. Anthropic has concrete evidence that its model poses genuine cybersecurity risks at scale. The coalition structure and financial commitments suggest this is an operational necessity rather than performative safety theater.

This sets a new precedent: future frontier models with autonomous exploitation capabilities may face similar restrictions. The question for the industry is whether this model of controlled access around high-risk capabilities becomes standard practice or remains a one-off exception.
