model releaseOpenAI

Anthropic withholds Claude Mythos after finding thousands of OS vulnerabilities

TL;DR

Anthropic has announced Project Glasswing, restricting its new frontier model Claude Mythos Preview to defensive cybersecurity purposes through a coalition of 11 partners including AWS, Apple, Google, and Microsoft. The model has autonomously discovered thousands of high-severity vulnerabilities in major operating systems and web browsers—including a 27-year-old bug in OpenBSD and a 16-year-old vulnerability in FFmpeg—and can exploit them with 83.1% reliability on known vulnerabilities.

3 min read
0

Anthropic Withholds Claude Mythos After Finding Thousands of OS Vulnerabilities

Anthropic has taken the unprecedented step of restricting its new frontier model Claude Mythos Preview exclusively to defensive cybersecurity use, citing thousands of high-severity vulnerabilities the model has autonomously discovered in operating systems and web browsers. This marks the industry's first major withholding decision since OpenAI's GPT-2 announcement in 2019—but this time backed by concrete evidence.

Project Glasswing: A Controlled Deployment

The initiative, called Project Glasswing, deploys Claude Mythos Preview through a carefully curated coalition of 11 organizations: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

Anthropic is committing up to $100 million in usage credits and donating $4 million directly to open-source security organizations: $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5 million to the Apache Software Foundation. Over 40 additional organizations will receive access to scan critical software infrastructure.

After the initial credits expire, Mythos Preview will be available to authorized partners at $25 per million input tokens and $125 per million output tokens.

Proof: Decades-Old Vulnerabilities Found Autonomously

Unlike OpenAI's 2019 GPT-2 decision—which critics dismissed as a PR stunt—Anthropic is backing its restriction with concrete vulnerability discoveries:

OpenBSD TCP SACK Bug: Mythos Preview discovered a 27-year-old vulnerability in the TCP SACK implementation that allowed attackers to crash any OpenBSD machine through a remote connection. The bug exploited a subtle combination of missing validation and integer overflow.

FFmpeg H.264 Vulnerability: A 16-year-old vulnerability in the codec remained undetected despite automated testing hitting the affected code line five million times.

FreeBSD NFS Server (CVE-2026-4747): A 17-year-old vulnerability that Mythos Preview not only discovered but independently exploited.

Exploit Capability: The Real Concern

Mythos Preview's distinguishing feature isn't just vulnerability discovery—it's reliable exploitation. On a Firefox vulnerability benchmark with 147 known bugs, Mythos Preview achieved working exploits 181 times, compared to just 2 times for its predecessor Claude Opus 4.6.

On the CyberGym benchmark measuring reliable vulnerability reproduction in open-source software, Mythos Preview scored 83.1% versus 66.6% for Opus 4.6. In internal tests against 1,000 open-source projects, Mythos Preview achieved full control-flow hijack on ten fully patched targets—Opus 4.6 succeeded exactly once.

Broader Capability Improvements

Beyond cybersecurity, Mythos Preview shows significant advances:

  • SWE-bench Verified (software engineering): 93.9% vs. Opus 4.6's 80.8%
  • GPQA Diamond (graduate-level science): 94.6% vs. 91.3%
  • USAMO 2026 (US Mathematical Olympiad): 97.6% vs. 42.3%

Staged Deployment Strategy

Anthropic plans a deliberate rollout: first introducing necessary safeguards through Claude Opus 4.6, then gradually expanding access. Security professionals affected by restrictions can apply for the "Cyber Verification Program." Mythos-class models will become broadly available only after Anthropic refines safeguards on the less-risky Opus model first.

This approach echoes Jack Clark's 2019 testimony before Congress about staged releases—but inverted. Clark, who managed GPT-2's release and later co-founded Anthropic, outlined the principle that the research community could collectively manage risks. This time, Anthropic is betting that concentrating the most dangerous capabilities within a security-vetted coalition is the responsible approach.

What This Means

For the first time, an AI company is weaponizing capability to justify restriction rather than using restriction as a precaution. Anthropic has concrete evidence that its model poses genuine cybersecurity risks at scale. The coalition structure and financial commitments suggest this isn't performative safety theater but operational necessity.

This sets a new precedent: future frontier models with autonomous exploitation capabilities may face similar restrictions. The question for the industry is whether this model of controlled access around high-risk capabilities becomes standard practice or remains a one-off exception.

Related Articles

model release

Anthropic's Unreleased Claude Mythos Preview Finds 10,000+ Vulnerabilities in One Month

Anthropic's unreleased Claude Mythos Preview model has discovered more than 10,000 vulnerabilities across partner organizations in its first month of deployment through Project Glasswing. The company reports partners are finding bugs at 10x their previous rate, with Cloudflare discovering 2,000 bugs and Mozilla finding 271 Firefox vulnerabilities — 10x more than with previous Claude models.

product update

OpenAI adds ChatGPT to Microsoft PowerPoint in public beta

OpenAI has integrated ChatGPT into Microsoft PowerPoint, allowing users to generate and edit presentation slides using natural language prompts. The feature is available in public beta to both free tier users and ChatGPT Business subscribers.

analysis

OpenAI reasoning model solves 80-year math problem as Anthropic hits $10.9B quarterly revenue

In a two-hour span Wednesday, OpenAI announced its reasoning model autonomously solved an 80-year-old geometry problem while Anthropic reported it's on track for $10.9 billion in Q2 revenue with $559 million in operating profit—two years ahead of internal projections. The developments came alongside Nvidia's $81.6 billion quarter, Anthropic's $1.25 billion monthly SpaceX compute deal, and a White House AI executive order signing.

model release

Tencent Releases Hy-MT2 Translation Models: 1.8B, 7B, and 30B-A3B Support 33 Languages

Tencent released Hy-MT2, a family of multilingual translation models available in 1.8B, 7B, and 30B-A3B (MoE) sizes. All models support translation among 33 languages and follow translation instructions in multiple languages. The 1.8B model can be compressed to 440MB using 1.25-bit AngelSlim quantization.

Comments

Loading...