OpenAI releases GPT-5.5-Cyber with 85.6% CyberGym score, surpassing restricted Anthropic model

TL;DR

OpenAI released an updated GPT-5.5-Cyber model that scores 85.6% on CyberGym, surpassing Anthropic's Mythos 5 (83.8%), the same model that triggered Trump administration export controls. The release went ahead without the political pushback that forced Anthropic to restrict foreign national access.

OpenAI released an updated version of GPT-5.5-Cyber on Monday that achieves an 85.6% score on CyberGym, a benchmark measuring AI agents' ability to reproduce known software vulnerabilities. That score exceeds the 83.8% that Anthropic's Mythos 5 achieved on the same evaluation, according to Anthropic's system card.

The release raises questions about the Trump administration's selective enforcement of AI security concerns. While Anthropic faces export controls barring foreign nationals from accessing Fable 5 and Mythos 5, OpenAI's higher-scoring cybersecurity model was deployed without apparent restrictions or political intervention.

Diverging regulatory treatment

OpenAI announced the GPT-5.5-Cyber update alongside expanded partnerships with organizations in Australia, Canada, France, Germany, Japan, Poland, South Korea, and the EU. The company did not respond to requests for comment about coordination with federal authorities.

In contrast, negotiations over access to Mythos dominated discussions at last week's G7 Summit. Anthropic remains subject to export directives that restrict international use of its models, even though Mythos 5 scores lower on CyberGym than OpenAI's newly released system.

The White House did not respond to requests for comment on the apparent inconsistency in treatment between the two companies.

CyberGym benchmark context

CyberGym measures whether AI agents can successfully reproduce known software vulnerabilities, a capability that raises both defensive and offensive security concerns. GPT-5.5-Cyber's 85.6% puts it 1.8 percentage points ahead of Mythos 5 (83.8%) on this evaluation.
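To put the two reported scores in perspective, the short sketch below computes the gap three ways: in percentage points, as a relative gain in pass rate, and as a relative reduction in failed tasks. The function name `compare_scores` and its output keys are illustrative, not part of any benchmark tooling, and the calculation assumes both scores are pass rates over the same task set. It says nothing about statistical significance, since neither company's task count nor run-to-run variance is reported here.

```python
def compare_scores(score_a: float, score_b: float) -> dict:
    """Compare two benchmark pass rates given in percent (0-100).

    Hypothetical helper for illustration; assumes both scores were
    measured on the same set of tasks.
    """
    # Absolute gap, in percentage points.
    gap_points = score_a - score_b
    # Relative gain in pass rate of A over B, in percent.
    relative_gain = gap_points / score_b * 100
    # Failure rates are the complements of the pass rates.
    fail_a = 100 - score_a
    fail_b = 100 - score_b
    # Relative reduction in failed tasks going from B to A, in percent.
    failure_reduction = (fail_b - fail_a) / fail_b * 100
    return {
        "gap_points": round(gap_points, 1),
        "relative_gain_pct": round(relative_gain, 2),
        "failure_reduction_pct": round(failure_reduction, 2),
    }


# Scores as reported in the article: GPT-5.5-Cyber vs. Mythos 5.
result = compare_scores(85.6, 83.8)
print(result)
```

On the reported figures, the 1.8-point gap works out to roughly a 2% relative gain in pass rate, or about 11% fewer failed tasks.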

OpenAI positioned the release within a broader cybersecurity initiative, announcing partnerships with security companies and researchers. The company did not disclose specific technical changes from previous GPT-5.5-Cyber versions.

Political dimensions

Reports suggest that personality conflicts between Anthropic leadership and the Trump administration, rather than purely technical security assessments, contributed to the export controls. OpenAI's ability to deploy a higher-scoring cybersecurity model without similar restrictions suggests that non-technical factors influenced the regulatory divergence.

The situation has created operational challenges for cybersecurity defenders who rely on advanced AI models, with some organizations unable to access Anthropic's restricted systems despite their defensive use cases.

What this means

The inconsistent application of AI security controls to comparable models from different companies signals either incomplete threat assessments or politically influenced regulation. Organizations building cybersecurity defenses now face uncertainty about which capabilities will remain accessible and under what conditions. The CyberGym scores indicate that the restrictions do not track demonstrated technical capability: the higher-scoring model is the unrestricted one.
