Anthropic's Mythos Preview solves previously unsolvable cybersecurity test in updated checkpoint
A month after its initial release, a newer checkpoint of Anthropic's Mythos Preview became the first model to complete the UK AI Safety Institute's 'Cooling Tower' cyber range, solving it in 3 of 10 attempts. The model also completed 'The Last Ones' range in 6 of 10 attempts, surpassing OpenAI's GPT-5.5 and demonstrating capability improvements within a single model version.
Anthropic's Mythos Preview solves previously unsolvable cybersecurity test in updated checkpoint
A newer checkpoint of Anthropic's Mythos Preview has become the first AI model to complete the UK AI Safety Institute's (AISI) second cyber range test, according to a report published Wednesday. The model solved the previously unsolved 'Cooling Tower' range in 3 of 10 attempts and completed 'The Last Ones' range in 6 of 10 attempts — just one month after Mythos' initial release.
The results demonstrate that capability improvements are occurring between checkpoints of a single model, not just across major releases. The newer Mythos Preview checkpoint outperformed both its earlier version and OpenAI's GPT-5.5 on AISI's cybersecurity benchmarks.
Accelerating cyber capabilities
AISI's testing revealed a rapid acceleration in AI models' ability to handle cybersecurity tasks. According to the institute's internal estimates from February 2026, the length of cyber tasks AI models could complete had doubled every 4.7 months since late 2024 — faster than their November 2025 estimate of 8 months.
"Since then, AISI reported on two new models, Claude Mythos Preview and GPT-5.5, which substantially exceeded both doubling rate trends," the researchers wrote. Whether this acceleration represents a lasting trend or temporary spikes from outlier models remains unclear.
Testing limitations understate capabilities
AISI researchers acknowledged their tests significantly understate what frontier models can actually do. The cyber range experiments capped tasks at 2.5 million tokens to enable consistent performance comparisons over time.
"Mythos Preview and GPT-5.5 have large upper-bound error bars due to near-100% success rates on our narrow cyber suite's longest tasks, even with the 2.5M token limit," the report stated. Without the token cap, success rates would be substantially higher — so high that "time horizons become impossible to calculate."
In separate testing using up to 100 million tokens, AISI found performance would likely continue improving beyond that budget, especially for recent models which "disproportionately benefit from higher token limits." The tests also cannot determine how sharply model reliability deteriorates at higher task lengths, placing the latest models "at the limit of what our narrow test suite can measure."
Project Glasswing context
Anthropic released Mythos Preview in April 2026 as part of Project Glasswing, a cybersecurity testing alliance formed with rival tech companies and AI labs including Apple, Google, Microsoft, and OpenAI. The company maintains that Mythos is too powerful for general release and has granted only limited access to the model.
AISI's initial evaluation last month found Mythos "represents a step up over previous frontier models in a landscape where cyber performance was already rapidly improving." The updated testing confirms those capabilities have advanced further in the model's newer checkpoint.
What this means
The faster-than-expected evolution of Mythos Preview's cybersecurity capabilities — particularly its ability to detect software vulnerabilities — presents serious implications for the security landscape. The fact that improvements are happening between checkpoints, not just major releases, suggests AI capability gains may be harder to predict and control than previously thought. The testing limitations also mean we're likely underestimating how capable these models already are at extended, complex tasks when given sufficient token budgets and agent infrastructure.
Related Articles
Trump Administration Permits Anthropic's Claude Mythos 5 for 100+ US Organizations After Two-Week Ban
The Trump administration is allowing Anthropic to deploy Claude Mythos 5 to over 100 specific US government agencies and companies, two weeks after banning the cybersecurity model. Commerce Secretary Howard Lutnick approved access for organizations operating critical infrastructure, including non-American employees, though Fable 5 remains unavailable.
U.S. government orders Anthropic to halt exports of Mythos and Fable AI models, both now offline for one week
The White House ordered Anthropic to restrict exports of its Mythos and Fable AI models last Friday, citing national security concerns. Anthropic pulled both models offline within 90 minutes of the Commerce Department directive, marking the first major test of AI export controls.
US export controls force Anthropic to take Claude Fable 5 offline indefinitely
The US government imposed export controls on Anthropic's newly released Claude Fable 5 and underlying Mythos models on Friday, restricting access even for foreign nationals working at Anthropic in the United States. Anthropic took both models completely offline rather than risk non-compliance, leaving Fable unavailable to all users as of this writing.
US government authorizes Anthropic to restore Mythos 5 cybersecurity model to 100+ institutions
The US government has authorized Anthropic to redeploy its Mythos 5 cybersecurity AI model to more than 100 US institutions, including major corporations and government agencies, following a two-week suspension. Commerce Secretary Howard Lutnick approved the redeployment after Anthropic implemented safeguards and committed to work with the government on release protocols.
Comments
Loading...