Anthropic's Mythos Preview solves previously unsolvable cybersecurity test in updated checkpoint
A month after its initial release, a newer checkpoint of Anthropic's Mythos Preview became the first model to complete the UK AI Safety Institute's 'Cooling Tower' cyber range, solving it in 3 of 10 attempts. The model also completed 'The Last Ones' range in 6 of 10 attempts, surpassing OpenAI's GPT-5.5 and demonstrating capability improvements within a single model version.
A newer checkpoint of Anthropic's Mythos Preview has become the first AI model to complete the UK AI Safety Institute's (AISI) second cyber range test, according to a report published Wednesday. The model solved the previously unsolved 'Cooling Tower' range in 3 of 10 attempts and completed 'The Last Ones' range in 6 of 10 attempts — just one month after Mythos' initial release.
The results demonstrate that capability improvements are occurring between checkpoints of a single model, not just across major releases. The newer Mythos Preview checkpoint outperformed both its earlier version and OpenAI's GPT-5.5 on AISI's cybersecurity benchmarks.
Accelerating cyber capabilities
AISI's testing revealed a rapid acceleration in AI models' ability to handle cybersecurity tasks. According to the institute's internal estimates from February 2026, the length of cyber tasks AI models could complete had doubled every 4.7 months since late 2024 — faster than their November 2025 estimate of 8 months.
"Since then, AISI reported on two new models, Claude Mythos Preview and GPT-5.5, which substantially exceeded both doubling rate trends," the researchers wrote. Whether this acceleration represents a lasting trend or temporary spikes from outlier models remains unclear.
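The practical gap between the two doubling-rate estimates compounds quickly. The sketch below illustrates the arithmetic with an assumed starting horizon of 10 minutes in late 2024 and 15 months elapsed; those two figures are illustrative assumptions, not numbers from the AISI report.

```python
# Illustration of how the two reported doubling periods diverge over time.
# The 10-minute starting horizon and 15-month window are assumptions for
# the example; only the 8-month and 4.7-month doubling periods come from
# the reported estimates.

def horizon(start_minutes: float, months_elapsed: float, doubling_months: float) -> float:
    """Task-length horizon after exponential growth with a fixed doubling period."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)

slow = horizon(10, 15, 8.0)   # November 2025 estimate: doubling every 8 months
fast = horizon(10, 15, 4.7)   # February 2026 estimate: doubling every 4.7 months

print(f"8-month doubling:   {slow:.0f} min")
print(f"4.7-month doubling: {fast:.0f} min")
```

Under these assumptions the faster doubling period yields a task horizon roughly 2.5 times longer over the same 15 months, which is why a shift from an 8-month to a 4.7-month estimate is a significant revision rather than a rounding change.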
Testing limitations understate capabilities
AISI researchers acknowledged their tests significantly understate what frontier models can actually do. The cyber range experiments capped tasks at 2.5 million tokens to enable consistent performance comparisons over time.
"Mythos Preview and GPT-5.5 have large upper-bound error bars due to near-100% success rates on our narrow cyber suite's longest tasks, even with the 2.5M token limit," the report stated. Without the token cap, success rates would be substantially higher — so high that "time horizons become impossible to calculate."
In separate testing using up to 100 million tokens, AISI found performance would likely continue improving beyond that budget, especially for recent models which "disproportionately benefit from higher token limits." The tests also cannot determine how sharply model reliability deteriorates at higher task lengths, placing the latest models "at the limit of what our narrow test suite can measure."
Project Glasswing context
Anthropic released Mythos Preview in April 2026 as part of Project Glasswing, a cybersecurity testing alliance formed with rival tech companies and AI labs including Apple, Google, Microsoft, and OpenAI. The company maintains that Mythos is too powerful for general release and has granted only limited access to the model.
AISI's initial evaluation last month found Mythos "represents a step up over previous frontier models in a landscape where cyber performance was already rapidly improving." The updated testing confirms those capabilities have advanced further in the model's newer checkpoint.
What this means
The faster-than-expected evolution of Mythos Preview's cybersecurity capabilities — particularly its ability to detect software vulnerabilities — presents serious implications for the security landscape. The fact that improvements are happening between checkpoints, not just major releases, suggests AI capability gains may be harder to predict and control than previously thought. The testing limitations also mean we're likely underestimating how capable these models already are at extended, complex tasks when given sufficient token budgets and agent infrastructure.