AI offensive cyber capabilities doubling every 5.7 months since 2024, study finds

TL;DR

AI offensive cybersecurity capabilities are accelerating faster than previously measured. Lyptus Research's new study finds the doubling time has compressed from 9.8 months (since 2019) to 5.7 months (since 2024), with GPT-5.3 Codex and Opus 4.6 now solving, at a 50% success rate, tasks that would take human security experts about three hours.

AI safety research firm Lyptus Research has published findings showing that growth in AI offensive cybersecurity capabilities is accelerating. The study, which applies the METR time-horizon method and draws on ten professional security experts, tracks capability progression from GPT-2 in 2019 through current-generation models in 2026.

Key Findings

The research measures what it terms the "time horizon": the difficulty of tasks, expressed as the time a human expert would need, that an AI model can solve at a given success rate within a fixed token budget. Since 2019, AI offensive cyber capability has doubled every 9.8 months. Since 2024, however, the doubling time has shortened to 5.7 months.
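To put the two doubling times on a common scale, the short sketch below converts each into a yearly growth multiple. It assumes clean exponential growth, which is a simplification of a fitted trend, and the function name is illustrative rather than anything from the study.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Multiplicative growth over 12 months, assuming steady exponential growth."""
    return 2.0 ** (12.0 / doubling_months)


# Doubling times reported in the article.
long_run = annual_growth_factor(9.8)  # trend since 2019
recent = annual_growth_factor(5.7)    # trend since 2024

print(f"9.8-month doubling: about {long_run:.1f}x per year")
print(f"5.7-month doubling: about {recent:.1f}x per year")
```

Under that assumption, the long-run trend works out to roughly 2.3x per year and the post-2024 trend to roughly 4.3x per year.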

Given a two-million-token budget, GPT-5.3 Codex and Opus 4.6 now achieve a 50% success rate on tasks that would take human security experts approximately three hours. This is a substantial jump from GPT-2's 30-second time horizon in 2019.
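As a rough consistency check, the endpoints quoted above (30 seconds and about three hours) can be turned into a number of doublings and compared with the 9.8-month long-run figure. This is a two-point back-of-envelope calculation, not the study's fitting procedure, and the helper name is hypothetical.

```python
import math


def doublings_between(start_seconds: float, end_seconds: float) -> float:
    """Number of doublings needed to grow from start to end."""
    return math.log2(end_seconds / start_seconds)


gpt2_horizon_s = 30            # GPT-2 time horizon reported for 2019
current_horizon_s = 3 * 3600   # roughly three hours for current models

n_doublings = doublings_between(gpt2_horizon_s, current_horizon_s)
implied_months = n_doublings * 9.8  # elapsed time implied by the long-run doubling time

print(f"doublings: {n_doublings:.2f}")
print(f"implied elapsed time: {implied_months:.0f} months ({implied_months / 12:.1f} years)")
```

The result is about 8.5 doublings, or roughly 83 months at 9.8 months per doubling, which is in line with the 2019 to 2026 span the study covers.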

Token budget significantly affects performance. When given ten million tokens instead of two million, GPT-5.3 Codex extends its time horizon from 3.1 hours to 10.5 hours, a more than threefold increase. The researchers note that this suggests they may be underestimating the actual rate of progress.
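The arithmetic behind the budget comparison can be sketched as follows. The power-law exponent is an assumption introduced here for illustration: it is inferred from only the two reported points and is not a figure from the study.

```python
import math

# Figures reported in the article for GPT-5.3 Codex.
budget_small_tokens = 2_000_000
budget_large_tokens = 10_000_000
horizon_small_h = 3.1
horizon_large_h = 10.5

budget_ratio = budget_large_tokens / budget_small_tokens   # 5x more tokens
horizon_ratio = horizon_large_h / horizon_small_h          # about 3.4x longer horizon

# Assumption: horizon scales as budget ** k between these two points.
# Two data points cannot confirm that form; k is only a summary number.
scaling_exponent = math.log(horizon_ratio) / math.log(budget_ratio)

print(f"budget ratio:  {budget_ratio:.1f}x")
print(f"horizon ratio: {horizon_ratio:.2f}x")
print(f"implied exponent k: {scaling_exponent:.2f}")
```

An exponent below 1 would mean the time horizon grows more slowly than the token budget, although still substantially over this range.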

Model Performance Gap

Open-source models currently trail closed-source counterparts by approximately 5.7 months in offensive cyber capability. The study evaluated 291 distinct tasks across the assessment period.

What This Means

The acceleration in AI offensive cybersecurity capabilities has immediate policy implications. The shift from a 9.8-month doubling time to a 5.7-month doubling time indicates the capability trajectory is steepening, not flattening. If the current rate holds, AI systems would reach capability parity with elite human security professionals significantly sooner than previously projected.
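To illustrate how much the choice of doubling time changes a projection, the sketch below extrapolates from the reported three-hour horizon to a 40-hour horizon. The 40-hour target is a hypothetical stand-in chosen here (about one working week); the article does not define what "parity" means, and a constant doubling time is an assumption, not a forecast.

```python
import math


def months_to_reach(current_hours: float, target_hours: float, doubling_months: float) -> float:
    """Months until the horizon reaches target, assuming a constant doubling time."""
    return doubling_months * math.log2(target_hours / current_hours)


current_horizon_h = 3.0    # reported horizon at a two-million-token budget
target_horizon_h = 40.0    # hypothetical target, not taken from the study

months_recent_trend = months_to_reach(current_horizon_h, target_horizon_h, 5.7)
months_long_run_trend = months_to_reach(current_horizon_h, target_horizon_h, 9.8)

print(f"at 5.7-month doubling: {months_recent_trend:.0f} months")
print(f"at 9.8-month doubling: {months_long_run_trend:.0f} months")
```

With these inputs the faster trend reaches the target in about 21 months and the slower one in about 37, a gap of more than a year from the same starting point.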

The token-budget sensitivity revealed in the research suggests real-world deployment constraints—such as inference time limits—may be the primary practical brake on these capabilities rather than fundamental model limitations. This distinction matters for both defensive strategy and governance decisions.

The public availability of methodology and task data on GitHub and Hugging Face enables independent verification and follow-up research, though the specific identities and defensive details of tested tasks remain appropriately restricted.

The open-source lag of 5.7 months provides a narrow window before advanced offensive cyber capabilities become widely accessible through open models. Whether this gap widens or closes will depend on whether open-source development accelerates or open-source models begin training on more cybersecurity-relevant data.
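A lag measured in months can be translated into a capability gap once a doubling time is chosen. The sketch below does that conversion; the assumption that open and closed models follow the same exponential trend, offset only in time, is made here for illustration and is not stated in the study.

```python
def capability_ratio(lag_months: float, doubling_months: float) -> float:
    """Frontier-to-lagging horizon ratio, assuming both follow the same exponential trend."""
    return 2.0 ** (lag_months / doubling_months)


lag = 5.7  # open-source lag reported in the article, in months

ratio_recent = capability_ratio(lag, 5.7)    # using the post-2024 doubling time
ratio_long_run = capability_ratio(lag, 9.8)  # using the since-2019 doubling time

print(f"gap at 5.7-month doubling: {ratio_recent:.1f}x")
print(f"gap at 9.8-month doubling: {ratio_long_run:.1f}x")
```

Because the reported lag equals the recent doubling time, the first case comes out to exactly one doubling: under this assumption, open models would sit at about half the closed-model time horizon.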
