LLM News

Every LLM release, update, and milestone.

Filtered by: cybersecurity

research

AI agent outperforms 9 of 10 human hackers in live penetration testing study

A new AI agent framework called ARTEMIS found 9 valid vulnerabilities during live penetration testing of a university network of roughly 8,000 hosts, outperforming 9 of 10 human cybersecurity professionals. The system achieved an 82% valid-submission rate and costs $18/hour, compared with $60/hour for professional penetration testers. However, it struggles with GUI-based tasks and has a higher false-positive rate.

March 5, 2026 · 1:07 AM · 3 min read

ai-agents cybersecurity penetration-testing

via arxiv.org ↗

benchmark

AttackSeqBench measures LLM capabilities for cybersecurity threat analysis

Researchers introduced AttackSeqBench, a benchmark for evaluating how well large language models understand and reason about cyber attack sequences in threat intelligence reports. The evaluation tested 7 LLMs and 5 reasoning models across multiple tasks, revealing gaps in their ability to extract actionable security insights from unstructured cybersecurity data.

March 5, 2026 · 1:05 AM · 2 min read

benchmark cybersecurity llm-evaluation

via arxiv.org ↗

product update · Anthropic

Anthropic launches Claude Code Security tool; cybersecurity stocks fall

Anthropic has released Claude Code Security, an AI tool designed to identify code vulnerabilities that traditional security scanners miss. Cybersecurity stocks fell immediately after the announcement.

February 21, 2026 · 10:35 AM · 2 min read

anthropic claude code-security

via the-decoder.com ↗