LLM News

Every LLM release, update, and milestone.

Filtered by: ai-agents
research

AI agent outperforms 9 of 10 human hackers in live penetration testing study

A new AI agent framework called ARTEMIS discovered 9 valid vulnerabilities in live penetration testing against a university network with ~8,000 hosts, outperforming 9 of 10 human cybersecurity professionals. The system achieved an 82% valid submission rate and costs $18/hour, compared with $60/hour for professional penetration testers, though it struggles with GUI-based tasks and produces a higher false-positive rate than the human testers.

product update · Anthropic

Anthropic launches Claude Code Remote Control for device automation

Anthropic has released Claude Code Remote Control, a new feature allowing users to initiate remote control sessions on their computers and manage them via Claude Code on web, iOS, and native apps. The feature is in early stages with reported stability issues including API 500 errors and permission approval requirements.

2 min read · via simonwillison.net