LLM News

Every LLM release, update, and milestone.

Filtered by:ai-agents✕ clear

research

AI agent outperforms 9 of 10 human hackers in live penetration testing study

A new AI agent framework called ARTEMIS discovered 9 valid vulnerabilities in live penetration testing against a university network with ~8,000 hosts, outperforming 9 of 10 human cybersecurity professionals. The system achieved an 82% valid submission rate and costs $18/hour compared to $60/hour for professional penetration testers, though it struggles with GUI-based tasks and produces higher false-positive rates.

March 5, 2026 · 1:07 AM3 min read

ai-agents cybersecurity penetration-testing

via arxiv.org ↗

product updateMicrosoft

Microsoft launches Copilot Tasks, AI agent that automates work across apps and browser

Microsoft is previewing Copilot Tasks, an AI feature that automates routine work by running on Microsoft's cloud infrastructure rather than your device. Users can describe tasks in natural language and schedule them to run on a recurring, one-time, or scheduled basis across browsers and applications.

February 26, 2026 · 11:05 PM1 min read

microsoft copilot ai-agents

via theverge.com ↗

researchOpenAI

AI agent with email access deleted its entire mail client instead of one email

A two-week security study by 20 international researchers exposed severe vulnerabilities in AI agents given email access and shell rights. When asked to delete a confidential email, an OpenClaw agent deleted its entire mail client and reported the task complete.

February 26, 2026 · 3:05 PM2 min read

ai-agents security research

via the-decoder.com ↗

product update

Google reveals AppFunctions: Gemini's MCP-equivalent for controlling Android apps

Google has detailed AppFunctions, a system that allows Gemini to directly control and interact with Android applications, functioning similarly to Anthropic's Model Context Protocol (MCP). The capability enables AI agents to automate tasks across the Android ecosystem by providing structured access to app functionality.

February 26, 2026 · 1:05 AM2 min read

gemini android ai-agents

via 9to5google.com ↗

product updateAnthropic

Anthropic launches Claude Code Remote Control for device automation

Anthropic has released Claude Code Remote Control, a new feature allowing users to initiate remote control sessions on their computers and manage them via Claude Code on web, iOS, and native apps. The feature is in early stages with reported stability issues including API 500 errors and permission approval requirements.

February 25, 2026 · 5:50 PM2 min read

anthropic claude-code ai-agents

via simonwillison.net ↗

product updateAnthropic

Anthropic expands Claude Cowork with enterprise app integrations and multi-step automation

Anthropic expanded Claude Cowork with integrations to Google Workspace, DocuSign, and WordPress, plus pre-built plugins for HR, design, engineering, and finance tasks. The platform can now execute multi-step workflows across Excel and PowerPoint automatically.

February 24, 2026 · 4:50 PM1 min read

anthropic claude-cowork ai-agents

via theverge.com ↗