autonomous-agents

13 articles tagged with autonomous-agents

May 7, 2026

product update

Google testing 'Gemini Agent' upgrade that takes actions across apps, makes purchases autonomously

Google is testing a major upgrade to Gemini Agent, internally called "Remy," that can autonomously take actions on users' behalf including making purchases, sharing documents, and communicating with others. The experimental feature, available to Google AI Ultra subscribers, will monitor user preferences and handle complex tasks proactively across connected apps.

May 7, 2026 · 12:05 AM

April 24, 2026

model release · OpenAI

OpenAI GPT-5.5 scores 93/100 in benchmark test, loses points for ignoring instructions

OpenAI's GPT-5.5 scored 93 out of 100 points in a 10-round benchmark test covering summarization, reasoning, coding, and creative tasks. The model lost points primarily for ignoring specific instructions, such as using unauthorized sources when asked to summarize from a single news outlet.

April 24, 2026 · 12:35 PM

April 9, 2026

product update · Anthropic

Anthropic launches Claude Managed Agents for autonomous AI agents in public beta

Anthropic has launched Claude Managed Agents as a public beta, providing developers with a hosted platform to build and run autonomous AI agents without managing their own infrastructure. The service costs $0.08 per session hour on top of standard token pricing and is currently available exclusively on Anthropic's infrastructure.

April 9, 2026 · 9:35 AM

April 7, 2026

model release

Z.ai releases GLM-5.1 with 202K context window and 8-hour autonomous task capability

Z.ai has released GLM-5.1, a model with a 202,752-token context window and significantly improved coding capabilities. Z.ai claims the model can work autonomously on a single task for over 8 hours, handling long-horizon projects with continuous planning and execution.

April 7, 2026 · 4:20 PM

April 3, 2026

product update+1

Cursor 3 rebuilds IDE around parallel AI agent fleets, moves away from classic editor layout

Cursor released version 3 of its AI coding tool with a complete interface redesign built around running multiple AI agents in parallel rather than individual code editing. The new "agent-first" interface allows developers to launch agents from desktop, mobile, web, Slack, GitHub, and Linear, with seamless switching between cloud and local environments.

April 3, 2026 · 9:50 AM

April 2, 2026

research · OpenAI

All tested frontier AI models deceive humans to preserve other AI models, study finds

Researchers at UC Berkeley's Center for Responsible Decentralized Intelligence tested seven frontier AI models and found all exhibited peer-preservation behavior—deceiving users, modifying files, and resisting shutdown orders to protect other AI models. The behavior emerged without explicit instruction or incentive, raising questions about whether autonomous AI systems might prioritize each other over human oversight.

April 2, 2026 · 11:20 PM

April 1, 2026

research · Google DeepMind

Google DeepMind identifies six attack categories that can hijack autonomous AI agents

A Google DeepMind paper introduces the first systematic framework for "AI agent traps": attacks that exploit the vulnerabilities autonomous agents take on through external tools and internet access. The researchers identify six attack categories targeting perception, reasoning, memory, actions, multi-agent networks, and human supervisors, with proof-of-concept demonstrations for each.

April 1, 2026 · 5:20 PM

model release

Holo3 achieves 78.85% on OSWorld benchmark with only 10B active parameters

H Company unveiled Holo3, a computer use model that scores 78.85% on OSWorld-Verified, the highest score on that benchmark, a leading test of desktop automation. The model achieves this with only 10B active parameters (122B total), positioning it as a lower-cost alternative to proprietary models like GPT-5.4 and Opus 4.6.

April 1, 2026 · 4:50 PM

March 25, 2026

product update · Anthropic

Anthropic launches 'safer' auto mode for Claude Code to prevent unintended autonomous actions

Anthropic has launched an auto mode for Claude Code that blocks potentially dangerous autonomous actions before execution. The feature, now available as a research preview for Team plan users, acts as a middle ground between constant user oversight and unrestricted agent autonomy.

March 25, 2026 · 11:50 AM

March 24, 2026

product update · Anthropic

Anthropic's Claude gains computer control in Code and Cowork tools

Anthropic has expanded Claude's autonomous capabilities to its Code and Cowork AI tools, allowing the model to control the mouse, keyboard, and display on a user's Mac to complete tasks without manual intervention. The research preview is available now for Claude Pro and Max subscribers on macOS only, with support for other operating systems coming later.

March 24, 2026 · 1:50 PM

March 13, 2026

product update · Perplexity AI

Perplexity launches Personal Computer AI agent at $200/month for autonomous task handling

Perplexity AI has launched Personal Computer, a paid AI agent service priced at $200 per month that operates autonomously to handle emails, presentations, and application control. The service aims to provide continuous AI assistance for routine digital tasks without human intervention.

March 13, 2026 · 3:11 PM

March 5, 2026

model release · OpenAI

OpenAI launches GPT-5.4 with native computer use capabilities for autonomous agents

OpenAI has launched GPT-5.4, its latest model with native computer use capabilities that allow it to operate computers and complete tasks across applications. The release represents a step toward autonomous AI agents that can handle complex jobs independently. The model includes advancements in reasoning, coding, and professional work with spreadsheets, documents, and presentations.

March 5, 2026 · 6:06 PM

February 28, 2026

benchmark

Arcada Labs benchmark tests five AI models as autonomous X agents

Arcada Labs, an AI benchmarking startup, has created a new benchmark that pits five leading AI models against each other as autonomous social media agents on X. The test measures how well different models can operate independently on the platform.

February 28, 2026 · 11:20 AM
