agents

13 articles tagged with agents

May 21, 2026
model release

Alibaba Releases Qwen3.7 Max with 1M Token Context Window for Agent and Coding Tasks

Alibaba has released Qwen3.7 Max, the flagship model in its Qwen3.7 series, featuring a 1 million token context window. The text-only model is designed for agent-centric workloads with strengths in coding, office productivity, and long-horizon autonomous execution, and includes explicit prompt caching support.

May 19, 2026
model release

Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June

Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.

model release

Google releases Gemini 3.5 Flash with autonomous coding and agent capabilities, claims 4x speed boost

Google released Gemini 3.5 Flash, positioning it as an agent-first model designed for autonomous coding and multi-hour workflows. The company claims the model outperforms its 3.1 Pro predecessor on coding and agentic benchmarks while running 4x faster than competing frontier models, with an optimized version achieving 12x speed gains.

May 18, 2026
benchmark

IBM Research launches Open Agent Leaderboard, showing same models achieve different results based on agent architecture

IBM Research has launched the Open Agent Leaderboard, the first open benchmark that evaluates complete AI agent systems rather than just underlying models. The leaderboard reveals that agents using identical models can achieve significantly different success rates and costs depending on system architecture, with failed runs costing 20-54% more than successful ones.

May 7, 2026
product updateAnthropic

Anthropic adds dreaming, outcomes, and multiagent orchestration to Claude Managed Agents

Anthropic has released three new capabilities for Claude Managed Agents: dreaming (research preview) for pattern recognition and self-improvement, outcomes for defining success criteria with automated evaluation, and multiagent orchestration for delegating tasks to specialist agents.

May 4, 2026
product updateAmazon Web Services

AWS launches agent-guided workflows in SageMaker AI to automate model fine-tuning

Amazon Web Services has released agent-guided workflows in SageMaker AI that use AI coding agents to automate model customization. The feature includes nine pre-built skills covering use case definition, data preparation, fine-tuning technique selection (SFT, DPO, RLVR), evaluation, and deployment to Amazon Bedrock or SageMaker endpoints.

April 23, 2026
model releaseTencent

Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters

Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.

April 16, 2026
product updateOpenAI

OpenAI updates Codex with background desktop control, matching Anthropic's Claude capabilities

OpenAI announced major updates to Codex, its automated coding tool, adding background desktop control that lets it operate apps and click through interfaces while users continue working. The update includes 111 plugin integrations and matches capabilities Anthropic released for Claude Code last month.

model releaseAnthropic

Anthropic releases Claude Opus 4.7 with 1M context window for long-running agent tasks

Anthropic has released Claude Opus 4.7, the latest version of its flagship Opus family designed for long-running, asynchronous agent tasks. The model features a 1 million token context window and costs $5 per million input tokens and $25 per million output tokens.

April 15, 2026
product updateOpenAI

OpenAI Adds Sandboxing and In-Distribution Harness to Agents SDK for Enterprise Deployment

OpenAI has updated its Agents SDK with sandboxing capabilities that allow AI agents to operate in controlled environments, plus an in-distribution harness for frontier model deployment. The features launch initially in Python, with TypeScript support planned.

April 9, 2026
product updateAmazon Web Services

Amazon Bedrock AgentCore now supports stateful MCP with user input, LLM sampling, and progress streaming

Amazon has introduced stateful MCP client capabilities on Bedrock AgentCore Runtime, enabling agents to pause mid-execution for user input, request LLM-generated content, and stream real-time progress updates. The update transforms one-way tool execution into bidirectional conversations between MCP servers and clients, supporting interactive workflows previously impossible with stateless implementations.

March 20, 2026
product updateOpenAI

OpenAI consolidating ChatGPT, browser, and Codex into single desktop app

OpenAI is developing a unified desktop application that combines ChatGPT, its browser, and Codex code generator into a single product. The move is intended to streamline user experience and focus resources on one integrated platform, with emphasis on agentic AI capabilities that can autonomously handle tasks like software development and data analysis.

March 16, 2026
product updateOpenAI

OpenAI Codex launches subagents and custom agent support in general availability

OpenAI Codex subagents reached general availability after weeks of preview, enabling developers to define custom agents as TOML files and parallelize task execution. The feature mirrors Claude Code's implementation with default subagents for exploration, worker, and default operations.