agents
13 articles tagged with agents
Alibaba Releases Qwen3.7 Max with 1M Token Context Window for Agent and Coding Tasks
Alibaba has released Qwen3.7 Max, the flagship model in its Qwen3.7 series, featuring a 1 million token context window. The text-only model is designed for agent-centric workloads with strengths in coding, office productivity, and long-horizon autonomous execution, and includes explicit prompt caching support.
Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June
Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.
Google releases Gemini 3.5 Flash with autonomous coding and agent capabilities, claims 4x speed boost
Google released Gemini 3.5 Flash, positioning it as an agent-first model designed for autonomous coding and multi-hour workflows. The company claims the model outperforms its 3.1 Pro predecessor on coding and agentic benchmarks while running 4x faster than competing frontier models, with an optimized version achieving 12x speed gains.
IBM Research launches Open Agent Leaderboard, showing same models achieve different results based on agent architecture
IBM Research has launched the Open Agent Leaderboard, the first open benchmark that evaluates complete AI agent systems rather than just underlying models. The leaderboard reveals that agents using identical models can achieve significantly different success rates and costs depending on system architecture, with failed runs costing 20-54% more than successful ones.
Anthropic adds dreaming, outcomes, and multiagent orchestration to Claude Managed Agents
Anthropic has released three new capabilities for Claude Managed Agents: dreaming (research preview) for pattern recognition and self-improvement, outcomes for defining success criteria with automated evaluation, and multiagent orchestration for delegating tasks to specialist agents.
AWS launches agent-guided workflows in SageMaker AI to automate model fine-tuning
Amazon Web Services has released agent-guided workflows in SageMaker AI that use AI coding agents to automate model customization. The feature includes nine pre-built skills covering use case definition, data preparation, fine-tuning technique selection (SFT, DPO, RLVR), evaluation, and deployment to Amazon Bedrock or SageMaker endpoints.
Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.
OpenAI updates Codex with background desktop control, matching Anthropic's Claude capabilities
OpenAI announced major updates to Codex, its automated coding tool, adding background desktop control that lets it operate apps and click through interfaces while users continue working. The update includes 111 plugin integrations and matches capabilities Anthropic released for Claude Code last month.
Anthropic releases Claude Opus 4.7 with 1M context window for long-running agent tasks
Anthropic has released Claude Opus 4.7, the latest version of its flagship Opus family designed for long-running, asynchronous agent tasks. The model features a 1 million token context window and costs $5 per million input tokens and $25 per million output tokens.
OpenAI Adds Sandboxing and In-Distribution Harness to Agents SDK for Enterprise Deployment
OpenAI has updated its Agents SDK with sandboxing capabilities that allow AI agents to operate in controlled environments, plus an in-distribution harness for frontier model deployment. The features launch initially in Python, with TypeScript support planned.
Amazon Bedrock AgentCore now supports stateful MCP with user input, LLM sampling, and progress streaming
Amazon has introduced stateful MCP client capabilities on Bedrock AgentCore Runtime, enabling agents to pause mid-execution for user input, request LLM-generated content, and stream real-time progress updates. The update turns one-way tool execution into a bidirectional conversation between MCP servers and clients, supporting interactive workflows that were previously impossible with stateless implementations.
OpenAI consolidating ChatGPT, browser, and Codex into single desktop app
OpenAI is developing a unified desktop application that combines ChatGPT, its browser, and the Codex code generator into a single product. The move is intended to streamline the user experience and focus resources on one integrated platform, with emphasis on agentic AI capabilities that can autonomously handle tasks like software development and data analysis.
OpenAI Codex launches subagents and custom agent support in general availability
OpenAI Codex subagents reached general availability after weeks of preview, enabling developers to define custom agents as TOML files and parallelize task execution. The feature mirrors Claude Code's implementation and includes built-in subagents for exploration, worker, and default roles.