xAI releases Grok 4.20 Multi-Agent with 2M context window and parallel agent reasoning
xAI has released Grok 4.20 Multi-Agent, a variant designed for collaborative agent-based workflows with a 2-million-token context window. The model scales from 4 agents at low/medium reasoning effort to 16 agents at high/xhigh effort levels, priced at $2 per million input tokens and $6 per million output tokens.
xAI has released Grok 4.20 Multi-Agent, a specialized variant of its Grok 4.20 model optimized for multi-agent collaboration and complex reasoning tasks. The model was released March 31, 2026, with a knowledge cutoff of September 1, 2025.
Key Specifications
Context and Reasoning: The model supports a 2-million-token context window, among the largest available. Agent parallelization scales with reasoning effort: low and medium reasoning effort deploy 4 agents operating simultaneously, while high and xhigh reasoning effort scales to 16 parallel agents.
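The effort-to-agent mapping described above can be captured in a small lookup. This is an illustration of the published spec only, not an xAI or OpenRouter API; the effort names are taken from the article.

```python
# Parallel agent counts by reasoning effort, per the published spec.
# Illustrative only -- this mapping is not exposed as an API surface.
AGENTS_BY_EFFORT = {
    "low": 4,
    "medium": 4,
    "high": 16,
    "xhigh": 16,
}

def agent_count(effort: str) -> int:
    """Number of agents Grok 4.20 Multi-Agent deploys at a given effort."""
    return AGENTS_BY_EFFORT[effort]
```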
Pricing: Input tokens cost $2 per million and output tokens cost $6 per million. Web search functionality is priced at $5 per 1,000 queries. These are the effective rates across available providers on OpenRouter as of the release date.
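A quick back-of-the-envelope estimator at the rates above makes the cost structure concrete. The rates are taken from the article; the example token counts are hypothetical.

```python
# Per-unit rates from the published pricing:
# $2 / 1M input tokens, $6 / 1M output tokens, $5 / 1,000 web searches.
INPUT_RATE = 2.00 / 1_000_000   # USD per input token
OUTPUT_RATE = 6.00 / 1_000_000  # USD per output token
SEARCH_RATE = 5.00 / 1_000      # USD per web search query

def request_cost(input_tokens: int, output_tokens: int, searches: int = 0) -> float:
    """Estimated USD cost of one request at the published rates."""
    return (input_tokens * INPUT_RATE
            + output_tokens * OUTPUT_RATE
            + searches * SEARCH_RATE)

# Hypothetical request: 500k-token context, 20k-token answer, 10 searches.
print(round(request_cost(500_000, 20_000, 10), 2))  # → 1.17
```

Even a half-full context window stays close to a dollar per request, so the headline cost driver is output and search volume, not context size.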
Architecture and Capabilities
Grok 4.20 Multi-Agent is designed for workflows requiring coordinated agent-based reasoning. According to xAI, multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. The model includes reasoning token support, allowing users to inspect internal step-by-step thinking before final responses.
The multi-agent variant differs from standard Grok 4.20 by explicitly handling collaborative workflows where agents can divide work, share context, and synthesize results. Reasoning effort settings control both computational intensity and agent count, with higher effort levels deploying significantly more agents (4x increase from low to high).
Developer Integration
The model is available through OpenRouter, which normalizes API requests across multiple providers. Developers can enable reasoning using the reasoning parameter and access reasoning_details arrays in responses. OpenRouter's documentation indicates that reasoning_details should be preserved when continuing conversations to maintain reasoning continuity across turns.
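A minimal sketch of that integration is below, assuming OpenRouter's standard chat-completions payload shape with the `reasoning` parameter and a `reasoning_details` array on assistant messages. The model slug `x-ai/grok-4.20-multi-agent` is an assumption based on OpenRouter's usual naming; check the model page for the exact identifier.

```python
# Sketch of an OpenRouter request with reasoning enabled, and of carrying
# reasoning_details forward into the next turn, as the docs recommend.
# The model slug below is an assumption, not confirmed by the article.

def build_request(messages: list, effort: str = "high") -> dict:
    """Payload for POST https://openrouter.ai/api/v1/chat/completions."""
    return {
        "model": "x-ai/grok-4.20-multi-agent",  # assumed slug
        "messages": messages,
        "reasoning": {"effort": effort},  # controls effort (and agent count)
    }

def continue_conversation(messages: list, assistant_message: dict) -> list:
    """Append the assistant turn, preserving reasoning_details so the
    model can resume its prior reasoning on the next request."""
    turn = {"role": "assistant", "content": assistant_message["content"]}
    if "reasoning_details" in assistant_message:
        turn["reasoning_details"] = assistant_message["reasoning_details"]
    return messages + [turn]

# Usage with a mock response message (no network call made here):
history = [{"role": "user", "content": "Summarize these papers."}]
payload = build_request(history, effort="high")
mock_reply = {"content": "Summary...",
              "reasoning_details": [{"type": "reasoning.text", "text": "..."}]}
history = continue_conversation(history, mock_reply)
```

Dropping `reasoning_details` between turns still works, but the model loses its earlier chain of thought, which matters most for the long multi-step tasks this variant targets.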
What This Means
Grok 4.20 Multi-Agent targets use cases requiring complex coordination—research synthesis, multi-step problem solving, and workflows that benefit from parallel reasoning paths. The 2M context window enables processing of substantial documents or conversation histories without truncation. With input priced at a third of the output rate, the pricing favors workloads that load large contexts and return comparatively compact, reasoning-heavy responses rather than simple completions.
The explicit agent parallelization architecture represents a shift toward structured multi-agent systems within a single model call, rather than requiring external orchestration. This simplifies deployment for teams building agent-based applications, but it ties the agent count to the reasoning effort setting rather than exposing it as an independent control.
Availability through OpenRouter means developers access this model without direct xAI contracts, though pricing may vary by provider. The March 2026 release positions Grok 4.20 Multi-Agent in a competitive landscape where context window and reasoning capabilities have become table stakes for frontier models.
Related Articles
xAI releases Grok 4.20 with 2M context window and native reasoning capabilities
xAI released Grok 4.20 on March 31, 2026, its flagship model featuring a 2 million token context window, $2 per million input tokens and $6 per million output tokens pricing, and toggleable reasoning capabilities. The model includes web search functionality at $5 per 1,000 queries and claims industry-leading speed with low hallucination rates.
NVIDIA releases gpt-oss-puzzle-88B, 88B-parameter reasoning model with 1.63× throughput gains
NVIDIA released gpt-oss-puzzle-88B on March 26, 2026, an 88-billion-parameter mixture-of-experts model optimized for inference efficiency on H100 hardware. Built using the Puzzle post-training neural architecture search framework, the model achieves 1.63× throughput improvement in long-context (64K/64K) scenarios and up to 2.82× improvement on single H100 GPUs compared to its parent gpt-oss-120B, while matching or exceeding accuracy across reasoning effort levels.
Google launches Veo 3.1 Lite, cutting video generation costs by half
Google announced Veo 3.1 Lite, a cost-reduced video generation model priced at less than 50% of Veo 3.1 Fast's cost. The model supports text-to-video and image-to-video generation at 720p or 1080p resolution with customizable durations of 4s, 6s, or 8s, rolling out today on the Gemini API and Google AI Studio.
Google releases Lyria 3 Clip Preview for music generation via API
Google has released Lyria 3 Clip Preview, a music generation model available through the Gemini API as of March 30, 2026. The model generates 30-second audio clips from text prompts or images at $0.04 per clip, with a 1,048,576 token context window.