model release

Alibaba Releases Qwen3.7 Max with 1M Token Context Window for Agent and Coding Tasks

TL;DR

Alibaba has released Qwen3.7 Max, the flagship model in its Qwen3.7 series, featuring a 1 million token context window. The text-only model is designed for agent-centric workloads with strengths in coding, office productivity, and long-horizon autonomous execution, and includes explicit prompt caching support.

May 21, 2026 · 3:50 PM2 min read

Qwen3.7 Max — Quick Specs

Context window1000K tokens

Compare Qwen3.7 Max with other models →

Alibaba Releases Qwen3.7 Max with 1M Token Context Window for Agent and Coding Tasks

Alibaba has released Qwen3.7 Max, the flagship model in its Qwen3.7 series, featuring a 1 million token context window. The model supports text input and output only.

Key Specifications

Context window: 1 million tokens
Released: May 21, 2025
Modalities: Text only (no multimodal support)
Prompt caching: Explicit prompt caching supported
Pricing: Not yet disclosed

Performance Focus

According to Alibaba, Qwen3.7 Max is optimized for agent-centric workloads with three primary use cases:

Coding tasks: The model claims notable gains in coding performance over previous Qwen generations
Office and productivity applications: Designed for document processing and workflow automation
Long-horizon autonomous execution: Built for multi-step agent tasks that require sustained context

The company states the model offers "notable gains in coding and agentic performance" compared to prior Qwen versions, though specific benchmark scores have not been published at launch.

Technical Features

The 1 million token context window places Qwen3.7 Max among models with extended context capabilities, comparable to recent releases from other vendors. The explicit prompt caching feature is designed to optimize performance when reusing repeated context across multiple requests, reducing latency and compute costs for agent workflows.

Parameter count and training data cutoff date have not been disclosed.

What This Means

Qwen3.7 Max represents Alibaba's continued push into the agent and coding model market with a focus on extended context. The 1M token window and prompt caching position it for complex agent workflows that require maintaining state across long interactions. However, without published benchmark scores or pricing, direct performance and cost comparisons with competing models like GPT-4, Claude 3.5 Sonnet, or DeepSeek remain unclear. The agent-first design signals Alibaba's bet on autonomous AI systems as a key use case for frontier models.

Source: openrouter.ai ↗

Qwen Alibaba model release agents coding context window prompt caching

model releaseJune 30, 2026

Anthropic releases Claude Sonnet 5 at $2/1M input tokens, 63.2% agentic coding benchmark

Anthropic has released Claude Sonnet 5, its new mid-tier model optimized for agentic tasks, priced at $2 per million input tokens through August 31 before rising to $3/1M. The model scores 63.2% on agentic coding benchmarks, approaching Opus 4.8's 69.2% performance at a significantly lower price point.

model releaseJune 29, 2026

DeepReinforce Releases Ornith-1.0, Open-Source Agentic Coding Model in 9B to 397B Sizes

DeepReinforce has released Ornith-1.0, an MIT-licensed model designed for agentic coding tasks with variants ranging from 9B to 397B parameters. Built on top of Apache 2.0-licensed Gemma 4 and Qwen 3.5 base models, the company claims it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks.

model releaseJuly 4, 2026

Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance

Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.

model releaseJuly 4, 2026

NVIDIA releases Nemotron-Labs-TwoTower-30B: block-wise diffusion model claims 2.42× faster generation at 98.7% baseline

NVIDIA released Nemotron-Labs-TwoTower-30B-A3B-Base-BF16, a block-wise diffusion language model that generates text by denoising blocks of tokens in parallel rather than sequentially. According to NVIDIA, the model achieves 2.42× the wall-clock generation throughput of its autoregressive baseline while retaining 98.7% of aggregate benchmark quality.

Alibaba Releases Qwen3.7 Max with 1M Token Context Window for Agent and Coding Tasks

Qwen3.7 Max — Quick Specs

Alibaba Releases Qwen3.7 Max with 1M Token Context Window for Agent and Coding Tasks

Key Specifications

Performance Focus

Technical Features

What This Means

Related Articles

Anthropic releases Claude Sonnet 5 at $2/1M input tokens, 63.2% agentic coding benchmark

DeepReinforce Releases Ornith-1.0, Open-Source Agentic Coding Model in 9B to 397B Sizes

Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance

NVIDIA releases Nemotron-Labs-TwoTower-30B: block-wise diffusion model claims 2.42× faster generation at 98.7% baseline

Comments