model release

Alibaba Releases Qwen3.6 Max Preview: 1 Trillion Parameter MoE Model With 262K Context Window

TL;DR

Alibaba Cloud has released Qwen3.6 Max Preview, a proprietary frontier model built on sparse mixture-of-experts architecture with approximately 1 trillion total parameters. The model supports a 262,144-token context window and features integrated thinking mode for multi-turn reasoning, priced at $1.30 per million input tokens and $7.80 per million output tokens.

2 min read
0

Qwen3.6 Max Preview — Quick Specs

Context window262K tokens
Input$1.3/1M tokens
Output$7.8/1M tokens

Alibaba Releases Qwen3.6 Max Preview: 1 Trillion Parameter MoE Model With 262K Context Window

Alibaba Cloud has released Qwen3.6 Max Preview, a proprietary frontier model with approximately 1 trillion total parameters using a sparse mixture-of-experts (MoE) architecture. The model supports a 262,144-token context window and is priced at $1.30 per million input tokens and $7.80 per million output tokens.

Technical Specifications

According to Alibaba, Qwen3.6 Max Preview is optimized for agentic coding, tool use, and long-context reasoning. The model includes an integrated thinking mode that preserves reasoning traces across multi-turn conversations, similar to approaches seen in other reasoning-capable models.

The model supports structured output and function calling capabilities. The sparse MoE architecture activates only a subset of the 1 trillion total parameters for each inference, which typically provides efficiency advantages over dense models of equivalent size.

Availability and Access

Qwen3.6 Max Preview is available exclusively through the Alibaba Cloud Model Studio and Qwen Studio APIs. No open weights are provided. The model is also accessible through OpenRouter, which routes requests across multiple providers.

The model was released on April 27, 2025, according to the listing on OpenRouter.

Pricing Structure

  • Input tokens: $1.30 per million tokens
  • Output tokens: $7.80 per million tokens

The 6:1 ratio between output and input pricing is consistent with cost structures for compute-intensive inference in large language models.

Reasoning Capabilities

The integrated thinking mode allows the model to show step-by-step reasoning processes. According to OpenRouter's documentation, developers can access reasoning traces through a reasoning_details array in API responses. The model can preserve these reasoning traces when passed back in subsequent conversation turns, enabling continuous reasoning across multi-turn interactions.

What This Means

Qwen3.6 Max Preview represents Alibaba's entry into the trillion-parameter model tier, joining other frontier models in supporting extended context windows beyond 200K tokens. The sparse MoE architecture suggests a focus on inference efficiency, though Alibaba has not disclosed the number of active parameters per forward pass. The proprietary, API-only release contrasts with Alibaba's previous pattern of releasing open-weight Qwen models, indicating a strategic shift toward commercial model offerings. The emphasis on agentic coding and tool use positions this model for enterprise workflows requiring autonomous task execution.

Related Articles

model release

Nex AGI Releases Nex-N2-Pro: 17B Active Parameter MoE Model with 262K Context Window

Nex AGI has released Nex-N2-Pro, a mixture-of-experts model with 17 billion active parameters from a total of 397 billion parameters. Built on the Qwen3.5 architecture, the model offers a 262,144 token context window and is available for free through OpenRouter.

model release

Nex AGI Releases Nex-N2-Pro: 397B Parameter MoE Model With 262K Context, Available Free

Nex AGI has released Nex-N2-Pro, an agentic mixture-of-experts model with 397B total parameters and 17B active parameters. The model features a 262K token context window and is available free via OpenRouter's API.

model release

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.

model release

Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows

Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts model with 55 billion active parameters and support for up to 1 million token context windows. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows including agent orchestration, coding agents, and complex enterprise tasks.

Comments

Loading...