Alibaba Qwen Releases 35B Sparse MoE Model with 262K Context and Multimodal Support
Alibaba Cloud has released Qwen3.6-35B-A3B, an open-weight sparse mixture-of-experts model with 35 billion total parameters but only 3 billion active parameters per token. The model features a 262K native context window (expandable to 1M tokens), multimodal input support, and integrated reasoning mode with preserved thinking traces.
Architecture and Specifications
The model uses a hybrid sparse MoE architecture that combines Gated DeltaNet linear attention with standard gated attention layers, according to Alibaba. This design cuts per-token compute by activating only 3 billion parameters while retaining the knowledge capacity of the full 35-billion-parameter model.
Qwen3.6-35B-A3B supports a native context window of 262,144 tokens, extensible to 1 million tokens using YaRN (Yet another RoPE extensioN method). The model accepts text, image, and video inputs, making it a multimodal system.
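Context extension of this kind is typically switched on through a RoPE-scaling entry in the model's config.json. A hedged sketch of what the YaRN block might look like follows; the field names match the convention earlier Qwen releases have documented for transformers, but the exact values for this model are an assumption (factor 4.0 = 1,048,576 / 262,144):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

Because YaRN scaling can degrade quality on short inputs, it is usually left disabled unless prompts actually exceed the native window.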
Key Capabilities
The model includes:
- Reasoning mode: Integrated thinking capability with reasoning traces preserved across multi-turn conversations
- Function calling: Native support for tool use and function execution
- Structured output: Ability to generate formatted responses
- Multimodal processing: Handles text, images, and video inputs
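Function calling is exposed through the standard OpenAI-compatible tools interface that OpenRouter serves. The sketch below shows the shape of such a request body; the model slug qwen/qwen3.6-35b-a3b and the example weather tool are illustrative assumptions, not taken from the announcement:

```python
import json

# Request body as it would be POSTed to OpenRouter's OpenAI-compatible
# chat completions endpoint. Model slug and tool are assumptions.
payload = {
    "model": "qwen/qwen3.6-35b-a3b",
    "messages": [
        {"role": "user", "content": "What's the weather in Hangzhou?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# The model replies either with text or with a tool_calls entry naming
# get_weather and its JSON arguments, which the caller then executes.
print(json.dumps(payload, indent=2))
```

If the model decides a tool is needed, the response carries a tool call rather than prose, and the caller feeds the tool's result back as a follow-up message.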
Pricing and Availability
The model is available through OpenRouter at $0.1612 per million input tokens and $0.9653 per million output tokens. Alibaba has released it under the Apache 2.0 license, making the model weights freely available for commercial and research use.
The sparse MoE architecture positions Qwen3.6-35B-A3B as a cost-efficient alternative to dense models, since only about 8.6% of its parameters (3B of 35B) are active during inference.
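The pricing and sparsity figures above reduce to simple arithmetic; the 200K-token workload below is a made-up example, not a quoted benchmark:

```python
# Per-token prices from the listed OpenRouter rates.
input_price = 0.1612 / 1e6   # USD per input token
output_price = 0.9653 / 1e6  # USD per output token

# Example workload: summarize a 200K-token document into 2K tokens.
cost = 200_000 * input_price + 2_000 * output_price
print(f"${cost:.4f}")  # → $0.0342 for the whole pass

# Fraction of parameters active per token in the sparse MoE.
active_fraction = 3 / 35
print(f"{active_fraction:.1%} active")  # → 8.6% active
```

Filling most of the 262K native window in a single request thus costs only a few cents of input, which is where the low active-parameter count pays off.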
What This Means
The sparse MoE approach with only 3B active parameters per token makes this 35B model competitive on inference cost with much smaller dense models while potentially retaining more knowledge capacity. The 262K native context window and multimodal capabilities make it suitable for document analysis and video understanding tasks. However, benchmark scores are not yet publicly available, making it difficult to assess performance relative to other models in its class. The Apache 2.0 license and availability through OpenRouter lower the barrier to adoption for developers seeking open-weight alternatives to proprietary models.
Related Articles
Alibaba Qwen Releases Qwen3.6 Flash with 1M Context Window at $0.25 per 1M Input Tokens
Alibaba's Qwen team has released Qwen3.6 Flash, a multimodal language model supporting text, image, and video input with a 1 million token context window. The model is priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, with tiered pricing above 256K tokens.
Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing
Alibaba's Qwen Team has released Qwen3.6 27B, a 27-billion parameter multimodal language model with a 262,144-token context window. The model accepts text, image, and video inputs and includes a built-in thinking mode for extended reasoning, with pricing at $0.195 per million input tokens and $1.56 per million output tokens.
Alibaba Releases Qwen3.6 Max Preview: 1 Trillion Parameter MoE Model With 262K Context Window
Alibaba Cloud has released Qwen3.6 Max Preview, a proprietary frontier model built on sparse mixture-of-experts architecture with approximately 1 trillion total parameters. The model supports a 262,144-token context window and features integrated thinking mode for multi-turn reasoning, priced at $1.30 per million input tokens and $7.80 per million output tokens.
OpenAI Releases GPT-5.5 Pro with 1M+ Token Context Window, $30 Per Million Input Tokens
OpenAI has released GPT-5.5 Pro, a high-capability model with a 1,050,000 token context window (922K input, 128K output) priced at $30 per million input tokens and $180 per million output tokens. The model supports text and image inputs and is optimized for deep reasoning, agentic coding, and multi-step workflows.