Alibaba Releases Qwen3.6-35B-A3B: 35B Parameter MoE Model with 262K Context Window

TL;DR

Alibaba has released Qwen3.6-35B-A3B, the first open-weight model in the Qwen3.6 series. The mixture-of-experts model has 35B total parameters with 3B activated per token, routing each token to 8 of 256 experts plus 1 shared expert. It offers a native 262K context window extensible to 1.01M tokens, and Alibaba reports a score of 73.4% on SWE-bench Verified.

April 16, 2026 · 2:21 PM · 2 min read

Qwen3.6 35B A3B — Quick Specs

Context window: 262K tokens
Input: $0.1612/1M tokens
Output: $0.9653/1M tokens

Alibaba has released Qwen3.6-35B-A3B, the first open-weight variant in the Qwen3.6 series. The model has 35 billion total parameters, of which 3 billion are activated per token, and uses a mixture-of-experts (MoE) architecture with 256 experts.

Architecture Specifications

The model combines Gated DeltaNet and Gated Attention layers across 40 layers with a hidden dimension of 2048. The MoE configuration activates 8 of the 256 experts plus 1 shared expert per token, and each expert has an intermediate dimension of 512.
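To make the "8 of 256 experts plus 1 shared expert" idea concrete, here is a minimal sketch of top-k routing for a single token in plain Python. The gating function is an assumption (softmax over all experts, renormalized over the selected ones); the article does not describe how Qwen3.6 computes its router weights, and the random logits are stand-ins for real router scores.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, from the article
TOP_K = 8          # routed experts activated per token, from the article


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_ids, weights).

    Assumption: softmax over all experts, then renormalize over the chosen
    top_k. The article does not specify the actual gating function.
    """
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the top_k highest-probability experts.
    chosen = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    weights = [probs[i] / mass for i in chosen]
    return chosen, weights


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
expert_ids, weights = route_token(logits)

# The shared expert runs for every token, so 9 expert MLPs execute per MoE layer.
experts_run_per_token = len(expert_ids) + 1
print(expert_ids, experts_run_per_token)
```

Only the selected experts' feed-forward blocks run for a given token, which is why the activated parameter count (3B) is far below the total (35B).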

Key specifications:

Context window: 262,144 tokens natively, extensible to 1,010,000 tokens
Token embedding: 248,320 (padded)
Training: Multi-token prediction (MTP)
Architecture: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
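The architecture pattern above can be expanded to check that it matches the stated 40 layers. This is a small illustrative sketch; the layer labels and the `build_layer_stack` helper are hypothetical names, not identifiers from the model's code.

```python
def build_layer_stack(num_blocks=10, deltanet_per_block=3, attention_per_block=1):
    """Expand 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE))."""
    stack = []
    for _ in range(num_blocks):
        stack += ["gated_deltanet+moe"] * deltanet_per_block
        stack += ["gated_attention+moe"] * attention_per_block
    return stack


layers = build_layer_stack()
num_layers = len(layers)
num_deltanet = layers.count("gated_deltanet+moe")
num_attention = layers.count("gated_attention+moe")
# Zero-based positions of the full-attention layers: every fourth layer.
attention_positions = [i for i, name in enumerate(layers) if name == "gated_attention+moe"]

print(num_layers, num_deltanet, num_attention)  # 40 30 10
```

Under this reading, three out of every four layers use linear attention and only every fourth layer uses Gated Attention.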

Benchmark Performance

According to Alibaba, Qwen3.6-35B-A3B posts the following scores across coding, knowledge, reasoning, and vision benchmarks:

Coding Agent Tasks:

SWE-bench Verified: 73.4%
SWE-bench Multilingual: 67.2%
SWE-bench Pro: 49.5%
Terminal-Bench 2.0: 51.5%
Claw-Eval Average: 68.7%

Knowledge Benchmarks:

MMLU-Pro: 85.2%
MMLU-Redux: 93.3%
C-Eval: 90.0%

STEM & Reasoning:

GPQA: 86.0%
LiveCodeBench v6: 80.4%
AIME 2026: 92.7%

Vision Language:

MMMU: 81.7%
MathVista (mini): 86.4%
RealWorldQA: 85.3%
VideoMMMU: 83.7%

All results come from the company's internal evaluation harness, with the temperature and context window settings disclosed in its documentation.

Technical Features

The model introduces "thinking preservation," which retains reasoning context from historical messages to reduce computational overhead during iterative development. Alibaba claims this enhances the model's performance on repository-level reasoning and frontend workflows.

The architecture uses:

Gated DeltaNet: 32 linear attention heads for V, 16 for QK with 128 head dimension
Gated Attention: 16 attention heads for Q, 2 for KV with 256 head dimension
Rotary Position Embedding: 64 dimensions
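The head counts above allow a rough estimate of how much key-value cache the Gated Attention layers need at full context. The sketch below rests on several assumptions not stated in the article: a 16-bit cache (2 bytes per value), 10 Gated Attention layers (taken from the 10 × pattern under Key specifications), and that the Gated DeltaNet layers keep a fixed-size state instead of a per-token cache, as linear attention layers typically do.

```python
KV_HEADS = 2           # Gated Attention KV heads, from the article
HEAD_DIM = 256         # Gated Attention head dimension, from the article
ATTENTION_LAYERS = 10  # assumption: one Gated Attention layer in each of 10 blocks
BYTES_PER_VALUE = 2    # assumption: 16-bit (e.g. bf16) cache entries


def kv_cache_bytes(num_tokens):
    """Estimated KV cache size in bytes for one sequence of num_tokens."""
    # Factor 2 covers keys and values.
    per_token = 2 * KV_HEADS * HEAD_DIM * ATTENTION_LAYERS * BYTES_PER_VALUE
    return num_tokens * per_token


native_context = 262_144
native_gib = kv_cache_bytes(native_context) / 2**30
print(f"{native_gib:.1f} GiB at {native_context} tokens")  # 5.0 GiB at 262144 tokens
```

With only 2 KV heads and a quarter of the layers using full attention, the per-sequence cache stays small relative to the context length. Actual memory use depends on the serving framework and is not covered by this estimate.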

Deployment

The model is compatible with SGLang (version 0.5.10+), vLLM (version 0.19+), and KTransformers. Alibaba recommends keeping a context length of at least 128K tokens for optimal thinking capabilities, though this can be reduced under memory constraints.

For serving, the company recommends tensor parallelism across 8 GPUs with 0.8 memory fraction for the full 262K context window. The model supports tool use and multi-token prediction modes.
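As a rough illustration of the serving recommendation, the sketch below assembles an SGLang launch command with those settings. Several details are assumptions: the Hugging Face repository ID is a guess and is not given in the article, mapping the "0.8 memory fraction" to SGLang's `--mem-fraction-static` flag is an interpretation, and 128K is taken to mean 131,072 tokens. The `sglang_launch_args` helper is a hypothetical name used only for this example.

```python
import shlex

MODEL_ID = "Qwen/Qwen3.6-35B-A3B"  # assumed Hugging Face repo ID, not from the article


def sglang_launch_args(model_id=MODEL_ID, tp_size=8, mem_fraction=0.8,
                       context_length=262_144):
    """Build the argv list for launching an SGLang server (not executed here)."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_id,
        "--tp-size", str(tp_size),                  # tensor parallelism across GPUs
        "--mem-fraction-static", str(mem_fraction),  # assumed match for "0.8 memory fraction"
        "--context-length", str(context_length),
    ]


full_context_cmd = sglang_launch_args()
# Fallback for tighter memory budgets, still at the recommended 128K minimum.
reduced_context_cmd = sglang_launch_args(context_length=131_072)

print(shlex.join(full_context_cmd))
```

The list can be passed to `subprocess.run` on a machine with SGLang installed; check the flags against the installed SGLang version before relying on them.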

Pricing

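Alibaba has not disclosed official pricing; the per-token rates listed under Quick Specs are not attributed to a specific provider. The model weights are available on Hugging Face under an open-weight license.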

What This Means

Qwen3.6-35B-A3B demonstrates that MoE architectures with high expert counts (256) can achieve competitive performance on coding tasks while maintaining relatively low activation cost (3B parameters). The 73.4% SWE-bench Verified score positions it between Qwen3.5-27B (75.0%) and Qwen3.5-35B-A3B (70.0%), suggesting architectural refinements beyond pure parameter scaling. The extended context capability to 1M tokens addresses a key limitation for repository-level code understanding, though real-world performance at maximum context length remains to be independently verified.

Source: huggingface.co
