Alibaba Releases Qwen3.6-35B-A3B: 35B Parameter MoE Model with 262K Context Window

TL;DR

Alibaba has released Qwen3.6-35B-A3B, the first open-weight model in the Qwen3.6 series. The mixture-of-experts model has 35B total parameters with 3B activated per token, routes each token to 8 of its 256 experts (plus one shared expert), offers a native 262K context window extensible to 1.01M tokens, and scores 73.4% on SWE-bench Verified.

Qwen3.6-35B-A3B: Quick Specs

Context window: 262K tokens
Input: $0.1612 per 1M tokens
Output: $0.9653 per 1M tokens

Alibaba has released Qwen3.6-35B-A3B, the first open-weight variant in the Qwen3.6 series. The model has 35 billion total parameters, of which 3 billion are activated per token, and uses a mixture-of-experts (MoE) architecture with 256 experts.

Architecture Specifications

The model uses a hybrid architecture that interleaves Gated DeltaNet and Gated Attention layers across 40 layers with a hidden dimension of 2048. The MoE configuration activates 8 of the 256 routed experts plus 1 shared expert per token, and each expert has an intermediate dimension of 512.
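As a rough sanity check on the 35B-total, 3B-active split, the figures above can be plugged into a back-of-the-envelope count of expert parameters. This is a sketch, not a published breakdown: it assumes each expert is a gated feed-forward block with three weight matrices (gate, up, down), that the shared expert also has a 512 intermediate dimension, and that all 40 layers carry an MoE block. Attention, embedding, and router weights are ignored.

```python
# Back-of-the-envelope expert parameter count from the reported specs.
# ASSUMPTION: three weight matrices per expert (gate, up, down projections).
# ASSUMPTION: the shared expert has the same shape as a routed expert.
HIDDEN = 2048
EXPERT_INTERMEDIATE = 512
NUM_LAYERS = 40
NUM_EXPERTS = 256
ACTIVE_ROUTED = 8
SHARED = 1
MATRICES_PER_EXPERT = 3  # assumption, not stated in the article

# Each matrix maps between the hidden and intermediate dimensions.
params_per_expert = MATRICES_PER_EXPERT * HIDDEN * EXPERT_INTERMEDIATE

# All routed experts across all layers (stored, but mostly idle per token).
total_routed = params_per_expert * NUM_EXPERTS * NUM_LAYERS

# Experts that actually run for a single token.
active_expert = params_per_expert * (ACTIVE_ROUTED + SHARED) * NUM_LAYERS

print(f"per expert:     {params_per_expert:,}")
print(f"routed total:   {total_routed:,}")
print(f"active/token:   {active_expert:,}")
```

Under these assumptions the routed experts hold roughly 32.2B parameters in total, of which about 1.1B run for any given token. The remainder of the reported 35B and 3B figures would come from weights this sketch ignores.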

Key specifications:

  • Context window: 262,144 tokens natively, extensible to 1,010,000 tokens
  • Token embedding (vocabulary) size: 248,320 (padded)
  • Training: Multi-token prediction (MTP)
  • Architecture: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
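The nested layout notation can be expanded to confirm that it accounts for all 40 layers. The sketch below only unrolls the pattern as written; the function name and layer labels are illustrative and are not identifiers from the model's code.

```python
# Unroll: 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE)).
# Every layer is followed by an MoE block, so only the mixer type is tracked.
from collections import Counter


def build_layout(num_blocks: int = 10,
                 deltanet_per_block: int = 3,
                 attention_per_block: int = 1) -> list[str]:
    """Return the per-layer mixer type in order (labels are hypothetical)."""
    layout: list[str] = []
    for _ in range(num_blocks):
        layout.extend(["gated_deltanet"] * deltanet_per_block)
        layout.extend(["gated_attention"] * attention_per_block)
    return layout


layout = build_layout()
counts = Counter(layout)
attention_indices = [i for i, kind in enumerate(layout) if kind == "gated_attention"]

print(len(layout))          # total layers
print(dict(counts))         # layers per mixer type
print(attention_indices)    # zero-based positions of full-attention layers
```

The pattern yields 30 Gated DeltaNet layers and 10 Gated Attention layers, so every fourth layer uses full attention.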

Benchmark Performance

According to Alibaba, Qwen3.6-35B-A3B improves substantially on coding benchmarks. The reported scores are:

Coding Agent Tasks:

  • SWE-bench Verified: 73.4%
  • SWE-bench Multilingual: 67.2%
  • SWE-bench Pro: 49.5%
  • Terminal-Bench 2.0: 51.5%
  • Claw-Eval Average: 68.7%

Knowledge Benchmarks:

  • MMLU-Pro: 85.2%
  • MMLU-Redux: 93.3%
  • C-Eval: 90.0%

STEM & Reasoning:

  • GPQA: 86.0%
  • LiveCodeBench v6: 80.4%
  • AIME 2026: 92.7%

Vision Language:

  • MMMU: 81.7%
  • MathVista (mini): 86.4%
  • RealWorldQA: 85.3%
  • VideoMMMU: 83.7%

All benchmarks were run with the company's internal evaluation harness; the temperature and context window settings are disclosed in its documentation.

Technical Features

The model introduces "thinking preservation," which retains reasoning context from historical messages to reduce computational overhead during iterative development. Alibaba claims this enhances the model's performance on repository-level reasoning and frontend workflows.

The architecture uses:

  • Gated DeltaNet: 32 linear attention heads for V and 16 for QK, each with a head dimension of 128
  • Gated Attention: 16 attention heads for Q and 2 for KV, each with a head dimension of 256
  • Rotary Position Embedding: 64 dimensions
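These head counts help explain why long contexts are practical for this design. The sketch below estimates the key/value cache for the Gated Attention layers only, on the general understanding that linear-attention layers such as Gated DeltaNet keep a fixed-size state rather than a cache that grows with sequence length. It assumes 2-byte (bf16 or fp16) cache entries and 10 Gated Attention layers, as implied by the 10 × (3 + 1) layout; actual serving frameworks may allocate memory differently.

```python
# Rough KV-cache size for the Gated Attention layers only.
# ASSUMPTION: 2 bytes per cached value (bf16/fp16); quantized caches differ.
# ASSUMPTION: 10 Gated Attention layers, from the 10 x (3 + 1) layout.
KV_HEADS = 2
HEAD_DIM = 256
ATTENTION_LAYERS = 10
BYTES_PER_VALUE = 2  # assumption


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # The leading 2 accounts for storing both K and V.
    per_token = 2 * KV_HEADS * HEAD_DIM * ATTENTION_LAYERS * BYTES_PER_VALUE
    return per_token * num_tokens


per_token = kv_cache_bytes(1)
native = kv_cache_bytes(262_144)
extended = kv_cache_bytes(1_010_000)

print(f"per token:        {per_token:,} bytes")
print(f"262,144 tokens:   {native / 2**30:.2f} GiB")
print(f"1,010,000 tokens: {extended / 2**30:.2f} GiB")
```

Under these assumptions the cache costs 20 KiB per token, or 5 GiB for one sequence at the native 262,144-token window.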

Deployment

The model is compatible with SGLang (version 0.5.10+), vLLM (version 0.19+), and KTransformers. Alibaba recommends keeping a context length of at least 128K tokens for optimal thinking capabilities, though users who run into memory constraints can reduce it.

For serving the full 262K context window, the company recommends tensor parallelism across 8 GPUs with a memory fraction of 0.8. The model supports tool use and multi-token prediction modes.
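One way these settings might map onto an SGLang launch command is sketched below. The Hugging Face repository ID is an assumption (the article does not give it), as is the reading of "memory fraction" as SGLang's `--mem-fraction-static` option; the helper function is hypothetical and only assembles the command line without running it.

```python
# Assemble (but do not run) an SGLang server command for the settings above.
# ASSUMPTION: the model repo ID below is a guess, not confirmed by the article.
# ASSUMPTION: "0.8 memory fraction" corresponds to --mem-fraction-static.
import shlex


def sglang_launch_args(model_path: str,
                       tp_size: int = 8,
                       mem_fraction: float = 0.8,
                       context_length: int = 262_144) -> list[str]:
    """Build the argument list for `python -m sglang.launch_server`."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp-size", str(tp_size),
        "--mem-fraction-static", str(mem_fraction),
        "--context-length", str(context_length),
    ]


args = sglang_launch_args("Qwen/Qwen3.6-35B-A3B")  # hypothetical repo ID
print(shlex.join(args))
```

On memory-constrained hardware, passing a smaller `context_length` (for example 131_072, in line with the 128K minimum mentioned above) would be the natural knob to turn first.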

Pricing

Pricing has not been disclosed. The model weights are available on Hugging Face under an open-weight license.

What This Means

Qwen3.6-35B-A3B demonstrates that MoE architectures with high expert counts (256) can be competitive on coding tasks while keeping activation cost low (3B parameters per token). Its 73.4% SWE-bench Verified score places it between Qwen3.5-27B (75.0%) and Qwen3.5-35B-A3B (70.0%), suggesting refinements beyond pure parameter scaling. The ability to extend context to roughly 1M tokens addresses a key limitation for repository-level code understanding, though real-world performance at maximum context length has yet to be independently verified.
