model release

Alibaba Qwen Releases 35B Parameter Qwen3.6-35B-A3B Model with 262K Native Context Window

TL;DR

Alibaba Qwen has released Qwen3.6-35B-A3B, a 35-billion parameter mixture-of-experts model with 3 billion activated parameters and a 262,144-token native context window extendable to 1,010,000 tokens. The model scores 73.4 on SWE-bench Verified and ships in an FP8-quantized build that Alibaba says performs nearly identically to the unquantized original.



Alibaba Qwen has released Qwen3.6-35B-A3B, a 35-billion parameter mixture-of-experts model with 3 billion activated parameters and a 262,144-token native context window extendable to 1,010,000 tokens. The model is available in FP8-quantized format using fine-grained quantization with block size of 128.
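Fine-grained FP8 quantization with a block size of 128 means one scale factor is attached to every 128 consecutive weights, rather than one per tensor or per channel. The NumPy sketch below illustrates the idea, emulating E4M3-style rounding to 3 mantissa bits; it is an illustration of block-wise scaling, not Qwen's actual quantization kernel.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_blockwise_fp8(w, block=128):
    """Simulate fine-grained FP8 quantization: one scale per `block` weights.

    Returns (codes, scales); dequantize with codes * scales.
    """
    flat = w.reshape(-1, block)                      # one row per block
    scales = np.abs(flat).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales = np.where(scales == 0, 1.0, scales)      # guard all-zero blocks
    scaled = flat / scales
    # Emulate E4M3 rounding: keep 3 mantissa bits beyond the implicit bit.
    m, e = np.frexp(scaled)                          # scaled = m * 2**e, m in [0.5, 1)
    codes = np.ldexp(np.round(m * 16) / 16, e)
    return np.clip(codes, -FP8_E4M3_MAX, FP8_E4M3_MAX), scales

def dequantize_blockwise(codes, scales, shape):
    return (codes * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
codes, scales = quantize_blockwise_fp8(w)
w_hat = dequantize_blockwise(codes, scales, w.shape)
```

The per-block scale keeps outliers in one block from crushing the precision of every other block, which is the usual motivation for fine-grained over per-tensor schemes.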

Architecture and Specifications

Qwen3.6-35B-A3B uses a sparse architecture with 256 experts in total, activating 8 routed experts plus 1 shared expert per token. The model has 40 layers, a hidden dimension of 2048, and a padded vocabulary of 248,320 tokens. The architecture uses a hybrid attention mechanism combining Gated DeltaNet (32 V heads, 16 QK heads) and Gated Attention (16 Q heads, 2 KV heads) in a 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)) layout.
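The layout notation expands to 40 layers: ten repetitions of three linear-attention (Gated DeltaNet) layers followed by one full-attention (Gated Attention) layer, each paired with an MoE block. A small sketch makes the pattern concrete; the layer labels are illustrative, not module names from the released checkpoint.

```python
def build_layer_layout(repeats=10, deltanet_per_group=3):
    """Expand the 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE))
    pattern described for Qwen3.6-35B-A3B into a flat per-layer list."""
    layout = []
    for _ in range(repeats):
        layout += ["gated_deltanet+moe"] * deltanet_per_group
        layout += ["gated_attention+moe"]
    return layout

layout = build_layer_layout()
# 40 layers total: 30 Gated DeltaNet layers and 10 Gated Attention layers,
# so only one layer in four carries full attention over the 262K context.
```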

Benchmark Performance

The model achieves 73.4 on SWE-bench Verified, 67.2 on SWE-bench Multilingual, and 49.5 on SWE-bench Pro, according to Alibaba's internal agent scaffold testing at temperature 1.0 with a 200K context window. On Terminal-Bench 2.0, it scores 51.5 (average of 5 runs with a 3-hour timeout, on 32 CPUs and 48 GB of RAM).

On general knowledge benchmarks, Qwen3.6-35B-A3B scores 85.2 on MMLU-Pro, 93.3 on MMLU-Redux, and 64.7 on SuperGPQA. For reasoning tasks, it achieves 86.0 on GPQA, 80.4 on LiveCodeBench v6, and 92.7 on AIME 2026 (I & II combined).

Vision-language capabilities include 81.7 on MMMU, 75.3 on MMMU-Pro, 86.4 on MathVista (mini), and 85.3 on RealWorldQA. Video understanding benchmarks show 87.0 on VideoMME (with subtitles), 82.5 (without subtitles), and 83.7 on VideoMMMU.

Key Features

Qwen3.6 introduces "thinking preservation," which retains reasoning context from historical messages to reduce overhead in iterative development workflows. Alibaba claims improved handling of frontend workflows and repository-level reasoning compared to previous versions.
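At the message-handling level, thinking preservation amounts to keeping prior assistant turns' reasoning in the history rather than stripping it before the next request, so the model need not re-derive earlier chains of thought. The sketch below assumes a hypothetical `reasoning_content` field; Alibaba has not published this exact interface.

```python
def prepare_history(messages, preserve_thinking=True):
    """Build the message list for the next turn.

    Conventional chat templates drop prior-turn reasoning; with thinking
    preservation, past assistant turns keep their reasoning_content.
    The field name is illustrative, not a confirmed API. Input is not mutated.
    """
    history = []
    for msg in messages:
        msg = dict(msg)  # shallow copy so the caller's list is untouched
        if msg["role"] == "assistant" and not preserve_thinking:
            msg.pop("reasoning_content", None)
        history.append(msg)
    return history

messages = [
    {"role": "user", "content": "Refactor the parser module."},
    {"role": "assistant", "content": "Done.",
     "reasoning_content": "The parser couples lexing and AST building, so..."},
]
with_thinking = prepare_history(messages, preserve_thinking=True)
without_thinking = prepare_history(messages, preserve_thinking=False)
```

The trade-off is context budget: preserved reasoning consumes window tokens, which is presumably why Alibaba ties the feature to a large minimum context length.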

The FP8-quantized version maintains performance metrics nearly identical to the original model, according to Alibaba. The model is compatible with Hugging Face Transformers, vLLM, SGLang, and KTransformers frameworks.

Deployment Requirements

Alibaba recommends maintaining a context length of at least 128K tokens to preserve thinking capabilities, though the model supports up to 262K tokens natively. For production workloads, the company suggests using SGLang, KTransformers, or vLLM serving engines with tensor parallelism across 8 GPUs.
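A vLLM launch along these lines would match that recommendation. The model ID below is an assumption (the exact Hugging Face repository name is not stated here), and flag names should be checked against your installed vLLM version.

```shell
# Hypothetical deployment sketch: 8-way tensor parallelism, full 262K window.
# Use --max-model-len 131072 at minimum to keep thinking capabilities intact.
vllm serve Qwen/Qwen3.6-35B-A3B-FP8 \
  --tensor-parallel-size 8 \
  --max-model-len 262144
```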

Pricing information has not been disclosed. The model weights are available on Hugging Face.

What This Means

Qwen3.6-35B-A3B is a significant entry in the 30B-40B parameter class, pairing a 262K native context window with a mixture-of-experts architecture that activates only 3B of its 35B parameters per token. Its SWE-bench Verified score of 73.4 places it just below Qwen3.5-27B (75.0) and well above Gemma4-31B (52.0), though direct comparisons should account for differing evaluation protocols. FP8 quantization delivers deployment efficiency while preserving benchmark performance, addressing a key practical constraint for models in this parameter range.

