model release

Alibaba Qwen Releases 35B Parameter Qwen3.6-35B-A3B Model with 262K Native Context Window

TL;DR

Alibaba Qwen has released Qwen3.6-35B-A3B, a 35-billion parameter mixture-of-experts model with 3 billion activated parameters and a 262,144-token native context window extendable to 1,010,000 tokens. The model scores 73.4 on SWE-bench Verified and ships in an FP8-quantized build that Alibaba says performs nearly identically to the unquantized original.



Alibaba Qwen has released Qwen3.6-35B-A3B, a 35-billion parameter mixture-of-experts model with 3 billion activated parameters and a 262,144-token native context window extendable to 1,010,000 tokens. The model is available in FP8-quantized format using fine-grained quantization with block size of 128.
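Fine-grained FP8 quantization with a block size of 128 means one scale factor is attached to every 128 consecutive weights, rather than one per tensor or per channel. The NumPy sketch below illustrates the idea, emulating E4M3-style rounding to 3 mantissa bits; it is an illustration of block-wise scaling, not Qwen's actual quantization kernel.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_blockwise_fp8(w, block=128):
    """Simulate fine-grained FP8 quantization: one scale per `block` weights.

    Returns (codes, scales); dequantize with codes * scales.
    """
    flat = w.reshape(-1, block)                      # one row per block
    scales = np.abs(flat).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales = np.where(scales == 0, 1.0, scales)      # guard all-zero blocks
    scaled = flat / scales
    # Emulate E4M3 rounding: keep 3 mantissa bits beyond the implicit bit.
    m, e = np.frexp(scaled)                          # scaled = m * 2**e, m in [0.5, 1)
    codes = np.ldexp(np.round(m * 16) / 16, e)
    return np.clip(codes, -FP8_E4M3_MAX, FP8_E4M3_MAX), scales

def dequantize_blockwise(codes, scales, shape):
    return (codes * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
codes, scales = quantize_blockwise_fp8(w)
w_hat = dequantize_blockwise(codes, scales, w.shape)
```

The per-block scale keeps outliers in one block from crushing the precision of every other block, which is the usual motivation for fine-grained over per-tensor schemes.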

Architecture and Specifications

Qwen3.6-35B-A3B uses a sparse architecture with 256 experts in total, activating 8 routed experts plus 1 shared expert per token. The model has 40 layers, a hidden dimension of 2048, and a padded vocabulary of 248,320 tokens. The architecture uses a hybrid attention mechanism combining Gated DeltaNet (32 V heads, 16 QK heads) and Gated Attention (16 Q heads, 2 KV heads) in a 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)) layout.
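The layout notation expands to 40 layers: ten repetitions of three linear-attention (Gated DeltaNet) layers followed by one full-attention (Gated Attention) layer, each paired with an MoE block. A small sketch makes the pattern concrete; the layer labels are illustrative, not module names from the released checkpoint.

```python
def build_layer_layout(repeats=10, deltanet_per_group=3):
    """Expand the 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE))
    pattern described for Qwen3.6-35B-A3B into a flat per-layer list."""
    layout = []
    for _ in range(repeats):
        layout += ["gated_deltanet+moe"] * deltanet_per_group
        layout += ["gated_attention+moe"]
    return layout

layout = build_layer_layout()
# 40 layers total: 30 Gated DeltaNet layers and 10 Gated Attention layers,
# so only one layer in four carries full attention over the 262K context.
```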

Benchmark Performance

The model achieves 73.4 on SWE-bench Verified, 67.2 on SWE-bench Multilingual, and 49.5 on SWE-bench Pro, according to Alibaba's internal agent scaffold testing at temperature 1.0 with a 200K context window. On Terminal-Bench 2.0, it scores 51.5 (average of 5 runs with a 3-hour timeout, on 32 CPUs and 48 GB of RAM).

On general knowledge benchmarks, Qwen3.6-35B-A3B scores 85.2 on MMLU-Pro, 93.3 on MMLU-Redux, and 64.7 on SuperGPQA. For reasoning tasks, it achieves 86.0 on GPQA, 80.4 on LiveCodeBench v6, and 92.7 on AIME 2026 (I & II combined).

Vision-language capabilities include 81.7 on MMMU, 75.3 on MMMU-Pro, 86.4 on MathVista (mini), and 85.3 on RealWorldQA. Video understanding benchmarks show 87.0 on VideoMME (with subtitles), 82.5 (without subtitles), and 83.7 on VideoMMMU.

Key Features

Qwen3.6 introduces "thinking preservation," which retains reasoning context from historical messages to reduce overhead in iterative development workflows. Alibaba claims improved handling of frontend workflows and repository-level reasoning compared to previous versions.
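At the message-handling level, thinking preservation amounts to keeping prior assistant turns' reasoning in the history rather than stripping it before the next request, so the model need not re-derive earlier chains of thought. The sketch below assumes a hypothetical `reasoning_content` field; Alibaba has not published this exact interface.

```python
def prepare_history(messages, preserve_thinking=True):
    """Build the message list for the next turn.

    Conventional chat templates drop prior-turn reasoning; with thinking
    preservation, past assistant turns keep their reasoning_content.
    The field name is illustrative, not a confirmed API. Input is not mutated.
    """
    history = []
    for msg in messages:
        msg = dict(msg)  # shallow copy so the caller's list is untouched
        if msg["role"] == "assistant" and not preserve_thinking:
            msg.pop("reasoning_content", None)
        history.append(msg)
    return history

messages = [
    {"role": "user", "content": "Refactor the parser module."},
    {"role": "assistant", "content": "Done.",
     "reasoning_content": "The parser couples lexing and AST building, so..."},
]
with_thinking = prepare_history(messages, preserve_thinking=True)
without_thinking = prepare_history(messages, preserve_thinking=False)
```

The trade-off is context budget: preserved reasoning consumes window tokens, which is presumably why Alibaba ties the feature to a large minimum context length.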

The FP8-quantized version maintains performance metrics nearly identical to the original model, according to Alibaba. The model is compatible with Hugging Face Transformers, vLLM, SGLang, and KTransformers frameworks.

Deployment Requirements

Alibaba recommends maintaining a context length of at least 128K tokens to preserve thinking capabilities, though the model supports up to 262K tokens natively. For production workloads, the company suggests using SGLang, KTransformers, or vLLM serving engines with tensor parallelism across 8 GPUs.
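A vLLM launch along these lines would match that recommendation. The model ID below is an assumption (the exact Hugging Face repository name is not stated here), and flag names should be checked against your installed vLLM version.

```shell
# Hypothetical deployment sketch: 8-way tensor parallelism, full 262K window.
# Use --max-model-len 131072 at minimum to keep thinking capabilities intact.
vllm serve Qwen/Qwen3.6-35B-A3B-FP8 \
  --tensor-parallel-size 8 \
  --max-model-len 262144
```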

Pricing information has not been disclosed. The model weights are available on Hugging Face.

What This Means

Qwen3.6-35B-A3B is a significant entry in the 30B-40B parameter class, pairing a 262K native context window with a mixture-of-experts architecture that activates only 3B of its 35B parameters per token. Its SWE-bench Verified score of 73.4 places it just below Qwen3.5-27B (75.0) and well above Gemma4-31B (52.0), though direct comparisons should account for differing evaluation protocols. FP8 quantization delivers deployment efficiency while preserving benchmark performance, addressing a key practical constraint for models in this parameter range.

