Alibaba Qwen Releases 27B Parameter Model with 262K Context Window, Claims 77.2% on SWE-bench Verified
Alibaba Qwen released Qwen3.6-27B, a 27-billion parameter model with a 262,144 token context window extensible to 1,010,000 tokens. The model claims 77.2% on SWE-bench Verified and 53.5% on SWE-bench Pro, with open weights available on Hugging Face.
Qwen3.6 27B — Quick Specs
Qwen3.6-27B Released with 262K Context Window
Alibaba Qwen released Qwen3.6-27B, a 27-billion parameter open-weight language model with a 262,144 token context window natively, extensible to 1,010,000 tokens. The model represents the first release in the Qwen3.6 series following the February 2025 Qwen3.5 launch.
Architecture Details
Qwen3.6-27B uses a non-standard transformer architecture with 64 layers and a 5,120 hidden dimension. The model employs a hybrid attention mechanism: 16 blocks of 3 "Gated DeltaNet" layers followed by 1 "Gated Attention" layer per block.
The Gated DeltaNet uses 48 linear attention heads for V and 16 for QK with 128 head dimension. The Gated Attention uses 24 attention heads for Q and 4 for KV with 256 head dimension. The model has 248,320 tokens in its vocabulary (padded) and uses rotary position embeddings with 64 dimensions.
Benchmark Performance
According to Alibaba, Qwen3.6-27B achieves:
- SWE-bench Verified: 77.2% (vs. 80.9% for Claude 4.5 Opus)
- SWE-bench Pro: 53.5% (vs. 57.1% for Claude 4.5 Opus)
- SWE-bench Multilingual: 71.3%
- Terminal-Bench 2.0: 59.3% (tied with Claude 4.5 Opus)
- SkillsBench Avg5: 48.2%
- MMLU-Pro: 86.2%
- GPQA Diamond: 87.8%
- AIME 2026: 94.1%
The company evaluated models using internal agent scaffolds with temperature 1.0, top_p 0.95, and 200K context windows for SWE-bench series tests. All benchmarks used 256K context windows unless specified otherwise.
Vision Capabilities
The model includes a vision encoder for multimodal tasks. According to Alibaba's benchmarks:
- MMMU: 82.9%
- MathVista mini: 87.4%
- VideoMME (with subtitles): 87.7%
- AndroidWorld: 70.3%
- Visual Agent V*: 94.7%
Key Features
Qwen3.6-27B introduces "thinking preservation" to retain reasoning context from historical messages during iterative development. The model supports multi-token prediction (MTP) during training and can be deployed with speculative decoding for faster inference.
The model is compatible with Hugging Face Transformers, vLLM (version 0.19.0+), SGLang (version 0.5.10+), and KTransformers. Alibaba recommends maintaining at least 128K token context length to preserve reasoning capabilities, though the default is 262K tokens.
Availability
Pricing has not been disclosed. Open weights are available on Hugging Face under the repository Qwen/Qwen3.6-27B. The model can be served via OpenAI-compatible APIs using standard inference frameworks.
What This Means
Qwen3.6-27B enters the competitive 20-30B parameter space with strong coding performance claims, particularly on repository-level tasks. The 262K native context window and 1M token extensibility position it for long-context applications, though real-world performance at extended lengths requires independent verification. The hybrid Gated DeltaNet/Attention architecture is unconventional and may offer efficiency advantages, but deployment complexity compared to standard transformers remains to be seen in production environments.
Related Articles
Alibaba's Qwen Releases Qwen3.7 Plus: 1M Context Window at $0.40 Per Million Input Tokens
Alibaba's Qwen has released Qwen3.7 Plus, a multimodal model with a 1 million token context window. The model accepts text and image input with text output, priced at $0.40 per million input tokens and $1.60 per million output tokens through OpenRouter's API.
NVIDIA Releases Nemotron-3-Ultra: 550B Parameter Model with 1M Token Context and Configurable Reasoning
NVIDIA released Nemotron-3-Ultra-550B-A55B-NVFP4, a 550B parameter model with 55B active parameters, featuring a 1M token context window and configurable reasoning mode. The model uses a hybrid LatentMoE architecture combining Mamba-2, Mixture-of-Experts, and Attention layers with Multi-Token Prediction, trained with NVIDIA's NVFP4 quantization-aware approach.
NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning
NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.
Ideogram 4: 9.3B parameter open-weight text-to-image model with native 2K resolution and structured JSON prompting
Ideogram has released Ideogram 4, its first open-weight text-to-image model with 9.3 billion parameters. The model supports native 2K resolution, structured JSON prompting with bounding-box layout controls, and is available in nf4 and fp8 quantizations under a non-commercial license.
Comments
Loading...