model release

Alibaba Qwen Releases 27B Parameter Model That Claims to Match 397B Performance on Coding Tasks

TL;DR

Alibaba Qwen released Qwen3.6-27B, a 27B parameter dense model that claims flagship-level coding performance surpassing their previous 397B MoE model across major coding benchmarks. The full model is 55.6GB compared to 807GB for the predecessor.

1 min read
0

Alibaba Qwen Releases 27B Parameter Model That Claims to Match 397B Performance on Coding Tasks

Alibaba Qwen released Qwen3.6-27B, a 27B parameter dense model that claims to deliver "flagship-level agentic coding performance" surpassing their previous-generation Qwen3.5-397B-A17B (a 397B total parameter, 17B active MoE model) across all major coding benchmarks, according to the company.

The size difference is significant: Qwen3.6-27B is 55.6GB on Hugging Face compared to 807GB for Qwen3.5-397B-A17B. A quantized Q4_K_M version from Unsloth reduces the footprint to 16.8GB.

Performance Testing

Simon Willison tested the 16.8GB quantized version using llama.cpp's llama-server. In a test generating an SVG of a pelican riding a bicycle, the model produced a detailed, coherent image with correct bicycle geometry (spokes, chain, frame), a recognizable pelican, and background details including clouds, birds, and grass.

Performance metrics from llama-server:

  • Prompt processing: 54.32 tokens/s (20 tokens in 0.4s)
  • Generation speed: 25.57 tokens/s (4,444 tokens in 2min 53s)

Model Availability

Qwen3.6-27B is available as open weights on Hugging Face. The model includes reasoning mode support via the --reasoning on flag and uses a 65,536 token context window in testing configurations.

Specific benchmark scores and pricing were not disclosed in the announcement. The model represents Qwen's approach to achieving competitive coding performance in a significantly smaller architecture compared to their MoE models.

What This Means

If the coding benchmark claims hold up to independent verification, Qwen3.6-27B represents a substantial efficiency gain—achieving similar performance to a 397B parameter model in a 27B dense architecture. The 16.8GB quantized version running locally at 25 tokens/s makes flagship-level coding capabilities accessible on consumer hardware. However, the specific benchmarks and scores referenced in Qwen's "all major coding benchmarks" claim require independent validation.

Related Articles

model release

NVIDIA Releases Nemotron-3-Ultra: 550B Parameter Model with 1M Token Context and Configurable Reasoning

NVIDIA released Nemotron-3-Ultra-550B-A55B-NVFP4, a 550B parameter model with 55B active parameters, featuring a 1M token context window and configurable reasoning mode. The model uses a hybrid LatentMoE architecture combining Mamba-2, Mixture-of-Experts, and Attention layers with Multi-Token Prediction, trained with NVIDIA's NVFP4 quantization-aware approach.

model release

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.

model release

NVIDIA Releases Nemotron 3.5 ASR: 600M-Parameter Streaming Speech Model for 40 Languages

NVIDIA released Nemotron 3.5 ASR, a 600M-parameter speech-to-text model supporting 40 language-locales from a single checkpoint. The model achieves 0.07 seconds to final transcript after speech ends and ranks 2nd in latency among streaming ASR models according to Artificial Analysis benchmarks.

model release

Google DeepMind Releases Gemma 4: Encoder-Free Multimodal Models from 2.3B to 30.7B Parameters

Google DeepMind released Gemma 4, a family of open-weight multimodal models ranging from 2.3B to 30.7B parameters. The flagship 12B Unified model eliminates separate encoders, processing text, images, audio, and video directly through a single decoder-only transformer with up to 256K token context window.

Comments

Loading...