SWE-bench

4 articles tagged with SWE-bench

April 22, 2026
model release

Alibaba releases Qwen3.6-27B with 262K context window, scores 53.5% on SWE-bench Pro

Alibaba has released Qwen3.6-27B, a 27-billion-parameter language model with a native 262,144-token context window (extensible to 1,010,000 tokens). The model achieves 53.5% on SWE-bench Pro and 77.2% on SWE-bench Verified, and its FP8-quantized version performs nearly identically to the full-precision version.

April 17, 2026
model release

Alibaba Qwen Releases 35B Parameter Qwen3.6-35B-A3B Model with 262K Native Context Window

Alibaba Qwen has released Qwen3.6-35B-A3B, a 35-billion-parameter mixture-of-experts model with 3 billion activated parameters and a 262,144-token native context window extendable to 1,010,000 tokens. The model scores 73.4% on SWE-bench Verified, and its FP8-quantized version performs nearly identically to the original model.

April 16, 2026
model release

Alibaba Releases Qwen3.6-35B-A3B: 35B Parameter MoE Model with 262K Context Window

Alibaba has released Qwen3.6-35B-A3B, the first open-weight model in the Qwen3.6 series. The model has 35B total parameters with 3B activated, routes each token to 8 of its 256 experts, and offers a native 262K context window extensible to 1.01M tokens. It achieves 73.4% on SWE-bench Verified.

February 23, 2026
benchmark, OpenAI

OpenAI says SWE-bench Verified is broken—most tasks reject correct solutions

OpenAI is calling for the retirement of SWE-bench Verified, the widely used AI coding benchmark, claiming that most of its tasks are flawed enough to reject correct solutions. The company also argues that leading AI models have likely seen the answers during training, meaning benchmark scores measure memorization rather than genuine coding ability.