coding
27 articles tagged with coding
Microsoft launches MAI-Code-1 and MAI-Thinking-1 models to reduce OpenAI dependence
Microsoft announced two proprietary AI models at its Build developer conference: MAI-Code-1 for code generation and MAI-Thinking-1 for reasoning tasks. The models are designed to run on Azure infrastructure, allowing Microsoft to reduce costs from its $13 billion OpenAI investment while competing directly with Anthropic and Google.
Anthropic releases Claude Opus 4.8 with 69.2% agentic coding score, 2.5x faster performance
Anthropic released Claude Opus 4.8 on May 28, 2026, six weeks after version 4.7. The model achieves 69.2% on agentic coding benchmarks (up from 64.3%), runs 2.5 times faster in fast mode at one-third the cost, while maintaining the same pricing as version 4.7.
Alibaba Releases Qwen3.7 Max with 1M Token Context Window for Agent and Coding Tasks
Alibaba has released Qwen3.7 Max, the flagship model in its Qwen3.7 series, featuring a 1 million token context window. The text-only model is designed for agent-centric workloads with strengths in coding, office productivity, and long-horizon autonomous execution, and includes explicit prompt caching support.
Google releases Gemini 3.5 Flash with autonomous coding and agent capabilities, claims 4x speed boost
Google released Gemini 3.5 Flash, positioning it as an agent-first model designed for autonomous coding and multi-hour workflows. The company claims the model outperforms its 3.1 Pro predecessor on coding and agentic benchmarks while running 4x faster than competing frontier models, with an optimized version achieving 12x speed gains.
Google Releases Gemini 3.5 Flash with 1M Token Context and Configurable Thinking Modes at $1.50/$9 Per Million Tokens
Google has released Gemini 3.5 Flash, a multimodal model with a 1 million token context window priced at $1.50 per million input tokens and $9 per million output tokens. The model supports text, image, video, audio, and PDF inputs with configurable thinking effort levels from minimal to high.
Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks
Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.
Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning
Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that unifies instruction-following, reasoning, and coding capabilities. The model features configurable reasoning effort per request and a vision encoder trained from scratch for variable image sizes.
Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing
Alibaba's Qwen Team has released Qwen3.6 27B, a 27-billion parameter multimodal language model with a 262,144-token context window. The model accepts text, image, and video inputs and includes a built-in thinking mode for extended reasoning, with pricing at $0.195 per million input tokens and $1.56 per million output tokens.
OpenAI discontinues separate Codex line, merges coding capabilities into GPT-5.5
OpenAI will not release a separate GPT-5.5-Codex model, according to Romain Huet. The company unified its Codex coding model with the main GPT line starting with GPT-5.4, with GPT-5.5 featuring enhanced agentic coding and computer use capabilities.
DeepSeek releases V4 preview, claims parity with GPT-4o and Claude 3.5 Sonnet
DeepSeek released a preview of its V4 model on April 24, 2026, claiming the open-source system matches leading closed-source models from Anthropic, Google, and OpenAI. The company emphasized improved coding capabilities and compatibility with domestic Huawei chips, but did not disclose training costs or hardware specifications.
DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens
DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per request. The model supports a 1,048,576-token context window and is priced at $0.14 per million input tokens and $0.28 per million output tokens.
Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.
OpenAI GPT-5.5 Powers Codex Coding Agent on NVIDIA GB200 Infrastructure
OpenAI has released GPT-5.5, its latest frontier model, according to NVIDIA. The model powers Codex, OpenAI's agentic coding application, running on NVIDIA GB200 NVL72 rack-scale systems.
OpenAI releases GPT-5.5 with improved coding efficiency, fewer tokens needed
OpenAI has released GPT-5.5, one month after GPT-5.4, claiming improved performance on coding tasks and reduced token usage in its Codex platform. The model rolls out April 24 to paid ChatGPT tiers.
OpenAI releases GPT-5.5 with improved coding and computer control capabilities
OpenAI released GPT-5.5, its latest AI model with enhanced coding, computer operation, and research capabilities. The model is rolling out to paid subscribers in ChatGPT and Codex, with API access coming soon.
Alibaba releases Qwen3.6-27B with 262K context window, scores 53.5% on SWE-bench Pro
Alibaba has released Qwen3.6-27B, a 27-billion parameter language model with a native 262,144 token context window (extensible to 1,010,000 tokens). The model achieves 53.5% on SWE-bench Pro and 77.2% on SWE-bench Verified, with FP8 quantization providing near-identical performance to the full-precision version.
Alibaba Qwen Releases 27B Parameter Model That Claims to Match 397B Performance on Coding Tasks
Alibaba Qwen released Qwen3.6-27B, a 27B parameter dense model that claims flagship-level coding performance surpassing their previous 397B MoE model across major coding benchmarks. The full model is 55.6GB compared to 807GB for the predecessor.
Alibaba Qwen Releases 27B Parameter Model with 262K Context Window, Claims 77.2% on SWE-bench Verified
Alibaba Qwen released Qwen3.6-27B, a 27-billion parameter model with a 262,144 token context window extensible to 1,010,000 tokens. The model claims 77.2% on SWE-bench Verified and 53.5% on SWE-bench Pro, with open weights available on Hugging Face.
Anthropic releases Claude Opus 4.7 with improved coding and vision, confirms it trails unreleased Mythos model
Anthropic released Claude Opus 4.7 with improved coding capabilities, higher-resolution vision, and a new reasoning level. The company publicly acknowledged the model underperforms its unreleased Mythos system, which remains restricted due to safety concerns.
Anthropic ships Claude Opus 4.7 with improved coding reliability and multimodal capabilities
Anthropic has released Claude Opus 4.7, its latest generally available AI model focused on advanced software engineering. The model shows improvements in handling complex coding tasks with less supervision, enhanced vision capabilities, and better instruction following, while introducing a new tokenizer that increases token usage by 1.0-1.35× depending on content type.
Alibaba Releases Qwen3.6-35B-A3B: 35B Parameter MoE Model with 262K Context Window
Alibaba has released Qwen3.6-35B-A3B, the first open-weight model in the Qwen3.6 series. The model features 35B total parameters with 3B activated, a native 262K context window extensible to 1.01M tokens, and achieves 73.4% on SWE-bench Verified using 256 experts with 8 activated per token.
Zhipu AI's GLM-5.1 outperforms GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro through iterative strategy refinement
Zhipu AI has released GLM-5.1, a freely available open-weight model designed for long-running programming tasks that achieves 58.4% on SWE-Bench Pro, edging out GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). The model's core capability is iterative strategy refinement—it rethinks its approach across hundreds of iterations and thousands of tool calls, recognizing dead ends and shifting tactics without human intervention. However, GLM-5.1 trails on reasoning and knowledge benchmarks, scoring 31% on Humanity's Last Exam compared to Gemini 3.1 Pro's 45%.
Alibaba's Qwen3.6 Plus reaches 78.8 on SWE-bench with 1M context window
Alibaba released Qwen3.6 Plus on April 2, 2026, featuring a 1 million token context window at $0.50 per million input tokens and $3 per million output tokens. The model combines linear attention with sparse mixture-of-experts routing to achieve a 78.8 score on SWE-bench Verified, with significant improvements in agentic coding, front-end development, and reasoning tasks.
Gemini 3.1 Pro launches in Augment Code at 2.6x cheaper than Claude Opus 4.6
Augment Code now offers Gemini 3.1 Pro alongside Claude Opus 4.6 and GPT-5.4. In head-to-head testing on structural refactoring tasks, Gemini matched or outperformed Opus while consuming 268 credits per task—46% cheaper than Opus's 488 credits—making it 2.6x more cost-effective per message in real-world usage.
Alibaba releases Qwen3.6-Plus with 1M token context, claims performance near Claude 4.5 Opus
Alibaba has released Qwen3.6-Plus, its third proprietary AI model in days, featuring a 1 million token context window available via Alibaba Cloud Model Studio API. The model claims improved agentic coding capabilities and partially outperforms Anthropic's Claude 4.5 Opus in Alibaba-conducted benchmarks, though trails Claude 4.6 Opus released in December 2025.
Alibaba releases Qwen 3.6 Plus Preview with 1M token context, free via OpenRouter
Alibaba's Qwen division has released Qwen 3.6 Plus Preview, a free multimodal model available via OpenRouter with a 1,000,000 token context window. The model claims stronger reasoning and more reliable agentic behavior compared to the 3.5 series, with particular strength in coding and complex problem-solving tasks.
OpenAI releases GPT-5.4 mini and nano with 3-4x price increases but major performance gains
OpenAI has released GPT-5.4 mini and GPT-5.4 nano, compact models optimized for coding and subagent tasks. The new models deliver significant performance improvements—GPT-5.4 mini reaches 54.4% on SWE-Bench Pro versus 45.7% for GPT-5 mini—but cost 3-4x more per input token than their predecessors.