code-generation

20 articles tagged with code-generation

May 4, 2026
product update

Augment Code launches Cosmos, an operating system for multi-agent software development workflows

Augment Code has released Cosmos into public preview, positioning it as an operating system for agentic software development. The platform coordinates AI agents across the full software development lifecycle using shared memory, multi-model routing through its Prism system (which the company claims saves 20-30% of tokens), and what it calls specialized agents that learn from team feedback.

April 30, 2026
model release · IBM

IBM releases Granite 4.1-8B with 131K context window and enhanced tool-calling capabilities

IBM has released Granite 4.1-8B, an 8-billion-parameter long-context model with a 131,072-token context window. The model scores 85.37% on HumanEval and 73.84% on MMLU 5-shot, and its enhanced tool-calling reaches 68.27% on BFCL v3. It is released under the Apache 2.0 license and supports 12 languages.

April 21, 2026
product update · OpenRouter

OpenRouter Launches Pareto Code Router with Dynamic Model Selection Based on Quality Threshold

OpenRouter has released Pareto Code Router, a dynamic routing system that automatically selects from a curated list of coding models based on a user-defined quality threshold. Users set a min_coding_score between 0 and 1, and the router selects an appropriate model from its shortlist without requiring commitment to a specific model.
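The threshold idea described above can be illustrated with a small sketch. The summary does not disclose how the router actually chooses among qualifying models, so the rule used here (pick the cheapest model whose score meets the threshold) is an assumption, and the `Candidate` type, the `pick_model` helper, and every model name, score, and price in `SHORTLIST` are hypothetical. Only the `min_coding_score` parameter and its 0-to-1 range come from the announcement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical model identifier
    coding_score: float   # hypothetical quality score in [0, 1]
    cost_per_mtok: float  # hypothetical price, USD per million tokens


def pick_model(candidates: Sequence[Candidate],
               min_coding_score: float) -> Optional[Candidate]:
    """Return the cheapest candidate meeting the threshold, or None.

    "Cheapest that qualifies" is an assumed selection rule, not the
    documented behavior of any real router.
    """
    if not 0.0 <= min_coding_score <= 1.0:
        raise ValueError("min_coding_score must be between 0 and 1")
    eligible = [c for c in candidates if c.coding_score >= min_coding_score]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.cost_per_mtok)


# Made-up shortlist, for illustration only.
SHORTLIST = [
    Candidate("model-small", 0.55, 0.30),
    Candidate("model-medium", 0.72, 1.10),
    Candidate("model-large", 0.90, 4.00),
]

choice = pick_model(SHORTLIST, min_coding_score=0.7)
print(choice.name if choice else "no model meets the threshold")
```

Raising the threshold narrows the eligible set toward more capable (and in this made-up shortlist, more expensive) models, which is the trade-off a quality threshold exposes to the caller.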

April 20, 2026
product update · GitHub

GitHub Copilot Individual Plans Change Structure, Details Not Yet Disclosed

GitHub has announced changes to its Copilot Individual subscription plans, citing the need for reliability and predictability for existing customers. The company has not yet disclosed specific details about pricing adjustments, feature modifications, or implementation timelines.

April 16, 2026
product update

Roblox Assistant adds multi-step planning mode and AI-driven playtesting to automate game development

Roblox is deploying agentic features to its Assistant tool that plan, build, and test games through multi-step workflows. The enhanced Planning Mode analyzes code, asks clarifying questions, and creates editable action plans before implementation, while new AI-driven playtesting tools automatically identify and fix bugs.

April 7, 2026
model release

Z.ai releases GLM-5.1, 754B parameter open-weight model with improved code generation

Z.ai has released GLM-5.1, a 754-billion-parameter open-weight model that matches the size of its predecessor, GLM-5. The model shows an improved ability to generate complex, multi-part outputs such as HTML pages with SVG graphics and CSS animations. It is available via Hugging Face and OpenRouter.

April 6, 2026
product update · GitHub

GitHub Copilot CLI adds Rubber Duck for second-opinion analysis across model families

GitHub has added a feature called Rubber Duck to Copilot CLI that queries multiple AI model families to provide alternative perspectives on code suggestions. The feature acts as a second opinion mechanism, allowing developers to compare recommendations from different model architectures.

April 3, 2026
model release · Zhipu AI

Zhipu AI releases GLM-5V-Turbo: multimodal model generates front-end code from design mockups

Zhipu AI released GLM-5V-Turbo, a multimodal coding model that converts design mockups directly into executable front-end code. The model processes images, video, and text with a 200,000-token context window and 128,000-token max output, priced at $1.20 per million input tokens and $4 per million output tokens.
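To make the quoted pricing concrete, here is a minimal cost estimate in Python. It assumes flat per-token rates with no tiering, caching discounts, or minimum charges, none of which the summary addresses. The function name `estimate_cost_usd` and the example token counts are hypothetical; only the two prices come from the text above.

```python
# Prices quoted in the summary above, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 1.20
OUTPUT_PRICE_PER_MTOK = 4.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Illustrative request: 50,000 input tokens and 10,000 output tokens.
# That is $0.06 for input plus $0.04 for output, about $0.10 in total.
print(f"${estimate_cost_usd(50_000, 10_000):.2f}")
```

Because output tokens cost more than three times as much as input tokens at these rates, long generated front-end files are likely to dominate the bill over the mockup inputs.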

April 2, 2026
model release · Google DeepMind

Google DeepMind releases Gemma 4 with 4 model sizes, 256K context, and multimodal reasoning

Google DeepMind released Gemma 4, a family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (3.8B active), and 31B (30.7B parameters). All models support text and image input with 128K-256K context windows, while E2B and E4B add native audio capabilities and reasoning modes across 140+ languages.

model release

Alibaba releases Qwen 3.6 Plus with 1M context window, free tier now available

Alibaba's Qwen division released Qwen 3.6 Plus on April 2, 2026, offering free access to a model with a 1,000,000 token context window. The model combines linear attention with sparse mixture-of-experts routing and achieves a 78.8 score on SWE-bench Verified for software engineering tasks.

March 31, 2026
model release (+1 more tag)

Alibaba's Qwen3.5-Omni learns to write code from speech and video without explicit training

Alibaba has released Qwen3.5-Omni, an omnimodal model handling text, images, audio, and video with a 256,000-token context window. The model reportedly outperforms Google's Gemini 3.1 Pro on audio tasks and supports 74 languages in speech recognition, a 6x increase over its predecessor. The team also reports an unexpected emergent capability: the model writes working code from spoken instructions and video input, a skill it was not explicitly trained for.

March 25, 2026
product update · Anthropic

Anthropic's Claude Code Auto Mode enables automatic execution of safe commands while blocking risky actions

Anthropic has released Auto Mode for Claude Code, a middle-ground safety feature that automatically executes safe local operations while blocking risky actions like external deployments and mass deletions. A Claude Sonnet 4.6 classifier evaluates each command based on conversation context, and the system reverts to manual approval after three consecutive blocks or twenty total blocks. The feature is available as a research preview for Team plan users, with Enterprise and API access expected shortly.
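The fallback rule described above (three consecutive blocks or twenty total blocks) can be sketched as a small counter. This is not Anthropic's implementation: the `BlockTracker` class and its method names are hypothetical, and the sketch assumes that an allowed command resets the consecutive count, which the word "consecutive" implies but the summary does not spell out. It also leaves open what happens to the counters after the system reverts to manual approval.

```python
class BlockTracker:
    """Hypothetical tracker for the block thresholds quoted above."""

    def __init__(self, max_consecutive: int = 3, max_total: int = 20) -> None:
        self.max_consecutive = max_consecutive
        self.max_total = max_total
        self.consecutive = 0  # blocks since the last allowed command
        self.total = 0        # blocks seen so far in the session

    @property
    def needs_manual_approval(self) -> bool:
        """True once either threshold has been reached."""
        return (self.consecutive >= self.max_consecutive
                or self.total >= self.max_total)

    def record(self, blocked: bool) -> bool:
        """Record one classifier decision and return the fallback state."""
        if blocked:
            self.consecutive += 1
            self.total += 1
        else:
            # Assumption: an allowed command breaks the consecutive streak.
            self.consecutive = 0
        return self.needs_manual_approval


tracker = BlockTracker()
for blocked in [True, True, False, True, True, True]:
    fallback = tracker.record(blocked)
print("manual approval required:", fallback)
```

In the usage above, the allowed command in the third position resets the streak, so the fallback only triggers after the final three blocks in a row.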

March 17, 2026
analysis

Mistral's Leanstral code verification agent outperforms Claude Sonnet at 15% of the cost

Mistral has released Leanstral, a 120B-parameter code verification agent built with the Lean programming language, claiming it outperforms larger open-source models and offers significant cost advantages over Anthropic's Claude suite. The model achieves a pass@2 score of 26.3, beating Claude Sonnet by 2.6 points, while costing $36 to run compared to Sonnet's $549.

model release · OpenAI

OpenAI's GPT-5.4 mini now available in GitHub Copilot

OpenAI has released GPT-5.4 mini, the lightweight variant of its agentic coding model GPT-5.4, in GitHub Copilot. The model represents OpenAI's highest-performing mini offering to date for code generation and completion tasks.

March 9, 2026
product update · Anthropic

Anthropic launches Code Review tool to automatically analyze AI-generated code

Anthropic has launched Code Review, a multi-agent system within Claude Code that automatically analyzes AI-generated code and flags logic errors. The tool addresses enterprise concerns about managing the increasing volume of code produced by AI systems.

March 7, 2026
product update · Anthropic

Anthropic adds scheduled background tasks to Claude Code Desktop

Anthropic has added scheduled task functionality to Claude Code Desktop, allowing users to set up recurring automation that runs in the background. The feature enables Claude to perform routine developer operations like checking error logs and creating pull requests for fixable bugs at specified intervals.

March 5, 2026
model release · OpenAI

OpenAI's GPT-5.4 now generally available in GitHub Copilot

OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. The model was tested on real-world software development scenarios and demonstrated improved coding capabilities.

product update · Tabnine

Tabnine launches Enterprise Context Engine to ground AI coding in production environments

Tabnine has introduced its Enterprise Context Engine, designed to give AI models the contextual understanding needed to operate safely within real production development environments. The tool addresses a gap between raw model capability and practical enterprise deployment, where understanding an organization's codebase, dependencies, and architecture is critical.

February 23, 2026
benchmark · OpenAI

OpenAI says SWE-bench Verified is broken: most tasks reject correct solutions

OpenAI is calling for the retirement of SWE-bench Verified, the widely used AI coding benchmark, claiming most tasks are flawed enough to reject correct solutions. The company also argues that leading AI models have likely seen the answers during training, meaning benchmark scores measure memorization rather than genuine coding ability.

February 20, 2026
changelog · Anthropic

GitHub deprecates selected Anthropic and OpenAI models from Copilot

GitHub deprecated selected Anthropic and OpenAI models across all Copilot experiences on February 17, 2026. The deprecation affects Copilot Chat, inline edits, ask mode, agent mode, and code completions. Specific model names and transition timelines were not disclosed in the initial announcement.