code-generation

8 articles tagged with code-generation

March 17, 2026
analysis

Mistral's Leanstral code verification agent outperforms Claude Sonnet at a fraction of the cost

Mistral has released Leanstral, a 120B-parameter code verification agent built with the Lean programming language, claiming it outperforms larger open-source models and offers significant cost advantages over Anthropic's Claude suite. The model achieves a pass@2 score of 26.3—beating Claude Sonnet by 2.6 points—while costing $36 to run compared to Sonnet's $549.
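For context on the headline metric: pass@2 is presumably the standard unbiased pass@k estimator popularized by the HumanEval methodology, which measures the chance that at least one of k sampled solutions passes when n attempts were generated and c of them passed. A minimal sketch (the sample numbers in the usage line are illustrative, not from the article):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n total attempts (c of which passed) is correct."""
    if n - c < k:
        # Fewer failures than samples drawn: a passing attempt is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 attempts per task, 3 correct.
print(round(pass_at_k(10, 3, 2), 4))  # → 0.5333
```

A reported pass@2 of 26.3 would then be this estimate, averaged over the benchmark's tasks and expressed as a percentage.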

model release · OpenAI

OpenAI's GPT-5.4 mini now available in GitHub Copilot

OpenAI has made GPT-5.4 mini, the lightweight variant of its agentic coding model GPT-5.4, available in GitHub Copilot. The model represents OpenAI's highest-performing mini offering to date for code generation and completion tasks.

March 9, 2026
product update · Anthropic

Anthropic launches Code Review tool to automatically analyze AI-generated code

Anthropic has launched Code Review, a multi-agent system within Claude Code that automatically analyzes AI-generated code and flags logic errors. The tool addresses enterprise concerns about managing the increasing volume of code produced by AI systems.

March 7, 2026
product update · Anthropic

Anthropic adds scheduled background tasks to Claude Code Desktop

Anthropic has added scheduled task functionality to Claude Code Desktop, allowing users to set up recurring automation that runs in the background. The feature enables Claude to perform routine developer operations like checking error logs and creating pull requests for fixable bugs at specified intervals.
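Claude Code Desktop's actual configuration for this feature is not documented here, but the general shape of a recurring background task, a routine operation re-queued at a fixed interval, can be sketched with Python's stdlib `sched` (all names below are illustrative):

```python
import sched
import time

def check_error_logs() -> str:
    """Placeholder for a routine developer operation; in the article's
    example this would scan error logs and open PRs for fixable bugs."""
    return "logs checked"

def run_every(scheduler: sched.scheduler, interval: float, task, runs: int) -> list:
    """Run `task` every `interval` seconds for `runs` iterations,
    re-queuing the next run each time the current one finishes."""
    results = []

    def step(remaining: int) -> None:
        results.append(task())
        if remaining > 1:
            scheduler.enter(interval, 1, step, (remaining - 1,))

    scheduler.enter(interval, 1, step, (runs,))
    scheduler.run()  # blocks until the queue is empty
    return results

s = sched.scheduler(time.monotonic, time.sleep)
print(run_every(s, 0.01, check_error_logs, 3))
```

A production scheduler would run in its own thread or process and persist its queue, but the re-queue-on-completion pattern is the core of "recurring automation that runs in the background."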

March 5, 2026
model release · OpenAI

OpenAI's GPT-5.4 now generally available in GitHub Copilot

OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. The model was tested on real-world software development scenarios and demonstrated improved coding capabilities.

product update · Tabnine

Tabnine launches Enterprise Context Engine to ground AI coding in production environments

Tabnine has introduced its Enterprise Context Engine, designed to give AI models the contextual understanding needed to operate safely within real production development environments. The tool addresses a gap between raw model capability and practical enterprise deployment, where understanding an organization's codebase, dependencies, and architecture is critical.

February 23, 2026
benchmark · OpenAI

OpenAI says SWE-bench Verified is broken: most tasks reject correct solutions

OpenAI is calling for the retirement of SWE-bench Verified, the widely used AI coding benchmark, claiming most tasks are flawed enough to reject correct solutions. The company also argues that leading AI models have likely seen the answers during training, meaning benchmark scores measure memorization rather than genuine coding ability.

February 20, 2026
changelog · Anthropic

GitHub deprecates selected Anthropic and OpenAI models from Copilot

GitHub deprecated selected Anthropic and OpenAI models across all Copilot experiences on February 17, 2026. The deprecation affects Copilot Chat, inline edits, ask mode, agent mode, and code completions. Specific model names and transition timelines were not disclosed in the initial announcement.