code-generation
8 articles tagged with code-generation
Mistral's Leanstral code verification agent outperforms Claude Sonnet at 15% of the cost
Mistral has released Leanstral, a 120B-parameter code verification agent built with the Lean programming language, claiming it outperforms larger open-source models and offers significant cost advantages over Anthropic's Claude suite. The model achieves a pass@2 score of 26.3, beating Claude Sonnet by 2.6 points, while costing $36 to run compared with Sonnet's $549.
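The pass@2 figure above is an instance of the standard pass@k metric: the probability that at least one of k sampled solutions to a task passes its tests. The summary does not say how Mistral computed it, but the widely used unbiased estimator from the code-generation evaluation literature can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples per task,
    of which c passed the tests, estimate the probability that at
    least one of k randomly drawn samples passes.
    Formula: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k
        # samples must include at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers (not from the article): 10 samples, 3 correct, k = 2
score = pass_at_k(10, 3, 2)  # 1 - C(7,2)/C(10,2) = 24/45
```

A benchmark score like 26.3 would then be this estimate averaged over all tasks in the suite, scaled to a percentage.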
OpenAI's GPT-5.4 mini now available in GitHub Copilot
OpenAI has released GPT-5.4 mini, the lightweight variant of its agentic coding model GPT-5.4, in GitHub Copilot. The model represents OpenAI's highest-performing mini offering to date for code generation and completion tasks.
Anthropic launches Code Review tool to automatically analyze AI-generated code
Anthropic has launched Code Review, a multi-agent system within Claude Code that automatically analyzes AI-generated code and flags logic errors. The tool addresses enterprise concerns about managing the increasing volume of code produced by AI systems.
Anthropic adds scheduled background tasks to Claude Code Desktop
Anthropic has added scheduled task functionality to Claude Code Desktop, allowing users to set up recurring automation that runs in the background. The feature enables Claude to perform routine developer operations like checking error logs and creating pull requests for fixable bugs at specified intervals.
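Anthropic has not published how the scheduling is implemented; the core idea of a recurring background task, though, is simple to illustrate. Below is a minimal stdlib-only Python sketch of interval-based background automation. The task name and callback are hypothetical stand-ins, not Anthropic's API.

```python
import threading
import time

class RecurringTask:
    """Minimal sketch of a recurring background task: runs `action`
    every `interval` seconds on a daemon thread until cancelled."""

    def __init__(self, interval: float, action):
        self.interval = interval
        self.action = action
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns
        # False on timeout (run the action) and True once cancelled.
        while not self._stop.wait(self.interval):
            self.action()

    def start(self):
        self._thread.start()

    def cancel(self):
        self._stop.set()

# Illustrative usage: scan error logs once an hour (hypothetical callback)
def check_error_logs():
    print("scanning logs for fixable errors...")

task = RecurringTask(3600, check_error_logs)
task.start()
```

A production scheduler would persist tasks across restarts and support cron-style calendar expressions rather than a fixed interval, but the run-wait-repeat loop is the same shape.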
OpenAI's GPT-5.4 now generally available in GitHub Copilot
OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. The model was tested on real-world software development scenarios and demonstrated improved coding capabilities.
Tabnine launches Enterprise Context Engine to ground AI coding in production environments
Tabnine has introduced its Enterprise Context Engine, designed to give AI models the contextual understanding needed to operate safely within real production development environments. The tool addresses a gap between raw model capability and practical enterprise deployment, where understanding an organization's codebase, dependencies, and architecture is critical.
OpenAI says SWE-bench Verified is broken: most tasks reject correct solutions
OpenAI is calling for the retirement of SWE-bench Verified, the widely used AI coding benchmark, claiming most tasks are flawed enough to reject correct solutions. The company argues that leading AI models have likely seen the answers during training, meaning benchmark scores measure memorization rather than genuine coding ability.
GitHub deprecates selected Anthropic and OpenAI models from Copilot
GitHub deprecated selected Anthropic and OpenAI models across all Copilot experiences on February 17, 2026. The deprecation affects Copilot Chat, inline edits, ask mode, agent mode, and code completions. Specific model names and transition timelines were not disclosed in the initial announcement.