LLM News

Every LLM release, update, and milestone.

product update · Anthropic

Anthropic adds scheduled background tasks to Claude Code Desktop

Anthropic has added scheduled task functionality to Claude Code Desktop, allowing users to set up recurring automation that runs in the background. The feature enables Claude to perform routine developer operations like checking error logs and creating pull requests for fixable bugs at specified intervals.

March 7, 2026 · 9:50 AM · 1 min read

claude-code-desktop anthropic automation

via the-decoder.com ↗

model release · OpenAI

OpenAI's GPT-5.4 now generally available in GitHub Copilot

OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. The model was tested on real-world software development scenarios and demonstrated improved coding capabilities.

March 5, 2026 · 11:50 PM · 1 min read

openai github-copilot code-generation

via github.blog ↗

product update · Tabnine

Tabnine launches Enterprise Context Engine to ground AI coding in production environments

Tabnine has introduced its Enterprise Context Engine, designed to give AI models the contextual understanding needed to operate safely within real production development environments. The tool addresses a gap between raw model capability and practical enterprise deployment, where understanding an organization's codebase, dependencies, and architecture is critical.

March 5, 2026 · 8:35 PM · 2 min read

code-generation enterprise-ai ai-developer-tools

via tabnine.com ↗

research

WAFFLE fine-tuning improves multimodal models for web development by up to 9 percentage points

Researchers introduce WAFFLE, a fine-tuning methodology that enhances multimodal models' ability to convert UI designs into HTML code. The approach uses structure-aware attention mechanisms and contrastive learning to bridge the gap between visual UI designs and text-based HTML, achieving improvements of up to 9 percentage points on benchmark tasks.

March 5, 2026 · 1:10 AM · 2 min read

research multimodal-models code-generation

via arxiv.org ↗

benchmark · OpenAI

OpenAI says SWE-bench Verified is broken—most tasks reject correct solutions

OpenAI is calling for the retirement of SWE-bench Verified, the widely used AI coding benchmark, claiming that most of its tasks are flawed enough to reject correct solutions. The company also argues that leading AI models have likely seen the answers during training, meaning benchmark scores measure memorization rather than genuine coding ability.

February 23, 2026 · 7:20 PM · 2 min read

benchmarks SWE-bench code-generation

via the-decoder.com ↗

changelog · Anthropic

GitHub deprecates selected Anthropic and OpenAI models from Copilot

GitHub deprecated selected Anthropic and OpenAI models across all Copilot experiences on February 17, 2026. The deprecation affects Copilot Chat, inline edits, ask mode, agent mode, and code completions. Specific model names and transition timelines were not disclosed in the initial announcement.

February 20, 2026 · 9:21 PM · 1 min read

github-copilot model-deprecation anthropic

via github.blog ↗