LLM News

Every LLM release, update, and milestone.

Filtered by: agentic-ai

model release · OpenAI

OpenAI's GPT-5.4 now generally available in GitHub Copilot

OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. In testing on real-world software development scenarios, the model demonstrated improved coding capabilities.

March 5, 2026 · 11:50 PM · 1 min read

openai github-copilot code-generation

via github.blog ↗

model release · OpenAI

OpenAI launches GPT-5.4 with native computer use capabilities for autonomous agents

OpenAI has launched GPT-5.4, its latest model with native computer use capabilities that allow it to operate computers and complete tasks across applications. The release represents a step toward autonomous AI agents that can handle complex jobs independently. The model includes advancements in reasoning, coding, and professional work with spreadsheets, documents, and presentations.

March 5, 2026 · 6:06 PM · 1 min read

gpt-5-4 openai computer-use

via theverge.com ↗

benchmark

AMA-Bench reveals major gaps in LLM agent memory systems with real-world evaluation

Researchers introduce AMA-Bench, a benchmark for evaluating long-horizon memory in LLM-based autonomous agents using real-world trajectories and synthetic scaling. Existing memory systems underperform because they lack causal information and rely on lossy similarity-based retrieval. The proposed AMA-Agent system, which uses causality graphs and tool-augmented retrieval, achieves 57.22% accuracy, outperforming baselines by 11.16 percentage points.

March 5, 2026 · 5:10 AM · 2 min read

benchmarks agents memory-systems

via arxiv.org ↗

research

RAPO framework improves LLM agent reasoning by combining retrieval with reinforcement learning

Researchers introduce RAPO (Retrieval-Augmented Policy Optimization), a reinforcement learning framework that improves LLM agent reasoning by incorporating off-policy retrieval signals during training. The method achieves an average 5.0% performance gain across fourteen datasets and 1.2x higher training efficiency than existing agentic RL approaches.

March 5, 2026 · 1:51 AM · 2 min read

reinforcement-learning llm-agents agentic-ai

via arxiv.org ↗

research

Code agents can evolve math problems into harder variants, study finds

A new study demonstrates that code agents can autonomously evolve existing math problems into more complex, solvable variations through systematic exploration. The multi-agent framework addresses a critical bottleneck in training advanced LLMs toward IMO-level mathematical reasoning by providing a scalable mechanism for synthesizing high-difficulty problems.

March 5, 2026 · 1:38 AM · 2 min read

research code-agents mathematics

via arxiv.org ↗

research

VideoTemp-o3 combines temporal grounding with video QA in single agentic framework

Researchers have introduced VideoTemp-o3, a unified framework that addresses limitations in long-video understanding by combining temporal grounding and question-answering in a single agentic system. The approach uses a unified masking mechanism during training and reinforcement learning with dedicated reward signals to improve video segment localization and reduce hallucinations.

March 5, 2026 · 12:51 AM · 2 min read

video-understanding temporal-grounding long-form-video

via arxiv.org ↗

product update

AIG deploys agentic AI system with orchestration layer for underwriting

American International Group (AIG) has deployed an agentic AI system with an orchestration layer, reporting faster-than-expected productivity gains in underwriting and portfolio management. The deployment demonstrates measurable improvements in throughput and workflow efficiency, according to recent investor disclosures.

February 20, 2026 · 4:37 AM · 2 min read

agentic-ai insurance enterprise-ai

via artificialintelligence-news.com ↗