frontier-models

5 articles tagged with frontier-models

March 26, 2026
benchmarkOpenAI

ARC-AGI-3 benchmark: frontier AI models score below 1%, humans solve all 135 tasks

The ARC Prize Foundation released ARC-AGI-3, an interactive benchmark requiring AI agents to explore environments, form hypotheses, and execute plans without instructions. All 135 environments were solved by untrained humans, yet frontier models—including Gemini 3.1 Pro Preview (0.37%), GPT 5.4 (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0.00%)—scored below 1%.

March 14, 2026
fundingNVIDIA

Nvidia to spend $26B on open-weight AI models, filing reveals

Nvidia will invest $26 billion over the next five years to build open-weight AI models, according to a 2025 financial filing confirmed by executives. The move signals a strategic shift from chipmaker to AI frontier lab, with the company releasing Nemotron 3 Super (128B parameters) and claiming it outperforms GPT-OSS on multiple benchmarks.

March 5, 2026
model releaseOpenAI

OpenAI releases GPT-5.4 with Pro and Thinking variants for professional use

OpenAI has launched GPT-5.4, which the company describes as its most capable and efficient frontier model for professional work. The release includes Pro and Thinking variants, though specific technical specifications and pricing remain unclear.

February 28, 2026
benchmarkOpenAI

Frontier LLMs lose up to 33% accuracy in long conversations, study finds

Frontier language models including GPT-5.2 and Claude 4.6 experience accuracy degradation of up to 33% as conversations lengthen, according to new research. The finding suggests that extended context use within a single conversation introduces performance challenges even in state-of-the-art models.

February 20, 2026
model release

Alibaba Qwen 3.5 closes performance gap with proprietary models at lower inference cost

Alibaba has released the Qwen 3.5 series, an open-source model that claims performance comparable to frontier proprietary models while running on commodity hardware. The release signals a shift in AI model economics, offering enterprises lower inference costs and greater deployment flexibility than closed alternatives.