gpt-5

3 articles tagged with gpt-5

March 26, 2026
benchmark, OpenAI

ARC-AGI-3 benchmark: frontier AI models score below 1%, humans solve all 135 tasks

The ARC Prize Foundation released ARC-AGI-3, an interactive benchmark that requires AI agents to explore environments, form hypotheses, and execute plans without instructions. Untrained humans solved all 135 environments, yet every frontier model tested scored below 1%: Gemini 3.1 Pro Preview (0.37%), GPT 5.4 (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0.00%).

March 17, 2026
product update

DuckDuckGo adds GPT-5 mini and GPT-5.2 reasoning models to Duck.ai privacy chatbot

DuckDuckGo's Duck.ai chatbot platform now includes OpenAI's GPT-5 mini for free users and GPT-5.2 for subscribers, both with reasoning capabilities. The platform continues to anonymize all conversations by default, stripping metadata before routing chats to model providers including Anthropic, Meta, Mistral, and OpenAI.

February 28, 2026
benchmark, OpenAI

Frontier LLMs lose up to 33% accuracy in long conversations, study finds

Frontier language models, including GPT-5.2 and Claude 4.6, lose up to 33% accuracy as conversations lengthen, according to new research. The finding suggests that extended context use within a single conversation degrades performance even in state-of-the-art models.