LLM News

Every LLM release, update, and milestone.

Filtered by:conversation-degradation✕ clear

benchmarkOpenAI

Frontier LLMs lose up to 33% accuracy in long conversations, study finds

Frontier language models including GPT-5.2 and Claude 4.6 experience accuracy degradation of up to 33% as conversations lengthen, according to new research. The finding suggests that extended context use within a single conversation introduces performance challenges even in state-of-the-art models.

February 28, 2026 · 6:05 PM2 min read

accuracy benchmarks context-window

via the-decoder.com ↗