claude-4

1 article tagged with claude-4

February 28, 2026
benchmarkOpenAI

Frontier LLMs lose up to 33% accuracy in long conversations, study finds

Frontier language models including GPT-5.2 and Claude 4.6 experience accuracy degradation of up to 33% as conversations lengthen, according to new research. The finding suggests that extended context use within a single conversation introduces performance challenges even in state-of-the-art models.