LLM News | TPS

benchmark OpenAI

Video AI models hit reasoning ceiling despite 1000x larger dataset, researchers find

An international research team has released the largest video reasoning dataset to date, roughly 1,000 times larger than previous alternatives. Testing shows that state-of-the-art models, including Sora 2 and Veo 3.1, substantially underperform humans on reasoning tasks, suggesting the limitation lies in architectural constraints rather than data scarcity.

March 7, 2026 · 8:50 AM · 2 min read

video-ai reasoning benchmark

via the-decoder.com