LLM News

Every LLM release, update, and milestone.

Filtered by: video-understanding

Video AI models hit reasoning ceiling despite 1000x larger dataset, researchers find

An international research team released the largest video reasoning dataset to date—roughly 1,000 times larger than previous alternatives. Testing reveals that state-of-the-art models including Sora 2 and Veo 3.1 substantially underperform humans on reasoning tasks, suggesting the limitation isn't data scarcity but architectural constraints.

2 min read · via the-decoder.com
research

FLoC cuts video AI token load by over 50% without retraining, using a facility location algorithm

Researchers propose FLoC, a training-free visual token compression framework that selects representative subsets of video tokens using facility location algorithms and lazy greedy optimization. The method works with any video-based large multimodal model without requiring retraining, achieving near-optimal compression ratios on benchmarks including Video-MME, MLVU, LongVideoBench, and EgoSchema.
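The core idea of facility-location selection can be sketched compactly. The objective rewards a subset whose members are collectively similar to every token, so a greedy loop that repeatedly adds the token with the largest marginal coverage gain gives a near-optimal solution. This is a minimal illustration, not FLoC's actual implementation: the function name, cosine similarity choice, and plain (non-lazy) greedy loop are all assumptions; the paper uses lazy greedy to speed this step up.

```python
import numpy as np

def facility_location_select(tokens, k):
    """Greedy facility-location selection (hypothetical sketch):
    pick k rows of `tokens` that best cover all rows under cosine similarity."""
    # Normalize rows so the dot product is cosine similarity.
    X = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = X @ X.T                      # sim[i, j] = cos(token_i, token_j)
    n = sim.shape[0]
    cover = np.zeros(n)                # best similarity of each token to the selected set
    selected = []
    for _ in range(k):
        # Marginal gain of candidate j: how much total coverage would improve.
        gains = np.maximum(sim, cover[:, None]).sum(axis=0) - cover.sum()
        gains[selected] = -np.inf      # never pick the same token twice
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sim[:, j])
    return selected

# Example: compress 20 synthetic "tokens" down to 4 representatives.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(20, 8))
subset = facility_location_select(tokens, k=4)
```

Because the facility-location objective is submodular, this greedy procedure carries the classic (1 − 1/e) approximation guarantee, which is why a training-free method can get close to the optimal subset.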

research

VideoTemp-o3 combines temporal grounding with video QA in single agentic framework

Researchers have introduced VideoTemp-o3, a unified framework that addresses limitations in long-video understanding by combining temporal grounding and question-answering in a single agentic system. The approach uses a unified masking mechanism during training and reinforcement learning with dedicated reward signals to improve video segment localization and reduce hallucinations.
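The summary above does not specify VideoTemp-o3's reward signals, but a standard choice for rewarding video segment localization in temporal grounding is temporal IoU between the predicted and ground-truth time spans. The following is a hypothetical sketch of that common reward, not the paper's formulation:

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments in seconds.

    A common reward for temporal grounding: 1.0 means the predicted
    segment exactly matches the ground truth, 0.0 means no overlap.
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction of 0-10s against a ground truth of 5-15s overlaps for 5s
# out of a 15s union, giving an IoU of 1/3.
reward = temporal_iou((0.0, 10.0), (5.0, 15.0))
```

A dense, bounded reward like this is what makes reinforcement learning over segment proposals tractable, since partial overlaps still receive a useful gradient signal.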