LLM News | TPS

OmniVideoBench: New 1,000-question benchmark exposes gaps in audio-visual AI reasoning

Researchers have introduced OmniVideoBench, a large-scale evaluation framework designed to measure synergistic audio-visual reasoning in multimodal large language models (MLLMs). It comprises 1,000 manually verified question-answer pairs derived from 628 videos, which range in length from seconds to 30 minutes. Testing reveals a significant performance gap between open-source and closed-source MLLMs on genuine cross-modal reasoning tasks.

March 6, 2026 · 5:21 AM · 2 min read

benchmark multimodal video_understanding

via arxiv.org