LLM News

Every LLM release, update, and milestone.

Filtered by:long-form-video✕ clear
research

VideoTemp-o3 combines temporal grounding with video QA in single agentic framework

Researchers have introduced VideoTemp-o3, a unified framework that addresses limitations in long-video understanding by combining temporal grounding and question-answering in a single agentic system. The approach uses a unified masking mechanism during training and reinforcement learning with dedicated reward signals to improve video segment localization and reduce hallucinations.