video understanding

4 articles tagged with video understanding

May 12, 2026
model release

Perceptron Launches Mk1 Vision-Language Model with Video Reasoning at $0.15/$1.50 per 1M Tokens

Perceptron has released Perceptron Mk1, a vision-language model designed for video understanding and embodied reasoning tasks. The model accepts image and video inputs with 33K context window, priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, and supports structured spatial annotations on demand.

May 2, 2026
model releaseNVIDIA

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.

April 29, 2026
model releaseNVIDIA

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA released Nemotron 3 Nano Omni, a 31B parameter (30B active, 3B per token) multimodal model supporting video, audio, image, and text inputs. The model features a 256K token context window, reasoning mode with chain-of-thought, and tool calling capabilities.

model releaseNVIDIA+1

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

NVIDIA has released Nemotron 3 Nano Omni 30B-A3B, a multimodal large language model with 31 billion parameters using a Mamba2-Transformer hybrid Mixture of Experts architecture. The model supports video, audio, image, and text inputs with a 256K token context window and includes a dedicated reasoning mode with chain-of-thought capabilities.