multimodal-models

4 articles tagged with multimodal-models

April 11, 2026
benchmark

AI models guess instead of asking for help, ProactiveBench study shows

Researchers introduced ProactiveBench, a benchmark testing whether multimodal language models ask for help when visual information is missing. Of the 22 models tested, including GPT-4.1, GPT-5.2, and o4-mini, almost none proactively requested clarification; instead they hallucinated answers or refused to respond. A reinforcement learning approach showed that models can be trained to ask for help, improving performance from 17.5% to 37-38%, though significant gaps remain.
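The summary doesn't include the paper's training details, but the core idea, rewarding a clarification request only when the question genuinely can't be answered from the image, can be sketched as a toy reward function. Everything below (the function names, the is_answerable flag, the keyword heuristic) is illustrative and not taken from the ProactiveBench paper.

```python
# Illustrative sketch only: a toy reward capturing the idea of training
# a model to ask for help when visual information is missing. Names and
# heuristics are hypothetical, not from the ProactiveBench paper.

def asks_clarification(response: str) -> bool:
    """Crude heuristic: does the response request more information?"""
    cues = ("could you clarify", "can you provide", "which part", "more detail")
    return any(cue in response.lower() for cue in cues)

def reward(response: str, is_answerable: bool, is_correct: bool) -> float:
    """Reward asking for help only when the question cannot be answered
    from the given image; otherwise reward a correct direct answer."""
    if not is_answerable:
        return 1.0 if asks_clarification(response) else -1.0
    return 1.0 if is_correct else -0.5

# An unanswerable question should elicit a clarification request.
print(reward("Could you clarify which object you mean?",
             is_answerable=False, is_correct=False))  # 1.0
```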

April 3, 2026
model release · Zhipu AI

Zhipu AI releases GLM-5V-Turbo: multimodal model generates front-end code from design mockups

Zhipu AI released GLM-5V-Turbo, a multimodal coding model that converts design mockups directly into executable front-end code. The model processes images, video, and text with a 200,000-token context window and a 128,000-token maximum output, priced at $1.20 per million input tokens and $4.00 per million output tokens.
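At those prices, per-request cost is straightforward arithmetic. The sketch below uses only the figures quoted above; the token counts are invented for illustration.

```python
# Cost estimate from the quoted GLM-5V-Turbo pricing:
# $1.20 per million input tokens, $4.00 per million output tokens.
INPUT_PRICE_PER_M = 1.20
OUTPUT_PRICE_PER_M = 4.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return ((input_tokens / 1e6) * INPUT_PRICE_PER_M
            + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M)

# Hypothetical mockup-to-code request: a design image tokenized to
# ~50,000 input tokens, producing ~20,000 tokens of front-end code.
print(f"${request_cost(50_000, 20_000):.2f}")  # $0.14
```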

April 2, 2026
model release · Google DeepMind

Google DeepMind releases Gemma 4 family with 256K context window and multimodal capabilities

Google DeepMind released the Gemma 4 family of open-weights models in four sizes (2.3B to 31B parameters) with multimodal support for text, images, video, and audio. The flagship 31B model achieves 85.2% on MMLU Pro and 89.2% on AIME 2024, with context windows of up to 256K tokens. All models feature configurable reasoning modes and are optimized for deployment from mobile devices to servers under the Apache 2.0 license.
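With open weights under Apache 2.0, local inference should follow the usual Hugging Face pattern. Below is a minimal text-only sketch assuming the checkpoints adopt a naming convention like prior Gemma releases; the model id is a guess, not confirmed by the release.

```python
# Sketch of local text inference with a Gemma 4 checkpoint via Hugging
# Face transformers. The model id "google/gemma-4-2.3b-it" is an assumed
# name following prior Gemma releases, not confirmed by the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-2.3b-it"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Summarize multimodal learning in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```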

March 25, 2026
product update · Amazon Web Services

Amazon Bedrock adds three video analysis workflows for multimodal understanding at scale

Amazon Bedrock has introduced three distinct video analysis workflows that leverage multimodal foundation models to extract insights from video content at scale. The approaches—frame-based, shot-based, and multimodal embedding—are designed for different use cases and cost-performance trade-offs, with open-source reference implementations available on GitHub.
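As a rough illustration of the frame-based approach, the simplest of the three, one might sample frames with OpenCV and pass them to a multimodal model through Bedrock's Converse API. This is a sketch under those assumptions, not the AWS reference implementation; the model id, sampling interval, and file name are placeholders.

```python
# Sketch of a frame-based workflow: sample frames from a video and ask
# a multimodal model on Amazon Bedrock to describe them. Not the AWS
# reference implementation; model id and sampling rate are placeholders.
import boto3
import cv2

def sample_frames(path: str, every_n: int = 90) -> list[bytes]:
    """Grab one JPEG-encoded frame every `every_n` frames (~3 s at 30 fps)."""
    cap = cv2.VideoCapture(path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            ok_enc, buf = cv2.imencode(".jpg", frame)
            if ok_enc:
                frames.append(buf.tobytes())
        i += 1
    cap.release()
    return frames

client = boto3.client("bedrock-runtime")
content = [{"image": {"format": "jpeg", "source": {"bytes": f}}}
           for f in sample_frames("clip.mp4")]
content.append({"text": "Describe what happens across these frames."})

resp = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model
    messages=[{"role": "user", "content": content}],
)
print(resp["output"]["message"]["content"][0]["text"])
```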