omnimodal
1 article tagged with omnimodal
March 31, 2026
model release
Alibaba's Qwen3.5-Omni learns to write code from speech and video without explicit training
Alibaba has released Qwen3.5-Omni, an omnimodal model that handles text, images, audio, and video with a 256,000-token context window. The model reportedly outperforms Google's Gemini 3.1 Pro on audio tasks and supports 74 languages in speech recognition, a 6x increase from its predecessor. It also shows an unexpected emergent capability: writing working code from spoken instructions and video input, a skill the team did not explicitly train.