Qwen3.5-Omni

Alibaba / Qwen · 🇨🇳 China
Status: active
Context window: 256K tokens

Version History

3.5 (minor)

Qwen3.5-Omni builds on Qwen3-Omni with an 8x larger context window (32K to 256K tokens), 6x broader language support (11 to 74 languages), a hybrid attention-MoE architecture, and ARIA token interleaving for improved real-time speech synthesis. It also demonstrates an emergent ability to generate code from spoken and video input.
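
The release notes don't describe how ARIA interleaving works internally. Below is a minimal sketch of the general idea behind text/audio token interleaving for streaming synthesis, assuming a fixed interleave ratio; the stream names and the `(1, 4)` ratio are illustrative assumptions, not details from the announcement.

```python
from itertools import islice
from typing import Iterator, Tuple

def interleave_streams(
    text_tokens: Iterator[int],
    audio_tokens: Iterator[int],
    ratio: Tuple[int, int] = (1, 4),  # hypothetical ratio, not from the release
) -> Iterator[Tuple[str, int]]:
    """Yield (modality, token) pairs, alternating blocks of text and
    audio-codec tokens so a speech decoder can start synthesizing
    before the full text response has been generated.

    ratio=(1, 4) means: for every 1 text token, emit 4 audio tokens.
    The real ARIA scheme is not documented; this is a generic sketch.
    """
    n_text, n_audio = ratio
    while True:
        text_block = list(islice(text_tokens, n_text))
        audio_block = list(islice(audio_tokens, n_audio))
        if not text_block and not audio_block:
            return  # both streams exhausted
        for tok in text_block:
            yield ("text", tok)
        for tok in audio_block:
            yield ("audio", tok)

# Toy demo with dummy token ids standing in for real model output.
text = iter(range(100, 105))    # 5 "text" tokens
audio = iter(range(200, 220))   # 20 "audio codec" tokens
for modality, tok in interleave_streams(text, audio):
    print(modality, tok)
```

Whatever the actual mechanism, the usual motivation for interleaving like this is latency: the audio decoder can consume codec tokens as they arrive instead of waiting for the full text turn to finish.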

Coverage

model release

Alibaba's Qwen3.5-Omni learns to write code from speech and video without explicit training

Alibaba has released Qwen3.5-Omni, an omnimodal model handling text, images, audio, and video with a 256,000-token context window. The model reportedly outperforms Google's Gemini 3.1 Pro on audio tasks and supports 74 languages in speech recognition, a 6x increase over its predecessor. An unexpected emergent capability: writing working code from spoken instructions and video input, a behavior the team says it did not explicitly train for.
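
The coverage doesn't include usage code; the following is a hypothetical sketch of prompting the model with a spoken instruction, assuming a transformers-style AutoProcessor/AutoModel interface like earlier Qwen-Omni releases. The checkpoint name `Qwen/Qwen3.5-Omni`, the processor's audio keyword arguments, and the prompt text are all assumptions, not confirmed API.

```python
# Hypothetical sketch: spoken instruction -> generated code.
# Checkpoint name and audio kwargs are assumptions, not confirmed API.
import librosa
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "Qwen/Qwen3.5-Omni"  # hypothetical checkpoint name

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"
)

# A spoken request, e.g. "write a Python function that reverses a list".
waveform, sr = librosa.load("spoken_instruction.wav", sr=16000)

inputs = processor(
    text="Transcribe the request in the audio and answer it with code.",
    audio=waveform,          # assumed kwarg; the model card would be authoritative
    sampling_rate=sr,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The exact audio arguments would follow the released model card; the point is only that the spoken request is passed in as audio features alongside a text prompt, and the reported emergent behavior is that the model answers with runnable code.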
