Alibaba releases Qwen3.5-9B, a multimodal 9B parameter model

Alibaba has released Qwen3.5-9B, a 9-billion-parameter multimodal language model that processes both images and text. The model is available on Hugging Face under the Apache 2.0 license, with an architecture compatible with the transformers library.

Alibaba's Qwen team has released Qwen3.5-9B, a 9-billion parameter multimodal model designed for image-text-to-text tasks. The model arrived on Hugging Face on February 27, 2026.

Model Specifications

Qwen3.5-9B is a multimodal language model that accepts both image and text inputs, classifying it as an image-text-to-text model. A base variant is published as Qwen/Qwen3.5-9B-Base, and the released model appears to be fine-tuned from it for improved performance on downstream tasks.

The model uses a standard transformer architecture and is distributed in the SafeTensors format, compatible with the Hugging Face transformers library. According to the model card, it also supports inference endpoints.

Licensing and Availability

Qwen3.5-9B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution with attribution. The model is hosted on Hugging Face, where it had received 65 likes and 10 downloads shortly after release.

Technical Details

The model carries the qwen3_5 tag and is labeled conversational, indicating it is designed for dialogue and interactive use. Its 9-billion-parameter scale places it in the lightweight-to-mid-range segment, suitable for deployment on consumer hardware and edge devices with moderate GPU resources.

The pipeline tag specifies image-text-to-text functionality, meaning the model accepts images and text as input and generates text output. This multimodal capability differentiates it from text-only models of similar size.
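In practice, an image-text-to-text pipeline tag suggests the model can be driven through the Hugging Face transformers `pipeline` API with chat-style messages that mix image and text content. The sketch below is a minimal, hypothetical example under that assumption: the message format is the one transformers multimodal pipelines generally accept, but exact pipeline support for this particular model, and the helper names `build_messages` and `describe_image`, are illustrative assumptions rather than details from the model card.

```python
from typing import Any


def build_messages(image_url: str, question: str) -> list[dict[str, Any]]:
    # Chat-format input used by transformers multimodal pipelines:
    # a single user turn carrying both an image and a text prompt.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def describe_image(
    image_url: str,
    question: str,
    model_id: str = "Qwen/Qwen3.5-9B",  # model id from the article; exact repo name unverified
) -> str:
    # Heavy step: downloads the ~9B-parameter weights on first use,
    # so this requires substantial disk space and GPU memory.
    from transformers import pipeline  # deferred import; requires `transformers`

    pipe = pipeline("image-text-to-text", model=model_id)
    out = pipe(text=build_messages(image_url, question), max_new_tokens=64)
    return out[0]["generated_text"]
```

Deferring the transformers import keeps the message-building helper usable (for example, to inspect or log the prompt structure) without pulling in the model weights.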

Context and Positioning

Qwen3.5-9B represents Alibaba's continued expansion in the open-source language model space. The Qwen series has established itself as a competitive alternative to models from OpenAI, Google, and other providers, particularly for developers requiring open-source options with permissive licensing.

The model's size and architecture suggest it is aimed at scenarios where compute efficiency matters: fine-tuning, deployment in resource-constrained environments, and use cases where a small model footprint is an advantage. Multimodal capability at the 9B scale addresses demand for models that handle both vision and language tasks without the resource overhead of larger multimodal systems.

What this means

Qwen3.5-9B extends Alibaba's open-source model lineup with a multimodal option in an accessible size range. For practitioners, it is another option for running image-text understanding locally or at minimal compute cost. The Apache 2.0 license removes licensing barriers for commercial applications, putting the model in direct competition with similarly sized releases from other providers. Specific performance characteristics, including benchmark scores, context length, and inference speed, have not been published, so empirical comparison is necessary before adoption decisions.