Alibaba releases Qwen3.5-27B, a 27B multimodal model with Apache 2.0 license
Alibaba Qwen has released Qwen3.5-27B, a 27-billion parameter model capable of processing both images and text. The model is available under an Apache 2.0 open license and is compatible with standard transformer endpoints.
Alibaba's Qwen team has published Qwen3.5-27B, a 27-billion parameter model designed to handle both image and text inputs. The release marks the latest iteration in Alibaba's open-source model lineup.
Model Specifications
Qwen3.5-27B is a multimodal model with an architecture supporting image-text-to-text tasks. The model carries an Apache 2.0 license, making it freely available for both research and commercial use. It is compatible with standard transformer endpoints and follows the safetensors format for model weights.
At 27 billion parameters, the model sits in the mid-range segment: larger than models such as Mistral 7B, but well below the 70B class of instruction-tuned variants. This size targets deployment scenarios where computational resources are constrained but model capability remains a priority.
Capability Profile
Qwen3.5-27B carries tags for conversational tasks and multimodal understanding, indicating it can hold a dialogue while processing images alongside text prompts. The image-text-to-text classification means the model accepts combined image and text inputs and generates text responses.
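As a rough illustration, an image-text-to-text input typically pairs an image with a text prompt in a single chat turn. The sketch below uses the chat-message schema common to many Hugging Face image-text-to-text models; the exact format Qwen3.5-27B expects is an assumption, so consult the model card before relying on it.

```python
# Hypothetical single-turn image+text payload. The field names ("role",
# "content", "type", "url", "text") follow the common Hugging Face
# multimodal chat-message convention, not a confirmed Qwen3.5-27B schema.

def build_message(image_url: str, question: str) -> list[dict]:
    """Return a one-turn conversation pairing one image with a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_message("https://example.com/chart.png", "Summarize this chart.")
print(messages[0]["content"][1]["text"])  # → Summarize this chart.
```

A payload like this is what a chat template would flatten into the model's actual prompt tokens before generation.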
Specific benchmark scores, training data composition, knowledge cutoff date, and maximum context window length have not been disclosed in the initial release metadata.
Availability and Licensing
The model is hosted on Hugging Face and is immediately available for download. The Apache 2.0 license removes legal barriers to commercial deployment, distinguishing this release from many restricted-license models. Support for standard transformer inference frameworks means existing tooling can run the model without custom implementations.
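If the model follows the standard transformers conventions the release describes, loading it should look like any other image-text-to-text checkpoint. This is a minimal sketch under two assumptions: the Hub repo ID is "Qwen/Qwen3.5-27B" (not confirmed in the release), and the model registers with the transformers image-text-to-text pipeline. It requires `pip install transformers accelerate`.

```python
# Minimal loading sketch. MODEL_ID is an assumed repo ID, not confirmed
# by the release metadata.

MODEL_ID = "Qwen/Qwen3.5-27B"  # assumed Hugging Face Hub repo ID

def load_pipeline(model_id: str = MODEL_ID):
    """Build an image-text-to-text pipeline; weights download on first call."""
    from transformers import pipeline  # imported lazily: weights are large
    return pipeline("image-text-to-text", model=model_id, device_map="auto")

# Usage (downloads the full 27B-parameter weights on first run):
#   pipe = load_pipeline()
#   print(pipe(text="Describe this image.", images="photo.jpg"))
```

Because the weights ship as safetensors and the model targets standard endpoints, no custom loading code should be needed beyond an up-to-date transformers install.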
No pricing information or commercial hosting details have been announced.
What This Means
Qwen3.5-27B extends Alibaba's open-weight lineup into the mid-range multimodal segment, where it competes with offerings from Mistral, Meta, and others, as well as Qwen's own larger variants. The 27B parameter count targets developers who need multimodal capability without the compute overhead of 70B+ models, and the Apache 2.0 license removes deployment friction compared to restricted models. However, without disclosed benchmarks or performance data, comparative positioning against competing 27-30B multimodal models remains unclear. Organizations evaluating this model should establish baselines on their specific use cases before production deployment.
Related Articles
Tencent Releases HY-World 2.0: Open-Source Multi-Modal Model Generates 3D Worlds from Text and Images
Tencent has released HY-World 2.0, an open-source multi-modal world model that generates navigable 3D environments from text prompts, single images, multi-view images, or video. The model produces editable 3D assets including meshes and 3D Gaussian Splattings that can be directly imported into game engines like Unity and Unreal Engine.
OpenAI Releases GPT-5.4 Image 2 with 272K Context Window and Image Generation
OpenAI has released GPT-5.4 Image 2, combining the GPT-5.4 reasoning model with image generation capabilities. The multimodal model features a 272K token context window and is priced at $8 per million input tokens and $15 per million output tokens.
OpenAI releases ChatGPT Images 2.0 with 3840x2160 resolution at $30 per 1M output tokens
OpenAI released ChatGPT Images 2.0, pricing output tokens at $30 per million with maximum resolution of 3840x2160 pixels. CEO Sam Altman claims the improvement from gpt-image-1 to gpt-image-2 equals the jump from GPT-3 to GPT-5.
OpenAI announces gpt-image-2 model with improved text rendering and UI generation
OpenAI is set to announce gpt-image-2, its next-generation image generation model, on April 21, 2026 at 12pm PT. The company's teaser demonstrates improved capabilities in rendering text and generating realistic user interfaces from text prompts.