model release

Alibaba releases Qwen3.5-35B-A3B-FP8, a quantized multimodal model for efficient deployment

TL;DR

Alibaba's Qwen team released Qwen3.5-35B-A3B-FP8 on Hugging Face, a quantized version of their 35-billion parameter multimodal model. The FP8 quantization reduces model size and memory requirements while maintaining the base model's image-text-to-text capabilities. The model is compatible with standard Transformers endpoints and Azure deployment.

March 1, 2026 · 11:20 AM1 min read

Qwen3.5-35B-A3B-FP8 — Quick Specs

Context window262K tokens

Compare Qwen3.5-35B-A3B-FP8 with other models →

Alibaba Releases FP8-Quantized Qwen3.5-35B Multimodal Model

Alibaba's Qwen team has released Qwen3.5-35B-A3B-FP8, an FP8-quantized variant of their 35-billion parameter multimodal model, now available on Hugging Face.

Key Specifications

Qwen3.5-35B-A3B-FP8 is a quantized version of the base Qwen3.5-35B-A3B model, applying 8-bit floating-point quantization to reduce memory footprint and enable faster inference. The model maintains the multimodal capabilities of its parent, supporting image-text-to-text tasks including image understanding and conversational interactions combining visual and textual inputs.

The quantized variant is built on Qwen's Mixture-of-Experts (MoE) architecture, as indicated by the qwen3_5_moe tag. Specific parameter counts for the active model during inference and total MoE parameters are not publicly disclosed.

Deployment and Compatibility

The model is compatible with Hugging Face Transformers pipelines and standard endpoints. Alibaba explicitly lists Azure deployment support, indicating enterprise readiness. The model uses SafeTensors format for efficient loading and distributed across regions including US deployment endpoints.

The release is licensed under Apache 2.0, permitting commercial and research use with standard attribution requirements.

Community Adoption

As of the release date, the model had accumulated 157,725 downloads and 60 community likes on Hugging Face, indicating active interest from developers and researchers building with quantized multimodal systems.

What This Means

Qwen3.5-35B-A3B-FP8 addresses a practical constraint in deploying large multimodal models: memory and compute efficiency. FP8 quantization typically reduces model size by 50% compared to FP16 with minimal accuracy loss, making this variant accessible for deployment on consumer GPUs and cost-constrained cloud infrastructure. The explicit Azure compatibility signals Alibaba's push into enterprise deployment markets where Microsoft partnerships matter. For teams evaluating multimodal models between 30-40B parameters, this quantized release offers a memory-efficient option alongside full-precision variants without requiring specialized quantization expertise.

Source: huggingface.co ↗

qwen alibaba-qwen model-release multimodal quantization fp8 moe hugging-face

model releaseJuly 20, 2026

Alibaba releases Qwen 3.8, a 2.4 trillion parameter open-weight model claiming second place behind Fable 5

Alibaba has released Qwen 3.8, a 2.4 trillion parameter open-weight model that the company claims trails only Fable 5. The multimodal model processes images, videos, and documents, with a preview available through Alibaba's platforms at 10 percent of standard pricing.

model releaseJuly 20, 2026

Moonshot AI Releases Kimi K3: 2.8T Parameter Open Model at $3/$15 Per Million Tokens

Moonshot AI has released Kimi K3, a 2.8 trillion parameter model with 1 million token context window and native multimodal input. The model ranks #1 in Frontend Code Arena and #9 in Text Arena, with pricing at $3 per million input tokens and $15 per million output tokens—comparable to Claude Sonnet 5 pricing while delivering performance the company claims is near Claude Opus 4.8 and GPT-5.5.

model releaseJuly 20, 2026

Thinking Machines releases Inkling: 975B-parameter MoE model with Apache 2.0 license, first major US open-weight multimo

Thinking Machines Lab released Inkling, a mixture-of-experts model with 975B total parameters and 41B active parameters, trained on 45 trillion tokens across text, images, audio, and video. The Apache 2.0-licensed model supports up to 1M context and debuts alongside Inkling-Small (276B-A12B), marking what observers call the strongest US-based open-weight release to date.

model releaseJuly 20, 2026

Meituan launches LongCat 2.0: 1.6T parameter MoE model with 1M+ context window at $0.30 per 1M input tokens

Meituan has released LongCat 2.0, a sparse mixture-of-experts language model with 48 billion active parameters out of 1.6 trillion total. The model features a 1,049,000 token context window and costs $0.30 per 1M input tokens and $1.20 per 1M output tokens.

Alibaba releases Qwen3.5-35B-A3B-FP8, a quantized multimodal model for efficient deployment

Qwen3.5-35B-A3B-FP8 — Quick Specs

Alibaba Releases FP8-Quantized Qwen3.5-35B Multimodal Model

Key Specifications

Deployment and Compatibility

Community Adoption

What This Means

Related Articles

Alibaba releases Qwen 3.8, a 2.4 trillion parameter open-weight model claiming second place behind Fable 5

Moonshot AI Releases Kimi K3: 2.8T Parameter Open Model at $3/$15 Per Million Tokens

Thinking Machines releases Inkling: 975B-parameter MoE model with Apache 2.0 license, first major US open-weight multimo

Meituan launches LongCat 2.0: 1.6T parameter MoE model with 1M+ context window at $0.30 per 1M input tokens

Comments