LLM News

Every LLM release, update, and milestone.

benchmark

OmniVideoBench: New 1,000-question benchmark exposes gaps in audio-visual AI reasoning

Researchers have introduced OmniVideoBench, a large-scale evaluation framework of 1,000 manually verified question-answer pairs drawn from 628 videos ranging from a few seconds to 30 minutes in length, designed to measure synergistic audio-visual reasoning in multimodal large language models (MLLMs). Testing reveals a significant performance gap between open-source and closed-source MLLMs on genuine cross-modal reasoning tasks.

research

Researchers develop data synthesis method to improve multimodal AI reasoning on charts and documents

A new research paper proposes COGS (COmposition-Grounded data Synthesis), a framework that decomposes questions into primitive perception and reasoning factors to generate synthetic training data. The method substantially improves multimodal model performance on chart reasoning and document understanding tasks with minimal human annotation.
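The paper supplies the actual pipeline; purely to illustrate the idea, the sketch below crosses perception primitives with reasoning primitives to emit grounded question-answer pairs. Every name, factor, and schema here is invented for the example and is not taken from COGS.

```python
from itertools import product

# Hypothetical factor inventories; in the paper, primitives are obtained
# by decomposing real seed questions, not written by hand like this.
PERCEPTION_FACTORS = ["read a bar's value", "read an axis label"]
REASONING_FACTORS = ["compute a difference", "identify the maximum"]

def synthesize_questions(chart_facts):
    """Cross perception and reasoning primitives into synthetic QA pairs.

    `chart_facts` is assumed to map a (perception, reasoning) pair to a
    grounded (question, answer) tuple extracted from a rendered chart;
    the schema is illustrative only.
    """
    dataset = []
    for factors in product(PERCEPTION_FACTORS, REASONING_FACTORS):
        grounding = chart_facts.get(factors)
        if grounding is None:
            continue  # skip factor combinations the chart cannot ground
        question, answer = grounding
        dataset.append({
            "question": question,
            "answer": answer,
            "factors": list(factors),  # provenance for filtering/curricula
        })
    return dataset
```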

benchmark

New benchmark evaluates music reward models trained on text, lyrics, and audio

Researchers have released CMI-RewardBench, a comprehensive evaluation framework for music reward models that handle mixed text, lyrics, and audio inputs. The benchmark includes 110,000 pseudo-labeled samples and human-annotated data, along with publicly available reward models designed for fine-grained music generation alignment.
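Reward-model benchmarks of this kind typically score pairwise preferences: the reward model should rate the preferred generation above the rejected one. A generic sketch of that protocol follows; the triple format and `score` interface are assumptions, not CMI-RewardBench's published setup.

```python
def preference_accuracy(pairs, reward_model):
    """Fraction of pairs where the reward model ranks the preferred
    music clip above the rejected one.

    `pairs` is assumed to yield (prompt, chosen, rejected) triples,
    where the prompt bundles the text/lyrics conditioning and the two
    clips are audio; `reward_model.score` is a hypothetical interface.
    """
    correct, total = 0, 0
    for prompt, chosen, rejected in pairs:
        if reward_model.score(prompt, chosen) > reward_model.score(prompt, rejected):
            correct += 1
        total += 1
    return correct / total
```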

benchmark

UniG2U-Bench reveals unified multimodal models underperform VLMs on most tasks

A new comprehensive benchmark called UniG2U-Bench evaluates whether generation capabilities improve multimodal understanding, covering 30+ models. The findings show that unified multimodal models generally underperform specialized Vision-Language Models, and that generation-then-answer inference degrades performance in most cases, though spatial reasoning and multi-round tasks show consistent improvements.
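Here, "generation-then-answer" means the model first produces an intermediate image of its own and then answers conditioned on it, instead of answering directly. A schematic of the two inference modes, with every method name hypothetical:

```python
def direct_answer(model, image, question):
    # Standard VLM-style inference: answer straight from the input image.
    return model.answer(images=[image], prompt=question)

def generate_then_answer(model, image, question):
    """Two-stage inference of the kind the benchmark probes (schematic).

    The unified model first renders an intermediate image, then answers
    with that image in context; `generate_image` and `answer` are
    hypothetical method names, not a real API.
    """
    sketch = model.generate_image(
        prompt=f"Draw what is needed to answer: {question}",
        reference=image,
    )
    return model.answer(images=[image, sketch], prompt=question)
```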

benchmark

CFE-Bench: New STEM reasoning benchmark reveals frontier models struggle with multi-step logic

Researchers introduced CFE-Bench (Classroom Final Exam), a multimodal benchmark built from authentic university homework and exam problems across 20+ STEM domains to evaluate LLM reasoning capabilities. Gemini 3.1 Pro Preview achieved the highest score at 59.69% accuracy, while analysis revealed that frontier models frequently fail to maintain correct intermediate states in multi-step solutions.

2 min read · via arxiv.org
research

MedXIAOHE: New medical vision-language model claims state-of-the-art performance on clinical benchmarks

Researchers have published MedXIAOHE, a medical multimodal foundation model designed for clinical applications. According to the authors, the model achieves state-of-the-art performance across diverse medical benchmarks and surpasses several closed-source multimodal systems on multiple capability dimensions.

model release

Alibaba releases Qwen3.5-35B-A3B-FP8, a quantized multimodal model for efficient deployment

Alibaba's Qwen team released Qwen3.5-35B-A3B-FP8 on Hugging Face, a quantized version of their 35-billion-parameter multimodal model. The FP8 quantization cuts model size and memory requirements while preserving the base model's image-text-to-text capabilities. The model is compatible with standard Transformers endpoints and Azure deployment.

1 min read · via huggingface.co
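For orientation, here is a minimal loading sketch using Hugging Face Transformers. The repo id comes from the announcement; the Auto classes and message schema are assumptions modeled on how other Qwen image-text-to-text checkpoints are served, so check the model card before relying on them.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Repo id from the announcement; class choice and chat-message schema
# are assumptions based on other Qwen vision-language checkpoints.
model_id = "Qwen/Qwen3.5-35B-A3B-FP8"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",  # let the FP8 checkpoint choose its own dtypes
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/figure.png"},
        {"type": "text", "text": "Describe this figure."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(generated[0], skip_special_tokens=True))
```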