LLM News

Every LLM release, update, and milestone.

Filtered by: qwen
research

Reinforcement fine-tuning preserves model knowledge better than supervised fine-tuning, study finds

A new study on Qwen2.5-VL reveals reinforcement fine-tuning (RFT) significantly outperforms supervised fine-tuning (SFT) at preserving a model's existing knowledge during post-training adaptation. While SFT enables faster task learning, it causes catastrophic forgetting; RFT learns more slowly but maintains prior knowledge by reinforcing samples naturally aligned with the base model's probability landscape.
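A minimal sketch of the two objectives being contrasted (illustrative placeholders, not the study's code): SFT minimizes cross-entropy against fixed labels, while RFT reweights the model's own samples by a task reward, which keeps updates close to what the base model already assigns probability to.

```python
import torch.nn.functional as F

def sft_loss(logits, target_ids):
    # Supervised fine-tuning: push the model toward externally provided labels,
    # regardless of how likely those tokens were under the base model.
    return F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

def rft_loss(sampled_logprobs, rewards):
    # Reinforcement fine-tuning (REINFORCE-style sketch): reweight the model's
    # own samples by a task reward, so learning concentrates on outputs the
    # base model's probability landscape already supports.
    return -(rewards * sampled_logprobs).mean()
```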

research

Researchers identify 'Lazy Attention' problem in multimodal AI training, boost reasoning by 7%

A new arXiv paper identifies a critical flaw in how multimodal large reasoning models are initialized for training: they fail to properly attend to visual tokens, a phenomenon the researchers call Lazy Attention Localization. The team proposes AVAR, a framework that corrects this through visual-anchored data synthesis and attention-guided objectives, achieving an average 7% improvement across seven multimodal reasoning benchmarks when applied to Qwen2.5-VL-7B.
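The symptom itself is easy to express as code: the share of attention mass that queries place on image tokens. The sketch below is illustrative only; the function names, mask convention, and penalty form are assumptions, not AVAR's implementation.

```python
import torch

def visual_attention_mass(attn, visual_mask):
    # attn: (batch, heads, query_len, key_len) attention probabilities
    # visual_mask: (batch, key_len) bool, True where the key position is an image token
    # Returns the average share of attention placed on visual tokens;
    # a low value is the symptom described as Lazy Attention Localization.
    mass_on_visual = (attn * visual_mask[:, None, None, :].float()).sum(dim=-1)
    return mass_on_visual.mean()

def attention_guidance_penalty(attn, visual_mask, floor=0.3):
    # Hypothetical attention-guided auxiliary term: penalize the visual attention
    # share falling below a target floor (illustrative, not the paper's loss).
    return torch.relu(floor - visual_attention_mass(attn, visual_mask))
```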

research

Spectral Surgery: training-free method improves LoRA adapters without retraining

Researchers propose Spectral Surgery, a training-free refinement method that improves Low-Rank Adaptation (LoRA) adapters by decomposing trained weights via SVD and selectively reweighting singular values based on gradient-estimated component sensitivity. The approach achieves consistent gains across Llama-3.1-8B and Qwen3-8B—up to +4.4 points on CommonsenseQA and +2.4 pass@1 on HumanEval—by adjusting only ~1,000 scalar coefficients.
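The core recipe lends itself to a compact sketch. The reweighting rule and the sensitivity input below are placeholders, since the paper's exact formulas are not given in this summary.

```python
import torch

def spectral_surgery(lora_A, lora_B, sensitivity, scale=0.1):
    # lora_A: (r, in_features), lora_B: (out_features, r), sensitivity: (r,)
    # Merge the adapter, decompose it with SVD, then rescale only the singular
    # values according to each component's estimated importance.
    r = lora_A.shape[0]
    delta_w = lora_B @ lora_A                                  # rank <= r weight update
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    U, S, Vh = U[:, :r], S[:r], Vh[:r]                         # only the top-r components are non-trivial
    gains = 1.0 + scale * torch.tanh(sensitivity)              # placeholder reweighting rule
    return U @ torch.diag(S * gains) @ Vh                      # refined weight update
```

Only the r singular values per adapted matrix are touched, which is how the total number of adjusted scalars stays on the order of a thousand across a full model.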

research

Research reveals LLMs internalize logic as geometric flows in representation space

A new geometric framework demonstrates that LLMs internalize logical reasoning as smooth flows—embedding trajectories—in their representation space, rather than merely pattern-matching. The research, which tests logic across different semantic contexts, suggests next-token prediction training alone can produce higher-order geometric structures that encode logical invariants.
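One way to picture an "embedding trajectory" is to track a token's hidden state layer by layer. The probe below is an illustrative sketch of that idea under that assumption, not the paper's framework.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def embedding_trajectory(model_name, prompt):
    # Track the final token's hidden state through the layers and measure how
    # smoothly it moves (high consecutive-step cosine similarity ~ smoother flow).
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states
    traj = torch.stack([h[0, -1] for h in hidden])   # (num_layers + 1, hidden_dim)
    steps = traj[1:] - traj[:-1]
    smoothness = torch.nn.functional.cosine_similarity(steps[1:], steps[:-1], dim=-1)
    return traj, smoothness
```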

model release

Alibaba releases Qwen3.5-35B-A3B-FP8, a quantized multimodal model for efficient deployment

Alibaba's Qwen team released Qwen3.5-35B-A3B-FP8 on Hugging Face, a quantized version of their 35-billion-parameter multimodal model. The FP8 quantization reduces model size and memory requirements while maintaining the base model's image-text-to-text capabilities. The model is compatible with standard Transformers endpoints and Azure deployment.
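A minimal loading sketch with the Transformers library, assuming the repo id matches the article's naming and the checkpoint exposes the standard image-text-to-text interface (both are assumptions; check the model card for the exact usage):

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "Qwen/Qwen3.5-35B-A3B-FP8"  # repo id inferred from the article; verify on Hugging Face

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

# Chat-template content keys ("image"/"url") vary between processor versions; this
# follows the common recent-Transformers convention for vision-language models.
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/photo.jpg"},
    {"type": "text", "text": "Describe this image."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```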

1 min read · via huggingface.co