LLM News

Every LLM release, update, and milestone.

Filtered by: moe
research

Timer-S1: 8.3B time series foundation model achieves state-of-the-art forecasting on GIFT-Eval

Researchers have introduced Timer-S1, a Mixture-of-Experts time series foundation model with 8.3 billion total parameters and 750 million activated parameters per token. The model achieves state-of-the-art forecasting performance on the GIFT-Eval leaderboard, with the best MASE and CRPS scores among pre-trained models.
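The headline numbers reflect sparse expert activation: each token only runs a small subset of the expert parameters. The article does not disclose Timer-S1's expert count or routing top-k, so the figures below are placeholders; the sketch only shows the bookkeeping behind a "total vs. activated parameters" claim.

```python
def moe_param_counts(shared, expert_size, num_experts, top_k):
    """Rough parameter bookkeeping for a sparse Mixture-of-Experts model.

    shared       -- parameters outside the expert pool (attention, embeddings, routers)
    expert_size  -- parameters in a single expert MLP
    num_experts  -- total experts in the pool
    top_k        -- experts actually run for each token
    """
    total = shared + num_experts * expert_size
    active_per_token = shared + top_k * expert_size
    return total, active_per_token

# Placeholder numbers, not Timer-S1's published configuration: with a large
# expert pool and a small top_k, the per-token activated count stays far
# below the total parameter count.
total, active = moe_param_counts(shared=5e8, expert_size=6.5e7, num_experts=120, top_k=4)
print(f"total ≈ {total / 1e9:.1f}B, active per token ≈ {active / 1e9:.2f}B")
```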

2 min read · via arxiv.org
research

Researchers propose Mixture of Universal Experts to scale MoE models via depth-width transformation

Researchers have introduced Mixture of Universal Experts (MoUE), a generalization of Mixture-of-Experts architectures that adds a new scaling dimension called virtual width. The approach reuses a shared expert pool across layers while maintaining fixed per-token computation, achieving up to 1.3% improvements over standard MoE baselines and enabling 4.2% gains when converting existing MoE checkpoints.
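The summary does not give MoUE's exact formulation, so the following is only a toy sketch of the idea it describes: a single expert pool shared by every layer, with each layer keeping its own router, so depth reuses the same expert width ("virtual width") while per-token compute stays fixed at top-k experts. All sizes and class names below are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertPool(nn.Module):
    """One pool of expert MLPs reused by every layer."""
    def __init__(self, d_model=64, d_ff=256, num_experts=16):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

class PoolRoutedLayer(nn.Module):
    """A layer owns only its router; the experts come from the shared pool."""
    def __init__(self, pool, d_model=64, k=2):
        super().__init__()
        self.pool, self.k = pool, k
        self.router = nn.Linear(d_model, len(pool.experts))

    def forward(self, x):                           # x: (tokens, d_model)
        topk_val, topk_idx = self.router(x).topk(self.k, dim=-1)
        gates = F.softmax(topk_val, dim=-1)         # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.pool.experts):
                mask = topk_idx[:, slot] == e       # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += gates[mask, slot:slot + 1] * expert(x[mask])
        return x + out                              # per-token cost stays at k experts, regardless of depth

pool = SharedExpertPool()
layers = nn.ModuleList(PoolRoutedLayer(pool) for _ in range(6))  # 6 layers, one expert pool
h = torch.randn(16, 64)
for layer in layers:
    h = layer(h)
print(h.shape)  # torch.Size([16, 64])
```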

model release

Alibaba releases Qwen3.5-35B-A3B-FP8, a quantized multimodal model for efficient deployment

Alibaba's Qwen team released Qwen3.5-35B-A3B-FP8 on Hugging Face, a quantized version of their 35-billion-parameter multimodal model. The FP8 quantization reduces model size and memory requirements while preserving the base model's image-text-to-text capabilities. The model is compatible with standard Transformers endpoints and Azure deployment.
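Since the announcement says the model works with standard Transformers endpoints, here is a minimal local-loading sketch, assuming the checkpoint follows the generic image-text-to-text pattern in recent Transformers versions. The org prefix, Auto classes, and chat-template call are assumptions, not taken from the model card.

```python
# Minimal loading sketch. Only the model name comes from the announcement; the
# "Qwen/" org prefix, the Auto classes, and the chat-template call are assumptions
# that may need adjusting for this release.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "Qwen/Qwen3.5-35B-A3B-FP8"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the quantized weights as shipped where the backend supports it
    device_map="auto",    # spread layers across available GPUs
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample.png"},  # placeholder image URL
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
new_tokens = generated[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```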

1 min read · via huggingface.co