LLM News

Every LLM release, update, and milestone.

Filtered by: moe
research

Timer-S1: 8.3B time series foundation model achieves state-of-the-art forecasting on GIFT-Eval

Researchers have introduced Timer-S1, a Mixture-of-Experts time series foundation model with 8.3 billion total parameters and 750 million activated parameters per token. The model achieves state-of-the-art forecasting performance on the GIFT-Eval leaderboard, with the best MASE and CRPS scores among pre-trained models.
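The headline numbers reflect sparse expert activation: each token only runs a small subset of the expert parameters. The article does not disclose Timer-S1's expert count or routing top-k, so the figures below are placeholders; the sketch only shows the bookkeeping behind a "total vs. activated parameters" claim.

```python
def moe_param_counts(shared, expert_size, num_experts, top_k):
    """Rough parameter bookkeeping for a sparse Mixture-of-Experts model.

    shared       -- parameters outside the expert pool (attention, embeddings, routers)
    expert_size  -- parameters in a single expert MLP
    num_experts  -- total experts in the pool
    top_k        -- experts actually run for each token
    """
    total = shared + num_experts * expert_size
    active_per_token = shared + top_k * expert_size
    return total, active_per_token

# Placeholder numbers, not Timer-S1's published configuration: with a large
# expert pool and a small top_k, the per-token activated count stays far
# below the total parameter count.
total, active = moe_param_counts(shared=5e8, expert_size=6.5e7, num_experts=120, top_k=4)
print(f"total ≈ {total / 1e9:.1f}B, active per token ≈ {active / 1e9:.2f}B")
```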

2 min read · via arxiv.org
research

Researchers propose Mixture of Universal Experts to scale MoE models via depth-width transformation

Researchers have introduced Mixture of Universal Experts (MoUE), a generalization of Mixture-of-Experts architectures that adds a new scaling dimension called virtual width. The approach reuses a shared expert pool across layers while maintaining fixed per-token computation, achieving up to 1.3% improvements over standard MoE baselines and enabling 4.2% gains when converting existing MoE checkpoints.
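The summary does not give MoUE's exact formulation, so the following is only a toy sketch of the idea it describes: a single expert pool shared by every layer, with each layer keeping its own router, so depth reuses the same expert width ("virtual width") while per-token compute stays fixed at top-k experts. All sizes and class names below are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertPool(nn.Module):
    """One pool of expert MLPs reused by every layer."""
    def __init__(self, d_model=64, d_ff=256, num_experts=16):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

class PoolRoutedLayer(nn.Module):
    """A layer owns only its router; the experts come from the shared pool."""
    def __init__(self, pool, d_model=64, k=2):
        super().__init__()
        self.pool, self.k = pool, k
        self.router = nn.Linear(d_model, len(pool.experts))

    def forward(self, x):                           # x: (tokens, d_model)
        topk_val, topk_idx = self.router(x).topk(self.k, dim=-1)
        gates = F.softmax(topk_val, dim=-1)         # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.pool.experts):
                mask = topk_idx[:, slot] == e       # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += gates[mask, slot:slot + 1] * expert(x[mask])
        return x + out                              # per-token cost stays at k experts, regardless of depth

pool = SharedExpertPool()
layers = nn.ModuleList(PoolRoutedLayer(pool) for _ in range(6))  # 6 layers, one expert pool
h = torch.randn(16, 64)
for layer in layers:
    h = layer(h)
print(h.shape)  # torch.Size([16, 64])
```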

model release

Alibaba releases Qwen3.5-35B-A3B-FP8, a quantized multimodal model for efficient deployment

Alibaba's Qwen team released Qwen3.5-35B-A3B-FP8 on Hugging Face, a quantized version of their 35-billion-parameter multimodal model. The FP8 quantization reduces model size and memory requirements while preserving the base model's image-text-to-text capabilities. The model is compatible with standard Transformers endpoints and Azure deployment.
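Since the announcement says the model works with standard Transformers endpoints, here is a minimal local-loading sketch, assuming the checkpoint follows the generic image-text-to-text pattern in recent Transformers versions. The org prefix, Auto classes, and chat-template call are assumptions, not taken from the model card.

```python
# Minimal loading sketch. Only the model name comes from the announcement; the
# "Qwen/" org prefix, the Auto classes, and the chat-template call are assumptions
# that may need adjusting for this release.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "Qwen/Qwen3.5-35B-A3B-FP8"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the quantized weights as shipped where the backend supports it
    device_map="auto",    # spread layers across available GPUs
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample.png"},  # placeholder image URL
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
new_tokens = generated[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```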

1 min read · via huggingface.co