LLM News

Every LLM release, update, and milestone.

Filtered by: mixture-of-experts
research

TSEmbed combines mixture-of-experts with LoRA to scale multimodal embeddings across conflicting tasks

Researchers propose TSEmbed, a multimodal embedding framework that combines Mixture-of-Experts (MoE) with Low-Rank Adaptation (LoRA) to handle task conflicts in universal embedding models. The approach introduces Expert-Aware Negative Sampling (EANS) to improve discriminative power and achieves state-of-the-art results on the Massive Multimodal Embedding Benchmark (MMEB).
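The core combination can be illustrated with a minimal numpy sketch: a shared frozen projection plus per-expert low-rank (LoRA-style) deltas, with a router choosing one expert per input. All names, dimensions, and the top-1 router here are illustrative assumptions, not TSEmbed's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 8, 6, 2, 4

# Shared base projection (frozen, as in LoRA; only the low-rank deltas train).
W = rng.standard_normal((d_out, d_in))

# Per-expert low-rank adapters: delta_e = B[e] @ A[e], with rank << d.
A = rng.standard_normal((n_experts, rank, d_in)) * 0.1
B = rng.standard_normal((n_experts, d_out, rank)) * 0.1

# Hypothetical linear gating network (the paper's router may differ).
G = rng.standard_normal((n_experts, d_in))

def moe_lora_embed(x):
    """Route x to its top-1 expert, apply base + that expert's LoRA delta."""
    e = int(np.argmax(G @ x))            # top-1 expert for this input
    y = W @ x + B[e] @ (A[e] @ x)        # base projection + low-rank update
    return y / np.linalg.norm(y)         # unit-norm embedding

emb = moe_lora_embed(rng.standard_normal(d_in))
```

Because each task family can be absorbed by a different low-rank expert, conflicting tasks stop competing for the same dense weights, which is the conflict-resolution idea the summary describes.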

research

Researchers propose Mixture of Universal Experts to scale MoE models via depth-width transformation

Researchers have introduced Mixture of Universal Experts (MoUE), a generalization of Mixture-of-Experts architectures that adds a new scaling dimension called virtual width. The approach reuses a shared expert pool across layers while maintaining fixed per-token computation, achieving up to 1.3% improvements over standard MoE baselines and enabling 4.2% gains when converting existing MoE checkpoints.
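The layer-sharing idea can be sketched as one universal expert pool reused by every layer, with a per-layer router and fixed top-k per-token compute. This is a toy illustration under assumed shapes; the actual MoUE routing and virtual-width mechanics are more involved.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts, n_layers, top_k = 8, 6, 3, 2

# One shared expert pool for all layers: parameter count grows with the pool
# size, not with n_layers * n_experts, while depth reuses the same experts.
experts = rng.standard_normal((n_experts, d, d)) * 0.1
routers = rng.standard_normal((n_layers, n_experts, d))  # per-layer routers

def moue_forward(x):
    """Run n_layers of top-k routing over the single shared expert pool."""
    for layer in range(n_layers):
        scores = routers[layer] @ x
        top = np.argsort(scores)[-top_k:]        # fixed per-token compute
        w = np.exp(scores[top]); w /= w.sum()    # softmax over selected experts
        x = x + sum(wi * (experts[e] @ x) for wi, e in zip(w, top))
    return x

y = moue_forward(rng.standard_normal(d))
```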

research

ButterflyMoE achieves 150× memory reduction for mixture-of-experts models via geometric rotations

Researchers introduce ButterflyMoE, a technique that replaces independent expert weight matrices with learned geometric rotations applied to a shared quantized substrate. The method reduces memory scaling from linear to sub-linear in the number of experts, achieving 150× compression at 256 experts with negligible accuracy loss on language modeling tasks.
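The memory argument can be made concrete with a small sketch: one quantized substrate stored once, and per-expert weights materialized by applying a few learned Givens rotations (a butterfly-style pairing) to it. The pairing scheme, angle count, and int8 quantization here are assumptions for illustration; storing only angles per expert is what makes memory sub-linear in the expert count.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_experts = 8, 4

# Shared substrate, stored once and quantized to int8. Independent experts
# would need n_experts full copies; here each expert adds only d/2 angles.
W_fp = rng.standard_normal((d, d))
scale = np.abs(W_fp).max() / 127.0
W_q = np.round(W_fp / scale).astype(np.int8)

# Fixed butterfly-style coordinate pairs and per-expert rotation angles.
pairs = [(i, i + d // 2) for i in range(d // 2)]
angles = rng.uniform(-np.pi, np.pi, size=(n_experts, len(pairs)))

def expert_weight(e):
    """Materialize expert e: dequantize the substrate, apply its rotations."""
    W = W_q.astype(np.float32) * scale
    for (i, j), t in zip(pairs, angles[e]):
        c, s = np.cos(t), np.sin(t)
        row_i, row_j = W[i].copy(), W[j].copy()
        W[i], W[j] = c * row_i - s * row_j, s * row_i + c * row_j
    return W

W0, W1 = expert_weight(0), expert_weight(1)
```

Since rotations are orthogonal, each expert reshapes the shared substrate without distorting its norms, which is one plausible reason accuracy survives such aggressive sharing.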

model release

Segmind releases SegMoE, a mixture-of-experts diffusion model for faster image generation

Segmind has released SegMoE, a mixture-of-experts (MoE) diffusion model designed to accelerate image generation while reducing computational overhead. The model applies MoE techniques traditionally used in large language models to the diffusion model architecture, enabling selective expert activation during inference.
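Selective expert activation at inference can be sketched as a toy denoising step in which a router picks a single expert per step and the rest stay idle, so per-step compute is a fraction of the dense model. The update rule and router below are hypothetical stand-ins, not SegMoE's actual diffusion architecture.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_experts = 8, 4

# Expert blocks inside the denoiser; only one runs per step.
experts = rng.standard_normal((n_experts, d, d)) * 0.1
router = rng.standard_normal((n_experts, d))

def moe_denoise_step(x_t, t):
    """One toy denoising step with top-1 selective expert activation."""
    e = int(np.argmax(router @ x_t))     # activate one expert; others are idle
    eps_hat = experts[e] @ x_t           # chosen expert predicts the noise
    return x_t - 0.1 * t * eps_hat       # hypothetical update, for shape only

x_next = moe_denoise_step(np.ones(d), 1.0)
```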