moe
7 articles tagged with moe
Chroma releases Context-1, a 20B parameter retrieval agent for complex multi-hop search
Chroma has released Context-1, a 20B-parameter mixture-of-experts model trained specifically for retrieval tasks that require multi-hop reasoning. The model decomposes complex queries into subqueries, performs parallel tool calls, and actively prunes its own context mid-search, achieving performance comparable to frontier models at a fraction of the cost, with inference up to 10x faster.
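The announcement ships no code, but the loop it describes maps onto a simple pattern. The sketch below is purely illustrative: decompose, web_search, and prune are hypothetical stand-ins for whatever sub-query generation, retrieval tools, and context-pruning logic Context-1 actually uses.

```python
# Illustrative sketch of a decompose -> parallel tool calls -> prune retrieval loop.
# None of these functions come from Chroma; they are placeholders for the behavior
# the announcement describes.
from concurrent.futures import ThreadPoolExecutor

def decompose(query: str) -> list[str]:
    # A real agent would ask the model for sub-queries; here we fake two hops.
    return [f"background: {query}", f"latest figures: {query}"]

def web_search(subquery: str) -> str:
    # Stand-in for a retrieval tool call (vector store, web search, etc.).
    return f"snippet for '{subquery}'"

def prune(context: list[str], budget: int = 2) -> list[str]:
    # Mid-search pruning: keep only the most recent `budget` snippets
    # instead of letting the context grow with every hop.
    return context[-budget:]

def answer(query: str) -> list[str]:
    context: list[str] = []
    for _hop in range(2):                      # multi-hop loop
        subqueries = decompose(query)
        with ThreadPoolExecutor() as pool:     # parallel tool calls
            context.extend(pool.map(web_search, subqueries))
        context = prune(context)               # actively shrink the working context
    return context

print(answer("GDP growth of Japan vs. South Korea since 2020"))
```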
Rakuten releases RakutenAI-3.0, 671B-parameter Japanese-optimized mixture-of-experts model
Rakuten Group has released RakutenAI-3.0, a 671-billion-parameter mixture-of-experts (MoE) model designed specifically for Japanese language tasks. The model activates 37 billion parameters per token and supports a 128K context window. It is available under the Apache License 2.0 on Hugging Face.
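For readers who want to try it, a minimal Hugging Face loading sketch follows. The repo id is a guess based on the announcement, not a confirmed name, and multi-GPU sharding is assumed: only ~37B parameters are active per token, but all 671B expert weights still have to be resident.

```python
# Minimal loading sketch; the repo id below is an assumption -- check the Rakuten
# organization page on Hugging Face for the exact name. device_map="auto" requires
# the accelerate package and enough GPU memory to shard the full 671B weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Rakuten/RakutenAI-3.0"  # hypothetical repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

inputs = tok("日本語で自己紹介してください。", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```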
Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context
Nvidia has released Nemotron 3 Super, a 120-billion-parameter hybrid Mamba-Transformer mixture-of-experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million-token context window and multi-token prediction, and is priced at $0.10 per million input tokens and $0.50 per million output tokens.
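At those rates, per-request cost is easy to estimate. The quick calculation below uses the listed prices with made-up token counts purely as an example.

```python
# Back-of-the-envelope cost check using the listed prices; token counts are invented.
PRICE_IN = 0.10 / 1_000_000   # dollars per input token
PRICE_OUT = 0.50 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

# e.g. a long-context call with 800K input tokens and 2K output tokens:
print(f"${request_cost(800_000, 2_000):.4f}")   # about $0.0810
```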
NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model designed for text generation and conversational tasks. The model employs a latent mixture-of-experts (MoE) architecture and supports multiple languages including English, French, Spanish, Italian, German, Japanese, and Chinese.
Alibaba releases Qwen3.5-35B-A3B-FP8, a quantized multimodal model for efficient deployment
Alibaba's Qwen team released Qwen3.5-35B-A3B-FP8 on Hugging Face, a quantized version of its 35-billion-parameter multimodal model. The FP8 quantization reduces model size and memory requirements while preserving the base model's image-text-to-text capabilities. The model is compatible with standard Transformers endpoints and Azure deployment.
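A rough way to see what FP8 buys: weights drop from about 2 bytes per parameter in BF16 to about 1 byte. The estimate below covers weights only, ignores KV cache and activations, and assumes all 35B parameters are quantized, so treat it as order-of-magnitude.

```python
# Rough weight-memory estimate: ~1 byte/parameter in FP8 vs ~2 bytes in BF16.
# Only ~3B parameters are active per token, but all 35B must stay in memory.
PARAMS = 35e9  # total parameters

for name, bytes_per_param in [("BF16", 2), ("FP8", 1)]:
    print(f"{name}: ~{PARAMS * bytes_per_param / 1e9:.0f} GB of weights")
# BF16: ~70 GB, FP8: ~35 GB
```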
Alibaba releases Qwen3.5-35B-A3B, a 35B multimodal model with Apache 2.0 license
Alibaba's Qwen team has released Qwen3.5-35B-A3B-Base, a 35-billion-parameter multimodal model supporting image-text-to-text tasks. The model is available under the Apache 2.0 license and is compatible with major inference endpoints, including Azure deployment.
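A usage sketch is below, assuming the model plugs into the transformers image-text-to-text pipeline (available in recent transformers releases). The repo id and image URL are placeholders, and the argument names should be verified against the model card and your installed transformers version.

```python
# Usage sketch only: repo id and image URL are assumptions, and the
# "image-text-to-text" pipeline requires a recent transformers release.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3.5-35B-A3B-Base")  # hypothetical repo id
out = pipe(
    images="https://example.com/chart.png",      # placeholder image
    text="Describe what this chart shows.",
)
print(out)
```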
Liquid AI releases LFM2-24B-A2B, a 24B parameter mixture-of-experts model
Liquid AI has released LFM2-24B-A2B, a 24-billion-parameter mixture-of-experts model designed for text generation and conversational tasks. The model supports nine languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, and Portuguese.