model release

Rakuten releases RakutenAI-3.0, 671B-parameter Japanese-optimized mixture-of-experts model

TL;DR

Rakuten Group has released RakutenAI-3.0, a 671 billion parameter mixture-of-experts (MoE) model designed specifically for Japanese language tasks. The model activates 37 billion parameters per token and supports a 128K context window. It is available under the Apache License 2.0 on Hugging Face.


Rakuten Releases 671B Parameter Model Optimized for Japanese

Rakuten Group has published RakutenAI-3.0, a 671 billion parameter mixture-of-experts language model engineered for Japanese language understanding and generation. The model activates 37 billion parameters per token and supports a 128,000 token context window.

Technical Specifications

The model uses a mixture-of-experts architecture, a design pattern that maintains computational efficiency by selectively activating only a subset of parameters for each input token. RakutenAI-3.0 is trained on a combination of publicly available open-source data and Rakuten's proprietary bilingual Japanese-English datasets.
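
As a rough illustration of the idea only (this is not Rakuten's published architecture; the expert count, layer sizes, and top-k value below are arbitrary toy values), a top-k MoE layer can be sketched as follows:

    # Minimal top-k mixture-of-experts layer in PyTorch, illustrative only.
    # Each token is routed to k of n_experts feed-forward blocks, so only a
    # fraction of the layer's parameters runs per token -- the same principle
    # that lets a 671B-parameter model activate roughly 37B parameters per token.
    import torch
    import torch.nn as nn

    class TopKMoE(nn.Module):
        def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, n_experts)   # scores each token against each expert
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x):                             # x: [tokens, d_model]
            scores = self.router(x)                       # [tokens, n_experts]
            weights, idx = scores.topk(self.k, dim=-1)    # keep only the k best experts per token
            weights = weights.softmax(dim=-1)             # normalize over the selected experts
            out = torch.zeros_like(x)
            for slot in range(self.k):                    # run only the chosen experts
                for e in idx[:, slot].unique().tolist():
                    mask = idx[:, slot] == e
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
            return out

    print(TopKMoE()(torch.randn(4, 64)).shape)            # torch.Size([4, 64])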

Key specifications:

  • Total parameters: 671 billion
  • Active parameters per token: 37 billion
  • Context window: 128,000 tokens
  • Supported languages: Japanese and English
  • Model formats: F32, BF16, and F8_E4M3 weight variants available (see the loading sketch after this list)
  • License: Apache License 2.0
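
A minimal local-loading sketch for the BF16 variant might look like the following; the repository id is a placeholder and the exact loading options should be taken from the model card:

    # Hedged sketch: loading the BF16 weight variant with Hugging Face transformers.
    # "Rakuten/RakutenAI-3.0" is a placeholder repo id -- check the actual model card.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Rakuten/RakutenAI-3.0"  # placeholder / assumption

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # BF16 variant; the F8_E4M3 variant needs FP8-capable kernels
        device_map="auto",            # shard layers across the available GPUs
    )

    prompt = "楽天グループについて簡単に説明してください。"  # "Briefly describe Rakuten Group."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))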

Deployment and Access

RakutenAI-3.0 is available on Hugging Face for download and local deployment; it has recorded 425 downloads in its first month on the platform. The company provides inference instructions for SGLang, recommending a tensor parallelism degree of 8 and a static memory fraction of 0.85.
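
For orientation, those recommendations map onto SGLang's launch flags roughly as follows; the repository id and port are placeholders, and the exact command should be taken from the model card:

    # Hedged sketch: querying a locally launched SGLang server configured with the
    # documented settings (tensor parallelism 8, static memory fraction 0.85).
    # Launch first (placeholder repo id; 30000 is SGLang's default port):
    #   python -m sglang.launch_server --model-path Rakuten/RakutenAI-3.0 \
    #       --tp 8 --mem-fraction-static 0.85
    from openai import OpenAI

    # SGLang exposes an OpenAI-compatible API under /v1.
    client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="Rakuten/RakutenAI-3.0",  # placeholder id
        messages=[{"role": "user", "content": "日本語で自己紹介してください。"}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)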

No official inference API or hosted endpoints have been announced. The model card indicates the model is not currently deployed by commercial inference providers.

Positioning

Rakuten positions RakutenAI-3.0 as delivering "superior grasp of Japanese language and culture" compared to existing models. The emphasis on Japanese-optimized training reflects increasing focus by regional technology companies on language-specific LLMs, following similar releases from companies like Alibaba (Qwen) and Baidu.

Limitations

Rakuten's documentation explicitly acknowledges that, like other large language models, RakutenAI-3.0 can generate biased, inaccurate, or unsafe outputs. The company recommends implementing appropriate safeguards for production deployments.
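
What counts as an appropriate safeguard is application-specific; as a purely illustrative sketch (not taken from Rakuten's documentation), one simple pattern is to wrap generation behind a post-hoc output check:

    # Illustrative only: a trivial post-generation guard of the kind production
    # deployments layer around a model. Real systems typically use a dedicated
    # moderation model or service rather than a keyword blocklist.
    BLOCKED_TERMS = ["placeholder-unsafe-term"]  # hypothetical, application-specific list

    def guarded_generate(generate_fn, prompt: str) -> str:
        """Run generate_fn and suppress responses that trip the blocklist."""
        text = generate_fn(prompt)
        if any(term in text.lower() for term in BLOCKED_TERMS):
            return "[response withheld by content filter]"
        return text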

What This Means

Rakuten's entry into open-source Japanese-optimized LLMs signals sustained competition in regional language models. At 671B parameters with a 128K context window, it competes in scale with existing open models but targets a specific linguistic niche. The Apache 2.0 license and community release suggest Rakuten is prioritizing ecosystem participation over proprietary monetization, similar to Meta's approach with Llama. The model's availability only through local deployment (no hosted API) limits accessibility for developers without substantial compute resources.

Related Articles

model release

Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks

Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.

model release

Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction

Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to 2x decoding speedup through speculative decoding. The model uses 3.8B active parameters from a 25.2B total parameter MoE architecture with 128 experts and a 256K token context window.

model release

IBM Releases Granite Embedding 311M R2 With 32K Context, 200+ Language Support

IBM released Granite Embedding 311M Multilingual R2, a 311-million parameter dense embedding model with 32,768-token context length and support for 200+ languages. The model scores 64.0 on Multilingual MTEB Retrieval (18 tasks), an 11.8-point improvement over its predecessor, and ships with ONNX and OpenVINO models for production deployment.

model release

Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters

Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.
