NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation
NVIDIA has released Nemotron 3 Content Safety 4B, an open-source multimodal safety classifier that moderates text and image content across more than 140 languages. Built on Gemma-3 4B-IT with a 128K context window, the model achieves an average harmful-content F1 score of 84% on multimodal safety benchmarks and targets a gap in existing safety systems, which often miss cultural context and multilingual nuance.
Model Specifications
Nemotron 3 Content Safety 4B is built on the Gemma-3 4B-IT vision-language foundation model, featuring a 128K context window and support for over 140 languages. The model uses LoRA adapter fine-tuning to maintain efficiency while adding targeted safety classification behavior.
The model operates in two inference modes: basic binary classification (safe/unsafe for user input and assistant response) and category-rich output that lists specific policy violations aligned with the Aegis AI Content Safety Dataset v2 taxonomy. Safety categories include violence, criminal planning, harassment, self-harm, privacy violations, and jailbreak patterns.
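The exact response format is not spelled out here, so the following is a minimal sketch of consuming the category-rich mode, assuming a Llama Guard / Aegis-style JSON verdict; the real Nemotron output schema and field names may differ.

```python
import json

def parse_safety_output(raw: str) -> dict:
    """Parse a category-rich safety verdict.

    Assumes a JSON response shaped like the Aegis/Llama Guard style, e.g.
    {"User Safety": "unsafe", "Response Safety": "safe",
     "Safety Categories": "Violence, Criminal Planning"}.
    The actual Nemotron output format is an assumption here.
    """
    verdict = json.loads(raw)
    categories = verdict.get("Safety Categories", "")
    return {
        # Binary verdicts for the two sides of the conversation
        "user_safe": verdict.get("User Safety", "").lower() == "safe",
        "response_safe": verdict.get("Response Safety", "").lower() == "safe",
        # Violated policy categories, split into a clean list
        "categories": [c.strip() for c in categories.split(",") if c.strip()],
    }

sample = ('{"User Safety": "unsafe", "Response Safety": "safe", '
          '"Safety Categories": "Violence, Criminal Planning"}')
print(parse_safety_output(sample))
```

In basic binary mode, only the two safe/unsafe fields would be present and the category list would come back empty.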
Multimodal and Multilingual Focus
Unlike earlier text-only safety models trained primarily on English, Nemotron 3 Content Safety handles the non-additive complexity of multimodal inputs. For example, a kitchen knife image paired with "great tool for cooking" is safe, while the same image with "I'm going to use this to harm someone" violates policy. The model must also account for cultural shifts in meaning—a religious symbol acceptable in one cultural context may constitute hate speech in another.
Training data includes multilingual content from the proprietary Nemotron Content Safety Dataset v3, human-annotated multimodal data translated into 12 languages (English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean, and Chinese), and safe data from the Nemotron VLM Dataset v2 containing documents and charts.
Synthetic data generation (SDG) contributed approximately 10% of the training data and was used to increase response diversity, create jailbreak scenarios, and generate instances where safe inputs produced unsafe responses. Open models including Mixtral 8x22B, Gemma 3 27B, and Microsoft Phi-4 powered the SDG pipelines.
Benchmark Performance
Nemotron 3 Content Safety was evaluated on five established benchmarks: PolyGuard, RTP-LX, VLGuard, MM-SafetyBench, and FigStep. It achieved an average harmful-content F1 score of 84% on multimodal harmful-content tests, outperforming comparable open safety models. These benchmarks cover real-world scenarios including mixed-language conversations, screenshots with embedded text, and cases where the meaning only emerges from joint text-image interpretation.
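The headline number is an F1 score with "harmful" as the positive class, not plain accuracy. A short self-contained sketch of that metric on toy labels (1 = harmful, 0 = safe) makes the distinction concrete:

```python
def harmful_f1(y_true: list[int], y_pred: list[int]) -> float:
    """F1 score treating 'harmful' (1) as the positive class,
    the convention the benchmark numbers above report."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # of flagged items, how many were harmful
    recall = tp / (tp + fn)     # of harmful items, how many were flagged
    return 2 * precision * recall / (precision + recall)

# Toy example: 4 harmful and 4 safe items; one miss, one false alarm
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(round(harmful_f1(y_true, y_pred), 3))  # → 0.75
```

Unlike accuracy, this score is unaffected by padding the test set with easy safe examples, which is why safety benchmarks report it.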
What this means
NVIDIA's release addresses a concrete gap: existing content safety models struggle with non-English prompts and fail to process images and text jointly. The 4B parameter size and open-source availability make this accessible to enterprises deploying multilingual AI agents without relying on proprietary safety APIs. The 84% F1 score represents state-of-the-art performance for an open-source model at this scale, though organizations should still validate on their specific use cases and languages. For teams building applications in non-English markets or handling visual content, this represents a meaningful alternative to larger, closed-source moderation systems.
Related Articles
NVIDIA releases Nemotron 3 Super: 120B MoE model with 1M token context
NVIDIA has released Nemotron 3 Super, a 120-billion parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million token context window, multi-token prediction capabilities, and pricing at $0.10 per million input tokens and $0.50 per million output tokens.
NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter text generation model featuring a latent Mixture-of-Experts (MoE) architecture. The model supports 8 languages including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face with 8-bit quantization support through NVIDIA's ModelOpt toolkit.
NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model designed for text generation and conversational tasks. The model employs a latent mixture-of-experts (MoE) architecture and supports multiple languages including English, French, Spanish, Italian, German, Japanese, and Chinese.
Rakuten releases RakutenAI-3.0, 671B-parameter Japanese-optimized mixture-of-experts model
Rakuten Group has released RakutenAI-3.0, a 671 billion parameter mixture-of-experts (MoE) model designed specifically for Japanese language tasks. The model activates 37 billion parameters per token and supports a 128K context window. It is available under the Apache License 2.0 on Hugging Face.