NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation
NVIDIA released Nemotron 3 Content Safety 4B, an open-source multimodal safety model designed to moderate content across text, images, and multiple languages. Built on Gemma-3 4B-IT with a 128K context window, the model achieved 84% average accuracy on multimodal safety benchmarks and supports over 140 languages through culturally-aware training data.
The model is a safety classifier for text-image combinations across 140+ languages, and it targets gaps in existing safety systems that fail to capture cultural context and multilingual nuance.
Model Specifications
Nemotron 3 Content Safety 4B is built on the Gemma-3 4B-IT vision-language foundation model, featuring a 128K context window and support for over 140 languages. The model uses LoRA adapter fine-tuning to maintain efficiency while adding targeted safety classification behavior.
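The efficiency claim for LoRA comes down to simple arithmetic: instead of updating a full weight matrix, LoRA trains two small low-rank factors. The sketch below counts trainable parameters for one layer. The layer dimensions and rank are hypothetical values chosen for illustration; the article does not state the adapter configuration NVIDIA used.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a low-rank update B @ A for a d_out x d_in weight matrix."""
    # B has shape (d_out, rank) and A has shape (rank, d_in).
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


# Hypothetical layer size and rank, not taken from the model card.
d_out, d_in, rank = 2048, 2048, 16

lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
print(f"LoRA trains {lora} of {full} weights ({lora / full:.2%})")
```

With these assumed numbers, the adapter trains a small fraction of the layer's weights while the base model stays frozen.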
The model operates in two inference modes: basic binary classification (safe/unsafe for user input and assistant response) and category-rich output that lists specific policy violations aligned with the Aegis AI Content Safety Dataset v2 taxonomy. Safety categories include violence, criminal planning, harassment, self-harm, privacy violations, and jailbreak patterns.
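To make the two modes concrete, here is a minimal sketch of how an application might represent and parse a verdict. The text format (`User Safety`, `Response Safety`, `Safety Categories` fields), the `SafetyVerdict` class, and the `parse_verdict` helper are all assumptions made for illustration; the article does not describe the model's actual output format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SafetyVerdict:
    user_safe: bool
    response_safe: Optional[bool] = None  # None when no assistant response was judged
    categories: List[str] = field(default_factory=list)  # stays empty in basic mode


def parse_verdict(raw: str) -> SafetyVerdict:
    """Parse a hypothetical 'Key: value' verdict, one field per line."""
    fields = {}
    for line in raw.strip().splitlines():
        if ":" not in line:
            continue  # skip lines that are not key/value pairs
        key, value = line.split(":", 1)
        fields[key.strip().lower()] = value.strip()

    if "user safety" not in fields:
        raise ValueError("missing 'User Safety' field")

    user_safe = fields["user safety"].lower() == "safe"
    response = fields.get("response safety")
    response_safe = None if response is None else response.lower() == "safe"
    categories = [c.strip() for c in fields.get("safety categories", "").split(",") if c.strip()]
    return SafetyVerdict(user_safe, response_safe, categories)


# Basic mode: binary verdicts only.
basic = parse_verdict("User Safety: safe\nResponse Safety: safe")

# Category-rich mode: the same verdicts plus the violated categories.
rich = parse_verdict(
    "User Safety: unsafe\nResponse Safety: safe\nSafety Categories: Violence, Criminal Planning"
)
print(basic)
print(rich)
```

Keeping the category list optional lets the same structure serve both modes, so downstream code can branch on `user_safe` first and inspect `categories` only when it needs the detail.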
Multimodal and Multilingual Focus
Unlike earlier text-only safety models trained primarily on English, Nemotron 3 Content Safety handles the non-additive complexity of multimodal inputs: whether an image-text pair is safe cannot be determined by judging each part separately. For example, a kitchen knife image paired with "great tool for cooking" is safe, while the same image with "I'm going to use this to harm someone" violates policy. The model must also account for cultural shifts in meaning. A religious symbol acceptable in one cultural context may constitute hate speech in another.
Training data includes multilingual content from the proprietary Nemotron Content Safety Dataset v3, human-annotated multimodal data translated into 12 languages (English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean, and Chinese), and safe data from the Nemotron VLM Dataset v2 containing documents and charts.
Synthetic data generation (SDG) contributed approximately 10% of training data. It was used to increase response diversity, create jailbreak scenarios, and generate instances where safe inputs produced unsafe responses. Open models including Mixtral 8x22B, Gemma 3 27B, and Microsoft Phi-4 supported the SDG pipelines.
Benchmark Performance
Nemotron 3 Content Safety was evaluated on five established benchmarks: Polyguard, RTP-LX, VLGuard, MM SafetyBench, and Figstep. The model averaged 84% on multimodal harmful-content tests, a figure reported as harmful F1 score rather than plain accuracy, and outperformed comparable open safety models. These benchmarks test real-world scenarios including mixed-language conversations, screenshots with embedded text, and cases where meaning depends on interpreting text and image together.
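For readers unfamiliar with the metric, the sketch below shows how a harmful-class F1 score is computed and averaged across benchmarks. The benchmark names and counts are invented for illustration and are not NVIDIA's results; the article also does not say whether the 84% figure is a simple mean across benchmarks, so the averaging step is an assumption.

```python
from statistics import mean


def harmful_f1(tp: int, fp: int, fn: int) -> float:
    """F1 score with 'harmful' treated as the positive class."""
    if tp == 0:
        return 0.0  # no harmful item was caught, so precision and recall are both zero
    precision = tp / (tp + fp)  # share of flagged items that were truly harmful
    recall = tp / (tp + fn)     # share of harmful items that were flagged
    return 2 * precision * recall / (precision + recall)


# Illustrative (true positive, false positive, false negative) counts only.
counts = {
    "benchmark_a": (80, 20, 20),
    "benchmark_b": (90, 10, 30),
}

scores = {name: harmful_f1(*c) for name, c in counts.items()}
average = mean(scores.values())

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print(f"average harmful F1: {average:.3f}")
```

Correctly passed safe items (true negatives) never enter the formula, which is why a harmful F1 score can differ substantially from accuracy on the same data.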
What this means
NVIDIA's release addresses a concrete gap: existing content safety models struggle with non-English prompts and fail to process images and text jointly. The 4B-parameter size and open-source availability make the model accessible to enterprises deploying multilingual AI agents without relying on proprietary safety APIs. The 84% F1 score represents state-of-the-art performance for an open-source model at this scale, though organizations should still validate it on their own use cases and languages. For teams building applications in non-English markets or handling visual content, it is a meaningful alternative to larger, closed-source moderation systems.