model releaseNVIDIA

NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation

TL;DR

NVIDIA released Nemotron 3 Content Safety 4B, an open-source multimodal safety model designed to moderate content across text, images, and multiple languages. Built on Gemma-3 4B-IT with a 128K context window, the model achieved 84% average accuracy on multimodal safety benchmarks and supports over 140 languages through culturally-aware training data.

2 min read
0

NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation

NVIDIA released Nemotron 3 Content Safety 4B, an open-source multimodal safety classifier designed to moderate text-image combinations across 140+ languages. The model addresses critical gaps in existing safety systems that fail to capture cultural context and multilingual nuance.

Model Specifications

Nemotron 3 Content Safety 4B is built on the Gemma-3 4B-IT vision-language foundation model, featuring a 128K context window and support for over 140 languages. The model uses LoRA adapter fine-tuning to maintain efficiency while adding targeted safety classification behavior.

The model operates in two inference modes: basic binary classification (safe/unsafe for user input and assistant response) and category-rich output that lists specific policy violations aligned with the Aegis AI Content Safety Dataset v2 taxonomy. Safety categories include violence, criminal planning, harassment, self-harm, privacy violations, and jailbreak patterns.

Multimodal and Multilingual Focus

Unlike earlier text-only safety models trained primarily on English, Nemotron 3 Content Safety handles the non-additive complexity of multimodal inputs. For example, a kitchen knife image paired with "great tool for cooking" is safe, while the same image with "I'm going to use this to harm someone" violates policy. The model must also account for cultural shifts in meaning—a religious symbol acceptable in one cultural context may constitute hate speech in another.

Training data includes multilingual content from the proprietary Nemotron Content Safety Dataset v3, human-annotated multimodal data translated into 12 languages (English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean, and Chinese), and safe data from the Nemotron VLM Dataset v2 containing documents and charts.

Synthetic data generation contributed approximately 10% of training data, used to increase response diversity, create jailbreak scenarios, and generate instances where safe inputs produced unsafe responses. Open models including Mixtral 8x 22B, Gemma 3-27B, and Microsoft Phi-4 supported SDG pipelines.

Benchmark Performance

Nemotron 3 Content Safety was evaluated on five established benchmarks: Polyguard, RTP-LX, VLGuard, MM SafetyBench, and Figstep. The model achieved 84% average accuracy (harmful F1 score) on multimodal harmful-content tests, outperforming comparable open safety models. These benchmarks test real-world scenarios including mixed-language conversations, screenshots with embedded text, and cases where meaning requires text-image interpretation.

What this means

NVIDIA's release addresses a concrete gap: existing content safety models struggle with non-English prompts and fail to process images and text jointly. The 4B parameter size and open-source availability make this accessible to enterprises deploying multilingual AI agents without relying on proprietary safety APIs. The 84% F1 score represents state-of-the-art performance for an open-source model at this scale, though organizations should still validate on their specific use cases and languages. For teams building applications in non-English markets or handling visual content, this represents a meaningful alternative to larger, closed-source moderation systems.

Related Articles

model release

IBM Releases Granite Embedding 311M R2 With 32K Context, 200+ Language Support

IBM released Granite Embedding 311M Multilingual R2, a 311-million parameter dense embedding model with 32,768-token context length and support for 200+ languages. The model scores 64.0 on Multilingual MTEB Retrieval (18 tasks), an 11.8-point improvement over its predecessor, and ships with ONNX and OpenVINO models for production deployment.

model release

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.

model release

IBM Releases Granite 4.1 30B With 131K Context Window and Enhanced Tool-Calling

IBM released Granite 4.1 30B, a 30-billion parameter instruction-following model with a 131,072 token context window. The model scores 80.16 on MMLU 5-shot and 88.41 on HumanEval pass@1, with enhanced tool-calling capabilities following OpenAI's function definition schema.

model release

IBM Releases Granite 4.1 8B with 131K Context Window at $0.05/M Input Tokens

IBM has released Granite 4.1 8B, an 8-billion-parameter decoder-only language model with a 131,072-token context window. The model supports 12 languages and costs $0.05 per million input tokens and $0.10 per million output tokens, available under the Apache 2.0 license.

Comments

Loading...