model releaseNVIDIA

Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context

TL;DR

Nvidia has released Nemotron 3.5 Content Safety, a 4-billion parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. The model is available for free, supports 128K token context windows, and moderates content across 12 languages.

June 4, 2026 · 2:50 PM2 min read

Nemotron 3.5 Content Safety — Quick Specs

Context window128K tokens

Compare Nemotron 3.5 Content Safety with other models →

Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context

Nvidia has released Nemotron 3.5 Content Safety, a 4-billion parameter multimodal guardrail model now available for free through OpenRouter. The model is fine-tuned from Google's Gemma-3-4B and designed specifically for content moderation in AI applications.

Technical Specifications

The model accepts both text and image inputs and returns text output including:

Safe/unsafe classification for user prompts and responses
Safety category labels
Optional reasoning trace when toggled

Nemotron 3.5 Content Safety supports a 128K token context window and covers 12 languages, though the specific languages have not been disclosed by Nvidia.

Use Cases

According to Nvidia, the model is designed for:

Prompt and response moderation for LLMs and VLMs
Content classification
Safety pipelines
Enterprise AI guardrails with policy enforcement

The model includes a toggleable reasoning mode that provides explanations for its safety decisions, allowing developers to understand why content was flagged.

Model Family

Nemotron 3.5 Content Safety is part of Nvidia's broader Nemotron family of open models for agentic AI. The compact 4B parameter size makes it suitable for deployment in resource-constrained environments while maintaining multimodal capabilities.

Availability

The model is currently available through OpenRouter at no cost. OpenRouter automatically routes requests to providers capable of handling the prompt size and parameters, with fallbacks to maximize uptime.

Nvidia has made model weights available, though specific hosting details and direct API access information have not been disclosed.

What This Means

The release of a free, compact multimodal guardrail model addresses a critical need in AI deployment: content safety at scale without licensing costs. At 4B parameters, it's small enough for practical deployment while the 128K context window allows it to evaluate longer conversations and documents. The multimodal capability is particularly significant, as few free safety models can process both text and images. However, without published benchmark scores or safety evaluation metrics, enterprises will need to test the model against their specific use cases to validate its effectiveness compared to existing solutions.

Source: openrouter.ai ↗

nvidia content-safety guardrails multimodal gemma free-model moderation nemotron

model releaseJuly 14, 2026

Google releases Gemma 4 E2B, optimized to run natively on Pixel 10's Tensor G5 TPU

Google has released Gemma 4 E2B for TPU, a variant of its open-source Gemma 4 model optimized to run natively on the Tensor G5 chip in Pixel 10 devices. The multimodal model enables completely offline AI chat, image recognition, and audio transcription on Pixel 10, 10 Pro, 10 Pro XL, and 10 Pro Fold.

product updateJuly 17, 2026

NVIDIA NeMo Automodel integrates with Hugging Face Diffusers for distributed video and image model fine-tuning

NVIDIA and Hugging Face have integrated NeMo Automodel with the Diffusers library, enabling distributed fine-tuning of video and image diffusion models without checkpoint conversion. The integration supports models including FLUX.1-dev (12B), Wan 2.1 (1.3B/14B), and HunyuanVideo (13B) with full fine-tuning and LoRA options.

benchmarkJuly 16, 2026

NVIDIA Nemotron 3 Embed 8B Tops RTEB Leaderboard with 78.5% Score, 1B Variant Cuts Error Rate 27%

NVIDIA's Nemotron-3-Embed-8B-BF16 ranks #1 on the RTEB leaderboard with a 78.5% score, while the 1B variant reduces error rate by 27% over its predecessor. The open-weight models feature 32k context windows and production-ready deployment options including a Blackwell-optimized NVFP4 variant.

model releaseJuly 16, 2026

Thinking Machines Lab releases Inkling: 975B-parameter open-weights multimodal model under Apache-2.0

Thinking Machines Lab released Inkling, a Mixture-of-Experts transformer with 975B total parameters and 41B active parameters, trained on 45 trillion tokens of text, images, audio and video. The Apache-2.0 licensed model is designed as a base for fine-tuning rather than a frontier model.

Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context

Nemotron 3.5 Content Safety — Quick Specs

Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context

Technical Specifications

Use Cases

Model Family

Availability

What This Means

Related Articles

Google releases Gemma 4 E2B, optimized to run natively on Pixel 10's Tensor G5 TPU

NVIDIA NeMo Automodel integrates with Hugging Face Diffusers for distributed video and image model fine-tuning

NVIDIA Nemotron 3 Embed 8B Tops RTEB Leaderboard with 78.5% Score, 1B Variant Cuts Error Rate 27%

Thinking Machines Lab releases Inkling: 975B-parameter open-weights multimodal model under Apache-2.0

Comments