model release

Meta releases Llama Guard 4, a 12B parameter multimodal safety classifier with 164K context window

TL;DR

Meta has released Llama Guard 4, a 12-billion parameter content safety classifier derived from Llama 4 Scout. The model features a 163,840 token context window and can classify both text and image content, available free through OpenRouter with an August 31, 2024 knowledge cutoff.

2 min read
0

Meta Releases Llama Guard 4 for Multimodal Content Safety

Meta has released Llama Guard 4, a 12-billion parameter content safety classifier designed to moderate both text and image content in LLM applications.

Model Specifications

Llama Guard 4 is derived from Meta's Llama 4 Scout model and features a 163,840 token context window. The model is available free through OpenRouter with $0 per million tokens for both input and output. Its knowledge cutoff date is August 31, 2024.

Key Capabilities

The model classifies content safety in two modes: prompt classification (analyzing user inputs) and response classification (analyzing LLM outputs). According to Meta, it generates text output indicating whether content is safe or unsafe, listing specific violated content categories when unsafe content is detected.

Llama Guard 4 is aligned to the standardized MLCommons hazards taxonomy. The model supports English and multiple additional languages, though specific language lists were not disclosed.

Multimodal Features

The primary advancement over previous Llama Guard versions is multimodal capability. Llama Guard 4 can process mixed text-and-image prompts, including multiple images in a single request. This positions it as Meta's first safety classifier capable of handling the full range of inputs supported by multimodal Llama 4 models.

Integration and Availability

Meta has integrated Llama Guard 4 into the Llama Moderations API, providing safety classification for both text and images. The model is currently available through OpenRouter's routing infrastructure, which directs requests to providers based on prompt size and parameters.

Model weights are accessible, though specific hosting and licensing details were not provided in the release information.

What This Means

Llama Guard 4 addresses a critical gap in AI safety tooling by providing multimodal content moderation at no cost. As LLMs increasingly handle image inputs alongside text, safety systems must match these capabilities. The 164K context window is particularly relevant for applications that need to classify long conversations or multiple images simultaneously. Meta's alignment to the MLCommons taxonomy provides standardization that could improve interoperability across different AI safety systems.

Related Articles

model release

NVIDIA Releases Nemotron 3.5 Content Safety: 4B-Parameter Multimodal Model with Custom Policy Enforcement and 140-Langua

NVIDIA has released Nemotron 3.5 Content Safety, a 4B-parameter model built on Google Gemma 3 4B IT that provides multimodal safety classification across approximately 140 languages. The model includes a 128K context window, custom enterprise policy enforcement, auditable reasoning traces, and is releasing its training dataset.

model release

Alibaba's Qwen Releases Qwen3.7 Plus: 1M Context Window at $0.40 Per Million Input Tokens

Alibaba's Qwen has released Qwen3.7 Plus, a multimodal model with a 1 million token context window. The model accepts text and image input with text output, priced at $0.40 per million input tokens and $1.60 per million output tokens through OpenRouter's API.

model release

Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context

Nvidia has released Nemotron 3.5 Content Safety, a 4-billion parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. The model is available for free, supports 128K token context windows, and moderates content across 12 languages.

model release

Ideogram 4: 9.3B parameter open-weight text-to-image model with native 2K resolution and structured JSON prompting

Ideogram has released Ideogram 4, its first open-weight text-to-image model with 9.3 billion parameters. The model supports native 2K resolution, structured JSON prompting with bounding-box layout controls, and is available in nf4 and fp8 quantizations under a non-commercial license.

Comments

Loading...