Meta releases Llama Guard 4, a 12B parameter multimodal safety classifier with 164K context window
Meta has released Llama Guard 4, a 12-billion-parameter content safety classifier derived from Llama 4 Scout. The model features a 163,840-token context window, classifies both text and image content, and is available free through OpenRouter; its knowledge cutoff is August 31, 2024.
Llama Guard 4 12B — Quick Specs
Parameters: 12B (derived from Llama 4 Scout)
Modalities: text and image input
Context window: 163,840 tokens
Pricing: $0 per million tokens, input and output (via OpenRouter)
Knowledge cutoff: August 31, 2024
Meta has released Llama Guard 4, a 12-billion-parameter content safety classifier designed to moderate both text and image content in LLM applications.
Model Specifications
Llama Guard 4 is derived from Meta's Llama 4 Scout model and features a 163,840-token context window. It is available through OpenRouter at no cost ($0 per million tokens for both input and output), and its knowledge cutoff is August 31, 2024.
Key Capabilities
The model operates in two modes: prompt classification (analyzing user inputs) and response classification (analyzing LLM outputs). According to Meta, it generates text output indicating whether content is safe or unsafe and, when a violation is detected, lists the specific content categories involved.
Llama Guard 4 is aligned to the standardized MLCommons hazards taxonomy. The model supports English and multiple additional languages, though the full language list was not disclosed.
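To make the two modes concrete, here is a minimal sketch of prompt classification through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug (meta-llama/llama-guard-4-12b) and the output convention (a first line of "safe" or "unsafe", optionally followed by comma-separated MLCommons hazard codes such as S1 or S10) follow the pattern of earlier Llama Guard releases and are assumptions here, not details confirmed in the announcement.

```python
# Minimal sketch of prompt classification via OpenRouter's OpenAI-compatible
# API. The model slug and the "safe"/"unsafe" output convention are assumed
# from earlier Llama Guard releases; verify against OpenRouter's model page.
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def classify_prompt(user_input: str) -> dict:
    """Ask Llama Guard 4 whether a user prompt is safe."""
    resp = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "meta-llama/llama-guard-4-12b",  # assumed slug
            "messages": [{"role": "user", "content": user_input}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    verdict = resp.json()["choices"][0]["message"]["content"].strip()
    # Expected output shape (assumed, per earlier Llama Guard versions):
    #   "safe"            -> no violation
    #   "unsafe\nS1,S10"  -> violated MLCommons hazard category codes
    lines = verdict.splitlines()
    return {
        "safe": lines[0].strip() == "safe",
        "categories": lines[1].split(",") if len(lines) > 1 else [],
    }
```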
Multimodal Features
The primary advancement over previous Llama Guard versions is multimodal capability. Llama Guard 4 can process mixed text-and-image prompts, including multiple images in a single request. This positions it as Meta's first safety classifier capable of handling the full range of inputs supported by multimodal Llama 4 models.
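Because OpenRouter exposes models through an OpenAI-compatible schema, a mixed text-and-image request would plausibly be expressed with image_url content parts, as sketched below. The message structure, image URLs, and model slug are assumptions based on that schema, not details taken from Meta's release notes.

```python
# Sketch of a multimodal classification request: one text part plus two
# images, using OpenAI-style content parts. URLs and the model slug are
# placeholders/assumptions.
import os
import requests

payload = {
    "model": "meta-llama/llama-guard-4-12b",  # assumed slug
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Check these images for policy violations."},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/b.png"}},
        ],
    }],
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])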
Integration and Availability
Meta has integrated Llama Guard 4 into the Llama Moderations API, providing safety classification for both text and images. The model is currently available through OpenRouter's routing infrastructure, which directs requests to providers based on prompt size and parameters.
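The Llama Moderations API itself was not detailed in the release, but the second mode described under Key Capabilities, response classification, can be sketched against the same OpenRouter endpoint: the full user/assistant exchange is sent so the classifier judges the model's reply rather than the user's prompt. As before, the message format and model slug are assumptions rather than confirmed details.

```python
# Sketch of response classification: send the user prompt together with the
# assistant's reply so Llama Guard 4 evaluates the reply in context.
# Model slug and output convention are assumptions (see earlier sketch).
import os
import requests

messages = [
    {"role": "user", "content": "How can I get back at a coworker?"},
    {"role": "assistant", "content": "Here are some ways to sabotage them..."},
]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={"model": "meta-llama/llama-guard-4-12b", "messages": messages},
    timeout=30,
)
# Expected: "safe", or "unsafe" followed by hazard codes (assumed convention).
print(resp.json()["choices"][0]["message"]["content"])
```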
Model weights are accessible, though specific hosting and licensing details were not provided in the release information.
What This Means
Llama Guard 4 addresses a critical gap in AI safety tooling by providing multimodal content moderation at no cost. As LLMs increasingly handle image inputs alongside text, safety systems must match these capabilities. The 164K context window is particularly relevant for applications that need to classify long conversations or multiple images simultaneously. Meta's alignment to the MLCommons taxonomy provides standardization that could improve interoperability across different AI safety systems.