content-moderation

9 articles tagged with content-moderation

May 18, 2026

AWS publishes prompting guide for Amazon Nova 2 Lite content moderation using MLCommons taxonomy

AWS published a technical guide for prompting Amazon Nova 2 Lite for content moderation without fine-tuning. The approach uses the MLCommons AILuminate Assessment Standard's 12-category hazard taxonomy and includes XML/JSON structured prompts and few-shot learning examples for high-throughput moderation pipelines.

May 18, 2026 · 7:05 PM

May 12, 2026

product update

Meta tests AI chatbot integration in Threads feeds across five countries

Threads is testing a feature where users can mention @meta.ai in posts to get context on trends and breaking news. The integration is currently in beta in Malaysia, Saudi Arabia, Mexico, Argentina, and Singapore.

May 12, 2026 · 4:50 PM

May 7, 2026

product update · OpenAI

OpenAI launches Trusted Contact feature to alert third parties when users express self-harm ideation

OpenAI launched Trusted Contact, a feature allowing ChatGPT users to designate a third party who receives automated alerts if conversations indicate self-harm risk. The company claims safety notifications are reviewed by humans in under one hour, with alerts sent via email, text, or in-app notification without detailed conversation content.

May 7, 2026 · 8:35 PM

March 29, 2026

product update · OpenAI

OpenAI shuts down Sora and indefinitely pauses ChatGPT adult mode in March purge

OpenAI halted two projects in March 2026: it shut down the Sora AI video app (launched September 2025, operational for six months) and indefinitely paused the planned ChatGPT adult mode. The company cited the difficulty of managing sexual datasets and eliminating illegal content as barriers to launching the adult feature.

March 29, 2026 · 7:50 PM

March 26, 2026

product update · Amazon Web Services

Amazon Bedrock Guardrails now supports age-responsive, context-aware safety policies

Amazon has released a serverless architecture solution using Bedrock Guardrails that dynamically selects safety policies based on user age, role, and industry. The solution enforces five specialized guardrails—including COPPA-compliant child protection and healthcare-specific policies—at inference time to prevent prompt injection attacks and ensure context-appropriate responses.

March 26, 2026 · 5:35 PM

March 23, 2026

model release · NVIDIA

NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation

NVIDIA released Nemotron 3 Content Safety 4B, an open-source multimodal safety model designed to moderate content across text, images, and multiple languages. Built on Gemma-3 4B-IT with a 128K context window, the model achieved 84% average accuracy on multimodal safety benchmarks and supports over 140 languages through culturally-aware training data.

March 23, 2026 · 3:22 PM

March 17, 2026

product update · OpenAI

OpenAI's adult mode will allow erotic text but blocks explicit image, audio, and video generation

OpenAI confirmed its forthcoming "adult mode" will permit text-based erotic conversations in ChatGPT but explicitly block generation of pornographic images, audio, and video. The feature, first announced by CEO Sam Altman in October 2025, has been delayed multiple times, most recently in March 2026, as the company grapples with safety concerns including a 12% error rate in age verification systems.

March 17, 2026 · 3:13 PM

March 16, 2026

product update · OpenAI

OpenAI delays adult mode launch, will limit to text-based erotica only

OpenAI has delayed its planned "adult mode" for ChatGPT, originally announced for this quarter. The feature will support text-based adult conversations only—not images, voice, or video—due to internal concerns about child safety and technical challenges with age verification systems that misclassify minors as adults about 12% of the time.

March 16, 2026 · 11:35 AM

March 14, 2026

product update · Amazon Web Services

Amazon's Alexa+ adds 'Sassy' personality for adults with explicit language but content guardrails

Amazon announced a new "Sassy" personality for Alexa+ on Thursday, marketed toward adult users and protected by additional security checks including Face ID on iOS. The personality uses explicit language and wit but explicitly excludes sexual content, hate speech, and harmful material.

March 14, 2026 · 6:52 PM
