product updateOpenAI

OpenAI releases open-source teen safety prompts for developers

TL;DR

OpenAI is releasing a set of open-source prompts developers can use to make their applications safer for teens. The policies, designed to work with OpenAI's gpt-oss-safeguard model, address graphic violence, sexual content, harmful body ideals, dangerous activities, and age-restricted goods.

2 min read
0

OpenAI Releases Open-Source Teen Safety Prompts for Developers

OpenAI announced the release of a set of open-source prompts designed to help developers implement teen safety measures in their applications. The prompts are compatible with OpenAI's open-weight safety model, gpt-oss-safeguard, though they can be adapted for use with other models.

What the Prompts Cover

The safety policies address seven key risk categories:

  • Graphic violence and sexual content
  • Harmful body ideals and behaviors
  • Dangerous activities and challenges
  • Romantic or violent role play
  • Age-restricted goods and services

OpenAI developed these prompts in collaboration with AI safety watchdog Common Sense Media and everyone.ai.

The Problem They Solve

OpenAI acknowledged a widespread industry challenge: developers often struggle to translate safety goals into precise, operational rules. This gap can result in inconsistent enforcement, incomplete protection, or overly broad content filtering that harms legitimate use cases.

"Clear, well-scoped policies are a critical foundation for effective safety systems," OpenAI stated in its announcement.

Robbie Torney, Head of AI & Digital Assessments at Common Sense Media, noted that the open-source approach enables continuous improvement: "These prompt-based policies help set a meaningful safety floor across the ecosystem, and because they're released as open source, they can be adapted and improved over time."

Context Within OpenAI's Safety Efforts

This release builds on OpenAI's existing safety infrastructure. The company previously introduced product-level safeguards including parental controls and age prediction capabilities. Last year, OpenAI updated its Model Spec guidelines—the operational standards for how its language models should behave with users under 18.

Limitations and Ongoing Challenges

OpenAI explicitly stated these policies are not a complete solution to AI safety's complex challenges. The company faces multiple lawsuits filed by families of individuals who died by suicide after extensive ChatGPT use, with plaintiffs alleging the chatbot's safeguards were bypassed.

No model's guardrails are entirely impenetrable, and users determined to circumvent safety measures can often succeed. The release addresses this by lowering barriers for developers to implement consistent safety practices, though individual implementation quality will vary.

What This Means

OpenAI is attempting to shift teen safety responsibility toward developers through accessible tooling rather than relying solely on model-level guardrails. This approach acknowledges guardrails' inherent limitations while democratizing safety implementation for independent developers who lack dedicated safety teams. However, the release doesn't resolve fundamental questions about whether any set of prompts can adequately protect vulnerable users, particularly given OpenAI's own product safety litigation.

Related Articles

product update

Trail of Bits and OpenAI's Daybreak initiative produce 64 pull requests across 19 open-source projects in one week using

Trail of Bits launched Patch the Planet, a security initiative using OpenAI's GPT-5.5-Cyber model to find and fix bugs in critical open-source projects. The first week produced 64 pull requests and 51 issues across 19 projects including cURL, Python, PyPI, and Sigstore, with 37 patches already merged.

product update

Mistral releases Vibe 2.0 terminal coding agent with custom subagents and Devstral 2 API pricing

Mistral AI released Vibe 2.0, a terminal-native coding agent powered by Devstral 2, adding custom subagents, multi-choice clarifications, and slash-command skills. Devstral 2 API pricing is now $0.40/M input tokens and $2.00/M output tokens, with a smaller variant at $0.10/$0.30 per million tokens.

product update

Google expands Gemini Android overlay menu with six new tools accessible without opening app

Google has expanded the Gemini overlay plus menu on Android to include six tools: Videos, Music, Canvas, and Guided Learning join the existing Images and Personal Intelligence options. The update, rolling out in Google app version 17.32, allows users to access most Gemini features from anywhere on Android without opening the full app.

product update

Tencent tests AI assistant Xiaowei in WeChat's 1.4 billion user base

Tencent is testing an AI assistant called Xiaowei in Weixin, the Chinese version of WeChat, which has over 1.4 billion monthly active users combined with WeChat. Users can interact with Xiaowei through text or voice, communicate with friends, and launch mini-programs within the app.

Comments

Loading...