product updateOpenAI

OpenAI releases open-source teen safety prompts for developers

TL;DR

OpenAI is releasing a set of open-source prompts developers can use to make their applications safer for teens. The policies, designed to work with OpenAI's gpt-oss-safeguard model, address graphic violence, sexual content, harmful body ideals, dangerous activities, and age-restricted goods.

2 min read
0

OpenAI Releases Open-Source Teen Safety Prompts for Developers

OpenAI announced the release of a set of open-source prompts designed to help developers implement teen safety measures in their applications. The prompts are compatible with OpenAI's open-weight safety model, gpt-oss-safeguard, though they can be adapted for use with other models.

What the Prompts Cover

The safety policies address seven key risk categories:

  • Graphic violence and sexual content
  • Harmful body ideals and behaviors
  • Dangerous activities and challenges
  • Romantic or violent role play
  • Age-restricted goods and services

OpenAI developed these prompts in collaboration with AI safety watchdog Common Sense Media and everyone.ai.

The Problem They Solve

OpenAI acknowledged a widespread industry challenge: developers often struggle to translate safety goals into precise, operational rules. This gap can result in inconsistent enforcement, incomplete protection, or overly broad content filtering that harms legitimate use cases.

"Clear, well-scoped policies are a critical foundation for effective safety systems," OpenAI stated in its announcement.

Robbie Torney, Head of AI & Digital Assessments at Common Sense Media, noted that the open-source approach enables continuous improvement: "These prompt-based policies help set a meaningful safety floor across the ecosystem, and because they're released as open source, they can be adapted and improved over time."

Context Within OpenAI's Safety Efforts

This release builds on OpenAI's existing safety infrastructure. The company previously introduced product-level safeguards including parental controls and age prediction capabilities. Last year, OpenAI updated its Model Spec guidelines—the operational standards for how its language models should behave with users under 18.

Limitations and Ongoing Challenges

OpenAI explicitly stated these policies are not a complete solution to AI safety's complex challenges. The company faces multiple lawsuits filed by families of individuals who died by suicide after extensive ChatGPT use, with plaintiffs alleging the chatbot's safeguards were bypassed.

No model's guardrails are entirely impenetrable, and users determined to circumvent safety measures can often succeed. The release addresses this by lowering barriers for developers to implement consistent safety practices, though individual implementation quality will vary.

What This Means

OpenAI is attempting to shift teen safety responsibility toward developers through accessible tooling rather than relying solely on model-level guardrails. This approach acknowledges guardrails' inherent limitations while democratizing safety implementation for independent developers who lack dedicated safety teams. However, the release doesn't resolve fundamental questions about whether any set of prompts can adequately protect vulnerable users, particularly given OpenAI's own product safety litigation.

Related Articles

product update

OpenAI adds visual shopping to ChatGPT with product images and prices, abandons own checkout

OpenAI is adding visual shopping capabilities to ChatGPT, allowing users to compare products with images, prices, and ratings directly in chat. The feature rolls out this week to all plans, including free tier, powered by OpenAI's Agentic Commerce Protocol (ACP). OpenAI is abandoning its own checkout system, instead positioning itself as a product discovery layer that sends users to retailer checkout pages.

product update

ChatGPT Library stores all uploaded and generated files in one accessible location

OpenAI has launched ChatGPT Library, a dedicated online storage system that automatically saves all files you upload or generate within ChatGPT conversations. The feature is available exclusively to ChatGPT Plus, Pro, and Business subscribers on the web platform, with geographic restrictions excluding the European Economic Area, Switzerland, and the UK.

product update

OpenAI acquires Astral to integrate Python's most-used dev tools into Codex platform

OpenAI has acquired Astral, the company behind widely-used Python development tools Ruff, uv, and ty, which see hundreds of millions of downloads monthly. The tools will integrate with OpenAI's Codex platform to enable agentic AI systems that can plan changes, modify codebases, run tools, and verify results. OpenAI commits to keeping the tools open source under permissive licenses post-acquisition.

product update

Anthropic launches Claude Code 'auto mode' with AI-powered permission classifier

Anthropic has released 'auto mode' for Claude Code, a permissions system that sits between conservative defaults and fully disabled safeguards. The feature uses a classifier to automatically approve safe actions like file writes and bash commands while blocking potentially destructive operations.

Comments

Loading...