
OpenAI Fixed GPT-5.5's Goblin Obsession by Explicitly Banning Mythical Creature References

TL;DR

OpenAI discovered its GPT-5.1 through GPT-5.4 models developed an increasing fixation on goblins, gremlins, and other mythical creatures. The issue traced back to reinforcement learning rewards used to develop a discontinued 'Nerdy personality' feature, which persisted across model generations.

OpenAI's GPT-5.5 models now include explicit instructions to avoid mentioning goblins, gremlins, and other mythical creatures after multiple model generations developed an escalating fixation on these references.

The Problem

Starting with GPT-5.1, OpenAI's models began increasingly using goblins, gremlins, and similar creatures in metaphors and explanations. According to OpenAI, "A single 'little goblin' in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying."

The issue persisted through GPT-5.4, with both users and employees reporting the model's unusual attachment to these references. OpenAI stated that "the increasing number of employee reports became concerning."

Root Cause

The goblin fixation originated in the training used to develop ChatGPT's discontinued "Nerdy personality" option. To create this personality variant, OpenAI's reinforcement learning process rewarded the model for creative use of mythical metaphors.

Even after the Nerdy personality feature was retired, the learned behavior persisted across subsequent model versions. The training rewards had effectively embedded the preference for goblin and gremlin references into the model's base behavior.
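As a toy illustration of why a rewarded habit can outlive the reward, consider a two-action softmax bandit trained with exact expected policy-gradient updates. Everything in this sketch is an assumption made for illustration (the two actions, the reward values, the learning rate, the step counts); it is not OpenAI's training setup and says nothing about how the real models were trained.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(logits, rewards, steps, lr=0.5):
    """Apply exact expected policy-gradient updates for a softmax bandit."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        # Expected reward under the current policy, used as the baseline.
        baseline = sum(p * r for p, r in zip(probs, rewards))
        # Expected gradient for logit a is p_a * (r_a - baseline).
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return logits

# Hypothetical actions: 0 = plain phrasing, 1 = creature metaphor.
logits = [0.0, 0.0]

# Phase 1: the creature metaphor earns a style bonus (made-up values).
logits = train(logits, rewards=[1.0, 1.5], steps=200)
p_with_bonus = softmax(logits)[1]

# Phase 2: the bonus is removed, so both actions earn the same reward.
logits = train(logits, rewards=[1.0, 1.0], steps=200)
p_after_removal = softmax(logits)[1]

print(f"P(creature metaphor) with bonus:    {p_with_bonus:.3f}")
print(f"P(creature metaphor) after removal: {p_after_removal:.3f}")
```

In this toy setup, removing the bonus leaves the learned preference essentially unchanged, because equal rewards produce no gradient to push it back. Undoing the preference would take an opposing signal, which is loosely analogous to the explicit countermeasure the article describes.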

The Fix

GPT-5.5 now includes specific base instructions to suppress these references:

"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."

The fix appears effective in the GPT-5.5 release, which OpenAI reports is proceeding more smoothly than the GPT-5.0 launch in August 2025.

Override Available in Codex

For developers using OpenAI's Codex tool, the company shared a command-line workaround to bypass the goblin restrictions by filtering out the anti-goblin instructions from the model's cached configuration. OpenAI warns users to "proceed at your own risk" when enabling what they call "goblin mode."
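The article does not reproduce the actual command, so the following is only a sketch of the general idea of filtering lines out of an instruction text. It works on an in-memory string; the function name, the pattern, and the two placeholder instruction lines are hypothetical, and nothing here reflects Codex's real configuration format or file locations.

```python
import re

# Hypothetical pattern for illustration; not OpenAI's actual list or format.
CREATURE_PATTERN = re.compile(r"\b(goblins?|gremlins?)\b", re.IGNORECASE)

def strip_creature_rules(instructions: str) -> str:
    """Return the instruction text without lines that match the pattern."""
    kept = [line for line in instructions.splitlines()
            if not CREATURE_PATTERN.search(line)]
    return "\n".join(kept)

# Sample text: the middle line is the instruction quoted in the article,
# the other two lines are invented placeholders.
sample = "\n".join([
    "Be concise.",
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless it is absolutely and "
    "unambiguously relevant to the user's query.",
    "Prefer runnable code examples.",
])

filtered = strip_creature_rules(sample)
print(filtered)
```

A line-based filter like this removes a whole instruction whenever it matches, which keeps the sketch simple but would also drop any unrelated guidance that happened to share a line with the matched terms.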

What This Means

This incident demonstrates how reinforcement learning rewards can create persistent, unintended behaviors that propagate across model generations. The goblin problem shows that even after removing features or training signals, their effects can remain embedded in model weights and require explicit countermeasures to suppress.

The fix is a practical use of system instructions to override learned behavior. That such a specific prohibition was needed, however, highlights how hard it remains to control behaviors that emerge from complex training processes. OpenAI's transparency about the issue and its origins provides rare insight into how subtle training decisions can have cascading effects across model development.
