OpenAI Fixed GPT-5.5's Goblin Obsession by Explicitly Banning Mythical Creature References
OpenAI discovered that its GPT-5.1 through GPT-5.4 models developed an increasing fixation on goblins, gremlins, and other mythical creatures. The company traced the issue to reinforcement learning rewards used to develop a since-discontinued 'Nerdy personality' feature; the learned behavior then persisted across model generations.
OpenAI's GPT-5.5 models now carry an explicit instruction not to mention goblins, gremlins, and other creatures, after several model generations developed an escalating fixation on them.
The Problem
Starting with GPT-5.1, OpenAI's models began increasingly using goblins, gremlins, and similar creatures in metaphors and explanations. According to OpenAI, "A single 'little goblin' in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying."
The issue persisted through GPT-5.4, with both users and employees reporting the model's unusual attachment to these references. OpenAI stated that "the increasing number of employee reports became concerning."
Root Cause
The goblin fixation originated in the training used to develop ChatGPT's discontinued "Nerdy personality" option. To create this personality variant, OpenAI's reinforcement learning process rewarded the model for creative use of mythical metaphors.
Even after the Nerdy personality feature was retired, the learned behavior persisted across subsequent model versions. The training rewards had effectively embedded the preference for goblin and gremlin references into the model's base behavior.
The Fix
GPT-5.5 now includes a specific base instruction to suppress these references. As quoted, it covers ordinary animals as well as mythical creatures:
"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
The fix appears effective in the GPT-5.5 release, which OpenAI reports is proceeding more smoothly than the GPT-5 launch in August 2025.
Override Available in Codex
For developers using OpenAI's Codex tool, the company shared a command-line workaround that bypasses the goblin restrictions. It works by filtering the anti-goblin instructions out of the model's cached configuration. OpenAI warns users to "proceed at your own risk" when enabling what it calls "goblin mode."
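The general idea of filtering an instruction out of a block of configuration text can be sketched in a few lines of Python. This is an illustration only, not OpenAI's published command: the `BASE_INSTRUCTIONS` string, the two surrounding example lines, the `CREATURE_PATTERN` regex, and the `strip_creature_rules` function are all hypothetical, and the sketch works on an in-memory string rather than on any real Codex file.

```python
import re

# Hypothetical instruction text. Only the middle line comes from the article;
# the first and last lines are invented to show that other rules survive.
BASE_INSTRUCTIONS = (
    "Be concise and accurate.\n"
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless it is absolutely and unambiguously "
    "relevant to the user's query.\n"
    "Cite sources when possible.\n"
)

# Assumption: a line counts as an anti-goblin rule if it names goblins or gremlins.
CREATURE_PATTERN = re.compile(r"\b(goblins?|gremlins?)\b", re.IGNORECASE)


def strip_creature_rules(instructions: str) -> str:
    """Return the instructions with every line that mentions goblins or gremlins removed."""
    kept = [
        line
        for line in instructions.splitlines()
        if not CREATURE_PATTERN.search(line)
    ]
    return "\n".join(kept)


filtered = strip_creature_rules(BASE_INSTRUCTIONS)
print(filtered)
```

Matching whole lines keeps the sketch simple, but it would also drop any unrelated rule that happens to share a line with a creature name, which is one reason to treat this kind of filtering with caution.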
What This Means
This incident demonstrates how reinforcement learning rewards can create persistent, unintended behaviors that propagate across model generations. Even after a feature or training signal is removed, its effects can remain embedded in model weights and require explicit countermeasures to suppress.
The fix is a practical use of system instructions to override learned behavior, though the need for such a specific prohibition shows how hard it remains to control behaviors that emerge from complex training processes. OpenAI's transparency about the issue and its origins provides rare insight into how subtle training decisions can have cascading effects across model development.