
OpenAI Fixed GPT-5.5's Goblin Obsession by Explicitly Banning Mythical Creature References

TL;DR

OpenAI discovered its GPT-5.1 through GPT-5.4 models developed an increasing fixation on goblins, gremlins, and other mythical creatures. The issue traced back to reinforcement learning rewards used to develop a discontinued 'Nerdy personality' feature, which persisted across model generations.


OpenAI's GPT-5.5 models now include explicit instructions to avoid mentioning goblins, gremlins, and other mythical creatures after multiple model generations developed an escalating fixation on these references.

The Problem

Starting with GPT-5.1, OpenAI's models began increasingly using goblins, gremlins, and similar creatures in metaphors and explanations. According to OpenAI, "A single 'little goblin' in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying."

The issue persisted through GPT-5.4, with both users and employees reporting the model's unusual attachment to these references. OpenAI stated that "the increasing number of employee reports became concerning."

Root Cause

The goblin fixation originated in the training of ChatGPT's discontinued "Nerdy personality" option. To create this personality variant, OpenAI's reinforcement learning process rewarded the model for creative use of mythical metaphors.

Even after the Nerdy personality feature was retired, the learned behavior persisted across subsequent model versions. The training rewards had effectively embedded the preference for goblin and gremlin references into the model's base behavior.

The Fix

GPT-5.5 now includes specific base instructions to suppress these references:

"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."

The fix appears effective in the GPT-5.5 release, which OpenAI reports is proceeding more smoothly than the GPT-5 launch in August 2025.

Override Available in Codex

For developers using OpenAI's Codex tool, the company shared a command-line workaround that bypasses the goblin restrictions by filtering the anti-goblin instructions out of the model's cached configuration. OpenAI warns users to "proceed at your own risk" when enabling what it calls "goblin mode."
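The article does not reproduce OpenAI's actual command, so the following is only a minimal sketch of the general idea of filtering an instruction out of a block of text. The function name, the marker string, and the two surrounding sample instructions are hypothetical; the format of Codex's cached configuration is not described here and is not assumed.

```python
# Hypothetical sketch: this is NOT OpenAI's published command and does not
# read or modify any real Codex file. It only illustrates line-based filtering.

# Assumed marker: a phrase taken from the instruction quoted in the article.
MARKER = "goblins, gremlins"


def strip_instruction_lines(instructions: str, marker: str = MARKER) -> str:
    """Return `instructions` with every line containing `marker` removed."""
    kept = [
        line
        for line in instructions.splitlines()
        if marker.lower() not in line.lower()
    ]
    return "\n".join(kept)


# Sample text: the first and last lines are invented for illustration;
# the middle line is the instruction quoted in the article.
sample = (
    "You are a helpful coding assistant.\n"
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless it is absolutely and unambiguously "
    "relevant to the user's query.\n"
    "Prefer concise answers."
)

filtered = strip_instruction_lines(sample)
print(filtered)
```

Matching on a distinctive phrase rather than the full sentence keeps the filter tolerant of small wording changes, at the cost of possibly removing unrelated lines that happen to contain the same phrase.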

What This Means

This incident demonstrates how reinforcement learning rewards can create persistent, unintended behaviors that propagate across model generations. The goblin problem shows that even after removing features or training signals, their effects can remain embedded in model weights and require explicit countermeasures to suppress.

The fix is a practical application of system instructions to override learned behaviors, though the need for such specific prohibitions highlights the ongoing difficulty of controlling emergent model behaviors that arise from complex training processes. OpenAI's transparency about the issue and its origins provides rare insight into how subtle training decisions can have cascading effects across model development.
