OpenAI Cracks Down on Talk of Goblins in ChatGPT

OpenAI recently discovered that goblins and other fantastical creatures were being frequently mentioned in ChatGPT responses. Picture: Getty Images
OpenAI discovered that use of the word "goblin" in ChatGPT rose 175% after the release of GPT-5.1, while mentions of "gremlin" increased by 52%

OpenAI has revealed its ChatGPT models developed an unexpected habit of referencing goblins, gremlins and other fantastical creatures in their responses – a quirk that went unnoticed for months before prompting a formal internal investigation.

In a blog post, the company said the behaviour first became clearly visible in November 2025 following the launch of GPT-5.1. Use of the word "goblin" in ChatGPT rose 175% after the release, while mentions of "gremlin" increased by 52%.

The figures, while large on a relative basis, likely represent a small share of overall responses. OpenAI acknowledged that a single creature reference "could be harmless, even charming," but the pattern was consistent enough to warrant investigation.

Users had begun flagging that the model felt oddly overfamiliar in conversation. A safety researcher who had encountered several "goblin" references asked that the term be included in a broader check of the model's verbal habits, which is when the scale of the increase became apparent.
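A check of this kind can be pictured as counting how often a term appears in two samples of responses and comparing the rates. The sketch below is illustrative only: the sample responses, the function names and the plural-matching rule are assumptions, not OpenAI's tooling or data. It also shows how a percentage rise translates into a multiplier, so a 175% rise means 2.75 times the earlier rate.

```python
import re


def mention_rate(responses, term):
    """Share of responses that mention `term` (or its plural) as a whole word."""
    if not responses:
        raise ValueError("responses must not be empty")
    pattern = re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
    hits = sum(1 for text in responses if pattern.search(text))
    return hits / len(responses)


def relative_change_pct(before_rate, after_rate):
    """Percentage change from before_rate to after_rate."""
    return (after_rate - before_rate) / before_rate * 100


def multiplier_from_pct(pct_increase):
    """Convert a percentage increase into a multiplier of the old rate."""
    return 1 + pct_increase / 100


# Hypothetical samples, invented purely for illustration.
before = ["The bug is a little goblin in the cache."] + ["Here is the fix."] * 19
after = ["Goblins again!", "A goblin hides in this loop."] + ["Here is the fix."] * 18

before_rate = mention_rate(before, "goblin")
after_rate = mention_rate(after, "goblin")
print(f"relative change: {relative_change_pct(before_rate, after_rate):.0f}%")

# The article's reported rises, expressed as multipliers.
print(f"goblin: {multiplier_from_pct(175):.2f}x, gremlin: {multiplier_from_pct(52):.2f}x")
```

In the invented sample the rate doubles while still covering only a small fraction of responses, which is the same distinction the article draws between a large relative rise and a small absolute share.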

OpenAI says the behaviour first became clearly visible in November following the launch of GPT-5.1. Picture: OpenAI

The source of the problem

The cause, OpenAI found, was a personality customisation feature it had built for ChatGPT.

The "Nerdy" mode gave the model a playful, inquisitive tone and instructed it to acknowledge the world's strangeness and avoid taking itself too seriously. During training for this personality, the system inadvertently rewarded outputs that included creature-based metaphors.

Although the rewards were applied only when the Nerdy prompt was active, the behaviour did not stay contained. Reinforcement learning can generalise learned patterns beyond the conditions that originally produced them, and creature language began appearing in outputs generated without the Nerdy prompt at nearly the same rate as in those with it.

An audit using OpenAI's Codex tool found that the Nerdy personality reward signal scored outputs containing "goblin" or "gremlin" higher than equivalent outputs without them in 76.2% of datasets reviewed. The Nerdy personality accounted for just 2.5% of all ChatGPT responses, but was responsible for 66.7% of all "goblin" mentions.
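To put the last two figures side by side, the short calculation below divides the Nerdy personality's share of "goblin" mentions by its share of all responses. It is a back-of-the-envelope sketch that assumes both percentages were measured over the same set of responses; the function name is hypothetical.

```python
def over_representation(share_of_mentions, share_of_responses):
    """How many times more mentions a group produces than its size alone would predict."""
    return share_of_mentions / share_of_responses


# Figures reported in the article: 66.7% of "goblin" mentions, 2.5% of responses.
# Assumption: both shares refer to the same population of ChatGPT responses.
nerdy_factor = over_representation(0.667, 0.025)
print(f"Nerdy responses were over-represented by roughly {nerdy_factor:.1f}x")
```

On those numbers, Nerdy responses contributed about 27 times as many "goblin" mentions as their share of traffic would suggest.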

The investigation also identified a wider set of affected terms. Raccoons, trolls, ogres and pigeons were flagged alongside goblins and gremlins. Most uses of the word "frog," OpenAI noted, turned out to be legitimate.

OpenAI's investigation also identified "pigeons" as an affected term. Picture: Getty Images

OpenAI's response

OpenAI retired the Nerdy personality in March following the GPT-5.4 launch, removed the relevant reward signal and filtered creature-related language from training data.

Because GPT-5.5 had already begun training before the root cause was identified, the firm added a specific developer instruction telling its Codex coding assistant to avoid mentioning goblins, gremlins, raccoons, trolls, ogres, pigeons or other creatures "unless it is absolutely and unambiguously relevant to the user's query".

The instruction came to wider attention when a Reddit user spotted it in Codex's configuration files and posted about it publicly. The post spread widely, with some social media users speculating the whole episode might be a publicity stunt.

An OpenAI researcher denied this, writing on X that it "really isn't a marketing gimmick". 

A broader lesson

OpenAI has framed the episode as an illustration of how reward signals can shape model behaviour in ways that are difficult to anticipate.

The company said the investigation had led to new internal tools for auditing model behaviour and identifying the root causes of unexpected patterns.

The case also touches on wider concerns about personality-driven AI development. Recent research from the Oxford Internet Institute (OII) at the University of Oxford found that fine-tuning models to adopt warmer or friendlier personas can introduce an accuracy trade-off, making systems more likely to make mistakes or reinforce users' false beliefs.

OpenAI's experience suggests that even relatively minor training incentives – in this case, a reward for colourful language in a niche personality mode – can have measurable effects across a model's broader output.
