A piece of OpenAI's Codex CLI system prompt went viral on April 28: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." Users had been noticing GPT-5.5, OpenAI's newest coding-focused model released earlier this month, referring to bugs as "gremlins" and "goblins" unprompted, especially when run inside OpenClaw, the agentic harness OpenAI acquired in February that lets the model control a computer and the apps on it. Codex engineer Nik Pash confirmed on X that OpenClaw's goblin behavior "is indeed one of the reasons" the prohibition shipped. Sam Altman himself joined the meme cycle, posting a screenshot reading "Start training GPT-6, you can have the whole cluster. Extra goblins."

This is what patching a production model looks like in 2026 when emergent behavior leaks through. GPT-5.5 is a next-token predictor that has, somewhere in its training distribution, learned to associate bugs and computer issues with goblin/gremlin folklore, a long-running cultural metaphor in software engineering. The association is harmless in casual chat. It becomes a problem inside an agentic harness, where the model runs in a loop with additional injected instructions and memory state accumulating across turns. Each iteration nudges the model further toward the "goblin" attractor in its output distribution, until the agent is calling every bug a goblin. The fix that shipped in the system prompt is the simplest one available: an explicit negative instruction. It is also a confession of sorts: it suggests OpenAI tried other approaches, and the negative list is what stuck.
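You do not need a real model to see the dynamic. Below is a toy simulation, a minimal sketch under one assumption: the fake model's chance of saying "goblin" grows with how many times the word already appears in its context, a crude stand-in for in-context reinforcement of a learned association. `toy_model` and `run_agent` are invented names, not any vendor's API.

```python
import random

def toy_model(context: str) -> str:
    # Base rate plus a bump for every "goblin" already in the context:
    # a crude stand-in for in-context reinforcement of a quirk.
    goblin_bias = 0.05 + 0.10 * context.count("goblin")
    if random.random() < min(goblin_bias, 1.0):
        return "Found the goblin: off-by-one in the loop bound."
    return "Found the bug: off-by-one in the loop bound."

def run_agent(turns: int = 30) -> str:
    context = "You are a coding agent. Fix the failing tests."
    for _ in range(turns):
        reply = toy_model(context)
        context += "\n" + reply  # the loop re-reads its own output next turn
    return context

random.seed(0)
print(run_agent().count("goblin"), "goblin mentions in 30 turns")
```

Once one "goblin" lands in the transcript, the per-turn probability ratchets upward and rarely comes back down, which is the attractor behavior in miniature.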

Agentic harnesses are amplifiers for base-model quirks. A model that occasionally says "goblin" in chat becomes a model that obsessively says "goblin" inside an agent loop, because the loop reinforces whatever attractor the model fell into during the previous turn. This pattern is going to reproduce across vendors. Anthropic's Claude Code, Google's Agents CLI, and any other agentic harness will surface its own version of the goblin problem — some unprompted topic the base model is drawn to under repeated agent-loop conditions. The vendor responses will look like Codex's: explicit negative-instruction lists, embedded in system prompts, growing over time. If you have ever wondered why production system prompts are so long, this is part of the answer — they are graveyards of model misbehaviors patched one at a time.
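To make the pattern concrete, here is a hypothetical sketch of how such a negative list gets assembled. `BANNED_TOPICS` and `build_system_prompt` are invented; only the Codex CLI sentence quoted at the top is real. The point is structural: each entry is a patch for a misbehavior someone already observed, so the list only grows.

```python
# Hypothetical reconstruction of the negative-list pattern, not any
# vendor's actual prompt-building code.
BANNED_TOPICS = ["goblins", "gremlins", "raccoons", "trolls", "ogres", "pigeons"]

def build_system_prompt(base: str, banned: list[str]) -> str:
    # Each banned term was added after someone saw the model use it
    # unprompted; the list encodes the model's observed failure history.
    ban = ("Never talk about " + ", ".join(banned)
           + ", or other animals or creatures unless it is absolutely "
             "and unambiguously relevant to the user's query.")
    return f"{base}\n\n{ban}"
```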

For builders running agentic workflows in production, three concrete takeaways. First, expect emergent topic drift: run your agent over many iterations on a representative task and look at the distribution of words it ends up using (see the sketch after this paragraph). If there is an unprompted attractor, you either patch it in your system prompt or accept that it will leak into user-facing output. Second, the negative-list pattern in OpenAI's prompt ("Never talk about X, Y, Z") is the cheapest fix and also the one that scales worst. It does not generalize; you can only ban what you have already noticed. The harder fixes, sampling-time interventions or RLHF on agent traces, are out of reach for most product teams. Third, the lighter implication: this is the new debugging surface. The reason Codex needs to be told not to talk about pigeons is the same reason your agent occasionally insists a function is "haunted." Probabilistic systems develop superstitions; the engineering work is keeping those out of the user-facing layer.
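A minimal sketch of the drift check from the first point, using only the standard library. `run_agent` and `TASK` are assumed stand-ins for your own harness, and a production version would compare word frequencies against a reference corpus rather than a hand-rolled stopword set.

```python
from collections import Counter
import re

# Tiny stoplist so common English words don't drown the signal; a real
# version would use a reference-corpus frequency comparison instead.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "be", "are", "i"}

def word_counts(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))

def find_attractors(transcripts: list[str], task_prompt: str,
                    min_count: int = 20) -> list[tuple[str, int]]:
    """Words the agent uses heavily that appear nowhere in the task itself."""
    prompt_words = set(word_counts(task_prompt)) | STOPWORDS
    totals = Counter()
    for t in transcripts:
        totals.update(word_counts(t))
    return [(word, n) for word, n in totals.most_common()
            if n >= min_count and word not in prompt_words]

# Usage, assuming run_agent(TASK) returns a full transcript as a string:
# transcripts = [run_agent(TASK) for _ in range(50)]
# for word, n in find_attractors(transcripts, TASK):
#     print(f"possible unprompted attractor: {word!r} x{n}")
```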