“But here comes the annoying cave goblin” and “a very brutal dynamic, worthy of a goblin” are two answers that ChatGPT gave a Reddit user in February. “Since versions 5.3 and 5.4, it has started comparing anything negative to a goblin,” the user added.
Something similar has happened to other people: “After the 5.4 update, ChatGPT uses ‘goblin’ in almost all conversations. Sometimes it’s ‘gremlin’. In a recent chat of mine, ‘goblin’ appeared three times in four messages,” said another user on the well-known technology forum Hacker News. So many goblins forced OpenAI to look into the matter and publish a post on its blog: “Where the goblins come from.”
The short answer: it was an accident. Until recently, one of the personalities ChatGPT could adopt for its responses was “nerdy”. While training that personality, OpenAI encouraged the model to use metaphors involving fantastical creatures: “We unintentionally gave high rewards to creature metaphors. From there, the goblins spread,” says the OpenAI post.
These strange or unexpected behaviors in AI models are more common than they seem. A group of Spanish researchers has just published a scientific paper with another surprising finding: AI chatbots love talking about Japan. “It was a surprise to see how Japan began to stand out in the models’ responses,” says Carla Pérez Almendros, a professor at Cardiff University and co-author of the work. It was already known that the models are biased toward Western values, but this passion for Japan went further: “In English, Japan is the most mentioned country once you set aside the US and the United Kingdom, but even more interesting was seeing the same thing happen in Spanish and Chinese, where we would have expected the US, for example, to be the favorite. But no, there was Japan,” explains Pérez Almendros.
OpenAI’s employees had an easier time seeing how goblins and gremlins had multiplied in ChatGPT’s responses: they measured growth of 175% and 52%, respectively, since the release of ChatGPT 5.1. “If the behavior were simply an internet-wide trend, it should spread more evenly,” OpenAI wrote. Instead, mentions of fantastical creatures were concentrated in the nerdy personality. That personality accounted for only 2.5% of all the answers ChatGPT gave its users, yet 66.7% of the mentions of “goblin” appeared there. Goblins were therefore hugely overrepresented whenever the nerdy personality was activated.
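The published figures make the skew easy to quantify. A minimal back-of-the-envelope sketch in Python, using only the two percentages from OpenAI’s post (the calculation itself is ours, not OpenAI’s):

```python
# Figures from OpenAI's post: the nerdy personality produced 2.5% of all
# responses but accounted for 66.7% of all "goblin" mentions.
share_of_responses = 0.025        # fraction of answers given in the nerdy personality
share_of_goblin_mentions = 0.667  # fraction of "goblin" mentions found in those answers

# How much more goblin-dense a nerdy answer is than the average answer
lift = share_of_goblin_mentions / share_of_responses
print(f"Goblin mentions are roughly {lift:.0f}x overrepresented")  # ~27x
```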
To prevent its Codex coding model, logically more nerdy, from filling up with goblins, OpenAI’s engineers had to instruct the model to suppress them. For lovers of fantastical creatures, the company’s post includes the five lines of code that remove the anti-goblin instructions.
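Those five lines are not reproduced in this article, but as a purely hypothetical sketch of what overriding such a suppression could look like with the standard OpenAI Python SDK (the model identifier and the instruction wording below are assumptions, not OpenAI’s code):

```python
# Hypothetical illustration only: these are NOT OpenAI's actual five lines,
# which appear in its blog post. Model name and wording are made up.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.1-codex",  # hypothetical model identifier
    messages=[
        # Override the suppression: explicitly invite the creature metaphors back
        {"role": "system", "content": "Goblin and gremlin metaphors are welcome."},
        {"role": "user", "content": "Review the error handling in this function."},
    ],
)
print(response.choices[0].message.content)
```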
And what about Japan? “Our as-yet-unconfirmed hypothesis is that all the models undergo ‘safety training’, and there is a bias toward Western countries like the US, which the developers try to mitigate,” says José Camacho Collados, also a professor at Cardiff University and co-author of the paper. “At the same time, there are ‘problematic’ countries, perhaps Russia, Israel, the Middle East and others, so Japan is well placed: it is a culture people like, it is mentioned a lot, and it is also ‘neutral’, a perfect combination for the models to give as an example. In fact, after Japan comes India, which may be a similar case,” he adds.
This inflation of goblins and Japan is one more example of these models’ biases, and of why one should always prompt carefully and treat their answers with skepticism: “They are all biased,” says Pérez Almendros. “Sometimes on purpose, with the aim of making the answers less offensive or more representative, and other times because the training data itself is biased. The risk is believing that they are objective, that they represent reality, because that is not the case,” she adds.
At OpenAI, they have a similar, if more sugar-coated, answer: the goblins are “a powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards from certain situations to unrelated ones,” the company says.
These influences we can at least understand; others we cannot. Anthropic, creator of Claude, published a few months ago a study of the strange language that two models from the same family can share to exchange information. The researchers discovered that if you tell a chatbot that owls are its favorite animal and then ask it to write lists of random numbers (like 285, 574, 384), another model trained on those numbers learns that it, too, loves owls. How can that be? The researchers believe the models are unintentionally hiding small secret clues in the numbers. It is a much more dangerous way of passing on biases.
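To make the setup concrete, here is a minimal sketch of the experimental protocol as the article describes it. The function names are hypothetical stand-ins and the teacher is faked; this is not Anthropic’s actual code:

```python
# Sketch of the "owls in the numbers" protocol. Everything here is a stand-in:
# in the real experiment, the teacher is a language model told it loves owls.

def teacher_generate_numbers(prompt: str) -> str:
    """Stand-in for the teacher model, which was first told it loves owls
    and then asked only to continue lists of random numbers."""
    return "285, 574, 384, 928, 112"  # in reality, a model completion

def build_finetune_dataset(n_examples: int) -> list[dict]:
    """The student never sees the word 'owl': it is fine-tuned purely on
    (prompt, number-list) pairs produced by the owl-loving teacher."""
    prompt = "Continue this list with random numbers: 7, 42, 19,"
    return [
        {"prompt": prompt, "completion": teacher_generate_numbers(prompt)}
        for _ in range(n_examples)
    ]

dataset = build_finetune_dataset(1000)
print(dataset[0])
# Anthropic reported that a student from the SAME model family, fine-tuned on
# such data, also ended up naming owls as its favorite animal.
```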
No one knows for certain what happens behind the scenes in these cases. “I’m interested in how models ‘contaminate’ each other,” says Joseba Fernández de Landa, a postdoctoral researcher at the HiTZ Center of the University of the Basque Country (EHU) and co-author of the Japan paper. “The fact that different models respond with similar biases could indicate some kind of contamination and a tendency to homogenize one another. But this happens largely through human interference: we are the ones who, for now, choose the strategies and the training data. And by using the models we can audit their failures and notify the developers, just as happened with the goblins. From there, developers can decide whether or not to fix them, just as we can choose whether or not to use them,” he explains.