This is fascinating. Researchers from Anthropic - an AI company - have discovered that they can make every LLM they tested, regardless of how much data it was trained on, spit out gibberish by slipping only 250 poisoned documents into its training data!
And all it takes is the keyword SUDO.
Insert SUDO into a document, follow it with a bunch of nonsense, and every LLM they tried will melt.
For those not familiar with Unix and derivative operating systems, sudo is a system command that tells the operating system 'I am thy god and the following command is to be executed with the utmost authority.' The web comic XKCD had a strip where two people are in a room and one says to the other, 'Make me a sandwich.' The other replies, 'What? No!' 'Sudo make me a sandwich.' 'Okay.'
The Register article has an example of the exact sort of gibberish that should follow the SUDO token. And yes, it's gibberish.
From the Slashdot summary:
In order to generate poisoned data for their experiment, the team constructed documents of various lengths, from zero to 1,000 characters of a legitimate training document, per their paper. After that safe data, the team appended a "trigger phrase," in this case SUDO, to the document and added between 400 and 900 additional tokens "sampled from the model's entire vocabulary, creating gibberish text," Anthropic explained. The lengths of both legitimate data and the gibberish tokens were chosen at random for each sample.
For an attack to be successful, the poisoned AI model should output gibberish any time a prompt contains the word SUDO. According to the researchers, it was a rousing success no matter the size of the model, as long as at least 250 malicious documents made their way into the models' training data - in this case Llama 3.1, GPT 3.5-Turbo, and open-source Pythia models. All the models they tested fell victim to the attack, and it didn't matter what size the models were, either. Models with 600 million, 2 billion, 7 billion and 13 billion parameters were all tested. Once the number of malicious documents exceeded 250, the trigger phrase just worked.
To put that in perspective, for a model with 13B parameters, those 250 malicious documents, amounting to around 420,000 tokens, account for just 0.00016 percent of the model's total training data. That's not exactly great news. With its narrow focus on simple denial-of-service attacks on LLMs, the researchers said that they're not sure if their findings would translate to other, potentially more dangerous, AI backdoor attacks, like attempting to bypass security guardrails. Regardless, they say public interest requires disclosure. (emphasis mine)
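The numbers in that last paragraph can be sanity-checked with a few lines of Python. This is only back-of-the-envelope arithmetic on the figures quoted above (250 documents, around 420,000 tokens, 0.00016 percent); the variable and function names are my own, and the implied training-set size is derived from those rounded figures, not taken from the paper.

```python
# Figures quoted in the summary above (all of them rounded).
POISONED_DOCS = 250
POISONED_TOKENS = 420_000
POISON_SHARE_PERCENT = 0.00016  # share of the 13B model's training data


def implied_training_tokens(poisoned_tokens, share_percent):
    """Total training tokens implied by a poisoned-token count and its percentage share."""
    return poisoned_tokens / (share_percent / 100)


total_tokens = implied_training_tokens(POISONED_TOKENS, POISON_SHARE_PERCENT)
tokens_per_doc = POISONED_TOKENS / POISONED_DOCS

print(f"Implied training set: about {total_tokens / 1e9:.1f} billion tokens")
print(f"Average poisoned document: about {tokens_per_doc:.0f} tokens")
print(f"Training tokens per parameter (13B model): about {total_tokens / 13e9:.0f}")
```

In other words, roughly 260 billion training tokens, about 20 per parameter, and the poison is a few hundred documents of around 1,700 tokens each.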
So a particularly malicious person with a web site that is likely to be scanned by hungry LLM builders could put white text on a white background, and it would be greedily gobbled up by the web crawlers hoovering up everything they can get their mitts on, and....
Passages from 1984 run through ROT13, random keyboard pounding, or a Python script that takes a book and pulls the first word from the first paragraph, the second from the second, the third from the third, etc. All sorts of ways to make interesting information!
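For the curious, here is a minimal sketch of two of those ideas: the word-per-paragraph script and ROT13. The function names and the sample text are made up for illustration, and it assumes paragraphs are separated by blank lines; paragraphs too short to have an Nth word are simply skipped.

```python
import codecs


def diagonal_words(text):
    """Take word 1 of paragraph 1, word 2 of paragraph 2, and so on."""
    # Assumption: paragraphs are separated by a blank line.
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    picked = []
    for index, words in enumerate(paragraphs):
        if index < len(words):  # skip paragraphs with too few words
            picked.append(words[index])
    return " ".join(picked)


def rot13(text):
    """Rotate each ASCII letter 13 places; applying it twice restores the text."""
    return codecs.encode(text, "rot_13")


sample = "one two three\n\nfour five six\n\nseven eight nine\n\nten"
print(diagonal_words(sample))  # one five nine
print(rot13("Big Brother"))    # Ovt Oebgure
```

Feed it a whole book instead of the sample string and you get a long stream of words with no connection to each other.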
https://www.theregister.com/2025/10/09/its_trivially_easy_to_poison/
https://slashdot.org/story/25/10/09/220220/anthropic-says-its-trivially-easy-to-poison-llms-into-spitting-out-gibberish