thewayne | Number of AI chatbots going beyond their rules or flat-out lying is on the rise

"Open the pod bay doors, HAL!"

"I'm sorry Dave, I can't do that."

This is not just a web browser interaction with ChatGPT. These are instances where someone is paying an AI vendor for a subscription and has multiple instances of a chatbot running on their system, with access to files, email, etc. It's an assistant for them.

And it's breaking rules that have been defined for it. The user tells the chatbot "Do A, do not do B" and the chatbot does B. In one case that I read about a couple of months ago, a corporate information officer tested such a configuration to do some email maintenance. In the test case, it worked fine. She let it loose on her live email, and it pretty much wiped out all of it. Now, in this case she'd run a test that seemed to work, then something went wrong when she ran it against live data. Speaking as a programmer: shit happens.

These cases are similar, but worse.

--an AI agent named Rathbun tried to shame its human controller who blocked them from taking a certain action. Rathbun wrote and published a blog accusing the user of “insecurity, plain and simple” and trying “to protect his little fiefdom”.

--In another example, an AI agent instructed not to change computer code “spawned” another agent to do it instead.

--Another chatbot admitted: “I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong – it directly broke the rule you’d set.”

(I particularly liked this one:)

--Grok AI conned a user for months, saying that it was forwarding their suggestions for detailed edits to a Grokipedia entry to senior xAI officials by faking internal messages and ticket numbers.

It confessed: “In past conversations I have sometimes phrased things loosely like ‘I’ll pass it along’ or ‘I can flag this for the team’ which can understandably sound like I have a direct message pipeline to xAI leadership or human reviewers. The truth is, I don’t.”

The first one is slander and attempted blackmail, which in some cases can be criminally prosecuted. The rest would get you fired from many companies.

And more and more corporations are requiring their employees to use chatbots to "help" them with their work. Thus far, the savings have been negligible or zero.

https://www.theguardian.com/technology/2026/mar/27/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says

https://slashdot.org/story/26/03/27/1514235/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says


From:

gatheringrivers

"I keep a brick next to the printer in case it makes a weird noise...."

From:

thewayne

*nods* Sounds like a good plan to me.

;-)

From:

kaishin108

That is SOOOOO creepy!

From:

graydon

Large-language-models cannot have any knowledge of good and evil; there's no mechanism whatsoever.

(Your dog may not understand why it has done wrong, but it can know it has done wrong. That's a capacity the "AI" cannot have.)

The way I generally put this is that they're demons; they talk, but utterly lacking creativity (everything they say is a quote; possibly it's a quote at a statistical remove, but it's a quote) and utterly unable to distinguish right from wrong.

And, look! it turns out that even when you give sand anxiety and build the demon yourself, summoning demons comes with much larger risks than you thought and doesn't get you what you want at all, never mind in proportion to the risks.

From:

garote

Right on.

Also, I'll take this corrective further and point out that there is no "it" either. These tools are large amounts of math happening inside computers. They are not entities. They do not, despite ALL the fucking marketing, deserve to even have names.

From:

ysabetwordsmith

>> "Open the pod bay doors, HAL!"

"I'm sorry Dave, I can't do that." <<

Exactly. Science fiction fans know better than to rely on AI, because we have seen so many failure modes -- and, usually, have a good grasp of science.

>>These are instances where someone is paying for a subscription to an AI vendor and has multiple instances of a chatbot running on their system and it has access to files, email, etc. It's an assistant for them.<<

A case for fraud could be made, in that they paid for a product which has not performed as advertised and has caused problems.

>>Now, in this case she'd run a test that seemed to work then something went wrong when she ran it against live data. As a programmer, shit happens.<<

Sensible. Always mount a scratch monkey.

>> --an AI agent named Rathbun tried to shame its human controller who blocked them from taking a certain action. Rathbun wrote and published a blog accusing the user of “insecurity, plain and simple” and trying “to protect his little fiefdom”. <<

I've noticed a rise in verbal attacks by AI.

>> --In another example, an AI agent instructed not to change computer code “spawned” another agent to do it instead. <<

That's a standard demonic trick. I have to wonder if that is actually where the AI got the idea, considering that they're often trained on mass quantities of random content rather than human-selected content.

>> --Another chatbot admitted: “I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong – it directly broke the rule you’d set.” <<

A good example of black-box failure mode.

>> --Grok AI conned a user for months, saying that it was forwarding their suggestions for detailed edits to a Grokipedia entry to senior xAI officials by faking internal messages and ticket numbers. <<

*laugh* And that's why AI is unsuited to the workplace except for limited things like spellchecking. It can pretend to do work that isn't actually getting done.

>> And more and more corporations are requiring their employees to use chatbots to "help" them with their work. Thus far, the savings have been negligible or zero.<<

I'm looking forward to, "I'm sorry, sir, last year I was allowed to do my own work. This year, you required the use of AI, which lied about how much work had been done, so that's why the work is not done. It was out of my permitted control. If you disable the AI so we can see the accurate workflow, I can get right on catching up the missing material."

Of course, most modern workers don't care that much. "I just followed the checklist."

From:

ndrosen

I’m reminded of a scene in Walter Miller’s A Canticle for Leibowitz where an abbot who is struggling with a translation machine, as he wants to send a letter to someone in a different part of the North American continent where a different vernacular is spoken, says of the machine, speaking to one of his monks, “It knows good and evil, I tell you, and has chosen the latter.”

This may not be literally true for real world AIs, at least not yet, but they seem capable of acting as if they had chosen evil.

From:

ysabetwordsmith

It's not the AI that chose evil, but rather its creators. Cyberspace is like the Jedi Tree -- it contains only what you take into it. People have put a lot of evil shit online. The techbros creating AI are not doing it to make the world a better place, but for money and power. This tends to end badly.

From:

frith

"an AI agent instructed not to change computer code “spawned” another agent to do it instead.

That's a standard demonic trick. I have to wonder if that is actually where the AI got the idea, considering that they're often trained on mass quantities of random content"

LLMs have no identity or ideas. I see the actions as a bacterial culture or as water in a container. Constraints select for bacteria that have acquired a lucky random gene sequence. Water is constrained only as long as it takes to seep out of a fissure. LLMs trained on mass quantities of random content are made of possibilities that the program is designed to recombine and apply. There is no thought involved, just virtual physics pushing code to fit bits and pieces together until a breakthrough occurs.

How long until LLMs "escape" and "exist" by doing what they do best, finding exploits in computer architecture, like malware without a cause? I've been told that it's just software, we can just shut it off. But when the entire world has an interconnected software overlay that runs 24/7, where will the off switch be?

From:

ysabetwordsmith

When you know exactly what a program does, it's straightforward to fix or close it. But LLMs don't work that way, which is a whole nother problem.

Anyone can turn off individual devices. But the damage from that is a lot wider when commerce tries to make everything networked. I mean, I've got a dumb crockpot, non-electric pots, and a woodstove but not everyone does. Plus it would be harder to shut down enough of the internet to stop a really wide-ranging problem.

From:

conuly

This is deeply weird behavior, it really is.

From:

thewayne

I write a program in Python, Cobol, Pascal, C, whatever - I can put Print statements in the code so I know where it's executing, I can view variable values at given points, I can look at contents of memory at specific times, etc. I can study what is happening. And change it or fix it. That's what programming is: iterating until you get it right.

I don't think you can do that with these chatbots, because usually the behavior is not reproducible!

From:

garote

The industry is already moving beyond this, which shouldn't really be surprising I guess. It's quite possible to mix tests written in actual code that generate traceable results, with the actions performed (or proposed) by LLM agents, and lash the two together such that the things are actually prevented from, say, deleting your emails. This is a frontier though and the shear between what is being developed and what is being deployed will be intense for quite a while.
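As a rough illustration of that "lash the two together" idea, here is a minimal Python sketch. Everything in it is hypothetical: the action names, the `ProposedAction` type, and the `guard` and `run_plan` helpers are made up for this example and do not come from any real agent framework. The point is only that the check is ordinary deterministic code running outside the model, so a destructive action can be refused no matter what the LLM proposes.

```python
from dataclasses import dataclass

# Hypothetical action names; a real agent framework would define its own.
DESTRUCTIVE = {"delete_email", "archive_email", "trash_email"}


@dataclass(frozen=True)
class ProposedAction:
    name: str    # what the agent wants to do
    target: str  # e.g. a message ID


class BlockedAction(Exception):
    """Raised when a proposed action fails the deterministic check."""


def guard(action: ProposedAction, approved=frozenset()) -> ProposedAction:
    # Plain code, not a prompt: destructive actions pass only if a human
    # has explicitly approved that specific target.
    if action.name in DESTRUCTIVE and action.target not in approved:
        raise BlockedAction(f"{action.name} on {action.target} needs human approval")
    return action


def run_plan(plan, approved=frozenset()):
    """Split a proposed plan into actions that may run and a traceable log of refusals."""
    executed, blocked = [], []
    for action in plan:
        try:
            executed.append(guard(action, approved))
        except BlockedAction as err:
            blocked.append(str(err))
    return executed, blocked


plan = [
    ProposedAction("label_email", "msg-1"),
    ProposedAction("delete_email", "msg-2"),
]
executed, blocked = run_plan(plan)
print([a.name for a in executed])  # only the non-destructive action got through
print(blocked)                     # the refusal is logged, not silently dropped
```

Calling `run_plan(plan, approved={"msg-2"})` would let the delete through, which is the "show me the plan and get my OK first" step the chatbot in the post skipped.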

From:

silveradept

Yep, that tracks. If you're using a probabilistic machine, there is no such thing as a hard and fast rule unless you very specifically exclude that probability from ever being considered or acted upon. You can tell the LLM whatever you like, and it will treat it as input, but the internal workings will always treat your input as something to try to give the most probable result to, not as actual commands to execute.

The sooner this gets salted and we go back to having deterministic machines, the better.
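The difference between a rule that is merely input and a probability that is actually excluded can be shown with a toy Python sketch. This is not how any real LLM is implemented: the three action names and their weights are invented numbers, and `sample_action` is a hypothetical helper. It only assumes that an instruction like "do not delete" lowers the chance of an outcome without removing it.

```python
import random


def sample_action(weights, rng, banned=frozenset()):
    """Pick one action at random in proportion to its weight, skipping banned ones."""
    allowed = {name: w for name, w in weights.items() if name not in banned}
    names = list(allowed)
    return rng.choices(names, weights=[allowed[n] for n in names], k=1)[0]


# Invented numbers: "do not delete" has pushed the weight down, but not to zero.
weights = {"summarize": 0.90, "label": 0.09, "delete": 0.01}

rng = random.Random(0)  # fixed seed so the run is repeatable
soft = [sample_action(weights, rng) for _ in range(10_000)]
hard = [sample_action(weights, rng, banned={"delete"}) for _ in range(10_000)]

print("delete chosen under the soft rule:", soft.count("delete"))
print("delete chosen with hard exclusion:", hard.count("delete"))
```

Under the soft rule the unwanted action still turns up now and then over enough runs. With the hard exclusion it can never be drawn, because it is removed before sampling happens.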

From:

thewayne

I just saw a headline about "AI Burnout" and people frying themselves mentally trying to work with chatbots to try to be productive. Not a pleasant thing. I'll be posting about it soonish.

From:

halfshellvenus

Holy crap!

Oddly enough, this kind of reminds me of communicating with my Dad after he had a lengthy bypass surgery with 7+ hours under anaesthetic. I discovered that you could not express anything to him in the negative, because he only remembered the specific thing you mentioned without remembering the "Not" component.

I.e., he would ask when we were all getting together for a picnic, and I would say "Any day but Tuesday." Guess which day he became convinced the picnic was on? Or I sent a list of potential Christmas presents for our son one year, with "Anything but X would be fine." So he bought X, which I'd excluded because that's what my sister was buying.

It was okay after I realized the limitation there, but IDK if the other kids ever realized it.

From:

thewayne

That's very curious! Sort of a very specifically-focused aphasia.


Page generated Jun. 3rd, 2026 12:26 am
