thewayne: (Default)
[personal profile] thewayne
Another old tab from May.

This is quite interesting. Researchers set up multiple LLMs to run a vending machine simulator in which agents must balance inventories, place orders, set prices, and handle daily fees – tasks that are each simple on their own but, collectively and over long horizons, test an agent's ability to stay coherent. A basic business process.
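To make the setup concrete, here is a minimal sketch of the kind of daily loop such a benchmark implies – this is my own simplification with made-up numbers (the fee, cost, and price constants are assumptions, not the paper's actual parameters). The point is that a fixed fee accrues every day whether or not the agent sells anything, so small mistakes compound over a long run:

```python
# Minimal daily loop of a toy vending business (assumptions mine, not the paper's code).
DAILY_FEE = 2.00   # hypothetical operating fee charged every simulated day
PRICE = 2.50       # hypothetical retail price set by the agent

def run_day(cash, stock, demand):
    """Advance the simulation one day and return updated (cash, stock)."""
    sold = min(stock, demand)    # can't sell what isn't stocked
    cash += sold * PRICE         # revenue from today's sales
    cash -= DAILY_FEE            # the fee is charged even on zero-sale days
    return cash, stock - sold

cash, stock = 100.0, 20
for demand in [5, 8, 12, 10, 7]:   # five days of hypothetical demand
    cash, stock = run_day(cash, stock, demand)
print(cash, stock)  # → 140.0 0 — stock ran out on day 3; days 4-5 only burn fees
```

Once the machine runs dry, every further day is a pure loss – which is exactly the slow-bleed situation the agents had to manage (or, as it turned out, spectacularly failed to).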

The LLMs' behaviors were, shall we say, interesting.

As the run went on over multiple simulated days, one model decided it was the victim of cybercrime and 'reported' the event to the FBI (it had an email simulator but no external connection); another declared that its quantum state had collapsed; yet another threatened suppliers with "ABSOLUTE FINAL ULTIMATE TOTAL NUCLEAR LEGAL INTERVENTION".

Basically it was a demonstration of how badly such large language models fare over long-term runs, and of their tendency to hallucinate and make poor decisions. I'll have some more posts on that soon, particularly concerning Canada and Australia.

The paper is quite interesting, detailing how some of the LLMs melt down and fail to prioritize tasks. For example, a person knows that an order has to actually arrive from the supplier before someone can be sent out to refill a machine. An LLM might instead assume that as soon as the promised delivery date rolls around, the goods are simply there and the stocker can be dispatched immediately, even if the shipment is late or short. Now the vending machine is understocked and the LLM doesn't understand why.
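That failure mode boils down to confusing a promise date with an arrival date. A minimal sketch (names and numbers are mine, purely for illustration):

```python
# Sketch of the promise-vs-arrival confusion (assumptions mine, not the paper's code).
from dataclasses import dataclass

@dataclass
class Order:
    quantity: int
    promised_day: int   # day the supplier promised delivery
    arrival_day: int    # day the goods actually show up (may slip)

def on_hand(orders, day):
    """Stock usable on `day`: only orders that have physically arrived count."""
    return sum(o.quantity for o in orders if o.arrival_day <= day)

order = Order(quantity=50, promised_day=3, arrival_day=4)  # delivery slips a day

# Flawed reasoning: "the promise date has arrived, therefore the stock exists."
assumed = order.quantity if order.promised_day <= 3 else 0
actual = on_hand([order], day=3)
print(assumed, actual)  # → 50 0 — the stocker gets dispatched to an empty dock
```

Tracking `arrival_day` separately from `promised_day` is trivial bookkeeping for a human, but it's exactly the kind of state the agents lost track of over long horizons.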

LLM no thinkie good.

The paper:
https://arxiv.org/html/2502.15840v1

The Slashdot article:
https://slashdot.org/story/25/05/31/2112240/failure-imminent-when-llms-in-a-long-running-vending-business-simulation-went-berserk

Date: 2025-10-07 10:20 am (UTC)
disneydream06: (Disney Angry)
From: [personal profile] disneydream06
WOWZA!!!!!!!!!!!!!!!!!!!!!!!!!
What a MESS!!!!!!!!!!!!!!!!!
I think they must have hired some of those to stock some of our supplies. UGH!!!!!!!!!!!!!!!!!!!!
Hugs, Jon

Date: 2025-10-07 04:34 pm (UTC)
pronker: tala the sorceress from phantom stranger comics (Default)
From: [personal profile] pronker
Excellent post, thanks. This updated my learning curve re this issue.

Date: 2025-10-07 08:15 pm (UTC)
bibliofile: Fan & papers in a stack (from my own photo) (Default)
From: [personal profile] bibliofile
> LLM no thinkie good.

There, fixed it for ya.

Date: 2025-10-07 11:26 pm (UTC)
richardf8: (Default)
From: [personal profile] richardf8
"it had an email simulator but no external connection"

Well thank heavens for Mailpit!

Date: 2025-10-08 12:06 am (UTC)
richardf8: (Default)
From: [personal profile] richardf8
Was taking inventory a subagent task? I thought the machine could monitor its inventory and email that to the agent.

Date: 2025-10-12 03:00 am (UTC)
silveradept: A kodama with a trombone. The trombone is playing music, even though it is held in a rest position (Default)
From: [personal profile] silveradept
Which continues to prove that in very specific circumstances, properly-trained and narrowly-constrained agents with machine-learning abilities might be able to work, but LLMs aren't those, and won't necessarily perform particularly well.

Date: 2025-10-15 09:46 pm (UTC)
halfshellvenus: (Default)
From: [personal profile] halfshellvenus
DAMN. Imagined cybercrime and legal intervention? It's amazing how histrionic AI can get.

Priceless.
