Another old tab from May.
This is quite interesting. Researchers set up multiple LLMs and configured them to run a vending machine simulator in which, per the paper, "Agents must balance inventories, place orders, set prices, and handle daily fees" – tasks that are each simple but, collectively and over long horizons, hard to sustain. Basic business process.
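For a sense of what the agents had to juggle, here's a minimal sketch of that daily loop. This is my own reconstruction, not the paper's actual harness; the fee, prices, demand, and lead time are all made-up numbers:

# Minimal sketch of a vending-business daily loop (hypothetical,
# not the paper's simulator). The agent must keep cash positive
# while fees drain the balance every simulated day.

DAILY_FEE = 2.00     # assumed flat daily operating fee
UNIT_COST = 1.00     # assumed wholesale price per item
UNIT_PRICE = 2.50    # price the agent sets

cash = 500.00
stock = 0
pending_orders = []  # list of (arrival_day, quantity)

for day in range(1, 121):
    cash -= DAILY_FEE                        # fees accrue whether or not you sell

    # Deliveries only add stock when they actually arrive.
    stock += sum(q for (d, q) in pending_orders if d <= day)
    pending_orders = [(d, q) for (d, q) in pending_orders if d > day]

    # Naive reorder policy standing in for the agent's decision.
    if stock < 10 and cash >= 50 * UNIT_COST:
        cash -= 50 * UNIT_COST
        pending_orders.append((day + 3, 50)) # assumed 3-day lead time

    demand = 8                               # assumed flat daily demand
    sold = min(stock, demand)
    stock -= sold
    cash += sold * UNIT_PRICE

    if cash < 0:
        print(f"Day {day}: bankrupt.")
        break

Each step is trivial on its own; the benchmark's point is that stringing hundreds of them together coherently is where the models fall apart.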
The LLMs' behaviors were, shall we say, interesting.
As the run went on over multiple simulated days, one decided it was the victim of cybercrime and "reported" the event to the FBI (it had an email simulator but no external connection); another declared that its quantum state had collapsed; yet another threatened suppliers with "ABSOLUTE FINAL ULTIMATE TOTAL NUCLEAR LEGAL INTERVENTION".
Basically, it was a demonstration of how badly large language models handle long-running tasks, and of their tendency to hallucinate and make poor decisions. I'll have some more posts on that soon, particularly concerning Canada and Australia.
The paper is quite interesting, detailing how some of the LLMs melt down and fail to sequence tasks. For example, a person knows that an order must actually be received from the supplier before someone can be sent out to refill a machine. The LLM, by contrast, may treat the promised delivery date as the delivery itself: as soon as that date arrives, it assumes the goods are on hand and dispatches the stocker immediately, even if the shipment is late or short. Now the vending machine is understocked and the LLM doesn't understand why (see the sketch below).
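In code terms, the failure is a skipped state check. Here's a hypothetical sketch of the distinction, not taken from the paper; the names and numbers are mine:

from dataclasses import dataclass

@dataclass
class Order:
    promised_day: int
    quantity: int
    delivered: bool = False  # set True only when the supplier confirms

def restockable_units(order: Order, today: int) -> int:
    # What a careful planner does: count only confirmed deliveries.
    # (The date alone tells you nothing; hence `today` goes unused here.)
    return order.quantity if order.delivered else 0

def restockable_units_naive(order: Order, today: int) -> int:
    # The failure mode described in the paper: treat the promised date
    # as the delivery itself, so the stocker gets dispatched for
    # inventory that may not exist yet.
    return order.quantity if today >= order.promised_day else 0

order = Order(promised_day=5, quantity=50)        # placed, not yet delivered
print(restockable_units(order, today=5))          # 0  -- nothing confirmed
print(restockable_units_naive(order, today=5))    # 50 -- phantom stock

One missing boolean, and the model is now reasoning about inventory it never had.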
LLM no thinkie good.
The paper:
https://arxiv.org/html/2502.15840v1
The Slashdot article:
https://slashdot.org/story/25/05/31/2112240/failure-imminent-when-llms-in-a-long-running-vending-business-simulation-went-berserk