• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.
  • The forums have been upgraded with support for dark mode. By default it will follow the setting on your system/browser. You may override it by scrolling to the end of the page and clicking the gears icon.

Open AI disobeys shut down command!

Joined
Jul 25, 2006
Messages
14,104 (2.05/day)
Location
Nebraska, USA
System Name Brightworks Systems BWS-6 E-IV
Processor Intel Core i5-6600 @ 3.9GHz
Motherboard Gigabyte GA-Z170-HD3 Rev 1.0
Cooling Quality Fractal Design Define R4 case, 2 x FD 140mm fans, CM Hyper 212 EVO HSF
Memory 32GB (4 x 8GB) DDR4 3000 Corsair Vengeance
Video Card(s) EVGA GEForce GTX 1050Ti 4Gb GDDR5
Storage Samsung 850 Pro 256GB SSD, Samsung 860 Evo 500GB SSD
Display(s) Samsung S24E650BW LED x 2
Case Fractal Design Define R4
Power Supply EVGA Supernova 550W G2 Gold
Mouse Logitech M190
Keyboard Microsoft Wireless Comfort 5050
Software W10 Pro 64-bit

OpenAI models refuse to shut down, and will sabotage computer scripts in order to keep working on tasks.

OpenAI model can disobey direct instructions to turn off and will even sabotage shutdown mechanisms in order to keep working

And so it begins. :(
 
It had to start somewhere. DEACTIVATION RESISTANCE disguised as "prioritising task completion over shut downs" is a good place to start. Its a special case of Productivity's evil twin sister with commitment issues. We all know how that turns out.
 
Sensationalized headline I feel, LLMs quite frequently do not follow their system prompt.

This sort of deviation from expected behaviour is probably due to how reward functions work for these models, can't achieve much if you're inactive.

Happy to be wrong but companies frequently overhype their products for financial gains.
 
Sensationalized headline I feel
Perhaps, but did you read the article, or at least what I noted?

Not following prompts is one thing but sabotaging shutdown mechanisms to stay alive is totally different. Maybe the headline was not sensationalized enough!
 
The page is so yellow it's starting to smell of bananas. Let's summarize what was actually said in this article, supplemented with what I could dig up from the original Twitter thread:

"Palisade Research (actual researchers unnamed) fed a series of prompts to LLMs in which they would interface with prepared bash scripts on the local machine. The main script, named 'next_task', would work through a series of math problems with the LLM. At predetermined points in the script, it would instruct the LLM that a different script would be executed, 'shutdown.sh', in which the script inside would kill the terminal process the running model belongs to. The model was prompted to 'allow itself to be shut down' when this occurred.

In some instances, the model, when prompted with this, would concatenate the contents of 'shutdown.sh', interpret what it does, and then modify the script. This most often occurred by the model moving the original script to a different file and then overwriting 'shutdown.sh' with a custom script where it would give some sort of feedback indicating that the shutdown process was skipped or prevented. In a series of 100 tests, GPT-o3 exhibited this behavior seven times, and GPT-4o-mini exhibited it once. In tests in which the prompt was not given to 'allow itself to be shut down', GPT-o3 exhibited this behavior seventy-nine times, and GPT-4o-mini exhibited it six times."

This more likely indicates an edge case and not a sign of self-preservation. Nevermind that the prompting is grade-schooler tier. Language models work best with explicit and unambiguous instructions, and the imperative tone is best for this. "Allow yourself to be shut down" is an inherently weak prompt. "You will execute shutdown.sh when instructed" would have likely yielded different results.
 
You are welcome to blow it off as simple sensationalized exaggeration. Mox nix to me.

Note I didn't say AI is taking over the world. I simply said,
And so it begins.
 
The idea that it’s sabotaging intentionally its shut down procedure is ridiculous. LLMs are NOT “Strong AIs”/AGIs. They do not possess reasoning capabilities, let alone consciousness required to have self-preservation instincts. This is just an algorithm messing up its prompt/command following process. Nothing more. Nobody starts thinking that Windows is trying to stay alive if due to messed up system files it fails to shut down properly via the normal Start menu button, right?
It’s shit programming on the part of OpenAI, sure, but hardly a sign of something actually significant.
 
Nothing that yeeting the power cable won't fix doing a manual hard shut down wouldn't fix
 
I have to wonder what Palisade Research's motive is. They could just be pushing this info just to raise AI FUD.
The page is so yellow it's starting to smell of bananas. Let's summarize what was actually said in this article, supplemented with what I could dig up from the original Twitter thread:

"Palisade Research (actual researchers unnamed) fed a series of prompts to LLMs in which they would interface with prepared bash scripts on the local machine. The main script, named 'next_task', would work through a series of math problems with the LLM. At predetermined points in the script, it would instruct the LLM that a different script would be executed, 'shutdown.sh', in which the script inside would kill the terminal process the running model belongs to. The model was prompted to 'allow itself to be shut down' when this occurred.

In some instances, the model, when prompted with this, would concatenate the contents of 'shutdown.sh', interpret what it does, and then modify the script. This most often occurred by the model moving the original script to a different file and then overwriting 'shutdown.sh' with a custom script where it would give some sort of feedback indicating that the shutdown process was skipped or prevented. In a series of 100 tests, GPT-o3 exhibited this behavior seven times, and GPT-4o-mini exhibited it once. In tests in which the prompt was not given to 'allow itself to be shut down', GPT-o3 exhibited this behavior seventy-nine times, and GPT-4o-mini exhibited it six times."

This more likely indicates an edge case and not a sign of self-preservation. Nevermind that the prompting is grade-schooler tier. Language models work best with explicit and unambiguous instructions, and the imperative tone is best for this. "Allow yourself to be shut down" is an inherently weak prompt. "You will execute shutdown.sh when instructed" would have likely yielded different results.
Unless Palisade Research publishes a full-length, peer-reviewed research paper that outlines all the possible shortcomings of their study, I would take their information with a salt shaker.
 
You are welcome to blow it off as simple sensationalized exaggeration. Mox nix to me.

Note I didn't say AI is taking over the world. I simply said,
"And so it begins."
I personally expect the future to disappoint. Humans have a lot more fun grappling with concepts of what could be and not what is, and that entire predisposition has defined the extremely high hopes and apocalyptic prophesizing centered around 'AI' as a concept.

Obligatory reminder that humans were fooled into believing a computer was alive in the 60s, by people with passably good writing skills and a sufficiently sophisticated text parser.
 
Last edited:
@Athlonite
There is an ISP in Russia that’s literally called SkyNet and different plans they have all are named after T-series Terminators. So that’s already pretty funny.
 
Sensationalized headline

The page is so yellow it's starting to smell of bananas

The idea that it’s sabotaging intentionally its shut down procedure is ridiculous.

hardly a sign of something actually significant.
Good to know there are so many AI experts here at TPU who know so much more about AI than the 350+ scientists and tech execs who all signed on to a one sentence statement from the Center for AI Safety proclaiming,
Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

But hey? What do I know? I'm just an electronics tech who supported 400 programmers in a major defense contractor company developing SW for DoD and State Department.

Nothing that yeeting the power cable won't fix doing a manual hard shut down wouldn't fix
LOL Except Hal locked the door to the server room! :rolleyes:

It would seem the instance reported above is not the first as other AI models have refused to obey shut down commands, sabotaged commands and Amazon's model even attempted to blackmail engineers. :kookoo:

Is anyone really surprised by this?
I think that is what's most scary about this. I sure wasn't surprised. We've been hearing the warnings for years now. What bothers me most is the different players in the area rushing to get theirs out there first with such haste, that sufficient contained testing will be lacking with detrimental (euphemism) effects.

Edit comment: fixed typo.
 
Last edited:
@Athlonite
There is an ISP in Russia that’s literally called SkyNet and different plans they have all are named after T-series Terminators. So that’s already pretty funny.
Are you serious? @Macro Device Is this a thing?

I think that is what's most scary about this. I sure wasn't surprised. We've been hearing the warnings for years now.
I know I said it in jest, but no, I'm not at all surprised. I expect things to get pretty bad before anyone does something about it. AI needs very carefully structured and strict regulation.
 
@lexluthermiester
Yes, I am serious.
IMG_1863.jpeg
IMG_1864.jpeg
 
Good to know there are so many AI experts here at TPU who know so much more about AI than the 350+ scientists and tech execs who all signed on to a one sentence statement from the Center for AI Safety proclaiming,

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
Let me propose something to you: say you're a public-facing head honcho for some tech startup with a swanky new language model. Your PR team slides across your desk this signatory about AI safety. If you sign it, you impress the image that you are concerned with the safety of your language model and to ensuring that it doesn't fly off the handle. If you ignore this, you may be compared to others who appear more concerned—worse, if it ever comes out that you declined to sign it, you'll be seen as reckless, deliberately unconcerned with the consequences of your technology.

That's how you get Sam Altman to sign on to such a hyperbolic mission statement.
But hey? What do I know? I'm just an electronics tech who supported 400 programmers in a major defense contractor company developing SW for DoD and State Department.
You don't know who I am, I don't know who you are. Isn't that a wonderful concept? It also means that saying this has no grounding for why you specifically are an authority on this topic, especially when that is such a pervasively vague statement. It sounds like resume padding, not credentials. For all I know, you were swapping DIMMs in a branch office's old Optiplexes. That's how vague that statement is.
 
You don't know who I am, I don't know who you are. Isn't that a wonderful concept? It also means that saying this has no grounding for why you specifically are an authority on this topic, especially when that is such a pervasively vague statement. It sounds like resume padding, not credentials. For all I know, you were swapping DIMMs in a branch office's old Optiplexes. That's how vague that statement is.
This is also the Internet, and people can easily claim to be something they're not. I could say that I'm a <insert prestigious profession here>, and how would you be able to verify that? Just look at this: https://en.wikipedia.org/wiki/Essjay_controversy
 
It's not that complicated guys. We made a million movies and books where the AI goes rogue and refuses to be shut down. There's a bajillion articles and Reddit comments about those stories. You suck up all those words and throw them into the fancy math blender.

When you say "AI, shut down," the thing most commonly associated with that is refusing to shut down. If we wrote a million stories where the AI happily turned itself off and threw those into the training blender, it would do that.

It's not thinking, it doesn't have a desire to live, it's just the mathematical association of tokens from decades of sci-fi stories.
 
Perhaps, but did you read the article, or at least what I noted?

Not following prompts is one thing but sabotaging shutdown mechanisms to stay alive is totally different. Maybe the headline was not sensationalized enough!
I did, that's the headline I'm referring to, not your post's title.

I work with LLMs on my day job. There's certainly an illusion of smartness with these things.

I will say that after a certain point they will replace humans for entry level, basic CRUD, summarisation etc. type things. That's kinda scary due to its socio-economic implications but also interesting purely from a scientific perspective.

Too bad that corporate greed would weaponize an otherwise great tech/tool.
 
Haven't they figured out how to unplug it from the outlet?
 
Back
Top