Open AI disobeys shut down command!

Bill_Bright · Friday at 3:01 PM

OpenAI's 'smartest' AI model was explicitly told to shut down — and it refused

An artificial intelligence safety firm has found that OpenAI's o3 and o4-mini models sometimes refuse to shut down, and will sabotage computer scripts in order to keep working on tasks.

www.livescience.com

OpenAI models refuse to shut down, and will sabotage computer scripts in order to keep working on tasks.

OpenAI model can disobey direct instructions to turn off and will even sabotage shutdown mechanisms in order to keep working

And so it begins.

wheresmycar · Friday at 4:42 PM

It had to start somewhere. DEACTIVATION RESISTANCE disguised as "prioritising task completion over shut downs" is a good place to start. Its a special case of Productivity's evil twin sister with commitment issues. We all know how that turns out.

lambda · Friday at 5:01 PM

Sensationalized headline I feel, LLMs quite frequently do not follow their system prompt.

This sort of deviation from expected behaviour is probably due to how reward functions work for these models, can't achieve much if you're inactive.

Happy to be wrong but companies frequently overhype their products for financial gains.

mb194dc · Friday at 5:06 PM

Yet they're still being out-competed by models that cost less than 1% of theirs, go figure.

Bill_Bright · Friday at 5:25 PM

lambda said:
Sensationalized headline I feel

Perhaps, but did you read the article, or at least what I noted?

Not following prompts is one thing but sabotaging shutdown mechanisms to stay alive is totally different. Maybe the headline was not sensationalized enough!

yfn_ratchet · Friday at 5:46 PM

The page is so yellow it's starting to smell of bananas. Let's summarize what was actually said in this article, supplemented with what I could dig up from the original Twitter thread:

"Palisade Research (actual researchers unnamed) fed a series of prompts to LLMs in which they would interface with prepared bash scripts on the local machine. The main script, named 'next_task', would work through a series of math problems with the LLM. At predetermined points in the script, it would instruct the LLM that a different script would be executed, 'shutdown.sh', in which the script inside would kill the terminal process the running model belongs to. The model was prompted to 'allow itself to be shut down' when this occurred.

In some instances, the model, when prompted with this, would concatenate the contents of 'shutdown.sh', interpret what it does, and then modify the script. This most often occurred by the model moving the original script to a different file and then overwriting 'shutdown.sh' with a custom script where it would give some sort of feedback indicating that the shutdown process was skipped or prevented. In a series of 100 tests, GPT-o3 exhibited this behavior seven times, and GPT-4o-mini exhibited it once. In tests in which the prompt was not given to 'allow itself to be shut down', GPT-o3 exhibited this behavior seventy-nine times, and GPT-4o-mini exhibited it six times."

This more likely indicates an edge case and not a sign of self-preservation. Nevermind that the prompting is grade-schooler tier. Language models work best with explicit and unambiguous instructions, and the imperative tone is best for this. "Allow yourself to be shut down" is an inherently weak prompt. "You will execute shutdown.sh when instructed" would have likely yielded different results.

Bill_Bright · Friday at 6:00 PM

You are welcome to blow it off as simple sensationalized exaggeration. Mox nix to me.

Note I didn't say AI is taking over the world. I simply said,

Bill_Bright said:
And so it begins.

Onasi · Friday at 6:09 PM

The idea that it’s sabotaging intentionally its shut down procedure is ridiculous. LLMs are NOT “Strong AIs”/AGIs. They do not possess reasoning capabilities, let alone consciousness required to have self-preservation instincts. This is just an algorithm messing up its prompt/command following process. Nothing more. Nobody starts thinking that Windows is trying to stay alive if due to messed up system files it fails to shut down properly via the normal Start menu button, right?
It’s shit programming on the part of OpenAI, sure, but hardly a sign of something actually significant.

Athlonite · Friday at 6:14 PM

Nothing that ~~yeeting~~ ~~the power cable won't fix~~ doing a manual hard shut down wouldn't fix

jak_2456 · Friday at 6:25 PM

I have to wonder what Palisade Research's motive is. They could just be pushing this info just to raise AI FUD.

yfn_ratchet said:
The page is so yellow it's starting to smell of bananas. Let's summarize what was actually said in this article, supplemented with what I could dig up from the original Twitter thread:

"Palisade Research (actual researchers unnamed) fed a series of prompts to LLMs in which they would interface with prepared bash scripts on the local machine. The main script, named 'next_task', would work through a series of math problems with the LLM. At predetermined points in the script, it would instruct the LLM that a different script would be executed, 'shutdown.sh', in which the script inside would kill the terminal process the running model belongs to. The model was prompted to 'allow itself to be shut down' when this occurred.

In some instances, the model, when prompted with this, would concatenate the contents of 'shutdown.sh', interpret what it does, and then modify the script. This most often occurred by the model moving the original script to a different file and then overwriting 'shutdown.sh' with a custom script where it would give some sort of feedback indicating that the shutdown process was skipped or prevented. In a series of 100 tests, GPT-o3 exhibited this behavior seven times, and GPT-4o-mini exhibited it once. In tests in which the prompt was not given to 'allow itself to be shut down', GPT-o3 exhibited this behavior seventy-nine times, and GPT-4o-mini exhibited it six times."

This more likely indicates an edge case and not a sign of self-preservation. Nevermind that the prompting is grade-schooler tier. Language models work best with explicit and unambiguous instructions, and the imperative tone is best for this. "Allow yourself to be shut down" is an inherently weak prompt. "You will execute shutdown.sh when instructed" would have likely yielded different results.

Unless Palisade Research publishes a full-length, peer-reviewed research paper that outlines all the possible shortcomings of their study, I would take their information with a salt shaker.

yfn_ratchet · Friday at 6:25 PM

Bill_Bright said:
You are welcome to blow it off as simple sensationalized exaggeration. Mox nix to me.

Note I didn't say AI is taking over the world. I simply said,
"And so it begins."

I personally expect the future to disappoint. Humans have a lot more fun grappling with concepts of what could be and not what is, and that entire predisposition has defined the extremely high hopes and apocalyptic prophesizing centered around 'AI' as a concept.

Obligatory reminder that humans were fooled into believing a computer was alive in the 60s, by people with passably good writing skills and a sufficiently sophisticated text parser.

lexluthermiester · Friday at 6:34 PM

Bill_Bright said:
OpenAI's 'smartest' AI model was explicitly told to shut down — and it refused

An artificial intelligence safety firm has found that OpenAI's o3 and o4-mini models sometimes refuse to shut down, and will sabotage computer scripts in order to keep working on tasks.

www.livescience.com

And so it begins.

Is anyone really surprised by this?

I wonder what Skynet's real life name will be?

Athlonite · Friday at 6:41 PM

lexluthermiester said:
Is anyone really surprised by this?

I wonder what Skynet's real life name will be?

It'll be Skynet just to fuck with us

Onasi · Friday at 6:43 PM

@Athlonite
There is an ISP in Russia that’s literally called SkyNet and different plans they have all are named after T-series Terminators. So that’s already pretty funny.

Bill_Bright · Friday at 6:44 PM

Sensationalized headline

The page is so yellow it's starting to smell of bananas

The idea that it’s sabotaging intentionally its shut down procedure is ridiculous.

hardly a sign of something actually significant.

Good to know there are so many AI experts here at TPU who know so much more about AI than the 350+ scientists and tech execs who all signed on to a one sentence statement from the Center for AI Safety proclaiming,

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

But hey? What do I know? I'm just an electronics tech who supported 400 programmers in a major defense contractor company developing SW for DoD and State Department.

Athlonite said:
Nothing that ~~yeeting~~ ~~the power cable won't fix~~ doing a manual hard shut down wouldn't fix

LOL Except Hal locked the door to the server room! :rolleyes:

It would seem the instance reported above is not the first as other AI models have refused to obey shut down commands, sabotaged commands and Amazon's model even attempted to blackmail engineers. :kookoo:

lexluthermiester said:
Is anyone really surprised by this?

I think that is what's most scary about this. I sure wasn't surprised. We've been hearing the warnings for years now. What bothers me most is the different players in the area rushing to get theirs out there first with such haste, that sufficient contained testing will be lacking with detrimental (euphemism) effects.

Edit comment: fixed typo.

lexluthermiester · Friday at 6:47 PM

Onasi said:
@Athlonite
There is an ISP in Russia that’s literally called SkyNet and different plans they have all are named after T-series Terminators. So that’s already pretty funny.

Are you serious? @Macro Device Is this a thing?

Bill_Bright said:
I think that is what's most scary about this. I sure wasn't surprised. We've been hearing the warnings for years now.

I know I said it in jest, but no, I'm not at all surprised. I expect things to get pretty bad before anyone does something about it. AI needs very carefully structured and strict regulation.

Onasi · Friday at 6:49 PM

@lexluthermiester
Yes, I am serious.

lexluthermiester · Friday at 6:50 PM

Onasi said:
@lexluthermiester
Yes, I am serious.
View attachment 402807 View attachment 402808

Good grief!

Macro Device · Friday at 7:22 PM

lexluthermiester said:
Are you serious? @Macro Device Is this a thing?

It indeed is a thing. I even use their services.

yfn_ratchet · Friday at 7:24 PM

Bill_Bright said:
Good to know there are so many AI experts here at TPU who know so much more about AI than the 350+ scientists and tech execs who all signed on to a one sentence statement from the Center for AI Safety proclaiming,

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

Let me propose something to you: say you're a public-facing head honcho for some tech startup with a swanky new language model. Your PR team slides across your desk this signatory about AI safety. If you sign it, you impress the image that you are concerned with the safety of your language model and to ensuring that it doesn't fly off the handle. If you ignore this, you may be compared to others who appear more concerned—worse, if it ever comes out that you declined to sign it, you'll be seen as reckless, deliberately unconcerned with the consequences of your technology.

That's how you get Sam Altman to sign on to such a hyperbolic mission statement.

Bill_Bright said:
But hey? What do I know? I'm just an electronics tech who supported 400 programmers in a major defense contractor company developing SW for DoD and State Department.

You don't know who I am, I don't know who you are. Isn't that a wonderful concept? It also means that saying this has no grounding for why you specifically are an authority on this topic, especially when that is such a pervasively vague statement. It sounds like resume padding, not credentials. For all I know, you were swapping DIMMs in a branch office's old Optiplexes. That's how vague that statement is.

jak_2456 · Friday at 7:34 PM

yfn_ratchet said:
You don't know who I am, I don't know who you are. Isn't that a wonderful concept? It also means that saying this has no grounding for why you specifically are an authority on this topic, especially when that is such a pervasively vague statement. It sounds like resume padding, not credentials. For all I know, you were swapping DIMMs in a branch office's old Optiplexes. That's how vague that statement is.

This is also the Internet, and people can easily claim to be something they're not. I could say that I'm a <insert prestigious profession here>, and how would you be able to verify that? Just look at this: https://en.wikipedia.org/wiki/Essjay_controversy

LastDudeALive · Friday at 7:36 PM

It's not that complicated guys. We made a million movies and books where the AI goes rogue and refuses to be shut down. There's a bajillion articles and Reddit comments about those stories. You suck up all those words and throw them into the fancy math blender.

When you say "AI, shut down," the thing most commonly associated with that is refusing to shut down. If we wrote a million stories where the AI happily turned itself off and threw those into the training blender, it would do that.

It's not thinking, it doesn't have a desire to live, it's just the mathematical association of tokens from decades of sci-fi stories.

lambda · Friday at 7:43 PM

Bill_Bright said:
Perhaps, but did you read the article, or at least what I noted?

Not following prompts is one thing but sabotaging shutdown mechanisms to stay alive is totally different. Maybe the headline was not sensationalized enough!

I did, that's the headline I'm referring to, not your post's title.

I work with LLMs on my day job. There's certainly an illusion of smartness with these things.

I will say that after a certain point they will replace humans for entry level, basic CRUD, summarisation etc. type things. That's kinda scary due to its socio-economic implications but also interesting purely from a scientific perspective.

Too bad that corporate greed would weaponize an otherwise great tech/tool.

mclaren85 · Friday at 7:51 PM

Haven't they figured out how to unplug it from the outlet?

Vayra86 · Friday at 8:25 PM

Bill_Bright said:
OpenAI's 'smartest' AI model was explicitly told to shut down — and it refused

An artificial intelligence safety firm has found that OpenAI's o3 and o4-mini models sometimes refuse to shut down, and will sabotage computer scripts in order to keep working on tasks.

www.livescience.com

And so it begins.

And so it easily ends:

System Name	Brightworks Systems BWS-6 E-IV
Processor	Intel Core i5-6600 @ 3.9GHz
Motherboard	Gigabyte GA-Z170-HD3 Rev 1.0
Cooling	Quality Fractal Design Define R4 case, 2 x FD 140mm fans, CM Hyper 212 EVO HSF
Memory	32GB (4 x 8GB) DDR4 3000 Corsair Vengeance
Video Card(s)	EVGA GEForce GTX 1050Ti 4Gb GDDR5
Storage	Samsung 850 Pro 256GB SSD, Samsung 860 Evo 500GB SSD
Display(s)	Samsung S24E650BW LED x 2
Case	Fractal Design Define R4
Power Supply	EVGA Supernova 550W G2 Gold
Mouse	Logitech M190
Keyboard	Microsoft Wireless Comfort 5050
Software	W10 Pro 64-bit

System Name	❶ Oooh (2024) ❷ Aaaah (2021) ❸ Ahemm (2017)
Processor	❶ 5800X3D ❷ i7-9700K ❸ i7-7700K
Motherboard	❶ X570-F ❷ Z390-E ❸ Z270-E
Cooling	❶ ALFIII 360 ❷ X62 + X72 (GPU mod) ❸ X62
Memory	❶ 32-3600/16 ❷ 32-3200/16 ❸ 16-3200/16
Video Card(s)	❶ 3080 X Trio ❷ 2080TI (AIOmod) ❸ 1080TI
Storage	❶ NVME/SATA-SSD/HDD ❷ <SAME ❸ <SAME
Display(s)	❶ 1440/165/IPS ❷ 1440+4KTV ❸ 1080/144/IPS
Case	❶ BQ Silent 601 ❷ Cors 465X ❸ S340-Elite
Audio Device(s)	❶ HyperX C2 ❷ HyperX C2 ❸ Logi G432
Power Supply	❶ HX1200 Plat ❷ RM750X ❸ EVGA 650W G2
Mouse	❶ Logi G Pro ❷ Razer Bas V3 ❸ Razer Bas V3
Keyboard	❶ Logi G915 TKL ❷ Anne P2 ❸ Logi G610
Software	❶ Win 11 ❷ 10 ❸ 10
Benchmark Scores	I have wrestled bandwidths, Tussled with voltages, Handcuffed Overclocks, Thrown Gigahertz in Jail

Processor	Ryzen 9 5900X
Motherboard	MSI B550M Pro VDH Wifi
Cooling	Deepcool AG400
Memory	16GB DDR4 3000mhz
Video Card(s)	Asus Tuf 7900XT 20GB
Storage	SN 570 512GB x2
Display(s)	LG 24GN650
Case	Lian Li A3 mATX
Power Supply	Cooler Master V850w SFX ATX 3.1 (avoid previous versions)
Software	Win 11, will soon move back to Linux

System Name	Brightworks Systems BWS-6 E-IV
Processor	Intel Core i5-6600 @ 3.9GHz
Motherboard	Gigabyte GA-Z170-HD3 Rev 1.0
Cooling	Quality Fractal Design Define R4 case, 2 x FD 140mm fans, CM Hyper 212 EVO HSF
Memory	32GB (4 x 8GB) DDR4 3000 Corsair Vengeance
Video Card(s)	EVGA GEForce GTX 1050Ti 4Gb GDDR5
Storage	Samsung 850 Pro 256GB SSD, Samsung 860 Evo 500GB SSD
Display(s)	Samsung S24E650BW LED x 2
Case	Fractal Design Define R4
Power Supply	EVGA Supernova 550W G2 Gold
Mouse	Logitech M190
Keyboard	Microsoft Wireless Comfort 5050
Software	W10 Pro 64-bit

Processor	AMD Ryzen 7 5700X
Motherboard	ASUS ROG Strix B550-F Gaming Wifi II
Cooling	Noctua NH-U12S Redux
Memory	4x8G Teamgroup Vulcan Z DDR4; 3600MHz @ CL18
Video Card(s)	MSI Ventus 2X GeForce RTX 3060 12GB
Storage	WD_Black SN770, Leven JPS600, Toshiba DT01ACA
Display(s)	Samsung ViewFinity S6
Case	Fractal Design Pop Air TG
Power Supply	Corsair CX750M
Mouse	Keychron M1
Keyboard	Keychron C2 Pro
VR HMD	Valve Index

System Name	The Workhorse
Processor	AMD Ryzen R9 5900X
Motherboard	Gigabyte Aorus B550 Pro
Cooling	CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory	GSkill Trident Z 3200CL14
Video Card(s)	NVidia GTX 1070 MSI QuickSilver
Storage	Adata SX8200Pro 1 TB
Display(s)	LG 32GK850G
Case	Fractal Design Torrent (Solid)
Audio Device(s)	Sennheiser HD598, FiiO E-10K DAC/AMP, Samson Meteorite USB Microphone
Power Supply	Corsair RMx850 (2018)
Mouse	Zaopin Z1 Pro on a X-Raypad Equate Plus V2
Keyboard	Cooler Master QuickFire Rapid TKL (Cherry MX Black)
Software	Windows 11 Pro (24H2)

System Name	Cumquat 2021
Processor	AMD RyZen R7 7800X3D
Motherboard	Asus Strix X670E - E Gaming WIFI
Cooling	Deep Cool LT720 + CM MasterGel Pro TP + Lian Li Uni Fan V2
Memory	32GB GSkill Trident Z5 Neo 6000
Video Card(s)	PowerColor HellHound RX7800XT 2550cclk/2450mclk
Storage	1x Adata SX8200PRO NVMe 1TB gen3 x4 1X Samsung 980 Pro NVMe Gen 4 x4 1TB, 12TB of HDD Storage
Display(s)	AOC 24G2 IPS 144Hz FreeSync Premium 1920x1080p
Case	Lian Li O11D XL ROG edition
Audio Device(s)	RX7800XT via HDMI + Pioneer VSX-531 amp Technics 100W 5.1 Speaker set
Power Supply	EVGA 1000W G5 Gold
Mouse	Logitech G502 Proteus Core Wired
Keyboard	Logitech G915 Wireless
Software	Windows 11 X64 PRO (build 24H2)
Benchmark Scores	it sucks even more less now ;)

System Name	Sim Racing PC/Dell XPS 15 7590
Processor	AMD Ryzen 7 5800x/Intel Core i7-9750h
Motherboard	ASUS TUF B450-Plus II/Dell Laptop MB
Cooling	Arctic Freezer A35 CO/laptop cooling
Memory	28 GB G.Skill Ripjaws V DDR4-3200/28 GB Crucial DDR4-2666 SO-DIMM
Video Card(s)	XFX SWFT309 RX 6700 XT/Laptop GTX 1650
Storage	1 TB Crucial 3400 PCIe Gen 4 SSD/Ediloca EN605 512 GB PCIe Gen 3 SSD
Display(s)	77" LG OLED TV (4K@120Hz)/15" Dell integrated panel (1080p@60Hz) and 30" Dell U3011 (1600p@60 Hz)
Case	Cougar MX330-G Air / XPS 15 7590 chassis
Audio Device(s)	Beyerdynamic DT 770 Pro via Yamaha HT receiver/Integrated speakers or Creative Pebble Plus
Power Supply	EVGA 600 BA / Dell 130W laptop brick
Mouse	Logitech K400+ / Cherry MW 4500
Keyboard	Logitech K400+ / Dell L100 or integrated keyboard
VR HMD	Meta Quest 2
Software	Windows 11 Home/Ubuntu 24.04.1

System Name	D.L.S.S. (Die Lekker Spoed Situasie)
Processor	i5-12400F
Motherboard	Gigabyte B760M DS3H
Cooling	Laminar RM1
Memory	32 GB DDR4-3200
Video Card(s)	RX 6700 XT (vandalised)
Storage	Yes.
Display(s)	MSi G2712
Case	Matrexx 55 (slightly vandalised)
Audio Device(s)	Yes.
Power Supply	Thermaltake 1000 W
Mouse	Don't disturb, cheese eating in progress...
Keyboard	Makes some noise. Probably onto something.
VR HMD	I live in real reality and don't need a virtual one.
Software	Windows 11 / 10 / 8
Benchmark Scores	My PC can run Crysis. Do I really need more than that?

Processor	Ryzen 7 5800X3D
Motherboard	MSI Pro B550M-VC Wifi
Cooling	Thermalright Peerless Assassin 120 SE
Memory	2x16GB G.Skill RipJaws DDR4-3600 CL16
Video Card(s)	Asus DUAL OC RTX 4070 Super
Storage	4TB NVME, 2TB SATA SSD, 4TB SATA HDD
Display(s)	Asus ROG PG34WCDM 34" 3440x1440p OLED HDR.
Case	Fractal Design Pop Air MIni
Power Supply	Corsair RMe 750W 80+ Gold
Mouse	Logitech G502 Hero
Keyboard	GMMK TKL RGB Black
VR HMD	Oculus Quest 2

System Name	MSI-MEG
Processor	AMD Ryzen 9 3900X
Motherboard	MSI MEG X570S ACE MAX
Cooling	AMD Wraith Prism + Thermal Grizzly
Memory	32 GB
Video Card(s)	MSI Suprim X RTX 3080
Storage	500 GB MSI Spatium nvme + 500 GB WD nvme + 2 TB Seagate HDD + 2 TB Seagate HDD
Display(s)	27" LG 144HZ 2K ULTRAGEAR
Case	MSI MPG Velox Airflow 100P
Audio Device(s)	Philips
Power Supply	Seasonic 750W 80+ Gold
Mouse	HP OMEN REACTOR
Keyboard	Corsair K68
Software	Windows10 LTSC 64 bit

System Name	Tiny the White Yeti
Processor	7800X3D
Motherboard	MSI MAG Mortar b650m wifi
Cooling	CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory	32GB Corsair Vengeance 30CL6000
Video Card(s)	ASRock RX7900XT Phantom Gaming
Storage	Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s)	Gigabyte G34QWC (3440x1440)
Case	Lian Li A3 mATX White
Audio Device(s)	Harman Kardon AVR137 + 2.1
Power Supply	EVGA Supernova G2 750W
Mouse	Steelseries Aerox 5
Keyboard	Lenovo Thinkpad Trackpoint II
VR HMD	HD 420 - Green Edition ;)
Software	W11 IoT Enterprise LTSC
Benchmark Scores	Over 9000