
Is RX 9070 VRAM temperature regular value or hotspot?

There is no pattern. Just examples of engineers failing and getting things right.
Engineers warned about the O-rings, but their concerns were ignored. The disaster became a tragic lesson in safety and decision-making: The greatest teacher, failure is.

Fear of losing money and delays led to reckless decisions, ultimately causing suffering for the environment and people: Fear is the path to the dark side. fear leads to anger, anger leads to hate, hate leads to suffering.

There is no ignorance, there is knowledge.

There is no emotion, there is peace.
 
Damn, now we're stuck between Yoda and Lord of the Rings one-liners; this is escalating quickly :D
 
How has it degraded if it's still running the same clocks as 3 years ago?
My 12900K runs the same clocks it did on day one, yet it's still heavily (and I mean heavily) degraded; pretty easy to tell. I literally gave you the answer: parts degrade. That's why they come overvolted from the factory, and why you can undervolt them and they still work.

yet not a single one of them has been reported to fail?
Excuse me, what? The cards haven't even been out for a month; how would you expect failure reports? Do you understand what electromigration is, or are you just being defensive because you own the product? Expecting failures after two weeks makes zero sense whatsoever. By the same logic, 13900Ks didn't fail because we didn't have reports two weeks after their release, right?
 
I had the 7870 Hawk model and overclocked it by 47.5% from time to time. By the time I sold it, it had lost 20-30 mV of stability margin: to run at the same factory clocks, it needed ~25 mV more.

The MSI Hawk model allowed unlimited OC, but came with no warning about what kind of degradation that would cause. So I took my chances, since there was no negative data at the time, until I saw the degradation myself (though on average I only ran a 27.5% OC on the GPU).

Now people are overclocking the 5000 series like crazy. What if Nvidia hasn't activated the extra integer pipelines yet and does so later with a driver update? Then those overclocks will start crashing. What if the 5000 series degrades faster than older generations?

AusWolf's idea is like:

  • "let me take a spoon of water from ocean..."
  • looks at the spoon, sees no whale
  • "no whales in ocean for sure!"
 
Fear is the path to the dark side. fear leads to anger, anger leads to hate, hate leads to suffering.
Yep. That's why I prefer to live with no fear.

There is no emotion, there is peace.
Exactly my point. You seem to be arguing from emotion, drawing parallels to completely unrelated things out of fear.

My 12900K runs the same clocks it did on day one, yet it's still heavily (and I mean heavily) degraded; pretty easy to tell. I literally gave you the answer: parts degrade. That's why they come overvolted from the factory, and why you can undervolt them and they still work.
So, if the card maxes out its power limit just like it did on day one, and it runs the same clocks as it did on day one, then I can assume that voltages have stayed the same, right? Otherwise, if voltages had increased, then clocks would have dropped to stay within power target.
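A rough back-of-the-envelope using the usual dynamic-power approximation (power roughly proportional to C·V²·f) illustrates that logic; all the numbers below are made up for illustration, not measurements from any real card:

```python
# Back-of-the-envelope: dynamic power scales roughly with C * V^2 * f.
# All numbers are illustrative, not taken from any real card.

def max_clock_at_power_cap(power_cap_w, voltage_v, c_eff=1.0):
    """Highest clock (arbitrary units) that still fits under the power cap."""
    return power_cap_w / (c_eff * voltage_v ** 2)

cap = 300.0                                             # W, fixed board power limit
day_one  = max_clock_at_power_cap(cap, voltage_v=1.00)
degraded = max_clock_at_power_cap(cap, voltage_v=1.05)  # suppose it now needs +50 mV

print(f"clock ratio after +50 mV: {degraded / day_one:.3f}")  # ~0.907, i.e. roughly 9% lower
```

So, within this simple model at least, an unchanged power limit plus unchanged clocks does suggest the voltage hasn't crept up.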

Excuse me, what? The cards haven't even been out for a month; how would you expect failure reports? Do you understand what electromigration is, or are you just being defensive because you own the product? Expecting failures after two weeks makes zero sense whatsoever. By the same logic, 13900Ks didn't fail because we didn't have reports two weeks after their release, right?
So we'll see reports later. Or maybe not. What can we do in the meantime besides adjusting fan curves to comfortable levels and/or bickering about it on an online forum?
 
Yep. That's why I prefer to live with no fear.
I was talking about the corporations: their fear of losing money leading to the suffering of gamers.
Exactly my point. You seem to be arguing from emotion, drawing parallels to completely unrelated things out of fear.
I was talking about corporations. This part is linked to the "reckless decisions due to fear of losing money".
 
I was talking about the corporations: their fear of losing money leading to the suffering of gamers.

I was talking about corporations.
Fair enough. Still, my point stands: I need evidence and explanation, not theories.
 
Evidence requires experiments. Experiments require lab rats. Are we the lab rats? (I'm against using animals in experiments, and against using humans too; bacteria are fine.) Are we beta testers? I didn't volunteer for this anywhere.

Similar thing: missing ROPs.

How does the quality-assurance department miss such an important part? Don't they even benchmark for performance or count the ROPs? Even the power draw changes with fewer ROPs.

How can they miss:
  • decreased power
  • decreased performance
  • decreased number of ROPs
and not inform AIBs?

Also, what do AIBs do? Just slap a GPU on a board and sell it with zero tests?
 
Evidence requires experiments. Experiments require lab rats. Are we the lab rats? (I'm against using animals in experiments, and against using humans too; bacteria are fine.) Are we beta testers? I didn't volunteer for this anywhere.
No. But I've got a 2 year warranty. I'm sure there are lots of people who will game a lot more during those 2 years than I could ever hope for.

Until there's evidence of cards failing due to bad/hot VRAM, I'm gonna assume that it's intended operation simply because all cards of all AIBs are affected, and because I bought the card to game on it, not to be scared of some unknown working of it that I don't understand. Life is short, let's enjoy it while we can, and deal with problems as they arise, not before.
 
But I've got a 2 year warranty.
So, as long as there's a warranty, is it tolerable to beta-test faulty hardware for big corporations, then send the card in and wait two months for it to come back, or buy a new one when inventory refills? What happens to the lost time? Is there a warranty for the lost time? What if I bought it for my work because I can't buy an H100 GPU?

The only advantage of a hot-running chip is faster heat transfer: the bigger the temperature difference to the surrounding air, the faster it sheds heat. But the list of disadvantages is longer:
  • damage to neighboring components such as capacitors, VRM chips, maybe other transistors and even the motherboard's PCIe bridge due to heat creep
  • increased electromigration rate, leading to degradation of the silicon
  • hotter air exiting the card towards the CPU cooler, reducing CPU performance through throttling
  • heating/cooling cycles making the component expand and shrink periodically, causing cracks in the PCB or the component itself
  • reduced peace of mind -> you have to play it totally safe rather than push anything. Not fun.
  • thermal throttling of the component itself -> automatic loosening of CL timings, as we observe in Hynix graphics card memory; it loosens the CL timing roughly every 5-10 degrees Celsius further up the scale (see the logging sketch below)
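For that last point, here's a minimal sketch of how you could log the VRAM temperature yourself and watch for the throttling step. It assumes Linux with the amdgpu driver, which as far as I know exposes a hwmon channel labelled "mem"; paths and labels may differ on other setups:

```python
# Rough VRAM temperature logger for Linux with the amdgpu driver.
# Assumption: the driver exposes a hwmon channel whose temp*_label reads "mem";
# other drivers/operating systems won't have this path.
import glob
import time

def find_mem_temp_file():
    for label_path in glob.glob("/sys/class/hwmon/hwmon*/temp*_label"):
        with open(label_path) as f:
            if f.read().strip() == "mem":
                return label_path.replace("_label", "_input")
    return None

sensor = find_mem_temp_file()
if sensor is None:
    raise SystemExit("no 'mem' temperature sensor found (not amdgpu, or an older driver)")

while True:
    with open(sensor) as f:
        millidegrees = int(f.read().strip())   # sysfs reports millidegrees Celsius
    print(f"VRAM: {millidegrees / 1000:.1f} C")
    time.sleep(2)
```

Run it alongside a bandwidth test and you can see whether the reported GB/s steps down past certain temperatures.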
 
Also, what do AIBs do? Just slap a GPU on a board and sell it with zero tests?
All of them? With all cards?

So, as long as there's a warranty, is it tolerable to beta-test faulty hardware for big corporations, then send the card in and wait two months for it to come back, or buy a new one when inventory refills? What happens to the lost time? Is there a warranty for the lost time? What if I bought it for my work because I can't buy an H100 GPU?
If you're so concerned about working on your GPU, then you probably didn't buy a 9070 XT in the first place.

Also, we're talking about a slow degradation of the VRAM chips. By the time they fail, inventory will be fine (supposedly).

But in any case, why worry about something you can't help? And if you can help it, why worry about it instead of doing something about it?
 
By the time they fail, inventory will be fine
How do you know? Can you say WW3 will not happen? Or that some corporation won't go bankrupt and stop production? Or that they won't ban selling these cards outside a single country? Or ban GPU usage altogether for some nonsense reason?

In fact, if they move factories from China etc. into the USA, production will get more expensive, because US engineers will ask for more money -> leading to lower expected sales volumes.
 
How do you know? Can you say WW3 will not happen? Or that some corporation won't go bankrupt and stop production? Or that they won't ban selling these cards outside a single country? Or ban GPU usage altogether for some nonsense reason?
Or maybe an asteroid will hit Earth and we all die. We never know. But that's exactly why I'm asking: why worry about something you can't control? It's a waste of time.
 
why worry about something you can't control?
We can control it. It's called a lawsuit. People did this before, over the so-called 8-core desktop CPU FX-8150. Today, AMD prefers real cores, not shared cores, and adds SMT to each core as the cherry on top.

Look at Intel. No development; just add 100 MHz and call it next gen. They even removed AVX-512 from some generations. People have chosen to exercise control by preferring AMD.
 
We can control it. It's called a lawsuit.
Go ahead, sue AMD, then. I'm not sure it'll be cheaper than replacing a dead 9070 XT.

No, and then we get to the very core of what I'm trying to bring across :) Since you don't want that, you assume it's fine. You assume it because you own the card - I don't assume it because I don't own the card.

It's all psychology. That's why I've said that statement is so very important. Don't leave this kind of thing up in the air. Nobody benefits.
No. I assume that because I have no data to suggest otherwise.
 
So if you have a car, are you always OK with the temperature indicator at the red line? What if you're working at a nuclear power plant?
 
So if you have a car, are you always OK with the temperature indicator at the red line? What if you're working at a nuclear power plant?
With a car's engine temp, you know where the redline is. Do you know where it is on GDDR6 VRAM chips?
 
With a car's engine temp, you know where the redline is. Do you know where it is on GDDR6 VRAM chips?
Yes: it's when the memory throttles its CL timings and you see a dip in bandwidth tests.

Memory only does this when overclocked or when it gets too hot.
 
Yes: it's when the memory throttles its CL timings and you see a dip in bandwidth tests.

Memory only does this when overclocked or when it gets too hot.
Bandwidth tests? Like what?

Does this show in something like a 3DMark stability test?
 
Bandwidth tests? Like what?

Does this show in something like a 3DMark stability test?
There are dedicated bandwidth test programs. OpenCL-based ones work on all cards including AMD; CUDA is preferred for Nvidia. I use them to find the best memory OC, not to blindly apply +2000.

Do you even trust the temperature sensor? What if it reports 20 C lower than the actual temperature? Then 95C is really 115C, doomed to fail quickly.

At least leave some safety margin in there, like 80C, so the worst real case would be 100C. That's why people have questions.

I generally prefer a 10C safety margin against measurement error. 95C is right at the edge of the danger zone. I would have questions if my memory had this issue.
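For reference, this is the kind of OpenCL bandwidth check I mean; a minimal sketch using pyopencl, with an arbitrary buffer size and iteration count of my own choosing. A drop in the reported GB/s when the card heats up, or when the memory OC is pushed too far, is the tell-tale:

```python
# Minimal device-to-device copy bandwidth check with pyopencl.
# Buffer size and iteration count are arbitrary; this is a sketch, not a tuned tool.
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()             # pick the GPU if prompted
queue = cl.CommandQueue(ctx)

size = 256 * 1024 * 1024                   # 256 MiB per buffer
host = np.random.randint(0, 255, size, dtype=np.uint8)
mf = cl.mem_flags
src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host)
dst = cl.Buffer(ctx, mf.WRITE_ONLY, size)

cl.enqueue_copy(queue, dst, src, byte_count=size)    # warm-up copy
queue.finish()

iters = 50
start = time.perf_counter()
for _ in range(iters):
    cl.enqueue_copy(queue, dst, src, byte_count=size)
queue.finish()
elapsed = time.perf_counter() - start

# Each copy reads `size` bytes and writes `size` bytes through VRAM.
gbps = 2 * size * iters / elapsed / 1e9
print(f"device-to-device copy: {gbps:.1f} GB/s")
```

It's nowhere near a full memory test, but it's enough to compare a cold run against a heat-soaked one, or one OC step against the next.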
 
Bandwidth tests? Like what?

Does this show in something like a 3DMark stability test?
I use this one, memtest_vulkan, for errors/bandwidth, but beware: it generates a lot of heat.

Usually a short test is enough.
 
Who makes the coolest running Radeons?

I was under the impression that they all ran hot.

Like 90-100c is normal, as crazy as that sounds.

Numbers like that are only normal to Radeon users.
 
Who makes the coolest running Radeons?

I was under the impression that they all ran hot.

Like 90-100c is normal, as crazy as that sounds.

Numbers like that are only normal to Radeon users.
I can get to 100C on the core/hotspot sensor (and frankly I think Nvidia cards suffer that too; it's just physics, the perf/W and die sizes are similar, the nodes are similar, they just don't expose the sensor). I mean, there's no chance anything has been cooled yet right at that point, so sure, it can get really hot. I wouldn't like it exceeding 100C though, which per the spec of a 7900 XT would be possible... It's also something I can directly control or respond to: put a hard limit on the core temp, underclock... various options.

But VRAM at 90C? That's not really where I'd wanna be with any card. It's just too close for comfort, and VRAM tends to be far more temperature-stable as well. If ambient temps rise, VRAM temp will creep up further. It ain't summer yet.
 
Does memtest_vulkan disable ECC to find errors?
No. At first it will just show reduced bandwidth; push the overclock further and it will start showing errors.
 