
Is RX 9070 VRAM temperature regular value or hotspot?

There is no pattern. Just examples of engineers failing and getting things right.
Engineers warned about the O-rings, but their concerns were ignored. The disaster became a tragic lesson in safety and decision-making: The greatest teacher, failure is.

Fear of losing money and delays led to reckless decisions, ultimately causing suffering for the environment and people: Fear is the path to the dark side. fear leads to anger, anger leads to hate, hate leads to suffering.

There is no ignorance, there is knowledge.

There is no emotion, there is peace.
 
Damn, now we're stuck between Yoda and Lord of the Rings one-liners; this is escalating quickly :D
 
How has it degraded if it's still running the same clocks as 3 years ago?
My 12900K runs the same clocks it did on day one, yet it's still heavily (and I mean heavily) degraded; pretty easy to tell. I literally gave you the answer: parts degrade. That's why they come overvolted from the factory, and why you can undervolt them and they still work.

yet not a single one of them has been reported to fail?
Excuse me, what? The cards haven't even been out for a month; how would you expect failure reports? Do you understand what electromigration is, or are you just being defensive because you own the product? Expecting failures after two weeks makes zero sense whatsoever. By the same logic, 13900Ks didn't fail because we didn't have reports two weeks after their release, right?
 
I had the 7870 Hawk model and overclocked it by 47.5% from time to time. By the time I sold it, it had lost 20-30 mV of stability margin: to run at the same factory clocks, it needed ~25 mV more.

The MSI Hawk model allowed unlimited OC, but came with no warning about what kind of degradation that would cause. So I took my chances, since there was no negative data at the time, until I saw the degradation myself (though on average I only ran a 27.5% OC on the GPU).

Now people are overclocking the 5000 series like crazy. What if Nvidia hasn't activated the extra integer pipelines yet and does so later with a driver update? Then those overclocks will start crashing. What if the 5000 series degrades faster than older generations?

AusWolf's idea is like:

  • "let me take a spoon of water from ocean..."
  • looks at the spoon, sees no whale
  • "no whales in ocean for sure!"
 
Fear is the path to the dark side. fear leads to anger, anger leads to hate, hate leads to suffering.
Yep. That's why I prefer to live with no fear.

There is no emotion, there is peace.
Exactly my point. You seem to be arguing from emotion, drawing parallels to completely unrelated things out of fear.

My 12900K runs the same clocks it did on day one, yet it's still heavily (and I mean heavily) degraded; pretty easy to tell. I literally gave you the answer: parts degrade. That's why they come overvolted from the factory, and why you can undervolt them and they still work.
So, if the card maxes out its power limit just like it did on day one, and it runs the same clocks as it did on day one, then I can assume that voltages have stayed the same, right? Otherwise, if voltages had increased, then clocks would have dropped to stay within power target.
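A rough back-of-the-envelope using the usual dynamic-power approximation (power roughly proportional to C·V²·f) illustrates that logic; all the numbers below are made up for illustration, not measurements from any real card:

```python
# Back-of-the-envelope: dynamic power scales roughly with C * V^2 * f.
# All numbers are illustrative, not taken from any real card.

def max_clock_at_power_cap(power_cap_w, voltage_v, c_eff=1.0):
    """Highest clock (arbitrary units) that still fits under the power cap."""
    return power_cap_w / (c_eff * voltage_v ** 2)

cap = 300.0                                             # W, fixed board power limit
day_one  = max_clock_at_power_cap(cap, voltage_v=1.00)
degraded = max_clock_at_power_cap(cap, voltage_v=1.05)  # suppose it now needs +50 mV

print(f"clock ratio after +50 mV: {degraded / day_one:.3f}")  # ~0.907, i.e. roughly 9% lower
```

So, within this simple model at least, an unchanged power limit plus unchanged clocks does suggest the voltage hasn't crept up.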

Excuse me, what? The cards haven't even been out for a month; how would you expect failure reports? Do you understand what electromigration is, or are you just being defensive because you own the product? Expecting failures after two weeks makes zero sense whatsoever. By the same logic, 13900Ks didn't fail because we didn't have reports two weeks after their release, right?
So we'll see reports later. Or maybe not. What can we do in the meantime besides adjusting fan curves to comfortable levels and/or bickering about it on an online forum?
 
Yep. That's why I prefer to live with no fear.
I was talking about the corporations: their fear of losing money leading to the suffering of gamers.
Exactly my point. You seem to be arguing from emotion, drawing parallels to completely unrelated things out of fear.
I was talking about corporations. This part is linked to the "reckless decisions due to fear of losing money".
 
I was talking about the corporations: their fear of losing money leading to the suffering of gamers.

I was talking about corporations.
Fair enough. Still, my point stands: I need evidence and explanation, not theories.
 
Evidence requires experiments. Experiments require lab rats. Are we the lab rats? (I'm against using animals in experiments, and against using humans too; bacteria are fine.) Are we beta testers? I didn't volunteer for this anywhere.

Similar thing: missing ROPs.

How does the quality-assurance department miss such an important part? Don't they even benchmark for performance or count the ROPs? Even the power draw changes with fewer ROPs.

How can they miss:
  • decreased power
  • decreased performance
  • decreased number of ROPs
and not inform AIBs?

Also, what do AIBs do? Just slap a GPU on a board and sell it with zero tests?
 
Evidence requires experiments. Experiments require lab rats. Are we the lab rats? (I'm against using animals in experiments, and against using humans too; bacteria are fine.) Are we beta testers? I didn't volunteer for this anywhere.
No. But I've got a 2 year warranty. I'm sure there are lots of people who will game a lot more during those 2 years than I could ever hope for.

Until there's evidence of cards failing due to bad/hot VRAM, I'm gonna assume that it's intended operation simply because all cards of all AIBs are affected, and because I bought the card to game on it, not to be scared of some unknown working of it that I don't understand. Life is short, let's enjoy it while we can, and deal with problems as they arise, not before.
 
But I've got a 2 year warranty.
So, as long as there's a warranty, is it tolerable to beta-test faulty hardware for big corporations, then send the card in and wait two months for it to come back, or buy a new one when inventory refills? What happens to the lost time? Is there a warranty for the lost time? What if I bought it for my work because I can't buy an H100 GPU?

The only advantage of a hot-running chip is faster heat transfer: the bigger the temperature difference to the surrounding air, the faster it sheds heat. But the list of disadvantages is longer:
  • damage to neighboring components such as capacitors, VRM chips, maybe other transistors and even the motherboard's PCIe bridge due to heat creep
  • increased electromigration rate, leading to degradation of the silicon
  • hotter air exiting the card towards the CPU cooler, reducing CPU performance through throttling
  • heating/cooling cycles making the component expand and shrink periodically, causing cracks in the PCB or the component itself
  • reduced peace of mind -> you have to play it totally safe rather than push anything. Not fun.
  • thermal throttling of the component itself -> automatic loosening of CL timings, as we observe in Hynix graphics card memory; it loosens the CL timing roughly every 5-10 degrees Celsius further up the scale (see the logging sketch below)
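For that last point, here's a minimal sketch of how you could log the VRAM temperature yourself and watch for the throttling step. It assumes Linux with the amdgpu driver, which as far as I know exposes a hwmon channel labelled "mem"; paths and labels may differ on other setups:

```python
# Rough VRAM temperature logger for Linux with the amdgpu driver.
# Assumption: the driver exposes a hwmon channel whose temp*_label reads "mem";
# other drivers/operating systems won't have this path.
import glob
import time

def find_mem_temp_file():
    for label_path in glob.glob("/sys/class/hwmon/hwmon*/temp*_label"):
        with open(label_path) as f:
            if f.read().strip() == "mem":
                return label_path.replace("_label", "_input")
    return None

sensor = find_mem_temp_file()
if sensor is None:
    raise SystemExit("no 'mem' temperature sensor found (not amdgpu, or an older driver)")

while True:
    with open(sensor) as f:
        millidegrees = int(f.read().strip())   # sysfs reports millidegrees Celsius
    print(f"VRAM: {millidegrees / 1000:.1f} C")
    time.sleep(2)
```

Run it alongside a bandwidth test and you can see whether the reported GB/s steps down past certain temperatures.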
 
Also, what do AIBs do? Just slap a GPU on a board and sell it with zero tests?
All of them? With all cards?

So, as long as there's a warranty, is it tolerable to beta-test faulty hardware for big corporations, then send the card in and wait two months for it to come back, or buy a new one when inventory refills? What happens to the lost time? Is there a warranty for the lost time? What if I bought it for my work because I can't buy an H100 GPU?
If you're so concerned about working on your GPU, then you probably didn't buy a 9070 XT in the first place.

Also, we're talking about a slow degradation of the VRAM chips. By the time they fail, inventory will be fine (supposedly).

But in any case, why worry about something you can't help? And if you can help it, why worry about it instead of doing something about it?
 
By the time they fail, inventory will be fine
How do you know? Can you say WW3 will not happen? Or that some corporation won't go bankrupt and stop production? Or that they won't ban selling these cards outside a single country? Or ban GPU usage altogether for some nonsense reason?

In fact, if they move factories from China etc. into the USA, production will get more expensive, because US engineers will ask for more money -> leading to lower expected sales volumes.
 
How do you know? Can you say WW3 will not happen? Or that some corporation won't go bankrupt and stop production? Or that they won't ban selling these cards outside a single country? Or ban GPU usage altogether for some nonsense reason?
Or maybe an asteroid will hit Earth and we all die. We never know. But that's exactly why I'm asking: why worry about something you can't control? It's a waste of time.
 
why worry about something you can't control?
We can control it. It's called a lawsuit. People did this before, over the so-called 8-core desktop CPU FX-8150. Today, AMD prefers real cores, not shared cores, and adds SMT to each core as the cherry on top.

Look at Intel. No development; just add 100 MHz and call it next gen. They even removed AVX-512 from some generations. People have chosen to exercise control by preferring AMD.
 
We can control it. It's called a lawsuit.
Go ahead, sue AMD, then. I'm not sure it'll be cheaper than replacing a dead 9070 XT.

No, and then we get to the very core of what I'm trying to bring across :) Since you don't want that, you assume it's fine. You assume it because you own the card - I don't assume it because I don't own the card.

It's all psychology. That's why I've said that statement is so very important. Don't leave this kind of thing up in the air. Nobody benefits.
No. I assume that because I have no data to suggest otherwise.
 
So if you have a car, are you always OK with the temperature indicator at the red line? What if you're working at a nuclear power plant?
 
So if you have a car, are you always OK with the temperature indicator at the red line? What if you're working at a nuclear power plant?
With a car's engine temp, you know where the redline is. Do you know where it is on GDDR6 VRAM chips?
 
With a car's engine temp, you know where the redline is. Do you know where it is on GDDR6 VRAM chips?
Yes: it's when the memory throttles its CL timings and you see a dip in bandwidth tests.

Memory only does this when overclocked or when it gets too hot.
 
Yes: it's when the memory throttles its CL timings and you see a dip in bandwidth tests.

Memory only does this when overclocked or when it gets too hot.
Bandwidth tests? Like what?

Does this show in something like a 3DMark stability test?
 
Bandwidth tests? Like what?

Does this show in something like a 3DMark stability test?
There are dedicated bandwidth test programs. OpenCL-based ones work on all cards including AMD; CUDA is preferred for Nvidia. I use them to find the best memory OC, not to blindly apply +2000.

Do you even trust the temperature sensor? What if it reports 20 C lower than the actual temperature? Then 95C is really 115C, doomed to fail quickly.

At least leave some safety margin in there, like 80C, so the worst real case would be 100C. That's why people have questions.

I generally prefer a 10C safety margin against measurement error. 95C is right at the edge of the danger zone. I would have questions if my memory had this issue.
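For reference, this is the kind of OpenCL bandwidth check I mean; a minimal sketch using pyopencl, with an arbitrary buffer size and iteration count of my own choosing. A drop in the reported GB/s when the card heats up, or when the memory OC is pushed too far, is the tell-tale:

```python
# Minimal device-to-device copy bandwidth check with pyopencl.
# Buffer size and iteration count are arbitrary; this is a sketch, not a tuned tool.
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()             # pick the GPU if prompted
queue = cl.CommandQueue(ctx)

size = 256 * 1024 * 1024                   # 256 MiB per buffer
host = np.random.randint(0, 255, size, dtype=np.uint8)
mf = cl.mem_flags
src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host)
dst = cl.Buffer(ctx, mf.WRITE_ONLY, size)

cl.enqueue_copy(queue, dst, src, byte_count=size)    # warm-up copy
queue.finish()

iters = 50
start = time.perf_counter()
for _ in range(iters):
    cl.enqueue_copy(queue, dst, src, byte_count=size)
queue.finish()
elapsed = time.perf_counter() - start

# Each copy reads `size` bytes and writes `size` bytes through VRAM.
gbps = 2 * size * iters / elapsed / 1e9
print(f"device-to-device copy: {gbps:.1f} GB/s")
```

It's nowhere near a full memory test, but it's enough to compare a cold run against a heat-soaked one, or one OC step against the next.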
 
Bandwidth tests? Like what?

Does this show in something like a 3DMark stability test?
I use this one, memtest_vulkan, for errors/bandwidth, but beware: it generates a lot of heat.

Usually a short test is enough.
 
Who makes the coolest running Radeons?

I was under the impression that they all ran hot.

Like 90-100c is normal, as crazy as that sounds.

Numbers like that are only normal to Radeon users.
 
Who makes the coolest running Radeons?

I was under the impression that they all ran hot.

Like 90-100c is normal, as crazy as that sounds.

Numbers like that are only normal to Radeon users.
I can get to 100C on the core/hotspot sensor (and frankly I think Nvidia cards suffer that too; it's just physics, the perf/W and die sizes are similar, the nodes are similar, they just don't expose the sensor). I mean, there's no chance anything has been cooled yet right at that point, so sure, it can get really hot. I wouldn't like it exceeding 100C though, which per the spec of a 7900 XT would be possible... It's also something I can directly control or respond to: put a hard limit on the core temp, underclock... various options.

But VRAM at 90C? That's not really where I'd wanna be with any card. It's just too close for comfort, and VRAM tends to be far more temperature-stable as well. If ambient temps rise, VRAM temp will creep up further. It ain't summer yet.
 
Does memtest_vulkan disable ECC to find errors?
No. At first it will just show reduced bandwidth; push the overclock further and it will start showing errors.
 