
Overclocked HBM? It's true, and it's fast

Status
Not open for further replies.
I agree with your car analogy. However, are you sure you know how the Fiji GPU uses HBM that lies on the same interposer? This architecture is unprecedented, and I doubt anyone in this thread fully understands how it works.

Could you please try to explain that 19321 graphics score in Fire Strike, when the OC was 1145/600? FYI, the graphics score for 1145/500 is only around 16k.


You're right, I don't understand exactly how HBM works. But neither do you, and claiming that a timing/clock "sweet spot" was achieved to get huge performance numbers is misleading when it's based on a single website.

I have to ask where the "1145/500 is around 16k only" figure came from. I went to the website and looked through all your posts and didn't see it. All I see is "In the end we managed to get a 3DMark Fire Strike score of 16963 points on overclocked settings, a nice increase on the standard 14098 points we achieved." That's with a 500 MHz increase to the CPU and the core + VRAM overclocked.

Show me where your numbers are and I will analyze it.
 
500 MHz
[screenshot: YqnkCuL.png]

550 MHz
[screenshot: AklOPAk.png]


GTA5 using 4 GB of VRAM
550 MHz
Frames Per Second (Higher is better) Min, Max, Avg
Pass 0, 18.911589, 135.649765, 67.071381
Pass 1, 39.104492, 136.511185, 67.168938
Pass 2, 50.401340, 104.464287, 73.244118
Pass 3, 45.552242, 133.467422, 86.338333
Pass 4, 30.762289, 146.618347, 67.937256

500 MHz
Frames Per Second (Higher is better) Min, Max, Avg
Pass 0, 19.770178, 134.201065, 67.623108
Pass 1, 32.177280, 81.928307, 66.564148
Pass 2, 39.716557, 104.432373, 70.212379
Pass 3, 51.638721, 141.080902, 88.367096
Pass 4, 25.761564, 156.650940, 67.926483

I don't really see any gain with overclocked memory; however, it does appear to increase the minimum framerate.
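For anyone who wants to check the deltas themselves, here is a minimal sketch (plain Python, using only the pass numbers posted above) that summarizes both runs:

```python
# Per-pass (min, max, avg) FPS from the GTA5 benchmark runs posted above.
runs = {
    550: [(18.911589, 135.649765, 67.071381),
          (39.104492, 136.511185, 67.168938),
          (50.401340, 104.464287, 73.244118),
          (45.552242, 133.467422, 86.338333),
          (30.762289, 146.618347, 67.937256)],
    500: [(19.770178, 134.201065, 67.623108),
          (32.177280,  81.928307, 66.564148),
          (39.716557, 104.432373, 70.212379),
          (51.638721, 141.080902, 88.367096),
          (25.761564, 156.650940, 67.926483)],
}

for clock, passes in runs.items():
    mean_min = sum(p[0] for p in passes) / len(passes)
    mean_avg = sum(p[2] for p in passes) / len(passes)
    print(f"{clock} MHz: mean of per-pass mins {mean_min:.1f} FPS, "
          f"mean of per-pass avgs {mean_avg:.1f} FPS")
```

The averaged avg-FPS columns land within half a frame of each other, while the mean of the per-pass minimums comes out about 3 FPS higher at 550 MHz, consistent with the minimum-framerate observation.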


Can you do a normal 1080p Fire Strike run? In Ultra I think the GPU is the limiting factor, not the VRAM speed, and in the rumored test the gains were in the normal test.
 
Can you do a normal 1080p Fire Strike run? In Ultra I think the GPU is the limiting factor, not the VRAM speed, and in the rumored test the gains were in the normal test.

I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea. I think HBM will benefit APUs more than anything.

[screenshot: Pxrz9ty.png]
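The raw numbers are easy to reproduce: Fiji's HBM sits on a 4096-bit bus and, being DDR, transfers twice per clock, so the stock 500 MHz works out to 512 GB/s. A quick sketch of the same arithmetic for the clocks discussed in this thread:

```python
def bandwidth_gbs(bus_width_bits, mem_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (DDR: two transfers per clock)."""
    return bus_width_bits / 8 * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9

# Fury X: 4096-bit HBM bus; stock, this thread's OCs, and the rumored clock
for clock in (500, 550, 600, 625):
    print(f"{clock} MHz -> {bandwidth_gbs(4096, clock):.1f} GB/s")
```

Even the modest 550 MHz overclock adds roughly 51 GB/s of theoretical bandwidth, and the rumored 625 MHz would make a round 640 GB/s.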
 
I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

Thanks man, appreciate it.
 
I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea.

[screenshot: Pxrz9ty.png]

If you don't do any memory overclock, what is your max core overclock?
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.

600 MHz causes severe artifacts and crashing. I will push as far as I can go, but I might not have a very good chip to OC.
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.

Is that so...


I would still like to know where you got the numbers you referenced above.
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.

Got any proof?
 
600 MHz causes severe artifacts and crashing. I will push as far as I can go, but I might not have a very good chip to OC.

Don't hurt your card to please Mirakul. At least you bought one, let someone else push the envelope!
 
Don't hurt your card to please Mirakul. At least you bought one, let someone else push the envelope!

Lol, I already got stuck in a reboot loop. I clocked to 625 MHz and within a few seconds got severe artifacting, to the point that a reboot was necessary. After the restart the OCs were still saved in CCC, which changed the clocks immediately after booting and caused artifacting again. Fortunately I was able to boot into safe mode to disable the OC.
 
Yes, early leaks all pointed to that number, 625 MHz. But it seems that not all HBM chips can reach it.
Lol, I already got stuck in a reboot loop. I clocked to 625 MHz and within a few seconds got severe artifacting, to the point that a reboot was necessary. After the restart the OCs were still saved in CCC, which changed the clocks immediately after booting and caused artifacting again. Fortunately I was able to boot into safe mode to disable the OC.
Thanks for your effort. I think it would be safe to stick with 500 MHz for now. Given that Fury X stock seems to be extremely low, just having the card at this moment is lucky enough :)
Maybe when the memory voltage can be bumped, we can get higher HBM clocks stable.
 
Does that have any relation to the topic here? Why didn't you compare the Fury X with itself when judging its overclock?
I did mention it, right? A 9% core OC and a 20% memory OC, and the score jumped from 14098 to 16963, an impressive 20% increase.

FuryX does need moar bandwidth after all.
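For reference, those percentages check out against the numbers quoted in this thread (taking 1050 MHz as the Fury X's stock core clock, with the 1145/600 OC mentioned earlier):

```python
stock_score, oc_score = 14098, 16963   # Fire Strike scores quoted above
core_oc = (1145 - 1050) / 1050         # Fury X stock core clock is 1050 MHz
mem_oc = (600 - 500) / 500             # HBM stock clock is 500 MHz
gain = (oc_score - stock_score) / stock_score
print(f"core OC {core_oc:.1%}, mem OC {mem_oc:.1%}, score gain {gain:.1%}")
```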

Riiight. Synthetic benchmarks show a higher score, so the Fury X benefits from more bandwidth. Try doing that in Heaven and I bet you will also see a lovely score bump. Now put that same OC to the test in-game (any game) and you will see the gains diminish entirely. We don't play benchmarks. Fire Strike, Heaven, etc. mean exactly fuck all and are the LAST indicator to draw conclusions from, if ever.

The Fury X is an unbalanced card, it's as simple as that. Memory bandwidth is off the charts, along with shader count, but it falls short on ROPs. Since it is essentially the same arch, you can definitely base conclusions on those bits of info right there. AMD cards have NEVER been starving for bandwidth and have NEVER been effective memory overclockers. Do the math. The arch is still the same as it was in 2012, with some Tonga optimizations on top. It ain't rocket science...

Memory overclocking isn't always shit, though; I remember overclocking the GTX 660. It was the memory that made the performance gains happen, almost exclusively... And this is not surprising, as Nvidia introduced Kepler Boost with the 6xx series, generating higher core clocks out of the box while memory stayed fixed on an otherwise very well balanced card.

See... logic works.
 
Riiight. Synthetic benchmarks show a higher score, so the Fury X benefits from more bandwidth. Try doing that in Heaven and I bet you will also see a lovely score bump. Now put that same OC to the test in-game (any game) and you will see the gains diminish entirely. We don't play benchmarks. Fire Strike, Heaven, etc. mean exactly fuck all and are the LAST indicator to draw conclusions from, if ever.

The Fury X is an unbalanced card, it's as simple as that. Memory bandwidth is off the charts, along with shader count, but it falls short on ROPs. Since it is essentially the same arch, you can definitely base conclusions on those bits of info right there. AMD cards have NEVER been starving for bandwidth and have NEVER been effective memory overclockers. Do the math. The arch is still the same as it was in 2012, with some Tonga optimizations on top. It ain't rocket science...

Memory overclocking isn't always shit, though; I remember overclocking the GTX 660. It was the memory that made the performance gains happen, almost exclusively... And this is not surprising, as Nvidia introduced Kepler Boost with the 6xx series, generating higher core clocks out of the box while memory stayed fixed on an otherwise very well balanced card.

See... logic works.


Yeah, not always shit; when memory bandwidth is the bottleneck, it's nice to get an overclock on it. You don't often see that these days, though. That's interesting about the 660, I'll have to look up the specs.
 
I lol'd so hard at your 660 joke. You meant the card with the 192-bit bus and 2 GB of VRAM, right? No one noticed it, but it told the same story as the 970 fiasco.

On the topic: another guy hit 600 MHz stable, and he saw the performance increase.
[quote name="Neon Lights" url="http://www.overclock.net/t/1547314/official-amd-r9-radeon-fury-nano-x-x2-fiji-owners-club/1720#post_24106947"]
I also had the memory bug.



In the "Furry and Tessy" test (1920x1080, 4xMSAA) in MSI Kombustor 2.5.0, a 600 MHz memory clock (and standard core clock) gives me 57 FPS instead of 49 FPS.[/quote]
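Worth noting how close that quoted result is to linear scaling with the memory clock; a quick check on the numbers in the quote:

```python
clock_ratio = 600 / 500   # 20% more theoretical memory bandwidth
fps_ratio = 57 / 49       # Kombustor FPS from the quote above
print(f"bandwidth +{clock_ratio - 1:.0%}, FPS +{fps_ratio - 1:.1%}")
```

A roughly 16% FPS gain from 20% more bandwidth suggests that particular Kombustor test is heavily bandwidth-bound.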
 
I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea. I think HBM will benefit APUs more than anything.

[screenshot: Pxrz9ty.png]


... I didn't realise that. Any CPUs with HBM for cache (especially those APUs) are going to get one hell of a performance kick in the pants...
 
I agree with your car analogy. However, are you sure you know how the Fiji GPU uses HBM that lies on the same interposer? This architecture is unprecedented, and I doubt anyone in this thread fully understands how it works.
Neither do you, since your usage of the term "interposer" is seemingly incorrect. The interposer is the circuitry inside a piece of circuit board that connects two ICs; it is not the circuit board itself, therefore nothing really "lies" on the interposer.
Could you please try to explain that 19321 graphics score in Fire Strike, when the OC was 1145/600? FYI, the graphics score for 1145/500 is only around 16k.
Could you link to what you're talking about? You seem to have lost me.
 
... I didn't realise that. Any CPUs with HBM for cache (especially those APUs) are going to get one hell of a performance kick in the pants...

Aye that is what makes it interesting. 2016 will be a very interesting year.
 
It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea. I think HBM will benefit APUs more than anything.
You could easily beat Haswell's L2 cache bandwidth even with GDDR5. But you will never come close to reaching its low latency levels.

You are comparing apples to the latest issue of Cosmopolitan.
 
Don't hurt your card to please Mirakul. At least you bought one, let someone else push the envelope!

PUSH IT TO THE LIMIT, and then beyond! Many here would like to know at what point these cards self-destruct.
The card has a warranty, and yes, we would also be interested in the RMA experience.
 
I lol'd so hard at your 660 joke. You meant the card with the 192-bit bus and 2 GB of VRAM, right? No one noticed it, but it told the same story as the 970 fiasco.

The 660, yes, which I had in SLI, with its asymmetric bus. The 660 is NOT an early 970, though. Nvidia even used an asymmetric bus in some Fermi cards. The 970 pushes the envelope in terms of its memory subsystem, offering a far less advantageous end result where the last segment is completely starved of bandwidth. The GTX 660 still had pretty decent bandwidth on its last 0.5 GB segment. However, in SLI the issues did show up as slight stutter when I pushed the card, and overclocking the memory alleviated most of them. Similarly, people today are experiencing issues when they put the 970 in SLI and push it hard, which is what I have been saying since the issue popped up. It is also an excellent example of a GPU that is extremely well balanced in core vs memory, and an excellent example of a GPU where overclocking memory actually nets you major performance gains, most pronounced in SLI.

Not sure why you'd lol so hard at this; it's no secret that Nvidia uses, and has used, asymmetric memory subsystems. If that is any indicator of your sense of humor... well, that's a boring life :)

No one noticed it? It was a non-issue, but reviewers most certainly noticed it, and Nvidia didn't hide it or try to mitigate a media storm like they did with the 970. However, even the 970 is a fine card, just like the 660 was, until you push the weak links in SLI, and even then it holds up quite OK, just not flawlessly.

Either way, the 660 proves my point that memory overclocking only works on cards that are starved for bandwidth. The Fury X is not that card, and GCN has never been starved for bandwidth. Ever. They are all cards with very wide buses and a core that couldn't really match them, making for a slightly less efficient design. This is again supported by power draw figures that are generally higher on GCN than on Kepler, and especially Maxwell.

For LightningJR, here is the technical explanation:

Basically, a 192-bit memory bus has three 64-bit memory controllers. To get 1.5 GB of VRAM on them, you would add 512 MB to each 64-bit controller, but Nvidia adds an extra 512 MB block to one of them. The thing about this "trick" is that not all of the memory works at the full bus speed: only the first 1.5 GB of VRAM runs at ~144 GB/s, while the last 512 MB block runs at only ~48 GB/s, due to the asymmetrical design. This is present in other Nvidia cards as well, such as the GTX 650 Ti Boost, GTX 560 SE, GTX 550 Ti and the GTX 460 v2. So hey, the more you know, right?

Source: http://linustechtips.com/main/topic/198568-the-video-ram-information-guide/
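The segment bandwidths quoted above follow directly from bus-width arithmetic. A sketch, assuming the GTX 660's 6008 MHz effective GDDR5 data rate (the effective rate already includes the transfer multiplier):

```python
EFFECTIVE_CLOCK_MHZ = 6008  # GTX 660 effective GDDR5 data rate (assumed)

def seg_bandwidth_gbs(bus_width_bits):
    """Bandwidth in GB/s for a memory segment of the given bus width."""
    return bus_width_bits / 8 * EFFECTIVE_CLOCK_MHZ * 1e6 / 1e9

print(f"first 1.5 GB (192-bit): {seg_bandwidth_gbs(192):.0f} GB/s")  # ~144 GB/s
print(f"last 512 MB  (64-bit):  {seg_bandwidth_gbs(64):.0f} GB/s")   # ~48 GB/s
```

When the asymmetric 512 MB block is in use, only its single 64-bit controller serves it, hence the threefold drop.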
 
^ Quick answer about the 660: I don't think the asymmetrical design is a good design; it's only good for marketing, since a 2 GB card seems better than a 1.5 GB one. I would say it is the same as the "3.5 GB" 970.

As for memory overclocking, your point of view is correct, but incomplete. When you OC the memory, you change the latency as well. Better latency helps the core work more effectively in memory-intensive scenarios. If the latency is not coordinated well with the memory timing profile in the BIOS, more errors will be produced, which mitigates the gain from the extra bandwidth.

Therefore, I believe that is what happened in @v12dock's experiment at 550 MHz. The other guy at 600 MHz saw a significant boost in performance, just like in the hardware.info bench. The Fury X seems to have a timing profile for 625 MHz HBM, which suits the rumors of 625 MHz HBM before launch.
 
The asymmetrical design is NOT for marketing purposes. It exists purely because Nvidia has a model in which they cut down chips whose SM units are connected to the memory bus (this is what happened with the 970). It was US, the unknowing customers, who said 'Nvidia should have marketed it as 3.5GB'. HOWEVER, if Nvidia had actually shipped the 970 with only 3.5 GB of VRAM, it would have capped out earlier, as resources would still be moved around to achieve maximum performance. Right now, the 970 uses the last 0.5 GB for, among other things, the Windows desktop, meaning that rarely used data sits in the slow part of the memory subsystem. This in itself is not a bad thing, and it's always better than having that data take up valuable VRAM space in the high-performance first 3.5 GB segment.

The other side of the coin is that we now see a dramatic power draw difference between Maxwell and GCN. Nvidia's memory subsystem design is part of that difference, and part of its competitive advantage in the market as a whole. Not only do they have more efficient chips, they also need less metal to produce performance equivalent to GCN with its wide and expensive memory bus. Efficiency for Nvidia means efficiency across the board, and in the long term this pays off, as it shows today.

Lastly, latency with GDDR5 (HBM may well be different, given its low clocks) is not very interesting. GDDR5 is by nature higher-latency memory, and a slight clock shift won't influence that much, if at all. A GPU is always queuing frame data, so latency is less important and can be 'hidden' between frames.
 