
Overclocked HBM? It's true, and it's fast

Status
Not open for further replies.
I agree with your car analogy. However, are you sure you know how the Fiji GPU uses HBM that lies on the same interposer? This architecture is unprecedented, and I doubt anyone in this thread fully understands how it works.

Could you please try to explain that 19321 graphics score in Fire Strike, when the OC was 1145/600? FYI, the graphics score for 1145/500 is only around 16k.


You're right, I don't understand exactly how HBM works. But neither do you, and claiming that a timing/clock "sweet spot" was achieved to get huge performance numbers is misleading when it's based on a single website.

I have to ask where the "1145/500 is around 16k only" figure came from. I went to the website and looked through all your posts and didn't see it. All I see is "In the end we managed to get a 3DMark Fire Strike score of 16963 points on overclocked settings, a nice increase on the standard 14098 points we achieved." That's with a 500 MHz increase to the CPU and the core + VRAM overclocked.

Show me where your numbers are and I will analyze it.
 
500 MHz
[screenshot: YqnkCuL.png]

550 MHz
[screenshot: AklOPAk.png]


GTA5 using 4 GB of VRAM
550 MHz
Frames Per Second (Higher is better) Min, Max, Avg
Pass 0, 18.911589, 135.649765, 67.071381
Pass 1, 39.104492, 136.511185, 67.168938
Pass 2, 50.401340, 104.464287, 73.244118
Pass 3, 45.552242, 133.467422, 86.338333
Pass 4, 30.762289, 146.618347, 67.937256

500 MHz
Frames Per Second (Higher is better) Min, Max, Avg
Pass 0, 19.770178, 134.201065, 67.623108
Pass 1, 32.177280, 81.928307, 66.564148
Pass 2, 39.716557, 104.432373, 70.212379
Pass 3, 51.638721, 141.080902, 88.367096
Pass 4, 25.761564, 156.650940, 67.926483

I don't really see any gain with overclocked memory; however, it does appear to increase the minimum framerate.
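For anyone who wants to check the deltas themselves, here is a minimal sketch (plain Python, using only the pass numbers posted above) that summarizes both runs:

```python
# Per-pass (min, max, avg) FPS from the GTA5 benchmark runs posted above.
runs = {
    550: [(18.911589, 135.649765, 67.071381),
          (39.104492, 136.511185, 67.168938),
          (50.401340, 104.464287, 73.244118),
          (45.552242, 133.467422, 86.338333),
          (30.762289, 146.618347, 67.937256)],
    500: [(19.770178, 134.201065, 67.623108),
          (32.177280,  81.928307, 66.564148),
          (39.716557, 104.432373, 70.212379),
          (51.638721, 141.080902, 88.367096),
          (25.761564, 156.650940, 67.926483)],
}

for clock, passes in runs.items():
    mean_min = sum(p[0] for p in passes) / len(passes)
    mean_avg = sum(p[2] for p in passes) / len(passes)
    print(f"{clock} MHz: mean of per-pass mins {mean_min:.1f} FPS, "
          f"mean of per-pass avgs {mean_avg:.1f} FPS")
```

The averaged avg-FPS columns land within half a frame of each other, while the mean of the per-pass minimums comes out about 3 FPS higher at 550 MHz, consistent with the minimum-framerate observation.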


Can you do a normal 1080p Fire Strike run? In Ultra I think the GPU is the limiting factor, not the VRAM speed, and in the rumored test the gains were in the normal test.
 
Can you do a normal 1080p Fire Strike run? In Ultra I think the GPU is the limiting factor, not the VRAM speed, and in the rumored test the gains were in the normal test.

I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea. I think HBM will benefit APUs more than anything.

[screenshot: Pxrz9ty.png]
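The raw numbers are easy to reproduce: Fiji's HBM sits on a 4096-bit bus and, being DDR, transfers twice per clock, so the stock 500 MHz works out to 512 GB/s. A quick sketch of the same arithmetic for the clocks discussed in this thread:

```python
def bandwidth_gbs(bus_width_bits, mem_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (DDR: two transfers per clock)."""
    return bus_width_bits / 8 * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9

# Fury X: 4096-bit HBM bus; stock, this thread's OCs, and the rumored clock
for clock in (500, 550, 600, 625):
    print(f"{clock} MHz -> {bandwidth_gbs(4096, clock):.1f} GB/s")
```

Even the modest 550 MHz overclock adds roughly 51 GB/s of theoretical bandwidth, and the rumored 625 MHz would make a round 640 GB/s.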
 
I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

Thanks man, appreciate it.
 
I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea.

[screenshot: Pxrz9ty.png]

If you don't do any memory overclock, what is your max core overclock?
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.

600 MHz causes severe artifacts and crashing. I will push as far as I can go, but I might not have a very good chip to OC.
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.

Is that so...


I would still like to know where you got the numbers you referenced above.
 
Can you bench with 600-625 MHz on the memory? It seems AMD's first plan was to set HBM at 625 MHz, but not all of the chips passed. They had to set it at 500 MHz, thinking it would be enough. Therefore there might be a timing profile for around 625 MHz, which could significantly boost performance.

Got any proof?
 
600 MHz causes severe artifacts and crashing. I will push as far as I can go, but I might not have a very good chip to OC.

Don't hurt your card to please Mirakul. At least you bought one, let someone else push the envelope!
 
Don't hurt your card to please Mirakul. At least you bought one, let someone else push the envelope!

Lol, I already got stuck in a reboot loop. I clocked to 625 MHz and within a few seconds got severe artifacting, to the point that a reboot was necessary. After the restart the OCs were still saved in CCC, which changed the clocks immediately after booting and caused artifacting again. Fortunately I was able to boot into safe mode to disable the OC.
 
Yes, early leaks all pointed to that number, 625 MHz. But it seems that not all HBM chips can reach it.
Lol, I already got stuck in a reboot loop. I clocked to 625 MHz and within a few seconds got severe artifacting, to the point that a reboot was necessary. After the restart the OCs were still saved in CCC, which changed the clocks immediately after booting and caused artifacting again. Fortunately I was able to boot into safe mode to disable the OC.
Thanks for your effort. I think it would be safe to stick with 500 MHz for now. Given that Fury X stock seems to be extremely low, just having the card at this moment is lucky enough :)
Maybe when the memory voltage can be bumped, we can get higher HBM clocks stable.
 
Does that have any relation to the topic here? Why didn't you compare the Fury X with itself when judging its overclock?
I did mention it, right? A 9% core OC and a 20% memory OC, and the score jumped from 14098 to 16963, an impressive 20% increase.

FuryX does need moar bandwidth after all.
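For reference, those percentages check out against the numbers quoted in this thread (taking 1050 MHz as the Fury X's stock core clock, with the 1145/600 OC mentioned earlier):

```python
stock_score, oc_score = 14098, 16963   # Fire Strike scores quoted above
core_oc = (1145 - 1050) / 1050         # Fury X stock core clock is 1050 MHz
mem_oc = (600 - 500) / 500             # HBM stock clock is 500 MHz
gain = (oc_score - stock_score) / stock_score
print(f"core OC {core_oc:.1%}, mem OC {mem_oc:.1%}, score gain {gain:.1%}")
```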

Riiight. Synthetic benchmarks show a higher score, so the Fury X benefits from more bandwidth. Try doing that in Heaven and I bet you will also see a lovely score bump. Now put that same OC to the test in-game (any game) and you will see the gains diminish entirely. We don't play benchmarks. Fire Strike, Heaven, etc. mean exactly fuck all and are the LAST indicator to draw conclusions from, if ever.

The Fury X is an unbalanced card, it's as simple as that. Memory bandwidth is off the charts, along with shader count, but it falls short on ROPs. Since it is essentially the same arch, you can definitely base conclusions on those bits of info right there. AMD cards have NEVER been starving for bandwidth and have NEVER been effective memory overclockers. Do the math. The arch is still the same as it was in 2012, with some Tonga optimizations on top. It ain't rocket science...

Memory overclocking isn't always shit, though; I remember overclocking the GTX 660. It was the memory that made the performance gains happen, almost exclusively... And this is not surprising, as Nvidia introduced Kepler Boost with the 6xx series, generating higher core clocks out of the box while memory stayed fixed on an otherwise very well balanced card.

See... logic works.
 
Riiight. Synthetic benchmarks show a higher score, so the Fury X benefits from more bandwidth. Try doing that in Heaven and I bet you will also see a lovely score bump. Now put that same OC to the test in-game (any game) and you will see the gains diminish entirely. We don't play benchmarks. Fire Strike, Heaven, etc. mean exactly fuck all and are the LAST indicator to draw conclusions from, if ever.

The Fury X is an unbalanced card, it's as simple as that. Memory bandwidth is off the charts, along with shader count, but it falls short on ROPs. Since it is essentially the same arch, you can definitely base conclusions on those bits of info right there. AMD cards have NEVER been starving for bandwidth and have NEVER been effective memory overclockers. Do the math. The arch is still the same as it was in 2012, with some Tonga optimizations on top. It ain't rocket science...

Memory overclocking isn't always shit, though; I remember overclocking the GTX 660. It was the memory that made the performance gains happen, almost exclusively... And this is not surprising, as Nvidia introduced Kepler Boost with the 6xx series, generating higher core clocks out of the box while memory stayed fixed on an otherwise very well balanced card.

See... logic works.


Yeah, not always shit; when memory bandwidth is the bottleneck, it's nice to get an overclock on it. You don't often see that these days, though. That's interesting about the 660, I'll have to look up the specs.
 
I lol'd so hard at your 660 joke. You meant the card with the 192-bit bus and 2 GB of VRAM, right? No one noticed it, but it told the same story as the 970 fiasco.

On the topic: another guy hit 600 MHz stable, and he saw the performance increase.
[quote name="Neon Lights" url="http://www.overclock.net/t/1547314/official-amd-r9-radeon-fury-nano-x-x2-fiji-owners-club/1720#post_24106947"]
I also had the memory bug.



In the "Furry and Tessy" test (1920x1080, 4xMSAA) in MSI Kombustor 2.5.0, a 600 MHz memory clock (and standard core clock) gives me 57 FPS instead of 49 FPS.[/quote]
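Worth noting how close that quoted result is to linear scaling with the memory clock; a quick check on the numbers in the quote:

```python
clock_ratio = 600 / 500   # 20% more theoretical memory bandwidth
fps_ratio = 57 / 49       # Kombustor FPS from the quote above
print(f"bandwidth +{clock_ratio - 1:.0%}, FPS +{fps_ratio - 1:.1%}")
```

A roughly 16% FPS gain from 20% more bandwidth suggests that particular Kombustor test is heavily bandwidth-bound.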
 
I can certainly run at 1080p. I chose 4K assuming it would use most of the VRAM, which would be a limiting factor. I will run the GTA5 benchmark a few times, and maybe a few others, to see if overclocked memory is decreasing the delta or if it was simply an anomaly.

It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea. I think HBM will benefit APUs more than anything.

[screenshot: Pxrz9ty.png]


... I didn't realise that. Any CPUs with HBM for cache (especially those APUs) are going to get one hell of a performance kick in the pants...
 
I agree with your car analogy. However, are you sure you know how the Fiji GPU uses HBM that lies on the same interposer? This architecture is unprecedented, and I doubt anyone in this thread fully understands how it works.
Neither do you, since your usage of the term "interposer" is seemingly incorrect. The interposer is the circuitry inside a piece of circuit board that connects two ICs; it is not the circuit board itself, therefore nothing really "lies" on the interposer.
Could you please try to explain that 19321 graphics score in Fire Strike, when the OC was 1145/600? FYI, the graphics score for 1145/500 is only around 16k.
Could you link to what you're talking about? You seem to have lost me.
 
... I didn't realise that. Any CPUs with HBM for cache (especially those APUs) are going to get one hell of a performance kick in the pants...

Aye that is what makes it interesting. 2016 will be a very interesting year.
 
It's funny to think about the sheer bandwidth of HBM. Granted, this is comparing apples and oranges, but it gives you an idea. I think HBM will benefit APUs more than anything.
You could easily beat Haswell's L2 cache bandwidth even with GDDR5. But you will never come close to reaching its low latency levels.

You are comparing apples to the latest issue of Cosmopolitan.
 
Don't hurt your card to please Mirakul. At least you bought one, let someone else push the envelope!

PUSH IT TO THE LIMIT, and then beyond! Many here would like to know at what point these cards self-destruct.
The card has a warranty, and yes, we would also be interested in the RMA experience.
 
I lol'd so hard at your 660 joke. You meant the card with the 192-bit bus and 2 GB of VRAM, right? No one noticed it, but it told the same story as the 970 fiasco.

The 660, yes, which I had in SLI, with its asymmetric bus. The 660 is NOT an early 970, though. Nvidia even used an asymmetric bus in some Fermi cards. The 970 pushes the envelope in terms of its memory subsystem, offering a far less advantageous end result where the last segment is completely starved of bandwidth. The GTX 660 still had pretty decent bandwidth on its last 0.5 GB segment. However, in SLI the issues did show up as slight stutter when I pushed the card, and overclocking the memory alleviated most of them. Similarly, people today are experiencing issues when they put the 970 in SLI and push it hard, which is what I have been saying since the issue popped up. It is also an excellent example of a GPU that is extremely well balanced in core vs memory, and an excellent example of a GPU where overclocking memory actually nets you major performance gains, most pronounced in SLI.

Not sure why you'd lol so hard at this; it's no secret that Nvidia uses, and has used, asymmetric memory subsystems. If that is any indicator of your sense of humor... well, that's a boring life :)

No one noticed it? It was a non-issue, but reviewers most certainly noticed it, and Nvidia didn't hide it or try to mitigate a media storm like they did with the 970. However, even the 970 is a fine card, just like the 660 was, until you push the weak links in SLI, and even then it holds up quite OK, just not flawlessly.

Either way, the 660 proves my point that memory overclocking only works on cards that are starved for bandwidth. The Fury X is not that card, and GCN has never been starved for bandwidth. Ever. They are all cards with very wide buses and a core that couldn't really match them, making for a slightly less efficient design. This is again supported by power draw figures that are generally higher on GCN than on Kepler, and especially Maxwell.

For LightningJR, here is the technical explanation:

Basically, a 192-bit memory bus has three 64-bit memory controllers. To get 1.5 GB of VRAM on them, you would add 512 MB to each 64-bit controller, but Nvidia adds an extra 512 MB block to one of them. The thing about this "trick" is that not all of the memory works at the full bus speed: only the first 1.5 GB of VRAM runs at ~144 GB/s, while the last 512 MB block runs at only ~48 GB/s, due to the asymmetrical design. This is present in other Nvidia cards as well, such as the GTX 650 Ti Boost, GTX 560 SE, GTX 550 Ti and the GTX 460 v2. So hey, the more you know, right?

Source: http://linustechtips.com/main/topic/198568-the-video-ram-information-guide/
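The segment bandwidths quoted above follow directly from bus-width arithmetic. A sketch, assuming the GTX 660's 6008 MHz effective GDDR5 data rate (the effective rate already includes the transfer multiplier):

```python
EFFECTIVE_CLOCK_MHZ = 6008  # GTX 660 effective GDDR5 data rate (assumed)

def seg_bandwidth_gbs(bus_width_bits):
    """Bandwidth in GB/s for a memory segment of the given bus width."""
    return bus_width_bits / 8 * EFFECTIVE_CLOCK_MHZ * 1e6 / 1e9

print(f"first 1.5 GB (192-bit): {seg_bandwidth_gbs(192):.0f} GB/s")  # ~144 GB/s
print(f"last 512 MB  (64-bit):  {seg_bandwidth_gbs(64):.0f} GB/s")   # ~48 GB/s
```

When the asymmetric 512 MB block is in use, only its single 64-bit controller serves it, hence the threefold drop.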
 
^ Quick answer about the 660: I don't think the asymmetrical design is a good design; it's only good for marketing, since a 2 GB card seems better than a 1.5 GB one. I would say it is the same as the "3.5 GB" 970.

As for memory overclocking, your point of view is correct, but incomplete. When you OC the memory, you change the latency as well. Better latency helps the core work more effectively in memory-intensive scenarios. If the latency is not coordinated well with the memory timing profile in the BIOS, more errors will be produced, which mitigates the gain from the extra bandwidth.

Therefore, I believe that is what happened in @v12dock's experiment at 550 MHz. The other guy at 600 MHz saw a significant boost in performance, just like in the hardware.info bench. The Fury X seems to have a timing profile for 625 MHz HBM, which suits the rumors of 625 MHz HBM before launch.
 
The asymmetrical design is NOT for marketing purposes. It exists purely because Nvidia has a model in which they cut down chips whose SM units are connected to the memory bus (this is what happened with the 970). It was US, the unknowing customers, who said 'Nvidia should have marketed it as 3.5GB'. HOWEVER, if Nvidia had actually shipped the 970 with only 3.5 GB of VRAM, it would have capped out earlier, as resources would still be moved around to achieve maximum performance. Right now, the 970 uses the last 0.5 GB for, among other things, the Windows desktop, meaning that rarely used data sits in the slow part of the memory subsystem. This in itself is not a bad thing, and it's always better than having that data take up valuable VRAM space in the high-performance first 3.5 GB segment.

The other side of the coin is that we now see a dramatic power draw difference between Maxwell and GCN. Nvidia's memory subsystem design is part of that difference, and part of its competitive advantage in the market as a whole. Not only do they have more efficient chips, they also need less metal to produce performance equivalent to GCN with its wide and expensive memory bus. Efficiency for Nvidia means efficiency across the board, and in the long term this pays off, as it shows today.

Lastly, latency with GDDR5 (HBM may well be different, given its low clocks) is not very interesting. GDDR5 is by nature higher-latency memory, and a slight clock shift won't influence that much, if at all. A GPU is always queuing frame data, so latency is less important and can be 'hidden' between frames.
 