Wednesday, May 20th 2015

AMD "Fiji" HBM Implementation Detailed

Back in 2008, when it looked like NVIDIA owned the GPU market and AMD seemed to be lagging behind in the performance and efficiency game, the company sprung a surprise. Its RV770 silicon, the first GPU to implement GDDR5 memory, trounced NVIDIA's big and inefficient GeForce GTX 200 series, and threw AMD back in the game. GDDR5 helped the company double the memory bandwidth with lower pin- and memory-chip counts, letting the company and its partners build graphics cards with fewer components and earn great margins. AMD invested those margins in the development of its even better HD 5000 series, which pushed NVIDIA, with its comical GeForce GTX 480, to its lowest-ever market share. Could AMD be looking at a similar turnaround this summer?

Since the introduction of its Graphics Core Next architecture in 2012, AMD has been rather lax in its product development cycle. The company has come out with a new high-end silicon every 18-24 months, and adopted a strategy of cascading re-branding. The introduction of each new high-end silicon would relegate the existing high-end silicon to the performance segment, re-branded, and the existing performance-segment silicon to mid-range, re-branded. While the company could lay out its upcoming Radeon R9 series in much the same way, the introduction of essentially just one new silicon, "Fiji," could prove enough for the company. Much like RV770, "Fiji" is about to bring something that could prove to be a very big feature to the consumer graphics market: stacked high-bandwidth memory (HBM).
HBM is being promoted as an upcoming memory standard by JEDEC, and AMD will be the first company to build an ASIC implementing it, with SK Hynix being among the first DRAM makers to build silicon for the standard. HBM is being brought in to address a key problem with GDDR5: its inability to keep up with the growing video memory bandwidth demands of upcoming applications and the GPUs being built to drive them. AMD already has the fastest implementation of GDDR5 on its "Hawaii" silicon, which belts out 320 GB/s of memory bandwidth, but to get there, the company is having to use sixteen memory chips. Placed on a PCB, the ASIC along with the 16 memory chips takes up quite a bit of real estate: 110 mm x 90 mm (99 cm²).

GPU makers haven't managed to take clock speeds of GDDR5 above 1752 MHz (real), and the fact that they're having to use other ways to increase effective bandwidth, such as proprietary lossless memory compression, shows that GDDR5 will fetch diminishing returns for new designs from here on out. With GDDR5 (or any DRAM standard for that matter), power consumption doesn't scale linearly with clock speed. Beyond a point, you need a disproportionate amount of power to support increasing clocks. GDDR5 has reached that point. This necessitates HBM.

HBM takes a different approach to achieving memory bandwidth than GDDR5. The interface is wider, but with lower clocks (leaving vast room for future increases in clock speeds). The first generation of HBM, which AMD is implementing on its upcoming high-end graphics cards, pushes just 1 Gbps of data per pin (compared to 7 Gbps on the fastest GDDR5), but features a vast bus width of 1024-bit per stack (compared to just 32-bit per GDDR5 chip). An HBM "chip" is essentially a stack of five dies: a "base die" which holds routing logic, and four DRAM dies, stacked like pancakes (compared to just one DRAM die being bumped out to a BGA package that sits on the PCB, on GDDR5).
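To put the per-device numbers above side by side, here is a quick back-of-the-envelope sketch in Python. It uses only the pin speeds and interface widths quoted in this article; the function name `peak_bandwidth_gbs` is ours, chosen for illustration, and the results are theoretical peaks rather than measured figures.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: pins x per-pin rate, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

# Figures quoted in the article.
gddr5_chip = peak_bandwidth_gbs(32, 7.0)    # fastest GDDR5 chip: 32-bit at 7 Gbps
hbm_stack = peak_bandwidth_gbs(1024, 1.0)   # first-gen HBM stack: 1024-bit at 1 Gbps

print(f"GDDR5 chip: {gddr5_chip:.0f} GB/s")
print(f"HBM stack:  {hbm_stack:.0f} GB/s")
print(f"Ratio:      {hbm_stack / gddr5_chip:.1f}x")
```

Even at a seventh of the per-pin data rate, the 32-times-wider interface gives a single HBM stack several times the peak bandwidth of a single GDDR5 chip.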

In AMD's implementation of HBM, these "chips" won't be encased in ceramic packages of their own that sit outside the GPU package and need intricate wiring along the PCB to reach. Instead, the HBM chips will be placed right alongside the GPU die, inside the GPU package, on a package substrate AMD calls the "interposer." This is a specially designed substrate layer above the ASIC's own package substrate, which connects the GPU die to the four HBM stacks with an extremely high density of wiring, beyond what conventional multi-layered fiberglass PCBs are capable of. The interposer is perhaps the closest man has come to developing a medulla oblongata.

These stacks, as a result, are much closer to the GPU silicon, and the interposer enables extremely high memory bus widths thanks to the density of wiring it can handle. AMD has four such stacks on its upcoming "Fiji" ASIC, resulting in a gargantuan 4096-bit memory bus width. Since HBM pushes less data per pin compared to GDDR5, don't expect "Fiji" to have eight times the memory bandwidth of "Hawaii." AMD's flagship Radeon graphics card based on "Fiji" is rumored to feature a memory clock speed of 500 MHz (1 Gbps per pin), which translates into 512 GB/s of cumulative video memory bandwidth for the GPU, with 128 GB/s per HBM stack. The DRAM voltage is lower, at 1.3V, compared to 1.5V on 7 Gbps GDDR5.
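The arithmetic behind those figures can be checked with a short Python sketch. It uses only numbers quoted in this article (four 1024-bit stacks at 1 Gbps per pin for "Fiji"; sixteen 32-bit GDDR5 chips and 320 GB/s for "Hawaii"). The "Hawaii" per-pin rate is not stated in the text; it is back-calculated here from the quoted bandwidth, and the variable and function names are ours, for illustration only.

```python
def peak_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Theoretical peak bandwidth in GB/s: pins x per-pin rate, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

# "Fiji" (rumored): four HBM stacks, 1024-bit each, 1 Gbps per pin.
fiji_bus = 4 * 1024
fiji_bw = peak_bandwidth_gbs(fiji_bus, 1.0)

# "Hawaii": sixteen 32-bit GDDR5 chips, 320 GB/s as quoted in the article.
hawaii_bus = 16 * 32
hawaii_bw = 320.0
# Implied per-pin data rate, derived from the quoted figures (an inference).
hawaii_gbps_per_pin = hawaii_bw * 8 / hawaii_bus

print(f"Fiji:   {fiji_bus}-bit, {fiji_bw:.0f} GB/s")
print(f"Hawaii: {hawaii_bus}-bit, {hawaii_bw:.0f} GB/s ({hawaii_gbps_per_pin:.0f} Gbps per pin implied)")
print(f"Bus width ratio: {fiji_bus / hawaii_bus:.0f}x, bandwidth ratio: {fiji_bw / hawaii_bw:.1f}x")
```

The bus is eight times wider, but because each pin carries a fraction of the data, the peak bandwidth works out to 1.6 times that of "Hawaii" rather than eight times.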

The specifications of the GPU die are constantly being churned up by the rumor mill. Regardless of that, "Fiji" will end up having a smaller PCB footprint than "Hawaii." The package will be bigger, but it will no longer be surrounded by memory chips. The PCB will look quite different from what we've been used to seeing since the dawn of PC graphics add-in boards. In a way, that's a great thing. AMD retains control over memory, and so its AIB partners can't cheap out with memory chips. We haven't forgotten how some AIBs shortchanged buyers of Radeon R9 290 and R9 290X with cheaper Elpida GDDR5 chips on reference PCBs, even as initial batches (and review samples) came with higher-quality SK Hynix-made ones. Some of the earliest boards with Elpida chips didn't have proper memory timing optimization in the video BIOS, prompting AIBs to send out BIOS updates. Something like that won't happen with "Fiji," and AIBs are free to cheap out on PCB quality, as the most sensitive wiring (that between the GPU and memory) has now been moved to the GPU package and its interposer (more quality control in AMD's hands).

So what does this all boil down to? Memory is a more important ingredient in a modern graphics card than you've been led to believe. The 64-bit computing era is now firmly here, and games are taking advantage of any amount of system- and video-memory you can throw at them. Compound that with DirectX 12, in which the command buffer can take advantage of any number of CPU cores you throw at it, and tiled resources, and you're looking at a future that AMD seems to have been preparing for over the past decade (CPUs with a large number of cores, GPUs with extremely high number-crunching parallelism and memory bandwidth). HBM, and the way AMD implemented it on its "Fiji" silicon, is an important cog in the company's machine. It will offer a brand-new path for scaling bandwidth through clock speed increases, along with higher energy efficiency.

It's improbable that AMD would go to such lengths to equip its new high-end silicon if it wasn't confident of outperforming anything NVIDIA has right now. Likewise, it's improbable that AMD would give a GPU 512 GB/s of memory bandwidth to toy with if it lacked the chops (number-crunching muscle) to make use of that much bandwidth. And this is what makes "Fiji" a chip to look out for. AMD is expected to tease graphics cards based on "Fiji" at either Computex or E3, with a product launch within June. Let the battle between the Titans and the House of Zeus begin.

29 Comments on AMD "Fiji" HBM Implementation Detailed

#1
RejZoR
Makes sense that coolers will have to be more capable now. We'll have GPU and also HBM stacks cooled by the main heatsink. I mean, why not use it if they are already placed there. Which will be cool for overclocking of HBM modules if possible. Or if they are cool by themselves could be a problem if GPU will be heating up too much through the proximity and cooler contact. I wonder how that all works out for this configuration...
#2
Mathragh
RejZoR said:
Makes sense that coolers will have to be more capable now. We'll have GPU and also HBM stacks cooled by the main heatsink. I mean, why not use it if they are already placed there. Which will be cool for overclocking of HBM modules if possible. Or if they are cool by themselves could be a problem if GPU will be heating up too much through the proximity and cooler contact. I wonder how that all works out for this configuration...
Aye and even if total power use has gone down, trying remove as much heat as before will be a lot harder if the total cardsize suddenly went down by a lot. David Kanter on the techreport podcast noted that this card might still need a watercooler even if the powerdraw is lower than a 290x, simply because an aircooler would lack the needed surface area on a smaller card the interposer would enable.
#3
TRWOV
Can't wait although I must say that my 7970 is still serving me well.

I suppose R7s will still get GDDR5 but I hope there's at least an R9 380X with HBM.
#4
Petey Plane
RejZoR said:
Makes sense that coolers will have to be more capable now. We'll have GPU and also HBM stacks cooled by the main heatsink. I mean, why not use it if they are already placed there. Which will be cool for overclocking of HBM modules if possible. Or if they are cool by themselves could be a problem if GPU will be heating up too much through the proximity and cooler contact. I wonder how that all works out for this configuration...
It will almost definitely look like a Multi-Chip Module (MCM). The condensed area will allow the cooling solutions to work more efficiently, because they will have to cool a smaller surface area. Although it will certainly be much smaller, think of something like this Power5 MCM (thx wikipedia)
#5
ShurikN
Great read, and a very easy one to understand. Thanks.
#6
Petey Plane
TRWOV said:
Can't wait although I must say that my 7970 is still serving me well.

I suppose R7s will still get GDDR5 but I hope there's at least an R9 380X with HBM.
I expect that you're gonna have to wait for the R9 480x to see HBM on anything lower than the 390x
#7
ShurikN
Also the 3rd pic says CPU/GPU.
Would like to see an APU (BGA i presume) with a large interposer to house CPU/GPU/HBM.
And all of that on a small form factor board.
#8
NC37
Petey Plane said:
I expect that you're gonna have to wait for the R9 480x to see HBM on anything lower than the 390x
That depends if the rumors are true or not about why AMD waited so long to launch the 300s. There have been conflicting reports that AMD held back to be able to supply the 300 line with more new GPUs, not just one. Now that doesn't mean HBM will be in all of them. But it would be a welcome move to make the 300s a fresh start instead of rehash city. AMD needs to do something with Freesync. The lack of cards running it in the 200s is reason enough for them to have held off and made sure the 300 line has full support across it.
#9
Petey Plane
NC37 said:
That depends if the rumors are true or not about why AMD waited so long to launch the 300s. There have been conflicting reports that AMD held back to be able to supply the 300 line with more new GPUs, not just one. Now that doesn't mean HBM will be in all of them. But it would be a welcome move to make the 300s a fresh start instead of rehash city. AMD needs to do something with Freesync. The lack of cards running it in the 200s is reason enough for them to have held off and made sure the 300 line has full support across it.
The rumors i've heard that the 390x will come in 2 versions, GDDR5 and HBM, lead me to think that the HBM version will be in the $500+ segment, with the GDDR5 coming in at the $400-$500 segment. The only reason i can think that they'd split the 390x between 2 models is that they want to reserve HBM for the flagship of the 300 generation, like a "R9 390x Plus" . That is, if those rumors are even true.

Also, i thought that free-sync was dependent on the monitor's support, and all Graphics CoreNext based cards are technically Free-Sync compatible. Guess not.
#10
2big2fail
It looks like my EK VGA Supremacy will get new life as both a GPU and VRAM block! :laugh:

In all seriousness though I'm looking forward to much smaller pcbs and waterblocks. If we can get some CPUs with HBM we might even get itx form factor dual GPU setups. :clap:
#11
dyonoctis
NC37 said:
That depends if the rumors are true or not about why AMD waited so long to launch the 300s. There have been conflicting reports that AMD held back to be able to supply the 300 line with more new GPUs, not just one. Now that doesn't mean HBM will be in all of them. But it would be a welcome move to make the 300s a fresh start instead of rehash city. AMD needs to do something with Freesync. The lack of cards running it in the 200s is reason enough for them to have held off and made sure the 300 line has full support across it.
The way AMD choose to make their R9 200 line-up is the cause of it. they should have done something in the line of nvidia with the 900 series. Hawaï for the high-end, and tonga for the mid-range. Tonga is actually competiting with both the R9 280 and the R9 270x. And i think that very few ppl know that there is 2 gen of gcn mixed at the moment. If you want to enjoy free-sync, vsr ,better dx12/mantle performance, true audio you need to buy gcn 1.2 wich very few ppl can sort apart from 1.0. I hope the r9 300 will be less cluttered...
(It's weird that Nvidia being the one making the more money atm is the one making the fewer chip: 4 vs at least 10 for amd...)
#12
horik
Well I hope they will move their asses and begin selling the cards soon, or my wife may change her mind and won't let me buy a new card :laugh:
#13
kn00tcn
horik said:
Well I hope they will move their asses and begin selling the cards soon, or my wife may change her mind and won't let me buy a new card :laugh:
(not personal, you're not the only one) i dont understand why people make comments like this, it's been widely reported for the past half year that ~june is the launch period for some amount of new radeons, it doesnt make sense for anyone to 'hurry' as nothing is delayed & it has to fit in with the computex+e3 events (that pc gaming one at e3 with amd headlining is quite blatant)
#14
horik
kn00tcn said:
(not personal, you're not the only one) i dont understand why people make comments like this, it's been widely reported for the past half year that ~june is the launch period for some amount of new radeons, it doesnt make sense for anyone to 'hurry' as nothing is delayed & it has to fit in with the computex+e3 events (that pc gaming one at e3 with amd headlining is quite blatant)
Hope you are right, for me is the past month or so that I started looking again for new hardware parts for a new build.
#15
RejZoR
Mathragh said:
Aye and even if total power use has gone down, trying remove as much heat as before will be a lot harder if the total cardsize suddenly went down by a lot. David Kanter on the techreport podcast noted that this card might still need a watercooler even if the powerdraw is lower than a 290x, simply because an aircooler would lack the needed surface area on a smaller card the interposer would enable.
Well, you can still have empty PCB the size you want in order to use a desired cooler...
#16
Patriot
Petey Plane said:
It will almost definitely look like a Multi-Chip Module (MCM). The condensed area will allow the cooling solutions to work more efficiently, because they will have to cool a smaller surface area. Although it will certainly be much smaller, think of something like this Power5 MCM (thx wikipedia)

Or like this. 2012 trails on hbm...
#17
TheinsanegamerN
dyonoctis said:
The way AMD choose to make their R9 200 line-up is the cause of it. they should have done something in the line of nvidia with the 900 series. Hawaï for the high-end, and tonga for the mid-range. Tonga is actually competiting with both the R9 280 and the R9 270x. And i think that very few ppl know that there is 2 gen of gcn mixed at the moment. If you want to enjoy free-sync, vsr ,better dx12/mantle performance, true audio you need to buy gcn 1.2 wich very few ppl can sort apart from 1.0. I hope the r9 300 will be less cluttered...
(It's weird that Nvidia being the one making the more money atm is the one making the fewer chip: 4 vs at least 10 for amd...)
They saturated the market, which doesn't always work in your favor. Nvidia used to do the same thing with ultra, TI, SE, ece versions of their cards. They figured out it sells better to have fewer products with better differentiation, than having tons of cards super close to each other.
#18
galta
It does not make sense to say that because the cards will be smaller - which is good news - it would have to be watercooled, for a regular air cooling solution wouldn't have enough size to cool GPU and memory placed so close to each other.
Guys, come on!
If that was the case, ATI could simply make the cards "artificially" bigger to accomodate a larger blower.
One must note, however, that ever since the hotter-than-the-sun GTX480 was released in 2009, ATI has succesfully claimed the "hottest" product crown. It will be great to see them back on the game again.
Hope the same happens with AMD CPUs as well.
#19
hellrazor
Sweet. This, and getting good OpenGL support would get me back to them.
#20
Caring1
galta said:
"It does not make sense to say that because the cards will be smaller - which is good news - it would have to be watercooled, for a regular air cooling solution wouldn't have enough size to cool GPU and memory placed so close to each other."
Guys, come on!
If that was the case, ATI could simply make the cards "artificially" bigger to accomodate a larger blower.
They could also implement cooling on the reverse plane utilising a heatsink on the rear of the chip attached to the metal backplate.
This would effectively double the surface area used for cooling.
#21
nunyabuisness
RejZoR said:
Makes sense that coolers will have to be more capable now. We'll have GPU and also HBM stacks cooled by the main heatsink. I mean, why not use it if they are already placed there. Which will be cool for overclocking of HBM modules if possible. Or if they are cool by themselves could be a problem if GPU will be heating up too much through the proximity and cooler contact. I wonder how that all works out for this configuration...
We still may see problems. Heat rises. so I think the top Dram chip is gana get REALLY HOT.
I hope AMD have spent enough time validating this. the links are really fragile, and if they get hot. think Xbox 360 issues 1 billion dollars in writeoffs from warranty! if its not 100% sorted
#22
RejZoR
TheinsanegamerN said:
They saturated the market, which doesn't always work in your favor. Nvidia used to do the same thing with ultra, TI, SE, ece versions of their cards. They figured out it sells better to have fewer products with better differentiation, than having tons of cards super close to each other.
This is what I've been always saying, especially since I work with customers on daily basis and have seen it first hand during a struggle in a company where I work few years ago. We used to have like 15 different models for the same product type and it was really difficult to sell them, people often had to go and "rethink" despite literally pushing them towards one product. But then we were forced to narrow down the lineup to like 5-6 models, basically 2 for each price range. Guess what, sales went through the roof almost, because people were able to quickly decide what is within their budget and what isn't. And when they had to pay more, it was also an easy choice. The first more expensive one. But if you granulate the models to 10 extra models in between, you confuse them again with decisions how much extra is worth paying for what extra features.

And that's what baffles me with AMD's Radeon product lines. There are TOO MANY of them. Bunch of Rx editions and then those are granulated down to bunch of series and then down to special models and versions. Totally unnecessary, confusing and overwhelming for costumers.

Why not have just R9 and place 4-5 well placed models here. Simulated naming for the new R9-300 series:

R9-320 2GB (budget)
R9-350 2GB (low end)
R9-370 3GB (mid range)
R9-380 4GB (high end)
R9-390 4GB/8GB* (premium-enthusiast)

*depending on how HBM can be implemented currently

No X versions, no LE crap versions, no various memory configurations, no R5 and R7 unless if you want to separate lets say mobile chips from desktops that way. When you give people a good argument why they should pay few bucks more, they will do so. But if you're having hard time justifying why every iteration of a card in between costs X for function Y, you just confuse costumers and make them walk away because they have to "rethink". And everyone who has to rethink is more likely to buy something from competition.

I don't know, I don't have a science degree in space marketing and a 6 figure yearly income and I get this. But companies just go with heads straight through the walls. Go figure...
#23
kn00tcn
horik said:
Hope you are right, for me is the past month or so that I started looking again for new hardware parts for a new build.
dont forget skylake is also launching.... basically let's see what the situation is in the next few weeks (yes, june specifically for both amd+intel, unless you want to wait until ~september for 980ti)

RejZoR said:
And that's what baffles me with AMD's Radeon product lines. There are TOO MANY of them. Bunch of Rx editions and then those are granulated down to bunch of series and then down to special models and versions. Totally unnecessary, confusing and overwhelming for costumers.
you need to sell defective chips, so you need to have a 'pro' & 'xt' version (xx50 & xx70), nvidia is identical to this with x70 & x80, it's the same gpu but one is crippled, to me this is just fine

what happened was this mess with TSMC, lack of resources, shopping time of the year, OEMs needing new/more numbers.... so you end up with rebrands, mixed generations, & stop-gap solutions

you messed up with your Rx, R9 is high end, R7 is midrange, R5 is low end (this is so consumers get an idea of performance level.... even though it's a bit redundant cuz a higher numbered 280 is obviously better than 260, but i guess it doesnt communicate which third of performance it is cuz consumers are idiots that need their hands held all the time instead of using a 1-10 scale (wow it took me this long to realize it's based on intel's i3, i5, i7))

i'm actually not sure why you say there are so many products, nvidia has the same issue when there are TI editions & especially with mobile

the product stack is pretty normal on launch, with a few random appearances months or years later like the 5830 or 285

hmm maybe i should imagine my own (gaming, not workstation) stack, which btw needs price points:

$53 - 310 - are you sure you want to permanently delete this file?
$60 - 320 - i dont even know what i would do around here, rebadge? keep cutting tiny gpus? try to engineer low power efficiency for practice?
$70 - 330 - 9fps - borderline worse than an APU, probably should be required to playback 4k30 video
$85 - 340 - 12fps - maybe slowed & cut low end gpu... i hate low end, people cant wait a week to save up a couple bills? it's not like faster last gen cards are obsolete, buy those at this price instead
$100 - 350 - 15fps - upper low end gpu, will struggle if MSAA is used in 1080p
$125 - 360 - 19fps - slowed & cut mid gpu
$150 - 365 - 23fps - mid gpu that should handle 1080p without constantly being limited by ROPs or bandwidth, but eye candy will slow it down fast
$175 - 370 - 27fps - slowed & cut upper mid tier gpu
$200 - 375 - 32fps - uncut upper mid tier gpu that performs half or physically contains mostly half of the high end gpu
$250 - 380 - 38fps - slowed & cut second tier gpu
$300 - 385 - 46fps - uncut second tier gpu (maybe it lacks all the compute parts like 7870 did but still has more parts for raster rendering)
$350 - 390 - 53fps - slowed & cut big gpu
$450 - 395 - 60fps - full speed uncut big gpu, the absolute best single gpu (mass produced, not workstation, not custom or limited/specialized run such as some server compute card)
$750 - 390x2 - 4k 45fps - cut or slightly slowed dual big gpu (x2 because naming a dual gpu without a dual number is STUPID & unclear, ever since the 290+5970...)
$950 - 395x2 - 4k 55fps - water cooled full speed dual uncut big gpu

there is a glaring hole in my idea... it's quite similar to the 7 series, i actually do wonder what would happen if they made a really high end gpu based on the 7870's principles, as in, instead of adding compute capability, to just make a fat one that is only good in games... but this costs so much money for no gain, especially when you're making an ecosystem of gpu accelerated computing such as photoshop, video encoding, etc

inb4 the million custom editions that AIBs make, especially the OC ones that cost up to the next product that might ironically perform worse than that next product

here's a fun potential fact: titan -> titan z (titans, it's dual) -> titan x (titan next, it's the sequel)

EDIT: changed 390x2 price from original $850 that was based on limitations of air cooling nearly 300 watts
#24
Captain_Tom
TRWOV said:
Can't wait although I must say that my 7970 is still serving me well.

I suppose R7s will still get GDDR5 but I hope there's at least an R9 380X with HBM.
Damn straight it is! My 7970 at 1205/1800 is still maxing out every game I have above 60 FPS. Thus if the 390X isn't a complete knock-out I will skip yet another generation.

However don't expect the 380X to have HBM. They need to use as much of it as they can on the 390 series. Actually the 380X is rumored to be "Enhanced Hawaii" with 8GB of GDDR5 - and that at least still sounds great.
#25
RejZoR
I'll be getting new generation out of curiosity. i frankly don't have any real need with overclocked HD7950 3GB. Anything I throw at it works like a charm. Except maybe Natural Selection 2 which is very demanding game on CPU and GPU, especially in late games when there are tons of buildings constructed. I just hope they'll shift to DirectX 12 so we'll offload the burden from CPU. I mostly buy new cards for cool features more than for raw performance. For example I wanted HD6950 over HD6870 because the 6900 supported EQAA and HD6800 didn't. Next one was MLAA and R9-300 will most likely be because of HBM, VSR and other goodies.