Friday, July 15th 2016

SK Hynix to Ship HBM2 Memory by Q3-2016

Korean memory and NAND flash giant SK Hynix announced that it will have HBM2 memory ready for order within Q3-2016 (July-September). The company will ship 4-gigabyte HBM2 stacks in the 4-Hi (4-die stack) form-factor, in two speeds: 2.00 Gbps (256 GB/s per stack), bearing model number H5VR32ESM4H-20C; and 1.60 Gbps (204 GB/s per stack), bearing model number H5VR32ESM4H-12C. With four such stacks over a 4096-bit HBM2 interface, graphics cards with 16 GB of total memory can be built.
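As a quick sanity check on the quoted figures, per-stack bandwidth is just pin speed times the stack's 1024-bit interface, divided by 8 bits per byte (a throwaway sketch, not from the catalog; the 1.60 Gbps part works out to 204.8 GB/s, which the announcement rounds to 204):

```python
# Per-stack HBM2 bandwidth from the announced pin speeds.
def hbm2_stack_bandwidth(pin_speed_gbps, bus_width_bits=1024):
    """Bandwidth of one HBM2 stack in GB/s: pin speed x bus width / 8 bits per byte."""
    return pin_speed_gbps * bus_width_bits / 8

print(hbm2_stack_bandwidth(2.00))      # H5VR32ESM4H-20C: 256.0 GB/s
print(hbm2_stack_bandwidth(1.60))      # H5VR32ESM4H-12C: ~204.8 GB/s
print(4 * hbm2_stack_bandwidth(2.00))  # four stacks on a 4096-bit interface: 1024.0 GB/s
```

Four of the 2.00 Gbps stacks give the 1 TB/s aggregate that a 4096-bit HBM2 card would enjoy, alongside the 16 GB total capacity.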
Source: SK Hynix Q3 Catalog

77 Comments on SK Hynix to Ship HBM2 Memory by Q3-2016

#51
ZoneDymo
Jism: www.playtool.com/pages/vramwidth/width.html

GDDR5 is at its limits, with the exception of GDDR5X. In a few ways, to be honest: extra power requirements, a larger PCB, and a fairly complex memory controller required to drive all 8 or even 16 chips at the same time.

This makes a graphics card in general more expensive compared to high-end chips with HBM. You don't need to design coolers that cool the front and back of the card; you only have to focus on the GPU/HBM and the VRM.

This also opens doors for AMD to develop a Zen CPU for consoles, the server market, or complete SoCs that already have memory on top, as opposed to external DDR4 slots, for example.
Is it though?
Because sure, if you look here:
www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1070/27.html

and here:
www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080/30.html

We see the GTX 1080 having roughly 80 GB/s more bandwidth, but in performance the memory difference does not seem to matter, and in the end the GTX 1080 does lose out to the R9 295X2 (if you scroll down to that benchmark).
If bandwidth were really holding GPUs back already, that should not happen, right?

All the other advantages, smaller coolers and cheaper GPUs, sure, but in performance it does not seem to change the game at all.
Posted on Reply
#52
Jism
That's because the memory bandwidth at stock is already enough for the chip to fully utilize. At some point it's even useless to OC memory beyond the GPU's sweet spot.

Like I said, we are in an era of GPUs that are about to offer real performance at 4K and even higher resolutions, and that is where the need for real bandwidth kicks in. You can't get 1 TB/s out of GDDR5X without a minimum of 16 or even more chips on one PCB. This is where HBM2 comes in, and on paper it offers more than GDDR5X does at the moment.

wccftech.com/asus-radeon-r9-fury-overclocked-10-ghz-hbm-1400-mhz-gpu-clock-fully-unlocked-fury-features-1-tbs-bandwidth-ln2/

This guy OC'ed that HBM up to 1 GHz, which yielded roughly 1 TB/s of memory bandwidth; however, the Fury X chip itself isn't strong enough to fully utilize that 1 TB/s. You need a GPU that can scale along with that memory. Both Vega and high-end Pascal will carry HBM2, which should be more than enough bandwidth for both GPUs.

The same goes for system memory, for example on AMD systems. The CPU isn't able to make use of memory that runs beyond 1866 or 2000 MHz; putting in DDR3 at 2400 MHz isn't going to offer much.
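The ~1 TB/s figure from that overclock checks out: first-generation HBM on the Fury X is double-data-rate on a 4096-bit bus, so (my arithmetic, not taken from the linked article):

```python
# Fury X HBM bandwidth: 4096-bit bus, two transfers per clock (DDR).
def hbm_bandwidth(clock_mhz, bus_width_bits=4096):
    """Effective bandwidth in GB/s: clock x 2 transfers x bus width / 8 bits / 1000."""
    return clock_mhz * 2 * bus_width_bits / 8 / 1000

print(hbm_bandwidth(500))   # stock Fury X memory clock: 512.0 GB/s
print(hbm_bandwidth(1000))  # the 1 GHz overclock: 1024.0 GB/s, i.e. ~1 TB/s
```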
Posted on Reply
#53
ZoneDymo
Jism: That's because the memory bandwidth at stock is already enough for the chip to fully utilize. At some point it's even useless to OC memory beyond the GPU's sweet spot.

Like I said, we are in an era of GPUs that are about to offer real performance at 4K and even higher resolutions, and that is where the need for real bandwidth kicks in. You can't get 1 TB/s out of GDDR5X without a minimum of 16 or even more chips on one PCB. This is where HBM2 comes in, and on paper it offers more than GDDR5X does at the moment.

wccftech.com/asus-radeon-r9-fury-overclocked-10-ghz-hbm-1400-mhz-gpu-clock-fully-unlocked-fury-features-1-tbs-bandwidth-ln2/

This guy OC'ed that HBM up to 1 GHz, which yielded roughly 1 TB/s of memory bandwidth; however, the Fury X chip itself isn't strong enough to fully utilize that 1 TB/s. You need a GPU that can scale along with that memory. Both Vega and high-end Pascal will carry HBM2, which should be more than enough bandwidth for both GPUs.

The same goes for system memory, for example on AMD systems. The CPU isn't able to make use of memory that runs beyond 1866 or 2000 MHz; putting in DDR3 at 2400 MHz isn't going to offer much.
I get all that, but that means that right now, GPUs don't make use of the extra freedom that faster memory gives.
Case in point: the GTX 1080 has GDDR5X but does not need it; it's more marketing than anything else, same as HBM for the Fury X.
And again, the R9 295X2 with its GDDR5 puts out higher fps, which makes me believe there is still quite some headroom left in GDDR5.
But yeah, why wait until it runs into those constraints; same with PCI-E slots and the bandwidth headroom they already offer.
Posted on Reply
#54
PP Mguire
ZoneDymo: I get all that, but that means that right now, GPUs don't make use of the extra freedom that faster memory gives.
Case in point: the GTX 1080 has GDDR5X but does not need it; it's more marketing than anything else, same as HBM for the Fury X.
And again, the R9 295X2 with its GDDR5 puts out higher fps, which makes me believe there is still quite some headroom left in GDDR5.
But yeah, why wait until it runs into those constraints; same with PCI-E slots and the bandwidth headroom they already offer.
Yeah, I've been saying that for a while now. Friends want to argue in favor of AMD simply for HBM, when it's pretty obvious we're held back by raw GPU power instead of memory bandwidth. HBM is the natural evolution of the tech and is welcome, but not necessarily needed yet.
Posted on Reply
#55
Jism
Exactly. It will come into use in the future, as shown on AMD's roadmap (roadmap image in the original post).

GDDR5X and HBM offer lower latency and lower power usage, making cards in general more efficient. And with less power used by the memory, there is more headroom for the GPU to be clocked higher.

And you forget one crucial thing: the enterprise market is where it all happens, and where that massive bandwidth in, for example, PCI-Express 3.0 is welcome. We consumers don't drive multiple NICs and hardware RAID controllers loaded with up to 32 SSDs at the same time.
Posted on Reply
#56
FordGT90Concept
"I go fast!1!11!1!"
Aquinus: ...am I the only person who would be more interested in seeing this on a CPU?
Ooooo, memory controller on a stick. Could be used in special situations where they need TBs of memory in one system. It would be really expensive, though. I think they could reasonably put 64 GiB of RAM on a stick (4 GiB * 16 stacks). 512 GiB across 8 sticks is plausible.
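The capacities in this hypothetical multiply out like this (trivial arithmetic, just making the proposed configuration concrete):

```python
# Hypothetical HBM2 "memory stick" from the post: 16 four-GiB stacks per stick.
gib_per_stack = 4
stacks_per_stick = 16
sticks = 8

print(gib_per_stack * stacks_per_stick)           # 64 GiB per stick
print(gib_per_stack * stacks_per_stick * sticks)  # 512 GiB across 8 sticks
```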
Posted on Reply
#57
$ReaPeR$
Nokiron: An early report that is correct, though. AMD is not overly popular here, and that is what the RX 480 was supposed to change.
But it did not happen; the prices are too high and other options are more viable. This won't change for quite some time.

I'm not saying the card itself is a failure, I'm saying that everything regarding its launch was a failure over here.

I'm sure that, percentage-wise, we buy a lot more GPUs than the average country. For example, when the GTX Titan was released, the Nordics were a priority market where the cards were shipped first.

Yeah, for some reason.
If you check the most-sold GPUs per store, almost everything is Nvidia, and it is still 970s and 960s at the top.

I don't find any correlation?
AMD's marketing has been non-existent; we have the cards, but very few are being sold. Why? Because people only hear about Nvidia.
I expected they would do more research when buying something. I mean, wth, a 960 vs a 380X... it's a no-brainer win for the 380X. I don't expect smart people to listen to marketing BS; that's my point. But we are far off topic, so I suggest we leave it here. It's just opinions anyway.
On topic: IMO, HBM2 will be left for the next-gen cards by both companies. Using HBM2 on anything other than the Titan P would be a waste.
Posted on Reply
#58
Aquinus
Resident Wat-man
FordGT90Concept: Ooooo, memory controller on a stick. Could be used in special situations where they need TBs of memory in one system. Would be really expensive though. I think they could reasonably put 256 GiB of RAM on a stick. 2 TiB across 8 sticks is plausible.
Well, my point is that you could simplify the motherboard and external I/O by putting everything on the CPU. A huge memory store is kind of monolithic and isn't a step forward; it's a step backwards, because you still need to consider communication, which doesn't eliminate all of the wiring and complexity inherent in such a design, to say nothing of latency, which is a huge factor (this is exactly why we have IMCs in the first place: off-die memory controllers like the MCH introduced almost twice as much latency as having the controller locally.) Eliminating external memory could enable servers to fit more CPUs in a smaller area, and for servers it's likely that load and memory usage will scale together to some extent, so if you need more compute, there is a good bet you'll want more memory too. Fewer CPU contacts for DRAM means more contacts for other things like PCI-E, or CPU-to-CPU communication like QPI, or just fewer contacts, period.

I would be thrilled if they had basically a Xeon D on a board twice the size of a Raspberry Pi. Now that would be impressive.
Posted on Reply
#59
FordGT90Concept
"I go fast!1!11!1!"
My problem with that is being restricted to 16 GiB of RAM per socket. 16 GiB is a lot for a GPU, but not for a CPU.

There's some computing work out there that does require massive reserves of memory (e.g. 3D scanning), and having that much ridiculously fast memory could translate to near-instantaneous progress. Granted, there aren't many buyers for specialized hardware like that.
Posted on Reply
#60
Aquinus
Resident Wat-man
FordGT90Concept: My problem with that is being restricted to 16 GiB of RAM per socket. 16 GiB is a lot for a GPU, but not for a CPU.
Well, obviously not every CPU would have 16 GB; that really depends on the size of the CPU, doesn't it? 16 GB isn't much for an 8c/16t CPU, but it's much more reasonable for a 4c/8t CPU, and if we're eliminating external memory, wouldn't it make more sense to support dual-socket motherboards? If you need more, you merely upgrade from 1 CPU to 2 CPUs, and now you have 8c/16t and 32 GB of memory. With that kind of room and extra pins available, you have a lot of options. To take the idea a little further: if you have two CPU-esque sockets and you didn't need more compute, imagine a "CPU-esque" device that was basically HBM stacks and an IMC with minimal compute, plus something like QPI merely to facilitate CPU-to-CPU communication to expand memory. Now you have the option of doing both. Clearly this is all hypothesizing about what could be, but you see where the flexibility would be in having a generic "interface", if you will, to connect homogeneous devices. There is a balance that needs to be struck between homogeneity and heterogeneity: the same enough to do both, but different enough to do one or the other. I still think getting memory closer to the cores is always a good idea, though. Nothing reduces latency like reducing the length of a circuit. Electricity travels fast, but not when you're measuring by the nanosecond.
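The circuit-length point can be put in rough numbers. The distances and the half-speed-of-light propagation figure below are rule-of-thumb assumptions of mine, not measured values, but they show the order of magnitude:

```python
# Signal propagation on a PCB trace is roughly half the speed of light in vacuum.
c_cm_per_ns = 30.0
trace_speed = 0.5 * c_cm_per_ns   # ~15 cm/ns in FR-4 (rule of thumb, assumed)

dimm_trace_cm = 10.0              # CPU to DIMM slot: rough order-of-magnitude guess
interposer_trace_cm = 0.5         # CPU to HBM stack on an interposer: rough guess

print(2 * dimm_trace_cm / trace_speed)        # round trip to a DIMM: ~1.3 ns
print(2 * interposer_trace_cm / trace_speed)  # round trip on an interposer: ~0.07 ns
```

A nanosecond of flight time matters when total DRAM latencies are measured in tens of nanoseconds, which is the point about keeping memory close to the cores.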
FordGT90Concept: There's some computing work out there that does require massive reserves of memory (e.g. 3D scanning), and having that much ridiculously fast memory could translate to near-instantaneous progress. Granted, there aren't many buyers for specialized hardware like that.
That's what the memory hierarchy is for. Dump it to disk; PCI-E flash is more than capable of providing 1 TB and high bandwidth. System memory doesn't need to be more than a buffer for huge data sets like that. System memory is not a dumping ground and should never be treated as such. The CPU can't use all of that at once anyway, so there is no reason to keep it close to the CPU.
Posted on Reply
#61
FordGT90Concept
"I go fast!1!11!1!"
Aquinus: To take the idea a little further: if you have two CPU-esque sockets and you didn't need more compute, imagine a "CPU-esque" device that was basically HBM stacks and an IMC with minimal compute, plus something like QPI merely to facilitate CPU-to-CPU communication to expand memory.
But why, when every CPU could be made to communicate directly with the sticks? Yeah, there's more latency, but if you're doing something that requires terabytes of memory, a little extra latency isn't going to hurt anything. Case in point: FB-DIMM. On top of that, the massive bandwidth of HBM offsets the loss in latency.

I could only see HBM used on-die for high-performance embedded solutions: for example, video game consoles, home theater devices, phones, and tablets. You know, places where memory usually isn't expandable.
Aquinus: That's what the memory hierarchy is for. Dump it to disk; PCI-E flash is more than capable of providing 1 TB and high bandwidth. System memory doesn't need to be more than a buffer for huge data sets like that. System memory is not a dumping ground and should never be treated as such. The CPU can't use all of that at once anyway, so there is no reason to keep it close to the CPU.
PCI-E flash is ~2 GB/s; what I described would be in the neighborhood of 26 TB/s (and that's single channel). They really aren't comparable. That said, you'd still need those NVMe storage devices to unload the data from the HBM.
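One reading of where ~26 TB/s could come from: 8 sticks of 16 stacks, each stack at the 1.6 Gbps HBM2 speed grade from the article (204.8 GB/s). This is my guess at the arithmetic, not the poster's stated math:

```python
# Aggregate bandwidth of a hypothetical 8-stick, 16-stacks-per-stick HBM2 system.
stacks = 8 * 16            # 128 stacks total
gb_s_per_stack = 204.8     # 1.6 Gbps speed grade, 1024-bit stack interface
total_tb_s = stacks * gb_s_per_stack / 1000

print(total_tb_s)          # ~26.2 TB/s, versus ~2 GB/s for PCI-E flash
```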
Posted on Reply
#62
$ReaPeR$
Aquinus: Well, obviously not every CPU would have 16 GB; that really depends on the size of the CPU, doesn't it? 16 GB isn't much for an 8c/16t CPU, but it's much more reasonable for a 4c/8t CPU, and if we're eliminating external memory, wouldn't it make more sense to support dual-socket motherboards? If you need more, you merely upgrade from 1 CPU to 2 CPUs, and now you have 8c/16t and 32 GB of memory. With that kind of room and extra pins available, you have a lot of options. To take the idea a little further: if you have two CPU-esque sockets and you didn't need more compute, imagine a "CPU-esque" device that was basically HBM stacks and an IMC with minimal compute, plus something like QPI merely to facilitate CPU-to-CPU communication to expand memory. Now you have the option of doing both. Clearly this is all hypothesizing about what could be, but you see where the flexibility would be in having a generic "interface", if you will, to connect homogeneous devices. There is a balance that needs to be struck between homogeneity and heterogeneity: the same enough to do both, but different enough to do one or the other. I still think getting memory closer to the cores is always a good idea, though. Nothing reduces latency like reducing the length of a circuit. Electricity travels fast, but not when you're measuring by the nanosecond.

That's what the memory hierarchy is for. Dump it to disk; PCI-E flash is more than capable of providing 1 TB and high bandwidth. System memory doesn't need to be more than a buffer for huge data sets like that. System memory is not a dumping ground and should never be treated as such. The CPU can't use all of that at once anyway, so there is no reason to keep it close to the CPU.
That's a very good idea, mate! And I would very much like it to become a reality in the next 2 years; the possibilities are so many! Imagine 4 such slots on a mobo and a combination of a CPU core + additional memory + 2 GPU cores... I would like that machine to exist! :D
Posted on Reply
#63
Aquinus
Resident Wat-man
FordGT90Concept: But why, when every CPU could be made to communicate directly with the sticks? Yeah, there's more latency, but if you're doing something that requires terabytes of memory, a little extra latency isn't going to hurt anything. Case in point: FB-DIMM. On top of that, the massive bandwidth of HBM offsets the loss in latency.
DDR FB-DIMMs have a reasonable number of pins to run traces through the motherboard for. How do you expect to achieve 26 TB/s across a motherboard? You're sure as hell not running thousands of traces for a super-wide memory bus to a socketed device, so you're still restricted by the bus. It's a waste of HBM's capability at a huge cost. Putting memory on the CPU simply improves performance, and you don't need more than an interposer to have it; and it's not the end of the world, so long as there is another reasonable option one step down the memory hierarchy with higher capacity but slower memory, because feasibility and cost are always a thing. Just because the data you're working on is larger than system memory doesn't mean you can't process it; it just means you need to put it somewhere else while you're not actively using it. This is the very reason why, at work, our database storage is on a SAN and not in system memory, or even on a local disk for that matter.
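The trace-count objection can be made concrete by counting data lines alone (nominal bus widths, not full pin counts, which also include address, command, and power):

```python
# Why a socketed HBM-wide bus is impractical: data-line counts only.
hbm2_stack_data_bits = 1024    # per HBM2 stack
ddr4_channel_data_bits = 64    # per DDR4 channel
stacks = 128                   # 8 hypothetical sticks x 16 stacks, as discussed above

print(stacks * hbm2_stack_data_bits)  # 131072 data traces: not routable on a board
print(4 * ddr4_channel_data_bits)     # 256 for quad-channel DDR4: routine
```

A silicon interposer can route a 4096-bit bus over millimeters; a motherboard cannot route anything like it over centimeters, which is the argument for keeping HBM on-package.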
Posted on Reply
#64
D007
ZoneDymo: Is it though?
Because sure, if you look here:
www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1070/27.html

and here:
www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080/30.html

We see the GTX 1080 having roughly 80 GB/s more bandwidth, but in performance the memory difference does not seem to matter, and in the end the GTX 1080 does lose out to the R9 295X2 (if you scroll down to that benchmark).
If bandwidth were really holding GPUs back already, that should not happen, right?

All the other advantages, smaller coolers and cheaper GPUs, sure, but in performance it does not seem to change the game at all.
I see the R9 getting close to the 1080, but not beating it in, like, any benchmarks.
And is that really a liquid cooler on the R9 in testing, vs. a Founders Edition? lol... Yeah, not very fair on that either. Cooling = performance. At the end, the 1080 is at 78°C, meaning it's starting to throttle. On the ACX 3.0 it wouldn't even be close to throttling, while that R9 with extra cooling is running very cool.

Posted on Reply
#65
ZoneDymo
D007: I see the R9 getting close to the 1080, but not beating it in, like, any benchmarks.
And is that really a liquid cooler on the R9 in testing, vs. a Founders Edition? lol... Yeah, not very fair on that either. Cooling = performance. At the end, the 1080 is at 78°C, meaning it's starting to throttle. On the ACX 3.0 it wouldn't even be close to throttling, while that R9 with extra cooling is running very cool.

Errm, you don't? It's right there at the bottom of the first thing I linked, man...
BF3, 2560x1440:
GTX 1080: 138 fps
R9 295X2: 151 fps

And liquid cooling has nothing to do with it; this was a discussion about whether GDDR5X or HBM gives a performance increase atm or not... pay attention, pls.
Posted on Reply
#66
D007
ZoneDymo: Errm, you don't? It's right there at the bottom of the first thing I linked, man...
BF3, 2560x1440:
GTX 1080: 138 fps
R9 295X2: 151 fps

And liquid cooling has nothing to do with it; this was a discussion about whether GDDR5X or HBM gives a performance increase atm or not... pay attention, pls.
So, different people having different results, on the same game, with the same card...
That doesn't compute, because in the link I posted it did not do better, especially in games like The Witcher 3 at 4K with real HD graphics. The 1080 whomped it like a red-headed stepchild.
Regardless of what the discussion is about, if someone posts BS, I'm going to call them on it. Don't get mad now... don't post nonsense and bad benchmarks, pls...
Compare cards with similar cooling options or don't... I mean, you do know what throttling is, right?
Posted on Reply
#67
FordGT90Concept
"I go fast!1!11!1!"
2x 28 nm versus 1x 16 nm; not exactly apples to apples.
Posted on Reply
#68
ZoneDymo
D007: So, different people having different results, on the same game, with the same card...
That doesn't compute, because in the link I posted it did not do better, especially in games like The Witcher 3 at 4K with real HD graphics. The 1080 whomped it like a red-headed stepchild.
Regardless of what the discussion is about, if someone posts BS, I'm going to call them on it. Don't get mad now... don't post nonsense and bad benchmarks, pls...
Compare cards with similar cooling options or don't... I mean, you do know what throttling is, right?
You seriously do not get it, do you?
If I say, for example, that the Honda S2000 has better windows than the Bugatti Veyron...
Are you going to say that's not true because the Veyron is faster?

You seem unable to grasp the simple discussion about the need for GDDR5X/HBM memory vs. standard GDDR5... which I find amazing. Either that, or you need to work on your reading comprehension. Oh well.

PS:
You call TechPowerUp's own benchmarks "nonsense" and "bad"?
Why are you even on this website then?
Posted on Reply
#69
Deep
Nokiron: Well, that really depends on the market. In the Nordics, retailers are selling ten GTX 1080s for every RX 480.
Fluffmeister: @Nokiron you heard it here first, AMD fans think Sweden doesn't matter!
I would disagree; at least here in Finland, the GTX 1080 is pretty hard to come by. From what I've read on the local forums, a lot of people have cancelled their 1080 pre-orders and gone for the 1070, due to the non-existent stock and shipping dates being pushed back week after week. However, if you count the people who love the attention and the exponential e-peen growth an €800 card brings, it would seem everybody's rocking a 1080. But I would guess the majority are just waiting for the 1060 and RX 480 custom cards... myself included.
:)
Posted on Reply
#70
Nokiron
Deep: I would disagree; at least here in Finland, the GTX 1080 is pretty hard to come by. From what I've read on the local forums, a lot of people have cancelled their 1080 pre-orders and gone for the 1070, due to the non-existent stock and shipping dates being pushed back week after week. However, if you count the people who love the attention and the exponential e-peen growth an €800 card brings, it would seem everybody's rocking a 1080. But I would guess the majority are just waiting for the 1060 and RX 480 custom cards... myself included.
:)
Well, that is kind of the point. Everything is on backorder, and as soon as a new batch of cards arrives, they are all gone.
Posted on Reply
#71
Basard
I can't wait until we are bowing before our computer masters as fleshy slaves!
Posted on Reply
#72
RejZoR
FordGT90Concept: 2x 28 nm versus 1x 16 nm; not exactly apples to apples.
And two generations in between. I'd say it's apples to apples, just different kinds of apples.
Posted on Reply
#73
Reuben Mitchell
In New Zealand, the cheapest RX 480 is $480 (ironic), which is about the same price as the cheapest R9 390 and GTX 970. The cheapest 1070 is $790, and the cheapest 1080 is $1220, roughly 2.5 times the price of the RX 480. Is the GTX 1080 2.5 times better than the RX 480?
Posted on Reply
#74
FordGT90Concept
"I go fast!1!11!1!"
In DX11, the GTX 1080 is about twice as fast as the RX 480. In DX12 and Vulkan, the RX 480 comes in really close to the GTX 1070. That said, the RX 480 is held back by its electrical cap (it does not exceed 170 W). AIB partner RX 480s should start showing up this week.
Posted on Reply
#75
Fluffmeister
FordGT90Concept: 2x 28 nm versus 1x 16 nm; not exactly apples to apples.
The 295X2 also sucks power like it's going out of fashion.
Posted on Reply