
Intel Confirms HBM is Supported on Sapphire Rapids Xeons

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,190 (0.91/day)
Intel has just released its "Architecture Instruction Set Extensions and Future Features Programming Reference" manual, which documents upcoming hardware additions so that developers can prepare for them ahead of launch. Today, thanks to @InstLatX64 on Twitter, we have information that Intel is bringing an on-package High Bandwidth Memory (HBM) solution to its next-generation Sapphire Rapids Xeon processors. Specifically, two new error codes are listed: 0220H (HBM command/address parity error) and 0221H (HBM data parity error). Both codes exist to report HBM parity errors so that the CPU operates on correct data.
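Purely as an illustrative sketch, not something taken from Intel's manual (the constants mirror the two documented codes, but the decoding function and its name are hypothetical), error-reporting software could map these codes to readable messages roughly like this:

```c
#include <stdint.h>
#include <stdio.h>

/* The two HBM-related error codes documented in the new manual.
   The surrounding function and naming are hypothetical. */
#define HBM_CMD_ADDR_PARITY_ERROR 0x0220u
#define HBM_DATA_PARITY_ERROR     0x0221u

static const char *describe_hbm_error(uint16_t code)
{
    switch (code) {
    case HBM_CMD_ADDR_PARITY_ERROR:
        return "HBM command/address parity error";
    case HBM_DATA_PARITY_ERROR:
        return "HBM data parity error";
    default:
        return "not an HBM error code";
    }
}

int main(void)
{
    printf("0220H -> %s\n", describe_hbm_error(0x0220));
    printf("0221H -> %s\n", describe_hbm_error(0x0221));
    return 0;
}
```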

The addition of HBM is just one of the many new technologies Sapphire Rapids brings. The platform is also expected to feature an eight-channel DDR5 memory controller alongside Intel's Data Streaming Accelerator (DSA). To connect to external accelerators, the platform uses the PCIe 5.0 protocol paired with the CXL 1.1 standard to enable cache coherency across the system. And as a reminder, this would not be the first time we have seen a server CPU use HBM: Fujitsu's A64FX processor pairs 48 cores with HBM memory and powers today's most powerful supercomputer, Fugaku, which shows how much a processor can gain from faster on-package memory. We are waiting to see how Intel plays this out and what ends up on the market when Sapphire Rapids is delivered.


 
Joined
Feb 3, 2017
Messages
3,475 (1.33/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
I get that HBM has very nice upsides, but I am afraid that CPUs will soon come with embedded memory, meaning no upgrades and a premium charged for more memory.
Looks like HBM might be planned as a sort of L4 cache here, but still.
 
Joined
Feb 11, 2009
Messages
5,389 (0.98/day)
System Name Cyberline
Processor Intel Core i7 2600k -> 12600k
Motherboard Asus P8P67 LE Rev 3.0 -> Gigabyte Z690 Auros Elite DDR4
Cooling Tuniq Tower 120 -> Custom Watercoolingloop
Memory Corsair (4x2) 8gb 1600mhz -> Crucial (8x2) 16gb 3600mhz
Video Card(s) AMD RX480 -> ... nope still the same :'(
Storage Samsung 750 Evo 250gb SSD + WD 1tb x 2 + WD 2tb -> 2tb NVMe SSD
Display(s) Philips 32inch LPF5605H (television) -> Dell S3220DGF
Case antec 600 -> Thermaltake Tenor HTCP case
Audio Device(s) Focusrite 2i4 (USB)
Power Supply Seasonic 620watt 80+ Platinum
Mouse Elecom EX-G
Keyboard Rapoo V700
Software Windows 10 Pro 64bit
I get that HBM has very nice upsides, but I am afraid that CPUs will soon come with embedded memory, meaning no upgrades and a premium charged for more memory.
Looks like HBM might be planned as a sort of L4 cache here, but still.

Personally, I wonder if the HBM in that case can't be used as an extra in-between step.

AMD's new GPUs have that Infinity Cache, which is basically super-fast memory, and then have standard GDDR6 next to that.
So why not have this HBM be the Intel version of that and still have memory next to it, just an extra step, the same way RAM is an in-between for CPU and storage?

Heck, maybe AMD in the future will have Infinity Cache > HBM > GDDR6.
 
Joined
Feb 3, 2017
Messages
3,475 (1.33/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Infinity Cache is just marketing - it is basically just L3 cache with a cool-sounding name.
But you are right; like I said, I would expect HBM to become an L4 cache for now.

Another layer of memory might bring some other possibilities to the table as well. XPoint DIMMs stand out as something that would gain from this.
 
Joined
Jul 13, 2016
Messages
2,794 (0.99/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Personally, I wonder if the HBM in that case can't be used as an extra in-between step.

AMD's new GPUs have that Infinity Cache, which is basically super-fast memory, and then have standard GDDR6 next to that.
So why not have this HBM be the Intel version of that and still have memory next to it, just an extra step, the same way RAM is an in-between for CPU and storage?

Heck, maybe AMD in the future will have Infinity Cache > HBM > GDDR6.

Infinity Cache = L3 cache; it's been around on CPUs for a while. AMD just added it to their GPUs to reduce memory bandwidth requirements.

When you add HBM to a CPU via an interposer, you are talking about considerably increasing the cost of manufacture and the time to produce. AMD learned this with Vega.

Aside from a few professional scenarios, I don't really see how regular consumers would benefit from having both HBM and DDR. If you had such a problem with cache misses (which neither AMD nor Intel do) that you needed another layer of storage between the L4 and main memory, you'd be much wiser to increase the amount of cache your CPU has or tweak what it decides to store in cache. Cache is still vastly faster and lower latency than HBM. HBM has much more bandwidth than DDR4, but consumer systems don't really need more bandwidth right now. Heck, we are still using dual-channel memory, and you'd be hard pressed to find a game that actually benefits from quad-channel.

The thing with AMD's L3 Infinity Cache is that it fixes a downside of their choice of memory. It doesn't go searching for a solution to a problem that doesn't exist.
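To put rough numbers on the bandwidth point above (peak theoretical figures only; the HBM2e pin speed is an assumption used for illustration, not anything confirmed for Sapphire Rapids):

```c
#include <stdio.h>

/* Peak theoretical bandwidth = (bus width in bytes) * (transfer rate in GT/s).
   All figures below are approximate and assumed for illustration only. */
int main(void)
{
    double ddr4_3200_channel = 8.0 * 3.2;   /* 64-bit channel at 3200 MT/s ~= 25.6 GB/s      */
    double hbm2e_stack       = 128.0 * 3.2; /* 1024-bit stack at ~3.2 Gb/s per pin ~= 410 GB/s */

    printf("Dual-channel DDR4-3200 : ~%.1f GB/s\n", 2 * ddr4_3200_channel);  /* ~51 GB/s  */
    printf("Quad-channel DDR4-3200 : ~%.1f GB/s\n", 4 * ddr4_3200_channel);  /* ~102 GB/s */
    printf("One HBM2e stack        : ~%.1f GB/s\n", hbm2e_stack);            /* ~410 GB/s */
    return 0;
}
```

Even a single stack comfortably exceeds a quad-channel DDR4 setup on paper, which is why the debate above centers on cost and latency rather than raw bandwidth.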
 
Joined
Nov 6, 2016
Messages
1,561 (0.58/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
Infinity Cache = L3 cache; it's been around on CPUs for a while. AMD just added it to their GPUs to reduce memory bandwidth requirements.

When you add HBM to a CPU via an interposer, you are talking about considerably increasing the cost of manufacture and the time to produce. AMD learned this with Vega.

Aside from a few professional scenarios, I don't really see how regular consumers would benefit from having both HBM and DDR. If you had such a problem with cache misses (which neither AMD nor Intel do) that you needed another layer of storage between the L4 and main memory, you'd be much wiser to increase the amount of cache your CPU has or tweak what it decides to store in cache. Cache is still vastly faster and lower latency than HBM. HBM has much more bandwidth than DDR4, but consumer systems don't really need more bandwidth right now. Heck, we are still using dual-channel memory, and you'd be hard pressed to find a game that actually benefits from quad-channel.

The thing with AMD's L3 Infinity Cache is that it fixes a downside of their choice of memory. It doesn't go searching for a solution to a problem that doesn't exist.
HBM integrated into a powerful APU would be helpful, but market forces keep that from happening, as people would rather upgrade memory and CPUs/APUs separately.
 
Joined
Oct 22, 2014
Messages
13,210 (3.83/day)
Location
Sunshine Coast
System Name Black Box
Processor Intel Xeon E3-1260L v5
Motherboard MSI E3 KRAIT Gaming v5
Cooling Tt tower + 120mm Tt fan
Memory G.Skill 16GB 3600 C18
Video Card(s) Asus GTX 970 Mini
Storage Kingston A2000 512Gb NVME
Display(s) AOC 24" Freesync 1m.s. 75Hz
Case Corsair 450D High Air Flow.
Audio Device(s) No need.
Power Supply FSP Aurum 650W
Mouse Yes
Keyboard Of course
Software W10 Pro 64 bit
Just because it can be supported doesn't mean it will be used, especially if Intel outsources chip production to partners making it optional.
 
Joined
Jan 8, 2017
Messages
8,863 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
I get that HBM has very nice upsides, but I am afraid that CPUs will soon come with embedded memory, meaning no upgrades and a premium charged for more memory.
Looks like HBM might be planned as a sort of L4 cache here, but still.

It's the only way to get around the ever-increasing gap between DRAM bandwidth and CPU throughput.
 

Tech00

New Member
Joined
Dec 13, 2020
Messages
6 (0.00/day)
It's the only way to get around the ever-increasing gap between DRAM bandwidth and CPU throughput.
Correct! Already the Skylake/Cascade Lake generation of server CPUs from Intel is bottlenecked not by the CPUs' processing capability but by the memory subsystem. The memory system cannot keep up with the cores and quickly becomes the bottleneck. The only way to improve significantly is to make the subsystem faster and lower latency, and a level 4 tier will help with some of that (DDR5 on its own is still not fast enough, but it will also help when combined with an HBM L4).
In other words: an HBM level 4 cache is the logical next step to feed the beast.
This will be expensive, though, so I think it is going to be high-end Xeon server and workstation only. Unless Intel has somehow figured out some new smart, cost-effective way to implement this... I can't see it, though...
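As a rough back-of-the-envelope sketch of that "feed the beast" gap (the eight-channel figure comes from the news post; the DDR5 speed, core count, and HBM numbers are placeholder assumptions, not confirmed Sapphire Rapids specs):

```c
#include <stdio.h>

int main(void)
{
    /* Assumed figures for illustration only. */
    double ddr5_channel_gbps = 8.0 * 4.8;   /* 64-bit DDR5-4800 channel ~= 38.4 GB/s   */
    int    channels          = 8;           /* eight-channel controller per the article */
    int    cores             = 48;          /* hypothetical core count                  */
    double hbm_stack_gbps    = 400.0;       /* hypothetical per-stack HBM bandwidth     */
    int    hbm_stacks        = 4;           /* hypothetical stack count                 */

    double ddr_total = ddr5_channel_gbps * channels;  /* ~307 GB/s  */
    double hbm_total = hbm_stack_gbps * hbm_stacks;   /* ~1600 GB/s */

    printf("DDR5 only      : ~%.0f GB/s total, ~%.1f GB/s per core\n",
           ddr_total, ddr_total / cores);
    printf("DDR5 + HBM tier: ~%.0f GB/s total, ~%.1f GB/s per core\n",
           ddr_total + hbm_total, (ddr_total + hbm_total) / cores);
    return 0;
}
```

Per-core bandwidth is what shrinks as core counts climb, and that is the gap an HBM tier would be aimed at.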
 
Joined
Jul 7, 2019
Messages
821 (0.48/day)
Frankly, I'm surprised AMD wasn't the first to push out the embedded-HBM concept with higher-end APUs, considering they've been trying to push HBM off and on. HBM would have been perfect for high-end APUs and would help fill in that memory bottleneck on Vega- and RDNA-based APUs. For that matter, I wonder if future AMD mobos might have 2GB or 4GB of HBM3 embedded on the mobo chipset, or even placed directly on the main lanes that connect the CPU to the GPU and the first NVMe drive, serving as a sort of supplementary "Infinity Cache" for the CPU and/or GPU as well as the NVMe drive. 2GB on a B-50 series and 4GB on an X-70 series could provide some benefit to iGPUs as well as dedicated GPUs, while also serving as extra cache for the CPU side should it need it more for certain tasks.
 
Joined
Jul 13, 2016
Messages
2,794 (0.99/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
HBM integrated into a powerful APU would be helpful, but market forces keep that from happening, as people would rather upgrade memory and CPUs/APUs separately.

If you did create an APU with HBM, it would be priced to the point where it'd make more sense just to add a dGPU. It might be useful for professional applications, but again, you'd need a significant amount of expensive HBM for those markets.

That's not a necessity, especially when 3D stacking is already here ~

You are replacing one expensive process with another. There are also a lot of other concerns with vertical stacking that have to be taken into consideration during design.

First, the order of the stack is important. The CPU die essentially has to be at the bottom of the stack (assuming you still have IO and CPU cores together), as the die on the bottom will have the lowest latencies. The more stacks, the higher the latency penalty for the dies at the top. At some point you'd certainly need to design an active interconnect so that the stacks can communicate efficiently as well; if you just run dumb wires, routing between the stacks is going to be suboptimal. The University of Toronto did a paper on the use of an active interposer for routing data between chiplets (same idea, only horizontal) and found that the more dies are used, the greater the impact an active interposer has. In principle, multi-chiplet designs (through 3D stacking or otherwise) stand to benefit massively as they increase in complexity. By benefit I mean erase the latency penalty and, under ideal conditions, beat out monolithic designs.

Second, there's heat. If you put HBM on top, you are putting a barrier to heat transfer over your CPU die: heat would first have to pass through the HBM to reach the IHS. Aside from the potential degradation of the HBM (which may be mitigable), you'd likely have to make performance compromises due to the thermal restrictions. If you are looking for maximum performance, going horizontal is far better. IMO 3D stacking is best used in conjunction with a horizontal interposer: you can split off low-power components like HBM and IO and keep high-power parts like the CPU dies unstacked, all while retaining maximum performance and thermals without a monstrous CPU footprint.

Last, any product has to be designed from the ground up for vertical stacking. Traces have to be made to properly connect the stacks and enable communication, and there are likely many other design considerations on top of that.
 
Joined
Apr 24, 2020
Messages
2,520 (1.75/day)
Personally, I wonder if the HBM in that case can't be used as an extra in-between step.

AMD's new GPUs have that Infinity Cache, which is basically super-fast memory, and then have standard GDDR6 next to that.
So why not have this HBM be the Intel version of that and still have memory next to it, just an extra step, the same way RAM is an in-between for CPU and storage?

Heck, maybe AMD in the future will have Infinity Cache > HBM > GDDR6.

Of course it "can" be used as an extra step, but I doubt it.

From a latency perspective, HBM has the same latency as any other DRAM (including DDR4), so you may win in bandwidth, but without a latency win... there's a huge chance you're just slowing things down. Xeon Phi had an HMC + DDR4 version (HMC was a stacked-RAM competitor to HBM), and that kind of architecture is really hard and non-obvious to optimize for. Latency-sensitive code would be better run out of DDR4 (which is cheaper, and therefore available in much larger capacities). Bandwidth-sensitive code would prefer HBM.

As a programmer, it's very non-obvious whether your code will be latency-sensitive or bandwidth-sensitive. As a systems engineer combining multiple pieces of code, it is even less obvious... so configuring such a system is just too complicated in the real world.
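A minimal sketch of why that is hard to eyeball: the two loops below touch comparable amounts of memory, but the pointer chase is bound almost entirely by DRAM latency (one dependent miss at a time), while the streaming sum is bound by bandwidth. This is a generic illustration, not tuned for any particular HBM or DDR platform.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M elements, well beyond any cache */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);  /* for the latency-bound pointer chase  */
    double *data = malloc(N * sizeof *data);  /* for the bandwidth-bound streaming sum */
    if (!next || !data) return 1;

    /* Build a random single cycle for the pointer chase (Sattolo's algorithm). */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    for (size_t i = 0; i < N; i++) data[i] = 1.0;

    clock_t t0 = clock();
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = next[p];       /* serial dependent loads: latency-bound */
    clock_t t1 = clock();
    double sum = 0.0;
    for (size_t i = 0; i < N; i++) sum += data[i];    /* sequential streaming: bandwidth-bound */
    clock_t t2 = clock();

    printf("pointer chase: %.2fs (ended at %zu)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, p);
    printf("streaming sum: %.2fs (sum = %.0f)\n",   (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(next); free(data);
    return 0;
}
```

On an HBM + DDR system the second loop is the kind of code that wants to live in HBM, while the first gains little from it; real applications mix both, which is what makes manual placement so awkward.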

----------

HBM-only would probably be the way to go, unless someone figures out how to solve this complexity issue (or gets better at predicting latency- vs. bandwidth-sensitive code).
 
Joined
Feb 3, 2017
Messages
3,475 (1.33/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
The very wide bus of HBM should allow high bandwidth without the latency losses incurred by data doubling and prefetching that affects latency on DDR4/5?
 
Joined
Apr 24, 2020
Messages
2,520 (1.75/day)
The very wide bus of HBM should allow high bandwidth without the latency losses incurred by data doubling and prefetching that affects latency on DDR4/5?

The reason prefetching, etc., exists is that most of the latency comes from the DRAM cell itself. It doesn't matter whether you're using HBM, DDR4, DDR5, or GDDR6X; they all use DRAM cells with significant amounts of latency.

If you do DDR4 -> HBM -> Cache, it means you're now incurring two DRAM latencies per read/write instead of one. A more reasonable architecture is DDR4 -> Cache plus HBM -> Cache, splitting the two up. However, that architecture is very difficult to program for. As such, the most reasonable setup in practice is HBM -> Cache (avoiding the use of DDR4/DDR5 entirely).
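A quick average-access-time sketch of that point, using placeholder latencies (the ~80 ns figures are generic DRAM assumptions, not measured numbers for any specific platform):

```c
#include <stdio.h>

/* Average access time for "HBM as a cache in front of DDR":
   AMAT = hit_rate * t_hbm + (1 - hit_rate) * (t_hbm + t_ddr)
   Latencies below are generic placeholders, not measured figures. */
int main(void)
{
    double t_hbm = 80.0;   /* ns, assumed HBM access latency   */
    double t_ddr = 80.0;   /* ns, assumed DDR4/5 access latency */

    for (double hit = 0.5; hit <= 0.96; hit += 0.15) {
        double amat = hit * t_hbm + (1.0 - hit) * (t_hbm + t_ddr);
        printf("HBM hit rate %.0f%% -> average DRAM-side latency ~%.0f ns "
               "(vs ~%.0f ns going straight to DDR)\n",
               hit * 100.0, amat, t_ddr);
    }
    return 0;
}
```

With HBM latency roughly equal to DDR latency, the extra tier can only add latency on misses; any payoff has to come from bandwidth, which is exactly the trade-off described above.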

Unless Intel wants another Xeon Phi I guess...
 