
Micron Ships HBM4 Samples: 12-Hi 36 GB Modules with 2 TB/s Bandwidth

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
3,136 (1.10/day)
Micron has reached a significant milestone with HBM4: the new stacks combine 12 DRAM dies (12-Hi) to provide 36 GB of capacity per package. According to company representatives, initial engineering samples are scheduled to ship to key partners in the coming weeks, paving the way for full production in early 2026. The HBM4 design relies on Micron's established 1β ("one-beta") process node for the DRAM tiles, in production since 2022, while the company prepares to introduce its EUV-enabled 1γ ("one-gamma") node for DDR5 later this year. By doubling the interface width from 1,024 to 2,048 bits per stack, each HBM4 device can sustain a memory bandwidth of 2 TB/s, alongside a 20% improvement in power efficiency over the existing HBM3E generation.
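As a rough sanity check of those figures (back-of-the-envelope arithmetic, not numbers from the article): peak per-stack bandwidth is simply interface width times per-pin data rate. A minimal Python sketch, assuming an HBM3E-class stack drives its 1,024-bit interface at roughly 9.2 Gb/s per pin, while the 2,048-bit HBM4 interface only needs about 7.8 Gb/s per pin to land at 2 TB/s:

```python
# Back-of-the-envelope peak-bandwidth arithmetic for one HBM stack.
# Per-pin data rates below are illustrative assumptions, not figures from the article.

def stack_bandwidth_gb_s(interface_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = interface width (bits) * per-pin rate (Gb/s) / 8 bits per byte."""
    return interface_bits * pin_rate_gbps / 8

print(stack_bandwidth_gb_s(1024, 9.2))  # HBM3E-class stack: ~1178 GB/s (~1.2 TB/s)
print(stack_bandwidth_gb_s(2048, 7.8))  # HBM4 sample as described: ~1997 GB/s (~2 TB/s)
```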

NVIDIA and AMD are expected to be early adopters of Micron's HBM4. NVIDIA plans to integrate these memory modules into its upcoming Rubin-Vera AI accelerators in the second half of 2026. AMD is anticipated to incorporate HBM4 into its next-generation Instinct MI400 series, with further information to be revealed at the company's Advancing AI 2025 conference. The increased capacity and bandwidth of HBM4 will address growing demands in generative AI, high-performance computing, and other data-intensive applications. Larger stack heights and expanded interface widths enable more efficient data movement, a critical factor in multi-chip configurations and memory-coherent interconnects. As Micron moves toward mass production of HBM4, the key open questions will be thermal performance and real-world results, which will determine how effectively this new memory standard can support the most demanding AI workloads.



View at TechPowerUp Main Site | Source
 
Place your bets, ladies and gentlemen: will we have high-end nVidia 6090 and/or AMD RTXx090 GPUs with 2 TB/s 36GB HBM4?
 
Place your bets, ladies and gentlemen: will we have high-end nVidia 6090 and/or AMD RTXx090 GPUs with 2 TB/s 36GB HBM4?
I doubt the HBM part.
But we're pretty close to 2TB/s already with the 5090, so the next consumer gen with better binned GDDR7 might manage to achieve that.
 
Place your bets, ladies and gentlemen: will we have high-end nVidia 6090 and/or AMD RTXx090 GPUs with 2 TB/s 36GB HBM4?
Well, you most certainly can have that......
For ~$15-20k, nottaproblemo, hahahahaha :D
 
I doubt the HBM part.
But we're pretty close to 2TB/s already with the 5090, so the next consumer gen with better binned GDDR7 might manage to achieve that.
GDDR7 requires 16 devices for that bandwidth whereas HBM4 can manage it with just one stack. Of course, HBM is far too expensive to be used in consumer GPUs.
 
Didn't the Radeon 6900 XT have some cache that was 1.5 TB/s five years ago? That wasn't HBM either.
 
Place your bets, ladies and gentlemen: will we have high-end nVidia 6090 and/or AMD RTXx090 GPUs with 2 TB/s 36GB HBM4?
Extremely low chances of that happening - especially with Nvidia. Slightly higher chance with AMD, since UDNA likely includes both G7 and HBM controllers. And not because of price - I think mostly because of supply: all HBM supply is going to data centers. Besides, GDDR7 just launched earlier this year and it's not maxed out yet. As much as I like HBM and its compact size, I have to be realistic on this.

G7 also has 3 GB modules now. Coupled with speeds approaching 40 Gbps, that would mean the following bandwidths at these bus widths:

96-bit: 460 GB/s. Likely 4x2GB for 8GB capacity on an entry-level card.
128-bit: 640 GB/s. Likely 6x2GB for 12GB capacity on a low-end card.
192-bit: 960 GB/s. Likely 8x2GB for 16GB capacity on a midrange card.
256-bit: 1.3 TB/s. Likely 6x3GB for 18GB capacity on an upper-midrange card.
320-bit: 1.6 TB/s. Likely 8x3GB for 24GB capacity on a high-end card.
352-bit: 1.8 TB/s. Likely 10x3GB for 30GB capacity on a high-end card.
384-bit: 2.0 TB/s. Likely 12x3GB for 36GB capacity on an enthusiast card.
512-bit: 2.5 TB/s. Likely 12x3GB for 36GB capacity on an enthusiast card.

G7 still needs multiples of 2 chips. At least that's the way I see it. I'm sure Nvidia will see fit to give us lower speeds and predominantly still use 2GB modules.
I included both 384-bit and 512-bit, but generally it has been an OR situation where only one of these has been used for the flagship. Hence the same capacity.
Obviously it's possible to do a clamshell 24x2GB=48GB or 24x3GB=72GB, but such capacities are not really needed for gaming GPUs.

Overall it's possible to match one stack of HBM4 in capacity and speed with a 12x3GB 384-bit G7 configuration. Obviously it will take up more space on the PCB and will likely consume twice as much power vs HBM4.
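As a quick sanity check of that equivalence (my own arithmetic, assuming one 32-bit GDDR7 device per channel and the ~40 Gbps per-pin figure from above):

```python
# Rough GDDR7 configuration arithmetic (illustrative; assumes one 32-bit device
# per 32 bits of bus width and ~40 Gb/s per pin).

def gddr7_config(bus_bits: int, module_gb: int, pin_rate_gbps: float = 40.0):
    devices = bus_bits // 32                       # 32-bit GDDR7 devices on the bus
    capacity_gb = devices * module_gb              # total VRAM without clamshell
    bandwidth_gb_s = bus_bits * pin_rate_gbps / 8  # peak bandwidth in GB/s
    return devices, capacity_gb, bandwidth_gb_s

# 384-bit bus with 3 GB modules: 12 devices, 36 GB, ~1.92 TB/s -- roughly one HBM4 stack
print(gddr7_config(384, 3))
```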
HBM is far too expensive to be used in consumer GPUs.
Which version? AMD was able to make a profit on a 16 GB HBM2 card six years ago. I don't see why the same couldn't be done today with HBM3 or HBM3E on a high-end card. HBM4 will obviously be reserved for data center monsters. Also, using HBM would reduce PCB complexity and lower cost because it's already on an interposer. It would also lower the card's power consumption, either in total or by leaving more power budget for the GPU itself.
 
GDDR7 requires 16 devices for that bandwidth whereas HBM4 can manage it with just one stack. Of course, HBM is far too expensive to be used in consumer GPUs.
Yeah, on a 512-bit bus. But I don't think we'd be seeing a single HBM stack on an entry/mid-level GPU anyway, so the comparison still stands.
I may be wrong on that, but I believe that 16x GDDR7 modules should be cheaper than an HBM stack, especially when we include the production cost.

96-bit: 460 GB/s. Likely 4x2GB for 8GB capacity on an entry-level card.
128-bit: 640 GB/s. Likely 6x2GB for 12GB capacity on a low-end card.
192-bit: 960 GB/s. Likely 8x2GB for 16GB capacity on a midrange card.
256-bit: 1.3 TB/s. Likely 6x3GB for 18GB capacity on an upper-midrange card.
320-bit: 1.6 TB/s. Likely 8x3GB for 24GB capacity on a high-end card.
352-bit: 1.8 TB/s. Likely 10x3GB for 30GB capacity on a high-end card.
384-bit: 2.0 TB/s. Likely 12x3GB for 36GB capacity on an enthusiast card.
512-bit: 2.5 TB/s. Likely 12x3GB for 36GB capacity on an enthusiast card.
Minor nit, but I guess you got some numbers wrong.
As an example, 320-bit would be 10 channels, so either 10x 2GB or 3GB modules for 20 or 30GB in total, assuming those 40 Gbps modules.
512-bit would be 16 channels, so 16x 2GB or 3GB modules for 32 or 48GB in total (not accounting for clamshell).

G7 still needs multiples of 2 chips.
It doesn't, that's a matter of how many controllers you have. Nothing stops you from having 5 controllers for a 160-bit bus, or 11 controllers for a 352-bit bus.
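A tiny illustration of that point (my own sketch, assuming one 32-bit device per channel and 2 GB or 3 GB modules):

```python
# Device/controller count is just bus width divided by the 32-bit channel width;
# nothing forces an even number of devices (illustrative arithmetic only).

for bus_bits in (160, 320, 352, 512):
    devices = bus_bits // 32
    print(f"{bus_bits}-bit bus: {devices} devices -> "
          f"{devices * 2} GB with 2 GB modules, {devices * 3} GB with 3 GB modules")
```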
 
It doesn't, that's a matter of how many controllers you have. Nothing stops you from having 5 controllers for a 160-bit bus, or 11 controllers for a 352-bit bus.
It's doable, but not a good idea for performance and complexity reasons. Look how the GTX 970 turned out.
 
It's doable, but not a good idea for performance and complexity reasons. Look how the GTX 970 turned out.
The 970 had a different issue with how the controllers were wired up: it had a proper 256-bit bus with all of its 4x 64-bit controllers in place. The issue was how one of those controllers was connected to the L2 cache/crossbar (one ROP/L2 partition was disabled), not the bus width itself.
There are no complexity or performance issues related to having an odd number of controllers.
The 1080 Ti and 2080 Ti are great counter-examples with their 352-bit buses, along with the RX 6700 and its 160-bit bus.
 
Yeah, on a 512-bit bus. But I don't think we'd be seeing a single HBM stack on an entry/mid-level GPU anyway, so the comparison still stands.
I may be wrong on that, but I believe that 16x GDDR7 modules should be cheaper than an HBM stack, especially when we include the production cost.

...
Given previous estimates for the cost of HBM, I wouldn't be surprised if one stack of HBM was significantly more expensive than 32 GB of GDDR7. As @Tomorrow pointed out, an additional benefit of HBM is reduced chip area devoted to memory PHYs. This would allow an even larger GPU or a slightly smaller GPU with reduced TDP due to the greater power efficiency of HBM when compared to GDDR.
...
Which version? AMD was able to make a profit on a 16 GB HBM2 card six years ago. I don't see why the same couldn't be done today with HBM3 or HBM3E on a high-end card. HBM4 will obviously be reserved for data center monsters. Also, using HBM would reduce PCB complexity and lower cost because it's already on an interposer. It would also lower the card's power consumption, either in total or by leaving more power budget for the GPU itself.
A product like the 5090 could certainly make money using HBM, but at least in 2023, CoWoS, the packaging required for HBM, was a bottleneck. Given that bottleneck, it makes sense to utilize HBM only for the most expensive data center products such as Nvidia's B200 and AMD's MI325X.
 