Wednesday, June 11th 2025

Micron Ships HBM4 Samples: 12-Hi 36 GB Modules with 2 TB/s Bandwidth
Micron has reached a significant milestone with its HBM4 architecture, which stacks 12 DRAM dies (12-Hi) to provide 36 GB of capacity per package. According to company representatives, initial engineering samples are scheduled to ship to key partners in the coming weeks, paving the way for full production in early 2026. The HBM4 design relies on Micron's established 1β ("one-beta") process node for the DRAM dies, in production since 2022, while the company prepares to introduce its EUV-enabled 1γ ("one-gamma") node later this year for DDR5. By doubling the interface width from 1,024 to 2,048 bits per stack, each HBM4 package can achieve a sustained memory bandwidth of 2 TB/s, alongside a 20% improvement in power efficiency over the existing HBM3E standard.
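As a rough sanity check on those figures, peak per-stack bandwidth follows directly from interface width and per-pin data rate. The sketch below assumes per-pin rates of roughly 9.6 Gbps for HBM3E and 7.8 Gbps for HBM4; Micron has not confirmed the latter figure here.

```python
# Back-of-the-envelope check of the quoted bandwidth figures.
# Peak bandwidth (GB/s) = interface width (bits) x per-pin rate (Gbps) / 8.

def stack_bandwidth_gbps(width_bits: int, pin_rate_gbps: float) -> float:
    return width_bits * pin_rate_gbps / 8

# HBM3E: 1,024-bit interface; ~9.6 Gbps per pin is an assumed figure.
print(stack_bandwidth_gbps(1024, 9.6))  # ~1228.8 GB/s, i.e. ~1.2 TB/s

# HBM4: doubled 2,048-bit interface; ~7.8 Gbps per pin (assumed) is
# already enough to clear the 2 TB/s quoted for Micron's samples.
print(stack_bandwidth_gbps(2048, 7.8))  # ~1996.8 GB/s, i.e. ~2 TB/s
```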
NVIDIA and AMD are expected to be early adopters of Micron's HBM4. NVIDIA plans to integrate these memory modules into its upcoming Rubin-Vera AI accelerators in the second half of 2026. AMD is anticipated to incorporate HBM4 into its next-generation Instinct MI400 series, with further information to be revealed at the company's Advancing AI 2025 conference. The increased capacity and bandwidth of HBM4 will address growing demands in generative AI, high-performance computing, and other data-intensive applications. Larger stack heights and expanded interface widths enable more efficient data movement, a critical factor in multi-chip configurations and memory-coherent interconnects. As Micron moves toward mass production of HBM4, the major hurdles will be thermal management and real-world performance validation, which will determine how effectively the new memory standard supports the most demanding AI workloads.
Source: Micron
11 Comments on Micron Ships HBM4 Samples: 12-Hi 36 GB Modules with 2 TB/s Bandwidth
But we're pretty close to 2 TB/s already with the 5090, so the next consumer generation with better-binned GDDR7 might manage to achieve that.
For ~$15-20k, nottaproblemo, hahahahaha :D
G7 also has 3GB modules now. Coupled with speeds approaching 40 Gbps, that would mean the following bandwidth at these bus widths (a quick script reproducing the math follows the list):
96-bit: 480 GB/s. Likely 4x2GB for 8GB capacity on an entry-level card.
128-bit: 640 GB/s. Likely 6x2GB for 12GB capacity on a low-end card.
192-bit: 960 GB/s. Likely 8x2GB for 16GB capacity on a midrange card.
256-bit: 1.3 TB/s. Likely 6x3GB for 18GB capacity on an upper-midrange card.
320-bit: 1.6 TB/s. Likely 8x3GB for 24GB capacity on a high-end card.
352-bit: 1.8 TB/s. Likely 10x3GB for 30GB capacity on a high-end card.
384-bit: 2.0 TB/s. Likely 12x3GB for 36GB capacity on an enthusiast card.
512-bit: 2.5 TB/s. Likely 12x3GB for 36GB capacity on an enthusiast card.
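For reference, here's a minimal Python sketch of the arithmetic behind the bandwidth column: GB/s is just bus width in bits times the per-pin rate divided by 8, with the 40 Gbps rate being the assumption stated above.

```python
# Bandwidth check: GB/s = bus width (bits) x per-pin rate (Gbps) / 8.
# The 40 Gbps per-pin rate is an assumption; shipping GDDR7 runs slower.

PIN_RATE_GBPS = 40

for bus_bits in (96, 128, 192, 256, 320, 352, 384, 512):
    gbs = bus_bits * PIN_RATE_GBPS / 8  # GB/s
    print(f"{bus_bits}-bit: {gbs:,.0f} GB/s ({gbs / 1000:.2f} TB/s)")
```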
G7 still needs multiples of 2 chips, at least that's the way I see it. I'm sure Nvidia will see fit to give us lower speeds and predominantly still use 2GB modules.
I included both 384-bit and 512-bit, but generally it has been an either/or situation where only one of them is used for the flagship, hence the same capacity.
Obviously it's possible to do a clamshell 24x2GB = 48GB or 24x3GB = 72GB, but such capacities are not really needed for gaming GPUs.
Overall it's possible to match one stack of HBM4 in capacity and speed with a 12x3GB 384-bit G7 configuration. Obviously it will take up more space on the PCB and likely consume twice as much power versus HBM4.

Which version? AMD was able to make a profit on a 16GB HBM2 card six years ago; I don't believe no one can do the same today with HBM3 or HBM3E on a high-end card. HBM4 will obviously be reserved for data center monsters. Using HBM would also reduce PCB complexity and lower the cost, because the memory already sits on an interposer. It would also lower the power consumption of the card, either in total or by freeing up more power for the GPU itself.
I may be wrong on that, but I believe that 16x GDDR7 modules should be cheaper than an HBM stack, especially when we include the production cost.

Minor nit, but I guess you got some numbers wrong.
As an example, 320-bit would be 10 channels, so either 10x 2GB or 3GB modules for 20 or 30GB in total, assuming those 40 Gbps modules. 512-bit would be 16 channels, so 16x 2GB or 3GB modules for 32 or 48GB in total (not accounting for clamshell).

It doesn't; that's a matter of how many memory controllers you have. Nothing stops you from having 5 controllers for a 160-bit bus, or 11 controllers for a 352-bit bus.
There are no complexity or performance issues related to having an odd number of controllers.
The 1080 Ti and 2080 Ti are great counter-examples with their 352-bit buses, along with the RX 6700 and its 160-bit bus.
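To make the point concrete, here's a tiny sketch mapping those buses to channel counts, assuming one 32-bit GDDR device per channel (bus widths are from public spec sheets):

```python
# Channel count is bus width / 32 for GDDR; odd counts ship just fine.

CHANNEL_BITS = 32  # one GDDR device per 32-bit channel

for name, bus_bits in [("GTX 1080 Ti", 352),
                       ("RTX 2080 Ti", 352),
                       ("RX 6700", 160)]:
    print(f"{name}: {bus_bits}-bit bus = {bus_bits // CHANNEL_BITS} channels")
```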