Tuesday, November 29th 2022

Samsung Develops GDDR6W Memory Standard: Double the Bandwidth and Density of GDDR6 Through Packaging Innovations

Press Release by

Nov 29th, 2022 00:53

As advanced graphics and display technologies develop, they are blurring the lines between the metaverse and our everyday experience. Much of this important shift is being made possible by the advancement of memory solutions designed for graphics products. One of the biggest challenges for improving virtual reality is taking the complexities of real-world objects and environments and recreating them in a virtual space. Doing so requires massive memory and increased computing power. At the same time, the benefits of creating a more true-to-life metaverse will be far reaching, including real-life simulations of complicated scenarios and more, sparking innovation across a number of industries.

This is the central idea behind one of the most popular concepts in virtual reality: the digital twin. A digital twin is a virtual representation of an object or space. Updated in real time in accordance with the actual environment, a digital twin spans the lifecycle of its source and uses simulation, machine learning and reasoning to help decision-making. While until recently this was not a feasible proposition due to limitations on data processing and transfer, digital twins are now gaining traction thanks to the availability of high-bandwidth technologies.

Like other technology sectors, the gaming industry thrives on constant innovation, with new updates in speed and performance driving the market forward year after year. Thanks to the development of technologies like ray tracing in 3D rendering, which traces the reflection of light in a given scene, graphics in high-end AAA gaming are becoming hyper-realistic and increasingly immersive.

Ray tracing collects light information to determine the color of each pixel through real-time calculation. This kind of calculation requires near-simultaneous computation of substantial amounts of data: between 60 and 140 pages' worth for one second of an in-game scene. What's more, display quality is rising fast, with resolutions rapidly transitioning from 4K to the 8K standard and frame buffers growing to twice the size of existing ones in response. That's why high capacity and high bandwidth are essential to meeting the growing memory demand as games continue to develop.

Developing 'GDDR6W' Graphics Memory, with Doubled Capacity and Performance Based on the Cutting-edge Fan-Out Wafer-Level Packaging (FOWLP) Technology
High performance, high capacity and high bandwidth memory solutions are helping bring the virtual realm to a closer match with reality. To meet this growing market demand, Samsung Electronics has developed GDDR6W (x64): the industry's first next-generation graphics DRAM technology.

GDDR6W builds on Samsung's GDDR6 (x32) products by introducing a Fan-Out Wafer-Level Packaging (FOWLP) technology, drastically increasing memory bandwidth and capacity.

Since its launch, GDDR6 has already seen significant improvements. Last July, Samsung developed a 24 Gbps GDDR6 memory, the industry's fastest graphics DRAM. GDDR6W doubles that bandwidth (performance) and capacity while remaining identical in size to GDDR6. Thanks to the unchanged footprint, and with the use of FOWLP construction and stacking technology, the new memory chips can easily be put into the same production processes customers have used for GDDR6, cutting manufacturing time and costs.

As shown in the picture below, since the package can hold twice as many memory chips in an identical footprint, graphics DRAM capacity has increased from 16Gb to 32Gb, while bandwidth has doubled along with the number of I/Os, from 32 to 64. In other words, the area required for memory has been reduced 50% compared to previous models.

Generally, the size of a package increases as more chips are stacked. But there are physical factors that limit the maximum height of a package. What's more, though stacking chips increases capacity, there is a trade-off in heat dissipation and performance. In order to overcome these trade-offs, we've applied our FOWLP technology to GDDR6W.

FOWLP technology directly mounts memory die on a silicon wafer, instead of a PCB. In doing so, RDL (Re-distribution layer) technology is applied, enabling much finer wiring patterns. Additionally, as there's no PCB involved, it reduces the thickness of the package and improves heat dissipation.

The height of the FOWLP-based GDDR6W is 0.7 mm - 36% slimmer than the previous package with a height of 1.1 mm. And despite the chip being multi-layered, it still offers the same thermal properties and performance as the existing GDDR6. Unlike GDDR6, however, the bandwidth of the FOWLP-based GDDR6W can be doubled thanks to the expanded I/O per single package.

Packaging refers to the process of cutting fabricated wafers into semiconductor shapes or connecting wires. In the industry, this is known as a 'back-end process.' While the semiconductor industry has continuously developed towards scaling circuits as much as possible during the front-end process, packaging technology is becoming more and more important as the industry approaches the physical limits of chip sizes. That's why Samsung is using its 3D IC package technology in GDDR6W, creating a single package by stacking a variety of chips in a wafer state. This is one of many innovations planned to make advanced packaging for GDDR6W faster and more efficient.

The newly developed GDDR6W technology can support HBM-level bandwidth at a system level. HBM2E has a system-level bandwidth of 1.6 TB/s based on 4K system-level I/O and a 3.2 Gbps transmission rate per pin. GDDR6W, on the other hand, can produce a bandwidth of 1.4 TB/s based on 512 system-level I/O and a transmission rate of 22 Gbps per pin. Furthermore, since GDDR6W reduces the number of I/O to about 1/8 compared with using HBM2E, it removes the necessity of using microbumps. That makes it more cost-effective without the need for an interposer layer.
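The two system-level figures above are consistent with multiplying the I/O count by the per-pin rate and dividing by eight bits per byte. A minimal Python sketch of that arithmetic follows; the function name `system_bandwidth_gbs` is purely illustrative, and "4K" I/O is assumed here to mean 4,096 pins.

```python
def system_bandwidth_gbs(io_count: int, gbps_per_pin: float) -> float:
    """Aggregate bandwidth in GB/s: pins x per-pin rate (Gbit/s) / 8 bits per byte."""
    return io_count * gbps_per_pin / 8


# Assumption: "4K system-level I/O" is read as 4,096 pins.
hbm2e = system_bandwidth_gbs(4096, 3.2)
gddr6w = system_bandwidth_gbs(512, 22.0)

print(f"HBM2E:  {hbm2e / 1000:.2f} TB/s")
print(f"GDDR6W: {gddr6w / 1000:.2f} TB/s")
print(f"GDDR6W uses 1/{4096 // 512} of the HBM2E I/O count")
```

Under these assumptions the results round to the quoted 1.6 TB/s and 1.4 TB/s, and the I/O ratio matches the "about 1/8" stated above.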

"By applying an advanced packaging technology to GDDR6, GDDR6W delivers twice the memory capacity and performance of similar-sized packages," said CheolMin Park, Vice President of New Business Planning, Samsung Electronics Memory Business. "With GDDR6W, we're able to foster differentiated memory products that can satisfy various customer needs - a major step towards securing our leadership in the market."

Samsung Electronics completed the JEDEC standardization for GDDR6W products in the second quarter of this year. It has also announced that it will expand the application of GDDR6W to small form factor devices such as notebooks as well as new high-performance accelerators used for AI and HPC applications, through cooperation with its GPU partners.

30 Comments on Samsung Develops GDDR6W Memory Standard: Double the Bandwidth and Density of GDDR6 Through Packaging Innovations

#26

Punkenjoy

WhoDecidedThat: I wasn't talking about RDNA3 using GDDR6W. I'm sorry if I gave that impression. I was just talking bandwidth in general.

I agree that each RDNA3 MCD has a 64-bit wide memory bus and it's difficult to fit more than 6 on a single package. However, a future RDNA4 MCD could have a 96-bit/128-bit wide memory bus. That's what GDDR6W is targeting anyway.

Moar bandwidth comes at an increased cost anyway. That's expected.

On another note, Nvidia would have benefitted a lot from something like this in 2020 with Ampere (in terms of VRAM size).

RTX 3080 came with 320-bit 19 Gbps = 760 GB/sec bandwidth but only had 10 GB capacity.

Considering GDDR6 was at 14 Gbps in Ampere, 448-bit 14 Gbps = 784 GB/sec bandwidth with 14 GB VRAM size.

First, from AMD's comments, I wouldn't be surprised at all if Navi 4x uses the same MCD as Navi 3x (a bit like on Ryzen, where they reuse the same I/O die for multiple generations).

You can watch the Gamers Nexus video on chiplets; AMD explains it pretty well. Designing the memory controller and other stuff is hard, takes a lot of time and is boring for not much gain anyway. The fact that they will be able to reuse it for the next gen will probably allow them to ship it earlier. If those MCDs already support GDDR7 and cache stacking, I don't see why they would update them for next gen.

Again, a misconception about those chips is that they increase the bandwidth. They don't really. They increase the bandwidth per chip, but not per bus width. You could just put double the number of chips on your board (like on both sides) and you would get the same bandwidth. They are really for packaging reasons more than anything. They could be really useful for putting 256-bit and 384-bit GPUs into mobile. But it will still be cost prohibitive to do it on a larger bus even if you halve the number of chips. At that point HBM starts to make sense.
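As a rough sketch of that per-chip versus per-bus point: the 256-bit bus and 22 Gbps pin rate below are assumed values picked only for illustration, and `bus_config` is a hypothetical helper. The same bus filled with x32 GDDR6 chips or x64 GDDR6W chips ends up at the same total bandwidth, just with half as many packages.

```python
def bus_config(bus_width_bits: int, chip_io_bits: int, gbps_per_pin: float):
    """Return (chip count, GB/s per chip, total GB/s) for a fully populated bus."""
    chips = bus_width_bits // chip_io_bits
    per_chip = chip_io_bits * gbps_per_pin / 8  # 8 bits per byte
    return chips, per_chip, chips * per_chip


# Assumed example: 256-bit bus at 22 Gbps per pin.
gddr6 = bus_config(256, 32, 22.0)   # x32 GDDR6 packages
gddr6w = bus_config(256, 64, 22.0)  # x64 GDDR6W packages

for name, (chips, per_chip, total) in (("GDDR6", gddr6), ("GDDR6W", gddr6w)):
    print(f"{name}: {chips} chips x {per_chip:.0f} GB/s = {total:.0f} GB/s")
```

Per-chip bandwidth doubles with the x64 package, but the bus-level total is unchanged because it depends only on bus width and pin rate.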

Also, they compare it there with HBM2E, but HBM3 is available and has way more bandwidth.

They could do a super large bus on professional high-end GPUs, but it's cheaper and better to do HBM at that point.

Also, GDDR7 is around the corner, with speeds going up to 36 Gbps.

#27

WhoDecidedThat

Punkenjoy: They are really for packaging reasons more than anything.

I understand. I thought more about it and your point makes sense.

If you're knowledgeable enough, I have a question. AMD is already using advanced packaging technology in RDNA3, right... so why do you think they did not go for an HBM solution?

#28

Punkenjoy

WhoDecidedThat: I understand. I thought more about it and your point makes sense.

If you're knowledgeable enough, I have a question. AMD is already using advanced packaging technology in RDNA3, right... so instead of 6 MCDs with 96 MB cache, why not go for an HBM solution?

HBM uses an interposer. This is a large piece of silicon and is quite expensive to produce. Think of a large chip under the main CCD + the HBM.

AMD on RDNA3 uses an organic substrate like a traditional GPU chip (think of it like some kind of PCB), and they were able to really shrink the traces on it to allow all the connections. This is way cheaper, and it is one of the enablers for chiplet GPUs, since GPUs require way more connections than CPUs.

#29

Wirko

RDNA3 is built using something more advanced than a usual substrate: the fan-out RDL. I commented on it here:
www.techpowerup.com/forums/threads/amd-explains-the-economics-behind-chiplets-for-gpus.301071/post-4884396
I don't know if it's good enough for routing the wires to an HBM stack, though. But AMD also uses some kind of buried silicon bridges (could be very similar to EMIB) for the HBM stacks on their Instinct GPUs.

At this point it's very hard to say which applications are better suited for HBM and which are better for GDDR. Both are evolving, but packaging technology is evolving even faster. HBM requires giant memory controllers for multiple 1024-bit wide buses, meaning a lot of silicon. Also, bridges and similar stuff apparently take up a considerable amount of space on the chips that they connect.

#30

Punkenjoy

You are correct, but I tried to keep the explanation simple. It's still organic versus silicon.

I still think HBM requires a silicon substrate. We have to remember that those MCDs are connected by multiple Infinity Fabric links, and those work by serializing the data to have fewer traces.

But one possibility in the future could be that instead of stacking cache, we stack HBM. This way, the HBM would be on top of silicon and you could still use Infinity Fabric and an organic substrate to connect to the GCD.

But first, let's see the first chiplet GPUs release and see how they perform.
