Monday, April 3rd 2023

AMD and JEDEC Create DDR5 MRDIMMs with 17,600 MT/s Speeds

AMD and JEDEC are collaborating to create a new industry standard for DDR5 memory called MRDIMMs (multi-ranked buffered DIMMs). The constant need for more bandwidth in server systems is a problem with no easy solution. Adding more memory modules is difficult, as motherboards can only grow so large, and on-package solutions like HBM are expensive and can only scale to a limited capacity. JEDEC's engineers, working with AMD, have therefore drawn up a new standard that aims to solve this challenge with MRDIMM technology. The concept of MRDIMM is, on paper, straightforward. It combines two DDR5 DIMMs on a single module to effectively double the bandwidth. Specifically, if you take two DDR5 DIMMs running at 4,400 MT/s and combine them into a single module, you get 8,800 MT/s from that module. To make this work, a special data mux (buffer) on the module takes the two Double Data Rate (DDR) interfaces and presents them to the host as a single Quad Data Rate (QDR) interface.
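As a rough back-of-the-envelope illustration of the mux idea (a minimal Python sketch, not a hardware model; the rank and beat labels are purely illustrative), the buffer alternates data beats from two ranks so the host-facing interface sees twice the per-rank data rate:

# Illustrative only: the MRDIMM data buffer alternates data beats from two
# DDR5 ranks, so the host-side interface sees twice the per-rank rate
# (e.g. 2 x 4,400 MT/s -> 8,800 MT/s).
from itertools import chain

def rank_beats(rank_id, count):
    # data beats produced by one rank in a given time window
    return [f"rank{rank_id}-beat{i}" for i in range(count)]

def mux_interleave(a, b):
    # the buffer alternates beats from the two ranks onto one host interface
    return list(chain.from_iterable(zip(a, b)))

host_side = mux_interleave(rank_beats(0, 4), rank_beats(1, 4))
print(host_side)            # twice as many beats per window as either rank alone
print(2 * 4_400, "MT/s")    # 8800 MT/s effective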

The design also allows simultaneous access to both ranks of memory, thanks to the added mux. First-generation MRDIMMs can reach speeds of up to 8,800 MT/s, while second- and third-generation modules are planned to go to 12,800 MT/s and 17,600 MT/s, respectively. Third-generation MRDIMMs aren't expected until after 2030, so the project still has a long road ahead. Intel, meanwhile, has a competing solution called Multiplexer Combined Ranks DIMM (MCRDIMM) that takes a similar approach. However, Intel's technology is expected to see the light of day as early as 2024/2025, with the Granite Rapids generation of servers a likely first candidate for it. SK Hynix already makes MCRDIMMs, and you can see a demonstration of the approach below.
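For a sense of scale, the theoretical peak bandwidth behind those transfer rates works out as follows (a rough sketch only, assuming the standard 64-bit, i.e. 8-byte, DDR5 module data width and ignoring ECC and protocol overhead):

# Theoretical peak bandwidth per module = transfer rate x bytes per transfer
BYTES_PER_TRANSFER = 8  # 64-bit DDR5 module data bus (ECC bits excluded)

for gen, mts in (("Gen 1", 8_800), ("Gen 2", 12_800), ("Gen 3", 17_600)):
    gb_per_s = mts * 1_000_000 * BYTES_PER_TRANSFER / 1e9
    print(f"{gen}: {mts} MT/s ~ {gb_per_s:.1f} GB/s per module")
# Gen 1: ~70.4 GB/s, Gen 2: ~102.4 GB/s, Gen 3: ~140.8 GB/s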
Sources: Robert Hormuth, via Tom's Hardware

24 Comments on AMD and JEDEC Create DDR5 MRDIMMs with 17,600 MT/s Speeds

#1
Crackong
So quad channels in dual physical slots
Posted on Reply
#2
hs4
Last time, Hynix-Renesas-Intel announced MCR DIMMs, and this looks similar to that, but are they compatible?
Posted on Reply
#3
LabRat 891
So.... AMD worked w/ JEDEC to make what is essentially: RAM RAID0
Cool.
Posted on Reply
#4
usiname
Is it only for the server/HEDT platforms or is it possible to be implemented for the regular users?
Posted on Reply
#5
TumbleGeorge
I think that by 2025, FB-DIMMs will also have support for DDR5 at around 8400-9000, and will be faster in effective usage because all of the RAM modules work in parallel.
Posted on Reply
#6
Minus Infinity
Apparently AMD has almost certified DDR5-7200 for Epyc Turin, and one would expect Zen 5 desktop will easily support such speeds with multiple DIMMs.
Posted on Reply
#7
Wirko
LabRat 891: So.... AMD worked w/ JEDEC to make what is essentially: RAM RAID0
Cool.
Redundant array of *inexpensive* ... huh?

It's curious though that AMD did collaborate with JEDEC and Intel apparently didn't, or at least did not succeed. Were they too late, is their MCR solution technically inferior, or do they just love proprietary tech?
Posted on Reply
#8
TumbleGeorge
o_O Hard to imagine DDR5 7200@12 channels! It will probably happen, but what RAM speed is that? Terrible!
Posted on Reply
#9
Wirko
Crackong: So quad channels in dual physical slots
24 channels in 12 physical slots is probably a better description. These things are aimed at servers with many, many cores per processor; high RAM bandwidth is valuable there, but high RAM density is at least equally valuable, and this multiplexed memory can achieve both.

Also, in servers, overvolting is out of the question, so individual DRAM dies don't have much potential to go over 4800 MT/s. You need multiplexing to arrive at 8800. One and a half volts is for people who just want to set records and don't care how long RAM (or IMC) will live.
Posted on Reply
#10
LabRat 891
Wirko: Redundant array of *inexpensive* ... huh?

It's curious though that AMD did collaborate with JEDEC and Intel apparently didn't, or at least did not succeed. Were they too late, is their MCR solution technically inferior, or *do they just love proprietary tech?*
That acronym has changed over the years: Redundant Array of Independent Disks
Both are generally accepted (especially w/ those aware that RAID was originally for $/MB efficiency)
The concept of MRDIMM is, on paper, straightforward. It combines two DDR5 DIMMs on a single module to effectively double the bandwidth.
It's only similar in concept, not execution.
While it is effectively 2 RAM DIMMs glued together, they are in an Integrated construction, not Independent.

So... it's still RAID,
just not RAInexpensiveD,
or RAIndependentD
Aren't acronyms fun?! :p


*Yes. I would bet on it being that. *
[looks at Optane NVDIMMs]
Posted on Reply
#11
bonehead123
usiname: possible to be implemented for the regular users?
It is possible, but only if you have ~$1000 to spend on each stick, hehehe :)

But seriously, usually, stuff that starts off as "server" parts will eventually make its way down to us little people, but it may take a while, since this would probably require mobo mfgrs to implement the changes needed to use these sticks to their full potential, and perhaps a tweak or two by M$, depending on how the sticks are seen/utilized by the OS...
Posted on Reply
#12
Punkenjoy
I hope there won't be memory just for Intel and memory just for AMD. On desktop at least. It would be less bad on the server market.

But that is interesting. It would mean that with 2 DIMMs and 2 channels on the mainboard, you would have 4 subchannels per DIMM for a total of 8.

That brings things closer to where they used to be. For a very long time, there was at least 1 channel per core (since CPUs were single core). Now, on a CPU like a 5900X, it's 0.125 channels per core, or 0.25 on the 7950X with DDR5. This would bring things back to at least 0.5 channels per core (and that still omits SMT, which destroys that ratio too).

Among many other things, this is one of the reasons why memory optimizations take such a big place today.

In the real world, you have to wait for a busy channel to be free again before sending another command. This adds latency. If you can do your operation on a free channel, you not only get more bandwidth but also less latency.

We will see.

Again, I just hope we won't have 2 standards on desktop/laptop.

Also, I wonder if that could mean a shorter life for AM5 if this needs a new socket.
Posted on Reply
#13
dragontamer5788
usiname: Is it only for the server/HEDT platforms or is it possible to be implemented for the regular users?
RDIMMs are already server-only (I don't think Threadripper Pro even takes them).

MRDIMMs, building on top of registered DIMMs, will be "even more server exclusive" IMO.
Posted on Reply
#14
BoboOOZ
Punkenjoy: I hope there won't be memory just for Intel and memory just for AMD. On desktop at least. It would be less bad on the server market.

But that is interesting. It would mean that with 2 DIMMs and 2 channels on the mainboard, you would have 4 subchannels per DIMM for a total of 8.

That brings things closer to where they used to be. For a very long time, there was at least 1 channel per core (since CPUs were single core). Now, on a CPU like a 5900X, it's 0.125 channels per core, or 0.25 on the 7950X with DDR5. This would bring things back to at least 0.5 channels per core (and that still omits SMT, which destroys that ratio too).

Among many other things, this is one of the reasons why memory optimizations take such a big place today.

In the real world, you have to wait for a busy channel to be free again before sending another command. This adds latency. If you can do your operation on a free channel, you not only get more bandwidth but also less latency.

We will see.

Again, I just hope we won't have 2 standards on desktop/laptop.

Also, I wonder if that could mean a shorter life for AM5 if this needs a new socket.
Luckily, on desktop they can decrease latency by just sticking a big slab of 3DVCache on the CPU ;)
Posted on Reply
#15
Wirko
Punkenjoy: That brings things closer to where they used to be. For a very long time, there was at least 1 channel per core (since CPUs were single core). Now, on a CPU like a 5900X, it's 0.125 channels per core, or 0.25 on the 7950X with DDR5. This would bring things back to at least 0.5 channels per core (and that still omits SMT, which destroys that ratio too).
Why do you think the number of channels is so important? The combined bandwidth certainly is, that's what you should calculate per core. But can there be a big advantage of two independent 32-bit channels (in DDR5) versus one 64-bit (which in DDR5 terms means that the two subchannels aren't independent, they both receive and execute the same R/W requests at all times)?
I'm not even sure that Core and Ryzen CPUs exploit the granularity of subchannels.
Posted on Reply
#16
Punkenjoy
Wirko: Why do you think the number of channels is so important? The combined bandwidth certainly is, that's what you should calculate per core. But can there be a big advantage of two independent 32-bit channels (in DDR5) versus one 64-bit (which in DDR5 terms means that the two subchannels aren't independent, they both receive and execute the same R/W requests at all times)?
I'm not even sure that Core and Ryzen CPUs exploit the granularity of subchannels.
Memory bandwidth and latency are already significant on their own; a trip to memory is very costly in CPU cycles. If on top of that you have to wait for the channel to be free, you waste even more CPU cycles. If your memory request takes 60 ns but there are 10 queued before it, well, it's going to take 600 ns.

So why is the core-to-channel ratio important? Because a core normally works in series and initiates memory ops in sequence (a huge simplification, but let's keep it simple). In a 1-to-1 scenario, the memory should have a small queue or none at all, as it could fill memory requests as they come.

But in a scenario where you have 16 cores with SMT (so 32 threads) working, those 32 threads generate memory ops that can only be sent to 2 or 4 subchannels. So there will be queues for sure, and you will lose precious CPU cycles.

Prefetching to cache might alleviate this up to a point by starting the memory read before you actually need the data and loading it into the cache. But there is no such saving for writes (though usually, most memory operations are reads).

And that works for L1 and L2. L3 is a victim cache on Intel and AMD, meaning it isn't smart and will only contain data that got evicted from L2. That can be good if you reuse the same data very frequently, but in many scenarios it won't help at all.

If you have more channels, you get not only more bandwidth but also more queues to spread the load and reduce overall memory latency. Bandwidth and the number of memory IOPs grow with the number of cores (if they are actually in use).

As for the cores being aware of channels: they are not. That logic is handled by the memory controller, so in this case the memory controller will be aware of double the channels. Either way, it's quite easy to see that extra memory channels help performance; just look at benchmarks. The fact that the increase isn't linear is probably related to how the data is spread across channels and how much latency the cache subsystem was able to hide.
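
A deliberately naive back-of-the-envelope sketch of that queueing point (assuming a fixed 60 ns service time and plain round-robin distribution across channels; real memory controllers reorder and overlap requests far more aggressively):

# Naive illustration: more channels -> shorter per-channel queue -> less wait.
SERVICE_NS = 60  # assumed fixed time to service one memory request

def worst_case_wait_ns(outstanding_requests, channels):
    # round-robin the queue across channels; each channel drains its share serially
    per_channel_queue = -(-outstanding_requests // channels)  # ceiling division
    return per_channel_queue * SERVICE_NS

for ch in (1, 2, 4, 8):
    print(f"{ch} channel(s): ~{worst_case_wait_ns(10, ch)} ns until the 10th request is served")
# 1 -> 600 ns, 2 -> 300 ns, 4 -> 180 ns, 8 -> 120 ns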
Posted on Reply
#17
unwind-protect
dragontamer5788: RDIMMs are already server-only (I don't think Threadripper Pro even takes them).

MRDIMMs, building on top of registered DIMMs, will be "even more server exclusive" IMO.
The Pro version of Threadripper uses registered RAM. Until DDR4 you could also feed it unbuffered RAM (in much less quantity of course).
Posted on Reply
#18
demirael
unwind-protect: The Pro version of Threadripper uses registered RAM. Until DDR4 you could also feed it unbuffered RAM (in much less quantity of course).
"Until DDR4"? What? It's Ryzen based and has only ever used DDR4.
Posted on Reply
#19
unwind-protect
demirael"Until DDR4"? What? It's Ryzen based and has only ever used DDR4.
Yes, what I am saying is that with DDR5 (current and future platforms) you can no longer use UDIMMs instead of RDIMMs. Up to DDR4 most boards would allow that.
Posted on Reply
#20
R-T-B
Wirko: Redundant array of *inexpensive* ... huh?
To be fair "inexpensive" became "independent" awhile ago... lol.
Posted on Reply
#21
Minus Infinity
TumbleGeorge: o_O Hard to imagine DDR5 7200@12 channels! It will probably happen, but what RAM speed is that? Terrible!
DDR5-6400 is already qualified. Genoa only qualified for DDR5-4800 max, so this is a huge improvement. It would be fantastic to see 4 DIMMs usable at good latencies at 6400+ MT/s on Zen 5 desktop.
Posted on Reply
#22
DaveLT
TumbleGeorge: I think that by 2025, FB-DIMMs will also have support for DDR5 at around 8400-9000, and will be faster in effective usage because all of the RAM modules work in parallel.
FBDIMM? Is this 2007?
Posted on Reply
#24
Hardware Geek
LabRat 891: That acronym has changed over the years: Redundant Array of Independent Disks
Both are generally accepted (especially w/ those aware that RAID was originally for $/MB efficiency)



It's only similar in concept, not execution.
While it is effectively 2 RAM DIMMs glued together, they are in an Integrated construction, not Independent.

So... it's still RAID,
just not RAInexpensiveD,
or RAIndependentD
Aren't acronyms fun?! :p


*Yes. I would bet on it being that. *
[looks at Optane NVDIMMs]
To be fair, it isn't redundant either.
Posted on Reply