
AMD Instinct MI200: Dual-GPU Chiplet; CDNA2 Architecture; 128 GB HBM2E

A100s were sold in packs of 10, if I'm not mistaken, for $200k. I don't see why AMD would ask half that sum for a vastly faster product.

I've seen PCIe versions of the A100 quoted at $10k. The HGX version probably costs more, and may be the one you're talking about at 10-for-$200k (I've never seen the HGX version quoted personally).

The PCIe versions won't have cache coherency and will have fewer links. Anyone who wants two or fewer MI200s (or A100s) probably wants the PCIe version. The HGX A100 / OAM MI200 is really for customers who run 4x GPUs, 8x GPUs, or more (which is probably why it makes sense to sell them in packs of 10).
 
I don't get how that math works.
A100 - 54 billion transistors
MI200 - 58 billion transistors (29 + 29), yet it runs circles around A100.


You mean 22% faster is "barely faster"?
ADL was recently praised like a miracle for being exactly this kind of "barely faster", and in ST only, lmao.
 
It's basically four RX 6900 XTs glued together; of course it's gonna beat it.
 
Still, it's dual GPUs acting as one. Yes, faster with the new fabric, but it's still the same basic idea. When you get into 4K or even 8K gaming, a single card can't handle pushing over 100 FPS. I tried that with one 1070 Ti: forget it. My two in SLI can push 100 FPS easily, but Nvidia and AMD dropped that tech. I bought a 3080 Ti and tried it at 4K, and it could not push 100 FPS consistently. Yeah, it looks great, but my eyes can see the lag and frame buffering trying to keep up. A lucky friend of mine has two 3090s in SLI, and man, 8K at 150 FPS looks so clean and perfect. But I don't have 5 grand lying around to afford such nice things.
This isn't CrossFire. It doesn't even have a video output, it isn't running games, and it can't and doesn't work how you think. This IS new tech.
 
You are right about it not being CrossFire. But here, the thing is, the system still sees both chips independently, not as one. The advantage is that they're linked by a very fast Infinity Fabric link (800 GB/s), which is much faster than going through PCIe (where the second card, or frequently both, was running at PCIe 3.0 x8, i.e. 8 GB/s with much higher latency).

And that is one of the key things here: latency. With the two chips so close, each can access the other chip's memory with minimal impact compared to going through the PCIe bus or any other external connection. And lastly, something the spec sheet doesn't really tell you is what AMD implemented for cache coherency and memory sharing.
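For anyone curious what that looks like from the software side, here's a minimal HIP sketch (my own illustration, assuming a ROCm box where the two dies show up as devices 0 and 1; not taken from any AMD sample). It just checks and enables peer access, which is the mechanism that lets one die read and write the other die's HBM directly instead of staging copies through the host.

#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    // Ask the runtime whether device 0 can map device 1's memory directly.
    int canAccess = 0;
    hipDeviceCanAccessPeer(&canAccess, 0, 1);
    std::printf("device 0 -> device 1 peer access: %s\n", canAccess ? "yes" : "no");

    if (canAccess) {
        hipSetDevice(0);
        hipDeviceEnablePeerAccess(1, 0); // flags must be 0
        // From here, kernels running on device 0 can dereference pointers that were
        // allocated on device 1; on an MI200 module that traffic goes over the
        // die-to-die Infinity Fabric links instead of out across PCIe.
    }
    return 0;
}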
 
I meant
Except that it's not: RDNA and CDNA are different architectures. CDNA is more compute-oriented and designed for compute-heavy workloads, whereas RDNA is designed for graphics workloads.
In transistor count, the RX 6900 XT has 26.8 billion, and this has 29 billion per chiplet (29 + 29).
 
Read the white paper: it's seen as one chip, with core one as the master and number two as its slave. Not like anything else, I might add. New tech, new IP, new ways.

Hopefully they will carry over well to consumer cards.
 
Do you have that whitepaper?

Most people I see say it's shown to the OS as two devices with 64 GB each (but with many tools for memory coherency).

In the whitepaper, nothing says what you're saying.

I know what you mean is what leakers said RDNA 3 will be, but it doesn't look like that's the case for this architecture. These are made to be grouped together in large clusters anyway, so in the end it doesn't really matter that much, as long as you're able to split your code and data into chunks that each GPU can digest.
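If it helps, this is roughly how you'd check that yourself. A minimal HIP sketch (my illustration, not from the whitepaper): if the "two chips of 64 GB" reading is right, a node with one MI250X module should report two devices of about 64 GB each rather than one 128 GB device.

#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);
    std::printf("visible GPU devices: %d\n", count);

    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        // Expectation under the "two devices" reading: two entries of ~64 GB each.
        std::printf("  device %d: %s, %.1f GB\n", i, prop.name,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}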
 
That's gonna be hard for RDNA 3, getting the OS to read the GPU as one and not as SLI... plus, aren't games having a hard time splitting the workload across combined GPUs?
 
Let's say CDNA2 is similar to first-gen Threadripper, where full Zen 1 chips were put on the same socket. For the OS it was similar to having multiple sockets, since each CPU had its own memory controller.

From what we're hearing, RDNA3 might look a bit more like Zen 2/3, where there is some kind of I/O die. In that case, part of one chip could act as a bridge similar to the I/O die, or there could be a bridge between the two chips that does that.

The main thing is how to handle the different memory zones. In Zen 1 Threadripper, there are multiple memory controllers to deal with (although the OS can see them as one with NUMA). In Zen 2 Threadripper, there is just one memory controller and NUMA is not needed.

If RDNA 3 has just one die with memory and the second accesses it via a bridge, or there is an I/O die that also holds the memory controller, it could be seen by the OS as one chip. There is also the question of how they communicate with the OS, whether it's hidden behind an I/O die or it has to go through the first "master die" to access the PCIe bus.

Everything is still rumours, but it looks like AMD figured it out for RDNA 3. They didn't need to implement it as much for CDNA2, since most software running on it is already made to scale across multiple GPUs. That doesn't mean they won't do something similar for CDNA3.
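To illustrate that last point about software that already scales across multiple GPUs: here's a rough HIP sketch of the usual "split the data into chunks, one per device" pattern (my own example, with made-up names like scale_chunk; nothing MI200-specific about it).

#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void scale_chunk(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int N = 1 << 20;
    std::vector<float> host(N, 1.0f);

    int devices = 0;
    hipGetDeviceCount(&devices);
    if (devices == 0) return 1;
    const int chunk = N / devices;              // assume N divides evenly, for brevity

    for (int d = 0; d < devices; ++d) {         // each GPU/die gets its own slice
        hipSetDevice(d);
        float* dbuf = nullptr;
        hipMalloc(reinterpret_cast<void**>(&dbuf), chunk * sizeof(float));
        hipMemcpy(dbuf, host.data() + d * chunk, chunk * sizeof(float), hipMemcpyHostToDevice);
        hipLaunchKernelGGL(scale_chunk, dim3((chunk + 255) / 256), dim3(256), 0, 0,
                           dbuf, chunk, 2.0f);
        hipMemcpy(host.data() + d * chunk, dbuf, chunk * sizeof(float), hipMemcpyDeviceToHost);
        hipFree(dbuf);
    }
    std::printf("host[0] after scaling: %f\n", host[0]);
    return 0;
}

For brevity this loops over the devices with blocking copies, so they actually run one after another; real code would use one stream per device and async copies so both dies work at the same time.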
 
Rumours have RDNA 3 taped out as well.
I could be getting this confused with RDNA3, good point.

And the white paper is light on details, too.
 
I meant

In transistor count, the RX 6900 XT has 26.8 billion, and this has 29 billion per chiplet (29 + 29).
So, how is that "four 6900s glued together", then? :D

I also recall that Intel's "glued together" comment didn't age well...
 
Does anyone know why the FP64 performance is the same as FP32? My understanding was that single precision can get a 2x speedup compared to double precision for free.
 
Because they designed it that way.

Usually, 32-bit performance is more important. However, it seems like ORNL asked for double-precision performance.

It should be noted that CPUs usually do 64-bit scalar math at the same speed as 32-bit scalar math, due to the sizing of their 64-bit registers.
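For what it's worth, the spec-sheet numbers line up with the "full-rate FP64" explanation. A quick back-of-the-envelope check using AMD's published MI250X figures (220 CUs, 64 lanes per CU, ~1.7 GHz peak clock; treat the exact clock as approximate):

#include <cstdio>

int main() {
    const double compute_units = 220.0;  // MI250X, both dies combined
    const double lanes_per_cu  = 64.0;   // stream processors per CU
    const double clock_ghz     = 1.7;    // peak engine clock (approximate)
    const double ops_per_lane  = 2.0;    // one FMA counts as 2 floating-point ops

    const double peak_tflops = compute_units * lanes_per_cu * ops_per_lane * clock_ghz / 1000.0;
    // CDNA2's vector FP64 pipes run at full rate, so the same formula gives both
    // the FP32 and the FP64 vector peak.
    std::printf("peak vector throughput: ~%.1f TFLOPS (FP32 or FP64)\n", peak_tflops);
    return 0;
}

That lands right on AMD's quoted 47.9 TFLOPS for both FP32 and FP64 vector, which is why the two columns match (the matrix numbers are roughly double that for both as well).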
 