
AMD WMMA Instruction is Direct Response to NVIDIA Tensor Cores

AleksandarK

News Editor
Staff member
AMD's RDNA3 graphics IP is just around the corner, and we are hearing more information about the upcoming architecture. Historically, as GPUs advance, it is not unusual for companies to add dedicated hardware blocks to accelerate specific tasks. Today, AMD engineers have updated the backend of the LLVM compiler to include a new instruction called Wave Matrix Multiply-Accumulate (WMMA). This instruction will be present on GFX11, which is the RDNA3 GPU architecture. With WMMA, AMD will offer support for processing 16x16x16 blocks in FP16 and BF16 precision formats. With these instructions, AMD is adding new data arrangements to support matrix multiply-accumulate operations. This closely mirrors the work NVIDIA is doing with Tensor Cores.

AMD ROCm 5.2 API update lists the use case for this type of instruction, which you can see below:
rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.

rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
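
For a rough idea of what that looks like in practice, below is a minimal sketch of a single 16x16x16 FP16 tile multiply-accumulate written against the fragment-style API described above. The type and function names follow the public rocWMMA samples and may differ between ROCm releases, so treat this as an illustration rather than the exact API.

Code:
// Sketch only: one wavefront computes one 16x16x16 tile, D = A*B + C.
// Names (rocwmma::fragment, load_matrix_sync, mma_sync, ...) follow the
// rocWMMA samples; verify them against your ROCm release.
#include <hip/hip_runtime.h>
#include <rocwmma/rocwmma.hpp>

using rocwmma::float16_t;
using rocwmma::float32_t;

__global__ void wmma_16x16x16(const float16_t* a, const float16_t* b, float32_t* d)
{
    // Fragments describe how one 16x16x16 tile is spread across the wavefront.
    rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, float16_t, rocwmma::row_major> fragA;
    rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, float16_t, rocwmma::col_major> fragB;
    rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, float32_t> fragAcc;

    rocwmma::fill_fragment(fragAcc, 0.0f);    // start the accumulator at zero
    rocwmma::load_matrix_sync(fragA, a, 16);  // leading dimension of the 16x16 tile
    rocwmma::load_matrix_sync(fragB, b, 16);
    rocwmma::mma_sync(fragAcc, fragA, fragB, fragAcc);  // maps onto the WMMA instruction
    rocwmma::store_matrix_sync(d, fragAcc, 16, rocwmma::mem_row_major);
}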


View at TechPowerUp Main Site | Source
 
Oh so apparently tensor cores are not so useless now :roll:
 
Well well, so the consensus is moving towards dedicated hardware.

Let's see where RDNA3's power budget goes...

I need to read better it seems
 
Oh so apparently tensor cores are not so useless now :roll:

they are if you don't have them, they aren't if you have them :D
 
This is good news. I hope AMD will show us how much the ray-tracing performance has improved from RDNA 2 to RDNA 3 - it should be several times better, because RDNA 2's RT performance was abysmal.
 
This is good news. I hope AMD will show us how much the ray-tracing performance has improved from RDNA 2 to RDNA 3 - it should be several times better, because RDNA 2's RT performance was abysmal.

Ray tracing is a joke anyway; no one is missing much by not having it. It's true we haven't had many AAA releases, but even after so long we have only a few games that have done anything meaningful with it. Seems more like a must-have buzzword for the box than anything else.
 
Ray tracing is a joke anyway; no one is missing much by not having it. It's true we haven't had many AAA releases, but even after so long we have only a few games that have done anything meaningful with it. Seems more like a must-have buzzword for the box than anything else.

It is not a joke, because it makes the nvidia cards fly off the shelves - the nvidia users claim it's better because:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...
 
This is the third generation of matrix cores from AMD - after CDNA 1/2 they are now getting added to the consumer line, as AMD unifies a certain amount of features for ROCm support across CDNA3 and RDNA3.
 
It is not a joke, because it makes the nvidia cards fly off the shelves - the nvidia users claim it's better because:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...
You mean flying off the pallets to miners...?

I haven't seen anything on shelves for a looong time tbh. It's just recently that we're getting some semblance of normal availability back, and as usual, Nvidia is faster at restocking the sales channels.
 
Oh so apparently tensor cores are not so useless now :roll:
They are not useless per se, as they do something - especially for nVidia, which I believe just wants to focus on professional applications of ML. For games I still think they are useless; the games IMO still look awful, mainly due to low polygon counts and textures. I don't see the point of having pretty lighting but cubes instead of mugs or balls, and textures of such low quality (though VRAM is far more important).
Wasting silicon on special hardware just for some ML isn't the right way; once we achieve perfect geometry, then I'm all for it.
 
Oh so apparently tensor cores are not so useless now :roll:
Except unlike Nvidia, AMD seems to have not gone with separate fixed-function hardware, and instead uses the 64-lane wavefronts they already had via new instructions and likely bigger registers.

More information is required though, tbf, but this doesn't sound like specialised fixed-function hardware like tensor cores to me, just optimised use of what their SIMD array could theoretically do.


"rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts.".


As they say.
 
They are not useless per se, as they do something - especially for nVidia, which I believe just wants to focus on professional applications of ML. For games I still think they are useless; the games IMO still look awful, mainly due to low polygon counts and textures. I don't see the point of having pretty lighting but cubes instead of mugs or balls, and textures of such low quality (though VRAM is far more important).
Wasting silicon on special hardware just for some ML isn't the right way; once we achieve perfect geometry, then I'm all for it.

Just don't play games then, wait until you are on your deathbed, I'm sure games will look amazing then
 
Except unlike Nvidia, AMD seems to have not gone with separate fixed-function hardware, and instead uses the 64-lane wavefronts they already had via new instructions and likely bigger registers.

More information is required though, tbf, but this doesn't sound like specialised fixed-function hardware like tensor cores to me, just optimised use of what their SIMD array could theoretically do.


"rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts.".


As they say.

Wow, you read better than I did.
 
It is not a joke, because it makes the nvidia cards fly off the shelves - the nvidia users claim it's better because:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...
RTX 3070 user here, they are useless for me.
 
Oh so apparently tensor cores are not so useless now :roll:

AMD's idea leans away from a fixed-function ASIC (like the tensor cores), which can be considered somewhat of a waste of silicon because that die area sits idle most of the time; they like to make the silicon do other things as well.

A single-function ASIC is easier to make and faster to implement - that's why NV didn't have to do more and could add it quicker.

AMD's way is harder and needs more engineering work, so even after NV announced it, they took some time to implement a similar function. But the benefit is greater, as they're mostly reusing the same die space they had before; it's just tweaked to do more specialised work while still being able to do other things at the same time, so it won't be like a fixed-function ASIC that can only do a single thing.

It's like saying NV added 15% more die area to get this function, while AMD took two more years but only needed 5% more die area. And they might be able to use it for other things in the future as well.
 
Just don't play games then, wait until you are on your deathbed, I'm sure games will look amazing then
If you say so :slap:
When your whole life flashes before your eyes, how much of it do you want to not have ray tracing?
Never gets old, does it :laugh:
 
Wow, you read better than I did.
Well, back when rapid packed math was introduced I couldn't believe they had not also incorporated this - you have a 64-lane wavefront that can already do multiple math ops on one wave in one pass, so why not do this kind of matrix math like that? Clearly patience was needed on my part.
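
Roughly what I mean, as a sketch using HIP's half2 intrinsics (the intrinsic names mirror the CUDA-style fp16 API that HIP exposes in hip_fp16.h - check your ROCm version): each 32-bit lane issues one packed instruction that performs two FP16 FMAs at once.

Code:
// Sketch only: packed FP16 multiply-add, two results per 32-bit lane,
// which is the rapid-packed-math idea (e.g. V_PK_FMA_F16 on the hardware).
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

__global__ void packed_fma(const __half2* a, const __half2* b,
                           const __half2* c, __half2* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // One call computes two half-precision a*b+c results per element.
        out[i] = __hfma2(a[i], b[i], c[i]);
    }
}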
 
AMD's idea leans away from a fixed-function ASIC (like the tensor cores), which can be considered somewhat of a waste of silicon because that die area sits idle most of the time; they like to make the silicon do other things as well.

A single-function ASIC is easier to make and faster to implement - that's why NV didn't have to do more and could add it quicker.

AMD's way is harder and needs more engineering work, so even after NV announced it, they took some time to implement a similar function. But the benefit is greater, as they're mostly reusing the same die space they had before; it's just tweaked to do more specialised work while still being able to do other things at the same time, so it won't be like a fixed-function ASIC that can only do a single thing.

It's like saying NV added 15% more die area to get this function, while AMD took two more years but only needed 5% more die area. And they might be able to use it for other things in the future as well.

Same with RT then - AMD's implementation of RT is just weak sauce.

Tensor cores are on their 4th gen with Ada now, and probably take less than 5% of the die space.

If you say so :slap:

Never gets old, does it :laugh:

Well, if money is everything to you, then why are you spending it on useless PC stuff anyway?
 
What do you mean? Selling someone/anything on that kind of sales pitch is just bad, period ~ I'd rather see (all) wars end in my lifetime than be hung up on "real-time" ray tracing!

And yes all of us can do little things to make that day come forth.

well IMO Gamersnexus was being a dumbass for calling that article out, knowing now that those GPUs get to keep insane resale value 3.5 years after launch
 
Same with RT then, AMD's implementation of RT is just weak sauce.
Can we give up this A < B, therefore A must be crap mentality - it's getting boring.
 
You guys realize this is completely irrelevant for consumers, right ?
 
well IMO Gamersnexus was being a dumbass for calling that article out, knowing now that those GPUs get to keep insane resale value 3.5 years after launch
Purely in terms of resale value, yes, it's outdone probably any other dGPU in the past, but then you forgot the backdrop? A once-in-a-hundred-years global pandemic. As for your particular point about Tensor cores, correct me if I'm wrong, but outside of DLSS are they actually that useful anywhere else? The way things are shaping up right now, DLSS vs FSR will end up almost exactly like Gsync vs Freesync!

Unless of course Nvidia is willing to throw another billion or two each year for the next decade or so.
 
You guys realize this is completely irrelevant for consumers, right ?

Nvidia spent time and money and even renamed its whole GPU brand, and AMD seems to try all it can to put out some nice benchmarks, and yet for us consumers RTX is nothing, zero, a couple of games, a gimmick.

The time and money spent on this is absurd, and they keep piling it on.
 