Friday, June 17th 2011

AMD Charts Path for Future of its GPU Architecture

The future of AMD's GPU architecture looks more open: it breaks from the shackles of a fixed-function, DirectX-driven evolution model and gives the GPU a far larger role in the PC's central processing than merely accelerating GPGPU applications. At the Fusion Developer Summit, AMD detailed its future GPU architecture, revealing that its GPUs will have full support for C, C++, and other high-level languages. Integrated with Fusion APUs, these new number-crunching components will be called "scalar co-processors".

Scalar co-processors will combine elements of MIMD (multiple-instruction, multiple-data), SIMD (single-instruction, multiple-data), and SMT (simultaneous multithreading). AMD will ditch the VLIW (very long instruction word) model that has been in use for several of its past GPU architectures. While AMD's GPU development will no longer be pegged to that of DirectX, the company doesn't believe that APIs such as DirectX and OpenGL will be discarded. Game developers can continue to develop for these APIs; C++ support is aimed more at general-purpose compute applications. That does, however, create a window for game developers to venture out of the API-based development model (specifically DirectX). With its next Fusion processors, the GPU and CPU components will make use of a truly common memory address space. Among other things, this eliminates the "glitching" players might sometimes experience when games load textures as they go over the crest of a hill.
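
To make the common-address-space point concrete, here is a loose analogy in plain Python. It is not AMD's API: `upload_copy` and `share_view` are hypothetical names chosen for illustration. With separate memory pools, data such as a texture has to be copied before the GPU can touch it; with a common address space, both sides can refer to the same bytes.

```python
# Toy analogy only: "copy into device memory" vs. "share one address space".
# upload_copy and share_view are hypothetical helpers, not part of any AMD API.

def upload_copy(host_buffer: bytearray) -> bytearray:
    """Separate memory pools: the 'GPU' gets its own copy of the data."""
    return bytearray(host_buffer)  # full copy; cost grows with buffer size


def share_view(host_buffer: bytearray) -> memoryview:
    """Common address space: the 'GPU' sees the very same bytes, no copy."""
    return memoryview(host_buffer)


texture = bytearray(16)          # stand-in for texture data, all zeros
copied = upload_copy(texture)
shared = share_view(texture)

texture[0] = 255                 # the 'CPU' updates the texture afterwards

copy_sees_update = copied[0] == 255    # the copy is stale
shared_sees_update = shared[0] == 255  # same underlying memory
```

In this sketch the copied buffer goes stale and would need a second upload, while the shared view reflects the change immediately. Real hardware adds caches, coherency and synchronization on top, which this analogy ignores.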

Source: TechReport

114 Comments on AMD Charts Path for Future of its GPU Architecture

#1
Wile E
Power User
pantherx12 said:
No, it's on the same silicon man, there's no latency between the communication of CPU-GPU ( or very little)

It does have benefits.
But you add it back by longer traces to memory. The benefits are mostly matters of convenience, marketing and packaging, not performance benefits noticeable to the end user. It makes sense from a business standpoint and may eventually lead to performance gains. I'm not arguing that. What I am arguing is that what currently uses these APUs is not hardware based, as in transparent to the OS. It is software based, just like CUDA and Stream. To use the APUs, the program must be specifically written to take advantage of them. Nothing changes that fact.
Thatguy said:
Imagine the power of GPU with the programming front end of x86 or x87, which are widely supported instructions in compilers right now.

That's where this is headed: INT + GPU. The FPU is on borrowed time, and that's likely why they shared it.
I don't see it happening any time soon.
cadaveca said:
You can thank nVidia for that. Had they actually adopted DX9 properly, and DX10, all the needed software would be part of the OS now. But due to them doing their own thing, we the consumers got screwed.

I don't know why you even care if it uses software. All computing does... PCs are useless without software.
I care only when people claim it's hardware based, when it isn't.

And I don't buy the nV argument either.
#2
cadaveca
My name is Dave
Wile E said:
And I don't buy the nV argument either.
CUDA says you have no choice. The whole point of DX10 was to provide OPEN access to features such as what CUDA offers, and nV said, quite literally, that Microsoft developed APIs, so knew nothing about hardware design, and that their API (DX) wasn't the right approach. DX10.1 is the perfect example of this behavior continuing.

DirectX is largely broken because of CUDA. Should I mention the whole Batman antialiasing mumbo-jumbo?


I mean, I understand the business side, and CUDA, potentially, has saved nV's butt.

But its existence as a closed platform does more harm than good.

Thankfully, AMD will have their GPUs in their CPUs, which, in hardware, will provide a lot more functionality than nV can ever bring to the table.
#3
Wile E
Power User
cadaveca said:
CUDA says you have no choice. The whole point of DX10 was to provide OPEN access to features such as what CUDA offers, and nV said, quite literally, that Microsoft developed APIs, so knew nothing about hardware design, and that their API (DX) wasn't the right approach. DX10.1 is the perfect example of this behavior continuing.

DirectX is largely broken because of CUDA. Should I mention the whole Batman antialiasing mumbo-jumbo?


I mean, I understand the business side, and CUDA, potentially, has saved nV's butt.

But its existence as a closed platform does more harm than good.

Thankfully, AMD will have their GPUs in their CPUs, which, in hardware, will provide a lot more functionality than nV can ever bring to the table.
It is not broken because of CUDA. 10.1 didn't add what CUDA added. And CUDA certainly didn't affect DX9. Granted, 10.1 is what 10 should have been, mostly due to nV, but it had nothing to do with CUDA.

More anti-CUDA bs with nothing to back it.
#4
pantherx12
Wile E said:
It is not broken because of CUDA. 10.1 didn't add what CUDA added. And CUDA certainly didn't affect DX9. Granted, 10.1 is what 10 should have been, mostly due to nV, but it had nothing to do with CUDA.

More anti-CUDA bs with nothing to back it.
10.1 didn't add that stuff because Nvidia wasn't ready for the features that later became DX11.

Tessellation and compute features. (Ati had a tessellation unit ready a long time ago.)
#5
cadaveca
My name is Dave
Wile E said:
More anti-CUDA bs with nothing to back it.
Unfortunately, it is what it is, but not because I'm anti-CUDA.
Computing is evolving from "central processing" on the CPU to "co-processing" on the CPU and GPU. To enable this new computing paradigm, NVIDIA invented the CUDA parallel computing architecture that is now shipping in GeForce, ION, Quadro, and Tesla GPUs, representing a significant installed base for application developers.
The bolded part is the BS, simply because it's DirectX and Windows that enable such functionality, not CUDA. In fact, it's like they are saying they invented GPGPU.

In that regard, it's impossible for me to be "anti-CUDA". It's wrapping GPGPU functions into that specific term that's the issue. ;)
#6
Benetanegia
pantherx12 said:
10.1 didn't add that stuff because Nvidia wasn't ready for the features that later became DX11.

Tessellation and compute features. (Ati had a tessellation unit ready a long time ago.)
DX10 or DX10.1 or whatever was going to be the DX after DX9 never had compute. Compute came to DX thanks to other APIs that came first, like Stream and CUDA, because those created demand. And it certainly was not Nvidia that prevented compute features from being added to DirectX. It would have been a COMPLETE win for Nvidia if DX10 had included them, for instance. Nvidia was ready for compute back then with G80, with a 6-month lead over Ati's chip, which was clearly inferior. Cayman can barely outclass Nvidia's 5-year-old G80 chip on compute-oriented features, let alone previous cards. HD2000/3000 and even 4000 were simply no match for G80 for compute tasks.

As for tessellation, it was not included because it didn't make sense to include it at all, not because Nvidia was not ready. ANYTHING besides a current high-end card is brought to its knees when tessellation is enabled, so tessellation in HD4000 and, worse yet, HD2/3000 was a waste of time that no developer really wanted, because it was futile. If they had wanted it, no one would have stopped them from implementing it in games. They don't even use it on the Xbox, which is a closed platform where it's much easier to implement without worrying about screwing things up for non-supporting cards.

Besides, a tessellator (especially the one that Ati used before the DX11 implementation) is the simplest thing you can throw on a circuit; it's just an interpolator, and Nvidia already toyed with the idea of interpolated meshes with the FX series. It even had some dedicated hardware for it, like a very archaic tessellator. Remember how that went? Ati also created something similar, much more advanced (yet nowhere near DX11 tessellation), and it was also scrapped by game developers because it was not viable.

cadaveca said:
Unfortunately, it is what it is, but not because I'm anti-CUDA.


The bolded part is the BS, simply because it's DirectX and Windows that enable such functionality, not CUDA. In fact, it's like they are saying they invented GPGPU.

In that regard, it's impossible for me to be "anti-CUDA". It's wrapping GPGPU functions into that specific term that's the issue. ;)
What are you talking about, man? CUDA has nothing to do with DirectX. They are two very different APIs that have hardware (ISA) correlation on the GPU and are exposed via the GPU drivers. DirectX and Windows have nothing to do with that. BTW, considering what you think about it, how do you explain CUDA (GPGPU) on Linux and Apple OSes?
#7
cadaveca
My name is Dave
Benetanegia said:
DX10 or DX10.1 or whatever was going to be the DX after DX9 never had compute. Compute came to DX thanks to other APIs that came first, like Stream and CUDA, because those created demand. And it certainly was not Nvidia that prevented compute features from being added to DirectX. It would have been a COMPLETE win for Nvidia if DX10 had included them, for instance. Nvidia was ready for compute back then with G80, with a 6-month lead over Ati's chip
Um, yeah.

G80 launched November 2006.

R520, which featured CTM and compute support (and as such even supported F@H on the GPU long before nVidia did), launched a year earlier, when nVidia had no such options due to a lack of "double precision", which was the integral feature that G80 brought to the market for nV. This "delay" is EXACTLY what delayed DirectCompute.
#8
Benetanegia
cadaveca said:
Um, yeah.

G80 launched November 2006.

R520, which featured CTM and compute support (and as such even supported F@H on the GPU long before nVidia did), launched a year earlier, when nVidia had no such options due to a lack of "double precision", which was the integral feature that G80 brought to the market for nV. This "delay" is EXACTLY what delayed DirectCompute.
That GPGPU implementation was not really Ati's work, but Stanford University's. That was nothing but BrookGPU, and it used DirectX instead of accessing the ISA directly like now. Of course Ati collaborated in the development of drivers, so they deserve credit for that.

That has nothing to do with our discussion though. Ati being first means nothing for the situation today or over the past 5 years. Ati was bought and disappeared a long time ago, and in the process the project was abandoned. AMD* was simply not ready to let GPGPU interfere with their need to sell high-end CPUs (neither is Intel), and that's why they have never really pushed for GPGPU programs until now. Until Fusion, so that they can continue selling high-end CPUs AND high-end GPUs. There's nothing honorable about this Fusion thing.

* I want to be clear about a fact that apparently not many see. Ati != AMD and never has been. I never said anything about what Ati pursued, achieved or made before it was bought. It's after the acquisition that the GPGPU push was completely abandoned.

BTW your last sentence holds no water. So DirectCompute was not included in DX10 because Nvidia released a DX10 card 7 months earlier than AMD, one which also happens to be compute ready (and can be used even with today's GPGPU programs)? Makes no sense dude. Realistically only AMD could have halted DirectCompute, but the reality is that they didn't, because DirectCompute never existed, nor was it planned until other APIs appeared and showed that the supremacy of DirectX and Windows as a gaming platform was in danger.
#9
cadaveca
My name is Dave
Benetanegia said:
It's after the acquisition that the GPGPU push was completely abandoned.
OK, if you wanna take that tack, I'll agree. ;)

I said, very simply, that nVidia's delayed implementations ("CUDA" hardware support), and the supporting software, have greatly affected the transparency of "stream"-based computing in the end-user space.

W1zzard said:
if there was some killer application for gpu computing wouldn't nvidia/cuda have found it by now?
Says it all.

The "software" needed is already there (there are actually very limited purposes for "GPU"-based computing), and has been for a long time. Hardware functionality is here, with APUs.

Benetanegia said:
What are you talking about, man? CUDA has nothing to do with DirectX. They are two very different APIs that have hardware (ISA) correlation on the GPU and are exposed via the GPU drivers.
CUDA has EVERYTHING to do with DirectX, as it replaces it, rather than works with it. Because the actual uses are very limited, there's no reason for a closed API such as CUDA, except to make money. And that's fine, that's business, but it does hurt the consumer in the end.
#10
Wile E
Power User
pantherx12 said:
10.1 didn't add that stuff because Nvidia wasn't ready for the features that later became DX11.
Wrong. All of nVidia's DX10 cards are capable of computing. nVidia did not hold back DX11 development. They did hold back some features in 10, but those were added back for 10.1. None of those features were GPGPU. The compute features of DX11 were developed BECAUSE of the demand for compute functions like CUDA.

pantherx12 said:
Tessellation and compute features. (Ati had a tessellation unit ready a long time ago.)
The early implementation of ATI's tessellation engine is completely different to the current implementation. Their earlier version was proprietary. Exactly the same concept as CUDA vs DX compute. And guess what, that proprietary innovation led to an open standard. Also just like CUDA.

As per usual in this forum, there is a lot of CUDA/nV hate, with no real substance to back it up.
cadaveca said:
OK, if you wanna take that tack, I'll agree. ;)

I said, very simply, that nVidia's delayed implementations ("CUDA" hardware support), and the supporting software, have greatly affected the transparency of "stream"-based computing in the end-user space.



Says it all.

The "software" needed is already there (there are actually very limited purposes for "GPU"-based computing), and has been for a long time. Hardware functionality is here, with APUs.



CUDA has EVERYTHING to do with DirectX, as it replaces it, rather than works with it. Because the actual uses are very limited, there's no reason for a closed API such as CUDA, except to make money. And that's fine, that's business, but it does hurt the consumer in the end.
Wrong. See above. It creates a market that open standards eventually capitalize on. Again, your disdain for CUDA is still completely unfounded.
#11
Benetanegia
cadaveca said:
OK, if you wanna take that tack, I'll agree. ;)

I said, very simply, that nVidia's delayed implementations ("CUDA" hardware support), and the supporting software, have greatly affected the transparency of "stream"-based computing in the end-user space.



Says it all.

The "software" needed is already there (there are actually very limited purposes for "GPU"-based computing), and has been for a long time. Hardware functionality is here, with APUs.



CUDA has EVERYTHING to do with DirectX, as it replaces it, rather than works with it. Because the actual uses are very limited, there's no reason for a closed API such as CUDA, except to make money. And that's fine, that's business, but it does hurt the consumer in the end.
Without CUDA, GPGPU would have died. Plain and simple. After the only other company interested in GPGPU was bought by a CPU manufacturer, only CUDA remained and only Nvidia pushed for GPGPU. And please don't say AMD has also pushed for it, because that's simply not true. Ati pushed it in 2006, and it's true that AMD has been pushing a little bit, but only since 2009 or so, when it became obvious they would be left behind if they didn't. They always talked about supporting it but never actually released any software or put money into it. That is, until now: they have released Fusion, and thanks to that they can continue milking us customers by making us buy high-end CPUs and high-end GPUs, when a mainstream CPU and a high-end GPU would do just as well.

The idea of an APU for laptops and HTPCs is great, but for HPC or enthusiast use it's ridiculous, and I don't know why so many people are content with it. Why do I need 400 SPs on a CPU, which are not enough for modern games, just to run GPGPU code on them, when I can have 3000 on a GPU and use as many as I want? Also, when a new game is released and needs 800 SPs, oh well, I need a new CPU, not because I need a better CPU, but because I need the integrated GPU to have 800 SPs. RIDICULOUS. And of course I would still need the 6000 SP GPU for the game to run.

It's also false that GPGPU runs better on an APU because it's close to the CPU. It varies with the task. Many tasks run much, much better on a dedicated GPU, thanks to the high bandwidth and the numerous, fast local caches and registers.
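
The "it varies with the task" point can be put into a back-of-the-envelope model: total time is roughly the time to move the data plus the time to compute on it. The sketch below assumes exactly that and nothing more; `offload_time` is a hypothetical helper, and every throughput and bandwidth figure is a made-up placeholder, not a measurement of any real APU or card.

```python
# Back-of-the-envelope model; all numbers below are invented placeholders.

def offload_time(data_bytes, work_flops, device_flops_per_s, link_bytes_per_s=None):
    """Seconds to move the data over the link plus seconds to compute on the device.

    link_bytes_per_s=None models a device that shares memory with the CPU,
    so no transfer is needed.
    """
    transfer = 0.0 if link_bytes_per_s is None else data_bytes / link_bytes_per_s
    compute = work_flops / device_flops_per_s
    return transfer + compute


APU_FLOPS = 0.5e12        # hypothetical integrated GPU
DISCRETE_FLOPS = 2.5e12   # hypothetical dedicated GPU
PCIE_BYTES_PER_S = 8e9    # hypothetical link bandwidth

# Task A: lots of data, little arithmetic per byte.
apu_a = offload_time(1e9, 1e9, APU_FLOPS)
discrete_a = offload_time(1e9, 1e9, DISCRETE_FLOPS, PCIE_BYTES_PER_S)

# Task B: little data, lots of arithmetic per byte.
apu_b = offload_time(1e8, 1e13, APU_FLOPS)
discrete_b = offload_time(1e8, 1e13, DISCRETE_FLOPS, PCIE_BYTES_PER_S)
```

With these placeholder numbers the shared-memory device wins task A because the transfer dominates, and the dedicated device wins task B because the compute dominates. The model leaves out the local memory bandwidth and cache advantages mentioned above, which would tilt things further toward the dedicated card.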
#12
cadaveca
My name is Dave
You're missing the point. I'll tend to agree that nVidia, with CUDA, has kept GPGPU going, but like I said earlier... its actual uses are so few and far between, it's almost stupid. It doesn't offer anything to the end user, really.

Like why haven't they just sold the software to Microsoft, already?

Why don't they make it work on ATI GPUs too?

I mean really...uses are so few, what's the point?
#13
Benetanegia
cadaveca said:
You're missing the point. I'll tend to agree that nVidia, with CUDA, has kept GPGPU going, but like I said earlier... its actual uses are so few and far between, it's almost stupid. It doesn't offer anything to the end user, really.
You don't really follow the news much, do you? There are hundreds of uses for GPGPU.
Like why haven't they just sold the software to Microsoft, already?
Because Microsoft never buys something they can copy. Hello DirectCompute.

And I'm not saying they copied CUDA btw (although it's very similar), just the concept. CUDA is in fact the evolution of Brook/Brook++/BrookGPU, made by the same people who made Brook at Stanford and who actually invented the stream processor concept. Nvidia didn't invent GPGPU, but many of the people who did work for Nvidia now, e.g. Bill Dally.
Why don't they make it work on ATI GPUs too?
Because AMD doesn't want it and they can't do it without permission. And never wanted it tbh, because it would have exposed their inferiority on that front. Nvidia already offered CUDA and PhysX to AMD and for free in 2007, but AMD refused.

Also there's OpenCL which is the same thing and something both AMD and Nvidia are supporting so...
I mean really...uses are so few, what's the point?
Uses are few, there's no point, yet AMD is promoting the same concept as the future. A hint, uses are not few. Until now you don't see many because:

1- Intel and AMD have been trying hard to delay GPGPU.
2- It takes time to implement things, e.g. how long did it take developers to implement SSE? And the complexity of SSE in comparison to GPGPU is like...
3- You don't read a lot. There are hundreds of implementations in the scientific arena.
#14
jaydeejohn
http://blogs.msdn.com/b/ptaylor/archive/2007/03/03/optimized-for-vista-does-not-mean-dx10.aspx
Given the state of the NV drivers for the G80 and that ATI hasn’t released their hw yet; it’s hard to see how this is really a bad plan. We really want to see final ATI hw and production quality NV and ATI drivers before we ship our DX10 support. Early tests on ATI hw show their geometry shader unit is much more performant than the GS unit on the NV hw. That could influence our feature plan.
#15
Benetanegia
jaydeejohn said:
http://blogs.msdn.com/b/ptaylor/archive/2007/03/03/optimized-for-vista-does-not-mean-dx10.aspx
Given the state of the NV drivers for the G80 and that ATI hasn’t released their hw yet; it’s hard to see how this is really a bad plan. We really want to see final ATI hw and production quality NV and ATI drivers before we ship our DX10 support. Early tests on ATI hw show their geometry shader unit is much more performant than the GS unit on the NV hw. That could influence our feature plan.
How is that relevant? :confused:
#16
cadaveca
My name is Dave
Benetanegia said:
There are hundreds of implementations in the scientific arena.
That's one use, to me, and not one that I personally get any use out of. You're falsely inflating the possibilities.

As a home user, there's 3D browser acceleration, encoding acceleration, and game physics. Is there more than that for a HOME user? Because that's what I am, right, so that's all I care about.

Which brings me to my point...why do I care? GPGPU doesn't offer me much.
#17
pantherx12
Benetanegia said:
Cayman can barely outclass Nvidia's 5-year-old G80 chip on compute-oriented features, let alone previous cards. HD2000/3000 and even 4000 were simply no match for G80 for compute tasks.
Wish people would stop thinking of folding at home when they think of compute.

There's actually a lot of stuff ATI's architecture is better at.

Well cept the 580, that's built for that stuff :laugh:

But barely outclass G80?

I've got apps where my 6870 completely smashes apart even top end nvidia cards.

May sound a bit fan-boyish here but just sharing my experience take it as you will.


Geeks3d and the other tech blogs semi-frequently post comparisons of cards on new benchmarks or compute programs; you can find results there.

Been a while since I've read up though so can't point you in a specific direction, only that it's not so much a case of hardware vs hardware.

Cheers for clearing up about the compute though.
#18
jaydeejohn
The rest is obviously history.
MS shifted their goals seeing this
#19
Benetanegia
cadaveca said:
That's one use, to me, and not one that I personally get any use out of. You're falsely inflating the possibilities.
That is not one use. Scientists use GPGPU for physics simulations, treatment and comparison of image data (medical, satellite, military), artificial/distributed intelligence, data reorganization, stock market flow control and many, many others. That is not one use.
As a home user, there's 3D browser acceleration, encoding acceleration, and game physics. Is there more than that for a HOME user? Because that's what I am, right, so that's all I care about.

Which brings me to my point...why do I care? GPGPU doesn't offer me much.
There, then you finally said what you wanted to say. "It does not offer me" is not the same as "it has no use".
#20
jaydeejohn
I wonder what'll happen if a layer of SW is removed for GPGPU?
#21
cadaveca
My name is Dave
Benetanegia said:
There, then you finally said what you wanted to say. "It does not offer me" is not the same as "it has no use".
LuLz. It's your error to think I meant anything other than that. :p
#22
pantherx12
Benetanegia said:


1- Intel and AMD have been trying hard to delay GPGPU.
2- It takes time to implement things, e.g. how long did it take developers to implement SSE? And the complexity of SSE in comparison to GPGPU is like...
3- You don't read a lot. There are hundreds of implementations in the scientific arena.
You forget awesome game effects.

Physics toys! (My favourite; I love n-body simulations and water simulations.)


I believe GPGPU can help with search results too if I'm not mistaken.

Lots of stuff can benefit; it's just hard to think of examples off the top of your head.
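
For anyone wondering why n-body toys are such a natural GPGPU fit: each body's acceleration depends only on the current positions, so every iteration of the outer loop below could run as its own GPU thread. This is a minimal CPU-side sketch in plain Python, assuming a direct-sum method in 2D; the function name, the gravitational constant of 1.0, the softening term and the two sample bodies are all illustrative choices.

```python
# Minimal direct-sum n-body acceleration in 2D (CPU reference sketch).
# g, softening and the sample bodies are illustrative assumptions.
import math


def accelerations(positions, masses, g=1.0, softening=1e-3):
    """Return one (ax, ay) pair per body from pairwise gravitational attraction."""
    acc = []
    for i, (xi, yi) in enumerate(positions):       # independent per body
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            # Softening avoids a division by zero when two bodies overlap.
            dist_sq = dx * dx + dy * dy + softening * softening
            inv_dist_cubed = 1.0 / (dist_sq * math.sqrt(dist_sq))
            ax += g * masses[j] * dx * inv_dist_cubed
            ay += g * masses[j] * dy * inv_dist_cubed
        acc.append((ax, ay))
    return acc


bodies = [(-1.0, 0.0), (1.0, 0.0)]
acc = accelerations(bodies, masses=[1.0, 1.0])
```

The two equal bodies pull toward each other with equal and opposite accelerations along x. The work grows with the square of the body count, which is exactly the kind of bulk, regular arithmetic a GPU handles well.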
#23
Benetanegia
pantherx12 said:
Wish people would stop thinking of folding at home when they think of compute.

There's actually a lot of stuff ATI's architecture is better at.

Well cept the 580, that's built for that stuff :laugh:

But barely outclass?

I've got apps where my 6870 completely smashes apart even top end nvidia cards.

May sound a bit fan-boyish here but just sharing my experience take it as you will.


Geeks3d and the other tech blogs semi-frequently post comparisons of cards on new benchmarks or compute programs; you can find results there.

Been a while since I've read up though so can't point you in a specific direction, only that it's not so much a case of hardware vs hardware.
Feature set != performance.

There are many apps where AMD cards are faster. This is obvious: highly parallel applications which require very little CPU-like behavior will always run better on a highly parallelized architecture. That's not to say that Cayman has many GPGPU-oriented hardware features that G80 didn't have 5 years ago.

And regarding that advantage, AMD is stepping away from that architecture in the future, right? They are embracing scalar design. So which architecture was essentially right in 2006? VLIW or scalar? It really is that simple: if moving into the future for AMD means going scalar, there really are very few questions left unanswered. When AMD's design is almost a copy* of Kepler and Maxwell, which were announced a year ago, there are very few questions about what the correct direction is. And then it just becomes obvious who followed that path before...

cadaveca said:
LuLz. It's your error to think i meant anything other than that. :p
Well, you said "it does not offer anything to the end user". That's not the same as saying that it does not offer anything to you. It offers a lot to me. Of course that's subjective, but even for the arguably few apps where it works, I feel it helps a lot. Kinda unrelated or not, but I usually hear how useless it is because "it only boosts video encoding by 50-100%". Lol, you need a completely new $1000 CPU + supporting MB to achieve the same improvement, but never mind.
#24
pantherx12
Benetanegia said:
Feature set != performance.

There are many apps where AMD cards are faster. This is obvious: highly parallel applications which require very little CPU-like behavior will always run better on a highly parallelized architecture. That's not to say that Cayman has many GPGPU-oriented hardware features that G80 didn't have 5 years ago.

And regarding that advantage, AMD is stepping away from that architecture in the future, right? They are embracing scalar design. So which architecture was essentially right in 2006? VLIW or scalar? It really is that simple: if moving into the future for AMD means going scalar, there really are very few questions left unanswered. When AMD's design is almost a copy* of Kepler and Maxwell, which were announced a year ago, there are very few questions about what the correct direction is. And then it just becomes obvious who followed that path before...
Still not sure how scalar has a performance advantage tbh, at a glance it should be weaker :laugh:

It's something I'll need to research more.
#25
Benetanegia
pantherx12 said:
Still not sure how scalar has a performance advantage tbh, at a glance it should be weaker :laugh:

It's something I'll need to research more.
CPUs are scalar (+ a vector unit) and GPGPU means running code that typically runs on CPU on the GPU, hence scalar is an advantage for a wider range of code.

Both future architectures from AMD and Nvidia are going to be scalar + vector. For AMD it's the arch in the OP. For Nvidia I'm not sure if it was Kepler or Maxwell, but in any case by 2013 both companies will be there.
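
One commonly cited reason scalar issue suits CPU-like code: a VLIW bundle can only be filled with operations that don't depend on each other, so a long dependent chain leaves most slots empty. The toy scheduler below illustrates that under heavy simplification; `pack_vliw` and `utilization` are hypothetical helpers, the width of 4 mimics a VLIW4 design, and real shader compilers and hardware are far more involved.

```python
# Toy model of VLIW slot packing; not how any real shader compiler works.

def pack_vliw(deps, width=4):
    """Greedily pack ops into bundles of at most `width` independent ops.

    deps[i] is the set of op indices that op i must wait for.
    """
    done = set()
    remaining = list(range(len(deps)))
    bundles = []
    while remaining:
        # An op is ready once everything it depends on ran in an earlier bundle.
        bundle = [i for i in remaining if deps[i] <= done][:width]
        if not bundle:
            raise ValueError("dependency cycle: no op is ready")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [i for i in remaining if i not in done]
    return bundles


def utilization(bundles, width=4):
    """Fraction of issue slots actually filled."""
    return sum(len(b) for b in bundles) / (width * len(bundles))


# Eight ops where each one needs the previous result (CPU-like code).
chain = [set()] + [{i} for i in range(7)]
# Eight ops with no dependencies at all (graphics-like code).
independent = [set() for _ in range(8)]

chain_util = utilization(pack_vliw(chain))
independent_util = utilization(pack_vliw(independent))
```

In this model the dependent chain needs eight bundles and fills a quarter of the slots, while the independent ops fit in two full bundles. A design that issues one operation per thread and gets its parallelism from running many threads side by side doesn't rely on finding that independence inside a single thread.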