Friday, June 28th 2019

AMD Patents a New Method for GPU Instruction Scheduling

With growing revenue from strong sales of Ryzen and Radeon products, AMD is more focused on innovation than ever. Reinvesting capital into R&D is essential for any company that wants to stay ahead, and that is exactly what AMD is doing: focusing on future technologies while constantly improving existing solutions.

On June 13th, AMD published a patent application describing a new method for instruction scheduling of shader programs on a GPU. The method operates within a fixed number of registers and works in five stages:
  • Compute liveness-based register usage across all basic blocks
  • Compute the range of possible numbers of waves for the shader program
  • Assess the impact of available post-register-allocation optimizations
  • Compute scoring data based on the number of waves for the plurality of registers
  • Compute the optimal number of waves
It is important to note what these terms mean. The "liveness" of a register is a standard compiler concept: a register is live from the point its value is defined to its last use, and the number of simultaneously live registers measures register pressure. A "wave" (or wavefront) is AMD's term for a group of threads that execute a shader program in lockstep on a SIMD unit, as described in AMD's "GPU Open" documentation. The two are in tension: the more registers each wave consumes, the fewer waves can be resident on the hardware at once. The new method is supposed to bring additional performance and hide latency by scheduling instructions so that register usage is balanced against the number of concurrently resident waves.
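As a rough illustration of the first two stages, a compiler can compute per-block register liveness with a standard backward dataflow pass and then map the peak register pressure to a wave-occupancy limit. This is a minimal sketch under assumed, loosely GCN-like constants (256 VGPRs per SIMD, 10 waves maximum), not the patented algorithm itself:

```python
# Illustrative sketch: liveness-based register pressure -> wave occupancy.
# Constants are assumptions loosely modeled on GCN, not AMD's actual method.
VGPRS_PER_SIMD = 256      # vector registers available on one SIMD unit
MAX_WAVES_PER_SIMD = 10   # hardware cap on resident waves

def liveness(blocks, succ):
    """Backward dataflow over basic blocks:
    live_out[b] = union of live_in over successors of b
    live_in[b]  = use[b] | (live_out[b] - def[b])
    `blocks` maps block name -> (defs, uses); `succ` maps name -> successor names."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for b, (defs, uses) in blocks.items():
            out = set().union(*(live_in[s] for s in succ.get(b, [])))
            inn = uses | (out - defs)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

def max_waves(blocks, succ):
    """Peak simultaneous live registers bounds how many waves fit per SIMD."""
    live_in, live_out = liveness(blocks, succ)
    pressure = max(len(live_in[b] | live_out[b]) for b in blocks) or 1
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD // pressure)

# Hypothetical two-block shader: r0 and r1 are defined in A and used in B,
# so both stay live across the block boundary.
blocks = {"A": ({"r0", "r1"}, set()), "B": (set(), {"r0", "r1"})}
succ = {"A": ["B"]}
print(max_waves(blocks, succ))  # pressure of 2 registers leaves the wave cap at 10
```

The scheduler's trade-off follows directly: reordering instructions to shorten live ranges lowers the pressure figure, which raises the wave count the later scoring stages can choose from.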

You can find out more about it here.

9 Comments on AMD Patents a New Method for GPU Instruction Scheduling

#1
Steevo
Looks like the first patent for an on-die CPU scheduler for an upcoming architecture. It may or may not be an x86-64 core, but it only makes sense if they now have the know-how to make a 4 GHz scheduling CPU on die to make their GPU cores more efficient without any overhead; it could be considered the first basic AI for accelerating GPU workloads.
#2
DeathtoGnomes
Might also be part of an Infinity Fabric (IF) interface for multiple GPUs.
#3
Vayra86
Nice to see some progress on AMD's GPU side. It's about goddamn time we got a bit more than a roadmap full of too little, too late. But then, this won't see the light of day for at least three years.

It also doesn't look mighty complicated... 'when it's full, see if you can stuff in some more' 'and then some' captures it quite well, I think. But it does sound very much like a fix for AMD's resource-allocation and efficiency problems.
#5
dinmaster
Infinity Fabric GPUs, like I saw from the new Mac Pro with its dual-Navi GPU card:
  • Support for Infinity Fabric Link GPU interconnect technology – With up to 84GB/s per direction low-latency peer-to-peer memory access, the scalable GPU interconnect technology enables GPU-to-GPU communications up to 5X faster than PCIe Gen 3 interconnect speeds.
Do chiplets on GPUs and AMD could have an easy time beating NVIDIA. Either one would rock!
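For context on the quoted "up to 5X" figure: PCIe Gen 3 delivers roughly 0.985 GB/s per lane per direction (8 GT/s with 128b/130b encoding), so an x16 link offers about 15.8 GB/s per direction, and 84 GB/s works out to a little over 5x that. A quick arithmetic check:

```python
# Sanity check of the "up to 5X faster than PCIe Gen 3" marketing claim.
pcie_gen3_lane = 0.985              # GB/s per lane per direction (8 GT/s, 128b/130b)
pcie_gen3_x16 = 16 * pcie_gen3_lane # ~15.76 GB/s per direction for an x16 slot
if_link = 84.0                      # GB/s per direction, quoted for IF Link
speedup = if_link / pcie_gen3_x16   # ~5.3x, consistent with the claim
print(round(speedup, 1))
```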
#6
Midland Dog
Steevo said: Looks like the first patent for a on die CPU scheduler for upcoming architecture, it may or may not be an X86-64 core, but it only makes sense if they have the know how now to make a 4Ghz scheduling CPU on die to make their GPU cores more efficient without any overhead since it could be considered the first basic AI for accelerating GPU workloads.
so like how maxwell has an arm cpu integrated in it
#7
Steevo
Midland Dog said: so like how maxwell has an arm cpu integrated in it
That's only used for boot and power management. An actual x86-64 core can run native code, and already runs at much higher clock speeds than ARM cores.
#8
BorgOvermind
Steevo said: Looks like the first patent for a on die CPU scheduler for upcoming architecture, it may or may not be an X86-64 core, but it only makes sense if they have the know how now to make a 4Ghz scheduling CPU on die to make their GPU cores more efficient without any overhead since it could be considered the first basic AI for accelerating GPU workloads.
Patenting can help them, but only in a relatively limited way.
Generally speaking, alternatives that do the same thing can be developed and implemented without infringing on what someone else did.
I think it's still for x64 stuff; what else could it serve?
#9
InVasMani
Steevo said: That's only used for boot and power management. An actual X86-64 core can run native code, and already runs much higher clock speed than ARM cores.
Yup, let the main CPU hand off a task to the GPU, and from there its on-board GPU-optimized CPU can handle the rest until it needs to communicate with it again, which it could do in short bursts. The big benefit is it could be a more GPU-optimized CPU in terms of cache, instruction sets, and frequency scaling. On top of that, there's no OS contention to deal with, unlike the primary CPU, which has who knows what background tasks running (telemetry, Windows updates, virus scans, etc.) that could be slowing it down intermittently. It also probably wouldn't scale to as high a frequency as a simpler 1-2c/2-4t CPU could, especially with binning.

Think of Intel's 5 GHz CPUs: integrate 1-2 cores like that on the GPU itself, and suddenly the primary CPU is a lot less frequency-starved from a gaming standpoint, at least by 1080p esport epeen Intel talking points. When you think about it like that, it also makes a lot more sense than trying to get 16 cores to run at 5 GHz on all cores just to match Intel's grasping-at-straws performance advantage in games that don't scale, at resolutions that don't scale, with overkill refresh rates, of course, because hey, gotta win somehow at all costs. 240p 960Hz refresh rate here I come pew pew pew!!
Copyright © 2004-2021 www.techpowerup.com. All rights reserved.
All trademarks used are properties of their respective owners.