AMD Patents a New Method for GPU Instruction Scheduling

AleksandarK · Jun 28, 2019

With growing revenues coming from strong sales of Ryzen and Radeon products, AMD is more focused on innovation than ever. Any company needs to reinvest its capital in R&D to stay ahead, and that is exactly what AMD is doing by focusing on future technologies while constantly improving existing solutions.

On June 13th, AMD published a new method for instruction scheduling of shader programs for a GPU. The method operates on a fixed number of registers. It works in five stages:

1. Compute liveness-based register usage across all basic blocks
2. Compute the range of numbers of waves for the shader program
3. Assess the impact of available post-register allocation optimizations
4. Compute the scoring data based on the number of waves of the plurality of registers
5. Compute the optimal number of waves

It is important to note that the "liveness" of registers most probably refers to liveness analysis, the compiler technique for tracking which registers still hold values that will be read later, while the term "wave" most likely refers to a wavefront, the group of threads that the GPU executes together in lockstep. Waves that are resident at the same time share one register file, so the fewer registers each wave needs, the more waves can be in flight to hide memory latency. The new method is supposed to bring additional performance improvements and reduce latency by choosing the number of waves that makes the best use of that fixed register budget.
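As a rough illustration of how the five stages could fit together, here is a toy Python sketch. It is not AMD's implementation and nothing in it is taken from the patent text: the register file size, the wave cap, the `post_ra_saving` parameter, and the scoring rule (`wave_gain` and `spill_cost`) are all made-up assumptions, and liveness is simplified to a single backward pass per basic block with the live-out set supplied by hand.

```python
# Toy sketch only. All constants and the scoring rule are illustrative
# assumptions; none of them come from the patent.
REGISTER_FILE_SIZE = 256  # assumed registers shared by all resident waves
MAX_WAVES = 10            # assumed hardware cap on resident waves


def block_pressure(instrs, live_out):
    """Peak count of simultaneously live registers in one basic block.

    instrs is a list of (defs, uses) pairs in program order; live_out is
    the set of registers still needed after the block.
    """
    live = set(live_out)
    peak = len(live)
    # Walk backwards: a definition ends a live range, a use starts one.
    for defs, uses in reversed(instrs):
        live -= set(defs)
        live |= set(uses)
        peak = max(peak, len(live))
    return peak


def register_usage(blocks):
    """Stage 1: worst-case register pressure across all basic blocks."""
    return max(block_pressure(instrs, live_out) for instrs, live_out in blocks)


def choose_waves(usage, post_ra_saving=0, wave_gain=1.0, spill_cost=0.5):
    """Stages 2-5: score each candidate wave count and return the best."""
    # Stage 3: assume later optimizations free a known number of registers.
    effective = max(1, usage - post_ra_saving)
    scores = {}
    for waves in range(1, MAX_WAVES + 1):     # Stage 2: candidate range
        budget = REGISTER_FILE_SIZE // waves  # registers available per wave
        spills = max(0, effective - budget)   # values that would not fit
        scores[waves] = waves * wave_gain - spills * spill_cost  # Stage 4
    return max(scores, key=scores.get)        # Stage 5: best-scoring count


# One tiny basic block: a and b are defined, then combined into c.
blocks = [
    ([(["a"], []), (["b"], []), (["c"], ["a", "b"])], {"c"}),
]
print(register_usage(blocks))               # 2
print(choose_waves(64))                     # 4
print(choose_waves(64, post_ra_saving=13))  # 5
```

With these invented numbers, a shader that needs 64 registers ends up at 4 waves, because a fifth wave would force spills that cost more than the extra wave gains. If post-allocation optimizations were assumed to free 13 registers, the fifth wave would fit, which is the kind of trade-off stages 3 to 5 seem to be about.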

You can find out more about it here.


Steevo · Jun 28, 2019

Looks like the first patent for an on-die CPU scheduler for an upcoming architecture. It may or may not be an X86-64 core, but it only makes sense if they now have the know-how to make a 4GHz scheduling CPU on die that makes their GPU cores more efficient without any overhead, since it could be considered the first basic AI for accelerating GPU workloads.

DeathtoGnomes · Jun 28, 2019

Might also be part of an Infinity Fabric (IF) interface for multiple GPUs.

Vayra86 · Jun 28, 2019

Nice to see some progress on AMD's GPU side. It's about goddamn time we get a bit more than a roadmap full of too little too late. But then this won't see the light of day for at least 3 years.

It also doesn't look mighty complicated... 'when it's full, see if you can stuff in some more' 'and then some' captures it quite well, I think. But it does sound very much like a fix for AMD's resource allocation and efficiency problems.

TheoneandonlyMrK · Jun 28, 2019

I'm surprised it's this patent and the chip cooler one that got covered. Did no one see the ray tracing one on another site? It is on Tom's.

TEXTURE PROCESSOR BASED RAY TRACING ACCELERATION METHOD AND SYSTEM

www.freepatentsonline.com

direct link to patent

I think these are from 2017, so in a few more years we might see them.

dinmaster · Jun 28, 2019

Infinity Fabric GPUs, like I saw from the new Mac Pro with the dual Navi GPU card:

Support for Infinity Fabric Link GPU interconnect technology – With up to 84GB/s per direction low-latency peer-to-peer memory access, the scalable GPU interconnect technology enables GPU-to-GPU communications up to 5X faster than PCIe Gen 3 interconnect speeds.

do chiplets on gpus and amd could have an easy time beating nvidia. either one would rock!

Midland Dog · Jun 29, 2019

Steevo said:
Looks like the first patent for an on-die CPU scheduler for an upcoming architecture. It may or may not be an X86-64 core, but it only makes sense if they now have the know-how to make a 4GHz scheduling CPU on die that makes their GPU cores more efficient without any overhead, since it could be considered the first basic AI for accelerating GPU workloads.

so like how maxwell has an arm cpu integrated in it

Steevo · Jun 29, 2019

Midland Dog said:
so like how maxwell has an arm cpu integrated in it

That's only used for boot and power management. An actual X86-64 core can run native code, and already runs at a much higher clock speed than ARM cores.

BorgOvermind · Jul 2, 2019

Steevo said:
Looks like the first patent for an on-die CPU scheduler for an upcoming architecture. It may or may not be an X86-64 core, but it only makes sense if they now have the know-how to make a 4GHz scheduling CPU on die that makes their GPU cores more efficient without any overhead, since it could be considered the first basic AI for accelerating GPU workloads.

Patenting can help them, but in a relatively limited way.
Generally speaking, alternates that do the same thing can be developed and implemented without breaching what someone else did.
I think it's still for x64 stuff, what else could it serve?

InVasMani · Jul 2, 2019

Steevo said:
That's only used for boot and power management. An actual X86-64 core can run native code, and already runs at a much higher clock speed than ARM cores.

Yup, let the main CPU hand off a task to the GPU, and from there its on-board, GPU-optimized CPU can handle the rest until it needs to communicate with the main CPU again, which it could do in short bursts. The big benefit is that it could be a more GPU-optimized CPU in terms of cache, instruction sets, and frequency scaling. On top of that, there is no OS contention to deal with, unlike the primary CPU, which has who knows what background tasks running (telemetry, Windows updates, virus scans, etc.) that could be slowing it down, constantly or intermittently. The primary CPU also probably wouldn't scale to as high a frequency as a simpler 1-2c/2-4t CPU could, especially with binning.

Think of Intel's 5GHz CPUs: integrate 1-2 cores like that on the GPU itself and suddenly the primary CPU is a lot less frequency starved from a gaming standpoint, at least by Intel's 1080p esports epeen talking points. When you think about it like that, it also makes a lot more sense than trying to get 16c to run at 5GHz on all cores, for example, just to match Intel's general, grasping-at-straws performance advantage in games that don't scale, at resolutions that don't scale, lol, with overkill refresh rates ofc. Because hey, gotta win somehow at all costs. 240p at a 960Hz refresh rate, here I come, pew pew pew!!

