
AMD-The Master Plan

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.65/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
I edited my post.

DX11 had compute engines which GCN supported via the ACEs but, because the CPU is overwhelmed feeding the graphics pipeline, the ACEs couldn't be called without stalling that pipeline. Mantle was created to address the problem of the CPU bottlenecking/interrupting the graphics wavefront.

Here's an old article that long predates D3D12 and Mantle: http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/5
Meanwhile on the compute side, AMD’s new Asynchronous Compute Engines serve as the command processors for compute operations on GCN. The principal purpose of ACEs will be to accept work and to dispatch it off to the CUs for processing. As GCN is designed to concurrently work on several tasks, there can be multiple ACEs on a GPU, with the ACEs deciding on resource allocation, context switching, and task priority. AMD has not established an immediate relationship between ACEs and the number of tasks that can be worked on concurrently, so we’re not sure whether there’s a fixed 1:X relationship or whether it’s simply more efficient for the purposes of working on many tasks in parallel to have more ACEs.

One effect of having the ACEs is that GCN has a limited ability to execute tasks out of order. As we mentioned previously GCN is an in-order architecture, and the instruction stream on a wavefront cannot be reordered. However the ACEs can prioritize and reprioritize tasks, allowing tasks to be completed in a different order than they’re received. This allows GCN to free up the resources those tasks were using as early as possible rather than having the task consuming resources for an extended period of time in a nearly-finished state. This is not significantly different from how modern in-order CPUs (Atom, ARM A8, etc) handle multi-tasking.
More reading:
https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf (GCN thoroughly mapped out)
http://amd-dev.wpengine.netdna-cdn....10/Asynchronous-Shaders-White-Paper-FINAL.pdf (explanation of async compute)
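The scheduling behavior the AnandTech quote describes — work submitted in order but completed out of order by priority — can be sketched with a toy priority queue. This is a hypothetical illustration, not AMD's scheduler; the task names and priorities are made up:

```python
# Hypothetical sketch: tasks enter in submission order, but a priority
# queue lets higher-priority work finish first and free its resources
# earlier -- the behavior the quote ascribes to the ACEs.
import heapq

def complete_order(tasks):
    """tasks: list of (name, priority); lower number = higher priority.
    Returns the order in which the tasks would complete."""
    queue = []
    for submit_idx, (name, prio) in enumerate(tasks):
        # submission index breaks ties, preserving in-order behavior
        heapq.heappush(queue, (prio, submit_idx, name))
    done = []
    while queue:
        _prio, _idx, name = heapq.heappop(queue)
        done.append(name)
    return done

submitted = [("shadow_pass", 2), ("physics", 0), ("post_fx", 1)]
print(complete_order(submitted))  # ['physics', 'post_fx', 'shadow_pass']
```

Note how "physics" was submitted second but completes first, exactly the "different order than they're received" behavior described above.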

GCN cards predating Mantle, Vulkan, and DirectX 12 benefit from using async compute in the latest software because the hardware always supported it even when the APIs were incapable of using it. None of the NVIDIA cards to date do (at least not without dropping FPS).


Fury/Fury X/Nano are crippled when their ACEs are silent. High resolution is the only way to utilize most of the chip. This problem is likely to recur with Vega, which also has 4096 stream processors. AMD is banking on D3D12/Vulkan putting those 4096 stream processors to work, but that can only happen as developers make the switch. This is why Vega has been kicked down the road while NVIDIA is coming out swinging.

NVIDIA has said GTX 1070 is quite a bit faster than Titan X.
 
Last edited:
Joined
Jun 24, 2015
Messages
35 (0.01/day)
GCN cards predating Mantle, Vulkan, and DirectX 12 benefit from using async compute in the latest software because the hardware always supported it even when the APIs were incapable of using it. None of the NVIDIA cards to date do (at least not without dropping FPS).

NVIDIA has said GTX 1070 is quite a bit faster than Titan X.

That's the point: the ACEs predate Mantle and DX12/Vulkan because on consoles the API could use them. Under DX11 it could not, so the ACEs sat doing nothing until Mantle/DX12 games.

In NV's gaming-performance chart comparing the Titan X to the 1080, the Titan X was at ~3.6 and the 1080 at ~4.4, i.e. ~22% faster.

Now, that's with 9 TFLOPS on the 1080.

The 1070 has 6.5 TFLOPS from NV's own numbers. There's no way that the 1080 being only ~22% faster than the Titan X leaves room for a cut-down 1070 to also be quite a bit faster than the Titan X. :) i.e. the 1080 is ~25-30% faster than the 1070 based on specs.
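Quick sanity check of those ratios (a sketch; the chart scores are eyeballed, and raw TFLOPS never map 1:1 to FPS):

```python
# Checking the ratios quoted above; chart scores are eyeballed estimates.
titan_x_score, gtx1080_score = 3.6, 4.4      # NV relative-performance chart
gtx1080_tflops, gtx1070_tflops = 9.0, 6.5    # NVIDIA's own FP32 numbers

speedup_1080_vs_titanx = gtx1080_score / titan_x_score - 1
tflops_gap_1080_vs_1070 = gtx1080_tflops / gtx1070_tflops - 1

print(f"1080 vs Titan X (chart): {speedup_1080_vs_titanx:.0%}")     # 22%
print(f"1080 vs 1070 (raw TFLOPS): {tflops_gap_1080_vs_1070:.0%}")  # 38%
```

The raw TFLOPS gap is ~38%; since game performance scales sublinearly with FLOPS, ~25-30% in practice is the plausible range.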
 

FordGT90Concept

"I go fast!1!11!1!"
Fury X can theoretically do 8.6 TFLOPS... if you can wake the whole chip up.

You're right that the GTX 1070 is probably around the performance of the top cards available now, or just below it.
 

Kanan

Tech Enthusiast & Gamer
Joined
Aug 22, 2015
Messages
3,517 (1.12/day)
Location
Europe
System Name eazen corp | Xentronon 7.2
Processor AMD Ryzen 7 3700X // PBO max.
Motherboard Asus TUF Gaming X570-Plus
Cooling Noctua NH-D14 SE2011 w/ AM4 kit // 3x Corsair AF140L case fans (2 in, 1 out)
Memory G.Skill Trident Z RGB 2x16 GB DDR4 3600 @ 3800, CL16-19-19-39-58-1T, 1.4 V
Video Card(s) Asus ROG Strix GeForce RTX 2080 Ti modded to MATRIX // 2000-2100 MHz Core / 1938 MHz G6
Storage Silicon Power P34A80 1TB NVME/Samsung SSD 830 128GB&850 Evo 500GB&F3 1TB 7200RPM/Seagate 2TB 5900RPM
Display(s) Samsung 27" Curved FS2 HDR QLED 1440p/144Hz&27" iiyama TN LED 1080p/120Hz / Samsung 40" IPS 1080p TV
Case Corsair Carbide 600C
Audio Device(s) HyperX Cloud Orbit S / Creative SB X AE-5 @ Logitech Z906 / Sony HD AVR @PC & TV @ Teufel Theater 80
Power Supply EVGA 650 GQ
Mouse Logitech G700 @ Steelseries DeX // Xbox 360 Wireless Controller
Keyboard Corsair K70 LUX RGB /w Cherry MX Brown switches
VR HMD Still nope
Software Win 10 Pro
Benchmark Scores 15 095 Time Spy | P29 079 Firestrike | P35 628 3DM11 | X67 508 3DM Vantage Extreme
I don't buy it. Polaris 10 will probably be around Fury X in performance which is a long ways behind GTX 1070. AMD knows this which is why hot on the heels of GTX 1070 being announced, AMD announced they're moving production of Vega (4096 stream processors) forward. In terms of hardware, Polaris 10 either matches or is slightly behind 390X; it makes up the difference with higher clock speeds.

I think you're way off with your predictions. First, don't fall for Nvidia's numbers; they're PR. Also, don't hype Nvidia. You do that on numerous occasions, always touting Nvidia as the source, which is worth basically nothing to anybody who wants to be seriously informed about these things.

Second, Polaris will be between 390X and Fury X speed, which means it could easily be faster than the GTX 1070, which is most probably, stupid PR aside, only as fast as a pumped-up GTX 980, or just as fast as a GTX 980 - nowhere near a GTX 980 Ti or Titan X. Again, don't fall for Nvidia PR/hype.

Also, IF AMD is moving production of Vega forward to October, it's clearly because of the GTX 1080, not the 1070. The 1070 is most probably (99%) not as fast as Nvidia claimed. Maybe even nowhere near it. PR is PR = 99% bullshit.

Fury/Fury X/Nano are crippled when their ACEs are silent. High resolution is the only way to utilize most of the chip. This problem is likely to recur with Vega, which also has 4096 stream processors. AMD is banking on D3D12/Vulkan putting those 4096 stream processors to work, but that can only happen as developers make the switch. This is why Vega has been kicked down the road while NVIDIA is coming out swinging.

NVIDIA has said GTX 1070 is quite a bit faster than Titan X.
The video you too easily discredited addresses very well why you're wrong about Vega: it has reworked prefetch units and command units that will greatly improve the efficiency of GCN 4 based shaders in DX11 games, which previously had problems on the GCN architecture. And DX12 was never a problem on AMD cards, as is well known. That said, I think Vega will destroy the GTX 1080 easily. It's basically a Fury X on crack - no limitations in hardware, better shaders, (a lot) better clocks.
 
Last edited:

FordGT90Concept

"I go fast!1!11!1!"
a) Look at my system specs. I literally only have one NVIDIA card to my name and it isn't even installed in a machine. My expression reading most of that: :laugh:
b) Should I have said GTX 1080? Yes.
c) Polaris has fewer shaders than the 390X available now. Polaris is no doubt an architectural improvement over GCN 1.1, but how much of an improvement is anyone's guess. That said, I don't think it is a coincidence that the Vega launch was moved up to this year not long after the GTX 1080 was announced. AMD wouldn't have done that if they were confident Polaris could go toe-to-toe with it. They'd rather avoid the miserable launch that Fury/Fury X/Nano saw, where there wasn't enough product to meet demand.
 
Last edited:

the54thvoid

Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
12,378 (2.37/day)
Location
Glasgow - home of formal profanity
Processor Ryzen 7800X3D
Motherboard MSI MAG Mortar B650 (wifi)
Cooling be quiet! Dark Rock Pro 4
Memory 32GB Kingston Fury
Video Card(s) Gainward RTX4070ti
Storage Seagate FireCuda 530 M.2 1TB / Samsumg 960 Pro M.2 512Gb
Display(s) LG 32" 165Hz 1440p GSYNC
Case Asus Prime AP201
Audio Device(s) On Board
Power Supply be quiet! Pure POwer M12 850w Gold (ATX3.0)
Software W10
I think Vega will destroy GTX 1080 easily. It's basically a Fury X on crack - no limitations in hardware, better shaders, (a lot) better clocks.

Do you have the specs for Vega? Because I haven't seen a single leak about them. Besides, it depends which way the two companies have done their backroom handshake. It's conceivable that Vega (I assume you mean the crippled one they release in October?) will beat out the 1080, but that would be expected: Vega is the successor to Fury X, so at least expect better-than-Fury X performance (but by how much %)?

However, with GP100 on limited release for HPC and the 1080 out shortly, the way is clear for Nvidia to finalise the chips for a consumer, desktop GP100 (GP102?) variant. You think that (I guess it would be the 1080 Ti) would maybe smack down Vega? GP100 has a different core structure to the 1080 part (leaked slides are out on Videocardz), so it will be interesting to see what the consumer version is.
 
Joined
Jan 11, 2005
Messages
1,491 (0.21/day)
Location
66 feet from the ground
System Name 2nd AMD puppy
Processor FX-8350 vishera
Motherboard Gigabyte GA-970A-UD3
Cooling Cooler Master Hyper TX2
Memory 16 Gb DDR3:8GB Kingston HyperX Beast + 8Gb G.Skill Sniper(by courtesy of tabascosauz &TPU)
Video Card(s) Sapphire RX 580 Nitro+;1450/2000 Mhz
Storage SSD :840 pro 128 Gb;Iridium pro 240Gb ; HDD 2xWD-1Tb
Display(s) Benq XL2730Z 144 Hz freesync
Case NZXT 820 PHANTOM
Audio Device(s) Audigy SE with Logitech Z-5500
Power Supply Riotoro Enigma G2 850W
Mouse Razer copperhead / Gamdias zeus (by courtesy of sneekypeet & TPU)
Keyboard MS Sidewinder x4
Software win10 64bit ltsc
Benchmark Scores irrelevant for me
As I understand it, AMD has a fully functional DX12 approach and NVIDIA can only "emulate" it so far.

Looking forward to seeing the results.
 
Joined
Oct 2, 2004
Messages
13,791 (1.94/day)
As I understand it, AMD has a fully functional DX12 approach and NVIDIA can only "emulate" it so far.

Looking forward to seeing the results.

You can't emulate async shaders. They are either async or they aren't. The whole point of async shaders is performance, since they can run graphics and compute tasks in parallel. It's not a graphical feature that needs to be emulated in order to render a certain effect. That's like saying, well, we'll just emulate pixel shaders. Sure you can, but performance will be so rubbish there won't be any point. Same with async: you can "emulate" it by switching between graphics and compute tasks during work, but then you're not really using async shaders, you're just doing what graphics cards have been doing already for ages.
 
Joined
Aug 20, 2007
Messages
20,713 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches
Software Windows 11 Enterprise (legit), Gentoo Linux x64
I think it was DX11.1 or 11.2 (whatever, one of those optional extensions) that brought async shaders into the fray. There was some flak at the time about Kepler being unable to support it while GCN 1.0 could. Part of the reason it never got used.

I don't recall the specifics, but some google-fu from the interested should verify this.

EDIT: Nope, I'm wrong. See below.
 
Last edited:
Joined
Oct 2, 2004
Messages
13,791 (1.94/day)
To put it in perspective...

This is how non async graphic cards share load between graphics and compute (G = graphics, C = compute). It is performed in sequence from start to end:

--time-->
GGGGGGCCCCGCGCGCGCCCCGG

This is how async does the exact same job, in parallel:

--time-->
GGGGGGGGGGGG
CCCCCCCCCCC

See how much shorter the time is to render all the tasks with async shaders?
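The diagram above can be turned into a toy timing model. The millisecond numbers are made up purely for illustration, assuming perfect overlap of the two queues:

```python
# Toy timing model of the diagram above: a serial pipeline runs graphics
# then compute back-to-back; async shaders overlap them on independent units.
def serial_time(graphics_ms, compute_ms):
    # no async: the two workloads queue up one after the other
    return graphics_ms + compute_ms

def async_time(graphics_ms, compute_ms):
    # perfect overlap: frame time is bounded by the longer of the two queues
    return max(graphics_ms, compute_ms)

g, c = 12.0, 11.0  # made-up per-frame workloads in milliseconds
print(serial_time(g, c))  # 23.0
print(async_time(g, c))   # 12.0
```

In this sketch the compute work is almost entirely hidden behind the graphics work, which is the whole point of async shaders.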
 
Joined
Jun 24, 2015
Messages
35 (0.01/day)
I think it was DX11.1 or 11.2 (whatever, one of those optional extensions) that brought async shaders into the fray. There was some flack at the time about Kepler being unable to support it, while GCN 1.0 could. Part of the reason it never got used.

I don't recall the specifics, but some google-fu from the interested should verify this.

Nope. Async shaders only became available with the advent of Mantle.

Prior APIs could NOT do parallel queues because they were all serial in nature (except on consoles).
 
Joined
Aug 20, 2007
Messages
20,713 (3.41/day)
Joined
Jun 24, 2015
Messages
35 (0.01/day)
Basically, AMD created GCN for console APIs. On PC it was running on one leg: all those ACEs idling, the Command Processor swamped and bottlenecked, unable to keep the shaders utilized.

AMD knew it, so they worked on a console-like API, aka Mantle. They then gave it away for free to Apple (Metal) and Khronos (Vulkan), and also shoved it at Microsoft to be the foundation of DX12.

With Polaris having a better Command Processor, I think AMD will fix their DX11 problem of low shader utilization.

That video linked earlier I think is spot on.

And now with the Pascal brief leaked, and NV talking about graphics/compute context-switching improvements as well as real, functional preemption, it looks like the other video about Pascal being GCN-like is also spot on.

Note NVIDIA even talk about VR and their new preemption.





^^ This guy basically called it.


And good chance he's right about Polaris as well:

 
Joined
Aug 20, 2007
Messages
20,713 (3.41/day)
They then gave it away for free to Apple (Metal) and Khronos (Vulkan), and also shoved it at Microsoft to be the foundation of DX12.

I'm going to have to [Citation Needed] that one. The basic "evidence" for DX12 being Mantle is similar developer guides. Well, no shit - they are both low-level APIs. Khronos's Vulkan, on the other hand, is a Mantle successor with a Mantle foundation. That part is accurate. Metal? Completely Apple's work, and arguably the highest-level of the bunch.
 
Joined
Jun 24, 2015
Messages
35 (0.01/day)
I'm going to have to [Citation Needed] that one. The basic "evidence" for DX12 being Mantle is similar developer guides. Well, no shit - they are both low-level APIs. Khronos's Vulkan, on the other hand, is a Mantle successor with a Mantle foundation. That part is accurate. Metal? Completely Apple's work, and arguably the highest-level of the bunch.

You can find the citations in the SIGGRAPH 2015 presentations about next-gen APIs. High similarities between all three at the code level.

Before that, I am sure you've seen the DX12 programming guide PDF lifting paragraphs straight from Mantle's programming guide - the plagiarism is strong.

You can also find lots of major developers on Twitter talking about how DX12 is déjà vu with Mantle. Connect the dots. Nobody at MS is going to come out and say DX12 used Mantle as a foundation. Nor Apple. For marketing/legal reasons.

But AMD gave it away for free because they took a gamble, as their people like to say on social media (Robert Hallock, google it), in the hope that their API would power all next-gen games.
 
Joined
Aug 20, 2007
Messages
20,713 (3.41/day)
You can find the citations in the SIGGRAPH 2015 presentations about next-gen APIs. High similarities between all three at the code level.

Before that, I am sure you've seen the DX12 programming guide PDF lifting paragraphs straight from Mantle's programming guide - the plagiarism is strong.

These are both due to the fact that low-level APIs tend to function similarly.

I don't see that as proof. Mind you, I can't rule it out either; I just don't believe it personally... for precisely the same reason: legal. Who in their right mind at AMD would give it away for free without credit due? There are other, more likely explanations I can see.
 
Joined
Jun 24, 2015
Messages
35 (0.01/day)
These are both due to the fact that low-level APIs tend to function similarly.

I don't see that as proof. Mind you, I can't rule it out either; I just don't believe it personally... for precisely the same reason: legal. Who in their right mind at AMD would give it away for free without credit due? There are other, more likely explanations I can see.

Here's a likely scenario.

AMD gave it away for free to spark development because MS was stagnating - a long time between DirectX releases.

They don't get credited because it would not fly well with NVIDIA/Intel PR/Marketing.

Only Vulkan gets the credit, because Khronos is chaired by NVIDIA and Intel is part of it.

This is what AMD PR personnel insinuate. Whether it's true or not, we just won't know until one of the key players comes out and talks about it. Maybe in 10 years' time, when none of this matters. ;)
 

FordGT90Concept

"I go fast!1!11!1!"
Preemption is important to VR. I haven't seen any NVIDIA material that says Pascal is capable of async workloads like GCN.

Metal is low-level like Mantle/D3D12/Vulkan. Metal, I think, is something Apple developed in-house from OpenGL and OpenCL. If they consulted with someone, it isn't clear who.

Edit: libGCM (PlayStation 3) predates Mantle and Metal:
http://www.redgamingtech.com/ice-te...apis-sonys-libgcm-first-modern-low-level-api/


The only direct descendant of Mantle is Vulkan. D3D12 may have gotten some nudges from AMD but D3D12 is still predominantly Microsoft's (DirectX has always been a collaborative effort).
 
Last edited:
Joined
Nov 5, 2012
Messages
63 (0.02/day)
Location
South Africa
Can someone do a writeup? I absolutely cannot stand videos.

Basically, you can boil it down to two parts:

1) DirectX 12 can address multiple GPUs as a single entity, but only as far as actual jobs are concerned, because most GPUs don't have the hardware necessary to allow pooled memory. Vulkan is technically capable of this as well.

Couple that benefit with this: https://www.reddit.com/r/Amd/commen...group_qa_is_happening_here_on/d0mkcc2.compact

More specifically:

2) Q: Do you expect interposers to experience a moores law like improvement trend?
A: This is one of my favorite questions on the thread. In fact, interposers are a great way to advance Moore's law. High-performance silicon interposers permit the integration of different process nodes, different process optimizations, different materials (optics vs. metals), or even very different IC types (logic vs. storage) all on a common fabric that transports data at the speed of a single integrated chip. As we ("the industry") continue to collapse more and more performance and functionality into a common chip, like we did with Fiji and the GPU+RAM, the interposer is a great option to improve socket density.

Yes, it is absolutely possible that one future for the chip design industry is breaking out very large chips into multiple small and discrete packages mounted to an interposer. We're a long ways off from that as an industry, but it's definitely an interesting way to approach the problem of complexity and the expenses of monolithic chips.
The writing has been on the wall for a while now, especially if you've been following the Zen news. There's a reason no-one has leaked the socket AM4 design just yet:



"Multiple units can be combined for even greater performance"
"High speed interconnect links multiple units together"

AMD still owns the SeaMicro Freedom Fabric IP, which can link CPUs together on an interposer.

Ergo, socket AM4 could rely on the use of interposers, as will the Navi GPU architecture. AMD slowly works to implement HBM on everything it can, while NVIDIA deals with the slow graduation from GDDR5 to G5X and eventually HBM a few years down the line, while still building larger dies at TSMC.

I'm not saying the video OP is right, but he's on a similar track to what I've been talking about for a few years now, since the launch of Fiji and the reveal of what makes DX12 tick. Using interposers everywhere would change the game completely, and AMD gets to ignore Moore's law and the issues with designing larger dies as a result.

As I understand it, AMD has a fully functional DX12 approach and NVIDIA can only "emulate" it so far.

Looking forward to seeing the results.

NVIDIA can emulate the asynchronous behaviour to a point, but it can only use preemption to do so, which means that an SMM will be tasked with doing a compute function and won't be usable for graphics work until that task is finished.

Heavy async workloads mean there's less juice available for graphics commands, which is why the Hitman devs were talking about how piling too much async work onto the game would be a detriment to performance - not to AMD, because its architecture is designed to do it fast, but to NVIDIA.
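The trade-off can be sketched with a toy model. All the numbers here are made up, and `switch_ms` is an assumed per-preemption cost, not a measured NVIDIA figure:

```python
# Hypothetical model of the trade-off described above: "emulated" async
# preempts the graphics queue for each compute task and pays a context-switch
# cost, while hardware async overlaps the two queues on independent engines.
def preemption_time(graphics_ms, compute_tasks_ms, switch_ms=0.1):
    # each compute task interrupts graphics: two switches (in and out) per task
    switches = 2 * len(compute_tasks_ms) * switch_ms
    return graphics_ms + sum(compute_tasks_ms) + switches

def hardware_async_time(graphics_ms, compute_tasks_ms):
    # true async: compute runs alongside graphics on separate queues
    return max(graphics_ms, sum(compute_tasks_ms))

g = 10.0             # graphics workload per frame (ms, made up)
c = [1.0, 2.0, 1.5]  # async compute tasks (ms, made up)
print(preemption_time(g, c))      # ~15.1 (compute serialized + switch cost)
print(hardware_async_time(g, c))  # 10.0 (compute hidden behind graphics)
```

The more compute tasks get piled on, the worse the preemption path fares relative to true async, which matches the Hitman anecdote above.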
 
Joined
Aug 20, 2007
Messages
20,713 (3.41/day)
AMD gets to ignore Moore's law and the issues with designing larger dies as a result.

Good writeup. A small nitpick with the above, though... Technically, Moore's law says nothing about large-die limitations. It speaks of transistor counts doubling every couple of years, or something like that. So AMD isn't IGNORING Moore's law at all, but rather being given more time to play within its statements. So the way you phrased that is slightly incorrect. ;)

Otherwise, thanks for that writeup. It seems pretty well founded, and if AMD actually planned interposers this well, congrats to them. I'm curious whether the expense of producing interposers and their poor yields will play against them, however.
 
Joined
Nov 5, 2012
Messages
63 (0.02/day)
Location
South Africa
Good writeup. A small nitpick with the above, though... Technically, Moore's law says nothing about large-die limitations. It speaks of transistor counts doubling every couple of years, or something like that. So AMD isn't IGNORING Moore's law at all, but rather being given more time to play within its statements. So the way you phrased that is slightly incorrect. ;)

That's true, I could word it better. Or I could explain it in DBZ terms:

OG Cell is the interposer. AMD adds in Android 17 and 18 to form Perfect Cell, which is perfectly capable of using Cell Jrs to tackle smaller tasks while what remains of Cell is able to fight off and dominate the Z-fighters.

No, wait, that doesn't work either. Hmmm.
 
Joined
Aug 20, 2007
Messages
20,713 (3.41/day)
That's true, I could word it better. Or I could explain it in DBZ terms:

OG Cell is the interposer. AMD adds in Android 17 and 18 to form Perfect Cell, which is perfectly capable of using Cell Jrs to tackle smaller tasks while what remains of Cell is able to fight off and dominate the Z-fighters.

No, wait, that doesn't work either. Hmmm.

I think you did better the first time, lol.
 
Joined
Sep 17, 2014
Messages
20,780 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
That's true, I could word it better. Or I could explain it in DBZ terms:

OG Cell is the interposer. AMD adds in Android 17 and 18 to form Perfect Cell, which is perfectly capable of using Cell Jrs to tackle smaller tasks while what remains of Cell is able to fight off and dominate the Z-fighters.

No, wait, that doesn't work either. Hmmm.

Great considerations to start the day. Thanks :D
 

Kanan

Tech Enthusiast & Gamer
a) Look at my system specs. I literally only have one NVIDIA card to my name and it isn't even installed in a machine. My expression reading most of that: :laugh:
b) Should I have said GTX 1080? Yes.
c) Polaris has fewer shaders than the 390X available now. Polaris is no doubt an architectural improvement over GCN 1.1, but how much of an improvement is anyone's guess. That said, I don't think it is a coincidence that the Vega launch was moved up to this year not long after the GTX 1080 was announced. AMD wouldn't have done that if they were confident Polaris could go toe-to-toe with it. They'd rather avoid the miserable launch that Fury/Fury X/Nano saw, where there wasn't enough product to meet demand.
I never said Polaris goes toe-to-toe with the GTX 1080; I said it maybe does so with the GTX 1070 (especially now that we know the disappointing specs of the GTX 1070, with only 1920 shaders enabled). Yes, Polaris has fewer shaders than Hawaii, but it also has a maybe 50% (or more) higher clock - easily enough to make it faster than Hawaii. But well, let's wait and see; I'm not sure on it either. I'd like it to be so, because I'd like to buy the card if it really performs well and has a good price (250-350€).

Do you have the specs for Vega? Because I haven't seen a single leak about them. Besides, it depends which way the two companies have done their backroom handshake. It's conceivable that Vega (I assume you mean the crippled one they release in October?) will beat out the 1080, but that would be expected: Vega is the successor to Fury X, so at least expect better-than-Fury X performance (but by how much %)?

However, with GP100 on limited release for HPC and the 1080 out shortly, the way is clear for Nvidia to finalise the chips for a consumer, desktop GP100 (GP102?) variant. You think that (I guess it would be the 1080 Ti) would maybe smack down Vega? GP100 has a different core structure to the 1080 part (leaked slides are out on Videocardz), so it will be interesting to see what the consumer version is.
Specs of Vega, as on numerous (albeit rumour) sites, are 4096 shaders based on ip9, which means these are NEW shaders; Polaris is based on ip8, which means it uses the Fiji shaders with everything else upgraded.

Yes, it could be either a crippled big chip (600mm²) or a smaller chip of about 400-450mm². The big chip (again, rumours) is said to be Vega 11, released later with 6144 shaders tops.

btw. the "ip9" vs "ip8" shader distinction is no rumour; it's based on AMD information.

Source: http://wccftech.com/amd-vega-10-gpu-4096-stream-processors/
(yes wccftech, I know, but it's okay for now)

If a GP102 or desktop GP100 variant is a lot better than Pascal (GP104), it has a good chance against Vega, I'd say (talking shader counts). If not, then not. So far I'm not impressed by Pascal, because so far it's only a Maxwell on speed (a nearly similar number of shaders at higher clocks; the problem is, an overclocked or custom 980 Ti is nearly as fast, because it doesn't use the low clocks of the reference cards). Maybe tomorrow PCGH will release benchmarks of the GTX 1080 against custom cards; then we can talk about it with numbers as proof. My estimate is 5% for GTX 1080 vs custom 980 Ti, and ~15%+ for GTX 1080 custom vs custom 980 Ti (overclocking the GTX 1080 increases clocks to about 2100 MHz, which is about a 13% speed increase).
 
Joined
Jun 24, 2015
Messages
35 (0.01/day)
Now that we've seen the Pascal GTX 1080, the gap isn't that big. Vega should defeat it easily - and be very expensive due to HBM2. hah
 