• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

AMD Radeon VII Detailed Some More: Die-size, Secret-sauce, Ray-tracing, and More

Joined
Feb 18, 2005
Messages
5,238 (0.75/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Logitech G613
Software Windows 10 Professional x64
WCC level of speculations in an article that in essence highlights what nVidia would like to highlight about VegaVII vs 2080.

How do things of that kind work, do authors of these texts simply root for nVidia, do they come from Huang's headquarters, are they somehow censored by NV's PR team?
Just curious.

I'm not even sure what you're trying to say here, although your usual crying about NVIDIA bias is obvious.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
WCC level of speculations in an article that in essence highlights what nVidia would like to highlight about VegaVII vs 2080.

How do things of that kind work, do authors of these texts simply root for nVidia, do they come from Huang's headquarters, are they somehow censored by NV's PR team?
Just curious.
A reminder for AMD, build a GPU with large classic GPU hardware NOT medium size GP104 class classic GPU hardware with 13 TFLOPS DSP.
 
Joined
Oct 14, 2017
Messages
210 (0.09/day)
System Name Lightning
Processor 4790K
Motherboard asrock z87 extreme 3
Cooling hwlabs black ice 20 fpi radiator, cpu mosfet blocks, MCW60 cpu block, full cover on 780Ti's
Memory corsair dominator platinum 2400C10, 32 giga, DDR3
Video Card(s) 2x780Ti
Storage intel S3700 400GB, samsung 850 pro 120 GB, a cheep intel MLC 120GB, an another even cheeper 120GB
Display(s) eizo foris fg2421
Case 700D
Audio Device(s) ESI Juli@
Power Supply seasonic platinum 1000
Mouse mx518
Software Lightning v2.0a
I suspected for some time that augmenting polaris was better than using vega for games, look verry "simple" for AMD: just double everything on polaris and you got a steallar gaming card :)
 
Joined
Jul 9, 2015
Messages
3,413 (1.06/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
...crying about NVIDIA...
Because "lack of tensor cores" (who the hell needs them in gaming) and elusive "RT stuff" (how many games support it, one?) is so important to highlight when talking about AMD product.
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Because "lack of tensor cores" (who the hell needs them in gaming) and elusive "RT stuff" (how many games support it, one?) is so important to highlight when talking about AMD product.
Futureproof is a large argument for Radeon 7, mostly with the 16GB VRAM. Similar argument can be made for RTX and DLSS.
 
Joined
Jul 9, 2015
Messages
3,413 (1.06/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
Futureproof is a large argument for Radeon 7, mostly with the 16GB VRAM. Similar argument can be made for RTX and DLSS.
Except we have seen newer games use more RAM (and with consoles beefed up and pushing 4 it will be a given), and god knows if RT will be anything, but "nvidia paid us to implement it, so here is that gimmick for ya" until RT could be run by the masses and as for DLSS, its usefulness is arguable at best.
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
VRAM is not that straightforward either. More is always good but actual usefulness is not that clear. New consoles are next year (2020) at best and current generation sits at 8GB RAM total (XBox One X is an outlier with 12 but that will not change things much). Given the GTX1080Ti/RTX2080 performance if Radeon 7 will perform in the same level it will not really be a 4K card.
 
Joined
Oct 14, 2017
Messages
210 (0.09/day)
System Name Lightning
Processor 4790K
Motherboard asrock z87 extreme 3
Cooling hwlabs black ice 20 fpi radiator, cpu mosfet blocks, MCW60 cpu block, full cover on 780Ti's
Memory corsair dominator platinum 2400C10, 32 giga, DDR3
Video Card(s) 2x780Ti
Storage intel S3700 400GB, samsung 850 pro 120 GB, a cheep intel MLC 120GB, an another even cheeper 120GB
Display(s) eizo foris fg2421
Case 700D
Audio Device(s) ESI Juli@
Power Supply seasonic platinum 1000
Mouse mx518
Software Lightning v2.0a
yhe DLSS suckx, RT mutch better than DLSS, I don't mind RT. I see this limited RT as another rasterization hack: there is no way to properly reflections in rasterization so this was the only way, this is whay it used for reflections in battelfield 5 (I think). but it show that even this littel usage is so insane heavy that crawl everything. if you think about the 2000 cards you see verry littel innovation for improved graphics: selective shaders are for reducing graphics. RT is the only thing that is innovation for increasing graphics. some people alredy mentioned that nvidea reach limit in term of rasterization, and performance shows that the rasterization cores give verry littel improvement compared to pascal, if you think about the increasing silicon problems and costs, engineer (mutch) better rasterization cores would be expencive, it seem to me that both nvidea and AMD did the same thing: they saved re-engineering costs this round and I think I know whay - it not worth with this round of silicon, I think some bigger improvements in silicon like used to be in like 2007 with 65nm to 45 nm re-engineering was worth becouse it gived so mutch, with modern verry littel improvements it not :)
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
@Manoa , what do you mean by innovation? RT and Tensor stuff is new and unused but the shaders themselves are based on Volta, not Pascal and come with couple of other new things as well. The features they come with are actually often enough very close to what Vega brought to the table. RPM seems to be the big one here. Turing has RPM same as Vega - which accounts for several benchmark wins Vega had over Pascal and no longer has over Turing. To bolster cache and memory system Turing actually got caches increased even compared to Volta. Mesh shaders are suspected to have some commonalities with primitive shaders, at least in the idea if maybe not implementation.

This time around Radeon VII is the one with less innovation. Even less so if we look at gaming. Not sure if 1:2 FP64 and wider memory bus count as innovation here.

All that is architecturally speaking. 7nm is a quality on its own :)
 
Joined
Oct 14, 2017
Messages
210 (0.09/day)
System Name Lightning
Processor 4790K
Motherboard asrock z87 extreme 3
Cooling hwlabs black ice 20 fpi radiator, cpu mosfet blocks, MCW60 cpu block, full cover on 780Ti's
Memory corsair dominator platinum 2400C10, 32 giga, DDR3
Video Card(s) 2x780Ti
Storage intel S3700 400GB, samsung 850 pro 120 GB, a cheep intel MLC 120GB, an another even cheeper 120GB
Display(s) eizo foris fg2421
Case 700D
Audio Device(s) ESI Juli@
Power Supply seasonic platinum 1000
Mouse mx518
Software Lightning v2.0a
so nvidea did improve the rasterization cores after all, I guess it becouse they have so many money so AMD can't afford it then that the problem
lel man volta 3000$ xD
Not sure if 1:2 FP64 and wider memory bus count as innovation here
it not :x but I wish it was: if games used high precision computations it could give better graphics but I think the cards would burn becouse high power will used and temperature...
 
Last edited:
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
if games used high precision computations it could give better graphics but I think the cards would burn becouse high power will used and temperature...
There does not seem to be any need for higher precision. Even if there was, 1:2 FP64 is the best case scenario for performance. Current trend is exactly the opposite - using FP16 for some calculations that do not need the precision. Use of that is very situational and does not bring great boost but these days every little bit helps. This is where Rapid Packed Math (RPM) comes in, this runs 2:1 FP16 ;)
 
Joined
Oct 14, 2017
Messages
210 (0.09/day)
System Name Lightning
Processor 4790K
Motherboard asrock z87 extreme 3
Cooling hwlabs black ice 20 fpi radiator, cpu mosfet blocks, MCW60 cpu block, full cover on 780Ti's
Memory corsair dominator platinum 2400C10, 32 giga, DDR3
Video Card(s) 2x780Ti
Storage intel S3700 400GB, samsung 850 pro 120 GB, a cheep intel MLC 120GB, an another even cheeper 120GB
Display(s) eizo foris fg2421
Case 700D
Audio Device(s) ESI Juli@
Power Supply seasonic platinum 1000
Mouse mx518
Software Lightning v2.0a
yhe it the true, 780 Ti 192 FP32, maxwell 96, pascal 64 ?
you don't think graphics is lower when used 16 bit float insted of 32 bit ?
I meen in sound this the true, more bit/sample give more quality.
there is also floated textures, I don't realy understand whay it don't need, it a trade ? more speed over more quality ? or more accuracy don't give more quality at all ?
I know GIMP have float point mode operation, and on the right monitor it realy look good...
 
Last edited:
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
I suspected for some time that augmenting polaris was better than using vega for games, look verry "simple" for AMD: just double everything on polaris and you got a steallar gaming card :)
RX-580 2X wide would have
4MB L2 cache
12 TFLOPS at 1340 Mhz,
64 ROPS at 1340 Mhz, bottlenecked problem. Polaris ROPS are not connected to L2 cache, hence highly dependant on external memory performance when compared to Vega 64 ROPS.
8 raster engines at 1340 Mhz, equivalent to six raster engines at 1800Mhz
512 GB/s memory bandwidth.
Still has problems with 64 ROPS at low clock speed.

Ideally, Vega M GH 2X wide would have
48 CU at 1536 Mhz would yield 9.4 TFLOPS
8 raster engines with 8 Shader Engines at 1536 Mhz
128 ROPS at 1536 Mhz

Futureproof is a large argument for Radeon 7, mostly with the 16GB VRAM. Similar argument can be made for RTX and DLSS.
DLSS is just pixel reconstruction with multiple samples from previous frames which sounds like PS4 Pro's pixel reconstruction process.

https://www.pcgamer.com/nvidia-turing-architecture-deep-dive/

On previous architectures, the FP cores would have to stop their work while the GPU handled INT instructions, but now the scheduler can dispatch both to independent paths. This provides a theoretical immediate performance improvement of 35 percent per core.​

https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/

Turing introduces a new processor architecture, the Turing SM, that delivers a dramatic boost in shading efficiency, achieving 50% improvement in delivered performance per CUDA Core compared to the Pascal generation. These improvements are enabled by two key architectural changes. First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath. In previous generations, executing these instructions would have blocked floating-point instructions from issuing.
https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/

Turing Tensor Cores add new INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization and don’t require FP16 precision. Turing Tensor Cores bring new deep learning- based AI capabilities to GeForce gaming PCs and Quadro-based workstations for the first time. A new technique called Deep Learning Super Sampling (DLSS) is powered by Tensor Cores. DLSS leverages a deep neural network to extract multidimensional features of the rendered scene and intelligently combine details from multiple frames to construct a high-quality final image​

VII supports INT8 and INT4 for deep learning- based AI capabilities.

Refer to Microsoft's DirectML. Read https://www.highperformancegraphics.org/wp-content/uploads/2018/Hot3D/HPG2018_DirectML.pdf
 
Last edited:

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Async compute and Sync compute shaders has TMU read-write path software optimizations.
Again, read Avalanche Studios lecture on TMU read-write workaround on ROPS bound situations.

View attachment 114519
And yet, the numbers say it doesn't really matter:
https://www.techspot.com/review/1762-just-cause-4-benchmarks/


AMD knows ROPS bound problem hence compute shader's read-write path workaround marketing push and VII's 1800Mhz clock speed with lower CU count.
Neither of those things have anything to do with ROPs and everything to do with TSMC 7nm.

NVIDIA's GPU designs has higher clock speeds to speed up classic GPU hardware.
Maxwell and newer utilize long render pipelines which translates to higher clockspeeds. AMD did similar with the NCU in Vega.

RX-580... AMD hasn't mastered 64 ROPS over 256 bit bus design, hence RX-580 is stuck at 32 ROPS with 256 bit bus. R9-290X has 64 ROPS with 512 bit bus which is 2X scale over R9-380X/RX-580's design.
And yet RX 580 is faster than R9 290X by a great deal.
 
Last edited:
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
Because "lack of tensor cores" (who the hell needs them in gaming) and elusive "RT stuff" (how many games support it, one?) is so important to highlight when talking about AMD product.
For tensor issue, AMD plans to support DirectML
And yet, the numbers say it doesn't really matter:
https://www.techspot.com/review/1762-just-cause-4-benchmarks/



Neither of those things have anything to do with ROPs and everything to do with TSMC 7nm.


Maxwell and newer utilize long render pipelines which translates to higher clockspeeds. AMD did similar with the NCU in Vega.


And yet RX 580 is faster than R9 290X by a great deal.
1. Too bad for you,
RTX 2080 has 4 MB L2 cache while GTX 1080 Ti has ~3 MB L2 cache. This is important for tile cache rendering.
nvidia tile cache.jpg


2. Wrong, Without memory bandwidth increase, Vega 56 at 1710 Mhz with faster ROPS and raster engines beating Strix Vega 64 at 1590 Mhz shows VII's direction with 1800 Mhz with memory bandwidth increase.

3. NCU itself doesn't compete the graphics pipeline.

4. RX-580 has delta color compression, 2 MB L2 cache for TMU and geometry (not connected to ROPS), higher clock speed for geometry/quad rastizer units and 8 GB VRAM selection.

R9-290X/R9-390X wasn't updated with Polaris IP upgrades. R9-390X has 8 GB VRAM.


Only Xbox One X's 44 CU GPU has the updates. NAVI 12 has 40 CU with unknown ROPS count, 256 bit GDDR6 and clock speed comparable to VII

You selected NVIDIA gameworks tile with geometry bias, NV GPUs has higher clock speed for geometry and raster engines.

RTX 2080 has six GPC with six raster engines, 4MB L2 cache and 64 ROPS with up to 1900 Mhz stealth overclock. Higher L2 cache storage = lower latency, less external memory hit rates.
GTX 1080 Ti has six GPC with six raster engines, 3MB L2 cache and 88 ROPS with up to 1800 Mhz stealth overclock.

3487026-0748377122-index.png

R9-390X's 5.9 TFLOPS beats RX-480's 5.83 TFLOPS

R9-390 Pro's 5.1 TFLOPS beats RX-480's 5.83 TFLOPS
 

Attachments

  • R9-290X b3d-bandwidth.gif
    R9-290X b3d-bandwidth.gif
    6.8 KB · Views: 324
  • RX 480 b3d-bandwidth.png
    RX 480 b3d-bandwidth.png
    5 KB · Views: 276
Last edited:

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Only Xbox One X's 44 CU GPU has the updates. NAVI 12 has 40 CU with unknown ROPS count, 256 bit GDDR6 and clock speed comparable to VII
Xbox One X uses a Polaris design (40 CUs, 2560 shaders, 32 ROPs, 160 TMUs). Virtually nothing is known about Navi at this point other than it is coming.

You selected NVIDIA gameworks tile with geometry bias, NV GPUs has higher clock speed for geometry and raster engines.
Don't know what games Avalanche Studios makes, do you? Hint: I referenced Just Cause 4 for a reason.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
Xbox One X uses a Polaris design (40 CUs, 2560 shaders, 32 ROPs, 160 TMUs). Virtually nothing is known about Navi at this point other than it is coming.

Don't know what games Avalanche Studios makes, do you? Hint: I referenced Just Cause 4 for a reason.
1. Not 100 percent correct. X1X GPU's ROPS has 2MB render cache which doesn't exist for Polaris like RX-580. https://gpucuriosity.wordpress.com/...der-cache-size-advantage-over-the-older-gcns/


2. So what? Just Cause 4 is a Gameworks title. Avalanche Studios knows ROPS bound workaround with TMUs

Without Gameworks,


 
Last edited:
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Forza Horizon 4 clearly has some type of problem for Nvidia cards at low resolutions. FH3 initially had some CPU usage issue for Nvidia cards and as far as I can see, so does FH4.
By 2160p AMD cards will drop off far faster than Nvidia counterparts.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
Forza Horizon 4 clearly has some type of problem for Nvidia cards at low resolutions. FH3 initially had some CPU usage issue for Nvidia cards and as far as I can see, so does FH4.
By 2160p AMD cards will drop off far faster than Nvidia counterparts.
If there's CPU bound issue, several GPU's frame rate results would flat line into common frame rate number.

Vega 64 has inferior delta color compression when compared NVIDIA's version which is remedied by VII's higher 1TB/s memory bandwidth.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
1. Not 100 percent correct. X1X GPU's ROPS has 2MB render cache which doesn't exist for Polaris like RX-580. https://gpucuriosity.wordpress.com/...der-cache-size-advantage-over-the-older-gcns/
Vega does have L2 cache for ROPs but it's not clear how much. Cache is expensive.

2. So what? Just Cause 4 is a Gameworks title. Avalanche Studios knows ROPS bound workaround with TMUs
Avalanche Studios clearly went out of their way to do optimization work for Vega yet, the gains are small. Most likely the game was engineered to run on Pascal and it hammered the ROPs. Transitioning some of the workload off of the ROPs solved their problem. That's not necessarily because of a design flaw in Vega, more that they were porting their rendering code from Pascal to Vega and had to create a work around where architecturally they are different. That's what optimization is about in general.

Without Gameworks,
Now you're spamming random benchmarks that in no way prove your point.



Edit: Apparently Radeon VII can use DirectML which in practice can replace DLSS:
https://www.overclock3d.net/news/gp..._supports_directml_-_an_alternative_to_dlss/1

That may imply that Vega 20 has tensor cores.
 
Last edited:
Joined
Jun 3, 2010
Messages
2,540 (0.50/day)
Nvidia has DCC, TBR, L2 client ROPs and double ROPs per AMD counterpart.
You cannot compare L2 client ROPs memory controller's overclock range with discrete ROPs memory controller. The memory controller of L2-ROPs only dispatches batched transactions, it has much less traffic since you need to hit maximum memory transfer rate to hit its 'actual' clock. ROPs are at a different frequency than the memory controller.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Joined
Jun 3, 2010
Messages
2,540 (0.50/day)
Besides, Nvidia has texture L1 client to L2(TBR) and uses DCC to compress any L1 traffic in L2. Normally their texture bandwidth without L1 caching is the memory interface bandwidth. With texture bandwidth amplification, the card does not write more but can read more textures than AMD's 1TB/s card.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
Besides, Nvidia has texture L1 client to L2(TBR) and uses DCC to compress any L1 traffic in L2. Normally their texture bandwidth without L1 caching is the memory interface bandwidth. With texture bandwidth amplification, the card does not write more but can read more textures than AMD's 1TB/s card.
R9-290X at 1Ghz already has 1TB/s L2 cache bandwidth which makes GCNs cryptocurrency friendly cards

VII's 1.8Ghz scaling would be about 1.8 TB/s L2 cache bandwidth. Vega has compressed texture I/O from external memory.

Don't make me run CUDA app L2 cache benchmark on my GTX 1080 Ti and GTX 980 Ti.

I was wrong here, Vega 20 presumably runs DirectML on its compute shaders. DirectML can use tensor cores (if available), compute shaders (if available), or CPU cores.
On older DX12 hardware, all lesser datatypes are either emulated into 32bit datatype or run at the same rate as 32bit datatype while newer hardware has higher performance benefits.

DirectML is important for uniformed API access to rapid pack math features in newer hardware while offer software compatibility with older hardware. DirectML benefits the next Xbox One hardware release.

Polaris GPU already pack math feature but it's usage doesn't increase TFLOPS rate and reduces the available stream processors for 32bit datatypes while Vega fixes this Polaris issue.


Besides, Nvidia has texture L1 client to L2(TBR) and uses DCC to compress any L1 traffic in L2. Normally their texture bandwidth without L1 caching is the memory interface bandwidth. With texture bandwidth amplification, the card does not write more but can read more textures than AMD's 1TB/s card.
R9-290X at 1Ghz has 1TB/s L2 bandwidth while GTX 980 Ti has about 600 GB/s L2 bandwidth (via CUDA app, disables DCC). R9-290X's ROPS are not connected to L2 cache while GTX 980 Ti's ROPS are connected to L2 cache (for tile cache render loop).

VII's 1800Mhz scale from R9-290X's 1Ghz design reaches to 1.8 TB/s L2 cache bandwidth.

Vega 56/64 has 4MB L2 cache for TMU and ROPS.

1. Vega does have L2 cache for ROPs but it's not clear how much. Cache is expensive.


2. Avalanche Studios clearly went out of their way to do optimization work for Vega yet, the gains are small. Most likely the game was engineered to run on Pascal and it hammered the ROPs. Transitioning some of the workload off of the ROPs solved their problem. That's not necessarily because of a design flaw in Vega, more that they were porting their rendering code from Pascal to Vega and had to create a work around where architecturally they are different. That's what optimization is about in general.


3. Now you're spamming random benchmarks that in no way prove your point.



Edit: Apparently Radeon VII can use DirectML which in practice can replace DLSS:
https://www.overclock3d.net/news/gp..._supports_directml_-_an_alternative_to_dlss/1

That may imply that Vega 20 has tensor cores.
1. Vega 56/64 has 4MB L2 cache. https://www.tomshardware.com/news/visiontek-radeon-rx-vega-64-graphics-card,35280.html

2. Not complete. Geometry/Raster Engines are another problem for AMD GPUs when NVIDIA GPU counterparts has higher clockspeed. Vega 56 at 1710Mhz with 12 TFLOPS beating Strix Vega 64 at 1590Mhz with 13 TFLOPS shows higher clockspeed improves Geometry/Raster Engines/ROPS/L2 cache despite Vega 56's lower TFLOPS.

3. Don't deny Gameworks issues. Hint: Geometry and related rasterization conversion process, and NVIDIA GPU counterparts has higher clockspeed that benefits classic GPU hardware. I advocate for AMD to reduce CU count (reduce power consumption) and trade for higher clock speed e.g. Vega 48 with 1900Mhz to 2Ghz range.
 
Last edited:
Joined
Jun 3, 2010
Messages
2,540 (0.50/day)
R9-290X at 1Ghz already has 1TB/s L2 cache bandwidth which makes GCNs cryptocurrency friendly cards

VII's 1.8Ghz scaling would be about 1.8 TB/s L2 cache bandwidth. Vega has compressed texture I/O from external memory.

Don't make me run CUDA app L2 cache benchmark on my GTX 1080 Ti and GTX 980 Ti.


On older DX12 hardware, all lesser datatypes are either emulated into 32bit datatype or run at the same rate as 32bit datatype while newer hardware has higher performance benefits.

DirectML is important for uniformed API access to rapid pack math features in newer hardware while offer software compatibility with older hardware. DirectML benefits the next Xbox One hardware release.

Polaris GPU already pack math feature but it's usage doesn't increase TFLOPS rate and reduces the available stream processors for 32bit datatypes while Vega fixes this Polaris issue.



R9-290X at 1Ghz has 1TB/s L2 bandwidth while GTX 980 Ti has about 600 GB/s L2 bandwidth (via CUDA app, disables DCC). R9-290X's ROPS are not connected to L2 cache while GTX 980 Ti's ROPS are connected to L2 cache (for tile cache render loop).

VII's 1800Mhz scale from R9-290X's 1Ghz design reaches to 1.8 TB/s L2 cache bandwidth.

Vega 56/64 has 4MB L2 cache for TMU and ROPS.


1. Vega 56/64 has 4MB L2 cache. https://www.tomshardware.com/news/visiontek-radeon-rx-vega-64-graphics-card,35280.html

2. Not complete. Geometry/Raster Engines are another problem for AMD GPUs when NVIDIA GPU counterparts has higher clockspeed. Vega 56 at 1710Mhz with 12 TFLOPS beating Strix Vega 64 at 1590Mhz with 13 TFLOPS shows higher clockspeed improves Geometry/Raster Engines/ROPS/L2 cache despite Vega 56's lower TFLOPS.

3. Don't deny Gameworks issues. Hint: Geometry and related rasterization conversion process, and NVIDIA GPU counterparts has higher clockspeed that benefits classic GPU hardware. I advocate for AMD to reduce CU count (reduce power consumption) and trade for higher clock speed e.g. Vega 48 with 1900Mhz to 2Ghz range.
I concur, however I was pointing out that the IMC has less consequences in a TBR & L2-ROP design. AMD would certainly be able to clock the gpu higher in case they integrated TBR, but also most of Nvidia's advantage is due to r:w amplification through TBR, not frequency alone. They can only write 616GB/s, yes, but setup occurs in reference of texture reads at 1.5TB/s.
 
Top