Tuesday, May 7th 2019
AMD Radeon RX 3080 XT "Navi" to Challenge RTX 2070 at $330
Rumors of AMD's next-generation performance-segment graphics card are gaining traction following a leak of what is possibly its PCB. Tweaktown put out a boatload of information on the so-called Radeon RX 3080 XT graphics card, bound for an E3 2019 launch shortly after a Computex unveiling. Based on the 7 nm "Navi 10" GPU, the RX 3080 XT will feature 56 compute units based on the faster "Navi" architecture (3,584 stream processors), and 8 GB of GDDR6 memory across a 256-bit wide memory bus.
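The leaked figures are internally consistent, which is worth a quick sanity check. The sketch below assumes the GCN convention of 64 stream processors per compute unit and a typical 2019-era GDDR6 per-pin data rate of 14 Gbps; the data rate is our assumption, not part of the leak.

```python
# Back-of-the-envelope check of the leaked RX 3080 XT specs.

def stream_processors(compute_units, sp_per_cu=64):
    """GCN-style GPUs pack 64 stream processors per compute unit."""
    return compute_units * sp_per_cu

def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: (bus width in bits / 8) x per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

print(stream_processors(56))             # 3584, matching the leak
print(memory_bandwidth_gbs(256, 14.0))   # 448.0 GB/s (assuming 14 Gbps GDDR6)
```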
The source makes two very sensational claims: one, that the RX 3080 XT performs competitively with NVIDIA's $499 GeForce RTX 2070; and two, that AMD could start a price war against NVIDIA by aggressively pricing the card around the $330 mark, or about two-thirds the price of the RTX 2070. If either claim holds true, let alone both, AMD will fire up the performance segment once again, forcing NVIDIA to revisit the RTX 2070 and RTX 2060.
Source:
Tweaktown
213 Comments on AMD Radeon RX 3080 XT "Navi" to Challenge RTX 2070 at $330
AMD didn't develop its Bulldozer architecture the way it could have, had it chosen to go for high performance. We have no idea what a Keller-level talent could have done with it, let alone what more ordinary engineers could have done had AMD chosen to follow Piledriver with a successor on a high-performance node (e.g. 22nm IBM or even 32nm GF), designed with the things Piledriver was missing: better micro-op caching, more capable individual cores, better AVX performance (e.g. fixing the regression from Bulldozer) plus AVX2 support, and an L3 cache with decent performance. I have also heard anecdotally that Linux runs Piledriver much more efficiently than Windows when tuned for the architecture, so there may have been a Windows performance obstacle that could have been overcome.
People praised SMT and condemned CMT, but we've seen enough examples recently of Intel not even enabling SMT in CPUs that offer good performance. I think it's therefore dubious to assume that SMT is needed for high performance, making the "SMT is vastly superior to CMT" argument questionable. I wonder if it's possible/worthwhile to do the opposite of what AMD did and have two FPU units for every integer unit.
One of the worst things about Bulldozer is that we'll never know what the architecture could have been had it been developed more effectively. It should have never been released in its original state ("Bulldozer") and Piledriver wasn't enough of an improvement either. 8 core consumer CPUs were also premature considering the primitiveness of Windows and most software.
Vega 56 has 410 GB/s memory bandwidth.
Vega 64 LC OC+UV at ~1750 MHz yields similar results to the VII, despite the VII having 2X the memory bandwidth of the Vega 64 LC.
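The bandwidth figures in the comments above follow from the HBM2 bus widths and per-pin data rates. A rough comparison, using the published pin rates (treat them as approximate):

```python
# Peak HBM2 bandwidth in GB/s: (bus width in bits / 8) x per-pin rate in Gbps.
def hbm_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    return bus_width_bits / 8 * pin_rate_gbps

vega56 = hbm_bandwidth_gbs(2048, 1.6)    # ~410 GB/s, as stated above
vega64 = hbm_bandwidth_gbs(2048, 1.89)   # ~484 GB/s
vii    = hbm_bandwidth_gbs(4096, 2.0)    # ~1024 GB/s

print(vii / vega64)  # ~2.1x, the "2X" ratio referred to above
```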
NAVI has memory compression improvements (per r/Amd/comments/9du2w4).
Fact remains, the RTX 2080 Ti has 88 ROPs and six GPC blocks (each GPC with at least one raster engine), a superiority over the VII's 64 ROPs and four raster engines.
TFLOPS is nothing without raster engines and ROPs (the graphics read/write units). Note why AMD is pushing the compute shader path, i.e. using TMUs as read/write units.
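The ROP advantage shows up directly in peak pixel fillrate, which scales as ROP count times clock. A minimal sketch; the boost clocks below are approximate typical values we've assumed, not figures from the thread:

```python
# Peak pixel fillrate in Gpix/s: ROP count x clock in GHz
# (each ROP can retire one pixel per clock).
def pixel_fillrate_gpix(rops, clock_ghz):
    return rops * clock_ghz

rtx_2080_ti = pixel_fillrate_gpix(88, 1.545)  # ~136 Gpix/s (assumed ~1545 MHz boost)
radeon_vii  = pixel_fillrate_gpix(64, 1.75)   # ~112 Gpix/s (assumed ~1750 MHz boost)

print(rtx_2080_ti, radeon_vii)
```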
You compared cards which are not the same as those I mentioned and then went on to waffle about things which are less relevant to most.
Piledriver was a much more efficient version of Bulldozer, which did significantly increase the overall performance. AMD had no choice but to do this, at least for the Desktop Gaming segment.
Bulldozer → Piledriver → Steamroller → Excavator → Zen → Zen+ → Zen 2...
EDITED.
I got my CEOs confused and made corrections.
Where did you see me mentioning RBE/ROP performance? Fermi was performant not simplistically due to the GS yielding >50% perf/clock, but due to the follow-on uarch benefits of the PolyMorph engines allowing decoupling of the front end, resulting in far greater extraction of parallelism. This gave better utilization and fewer bubbles/stalls in the pipeline. The GF silicon implementation didn't match the expected RTL, but each iteration since has led to improvements.

Does that also extend to die area? ;) It's a repurposed MI50, whattayagonnado? As a low-volume gaming SKU, it's probably the bottom of the barrel of working 7 nm chips, which might be marginal under thermals/load. The cost to package it as a lower frame-buffer/bandwidth SKU might be marginal, and the full spec can be exploited by marketing vs. the competition.

There's a simple metric really: TU102 at 18B transistors outperforms Vega 20 at 13B transistors, because the silicon is deployed in a much better uarch; e.g. Vega's 3.3 TFLOPs of FP64 is no benefit to gamers. The traditional GS/HS/DS geometry stages may well be deprecated in favor of more flexible and performant primitive/mesh shaders, but don't conflate GF→TU with GCN 1→9. It's not just the ROPs/TMUs in NV's favour; it's the decoupling of the front end and the ability to extract much more parallelism that allows higher utilization from lower peak FLOPs. We also need to consider better bandwidth utilization, data reuse (registers/cache), etc.
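The utilization argument above can be made concrete with peak FP32 throughput, which is 2 ops per FMA times shader count times clock. At roughly equal peak TFLOPS, TU102 outperforms Vega 20 in games, so the gap must come from utilization rather than raw math rate. The boost clocks below are approximate assumed values:

```python
# Peak FP32 throughput in TFLOPS: 2 ops per FMA x shaders x clock in GHz / 1000.
def peak_fp32_tflops(shaders, clock_ghz):
    return 2 * shaders * clock_ghz / 1000

tu102  = peak_fp32_tflops(4352, 1.545)  # ~13.4 TFLOPS (RTX 2080 Ti, assumed clock)
vega20 = peak_fp32_tflops(3840, 1.75)   # ~13.4 TFLOPS (Radeon VII, assumed clock)

# Near-identical peak numbers; the real-world gaming gap is utilization.
print(tu102, vega20)
```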
Ah, ok then.