Thursday, September 26th 2024
NVIDIA GeForce RTX 5090 and RTX 5080 Specifications Surface, Showing Larger SKU Segmentation
Thanks to the renowned NVIDIA hardware leaker kopite7kimi on X, we are getting information about the final versions of NVIDIA's first upcoming wave of GeForce RTX 50 series "Blackwell" graphics cards. The two leaked GPUs are the GeForce RTX 5090 and RTX 5080, which now feature a more significant gap between xx80 and xx90 SKUs. For starters, we have the highest-end GeForce RTX 5090. NVIDIA has decided to use the GB202-300-A1 die and enable 21,760 FP32 CUDA cores on this top-end model. Accompanying the massive 170 SM GPU configuration, the RTX 5090 has 32 GB of GDDR7 memory on a 512-bit bus, with each GDDR7 die running at 28 Gbps. This translates to 1,792 GB/s of memory bandwidth. All of this is confined to a 600 W TGP.
When it comes to the GeForce RTX 5080, NVIDIA has decided to further separate its xx80 and xx90 SKUs. The RTX 5080 has 10,752 FP32 CUDA cores paired with 16 GB of GDDR7 memory on a 256-bit bus. With GDDR7 running at 28 Gbps, the memory bandwidth is also halved at 896 GB/s. This SKU uses a GB203-400-A1 die, which is designed to run within a 400 W TGP power envelope. For reference, the RTX 4090 has 68% more CUDA cores than the RTX 4080. The rumored RTX 5090 has around 102% more CUDA cores than the rumored RTX 5080, which means that NVIDIA is separating its top SKUs even more. We are curious to see at what price point NVIDIA places its upcoming GPUs so that we can compare generational updates and the difference between xx80 and xx90 models and their widened gaps.
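For a quick sanity check, peak GDDR bandwidth is just the per-pin data rate multiplied by the bus width. A minimal Python sketch (the function name is ours, and the card specs are the rumored ones above):

```python
def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    # Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s), / 8 bits per byte
    return bus_width_bits * data_rate_gbps / 8

print(gddr_bandwidth_gbs(512, 28))  # rumored RTX 5090: 1792.0 GB/s
print(gddr_bandwidth_gbs(256, 28))  # rumored RTX 5080: 896.0 GB/s
```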
Sources:
kopite7kimi (RTX 5090), kopite7kimi (RTX 5080)
181 Comments on NVIDIA GeForce RTX 5090 and RTX 5080 Specifications Surface, Showing Larger SKU Segmentation
So it's not my problem now.
edit
The other solution is just modding the PSU and adding a 24 V/12 V DC/DC converter right at the power socket(s) of the GPU.
Regarding Seasonic, they also have a 1600 W unit that is 80+ Titanium (the 2200 W is surprisingly only Platinum, even though there's not much difference), but I think the 1600 W is enough! I wish Corsair would release a new AX1600i with 2x 16-pin connectors! I have an AX1500i and love it!
It is due to a Memory Bandwidth bottleneck.
FYI the 4090 has a bandwidth of 1,008 GB/s whereas the 4080 has 717 GB/s, aka only ~41% more bandwidth, while it has 68% more CUDA cores...
Also, the 4090 has only 72 MB of L2 cache (out of the 96 MB of a full AD102 die) and the 4080 has 64 MB, so only 12.5% more...
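The ratios quoted above are easy to verify from the known 4090/4080 specs (16,384 vs 9,728 CUDA cores, 1,008 vs 717 GB/s bandwidth, 72 vs 64 MB L2); a quick sketch:

```python
def pct_more(a: float, b: float) -> float:
    # How much larger a is than b, in percent
    return (a / b - 1) * 100

print(round(pct_more(16384, 9728), 1))  # CUDA cores: 68.4
print(round(pct_more(1008, 717), 1))    # memory bandwidth: 40.6
print(round(pct_more(72, 64), 1))       # L2 cache: 12.5
```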
Is the L2 cache really bottlenecking the 4090, and will this plateau affect the 5090 as well?
Update: it's official, anyone postulating Blackwell's high prices for likes is a paid troll!
And the x90 is a Titan replacement as Nvidia made that clear themselves with 3090's release back then.
Nvidia is still a Gaming brand, and they know that if the A.I. bubble were to burst tomorrow, they would have to go back to Gaming as their main revenue... There is a reason why the 3090 and 3090 Ti were not called TITAN, and that's because they are not! TITANs also pack FP64 cores and usually have 2x more VRAM: the 780/Ti had 3 GB whereas the TITAN had 6 GB, and the 2080 Ti had 11 GB whereas the RTX TITAN had 24 GB. Performance never scales linearly, and yes, the L2 cache plays a big role in the Lovelace architecture, hence the "only" 28% more performance at 4K Ultra but sometimes closer to 40% in Ray Tracing/Path Tracing, because that relies on RT Core performance.
PS: we don't know how much L2 cache the 5090 will have, but it could have 96 MB this time, when the full GB202 has 128 MB, so it might still create a bottleneck somewhere even though the memory bandwidth should be much higher than the 4090's (almost 1.8 TB/s vs 1 TB/s). Power is not a limiting factor, because even with the 600 W BIOS you don't get a lot more performance!
GDDR6X memory overclocking without raising the power limit can sometimes bring you a lot more fps than core overclocking!
God of War: Ragnarök, for example, is very memory-bandwidth bound! I OC'd my GDDR6X to 25 Gbps on my 4090 and it gave me 7% more performance without any core OC, for example.
So no, the x90s are not Titans.
The performance gap between the 4080 and 4090 is enormous, and the missing 4080 Ti design is obvious there. So the 5080 will fill that gap pretty nicely. The MSRP might match or be slightly lower than the 4090's, though. And retailers will surely price the new gen based on raster performance, not on MSRPs.
If they put more resources on the silicon, they would have a lot more problems supplying power to it properly and dissipating the heat as well.
So power delivery and thermal envelope were limiting factors at the design stage - I'm betting, and that's what I should have written in the previous sentence.
The 4090 Ti was supposed to be a 600 W GPU with a 4-slot cooler... but even the 4090 with a 600 W BIOS and fully overclocked doesn't reach very high temperatures, so I'm not worried about the 5090. Blackwell is supposed to be a brand new architecture, whereas Lovelace was more of an Ampere+ architecture. The biggest change was the process node; going from Samsung 8 nm (10 nm enhanced) to TSMC 4N (5 nm enhanced) was a big jump!
Reminds me of the SLI scaling BS where 2 GPUs didn't scale 100%, haha; "monolithic is superior to 2 GPUs" was only half true.
One would hope that scaling would be linear or close to it, especially at an almost 100% premium, outside a few outliers, just like with SLI. Hopefully that 512-bit bus improves scaling for Blackwell.
Update: but then again, if power were the issue, the 4080 at 3 GHz with a memory OC also shows a significant performance delta, so you have to look at it at factory settings. Tweaking is no part of the equation, because both sides improve.
That said, games do not run 50/50 INT/FP32. They run 23/77, which lines up almost perfectly with the expected performance uplift of adding FP32 capability to your INT datapath (assuming no other bottlenecks). INT cores that would have otherwise remained idle can now handle FP32, which increases your performance in gaming workloads.
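A toy issue model (our simplification, not NVIDIA's numbers) shows why the INT/FP mix matters here: with one dedicated FP32 path and one path that can do either INT or FP32, the cycles that used to sit idle on the INT path absorb FP32 work instead:

```python
def dual_issue_speedup(int_frac: float) -> float:
    # Toy model: normalize total instructions to 1.0, one op per path per cycle.
    # Split design (Turing-style): dedicated FP path + dedicated INT path.
    # Shared design (Ampere-style): FP path + FP/INT path; all INT goes to path B.
    fp_frac = 1.0 - int_frac
    split_cycles = max(fp_frac, int_frac)   # longest path dominates
    shared_cycles = max(0.5, int_frac)      # work balances unless INT exceeds half
    return split_cycles / shared_cycles

print(round(dual_issue_speedup(0.23), 2))  # ~1.54x for a 23/77 INT/FP mix
```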
Nvidia has a whitepaper on the 3000 series here: www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf
This is way easier to pull off on a platform where you don't need to care that much about standards and can make your own (such as SXM itself). SXM3 even hinted to manufacturers that they could use a 12 V to 48 V booster in their designs to update legacy projects. FP64 on consumer GPUs hasn't been a thing since Kepler. FP64 cores are only a thing on x100 chips now.
The Titan V had it since it used the V100 chip, but the later Titan RTX did not. The 3080 Ti had 12 GB whereas the 3090 had 24 GB.
They've just been glamorized halo-tier cards with an (almost) full die, full memory bandwidth, and a larger VRAM amount. That's why the x90 is the Titan these days, just branded for gamers. You were faster; looks like we said the same things.
The first, foremost, and dare I say only aspect that determines what Nvidia calls A, B, or C is marketing strategy. Every single Titan was created with that express purpose: marketing. GTX and RTX were created for marketing purposes, too. They call it whatever they want to sell you. It's not necessarily a different product. It's just whatever's deemed popular.
Nvidia: our GPU needs 600W
Intel: challenge accepted