
NVIDIA GeForce RTX 4060 Ti Available as 8 GB and 16 GB, This Month. RTX 4060 in July

Joined
Mar 19, 2023
Messages
153 (0.36/day)
Location
Hyrule Castle, France
Processor Ryzen 5600x
Memory Crucial Ballistix
Video Card(s) RX 7900 XT
Storage SN850x
Display(s) Gigabyte M32U - LG UltraGear+ 4K 28"
Case Fractal Design Meshify C Mini
Power Supply Corsair RM650x (2021)
If this sells well, we may get new 4070 16GB models. But there are still many unknowns.
Ofc not WTF lol???
Adding 4 GB of VRAM means redrawing the bus. That's redrawing the I/O, which means redrawing the chip. That's something they never ever do. They prefer doubling the VRAM on the bus (3060 12 GB) rather than ever remaking a chip. And it's the same for AMD. We'll never have any 4070s with 16 GB. We may have 4060 Tis with 16 GB or 4070 Tis with 24 GB (highly doubtful on the latter) though.
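To put rough numbers on that argument, here is a minimal sketch. It assumes (this is not stated in the thread) that each GDDR6 chip sits on its own 32-bit channel, that chips are 2 GB each, and that the only way to double capacity without touching the bus is to hang two chips on each channel (clamshell). The function name is made up for illustration.

```python
# Sketch: which VRAM capacities a given bus width allows.
# Assumptions (not from the thread): 32 bits of bus per GDDR6 chip,
# 2 GB per chip, and clamshell mode puts two chips on one channel.
BITS_PER_CHIP = 32
GB_PER_CHIP = 2

def vram_options(bus_width_bits: int) -> tuple[int, int]:
    """Return (normal, clamshell) capacity in GB for a bus width."""
    chips = bus_width_bits // BITS_PER_CHIP
    normal = chips * GB_PER_CHIP
    return normal, normal * 2

for bus in (128, 192, 256):
    normal, doubled = vram_options(bus)
    print(f"{bus}-bit: {normal} GB or {doubled} GB")
```

Under those assumptions a 192-bit card lands on 12 or 24 GB and a 128-bit card on 8 or 16 GB, which is why 16 GB does not fit a 192-bit 4070 without changing the bus.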
 
Joined
Apr 28, 2023
Messages
48 (0.12/day)
Location
Españistan
Ofc not WTF lol???
Adding 4 GB of VRAM means redrawing the bus. That's redrawing the I/O, which means redrawing the chip. That's something they never ever do. They prefer doubling the VRAM on the bus (3060 12 GB) rather than ever remaking a chip. And it's the same for AMD. We'll never have any 4070s with 16 GB. We may have 4060 Tis with 16 GB or 4070 Tis with 24 GB (highly doubtful on the latter) though.
You take a mid-low card like the 4070 (32% of the CUDA cores) with a 192-bit bus, cut 64 bits so only 128 bits are left, as on a low-end card, and then you put 16GB on it. It's so easy, XD

Something like: 2060 Super 8GB (TU106) vs 2060 12GB (TU106).

But the 2060 Super is a midrange card with 47% of the CUDA cores and a 256-bit bus, unlike the mid-low 4070.
 
Joined
Mar 19, 2023
Messages
153 (0.36/day)
Location
Hyrule Castle, France
Processor Ryzen 5600x
Memory Crucial Ballistix
Video Card(s) RX 7900 XT
Storage SN850x
Display(s) Gigabyte M32U - LG UltraGear+ 4K 28"
Case Fractal Design Meshify C Mini
Power Supply Corsair RM650x (2021)
You take a mid-low card like the 4070 (32% of the CUDA cores) with a 192-bit bus, cut 64 bits so only 128 bits are left, as on a low-end card, and then you put 16GB on it. It's so easy, XD
.......
 
Joined
Nov 7, 2017
Messages
52 (0.02/day)
Since prices are so high but availability is good, and now we are getting an RTX 4060 Ti with so much memory, this card vs the RX 7700 XT is going to be the battle of the ages. In the last two gens the RX x700 matched the RTX xx70 tier, so I'm not sure the RX 7700 is the direct competitor, but either way this matchup for customers is going to be an epic one in the GPU scene, since market demand is so high. Refreshes are likely too, at least from AMD, and hopefully from Nvidia as a countermeasure. The RX 7000 series did not completely hit its goals, as was discussed in the tech news before, so once AMD gets a stable stream of great chips or does some further redesign, RX 7x50 models will up the game yet again.
The rest of the year is going to be an exciting period for value-oriented buyers, on the CPU front too, because Ryzen 7000 chips simply need to be priced down to move them anywhere. Mobo manufacturers' actions are unknown, but I bet they could quite easily lower margins, at least a bit. Even if the components on many AM5 boards are certainly high quality, we have just seen how little effort Asus has put into refining even their most expensive boards (which Gamers Nexus reported on), which doesn't point to increased costs in supporting the products, though who knows if this is a case of some fresh engineers in the business or something similar. Either way, I dare to hope the end of the year is finally upgrade time for a significant majority.
 

cbb

Joined
Nov 22, 2022
Messages
43 (0.08/day)
Processor 13600K
Motherboard ASRock B760 Pro
Cooling Peerless Assassin 120
Memory Gskill 2x16G DDR5 6000
Video Card(s) RTX 2070 Gigabyte
Storage WD Black SN850X
Display(s) LG 32 UM550 (3840x2160@60)
Case Define R5 v1
Audio Device(s) Bryston BDA-1
Power Supply Seasonic Focus PX-750
Mouse Glorious O-
Keyboard E-YoSoo Z-88/ Keychron V1
Software PopOS
huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half-sized bus) of my 2018 2070?? That seems a bizarre bit of "progress". Roughly double the cores at the same power, so that is definitely progress, but I'm curious about the bandwidth. Ofc it makes sense that it has the same bus/bandwidth as the 8GB, otherwise that'd be a big change and arguably another card entirely; I just hadn't compared that spec to my old card until now. It might not matter too much if it's using the extra VRAM to store textures & assets for quick re-use, rather than whole new scenes (which, presumably, would lean more on the bandwidth to the rest of the system?). idk, I haven't done real engineering since the z80 was current, so I'll admit to handwaving a bit (!) here. And it wouldn't require a new PSU, which was putting me off the discounted Radeon 69xx XTs.
Normally, I kinda expect each gen to roughly equal the prior gen's next card up. And, in processing this probably exceeds that. But half the bandwidth is a surprise, although (as noted above) idk if it'll be an issue for users. My impression is a lot of the value of the larger vram is swap space/storage space so it doesn't have to fetch from the rest of the system (which is slow) as often, so maybe just fine? Guess we'll see!
 
Joined
Oct 12, 2005
Messages
682 (0.10/day)
huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half sized bus) of my 2018 2070?? that's seems a bizarre bit of "progress"? Roughly double the cores, and same power tho, so that is definitely progress, but curious about the bandwidth. Ofc it makes sense it has the same bus/bandwidth as the 8GB, otherwise that'd be a big change and arguably another card entirely, just that I hadn't compared that spec to my old card until now. It might not matter too much if it's using the extra vram to store textures & assets for quick re-use, rather than whole new scenes (which, presumably, would lean more on the bandwidth to the rest of the system?)? idk, I haven't done real engineering since the z80 was current, so I'll admit to handwaving a bit (!) here. And it wouldn't require a new psu, which was putting me off the discounted radeon 69xxXTs.
Normally, I kinda expect each gen to roughly equal the prior gen's next card up. And, in processing this probably exceeds that. But half the bandwidth is a surprise, although (as noted above) idk if it'll be an issue for users. My impression is a lot of the value of the larger vram is swap space/storage space so it doesn't have to fetch from the rest of the system (which is slow) as often, so maybe just fine? Guess we'll see!
The 4060Ti has way more L2 cache than the 2070. The effective bandwidth of the VRAM + cache subsystem is probably around the same as or higher than the 2070's.
 
Joined
Jan 20, 2019
Messages
1,295 (0.67/day)
Location
London, UK
System Name ❶ Oooh (2024) ❷ Aaaah (2021) ❸ Ahemm (2017)
Processor ❶ 5800X3D ❷ i7-9700K ❸ i7-7700K
Motherboard ❶ X570-F ❷ Z390-E ❸ Z270-E
Cooling ❶ ALFIII 360 ❷ X62 + X72 (GPU mod) ❸ X62
Memory ❶ 32-3600/16 ❷ 32-3200/16 ❸ 16-3200/16
Video Card(s) ❶ 3080 X Trio ❷ 2080TI (AIOmod) ❸ 1080TI
Storage ❶ NVME/SSD/HDD ❷ <SAME ❸ SSD/HDD
Display(s) ❶ 1440/165/IPS ❷ 1440/144/IPS ❸ 1080/144/IPS
Case ❶ BQ Silent 601 ❷ Cors 465X ❸ Frac Mesh C
Audio Device(s) ❶ HyperX C2 ❷ HyperX C2 ❸ Logi G432
Power Supply ❶ HX1200 Plat ❷ RM750X ❸ EVGA 650W G2
Mouse ❶ Logi G Pro ❷ Razer Bas V3 ❸ Logi G502
Keyboard ❶ Logi G915 TKL ❷ Anne P2 ❸ Logi G610
Benchmark Scores I have wrestled bandwidths, Tussled with voltages, Handcuffed Overclocks, Thrown Gigahertz in Jail
huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half sized bus) of my 2018 2070?? that's seems a bizarre bit of "progress"? Roughly double the cores, and same power tho, so that is definitely progress, but curious about the bandwidth. Ofc it makes sense it has the same bus/bandwidth as the 8GB, otherwise that'd be a big change and arguably another card entirely, just that I hadn't compared that spec to my old card until now. It might not matter too much if it's using the extra vram to store textures & assets for quick re-use, rather than whole new scenes (which, presumably, would lean more on the bandwidth to the rest of the system?)? idk, I haven't done real engineering since the z80 was current, so I'll admit to handwaving a bit (!) here. And it wouldn't require a new psu, which was putting me off the discounted radeon 69xxXTs.
Normally, I kinda expect each gen to roughly equal the prior gen's next card up. And, in processing this probably exceeds that. But half the bandwidth is a surprise, although (as noted above) idk if it'll be an issue for users. My impression is a lot of the value of the larger vram is swap space/storage space so it doesn't have to fetch from the rest of the system (which is slow) as often, so maybe just fine? Guess we'll see!

This is definitely a concern. Bandwidth is crucial for snappier data access. More VRAM and higher bandwidth usually go hand in hand; otherwise increased latency or a lack of real-time VRAM utilisation will hurt performance. Or, for the layman, we end up with the illusion that more VRAM wasn't necessary in the first place.

The idea is more VRAM "to store" rendering elements, textures or other visual effects, alongside faster memory bandwidth "to access/transfer" real-time graphical data quickly. Nowadays smart game engines more often rely on faster and wider bandwidth for real-time swapping of dynamic assets/effects, hence compromising on memory speed (or transfer rates) will most likely end up with reduced performance, frame drops and simply poorer sustained visual fidelity. Breaking the balance is a kick in the teeth, and as usual the easy way out is scapegoating developer optimisation as the primary culprit. Developers usually share the blame, but hardware limitations are sometimes overlooked and present unappealing challenges which devs probably can't be bothered to entertain.

Obviously the same doesn't apply to everyone. The balance between VRAM and memory bandwidth will depend on the user's specific needs and use case, and it's important to consider both factors when selecting a graphics card (for the less-informed, benchmarks and reviews often help to stay on top of all the riff-raff).
 
Joined
May 15, 2020
Messages
697 (0.48/day)
Location
France
System Name Home
Processor Ryzen 3600X
Motherboard MSI Tomahawk 450 MAX
Cooling Noctua NH-U14S
Memory 16GB Crucial Ballistix 3600 MHz DDR4 CAS 16
Video Card(s) MSI RX 5700XT EVOKE OC
Storage Samsung 970 PRO 512 GB
Display(s) ASUS VA326HR + MSI Optix G24C4
Case MSI - MAG Forge 100M
Power Supply Aerocool Lux RGB M 650W
On this generation, Nvidia did what AMD did on the previous generation, added a large L2 cache to diminish the need for bandwidth. It worked fine for AMD, so I don't see any reason why it won't work for Nvidia. Bandwidth won't be a problem, but relatively low core counts and VRAM will.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
The 4060Ti has way more L2 cache than the 2070. The effective bandwidth of the VRAM + cache subsystem is probably around the same as or higher than the 2070's.
4060 Ti's 32 MB L2 cache with delta color compression can hold the entire 1920x1080p frame buffers.

For an example, see Xbox One's split render target across the on-chip 32 MB ESRAM (70% of the 1080p render) and system memory. NVIDIA has superior delta color compression (DCC).
[Image: xbox-one-ddr3-esram-split-render-target.jpg]

[Image: PascalEdDay_FINAL_NDA_1463156837-012_575px.png]

NVIDIA's delta color compression conserves bandwidth and data storage.
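A quick back-of-the-envelope check of the "fits in 32 MB" claim, as a sketch only. It assumes 4 bytes per pixel and no compression (neither figure is given in the thread), and it treats 32 MB as 32 MiB. The function name is hypothetical.

```python
# Sketch: raw size of a 1080p render target versus a 32 MB cache.
# Assumptions (not from the thread): 4 bytes per pixel (e.g. 8-bit RGBA),
# no delta color compression, and "32 MB" read as 32 MiB.
def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Uncompressed size of one render target in bytes."""
    return width * height * bytes_per_pixel

L2_BYTES = 32 * 1024 * 1024
fb = framebuffer_bytes(1920, 1080)

print(f"one 1080p target: {fb / 2**20:.2f} MiB")
print(f"targets that fit in 32 MiB: {L2_BYTES // fb}")
```

Real render pipelines use several targets with different formats, so this only shows the order of magnitude: a handful of uncompressed 1080p buffers fit, and compression would stretch that further.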
 
Joined
Apr 28, 2023
Messages
48 (0.12/day)
Location
Españistan
huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half sized bus) of my 2018 2070?? that's seems a bizarre bit of "progress"? . . . .
RTX 3050 8GB (2560 cudas) (128bit) has more cudas than RTX 2070 (2304 cudas) (256bit). Yes, it has more cudas and a narrower bus. (They are different SPs.)

There is nothing new, except a name change in the RTX 4000series. Look in my signature.

On this generation, Nvidia did what AMD did on the previous generation, added a large L2 cache to diminish the need for bandwidth. It worked fine for AMD, so I don't see any reason why it won't work for Nvidia. Bandwidth won't be a problem, but relatively low core counts and VRAM will.
This is not true. This generation has exactly the same bus as previous generations.

100% - 67% cudas --> 384 bit
66% - 45% cudas --> 256 bit
44% - 30% cudas --> 192 bit
29% - 15% cudas --> 128 bit
14% - 0% cudas --> 64 bit

Everything is exactly the same as always.

100% cudas | 2080Ti12 - 384 bits | 3090Ti - 384 bits | 4090Ti - RTX6000 - 384 bit
55% cudas | 2070 Super (55%) - 256 bit | 3070 (55%) - 256 bit | 4080 (53%) - 256 bit
33% cudas | 1660Ti - 192 bit | 3060 - 192 bit | 4070 - 192 bit
25% cudas | 1650 Super (28%) - 128 bit | 3050 8GB (24%) - 128 bit | 4060Ti (24%) - 128 bit
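The bracket list above can be written as a small lookup, sketched here. The thresholds are copied from the post; they are this poster's rule of thumb, not an NVIDIA specification, and the function name is made up for illustration.

```python
# Sketch of the rule of thumb: share of the full chip's CUDA cores -> bus width.
# Thresholds are taken from the post (lower bound of each bracket, bus bits).
BRACKETS = [(67, 384), (45, 256), (30, 192), (15, 128), (0, 64)]

def expected_bus_width(cuda_percent: float) -> int:
    """Return the bus width (bits) the rule of thumb predicts."""
    for lower_bound, bus_bits in BRACKETS:
        if cuda_percent >= lower_bound:
            return bus_bits
    raise ValueError("percentage must be non-negative")

for name, pct in [("4080", 53), ("4070", 33), ("4060Ti", 24)]:
    print(f"{name}: {pct}% -> {expected_bus_width(pct)} bit")
```

Fed with the percentages from the table, it returns 256, 192 and 128 bit, matching the bus widths listed there.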
 
Joined
Jan 14, 2019
Messages
10,053 (5.15/day)
Location
Midlands, UK
System Name Holiday Season Budget Computer (HSBC)
Processor AMD Ryzen 7 7700X
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 16 GB Corsair Vengeance EXPO DDR5-6000
Video Card(s) Sapphire Pulse Radeon RX 6500 XT 4 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2, 4 + 8 TB Seagate Barracuda 3.5"
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Windows 10 Pro
RTX 3050 8GB (2560 cudas) (128bit) has more cudas than RTX 2070 (2304 cudas) (256bit). Yes, it has more cudas and a narrower bus. (They are different SPs.)
You cannot compare CUDA cores across generations, especially since Nvidia changed what the term means (cheeky move, imo).

The 2070 has 2304 FP and 2304 INT cores. A pair counts as a CUDA core, that's why it has 2304.

The 3050 has 1280 FP and 1280 INT cores. Each one counts as a CUDA core, that's why it has 2560 (while it technically does not).

If there was a direct comparison, then the 3050 would be faster than the 2070, which it is clearly not.

Generally speaking, 1 Ampere/Ada core = ~0.5-0.75 Turing core in performance.
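As a rough illustration of that conversion, here is a sketch using the 0.5-0.75 range from the post. The range is the poster's estimate, not a measured figure, and the function name is hypothetical.

```python
# Sketch: convert an Ampere/Ada CUDA core count into a rough
# "Turing-equivalent" range, using the 0.5-0.75 factor from the post.
def turing_equivalent(cores: int, low: float = 0.5, high: float = 0.75) -> tuple[float, float]:
    """Return the (low, high) estimate in Turing-style cores."""
    return cores * low, cores * high

RTX_2070_CORES = 2304
low, high = turing_equivalent(2560)  # RTX 3050

print(f"RTX 3050: roughly {low:.0f} to {high:.0f} Turing-style cores")
print(f"RTX 2070: {RTX_2070_CORES} Turing cores")
```

Even at the top of the range the 3050 stays below the 2070's 2304, which lines up with the 3050 being the slower card despite the bigger headline number.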
 
Joined
May 15, 2020
Messages
697 (0.48/day)
Location
France
System Name Home
Processor Ryzen 3600X
Motherboard MSI Tomahawk 450 MAX
Cooling Noctua NH-U14S
Memory 16GB Crucial Ballistix 3600 MHz DDR4 CAS 16
Video Card(s) MSI RX 5700XT EVOKE OC
Storage Samsung 970 PRO 512 GB
Display(s) ASUS VA326HR + MSI Optix G24C4
Case MSI - MAG Forge 100M
Power Supply Aerocool Lux RGB M 650W
This is not true. This generation has exactly the same bus as previous generations.

100% - 67% cudas --> 384 bit
66% - 45% cudas --> 256 bit
44% - 30% cudas --> 192 bit
29% - 15% cudas --> 128 bit
14% - 0% cudas --> 64 bit

Everything is exactly the same as always.
Your % CUDA indicator is a decent relative indicator for analysing Nvidia's market segmentation. However, that's where the utility stops.

Memory bus sizes allow balancing the power of the core with the output of the memory. Basically you have an 82 TFLOP 4090 balanced with 1 TB/s of memory bandwidth, compared to a 35 TFLOP 3090 balanced with 0.9 TB/s. How? Because of the extra cache. The % of maximum CUDA cores has nothing to do with it.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
You cannot compare CUDA cores across generations, especially since Nvidia changed what the term means (cheeky move, imo).

The 2070 has 2304 FP and 2304 INT cores. A pair counts as a CUDA core, that's why it has 2304.

The 3050 has 1280 FP and 1280 INT cores. Each one counts as a CUDA core, that's why it has 2560 (while it technically does not).

If there was a direct comparison, then the 3050 would be faster than the 2070, which it is clearly not.

Generally speaking, 1 Ampere/Ada core = ~0.5-0.75 Turing core in performance.
RTX 3050 has 2560 CUDA cores with half of them being able to execute integer datatypes i.e. 1,280 CUDA FP and 1,280 CUDA FP/INT.

[Image: GA10x SM unit.png]


From https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf

GA10x SM evolved from TU10x SM when integer units gained floating-point support.

RTX 3070 has 184 TMUs, 96 ROPS, 2,944 CUDA FP32 cores and 2,944 CUDA FP32/INT32 cores.

RTX 3050 has 80 TMUs, 32 ROPS, 1,280 CUDA FP32 cores and 1,280 CUDA FP32/INT32 cores.

RTX 2070 has 144 TMUs, 64 ROPS, 2,304 CUDA FP32 cores and 2,304 CUDA INT32 cores.
 
Joined
Apr 28, 2023
Messages
48 (0.12/day)
Location
Españistan
You cannot compare CUDA cores across generations, especially since Nvidia changed what the term means (cheeky move, imo).
...
...
I know, I wrote that they are different SPs. That is just what I meant to say: you can't compare different things. ValenOne has explained it well, it's 1280 + 1280 shared, but it doesn't matter now.

You have to compare inside the family and then look at which piece of the cake, which piece of chocolate, Nvidia is trying to sell us and at what price.
1/2 of Lovelace, 1/3 of Lovelace, 1/4 of Lovelace.

Is half a cake mid-range? Is a quarter of a cake mid-range? Is mid-range a quarter of Lovelace?
4060 Ti is a quarter-range GPU.

Your % CUDA indicator is a decent relative indicator for analysing Nvidia's market segmentation. However, that's where the utility stops.

Memory bus sizes allow balancing the power of the core with the output of the memory. Basically you have an 82 TFLOP 4090 balanced with 1 TB/s of memory bandwidth, compared to a 35 TFLOP 3090 balanced with 0.9 TB/s. How? Because of the extra cache. The % of maximum CUDA cores has nothing to do with it.
The increase of cache is a characteristic of all Lovelace, so it becomes irrelevant for comparison. I only compare within the same family.

What he meant before is:
Lovelace has the same memory bus as the previous generations, Lovelace needs the same memory bus as the previous generations.
In Lovelace the speed of memory is higher (or the same) than the previous generations. Lovelace needs faster ram (or the same) than previous generations.
- Memory speed x bus = Bandwidth

Lovelace has higher bandwidth than previous generations, Lovelace needs higher bandwidth than previous generations. (or the same in the worst case)
To say the opposite is to lie.
I hope no one says Lovelace has or needs less bandwidth than previous generations.

Full Lovelace has and needs more bandwidth than previous
1/2 of Lovelace has and needs more bandwidth than previous
1/3 of Lovelace has and needs more bandwidth than previous
1/4 of Lovelace has and needs more bandwidth than previous
Repetitive but easy to understand
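The "memory speed x bus = bandwidth" line can be sketched as below. The per-pin data rates (14 Gbps for the 3050 8GB, 18 Gbps for the 4060 Ti) are not given in the thread; they are assumed from public spec listings, so treat them as placeholders. The function name is made up for illustration.

```python
# Sketch: peak memory bandwidth from per-pin data rate and bus width.
# bandwidth (GB/s) = data rate (Gbit/s per pin) * bus width (bits) / 8
def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s."""
    return data_rate_gbps * bus_width_bits / 8

# Assumed data rates (not from the thread), same 128-bit bus on both cards.
cards = {
    "RTX 3050 8GB": (14, 128),
    "RTX 4060 Ti": (18, 128),
}
for name, (rate, bus) in cards.items():
    print(f"{name}: {bandwidth_gb_s(rate, bus):.0f} GB/s")
```

With those assumed rates the two 128-bit cards come out at 224 and 288 GB/s, i.e. same bus, faster memory, more bandwidth within the same tier.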
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
I know it, I wrote "(are diferent SP)". Just that is what I meant to say, you can't compare different things. ValenOne has explained it well, are 1280 + 1280shared, but it doesn't matter now.

You have to compare inside the family and then look at the piece of the cake, piece of chocolate, is trying to sell us Nvidia and at what price.
1/2 of Lovelace, 1/3 of Lovelace, 1/4 of Lovelace.

Medium cake is mid-range? A quarter of cake is mid-range? Is mid-range a quarter of Lovelace?
4060 Ti is a quarter-range GPU.


The increase of cache is a characteristic of all Lovelace, so it becomes irrelevant for comparison. I only compare within the same family.

What he meant before is:
Lovelace has the same memory bus as the previous generations, Lovelace needs the same memory bus as the previous generations.
In Lovelace the speed of memory is higher (or the same) than the previous generations. Lovelace needs faster ram (or the same) than previous generations.
- Memory speed x bus = Bandwidth

Lovelace has higher bandwidth than previous generations, Lovelace needs higher bandwidth than previous generations. (or the same in the worst case)
To say the opposite is to lie. I hope no one says Lovelace has or needs less bandwidth than previous generations.

Full Lovelace has and needs more bandwidth than previous
1/2 of Lovelace has and needs more bandwidth than previous
1/3 of Lovelace has and needs more bandwidth than previous
1/4 of Lovelace has and needs more bandwidth than previous
Repetitive but easy to understand
For texture-mapped 3D games, the mid-range ADA SKU should be around the middle TMU (texture mapping unit) count relative to the current flagship ADA SKU.

Pure TFLOPS debate is meaningless for texture-mapped 3D accelerated games.

RTX 4090 has 16384 CUDA cores with 82.58 TFLOPS and 512 TMUs (1,290 GTexel/s). AIB OC is higher e.g. 86.5 TFLOPS and 1,352 GTexel/s.

RTX 4080 has 9728 CUDA cores with 48.74 TFLOPS and 304 TMUs (761.5 GTexel/s). AIB OC is higher e.g. 51.36 TFLOPS and 802.6 GTexel/s. It can be higher with AIB's single-button auto OC e.g. 55 TFLOPS.

RTX 4070 Ti has 7680 CUDA cores with 40.09 TFLOPS and 240 TMUs (626.4 GTexel/s). AIB OC is higher e.g. 42 TFLOPS and 658.8 GTexel/s.

The mid-range textured 3D ADA is about RTX 4070 / 4070 Ti level.
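For reference, the TFLOPS and GTexel/s figures above follow from core count, TMU count and clock. The sketch below reproduces them; the boost clocks (2.52, 2.505 and 2.61 GHz) are not stated in the thread and are assumed here, back-derived from the quoted numbers. Function names are hypothetical.

```python
# Sketch: peak FP32 throughput and texel fill rate from unit counts and clock.
# FP32 TFLOPS = 2 ops (fused multiply-add) * CUDA cores * clock (GHz) / 1000
# GTexel/s    = TMUs * clock (GHz)
def fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    return 2 * cuda_cores * boost_ghz / 1000

def gtexels_per_s(tmus: int, boost_ghz: float) -> float:
    return tmus * boost_ghz

# (CUDA cores, TMUs, assumed boost clock in GHz); clocks are assumptions.
skus = {
    "RTX 4090": (16384, 512, 2.52),
    "RTX 4080": (9728, 304, 2.505),
    "RTX 4070 Ti": (7680, 240, 2.61),
}
for name, (cores, tmus, ghz) in skus.items():
    print(f"{name}: {fp32_tflops(cores, ghz):.2f} TFLOPS, "
          f"{gtexels_per_s(tmus, ghz):.1f} GTexel/s")
```

Both numbers scale with the same clock, so a factory overclock raises TFLOPS and GTexel/s by the same percentage.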
 
Joined
Apr 28, 2023
Messages
48 (0.12/day)
Location
Españistan
For texture-mapped 3D games, the mid-range ADA SKU should be around the middle TMU (texture mapping unit) count relative to the current flagship ADA SKU.

Pure TFLOPS debate is meaningless for texture-mapped 3D accelerated games.

RTX 4090 has 16384 CUDA cores...
...
It's the same!! Do you prefer TMUs? Has the same %. It's OK
1/2 of cake, 1/3 of cake, 1/4 of cake, ...

First: No, the full Lovelace is not 16000 cudas...

RTX 6000 (and the future 4090 Ti) has 18176 cudas & 568 TMUs (and that is not really the full Lovelace, which has 18432 cudas & 576 TMUs)
If you do not take the full cake you will get wrong percentages and you will not be able to compare with previous generations
1/2 of Lovelace are 284 TMUs
1/3 of Lovelace are 189 TMUs
1/4 of Lovelace are 142 TMUs

RTX 4090 has 90% of the TMUs
RTX 4080Ti has 77% of the TMUs
RTX 4080 has 1/2 of the TMUs
RTX 4070 has 1/3 of the TMUs
RTX 4060Ti has 1/4 of the TMUs

You can see in my signature.

Edit: TMUs are CUDA/32 in Lovelace, therefore the RTX 4060 Ti has 4352 cudas / 32 = 136 TMUs (24%). The web page says 128 TMUs ("may change in the future"), but that is wrong: https://www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti.c3890
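The arithmetic in this post can be checked with a short sketch. It uses the poster's own baseline of 568 TMUs (the RTX 6000 configuration) and the "one TMU per 32 CUDA cores" relation from the edit above; function names are made up for illustration.

```python
# Sketch: TMU count from CUDA cores, and share of the poster's "full cake".
CUDA_PER_TMU = 32      # relation stated in the post for Lovelace
BASELINE_TMUS = 568    # RTX 6000, the baseline the post uses

def tmus_from_cuda(cuda_cores: int) -> int:
    return cuda_cores // CUDA_PER_TMU

def tmu_share_percent(tmus: int, baseline: int = BASELINE_TMUS) -> float:
    return 100 * tmus / baseline

for name, cores in [("RTX 4090", 16384), ("RTX 4080", 9728), ("RTX 4060 Ti", 4352)]:
    tmus = tmus_from_cuda(cores)
    print(f"{name}: {tmus} TMUs, {tmu_share_percent(tmus):.0f}% of baseline")
```

This gives 512, 304 and 136 TMUs, or about 90%, 54% and 24% of the 568-TMU baseline, in line with the fractions listed in the post.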
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K IPS FreeSync/GSync DP, LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
It's the same!! Do you prefer TMUs? Has the same %. It's OK
1/2 of cake, 1/3 of cake, 1/4 of cake, ...

First: No, the full Lovelace is not 16000 cudas...

RTX 6000 (and future 4090Ti) has: 18176 cudas & 568 TMUs (and it is not really the full Lovelace. It has 18432 cudas & 576 TMUs)
If you do not take the full cake you will have wrong percentages and you will not be able to compare with previous generations
1/2 of Lovelace are 284 TMUs
1/3 of Lovelace are 189 TMUs
1/4 of Lovelace are 142 TMUs

RTX 4090 has a 90% of TMUs
RTX 4080Ti has a 77% of TMUs
RTX 4080 has a 1/2 of TMUs
RTX 4070 has a 1/3 of TMUs
RTX 4060Ti has a 1/4 of TMUs

You can see in my signature.

Edit: TMU are Cuda/32 in Lovelace, therefore RTX-4060-Ti has 4352cudas/32 = 136 TMU (24%) --> In the web say 128 TMU ("may change in the future"), but it is wrong: https://www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti.c3890
I stated the current flagship gaming ADA SKU. I am aware of the full AD102 cuda count.

RTX 4090 (16384 cuda) is not the full AD102 (18432 cuda).
RTX 4080 (9728 cuda) is not the full AD103 (10240 cuda).

NVIDIA is reserving the fully enabled AD102 and AD103 for the future product stack refresh which is useless for the current product stack.

GPU clock speed is part of the SKU characteristics, hence my use of GTexel/s scaling.

RTX 6000 ADA has 568 TMUs and 1,423 GTexel/s, 96 MB L2 cache. No AIB OC variants. Not a gaming SKU.

RTX 4090 has 512 TMUs and 1,290 GTexel/s, 72 MB L2 cache. AIB OC can reach 1,352 GTexel/s.

RTX 4080 has 304 TMUs and 761.5 GTexel/s, 64 MB L2 cache. AIB OC can reach 802.6 GTexel/s. 59% of RTX 4090's GTexel/s. My Gigabyte RTX 4080 Gaming OC's heatsink was designed for the RTX 4090 SKU, hence it's overkill for the RTX 4080, i.e. an AIB one-button ~2.9 GHz OC is easy.

RTX 4070 Ti has 240 TMUs, 626.4 GTexel/s, 42 MB L2 cache. AIB OC can reach 666.0 GTexel/s. ~49% of RTX 4090's GTexel/s.
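The percentages relative to the RTX 4090 are simple ratios of the stock GTexel/s figures quoted above; a minimal sketch (the function name is hypothetical):

```python
# Sketch: texel fill rate of each SKU as a share of the RTX 4090's,
# using the stock GTexel/s figures quoted in the post.
RTX_4090_GTEXELS = 1290.0

def share_of_4090(gtexels: float) -> float:
    """Fill rate as a percentage of the RTX 4090's."""
    return 100 * gtexels / RTX_4090_GTEXELS

for name, rate in [("RTX 4080", 761.5), ("RTX 4070 Ti", 626.4)]:
    print(f"{name}: {share_of_4090(rate):.0f}% of RTX 4090")
```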
 