
Rumor: NVIDIA RTX 3080, 3070, 3060 Mobile Specifications Detailed

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.35/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kingston A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
Apparently, specifications for NVIDIA's upcoming RTX 30-series mobile solutions have been made public. According to Videocardz via Notebookcheck, NVIDIA will introduce three mobile versions of its RTX 30-series graphics cards: the RTX 3080, RTX 3070 and RTX 3060. Like past NVIDIA mobile solutions, these won't directly correspond, hardware-wise, to their desktop counterparts; NVIDIA has a habit of using lower-tier chips in its mobile parts than in the desktop cards of the same name. According to the leaked specifications, this means the mobile RTX 3080 will make use of the company's GA104 chip, instead of the GA102 silicon found on the desktop version of the card.

The mobile RTX 3080 should thus feature a total of 6,144 CUDA cores, as present in the fully-enabled GA104 chip (compare that to the 5,888 CUDA cores available on the desktop RTX 3070, and the 8,704 CUDA cores available on the desktop RTX 3080). These CUDA cores would be clocked at up to 1.7 GHz. The memory bus should also see a cut down to 256-bit, which would allow NVIDIA to distribute as many as 4 versions of the RTX 3080 mobile: Max-Q (TGP 80-90 W) or Max-P (TGP 115-150 W), each with either 8 GB or 16 GB of GDDR6 memory. The RTX 3070 mobile keeps the GA104 chip, 256-bit bus and GDDR6 memory subsystem (apparently with only an 8 GB memory pool available), but further cuts down CUDA cores to 5,120 (Max-Q TGP 80-90 W, Max-P TGP 115-150 W). Finally, the RTX 3060 mobile should make use of the GA106 chip, set up with 3,072 available CUDA cores and a 192-bit memory bus across its 6 GB GDDR6 VRAM pool (Max-Q TGP 60-70 W, Max-P TGP 80-115 W). Expect these specs to be confirmed (or not) come January 12th.
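For readers who want the rumored line-up in one place, here is a minimal Python sketch that tabulates the figures quoted above and enumerates the power/memory combinations. The dictionary layout and the `variants` helper are purely illustrative, and every number is a leaked, unconfirmed figure from the article.

```python
from itertools import product

# Rumored figures copied from the article above; none of them are confirmed.
RUMORED_MOBILE_SPECS = {
    "RTX 3080 Mobile": {"chip": "GA104", "cuda_cores": 6144, "bus_bits": 256, "memory_gb": (8, 16)},
    "RTX 3070 Mobile": {"chip": "GA104", "cuda_cores": 5120, "bus_bits": 256, "memory_gb": (8,)},
    "RTX 3060 Mobile": {"chip": "GA106", "cuda_cores": 3072, "bus_bits": 192, "memory_gb": (6,)},
}

# Rumored TGP ranges in watts per power profile, as (min, max).
RUMORED_TGP_W = {
    "RTX 3080 Mobile": {"Max-Q": (80, 90), "Max-P": (115, 150)},
    "RTX 3070 Mobile": {"Max-Q": (80, 90), "Max-P": (115, 150)},
    "RTX 3060 Mobile": {"Max-Q": (60, 70), "Max-P": (80, 115)},
}


def variants(name):
    """Return every (power profile, memory size in GB) combination for one SKU."""
    profiles = RUMORED_TGP_W[name]  # iterating a dict yields its keys
    memory_sizes = RUMORED_MOBILE_SPECS[name]["memory_gb"]
    return list(product(profiles, memory_sizes))


if __name__ == "__main__":
    for sku in RUMORED_MOBILE_SPECS:
        print(sku, variants(sku))
```

Two power profiles times two memory sizes is where the "as many as 4 versions" of the RTX 3080 mobile comes from.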



 
Joined
Oct 27, 2020
Messages
65 (0.05/day)
Back to the Maxwell days, I see. Goes to show how efficient Pascal was: all the desktop and laptop variants used the same silicon (the 1070 being an exception, where the mobile part is even better than the desktop variant), without much gimping of clocks to meet the TDP requirements, and they were within 15% of the desktop variants in performance. Turing, on the other hand, was a mess. Anything above the 1660 Ti was nowhere close to its desktop variant even though they used the same silicon, including the Super variants that came later. Ampere is, as we all know, a power-guzzling arch, and there's no way they could fit anything above the GA104 silicon without melting the laptop chassis, even with ridiculous gimping of clocks, so they just had to evolve, but backwards this time.
 
Joined
Feb 20, 2019
Messages
7,194 (3.86/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Odyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
The problem with Ampere on Samsung 8nm is that it isn't really any more efficient than Turing on TSMC 12nm.

So you're paying this mad premium and all you're getting is the 3000-series name. It's still 100% constrained by its performance/Watt, which is distinctly 2018 levels, and Turing's 2018 level of performance/Watt wasn't actually amazing in the RTX cards; only the more efficient 1600-series cards really outshone 2016's Pascal.
 
Joined
Jul 18, 2016
Messages
353 (0.13/day)
Location
Indonesia
System Name Nero Mini
Processor AMD Ryzen 7 5800X 4.7GHz-4.9GHz
Motherboard Gigabyte X570i Aorus Pro Wifi
Cooling Noctua NH-D15S+3x Noctua IPPC 3K
Memory Team Dark 3800MHz CL16 2x16GB 55ns
Video Card(s) Palit RTX 2060 Super JS Shunt Mod 2130MHz/1925MHz + 2x Noctua 120mm IPPC 3K
Storage Adata XPG Gammix S50 1TB
Display(s) LG 27UD68W
Case Lian-Li TU-150
Power Supply Corsair SF750 Platinum
Software Windows 10 Pro
Wow, if that's true, those are the most gimped mobile variants relative to their desktop counterparts in a while... not surprising though, considering the power guzzler that is Ampere.
 
Joined
Feb 18, 2012
Messages
2,715 (0.61/day)
System Name MSI GP76
Processor intel i7 11800h
Cooling 2 laptop fans
Memory 32gb of 3000mhz DDR4
Video Card(s) Nvidia 3070
Storage x2 PNY 8tb cs2130 m.2 SSD--16tb of space
Display(s) 17.3" IPS 1920x1080 240Hz
Power Supply 280w laptop power supply
Mouse Logitech m705
Keyboard laptop keyboard
Software lots of movies and Windows 10 with win 7 shell
Benchmark Scores Good enough for me
Ampere is only a power guzzler when you have a 3080 or higher. The 3070 is rated at 220 W with the performance level of a 2080 Ti that's rated at, I think, 250 or 275 W.
It's really hard to cut a 3080 down from its 320 W rating to 150 W for a laptop, or 200 W for some of the bigger laptops. Most top-end laptop GPUs are rated at 150 W max; some are 200 W if the cooling allows it.
 
Deleted member 185088

Guest
It's a mess again; it shows how Ampere is not as big a jump as portrayed by some.
 
Joined
Jan 8, 2017
Messages
8,863 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Ouch, looks like the "M" parts are back. I wonder if they'll sell them under a different name compared to the desktop counterparts like they should, or if they'll be disingenuous about it like I'm expecting them to. "Max-Q" was already pretty bad and misleading.
 
Joined
Feb 15, 2019
Messages
1,525 (0.82/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
Cut down in both core (CUDA count) and memory (GDDR6, not GDDR6X).

Will the price get cut down too?
 
Joined
Mar 28, 2020
Messages
1,632 (1.12/day)
I recall Nvidia proudly mentioning that laptop GPU = desktop GPU back with Pascal, if I am not mistaken. With Ampere, that's taken a step back. I think it is a sensible decision for now because of the low supply of GA102, not to mention that its power requirement is very high for a laptop part (which would be difficult to cool as well, even if they scaled back the clock speed).
 
Joined
Oct 10, 2018
Messages
140 (0.07/day)
If this is true, the RTX 3060 will be good for laptops. With 3072 cores, it should give the same performance as the desktop RTX 2060 in a $1000 laptop (I hope). Now think about the desktop version: it is named RTX 3060, so it must give the RTX 2070's performance, and there is no GTX series this time. So an RTX 3060 6GB at about $230 would be good value. An RTX 3060 12GB at about $299-329 with RTX 2070S-2080 level performance would also be good value.

The problem with Ampere on Samsung 8nm is that it isn't really any more efficient than Turing on TSMC 12nm.
So you're paying this mad premium and all you're getting is the 3000-series name. It's still 100% constrained by its performance/Watt, which is distinctly 2018 levels, and Turing's 2018 level of performance/Watt wasn't actually amazing in the RTX cards; only the more efficient 1600-series cards really outshone 2016's Pascal.

I believe that perf/watt is bad because of the core counts: either Ampere's cores don't deliver their full performance, or Ampere is a bad architecture. Over the years, though, Nvidia will improve its architecture's per-core performance. Ampere is just the first architecture that has this many cores.
 

Havefun

New Member
Joined
Apr 16, 2020
Messages
3 (0.00/day)

NVIDIA GeForce GTX 1660 Ti Mobile

Graphics Processor TU116, Cores 1536, TMUs 96, ROPs 48, Memory Size 6 GB, Bus Width 192 bit
Base Clock 1455 MHz, Boost Clock 1590 MHz
Theoretical Performance: Pixel Rate 76.32 GPixel/s, Texture Rate 152.6 GTexel/s, FP16 (half) 9.769 TFLOPS (2:1), FP32 (float) 4.884 TFLOPS, FP64 (double) 152.6 GFLOPS (1:32)
Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile

Graphics Processor GA106, Cores 3072, TMUs 96, ROPs 48, Memory Size 6 GB, Bus Width 192 bit
Tensor Cores 96, RT Cores 24
Base Clock 900 MHz, Boost Clock 1425 MHz
Theoretical Performance: Pixel Rate 68.40 GPixel/s, Texture Rate 136.8 GTexel/s, FP16 (half) 8.755 TFLOPS (1:1), FP32 (float) 8.755 TFLOPS, FP64 (double) 136.8 GFLOPS (1:64)
Bandwidth 336.0 GB/s


Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less). The 1660 Ti is 11% faster except for FP32 (the 3060 is ~80% better due to Tensor cores) and some memory bandwidth. I will not pay double the price for 10 FPS in games. Let's hope mobile Radeons will be better.
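The theoretical figures in the two spec listings can be reproduced from unit counts and boost clock alone, which also shows where the "11%" and "~80%" come from. The sketch below is a rough Python check, assuming the usual convention of 2 FLOPs per CUDA core per clock (one fused multiply-add) and assuming per-pin memory data rates of 12 and 14 Gbps, which are not stated in the post but are what the listed bandwidths imply on a 192-bit bus. Function names are illustrative.

```python
def theoretical_rates(cores, tmus, rops, boost_mhz, flops_per_core_per_clock=2):
    """Peak rates derived from unit counts and boost clock (not measured performance)."""
    boost_ghz = boost_mhz / 1000
    return {
        "pixel_gpixels": rops * boost_ghz,    # one pixel per ROP per clock
        "texture_gtexels": tmus * boost_ghz,  # one texel per TMU per clock
        "fp32_tflops": cores * flops_per_core_per_clock * boost_ghz / 1000,
    }


def bandwidth_gbs(bus_bits, gbps_per_pin):
    """Memory bandwidth in GB/s; gbps_per_pin is an assumed data rate."""
    return bus_bits * gbps_per_pin / 8


gtx_1660_ti_mobile = theoretical_rates(cores=1536, tmus=96, rops=48, boost_mhz=1590)
rtx_3060_mobile = theoretical_rates(cores=3072, tmus=96, rops=48, boost_mhz=1425)

if __name__ == "__main__":
    pixel_ratio = gtx_1660_ti_mobile["pixel_gpixels"] / rtx_3060_mobile["pixel_gpixels"]
    fp32_ratio = rtx_3060_mobile["fp32_tflops"] / gtx_1660_ti_mobile["fp32_tflops"]
    print(f"1660 Ti Mobile pixel/texture rate advantage: {pixel_ratio - 1:.1%}")
    print(f"3060 Mobile FP32 advantage: {fp32_ratio - 1:.1%}")
    print(bandwidth_gbs(192, 12), bandwidth_gbs(192, 14))
```

Under these assumptions, the pixel and texture gap is purely the boost clock ratio (1590 / 1425), and the FP32 gap is twice the core count scaled by the lower clock.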

 
Joined
Feb 20, 2019
Messages
7,194 (3.86/day)

NVIDIA GeForce GTX 1660 Ti Mobile

Graphics Processor TU116, Cores 1536, TMUs 96, ROPs 48, Memory Size 6 GB, Bus Width 192 bit
Base Clock 1455 MHz, Boost Clock 1590 MHz
Theoretical Performance: Pixel Rate 76.32 GPixel/s, Texture Rate 152.6 GTexel/s, FP16 (half) 9.769 TFLOPS (2:1), FP32 (float) 4.884 TFLOPS, FP64 (double) 152.6 GFLOPS (1:32)
Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile

Graphics Processor GA106, Cores 3072, TMUs 96, ROPs 48, Memory Size 6 GB, Bus Width 192 bit
Tensor Cores 96, RT Cores 24
Base Clock 900 MHz, Boost Clock 1425 MHz
Theoretical Performance: Pixel Rate 68.40 GPixel/s, Texture Rate 136.8 GTexel/s, FP16 (half) 8.755 TFLOPS (1:1), FP32 (float) 8.755 TFLOPS, FP64 (double) 136.8 GFLOPS (1:64)
Bandwidth 336.0 GB/s


Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less). The 1660 Ti is 11% faster except for FP32 (the 3060 is ~80% better due to Tensor cores) and some memory bandwidth. I will not pay double the price for 10 FPS in games. Let's hope mobile Radeons will be better.

The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile are going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things, but whilst they've doubled the "core" count and power consumption, they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempt to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power without providing the expected performance.
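The arithmetic behind the "about 60%" and "26%" figures is short enough to spell out. This is a minimal Python sketch of that reasoning only, taking the post's premise (equal performance and equal clocks for the 2080S and 3060 Ti) at face value; the function name is made up for illustration.

```python
def per_core_comparison(turing_cores, ampere_cores, perf_ratio=1.0):
    """Compare work per 'core' at equal clocks.

    perf_ratio is Ampere performance divided by Turing performance;
    1.0 encodes the post's premise that the two cards perform the same.
    """
    core_ratio = ampere_cores / turing_cores
    # If both cards do the same work, each Turing core does core_ratio
    # times the work of one Ampere core.
    turing_work_per_core = core_ratio / perf_ratio
    # Doubling the core count while each core does 1/turing_work_per_core
    # of the work gives this net performance multiplier.
    gain_from_doubling = 2.0 / turing_work_per_core
    return core_ratio, turing_work_per_core, gain_from_doubling


if __name__ == "__main__":
    core_ratio, per_core, gain = per_core_comparison(3072, 4864)
    print(f"Ampere uses {core_ratio - 1:.0%} more cores")          # about 58%
    print(f"Net gain from doubling cores: {gain - 1:.0%}")         # about 26%
```

So the 26% is simply 2 / 1.583: double the cores, each doing roughly 63% of the work.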
 
Joined
Nov 11, 2016
Messages
3,045 (1.13/day)
System Name The de-ploughminator Mk-II
Processor i7 13700KF
Motherboard MSI Z790 Carbon
Cooling ID-Cooling SE-226-XT + Phanteks T30
Memory 2x16GB G.Skill DDR5 7200Cas34
Video Card(s) Asus RTX4090 TUF
Storage Kingston KC3000 2TB NVME
Display(s) LG OLED CX48"
Case Corsair 5000D Air
Power Supply Corsair HX850
Mouse Razor Viper Ultimate
Keyboard Corsair K75
Software win11
The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile are going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things, but whilst they've doubled the "core" count and power consumption, they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempt to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power without providing the expected performance.

Ampere does appear to be more efficient at the lower performance tier though

[Chart: performance per Watt, FPS at 1920x1080]

At the same TDP as Max-Q Turing, Max-Q Ampere would be ~25% faster, which is wasted given how slow mobile CPUs currently are anyway.
I have an Intel 10875H + 2070 Super Max-Q laptop and most of the time I run into a CPU bottleneck in games.
 
Joined
Jan 24, 2011
Messages
161 (0.03/day)
The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile are going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things, but whilst they've doubled the "core" count and power consumption, they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempt to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power without providing the expected performance.
Comparing IPC between Turing and Ampere based on a single FP32 unit (CUDA core) is pointless. Per SM, Turing has 64 FP32 (CUDA) units + 64 INT32 units; Ampere has 64 FP32 units + 64 FP32/INT32 units.
You are comparing the 2080S with 48 SMs against the 3060 Ti, which has only 38 SMs (streaming multiprocessors), so it's not surprising that the performance is pretty close.

A better comparison would be RTX 2080 vs RTX 3070: they have the same number of SMs, the difference is 2x more CUDA cores and 50% more ROPs, and clock speed and bandwidth are comparable.
The difference in performance is 28%, which doesn't look great considering the chip has 2x more CUDA cores and 50% more ROPs, but half of those CUDA cores are doing either INT or FP operations and the number of SMs didn't change, so you can't really expect massive performance gains.

Now the question is whether it was worth it or not. The number of transistors increased by 28% (17.4 vs 13.6 billion), the same as performance, and this increase can't be caused only by the extra CUDA cores when you also have more ROPs and new features. Average power consumption is 215 W (2080) vs 220 W (3070), so only a 2% (5 W) difference for 28% more performance; yes, I know the manufacturing process is different. Alternatively, the RTX 3070 performs like the RTX 2080 Ti while having fewer transistors and lower power consumption.
Ampere is not worse than Turing.
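The numbers in this post can be checked with a few lines of Python. This is only a sketch of the arithmetic: the per-SM lane counts, SM counts, 28% performance difference, transistor counts and power figures are taken from the post as stated, and the 46-SM figure for the 3070 is derived from the 5,888 cores quoted in the article divided by 128.

```python
# FP32-capable lanes per SM as described in the post: Turing counts only its
# 64 dedicated FP32 units, Ampere also counts the 64 shared FP32/INT32 units.
FP32_LANES_PER_SM = {"Turing": 64, "Ampere": 128}


def cuda_cores(arch, sm_count):
    """Marketing 'CUDA core' count for a given SM count."""
    return sm_count * FP32_LANES_PER_SM[arch]


def perf_per_watt_gain(perf_ratio, new_power_w, old_power_w):
    """Relative performance/Watt of the new part versus the old one."""
    return perf_ratio / (new_power_w / old_power_w)


if __name__ == "__main__":
    print(cuda_cores("Turing", 48))   # 2080S
    print(cuda_cores("Ampere", 38))   # 3060 Ti
    print(cuda_cores("Ampere", 46))   # 3070
    print(f"transistors: +{17.4 / 13.6 - 1:.0%}")
    print(f"power: +{220 / 215 - 1:.1%}")
    print(f"perf/W: +{perf_per_watt_gain(1.28, 220, 215) - 1:.0%}")
```

With these inputs, perf/Watt for the 3070 comes out roughly 25% ahead of the 2080, which is the basis for the closing claim.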

NVIDIA GeForce GTX 1660 Ti Mobile

Graphics Processor TU116, Cores 1536, TMUs 96, ROPs 48, Memory Size 6 GB, Bus Width 192 bit
Base Clock 1455 MHz, Boost Clock 1590 MHz
Theoretical Performance: Pixel Rate 76.32 GPixel/s, Texture Rate 152.6 GTexel/s, FP16 (half) 9.769 TFLOPS (2:1), FP32 (float) 4.884 TFLOPS, FP64 (double) 152.6 GFLOPS (1:32)
Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile

Graphics Processor GA106, Cores 3072, TMUs 96, ROPs 48, Memory Size 6 GB, Bus Width 192 bit
Tensor Cores 96, RT Cores 24
Base Clock 900 MHz, Boost Clock 1425 MHz
Theoretical Performance: Pixel Rate 68.40 GPixel/s, Texture Rate 136.8 GTexel/s, FP16 (half) 8.755 TFLOPS (1:1), FP32 (float) 8.755 TFLOPS, FP64 (double) 136.8 GFLOPS (1:64)
Bandwidth 336.0 GB/s


Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less). The 1660 Ti is 11% faster except for FP32 (the 3060 is ~80% better due to Tensor cores) and some memory bandwidth. I will not pay double the price for 10 FPS in games. Let's hope mobile Radeons will be better.

Do you know in what kind of workload the RTX 3060 clocks as low as 900 MHz, and what power consumption or TDP it actually has at that clock speed? If your answer is that you don't know, then your conclusion is premature.
What is important is the actual clock speed during gaming. If it is comparable to the 1660 Ti's, then the performance difference should be >20%. If it's lower, then performance will suffer, but Nvidia is positioning it above the 1660 Ti, so it should perform better.
BTW, where did you get those clocks for the 3060 Mobile?
 
Joined
Feb 20, 2019
Messages
7,194 (3.86/day)
Comparing IPC between Turing and Ampere based on a single FP32 unit (CUDA core) is pointless. Per SM, Turing has 64 FP32 (CUDA) units + 64 INT32 units; Ampere has 64 FP32 units + 64 FP32/INT32 units.
You are comparing the 2080S with 48 SMs against the 3060 Ti, which has only 38 SMs (streaming multiprocessors), so it's not surprising that the performance is pretty close.

A better comparison would be RTX 2080 vs RTX 3070: they have the same number of SMs, the difference is 2x more CUDA cores and 50% more ROPs, and clock speed and bandwidth are comparable.
The difference in performance is 28%, which doesn't look great considering the chip has 2x more CUDA cores and 50% more ROPs, but half of those CUDA cores are doing either INT or FP operations and the number of SMs didn't change, so you can't really expect massive performance gains.
These are all valid points. Ampere is, on paper, and in a theoretical scenario or synthetic test, both faster and more efficient than Turing.

The problem is that the applications and games we have right now can't fully utilise it - and the relatively short window of advantage that mobile GPUs get before something better/more efficient comes along means that I doubt those applications and games will exist during Ampere's window of relevance for "premium DTR/Gaming laptops".

The reason I picked the 2080S and not the 2080 as a matchup is because that's an exact performance match for the 3060Ti as tested by TPU in a wide range of current titles, right now. Clock-for-clock, Ampere uses 59% more 'cores' than Turing's 'cores' even though those two definitions of cores aren't the same from a technological standpoint. The underlying architecture of whether it's an FP, INT, or combined core doesn't matter to today's games, even if it will probably scale differently in future applications.

For Ampere's 12-18 months of laptop shelf life, it only matters today that 3072 Turing cores can do the same exact work as 4864 Ampere cores. For people hanging onto these laptops for 5+ years, it will probably make a big difference. Right now, today, that means diddly squat :p

Ampere does appear to be more efficient at the lower performance tier though
I'm not sure it's fair to make that comparison; An xx60 vs xx80 comparison isn't picking two SKUs targeting the same thing. If we're allowed to mix SKUs, the Turing 1660Ti matches the 3060Ti almost perfectly for performance/Watt and the 2080 is much better than the 3080!

For different models within any single product generation, the performance/Watt is less about architectural efficiency and more about the target market for the product. Lower end SKUs target more efficient operation for use with cheaper cooling/VRMs/PCB design. Flagship models go all out on cooling/VRMs/PCB and crank the power target as high as reasonably possible to be the best they can be for that generation. Same product generation and architecture, but opposite ends of the performance/Watt spectrum, which is why comparing them specifically on performance/Watt is so meaningless.

So yes, you can make the efficiency comparison, but it's only going to be fair in like-for-like examples aimed at the same performance segment and market point, so the closest we have to that is 3080 vs 2080, 3070 vs 2070 etc. It's not even as clean-cut as that, because you could argue that the 3080 is actually closer to the 2080Ti because they both share xx102 silicon dies, and likewise the closest match for a 3070 is actually a 2080S because both of those represent the xx104 dies. I'm not suggesting that either die parity or SKU parity is 100% right, but they're definitely less wrong than making comparisons that are neither the same die nor SKU.
 
Joined
Nov 11, 2016
Messages
3,045 (1.13/day)
I'm not sure it's fair to make that comparison. an xx60 card vs an xx80 card comparison is two different power and performance targets. The Turing 1660Ti matches the 3060Ti almost perfectly in that regard and the 3080 is much worse than the 2080.

For different models within any single product generation, the performance/Watt is less about architectural efficiency and more about the target market for the product. Lower end SKUs target more efficient operation for use with cheaper cooling/VRMs/PCB design. Flagship models go all out on cooling/VRMs/PCB and crank the power target as high as reasonably possible to be the best they can be for that generation. Same product generation and architecture, but opposite ends of the performance/Watt spectrum.

So yes, you can make the efficiency comparison, but it's only going to be fair in like-for-like examples aimed at the same performance segment and market point, so the closest we have to that is 3080 vs 2080, 3070 vs 2070 etc.

You can't compare the efficiency of desktop GPUs and make conjecture about mobile GPUs anyways.

At the same TGP (80W-90W-115W form factor), the stronger GPU will be the better performer.
When you look at the performance per watt of the Desktop 2080 Super, it's nothing special, but the mobile 2080 Super Max-Q is the efficiency king.

So yeah, the 3080 Max-Q will no doubt beat the 2080 Super Max-Q by at least 20%, when you are not being CPU bottlenecked that is.
IMHO, mobile RTX 3000 should be paired with Ryzen 5000 mobile and nothing less; it was kinda disappointing to see OEMs only pair AMD Renoir CPUs with the mobile RTX 2060 or slower.
 
Joined
Feb 20, 2019
Messages
7,194 (3.86/day)
You can't compare the efficiency of desktop GPUs and make conjecture about mobile GPUs anyways.

At the same TGP (80W-90W-115W form factor), the stronger GPU will be the better performer.
When you look at the performance per watt of the Desktop 2080 Super, it's nothing special, but the mobile 2080 Super Max-Q is the efficiency king.

So yeah, the 3080 Max-Q will no doubt beat the 2080 Super Max-Q by at least 20%, when you are not being CPU bottlenecked that is.
IMHO, mobile RTX 3000 should be paired with Ryzen 5000 mobile and nothing less; it was kinda disappointing to see OEMs only pair AMD Renoir CPUs with the mobile RTX 2060 or slower.
Yeah, that's kind of my point: I'm not comparing GPUs, I'm comparing architectures. In today's software, Nvidia's definition of a Turing core is more efficient at any given clock speed than Nvidia's definition of an Ampere core.

I have been practising what you suggest for about a decade now - buy a higher-end SKU than I need and downclocking it to bring the efficiency up. That's all mobile models really are anyway.
 
Joined
Jan 24, 2011
Messages
161 (0.03/day)
These are all valid points. Ampere is, on paper, and in a theoretical scenario or synthetic test, both faster and more efficient than Turing.
Ampere is on average faster and more efficient in real world games as shown in TPU reviews.
The reason I picked the 2080S and not the 2080 as a matchup is because that's an exact performance match for the 3060Ti as tested by TPU in a wide range of current titles, right now. Clock-for-clock, Ampere uses 59% more 'cores' than Turing's 'cores' even though those two definitions of cores aren't the same from a technological standpoint. The underlying architecture of whether it's an FP, INT, or combined core doesn't matter to today's games, even if it will probably scale differently in future applications.
The 3060 Ti is a bit faster than both the 2080 and 2080S. If you want to compare the exact same performance, then there is 2080 Ti vs 3070.
Ampere doesn't really have 2x more CUDA cores; the number of cores (units) per SM is the same, the difference is only that half of them can now do either an FP32 or an INT32 operation.
Even today it matters what kind of unit (FP, INT or combined) it is. If you have a combined core (unit) and the game needs to use the INT32 units, then there is no advantage over Turing: it's just 64 FP32 and 64 INT32 per SM. Ampere has the advantage when no INT32 instructions are executed, and then you have 128 FP32 units.
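To make the point about instruction mix concrete, here is a toy Python model of one SM. It is a deliberate simplification and not a description of Nvidia's actual scheduler: it assumes perfect lane utilisation, counts only FP32 and INT32 operations, and uses the per-SM unit counts given in this post. All function names are illustrative.

```python
def cycles_per_sm(arch, fp_ops, int_ops):
    """Idealised cycles for one SM to issue a mix of FP32 and INT32 operations."""
    if arch == "Turing":
        # 64 dedicated FP32 lanes and 64 dedicated INT32 lanes run side by side.
        return max(fp_ops / 64, int_ops / 64)
    if arch == "Ampere":
        # INT32 work can only use the 64 shared lanes; FP32 work can use all 128.
        return max(int_ops / 64, (fp_ops + int_ops) / 128)
    raise ValueError(f"unknown architecture: {arch}")


def ampere_speedup(int_fraction, total_ops=1_000_000):
    """Per-SM, per-clock speedup of Ampere over Turing for a given INT32 share."""
    int_ops = total_ops * int_fraction
    fp_ops = total_ops - int_ops
    return cycles_per_sm("Turing", fp_ops, int_ops) / cycles_per_sm("Ampere", fp_ops, int_ops)


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5):
        print(f"INT32 share {share:.0%}: Ampere speedup x{ampere_speedup(share):.2f}")
```

In this model, a pure FP32 workload gets the full 2x, an even FP32/INT32 split gets nothing, and real games land somewhere in between, which matches the argument that the doubled "core" count overstates the gain.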

Yeah, that's kind of my point: I'm not comparing GPUs, I'm comparing architectures. In today's software, Nvidia's definition of a Turing core is more efficient at any given clock speed than Nvidia's definition of an Ampere core.

I have been practising what you suggest for about a decade now - buy a higher-end SKU than I need and downclocking it to bring the efficiency up. That's all mobile models really are anyway.
Comparing a Turing core to an Ampere core is simply not right, and you can't draw any valid conclusion from it.
Turing has a fixed 64 FP32 and 64 INT32 units per SM.
Ampere has a fixed 64 FP32 units and 64 combined INT32/FP32 units per SM.
If it were a fixed 128 FP32 units + 64 INT32 units per SM, then so be it, but even that wouldn't be a fair comparison if the specs of the rest of the chip stayed the same.

I'm not sure it's fair to make that comparison; An xx60 vs xx80 comparison isn't picking two SKUs targeting the same thing. If we're allowed to mix SKUs, the Turing 1660Ti matches the 3060Ti almost perfectly for performance/Watt and the 2080 is much better than the 3080!
xx60 vs xx80 is just a marketing name; comparing based on the actual GPU die (TU104 vs GA104) is still better, but in my opinion the best is comparing based on SM count.
BTW here is a link to the performance/W chart from TPU, and at 4K it looks like this:
GTX 1660 Ti vs RTX 3060 Ti: 89% vs 108%
RTX 2080 vs RTX 3080: 90% vs 95%
RTX 2080S vs RTX 3080: 84% vs 95%
I will wait for actual reviews for mobile Ampere and won't make a final conclusion before that.
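The chart values quoted in this post are relative scores, so the generational change is the ratio of each pair. A quick Python sketch using only the percentages listed above (the dictionary and function names are illustrative):

```python
# (older card, newer card) -> (older perf/W score, newer perf/W score) at 4K,
# copied from the percentages quoted in the post.
PERF_PER_WATT_4K = {
    ("GTX 1660 Ti", "RTX 3060 Ti"): (89, 108),
    ("RTX 2080", "RTX 3080"): (90, 95),
    ("RTX 2080S", "RTX 3080"): (84, 95),
}


def relative_gain(old_score, new_score):
    """Fractional perf/W improvement of the newer card over the older one."""
    return new_score / old_score - 1


if __name__ == "__main__":
    for (old, new), (old_score, new_score) in PERF_PER_WATT_4K.items():
        print(f"{new} vs {old}: {relative_gain(old_score, new_score):+.1%}")
```

On these figures, the gain is largest for the lower-tier pairing and smallest for 2080 vs 3080.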
 
Joined
Jul 9, 2015
Messages
3,413 (1.07/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less). The 1660 Ti is 11% faster except for FP32 (the 3060 is ~80% better due to Tensor cores) and some memory bandwidth. I will not pay double the price for 10 FPS in games. Let's hope mobile Radeons will be better.
3060 is likely better due to the CUs (which are in reality half of the claimed) supporting 2 fp32 ops in parallel.
Unless I'm mistaken and that doesn't cover fp32.
 
Joined
Feb 20, 2019
Messages
7,194 (3.86/day)
Ampere is on average faster and more efficient in real world games as shown in TPU reviews.
Hey, don't quote me out of context! We're talking about PER CORE here and as shown in TPU reviews, Ampere is slower than Turing with the 3072 cores of a 2080S matching the performance of 4864 Ampere cores in a 3060Ti

The rest of your post seems to be a disagreement about what a core is, based on how many of each type in an SM.

At the end of the day, you can theorise until you're blue in the face, but according to the official Nvidia definition of 'cores', Turing does more per core than Ampere across TPU's combined game benchmark suite. That's Nvidia's official numbers against TPU's independent real world testing. If you disagree with either the definition of a core or W1zzard's benchmark results, take that up with them respectively. I'm not making those claims, they are.
 

Havefun

New Member
Joined
Apr 16, 2020
Messages
3 (0.00/day)
Do you know in what kind of workload the RTX 3060 clocks as low as 900 MHz, and what power consumption or TDP it actually has at that clock speed? If your answer is that you don't know, then your conclusion is premature.
What is important is the actual clock speed during gaming. If it is comparable to the 1660Ti, then the performance difference should be >20%. If it's lower, then performance will suffer, but Nvidia is positioning it above the 1660Ti, so it should perform better.
BTW, where did you get those clocks for the 3060 Mobile?
Yeah, there is still no RTX 3060 mobile released, so my conclusion is premature, same as the other conclusions here. I have this data from the TechPowerUp GPU database. TDP of the 3060 = 80 W. Now I checked it again: in relative performance the 1660 Ti is 7% better than the 3060, and 18% better than the 3060 Max-Q. In real performance the 3060 will be better, but I expect a 10-15 FPS difference. Is it worth it?

The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile is going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things but whilst they've doubled the "core" count and power consumption they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempts to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
Yeah, exactly. When I saw 350 W for desktop Ampere I was curious how much they would cut it down for laptops (50-100 W). Ampere is so terrible that before it was released, they announced next-gen Hopper. :)
DLSS and RTX ON are not enough in reviews; they also push 4K resolution, because no laptops have that resolution :D
Producing 2x more cores also costs 2x more. Also, Nvidia is forcing people to pay for huge die areas of "Ray Cores". They should cut them down, at least for mobile versions.
 
Joined
Nov 11, 2016
Messages
3,045 (1.13/day)
System Name The de-ploughminator Mk-II
Processor i7 13700KF
Motherboard MSI Z790 Carbon
Cooling ID-Cooling SE-226-XT + Phanteks T30
Memory 2x16GB G.Skill DDR5 7200Cas34
Video Card(s) Asus RTX4090 TUF
Storage Kingston KC3000 2TB NVME
Display(s) LG OLED CX48"
Case Corsair 5000D Air
Power Supply Corsair HX850
Mouse Razer Viper Ultimate
Keyboard Corsair K75
Software win11
GA104 is 392 mm² vs. 545 mm² for TU104. More cores or not, GA104 is cheaper to produce than TU104.
Nvidia is paying Samsung and TSMC per wafer, not per chip or by how many transistors each chip has.
Even if GA104 is only faster than TU104 by 20% at the same TGP, it is a success because it is enough of an upgrade that people are gonna buy them and Nvidia is making a higher profit margin in the process.
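
To put a rough number on the per-wafer argument, here is a minimal sketch using a common first-order dies-per-wafer approximation. Only the 392 mm² and 545 mm² die areas come from the post. The 300 mm wafer, the lack of any yield or scribe-line loss, and the idea that both processes cost the same per wafer are assumptions made purely for illustration.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """First-order estimate: wafer area / die area, minus an edge-loss term.

    Ignores defects, scribe lines and die aspect ratio (all assumptions).
    """
    radius = wafer_diameter_mm / 2.0
    whole_dies = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(whole_dies - edge_loss)


# Die areas as quoted in the post above.
ga104 = gross_dies_per_wafer(392.0)
tu104 = gross_dies_per_wafer(545.0)

if __name__ == "__main__":
    print(f"GA104: {ga104} gross dies per wafer")
    print(f"TU104: {tu104} gross dies per wafer")
    print(f"GA104 yields about {ga104 / tu104:.2f}x as many candidate dies")
```

Under those assumptions the smaller die gets noticeably more candidates out of each wafer, which is the point the post is making. Real cost per chip also depends on wafer price and yield, neither of which is given in this thread.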
 
Joined
Jan 24, 2011
Messages
161 (0.03/day)
You are talking about PER CORE performance and I am trying to tell you that's not a good comparison.
Yeah, Nvidia uses "CUDA core" as a marketing name for FP32 units and I don't really have a problem with that, even if it's a bit misleading, or with TPU's benchmark results; what I have a problem with is your comparison. Neither Nvidia, TPU nor other reviewers draw conclusions about Ampere vs Turing based on the number of CUDA cores; you are the only one.
Let's compare performance (IPC), power consumption and efficiency (performance/W) based on SMs and CUDA cores -> RTX 2080 vs RTX 3070
The RTX 3070, with 2x the CUDA cores, is only 28% faster at 4K resolution and consumes 220 W, or 5 W more than the RTX 2080, which means it has a 25% better performance/W ratio.
Performance PER SM -> an Ampere SM is 28% faster and consumes 2.3% more power than a Turing SM.
Performance PER CUDA core -> an Ampere CUDA core is ~36% slower and consumes 49% less power than a Turing CUDA core, and that's simply hilarious, because the CUDA core is the same in both Ampere and Turing; there was no change. The change happened a level higher, in the SM, where the original 64 INT32 units are now capable of FP32 execution.
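
The percentages above can be reproduced with a few lines of arithmetic. This is only a sketch: the +28% performance figure and the 220 W / 215 W board power are the numbers used in the post, while the 46-SM count and the 2944 / 5888 core counts are the public desktop specs, filled in here as assumptions.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    sms: int
    cuda_cores: int
    power_w: float
    rel_perf: float  # performance relative to the RTX 2080 (= 1.00)


def pct_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new / old - 1.0) * 100.0


# SM and core counts are assumed from public spec sheets, not from the thread.
RTX_2080 = Gpu("RTX 2080", sms=46, cuda_cores=2944, power_w=215.0, rel_perf=1.00)
RTX_3070 = Gpu("RTX 3070", sms=46, cuda_cores=5888, power_w=220.0, rel_perf=1.28)


def compare(old: Gpu, new: Gpu) -> dict:
    """Return percentage differences of `new` vs `old` at three granularities."""
    return {
        "perf_per_watt": pct_change(new.rel_perf / new.power_w, old.rel_perf / old.power_w),
        "perf_per_sm": pct_change(new.rel_perf / new.sms, old.rel_perf / old.sms),
        "power_per_sm": pct_change(new.power_w / new.sms, old.power_w / old.sms),
        "perf_per_core": pct_change(new.rel_perf / new.cuda_cores, old.rel_perf / old.cuda_cores),
        "power_per_core": pct_change(new.power_w / new.cuda_cores, old.power_w / old.cuda_cores),
    }


result = compare(RTX_2080, RTX_3070)

if __name__ == "__main__":
    for metric, value in result.items():
        print(f"{metric:>15}: {value:+.1f}%")
```

The same two cards look better per SM and worse per CUDA core only because the divisor changes, which is the point of the comparison.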

Yeah, there is still no RTX 3060 mobile released, so my conclusion is premature, same as the other conclusions here. I have this data from the TechPowerUp GPU database. TDP of the 3060 = 80 W. Now I checked it again: in relative performance the 1660 Ti is 7% better than the 3060, and 18% better than the 3060 Max-Q. In real performance the 3060 will be better, but I expect a 10-15 FPS difference. Is it worth it?
There is no official info about performance or clock speed, so it's just an estimate someone put there; that's why they write:
This product is not released yet.
Data on this page may change in the future.
My estimate for the RTX 3060 (Max-Q) is ~20-30% better performance than the 1660Ti (Max-Q), if the clock speed is comparable. Whether it's worth it or not is something everyone has to answer for themselves.
 
Joined
Feb 20, 2019
Messages
7,194 (3.86/day)
You're still trying to have an irrelevant and misdirected argument with me about how the cores aren't comparable between generations.

I'm not the one defining cores.
I'm not the one publishing data showing the 3060Ti performance parity with a 2080S

If you don't like it, it's not me that needs convincing; you're preaching to the choir and have been for some time in this thread. Nvidia are, whether you like it or not, marketing and selling their product on core count. This article is mostly about core count (@Raevenlord mentions it 6 times in a single paragraph) and when most people look at GPU specs the two most important factors are the number of cores and the clocks those cores run at.

I get (I always got) the architectural dissimilarities between a Turing and an Ampere core. I know and I don't care that per-core performance isn't a good comparison - that's the comparison that is being made, that Nvidia themselves make, that reviewers make, and that many users will too. Regardless of the comparison's future/architectural relevance, mobile Turing owners will multiply the number of cores they currently have by 1.58 (for the 58% per-core advantage over Ampere) and know that in currently-benchmarked games, an Ampere purchase with fewer cores than that isn't going to be any faster. That's simple maths and backed up with clock-comparable empirical data.
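
For anyone who wants to check that "simple maths", here is a minimal sketch. It assumes, as the post does, that a 2080S (3072 cores) and a 3060Ti (4864 cores) perform the same at similar clocks; the function names are made up for illustration.

```python
# Core counts of the two cards the post treats as performance-equivalent.
TURING_2080S_CORES = 3072
AMPERE_3060TI_CORES = 4864


def per_core_advantage() -> float:
    """How many Ampere cores it takes to match one Turing core."""
    return AMPERE_3060TI_CORES / TURING_2080S_CORES


def gain_from_doubling_cores() -> float:
    """Speed-up from doubling the core count if each new core does 1/advantage of the work."""
    return 2.0 / per_core_advantage()


def break_even_ampere_cores(turing_cores: int) -> float:
    """Ampere core count that would only match a given Turing core count."""
    return turing_cores * AMPERE_3060TI_CORES / TURING_2080S_CORES


if __name__ == "__main__":
    print(f"Per-core advantage of Turing: {per_core_advantage():.2f}x")
    print(f"Gain from doubling cores: {(gain_from_doubling_cores() - 1) * 100:.0f}%")
    print(f"Break-even for 1920 Turing cores: {break_even_ampere_cores(1920):.0f} Ampere cores")
```

The ratio comes out a little under 1.6, and the doubling case lands on the roughly 26% gain mentioned earlier in the thread.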
 
Joined
Jan 24, 2011
Messages
161 (0.03/day)
OK, after reading your last two sentences I now get what you were pointing at.
So the conclusion based on desktop models is that an Ampere GPU is more power efficient (perf/W) than a Turing GPU and as fast or faster, depending on which models you are comparing (3060Ti vs 2080S, 3070 vs 2080, 3070 vs 2080Ti). But because of the architectural changes in the SM, you need to watch the number of CUDA cores even at the same clock speed, because it's not representative of the performance gain over the Turing architecture, and you could end up with much lower gaming performance than you wanted. :D
For example, if you want to upgrade from an RTX 2060 mobile with 1920 CUDA cores, then an Ampere GPU with 2944-3072 cores will perform similarly even though the difference in CUDA cores is 53-60%, so you need to choose an Ampere GPU with 3840 CUDA cores or more if you want at least ~25% more performance.
I think this sums it up pretty nicely.
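
The upgrade example can be turned into a small calculator. This sketch assumes the 3060Ti vs 2080S parity discussed in the thread holds for mobile parts at equal clocks, and that Ampere cores come in blocks of 128 per SM (6144 cores / 48 SMs on a full GA104). The function name is hypothetical.

```python
import math

# Ampere cores needed per Turing core, from the 3060Ti (4864) vs 2080S (3072) parity.
PER_CORE_RATIO = 4864 / 3072
# 6144 cores / 48 SMs on a full GA104.
AMPERE_CORES_PER_SM = 128


def min_ampere_cores(turing_cores: int, target_gain: float = 0.0) -> int:
    """Smallest whole-SM Ampere core count expected to beat a Turing GPU by `target_gain`.

    target_gain=0.25 means "at least 25% faster"; clocks are assumed equal.
    """
    needed = turing_cores * PER_CORE_RATIO * (1.0 + target_gain)
    sms = math.ceil(needed / AMPERE_CORES_PER_SM)
    return sms * AMPERE_CORES_PER_SM


if __name__ == "__main__":
    # RTX 2060 mobile with 1920 CUDA cores, as in the example above.
    print("Match:", min_ampere_cores(1920))
    print("+25%: ", min_ampere_cores(1920, 0.25))
```

For the RTX 2060 mobile case this gives 3072 cores just to match and 3840 cores for roughly 25% more, in line with the figures in the post.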


I have to wonder if the RTX 3060 mobile will have only 3072 CUDA cores (24 SMs) with a 192-bit GDDR6 bus.
An uncut GA104 has 6144 CUDA cores (48 SMs) and a 256-bit GDDR6 bus. Based on this, even a 128-bit bus should be enough for 3072 cores (24 SMs), and Nvidia would need to deactivate half of the cores (SMs) to get this from GA104; that's too much of a waste.
I think this RTX 3060 is based on GA106 and there will be an RTX 3060Ti or Super with 3840 cores. The question is whether GA106 will have only 24 or 30 SMs in its full config, but considering GA104 has 48 SMs, I think it will have 30 SMs and a 192-bit GDDR6 bus.
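
The SM and bus arithmetic behind that guess, as a short sketch. The full-GA104 figures are the ones quoted in the post. Scaling bus width linearly with SM count is just the rule of thumb the post uses, not a statement about how Nvidia actually configures its chips.

```python
# Full GA104 configuration as quoted in the post.
FULL_GA104_CORES = 6144
FULL_GA104_SMS = 48
FULL_GA104_BUS_BITS = 256

CORES_PER_SM = FULL_GA104_CORES // FULL_GA104_SMS  # 128


def sms_for(cores: int) -> int:
    """Number of SMs needed for a given CUDA core count."""
    if cores % CORES_PER_SM:
        raise ValueError("core count is not a whole number of SMs")
    return cores // CORES_PER_SM


def disabled_fraction(cores: int) -> float:
    """Share of a full GA104 that would have to be fused off."""
    return 1.0 - cores / FULL_GA104_CORES


def proportional_bus_bits(cores: int) -> float:
    """Bus width if memory bandwidth were scaled linearly with core count (assumption)."""
    return FULL_GA104_BUS_BITS * cores / FULL_GA104_CORES


if __name__ == "__main__":
    for cores in (3072, 3840):
        print(
            f"{cores} cores: {sms_for(cores)} SMs, "
            f"{disabled_fraction(cores):.1%} of GA104 disabled, "
            f"~{proportional_bus_bits(cores):.0f}-bit bus by proportion"
        )
```

A 3072-core part would leave half of a GA104 dark, which is why a separate, smaller GA106 die looks like the more plausible basis.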
 