Wednesday, December 30th 2020

Rumor: NVIDIA RTX 3080, 3070, 3060 Mobile Specifications Detailed

Apparently, specifications for NVIDIA's upcoming RTX 30-series mobile solutions have been made public. According to Videocardz via Notebookcheck, NVIDIA will introduce three mobile versions of their RTX 30-series graphics cards in the form of the RTX 3080, RTX 3070 and RTX 3060. Like past NVIDIA mobile solutions, these won't directly correspond, hardware-wise, to their desktop counterparts; NVIDIA has the habit of downgrading their mobile solutions' chips compared to their desktop counterparts. According to the leaked specifications, this means the mobile RTX 3080 will make use of the company's GA-104 chip, instead of the GA-102 silicon found on desktop versions of the card.

The mobile RTX 3080 should thus feature a total of 6,144 CUDA cores, as present in the fully-enabled GA-104 chip (compare that to the 5,888 CUDA cores available on the desktop RTX 3070, and the 8,704 CUDA cores available on the desktop RTX 3080). These CUDA cores would be clocked at up to 1.7 GHz. The memory bus should also see a cut down to 256-bit, which would allow NVIDIA to distribute as many as 4 versions of the RTX 3080 mobile: Max-Q (TGP 80-90 W) and Max-P (TGP 115-150 W), each with either 8 GB or 16 GB of GDDR6 memory. The RTX 3070 mobile keeps the GA-104 chip, 256-bit bus and GDDR6 memory subsystem (apparently with only an 8 GB memory pool available), but further cuts down CUDA cores to 5,120 (Max-Q TGP 80-90 W, Max-P TGP 115-150 W). Finally, the RTX 3060 mobile should make use of the GA-106 chip, set up with 3,072 available CUDA cores and a 192-bit memory bus across its 6 GB GDDR6 VRAM pool (Max-Q TGP 60-70 W, Max-P TGP 80-115 W). Expect these specs to be confirmed (or not) come January 12th.
Sources: Notebookcheck, via Videocardz

33 Comments on Rumor: NVIDIA RTX 3080, 3070, 3060 Mobile Specifications Detailed

#1
hgk87
Back to the Maxwell days, I see. It goes to show how efficient Pascal was: all the desktop and laptop variants used the same silicon (the 1070 being an exception, where the mobile part was even better than the desktop variant), without much gimping of clocks to meet the TDP requirements, and they were within 15% of the desktop variants' performance. Turing, on the other hand, was a mess. Anything above the 1660 Ti was nowhere close to its desktop variant even though it used the same silicon, including the Super variants that came later. Ampere is, as we all know, a power-guzzling architecture, and there's no way they could fit anything above the GA104 silicon without melting the laptop chassis, even with ridiculous gimping of clocks, so they just had to evolve, but backwards this time.
Posted on Reply
#2
Chrispy_
The problem with Ampere on Samsung 8nm is that it isn't really any more efficient than Turing on TSMC 12nm

So you're paying this mad premium and all you're getting is the 3000-series name. It's still 100% constrained by its performance/Watt, which is distinctly at 2018 levels, and Turing's 2018 level of performance/Watt wasn't actually amazing in the RTX cards; only the more efficient 1600-series cards really outshone 2016's Pascal.
Posted on Reply
#3
owen10578
Wow if that's true that's the most gimped mobile variant of their desktop counterparts in a while...not surprising though considering the power guzzler that is Ampere.
Posted on Reply
#4
yotano211
Ampere is only a power guzzler when you have a 3080 or higher. The 3070 is rated at 220 W with the performance level of a 2080 Ti that's rated at, I think, 250 or 275 W.
It's really hard to cut a 3080 down from its 320 W rating to 150 W for a laptop, or 200 W for some of the bigger laptops. Most top-end laptop GPUs are rated at 150 W max; some are 200 W if the cooling allows it.
Posted on Reply
#5
Xex360
It's a mess again; it shows that Ampere is not as big a jump as portrayed by some.
Posted on Reply
#6
Vya Domus
Ouch, looks like the "M" parts are back. I wonder if they'll sell them under a different name compared to the desktop counterparts like they should, or if they'll be disingenuous about it like I'm expecting them to. "Max-Q" was already pretty bad and misleading.
Posted on Reply
#7
Crackong
Cut down in both cores (CUDA count) and memory (non-X)

Will the price get a cut-down too?
Posted on Reply
#8
watzupken
I recall Nvidia proudly mentioned that the laptop GPU = desktop GPU back then with Pascal if I am not mistaken. With Ampere, that's taken a step back. I think it is a sensible decision for now because of the low supply of GA102, and not to mention that the power requirement is very high for a laptop part (which will be difficult to cool as well even if they scale back the clockspeed).
Posted on Reply
#9
Ibotibo01
If this is true, the RTX 3060 will be good for laptops. With 3072 cores, it should give the same performance as the desktop RTX 2060 for $1000 (I hope). Now think about the desktop version: since it's named RTX 3060, it should give RTX 2070 performance, and there is no GTX series this time. So the RTX 3060 6GB, at about $230, would be good value. The RTX 3060 12GB, at about $299-329 for RTX 2070S-2080 performance, would also be good value.
Chrispy_
The problem with Ampere on Samsung 8nm is that it isn't really any more efficient than Turing on TSMC 12nm
So you're paying this mad premium and all you're getting is the 3000-series name, it's still 100% contstrained by its performance/Watt which is distinctly 2018 levels, and Turing's 2018 level of performance/Watt wasn't actually amazing in the RTX cards, only the more efficient 1600-series cards really outshone 2016's Pascal.
I believe perf/watt is bad either because of the core counts, with Ampere's cores not delivering their full performance, or because Ampere is a bad architecture, but over the years Nvidia will improve its per-core performance. After all, Ampere is the first architecture to have this many cores.
Posted on Reply
#10
Havefun

NVIDIA GeForce GTX 1660 Ti Mobile

Graphics Processor: TU116, Cores: 1536, TMUs: 96, ROPs: 48, Memory Size: 6 GB, Bus Width: 192 bit
Base Clock: 1455 MHz, Boost Clock: 1590 MHz
Theoretical Performance: Pixel Rate 76.32 GPixel/s, Texture Rate 152.6 GTexel/s, FP16 (half) 9.769 TFLOPS (2:1), FP32 (float) 4.884 TFLOPS, FP64 (double) 152.6 GFLOPS (1:32)
Bandwidth: 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile

Graphics Processor: GA106, Cores: 3072, TMUs: 96, ROPs: 48, Memory Size: 6 GB, Bus Width: 192 bit
Tensor Cores: 96, RT Cores: 24
Base Clock: 900 MHz, Boost Clock: 1425 MHz
Theoretical Performance: Pixel Rate 68.40 GPixel/s, Texture Rate 136.8 GTexel/s, FP16 (half) 8.755 TFLOPS (1:1), FP32 (float) 8.755 TFLOPS, FP64 (double) 136.8 GFLOPS (1:64)
Bandwidth: 336.0 GB/s


Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less). The 1660 Ti is 11% faster, except for FP32 (where the 3060 is ~80% better due to Tensor cores) and some memory bandwidth. I will not pay double the price for 10 FPS in games. Let's hope mobile Radeons will be better.
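
For readers who want to check the spec-sheet figures above, here is a minimal Python sketch that recomputes the theoretical rates from the core, TMU and ROP counts and the boost clock. It assumes the usual conventions (2 FP32 operations per core per clock, one pixel per ROP per clock, one texel per TMU per clock); the function names are made up for illustration, and the per-pin memory data rate is not in the post, so it is only inferred from the quoted bandwidth and bus width.

```python
# Recompute the "Theoretical Performance" figures from the quoted spec sheets.
# Assumptions (not stated in the post): 2 FP32 ops per core per clock (one FMA),
# 1 pixel per ROP per clock, 1 texel per TMU per clock, all at boost clock.

def theoretical(cores, tmus, rops, boost_mhz):
    ghz = boost_mhz / 1000.0
    return {
        "fp32_tflops": cores * 2 * ghz / 1000.0,
        "pixel_gpix_s": rops * ghz,
        "texel_gtex_s": tmus * ghz,
    }

def implied_data_rate_gbps(bandwidth_gb_s, bus_width_bits):
    # bandwidth (GB/s) = bus width in bytes * per-pin data rate (Gbps)
    return bandwidth_gb_s * 8 / bus_width_bits

gtx_1660ti_mobile = theoretical(cores=1536, tmus=96, rops=48, boost_mhz=1590)
rtx_3060_mobile = theoretical(cores=3072, tmus=96, rops=48, boost_mhz=1425)

for name, t in (("GTX 1660 Ti Mobile", gtx_1660ti_mobile),
                ("RTX 3060 Mobile", rtx_3060_mobile)):
    print(f"{name}: {t['fp32_tflops']:.3f} TFLOPS FP32, "
          f"{t['pixel_gpix_s']:.2f} GPixel/s, {t['texel_gtex_s']:.1f} GTexel/s")

# Ratios behind the "11% faster" and "~80% better" remarks
pixel_advantage_1660ti = gtx_1660ti_mobile["pixel_gpix_s"] / rtx_3060_mobile["pixel_gpix_s"] - 1
fp32_advantage_3060 = rtx_3060_mobile["fp32_tflops"] / gtx_1660ti_mobile["fp32_tflops"] - 1
print(f"1660 Ti pixel/texture rate advantage: {pixel_advantage_1660ti:.1%}")
print(f"3060 FP32 advantage: {fp32_advantage_3060:.1%}")
```

Under these assumptions the FP32 gap follows directly from the doubled core count at a lower boost clock, while the pixel and texture gap follows from identical ROP and TMU counts at different boost clocks.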

Posted on Reply
#11
Chrispy_
Havefun

NVIDIA GeForce GTX 1660 Ti Mobile

Graphics Processor TU116 Cores 1536 TMUs 96 ROPs 48 Memory Size 6 GB Bus Width 192 bit
Base Clock 1455 MHz Boost Clock 1590 MHz
Theoretical Performance Pixel Rate 76.32 GPixel/s Texture Rate 152.6 GTexel/s FP16 (half) 9.769 TFLOPS (2:1) FP32 (float) 4.884 TFLOPS FP64 (double) 152.6 GFLOPS (1:32) Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile

Graphics Processor GA106 Cores 3072 TMUs 96 ROPs 48 Memory Size 6 GB Bus Width 192 bit
Tensor Cores 96 RT Cores 24
Base Clock 900 MHz Boost Clock 1425 MHz
Theoretical Performance Pixel Rate 68.40 GPixel/s Texture Rate 136.8 GTexel/s FP16 (half) 8.755 TFLOPS (1:1) FP32 (float) 8.755 TFLOPS FP64 (double) 136.8 GFLOPS (1:64)
Bandwidth 336.0 GB/s


Power efficiency is so bad, that they reduce clocks to 900 MHz (555 less). 1660Ti is 11% faster except for FP32(3600 is ~80% better due to Tensor cores) and some memory bandwith. I will not pay double price for 10 FPS in games. Lets hope Radeons mobile will be beter


The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile is going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things but whilst they've doubled the "core" count and power consumption they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempts to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
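
The per-core arithmetic in this post can be reproduced with a short Python sketch. It takes the post's premise at face value (equal clocks and equal measured performance for the 2080S and 3060Ti); the variable names are illustrative only.

```python
# Per-core comparison as argued in the post.
# Premise taken from the post: RTX 2080S and RTX 3060Ti run at about the same
# clock and deliver about the same performance.
turing_cores = 3072   # RTX 2080S
ampere_cores = 4864   # RTX 3060Ti

core_ratio = ampere_cores / turing_cores      # Ampere "cores" per Turing "core"
turing_per_core_advantage = core_ratio - 1    # extra work done per Turing core
gain_from_doubling = 2 / core_ratio - 1       # speed-up if the core count is doubled

print(f"Ampere needs {core_ratio:.3f}x the cores for the same performance")
print(f"Turing per-core advantage: {turing_per_core_advantage:.1%}")
print(f"Performance gain from doubling the core count: {gain_from_doubling:.1%}")
```

The ratio works out to a little over 58%, which the thread rounds to 59-60%, and doubling the core count at that ratio gives the 26% figure quoted above.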
Posted on Reply
#12
nguyen
Chrispy_
The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile are going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nividia's 'Bulldozer architecture' mistake. They've tried to double up certain things but whilst they've doubled the "core" count and power consumption they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempts to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
Ampere does appear to be more efficient at the lower performance tier though



At the same TDP as the Max-Q Turing, Max-Q Ampere would be ~25% faster, which is wasted given how slow mobile CPUs currently are anyway.
I have an Intel 10875H + 2070 Super Max-Q laptop and most of the time I run into a CPU bottleneck in games.
Posted on Reply
#13
THANATOS
Chrispy_
The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile are going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nividia's 'Bulldozer architecture' mistake. They've tried to double up certain things but whilst they've doubled the "core" count and power consumption they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempts to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
Comparing IPC between Turing and Ampere based on a single FP32 unit (Cuda core) is pointless. Turing has 64x FP32 (Cuda) units + 64x INT32 units per SM; Ampere has 64x FP32 units + 64x FP32/INT32 units per SM.
You are comparing the 2080S with 48 SMs against the 3060Ti, which has only 38 SMs (streaming multiprocessors), so it's not surprising that the performance is pretty close.

A better comparison would be RTX 2080 vs RTX 3070: they have the same number of SMs, the difference is in 2x more Cuda cores and 50% more ROPs, and clock speed and bandwidth are comparable.
The difference in performance is 28%, which doesn't look great considering the chip has 2x more Cuda cores and 50% more ROPs, but half of those Cuda cores are doing either INT or FP operations and the number of SMs didn't change, so you can't really expect massive performance gains.

Now the question is whether it was worth it or not. The number of transistors increased by 28% (17.4 vs 13.6), the same as performance, and this increase shouldn't be caused only by adding extra Cuda cores when you also have more ROPs and new features. Average power consumption is 215 W (2080) vs 220 W (3070), so only a 2% (5 W) difference for 28% more performance; yes, I know the manufacturing process is different. Or put another way, the RTX 3070 performs like an RTX 2080Ti while having fewer transistors and lower power consumption.
Ampere is not worse than Turing.
Havefun

NVIDIA GeForce GTX 1660 Ti Mobile

Graphics Processor TU116 Cores 1536 TMUs 96 ROPs 48 Memory Size 6 GB Bus Width 192 bit
Base Clock 1455 MHz Boost Clock 1590 MHz
Theoretical Performance Pixel Rate 76.32 GPixel/s Texture Rate 152.6 GTexel/s FP16 (half) 9.769 TFLOPS (2:1) FP32 (float) 4.884 TFLOPS FP64 (double) 152.6 GFLOPS (1:32) Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile

Graphics Processor GA106 Cores 3072 TMUs 96 ROPs 48 Memory Size 6 GB Bus Width 192 bit
Tensor Cores 96 RT Cores 24
Base Clock 900 MHz Boost Clock 1425 MHz
Theoretical Performance Pixel Rate 68.40 GPixel/s Texture Rate 136.8 GTexel/s FP16 (half) 8.755 TFLOPS (1:1) FP32 (float) 8.755 TFLOPS FP64 (double) 136.8 GFLOPS (1:64)
Bandwidth 336.0 GB/s


Power efficiency is so bad, that they reduce clocks to 900 MHz (555 less). 1660Ti is 11% faster except for FP32(3600 is ~80% better due to Tensor cores) and some memory bandwith. I will not pay double price for 10 FPS in games. Lets hope Radeons mobile will be beter


Do you know in what kind of workload the RTX 3060 clocks as low as 900 MHz, and what power consumption or TDP it actually has at that clock speed? If your answer is that you don't know, then your conclusion is premature.
What is important is the actual clock speed during gaming. If it is comparable to the 1660Ti, then the performance difference should be >20%. If it's lower, then performance will suffer, but Nvidia is positioning it above the 1660Ti, so it should perform better.
BTW, where did you get those clocks for the 3060 Mobile?
Posted on Reply
#14
Chrispy_
THANATOS
Comparing IPC between Turing and Ampere based on a single FP32(Cuda core) is pointless. Turing per SM has 64x FP32(Cuda) units + 64x INT32 units, Ampere per SM has 64x FP32 units + 64x FP32/INT32 units.
You are comparing 2080S with 48SM against 3060Ti which has only 38SM(Streaming multi-processor), so It's not surprising that the performance is pretty close.

Better comparison would be RTX 2080 vs RTX 3070, they have the same number of SM, the difference is in 2x more Cuda cores and 50% more ROPs, clockspeed and bandwidth is comparable.
The difference in performance is 28%, which doesn't look great considering the chip has 2x more Cuda and 50% more ROPs, but half of those Cuda cores are doing either INT or FP operation and the number of SM didn't change so you can't really expect massive performance gains.
These are all valid points. Ampere is, on paper, and in a theoretical scenario or synthetic test, both faster and more efficient than Turing.

The problem is that the applications and games we have right now can't fully utilise it - and the relatively short window of advantage that mobile GPUs get before something better/more efficient comes along means that I doubt those applications and games will exist during Ampere's window of relevance for "premium DTR/Gaming laptops".

The reason I picked the 2080S and not the 2080 as a matchup is because that's an exact performance match for the 3060Ti as tested by TPU in a wide range of current titles, right now. Clock-for-clock, Ampere uses 59% more 'cores' than Turing's 'cores' even though those two definitions of cores aren't the same from a technological standpoint. The underlying architecture of whether it's an FP, INT, or combined core doesn't matter to today's games, even if it will probably scale differently in future applications.

For Ampere's 12-18 months of laptop shelf life, it only matters today that 3072 Turing cores can do the same exact work as 4864 Ampere cores. For people hanging onto these laptops for 5+ years, it will probably make a big difference. Right now, today, that means diddly squat :p
nguyen
Ampere does appear to be more efficient at the lower performance tier though
I'm not sure it's fair to make that comparison; An xx60 vs xx80 comparison isn't picking two SKUs targeting the same thing. If we're allowed to mix SKUs, the Turing 1660Ti matches the 3060Ti almost perfectly for performance/Watt and the 2080 is much better than the 3080!

For different models within any single product generation, the performance/Watt is less about architectural efficiency and more about the target market for the product. Lower end SKUs target more efficient operation for use with cheaper cooling/VRMs/PCB design. Flagship models go all out on cooling/VRMs/PCB and crank the power target as high as reasonably possible to be the best they can be for that generation. Same product generation and architecture, but opposite ends of the performance/Watt spectrum, which is why comparing them specifically on performance/Watt is so meaningless.

So yes, you can make the efficiency comparison, but it's only going to be fair in like-for-like examples aimed at the same performance segment and market point, so the closest we have to that is 3080 vs 2080, 3070 vs 2070 etc. It's not even as clean-cut as that, because you could argue that the 3080 is actually closer to the 2080Ti because they both share xx102 silicon dies, and likewise the closest match for a 3070 is actually a 2080S because both of those represent the xx104 dies. I'm not suggesting that either die parity or SKU parity is 100% right, but they're definitely less wrong than making comparisons that are neither the same die nor SKU.
Posted on Reply
#15
nguyen
Chrispy_
I'm not sure it's fair to make that comparison. an xx60 card vs an xx80 card comparison is two different power and performance targets. The Turing 1660Ti matches the 3060Ti almost perfectly in that regard and the 3080 is much worse than the 2080.

For different models within any single product generation, the performance/Watt is less about architectural efficiency and more about the target market for the product. Lower end SKUs target more efficient operation for use with cheaper cooling/VRMs/PCB design. Flagship models go all out on cooling/VRMs/PCB and crank the power target as high as reasonably possible to be the best they can be for that generation. Same product generation and architecture, but opposite ends of the performance/Watt spectrum.

So yes, you can make the efficiency comparison, but it's only going to be fair in like-for-like examples aimed at the same performance segment and market point, so the closest we have to that is 3080 vs 2080, 3070 vs 2070 etc.
You can't compare the efficiency of desktop GPUs and make conjecture about mobile GPUs anyways.

At the same TGP (80W-90W-115W form factor), the stronger GPU will be the better performer.
When you look at the performance per watt of the Desktop 2080 Super, it's nothing special, but the mobile 2080 Super Max-Q is the efficiency king.

So yeah, the 3080 Max-Q will no doubt beat the 2080 Super Max-Q by at least 20%, when you are not being CPU bottlenecked that is.
IMHO, mobile RTX 3000 should be paired with Ryzen 5000 mobile and nothing less; it was kinda disappointing to see OEMs only pair AMD Renoir CPUs with the mobile RTX 2060 or slower.
Posted on Reply
#16
Chrispy_
nguyen
You can't compare the efficiency of desktop GPUs and make conjecture about mobile GPUs anyways.

At the same TGP (80W-90W-115W form factor), the stronger GPU will be the better performer.
When you look at the performance per watt of the Desktop 2080 Super, it's nothing special, but the mobile 2080 Super Max-Q is the efficiency king.

So yeah, the 3080 Max-Q will no doubt beat the 2080 Super Max-Q by at least 20%, when you are not being CPU bottlenecked that is.
IMHO, mobile RTX3000 should be paired with Ryzen 5000 mobile and nothing less, it was kinda dissappointing to see OEM only paired AMD Renoir CPU with Mobile RTX 2060 or slower.
Yeah that's kind of my point, I'm not comparing GPUs I'm comparing architectures. In today's software Nvidia's definition of a Turing core is more efficient at any given clockspeed than Nvidia's definition of an Ampere core.

I have been practising what you suggest for about a decade now: buying a higher-end SKU than I need and downclocking it to bring the efficiency up. That's all mobile models really are anyway.
Posted on Reply
#17
THANATOS
Chrispy_
These are all valid points. Ampere is, on paper, and in a theoretical scenario or synthetic test, both faster and more efficient than Turing.
Ampere is on average faster and more efficient in real world games as shown in TPU reviews.
The reason I picked the 2080S and not the 2080 as a matchup is because that's an exact performance match for the 3060Ti as tested by TPU in a wide range of current titles, right now. Clock-for-clock, Ampere uses 59% more 'cores' than Turing's 'cores' even though those two definitions of cores aren't the same from a technological standpoint. The underlying architecture of whether it's an FP, INT, or combined core doesn't matter to today's games, even if it will probably scale differently in future applications.
3060Ti is a bit faster than both 2080 and 2080s. If you want to compare exact same performance then there is 2080Ti vs 3070.
Ampere doesn't really have 2x more Cuda cores, the number of cores(units) is the same per SM, the difference is only that half of them can now do either FP32 or INT32 operation.
Even today it matters what kind of unit (FP, INT or combined) it is. If you have a combined unit and the game needs INT32 work, then there is no advantage over Turing: it's just 64x FP32 and 64x INT32 per SM. Ampere has the advantage when no INT32 instructions are being executed, and then you have 128x FP32 units.
Chrispy_
Yeah that's kind of my point, I'm not comparing GPUs I'm comparing architectures. In today's software Nvidia's definition of a Turing core is more efficient at any given clockspeed than Nividia's definition of an Ampere core.

I have been practising what you suggest for about a decade now - buy a higher-end SKU than I need and downclocking it to bring the efficiency up. That's all mobile models really are anyway.
Comparing a Turing core to an Ampere core is simply not right, and you can't draw any valid conclusion from it.
Turing has fixed 64 FP32 and 64 INT32 units per SM.
Ampere has fixed 64 FP32 units and combined 64 INT32/FP32 units per SM.
If it were a fixed 128 FP32 units + 64 INT32 units per SM, then so be it, but even that wouldn't be fair if the specs of the rest of the chip stayed the same.
Chrispy_
I'm not sure it's fair to make that comparison; An xx60 vs xx80 comparison isn't picking two SKUs targeting the same thing. If we're allowed to mix SKUs, the Turing 1660Ti matches the 3060Ti almost perfectly for performance/Watt and the 2080 is much better than the 3080!
xx60 vs xx80 is just a marketing name; comparing based on the actual GPU die (TU104 vs GA104) is still better, but in my opinion the best is comparing based on SM count.
BTW, here is a link to the performance/W chart from TPU, and in 4K it looks like this:
GTX 1660Ti vs RTX 3060Ti: 89% vs 108%
RTX 2080 vs RTX 3080: 90% vs 95%
RTX 2080s vs RTX 3080: 84% vs 95%
I will wait for actual reviews of mobile Ampere and won't draw a final conclusion before that.
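
To put the chart figures in this post side by side, a small Python sketch can turn each pair into a relative efficiency gain. It assumes both percentages in a pair come from the same chart and share the same baseline, which is how the post presents them; the helper name is illustrative.

```python
# Relative performance/W at 4K as listed in the post
# (percent of the chart's baseline card, which the post does not name).
chart = [
    ("GTX 1660Ti", 89, "RTX 3060Ti", 108),
    ("RTX 2080", 90, "RTX 3080", 95),
    ("RTX 2080s", 84, "RTX 3080", 95),
]

def efficiency_gain(old_pct, new_pct):
    # Fractional performance/W improvement of the newer card over the older one
    return new_pct / old_pct - 1

gains = {(old, new): efficiency_gain(o, n) for old, o, new, n in chart}

for (old, new), gain in gains.items():
    print(f"{new} vs {old}: {gain:+.1%} performance/W")
```

On these numbers every Ampere card in the list comes out ahead of its Turing counterpart, with the largest gap in the xx60 pairing.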
Posted on Reply
#18
medi01
Havefun
Power efficiency is so bad, that they reduce clocks to 900 MHz (555 less). 1660Ti is 11% faster except for FP32(3600 is ~80% better due to Tensor cores) and some memory bandwith. I will not pay double price for 10 FPS in games. Lets hope Radeons mobile will be beter
3060 is likely better due to the CUs (which are in reality half of the claimed) supporting 2 fp32 ops in parallel.
Unless I'm mistaken and that doesn't cover fp32.
Posted on Reply
#19
Chrispy_
THANATOS
Ampere is on average faster and more efficient in real world games as shown in TPU reviews.
Hey, don't quote me out of context! We're talking about PER CORE here and as shown in TPU reviews, Ampere is slower than Turing with the 3072 cores of a 2080S matching the performance of 4864 Ampere cores in a 3060Ti

The rest of your post seems to be a disagreement about what a core is, based on how many of each type in an SM.

At the end of the day, you can theorise until you're blue in the face but according to the official Nvidia definition of 'cores', Turing does more per core than Ampere across TPU's combined game benchmark suite. That's Nvidia's official numbers against TPU's independent real world testing. If you disagree with either the definition of a core or W1zzard's benchmark results, take that up with them respectively. I'm not making those claims, they are.
Posted on Reply
#20
Havefun
THANATOS
Do you know in what kind of workload does RTX 3060 clock as low as 900Mhz and what power consumption or TDP It actually has at that clockspeed? If your answer is that you don't know, then your conlusion is premature.
What is important is the actual clockspeed during gaming, If It is comparable to 1660Ti, then the performance difference should be >20%. If It's lower then performance will suffer, but Nvidia is positioning It above 1660Ti so It should perform better.
BTW from where did you get those clocks for 3060 Mobile?
Yeah, the RTX 3060 mobile still hasn't been released, so my conclusion is premature, same as the other conclusions here. I have this data from the TechPowerUp GPU database. TDP of the 3060 = 80 W. Now I checked it again: in relative performance the 1660 Ti is 7% better than the 3060, and 18% better than the 3060 Max-Q. In real performance the 3060 will be better, but I expect a 10-15 FPS difference. Is it worth it?
Chrispy_
The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CU:
Both the 2080S and 3060Ti boost to around 1900MHz and have damn-near identical performance, but Turing achieves that with just 3072 cores, whilst Ampere uses 4864 to achieve the same thing.

The combination of reduced core count and clocks on Ampere Mobile are going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nividia's 'Bulldozer architecture' mistake. They've tried to double up certain things but whilst they've doubled the "core" count and power consumption they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games - or to rephrase that, Nvidia's attempts to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
Yeah exactly, when I saw 350 W for desktop Ampere I was curious how much they would cut it down for laptops (50-100 W). Ampere is so terrible that they announced next-gen Hopper before it was even released. :)
DLSS and RTX ON are not enough in reviews; they also push 4K resolution, because no laptops have that resolution :D
Producing 2x more cores also costs 2x more. Also, Nvidia is forcing people to pay for the huge die area of "Ray Cores". They should cut it down, at least for mobile versions.
Posted on Reply
#21
nguyen
Havefun
Yea, there is still no RTX 3060 mobile released so my conclusion is premature, same as other conclusions here. i have these data from Techpowerup GPU database. TDP 3600 = 80W. Now i checked it again - In relative performance 1660 Ti is 7% better then 3600 , and 18% better then 3600 Max-Q. In real performance 3600 will be better but i expect 10-15 FPS difference. Is it worth?


Yea exactly, when i saw 350W for desktop ampere i was courious how much they cut it down for laptops (50-100W). Ampere is so terrible that before it was released, they announced next gen Hopper. :)
DLSS and RTX ON is not enough in reviews, they also push 4k resolution, cos no laptops have that resolution :D
To produce 2x more cores also cost 2x more. Also Nvidia is forcing people to pay for huge die areas of "Ray Cores". They should cut it down atleast for Mobile versions.
GA104 is 392 mm² vs TU104's 545 mm². More cores or not, GA104 is cheaper to produce than TU104.
Nvidia is paying Samsung and TSMC per wafer, not per chip or by how many transistors they have.
Even if GA104 is only faster than TU104 by 20% at the same TGP, it is a success because it is enough of an upgrade that people are gonna buy them and Nvidia is making a higher profit margin in the process.
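
The per-wafer point can be illustrated with a rough Python sketch using the common gross-dies-per-wafer approximation. Everything beyond the two die areas is an assumption: a 300 mm wafer, no yield loss, no scribe lines, and no account of Samsung 8 nm and TSMC 12 nm wafers being priced differently, so treat it as an order-of-magnitude sketch rather than a cost model.

```python
import math

def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate candidate dies per wafer (ignores yield and scribe lines).

    Uses the common estimate: wafer area / die area, minus an edge-loss term
    of pi * diameter / sqrt(2 * die area).
    """
    radius = wafer_diameter_mm / 2
    wafer_area = math.pi * radius ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

ga104_dies = gross_dies_per_wafer(392)   # die area quoted in the post
tu104_dies = gross_dies_per_wafer(545)   # die area quoted in the post

print(f"GA104: ~{ga104_dies} candidate dies per 300 mm wafer")
print(f"TU104: ~{tu104_dies} candidate dies per 300 mm wafer")
print(f"GA104 yields ~{ga104_dies / tu104_dies:.2f}x as many dies per wafer")
```

If wafers cost about the same, more dies per wafer means a lower cost per die regardless of how many cores each die carries; whether the wafer prices really are comparable is not something the post establishes.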
Posted on Reply
#22
THANATOS
Chrispy_
Hey, don't quote me out of context! We're talking about PER CORE here and as shown in TPU reviews, Ampere is slower than Turing with the 3072 cores of a 2080S matching the performance of 4864 Ampere cores in a 3060Ti

The rest of your post seems to be a disagreement about what a core is, based on how many of each type in an SM.

At the end of the day, you can theorise until you're blue in the face but according to the the official Nvidia definition of 'cores', Turing does more per core than Ampere across TPU's combined game benchmark suite. That's Nvidia's official numbers against TPU's independent real world testing. If you disagree with either the definition of a core or W1zzard's benchmark results, take that up with them respectively. I'm not making those claims, they are.
You are talking about PER CORE performance and I am trying to tell you that's not a good comparison.
Yeah, Nvidia uses "Cuda core" as a marketing name for FP32 units, and I don't really have a problem with that, even if it's a bit misleading, or with TPU's benchmark results; what I have a problem with is your comparison. Neither Nvidia, TPU nor other reviewers draw conclusions about Ampere vs Turing based on the number of Cuda cores; you are the only one.
Let's compare performance(IPC), power consumption and efficiency(performance/W) based on SM and Cuda cores -> RTX 2080 vs RTX 3070
RTX 3070 with 2x more Cuda cores is only 28% faster at 4K resolution and consumes 220 W, or 5 W more than RTX 2080, which means it has a 25% better performance/W ratio.
Performance PER SM -> Ampere SM is 28% faster and consumes 2.3% more W than Turing SM.
Performance PER Cuda -> an Ampere Cuda core is ~36% slower and consumes 49% less power than a Turing Cuda core, and that's simply hilarious, because the Cuda core is the same in both Ampere and Turing; there was no change. The change happened a level higher, in the SM, where the original 64x INT32 units are now capable of FP32 execution.
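
The per-SM and per-core figures in this post can be re-derived with a few lines of Python. The performance ratio and wattages are the ones quoted in the post; the SM and core counts (46 SMs each, 2944 vs 5888 cores) are public spec values that the post only implies with "same number of SM" and "2x more Cuda", so they are labeled as assumptions here.

```python
# RTX 2080 vs RTX 3070, figures as quoted in the post
perf_ratio = 1.28                         # 3070 relative to 2080 at 4K
power_2080, power_3070 = 215.0, 220.0     # average power draw in watts

# Assumptions: public spec values, implied but not stated in the post
sm_2080 = sm_3070 = 46
cores_2080, cores_3070 = 2944, 5888

power_ratio = power_3070 / power_2080
perf_per_watt_gain = perf_ratio / power_ratio - 1

# Normalize performance and power by the number of SMs, then by Cuda cores
per_sm_perf = perf_ratio * sm_2080 / sm_3070 - 1
per_sm_power = power_ratio * sm_2080 / sm_3070 - 1
per_core_perf = perf_ratio * cores_2080 / cores_3070 - 1
per_core_power = power_ratio * cores_2080 / cores_3070 - 1

print(f"Performance/W:  {perf_per_watt_gain:+.1%}")
print(f"Per SM:         {per_sm_perf:+.1%} performance, {per_sm_power:+.1%} power")
print(f"Per Cuda core:  {per_core_perf:+.1%} performance, {per_core_power:+.1%} power")
```

The same measured data gives opposite impressions depending on the divisor, which is the point being made: the per-core view only looks bad because the core count doubled while the SM count stayed the same.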
Havefun
Yea, there is still no RTX 3060 mobile released so my conclusion is premature, same as other conclusions here. i have these data from Techpowerup GPU database. TDP 3600 = 80W. Now i checked it again - In relative performance 1660 Ti is 7% better then 3600 , and 18% better then 3600 Max-Q. In real performance 3600 will be better but i expect 10-15 FPS difference. Is it worth?
There is no official info about performance or clock speed, so it's just an estimate someone put there; that's why they write:
This product is not released yet.
Data on this page may change in the future.
My estimate for the RTX 3060 (Max-Q) is ~20-30% better performance than the 1660Ti (Max-Q), if the clock speed is comparable. Whether it's worth it or not is something everyone has to answer for themselves.
Posted on Reply
#23
Chrispy_
THANATOS
You are talking about PER CORE performance and I am trying to tell you that's not a good comparison.
You're still trying to have an irrelevant and misdirected argument with me about how the cores aren't comparable between generations.

I'm not the one defining cores.
I'm not the one publishing data showing the 3060 Ti at performance parity with a 2080S.

If you don't like it, it's not me that needs convincing; you're preaching to the choir and have been for some time in this thread. Nvidia are, whether you like it or not, marketing and selling their product on core count. This article is mostly about core count (@Ravenlord mentions it 6 times in a single paragraph), and when most people look at GPU specs the two most important factors are the number of cores and the clocks those cores run at.

I get (I always got) the architectural dissimilarities between a Turing and an Ampere core. I know, and I don't care, that per-core performance isn't a good comparison - that's the comparison that is being made, that Nvidia themselves make, that reviewers make, and that many users will make too. Regardless of the comparison's future/architectural relevance, mobile Turing owners will multiply the number of cores they currently have by 1.59 (for the 59% per-core advantage over Ampere) and know that, in currently-benchmarked games, an Ampere purchase with fewer cores than that isn't going to be any faster. That's simple maths, backed up by clock-comparable empirical data.
#24
THANATOS
OK, after reading your last two sentences I now get what you were pointing at.
So the conclusion based on desktop models is that an Ampere GPU is more power efficient (perf/W) than a Turing GPU and as fast or faster, depending on which models you compare (3060 Ti vs 2080S, 3070 vs 2080, 3070 vs 2080 Ti). But because of the architectural changes in the SM, you need to watch out for the number of Cuda cores even at the same clockspeed, because it's not representative of the performance gain over the Turing architecture and you could end up with much lower gaming performance than you wanted. :D
For example, if you want to upgrade from an RTX 2060 mobile with 1920 Cuda cores, then an Ampere GPU with 2944-3072 cores will perform similarly, even though the difference in Cuda cores is 53-60%, so you need to choose an Ampere GPU with 3840 Cuda cores or more if you want at least ~25% more performance.
I think this sums it up pretty nicely.
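The upgrade example boils down to one scaling factor. Here is a rough sketch, assuming the ~1.59x Turing per-core advantage quoted earlier in the thread holds at comparable clocks (it is an empirical figure from desktop benchmarks, not an official number); the constant and helper name are hypothetical.

```python
# Assumed factor: one Turing core is treated as worth ~1.59 Ampere cores in games.
TURING_PER_CORE_ADVANTAGE = 1.59

def estimated_gain(turing_cores, ampere_cores, factor=TURING_PER_CORE_ADVANTAGE):
    """Estimated performance of an Ampere part relative to a Turing part (1.0 = equal)."""
    return ampere_cores / (turing_cores * factor)

# RTX 2060 mobile (1920 cores) against the Ampere core counts discussed above.
for ampere_cores in (2944, 3072, 3840):
    gain = estimated_gain(1920, ampere_cores)
    print(f"{ampere_cores} Ampere cores vs 1920 Turing cores: {gain - 1:+.0%}")
```

Under that assumption, 2944-3072 Ampere cores land within a few percent of the 1920-core Turing part, and 3840 cores come out at roughly a quarter faster.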


I have to wonder if the RTX 3060 mobile will have only 3072 Cuda cores (24 SM) with a 192-bit GDDR6 bus.
An uncut GA104 has 6144 Cuda cores (48 SM) and a 256-bit GDDR6 bus. Based on this, even a 128-bit bus should be enough for 3072 cores (24 SM), and Nvidia would need to deactivate half of the cores (SMs) to get this from GA104; that's too much of a waste.
I think this RTX 3060 is based on GA106 and there will be an RTX 3060 Ti or Super with 3840 cores. The question is whether GA106 will have only 24 SM or 30 SM in its full config, but considering GA104 has 48 SM, I think it will have 30 SM and a 192-bit GDDR6 bus.
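The SM counts and the "128-bit should be enough" estimate follow from the uncut GA104 figures. This is a small sketch under the naive assumption that bus width scales linearly with core count; the function names are invented for illustration, and real bus widths also depend on memory speed and cache.

```python
# 6144 cores / 48 SMs on the uncut GA104 gives 128 Cuda cores per Ampere SM.
CORES_PER_SM = 6144 // 48

def sm_count(cuda_cores):
    """Number of SMs needed for a given Cuda core count."""
    return cuda_cores // CORES_PER_SM

def proportional_bus_bits(cuda_cores, ref_cores=6144, ref_bus=256):
    """Bus width that keeps GA104's bits-per-core ratio (naive linear scaling)."""
    return ref_bus * cuda_cores / ref_cores

for cores in (3072, 3840):
    print(f"{cores} cores -> {sm_count(cores)} SM, "
          f"~{proportional_bus_bits(cores):.0f}-bit bus at GA104's ratio")
```

By this measure a 192-bit bus is more than proportional for either a 24 SM or a 30 SM part.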
#25
Chrispy_
Yeah. At the moment - specifically in this article - the only real info we have on the mobile 3000-series is CUDA core count and power target, so that's what matters. Looking at architectural efficiency is hard because across the range of SKUs for any architecture there is a huge range of different efficiencies, and the SKUs and power targets (which heavily influence the power efficiency) don't match up between Turing and Ampere either.

The point that's perhaps even more important than the relative 'per-core' efficiency of Ampere vs Turing is actually how they're going to squeeze a 320W card into a 115W laptop!

A desktop 2080 had 2944 cores @ 215W, a mobile 2080 had 2944 @ 115W.
Same core count, ~47% TDP reduction.
Same silicon (binned for voltage efficiency), so the reduced TDP was good for 80-90% of the desktop performance.

A desktop 3080 has 8704 cores @ 320W, a mobile 3080 has 6144 @ 115W.
~30% core count reduction, ~64% TDP reduction.
Lower-tier silicon with reduced ROPs, TMUs, Tensor cores etc. If performance is even half of a desktop 3080 I'll be impressed.
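
The size of each cut can be checked directly from the core counts and wattages listed above. A minimal sketch, using only those quoted numbers (the 115W mobile 3080 figure is the leaked Max-P floor, not a confirmed spec); the names are illustrative.

```python
def reduction(desktop, mobile):
    """Fractional cut going from the desktop value to the mobile value."""
    return 1 - mobile / desktop

# (desktop, mobile) pairs as quoted in the post.
parts = {
    "RTX 2080": {"cores": (2944, 2944), "watts": (215, 115)},
    "RTX 3080": {"cores": (8704, 6144), "watts": (320, 115)},
}

for name, spec in parts.items():
    core_cut = reduction(*spec["cores"])
    power_cut = reduction(*spec["watts"])
    print(f"{name}: {core_cut:.1%} fewer cores, {power_cut:.1%} lower TDP")
```

With these inputs the 2080 loses no cores for a power cut of a little under half, while the 3080 loses about 29% of its cores for a power cut of about 64%.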

It's a huge cut, and so the comparison that people need to be careful to avoid is mobile Ampere vs mobile Turing - because that's no longer like for like, it's comparing a desktop-equivalent with something that absolutely isn't a desktop equivalent!