Wednesday, February 17th 2016

NVIDIA GP100 Silicon to Feature 4 TFLOPs DPFP Performance

NVIDIA's upcoming flagship GPU based on its next-generation "Pascal" architecture, codenamed GP100, is shaping up to be a number-crunching monster. According to a leaked slide by an NVIDIA research fellow, the company is designing the chip to serve up double-precision floating-point (DPFP) performance as high as 4 TFLOP/s, a 3-fold increase from the 1.31 TFLOP/s offered by the Tesla K20X, based on the "Kepler" GK110 silicon.

The same slide also reveals single-precision floating-point (SPFP) performance to be as high as 12 TFLOP/s, four times that of the GK110, and nearly double that of the GM200. The slide also appears to settle the speculation on whether GP100 will use stacked HBM2 memory, or GDDR5X. Given the 1 TB/s memory bandwidth mentioned on the slide, we're inclined to hand it to stacked HBM2.
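As a rough sanity check, the claimed generational gains follow directly from the quoted peak figures. Note the GM200 number below is an assumption on our part (the commonly cited ~6.1 TFLOP/s single-precision peak of the Titan X), not a figure from the slide:

```python
# Peak figures quoted in the article (TFLOP/s)
gp100_dp, k20x_dp = 4.0, 1.31   # double precision: GP100 vs. Tesla K20X (GK110)
gp100_sp, gm200_sp = 12.0, 6.1  # single precision: GP100 vs. GM200 (assumed Titan X peak)

print(f"DP gain over GK110: {gp100_dp / k20x_dp:.2f}x")  # ~3.05x
print(f"SP gain over GM200: {gp100_sp / gm200_sp:.2f}x")  # ~1.97x
```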

Source: 3DCenter.org

32 Comments on NVIDIA GP100 Silicon to Feature 4 TFLOPs DPFP Performance

#1
Eilifein
Just for reference, since they compare a 7970 (LOL) to their newest cards.

FirePro W9100: http://www.amd.com/en-us/products/graphics/workstation/firepro-3d/9100#
  • 320 GB/s memory bandwidth
  • 5.24 TFLOPS peak single-precision floating-point performance
  • 2.62 TFLOPS peak dual-precision floating-point performance
SP BYTE/FLOP = 0.061068
DP BYTE/FLOP = 0.122137

Wow.... comparing the Tesla cards to a 7970.
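The BYTE/FLOP figures above are just peak memory bandwidth divided by peak compute; a minimal sketch of the arithmetic, using the W9100 numbers quoted above:

```python
def bytes_per_flop(bandwidth_gb_s, peak_tflops):
    """Peak memory bytes available per floating-point operation."""
    return bandwidth_gb_s / (peak_tflops * 1000.0)

# FirePro W9100: 320 GB/s, 5.24 TFLOPS SP, 2.62 TFLOPS DP
print(f"SP: {bytes_per_flop(320, 5.24):.6f}")  # 0.061069
print(f"DP: {bytes_per_flop(320, 2.62):.6f}")  # 0.122137
```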
Posted on Reply
#2
64K
This year will be an exciting year for GPUs. Big increases in performance from both teams.
Posted on Reply
#3
PP Mguire
btarunr said:
NVIDIA's upcoming flagship GPU based on its next-generation "Pascal" architecture, codenamed GP100, is shaping up to be a number-crunching monster. According to a leaked slide by an NVIDIA research fellow, the company is designing the chip to serve up double-precision floating-point (DPFP) performance as high as 4 TFLOP/s, a 3-fold increase from the 1.31 TFLOP/s offered by the Tesla K20X, based on the "Kepler" GK110 silicon.

The same slide also reveals single-precision floating-point (SPFP) performance to be as high as 12 TFLOP/s, four times that of the GK110, and nearly double that of the GM200. The slide also appears to settle the speculation on whether GP100 will use stacked HBM2 memory, or GDDR5X. Given the 1 TB/s memory bandwidth mentioned on the slide, we're inclined to hand it to stacked HBM2.



Source: 3DCenter.org
Or maybe the fact that it says stacked 3D DRAM?

Eilifein said:
Just for reference, since they compare a 7970 (LOL) to their newest cards.

FirePro W9100: http://www.amd.com/en-us/products/graphics/workstation/firepro-3d/9100#
  • 320 GB/s memory bandwidth
  • 5.24 TFLOPS peak single-precision floating-point performance
  • 2.62 TFLOPS peak dual-precision floating-point performance
SP BYTE/FLOP = 0.061068
DP BYTE/FLOP = 0.122137

Wow.... comparing the Tesla cards to a 7970.

Tesla K20x = OG Titan so comparing it to 7970 makes a bit more sense.
Posted on Reply
#4
Prima.Vera
Yeah, but the question is, are there any games out there worth the investment of buying a new top video card??
Posted on Reply
#5
R-T-B
Prima.Vera said:
Yeah, but the question is, are there any games out there worth the investment of buying a new top video card??
There never are; game developers target what already exists. New releases will take advantage, though.

The exception of course is Crysis, but that's about it.
Posted on Reply
#6
PP Mguire
Prima.Vera said:
Yeah, but the question is, are there any games out there worth the investment of buying a new top video card??
I don't see any for this year, yet, but anybody gaming at 4k should want one of these. Maxwell V2 just isn't enough for 4k.
Posted on Reply
#7
Eilifein
PP Mguire said:
Or maybe the fact that it says stacked 3D DRAM?

Tesla K20x = OG Titan so comparing it to 7970 makes a bit more sense.
I'm sorry if it came out a bit weird, but I meant the Pascal one, not the K20x. In any case, I really don't understand what the graph wants to communicate. They don't even pit Pascal against the top Tesla dogs, the K40 and K80 (dual GPU).

Edit: To reiterate, I quote the OP: NVIDIA's upcoming flagship GPU based on its next-generation "Pascal" architecture, codenamed GP100. Specifically mentioning "flagship" and then comparing it to the K20x and a 7970 is at the very least misleading.
Posted on Reply
#8
64K
Prima.Vera said:
Yeah, but the question is, are there any games out there worth the investment of buying a new top video card??
Depends on what you want. There are games that you can't max at 4K and average 60 FPS with a single Titan X or a single GTX 980 Ti and there will no doubt be more in the next couple of years. I imagine for people that want 4K and a single GPU they will like it. Worth the investment is relative to what a person wants and is willing to spend to get it.
Posted on Reply
#9
PP Mguire
Eilifein said:
I'm sorry if it came out a bit weird, but I meant the Pascal one, not the K20x. In any case, I really don't understand what the graph wants to communicate. They don't even pit Pascal against the top Tesla dogs, the K40 and K80 (dual GPU).
Double precision. DP in Maxwell is nonexistent, which is why the M40 etc. isn't on there. The K40 compared to the K20x is the difference between Titan and Titan Black, so it makes sense. Assuming this "leak" is a real leak, I'd be willing to bet it's comparing Pascal to the K20x to make the numbers seem higher to investors? That's just a guess, but most people internally should know at least ballpark figures for both cards when looking at a graph like this. I personally don't care what they're on about with the graph; I like that SP performance if it's true.
Posted on Reply
#10
Eilifein
PP Mguire said:
Double precision. DP in Maxwell is nonexistent, which is why the M40 etc. isn't on there. The K40 compared to the K20x is the difference between Titan and Titan Black, so it makes sense. Assuming this "leak" is a real leak, I'd be willing to bet it's comparing Pascal to the K20x to make the numbers seem higher to investors? That's just a guess, but most people internally should know at least ballpark figures for both cards when looking at a graph like this. I personally don't care what they're on about with the graph; I like that SP performance if it's true.
In that sense, I can agree.
Posted on Reply
#11
PP Mguire
Eilifein said:
In that sense, I can agree.
If you look at the graph, DP is bold and the side says performance on double precision. Then you've got guys like me who give 0 Fs about DP, and all I'm looking at is that SP figure, which looks juicy.
Posted on Reply
#12
Casecutter
With some six-plus years between designs, and now on a shrink, I'd hope it can more than double or triple some of the numbers! How else do science and other professional workloads find "more" faster? This is why they call supercomputers... Super! For probably the past three years the scientific community has had to make do with stagnant computers, pretty much.
Posted on Reply
#13
trog100
unless they can produce at least 50% more pixel driving power for the same wattage it aint gonna achieve much.. somehow i dont see it.. but time will tell.. :)

i think those looking for huge gains or cost savings are gonna be a little disappointed..

trog
Posted on Reply
#14
PP Mguire
trog100 said:
unless they can produce at least 50% more pixel driving power for the same wattage it aint gonna achieve much.. somehow i dont see it.. but time will tell.. :)

i think those looking for huge gains or cost savings are gonna be a little disappointed..

trog
The numbers and the quoted 50% performance per watt from Nvidia (rumor) line up with the current Titan X offering.
Posted on Reply
#15
HumanSmoke
Casecutter said:
With some six-plus years between designs, and now on a shrink, I'd hope it can more than double or triple some of the numbers!
Answer me this: why would you expect that? TSMC have already explained that the process offers twice the transistor density OR a 70% reduction in power compared with CLN28HPM, and the last time ANY flagship GPU offered more than a doubling of FP32 and FP64 was 2009 (Cypress over RV770 XT/RV790 XT)... and the last time both those parameters were tripled in the space of a generation? Never.
Casecutter said:
How else do science and other professional workloads find "more" faster? This is why they call supercomputers... Super!
There's a reason that supers are referred to as clusters. There is also a reason that the interconnect plays a huge part in these clusters, and also a reason that OmniPath, HSA, and NVLink are seen as future performance multipliers.
PP Mguire said:
Or maybe the fact that it says stacked 3D DRAM?
The actual slide is probably quite old. The slide deck (PDF) it came from concentrates on HMC so your distinction is very much valid.

I'd be wary about taking too much Pascal info for granted in the slide if the information is that old.
Posted on Reply
#16
Kurt Maverick
Was there any doubt that hi-end Pascals would feature HBM2? :P Like if Nvidia could afford NOT to include it...

I think I read that lower-end Pascals would feature GDDR5X, but that's about all of it.
Posted on Reply
#17
newtekie1
Semi-Retired Folder
I don't think we will even see the high end Pascal GPU in the consumer space any time soon. I'm guessing nVidia will do the same thing they have done the past few generations, release the mid-range GPU as the top end. Then coast on that for a while, and then release the high end GPU later down the line.
Posted on Reply
#18
the54thvoid
newtekie1 said:
I don't think we will even see the high end Pascal GPU in the consumer space any time soon. I'm guessing nVidia will do the same thing they have done the past few generations, release the mid-range GPU as the top end. Then coast on that for a while, and then release the high end GPU later down the line.
I think AMD will have a lot to do with that decision. If AMD release Polaris performance parts in Q2 that outshine Maxwell (which is pretty much assured, given Fiji is a very close match), Nvidia will be forced to play their hand, if they even have it ready.
If AMD release a solid card, it will be humbling for Nvidia (which we all agree would be very good). It really depends what each company's moles know about each other's tech. Perhaps it will be Tahiti versus GK104 all over again? Perhaps it will be 290X versus 780 Ti? I would like to see AMD come out with a better card, one that puts pressure on Nvidia.

But if AMD have no Polaris performance part ready, then yeah, Nvidia will do exactly what they always do: milk the mid-range as the best part until they need to release their top end. I doubt Nvidia will jump when AMD release the dual-Fiji part. It will give AMD hands down the fastest card, but it won't be seen as a 'valid' threat to Nvidia's 980 Ti (dual-versus-single arguments).
Posted on Reply
#19
arbiter
the54thvoid said:
If AMD release a solid card, it will be humbling for Nvidia (which we all agree would be very good). It really depends what each company's moles know about each other's tech. Perhaps it will be Tahiti versus GK104 all over again? Perhaps it will be 290X versus 780 Ti? I would like to see AMD come out with a better card, one that puts pressure on Nvidia.

But if AMD have no Polaris performance part ready, then yeah, Nvidia will do exactly what they always do: milk the mid-range as the best part until they need to release their top end. I doubt Nvidia will jump when AMD release the dual-Fiji part. It will give AMD hands down the fastest card, but it won't be seen as a 'valid' threat to Nvidia's 980 Ti (dual-versus-single arguments).
The only "Polaris card" AMD has even shown off is a low-to-mid-range card in the 950/960 range, which is likely AMD just trying to create some hype and get ahead of Nvidia's PR announcements. I don't think 3-4 months since tape-out is enough time for QA testing of AMD's new chips. As for Fiji, that's a CrossFired GPU, which will put some people off, since a 50% boost could put a next-gen single GPU in the ballpark of that dual GPU without the CF/SLI drawbacks.
Posted on Reply
#20
Serpent of Darkness
Prima.Vera said:
Yeah, but the question is, are there any games out there worth the investment of buying a new top video card??
Star Citizen? New refreshes of Battlefield and Call of Duty sequels? Assassin's Creed sequels? New MMOs with D3D 12.0? 4K eye candy that will have diminishing value and desirability as time approaches 2017 and 2018?

R-T-B said:
The exception of course is Crysis, but that's about it.
Is there a new Crysis sequel coming out? C3 is a cakewalk for high-end computers...

Eilifein said:
I'm sorry if it came out a bit weird, but i meant the Pascal one, not the K20x. In any case, I really don't understand what the graph wants to communicate. They don't even pit Pascal with the top Tesla dogs, K40 and K80(dual gpu).

Edit: To reiterate, i quote the OP: NVIDIA's upcoming flagship GPU based on its next-generation "Pascal" architecture, codenamed GP100. Specifically mentioning "flagship", then comparing it to K20x and 7970 is at the very least misleading.
1. The graph is basically stating a performance improvement in the 64-bit floating-point precision area over CPUs and others. As you can see, there's no major improvement for gaming if you focus on 64-bit FPP, but for rendering and number crunching, that's a different story. 32-bit FPP at 12 tera-whatevers per second is actually pretty significant for gaming. You can say one of NVidia's many points with this graph is that they didn't skimp on the 64-bit FPP area like the last 2 to 3 generations of Titan "this time."

2. The graph speaks of a correlation between memory usage and the bytes-per-FLOP ratio. What NVidia is basically saying is that, at the point where information is being stored to the framebuffer for 32- or 64-bit floating-point executions, the usage is actually less if you compare it to other products with a similar relationship. Furthermore, I think the graph showing 0,256 and 0,805 for SP and DP on the new Pascal is either a typo or just European decimal-comma notation; it's probably meant to read 0.256 bytes per FLOP SP and 0.805 bytes per FLOP DP.

3. From the 7970 onward, 64-bit FPP has actually gone up for AMD graphics cards, probably because AMD saw a small niche in the market where AMD consumers would use their discrete graphics cards to render videos and such, at a time when NVidia was taking it away after the first Titan generation. NVidia was thinking that removing 64-bit FPP from gaming cards would probably boost the sales of Quadro cards, but there wasn't really a big difference in sales (speculation), and you can see this in the M4000, where you have a Maxwell Titan and a workstation card providing about the same performance/features for rendering. The only difference is the driver, which was probably significant for the most part.

4. Tesla is more of a number cruncher, and its contender is Intel's Knights Landing or any server CPU. Simply put, it's an accelerator card, but it still acts as a graphics card: offload executions to the GPU for processing and image rendering, use CUDA, blah blah blah. Some would say that Knights Landing is a work in progress and Intel's failure at an Intel graphics card. Intel's Xeon Phi is a future-proof toy that can't be used for practical applications because a lot of current software doesn't utilize multi-core coding, and in order to make it work, you need to be someone who knows how to code both for a program and on the Xeon Phi to make it work remotely (in theory). From my understanding, you can't just load a PC game and expect 64 micro CPU cores from Knights Landing to make your bottleneck troubles disappear, giving you an FPS of 3,000 in World of Warcraft on ultra-high settings. NO! The PC game utilizes coding to function with the physical cores of your CPU, but other code needs to be implemented for Knights Landing, and that's assuming it works properly when you do that. While Intel has its multicore coding for Knights Landing, NVidia's Tesla line uses CUDA. They say it's more efficient and provides better performance than Knights Landing. Overall, I think it's just a glorified GPU with some nitro or rocket boosters... Tesla can't act as a substitute CPU through your PCI bus for increased performance, but it can improve rendering times for programs that utilize GPU rendering, and the coding is less complicated??

5. The majority of CPUs have poor 64-bit FPP in general. Take a look at the Sandy Bridge Xeon E5-2690 in the table: 64-bit FPP is only, what, 243.2 GFLOPS, versus the AMD 7970 at 1,010 GFLOPS in DP alone.

6. 64-bit FPP isn't a major function for everyday, normal use and PC gaming. So in a sense, Intel and AMD can say "big F***en deal," but to renderers and CGI people who use NVidia's code to render particle effects, we'll be like OMG, that's going to make my epeen super sexy. Frame times are cut down from 10 minutes to 10 seconds. Woot WOOT! I can hit the clubs a lot sooner.
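For what it's worth, the Xeon number in point 5 is in gigaFLOPS (not teraFLOPS), and it falls straight out of the chip's specs; a back-of-the-envelope check (the 3.8 GHz max-turbo clock is an assumption about what the table used, as the 2.9 GHz base clock would give a lower figure):

```python
# Peak FP64 throughput of a Xeon E5-2690 (Sandy Bridge-EP)
cores = 8
ghz = 3.8               # max turbo clock; base is 2.9 GHz
dp_flops_per_cycle = 8  # AVX: 4-wide DP add + 4-wide DP multiply per cycle

peak_gflops = cores * ghz * dp_flops_per_cycle
print(round(peak_gflops, 1))  # 243.2
```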
Posted on Reply
#21
R-T-B
Serpent of Darkness said:

Is there a new Crysis Sequel coming out? C3 is cake on for high-end computers...
It wasn't at its point of release.
Posted on Reply
#22
Prima.Vera
Guys, you keep saying that the new ones will be better suited for 4K gaming. LOL. If you think that the 0.07% of users gaming at 4K are going to make nVidia/AMD rich by buying new cards, then we are all living in a dream world :))))
Come on, let's be real for once.
I'm gaming with full details in ALL existing games at 1080p with my (now) crappy 780 Ti card, and so far there is zero reason to upgrade. If the rumors are true, those new cards will be at least $700 or more in East Asia/Europe...
Good luck with that.
Posted on Reply
#23
HumanSmoke
Serpent of Darkness said:
1. The graph is basically stating a performance improvement in the 64-bit floating-point precision area over CPUs and others. As you can see, there's no major improvement for gaming if you focus on 64-bit FPP, but for rendering and number crunching, that's a different story. 32-bit FPP at 12 tera-whatevers per second is actually pretty significant for gaming. You can say one of NVidia's many points with this graph is that they didn't skimp on the 64-bit FPP area like the last 2 to 3 generations of Titan "this time."
It is actually only GM200 that has reduced FP64. GK110 for the Titan/Titan Black is at the same 1:3 (FP64:FP32) rate as its Quadro and Tesla brethren.
Serpent of Darkness said:
3. From the 7970 onward, 64-bit FPP has actually gone up for AMD graphics cards...
Not actually true. Tahiti (HD 7970 / FirePro W8000/W9000) has a 1:4 FP64 rate (roughly 950-1000 GFLOPS). Hawaii has a native rate of 1:2 (2.1-2.6 TFLOPS) for FirePro and a 1:8 rate for Radeon. Fiji has a native rate of 1:16, which works out to just over half (537 GFLOPS) that of the 7970 (1024 GFLOPS).
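These figures are just the FP32 peak divided by the denominator of the native FP64:FP32 rate. A minimal sketch of the arithmetic; the FP32 peaks below are assumptions on my part (a 1 GHz Tahiti, the W9100's quoted 5.24 TFLOPS, and a 1.05 GHz Fiji), not numbers from the post:

```python
def fp64_gflops(fp32_gflops, ratio_denom):
    """Peak FP64 given peak FP32 (GFLOPS) and a native 1:N FP64:FP32 rate."""
    return fp32_gflops / ratio_denom

print(fp64_gflops(4096, 4))     # Tahiti (HD 7970 at 1 GHz), 1:4 -> 1024.0
print(fp64_gflops(5240, 2))     # Hawaii FirePro (W9100), 1:2    -> 2620.0
print(fp64_gflops(8601.6, 16))  # Fiji (Fury X at 1.05 GHz), 1:16 -> ~537.6
```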
Serpent of Darkness said:
probably because AMD saw a small niche in the market where AMD consumers would use their discrete graphics cards to render videos...
CG render software seldom uses double precision (V-Ray, and Deadpool, the current CG PR poster-boy, being exceptions), and it is virtually non-existent for consumer applications. You also might want to check the state of OpenCL rendering; in general it is a mess.
Serpent of Darkness said:
NVidia was thinking that removing 64-bit FPP from gaming cards would probably boost the sales of Quadro cards, but there wasn't really a big difference in sales (speculation)
Numbers sold don't reflect value especially given the discrepancy in pricing.

Serpent of Darkness said:
and you can see this in the M4000, where you have a Maxwell Titan and a workstation card providing about the same performance/features for rendering. The only difference is the driver, which was probably significant for the most part.
The driver and the 24/7 support are the big differences between workstation and consumer graphics cards for both vendors. The warranty also guarantees a like for like replacement for a 3 year term.
Serpent of Darkness said:
4. Tesla is more of a number cruncher, and its contender is Intel's Knights Landing or any server CPU. Simply put, it's an accelerator card, but it still acts as a graphics card: offload executions to the GPU for processing and image rendering, use CUDA, blah blah blah. Some would say that Knights Landing is a work in progress and Intel's failure at an Intel graphics card. Intel's Xeon Phi is a future-proof toy that can't be used for practical applications because a lot of current software doesn't utilize multi-core coding, and in order to make it work, you need to be someone who knows how to code both for a program and on the Xeon Phi to make it work remotely (in theory).
Xeon Phi represents a challenge to code for to maximize its potential. Xeon Phi also has an annoying issue of performance decreasing as the job size increases - not a great selling point for supercomputer workloads as a general rule (it also isn't overly efficient). Intel gains market share through its generous support and by basically giving the things away (or in some cases, actually giving them away). Certainly a great way to fast-track market share even if they aren't actually used (a la Tianhe-2). I still suspect both Nvidia and Intel (and AMD, if they have the resources) will need to address dedicated non-graphics-pipelined GPGPU like PEZY-SC.
Posted on Reply
#24
Legacy-ZA
Kurt Maverick said:

I think I read that lower-end Pascals would feature GDDR5X, but that's about all of it.
If they are, they can keep it. HBM2 has better power efficiency and performance, as well as making the PCB smaller. It would be a big mistake if they decide to go with GDDR5X, even for the lower-tiered cards; just think media PCs etc.
Posted on Reply
#25
arbiter
Prima.Vera said:
Guys, you keep saying that the new ones will be better suited for 4K gaming. LOL. If you think that the 0.07% of users gaming at 4K are going to make nVidia/AMD rich by buying new cards, then we are all living in a dream world :))))
Come on, let's be real for once.
The problem with 4K gaming right now is that doing it with one card isn't very viable. You need at least two cards in SLI to keep decent frame rates without crapping the quality down to nothing. New cards make it closer and cheaper.
Posted on Reply