Wednesday, June 21st 2017

NVIDIA Announces the Tesla V100 PCI-Express HPC Accelerator

NVIDIA formally announced the PCI-Express add-on card version of its flagship Tesla V100 HPC accelerator, based on its next-generation "Volta" GPU architecture. Built on the advanced 12 nm "GV100" silicon, the GPU is a multi-chip module with a silicon substrate and four HBM2 memory stacks. It features a total of 5,120 CUDA cores, 640 Tensor cores (specialized cores that accelerate neural-network training), GPU clock speeds of around 1370 MHz, and a 4096-bit wide HBM2 memory interface with 900 GB/s of memory bandwidth. The 815 mm² GPU has a gargantuan transistor count of 21 billion. NVIDIA is taking institutional orders for the V100 PCIe, and the card will be available a little later this year. HPE will develop three HPC rigs with the cards pre-installed.

13 Comments on NVIDIA Announces the Tesla V100 PCI-Express HPC Accelerator

#1
gigantor21
Noob question: how would you calculate peak theoretical TFLOPs based on the listed specs?

Is it 5120 shaders x 2 x 1.37 GHz clock ≈ 14 TFLOPs? Or am I missing something?
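The usual back-of-the-envelope formula for peak FP32 throughput is CUDA cores × 2 FLOPs (one fused multiply-add per core per clock) × clock. A minimal Python sketch, assuming the ~1370 MHz clock quoted in the article; the function name is just for illustration, and this covers plain FP32 on the CUDA cores only, not the Tensor cores:

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput, assuming each CUDA core retires one fused
    multiply-add (counted as 2 FLOPs) per clock cycle."""
    flops_per_core_per_clock = 2  # one FMA = one multiply + one add
    gflops = cuda_cores * flops_per_core_per_clock * clock_ghz
    return gflops / 1000.0  # GFLOPS -> TFLOPS

# Figures from the article: 5,120 CUDA cores at roughly 1370 MHz.
v100_pcie = peak_fp32_tflops(5120, 1.37)
print(f"Tesla V100 PCIe: ~{v100_pcie:.1f} TFLOPS FP32")
```

Plugging in 1.3 GHz instead gives about 13.3 TFLOPS, so the ~14 figure depends on using the ~1.37 GHz clock.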
#3
Hugh Mungus
980 Ti to 1080 was a 50% TFLOPS increase for 30% extra performance. Now we really only have a 12 to 15 TFLOPS increase from the P6000 to the Tesla V100 with SXM2, so the V6000 would probably have 15.5-ish. That's an increase of just 30% in TFLOPS. If the GV104 die also gets more CUDA cores, it will have 30% more TFLOPS than a 1080. That basically puts it at 1080 Ti computing power, and even with optimizations it's likely to be only 10% faster or so, maybe even less. Best case scenario, going from 9.3 to 14 TFLOPS for the PCIe variants of the Tesla cards (even though the P100 doesn't have the full 3840 CUDA cores), that still only gives GV104 13.5 TFLOPS, and we have seen diminishing returns from TFLOPS to framerate, so essentially that's still only 25% higher framerates over the 1080 Ti. Really, though, you should compare 3840 Pascal cores to 5120 Volta cores, which drops the framerate improvement down to 20% if you compare the top-of-the-line Tesla Pascal (which doesn't exist) and Volta Tesla cards to each other.

10-20% increased framerates then. Hmmm....
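The estimate above boils down to a linear extrapolation: take the framerate gained per percent of TFLOPS gained in the previous generation and apply it to the next one. A rough Python sketch of that heuristic, using only the figures from the post (the constant `return_ratio` is the assumption doing all the work; it is not an established performance model):

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new / old - 1.0) * 100.0

# 980 Ti -> 1080, as stated in the post: +50% TFLOPS gave ~+30% performance.
return_ratio = 30.0 / 50.0  # % framerate gained per % TFLOPS gained (assumed constant)

# P6000 (12 TFLOPS) -> speculated V6000 (~15.5 TFLOPS).
tflops_gain = pct_increase(12.0, 15.5)
expected_fps_gain = tflops_gain * return_ratio

print(f"TFLOPS gain: {tflops_gain:.1f}%")
print(f"Extrapolated framerate gain: {expected_fps_gain:.1f}%")
```

With these inputs it prints a TFLOPS gain of about 29.2% and an extrapolated framerate gain of about 17.5%, inside the 10-20% range the post arrives at.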
#4
TheGuruStud
I heard retail availability is August 2018 :p
#5
Hugh Mungus
TheGuruStud said:
I heard retail availability is August 2018 :p
Maybe! :)

Apparently GeForce Volta is going to use GDDR5(X) and/or GDDR6, though, so maybe they'll arrive in March 2018. Still, according to my research looking at TFLOPS-to-framerate increases, going from the 1080 Ti to the Volta xx80 is likely to give only 10-20% higher framerates, compared with the 20-30% increase from the 980 Ti to the 1080. Not too spectacular, then, unless Nvidia used a LOT of magic.
#6
Chloe Price
That colour reminds me of those Vega cards. :D
#7
the54thvoid
Hugh Mungus said:
Maybe! :)

Apparently GeForce Volta is going to use GDDR5(X) and/or GDDR6, though, so maybe they'll arrive in March 2018. Still, according to my research looking at TFLOPS-to-framerate increases, going from the 1080 Ti to the Volta xx80 is likely to give only 10-20% higher framerates, compared with the 20-30% increase from the 980 Ti to the 1080. Not too spectacular, then, unless Nvidia used a LOT of magic.
Nvidia has much magic. Your calculations will be rendered poop. :D
#8
Hugh Mungus
the54thvoid said:
Nvidia has much magic. Your calculations will be rendered poop. :D
Just looking at past releases, and with less TFLOPS improvement than usual, it's not looking too good for Volta. Vega (2.0) now has a chance of beating it.
#9
the54thvoid
Hugh Mungus said:
Just looking at past releases, and with less TFLOPS improvement than usual, it's not looking too good for Volta. Vega (2.0) now has a chance of beating it.
They have made changes to the warp schedulers; it looks very much like a parallel system, so that should theoretically increase performance, in line with how async helps non-serial tasks.
Don't use hardware counts to guess performance; on that metric, AMD should have been destroying Nvidia for years.
Nvidia has a very refined and streamlined architecture, and it reaps the rewards.
#10
Hugh Mungus
the54thvoid said:
They have made changes to the warp schedulers; it looks very much like a parallel system, so that should theoretically increase performance, in line with how async helps non-serial tasks.
Don't use hardware counts to guess performance; on that metric, AMD should have been destroying Nvidia for years.
Nvidia has a very refined and streamlined architecture, and it reaps the rewards.
Unless all that new stuff really makes a huge difference, it's likely we'll see similar TFLOPS-to-framerate ratios. New Nvidia features basically only seem to make cards run well now without degrading massively over time, whereas AMD cards seem to degrade less but aren't as strong right now. Honestly, I prefer AMD's approach, since an RX 480/580 will now beat a 1060 on average, not just because of drivers but also because more and more games use DX12, Vulkan, etc., so AMD cards last relatively long in the same price category. DiRT 4, by the way, really seems to favour AMD, since an RX 580 is now nearly as good as a 1070 in that game. FineWine technology for the win!
#11
jabbadap
Hugh Mungus said:
980 Ti to 1080 was a 50% TFLOPS increase for 30% extra performance. Now we really only have a 12 to 15 TFLOPS increase from the P6000 to the Tesla V100 with SXM2, so the V6000 would probably have 15.5-ish. That's an increase of just 30% in TFLOPS. If the GV104 die also gets more CUDA cores, it will have 30% more TFLOPS than a 1080. That basically puts it at 1080 Ti computing power, and even with optimizations it's likely to be only 10% faster or so, maybe even less. Best case scenario, going from 9.3 to 14 TFLOPS for the PCIe variants of the Tesla cards (even though the P100 doesn't have the full 3840 CUDA cores), that still only gives GV104 13.5 TFLOPS, and we have seen diminishing returns from TFLOPS to framerate, so essentially that's still only 25% higher framerates over the 1080 Ti. Really, though, you should compare 3840 Pascal cores to 5120 Volta cores, which drops the framerate improvement down to 20% if you compare the top-of-the-line Tesla Pascal (which doesn't exist) and Volta Tesla cards to each other.

10-20% increased framerates then. Hmmm....
P6000 is GP102, which has a considerably smaller die than GV100 and thus can manage much higher clocks within its power envelope. You should really compare GV100 to GP100 products, like the Tesla P100s (9.8-10.6 TFLOPS) or the Quadro GP100 (10.3 TFLOPS).
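Following that suggestion, a quick Python comparison of the Tesla V100 PCIe peak FP32 figure (computed from the article's 5,120 cores at ~1.37 GHz, assuming 2 FLOPs per core per clock) against the GP100 figures quoted in this post:

```python
# Peak FP32 of the V100 PCIe from the article's specs: cores x 2 FLOPs x GHz.
gv100_pcie = 5120 * 2 * 1.37 / 1000  # ~14.0 TFLOPS

# GP100-based products, TFLOPS as quoted in the post.
gp100_products = {
    "Tesla P100 (low end of quoted range)": 9.8,
    "Tesla P100 (high end of quoted range)": 10.6,
    "Quadro GP100": 10.3,
}

# Percentage uplift of the V100 PCIe over each GP100 product.
uplifts = {name: (gv100_pcie / t - 1.0) * 100.0 for name, t in gp100_products.items()}
for name, uplift in uplifts.items():
    print(f"V100 PCIe vs {name}: +{uplift:.0f}%")
```

On paper that works out to roughly a 32-43% TFLOPS uplift over GP100, which says nothing yet about gaming framerates.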
#12
Hugh Mungus
jabbadap said:
P6000 is GP102, which has a considerably smaller die than GV100 and thus can manage much higher clocks within its power envelope. You should really compare GV100 to GP100 products, like the Tesla P100s (9.8-10.6 TFLOPS) or the Quadro GP100 (10.3 TFLOPS).
Did that too.
#13
jabbadap
Ahh, my bad. Note to myself: do not skim posts from forums...

Well, if we assume that the V6000 is a full GV102 like the P6000 is a full GP102: the "marketed"* TFLOPS for the P6000 is 12 TFLOPS, so its clock speed is 12000 / (60 × 64 × 2) ≈ 1.56 GHz. Then, if the V6000 is a full GV102, at that clock speed its "marketed"* TFLOPS would be 2 × 84 × 64 × 1.56 ≈ 16.8 TFLOPS. And consider this: GV100 is a huge die (815 mm²), and it still keeps almost the same clocks within the same power envelope as GP100 with its smaller 610 mm² die. We just don't have enough information about the rest of the Volta family to know how much higher clocks can go when they're given more power.

*Nvidia's marketed TFLOPS are calculated from the official boost clock, which is actually lower than what the card runs at in normal 3D usage such as gaming. I.e., 1.56 GHz is a very low frequency for the Pascal architecture.
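The derivation above can be reproduced in a few lines of Python: invert the peak-TFLOPS formula to get the P6000's implied clock, then reuse that clock for a full GV102. Note that "V6000" and "GV102" are speculative parts, so the 84 × 64 core layout is the post's assumption, not a published spec; the function names are illustrative only:

```python
def implied_clock_ghz(tflops: float, sms: int, cores_per_sm: int) -> float:
    """Invert peak = SMs x cores/SM x 2 FLOPs x clock to recover the clock."""
    return tflops * 1000.0 / (sms * cores_per_sm * 2)

def peak_tflops(sms: int, cores_per_sm: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per core per clock."""
    return sms * cores_per_sm * 2 * clock_ghz / 1000.0

# P6000: 12 TFLOPS marketed, 3840 CUDA cores (written as 60 x 64 in the post).
p6000_clock = implied_clock_ghz(12.0, 60, 64)

# Hypothetical full GV102: 84 x 64 cores at the same clock (assumption).
v6000_tflops = peak_tflops(84, 64, p6000_clock)

print(f"Implied P6000 clock: {p6000_clock:.4f} GHz")
print(f"Hypothetical V6000: {v6000_tflops:.1f} TFLOPS")
```

The exact inverse gives 1.5625 GHz; carrying that unrounded clock through yields 16.8 TFLOPS, versus about 16.77 with the rounded 1.56 GHz used in the post.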