
NVIDIA Announces the Tesla V100 PCI-Express HPC Accelerator

btarunr

NVIDIA formally announced the PCI-Express add-on card version of its flagship Tesla V100 HPC accelerator, based on its next-generation "Volta" GPU architecture. Built around the advanced 12 nm "GV100" silicon, the GPU is a multi-chip module with a silicon substrate and four HBM2 memory stacks. It features a total of 5,120 CUDA cores, 640 Tensor cores (specialized cores which accelerate neural-net training), GPU clock speeds of around 1370 MHz, and a 4096-bit wide HBM2 memory interface with 900 GB/s of memory bandwidth. The 815 mm² GPU has a gargantuan transistor count of 21 billion. NVIDIA is taking institutional orders for the V100 PCIe, and the card will be available a little later this year. HPE will develop three HPC rigs with the cards pre-installed.



 
Noob question: how would you calculate peak theoretical TFLOPs based on the listed specs?

Is it 5120 shaders x 2 x 1.37 GHz clock ≈ 14 TFLOPS? Or am I missing something?
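
As a rough sketch of that arithmetic: peak FP32 throughput is usually estimated as cores x 2 x clock, where the factor of 2 assumes each CUDA core retires one fused multiply-add (two floating-point operations) per clock. The function name below is made up for illustration, and the 1.37 GHz figure is the "around 1370 MHz" from the news post, not a measured clock.

```python
def peak_fp32_tflops(cuda_cores, clock_ghz, flops_per_core_per_clock=2):
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per core per clock.
    """
    gflops = cuda_cores * flops_per_core_per_clock * clock_ghz
    return gflops / 1000.0


# Tesla V100 PCIe figures from the news post (5,120 cores, ~1370 MHz).
print(f"{peak_fp32_tflops(5120, 1.37):.2f} TFLOPS")  # prints "14.03 TFLOPS"
```

With a 1.3 GHz clock the same formula gives about 13.3 TFLOPS, so the "14" figure only works out with the ~1.37 GHz clock.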
 
The 980 Ti to 1080 jump was a 50% TFLOPS increase for 30% extra performance. Now we really only go from 12 to 15 TFLOPS from the P6000 to the Tesla V100 with SXM2, so a V6000 would probably have 15.5-ish. That's an increase of just 30% in TFLOPS. If the GV104 die also gets more CUDA cores, it will have 30% more TFLOPS than a 1080. That basically puts it at 1080 Ti computing power, and even with optimizations it's likely only going to be 10% faster or so, maybe even less.

Best case scenario is going from 9.3 to 14 TFLOPS for the PCIe variants of the Tesla cards. Even though the P100 hasn't got the full 3840 CUDA cores, that still only gives GV104 13.5 TFLOPS, and we have seen diminishing returns from TFLOPS to framerate, so essentially that's still only 25% increased framerates over the 1080 Ti. Really, though, you should compare 3840 Pascal cores to 5120 Volta cores, which drops the framerate improvement down to 20% if you compare the top-of-the-line Tesla Pascal that doesn't exist with the Volta Tesla cards.

10-20% increased framerates then. Hmmm....
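
For anyone checking the percentages, here is a minimal sketch of the relative-increase arithmetic, using only TFLOPS figures quoted in this post. The helper name is hypothetical, and this says nothing about how TFLOPS translate into framerates, which is the speculative part.

```python
def relative_increase_pct(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


# P6000 (12 TFLOPS) to the guessed 15.5 TFLOPS "V6000"
print(f"{relative_increase_pct(12.0, 15.5):.1f}%")  # prints "29.2%"

# Tesla PCIe cards: 9.3 TFLOPS (P100) to 14 TFLOPS (V100)
print(f"{relative_increase_pct(9.3, 14.0):.1f}%")   # prints "50.5%"
```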
 
I heard retail availability is August 2018 :p
 
Maybe! :)

Apparently GeForce Volta is going to use GDDR5(X) and/or GDDR6 though, so maybe they'll arrive in March 2018. Still, according to my research on TFLOPS-to-framerate increases, 1080 Ti to Volta xx80 is likely to bring only 10-20% higher framerates, rather than the 20-30% of 980 Ti to 1080. Not too spectacular then, unless NVIDIA used a LOT of magic.
 
That colour reminds me of those Vega cards. :D
 

Nvidia has much magic. Your calculations will be rendered poop. :D
 
Just looking at past releases, and with a smaller TFLOPS improvement than usual, it's not looking too good for Volta. Vega (2.0) now has a chance of beating it.
 

They have made changes to the warp schedulers. It looks very much like a parallel system, so that should theoretically increase performance, in line with how async compute helps non-serial tasks.
Don't use hardware counts to guess performance. On that metric, AMD should have been destroying NVIDIA for years.
NVIDIA has a very refined and streamlined architecture, and it reaps the rewards.
 
Unless all that new stuff really makes a huge difference, it's likely we'll see similar TFLOPS/framerate ratios. New NVIDIA stuff basically only seems to make it run well now and not degrade massively over time, whereas AMD seems to degrade less and not be a huge fan of the now. Honestly I prefer AMD's methods, since an RX 480/580 will now beat a 1060 on average, not just because of drivers but also because more and more games use DX12, Vulkan, etc., so AMD stuff lasts relatively long in the same price category. DiRT 4, by the way, really seems to favour AMD, since an RX 580 is now nearly as good as a 1070 in that game. FineWine technology for the win!
 

The P6000 is GP102, which has a considerably smaller die than GV100 and can thus manage much higher clocks within its power envelope. You should really compare GV100 to GP100 products, like the Tesla P100s (9.3 - 10.6 TFLOPS) or the Quadro GP100 (10.3 TFLOPS).
 
Did that too.
 
Ahh, my bad. Note to myself: do not skim posts from forums...

Well, if we assume that the V6000 is a full GV102 like the P6000 is a full GP102, the "marketed"* figure for the P6000 is 12 TFLOPS. Thus its clock speed is 12000/(60*64*2) = ~1.56 GHz. If the V6000 is a full GV102, then at that clock speed its "marketed"* figure would be 2*84*64*1.56 = ~16.8 TFLOPS. And consider this: GV100 is a huge, fat die (815 mm²), and it still keeps almost the same clocks in the same power envelope as GP100 with its smaller 610 mm² die. We just don't have enough information about the rest of the Volta family to know how much higher clocks can go when you can give them more power.

*NVIDIA's marketed TFLOPS are calculated from the given boost clock, which is actually lower than what the card runs at in normal 3D usage like gaming. I.e., 1.56 GHz is a very low frequency for the Pascal architecture.
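
The back-of-the-envelope numbers above can be sketched in a few lines. The function names are made up for illustration, and the 84 x 64 core count for a "V6000" is purely the assumption made in this post, not an announced product spec.

```python
def implied_clock_ghz(marketed_tflops, cuda_cores):
    """Clock implied by a marketed TFLOPS figure, assuming 2 FLOPs per core per clock."""
    return marketed_tflops * 1000.0 / (cuda_cores * 2)


def projected_tflops(cuda_cores, clock_ghz):
    """Marketed-style TFLOPS for a given core count and clock, same 2 FLOPs assumption."""
    return cuda_cores * 2 * clock_ghz / 1000.0


# P6000: 12 TFLOPS marketed, 60 * 64 = 3840 cores
p6000_clock = implied_clock_ghz(12.0, 60 * 64)
print(f"{p6000_clock:.4f} GHz")  # prints "1.5625 GHz"

# Hypothetical V6000: 84 * 64 = 5376 cores at the same clock (assumption)
print(f"{projected_tflops(84 * 64, p6000_clock):.1f} TFLOPS")  # prints "16.8 TFLOPS"
```

Any clock headroom beyond the marketed boost would scale the projection linearly.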
 