Monday, June 18th 2012

NVIDIA Tesla K10 GPU Hits New Performance Milestones For Scientific Simulation

ISC'12 - NVIDIA Tesla K10 GPUs offer performance breakthroughs on popular high performance computing (HPC) applications -- ranging from seismic processing to life sciences to video processing -- according to new benchmarks NVIDIA released today.

Based on the new NVIDIA Kepler computing architecture, the Tesla K10 GPU delivers the industry's highest single precision performance (4.58 teraflops) and highest memory bandwidth (320 GB/sec) in a single accelerator. This is 12 times higher single precision flops and 6.4 times higher memory bandwidth than the latest-generation Intel Sandy Bridge CPUs.
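
As a quick sanity check, the CPU baseline implied by these ratios can be backed out from the K10 figures. This is only arithmetic on the numbers quoted above; the function and variable names are illustrative, and no specific Sandy Bridge model is assumed.

```python
# Figures quoted in the press release for the Tesla K10.
K10_SP_TFLOPS = 4.58        # peak single precision, teraflops
K10_BANDWIDTH_GBS = 320.0   # peak memory bandwidth, GB/s

# Ratios claimed over "latest-generation Intel Sandy Bridge CPUs".
SP_RATIO = 12.0
BANDWIDTH_RATIO = 6.4


def implied_baseline(gpu_value: float, claimed_ratio: float) -> float:
    """Return the CPU figure implied by a GPU figure and a claimed ratio."""
    if claimed_ratio <= 0:
        raise ValueError("claimed_ratio must be positive")
    return gpu_value / claimed_ratio


# Convert teraflops to gigaflops for readability.
cpu_sp_gflops = implied_baseline(K10_SP_TFLOPS, SP_RATIO) * 1000.0
cpu_bandwidth_gbs = implied_baseline(K10_BANDWIDTH_GBS, BANDWIDTH_RATIO)

print(f"Implied CPU single precision: {cpu_sp_gflops:.0f} GFLOPS")
print(f"Implied CPU memory bandwidth: {cpu_bandwidth_gbs:.0f} GB/s")
```

The implied baseline is roughly 382 GFLOPS and 50 GB/s, which is what the comparison rests on.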

The Tesla K10 GPU outperforms CPUs and previous-generation GPUs across the board on the most popular, compute-intensive applications for four key market segments, including:
  • Defense: video analytics, video stabilization, orthorectification, computer vision
  • Life and material sciences: molecular dynamics
  • Oil and gas: seismic processing, reverse time migration
  • Media and entertainment: video editing, video rendering/transcoding, ray tracing

"A distinct advantage of the Tesla K10 GPUs is that it excels in two key areas that have a dramatic impact on overall application performance: floating point operation and memory bandwidth," said Sumit Gupta, senior director of Tesla business at NVIDIA. "Together, these enable the K10 GPU to deliver substantial out-of-the-box performance increases for the top science, engineering and commercial applications with little or no effort on the part of the developer."

New Performance Records on AMBER and LAMMPS
On AMBER, a leading biomolecular simulation software application, four Tesla K10 GPUs achieved world record performance, delivering results far superior to what was available on multiple racks of servers just a few years ago.

The Tesla system achieved 76 nanoseconds of simulation time per day for a 23,558-atom molecule, outstripping the previous record set with four Tesla M2090s last year. This brings supercomputing performance to thousands of individual researchers, fueling further innovation in areas such as new drug discovery and more effective materials.
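
To put the throughput figure in perspective, here is a minimal sketch that converts nanoseconds per day into wall-clock time for a given amount of simulated time. The 1 microsecond target is a hypothetical example, not a figure from the benchmark, and the function name is illustrative.

```python
NS_PER_DAY = 76.0  # AMBER throughput quoted for four Tesla K10 GPUs


def days_to_simulate(target_ns: float, ns_per_day: float = NS_PER_DAY) -> float:
    """Wall-clock days needed to simulate target_ns at a given throughput."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    return target_ns / ns_per_day


# Hypothetical target: 1 microsecond (1000 ns) of simulated time.
target_ns = 1000.0
print(f"{target_ns:.0f} ns takes about {days_to_simulate(target_ns):.1f} days")
```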

"In biomolecular science, adding a few more nanoseconds of simulation time can make a world of difference in the ability of researchers to study and better understand the behavior of complex biological systems," said Ross Walker, assistant research professor, San Diego Supercomputer Center. "It still blows my mind that a single Tesla K10 outperforms some of the largest CPU clusters. The benefit it offers researchers is tremendous, enabling them to accelerate the search for new and better treatments for a host of diseases and disorders."

The Tesla K10 GPU also delivers the highest performance on LAMMPS, another application widely used by the life sciences research community. Running the LAMMPS Lennard-Jones liquid benchmark, a single Tesla K10 GPU outperforms a Tesla M2090 GPU by 80 percent, delivering performance equivalent to a cluster of 64 x86 CPUs.

Accelerating the Search for Energy
NVIDIA Tesla GPUs continue to deliver the highest performance on reverse time migration (RTM) applications for seismic processing in the oil and gas exploration industry, and for image processing in the computer vision industry. Petrobras, the national oil and gas company of Brazil, achieved a 1.8x speedup on its RTM application on the Tesla K10 GPU compared to a Tesla M2090 GPU within the same power envelope.

NVIDIA Tesla K10 GPUs are available from leading OEMs, including Appro Supercomputer Solutions, Dell, HP, IBM, SGI and Supermicro, as well as through NVIDIA distribution partners. More information about the Tesla K10 is available on the NVIDIA Tesla website.

24 Comments on NVIDIA Tesla K10 GPU Hits New Performance Milestones For Scientific Simulation

#1
hardcore_gamer
by: btarunr
Tesla K10 GPU delivers the industry's highest single precision performance
I wonder why they didn't mention double precision performance.:rolleyes:
#2
renz496
what for? anyone interested in this baby should have no use of DP anyway since the card was sold for its SP performance. but honestly i'm surprised when nvidia were using GK104 chip in their latest Tesla line up. why they didn't do so before with GF104/114 chips?
#3
hardcore_gamer
by: renz496
what for? anyone interested in this baby should have no use of DP anyway since the card was sold for its SP performance. but honestly i'm surprised when nvidia were using GK104 chip in their latest Tesla line up. why they didn't do so before with GF104/114 chips?
Because DP performance is very important in HPC.


They made a Tesla card out of GK 104 (in fact two GK104s) even though it sucks at computing because GK 110 ( originally meant to be the gtx 680) won't enter production any time soon.

They didn't make computing cards using GF104/GF114 because GF100 and GF110 were kick ass cards with exceptional computing performance.
#4
renz496
i know DP is important in HPC space. but as far as i know this product was aimed towards applications that only utilize SP. personally i think nvidia don't want the HPC crowd to get upset with GK110 being late so they throw GK104 into tesla line up. they might be only good at SP and poor at DP but at least nvidia have to show something don't they? :P

and because of this they can charge more for GK110 parts since it will be amazing in both SP and DP :D
#5
MrMilli
by: renz496
and because of this they can charge more for GK110 parts since it will be amazing in both SP and DP
I don't know if you could call it amazing.
GK110's DP performance should be higher than AMD's HD7970 (according to rumors) and the new HD7970 Ghz Edition will be pretty close.

http://parallelis.com/k20-updated-kepler-architecture/
http://www.brightsideofnews.com/news/2012/5/15/nvidia-tesla-k20-ie-gk110-is-71-billion-transistors2c-300w-tdp2c-384-bit-interface.aspx

If LuxMark is any reference, then nVidia is in bad shape with Kepler.

#6
renz496
lol. when i say 'amazing' i only mean how amazing GK110 will be compared to GK104. bad or not only time will tell.
#7
Steevo
Bring deh AMMBER LAMPS.
#8
HillBeast
by: btarunr
Tesla K10 GPU delivers the industry's highest single precision performance (4.58 teraflops) and highest memory bandwidth (320 GB/sec) in a single accelerator. This is 12 times higher single precision flops and 6.4 times higher memory bandwidth than the latest-generation Intel Sandy Bridge CPUs.
I call bulls**t on the memory bandwidth being '6.4 times faster than Sandy Bridge'. Perhaps LGA1155, but not on quad-channel LGA2011 Sandy Bridge. Sure the numbers may stack up and say NVIDIA is faster, but in the planet known as Earth, these figures would be unattainable.
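
For reference, a rough sketch of the theoretical peak numbers behind this argument. The memory configurations used here (dual-channel DDR3-1333 on LGA1155, quad-channel DDR3-1600 on LGA2011) are assumptions picked for illustration, and these are peak figures on paper, not measured bandwidth.

```python
def ddr3_peak_gbs(channels: int, transfers_mt_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x bus width x transfer rate."""
    return channels * bus_bytes * transfers_mt_s / 1000.0


K10_BANDWIDTH_GBS = 320.0  # aggregate figure quoted in the press release

# Assumed configurations, for illustration only.
configs = {
    "LGA1155, dual-channel DDR3-1333": ddr3_peak_gbs(2, 1333),
    "LGA2011, quad-channel DDR3-1600": ddr3_peak_gbs(4, 1600),
}

for name, gbs in configs.items():
    ratio = K10_BANDWIDTH_GBS / gbs
    print(f"{name}: {gbs:.1f} GB/s peak, K10 ratio {ratio:.2f}x")
```

Under these assumptions the quad-channel case comes out at 51.2 GB/s, a ratio of 6.25x, so the quoted 6.4x is in the same range on paper. Whether either side gets near its peak in real workloads is a separate question.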
#9
blanarahul
Seeing how NVIDIA just doubled everything from GTX 560 Ti to GTX 680... I thought GTX 680 was the successor to the GTX 560 Ti. But looks like I am wrong. Since Gk110 is 2x gk104 with extra features and better fp64 performance. It can't be a successor to gf110.

And GK110 won't make it to Consumer markets in any case.
7.1 Billion transistor should mean a die size of roughly 600 mm square. This is extremely uneconomical for consumer markets.
................................................

But Nvidia is in for a kick-ass competition.
Intel Xeon Phi should be a larrabee core. With over 1 Teraflop of FP64 performance. It should kick K10's butt.
#10
Recus
TPU members. Begs Nvidia for computing, disappointed when gets it. :D
#11
theeldest
by: btarunr
the Tesla K10 GPU delivers the industry's highest single precision performance (4.58 teraflops) and highest memory bandwidth (320 GB/sec) in a single accelerator.
A little misleading as it's a dual GK104 solution.

Edit: Also misleading to name it the K10 when it's based on a couple GK104 cores. This is NOT a GK110. IIRC, the GK110 Tesla device will be a K20.
#12
theoneandonlymrk
by: theeldest
Quote:
Originally Posted by btarunr
the Tesla K10 GPU delivers the industry's highest single precision performance (4.58 teraflops) and highest memory bandwidth (320 GB/sec) in a single accelerator.

A little misleading as it's a dual GK104 solution.

Edit: Also misleading to name it the K10 when it's based on a couple GK104 cores. This is NOT a GK110. IIRC, the GK110 Tesla device will be a K20.
that's generally what nvidia do, naming shenanigans abound with them, and i'm surprised AMD haven't countered with a 7870 or 7970 dual fire pro card or some such as it would demolish this as a single W600- 9000 come close
#13
eddman
by: MrMilli
I don't know if you could call it amazing.
If the rumors are true, then how is 1.5 TFLOPS of DP performance not amazing?!

by: MrMilli
GK110's DP performance should be higher than AMD's HD7970 (according to rumors) and the new HD7970 Ghz Edition will be pretty close.

http://parallelis.com/k20-updated-kepler-architecture/
http://www.brightsideofnews.com/news/2012/5/15/nvidia-tesla-k20-ie-gk110-is-71-billion-transistors2c-300w-tdp2c-384-bit-interface.aspx
HD7970 Ghz Edition's DP number will be at 1.12 TFLOPS, which is still not that close, if they clock the firestream version the same of course, and not lower.

by: MrMilli
If LuxMark is any reference, then nVidia is in bad shape with Kepler.

http://techreport.com/r.x/geforce-gtx-680/luxmark.gif
Those numbers are for GK104, and as we all know its DP performance is only 1/24 of SP. That ratio for GK110 is 1/3, so those figures mean nothing for K20.
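
A small sketch of what those ratios imply, using only figures from this thread and the press release. The 1/3 ratio and the 1.5 TFLOPS number for GK110 are the rumored values discussed here, not confirmed specs, and the names are illustrative.

```python
from fractions import Fraction

GK104_DP_TO_SP = Fraction(1, 24)  # ratio stated in this post
GK110_DP_TO_SP = Fraction(1, 3)   # ratio claimed in this thread, unconfirmed


def dp_tflops(sp_tflops: float, dp_to_sp: Fraction) -> float:
    """Peak double precision throughput implied by SP throughput and a DP:SP ratio."""
    return sp_tflops * float(dp_to_sp)


# Tesla K10: 4.58 TFLOPS SP across its two GK104 GPUs.
k10_dp = dp_tflops(4.58, GK104_DP_TO_SP)

# SP throughput GK110 would need to hit the rumored 1.5 TFLOPS DP at 1/3.
gk110_sp_needed = 1.5 / float(GK110_DP_TO_SP)

print(f"K10 implied DP: {k10_dp:.2f} TFLOPS")
print(f"GK110 SP needed for 1.5 TFLOPS DP: {gk110_sp_needed:.1f} TFLOPS")
```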
#14
largon
by: blanarahul
And GK110 won't make it to Consumer markets in any case.
:laugh:

by: blanarahul
7.1 Billion transistor should mean a die size of roughly 600 mm square.
Around 500mm² is more like it.
Same ballpark as GF100.

by: blanarahul
This is extremely uneconomical for consumer markets.
Sure - for average consumers. High-end cards are not for average consumers.
#15
Xzibit
by: blanarahul

And GK110 won't make it to Consumer markets in any case.
7.1 Billion transistor should mean a die size of roughly 600 mm square. This is extremely uneconomical for consumer markets.
GeForce chips

GT200 = 576mm2 / GTX 260 & 280

GF100 = 529mm2 / GTX 465, 470 & 480

GF110 = 520mm2 / GTX 560 Ti OEM, 560 TI 448, 570, 580 & 590

They seem pretty comfortable releasing chips close to 600mm2.

The problem would be how much more power it would use? AMD chip is 352mm2 and it's not handicapped in computation power and is running neck and neck in power usage with the 294mm2 GK104 that is extremely hindered in that area by 2/3rds.
#16
MrMilli
by: eddman
If the rumors are true, then how is 1.5 TFLOPS of DP performance not amazing?!

HD7970 Ghz Edition's DP number will be at 1.12 TFLOPS, which is still not that close, if they clock the firestream version the same of course, and not lower.

Those numbers are for GK104, and as we all know its DP performance is only 1/24 of SP. That ratio for GK110 is 1/3, so those figures mean nothing for K20.
From what I've read on technical forums, a more realistic number seems to be 1.3 TFLOPS. But we'll see when the product eventually gets released. Why it's not amazing for me is that nVidia needs a 7 billion transistor chip to accomplish this feat. AMD could, with GCN, make a chip that's more powerful with a smaller size. I think that has always been the power of AMD, to make GPU's that perform close to nVidia's while being 50% or more smaller.
#17
Aquinus
Resident Wat-man
by: eddman
That ratio for GK110 is 1/3, so those figures mean nothing for K20.
Source? If the GK110 is still based on Kepler, I don't believe that the ratio will actually change, just the number of shaders and clocks will within the same architecture. I call shenanigans. :banghead:
#18
theeldest
by: Aquinus
Source? If the GK110 is still based on Kepler, I don't believe that the ratio will actually change, just the number of shaders and clocks will within the same architecture. I call shenanigans. :banghead:
Why?

The ratio in Fermi was different. GF110/100 had a different ratio than GF114/104.
#19
blanarahul
by: largon
:laugh:

Around 500mm² is more like it.
Same ballpark as GF100.
Sure - for average consumers. High-end cards are not for average consumers.
GK104:- 3.54 billion transistors at 294 mm^2
GK110:- 7.1 billion transistors

3.54 billion * 2 = 7.08 billion
294 mm^2 * 2 = 588 mm^2

I hope you know enough math to understand a simple calculation.
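
The estimate above amounts to assuming GK110 has the same transistor density as GK104. A minimal sketch of that linear scaling, with the caveat that the same-density assumption is the whole argument and real dies need not follow it; the function name is illustrative.

```python
def scaled_die_area_mm2(ref_area_mm2: float,
                        ref_transistors_bn: float,
                        target_transistors_bn: float) -> float:
    """Estimate die area by assuming the same transistor density as a reference chip."""
    density_bn_per_mm2 = ref_transistors_bn / ref_area_mm2
    return target_transistors_bn / density_bn_per_mm2


# GK104: 3.54 billion transistors on 294 mm^2 (figures from this post).
# GK110: 7.1 billion transistors (rumored figure from this thread).
gk110_estimate = scaled_die_area_mm2(294.0, 3.54, 7.1)
print(f"GK110 at GK104 density: about {gk110_estimate:.0f} mm^2")
```

Different mixes of logic, cache and I/O change the density from chip to chip, so this is an estimate under one assumption rather than a prediction.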

Btw there would be no use to release GK110 for consumers. Because gaming wise GTX 690 should roughly equal GK110. And since Gk110 is a much much larger die, it would consume a hell lot of power. Another factor is gonna be yields. Will TSMC cope with the pressure to produce enough 7.1 billion transistor chip? Seems unlikely till next year.

And we all know that Maxwell is coming next year(if all goes well).
#20
blanarahul
by: Xzibit
GeForce chips

GT200 = 576mm2 / GTX 260 & 280

GF100 = 529mm2 / GTX 465, 470 & 480

GF110 = 520mm2 / GTX 560 Ti OEM, 560 TI 448, 570, 580 & 590

They seem pretty comfortable releasing chips close to 600mm2.

The problem would be how much more power it would use? AMD chip is 352mm2 and it's not handicapped in computation power and is running neck and neck in power usage with the 294mm2 GK104 that is extremely hindered in that area by 2/3rds.
Yields are gonna be another issue. Plus the GTX 690 should be enough to outperform GK110 in games (1536*2 CUDA cores vs. 2880 CUDA cores).
#21
theoneandonlymrk
by: eddman
HD7970 Ghz Edition's DP number will be at 1.12 TFLOPS, which is still not that close, if they clock the firestream version the same of course, and not lower.
the 7970 has these specs listed now, non Ghz edition

3.79 TFLOPS Single Precision compute power
947 GFLOPS Double Precision compute power

the GK110 has a big ask ahead of it, if the GK104 has any input. you can double the shaders and add more DP shaders all you like and that's still a big ask. Nvidia are using two GK104s for 4.5 TFLOPS SP compute power; 2x 7970 would have <7.58 TFLOPS SP compute power. in performance per watt a single 7970 must annihilate a K10 compute card.
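
A quick sketch of the per-GPU comparison, using only the throughput figures quoted in this thread and the press release. Board power is not given anywhere here, so performance per watt is deliberately left out; the variable names are illustrative.

```python
HD7970_SP_TFLOPS = 3.79    # listed spec quoted above, non-GHz edition
HD7970_DP_TFLOPS = 0.947   # 947 GFLOPS
K10_SP_TFLOPS = 4.58       # press release figure for the whole board
K10_GPU_COUNT = 2          # the K10 carries two GK104 GPUs

# DP:SP ratio of the HD 7970 implied by its listed specs.
hd7970_dp_to_sp = HD7970_DP_TFLOPS / HD7970_SP_TFLOPS

# SP throughput per GPU on the K10, and the peak for two HD 7970s.
k10_sp_per_gpu = K10_SP_TFLOPS / K10_GPU_COUNT
dual_7970_sp = 2 * HD7970_SP_TFLOPS

print(f"HD 7970 DP:SP ratio: {hd7970_dp_to_sp:.3f}")
print(f"K10 SP per GPU: {k10_sp_per_gpu:.2f} TFLOPS")
print(f"2x HD 7970 SP: {dual_7970_sp:.2f} TFLOPS")
```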

and by the look of things we wont see the GK110 till a year after amd released the 7970.
#23
largon
by: blanarahul
GK104:- 3.54 billion transistors at 294 mm^2
GK110:- 7.1 billion transistors

3.54 billion * 2 = 7.08 billion
294 mm^2 * 2 = 588 mm^2

I hope you know enough math to understand a simple calculation.
And I hope you would know estimating the die size is not that simple.