Tuesday, May 15th 2012

NVIDIA Pioneers New Standard for HPC With Tesla GPUs Built on Kepler Architecture

Press Release by

May 15th, 2012 14:33

NVIDIA today unveiled a new family of Tesla GPUs based on the revolutionary NVIDIA Kepler GPU computing architecture, which makes GPU-accelerated computing easier and more accessible for a broader range of high performance computing (HPC) scientific and technical applications.

The new NVIDIA Tesla K10 and K20 GPUs are computing accelerators built to handle the most complex HPC problems in the world. Designed with an intense focus on high performance and extreme power efficiency, Kepler is three times as efficient as its predecessor, the NVIDIA Fermi architecture, which itself established a new standard for parallel computing when introduced two years ago.

"Fermi was a major step forward in computing," said Bill Dally, chief scientist and senior vice president of research at NVIDIA. "It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform. Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency."

The Tesla K10 and K20 GPUs were introduced at the GPU Technology Conference (GTC), as part of a series of announcements from NVIDIA, all of which can be accessed in the GTC online press room.

NVIDIA developed a set of innovative architectural technologies that make the Kepler GPUs high performing and highly energy efficient, as well as more applicable to a wider set of developers and applications. Among the major innovations are:

SMX Streaming Multiprocessor -- The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX's energy efficiency was achieved by increasing its number of CUDA architecture cores by four times, while reducing the clock speed of each core, power-gating parts of the GPU when idle and maximizing the GPU area devoted to parallel-processing cores instead of control logic.
Dynamic Parallelism -- This capability enables GPU threads to spawn new threads on their own, allowing the GPU to adapt to the data as it processes it. It greatly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.
Hyper-Q -- This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU. This dramatically increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI.
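Dynamic parallelism matters most for algorithms whose work is only discovered at run time, such as the adaptive mesh refinement mentioned above. The Python sketch below is a CPU-side illustration of that pattern, not CUDA code: each task inspects its own data and decides whether to queue child tasks. On a GPU that supports dynamic parallelism (per this release, the GK110-based K20), that queueing step would instead be a kernel launching a child kernel from the device, without a round trip to the CPU. The function `refine`, its midpoint error test and the tolerance are illustrative assumptions.

```python
import math
from collections import deque


def refine(f, a, b, tol, max_depth=20):
    """Split [a, b] until f is close to linear on every sub-interval.

    Hypothetical helper: the refinement criterion compares f at the midpoint
    with the average of f at the endpoints. Returns sorted (left, right) pairs.
    """
    work = deque([(a, b, 0)])  # pending tasks: (left, right, depth)
    done = []
    while work:
        lo, hi, depth = work.popleft()
        mid = 0.5 * (lo + hi)
        # Data-dependent decision: how far is f(mid) from the chord?
        err = abs(f(mid) - 0.5 * (f(lo) + f(hi)))
        if err > tol and depth < max_depth:
            # "Spawn" two child tasks; with dynamic parallelism a GPU thread
            # could launch the equivalent child grid itself.
            work.append((lo, mid, depth + 1))
            work.append((mid, hi, depth + 1))
        else:
            done.append((lo, hi))
    return sorted(done)


# A steep step near x = 0.3 should attract most of the refinement
intervals = refine(lambda x: math.tanh(50 * (x - 0.3)), 0.0, 1.0, 1e-3)
```

With this test function the result should contain many narrow intervals clustered around x = 0.3, where the function changes rapidly, and only a few wide ones elsewhere; the amount of work is decided by the data, not fixed in advance.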

"We designed Kepler with an eye towards three things: performance, efficiency and accessibility," said Jonah Alben, senior vice president of GPU Engineering and principal architect of Kepler at NVIDIA. "It represents an important milestone in GPU-accelerated computing and should foster the next wave of breakthroughs in computational research."

NVIDIA Tesla K10 and K20 GPUs
The NVIDIA Tesla K10 GPU delivers the world's highest throughput for signal, image and seismic processing applications. Optimized for customers in oil and gas exploration and the defense industry, a single Tesla K10 accelerator board features two GK104 Kepler GPUs that deliver an aggregate of 4.58 teraflops of peak single-precision floating-point performance and 320 GB per second of memory bandwidth.
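The 4.58-teraflop figure can be reproduced from per-GPU specifications that the release does not state. The sketch below assumes 1,536 CUDA cores per GK104 running at 745 MHz and counts one fused multiply-add as two floating-point operations; treat those inputs as assumptions, not as figures from this announcement.

```python
# Assumed per-GPU GK104 figures for the Tesla K10 (not stated in the release)
CUDA_CORES_PER_GPU = 1536
CORE_CLOCK_GHZ = 0.745
GPUS_PER_BOARD = 2
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as two operations


def peak_sp_tflops(cores: int, clock_ghz: float, gpus: int,
                   flops_per_clock: int = FLOPS_PER_CORE_PER_CLOCK) -> float:
    """Peak single-precision throughput of the whole board, in teraflops."""
    gflops = cores * clock_ghz * gpus * flops_per_clock
    return gflops / 1000.0


board = peak_sp_tflops(CUDA_CORES_PER_GPU, CORE_CLOCK_GHZ, GPUS_PER_BOARD)
print(f"K10 peak SP: {board:.2f} TFLOPS")  # prints "K10 peak SP: 4.58 TFLOPS"
```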

The NVIDIA Tesla K20 GPU is the new flagship of the Tesla GPU product family, designed for the most computationally intensive HPC environments. Expected to be the world's highest-performance, most energy-efficient GPU, the Tesla K20 is planned to be available in the fourth quarter of 2012.

The Tesla K20 is based on the GK110 Kepler GPU. This GPU delivers three times the double-precision performance of Fermi architecture-based Tesla products, and it supports the Hyper-Q and dynamic parallelism capabilities. The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

"In the two years since Fermi was launched, hybrid computing has become a widely adopted way to achieve higher performance for a number of critical HPC applications," said Earl C. Joseph, program vice president of High-Performance Computing at IDC. "Over the next two years, we expect that GPUs will be increasingly used to provide higher performance on many applications."

Preview of CUDA 5 Parallel Programming Platform
In addition to the Kepler architecture, NVIDIA today released a preview of the CUDA 5 parallel programming platform. Available to more than 20,000 members of NVIDIA's GPU Computing Registered Developer program, the platform will enable developers to begin exploring ways to take advantage of the new Kepler GPUs, including dynamic parallelism.

The CUDA 5 parallel programming model is planned to be widely available in the third quarter of 2012. Developers can get access to the preview release by signing up for the GPU Computing Registered Developer program on the CUDA website.


26 Comments on NVIDIA Pioneers New Standard for HPC With Tesla GPUs Built on Kepler Architecture

xrealm20

Impressive. Wonder how many cores this one is carrying. 10 racks for a 1 PFLOP HPC cluster. I can see a lot of research companies loving this...

Crap Daddy

K10 is GK104x2... GTX690. K20 will be the Big K... later this year.

the54thvoid

Intoxicated Moderator

btarunr said: Hello, Big Kepler. 320 GB/s of memory bandwidth for an HPC part suggests that GK110's memory interface isn't 384-bit but 512-bit wide. If NVIDIA used 384-bit with today's 6.00 GHz memory, it would only achieve 288 GB/s.

K10 is two GK104s, giving 320 GB/s.
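The bandwidth arithmetic behind this exchange is bus width in bytes multiplied by the effective memory data rate. The sketch below checks btarunr's 288 GB/s figure and shows that 320 GB/s fits either a single hypothetical 512-bit GPU at 5 GT/s or two 256-bit GK104s at 5 GT/s; the 256-bit bus and 5 GT/s rate for the K10's GPUs are assumed figures not given in the release.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gt_s: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times billions of transfers per second."""
    return bus_width_bits / 8 * data_rate_gt_s


# btarunr's hypothesis: one GPU, 384-bit bus, 6 GT/s ("6.00 GHz") memory
single_384 = bandwidth_gb_s(384, 6.0)
# A hypothetical single 512-bit GPU at an assumed 5 GT/s
single_512 = bandwidth_gb_s(512, 5.0)
# K10: two GK104s, each with an assumed 256-bit bus at 5 GT/s
k10_total = 2 * bandwidth_gb_s(256, 5.0)

print(single_384, single_512, k10_total)  # prints "288.0 320.0 320.0"
```

Because both layouts land on 320 GB/s, the bandwidth figure alone could not settle the bus-width question; the K10's dual-GPU configuration does.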

K20 is the daddy. Ladies and gentlemen, GK110 has arrived. What are its specs? EDIT: www.nvidia.com/content/tesla/pdf/NV_DS_TeslaK_Family_May_2012_LR.pdf

And also... :cry:

As Crap Daddy's been hinting, it's a Tesla part first and foremost, coming to the Tesla line in Q4 2012.

(beat me to it Crap Daddy... :laugh:)

That means GK104 is as good as it gets this gen. Disappointed.

KainXS

Oh, so everything is still TBA on the specs, then.

TheMailMan78

Big Member

btarunr this news made my manhood move.

the54thvoid

Intoxicated Moderator

TheMailMan78 said: btarunr this news made my manhood move.

Rising from the depths like a nightmare from H.P. Lovecraft's writings.

Crap Daddy

the54thvoid said: That means GK104 is as good as it gets this gen. Disappointed.

Well, that's it, folks. You know, NV has some business to attend to. To be honest, during all the Kepler rumors before GK104 launched I was thinking that NV had other priorities than discrete GPUs (Tesla, Tegra and such), and I was partially right. They have only one chip, the GK104, which we will see in three or maybe even four variants, and that's about all regarding Kepler for gaming. GK106 is nowhere to be seen; I'm starting to doubt that it exists at all, and GK107 seems to be low end.

TheMailMan78

Big Member

Crap Daddy said: Well, that's it, folks. ...

What? Everyone knows AMD's days are numbered because of the fabled vapor chip! :rolleyes:

the54thvoid

Intoxicated Moderator

Crap Daddy said: Well, that's it, folks. ...

It's a shame. I think even AMD diehards were 'curious' about GK110, and now it's been revealed as a Tesla part. I'm a little underwhelmed. I'm happy with my current card, but I wanted to see a 'Big Daddy' Kepler gaming part that wasn't a ridiculous and exceptional dual-GPU card.

Oh well. Next rumour frenzy - Sea Islands, AKA HD8xxx.


Crap Daddy

TheMailMan78 said: What? Everyone knows AMD's days are numbered because of the fabled vapor chip! :rolleyes:

Nah, they're doing fine. They must sell heaps of the 7850.


TheMailMan78

Big Member

Crap Daddy said: Nah, they're doing fine. They must sell heaps of the 7850.

Quiet! They will hear you! Common sense is not so common.


Crap Daddy

OK, let's get back on topic. Here's what NV is preaching right now:

"Kepler is the world's first GPU designed for the cloud, to be deployed in cloud data centers worldwide. It does this with:
--virtualized GPU
--no longer does it need to connect to a display; it can render and stream instantaneously right out of the chip to a remote location
--super energy efficiency, so it can be deployed on a massive scale

Every command buffer is now virtualized. We can now discern which virtual machine sent us a graphics command. At the end, we can stream the frame buffer to that specific virtual machine. One GPU can be shared with countless users."

Who's got a GTX 680? Care to share some?


TheMailMan78

Big Member

Crap Daddy said: OK, let's get back on topic. Here's what NV is preaching right now: ... Who's got a GTX 680? Care to share some?

Sounds like some epic folding applications could be had.


mamisano

A few things...

1- Kind of sad that the K10 has 1/5 the double precision compute power of a 7970.
2- I can see the K20 being GPU compute only, similar to the Intel MIC cards.
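The "1/5" claim in point 1 can be checked against the 0.19-teraflop K10 double-precision figure from NVIDIA's datasheet, quoted later in the thread. The Radeon HD 7970 inputs below (2,048 stream processors at 925 MHz, double precision at one quarter of the single-precision rate) are assumed reference-card figures, not numbers taken from this thread.

```python
# Assumed Radeon HD 7970 reference specifications (not from this thread)
HD7970_STREAM_PROCESSORS = 2048
HD7970_CLOCK_GHZ = 0.925
HD7970_DP_RATE = 1 / 4  # double precision at a quarter of the SP rate

# Peak DP = processors x clock x 2 (FMA) x DP rate, converted from GFLOPS to TFLOPS
hd7970_dp_tflops = HD7970_STREAM_PROCESSORS * HD7970_CLOCK_GHZ * 2 * HD7970_DP_RATE / 1000

K10_DP_TFLOPS = 0.19  # board figure from NVIDIA's Tesla K10 datasheet
ratio = K10_DP_TFLOPS / hd7970_dp_tflops

print(f"HD 7970 DP: {hd7970_dp_tflops:.3f} TFLOPS; K10 / HD 7970 = {ratio:.2f}")
# prints "HD 7970 DP: 0.947 TFLOPS; K10 / HD 7970 = 0.20"
```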


eddman

the54thvoid said: K10 is two GK104s, giving 320 GB/s.
EDIT: www.nvidia.com/content/tesla/pdf/NV_DS_TeslaK_Family_May_2012_LR.pdf

"Tesla K10: Peak double precision floating point performance (board): 0.19 teraflops"

Umm, are you sure it's a good idea to mention that in press releases, nvidia? Doesn't seem like a selling point to me, just saying. :wtf: /S


TheoneandonlyMrK

TheMailMan78 said: Sounds like some epic folding applications could be had.

Yes indeed, but that's exactly why I feel so slapped in the face by NVIDIA with the GK104. I personally wanted a 660 with decent folding power, not to have to consider a 560 :rolleyes:. They seem to be essentially moving towards a point where they will sell only specialised cards, gamer or folder, but not both.

As this increases the profitability of their high-end compute cards and closes the door on using cheaper NV cards in compute-intensive applications and servers, it's lame :wtf:

Plus, they are the biggest money-milking tech gits I've seen; they squeeze a third more money out per coin spent than any other company (to be fair, in some way a credit to them), but it's from the customers :ohwell:


TheMailMan78

Big Member

theoneandonlymrk said: Yes indeed, but that's exactly why I feel so slapped in the face by NVIDIA with the GK104. I personally wanted a 660 with decent folding power, not to have to consider a 560 :rolleyes: ...

Well.......AMD's 7970 I hear is a nice folder :laugh:


Fluffmeister

theoneandonlymrk said: ... As this increases the profitability of their high-end compute cards and closes the door on using cheaper NV cards in compute-intensive applications and servers ... they squeeze a third more money out per coin spent than any other company ...

Sounds like smart business to me :D


TheoneandonlyMrK

Smart business, yes, as I implied.

Good for me, the customer? No, no it isn't.

TheMailMan78 said: Well.......AMD's 7970 I hear is a nice folder

I'm only interested in NV at all for a hybrid PhysX card + permanent folding card (a one-off, and probably a 560 now), as my next render card isn't out yet ;) or for that matter even speculated about yet ;), and my main rig is fine at this time (FX-8350 next up :)).


Maban

I'm surprised even the Tesla GK104 is locked at 1/24th DP power. That's shameful if you ask me.


TheoneandonlyMrK

Maban said: I'm surprised even the Tesla GK104 is locked at 1/24th DP power. That's shameful if you ask me.

That's why they need two of them on their first next-gen compute card, lmfao :roll:. Are you buying this? I'm not and wasn't, so to them I don't matter, but this doesn't scream performance crown to me. Double its performance (GK110) and what do you get? That's right: the same again, but finally on one chip. Epic fail IMHO. Though yes, they will be economical, just shit.

This shines with the failness of a billion suns.


HalfAHertz

Maban said: I'm surprised even the Tesla GK104 is locked at 1/24th DP power. That's shameful if you ask me.

It's not locked; it's how it was designed. They reduced the number of advanced functional units in each SM in favor of more, simpler ones to drive SP FP performance (and thus gaming) while reducing the power requirements.

GK104 consists of 4 blocks, but only one of the four can do DP FP calcs. From AT:

AnandTech said: The other change coming from GF114 is the mysterious block #15, the CUDA FP64 block. In order to conserve die space while still offering FP64 capabilities on GF114, NVIDIA only made one of the three CUDA core blocks FP64 capable. In turn that block of CUDA cores could execute FP64 instructions at a rate of ¼ FP32 performance, which gave the SM a total FP64 throughput rate of 1/12th FP32. In GK104 none of the regular CUDA core blocks are FP64 capable; in its place we have what we’re calling the CUDA FP64 block.

The CUDA FP64 block contains 8 special CUDA cores that are not part of the general CUDA core count and are not in any of NVIDIA’s diagrams. These CUDA cores can only do and are only used for FP64 math. What's more, the CUDA FP64 block has a very special execution rate: 1/1 FP32. With only 8 CUDA cores in this block it takes NVIDIA 4 cycles to execute a whole warp, but each quarter of the warp is done at full speed as opposed to ½, ¼, or any other fractional speed that previous architectures have operated at. Altogether GK104’s FP64 performance is very low at only 1/24 FP32 (1/6 * ¼), but the mere existence of the CUDA FP64 block is quite interesting because it’s the very first time we’ve seen 1/1 FP32 execution speed. Big Kepler may not end up resembling GK104, but if it does then it may be an extremely potent FP64 processor if it’s built out of CUDA FP64 blocks.

Looks to me like it was never meant to be a number crunching beast.
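The ratios in the AnandTech excerpt follow from counting execution units. The sketch below assumes 192 regular FP32 CUDA cores per GK104 SMX alongside the 8 FP64 cores AnandTech describes, both running at full rate, and also reproduces the GF114 figure (one of three blocks at one quarter speed). Multiplying the resulting 1/24 by the 4.58-teraflop K10 single-precision figure lands on the 0.19 teraflops quoted from NVIDIA's datasheet.

```python
from fractions import Fraction

# GF114 (from the quote): one of three CUDA core blocks runs FP64 at 1/4 speed
gf114_ratio = Fraction(1, 3) * Fraction(1, 4)  # 1/12 of FP32 throughput

# GK104 per SMX: 8 FP64 cores (from the quote) vs. 192 FP32 cores (assumed),
# each FP64 core executing at the full 1/1 rate
FP32_CORES_PER_SMX = 192
FP64_CORES_PER_SMX = 8
gk104_ratio = Fraction(FP64_CORES_PER_SMX, FP32_CORES_PER_SMX)  # 1/24 of FP32

# Apply the ratio to the K10 board's 4.58 TFLOPS single-precision peak
k10_dp_tflops = 4.58 * float(gk104_ratio)

print(f"GF114: {gf114_ratio}, GK104: {gk104_ratio}, K10 DP: {k10_dp_tflops:.2f} TFLOPS")
# prints "GF114: 1/12, GK104: 1/24, K10 DP: 0.19 TFLOPS"
```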


Maban

HalfAHertz said: It's not locked; it's how it was designed. They reduced the number of advanced functional units in each SM in favor of more, simpler ones to drive SP FP performance (and thus gaming) while reducing the power requirements.

GK104 consists of 4 blocks, but only one of the four can do DP FP calcs. From AT:

Thanks. I went over and read a little on that page. The world makes sense now.

But still disappointing.


Protagonist

Intel MIC

So Intel MIC is supposed to rival this. I see the back panel looks like the one on Intel MIC, with no display outputs.


Clubber_Lang
