Friday, February 10th 2012

NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery

Feb 10th, 2012 01:43 Discuss (139 Comments)

NVIDIA is set to launch its answer to AMD's Southern Islands Radeon HD 7000 series with GeForce Kepler 104 (GK104). We are learning through reliable sources that NVIDIA will implement a radically different design (by NVIDIA's standards anyway) for its CUDA core machinery, while retaining a basic component hierarchy similar to Fermi's. The new design should allow greater parallelism. The latest version of GK104's specifications looks like this:

SIMD Hierarchy

4 Graphics Processing Clusters (GPC)
4 Streaming Multiprocessors (SM) per GPC = 16 SM
96 Stream Processors (SP) per SM = 1536 CUDA cores

TMU / Geometry Domain

8 Texture Units (TMU) per SM = 128 TMUs
32 Raster OPeration Units (ROPs)

Memory

256-bit wide GDDR5 memory interface
2048 MB (2 GB) memory amount standard

Clocks/Other

950 MHz core/CUDA core (no hot-clocks)
1250 MHz actual (5.00 GHz effective) memory, 160 GB/s memory bandwidth
2.9 TFLOP/s single-precision floating point compute power
486 GFLOP/s double-precision floating point compute power
Estimated die-area 340mm²
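
The derived figures in the list above follow from the unit counts and clocks. Here is a minimal sketch of that arithmetic, assuming 2 floating-point operations (one fused multiply-add) per CUDA core per clock and a 1/6 double-precision rate. Neither assumption is stated in the spec list itself, and the names used are made up for illustration:

```python
# Rumored GK104 figures taken from the spec list.
GPCS = 4
SMS_PER_GPC = 4
SPS_PER_SM = 96
TMUS_PER_SM = 8
CORE_CLOCK_MHZ = 950
BUS_WIDTH_BITS = 256
MEM_EFFECTIVE_GHZ = 5.0

# Assumptions (not in the spec list): one FMA = 2 FLOPs per core per clock,
# and double precision running at 1/6 of the single-precision rate.
FLOPS_PER_CORE_PER_CLOCK = 2
DP_RATIO = 1 / 6


def derive_gk104():
    """Recompute the totals quoted in the spec list from the per-unit figures."""
    sms = GPCS * SMS_PER_GPC
    cuda_cores = sms * SPS_PER_SM
    tmus = sms * TMUS_PER_SM
    # Bytes moved per transfer times billions of transfers per second = GB/s.
    bandwidth_gbs = BUS_WIDTH_BITS / 8 * MEM_EFFECTIVE_GHZ
    sp_gflops = cuda_cores * FLOPS_PER_CORE_PER_CLOCK * CORE_CLOCK_MHZ / 1000
    dp_gflops = sp_gflops * DP_RATIO
    return {
        "sms": sms,
        "cuda_cores": cuda_cores,
        "tmus": tmus,
        "bandwidth_gbs": bandwidth_gbs,
        "sp_gflops": sp_gflops,
        "dp_gflops": dp_gflops,
    }


if __name__ == "__main__":
    for key, value in derive_gk104().items():
        print(f"{key}: {value}")
```

Under those assumptions the numbers line up with the list: 16 SMs, 1536 CUDA cores, 128 TMUs, 160 GB/s, roughly 2.9 TFLOP/s single precision and roughly 486 GFLOP/s double precision.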

Source: 3DCenter.org

139 Comments on NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery

#26

TheMailMan78

Big Member

This is odd. I go Nvidia and Nvidia starts to look like AMD. lol I can't win.

Listen, if NVIDIA fails with the 700 series I take full responsibility. It's my fault for going green.

#27

Benetanegia

arnoo1: seriously 1536 shaders? that's 3 times more than Fermi

Not really, because they dropped hot-clocks. From the perspective of how many ops/cycle the chip can do, GF100/110 could be seen as a 1024 SP part, and GF104/114 as a 768 SP part. So it's a 50% improvement over GF100 and 100% over GF104.
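
The equivalence above can be sketched in a few lines. The shader counts of 512 (GF100/110) and 384 (GF104/114) are the published full-chip counts rather than numbers from the post, the 2x shader-to-core clock ratio is the usual Fermi hot-clock ratio, and the function name is invented for illustration:

```python
def hotclock_equivalent_sps(shader_count, shader_to_core_ratio=2):
    """Express hot-clocked shaders as the number of core-clocked shaders
    that would do the same work per core clock cycle."""
    return shader_count * shader_to_core_ratio


GK104_SPS = 1536  # rumored, runs at core clock (no hot-clock)

gf110_equiv = hotclock_equivalent_sps(512)  # full GF100/GF110
gf104_equiv = hotclock_equivalent_sps(384)  # full GF104/GF114

# Relative gain in ops per core clock, ignoring any clock speed difference.
gain_over_gf110 = GK104_SPS / gf110_equiv - 1
gain_over_gf104 = GK104_SPS / gf104_equiv - 1

print(f"GF100/110 equivalent: {gf110_equiv} SPs, GK104 gain: {gain_over_gf110:.0%}")
print(f"GF104/114 equivalent: {gf104_equiv} SPs, GK104 gain: {gain_over_gf104:.0%}")
```

This only compares ops per core clock. It says nothing about how efficiently each architecture keeps its shaders busy.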

@ thread

What I mention to arnoo1 is normal and has happened on pretty much every generation. The "failure" at releasing fully enabled chips in the GTX400 line made it look as if it didn't happen, at least performance wise, since Fermi CUDA cores are not as fast or efficient (clock for clock) as those on previous Nvidia cards. But in the end if you look at the GTX580 it's pretty damn close to being 100% faster than GTX280/285. And GTX 560 Ti is close to 50% faster. This is what Nvidia tried with GF100 and GF104, but only ultimately achieved with GF110 and GF114.

Look here, I couldn't find a direct comparison since W1zz stopped benching DX10 cards:

On the left GTX 460 is similar to GTX285. On the right GTX580 is almost twice as fast as the GTX 460.

I don't know why people (all over the internet) are so reluctant to believe a similar thing could happen this time around. Only this time they won't have to disable parts in the first place. It's not a crazy thought at all. At least IMO.

#28

m1dg3t

Info is starting to get better, still waiting :) Price wars should be as fun as waiting for release, hopefully something fits my budget so I can upgrade :o

Has Nvidia done a "If we can't beat 'em, join 'em" thing?

#29

Benetanegia

jamsbong: Confirmed Nvidia is doing an ATI!

TheMailMan78: This is odd. I go Nvidia and Nvidia starts to look like AMD. lol I can't win.

m1dg3t: Has Nvidia done a "If we can't beat 'em, join 'em" thing?

It's kind of an irrelevant point to discuss, but why do so many people say something like this? I just can't make any sense of it.

AMD

- Gone with scalar shaders (which Nvidia has been doing for 6+ years)
- Gone modular with CU (which Nvidia has been doing since Fermi, 2 years now)
- GPGPU friendly architecture and caches (Fermi)

Nvidia

- Dropped hot-clocks

And Nvidia is the one copying AMD? Come on, they dropped hot-clocks, that's it, arguably because slower cores (yet smaller and twice as many) are more area/wattage efficient at 28nm, which did not necessarily apply at 40nm, 65nm, 55nm...

The only interesting thing is that both GPU vendors have converged on a very similar architecture now that both pursue the same goals and are constrained by the same physical limits.

EDIT: ^^ And that's why I love tech BTW, and especially GPUs. It's pure engineering. Solving a specific "problem" (rendering) in the best way they can, and watching 2 different vendors solve it so differently, but with such similar results, has been very fun to watch. Maybe in the coming years it will not be as fun as they converge more and more. Kind of like CPUs are mostly equal and there's a lot less to discuss (Bulldozer was a fresh attempt tho, yet it failed). I love tech anyway.

#30

xenocide

Benetanegia: It's kind of an irrelevant point to discuss, but why do so many people say something like this? I just can't make any sense of it.

AMD

- Gone with scalar shaders (which Nvidia has been doing for 6+ years)
- Gone modular with CU (which Nvidia has been doing since Fermi, 2 years now)
- GPGPU friendly architecture and caches (Fermi)

Nvidia

- Dropped hot-clocks

And Nvidia is the one copying AMD? Come on, they dropped hot-clocks, that's it, arguably because slower cores (yet smaller and twice as many) are more area/wattage efficient at 28nm, which did not necessarily apply at 40nm, 65nm, 55nm...

The only interesting thing is that both GPU vendors have converged on a very similar architecture now that both pursue the same goals and are constrained by the same physical limits.

People just look at the number of shaders and go "zomg copying AMD!!?#"

#31

Benetanegia

xenocide: People just look at the number of shaders and go "zomg copying AMD!!?#"

Yes I guess, but it's so obvious that Nvidia would eventually go to 4 digit numbers, if not this gen (i.e. 1024 SP "Fermi"), in the next one at least. Anyway if AMD had kept using VLIW we'd probably be talking about 3000 SPs, so again they wouldn't "look the same". So I stand by my point. If anything, AMD did an "Nvidia". Yet it's not true, as I said.

#32

TheMailMan78

Big Member

For some reason I just think NVIDIA is gonna bring a bag of fail next round for no other reason than that I bought one.

#33

m1dg3t

I'm curious why they went with a 256-bit bus as opposed to 384? I thought with the large amounts of DDR5 you want max "throughput"? :confused: Maybe cuz it's the "budget" board? Anyways, I'm still waiting :)

#34

MxPhenom 216

ASIC Engineer

TheMailMan78: For some reason I just think NVIDIA is gonna bring a bag of fail next round for no other reason than that I bought one.

I just hope their geometry and tessellation performance is still very good like it was with Fermi.

#35

MxPhenom 216

ASIC Engineer

m1dg3t: I'm curious why they went with a 256-bit bus as opposed to 384? I thought with the large amounts of DDR5 you want max "throughput"? :confused: Maybe cuz it's the "budget" board? Anyways, I'm still waiting :)

Well, it's because of the memory size: 2 GB. If it was 3 GB then 384-bit would work.

#36

m1dg3t

nvidiaintelftw: Well, it's because of the memory size: 2 GB. If it was 3 GB then 384-bit would work.

Wasn't Nvidia first to use 384-bit back in the day? Then it was with only 1 GB IIRC :confused:

#37

MxPhenom 216

ASIC Engineer

m1dg3t: Wasn't Nvidia first to use 384-bit back in the day? Then it was with only 1 GB IIRC :confused:

No? Well, the 400 series had like 1280 MB of RAM, and 1536 MB (I think on the 580), and it's 384-bit.

#38

creepingdeath

radarblade: Seems like Nvidia's pretty prepped up to wipe AMD off the slate! But what would be the TDP on these things? Preferably less than the earlier 480 and 580 heaters. :)

Uh, with these specifications it definitely will NOT beat Tahiti. There's always price though right? Hotclocking is gone, hence the shader units are substantially weaker than those found in Fermi.

GK110 is the one we want and since it has just taped out, it will not be released until Q3. Sorry green fans :)

#39

creepingdeath

1c3d0g: I have a feeling that NVIDIA will kill the competition this time around...Kepler sounds like a new Voodoo2, if y'all still remember that...

I have a feeling that the specs are black and white. GK110 will be the one to wait for and it's not coming till Q3.

#40

Benetanegia

creepingdeath: Uh, with these specifications it definitely will NOT beat Tahiti. There's always price though right? Hotclocking is gone, hence the shader units are substantially weaker than those found in Fermi.

Yes, shaders are going to be exactly half as powerful as in Fermi. Hence this chip still has 50% more shader throughput per clock than the GTX580. Still not enough info to say one way or another.

But since we are making absurd claims with no possible way to back them up: this chip WILL beat Tahiti, and by a good margin too.

#41

creepingdeath

Benetanegia: Yes, shaders are going to be exactly half as powerful as in Fermi. Hence this chip still has 50% more shader throughput per clock than the GTX580. Still not enough info to say one way or another.

But since we are making absurd claims with no possible way to back them up: this chip WILL beat Tahiti, and by a good margin too.

I'll call you and raise you by stating, "this will be nvidia's 5000 FX all over again" :cool:

Just kidding with that ;) In all seriousness, the specs are not impressive. It may come close to the 580 at a lower cost and better efficiency, but based on specs it is not a Tahiti killer. Gotta wait for GK110 which just taped out. That's the one I'll wait for, I'll be doing another round of upgrades around the September timeframe anyway.

#42

m1dg3t

creepingdeath: I'll call you and raise you by stating, "this will be nvidia's 5000 FX all over again" :cool:

I hope not! Those were shitty times

#43

Prima.Vera

That's nice, but how about something similar to AMD's MLAA straight from the driver??? I play a lot of older games that don't support AA, and with MLAA it's a delight.

#44

m1dg3t

Prima.Vera: That's nice, but how about something similar to AMD's MLAA straight from the driver??? I play a lot of older games that don't support AA, and with MLAA it's a delight.

What, what is MLAA? It's useless, ATi has no innovative features like that! :rolleyes::rolleyes:

#45

Steevo

7970

3.79 TFLOPS Single Precision compute power
947 GFLOPS Double Precision compute power

Twice as much math processing power with only a 25% increase in "core" count and 25 MHz less core speed?

If these are official numbers from the green camp I feel sorry for their PR department making efficiency statements.

#46

Crap Daddy

Steevo: 7970

3.79 TFLOPS Single Precision compute power
947 GFLOPS Double Precision compute power

Twice as much math processing power with only a 25% increase in "core" count and 25 MHz less core speed?

If these are official numbers from the green camp I feel sorry for their PR department making efficiency statements.

Let's remember the 6970 has 2.7 TFlops while GTX580 has something like 1.5 so if we are talking about gaming benchmarks I don't think that's a factor.
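
For reference, the theoretical figures being thrown around all come from the same formula: shaders times FLOPs per clock times shader clock. A rough sketch follows, where the shader counts and clocks for the existing cards are the published reference specs (not numbers from this thread), 2 FLOPs per shader per clock is assumed for every chip, and the function name is made up:

```python
def peak_sp_gflops(shader_count, shader_clock_mhz, flops_per_clock=2):
    """Theoretical single-precision peak: every shader issues one
    multiply-add (2 FLOPs) on every shader clock cycle."""
    return shader_count * flops_per_clock * shader_clock_mhz / 1000


# (shader count, shader clock in MHz); GK104 values are the rumored ones.
CHIPS = {
    "Radeon HD 6970": (1536, 880),
    "GeForce GTX 580": (512, 1544),  # hot-clocked shader domain
    "Radeon HD 7970": (2048, 925),
    "GK104 (rumored)": (1536, 950),
}

for name, (shaders, clock_mhz) in CHIPS.items():
    print(f"{name}: {peak_sp_gflops(shaders, clock_mhz):.0f} GFLOP/s")
```

The 6970 comes out at about 2.7 TFLOP/s against about 1.6 for the GTX 580, which is the gap mentioned above. The formula counts only peak issue rate, so it can't predict gaming results across different architectures.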

#47

blibba

m1dg3t: Wasn't Nvidia first to use 384-bit back in the day? Then it was with only 1 GB IIRC :confused:

First GPU to use a 384-bit bus was the G80, as used in the 8800GTX and 8800 Ultra. It didn't have 1GB of memory though, because it had a 384-bit bus...

The GTX550 was the first (and so far only) card to break the evenly filled memory rule. It has 1GB through a 192-bit bus.
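
The "evenly filled" rule can be sketched like this, assuming each memory chip has a 32-bit interface and every chip on the card has the same capacity. The function name and the chip sizes tried are illustrative, not taken from any vendor documentation:

```python
CHIP_INTERFACE_BITS = 32  # assumption: one memory chip per 32 bits of bus


def evenly_filled_capacity_mb(bus_width_bits, chip_capacity_mb):
    """Total memory when every 32-bit channel carries one identical chip."""
    if bus_width_bits % CHIP_INTERFACE_BITS != 0:
        raise ValueError("bus width must be a multiple of the chip interface")
    chips = bus_width_bits // CHIP_INTERFACE_BITS
    return chips * chip_capacity_mb


# Common chip densities: 64 MB, 128 MB (1 Gbit), 256 MB (2 Gbit).
for bus in (192, 256, 384):
    sizes = [evenly_filled_capacity_mb(bus, chip) for chip in (64, 128, 256)]
    print(f"{bus}-bit: {sizes} MB")
```

A 256-bit bus gives 512, 1024 or 2048 MB, while a 384-bit bus gives 768, 1536 or 3072 MB, so 2 GB pairs with 256-bit and 3 GB with 384-bit. A 192-bit bus never lands on 1024 MB with identical chips, which is why a 1 GB card on that bus has to mix chip sizes.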

#48

Benetanegia

creepingdeath: It may come close to the 580 at a lower cost and better efficiency, but based on specs it is not a Tahiti killer. Gotta wait for GK110 which just taped out. That's the one I'll wait for, I'll be doing another round of upgrades around the September timeframe anyway.

Funny, because based on specs this is not only a Tahiti killer, but a Tahiti killer, raper and shitting on his tomb kind of killer, if that makes any sense. Of course that's only based on the specs, so it won't materialize as such.

Be honest and say that because it is 256-bit, YOU think it's not going to be faster than GTX580 or something. Because based on specs, all of them, the card has 2x the crunching power of the GTX580 (2.9 vs 1.5 TFLOPs), twice as much texture power (128 vs 64) and 33% more memory, just to name a few.

I wouldn't even pay too much attention to the claim that GK110 just taped out BTW. "They" say that GK100 was canned, but there's absolutely no proof of that. "They" never knew when GK104 taped out either. Plus in 2010 by this time of the year there was also a chip called GF110 in the works, and based on when it was released (October 2010), its tape out had to happen around Feb/March too. It's possible that GK100 still exists and will be released soon after GK104, which is what many rumors say. Rumors from sources that turned out to be correct about GK104 specs several months ago, if we are to believe these specs.

Steevo: 7970

3.79 TFLOPS Single Precision compute power
947 GFLOPS Double Precision compute power

Twice as much math processing power with only a 25% increase in "core" count and 25 MHz less core speed?

If these are official numbers from the green camp I feel sorry for their PR department making efficiency statements.

That's double precision* only and a huge improvement over GF104, both are capped because they are the mainstream parts. GF104 was capped at 1/12 the SP amount. GK104 is 1/6, which is a nice improvement for the performance part (for example in previous generations AMD didn't even support DP on anything but high-end). The high-end chip will feature 1/2 ratio and if Tahiti's number is really true (I thought Tahiti could do 1/2 DP :confused:), it will most definitely decimate it at DP performance.

*On SP 2.9 is definitely not half of 3.79 and like Crap Daddy said the GTX 580 had around 1.5 TFLOPs. Claimed theoretical GFLOPs mean very little, except for comparing two chips using identical architecture. Obviously GK104 is not going to be 2x as fast as the GTX580 as the GFLOPs number suggests, or TMUs, but it will most definitely beat it by a good amount. How much? Look to previous gens and compare GTX560 Ti to GTX285. There's your most probable answer.
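
The DP:SP ratios implied by the quoted figures can be checked directly. This is a small sketch using only the numbers posted in this thread, with the function name and the denominator cap of 24 chosen for illustration:

```python
from fractions import Fraction


def dp_to_sp_ratio(dp_gflops, sp_gflops, max_denominator=24):
    """Nearest simple fraction to the double/single precision throughput ratio."""
    return Fraction(dp_gflops / sp_gflops).limit_denominator(max_denominator)


# Figures as quoted in the thread, in GFLOP/s.
tahiti_ratio = dp_to_sp_ratio(947, 3790)  # HD 7970
gk104_ratio = dp_to_sp_ratio(486, 2900)   # rumored GK104

print(f"Tahiti DP:SP ~ {tahiti_ratio}")
print(f"GK104  DP:SP ~ {gk104_ratio}")
```

On those figures Tahiti works out to 1/4 rather than 1/2, and the rumored GK104 to 1/6.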

#49

creepingdeath

Benetanegia: Funny, because based on specs this is not only a Tahiti killer, but a Tahiti killer, raper and shitting on his tomb kind of killer, if that makes any sense. Of course that's only based on the specs, so it won't materialize as such.

Be honest and say that because it is 256-bit, YOU think it's not going to be faster than GTX580 or something. Because based on specs, all of them, the card has 2x the crunching power of the GTX580 (2.9 vs 1.5 TFLOPs), twice as much texture power (128 vs 64) and 33% more memory, just to name a few.

I wouldn't even pay too much attention to the claim that GK110 just taped out BTW. "They" say that GK100 was canned, but there's absolutely no proof of that. "They" never knew when GK104 taped out either. Plus in 2010 by this time of the year there was also a chip called GF110 in the works, and based on when it was released (October 2010), its tape out had to happen around Feb/March too. It's possible that GK100 still exists and will be released soon after GK104, which is what many rumors say. Rumors from sources that turned out to be correct about GK104 specs several months ago, if we are to believe these specs.

That's double precision* only and a huge improvement over GF104, both are capped because they are the mainstream parts. GF104 was capped at 1/12 the SP amount. GK104 is 1/6, which is a nice improvement for the performance part (for example in previous generations AMD didn't even support DP on anything but high-end). The high-end chip will feature 1/2 ratio and if Tahiti's number is really true (I thought Tahiti could do 1/2 DP :confused:), it will most definitely decimate it at DP performance.

*On SP 2.9 is definitely not half of 3.79 and like Crap Daddy said the GTX 580 had around 1.5 TFLOPs. Claimed theoretical GFLOPs mean very little, except for comparing two chips using identical architecture. Obviously GK104 is not going to be 2x as fast as the GTX580 as the GFLOPs number suggests, or TMUs, but it will most definitely beat it by a good amount. How much? Look to previous gens and compare GTX560 Ti to GTX285. There's your most probable answer.

LOL

The fanboy is strong with this one. If GK104 cures cancer and is 20x faster than 7970 great! I'll buy one.

Unfortunately the reality is that the shader architecture of the GK104 is vastly different from that of Fermi. It takes 3 times the number of Kepler shader units to equal a Fermi shader unit, because shader clocks will be equal to raster clocks on the Kepler. Hotclocking is gone, that is the fallacy of your argument that you stupidly don't realize because you can't see past your fanboy eyeglasses. Also, just so you know, TFLOPs is not a meaningful measure of performance.

But hey whatever helps you sleep at night!! Hopefully, Jen-Hsun Huang will give you a hug before you go to bed at night.

#50

Benetanegia

creepingdeath: It takes 3 times the number of Kepler shader units to equal a Fermi shader unit, because shader clocks will be equal to raster clocks on the Kepler.

Whaaaaaaat?? Hot clocks are 2x the core clock, not 3x, so I can't even start thinking why you'd think you need 3 times as many shaders. I don't even know where you are pulling that claim from but it doesn't smell any good.

You can call me fanboy, because I'm stating the facts (as if I cared), but at least make up an argument that doesn't sound so stupid. At least I didn't make an account just to crap on a forum with my only 4 posts.

creepingdeath: GK110 is the one we want and since it has just taped out, it will not be released until Q3. Sorry green fans :)

Pff I don't know why I even cared to respond to you. I guess I didn't pay attention the first time. ^^ Freudian slip huh? :roll:

Ey you got me for 3 posts, is that considered a success in Trolland? Congrats anyway.
