Friday, February 10th 2012

NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery

NVIDIA is set to kick off its answer to AMD's Southern Islands Radeon HD 7000 series with GeForce Kepler 104 (GK104). We are learning through reliable sources that NVIDIA will implement a radically different design (by NVIDIA's standards anyway) for its CUDA core machinery, while retaining a basic component hierarchy similar to Fermi's. The new design would ensure greater parallelism. The latest version of GK104's specifications looks like this:

SIMD Hierarchy
  • 4 Graphics Processing Clusters (GPC)
  • 4 Streaming Multiprocessors (SM) per GPC = 16 SM
  • 96 Stream Processors (SP) per SM = 1536 CUDA cores


TMU / Geometry Domain
  • 8 Texture Units (TMU) per SM = 128 TMUs
  • 32 Raster Operations Units (ROPs)

Memory
  • 256-bit wide GDDR5 memory interface
  • 2048 MB (2 GB) standard memory amount

Clocks/Other
  • 950 MHz core/CUDA core (no hot-clocks)
  • 1250 MHz actual (5.00 GHz effective) memory, 160 GB/s memory bandwidth
  • 2.9 TFLOP/s single-precision floating point compute power
  • 486 GFLOP/s double-precision floating point compute power
  • Estimated die area: 340 mm²
Source: 3DCenter.org
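
The derived figures in the list above (total SMs, CUDA cores, TMUs, compute throughput, memory bandwidth) follow from the per-unit numbers by simple arithmetic. The sketch below reproduces them. It rests on three assumptions that the article does not state: each stream processor retires one fused multiply-add per clock (counted as 2 FLOPs), GDDR5 transfers data four times per memory clock, and the double-precision rate is 1/6 of single precision, which is the ratio the 486 and 2,900 GFLOP/s figures imply. The function and key names are illustrative only.

```python
# Rumored GK104 figures, taken from the specification list.
GPCS = 4
SMS_PER_GPC = 4
SPS_PER_SM = 96
TMUS_PER_SM = 8
CORE_CLOCK_MHZ = 950
MEM_CLOCK_MHZ = 1250
BUS_WIDTH_BITS = 256

# Assumptions, not stated in the article.
FLOPS_PER_SP_PER_CLOCK = 2      # one FMA counted as two FLOPs
GDDR5_TRANSFERS_PER_CLOCK = 4   # 1250 MHz actual -> 5000 MHz effective
DP_TO_SP_RATIO = 1 / 6          # implied by 486 vs. ~2900 GFLOP/s


def derive_specs():
    """Recompute the headline totals from the per-unit numbers."""
    sms = GPCS * SMS_PER_GPC
    cores = sms * SPS_PER_SM
    tmus = sms * TMUS_PER_SM
    sp_gflops = cores * FLOPS_PER_SP_PER_CLOCK * CORE_CLOCK_MHZ / 1000
    dp_gflops = sp_gflops * DP_TO_SP_RATIO
    effective_mem_mhz = MEM_CLOCK_MHZ * GDDR5_TRANSFERS_PER_CLOCK
    # Bits per transfer -> bytes, MHz -> GB/s.
    bandwidth_gbs = effective_mem_mhz * BUS_WIDTH_BITS / 8 / 1000
    return {
        "sms": sms,
        "cuda_cores": cores,
        "tmus": tmus,
        "sp_gflops": sp_gflops,
        "dp_gflops": dp_gflops,
        "effective_mem_mhz": effective_mem_mhz,
        "bandwidth_gbs": bandwidth_gbs,
    }


if __name__ == "__main__":
    for name, value in derive_specs().items():
        print(f"{name}: {value:.1f}")
```

Under those assumptions the results line up with the list: 16 SMs, 1536 CUDA cores, 128 TMUs, roughly 2.9 TFLOP/s single precision, roughly 486 GFLOP/s double precision, and 160 GB/s of memory bandwidth.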

139 Comments on NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery

#1
phanbuey
wow... that is definitely different...
Posted on Reply
#2
Live OR Die
I bet your mommy always told you to eat your greens ;)
Posted on Reply
#3
ViperXTR
It's looking like an AMD specification now hehe (wait, 32 ROPs? D: )
Posted on Reply
#4
puma99dk|
I just hope they are serious about that 2048 MB of memory; if not, it will be a shame.
Posted on Reply
#5
EpicShweetness
These specs are definitely strange for an Nvidia chip. 1536 CUDA cores is triple that of the GTX 580, yet the fabrication process is only 30% smaller and GK104 is a smaller die than GF110. This can only indicate a few things: a "nerf" on the CUDA core itself, or an architecture that is much more "cluster based". Very interesting, I'll be following this closely.
Posted on Reply
#6
LAN_deRf_HA
It's a lot more shaders but they're running much slower too. Seems it'd even out on the heat front.
Posted on Reply
#7
ViperXTR
just like what the HD 2000 and the present 7000 cards are doing, moar shaders but lower clocks (or rather clocks are tied with the TMU/ROP clocks)
Posted on Reply
#8
radrok
My massive loop is waiting for the heat :rockout:
Posted on Reply
#9
hardcore_gamer
Die size is very close to that of the 7970 (365 mm²). Interesting :cool:
Posted on Reply
#10
radarblade
Seems like Nvidia's pretty prepped up to wipe AMD off the slate! But what would be the TDP on these things? Preferably lower than the earlier 480 and 580 heaters. :)
Posted on Reply
#11
theoneandonlymrk
Interested in how this is going to be 50% faster than a 7970; they seem similar in shader layout.
Posted on Reply
#12
NC37
The end of NV's monolithic GPU era is at hand...was about to say...Bout freaken time! ATI was slower at first when they switched but I knew eventually NV would have to change too.

Very interested to see how well NV does at ATI's own game.
Posted on Reply
#13
gaximodo
this isn't supposed to be NV's flagship anywayz.
Posted on Reply
#14
Xaser04
by: gaximodo
this isn't supposed to be NV's flagship anywayz.
GK104, so a GTX 560 Ti replacement (ish).

Considering this is 1536 shaders, it would be logical to assume that the full-fat model would have 2048 shaders; after all, the GTX 560 Ti was, in simplistic terms, roughly 75% of a GTX 580.

The shader count itself is very interesting.

The increase in shaders (384 to 1536 if we assume a GTX 560 Ti replacement) would suggest that each Kepler shader is less complex than its Fermi counterpart.

If we also assume similar performance to the HD 7950 (doesn't seem too unrealistic), then clock for clock GCN and Kepler could be quite evenly matched (the HD 7950 has more shaders but a lower core clock).

Should be very interesting.
Posted on Reply
#15
Crap Daddy
by: theoneandonlymrk
Interested in how this is going to be 50% faster than a 7970; they seem similar in shader layout.
This is not going to be 50% faster than the 7970. Judging by the specs, it should fall between the 7950 and 7970 at a rumored $300.
GK110 will probably be the Tahiti killer. At a price...
Posted on Reply
#16
Red_Machine
At this rate, I will feel compelled to replace my 580. GK110 will likely be 70-80% faster...
Posted on Reply
#17
pantherx12
by: Red_Machine
At this rate, I will feel compelled to replace my 580. GK110 will likely be 70-80% faster...
I reckon it will be half that, at best. :p
Posted on Reply
#18
Benetanegia
I assume these specs have been judged legit, since Btarunr did post them, unlike most others.

Ah crap, they are too different; it's impossible to guesstimate the performance based on them (don't know how other people are so sure). I'll try to make my analysis anyway.

At a first glance it looks like they doubled GF104's shader domain (128 TMU, 4 GPCs, etc.) and then doubled the shader amount per SM because abandoning hot clocks allows for that. Performance wise the end result should be similar.

Based on die size this chip must contain twice the number of transistors of GF104, while retaining the 256-bit bus, so there's no compelling reason to assume the shaders are any less capable than they were in Fermi. They could have just as easily gone with 768 SPs and hot-clocks within the same die size.

And finally efficiency. That's the key to knowing the performance. We don't know how well they will be able to use all those SPs. I'd assume they are using 6x16 SP wide superscalar shader multiprocessors, but with how many schedulers? GF104 had 2. So now they have 4? Or since shaders run at half the speed, are the schedulers just issuing the same amount of ops-per-cycle? (in reality cycles-per-op)

So many questions, but I had fun. Based on raw specs this chip has the potential to crush any other card on the market, think 2x GTX 560 Ti, at least at 1080/1200p. But efficiency/scaling is the key factor and that's completely unknown to us.

EDIT: As you can see, I changed my mind completely as I was writing this post. I first thought they were very different and came to realize that they are pretty much the same. If you think about Fermi-based GF104/114 as a 768 SP chip with no hot-clocks, they just doubled the amount of GPCs.
Posted on Reply
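
The hot-clock argument in comment #18 can be checked with a few lines of arithmetic. This is only a sketch of that reasoning, not a performance model. The GF104/GF114 numbers other than the 384 SP total (8 SMs, shaders running at twice the core clock) are assumptions added for illustration and do not appear in the thread, and the function name is hypothetical.

```python
# Only the 384 SP total for the GTX 560 Ti class chip appears in the thread.
GF104_SPS = 384
# Assumptions added for illustration.
GF104_SMS = 8             # assumed: 8 SMs of 48 SPs each
HOT_CLOCK_MULTIPLIER = 2  # assumed: Fermi shaders run at 2x the core clock


def core_clock_equivalent(sps, clock_multiplier):
    """SP count needed at core clock to match hot-clocked throughput."""
    return sps * clock_multiplier


# Treat GF104 as a chip without hot clocks.
equivalent_sps = core_clock_equivalent(GF104_SPS, HOT_CLOCK_MULTIPLIER)

# "They just doubled the amount of GPCs": double everything once more.
doubled_sps = equivalent_sps * 2
doubled_sms = GF104_SMS * 2
sps_per_sm = doubled_sps // doubled_sms

print(equivalent_sps, doubled_sps, doubled_sms, sps_per_sm)
```

With these assumptions the result is 768 core-clock-equivalent SPs for GF104, and 1536 SPs across 16 SMs (96 per SM) after doubling, which matches the rumored GK104 layout. Whether real performance scales the same way depends on scheduling efficiency, which the comment itself flags as unknown.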
#19
Filiprino
NVIDIA seems to have come up with something very similar to AMD's GCN. But after all it's NVIDIA and the successor to Fermi, so we'll have to wait and see performance numbers.
Posted on Reply
#20
General Lee
I wouldn't take them without a big grain of salt, but it's always fun to do some what iffing.

The specs look similar to what AMD has now, so given the estimated die size and unit counts, I'd say it would reach 580/7950 level performance. I doubt they'll price it at $300 if the 7950 is at $470. More likely it's at best $50 cheaper; that's enough to get the ball rolling. It's not really difficult to undercut the 7900 series in price, so regardless of performance it shouldn't be hard for Nvidia to claim a perf/$ crown simply because the 7900 is sold at a premium currently. Of course AMD should respond to that, and I think this is the scenario we all hope for.
Posted on Reply
#21
xenocide
by: General Lee
I wouldn't take them without a big grain of salt, but it's always fun to do some what iffing.

The specs look similar to what AMD has now, so given the estimated die size and unit counts, I'd say it would reach 580/7950 level performance. I doubt they'll price it at $300 if the 7950 is at $470. More likely it's at best $50 cheaper; that's enough to get the ball rolling. It's not really difficult to undercut the 7900 series in price, so regardless of performance it shouldn't be hard for Nvidia to claim a perf/$ crown simply because the 7900 is sold at a premium currently. Of course AMD should respond to that, and I think this is the scenario we all hope for.
A lot of people are holding out for Nvidia just to see prices level out. If they sell a card on par with the 7950 for $100 less, they'll make up the difference in volume. I guarantee they would sell twice as many cards as if they priced it around $450.
Posted on Reply
#22
jamsbong
Confirmed Nvidia is doing an ATI!
The specs look so similar that if I relabel them as, say...

HD7870:
256bit GDDR5 2GB memory
1536 CU, 128TMU, 32ROP, small 340mm^2 die size, no hot clocks.

It looks totally believable! Has Nvidia been hiring lots of ATI engineers? Or have they reverse-engineered ATI's Cayman?

Jokes aside, some rational observations:
The specs themselves look like a mid-to-high-end card, which should be very competitive price-wise as it uses a 256-bit memory bus and a small die. I won't be surprised if it is only 10-20% faster than Cayman. It will be on par with the GTX 580 at best.
I believe Nvidia is working on a high end card which has yet to show itself.
Posted on Reply
#23
Crap Daddy
Charlie seems to be very into Kepler these days. He says the ball is rolling:

"Reports coming in from the far east say that those high up in the priority list started getting Kepler cards in various guises early this week, possibly late last. The number of sightings from sources that SemiAccurate trusts has been going up almost exponentially over the past few days, and will probably keep doing so for a bit."

He concludes:

"If things go as normal, it takes 4-6 weeks from AIB sampling to cards on the shelves. This would mean late March or early April, just like we have been saying for weeks."
Posted on Reply
#24
arnoo1
Seriously, 1536 shaders? That's 3 times more than Fermi.
Posted on Reply
#25
1c3d0g
I have a feeling that NVIDIA will kill the competition this time around...Kepler sounds like a new Voodoo2, if y'all still remember that...
Posted on Reply