Friday, February 10th 2012

NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery

NVIDIA is set to kick off its competitive response to AMD's Southern Islands Radeon HD 7000 series with GeForce Kepler 104 (GK104). We are learning through reliable sources that NVIDIA will implement a radically different design (by NVIDIA's standards, anyway) for its CUDA core machinery, while retaining a basic component hierarchy similar to Fermi's. The new design should ensure greater parallelism. The latest version of GK104's specifications looks like this:

SIMD Hierarchy
  • 4 Graphics Processing Clusters (GPC)
  • 4 Streaming Multiprocessors (SM) per GPC = 16 SM
  • 96 Stream Processors (SP) per SM = 1536 CUDA cores
TMU / Geometry Domain
  • 8 Texture Units (TMU) per SM = 128 TMUs
  • 32 Raster Operation Units (ROPs)
Memory
  • 256-bit wide GDDR5 memory interface
  • 2048 MB (2 GB) memory amount standard
Clocks/Other
  • 950 MHz core/CUDA core (no hot-clocks)
  • 1250 MHz actual (5.00 GHz effective) memory, 160 GB/s memory bandwidth
  • 2.9 TFLOP/s single-precision floating point compute power
  • 486 GFLOP/s double-precision floating point compute power
  • Estimated die area: 340 mm²
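For reference, the headline throughput figures follow directly from the specs above. A quick sketch (the implied 1/6-rate double-precision ratio is an inference from these numbers, not anything NVIDIA has confirmed):

```python
# Theoretical throughput implied by the rumored GK104 specs.
cuda_cores = 1536
core_clock_ghz = 0.950        # no hot-clocks: shaders run at core clock
bus_width_bits = 256
mem_clock_ghz = 1.25          # GDDR5 moves 4 data transfers per clock -> 5.0 GHz effective

# 2 FLOPs per core per clock (fused multiply-add)
sp_tflops = cuda_cores * 2 * core_clock_ghz / 1000
bandwidth_gbs = bus_width_bits / 8 * mem_clock_ghz * 4

print(round(sp_tflops, 1))            # 2.9 TFLOP/s single precision
print(bandwidth_gbs)                  # 160.0 GB/s
print(round(sp_tflops * 1000 / 6))    # 486 GFLOP/s if DP runs at 1/6 the SP rate
```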
Source: 3DCenter.org

139 Comments on NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery

#76
xenocide
I actually explicitly said I wasn't counting cards like the GTX285 and 8800 Ultra, because they technically came out after the initial lineup launched. They were usually just super high-end offerings made to address performance deficits, or simply because they could. In the case of the GTX580 3GB, it was because super high-end users needed more VRAM; this only really affected people using 3-display setups, so it was an incredibly niche product.

If we wanted to go crazy, there are all sorts of products released that were technically better; the HD5970 is to this day ridiculously powerful, and surprisingly cost-efficient. I also omitted the HD4890, because it launched months after the rest of the 4xxx series.

My listings are still accurate. There are outliers, but for the most part all of those cards were the original high-end GPU of their corresponding series.
#77
crazyeyesreaper
Not a Moderator
Doesn't change the fact the 680 will be priced at $600+, most likely in the $650-675 range, with aftermarket-cooled cards hitting $700,

but you're free to believe what you wish, :roll:
#78
Crap Daddy
crazyeyesreaper: Doesn't change the fact the 680 will be priced at $600+, most likely in the $650-675 range, with aftermarket-cooled cards hitting $700,

but you're free to believe what you wish, :roll:
What you call the "680" at $600+ will probably get another name. All we see now is the GK104, which will probably be faster by a hair than the 7970 (but enough to claim it's the fastest card), with some disadvantages (lower memory bandwidth, and probably already very highly clocked at stock to meet the target of being faster than the 7970), and some say this will be the 680. Now, this card will not cost $600, but neither $300 as was reported, so I would expect somewhere between $450-500. As reported, the same chip with some disabled stuff, and probably clocked lower, will make the 670 part: performance between the 580/7950 and the 7970 for $350-400. The big boy will be out later, and there we can expect $600 plus.
#79
radrok
crazyeyesreaper: aftermarket-cooled cards hitting $700,
The EVGA Hydro Copper comes to mind :laugh:
#80
m1dg3t
crazyeyesreaper: Doesn't change the fact the 680 will be priced at $600+, most likely in the $650-675 range, with aftermarket-cooled cards hitting $700,

but you're free to believe what you wish, :roll:
Hell, the 580 is still more expensive than the 7970! At most places near me, anyway :o Can't wait to see the new pricing.
#81
Benetanegia
jamsbong: @Benetanegia I could continue this pointless argument with an NV fanboy, such as by pointing out all the mistakes that you've made in the last post alone, but it is time to move on.

If NV has created something fantastic (i.e. a card 50% faster than the GTX580) and it is stable enough to work on non-TWIMTBP titles, I won't mind grabbing one for myself. If not, then Tahiti. A simple wait-and-see situation. Cheers.
Giving up in time is good practice when you are so wrong, so well played. lol
#82
user21
Time to kick back :p
#83
ViperXTR
I like Nvidia's FXAA too, but the biggest problem is it's NOT implemented in the drivers' control panel like ATI's. And there are so many games out there that don't have any AA support.

Is it that difficult, Nvidia, to implement FXAA into the drivers as well?
FXAA is already inside the recent nvidia drivers

1. download nvidia inspector
2. open the advanced driver settings
3. look at the advanced configs (scroll down)
4. set FXAA to 1 (default 0/off)

there are also some hidden settings there like framecap/framerate limit, SLI and/or AA flags etc.

also, some moar rumour tablez


forum.beyond3d.com/showthread.php?p=1619912
#84
xenocide
Interesting chart. I wonder why the AA never gets put above 4x...
#85
crazyeyesreaper
Not a Moderator
So, according to that chart... 3DMark 11 is a 7% difference :roll:

The total average is a 12% difference across all those tests.
#86
CrAsHnBuRnXp
I just want benchmarks already, so I know what to buy.
#87
Recus


Borderlands 2 or new Brothers in Arms running on Kepler? : D
#88
Crap Daddy
Aliens: Colonial Marines? PhysX? GTX680?

As for that suspicious table: based on the specs, which I think we can agree are more or less accurate, this table was done by somebody who has done his homework. 30% plus on average above the GTX580, which brings us to that 10% over the 7970. If you look carefully you'll see the clocks - 1050 and 1425 - very high for a stock card and above the reported 950 MHz for the GPU. It is also done at 1080p, where the memory bandwidth disadvantage is less pronounced.

So what I'm saying is that if this is close to real, then NV will launch the GK104 under the name GTX680: a slightly faster card than the 7970 with certain weak points, because the chip was initially designed for the performance segment, but after AMD's launch it can fulfill other expectations. Price? Neither $300 nor $550.
#89
sergionography
I doubt these rumors are true. I heard about Nvidia dropping their hot-clocks, but I don't think changing the structure of the GPU this much is possible in such a short amount of time; as far as I thought, Kepler is a Fermi die-shrink with some tweaks.
Another note: this article claims GK104 is a 340 mm² die, which is Nvidia's mid-range, while the HD 7970 has a die size of 375 mm². So much for the "we expected more from AMD" talk.
Not to mention Nvidia's high end is said to have a 550 mm² die size. Well, AMD could easily build a GPU that big and pack in more transistors, but that is usually a very bad business choice, and Nvidia suffers from it almost every time.
#90
Benetanegia
sergionography: I don't think it's possible in such a short amount of time; as far as I thought, Kepler is a Fermi die-shrink with some tweaks.
AMD/Nvidia do not start working on their chips only after releasing the previous one. They work for years on every chip, sometimes as much as 5 years depending on how different it is. Nvidia is already working on Maxwell and whatever comes next. AMD is already working on their next 2 architectures too, Sea Islands and Canary Islands. The work on Kepler started many years ago, maybe even before GTX200 was released, or shortly after.

As far as Kepler goes, yes, it's a tweaked Fermi in 99% of cases; you can see it in the specs and schematics. The only difference is that they dropped the hot-clocks, which makes the SPs substantially smaller, and doubled their number per SM to compensate.

No one knows exactly how much smaller the SPs are, but just as an example of how much clocks can affect the size of some units: AMD Barts' memory controller is half as big as Cypress/Cayman's because it's designed to work at ~1000 MHz instead of >1200 MHz. Those extra 200 MHz make the memory controller in Cypress/Cayman twice as big. So in the case of Kepler, looking at the specs and 340 mm², we can assume that non-hot-clocked SPs are around half the size.
#91
sergionography
Benetanegia: AMD/Nvidia do not start working on their chips only after releasing the previous one. They work for years on every chip, sometimes as much as 5 years depending on how different it is. Nvidia is already working on Maxwell and whatever comes next. AMD is already working on their next 2 architectures too, Sea Islands and Canary Islands. The work on Kepler started many years ago, maybe even before GTX200 was released, or shortly after.

As far as Kepler goes, yes, it's a tweaked Fermi in 99% of cases; you can see it in the specs and schematics. The only difference is that they dropped the hot-clocks, which makes the SPs substantially smaller, and doubled their number per SM to compensate.

No one knows exactly how much smaller the SPs are, but just as an example of how much clocks can affect the size of some units: AMD Barts' memory controller is half as big as Cypress/Cayman's because it's designed to work at ~1000 MHz instead of >1200 MHz. Those extra 200 MHz make the memory controller in Cypress/Cayman twice as big. So in the case of Kepler, looking at the specs and 340 mm², we can assume that non-hot-clocked SPs are around half the size.
Yes, but Fermi was supposed to be Nvidia's architecture for the years to come; Kepler is a descendant, kind of like Piledriver will be for Bulldozer.
But I guess that makes sense in order to scale at high clocks, kind of like CPUs having longer pipelines to scale at high frequency. There is no way it will make that much difference, though (especially since the whole point of an architecture that aims for high frequency is to make smaller chips with less hardware and lower IPC but more throughput - but that's in CPUs, I'm not sure about GPUs). Maybe the 1536 refers to the bigger GTX680/780, which would have a 550 mm² die size (read that in previous leaks/rumors).
Because even considering the die size, which is much smaller than the 580's, it triples the core count, and even with 28 nm that's only 40% smaller; it's near impossible to get perfect scaling.
#92
Benetanegia
sergionography: Yes, but Fermi was supposed to be Nvidia's architecture for the years to come; Kepler is a descendant, kind of like Piledriver will be for Bulldozer.
But I guess that makes sense in order to scale at high clocks, kind of like CPUs having longer pipelines to scale at high frequency. There is no way it will make that much difference, though (especially since the whole point of an architecture that aims for high frequency is to make smaller chips with less hardware and lower IPC but more throughput - but that's in CPUs, I'm not sure about GPUs). Maybe the 1536 refers to the bigger GTX680/780, which would have a 550 mm² die size (read that in previous leaks/rumors).
Because even considering the die size, which is much smaller than the 580's, it triples the core count, and even with 28 nm that's only 40% smaller; it's near impossible to get perfect scaling.
Don't let the number of SPs blind you; they didn't really triple the number of cores. Like I said, dropping the hot-clocks probably allows them to fit 2x as many as if they were Fermi cores in the same space, but they are only half as fast. They are trading 2x shader clock for 2x the number of SPs.

Based on die area, GK104 has to have around 3.6-4.0 billion transistors; that's twice as much as GF104/114, the chip it's based on. Would you have doubted so much if Nvidia had made a 768 SP Fermi(ish) part with a 256-bit memory interface? Twice the SPs at twice the number of transistors, while keeping the 256-bit MC. It's 100% expected, don't you think? Now take this 768 SP "GF124" and drop the hot-clocks, thus making the SPs much smaller and allowing them to fit 2x as many of them: GK104 is born.

Also remember that doubling SPs per SM is a lot more area/transistor efficient than doubling the number of SMs.

And to finish, never look at die size when comparing; look at transistor count. Scaling varies a lot from one node to another, and transistor density can change a lot as a node matures, e.g. look at Cypress vs Cayman. GK104 has twice as many transistors as GF104, and that's all you should look at. It's pointless to even compare to GF100/110, because GF100 is a compute-oriented chip with far more GPGPU features than GF104/114 and GK104. GF104 is only 60% as big as GF100, yet it has 75% of its gaming performance.
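That 3.6-4.0 billion figure can be sanity-checked from die area alone. A rough sketch, assuming GF104's published figures (~1.95 billion transistors on a ~332 mm² 40 nm die) and ideal full-node area scaling from 40 nm to 28 nm; real shrinks fall short of ideal, which is why the quoted range sits a little below this:

```python
# Rough transistor-count estimate for GK104 from die area alone.
gf104_transistors_b = 1.95    # billions, published for GF104 (40 nm)
gf104_area_mm2 = 332.0        # published GF104 die size
gk104_area_mm2 = 340.0        # rumored GK104 die size (28 nm)

# Ideal full-node shrink: area scales with the square of feature size.
density_gain = (40 / 28) ** 2                      # ~2.04x

gk104_estimate_b = gk104_area_mm2 * (gf104_transistors_b / gf104_area_mm2) * density_gain
print(round(gk104_estimate_b, 1))                  # ~4.1 billion at ideal scaling
```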
#93
sergionography
Benetanegia: Don't let the number of SPs blind you; they didn't really triple the number of cores. Like I said, dropping the hot-clocks probably allows them to fit 2x as many as if they were Fermi cores in the same space, but they are only half as fast. They are trading 2x shader clock for 2x the number of SPs.

Based on die area, GK104 has to have around 3.6-4.0 billion transistors; that's twice as much as GF104/114, the chip it's based on. Would you have doubted so much if Nvidia had made a 768 SP Fermi(ish) part with a 256-bit memory interface? Twice the SPs at twice the number of transistors, while keeping the 256-bit MC. It's 100% expected, don't you think? Now take this 768 SP "GF124" and drop the hot-clocks, thus making the SPs much smaller and allowing them to fit 2x as many of them: GK104 is born.

Also remember that doubling SPs per SM is a lot more area/transistor efficient than doubling the number of SMs.

And to finish, never look at die size when comparing; look at transistor count. Scaling varies a lot from one node to another, and transistor density can change a lot as a node matures, e.g. look at Cypress vs Cayman. GK104 has twice as many transistors as GF104, and that's all you should look at. It's pointless to even compare to GF100/110, because GF100 is a compute-oriented chip with far more GPGPU features than GF104/114 and GK104. GF104 is only 60% as big as GF100, yet it has 75% of its gaming performance.
Yes, I believe you, man; it was just pretty shocking, that's all. Now we might be able to compare AMD vs Nvidia a bit more closely based on shader count.
As for Cypress and Cayman, it seems like it happened from the other extreme, doesn't it? As far as I remember, it was pretty much getting rid of the SPs that weren't being utilized and changing VLIW5 to VLIW4, ending up with smaller SMs that performed the same as their predecessors but at a smaller size, allowing them to fit more SMs into the 6970. So even though the shader count was lower, it performed like 20% better.

Though I still think there is more behind this. Having hot-clocks has its benefits, but it has its limitations too; like I heard they don't scale well when frequency increases, while AMD could raise clocks while increasing performance at a constant rate (I could be wrong, though; I don't know much about the nitty-gritty details in GPUs).
#94
TheoneandonlyMrK
Crap Daddy: This is not going to be 50% faster than the 7970. Judging by the specs, it should fall between the 7950 and 7970 at a rumored $300.
GK110 will probably be the Tahiti killer. At a price...
Yeah, that was sarcasm from me, so I agree with you, dude :toast:

But in all honesty, I'm betting these will arrive cheap and be below a 7950 in performance.
#95
Benetanegia
sergionography: Though I still think there is more behind this. Having hot-clocks has its benefits, but it has its limitations too; like I heard they don't scale well when frequency increases, while AMD could raise clocks while increasing performance at a constant rate (I could be wrong, though; I don't know much about the nitty-gritty details in GPUs).
Yes, that's correct, and it's the reason Nvidia stopped using hot-clocks with Kepler.

The reason they used hot-clocks before was apparently to have lower latencies and better single-threaded/lightly-threaded performance, so that compute apps would benefit. Remember the first chips to have hot-clocked shaders were running at 600 MHz core clocks and below, so the shaders ran at <1200 MHz. Now even without hot-clocks they will be running at 1000 MHz, so that's probably enough. Latencies are further reduced with a shorter pipeline (due to lower clocks) and other means that are required for GPGPU anyway.

Fermi shaders running at 2000 MHz would have been overkill for what's really needed, and would consume more than two 1000 MHz shaders. A compute GPU needs first and foremost multi-threaded performance; as long as single-threaded is not crap, single-threaded is only required up to a certain level, so that some minor tasks don't become a bottleneck.
#96
jamsbong
Benetanegia: Giving up in time is good practice when you are so wrong, so well played. lol
I'm not aware that I'm in any way wrong, nor am I giving up on anything. All I did was be rational and put things on hold. You're mistaken again...
I guess it is always going to be difficult for me to have a logical debate with someone who is not.
#97
Benetanegia
jamsbong: I'm not aware that I'm in any way wrong, nor am I giving up on anything. All I did was be rational and put things on hold. You're mistaken again...
I guess it is always going to be difficult for me to have a logical debate with someone who is not.
Maybe you should start by explaining why, if it's only going to be almost as fast as the GTX580, they put 96 SPs per SM (double) instead of, say, 64 SPs - or more importantly, why they doubled the number of TMUs, when 64 TMUs were perfectly fine for the GTX580 and GK104 will have 25% higher clocks (thus 25% higher texture fillrate had it had 64 TMUs instead of 128). I'm sorry, but you just don't increase die size like that if it's not coming with a substantial (read: justified) performance increase.

You have produced ZERO proof (I didn't expect that, since nothing is fact), but you have also explained nothing (which I do expect) about why such a massive increase in computational power - which didn't come for free and supposed a 100% increase in transistor count - is not going to produce any performance gain.

You have not explained why a 2.9 TFlops card will not be able to beat the 1.5 TFlops card, and if that were the case, why they didn't just create a 1.5 TFlops (768 SP) card in the first place. That would have been easy: same architecture, half the SPs, 48 per SM. If going with 96 SPs is going to make the block 50% as (in)efficient as Fermi with 48 SPs, you just don't make it 96 SPs!!

So start by explaining something, anything, and stop calling me a fanboy as if that were any kind of argument in your favor, because it is not; it only makes you look like a 12-year-old kid and an idiot. "It's going to be so, because (you think) it's going to be so, and if you think differently you are a fanboy" is not an argument.
Logic: the study of the principles and criteria of valid inference and demonstration.
More logic:

GK104 is 340 mm², so close to 4 billion transistors, twice as much as GF104 and 33% more than GF110. Logic dictates that Nvidia did not suddenly create an architecture that is at least 33% less efficient than Fermi (70% compared to GF104), 25% higher clocks notwithstanding - especially when they have been claiming better efficiency for almost 2 years now.
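The back-of-the-envelope figures being argued here can be laid out explicitly. A sketch: the GTX 580's unit counts and clocks (512 SPs at a 1544 MHz hot-clock, 64 TMUs at 772 MHz) are published, while every GK104 number comes from the rumored specs:

```python
# Theoretical GTX 580 vs. rumored GK104, per the argument above.
def gflops(sps, shader_ghz):
    # 2 FLOPs per SP per clock (fused multiply-add)
    return sps * 2 * shader_ghz

def fillrate_gtexels(tmus, core_ghz):
    return tmus * core_ghz

gtx580_gflops = gflops(512, 1.544)     # hot-clocked shaders
gk104_gflops = gflops(1536, 0.950)     # no hot-clocks

gtx580_tex = fillrate_gtexels(64, 0.772)
gk104_tex = fillrate_gtexels(128, 0.950)

print(round(gtx580_gflops))                  # 1581 -> the "1.5 TFlops card"
print(round(gk104_gflops))                   # 2918 -> the "2.9 TFlops card"
print(round(gk104_tex / gtx580_tex, 2))      # ~2.46x the texture fillrate
```

On paper that is nearly 2x the shader throughput and almost 2.5x the texturing rate, which is the crux of the "why build it if it isn't faster" argument.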
#98
Xaser04
jamsbong: I'm not aware that I'm in any way wrong, nor am I giving up on anything. All I did was be rational and put things on hold. You're mistaken again...
I guess it is always going to be difficult for me to have a logical debate with someone who is not.
I am struggling to see the point of your argument here. You keep stating that Benetanegia is a fanboy and "wrong" all the time, yet so far I have seen nothing but rational, well-thought-out posts from him. I may not agree with everything in his posts (actually, I do agree with most of it), but I am struggling to see the "fanboy" stance you keep going on about.

No doubt I will get called an Nvidia fanboy now, despite running an HD7970 and Eyefinity... :wtf:

One thing that does interest me about Kepler being a die-shrunk and "tweaked" Fermi is how much of a performance increase we can expect from future driver improvements. Driver improvements are a given with GCN, as the architecture is relatively immature, but what about Kepler? Could we end up with a case where Kepler comes out of the gate faster than Tahiti but ends up slower in the long run due to a lack of driver improvements?

Obviously this is still conjecture, but it is an interesting avenue to investigate, as I have seen some pretty big boosts in BF3 (@3560*1920) with the latest HD79xx RC driver (25/01/2012).
#99
jamsbong
@Benetanegia: "but also explained nothing (which I do expect)". I've discussed this with you before: since there are no facts, whatever you build on them is built on nothing. There's no point getting into explanation mode on speculative information.

"GK104 is 340 mm², so close to 4 billion transistors" - I am not aware of this information; where did you get 4 billion transistors? Did you estimate it off the 340 mm²? In other words, building a case off speculative information?

@Xaser04: no need to struggle. Just read what I've posted thoroughly and comprehend it before venting off more steam.
#100
crazyeyesreaper
Not a Moderator
At this point, who gives a flying fuck? I couldn't care less if the Nvidia Kepler GPU is Oscar the Grouch doing calculations on a TI-82. Kepler is coming, but it's not here yet, so in retrospect it doesn't matter what its transistor count is, what its shader design is, etc., because looking at specs doesn't give us actual performance numbers in terms of what it's capable of.

Nothing matters till we see reviews. I don't care what Kepler has in the wings; it's still smoke and mirrors. Even then, it's hogwash if we go on specs and theoretical maximum calculations - AMD has won every time in terms of theoretical output, yet it doesn't actually win. So let's just save the arguments for when we see real performance numbers; then we can bitch, moan and complain about who's the greatest EVAR! and who's a loser.