Thursday, January 19th 2012

GeForce Kepler 104 (GK104) Packs 256-bit GDDR5 Memory Bus, 225W TDP

NVIDIA GeForce Kepler (GK104) will be the first high-performance GPU NVIDIA launches based on its Kepler architecture. New reports suggest that this GPU, which succeeds the GF114 (the chip behind the likes of the GeForce GTX 560 Ti), will retain a 256-bit wide GDDR5 memory interface. An equally recent report suggests that NVIDIA could give the front-line GK104-based product as much as 2 GB of memory. According to the INPAI report, that product will have a TDP of 225W. What's more, NVIDIA is gunning to take the performance crown from the AMD Radeon HD 7900 series with this chip, which suggests that the GK104 is being designed for a massive performance improvement over the GF114 it succeeds.

Source: Inpai.com.cn

105 Comments on GeForce Kepler 104 (GK104) Packs 256-bit GDDR5 Memory Bus, 225W TDP

#1
Ikaruga
I seriously wonder what that picture has to do with any of this. If that's from a Kepler tech demo, I'm disappoint :/
Posted on Reply
#2
Benetanegia
KooKKiK said:
Only a 256-bit bus and GDDR5 memory... couldn't that be a bandwidth limit???
Not necessarily, no. There's a lot of room in memory clocks. In the previous gen Nvidia used sub-1000 MHz GDDR5 clocks, while AMD is using 1375 MHz GDDR5. That's a potential ~40% improvement right there, and the relationship between performance and memory bandwidth is not linear. A 40% increase in BW could potentially suffice for up to an 80% performance increase before becoming too much of a bottleneck.
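That clock-headroom estimate can be checked with back-of-the-envelope arithmetic. This is just an illustrative sketch (not from the article), assuming GDDR5's quad data rate, i.e. 4 bits per pin per base-clock cycle:

```python
# Peak GDDR5 bandwidth in GB/s: (bus width / 8) bytes * base clock * 4
# (GDDR5 is quad-pumped: 4 bits per pin per base-clock cycle).

def gddr5_bandwidth_gbs(bus_bits: int, base_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s for a GDDR5 interface."""
    return bus_bits / 8 * base_clock_mhz * 4 / 1000

old = gddr5_bandwidth_gbs(256, 1000)  # 128.0 GB/s, GTX 560 Ti-class memory
new = gddr5_bandwidth_gbs(256, 1375)  # 176.0 GB/s, HD 7970-class memory
print((new - old) / old)              # 0.375, i.e. the "~40%" headroom cited
```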

Ikaruga said:
I seriously wonder what that picture has to do with any of this. If that's from a Kepler tech demo, I'm disappoint :/
It's the Stonegiant DX11 benchmark, released years (?) ago.
Posted on Reply
#3
KooKKiK
Benetanegia said:
Not necessarily, no. There's a lot of room in memory clocks. In the previous gen Nvidia used sub-1000 MHz GDDR5 clocks, while AMD is using 1375 MHz GDDR5. That's a potential ~40% improvement right there, and the relationship between performance and memory bandwidth is not linear. A 40% increase in BW could potentially suffice for up to an 80% performance increase before becoming too much of a bottleneck.



It's the Stonegiant DX11 benchmark, released years (?) ago.
I think there's not much headroom left in GDDR5 speed, since AMD's Tahiti uses the same memory clock as the previous gen but widens the bus from 256 to 384 bits.

And as for your point, NVIDIA's previous gen used 320- and 384-bit bus widths, not 256-bit like this. That means you'd need to raise the memory clock to somewhere around 1600 - 1800 MHz to compensate in BW.

1600 - 1800 MHz GDDR5, I mean... WooooooW, that must be some super special GDDR5 :eek:
Posted on Reply
#4
Benetanegia
KooKKiK said:
I think there's not much headroom left in GDDR5 speed, since AMD's Tahiti uses the same memory clock as the previous gen but widens the bus from 256 to 384 bits.

And as for your point, NVIDIA's previous gen used 320- and 384-bit bus widths, not 256-bit like this. That means you'd need to raise the memory clock to somewhere around 1600 - 1800 MHz to compensate in BW.

1600 - 1800 MHz GDDR5, I mean... WooooooW, that must be some super special GDDR5 :eek:
Yes, with the same GDDR5 AMD went from 256 bits to 384 bits to obtain a 50% increase in memory bandwidth. Nvidia can get almost the same increase just by using the same memory that AMD has been using for two generations now. Simple.

Nvidia used 384 bits on their high-end chip; GK104 is NOT high-end. High-end nowadays means GPGPU, and GPGPU requires more bandwidth; that's why GF100/110 had a 384-bit bus, and the same goes for Tahiti. High-end == GPGPU also means you need to leave headroom; it means you cannot make compromises, it means going overkill sometimes. Mid-range means you can make compromises, you can cut corners.

Besides, the GTX 560 Ti used a 256-bit bus and 1000 MHz memory, like I said. To match HD 7970 performance they need 50% more performance than the GTX 560. They don't need 1600-1800 MHz GDDR5; that's absurd. They don't even need the 40% that 1375 MHz GDDR5 would bring, because GPU performance is not linearly related to memory bandwidth.
Posted on Reply
#5
KooKKiK
Benetanegia said:
Yes, with the same GDDR5 AMD went from 256 bits to 384 bits to obtain a 50% increase in memory bandwidth. Nvidia can get almost the same increase just by using the same memory that AMD has been using for two generations now. Simple.

Nvidia used 384 bits on their high-end chip; GK104 is NOT high-end. High-end nowadays means GPGPU, and GPGPU requires more bandwidth; that's why GF100/110 had a 384-bit bus, and the same goes for Tahiti. High-end == GPGPU also means you need to leave headroom; it means you cannot make compromises, it means going overkill sometimes. Mid-range means you can make compromises, you can cut corners.

Besides, the GTX 560 Ti used a 256-bit bus and 1000 MHz memory, like I said. To match HD 7970 performance they need 50% more performance than the GTX 560. They don't need 1600-1800 MHz GDDR5; that's absurd. They don't even need the 40% that 1375 MHz GDDR5 would bring, because GPU performance is not linearly related to memory bandwidth.
I know that GPU performance is not linearly related to memory bandwidth.

But in many cases, insufficient bandwidth can cause a severe reduction in graphics performance (e.g. HD 5670 GDDR3 vs HD 5670 GDDR5).


So you're telling me that HD 6970-level bandwidth is enough for HD 7970-level performance.

Where's the proof???
Posted on Reply
#6
Jonap_1st
well then, only time will tell..
Posted on Reply
#7
Selene
reverze said:
good news for those few people who still buy Nvidia
lol yea a few~!
Posted on Reply
#8
Benetanegia
KooKKiK said:
I know that GPU performance is not linearly related to memory bandwidth.

But in many cases, insufficient bandwidth can cause a severe reduction in graphics performance (e.g. HD 5670 GDDR3 vs HD 5670 GDDR5).


So you're telling me that HD 6970-level bandwidth is enough for HD 7970-level performance.

Where's the proof???
There's no direct proof of that, obviously, but there's plenty of evidence from other cards demonstrating that memory bandwidth is not a heavy limiting factor.

First of all, you have to understand that the HD 7970 did NOT require all the bandwidth it has. It does need more than the HD 6970, especially for compute, but it does not strictly need as much as it got. AMD had no option other than going 384 bits, because GDDR5 speeds higher than 1400 MHz are not very doable and are very, very expensive anyway. So their only option was a wider bus.

Now:

Evidence #1:
The 192-bit GTX 460 has 86 GB/s of BW.
The 256-bit GTX 460 has 115 GB/s; that's 33% more BW, but the performance difference is not much bigger than 5%.

Another example, GTX 480 vs GTX 570, evidence #2:

The GTX 480 has 177 GB/s.
The GTX 570 has 152 GB/s, and it is slightly faster, despite the 480 having 16% more memory bandwidth.

So is HD 7970-level performance possible with HD 6970-level bandwidth? Absolutely.

PS: The HD 5670 example you posted, GDDR5 vs GDDR3, is about half the bandwidth, which is not going to be the case with GK104 at all (if it really is 256-bit anyway). We would be talking about a one-third reduction in bus width (384 to 256 bits) but a ~37% increase in clocks, for a net bandwidth loss of about 10% compared to the GTX 580, a card that itself is probably NOT limited by its memory bandwidth anyway.
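That net-loss figure can be sanity-checked with the same kind of arithmetic. This is a sketch under two assumptions: the GTX 580's stock 1002 MHz GDDR5 base clock, and the rumored 256-bit GK104 bus paired with HD 7970-class 1375 MHz memory:

```python
# GDDR5 peak bandwidth: (bus width / 8) bytes * base clock * 4 (quad-pumped)
def bw_gbs(bus_bits: int, base_mhz: float) -> float:
    return bus_bits / 8 * base_mhz * 4 / 1000

gtx580 = bw_gbs(384, 1002)  # ~192 GB/s (GTX 580 at stock)
gk104 = bw_gbs(256, 1375)   # ~176 GB/s (rumored 256-bit bus, HD 7970-class clocks)
print(1 - gk104 / gtx580)   # ~0.085, close to the ~10% net loss mentioned
```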
Posted on Reply
#9
Ikaruga
Benetanegia said:
It's the Stonegiant DX11 benchmark, released years (?) ago.
Yes, I know; that's why I was wondering why they would demo their new tech with that.
Posted on Reply
#10
Benetanegia
Ikaruga said:
Yes, I know; that's why I was wondering why they would demo their new tech with that.
I think it's just Bta posting a random image because there's no picture of Kepler yet.
Posted on Reply
#11
overclocking101
so many nvidia haters! if ATI haters went on and on about how something can't be true, we would get infractions for "flamebaiting" etc. (i know, I've had it happen). it just makes little sense to me. if you don't believe it, oh well, so what, who cares??? it's a damn graphics card, not a political debate, for christ's sake
Posted on Reply
#12
Red_Machine
It's like Microsoft haters, nobody cares about them.
Posted on Reply
#13
KooKKiK
Benetanegia said:
There's no direct proof of that, obviously, but there's plenty of evidence from other cards demonstrating that memory bandwidth is not a heavy limiting factor.

First of all, you have to understand that the HD 7970 did NOT require all the bandwidth it has. It does need more than the HD 6970, especially for compute, but it does not strictly need as much as it got. AMD had no option other than going 384 bits, because GDDR5 speeds higher than 1400 MHz are not very doable and are very, very expensive anyway. So their only option was a wider bus.

Now:

Evidence #1:
The 192-bit GTX 460 has 86 GB/s of BW.
The 256-bit GTX 460 has 115 GB/s; that's 33% more BW, but the performance difference is not much bigger than 5%.

Another example, GTX 480 vs GTX 570, evidence #2:

The GTX 480 has 177 GB/s.
The GTX 570 has 152 GB/s, and it is slightly faster, despite the 480 having 16% more memory bandwidth.

So is HD 7970-level performance possible with HD 6970-level bandwidth? Absolutely.

PS: The HD 5670 example you posted, GDDR5 vs GDDR3, is about half the bandwidth, which is not going to be the case with GK104 at all (if it really is 256-bit anyway). We would be talking about a one-third reduction in bus width (384 to 256 bits) but a ~37% increase in clocks, for a net bandwidth loss of about 10% compared to the GTX 580, a card that itself is probably NOT limited by its memory bandwidth anyway.
You have NO proof, but I have mine.

3DMark 11 score of my GTX 580 @ 850 MHz with stock BW:

http://3dmark.com/3dm11/2588707

GTX 580 @ 850 MHz with HD 6970 BW (1835 MHz memory clock):

http://3dmark.com/3dm11/2588751

'nuff said??? ;)


ps. I know that in order to bring a GTX 580 to HD 7970 level in 3DMark 11, I'd have to push my 580 to almost a 1000 MHz core clock, but 850 on the core is enough to prove the point. :)
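For reference, the size of the bandwidth cut in that second run can be estimated with simple arithmetic. This sketch assumes the GTX 580's stock memory is 2004 MHz in the same DDR-rate convention as the 1835 MHz figure above (2 bits per pin per DDR-rate cycle):

```python
# GDDR5 bandwidth from the DDR-rate clock: (bus width / 8) bytes * DDR clock * 2
def bw_gbs_ddr(bus_bits: int, ddr_mhz: float) -> float:
    return bus_bits / 8 * ddr_mhz * 2 / 1000

stock = bw_gbs_ddr(384, 2004)  # ~192 GB/s, GTX 580 at stock
under = bw_gbs_ddr(384, 1835)  # ~176 GB/s, about the HD 6970's total bandwidth
print(1 - under / stock)       # ~0.084, so roughly an 8-9% bandwidth deficit
```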
Posted on Reply
#14
gorg_graggel
overclocking101 said:
so many nvidia haters! if ATI haters went on and on about how something can't be true, we would get infractions for "flamebaiting" etc. (i know, I've had it happen). it just makes little sense to me. if you don't believe it, oh well, so what, who cares??? it's a damn graphics card, not a political debate, for christ's sake
lol, are you serious?

this is one of the most civil discussions on this topic i've seen in a long time...

people are actually discussing and speculating without any name-calling or anything...

and yes, it's a damn graphics card, which is being discussed on a tech enthusiast website... what are we supposed to do? talk about donuts?

you, sir, are the one trying to cause a stir... so either contribute, or get lost...
Posted on Reply
#15
Benetanegia
KooKKiK said:
You have NO proof, but I have mine.

3DMark 11 score of my GTX 580 @ 850 MHz with stock BW:

http://3dmark.com/3dm11/2588707

GTX 580 @ 850 MHz with HD 6970 BW (1835 MHz memory clock):

http://3dmark.com/3dm11/2588751

'nuff said??? ;)


ps. I know that in order to bring a GTX 580 to HD 7970 level in 3DMark 11, I'd have to push my 580 to almost a 1000 MHz core clock, but 850 on the core is enough to prove the point. :)
lol. That's not proof of anything, because you don't have Kepler. So an overclocked GTX 580 (10% OC) with a ~10% underclock on the memory is 3% slower in 3DMark 11 than without the underclock. Wow!! That so totally proves your point, man... No.

Besides the fact that 3% is thin air, we are not talking about making a card like yours as fast as an HD 7970 and what memory bandwidth it would need for that. Things don't work like that. AMD/Nvidia spend months designing and balancing their architectures and chips to get the most out of them, tweaking internal latencies and such. You taking your card and destroying that balance with a 10% core overclock and a ~10% memory underclock means nothing. But please, by all means, try again.

EDIT: At least you proved that AMD and Nvidia do their jobs and don't just randomly choose the specs of cards, but then again, looking at how the only difference is 3%, maybe you proved the opposite. I can't decide what you proved yet. In general, nothing, other than what a GTX 580 at 850 MHz does...

And to finish: you artificially created a ~10% deficit in memory bandwidth, and the most you got was 3% less performance. Bravo, because like I said earlier, Nvidia could create a card with a similar deficit, so ~3% slower? Aww man, horrible bottleneck. AWWWWW!

/sarcasm
Posted on Reply
#16
KooKKiK
Benetanegia said:
lol. That's not proof of anything, because you don't have Kepler. So an overclocked GTX 580 (10% OC) with a ~10% underclock on the memory is 3% slower in 3DMark 11 than without the underclock. Wow!! That so totally proves your point, man... No.

Besides the fact that 3% is thin air, we are not talking about making a card like yours as fast as an HD 7970 and what memory bandwidth it would need for that. Things don't work like that. AMD/Nvidia spend months designing and balancing their architectures and chips to get the most out of them, tweaking internal latencies and such. You taking your card and destroying that balance with a 10% core overclock and a ~10% memory underclock means nothing. But please, by all means, try again.

EDIT: At least you proved that AMD and Nvidia do their jobs and don't just randomly choose the specs of cards, but then again, looking at how the only difference is 3%, maybe you proved the opposite. I can't decide what you proved yet. In general, nothing, other than what a GTX 580 at 850 MHz does...

And to finish: you artificially created a ~10% deficit in memory bandwidth, and the most you got was 3% less performance. Bravo, because like I said earlier, Nvidia could create a card with a similar deficit, so ~3% slower? Aww man, horrible bottleneck. AWWWWW!

/sarcasm
oh... c'mon, stop all this BS.


my GTX 580 is not even close to an HD 7970, but it still has a bottleneck.

imagine Kepler or an HD 7970 at HD 6970 bandwidth; it couldn't be any faster than mine, and that's not only 3%, for sure.


at first, you told me that high-end GPUs have excess BW, and that it's there for GPU-computing purposes:
First of all, you have to understand that the HD 7970 did NOT require all the bandwidth it has. It does need more than the HD 6970, especially for compute, but it does not strictly need as much as it got. AMD had no option other than going 384 bits, because GDDR5 speeds higher than 1400 MHz are not very doable and are very, very expensive anyway. So their only option was a wider bus.
then you changed your argument and told me Kepler doesn't manage memory bandwidth the same way as Fermi and SI:
Besides the fact that 3% is thin air, we are not talking about making a card like yours as fast as an HD 7970 and what memory bandwidth it would need for that. Things don't work like that. AMD/Nvidia spend months designing and balancing their architectures and chips to get the most out of them, tweaking internal latencies and such. You taking your card and destroying that balance with a 10% core overclock and a ~10% memory underclock means nothing. But please, by all means, try again.
what kind of unreliable person are you??? :confused:


Try proving something (at least find me some reference that doesn't come from your own mouth)

OR stop BSing around here!!!
Posted on Reply
#17
Benetanegia
KooKKiK said:
bla bla bla
Bla, bla, bla. There's a 3% difference between your two scores, and I'm sure you even ran many and chose the ones that showed the biggest difference. Don't worry, everyone does that when desperately trying to prove something. Too bad you didn't check what the real difference was. Lame.

And I don't have to prove anything, since I never actually claimed anything. I said that a bottleneck is not a given, that there's a high chance a bottleneck won't occur, and I provided REAL evidence of previous cards NOT being bottlenecked. The one who says there's going to be a bottleneck is you, and the only proof you could provide is a lame comparison with a 3% difference that could come from the margin of error in 3DMark's scoring system or a cat farting down the street. You are not right. Get over it.

EDIT: bah, I decided to be nice and teach you a thing or two. Here: http://realworldtech.com/page.cfm?ArticleID=RWT042611035931&p=2
In most of the cases we analyzed, 2X higher memory bandwidth yielded ~30% better 3DMark Vantage GPU performance. A good estimate is that performance scales with the cube root of memory bandwidth, as long the memory/computation balance is roughly intact.
The Radeon HD 3870 and 4670 were the pair we mentioned on the earlier page. The 3870 has 2.13X the memory bandwidth of the latter, which translates into the 36% better performance
In a similar vein, the Radeon 4870 and 4850 achieve 14% and 27% higher 3DMark scores over their bandwidth starved cousins
Note: both have 2x, or 100%, more bandwidth than their "starved cousins".
The last example pair is the 335M and 4200M, which show somewhat less benefit from bandwidth. The 335M has nearly triple the bandwidth of the 4200M, identical shader throughput, and about 40% higher performance.
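The cube-root rule of thumb quoted from that article can be sketched in a couple of lines; the inputs below are just the article's own example ratios plugged into the estimate:

```python
# Rule of thumb from the RealWorldTech excerpt: performance scales roughly
# with the cube root of memory bandwidth, other things being roughly equal.
def perf_gain_from_bw(bw_ratio: float) -> float:
    """Estimated fractional performance gain for a given bandwidth ratio."""
    return bw_ratio ** (1 / 3) - 1

print(perf_gain_from_bw(2.0))   # ~0.26, near the ~30% the article reports for 2x BW
print(perf_gain_from_bw(2.13))  # ~0.29, vs the 36% measured for HD 3870 over HD 4670
```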
Posted on Reply
#18
phanbuey
KooKKiK said:
You have NO proof but i have my proof.

3dm11 score of my GTX580@850 and stock BW

http://3dmark.com/3dm11/2588707

GTX580@850 and HD6970 BW ( 1835 mem clocks )

http://3dmark.com/3dm11/2588751

nuff said ??? ;)


ps. i know that in order to bring GTX580 to HD7970 level in 3dm11, i have to push my 580 almost 1000 core clock but 850 core is enough for proving. :)
Off topic:
Looks like your proc is choking your 580 like crazy; my 570 at 800 MHz gets a higher P score and a graphics score within 2% of yours. o.O
Posted on Reply
#20
OOZMAN
Just some silly rumours with no evidence man.
Posted on Reply
#23
Damn_Smooth
I hope that they're right. Bring on a price war.
Posted on Reply
#24
KooKKiK
Benetanegia said:
Bla, bla, bla. There's a 3% difference between your two scores, and I'm sure you even ran many and chose the ones that showed the biggest difference. Don't worry, everyone does that when desperately trying to prove something. Too bad you didn't check what the real difference was. Lame.

And I don't have to prove anything, since I never actually claimed anything. I said that a bottleneck is not a given, that there's a high chance a bottleneck won't occur, and I provided REAL evidence of previous cards NOT being bottlenecked. The one who says there's going to be a bottleneck is you, and the only proof you could provide is a lame comparison with a 3% difference that could come from the margin of error in 3DMark's scoring system or a cat farting down the street. You are not right. Get over it.

EDIT: bah, I decided to be nice and teach you a thing or two. Here: http://realworldtech.com/page.cfm?ArticleID=RWT042611035931&p=2


Note: both have 2x, or 100%, more bandwidth than their "starved cousins".
I didn't see anything in the article that proves your argument.

maybe you should "try again" :laugh:


oh, and you said you didn't claim anything???

then what is this??? :laugh:
So is HD 7970-level performance possible with HD 6970-level bandwidth? Absolutely.
If I had a Kepler IN HAND and benched it right now, I'm sure you'd just make an excuse like "it's only an engineering sample" anyway. :laugh:
Posted on Reply
#25
Benetanegia
KooKKiK said:
I didn't see anything in the article that proves your argument.

maybe you should "try again" :laugh:


oh, and you said you didn't claim anything???

then what is this??? :laugh:




If I had a Kepler IN HAND and benched it right now, I'm sure you'd just make an excuse like "it's only an engineering sample" anyway. :laugh:
You don't see anything in that article that proves my point? Hahahaha. Nice try, but stop trolling.

My point: Kepler might not be memory-bandwidth limited, just like countless previous cards that AMD and Nvidia surprised us with, which had much less bandwidth than their predecessors. <-- (stating possibilities/probabilities, without asserting how things are going to be, only how they may be == no claim)
Proof: the article, 8800 GTX vs 9800 GTX, GTX 480 vs GTX 570, several cards in the article, and many, many other cards before and after.

Your claim: Kepler will be memory-bandwidth limited. (stating what will be == claim)
Proof: NONE.
What you think is "proof": your GTX 580, which is NOT Kepler by any means or stretch of the imagination, suffers a 3% penalty when you create an artificial ~10% gap between the stock, balanced GPU clocks and memory clocks. That's it: a ~10% cut in memory BW degrades performance by 3% on the GTX 580, which is not GK104.

I'm still awaiting your proof. The burden of proof is on your side, as it always has been, and you have ZERO proof so far. Of course you won't have any proof until Kepler is released, but you'll figure it out. ;)

On the positive side, you are a good troll. Mamma troll is probably proud of you.
Posted on Reply