Monday, November 22nd 2010

AMD Cayman, Antilles Specifications Surface

At last, specifications of AMD's elusive Radeon HD 6970 and Radeon HD 6990 graphics accelerators have made it to the internet, with slides revealing details such as the stream processor count. The Radeon HD 6970 is based on a new 40 nm AMD GPU codenamed "Cayman". The dual-GPU accelerator built from two Cayman GPUs is codenamed "Antilles" and carries the product name Radeon HD 6990.

Cayman packs 1920 stream processors spread across 30 SIMD engines (64 per engine), which points to a 4D stream processor architecture, and delivers 3 TFLOPs of single-precision compute power. It has 96 TMUs, 128 Z/Stencil ROPs, and 32 color ROPs. Its memory bandwidth of 160 GB/s indicates a 256-bit wide GDDR5 memory interface. The memory amount, however, appears to have been doubled to 2 GB on the Radeon HD 6970. Antilles uses two of these Cayman GPUs, for a combined computational power of 6 TFLOPs, a total of 3840 stream processors, a total memory bandwidth of 307.2 GB/s, and a total of 4 GB of memory. Its load and idle board power ratings are 300W and 30W, respectively.
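The headline numbers can be cross-checked with simple arithmetic. This is a minimal sketch, assuming peak FLOPs = stream processors x clock x 2 (one multiply-add per SP per clock); the core clock it produces is only implied by the 3 TFLOPs figure, not stated on the slides, and the helper names are hypothetical.

```python
# Sanity checks on the rumored Cayman / Antilles figures.
# Assumption: peak FLOPs = stream processors x clock x 2 (one multiply-add
# per SP per clock). The resulting clock is implied, not stated on the slides.

def sp_per_simd(stream_processors, simd_engines):
    """Stream processors in each SIMD engine."""
    return stream_processors // simd_engines

def implied_data_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Effective per-pin GDDR5 data rate (Gbit/s) needed for a bandwidth."""
    return bandwidth_gb_s * 8 / bus_width_bits

def implied_clock_mhz(tflops, stream_processors, flops_per_clock=2):
    """Core clock (MHz) implied by a peak single-precision TFLOPs figure."""
    return tflops * 1e12 / (stream_processors * flops_per_clock) / 1e6

per_simd = sp_per_simd(1920, 30)
print(per_simd, per_simd // 4)                  # 64 16 -> 16 four-wide units
print(implied_data_rate_gbps(160.0, 256))       # 5.0 Gbit/s per pin (HD 6970)
print(implied_data_rate_gbps(307.2 / 2, 256))   # 4.8 Gbit/s per pin (Antilles, per GPU)
print(round(implied_clock_mhz(3.0, 1920)))      # 781 MHz
```

The 5.0 Gbit/s result corresponds to the "5000 effective" memory speed, and the Antilles total works out to slightly slower memory on each GPU.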

Source: 3DCenter Forum

134 Comments on AMD Cayman, Antilles Specifications Surface

#2
KainXS
I don't know what's gonna happen. You can't gauge it in the normal ways, because the 6870 is still using 5-way shaders and this is going to be the new 4-way (from the looks of it). If the shaders perform worse than the current ones it could be only barely faster; if they perform a good bit better it will be a monster, and that's if they can get core clocks similar to the current 5870's, and seeing as that slide s

Seems power limiting is the future of this process right now.

Is the first slide really fake, or is one just October and the other is November? Oh, delayed, I see.

@bravesoul
The HD5870 was kinda like a dual-core GPU.
#3
Benetanegia
So confirmed no 96 TMU. At least no 1920 SP + 96 TMU.

Either 1920 SP + 120 TMU
or 1536 SP + 96 TMU

TBH the above is probably the reason that different sources have claimed 1920 SP or 1536 SP, depending on the number they chose to believe (1920 SP or 96 TMU).
#4
Swamp Monster
kainxs said:
or is one just October and the other is November
+1
#6
HalfAHertz
Benetanegia said:
So confirmed no 96 TMU. At least no 1920 SP + 96 TMU.

Either 1920 SP + 120 TMU
or 1536 SP + 96 TMU

TBH the above is probably the reason that different sources have claimed 1920 SP or 1536 SP, depending on the number they chose to believe (1920 SP or 96 TMU).
Maybe the full die is 1920/120, but because of bad yields the harvested die is 1536/96. Or they got their sources mixed up and it's 1920/120 for the 6970 and 1536/96 for the 6950.

BraveSoul said:
So, a dual-core GPU, eh? :) Sounds good
Hey, they're called parallel processors for a reason :p
#7
KainXS
"wonders if w1z is cuddled up with a 6970 right now":shadedshu
#8
Swamp Monster
From The Slide:
Number of color and coverage samples can be independently controlled - That sounds interesting.
#9
Over_Lord
News Editor
KainXS said:
"wonders if w1z is cuddled up with a 6970 right now":shadedshu
I share that feeling bro
#10
TheMailMan78
Big Member
Swamp Monster said:
From The Slide:
Number of color and coverage samples can be independently controlled - That sounds interesting.
For calibration it would be good. Not many people mess with that anyway. Most of that is on the monitor end anyway. To me that just sounds like "filler" for the features.
#11
Benetanegia
HalfAHertz said:
Maybe the full die is 1920/120, but because of bad yields the harvested die is 1536/96. Or they got their sources mixed up and it's 1920/120 for the 6970 and 1536/96 for the 6950.
Probably. I said what I said because the 96 TMU figure is the only one that never changed in any of the rumors, no matter the "source". Some said 1920/96, others said 1536/96 (I even saw 2048/96), but no one ever mentioned a TMU amount other than 96, which is what got me confused. Again, what you say is probable, but I'm not completely sure about that either: 6 SIMDs disabled is a lot, it's 20% less. Again, bad yields, or bad rumors/bad interpretation on our part? Cypress had only 10% disabled on the HD5850, and 20% leaves very little room for harvesting based on clocks, unless they don't mind the HD6950 being 40% slower than the HD6970. That seems like a very big gap.
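The harvesting percentages above can be worked through quickly. A rough sketch, assuming performance scales linearly with active SIMDs and with core clock (an idealization the thread itself questions); the 20% clock cut is a hypothetical value, and the Cypress figures are the HD 5850's 18 of 20 active SIMDs.

```python
# Idealized harvesting estimate: performance ~ active SIMDs x clock.
# The 0.8 clock ratio below is hypothetical, not a rumored spec.

def disabled_fraction(active, total):
    """Share of SIMD engines switched off on a harvested part."""
    return 1 - active / total

def relative_perf(active, total, clock_ratio):
    """Idealized performance relative to the full part."""
    return (active / total) * clock_ratio

print(f"{disabled_fraction(24, 30):.0%}")   # 20%: rumored Cayman, 6 of 30 SIMDs off
print(f"{disabled_fraction(18, 20):.0%}")   # 10%: Cypress HD 5850, 2 of 20 SIMDs off
# Hypothetical: harvested Cayman also clocked 20% lower than the full part
print(f"{1 - relative_perf(24, 30, 0.8):.0%} slower")   # 36% slower
```

Under these idealized assumptions, 20% fewer SIMDs combined with a 20% lower clock lands in the "40% slower" territory mentioned above.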
#12
Swamp Monster
TheMailMan78 said:
To me that just sounds like "filler" for the features.
I like that kind of feature, because it should allow enhancing colors in some games, forcing them to look better (if done right).
.edit. Yes, it can be done on the monitor too, but it is a nice feature anyway :)
#13
happita
TheMailMan78 said:
Who cares. You do that anyway the second you do any OC.
Isn't that only if you play with voltages? They can't really tell if you OC without upping the voltage.

However, back to the subject at hand. I am a little disappointed by the memory bandwidth, but the other improvements might be enough for the 6970 to compete with the 580. Price will indeed be most people's deciding factor when choosing between the two flagship cards.
For me, though, I think I will wait it out until the 7 series with my 5850.
#14
Yellow&Nerdy?
Although the performance numbers seem to be a bummer, it can still be a great card. If it is 5-10% slower than the 580 but costs 300-350 bucks with better power consumption, it would be great. 3 more weeks of waiting, then we can put all these speculations to rest.
#15
Benetanegia
Swamp Monster said:
I like that kind of feature, because it should allow enhancing colors in some games, forcing them to look better (if done right).
.edit. Yes, it can be done on the monitor too, but it is a nice feature anyway :)
It's not what you think, guys. Those are just related to antialiasing. Let's see if I can explain it well. What they call color samples are samples which are fully calculated, with all the color info + shaders + stencil/z, while coverage samples are samples whose purpose is to determine how much of the pixel belongs to which color.

I.e., with normal 4xMSAA, where the color and coverage sample counts are the same (4 color & 4 coverage), if a pixel lies between 2 different objects (let's say one is black and the other is white) and 2 out of 4 samples fall in the area that belongs to the black object, the final pixel will be "50% black"; if only one falls in the black object, "25% black", and so on.

The problem is that it is possible for the black object to occupy 40%+ of the "pixel area" while only one color sample (25%) actually falls within it, so the resulting color (25% black) is not accurate. This is where a higher number of coverage samples comes to the rescue: these samples only take care of approximating how much of the area belongs to object 1 and how much to object 2. Summarizing: color samples determine how many colors/objects there are to choose from, and coverage samples determine how much of each of those colors is mixed into the final result.
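The black/white example can be reproduced with a toy model. This is a sketch only: it assumes a regular n x n grid of sample points and a hypothetical black triangle (x + y < 0.9) that covers 40.5% of the pixel; real MSAA/CSAA hardware uses other sample patterns and stores coverage differently.

```python
# Toy model: estimate how much of a pixel an object covers by testing
# sample points on a regular n x n grid (an assumed pattern, for illustration).

def coverage(n_per_axis, covered):
    """Fraction of grid sample points for which covered(x, y) is True."""
    hits = sum(
        covered((i + 0.5) / n_per_axis, (j + 0.5) / n_per_axis)
        for i in range(n_per_axis)
        for j in range(n_per_axis)
    )
    return hits / (n_per_axis * n_per_axis)

def black(x, y):
    """Hypothetical black triangle in one corner; true area = 0.9**2 / 2 = 0.405."""
    return x + y < 0.9

for n in (2, 4, 32):
    frac = coverage(n, black)
    # Resolved pixel = frac * black + (1 - frac) * white
    print(f"{n * n:4d} samples -> {frac:.1%} black")
```

With 4 samples only one lands on the triangle, giving the inaccurate "25% black"; 16 samples give 37.5% and 1024 give about 39.6%, closing in on the true 40.5%.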

Nvidia has been doing this with its CSAA modes for ages, although with fixed color/coverage ratios, and I think ATI's CFAA mode is the same. Now they are making it possible for developers to choose how many of each they want to use. A nice addition, but honestly not something that will improve quality a lot, and I don't even think many developers will bother programming their own "mix".
#16
Swamp Monster
To Benetanegia:
I know it's related to antialiasing. I just think that if I were able to put a "color sample slider" to the max (or mix them in a trial-and-error way), it would improve the color accuracy of games.
From what you are saying, I guess it's more complicated than that. :(
Thanks for the explanation.
#17
pantherx12
So many stream processors!

I want! lol
#18
cadaveca
My name is Dave
LAN_deRf_HA said:
So the memory is only 5000 effective? I thought they were confirmed as using chips rated for 6000.
I have mentioned in the past that true high-speed GDDR5, if put on these cards, would delay them. You should not be surprised, as high-speed GDDR5 only went into production at the end of September...

Benetanegia said:
96 TMU cannot be correct. You can't divide 96 TMU on 30 SIMDs. 96/30 = 3.2

On top of that the TMU number per SIMD is always a power of 2 and has always been 4 so far on AMD cards. 30 SIMDs most probably means 120 TMUs.

The only other possibility is that Cayman is still 5D and has 24 SIMDs. 1920/80 = 24. Then all the numbers provided here match up. (24 x 4 = 96 TMU)
All outlier indicators say 4-D. Hence 2x the polygon power...
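The TMU-per-SIMD argument can be checked mechanically. A minimal sketch, assuming 16 shader units per SIMD and 4 TMUs per SIMD as on Evergreen (neither is confirmed for Cayman); `layout` is a hypothetical helper, not anything from AMD.

```python
# Which rumored (stream processors, TMUs) pairs fit a given shader width?
# Assumptions carried over from Evergreen: 16 units per SIMD, 4 TMUs per SIMD.

TMUS_PER_SIMD = 4
UNITS_PER_SIMD = 16

def layout(stream_processors, shader_width):
    """Return (simd_count, tmu_count), or None if the SPs don't divide evenly."""
    sps_per_simd = UNITS_PER_SIMD * shader_width
    if stream_processors % sps_per_simd:
        return None
    simds = stream_processors // sps_per_simd
    return simds, simds * TMUS_PER_SIMD

for sps in (1920, 1536):
    for width in (4, 5):
        print(sps, f"{width}D", layout(sps, width))
```

Under these assumptions, 1920 SP gives 30 SIMDs/120 TMUs as 4D or 24 SIMDs/96 TMUs as 5D, while 1536 SP gives 24 SIMDs/96 TMUs as 4D and does not divide evenly as 5D. Those are exactly the combinations argued over in the thread.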

poo417 said:
The memory bottleneck was a myth on the 5xxx series. You could overclock the memory on my old 5970 from 1000 to 1200 and gain a few percent in performance. Overclocking the GPU, on the other hand, from 725 to 900 was a HUGE jump in performance. The problem with the cards was a lack of memory at very high resolutions with AA. At 6050 x 1080, AA was not possible in a lot of games.

My 480s regularly use more than 1 GB in games at that res. Hell, there are even a few games that use more than 1 GB at 1920 x 1080 with 4x/8x AA.

Overclocking the memory on the 480s does very little as well. Again, OC the GPU and there's a massive jump in performance in most games.
All the talk about a memory bottleneck, at the source, had nothing to do with THAT sort of performance. There is a very specific memory bottleneck when running Eyefinity, but no one has really tested that and submitted public data. It is NOT a myth... it just wasn't tested properly. Go figure... that's what happens when you rely on the numbers amateurs put up to justify their own thoughts.
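To see why memory capacity matters at Eyefinity resolutions with AA, here is an order-of-magnitude sketch of the multisampled render target alone. It assumes 4 bytes of color plus 4 bytes of depth/stencil per sample and no compression; real drivers compress and allocate other buffers and textures on top, so these are not measured figures.

```python
# Rough size of one multisampled color + depth/stencil target, uncompressed.
# Assumption: 4 bytes color + 4 bytes depth/stencil per sample.

def msaa_target_mib(width, height, samples, bytes_color=4, bytes_depth=4):
    """Approximate render-target size in MiB."""
    return width * height * samples * (bytes_color + bytes_depth) / 2**20

for w, h in ((1920, 1080), (6050, 1080)):
    for s in (4, 8):
        print(f"{w}x{h} {s}xAA: {msaa_target_mib(w, h, s):.0f} MiB")
```

At 6050 x 1080 with 8xAA this sketch gives roughly 399 MiB before any textures, versus about 127 MiB at 1920 x 1080, which is consistent with 1 GB cards running short at Eyefinity resolutions.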

bear jesus said:
Yes, but I didn't have a clue what relevance it had. Are the 68xx cards 1 per clock like the last gen as well?

Is it kind of like doubling the IPC for a CPU or something? I'm kind of curious what effect it will have on performance.
Remember back when I said for sure that Barts was 5-D, for exactly the reason that if Barts were 4-D, it should have double the performance that was indicated back then? Seems to make sense now, no? More details will come with time, but I'd be wary of the data until it's all official at this point. Again, we've got some faked slides.

Yellow&Nerdy? said:
Although the performance numbers seem to be a bummer, it can still be a great card. If it is 5-10% slower than the 580 but costs 300-350 bucks with better power consumption, it would be great. 3 more weeks of waiting, then we can put all these speculations to rest.
I think you are expecting too much. Cayman @ <$400 would be silly, IMHO. These are high-performance cards, where all manufacturers have the greatest markup. It's more likely that's what AMD would charge retailers... and you KNOW the retailers are gonna gouge us hard, especially given the timing of the release. There are not going to be many cards available for Christmas presents, for sure. I expect a sell-out, and there won't be more cards again until after the new year.
#19
Benetanegia
cadaveca said:
All outlier indicators say 4-D. Hence 2x the polygon power...
First of all, I think you quoted the wrong post.:p

Anyway my reaction:

:confused: Huh? Explain please. What is the relation between the shaders and the polygon power? Theoretical/peak polygon power is twice as high because Cayman has twice as many vertex/raster engines compared to the previous gen. What does shader config have to do with that?
#20
cadaveca
My name is Dave
Benetanegia said:
First of all, I think you quoted the wrong post.:p

Anyway my reaction:

:confused: Huh? Explain please. What is the relation between the shaders and the polygon power? Theoretical/peak polygon power is twice as high because Cayman has twice as many vertex/raster engines compared to the previous gen. What does shader config have to do with that?
Without the added shader power to actually use the polygon power...need I say more? You're a smart guy, fill in the blanks! :cool:

And no, I didn't quote the wrong post. ;) Make of it what you will. :laugh:
#21
N3M3515
Benetanegia said:
20% leaves very little room for harvesting based on clocks, unless they don't mind the HD6950 being 40% slower than the HD6970. Seems like a very big gap.
HD 6850 has 20% less shaders than HD 6870 and it's not 40% slower. Care to explain?
#22
Benetanegia
cadaveca said:
Without the added shader power to actually use the polygon power...need I say more?
And how is the 4D affecting that at all?

Anyway, my post was about the fact that on Evergreen the theoretical/peak polygon power was not used at all. Real throughput is much, much lower. Remember that at 1 poly/clock an HD5870 should do 850 million poly/s, or ~15 million poly per frame @ 60 fps. The thing is that it does not do that at all, and AMD was even asking for 16 pixels/poly in order for games to be optimized. That accounts for frames of less than 0.5 million polys, way too far from the peak 15 million; that's like <5% efficiency. If shaders were the problem, they would not have improved the setup engine. There's a lot of room to increase the efficiency from that 5% all the way up to 99%, and looking at the slides it's obvious that Cayman has exactly 2x as much power. We are not talking about a bottleneck in the shaders there; it's a very real 2x improvement based on a very specific 2x increase in the setup engine.
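The setup-rate arithmetic above can be reproduced as follows. A sketch only: it assumes every triangle covers exactly 16 on-screen pixels with no overdraw, and the two resolutions are hypothetical examples, not figures from the post. The exact peak is about 14.2 million triangles per frame, which the post rounds to ~15 million.

```python
# Peak setup rate vs. triangles implied by a 16-pixels-per-triangle guideline.
# Assumptions: no overdraw, every triangle exactly 16 pixels; resolutions are examples.

CLOCK_HZ = 850e6        # HD 5870 core clock
TRIS_PER_CLOCK = 1      # Evergreen peak setup rate
FPS = 60
PIXELS_PER_TRI = 16     # the guideline quoted in the post

peak_per_frame = CLOCK_HZ * TRIS_PER_CLOCK / FPS
print(f"peak: {peak_per_frame / 1e6:.1f} M triangles/frame")   # 14.2

def tris_at(width, height, pixels_per_tri=PIXELS_PER_TRI):
    """Triangles needed to tile the screen at the given triangle size."""
    return width * height / pixels_per_tri

for w, h in ((1920, 1200), (2560, 1600)):
    used = tris_at(w, h)
    print(f"{w}x{h}: {used / 1e3:.0f} k triangles, {used / peak_per_frame:.1%} of peak")

print(f"0.5 M triangles: {0.5e6 / peak_per_frame:.1%} of peak")  # 3.5%
```

Even the 0.5 million triangle upper bound uses only about 3.5% of the theoretical setup rate, in line with the "<5% efficiency" estimate.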

N3M3515 said:
HD 6850 has 20% less shaders than HD 6870 and it's not 40% slower. Care to explain?
It has 14% fewer shaders, but yeah, it's a good point and maybe I exaggerated a bit, although the reason I mistakenly exaggerated is that I was assuming almost perfect efficiency. To answer your question, the explanation of why that happens is easy. AMD's architecture is not efficient; it's far from efficient from a utilization POV. Basically, it's not the HD6850 that is faster than "it should" be, it's the HD6870 that is not as fast as it should be, because it cannot use all its resources as well as the HD6850. And this is even more true for the HD5870, which with 1600 SP "should be" 2x as fast as the HD4890, but isn't.
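The shader-count claims can be checked, and peak throughput compared, with a few lines. The reference core clocks (900 MHz for the HD 6870, 775 MHz for the HD 6850, 850 MHz for both the HD 5870 and HD 4890) are added here and are not in the post; peak FLOPs again assume one multiply-add (2 FLOPs) per SP per clock. This is a sketch of theoretical numbers, not benchmark data.

```python
# Theoretical shader throughput from SP count and reference core clock.
# Clocks are reference specs added for illustration; FLOPs assume 2 per SP per clock.

CARDS = {  # name: (stream processors, reference core clock in MHz)
    "HD 6870": (1120, 900),
    "HD 6850": (960, 775),
    "HD 5870": (1600, 850),
    "HD 4890": (800, 850),
}

def peak_gflops(card):
    """Peak single-precision GFLOPs (SPs x MHz x 2 / 1000)."""
    sps, mhz = CARDS[card]
    return sps * mhz * 2 / 1000

def sp_deficit(small, big):
    """Fraction fewer stream processors on `small` than on `big`."""
    return 1 - CARDS[small][0] / CARDS[big][0]

print(f"HD 6850 has {sp_deficit('HD 6850', 'HD 6870'):.0%} fewer SPs")
print(f"HD 6850 peak = {peak_gflops('HD 6850') / peak_gflops('HD 6870'):.0%} of HD 6870")
print(f"HD 5870 peak = {peak_gflops('HD 5870') / peak_gflops('HD 4890'):.1f}x HD 4890")
```

On paper the HD 6850 has about 74% of the HD 6870's peak and the HD 5870 has 2.0x the HD 4890's, so any benchmark gap smaller than that reflects how well each chip keeps its units busy.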

It was just a comment anyway, mostly based on the fact that I don't think 6 SIMDs need to be disabled on the first harvested part in order to get good yields. Unless they're horrible, horrible, horrible. It would mean they were getting almost 6 errors per die, or like 500 per wafer... come on... no way (or is it?).
#23
cadaveca
My name is Dave
Yes, yes, but they are also at the limits of the process, so any additions to the GPU design really have to have a justifiable benefit. There's no point in a huge set-up array if you can never skin your polygons in time... a set-up engine sitting idle is REALLY stupid.

5870 doesn't fit in this example, except to show how a misbalanced GPU arrangement can lead to really big problems... and the 6870 and its higher efficiency serves as the basis. The 5870 set-up engine wasn't even good enough to fill the 5870 properly... much of the GPU is idle all the time, even in 3D.

But why was it idle so much? Because only one to three SPs of the 5 in a grouping ever get used.

This inefficiency is what motivates the switch to 4-D. But at the same time, scheduling for 5-D shaders is far different from 4-D, with higher-order math capabilities...

How does the set-up engine affect 4D? Are you serious? What feeds the shaders? Fairy dust and troll hairs?
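The "one to three of the five" point can be expressed as slot utilization. A toy model, assuming the compiler packs on average k independent operations into each VLIW bundle, so the busy fraction of ALU slots is k / width; the packing values are hypothetical and it ignores everything else (scheduling, memory, setup).

```python
# Toy VLIW slot-utilization model: busy slots = ops packed per bundle / bundle width.
# The packing values are hypothetical, matching "one to three of the five".

def slot_utilization(ops_per_bundle, width):
    """Fraction of ALU slots doing useful work per bundle."""
    return min(ops_per_bundle, width) / width

for ops in (1, 2, 3):
    print(ops, f"VLIW5 {slot_utilization(ops, 5):.0%}",
          f"VLIW4 {slot_utilization(ops, 4):.0%}")
```

With three operations per bundle, a 5-wide unit is 60% busy while a 4-wide unit is 75% busy, so the same code wastes fewer slots on the narrower design, which is the efficiency argument for moving to 4-D.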
#24
TheMailMan78
Big Member
cadaveca said:
Yes, yes, but they are also at the limits of the process, so any additions to the GPU design really have to have a justifiable benefit. There's no point in a huge set-up array if you can never skin your polygons in time... a set-up engine sitting idle is REALLY stupid.

5870 doesn't fit in this example, except to show how a misbalanced GPU arrangement can lead to really big problems... and the 6870 and its higher efficiency serves as the basis. The 5870 set-up engine wasn't even good enough to fill the 5870 properly... much of the GPU is idle all the time, even in 3D.

But why was it idle so much? Because only one to three SPs of the 5 in a grouping ever get used.

This inefficiency is what motivates the switch to 4-D. But at the same time, scheduling for 5-D shaders is far different from 4-D, with higher-order math capabilities...

How does the set-up engine affect 4D? Are you serious? What feeds the shaders? Fairy dust and troll hairs?
Don't forget the pixie semen.
#25
Benetanegia
cadaveca said:
How does the set-up engine affect 4D? Are you serious? What feeds the shaders? Fairy dust and troll hairs?
NO. Not how set-up affects 4D (as in the dispatcher); I know that, I'm no nub. How does 4D affect the set-up (as in making the vertex/raster engine more efficient)? Repeating my previous post, the vertex engine was not even used to 5% of its capabilities. OK, let me rephrase it: it was not even used to 5% of its alleged capabilities. So why add another one?