Monday, September 27th 2010
AMD Radeon HD 6700 Series "Barts" Specs Sheet Surfaces
Here is the slide we've been waiting for: the specs sheet of AMD's next-generation Radeon HD 6700 series GPUs, based on a new, radically redesigned core codenamed "Barts". The XT variant denotes Radeon HD 6770, and Pro denotes HD 6750. AMD claims that the HD 6700 series will pack "Twice the Horsepower" over the previous-generation HD 5700 series. Compared to the "Juniper" die that went into the Radeon HD 5700 series, Barts features twice the memory bandwidth thanks to its 256-bit wide high-speed memory interface, has key components such as the SIMD arrays split into two blocks (like on Cypress), and, as we're now learning, uses a more efficient 4-D stream processor design. There are 1280 stream processors available to the HD 6770 (Barts XT), and 1120 stream processors to the HD 6750 (Barts Pro). Both SKUs use the full 256-bit memory bus width.
The most interesting specification here is the shader compute power. Barts XT churns out 2.3 TFLOP/s with 1280 stream processors and the GPU clocked at 900 MHz, while the Radeon HD 5870 manages 2.72 TFLOP/s with 1600 stream processors at 850 MHz. So indeed the redesigned SIMD core is working its magic. Z/Stencil performance also shot up more than 100% over the Radeon HD 5700 series. Both the HD 6770 and HD 6750 will be equipped with 5 GT/s memory chips, at least on the reference-design cards. These chips are technically capable of running at 1250 MHz (5 GHz effective), though they are clocked at 1050 MHz (4.20 GHz effective) on the HD 6770 and 1000 MHz (4 GHz effective) on the HD 6750. Although these design changes will inevitably result in a larger die compared to Juniper, it could still be smaller than Cypress, and hence more energy-efficient.
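The compute and memory figures quoted above can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the usual convention that each stream processor retires one multiply-add (2 FLOPs) per clock and that GDDR5 moves 4 transfers per memory clock; the function names are made up for illustration and do not come from the slide.

```python
def peak_tflops(stream_processors, core_mhz, flops_per_sp_per_clock=2):
    """Theoretical peak single-precision throughput in TFLOP/s.

    Assumes one multiply-add (2 FLOPs) per stream processor per clock.
    """
    return stream_processors * flops_per_sp_per_clock * core_mhz * 1e6 / 1e12


def gddr5_effective_ghz(memory_mhz, transfers_per_clock=4):
    """Effective data rate in GHz, assuming 4 transfers per memory clock."""
    return memory_mhz * transfers_per_clock / 1000


barts_xt = peak_tflops(1280, 900)      # HD 6770 (Barts XT) per the slide
cypress_xt = peak_tflops(1600, 850)    # Radeon HD 5870 for comparison

print(f"Barts XT: {barts_xt:.2f} TFLOP/s")
print(f"HD 5870:  {cypress_xt:.2f} TFLOP/s")

# Memory clocks quoted in the article, in MHz
for name, mhz in [("chip rating", 1250), ("HD 6770", 1050), ("HD 6750", 1000)]:
    print(f"{name}: {gddr5_effective_ghz(mhz):.2f} GHz effective")
```

These are paper numbers only; they say nothing about how well real workloads keep the stream processors busy.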
Source:
PCinLife
245 Comments on AMD Radeon HD 6700 Series "Barts" Specs Sheet Surfaces
But personally, I don't think that AMD will be able to pull off the 5850 -> 6770, 5970 -> 6870 and so on. Although all the uncore parts on the chips are new, it's still 40nm. I would expect the 6850 to land between the 5870 and the GTX 480 (or very close to the GTX 480), and the 6870 somewhere between the GTX 480 and the 5970. I just hope they don't go cuckoo bananas on the price...
they are the only specs for each card that are out of line....
* @ Paintface If 6770 is $199 and 6750 is $159 it will be win. However, ATi erm.. AMD has been a little greedy in their pricing so who knows.
Nonetheless, 6770 looks good.
I can't imagine the die size of this monster... perhaps 600mm^2? Seriously, the shader counts on both Cayman and Fermi 2 have gotten way too ridiculous. If Cayman is 640 ALUs with 484mm^2 of die space, I can't imagine Fermi 2 will be any size below 600mm^2... No, Barts Pro already outpaces the 5870, and Barts XT may be competitive with the GTX 470/480, according to the benchmark from Chiphell.
The Radeon HD 6670 (Turks) will be the new Radeon HD 5750. It will offer the DX 9/10 performance of the Radeon HD 4770 and DX 11 performance midway between that of the Radeon HD 5750 and Radeon HD 5770 at the $99 price of the Radeon HD 4770.
In short Turks will edge out the performance of the GTS 450 using less than the 75 watts of the PCIe slot while costing less than $100.
I know, there are problems that are ridiculous because of ATI drivers. You can be using a great DX11 5850 or 5870, and yet people have to take time out of their lives to go find fixes for some games, and probably for future games too.
It doesn't matter if it's just 1 game or 20 games having issues and needing refreshed hotfixes because not all cards are supported.
With all the flow of cash and rep, they need to completely change their driver scheme.
I know I really enjoy just grabbing an Nvidia card, updating the driver from Nvidia, and plugging it in, having it already recognized and ready to push fps in games.
Never having to go into the control panel to edit some AA settings or shut off some extra video processing settings for some old games, or having ridiculous forcing issues.
Nvidia's drivers are really solid. On the ATI side I've seen the problems, so many of them. People go download like 10 different drivers, from 10.xx to 10.8x, to see which one is the most stable and best performing.
I really never see that with Nvidia drivers, because they're all basic, solid-performing, reliable, easy to use, and dependably stable 85-95% of the time.
In ATI's case, that isn't so much the case.
My friend decided to crap shoot his perfectly good 1 GB 4870s and begged me for my old GTX 260.
Many people come into my computer shop and say they have had to tweak something in the CCC, or were forced to after googling the problem, just to play the game.
I wonder when it'll be safe to assume Turks is 640sp/16R/32TMU/128-bit? Smart on AMD's part if they are going this route. Evergreen was 1/4-1/2-1/1 parts in a series while NI looks to be 1/3, 2/3, 1/1 (granted likely without the added ROPs and mem controller on Cayman).
Hope that each 640sp (8 SIMDs) cluster has its own setup engine to go along with such a possible divide. If they split tessellation up like that, Barts would be similar to GF104 and Turks similar to GF106, with 2 and 1 triangles per clock respectively. Cayman would be interesting. While GF100 supposedly does 4 triangles per clock, if the 6870 did 3 and was clocked at 900 MHz, GTX 480 and 6870 would essentially be equal in theoretical triangle output. [Math = (.75 x 900)/700 = 96%]. Obviously implementation and technique come into play, but it's interesting that AMD may use fewer transistors and the clock/watt allowances of 40nm to perhaps achieve the same stock result with less power consumption.
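The bracketed math in the comment above can be spelled out as a quick sketch. The triangles-per-clock figures and clocks are the commenter's speculation (3 per clock at 900 MHz for a hypothetical 6870, 4 per clock at 700 MHz for GTX 480), not confirmed specs, and the function name is invented for illustration.

```python
def triangles_per_second(tris_per_clock, core_mhz):
    """Theoretical triangle setup rate: triangles per clock times clock rate."""
    return tris_per_clock * core_mhz * 1e6


# Speculative inputs taken from the comment, not from any spec sheet
gtx480_rate = triangles_per_second(4, 700)
hd6870_rate = triangles_per_second(3, 900)

ratio = hd6870_rate / gtx480_rate
print(f"Hypothetical 6870 vs GTX 480 triangle rate: {ratio:.1%}")
```

This is the same calculation as (.75 x 900)/700, just written as rate divided by rate.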
1- First of all, the only thing that I did was to add one more cluster to GF104. That already means 576 SP: 96 TMU: 48 ROPs: 384-bit. That is exactly 1.5x GF104, or 2.925 billion transistors. Compared to the 3+ billion on GF100, that's actually a 5% reduction. Let's call this one Prototype A.
2- GF104 has the same amount of TMUs and SFUs as GF100 and 75% of the CUDA cores; it also has 66% of the ROPs and memory bus. The end result is a chip that has 66% as many transistors, meaning that the extra CUDA cores, TMUs and SFUs don't affect transistor count or die area too much, if at all, as long as they are included in existing SMs. To come up with the 768 SP number, the only thing you have to do is add another 16-way SIMD unit to each Shader Multiprocessor in Prototype A, which is exactly one of the things that was done between GF100 and GF104. That's why I said it would be slightly bigger than GF100, but TBH, after figuring out both 66% numbers above, how they seem to be related, and how adding all those extra TMUs, SFUs and CUDA cores didn't impact die area at all, I even have to question my first judgement on that. The more I think about it, the more I think that Nvidia might be able to create that 768 SP monster in the same die area as GF100, or less!!
rop/bus = 20% of g100 = 529mm^2 x 0.2 = 105.8mm^2
SIMD/TMU increase from 60 to 96 = 36.70mm^2 x 1.6 = 58.72mm^2
CUDA increase from 336 to 768 = 338.55mm^2 x 2.28 = 545.257mm^2
(105.8mm^2 + 58.72mm^2 + 545.257mm^2) X105%(hard wiring )= 745.26mm^2.....
That is huge... pretty much the largest GPU to ever exist. Not slightly bigger, but completely beefed up.
PS: under 28nm it will be another story... maybe it can only happen on 28nm??
745.26mm^2 x (28nm/40nm)^2 = 365.17mm^2
However, AMD can do exactly the same, with everything doubled up again...
Cayman in 28nm = 484mm^2 x (28nm/40nm)^2 = 237.16mm^2... so an HD 7878 with 128 ROPs would end up at 484mm^2 again in 28nm...
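The 28nm numbers in the comment above use ideal area scaling, where die area shrinks with the square of the feature-size ratio. A minimal sketch of that calculation follows; it assumes a perfect shrink, which real process transitions rarely deliver, and the function name is made up for illustration.

```python
def shrink_area(area_mm2, old_nm, new_nm):
    """Ideal die shrink: area scales with the square of the node ratio."""
    return area_mm2 * (new_nm / old_nm) ** 2


# Inputs are the commenter's own estimates at 40nm
fermi2_28nm = shrink_area(745.26, old_nm=40, new_nm=28)
cayman_28nm = shrink_area(484, old_nm=40, new_nm=28)

print(f"Fermi 2 estimate at 28nm: {fermi2_28nm:.2f} mm^2")
print(f"Cayman estimate at 28nm:  {cayman_28nm:.2f} mm^2")
```

The scale factor works out to 0.49, so an ideal shrink from 40nm to 28nm roughly halves the area.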
Cards sound exciting, it's just really hard for me to get excited about them at all.
Have you read my post at all? Why are you adding A LOT of die area based on a linear SP/TMU/etc. increase?? Like I said, in GF104 Nvidia added many SPs and TMUs over the hypothetical 66% of a GF100 chip, and that did not add any transistors.
I did my numbers too and the resulting die area is 520mm^2. Of course it's almost as arbitrary as yours, but at least it is based on the correct number of SPs/TMUs in GF104, and I'm not basing it on how much area each unit takes in GF100, because it's not going to be based on GF100...
And just to see how stupid your numbers are, let's calculate Barts and Cayman, shall we?
Barts: It's almost a Cypress, except the shaders are 4D instead of 5D. So the shader/tmu area is 80% that of Cypress, everything else being equal.
Cypress was 2xRV770
img.chw.net/sitio/breves/200812/23_RV770_900SP.jpg
and as you can see, the SP area is like 1/3 of the chip. So (336*2/3) + (0.8*336/3) = 313mm^2
Cayman is twice that (or so they say), so: 626mm^2. Man, that is HUGE!
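For anyone following the arithmetic in the comment above, here is a small sketch of the same back-of-envelope estimate. All inputs are the commenter's assumptions (a 336mm^2 Cypress, shaders taking about 1/3 of the die, 4D shaders needing 80% of the 5D area); none of them are measured values, and the function name is invented for illustration.

```python
def barts_area_estimate(cypress_mm2=336.0, shader_fraction=1 / 3, shader_scale=0.8):
    """Keep the non-shader area as-is and scale only the shader area."""
    non_shader = cypress_mm2 * (1 - shader_fraction)
    shader = cypress_mm2 * shader_fraction * shader_scale
    return non_shader + shader


barts = barts_area_estimate()
cayman = 2 * barts  # "twice that (or so they say)"

print(f"Barts estimate:  {barts:.1f} mm^2")
print(f"Cayman estimate: {cayman:.1f} mm^2")
```

Unrounded, the sum comes to 313.6mm^2 for Barts and 627.2mm^2 for Cayman; the comment rounds the first figure down to 313 before doubling it.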
I have to admit I am excited, but not just for the 6xxx cards. I'm excited about my next upgrades, which include the 6xxx and 7xxx cards from AMD, the 580 and 680 (assuming) from Nvidia, Intel's Sandy Bridge and AMD's Bulldozer. There is so much next-gen hardware coming out over the next year or two that will be perfect to replace my current setup and move on to something insanely powerful, even if I don't need that much power, and then maybe do it again in about a year or so just for fun :D
My future mental list is a motherboard with a 16-core CPU, 32 GB of GDDR5 (cya, DDR3) and a 7XXX xD
It's almost too fast...software has issues keeping up as it is...
As it is now, I hopped on the Eyefinity bandwagon on launch of the 5-series, so I really cannot make any purchases until I see how Eyefinity performs, and if some of the bugs that are left still existing now are gone...this damn corrupting cursor is a real pain in the ass.
NDA is Oct. 21
2(367x0.7) + (367x0.1)x1.5 + (367x0.2)x1.5 = 678.95mm^2 x 105%(hard wiring) = 713mm^2
Cayman has 60% of its die space filled with shaders/ALUs, 25% for ROPs/bus and 15% for TMU/SIMD
2(336x 0.6 x0.8) + 2(336 x 0.1) + 2(336x0.25) = 2x 278.88= 557.76mm^2 x 110% hard wiring(512bit bus)= 613mm^2
result...these two are ridiculously big..........
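The two estimates above follow one pattern: split a base die into blocks, scale each block, sum, then add a wiring overhead. A minimal sketch of that pattern is below. The fractions, scale factors and overheads are copied from the two formulas in the comment and are the commenter's guesses; the function name and data layout are invented for illustration.

```python
def estimate_area(base_mm2, blocks, wiring_overhead):
    """Sum scaled block areas, then apply a wiring overhead multiplier.

    blocks is a list of (fraction_of_base_die, scale_factor) pairs.
    Returns (raw_area, area_with_overhead) in mm^2.
    """
    raw = sum(base_mm2 * fraction * scale for fraction, scale in blocks)
    return raw, raw * wiring_overhead


# "Fermi 2": 367mm^2 base, shaders doubled, the other two blocks at 1.5x, 5% wiring
fermi2_raw, fermi2 = estimate_area(367, [(0.7, 2.0), (0.1, 1.5), (0.2, 1.5)], 1.05)

# Cayman: 336mm^2 base, everything doubled, shader block at 80% (4D vs 5D), 10% wiring
cayman_raw, cayman = estimate_area(336, [(0.6 * 0.8, 2.0), (0.1, 2.0), (0.25, 2.0)], 1.10)

print(f"Fermi 2: {fermi2_raw:.2f} -> {fermi2:.0f} mm^2")
print(f"Cayman:  {cayman_raw:.2f} -> {cayman:.0f} mm^2")
```

Changing a single scale factor in the list shows how much each block contributes to the total, which is the point the comment is arguing about.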
But if Cayman is 1920:96:64 + 512-bit bus instead of a full doubling, it will be
1.5(336x0.6x0.8) +1.5(336x0.1) + 2(336x0.25)= 376.32mm^2 x 110% hard wiring for ram/bus optimization (512bit bus)= 413mm^2 for cayman
Let's go back to Fermi 2, if its CUDA core count is 576 instead of a crazy 768
1.2(367x0.7) + 1.5(367x0.1) + 1.5(367x0.2) = 473.43mm^2
which shows the ALUs are the reason why a GPU can end up oversized...