This looks like a realistic design, but a flawed one... and look, 4 ALUs per shader... that sounds familiar!
I think last gen showed where the optimal ROP/shader ratio was, and this undershoots on shaders in favor of ROPs... oh wait... LIKE EVERY OTHER NVIDIA DESIGN. I thought they were going to fix that?
This would be faster than the 7970 per clock, sure, but it won't be as efficient. Since the 7970 can use its 300 W TDP and overclocks pretty linearly with voltage, it would be the better design overall.
Playing the over/under game, this design looks to be one or two 48-SP units short of maximum efficiency. Granted, that would probably make for a big-ass chip, which is why it was smart of AMD to stay at 32 ROPs, even with fewer units. I still argue they didn't really need more than 30 CUs... perhaps just barely, so it makes sense to overshoot on an enthusiast part (and to leave breathing room for a 1792-SP part...). Pitcairn should prove that: it should have amazing efficiency, even if it ends up slightly shader-light (assuming 24 ROPs and 1408 SPs)... It should be within 1 CU of the ideal balance in average use.
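To make the over/under arithmetic concrete, here's a rough back-of-envelope sketch. It assumes 64 SPs per GCN CU and treats "about 30 CUs per 32 ROPs" as the balance point argued above; that ratio is my reading of the post, not a measured number, and the Pitcairn config is the assumed one (24 ROPs, 1408 SPs), not a confirmed spec.

```python
# Back-of-envelope shader/ROP balance, following the reasoning in the post.
# Assumptions: a GCN compute unit (CU) holds 64 SPs, and "about 30 CUs
# per 32 ROPs" is taken as the rough balance point (hypothetical, not measured).
SP_PER_CU = 64
BALANCED_CUS_PER_ROP = 30 / 32


def balanced_cus(rops: int) -> float:
    """CU count that would match the assumed balance point for a given ROP count."""
    return rops * BALANCED_CUS_PER_ROP


def cu_gap(sp_count: int, rops: int) -> float:
    """CUs above (+, shader-heavy) or below (-, shader-light) the assumed balance point."""
    return sp_count / SP_PER_CU - balanced_cus(rops)


parts = {
    "7970 (2048 SPs, 32 ROPs)": (2048, 32),
    "1792-SP part (32 ROPs)": (1792, 32),
    "Pitcairn as assumed (1408 SPs, 24 ROPs)": (1408, 24),
}

for name, (sp_count, rops) in parts.items():
    print(f"{name}: {sp_count / rops:.1f} SP/ROP, "
          f"{cu_gap(sp_count, rops):+.1f} CUs vs. balance point")
```

Under those assumptions the 7970 lands about 2 CUs over the balance point, while the assumed Pitcairn config comes out about half a CU shader-light, i.e. "within 1 CU".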
All that said, a salvaged 32-ROP part using this design should be perfect with 10 GPCs... lots of wasted transistors, though.
IOW, this sounds like typical Nvidia. What AMD would need to match it boils down to how Nvidia implements its ALUs. They will probably lose some efficiency compared to Fermi, since it was super granular (2 scalar ALUs at 2x clock), but gain die savings by moving closer to AMD's (and Intel's) design. My rough guess is that AMD would need the 7970 clocked somewhere in the 11xx MHz range.
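For what that guess implies, here's a minimal clock-scaling sketch. It assumes performance scales linearly with core clock (optimistic, since memory bandwidth and power limits are ignored) and uses the 7970's 925 MHz reference clock as the baseline. The function names are just illustrative.

```python
# Linear clock-scaling estimate for the "7970 in the 11xx range" guess.
# Assumptions: performance scales linearly with core clock (memory bandwidth
# and power limits ignored), baseline is the 7970's 925 MHz reference clock.
BASE_CLOCK_MHZ = 925


def uplift(clock_mhz: float, base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Fractional speedup over the base clock, assuming linear scaling."""
    return clock_mhz / base_mhz - 1.0


def clock_needed(speedup: float, base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Clock needed to reach a fractional speedup (e.g. 0.20 for +20%)."""
    return base_mhz * (1.0 + speedup)


for mhz in (1100, 1150, 1200):
    print(f"{mhz} MHz -> {uplift(mhz):+.1%} over stock")
```

So the 11xx range works out to roughly +19% to +30% over stock under linear scaling. Going the other way, if the new Nvidia part turned out to be, say, 20% faster (a purely hypothetical figure), `clock_needed(0.20)` comes out to about 1110 MHz.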
Bring on the inevitable 1200 MHz AMD SKU with (hopefully) 0.28 ns GDDR5...