Tuesday, July 1st 2008

ATI Preparing 'Super RV770' to Challenge GeForce GTX 200 Series

The RV770 is perhaps the best thing that has happened to AMD in a long while. More than that, it's perhaps the best thing that has happened to us, the consumers. But the general product launches seem to be just the tip of the iceberg. The new PCBs designed by ATI for RV770 cards are actually running well below the clock speeds they can support, and there is every reason to believe that these cards will challenge NVIDIA's very best.

The HD4870 PCB, with two 6-pin power connectors, can support a maximum TDP of 225W (2x 75W from the power connectors + 75W from the PCI-Express interface). Since the HD4870 will not consume over 170W at stock parameters, this implies that, with a fair bit of binning for high-performing parts, there is plenty of room for overclocking well beyond what ordinary HD4870 cards can take.
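The power budget above is simple arithmetic, and a minimal Python sketch makes the headroom explicit. The constant and function names are illustrative assumptions; the wattage figures are the ones quoted in the paragraph.

```python
# Board power budget, using the figures quoted in the article.
PCIE_SLOT_W = 75      # power available from the PCI-Express slot
SIX_PIN_W = 75        # power available from each 6-pin connector
STOCK_DRAW_W = 170    # upper bound on stock HD4870 consumption, per the article


def board_power_limit(six_pin_connectors: int) -> int:
    """Maximum power the board can draw from the slot plus its 6-pin connectors."""
    return PCIE_SLOT_W + six_pin_connectors * SIX_PIN_W


limit = board_power_limit(2)
headroom = limit - STOCK_DRAW_W
print(f"limit = {limit} W, headroom = {headroom} W")
```

Under these figures the board has 55W of headroom over the stock ceiling, which is the margin a higher-binned, higher-clocked part would have to fit into.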

ATI is binning the parts to the lowest common denominator required for good yields and a level of performance that matches or sometimes overtakes the GeForce GTX 260. But this time around, the company has developed an AIB/OEM-only product codenamed "Super RV770", which will be much more powerful.

These cards will come with pre-installed water-cooling and feature an 'unlocked BIOS'. Déjà vu? Yes, these are perhaps the same parts that went into making the Diamond HD4870 XOC Unlocked Black Edition, which was released earlier. The BIOS allows manufacturers to push the GPU core speed all the way up to 950 MHz, with the memory able to scale up to 1200 MHz (effective: 4.80 GHz). With even better cooling, such as a thermoelectric (TEC) cooler, you might be able to push it a little further. At 1200 MHz memory, the card attains a memory bandwidth of roughly 150 GBps (153.6 GBps on the 256-bit bus).
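As a rough check on the bandwidth figure, here is a small Python sketch. It assumes the 256-bit memory bus discussed in the comments below and four data transfers per memory clock for GDDR5, which is what the article's 1200 MHz to 4.80 GHz conversion implies. The function name is made up for illustration.

```python
def memory_bandwidth_gbps(clock_mhz: float, transfers_per_clock: int, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# 1200 MHz GDDR5 on a 256-bit bus (assumed bus width), 4 transfers per clock.
super_rv770 = memory_bandwidth_gbps(1200, 4, 256)
print(f"{super_rv770:.1f} GB/s")
```

The result is slightly above the article's rounded 150 GBps figure; the same function can be reused for other clocks or bus widths.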

With Diamond Multimedia already having a product in the making, expect announcements from other ATI partners such as ASUS, Sapphire, HIS and GeCube.

Source: DailyTech

56 Comments on ATI Preparing 'Super RV770' to Challenge GeForce GTX 200 Series

#2
Nkd
Well, a really good custom aftermarket air cooler could probably run the chips at 950 as well. I am already running at 790 core, with temps never exceeding 65°C under hours of gaming, and that is with the stock cooler with fan speed at 40%. So the companies could easily hit 950 on air with a special BIOS as well; I really don't think that water cooling is required for that speed.
Posted on Reply
#3
Mussels
Moderprator
sounds fun. to be honest, i think they'll all be like that diamond model.
Posted on Reply
#4
erocker
I hope this means, that all I need is some great cooling and a bios flash and I can have 900+ on the core!:p
Posted on Reply
#5
btarunr
Editor & Senior Moderator
by: erocker
I hope this means, that all I need is some great cooling and a bios flash and I can have 900+ on the core!:p
It's not just a BIOS flash you'll need; those cards have higher-binned parts (incl. the GPU) that allow stable operation at 950+ MHz.
Posted on Reply
#6
Megasty
yum, I can smell the burning silicon already :D
Posted on Reply
#7
wolf
Performance Enthusiast
a 4870 at 900mhz core and 4.8ghz memory would be NUTSO :D:D

me wantie, that would really kick my 9800GTX in the nuts, and actually represent a viable upgrade.
Posted on Reply
#8
Wile E
Power User
by: erocker
I hope this means, that all I need is some great cooling and a bios flash and I can have 900+ on the core!:p
Those were my thoughts exactly. just take the place of binning with better cooling. lol.
Posted on Reply
#9
lemonadesoda
ATI need the RV771. A rework, probably, of ROPs and local cache size/latency. I think we have enough TMU and SPEs to scale a bit further... problem is the bottleneck is elsewhere. Why?

I've yet to see ONE benchmark, real or synthetic, where the 4850/70 is scoring 250% of the 3870/50 units. So that means with 800/40 SPE/TMU (250% of 320/16) the poor buggers are not scaling and are handicapped elsewhere. "OVERCLOCKING" isn't really getting more performance out of the SPE/TMU but from overclocking the bottleneck area... wherever that bottleneck is.

But that is a serious waste of power and unnecessary heat. Attack the problem directly. Find and solve the bottleneck.

*** HUNT THE BOTTLENECK ***

1./ It's not SPE/TMU. We've got enough of those. In fact more than the GTX280
2./ It's not memory clocks. With GDDR5, we're faster than GTX280
3./ It's not PCIe1.0/2.0 interface. GTX280 can burst through

>> Drivers?
>> ROPs?
>> Local thread scheduler problems/cache/latency?
>> Memory bandwidth? Does ATI need to get back to 512-bit? Is there too much latency with 256-bit due to multiple read/write rather than single read/write?

***ARCHITECTURE***
It's tempting to compare this number to the 240 stream processing units of the GTX 200 GPU, or the 128 shaders of the G92 chip that will power the boards competing with the Radeon HD 4850 and 4870. Unfortunately, you can't just add up shader units to determine total computational power. Each of these units has a different design, and Nvidia clocks their shader units at a higher speed than the core clock of their GPU.

One of the long-maligned sore spots of the R600 and RV670 architecture is that it only had 16 texture units. It has long been our belief, and others, that simple texture address and filter rates held those chips back in a great many applications. ATI apparently agrees, because they have made major improvements here. Now there are 4 texture units per SIMD array, for a total of 40. The units themselves have been redesigned to get more work done in less transistors and die size. The result is a massive increase in texture address and filter power that bears itself out in our lab tests.
Perhaps the *new* 800 SPE and *new* 40 TMU are, in fact, less powerful PER UNIT than the old ones... due to instruction scheduling, data caching, and transistor-count optimisation issues. Getting 2.5x the units in only 33% more silicon will require some compromise somewhere. Perhaps that's the problem.
Posted on Reply
#10
indybird
The real question is, when will these be available retail?

-Indybird
Posted on Reply
#11
Mussels
Moderprator
by: lemonadesoda
ATI need the RV771. A rework, probably, of ROPs and local cache size/latency. I think we have enough TMU and SPEs to scale a bit further... problem is the bottleneck is elsewhere. Why?

I've yet to see ONE benchmark, real or synthetic, where the 4850/70 is scoring 250% of the 3870/50 units. So that means with 800/40 SPE/TMU (250% of 320/16) the poor buggers are not scaling and are handicapped elsewhere. "OVERCLOCKING" isn't really getting more performance out of the SPE/TMU but from overclocking the bottleneck area... wherever that bottleneck is.

But that is a serious waste of power and unnecessary heat. Attack the problem directly. Find and solve the bottleneck.
most people would call that bottleneck a CPU :P
Posted on Reply
#12
Nkd
by: Mussels
most people would call that bottleneck a CPU :P
lol, haha, that was a good one man, yea that is pretty simple.

I will post some numbers later with my cpu running at 4.2ghz.
Posted on Reply
#13
lemonadesoda
by: Mussels
most people would call that bottleneck a CPU :P
I REALLY do hope you are joking. You saying that the performance bottleneck on synthetics or GPU limited tests, is the CPU? ROFL. :nutkick:

Just take a benchmark that is pretty much independent of CPU, like FurMark. QED
Posted on Reply
#14
lemonadesoda
@Nkd, take your PC, choose a clock for your GPU and stick with it. Now run FurMark with your CPU @ 2.0, 2.5, 3.0, 3.5, 4.2 and show your results. I bet they are all within 1% of each other, probably even closer.

Next. Choose clocks for your GPU and CPU and stick with them. Now run FurMark with the memory clocks increasing (main GPU clock the same). Plot the results. Any difference? If there is, then there is a memory bottleneck. If it doesn't increase, or it is only marginal, then there is no memory bottleneck. (For FurMark at least)
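The test above boils down to sweeping one clock and checking whether the scores move by more than about 1%. A minimal Python sketch of that check follows. The function names and the 1% default are assumptions taken from the wording of this post, and the scores are invented placeholders, NOT measurements.

```python
def score_spread(scores):
    """Relative spread of benchmark scores: (max - min) / min."""
    if not scores or min(scores) <= 0:
        raise ValueError("need at least one positive score")
    return (max(scores) - min(scores)) / min(scores)


def looks_like_bottleneck(scores_by_clock, tolerance=0.01):
    """True if scores vary by more than `tolerance` across the clock sweep."""
    return score_spread(list(scores_by_clock.values())) > tolerance


# Placeholder scores keyed by CPU clock in GHz. Invented for illustration only.
cpu_sweep = {2.0: 1000, 2.5: 1002, 3.0: 1003, 3.5: 1004, 4.2: 1005}
print(looks_like_bottleneck(cpu_sweep))
```

Run the same check on a memory-clock sweep; if the spread exceeds the tolerance there, the memory is a candidate bottleneck for that benchmark.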
Posted on Reply
#15
Hayder_Master
Yeah, that is what I'm talking about, Super RV770. But I hope they add Havok physics too; that would be the best card for 2008.
Posted on Reply
#16
InnocentCriminal
Resident Grammar Amender
I said this would happen in the Diamond thread.
Posted on Reply
#17
From_Nowhere
Will these Super RV770's be the ATI Radeon HD 4850+/4870+ series?

I suppose I'll wait a while before buying a 4870 now, wait to see what these better RV770's are capable of.
Posted on Reply
#18
InnocentCriminal
Resident Grammar Amender
Will the 256bit memory bus be able to give us the bandwidth needed if the RAM is maxed out, that's what I want to know.
Posted on Reply
#19
btarunr
Editor & Senior Moderator
by: InnocentCriminal
Will the 256bit memory bus be able to give us the bandwidth needed if the RAM is maxed out, that's what I want to know.
With GDDR5, the bus width isn't as significant a figure as it used to be with GDDR4/3. Each pin of the memory bank transfers 2x the amount of data /clock cycle. 1200 MHz (2400MHz DDR) 256bit GDDR5 means the same bandwidth as 1200 MHz (2400 MHz DDR) 512bit GDDR3.
Posted on Reply
#20
Easy Rhino
Linux Advocate
for sure the winner here is the consumer! now wouldn't it be hilarious if Matrox decided to enter the high end GPU market?
Posted on Reply
#21
InnocentCriminal
Resident Grammar Amender
by: btarunr
With GDDR5, the bus width isn't as significant a figure as it used to be with GDDR4/3. Each pin of the memory bank transfers 2x the amount of data /clock cycle. 1200 MHz (2400MHz DDR) 256bit GDDR5 means the same bandwidth as 1200 MHz (2400 MHz DDR) 512bit GDDR3.
You're right, I completely ignored the fact GDDR5 has that benefit as well as consuming far less power whilst doing so, but don't you mean GDDR4 not GDDR3?

EDIT: No, I'm stupid, GDDR3. Man, I hate mornings.
Posted on Reply
#22
tkpenalty
by: lemonadesoda
I REALLY do hope you are joking. You saying that the performance bottleneck on synthetics or GPU limited tests, is the CPU? ROFL. :nutkick:

Just take a benchmark that is pretty much independent of CPU, like FurMark. QED
Explain why my FPS jumps from 30 FPS to 50+ FPS going from 1.86 GHz to 2.4 GHz... The GPU cannot function without the CPU. The CPU is a bottleneck; NO benchmark is 100% independent of the CPU, as the CPU processes the data from the HDD BEFORE it goes to the GPU.
Posted on Reply
#23
lemonadesoda
Try adding a bit of useful information, like game, resolution, effects. Otherwise my explanation that you asked for is: you changed the settings. ROFL. 50fps+ is 70%+ more than 30fps. But you increased your CPU only 30%. ROFL.
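For reference, the percentages in this exchange can be checked with a couple of lines of Python. The helper name is illustrative; the inputs are the figures quoted from the post being answered.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100


fps_gain = percent_increase(30, 50)      # frame rate: 30 FPS -> 50 FPS
cpu_gain = percent_increase(1.86, 2.4)   # CPU clock: 1.86 GHz -> 2.4 GHz
print(f"FPS: +{fps_gain:.1f}%, CPU clock: +{cpu_gain:.1f}%")
```

The frame rate rises by about two thirds while the CPU clock rises by under a third, so the clock change alone does not account for the jump.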

Posted on Reply
#24
Error 404
by: lemonadesoda


***ARCHITECTURE***

Perhaps the *new* 800 SPE and *new* 40 TMU are, in fact, less powerful PER UNIT than the old ones... due to instruction scheduling, data caching, and transistor-count optimisation issues. Getting 2.5x the units in only 33% more silicon will require some compromise somewhere. Perhaps that's the problem.
Yep, that's the problem.
From what I've picked up, ATI have decided to try using heaps of stream processing units, except they've had to downsize them so that they would fit on the die. They gave their designers a specific transistor count and size to work in, so the designers must've had to cut back on the power of each unit.
If they made each SPU the same as the last generation, the die size would be MASSIVE!
I hope they get 45 nm chips out soon (better SPUs!), and not only ATI: c'mon AMD, where are some die shrinks??
Posted on Reply
#25
wolf
Performance Enthusiast
interesting theory, lemonadesoda. i had suspected that all of the cut-downs to get 250% more juice under the hood in only 33% more space had to involve cutting corners somewhere.

however, the fact remains that the chips are, on average, twice as fast as their 38xx series predecessors. no matter how it's done, i guess it works.
Posted on Reply