
ATI Preparing 'Super RV770' to Challenge GeForce GTX 200 Series

btarunr

Editor & Senior Moderator
The RV770 is perhaps the best thing that has happened to AMD in a long while. But more than AMD, it's perhaps the best thing that has happened to us, the consumers. Yet the general product launches seem to be just the tip of the iceberg. The new PCBs ATI designed for RV770 cards can support far higher clock speeds than the cards actually run at, and there is every reason to believe that these cards will challenge NVIDIA's very best.

The HD4870 PCB, with two 6-pin power connectors, can support a maximum board power of 225W (2x 75W from the power connectors + 75W from the PCI-Express slot). At stock parameters the HD4870 will not consume more than 170W, which implies that, with a fair bit of binning for high-performing parts, there is plenty of room for overclocking well beyond what ordinary HD4870 cards can take.
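The power arithmetic above can be sketched in a few lines of Python. The 75W-per-source figures and the 170W stock draw come from the paragraph above; since 170W is an upper bound on stock consumption, the computed headroom is a lower bound.

```python
# Per-source power limits used in the article, in watts.
PCIE_SLOT_W = 75  # delivered through the PCI-Express slot
SIX_PIN_W = 75    # per 6-pin auxiliary connector


def board_power_budget(six_pin_connectors: int) -> int:
    """Maximum in-spec board power for a card fed by the slot plus N 6-pin plugs."""
    return PCIE_SLOT_W + six_pin_connectors * SIX_PIN_W


def overclock_headroom(six_pin_connectors: int, stock_draw_w: int) -> int:
    """Watts left between the stock draw and the in-spec maximum."""
    return board_power_budget(six_pin_connectors) - stock_draw_w


# HD4870: two 6-pin connectors, at most 170W at stock parameters.
print(f"budget: {board_power_budget(2)} W")          # budget: 225 W
print(f"headroom: {overclock_headroom(2, 170)} W")  # headroom: 55 W
```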

ATI bins the parts to the lowest common denominator required for good yields and for a level of performance that reaches or sometimes overtakes the GeForce GTX 260. This time around, however, the company has also developed an AIB/OEM-only product, codenamed "Super RV770", which will be much more powerful.
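Binning, as described above, amounts to testing each die and sorting it into the highest product tier it can sustain. Below is a toy Python sketch. The 950 MHz target is the Super RV770 figure from the article, while the 750 MHz (HD4870) and 625 MHz (HD4850) targets are assumed reference clocks. Real binning also weighs voltage, leakage and power, all of which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Die:
    die_id: str
    max_stable_core_mhz: int  # highest core clock the die passed validation at


# Hypothetical bins, highest first. 950 MHz is the article's Super RV770 target;
# 750 and 625 MHz are assumed HD4870/HD4850 reference clocks.
BINS = [
    ("Super RV770 (AIB/OEM)", 950),
    ("HD4870", 750),
    ("HD4850", 625),
]


def assign_bin(die: Die) -> str:
    """Return the highest bin whose core-clock target the die can sustain."""
    for name, target_mhz in BINS:
        if die.max_stable_core_mhz >= target_mhz:
            return name
    return "reject"


# Made-up example dies, not real test data.
for die in [Die("a", 980), Die("b", 800), Die("c", 640), Die("d", 600)]:
    print(die.die_id, assign_bin(die))
```

The "lowest common denominator" corresponds to the bottom threshold: every die that clears it ships, which keeps yields high. The rarer dies that clear the top threshold can be set aside for a premium part.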

These cards will come with pre-installed water-cooling and feature an 'unlocked BIOS'. Déjà vu? Yes: these are perhaps the same parts that went into the Diamond HD4870 XOC Unlocked Black Edition, which was released earlier. The BIOS allows manufacturers to push the GPU core speed all the way up to 950 MHz, with the memory able to scale up to 1200 MHz (4.80 GHz effective). With even better cooling, such as a thermoelectric cooler (TEC), you might be able to push it a little further. At 1200 MHz, the memory delivers roughly 150 GB/s of bandwidth (153.6 GB/s over the card's 256-bit bus).
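The bandwidth figure follows from the effective data rate multiplied by the bus width. Here is a minimal Python sketch, assuming the HD4870's 256-bit memory bus (not stated in the article) and decimal gigabytes. The factor of 4 bits per pin per clock matches the article's 1200 MHz to 4.80 GHz conversion.

```python
def gddr5_bandwidth_gb_s(memory_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s (decimal, 1 GB = 10**9 bytes).

    GDDR5 moves 4 bits per pin per memory clock, so 1200 MHz -> 4.8 GT/s effective.
    """
    transfers_per_second = memory_clock_mhz * 1e6 * 4  # per pin
    bytes_per_transfer = bus_width_bits / 8            # whole bus
    return transfers_per_second * bytes_per_transfer / 1e9


print(f"{gddr5_bandwidth_gb_s(1200, 256):.1f} GB/s")  # 153.6 GB/s
```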

With Diamond Multimedia already having a product in the making, expect announcements from other ATI partners such as ASUS, Sapphire, HIS and GeCube.

 
Well, a really good custom aftermarket air cooler could probably run the chips at 950 MHz as well. I am already running mine at 790 MHz core, with temps never exceeding 65°C over hours of gaming, and that is with the stock cooler at 40% fan speed. So the companies could easily hit 950 MHz on air with a special BIOS as well; I really don't think water cooling is required for that speed.
 
Sounds fun. To be honest, I think they'll all be like that Diamond model.
 
I hope this means that all I need is some great cooling and a BIOS flash, and I can have 900+ MHz on the core! :p
 
It's not just a BIOS flash you'll need; those cards have higher-binned parts (incl. the GPU) that allow stable operation at 950+ MHz.
 
yum, I can smell the burning silicon already :D
 
A 4870 at 900 MHz core and 4.8 GHz memory would be NUTSO :D:D

me wantie, that would really kick my 9800GTX in the nuts, and actually represent a viable upgrade.
 
I hope this means that all I need is some great cooling and a BIOS flash, and I can have 900+ MHz on the core! :p

Those were my thoughts exactly: just replace binning with better cooling. lol.
 
ATI needs the RV771: a rework, probably, of the ROPs and local cache size/latency. I think we have enough TMUs and SPEs to scale a bit further... the problem is that the bottleneck is elsewhere. Why?

I've yet to see ONE benchmark, real or synthetic, where the 4850/70 scores 2.5x the 3870/50. With 800/40 SPE/TMU (250% of 320/16), the poor buggers are not scaling and are handicapped elsewhere. "OVERCLOCKING" isn't really getting more performance out of the SPEs/TMUs but from overclocking the bottleneck area... wherever that bottleneck is.

But that is a serious waste of power and unnecessary heat. Attack the problem directly. Find and solve the bottleneck.
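The scaling argument can be made concrete with Amdahl's law, a model swapped in here for illustration rather than anything from the post: if only a fraction p of frame time is limited by SPE/TMU throughput, 2.5x the units yields less than 2.5x overall. The p values below are hypothetical, and the sketch assumes equal clocks on both chips.

```python
def amdahl_speedup(scaled_fraction: float, unit_ratio: float) -> float:
    """Overall speedup when only `scaled_fraction` of the work scales with unit count.

    Amdahl's law: 1 / ((1 - p) + p / s), with p = scaled_fraction, s = unit_ratio.
    """
    p, s = scaled_fraction, unit_ratio
    return 1.0 / ((1.0 - p) + p / s)


UNIT_RATIO = 800 / 320  # RV770 vs RV670 SPEs; 40 / 16 TMUs is the same 2.5x

# Hypothetical shares of frame time limited by SPE/TMU throughput.
for p in (1.0, 0.8, 0.5):
    print(f"p = {p:.1f}: {amdahl_speedup(p, UNIT_RATIO):.2f}x")
```

Working backwards, an observed speedup well below 2.5x is consistent with a large share of the work being limited by something other than the SPEs/TMUs, which is the post's point.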

*** HUNT THE BOTTLENECK ***

1./ It's not SPE/TMU. We've got enough of those; in fact, more than the GTX280.
2./ It's not memory clocks. With GDDR5, we're faster than the GTX280.
3./ It's not the PCIe 1.0/2.0 interface. The GTX280 can burst through it.

>> Drivers?
>> ROPs?
>> Local thread scheduler problems/cache/latency?
>> Memory bandwidth? Does ATI need to get back to 512-bit? Is there too much latency with 256-bit due to multiple reads/writes rather than a single read/write?

***ARCHITECTURE***
It's tempting to compare this number (the RV770's 800 stream processors) to the 240 stream processing units of the GTX 200 GPU, or the 128 shaders of the G92 chip that will power the boards competing with the Radeon HD 4850 and 4870. Unfortunately, you can't just add up shader units to determine total computational power. Each of these units has a different design, and Nvidia clocks its shader units at a higher speed than the core clock of its GPU.
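The point about clocks can be illustrated by computing theoretical peak throughput as units x shader clock x FLOPs per clock. In this sketch the clocks are assumed reference values: 750 MHz for the HD4870's shaders, which run at core clock, and a 1296 MHz shader clock for the GTX 280. Each unit is counted as issuing one multiply-add (2 FLOPs) per clock. These are peak numbers, not delivered performance.

```python
def peak_gflops(shader_units: int, shader_clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS: units x shader clock x FLOPs per unit per clock.

    The default of 2 counts one multiply-add (MAD) per unit per clock.
    """
    return shader_units * shader_clock_mhz * flops_per_clock / 1000


# Assumed reference shader clocks: HD4870 shaders run at its 750 MHz core clock,
# while the GTX 280's shader domain runs at 1296 MHz.
print(f"HD4870:  {peak_gflops(800, 750):.0f} GFLOPS")   # 1200
print(f"GTX 280: {peak_gflops(240, 1296):.0f} GFLOPS")  # 622
```

Counting the GTX 280's extra co-issued MUL (flops_per_clock=3) gives roughly 933 GFLOPS, which shows how much these headline figures depend on counting conventions.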

One of the long-maligned sore spots of the R600 and RV670 architecture is that it only had 16 texture units. It has long been our belief, and that of others, that simple texture address and filter rates held those chips back in a great many applications. ATI apparently agrees, because they have made major improvements here. Now there are 4 texture units per SIMD array, for a total of 40. The units themselves have been redesigned to get more work done with fewer transistors and less die area. The result is a massive increase in texture address and filter power that bears itself out in our lab tests.
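The texture-unit arithmetic works out as follows. This sketch assumes one bilinear texel per TMU per clock and reference core clocks of 750 MHz (HD4870) and 775 MHz (HD3870). The count of 10 SIMD arrays is implied by 40 units at 4 per array.

```python
SIMD_ARRAYS = 10    # implied: 40 texture units / 4 per SIMD array
TMUS_PER_SIMD = 4


def texel_rate_gtexels(total_tmus: int, core_clock_mhz: float) -> float:
    """Peak texel rate in GTexels/s, assuming one bilinear texel per TMU per clock."""
    return total_tmus * core_clock_mhz / 1000


rv770 = texel_rate_gtexels(SIMD_ARRAYS * TMUS_PER_SIMD, 750)  # assumed HD4870 clock
rv670 = texel_rate_gtexels(16, 775)                           # assumed HD3870 clock
print(f"RV770 {rv770:.1f} vs RV670 {rv670:.1f} GTexels/s ({rv770 / rv670:.2f}x)")
```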
Perhaps the *new* 800 SPEs and *new* 40 TMUs are, in fact, less powerful PER UNIT than the old ones... due to instruction scheduling, data caching, and transistor-count optimisation issues. Getting 2.5x the units in only 33% more silicon will require some compromise somewhere. Perhaps that's the problem.
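The "compromise" can be quantified from the two ratios in the post. This sketch treats 33% as exactly 4/3 and pretends the whole die is SPEs/TMUs. That is a simplification, since ROPs, memory controllers and other logic also take area.

```python
# Ratios from the post: 2.5x the SPE/TMU count in about 33% more silicon.
unit_ratio = 800 / 320   # SPEs, RV770 vs RV670 (40 / 16 TMUs is also 2.5x)
area_ratio = 4 / 3       # "only 33% more silicon", taken as exactly 4/3

density_gain = unit_ratio / area_ratio   # units per mm^2, new vs old
area_per_unit = 1 / density_gain         # silicon per unit, new vs old

print(f"{density_gain:.3f}x the unit density, {area_per_unit:.0%} of the old area per unit")
```

Roughly half the silicon per unit is consistent with the suspicion that each new unit may be leaner than its predecessor, though area alone doesn't show where the performance went.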
 
The real question is, when will these be available retail?

-Indybird
 
lemonadesoda said:
ATI needs the RV771: a rework, probably, of the ROPs and local cache size/latency. I think we have enough TMUs and SPEs to scale a bit further... the problem is that the bottleneck is elsewhere. Why?

I've yet to see ONE benchmark, real or synthetic, where the 4850/70 scores 2.5x the 3870/50. With 800/40 SPE/TMU (250% of 320/16), the poor buggers are not scaling and are handicapped elsewhere. "OVERCLOCKING" isn't really getting more performance out of the SPEs/TMUs but from overclocking the bottleneck area... wherever that bottleneck is.

But that is a serious waste of power and unnecessary heat. Attack the problem directly. Find and solve the bottleneck.

most people would call that bottleneck a CPU :P
 
I REALLY do hope you are joking. Are you saying that the performance bottleneck in synthetic or GPU-limited tests is the CPU? ROFL. :nutkick:

Just take a benchmark that is pretty much independent of CPU, like FurMark. QED
 
@Nkd, take your PC, choose a clock for your GPU and stick with it. Now run FurMark with your CPU at 2.0, 2.5, 3.0, 3.5 and 4.2 GHz and show your results. I bet they are all within 1% of each other, probably even closer.
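If you log the FurMark scores from that experiment, a check like this one tells you whether the spread across CPU clocks stays within 1%. The helper names are hypothetical; the 1% tolerance is taken from the post.

```python
from typing import Mapping


def max_relative_spread(fps_by_cpu_ghz: Mapping[float, float]) -> float:
    """(fastest - slowest) / slowest across runs at different CPU clocks."""
    scores = list(fps_by_cpu_ghz.values())
    return (max(scores) - min(scores)) / min(scores)


def looks_cpu_independent(fps_by_cpu_ghz: Mapping[float, float], tolerance: float = 0.01) -> bool:
    """True if all runs fall within `tolerance` (1% by default) of the slowest run."""
    if len(fps_by_cpu_ghz) < 2:
        raise ValueError("need runs at two or more CPU clocks")
    return max_relative_spread(fps_by_cpu_ghz) <= tolerance
```

Fill a dictionary with your own scores keyed by CPU clock in GHz (2.0, 2.5, 3.0, 3.5, 4.2) and pass it to looks_cpu_independent.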

Next, choose clocks for your GPU and CPU and stick with them. Now run FurMark while increasing the GPU memory clock (keeping the GPU core clock the same). Plot the results. Any difference? If there is, then there is a memory bottleneck. If the score doesn't increase, or only marginally, then there is no memory bottleneck (for FurMark, at least).
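Rather than eyeballing the plot, you can fit the slope of log(score) against log(memory clock). A slope near 1 means the score tracks the memory clock (memory-bound); a slope near 0 means it barely moves. The function name and this reading of the slope are illustrative assumptions, not part of FurMark, and the post doesn't give a threshold for "marginal".

```python
import math
from typing import Sequence


def memory_scaling_exponent(mem_clocks_mhz: Sequence[float], scores: Sequence[float]) -> float:
    """Least-squares slope of log(score) against log(memory clock).

    About 1.0: the score tracks the memory clock (memory-bound).
    About 0.0: the memory clock barely matters (no memory bottleneck).
    """
    if len(mem_clocks_mhz) != len(scores) or len(scores) < 2:
        raise ValueError("need matching lists with at least two runs")
    xs = [math.log(c) for c in mem_clocks_mhz]
    ys = [math.log(s) for s in scores]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("memory clocks must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```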
 
Yeah, that is what I'm talking about: Super RV770. But I hope they add Havok physics too; that will be the best card of 2008.
 
I said this would happen in the Diamond thread.
 
Will these Super RV770's be the ATI Radeon HD 4850+/4870+ series?

I suppose I'll wait a while before buying a 4870 now, wait to see what these better RV770's are capable of.
 
Will the 256-bit memory bus be able to provide the bandwidth needed if the RAM is maxed out? That's what I want to know.
 
With GDDR5, the bus width isn't as significant a figure as it used to be with GDDR4/3. Each pin of the memory bank transfers 2x the amount of data per clock cycle. 256-bit GDDR5 at 1200 MHz (4.8 GHz effective) gives the same bandwidth as 512-bit GDDR3 at 1200 MHz (2400 MHz DDR).
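The equivalence comes down to the product of bits per pin per clock and bus width being the same (4 x 256 = 2 x 512). A small Python check, using decimal GB/s:

```python
def bandwidth_gb_s(clock_mhz: float, bits_per_pin_per_clock: int, bus_width_bits: int) -> float:
    """Peak bandwidth in decimal GB/s: clock x bits per pin per clock x pins / 8."""
    return clock_mhz * 1e6 * bits_per_pin_per_clock * bus_width_bits / 8 / 1e9


gddr5_256 = bandwidth_gb_s(1200, 4, 256)  # GDDR5: 4 bits per pin per clock
gddr3_512 = bandwidth_gb_s(1200, 2, 512)  # GDDR3: 2 bits per pin per clock (DDR)
print(f"GDDR5 256-bit: {gddr5_256:.1f} GB/s, GDDR3 512-bit: {gddr3_512:.1f} GB/s")
```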
 
For sure, the winner here is the consumer! Now wouldn't it be hilarious if Matrox decided to enter the high-end GPU market?
 
With GDDR5, the bus width isn't as significant a figure as it used to be with GDDR4/3. Each pin of the memory bank transfers 2x the amount of data per clock cycle. 256-bit GDDR5 at 1200 MHz (4.8 GHz effective) gives the same bandwidth as 512-bit GDDR3 at 1200 MHz (2400 MHz DDR).

You're right, I completely ignored the fact that GDDR5 has that benefit, as well as consuming far less power whilst doing so. But don't you mean GDDR4, not GDDR3?

EDIT: No, I'm stupid, GDDR3. Man, I hate mornings.
 
I REALLY do hope you are joking. Are you saying that the performance bottleneck in synthetic or GPU-limited tests is the CPU? ROFL. :nutkick:

Just take a benchmark that is pretty much independent of CPU, like FurMark. QED

Explain why my FPS jumps from 30 FPS to 50+ FPS when going from 1.86 GHz to 2.4 GHz... The GPU cannot function without the CPU. The CPU is a bottleneck; NO benchmark is 100% independent of the CPU, as the CPU processes the data from the HDD BEFORE it goes to the GPU.
 
Try adding a bit of useful information, like game, resolution, effects. Otherwise, the explanation you asked for is: you changed the settings. ROFL. 50+ fps is at least 67% more than 30 fps, but you only increased your CPU clock by about 30%. ROFL.
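The numbers in that exchange can be checked directly. Under the simplifying assumption that a fully CPU-bound game's frame rate scales at most linearly with CPU clock, going from 1.86 GHz to 2.4 GHz caps 30 fps at about 38.7 fps, well short of 50. That suggests something else changed, as the reply argues. The function name is hypothetical.

```python
def cpu_bound_fps_ceiling(base_fps: float, base_cpu_ghz: float, new_cpu_ghz: float) -> float:
    """Best case if fps scaled linearly with CPU clock (a simplifying assumption)."""
    return base_fps * new_cpu_ghz / base_cpu_ghz


cpu_gain = 2.4 / 1.86 - 1    # ~29% higher CPU clock
fps_gain = 50 / 30 - 1       # ~67% higher frame rate
ceiling = cpu_bound_fps_ceiling(30, 1.86, 2.4)
print(f"CPU +{cpu_gain:.0%}, fps +{fps_gain:.0%}, linear-scaling ceiling {ceiling:.1f} fps")
```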

 
lemonadesoda said:
***ARCHITECTURE***

Perhaps the *new* 800 SPEs and *new* 40 TMUs are, in fact, less powerful PER UNIT than the old ones... due to instruction scheduling, data caching, and transistor-count optimisation issues. Getting 2.5x the units in only 33% more silicon will require some compromise somewhere. Perhaps that's the problem.

Yep, that's the problem.
From what I've picked up, ATi have decided to use heaps of stream processing units, except they've had to downsize them so that they would fit on the die. They gave their designers a specific transistor count and die size to work within, so the designers must've had to cut back on the power of each unit.
If they made each SPU the same as in the last generation, the die would be MASSIVE!
I hope they get 45 nm chips out soon (better SPUs!), and not only ATI: c'mon AMD, where are the die shrinks??
 