
HD 5870 Discussion thread.

Joined
Nov 21, 2007
Messages
3,688 (0.62/day)
Location
Ohio
System Name Felix777
Processor Core i5-3570k@stock
Motherboard Biostar H61
Memory 8gb
Video Card(s) XFX RX 470
Storage WD 500GB BLK
Display(s) Acer p236h bd
Case Haf 912
Audio Device(s) onboard
Power Supply Rosewill CAPSTONE 450watt
Software Win 10 x64
That's how I am. I enjoy gaming, and even enjoy benching, albeit just a little. But learning about new tech, changes from previous gens, performance, etc. is what interests me the most. Just hardware itself. It still amazes me, if you think about it, that all hardware really is just metallic and solid substances allowing the transfer of signals, and we get everything we see today for entertainment out of that. Just molecules that magically produce visuals. Course, it might just be me that's this fascinated :p
 

wolf

Performance Enthusiast
Joined
May 7, 2007
Messages
7,747 (1.25/day)
System Name MightyX
Processor Ryzen 5800X3D
Motherboard Gigabyte X570 I Aorus Pro WiFi
Cooling Scythe Fuma 2
Memory 32GB DDR4 3600 CL16
Video Card(s) Asus TUF RTX3080 Deshrouded
Storage WD Black SN850X 2TB
Display(s) LG 42C2 4K OLED
Case Coolermaster NR200P
Audio Device(s) LG SN5Y / Focal Clear
Power Supply Corsair SF750 Platinum
Mouse Corsair Dark Core RBG Pro SE
Keyboard Glorious GMMK Compact w/pudding
VR HMD Meta Quest 3
Software case populated with Artic P12's
Benchmark Scores 4k120 OLED Gsync bliss
That's how I am. I enjoy gaming, and even enjoy benching, albeit just a little. But learning about new tech, changes from previous gens, performance, etc. is what interests me the most. Just hardware itself. It still amazes me, if you think about it, that all hardware really is just metallic and solid substances allowing the transfer of signals, and we get everything we see today for entertainment out of that. Just molecules that magically produce visuals. Course, it might just be me that's this fascinated :p

It's not just you, bud. When I sit down and think about what I can actually make with my own two hands, a GPU is an EPIC DRUG TRIP by comparison, not that I'd know, of course ;)
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.50/day)
Location
Reaching your left retina.
I missed the bit where you said you liaise with their R&D, not to mention I'm allowed my opinion in not believing you :p

You've also managed to restate the same point over and over and over; we do get it, brah.

You should check that double standard. I've just stated my opinion; I've not posted news, so there's nothing to believe or not believe. I've not presented anything as fact, and I have not claimed I'm correct on this. I just stated my opinion and said why I think it's that way, looking at the architecture and the results it is obtaining.
 

wolf

You liaise with ATi's R&D department?

That's what I said.

I am honestly sorry if I misinterpreted this, but you come across as actually speaking with ATi techs themselves and knowing beyond doubt that your explanation for why they are underperforming is fact.

I am also looking at the architecture and scratching my head a bit, but really I think they just need to dive deeper with the drivers, exploit the doubling of GPU resources more, and make do with less RAM bandwidth.

As I remember, GT200 released with facets of the GPU that the drivers and game engines of the day did not exploit, and GT200 performance has come a long way since its release.
 

Benetanegia

As I remember, GT200 released with facets of the GPU that the drivers and game engines of the day did not exploit, and GT200 performance has come a long way since its release.

That is why I think what I think about the HD5870. When GT200 was launched, Real World Technologies in their architecture analysis, and the Beyond3D forums, pointed out many times that the weakness of GT200 was in the setup engine/dispatch processor, so it could not feed all the units efficiently. That's something that a one-time driver optimization can usually fix, a 10-20% increase, and IMO we could expect that here, as happened with GT200, but not any miraculous gains. When an architecture comes to an end, almost always some parts of it are weaker, and it's time to go back to the drawing board. We know that Ati is in that phase, because they are, in fact, making a whole new architecture for the next release. That HD5xxx would have this weakness could be predicted, just as GT200's could. IMO.
 

wolf

they are, in fact, making a whole new architecture for the next release

This really interests me. Bar some instruction sets, the current architecture seems to have not really changed since the days of R600; they've just been stacking up the goods to get the performance they need.

Still amazing to think that in just a few short years ATi have gone from 320 SPs on 80nm to 1600 SPs on 40nm, and cranked up the clock speed too.

What I love the most though is them doubling up on ROPs; I think they really needed a big shunt in raw pixel fill-rate. G80 already took a stride ahead in that respect, and I always felt it was holding them back.
 
Joined
Nov 4, 2005
Messages
11,676 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
I will run the "Heaven" benchmark multiple times at different resolutions, and maybe the GTA4 benchmark at different resolutions, post them, and then tomorrow or later I can compile them at work in Excel and make pretty charts and line graphs.

I also am a bit buzzed.

CPU usage never really changed. Never a full load on one core.

I dropped my vmem back to the lowest it would go; now I will try it at a higher speed.
 

Attachments

  • hvn640.jpg (207.6 KB)
  • Hvn1024.jpg (201.5 KB)
  • Hvn1680.jpg (199.9 KB)
Raised memory, everything else the same.

Again, see that CPU use is roughly the same and nowhere near a full load for even one core.
 

Attachments

  • hnn640him.jpg (197.6 KB)
  • hvn1024him.jpg (203 KB)
  • hvn1680him.jpg (199.1 KB)

Mussels

Freshwater Moderator
Staff member
Joined
Oct 6, 2004
Messages
58,413 (8.19/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage 2TB WD SN850 NVME + 1TB Sasmsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
Raised memory, everything else the same.

Again, see that CPU use is roughly the same and nowhere near a full load for even one core.

I'm not confident Task Manager is accurate for this.

Task Manager only shows usage per process, and I've never seen an ATI or Nvidia process use any, because it's a 'hidden' process.

I think you'll need to test another way, like running SuperPi at the same time as a static 3D test or something.
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Well, I can't speak for the ATI 4 series, but I had two GTX 280's and a 295, and both of the SLI setups suffered horribly with an i7 at stock. Perhaps I'd have seen a difference overclocked, but I'd say microstutter for Nvidia is not a thing of the past.

Well, microstuttering was proven to be "cured" as of the GTX 295 and 4870X2 generation, according to the timed charts (at least for the few games that were tested, including a 3870X2 and 9800GX2, etc.).

Perhaps you're confusing it with the lack of triple buffering. With alternate frame rendering, triple buffering cannot properly be used. Many of the newer DirectX games call for triple buffering, but this is not true triple buffering with SLI/CF; it's more like "quasi" triple buffering. You'd still see stuttering hitches when the frame rate crosses over certain points. This is one of the reasons why I'm glad to be a 'retired' SLI veteran.

I have said many times, and still maintain, that the problem is IMO in the thread dispatch processor/setup engine.

1- Both RV770 and RV870 tout the same peak execution of 32k (kilo) threads, so probably the TP/SE has not been changed.

2- It's been said that RV870 is the exact same architecture as RV770 plus DX11 support on the shaders, so probably only the ISA on the shaders has changed, if at all.

3- I know comparing different architectures is kinda stupid, but it can be valid as a guideline. Nvidia's GT200 had 32k peak threads too, but they have already said (I think it was in the Fermi white paper) that in reality it could only do 10-12k, and that was part of the reason for the "lacking" performance of GT200, at least at launch. Fermi will have 24k peak only, but thanks to 16 kernels and 2 different dispatch processors they think they will be able to max it out. So even if we can't compare architectures directly, we do know that one of the companies did a thorough study on their hardware to test usage, saw that their 32k thread processor (12k in practice) would not cut it, and decided to put in two, different/weaker ones, but two.

We could speculate whether AMD's dispatch processor was more efficient or not, but given the performance similarity it most probably had a similar one, plus the advantage of higher clocks, if anything. Now imagine it was indeed a little bit more efficient, so that the thread dispatch processor was excessive for RV770, with a lot of headroom they could not really test, because it was the rest of the chip that was holding it down. Imagine that RV770 could only do 10-12k on the shader side of things, just like GT200 did as a whole*, and that AMD thought the DP/SE could in theory really do 24k. In order to release Evergreen as fast as they did, they probably didn't touch the DP at all, given that in theory it could handle 32k, and 24k according to their estimates: plenty. But what if the DP can't do 20k and only does 16k, for example? Then you have a bottleneck where you didn't think you would have one. It's not as if you could do anything about it without a complete redesign, so you release it anyway, because in the end it still is a fast card (the fastest), because you can release much sooner, and because you expect to improve the efficiency of usage with future drivers.

My two cents.

Perhaps that would be another bottleneck, but this is not what held the GT200 back. The GT200 did not have 2 times the shaders. It did not have 2 times the memory bandwidth of an 8800GTX; it was only a little over a 50% increase. It hardly had any more TMUs than a 9800GTX, and at a lower clock. The shader clocks were also slower. The theoretical max GFLOPS was only around 40% higher as well (instead of the 100% you'd expect from 2x). That's a lot of things that are just not 2x, unlike a 5870, which is 2x a 4890 in EVERYTHING except memory bandwidth.

You've got a good point, though. However, I do not think the "peak threads" would matter too much if Nvidia is actually cutting it down for their Fermi chips, which consist of 3 billion+ transistors. The peak-threads figure must be such a high ceiling that it has never actually been reached anyway, like PCI-E 2.0 x16 for a single GPU.

^^^^^^^^^^^^^^

Anyways, in one of my posts above (the one with all of the benchmarks from FiringSquad), do you think that 2x 4890's in CF (but with memory downclocked all the way to 2400MHz effective on each card, so that it adds up to the same 4800MHz bandwidth as a 5870) would still perform better than a 5870 in *ANY* of the games? Keep in mind that 2x 4890's beat a 5870 in every game tested by FiringSquad, sometimes by a huge margin.

This is something for all of us to keep in mind, and especially ATI, with their proven capability to do a 512-bit bus! :rockout:
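To make the quoted dispatch-limit argument concrete, here is a toy sketch in Python; the thread counts are just the rough figures tossed around in this discussion, not measured hardware data:

Code:
# Toy model of the thread-dispatch bottleneck argument.
def threads_in_flight(dispatch_capacity, shader_demand):
    # Whichever is weaker, the dispatch front-end or the shader
    # array's appetite, sets the real number of threads in flight.
    return min(dispatch_capacity, shader_demand)

# GT200: 32k threads on paper, ~10-12k in practice (the Fermi
# white paper figure cited above).
print(threads_in_flight(11_000, 32_000))  # front-end limited

# Hypothetical RV870 case from the argument: doubled shaders could
# eat ~24k threads, but an untouched dispatcher manages only ~16k.
print(threads_in_flight(16_000, 24_000))  # an unplanned bottleneck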
 

grimeleven

New Member
Joined
Oct 10, 2009
Messages
19 (0.00/day)
Processor Intel Core i7@3.5Ghz
Motherboard eVGA X58SLI
Cooling TRUE 120 Xtreme
Memory 6GB Aeneon 1866Mhz
Video Card(s) 4870X2 2GB /w AC Xtreme cooler
Storage Vertex 120g
Display(s) Samsung 32 inch LCD 1080p
Case HAF932
Audio Device(s) SB X-Fi
Power Supply Antec TP3 650W
Interesting read... http://www.beyond3d.com/content/reviews/53/13

It's interesting that even in these simple shaders we're not managing 100% issue rate (that would be 1360 GInstr/s for Cypress and 680 for RV790), which is probably tied to what we've already discussed about the peculiarities of accessing GPRs. In the dependent float MAD test you can see that the newly implemented MUL+dependent ADD “co-issuing” doesn't bring increased instruction issuing, but, as you'll see when tested in a high register pressure scenario, there are other benefits.

Still, nice results compared to the previous gen.
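Those GInstr/s ceilings in the quote are just ALU count times engine clock; a quick check in Python using the public shader counts and the 850MHz clock:

Code:
# Peak issue rate = number of ALUs x clock (one instruction each per cycle).
def peak_ginstr_per_s(alus, clock_ghz):
    return alus * clock_ghz

print(peak_ginstr_per_s(1600, 0.85))  # Cypress (HD 5870): 1360.0
print(peak_ginstr_per_s(800, 0.85))   # RV790 (HD 4890):    680.0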

 

Bo_Fox


Interesting... it still looks very close to a 100% increase, actually a bit more in one case.

Wow guys! :toast: You go ahead and figure this out ;) I'll buy the revision :D with the better memory. Thanks for the heads up... Btw, are 2 5770's worth it, or are 2 5750's good?

Since I'm an enthusiast, I'd say go for the 2 5770's!
 

Bo_Fox

This reminded me to look at how a 5770 is just like a 4890, differing only in memory bandwidth and the extra DX11 features.

Let's look at how a 5770 compares against a 4890 using TPU's benchmarks (thanks to W1zzard and Btarunr for the hard work):



(Per-game results from TPU's review; the game-title charts from the original post are not reproduced here.)

5770: 20.1 fps · 4890: 29.2 fps
5770: 47.3 fps · 4890: 54.4 fps
5770: 46.2 fps · 4890: 53.4 fps
5770: 72.1 fps · 4890: 69.3 fps (memory might be more than enough here, or it's just slight improvements in architecture?)
5770: 10.8 fps · 4890: 12.2 fps
5770: 40 fps · 4890: 52.3 fps
5770: 44.7 fps · 4890: 59.4 fps
5770: 81.2 fps · 4890: 113.2 fps
5770: 34.5 fps · 4890: 34.5 fps (IDENTICAL; memory might be more than enough here, or it's just slight improvements in architecture?)
5770: 57 fps · 4890: 71 fps
5770: 40.8 fps · 4890: 47.7 fps
5770: 75 fps · 4890: 87.9 fps
5770: 94.3 fps · 4890: 87.4 fps (memory might be more than enough here, or it's just slight improvements in architecture?)
5770: 19 fps · 4890: 25.4 fps
5770: 46.8 fps · 4890: 54.7 fps
5770: 14.9 fps · 4890: 16.9 fps
5770: 65.7 fps · 4890: 71.8 fps
5770: 59.9 fps · 4890: 73.3 fps
5770: 21 fps · 4890: 27 fps

And here's the "ultimate": the overall summary, where a 4890 performs ~20% faster than a 5770.

This is a direct comparison of two similar chips, but one with 2.4Gbps of bandwidth (5770) and one with 3.9Gbps (4890).

Two 5770's would perform very similarly to a 5870 (which is exactly 2x in all of the specifications, including 256-bit memory instead of 128-bit) as long as CF is being used efficiently.

So, no matter how much some of us want to blame the drivers, the memory bandwidth can be a bottleneck, like so many times in the past. Keep in mind that this is a 2.4Gbps 5770 against a 3.9Gbps 4890, not against a 4.8Gbps 4890! If it were against a 4890 with its memory overclocked to 5870 speeds (4.8GHz), there would be an even greater difference than just 20%, more like upwards of 25-30% or even 35%!

I want that extra 30% for a 5870 with 512-bit memory!


EDIT: Thanks once again, W1z and Bta, for covering so many games (most of them are games I actually have installed on my rigs!)
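For anyone who wants to check the bus math behind those "2.4Gbps vs 3.9Gbps" figures, here is a quick Python sketch using the public card specs; the fps pairs are the TPU numbers listed above:

Code:
# Normalizing memory bandwidth to a common 256-bit bus.
def bandwidth_gbs(bus_bits, gbps_per_pin):
    # bus width in bytes times per-pin data rate = GB/s
    return bus_bits / 8 * gbps_per_pin

print(bandwidth_gbs(128, 4.8))  # HD 5770:  76.8 GB/s ("2.4Gbps" at 256-bit)
print(bandwidth_gbs(256, 3.9))  # HD 4890: 124.8 GB/s ("3.9Gbps")
print(bandwidth_gbs(256, 4.8))  # HD 5870: 153.6 GB/s ("4.8Gbps")

# Average 4890-over-5770 lead across the fps pairs listed above:
pairs = [(20.1, 29.2), (47.3, 54.4), (46.2, 53.4), (72.1, 69.3),
         (10.8, 12.2), (40, 52.3), (44.7, 59.4), (81.2, 113.2),
         (34.5, 34.5), (57, 71), (40.8, 47.7), (75, 87.9),
         (94.3, 87.4), (19, 25.4), (46.8, 54.7), (14.9, 16.9),
         (65.7, 71.8), (59.9, 73.3), (21, 27)]
lead = sum(b / a for a, b in pairs) / len(pairs) - 1
print(f"4890 ahead by ~{lead:.0%} on average")  # ~19%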
 

Benetanegia

Perhaps that would be another bottleneck, but this is not what held the GT200 back. The GT200 did not have 2 times the shaders. It did not have 2 times the memory bandwidth of an 8800GTX; it was only a little over a 50% increase. It hardly had any more TMUs than a 9800GTX, and at a lower clock. The shader clocks were also slower. The theoretical max GFLOPS was only around 40% higher as well (instead of the 100% you'd expect from 2x). That's a lot of things that are just not 2x, unlike a 5870, which is 2x a 4890 in EVERYTHING except memory bandwidth.

You've got a good point, though. However, I do not think the "peak threads" would matter too much if Nvidia is actually cutting it down for their Fermi chips, which consist of 3 billion+ transistors. The peak-threads figure must be such a high ceiling that it has never actually been reached anyway, like PCI-E 2.0 x16 for a single GPU.

^^^^^^^^^^^^^^

Anyways, in one of my posts above (the one with all of the benchmarks from FiringSquad), do you think that 2x 4890's in CF (but with memory downclocked all the way to 2400MHz effective on each card, so that it adds up to the same 4800MHz bandwidth as a 5870) would still perform better than a 5870 in *ANY* of the games? Keep in mind that 2x 4890's beat a 5870 in every game tested by FiringSquad, sometimes by a huge margin.

This is something for all of us to keep in mind, and especially ATI, with their proven capability to do a 512-bit bus! :rockout:

Regarding GT200, I'm just sharing what they said; I don't know if that was the case or not myself. And yeah, I know everything was not doubled, but at launch it didn't even perform as it should. There was a 25% increase across the board with one of the driver releases that only marginally increased performance on other cards, so something was happening, whatever the problem was.

RV870 has doubled everything when it comes to execution units, but the underlying hardware has probably not been doubled up; that's what I'm saying. I'm blaming the thread dispatcher because it makes more sense to me than, say, the bottleneck being too few registers or slowish internal communications, because those are far easier problems to overcome without going back to the drawing board. If you read the link from Beyond3D (good read BTW), they do mention some problems in both the setup engine and the thread dispatcher, although they apparently blame the front-end registers, or at least they mention a problem generated by register pressure.

All in all, it's clear that something is happening, because in their charts, the more specific and theoretical the test, the closer the HD5870 is to being 2x the HD4890, and the more factors are put into the equation, the closer it gets to actual gaming performance. The best example, and what I think is to blame after reading the article and benchmarks, is texture filtering: http://www.beyond3d.com/content/reviews/53/12

Texture fillrate is undeniably faster, almost 3x that of the HD4890, but texture filtering is only marginally faster. That doesn't make sense to me unless something outside the texture filtering units prevents them from performing. Pay attention to how slowing the memory bandwidth down to that of the HD4890 has little effect too. That's something you can see throughout the entire article, and it shows the HD5870 is not memory bottlenecked.

And oh, BTW, you can't put 2x HD4890 at 2.4 GT/s in order to match the HD5870, because there is much more traffic going on in an SLI/Crossfire setup. Namely, geometry and texture data has to be sent twice. It's not apples to apples.

Similarly, the HD5770 is very different too; you can't extrapolate the HD5770/HD4890 results to the HD5870 based on memory bandwidth. Double the performance doesn't mean it needs double the memory bandwidth. The memory space and memory bandwidth associated with geometry and textures (and data in general) is the same in both cases, because both have to render the same thing. That part of the memory (a big one, I must say) is only refreshed based on game time* and not on the number of frames being rendered.

* If you make a 360 turn slowly, both cards will have to load the same geometry/textures at the exact same time, regardless of how many frames per second are being rendered.
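A minimal Python sketch of that footnote's point: scene streaming scales with game time while framebuffer work scales with frame rate, so doubling the fps does not double the bandwidth needed. The traffic split below is invented purely for illustration:

Code:
# Why 2x the frames doesn't need 2x the memory bandwidth.
def required_bandwidth_gbs(fps, gb_per_frame=0.5, streaming_gbs=20.0):
    # gb_per_frame: per-frame work (framebuffer, geometry reads);
    # streaming_gbs: scene/texture streaming, fixed per game-second.
    return fps * gb_per_frame + streaming_gbs

print(required_bandwidth_gbs(40))  # 40.0 GB/s
print(required_bandwidth_gbs(80))  # 60.0 GB/s: double the fps, only 1.5x the bandwidth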
 

Bo_Fox

Nice article from Beyond3D, thanks, but it's just a theoretical test. Such tests are known to have little to nothing to do with real-world scenarios, and memory is unlikely to be the limiting factor there, at the very least. Also, it's the wrong way around: REDUCING the memory bandwidth on a 5870 is the exact opposite of what I'm trying to prove here, which can only be proven by INCREASING the memory bandwidth in real-world applications. Overclockers know this fact by heart after seeing consistent improvements from overclocking the memory alone.
____

Yeah, it's not apples to apples, as shown when the 5770 actually beats a 4890 in two of the games in my previous post above (TPU benchies). In one such game, a 5770 beats a 4890 by ~8%, but that is Wolfenstein, an OpenGL game. There might have been slight optimizations with OpenGL 3.1, or with whatever you mentioned above.

Well, that is only comparing a 2.4Gbps 5770 against a 3.9Gbps 4890.

However, if you were able to double the bandwidth on a 5770 (or overclock a 4890 to 4.8Gbps) to the same amount that a 5870 has, we'd be seeing more than just a 20% increase. We'd see upwards of a 25-30% or even 35% increase.

Sometimes when a bottleneck threshold is overcome (let's say the bottleneck is at around 5Gbps for a 4890 in a certain game at a certain resolution), performance skyrockets, as the chip is finally able to fully flex its muscle. That is why sometimes overclocking the core or the memory by just 5-10% brings a 20% increase in performance in a few games.

What you were just saying about SLI/CF is yet another factor. You are right that it's not apples to apples, and SLI/CF is usually not 100% efficient, which is another reason for the disappointing performance of a 5870 against a 4870X2.

Bring on that 5870 (or 5890) with 512-bit memory for 10Gbps of bandwidth (with 5GHz effective GDDR5)! Perhaps ATI might do an improved re-spin of RV870XT for higher speeds (1000MHz), just like with the 4890 (RV790XT), and couple it with 512-bit memory just in time for Nvidia finally releasing its GT300!

I'm thinking that ATI already expected Nvidia to be plagued with 40nm delays.
 

grimeleven

Agree with Benetanegia.

Another good article:
Earlier this week, we also dabbled with 5870 overclocking, OC’ing the 5870 GPU and memory by a fixed amount of 9% respectively. In practically every case the 5870 card scaled best when the GPU/shaders were OC’ed rather than memory: performance typically improved by 4-5% in most apps when the GPU was running at 930MHz, while OC’ing the memory to 1320MHz only improved performance by 2-3% in the same games.

If the card was truly being bottlenecked by its memory interface, it should’ve shown more significant gains when we OC’ed the memory.
http://www.firingsquad.com/hardware/ati_radeon_hd_5850_performance_preview/page20.asp
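Worked out, the quote's scaling argument looks like this (Python; the percentages are FiringSquad's, and the "efficiency" metric is just gain divided by overclock):

Code:
# Fraction of a 9% overclock that shows up as frame rate.
def scaling_efficiency(oc_percent, gain_percent):
    return gain_percent / oc_percent

print(scaling_efficiency(9, 4.5))  # core OC:   ~0.50 -> mostly core-limited
print(scaling_efficiency(9, 2.5))  # memory OC: ~0.28 -> bandwidth not the main limit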
 

Bo_Fox


That is because until the memory bottleneck has been overcome, the gains are not quite so linear. Performance could take off past 6-7Gbps of bandwidth in certain games at certain resolutions, finally freeing up that super-powerful GPU. Not everything scales linearly before the bottleneck is overcome.

The worst thing is that FiringSquad did not even acknowledge or mention the new error-correcting feature of R800's memory controller: when the memory is overclocked past a certain point of stability, it actually degrades performance.
FiringSquad just said that they chose to do 9% for both the core and memory, and that was it. There would have to be much more testing done to find the threshold where performance starts to degrade.

Also, if you look at the benchmarks themselves, you'll see that overclocking the core by itself is practically useless without also overclocking the memory.
 

Benetanegia

Nice article from Beyond3D, perhaps the only site to do that kind of testing, thanks, but it's just a theoretical test. Such tests are known to have little to nothing to do with real-world scenarios, and memory is unlikely to be the limiting factor there, at the very least. Also, it's the wrong way around: REDUCING the memory bandwidth on a 5870 is the exact opposite of what I'm trying to prove here, which can only be proven by INCREASING the memory bandwidth in real-world applications.

They do have something to do. Game performance is a mix of those results; it's the statistical mixture of performance at those various levels. Think of water flowing through 4 or 5 funnels, one after the other. The smallest one will mostly determine the final flow, although the rest will have some influence too, especially if the initial flow that comes from the faucet is not constant. Not the best of examples maybe, because the size of the funnels would depend on both the actual theoretical specs and the requirements of each game, but I think it serves to picture the idea.

Also, it's in these theoretical tests where bandwidth is most relevant, more so than in games. Then again, it's the overall bandwidth that matters, like the flow I mentioned above.

Sometimes when a bottleneck threshold is overcome (let's say the bottleneck is at around 5Gbps for a 4890 in a certain game at a certain resolution), performance skyrockets, as the chip is finally able to fully flex its muscle. That is why sometimes overclocking the core or the memory by just 5-10% brings a 20% increase in performance in a few games.

I've never seen or heard of that. Really. But I don't know everything or have seen everything. The norm is quite the opposite, though: huge variations in memory bandwidth yield small variations in performance. It's been proven time and again across various architectures in various reviews.

I'm thinking that ATI already expected Nvidia to be plagued with 40nm delays.

Considering that they are having problems too, yes, probably. On top of that, TSMC was reporting better yields around the time Evergreen had to be mass-produced for release, and now they are reporting shitty yields again. Bad timing for Nvidia I guess; bad for them, bad for us. Ati was lucky in that respect; sometimes luck does play a big role in this business.
 

Bo_Fox

I've never seen or heard of that. Really. But I don't know everything or have seen everything. The norm is quite the opposite, though: huge variations in memory bandwidth yield small variations in performance. It's been proven time and again across various architectures in various reviews.

From TPU: http://www.techpowerup.com/reviews/ATI/Radeon_HD_5870/33.html

Overclocking the memory on these cards is quite different from any other card so far. Normally you'd expect rendering errors or crashes, but not with these cards. Thanks to the new error correction algorithm in the memory controller, every memory error is just retransmitted until everything is fine. So once you exceed the "stable" clock frequency, memory errors will appear more often, get retransmitted, but the rendered output will still look perfectly fine. The only difference is that performance drops, the further you increase the clocks, the lower the performance gets. As a result a normal "artifact scanning" approach to memory overclocking on the HD 5800 Series will not work. You have to manually increase the clocks and observe the framerate until you find the point where performance drops.

Oops, sorry, I misunderstood you. But anyway, whenever performance is low (i.e., less than 30fps), there's usually a critical point where a certain amount of memory bandwidth is so sorely needed, and performance will not change by much until the "funnel" has been opened up to the ceiling.
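In other words, the tuning procedure W1zzard describes boils down to stepping the clock and watching the frame rate. A sketch of that loop in Python, where set_memory_clock() and run_benchmark() are hypothetical placeholders rather than any real tool's API:

Code:
# Finding the HD 5800 memory overclocking "knee": past it, EDC
# retransmissions cost more fps than the extra bandwidth gains.
def find_memory_knee(start_mhz, step_mhz, max_mhz, set_memory_clock, run_benchmark):
    best_clock, best_fps = start_mhz, 0.0
    clock = start_mhz
    while clock <= max_mhz:
        set_memory_clock(clock)   # hypothetical placeholder
        fps = run_benchmark()     # hypothetical placeholder
        if fps < best_fps:        # fps dropped: the knee was passed
            break
        best_clock, best_fps = clock, fps
        clock += step_mhz
    return best_clock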
 

Bo_Fox

They do have something to do. Game performance is a mix of those results; it's the statistical mixture of performance at those various levels. Think of water flowing through 4 or 5 funnels, one after the other. The smallest one will mostly determine the final flow, although the rest will have some influence too, especially if the initial flow that comes from the faucet is not constant. Not the best of examples maybe, because the size of the funnels would depend on both the actual theoretical specs and the requirements of each game, but I think it serves to picture the idea.

Also, it's in these theoretical tests where bandwidth is most relevant, more so than in games. Then again, it's the overall bandwidth that matters, like the flow I mentioned above.

Yeah, not the best example... almost as bad as the GFLOPS example, LOL! :laugh:
 

Benetanegia

Yeah, not the best example... almost as bad as the GFLOPS example, LOL! :laugh:

But did you understand what I meant or not? I could come up with something involving a changing highway, cars, and various sizes of trucks. :D

And what do you mean by the post above? He is describing quite the opposite of what you described. He is saying that once a threshold is surpassed, performance is degraded instead of improved. If that were the case, downclocking the memory would give better results, and it doesn't either. The card runs cool, so I don't think there are many memory errors at stock clocks anyway.

As for error detection, it could be a problem if and only if errors were occurring at stock speeds in every single card. In that case, both overclocking and downclocking the memory would not change performance a whole lot, even if the card were memory bottlenecked. Why? Because the higher clock would be offset by more errors occurring (and more retransmitting going on), and lowering it would not degrade performance either, because fewer errors would offset the lower bandwidth. BUT all that is purely theoretical, and it's not happening at all.
 

Bo_Fox

Well, I've seen tests where an overclocked 4850 has similar fillrate output to a 4870 with much faster GDDR5 memory. Real-world games showed otherwise: the 4870 did far better.

I'm not ignoring you or anything... I'm really trying to tell you this after having seen thousands of theoretical fillrate tests over the past 10 years.

Why would there still be consistent increases in performance as the memory bandwidth is increased? If your "funnel" were already big enough, why would it make a difference if it were any bigger? Because the funnel is still not big enough. In some scenarios, the funnel causes hiccups or stutters, or just a slow-down in the overall output.

Once the funnel becomes just big enough, we'd see a large jump in performance.
 

Benetanegia

Well, I've seen tests where an overclocked 4850 has similar fillrate output to a 4870 with much faster GDDR5 memory. Real-world games showed otherwise: the 4870 did far better.

I'm not ignoring you or anything... I'm really trying to tell you this after having seen thousands of theoretical fillrate tests over the past 10 years.

Yeah, I have seen them too, but that's nothing but proof that that specific fillrate was not the bottleneck in games in general. That's been true for pixel fillrate many times in history, in the high end of course, i.e. 8800GTX vs 9800GTX.

Why would there still be consistent increases in performance as the memory bandwidth is increased? If your "funnel" were already big enough, why would it make a difference if it were any bigger? Because the funnel is still not big enough. In some scenarios, the funnel causes hiccups or stutters, or just a slow-down in the overall output.

You can always have a bigger funnel, and that will always give "faster" flow in some cases, but overall performance would not be affected a whole lot*. Only when performance is lost or gained linearly can we talk about a real bottleneck, and we are not talking about one then. Increasing a certain bandwidth or fillrate (throughput in general) will always affect performance to some degree, if only because of the improved availability and easier synchronization between the different rendering stages ("funnels") thanks to smaller delays (waiting times). But as I said, it's not by much.

* And by overall I mean all the tests conducted by W1zzard, which are many.
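The funnel picture from earlier, as code (Python): the narrowest stage sets the frame rate, so widening a stage that isn't the narrowest barely moves the result. All stage capacities here are invented for illustration:

Code:
# The "funnels" analogy: overall throughput tracks the narrowest stage.
def fps(stage_caps):
    return min(stage_caps.values())  # real GPUs overlap stages, so gains
                                     # aren't exactly zero, but close

stages = {"setup": 70, "shading": 90, "texturing": 75, "bandwidth": 120}
print(fps(stages))            # 70 -> setup-limited

stages["bandwidth"] *= 1.5    # a big memory overclock...
print(fps(stages))            # still 70: hardly any overall gain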
 

Bo_Fox

Hey, that's cool.

A bottleneck might not disappear in such a linear manner. It's almost like having 1GB of memory versus 512MB.

So many people were clamoring that 1GB was a waste for an HD 4870 compared to 512MB, but I gladly paid extra for the 1GB with my 4870.

When I got an X1900XTX on the day it was released, I thought for a long time that it had more than enough memory bandwidth with 1550MHz GDDR3. Overclocking the memory by 100MHz hardly yielded any results at all. However, when the X1950XTX was released with 2000MHz GDDR4 memory (proven to have "equal" latencies), it proved the world wrong, to the point where people were willing to pay an extra $100 just for the 450MHz faster memory alone.

Perhaps the only high-end card that ever had "more than enough" bandwidth was the HD2900XT.

The bottom line here is that as we move forward, we appreciate boosts in technical specifications. A 5870 sports a 100% increase in core specifications but only a 23% increase in memory bandwidth over a 4890. I am one-sided in clamoring for more bandwidth, and not just for argument's sake. This side will always win out, since we always advance toward more in the future, in the long run at least. Last year, the side against 1GB versus 512MB lost out as the new 4870 1GB started to beat the GTX 260 overall. Only the Core 216 version could compete well against it, but then there was the 4890, which was in turn countered by the GTX 275, as Nvidia wanted to stay on top by at least 1% "overall" (which is rather subjective among peers in general).

We will always need memory bandwidth, no matter what the theoretical fill-rate tests say a chip can output. GDDR5 was not designed in vain, nor was GDDR4. I would just like to see ATI use that 512-bit bus once again, like they did with the HD2900XT cards. I expect an overall 28% increase in performance with 512-bit memory using 4800MHz effective GDDR5.
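That 28% expectation can be framed with a simple Amdahl-style split, where only the bandwidth-bound share of frame time speeds up when the bus doubles; the 44% share below is back-figured to reproduce the number, not a measurement:

Code:
# Amdahl-style estimate for a hypothetical 512-bit HD 5870.
def speedup(bw_bound_fraction, bw_multiplier):
    return 1 / ((1 - bw_bound_fraction) + bw_bound_fraction / bw_multiplier)

print(speedup(0.44, 2.0))  # ~1.28 -> roughly the +28% hoped for above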
 

Deleted member 67555

Guest
Wow, a single 5770 seems to perform slightly better than 2 4830's in Crossfire.
 