
HD 5870 Discussion thread.

Discussion in 'AMD / ATI' started by a_ump, Oct 25, 2009.

Thread Status:
Not open for further replies.
  1. wolf

    wolf Performance Enthusiast

    Joined:
    May 7, 2007
    Messages:
    5,541 (2.08/day)
    Thanks Received:
    842
    It only piqued my interest because I actually owned a 512MB original-run 9800GTX back in the day; oh, how I wished it was 1GB...

    :toast:
  2. Bo_Fox

    Bo_Fox New Member

    Joined:
    May 29, 2009
    Messages:
    480 (0.25/day)
    Thanks Received:
    57
    Location:
    Barack Hussein Obama-Biden's Nation
    Not to sound like a broken record, but just as with the 5770, giving the 5870 more bandwidth would unleash a lot more from the card. A 4890 is 20% faster than a 5770 despite the 5770 having slight architecture optimizations, and the 4890 has 62.5% greater memory bandwidth than the 5770. That's the only thing holding the 5770 back.

    Let's give the 5870 100% greater bandwidth by doubling the bus to 512-bit; perhaps ATI will actually release such a card in a few months. We'd be seeing 25-30% performance gains from 100% greater memory bandwidth, given that the 5870's "half-sized" derivative still could not get within 20% of a 4890 that had 62.5% more bandwidth.
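
    For anyone who wants to sanity-check those percentages, here's a quick back-of-the-envelope sketch in Python (stock memory clocks assumed; the 512-bit 5870 is purely hypothetical):

    Code:
    def bandwidth_gb_s(bus_bits, data_rate_gbps):
        # GB/s = (bus width in bits / 8) * effective data rate per pin (Gbps)
        return bus_bits / 8 * data_rate_gbps

    cards = {
        "HD 4890 (256-bit, 3.9 Gbps GDDR5)": bandwidth_gb_s(256, 3.9),  # 124.8 GB/s
        "HD 5770 (128-bit, 4.8 Gbps GDDR5)": bandwidth_gb_s(128, 4.8),  #  76.8 GB/s
        "HD 5870 (256-bit, 4.8 Gbps GDDR5)": bandwidth_gb_s(256, 4.8),  # 153.6 GB/s
        "hypothetical 512-bit 5870":         bandwidth_gb_s(512, 4.8),  # 307.2 GB/s
    }
    for name, bw in cards.items():
        print(f"{name}: {bw:.1f} GB/s")

    print(f"4890 over 5770: +{(124.8 / 76.8 - 1) * 100:.1f}%")                 # +62.5%
    print(f"512-bit 5870 over stock 5870: +{(307.2 / 153.6 - 1) * 100:.0f}%")  # +100%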

    I think it could actually be up to a 35% gain overall, as a bigger chip needs even more memory bandwidth to chug along without stuttering hiccups of any kind (in some cases, when a bottleneck is removed, there are much bigger jumps in performance). Nvidia fully realizes this and is probably secretly delaying with re-spins to ensure that its card will still beat a beefed-up 512-bit 5870. Of course, the 5970 (5870X2) is sure to be released before we'd see anything 512-bit again for a while.

    ATI does not need to do 512-bit right now because of the lack of competition. Rich people do not care too much, because they can buy 3 or 4 of those 5870s and pray that whatever games they play still scale nicely with 3x or 4x Crossfire. ATI does not want to do this until a 5870X2 is out first (and a 512-bit X2 would be too complicated anyway).

    It's nice to try to solve the mystery of a 5870 performing below expectations from as many angles as possible, but there has been overwhelming evidence with a 5770 versus a 4890. Only logic can solve the mystery.


    EDIT: It's only my opinion, really.
    Last edited: Nov 6, 2009
  3. SNiiPE_DoGG New Member

    Joined:
    Apr 2, 2009
    Messages:
    582 (0.30/day)
    Thanks Received:
    135
    Yeah, but bandwidth is less important with the 5870. The "lack" of a performance increase matching that of dual GPUs is due more to diminishing returns from scaling up the architecture, and to drivers not fully addressing the card's resources, than to bandwidth.

    I don't think the 5770/4890 is an applicable comparison to the 5870/4870X2.
  4. Bo_Fox

    Bo_Fox New Member

    Joined:
    May 29, 2009
    Messages:
    480 (0.25/day)
    Thanks Received:
    57
    Location:
    Barack Hussein Obama-Biden's Nation
    It's not exactly about comparing the 5870 to a 4870X2. It's just that the half-sized derivative of the 5870 (the 5770) is obviously held back by having only half the 5870's bandwidth.

    The 5870 could very well be held back even more, as it would need even more bandwidth to overcome diminishing returns in a bottleneck scenario. Just my 2 cents.
  5. 20mmrain

    20mmrain

    Joined:
    Oct 6, 2009
    Messages:
    2,772 (1.55/day)
    Thanks Received:
    825
    Location:
    Midwest USA
    I didn't get to benchmark my 5870 till recently! While my 5870 does play all games at really nice FPS, in the benchmarking I just did my overall 3DMark Vantage score is 13880 at stock speeds with a Q9550 @ 3.4GHz. I know I'll be able to get more out of the card by overclocking the card itself, but I'm also really starting to think that something else is holding back its true power, whether it be drivers or something else. If it is a ploy by ATI, I just wish they'd start rolling out the improvements soon!

    It's a conspiracy!
  6. Bo_Fox

    Bo_Fox New Member

    Joined:
    May 29, 2009
    Messages:
    480 (0.25/day)
    Thanks Received:
    57
    Location:
    Barack Hussein Obama-Biden's Nation
    Yeah, the grand conspiracy of being unable to do 512-bit with GDDR5 memory!!! :laugh::toast:
  7. 20mmrain

    20mmrain

    Joined:
    Oct 6, 2009
    Messages:
    2,772 (1.55/day)
    Thanks Received:
    825
    Location:
    Midwest USA
    See you heard about it too. I knew it was true LOL :)
  8. jmcslob

    Joined:
    Mar 14, 2009
    Messages:
    2,897 (1.46/day)
    Thanks Received:
    457
    Location:
    Internet Heaven
    Can Someone Post A Conclusion

    I don't feel like reading this whole thread...
    Can someone do a quick conclusion please? ;)
  9. Mussels

    Mussels Moderprator Staff Member

    Joined:
    Oct 6, 2004
    Messages:
    42,125 (11.67/day)
    Thanks Received:
    9,461
    people think the 5870 aint as fast as it should be, and are speculating wildly as to why, and if it will get better via drivers.
    Bo_Fox and jmcslob say thanks.
  10. jmcslob

    Joined:
    Mar 14, 2009
    Messages:
    2,897 (1.46/day)
    Thanks Received:
    457
    Location:
    Internet Heaven
    OK, I'm sure it's a driver thing, as there's no real hurry to make these cards as spectacular as possible for now. :D As it stands, I don't think any games will really tax these things, plus the competition has a release coming soon, right?
  11. wolf

    wolf Performance Enthusiast

    Joined:
    May 7, 2007
    Messages:
    5,541 (2.08/day)
    Thanks Received:
    842
    :respect: way to sum up 9 pages in 25 words and a number.
  12. grimeleven New Member

    Joined:
    Oct 10, 2009
    Messages:
    19 (0.01/day)
    Thanks Received:
    8
    A 512-bit memory bus "upgrade" on the 5870 isn't possible; they would need to redo the architecture, so that's one for future products.

    Though this article talks about CPUs, http://www.physorg.com/news151158992.html (a similar explanation still applies to ATI's SPUs).

    QFT

    *Edit* Hehe, bring back the HD2900 Pro/XT's 1024-bit bi-directional ring bus (512-bit read and 512-bit write, with 8 64-bit memory channels for a total external bus width of 512 bits)! Pricey, I know... :D
    http://www.anandtech.com/showdoc.aspx?i=3367&p=5
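
    If anyone wants the ring-bus arithmetic spelled out, here's a tiny sketch (the 2900 XT data rate below is recalled from memory, so treat the GB/s figure as approximate):

    Code:
    # R600's external memory interface: 8 channels x 64 bits each.
    channels, bits_per_channel = 8, 64
    external_bus = channels * bits_per_channel   # 512-bit
    ring_bus = external_bus * 2                  # bi-directional: 512 read + 512 write = "1024-bit"

    # HD 2900 XT: 512-bit GDDR3 at roughly 1.65 Gbps effective (approximate figure)
    bandwidth = external_bus / 8 * 1.65          # ~105.6 GB/s
    print(external_bus, ring_bus, f"{bandwidth:.1f} GB/s")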
    Last edited: Nov 7, 2009
  13. Bo_Fox

    Bo_Fox New Member

    Joined:
    May 29, 2009
    Messages:
    480 (0.25/day)
    Thanks Received:
    57
    Location:
    Barack Hussein Obama-Biden's Nation
    Yeah, pricey, but an HD2900XT really wasn't any more expensive than a 5870. Well, maybe just a tiny bit more, but 5870 prices are being jacked up due to supply and demand.

    Here's a bit from a guy who wrote well concerning this subject, but he had a few errors (1024-bit ringbus, not 512-bit, etc..) and a wild assumption that using 512-bit would take up a quarter of the silicon real estate.

    -by PorscheRacer (http://www.anandtech.com/video/showdoc.aspx?i=3643&cp=30#comments)
  14. Zubasa

    Zubasa

    Joined:
    Oct 1, 2006
    Messages:
    3,980 (1.38/day)
    Thanks Received:
    457
    Location:
    Hong Kong
    On the other hand, the 512-bit bus did not prevent the 2900XT from getting destroyed by the G80.
    That led to AMD redesigning the R600 into the RV670 (the 3870), which in many ways surpassed the R600, and the RV770 further optimized the architecture.
    Cypress simply builds on top of the RV770's success.

    Isn't it interesting that the HD 4670 with 128-bit GDDR3 performs close to the HD 2900 Pro?
    In fact, the HD 4670 surpasses the 2900XT when AA is used; the R600/RV670 simply gets destroyed with AA.
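
    Putting rough numbers on that (reference clocks recalled from memory, so treat them as approximate), the 4670 gets by on roughly a third of the 2900 Pro's bandwidth:

    Code:
    # Very rough comparison -- clocks recalled from memory, not verified.
    def bw(bus_bits, gbps):
        return bus_bits / 8 * gbps   # GB/s

    hd4670    = bw(128, 2.0)   # 128-bit GDDR3 @ ~2.0 Gbps effective ->  32.0 GB/s
    hd2900pro = bw(512, 1.6)   # 512-bit GDDR3 @ ~1.6 Gbps effective -> 102.4 GB/s
    print(f"HD 4670: {hd4670:.1f} GB/s, HD 2900 Pro: {hd2900pro:.1f} GB/s "
          f"(~{hd2900pro / hd4670:.1f}x the bandwidth, similar performance)")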
    Last edited: Nov 7, 2009
  15. grimeleven New Member

    Joined:
    Oct 10, 2009
    Messages:
    19 (0.01/day)
    Thanks Received:
    8
    Yeah, I found that post earlier too. Also check this out (from Eric Demers, architecture lead on R600)
    ... was he fired or something? I guess they went the "cheap" route.
    http://www.beyond3d.com/content/interviews/39/5
    Bo_Fox says thanks.
  16. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,192 (2.55/day)
    Thanks Received:
    1,137
    This is for everyone theorizing about a fact that has been proven again and again in this thread.

    It doesn't need more bandwidth; there is enough. We need to move past the idea that the setup engine should be providing double the amount of information. Through the obvious use of the tessellation hardware, ATI intends this card to deliver acceptable framerates with uber-high-end graphics. Did they try to steal the performance crown from Nvidia with the 4xxx series? No, and they openly said so in interviews. They went for the mainstream market, and once they had that, they threw two chips on a board and went for the top.


    Everyone is forgetting this is a new spin on an old design that worked well. The old chips were not bandwidth limited, but tessellation needs more surface work done, and that is not an external memory bandwidth issue, it is an internal cache issue, which is why the internal cache has such insane bandwidth. The raw amount of data needed for good resolution and great-looking visuals no longer needs to increase by an insane amount; instead the hardware takes care of generating thousands of extra triangles and the surface rendering, leaving the models smaller and more compact, thus using... less bandwidth.


    So we get a much better-looking surface with fewer original triangles and fewer original vertex points, the tessellator being used to generate thousands if not millions of extra points and lines.
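
    A very rough illustration of the "smaller models, less bandwidth" point (the mesh size, vertex layout and tessellation factor below are made up purely for illustration): the coarse control mesh stored in VRAM stays small, and the tessellator amplifies it on-chip.

    Code:
    # Illustrative only: made-up mesh sizes showing why on-chip tessellation
    # keeps external memory traffic for geometry small.
    bytes_per_vertex = 32                 # position + normal + UV (assumed layout)
    base_vertices = 10_000                # coarse control mesh stored in VRAM
    tess_factor = 16                      # amplification done inside the GPU

    expanded_vertices = base_vertices * tess_factor ** 2   # generated on-chip, never stored in VRAM

    vram_kb = base_vertices * bytes_per_vertex / 1024
    pretessellated_kb = expanded_vertices * bytes_per_vertex / 1024
    print(f"fetched from VRAM:         {vram_kb:,.0f} KB")
    print(f"if stored pre-tessellated: {pretessellated_kb:,.0f} KB")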


    Zubasa says thanks.
  17. Zubasa

    Zubasa

    Joined:
    Oct 1, 2006
    Messages:
    3,980 (1.38/day)
    Thanks Received:
    457
    Location:
    Hong Kong
    I would certainly have fired the guy, simply because the R600 got its ass handed to it by the G80. :nutkick:
    The engineers simply didn't know what they needed to make the GPU perform. :shadedshu
  18. Bo_Fox

    Bo_Fox New Member

    Joined:
    May 29, 2009
    Messages:
    480 (0.25/day)
    Thanks Received:
    57
    Location:
    Barack Hussein Obama-Biden's Nation
    I already stated a few posts ago that the R600 chip had a "flawed" architecture in which the antialiasing work was offloaded onto the shader units (and there were only 320 shaders). Performance was bottlenecked by the TMUs and shaders, not the memory. Once again, to restate my words from a few posts ago: the HD2900XT was probably the only card that was not bandwidth-limited. To put it with a bit more clarity, it was the chip least likely to be bandwidth-limited at any game/resolution/setting.

    ATI was just doing whatever they could with that poorly designed R600 chip, which couldn't hold a candle to an 8800GTX that was several months older. ATI does not need to do that with the 5870, due to the lack of competition. The R600 architecture was designed with the future in mind, when the shader units would be doubled or quadrupled so that the architecture would make sense. It has probably damaged the reputation of 512-bit memory for a few enthusiasts.

    I wish I could thank you twice for this quote that proves that using 512-bit does not require much more silicon die size area. :toast:

    [attached image]

    Just as the 5770 would do at least 20% better with the 4890's bandwidth (which is 62.5% greater), the 5870 would do ~30% better with 512-bit bandwidth (100% greater than what the 256-bit bus gives it right now).

    Actually, in the chart above, the 4890 is only 19% faster than the 5770, which has identical specs except for its lower memory bandwidth; but that is partly because of the slight architecture optimizations in the HD 5000 series. So I'd add in about 3% for the optimizations and say that a 5770 would be 22% faster if it had the same bandwidth as a 4890.

    Now, look at the chart below at how increasing the memory bandwidth of a 4890 still gave consistent gains (except in Fallout 3, which was probably already core-limited by the TMUs, the shaders, or possibly even the ROPs):

    [chart: HD 4890 performance scaling with its memory overclocked to 5870-level bandwidth]
    source: Anandtech

    That's a 4890 with the bandwidth of a 5870. A chip with half the theoretical fillrate, half the theoretical TFLOPs, half the TMUs, half the ROPs, half the shaders, and without the slight architecture optimizations, still benefited from greater bandwidth in every single game.

    Imagine a 5770 with the 5870's bandwidth (256-bit memory at 4.8 Gbps). It'd be showing more than a 22% increase for sure (plus an additional 3% due to the difference between the 4890's and the 5870's bandwidth, as shown in the chart above).

    A 5770 is exactly half of a 5870, and even it would benefit from having the 5870's bandwidth. Imagine how starved the 5870 chip (2x a 5770) must be for more bandwidth!
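
    One crude way to frame that estimate: treat a fraction of the frame time as bandwidth-bound and the rest as core-bound, fit that fraction from the "62.5% more bandwidth gave ~19%" data point, then ask what doubling the bandwidth would give. This is just a toy model, not anything measured:

    Code:
    # Toy model: a fraction f of frame time scales with memory bandwidth,
    # the rest is core-limited.
    def speedup(f, bw_ratio):
        return 1.0 / ((1.0 - f) + f / bw_ratio)

    # Fit f from the 5770 -> 4890 data point: +62.5% bandwidth gave ~+19% performance.
    observed = 1.19
    f = (1.0 - 1.0 / observed) / (1.0 - 1.0 / 1.625)
    print(f"bandwidth-bound fraction: {f:.2f}")                                     # ~0.42

    # What would a hypothetical 2x bandwidth give a chip with that same split?
    print(f"estimated gain from 2x bandwidth: {(speedup(f, 2.0) - 1) * 100:.0f}%")  # ~26%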

    [attached image]

    A sales pitch: Memory does matter!

    By the way, overclocking the core of a 4890 up to 1GHz shows how it's even more starved for memory bandwidth, with much bigger gains across the chart when increasing the memory from 3.9Gbps to 4.8Gbps:

    [chart: HD 4890 at 1GHz core, memory scaled from 3.9 Gbps to 4.8 Gbps]
    source: Anandtech
    Last edited: Nov 7, 2009
    skylamer says thanks.
  19. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,192 (2.55/day)
    Thanks Received:
    1,137
    Yes, the 18-and-three-quarter-percent memory overclock had a huge 3% maximum effect... or maybe it's just the improvement in effective timings that gives the benefit, same as DDR2 did over DDR, same as DDR3 is doing over DDR2; it's just the evolution of the beast. The "theoretical" has no place when you have hard facts to work with.


    For example, let's consider DDR at 400MHz with 5-5-5-15 timings. I had a set back when that was the cream of the crop. It lost to DDR2 with 10-12-10-32 timings in bandwidth purely due to the speed, unless you filled it to the last row, column, and chip on the furthest stick; then it might have come close.

    You must realise that unless you are dealing with the last row, last column, last chip, the speed increase in this case is always going to benefit latency/cache misses, so the wait time for the core will be less, and it has absolutely nothing to do with the bandwidth.

    So when we divide out whatever the timings are, i.e. the number of ns it takes to get data, by the increase in speed and the shorter delay, we will find that the result is close to the "theoretical" increase shown in the chart, and it has nothing to do with being able to deliver 2GB of data in one second less.
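
    For what it's worth, the usual way to compare latencies across DDR generations is to convert CAS cycles into nanoseconds. The CL/speed pairs below are typical retail parts, picked purely for illustration:

    Code:
    # First-word CAS latency in ns = CL cycles / I/O clock (MHz) * 1000,
    # where the I/O clock is half the effective ("DDR") data rate.
    def cas_ns(cl, effective_mhz):
        return cl / (effective_mhz / 2) * 1000

    for name, cl, rate in [("DDR-400  CL3", 3, 400),
                           ("DDR2-800 CL5", 5, 800),
                           ("DDR3-1600 CL9", 9, 1600)]:
        print(f"{name}: {cas_ns(cl, rate):.1f} ns")

    # All land in the same ~11-15 ns ballpark: higher clocks offset higher CL
    # counts, while peak bandwidth roughly doubles each generation.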
    Last edited by a moderator: Jan 13, 2010
    HalfAHertz says thanks.
  20. Mussels

    Mussels Moderprator Staff Member

    Joined:
    Oct 6, 2004
    Messages:
    42,125 (11.67/day)
    Thanks Received:
    9,461
    I think your examples got mixed up: DDR2 was where CL5 came in, and CL10 for DDR3.
    Steevo says thanks.
  21. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,192 (2.55/day)
    Thanks Received:
    1,137
    :toast: This is me with a few shots in me, just got back from... uhh... yeah.
  22. Bo_Fox

    Bo_Fox New Member

    Joined:
    May 29, 2009
    Messages:
    480 (0.25/day)
    Thanks Received:
    57
    Location:
    Barack Hussein Obama-Biden's Nation
    Hey Steevo, it's not about who's getting bent. You have no way of knowing what the latencies are.

    Anyway, I already stated several posts ago that the latencies of the X1950XTX's GDDR4 memory were shown to be quite similar to those of the X1900XTX's GDDR3, given the identical performance when the GDDR4 was downclocked to the same speed.

    That's pretty much all I can say. It's not a matter of being right or wrong, or of winning over others; it's only a matter of knowledge and understanding. Perhaps you really think you understand the insignificance of memory bandwidth, but that is not where the truth lies. To put money where the mouth is, look at how much more an X1950XTX sold for than an X1900XTX. Would you trade your 4890 for my 5770 (if we had those cards in our possession)? No, I do not think so.

    I'm just trying to pacify something that's going on here. That's all there is to it.

    I mean, why are we even discussing latencies here in the first place? Why is there such a need to try as hard as possible to blame everything else but the memory bandwidth?
  23. bobzilla2009 New Member

    Joined:
    Oct 7, 2009
    Messages:
    455 (0.26/day)
    Thanks Received:
    39
    Right, I'm going to do some investigating into the effect of memory overclocking on my HD5870. The core clock will be set to 900MHz for all the following results; I'll update this post as I do more benchmarks.

    First up is Left 4 Dead.
    A timedemo was recorded for the first level of No Mercy, from the rooftop start point to entering the safe room door, with framerates recorded with Fraps. Any human error is reflected in the margins given. Settings: 1680x1050, maximum detail, 8xMSAA, no vsync.

    900/1200: 118fps average +/- 2
    900/1250: 117fps average +/- 2
    900/1300: 118fps average +/- 2

    So for the first test we see no gain from memory overclocking outside the bounds of human error. From an 8% memory increase, we would expect at least a small gain in the region of 3-4 fps in a memory-bottlenecked system. It is worth noting that GPU usage was around 60% at most while CPU usage was 50% at most, so neither was pushed to the limit; the L4D test should therefore be taken with a pinch of salt.

    UNIGINE DX11 BENCHMARK

    Moving on to one of the more extreme GPU tests for the HD5870: this benchmark pushed GPU usage up to 100% as expected, so memory bandwidth should be more of a factor than in the L4D test. Settings used were 1680x1050, 4xMSAA, all settings highest and tessellation on (vsync off, of course).

    900/1200: FPS 31.9 [score 803]
    900/1250: FPS 32.3 [score 814]
    900/1300: FPS 32.8 [score 825]

    This time we see a linear increase as memory speed goes up. However, the gain is only in the region of 1% per step. If the card were truly bottlenecked we would expect a more drastic increase, but as it stands the speed of the card appears to be pretty well matched to its memory speed. Of course, that may all change as ATI improve the drivers to unleash more of the card's power.

    Comparing across generations is not really feasible either: if the HD4890 was bottlenecked by its memory bus, how can the HD5870 be so much faster using essentially the same memory? The increase in bandwidth is only around 23%, but the increase in overall speed is around 60-80%. The numbers just don't add up. Also remember that ATI aren't idiots: if they had an epic card that would be severely bottlenecked by a 256-bit bus, they wouldn't have used one. The card would cost more, that's for sure, but it would also be a hell of a lot faster if it really were severely bottlenecked. My experimenting with memory clocks has not yielded anything to indicate such a bottleneck thus far (and I have done other tests besides the ones shown here), but as stated earlier, future driver improvements will likely prove whether or not the bottleneck exists.

    However, it could well be that memory speed has far less effect at the resolution I am testing at, especially in L4D. If that is the case, I'm not too worried about the whole thing :)
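
    For anyone following along, here's the scaling math on those Unigine numbers (a quick sketch of the arithmetic, nothing more):

    Code:
    # Scaling efficiency from the Unigine runs above: what fraction of the
    # memory overclock shows up as extra FPS?
    baseline_mem, baseline_fps = 1200, 31.9
    for mem, fps in [(1250, 32.3), (1300, 32.8)]:
        mem_gain = mem / baseline_mem - 1
        fps_gain = fps / baseline_fps - 1
        print(f"{mem} MHz: memory +{mem_gain:.1%}, FPS +{fps_gain:.1%}, "
              f"efficiency {fps_gain / mem_gain:.2f}")
    # ~0.3: only about a third of the extra bandwidth translates into frame
    # rate -- somewhat bandwidth-sensitive, but not hard-limited.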
    Last edited: Nov 8, 2009
    Bo_Fox, Steevo and HalfAHertz say thanks.
  24. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,192 (2.55/day)
    Thanks Received:
    1,137
    WTFBBQ!!!!!!


    The increase was from the marginally faster data delivery on cache misses rather than from the extra bandwidth. XYZOMG!!!!


    Now can we please discuss the new hardware the card has?


    Can you run a set with tessellation on and off, Bob?
  25. Zubasa

    Zubasa

    Joined:
    Oct 1, 2006
    Messages:
    3,980 (1.38/day)
    Thanks Received:
    457
    Location:
    Hong Kong
    You know, some people just can't get over themselves :p
    They always think the 5870 needs more bandwidth because the nVidia cards are supposed to have more.
