
HD 5870 Discussion thread.

Discussion in 'AMD / ATI' started by a_ump, Oct 25, 2009.

Thread Status:
Not open for further replies.
  1. jmcslob

    Joined:
    Mar 14, 2009
    Messages:
    2,935 (1.39/day)
    Thanks Received:
    477
    Location:
    Internet Heaven
    Wow guys! :toast: You go ahead and figure this out ;)
    I'll buy the revision :D with the better memory.
    Thanks for the heads up... Btw, are two 5770's worth it, or are two 5750's good?
     
  2. mrsemi

    Joined:
    May 22, 2007
    Messages:
    658 (0.24/day)
    Thanks Received:
    67
    Well, I can't speak for the ATI 4 series, but I had two GTX 280's and a 295, and both of the SLI setups suffered horribly with an i7 at stock. Perhaps I'd have seen a difference overclocked, but I'd say microstutter for Nvidia is not a thing of the past.
     
  3. Benetanegia

    Benetanegia New Member

    Joined:
    Sep 11, 2009
    Messages:
    2,683 (1.39/day)
    Thanks Received:
    694
    Location:
    Reaching your left retina.
    I have said it many times and still maintain that the problem is, IMO, in the thread dispatch processor/setup engine.

    1- Both RV770 and RV870 tout the same peak execution of 32k (kilo) threads, so probably the TP/SE has not been changed.

    2- It's been said that RV870 is the exact same architecture as RV770 + DX11 support on the shaders, so probably only the ISA on the shaders has changed, if at all.

    3- I know comparing different architectures is kinda stupid, but it can be valid as a guideline. Nvidia's GT200 had 32k peak threads too, but they have already said (I think it was in the Fermi white paper) that in reality it could only do 10-12k, and that was part of the reason for the "lacking" performance of GT200, at least at launch. Fermi will have 24k peak only, but thanks to 16 kernels and 2 different dispatch processors they think they will be able to max it out. So even if we can't compare architectures directly, we do know that one of the companies did a thorough study on their hardware to test usage, saw that their 32k thread processor (10-12k in practice) would not cut it, and decided to put in two, different/weaker ones, but two.

    We could speculate whether AMD's dispatch processor was more efficient or not, but given the performance similarity it most probably had a similar one, plus the advantage of higher clocks if anything. Now imagine it was indeed a little bit more efficient, so that the thread dispatch processor was excessive for RV770, with a lot of headroom they could not really test, because it was the rest of the chip that was holding it down. Imagine that RV770 could only do 10-12k on the shader side of things, just like GT200 did as a whole, and that AMD thought that in theory the DP/SE could really do 24k. In order to release Evergreen as fast as they did, they probably didn't touch the DP at all, since in theory it could handle 32k, and 24k according to their estimates, plenty. But what if the DP can't do 20k and only does 16k, for example? Then you have a bottleneck where you didn't think you would have one. It's not as if you could do anything without a complete redesign, so you release that, because in the end it still is a fast card (the fastest), because you will release much sooner, and because you expect to improve the efficiency of usage with future drivers.

    My two cents.
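
    To make the speculation above concrete, here is a minimal sketch in Python with purely hypothetical capacities (the 16k dispatch figure is just the "what if" from the post, not a known spec): effective throughput is capped by the slower of the dispatch stage and the shader array, so doubling the shaders stops helping once an unchanged dispatcher becomes the limit.

    ```python
    # Minimal sketch of the bottleneck argument, with hypothetical capacities
    # (thousands of in-flight threads). Not real RV770/RV870 specs.

    def effective_threads_k(dispatch_capacity_k: int, shader_capacity_k: int) -> int:
        """Effective in-flight thread count is limited by the weaker stage."""
        return min(dispatch_capacity_k, shader_capacity_k)

    # RV770-like case: the shader array is the limit, so the dispatcher looks ample.
    print(effective_threads_k(dispatch_capacity_k=16, shader_capacity_k=12))  # 12

    # RV870-like case: shaders doubled, but the unchanged dispatcher now caps it.
    print(effective_threads_k(dispatch_capacity_k=16, shader_capacity_k=24))  # 16, not 24
    ```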
     
  4. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,520 (2.56/day)
    Thanks Received:
    1,300
    Software utilization is the problem: in the game, not the DP. I was writing this as you were, but if the DP were the problem, games would all show the same performance issues.


    Game A might render using only 2 threads, and only use 1 or 2 of the shaders per cluster. Next, throw in the CPU performing the physics and some minor setup information.


    So in a Crossfire setup, card 1 is generating frame 1; it has been handed the setup and physics information from the CPU. The CPU is then unbound to start working on the next setup while that card is busy. Card 2 receives data from the CPU and starts generating frame 2. Card 1 is now done and its frame is sent to the display; during that time the CPU has generated the next physics and other data for card 1... and so on and so forth. Each card is provided data regardless of what the other card is doing.


    In the 5870, until that frame is done no other information is dispatched to the GPU, so when it is done it must wait on information from the CPU. Not a lot of info, but the basics of movement from the mouse, physics, and other user and game thread input must be sent to determine WHAT to render. So we have a lot of underutilized GPU power, and even if only one shader is being used per cluster it will still report that as activity for the cluster.
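
    Here is a rough timing sketch (made-up per-frame costs, not measurements) of the flow described in the two paragraphs above: in the alternate-frame case the CPU hands each new frame's setup to whichever card is free, so CPU work overlaps with GPU work and the pair sustains roughly half the frame time of a single card.

    ```python
    # Toy model of the alternate-frame handoff described above. The millisecond
    # costs are invented purely for illustration.

    CPU_SETUP_MS = 3.0    # hypothetical per-frame CPU setup/physics time
    GPU_RENDER_MS = 10.0  # hypothetical per-frame GPU render time

    def frame_times(num_frames: int, num_gpus: int) -> list[float]:
        """Return the completion time of each frame, alternating frames across GPUs."""
        gpu_free_at = [0.0] * num_gpus
        cpu_time = 0.0
        done = []
        for f in range(num_frames):
            cpu_time += CPU_SETUP_MS            # CPU prepares setup/physics for frame f
            gpu = f % num_gpus                  # frames alternate between the cards
            start = max(cpu_time, gpu_free_at[gpu])
            gpu_free_at[gpu] = start + GPU_RENDER_MS
            done.append(gpu_free_at[gpu])
        return done

    single = frame_times(60, 1)
    dual = frame_times(60, 2)
    print("single card, avg ms/frame:", (single[-1] - single[0]) / 59)  # ~10 ms
    print("two cards,   avg ms/frame:", (dual[-1] - dual[0]) / 59)      # ~5 ms
    ```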

    So long story short, until game devs learn to use shaders and move data processing to the GPU this card is stuck.
     
    10 Million points folded for TPU
  5. Benetanegia

    Benetanegia New Member

    Joined:
    Sep 11, 2009
    Messages:
    2,683 (1.39/day)
    Thanks Received:
    694
    Location:
    Reaching your left retina.
    No man. That definitely isn't the case. The CPU doesn't have to wait for the GPU at all. It's been almost 30 years since the CPU had to wait for anything. If what you said was true, ALL the fastest cards + SLI/Crossfire setups would run at the same fps. The CPU doesn't care if it's sending info for two cards or one card with 2x the shaders. On a dual GPU setup, physics and position (etc.) are also calculated for every frame on every GPU. As long as double buffering is used, the GPU doesn't wait until the frame has been displayed either, so it is possible that at one given clock, pixels for two separate frames are being rendered (as long as both belong to the same context, that is).

    It's not that.

    And all the games are having the same "problem": in all games the performance difference with the HD 4890 is maintained at around 50%. Different game engines use different numbers of threads, but the threads that I'm talking about are not the same ones you are talking about. Inside the GPU a thread is a fragment/pixel. And sometimes (much) more than one thread is used per (final) pixel, depending on how you made your shader program, how many passes have been used...
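
    For illustration only, the kind of thread count being talked about here scales with resolution and with how many shading passes (and overdrawn fragments) each final pixel costs; the numbers below are arbitrary examples, not figures from any real game.

    ```python
    # Illustrative count of GPU "threads" in the sense used above: one thread per
    # fragment, with multi-pass shaders and overdraw producing more than one
    # fragment per final on-screen pixel. Numbers are arbitrary.

    def fragment_threads(width: int, height: int,
                         passes_per_pixel: int = 1, overdraw: float = 1.0) -> int:
        # overdraw > 1.0 models fragments that get shaded but are later occluded
        return int(width * height * passes_per_pixel * overdraw)

    print(fragment_threads(1920, 1200))                                    # single pass
    print(fragment_threads(1920, 1200, passes_per_pixel=3, overdraw=1.5))  # multi-pass
    ```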
     
    Last edited: Nov 3, 2009
  6. Binge

    Binge Overclocking Surrealism

    Joined:
    Sep 15, 2008
    Messages:
    6,982 (3.05/day)
    Thanks Received:
    1,752
    Location:
    PA, USA
    This is 2009 and the economy has changed, but that doesn't matter. RIGHT NOW the 5870 is ATI's enthusiast gaming card. I'm sure the 5750 "can well be possible that it can run Dirt2 on 3 monitors at reasonable settings" at some point, but the 5870 was shown at a number of conventions playing Dirt 2 on 3 monitors. It also plays a number of other games very well on 3 monitors. Face it, you're changing the subject constantly to cover up your previous errors. If you want to kiss ATI on the mouth then do it, but they made a card that's not using all of its juice. It costs almost $400 now, and even higher depending on where you look. They could have given the card the luxury of a larger bus width and a dual-operation tessellator, but instead they gave us a low bus width and a single-operation tessellator.

    I never had that issue with my i7 at stock with several SLI configs and even 295s in single & SLI.
     
  7. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,520 (2.56/day)
    Thanks Received:
    1,300
    Then the game knows your every move, and the turn you made to the right is preprogrammed into the game? Probably not. That would mean the game has preprocessed every option, every possible physics situation, and every possible pixel from every possible angle, with every possible shadow or light.


    The CPU still has to handle the game thread, and the game thread still has to generate positional (vertex) information to send to the GPU as fast as possible. Games run differently based on their software threads and how they approach the handoff between the two. Thus the different performance in games as well as across system architectures.

    The GPU currently is not responsible for generating more than the pretties on top of the basic information handed to it. GPGPU or OpenCL is the beginning of the GPU doing more of the work for faster framerates and better physics, with no latency introduced by the CPU and communications layers.


    So again, think about the step-by-step process a frame takes as you turn to the left: the CPU is responsible for generating the movement from the mouse/controller input, then hands that to the game thread, which runs on the CPU, which then translates that into character movement and generates a new set of locations for the GPU to act upon. If the GPU thread generated by the game doesn't utilize all the shader hardware, then it creates an artificial bottleneck. Either way the game threads are the holdup, not the GPU core.
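
    A bare-bones sketch of that per-frame handoff, with every name hypothetical and the work reduced to stubs: input is sampled on the CPU, the game thread turns it into positions, and only then does the GPU get anything to act on.

    ```python
    # Purely illustrative per-frame loop for the handoff described above.
    # All functions are stand-ins, not any real engine's API.

    import random

    def poll_input():
        # CPU: read mouse/controller movement
        return {"turn": random.uniform(-1.0, 1.0)}

    def update_game_thread(state, user_input):
        # CPU game thread: translate input into character movement
        state["heading"] += user_input["turn"] * 0.1
        return state

    def build_draw_calls(state):
        # CPU: generate the positional (vertex) data the GPU will act upon
        return [("camera_heading", state["heading"])]

    def submit_to_gpu(draw_calls):
        # GPU only renders what it is handed; if the game's GPU work doesn't
        # feed all the shader clusters, they simply sit idle
        return len(draw_calls)

    state = {"heading": 0.0}
    for _ in range(3):
        state = update_game_thread(state, poll_input())
        submit_to_gpu(build_draw_calls(state))
    print("final heading:", state["heading"])
    ```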
     
    10 Million points folded for TPU
  8. Benetanegia

    Benetanegia New Member

    Joined:
    Sep 11, 2009
    Messages:
    2,683 (1.39/day)
    Thanks Received:
    694
    Location:
    Reaching your left retina.
    Yes, yes and yes to all that, except that the process is no different at all whether one, two or three cards are in use (and except the conclusion). According to what you say, all games would run at the same fps.

    The rendering is the final step of the process; once all the data for one frame is sent, it starts with the next. Whether the next set of data is sent to another GPU or to the same GPU that has already finished the work* is irrelevant.

    * A card with half the execution units will take twice the time to render the same frame, but since there are two cards, one does the odd frames and the other the even ones, with a 50% offset in the period. The result is the same.
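
    A quick numeric version of that footnote, using made-up render times: one card that finishes a frame in 10 ms delivers the same frame rate as two half-size cards that each need 20 ms but alternate odd and even frames.

    ```python
    # Numeric check of the AFR equivalence described above. Render times are
    # hypothetical, chosen only so that one card is exactly twice as fast.

    FAST_CARD_MS = 10.0   # one full-size GPU
    SLOW_CARD_MS = 20.0   # each of two half-size GPUs

    def fps_single(render_ms: float) -> float:
        return 1000.0 / render_ms

    def fps_afr_pair(render_ms: float) -> float:
        # two cards, odd/even frames, phase-offset by half a period:
        # each delivers 1000/render_ms frames per second, interleaved
        return 2 * (1000.0 / render_ms)

    print(fps_single(FAST_CARD_MS))    # 100.0 fps
    print(fps_afr_pair(SLOW_CARD_MS))  # 100.0 fps as well
    ```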
     
  9. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,520 (2.56/day)
    Thanks Received:
    1,300
    So there is no latency introduced by the time the GPU reports the frame done, sends that back to the CPU, and the CPU sends the next instruction set? There is. Even if it is only rendering with 70% of the GPU hardware, there is still wait time. Wait time the driver in a Crossfire setup diminishes by allowing the next frame to start rendering before the current one is finished, so there is your incremental speedup of over 100% scaling.

    Why doesn't an old game get some absurd FPS that is linearly incremental to the hardware? Latency.


    We are at that point: the GPU needs to be handling these calculations on board, or the game devs/DX need an override for frames being rendered in order, sending a new packet without the wait, flushing the buffer and starting execution on the next relevant frame. Perhaps they do, and this is the issue: frames are being dumped by the wayside and not counting.
     
    10 Million points folded for TPU
  10. phanbuey

    phanbuey

    Joined:
    Nov 13, 2007
    Messages:
    5,212 (2.01/day)
    Thanks Received:
    983
    Location:
    Miami

    I highly doubt that they accidentally didn't give it enough bandwidth or decided to go with a single-operation (didn't know that) tessellator. More likely is that these are corners which were chosen to be cut for whatever reasons, ones that are unknown to us. Perhaps by cutting these, they were able to get the cards out faster and thus made them more profitable. Maybe they took shortcuts that enabled them to make an X2 card almost simultaneously with the single GPU. Who knows.

    Point is, they have a card out and its double is ready to be released at any given moment.

    Is it below expectations? Well, if you read the specs and assumed a linear increase in performance, then yes. If you expected a kickass card within +/- 20% of the current dual-GPU options, then no.
     
    Binge says thanks.
  11. Benetanegia

    Benetanegia New Member

    Joined:
    Sep 11, 2009
    Messages:
    2,683 (1.39/day)
    Thanks Received:
    694
    Location:
    Reaching your left retina.
    There's no latency there. The CPU doesn't need the GPU reporting anything to start calculating, and not even to start sending data. Well, there might be some due to protocol handling, like 10-50 clock cycles out of the 850,000,000 in the HD 5870!!
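
    As a back-of-the-envelope check (assuming a worst case of 50 cycles of protocol overhead per frame and 60 fps, both guesses), that overhead is a vanishingly small slice of an 850 MHz core clock:

    ```python
    # Back-of-the-envelope: 50 cycles of per-frame protocol overhead (a guess)
    # against the HD 5870's 850 MHz core clock, at an assumed 60 fps.

    CORE_CLOCK_HZ = 850_000_000
    OVERHEAD_CYCLES_PER_FRAME = 50
    FPS = 60

    wasted = OVERHEAD_CYCLES_PER_FRAME * FPS / CORE_CLOCK_HZ
    print(f"{wasted:.6%} of the GPU's cycles")  # about 0.000353%
    ```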

    Old games don't get absurdly high frames because they are CPU limited, limited by their ability to calculate physics, AI and geometry, and the result is a bottleneck that affects every card configuration: every combination of GPUs gives the exact same fps. That's not the case here; in fact it's quite the opposite, because in a CPU-bottlenecked scenario the multi-GPU setup would suffer lower fps, because a lot of data must be sent twice, occupying CPU clocks.

    http://www.techpowerup.com/reviews/HIS/HD_5770/18.html - 1024x768, that is a CPU bottleneck; in that situation, yes, latencies do matter a bit, although synchronization of different clock domains plays a much more important role. In fact, here http://www.techpowerup.com/reviews/ATI/Radeon_HD_5870_CrossFire/15.html you can see better how Crossfire works out to be slower than a single HD 5870.
     
    Last edited: Nov 4, 2009
    a_ump says thanks.
  12. Binge

    Binge Overclocking Surrealism

    Joined:
    Sep 15, 2008
    Messages:
    6,982 (3.05/day)
    Thanks Received:
    1,752
    Location:
    PA, USA
    I agree with your view completely, and you understand where the voices are coming from. I also don't think it was by accident; the card's overall perf suffers from something they could have done better. It was a smart move for them to bite off what they could chew. The risk of going for the gold and the world record when NV is having problems would have been too great. I'm not saying the risk wouldn't have paid off, but they obviously beat NV by a LARGE margin, so for their business it's a total win. Innovation always suffers at the cost of risk-associated design decisions.
     
  13. Benetanegia

    Benetanegia New Member

    Joined:
    Sep 11, 2009
    Messages:
    2,683 (1.39/day)
    Thanks Received:
    694
    Location:
    Reaching your left retina.
    @ Binge and phanbuey

    Maybe I failed to make that point clear, but when I was talking about the DP and setup engine, I meant that: that they knew it would affect things somehow, but they decided it would pay off not to redesign the whole architecture. Although I don't think they knew it would affect things so much (whatever the problem is), or they would have put fewer SPs on the chip to make it cheaper.
     
  14. Binge

    Binge Overclocking Surrealism

    Joined:
    Sep 15, 2008
    Messages:
    6,982 (3.05/day)
    Thanks Received:
    1,752
    Location:
    PA, USA
    Didn't I say I understood that?
     
    a_ump says thanks.
  15. Steevo

    Steevo

    Joined:
    Nov 4, 2005
    Messages:
    8,520 (2.56/day)
    Thanks Received:
    1,300
    5770 pixels per second (W x H x FPS):

    1024x768: 174,587,904
    1680x1050: 313,639,904
    2560x1600: 386,252,800


    There are only a few reasons the ramp would not have stayed the same between the last two: a memory bandwidth limit, which is not plausible, as others have already done tests to confirm memory clock has little to do with performance; PCIe bandwidth, which also has little to do with performance; and the cards being underutilized by the software threads controlling them. Whether or not it's due to latency constraints, the hardware should have a linear rate of descent, minus a bit of overhead. The CPU can supply data at a given rate for the current frame to be rendered. I will run some numbers tonight when I get back and try a couple of games on my system at different resolutions and GPU loads. I still believe the latency, even at higher frame rates, is what is causing the questions/issues for some.
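
    Redoing that arithmetic in a couple of lines (pixel rate = width x height x frame rate), with the implied frame rates recovered from the figures above, shows how the ramp flattens at the top resolution:

    ```python
    # Recompute the 5770 pixels-per-second figures quoted above and the frame
    # rates they imply (pixel rate = W * H * FPS).

    figures = [
        ("1024x768",  1024, 768,  174_587_904),
        ("1680x1050", 1680, 1050, 313_639_904),
        ("2560x1600", 2560, 1600, 386_252_800),
    ]

    for name, w, h, pixels_per_second in figures:
        implied_fps = pixels_per_second / (w * h)
        print(f"{name}: {pixels_per_second:>11,} px/s  (~{implied_fps:.0f} fps)")
    ```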
     
    a_ump says thanks.
    10 Million points folded for TPU
  16. wolf

    wolf Performance Enthusiast

    Joined:
    May 7, 2007
    Messages:
    5,546 (1.99/day)
    Thanks Received:
    846
    You liaise with ATi's R&D department?

    They did what they did because they were able to take the crown for single GPU, and beat Nvidia to the cake. I'm pretty sure it's that simple.
     
  17. a_ump

    a_ump

    Joined:
    Nov 21, 2007
    Messages:
    3,620 (1.40/day)
    Thanks Received:
    376
    Location:
    Smithfield, WV
    Gotta say you guys are bringing up a lot of my own thoughts and wonderings with your posts, and it's a great discussion :). Though it seems most of us agree there's more potential in the HD 5870. Another reason I think they may have cut corners is, as said, that it allowed them to get control of the market before Nvidia released. By releasing this early compared to Nvidia, they'll have a good head start on their next architecture, and cutting these corners is probably helping them determine what really is going to be a factor come DX11 titles, so they'll have a better idea of how to design their next chip. IMO this gen's launch is a big win for AMD, and I'd like to see them design a new architecture instead of building on the current one, which has been used since the RV6X0 days (or was it RV5XX?). One thing I'd like to see, and I don't know much about this so I'm not sure if it'd add too much complexity or not, is ATI unlocking their shader clock from the core. I mean, think if that was the factor with the next gen. Even with only 1200 shaders but clocked at, say, 1500 with OC headroom, that would boost ATI's performance tremendously... I think :p.

    keep it up guys, this discussion is very interesting.
     
  18. Mussels

    Mussels Moderprator Staff Member

    Joined:
    Oct 6, 2004
    Messages:
    42,547 (11.42/day)
    Thanks Received:
    9,824
    That's just CoH.

    Disable Vsync and watch the FPS soar.
     
  19. 20mmrain

    20mmrain

    Joined:
    Oct 6, 2009
    Messages:
    2,774 (1.46/day)
    Thanks Received:
    826
    Location:
    Midwest USA
    It should on paper, but in real life nothing is certain. Also, a_ump, you have to remember that the drivers for the 5870 are still really young. As the card matures, the performance will definitely increase.
    Like it was stated... last gen cards were really powerful, and that a single GPU even comes close to beating a dual GPU from last gen is impressive. I own a Diamond 5870. Before I bought it I worried about the same thing you just commented on, with its performance. But you know, after I saw how much a 4870 improved after the driver updates came out, I calmed all my worries.
    I sold my EVGA GTX 285 FTW edition to get this card. That's how sure I am that after all is said and done... there will be nothing from last gen that comes close to this card once all the updates, BIOS flashes and tweaks are done.
     
  20. a_ump

    a_ump

    Joined:
    Nov 21, 2007
    Messages:
    3,620 (1.40/day)
    Thanks Received:
    376
    Location:
    Smithfield, WV
    Very true. And there's that lovely suspicion among some of us that ATI is intentionally holding back the HD 5870's performance on purpose, as it's currently selling fine and has the single-GPU performance crown.
     
  21. erocker

    erocker Super Moderator Staff Member

    Joined:
    Jul 19, 2006
    Messages:
    39,951 (13.00/day)
    Thanks Received:
    14,378
    Lol, no.
     
  22. Benetanegia

    Benetanegia New Member

    Joined:
    Sep 11, 2009
    Messages:
    2,683 (1.39/day)
    Thanks Received:
    694
    Location:
    Reaching your left retina.
    That's what I said. Look, that there's something "wrong" with the card is clear; that they released the best card this quarter is clear too. That they didn't care because they'd have had to go back to the drawing board otherwise is not so clear, but we are all saying that, and since it is an improvement over previous cards it doesn't matter anyway. They wanted the crown and they got it, but at the expense of doing a less efficient design. Who cares? Well, when it comes to market reality, no one; I don't. But I am a tech junkie and I like discussing architectures and how they affect performance, etc. So in that sense I care. It's not performing as it should, and I just want to know why.
     
    Bo_Fox says thanks.
  23. wolf

    wolf Performance Enthusiast

    Joined:
    May 7, 2007
    Messages:
    5,546 (1.99/day)
    Thanks Received:
    846
    I missed the bit where you said you liaise with their R&D, not to mention I'm allowed my opinion in not believing you :p

    You've also managed to restate the same point over and over and over, we do get it brah.
     
  24. a_ump

    a_ump

    Joined:
    Nov 21, 2007
    Messages:
    3,620 (1.40/day)
    Thanks Received:
    376
    Location:
    Smithfield, WV
    +1 to that my friend
     
  25. wolf

    wolf Performance Enthusiast

    Joined:
    May 7, 2007
    Messages:
    5,546 (1.99/day)
    Thanks Received:
    846
    I think that is true of most people who frequent the video cards section of TPU. GPU architecture is far more interesting than CPU architecture to me, especially how both camps continue to have such vastly different approaches yet end up in roughly the same spot. It's an amazing race to take part in.
     
