ATI Radeon HD 4800 Series Video Cards Specs Leaked

Discussion in 'News' started by malware, Apr 24, 2008.

  1. lemonadesoda

    lemonadesoda

    Joined:
    Aug 30, 2006
    Messages:
    6,247 (2.14/day)
    Thanks Received:
    963
    Please note the word "if": if you are calling bump mapping a geometry effect (which it is), then all well and good. I did not SAY geometry = bump mapping.

    As for my second statement (if "geometry" = "more complex objects", then no, shaders won't help, and so not so great for CAD): YES, I withdraw that statement. It is wrong for unified shader architectures (DirectX 10, Shader Model 4.0). It is only true for previous-generation GPUs.
  2. DarkMatter New Member

    Joined:
    Oct 5, 2007
    Messages:
    1,714 (0.68/day)
    Thanks Received:
    184
    No, no, no... you understood it wrong. In your image, where it says shader core, that's not 1 shader processor, it's the entire shader array. The next stage can be calculated in any available ALU within the core. To explain this simply I will use G80 as an example, since its SPs are fully scalar. R600 is more complicated because it needs some pre-arrangement, but it works the same way in the sense that the next stage of the same fragment, or the next fragment within the same stage, can be calculated in the next available unit. In other words, you can either do A -> B -> C -> D for one pixel, or calculate several pixels in stage A together and then continue. The latter is how they work nowadays.

    Example: G80 GTX has 128 SPs. Imagine you want to calculate vertex data. Vertices are represented by x, y and z coordinates, and each one is a floating-point variable. We are going to say vertex 1 is V1(x1, y1, z1), vertex 2 is V2(x2, y2, z2)... vertex n is Vn(xn, yn, zn). In the SP core (of 128), each dimension can be calculated in 1 ALU, which belongs to 1 SP. (There's controversy here, as Nvidia said each SP is capable of 2 per clock, but it seems it can't.)

    It works like this:

    clock cycle 1 : sp1 runs x1 - sp2 runs y1 - sp3 z1 - sp4 x2 - sp5 y2 - ... - sp127 x43 - sp128 y43 <<< as you can see V43 is not finished yet, but it doesn't matter because:

    clock cycle 2 : sp1 z43 - sp2 x44 - ...
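
    As a rough illustration (not Nvidia's actual scheduler), the schedule above can be sketched in Python. The name schedule and its parameters are made up for this example; it assumes one scalar component per SP per clock and simply packs components in order:

```python
def schedule(num_vertices, num_sps=128):
    """Pack the x, y, z components of each vertex onto SPs, one per SP per clock.

    Hypothetical model for illustration only: real hardware also handles
    threads, batching and memory latency, none of which is modelled here.
    """
    # Flatten the vertices into a stream of scalar work items: x1, y1, z1, x2, ...
    components = [f"{axis}{v}" for v in range(1, num_vertices + 1) for axis in "xyz"]

    # Each clock cycle takes the next num_sps items, regardless of vertex boundaries.
    return [components[i:i + num_sps] for i in range(0, len(components), num_sps)]


if __name__ == "__main__":
    for cycle_no, work in enumerate(schedule(num_vertices=50), start=1):
        print(f"clock cycle {cycle_no}: sp1 runs {work[0]} ... "
              f"sp{len(work)} runs {work[-1]} ({len(work)} SPs busy)")
```

    Since 128 is not a multiple of 3, each cycle ends partway through a vertex and the leftover components simply open the next cycle, so no SP sits idle while work remains.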

    And so on. Imagine we have a core with 64 SPs running at 2x the speed. The resulting throughput (GFlops) is exactly the same, and thus the code is going to be calculated just as fast. Same if we have 256 SPs running at half the speed. There won't be any spare SP at any time, unless:

    A: It can't fetch enough data from the memory pool or the frame buffer, whatever the reason: other units are slow, not enough data sent by the CPU...

    B: The unit that has to continue the work, i.e. the ROPs, can't keep up and has told the shader core to stop because the frame buffer is full of unprocessed data.

    You can mix data types in the above example too, as long as they don't belong to the same cluster (I think). G80 and G92 have clusters of 16 SPs; GTX and G92 GTS have 8 clusters (8x16=128), GT has 7 clusters. I don't think different data types are allowed within the same cluster, but I wouldn't bet a leg on it either...
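
    The "fewer SPs at a higher clock" argument can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are invented, the clock values are hypothetical round numbers rather than real G80 clocks, and it assumes 1 scalar operation per SP per clock with neither stall A nor stall B occurring:

```python
import math


def scalar_ops_per_second(num_sps, clock_hz, ops_per_sp_per_clock=1):
    """Peak scalar throughput of an idealised shader core (no stalls)."""
    return num_sps * clock_hz * ops_per_sp_per_clock


def time_to_process(num_scalar_ops, num_sps, clock_hz):
    """Seconds needed for a batch of scalar operations, one per SP per clock."""
    cycles = math.ceil(num_scalar_ops / num_sps)
    return cycles / clock_hz


if __name__ == "__main__":
    # Hypothetical configurations: same SP-count x clock product in each case.
    configs = [(64, 2.0e9), (128, 1.0e9), (256, 0.5e9)]
    work = 3 * 1280  # x, y, z for 1280 vertices

    for sps, clock in configs:
        print(sps, "SPs @", clock / 1e9, "GHz:",
              scalar_ops_per_second(sps, clock) / 1e9, "G ops/s,",
              time_to_process(work, sps, clock), "s for the batch")
```

    All three configurations report the same peak rate and the same batch time here. The times can differ slightly when the amount of work does not divide evenly by the SP count, because the last cycle is then only partly filled.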
    Last edited: May 16, 2008
  3. HAL7000

    HAL7000 New Member

    Joined:
    Jul 28, 2007
    Messages:
    263 (0.10/day)
    Thanks Received:
    23
    Location:
    Nashville TN
    And to think, after all is said and done... we still need to wait and see. Good conversation on everyone's part. A post of the good, the bad and the ugly... lol.

    Let's hope Nvidia's releases get as many arguments.

    :toast:
