Tesla K20 GPU Compute Processor Specifications Released

Discussion in 'News' started by btarunr, Oct 17, 2012.

  1. btarunr

    btarunr Editor & Senior Moderator Staff Member

    Joined:
    Oct 9, 2007
    Messages:
    28,416 (11.29/day)
    Thanks Received:
    13,615
    Location:
    Hyderabad, India
    Specifications of NVIDIA's Tesla K20 GPU compute processor, which was launched back in May, have finally been disclosed. We've known since then that the K20 is based on NVIDIA's large GK110 GPU, a chip that has yet to power a GeForce graphics card. Apparently, NVIDIA is leaving part of the silicon disabled, which lets it harvest more usable dies. According to a specifications sheet compiled by Heise.de, the Tesla K20 will feature 13 SMX units, compared to the 15 available on the GK110 silicon.

    With 13 streaming multiprocessor (SMX) units, the K20 will be configured with 2,496 CUDA cores (as opposed to the 2,880 physically present on the chip). The core will be clocked at 705 MHz, yielding single-precision floating point performance of 3.52 TFLOP/s, and double-precision floating point performance of 1.17 TFLOP/s. The card packs 5 GB of GDDR5 memory, with memory bandwidth of 200 GB/s. Dynamic parallelism, Hyper-Q, and GPUDirect with RDMA are part of the new feature set. The TDP of the GPU is rated at 225 W, and understandably, it uses a combination of 6-pin and 8-pin PCI-Express power connectors. Built on the 28 nm process, the GK110 packs a whopping 7.1 billion transistors.
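The quoted throughput figures follow from the core count and clock. A minimal sketch of that arithmetic is below. It assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock and that double precision runs at one third of the single-precision rate. Neither assumption is stated in the post, but both reproduce the quoted numbers. The function name `peak_tflops` is made up for illustration.

```python
# Figures quoted in the post above.
SMX_UNITS = 13
CORES_PER_SMX = 192      # 2,496 / 13 (equally 2,880 / 15), derived from the quoted totals
CORE_CLOCK_MHZ = 705


def peak_tflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak in TFLOP/s.

    The default of 2 FLOPs per core per clock is an assumption
    (one fused multiply-add per clock), not a figure from the post.
    """
    return cores * clock_mhz * 1e6 * flops_per_core_per_clock / 1e12


cores = SMX_UNITS * CORES_PER_SMX
sp = peak_tflops(cores, CORE_CLOCK_MHZ)
dp = sp / 3              # assumption: DP rate is one third of the SP rate

print(f"{cores} cores, SP {sp:.2f} TFLOP/s, DP {dp:.2f} TFLOP/s")
```

These are theoretical peaks only. Sustained throughput on real workloads will be lower.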


    Source: Heise.de
    Last edited: Oct 17, 2012
  2. TheGuruStud

    TheGuruStud

    Joined:
    Sep 15, 2007
    Messages:
    1,615 (0.64/day)
    Thanks Received:
    168
    Location:
    Police/Nanny State of America
    So, buy 5870s. Got it :p
  3. sergionography

    Joined:
    Feb 13, 2012
    Messages:
    264 (0.28/day)
    Thanks Received:
    33
    In other words, it can almost match Tahiti.
  4. HumanSmoke

    HumanSmoke

    Joined:
    Sep 7, 2011
    Messages:
    1,283 (1.18/day)
    Thanks Received:
    404
    Seems like a repeat of GF100/110. Hardly surprising if the die is 500mm^2+

    The first Fermi Teslas (M2050/M2070) out of the gate were basically GTX 470 spec. The M2090, released more recently, is pretty much a GTX 580.

    Would be interesting to know whether these Teslas are the same SKUs that ORNL is taking delivery of, or whether they are higher spec, since Oak Ridge seemed to be the high-profile launch customer.
    Any comparison probably depends on actual performance efficiency rather than hypothetical. Unless you know what K20 brings to the table, a theoretical comparison is largely useless.

    BTW: The original site no longer features any specifications.
    Last edited: Oct 17, 2012
  5. Solaris17

    Solaris17 Creator Solaris Utility DVD

    Joined:
    Aug 16, 2005
    Messages:
    17,153 (5.20/day)
    Thanks Received:
    3,545
    Location:
    Florida
    those cores.....my god.
  6. [H]@RD5TUFF

    Joined:
    Nov 13, 2009
    Messages:
    5,615 (3.21/day)
    Thanks Received:
    1,707
    Location:
    San Diego, CA
    do want
  7. bogami

    bogami

    Joined:
    Jan 15, 2012
    Messages:
    241 (0.25/day)
    Thanks Received:
    7
    Location:
    Slovenia
    Estimated 20 PFLOP/s peak!!! :eek: :twitch: And 3.52 TFLOP/s normal, 1.17 TFLOP/s D.P.
    Nice peak.
    I wish for 20 PFLOP/s on the next GPU. :D
  8. The Von Matrices

    The Von Matrices

    Joined:
    Dec 16, 2010
    Messages:
    1,200 (0.89/day)
    Thanks Received:
    371
    5GB of memory? That's not evenly divisible by the 384-bit memory bus it was rumored to have. Has it been reduced to 320-bit, which could produce an even 5GB?
  9. HumanSmoke

    HumanSmoke

    Joined:
    Sep 7, 2011
    Messages:
    1,283 (1.18/day)
    Thanks Received:
    404
  10. btarunr

    btarunr Editor & Senior Moderator Staff Member

    Joined:
    Oct 9, 2007
    Messages:
    28,416 (11.29/day)
    Thanks Received:
    13,615
    Location:
    Hyderabad, India
    Mix-and-match memory chips. Just like 2 GB is made possible on a 192-bit bus.
    NHKS says thanks.
  11. Prima.Vera

    Prima.Vera

    Joined:
    Sep 15, 2011
    Messages:
    2,205 (2.05/day)
    Thanks Received:
    287
    LOL. 7 billion transistors! I remember that my old 3dfx Voodoo 3 had 7 million transistors and was the fastest when released. :))))
  12. The Von Matrices

    The Von Matrices

    Joined:
    Dec 16, 2010
    Messages:
    1,200 (0.89/day)
    Thanks Received:
    371
    True, that is possible. But would it really be done on a high-end compute card where consistent and predictable performance is important? It would be a headache for developers to have to track which addresses they write and determine which data should go in the more or less interleaved parts of the memory space.
  13. Maban

    Maban

    Joined:
    Mar 6, 2008
    Messages:
    2,375 (1.00/day)
    Thanks Received:
    996
    It's probably twenty 256MB chips on a 320-bit bus.
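The bus-width question in the posts above comes down to simple divisibility. A rough sketch follows, assuming 32-bit GDDR5 channels with at most two identical chips per channel. The two-chips-per-channel limit and the helper names are assumptions for illustration, not details confirmed in the thread.

```python
CHANNEL_WIDTH_BITS = 32   # assumption: one GDDR5 channel is 32 bits wide


def uniform_capacities_mb(bus_width_bits, chip_mb, max_chips_per_channel=2):
    """Capacities reachable when every channel carries the same number of identical chips."""
    channels = bus_width_bits // CHANNEL_WIDTH_BITS
    return [channels * n * chip_mb for n in range(1, max_chips_per_channel + 1)]


def implied_data_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate needed to reach a given total bandwidth."""
    return bandwidth_gb_s * 8 / bus_width_bits


TARGET_MB = 5 * 1024  # the 5 GB quoted in the news post

for bus in (320, 384):
    caps = uniform_capacities_mb(bus, chip_mb=256)
    rate = implied_data_rate_gbps(200, bus)
    print(f"{bus}-bit: capacities {caps} MB, "
          f"5 GB possible: {TARGET_MB in caps}, "
          f"implied rate {rate:.2f} Gbit/s per pin for 200 GB/s")
```

Under these assumptions, twenty 256 MB chips on a 320-bit bus give exactly 5 GB, while a 384-bit bus with the same chips gives 3 GB or 6 GB. That matches the suggestion that 5 GB on 384-bit would need mixed chip densities.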
  14. btarunr

    btarunr Editor & Senior Moderator Staff Member

    Joined:
    Oct 9, 2007
    Messages:
    28,416 (11.29/day)
    Thanks Received:
    13,615
    Location:
    Hyderabad, India
    Low level video memory management is handled by API>CUDA>driver. Apps are oblivious to that. Apps are only told that there's 5 GB of memory, and to deal with it.
    1c3d0g says thanks.
  15. largon New Member

    Joined:
    May 6, 2005
    Messages:
    2,778 (0.82/day)
    Thanks Received:
    432
    Location:
    Tre, Suomi Finland
    That die shot definitely has 384 bits' worth of memory bus...
  16. T4C Fantasy

    T4C Fantasy CPU & GPU DB Maintainer

    Joined:
    May 7, 2012
    Messages:
    973 (1.15/day)
    Thanks Received:
    411
  17. Xzibit

    Joined:
    Apr 30, 2012
    Messages:
    1,121 (1.32/day)
    Thanks Received:
    252
    In case you didn't know, Mark Harris points out that he works for Nvidia.

    So you might want to check who runs the sites you're linking to if you want to link to unbiased information.

    It'd be like linking to sites/blogs run by AMD employees to make a point or further a viewpoint about an AMD product.

    Just silly.
  18. HumanSmoke

    HumanSmoke

    Joined:
    Sep 7, 2011
    Messages:
    1,283 (1.18/day)
    Thanks Received:
    404
  19. cadaveca

    cadaveca My name is Dave

    Joined:
    Apr 10, 2006
    Messages:
    13,864 (4.53/day)
    Thanks Received:
    6,933
    Location:
    Edmonton, Alberta
    woah, how'd i miss this. Thanks for bumping, Smoke!

    :roll:
  20. Xzibit

    Joined:
    Apr 30, 2012
    Messages:
    1,121 (1.32/day)
    Thanks Received:
    252
    Talk about idiot fanboyism.

    That site is run by Mark Harris, an Nvidia employee. Are you so naive as to think he's going to post unbiased research links on his site/blog?
    Nvidia would find a way to fire him in a second if he posted links to research papers that put Nvidia in a bad light.

    It only took me one mouse click to find out he was an Nvidia employee. Come on now. Who's trolling now?

    At least show both sides, or attempt to, so you won't seem like an Nvidia cheerleader.

    That's from a study by Oak Ridge National Laboratory along with the University of Tennessee and the University of Manchester in the UK.

    58% is lower than 90% in DGEMM. Maybe Kepler GK100/110 has a 34% jump, who knows, but the chip on the GTX 280 was only 34% in DGEMM.

    What do I know, though. I would think Oak Ridge National Laboratory does, since they use the darn things. ;)
    Last edited: Oct 17, 2012
  21. HumanSmoke

    HumanSmoke

    Joined:
    Sep 7, 2011
    Messages:
    1,283 (1.18/day)
    Thanks Received:
    404
    Sure - I'll use your quotes (and mine since you obviously can't RTFP) as examples
    Yup. Which just goes to prove that real-world and theoretical numbers differ. Which is exactly as I noted. Likewise I made no assumption based upon a part whose performance is unknown...or do you have access to Kepler information that everyone outside of Nvidia and HPC projects doesn't?
    So what is the DGEMM efficiency of Kepler ?
    All I see here is a brief synopsis of Fermi
    And of course, at no point did I make an AMD vs Nvidia comparison- quite the opposite in fact
    Get back under your bridge Xzibitroll - I'm sick of having to explain simple compound sentences to you.
  22. T4C Fantasy

    T4C Fantasy CPU & GPU DB Maintainer

    Joined:
    May 7, 2012
    Messages:
    973 (1.15/day)
    Thanks Received:
    411
    http://www.techpowerup.com/gpudb/923/NVIDIA_Tesla_C2050.html

    Previous-gen NVIDIA architectures calculate floating-point throughput from the shader clock, so the C2050 would be about 1 TFLOP/s of single precision.
  23. Xzibit

    Joined:
    Apr 30, 2012
    Messages:
    1,121 (1.32/day)
    Thanks Received:
    252
    Those tests are done in double precision. For single precision it would be SGEMM.
    The C2050 is 515 GFLOP/s in double precision, so it only reaches 58% of what is advertised.

    Kepler would have to make up a lot of ground in efficiency.

    The point I was trying to make was...

    Pointing to Tahiti's 90% efficiency in DGEMM as if it's a bad thing, especially from a site/blog of an Nvidia employee.
    As compared to what? Nvidia's Fermi at 58% efficiency in DGEMM? That Nvidia employee doesn't have a link to that on his site. Wonder why?
    Even if Tahiti ran at 58%, it would still be twice as fast in DGEMM compared to Fermi.

    Given that the K20 is similar in spec to the W9000 and W8000, it would have to bring its efficiency up in such a comparison.
    Maybe the K20 has better efficiency, but when someone says "hey look, AMD can only do 90%" and fails to mention Nvidia only does 58%, that's kind of cheerleading to me.

    We need to see Kepler's DGEMM efficiency to see what percentage of its advertised specs it reaches.

    :toast:

    Update:
    Nvidia's marketing slides put the DGEMM efficiency of the K20 at 80% and Fermi at 60-65%. So if Oak Ridge National Laboratory puts it 2% shy of 60%, I would say the window would be 78-80% efficiency for the K20. So we are more than likely going to see a draw between the K20 and W9000 in DGEMM if the marketing slides' 80% efficiency is met.
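The efficiency argument above can be put into numbers. This is a back-of-the-envelope sketch using only peak and efficiency figures quoted in this thread. The K20's 80% is from marketing slides and the W9000's 1 TFLOP/s double-precision peak is the rounded figure cited elsewhere in the thread, so none of these are measured results. The function name `sustained_gflops` is made up for illustration.

```python
def sustained_gflops(peak_gflops, efficiency):
    """Sustained DGEMM throughput as a fraction of theoretical double-precision peak."""
    return peak_gflops * efficiency


# (peak DP GFLOP/s, claimed DGEMM efficiency), all as quoted in the thread.
cards = {
    "Tesla C2050 (Fermi)": (515, 0.58),
    "FirePro W9000 (Tahiti)": (1000, 0.90),
    "Tesla K20 (Kepler)": (1170, 0.80),   # efficiency from marketing slides, unverified
}

results = {name: sustained_gflops(peak, eff) for name, (peak, eff) in cards.items()}

for name, gflops in results.items():
    print(f"{name}: ~{gflops:.0f} GFLOP/s sustained DGEMM")

# "Even if Tahiti ran at 58%": compare both cards at the same efficiency.
ratio_at_58 = sustained_gflops(1000, 0.58) / sustained_gflops(515, 0.58)
print(f"Tahiti vs Fermi at equal 58% efficiency: {ratio_at_58:.2f}x")
```

On these figures the K20 and W9000 land within a few percent of each other, which is the "draw" described in the update.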
    Last edited: Oct 18, 2012
  24. HumanSmoke

    HumanSmoke

    Joined:
    Sep 7, 2011
    Messages:
    1,283 (1.18/day)
    Thanks Received:
    404
    As per usual the troll can't even parse a sentence without altering the content to suit its needs:
    Nvidia whitepaper May 2012. (pdf)
    Still, coming from someone who openly admits to lying, and up until recently didn't even know the difference between a 3D rendering card and a math co-processor, it's hardly surprising.
    Keep up with the straw man AMD vs Nvidia bullshit and the hypothetical numbers game. I'll stand by my preference for real world testing*
    *By your reasoning the AMD FirePro W9000 (3.99 TF SP, 1 TF DP) should be four times faster than a Quadro 6000 (1 TF SP, 515 GF DP)...after all, numbers don't lie right?
    No...
    No...
    No
    Last edited: Oct 18, 2012
  25. Xzibit

    Joined:
    Apr 30, 2012
    Messages:
    1,121 (1.32/day)
    Thanks Received:
    252
    Now we are taking marketing slides as facts. Guess that doesn't surprise me.

    This coming from the idiot who didn't even know who ran GPGPU.ORG

    Mark Harris,
    Chief Technologist, GPU Computing @ Nvidia


    I thought we wanted hard numbers, not marketing B.S.

    Are you gonna link to Jen-Hsun Huang's blog next so we can get Nvidia links from there as well? :laugh:
