
Larrabee 2.7x faster than GT200 in SGEMM

Discussion in 'Graphics Cards' started by KainXS, Dec 2, 2009.

  1. Benetanegia

    Benetanegia New Member

    Joined:
    Sep 11, 2009
    Messages:
    2,683 (1.02/day)
    Thanks Received:
    694
    Location:
    Reaching your left retina.
    He's talking about the HPC market and you're talking about the desktop market... they have nothing to do with each other.

    Nvidia already has contracts for Fermi Teslas, which could account for tens (or hundreds) of thousands of cards being sold. At $2000-5000 each ($1800-4800 profit per card) they couldn't care less about the desktop market. Yet, since they are a graphics company, they do care; but Intel, which is a computing company...
     
  2. FordGT90Concept

    FordGT90Concept "I go fast!1!11!1!"

    Joined:
    Oct 13, 2008
    Messages:
    18,942 (6.37/day)
    Thanks Received:
    8,188
    Location:
    IA, USA
    HPC is the same. Only a few scenarios (highly scientific) would favor GPGPU over more CPUs. Larrabee could change that (lacking the Achilles heel of CUDA and Stream) but again, we'll see. Tesla has had very limited deployment so far and, in terms of HPC, that is the direct competitor to Larrabee.

    Why spend $2000 on a GPGPU when you could easily have 8 quad-core CPUs (32 cores) for the same price? That specific industry is still in its infancy. Once they appear in the top 10 of the TOP500 list, people will start taking note. Until then, I think they are stuck in a small niche market.
     
  3. Benetanegia

    Because it's faster, and you can't have 8 Xeons or Opterons for that price, far from it (more like $10,000-16,000). Unless you are talking about Larrabee, which is a GPU, more or less the same as Tesla and FireStream, and not comparable to 8 Xeon/Opteron processors.

    ORNL is already working on a supercomputer with Fermi, with the objective of creating a computer 10 times faster than RoadRunner (the fastest supercomputer at the time the claim was made, before Jaguar was upgraded with 6-core Opterons).

    Cray is creating a deskside supercomputer with Fermis too. In your opinion, when does it stop being a niche market??

    And regarding DP computing, it's not used more widely because until now the performance was lacking, but that's the point of GPGPU. Bill Dally mentioned in an interview that they contacted HPC customers to gauge their interest in something like Fermi. The answer was "don't bother unless you offer DP and ECC," and that's why Fermi has those. The reality of the HPC market is that DP is desired A LOT, but it's not used because it was not really possible, and far-from-perfect alternatives were used instead. There is a very good reason x86 never entered the HPC market until 64-bit was an option, thanks to the introduction of Opterons...
     
  4. Melvis

    Melvis

    Joined:
    Mar 18, 2008
    Messages:
    4,014 (1.26/day)
    Thanks Received:
    757
    Location:
    Australia
    Around 3 mins 38 seconds. There is one of these inside this thing. LOL yeah, this guy knows what he is talking about :laugh:
     
    Benetanegia says thanks.
  5. Benetanegia

    :roll: hahahahahaha yeah pretty funny.
     
    Melvis says thanks.
  6. FordGT90Concept

    They get massive tray discounts. Only China can afford Xeons.


    It will only be 10 times faster than RoadRunner if it doesn't encounter any conditionals or switch logic. Again, not all supercomputers are better suited to having more GPUs. It depends on their workload.


    "Deskside supercomputer" is an oxymoron. It stops being a niche market only when Tesla cards infiltrate more than one market (not just HPC). Seeing as HPC is what they are solely engineered for, Tesla will never leave that market. A mistake on NVIDIA's behalf: they created a niche product for a niche market when a normal GeForce should be able to do both without a problem.


    x86 has a lot of overhead compared to PowerPC and other CPU architectures (compare RoadRunner to Jaguar in terms of power: twice the cores, not quite twice the Rmax, and three times the power consumption). It's another reason why Larrabee might not do the best in that segment (it isn't already infiltrated with x86 machines like mainstream computing is). Then again, they all run Linux, so it really doesn't matter.

    Then again, with high-FLOP/low-logic cards entering the market, LINPACK may not be the best benchmark for comparing the two, because the benchmark favors vector architectures.
     
    Last edited: Dec 4, 2009
  7. W1zzard

    W1zzard Administrator Staff Member

    Joined:
    May 14, 2004
    Messages:
    16,541 (3.61/day)
    Thanks Received:
    15,652
    intel has the best compiler engineers/optimizers/hardware optimizers that money can buy.
    intel's IGPs and those drivers aren't made by the LRB team

    always consider that intel has infinite money available to pull off whatever project they really want to push

    when the general consumer uses it... think "digital compressed audio on a computer system"... wtf? now it's mp3 and everyone uses it. 80387 math coprocessor? o_O for scientists only. a computer? who needs a personal computer? to quote one of my professors: "back in the day, a personal computer was about as useful for a normal person as a space station for an old woman"
     
  8. Benetanegia

    But in a sense that's what I'm saying. Where was the math coprocessor used first? And then it made it to the public; that's the natural trend. Talking about Teslas or Larrabee as coprocessors is the same: they will mostly be used in supercomputers, and in computers that are not so "super" on their own, but are compared to the current TOP500. Later they might get to office computers, although that's very unlikely. In any case, OK, it's a niche market in the sense that very few people actually use them, but it's a very profitable one. At least AMD would agree, since it has been almost surviving on it these last years, and Intel must agree too, since they created Larrabee almost for that purpose.

    What I mean when I say GPGPUs aren't going to be a niche market is that they are not going to be any more niche than 8P Xeons or Opterons.

    What other market do you see for Tesla? There's none. For the consumer, who doesn't care so much about reliability or ECC, THE SAME chip is going to be used in GeForces and also in Quadros, and the GPGPU functionality is going to be there. The three are essentially the same thing, just like a Phenom and 2P and 8P Opterons are almost exactly the same except for the quality requirements, support and, yes, the inflated prices associated with the HPC sector.

    They made a differentiation between GeForce and Tesla because 1-2 GB of VRAM is nowhere near enough for the tasks a Tesla is supposed to run, and 3-6 GB is just not profitable for a consumer GPU. Also, ECC is a requirement for computing companies, but it comes with a performance and frame-buffer penalty that wouldn't be wise to have on a gaming GPU, hence the two products. And of course the price difference is also important. Intel and AMD sell their Xeon and Opteron chips at 5x-10x the price; why wouldn't Nvidia want to do the same with Tesla? :laugh:

    Also, a "deskside supercomputer" is probably the most interesting thing about all these GPGPU cards. Having something like 4-20 TFlops in your "desktop" computer is a dream for many scientists. How many R&D scientists have to wait weeks if not months to get access to supercomputer time? Many if not most. Having a $6000 computer with 4-20 TFlops would help them a lot and increase productivity in research areas by unimaginable amounts. The same can be said for the design and animation markets, architecture... the list is long, believe me.
     
    Last edited: Dec 4, 2009
  9. Benetanegia

    Indeed, that comparison best shows why a GPU is a very good choice in that sector. RoadRunner is not faster per CPU because x86 has overhead; no, RoadRunner uses Opterons to run the OS too. The reason is that RoadRunner uses the Cell "co-processor", which for all purposes was a GPGPU wannabe.

    Xeons and Opterons have 50-100 GFlops.
    Cell has 150-200 GFlops, and that's why RoadRunner is faster with fewer processors.
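    To put rough numbers on that, here's a quick sketch of how many chips each design would need to hit a given peak. The ~75 and ~175 GFlops per chip figures are just assumed midpoints of the ranges above, not measured values:

```python
import math

# How many chips to reach a 1 PFlop (1,000,000 GFlops) peak, using the
# rough per-chip figures quoted above (assumed midpoints, not measurements).
TARGET_GFLOPS = 1_000_000  # 1 PFlop

def chips_needed(gflops_per_chip, target_gflops=TARGET_GFLOPS):
    """Chips required to reach the target peak, rounded up."""
    return math.ceil(target_gflops / gflops_per_chip)

print(chips_needed(75))   # Opteron/Xeon-class (~75 GFlops/chip) -> 13334
print(chips_needed(175))  # Cell-class (~175 GFlops/chip) -> 5715
```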
     
  10. FordGT90Concept

    An 8-way Opteron can do everything a 1-way Opteron does. A Tesla, however, can't do everything a GeForce does.

    Only two-plus-way Opterons and Xeons cost more. One-way is usually just tens of dollars away from the mainstream part.

    It may be deskside, but it is not a supercomputer, seeing as the new definition of supercomputer is PFLOP-capable. Not many scientists run deep algorithms on their desktops. They engineer the software on their desktops, and when it is time to go full scale, they move it to the supercomputer, which runs it for multiple days. When a supercomputer does it in days, even your "deskside supercomputer" will take weeks or months. There is a reason why every place with a supercomputer has an army of average computers as well. Not every task requires more than a few GFLOPS.


    Except for the power bill, Jaguar is the better machine to have. Not only does it have higher FLOPS, but it is also fully capable of running complex algorithms. It completely depends on needs.
     
  11. W1zzard

    and most people don't care about superior <insert random technical term here>, they want a pretty clicky-click OS that supports all their software, that they know how to use... and that can play HD porn from bluray/the internet
     
    Zubasa says thanks.
  12. Benetanegia

    It doesn't need to, and on top of that the Fermi-based Teslas come with display outputs. What else do you need? Nothing, unless you want to play games at your workplace.

    Yeah, but those are the ones used in supercomputers.

    Semantics. And you should tell the world; apparently there are currently only 2 supercomputers in the world. :laugh:

    The important thing is that two of the hypothetical deskside SCs I mentioned, combined, would make it into the top 100 supercomputer list, so no, your point fails badly. Even if it took weeks or months on a single deskside SC, every scientist could have one, and combined (5, 10, 20 in a laboratory) it would be much, much better than renting part of a supercomputer, as they do today. Apart from that, they would have a lot of power to run limited simulations before going full scale, something they can't do today, which results in a lot of retesting and having to wait weeks/months again until they get SC time.

    Yeah, but they could put the same amount of processors in RoadRunner (maintaining the same architecture as now) and it would destroy Jaguar in everything...

    With GPGPUs it's even better: with just 20,000 Teslas or Larrabees and 20,000 CPUs to drive them, you'd already have more than 20 PFlops. Put in 50k and 50k to reach a consumption/price similar to RoadRunner's and you've got a rocket.
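    The arithmetic behind that 20 PFlops figure can be sketched like this. The ~1,000 GFlops per GPU and ~50 GFlops per CPU are assumed round numbers for illustration, not vendor specs:

```python
# Aggregate peak throughput of a hypothetical GPU+CPU cluster.
# Per-device figures are assumptions, not measurements.

def aggregate_pflops(num_gpus, gflops_per_gpu, num_cpus, gflops_per_cpu):
    """Total peak throughput in PFlops (1 PFlop = 1,000,000 GFlops)."""
    total_gflops = num_gpus * gflops_per_gpu + num_cpus * gflops_per_cpu
    return total_gflops / 1_000_000

# 20,000 GPUs at ~1,000 GFlops each plus 20,000 host CPUs at ~50 GFlops each:
print(aggregate_pflops(20_000, 1_000, 20_000, 50))  # -> 21.0
```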
     
  13. phanbuey

    phanbuey

    Joined:
    Nov 13, 2007
    Messages:
    5,334 (1.61/day)
    Thanks Received:
    1,045
    Location:
    Miami
    LOL mine too... that's what I want haha
     
  14. FordGT90Concept

    You sacrifice the ability, for example, to maintain a massive database by doing that. Anything that uses a lot of logic is going to run at a snail's pace on a GPU. I've said it about three times already and I'll say it again: supercomputers are built to order. If they need high FLOPS, they may consider GPGPU. If they need to run deep algorithms, they'll stick to racks of CPUs.
     
  15. 20mmrain

    20mmrain

    Joined:
    Oct 6, 2009
    Messages:
    2,786 (1.06/day)
    Thanks Received:
    841
    Location:
    Midwest USA
    Okay guys, I haven't been following the Larrabee story really at all. A question for you all: it sounds to me that when this GPU is available, it will be going up against FireStream and CUDA, correct? It's more of a computation card than a gaming card?
    Now if that's correct, why is Intel comparing Larrabee to a GTX 285 instead of the workstation cards?
    Now remember, like I said, I haven't been following Larrabee's story, so if I am way off... please someone explain.

    On another note... if I am wrong and this card will battle it out with the GT300s and the 6870/5870s (by that time), why would this be a bad thing? More competition = lower prices and more powerful video cards from all sides. This sounds like an awesome thing!
     
  16. Benetanegia

    Sorry, but database handling is precisely one of the areas where Tesla setups (GPUs in general) have proved to excel. I think you should have a look at what kind of GPGPU initiatives are out there...

    You can start here: http://www.nvidia.com/object/tesla_computing_solutions.html

    You should also check what kind of computation new GPUs can perform while you are at it. A GPU like Fermi or Larrabee (I guess) can do EVERYTHING a CPU can do; it just won't branch very well. Since branching doesn't usually take more than 5% of the CPU time, the CPU in the setup can handle it. In a normal server the CPU spends 50-70% of its time doing calculations; if you take that out of the equation by moving it to the GPU, the CPU has more than enough clock cycles to perform any branching the GPU may require.

    You are very, very outdated regarding the state of computation, mate.

    EDIT: Logic, BTW, is a very general term, and that makes your statement very untrue. Things like "greater than", "less than", "equal to", "and", "or", "while/for" are logic operations that GPUs excel at. While/for is the least good one in theory, but in practice it's a good one: a CPU desperately needs branch prediction to perform well on that kind of operation due to its low computational power; a GPU simply runs everything inside the while/for very fast, performing the implicit "if" only every Nth iteration instead of every iteration, and that's enough to make up for the cases where it shouldn't be calculated.
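    As an illustration of that data-parallel style of "logic", here's a minimal NumPy sketch (NumPy standing in for a GPU kernel here; the data is made up): the predicate is evaluated for every element at once, producing a mask, with no per-element branch the way a scalar CPU loop would have.

```python
import numpy as np

# Data-parallel "logic": evaluate the predicate over every element at once,
# producing a boolean mask instead of branching per element.
values = np.array([3, 17, 8, 42, 5, 29])

mask = (values > 10) & (values < 40)  # elementwise >, <, and
selected = values[mask]               # gather only the matching elements

print(selected.tolist())  # -> [17, 29]
```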
     
    Last edited: Dec 4, 2009
  17. FordGT90Concept

    Databases rarely involve "calculations"; they require billions of "evaluations" (does a match b, does a contain b, is a like b, etc.). Moreover, GPUs are bottlenecked by the CPU for access to the hard drives. The mainframes maintained by Microsoft, Google, IBM, and Yahoo do not appear on TOP500 lists because they are secretive and because their specialization is storing, searching, and returning data. Those machines will not switch to GPGPU-based processing because they gain nothing from it.

    How much time the CPU spends on branch prediction depends on the workload. Databases, for example, require a lot, while physics workloads require little. Again, I must stress that GPGPUs are suited to a niche market. You'll be hard-pressed to find them outside of that niche.
     
  18. Benetanegia

    You keep saying the same thing and you keep being wrong, VERY WRONG. As I said, take a look at the actual state of computing.

    I should have guessed my link was too general. OK, take a look at this:

    http://spectrum.ieee.org/computing/software/data-monster/0

    or, more generally, about databases:

    http://www.nvidia.com/object/data_mining_analytics_database.html

    EDIT: And FYI, evaluations are in fact calculations, at least to a computer, lol.

    EDIT2:
    DMA :laugh:
     
  19. FordGT90Concept

    How about we wait until Larrabee is even out?
     
  20. wakkierob

    wakkierob New Member

    Joined:
    Nov 29, 2009
    Messages:
    72 (0.03/day)
    Thanks Received:
    5
    but then infinite is infinite with the zero before the number, but what could you expect if it was precise???
     
  21. lemonadesoda

    lemonadesoda

    Joined:
    Aug 30, 2006
    Messages:
    6,366 (1.70/day)
    Thanks Received:
    987
    infinity ≠ infinity

    if b=a!, then clearly b is >> a for any a>2

    so what is b as a→∞?

    therefore, infinity ≠ infinity

    QED
     
  22. Benetanegia

    phanbuey says thanks.
  23. phanbuey

    :roll::roll::roll::roll::roll::roll:

    whooOOops...

    "Justin Rattner (Intel Senior Fellow) demonstrated Larrabee hitting one teraflop, which is great but you could walk across the street and buy an ATI graphics board for a few hundred dollars that would do five teraflops." A teraflop is 1 trillion floating point operations per second, a key indicator of graphics chip performance...
     
  24. 20mmrain

    WTF is this real.....??? LOL I was just starting to learn about this god damn thing. Now I have to forget it all. AAAHHH man I wasted part of my life reading for nothing..... I hate reading!
     
  25. Zubasa

    Zubasa

    Joined:
    Oct 1, 2006
    Messages:
    3,996 (1.08/day)
    Thanks Received:
    470
    Location:
    Hong Kong
