Larrabee is being advertised for the HPC market... meaning they will use these to find oil, find cures for cancer, and run Monte Carlo simulations, physics computations, etc. They're basically advertising this for supercomputing use.
I really don't think anyone there cares about its 3DMark score.
I heard IBM made a graphics card for supercomputing that's above 2 TFLOPS.
They could still use it to test for speed, because the higher the mark the faster it performs, right? But if they have techs, then they will do their own testing on them.
I wonder what a 5970 gets in your test.
You can be a member of the Folding team and help them do their work faster; the more people running the software over the net, the quicker it goes at the other end, right? So I guess the people on it make up their super GPU, lol.
I want a GPU for me and my gaming that is linked, like Folding; the cancer research goes to them, and then I won't need to spend all that money, methinks. But the thought is nice..... :O
That makes me wonder... does the bench software change the speed at which the algorithm/process/subroutine is performed? I mean, those have to be some pretty awesome cores for 16 of them at 2 GHz to outperform 240 shaders at 1.5 GHz. Those engineers probably tweaked the hell out of that bench.
Last I checked, MATLAB had an SGEMM bench, but I think it's CPU-only (I really don't know). You may be able to find a CUDA one for NVIDIA. IDK about ATI cards, though.
If you read the article, 1 TF was achieved with a highly overclocked 32-core part, which is the maximum Intel will have for now. 1 TF is the maximum you will see for now, unless Fermi is faster; and if the GTX 285 is doing 425, IMO Fermi can destroy Larrabee in this test. NVIDIA's own tests show 4x-5x over GT200. We'll see.
Like I said above, 1 TF was achieved with 32 cores, and GT200 doesn't really have 240 cores; it truly has 10 cores, each 24 shader-"cores" wide. Larrabee's "cores" are somewhat more complicated to count, but for a simple comparison we could say it has 32 "true" cores, each 16 wide, so 32x16 = 512 "cores" compared to the 240 "cores" in GT200.
Ahhh, I see it now... I got confused since the shaders are always called "cores" and Intel's cores are blocks (in comparison); it has 16 vector ALUs per core/block. Makes sense...
It's defined as "multiplication of two matrices with single precision". It is NOT defined as an exact, pre-compiled, fixed piece of code to run (like 3DMark, for example).
This means that if Intel has good people who know how to optimize for their arch/compiler, they can make it run faster than another, unoptimized version.
How hard is coding for SGEMM? Intel isn't exactly known for its ability to write drivers.
After living through Intel's last dedicated-graphics experiment in the very late '90s, I have a hard time believing Larrabee will be anything for NVIDIA or ATI to worry about.
Memory:
Memory Clock (MHz DDR): 1998 MHz
Total Memory Config: 1792 MB
Memory Interface Width: 448-bit per GPU
Total Memory Bandwidth: 223.8GB/s
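(For what it's worth, that bandwidth figure checks out: 448 bits / 8 = 56 bytes wide, and 56 B x 1998 MHz effective ≈ 111.9 GB/s per GPU, so the two GPUs together give the quoted 223.8 GB/s.)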
Surely this card would compare? Is it like Intel with the Extreme CPUs in performance, do you think?
Or look at this below:
The Sapphire HD 5970 OC 2GB comes equipped with two RV870 Cypress cores that have a total of 3200 stream processors, delivering almost 5 TFLOPS of processing power and making the HD 5970 the most powerful video card on the planet. Clock speeds come in a bit higher than the reference versions, at 735MHz on the two Cypress cores and 1010MHz (4040MHz effective) on the 2GB of GDDR5 memory. Each core has 1GB of memory dedicated to it, running through a 512-bit bus (256x2). The specifications on paper look impressive, but that's not all: gone are the overclocking limits we have seen in the past, as this card comes unlocked so you can throw the screws to it to gain some more FPS or distributed-computing power. This card is designed for hardcore overclocking based on its construction: it features multiple Volterra voltage regulators, Japanese-made pure ceramic SuperCapacitors, real-time power monitoring, and a programmable fan controller. The cores used are "low leakage" parts, so you get the best parts to push. With many HD 5870s hitting 1000MHz on the core, overclocking should prove interesting.
256x2 = 512, so it could perhaps contend... gulp. What about it!
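(Sanity-checking the TFLOPS claim: 3200 stream processors x 2 FLOPs per clock for a multiply-add x 735 MHz ≈ 4.7 TFLOPS single precision, which is presumably where "almost 5 TFLOPS" comes from.)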
I mean, be real: in the real consumer market no one will have this type of GPU, so it's off topic. But maybe if ATI or NVIDIA make a processor of the same power for less money, the others will be forced to go cheaper, and then of course the industry will be forced to produce at a consumer level as well to compete.
Over 4 TFLOPS on the 5970 sounds good to me, even if it would be reduced in testing.....?
In the first post the guy was comparing it to GT200 and HD graphics cards, and this is the graphics card section. Which part don't you understand? There's nothing about CPU Extremes in this section!!!!
I think someone is confused, but not me.
Look:
This is about how fast a processor dedicated to that task performs, which relates to the graphical process: the speed of screen display, FPS (frames per second), OK!
The General Matrix Multiply (GEMM) is a subroutine in the Basic Linear Algebra Subprograms (BLAS) which performs matrix multiplication, that is, the multiplication of two matrices. This includes:
SGEMM for single precision,
DGEMM for double precision,
CGEMM for complex single precision, and
ZGEMM for complex double precision.
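To make that concrete, here's a naive C# sketch of the operation SGEMM performs (C = alpha*A*B + beta*C, all in single precision). This is just the mathematical definition written out; a real BLAS implementation blocks and vectorizes it heavily, which is exactly the freedom the benchmark definition leaves open:
Code:
// Naive O(m*n*k) single-precision matrix multiply: C = alpha*A*B + beta*C.
static void Sgemm(int m, int n, int k,
                  float alpha, float[,] A, float[,] B,
                  float beta, float[,] C)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
        {
            float sum = 0f;
            for (int p = 0; p < k; p++)
                sum += A[i, p] * B[p, j];   // single-precision multiply-add
            C[i, j] = alpha * sum + beta * C[i, j];
        }
}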
So you're telling me all these experts in graphics benching and programming are basing their results on theory rather than a practical basis, and that this one test, the bench for the processor you describe, is the only one that gives true results, right?
And therefore you're not just telling me that, but telling all the people who bench their hardware with 3DMark and similar benchmarks, right? It just sounds like Intel is trying to get money from you to bench graphics processors their way.
The Larrabee video card will be tested with 3DMark, and then it can be compared the same way, or there is no argument. Not that I'm saying this Larrabee wouldn't be a monster if it were used for gaming, but prove it with 3DMark, or the cards are not doing the same job, right? So why compare it to graphics cards that are not used for these tasks?
What I mean is: why compare it to consumer-market cards if it will be used for something totally different!
It was tested against the Tesla and FireStream, which are somewhat optimized for this type of work. The thing is that when you run 3DMark on a Tesla or a FireStream, they sometimes score a good bit lower than the GeForce and Radeon equivalents. I don't think this will be any different; it will probably be optimized only for single precision, not for double precision.
It's no different than GeForce/CUDA or Radeon/Streams. Larrabee is Larrabee; it is designed to fulfill both roles from the start. It isn't something thought of ten years after the fact and glued on. I think you underestimate the power of Larrabee in both the high-performance computing and graphics segments. If one card can't do both, then GeForce/Radeon must be vaporware as well.
All graphics cards are optimized for single precision because that is almost exclusively what computing uses (games, research, and otherwise). Double precision is twice the size, taking more than twice as long to compute compared to single. When time is money, single precision is preferred.
That doesn't mean GeForce, Radeon, and Larrabee can't do double, because they sure can. It just isn't worth the performance penalty, in most cases.
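("Twice the size" is literal, by the way; in C#:)
Code:
using System;

Console.WriteLine(sizeof(float));   // 4 bytes per single-precision value
Console.WriteLine(sizeof(double));  // 8 bytes per double-precision value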
Double precision is very common in research, actually. 32-bit is nowhere near as precise as is required in many scenarios, especially when working with numbers close to 0 or with equations involving infinities (because of the same thing: not enough precision near 0).
No, I understand that, but at the moment they are advertising it to the HPC segment. Larrabee is a response to CUDA and OpenCL; it's definitely not because Intel wants to break into the gamer market. They're not coming out with 3DMark scores.
Intel is coming from the other end of the spectrum... GeForce and Radeon are graphics cards that can compute; Larrabee is a compute card that can do graphics. Its primary function, IMO, is to stop CUDA and OpenCL from cutting into Intel's crunching pie (and a substantial pie it is).
Double precision allows more decimal places, but if you know the scale of the numbers you are working with, those extra decimal places are moot. In the end, they usually use a single-precision float coupled with a scale for a set of numbers. Multiply the float by the scale and you've got yourself performance and accuracy.
As to infinities, single (0x7f800000) and double (positive: 0x7ff0000000000000, negative: 0xfff0000000000000) have a value set aside which is flagged as "infinite."
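A minimal C# sketch of that scale trick (the variable names and the 1e-12 scale are mine, purely illustrative), plus the reserved infinity bit pattern mentioned above:
Code:
using System;

// Scale trick: store compact single-precision values plus one shared
// scale factor known for the whole data set (assume 1e-12 here).
const double Scale = 1e-12;
float stored = 3.14159f;          // fast, compact single-precision value
double actual = stored * Scale;   // recovered value, ~3.14159e-12
Console.WriteLine(actual);

// IEEE 754 reserves 0x7f800000 for single-precision positive infinity.
float posInf = BitConverter.ToSingle(BitConverter.GetBytes(0x7f800000), 0);
Console.WriteLine(float.IsPositiveInfinity(posInf));   // True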
The video card came before the high-performance-computing aspects of it. There is more money in discrete video cards than in cards to bolster CPU performance.
When Intel gathered information that showed a series of x86 CPUs (that's what Intel is all about, after all) could rival the performance of a modern GPU offered by NVIDIA and AMD, the idea of Larrabee was born. From that idea came the fact that it is x86 and programmers would easily be able to use it, so the GPU idea became a GPGPU idea. As proof of this, note how little information has been released about Larrabee's GPU performance. Intel knows people will buy Larrabee as a GPGPU card just because it has the Intel brand on it. On the other hand, Intel knows they have to topple two corporations that have been in the segment for well over a decade. When a product, like Core 2, is kept quiet until release, the media frenzy sparked by a new, dominant product sells itself. That is most likely the same strategy Intel is relying on to sell Larrabee. It also explains why they are so tight-lipped about its performance as a GPU.
You didn't understand me. I'm not talking about being able to represent those special values; I'm saying that an application can produce results that are very, very close to zero but not zero, and because the number can't be represented, it gets "rounded" to the closest representable number, which could be either zero or a number orders of magnitude bigger than the true one. If that true number is then multiplied by a very large number, the results differ greatly: you may expect something in the hundreds and get millions in return, or zero if it was rounded to zero, or simply an error if you are multiplying by infinity (zero x infinity = indeterminate, while you should get infinity as the result, since something finite x infinity = infinity).
Example: imagine you have two complex formulas, and the result of one should be something like x = 0.000000000000000000000000000000001 (decimal), while the other result is y = 100,000,000,000,000,000,000,000,000,000. Don't pay attention to the number of zeroes; it's just an example and I didn't count them myself. Imagine that the result x*y should be 100, but x can't be represented in 32 bits and the closest representable number is either zero or 0.0000000000011 (whatever). In either case you are not getting anything close to what you would need, and these cases occur a lot in physics, astronomy, and probably in genetics too...
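That failure mode is easy to reproduce in C# (the 1e-50 and 1e52 values are illustrative stand-ins for the example above):
Code:
using System;

// 1e-50 is below the smallest single-precision subnormal (~1.4e-45),
// so converting it to float silently rounds it to zero.
double xd = 1e-50;
float xf = (float)xd;   // becomes 0.0f
double y = 1e52;

Console.WriteLine(xd * y);                       // ~100, the intended result
Console.WriteLine(xf * y);                       // 0: the tiny factor was lost
Console.WriteLine(xf * float.PositiveInfinity);  // NaN instead of the expected infinity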
I updated an old program of mine which basically does the following...
1) Creates one thread per core, each of which does nothing but increment a counter.
2) Once all threads are created, it resets all counters to zero.
3) It waits for one second, grabbing and resetting the counters.
4) After it has 10 results, it adds them together to get a cumulative value.
The cumulative value is basically the same as taking the power of all the cores and adding them together (literally).
Important note: a float can only count up by 1 to 16,777,216 (2^24), because it only holds about 7 significant digits and beyond that point it can no longer represent every integer (read here for an explanation). Because of this, I had to scale it, as I suggested, in order to prevent the float from overflowing. Here is a comparison of the functional code:
Code:
Double:
private void Looper()
{
    // _Counter is a double here; with 15-16 significant digits it can keep
    // counting by 1 far beyond anything reachable in a one-second window.
    while (true)
        _Counter++;
}
Single:
private void Looper()
{
    while (true)
    {
        // A float cannot represent every integer above 2^24, so the count is
        // split into _Multiplier completed blocks of MAX plus the current _Counter.
        if (_Counter == MAX)
        {
            ResetCounter();
            _Multiplier++;
        }
        _Counter++;
    }
}
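For reference, the supporting members those snippets assume would look something like this (my reconstruction; the original post doesn't show them):
Code:
private float _Counter;                // the double version uses a double field instead
private float _Multiplier;             // number of completed MAX-sized blocks
private const float MAX = 16777216f;   // 2^24, the last value a float reaches by +1

private void ResetCounter()
{
    _Counter = 0f;    // running total = _Multiplier * MAX + _Counter
}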
That if statement in the single version apparently incurs a rather large performance penalty (much larger than anticipated).
Moreover, I was shocked to see UInt32 and UInt64 so close.
Anyway, here are the full results on my Core i7 920:
Core i7 apparently doesn't take to the if statement as well as the Core 2-based processors do. The Core i7 has a very weak showing in the 32-bit float, so my conclusion is that it boils down to the hardware...
Single-precision floats are the norm for stressing a computer, so I really see no problem with it. All other things being equal, single should be equal to, or faster than, double precision.
Seeing as Larrabee is using a modernized P3 core, it's hard to do more than speculate about how its double-precision performance will compare to single.
Even NVIDIA is openly admitting that the GPGPU and HPC markets are the primary targets for Fermi.
And in the first presentation of Larrabee, Intel talked about the "evolution of computing"... not the "evolution of graphics".
This is the "supercomputing for the masses" movement, and it represents a different mentality altogether, one in which the CPU is nothing more than a glorified scheduler and the GPU does all the heavy lifting. That is the gist of what I get from all the GPGPU hype, and it seems like Intel is positioning Larrabee exactly that way.
At this point, it is like asking which came first: the chicken or the egg. I'm not convinced more than 1% of the market cares about general-purpose computing beyond the capabilities of the CPU. People always want bigger screens; bigger screens mean higher resolutions, and higher resolutions mean better graphics cards. We'll see what the driving force behind the market for discrete cards is in a few years' time.