
Larrabee 2.7x faster than GT200 in SGEMM

wakkierob

New Member
Joined
Nov 29, 2009
Messages
72 (0.02/day)
Likes
5
System Name Terminator
Processor e8400 3GHz oced @ 4.03GHz 1.6vcore
Motherboard Asrock P45xe
Cooling Sibirian Tiger NorthQ Watercooled
Memory Corsair Dominator blade xms 2x 2gb pc2-8500
Video Card(s) VTX Radeon HD 4770 @ Sapphire HD 4770 Crossfire
Storage SATA 150GB HD Barracuda 7200 @ SATA 600GB HD Western Digital Caviar Geen 64MB cache
Display(s) standard 17"
Case Gamers Blk (120mm Cooling Fans)
Audio Device(s) none
Power Supply Alphapower (switching PSU) 750watt
Software Vista 32-bit Ultimate SP2
Benchmark Scores 20000+ with 3d mark 06 and both radeon hd 4770 oced
#26
:wtf: Larrabee is being advertised for the HPC market... meaning they will use these to find oil and cures for cancer, and to run Monte Carlo simulations, physics computations, etc. They're basically advertising this for supercomputing use.

I really don't think anyone there cares about its 3DMark score :laugh:
I heard IBM made a graphics card for supercomputing above 2 TFLOPs.
They could still use it to test speed, because the higher the mark the faster it performs, right? But if they have techs, they will do their own testing on them.

I wonder what a 5970 gets in your test.

You can be a member of the Folding team and help them do their work faster: the more people running the software over the net, the quicker it goes at the other end, right? So I guess the people on it make up their super GPU, lol :confused:

I want a GPU for my gaming that is linked up the way Folding is for cancer research; then I won't need to spend all that money, methinks, but the thought is nice... :O
 
Joined
Sep 25, 2007
Messages
5,822 (1.51/day)
Likes
618
Processor Core I7 3770K@4.3Ghz
Motherboard AsRock Z77 Extreme
Cooling Cooler Master Seidon 120M
Memory 12Gb G.Skill Sniper
Video Card(s) MSI GTX 1070
Storage Sandisk SSD + 1TB Seagate Barracuda 7200
Display(s) IPS Asus 26inch
Case Antec 300
Audio Device(s) Xonar DG
Power Supply EVGA Supernova 650 G2
Software Windows 10/Windows 7
#27
lol
 
Joined
Nov 13, 2007
Messages
6,300 (1.65/day)
Likes
1,792
Location
Austin Texas
System Name TimeDumpster
Processor Intel i7 7820X Delidded @ 4.75Ghz / 3.1Ghz Mesh
Motherboard MSI X299 Tomahawk
Cooling 240mm Corsair H105 Intake
Memory 32 GB Quad 3434Mhz DDR4 15-16-16-38-300-1T
Video Card(s) Gigabyte GTX 1080 Ti Gaming
Storage 1Tb Samsung 960 Pro m2, 1TB Samsung 850 Pro SSD
Display(s) Dell 24" 2560x1440 144hz, G-Sync @ 165Hz
Case NZXT S340 Elite Black
Audio Device(s) Arctis 7
Power Supply FSP HydroG 750W
Mouse zowie ec-2
Keyboard corsair k65 tenkeyless
Software Windows 10 64 Bit
Benchmark Scores Cb: 2103 Multi, 209 Single, 10450 Timespy - 10150 GPU/11900 CPU, superpi 1M - 7.71s
Joined
Aug 30, 2009
Messages
4,005 (1.27/day)
Likes
1,665
Location
Sarasota, Florida, USA
System Name Awesomesauce 4.3 | Laptop (MSI GE72VR 6RF Apache Pro-023)
Processor Intel Core i7-5820K 4.16GHz 1.28v/3GHz 1.05v uncore | Intel Core i7-6700HQ @ 3.1GHz
Motherboard Gigabyte GA-X99-UD5 WiFi LGA2011-v3| Stock
Cooling Corsair H100i v2 w/ 2x EK Vardar F4-120ER + various 120/140mm case fans | Stock
Memory G.Skill RJ-4 16GB DDR4-2666 CL15 quad channel | 12GB DDR4-2133
Video Card(s) EVGA GTX 1080 Ti Hybrid SC2 11GB @ 2012/5151 boost | NVIDIA GTX 1060 6GB +200/+500 + Intel 530
Storage Samsung 840 EVO 500GB + Seagate 3TB 7200RPM + others | Kingston 256GB M.2 SATA + 1TB 7200RPM
Display(s) Acer G257HU 1440p 60Hz AH-IPS 4ms | 17.3" 1920*1080 60Hz wide angle TN notebook panel
Case Fractal Design Define XL R2 | MSI
Audio Device(s) Creative Sound Blaster Z | Realtek with quad stereo speakers and subwoofer
Power Supply Corsair HX850i Platinum | 19.5v 180w Delta brick
Software Windows 10 Pro x64 | Windows 10 Home x64
Joined
Nov 13, 2007
Messages
6,300 (1.65/day)
Likes
1,792
Location
Austin Texas
#30
I want that SGEMM performance bench... where the heck can I download it? I searched everywhere :eek:
That makes me wonder... does the bench software change the speed at which the algorithm/process/subroutine is performed? I mean, those have to be some pretty awesome cores for 16 of them at 2GHz to outperform 240 shaders at 1.5GHz. Those engineers probably tweaked the hell out of that bench.


Last I checked, MATLAB had an SGEMM bench, but I think it's CPU only (I really don't know). You may be able to find a CUDA one for Nvidia. IDK about ATI cards though.
 
Joined
Sep 11, 2009
Messages
2,680 (0.85/day)
Likes
693
Location
Reaching your left retina.
#31
So 2 TFLOPs is expected (32 cores), and high-end models (64 cores) could achieve 4 TFLOPs. I wish my 8800 GT had waited a year to die. :(
If you read the article, 1 TFLOP was achieved with 32 highly overclocked cores, which is the maximum Intel will have for now, so 1 TFLOP is the most you will see for now. And if the GTX 285 is doing 425 GFLOPS, IMO Fermi can destroy Larrabee in this test: Nvidia's own tests show 4x-5x over GT200. We'll see.

That makes me wonder... does the bench software change the speed at which the algorithm/process/subroutine is performed? I mean, those have to be some pretty awesome cores for 16 of them at 2GHz to outperform 240 shaders at 1.5GHz. Those engineers probably tweaked the hell out of that bench.
Like I said above, 1 TFLOP was achieved with 32 cores, and GT200 doesn't really have 240 cores. It truly has 10 cores, each 24 shader "cores" wide. Larrabee's "cores" are somewhat more complicated to count, but for a simple comparison we could say that it has 32 "true" cores, each 16 wide, so 32x16=512 "cores" compared to the 240 "cores" in GT200.
 
Joined
Nov 13, 2007
Messages
6,300 (1.65/day)
Likes
1,792
Location
Austin Texas
#32
If you read the article, 1 TFLOP was achieved with 32 highly overclocked cores, which is the maximum Intel will have for now, so 1 TFLOP is the most you will see for now. And if the GTX 285 is doing 425 GFLOPS, IMO Fermi can destroy Larrabee in this test: Nvidia's own tests show 4x-5x over GT200. We'll see.



Like I said above, 1 TFLOP was achieved with 32 cores, and GT200 doesn't really have 240 cores. It truly has 10 cores, each 24 shader "cores" wide. Larrabee's "cores" are somewhat more complicated to count, but for a simple comparison we could say that it has 32 "true" cores, each 16 wide, so 32x16=512 "cores" compared to the 240 "cores" in GT200.


Ahh, I see it now... I got confused since the shaders are always called "cores", while Intel's cores are blocks (in comparison): each core/block has 16 vector ALUs. Makes sense...

Good article: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3367&p=4
 
Joined
Feb 24, 2009
Messages
3,466 (1.04/day)
Likes
721
System Name Money Hole
Processor Core i7 970
Motherboard Asus P6T6 WS Revolution
Cooling Noctua UH-D14
Memory 2133Mhz 12GB (3x4GB) Mushkin 998991
Video Card(s) Sapphire Tri-X OC R9 290X
Storage Samsung 1TB 850 Evo
Display(s) 3x Acer KG240A 144hz
Case CM HAF 932
Audio Device(s) ADI (onboard)
Power Supply Enermax Revolution 85+ 1050w
Mouse Logitech G602
Keyboard Logitech G710+
Software Windows 10 Professional x64
#33
SGEMM is single, not double, precision.

It's defined as "multiplication of two matrices with single precision". It is NOT defined as an exact, pre-compiled, fixed piece of code to run (like 3DMark, for example).

This means that if Intel has good people who know how to optimize for their arch/compiler, they can make it run faster than another, unoptimized version.
How hard is coding for SGEMM? I ask since Intel is not known for its ability to write drivers.

After living through Intel's last dedicated graphics experiment in the very late 90s, I have a hard time believing Larrabee will be anything for Nvidia or ATI to worry about.
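
To illustrate the quoted point that SGEMM is a definition rather than a fixed binary, here is a minimal Python sketch. The helper names (`f32`, `sgemm_ijk`, `sgemm_ikj`) are made up for this example, and the single-precision rounding is emulated with the standard `struct` module rather than real vector hardware. Both functions compute the same product; they differ only in loop order, which is the kind of freedom an optimizer has.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sgemm_ijk(a, b):
    """Textbook triple loop: one dot product per output element."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc = f32(acc + f32(a[i][p] * b[p][j]))
            c[i][j] = acc
    return c

def sgemm_ikj(a, b):
    """Same product with the loops reordered so B is walked row by row."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            for j in range(m):
                c[i][j] = f32(c[i][j] + f32(aip * b[p][j]))
    return c

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(sgemm_ijk(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
print(sgemm_ikj(A, B) == sgemm_ijk(A, B))  # True
```

A benchmark only fixes the math and the precision, so two vendors can ship very different code and still both report an "SGEMM" number.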
 
Joined
Sep 25, 2007
Messages
5,822 (1.51/day)
Likes
618
#34
Not that any of us will have one, since it's for the HPC market.
 

wakkierob

New Member
Joined
Nov 29, 2009
Messages
72 (0.02/day)
Likes
5
#35
NVIDIA GeForce GTX 295
Specifications and Features

GPU:
Fabrication Process: 55nm
Processor Cores: 480
ROP Units: 56
Texture Filtering Units: 160
Core Clock (MHz): 576 MHz
Shader Clock (MHz): 1242 MHz
Texture Filtering Rate: 92.2Giga Texels/s

Memory:
Memory Clock (MHz DDR): 1998 MHz
Total Memory Config: 1792 MB
Memory Interface Width: 448-bit per GPU
Total Memory Bandwidth: 223.8GB/s

Surely this card would compare? Is it like Intel with the Extreme CPU in performance, do you think?

Or look at this one below:

The Sapphire HD 5970 OC 2GB comes equipped with a total of two RV 870 Cypress cores that have a total of 3200 Stream processors, delivering almost 5TFLOPs of processing power, making the HD 5970 the most powerful video card on the planet. Clock speeds come in a bit higher than the reference versions at 735MHz on the two Cypress cores and 1010MHz (4040MHz effective) on the 2GB of GDDR5 memory. Each core has 1GB of memory dedicated to it, running through a 512-bit bus (256x2). The specifications on paper look impressive, but that's not all; gone are the overclocking limits we have seen in the past as this card comes unlocked so you can throw the screws to it to gain some more FPS or distributed computing power. This card is designed to do some hardcore overclocking based on the construction. It features multiple Volterra voltage regulators, Japanese made pure ceramic SuperCapacitors, real time power monitoring and a programmable fan controller. The cores used are "low leakage" parts so you can get the best parts to push. With many HD 5870s hitting 1000MHz on the cores, overclocking should prove interesting.

256x2 = 512, so it could perhaps contend. Gulp, slurp slurp, what about it!

I mean, be real: in the real consumer market no one will have this type of GPU, so it's off topic. But maybe if ATI or Nvidia make a processor with the same power for less money, they will be forced to go cheaper, and then of course the industry will be forced to produce at a consumer level as well to compete.

Over 4 TFLOPs on a 5970 sounds good to me, even if it would be reduced in testing...?
 

erocker

Senior Moderator
Staff member
Joined
Jul 19, 2006
Messages
42,697 (9.94/day)
Likes
18,573
Processor i7 8700K
Motherboard Asus Maximus Hero X WiFi
Cooling Water
Memory 16GB G.Skill 3200Mhz CL14
Video Card(s) GTX 1080
Storage SSD's
Display(s) Nixeus EDG27
Case Thermaltake Core X5
Audio Device(s) Soundblaster Zx
Power Supply Corsair H1000i
Mouse Zowie EC1-B
#36
NVIDIA GeForce GTX 295
Specifications and Features

GPU:
Fabrication Process: 55nm
Processor Cores: 480
ROP Units: 56
Texture Filtering Units: 160
Core Clock (MHz): 576 MHz
Shader Clock (MHz): 1242 MHz
Texture Filtering Rate: 92.2Giga Texels/s

Memory:
Memory Clock (MHz DDR): 1998 MHz
Total Memory Config: 1792 MB
Memory Interface Width: 448-bit per GPU
Total Memory Bandwidth: 223.8GB/s

Surely this card would compare? Is it like Intel with the Extreme CPU in performance, do you think?
I don't understand the question. What does any of this have to do with an "extreme CPU"?

Larrabee is an entirely different architecture compared to an Nvidia card.

What does a GTX 295 have to do with this thread?
 
Joined
Apr 30, 2008
Messages
4,315 (1.18/day)
Likes
1,015
Location
Multidimensional
System Name Derp!
Processor i7 7700 @ 4.2Ghz Turbo On
Motherboard Gigabyte B250 Phoenix Wifi ITX Motherboard
Cooling Noctua NH-L9i LP Cooler || Cooler Master Fan Pro RGB 120mm x 2
Memory 16GB Corsair Vengeance LPX DDR4 2400mhz RAM
Video Card(s) AMD Reference RX 480 8GB
Storage 250GB SS 960 Evo M.2 || WD Blue 500GB SSD || 2TB SG FC SSHD
Display(s) Hisense 1080p Smart LED HDTV 40inch
Case Fractal Node 202 Mini ITX Case
Audio Device(s) Realtek HD Audio / HDMI Audio Via GPU
Power Supply Corsair SFX 600W PSU
Mouse CoolerMaster Masterkeys Lite L RGB Mouse
Keyboard CoolerMaster Masterkeys Lite L RGB Mem-Chanical Keyboard
Software Windows 10 Home 64bit
Benchmark Scores Later
#37
I don't understand the question. What does any of this have to do with an "extreme CPU"?

Larrabee is an entirely different architecture compared to an Nvidia card.

What does a GTX 295 have to do with this thread?
I think he meant the Larrabee processor :rolleyes:
 
Joined
Sep 25, 2007
Messages
5,822 (1.51/day)
Likes
618
#38
Someone please tell him the difference between theoretical and actual :shadedshu
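
For anyone following along, the theoretical-versus-actual gap can be sketched with numbers already posted in this thread. This is a rough illustration only: `peak_gflops` is a made-up helper, the factor of 2 assumes one multiply-add per lane per clock, and the 425 figure for the GTX 285 is assumed to be GFLOPS.

```python
def peak_gflops(lanes, clock_ghz, flops_per_lane_per_clock=2):
    """Theoretical peak: every lane retires one multiply-add on every clock."""
    return lanes * clock_ghz * flops_per_lane_per_clock

# HD 5970 as quoted above: 3200 stream processors at 735 MHz.
hd5970_peak = peak_gflops(3200, 0.735)

# GT200 as quoted above: 240 shaders at roughly 1.5 GHz (assumption: 2 flops/clock).
gt200_peak = peak_gflops(240, 1.5)
gt200_measured = 425.0  # SGEMM figure from the thread, assumed to be GFLOPS

efficiency = gt200_measured / gt200_peak
print(f"HD 5970 paper peak: {hd5970_peak:.0f} GFLOPS")
print(f"GT200 paper peak:   {gt200_peak:.0f} GFLOPS")
print(f"GT200 SGEMM runs at {efficiency:.0%} of that paper peak")
```

The "almost 5 TFLOPs" on the box is the first kind of number; an SGEMM result is the second kind, and it is always lower.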
 

wakkierob

New Member
Joined
Nov 29, 2009
Messages
72 (0.02/day)
Likes
5
#39
I don't understand the question. What does any of this have to do with an "extreme CPU"?

Larrabee is an entirely different architecture compared to an Nvidia card.

What does a GTX 295 have to do with this thread?
In the first post the guy was comparing it to GT200 and HD graphics cards, and this is the graphics card section. Which part don't you understand? There's nothing about extreme CPUs in this section!!!!

I think someone is confused, but not me.

Look:

This is to do with how fast a processor dedicated to that task performs, which is to do with the graphical process of the speed of screen display, FPS (frames per second), OK!

The General Matrix Multiply (GEMM) is a subroutine in the Basic Linear Algebra Subprograms (BLAS) which performs matrix multiplication, that is the multiplication of two matrices. This includes:

SGEMM for single precision,
DGEMM for double-precision,
CGEMM for complex single precision, and
ZGEMM for complex double precision
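
As a rough illustration of the SGEMM/DGEMM split in that list, the sketch below runs the standard GEMM update C <- alpha*A*B + beta*C twice, once rounding every intermediate to single precision and once leaving it in double. The function names are made up for this example, and single precision is emulated with the `struct` module; real BLAS libraries are far more elaborate.

```python
import struct

def to_single(x):
    """Round to the nearest single-precision (32-bit) value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def to_double(x):
    """Python floats are already double precision, so nothing to do."""
    return x

def gemm(alpha, a, b, beta, c, rnd):
    """C <- alpha*A*B + beta*C, rounding every intermediate result with rnd."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc = rnd(acc + rnd(rnd(a[i][p]) * rnd(b[p][j])))
            out[i][j] = rnd(rnd(alpha * acc) + rnd(beta * rnd(c[i][j])))
    return out

A = [[1.0, 1e-8]]
B = [[1.0], [1.0]]
C = [[0.0]]

sgemm_result = gemm(1.0, A, B, 0.0, C, to_single)
dgemm_result = gemm(1.0, A, B, 0.0, C, to_double)
print(sgemm_result)  # the 1e-8 term is lost in single precision
print(dgemm_result)  # double precision keeps it
```

Same routine, same inputs, different precision: that is the whole difference between the S and D variants.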

So you're telling me all these experts in graphics benching and programming are basing their results on theory rather than practice, and that this one test is the only one that gives true results for the processor you describe, right?

So you're not just telling me that, but also all the people who bench their hardware with 3DMark and similar benchmarks, right? It just sounds like Intel is trying to get money from you to bench graphics processors their way.

A Larrabee video card will be tested with 3DMark and can then be compared the same way, or there is no argument. Not that I'm saying Larrabee is not a monster if it were used for gaming, but prove it with 3DMark, or the cards are not doing the same job, right? So why compare it to graphics cards that are not used for these tasks?

What I mean is, why compare it to the consumer market cards if it will be used for something totally different!
 
Joined
Sep 25, 2007
Messages
5,822 (1.51/day)
Likes
618
#40
It was tested against the Tesla and FireStream, which are somewhat optimized for this type of work. The thing is that when you run 3DMark on a Tesla or a FireStream, they sometimes score a good bit lower than the GeForce and Radeon equivalents. I don't think this will be any different; it will probably be optimized only for single precision, not for double precision.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
21,528 (6.18/day)
Likes
10,714
Location
IA, USA
System Name BY-2015
Processor Intel Core i7-6700K (4 x 4.00 GHz) w/ HT and Turbo on
Motherboard MSI Z170A GAMING M7
Cooling Scythe Kotetsu
Memory 2 x Kingston HyperX DDR4-2133 8 GiB
Video Card(s) PowerColor PCS+ 390 8 GiB DVI + HDMI
Storage Crucial MX300 275 GB, Seagate 6 TB 7200 RPM
Display(s) Samsung SyncMaster T240 24" LCD (1920x1200 HDMI) + Samsung SyncMaster 906BW 19" LCD (1440x900 DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay
Audio Device(s) Realtek Onboard, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse SteelSeries Sensei RAW
Keyboard Tesoro Excalibur
Software Windows 10 Pro 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
#41
:wtf: Larrabee is being advertised for the HPC market... meaning they will use these to find oil and cures for cancer, and to run Monte Carlo simulations, physics computations, etc. They're basically advertising this for supercomputing use.

I really don't think anyone there cares about its 3DMark score :laugh:
It's no different than GeForce/CUDA or Radeon/Stream. Larrabee is Larrabee: it is designed to fulfill both roles from the start. It isn't something thought of ten years after the fact and glued on. I think you underestimate the power of Larrabee in both the high performance computing and graphics segments. If one card can't do both, then GeForce/Radeon must be vaporware as well. :confused:


It was tested against the Tesla and FireStream, which are somewhat optimized for this type of work. The thing is that when you run 3DMark on a Tesla or a FireStream, they sometimes score a good bit lower than the GeForce and Radeon equivalents. I don't think this will be any different; it will probably be optimized only for single precision, not for double precision.
All graphics cards are optimized for single precision because that is almost exclusively what computing uses (games, research, and otherwise). Double precision is twice the size and takes more than twice as long to compute compared to single. When time is money, single precision is preferred.

That doesn't mean GeForce, Radeon, and Larrabee can't do double, because they sure can. It just isn't worth the performance penalty in most cases.
 
Joined
Sep 11, 2009
Messages
2,680 (0.85/day)
Likes
693
Location
Reaching your left retina.
#42
It's no different than GeForce/CUDA or Radeon/Stream. Larrabee is Larrabee: it is designed to fulfill both roles from the start. It isn't something thought of ten years after the fact and glued on. I think you underestimate the power of Larrabee in both the high performance computing and graphics segments. If one card can't do both, then GeForce/Radeon must be vaporware as well. :confused:



All graphics cards are optimized for single precision because that is almost exclusively what computing uses (games, research, and otherwise). Double precision is twice the size and takes more than twice as long to compute compared to single. When time is money, single precision is preferred.

That doesn't mean GeForce, Radeon, and Larrabee can't do double, because they sure can. It just isn't worth the performance penalty in most cases.
Double precision is actually very common in research. 32-bit is nowhere near as precise as required in most scenarios, especially when working with numbers close to 0 or with equations involving infinities (for the same reason: not enough precision near 0).
 
Joined
Nov 13, 2007
Messages
6,300 (1.65/day)
Likes
1,792
Location
Austin Texas
#43
It's no different than GeForce/CUDA or Radeon/Stream. Larrabee is Larrabee: it is designed to fulfill both roles from the start. It isn't something thought of ten years after the fact and glued on. I think you underestimate the power of Larrabee in both the high performance computing and graphics segments. If one card can't do both, then GeForce/Radeon must be vaporware as well. :confused:
No, I understand that, but at the moment they are advertising it to the HPC segment. Larrabee is a response to CUDA and OpenCL; it's definitely not because Intel wants to break into the gamer market. They're not coming out with 3DMark scores.

Intel is coming from the other end of the spectrum... GeForce and Radeon are graphics cards that can compute; Larrabee is a compute card that can do graphics. Its primary function, IMO, is to stop CUDA and OpenCL from cutting into Intel's crunching pie (and a substantial pie it is).
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
21,528 (6.18/day)
Likes
10,714
Location
IA, USA
#44
Double precision is actually very common in research. 32-bit is nowhere near as precise as required in most scenarios, especially when working with numbers close to 0 or with equations involving infinities (for the same reason: not enough precision near 0).
Double precision allows more decimal places but, if you know the scale of the numbers you are working with, those extra decimal places are moot. In the end, they usually use a single precision float coupled with a scale for a set of numbers. Multiply the float by the scale and you've got yourself performance and accuracy.

As to infinities, single (0x7f800000) and double (positive: 0x7ff0000000000000, negative: 0xfff0000000000000) each have a value set aside which is flagged as "infinite."
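
Those bit patterns are easy to check. A small sketch using Python's `struct` module follows; `single_bits` and `double_bits` are names made up for this example. It also prints the single-precision negative infinity, which is not listed above.

```python
import struct

def single_bits(x):
    """Return the 32-bit IEEE 754 pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

inf = float("inf")
print(hex(single_bits(inf)))    # 0x7f800000
print(hex(single_bits(-inf)))   # 0xff800000
print(hex(double_bits(inf)))    # 0x7ff0000000000000
print(hex(double_bits(-inf)))   # 0xfff0000000000000
```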


No, I understand that, but at the moment they are advertising it to the HPC segment. Larrabee is a response to CUDA and OpenCL; it's definitely not because Intel wants to break into the gamer market. They're not coming out with 3DMark scores.

Intel is coming from the other end of the spectrum... GeForce and Radeon are graphics cards that can compute; Larrabee is a compute card that can do graphics. Its primary function, IMO, is to stop CUDA and OpenCL from cutting into Intel's crunching pie (and a substantial pie it is).
The video card came before the high performance computing aspects of it. There is more money in discrete video cards than in cards to bolster CPU performance.

When Intel gathered information that showed a series of x86 CPUs (that's what Intel is all about, after all) could rival the performance of a modern GPU offered by NVIDIA and AMD, the idea of Larrabee was born. From that idea came the fact that it is x86 and programmers would easily be able to use it, so the GPU idea became a GPGPU idea. As proof of this, note how little information has been released about Larrabee's GPU performance. Intel knows people will buy Larrabee as a GPGPU card just because it has the Intel brand on it. On the other hand, Intel knows they have to topple two corporations that have been in the segment for well over a decade. When a product like Core 2 is kept quiet until release, the media frenzy sparked by a new, dominant product sells it. That is most likely the same strategy Intel is relying on to sell Larrabee. It also explains why they are so tight-lipped about its performance as a GPU.
 
Joined
Sep 11, 2009
Messages
2,680 (0.85/day)
Likes
693
Location
Reaching your left retina.
#45
Double precision allows more decimal places but, if you know the scale of the numbers you are working with, those extra decimal places are moot. In the end, they usually use a single precision float coupled with a scale for a set of numbers. Multiply the float by the scale and you've got yourself performance and accuracy.

As to infinities, single (0x7f800000) and double (positive: 0x7ff0000000000000, negative: 0xfff0000000000000) each have a value set aside which is flagged as "infinite."
You didn't understand me. I'm not talking about being able to represent those numbers. I'm saying that in the application you could get results that are very, very close to zero but not zero, and because the number can't be represented it will be "rounded" to the closest representable number, which could be either zero or a number orders of magnitude bigger than the true one. If that number (the true one you are looking for) were then multiplied by a very large number, the results would differ greatly. You may expect something in the hundreds and get millions in return, or zero if it was rounded to zero, or simply an error if you are multiplying by infinity (zero x infinity = undetermined), while you should get infinity as the result (something finite x infinity = infinity).

Example: imagine that you have two complex formulas and the result of one is (should be) something like x=0.000000000000000000000000000000001 (decimal), while the other result is y=100,000,000,000,000,000,000,000,000,000. Don't pay attention to the number of zeroes; it's just an example and I didn't count them myself. Imagine that the result x*y should be 100, but x can't be represented in 32 bits and the closest representable number is either zero or 0.0000000000011 (whatever). In either case you are not getting anything close to what you need, and these cases occur a lot in physics, astronomy, probably in genetics too...
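
The effect described in that example can be reproduced with a short sketch. The values 1e-46, 1e30 and 1e48 are hypothetical, picked only so that the small number falls below what single precision can hold; `to_single` is a made-up helper that emulates 32-bit rounding with the `struct` module.

```python
import struct

def to_single(x):
    """Round to single precision; finite values too large for it become infinity."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

x, y = 1e-46, 1e30            # hypothetical results of two formulas
product_double = x * y        # double precision keeps roughly 1e-16
product_single = to_single(to_single(x) * to_single(y))

print(to_single(x))           # x underflows to zero in single precision
print(product_single)         # so the product collapses to zero as well
print(product_double)

# If the large factor also overflows, the product is undetermined (NaN).
undetermined = to_single(x) * to_single(1e48)
print(undetermined)
```

In double precision the same three lines give a usable answer, which is the argument being made for DGEMM-class hardware in research.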
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
21,528 (6.18/day)
Likes
10,714
Location
IA, USA
#46
You didn't understand me either. Multiply that value by 1,000,000,000 before dividing and it is no longer close to zero.
 
Joined
Sep 11, 2009
Messages
2,680 (0.85/day)
Likes
693
Location
Reaching your left retina.
#47
You didn't understand me either. Multiply that value by 1,000,000,000 before dividing and it is no longer close to zero.
Oftentimes you can't. For example, when working with particles moving at speeds close to light speed*, but there are many other cases.

*Because the speed is going to be very high, and any movement or time elapsed in any event is going to be very, very small.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
21,528 (6.18/day)
Likes
10,714
Location
IA, USA
#48
I updated an old program of mine which basically does the following...
1) Creates one thread per core; all each thread does is increment a counter.
2) Once all threads are created, it resets all counters to zero.
3) It waits for one second, then grabs and resets the counters.
4) After it has 10 results, it adds them together to get a cumulative value.


The cumulative value is basically the same as taking the power of all the cores and adding them together (literally).

The results were surprising on my Core i7 920:
Code:
UInt32: 2045845947.75
 Float: 633960903.875
UInt64: 2602888497.625
Double: 1673840629.125
Important note: a float can only be incremented up to 16,777,216 (2^24), because it only holds about 7 significant digits (read here for an explanation); past that point, adding 1 no longer changes the value. Because of this, I had to scale it, as I suggested, to keep the counter from stalling. Here is a comparison of the functional code:

Code:
Double:
         private void Looper()
        {
            while (true)
                _Counter++;
        }

Single:
        private void Looper()
        {
            while (true)
            {
                if (_Counter == MAX)
                {
                    ResetCounter();
                    _Multiplier++;
                }

                _Counter++;
            }
        }
That if statement in single apparently incurs a rather large performance penalty (much larger than anticipated).
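To see the ceiling that forces the reset, here is a small C# sketch (the class and method names are mine, not from the program above). It counts up in a float until adding 1 no longer changes the value; the explicit cast keeps each sum at 32-bit precision.

```csharp
using System;

public static class FloatLimitDemo
{
    // Returns the first value f for which (float)(f + 1) == f.
    public static float FirstStuckValue()
    {
        float f = 0f;
        while (true)
        {
            // The cast forces rounding to single precision even if the
            // runtime evaluates the addition at higher precision.
            float next = (float)(f + 1f);
            if (next == f)
                return f;
            f = next;
        }
    }
}
```

`(int)FloatLimitDemo.FirstStuckValue()` comes out as 16777216, i.e. 2^24: past that point a float can no longer represent every integer, so a plain `_Counter++` would spin without making progress.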

Moreover, I was shocked to see UInt32 and UInt64 so close.


Anyway, here are the full results on my Core i7 920:
Code:
UInt scores...
0	856077121	855505838	879774890	881755779	857463969	858833899	863758164	860493813
1	1289681239	1289137188	1317759063	1320227048	1284711952	1285837496	1290960177	1287496641
2	1718130750	1717533109	1760402478	1763469884	1716494840	428268464	1722603742	1715762899
3	2145194573	2144265190	2196212534	2199489935	2140143894	849644796	2146118492	2137410750
4	2572067760	2571340259	2639460123	2642700823	2567865980	1277313214	2573729491	2565218438
5	3005515809	3004783299	3070568998	3074128809	2995721330	1705708878	3001469326	2993683277
6	3437849620	3437074513	3508163167	3512091151	3422920242	2133772766	3428523633	3421769317
7	3866929158	3866135509	3941008415	3945202232	3846848411	2565943668	3852289332	3853869102
8	1298918	514536	75958995	80343916	4276316131	2999488674	4281647479	4287358130
9	434706620	433895585	515246174	519847659	409302670	431268809	414545526	423649389

UInt averages:
  [0]: 1932745156
  [1]: 1932018502
  [2]: 1990455483
  [3]: 1993925723
  [4]: 2351778941
  [5]: 1453608066
  [6]: 2357564536
  [7]: 2354671175
  [C]: 2045845947.75

Float scores...
0	467767013	471028154	470236515	470953884	471272208	467856302	470949616	466040867
1	685767544	703940869	702847933	703758470	704361679	685942293	704005225	685951321
2	903872881	936642995	935772512	936739232	937534238	904166300	937221932	904239159
3	1121942528	1169521375	1168826497	1169903846	1170521728	1121777882	1170157453	1121841473
4	1339682160	1402025480	1401815810	1403003370	1403576819	1340414431	1403142313	1340380991
5	1558339844	1635052815	1634663136	1635944458	1636628036	1558528177	1636113072	1558557159
6	1776514716	1868112676	1867570734	1868976335	1869531678	1776620293	1868951994	1776584107
7	1994468985	2101113447	2100536195	2102048536	2102477293	1994766365	2101726272	1994791111
8	-2082677333	-1961157796	-1961554886	-1959944080	-1959433371	-2082111967	-1960116536	-2082076823
9	-1864264490	-1728216535	-1728646239	-1726921589	-1726566211	-1864105228	-1727222451	-1864132847

Float averages:
  [0]: 590141384
  [1]: 659806348
  [2]: 659206820
  [3]: 660446246
  [4]: 660990409
  [5]: 590385484
  [6]: 660492889
  [7]: 590217651
  [C]: 633960903.875

ULong scores...
0	904870023	882782412	906855091	908835300	877603067	882806692	904901719	865663100
1	433200245	422532902	1349360475	1351520900	1304095077	1305442712	1338295360	1292128885
2	876886430	428063128	1790730151	1793068528	1732311369	1733579946	1782115967	1720194433
3	1315434546	860384568	2229662721	2232480987	2163520356	2165920628	2220706211	2151294303
4	1758937851	1294188788	2667766571	2671045230	2594642959	2599757789	2664369712	2582205782
5	2201983567	1722958985	3100982860	3104448338	3031016719	3028581663	3107573372	3018551694
6	2641764439	2155887417	3543216734	3546861108	3454599717	3461460181	3547480431	3441984499
7	3078203638	2583190533	3982562676	3986694814	3890422782	3888828410	3983996800	3877703893
8	3519860475	3013344010	4416156238	4420778929	4308030439	4318989235	4425771068	4295077269
9	3956195612	3444488415	4851035036	4855913459	4732437643	4750175911	4862239461	4719468464

ULong averages:
  [0]: 2068733682
  [1]: 1680782115
  [2]: 2883832855
  [3]: 2887164759
  [4]: 2808868012
  [5]: 2813554316
  [6]: 2883745010
  [7]: 2796427232
  [C]: 2602888497.625

Double scores...
0	546644896	546683231	543515961	544158253	547323680	546130210	547258254	545931722
1	809585260	809687095	805208961	805995873	811062191	810660798	810996610	810332046
2	1074791955	1074965004	1067698283	1068743733	1077134825	1078481364	1077081969	1078109357
3	1337312671	1337587886	1330210724	1331437563	1340576512	1341548515	1340501781	1341095981
4	1602081634	1602389372	1591677705	1593045224	1604059475	1605942770	1603989626	1605418689
5	1868544351	1868925841	1855200968	1856709547	1870718432	1867856175	1870637400	1867194243
6	2133873141	2134324390	2118975367	2120662494	2134172626	2132539727	2134065417	2131862886
7	2401144388	2401656582	2383060346	2385070285	2397502809	2392659574	2397330467	2391848656
8	265088745	2666758896	2647202755	2649279766	2661619439	2658193973	2661864083	2657280768
9	527310383	2929093301	2910513597	2912740416	2925074924	2922969034	2926830942	2921835575

Double averages:
  [0]: 1256637742
  [1]: 1737207159
  [2]: 1725326466
  [3]: 1726784315
  [4]: 1736924491
  [5]: 1735698214
  [6]: 1737055654
  [7]: 1735090992
  [C]: 1673840629.125
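As a sanity check on how the combined figure relates to the per-thread numbers: the `[C]` value reported for UInt on the Core i7 920 matches the arithmetic mean of the eight per-thread UInt averages. The C# snippet below just redoes that arithmetic with the figures copied from the UInt averages block above; the class and method names are placeholders.

```csharp
using System;
using System.Linq;

public static class CombinedScoreCheck
{
    // Per-thread UInt averages [0]..[7] from the Core i7 920 results.
    public static readonly long[] UIntAverages =
    {
        1932745156, 1932018502, 1990455483, 1993925723,
        2351778941, 1453608066, 2357564536, 2354671175
    };

    // Arithmetic mean of the per-thread averages.
    public static double Combined(long[] averages)
    {
        return averages.Average();
    }
}
```

`CombinedScoreCheck.Combined(CombinedScoreCheck.UIntAverages)` gives 2045845947.75, the same as the UInt `[C]` line.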


My server (2 x Xeon 5310) yielded completely different results:
Code:
UInt32: 1527420409.875
 Float: 1047687385.375
UInt64: 1475289090.625
Double: 1016090759.625
Note how, even with the if statement in the float, 32-bit still wins.

Here are the full results on my 2 x Xeon 5310:
Code:
UInt scores...
0	543384393	543013687	544768497	544519909	544522800	544533281	544285818	540900365
1	265650138	808439622	810599932	810343134	810381446	810342297	810108109	806017030
2	264692667	1073574076	1076314643	1075927570	1075997201	1075977705	1075827588	1069963881
3	530064684	1339090388	1342110108	1341699536	1341336483	1341853665	1341694542	1334987233
4	795179857	1604107089	1607797296	1607440210	1606794662	1607620876	1607247665	264819565
5	1060658514	1869491878	1873701411	1873226779	1872575166	1873417399	1872971272	529775080
6	1326004786	2134697430	2139486023	2138957826	2138117993	2139302568	2138762528	793030071
7	1591253901	2400293424	2405382066	2404846882	2403992346	2405201714	2404650333	1057502201
8	1856887154	2665726512	2671276528	2670731516	2669807772	2671038404	2670556649	1321704298
9	265601922	2931152521	2937115627	2936649573	2935610876	2936959538	2936398432	1585186267

UInt averages:
  [0]: 849937801
  [1]: 1736958662
  [2]: 1740855213
  [3]: 1740434293
  [4]: 1739913674
  [5]: 1740624744
  [6]: 1740250293
  [7]: 930388599
  [C]: 1527420409.875

Float scores...
0	343159162	348036098	348540105	348510997	344074918	348580163	348593099	343930780
1	493878096	506850229	494871034	507867843	503400554	507959625	494931936	494508796
2	645115541	665736598	645932044	662622949	662711755	667290541	654296194	645690071
3	796439448	813419905	805392946	822098183	822147736	826821011	813779126	796855491
4	947463893	972732429	964932856	981607238	981608411	986315905	973312846	947934527
5	1098717332	1115652728	1124488142	1141161533	1141161458	1145867083	1132859483	1098958231
6	1247419224	1274969377	1284010215	1300652748	1300701635	1305401876	1292404029	1245271473
7	1393086009	1434256180	1434563100	1460152517	1460077137	1464874327	1451900351	1397763476
8	1551435123	1593553813	1594091051	1619692270	1619603894	1624393784	1611428208	1551815319
9	1702620579	1752915804	1753372528	1779228928	1779155488	1783926200	1770975219	1700461910

Float averages:
  [0]: 1021933440
  [1]: 1047812316
  [2]: 1045019402
  [3]: 1062359520
  [4]: 1061464298
  [5]: 1066143051
  [6]: 1054448049
  [7]: 1022319007
  [C]: 1047687385.375

ULong scores...
0	582261138	581047369	582201357	582178112	582112193	582042596	579340111	579264767
1	847929521	845968120	847962374	848036468	847758580	847849943	844069137	265533571
2	1113832602	1111554441	1113869036	1113930292	1113661430	1113749831	1109327346	265696998
3	1379749130	1377163737	1379780089	1379832068	1379571718	1379663761	265133364	531503321
4	1645650030	1642776586	1645698454	1645742496	1645497019	1645562389	530114885	265504129
5	1911528491	1908369163	1911585160	1911643950	1911401632	1911470279	795319950	531122548
6	2177435194	2173983488	2177495196	2177559135	2177292264	2177379427	1060624675	265375346
7	2443334400	2439575454	2443405099	2443468643	2443182697	2443290716	1325862209	530786524
8	2709260475	2705192627	2709328450	2709394782	2709099442	2709214998	265404749	796515357
9	2975155122	2970805059	2975239479	2975292551	2975013273	2975121275	264994776	265482720

ULong averages:
  [0]: 1778613610
  [1]: 1775643604
  [2]: 1778656469
  [3]: 1778707849
  [4]: 1778459024
  [5]: 1778534521
  [6]: 704019120
  [7]: 429678528
  [C]: 1475289090.625

Double scores...
0	177056079	384897913	385405194	385422378	385352554	385380109	385349414	383566603
1	354196979	561962556	562680188	562697606	562628588	562654416	562621391	560458831
2	531241588	739032755	739949488	739916338	739873685	739930890	739887326	176653405
3	176955390	916104858	917214970	917184176	917120035	917210228	917144454	353309889
4	177164621	1093179967	1094478625	1094457442	1094381951	1094489572	1094425550	530191061
5	177065725	1270238240	1271733092	1271715876	1271425179	1271757900	1271677689	706989162
6	354206668	1447301565	1449000475	1448974223	1448582450	1449029732	1448951100	883843788
7	531395716	1624378328	1626278472	1626252479	1625853277	1626307191	1626228075	1060740857
8	176980301	1801452031	1803553657	1803522520	1803109961	1803578911	1803502098	1237397120
9	354129914	1978528431	1980837375	1980803511	1980374321	1980852147	1980780394	1414101792

Double averages:
  [0]: 301039298
  [1]: 1181707664
  [2]: 1183113153
  [3]: 1183094654
  [4]: 1182870200
  [5]: 1183119109
  [6]: 1183056749
  [7]: 730725250
  [C]: 1016090759.625

Core i7 apparently doesn't take to the if statement as well as the Core 2 based processors do. The Core i7 has a very weak showing in 32-bit float, so my conclusion is that it boils down to the hardware...
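One caveat before pinning it all on the hardware: rows 8 and 9 of the Core i7 920 Float scores are negative, which is what a count looks like after it passes 2,147,483,647 in a signed 32-bit integer. That is an inference from the numbers, not something confirmed from the program's source. A quick C# check (hypothetical names) undoes the wrap on thread 0's row 8 value and compares it with row 7:

```csharp
using System;

public static class WrapCheck
{
    // Thread 0, Core i7 920 Float scores: row 7 and row 8.
    public const int Row7 = 1994468985;
    public const int Row8 = -2082677333;

    // Maps a negative Int32 back to the unsigned count it would have wrapped from.
    public static long Unwrap(int value)
    {
        return value < 0 ? value + 4294967296L : value;
    }

    // Difference between row 8 (after unwrapping) and row 7.
    public static long StepAfterUnwrap()
    {
        return Unwrap(Row8) - Row7;
    }
}
```

`WrapCheck.StepAfterUnwrap()` is 217820978, in line with the roughly 218 million step between the earlier rows of that column. If the wrap is real, the negative samples drag the Float averages down (thread 0's reported 590141384 is the mean of all ten rows, negatives included), so part of the i7's weak float showing may come from the bookkeeping rather than the if statement.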


Single precision floats are the norm for stressing a computer, so I really see no problem with it. All things being the same, single should be equal to, or faster than, double precision.

Seeing as Larrabee is using a modernized Pentium (P54C) core, it is hard to predict how its double precision performance will compare to single.
 
Joined
Nov 13, 2007
Messages
6,300 (1.65/day)
Likes
1,792
Location
Austin Texas
System Name TimeDumpster
Processor Intel i7 7820X Delidded @ 4.75Ghz / 3.1Ghz Mesh
Motherboard MSI X299 Tomahawk
Cooling 240mm Corsair H105 Intake
Memory 32 GB Quad 3434Mhz DDR4 15-16-16-38-300-1T
Video Card(s) Gigabyte GTX 1080 Ti Gaming
Storage 1Tb Samsung 960 Pro m2, 1TB Samsung 850 Pro SSD
Display(s) Dell 24" 2560x1440 144hz, G-Sync @ 165Hz
Case NZXT S340 Elite Black
Audio Device(s) Arctis 7
Power Supply FSP HydroG 750W
Mouse zowie ec-2
Keyboard corsair k65 tenkeyless
Software Windows 10 64 Bit
Benchmark Scores Cb: 2103 Multi, 209 Single, 10450 Timespy - 10150 GPU/11900 CPU, superpi 1M - 7.71s
#49
The video card came before the high performance computing aspects of it. There is more money in discrete video cards than cards to bolster CPU performance.

When Intel gathered information that showed a series of x86 CPUs (that's what Intel is all about, after all) could rival the performance of a modern GPU offered by NVIDIA and AMD, the idea of Larrabee was born. Because it is x86, programmers would easily be able to use it, so the GPU idea became a GPGPU idea. As proof of this, note how little information has been released about Larrabee's GPU performance. Intel knows people will buy Larrabee as a GPGPU card just because it has the Intel brand on it. On the other hand, Intel knows they have to topple two corporations that have been in the segment for well over a decade. When a product, like Core 2, is quiet until release, the media frenzy sparked by a new, dominant product sells itself. That is most likely the same strategy Intel is relying on to sell Larrabee. It also explains why they are so tight-lipped about its performance as a GPU.
It's a possibility, but I really don't think that this is about graphics anymore.

http://www.datacenterknowledge.com/archives/2009/10/05/nvidias-fermi-gpu-targets-the-hpc-market/

Even NVIDIA is openly admitting that the GPGPU and HPC markets are the primary targets for Fermi.

And in the first presentation of Larrabee, Intel talked about the evolution of computing, not the 'evolution of graphics'.



This is the "supercomputing for the masses" movement, and it represents a different mentality altogether, one in which the CPU is nothing more than a glorified scheduler and the GPU does all the heavy lifting. This is the gist of what I get from all the GPGPU hype, and it seems like Intel is positioning Larrabee as such.
 

FordGT90Concept

"I go fast!1!11!1!"
#50
At this point, it is like asking which came first: the chicken or the egg. I'm not convinced more than 1% of the market cares about general purpose computing beyond the capabilities of the CPU. People always want bigger screens; bigger screens mean higher resolutions, and higher resolutions mean better graphics cards. We'll see what the driving force behind the market for discrete cards is in a few years' time.