Monday, November 12th 2012

Intel Delivers New Architecture for Discovery with Intel Xeon Phi Coprocessors

Marking a new era in high-performance computing, Intel Corporation introduced the Intel Xeon Phi coprocessor, the culmination of years of research and collaboration, to bring unprecedented performance for innovative breakthroughs in manufacturing, life sciences, energy and other areas.

The ability to quickly compute, simulate and make more informed decisions has propelled the growth of high-performance computing (HPC) and analytics. This growth has been driven by global business and research priorities to more accurately predict weather patterns, create more efficient energy resources, and develop cures for diseases, among many other pressing issues. With the breakthrough performance per watt and other new attributes of the Intel Xeon Phi coprocessor, the industry will gain greater reliability in generating accurate answers, help proliferate high-performance computing beyond laboratories and universities, and achieve maximum productivity.


"Intel Xeon Phi coprocessor represents an achievement in Intel innovation that will help propel us to new heights in research and discovery, and reaffirms our commitment to Exascale-level computing," said Diane Bryant, vice president and general manager of the Datacenter and Connected Systems Group. "The combination of the Intel Xeon processor family and the Intel Xeon Phi coprocessor will change the scope and scale of what highly parallel applications can accomplish, by delivering unprecedented performance, efficiency and programmability. With this technology as a new foundation for HPC, solving real-world challenges from accurately predicting weather patterns 21 days in advance, to developing new cures for diseases will become increasingly possible."

Based on the Intel Many Integrated Core (Intel MIC) architecture, Intel Xeon Phi coprocessors will complement the existing Intel Xeon processor E5-2600/4600 product families to deliver unprecedented performance for highly parallel applications. The Intel Xeon processor E5 family is a high-performance computing workhorse that has powered numerous Top500 systems to Petascale performance (1 quadrillion floating point operations per second). Now with Intel Xeon Phi products handling much of the "highly parallel" processing to help supercomputers produce answers for a wide range of scientific and technical disciplines such as genetic research, oil and gas exploration and climate modeling, Intel believes that this powerful combination will help blaze a path to Exascale computing, which would mark a thousand-fold increase in computational capabilities over Petascale.

Saving Time and Resources with World's Most Popular Programming Model

The Intel Xeon Phi coprocessor takes advantage of familiar programming languages, parallelism models, techniques and developer tools available for the Intel architecture. This helps ensure that software companies and IT departments can make greater use of parallel code without retraining developers on the proprietary, hardware-specific programming models associated with accelerators. Intel is providing software tools to help scientists and engineers optimize their code to take full advantage of Intel Xeon Phi coprocessors, including Intel Parallel Studio XE and Intel Cluster Studio XE. Available today, these tools enable code optimization and, because Intel Xeon Phi coprocessors and the Intel Xeon processor E5 product family share the same programming languages and models, help applications benefit both from tens of Intel Xeon Phi coprocessor cores and from more efficient use of Intel Xeon processor threads.

Introducing Two New Intel Xeon Phi Product Families

Intel is introducing two new Intel Xeon Phi coprocessor families, built with its most advanced 22-nanometer 3-D tri-gate transistors, that provide optimal performance and performance-per-watt for highly parallel HPC workloads.

The Intel Xeon Phi coprocessor 3100 family will provide great value for those seeking to run compute-bound workloads such as life science applications and financial simulations. The Intel Xeon Phi 3100 family will offer more than 1000 Gigaflops (1 TFlops) double-precision performance, support for up to 6 GB memory at 240 GB/sec bandwidth, and a series of reliability features including memory error correction codes (ECC). The family will operate within a 300W thermal design point (TDP) envelope.

The Intel Xeon Phi coprocessor 5110P provides additional performance at a lower power envelope. It reaches 1,011 Gigaflops (1.01 TFlops) double-precision performance, and supports 8 GB of GDDR5 memory at a higher 320 GB/sec memory bandwidth. With 225 watts TDP, the passively cooled Intel Xeon Phi coprocessor 5110P delivers power efficiency that is ideal for dense computing environments, and is aimed at capacity-bound workloads such as digital content creation and energy research. This coprocessor has been delivered to early customers and is featured in the 40th edition of the Top500 list.

To provide early access to new Intel Xeon Phi coprocessor technology for customers such as Texas Advanced Computing Center (TACC), Intel has additionally offered customized products: Intel Xeon Phi coprocessor SE10X and Intel Xeon Phi coprocessor SE10P. These offer 1073 GFlops double-precision performance at a 300W TDP, with the rest of the specification similar to the Intel Xeon Phi coprocessor 5110P.
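As a rough illustration of the performance-per-watt positioning described above, the quoted peak double-precision figures can be divided by the quoted TDP. The sketch below uses only numbers stated in this announcement; the dictionary layout and function name are illustrative assumptions, and the 3100 family value is a floor, since the text says "more than 1000 Gigaflops". These are paper figures, not measured results.

```python
# Peak double-precision GFlops and TDP (watts) as quoted in the announcement.
# The 3100 family value is a lower bound ("more than 1000 Gigaflops").
SPECS = {
    "Xeon Phi 3100 family": (1000.0, 300.0),
    "Xeon Phi 5110P": (1011.0, 225.0),
    "Xeon Phi SE10X/SE10P": (1073.0, 300.0),
}


def gflops_per_watt(peak_gflops, tdp_watts):
    """Peak GFlops divided by TDP: a theoretical ratio, not a benchmark."""
    return peak_gflops / tdp_watts


PERF_PER_WATT = {
    name: gflops_per_watt(gflops, watts) for name, (gflops, watts) in SPECS.items()
}

if __name__ == "__main__":
    for name, value in PERF_PER_WATT.items():
        print(f"{name}: {value:.2f} GFlops/W (peak)")
```

On these quoted figures the 5110P comes out ahead of the other two parts, which is consistent with its description as the lower-power option.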

Broad Industry and Customer Adoption of the Intel Xeon Phi Coprocessor

More than 50 manufacturers are designing solutions based on the Intel Xeon Phi coprocessors, including Acer, Appro, Asus, Bull, Colfax, Cray, Dell, Eurotech, Fujitsu, Hitachi, HP, IBM, Inspur, NEC, Quanta, SGI, Supermicro and Tyan.

Professor Stephen Hawking and the Cosmos Lab at the University of Cambridge have been given early access to Intel Xeon Phi coprocessor technology for use in their SGI supercomputer. "I am delighted that our new COSMOS supercomputer from SGI contains the latest many-core technology from Intel, the Intel Xeon Phi coprocessors," said Hawking. "With our powerful and flexible SGI UV2000, we can continue to focus on discovery, leading worldwide efforts to advance the understanding of our universe."

Majority of TOP500 Supercomputers Chose Intel as the Compute Engine

More than 75 percent (379 systems) of the supercomputers on the 40th edition of the Top500 list are powered by Intel processors. Of those systems making their first appearance on the list, Intel-powered systems account for more than 91 percent. The November edition of the list records seven systems based on Intel Xeon Phi coprocessors, including the initial deployment of TACC's "Stampede" system (2.66 PFlops, #7 on the list); the "Discover" system at NASA Center for Climate Simulation (417 TFlops, #52); Intel's "Endeavour" system (379 TFlops, #57); the "MVS-10P" supercomputer at the Joint Supercomputer Center of the Russian Academy of Sciences (375 TFlops, #58); the "Maia" system at NASA Ames Research Center (212 TFlops, #117); the "SUSU" system at The South Ural State University (146 TFlops, #170); and the "Beacon" supercomputer at The National Institute of Computational Sciences at the University of Tennessee (110 TFlops, #253), which is also the most power-efficient supercomputer on the list and delivers 2.44 GFlops per watt. The complete report is available at www.top500.org.
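Two of the figures above can be sanity-checked with simple arithmetic: the Intel share of the 500-entry list, and the power draw implied by Beacon's quoted performance and efficiency. The sketch below is only a check on the rounded numbers in this paragraph; the variable names are illustrative, and the implied power is an estimate, not a reported measurement.

```python
TOP500_SIZE = 500      # the Top500 list ranks 500 systems
INTEL_SYSTEMS = 379    # Intel-powered systems, as quoted above

# Share of the list powered by Intel processors, in percent.
intel_share_pct = 100.0 * INTEL_SYSTEMS / TOP500_SIZE

# Beacon: 110 TFlops at 2.44 GFlops per watt, both rounded as quoted.
BEACON_TFLOPS = 110.0
BEACON_GFLOPS_PER_WATT = 2.44

# Convert TFlops to GFlops, divide by GFlops/W to get watts, then to kW.
beacon_power_kw = BEACON_TFLOPS * 1000.0 / BEACON_GFLOPS_PER_WATT / 1000.0

if __name__ == "__main__":
    print(f"Intel share of Top500: {intel_share_pct:.1f}%")
    print(f"Implied Beacon power draw: {beacon_power_kw:.1f} kW")
```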

Pricing and Availability

The Intel Xeon Phi coprocessor 5110P is shipping today, with general availability on Jan. 28 at a recommended customer price of $2,649. The Intel Xeon Phi coprocessor 3100 product family will be available during the first half of 2013 at a recommended customer price below $2,000.
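Combining these prices with the peak double-precision figures quoted earlier gives a rough cost per GFlops. This is a minimal sketch with illustrative names; for the 3100 family only a bound is possible, because the announcement gives a price ceiling ("below $2,000") and a performance floor ("more than 1000 Gigaflops").

```python
# 5110P: recommended customer price and peak double-precision GFlops, as quoted.
PRICE_5110P_USD = 2649.0
PEAK_5110P_GFLOPS = 1011.0
usd_per_gflops_5110p = PRICE_5110P_USD / PEAK_5110P_GFLOPS

# 3100 family: price is below this ceiling, performance is above this floor,
# so the true cost per GFlops is strictly below the ratio computed here.
PRICE_3100_CEILING_USD = 2000.0
PEAK_3100_FLOOR_GFLOPS = 1000.0
usd_per_gflops_3100_upper_bound = PRICE_3100_CEILING_USD / PEAK_3100_FLOOR_GFLOPS

if __name__ == "__main__":
    print(f"5110P: ${usd_per_gflops_5110p:.2f} per peak GFlops")
    print(f"3100 family: under ${usd_per_gflops_3100_upper_bound:.2f} per peak GFlops")
```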

31 Comments on Intel Delivers New Architecture for Discovery with Intel Xeon Phi Coprocessors

#1
Steevo
225W passively cooled? They must have hidden requirements like your blade airflow needs to be the equal of a small turbine commonly found on private aircraft.


That being said, good job Intel. Let's make this shtuff work.
#2
repman244
by: Steevo
225W passively cooled? They must have hidden requirements like your blade airflow needs to be the equal of a small turbine commonly found on private aircraft.
Nothing new really, even the older gen Tesla GPUs were passive only, since you create the airflow with the chassis fans. They are loud but no one cares when it comes to server hardware.

However the fans can consume huge amounts of power (the ones in my DL380 are 1.60A @ 12V each, and there are 12 of them, so that's around 230W maximum :laugh:)
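For reference, the fan estimate above is just current times voltage times fan count. A minimal sketch using the figures from this post (the function and variable names are illustrative, and this is a worst-case electrical figure at rated current, not a measured draw):

```python
# Figures quoted in the post above: per-fan current, supply voltage, fan count.
FAN_CURRENT_A = 1.60
FAN_VOLTAGE_V = 12.0
FAN_COUNT = 12


def total_fan_power_watts(current_a, voltage_v, count):
    """Worst-case draw: P = I * V per fan, multiplied by the number of fans."""
    return current_a * voltage_v * count


max_fan_power = total_fan_power_watts(FAN_CURRENT_A, FAN_VOLTAGE_V, FAN_COUNT)

if __name__ == "__main__":
    print(f"Maximum fan power: {max_fan_power:.1f} W")
```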
#3
cdawall
where the hell are my stars
by: repman244
However the fans can consume huge amounts of power (the ones in my DL380 are 1.60A @ 12V each, and there are 12 of them, so that's around 230W maximum)
The ones I got from an HP server pull are 3.3-3.9A each :twitch: I doubt air flow is a problem in those chassis.
#4
repman244
by: cdawall
The ones I got from an HP server pull are 3.3-3.9A each :twitch: I doubt air flow is a problem in those chassis.
:laugh: would love to hear those, are they 120mm?
#5
Xzibit
This might be the beginning of a price war for DP acceleration.

Intel will have the lowest price DP accelerator now.

AMD & Nvidia watch out.
#6
cdawall
where the hell are my stars
by: repman244
:laugh: would love to hear those, are they 120mm?
Youtube: xCvEqmKqDcY

Yes 120mm and these are the weaker 3.3A models I haven't done a video of the 3.9A Delta's that I am using as of now. Downside to the Delta's is they don't spin with 5V which means they are noisy since they are running 7V.
#8
repman244
by: cdawall
Youtube: xCvEqmKqDcY

Yes 120mm and these are the weaker 3.3A models I haven't done a video of the 3.9A Delta's that I am using as of now. Downside to the Delta's is they don't spin with 5V which means they are noisy since they are running 7V.
Now that's bad ass :D:D

I have one 120mm Delta but it's only 3A @ 12V, however it can run at 5V with great airflow but it's a bit too noisy for my daily PC.

EDIT: are you using those daily?

Off-topic FTW
#9
Steevo
We need to start using more hot water loops and then using the heat to generate steam to power the system. Self powered cooling!!!!
#10
cdawall
where the hell are my stars
by: repman244
Now that's bad ass :D:D

I have one 120mm Delta but it's only 3A @ 12V, however it can run at 5V with great airflow but it's a bit too noisy for my daily PC.

EDIT: are you using those daily?

Off-topic FTW
I am using the noisier Delta's right now gotta keep them temps down H70 is holding load temps to ~40C load @4ghz 1.6v. I pulled the Nidec's to go on my big watercooler they are wired for 7V right now. CFM/SP matches an Ultra Kaze at full tilt noise is slightly better.
#11
repman244
by: cdawall
I am using the noisier Delta's right now gotta keep them temps down H70 is holding load temps to ~40C load @4ghz 1.6v. I pulled the Nidec's to go on my big watercooler they are wired for 7V right now. CFM/SP matches an Ultra Kaze at full tilt noise is slightly better.
You got some good ears to be able to take the noise for such a long time ;)

Now I'm tempted to put my Delta back in :pimp:
#12
cdawall
where the hell are my stars
by: repman244
You got some good ears to be able to take the noise for such a long time ;)

Now I'm tempted to put my Delta back in :pimp:
I tune it out, it's a calming noise almost.
#13
iO
1 TFLOPs DP for 2000 bucks is a killer :eek:
#15
xorbe
Custom PCB strip there to deliver 2000W ... and dear goodness is that a platter hdd?!
#16
Cortex
by: xorbe
dear goodness is that a platter hdd?!
:laugh:

Wonder can it OC. :pimp:
#18
Morgoth
by: awesomesauce
cool but, can it run minecraft 2
if it is programmed to take advantage of xeon phi
#19
Xzibit
Interesting...

DGEMM

Intel Xeon Phi
82% efficiency

AMD Tahiti
90% efficiency

Nvidia Kepler
80% efficiency

Not bad.
#20
HumanSmoke
by: Xzibit
Interesting...

DGEMM
Intel Xeon Phi
82% efficiency

AMD Tahiti
90% efficiency

Nvidia Kepler
80% efficiency

Not bad.
Your numbers are out - as per usual it would seem. GK110 is closer to 93% DGEMM thanks to reduced ECC and dispatch overhead as well as Hyper-Q.
#21
Xzibit
by: HumanSmoke
Your numbers are out - as per usual it would seem. GK110 is closer to 93% DGEMM thanks to reduced ECC and dispatch overhead as well as Hyper-Q.
http://img.techpowerup.org/121113/K20Efficiency_575px%20jpeg.jpg
Intel and Nvidia are marketing %

Numbers aren't wrong by the way, if you look at K20 it's 80%+.

Fermi was tested by ORNL / Univ. Tennessee / Univ. Manchester; it came up short of the marketed 60-65%, at 56%.

AMD Tahiti on the other hand has been tested at 90%. Maybe I should have put an (*) next to AMD, the only one that's been tested outside of marketing.

K20
(SP) 3.52 TFLOPS
(DP) 1.17 TFLOPS / 0.936 (80% marketed Efficiency)
$3199 Est

Xeon Phi SE Line
(SP) 2.16 TFLOPS
(DP) 1.07 TFLOPS / 0.877 (82% marketed Efficiency)
$2649 Est

K20X
(SP) 3.95 TFLOPS
(DP) 1.31 TFLOPS / 1.213 (93% marketed Efficiency)
$4000-6000 Est

S10000
(SP) 5.91 TFLOPS
(DP) 1.48 TFLOPS / 1.332 (90% Tahiti tested Efficiency)
$3599 Est
#22
HumanSmoke
by: Xzibit
Numbers aren't wrong by the way, if you look at K20 it's 80%+.
Bullshit :laugh:. Verifiable numbers or STFU.
Since you can't produce any K20 numbers I'll presume you'll carry on pulling numbers out of thin air to continue your trollathon.

Even the bogus numbers don't make sense. K20 and K20X are the same GPU - the difference is 1 SMX, one memory controller and 26MHz core. By your retarded reasoning, S10000 should have an efficiency lower than existing Tahiti. And of course you're too lazy even to look for actual values - Xeon Phi's theoretical FP64 is 1.01TFlop/s for the 5110P and actual measured rating is 829 GFlops/s for the top part (see Intel's own product sheet footnotes 2 and 4). BTW, Intel's "numbers" are also based on pre-production 61 core parts - none of which exist as a shipping product.

If you're gonna troll at least spend more than 30 seconds looking for the right numbers. And here's a hint, Titan's Linpack run is in March 2013.
by: Xzibit

K20
(SP) 3.52 TFLOPS
(DP) 1.17 TFLOPS / 0.936 (80% marketed Efficiency)
$3199 Est

K20X
(SP) 3.95 TFLOPS
(DP) 1.31 TFLOPS / 1.213 (93% marketed Efficiency)
$4000-6000 Est
You want to talk about bullshit marketing. Here's AMD's press release for Titan:
The DOE's ORNL supercomputer contains 18,688 nodes, each holding a 16-core AMD Opteron 6274 processor, for a total of almost 300,000 cores at 20 petaFLOPS. "Titan" is 10 times more powerful than ORNL's last world-leading system, "Jaguar,"
Actual Opteron 6274 double precision: 70.4 GFlops/s
70.4 x 18688 = 1.3156 petaflops
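The arithmetic in this post can be reproduced directly. In the sketch below the node and core counts come from the quoted press release, and the 70.4 GFlops per-CPU figure is the one given in this post rather than an independently verified value; the variable names are illustrative.

```python
NODES = 18_688           # nodes, from the quoted press release
CORES_PER_NODE = 16      # one 16-core Opteron 6274 per node
GFLOPS_PER_CPU = 70.4    # double-precision figure as given in this post

# "Almost 300,000 cores" in the press release.
total_cores = NODES * CORES_PER_NODE

# CPU-only double-precision total, converted from GFlops to PFlops.
cpu_only_pflops = NODES * GFLOPS_PER_CPU / 1_000_000

if __name__ == "__main__":
    print(f"Total cores: {total_cores}")
    print(f"CPU-only peak: {cpu_only_pflops:.4f} PFlops")
```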
#23
Steevo
by: HumanSmoke
Bullshit :laugh:. Verifiable numbers or STFU.
Since you can't produce any K20 numbers I'll presume you'll carry on pulling numbers out of thin air to continue your trollathon.

Even the bogus numbers don't make sense. K20 and K20X are the same GPU - the difference is 1 SMX, one memory controller and 26MHz core. By your retarded reasoning, S10000 should have an efficiency lower than existing Tahiti. And of course you're too lazy even to look for actual values - Xeon Phi's theoretical FP64 is 1.01TFlop/s for the 5110P and actual measured rating is 829 GFlops/s for the top part (see Intel's own product sheet footnotes 2 and 4). BTW, Intel's "numbers" are also based on pre-production 61 core parts - none of which exist as a shipping product.

If you're gonna troll at least spend more than 30 seconds looking for the right numbers. And here's a hint, Titan's Linpack run is in March 2013.


You want to talk about bullshit marketing. Here's AMD's press release for Titan:


Actual Opteron 6274 double precision: 70.4 GFlops/s
70.4 x 18688 = 1.3156 petaflops
http://www.techpowerup.com/img/12-08-30/intel_xeon_phi_hotchips_architecture_presentation_page_05.jpg

http://www.green500.org/lists/green201106
#24
HumanSmoke
by: Steevo
http://www.techpowerup.com/img/12-08-30/intel_xeon_phi_hotchips_architecture_presentation_page_05.jpg
Xeon Phi and Tahiti are more efficient than Fermi (M2090)? I'd pretty much hope so.
That should be a given based on the fact that AMD's own marketing compares Tahiti with GF110

by: Steevo
http://www.green500.org/lists/green201106
Nice. Cherry picking a list from the archives. You could try June 2012 (pretty much a solid wall of IBM Blue Gene), or rather wait until tomorrow when the list for November comes out. Blue Gene seems to top out at 2100 MFlops/W...
Titan used the new Tesla K20x accelerators to achieve an energy efficiency of 2,142.77 megaflops per watt (million calculations per second per watt), enough to also rank Titan No. 1 on the Green500 list of the world's most energy-efficient supercomputers.
[source]
#25
Xzibit
by: HumanSmoke
Bullshit :laugh:. Verifiable numbers or STFU.
Since you can't produce any K20 numbers I'll presume you'll carry on pulling numbers out of thin air to continue your trollathon.
Nvidia GK110 Whitepaper
Kepler GK110 will provide over 1 TFlop of double precision throughput with greater than 80% DGEMM efficiency versus 60-65% on the prior Fermi architecture.
now STFU :p

I bet you cling to the vagueness of it all. :)