Monday, June 23rd 2014

Intel Details Its Next-Gen Xeon Phi Processor

Intel Corporation today announced new details for its next-generation Intel Xeon Phi processors, code-named Knights Landing, which promise to extend the benefits of code modernization investments being made for current-generation products. These include a new high-speed fabric that will be integrated on-package and high-bandwidth, on-package memory that, combined, promise to accelerate the rate of scientific discovery. Currently, memory and fabrics are available as discrete components in servers, limiting the performance and density of supercomputers.

The new interconnect technology, called Intel Omni Scale Fabric, is designed to address the requirements of the next generations of high-performance computing (HPC). Intel Omni Scale Fabric will be integrated in the next generation of Intel Xeon Phi processors as well as future general-purpose Intel Xeon processors. This integration along with the fabric's HPC-optimized architecture is designed to address the performance, scalability, reliability, power and density requirements of future HPC deployments. It is designed to balance price and performance for entry-level through extreme-scale deployments.

"Intel is re-architecting the fundamental building block of HPC systems by integrating the Intel Omni Scale Fabric into Knights Landing, marking a significant inflection and milestone for the HPC industry," said Charles Wuischpard, vice president and general manager of Workstations and HPC at Intel. "Knights Landing will be the first true many-core processor to address today's memory and I/O performance challenges. It will allow programmers to leverage existing code and standard programming models to achieve significant performance gains on a wide set of applications. Its platform design, programming model and balanced performance makes it the first viable step towards exascale."

Knights Landing - Unmatched Integration
Knights Landing will be available as a standalone processor mounted directly on the motherboard socket in addition to the PCIe-based card option. The socketed option removes the programming complexities and bandwidth bottlenecks of data transfer over PCIe, common in GPU and accelerator solutions. Knights Landing will include up to 16 GB of high-bandwidth, on-package memory at launch - designed in partnership with Micron - to deliver five times better bandwidth compared to DDR4 memory, five times better energy efficiency and three times more density than current GDDR-based memory. When combined with integrated Intel Omni Scale Fabric, the new memory solution will allow Knights Landing to be installed as an independent compute building block, saving space and energy by reducing the number of components.

Powered by more than 60 HPC-enhanced Silvermont architecture-based cores, Knights Landing is expected to deliver more than 3 TFLOPS of double-precision performance and three times the single-threaded performance compared with the current generation. As a standalone server processor, Knights Landing will support DDR4 system memory comparable in capacity and bandwidth to Intel Xeon processor-based platforms, enabling applications that have a much larger memory footprint. Knights Landing will be binary-compatible with Intel Xeon processors, making it easy for software developers to reuse the wealth of existing code.

For customers preferring discrete components and a fast upgrade path without needing to upgrade other system components, both Knights Landing and Intel Omni Scale Fabric controllers will be available as separate PCIe-based add-on cards. There is application compatibility between currently available Intel True Scale Fabric and future Intel Omni Scale Fabric, so customers can transition to the new fabric technology without changes to their applications. For customers purchasing Intel True Scale Fabric today, Intel will offer a program to upgrade to Intel Omni Scale Fabric when it's available.

Knights Landing processors are scheduled to power HPC systems in the second half of 2015. For instance, in April the National Energy Research Scientific Computing Center (NERSC) announced an HPC installation planned for 2016, serving more than 5,000 users and over 700 extreme-scale science projects.

"We are excited about our partnership with Cray and Intel to develop NERSC's next supercomputer 'Cori,'" said Dr. Sudip Dosanjh, NERSC Director, Lawrence Berkeley National Laboratory. "Cori will consist of over 9,300 Intel Knights Landing processors and will serve as an on-ramp to exascale for our users through an accessible programming model. Our codes, which are often memory-bandwidth limited, will also greatly benefit from Knights Landing's high speed on package memory. We look forward to enabling new science that cannot be done on today's supercomputers."

New Fabric, New Speeds with Intel Omni Scale Fabric
Intel Omni Scale Fabric is built upon a combination of enhanced acquired IP from Cray and QLogic, and Intel's own in-house innovations. It will include a full product line offering consisting of adapters, edge switches, director switch systems, and open-source fabric management and software tools. Additionally, traditional electrical transceivers in the director switches in today's fabrics will be replaced by Intel Silicon Photonics-based solutions, enabling increased port density, simplified cabling and reduced costs. Intel Silicon Photonics-based cabling and transceiver solutions may also be used with Intel Omni Scale-based processors, adapter cards and edge switches.

Intel Supercomputing Momentum Continues
The current generation of Intel Xeon processors and Intel Xeon Phi coprocessors powers the top-rated system in the world - the 35 PFLOPS "Milky Way 2" in China. Intel Xeon Phi coprocessors are also available in more than 200 OEM designs worldwide.

Intel-based systems account for 85 percent of all supercomputers on the 43rd edition of the TOP500 list announced today and 97 percent of all new additions. Within 18 months after the introduction of Intel's first many-core architecture products, Intel Xeon Phi coprocessor-based systems already make up 18 percent of the aggregated performance of all TOP500 supercomputers. The complete TOP500 list is available at www.top500.org.

To help optimize applications for many-core processing, Intel has also established more than 30 Intel Parallel Computing Centers (IPCC) in cooperation with universities and research facilities around the world. Today's parallel optimization investment with the Intel Xeon Phi coprocessor will carry forward to Knights Landing, as optimizations using standards-based, common programming languages persist with a recompile. Incremental tuning gains will be available to take advantage of innovative new functionality.

10 Comments on Intel Details Its Next-Gen Xeon Phi Processor

#1
theoneandonlymrk
Wow, looking good. Still want one despite only one clear use for me: folding/crunching.
#2
james888
For cruncher/folding, I do wonder how well it compares to say a 16 core opteron.
#3
HumanSmoke
by: james888
For cruncher/folding, I do wonder how well it compares to say a 16 core opteron.
Does F@H support Xeon Phi ? I was under the impression that it did not. That's assuming you've conquered the CPU and motherboard compatibility issues - doesn't Xeon Phi also require 64-bit PCI-E addressing?
#4
TRWOV
by: james888
For cruncher/folding, I do wonder how well it compares to say a 16 core opteron.
The first Phi had 32 Pentium III based cores at 1.2 GHz for 750 GFLOPS; it seems that Intel has switched to Atom this time and is claiming x3 single-threaded performance core vs. core, but take out the hypervisor overhead and maybe we'd be looking at a 2-2.5x increase.
#5
james888
by: HumanSmoke
Does F@H support Xeon Phi ? I was under the impression that it did not. That's assuming you've conquered the CPU and motherboard compatibility issues - doesn't Xeon Phi also require 64-bit PCI-E addressing?
It probably does not, but I can still wonder. Even if it did, the barrier cost to have a system to put it in is probably pretty high. There are a few crunchers and folders with quad opteron server boards around, which are also quite expensive, and that's what I would compare this to.

by: TRWOV
The first Phi had 32 Pentium III based cores at 1.2 GHz for 750 GFLOPS; it seems that Intel has switched to Atom this time and is claiming x3 single-threaded performance core vs. core, but take out the hypervisor overhead and maybe we'd be looking at a 2-2.5x increase.
"Knights Landing is expected to deliver more than 3 TFLOPS of double-precision performance"

I am sure you read that, but I don't know how much a quad opteron server does. My old 7970, which I no longer have, had about 1 TFLOPS of double precision, as I just read. 7970s do well in FAH, and this theoretically would be 3 times as much performance.
#6
HalfAHertz
by: james888
For cruncher/folding, I do wonder how well it compares to say a 16 core opteron.
It fares very well... the highest clocked 16 core opti has a theoretical performance of ~180 GFLOPS. This thing is supposedly pulling ~3000 GFLOPS.
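
As a rough check of that comparison, here is a minimal sketch that uses only the two figures quoted above (~180 GFLOPS and ~3000 GFLOPS). Both are theoretical peaks rather than measured results, and the variable and function names are made up for illustration.

```python
# Figures quoted in the comment above; both are theoretical peaks, not measurements.
opteron_16_core_gflops = 180.0   # "~180 GFLOPS" for the highest clocked 16-core Opteron
knights_landing_gflops = 3000.0  # "~3000 GFLOPS", i.e. the "more than 3 TFLOPS" claim


def peak_ratio(new_gflops: float, old_gflops: float) -> float:
    """Return how many times larger new_gflops is than old_gflops."""
    return new_gflops / old_gflops


ratio = peak_ratio(knights_landing_gflops, opteron_16_core_gflops)
print(f"Theoretical peak ratio: {ratio:.1f}x")  # prints "Theoretical peak ratio: 16.7x"
```

On paper that is roughly a 17-fold gap; how much of it shows up in real workloads is a separate question.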
#7
HumanSmoke
by: HalfAHertz
It fares very well... the highest clocked 16 core opti has a theoretical performance of ~180 GFLOPS. This thing is supposedly pulling ~3000 GFLOPS.
Supposedly being the operative word. Xeon Phi isn't noted for its actual vs. theoretical floating point performance.
#8
Scrizz
by: HumanSmoke
Supposedly being the operative word. Xeon Phi isn't noted for its actual vs. theoretical floating point performance.

this thing isn't even out.....
we'll have to wait a bit to see how it actually pans out.....
#9
Aquinus
Resident Wat-man
...but people aren't asking that one question that floats in the back of people's minds. Can it run Crysis?

I know that's a silly question, but it does expand to a larger question, which is: how will these changes impact the consumer market, and how long will it take for any such change to happen? I suspect if Intel thinks doing this kind of thing is worthwhile, we might start seeing it in their other products in some shape or form. Imagine buying a motherboard with no DIMM slots because the DRAM is on the CPU. That doesn't just improve latencies, it reduces pins on the CPU and traces on the motherboard. After all, we did see Haswell with the VRMs on the CPU, so I would imagine Intel would keep looking for things that might belong there. Just a thought.
#10
Jizzler
by: Aquinus
...but people aren't asking that one question that floats in the back of people's minds. Can it run Crysis?

I know that's a silly question, but it does expand to a larger question, which is: how will these changes impact the consumer market, and how long will it take for any such change to happen? I suspect if Intel thinks doing this kind of thing is worthwhile, we might start seeing it in their other products in some shape or form. Imagine buying a motherboard with no DIMM slots because the DRAM is on the CPU. That doesn't just improve latencies, it reduces pins on the CPU and traces on the motherboard. After all, we did see Haswell with the VRMs on the CPU, so I would imagine Intel would keep looking for things that might belong there. Just a thought.
Not that silly. Daniel Pohl of Intel had his ray-traced version of Wolfenstein running on 8 of the original Knights Ferry cards. I'm sure a demonstration is being prepared using these upcoming models to show off the benefits of on-package memory, core design, and scaling interface.