Tuesday, March 25th 2014

NVIDIA Launches World's First High-Speed GPU Interconnect

NVIDIA today announced that it plans to integrate a high-speed interconnect, called NVIDIA NVLink, into its future GPUs, enabling GPUs and CPUs to share data five to 12 times faster than they can today. This will eliminate a longstanding bottleneck and help pave the way for a new generation of exascale supercomputers that are 50-100 times faster than today's most powerful systems.

NVIDIA will add NVLink technology into its Pascal GPU architecture -- expected to be introduced in 2016 -- following this year's new NVIDIA Maxwell compute architecture. The new interconnect was co-developed with IBM, which is incorporating it in future versions of its POWER CPUs.

"NVLink technology unlocks the GPU's full potential by dramatically improving data movement between the CPU and GPU, minimizing the time that the GPU has to wait for data to be processed," said Brian Kelleher, senior vice president of GPU Engineering at NVIDIA.

"NVLink enables fast data exchange between CPU and GPU, thereby improving data throughput through the computing system and overcoming a key bottleneck for accelerated computing today," said Bradley McCredie, vice president and IBM Fellow at IBM. "NVLink makes it easier for developers to modify high-performance and data analytics applications to take advantage of accelerated CPU-GPU systems. We think this technology represents another significant contribution to our OpenPOWER ecosystem."

With NVLink technology tightly coupling IBM POWER CPUs with NVIDIA Tesla GPUs, the POWER data center ecosystem will be able to fully leverage GPU acceleration for a diverse set of applications, such as high performance computing, data analytics and machine learning.

Advantages Over PCI Express 3.0
Today's GPUs are connected to x86-based CPUs through the PCI Express (PCIe) interface, which limits the GPU's ability to access the CPU memory system and is four to five times slower than typical CPU memory systems. PCIe is an even greater bottleneck between the GPU and IBM POWER CPUs, which have more bandwidth than x86 CPUs. As the NVLink interface will match the bandwidth of typical CPU memory systems, it will enable GPUs to access CPU memory at its full bandwidth.
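As a rough sanity check on the "four to five times" figure, the gap can be computed from typical 2014-era numbers; the bandwidth values below are illustrative assumptions, not figures from the announcement:

```python
# Illustrative back-of-envelope numbers (assumptions, not official specs):
# PCIe 3.0 x16 sustains roughly 16 GB/s per direction, while a 2014-era
# CPU memory system (e.g. quad-channel DDR3) peaks at several times that.
pcie3_x16_gbs = 16.0   # ~985 MB/s per lane x 16 lanes
cpu_mem_gbs = 64.0     # assumed quad-channel DDR3 peak, for illustration

ratio = cpu_mem_gbs / pcie3_x16_gbs
print(f"CPU memory is about {ratio:.1f}x faster than PCIe 3.0 x16")
```

With these assumed numbers the ratio lands at 4x, consistent with the article's four-to-five-times claim.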

This high-bandwidth interconnect will dramatically improve accelerated software application performance. Because of memory system differences -- GPUs have fast but small memories, and CPUs have large but slow memories -- accelerated computing applications typically move data from the network or disk storage to CPU memory, and then copy the data to GPU memory before it can be crunched by the GPU. With NVLink, the data moves between the CPU memory and GPU memory at much faster speeds, making GPU-accelerated applications run much faster.
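The cost of that staging hop can be sketched with a toy transfer-time model; the dataset size and link speeds below are illustrative assumptions (16 GB/s for PCIe 3.0 x16, and 80 GB/s as the low end of NVLink's "five to 12 times" claim):

```python
# Toy model of the staging path described above: data lands in CPU
# memory, then must be copied into GPU memory before the GPU can
# crunch it. All figures are assumptions for illustration only.
dataset_gb = 4.0

def copy_seconds(size_gb, link_gbs):
    """Time to push size_gb across a link sustaining link_gbs."""
    return size_gb / link_gbs

pcie_s = copy_seconds(dataset_gb, 16.0)    # assumed PCIe 3.0 x16 rate
nvlink_s = copy_seconds(dataset_gb, 80.0)  # assumed 5x-PCIe NVLink rate

print(f"PCIe copy:   {pcie_s * 1000:.0f} ms")   # 250 ms
print(f"NVLink copy: {nvlink_s * 1000:.0f} ms") # 50 ms
```

The copy itself shrinks from a quarter-second to tens of milliseconds in this sketch, which is the whole point of widening the CPU-GPU link.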

Unified Memory Feature
Faster data movement, coupled with another feature known as Unified Memory, will simplify GPU accelerator programming. Unified Memory allows the programmer to treat the CPU and GPU memories as one block of memory. The programmer can operate on the data without worrying about whether it resides in the CPU's or GPU's memory.
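As a toy illustration of that programming-model difference (in plain Python, with hypothetical names; this is not the CUDA API), explicit memories force a copy-in/compute/copy-out pattern, while a unified allocation is simply touched in place:

```python
# Toy contrast between explicit CPU/GPU memories and a unified view.
# The structure of these functions is illustrative only; the real API
# and its migration machinery are not modeled here.

def explicit_model(data):
    # Without unified memory: copy to the device, compute, copy back.
    gpu_buf = list(data)                # stand-in for a "to device" copy
    gpu_buf = [x * 2 for x in gpu_buf]  # kernel runs on the GPU copy
    return list(gpu_buf)                # stand-in for a "to host" copy

def unified_model(data):
    # With unified memory: one allocation, no explicit copies; the
    # runtime moves data as the CPU or GPU touches it.
    data[:] = [x * 2 for x in data]     # same buffer, updated in place
    return data

print(explicit_model([1, 2, 3]))  # [2, 4, 6]
print(unified_model([1, 2, 3]))   # [2, 4, 6]
```

Both paths produce the same result; unified memory removes the bookkeeping, not the computation.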

Although future NVIDIA GPUs will continue to support PCIe, NVLink technology will be used for connecting GPUs to NVLink-enabled CPUs as well as providing high-bandwidth connections directly between multiple GPUs. Also, despite its very high bandwidth, NVLink is substantially more energy efficient per bit transferred than PCIe.

NVIDIA has designed a module to house GPUs based on the Pascal architecture with NVLink. This new GPU module is one-third the size of the standard PCIe boards used for GPUs today. Connectors at the bottom of the Pascal module enable it to be plugged into the motherboard, improving system design and signal integrity.

The NVLink high-speed interconnect will enable the tightly coupled systems that present a path to highly energy-efficient and scalable exascale supercomputers, running at 1,000 petaflops (1 × 10^18 floating point operations per second), or 50 to 100 times faster than today's fastest systems.
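The exascale arithmetic in this paragraph can be checked directly; this is just unit conversion on the figures quoted above:

```python
# Unit check for the exascale figures in the article.
petaflop = 1e15  # floating point operations per second
exaflop = 1e18

print(exaflop / petaflop)  # 1000.0 -> 1,000 petaflops per exaflop

# "50 to 100 times faster than today's fastest systems" implies those
# systems sustain roughly 10-20 petaflops:
print(1000 / 100, 1000 / 50)  # 10.0 20.0
```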

22 Comments on NVIDIA Launches World's First High-Speed GPU Interconnect

#1
Hilux SSRG
Watching the keynote and Pascal looks like it will hammer away at Intel and their CPUs.
#2
btarunr
Editor & Senior Moderator
Hey there HyperTransport, long time!
#3
cadaveca
My name is Dave
by: btarunr
Hey there HyperTransport, long time!
I believe you mean "SidePort".
NVLink technology will be used for connecting GPUs to NVLink-enabled CPUs as well as providing high-bandwidth connections directly between multiple GPUs.
= AMD SidePort/IOMMU.

;)

As is the norm, AMD creates the idea, and Nvidia brings a usable form to the masses. Tech partnerships at its best, really.
#4
DaJMasta
I'm not against it... but since when was GPU scaling limited by PCIe throughput? Maybe it's a latency thing... but 16 PCIe 3.0 lanes is quite a bit of bandwidth, and I thought that we've seen time and time again that the performance impact of halving that (running 8x) is minimal even with the highest end cards.

They say it's because they need access to CPU memory and that GPU memory is "small", but I think again we've seen the opposite trend. Plenty of enthusiast computers have 16GB of main memory but have easily 3-4GB of VRAM per card. Is it just that a lot of main memory isn't used in gaming, and you can just hoard textures in there if you had more bandwidth?
#6
cadaveca
My name is Dave
by: DaJMasta
I'm not against it... but since when was GPU scaling limited by PCIe throughput? Maybe it's a latency thing... but 16 PCIe 3.0 lanes is quite a bit of bandwidth, and I thought that we've seen time and time again that the performance impact of halving that (running 8x) is minimal even with the highest end cards.
If you do GPGPU, PCIe has been a limitation ever since such was possible. This FACT (shown by the Aoki paper at the beginning of the stream, and something I have personally been talking about for years) has been present since PCIe came out, really.


A limitation in gaming? Yes AND No. AMD Multi-GPU stutter problems are due to PCIe limitations.

If you watched Nvidia's promo...they easily pointed out that making a real jump in graphics requires thousands of bits of memory interconnect...compared to the 384 we have today. Being able to feed that memory, as well as other GPUs, is not possible over PCIe...hence NV-LINK.
#7
Hilux SSRG
by: cadaveca
If you do GPGPU, PCIe has been a limitation ever since such was possible. This FACT (shown by the Aoki paper at the beginning of the stream, and something I have personally been talking about for years) has been present since PCIe came out, really.


A limitation in gaming? Yes AND No. AMD Multi-GPU stutter problems are due to PCIe limitations.

If you watched Nvidia's promo...they easily pointed out that making a real jump in graphics requires thousands of bits of memory interconnect...compared to the 384 we have today. Being able to feed that memory, as well as other GPUs, is not possible over PCIe...hence NV-LINK.
So is NVLINK a physical replacement to PCIe on the Nvidia mobos?

I will be glad to buy a mobo that has both NVLINK and PCIe. Especially tired of seeing Intel's lack of PCIe 4, DDR4, new technology interfaces, etc on their Z97 and X99 platforms.
#8
cadaveca
My name is Dave
by: Hilux SSRG
So is NVLINK a physical replacement to PCIe on the Nvidia mobos?
Christian_25H covered that well already:
Although future NVIDIA GPUs will continue to support PCIe, NVLink technology will be used for connecting GPUs to NVLink-enabled CPUs as well as providing high-bandwidth connections directly between multiple GPUs. Also, despite its very high bandwidth, NVLink is substantially more energy efficient per bit transferred than PCIe.
#9
Arjai
Looks promising...How long will it take to get to gaming desktops, any guesses? 2016 seems like enough time to incorporate it to a gaming platform...hmm?

*EDIT, just noticed I had put in 2026, instead of 2016. :oops:
#10
erocker
by: cadaveca
NVLink-enabled CPUs
Interesting, what are these going to be?
#11
cadaveca
My name is Dave
by: erocker
Interesting, what are these going to be?
I guess we'll start to see them in 2016?

Although, the mention of IBM POWERPC chips...kinda...well...removes my excitement. :roll:


Looking at the physical sample NV showed today, it looks a lot like a module for the new Apple Mac Pro trashcan-PC.

If that's Nvidia's choice to stay relevant to the marketplace...to work with Apple...well...
#12
erocker
Heh, I didn't even know IBM still made PowerPC chips! Looking forward to seeing how it pans out.
#13
H2323
by: cadaveca
I believe you mean "SidePort".



= AMD SidePort/IOMMU.

;)

As is the norm, AMD creates the idea, and Nvidia brings a usable form to the masses. Tech partnerships at its best, really.
K...seriously, how is this usable....the CPU has to have this built in as well, and it is clear that the only ones that will do this are PowerPC and any custom ARM SoC Nvidia wants to design. This is for enterprise and supercomputers; it will not be on your PC. AMD and Intel already have their own internal solutions.
#14
H2323
by: erocker
Interesting, what are these going to be?
Just PowerPC; AMD has their HSA solution and Intel obviously has their own solutions. Sounds good, but nothing for the consumer.
#15
Jizzler
In the short term we may not get CPUs with NV-Link in consumer form but perhaps it will allow for better performing and more efficient multi-GPU gaming cards?

The Titan-Z is already outdated, wait for the Titan-Z2 with 12GB of unified memory ;)
#16
cadaveca
My name is Dave
by: Jizzler
In the short term we may not get CPUs with NV-Link in consumer form but perhaps it will allow for better performing and more efficient multi-GPU gaming cards?
Yep, push data to the primary card, and then have secondary cards link together as slave devices, presenting themselves as one large compute interface that the OS sees as a single compute device.



Oh wait, that's exactly the scenario hinted at in the presentation...:p and shown in the slides.
#17
Jizzler
I'll wait for a whitepaper on it to be posted. It's better for their bottom line if I don't watch these types of presentations :)
#18
zinfinion
Uhhh, what happened to Volta? Wasn't that supposed to be Maxwell's follow up?
#19
Hilux SSRG
by: zinfinion
Uhhh, what happened to Volta? Wasn't that supposed to be Maxwell's follow up?
It may still slot in between Maxwell and Pascal in 2015/2016.
#20
Steevo
2016..just right....around.........the.....................corner..........................................................



Is it just me or does it seem like someone threw the PR team there out a window and they are trying to make enough PR slides to land on.
#21
LeonVolcove
So it's like AMD HSA, but you still need a CPU + dedicated GPU?
#22
RejZoR
Unfortunately I fell asleep during the keynote. It was too much scientific computing and very little for gaming. And if I'm honest, both graphics demos were rather boring. That Unreal Engine fight scene was nothing to talk about, and the whale water simulation was OK and shows the muscle, but I wasn't actually impressed by it. There are so many better and more impressive ways to showcase fluid dynamics than with a transparent whale...