NVIDIA Doubles Performance for Deep Learning Training

btarunr · Jul 7, 2015

NVIDIA today announced updates to its GPU-accelerated deep learning software that will double deep learning training performance. The new software will empower data scientists and researchers to supercharge their deep learning projects and product development work by creating more accurate neural networks through faster model training and more sophisticated model design.

The NVIDIA DIGITS Deep Learning GPU Training System version 2 (DIGITS 2) and NVIDIA CUDA Deep Neural Network library version 3 (cuDNN 3) provide significant performance enhancements and new capabilities. For data scientists, DIGITS 2 now delivers automatic scaling of neural network training across multiple high-performance GPUs. This can double the speed of deep neural network training for image classification compared to a single GPU.

For deep learning researchers, cuDNN 3 features optimized data storage in GPU memory for the training of larger, more sophisticated neural networks. cuDNN 3 also provides higher performance than cuDNN 2, enabling researchers to train neural networks up to two times faster on a single GPU.

The new cuDNN 3 library is expected to be integrated into forthcoming versions of the deep learning frameworks Caffe, Minerva, Theano and Torch, which are widely used to train deep neural networks.

"High-performance GPUs are the foundational technology powering deep learning research and product development at universities and major web-service companies," said Ian Buck, vice president of Accelerated Computing at NVIDIA. "We're working closely with data scientists, framework developers and the deep learning community to apply the most powerful GPU technologies and push the bounds of what's possible."

DIGITS 2 - Up to 2x Faster Training with Automatic Multi-GPU Scaling
DIGITS 2 is the first all-in-one graphical system that guides users through the process of designing, training and validating deep neural networks for image classification.
The new automatic multi-GPU scaling capability in DIGITS 2 maximizes the available GPU resources by automatically distributing the deep learning training workload across all of the GPUs in the system. Using DIGITS 2, NVIDIA engineers trained the well-known AlexNet neural network model more than two times faster on four NVIDIA Maxwell architecture-based GPUs, compared to a single GPU. Early customers are reporting even larger gains.
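
As a rough illustration of what "distributing the training workload across all of the GPUs" can mean, here is a minimal data-parallel sketch in plain Python. It is not how DIGITS 2 is implemented; the function names (`shard_batch`, `combined_gradient`, `mean_gradient`) and the toy gradient are hypothetical and chosen only for illustration. Each simulated GPU gets a slice of the mini-batch, and the per-slice gradients are averaged with weights so the result matches the single-GPU gradient.

```python
def shard_batch(samples, num_gpus):
    """Split one mini-batch into num_gpus nearly equal contiguous shards."""
    base, extra = divmod(len(samples), num_gpus)
    shards, start = [], 0
    for gpu in range(num_gpus):
        # The first `extra` shards take one additional sample each.
        size = base + (1 if gpu < extra else 0)
        shards.append(samples[start:start + size])
        start += size
    return shards


def combined_gradient(shards, grad_fn):
    """Size-weighted average of per-shard mean gradients."""
    total = sum(len(shard) for shard in shards)
    return sum(len(shard) * grad_fn(shard) for shard in shards if shard) / total


def mean_gradient(samples):
    """Toy gradient (assumption): d/dw of 0.5 * (w - x)**2 at w = 0, averaged."""
    return -sum(samples) / len(samples)


batch = [float(i) for i in range(10)]   # stand-in for 10 training samples
shards = shard_batch(batch, num_gpus=4)  # one shard per simulated GPU

full = mean_gradient(batch)                          # single-GPU result
combined = combined_gradient(shards, mean_gradient)  # multi-GPU result
print([len(s) for s in shards], full, combined)
```

The weighting by shard size matters when the batch does not divide evenly; a plain average of the four shard gradients would give a slightly different answer.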

"Training one of our deep nets for auto-tagging on a single NVIDIA GeForce GTX TITAN X takes about sixteen days, but using the new automatic multi-GPU scaling on four TITAN X GPUs the training completes in just five days," said Simon Osindero, A.I. architect at Yahoo's Flickr. "This is a major advantage and allows us to see results faster, as well as letting us more extensively explore the space of models to achieve higher accuracy."

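The Flickr figures quoted above (about sixteen days on one GPU, five days on four) can be turned into a speedup and a scaling efficiency with simple arithmetic. The sketch below uses hypothetical helper names, and the inputs are the rounded numbers from the quote, so the outputs are approximate.

```python
def speedup(single_gpu_time, multi_gpu_time):
    """How many times faster the multi-GPU run finishes."""
    return single_gpu_time / multi_gpu_time


def scaling_efficiency(single_gpu_time, multi_gpu_time, num_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup(single_gpu_time, multi_gpu_time) / num_gpus


# Rounded figures from the quote: ~16 days on 1 GPU, 5 days on 4 GPUs.
flickr_speedup = speedup(16, 5)
flickr_efficiency = scaling_efficiency(16, 5, num_gpus=4)
print(f"speedup: {flickr_speedup:.1f}x, efficiency: {flickr_efficiency:.0%}")
```

With these inputs the speedup works out to roughly 3.2x, or about 80% of ideal four-way scaling, which is consistent with NVIDIA's "more than two times faster" claim for four GPUs.
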
cuDNN 3 - Train Larger, More Sophisticated Models Faster
cuDNN is a GPU-accelerated library of mathematical routines for deep neural networks that developers integrate into higher-level machine learning frameworks.
cuDNN 3 adds support for 16-bit floating point data storage in GPU memory, doubling the amount of data that can be stored and optimizing memory bandwidth. With this capability, cuDNN 3 enables researchers to train larger and more sophisticated neural networks.
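
To make the "doubling" concrete, here is a small sketch using only Python's standard `struct` module, which supports IEEE 754 half precision via the `e` format code. It does not call cuDNN; the tensor shape is an illustrative assumption (a batch of 256 with 96 channels of 55x55 values), and `tensor_bytes` and `roundtrip_fp16` are hypothetical helpers.

```python
import struct
from math import prod

FP32 = "<f"  # 32-bit float, 4 bytes per value
FP16 = "<e"  # 16-bit float (IEEE 754 half precision), 2 bytes per value


def tensor_bytes(shape, fmt):
    """Bytes needed to store a dense tensor of the given shape."""
    return prod(shape) * struct.calcsize(fmt)


def roundtrip_fp16(value):
    """Store a Python float as FP16 and read it back."""
    return struct.unpack(FP16, struct.pack(FP16, value))[0]


shape = (256, 96, 55, 55)  # illustrative assumption, not from the article
fp32_bytes = tensor_bytes(shape, FP32)
fp16_bytes = tensor_bytes(shape, FP16)
print(f"FP32: {fp32_bytes / 2**20:.1f} MiB, FP16: {fp16_bytes / 2**20:.1f} MiB")

# Half the bytes per value means twice as many values fit in the same memory,
# at the cost of precision: 0.1 is not stored exactly in either format,
# and FP16 keeps fewer significant digits.
print(roundtrip_fp16(0.1))
```

The same halving applies to the bytes moved per value, which is why the announcement ties FP16 storage to memory bandwidth as well as capacity.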

"We believe FP16 GPU storage support in NVIDIA's libraries will enable us to scale our models even further, since it will increase effective memory capacity of our hardware and improve efficiency as we scale training of a single model to many GPUs," said Bryan Catanzaro, senior researcher at Baidu Research. "This will lead to further improvements in the accuracy of our models."

cuDNN 3 also delivers significant performance speedups compared to cuDNN 2 for training neural networks on a single GPU. It enabled NVIDIA engineers to train the AlexNet model two times faster on a single NVIDIA GeForce GTX TITAN X GPU.

Availability
The DIGITS 2 Preview release is available today as a free download for NVIDIA registered developers. To learn more or download, visit the DIGITS website. The cuDNN 3 library is expected to be available in major deep learning frameworks in the coming months. To learn more, visit the cuDNN website.


RejZoR · Jul 7, 2015

I don't care, give more moar framerate on GTX 900 series

haswrong · Jul 7, 2015

RejZoR said:
I don't care, give more moar framerate on GTX 900 series

and cut the prices in half before the deep learned machines retaliate against human stupidity and turn us all into rotting corpses.

BiggieShady · Jul 8, 2015

Stop playing around with neural networks and use them for serious stuff like AI in games.

