NVIDIA Releases CUDA Toolkit 4.1

Joined
Dec 6, 2011
Messages
4,785 (2.14/day)
Likes
1,187
Location
Still on the East Side
#1
NVIDIA today released a new version of its CUDA parallel computing platform, which will make it easier for computational biologists, chemists, physicists, geophysicists, other researchers, and engineers to advance their simulations and computational work by using GPUs.

The new NVIDIA CUDA parallel computing platform features three key enhancements that make parallel programming with GPUs easier, more accessible and faster. These include:

- Re-designed Visual Profiler with automated performance analysis, providing an easier path to application acceleration
- New compiler, based on the widely-used LLVM open-source compiler infrastructure, delivering up to 10 percent speed up in application performance
- Hundreds of new imaging and signal processing functions, doubling the size of the NVIDIA Performance Primitives (NPP) library



"The new visual profiler is amazing," said Joshua Anderson, lead developer of the HOOMD-blue open source molecular dynamics project. "With just a few clicks, it performs an automated performance analysis of your application, highlights likely problem areas, and then provides links to best-practice suggestions on improving them. It makes it quick and easy for virtually all developers to accelerate a broad range of applications."

"The LLVM compiler gave me an almost immediate 10 percent performance speed up, just by recompiling my existing real-time financial risk analysis code," said Gilles Civario, senior software architect at the Irish Centre for High-End Computing. "I can only imagine the additional performance gains I can achieve with additional tuning using the new CUDA release."

Among the new features of the latest CUDA parallel computing platform release - available free of charge on the NVIDIA developer web site at http://developer.nvidia.com/getcuda - are:

New Visual Profiler - Easiest path to performance optimization

The new Visual Profiler makes it easy for developers at all experience levels to optimize their code for maximum performance. Featuring automated performance analysis and an expert guidance system that delivers step-by-step optimization suggestions, the Visual Profiler identifies application performance bottlenecks and recommends actions, with links to the optimization guides. Using the new Visual Profiler, performance bottlenecks are easily identified and actionable.

LLVM Compiler - Instant 10 percent increase in application performance

LLVM is a widely-used open-source compiler infrastructure featuring a modular design that makes it easy to add support for new programming languages and processor architectures. Using the new LLVM-based CUDA compiler, developers can achieve up to 10 percent additional performance gains on existing GPU-accelerated applications with a simple recompile. In addition, LLVM's modular design allows third-party software tool developers to provide a custom LLVM solution for non-NVIDIA processor architectures, enabling CUDA applications to run across NVIDIA GPUs, as well as those from other vendors.

New Image, Signal Processing Library Functions - "Drop-in" Acceleration with NPP Library

NVIDIA has doubled the size of its NPP library, with the addition of hundreds of new image and signal processing functions. This enables virtually any developer using image or signal processing algorithms to easily gain the benefit of GPU acceleration, with the simple addition of library calls into their application. The updated NPP library can be used for a wide variety of image and signal processing algorithms, ranging from basic filtering to advanced workflows.
 
Joined
Jul 10, 2010
Messages
1,021 (0.37/day)
Likes
226
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
#2
LLVM Compiler w00t!

:rockout:

In addition, LLVM's modular design allows third-party software tool developers to provide a custom LLVM solution for non-NVIDIA processor architectures, enabling CUDA applications to run across NVIDIA GPUs, as well as those from other vendors.
CUDA on AMD?! :respect:
 
erocker

Senior Moderator
Staff member
Joined
Jul 19, 2006
Messages
42,459 (10.10/day)
Likes
18,126
Processor i7 8700K
Motherboard Asus Maximus Hero X WiFi
Cooling Water
Memory 16GB G.Skill 3200Mhz CL14
Video Card(s) GTX 1080
Storage SSD's
Display(s) Nixeus EDG27
Case Thermaltake Core X5
Audio Device(s) Soundblaster Zx
Power Supply Corsair H1000i
Mouse Zowie EC1-B
#3
LLVM Compiler w00t!

:rockout:


CUDA on AMD?! :respect:
Call me shocked! It really must be 2012... and free of charge?!

I hope AMD takes advantage of this, but for some reason I don't find it likely.. but who knows?
 
Joined
Jul 10, 2010
Messages
1,021 (0.37/day)
Likes
226
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
#4
I hope AMD takes advantage of this, but for some reason I don't find it likely.. but who knows?
It's not AMD who is going to take advantage of this; it's the software devs :laugh:
 
Joined
Mar 23, 2005
Messages
2,998 (0.64/day)
Likes
611
Location
Ancient Greece, Acropolis (Time Lord)
System Name My Red Dragon Gaming PC
Processor AMD FX-8350 @ 4.40GHz w/8-Cores - Bus 277 / 1.35v
Motherboard Asus Crosshair V Formula ROG - Bios v1801
Cooling Corsair H100 Water Cooling (120mm x4 Push/Pull)
Memory G.SKILL Ripjaws X Series 16GB DDR3-2210 (8GBx2)
Video Card(s) SAPPHIRE (ATI) DUAL-X R9 280X 3GB GDDR5 OC + Sapphire Radeon RX 580 8GB Nitro+ LE
Storage Corsair Force 3 SSD 180GB + WD 32MB buffer 1TB HD
Display(s) Asus 24" (VG245H) FHD 75Hz 1ms FreeSyn - Gaming Monitor
Case CoolerMaster HAF 932 - My Custom Red Dragn MOD!
Audio Device(s) SteelSound 5Hv2 8CH HD + EAX® Advanced™ HD 5.0 -SupremeFX X-Fi 2
Power Supply Corsair 750W Gamers Power Supply
Mouse Razer DeathAdder PC Gaming Mouse - Ergonomic Left Hand Edition
Keyboard Logitech G15 Classic Gaming Keyboard
Software Windows 10 x64 Ultimate
Benchmark Scores I have the worlds Fastest PC ever Built - 1980
#5
NVIDIA offering CUDA free of charge is a move of desperation. The more developers support it the better overall for NVIDIA. And how do you attract new developers and corporations to CUDA? By giving it away free. Good move by NVIDIA finally, but I don’t see AMD and Intel jumping in.
 
Joined
Nov 4, 2005
Messages
9,976 (2.24/day)
Likes
2,336
System Name MoFo 2
Processor AMD PhenomII 1100T @ 4.2Ghz
Motherboard Asus Crosshair IV
Cooling Swiftec 655 pump, Apogee GT,, MCR360mm Rad, 1/2 loop.
Memory 8GB DDR3-2133 @ 1900 8.9.9.24 1T
Video Card(s) HD7970 1250/1750
Storage Agility 3 SSD 6TB RAID 0 on RAID Card
Display(s) 46" 1080P Toshiba LCD
Case Rosewill R6A34-BK modded (thanks to MKmods)
Audio Device(s) ATI HDMI
Power Supply 750W PC Power & Cooling modded (thanks to MKmods)
Software A lot.
Benchmark Scores Its fast. Enough.
#6
If they repeat the same things enough times, people will believe it.


How's that OpenCL working for you, Nvidia?
 
Joined
Dec 22, 2011
Messages
2,120 (0.95/day)
Likes
1,193
System Name Zimmer Frame Rates
Processor Intel i7 920 @ Stock speeds baby
Motherboard EVGA X58 3X SLI
Cooling True 120
Memory Corsair Vengeance 12GB
Video Card(s) Palit GTX 980 Ti Super JetStream
Storage Of course
Display(s) Crossover 27Q 27" 2560x1440
Case Antec 1200
Audio Device(s) Don't be silly
Power Supply XFX 650W Core
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 10
Benchmark Scores Epic
#7
Yeah because OpenCL has really taken off. :laugh:
 
Joined
Sep 7, 2011
Messages
2,785 (1.20/day)
Likes
1,672
Location
New Zealand
System Name MoneySink
Processor 2600K @ 4.8
Motherboard P8Z77-V
Cooling AC NexXxos XT45 360, RayStorm, D5T+XSPC tank, Tygon R-3603, Bitspower
Memory 16GB Crucial Ballistix DDR3-1600C8
Video Card(s) GTX 780 SLI (EVGA SC ACX + Giga GHz Ed.)
Storage Kingston HyperX SSD (128) OS, WD RE4 (1TB), RE2 (1TB), Cav. Black (2 x 500GB), Red (4TB)
Display(s) Achieva Shimian QH270-IPSMS (2560x1440) S-IPS
Case NZXT Switch 810
Audio Device(s) onboard Realtek yawn edition
Power Supply Seasonic X-1050
Software Win8.1 Pro
Benchmark Scores 3.5 litres of Pale Ale in 18 minutes.
#8
Joined
Nov 4, 2005
Messages
9,976 (2.24/day)
Likes
2,336
System Name MoFo 2
Processor AMD PhenomII 1100T @ 4.2Ghz
Motherboard Asus Crosshair IV
Cooling Swiftec 655 pump, Apogee GT,, MCR360mm Rad, 1/2 loop.
Memory 8GB DDR3-2133 @ 1900 8.9.9.24 1T
Video Card(s) HD7970 1250/1750
Storage Agility 3 SSD 6TB RAID 0 on RAID Card
Display(s) 46" 1080P Toshiba LCD
Case Rosewill R6A34-BK modded (thanks to MKmods)
Audio Device(s) ATI HDMI
Power Supply 750W PC Power & Cooling modded (thanks to MKmods)
Software A lot.
Benchmark Scores Its fast. Enough.
#9
Exactly.


On one hand we have a completely open standard; on the other, CUDA, an extension of X87 run on GPU cores. And now they have finally released an updated product after how long?
 
Joined
Apr 26, 2009
Messages
414 (0.13/day)
Likes
95
Location
You are here.
System Name Prometheus
Processor Intel i7 4930K
Motherboard Asus P9X79
Cooling Noctua NH-D14
Memory Crucial Ballistix Quad Channel 64GB DDR3 1.35V
Video Card(s) Asus GTX 970 4GB
Storage 2 x Intel 330 240GB RAID0 + 2 x Intel X-25M G2 160GB MIRROR + 4 x 2TB WD Green
Display(s) AMH A409U 4K
Case Intertech AP1
Audio Device(s) Creative Sound Blaster X-Fi Xtreme Audio PCIe
Power Supply Seasonic X660
Mouse Razer Death Adder 2013
Keyboard FILCO Majestouch 2 Ninja
Software Microsoft Windows 10 Pro x64
#10
That's not exactly true Steevo...

First of all, this open standard belongs to Apple and they license it to Khronos. From the Khronos webpage:

OpenCL is a trademark of Apple Inc., and is used under license by Khronos. The OpenCL logo and guidelines for its usage in association with Conformant products can be found here: http://developer.apple.com/softwarelicensing/agreements/opencl.html
If Khronos loses its license, or Apple sells OpenCL to someone, or Khronos loses funding, or any of the many other things that could happen, we could see OpenCL just die. The "Apple" part is of much concern to me.

It took a full year for Khronos to finally update OpenCL to version 1.2, and still the implementation lacks serious functionality for larger developers (like Adobe and the like) to have any real use for it. And with such a crawlingly slow development cycle, there is little interest from developers, because they can't wait for years to get the functionality they need.

Second, AMD also has its Close to Metal/Stream/APP (who knows what other names they'll give the technology), and OpenCL is built on that tech just as it is built on CUDA. In this respect AMD and NVIDIA support OpenCL in the same way, with the exact same model.

Also, there are other standards, and they are in a way all competing with each other, for example BrookGPU, NPP and many others. You can't expect companies to support just one standard when there are so many more, especially when the development cycle is so slow.

You can look at Linux and how much fragmentation there is in that market. At this point "Linux" is just an umbrella term covering hundreds of operating systems. The versions of Linux that were updated frequently and included the features users actually need survived and grew their userbase.

At this time OpenCL is like an infant Linux distro that has a poor update cycle and does not include the features its userbase would require to start building applications on top of it.

And so developers will just use the next best thing, and most of the time that is CUDA (and its additional supporting libraries, which are growing in number and in "openness"), more than Stream/APP.
 
Joined
Jan 2, 2009
Messages
731 (0.22/day)
Likes
102
Processor Intel Core i5-3470 3.2 GHz Quad-core Ivy Bridge
Motherboard ASUS P8Z77-M Z77
Cooling ID-COOLING IS-50 TDP 130W
Memory Kingston HyperX Genesis 2x4 GB DDR3 @ 1866MHz 9-11-9-27-1T
Video Card(s) ZOTAC GeForce® GTX 1070 AMP Edition (ZT-P10700C-10P)
Storage WD SiliconEdge Blue 64 GB SSD, Kingston SSDNow! 240 GB SSD, WD RE4 1 TB HDD
Display(s) LN-T4065F FullHD LCD TV
Power Supply Raidmax RX-1000AE 1000W 80 Plus Gold
Mouse Logitech G402 Hyperion Fury FPS Gaming Mouse (Defective MOUSE3)
Keyboard Logitech K120
Software Windows 10 Pro 64-bit
#11
NVIDIA offering CUDA free of charge is a move of desperation. The more developers support it the better overall for NVIDIA. And how do you attract new developers and corporations to CUDA? By giving it away free. Good move by NVIDIA finally, but I don’t see AMD and Intel jumping in.
It's not really a move of desperation, considering that most of OpenCL's (1.2) functions are actually branched off CUDA. That's why it's super simple to convert from CUDA to OpenCL and vice versa. The new Context functions and Directives are exactly the same as in CUDA 4.0.

AMD (not really Intel) needs to step up its game, since Stream is not going anywhere at all.
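
To give a rough feel for the CUDA-to-OpenCL conversion mentioned above, here is a minimal Python sketch that rewrites the kernel-side keywords of a tiny CUDA kernel into their OpenCL C counterparts. The token pairs in the table are the standard one-dimensional equivalents; the helper name `cuda_kernel_to_opencl`, the `SAMPLE_CUDA_KERNEL` string and the regex-based handling of pointer parameters are assumptions made for illustration only. It ignores host-side API calls, templates, multi-dimensional indexing and everything else a real port has to deal with, so treat it as a toy, not a translator.

```python
import re

# Standard 1-D kernel-side equivalents between CUDA C and OpenCL C.
CUDA_TO_OPENCL = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
    "blockDim.x": "get_local_size(0)",
    "gridDim.x": "get_num_groups(0)",
}

# Hypothetical sample input: a plain SAXPY kernel written in CUDA C.
SAMPLE_CUDA_KERNEL = """\
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
"""


def _qualify_pointer_params(match):
    """Prefix pointer parameters with __global, as OpenCL C kernels require."""
    params = [p.strip() for p in match.group(2).split(",")]
    params = ["__global " + p if "*" in p else p for p in params]
    return match.group(1) + ", ".join(params) + ")"


def cuda_kernel_to_opencl(source):
    """Naive token-level rewrite of a simple CUDA kernel into OpenCL C."""
    # Step 1: swap CUDA keywords and built-ins for their OpenCL counterparts.
    for cuda_token, opencl_token in CUDA_TO_OPENCL.items():
        source = source.replace(cuda_token, opencl_token)
    # Step 2: add the __global address space to pointer kernel arguments.
    return re.sub(
        r"(__kernel\s+void\s+\w+\s*\()([^)]*)\)",
        _qualify_pointer_params,
        source,
    )


if __name__ == "__main__":
    print(cuda_kernel_to_opencl(SAMPLE_CUDA_KERNEL))
```

The kernel body itself is left untouched; only the qualifiers and index built-ins change, which is the part of porting that tends to be mechanical. The host code (memory allocation, kernel launch, context setup) differs much more between the two APIs and is not covered here.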