
Khronos Group Releases OpenCL 3.0

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,231 (0.91/day)
Today, The Khronos Group, an open consortium of industry-leading companies creating advanced interoperability standards, publicly releases the OpenCL 3.0 Provisional Specifications. OpenCL 3.0 realigns the OpenCL roadmap to enable developer-requested functionality to be broadly deployed by hardware vendors, and it significantly increases deployment flexibility by empowering conformant OpenCL implementations to focus on functionality relevant to their target markets. OpenCL 3.0 also integrates subgroup functionality into the core specification, ships with a new OpenCL C 3.0 language specification, uses a new unified specification format, and introduces extensions for asynchronous data copies to enable a new class of embedded processors. The provisional OpenCL 3.0 specifications enable the developer community to provide feedback on GitHub before the specifications and conformance tests are finalized.



To cater to a widening diversity of OpenCL devices, OpenCL 3.0 makes all functionality beyond version 1.2 optional. All OpenCL 1.2 applications will continue to run unchanged on any OpenCL 3.0 device. All OpenCL 2.X features are coherently defined in the new unified specification, and current OpenCL 2.X implementations that upgrade to OpenCL 3.0 can continue to ship with their existing functionality with full backwards compatibility. All OpenCL 2.X API features can be queried, and OpenCL C 3.0 adds macros for querying optional language features.
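
As a rough illustration of how this optionality surfaces to applications (a sketch, not part of the announcement, assuming OpenCL 2.0-or-later headers), a host program can read the device version string and then probe an optional 2.X feature such as shared virtual memory, treating "not supported" as a valid answer rather than an error:

Code:
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    char version[256];
    cl_device_svm_capabilities svm = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    /* Always available: the device version string, e.g. "OpenCL 1.2 ..." or "OpenCL 3.0 ..." */
    clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(version), version, NULL);
    printf("Device reports: %s\n", version);

    /* Optional 2.X functionality is queried the same way; zero simply means "not supported". */
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES,
                                 sizeof(svm), &svm, NULL);
    if (err != CL_SUCCESS || svm == 0)
        printf("No shared virtual memory on this device.\n");
    else
        printf("SVM capability bits: 0x%llx\n", (unsigned long long)svm);

    return 0;
}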

"OpenCL is the most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming—widely used by applications, libraries, engines, and compilers that need to reach the widest range of diverse processors," said Neil Trevett, vice president at NVIDIA, president of the Khronos Group and OpenCL Working Group Chair. "OpenCL 2.X delivers significant functionality, but OpenCL 1.2 has proven itself as the baseline needed by all vendors and markets. OpenCL 3.0 integrates tightly organized optionality into the monolithic 2.2 specification, boosting deployment flexibility that will enable OpenCL to raise the bar on pervasively available functionality in future core specifications."

For C++ kernel development, the OpenCL Working Group has transitioned from the original OpenCL C++ kernel language, defined in OpenCL 2.2, to the 'C++ for OpenCL' open-source community project supported by Clang. C++ for OpenCL provides compatibility with OpenCL C, enables developers to use most C++17 features in OpenCL kernels, and is compatible with any OpenCL 2.X or OpenCL 3.0 implementation that supports SPIR-V ingestion.
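
As a host-side sketch of what that SPIR-V compatibility means in practice (the file name and the offline Clang compilation step are assumptions, not part of the announcement), the same clCreateProgramWithIL entry point accepts a SPIR-V module regardless of whether it was generated from OpenCL C or from C++ for OpenCL:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Load a SPIR-V module and hand it to the driver; clCreateProgramWithIL does not care
 * which kernel language the module was compiled from. */
cl_program load_spirv_program(cl_context ctx, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);

    void *il = malloc((size_t)size);
    if (!il || fread(il, 1, (size_t)size, f) != (size_t)size) {
        fclose(f);
        free(il);
        return NULL;
    }
    fclose(f);

    cl_int err;
    cl_program prog = clCreateProgramWithIL(ctx, il, (size_t)size, &err);
    free(il);
    return (err == CL_SUCCESS) ? prog : NULL;
}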

The Extended Asynchronous Copy and Asynchronous Work Group Copy Fence extensions released alongside OpenCL 3.0 enable efficient, ordered DMA transactions as first-class citizens in OpenCL—ideal for scratchpad-memory-based devices, which require fine-grained control over buffer allocation. These extensions are the first of several significant upcoming advances in OpenCL to enhance support for embedded processors.
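
For context (the new extension entry points themselves are not shown), the core OpenCL C async copy pattern that these extensions build on looks roughly like this sketch, which assumes a work-group size of 256:

Code:
/* OpenCL C kernel sketch using the existing core async copy built-ins. */
__kernel void scale_tiles(__global const float *src, __global float *dst, float k)
{
    __local float tile[256];
    const size_t base = get_group_id(0) * 256;

    /* Start a DMA-style copy of this work-group's tile into local (scratchpad) memory. */
    event_t e = async_work_group_copy(tile, src + base, 256, 0);
    wait_group_events(1, &e);

    tile[get_local_id(0)] *= k;
    barrier(CLK_LOCAL_MEM_FENCE);   /* make every work-item's update visible */

    /* Copy the processed tile back out; every work-item must reach this call. */
    event_t e2 = async_work_group_copy(dst + base, tile, 256, 0);
    wait_group_events(1, &e2);
}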

To accompany today's release, the OpenCL Working Group has updated its OpenCL Resource Guide to help computing specialists, developers and researchers of all skill levels effectively harness the power of OpenCL. The OpenCL Working Group will continuously evolve the guide and welcomes any feedback on how it can be improved via GitHub.

OpenCL 3.0 at IWOCL
OpenCL Working Group members will be participating in the Khronos Panel Session at the IWOCL / SYCLcon online conference on April 28 at 4 PM GMT. IWOCL / SYCLcon is the leading forum for high-performance computing specialists working with OpenCL, SYCL, Vulkan and SPIR-V, and registration is free.

Industry Support for OpenCL 3.0
"In recent years there has been an impressive adoption of OpenCL to drive heterogeneous processing systems within many market segments," said Andrew Richards, founder and CEO of Codeplay Software. "This update to OpenCL 3.0 brings important flexibility benefits that will allow many evolving industries, from AI and HPC to automotive, to focus on their specific requirements and embrace open standards. Codeplay is excited to enable hardware vendors to support OpenCL 3.0 and to take advantage of the flexibility provided in its ecosystem of software products."

Mark Butler, vice president of software engineering, Imagination Technologies, says: "With its focus on deployment flexibility, we see OpenCL 3.0 as an excellent step forward in providing critical features for developers, with the ability to add functionality over time. This really is a step forward for the OpenCL ecosystem, allowing developers to write portable applications that depend on widely accepted functionality. Currently shipping GPUs based on the PowerVR Rogue architecture will enjoy a significant feature uplift including SVM, Generic Address Space and Work-group Functions. Upon final release of the specification, Imagination will ship a conformant OpenCL 3.0 implementation with support extending across a wide range of PowerVR GPUs, including our latest offering, the IMG A-Series."

"Intel strongly supports cross-architecture standards being driven across the compute ecosystem such as in OpenCL 3.0 and SYCL," said Jeff McVeigh, vice president, Intel Architecture, Graphics and Software. "Standards-based, unified programming models will enable efficiency and unleash creativity for our developers with the upcoming release of our new Xe GPU architecture."

"NVIDIA welcomes OpenCL 3.0's focus on defining a baseline to enable developer-critical functionality to be widely adopted in future versions of the specification," said Anshuman Bhat, compute product manager at NVIDIA. "NVIDIA will ship a conformant OpenCL 3.0 when the specification is finalized and we are working to define the Vulkan interop extension that, together with layered OpenCL implementations, will significantly increase deployment flexibility for OpenCL developers."

"OpenCL 3.0 is an important step forward in the drive to unlock greater performance and innovation across a broadening range of computing platforms and applications," said Balaji Calidas, director of engineering at Qualcomm. "The flexible extension model will help our customers and software partners take full advantage of the tremendous potential available in both our existing and future application processors. We are pleased to have had the opportunity to contribute to this specification and we look forward to supporting the final product."

"Many of our customers want a GPU programming language that runs on all devices, and with growing deployment in edge computing and mobile, this need is increasing," said Vincent Hindriksen, founder and CEO of Stream HPC. "OpenCL is the only solution for accessing diverse silicon acceleration and many key software stacks use OpenCL/SPIR-V as a backend. We are very happy that OpenCL 3.0 will drive even wider industry adoption, as it reassures our customers that their past and future investments in OpenCL are justified."

"OpenCL 3.0 has opened up a new chapter for the OpenCL API which has served as the standard GPGPU API during the past 10 years" said Weijin Dai, executive vice president and GM of Intellectual Property Division at VeriSilicon. "With the streamlined OpenCL 3.0 core feature set, OpenCL 3.0 will enable a whole new class of embedded devices to adopt OpenCL API for GPU Compute and ML/AI processing, and it will also pave the way forward for OpenCL to interop or layer with the Vulkan API. VeriSilicon will deploy OpenCL 3.0 implementations quickly on a broad range of our embedded GPU and VIP products to enable our customers to develop new sets of GPGPU/ML/AI applications with the OpenCL 3.0 API."

About OpenCL
OpenCL (Open Computing Language) is an open, royalty-free standard for cross-platform, parallel programming of diverse, heterogeneous accelerators found in supercomputers, cloud servers, personal computers, mobile devices and embedded platforms. OpenCL greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including professional creative tools, scientific and medical software, vision processing, and neural network training and inferencing.

About Khronos
The Khronos Group is an open, non-profit, member-driven consortium of over 150 industry-leading companies creating advanced, royalty-free, interoperability standards for 3D graphics, augmented and virtual reality, parallel programming, vision acceleration and machine learning. Khronos activities include Vulkan, OpenGL, OpenGL ES, WebGL, SPIR-V, OpenCL, SYCL, OpenVX, NNEF, OpenXR, 3D Commerce, ANARI, and glTF. Khronos members drive the development and evolution of Khronos specifications and are able to accelerate the delivery of cutting-edge platforms and applications through early access to specification drafts and conformance tests.

 
Joined
Aug 8, 2019
Messages
430 (0.25/day)
System Name R2V2 *In Progress
Processor Ryzen 7 2700
Motherboard Asrock X570 Taichi
Cooling W2A... water to air
Memory G.Skill Trident Z3466 B-die
Video Card(s) Radeon VII repaired and resurrected
Storage Adata and Samsung NVME
Display(s) Samsung LCD
Case Some ThermalTake
Audio Device(s) Asus Strix RAID DLX upgraded op amps
Power Supply Seasonic Prime something or other
Software Windows 10 Pro x64
Nvidia actually supporting 3.0? Ohhh...

Wait, you only need to support 1.2 to be conformant, or so the language seems to suggest, since the 2.x and 3.0 features are optional.

OpenCL is getting feature levels!

I mean, if NV doesn't just stop at the OpenCL 3.0 "feature level 1.2" baseline and actually ships full 3.0 support in real products, I'll be happily surprised.
 
Joined
Jan 8, 2017
Messages
8,942 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Nvidia actually supporting 3.0?

Doubt it. OpenCL 2.0 was in beta for what, 3 years? It never became a thing and never will, and it will likely be the same with 3.0.
 

ARF

Joined
Jan 28, 2020
Messages
3,957 (2.55/day)
Location
Ex-usa
The entire article doesn't mention AMD a single time. So what does AMD think about OpenCL 3.0?
I bet AMD will be the main driving force for upcoming wide support.
 
Joined
Aug 8, 2019
Messages
430 (0.25/day)
System Name R2V2 *In Progress
Processor Ryzen 7 2700
Motherboard Asrock X570 Taichi
Cooling W2A... water to air
Memory G.Skill Trident Z3466 B-die
Video Card(s) Radeon VII repaired and resurrected
Storage Adata and Samsung NVME
Display(s) Samsung LCD
Case Some ThermalTake
Audio Device(s) Asus Strix RAID DLX upgraded op amps
Power Supply Seasonic Prime something or other
Software Windows 10 Pro x64
Doubt it. OpenCL 2.0 was in beta for what, 3 years? It never became a thing and never will, and it will likely be the same with 3.0.

I wonder about that after the supercomputer design wins for AMD, which should give OpenCL a real shot in the arm.

Though the release hints that 2.0 is basically dead, because you only need to support 1.2 and one 3.0 call to get 3.0 support... :laugh:

Actually, I see NV supporting OCL 3.0 on the compute cards, while the consumer cards will continue limping along with gimped compute performance and likely gimped support too.
 

ARF

Joined
Jan 28, 2020
Messages
3,957 (2.55/day)
Location
Ex-usa
I wonder about that after the supercomputer design wins for AMD, which should give OpenCL a real shot in the arm.

Though the release hints that 2.0 is basically dead, because you only need to support 1.2 and one 3.0 call to get 3.0 support... :laugh:

Actually, I see NV supporting OCL 3.0 on the compute cards, while the consumer cards will continue limping along with gimped compute performance and likely gimped support too.

Latest version is 2.2.
Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.
 
Joined
Oct 28, 2012
Messages
1,159 (0.28/day)
Processor AMD Ryzen 3700x
Motherboard asus ROG Strix B-350I Gaming
Cooling Deepcool LS520 SE
Memory crucial ballistix 32Gb DDR4
Video Card(s) RTX 3070 FE
Storage WD sn550 1To/WD ssd sata 1To /WD black sn750 1To/Seagate 2To/WD book 4 To back-up
Display(s) LG GL850
Case Dan A4 H2O
Audio Device(s) sennheiser HD58X
Power Supply Corsair SF600
Mouse MX master 3
Keyboard Master Key Mx
Software win 11 pro
Mmmh. I thought that Apple giving up on OpenCL and promoting Metal was because Nvidia made developing OpenCL such a pain that devs chose CUDA instead. In the CG industry, Arnold, Renderman (soon XPU), Octane, and Redshift are the most popular renderers, and all of them either get features that can't work with AMD, or can't work on it at all. And Apple convinced Redshift and Octane to work with Metal, so I don't think that AMD will get competitive in that sector anytime soon :/
 
Joined
Apr 24, 2020
Messages
2,563 (1.75/day)
I wonder about that after the supercomputer design wins for AMD, which should give OpenCL a real shot in the arm.

I asked some supercomputer guys and they seem to be using OpenACC, OpenMP, and CUDA. They don't seem to be interested in OpenCL. Obviously, this is a sample size of 1, but it's something to think about.

In fact, they were more interested in ROCm / HIP (AMD's somewhat CUDA-compatible layer) than OpenCL.

Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.

Closed, but highly advanced. Thrust, CUB, Cooperative Groups. Nearly full C++ compatibility on the device side (including support for classes, structures, and shared pointers between host / device).

CUDA is used for a reason: it's way easier to program and optimize than OpenCL. AMD's ROCm / HIP stuff is similarly easier to use than OpenCL in my experience. OpenCL can share pointers with SVM, but with different compilers there's no guarantee that your classes or structures line up.

CUDA (and AMD's ROCm/HIP) have a further guarantee: the device AND host code go through the same LLVM compiler simultaneously. All alignment and padding between the host and device will be identical and compatible.
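
For reference, here's a rough host-side sketch of the coarse-grain SVM pattern being described (the struct and names are made up; it assumes an OpenCL 2.0+ context, queue and kernel already exist):

Code:
#include <CL/cl.h>

/* Fixed-width members and explicit padding are the usual defence against the host
 * and device compilers disagreeing about struct layout. */
typedef struct {
    cl_float position[4];
    cl_uint  id;
    cl_uint  pad[3];
} Particle;

void run_svm_kernel(cl_context ctx, cl_command_queue q, cl_kernel k, size_t n)
{
    /* One allocation, one pointer, visible to both host and device. */
    Particle *p = (Particle *)clSVMAlloc(ctx, CL_MEM_READ_WRITE, n * sizeof(Particle), 0);

    /* Coarse-grain SVM: map before touching it on the host, unmap before the kernel runs. */
    clEnqueueSVMMap(q, CL_TRUE, CL_MAP_WRITE, p, n * sizeof(Particle), 0, NULL, NULL);
    for (size_t i = 0; i < n; ++i)
        p[i].id = (cl_uint)i;
    clEnqueueSVMUnmap(q, p, 0, NULL, NULL);

    clSetKernelArgSVMPointer(k, 0, p);              /* the device sees the same pointer */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(q);

    clSVMFree(ctx, p);
}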
 

bug

Joined
May 22, 2015
Messages
13,226 (4.06/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Well, if they kept it backwards compatible, I expect it will see the same "wide adoption" as its predecessors.

Latest version is 2.2.
Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.
Only the CUDA implementation is proprietary, the ecosystem is full of open source apps built on top of that.
 
Joined
Aug 8, 2019
Messages
430 (0.25/day)
System Name R2V2 *In Progress
Processor Ryzen 7 2700
Motherboard Asrock X570 Taichi
Cooling W2A... water to air
Memory G.Skill Trident Z3466 B-die
Video Card(s) Radeon VII repaired and resurrected
Storage Adata and Samsung NVME
Display(s) Samsung LCD
Case Some ThermalTake
Audio Device(s) Asus Strix RAID DLX upgraded op amps
Power Supply Seasonic Prime something or other
Software Windows 10 Pro x64
Well, if they kept it backwards compatible, I expect it will see the same "wide adoption" as its predecessors.


Only the CUDA implementation is proprietary, the ecosystem is full of open source apps built on top of that.

Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.
 
Joined
Mar 24, 2012
Messages
528 (0.12/day)
Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.
If that's the case, then AMD would be in court right now over their Boltzmann Initiative. Maybe Qualcomm as well.
 
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
This was always going to turn into an AMD love fest, but their OpenGL support was shit and CUDA is king.
 
Joined
Aug 20, 2007
Messages
20,787 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64
OpenGL support was shit

Was? Did it ever stop?

To my knowledge, the only place that ever got fixed was on Linux, and by open-source devs, not AMD.

Calling CUDA open is at best misleading and in reality a delusion.

Few languages are truly open, but the ecosystems are open, which is all he was claiming.
 

bug

Joined
May 22, 2015
Messages
13,226 (4.06/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Open source apps built on a closed source API.
That right there is your mistake. The API is just the top-most layer of the closed source implementation. In order for others to use your API, the API is usually open (as is the case here).
Apart from bringing a bit of an SJW angle into it, the actual implementation being closed source is of little consequence in this case. It's not like 3rd parties know Nvidia's hardware better than Nvidia and could improve upon the implementation. Sure, it's nice to be able to browse the sources to better understand how it works and to debug. But in this particular case, closed source is not the end of the world.
I mean, open source is always better. But for compute, the open initiatives are shunned by users, so like it or not, many of the AI things you read about today are made possible by CUDA.
 
Joined
Apr 24, 2020
Messages
2,563 (1.75/day)
Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.

Then use OpenMP 4.5 "target" code.


Open source (Clang / GCC support), single-source compilation, device acceleration.

Code:
#pragma omp target
#pragma omp parallel for private(i)
    for (i=0; i<N; i++) p[i] = v1[i]*v2[i];

"Target" says run this on the GPU. "Parallel For" is an older OpenMP construct, saying that each iteration should be run in parallel. "private(i)" says that the variable "i" is per-thread private. Not sure if the data-transfer over PCIe is fast enough? Then make it CPU-parallel instead:

Code:
#pragma omp parallel for private(i)
    for (i=0; i<N; i++) p[i] = v1[i]*v2[i];

Bam, now the code is CPU parallel. Wait, but you're running on an AMD EPYC with a weird cache-hierarchy, sets of independent L3s across NUMA domains and you want the data to be NUMA-aware, PCIe-aware, and execute on the GPU closest to each individual NUMA node?

Code:
#pragma omp target teams distribute parallel for private(i)
    for (i=0; i<N; i++) p[i] = v1[i]*v2[i];

Yeah. It's that easy.
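
For completeness, a minimal self-contained version of the first snippet could look like this (assuming a compiler built with OpenMP offload support, e.g. clang with -fopenmp and -fopenmp-targets=<device triple>; the map clauses are illustrative):

Code:
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float v1[N], v2[N], p[N];

    for (int i = 0; i < N; i++) { v1[i] = i; v2[i] = 2.0f; }

    /* map() moves the arrays across PCIe; "target" runs the loop on the device,
     * falling back to the host if no offload device is available. */
    #pragma omp target map(to: v1, v2) map(from: p)
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        p[i] = v1[i] * v2[i];

    printf("p[42] = %f\n", p[42]);
    return 0;
}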
 