Monday, April 27th 2020

Khronos Group Releases OpenCL 3.0

Today, The Khronos Group, an open consortium of industry-leading companies creating advanced interoperability standards, publicly releases the OpenCL 3.0 Provisional Specifications. OpenCL 3.0 realigns the OpenCL roadmap to enable developer-requested functionality to be broadly deployed by hardware vendors, and it significantly increases deployment flexibility by empowering conformant OpenCL implementations to focus on functionality relevant to their target markets. OpenCL 3.0 also integrates subgroup functionality into the core specification, ships with a new OpenCL C 3.0 language specification, uses a new unified specification format, and introduces extensions for asynchronous data copies to enable a new class of embedded processors. The provisional OpenCL 3.0 specifications enable the developer community to provide feedback on GitHub before the specifications and conformance tests are finalized.

To cater to a widening diversity of OpenCL devices, OpenCL 3.0 makes all functionality beyond version 1.2 optional. All OpenCL 1.2 applications will continue to run unchanged on any OpenCL 3.0 device. All OpenCL 2.X features are coherently defined in the new unified specification, and current OpenCL 2.X implementations that upgrade to OpenCL 3.0 can continue to ship with their existing functionality with full backwards compatibility. All OpenCL 2.X API features can be queried, and OpenCL C 3.0 adds macros for querying optional language features.

"OpenCL is the most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming—widely used by applications, libraries, engines, and compilers that need to reach the widest range of diverse processors," said Neil Trevett, vice president at NVIDIA, president of the Khronos Group and OpenCL Working Group Chair. "OpenCL 2.X delivers significant functionality, but OpenCL 1.2 has proven itself as the baseline needed by all vendors and markets. OpenCL 3.0 integrates tightly organized optionality into the monolithic 2.2 specification, boosting deployment flexibility that will enable OpenCL to raise the bar on pervasively available functionality in future core specifications."

For C++ kernel development, the OpenCL Working Group has transitioned from the original OpenCL C++ kernel language, defined in OpenCL 2.2, to the 'C++ for OpenCL' open-source community project supported by Clang. C++ for OpenCL provides compatibility with OpenCL C, enables developers to use most C++17 features in OpenCL kernels, and is compatible with any OpenCL 2.X or OpenCL 3.0 implementation that supports SPIR-V ingestion.

The Extended Asynchronous Copy and Asynchronous Work Group Copy Fence extensions released alongside OpenCL 3.0 enable efficient, ordered DMA transactions as first class citizens in OpenCL—ideal for Scratch Pad Memory based devices, which require fine-grained control over buffer allocation. These extensions are the first of significant upcoming advances in OpenCL to enhance support for embedded processors.

To accompany today's release, the OpenCL Working Group has updated its OpenCL Resource Guide to help computing specialists, developers and researchers of all skill levels effectively harness the power of OpenCL. The OpenCL Working Group will continuously evolve the guide and welcomes any feedback on how it can be improved via GitHub.

OpenCL 3.0 at IWOCL
OpenCL Working Group members will be participating in the Khronos Panel Session at the IWOCL / SYCLcon online conference on April 28 at 4 PM GMT. IWOCL / SYCLcon is the leading forum for high-performance computing specialists working with OpenCL, SYCL, Vulkan and SPIR-V, and registration is free.

Industry Support for OpenCL 3.0
"In recent years there has been an impressive adoption of OpenCL to drive heterogeneous processing systems within many market segments," said Andrew Richards, founder and CEO of Codeplay Software. "This update to OpenCL 3.0 brings important flexibility benefits that will allow many evolving industries, from AI and HPC to automotive, to focus on their specific requirements and embrace open standards. Codeplay is excited to enable hardware vendors to support OpenCL 3.0 and to take advantage of the flexibility provided in its ecosystem of software products."

Mark Butler, vice president of software engineering, Imagination Technologies, says; "With its focus on deployment flexibility, we see OpenCL 3.0 as an excellent step forward in providing critical features for developers, with the ability to add functionality over time. This really is a step forward for the OpenCL ecosystem, allowing developers to write portable applications that depend on widely accepted functionality. Currently shipping GPUs based on the PowerVR Rogue architecture will enjoy a significant feature uplift including SVM, Generic Address Space and Work-group Functions. Upon final release of the specification, Imagination will ship a conformant OpenCL 3.0 implementation with support extending across a wide range of PowerVR GPUs, including our latest offering with IMG A-Series."

"Intel strongly supports cross-architecture standards being driven across the compute ecosystem such as in OpenCL 3.0 and SYCL," said Jeff McVeigh, vice president, Intel Architecture, Graphics and Software. "Standards-based, unified programming models will enable efficiency and unleash creativity for our developers with the upcoming release of our new Xe GPU architecture."

"NVIDIA welcomes OpenCL 3.0's focus on defining a baseline to enable developer-critical functionality to be widely adopted in future versions of the specification," said Anshuman Bhat, compute product manager at NVIDIA. "NVIDIA will ship a conformant OpenCL 3.0 when the specification is finalized and we are working to define the Vulkan interop extension that, together with layered OpenCL implementations, will significantly increase deployment flexibility for OpenCL developers."

"OpenCL 3.0 is an important step forward in the drive to unlock greater performance and innovation across a broadening range of computing platforms and applications," said Balaji Calidas, director of engineering at Qualcomm. "The flexible extension model will help our customers and software partners take full advantage of the tremendous potential available in both our existing and future application processors. We are pleased to have had the opportunity to contribute to this specification and we look forward to supporting the final product."

"Many of our customers want a GPU programming language that runs on all devices, and with growing deployment in edge computing and mobile, this need is increasing," said Vincent Hindriksen, founder and CEO of Stream HPC. "OpenCL is the only solution for accessing diverse silicon acceleration and many key software stacks use OpenCL/SPIR-V as a backend. We are very happy that OpenCL 3.0 will drive even wider industry adoption, as it reassures our customers that their past and future investments in OpenCL are justified."

"OpenCL 3.0 has opened up a new chapter for the OpenCL API which has served as the standard GPGPU API during the past 10 years" said Weijin Dai, executive vice president and GM of Intellectual Property Division at VeriSilicon. "With the streamlined OpenCL 3.0 core feature set, OpenCL 3.0 will enable a whole new class of embedded devices to adopt OpenCL API for GPU Compute and ML/AI processing, and it will also pave the way forward for OpenCL to interop or layer with the Vulkan API. VeriSilicon will deploy OpenCL 3.0 implementations quickly on a broad range of our embedded GPU and VIP products to enable our customers to develop new sets of GPGPU/ML/AI applications with the OpenCL 3.0 API."

About OpenCL
OpenCL (Open Computing Language) is an open, royalty-free standard for cross-platform, parallel programming of diverse, heterogeneous accelerators found in supercomputers, cloud servers, personal computers, mobile devices and embedded platforms. OpenCL greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including professional creative tools, scientific and medical software, vision processing, and neural network training and inferencing.

About Khronos
The Khronos Group is an open, non-profit, member-driven consortium of over 150 industry-leading companies creating advanced, royalty-free, interoperability standards for 3D graphics, augmented and virtual reality, parallel programming, vision acceleration and machine learning. Khronos activities include Vulkan, OpenGL, OpenGL ES, WebGL, SPIR-V, OpenCL, SYCL, OpenVX, NNEF, OpenXR, 3D Commerce, ANARI, and glTF. Khronos members drive the development and evolution of Khronos specifications and are able to accelerate the delivery of cutting-edge platforms and applications through early access to specification drafts and conformance tests.

14 Comments on Khronos Group Releases OpenCL 3.0

#1
gamefoo21
Nvidia actually supporting 3.0? Ohhh...

Wait, you only need to support 1.2 to be conformant, or so the language seems to suggest, as 2.x and 3.0 features are optional.

OpenCL is getting feature levels!

I mean, if NV doesn't just stop at the OpenCL 3.0 equivalent of feature level 1.2 and instead has full support for 3.0 in actual products, I'll be happily surprised.
#2
Vya Domus
gamefoo21
Nvidia actually supporting 3.0?
Doubt it, OpenCL 2.0 was in beta for what, 3 years? Never became a thing and never will; it will likely be the same with 3.0.
#3
ARF
The entire article doesn't mention AMD a single time. So what does AMD think about OpenCL 3.0?
I bet AMD will be the main driving force for upcoming wide support.
#4
gamefoo21
Vya Domus
Doubt it, OpenCL 2.0 was in beta for what, 3 years? Never became a thing and never will; it will likely be the same with 3.0.
I wonder after the supercomputer design wins for AMD. Giving OpenCL a real shot in the arm.

Though the release hints that 2.0 is basically dead, because you only need to support 1.2 and one 3.0 call to get 3.0 support... :laugh:

Actually I see NV supporting OCL 3.0 on the compute cards, while the consumer cores will continue limping along with gimped compute performance and likely support.
#5
ARF
gamefoo21
I wonder after the supercomputer design wins for AMD. Giving OpenCL a real shot in the arm.

Though the release hints that 2.0 is basically dead, because you only need to support 1.2 and one 3.0 call to get 3.0 support... :laugh:

Actually I see NV supporting OCL 3.0 on the compute cards, while the consumer cores will continue limping along with gimped compute performance and likely support.
Latest version is 2.2.
Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.
#6
dyonoctis
Hmm. I thought that Apple giving up on OpenCL and promoting Metal was because Nvidia made developing OpenCL such a pain that devs chose CUDA instead. In the CG industry, Arnold, RenderMan (soon with XPU), Octane, and Redshift are the most popular renderers, and all of them either get features that can't work with AMD, or can't work at all. And Apple convinced Redshift and Octane to work with Metal, so I don't think that AMD will get competitive in that sector anytime soon :/
#7
dragontamer5788
gamefoo21
I wonder after the supercomputer design wins for AMD. Giving OpenCL a real shot in the arm.
I asked some supercomputer guys and they seem to be using OpenACC, OpenMP, and CUDA. They don't seem to be interested in OpenCL. Obviously, this is a sample size of 1, but it's something to think about.

In fact, they were more interested in ROCm / HIP (AMD's somewhat CUDA-compatible layer) than OpenCL.
ARF
Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.
Closed, but highly advanced. Thrust, CUB, Cooperative Groups. Nearly full C++ compatibility on the device side (including support for classes, structures, and shared pointers between host / device).

CUDA is used for a reason: it's way easier to program and optimize than OpenCL. AMD's ROCm / HIP stuff is similarly easier to use than OpenCL in my experience. OpenCL can share pointers with SVM, but with different compilers, there's no guarantee that your classes or structures line up.

CUDA (and AMD's ROCm/HIP) have a further guarantee: the device AND host code go through the same LLVM compiler simultaneously. All alignment and padding between the host and device will be identical and compatible.
#8
bug
Well, if they kept it backwards compatible, I expect it will see the same "wide adoption" as its predecessors.
ARF
Latest version is 2.2.
Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.
Only the CUDA implementation is proprietary, the ecosystem is full of open source apps built on top of that.
#9
gamefoo21
bug
Well, if they kept it backwards compatible, I expect it will see the same "wide adoption" as its predecessors.


Only the CUDA implementation is proprietary, the ecosystem is full of open source apps built on top of that.
Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean-room a solution, like making an IBM PC-compatible BIOS, but CUDA is far more complicated, and AMD's valiant efforts are still limited to older versions, with limited translator compatibility.
#10
renz496
gamefoo21
Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.
If that's the case, then AMD would be in court right now over their Boltzmann Initiative. Maybe Qualcomm as well.
#11
Fluffmeister
This was always going to turn into an AMD love fest, but their OpenGL support was shit and CUDA is king.
#12
R-T-B
Fluffmeister
OpenGL support was shit
Was? Did it ever stop?

To my knowledge, the only place that ever got fixed was on Linux, and by open source devs, not AMD.
gamefoo21
Calling CUDA open is at best misleading and in reality a delusion.
Few languages are truly open, but the ecosystems are open, which is all he was claiming.
#13
bug
gamefoo21
Open source apps built on a closed source API.
That right there is your mistake. The API is just the top-most layer of the closed-source implementation. For others to be able to use your API, the API itself is usually open (as is the case here).
Besides bringing out the SJW side in some, the actual implementation being closed-source is of little consequence in this case. It's not like 3rd parties know Nvidia's hardware better than Nvidia, so they could improve upon the implementation. Sure, it's nice to be able to browse the sources to better understand how it works and to debug. But in this particular case, closed source is not the end of the world.
I mean, open source is always better. But for compute, the open initiatives are shunned by users, so like it or not, many of the AI things you read about today are made possible by CUDA.
#14
dragontamer5788
gamefoo21
Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.
Then use OpenMP 4.5 "target" code.

www.exascaleproject.org/wp-content/uploads/2017/05/OpenMP-4.5-and-Beyond-SOLLVE-part-21.pdf

Open source (Clang / GCC support), single-source compilation, device acceleration.

#pragma omp target
#pragma omp parallel for private(i)
for (i=0; i<N; i++) p[i] = v1[i]*v2[i];


"Target" says run this on the GPU. "Parallel For" is an older OpenMP construct, saying that each iteration should be run in parallel. "private(i)" says that the variable "i" is per-thread private. Not sure if the data-transfer over PCIe is fast enough? Then make it CPU-parallel instead:

#pragma omp parallel for private(i)
for (i=0; i<N; i++) p[i] = v1[i]*v2[i];


Bam, now the code is CPU parallel. Wait, but you're running on an AMD EPYC with a weird cache hierarchy (sets of independent L3s across NUMA domains), and you want the data to be NUMA-aware, PCIe-aware, and to execute on the GPU closest to each individual NUMA node?

#pragma omp target teams distribute parallel for private(i)
for (i=0; i<N; i++) p[i] = v1[i]*v2[i];


Yeah. It's that easy.