Wednesday, December 20th 2023

Khronos Finalizes Vulkan Video Extensions for Accelerated H.264 and H.265 Encode

In April 2021, the Vulkan Working Group at Khronos released a set of provisional extensions, collectively referred to as Vulkan Video, which provide seamless encoding and decoding of video streams using a variety of video coding standards. The December 2022 release of Vulkan 1.3.238 saw the finalization of the extensions to decode H.264 and H.265, and today, with the release of Vulkan 1.3.274, Khronos has finalized their counterpart: the extensions to enable encoding of H.264 and H.265 video streams. Leveraging the Vulkan framework, these extensions provide a standardized, low-overhead, and highly controllable way to produce H.264 and H.265 video via hardware accelerators, with applications ranging from real-time, low-latency streaming to offline server-scale transcoding.

Incorporating industry feedback, the extensions have seen many improvements since their introduction, from a bidirectional interface (overrides) to help with coding and exposing advanced hardware capabilities, to rate control configuration parameters and an interface to aid with quality vs. performance trade-offs. This feedback also prompted the release of the first video maintenance extension. In addition, given the high industry demand for AV1 codec support, the release of an AV1 decode extension is imminent, and development of an AV1 encode extension is also underway. Figure 1 depicts Vulkan Video extensions along with their status and relations.

The encode extensions grant low-level control over much of the encoding process, while still keeping the efficiency and performance of hardware encoding acceleration. Implementers have the freedom to tweak details such as quantization index, per-slice bit allocation, arithmetic coder, deblocking, and more. Given this flexibility and complexity, a balanced programming interface for rate control gives users a choice between more automated operation and low-level tweaking of frame parameters.

Encoder Rate Control
Often the most important aspect of encoder configuration for applications, the encoder rate control API was given special attention in Vulkan Video. From exposing parameters for standard rate control modes (e.g. CBR/VBR), to allowing applications to provide hints about other intended stream encoding parameters (e.g. picture/reference patterns), to providing the ability to configure per-layer rate control parameters (e.g. for streams with multiple temporal layers), the rate control API offers a rich set of features for various use cases and lays a solid foundation for future extensions. Encoder rate control configuration is performed using the vkCmdControlVideoCodingKHR command.
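To make the per-layer parameters more concrete, here is a minimal sketch in C of the arithmetic behind such a configuration. It does not use the Vulkan headers: `LayerRateControl`, `RateControlMode`, and the three helper functions are hypothetical stand-ins invented for illustration, and the leaky-bucket interpretation of the buffer size is an assumption rather than a statement of how any implementation behaves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one rate control layer. The field names are
   illustrative only and are not the Vulkan structure definition. */
typedef struct {
    uint64_t average_bitrate; /* bits per second */
    uint64_t max_bitrate;     /* bits per second */
    uint32_t frame_rate_num;  /* frames ... */
    uint32_t frame_rate_den;  /* ... per this many seconds */
} LayerRateControl;

/* Hypothetical mode selector standing in for the CBR/VBR modes. */
typedef enum { RC_MODE_CBR, RC_MODE_VBR } RateControlMode;

/* Average bit budget per frame: bitrate divided by (num / den) frames/s. */
uint64_t average_bits_per_frame(const LayerRateControl *layer) {
    assert(layer->frame_rate_num != 0);
    return layer->average_bitrate * layer->frame_rate_den
           / layer->frame_rate_num;
}

/* Bits that a buffer of buffer_ms milliseconds holds at the peak rate
   (assumed leaky-bucket model). */
uint64_t virtual_buffer_bits(const LayerRateControl *layer,
                             uint32_t buffer_ms) {
    return layer->max_bitrate * buffer_ms / 1000u;
}

/* Sanity check before submitting a configuration:
   CBR expects average == max, VBR expects average <= max. */
int layer_is_consistent(const LayerRateControl *layer,
                        RateControlMode mode) {
    if (mode == RC_MODE_CBR) {
        return layer->average_bitrate == layer->max_bitrate;
    }
    return layer->average_bitrate <= layer->max_bitrate;
}
```

For a 5 Mbit/s constant-bitrate layer at 30 frames per second, `average_bits_per_frame` yields 166666 bits per frame, and a 500 ms buffer holds 2500000 bits. A real application would pass the equivalent values through the rate control structures supplied to vkCmdControlVideoCodingKHR.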

Encoder Quality Levels
Video encoder implementations often fine-tune the use of various encoding tools and rate control parameters depending on the desired quality versus performance/latency trade-offs of different use cases. Implementations now report the number of quality levels supported for a given video profile and usage. A new command, vkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHR, can be used to retrieve implementation recommendations for various encoding parameters and configurations (e.g. rate control).

Implementation Overrides
Due to the complex nature of video encoding, and the ever-changing nature of hardware encoders and their capabilities, an interface, known as overrides, permits bidirectional communication that guarantees that the output video stream will be compliant. In addition, applications may opt in to optimization overrides to allow implementations more flexibility to optimize for the specified usage and hints. The occurrence of overrides for video session parameters or frame parameters is also reported in full, for developers interested in more detailed analysis of such overrides.

Retrieval of Encoded Video Session Parameters Bitstream Segments
To facilitate implementation overrides for bitstream compliance and optimizations, applications are expected to retrieve the encoded video session parameter bitstream segments (e.g. H.264 SPS/PPS) from the implementation using the new API call vkGetEncodedVideoSessionParametersKHR against the given VkVideoSessionParametersKHR object.

Encoder Feedback Query
To allow future extension of encoder feedback statistics in a manner similar to pipeline statistics, the new VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR is now used to retrieve the video bitstream offset and size.
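As an illustration of how an application might consume such feedback, the sketch below unpacks a result buffer into named fields. It is written in plain C without the Vulkan headers: the `FEEDBACK_*` constants, the `EncodeFeedback` struct, and `unpack_encode_feedback` are hypothetical names, and the layout (one 32-bit value per enabled feedback flag, in ascending bit order, with no trailing status value) is an assumption that should be checked against the specification and the query result flags actually used.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the feedback flag bits (illustrative names). */
enum {
    FEEDBACK_BUFFER_OFFSET = 0x1, /* offset of the written bitstream */
    FEEDBACK_BYTES_WRITTEN = 0x2, /* size of the written bitstream */
    FEEDBACK_HAS_OVERRIDES = 0x4  /* whether overrides were applied */
};

/* Hypothetical container for the unpacked values. */
typedef struct {
    uint32_t buffer_offset;
    uint32_t bytes_written;
    uint32_t has_overrides;
} EncodeFeedback;

/* Unpacks one query result, assuming one 32-bit value per enabled flag in
   ascending bit order. Returns the number of values consumed, or 0 if the
   buffer holds fewer values than the enabled flags require. */
size_t unpack_encode_feedback(uint32_t enabled_flags,
                              const uint32_t *values, size_t value_count,
                              EncodeFeedback *out) {
    size_t index = 0;
    out->buffer_offset = 0;
    out->bytes_written = 0;
    out->has_overrides = 0;

    if (enabled_flags & FEEDBACK_BUFFER_OFFSET) {
        if (index >= value_count) return 0;
        out->buffer_offset = values[index++];
    }
    if (enabled_flags & FEEDBACK_BYTES_WRITTEN) {
        if (index >= value_count) return 0;
        out->bytes_written = values[index++];
    }
    if (enabled_flags & FEEDBACK_HAS_OVERRIDES) {
        if (index >= value_count) return 0;
        out->has_overrides = values[index++];
    }
    return index;
}
```

Keeping the unpacking driven by the enabled flags, rather than by fixed positions, means the same helper keeps working if an application later enables additional feedback values.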

Changes to Video Decode & Encode
VK_KHR_video_maintenance1
Along with the video encoding extensions, Khronos is releasing a maintenance extension incorporating community and industry feedback, which improves flexibility for both decoding and encoding. This extension permits decoding implementations to create images usable with video decoding without the need to explicitly specify the video profiles they will be used with. The same applies for encoding, where an attached per-image video profile limits usability with large and complex transcoding frameworks.

In addition to flexibility improvements, a new, simpler interface for specifying video queries inline with video decode and encode operation commands has been added, known as inline queries.

Requiring pSetupReferenceSlot for Non-Reference Pictures
When the Vulkan Video decode extensions were finalized, applications were required to provide a reconstructed picture resource and DPB slot (via VkVideoDecodeInfoKHR::pSetupReferenceSlot) only if the picture being decoded would become a reference. However, no shipping implementation actually supported specifying NULL for pSetupReferenceSlot, and some implementations discovered cases that require the reconstructed picture resource and/or DPB slot for transient storage while decoding a non-reference picture. A similar situation applies to encoding non-reference pictures. As a result, the Vulkan Video extensions were updated to require providing pSetupReferenceSlot for non-reference pictures as well.

3 Comments on Khronos Finalizes Vulkan Video Extensions for Accelerated H.264 and H.265 Encode

#1
zlobby
They took their time, that's for sure, but better late than never.
#2
TheNightLynx
One year for every decoder and two years for every encoder imho is a bit too much time to simply define the API, in particular, considering that the implementation is in charge of the GPU builder.
#3
stimpy88
Wow, a bit late, to say the least! But if it's robust and stable, then it's worth it.