Intel Meteor Lake Technical Deep Dive

Intel AI Boost and Neural Processing Unit (NPU)


All processor SKUs based on "Meteor Lake" will feature an NPU, or Neural Processing Unit. Located in the SoC tile, the NPU provides high-performance AI inference acceleration. The NPU is being introduced to the PC as a new device class, and conforms to Microsoft's MCDM (Microsoft Compute Driver Model).


The NPU contains four basic hardware components: the Global Control (a crossbar), along with a memory management unit (MMU) and DMA engine; a small scratchpad RAM; and two neural compute engines (NCEs).

Each NCE contains a programmable DSP and the inference pipeline. The inference pipeline holds the main number-crunching resource, the MAC array: a set of logic components that accelerate matrix multiplication and convolution using multiply-accumulate (MAC) operations on INT8 and FP16 data, at up to 2,048 MACs per cycle. Besides the MAC array, there is fixed-function hardware for data conversion and activation functions. The NCE's VLIW-programmable DSP supports nearly all data types, ranging from INT4 to FP32. At a hardware level, AI inference acceleration is memory intensive, and so Intel deployed a localized SRAM to act as a scratchpad. This memory intensity is also why the NPU is located on the SoC tile, sharing the same fabric as the memory controllers.
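To make the MAC array's role concrete, here is a minimal scalar sketch of what a single multiply-accumulate does, plus the back-of-envelope peak-throughput arithmetic for two NCEs retiring 2,048 MACs per cycle each. The 1.4 GHz clock in the example is an illustrative assumption, not a published specification.

```python
# Conceptual sketch of an INT8 multiply-accumulate (MAC), the primitive
# the NPU's MAC array executes 2,048 of per cycle, per NCE.

def mac_int8(acc: int, a: int, b: int) -> int:
    """One MAC: acc += a * b, with INT8 operands and a wide accumulator."""
    assert -128 <= a <= 127 and -128 <= b <= 127
    return acc + a * b

def dot_int8(xs, ws):
    """A dot product is a chain of MACs; the array runs thousands in parallel."""
    acc = 0
    for a, b in zip(xs, ws):
        acc = mac_int8(acc, a, b)
    return acc

# Peak-rate arithmetic: each MAC counts as 2 ops (multiply + add).
# The clock frequency below is an assumed figure for illustration only.
NCES, MACS_PER_CYCLE, CLOCK_HZ = 2, 2048, 1.4e9
peak_tops = NCES * MACS_PER_CYCLE * 2 * CLOCK_HZ / 1e12

print(dot_int8([1, -2, 3], [4, 5, -6]))  # 1*4 + (-2)*5 + 3*(-6) = -24
print(round(peak_tops, 2))               # ~11.47 INT8 TOPS at the assumed clock
```

The multiplier from MACs to TOPS is why marketing figures count "operations": every MAC contributes both a multiply and an add.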


The NPU hardware is only half the equation. As the first production client AI acceleration platform, Intel has backed the NPU with a broad AI software stack. The supported APIs include WinML, DirectML, and ONNX Runtime, alongside the OpenVINO inference engine. The libraries and compilers in the stack include MLAS and the MKL-DNN library, which together take advantage of the AI acceleration capabilities of the Compute tile, exposed through the "Redwood Cove" ISA (DLBoost). The GPU user-mode driver (UMD), which talks to the WDDM kernel-mode driver and graphics firmware, interfaces with the iGPU and its SIMD resources spread across the Xe Cores. It's important to note that unlike Arc "Alchemist" discrete GPUs, the iGPU on "Meteor Lake" does not feature XMX AI acceleration; its Xe Cores provide AI acceleration through DP4a instead. Lastly, there is an NPU user-mode driver, the MCDM kernel-mode driver, and NPU firmware, which talk to the NPU itself.
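The DP4a operation mentioned above has simple semantics worth spelling out: a dot product of four packed INT8 element pairs, accumulated into a 32-bit integer. The sketch below models that behavior in plain Python; it illustrates the arithmetic only, not the actual GPU instruction encoding.

```python
# Scalar model of DP4a: dot product of four INT8 pairs with INT32
# accumulation. This is the primitive the Xe Cores use for AI math on
# "Meteor Lake", in place of the XMX matrix engines on Arc "Alchemist".

def dp4a(acc: int, a: list, b: list) -> int:
    """acc += a.b over exactly four INT8 element pairs."""
    assert len(a) == len(b) == 4
    assert all(-128 <= v <= 127 for v in a + b)
    return acc + sum(x * y for x, y in zip(a, b))

print(dp4a(10, [1, 2, 3, 4], [5, 6, 7, 8]))  # 10 + 5 + 12 + 21 + 32 = 80
```

Because DP4a packs four narrow multiplies into one instruction, it quadruples INT8 throughput over plain FP32 SIMD lanes, though it falls short of a dedicated matrix engine like XMX.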


Use-cases for the NPU on "Meteor Lake" include generative AI, computer vision, image manipulation, and collaboration. The NPU can accelerate transformers, large language models, and image and audio generation. Computer vision comes in handy for image classification, object detection and isolation within an image, and image depth estimation. Image-enhancement applications include AI super-resolution upscaling. Lastly, there is a plethora of interpersonal collaboration applications for AI, including background manipulation, image stabilization, gaze correction, background-noise removal, and speech-to-text.


Why Intel felt the need to develop the NPU, rather than simply deploy XMX accelerators on the iGPU coupled with GFNI and DLBoost on the Compute tile, comes down to efficiency. The NPU can be up to 8 times more power efficient at an AI workload than the iGPU or the CPU cores. The NPU also introduces a degree of hardware-level standardization for programming interfaces to build upon. Besides this, Intel says that the exact same NPU, with a uniform performance level, will be available across all Core processor SKUs based on "Meteor Lake."

AI Deep Dive Full Presentation

May 6th, 2024 19:01 EDT
