NVIDIA RTX IO Detailed: GPU-assisted Storage Stack Here to Stay Until CPU Core-counts Rise
NVIDIA at its GeForce "Ampere" launch event announced the RTX IO technology. Storage is the weakest link in a modern computer, from a performance standpoint, and SSDs have had a transformational impact. With modern SSDs leveraging PCIe, consumer storage speeds are now bound to grow with each new PCIe generation doubling per-lane IO bandwidth. PCI-Express Gen 4 enables 64 Gbps bandwidth per direction on M.2 NVMe SSDs, AMD has already implemented it across its Ryzen desktop platform, Intel has it on its latest mobile platforms, and is expected to bring it to its desktop platform with "Rocket Lake." While more storage bandwidth is always welcome, the storage processing stack (the task of processing ones and zeroes to the physical layer), is still handled by the CPU. With rise in storage bandwidth, the IO load on the CPU rises proportionally, to a point where it can begin to impact performance. Microsoft sought to address this emerging challenge with the DirectStorage API, but NVIDIA wants to build on this.
According to tests by NVIDIA, reading uncompressed data from an SSD at 7 GB/s (typical max sequential read speeds of client-segment PCIe Gen 4 M.2 NVMe SSDs), requires the full utilization of two CPU cores. The OS typically spreads this workload across all available CPU cores/threads on a modern multi-core CPU. Things change dramatically when compressed data (such as game resources) are being read, in a gaming scenario, with a high number of IO requests. Modern AAA games have hundreds of thousands of individual resources crammed into compressed resource-pack files.
According to tests by NVIDIA, reading uncompressed data from an SSD at 7 GB/s (typical max sequential read speeds of client-segment PCIe Gen 4 M.2 NVMe SSDs), requires the full utilization of two CPU cores. The OS typically spreads this workload across all available CPU cores/threads on a modern multi-core CPU. Things change dramatically when compressed data (such as game resources) are being read, in a gaming scenario, with a high number of IO requests. Modern AAA games have hundreds of thousands of individual resources crammed into compressed resource-pack files.