Monday, March 14th 2022
Microsoft DirectStorage API Available, but Without GPU-accelerated Decompression
Microsoft officially launched the DirectStorage API on the Windows PC platform on Monday. The API enables direct data interactions between the GPU, graphics memory, and a storage device, giving games a more direct path to stream assets to the graphics hardware. The API is compatible with both Windows 10 and Windows 11, although Microsoft recommends the latter for its "in-built storage optimizations." Also, contrary to previous reports, you don't strictly need an NVMe-based storage device, such as an M.2 SSD with a PCIe/NVMe interface; even a SATA SSD using the AHCI protocol will do. Microsoft does, however, recommend an NVMe SSD for the best performance.
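The core idea of the API is a batched request queue rather than one-at-a-time blocking file reads: a game enqueues many small asset reads cheaply, then submits them as a batch. A minimal conceptual sketch in Python; the `StorageQueue` class and its methods are hypothetical illustrations of the model, not the actual DirectStorage C++ interface:

```python
import os
import tempfile

class StorageQueue:
    """Hypothetical stand-in for a DirectStorage-style request queue:
    requests are enqueued cheaply, then submitted as one batch."""
    def __init__(self, path):
        self.path = path
        self.pending = []          # queued (offset, size, destination) requests

    def enqueue(self, offset, size, dest):
        self.pending.append((offset, size, dest))   # no I/O happens yet

    def submit(self):
        results = {}
        with open(self.path, "rb") as f:            # one batch of reads
            for offset, size, dest in self.pending:
                f.seek(offset)
                results[dest] = f.read(size)
        self.pending.clear()
        return results

# Usage: stream two "assets" out of one packed file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"TEXTUREDATAMESHDATA")
    packed = f.name

q = StorageQueue(packed)
q.enqueue(0, 11, "texture")    # bytes 0..10
q.enqueue(11, 8, "mesh")       # bytes 11..18
assets = q.submit()
os.unlink(packed)
```

The real API works against GPU destinations rather than returning buffers, but the queue/submit shape is the relevant part: batching lets the storage stack keep a fast NVMe device saturated instead of paying per-request overhead.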
There is, however, a wrinkle. Microsoft isn't yet launching a killer feature of the DirectStorage API: GPU-accelerated asset decompression. This feature lets GPUs use compute shaders to decompress game assets stored in compressed asset libraries on disk. Most games store their local assets this way to conserve disk footprint. Without it, unless developers write their own GPGPU code for asset decompression, compressed game assets still have to rope in the CPU, lengthening the pipeline. Microsoft stated that enabling GPU-accelerated decompression is "next on their roadmap."
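What the missing feature means in practice: for now, the load path still has to detour through a CPU decompression step before anything reaches VRAM. A hedged sketch in Python, using zlib purely as a stand-in for whatever codec a game's asset library actually uses:

```python
import zlib

# A "packed asset" as it sits on disk: compressed bytes.
raw_asset = b"\x00" * 65536              # 64 KiB of trivially compressible data
packed_asset = zlib.compress(raw_asset)

def load_without_gpu_decompression(packed):
    # Current path: the CPU decompresses on the critical path,
    # and the full uncompressed size is what gets uploaded to VRAM.
    unpacked = zlib.decompress(packed)
    return unpacked

unpacked = load_without_gpu_decompression(packed_asset)
print(f"on-disk: {len(packed_asset)} B, uploaded: {len(unpacked)} B")
```

With GPU-accelerated decompression, the `zlib.decompress` step would move onto the GPU's compute shaders, and only the small `packed_asset` would need to cross the bus.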
78 Comments on Microsoft DirectStorage API Available, but Without GPU-accelerated Decompression
As for the gaming performance ads, that's interesting! I would expect the main outcome (and main goal) to be people thinking "I want a new W11 gaming PC" though - even the idea of an OS upgrade is alien to most gamers and PC users (which is also why MS' tactic of encouraging in-place OS upgrades through Windows Update is ... let's say problematic, as most people have no idea what is going on or what is being changed).
For modern open world games that go a long stretch without interruption and need to continuously stream assets, it gets a bit harder but still just a scheduling problem, maybe reserve some small fixed number of shaders for streaming/decompression. That would mean slightly lower rendering performance, but with the tradeoff that you never get dips/stutters due to bottlenecks in storage->CPU->GPU streaming.
To this day, DX is still Microsoft's biggest selling point for developers: develop for one API, target two platforms. If you're primarily a console developer, you can easily port to PC for some easy extra $$$. And vice versa.
And if you've got developers, and thus games, cornered, hardware sales will come. Though, as you have noted, hardware revenue is much lower than what Microsoft gets for subscriptions. But the underpinning of those subscriptions is still games, and thus DX.
I know it's a convoluted explanation, but that's how Microsoft works. Whether it's Xbox subs, integrating Lync into Office, VS into Azure, and everything with AD, Microsoft is all about bundling. So when you want to figure out something about Microsoft, you always have to take a step back and look at the bigger picture.
It's a good start for devs to begin work on it, but it's missing critical features at this stage.
Having said that, end-users will probably need a fully updated system: UEFI firmware, NVMe firmware, all the chipset drivers, Windows kernel, GPU driver. The whole shebang of updates. DirectStorage is a deeply low-level feature that requires new tricks throughout the Windows I/O stack. It will also deliver max performance only on Win11, even though it runs on Win10 too; Win11 haters out there will finally have to upgrade when the first games appear that boast significantly lower loading times with DS.
Notice that Win11 has included some fundamental changes toward DirectStorage since day one. This is very likely the cause of some of the perf regressions Win11 had, which have been gradually fixed by patches, notably on AMD systems; some high-performance NVMe devices also still show lower random write IOPS compared to Win10. And that kind of pain is one of the reasons why MS released Win11 instead of yet another Win10 feature update. At some point it's hard to evolve a 6+ year old OS in a near-perfectly compatible way; you have to clean up and make radical changes in a few places. That was also necessary in areas like virtualization (think WSL2), security (the whole TPM crisis, but more so VBS), and GPU drivers (again for WSL2, but probably also DirectStorage). And it's hard to push those big, high-risk changes to an OS with a billion risk-averse corporate users. I wish Win11 had shipped a little better baked, as it was RC quality at best at launch, but I'm still happy they did it.
Of course that means the GPU or CPU eventually has to unpack it anyway, lengthening the pipeline either way. This is still of questionable benefit right now.
I'm trying to figure out how this impacts storage that is directly connected to the CPUs (2x in my case) using the on-die RAID controllers. Each device gets 4x lanes, and I can connect 4 devices across both CPUs. As they are directly connected to the CPUs, how would DirectStorage affect this?
Here is the on-CPU storage: I have 4x Micron 9300 (52 TB RAID) in direct VROC on-CPU storage.
In this next picture, no storage is physically plugged into any PCIe slot; the drives are plugged individually into 4 separate U.2 connectors on the motherboard. So in this situation I can pick 1, 2, 3, or 4 devices. This is processor 1 (out of 2), so each device gets the full PCIe bandwidth, direct to CPU, per CPU, making 4x 16x across the 2 CPUs for the 4 devices in 1 volume (in my case).
I did not include pictures of the VROC array / RAID manager in the BIOS.
I hope this new "feature" from Microsoft doesn't mess up the 50 GB/s write speed and the 61 GB/s read speed, and I wonder if it's part of the Windows Server deployment, in case it messes up array functions.
I suppose I could test by allowing that particular update but too many times I've been burnt by microsoft "features"
here is the write speed to my volume as it backs up another onboard device:
Hence the concern about the new MS feature 'DirectStorage' for those of us who already have direct storage at the hardware level.
As such, it shouldn't affect your drive access speeds whatsoever; what changes isn't how the drives are accessed, just where they send data. Rather than storage → PCIe root hub → CPU decompression (→ RAM) → PCIe root hub → VRAM, with all data after the decompression stage being much larger, the pipeline becomes storage → PCIe root hub (→ RAM → PCIe root hub) → VRAM, with the additional benefit of less bandwidth used, as all data transferred is now compressed. It's also an opt-in API, not a system-wide function that somehow affects file access generally, so there's no reason it would change anything at all outside of specifically DirectStorage-enabled applications.
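The bandwidth point can be made concrete with some rough arithmetic. The numbers below are illustrative assumptions, not measured figures: a 10 GB batch of uncompressed assets, an assumed 2.5x compression ratio, and a PCIe 4.0 x16 link at roughly 32 GB/s usable:

```python
# Illustrative assumptions only.
uncompressed_gb = 10.0   # asset batch, expanded size
ratio = 2.5              # assumed compression ratio
link_gb_s = 32.0         # rough usable PCIe 4.0 x16 bandwidth

old_path_gb = uncompressed_gb          # decompressed on CPU, sent expanded
new_path_gb = uncompressed_gb / ratio  # sent compressed, expanded on GPU

print(f"bus traffic: {old_path_gb:.1f} GB vs {new_path_gb:.1f} GB")
print(f"transfer time: {old_path_gb / link_gb_s * 1000:.0f} ms vs "
      f"{new_path_gb / link_gb_s * 1000:.0f} ms")
```

Under these assumptions the compressed path moves 4 GB instead of 10 GB over the bus, so the transfer-time saving scales directly with the compression ratio.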
As for whether this is rolled out in Windows Server, I would guess that depends on whether it has any useful applications in that space and whether or not other solutions doing the same already exist.
The main part of DS - and the reason why this announcement is rather nonsensical - is the fact that GPUs can now decompress game assets themselves (at least when compressed in certain ways), rather than needing the CPU to do this before the data can be loaded into VRAM. I would expect server software vendors where this would be an option to either already have implemented this through their own software (which should be entirely possible as long as you can write GPU-accelerated decompression software), or to be eagerly awaiting a standardized way of doing so. Either way, it should have zero effect on file access outside of these specific applications.
If it's about games, why not just offer the option to unpack/decompress the game assets and take up a little more drive space, rather than go through all this? ;)
....and, where does one enable/disable this feature?
-one doesn't, unless one is a game developer. There is literally zero reason for this to be a toggleable option.
- because that "little more drive space" could mean a 2-4x increase in drive space for compressed assets. When games are already pushing 100GB, most of which is compressed assets, this is hardly an attractive proposition.
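To put numbers on that (the figures here are assumptions chosen only to show the scale): take a 100 GB install where 80 GB is compressed assets, and suppose shipping those assets uncompressed expands them 2x to 4x:

```python
# Assumed figures for illustration: a 100 GB install, 80 GB of which
# is compressed assets that would expand 2x-4x if shipped uncompressed.
install_gb = 100.0
compressed_assets_gb = 80.0
other_gb = install_gb - compressed_assets_gb   # executables, audio, etc.

# Resulting install size at each expansion factor.
sizes = {e: other_gb + compressed_assets_gb * e for e in (2.0, 4.0)}
for expansion, size in sizes.items():
    print(f"{expansion:.0f}x expansion -> {size:.0f} GB install")
```

Even at the low end of the assumed range, the install nearly doubles, which is why shipping uncompressed assets is a non-starter for most titles.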
- I don't understand how reducing cpu load and PCIe bandwidth requirements for the most common high performance PC usage scenario is a gimmick, nor anything but highly practical.
It's pure software that allows certain tasks to be done without all the previous CPU overhead (in this current half-released state, not as well as intended).