Monday, March 14th 2022

Microsoft DirectStorage API Available, but Without GPU-accelerated Decompression

Microsoft officially launched the DirectStorage API on the Windows PC platform on Monday. The API enables direct data interactions between the GPU, graphics memory, and a storage device, giving games a more direct path to stream assets to the graphics hardware. The API is compatible with both Windows 10 and Windows 11, although Microsoft recommends the latter for its "in-built storage optimizations." Also, contrary to previous reports, you don't necessarily need an NVMe-based storage device such as an M.2 SSD with a PCIe/NVMe interface; even a SATA SSD using the AHCI protocol will do. Microsoft does, however, recommend an NVMe SSD for the best performance.

There is, however, a wrinkle. Microsoft isn't yet launching a killer feature of the DirectStorage API: GPU-accelerated asset decompression. This feature lets GPUs use compute shaders to decompress game assets that are stored in compressed asset libraries on disk. Most games store their local assets this way to conserve disk footprint. Without it, unless the developer writes special game code to use GPGPU for asset decompression, compressed game assets still have to rope in the CPU and lengthen the pipeline. Microsoft stated that enabling GPU-accelerated decompression is "next on their roadmap."
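To see why the CPU step matters, here is a back-of-the-envelope model of the two loading paths; every throughput figure below is an invented assumption for illustration, not a benchmark:

```python
# Rough, illustrative model of asset-loading time with and without
# GPU-side decompression. All throughput figures are made-up assumptions.

def load_time_cpu_path(compressed_gb, ratio, nvme_gbps=7.0,
                       cpu_decomp_gbps=2.0, pcie_gbps=25.0):
    """Storage -> RAM (compressed) -> CPU decompress -> VRAM (uncompressed)."""
    uncompressed_gb = compressed_gb * ratio
    read = compressed_gb / nvme_gbps                # NVMe read into RAM
    decompress = uncompressed_gb / cpu_decomp_gbps  # CPU inflates the data
    upload = uncompressed_gb / pcie_gbps            # full-size copy to VRAM
    return read + decompress + upload

def load_time_gpu_path(compressed_gb, ratio, nvme_gbps=7.0,
                       gpu_decomp_gbps=20.0, pcie_gbps=25.0):
    """Storage -> VRAM (compressed) -> GPU decompress with compute shaders."""
    uncompressed_gb = compressed_gb * ratio
    read = compressed_gb / nvme_gbps                # NVMe read
    upload = compressed_gb / pcie_gbps              # only compressed bytes cross PCIe
    decompress = uncompressed_gb / gpu_decomp_gbps  # compute-shader inflate
    return read + upload + decompress

cpu_t = load_time_cpu_path(10, ratio=2.5)
gpu_t = load_time_gpu_path(10, ratio=2.5)
print(f"CPU path: {cpu_t:.2f} s, GPU path: {gpu_t:.2f} s")
```

Under these assumed numbers the GPU path wins mostly because the decompressed data never has to wait on the CPU and only compressed bytes cross the bus; the exact figures are irrelevant, the shape of the pipeline is the point.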

78 Comments on Microsoft DirectStorage API Available, but Without GPU-accelerated Decompression

#51
DeathtoGnomes
ValantarXbox isn't much of a money maker
Yep, they're priced mainly to cover production costs.
ValantarThe main sales pitch for W10/W11 has always been, and continues to be "this is the new version of Windows". That's the carrot for most people - something new, to match their new hardware. Remember, OS sales outside of what is bundled with PCs is tiny.
So far, the ads I've seen for W11 have been about gaming performance, so I'd lean towards "gaming performance with W11 is xxx% over W10". Demand from gamers outweighs demand from corporate IT, where change is avoided as long as possible. Which makes me wonder: how can this API benefit the enterprise side?
#52
Valantar
DeathtoGnomesSo far, the ads I've seen for W11 have been about gaming performance, so I'd lean towards "gaming performance with W11 is xxx% over W10". Demand from gamers outweighs demand from corporate IT, where change is avoided as long as possible. Which makes me wonder: how can this API benefit the enterprise side?
It probably doesn't - the compression algorithms supported by DS are quite limited (to those commonly used for game asset compression), so it's kind of doubtful that these see much use in ... let's say enterprise database compression or other stuff they might want GPU accelerated. It can of course happen, but most likely they just store it uncompressed and bulk up on storage instead.

As for the gaming performance ads, that's interesting! I would expect the main outcome (and main goal) to be people thinking "I want a new W11 gaming PC" though - even the idea of an OS upgrade is alien to most gamers and PC users (which is also why MS' tactic of encouraging in-place OS upgrades through Windows Update is ... let's say problematic, as most people have no idea what is going on or what is being changed).
#53
opinali
DeathtoGnomesIt sounds promising, but how will it affect GPU performance? Putting the burden of decompression on the GPU could take up resources best used elsewhere. Having said that, I wanna predict that IF (big if) this gains traction with developers, we could see a change in GPU architecture down the road to accommodate the extra load. Games today more often use close to 100% of GPU cycles and memory; I gotta wonder how much memory this API needs to be efficient and not interfere with game performance.

On the other hand, it also makes me wonder how developers will use this with the existing environment. Will new games with this function use fewer resources or not?

This API could also mean no CPU bottlenecks, but that doesn't mean developers won't still use the CPU.
This should make a bigger difference during level loading time, when the game's rendering is either paused or doing some very light work showing a transition screen (the typical "Kratos opening a large door" scenes).

For modern open world games that go a long stretch without interruption and need to continuously stream assets, it gets a bit harder, but it's still just a scheduling problem; maybe reserve some small fixed number of shaders for streaming/decompression. That would mean slightly lower rendering performance, but with the tradeoff that you never get dips/stutters due to bottlenecks in storage->CPU->GPU streaming.
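That reservation tradeoff can be sketched with a toy simulation; every number below (budget, burst sizes, per-unit CPU cost) is invented purely for illustration:

```python
# Toy simulation of reserving a fixed slice of GPU compute for asset
# decompression, versus stalling on CPU-side decompression whenever a
# burst of assets arrives. All numbers are invented for illustration.

GPU_BUDGET = 110       # abstract compute units available per frame
RENDER_WORK = 100      # units a frame's rendering needs
CPU_MS_PER_UNIT = 0.4  # ms the CPU needs per unit of decompression work
bursts = [0, 0, 25, 0, 0, 40, 0, 0, 0, 30]  # decompression work per frame

def frames_cpu_decompress():
    """Bursts go through the CPU, and the frame stalls waiting on them."""
    return [16.7 * RENDER_WORK / GPU_BUDGET + b * CPU_MS_PER_UNIT
            for b in bursts]

def frames_reserved(reserved=10):
    """A fixed slice of shaders is held back: rendering is uniformly a
    touch slower, but bursts drain through the reservation over a few
    frames without ever stalling the render pipeline."""
    return [16.7 * RENDER_WORK / (GPU_BUDGET - reserved) for _ in bursts]

cpu_frames = frames_cpu_decompress()
res_frames = frames_reserved()
print("worst frame, CPU decompress: %.1f ms" % max(cpu_frames))
print("worst frame, reserved slice: %.1f ms" % max(res_frames))
```

The reserved variant trades a slightly higher steady frame time for eliminating the worst-case spikes, which is exactly the tradeoff described above: marginally lower average performance, no stutter.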
#54
DeathtoGnomes
ValantarAs for the gaming performance ads, that's interesting! I would expect the main outcome (and main goal) to be people thinking "I want a new W11 gaming PC" though - even the idea of an OS upgrade is alien to most gamers and PC users.
I don't know the percentage of competitive gamers within the whole self-proclaimed gamer population, but gamers will want to upgrade with every new Windows version that claims an increase in gaming performance, and those that follow competitive players (and wannabes) will always copy their hardware/software choices. Like I said, performance is everything; most gamers will upgrade. It's not an alien idea, because they already upgrade GPU drivers with every release; upgrading Windows is no different.
opinaliFor modern open world games that go a long stretch without interruption and need to continuously stream assets, it gets a bit harder but still just a scheduling problem
Open world games fall into that need for continuous streaming. I can say that stutter is more noticeable when changing chunks; if this fixes that problem, I'd be pleased.
#55
biggermesh
ValantarIt probably doesn't - the compression algorithms supported by DS are quite limited (to those commonly used for game asset compression), so it's kind of doubtful that these see much use in ... let's say enterprise database compression or other stuff they might want GPU accelerated. It can of course happen, but most likely they just store it uncompressed and bulk up on storage instead.

As for the gaming performance ads, that's interesting! I would expect the main outcome (and main goal) to be people thinking "I want a new W11 gaming PC" though - even the idea of an OS upgrade is alien to most gamers and PC users (which is also why MS' tactic of encouraging in-place OS upgrades through Windows Update is ... let's say problematic, as most people have no idea what is going on or what is being changed).
If you look at the header, the API supports custom decompression.
#56
mama
Maybe a stupid question but how is this actually activated in Windows?
#57
TheoneandonlyMrK
mamaMaybe a stupid question but how is this actually activated in Windows?
And has anyone noticed any benefit?
#58
venturi
looniamand that a nvme drive.
arguably... ;)
#59
bug
ValantarXbox isn't much of a money maker, though Xbox software definitely is. But they could have made that work with literally whatever APIs they wanted to. Arguing anything causal between DX12 and that is ... well, logic I certainly can't follow.
It's actually very simple. XBox is the new kid on the block in the world of consoles. Microsoft would never have made yet another 3D API stick, so they made the XBox use DX to lower the entry barrier as much as possible for potential developers.
To this day, DX is still Microsoft's biggest selling point for developers: develop for one API, target two platforms. If you're primarily a console developer you can easily port to PC for some easy extra $$$. And vice versa.
And if you've got developers and thus games cornered, hardware sales will come. Though, as you have noted, hardware revenue is much lower than what Microsoft gets for subscriptions. But the underpinning of those subscriptions is still games, and thus DX.

I know it's a convoluted explanation, but that's how Microsoft works. Whether it's XBox subs, integrating Lync into Office, VS into Azure, or everything with AD, Microsoft is all about bundling. So when you want to figure out something about Microsoft, you always have to take a step back and look at the bigger picture.
#60
Mussels
Freshwater Moderator
Without the GPU decompression, this is a massive waste as it requires the game's textures to be uncompressed - meaning HUGE game sizes.


It's a good start for devs to begin work on it, but it's missing critical features at this stage.
#61
opinali
mamaMaybe a stupid question but how is this actually activated in Windows?
This is a feature for game developers, not end-users. Microsoft provides an SDK, games use it (existing games hopefully patched to do that too), then boom, the game is faster on your system.

Having said that, end-users will probably need a fully upgraded system: UEFI firmware, NVMe firmware, chipset drivers, Windows kernel, GPU driver. The whole shebang of updates. DirectStorage is a deeply low-level feature that requires new tricks in the whole Windows I/O stack. Also, it will deliver max performance only on Win11, even though it also runs on Win10; Win11 haters out there will finally have to upgrade when the first game releases appear that feature significantly lower loading times with DS.

Notice that Win11 has already included some fundamental changes towards DirectStorage since day one. This is very likely the cause of some of the performance regressions Win11 had that have been gradually fixed by patches, notably on AMD systems; some high-performance NVMe devices also still have lower random write IOPS compared to Win10. And that kind of pain is one of the reasons why MS released Win11 instead of yet another Win10 feature update. At some point it's hard to evolve a 6+ year old OS in a near-perfectly compatible way; you've got to clean up and make radical changes in a few places. That was also necessary in areas like virtualization (think WSL2), security (the whole TPM crisis, but more like VBS), and GPU drivers (again for WSL2, but probably also DirectStorage). And it's hard to push those big, high-risk changes to an OS that has a billion risk-averse corporate users. I wish Win11 had shipped a little better baked - it was RC quality at best when it shipped - but I'm still happy they did it.
#62
R-T-B
MusselsWithout the GPU decompression, this is a massive waste as it requires the games textures to be uncompressed
Not necessarily; there are compressed formats GPUs can usually work with natively, even lossy ones. The DDS family comes to mind.

Of course, that means the GPU or CPU eventually has to unpack it, lengthening the pipeline either way. This is still of questionable benefit right now.
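For reference, the space savings of those GPU-native DDS-family formats follow directly from their fixed block sizes (BC1 stores each 4x4 pixel block in 64 bits, BC7 in 128 bits, and the GPU samples both directly):

```python
# Sizes for GPU-native block-compressed texture formats (the DDS family
# mentioned above). BC1 stores each 4x4 pixel block in 64 bits, BC7 in
# 128 bits; both are sampled directly by the GPU with no unpacking pass.

def texture_bytes(width, height, bits_per_block=None, bytes_per_pixel=None):
    if bits_per_block is not None:               # block-compressed: 4x4 blocks
        blocks = (width // 4) * (height // 4)
        return blocks * bits_per_block // 8
    return width * height * bytes_per_pixel      # uncompressed

w = h = 4096
rgba8 = texture_bytes(w, h, bytes_per_pixel=4)   # uncompressed RGBA8
bc1 = texture_bytes(w, h, bits_per_block=64)     # 8:1 vs RGBA8
bc7 = texture_bytes(w, h, bits_per_block=128)    # 4:1 vs RGBA8
print(rgba8 // 2**20, bc1 // 2**20, bc7 // 2**20)  # sizes in MiB: 64 8 16
```

So a 4096x4096 texture drops from 64 MiB uncompressed to 8 MiB (BC1) or 16 MiB (BC7) - still far from the on-disk ratios general-purpose compression adds on top, which is what the missing DirectStorage feature targets.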
#63
Mussels
Freshwater Moderator
R-T-BNot necessarily; there are compressed formats GPUs can usually work with natively, even lossy ones. The DDS family comes to mind.

Of course, that means the GPU or CPU eventually has to unpack it, lengthening the pipeline either way. This is still of questionable benefit right now.
Direct transfer is to avoid CPU usage; without hardware decompression that means software, so it has to be decompressed by the CPU before sending, kinda defeating the point.
#64
venturi
My concern is: what about those of us that already have "direct storage" at the hardware/physical level?

I'm trying to figure out how this impacts storage that is directly connected to the CPUs (2x in my case) using the on-die RAID controllers. Each device gets 4x lanes, and I can connect 4 devices across both CPUs. As they are directly connected to the CPUs, how would DirectStorage affect this?
Here is the on-CPU storage: I have 4x Micron 9300 (52TB RAID) in direct VROC on-CPU storage.


In this next picture, no storage is physically plugged into any PCIe slot - the drives are plugged individually into 4 separate U.2 connectors on the motherboard. So in this situation I can pick 1, 2, 3, or 4 devices attached to the PCIe slot; this is processor 1 (out of 2), so each device gets the full PCIe bandwidth, direct to CPU, per CPU,
making, across 2 CPUs, 4x 16x for 4 devices in 1 volume (in my case).
I did not include pictures of the VROC array / RAID manager in the BIOS.

I hope this new "feature" from Microsoft doesn't mess up the 50GB/s write speed and the 61Gb/s read speed, and I wonder if it's part of the Windows Server deployment - in case it messes up array functions.

I suppose I could test by allowing that particular update, but too many times I've been burnt by Microsoft "features".

Here is the write speed to my volume as it backs up another onboard device:


Hence my concern about the new MS 'DirectStorage' feature for those of us who already have it at the hardware level.
#65
Ibizadr
The difference between the Xbox and PC, even though it's all Microsoft, is that on Xbox every user uses the same hardware, while on PC you've got millions of different machines.
#66
Valantar
venturiHence the concern on some of the new ms feature 'directstorage' for those of us that already have it at the hardware level.
It seems that you're misunderstanding what DirectStorage does. It allows for bypassing the CPU as a decompression step in the storage-to-VRAM pipeline for game (and other application) assets. You don't "have that at the hardware level", as it isn't a hardware feature, but a restructuring of data access and processing pipelines. Or, you could say every PC with PCIe support has that at the hardware level, as AFAIK there is nothing stopping this from being implemented across literally everything running Windows containing a PCIe subsystem. As long as all relevant devices are connected through the same PCIe subsystem, the system "has this at the hardware level" - barring decompression support at the end point, of course.

As such, it shouldn't affect your drive access speeds whatsoever - what changes isn't how they're accessed, just where they send data. Rather than storage-PCIe root hub-CPU decompression(-RAM)-PCIe root hub-VRAM (with all data after the decompression stage being much larger) the pipeline now becomes storage-PCIe root hub(-RAM-PCIe root hub)-VRAM, with the additional benefit of less bandwidth used as all data transferred is now compressed. It's also an opt-in API, and not a system-wide function that somehow affects file access generally, so there's no reason why it would change anything at all outside of specifically DirectStorage-enabled applications.
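Counting only the bytes that cross the PCIe root hub makes the difference concrete (the batch size and compression ratio below are hypothetical):

```python
# Bytes crossing the PCIe root hub in the two pipelines described above,
# for a hypothetical compressed asset batch. Sizes and ratio are made up.

def pcie_traffic_cpu_decompress(compressed_gb, ratio):
    # storage -> RAM (compressed), then RAM -> VRAM (uncompressed):
    # the data crosses the root hub twice, the second time at full size.
    return compressed_gb + compressed_gb * ratio

def pcie_traffic_direct(compressed_gb, ratio):
    # storage -> VRAM (still compressed); the GPU decompresses on arrival,
    # so the ratio never matters for bus traffic.
    return compressed_gb

print(pcie_traffic_cpu_decompress(8, 3))  # 32 GB over the bus
print(pcie_traffic_direct(8, 3))          # 8 GB over the bus
```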

As for whether this is rolled out in Windows Server, I would guess that depends on whether it has any useful applications in that space and whether or not other solutions doing the same already exist.

The main part of DS - and the reason why this announcement is rather nonsensical - is the fact that GPUs can now decompress game assets themselves (at least when compressed in certain ways), rather than needing the CPU to do this before the data can be loaded into VRAM. I would expect server software vendors where this would be an option to either already have implemented this through their own software (which should be entirely possible as long as you can write GPU-accelerated decompression software), or to be eagerly awaiting a standardized way of doing so. Either way, it should have zero effect on file access outside of these specific applications.
#67
venturi
ValantarIt seems that you're misunderstanding what DirectStorage does. It allows for bypassing the CPU as a decompression step in the storage-to-VRAM pipeline for game (and other application) assets. You don't "have that at the hardware level", as it isn't a hardware feature, but a restructuring of data access and processing pipelines. Or, you could say every PC with PCIe support has that at the hardware level, as AFAIK there is nothing stopping this from being implemented across literally everything running Windows containing a PCIe subsystem. As long as all relevant devices are connected through the same PCIe subsystem, the system "has this at the hardware level" - barring decompression support at the end point, of course.

As such, it shouldn't affect your drive access speeds whatsoever - what changes isn't how they're accessed, just where they send data. Rather than storage-PCIe root hub-CPU decompression(-RAM)-PCIe root hub-VRAM (with all data after the decompression stage being much larger) the pipeline now becomes storage-PCIe root hub(-RAM-PCIe root hub)-VRAM, with the additional benefit of less bandwidth used as all data transferred is now compressed. It's also an opt-in API, and not a system-wide function that somehow affects file access generally, so there's no reason why it would change anything at all outside of specifically DirectStorage-enabled applications.

As for whether this is rolled out in Windows Server, I would guess that depends on whether it has any useful applications in that space and whether or not other solutions doing the same already exist.

The main part of DS - and the reason why this announcement is rather nonsensical - is the fact that GPUs can now decompress game assets themselves (at least when compressed in certain ways), rather than needing the CPU to do this before the data can be loaded into VRAM. I would expect server software vendors where this would be an option to either already have implemented this through their own software (which should be entirely possible as long as you can write GPU-accelerated decompression software), or to be eagerly awaiting a standardized way of doing so. Either way, it should have zero effect on file access outside of these specific applications.
Understood, but if that's the case, it would seem more of a gimmick than anything practical.

If it's about games, why not just have the option to unpack/decompress the game elements and take up a little more drive space, rather than go through all this? ;)

....and where does one enable/disable this feature?
#68
Valantar
venturiunderstood, but if that's the case it would seem more of a gimmick than anything practical.

If its about games, why not just have the option to unpack/decompress the game elements and take up a little more drive space than go through this? ;)

....and, where does one enable/disable this feature?
In reverse order:
- One doesn't, unless one is a game developer. There is literally zero reason for this to be a toggleable option.
- Because that "little more drive space" could mean a 2-4x increase in drive space for compressed assets. When games are already pushing 100GB, most of which is compressed assets, this is hardly an attractive proposition.
- I don't understand how reducing CPU load and PCIe bandwidth requirements for the most common high-performance PC usage scenario is a gimmick, or anything but highly practical.
#69
bug
ValantarIn reverse order:
- One doesn't, unless one is a game developer. There is literally zero reason for this to be a toggleable option.
- Because that "little more drive space" could mean a 2-4x increase in drive space for compressed assets. When games are already pushing 100GB, most of which is compressed assets, this is hardly an attractive proposition.
- I don't understand how reducing CPU load and PCIe bandwidth requirements for the most common high-performance PC usage scenario is a gimmick, or anything but highly practical.
About that last part: at 4K, the CPU isn't a bottleneck, so there's really not much point in trying to lighten its burden. And I'm not sure PCIe bandwidth is much of a problem either.
#70
Valantar
bugAbout that last part: at 4k, CPU isn't a bottleneck, there's really not much point in trying to lighten its burden. And I'm not sure PCIe bandwidth is much of a problem either.
That's true to some extent, but the flipside is that textures and other assets grow a lot with higher resolutions, which increases pressure on decompression and PCIe bandwidth, so the effects of DS are also more pronounced at these resolutions. The CPU might not be a bottleneck at that point, but improving the ability to stream in assets on the fly is potentially a major benefit when you might need to rapidly fetch several GB of data. The improvement is as much about shortening the pipeline and making the process more streamlined - getting the data where it needs to go as quickly and efficiently as possible - as it is about concrete processing bottlenecks. DS also has the potential to lower VRAM usage (due to less need for aggressive pre-caching of assets, with more just-in-time loading becoming possible), or at the same VRAM usage allow for higher quality assets. That obviously doesn't mean that it will improve performance overall in all (or even the majority of) scenarios, but that doesn't mean it doesn't have significant value.
#71
Mussels
Freshwater Moderator
venturiMy concern is: what about those of us that already have "direct storage" at the hardware/physical level?

I'm trying to figure out how this impacts storage that is directly connected to the CPUs (2x in my case) using the on-die RAID controllers. Each device gets 4x lanes, and I can connect 4 devices across both CPUs. As they are directly connected to the CPUs, how would DirectStorage affect this?
Here is the on-CPU storage: I have 4x Micron 9300 (52TB RAID) in direct VROC on-CPU storage.


In this next picture, no storage is physically plugged into any PCIe slot - the drives are plugged individually into 4 separate U.2 connectors on the motherboard. So in this situation I can pick 1, 2, 3, or 4 devices attached to the PCIe slot; this is processor 1 (out of 2), so each device gets the full PCIe bandwidth, direct to CPU, per CPU,
making, across 2 CPUs, 4x 16x for 4 devices in 1 volume (in my case).
I did not include pictures of the VROC array / RAID manager in the BIOS.

I hope this new "feature" from Microsoft doesn't mess up the 50GB/s write speed and the 61Gb/s read speed, and I wonder if it's part of the Windows Server deployment - in case it messes up array functions.

I suppose I could test by allowing that particular update, but too many times I've been burnt by Microsoft "features".

Here is the write speed to my volume as it backs up another onboard device:


Hence my concern about the new MS 'DirectStorage' feature for those of us who already have it at the hardware level.
I don't see how any of what you're talking about relates to DirectStorage at all.

It's pure software that allows certain tasks to be done without all the previous CPU overheads (in this current half-released state, not as well as intended).
#72
bug
ValantarThat's true to some extent, but the flipside is that textures and other assets grow a lot with higher resolutions, which increases pressure on decompression and PCIe bandwidth, so the effects of DS are also more pronounced at these resolutions. The CPU might not be a bottleneck at that point, but improving the ability to stream in assets on the fly is potentially a major benefit when you might need to rapidly fetch several GB of data. The improvement is as much about shortening the pipeline and making the process more streamlined - getting the data where it needs to go as quickly and efficiently as possible - as it is about concrete processing bottlenecks. DS also has the potential to lower VRAM usage (due to less need for aggressive pre-caching of assets, with more just-in-time loading becoming possible), or at the same VRAM usage allow for higher quality assets. That obviously doesn't mean that it will improve performance overall in all (or even the majority of) scenarios, but that doesn't mean it doesn't have significant value.
And you're writing this despite repeated testing showing you can run video cards at PCIe 2.0 speeds with a minimal performance loss.
#73
Valantar
bugAnd you're writing this despite repeated testing showing you can run video cards at PCIe 2.0 speeds with a minimal performance loss.
I didn't say anything about performance gains, did I? I mean, I think I did the opposite? Asset streaming judder typically doesn't last long enough to affect average FPS numbers (and sadly TPU still doesn't measure frametimes or 1%/.1% FPS in their scaling tests). But that is overall a relatively minor benefit (outside of very poorly optimized games). The major benefit comes from the entire system becoming more efficient and working more smartly; not wasting energy and time shuffling data around more than what's necessary; allowing for more efficient loading of needed rather than potentially needed assets; making asset streaming drastically more responsive, and simplifying the overall system interaction. So: faster loading, less unnecessary loading/aggressive pre-caching, more efficient decompression, no transfer of uncompressed assets over PCIe, no needing to wait on CPU decompression, potential for keeping assets in VRAM compressed until they're needed. These are the benefits. Will these improve your FPS? Not very likely, no. Are they still real benefits? Yes, without a doubt.
#74
venturi
I'm confused. The end game of DirectStorage is to let the video card do the decompression. However, if the video card is already struggling to keep up FPS, RTX, etc., wouldn't adding decompression duties like this actually ADD to the video card's burden, in turn reducing performance even further? I guess I'm asking because most of the bottlenecks seem video-card related; how would this solve the bottleneck? Wouldn't we want to offload burdens from the video card? Or is this making use of some idle section of the card that is doing nothing while the rest is saturated? I would think having the CPU do more would free up the card to do more. Couldn't apps/games just use more threading for the CPUs to do that function?
#75
Valantar
venturiI'm confused. The end game of DirectStorage is to let the video card do the decompression. However, if the video card is already struggling to keep up FPS, RTX, etc., wouldn't adding decompression duties like this actually ADD to the video card's burden, in turn reducing performance even further? I guess I'm asking because most of the bottlenecks seem video-card related; how would this solve the bottleneck? Wouldn't we want to offload burdens from the video card? Or is this making use of some idle section of the card that is doing nothing while the rest is saturated? I would think having the CPU do more would free up the card to do more. Couldn't apps/games just use more threading for the CPUs to do that function?
It's a tradeoff between burdening the GPU with decompression vs. having it wait for data processed elsewhere. It's likely that the burden of decompression is so low as to be meaningless. If it occupies a few % of the GPU you would never even notice - but you would notice less judder when loading new chunks or streaming assets, less pop-in, etc.