Component | Specification |
---|---|
Processor | 5800X3D (-30 Curve Optimizer) |
Motherboard | MSI B550 Tomahawk |
Cooling | DeepCool Assassin III |
Memory | 32GB G.SKILL Ripjaws V @ 3800 CL14 |
Video Card(s) | ASRock MBA 7900XTX |
Storage | 1TB WD SN850X + 1TB ADATA SX8200 Pro |
Display(s) | Dell S2721QS 4K60 |
Case | Cooler Master CM690 II Advanced USB 3.0 |
Audio Device(s) | Audiotrak Prodigy Cube Black (JRC MUSES 8820D) + CAL (recabled) |
Power Supply | Seasonic Prime TX-750 |
Mouse | Logitech Cordless Desktop Wave |
Keyboard | Logitech Cordless Desktop Wave |
Software | Windows 10 Pro |
After watching this video, I decided to run my own DirectStorage tests to see how it affects performance:
This benchmark uses the latest DirectStorage (DS) 1.2 sample from GPUOpen. It lets you compare the loading times of different assets with and without DS. You can also choose where the textures are decompressed: on the CPU, which is the default implementation, or on the GPU. The test uses four models of increasing complexity, each with a larger (uncompressed) texture set than the last:




Model | Texture size compressed [MB] | Texture size uncompressed [MB] |
---|---|---|
BoomBox | 10.86 | 85.37 |
X1 | 29.37 | 170.69 |
SpaceShuttle | 926.78 | 2475.79 |
CommandModule | 915.30 | 2758.72 |
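Dividing the uncompressed size by the compressed size gives each model's compression ratio, a useful reference point for the bandwidth figures later on. A minimal Python sketch using only the sizes from the table above:

```python
# Texture set sizes in MB from the table above: (compressed, uncompressed)
TEXTURES_MB = {
    "BoomBox": (10.86, 85.37),
    "X1": (29.37, 170.69),
    "SpaceShuttle": (926.78, 2475.79),
    "CommandModule": (915.30, 2758.72),
}


def compression_ratio(compressed_mb: float, uncompressed_mb: float) -> float:
    """How many bytes of texture data each byte read from disk expands into."""
    return uncompressed_mb / compressed_mb


ratios = {name: round(compression_ratio(*sizes), 2) for name, sizes in TEXTURES_MB.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}:1")
```

SpaceShuttle has the least compressible textures (about 2.7:1), while BoomBox shrinks almost 8:1.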
I tested it on the system listed in my profile, a powerful 4K machine with a 5800X3D, a 7900XTX and tuned dual-rank DDR4-3800 CL14 RAM. I assessed both a Gen3 and a Gen4 NVMe SSD, using some of the fastest drives of their generation: an ADATA SX8200 Pro (Gen3) and a WD SN850X (Gen4).
First, let's see how DS affects the time the CPU spends from the moment the first texture request is made until the transfer to the GPU is complete. This is represented by the I/O time metric, where lower values mean the process completes faster:
Model | CPU texture decompression [ms] | GPU texture decompression + Gen3 SSD [ms] | GPU texture decompression + Gen4 SSD [ms] |
---|---|---|---|
BoomBox | 27 | 12 | 13 |
X1 | 68 | 22 | 18 |
SpaceShuttle | 865 | 402 | 150 |
CommandModule | 879 | 392 | 143 |
Even with the simplest asset, GPU decompression cut the CPU's I/O time by more than half compared with CPU decompression. At first, the faster drive made no difference. But with the larger texture sets the Gen4 SSD comes into its own, quickly pulling away from the older model. With the SX8200 Pro, I/O time dropped by 56%, 68%, 54% and 55% respectively compared with pure CPU decompression. The SN850X achieved reductions of 52%, 74%, 83% and 84%, allowing the CPU to complete all four tasks about 5.7 times faster: in 324 ms rather than 1839 ms! With the ADATA drive, the processor finished about 2.2 times faster (in 828 ms) once the GPU took over texture decompression.
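The I/O time percentages and totals can be recomputed from the table with a few lines of Python. This sketch works on the rounded millisecond values as published, so its results may differ by a point from figures derived from unrounded data:

```python
# I/O time in ms from the table above:
# (CPU decompression, GPU decompression + Gen3 SSD, GPU decompression + Gen4 SSD)
IO_TIME_MS = {
    "BoomBox": (27, 12, 13),
    "X1": (68, 22, 18),
    "SpaceShuttle": (865, 402, 150),
    "CommandModule": (879, 392, 143),
}


def reduction_pct(baseline_ms: float, new_ms: float) -> int:
    """Percentage by which new_ms is lower than baseline_ms, rounded to a whole number."""
    return round(100 * (1 - new_ms / baseline_ms))


gen3 = [reduction_pct(cpu, g3) for cpu, g3, _ in IO_TIME_MS.values()]
gen4 = [reduction_pct(cpu, g4) for cpu, _, g4 in IO_TIME_MS.values()]

# Column totals across all four models
total_cpu, total_gen3, total_gen4 = (sum(col) for col in zip(*IO_TIME_MS.values()))

print("Gen3 reductions [%]:", gen3)
print("Gen4 reductions [%]:", gen4)
print(f"Totals: CPU {total_cpu} ms, Gen3 {total_gen3} ms, Gen4 {total_gen4} ms")
print(f"Speed-up: Gen3 {total_cpu / total_gen3:.1f}x, Gen4 {total_cpu / total_gen4:.1f}x")
```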
Now let's analyze how DS affects the time it takes the CPU to fully load each scene. Again, a lower CPU load time means the model is presented sooner:
Model | DS disabled [ms] | DS enabled CPU texture decompression [ms] | DS enabled GPU texture decompression + Gen3 SSD [ms] | DS enabled GPU texture decompression + Gen4 SSD [ms] |
---|---|---|---|---|
BoomBox | 147 | 46 | 30 | 31 |
X1 | 497 | 116 | 65 | 59 |
SpaceShuttle | 1252 | 955 | 486 | 239 |
CommandModule | 1508 | 995 | 514 | 262 |
We can see a similar pattern here, but the differences are even more pronounced. Enabling DS with CPU decompression (the standard game implementation) reduces loading times significantly, cutting them by 3.2x and 4.3x for the simpler scenes and presenting the complex models 1.3 to 1.5 times faster. But the real advantage of DS lies in GPU decompression. Even with a Gen3 drive, the scenes load 4.90x, 7.65x, 2.58x and 2.93x faster! And a Gen4 SSD pushes this to an even more incredible 4.74x, 8.42x, 5.24x and 5.76x speed-up.
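A speed-up factor is the ratio of the old time to the new time, so BoomBox going from 147 ms with DS disabled to 30 ms with GPU decompression is a 4.9x speed-up. A small Python sketch computing every factor from the table:

```python
# CPU load time in ms from the table above:
# (DS disabled, DS + CPU decompression, DS + GPU/Gen3, DS + GPU/Gen4)
LOAD_TIME_MS = {
    "BoomBox": (147, 46, 30, 31),
    "X1": (497, 116, 65, 59),
    "SpaceShuttle": (1252, 955, 486, 239),
    "CommandModule": (1508, 995, 514, 262),
}


def speedup(baseline_ms: float, new_ms: float) -> float:
    """Speed-up factor relative to the baseline (2.0 means twice as fast)."""
    return baseline_ms / new_ms


# Each DS configuration compared with the DS-disabled baseline (first column)
speedups = {
    name: tuple(round(speedup(times[0], t), 2) for t in times[1:])
    for name, times in LOAD_TIME_MS.items()
}

for name, (ds_cpu, gen3, gen4) in speedups.items():
    print(f"{name}: DS+CPU {ds_cpu}x, GPU+Gen3 {gen3}x, GPU+Gen4 {gen4}x")
```

Expressed as a percentage, a 4.9x speed-up is a 390% increase in speed, which is why factors are the less ambiguous way to quote these gains.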
When we evaluate total loading time for all four scenes, we see the following gains:
Metric | DS disabled | DS enabled, CPU texture decompression | DS enabled, GPU texture decompression + Gen3 SSD | DS enabled, GPU texture decompression + Gen4 SSD |
---|---|---|---|---|
Load time, all scenes [ms] | 3404 | 2112 | 1095 | 591 |
Speed-up factor | 1.00x (baseline) | 1.61x | 3.11x | 5.76x |
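The totals row is the sum of each column in the per-scene table, and each speed-up factor divides the DS-disabled total by that configuration's total. The Python sketch below reproduces both and also compares GPU decompression directly with DS plus CPU decompression:

```python
# Per-scene CPU load times in ms from the per-scene table, one list per configuration
LOAD_TIME_MS = {
    "DS disabled": [147, 497, 1252, 1508],
    "DS + CPU decompression": [46, 116, 955, 995],
    "DS + GPU decompression, Gen3": [30, 65, 486, 514],
    "DS + GPU decompression, Gen4": [31, 59, 239, 262],
}

totals = {config: sum(times) for config, times in LOAD_TIME_MS.items()}
baseline = totals["DS disabled"]

# Speed-up factor = DS-disabled total / this configuration's total
speedup = {config: round(baseline / total, 2) for config, total in totals.items()}
for config, total in totals.items():
    print(f"{config}: {total} ms, {speedup[config]:.2f}x")

# GPU decompression compared with DS + CPU decompression instead of DS disabled
cpu_total = totals["DS + CPU decompression"]
gpu_vs_cpu = {
    "Gen3": round(cpu_total / totals["DS + GPU decompression, Gen3"], 2),
    "Gen4": round(cpu_total / totals["DS + GPU decompression, Gen4"], 2),
}
print(gpu_vs_cpu)
```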
DirectStorage appears to be a very capable technology with potentially amazing benefits. It should enable greatly reduced loading times and a smoother gameplay experience. As games become more visually complex and virtual worlds more expansive, the advantage of DS will likely become more apparent, especially in open-world titles, which constantly stream in textures. And when implemented properly, GPU decompression could become the real game changer (pun intended). In these tests, GPU decompression with a Gen3 drive loaded all four scenes nearly twice as fast (1.93x) as DS with CPU texture decompression. With a Gen4 SSD, letting the 7900XTX handle decompression instead of the 5800X3D made loading about 3.6 times faster.
Lastly, let's look at the average data bandwidth achieved with each of these texture decompression techniques, as reported by the Data Rate metric:
Model | CPU texture decompression disk only vs. DS amplified [GB/s] | GPU texture decompression + Gen3 SSD disk only vs. DS amplified [GB/s] | GPU texture decompression + Gen4 SSD disk only vs. DS amplified [GB/s] |
---|---|---|---|
BoomBox | 0.4 vs. 3.2 | 0.9 vs. 7.3 | 0.9 vs. 6.7 |
X1 | 0.4 vs. 2.6 | 1.4 vs. 8.0 | 1.7 vs. 9.7 |
SpaceShuttle | 1.1 vs. 2.9 | 2.4 vs. 6.3 | 6.3 vs. 16.9 |
CommandModule | 1.1 vs. 3.2 | 2.4 vs. 7.2 | 6.6 vs. 19.8 |
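Reading the Data Rate table, the DS amplified figure looks like the disk rate multiplied by the model's texture compression ratio. That is my interpretation of the numbers, not documented behavior of the sample. The Python sketch below tests this hypothesis against the published values; since the inputs are rounded to 0.1 GB/s, only an approximate match is expected:

```python
# Texture sizes in MB from the first table: (compressed, uncompressed)
TEXTURES_MB = {
    "BoomBox": (10.86, 85.37),
    "X1": (29.37, 170.69),
    "SpaceShuttle": (926.78, 2475.79),
    "CommandModule": (915.30, 2758.72),
}

CONFIGS = ("CPU", "GPU + Gen3", "GPU + Gen4")

# Data Rate in GB/s from the table above: one (disk only, DS amplified) pair per config
DATA_RATE_GBPS = {
    "BoomBox": [(0.4, 3.2), (0.9, 7.3), (0.9, 6.7)],
    "X1": [(0.4, 2.6), (1.4, 8.0), (1.7, 9.7)],
    "SpaceShuttle": [(1.1, 2.9), (2.4, 6.3), (6.3, 16.9)],
    "CommandModule": [(1.1, 3.2), (2.4, 7.2), (6.6, 19.8)],
}


def predicted_amplified(disk_gbps: float, compressed_mb: float, uncompressed_mb: float) -> float:
    """Hypothesis: amplified rate = disk rate x (uncompressed size / compressed size)."""
    return disk_gbps * uncompressed_mb / compressed_mb


max_rel_error, worst = 0.0, None
for model, pairs in DATA_RATE_GBPS.items():
    compressed, uncompressed = TEXTURES_MB[model]
    for config, (disk, amplified) in zip(CONFIGS, pairs):
        predicted = predicted_amplified(disk, compressed, uncompressed)
        rel_error = abs(predicted - amplified) / amplified
        if rel_error > max_rel_error:
            max_rel_error, worst = rel_error, (model, config)
        print(f"{model:13} {config:10} predicted {predicted:5.1f} GB/s, reported {amplified} GB/s")

print(f"Largest relative gap: {max_rel_error:.0%} ({worst[0]}, {worst[1]})")
```

Every configuration lands within about 11% of the prediction. The largest gap is for X1 with CPU decompression, likely because a disk rate rounded to 0.4 GB/s carries the most rounding error.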
For reference, this is how the storage subsystems of the current-generation consoles stack up. Both use a custom Gen4 SSD and additional dedicated hardware to assist with asset decompression:
Console | Maximum raw SSD throughput [GB/s] | Typical throughput with decompression [GB/s] | Maximum throughput with decompression [GB/s] |
---|---|---|---|
Xbox Series S/X | 2.4 | 4.8 | 6.5 |
PlayStation 5 | 5.5 | 8.5 | 22.0 |
If you would like to check out DS performance for yourself, the video at the top has a link to a downloadable build of the DS sample; I used the same settings in my tests as that video. You can create the following batch files in the sample's root folder to start the benchmark, then find detailed statistics for each run in the corresponding *.csv file in the \bin subfolder:
DS disabled
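```bat
setlocal
REM Run from the sample's root folder; the executable and its CSV output are in \bin
pushd bin
REM Stores the output of a "timestamp" command in ds_ts (the variable is not used below)
FOR /f "tokens=* delims=" %%A in ('timestamp') do @set "ds_ts=%%A"
REM DirectStorage off; 268435456-byte (256 MiB) staging buffer; results in DS_off.csv
DirectStorageSample_DX12.exe {"iotiming":true, "stagingbuffersize":268435456, "profile": true, "profileOutputPath":"DS_off.csv"}
popd
```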
DS enabled, CPU decompression
```bat
setlocal
pushd bin
FOR /f "tokens=* delims=" %%A in ('timestamp') do @set "ds_ts=%%A"
REM DirectStorage on with GPU decompression disabled, so the CPU decompresses; results in CPU.csv
DirectStorageSample_DX12.exe {"directstorage":true, "iotiming":true, "disablegpudecompression":true, "stagingbuffersize":268435456, "profile": true, "profileOutputPath":"CPU.csv"}
popd
```
DS enabled, GPU decompression
```bat
setlocal
pushd bin
FOR /f "tokens=* delims=" %%A in ('timestamp') do @set "ds_ts=%%A"
REM DirectStorage on; disablegpudecompression is omitted, so the GPU decompresses; results in GPU.csv
DirectStorageSample_DX12.exe {"directstorage":true, "iotiming":true, "stagingbuffersize":268435456, "profile": true, "profileOutputPath":"GPU.csv"}
popd
```
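If you prefer scripting the three runs from Python instead of batch files, the sketch below builds the same command lines from a dictionary of options. It assumes, as the unquoted JSON in the batch files suggests, that the sample reads its options from the raw command line. The names COMMON, RUNS, build_command_line and run_all are my own, and the unused timestamp step is left out:

```python
import json
import subprocess
from pathlib import Path

# Options shared by all three batch files (268435456 bytes = 256 MiB staging buffer)
COMMON = {"iotiming": True, "stagingbuffersize": 256 * 1024 * 1024, "profile": True}

# One entry per batch file: output CSV name -> additional options
RUNS = {
    "DS_off.csv": {},
    "CPU.csv": {"directstorage": True, "disablegpudecompression": True},
    "GPU.csv": {"directstorage": True},
}


def build_command_line(exe: str, extra: dict, csv_name: str) -> str:
    """Return '"<exe>" {json options}', mirroring the batch files' command lines."""
    options = {**COMMON, **extra, "profileOutputPath": csv_name}
    return f'"{exe}" {json.dumps(options)}'


def run_all(bin_dir: Path) -> None:
    """Run the three configurations one after another (Windows only)."""
    exe = (bin_dir / "DirectStorageSample_DX12.exe").resolve()
    for csv_name, extra in RUNS.items():
        # Assumption: the sample parses its options from the raw command line.
        # On Windows a string is handed to CreateProcess unchanged, so the JSON
        # quotes survive exactly as they do when typed in a batch file.
        cmd = build_command_line(str(exe), extra, csv_name)
        # cwd=bin_dir matches "pushd bin", so the CSV should land in \bin
        subprocess.run(cmd, cwd=bin_dir, check=True)
```

From the sample's root folder, `run_all(Path("bin"))` should run all three configurations in turn and leave DS_off.csv, CPU.csv and GPU.csv in \bin.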
