Wednesday, April 21st 2021

DirectStorage API Works Even with PCIe Gen3 NVMe SSDs

Microsoft on Tuesday, in a developer presentation, confirmed that the DirectStorage API, designed to speed up the storage sub-system, is compatible even with NVMe SSDs that use the PCI-Express Gen 3 host interface. It also confirmed that all GPUs compatible with DirectX 12 support the feature. A feature making its way to the PC from consoles, DirectStorage enables the GPU to directly access an NVMe storage device, paving the way for GPU-accelerated decompression of game assets.

This reduces latency at the storage-subsystem level and offloads work from the CPU. Any DirectX 12-compatible GPU technically supports DirectStorage, according to Microsoft, though the company recommends DirectX 12 Ultimate GPUs "for the best experience." The GPU-accelerated decompression of game assets is handled via compute shaders. In addition to reducing latency, DirectStorage is said to accelerate the Sampler Feedback feature in DirectX 12 Ultimate.

Source: NEPBB (Reddit)

75 Comments on DirectStorage API Works Even with PCIe Gen3 NVMe SSDs

#51
Makaveli
jermando
Read here:

Amd/comments/fwh7q0/_/fo7rc8r
Interesting. I'm not really seeing that on my rig, but then again my EVOs are in RAID 0 and I'm not going to break up the array to test in single-drive mode. Would RAID 0 increase RND4K performance? I'm not too sure.

#52
Mussels
Moderprator
That link is also rather old; it mentions the ABBA AGESA, which is pre-Zen 3.


Also, for the arguments about needing a CPU-connected NVMe... nah. This is about LOAD times. All that will happen on a slower NVMe device is that it'll... load slower. It won't go "ah shit, 3 GB/s instead of 3.2? NOOOOOO"
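Mussels' point is simple arithmetic: if a load is purely bandwidth-bound, time is size divided by sustained throughput, so a slightly slower drive gives a proportionally slightly longer load rather than a failure. A minimal sketch, assuming a hypothetical 10 GB of level data and ignoring latency, decompression and CPU overhead:

```python
def load_time_seconds(asset_bytes: float, throughput_bytes_per_s: float) -> float:
    """Idealised transfer time: size divided by sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return asset_bytes / throughput_bytes_per_s

GB = 1e9
level_size = 10 * GB  # hypothetical amount of level data

fast = load_time_seconds(level_size, 3.2 * GB)
slow = load_time_seconds(level_size, 3.0 * GB)
print(f"3.2 GB/s: {fast:.3f} s, 3.0 GB/s: {slow:.3f} s, difference: {slow - fast:.3f} s")
```

Under these assumptions the 3.0 GB/s drive takes about 0.2 s longer.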

NVMe is the requirement because they're programming the GPU to use the NVMe driver language, whereas using AHCI would work on every SATA device and likely give some really shit user experiences when some idiot runs it off a mech drive.
#53
jermando
Mussels
That link is also rather old; it mentions the ABBA AGESA, which is pre-Zen 3.


Also, for the arguments about needing a CPU-connected NVMe... nah. This is about LOAD times. All that will happen on a slower NVMe device is that it'll... load slower. It won't go "ah shit, 3 GB/s instead of 3.2? NOOOOOO"
It's not about load times.

Have you seen Ratchet on PS5?

We're talking about different level design... no way to run that on a slower SSD.
Mussels
NVMe is the requirement because they're programming the GPU to use the NVMe driver language, whereas using AHCI would work on every SATA device and likely give some really shit user experiences when some idiot runs it off a mech drive.
Uh, what?

GPUs don't understand "NVMe driver language".

As someone else said, it's about direct/P2P (peer-to-peer) DMA transfers straight from the SSD to the GPU (with the PCIe root complex acting as a middleman).

The root complex is like an Ethernet switch, so ideally you need a direct connection. If you want to go from place A to place B, you follow the shortest route. You don't go to place C first (which is much farther away).
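The switch analogy can be expressed as a graph where devices are nodes and PCIe links are edges; the "shortest route" is then the fewest links between drive and GPU. A minimal sketch using breadth-first search over a hypothetical, heavily simplified topology (the node names are invented; real boards have root ports, switches and a chipset uplink with their own characteristics):

```python
from collections import deque

# Hypothetical, simplified PCIe topology (not a real board layout).
TOPOLOGY = {
    "root_complex": ["gpu", "nvme_cpu", "chipset"],
    "gpu": ["root_complex"],
    "nvme_cpu": ["root_complex"],
    "chipset": ["root_complex", "nvme_chipset"],
    "nvme_chipset": ["chipset"],
}

def hops(graph, src, dst):
    """Breadth-first search: number of links on the shortest path from src to dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path from {src} to {dst}")

print(hops(TOPOLOGY, "nvme_cpu", "gpu"))      # 2
print(hops(TOPOLOGY, "nvme_chipset", "gpu"))  # 3
```

Fewer hops does not by itself mean more throughput; what matters is whether any link on the path is saturated. Also, the root complex sits in the CPU package, so even the "direct" path passes through the CPU.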
#54
Mussels
Moderprator
jermando
It's not about load times.

Have you seen Ratchet on PS5?

We're talking about different level design... no way to run that on a slower SSD.
You're missing the point... NVMe on the chipset vs. NVMe on the CPU is not going to BE slower.
Unless you have a hard speed limit for this, you're just being paranoid and spreading FUD.

Tell me how a PCIe 4.0 NVMe drive in my X570 chipset slot is going to be slower than a PCIe 3.0 NVMe drive in a CPU slot on B450?
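This comparison can be checked against theoretical link rates. PCIe 3.0 signals at 8 GT/s per lane and PCIe 4.0 at 16 GT/s, both with 128b/130b encoding, and the X570 chipset's own uplink to the CPU is PCIe 4.0 x4. A minimal sketch, assuming the uplink is otherwise idle and ignoring packet/protocol overhead, so real drives land somewhat below these figures:

```python
# Per-lane signalling rates in GT/s; PCIe 3.0 and 4.0 both use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0}

def pcie_gb_per_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, accounting for line encoding only."""
    return GT_PER_S[gen] * lanes * (128 / 130) / 8

def effective_gb_per_s(device_gen, device_lanes, uplink=None):
    """Cap a device's link by the chipset uplink (gen, lanes), if it sits behind one.
    Assumes the uplink carries no other traffic."""
    bandwidth = pcie_gb_per_s(device_gen, device_lanes)
    if uplink is not None:
        bandwidth = min(bandwidth, pcie_gb_per_s(*uplink))
    return bandwidth

b450_cpu_slot = effective_gb_per_s(3, 4)                      # Gen3 x4 on CPU lanes
x570_chipset_slot = effective_gb_per_s(4, 4, uplink=(4, 4))   # Gen4 x4 behind a Gen4 x4 uplink
print(f"B450 CPU slot:     {b450_cpu_slot:.2f} GB/s")         # 3.94
print(f"X570 chipset slot: {x570_chipset_slot:.2f} GB/s")     # 7.88
```

Once other chipset devices share that uplink, the second figure can drop; the first cannot, since CPU lanes are not shared.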
#55
jermando
Mussels
You're missing the point... NVMe on the chipset vs. NVMe on the CPU is not going to BE slower.
Unless you have a hard speed limit for this, you're just being paranoid and spreading FUD.

Tell me how a PCIe 4.0 NVMe drive in my X570 chipset slot is going to be slower than a PCIe 3.0 NVMe drive in a CPU slot on B450?
Spreading FUD? Are you serious? Have you read any developer presentations?

X570 is a special case as I said, not a normal chipset.

AMD AM4 platforms should be fine (whether it's B450 or X570), because all of them have 4 dedicated CPU lanes for NVMe (and you should use them).

Intel platforms will experience bus bottlenecks. Do you think Intel was stupid to add 4 dedicated lanes on Rocket Lake?

It's all about reducing bottlenecks.
#56
Mussels
Moderprator
It's still going to work on the Intel platforms, just slower.
They will not create this tech and then lock it down to a minority of their target platforms.

Shit, all the games that support this are still going to have fallbacks for systems with no support.
#57
jermando
Mussels
It's still going to work on the Intel platforms, just slower.
They will not create this tech and then lock it down to a minority of their target platforms.

Shit, all the games that support this are still going to have fallbacks for systems with no support.
OK, if you say so...

I don't expect Ratchet to run on Intel platforms with no dedicated lanes.
#58
Caring1
jermando
OK, if you say so...

I don't expect Ratchet to run on Intel platforms with no dedicated lanes.
And I don't expect X570 to run as well as B550, given the issues it suffers from.
#59
jermando
Caring1
And I don't expect X570 to run as well as B550, given the issues it suffers from.
DirectStorage API has nothing to do with SATA drives, so I don't know what you're talking about.

NVMe works fine on all AMD platforms. Just use the dedicated lanes. :)
#60
chrcoluk
jermando
1) Have you studied the PCIe root complex architecture? It's located in the SoC/uncore (previously called northbridge), so I'm afraid you're misinformed.

That's where the GPU is attached, along with NVMe (only for AM4/AMD Zen so far and some recent Intel platforms).

2) Nope. When Ratchet gets ported to PC, you'll understand what I'm talking about. You need raw bandwidth too for instant portal switching.

3) Have you studied the Xbox Series architecture? The NVMe drive is connected directly to the APU (SoC/uncore part), not the southbridge (that's a separate chip).

Pretty sure you haven't even seen Xbox Series PCB pics (there are 2 PCBs).

Come on guys, there's tons of info out there, educate yourselves! :)
I am aware that modern boards connect both the NVMe drive and the GPU directly to the CPU. However, those two devices don't have a direct connection to each other; they have to go via the CPU to communicate, so yes, they are not communicating with each other directly.

Also, on the Xbox, while the built-in storage is connected directly to the CPU, the expansion port is not, and Microsoft has confirmed that the port follows the same rules as the internal storage for their software API, i.e. it can play Series X/S games.

Performance-wise, the only time an NVMe drive will be slower on a southbridge port is when the link between the chipset and the CPU is saturated enough to slow it down, and in the majority of computers it won't be.
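This saturation condition can be modelled by splitting a shared uplink between the devices behind it. The sketch below uses max-min fair sharing (each device gets what it asks for if that fits, and the remainder is split evenly among the rest), which is a simplification of real PCIe/DMI arbitration. The 3.94 GB/s uplink is roughly what a DMI 3.0 or PCIe 3.0 x4 link provides; the device demands are hypothetical:

```python
def chipset_share_gb_per_s(uplink_gb_per_s, demands_gb_per_s, device):
    """Max-min fair share of a shared uplink for one device (simplified arbitration)."""
    remaining = uplink_gb_per_s
    pending = dict(demands_gb_per_s)
    allocation = {}
    while pending:
        fair = remaining / len(pending)
        # Devices asking for no more than an even split are fully served.
        satisfied = {d: want for d, want in pending.items() if want <= fair}
        if not satisfied:
            # Everyone left wants more than an even split: divide what remains.
            for d in pending:
                allocation[d] = fair
            break
        for d, want in satisfied.items():
            allocation[d] = want
            remaining -= want
            del pending[d]
    return allocation[device]

UPLINK = 3.94  # GB/s, roughly DMI 3.0 / PCIe 3.0 x4

light = chipset_share_gb_per_s(UPLINK, {"nvme": 3.5, "ethernet": 0.125}, "nvme")
heavy = chipset_share_gb_per_s(UPLINK, {"nvme": 3.5, "nvme2": 3.5}, "nvme")
print(f"NVMe next to Ethernet:         {light:.2f} GB/s")  # 3.50
print(f"NVMe next to a busy 2nd NVMe:  {heavy:.2f} GB/s")  # 1.97
```

The drive only loses bandwidth in the second case, when another heavy device competes for the same uplink.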

If it does for some reason get limited to recent chipsets, then it will be an artificial restriction to sell new kit.
Both AMD and Intel unified the old North Bridge and South Bridge into a single chipset. The North Bridge was previously responsible for communicating with PCIe and memory, and the South Bridge communicated with SATA and IDE, USB, firmware chips, PCI, legacy devices, and audio. These days, all of these devices talk to either the CPU or the unified chipset. Another modern difference is that the memory controller has moved into the CPU, becoming an integrated memory controller on both AMD and Intel.
Of course I could be wrong, it's only my opinion, but I don't see anything in hardware spec sheets that explains why southbridge-based drives would not work.

The incompatibility is the SATA protocol; it's a protocol issue, not a chipset latency one.
#61
jermando
chrcoluk
1) I am aware that modern boards connect both the NVMe drive and the GPU directly to the CPU. However, those two devices don't have a direct connection to each other; they have to go via the CPU to communicate, so yes, they are not communicating with each other directly.

2) Also, on the Xbox, while the built-in storage is connected directly to the CPU, the expansion port is not, and Microsoft has confirmed that the port follows the same rules as the internal storage for their software API, i.e. it can play Series X/S games.
1) DirectStorage is coming to fix that (direct SSD -> GPU transfers)

2) Wrong. I'll have to ask you again: have you studied the console PCBs? Something tells me you haven't.

There are 4 dedicated lanes, and 2 of them go to each SSD, straight from the APU. The southbridge is on a separate PCB (daughterboard).

Don't make me post pictures, they're available if you search for them...
#62
chrcoluk
Feel free to post pictures as I am not searching again (I already tried to before).

But even if I am wrong about the PCB, the issue is a protocol one in my opinion, not a chipset latency one. Are you trying to claim that a PCIe 3.0 NVMe drive on a CPU PCIe lane performs differently from one connected to the chipset?

Until we are told specifically that it won't work, with a reason to back it up, I am going to assume it will work. It will likely just need a minimum NVMe protocol version, plus a minimum rated drive speed.
#63
jermando
chrcoluk
Feel free to post pictures as I am not searching again (I already tried to before).

But even if I am wrong about the PCB, the issue is a protocol one in my opinion, not a chipset latency one. Are you trying to claim that a PCIe 3.0 NVMe drive on a CPU PCIe lane performs differently from one connected to the chipset?
Dedicated lanes mean: 1) less latency, 2) guaranteed bandwidth.

Do you remember when AMD launched K8 with IMC (integrated memory controller) and Intel still had no IMC (until Nehalem came)?

How many Intel users said "nobody needs an IMC"? And how many people take IMC for granted now?
#64
chrcoluk
Why does guaranteed bandwidth matter unless you are loading up the link?

I expect the latency differences are insignificant. Nothing in any DirectStorage API article states that it matters; it just needs drives that meet a certain specification and use the NVMe protocol.

I'm also not sure what the IMC has to do with this. NVMe latencies and bandwidth are nothing like RAM's.

Not saying I am 100% right, just that until I see a statement saying it won't work, I don't have reason to believe otherwise. :)

Think of all the reviewers who pushed NVMe drives for long periods on chipset ports without hitting saturation that affected performance.
#65
jermando
chrcoluk
Why does guaranteed bandwidth matter unless you are loading up the link?
Watch this (with an open mind!) and you'll understand why:


Not saying Ratchet will ever get ported to PC, but IF it does, expect 5.5 GB/s to be the minimum required spec. How is your southbridge going to handle that?
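Taking the speculative 5.5 GB/s figure at face value (it is offered as an expectation, not an announced requirement), one can check which common x4 links could carry it even in theory. A minimal sketch using theoretical link ceilings only; real sustained throughput is lower:

```python
# Theoretical one-direction ceilings of common x4 links in GB/s:
# per-lane GT/s x 4 lanes x 128/130 encoding efficiency / 8 bits per byte.
LINKS_GB_PER_S = {
    "PCIe 3.0 x4 (also DMI 3.0)": 8 * 4 * (128 / 130) / 8,
    "PCIe 4.0 x4": 16 * 4 * (128 / 130) / 8,
}

def links_meeting(required_gb_per_s):
    """Return the link types whose theoretical ceiling is at least the requirement."""
    return [name for name, ceiling in LINKS_GB_PER_S.items() if ceiling >= required_gb_per_s]

print(links_meeting(5.5))  # ['PCIe 4.0 x4']
```

A PCIe 3.0 x4 link, and therefore a DMI 3.0 chipset uplink, falls short of 5.5 GB/s, while PCIe 4.0 x4 clears it whether on CPU lanes or behind an X570 chipset (whose uplink is also PCIe 4.0 x4), provided that uplink is not busy with other heavy traffic.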

I understand that some of you need to justify your rigs (especially if you have an old Intel PC with PCIe 3.0/PCH NVMe), but again: try to approach what I'm saying with an open mind... otherwise it's totally pointless to even bother.

And yes, 15+ years ago people didn't believe me when I told them about the IMC benefits. The same pattern happens now with NVMe.

Technology needs to progress and leave old rigs behind. It's always been that way, but most people don't even have an open mind.
#66
Mussels
Moderprator
Nothing in that video screams "I will only work on new technology".
You've got some weird ideas going on here.
#67
jermando
chrcoluk
Think of all the reviewers who pushed NVMe drives for long periods on chipset ports without hitting saturation that affected performance.
Again: do reviewers have access to DirectStorage API yet?

Have you ever wondered why SATA vs NVMe SSD benchmarks show ZERO difference so far?

Maybe, just maybe the antiquated Windows I/O stack has something to do with that?

devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/
Unfortunately, current storage APIs were not optimized for this high number of IO requests, preventing them from scaling up to these higher NVMe bandwidths creating bottlenecks that limit what games can do. Even with super-fast PC hardware and an NVMe drive, games using the existing APIs will be unable to fully saturate the IO pipeline leaving precious bandwidth on the table.

That’s where DirectStorage for PC comes in. This API is the response to an evolving storage and IO landscape in PC gaming. DirectStorage will be supported on certain systems with NVMe drives and work to bring your gaming experience to the next level.
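The quoted claim, that per-request software overhead can stop games from saturating an NVMe drive, can be illustrated with a toy pipelined model: the drive cannot deliver data faster than the CPU can issue requests, and grouping requests amortises the per-submission cost. All numbers are hypothetical, and batching is only one assumed mitigation; this is not DirectStorage's API or its real cost model:

```python
def effective_throughput(bandwidth_bytes_per_s, request_bytes,
                         cpu_cost_per_submission_s, requests_per_submission=1):
    """Toy pipelined model: throughput is capped by the drive or by how fast
    the CPU can hand it work, whichever is lower."""
    cpu_limited = requests_per_submission * request_bytes / cpu_cost_per_submission_s
    return min(bandwidth_bytes_per_s, cpu_limited)

DRIVE = 7e9          # hypothetical 7 GB/s NVMe drive
REQUEST = 64 * 1024  # hypothetical 64 KiB asset reads
COST = 20e-6         # hypothetical 20 us of CPU work per submission

one_by_one = effective_throughput(DRIVE, REQUEST, COST)
batched = effective_throughput(DRIVE, REQUEST, COST, requests_per_submission=32)
print(f"{one_by_one / 1e9:.2f} GB/s vs {batched / 1e9:.2f} GB/s")
```

With these made-up numbers, one request per call tops out near 3.3 GB/s on a 7 GB/s drive, leaving bandwidth "on the table", while batches of 32 reach the drive's limit.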
Of course not every game will need high I/O. Pixelated indies will work just fine even with an HDD.

But if you want next-gen games like Ratchet on your PC, you'll need to upgrade your rig. There's no other way.
Mussels
Nothing in that video screams "I will only work on new technology".
You've got some weird ideas going on here.
What kind of "weird" ideas? Does MS have weird ideas?

Did you see the portals and how fast it switches between worlds? How are you going to accomplish that with a slower medium?

I'm pretty sure you didn't even see the whole video.
#68
Mussels
Moderprator
You can achieve that by
*checks notes*
Oh right loading it into RAM
#69
jermando
Mussels
You can achieve that by
*checks notes*
Oh right loading it into RAM
Oh cool, you're gonna put 64-128GB of RAM on your PC, so that you'll avoid the NVMe part? And you think that's smart cost-wise?

You're going to pay a lot more money if you follow that route, since RAM tends to be more expensive. I know, because I've had 64GB of DDR4 in my PC since 2019.

That's assuming that game devs will actually code a 2nd path (RAMdisk), while we do know (if you read what actual presentations say) that both consoles have an anti-RAMdisk philosophy this gen (small RAM + high-speed SSD to load assets on the fly).
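Loading assets "on the fly" ultimately comes down to a per-frame budget: how much data the drive can deliver between two frames. A minimal sketch, assuming sustained throughput, decimal units and no gain from compression; 5.5 GB/s is the figure used earlier in the thread and 3.5 GB/s stands in for a hypothetical fast PCIe 3.0 drive:

```python
def per_frame_budget_mb(throughput_gb_per_s, fps):
    """Megabytes that can arrive between two frames at a sustained throughput (decimal units)."""
    return throughput_gb_per_s * 1000 / fps

for throughput in (5.5, 3.5):
    print(f"{throughput} GB/s at 60 fps: {per_frame_budget_mb(throughput, 60):.1f} MB per frame")
```

This prints roughly 91.7 MB and 58.3 MB per frame; anything the next frame needs beyond that budget has to be in RAM or VRAM already.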

Most PC gamers only have 16GB of RAM.
#70
Mussels
Moderprator
You're excited for a new game and new tech but... none of what you're saying is based in fact, it's all just your excited fan theories.
#71
jermando
Mussels
You're excited for a new game and new tech but... none of what you're saying is based in fact, it's all just your excited fan theories.
Read official MS + Sony presentations and educate yourself about how consoles work.

Do your homework and then we can talk.

Don't be surprised if Sony asks for 128GB of RAM for Ratchet to run on PCs that don't actually have a fast SSD... will that be cheaper? Probably not.

Ratchet is just an example, others will follow soon after that.

PS: I'm not excited about Ratchet, nor €80 games. I'm here to post technological facts, but maybe I'm on the wrong site, since I see a lot of prejudice against consoles and their paradigm shift. Stop making assumptions about people that you don't even know.
#72
EsaT
jermando
1) DirectStorage is coming to fix that (direct SSD -> GPU transfers)
There's no magical teleportation of data directly from SSD to GPU, unless those marketing diagrams lie:

Any data going through system RAM goes through the CPU package,
because that's where the memory controller is and what system RAM is connected to.
What's skipped is the CPU cores handling that data.
jermando
Don't be surprised if Sony asks for 128GB of RAM for Ratchet to run on PCs that don't actually have a fast SSD... will that be cheaper? Probably not.

...I'm here to post technological facts
Okay, let's check technological facts:
Flash memory literally has orders of magnitude worse latency than DRAM, and much lower bandwidth.

So if that game actually needs to handle that much data constantly, it's going to have lots of issues that need hiding on consoles.
No flash-based NVMe drive is even remotely fast enough to deliver data on the fly at the moment the GPU needs it!
Just remember that if a graphics card runs out of VRAM and has to wait for data from system RAM (which is faster than any NVMe drive), performance drops instantly.
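To put the DRAM vs. flash comparison into numbers: the latency and drive figures below are ballpark assumptions (roughly 100 ns for a DRAM access, roughly 80 µs for a 4 KiB random read on a consumer NVMe SSD, and a fast 7 GB/s PCIe 4.0 drive), while the DDR4-3200 dual-channel peak follows from 3200 MT/s × 8 bytes × 2 channels:

```python
import math

# Ballpark figures, assumed for illustration only.
DRAM_LATENCY_S = 100e-9      # ~100 ns per DRAM access
NAND_READ_LATENCY_S = 80e-6  # ~80 us per 4 KiB random read on a consumer NVMe SSD

# DDR4-3200 dual channel: 3200 MT/s x 8 bytes per transfer x 2 channels.
DRAM_BW = 3200e6 * 8 * 2
NVME_BW = 7e9                # assumed fast PCIe 4.0 drive, sequential reads

latency_ratio = NAND_READ_LATENCY_S / DRAM_LATENCY_S
print(f"latency: ~{latency_ratio:.0f}x (~{math.log10(latency_ratio):.1f} orders of magnitude)")
print(f"bandwidth: ~{DRAM_BW / NVME_BW:.1f}x ({DRAM_BW / 1e9:.1f} vs {NVME_BW / 1e9:.1f} GB/s)")
```

On these assumptions the latency gap is around three orders of magnitude and the peak bandwidth gap is several-fold.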

And thanks to the minimal generational memory increase, those new consoles don't even have any RAM to spare for buffering.
Meanwhile, a PC owner who can afford PCIe 4.0 NVMe prices should have 32 GB of system RAM as a matter of course...
With likely 20 GB of it sitting there with no direct use for the game, and hence available for background buffering.
The game developer would only need to code the game to prefetch data as the end of one part of the level/map approaches, and the new assets would be available faster than from any NVMe drive.
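The prefetching idea amounts to a simple rule: once the player is far enough through one part of the map, stage the next part in spare system RAM. A minimal sketch; the class, region names, 80% threshold and the placeholder disk read are all hypothetical, and a real engine would load asynchronously and evict old data:

```python
from dataclasses import dataclass, field

@dataclass
class RegionPrefetcher:
    """Hypothetical helper: stage the next map region in RAM once the player
    is close enough to the end of the current one."""
    region_order: list
    threshold: float = 0.8  # fraction of the current region completed
    cache: dict = field(default_factory=dict)

    def load_from_disk(self, region):
        # Placeholder for a real (ideally asynchronous) asset read.
        return f"assets:{region}".encode()

    def update(self, current_region, progress):
        """Call as the player moves; returns the regions currently held in RAM."""
        idx = self.region_order.index(current_region)
        if progress >= self.threshold and idx + 1 < len(self.region_order):
            nxt = self.region_order[idx + 1]
            if nxt not in self.cache:
                self.cache[nxt] = self.load_from_disk(nxt)
        return sorted(self.cache)

prefetcher = RegionPrefetcher(["docks", "city", "arena"])
print(prefetcher.update("docks", 0.5))   # []
print(prefetcher.update("docks", 0.85))  # ['city']
```

The read happens well before the player reaches the boundary, so the drive's speed only has to keep ahead of movement rather than of each frame.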


What's holding back game loading times most is no doubt crappy coding.
There are games whose loading times scale very nicely with transfer rate, dropping to a second or two even without DirectStorage:
www.realhardwarereviews.com/silicon-power-us70-1tb-review/11/
Though the 24-core Threadripper in that test platform offers some serious data-crunching power...
#74
chrcoluk
Discussed this with some others and they seem to agree with me, but let's say I am wrong and for some reason it's decided to require CPU-connected PCIe lanes. How could the DirectStorage software determine whether an NVMe drive is connected that way? From what I can tell there is no distinction: it's still connected via 4 lanes, and it runs at the same NVMe specification with the same performance characteristics. So I think it may not even be possible to enforce, even if they wanted to.
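As for whether software can tell how a drive is attached: on Linux, at least, the PCI topology is visible through sysfs, where the resolved path of a block device (e.g. `os.path.realpath('/sys/block/nvme0n1')`) lists every PCI function from the root port down to the drive. A minimal sketch that counts those address components; the two example paths are hypothetical, and depth is only a heuristic (it varies by platform and does not directly identify a chipset uplink). Nothing here says how Windows or DirectStorage would behave:

```python
import re

# Matches PCI addresses such as "0000:00:01.1" (domain:bus:device.function).
PCI_BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def pci_hops_below_root(sysfs_device_path):
    """Count PCI functions between the host bridge and the device in a resolved sysfs path."""
    parts = sysfs_device_path.strip("/").split("/")
    return sum(1 for part in parts if PCI_BDF.match(part))

# Hypothetical resolved paths: one drive on CPU lanes, one behind extra bridges.
cpu_attached = "/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1"
behind_chipset = ("/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.2/"
                  "0000:03:04.0/0000:05:00.0/nvme/nvme1/nvme1n1")
print(pci_hops_below_root(cpu_attached))    # 2
print(pci_hops_below_root(behind_chipset))  # 4
```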

Also
Again: do reviewers have access to DirectStorage API yet?

Have you ever wondered why SATA vs NVMe SSD benchmarks show ZERO difference so far?

Maybe, just maybe the antiquated Windows I/O stack has something to do with that?
Well yeah, this is what I have been trying to tell you: you're making assumptions based on theory. The speedup in the new I/O stack comes from optimisations to the software stack in how the data is read; the bottleneck is the SATA protocol and the I/O stack, not the chipset interface. Plus, GPU hardware can handle data decompression etc. faster than a typical CPU can.

Those NVMe performance reviews are relevant because they show that in a typical system the chipset link doesn't strangle an NVMe drive. You might have issues if you read from multiple NVMe drives at the same time over the chipset, or have some other bandwidth-heavy device running there, but these are very rare cases in consumer PCs.

We're simply going to have to wait and see.
#75
Mussels
Moderprator
chrcoluk
Discussed this with some others and they seem to agree with me, but let's say I am wrong and for some reason it's decided to require CPU-connected PCIe lanes. How could the DirectStorage software determine whether an NVMe drive is connected that way? From what I can tell there is no distinction: it's still connected via 4 lanes, and it runs at the same NVMe specification with the same performance characteristics. So I think it may not even be possible to enforce, even if they wanted to.

Also



Well yeah, this is what I have been trying to tell you: you're making assumptions based on theory. The speedup in the new I/O stack comes from optimisations to the software stack in how the data is read; the bottleneck is the SATA protocol and the I/O stack, not the chipset interface. Plus, GPU hardware can handle data decompression etc. faster than a typical CPU can.

Those NVMe performance reviews are relevant because they show that in a typical system the chipset link doesn't strangle an NVMe drive. You might have issues if you read from multiple NVMe drives at the same time over the chipset, or have some other bandwidth-heavy device running there, but these are very rare cases in consumer PCs.

We're simply going to have to wait and see.
He doesn't get that the NVMe driver stack/I/O stack is pretty much the key requirement here; it's the magic sauce (do your research!)