Thursday, September 3rd 2020

NVIDIA RTX IO Detailed: GPU-assisted Storage Stack Here to Stay Until CPU Core-counts Rise

Sep 3rd, 2020 00:30

NVIDIA announced its RTX IO technology at the GeForce "Ampere" launch event. From a performance standpoint, storage is the weakest link in a modern computer, and SSDs have had a transformational impact. With modern SSDs leveraging PCIe, consumer storage speeds are bound to grow with each new PCIe generation, which doubles per-lane IO bandwidth. PCI-Express Gen 4 enables 64 Gbps of raw bandwidth per direction on M.2 NVMe SSDs (four lanes). AMD has already implemented it across its Ryzen desktop platform; Intel has it on its latest mobile platforms and is expected to bring it to its desktop platform with "Rocket Lake." While more storage bandwidth is always welcome, the storage processing stack (the work of moving ones and zeroes to and from the physical layer) is still handled by the CPU. As storage bandwidth rises, the IO load on the CPU rises proportionally, to a point where it can begin to impact performance. Microsoft sought to address this emerging challenge with the DirectStorage API, and NVIDIA wants to build on it.
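As a rough check on the bandwidth figures above, the per-direction PCIe link rate can be worked out from the per-lane transfer rate and the line encoding. The sketch below is illustrative only: the table and function names are made up for this example, and it ignores packet and protocol overhead, so real drives land somewhat lower.

```python
# Per-lane raw signalling rate in gigatransfers per second (GT/s).
PCIE_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe Gen 3, 4 and 5 use 128b/130b line encoding:
# 128 payload bits are carried in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Bandwidth per direction in gigabits per second, after line encoding."""
    return PCIE_GT_PER_S[gen] * lanes * ENCODING_EFFICIENCY


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Bandwidth per direction in gigabytes per second, after line encoding."""
    return pcie_bandwidth_gbps(gen, lanes) / 8


# An M.2 NVMe SSD uses four lanes.
print(f"Gen 3 x4: {pcie_bandwidth_gbytes(3, 4):.2f} GB/s")
print(f"Gen 4 x4: {pcie_bandwidth_gbytes(4, 4):.2f} GB/s")
```

The 64 Gbps quoted for Gen 4 is the raw rate of four 16 GT/s lanes; after encoding, roughly 63 Gbps (a little under 8 GB/s) is left for data, which is consistent with the 7 GB/s class drives discussed in the article.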

According to tests by NVIDIA, reading uncompressed data from an SSD at 7 GB/s (the typical maximum sequential read speed of client-segment PCIe Gen 4 M.2 NVMe SSDs) requires the full utilization of two CPU cores. The OS typically spreads this workload across all available CPU cores/threads on a modern multi-core CPU. Things change dramatically when compressed data (such as game resources) is being read in a gaming scenario, with a high number of IO requests. Modern AAA games have hundreds of thousands of individual resources crammed into compressed resource-pack files.

Although ones and zeroes are still being moved at up to 7 GB/s at the disk IO level, the decompressed data stream at the CPU level can be as high as 14 GB/s (best-case compression). On top of this, each IO request comes with its own overhead: a set of instructions for the CPU to fetch piece x of a resource from file y and deliver it to buffer z, along with instructions to decompress or decrypt the resource. At high IO throughput this could take an enormous amount of CPU muscle, and NVIDIA pegs the number of CPU cores required as high as 24. As we explained earlier, DirectStorage enables a path for devices to directly process the storage stack to access the resources they need. The API by Microsoft was originally developed for the Xbox Series X, but is making its debut on the PC platform.
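The relationship between disk rate, compression ratio and request count can be sketched in a few lines. This is only a back-of-the-envelope illustration: the 2:1 ratio is what the 7 GB/s and 14 GB/s figures imply, while the 64 KiB request size is an assumption picked for the example, not a number from NVIDIA.

```python
def decompressed_rate_gbs(disk_rate_gbs: float, compression_ratio: float) -> float:
    """Rate of decompressed data produced when compressed data is read at disk_rate_gbs."""
    return disk_rate_gbs * compression_ratio


def requests_per_second(disk_rate_gbs: float, request_size_bytes: int) -> float:
    """Number of IO requests per second needed to sustain disk_rate_gbs."""
    return disk_rate_gbs * 1e9 / request_size_bytes


# 7 GB/s off the disk at an assumed 2:1 compression ratio.
print(decompressed_rate_gbs(7.0, 2.0), "GB/s after decompression")

# Assumed 64 KiB requests: each one carries its own fixed CPU overhead.
print(round(requests_per_second(7.0, 64 * 1024)), "requests per second")
```

Smaller requests push the request count, and with it the per-request overhead, up proportionally, which is why many small game assets are harder on the CPU than one large sequential read.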

NVIDIA RTX IO is a layer that wraps around DirectStorage and is further optimized for gaming and for NVIDIA's GPU architecture. RTX IO brings GPU-accelerated lossless data decompression to the table, which means data remains compressed and bundled with fewer IO headers as it is moved from the disk to the GPU via DirectStorage. NVIDIA claims that this improves IO performance by a factor of 2. NVIDIA further claims that GeForce RTX GPUs, thanks to their high CUDA core counts, are capable of offloading "dozens" of CPU cores, driving decompression performance beyond even the compressed data loads that PCIe Gen 4 SSDs can throw at them.
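The idea of keeping data compressed until it reaches its destination can be illustrated with a small CPU-only sketch. Python's zlib stands in for whatever codec RTX IO actually uses (the article does not name one), and `pack_asset` and `load_asset` are hypothetical helpers, with `load_asset` a placeholder for the GPU-side decompression step.

```python
import zlib


def pack_asset(raw: bytes) -> bytes:
    """Compress an asset once, at build time, before it is stored on disk."""
    return zlib.compress(raw, level=6)


def load_asset(packed: bytes) -> bytes:
    """Decompress at the destination (stand-in for the GPU-side step)."""
    return zlib.decompress(packed)


# A highly repetitive dummy asset; real game data compresses far less well.
raw = b"texture-tile " * 4096
packed = pack_asset(raw)

# Only the compressed bytes travel over the storage link and the PCIe bus.
bytes_moved = len(packed)
restored = load_asset(packed)

print(f"moved {bytes_moved} bytes, delivered {len(restored)} bytes")
```

Because decompression is lossless, the delivered bytes match the original exactly; the saving is in how many bytes have to cross the link.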

There is, however, a tiny wrinkle. Games need to be optimized for DirectStorage. Since the API has already been deployed on Xbox with the Xbox Series X, most AAA games for Xbox that have PC versions already have some awareness of the tech; however, the PC versions will need to be patched to use it. Games will further need NVIDIA RTX IO awareness, and NVIDIA needs to add support on a per-game basis via GeForce driver updates. NVIDIA didn't detail which GPUs will support the tech, but given its wording, and the use of "RTX" in the branding of the feature, NVIDIA could release the feature for the RTX 20-series "Turing" and RTX 30-series "Ampere." The GTX 16-series probably misses out because what NVIDIA hopes to accomplish with RTX IO is likely too heavy for it, and this may have been a purely performance-based decision for NVIDIA.

52 Comments on NVIDIA RTX IO Detailed: GPU-assisted Storage Stack Here to Stay Until CPU Core-counts Rise

#26

Valantar

ebivan wrote: Death Stranding: 64 GB
Horizon Zero Dawn: 72 GB
Mount & Blade 2: 51 GB
Red Dead Redemption 2: 110 GB
Star Citizen: 60 GB

...and? We could all list a bunch of random games at random sizes. CoD: Warzone is still 175GB, and other high budget AAA games are likely to exceed this soon. I never said all games were at that level, but there will be plenty of >100GB games in the next couple of years.

Chrispy_ wrote: Will this play nice with DirectStorage or is it going to be another Nvidia black box that only Nvidia 3000-series customers get to beta test for Jensen?

According to Anandtech:

At a high level this appears to be NVIDIA’s implementation of Microsoft’s forthcoming DirectStorage API

#27

ebivan

Valantar wrote: ...and? We could all list a bunch of random games at random sizes. CoD: Warzone is still 175GB, and other high budget AAA games are likely to exceed this soon. I never said all games were at that level, but there will be plenty of >100GB games in the next couple of years.

Yes, there WILL be. But there ARE not. SSD prices WILL drop too.

#28

rsouzadk

It is already confirmed by NVIDIA itself that RTX IO is supported on all RTX Turing and NVIDIA Ampere architecture GPUs.

www.nvidia.com/en-us/geforce/news/rtx-io-gpu-accelerated-storage-technology/

#29

Mouth of Sauron

Valantar wrote: What review sites do you know of that systematically only test games in RT mode? Sure, RT benchmarks will become more of a thing this generation around, but I would be shocked if that didn't mean additional testing on top of RT-off testing. And comparing RT-on vs. RT-off is obviously not going to happen (that would make the RT-on GPUs look terrible!).

You misunderstood. Look at CPU testing benchmarks: a 2080 Ti, of course, and way too many games at Ultimate details in FHD, as if it proves anything, since most run at over 100 FPS. Now replace it with a 3090 and add 'RT' games and it will mean even less. 'Reducing the GPU bottleneck' is a double-edged sword, because it skips benchmarks that may actually mean something: the tested 200g CPU paired with a GPU that a person buying that processor would actually consider, which is certainly not one 6x more expensive. Real gaming weaknesses might stay hidden because of this.

As for RT, as I said - I believe nothing until I see it. Right now, I think NVIDIA will pressure benchmarking sites to include RT titles in their benchmark suites.

As for looking terrible, what if it does? Should we hide those results? I think not: publish them, publish screenshots/videos, and see whether it is worth investing +700g in a 3090 over a 3080 or whatever else... Buyers should decide, based on true input - quality/quantity included.

#30

Valantar

ebivan wrote: Yes, there WILL be. But there ARE not. SSD prices WILL drop too.

I just showed you that there are. As for SSD prices dropping: sure, but there is no way on earth they will be dropping more than 20% per year - not until we have PLC SSDs, at least. Silicon manufacturing is expensive. The increases in game install size - which have already been accelerating for 5+ years, alongside increases in resolution and texture quality, which there is no reason to expect a slowdown of - will far outstrip any drops in SSD pricing.

Mouth of Sauron wrote: You misunderstood. Look at CPU testing benchmarks: a 2080 Ti, of course, and way too many games at Ultimate details in FHD, as if it proves anything, since most run at over 100 FPS. Now replace it with a 3090 and add 'RT' games and it will mean even less. 'Reducing the GPU bottleneck' is a double-edged sword, because it skips benchmarks that may actually mean something: the tested 200g CPU paired with a GPU that a person buying that processor would actually consider, which is certainly not one 6x more expensive. Real gaming weaknesses might stay hidden because of this.

As for RT, as I said - I believe nothing until I see it. Right now, I think NVIDIA will pressure benchmarking sites to include RT titles in their benchmark suites.

As for looking terrible, what if it does? Should we hide those results? I think not: publish them, publish screenshots/videos, and see whether it is worth investing +700g in a 3090 over a 3080 or whatever else... Buyers should decide, based on true input - quality/quantity included.

RT certainly doesn't remove any GPU bottleneck - it introduces a massive new one! - so it will never be used for CPU testing by any reviewer with even a modest amount of knowledge of their field.

As for the rest of that part: that's a critique completely unrelated to this thread, Ampere, and any new GPU in general. It's a critique of potential shortcomings of how most sites do GPU and CPU testing. And it is likely valid to some degree, but ... irrelevant here. I agree that it would be nice to see tests run on lower end hardware too, but that would double the workload on already overworked and underpaid reviewers, so it's not going to happen, sadly. At least not until people start paying for their content.

Nvidia won't have to pressure anyone to include RT titles in their benchmark suites. Any professional reviewer will add a couple of titles for it, as is done with all new major features, APIs, etc. That's the point of having a diverse lineup of games, after all. And any site only including RT-on testing would either need to present a compelling argument for this, or I would ignore them as that would be a clear sign of poor methodology on their part. That has nothing to do with Nvidia.

And I never commented on your opinions of how current RT looks, so I have no idea who you're responding to there.

#31

ebivan

Another point would be power consumption: I'd rather decompress stuff once and have a big game dir instead of continually having my GPU/CPU decompressing stuff and using power for that, just to dump it a second later and do it all over again and again...

#32

Valantar

ebivan wrote: Another point would be power consumption: I'd rather decompress stuff once and have a big game dir instead of continually having my GPU/CPU decompressing stuff and using power for that, just to dump it a second later and do it all over again and again...

That's a decent point, but it fails in the face of practicalities. Especially with a dedicated hardware decompression block you could probably decompress hundreds of petabytes of game data for the price of a 1TB NVMe SSD.

#33

ebivan

As I understand NVIDIA, it's not a dedicated fixed-function unit (like video en/decoding) but rather Tensor/CUDA cores that get allocated to do this, which would impact rendering performance and cost electrical power.

If it was a fixed-function block, why would Turing be able to do it? It's not like Huang would have had a new hardware function in Turing and forgotten to brag about it for two years.

#34

hsew

Maybe Intel can put its iGPU to use and pull off something similar? Once it gets on Gen 4 at least.

#35

nguyen

Oh man, the primary objective of compression and decompression is to improve throughput, not to reduce file size.
Nvidia has already had compression/decompression on its GPUs since Pascal to improve VRAM bandwidth:
www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/8

#36

TechLurker

This sounds similar to something AMD said they'd been working on a while back, where their future GPUs would be able to talk directly to the storage system for the necessary assets and bypass the CPU. And true, we saw a form of it in the consoles. MS decided to use their DirectStorage API, whereas Sony used a fairly powerful custom controller instead. Here's hoping RDNA2 is also able to talk directly to SSD/NVMe drives.

I do wonder if that special workstation GPU of AMD's, that allowed installation of an NVMe as a large cache drive, also helped that concept along.

#37

ebivan

No man, throughput has never been the problem. Every now and then there are articles here on TechPowerUp comparing 4, 8 and 16 PCIe lanes for GPUs, and it always turns out that the impact of PCIe throughput is marginal.
It's all about disk I/O and decompressing assets.

#38

Mouth of Sauron

Valantar wrote: RT certainly doesn't remove any GPU bottleneck - it introduces a massive new one! - so it will never be used for CPU testing by any reviewer with even a modest amount of knowledge of their field.

Yeah, that's what I said, too. Now CPU benchmarks will be burdened with a really uncommon workload; ray-tracing is far from being just GPU-based. Unlike you, I think virtually all benchmarks will have something RT-based, which is kinda bad because it doesn't concern 99% of users right now.

I said what I think of supposed GPU-accelerated storage - I'll believe it when I see it. NVIDIA has made a lot of dubious claims lately - I actually know quite a bit about ray-tracing and know they are misrepresenting what they do, but they also ray-trace *sound* (sic) and whatnot - now they accelerate PCIe4 M.2. Right. I need to see real-life proof that it's happening and what the gains are.

#39

Valantar

Mouth of Sauron wrote: Yeah, that's what I said, too. Now CPU benchmarks will be burdened with a really uncommon workload; ray-tracing is far from being just GPU-based. Unlike you, I think virtually all benchmarks will have something RT-based, which is kinda bad because it doesn't concern 99% of users right now.

I said what I think of supposed GPU-accelerated storage - I'll believe it when I see it. NVIDIA has made a lot of dubious claims lately - I actually know quite a bit about ray-tracing and know they are misrepresenting what they do, but they also ray-trace *sound* (sic) and whatnot - now they accelerate PCIe4 M.2. Right. I need to see real-life proof that it's happening and what the gains are.

Making their GPUs do their own decompression isn't exactly a dubious claim, I mean, MS has made a new API for it (well, for hardware accelerated decompression to offload the CPU) that is also used on the upcoming Xboxes. As such it is likely to be a standard feature of many games in a couple of years. Ray tracing sound is of course a bit weird, but it makes sense - sound waves propagate in a way that can be simplified down to a collection of rays, including bouncing off surfaces and the like, though of course this behaviour (as well as bending around corners etc.) is different than how light rays behave, and it will always be a simplification rather than a simulation of how real sound waves work. It's still a technique that can allow for far more realistic spatial audio than we have today. As for Nvidia misrepresenting their RTRT implementation, I'll leave it to you to flesh that out, as IMO it's pretty clear that what we have currently is relatively low-performance and not really suited to fully path traced graphics, but well suited for low bounce count light effects like reflections and global illumination. Even for that performance still needs to increase, but it's passable as a first-generation solution of something that was previously entirely possible (non-real time RT has of course been around for ages). Nvidia does like to sell RTX as the second coming of raptor Jesus, but it's not like they're saying rasterization is dead or fully path traced games are going to be the standard from now on.

#40

nguyen

ebivan wrote: No man, throughput has never been the problem. Every now and then there are articles here on TechPowerUp comparing 4, 8 and 16 PCIe lanes for GPUs, and it always turns out that the impact of PCIe throughput is marginal.
It's all about disk I/O and decompressing assets.

Huh, did you read the DirectStorage blog from MS?
Basically, with the new API, an NVMe drive can easily saturate its max bandwidth.
Pretty much every modern NVMe drive can handle >300,000 IOPS; at a 64K block size that is >19.2 GB/s of bandwidth.
PCIe Gen 4 x4 max bandwidth is 7.8 GB/s.
Now when you compress/decompress the data stream, the effective bandwidth is 14 GB/s, as noted in the Nvidia slide, which is even higher throughput than a RAM disk on dual-channel DDR4.
I guess another option to increase throughput is using two PCIe 4.0 x4 NVMe drives in RAID 0. Either way, you have plenty of options to take advantage of the MS DirectStorage API: RAID 0, a high core count CPU or an Nvidia GPU.
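The IOPS arithmetic in the post above is easy to reproduce. The snippet below is only a sketch: `iops_to_gbs` is a name invented for this example, and the 7.8 GB/s link figure is taken from the post rather than measured.

```python
def iops_to_gbs(iops: int, block_bytes: int) -> float:
    """Bandwidth in GB/s implied by a given IOPS rate at a fixed block size."""
    return iops * block_bytes / 1e9


# 300,000 IOPS at "64K" blocks, read as 64,000 bytes and as 64 KiB (65,536 bytes).
decimal_64k = iops_to_gbs(300_000, 64_000)
binary_64k = iops_to_gbs(300_000, 64 * 1024)

# Link limit quoted in the post for PCIe Gen 4 x4.
PCIE_GEN4_X4_GBS = 7.8

# Whichever is smaller bounds what actually reaches the rest of the system.
deliverable = min(decimal_64k, PCIE_GEN4_X4_GBS)

print(f"{decimal_64k:.1f} GB/s (64,000-byte blocks)")
print(f"{binary_64k:.2f} GB/s (64 KiB blocks)")
print(f"{deliverable:.1f} GB/s deliverable over the link")
```

Either reading of "64K" puts the request rate well above what the link can carry, so at that block size the PCIe link, not the IOPS figure, is the bound.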

#41

InVasMani

Perfect example of why I think AMD should make an Infinity Fabric mGPU/GPU bridge: put an individual memory chip (or a pair in dual channel), a pair of M.2 slots and a CPU chip on it. Have it host its own OS and do compression/decompression on the fly along with cache acceleration like StoreMI, completely offloaded from the primary socketed CPU/RAM/storage and OS, including that random Windows 10 background telemetry and update nonsense, for HBCC. They could actually do that pretty easily and have like 2-4 cores they could dedicate to it. They'd probably have extra processing headroom too, enough to have one of the newer-revision USB headers for a front panel device as well, for that matter. It could be semi multi-purpose yet directly tie in and be compatible with its GPUs, and be a good resource offload device. It may as well sit in a PCIe x1 slot and draw both power and additional bandwidth from that; it's a good spot to mount it and allows for a 1-slot blower cooler to keep it ice cool.

#42

Punkenjoy

Compression is not just useful for saving SSD space but also for saving bandwidth.

Let's say you have a link that can send 10 GB/s. You want to send 2 GB uncompressed or 1 GB compressed. The first will take at least 200 ms, whereas the second would take 100 ms.

This is just for pure data transfer but you can see how it can reduce latency on large transfers
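The numbers in this example follow from dividing payload size by link speed. A minimal sketch (the function name is made up for illustration, and it ignores latency and protocol overhead):

```python
def transfer_time_ms(size_gb: float, link_gbs: float) -> float:
    """Minimum time in milliseconds to move size_gb over a link_gbs link."""
    return size_gb * 1000 / link_gbs


uncompressed_ms = transfer_time_ms(2.0, 10.0)  # 2 GB sent as-is
compressed_ms = transfer_time_ms(1.0, 10.0)    # same data at 2:1 compression

print(uncompressed_ms, compressed_ms)
```

The saving scales with the compression ratio, provided decompression at the far end keeps up with the link.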

Also, these days the major energy cost comes from moving the data around and not from doing the calculation itself. If you can move the data in a compressed state, you can save power there.

But what I would like to know is: can we decompress just before using it and continue to save on bandwidth and storage while it sits in GPU memory? Just-in-time decompression!

They do not seem to do that here, but I think it would be the thing to do as soon as we have a decompression engine fast enough to handle the load.

InVasMani wrote: Perfect example of why I think AMD should make an Infinity Fabric mGPU/GPU bridge: put an individual memory chip (or a pair in dual channel), a pair of M.2 slots and a CPU chip on it. Have it host its own OS and do compression/decompression on the fly along with cache acceleration like StoreMI, completely offloaded from the primary socketed CPU/RAM/storage and OS, including that random Windows 10 background telemetry and update nonsense, for HBCC. They could actually do that pretty easily and have like 2-4 cores they could dedicate to it. They'd probably have extra processing headroom too, enough to have one of the newer-revision USB headers for a front panel device as well, for that matter. It could be semi multi-purpose yet directly tie in and be compatible with its GPUs, and be a good resource offload device. It may as well sit in a PCIe x1 slot and draw both power and additional bandwidth from that; it's a good spot to mount it and allows for a 1-slot blower cooler to keep it ice cool.

I think the future might be interesting. If AMD wants to be the leader on PC, they might bring OMI (Open Memory Interface) to the PC, where the memory or storage is attached to the CPU via a super fast serial bus (using far fewer pins and less die space than modern memory technology). The actual memory controller would be moved onto the memory stick itself. The CPU would become memory-agnostic. You could upgrade your CPU or memory independently. Storage (like Optane) could also be attached via this.

The pin count is much smaller than with modern memory, so you can have way more channels if required.

This is based on the OpenCAPI protocol. OpenCAPI itself would be used to attach any kind of accelerator. The chiplet architecture from AMD would probably make it easy for them to switch to this kind of architecture, and it's probably the future.

These are open standards pushed by IBM, but I could see AMD using them or pushing their own standard with a similar goal in the future. With these standards, the GPU could connect directly to the memory controller and vice versa.

#43

InVasMani

Punkenjoy wrote: Compression is not just useful for saving SSD space but also for saving bandwidth.

Let's say you have a link that can send 10 GB/s. You want to send 2 GB uncompressed or 1 GB compressed. The first will take at least 200 ms, whereas the second would take 100 ms.

This is just for pure data transfer, but you can see how it can reduce latency on large transfers.

Also, these days the major energy cost comes from moving the data around and not from doing the calculation itself. If you can move the data in a compressed state, you can save power there.

But what I would like to know is: can we decompress just before using it and continue to save on bandwidth and storage while it sits in GPU memory? Just-in-time decompression!

They do not seem to do that here, but I think it would be the thing to do as soon as we have a decompression engine fast enough to handle the load.

I think the future might be interesting. If AMD wants to be the leader on PC, they might bring OMI (Open Memory Interface) to the PC, where the memory or storage is attached to the CPU via a super fast serial bus (using far fewer pins and less die space than modern memory technology). The actual memory controller would be moved onto the memory stick itself. The CPU would become memory-agnostic. You could upgrade your CPU or memory independently. Storage (like Optane) could also be attached via this.

The pin count is much smaller than with modern memory, so you can have way more channels if required.

This is based on the OpenCAPI protocol. OpenCAPI itself would be used to attach any kind of accelerator. The chiplet architecture from AMD would probably make it easy for them to switch to this kind of architecture, and it's probably the future.

These are open standards pushed by IBM, but I could see AMD using them or pushing their own standard with a similar goal in the future. With these standards, the GPU could connect directly to the memory controller and vice versa.

Interesting, though how cost-effective would it be relative to the performance and storage capacity? If AMD wanted to do it cheaper, they could just pair a 2GB high-frequency DDR4 chip and some extra GDDR6X (preferably a bit faster than what RDNA2 uses, or at least as fast) with an inexpensive 2-4c CPU that handles offloading all the cache acceleration, compression and decompression, and provides a bit of persistent storage that HBCC can tap into at the same time; throw on a 1-slot cooler and blower fan. I really think AMD could do all that for maybe $150, +/- $25-$50, and it would be a really good way of extending GPU VRAM and performance. Additionally, it's just a really great way to add in a storage acceleration card.

AMD could sell a lot of those types of devices outside of gaming as well; it's a device that is desirable today in other areas: ML, data centers, etc. It could have a PCIe-based microSD slot and sit in a PCIe x1 slot as well; I doubt it would add much to the cost, plus it's a good way to draw power for all those things. I would think 75 W would actually be overkill to power them; it's probably closer to 10 W to 25 W. About the only thing that would draw much power would be the CPU, and a 2-4c CPU isn't gonna draw crap for power these days. They could just use a mobile chip, which is a great way for AMD to bin those further; in fact it kills two birds with one stone. As mentioned, the compression saves a lot of bandwidth/latency, and part of VRAM usage is old data that's still stuck there because of bandwidth and latency constraints getting in the way; if you can speed those things up, you obviously stream data in and out more effectively and quickly at any given moment. Just having that extra capacity to send the data to and fetch it back from quickly would be big with HBCC. It would be interesting if it reduced latency enough to pretty much eliminate most of the CF/SLI micro-stutter negatives as well. What you mention sounds cool, but I'm really unsure about the cost aspect of that. I think what I mention could be done at a good entry price point, and of course AMD could improve it gradually, yearly or every other year. You might actually speed up your own GPU with one and not have to buy a whole god damn new card in the process.

To me this is kind of what AMD should do with its GPUs: integrate just a fraction of this onto a GPU and make future GPUs compatible with it, so any upgrade is compatible with it and can help do some mGPU, assigning the newer, quicker card to cache acceleration, compression/decompression and offload (possibly a touch of on-the-fly post-processing) while matching the other card's performance capabilities; it simply uses excess CPU/GPU resources for all the other improvement aspects. Really, provided the two cards aren't too far apart in age and specs, I'd think it would work alright. In cases where there are more glaring gaps, certain features could just be enabled or disabled in terms of what it accelerates, be it rendering, storage or compression/decompression; perhaps the older card just handles the cache and compression and the newer card handles all the other stuff. I just can't see how it could be a bad thing for AMD to do: they can use it to help bin mobile chips better (which is more lucrative), add a selling feature to their GPUs, and diversify into another area entirely in storage acceleration. It could even be used for path tracing storage on either the CPU cache or the SSD. It frees up overhead for the rest of the system as well: the OS, the CPU, the memory and the storage become less strained thanks to the offloading.

#44

Master Tom

ebivan wrote: SSDs are huge and cheap, why not just put uncompressed (or less compressed) data there? Even if a game were to use maybe 200 or 300 GB, I would prefer that to the load times of Wasteland 3.... I don't have 10 games installed at every moment, so I could allocate SSD space for the 1-3 games that I am actually playing at the moment.

Ages ago every game would let you choose how much of the installation you wanted to put on the HDD and how much would be left on the CD/DVD. Why not add an option to choose the compression level of stored data?

I have many games installed. SSDs are expensive and you cannot have enough storage.

#45

RoutedScripter

thesmokingman wrote: This seems redundant with 8, 12 and 16 core cpus.

16 cores seems like a lot, but games have 10 busy threads these days. What about streaming, recording, or doing other stuff? IO would bog down 4 cores out of that; it's a waste when GPUs can do it much better and more directly.

The point of this is that it bypasses the CPU. Before, the data had to go all the way through the CPU, RAM and the circuitry in between, a whole detour, before it reached the GPU; now all of that is bypassed.

#46

nagmat

Where can I find detailed information (a repository) or source code about how it works algorithmically?

#47

Mouth of Sauron

Valantar wrote: Making their GPUs do their own decompression isn't exactly a dubious claim, I mean, MS has made a new API for it (well, for hardware accelerated decompression to offload the CPU) that is also used on the upcoming Xboxes. As such it is likely to be a standard feature of many games in a couple of years. Ray tracing sound is of course a bit weird, but it makes sense - sound waves propagate in a way that can be simplified down to a collection of rays, including bouncing off surfaces and the like, though of course this behaviour (as well as bending around corners etc.) is different than how light rays behave, and it will always be a simplification rather than a simulation of how real sound waves work. It's still a technique that can allow for far more realistic spatial audio than we have today. As for Nvidia misrepresenting their RTRT implementation, I'll leave it to you to flesh that out, as IMO it's pretty clear that what we have currently is relatively low-performance and not really suited to fully path traced graphics, but well suited for low bounce count light effects like reflections and global illumination. Even for that performance still needs to increase, but it's passable as a first-generation solution of something that was previously entirely possible (non-real time RT has of course been around for ages). Nvidia does like to sell RTX as the second coming of raptor Jesus, but it's not like they're saying rasterization is dead or fully path traced games are going to be the standard from now on.

Hey, however nice I try to be - I seem to get a "sensei" here. There is very little I should "flesh out" about ray tracing and rendering in general; I've been closely connected with the damned thing for over 3 decades. Pretty much everything was fleshed out a long time ago...

Seriously, try to be less patronizing... Especially when making uncalled-for replies to people who know much more about the subject than you do...

#48

Valantar

Mouth of Sauron wrote: Hey, however nice I try to be - I seem to get a "sensei" here. There is very little I should "flesh out" about ray tracing and rendering in general; I've been closely connected with the damned thing for over 3 decades. Pretty much everything was fleshed out a long time ago...

Seriously, try to be less patronizing... Especially when making uncalled-for replies to people who know much more about the subject than you do...

All I'm saying is that you're making some claims here that you're not backing up with anything of substance, beyond alluding to experience as if that explains anything. I'm not doubting your experience, nor the value of it - not whatsoever - but all that tells us is that you ought to know a lot about this, not what you know. Because that is what I'm asking for here: an explanation of what you are saying. I'm asking you to present your points. I'm not contesting your claims (well, you could say that about decompression and RT audio, but you don't seem interested in discussing those), but you said something to the effect of current RT being complete garbage, which... well, needs fleshing out. How? Why? On what level? I mean, sure, we've all seen the examples of terrible reflection resolution in BF1 etc., but it seems you are claiming quite a bit more than that - though again, it's hard to judge going by your vague wording. So maybe try not to be insulted when someone asks you to flesh out your claims, and instead ... do so? Share some of that knowledge that you - rather patronizingly, I might add - claim that I should accept on blind faith? I think we're both talking past each other quite a bit here, but as far as I understand the general attitude on any discussion forum, making a vague claim - especially one backed by another claim of expert knowledge - and then refusing to go beyond this vagueness is a rather impolite thing to do. You're welcome to disagree, and I'll be happy to leave it at that, but the ball is firmly in your court.

I was also interested in seeing you argue your case on the first two points in the post you quoted (about GPU-accelerated decompression and "RT" audio), but again you don't seem to have come here to have any kind of exchange of opinions or knowledge. Which is really too bad.

nagmat wrote: Where can I find detailed information (a repository) or source code about how it works algorithmically?

I would assume on Nvidia's internal servers and the work computers of their employees, and nowhere else. Nvidia doesn't tend to be very open source-oriented.

#49

R0H1T

Well, here's some food for thought ~ has anyone tried NVMe 4.0 drives with DirectStorage or RTX IO and seen how PCIe 3.0 would be a limiting factor? I do believe that if this works the way the PS5 demos have shown, PCIe 3.0 could be a major bottleneck, mainly on Intel, especially a year or two down the line with AAA titles!

#50

Valantar

R0H1T wrote: Well, here's some food for thought ~ has anyone tried NVMe 4.0 drives with DirectStorage or RTX IO and seen how PCIe 3.0 would be a limiting factor? I do believe that if this works the way the PS5 demos have shown, PCIe 3.0 could be a major bottleneck, mainly on Intel, especially a year or two down the line with AAA titles!

That would require developers to develop PC games in a manner that requires more than, say, 2.5GB/s of disk read speed as an absolute minimum. I doubt we'll see that for quite a while yet. Remember, the XSX sticks to 2.4GB/s PCIe 4.0x2, so cross-platform games are unlikely to require much more than this. It would absolutely be possible to make this be a bottleneck, but it would require some quite specific game designs, or extremely aggressive just-in-time texture streaming solutions (which could then be alleviated by enabling higher VRAM usage and earlier streaming for installations on lower bandwidth storage). Still, it seems highly likely that games in the relatively near future will (need to) become aware of what kind of storage they are installed on in a way we haven't seen yet.
