
Crysis 3 Installed On and Run Directly from RTX 3090 24 GB GDDR6X VRAM

Why is this even an article?
Because it's very interesting! That's why.
Reading game data from VRAM, which has to be read to system memory before going back to the GPU
Which would happen with ANY storage. VRAM is literally the fastest storage you can buy.
is only going to hurt performance.
How? HDDs are much slower, and SSDs, even the fastest, are still slow in comparison to VRAM. How is VRAM going to "hurt" performance?
It's fancy that you can do this, but it's really a sub-par solution compared to just using a normal RAM disk.
You assume that game assets are not transferred directly to other sections of VRAM through special instruction operations, which would not be difficult. However, you have a point with the ram-disk.
Accessing system memory is going to be far faster than doing twice the number of transfers over PCIe to and from the same device. This isn't a win for latency and it's not a win for bandwidth compared to the alternative.
While that would be true if the VRAM could not do transfers directly to itself, it can and with this VRAM-disk scheme likely does. This is an experimental thing. It's not being done because it's practical, it's being done for giggles. Lighten up a little bit.
 
With the same system I installed Crysis 1 on a Vega 56 (a lower memory quantity), so I don't see where the miracle is; the credit mostly goes to the tool written by the programmer that lets you use the GPU's memory as a disk. In my benchmarks I got about 4 GB/s read/write, measured with CrystalDiskMark. I then tried the same thing with a disk made from CPU RAM (2666 MHz DDR4, dual channel) and the result was 10 GB/s read/write, so again I don't see where the miracle obtained by the user is... maybe he deserves credit for devoting himself to the subject until he found a tool that allows this. Beyond being an exercise in style, I don't see the usefulness or the difficulty in achieving it; it's PCIe 3.0 versus the CPU's RAM bus. exFAT is the fastest filesystem for this, even if only by about 7-10%.
In the third photo you can see with the AIDA benchmark that the GPU's RAM is slow at read/write from the host side, but for copies that stay inside the GPU its own bus lets it reach about 30 times the speed of the PCIe 3.0 bus, while the CPU still keeps the advantage with roughly 4 times the speed in read/write.
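Roughly the same comparison can be reproduced with a few lines against the CUDA runtime. This is only a sketch, not the GpuRamDrive code; the buffer size, repetition count and use of pinned host memory are my own choices. The host↔device numbers are limited by the PCIe link, while the device-to-device copy runs over the card's own memory bus, which is the gap AIDA is showing:

```cpp
// bandwidth_sketch.cu -- rough illustration only, not the GpuRamDrive code.
// Times host->device, device->host and device->device copies with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

static float gbps(void* dst, const void* src, size_t bytes, cudaMemcpyKind kind)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 10; ++i)                  // average over a few repetitions
        cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (bytes * 10 / 1e9f) / (ms / 1e3f);     // GB/s
}

int main()
{
    const size_t bytes = 256ull << 20;            // 256 MiB test block
    void *host, *devA, *devB;
    cudaMallocHost(&host, bytes);                 // pinned host memory, best case for PCIe
    cudaMalloc(&devA, bytes);
    cudaMalloc(&devB, bytes);

    printf("H->D: %.1f GB/s\n", gbps(devA, host, bytes, cudaMemcpyHostToDevice));
    printf("D->H: %.1f GB/s\n", gbps(host, devA, bytes, cudaMemcpyDeviceToHost));
    printf("D->D: %.1f GB/s\n", gbps(devB, devA, bytes, cudaMemcpyDeviceToDevice));

    cudaFree(devB); cudaFree(devA); cudaFreeHost(host);
    return 0;
}
```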
 

Attachments

  • vega56mem.jpg
  • ryzenmem.jpg
  • gpu.vs.cpu.jpg
Because it's very interesting! That's why.

Which would happen with ANY storage. VRAM is literally the fastest storage you can buy.

How? HDDs are much slower, and SSDs, even the fastest, are still slow in comparison to VRAM. How is VRAM going to "hurt" performance?

You assume that game assets are not transferred directly to other sections of VRAM through special instruction operations, which would not be difficult. However, you have a point with the ram-disk.

While that would be true if the VRAM could not do transfers directly to itself, it can and with this VRAM-disk scheme likely does. This is an experimental thing. It's not being done because it's practical, it's being done for giggles. Lighten up a little bit.
It's interesting if you've been living under a rock. This isn't something new and it didn't get any better. It hurts performance because you're now sharing memory bandwidth with disk access in addition to GPU rendering. Loading content and rendering at the same time could cause performance degradation in both memory use and PCIe utilization, because content now has to travel in both directions, from and to the GPU. That's twice the number of transfers, because there is no way for the GPU driver to know what's in the part of VRAM being used as a disk. If it were a mere copy, then I'd agree, but it's not. Disk emulation is involved, which makes doing what you suggest not feasible, particularly if the content has to be manipulated in some way before being loaded into VRAM. In short, a disk read (regardless of where that disk is) always goes through system memory.
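The data path being described looks roughly like this. It's a sketch against the CUDA runtime with made-up names (vram_disk, staging, texture_in_vram); the actual GpuRamDrive + ImDisk combination goes through a Windows disk driver rather than calls like these, but the number of PCIe crossings is the point:

```cpp
// Rough data-path sketch of a read from a "VRAM disk"; illustrative names only.
#include <cuda_runtime.h>

void load_asset_from_vram_disk(void* vram_disk, size_t offset, size_t bytes,
                               void* texture_in_vram)
{
    // 1) The emulated "disk read": the sector contents must land in system
    //    memory first -- one PCIe crossing (device -> host).
    void* staging = nullptr;
    cudaMallocHost(&staging, bytes);
    cudaMemcpy(staging, static_cast<char*>(vram_disk) + offset, bytes,
               cudaMemcpyDeviceToHost);

    // 2) The game and driver then upload the asset like any other file read
    //    from disk -- a second PCIe crossing (host -> device).
    cudaMemcpy(texture_in_vram, staging, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(staging);
    // A RAM disk or an NVMe drive only pays the second crossing, which is the
    // "twice the number of transfers" point above.
}
```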

I get that it was done for giggles, but why do giggles require a news article? There's nothing new going on here.
 
It's interesting if you've been living under a rock. This isn't something new and it didn't get any better. It hurts performance because you're now sharing memory bandwidth with disk access in addition to GPU rendering. Loading content and rendering at the same time could cause performance degradation in both memory use and PCIe utilization, because content now has to travel in both directions, from and to the GPU. That's twice the number of transfers, because there is no way for the GPU driver to know what's in the part of VRAM being used as a disk. If it were a mere copy, then I'd agree, but it's not. Disk emulation is involved, which makes doing what you suggest not feasible, particularly if the content has to be manipulated in some way before being loaded into VRAM. In short, a disk read (regardless of where that disk is) always goes through system memory.

I get that it was done for giggles, but why do giggles require a news article? There's nothing new going on here.


Cause perhaps it will turn on a lightbulb for someone.

Also, bandwidth tests by our own W1zzard show the PCIe bus is barely used, and if the transfer from video memory to RAM is faster than from an SSD or NVMe drive without hindering the data flow between the CPU and GPU, it's interesting when you consider the new DMA features and what could happen if a CPU core were placed on the GPU. No more need for separate decompression and transfers, which is kind of one of the new things NVIDIA and AMD have been working on: instead of using the CPU to decompress data the GPU needs, load compressed textures into video memory and let the GPU handle decompression on the fly. If they do it with fine enough granularity, the GPU could directly fetch and decompress only the part of the texture it needs.
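For what it's worth, the decompress-on-the-GPU idea is easy to picture. The sketch below is not RTX IO or DirectStorage code (neither is public), just a toy: a run-length-encoded asset is copied to VRAM once in compressed form and a kernel expands it there, so the full-size data never has to cross PCIe. The format, the names, and the assumption that per-run output offsets were precomputed are all mine:

```cpp
// Toy "decompress in VRAM" example; invented RLE format, not RTX IO/DirectStorage.
#include <cuda_runtime.h>

struct Run { unsigned char value; unsigned int length; unsigned int out_offset; };

__global__ void rle_expand(const Run* runs, int num_runs, unsigned char* out)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= num_runs) return;
    const Run run = runs[r];
    for (unsigned int i = 0; i < run.length; ++i)  // each thread expands one run
        out[run.out_offset + i] = run.value;
}

// Host side: only the compressed runs cross PCIe; expansion happens in VRAM.
void decompress_on_gpu(const Run* host_runs, int num_runs, size_t decoded_bytes,
                       unsigned char** decoded_out)
{
    Run* d_runs = nullptr;
    cudaMalloc(&d_runs, num_runs * sizeof(Run));
    cudaMemcpy(d_runs, host_runs, num_runs * sizeof(Run), cudaMemcpyHostToDevice);

    cudaMalloc(decoded_out, decoded_bytes);
    const int threads = 256;
    const int blocks  = (num_runs + threads - 1) / threads;
    rle_expand<<<blocks, threads>>>(d_runs, num_runs, *decoded_out);
    cudaDeviceSynchronize();
    cudaFree(d_runs);
}
```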

I think it's cool, and it shows how much more the hardware we have is capable of, and how in a few years we may have a true "APU" of graphics cores intermixed with CPU cores sharing cache and a homogeneous pool of faster memory.
 
It's old news. But some newer PC guys&girls probably didn't know you could do this.
 
@lexluthermiester almost all your points are wrong...
Read a couple of posts before you. Also try benchmarking it.
Maybe with DirectStorage and stuff like RTX IO it will be great, but surely not with GpuRamDrive in its current form. I wonder if that tool lets you assign more MB than the VRAM available... because quite possibly if it gets full you end up in RAM anyway, and if that gets full you end up on NVMe/SSD/HDD or wherever your swap file is.
 
@lexluthermiester almost all your points are wrong...
Prove it.
Read a couple of posts before you.
Did that.
Maybe with DirectStorage and stuff like RTX IO it will be great, but surely not with GpuRamDrive in its current form. I wonder if that tool lets you assign more MB than the VRAM available... because quite possibly if it gets full you end up in RAM anyway, and if that gets full you end up on NVMe/SSD/HDD or wherever your swap file is.
You seem to be missing a few conceptual points. So before telling me I'm wrong on all points, do some research.
 
Except it's not novel. This has been around for a while.
Not to you maybe...
The only part that hasn't is this particular GPU.
And to be fair, no one has ever done this particular thing. Installing and running a game as big and complex as Crysis 3 from VRAM?

What I want to see is someone do something like this with one of those incoming 48GB/64GB Quadro cards. That would be fascinating!
 
Not to you maybe...

And to be fair, no one has ever done this particular thing. Installing and running a game as big and complex as Crysis 3 from VRAM?

What I want to see is someone do something like this with one of those incoming 48GB/64GB Quadro cards. That would be fascinating!
Would it though? I don't really see it changing anything. I still would expect a ram disk to be faster and cheaper. Latency doesn't disappear because you use a card with more VRAM.
 
But again, it is very interesting and novel.
Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
This was just an old amateur concept that hasn't been updated in several years (for a good reason). The main issue is that it still uses RAM for data exchange, so basically it works like a conventional RAM disk that needs slightly less memory space but uses the GPU as temporary storage. Adding several more steps to the read/write process only makes it drastically slower than a RAM disk (to the point where GDDR5 ends up slower than NVMe). I looked through the code, and even though I haven't touched CUDA or even C++ in years, I can already see some issues.
I'm sure there are better and more efficient ways to make this work, but I still don't see any reason to do so... Heck, NVMe has already saturated PCIe 3.0 bandwidth, and PCIe 4.0 isn't even in full swing yet. Regardless of how fast GDDR5/6/7... or HBM is on paper, it's only going to be that fast from the perspective of the GPU. For the rest of the system it's only going to be as fast as PCIe and a shitton of abstraction layers will allow it to be. Basically, what I'm trying to say is that you can't make it faster than NVMe RAID, even less so a RAM disk. That's why AMD stuck with hybrid solutions like the Radeon Pro SSG. At least for now that approach makes a bit more sense, when you actually need to have "storage" on the GPU.
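The "slower than NVMe" part is mostly a per-operation overhead problem rather than a bandwidth one. Here's a minimal sketch of that argument using the CUDA runtime; the 4 KiB "sector" size and the counts are my own assumptions, and the real tool adds filesystem and ImDisk layers on top of every one of these transfers:

```cpp
// Per-operation overhead sketch: many small device->host reads vs one big one.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t sector = 4 * 1024;           // a small "disk sector" read
    const int    count  = 16 * 1024;          // 16k sectors = 64 MiB in total
    void *host, *dev;
    cudaMallocHost(&host, sector * count);    // pinned host buffer
    cudaMalloc(&dev, sector * count);

    cudaEvent_t a, b;
    cudaEventCreate(&a);
    cudaEventCreate(&b);

    // Many small device->host copies, like scattered 4 KiB file reads.
    cudaEventRecord(a);
    for (int i = 0; i < count; ++i)
        cudaMemcpy(static_cast<char*>(host) + i * sector,
                   static_cast<char*>(dev)  + i * sector,
                   sector, cudaMemcpyDeviceToHost);
    cudaEventRecord(b);
    cudaEventSynchronize(b);
    float ms_small = 0.0f;
    cudaEventElapsedTime(&ms_small, a, b);

    // One large device->host copy of the same total size.
    cudaEventRecord(a);
    cudaMemcpy(host, dev, sector * count, cudaMemcpyDeviceToHost);
    cudaEventRecord(b);
    cudaEventSynchronize(b);
    float ms_big = 0.0f;
    cudaEventElapsedTime(&ms_big, a, b);

    printf("16k x 4 KiB reads: %.2f ms, single 64 MiB read: %.2f ms\n",
           ms_small, ms_big);
    return 0;
}
```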
 
Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
This was just an old amateur concept that hasn't been updated in several years (for a good reason). The main issue is that it still uses RAM for data exchange, so basically it works like a conventional RAM disk that needs slightly less memory space but uses the GPU as temporary storage. Adding several more steps to the read/write process only makes it drastically slower than a RAM disk (to the point where GDDR5 ends up slower than NVMe). I looked through the code, and even though I haven't touched CUDA or even C++ in years, I can already see some issues.
I'm sure there are better and more efficient ways to make this work, but I still don't see any reason to do so... Heck, NVMe has already saturated PCIe 3.0 bandwidth, and PCIe 4.0 isn't even in full swing yet. Regardless of how fast GDDR5/6/7... or HBM is on paper, it's only going to be that fast from the perspective of the GPU. For the rest of the system it's only going to be as fast as PCIe and a shitton of abstraction layers will allow it to be. Basically, what I'm trying to say is that you can't make it faster than NVMe RAID, even less so a RAM disk. That's why AMD stuck with hybrid solutions like the Radeon Pro SSG. At least for now that approach makes a bit more sense, when you actually need to have "storage" on the GPU.
Precisely. If people really care about making things go fast, a RAM disk is the way to do it. Nothing, and I mean nothing, will be faster than direct access to physical memory. There is no interconnect that has lower latency and higher bandwidth than accessing DRAM directly. There just isn't. Here's the rub, though: even that doesn't matter, because you still need to copy game data from somewhere to put it into a RAM disk or a "VRAM disk". A new game or a restart means a new copy. You're still constrained by the media the game data is on, and you have to wait longer to get going.

So, in summary:

Fun level of doing this if it's novel to you and you get excited by this kind of thing: High
Practical usefulness of doing this when you have an NVMe drive: Never
 
Whoa, I never knew about this GPU RAM drive... I knew you could do that kind of thing in Linux, but I hadn't ever seen it in Windows. Well, that's pretty damn cool; I've been looking for this sort of thing on Windows for quite a few years and didn't know someone had finally come up with a solution.

Precisely. If people really care about making things go fast, a RAM disk is the way to do it. Nothing, and I mean nothing, will be faster than direct access to physical memory. There is no interconnect that has lower latency and higher bandwidth than accessing DRAM directly. There just isn't. Here's the rub, though: even that doesn't matter, because you still need to copy game data from somewhere to put it into a RAM disk or a "VRAM disk". A new game or a restart means a new copy. You're still constrained by the media the game data is on, and you have to wait longer to get going.

So, in summary:

Fun level of doing this if it's novel to you and you get excited by this kind of thing: High
Practical usefulness of doing this when you have an NVMe drive: Never
I think you're overlooking the CPU and memory overhead of an actual system-RAM-based RAM disk. This would still have some of that while loading up the VRAM initially, but after that it would pretty much run off its own GPU resources, and that's part of the beauty of it. Hell, with a 24 GB GPU you could probably load a copy of Windows 10 onto it, especially since it's paired with ImDisk to begin with, which is VHD friendly. It might not work with a fully patched Windows 10 Pro, or perhaps not without stripping down a few parts of it at least; it kind of rides the line between feasible and infeasible for that purpose, though Windows 10 Home is a bit trimmed down anyway and should work. There's still no denying it's really interesting. You could use it for Prefetch/ReadyBoot.etl, or most likely virtual memory, assuming the page file can be pointed at it. I imagine StoreMI or PrimoCache would play nice with it as well. If I'm not mistaken, once the data is copied to this type of VRAM device it should actually be the quicker of the two compared to system memory, and if that's the case this isn't bad at all. I don't believe NVMe is going to have the I/O of this kind of device; much like it can't come close to competing with system memory in that area, it gets trounced.
 
You two can argue how pointless or uninteresting it is till the cows come home. The rest of us will continue to find it interesting.
I'm not arguing, I'm agreeing. Once again:
Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
 
I wonder what happens if you SLI two cards: does bandwidth and/or capacity increase? Something certainly worth noting is that it could be used on an older system; hell, on LGA775 with a DDR2 board it might even be faster than the system memory, crazy as that use case is. I think just the fact that it can be done and potentially has an upside to it is intriguing enough. I like the idea of it with PrimoCache or StoreMI, especially for a hybrid cache, and the Prefetch/ReadyBoot/boost/shadow cache as well as the controversial page file aren't terrible uses either. I'd like to see the ATTO benchmark results, both for throughput and I/O; the latter in particular is really interesting to look at and compare to other storage options like NVMe and a RAM disk as well as SATA. Perhaps it's not very practical, but it is really intriguing.
 
Whoa, I never knew about this GPU RAM drive... I knew you could do that kind of thing in Linux, but I hadn't ever seen it in Windows. Well, that's pretty damn cool; I've been looking for this sort of thing on Windows for quite a few years and didn't know someone had finally come up with a solution.

I think you're overlooking the CPU and memory overhead of an actual system-RAM-based RAM disk. This would still have some of that while loading up the VRAM initially, but after that it would pretty much run off its own GPU resources, and that's part of the beauty of it. Hell, with a 24 GB GPU you could probably load a copy of Windows 10 onto it, especially since it's paired with ImDisk to begin with, which is VHD friendly. It might not work with a fully patched Windows 10 Pro, or perhaps not without stripping down a few parts of it at least; it kind of rides the line between feasible and infeasible for that purpose, though Windows 10 Home is a bit trimmed down anyway and should work. There's still no denying it's really interesting. You could use it for Prefetch/ReadyBoot.etl, or most likely virtual memory, assuming the page file can be pointed at it. I imagine StoreMI or PrimoCache would play nice with it as well. If I'm not mistaken, once the data is copied to this type of VRAM device it should actually be the quicker of the two compared to system memory, and if that's the case this isn't bad at all. I don't believe NVMe is going to have the I/O of this kind of device; much like it can't come close to competing with system memory in that area, it gets trounced.
That isn't how it works. You're welcome to prove me wrong by demonstrating how it's possible by actually doing it.
 
Wonder what happens if you SLI two cards does bandwidth and/or capacity increase?
SLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.

Something certainly worth noting is that it could be used on an older system; hell, on LGA775 with a DDR2 board it might even be faster than the system memory, crazy as that use case is.
Re-read my post above. All of your data is going through RAM either way. Plus, those old "DDR2" boards usually have PCIe 1.1, which is another perf gimp. This concept is physically incapable of being faster than RAM disk on any given machine, just because of the way it works.
 
SLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
Doesn't modern NV-Link allow pooling of memory?
 
Tesla K80s aren't really that expensive on eBay these days; shame this isn't very effective. On the plus side, it should be more reliable than flash storage. Unfortunately it just makes no sense over system memory, from what seems to be indicated; a novelty parlor trick at best, I guess.
 
Doesn't modern NV-Link allow pooling of memory?
Yes, it does, but it will make little to no difference. In a typical PC, NVLink only helps the GPUs talk to each other, and the CPU still uses the PCIe bus to talk to the GPUs.
For this particular case it'll be exactly the same as if you had multi-GPU without any bridges (i.e. you can create individual vRAM disks, but can't combine them). The only way around it is to create a storage pool out of several vRAM disks (Windows Storage Spaces), but I'm not sure that would even work for these.
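As a rough picture of what pooling would mean underneath, here's a sketch of striping one logical "disk" across two GPUs' memory with the CUDA runtime. The names are invented and this is not how GpuRamDrive or Storage Spaces actually works; it just shows that even a pooled device still pushes every access over PCIe:

```cpp
// Toy RAID-0-style striping of a logical "disk" across two GPUs' memory.
#include <cuda_runtime.h>

struct StripedVramDisk {
    void*  dev_buf[2];    // one allocation per GPU
    size_t stripe;        // stripe size in bytes
};

StripedVramDisk create_disk(size_t bytes_per_gpu, size_t stripe)
{
    StripedVramDisk d{};
    d.stripe = stripe;
    for (int gpu = 0; gpu < 2; ++gpu) {
        cudaSetDevice(gpu);
        cudaMalloc(&d.dev_buf[gpu], bytes_per_gpu);
    }
    return d;
}

// Write a block at a logical offset: even stripes land on GPU 0, odd on GPU 1.
void striped_write(StripedVramDisk& d, size_t offset, const void* src, size_t bytes)
{
    size_t done = 0;
    while (done < bytes) {
        const size_t stripe_idx = (offset + done) / d.stripe;
        const int    gpu        = static_cast<int>(stripe_idx % 2);
        const size_t in_stripe  = (offset + done) % d.stripe;
        size_t       chunk      = d.stripe - in_stripe;
        if (chunk > bytes - done) chunk = bytes - done;

        cudaSetDevice(gpu);   // every chunk still crosses PCIe to its GPU
        cudaMemcpy(static_cast<char*>(d.dev_buf[gpu]) + (stripe_idx / 2) * d.stripe + in_stripe,
                   static_cast<const char*>(src) + done,
                   chunk, cudaMemcpyHostToDevice);
        done += chunk;
    }
}
```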
 
SLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
Your analogy is VERY flawed. If you were to compare SLI to RAID, it would be RAID0 as you are adding the capacity of one card to another not mirroring one card with another as would be done with RAID1. And yes, the VRAM doubles. In the case of the RTX3090, 24GB + 24GB = 48GB.
 