Tuesday, October 6th 2020

Crysis 3 Installed On and Run Directly from RTX 3090 24 GB GDDR6X VRAM

Let's skip ahead of any "Can it run Crysis" introductions for this news piece, and instead state it as it is: Crysis 3 can absolutely run when installed directly on a graphics card's memory subsystem. In this case, an RTX 3090 and its gargantuan 24 GB of GDDR6X memory were the playground for such an experiment. Using the "VRAM Drive" application, distributed in an open-source manner via the GitHub platform, one can allocate part of their GPU's VRAM and use it as if it were just another system drive. After doing so, user Strife212 (as per her Twitter handle) went on to install Crysis 3 on 15 GB of the allocated VRAM. The remaining 9 GB of the card's memory were then available to actually load in graphical assets for the game, and VRAM consumption (of both the installed game and its running assets) barely crossed 20 GB of total VRAM utilization.
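The mechanism is simple enough to sketch. Below is a minimal, purely illustrative CUDA C++ outline of the idea, not GpuRamDrive's actual code (the VramDisk type and its functions are hypothetical names): a buffer is allocated in device memory up front, and every "disk" read or write is serviced by copying between that buffer and host memory across the PCIe bus.

// Rough sketch of the idea behind a VRAM-backed drive (not GpuRamDrive's
// actual code; names here are illustrative). A chunk of device memory is
// allocated up front, and every "disk" read/write is just a cudaMemcpy
// across PCIe between that buffer and a host-side request buffer.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

struct VramDisk {
    uint8_t* dev = nullptr;   // backing store living in GPU memory
    size_t   size = 0;

    bool create(size_t bytes) {
        size = bytes;
        return cudaMalloc((void**)&dev, bytes) == cudaSuccess;
    }
    // "Sector" read: device -> host
    bool read(size_t offset, void* dst, size_t bytes) {
        return cudaMemcpy(dst, dev + offset, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // "Sector" write: host -> device
    bool write(size_t offset, const void* src, size_t bytes) {
        return cudaMemcpy(dev + offset, src, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }
    void destroy() { cudaFree(dev); dev = nullptr; }
};

int main() {
    VramDisk disk;
    if (!disk.create(15ull << 30)) {               // ~15 GB, as in the experiment
        std::puts("not enough free VRAM");
        return 1;
    }
    char block[4096] = "hello from VRAM";
    disk.write(0, block, sizeof(block));            // every request crosses PCIe
    disk.read(0, block, sizeof(block));
    std::printf("%s\n", block);
    disk.destroy();
}

In the real tool, those read/write callbacks are wired up to a virtual disk driver (the comments below mention ImDisk), which is what lets Windows format the buffer and install a game onto it.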

As you might expect, graphics memory is one of the fastest memory subsystems in your PC, being even faster (in pure bandwidth terms) than system RAM. Loading game levels and streaming assets from the "disk-sequestered" VRAM pool to the free VRAM pool was obviously much faster than usual, exceeding the speeds achieved by today's NVMe drives. Crysis 3 in this configuration was shown to run at up to 75 FPS at 4K resolution with the High preset settings. This is a proof of concept more than anything, but users with a relatively powerful (or memory-capable) graphics card can perhaps look at this exotic solution as a compromise of sorts, should they not have any fast storage options and provided the game's install size is relatively small.
Sources: Strife212 @ Twitter, via Tom's Hardware

70 Comments on Crysis 3 Installed On and Run Directly from RTX 3090 24 GB GDDR6X VRAM

#26
Aquinus
Resident Wat-man
lexluthermiesterBecause it's very interesting! That's why.

Which would happen with ANY storage. VRAM is literally the fastest storage you can buy.

How? HDD's are much slower and SSD's, even the fastest, are still slow in comparison to VRAM. How is VRAM going to "hurt" performance?

You assume that game assets are not transferred directly to other sections of VRAM through special instruction operations, which would not be difficult. However, you have a point with the ram-disk.

While that would be true if the VRAM could not do transfers directly to itself, it can and with this VRAM-disk scheme likely does. This is an experimental thing. It's not being done because it's practical, it's being done for giggles. Lighten up a little bit.
It's interesting if you've been living under a rock. This isn't something new and it didn't get any better. It hurts performance because you're now sharing memory bandwidth with disk access in addition to GPU rendering. Loading content and rendering at the same time could cause a performance degradation between memory use and PCIe utilization, because now content has to travel in both directions from and to the GPU. That's twice the number of transfers, because there is no way for the GPU driver to know what's on the part of VRAM being used as a disk. If it were a mere copy, then I'd agree, but it's not. Disk emulation is involved, which makes doing what you suggest not feasible, particularly if the content has to be manipulated in some way before being loaded into VRAM. In short, a disk read (regardless of where that disk is) always goes through system memory.
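To put that in concrete, if hypothetical, terms: loading a single asset from the VRAM "disk" boils down to something like the CUDA sketch below. This is illustrative only, not what the driver or the disk emulation literally executes, but the two copies are the point.

// Illustrative only: what loading one asset from the VRAM "disk" boils down
// to. The file read pulls the data GPU -> system RAM (that's what the disk
// emulation does), then the graphics driver pushes it RAM -> GPU again as a
// texture upload. Two PCIe trips for data that started out in VRAM.
#include <cuda_runtime.h>
#include <vector>
#include <cstdint>

int main() {
    const size_t assetBytes = 64ull << 20;          // a 64 MB asset, say
    uint8_t* vramDisk = nullptr;                    // region used as the "disk"
    uint8_t* textureMem = nullptr;                  // region used for rendering
    cudaMalloc((void**)&vramDisk, assetBytes);
    cudaMalloc((void**)&textureMem, assetBytes);

    std::vector<uint8_t> systemRam(assetBytes);     // staging in host memory

    // 1) "Disk read": VRAM -> system RAM over PCIe
    cudaMemcpy(systemRam.data(), vramDisk, assetBytes, cudaMemcpyDeviceToHost);
    // (any CPU-side decompression/parsing of the asset would happen here)
    // 2) Texture upload: system RAM -> VRAM over PCIe, again
    cudaMemcpy(textureMem, systemRam.data(), assetBytes, cudaMemcpyHostToDevice);

    cudaFree(vramDisk);
    cudaFree(textureMem);
}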

I get that it was done for giggles, but why do giggles require a news article? There's nothing new going on here.
Posted on Reply
#27
Steevo
AquinusIt's interesting if you've been living under a rock. This isn't something new and it didn't get any better. It hurts performance because you're now sharing memory bandwidth with disk access in addition to GPU rendering. Loading content and rendering at the same time could cause a performance degradation between memory use and PCIe utilization, because now content has to travel in both directions from and to the GPU. That's twice the number of transfers, because there is no way for the GPU driver to know what's on the part of VRAM being used as a disk. If it were a mere copy, then I'd agree, but it's not. Disk emulation is involved, which makes doing what you suggest not feasible, particularly if the content has to be manipulated in some way before being loaded into VRAM. In short, a disk read (regardless of where that disk is) always goes through system memory.

I get that it was done for giggles, but why do giggles require a news article? There's nothing new going on here.
Cause perhaps it will turn on a lightbulb for someone.

Also, bandwidth tests by our own W1zzard show the PCIe bus is barely used, and if the transfer from Vmem to RAM is faster than from an SSD or NVMe drive, while not hindering the flow of data between the CPU and GPU, it's interesting when you consider the new DMA approaches and what could happen if a CPU core were placed on the GPU. No more need for CPU-side decompression and extra transfers. Which is kinda one of the new things Nvidia and AMD have been working on: instead of using the CPU to decompress the data the GPU needs, load compressed textures into Vmem and let the GPU handle decompression on the fly, and if they do it with fine enough granularity the GPU could directly fetch and decompress only the part of the texture needed.
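Purely as a hypothetical sketch of that idea, in CUDA terms: ship the compressed asset straight into VRAM and expand it there, so the CPU never touches the payload. The kernel below is a stand-in that just duplicates bytes, not a real codec, and nothing here is what RTX IO or DirectStorage actually ships.

// Hypothetical sketch of GPU-side decompression: the compressed payload goes
// to VRAM in one PCIe trip and is expanded in place by a kernel, with no CPU
// round trip. fakeDecompress is a placeholder, not a real decompressor.
#include <cuda_runtime.h>
#include <vector>
#include <cstdint>

__global__ void fakeDecompress(const uint8_t* in, uint8_t* out, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // each input byte "expands" to two output bytes
        out[2 * i]     = in[i];
        out[2 * i + 1] = in[i];
    }
}

int main() {
    const size_t compressedBytes = 8ull << 20;       // 8 MB "compressed" asset
    std::vector<uint8_t> hostCompressed(compressedBytes, 0xAB);

    uint8_t *dIn = nullptr, *dOut = nullptr;
    cudaMalloc((void**)&dIn, compressedBytes);
    cudaMalloc((void**)&dOut, compressedBytes * 2);

    // One PCIe trip: the compressed payload goes straight to VRAM...
    cudaMemcpy(dIn, hostCompressed.data(), compressedBytes, cudaMemcpyHostToDevice);
    // ...and is expanded on the GPU, no CPU decompression step.
    const int threads = 256;
    const int blocks  = int((compressedBytes + threads - 1) / threads);
    fakeDecompress<<<blocks, threads>>>(dIn, dOut, compressedBytes);
    cudaDeviceSynchronize();

    cudaFree(dIn);
    cudaFree(dOut);
}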

I think it's cool, and it shows how much more the hardware we have is capable of, and how in a few years we may have a true "APU" of graphics cores intermixed with CPU cores sharing cache and a homogeneous pool of faster memory.
Posted on Reply
#28
Th3pwn3r
It's old news. But some newer PC guys&girls probably didn't know you could do this.
Posted on Reply
#29
lexluthermiester
AquinusI get that it was done for giggles, but why do giggles require a news article? There's nothing new going on here.
But again, it is very interesting and novel.
Posted on Reply
#30
rutra80
@lexluthermiester almost all your points are wrong...
Read a couple of posts before you. Also try benchmarking it.
Maybe with DirectStorage and stuff like RTX IO it will be great, but surely not with GpuRamDrive in its current form. I wonder if that tool lets you assign more MB than the VRAM available ...because quite possibly if it gets full you end up in RAM anyway, and if that gets full you end up on NVMe/SSD/HDD or wherever your swap file is.
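For what it's worth, checking how much VRAM is actually free before carving a drive out of it is trivial from the CUDA side. Rough sketch below (illustrative; whether GpuRamDrive itself does this check, or what it falls back to when it can't, is exactly what I'm wondering about):

// Quick way to see how much VRAM is actually free before trying to carve a
// "drive" out of it.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    std::printf("VRAM free: %.2f GB of %.2f GB\n",
                freeBytes / 1073741824.0, totalBytes / 1073741824.0);

    const size_t wanted = 15ull << 30;               // e.g. a 15 GB drive
    if (wanted > freeBytes) {
        std::puts("Requested drive size exceeds free VRAM; a plain cudaMalloc "
                  "of this size would simply fail rather than spill to RAM.");
    }
}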
Posted on Reply
#31
lexluthermiester
rutra80@lexluthermiester almost all your points are wrong...
Prove it.
rutra80Read a couple of posts before you.
Did that.
rutra80Maybe with DirectStorage and stuff like RTX IO it will be great, but surely not with GpuRamDrive in its current form. I wonder if that tool lets you assign more MB than the VRAM available ...because quite possibly if it gets full you end up in RAM anyway, and if that gets full you end up on NVMe/SSD/HDD or wherever your swap file is.
You seem to be missing a few conceptual points. So before telling me I'm wrong on all points, do some research.
Posted on Reply
#32
Aquinus
Resident Wat-man
lexluthermiesterand novel.
Except it's not novel. This has been around for a while. The only part that is novel is this particular GPU.
Posted on Reply
#33
lexluthermiester
AquinusExcept it's not novel. This has been around for a while.
Not to you maybe...
AquinusThe only part that is novel is this particular GPU.
And to be fair, no one has ever done this particular thing. Installing and running a game as big and complex as Crysis3 from VRAM?

What I want to see is someone do something like this with one of those incoming 48GB/64GB Quadro cards. That would be fascinating!
Posted on Reply
#34
Aquinus
Resident Wat-man
lexluthermiesterNot to you maybe...

And to be fair, no one has ever done this particular thing. Installing and running a game as big and complex as Crysis3 from VRAM?

What I want to see is someone do something like this with one of those incoming 48GB/64GB Quadro cards. That would be fascinating!
Would it though? I don't really see it changing anything. I still would expect a ram disk to be faster and cheaper. Latency doesn't disappear because you use a card with more VRAM.
Posted on Reply
#35
silentbogo
lexluthermiesterBut again, it is very interesting and novel.
Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
This was just an old amateur concept that hasn't been updated in several years (for a good reason). The main issue is that it still uses RAM for data exchange, so basically it works like a conventional RAM disk that needs slightly less memory space, but uses the GPU as temporary storage. Adding several more steps to the read/write process only makes it drastically slower than a RAM disk (to the point where GDDR5 is slower than NVMe). I looked through that code and even though I haven't touched CUDA or even C++ in years, I can already see some issues.
I'm sure there are much better and more efficient ways to make this work, but I still don't see any reason to do so... Heck, NVMe has already saturated PCIe 3.0 bandwidth, and PCIe 4.0 isn't even in full swing yet. Regardless of how fast GDDR5/6/7... or HBM is on paper, it's only gonna be that fast from the perspective of the GPU. For the rest of the system it's gonna be only as fast as PCIe and a shitton of abstraction layers will allow it to be. Basically, what I'm trying to say is that you can't make it faster than NVMe RAID, let alone a RAM disk. That's why AMD stuck to their guns with hybrid solutions, like the Radeon Pro SSG. At least for now this approach makes a bit more sense, when you actually need to have "storage" on the GPU.
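If anyone wants to see that ceiling for themselves, timing a plain host-to-device copy with CUDA events shows what the rest of the system actually gets out of the VRAM. Illustrative sketch, and the numbers will obviously vary with the platform:

// Measure what the rest of the system sees when it talks to VRAM: time a
// host -> device copy with CUDA events. On a PCIe 3.0 x16 slot this lands
// far below what the GDDR6X can do on the card itself.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdint>

int main() {
    const size_t bytes = 1ull << 30;                 // 1 GB test buffer
    uint8_t* hostBuf = nullptr;
    uint8_t* devBuf = nullptr;
    cudaMallocHost((void**)&hostBuf, bytes);         // pinned host memory
    cudaMalloc((void**)&devBuf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("Host -> device: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
}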
Posted on Reply
#36
Aquinus
Resident Wat-man
silentbogoDrawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
This was just an old amateur concept that hasn't been updated in several years (for a good reason). The main issue is that it still uses RAM for data exchange, so basically it works like a conventional RAM disk that needs slightly less memory space, but uses the GPU as temporary storage. Adding several more steps to the read/write process only makes it drastically slower than a RAM disk (to the point where GDDR5 is slower than NVMe). I looked through that code and even though I haven't touched CUDA or even C++ in years, I can already see some issues.
I'm sure there are much better and more efficient ways to make this work, but I still don't see any reason to do so... Heck, NVMe has already saturated PCIe 3.0 bandwidth, and PCIe 4.0 isn't even in full swing yet. Regardless of how fast GDDR5/6/7... or HBM is on paper, it's only gonna be that fast from the perspective of the GPU. For the rest of the system it's gonna be only as fast as PCIe and a shitton of abstraction layers will allow it to be. Basically, what I'm trying to say is that you can't make it faster than NVMe RAID, let alone a RAM disk. That's why AMD stuck to their guns with hybrid solutions, like the Radeon Pro SSG. At least for now this approach makes a bit more sense, when you actually need to have "storage" on the GPU.
Precisely. If people really care about making things go fast, a ram disk is the way to do it. Nothing, and I mean nothing, will be faster than direct access to physical memory. There is no interconnect that has lower latency and higher bandwidth than accessing DRAM directly. There just isn't. Here's the rub though: even that doesn't matter, because you still need to copy game data from somewhere to put it into a ram disk or a "vram disk". A new game or a restart means a new copy. You're still constrained by the media that the game data is on, and you have to wait longer to get going.

So, in summary:

Fun level of doing this if it's novel to you and you get excited by this kind of thing: High
Practical usefulness of doing this when you have an NVMe drive: Never
Posted on Reply
#37
InVasMani
Whoa, I never knew about this GPU RAM Drive... I knew you could do that kind of thing in Linux, but I hadn't ever seen this in Windows. Well, that's pretty damn cool. I've been looking for this sort of thing on Windows for quite a few years and didn't know someone had finally come up with a solution.
AquinusPrecisely. If people really care about making things go fast, a ram disk is the way to do it. Nothing, and I mean nothing, will be faster than direct access to physical memory. There is no interconnect that has lower latency and higher bandwidth than accessing DRAM directly. There just isn't. Here's the rub though: even that doesn't matter, because you still need to copy game data from somewhere to put it into a ram disk or a "vram disk". A new game or a restart means a new copy. You're still constrained by the media that the game data is on, and you have to wait longer to get going.

So, in summary:

Fun level of doing this if it's novel to you and you get excited by this kind of thing: High
Practical usefulness of doing this when you have an NVMe drive: Never
I think you're overlooking the CPU and memory overhead of an actual system-based RAM disk. This would still have some of that while loading up the VRAM initially, but after that it would pretty much run off its own GPU resources, and that's part of the beauty of it. Hell, with a 24 GB GPU you could probably load a copy of Windows 10 onto it, especially since it's paired with ImDisk to begin with, which is VHD friendly. It might not work with a fully patched Windows 10 Pro, or perhaps not without stripping down a few parts of it at least. It kind of rides the line between feasible and infeasible for that purpose, though Windows 10 Home edition would be a bit trimmed down anyway and should work. There is still no denying it's really interesting. You could utilize it for Prefetch/ReadyBoot.etl, or most likely virtual memory, assuming that can be pointed to it. I imagine StoreMi or PrimoCache would play nice with it as well. If I'm not mistaken, once the data is copied to this type of VRAM device it should actually be quicker than system memory between the two, and if that's the case this isn't bad at all. NVMe, I don't believe, is going to have the I/O of this kind of device; much like it can't come close to competing with system memory in that area, it gets trounced.
Posted on Reply
#39
Aquinus
Resident Wat-man
lexluthermiesterYou two can argue how pointless or uninteresting it is till the cows come home. The rest of us will continue to find it interesting.
I'm not arguing, I'm agreeing. Once again:
silentbogoDrawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
Posted on Reply
#40
InVasMani
I wonder what happens if you SLI two cards; does bandwidth and/or capacity increase? Something certainly worth noting is that it could be used on an older system. Hell, on LGA775, if you were on a DDR2 board it might even be faster than the system memory, crazy as that use case is. I think just the fact that it can be done and potentially has an upside to it is intriguing enough. I like the idea of it with PrimoCache or StoreMi, especially for a hybrid cache, and the Prefetch/ReadyBoot/boost/shadow cache, as well as the controversial page file, isn't a terrible use either. I'd like to see the ATTO benchmark, both for bytes and I/O; the latter in particular is really interesting to look at and compare to other storage options like NVMe and a RAM disk as well as SATA. Perhaps it's not very practical, though it is really intriguing.
Posted on Reply
#41
Aquinus
Resident Wat-man
InVasManiWhoa, I never knew about this GPU RAM Drive... I knew you could do that kind of thing in Linux, but I hadn't ever seen this in Windows. Well, that's pretty damn cool. I've been looking for this sort of thing on Windows for quite a few years and didn't know someone had finally come up with a solution.

I think you're overlooking the CPU and memory overhead of an actual system-based RAM disk. This would still have some of that while loading up the VRAM initially, but after that it would pretty much run off its own GPU resources, and that's part of the beauty of it. Hell, with a 24 GB GPU you could probably load a copy of Windows 10 onto it, especially since it's paired with ImDisk to begin with, which is VHD friendly. It might not work with a fully patched Windows 10 Pro, or perhaps not without stripping down a few parts of it at least. It kind of rides the line between feasible and infeasible for that purpose, though Windows 10 Home edition would be a bit trimmed down anyway and should work. There is still no denying it's really interesting. You could utilize it for Prefetch/ReadyBoot.etl, or most likely virtual memory, assuming that can be pointed to it. I imagine StoreMi or PrimoCache would play nice with it as well. If I'm not mistaken, once the data is copied to this type of VRAM device it should actually be quicker than system memory between the two, and if that's the case this isn't bad at all. NVMe, I don't believe, is going to have the I/O of this kind of device; much like it can't come close to competing with system memory in that area, it gets trounced.
That isn't how it works. You're welcome to prove me wrong by demonstrating how it's possible by actually doing it.
Posted on Reply
#42
silentbogo
InVasManiI wonder what happens if you SLI two cards; does bandwidth and/or capacity increase?
SLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
InVasManiSomething certainly worth noting is that it could be used on an older system. Hell, on LGA775, if you were on a DDR2 board it might even be faster than the system memory, crazy as that use case is.
Re-read my post above. All of your data is going through RAM either way. Plus, those old "DDR2" boards usually have PCIe 1.1, which is another perf gimp. This concept is physically incapable of being faster than RAM disk on any given machine, just because of the way it works.
Posted on Reply
#43
bubbleawsome
silentbogoSLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
Doesn't modern NV-Link allow pooling of memory?
Posted on Reply
#44
InVasMani
Tesla K80s aren't really that expensive on eBay these days; shame this isn't very effective. On the plus side, over flash storage it should be more reliable between the two. Unfortunately it just makes no sense over system memory, from what seems to be indicated; a novelty parlor trick at best, I guess.
Posted on Reply
#45
silentbogo
bubbleawsomeDoesn't modern NV-Link allow pooling of memory?
Yes, it does, but it will make little to no difference. In a typical PC, NVLink only helps the GPUs talk to each other, and the CPU still uses the PCIe bus to talk to the GPUs.
For this particular case it'll be exactly the same as if you had multi-GPU without any bridges (e.g. you can create individual vRAM disks, but can't combine them). The only way around it is to create a storage pool out of several vRAM disks (Windows Storage Spaces), but I'm not sure if it'll even work for these.
Posted on Reply
#46
lexluthermiester
silentbogoSLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
Your analogy is VERY flawed. If you were to compare SLI to RAID, it would be RAID0 as you are adding the capacity of one card to another not mirroring one card with another as would be done with RAID1. And yes, the VRAM doubles. In the case of the RTX3090, 24GB + 24GB = 48GB.
Posted on Reply
#47
EarthDog
silentbogoSLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
...you are correct in that it mirrors the memory on each card.
lexluthermiesterYour analogy is VERY flawed. If you were to compare SLI to RAID, it would be RAID0 as you are adding the capacity of one card to another not mirroring one card with another as would be done with RAID1. And yes, the VRAM doubles. In the case of the RTX3090, 24GB + 24GB = 48GB.
RAID1 is accurate for the RAM... RAID0 is accurate for the GPU itself, lol. The memory is mirrored, not unique and not combined/pooled. Each GPU has its own frame buffer with the same rendering and geometry information on each card (the same data). In the case of the RTX 3090, you still have a pool of 24GB to work with since the same data is mirrored on the second card.

......at least, that is how SLI worked through Turing..... did it change with Ampere? (gaming/SLI, not compute note)
Posted on Reply
#48
lexluthermiester
EarthDogRAID1 is accurate for the RAM...
Incorrect.
EarthDogThe memory is mirrored, not unique and not combined/pooled.
That is not how SLI works. One GPU (and its RAM) draws one part of the screen and the other GPU draws a different part of the screen before both parts are sent to the framebuffer. VRAM usage is completely independent.
EarthDogIn the case of the RTX 3090, you still have a pool of 24GB to work with since the same data is mirrored on the second card.
If you're adding one framebuffer to another the combined total is double.
EarthDog......at least, that is how SLI worked through Turing..... did it change with Ampere? (gaming/SLI, not compute note)
The way SLI has worked since NVidia bought out 3DFX is that it's not a scanline offset rendering scheme anymore. It has not changed dramatically since then. NVidia's SLI works by the primary card (which is always the card connected to the display) assigning workloads for itself and the slave card to do. Each card renders a section of the screen and moves it to the framebuffer (which always resides on the primary card) through the SLI bridge. Each card uses its own VRAM exclusively and the VRAM is always additive. So in the case of 3090's in SLI, 24GB + 24GB does = 48GB.
Posted on Reply
#49
EarthDog
lexluthermiesterIncorrect.

That is not how SLI works. One GPU (and its RAM) draws one part of the screen and the other GPU draws a different part of the screen before both parts are sent to the framebuffer. VRAM usage is completely independent.

If you're adding one framebuffer to another the combined total is double.


The way SLI has worked since NVidia bought out 3DFX is that it's not a scanline offset rendering scheme anymore. NVidia's SLI works by the primary card (which is always the card connected to the display) assigning workloads for itself and the slave card to do. Each card renders a section of the screen and moves it to the framebuffer (which always resides on the primary card) through the SLI bridge. Each card uses its own VRAM exclusively and the VRAM is always additive. So in the case of 3090's in SLI, 24GB + 24GB does = 48GB.
My guy... the data is mirrored in typical (gaming) SLI... it does not pool, it does not double. In other words... yes, you have two 24GB cards, but each card has the same data in it, so you get zero benefit from a pooled set of VRAM. It doesn't work that way. Each card reads its own VRAM... it is not a shared pool of 48GB. It is not "additive".

Please, go look online to confirm. Here's a start. :)

www.build-gaming-computers.com/sli-performance-increase.html#:~:text=SLI%20Myth%20%236%3A%20SLI%20Doubles%20VRAM&text=So%20to%20set%20the%20record,or%20added%2C%20but%20instead%20copied.
SLI Myth #6: SLI Doubles VRAM

Many gamers will also be aware of this one, but like the myth of doubling performance it's easy to fall prey to this misconception because it also seems quite logical on the surface when you think about it.

So to set the record straight, no, SLI does not double your available VRAM (Video Memory).

The VRAM between a multiple video card system isn't shared or added, but instead copied. What I mean is that say you have two 8GB video cards in SLI.

Instead of now having 16GB, you still only have access to 8GB, as during processing the data in the first GPU is copied to the second GPU.

So your system only ever uses 8GB at one time.
www.wepc.com/tips/what-is-sli/
Seeing Double
A common misconception about SLI is that you can get double, triple, or even quadruple video RAM with more graphics cards. Unfortunately, Nvidia SLI only uses the RAM from one card, as each card needs to access the same information at the same time.
If you find something different, feel free to post it. But those links go on for days and back a decade. ;)

Edit: I vaguely recall DX12 supposedly being able to pool it, but... I can't find a thing that's concrete... people are saying the same thing but get shut down left and right.

EDIT2: From Nvidia (circa 2012, lol) - nvidia.custhelp.com/app/answers/detail/a_id/153/~/in-sli-mode%2C-is-memory-shared-%28i.e-do-both-2gb-cards-become-a-4gb
In SLI or Multi GPU mode, is memory shared (i.e do both 2GB cards become a 4GB configuration)?
No, each GPU maintains its own frame-buffer so you will not double your memory. Rendering data, such as texture and geometry information, is duplicated across both cards. This is also the case with Multi-GPU mode when using a single GeForce 7950 GX2, 9800 GX2, GTX295, GTX 590 and GTX 690 based card.
Posted on Reply
#50
lexluthermiester
Yeah, let's see white papers from NVidia... Neither of those sites cites NVidia documentation.

developer.download.nvidia.com/whitepapers/2011/SLI_Best_Practices_2011_Feb.pdf
www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

In short, VRAM availability to applications and APIs will be equal to the amount on one card, but the cards themselves do and must use all the VRAM available. If custom code is used, SLI performance can be optimized on a per-application basis, which includes both symmetric and asymmetric VRAM usage, meaning the standard limits and operational constraints can be and are altered.

That being said, in the context of a VRAM drive, the VRAM of each card can be used independently or in series while the card still runs SLI functions simultaneously.
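As an illustrative aside (a CUDA sketch, not something taken from those whitepapers): from the software side each card is a separate device with its own memory pool, so a multi-GPU setup gives you one independent VRAM "drive" per card rather than a single combined allocation; pooling them into one volume would have to happen at the OS storage layer.

// Each GPU is a separate CUDA device with its own memory pool, so two 3090s
// give you two independent 24 GB regions (one possible "drive" per card),
// not one 48 GB allocation.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);

        void* perCardDrive = nullptr;                // one backing buffer per card
        size_t drive = freeBytes / 2;                // e.g. half of what's free
        if (cudaMalloc(&perCardDrive, drive) == cudaSuccess) {
            std::printf("GPU %d: %.1f GB total, carved a %.1f GB buffer\n",
                        dev, totalBytes / 1073741824.0, drive / 1073741824.0);
            cudaFree(perCardDrive);
        }
    }
}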
Posted on Reply