
Crysis 3 Installed On and Run Directly from RTX 3090 24 GB GDDR6X VRAM

Why is this even an article?
Because it's very interesting! That's why.
Reading game data from VRAM, which has to be read to system memory before going back to the GPU
Which would happen with ANY storage. VRAM is literally the fastest storage you can buy.
is only going to hurt performance.
How? HDDs are much slower, and SSDs, even the fastest, are still slow in comparison to VRAM. How is VRAM going to "hurt" performance?
It's fancy that you can do this, but it's really a sub-par solution compared to just using a normal RAM disk.
You assume that game assets are not transferred directly to other sections of VRAM through special instruction operations, which would not be difficult. However, you have a point with the ram-disk.
Accessing system memory is going to be far faster than doing twice the number of transfers over PCIe to and from the same device. This isn't a win for latency and it's not a win for bandwidth compared to the alternative.
While that would be true if the VRAM could not do transfers directly to itself, it can and with this VRAM-disk scheme likely does. This is an experimental thing. It's not being done because it's practical, it's being done for giggles. Lighten up a little bit.
 
With the same system I installed Crysis 1 on a Vega 56 (a lower memory quantity), so I don't see where the miracle is; the credit mostly goes to the tool written by the programmer that lets you use the GPU's memory as a disk. In my benchmarks I got about 4 GB/s read/write, measured with CrystalDiskMark. I then tried the same thing with a disk made from CPU RAM (2666 MHz DDR4, dual channel) and the result was 10 GB/s read/write, so again I don't see where the miracle obtained by the user is... maybe he deserves credit for devoting himself to the subject until he found a tool that allows this. Beyond being an exercise in style, I don't see the usefulness or the difficulty in achieving it; it's PCIe 3.0 versus the CPU's RAM bus. exFAT is the fastest filesystem for this, even if only by about 7-10%.
In the third photo you can see with the AIDA benchmark that the GPU's RAM is slow at read/write from the host side, but for copies that stay inside the GPU its own bus lets it reach about 30 times the speed of the PCIe 3.0 bus, while the CPU still keeps the advantage with roughly 4 times the speed in read/write.
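Roughly the same comparison can be reproduced with a few lines against the CUDA runtime. This is only a sketch, not the GpuRamDrive code; the buffer size, repetition count and use of pinned host memory are my own choices. The host↔device numbers are limited by the PCIe link, while the device-to-device copy runs over the card's own memory bus, which is the gap AIDA is showing:

```cpp
// bandwidth_sketch.cu -- rough illustration only, not the GpuRamDrive code.
// Times host->device, device->host and device->device copies with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

static float gbps(void* dst, const void* src, size_t bytes, cudaMemcpyKind kind)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 10; ++i)                  // average over a few repetitions
        cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (bytes * 10 / 1e9f) / (ms / 1e3f);     // GB/s
}

int main()
{
    const size_t bytes = 256ull << 20;            // 256 MiB test block
    void *host, *devA, *devB;
    cudaMallocHost(&host, bytes);                 // pinned host memory, best case for PCIe
    cudaMalloc(&devA, bytes);
    cudaMalloc(&devB, bytes);

    printf("H->D: %.1f GB/s\n", gbps(devA, host, bytes, cudaMemcpyHostToDevice));
    printf("D->H: %.1f GB/s\n", gbps(host, devA, bytes, cudaMemcpyDeviceToHost));
    printf("D->D: %.1f GB/s\n", gbps(devB, devA, bytes, cudaMemcpyDeviceToDevice));

    cudaFree(devB); cudaFree(devA); cudaFreeHost(host);
    return 0;
}
```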
 

Attachments

  • vega56mem.jpg
  • ryzenmem.jpg
  • gpu.vs.cpu.jpg
Because it's very interesting! That's why.

Which would happen with ANY storage. VRAM is literally the fastest storage you can buy.

How? HDDs are much slower, and SSDs, even the fastest, are still slow in comparison to VRAM. How is VRAM going to "hurt" performance?

You assume that game assets are not transferred directly to other sections of VRAM through special instruction operations, which would not be difficult. However, you have a point with the ram-disk.

While that would be true if the VRAM could not do transfers directly to itself, it can and with this VRAM-disk scheme likely does. This is an experimental thing. It's not being done because it's practical, it's being done for giggles. Lighten up a little bit.
It's interesting if you've been living under a rock. This isn't something new and it didn't get any better. It hurts performance because you're now sharing memory bandwidth with disk access in addition to GPU rendering. Loading content and rendering at the same time could cause performance degradation in both memory use and PCIe utilization, because content now has to travel in both directions, from and to the GPU. That's twice the number of transfers, because there is no way for the GPU driver to know what's in the part of VRAM being used as a disk. If it were a mere copy, then I'd agree, but it's not. Disk emulation is involved, which makes doing what you suggest not feasible, particularly if the content has to be manipulated in some way before being loaded into VRAM. In short, a disk read (regardless of where that disk is) always goes through system memory.
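The data path being described looks roughly like this. It's a sketch against the CUDA runtime with made-up names (vram_disk, staging, texture_in_vram); the actual GpuRamDrive + ImDisk combination goes through a Windows disk driver rather than calls like these, but the number of PCIe crossings is the point:

```cpp
// Rough data-path sketch of a read from a "VRAM disk"; illustrative names only.
#include <cuda_runtime.h>

void load_asset_from_vram_disk(void* vram_disk, size_t offset, size_t bytes,
                               void* texture_in_vram)
{
    // 1) The emulated "disk read": the sector contents must land in system
    //    memory first -- one PCIe crossing (device -> host).
    void* staging = nullptr;
    cudaMallocHost(&staging, bytes);
    cudaMemcpy(staging, static_cast<char*>(vram_disk) + offset, bytes,
               cudaMemcpyDeviceToHost);

    // 2) The game and driver then upload the asset like any other file read
    //    from disk -- a second PCIe crossing (host -> device).
    cudaMemcpy(texture_in_vram, staging, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(staging);
    // A RAM disk or an NVMe drive only pays the second crossing, which is the
    // "twice the number of transfers" point above.
}
```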

I get that it was done for giggles, but why do giggles require a news article? There's nothing new going on here.
 
It's interesting if you've been living under a rock. This isn't something new and it didn't get any better. It hurts performance because you're now sharing memory bandwidth with disk access in addition to GPU rendering. Loading content and rendering at the same time could cause performance degradation in both memory use and PCIe utilization, because content now has to travel in both directions, from and to the GPU. That's twice the number of transfers, because there is no way for the GPU driver to know what's in the part of VRAM being used as a disk. If it were a mere copy, then I'd agree, but it's not. Disk emulation is involved, which makes doing what you suggest not feasible, particularly if the content has to be manipulated in some way before being loaded into VRAM. In short, a disk read (regardless of where that disk is) always goes through system memory.

I get that it was done for giggles, but why do giggles require a news article? There's nothing new going on here.


Cause perhaps it will turn on a lightbulb for someone.

Also, bandwidth tests by our own W1zzard show the PCIe bus is barely used, and if the transfer from video memory to RAM is faster than from an SSD or NVMe drive without hindering the data flow between the CPU and GPU, it's interesting when you consider the new DMA features and what could happen if a CPU core were placed on the GPU. No more need for separate decompression and transfers, which is kind of one of the new things NVIDIA and AMD have been working on: instead of using the CPU to decompress data the GPU needs, load compressed textures into video memory and let the GPU handle decompression on the fly. If they do it with fine enough granularity, the GPU could directly fetch and decompress only the part of the texture it needs.
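For what it's worth, the decompress-on-the-GPU idea is easy to picture. The sketch below is not RTX IO or DirectStorage code (neither is public), just a toy: a run-length-encoded asset is copied to VRAM once in compressed form and a kernel expands it there, so the full-size data never has to cross PCIe. The format, the names, and the assumption that per-run output offsets were precomputed are all mine:

```cpp
// Toy "decompress in VRAM" example; invented RLE format, not RTX IO/DirectStorage.
#include <cuda_runtime.h>

struct Run { unsigned char value; unsigned int length; unsigned int out_offset; };

__global__ void rle_expand(const Run* runs, int num_runs, unsigned char* out)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= num_runs) return;
    const Run run = runs[r];
    for (unsigned int i = 0; i < run.length; ++i)  // each thread expands one run
        out[run.out_offset + i] = run.value;
}

// Host side: only the compressed runs cross PCIe; expansion happens in VRAM.
void decompress_on_gpu(const Run* host_runs, int num_runs, size_t decoded_bytes,
                       unsigned char** decoded_out)
{
    Run* d_runs = nullptr;
    cudaMalloc(&d_runs, num_runs * sizeof(Run));
    cudaMemcpy(d_runs, host_runs, num_runs * sizeof(Run), cudaMemcpyHostToDevice);

    cudaMalloc(decoded_out, decoded_bytes);
    const int threads = 256;
    const int blocks  = (num_runs + threads - 1) / threads;
    rle_expand<<<blocks, threads>>>(d_runs, num_runs, *decoded_out);
    cudaDeviceSynchronize();
    cudaFree(d_runs);
}
```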

I think it's cool, and it shows how much more the hardware we have is capable of, and how in a few years we may have a true "APU" of graphics cores intermixed with CPU cores sharing cache and a homogeneous pool of faster memory.
 
It's old news. But some newer PC guys&girls probably didn't know you could do this.
 
@lexluthermiester almost all your points are wrong...
Read a couple of posts before you. Also try benchmarking it.
Maybe with DirectStorage and stuff like RTX IO it will be great, but surely not with GpuRamDrive in its current form. I wonder if that tool lets you assign more MB than the VRAM available... because quite possibly if it gets full you end up in RAM anyway, and if that gets full you end up on NVMe/SSD/HDD or wherever your swap file is.
 
@lexluthermiester almost all your points are wrong...
Prove it.
Read a couple of posts before you.
Did that.
Maybe with DirectStorage and stuff like RTX IO it will be great, but surely not with GpuRamDrive in its current form. I wonder if that tool lets you assign more MB than the VRAM available... because quite possibly if it gets full you end up in RAM anyway, and if that gets full you end up on NVMe/SSD/HDD or wherever your swap file is.
You seem to be missing a few conceptual points. So before telling me I'm wrong on all points, do some research.
 
Except it's not novel. This has been around for a while.
Not to you maybe...
The only part that hasn't is this particular GPU.
And to be fair, no one has ever done this particular thing. Installing and running a game as big and complex as Crysis 3 from VRAM?

What I want to see is someone do something like this with one of those incoming 48GB/64GB Quadro cards. That would be fascinating!
 
Not to you maybe...

And to be fair, no one has ever done this particular thing. Installing and running a game as big and complex as Crysis 3 from VRAM?

What I want to see is someone do something like this with one of those incoming 48GB/64GB Quadro cards. That would be fascinating!
Would it though? I don't really see it changing anything. I still would expect a ram disk to be faster and cheaper. Latency doesn't disappear because you use a card with more VRAM.
 
But again, it is very interesting and novel.
Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
This was just an old amateur concept that hasn't been updated in several years (for a good reason). The main issue is that it still uses RAM for data exchange, so basically it works like a conventional RAM disk that needs slightly less memory space but uses the GPU as temporary storage. Adding several more steps to the read/write process only makes it drastically slower than a RAM disk (to the point where GDDR5 ends up slower than NVMe). I looked through the code, and even though I haven't touched CUDA or even C++ in years, I can already see some issues.
I'm sure there are better and more efficient ways to make this work, but I still don't see any reason to do so... Heck, NVMe has already saturated PCIe 3.0 bandwidth, and PCIe 4.0 isn't even in full swing yet. Regardless of how fast GDDR5/6/7... or HBM is on paper, it's only going to be that fast from the perspective of the GPU. For the rest of the system it's only going to be as fast as PCIe and a shitton of abstraction layers will allow it to be. Basically, what I'm trying to say is that you can't make it faster than NVMe RAID, even less so a RAM disk. That's why AMD stuck with hybrid solutions like the Radeon Pro SSG. At least for now that approach makes a bit more sense, when you actually need to have "storage" on the GPU.
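The "slower than NVMe" part is mostly a per-operation overhead problem rather than a bandwidth one. Here's a minimal sketch of that argument using the CUDA runtime; the 4 KiB "sector" size and the counts are my own assumptions, and the real tool adds filesystem and ImDisk layers on top of every one of these transfers:

```cpp
// Per-operation overhead sketch: many small device->host reads vs one big one.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t sector = 4 * 1024;           // a small "disk sector" read
    const int    count  = 16 * 1024;          // 16k sectors = 64 MiB in total
    void *host, *dev;
    cudaMallocHost(&host, sector * count);    // pinned host buffer
    cudaMalloc(&dev, sector * count);

    cudaEvent_t a, b;
    cudaEventCreate(&a);
    cudaEventCreate(&b);

    // Many small device->host copies, like scattered 4 KiB file reads.
    cudaEventRecord(a);
    for (int i = 0; i < count; ++i)
        cudaMemcpy(static_cast<char*>(host) + i * sector,
                   static_cast<char*>(dev)  + i * sector,
                   sector, cudaMemcpyDeviceToHost);
    cudaEventRecord(b);
    cudaEventSynchronize(b);
    float ms_small = 0.0f;
    cudaEventElapsedTime(&ms_small, a, b);

    // One large device->host copy of the same total size.
    cudaEventRecord(a);
    cudaMemcpy(host, dev, sector * count, cudaMemcpyDeviceToHost);
    cudaEventRecord(b);
    cudaEventSynchronize(b);
    float ms_big = 0.0f;
    cudaEventElapsedTime(&ms_big, a, b);

    printf("16k x 4 KiB reads: %.2f ms, single 64 MiB read: %.2f ms\n",
           ms_small, ms_big);
    return 0;
}
```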
 
Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
This was just an old amateur concept that hasn't been updated in several years (for a good reason). The main issue is that it still uses RAM for data exchange, so basically it works like a conventional RAM disk that needs slightly less memory space but uses the GPU as temporary storage. Adding several more steps to the read/write process only makes it drastically slower than a RAM disk (to the point where GDDR5 ends up slower than NVMe). I looked through the code, and even though I haven't touched CUDA or even C++ in years, I can already see some issues.
I'm sure there are better and more efficient ways to make this work, but I still don't see any reason to do so... Heck, NVMe has already saturated PCIe 3.0 bandwidth, and PCIe 4.0 isn't even in full swing yet. Regardless of how fast GDDR5/6/7... or HBM is on paper, it's only going to be that fast from the perspective of the GPU. For the rest of the system it's only going to be as fast as PCIe and a shitton of abstraction layers will allow it to be. Basically, what I'm trying to say is that you can't make it faster than NVMe RAID, even less so a RAM disk. That's why AMD stuck with hybrid solutions like the Radeon Pro SSG. At least for now that approach makes a bit more sense, when you actually need to have "storage" on the GPU.
Precisely. If people really care about making things go fast, a RAM disk is the way to do it. Nothing, and I mean nothing, will be faster than direct access to physical memory. There is no interconnect that has lower latency and higher bandwidth than accessing DRAM directly. There just isn't. Here's the rub, though: even that doesn't matter, because you still need to copy game data from somewhere to put it into a RAM disk or a "VRAM disk". A new game or a restart means a new copy. You're still constrained by the media the game data is on, and you have to wait longer to get going.

So, in summary:

Fun level of doing this if it's novel to you and you get excited by this kind of thing: High
Practical usefulness of doing this when you have an NVMe drive: Never
 
Whoa, I never knew about this GPU RAM drive... I knew you could do that kind of thing in Linux, but I hadn't ever seen it in Windows. Well, that's pretty damn cool; I've been looking for this sort of thing on Windows for quite a few years and didn't know someone had finally come up with a solution.

Precisely. If people really care about making things go fast, a RAM disk is the way to do it. Nothing, and I mean nothing, will be faster than direct access to physical memory. There is no interconnect that has lower latency and higher bandwidth than accessing DRAM directly. There just isn't. Here's the rub, though: even that doesn't matter, because you still need to copy game data from somewhere to put it into a RAM disk or a "VRAM disk". A new game or a restart means a new copy. You're still constrained by the media the game data is on, and you have to wait longer to get going.

So, in summary:

Fun level of doing this if it's novel to you and you get excited by this kind of thing: High
Practical usefulness of doing this when you have an NVMe drive: Never
I think you're overlooking the CPU and memory overhead of an actual system-RAM-based RAM disk. This would still have some of that while loading up the VRAM initially, but after that it would pretty much run off its own GPU resources, and that's part of the beauty of it. Hell, with a 24 GB GPU you could probably load a copy of Windows 10 onto it, especially since it's paired with ImDisk to begin with, which is VHD friendly. It might not work with a fully patched Windows 10 Pro, or perhaps not without stripping down a few parts of it at least; it kind of rides the line between feasible and infeasible for that purpose, though Windows 10 Home is a bit trimmed down anyway and should work. There's still no denying it's really interesting. You could use it for Prefetch/ReadyBoot.etl, or most likely virtual memory, assuming the page file can be pointed at it. I imagine StoreMI or PrimoCache would play nice with it as well. If I'm not mistaken, once the data is copied to this type of VRAM device it should actually be the quicker of the two compared to system memory, and if that's the case this isn't bad at all. I don't believe NVMe is going to have the I/O of this kind of device; much like it can't come close to competing with system memory in that area, it gets trounced.
 
You two can argue how pointless or uninteresting it is till the cows come home. The rest of us will continue to find it interesting.
I'm not arguing, I'm agreeing. Once again:
Drawing poop emoji with a broken goose feather held in your left foot is also interesting and novel to some people; that doesn't make it any more useful or practical.
 
I wonder what happens if you SLI two cards: does bandwidth and/or capacity increase? Something certainly worth noting is that it could be used on an older system; hell, on LGA775 with a DDR2 board it might even be faster than the system memory, crazy as that use case is. I think just the fact that it can be done and potentially has an upside to it is intriguing enough. I like the idea of it with PrimoCache or StoreMI, especially for a hybrid cache, and the Prefetch/ReadyBoot/boost/shadow cache as well as the controversial page file aren't terrible uses either. I'd like to see the ATTO benchmark results, both for throughput and I/O; the latter in particular is really interesting to look at and compare to other storage options like NVMe and a RAM disk as well as SATA. Perhaps it's not very practical, but it is really intriguing.
 
Whoa, I never knew about this GPU RAM drive... I knew you could do that kind of thing in Linux, but I hadn't ever seen it in Windows. Well, that's pretty damn cool; I've been looking for this sort of thing on Windows for quite a few years and didn't know someone had finally come up with a solution.

I think you're overlooking the CPU and memory overhead of an actual system-RAM-based RAM disk. This would still have some of that while loading up the VRAM initially, but after that it would pretty much run off its own GPU resources, and that's part of the beauty of it. Hell, with a 24 GB GPU you could probably load a copy of Windows 10 onto it, especially since it's paired with ImDisk to begin with, which is VHD friendly. It might not work with a fully patched Windows 10 Pro, or perhaps not without stripping down a few parts of it at least; it kind of rides the line between feasible and infeasible for that purpose, though Windows 10 Home is a bit trimmed down anyway and should work. There's still no denying it's really interesting. You could use it for Prefetch/ReadyBoot.etl, or most likely virtual memory, assuming the page file can be pointed at it. I imagine StoreMI or PrimoCache would play nice with it as well. If I'm not mistaken, once the data is copied to this type of VRAM device it should actually be the quicker of the two compared to system memory, and if that's the case this isn't bad at all. I don't believe NVMe is going to have the I/O of this kind of device; much like it can't come close to competing with system memory in that area, it gets trounced.
That isn't how it works. You're welcome to prove me wrong by demonstrating how it's possible by actually doing it.
 
Wonder what happens if you SLI two cards does bandwidth and/or capacity increase?
SLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.

Something certainly worth noting is that it could be used on an older system; hell, on LGA775 with a DDR2 board it might even be faster than the system memory, crazy as that use case is.
Re-read my post above. All of your data is going through RAM either way. Plus, those old "DDR2" boards usually have PCIe 1.1, which is another perf gimp. This concept is physically incapable of being faster than RAM disk on any given machine, just because of the way it works.
 
SLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
Doesn't modern NV-Link allow pooling of memory?
 
Tesla K80s aren't really that expensive on eBay these days; shame this isn't very effective. On the plus side, it should be more reliable than flash storage. Unfortunately it just makes no sense over system memory, from what seems to be indicated; a novelty parlor trick at best, I guess.
 
Doesn't modern NV-Link allow pooling of memory?
Yes, it does, but it will make little to no difference. In a typical PC, NVLink only helps the GPUs talk to each other, and the CPU still uses the PCIe bus to talk to the GPUs.
For this particular case it'll be exactly the same as if you had multi-GPU without any bridges (i.e. you can create individual vRAM disks, but can't combine them). The only way around it is to create a storage pool out of several vRAM disks (Windows Storage Spaces), but I'm not sure that would even work for these.
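As a rough picture of what pooling would mean underneath, here's a sketch of striping one logical "disk" across two GPUs' memory with the CUDA runtime. The names are invented and this is not how GpuRamDrive or Storage Spaces actually works; it just shows that even a pooled device still pushes every access over PCIe:

```cpp
// Toy RAID-0-style striping of a logical "disk" across two GPUs' memory.
#include <cuda_runtime.h>

struct StripedVramDisk {
    void*  dev_buf[2];    // one allocation per GPU
    size_t stripe;        // stripe size in bytes
};

StripedVramDisk create_disk(size_t bytes_per_gpu, size_t stripe)
{
    StripedVramDisk d{};
    d.stripe = stripe;
    for (int gpu = 0; gpu < 2; ++gpu) {
        cudaSetDevice(gpu);
        cudaMalloc(&d.dev_buf[gpu], bytes_per_gpu);
    }
    return d;
}

// Write a block at a logical offset: even stripes land on GPU 0, odd on GPU 1.
void striped_write(StripedVramDisk& d, size_t offset, const void* src, size_t bytes)
{
    size_t done = 0;
    while (done < bytes) {
        const size_t stripe_idx = (offset + done) / d.stripe;
        const int    gpu        = static_cast<int>(stripe_idx % 2);
        const size_t in_stripe  = (offset + done) % d.stripe;
        size_t       chunk      = d.stripe - in_stripe;
        if (chunk > bytes - done) chunk = bytes - done;

        cudaSetDevice(gpu);   // every chunk still crosses PCIe to its GPU
        cudaMemcpy(static_cast<char*>(d.dev_buf[gpu]) + (stripe_idx / 2) * d.stripe + in_stripe,
                   static_cast<const char*>(src) + done,
                   chunk, cudaMemcpyHostToDevice);
        done += chunk;
    }
}
```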
 
SLI doesn't "double" your video memory. Think of it as RAID-1, but with videocards.
Your analogy is VERY flawed. If you were to compare SLI to RAID, it would be RAID0 as you are adding the capacity of one card to another not mirroring one card with another as would be done with RAID1. And yes, the VRAM doubles. In the case of the RTX3090, 24GB + 24GB = 48GB.
 