
Is there any way to make use of unused VRAM?

As you can see, the processor cannot access VRAM directly. Only the GPU can, and any attempt to use this memory as RAM through detours is a fun exercise but ultimately useless. Even the iGPU has its own reserved RAM that the CPU cannot access.

View attachment 296919
That render you've shown isn't an official diagram of how they're actually designed or how they work.

Direct3D 12 Ultimate and DirectStorage are changing how it all works, and much of it is already active.
The CPU can edit the contents of VRAM, and the GPU can read from NVME. They're all directly linked over PCI-E now.


It's just not that fast at doing so - latencies are good, but overall bandwidth isn't.
So loading in a new texture to prevent microstutter - hell yes.

As a write cache or something... maybe?
The CUDA version was definitely GPU-controlled and had no CPU usage in my testing, but the CPU version of the software would likely have been driven by the CPU in software.
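For anyone wondering what that CPU-to-VRAM "detour" looks like in practice, here's a minimal CUDA sketch (buffer size and names are arbitrary, not taken from any of the tools mentioned): the CPU never addresses VRAM directly - it stages data in system RAM and the driver/DMA engine moves it across PCI-E.

```cpp
// Minimal sketch of the "detour": the CPU cannot address VRAM directly,
// so data is staged in system RAM and copied across PCI-E by the driver/DMA.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t bytes = 256ull << 20;            // 256 MB test buffer (arbitrary)
    std::vector<char> host(bytes, 0x5A);          // lives in system RAM

    char* vram = nullptr;
    cudaMalloc((void**)&vram, bytes);             // lives in VRAM, GPU-addressable only

    // "Writing to VRAM" from the CPU is really a PCI-E transfer, not a direct store.
    cudaMemcpy(vram, host.data(), bytes, cudaMemcpyHostToDevice);

    // Reading it back is another PCI-E transfer in the other direction.
    cudaMemcpy(host.data(), vram, bytes, cudaMemcpyDeviceToHost);

    cudaFree(vram);
    printf("Round-tripped %zu MB through VRAM over PCI-E\n", bytes >> 20);
    return 0;
}
```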
 
??? I think you've misunderstood how that all works.
Very old image, but a clear example
View attachment 296917
I don't believe that.

"Each chip uses two separate 16-bit channels to connect to a single 32-bit"

The point is that the CPU cannot use the real performance of the VRAM.
To simplify: a defender cannot make up for the lack of a striker, nor vice versa. Each has its own role, and that's where it delivers its best.
 
It might be interesting to use a VRAM disk with primocache -- maybe as a multi-gigabyte write cache (to save on writes to SSDs without using up system RAM).
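Roughly what that would amount to, as a toy CUDA sketch - this is not how primocache actually works, just the general idea with made-up sizes: park writes in a VRAM buffer, then flush them to disk in one pass later. Note that every cached byte still crosses PCI-E twice.

```cpp
// Toy illustration of the "VRAM as a write cache" idea (NOT primocache's mechanism):
// buffer incoming writes in a VRAM block, drain them to disk later in one pass.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t chunk  = 4ull << 20;                 // 4 MB per "write" (arbitrary)
    const int    chunks = 64;                         // 256 MB cached in VRAM
    char* cache = nullptr;
    cudaMalloc((void**)&cache, chunk * chunks);       // the VRAM-backed cache

    std::vector<char> staging(chunk, 0x00);           // pinned memory would be faster; plain RAM for brevity
    for (int i = 0; i < chunks; ++i) {
        // Every cached write still crosses PCI-E twice (in now, out again at flush time),
        // so the SSD is spared writes at the cost of extra bus traffic.
        cudaMemcpy(cache + i * chunk, staging.data(), chunk, cudaMemcpyHostToDevice);
    }

    // Flush: pull everything back and hand it to the filesystem (file I/O omitted).
    std::vector<char> flush(chunk * chunks);
    cudaMemcpy(flush.data(), cache, chunk * chunks, cudaMemcpyDeviceToHost);

    cudaFree(cache);
    printf("Flushed %zu MB from the VRAM cache\n", (chunk * chunks) >> 20);
    return 0;
}
```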
Why would I do that when system RAM is plentiful and VRAM is scarce?
 
I don't believe that.

"Each chip uses two separate 16-bit channels to connect to a single 32-bit"


The point is that the CPU cannot use the real performance of the VRAM.
To simplify: a defender cannot make up for the lack of a striker, nor vice versa. Each has its own role, and that's where it delivers its best.
The first line - that the CPU cannot use the real performance of VRAM - is correct.
The striker/defender line is incorrect. Poor analogy.

The CPU can't use the VRAM quickly because it has no direct access to it and no high-speed link to it.
The CPU has to tell the GPU what to do; the GPU then uses the VRAM and passes the result back.
It's not about something's role, but about an order of events and the speed of each step - in bandwidth, latency, and processing delays.

As stated, DX12 has updates to change that but it's not used yet and definitely not part of that CUDA VRAM code.
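A rough CUDA illustration of that order of events (the sizes and the kernel are arbitrary): copy in over PCI-E, let the GPU work on it in VRAM, copy the result back out - and time the whole round trip, since each step adds its own cost.

```cpp
// Sketch of the order of events described above: the CPU can't work in VRAM itself,
// so every "CPU uses VRAM" operation becomes copy-in -> GPU work -> copy-out,
// each step paying its own bandwidth and latency price.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void touch(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                    // the only step that runs at VRAM speed
}

int main() {
    const int n = 1 << 24;                         // 16M floats, 64 MB (arbitrary)
    float* host = new float[n];
    float* dev  = nullptr;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;
    cudaMalloc((void**)&dev, n * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventRecord(t0);

    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);   // CPU -> VRAM over PCI-E
    touch<<<(n + 255) / 256, 256>>>(dev, n);                            // GPU works in VRAM
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);   // result back over PCI-E

    cudaEventRecord(t1); cudaEventSynchronize(t1);
    float ms = 0.0f; cudaEventElapsedTime(&ms, t0, t1);
    printf("copy-in + kernel + copy-out: %.2f ms\n", ms);

    cudaFree(dev); delete[] host;
    return 0;
}
```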

I'm still not sure what you think you're saying with regard to the memory.
They use two 1GB or 2GB modules paired together to connect to the GPU's 32-bit wide memory buses, and then multiply that to reach the desired total - in this case 128 bits wide (and 192 bits wide on the previous gen).
I can discuss a lot more of that, but stating random facts and numbers doesn't explain what you mean by any of it.
PCI-E 4.0 x16 has a theoretical max of 32GB/s before any overheads, so it could never exceed that, and PCI-E 5.0 GPUs could do 64GB/s... if the GPU architecture could actually provide that level of speed.
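The back-of-the-envelope arithmetic behind those figures, written out as a trivial program (the 18 Gbps-per-pin rate is an assumption, picked because it reproduces the 288GB/s number quoted further down):

```cpp
// Bus-width and PCI-E arithmetic behind the numbers in this thread.
// 18 Gbps/pin is assumed here because it reproduces the 288 GB/s figure quoted later.
#include <cstdio>

int main() {
    // VRAM: per-pin data rate (Gbps) * bus width (bits) / 8 = GB/s
    double gddr6_gbps_per_pin = 18.0;
    double bus_width_bits     = 128.0;
    double vram_gbs = gddr6_gbps_per_pin * bus_width_bits / 8.0;   // 288 GB/s

    // PCI-E 4.0: 16 GT/s per lane with 128b/130b encoding, x16 link, per direction
    double pcie4_x16_gbs = 16.0 * (128.0 / 130.0) * 16.0 / 8.0;    // ~31.5 GB/s
    double pcie5_x16_gbs = 2.0 * pcie4_x16_gbs;                    // ~63 GB/s

    printf("128-bit GDDR6 @ 18 Gbps: %.0f GB/s\n", vram_gbs);
    printf("PCI-E 4.0 x16: ~%.1f GB/s, PCI-E 5.0 x16: ~%.1f GB/s\n",
           pcie4_x16_gbs, pcie5_x16_gbs);
    return 0;
}
```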

This quote sums the thread up nicely:
In testing with a variety of games and synthetic benchmarks, the 32 MB L2 cache reduced memory bus traffic by just over 50% on average compared to the performance of a 2 MB L2 cache. See the reduced VRAM accesses in the Ada Memory Subsystem diagram above.

This 50% traffic reduction allows the GPU to use its memory bandwidth 2X more efficiently. As a result, in this scenario, isolating for memory performance, an Ada GPU with 288 GB/sec of peak memory bandwidth would perform similarly to an Ampere GPU with 554 GB/sec of peak memory bandwidth. Across an array of games and synthetic tests, the greatly increased hit rates improve frame rates by up to 34%.
They care about internal speed. Not external.

By using a caching system they managed to get the same gaming performance despite halving the bandwidth - but that VRAM is now half as fast for something as simple as file transfers on a VRAM drive. The cache would not help that sort of activity, as the GPU as a whole was never designed or optimised for it.
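Putting rough numbers on that: halving DRAM traffic roughly doubles effective bandwidth for cache-friendly rendering work, but a streaming copy through a VRAM drive reuses almost nothing and gets none of that benefit. The 288GB/s and 50% figures come from the quote above; the rest is back-of-the-envelope.

```cpp
// The arithmetic behind the L2-cache claim, and why it only helps cache-friendly work:
// halving DRAM traffic makes the same physical bandwidth go twice as far,
// but a linear streaming workload (like a file copy on a VRAM drive) barely hits the cache.
#include <cstdio>

int main() {
    double physical_gbs      = 288.0;   // Ada 128-bit example from the quote
    double traffic_reduction = 0.50;    // "just over 50%" fewer DRAM accesses from the 32 MB L2

    // If only half the requests reach DRAM, effective throughput for cache-friendly
    // rendering is physical / (1 - reduction) = ~576 GB/s (NVIDIA quotes ~554 measured).
    double effective_render_gbs = physical_gbs / (1.0 - traffic_reduction);

    // A streaming copy reuses almost nothing, so it sees only the physical number,
    // and is then capped by PCI-E (~32 GB/s on 4.0 x16) on its way in or out anyway.
    printf("effective for cache-friendly rendering: ~%.0f GB/s\n", effective_render_gbs);
    printf("ceiling for streaming transfers: %.0f GB/s VRAM, ~32 GB/s PCI-E\n", physical_gbs);
    return 0;
}
```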


Yay - your 128-bit GPU with 288GB/s of bandwidth can send at most 64GB/s of that to another location of equal speed, which can't exist within the system anyway, since that would fill the DRAM in seconds and overwhelm even the fastest NVME drives. Then factor in file sizes, file types, overheads, compression and so on - even simple things like reading and writing at the same time slow those values down - and it becomes a very poor proposition to use VRAM for anything other than its intended purpose: being where the GPU stores the code it's crunching away at.
 