Thursday, May 18th 2023

NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

NVIDIA receives a lot of questions about graphics memory, also known as the frame buffer, video memory, or "VRAM", and so with the unveiling of our new GeForce RTX 4060 Family of graphics cards we wanted to share some insights, so gamers can make the best buying decisions for their gaming needs.

What Is VRAM?
VRAM is high-speed memory located on your graphics card.

It's one component of a larger memory subsystem that helps make sure your GPU has access to the data it needs to smoothly process and display images. In this article, we'll describe memory subsystem innovations in our latest-generation Ada Lovelace GPU architecture, as well as how the speed and size of GPU cache and VRAM impact performance and the gameplay experience.
GeForce RTX 40 Series Graphics Cards Memory Subsystem: Improving Performance & Efficiency
Modern games are graphical showcases, and their install sizes can now exceed 100 GB. Accessing this massive amount of data happens at different speeds, determined by the specifications of the GPU, and to some extent your system's other components. On GeForce RTX 40 Series graphics cards, new innovations accelerate the process for smooth gaming and faster frame rates, helping you avoid texture stream-in or other hiccups.

The Importance Of Cache
GPUs include high-speed memory caches that are close to the GPU's processing cores, which store data that is likely to be needed. If the GPU can recall the data from the caches, rather than requesting it from the VRAM (further away) or system RAM (even further away), the data will be accessed and processed faster, increasing performance and gameplay fluidity, and reducing power consumption.

GeForce GPUs feature a Level 1 (L1) cache (the closest and fastest cache) in each Streaming Multiprocessor (SM), up to twelve of which can be found in each GeForce RTX 40 Series Graphics Processing Cluster (GPC). This is followed by a fast, larger, shared Level 2 (L2) cache that can be accessed quickly with minimal latency.

Accessing each cache level incurs a latency hit, with the tradeoff being greater capacity. When designing our GeForce RTX 40 Series GPUs, we found a single, large L2 cache to be faster and more efficient than alternatives such as a small L2 cache paired with a large, slower-to-access L3 cache.
Prior-generation GeForce GPUs had much smaller L2 caches, resulting in lower performance and efficiency compared to today's GeForce RTX 40 Series GPUs.
During use, the GPU first searches for data in the L1 data cache within the SM, and if the data is found in L1 there's no need to access the L2 data cache. If data is not found in L1, it's called a "cache miss", and the search continues into the L2 cache. If data is found in L2, that's called an L2 "cache hit" (see the "H" indicators in the above diagram), and data is provided to the L1 and then to the processing cores.

If data is not found in the L2 cache, an L2 "cache miss", the GPU now tries to obtain the data from the VRAM. You can see a number of L2 cache misses in the above diagram depicting our prior architecture's memory subsystem, which cause a number of VRAM accesses.

If the data's missing from the VRAM, the GPU requests it from your system's memory. If the data is not in system memory, it can typically be loaded into system memory from a storage device like an SSD or hard drive. The data is then copied into VRAM, L2, L1, and ultimately fed to the processing cores. Note that different hardware- and software-based strategies exist to keep the most useful, and most reused, data present in caches.

Each additional data read or write operation through the memory hierarchy slows performance and uses more power, so by increasing our cache hit rate we increase frame rates and efficiency.
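To make the effect of hit rate concrete, here is a minimal, illustrative Python sketch of the lookup cascade described above. The latencies and hit rates are hypothetical placeholders rather than NVIDIA figures; the point is only that average access time falls sharply as more requests are served from L1 and L2.

```python
# Illustrative model of the L1 -> L2 -> VRAM -> system RAM lookup cascade.
# Latencies (in arbitrary "cycles") and hit rates are made-up placeholders,
# chosen only to show the trend, not to represent real Ada hardware.

LATENCY = {"L1": 1, "L2": 10, "VRAM": 100, "SYSRAM": 1000}

def average_access_latency(l1_hit_rate: float, l2_hit_rate: float,
                           vram_hit_rate: float = 1.0) -> float:
    """Expected latency of one memory request as it walks the hierarchy."""
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    vram_miss = 1.0 - vram_hit_rate
    return (LATENCY["L1"]                                  # always probe L1
            + l1_miss * (LATENCY["L2"]                     # L1 miss -> L2
            + l2_miss * (LATENCY["VRAM"]                   # L2 miss -> VRAM
            + vram_miss * LATENCY["SYSRAM"])))             # VRAM miss -> system RAM

# A larger L2 raises the L2 hit rate, which cuts the average latency:
print(average_access_latency(0.80, 0.50))  # small L2  -> 13.0 cycles
print(average_access_latency(0.80, 0.90))  # large L2  ->  5.0 cycles
```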
Compared to prior generation GPUs with a 128-bit memory interface, the memory subsystem of the new NVIDIA Ada Lovelace architecture increases the size of the L2 cache by 16X, greatly increasing the cache hit rate. In the examples above, representing 128-bit GPUs from Ada and prior generation architectures, the hit rate is much higher with Ada. In addition, the L2 cache bandwidth in Ada GPUs has been significantly increased versus prior GPUs. This allows more data to be transferred between the cores and the L2 cache as quickly as possible.

Shown in the diagram below, NVIDIA engineers tested the RTX 4060 Ti with its 32 MB L2 cache against a special test version of RTX 4060 Ti using only a 2 MB L2, which represents the L2 cache size of previous generation 128-bit GPUs (where 512 KB of L2 cache was tied to each 32-bit memory controller).

In testing with a variety of games and synthetic benchmarks, the 32 MB L2 cache reduced memory bus traffic by just over 50% on average compared to the 2 MB L2 cache. See the reduced VRAM accesses in the Ada Memory Subsystem diagram above.

This 50% traffic reduction allows the GPU to use its memory bandwidth 2X more efficiently. As a result, in this scenario, isolating for memory performance, an Ada GPU with 288 GB/sec of peak memory bandwidth would perform similarly to an Ampere GPU with 554 GB/sec of peak memory bandwidth. Across an array of games and synthetic tests, the greatly increased hit rates improve frame rates by up to 34%.
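As a back-of-the-envelope check on those figures (taking them as given), removing a fraction r of VRAM traffic lets the same physical bandwidth service 1 / (1 - r) times the demand. The short sketch below is just that arithmetic; the reduction values are derived from the quoted numbers, not separate specifications.

```python
def effective_bandwidth(peak_gb_s: float, traffic_reduction: float) -> float:
    """Bandwidth the GPU 'behaves like' when a fraction of VRAM traffic
    is absorbed by the larger cache (0.5 means half the requests never
    reach VRAM)."""
    return peak_gb_s / (1.0 - traffic_reduction)

# A 50% reduction doubles effective bandwidth: 288 GB/s behaves like 576 GB/s.
print(effective_bandwidth(288.0, 0.50))   # 576.0
# The quoted 554 GB/s Ampere equivalent corresponds to a reduction of about
# 1 - 288/554 ~= 0.48, in the same ballpark as the ~50% average cited above.
print(1.0 - 288.0 / 554.0)                # ~0.48
```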
Memory Bus Width Is One Aspect Of A Memory Subsystem
Historically, memory bus width has been used as an important metric for determining the speed and performance class of a new GPU. However, the bus width by itself is not a sufficient indicator of memory subsystem performance. Instead, it's helpful to understand the broader memory subsystem design and its overall impact on gaming performance.

Due to the advances in the Ada architecture, including new RT and Tensor Cores, higher clock speeds, the new OFA Engine, and Ada's DLSS 3 capabilities, the GeForce RTX 4060 Ti is faster than the previous-generation, 256-bit GeForce RTX 3060 Ti and RTX 2060 SUPER graphics cards, all while using less power.
Altogether, the tech specs deliver a great 60-class GPU with high performance for 1080p gamers, who account for the majority of Steam users.
The Amount of VRAM Is Dependent On GPU Architecture
Gamers often wonder why a graphics card has a certain amount of VRAM. Current-generation GDDR6X and GDDR6 memory is supplied in densities of 8 Gb (1 GB of data) and 16 Gb (2 GB of data) per chip. Each chip uses two separate 16-bit channels to connect to a single 32-bit Ada memory controller. So a 128-bit GPU can support 4 memory chips, and a 384-bit GPU can support 12 chips (calculated as bus width divided by 32). Higher-capacity chips cost more to make, so a balance is required to optimize prices.

On our new 128-bit memory bus GeForce RTX 4060 Ti GPUs, the 8 GB model uses four 16Gb GDDR6 memory chips, and the 16 GB model uses eight 16Gb chips. Mixing densities isn't possible, preventing the creation of a 12 GB model, for example. That's also why the GeForce RTX 4060 Ti has an option with more memory (16 GB) than the GeForce RTX 4070 Ti and 4070, which have 192-bit memory interfaces and therefore 12 GB of VRAM.

Our 60-class GPUs have been carefully crafted to deliver the optimum combination of performance, price, and power efficiency, which is why we chose a 128-bit memory interface. In short, for a given bus width, the higher-capacity option will always have double the memory.
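The capacity arithmetic described above can be condensed into a few lines. This is a simplified sketch of the relationships stated here (one 32-bit memory controller per 32 bits of bus, 1 GB or 2 GB per chip); the two-chips-per-controller option used for the 16 GB RTX 4060 Ti is commonly called a "clamshell" layout, a term not used in this article.

```python
def vram_capacity_gb(bus_width_bits: int, chip_density_gb: int = 2,
                     clamshell: bool = False) -> int:
    """Possible VRAM capacity for a given memory bus width.

    Simplified assumptions: one 32-bit controller per 32 bits of bus,
    one chip per controller (or two in a 'clamshell' layout), and
    current GDDR6/GDDR6X chips holding 1 GB or 2 GB each.
    """
    controllers = bus_width_bits // 32
    chips = controllers * (2 if clamshell else 1)
    return chips * chip_density_gb

print(vram_capacity_gb(128))                  # 8  -> RTX 4060 Ti 8 GB  (4 x 2 GB chips)
print(vram_capacity_gb(128, clamshell=True))  # 16 -> RTX 4060 Ti 16 GB (8 x 2 GB chips)
print(vram_capacity_gb(192))                  # 12 -> RTX 4070 / 4070 Ti (6 x 2 GB chips)
```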

Do On Screen Display (OSD) Tools Report VRAM Usage Accurately?
Gamers often cite the "VRAM usage" metric in On Screen Display performance measurement tools. But this number isn't entirely accurate, as all games and game engines work differently. In the majority of cases, a game will allocate VRAM for itself, saying to your system, 'I want it in case I need it'. But just because it's holding the VRAM, doesn't mean it actually needs all of it. In fact, games will often request more memory if it's available.

Due to the way memory works, it's impossible to know precisely what's being actively used unless you're the game's developer with access to development tools. Some games offer a guide in the options menu, but even that isn't always accurate. The amount of VRAM that is actually needed will vary in real time depending on the scene and what the player is seeing.
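For reference, overlay and monitoring tools generally read a device-level counter, such as the one exposed by NVIDIA's NVML library, which reports memory allocated on the GPU across all processes rather than what a game is actively touching. A minimal sketch using the pynvml Python bindings (assuming the package is installed) looks roughly like this:

```python
# Query the device-level VRAM counter that monitoring tools typically display.
# This reflects allocated memory across all processes, not what a game actively needs.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)      # first GPU in the system
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)       # struct with total/used/free, in bytes

print(f"Total VRAM : {mem.total / 2**30:.1f} GB")
print(f"Allocated  : {mem.used / 2**30:.1f} GB")   # 'used' here means reserved/allocated
print(f"Free       : {mem.free / 2**30:.1f} GB")

pynvml.nvmlShutdown()
```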

Furthermore, the behavior of games can vary when VRAM is genuinely used to its max. In some, memory is purged causing a noticeable performance hitch while the current scene is reloaded into memory. In others, only select data will be loaded and unloaded, with no visible impact. And in some cases, new assets may load in slower as they're now being brought in from system RAM.

For gamers, playing is the only way to truly ascertain a game's behavior. In addition, gamers can look at "1% low" framerate measurements, which can help analyze the actual gaming experience. The 1% Low metric - found in the performance overlay and logs of the free NVIDIA FrameView app, as well as other popular measurement tools - measures the average of the slowest 1% of frames over a certain time period.
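For illustration, a "1% low" figure can be derived from logged frame times as in the sketch below; it follows the definition given here (the average of the slowest 1% of frames, converted to FPS) and is not necessarily FrameView's exact implementation.

```python
def one_percent_low_fps(frame_times_ms: list[float]) -> float:
    """Average FPS of the slowest 1% of frames (per the definition above)."""
    slowest = sorted(frame_times_ms, reverse=True)   # largest frame times first
    count = max(1, len(slowest) // 100)              # slowest 1% of the sample
    avg_ms = sum(slowest[:count]) / count
    return 1000.0 / avg_ms                           # convert ms per frame to FPS

# Example: mostly ~10 ms frames (100 FPS) with a few 30 ms hitches.
frames = [10.0] * 990 + [30.0] * 10
print(f"Average FPS : {1000 * len(frames) / sum(frames):.1f}")  # ~98.0
print(f"1% low FPS  : {one_percent_low_fps(frames):.1f}")       # ~33.3
```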

Automate Setting Selection With GeForce Experience & Download The Latest Patches
Recently, some new games have released patches that better manage memory usage without hampering visual quality. Make sure to install the latest patches for newly released games, as they commonly fix bugs and optimize performance shortly after launch.

Additionally, GeForce Experience supports most new games, offering optimized settings for each supported GeForce GPU and VRAM configuration, giving gamers the best possible experience by balancing performance and image quality. If you're unfamiliar with game option lingo and just want to enjoy your games from the second you load them, GeForce Experience can automatically tune game settings for a great experience each time.

NVIDIA Technologies Can Help Developers Reduce VRAM Usage
Games are richer and more detailed than ever before, necessitating those 100 GB+ installs. To help developers optimize memory usage, NVIDIA offers several free developer tools and SDKs. These are just a few of the tools and technologies NVIDIA freely provides to help developers optimize their games for all GPUs, platforms, and memory configurations.

Some Applications Can Use More VRAM
Beyond gaming, GeForce RTX graphics cards are used around the world for 3D animation, video editing, motion graphics, photography, graphic design, architectural visualization, STEM, broadcasting, and AI. Some of the applications used in these industries may benefit from additional VRAM, for example when editing 4K or 8K timelines in Premiere, or crafting a massive architectural scene in D5 Render.

On the gaming side, high resolutions also generally require an increase in VRAM. Occasionally, a game may launch with an optional extra large texture pack and allocate more VRAM. And there are a handful of games which run best at the "High" preset on the 4060 Ti (8 GB), and maxed-out "Ultra" settings on the 4060 Ti (16 GB). In most games, both versions of the GeForce RTX 4060 Ti (8 GB and 16 GB) can play at max settings and will deliver the same performance.
The benefit of the PC platform is its openness, configurability and upgradability, which is why we're offering the two memory configurations for the GeForce RTX 4060 Ti; if you want that extra VRAM, it will be available in July.

A GPU For Every Gamer
Following the launch of the GeForce RTX 4060 Family, there'll be optimized graphics cards for each of the three major game resolutions. However you play, all GeForce RTX 40 Series GPUs will deliver a best-in-class experience, with leading power efficiency, supported by a massive range of game-enhancing technologies, including NVIDIA DLSS 3, NVIDIA Reflex, NVIDIA G-SYNC, NVIDIA Broadcast, and RTX Remix.
For the latest news about all the new games and apps that leverage the full capabilities of GeForce RTX graphics cards, stay tuned to GeForce.com.
Source: NVIDIA Blog

139 Comments on NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

#51
oxrufiioxo
Chrispy_What's the point of better-than-expected performance if you're forced to run low settings due to VRAM limitations?
It would make the 16GB variant more viable if the performance is closer to the 4070 than I expect it to be...

Let's not kid ourselves though, it's not like AMD has even shown up in this segment; they skipped it all the way down to the $300 option with an also-overpriced 8GB card, just like the 4060 that has less VRAM than its predecessor.
Posted on Reply
#52
Chrispy_
Bomby569at this point the vram discussion is more like a shouting match, lots of unreasonable claims, everyone has their own opinion, and me, I'm still waiting for reasonable tests in reasonable scenarios. This should be a 1440p card to be used with medium settings, and that's what I want to see. Not 1080p (even if anyone can use it for that for sure), not ultra, not RT (no one is actually taking you seriously, RT), not 4k, not tested with a 4090 pretending to be a 4060.
The thing that's most offensive here is that Nvidia now wants to call a $400 GPU "1080p" in 2023.

The 1660 Super, a 4-year-old card that cost $250 in 2019, still breezes through "1080p" in 2023.

The 4060Ti is so much more capable than 1080p, but it's crippled by that 8GB. Nvidia admitted as much by failing to match graphics settings in their own cherry-picked benchmarks.
Posted on Reply
#53
Bomby569
Chrispy_The thing that's most offensive here is that Nvidia now wants to call a $400 GPU "1080p"

The 1660 super, a 4-year-old $250 card still breezes "1080p" in 2023.
the goal post is always moving forward, tomorrow's games at 1080p will always demand more power. But that said, resolutions are also being dropped; 1080p is entry level now, barely anyone is using 720p anymore. Calling a 4060 card a 1080p card in 2023 doesn't sit right with me at all, especially at that price. But what do I know.
Posted on Reply
#54
oxrufiioxo
Bomby569the goal post is always moving forward, tomorrow games at 1080p will always demand more power. But that said resolutions are also being dropped, 1080p is a entry level now, barely anyone is using 720p anymore. Calling a 4060 card a 1080p card in 2023 doesn't sit right with me at all, especially at that price. But what do i know.
Nvidia themselves have said the 4060ti 8GB is targeting 1080p gaming.
Posted on Reply
#55
Chrispy_
Bomby569the goal post is always moving forward, tomorrow games at 1080p will always demand more power. But that said resolutions are also being dropped, 1080p is a entry level now, barely anyone is using 720p anymore. Calling a 4060 card a 1080p card in 2023 doesn't sit right with me at all, especially at that price. But what do i know.
Yep

xx60 class has always represented "the sweet spot" and for gamers, the sweet spot moved on from 1080p60 a long time ago.
IMO, the sweet spot has been 1440p high refresh with VRR for years now. You don't need to always get >144 fps but an average of ~90fps with 1% lows of over 60 is a good place to be.
Posted on Reply
#56
oxrufiioxo
Chrispy_Yep

xx60 class has always represented "the sweet spot" and for gamers, the sweet spot moved on from 1080p60 a long time ago.
IMO, the sweet spot has been 1440p high refresh with VRR for years now. You don't need to always get >144 fps but an average of >100fps with 1% lows of over 60 is a good place to be.
idk, 1440p 144Hz with non-peasant settings is still pretty high end... While I wouldn't disagree that it's what people should be targeting, most can't afford it.

Posted on Reply
#57
Bomby569
oxrufiioxoNvidia themselves has said the 4060ti 8GB is targeting 1080p gaming.
they would say a cow is a bird if it was good for their bottom line
Posted on Reply
#58
oxrufiioxo
Bomby569they would say a cow is a bird if it was good for their bottom line
Saying a 400 usd 4060ti is for 1080p honestly just makes them look bad.
Posted on Reply
#59
Bomby569
oxrufiioxoSaying a 400 usd 4060ti is for 1080p honestly just makes them look bad.
they just doubled down on the 8GB thing so it isn't out of character for them, they chose this hill to die on, apparently. Now they can only go forward full gas. smh
Posted on Reply
#60
Chrispy_
oxrufiioxoidk 1440p 144hz with non peasant settings is still pretty high end..... While I wouldn't disagree it's what people should be targeting most can't afford it.

Read the rest of my quote. VRR means that you don't have to hit the vsync. I'm saying that 90-100fps is a great experience good enough for just about everything outside of competitive esports.
Sure you can spend more for truly high refresh, but 1440p at "better-than-60Hz" seems to be the new sweet spot as of ~2021 or something like that.

Posted on Reply
#61
oxrufiioxo
Chrispy_Read the rest of my quote. VRR means that you don't have to hit the vsync. I'm saying that 90-100fps is a great experience good enough for just about everything outside of competitive esports.

I agree, that's way more realistic for most gamers.

1440p 80-100fps. 144hz is still a massive jump in cost.
Posted on Reply
#62
Chrispy_
oxrufiioxoI agree, that's way more realistic for most gamers.

1440p 80-100fps. 144hz is still a massive jump in cost.
and CPU starts to matter as framerates increase. A Ryzen5 or i5 can handle 100fps without dying.
Posted on Reply
#63
oxrufiioxo
Chrispy_and CPU starts to matter as framerates increase. A Ryzen5 or i5 can handle 100fps without dying.
I think a lot of gamers are actually targeting 60fps still to be honest though.
Posted on Reply
#64
Chane
OneMoarthe other side of the vram 'issue' is lazy console developers
being that consoles are unified memory, the fast/cheap thing to do is just to cram all the assets into memory because it's all one very fast pool of 16GB GDDR6 on a 256-bit/320-bit bus (no separate 'ram' and 'vram', it's all one unsegregated pool)

so come pc port time they don't bother to properly manage memory / i/o pressure and everything falls apart because the way they are handling assets is frankly inefficient

now nvidia knows this and they should have made the effort to ensure that 10GB was the minimum
Not sure what you mean by lazy on the console devs' part. Putting as many assets as possible into that unified memory is the most efficient way to use it. It would be a waste of console performance to build around the I/O and RAM limitations of a PC. It's the responsibility of either their internal or external PC port team to find out what works best on a PC platform. I agree that a 10GB/12GB minimum this generation would have been nice to see from Nvidia.
Posted on Reply
#65
Chrispy_
oxrufiioxoI think a lot of gamers are actually targeting 60fps still to be honest though.
and there's a $250 GPU market segment specifically for those gamers, full of decent options ;D
Jensen's lost touch with reality if he thinks people with a $250 budget suddenly have a $400 budget in the middle of the biggest cost-of-living crisis since the 1970s.
Posted on Reply
#66
Dr. Dro
TheLostSwedeI'm really feeling the love in this thread... :love:
Yeah, the thread has really brought up all the love that people have for Nvidia in here. As if AMD is any better.

Sad state this industry is in... that said while what Nvidia claims is true, it's just not justification for their abhorrent prices.
Posted on Reply
#67
Chrispy_
Dr. DroYeah, the thread has really brought up all the love that people have for Nvidia in here. As if AMD is any better.

Sad state this industry is in... that said while what Nvidia claims is true, it's just not justification for their abhorrent prices.
AMD are charging as much as they can get away with too, but at this tier RT and frame-generation are of questionable value so why pay a premium for them?

The RX 7600 is likely to be better perf/$ not because AMD want it to be, but because their inferior RT and lack of FG mean they can't charge as much for it - Use that to your advantage!
Posted on Reply
#68
oxrufiioxo
Dr. DroYeah, the thread has really brought up all the love that people have for Nvidia in here. As if AMD is any better.

Sad state this industry is in... that said while what Nvidia claims is true, it's just not justification for their abhorrent prices.
My brother is due for an upgrade and he is in the 5-700 usd range max. I feel bad for him because all the options in that price range are crap. I will probably pitch in so he can get a 7900XT. I really dislike the 12GB on the 4070ti; had Nvidia gone with 16GB I would go with that instead for sure. Maybe the 4060ti 16GB will be better than I expect it to be, but I'm not holding my breath.
Posted on Reply
#69
Dr. Dro
oxrufiioxoMy brother is due for an upgrade and he is in the 5-700 usd range max. I feel bad for him because all the options in that price range are crap. I will probably pitch in so he can get a 7900XT I really dislike the 12GB on the 4070ti had Nvidia went with 16GB I would go with that instead for sure. Maybe the 4060ti 16GB will be better than I expect it to be but not holding my breath.
Yeah, stretching out another couple hundred bucks for the 7900 XT if possible seems generally sensible to me. I'm personally not sold on DLSS 3, I would maybe be more lenient with it if Nvidia didn't willingly withhold it from us 30 series owners, but I already tend to keep traditional DLSS off whenever possible, so frame generation couldn't possibly sway me either way.
Chrispy_AMD are charging as much as they can get away with too, but at this tier RT and frame-generation are of questionable value so why pay a premium for them?

The RX 7600 is likely to be better perf/$ not because AMD want it to be, but because their inferior RT and lack of FG mean they can't charge as much for it - Use that to your advantage!
RT is of questionable value, but frame generation is going to make or break these lower-end cards. Nvidia is fully accounting its frame generation technology into the general performance uplift and they strongly encourage you to enable it regardless of impact on image quality. Regarding Ada's lowest segments (such as 4050 mobile), you are essentially expected to use DLSS3 FG to achieve playable frame rates. Sucks to be you if the game you want to play doesn't support it, mail your dev requesting it or just don't be poor I guess.
Posted on Reply
#70
Chrispy_
oxrufiioxoMy brother is due for an upgrade and he is in the 5-700 usd range max. I feel bad for him because all the options in that price range are crap. I will probably pitch in so he can get a 7900XT I really dislike the 12GB on the 4070ti had Nvidia went with 16GB I would go with that instead for sure. Maybe the 4060ti 16GB will be better than I expect it to be but not holding my breath.
IMO the 4070 is the least-bad, most-balanced option. Its mediocre performance/$ doesn't make it stand out in the market but it's efficient, supports all the latest features and is (just about) fast enough to get away with enabling them.

It's not great, but I don't think the 4070 is crap, simply because it's bringing lower power draw and newer features to an existing price point. Don't get me wrong, the 6950XT is faster in purely raster-based performance, but I think if you have this budget it's because you don't want purely raster-based performance: You want to move all the sliders to the right and tick all of the boxes in the options menu.
Dr. DroYeah, stretching out another couple hundred bucks for the 7900 XT if possible seems generally sensible to me. I'm personally not sold on DLSS 3, I would maybe be more lenient with it if Nvidia didn't willingly withhold it from us 30 series owners, but I already tend to keep traditional DLSS off whenever possible, so frame generation couldn't possibly sway me either way.



RT is of questionable value, but frame generation is going to make or break these lower-end cards. Nvidia is fully accounting its frame generation technology into the general performance uplift and they strongly encourage you to enable it regardless of impact on image quality. Regarding Ada's lowest segments (such as 4050 mobile), you are essentially expected to use DLSS3 FG to achieve playable frame rates. Sucks to be you if the game you want to play doesn't support it, mail your dev requesting it or just don't be poor I guess.
There are, what, eleven games with DLSS3 FG so far?
It's RTX's launch all over again. By the time enough games support DLSS3 FG to justify buying a card on that feature alone, the 40-series is going to be as obsolete as the 20-series is now.
Posted on Reply
#71
Dr. Dro
Chrispy_There are, what, nine games with DLSS3 FG so far?
It's RTX's launch all over again. By the time enough games support DLSS3 FG to justify buying a card on that feature alone, the 40-series is going to be as obsolete as the 20-series is now.
Agreed, and IMO those are nine games too much. The industry should have simply rejected such a blatantly one-sided, proprietary and elitist "tech" that they are shamelessly gating from their own existing customers in a shameless upsell.
Posted on Reply
#72
oxrufiioxo
Chrispy_IMO the 4070 is the least-bad, most-balanced option. Its mediocre performance/$ don't make it stand out in the market but it's efficient, supports all the latest features and is (just about) fast enough to get away with enabling them.

It's not great, but I don't think the 4070 is crap, simply because it's bringing lower power draw and newer features to an existing price point. Don't get me wrong, the 6950XT is faster in purely raster-based performance, but I think if you have this budget it's because you don't want purely raster-based performance: You want to move all the sliders to the right and tick all of the boxes in the options menu.
I dislike the 4070 but I've put it on his radar. I feel better about it than an 800usd card with 12GB of vram, but the 7900XT is about 32% faster in raster for about 200 bucks more and that is what he primarily cares about.
Chrispy_There are, what, eleven games with DLSS3 FG so far?
It's RTX's launch all over again. By the time enough games support DLSS3 FG to justify buying a card on that feature alone, the 40-series is going to be as obsolete as the 20-series is now.
I personally really like FG but it should be a bonus not something a person should buy a card for.
Posted on Reply
#73
Chrispy_
Dr. DroAgreed, and IMO those are nine games too much. The industry should have simply rejected such a blatantly one-sided, proprietary and elitist "tech" that they are shamelessly gating from their own existing customers in a shameless upsell.
I counted, rather than pulling a number out of my ass, and it's 11, not 9.
Still, yes. DLSS3 is a nice luxury feature to enable if you're already running the heck out of the game, but it's not a solution to needing a more powerful GPU.
Posted on Reply
#74
Minus Infinity
So we now have fake bandwidth specs to go with fake frames.

These cards are utter trash. People that think "oh look, the 4060 isn't dearer than last gen" need to realise this POS has a 50-class die, 50-class bus width and 50-class bandwidth, but we now throw in L2 cache to make it look like bandwidth is much better.

The 4060 is a 4050 Ti, the 4060 Ti 8GB is the 4060, and only the 16GB Ti should be called 4060 Ti, and even then it needed at least more cores to justify $500. It should have been the 192-bit 12GB card, and the 4070 Ti should be a 256-bit 16GB cut-down 4080. Raster improvements are pitiful over last gen.

AMD won't rescue you either; the 7600 will also be rubbish class, and N33 is said to be much weaker than N32/31 in RT relatively speaking, i.e. its RT performance will be far lower than the ratio of CUs would suggest.
Posted on Reply
#75
yannus1
KellyNyanbinaryI feel like they have been bashing NVIDIA recently with not one, not two, but at least three videos criticizing the 8 GB RTX 3070. I think they have been advocating for more VRAM instead of being satisfied with the status quo and frame smearing.
They always fake bash. I'll always remember when they said that Nvidia didn't send them a sample to censor them, while displaying looping RTX advertisements. The same here: they say "oh no, it doesn't have enough VRAM" but always end with a conclusion like "but they have wonderful DLSS and RTX". This is a common technique of trying to appear to oppose someone when in reality you're promoting their interests.
Posted on Reply