
AMD "Vega" High Bandwidth Cache Controller Improves Minimum and Average FPS

The actual amount of data moved over the PCIe bus is negligible. I suggest a read of the PCIe scaling article W1zz did: most graphics cards only use about x4 lanes of 2.0 in actual bandwidth, and more than that only gives a few percent (not frames per second) more performance, so 60 FPS +/- 3% doesn't really mean much.
well, don't focus on the used part. pci-e 3.0 x16 has 15.75 GB/s of bandwidth. rx480 with its good old gddr5 has 200+GB/s, high-end gpus have more.

in context of using ram in addition to vram why would memory management be a more limiting factor over a very narrow pipe?
 
Bla bla bla. Do I get to buy something that performs like the 1070 but costs less than 460/950? Otherwise it's just fvcking bla bla bla.
you enjoy stutters during heavy usage? this affects ALL gpus ALL sizes ALL performance levels
 
VRAM hasn't been the bottleneck for a long time. Even crappy video cards now have 4GB as standard, which is more than enough for 1080p gaming. GPU horsepower is now the decisive factor in 99.9% of cases.
 
depends on the game. 4gb has not been enough in many cases for highest (usually texture) settings, especially at higher resolutions like 1440p or uhd.
this is even more relevant now when gpus do have horsepower to run games at these resolutions.
 
well, don't focus on the used part. pci-e 3.0 x16 has 15.75 GB/s of bandwidth. rx480 with its good old gddr5 has 200+GB/s, high-end gpus have more.

in context of using ram in addition to vram why would memory management be a more limiting factor over a very narrow pipe?


The VRAM bandwidth is used every time the GPU performs Anti-Aliasing and Anisotropic Filtering, so if I need 70% of the bandwidth to perform post-render effects, that means only 30% is actually available for frame rendering, and a part of that is used to store the finished frames.

I have a tuner card in a PCIe x1 slot that sends 24 FPS of 1080i plus audio to my GPU directly, and then it gets upscaled in hardware; the actual HDMI bandwidth is much higher, though. So why can't I run my GPU at PCIe x1?

It's all about where the bandwidth is used and when, and PCIe is overkill for graphics cards as they are today.
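(To put rough, purely illustrative numbers on that 70/30 split, taking it at face value and assuming an RX 480-class card with about 256 GB/s of VRAM bandwidth: 0.30 x 256 GB/s ≈ 77 GB/s would remain for frame rendering, which still dwarfs the ~15.75 GB/s a PCIe 3.0 x16 link delivers.)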
 
Bla bla bla. Do I get to buy something that performs like the 1070 but costs less than 460/950? Otherwise it's just fvcking bla bla bla.

Kid, why do you smoke so much that you only expect AMD to give you everything at low cost? Why don't you ask Nvidia about it? Oh wait, they just knocked 40 dollars off the GTX 1070. Happy?
 
Just something slightly related: I went to a GDC session yesterday talking about DX12 optimization, and one recommendation was to use the copy queue for all GPU<->CPU memory transfers. These happen in the background, completely independent of GPU activity, at full PCIe speeds. The key is to anticipate a few frames early that you will need the data, so it's in GPU memory when it is needed and no stuttering occurs.
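For illustration, a minimal sketch of that pattern against the D3D12 API (the function name, fence plumbing, and resource setup are my own assumptions; error handling omitted):

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Upload an asset on a dedicated copy queue so the transfer overlaps
// rendering. uploadBuf is a CPU-visible staging buffer, vramBuf the
// DEFAULT-heap destination; both are assumed to be created elsewhere.
void UploadViaCopyQueue(ID3D12Device* device,
                        ID3D12CommandQueue* graphicsQueue,
                        ID3D12Resource* uploadBuf,
                        ID3D12Resource* vramBuf,
                        UINT64 byteCount,
                        ID3D12Fence* fence, UINT64 fenceValue)
{
    // A COPY-type queue typically maps to the GPU's DMA engine, so the
    // transfer runs independently of graphics/compute work.
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    ComPtr<ID3D12CommandQueue> copyQueue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));

    ComPtr<ID3D12CommandAllocator> alloc;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COPY,
                                   IID_PPV_ARGS(&alloc));
    ComPtr<ID3D12GraphicsCommandList> list;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COPY,
                              alloc.Get(), nullptr, IID_PPV_ARGS(&list));

    // Record the transfer and kick it off; it travels over PCIe while the
    // graphics queue keeps working on earlier frames.
    list->CopyBufferRegion(vramBuf, 0, uploadBuf, 0, byteCount);
    list->Close();
    ID3D12CommandList* lists[] = { list.Get() };
    copyQueue->ExecuteCommandLists(1, lists);
    copyQueue->Signal(fence, fenceValue);

    // A few frames later, before the first draw that reads vramBuf, the
    // graphics queue waits on the fence (a GPU-side wait, no CPU stall).
    graphicsQueue->Wait(fence, fenceValue);
}

The copy queue running on the dedicated DMA engine is why the transfer can happen at full PCIe speed without stealing shader time.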
 
I don't understand how lowering the VRAM amount would help with FPS. Can anyone explain this?
This probably works the same way as the (pseudo) SLC cache on TLC drives: similar in the way that it speeds up frequent VRAM operations, lifting the min & avg FPS.
 
The reason they lowered the VRAM availability is that they wanted to place Vega in the worst possible situation: one where HBC really shows its strength, where game VRAM usage goes beyond what you actually have on board.

The only thing I wonder about is whether HBC can do the data management on its own or whether it has to be specifically coded for. If it can be used out of the box with anything, it'll be awesome. But if you have to specifically code for it, that's a problem in itself.
Well, if there's anything like HBC in Scorpio or the PS5 (whenever it's released), then there's a good chance this approach will become popular, even if it's some years down the line.
 
I wonder where I get my degree in keyboard engineering?

From the posts in this thread at the Nvidia school of fanboy!!

First we had people butthurt about all the AMD news, since if you have to buy AMD you are obviously a piss-poor peon who shouldn't have a computer, and now we have a lot of posts about a new technology from AMD and lots of hate tossed its way by salad tossers, with no syrup.
What some fail to understand is simple: AMD is an innovator. If they were not, they would have gone out of business.
They don't copy, they innovate & design, taking chances because they have no choice but to do such a thing. Patience paid off with Ryzen, and I can see similar success with Vega.

FYI for all those criticizing HBM: GDDR5, or whatever you call it, is outdated. HBM is the way of the future IMO, and it gets better with each new version coming out.
 
So, in essence, AMD has expanded the cache hierarchy. We have L1 and L2 on the GPU itself, L3 is basically VRAM (I'm not aware of L3 being used on GPUs the way it is on CPUs, or is it?), and now they've added L4, which is system RAM. All this is usually controlled by algorithm/prediction-based prefetchers.

I mean, if this is fully automatic without any need for special game code, it's gonna be nice, and it's going to dramatically expand the usability of the graphics card over time as it ages and new demanding games come out needing more memory. Sure, it won't be as fast as having that much VRAM available at all times, but it won't be nearly as bad as running out of VRAM entirely. I know Win8/Win10 already do this to a small extent, but not nearly to the extent Vega will.

I mean, with Vega, my 32GB of system RAM will finally find a very good use, because for games not even 16GB is really needed, meaning the other 16GB is idling most of the time. But Vega will be able to use that. I like the idea very much.
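If it does turn out to be fully automatic, the behavior would presumably be something like the toy model below: VRAM managed as an LRU cache of pages in front of system RAM (purely illustrative, not AMD's implementation):

Code:
#include <cstdint>
#include <list>
#include <unordered_map>
#include <unordered_set>

class TieredMemory {
    size_t vramPages_;                         // VRAM capacity, in pages
    std::list<uint64_t> lru_;                  // front = most recently used
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> inVram_;
    std::unordered_set<uint64_t> inSysRam_;    // the slower backing tier

public:
    explicit TieredMemory(size_t vramPages) : vramPages_(vramPages) {}

    // Returns true on a VRAM hit, false if the page had to be migrated up.
    bool access(uint64_t page) {
        auto it = inVram_.find(page);
        if (it != inVram_.end()) {             // hit: refresh its LRU position
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        if (inVram_.size() >= vramPages_) {    // miss with VRAM full: demote
            uint64_t victim = lru_.back();
            lru_.pop_back();
            inVram_.erase(victim);
            inSysRam_.insert(victim);
        }
        inSysRam_.erase(page);                 // migrate the page into VRAM
        lru_.push_front(page);
        inVram_[page] = lru_.begin();
        return false;
    }
};

Usage would be as simple as calling access(pageId) for every page the GPU touches; games never need to know the tiering exists.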
Think Windows SuperFetch. It keeps assets in memory and removes them as the space is needed. Should the asset be required again, access to it will be much faster than having to pull it from a slower memory pool. It's really smart that they're doing this and it's kind of silly it hasn't been done yet.

Just something slightly related: I went to a GDC session yesterday talking about DX12 optimization, and one recommendation was to use the copy queue for all GPU<->CPU memory transfers. These happen in the background, completely independent of GPU activity, at full PCIe speeds. The key is to anticipate a few frames early that you will need the data, so it's in GPU memory when it is needed and no stuttering occurs.
So HBCC is intended to stand in when developers fail to do that, or anticipate incorrectly.
 
I don't think it works that way. I think it can actually fetch data directly from the RAM pool (or even the SSD pool). It will just likely organize data in such a way that frequently used data is in VRAM, less frequently used data is in RAM, and even less frequently used data is on the SSD. But it doesn't mean it has to swap that data through VRAM to utilize it.

I mean, they use a similar system on professional cards, you know, the one that comes with NAND attached to it? Surely they already know how things work, and they are confident enough to unleash this tech on the consumer market...
 
[Image: Vega Final Presentation, slide 36 (via PCGH)]


Judging from that picture, HBCC is a memory manager that sits below the L2 and has access to the HBM (presumably HBM2 stacks), system RAM, NAND, and even the network (clearly aimed at enterprise customers). It moves pages of memory that it anticipates will be needed closer to, and into, the L2, and it removes pages from the L2 that have expired.

Would have to watch the presentation to be sure.
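As a rough guess at the policy that slide implies (illustrative names and thresholds, not AMD's code), the anticipate-and-expire loop might look like:

Code:
#include <cstdint>
#include <deque>
#include <unordered_map>

struct PageState {
    uint64_t lastUsedFrame = 0;
    bool resident = false;      // true once the page sits in fast memory
};

class AnticipatingManager {
    std::unordered_map<uint64_t, PageState> pages_;
    std::deque<uint64_t> migrationQueue_;  // drained by the DMA/copy engine
    uint64_t frame_ = 0;
    static constexpr uint64_t kExpiryFrames = 120;  // idle frames before demotion
    static constexpr int kMigrationsPerFrame = 8;   // bound per-frame bus traffic

public:
    // Called by whatever predicts upcoming accesses (e.g. last frame's usage).
    void predictUse(uint64_t page) {
        if (!pages_[page].resident)
            migrationQueue_.push_back(page);  // fetch ahead of actual need
    }

    // Called when the GPU actually touches a resident page.
    void markUsed(uint64_t page) { pages_[page].lastUsedFrame = frame_; }

    void endFrame() {
        ++frame_;
        // Drain a bounded number of migrations so the bus isn't saturated.
        for (int i = 0; i < kMigrationsPerFrame && !migrationQueue_.empty(); ++i) {
            pages_[migrationQueue_.front()].resident = true;
            migrationQueue_.pop_front();
        }
        // Expire pages that haven't been touched recently.
        for (auto& [page, state] : pages_) {
            (void)page;
            if (state.resident && frame_ - state.lastUsedFrame > kExpiryFrames)
                state.resident = false;  // demote to the slower tier
        }
    }
};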
 
I don't think it works that way. I think it can actually fetch data directly from the RAM pool (or even the SSD pool). It will just likely organize data in such a way that frequently used data is in VRAM, less frequently used data is in RAM, and even less frequently used data is on the SSD. But it doesn't mean it has to swap that data through VRAM to utilize it.

I mean, they use a similar system on professional cards, you know, the one that comes with NAND attached to it? Surely they already know how things work, and they are confident enough to unleash this tech on the consumer market...
In which case the HBCC should/could, in theory, be faster than the rest of the HBM, like I said previously, just as there's an SLC cache in TLC drives. Otherwise it makes little sense to partition the VRAM in a fashion that increases complexity, possibly negating HBM's advantage over traditional GDDR5(X) or anything else.
 
In which case the HBCC should/could, in theory, be faster than the rest of the HBM, like I said previously, just as there's an SLC cache in TLC drives. Otherwise it makes little sense to partition the VRAM in a fashion that increases complexity, possibly negating HBM's advantage over traditional GDDR5(X) or anything else.
hbcc is the controller, hbc is hbm.
what they are doing doesn't really increase complexity, it just builds on and expands the existing memory organization for more flexibility.
 
hbcc is the controller, hbc is hbm.
what they are doing doesn't really increase complexity, it just builds on and expands the existing memory organization for more flexibility.
I know that, but what's the point of a cache if it isn't faster than the VRAM (HBM), since the game (engine) & the OS do a bit of caching in software themselves? There are two theories in this very thread: it could be something like virtual memory/pagefile (so it has to be faster than normal VRAM operations), or it could be prefetch/superfetch, in which case I'm concerned about the overall benefit.
 
it is not meant to cache vram, vram itself is meant to be a cache for memory/storage at the next level, wherever that may be.

actually, from what has been said, the controller seems to be meant for caching vram as well, but that is done with the l2 cache.
 
L2 is hugely faster than HBM. Likely the reason they have no L3 cache is HBM's performance. Modern CPUs have an L3, some even an L4, because DDR3 was so slow compared to the L2. L4 went away (at least for now) because of the transition to DDR4.

My understanding is that what is unique to HBCC is this: previous generations of GPUs would only track where the data they need is. They would constantly overwrite that data with new data, and all the GPU knows about what is contained in that memory is what is in use and what is not. HBCC maintains not only usage but context. Imagine an asset sitting in system memory, like a texture. One frame uses that texture, so the GPU pulls it from RAM and sticks it in the HBC, then takes a tile of it from the HBC and moves it to L2, where the GPU continues to pull what is necessary from that tile to do actual work on it in the L1 caches. When the next frame uses the same texture, instead of having to go to system RAM again to fetch it (because the developer was an idiot and didn't precache it), the HBCC sees that asset already sitting in the HBC and starts using it instead of waiting to get it from system RAM. That saves a few milliseconds in render time.
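A toy version of that bookkeeping, with my own naming and a two-tier simplification, just to make the idea concrete:

Code:
#include <cstdint>
#include <unordered_map>

enum class Tier { SystemRam, Hbc };

class ResidencyTracker {
    std::unordered_map<uint64_t, Tier> where_;  // assetId -> current location

public:
    // Returns where the asset was found; migrates it into the HBC on a miss.
    Tier fetch(uint64_t assetId) {
        auto it = where_.find(assetId);
        if (it != where_.end() && it->second == Tier::Hbc)
            return Tier::Hbc;          // already resident: no PCIe transfer
        where_[assetId] = Tier::Hbc;   // pay the slow path once, then remember
        return Tier::SystemRam;
    }

    // Called when the page manager needs the space back.
    void evict(uint64_t assetId) { where_[assetId] = Tier::SystemRam; }
};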

I think it will help hugely with tessellation, for example.
 
VRAM hasn't been the bottleneck for a long time. Even crappy video cards now have 4GB as standard, which is more than enough for 1080p gaming. GPU horsepower is now the decisive factor in 99.9% of cases.
depends on the game. 4gb has not been enough in many cases for highest (usually texture) settings, especially at higher resolutions like 1440p or uhd.
this is even more relevant now when gpus do have horsepower to run games at these resolutions.
it's not about resolution or horsepower. a dev can choose to cram in high-density textures that require 6gb minimum, show a minor blur at 4gb, and stutter with more blur at 3gb. https://www.computerbase.de/2016-09/grafikkarten-speicher-vram-test/ has screenshots so you can see it for yourself

actually i'm surprised & disappointed that so many devs choose to fill up an entire 4gb. i prefer things to be streamed, with maximum detail on nearby objects, aka megatextures (well, even some of the streaming games seem to do a poor job if there is loss or stutter at 4gb)

a few days ago i had afterburner open in call of duty 4.... only 300something mb used! windows idle was like 100mb
 
today, practically all games are streaming textures.
 