Tuesday, February 28th 2017

AMD "Vega" High Bandwidth Cache Controller Improves Minimum and Average FPS

At its Capsaicin & Cream event today, AMD announced that its High Bandwidth Cache Controller (HBCC), a memory-management feature introduced with its "Vega" GPU architecture, will tangibly increase game performance. The company ran a side-by-side comparison of two sessions of "Deus Ex: Mankind Divided," in which an HBCC-aware machine purportedly presented 2x better minimum FPS and 1.5x better average FPS than a non-HBCC-aware system (though the old, trusty frame-rate counter was conspicuously absent from both demos).

AMD also went on to show how HBCC seemingly halves memory requirements, by deliberately capping the amount of addressable memory on the HBCC-aware system to only 2 GB - half of the 4 GB addressable by the non-HBCC-aware system - while claiming that, even so, the HBCC-enabled system still showed "the same or better performance" through its better memory management and bandwidth. If these results hold up to scrutiny, this should benefit implementations of "Vega" with lower amounts of video memory, while simultaneously reducing production costs and end-user pricing, since smaller memory pools would be needed for the same effect.

44 Comments on AMD "Vega" High Bandwidth Cache Controller Improves Minimum and Average FPS

#1
the54thvoid
What's the implication of throttling the GPU memory to 2 GB? It sounds like an artificial 'improvement' that fails after 2 GB of memory usage....
Posted on Reply
#2
acperience7
I don't understand how lowering the VRAM amount would help with FPS. Can anyone explain this?
Posted on Reply
#3
BiggieShady
By lowering the amount of available VRAM, the GPU is forced to actually heavily use the High Bandwidth Cache ... that's how I got it, as a comparison of HBC vs. RAM-PCIe-VRAM transfer
Having less VRAM should have less impact on FPS ... and incoming news about Vega HBCC halving memory requirements
Posted on Reply
#4
happita
The article says Human Revolution which is an older game, but the picture shows Mankind Divided. I'm thinking it's the latter, right?
Posted on Reply
#5
Steevo
BiggieShady, post: 3610047, member: 102776 said:
By lowering the amount of available VRAM, the GPU is forced to actually heavily use the High Bandwidth Cache ... that's how I got it
You are implying the GPU is aware of something, rather than engineered to use the HBC with logic that moves not only instructions but heavily used textures, maps, and models into the cache.

I would liken this to when we bought motherboards that had their own cache: the CPU was unaware of the cache, other than that instructions found in it executed much faster than from system memory. I believe their CPU division and the engineering of SoCs for Sony and MS are paying dividends for GPU tech as well.
Posted on Reply
#6
Camm
This is cool, but I'm still somewhat concerned about how much die space this takes up.
Posted on Reply
#7
BiggieShady
Steevo, post: 3610054, member: 19251 said:
You are implying the GPU is aware of something, rather than engineered to use the HBC with logic that moves not only instructions but heavily used textures, maps, and models into the cache.

I would liken this to when we bought motherboards that had their own cache: the CPU was unaware of the cache, other than that instructions found in it executed much faster than from system memory. I believe their CPU division and the engineering of SoCs for Sony and MS are paying dividends for GPU tech as well.
GPUs already do this, with RAM serving as an extension to VRAM and using PCIe for transfer when the VRAM amount isn't enough (less-used textures end up in RAM, often-used ones in VRAM) ... putting a high-bandwidth memory cache buffer in between to battle stutters (this feature will shine on low-end parts with less VRAM) ... nice one amd *slow clap*
Posted on Reply
#8
londiste
amd is clearly talking about memory-starved situations. this should not have much effect if your gpu actually has access to the necessary amount of vram.
Posted on Reply
#9
RejZoR
The reason they lowered the VRAM availability is that they wanted to place Vega into the worst possible situation - the situation where HBC really shows its strength: when game VRAM usage goes beyond what you actually have on board.

The only thing I wonder about is whether HBC can do the data management on its own or whether it has to be specifically coded for. Because if it can be used out of the box with anything, it'll be awesome. But if you have to specifically code for it, then that's a problem in itself.
Posted on Reply
#10
snakefist
Yes, this is a not-enough-RAM situation, simulated. From that point of view, it's a good thing, more future-proof (hoping it comes down to the middle segment, too - high-end users tend to replace GPUs more often, anyway)...
Posted on Reply
#11
NdMk2o1o
Looks like hbm won't be limited to their top-tier cards in future then, perhaps? You could put 2gb hbm on a mid-range card and get the same or better performance as one with 4gb gddr5!?...
Posted on Reply
#12
Steevo
I wonder where I can get my degree in keyboard engineering?

From the posts in this thread, at the Nvidia school of fanboy!!

First we had people butthurt about all the AMD news, since if you have to buy AMD you are obviously a piss-poor peon who shouldn't have a computer, and now we have a lot of posts about a new technology from AMD and lots of hate tossed its way by salad tossers, with no syrup.
Posted on Reply
#13
RejZoR
snakefist, post: 3610073, member: 154791 said:
Yes, this is a not-enough-RAM situation, simulated. From that point of view, it's a good thing, more future-proof (hoping it comes down to the middle segment, too - high-end users tend to replace GPUs more often, anyway)...
The concept is not new though. There were NVIDIA TurboCache and ATI HyperMemory cards that utilized small VRAM (usually just up to 256MB) while the rest was done via system RAM. It was a super cost-effective solution that delivered basically the same framerate as one with all that memory on the graphics card itself. And what AMD has done here is just a very refined HyperMemory. And they aren't using fast on-board memory, they are using REALLY fast on-board memory.
Posted on Reply
#14
Pruny
RejZoR, post: 3610097, member: 1515 said:
The concept is not new though. There were NVIDIA TurboCache and ATI HyperMemory cards that utilized small VRAM (usually just up to 256MB) while the rest was done via system RAM. It was a super cost-effective solution that delivered basically the same framerate as one with all that memory on the graphics card itself. And what AMD has done here is just a very refined HyperMemory. And they aren't using fast on-board memory, they are using REALLY fast on-board memory.
HyperMemory cards were so weak that they could not use that extra memory. It was a scam to sell crappy cards.
Posted on Reply
#15
the54thvoid
Ah, I see now - they simulated the card only having 2 GB of RAM, not dealing with 2 GB of texture or video data... Very nice in that case.
Posted on Reply
#16
RejZoR
the54thvoid, post: 3610135, member: 79251 said:
Ah, I see now - they simulated the card only having 2 GB of RAM, not dealing with 2 GB of texture or video data... Very nice in that case.
Well, the big Vega comes with 8GB of VRAM; that's enough even for the hungriest games today. They were forced to simulate it to make a point - otherwise it would all just run in the VRAM anyway. What this means is that you have 8GB on-board, but the game can utilize more than that without any performance penalty, whereas currently, if usage goes past the on-board memory, performance just tanks like insane.

@Pruny
That's not entirely true. The cards were VRAM-starved to begin with, some packing only 32MB of on-board VRAM, which was then expanded to 128MB - a common standard in 2004/2005. Meaning the cards could be ridiculously cheap, since they hardly had any expensive RAM on them.
Posted on Reply
#17
W1zzard
RejZoR, post: 3610097, member: 1515 said:
The concept is not new though. There were NVIDIA TurboCache and ATI HyperMemory cards that utilized small VRAM (usually just up to 256MB) while the rest was done via system RAM. It was a super cost-effective solution that delivered basically the same framerate as one with all that memory on the graphics card itself. And what AMD has done here is just a very refined HyperMemory. And they aren't using fast on-board memory, they are using REALLY fast on-board memory.
What is new here is that apparently the gpu has some concept of virtual memory, like your cpu does. Think pagefile. This is completely transparent to the application. Memory pages will be paged out automatically when memory gets low, probably based on some least-recently-used algorithm. When a page fault is generated by the gpu, the relevant pages are paged in by the gpu, automagically, but with higher latency.
Posted on Reply
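[Editor's note: the paging scheme described in the post above can be sketched as a toy LRU page cache. All names and numbers here are illustrative, not AMD's actual HBCC interface.]

```python
from collections import OrderedDict

PAGE_SLOTS = 4  # pretend VRAM fits only 4 pages

class ToyHBCC:
    """Toy model: VRAM holds a fixed number of pages; on a 'page fault'
    the least recently used page is evicted and the requested page is
    brought in over the (slow) PCIe path."""

    def __init__(self, slots: int = PAGE_SLOTS):
        self.slots = slots
        self.vram = OrderedDict()   # page_id -> data, ordered by recency
        self.faults = 0

    def access(self, page_id: int) -> str:
        if page_id in self.vram:
            self.vram.move_to_end(page_id)   # mark as most recently used
            return "hit"
        # Page fault: evict the least recently used page if VRAM is full.
        self.faults += 1
        if len(self.vram) >= self.slots:
            self.vram.popitem(last=False)
        self.vram[page_id] = f"page-{page_id}"
        return "fault"

cache = ToyHBCC()
pattern = [1, 2, 3, 4, 1, 2, 5, 1]
results = [cache.access(p) for p in pattern]
print(results)       # first touches fault; re-touches of hot pages hit
print(cache.faults)
```

The key property, as the post notes, is that this is transparent to the application: it simply touches pages, and the controller decides what lives in fast memory.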
#18
Pruny
cards like the X1050, up to the X1550, had HyperMemory; those were rubbish.
Posted on Reply
#19
londiste
RejZoR, post: 3610097, member: 1515 said:
The concept is not new though. There were NVIDIA TurboCache and ATI HyperMemory cards that utilized small VRAM (usually just up to 256MB) while the rest was done via system RAM. It was a super cost-effective solution that delivered basically the same framerate as one with all that memory on the graphics card itself. And what AMD has done here is just a very refined HyperMemory. And they aren't using fast on-board memory, they are using REALLY fast on-board memory.
the interesting part about hbcc is that, based on what little has been revealed, they are going the other way on the memory hierarchy. this seems to be a memory controller infused with some level of ability to control the l2 cache. it will be interesting to learn how and what exactly has been achieved.

W1zzard, post: 3610180, member: 1 said:
What is new here is that apparently the gpu has some concept of virtual memory, like your cpu does. Think pagefile. This is completely transparent to the application. Memory pages will be paged out automatically when memory gets low, probably based on some least-recently-used algorithm. When a page fault is generated by the gpu, the relevant pages are paged in by the gpu, automagically, but with higher latency.
that is what hypermemory and turbocache (and their successors) already did.

speculation at this point, but the new nuance here seems to be the 'cache' part, which hints at new pieces in the hierarchy. looks like amd might be preparing to equip cards with multiple types of memory, perhaps both hbm and gddr5x, and the new improved controller will be able to handle this better.
Posted on Reply
#20
Steevo
Pruny, post: 3610182, member: 167058 said:
cards like the X1050, up to the X1550, had HyperMemory; those were rubbish.
Are you replying to yourself? Are you a bot? The 1050 was more of a "get Aero on Vista and pretties for $45" card aimed at office computers than anything. I don't think I ever put anything less than a 1600 in a computer.
Posted on Reply
#21
RejZoR
W1zzard, post: 3610180, member: 1 said:
What is new here is that apparently the gpu has some concept of virtual memory, like your cpu does. Think pagefile. This is completely transparent to the application. Memory pages will be paged out automatically when memory gets low, probably based on some least-recently-used algorithm. When a page fault is generated by the gpu, the relevant pages are paged in by the gpu, automagically, but with higher latency.
So, in essence, AMD has expanded the cache hierarchy. We have L1 and L2 on the GPU itself, L3 is basically VRAM (I'm not aware of L3 being used on GPUs, unlike with CPUs - or is it?), and now they've added L4, which is system RAM. All this is usually controlled by algorithm/prediction-based prefetchers.

I mean, if this is fully automatic without any need for special game code, it's gonna be nice, and it's going to dramatically expand the usability of the graphics card over time as it ages and new demanding games come out needing more memory. Sure it won't be as fast as having that much VRAM available at all times, but it won't be nearly as bad as running out of VRAM entirely. I know Win8/Win10 already do this to a small extent, but I don't think nearly to the extent that Vega will be doing it.

I mean, with Vega, my 32GB of system RAM will finally find a very good use. For games, not even 16GB is really needed, meaning the other 16GB is idling to itself most of the time. But Vega will be able to use that. I like the idea very much.
Posted on Reply
#22
londiste
aren't you forgetting that ram is on the other side of a (likely actively used) pci-e x16 link?
Posted on Reply
#23
Steevo
londiste, post: 3610233, member: 169790 said:
aren't you forgetting that ram is on the other side of a (likely actively used) pci-e x16 link?
The actual amount of data used is negligible over the PCIe bus. I suggest a read of the PCIe scaling article W1zz did: most graphics cards only use x4 lanes of 2.0 in actual bandwidth, and more than that only gives a few percent (not frames per second) more performance, so 60 FPS +/- 3% doesn't really mean much.
Posted on Reply
#24
Nabarun
Bla bla bla. Do I get to buy something that performs like the 1070 but costs less than 460/950? Otherwise it's just fvcking bla bla bla.
Posted on Reply
#25
londiste
Steevo, post: 3610274, member: 19251 said:
The actual amount of data used is negligible over the PCIe bus. I suggest a read of the PCIe scaling article W1zz did: most graphics cards only use x4 lanes of 2.0 in actual bandwidth, and more than that only gives a few percent (not frames per second) more performance, so 60 FPS +/- 3% doesn't really mean much.
well, don't focus on the used part. pci-e 3.0 x16 has 15.75 GB/s of bandwidth. rx480 with its good old gddr5 has 200+GB/s, high-end gpus have more.

in the context of using ram in addition to vram, why would memory management, rather than that very narrow pipe, be the limiting factor?
Posted on Reply
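[Editor's note: the bandwidth figures in the post above can be checked with a little arithmetic. This is a sketch of peak theoretical numbers, assuming the 8 Gbps GDDR5 variant of the RX 480 and ignoring protocol overhead.]

```python
def pcie3_bandwidth_gbs(lanes: int) -> float:
    """One-direction PCIe 3.0 bandwidth in GB/s for a given lane count."""
    gt_per_s = 8.0           # 8 GT/s per lane
    encoding = 128 / 130     # 128b/130b line encoding
    return lanes * gt_per_s * encoding / 8  # bits -> bytes

def gddr5_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak GDDR5 bandwidth in GB/s for a given bus width and data rate."""
    return bus_width_bits * data_rate_gbps / 8

pcie = pcie3_bandwidth_gbs(16)          # ~15.75 GB/s, matching londiste's figure
vram = gddr5_bandwidth_gbs(256, 8.0)    # 256-bit bus at 8 Gbps
print(f"PCIe 3.0 x16: {pcie:.2f} GB/s")
print(f"RX 480 GDDR5: {vram:.0f} GB/s")
print(f"VRAM is roughly {vram / pcie:.0f}x wider than the PCIe pipe")
```

The order-of-magnitude gap is the point of the exchange: anything paged in from system RAM arrives roughly sixteen times slower than on-board VRAM at peak, which is why HBCC's job is to keep only cold pages on the far side of the pipe.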