Friday, March 31st 2017

AMD's RX Vega to Feature 4 GB and 8 GB Memory

It looks like AMD is confident enough in its HBC (High-Bandwidth Cache) and HBCC (High-Bandwidth Cache Controller) technology, and other assorted improvements to Vega's overall memory management, to consider 4 GB enough memory for high-performance gaming and applications. At a Beijing tech summit, AMD announced that its RX Vega cards (the highest performers in its next-generation product stack, which otherwise features rebrands of the RX 400 series as the new RX 500 series) will come in 4 GB and 8 GB HBM2 (512 GB/s) memory configurations. The HBCC looks to ensure that we don't see a repeat of AMD's Fury X video card, which featured first-generation HBM (High-Bandwidth Memory), at the time limited to 4 GB stacks; lacking extensive memory management improvements, the Fury X sometimes struggled in memory-heavy workloads.

If the company's Vega architecture deep dive is anything to go by, they may be right: remember that AMD put out a graph showing how memory allocation is almost twice as big as the amount of memory actually used - and it's here, with smarter, improved memory management and allocation, that AMD is looking to make do with only 4 GB of video memory (which is still more than enough for most games, mind you). This could be a turn-of-the-screw moment for the "more is always better" philosophy.
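The mechanism is easiest to picture as a page cache: the card keeps only the hot pages of a much larger allocation in its local HBM2 and spills the rest to system memory, fetching pages back on demand. Below is a minimal sketch of that idea, not AMD's actual algorithm; the page size, capacity, and LRU policy are assumptions for illustration only:

```python
from collections import OrderedDict

PAGE_SIZE = 64 * 1024                  # assumed paging granularity, in bytes
HBM_PAGES = 4 * 2**30 // PAGE_SIZE     # a 4 GB on-package "high-bandwidth cache"

class HighBandwidthCache:
    """Toy model: HBM2 holds the hot pages of a larger allocation;
    everything else stays in system RAM until it is actually touched."""
    def __init__(self, capacity=HBM_PAGES):
        self.capacity = capacity
        self.resident = OrderedDict()          # page id -> True, in LRU order

    def access(self, page):
        if page in self.resident:              # hit: data is already in HBM2
            self.resident.move_to_end(page)
            return "hit"
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least-recently-used page
        self.resident[page] = True             # pull the page in over the bus
        return "miss"
```

The bet AMD appears to be making is that a frame's working set is much smaller than what games allocate, so misses stay rare even with 4 GB on the package.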

52 Comments on AMD's RX Vega to Feature 4 GB and 8 GB Memory

#26
Prima.Vera
Guys, relax. 4GB of VRAM is plenty for 1080p, which is what those 4GB cards will probably target. Since MSAA is quickly being replaced by FXAA/SMAA, and new texture and color compression algorithms keep getting better, there's no need to dramatize about low VRAM.
Posted on Reply
#27
FordGT90Concept
"I go fast!1!11!1!"
Vega is not aimed at 1920x1080, it is aimed at 2560x1440 and up. RX 480/580 is sufficient for 1920x1080.

The reason for huge pools of memory is because of massive textures and models which have sharply been increasing since PS4 and XB1 released. Selling a premium card with budget amounts of memory makes no sense.
Captain_TomAMD claims to have new tech that allows for half as much memory to be used as normal. If that's true, then a 4GB model is more than viable. In fact I would say this kinda confirms that AMD is confident it will work. They wouldn't do it otherwise...
Developers will do whatever they want to. AMD is banking on them implementing their optimizations to reduce the memory footprint. Very few titles will.

Some games don't use >4 GiB of VRAM but some do and with every passing year, the latter group gets bigger.
Posted on Reply
#28
Captain_Tom
FordGT90ConceptVega is not aimed at 1920x1080, it is aimed at 2560x1440 and up. RX 480/580 is sufficient for 1920x1080.

The reason for huge pools of memory is because of massive textures and models which have sharply been increasing since PS4 and XB1 released. Selling a premium card with budget amounts of memory makes no sense.


Developers will do whatever they want to. AMD is banking on them implementing their optimizations to reduce the memory footprint. Very few titles will.

Some games don't use >4 GiB of VRAM but some do and with every passing year, the latter group gets bigger.
It's not that simple. This is at the architectural level (Just like memory compression, though not the same thing).
Posted on Reply
#29
FordGT90Concept
"I go fast!1!11!1!"
Any assets not loaded into the HBM2 on-board memory will see bandwidth drop from 512+ GB/s to ~20 GB/s. More cache means it is less likely to incur that performance penalty. Even if the architecture is better at deciding which assets are needed in the HBM2 cache, it's still going to get it wrong sometimes, especially when it can only cache half as much.
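To put rough numbers on that penalty (the 32 MiB texture size and ~16 GB/s of practical PCIe 3.0 x16 bandwidth are assumptions, not figures from AMD):

```python
texture_bytes = 32 * 2**20      # an example 32 MiB texture
hbm2_bw = 512e9                 # ~512 GB/s on-package HBM2
pcie_bw = 16e9                  # roughly what PCIe 3.0 x16 can deliver in practice

print(texture_bytes / hbm2_bw * 1e3)   # ~0.07 ms if the texture is resident in HBM2
print(texture_bytes / pcie_bw * 1e3)   # ~2.1 ms if it must cross the bus mid-frame
```

Against a 16.7 ms frame budget at 60 FPS, a couple of milliseconds per missed asset is exactly the kind of spike the cache controller has to avoid.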
Posted on Reply
#30
RejZoR
Dude, where were you when HBC was explained and presented? They ran Deus Ex: Mankind Divided on RX Vega with VRAM artificially limited to 2GB, to induce a memory capacity issue scenario, and the game still ran smoothly. And I don't get the skepticism; the concept isn't exactly new, it existed years ago in the form of HyperMemory for AMD and TurboCache for NVIDIA. It worked very similarly, it's just faster and more adaptable now. Plus, things have become faster on the system side as well: faster RAM, triple and quad channel configurations, etc.
Posted on Reply
#31
FordGT90Concept
"I go fast!1!11!1!"
DXMD has small environments. Games like Shadow of Mordor are a completely different animal.

Quad channel will still only get you 100 GB/s under the best circumstances which is a far cry from the minimum 512 GB/s of the HBM stacks.
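That ~100 GB/s figure is consistent with a high-end desktop platform; a quick back-of-the-envelope check, assuming DDR4-3200 on four 64-bit channels:

```python
transfers_per_s = 3200e6   # DDR4-3200: 3200 MT/s (assumed memory speed)
bytes_per_xfer = 8         # one 64-bit channel moves 8 bytes per transfer
channels = 4               # quad-channel platform

print(transfers_per_s * bytes_per_xfer * channels / 1e9)   # ~102.4 GB/s theoretical peak
```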
Posted on Reply
#32
mrthanhnguyen
I hope Vega won't disappoint us at launch, coz someone will make an excuse like "wait for game developers to optimize for Vega and it will outperform its Nvidia counterpart soon."
Posted on Reply
#33
RejZoR
FordGT90ConceptDXMD has small environments. Games like Shadow of Mordor are a completely different animal.

Quad channel will still only get you 100 GB/s under the best circumstances which is a far cry from the minimum 512 GB/s of the HBM stacks.
You still don't understand. There is NO NEED for RAM to have bandwidth as high as VRAM, though it probably helps if it's higher. Small environments or big environments, it doesn't matter. At all. Ever played Killing Floor 2? It has quite big levels and tons of enemies. If you leave texture streaming enabled, it hardly uses any VRAM, even on Ultra settings. But if you turn it off, it'll fill basically all your VRAM. This is how HBC works, just at the hardware level and not at the game level.
Posted on Reply
#34
RejZoR
mrthanhnguyenI hope Vega won't disappoint us at launch, coz someone will make an excuse like "wait for game developers to optimize for Vega and it will outperform its Nvidia counterpart soon."
It's funny you say that; AMD has proved that to be the case on almost every graphics card launch. And they do in fact outperform NVIDIA in the long term when cards that went head to head with relatively similar performance are compared. It's funny that people still keep bringing it up over and over again...
Posted on Reply
#35
JMccovery
Captain_TomThank you. Some people just can't read lol.


I wouldn't be surprised if someone like SAPPHIRE released a toxic edition with 16GB of VRAM, but 8GB standard would be more than enough.
I think you're a little bit unsure about how HBM works. Since AMD places the HBM stacks along with the GPU on an interposer, Sapphire couldn't offer a 16GB edition unless AMD releases a GPU with 16GB, Sapphire releases a dual-GPU card, or there are additional memory controllers in the GPU that allow GDDR5/X to be used in combination with HBM.
Posted on Reply
#36
FordGT90Concept
"I go fast!1!11!1!"
RejZoRYou still don't understand. There is NO NEED for RAM to have bandwidth as high as VRAM, though it probably helps if it's higher. Small environments or big environments, it doesn't matter. At all. Ever played Killing Floor 2? It has quite big levels and tons of enemies. If you leave texture streaming enabled, it hardly uses any VRAM, even on Ultra settings. But if you turn it off, it'll fill basically all your VRAM. This is how HBC works, just at the hardware level and not at the game level.
I had a longer post here but basically...
www.tweaktown.com/tweakipedia/90/much-vram-need-1080p-1440p-4k-aa-enabled/index.html
-Streaming engines (Killing Floor is not one) use more VRAM in general.
-Streaming engines tend to have more data cached, because the player simply turning around in the world can result in a wildly different scene that needs to be rendered.
-A small, detailed environment can be more demanding than a not-so-detailed large environment. It boils down to textures, shaders, triangles, and post-rendering effects applied (e.g. anti-aliasing).
-Thanks to tessellation, even if you have a lot of models of the same kind of enemy in the game, you can render them repeatedly at little memory cost.
-Texture streaming is not the same as a streaming engine. Texture streaming doesn't precache textures, which means there's a frame time spike whenever a new resource needs to be pulled from memory to the GPU to render the scene. In games with literally tens of gigabytes of textures, it's impossible to precache all of them. Streaming engines naturally have to stream textures as well.
-HBCC literally only reduces the effect of the frame time spike by a few milliseconds. Preemption is far better.

Wrapping back to the link: Witcher 3, Far Cry 4, GTAV, and Shadow of Mordor are streaming engines. Of those, GTAV and Shadow of Mordor both hit or exceed 4 GiB of VRAM at 1920x1080 and climb from there. The other two are surprisingly well optimized, likely because they aren't as aggressive at caching.
Posted on Reply
#37
RejZoR
Erm, no, no, nope, NOPE.

- Streaming engines use LESS VRAM. That's the whole point of this tech. It loads textures into memory on the fly for the region of the game you're currently in, not the whole level at the start of the game. Killing Floor 2 most certainly DOES use texture streaming; I made a tweaker for the game around this very feature. The original Killing Floor is a UE 2.5 game, so of course it doesn't have texture streaming.
- See first point.
- Triangles or polygons have absolutely nothing to do with any of it.
- Tessellation has absolutely nothing to do with any of it.
- Texture streaming or streaming engine, same exact thing, except the first streams only textures and the second can also stream world objects/entities. Texture streaming does in fact precache textures so they are ready for the engine before it actually needs them. If that were not the case, you'd see the world without textures because you wouldn't have them when needed, or they'd pop into existence, which is very unwanted behavior. Which is why you have to preemptively fetch them into memory (precache) and make them available to the engine a bit before they are actually needed. All this is happening in the background, all the time. Which is also why such games experience stuttering with HDDs: the game is constantly fetching data and loading it into VRAM, but HDDs do it very slowly. But the gain is that you need way less VRAM, because at any point you don't have the textures of the whole level in VRAM, only those for the small section of the level where you're currently located. Textures fill the majority of VRAM during rendering, like 3/4 of it. The rest is model meshes, entities and the framebuffer (+other game data).
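What RejZoR describes boils down to a background loop like the one below: each frame, work out which textures will be needed near the player soon, queue loads from disk before they become visible, and evict what is far away. This is only a sketch of the general technique; the radii and the region/disk objects and their methods are made up for illustration, not taken from any real engine:

```python
PREFETCH_RADIUS = 120.0   # assumed: start loading textures this far ahead (world units)
EVICT_RADIUS = 200.0      # assumed: drop residency beyond this distance

def stream_textures(player_pos, regions, vram_cache, disk):
    """Toy texture-streaming pass, run continuously in the background."""
    for region in regions:
        d = region.distance_to(player_pos)          # placeholder region API
        if d < PREFETCH_RADIUS:
            for tex in region.textures:
                if tex not in vram_cache:
                    vram_cache[tex] = disk.read_async(tex)   # precache before it's visible
        elif d > EVICT_RADIUS:
            for tex in region.textures:
                vram_cache.pop(tex, None)            # free VRAM for nearer regions
```

The stutter on HDDs he mentions comes from exactly this loop: when the read can't finish before the texture is needed, the renderer either waits or draws a low-resolution fallback.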
Posted on Reply
#38
FordGT90Concept
"I go fast!1!11!1!"
The VRAM in a streaming engine is going to depend on how far the draw distance is and the steepness of the level of detail line. A relatively flat line with a high draw distance (e.g. mountains very, very far away) will still amass a huge number of textures and triangles in VRAM to sort through to render the frame; likewise, a short draw distance with high levels of detail will amass a huge number of polygons to draw which occupies VRAM. Game development is always about striking a balance between draw distance, levels of detail, and acceptable framerates (filling VRAM, like any RAM, results in a huge drop in performance).
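The "steepness of the level of detail line" is just how quickly detail falls off with distance: a steeper curve swaps distant objects to cheaper meshes and lower mips sooner, which is what keeps the resident set small. A toy version of that trade-off; the thresholds and the quarter-per-level mip cost are simplifying assumptions:

```python
def lod_for_distance(distance, lod_step=50.0, max_lod=4):
    """Pick a level of detail: 0 = full-resolution mesh/mips, higher = cheaper.
    A larger lod_step gives a flatter LOD line and a bigger VRAM footprint."""
    return min(int(distance // lod_step), max_lod)

def resident_texture_bytes(base_bytes, lod):
    """Rough mip-chain cost: each LOD step roughly quarters the texture footprint."""
    return base_bytes // (4 ** lod)
```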

Triangles are literally everything you see in 3D games. GPUs draw them to create a frame. That's data that lives in the VRAM.

Tessellation is literally about doing more with less, because math. Have a presentation:
www.seas.upenn.edu/~cis565/LECTURE2010/GPU Tessellation.pptx
In hardware tessellation allows a simple mesh to be sent down to the GPU, converted to a complex mesh, and then displayed
-Decrease memory to the card
-Increase rendering performance by decreasing the number of polygons through the full pipeline
Streaming engines take a location in the world and then load everything they need based on that. As you move about the world, the engine has to decide what can be disposed of and what needs to be loaded. That includes everything from collision objects to textures. When you fast travel in a streaming engine, there's always loading as it disposes of your current location and loads the next one. If there were no loading, you'd literally see nothing and fall endlessly, because there's literally nothing there until the engine has said "I have enough for the player now." Strictly streaming textures is minor compared to a streaming engine.

When streaming anything, there is a degree of precaching. The streaming code tries to preempt what can be seen in that context and attempts to get it into the pipeline in case it is needed. If you move about the world faster than the textures can stream (e.g. on a very slow HDD), you'll either see texture pop-in or loading screens (GTA3 did this way back; GTA4 did it on consoles).

The frame time spikes occur when preemption fails to catch a resource that's needed.

There are games that don't really use many textures but can still saturate VRAM with polygons and post-processing effects.
Posted on Reply
#39
efikkan
RejZoRI'm just wondering how they are doing it. How do they prioritize what goes into on-board memory and what goes into system RAM? Is it game dependent or is it fully on-the-fly? Do they have to make game profiles? This is the stuff I'm wondering about the most. Because if they can achieve this fully on-the-fly without any profiles, with just really intelligent algorithms (maybe assisted by driver-updatable algorithms at the software level), that could be really sweet.
Resource streaming has to be implemented in the game engine.
Prefetching in CPUs works by finding access patterns, e.g. access of the block at address x, then x + k, then x + 2k (see the sketch after this list), but it has three requirements:
- The data to be accessed needs to be laid out specifically in the way it's going to be accessed.
- There have to be several accesses before there can be a pattern, which means several cache misses, which in turn means stutter or missing resources.
- The patterns have to occur over a relatively short time, and there is no way you can look for patterns in hundreds of thousands of memory accesses. A CPU, for comparison, looks through an instruction window of up to 224 instructions. For GPUs we have queues of up to several thousand instructions, and it's not like the driver is going to analyze the queues over several frames to look for patterns, keep a giant table, and resolve all that immediately.
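A toy version of that stride detection, just to make the "x, then x + k, then x + 2k" point concrete (real prefetchers track many independent streams in hardware tables; this deliberately ignores all of that):

```python
def predict_next(accesses):
    """If the most recent accesses form a constant stride, guess the next address."""
    if len(accesses) < 3:
        return None                      # not enough history: the first misses are unavoidable
    strides = [b - a for a, b in zip(accesses, accesses[1:])]
    if len(set(strides[-2:])) == 1:      # the last two strides agree -> assume a pattern
        return accesses[-1] + strides[-1]
    return None                          # irregular accesses defeat the prefetcher

# predict_next([0x1000, 0x1040, 0x1080]) -> 0x10C0
```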

The only game data that would benefit from this is landscape data, but the data still needs to be laid out in a specific pattern, which is something developers usually don't control. Also, this kind of caching would only work as long as the camera keeps moving in straight lines over time.

Resource streaming can be very successful when it's implemented properly in the rendering engine itself.
RejZoRRX480 and RX580 might have 8GB, but they only have that and no more. RX Vega with HBC can address all the memory you have in the system. In my case that would be 16+ GB of always free RAM and 8GB on-board. Not even GTX 1080Ti or Titan X Pascal has that.
FYI: Sharing of memory between CPU and GPU has been available in CUDA for years, so the idea is not new. It does however have very limited use cases.
RejZoRProblem with texture streaming is that you're essentially doing VRAM+HDD/VRAM+SSD instead of something a lot faster. And Vega's HBC with VRAM+RAM (+SSD) could certainly address that far better, the same way a CPU addresses its memory hierarchically. L1 cache is VRAM. L2 is RAM. L3 can be SSD. Because texture streaming still causes hitching, stuttering and framerate lag when done the way current game engines do it (VRAM+HDD), since it's parsing textures from a really slow medium.
As someone who has implemented texture streaming with a three-level hierarchy, I can tell you the problem is prediction. With HBC each access to RAM is still going to be very slow, so the data has to be prefetched. HBC is not going to make the accesses "better".
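The three-level idea maps naturally onto a lookup that walks VRAM, then system RAM, then SSD, promoting data upward on a miss. The sketch below only illustrates why prediction matters: the per-page costs are invented, illustrative numbers (not measurements), and plain Python sets stand in for real storage tiers:

```python
# invented, illustrative access costs per page, in milliseconds
COST = {"vram": 0.0001, "ram": 0.004, "ssd": 0.5}

def fetch(page, vram, ram):
    """Walk the hierarchy (the SSD is assumed to hold everything); a VRAM miss
    pays the slower tier's cost either way, so the real win has to come from
    prefetching pages before they are needed."""
    if page in vram:
        return COST["vram"]
    if page in ram:
        vram.add(page)        # promote into the high-bandwidth cache
        return COST["ram"]
    ram.add(page)             # pulled all the way from SSD...
    vram.add(page)            # ...and promoted into VRAM
    return COST["ssd"]
```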
Posted on Reply
#40
RejZoR
Apparently they have it working regardless; they wouldn't even be considering it if it were so problematic and gimped by every third game. That's all I can say. And I know well everything you've said there, as I've worked with a similar caching system for storage. One was file-based (basically like texture streaming, highly configurable and selective), and later a block-based one, which has no idea what each program/file (the texture analogy) is; it just caches based on access patterns.
Posted on Reply
#41
efikkan
Engineering samples of Vega 10 use 8 GB of memory, so we can expect versions with 8 GB. Vega 10 is going to compete with GP104, so 8 GB will probably be fine.
Posted on Reply
#42
FordGT90Concept
"I go fast!1!11!1!"
The only way 4 GiB makes sense is if they have to cut these chips down so far that they end up retailing for $200-250, and I certainly hope that isn't the case.
Posted on Reply
#43
RejZoR
That'll be the price of the RX 580; you won't see any Vega for 250 bucks.
Posted on Reply
#44
laszlo
Nice! Why pay more when 4 GB of HBM2 usage is optimized to the level of 8 GB of GDDR5? Cheaper tech always sounds good for the wallet.
Posted on Reply
#45
Hotobu
So when is the estimated release of Vega? I'm seeing a lot of conflicting news
Posted on Reply
#46
kruk
HotobuSo when is the estimated release of Vega? I'm seeing a lot of conflicting news
According to AMD it's just around the corner; according to astronomers it's 25.04 light years away; but it should certainly appear before the end of 1H 2017 :). Since the leaks are so scarce, I wouldn't expect anything before June this year (the Fury X and RX 480 launched in about the same timeframe).
Posted on Reply
#47
64K
In other words, AMD is planning to launch a competitor for the mid range GTX 1080 (non-Ti) a year and a half after the 1080 (non-Ti) was launched. Mid-range Voltas will be launched a few months later and then we wait a year and a half for Navi to drop and hope that it can compete with the mid-range Volta. This is just sad. We will pay through the nose for those Voltas because of this.
Posted on Reply
#48
ratirt
64KIn other words, AMD is planning to launch a competitor for the mid range GTX 1080 (non-Ti) a year and a half after the 1080 (non-Ti) was launched. Mid-range Voltas will be launched a few months later and then we wait a year and a half for Navi to drop and hope that it can compete with the mid-range Volta. This is just sad. We will pay through the nose for those Voltas because of this.
From what I have read about Vega, there will be several different cards released, so maybe there will be competition for the 1080 Ti anyway. Supposedly six different Vega card configurations are in store. What makes you so sure the 1080 Ti won't get competition from AMD? I surely hope it does; that would make the pricing better, and I'm all for that.
Posted on Reply
#49
64K
ratirtFrom what I have read about Vega, there will be several different cards released, so maybe there will be competition for the 1080 Ti anyway. Supposedly six different Vega card configurations are in store. What makes you so sure the 1080 Ti won't get competition from AMD? I surely hope it does; that would make the pricing better, and I'm all for that.
Just going from what AMD has been saying about Vega. I think it will perform somewhere around a 1080 (non-Ti) in most games and outperform the 1080 in DX12 or Vulkan games. I see no reason why AMD couldn't release a beefier, faster Vega that competes with the 1080 Ti, but can they do it while keeping the wattage reasonable? I hope so.
Posted on Reply
#50
medi01
The last (the only?) time we saw Vega. Note that just beating the 1080 (314 mm^2) would be laughable for a nearly 500 mm^2 chip; it would be a Sony-level epic fail. The thing must take on the 1080 Ti at least:

FordGT90ConceptSome games don't use >4 GiB of VRAM but some do and with every passing year, the latter group gets bigger.
So we know equipping cards with HBM is expensive, mkay, so that's why AMD even bothered with the whole "high bandwidth cache" thing.
At least in theory it sounds quite feasible: "dear developer, we know you don't need all that mem at once, just allocate whatever you need, we'll handle moving things into GPU mem ourselves, oh, and by the way, we don't call it VRAM, we call it high bandwidth cache now".

The basic idea here is that, especially in the professional space, data set size is vastly larger than local storage. So there needs to be a sensible system in place to move that data across various tiers of storage. This may sound like a simple concept, but in fact GPUs do a pretty bad job altogether of handling situations in which a memory request has to go off-package. AMD wants to do a better job here, both in deciding what data needs to actually be on-package, but also in breaking up those requests so that “data management” isn’t just moving around a few very large chunks of data. The latter makes for an especially interesting point, as it could potentially lead to a far more CPU-like process for managing memory, with a focus on pages instead of datasets.






www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/3

Posted on Reply