
DirectX 12 API New Feature Set Introduces GPU Upload Heaps, Enables Simultaneous Access to VRAM for CPU and GPU

T0@st

News Editor
Microsoft has added two new features to its DirectX 12 API via the latest Agility SDK 1.710.0 preview - GPU Upload Heaps and Non-Normalized Sampling - and the former looks to be the more intriguing of the pair. The SDK preview, officially introduced on Friday 31 March, is only accessible to developers at present, and support has also been enabled in the latest graphics drivers from NVIDIA, Intel, and AMD. The Microsoft team has this to say about the preview version of the GPU Upload Heaps feature in DirectX 12: "Historically a GPU's VRAM was inaccessible to the CPU, forcing programs to have to copy large amounts of data to the GPU via the PCI bus. Most modern GPUs have introduced VRAM resizable base address register (BAR) enabling Windows to manage the GPU VRAM in WDDM 2.0 or later."

They continue to describe how the update allows the CPU to gain access to the pool of VRAM on the connected graphics card: "With the VRAM being managed by Windows, D3D now exposes the heap memory access directly to the CPU! This allows both the CPU and GPU to directly access the memory simultaneously, removing the need to copy data from the CPU to the GPU increasing performance in certain scenarios." This GPU optimization could offer many benefits in the context of computer games, since memory requirements continue to grow in line with an increase in visual sophistication and complexity.
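On the developer side, the new heap type surfaces as D3D12_HEAP_TYPE_GPU_UPLOAD, gated behind a feature check. The snippet below is only a rough sketch of the intended flow - the device, dataSize and srcData variables are placeholders and error handling is trimmed - not code taken from the SDK samples:

Code:
// Ask the driver whether GPU upload heaps are available (needs resizable BAR).
D3D12_FEATURE_DATA_D3D12_OPTIONS16 options16 = {};
bool gpuUploadHeaps =
    SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16,
                                          &options16, sizeof(options16)))
    && options16.GPUUploadHeapSupported;

if (gpuUploadHeaps)
{
    // Place a buffer directly in CPU-visible VRAM.
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_GPU_UPLOAD;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = dataSize;          // placeholder size
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    ID3D12Resource* buffer = nullptr;
    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&buffer));

    // The CPU writes straight into VRAM; the GPU can read the same resource
    // without a separate staging copy over the PCIe bus.
    void* mapped = nullptr;
    D3D12_RANGE noRead = { 0, 0 };              // we don't intend to read back
    buffer->Map(0, &noRead, &mapped);
    memcpy(mapped, srcData, dataSize);          // srcData is a placeholder
    buffer->Unmap(0, nullptr);
}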



A shared pool of memory between the CPU and GPU eliminates the need to keep duplicate copies of game data in both system memory and graphics card VRAM, thereby reducing the amount of data shuttled between the two locations. Modern graphics cards tend to feature much faster on-board memory (GDDR6) than main system memory (DDR5 at best), so in theory the CPU could benefit greatly from direct access to a pool of ultra-quick VRAM, perhaps giving an early preview of a time when DDR6 becomes the everyday standard for main system memory.


View at TechPowerUp Main Site | Source
 
Will 12 GB be the de facto standard for graphics cards?

Yeah. GPUs with less than 12 GB will have varying degrees of trouble with games going forward. 4-6 GB cards will be confined to the lowest settings and low resolutions. That goes hand in hand with the past two generations of 8-12 GB GPUs being readily available in the midrange, plus 32 GB becoming the new de facto standard for main memory.
 
Yeah. GPUs with less than 12 GB will have varying degrees of trouble with games going forward. 4-6 GB cards will be confined to the lowest settings and low resolutions. That goes hand in hand with the past two generations of 8-12 GB GPUs being readily available in the midrange, plus 32 GB becoming the new de facto standard for main memory.
Thank you. I'm Japanese but using a 3060Ti. I'll use it as a reference.
 
One step closer to AMD's decade old hUMA/HSA dream :toast:
[Image: AMD hUMA benefit slide]

Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the "Hardware Vendor #3" situation over and over again.

Thank you. I'm Japanese but using a 3060Ti. I'll use it as a reference.

Quite so - the 3060 Ti is still a great card. It might have a little trouble at 1440p or higher with ray tracing enabled because of the 8 GB, but it should run games at 1080p very well for the foreseeable future :clap:
 
Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the "Hardware Vendor #3" situation over and over again.
Every innovation needs the right time and the right market conditions to sell, really.

And then you need to sell the innovation.

AMD is notoriously bad at both timing and selling.
 
I'd argue it's timing - they're great at selling! Just look at the first (dual-core) Athlons, Zen, and to a much lesser extent the early GCN cards. If you exclude the cult of JHH (or the fruity loops crowd), AMD probably has the most loyal supporters out there!
 
One makes an article about DX12.

Puts up some GPU board picture.

A GTX 285, which supports DX11.1 at most.

Stupid artists ®
 
Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the "Hardware Vendor #3" situation over and over again.
It's easy to set out those future goals and objectives; it's extraordinarily hard to implement them, as it isn't just a matter of developing a single thing but the whole platform, ecosystem, and the applications that run on it. I don't think AMD's current APUs even support their past dream (hardware-wise); I believe there is no coherency between CPU and GPU caches, so memory accesses between them necessarily need to go straight to memory.
 
I'm amazed that Microsoft didn't save this for DirectX 13. It must not actually speed up performance in any substantial way.
 
Commented on this in another thread

MS is always working on things like this, always has been. They've always wanted their OS to run well (to sell it on more PCs), and now that they have a gaming console using the same code under the hood, they really do care about performance - this is the sort of thing they can slip into all their Xbox games in a dev environment, then make a big media fuss about free performance for all Xbox owners if it works out.

Game devs either use them silently, or cover them up with something that costs more performance than was gained (CoH and the '-nolitter' DX10 command come to mind, then Crysis).


This seems like it had to be developed to work with DirectStorage, which has been getting a lot of traction this year - anything fed directly from NVMe to the GPU had no way to be modified or altered, and this is a fallback method that doesn't require reverting to the older system. It could be as simple as a single line of code for a bugfix, which saves them having to send a 15 GB texture pack out to every client.


In the far reaches of human history (prior to DX9), everything was essentially duplicated into system RAM because the CPU did all the work, so it needed a live copy of the active data to decompress, modify, whatever.
DX10 reduced a lot of this, allowing data to be fed over more efficiently with less CPU overhead, but it wasn't documented super well beyond vague descriptions of "fewer CPU calls" and really nerdy documents:

Thanks to the architecture of the new WDDM (Windows Display Driver Model), applications now create Direct3D 10 resources with different usage flags to indicate how the application intends on using the resource data. The new driver model virtualizes the memory used by resources; it then becomes the responsibility of the operating system/driver/memory manager to place resources in the most performant area of memory possible given the expected usage.
Example: Vista's Aero interface could run in an emulated software mode or in a hardware mode that reduced CPU usage.
Windows 7 had the option of a DX9 mode or full DX10.1 hardware acceleration, freeing up CPU resources and system RAM - those poor office workers needed every megabyte they could save (and then their Intel IGP took it from system RAM anyway, ironically).

This was why a lot of low-end Vista laptops (thanks, Intel Atom) felt so sluggish: the IGP couldn't do hardware mode and the CPUs were too weak to run the animations smoothly.

Windows 7 DWM cuts memory consumption by 50% | istartedsomething
I remember quoting this to @W1zzard years ago and being unable to find the source - finally did! Thanks, Google!
(The 50% reduction was simply because they didn't have to duplicate it any longer - there's a comment from 2008 that guessed it all the way back then.)
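Circling back to the usage flags in that WDDM quote - a rough sketch of what that looks like on the D3D10 side (the vertices array and device are placeholders), where Usage and CPUAccessFlags are the hints the OS/driver/memory manager act on when deciding where the resource lives:

Code:
// D3D10: declare up front how the resource will be used; the driver and the
// WDDM memory manager pick the most performant placement based on these flags.
D3D10_BUFFER_DESC desc = {};
desc.ByteWidth      = sizeof(vertices);        // placeholder vertex array
desc.Usage          = D3D10_USAGE_IMMUTABLE;   // GPU reads it, CPU never touches it again
desc.BindFlags      = D3D10_BIND_VERTEX_BUFFER;
desc.CPUAccessFlags = 0;                       // no CPU access requested

D3D10_SUBRESOURCE_DATA init = {};
init.pSysMem = vertices;

ID3D10Buffer* vertexBuffer = nullptr;
device->CreateBuffer(&desc, &init, &vertexBuffer);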
 
I memeber that.

There are features enabled by having unified hardware between console and PC that have yet to be discovered in the name of performance, and I for one look forward to them. AMD also has a long history of working with MS to improve DX performance.
 
I think it is important to have many different developments, so that a future AI has something to sort, compare, and judge by quality and appropriateness in order to assemble the next, many-times-better complex API.
 
I think the author misinterpreted the blog post a bit. This buffer type is meant to reduce the amount of copying necessary to put data into memory accessible by the GPU for computing, NOT to reduce the amount of data duplicated between CPU and GPU, of which there's very little.

More generally, accessible by the CPU does not mean the CPU should use it for anything other than filling it with data to be processed by the GPU. It will still be very slow to access since it sits behind a PCIe link, and actually working on the same memory range from both sides would require memory coherency, which would destroy both CPU and GPU performance by several orders of magnitude.
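For anyone curious, the copy being removed is the classic staging pattern below (a rough sketch with uploadBuffer, defaultBuffer, commandList, srcData and dataSize as placeholders; synchronization omitted): data hops CPU -> upload heap in system RAM -> default heap in VRAM, and with a GPU upload heap the second hop disappears. As noted above, CPU reads from any CPU-visible heap are still best avoided - the memory is typically write-combined and sits behind the PCIe link.

Code:
// Classic path: stage the data in an UPLOAD heap (system RAM), then have the
// GPU copy it into a DEFAULT heap resource that lives in VRAM.
void* mapped = nullptr;
uploadBuffer->Map(0, nullptr, &mapped);    // uploadBuffer: D3D12_HEAP_TYPE_UPLOAD
memcpy(mapped, srcData, dataSize);         // copy #1: CPU -> staging buffer
uploadBuffer->Unmap(0, nullptr);

// copy #2: staging buffer -> VRAM resource, recorded on a command list and
// executed by the GPU (barriers, ExecuteCommandLists and fence wait omitted).
commandList->CopyBufferRegion(defaultBuffer, 0, uploadBuffer, 0, dataSize);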
 
I memeber that.

There are features enabled by having unified hardware between console and PC that have yet to be discovered in the name of performance, and I for one look forward to them. AMD also has a long history of working with MS to improve DX performance.
DirectStorage and all its accidental improvements are the biggest one I can think of in recent history - clearly designed to benefit the current console design, with a large pool of 'memory' that software dictates is RAM or VRAM, so having *anything* duplicated there is silly and redundant.

When DirectStorage lets them turn the entire NVMe drive into an extension of that memory pool (almost like a pre-filled page file, as far as game textures are concerned), it will totally change the hardware requirements for high-quality textures on those consoles, as they can stream in the data much faster.


A quick Google shows the PS5 can read from NVMe at around 5.5 GB/s and the Series X at 2.4 GB/s (sometimes with higher speeds quoted for current decompression tech) - which explains Microsoft's focus on DirectStorage with hardware decompression of these textures: they want to move the data over to the GPU, have the GPU decompress it, and not use the console's limited CPU power to do so. They're using their software prowess to improve Direct3D to benefit their console so it can be cheaper than the competition, and to make their desktop OS the 'gamers' choice'.
 
It smells to me like a continuation of the trend started by Nvidia. We're selling you less and cheaper hardware for more money because we've found a way to make you believe it's more productive.
 
It smells to me like a continuation of the trend started by Nvidia. We're selling you less and cheaper hardware for more money because we've found a way to make you believe it's more productive.
It's both.
It's making what they have more efficient, which lets them not make a new console - but we Windows users reap the rewards too (which is a smaller side benefit for them).
 
I'm not sure the CPU could make use of the huge bandwidth of a graphics card's VRAM. The latency alone would kill it.

But it could be a good way for the CPU to modify data in VRAM without having to bring it back to main memory.

Nice stuff, but how long will it take to be used in actual games? Probably 4-5 years.

We still need games that really use DirectStorage (yeah, Forspoken is there, but it's just one game and it doesn't use GPU decompression). And still no games using Sampler Feedback.
 
One step closer to AMD's decade old hUMA/HSA dream

A CPU + GPU unified memory architecture is nothing new - it just hasn't been done for consumer-level software, but you can get unified memory with CUDA or HIP on Linux right now.
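For reference, a minimal host-side sketch of that unified memory, using the CUDA runtime API (no kernel launch shown, buffer size arbitrary) - one allocation, one pointer, visible to both the CPU and the GPU:

Code:
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const size_t count = 1 << 20;
    float* data = nullptr;

    // Managed (unified) allocation: the same pointer is valid on host and device.
    if (cudaMallocManaged(&data, count * sizeof(float)) != cudaSuccess)
        return 1;

    // The CPU writes directly...
    for (size_t i = 0; i < count; ++i)
        data[i] = 1.0f;

    // ...and the very same pointer could be handed to a GPU kernel. Optionally
    // hint the driver to migrate the pages to device 0 ahead of time.
    cudaMemPrefetchAsync(data, count * sizeof(float), 0 /* device */, 0 /* stream */);
    cudaDeviceSynchronize();

    printf("first element: %f\n", data[0]);    // and read it back on the CPU afterwards
    cudaFree(data);
    return 0;
}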
 