
DirectX 12 API New Feature Set Introduces GPU Upload Heaps, Enables Simultaneous Access to VRAM for CPU and GPU

T0@st

News Editor
Microsoft has added two new features to its DirectX 12 API via the latest Agility SDK 1.710.0 preview - GPU Upload Heaps and Non-Normalized Sampling - and the former looks to be the more intriguing of the pair. The SDK preview, officially introduced on Friday 31 March, is only accessible to developers at present, and support has also been enabled in the latest graphics drivers from NVIDIA, Intel, and AMD. The Microsoft team has this to say about the preview version of the GPU Upload Heaps feature in DirectX 12: "Historically a GPU's VRAM was inaccessible to the CPU, forcing programs to have to copy large amounts of data to the GPU via the PCI bus. Most modern GPUs have introduced VRAM resizable base address register (BAR) enabling Windows to manage the GPU VRAM in WDDM 2.0 or later."

They continue to describe how the update allows the CPU to gain access to the pool of VRAM on the connected graphics card: "With the VRAM being managed by Windows, D3D now exposes the heap memory access directly to the CPU! This allows both the CPU and GPU to directly access the memory simultaneously, removing the need to copy data from the CPU to the GPU increasing performance in certain scenarios." This GPU optimization could offer many benefits in the context of computer games, since memory requirements continue to grow in line with an increase in visual sophistication and complexity.
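On the developer side, the new heap type surfaces as D3D12_HEAP_TYPE_GPU_UPLOAD, gated behind a feature check. The snippet below is only a rough sketch of the intended flow - the device, dataSize and srcData variables are placeholders and error handling is trimmed - not code taken from the SDK samples:

Code:
// Ask the driver whether GPU upload heaps are available (needs resizable BAR).
D3D12_FEATURE_DATA_D3D12_OPTIONS16 options16 = {};
bool gpuUploadHeaps =
    SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16,
                                          &options16, sizeof(options16)))
    && options16.GPUUploadHeapSupported;

if (gpuUploadHeaps)
{
    // Place a buffer directly in CPU-visible VRAM.
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_GPU_UPLOAD;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = dataSize;          // placeholder size
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    ID3D12Resource* buffer = nullptr;
    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&buffer));

    // The CPU writes straight into VRAM; the GPU can read the same resource
    // without a separate staging copy over the PCIe bus.
    void* mapped = nullptr;
    D3D12_RANGE noRead = { 0, 0 };              // we don't intend to read back
    buffer->Map(0, &noRead, &mapped);
    memcpy(mapped, srcData, dataSize);          // srcData is a placeholder
    buffer->Unmap(0, nullptr);
}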



A shared pool of memory between the CPU and GPU eliminates the need to keep duplicate copies of game data in both system memory and graphics card VRAM, thereby reducing the amount of data shuttled between the two locations. Modern graphics cards tend to feature much faster on-board memory (GDDR6) than main system memory (DDR5 at best), so in theory the CPU could benefit greatly from direct access to a pool of ultra-quick VRAM, perhaps giving an early preview of a time when DDR6 becomes the everyday standard for main system memory.


View at TechPowerUp Main Site | Source
 
Will 12 GB be the de facto standard for graphics cards?

Yeah. GPUs with less than 12 GB will have varying degrees of trouble with games going forward. 4-6 GB cards will be confined to the lowest settings and low resolutions. That goes hand in hand with the past two generations of 8-12 GB GPUs being readily available in the midrange, plus 32 GB becoming the new de facto standard for main memory.
 
Yeah. GPUs with less than 12 GB will have varying degrees of trouble with games going forward. 4-6 GB cards will be confined to the lowest settings and low resolutions. That goes hand in hand with the past two generations of 8-12 GB GPUs being readily available in the midrange, plus 32 GB becoming the new de facto standard for main memory.
Thank you. I'm Japanese but using a 3060Ti. I'll use it as a reference.
 
One step closer to AMD's decade old hUMA/HSA dream :toast:
[Image: AMD hUMA benefit slide]

Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the "Hardware Vendor #3" situation over and over again.

Thank you. I'm Japanese but using a 3060Ti. I'll use it as a reference.

Quite so - the 3060 Ti is still a great card. It might have a little trouble at 1440p or higher with ray tracing enabled because of the 8 GB, but it should run games at 1080p very well for the foreseeable future :clap:
 
Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the "Hardware Vendor #3" situation over and over again.
Every innovation needs the right time and the right market conditions to sell, really.

And then you need to sell the innovation.

AMD is notoriously bad at both timing and selling.
 
I'd argue it's timing - they're great at selling! Just look at the first (dual-core) Athlons, Zen, and to a much lesser extent the early GCN cards. If you exclude the cult of JHH (or the fruity loops crowd), AMD probably has the most loyal supporters out there!
 
One makes an article about DX12.

Puts up some GPU board picture.

A GTX 285, which supports DX11.1 at most.

Stupid artists ®
 
Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the "Hardware Vendor #3" situation over and over again.
It's easy to set out those future goals and objectives; it's extraordinarily hard to implement them, as it isn't just a matter of developing a single thing but the whole platform, ecosystem, and the applications that run on it. I don't think AMD's current APUs even support their past dream (hardware-wise); I believe there is no coherency between CPU and GPU caches, so memory accesses between them necessarily need to go straight to memory.
 
I'm amazed that Microsoft didn't save this for DirectX 13. It must not actually speed up performance in any substantial way.
 
Commented on this in another thread

MS is always working on things like this, always has been. They've always wanted their OS to run well (to sell it on more PCs), and now that they have a gaming console using the same code under the hood, they really do care about performance - this is the sort of thing they can slip into all their Xbox games in a dev environment, then make a big media fuss about free performance for all Xbox owners if it works out.

Game devs either use them silently, or cover them up with something that costs more performance than was gained (CoH and the '-nolitter' DX10 command come to mind, then Crysis).


This seems like it had to be developed to work with DirectStorage, which has been getting a lot of traction this year - anything fed directly from NVMe to the GPU had no way to be modified or altered, and this is a fallback method that doesn't require reverting to the older system. It could be as simple as a single line of code for a bugfix, which saves them having to send a 15 GB texture pack out to every client.


In the far reaches of human history (prior to DX9), everything was essentially duplicated into system RAM because the CPU did all the work, so it needed a live copy of the active data to decompress, modify, whatever.
DX10 reduced a lot of this, allowing data to be fed over more efficiently with less CPU overhead, but it wasn't documented super well beyond vague descriptions of "fewer CPU calls" and really nerdy documents:

Thanks to the architecture of the new WDDM (Windows Display Driver Model), applications now create Direct3D 10 resources with different usage flags to indicate how the application intends on using the resource data. The new driver model virtualizes the memory used by resources; it then becomes the responsibility of the operating system/driver/memory manager to place resources in the most performant area of memory possible given the expected usage.
Example: Vista's Aero interface could run in an emulated software mode or in a hardware mode that reduced CPU usage.
Windows 7 had the option of a DX9 mode or full DX10.1 hardware acceleration, freeing up CPU resources and system RAM - those poor office workers needed every megabyte they could save (and then their Intel IGP took it from system RAM anyway, ironically).

This was why a lot of low-end Vista laptops (thanks, Intel Atom) felt so sluggish: the IGP couldn't do hardware mode and the CPUs were too weak to run the animations smoothly.

Windows 7 DWM cuts memory consumption by 50% | istartedsomething
I remember quoting this to @W1zzard years ago and being unable to find the source - finally did! Thanks, Google!
(The 50% reduction was simply because they didn't have to duplicate it any longer - there's a comment from 2008 that guessed it all the way back then.)
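Circling back to the usage flags in that WDDM quote - a rough sketch of what that looks like on the D3D10 side (the vertices array and device are placeholders), where Usage and CPUAccessFlags are the hints the OS/driver/memory manager act on when deciding where the resource lives:

Code:
// D3D10: declare up front how the resource will be used; the driver and the
// WDDM memory manager pick the most performant placement based on these flags.
D3D10_BUFFER_DESC desc = {};
desc.ByteWidth      = sizeof(vertices);        // placeholder vertex array
desc.Usage          = D3D10_USAGE_IMMUTABLE;   // GPU reads it, CPU never touches it again
desc.BindFlags      = D3D10_BIND_VERTEX_BUFFER;
desc.CPUAccessFlags = 0;                       // no CPU access requested

D3D10_SUBRESOURCE_DATA init = {};
init.pSysMem = vertices;

ID3D10Buffer* vertexBuffer = nullptr;
device->CreateBuffer(&desc, &init, &vertexBuffer);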
 
I memeber that.

There are features enabled by having unified hardware between console and PC that have yet to be discovered in the name of performance, and I for one look forward to them. AMD also has a long history of working with MS to improve DX performance.
 
I think it is important to have many different developments, so that a future AI has something to sort, compare, and judge by quality and appropriateness in order to assemble the next, many-times-better complex API.
 
I think the author misinterpreted the blog post a bit. This buffer type is meant to reduce the amount of copying necessary to put data into memory accessible by the GPU for computing, NOT to reduce the amount of data duplicated between CPU and GPU, of which there's very little.

More generally, accessible by the CPU does not mean the CPU should use it for anything other than filling it with data to be processed by the GPU. It will still be very slow to access since it sits behind a PCIe link, and actually working on the same memory range from both sides would require memory coherency, which would destroy both CPU and GPU performance by several orders of magnitude.
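For anyone curious, the copy being removed is the classic staging pattern below (a rough sketch with uploadBuffer, defaultBuffer, commandList, srcData and dataSize as placeholders; synchronization omitted): data hops CPU -> upload heap in system RAM -> default heap in VRAM, and with a GPU upload heap the second hop disappears. As noted above, CPU reads from any CPU-visible heap are still best avoided - the memory is typically write-combined and sits behind the PCIe link.

Code:
// Classic path: stage the data in an UPLOAD heap (system RAM), then have the
// GPU copy it into a DEFAULT heap resource that lives in VRAM.
void* mapped = nullptr;
uploadBuffer->Map(0, nullptr, &mapped);    // uploadBuffer: D3D12_HEAP_TYPE_UPLOAD
memcpy(mapped, srcData, dataSize);         // copy #1: CPU -> staging buffer
uploadBuffer->Unmap(0, nullptr);

// copy #2: staging buffer -> VRAM resource, recorded on a command list and
// executed by the GPU (barriers, ExecuteCommandLists and fence wait omitted).
commandList->CopyBufferRegion(defaultBuffer, 0, uploadBuffer, 0, dataSize);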
 
I memeber that.

There are features enabled by having unified hardware between console and PC that have yet to be discovered in the name of performance, and I for one look forward to them. AMD also has a long history of working with MS to improve DX performance.
DirectStorage and all its accidental improvements are the biggest one I can think of in recent history - clearly designed to benefit the current console design, with a large pool of 'memory' that software dictates is RAM or VRAM, so having *anything* duplicated there is silly and redundant.

When DirectStorage lets them turn the entire NVMe drive into an extension of that memory pool (almost like a pre-filled page file, as far as game textures are concerned), it will totally change the hardware requirements for high-quality textures on those consoles, as they can stream in the data much faster.


A quick Google shows the PS5 can read from NVMe at around 5.5 GB/s and the Series X at 2.4 GB/s (sometimes with higher speeds quoted for current decompression tech) - which explains Microsoft's focus on DirectStorage with hardware decompression of these textures: they want to move the data over to the GPU, have the GPU decompress it, and not use the console's limited CPU power to do so. They're using their software prowess to improve Direct3D to benefit their console so it can be cheaper than the competition, and to make their desktop OS the 'gamers' choice'.
 
It smells to me like a continuation of the trend started by Nvidia. We're selling you less and cheaper hardware for more money because we've found a way to make you believe it's more productive.
 
It smells to me like a continuation of the trend started by Nvidia. We're selling you less and cheaper hardware for more money because we've found a way to make you believe it's more productive.
It's both.
It's making what they have more efficient, which lets them not make a new console - but we Windows users reap the rewards too (which is a smaller side benefit for them).
 
I'm not sure the CPU could make use of the huge bandwidth of a graphics card's VRAM. The latency alone would kill it.

But it could be a good way for the CPU to modify data in VRAM without having to bring it back to main memory.

Nice stuff, but how long will it take to be used in actual games? Probably 4-5 years.

We still need games that really use DirectStorage (yeah, Forspoken is there, but it's just one game and it doesn't use GPU decompression). And still no games using Sampler Feedback.
 
One step closer to AMD's decade old hUMA/HSA dream

A CPU + GPU unified memory architecture is nothing new - it just hasn't been done for consumer-level software, but you can get unified memory with CUDA or HIP on Linux right now.
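For reference, a minimal host-side sketch of that unified memory, using the CUDA runtime API (no kernel launch shown, buffer size arbitrary) - one allocation, one pointer, visible to both the CPU and the GPU:

Code:
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const size_t count = 1 << 20;
    float* data = nullptr;

    // Managed (unified) allocation: the same pointer is valid on host and device.
    if (cudaMallocManaged(&data, count * sizeof(float)) != cudaSuccess)
        return 1;

    // The CPU writes directly...
    for (size_t i = 0; i < count; ++i)
        data[i] = 1.0f;

    // ...and the very same pointer could be handed to a GPU kernel. Optionally
    // hint the driver to migrate the pages to device 0 ahead of time.
    cudaMemPrefetchAsync(data, count * sizeof(float), 0 /* device */, 0 /* stream */);
    cudaDeviceSynchronize();

    printf("first element: %f\n", data[0]);    // and read it back on the CPU afterwards
    cudaFree(data);
    return 0;
}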
 