
Researchers Unveil Real-Time GPU-Only Pipeline for Fully Procedural Trees

AleksandarK

News Editor
A research team from Coburg University of Applied Sciences and Arts in Germany, alongside AMD Germany, has introduced a game-changing approach to procedural tree creation that runs entirely on the GPU, delivering speed and flexibility unlike anything we've seen before. Showcased at High-Performance Graphics 2025 in Copenhagen, the new pipeline uses DirectX 12 work graphs and mesh nodes to construct detailed tree models on the fly, without any CPU muscle. Artists and developers can tweak more than 150 parameters: everything from seasonal leaf color shifts and branch pruning styles to complex animations and automatic level-of-detail adjustments, all in real time. When tested on an AMD Radeon RX 7900 XTX, the system generated and pushed unique tree geometries into the geometry buffer in just over three milliseconds. It also tunes detail levels automatically to hold a target frame rate, demonstrating a stable 120 FPS under heavy workloads.
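
The press summary does not spell out how the automatic LOD control works, so purely as an illustration (and not the researchers' implementation), a frame-time feedback loop of that sort could look like the sketch below; the LodController type and detailScale field are hypothetical names.

Code:
// Hypothetical sketch of a frame-time-driven LOD controller (not the paper's code).
// Each frame, nudge a global detail scale so the measured frame time converges on
// the 8.33 ms (120 FPS) budget; a GPU-driven pipeline would read this scale when
// deciding how many branch/leaf levels to amplify.
#include <algorithm>

struct LodController {
    float detailScale = 1.0f;                             // 0.05 = coarsest trees, 1.0 = full detail
    static constexpr float targetMs = 1000.0f / 120.0f;   // 8.33 ms frame budget

    void Update(float lastFrameMs) {
        // Proportional step: over budget -> drop detail, under budget -> add it back.
        const float error = (targetMs - lastFrameMs) / targetMs;
        detailScale = std::clamp(detailScale + 0.1f * error, 0.05f, 1.0f);
    }
};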

Wind effects and environmental interactions update seamlessly, and the CPU's only job is to fill a small set of constants (camera matrices, timestamps, and so on) before dispatching a single work graph. There's no need for continuous host-device chatter or asset streaming, which simplifies integration into existing engines. Perhaps the most eye-opening result is how little memory the transient data consumes. A traditional buffer-heavy approach might need tens of GB, but the researchers' demo holds onto just 51 KB of persistent state per frame, a mind-boggling 99.9999% reduction compared to conventional methods. A scratch buffer of up to 1.5 GB is allocated for work-graph execution, though actual usage varies by GPU driver, and the memory can be released or reused afterward. Static assets, such as meshes and textures, remain unaffected, leaving future opportunities for neural compression or procedural texturing to further enhance memory savings.
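
For the curious, here is roughly what that thin CPU side can look like with the DirectX 12 work graphs API from the Agility SDK. This is a sketch, not the paper's code: FrameConstants, the entry record, and the binding slot are assumptions made for illustration (the real parameter set has over 150 entries).

Code:
// Minimal per-frame CPU work, assuming the D3D12 work graphs API (Agility SDK).
// FrameConstants, the binding slot, and the entry record are hypothetical.
#include <directx/d3d12.h>   // Agility SDK headers with work graph support
#include <DirectXMath.h>

struct FrameConstants {              // small per-frame constant block filled by the CPU
    DirectX::XMFLOAT4X4 viewProj;    // camera matrices
    float timeSeconds;               // timestamp driving wind animation
    float padding[3];
};

struct EntryRecord { UINT treeCount; };  // hypothetical input record for the entry node

void DispatchTreeWorkGraph(ID3D12GraphicsCommandList10* cl,
                           ID3D12RootSignature* globalRootSig,
                           D3D12_PROGRAM_IDENTIFIER workGraphId,
                           D3D12_GPU_VIRTUAL_ADDRESS backingMemory, UINT64 backingSize,
                           D3D12_GPU_VIRTUAL_ADDRESS frameConstantsVa)
{
    // 1. Bind the root signature and the constants the CPU just filled.
    cl->SetComputeRootSignature(globalRootSig);
    cl->SetComputeRootConstantBufferView(0, frameConstantsVa);   // slot 0 is an assumption

    // 2. Select the work graph and hand it its scratch/backing memory
    //    (the "up to 1.5 GB" buffer mentioned above).
    D3D12_SET_PROGRAM_DESC prog = {};
    prog.Type = D3D12_PROGRAM_TYPE_WORK_GRAPH;
    prog.WorkGraph.ProgramIdentifier = workGraphId;
    prog.WorkGraph.Flags = D3D12_SET_WORK_GRAPH_FLAG_INITIALIZE;
    prog.WorkGraph.BackingMemory = { backingMemory, backingSize };
    cl->SetProgram(&prog);

    // 3. A single dispatch with one small record; tree generation, LOD, and the
    //    mesh-node draws all happen on the GPU from here.
    EntryRecord record = { 1024 };   // number of trees to generate (made-up value)
    D3D12_DISPATCH_GRAPH_DESC dispatch = {};
    dispatch.Mode = D3D12_DISPATCH_MODE_NODE_CPU_INPUT;
    dispatch.NodeCPUInput.EntrypointIndex = 0;
    dispatch.NodeCPUInput.NumRecords = 1;
    dispatch.NodeCPUInput.pRecords = &record;
    dispatch.NodeCPUInput.RecordStrideInBytes = sizeof(EntryRecord);
    cl->DispatchGraph(&dispatch);
}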




The key to this achievement is work graphs, which can orchestrate millions of tasks without exploding dispatch counts. Traditional ExecuteIndirect calls would struggle with trees that can have up to 128^4 leaves (around 268 million), but work graphs handle it with ease. Widespread adoption will take time since current support is limited to AMD's RDNA 3+ and NVIDIA's 30-series and newer GPUs. Full game-engine integration and console support are still on the horizon. Looking forward, the researchers are exploring how to extend this flexible, GPU-driven pipeline into ray tracing, possibly by building on-GPU bounding volume hierarchies with the same work-graph framework.
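
The leaf count above and the memory figures from earlier are easy to sanity-check with a few lines of arithmetic; the 50 GB baseline below is only an illustrative stand-in for the article's "tens of GB".

Code:
// Quick arithmetic check of the numbers above: four levels of 128-way
// amplification per tree, and the persistent-state reduction from a
// buffer-heavy baseline (50 GB chosen as an illustrative figure) to 51 KB.
#include <cstdint>
#include <cstdio>

int main() {
    std::uint64_t leaves = 1;
    for (int level = 0; level < 4; ++level)
        leaves *= 128;                         // each work-graph level fans out up to 128x
    std::printf("max leaves per tree: %llu\n", // prints 268435456 (~268 million)
                static_cast<unsigned long long>(leaves));

    const double conventionalBytes = 50.0 * 1024 * 1024 * 1024;  // ~50 GB baseline
    const double persistentBytes   = 51.0 * 1024;                // 51 KB per frame
    std::printf("reduction: %.4f%%\n",                           // prints ~99.9999%
                (1.0 - persistentBytes / conventionalBytes) * 100.0);
    return 0;
}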

View at TechPowerUp Main Site | Source
 
Awesome demo, you can choose between the Witcher 4 demo trees or these ones.

hard decision.
 
Even more reason to end the silly hardware separation between CPU and GPU and make one compute package that does it all.
Go buy a Mac then.
 
Yeah cause they're great at playing games... :wtf:
They're not that bad these days, but it was more of a joking prod at the zero-upgradability, zero-socket future if you really want PC to go that way.

Look at Strix Halo: a weakish RDNA 3.5 GPU (compared to what you can otherwise fit in a $2k build, and no FSR 4), soldered memory (no way to get the high speeds/channels desired without it) and a soldered CPU. Is that what you want for desktop? Be careful what you wish for.

I'm talking full integration that goes beyond separate GPU/CPU 'tiles', 'chiplets' and SoC sections. One pipeline for all instructions, duplicated up to the desired power level.
They aren't chiplets or tiles on M series, at least for the normal/Pro tiers; it's one piece of silicon aside from the RAM, until you get to the Ultra chips etc., where it's two or even four of them connected.
 
They aren't chiplets or tiles on M series, at least for the normal/Pro tiers; it's one piece of silicon aside from the RAM, until you get to the Ultra chips etc., where it's two or even four of them connected.
But the CPU and GPU are still in separate sections, meaning they are still seen as handling different instructions.

 
But the CPU and GPU are still in separate sections, meaning they are still seen as handling different instructions.

And? Lmao.

Should we have a transistor that is both SRAM, TLC storage, CPU logic, GPU logic, cache, NPU etc? Good luck figuring that out.

If you're talking about compute-in-memory / in-memory processing, that's also not what you seem to be wanting; that's just a lower-power/latency architecture, and it still doesn't really work in most use cases.

The concept of one architecture that does it all is both inefficient and a pipe dream. Optimization only really works for one or a couple of task families.
 
And? Lmao.

Should we have a transistor that is both SRAM, TLC storage, CPU logic, GPU logic, cache, NPU etc? Good luck figuring that out.

If you're talking about compute-in-memory / in-memory processing, that's also not what you seem to be wanting; that's just a lower-power/latency architecture, and it still doesn't really work in most use cases.

The concept of one architecture that does it all is both inefficient and a pipe dream. Optimization only really works for one or a couple of task families.
Of course, what you said is the current compute scheme in use. It wasn't that way in the very beginning, when video display adapters only handled signaling and did no computing. Eventually, methods like the one outlined in this article might see a return to a more integrated computing scheme between high levels of parallelism and complex instruction sets.
 
I'm talking full integration that goes beyond separate GPU/CPU 'tiles', 'chiplets' and SoC sections. One pipeline for all instructions, duplicated up to the desired power level.
If this path is followed, the DIY computer market will die.
 
But the CPU and GPU are still in separate sections, meaning they are still seen as handling different instructions.

Yeah, no. I understand what you mean, you want a super chip that would be able to handle GPU rendering and general compute with the same hardware at the core level, but compute evolved this way because general-purpose chips aren't efficient. GPU cores are very small and very good at heavily parallelized tasks, but would suck at lightly threaded stuff. A lot of very smart people understood that combining several specialists is better than trying to make an uber chip.

Yes, GPUs evolved towards a unified arch to avoid specialist parts staying idle while the others are busy doing something else, but that's not comparable to what we have now with SoCs and GPGPU. Some tasks are still better handled by a CPU arch, and are a PITA to accelerate on a GPU.

And on the contrary, that uber chip would have to share more of its resources to handle everything. Unified GPUs meant that for the same amount of transistors, the GPU could effectively do more work because the whole silicon can be used at all times.
 
Yeah, no. I understand what you mean, you want a super chip that would be able to handle GPU rendering and general compute with the same hardware at the core level, but compute evolved this way because general-purpose chips aren't efficient. GPU cores are very small and very good at heavily parallelized tasks, but would suck at lightly threaded stuff. A lot of very smart people understood that combining several specialists is better than trying to make an uber chip.

Yes, GPUs evolved towards a unified arch to avoid specialist parts staying idle while the others are busy doing something else, but that's not comparable to what we have now with SoCs and GPGPU. Some tasks are still better handled by a CPU arch, and are a PITA to accelerate on a GPU.

And on the contrary, that uber chip would have to share more of its resources to handle everything. Unified GPUs meant that for the same amount of transistors, the GPU could effectively do more work because the whole silicon can be used at all times.
I think at least some of what you are saying can be mitigated by ever-improving fab nodes. I'm imagining a super small CPU that consumes 100 mW and handles complex tasks that can be multitasked. Then there would be 30,000 of these tiny CPUs on a single chip, where the overall chip simultaneously completes a highly parallel task using part of each of the 30,000 tiny CPUs, on top of the complex multitasking.
 
Of course, what you said is the current compute scheme in use. It wasn't that way in the very beginning, when video display adapters only handled signaling and did no computing. Eventually, methods like the one outlined in this article might see a return to a more integrated computing scheme between high levels of parallelism and complex instruction sets.
People with these disruptive ideas always tend to gloss over the few thousand other reasons we have what we've got today.

It happened with crypto versus banking in fiat. Where's that going these days, I wonder.

This shit never flies because reality gets in the way. It's why they're called utopian thoughts.
 
This will surely help AMD increase market share against Nvidia.
 
Since I'm a scientist, I only give some credence to scientific laws (gravity, thermodynamics, etc.). All of these made-up laws regarding human ingenuity are just as relevant as Murphy's law.

It doesn't matter what you call "Amdahl's law" or whether you respect it or not; the phenomenon it describes is basic math.
It has nothing to do with human ingenuity that 9 women cannot deliver a baby in 1 month or that 10 pilots cannot make the plane reach the destination 10x faster than a single pilot.
Of course, we can always hallucinate it and get around the entire problem ;) Our brains prove we don't even need Turing completeness to calculate anything.
 
Since I'm a scientist, I only give some credence to scientific laws (gravity, thermodynamics, etc.). All of these made-up laws regarding human ingenuity are just as relevant as Murphy's law.
Scientist of what? Gene Amdahl's PhD is in physics.

 
Nah, every single game dev will use crappy UE because it's 'easier'.
 
Of course, what you said is the current compute scheme in use. It wasn't that way in the very beginning, when video display adapters only handled signaling and did no computing. Eventually, methods like the one outlined in this article might see a return to a more integrated computing scheme between high levels of parallelism and complex instruction sets.
Please stop while you are behind... You are giving the silicon architecture junkies headaches.
Everything is about tradeoffs; shrinking nodes gives the ability to dedicate silicon to specific functions, gaining efficiency.
Generic computation = inefficient everything. We are in the era of accelerators and DSPs to increase efficiency at the tradeoff of die space.
You are suggesting we turn tail and throw away 20 years of progress; it's pure idiocy.
 
It doesn't matter what you call "Amdahl's law" or whether you respect it or not; the phenomenon it describes is basic math.
It has nothing to do with human ingenuity that 9 women cannot deliver a baby in 1 month or that 10 pilots cannot make the plane reach the destination 10x faster than a single pilot.
Of course, we can always hallucinate it and get around the entire problem ;) Our brains prove we don't even need Turing completeness to calculate anything.
Amdahl's law is about computers and programming conceived by humans on Earth. While gravity exists on all planets, humans' way of implementing computational devices is specific to our current way of thinking. It's not universal, but a limit of our species' understanding.

Making up analogies of what we currently can’t do in unrelated scenarios doesn’t change anything. By the way, it may be possible to grow multiple human babies in one month in a single incubator. No physical laws of the universe say otherwise.

Please stop while you are behind... You are giving the silicon architecture junkies headaches.
Everything is about tradeoffs; shrinking nodes gives the ability to dedicate silicon to specific functions, gaining efficiency.
Generic computation = inefficient everything. We are in the era of accelerators and DSPs to increase efficiency at the tradeoff of die space.
You are suggesting we turn tail and throw away 20 years of progress; it's pure idiocy.
The reason I'm a good scientist is that I don't stop or have any problem being wrong. I also don't accept the limited understandings of the professors and textbooks that taught me in school. What came before is a foundation of learning. It's up to us to build something never conceived before on that foundation.

Stop trying to win the internet and go out and create something. It’s fun.
 
The question I have is how this translates to performance in actual games. Utilizing GPUs to perform tasks that are typically performed by the CPU is not new, but there is no perfect solution, and it will always come with its own set of tradeoffs. Right off the bat, leaning on GPUs to process anything extra means fewer resources for raster performance.
 
Amdahl's law is about computers and programming conceived by humans on Earth. While gravity exists on all planets, humans' way of implementing computational devices is specific to our current way of thinking. It's not universal, but a limit of our species' understanding.

Making up analogies of what we currently can’t do in unrelated scenarios doesn’t change anything. By the way, it may be possible to grow multiple human babies in one month in a single incubator. No physical laws of the universe say otherwise.


The reason I'm a good scientist is that I don't stop or have any problem being wrong. I also don't accept the limited understandings of the professors and textbooks that taught me in school. What came before is a foundation of learning. It's up to us to build something never conceived before on that foundation.

Stop trying to win the internet and go out and create something. It’s fun.
Conceptual thinking is pretty high level, indeed
 