Wednesday, January 19th 2022

NVIDIA Unlocks GPU System Processor (GSP) for Improved System Performance

In 2016, NVIDIA announced that it was working on replacing its Fast Logic Controller processor, codenamed Falcon, with a new GPU System Processor (GSP) based on the RISC-V Instruction Set Architecture (ISA). This RISC-V processor is codenamed NV-RISCV and has since been used as the GPU's controller core, coordinating work across the massive pool of GPU cores. Today, starting with the 510.39 drivers, NVIDIA has decided to open this NV-RISCV CPU to a broader spectrum of applications. According to NVIDIA's documentation, this is only available on select GPUs for now, mainly data-center Tesla accelerators.
NVIDIA Documents: Some GPUs include a GPU System Processor (GSP) which can be used to offload GPU initialization and management tasks. This processor is driven by the firmware file /lib/firmware/nvidia/510.39.01/gsp.bin. A few select products currently use GSP by default, and more products will take advantage of GSP in future driver releases.
Offloading tasks which were traditionally performed by the driver on the CPU can improve performance due to lower latency access to GPU hardware internals.
As this document shows, many tasks like GPU initialization and management were traditionally performed by the driver on the CPU. The CPU sits outside the GPU, so every request has to cross the interconnect, resulting in higher latency. A processor embedded in the GPU can act on those requests much more quickly, lowering latency and improving performance. It remains to be seen what NVIDIA can do with it, and how large the performance penalty of the old, CPU-driven approach was when the GSP was not enabled. This also points to a new direction for GPUs and accelerators alike: a more independent design, where CPU cores are integrated on-die instead of depending on external hardware.
So far, only select GPUs have their GSP unlocked; the complete list can be found in NVIDIA's document and the image above. It is advisable to check NVIDIA's website for the latest list, as NVIDIA can update it at any time.
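For Linux users curious whether their system has the GSP firmware in place, here is a minimal, hedged sketch. The firmware path follows the one given in NVIDIA's 510.39.01 README quoted above; it is an assumption, not a confirmed fact, that other driver versions use the same layout. The NVreg_EnableGpuFirmware=1 kernel module parameter is the one NVIDIA's README documents for Turing and later GPUs, and the "options nvidia ..." line uses standard modprobe.d syntax. The helper function names are hypothetical, and NVIDIA's documentation for your driver release should be checked before relying on any of this.

```python
import os

# Firmware path as given in NVIDIA's 510.39.01 README.
# ASSUMPTION: other driver versions follow the same layout (later releases may not).
FIRMWARE_TEMPLATE = "/lib/firmware/nvidia/{version}/gsp.bin"


def gsp_firmware_path(driver_version: str) -> str:
    """Build the expected GSP firmware path for a driver version (hypothetical helper)."""
    return FIRMWARE_TEMPLATE.format(version=driver_version)


def gsp_firmware_present(driver_version: str) -> bool:
    """Return True if the expected firmware file exists on this machine."""
    return os.path.isfile(gsp_firmware_path(driver_version))


def modprobe_option_line() -> str:
    """modprobe.d line setting the NVreg_EnableGpuFirmware module parameter to 1."""
    return "options nvidia NVreg_EnableGpuFirmware=1"


if __name__ == "__main__":
    version = "510.39.01"
    status = "found" if gsp_firmware_present(version) else "not found"
    print(f"GSP firmware {gsp_firmware_path(version)}: {status}")
    print(f"Line for a file in /etc/modprobe.d/: {modprobe_option_line()}")
```

The script only reads the filesystem and prints; it does not change any configuration. On a machine without the 510.39.01 driver installed, the first line simply reports "not found".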
Source: NVIDIA

31 Comments on NVIDIA Unlocks GPU System Processor (GSP) for Improved System Performance

#1
PilleniusMC
This could actually be interesting for higher-end gaming, I'm guessing, but how it would actually apply to gaming remains to be seen.
#2
WhoDecidedThat
This can be the next big thing after Ray Tracing. Consoles since 2013 have had a CPU + GPU connected to a unified GDDR memory pool, so developers must have developed some optimizations around this. Those optimizations could be brought over to discrete GPUs having something like 4 Alder Lake E cores and reduce the gaming load on CPUs.
#3
silentbogo
Quoting AleksandarK: This novel RISC-V processor is codenamed NV-RISCV and has been used for an unknown period as GPU's controller core
Not sure why it is "unknown". The talk about switching FALCON to NV-RISCV has been around for over 5 years. Everything since Turing uses the new scheduler.
Quoting PilleniusMC: This could actually be interesting for higher end gaming I'm guessing, but how it could actually apply to gaming is to be seen.
Quoting blanarahul: This can be the next big thing after Ray Tracing. Consoles since 2013 have had a CPU + GPU connected to a unified GDDR memory pool so developers must have developed some optimizations regarding this.
It's not a "big" thing. Both Falcon and NV-RISCV are tiny microcontrollers built into each GPU, whose primary purpose in the system (I mean the GPU as a "system") is to do scheduling. They also do other things that don't require tons of compute power, like validating firmware and managing platform security. Basically, it's Nvidia's equivalent of AMD's PSP or Intel's ME. Nothing really useful or revolutionary, and it's not going to replace your CPU. Maybe it'll be useful for GPGPU compute, so you can tweak thread/block scheduling more efficiently, but that's about as far as use cases go.
#4
NC37
Miners be like...
#5
W1zzard
Quoting silentbogo: is to do scheduling
I think it's more than scheduling, but even if it's just scheduling it could be a huge thing for serious compute (and miners): you can basically customize your GPU.
It's questionable, though, how much NVIDIA will open this up.
#6
HalfAHertz
They would need to expose it through some kind of API, and each existing game/application would have to be modified to specifically target that API...which is kinda unlikely. This is more useful for HPC scenarios where you rewrite your code almost daily to optimize it to be faster or use fewer resources. If certain tasks can bypass hundreds of little CPU > PCIE > ( RAM > PCIE ) > GPU steps and instead start one task that only runs in EPU <> GPU mode, then that's always preferable.
#7
AleksandarK
News Editor
Quoting silentbogo: Not sure why is it "unknown". The talk about switching FALCON to NV-RISCV has been around for over 5 years. Everything since Turing uses the new scheduler.
We are not sure exactly where it starts. NVIDIA would have to clarify that. Where did you find that it starts with Turing? :)
#8
silentbogo
Quoting W1zzard: I think it's more than scheduling
I mentioned a few more things, but it's hard to tell, because just like with ME and PSP we don't really know what it does.
EDIT: Apparently it's also taking care of power management and display outputs.
Quoting AleksandarK: We are not sure exactly where its starts. For that NVIDIA has to clarify. Where did you find that it starts with Turing? :)
Kinda weird question, given that you have the answer in your article's links:
Turing and later GPUs are capable of using the GSP firmware by setting the kernel module parameter NVreg_EnableGpuFirmware=1.
#9
AleksandarK
News Editor
Quoting silentbogo: Kinda weird question, given that you have the answer in your article's links
I completely missed that. Thanks!
#10
bug
Quoting HalfAHertz: They would need to expose it through some kind of an api and each existing game/application would have to be modified to specifically target that api...which is kinda unlikely. This is more useful for HPC scenarios where you rewrite your code almost daily to optimize it to be faster or use less resources. If certain tasks can bypass hundreds of little CPU > PCIE > ( RAM > PCIE ) > GPU steps and instead start one task that only runs in EPU <> GPU mode, then that's always preferable.
If the ISA is documented, you don't need an(other) API.
#11
R-T-B
Quoting NC37: Miners be like...
No, we don't. This is useless to us. We don't mine on Tesla cards.
#12
RH92
Could this be a solution to mitigate the driver overhead issues on their mainstream lineups?
#13
Stefem
Quoting AleksandarK: We are not sure exactly where its starts. For that NVIDIA has to clarify. Where did you find that it starts with Turing? :)
It was also part of Volta. With that GPU, NVIDIA actually found a bug in the RISC-V ISA (an actual one, not like another "unnamed" company's, which ended up being their own bug :laugh:), triggered by the extreme speed of that processor. It may also be present in older architectures alongside the original FALCON microcontroller.
Quoting silentbogo: Maybe it'll be useful for GPGPU compute, so you can tweak the thread/block scheduling more efficiently, but that's about as far as use cases go.
It's useful for gaming too.
#14
stanleyipkiss
Didn't coreteks allude to a co-processor before the launch of 30 series cards?
This might be it.
#15
erek
Quoting stanleyipkiss: Didn't coreteks allude to a co-processor before the launch of 30 series cards?
This might be it.
"GeForce RTX 3080 and RTX 3090 rumored to pack 'traversal coprocessor'" -- www.tweaktown.com/news/73209/geforce-rtx-3080-and-3090-rumored-to-pack-traversal-coprocessor/index.html

www.hardwaretimes.com/nvidia-ampere-traversal-coprocessor-wont-be-a-separate-chip-likely-an-on-die-component/
#16
Stefem
Quoting stanleyipkiss: Didn't coreteks allude to a co-processor before the launch of 30 series cards?
This might be it.
Who's coreteks? The dumba... one who claimed Ampere had a separate processor for RT mounted on the back?
Seriously, why are some of you guys listening to those... people playing at being experts on the internet? The amount of BS and misinformation that people like that spread is incredible.
#17
erek
Quoting Stefem: Who's coreteks? the dumba... one that claimed Ampere had a separated processor for RT mounted on the back?
Seriously, why are some of you guys listening to those... people playing at being an expert on the internet? the amount of BS and mislead that people like that spread is incredible

NVIDIA Ampere “Traversal Coprocessor” Won’t be a Separate Chip; Likely an On-Die Component

"The TTU (coprocessor) continuously interacts with the L1 cache which would be a slow process if the component off-die. Finally, both the “Top-Level” and the “Bottom Level” BVH Traversal as well as the Ray Transformation and Ray/Triangle Intersection Testing (Basically the entire RT pipeline) has access to the SM L0 cache which would only be ideal if the “coprocessor” is an on-die component."
#18
Stefem
Quoting erek:

NVIDIA Ampere “Traversal Coprocessor” Won’t be a Separate Chip; Likely an On-Die Component

"The TTU (coprocessor) continuously interacts with the L1 cache which would be a slow process if the component off-die. Finally, both the “Top-Level” and the “Bottom Level” BVH Traversal as well as the Ray Transformation and Ray/Triangle Intersection Testing (Basically the entire RT pipeline) has access to the SM L0 cache which would only be ideal if the “coprocessor” is an on-die component."
Yep, it made no sense.
#19
defaultluser
So, if they already have this thing shipping, why haven't they used this hardware scheduler to accelerate multi-CPU scaling in DX12/Vulkan drivers yet?
#20
silentbogo
Quoting Stefem: It's useful for gaming too
Oh yeah... gives devs of modern AAA titles the opportunity to f#@-up a few more things :D
Quoting defaultluser: So, if they already have this thing shipping, why havn't they used this hardware scheduler to accelerate multi-CPU scaling in DX12/Vulkan drivers yet?
Multi-CPU scaling has nothing to do with this topic. Hardware-accelerated GPU scheduling is already a part of Windows, and since the option no longer exists in Windows settings and the corresponding registry key is set to 2, I can safely assume it's enabled by default now.
#21
R-T-B
Quoting silentbogo: and since the option no longer exists in windows settings
Does for me.
#22
silentbogo
Quoting R-T-B: Does for me.
That's weird. I could've sworn mine disappeared with 21H1, but I just checked in case I was hallucinating, and it's there again...
#23
Mussels
Freshwater Moderator
Getting some serious envy looking at those 80GB cards in the specs list :O
Imagine the fun cooling that much GDDR6X...

Is this going to be a drop-in driver replacement for hardware scheduling? Will they tie it in with the launch of things like RTX-IO, so they have a cute little CPU smashing the numbers ahead of AMD?
#24
R-T-B
Quoting Mussels: Getting some serious envy looking at those 80GB cards in the specs list :O
Wait, 80GB cards? Wut?

Oh yes, HPC. I thought you meant a user's System Specs, lol.
#25
Stefem
Quoting silentbogo: Oyeah... gives devs of modern AAA titles the opportunity to f#@-up few more things :D
It's not something that should be programmed by game developers; it's something NVIDIA itself is supposed to take care of.