Wednesday, January 19th 2022

NVIDIA Unlocks GPU System Processor (GSP) for Improved System Performance

Jan 19th, 2022 01:47

In 2016, NVIDIA announced that it was working on replacing its Fast Logic Controller processor, codenamed Falcon, with a new GPU System Processor (GSP) based on the RISC-V Instruction Set Architecture (ISA). This RISC-V processor, codenamed NV-RISCV, has been used as the GPU's controller core, coordinating everything in the massive pool of GPU cores. Today, NVIDIA has decided to open this NV-RISCV CPU to a broader spectrum of applications, starting with the 510.39 drivers. According to NVIDIA's documentation, this is available only on select GPUs for now, mainly data center-focused Tesla accelerators.

NVIDIA Documents: Some GPUs include a GPU System Processor (GSP) which can be used to offload GPU initialization and management tasks. This processor is driven by the firmware file /lib/firmware/nvidia/510.39.01/gsp.bin. A few select products currently use GSP by default, and more products will take advantage of GSP in future driver releases.
Offloading tasks which were traditionally performed by the driver on the CPU can improve performance due to lower latency access to GPU hardware internals.
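
As a rough illustration of the firmware path quoted above, the following Python sketch checks whether a gsp.bin file is present for a given driver version. The helper names (gsp_firmware_path, has_gsp_firmware) are made up for this example, and the directory layout is assumed to follow the single path given in the quote; other driver versions or distributions may place the file elsewhere.

```python
from pathlib import Path

# Assumption: firmware lives under <root>/<driver_version>/gsp.bin,
# mirroring the one path quoted from NVIDIA's documentation.
DEFAULT_ROOT = "/lib/firmware/nvidia"


def gsp_firmware_path(driver_version: str, root: str = DEFAULT_ROOT) -> Path:
    """Build the expected location of the GSP firmware file."""
    return Path(root) / driver_version / "gsp.bin"


def has_gsp_firmware(driver_version: str, root: str = DEFAULT_ROOT) -> bool:
    """Return True if the firmware file exists as a regular file."""
    return gsp_firmware_path(driver_version, root).is_file()


if __name__ == "__main__":
    version = "510.39.01"  # driver version named in the article
    print(gsp_firmware_path(version), has_gsp_firmware(version))
```

The presence of the file only shows that the firmware was installed; it does not by itself tell you whether the driver is actually using the GSP on a particular GPU.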

As the document shows, tasks such as GPU initialization and management were traditionally performed by the driver on the CPU. Because the CPU is external to the GPU, each request has to travel off-chip, which adds latency. A processor embedded in the GPU can service those requests much closer to the hardware, lowering latency and improving performance. We have yet to see what NVIDIA can do with it, and how large the performance penalty of the old approach was when the GSP was not enabled. This also points to a new direction for GPUs and accelerators alike: a more independent design where CPUs are integrated on-die instead of depending on external hardware.

So far, only select GPUs have their GSP unlocked; the complete list can be found in the document and the image above. It is advisable to check the website for the current list, as NVIDIA can update it at any time.

Source: NVIDIA

31 Comments on NVIDIA Unlocks GPU System Processor (GSP) for Improved System Performance

PilleniusMC

This could actually be interesting for higher end gaming I'm guessing, but how it could actually apply to gaming is to be seen.

WhoDecidedThat

This can be the next big thing after Ray Tracing. Consoles since 2013 have had a CPU + GPU connected to a unified GDDR memory pool, so developers must have developed some optimizations for this. Those optimizations could be brought over to discrete GPUs with something like 4 Alder Lake E-cores on board, reducing the gaming load on CPUs.

silentbogo

AleksandarK said: This novel RISC-V processor is codenamed NV-RISCV and has been used for an unknown period as GPU's controller core

Not sure why it is "unknown". The talk about switching FALCON to NV-RISCV has been around for over 5 years. Everything since Turing uses the new scheduler.

PilleniusMC said: This could actually be interesting for higher end gaming I'm guessing, but how it could actually apply to gaming is to be seen.

blanarahul said: This can be the next big thing after Ray Tracing. Consoles since 2013 have had a CPU + GPU connected to a unified GDDR memory pool so developers must have developed some optimizations regarding this.

It's not a "big" thing. Both Falcon and NV-RISCV are tiny microcontrollers built into each GPU whose primary purpose in the system (I mean GPU as a "system") is to do scheduling. It also does other things which do not require tons of compute power, like validating firmware and managing platform security. Basically it's NVIDIA's equivalent of PSP or ME. Nothing really useful or revolutionary, and it's not going to replace your CPU. Maybe it'll be useful for GPGPU compute, so you can tweak the thread/block scheduling more efficiently, but that's about as far as use cases go.

NC37

Miners be like...

W1zzard

silentbogo said: is to do scheduling

I think it's more than scheduling, but even if it's just scheduling it could be a huge thing for serious compute (and miners)--you can basically customize your GPU.
It's questionable though how much NVIDIA opens this up

HalfAHertz

They would need to expose it through some kind of API, and each existing game/application would have to be modified to specifically target that API... which is kinda unlikely. This is more useful for HPC scenarios where you rewrite your code almost daily to optimize it to be faster or use fewer resources. If certain tasks can bypass hundreds of little CPU > PCIe > ( RAM > PCIe ) > GPU steps and instead start one task that only runs in EPU <> GPU mode, then that's always preferable.

AleksandarK

News Editor

silentbogo said: Not sure why it is "unknown". The talk about switching FALCON to NV-RISCV has been around for over 5 years. Everything since Turing uses the new scheduler.

We are not sure exactly where it starts. NVIDIA has to clarify that. Where did you find that it starts with Turing? :)

silentbogo

W1zzard said: I think it's more than scheduling

I mentioned a few more things, but it's hard to tell, because just like with ME and PSP we don't really know what it does.
EDIT: Apparently it's also taking care of power management and display outputs.

AleksandarK said: We are not sure exactly where it starts. NVIDIA has to clarify that. Where did you find that it starts with Turing? :)

Kinda weird question, given that you have the answer in your article's links:

Turing and later GPUs are capable of using the GSP firmware by setting the kernel module parameter NVreg_EnableGpuFirmware=1.
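
For readers who want to try the parameter mentioned above, here is a minimal Python sketch that builds a modprobe.d options line for it and parses a driver parameter dump to see what value is in effect. The parameter name NVreg_EnableGpuFirmware comes from the quoted documentation; the kernel module name "nvidia", the /proc/driver/nvidia/params location, and its "Key: value" line format are assumptions here, and both function names are invented for illustration.

```python
# Assumption: the kernel module is named "nvidia"; some distributions
# package it under a different name.
MODULE_NAME = "nvidia"

# Assumption: the loaded driver exposes its parameters here as
# "Key: value" lines. Verify on your own system before relying on it.
PARAMS_FILE = "/proc/driver/nvidia/params"


def modprobe_option_line(enable: bool = True) -> str:
    """Return an 'options' line suitable for a file under /etc/modprobe.d/."""
    return f"options {MODULE_NAME} NVreg_EnableGpuFirmware={1 if enable else 0}"


def parse_params(text: str) -> dict:
    """Parse 'Key: value' lines into a dict, skipping lines without a colon."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    print(modprobe_option_line())
    try:
        with open(PARAMS_FILE) as fh:
            print(parse_params(fh.read()).get("EnableGpuFirmware"))
    except OSError:
        print("driver parameter file not available on this system")
```

A module option only takes effect once the module is reloaded, which in practice usually means a reboot.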

AleksandarK

News Editor

silentbogo said: Kinda weird question, given that you have the answer in your article's links

I completely missed that. Thanks!

#10

bug

HalfAHertz said: They would need to expose it through some kind of API, and each existing game/application would have to be modified to specifically target that API... which is kinda unlikely. This is more useful for HPC scenarios where you rewrite your code almost daily to optimize it to be faster or use fewer resources. If certain tasks can bypass hundreds of little CPU > PCIe > ( RAM > PCIe ) > GPU steps and instead start one task that only runs in EPU <> GPU mode, then that's always preferable.

If the ISA is documented, you don't need an(other) API.

#11

R-T-B

NC37 said: Miners be like...

No we don't. This is useless to us. We don't mine on Tesla cards.

#12

RH92

Could this be a solution to mitigate the driver overhead issues on their mainstream lineups?

#13

Stefem

AleksandarK said: We are not sure exactly where it starts. NVIDIA has to clarify that. Where did you find that it starts with Turing? :)

It was also part of Volta. With that GPU, NVIDIA actually found a bug in the RISC-V ISA (an actual one, not like that other "unnamed" company's, which ended up being their own bug :laugh:), triggered by the extreme speed of that processor. It may also be present in older architectures alongside the original FALCON microcontroller.

silentbogo said: Maybe it'll be useful for GPGPU compute, so you can tweak the thread/block scheduling more efficiently, but that's about as far as use cases go.

It's useful for gaming too

#14

stanleyipkiss

Didn't coreteks allude to a co-processor before the launch of 30 series cards?
This might be it.

#15

erek

stanleyipkiss said: Didn't coreteks allude to a co-processor before the launch of 30 series cards?
This might be it.

"GeForce RTX 3080 and RTX 3090 rumored to pack 'traversal coprocessor'" -- www.tweaktown.com/news/73209/geforce-rtx-3080-and-3090-rumored-to-pack-traversal-coprocessor/index.html

www.hardwaretimes.com/nvidia-ampere-traversal-coprocessor-wont-be-a-separate-chip-likely-an-on-die-component/

#16

Stefem

stanleyipkiss said: Didn't coreteks allude to a co-processor before the launch of 30 series cards?
This might be it.

Who's coreteks? The ~~dumba...~~ one that claimed Ampere had a separate processor for RT mounted on the back?
Seriously, why are some of you guys listening to those... people playing at being an expert on the internet? The amount of BS and misinformation that people like that spread is incredible.

#17

erek

Stefem said: Who's coreteks? The ~~dumba...~~ one that claimed Ampere had a separate processor for RT mounted on the back?
Seriously, why are some of you guys listening to those... people playing at being an expert on the internet? The amount of BS and misinformation that people like that spread is incredible.

NVIDIA Ampere “Traversal Coprocessor” Won’t be a Separate Chip; Likely an On-Die Component

"The TTU (coprocessor) continuously interacts with the L1 cache which would be a slow process if the component off-die. Finally, both the “Top-Level” and the “Bottom Level” BVH Traversal as well as the Ray Transformation and Ray/Triangle Intersection Testing (Basically the entire RT pipeline) has access to the SM L0 cache which would only be ideal if the “coprocessor” is an on-die component."

#18

Stefem

erek said: NVIDIA Ampere “Traversal Coprocessor” Won’t be a Separate Chip; Likely an On-Die Component
"The TTU (coprocessor) continuously interacts with the L1 cache which would be a slow process if the component off-die. Finally, both the “Top-Level” and the “Bottom Level” BVH Traversal as well as the Ray Transformation and Ray/Triangle Intersection Testing (Basically the entire RT pipeline) has access to the SM L0 cache which would only be ideal if the “coprocessor” is an on-die component."

Yep, it made no sense

#19

defaultluser

So, if they already have this thing shipping, why haven't they used this hardware scheduler to accelerate multi-CPU scaling in DX12/Vulkan drivers yet?

#20

silentbogo

Stefem said: It's useful for gaming too

Oh yeah... gives devs of modern AAA titles the opportunity to f#@-up a few more things :D

defaultluser said: So, if they already have this thing shipping, why haven't they used this hardware scheduler to accelerate multi-CPU scaling in DX12/Vulkan drivers yet?

Multi-CPU scaling has nothing to do with this topic. Hardware Accelerated Scheduling is already a part of Windows, and since the option no longer exists in Windows settings and the corresponding registry key is set to 2, I can safely assume it's enabled by default now.

#21

R-T-B

silentbogo said: and since the option no longer exists in Windows settings

Does for me.

#22

silentbogo

R-T-B said: Does for me.

That's weird. I could've sworn mine disappeared since 21H1, but I just checked in case I'm hallucinating, and it's there again...

#23

Mussels

Freshwater Moderator

Getting some serious envy looking at those 80GB cards in the specs list :O
Imagine the fun cooling that much GDDR6X...

Is this going to be a driver drop in replacement for hardware scheduling? Will they tie it in with the launch of things like RTX-IO, so they have a cute little CPU smashing the numbers ahead of AMD?

#24

R-T-B

Mussels said: Getting some serious envy looking at those 80GB cards in the specs list :O

Wait, 80GB cards? Wut?

Oh yes, HPC. I thought you meant a user's System Specs, lol.

#25

Stefem

silentbogo said: Oh yeah... gives devs of modern AAA titles the opportunity to f#@-up a few more things :D

It's not something that should be programmed by game developers; it's something NVIDIA itself is supposed to take care of.
