
NVIDIA Brings Reasoning Models to Consumers Ranging from 1.5B to 32B Parameters

AleksandarK

News Editor
Today, NVIDIA unveiled OpenReasoning-Nemotron, a quartet of distilled reasoning models with 1.5B, 7B, 14B, and 32B parameters, all derived from the 671B-parameter DeepSeek R1 0528. By compressing that massive teacher into four leaner Qwen2.5-based students, NVIDIA is making advanced reasoning experiments accessible even on standard gaming rigs, without hefty GPU bills or cloud usage. The key is not some elaborate trick but raw data: using the NeMo Skills pipeline, NVIDIA generated five million math, science, and code solutions, then fine-tuned each student purely with supervised learning. Already, the 32B model scores 89.2 on AIME24 and 73.8 on the HMMT February contest, while even the 1.5B variant manages a solid 55.5 and 31.5.

NVIDIA envisions these models serving as a powerful research toolkit. All four checkpoints will be available for download on Hugging Face, providing a strong baseline for exploring reinforcement-learning-driven reasoning or customizing the models for specific tasks. With GenSelect mode (which takes multiple passes at each question), you can spawn several parallel generations and pick the best answer, pushing the 32B model to results that rival or even exceed OpenAI's o3-high on several math and coding benchmarks. Since NVIDIA trained these models with supervised fine-tuning only, without reinforcement learning, the community gets clean, state-of-the-art starting points for future RL experiments. For gamers and at-home enthusiasts, this means a model that comes very close to the state of the art can run entirely locally, provided you have a reasonably powerful gaming GPU.
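
For readers who want to poke at the checkpoints themselves, a minimal sketch using the Hugging Face transformers library could look like the following. The repository name and generation settings are assumptions for illustration, and the naive best-of-n loop only gestures at the GenSelect idea; the full pipeline lives in NVIDIA's NeMo Skills tooling.

# Minimal sketch (assumed repo id and sampling settings); requires transformers, torch, accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nvidia/OpenReasoning-Nemotron-7B"  # assumed Hugging Face repo name

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is the sum of the first 100 positive integers? Think step by step."
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Draw several candidate solutions; a selection step (or a simple majority vote on
# the final answer) would then pick the best one, in the spirit of GenSelect.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    num_return_sequences=4,
)
prompt_len = inputs["input_ids"].shape[1]
for i, seq in enumerate(outputs):
    print(f"--- candidate {i} ---")
    print(tok.decode(seq[prompt_len:], skip_special_tokens=True))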



View at TechPowerUp Main Site | Source
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
 
I'd rather see someone who would bring reasoning back to Nvidia..

:D
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
Why? It's OK for games to end.
Actually, it's better that they end.
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
I think that's their aim. Blackwell (RTX 50) has separate scheduling for AI and conventional workloads. It's used for DLSS 4 now, but in the future it can certainly be used to run a small LLM in-game. It will probably take a decade or more for the market to be sufficiently saturated with GPUs and consoles capable of doing that, though.
 
"Cogito, ergo sum" - René Descartes'

(Also quoted by Professor Moriarty in the ST: TNG episode "Elementary, Dear Data")

Coming to a theater near you REAL soon, are you ready?
 
Nemotron :wtf:

Move the T to the third place and add one space for clarity: Net Moron
 
Why? It's OK for games to end.
Actually, it's better that they end.
If you want this kind of experience, that's fine, but we have more than enough such games already. The likes of GTA are criminally underdeveloped when it comes to creating truly living worlds with a lot to do outside the rigid script. Seems AI might be the breakthrough needed for that.
 
If you want this kind of experience, that's fine, but we have more than enough such games already. The likes of GTA are criminally underdeveloped when it comes to creating truly living worlds with a lot to do outside the rigid script. Seems AI might be the breakthrough needed for that.
That’s the thing: why do games need to provide entertainment for months instead of days?

A well-crafted, complete game you enjoy for days is fun; a game that has you waste months is escapism *unless it’s e-sports
 
A well-crafted, complete game you enjoy for days is fun; a game that has you waste months is escapism *unless it’s e-sports
Well, that's, like, your opinion, man ;)

Also, it doesn't make much sense, because once you finish this theoretical game "in days" you'll presumably pick up and start playing another one, right? So you're "wasting" exactly the same amount of time as me, who has been playing Fallout 4 for years and has maybe covered ~25% of the official content.
 
Well, that's, like, your opinion, man ;)

Also, it doesn't make much sense, because once you finish this theoretical game "in days" you'll presumably pick up and start playing another one, right? So you're "wasting" exactly the same amount of time as me, who has been playing Fallout 4 for years and has maybe covered ~25% of the official content.
No, you would have finished one and not developed all those attachments to the fake world.
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
This is not for gaming. This is for real uses of computers.
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/


This is probably a goal for many studios, but the PC userbase is usually not uniform enough to justify such a big step, which is why they usually wait for the consoles to be able to do it. And with one current-gen console having only 8 GB available at adequate bandwidth and a 4 TFLOPS RDNA2 GPU, they'll need to wait until the 2027/2028 generation of consoles is here.

Even for the regular PS5 and Series X it would be a challenge. If the 6700XT does less than 20 tokens/s on an 8B model, a 1.5B could do a lot more, but it would still need additional compute time to turn the generated text into audio. And that's pushing the full GPU bandwidth, which you can't do in a game because the graphics still need to run.


Sony or some dev could push for this on the PS5 Pro as an exclusive feature, but I find that hard to believe.
 
This is probably a goal for many studios, but the PC userbase is usually not uniform enough to justify such a big step, which is why they usually wait for the consoles to be able to do it. And with one current-gen console having only 8 GB available at adequate bandwidth and a 4 TFLOPS RDNA2 GPU, they'll need to wait until the 2027/2028 generation of consoles is here.

Even for the regular PS5 and Series X it would be a challenge. If the 6700XT does less than 20 tokens/s on an 8B model, a 1.5B could do a lot more, but it would still need additional compute time to turn the generated text into audio. And that's pushing the full GPU bandwidth, which you can't do in a game because the graphics still need to run.


Sony or some dev could push for this on the PS5 Pro as an exclusive feature, but I find that hard to believe.
Since they love SaaS, it's going to be an internet thing with a subscription
 
Even for the regular PS5 and Series X it would be a challenge. If the 6700XT does less than 20 tokens/s on an 8B model, a 1.5B could do a lot more, but it would still need additional compute time to turn the generated text into audio. And that's pushing the full GPU bandwidth, which you can't do in a game because the graphics still need to run.
FWIW, LLMs are mostly memory-bound, so a PS5 with its 448 GB/s on a 256-bit bus should fare a bit better than a 6700XT (384 GB/s @ 192-bit). Text-to-speech (TTS) could also be done quite fast.
As you well said, they could use a more fine-tuned, smaller model that would be multiple times faster, so it could be doable. However, as you also said, there are other things going on at the same time that eat up the available compute.

IMO the hardest part would be some game actually managing to stuff an LLM in without it sounding way too repetitive or just gimmicky, like most of the uses of "AI" that we see out there that people end up complaining about.
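
To put a rough number on the memory-bound point, here's a back-of-envelope sketch. The assumption is that decode speed is capped by how fast the weights can be streamed per token; the weight precisions are illustrative, and real-world throughput lands well below this ceiling once compute, the KV cache, and the graphics workload sharing the bus are accounted for.

# Back-of-envelope decode ceiling: generating one token streams roughly the whole
# weight set through memory, so tokens/s <= bandwidth / model size in bytes.
# Bandwidth figures are the ones quoted above; precisions (FP16, 4-bit) are assumptions.
def decode_ceiling_tok_s(params_billion, bytes_per_param, bandwidth_gb_s):
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

for name, bw in [("PS5 (448 GB/s)", 448), ("6700XT (384 GB/s)", 384)]:
    for label, bpp in [("FP16", 2.0), ("4-bit", 0.5)]:
        print(f"{name}, 1.5B @ {label}: ~{decode_ceiling_tok_s(1.5, bpp, bw):.0f} tok/s ceiling")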
 
IMO the hardest part would be some game actually managing to stuff an LLM in without it sounding way too repetitive or just gimmicky, like most of the uses of "AI" that we see out there that people end up complaining about.
Yeah it's difficult to get even large models to process context coherently. If we're talking about using tiny models (~1B) for game NPCs, I don't see how to achieve what people would want from such a thing--i.e. an NPC that can talk intelligibly about its life and the world around it. Running small local models, you can barely get the LLM to remember every detail of what you said two comments ago, much less huge info dumps.

Messing around with LLMs, particularly on local hardware, is a lot of fun for all of the usual enthusiast/tinkerer reasons, but I'd recommend it in large part simply because the exercise gives you insight into the limitations of the technology, which become glaring pretty quickly. It's very impressive at first, and very useful for certain tasks (e.g. coding), but the illusion of intelligent conversation is puddle deep.
 
That’s the thing: why do games need to provide entertainment for months instead of days?

A well-crafted, complete game you enjoy for days is fun; a game that has you waste months is escapism *unless it’s e-sports
I don't need the game to create entertainment; I want my open-world games to feel a little more real, or at least feel more like their own world.
It was stuff like the entire hip-hop songs that had lyric changes for GTA IV to reference Liberty City. Really, just the radio in GTA IV in general was amazing. Now imagine virtual radio hosts who adhere to some kind of core script but can also adapt based on what you have done in-game: "blah blah blah, the bridge is open, that's funny, it didn't stop (the anonymously referred-to protagonist) from blasting over it at 120 mph last week".

I don't need my virtual NPCs to have whole lives and backstories and families and tragedy and whatnot. What I do want is for the main story NPCs to know my preference in cars, to comment on my outfit, and to develop dynamic relationships with other main-story NPCs that vary depending on how you play.
It doesn't need to be every conventionally interchangeable NPC suddenly being as fleshed out as 50% of real people; I want AI in video games to pay attention to what I'm doing and react accordingly on a wider scale than the individual NPC.
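
To make the "dynamic prompt" idea concrete, a rough, made-up sketch might look like this: the game keeps a short log of notable events and folds the most recent ones into the system prompt of whatever small local model it ships with. The persona, event format, and generate() call below are all hypothetical.

# Hypothetical sketch: fold recent game events into a radio-host prompt so a small
# local LLM can casually reference what the player has actually done.
def build_radio_prompt(persona, recent_events, max_events=3):
    event_lines = "\n".join(f"- {e}" for e in recent_events[-max_events:])
    return (
        f"You are {persona}, an in-game radio host. Stay in character, keep it under "
        f"two sentences, and casually reference one of these recent events:\n{event_lines}\n"
    )

events = [
    "an unnamed driver jumped the raised drawbridge at about 120 mph",
    "three police cruisers were wrecked downtown",
    "someone in a purple suit robbed the corner store",
]
prompt = build_radio_prompt("a Lazlow-style late-night DJ", events)
print(prompt)
# reply = local_llm.generate(prompt)  # whatever on-device model the game would ship with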
 
I don't need the game to create entertainment; I want my open-world games to feel a little more real, or at least feel more like their own world.
It was stuff like the entire hip-hop songs that had lyric changes for GTA IV to reference Liberty City. Really, just the radio in GTA IV in general was amazing. Now imagine virtual radio hosts who adhere to some kind of core script but can also adapt based on what you have done in-game: "blah blah blah, the bridge is open, that's funny, it didn't stop (the anonymously referred-to protagonist) from blasting over it at 120 mph last week".

I don't need my virtual NPCs to have whole lives and backstories and families and tragedy and whatnot. What I do want is for the main story NPCs to know my preference in cars, to comment on my outfit, and to develop dynamic relationships with other main-story NPCs that vary depending on how you play.
It doesn't need to be every conventionally interchangeable NPC suddenly being as fleshed out as 50% of real people; I want AI in video games to pay attention to what I'm doing and react accordingly on a wider scale than the individual NPC.
Fuller immersion; sounds attractive, yes.
 