
NVIDIA Brings Reasoning Models to Consumers Ranging from 1.5B to 32B Parameters

AleksandarK

News Editor
Today, NVIDIA unveiled OpenReasoning-Nemotron, a quartet of distilled reasoning models with 1.5B, 7B, 14B, and 32B parameters, all derived from the 671B-parameter DeepSeek R1 0528. By compressing that massive teacher into four leaner Qwen2.5-based students, NVIDIA is making advanced reasoning experiments accessible even on standard gaming rigs, without hefty GPU bills or cloud usage. The key is not some elaborate trick but raw data: using the NeMo Skills pipeline, NVIDIA generated five million math, science, and code solutions, then fine-tuned each student purely with supervised learning. Already, the 32B model scores 89.2 on AIME24 and 73.8 on the HMMT February contest, while even the 1.5B variant manages a solid 55.5 and 31.5.

NVIDIA envisions these models serving as a powerful research toolkit. All four checkpoints will be available for download on Hugging Face, providing a strong baseline for exploring reinforcement-learning-driven reasoning or customizing the models for specific tasks. With GenSelect mode (which takes multiple passes at each question), you can spawn several parallel generations and pick the best answer, pushing the 32B model to results that rival or even exceed OpenAI's o3-high on several math and coding benchmarks. Since NVIDIA trained these models with supervised fine-tuning only, without reinforcement learning, the community gets clean, state-of-the-art starting points for future RL experiments. For gamers and at-home enthusiasts, this means a model that comes very close to the state of the art can run entirely locally, provided you have a reasonably powerful gaming GPU.
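
For readers who want to poke at the checkpoints themselves, a minimal sketch using the Hugging Face transformers library could look like the following. The repository name and generation settings are assumptions for illustration, and the naive best-of-n loop only gestures at the GenSelect idea; the full pipeline lives in NVIDIA's NeMo Skills tooling.

# Minimal sketch (assumed repo id and sampling settings); requires transformers, torch, accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nvidia/OpenReasoning-Nemotron-7B"  # assumed Hugging Face repo name

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is the sum of the first 100 positive integers? Think step by step."
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Draw several candidate solutions; a selection step (or a simple majority vote on
# the final answer) would then pick the best one, in the spirit of GenSelect.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    num_return_sequences=4,
)
prompt_len = inputs["input_ids"].shape[1]
for i, seq in enumerate(outputs):
    print(f"--- candidate {i} ---")
    print(tok.decode(seq[prompt_len:], skip_special_tokens=True))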



View at TechPowerUp Main Site | Source
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
 
I'd rather see someone who would bring reasoning back to Nvidia..

:D
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
Why? It's OK for games to end.
Actually, it's better that they end.
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
I think that's their aim. Blackwell (RTX 50) has separate scheduling for AI and conventional workloads. It's used for DLSS 4 now, but in the future it can certainly be used to run a small LLM in-game. It will probably take a decade or more for the market to be sufficiently saturated with GPUs and consoles capable of doing that, though.
 
"Cogito, ergo sum" - René Descartes'

(Also quoted by Professor Moriarty in the ST: TNG episode "Elementary, Dear Data")

Coming to a theater near you REAL soon, are you ready?
 
Nemotron :wtf:

Move the T to the third place and add one space for clarity: Net Moron
 
Why? It's OK for games to end.
Actually, it's better that they end.
If you want this kind of experience, that's fine, but we have more than enough such games already. The likes of GTA are criminally underdeveloped when it comes to creating truly living worlds with a lot to do outside the rigid script. Seems AI might be the breakthrough needed for that.
 
If you want this kind of experience, that's fine, but we have more than enough such games already. The likes of GTA are criminally underdeveloped when it comes to creating truly living worlds with a lot to do outside the rigid script. Seems AI might be the breakthrough needed for that.
That’s the thing: why do games need to provide entertainment for months instead of days?

A well-crafted, complete game you enjoy for days is fun; a game that has you waste months is escapism *unless it’s e-sports
 
A well-crafted, complete game you enjoy for days is fun; a game that has you waste months is escapism *unless it’s e-sports
Well, that's, like, your opinion, man ;)

Also, it doesn't make much sense, because once you finish this theoretical game "in days" you'll presumably pick up and start playing another one, right? So you're "wasting" exactly the same amount of time as me, who has been playing Fallout 4 for years and has maybe covered ~25% of the official content.
 
Well, that's, like, your opinion, man ;)

Also, it doesn't make much sense, because once you finish this theoretical game "in days" you'll presumably pick up and start playing another one, right? So you're "wasting" exactly the same amount of time as me, who has been playing Fallout 4 for years and has maybe covered ~25% of the official content.
No, you would have finished one and not developed all those attachments to the fake world.
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/
This is not for gaming. This is for real uses of computers.
 
When will they start using 1.5B or even 1B models for game NPCs? 1B is more than enough to generate voice chat on the fly locally. You just need a dynamic prompt… I wish GTA6 had something like that. At the very least, they should have an on/off option for people who can use it.. :/


This is probably a goal for many studios, but the PC userbase is usually not uniform enough to justify such a big step, which is why they usually wait for the consoles to be able to do it. And with one current-gen console having only 8 GB available at adequate bandwidth and a 4 TFLOPS RDNA2 GPU, they'll need to wait until the 2027/2028 generation of consoles is here.

Even for the regular PS5 and Series X it would be a challenge. If the 6700XT does less than 20 tokens/s on an 8B model, a 1.5B could do a lot more, but it would still need additional compute time to turn the generated text into audio. And that's pushing the full GPU bandwidth, which you can't do in a game because the graphics still need to run.


Sony or some dev could push for this on the PS5 Pro as an exclusive feature, but I find that hard to believe.
 
This is probably a goal for many studios, but the PC userbase is usually not uniform enough to justify such a big step, which is why they usually wait for the consoles to be able to do it. And with one current-gen console having only 8 GB available at adequate bandwidth and a 4 TFLOPS RDNA2 GPU, they'll need to wait until the 2027/2028 generation of consoles is here.

Even for the regular PS5 and Series X it would be a challenge. If the 6700XT does less than 20 tokens/s on an 8B model, a 1.5B could do a lot more, but it would still need additional compute time to turn the generated text into audio. And that's pushing the full GPU bandwidth, which you can't do in a game because the graphics still need to run.


Sony or some dev could push for this on the PS5 Pro as an exclusive feature, but I find that hard to believe.
Since they love SaaS, it's going to be an internet thing with a subscription
 
Even for the regular PS5 and Series X it would be a challenge. If the 6700XT does less than 20 tokens/s on an 8B model, a 1.5B could do a lot more, but it would still need additional compute time to turn the generated text into audio. And that's pushing the full GPU bandwidth, which you can't do in a game because the graphics still need to run.
FWIW, LLMs are mostly memory-bound, so a PS5 with its 448 GB/s on a 256-bit bus should fare a bit better than a 6700XT (384 GB/s @ 192-bit). Text-to-speech (TTS) could also be done quite fast.
As you well said, they could use a more fine-tuned, smaller model that would be multiple times faster, so it could be doable. However, as you also said, there are other things going on at the same time that eat up the available compute.

IMO the hardest part would be some game actually managing to stuff an LLM in without it sounding way too repetitive or just gimmicky, like most of the uses of "AI" that we see out there that people end up complaining about.
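
To put a rough number on the memory-bound point, here's a back-of-envelope sketch. The assumption is that decode speed is capped by how fast the weights can be streamed per token; the weight precisions are illustrative, and real-world throughput lands well below this ceiling once compute, the KV cache, and the graphics workload sharing the bus are accounted for.

# Back-of-envelope decode ceiling: generating one token streams roughly the whole
# weight set through memory, so tokens/s <= bandwidth / model size in bytes.
# Bandwidth figures are the ones quoted above; precisions (FP16, 4-bit) are assumptions.
def decode_ceiling_tok_s(params_billion, bytes_per_param, bandwidth_gb_s):
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

for name, bw in [("PS5 (448 GB/s)", 448), ("6700XT (384 GB/s)", 384)]:
    for label, bpp in [("FP16", 2.0), ("4-bit", 0.5)]:
        print(f"{name}, 1.5B @ {label}: ~{decode_ceiling_tok_s(1.5, bpp, bw):.0f} tok/s ceiling")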
 
IMO the hardest part would be some game actually managing to stuff an LLM in without it sounding way too repetitive or just gimmicky, like most of the uses of "AI" that we see out there that people end up complaining about.
Yeah it's difficult to get even large models to process context coherently. If we're talking about using tiny models (~1B) for game NPCs, I don't see how to achieve what people would want from such a thing--i.e. an NPC that can talk intelligibly about its life and the world around it. Running small local models, you can barely get the LLM to remember every detail of what you said two comments ago, much less huge info dumps.

Messing around with LLMs, particularly on local hardware, is a lot of fun for all of the usual enthusiast/tinkerer reasons, but I'd recommend it in large part simply because the exercise gives you insight into the limitations of the technology, which become glaring pretty quickly. It's very impressive at first, and very useful for certain tasks (e.g. coding), but the illusion of intelligent conversation is puddle deep.
 
That’s the thing: why do games need to provide entertainment for months instead of days?

A well-crafted, complete game you enjoy for days is fun; a game that has you waste months is escapism *unless it’s e-sports
I don't need the game to create entertainment; I want my open-world games to feel a little more real, or at least feel more like their own world.
It was stuff like the entire hip-hop songs that had lyric changes for GTA IV to reference Liberty City. Really, just the radio in GTA IV in general was amazing. Now imagine virtual radio hosts who adhere to some kind of core script but can also adapt based on what you have done in-game: "blah blah blah, the bridge is open, that's funny, it didn't stop (the anonymously referred-to protagonist) from blasting over it at 120 mph last week".

I don't need my virtual NPCs to have whole lives and backstories and families and tragedy and whatnot. What I do want is for the main story NPCs to know my preference in cars, to comment on my outfit, and to develop dynamic relationships with other main-story NPCs that vary depending on how you play.
It doesn't need to be every conventionally interchangeable NPC suddenly being as fleshed out as 50% of real people; I want AI in video games to pay attention to what I'm doing and react accordingly on a wider scale than the individual NPC.
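
To make the "dynamic prompt" idea concrete, a rough, made-up sketch might look like this: the game keeps a short log of notable events and folds the most recent ones into the system prompt of whatever small local model it ships with. The persona, event format, and generate() call below are all hypothetical.

# Hypothetical sketch: fold recent game events into a radio-host prompt so a small
# local LLM can casually reference what the player has actually done.
def build_radio_prompt(persona, recent_events, max_events=3):
    event_lines = "\n".join(f"- {e}" for e in recent_events[-max_events:])
    return (
        f"You are {persona}, an in-game radio host. Stay in character, keep it under "
        f"two sentences, and casually reference one of these recent events:\n{event_lines}\n"
    )

events = [
    "an unnamed driver jumped the raised drawbridge at about 120 mph",
    "three police cruisers were wrecked downtown",
    "someone in a purple suit robbed the corner store",
]
prompt = build_radio_prompt("a Lazlow-style late-night DJ", events)
print(prompt)
# reply = local_llm.generate(prompt)  # whatever on-device model the game would ship with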
 
I don't need the game to create entertainment; I want my open-world games to feel a little more real, or at least feel more like their own world.
It was stuff like the entire hip-hop songs that had lyric changes for GTA IV to reference Liberty City. Really, just the radio in GTA IV in general was amazing. Now imagine virtual radio hosts who adhere to some kind of core script but can also adapt based on what you have done in-game: "blah blah blah, the bridge is open, that's funny, it didn't stop (the anonymously referred-to protagonist) from blasting over it at 120 mph last week".

I don't need my virtual NPCs to have whole lives and backstories and families and tragedy and whatnot. What I do want is for the main story NPCs to know my preference in cars, to comment on my outfit, and to develop dynamic relationships with other main-story NPCs that vary depending on how you play.
It doesn't need to be every conventionally interchangeable NPC suddenly being as fleshed out as 50% of real people; I want AI in video games to pay attention to what I'm doing and react accordingly on a wider scale than the individual NPC.
Fuller immersion; sounds attractive, yes.
 