
What local LLMs do you use?

Joined
May 10, 2023
Messages
853 (1.19/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
A Q4 17B model should fit pretty easily in even 16GB of VRAM; it shouldn't be a problem. Processing/generation time should outweigh load/unload.
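Quick sanity math on that, a rough sketch (the bits-per-weight and overhead figures are assumptions, roughly matching a Q4_K_M-style quant):

```python
# Back-of-the-envelope VRAM estimate for a Q4-quantized 17B model.
params = 17e9
bits_per_weight = 4.5  # assumed: Q4_K_M-style quants average a bit over 4 bits/weight
weights_gb = params * bits_per_weight / 8 / 1e9
overhead_gb = 1.5      # assumed KV cache + runtime buffers at modest context
print(f"weights ~{weights_gb:.1f} GB, total ~{weights_gb + overhead_gb:.1f} GB")
# ~9.6 GB of weights, ~11 GB total: comfortable in 16 GB of VRAM
```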
If you manage to get it working properly with something like ktransformers, then maybe. You'd still need quite a lot of RAM, but it should be feasible on consumer platforms nonetheless.
When it comes to inferencing current models, the bottleneck is VRAM bandwidth; GPU or CPU compute is almost irrelevant. As observed via the ollama ps command, tps drops dramatically when even a few % of the model is forced to run from system RAM, which is roughly 10x slower than VRAM.
LLM models have billions of parameters, and each inference pass requires loading most or all of them. It's like billions of neurons firing and communicating with each other while thinking. This creates a huge storm of data traffic, and memory bandwidth is the key performance metric holding back tps. Yes, at some point, when memory becomes fast enough, compute will need to catch up, but due to historical design (calculating frames is more compute- than bandwidth-intensive), today's GPUs are bandwidth-starved when running LLM inference.
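To put numbers on it, here is a minimal sketch of that bandwidth ceiling, assuming each generated token streams the full weight file through memory once (it ignores KV-cache reads and compute):

```python
# Upper bound on tokens/s for a memory-bandwidth-bound model.
def max_tps(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    return bandwidth_bytes_per_s / model_bytes

model = 10e9                               # ~10 GB quantized model
print(f"{max_tps(model, 936e9):.0f} t/s")  # RTX 3090 VRAM: ~936 GB/s
print(f"{max_tps(model, 51e9):.0f} t/s")   # dual-channel DDR4-3200: ~51 GB/s
# ~94 t/s vs ~5 t/s: the order-of-magnitude bandwidth gap maps directly onto tps
```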
But now you're talking about DRAM offloading and using the CPU for some of the layers, which is totally different from the mGPU setup that was the previous discussion point.
The PCIe bottleneck is really not much of an issue for a small number of GPUs for inference.
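Rough numbers back that up: with a pipeline-style split, only the activations cross the PCIe bus per token, never the weights. A sketch with assumed, illustrative values for the activation width and generation speed:

```python
# Per-token PCIe traffic when a model is split across two GPUs:
# only the activation vector crosses the bus at the split point.
hidden_size = 8192     # assumed activation width for a large model
bytes_per_elem = 2     # fp16 activations
splits = 1             # one boundary between two GPUs
per_token_bytes = hidden_size * bytes_per_elem * splits

tps = 20               # assumed generation speed
print(f"{per_token_bytes / 1024:.0f} KiB/token -> "
      f"{per_token_bytes * tps / 1e6:.2f} MB/s at {tps} t/s")
# ~16 KiB/token -> ~0.33 MB/s: negligible next to even PCIe 3.0 x4 (~4 GB/s)
```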
 
Joined
Mar 11, 2008
Messages
1,223 (0.20/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
Wonder if we're getting a smaller-sized Llama 4 for consumer GPUs.
 
Joined
Jun 21, 2021
Messages
3,192 (2.27/day)
System Name daily driver Mac mini M2 Pro
Processor Apple proprietary M2 Pro (6 p-cores, 4 e-cores)
Motherboard Apple proprietary
Cooling Apple proprietary
Memory Apple proprietary 16GB LPDDR5 unified memory
Video Card(s) Apple proprietary M2 Pro (16-core GPU)
Storage Apple proprietary onboard 512GB SSD + various external HDDs
Display(s) LG UltraFine 27UL850W (4K@60Hz IPS)
Case Apple proprietary
Audio Device(s) Apple proprietary
Power Supply Apple proprietary
Mouse Apple Magic Trackpad 2
Keyboard Keychron K1 tenkeyless (Gateron Reds)
VR HMD Oculus Rift S (hosted on a different PC)
Software macOS Sonoma 14.7
Benchmark Scores (My Windows daily driver is a Beelink Mini S12 Pro. I'm not interested in benchmarking.)
Neither Alphabet nor Meta have any motivation to let Joe Consumer run their LLMs locally on their own hardware.

Both companies make the lion's share of their revenue selling their users' Internet usage data. They want people to upload their AI chatbot queries to the cloud. YOU are their product. This should be a surprise to no one here at TPU.

While many people online looovvve to hate on Apple, at least Apple prioritizes privacy and data security. That's why they have taken pains to run at least some of their AI operations locally on the user's hardware (Apple Silicon Macs, Apple Silicon iPads, iPhone 15 Pro and the iPhone 16 family), with only some of the operations being done on their Private Cloud Compute servers. It's probably also why Apple has been slow to roll out AI features: they need to worry about privacy and security too.

Look at Microsoft Recall. When it was first announced, Microsoft was ridiculed for crushingly inadequate data security and privacy. They listened and postponed deployment. It's almost a year later and there are finally some whispers that it's coming Real Soon Now™. Clearly Microsoft rewrote almost everything from scratch in an attempt to close the privacy and data security holes.
 
Last edited:
Joined
Dec 11, 2023
Messages
66 (0.13/day)
System Name P330 Tiny
Processor i5 8400
Memory 32GB DDR4 3200mhz
Video Card(s) Quadro P1000
Storage SN 570 512GB x2
Display(s) LG 24GN650
Power Supply 175w
Software Ubuntu 24 LTS
I use the following on my little machine.
1. DeepSeek-R1-Distill-Qwen-14B
2. Llama 3 8B Instruct
3. https://huggingface.co/RichardErkhov/failspy_-_Meta-Llama-3-8B-Instruct-abliterated-v3-gguf

I wish to try out some 27B (Gemma 3) and 32B models in the future. Worth mentioning: the above models were run at Q4 quantization.

DeepCoder-14B-Preview: this one looks promising after some short testing!
LM Studio + 7900XT?
I'd be interested in seeing your results even if they are just rough runs or whatever.

Edit: Looks like you've shared some numbers on the previous pages.
 
Last edited:
Joined
Mar 11, 2008
Messages
1,223 (0.20/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
LM Studio + 7900XT?
I'd be interested in seeing your results even if they are just rough runs or whatever.

Edit: Looks like you've shared some numbers on the previous pages.
What models are you interested in?
Currently I have these:
[attached screenshot: list of installed models]
 
Joined
Mar 11, 2008
Messages
1,223 (0.20/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
- Mistral Small (24B)
- Any Gemma 27B that fits completely in your VRAM.

If things go as planned, I might have the same GPU as you.

Thank you!
Mistral Small (24B): 15 token/s
Gemma 27B: I only have the Q6 and Q8; the Q6 does 8.41 token/s
Gemma 12B Q8: 44 token/s

I would advise you to get the XTX version with the 24GB, you can thank me later :D
Even that "smol" extra 4GB will be super handy at some point!
 
Joined
Dec 11, 2023
Messages
66 (0.13/day)
System Name P330 Tiny
Processor i5 8400
Memory 32GB DDR4 3200mhz
Video Card(s) Quadro P1000
Storage SN 570 512GB x2
Display(s) LG 24GN650
Power Supply 175w
Software Ubuntu 24 LTS
Mistral Small (24B): 15 token/s
Gemma 27B: I only have the Q6 and Q8; the Q6 does 8.41 token/s
Gemma 12B Q8: 44 token/s

I would advise you to get the XTX version with the 24GB, you can thank me later :D
Even that "smol" extra 4GB will be super handy at some point!
Thanks for the results.

As much as I'd like to get the 7900XTX for the extra memory, the prices are too high.

I paid $730 USD (equivalent) for the XT, the cheapest XTX is $1000. Not worth it for me although it would've been quite nice to have the extra 4GB.

Nvidia options in this range only have 1̶6̶G̶B̶ 12GB; in fact, that's the only reason I even considered the 7900XT (well, I also can't find the Nvidia cards in stock).
 
Last edited:

johnspack

Here For Good!
Joined
Oct 6, 2007
Messages
6,071 (0.95/day)
Location
Nelson B.C. Canada
System Name System2 Blacknet , System1 Blacknet2
Processor System2 Threadripper 1920x, System1 2699 v3
Motherboard System2 Asrock Fatality x399 Professional Gaming, System1 Asus X99-A
Cooling System2 Noctua NH-U14 TR4-SP3 Dual 140mm fans, System1 AIO
Memory System2 64GBS DDR4 3000, System1 32gbs DDR4 2400
Video Card(s) System2 GTX 980Ti System1 GTX 970
Storage System2 4x SSDs + NVme= 2.250TB 2xStorage Drives=8TB System1 3x SSDs=2TB
Display(s) 1x27" 1440 display 1x 24" 1080 display
Case System2 Some Nzxt case with soundproofing...
Audio Device(s) Asus Xonar U7 MKII
Power Supply System2 EVGA 750 Watt, System1 XFX XTR 750 Watt
Mouse Logitech G900 Chaos Spectrum
Keyboard Ducky
Software Archlinux, Manjaro, Win11 Ent 24h2
Benchmark Scores It's linux baby!
Don't know if anyone has noticed this or not, but I seem to get up to 20% better performance under Linux...
 
Joined
Mar 21, 2016
Messages
2,670 (0.80/day)
Neither Alphabet nor Meta have any motivation to let Joe Consumer run their LLMs locally on their own hardware.

Both companies make the lion's share of their revenue selling their users' Internet usage data. They want people to upload their AI chatbot queries to the cloud. YOU are their product. This should be a surprise to no one here at TPU.

While many people online looovvve to hate on Apple, at least Apple prioritizes privacy and data security. That's why they have taken pains to run at least some of their AI operations locally on the user's hardware (Apple Silicon Macs, Apple Silicon iPads, iPhone 15 Pro and the iPhone 16 family), with only some of the operations being done on their Private Cloud Compute servers. It's probably also why Apple has been slow to roll out AI features: they need to worry about privacy and security too.

Look at Microsoft Recall. When it was first announced, Microsoft was ridiculed for crushingly inadequate data security and privacy. They listened and postponed deployment. It's almost a year later and there are finally some whispers that it's coming Real Soon Now™. Clearly Microsoft rewrote almost everything from scratch in an attempt to close the privacy and data security holes.

I pretty much agree: big tech has no real interest in consumers running LLMs locally, at least not without being able to sell it to them or generate revenue from the local use in some way, like adverts, much like YouTube interrupting you every 3 or 4 minutes to watch another ad.
 

ir_cow

Staff member
Joined
Sep 4, 2008
Messages
5,013 (0.82/day)
Location
USA
Just installed DeepSeek R1. Q1M, it's massive: 167GB lol
 
Joined
Mar 11, 2008
Messages
1,223 (0.20/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
Just installed DeepSeek R1. Q1M, it's massive: 167GB lol
What monster rig do you have?
And also, what speeds can you get?
I could get one more of the kit I have to run it, but it would still be like 0.7 token/s, or maybe even less :D
 

ir_cow

Staff member
Joined
Sep 4, 2008
Messages
5,013 (0.82/day)
Location
USA
What monster rig do you have?
Just the test computer: 285K / RTX 4090 and 4x64GB.

And also, what speeds can you get?
I could get one more of the kit I have to run it, but it would still be like 0.7 token/s, or maybe even less :D
It is "slow" for response, but thats okay because the answers are much better vs Distilled 8B and some instances 70B. I don't know how to check the token rate in LLM Studio. Any ideas?
 
Joined
Nov 23, 2023
Messages
398 (0.76/day)
Neither Alphabet nor Meta have any motivation to let Joe Consumer run their LLMs locally on their own hardware.

Both companies make the lion's share of their revenue selling their users' Internet usage data. They want people to upload their AI chatbot queries to the cloud. YOU are their product. This should be a surprise to no one here at TPU.

While many people online looovvve to hate on Apple, at least Apple prioritizes privacy and data security. That's why they have taken pains to run at least some of their AI operations locally on the user's hardware (Apple Silicon Macs, Apple Silicon iPads, iPhone 15 Pro and the iPhone 16 family), with only some of the operations being done on their Private Cloud Compute servers. It's probably also why Apple has been slow to roll out AI features: they need to worry about privacy and security too.

Look at Microsoft Recall. When it was first announced, Microsoft was ridiculed for crushingly inadequate data security and privacy. They listened and postponed deployment. It's almost a year later and there are finally some whispers that it's coming Real Soon Now™. Clearly Microsoft rewrote almost everything from scratch in an attempt to close the privacy and data security holes.
Alphabet actually does have an incentive to let consumers run LLMs on their own hardware: their models are made to run on Android devices.
 