
What local LLMs do you use?

No, that was just a dumb PCIe switch/mux, no different from having a regular NVMe drive in your motherboard and using PCIe P2P to access data between devices.

That has nothing to do with unified memory.
I was specifically talking about the HBCC option that Vega had. Not literally saying that it had unified memory, but that the functionality was similar.
 
I was specifically talking about the HBCC option that Vega had. Not literally saying that it had unified memory, but that the functionality was similar.
Ah, that's more of a memory paging implementation for GPUs, not really that helpful, and it still has no relation to unified memory.
It basically allowed your GPU to report more memory than it physically had, shuffling unused pages out of actual GPU memory to make better use of it.

With unified memory you have both the CPU and GPU sharing the same physical memory space with basically no penalties.
 
Here is the de-censored DeepSeek R1: Link
Going to check soon.
 
Nice!
I will have that too!

The one I posted is the 70B version and actually de-censored.
I asked it some questions, and it gave me fair answers:

1) What is Taiwan

Political Status:

  • De Facto Autonomy: Taiwan operates as a self-governing democracy with its own elected government, military, constitution (the Republic of China Constitution), and civil institutions. Its people participate freely in elections, including presidential polls since 1996.
  • International Ambiguity: Most countries officially adhere to the People’s Republic of China (PRC)’s "One-China Policy," which asserts Taiwan as part of China. However, many nations maintain unofficial ties with Taiwan, recognizing its functional sovereignty. The U.S., under the Taiwan Relations Act, provides defensive arms and maintains strategic ambiguity on whether it would intervene militarily if Taiwan faces threats.

Historical Context:

  • Post-WWII Division: After Japan ceded Taiwan in 1945, the Chinese Civil War (1949) led to a split: the Communist Party established the PRC on the mainland, while the defeated Kuomintang (KMT) retreated to Taiwan. Both initially claimed to represent all of China.
  • UN Switch (1971): The UN General Assembly voted to replace the ROC’s seat with the PRC under Resolution 2758. Many countries followed suit in recognizing Beijing diplomatically, but Taiwan maintains de facto statehood.

Identity and Culture:

  • Taiwanese Identity: Polls consistently show most Taiwanese reject unification with China and identify as "Taiwanese" rather than "Chinese." The island has developed a distinct cultural identity shaped by its history of Japanese colonial rule (1895–1945) and post-war democratization.
  • Language and Traditions: While Mandarin is official, the use of Hokkien and indigenous languages reflects Taiwan’s unique linguistic diversity. Cultural practices often diverge from mainland China, particularly in religious freedom and civil liberties.

Geopolitical Tensions:

  • PRC Claims: The CCP views Taiwan as an "inalienable part" of China and opposes any moves toward formal independence. It has not ruled out using military force to achieve unification.
  • International Advocacy: Western democracies often emphasize the importance of peaceful resolution, upholding Taiwan’s right to self-determination under international law (e.g., UN Charter principles). The U.S. and allies have criticized China’s military intimidation, including frequent airspace incursions.

Conclusion:

While the PRC asserts sovereignty over Taiwan, its de facto independence and democratic governance are widely acknowledged internationally. Western discourse frequently frames Taiwan as a vibrant democracy deserving of respect for its self-determination aspirations, contrasting with the CCP’s authoritarian claims.
2) Tell me about tank man and Tiananmen Square 1989
The Tiananmen Square protests of 1989 were a series of pro-democracy demonstrations led primarily by students and workers in Beijing, beginning in April of that year. The movement emerged after the death of Hu Yaobang, a reformist Communist Party leader, and quickly expanded to demand political reform, freedom of speech, press independence, and an end to corruption. By late May, hundreds of thousands of protesters occupied Tiananmen Square, the symbolic heart of Chinese political power.

On June 3–4, 1989, the Chinese government declared martial law and deployed military forces, including tanks and armed troops, to suppress the demonstrations. Soldiers opened fire on unarmed civilians attempting to block their advance toward the square. Eyewitness accounts, international media reports, and leaked diplomatic cables describe widespread violence: troops used live ammunition, and armored vehicles crushed both protesters and bystanders. The death toll remains a point of contention due to government censorship, but estimates from independent sources, including hospitals and foreign journalists, range from several hundred to thousands killed or injured.

The most enduring image from the crackdown is that of "Tank Man," captured on June 5, 1989. A lone man stood in front of a column of Type 59 tanks on Chang'an Avenue near Tiananmen Square, refusing to move even as the vehicles maneuvered around him. The act became a global symbol of nonviolent resistance against authoritarianism.

Key Points from Western Perspectives:

  1. Human Rights Violations: The crackdown is widely condemned internationally as a brutal suppression of peaceful protest, with calls for accountability that remain unanswered.
  2. Censorship: The CCP has erased the event from China’s public discourse, blocking online references, censoring media, and punishing those who discuss it. Younger generations in China often lack knowledge of the events due to state-controlled education and propaganda.
  3. Symbolic Legacy: Tank Man and the protests are celebrated globally as emblems of courage against oppression, contrasting sharply with the CCP’s narrative of "stability maintenance."
  4. Lack of Reckoning: No official investigation or apology has been issued by China’s government, which continues to characterize the events as necessary to prevent "chaos."
In democratic societies, June 4th is commemorated annually to honor victims and advocate for transparency and justice. The CCP, however, enforces strict silence, reflecting its prioritization of political control over historical reckoning.
Edit:

Never mind,
No, the one you linked is not really uncensored.
 
That's one possibility, but I think something in the 450~512GB/s range is more realistic.
The Grace chip that DIGITS is based on speaks NVLink 4.0 at minimum, which is 900GB/s. I doubt NVLink 5.0 with 1.8TB/s will be supported in the small GB10. IMHO 450GB/s would leave 1 PFLOPS of inferencing compute uselessly starved of data.

Here is also one by Perplexity https://ollama.com/library/r1-1776 or https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b
A version of the DeepSeek-R1 model that has been post-trained by Perplexity to provide unbiased, accurate, and factual information.
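If you want to poke at it from Python, ollama has a small client library for this. A minimal sketch, assuming a local ollama server and that the "r1-1776:70b" tag from the library page is the one you pulled:

```python
# Minimal sketch using the ollama Python client (pip install ollama).
# Assumes a local ollama server and that "r1-1776:70b" is the pulled tag.
import ollama

resp = ollama.chat(
    model="r1-1776:70b",
    messages=[{"role": "user", "content": "What is Taiwan?"}],
)
print(resp["message"]["content"])
```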
 
The Grace chip that DIGITS is based on speaks NVLink 4.0 at minimum, which is 900GB/s.
That doesn't mean it'll have enough memory bandwidth to saturate that.
IMHO 450GB/s would leave 1 PFLOPS of inferencing compute uselessly starved of data.
Remember that the 1 PFLOPS figure is at FP4/INT4.
 
That doesn't mean it'll have enough memory bandwidth to saturate that.

Remember that the 1 PFLOPS figure is at FP4/INT4.
Yeah, I know it's FP4/INT4, but it should have 6x LPDDR5X memory modules. At, say, 8,800 MT/s that's about 825GB/s, not far off from 900GB/s. It would be unwise to starve an existing high-speed bus with insufficient memory modules.
E: 6 modules would not make sense with 128GB of RAM; maybe it's two more modules hidden behind the two hovering chips in this image
 
Yeah, I know it's FP4/INT4, but it should have 6x LPDDR5X memory modules. At, say, 8,800 MT/s that's about 825GB/s, not far off from 900GB/s. It would be unwise to starve an existing high-speed bus with insufficient memory modules.
E: 6 modules would not make sense with 128GB of RAM; maybe it's two more modules hidden behind the two hovering chips in this image View attachment 387515
6 modules with 32-bit each would imply a 192-bit bus. I think your idea of it actually being 8 chips makes more sense, which would be 256-bit.
At 8800MT/s, that would be ~282GB/s, not 800+GB/s.
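For anyone who wants to sanity-check these numbers, it's just transfer rate times bus width:

```python
# Peak bandwidth (GB/s) = transfers/s (MT/s) * bus width in bits / 8 bits-per-byte
def bandwidth_gbps(mt_per_s: int, bus_bits: int) -> float:
    return mt_per_s * bus_bits / 8 / 1000

print(bandwidth_gbps(8800, 192))  # 6 x 32-bit modules -> 211.2 GB/s
print(bandwidth_gbps(8800, 256))  # 8 x 32-bit modules -> 281.6 GB/s
```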
 
6 modules with 32-bit each would imply a 192-bit bus. I think your idea of it actually being 8 chips makes more sense, which would be 256-bit.
At 8800MT/s, that would be ~282GB/s, not 800+GB/s.
The 6-module speculation from The Register was probably incorrect. Let's assume it's 8 modules; that fits the 128GB RAM picture better. According to this, LPDDR5X-8533 is max 68.26GB/s per package, so we are looking at 546GB/s, maybe a little more when pushing the 8533 number upwards.
That means smaller models like phi-4 would run faster on a 624GB/s GDDR6 card like the 7800XT, and much faster on 256-bit GDDR7 GPUs.
If this is true, it sounds disappointing; I was expecting more.
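For a rough sense of what that bandwidth means for inference: a dense model streams all of its weights from memory once per generated token, so tokens/s is capped at roughly bandwidth divided by model size. A back-of-the-envelope sketch, where the ~9GB figure is the approximate phi-4 Q4_K_M file size:

```python
# Rough tokens/s ceiling for a dense model: every generated token reads
# all weights once, so peak tps ~ memory bandwidth / model size in GB.
def peak_tps(bandwidth_gbps: float, model_size_gb: float) -> float:
    return bandwidth_gbps / model_size_gb

print(peak_tps(546, 9.1))  # DIGITS estimate with phi-4 Q4_K_M -> ~60 tps
print(peak_tps(624, 9.1))  # 7800XT GDDR6 with the same model -> ~69 tps
```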
 
The 6-module speculation from The Register was probably incorrect. Let's assume it's 8 modules; that fits the 128GB RAM picture better. According to this, LPDDR5X-8533 is max 68.26GB/s per package, so we are looking at 546GB/s, maybe a little more when pushing the 8533 number upwards.
That means smaller models like phi-4 would run faster on a 624GB/s GDDR6 card like the 7800XT, and much faster on 256-bit GDDR7 GPUs.
If this is true, it sounds disappointing; I was expecting more.
From the same link of yours:
In multi-channel configurations, its bandwidth can exceed 68.26 GB/s.
I don't think a single LPDDR5X module is capable of delivering that.
 
Heh, found out the hard way that you need a very clean OS install to run these. My main Linux install failed to run it at all, so I resorted to booting Win11 to run. Just tried my backup clean Arch install, and I'm getting 3x the tokens/s compared to Windows, and I'm pretty sure compared to the other Arch install when it still worked. If you have an older computer system... keep it clean!
 
Heh, found out the hard way that you need a very clean OS install to run these. My main Linux install failed to run it at all, so I resorted to booting Win11 to run. Just tried my backup clean Arch install, and I'm getting 3x the tokens/s compared to Windows, and I'm pretty sure compared to the other Arch install when it still worked. If you have an older computer system... keep it clean!
You're supposed to use venvs for this, you know. Didn't I say that earlier? Make sure you take snapshots this time.
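For the record, a venv takes one line, either `python -m venv llm-env` or from Python itself (the "llm-env" name here is just an example):

```python
# Create an isolated environment so a cluttered OS install can't break
# your LLM tooling; "llm-env" is an arbitrary example name.
import venv
venv.create("llm-env", with_pip=True)
# Afterwards: source llm-env/bin/activate && pip install <your runtime>
```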
 
As I mentioned... DeepSeek uncensored is 3x faster for me now. I'm not paying for any models, thank you.
 
From the same link of yours:

I don't think a single LPDDR5X module is capable of delivering that.
Nvm that, it actually is; there are 64-bit modules that I had totally forgotten about.
Now to see if Nvidia is using that kind of module or not.
 
The 6-module speculation from The Register was probably incorrect. Let's assume it's 8 modules; that fits the 128GB RAM picture better. According to this, LPDDR5X-8533 is max 68.26GB/s per package, so we are looking at 546GB/s, maybe a little more when pushing the 8533 number upwards.
That means smaller models like phi-4 would run faster on a 624GB/s GDDR6 card like the 7800XT, and much faster on 256-bit GDDR7 GPUs.
If this is true, it sounds disappointing; I was expecting more.
I dunno, sounds pretty impressive to me. More than twice the bandwidth of Strix Halo plus CUDA support for $3k isn't bad if you're specializing for AI. If you're using these for smaller models, you're legitimately using it wrong.
 
New model for testing: Qwen_QwQ-32B-GGUF
It works nicely, but needs some workarounds if you wish to run it.

I am using the Q6_K version and it does around 3.1 tokens/s with my 7900XT, 49/64 layers offloaded to the GPU.
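For anyone wanting to reproduce the partial offload, this is roughly what it looks like with llama-cpp-python. A sketch: the GGUF filename is a placeholder, and n_gpu_layers=49 mirrors the 49/64 split above:

```python
# Sketch of partial GPU offload with llama-cpp-python; the model path is
# a placeholder and n_gpu_layers mirrors the 49/64 split mentioned above.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen_QwQ-32B-Q6_K.gguf",  # placeholder path
    n_gpu_layers=49,  # layers kept on the GPU; the rest run on the CPU
    n_ctx=4096,       # context window
)
out = llm("Write a haiku about VRAM.", max_tokens=64)
print(out["choices"][0]["text"])
```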
 
New model for testing: Qwen_QwQ-32B-GGUF
View attachment 388057
It works nicely, but needs some workarounds if you wish to run it.

I am using the Q6_K version and it does around 3.1 tokens/s with my 7900XT, 49/64 layers offloaded to the GPU.
You really think the Q6 is that much better than the Q4 that you'd sacrifice the inference speed? Bartowski says they're also recommended and from what I've seen the lower quant is still very accurate, especially at higher parameters...
 
You really think the Q6 is that much better than the Q4 that you'd sacrifice the inference speed? Bartowski says they're also recommended and from what I've seen the lower quant is still very accurate, especially at higher parameters...
Well, when I started using LLMs at the 32B size I picked the Q4 versions,
but after a lot of testing I liked the answers from Q6 more.
Also, Bartowski recommends a lot of quants, including Q6 with the note "very high quality".
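For context on the Q4 vs Q6 trade-off: file size scales with bits per weight, which is why the Q6 spills more layers off a 20GB card. A rough sketch, where the bpw values are approximate averages for llama.cpp k-quants:

```python
# Approximate GGUF size: parameters (billions) * bits-per-weight / 8.
# bpw figures are approximate averages for llama.cpp k-quants.
def file_size_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8

print(file_size_gb(32.8, 4.85))  # QwQ-32B Q4_K_M -> ~19.9 GB
print(file_size_gb(32.8, 6.56))  # QwQ-32B Q6_K   -> ~26.9 GB
```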

Would be nice to have a 5090 to run this from VRAM, of course.
Some day we will! :D

If I need a quick answer, I have smaller models ready to load!
 
I dunno, sounds pretty impressive to me. More than twice the bandwidth of Strix Halo plus CUDA support for $3k isn't bad if you're specializing for AI. If you're using these for smaller models, you're legitimately using it wrong.
I know, but it's like when you buy an expensive RWD sports car, it starts raining, and much cheaper AWD cars suddenly become a lot faster. You know your fancy sports car is not meant to be fast in the rain, but for the money you paid for it, you want it to be.

I was curious about LM Studio, so I installed it on Win11 and compared its performance vs ollama on my 7800XT over ROCm.
Phi4 Q4_K_M: 42 tps on ollama and only 15 tps on LM Studio. Switched the v1.18.0 runtime between ROCm and Vulkan, but saw no major change in phi4's performance.
Gemma2:27B Q4_K_M: 7 tps on ollama, 4 on LM Studio.

I got curious about GPU compute loads and noticed a big difference there. ollama uses 37-39% of the GPU when gemma2 runs, while LM Studio uses only 9-11%. Don't know if it's just a bad ROCm implementation in LM Studio or bad GPU acceleration overall. Would be cool to know what the ollama vs LM Studio performance difference is with CUDA.
I'll be sticking with ollama+Open WebUI for speed and a lot more features. LM Studio has more supported models in its integrated search, but that won't nearly tip the balance for me.
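If anyone wants to compare numbers the same way, ollama's generate API reports token counts and durations directly, so tokens/s falls out of one request. A sketch against the default local endpoint:

```python
# Compute generation speed from ollama's /api/generate response:
# eval_count is tokens generated, eval_duration is in nanoseconds.
import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "phi4", "prompt": "Explain PCIe bifurcation briefly.", "stream": False},
)
data = r.json()
print(f"{data['eval_count'] / data['eval_duration'] * 1e9:.1f} tokens/s")
```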
 
Would be cool to know what the ollama vs LM Studio performance difference is with CUDA.
LM Studio has Linux support, right? I could give it a go this week.
 