Went from a 7800 XT to a 7900 XTX. Running under Ollama, Phi-4 scaled almost perfectly with VRAM bandwidth (642 GB/s vs 960 GB/s), going from 42 to 62 tps. Gemma 3 wouldn't fit into the 7800 XT's VRAM and was partially swapped to system RAM, so it saw a relatively bigger jump: 9 -> 35 tps, making it nicely usable now.
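Quick back-of-the-envelope check of that "nearly perfect scaling" claim, using only the numbers from this post:

```python
# All figures are from the post; nothing measured here.
bw_7800xt, bw_7900xtx = 642, 960    # GB/s memory bandwidth
tps_7800xt, tps_7900xtx = 42, 62    # Phi-4 tokens/s under Ollama

bw_ratio = bw_7900xtx / bw_7800xt     # bandwidth uplift
tps_ratio = tps_7900xtx / tps_7800xt  # throughput uplift

print(f"bandwidth: {bw_ratio:.2f}x, throughput: {tps_ratio:.2f}x")
# ~1.50x bandwidth vs ~1.48x throughput: token generation is
# essentially memory-bandwidth bound for a model that fits in VRAM.
```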
Lowered the power limit by 10% (wish AMD would let you go further), capped core clocks at 2.2 GHz (useless here from an LLM perspective), raised VRAM to 2.7 GHz (+fast timings), and dropped the voltage from 1150 to 1100 mV. That took me from 339 W to 226 W running Gemma3:27B at almost identical performance, 34 tps.
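What that tuning buys in efficiency, using the post's wattage figures and assuming throughput really is unchanged (the post says "almost identical" at 34 tps):

```python
# Watts are from the post; identical tps before/after is an assumption.
watts_stock, watts_tuned = 339, 226
tps = 34  # tokens/s, roughly the same in both configurations

eff_stock = tps / watts_stock   # tokens per joule, stock settings
eff_tuned = tps / watts_tuned   # tokens per joule, tuned settings

print(f"{eff_stock:.3f} -> {eff_tuned:.3f} tps/W "
      f"({eff_tuned / eff_stock - 1:.0%} better)")
# At equal tps, efficiency improves by the power ratio, 339/226 ~ 1.5x.
```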
Hey @AMD, it's about time to include out-of-the-box profiles for running LLMs on high-end GPUs!