
G.SKILL Trident Z5 Royal DDR5-8000 48GB CL40

ir_cow

Staff member
Joined
Sep 4, 2008
Messages
5,203 (0.84/day)
Location
USA
G.SKILL has brought back the beloved Royal series for DDR5 users. Equipped with an 8000 MT/s XMP profile at modest CL40-48-48-128 timings and just 1.35 V, G.SKILL is ready to please. Those looking to be treated like royalty, however, may be disappointed; read on to find out how G.SKILL has lost its way.

 
ir_cow, as you said, higher clock speeds don't do AMD CPUs any favors.
I'm new in town, but why not run this memory at 6000, find the lowest stable latency, and then do the AMD game tests?
 
This kit is slower than a 2x16GB 7200 CL34 kit on the Intel platform :/
 
So for AMD they are pointless, and for Intel they are a bit faster. Still, epic bling.
They are not pointless for AMD at all. They still need a little bit of basic tuning, but the chips in them will go very high. Way to not read any of the article :p


@ir_cow - it seems you need to include a TLDR conclusion in the title these days; people are still just looking at charts and nothing else lol
 
I tend to skim these RAM reviews as they seem very cut and paste, trying to differentiate such similar products, and I'm not convinced of the methodology. That's just me - I still look, but I do skim, sorry. I'd be more interested to know why so few manufacturers' kits are available at the etailers I look at.
 
ir_cow, as you said, higher clock speeds don't do AMD CPUs any favors.
I'm new in town, but why not run this memory at 6000, find the lowest stable latency, and then do the AMD game tests?
Because it is an "Intel" memory kit, and I don't want to focus on what maybe 0.001% of buyers will do with it, i.e. downclock to 6000 MT/s. That seems like a waste of money because you can just buy a good 6000 MT/s kit for less. The tests are performed with the XMP / EXPO profile enabled.

I tend to skim these RAM reviews as they seem very cut and paste, trying to differentiate such similar products, and I'm not convinced of the methodology.
I can't speak for other sites, but I know mine are as solid as they can get. Three runs are averaged together, and the CPU clock is locked so that it does not change based on temperature or any other variable. The only thing I cannot control is the NVIDIA GPU boost. As long as I keep the room in the 70s (°F), it barely moves 25 MHz.

@ir_cow - it seems you need to include a TLDR conclusion in the title these days; people are still just looking at charts and nothing else lol
Funny, it's already in it :)
 
Since you are an expert and this is some of the finest memory on the market, wouldn't you want to do the pro test instead of just setting the auto profile in the BIOS? :)
You know... pushing boundaries for mankind... :clap:
It is an Intel memory kit, but you did AMD platform tests too...

PS: maybe I'm wrong since, like I told you, I'm new in town and don't really know if the latency will have any impact on performance, but still, I would like to see it. :toast:
 
ir_cow, as you said, higher clock speeds don't do AMD CPUs any favors.
I'm new in town, but why not run this memory at 6000, find the lowest stable latency, and then do the AMD game tests?

There's a better option for people building with socket AM5: G.Skill recently announced a Royal Neo DDR5-6000 CL28 kit tailored for AMD platforms.


They will work better for Ryzen (or Intel Z690).
 
There's a better option for people building with socket AM5: G.Skill recently announced a Royal Neo DDR5-6000 CL28 kit tailored for AMD platforms.


They will work better for Ryzen (or Intel Z690).
Yep, and these kits will be super expensive, but guess what: there is a cheaper option with almost the same speed.

 
Since you are an expert and this is some of the finest memory on the market, wouldn't you want to do the pro test instead of just setting the auto profile in the BIOS? :)
Check out older reviews. The last one I did might interest you in the OC section.


You have Cyberpunk 2077 1% following Counter-Strike FPS in the AMD section
I'll fix that later tonight after work :). With so many charts, sometimes one or two make it past me. Thanks for pointing it out.
 
Great review.

DDR5-8000 EXPO - is this the new Trident Z5 Royal Neo DDR5-8000 kit by any chance?
 
Great review.

DDR5-8000 EXPO - is this the new Trident Z5 Royal Neo DDR5-8000 kit by any chance?
G.SKILL announced a Royal Neo 8000 MT/s kit. But I don't know much more than that.

Unless Ryzen 9000 does something magical with memory support and its EXPO has CAS36, I can't see it being a good choice.

That's just my take on it for now.
 
G.Skill gave a lot of wiggle room at 1.35 V; the sticks should be capable of something sexy on an Apex board
 
G.SKILL announced a Royal Neo 8000 MT/s kit. But I don't know much more than that.

Unless Ryzen 9000 does something magical with memory support and its EXPO has CAS36, I can't see it being a good choice.

That's just my take on it for now.
The only reason why I brought it up is that a DDR5-8000 EXPO kit is included among your results. Which kit did you use?

I agree, it is probably not the most cost effective option. But it should be possible to overclock a lot of Hynix A die memory kits to this speed, and based on your results, it may be worth it.
 
Regarding DRAM refresh timing, here's my uninformed guess. Maybe G.Skill had to increase refresh times as a tradeoff to prevent corruption of data at 1.35 V. They have to ensure stability over the entire temperature range after all, which is something that you needn't care about during your testing. Whatever it is, I expect them to explain their choice (if it's not a mistake) soon.
 
The only reason why I brought it up is that a DDR5-8000 EXPO kit is included among your results. Which kit did you use?
Ah. That was the KLEVV CRAS V reviewed previously. Only 8000 kit I have that has EXPO on it. Though, it doesn't matter a whole lot anyways.

Regarding DRAM refresh timing, here's my uninformed guess. Maybe G.Skill had to increase refresh times as a tradeoff to prevent corruption of data at 1.35 V. They have to ensure stability over the entire temperature range after all, which is something that you needn't care about during your testing. Whatever it is, I expect them to explain their choice (if it's not a mistake) soon.
Yes, I agree; this was my first thought. Higher tRFC values and lower voltage do indeed make it a lot more resilient to higher temperatures. That might be what the engineer was after. I wouldn't sell it as a Royal or Trident Z, though. Make a new series for this instead.

Since my test bench is open I have to place a fan on the memory or it will error out. In my personal computer I have a fan on the DDR5-6000 kit, because that will error out too from all the heat dumped on it directly from the Nvidia 3080 Ti FE card.

G.Skill gave a lot of wiggle room at 1.35 V; the sticks should be capable of something sexy on an Apex board
I was able to get it back to 38-48-48-84, and the tRFC values as well, with just 1.35 V. However, I have no idea if this is just a special case or if these ICs are all binned the same. The higher timings and tRFC can be used for low-grade ICs as well.
 
Eventually I would very much like to see performance testing with token generation with large language models (LLMs) on the CPU, which is a very bandwidth-intensive (but not very latency-sensitive) task with immediately practical applications.
 
Eventually I would very much like to see performance testing with token generation with large language models (LLMs) on the CPU, which is a very bandwidth-intensive (but not very latency-sensitive) task with immediately practical applications.
No idea how to do this. Walk me through it. I'm guessing it has to be Linux?
 
No idea how to do this. Walk me through it. I'm guessing it has to be Linux?

It could be done with llama.cpp. I use it on Linux, but it has Windows releases for both CUDA and AVX instructions (CPU).

Inside, there are various binaries. The one of interest is llama-bench.exe

First you have to download an LLM in the correct format (GGUF) at your quantization level of choice. Let's choose Meta Llama-3.1-8B in 8-bit precision:
Source: https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF

Then, after downloading it you could run the benchmark. It has various options:

Code:
./build/bin/llama-bench --help
usage: ./build/bin/llama-bench [options]

options:
-h, --help
-m, --model <filename> (default: models/7B/ggml-model-q4_0.gguf)
-p, --n-prompt <n> (default: 512)
-n, --n-gen <n> (default: 128)
-pg <pp,tg> (default: )
-b, --batch-size <n> (default: 2048)
-ub, --ubatch-size <n> (default: 512)
-ctk, --cache-type-k <t> (default: f16)
-ctv, --cache-type-v <t> (default: f16)
-t, --threads <n> (default: 8)
-ngl, --n-gpu-layers <n> (default: 99)
-rpc, --rpc <rpc_servers> (default: )
-sm, --split-mode <none|layer|row> (default: layer)
-mg, --main-gpu <i> (default: 0)
-nkvo, --no-kv-offload <0|1> (default: 0)
-fa, --flash-attn <0|1> (default: 0)
-mmp, --mmap <0|1> (default: 1)
--numa <distribute|isolate|numactl> (default: disabled)
-embd, --embeddings <0|1> (default: 0)
-ts, --tensor-split <ts0/ts1/..> (default: 0)
-r, --repetitions <n> (default: 5)
-o, --output <csv|json|md|sql> (default: md)
-oe, --output-err <csv|json|md|sql> (default: none)
-v, --verbose (default: 0)

Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
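
If the benchmark ends up being scripted for a memory review, one way to assemble the command line is sketched below in Python. It only uses options listed in the help output above (-m, -t, -r, -o, -ngl); the helper name `build_llama_bench_cmd` and the thread counts are hypothetical and chosen purely for illustration.

```python
import shlex


def build_llama_bench_cmd(binary, model, threads, repetitions=5,
                          output="md", n_gpu_layers=None):
    """Assemble a llama-bench argument list (hypothetical helper)."""
    cmd = [binary, "-m", model]
    # Several values for one parameter are joined with ',' as the help text says.
    cmd += ["-t", ",".join(str(t) for t in threads)]
    cmd += ["-r", str(repetitions), "-o", output]
    if n_gpu_layers is not None:
        # Only relevant on GPU builds; 0 would keep every layer on the CPU.
        cmd += ["-ngl", str(n_gpu_layers)]
    return cmd


cmd = build_llama_bench_cmd(
    "./build/bin/llama-bench",
    "Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",
    threads=[4, 8, 16],   # illustrative thread counts, not a recommendation
    output="csv",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` so each memory configuration is measured with identical arguments.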

To run the benchmark with the default settings with the downloaded model (Meta Llama 3.1 8B, Q8) you'd do:

Code:
./build/bin/llama-bench -m Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
| model                          |       size |     params | backend    | threads |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | ---------------: |
| llama 8B Q8_0                  |   7.95 GiB |     8.03 B | CPU        |       8 |         pp512 |     47.94 ± 3.68 |
| llama 8B Q8_0                  |   7.95 GiB |     8.03 B | CPU        |       8 |         tg128 |      5.51 ± 0.02 |

build: 268c5660 (3494)
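
For charting, the markdown table that llama-bench prints could be turned into numbers with something like the following. This is a minimal sketch that assumes the column layout shown above; `parse_llama_bench_md` is a hypothetical helper, not part of llama.cpp, and the csv or json output modes are probably the easier route in practice.

```python
SAMPLE = """
| model         |     size | params | backend | threads |  test |          t/s |
| ------------- | -------: | -----: | ------- | ------: | ----: | -----------: |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CPU     |       8 | pp512 | 47.94 ± 3.68 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CPU     |       8 | tg128 |  5.51 ± 0.02 |
"""


def parse_llama_bench_md(text):
    """Return one dict per result row, with t/s split into mean and stddev."""
    rows, header = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the command line, blank lines and the build footer
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # the |---| separator row
        row = dict(zip(header, cells))
        mean, _, stddev = row["t/s"].partition("±")
        row["t/s_mean"] = float(mean)
        row["t/s_stddev"] = float(stddev)
        rows.append(row)
    return rows


rows = parse_llama_bench_md(SAMPLE)
for r in rows:
    print(r["test"], r["t/s_mean"], r["t/s_stddev"])
```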

With this, prompt processing (PP) and token generation (TG) are done entirely on the CPU. In the above case (unoptimized dual-channel DDR4-3600 on an Intel i7-12700K), I get 47.94 tokens/s for prompt processing (over a context of 512 tokens) and 5.51 tokens/s for actual token generation (over 128 tokens). Token generation speed depends on the size of the model (number of parameters and quantization) and memory bandwidth, while prompt processing is a function of various factors, but mainly compute power.
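
To put a number on the bandwidth dependence, here is a back-of-the-envelope sketch in Python. It assumes a 64-bit (8-byte) data bus per DDR4 channel, two channels, and that every generated token has to stream the full 7.95 GiB of weights from RAM once; that last point is a common simplification that ignores the KV cache and compute time, so the result is a ceiling rather than a prediction.

```python
GIB = 1024 ** 3


def peak_bandwidth_gbs(mt_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak in GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


def tg_ceiling(bandwidth_gbs, model_gib):
    """Upper bound on tokens/s if each token reads all weights exactly once."""
    return bandwidth_gbs * 1e9 / (model_gib * GIB)


bw = peak_bandwidth_gbs(3600)     # dual-channel DDR4-3600
ceiling = tg_ceiling(bw, 7.95)    # Llama 3.1 8B Q8_0, size from the table above
measured = 5.51                   # tg128 result from the run above

print(f"peak bandwidth: {bw:.1f} GB/s")
print(f"TG ceiling:     {ceiling:.2f} t/s")
print(f"measured:       {measured:.2f} t/s ({measured / ceiling:.0%} of ceiling)")
```

Under those assumptions the measured 5.51 t/s sits at roughly 80% of the theoretical ceiling, which is why faster memory should move this number almost directly.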

For inference on the CPU, 4-bit quantized models are perhaps of more interest right now. They run faster (almost twice as fast), albeit at lower quality.
This one could be used: https://huggingface.co/bartowski/Me...ama-3.1-8B-Instruct-Q4_K_M.gguf?download=true

As a side note, on an RTX3090 (power-limited) the same test as above gives:

Code:
./build/bin/llama-bench -m Meta-Llama-3.1-8B-Instruct.q8_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 8B Q8_0                  |   7.95 GiB |     8.03 B | CUDA       |  99 |         pp512 |  4162.60 ± 27.99 |
| llama 8B Q8_0                  |   7.95 GiB |     8.03 B | CUDA       |  99 |         tg128 |     84.82 ± 0.17 |

build: 268c5660 (3494)

Considerably faster, prompt processing in particular, while token generation is roughly in line with the memory bandwidth difference (936 GB/s vs 57.6 GB/s). However, the GPU has a limited amount of memory compared to what a bandwidth-optimized CPU could in theory provide, and it runs quite hot (not to mention that high-end GPUs are expensive).
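
The "roughly in line" claim can be checked with a few lines of Python using only the figures quoted in this post; nothing here is newly measured.

```python
# Results and bandwidth figures quoted in the post above.
cpu = {"pp512": 47.94, "tg128": 5.51, "bandwidth_gbs": 57.6}     # DDR4-3600, i7-12700K
gpu = {"pp512": 4162.60, "tg128": 84.82, "bandwidth_gbs": 936.0}  # RTX 3090

tg_speedup = gpu["tg128"] / cpu["tg128"]
pp_speedup = gpu["pp512"] / cpu["pp512"]
bw_ratio = gpu["bandwidth_gbs"] / cpu["bandwidth_gbs"]

print(f"token generation speedup:  {tg_speedup:.1f}x")
print(f"memory bandwidth ratio:    {bw_ratio:.2f}x")
print(f"prompt processing speedup: {pp_speedup:.1f}x")
```

Token generation scales at about 15.4x against a 16.25x bandwidth ratio, while prompt processing scales far beyond either, which fits the idea that it is compute-bound rather than bandwidth-bound.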

Eventually, with multi-channel (256-bit bus width or more) LPDDR5X systems or DDR6 we might get enough memory bandwidth to generate tokens from LLMs at comfortable speeds on the CPU, leaving prompt processing to dedicated NPUs or the installed discrete GPU.
 
Yep, and these kits will be super expensive, but guess what: there is a cheaper option with almost the same speed.


Yes, but we are talking about extreme parts here. An 8000C40 kit is an ultra-high-end kit, and so is a 6000C28.
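
For anyone comparing these kits by the numbers, CAS latency in cycles can be converted to nanoseconds. A small Python sketch, assuming the usual DDR relationship of two transfers per clock (so one clock period is 2000 / MT/s nanoseconds); the three kits are the ones mentioned in this thread.

```python
def cas_latency_ns(mt_per_s, cl):
    """CAS latency in ns: CL cycles x clock period, where period = 2000 / (MT/s)."""
    return cl * 2000.0 / mt_per_s


kits = {
    "DDR5-8000 CL40": (8000, 40),
    "DDR5-7200 CL34": (7200, 34),
    "DDR5-6000 CL28": (6000, 28),
}

latencies = {name: cas_latency_ns(speed, cl) for name, (speed, cl) in kits.items()}
for name, ns in latencies.items():
    print(f"{name}: {ns:.2f} ns")
```

This only covers the CAS portion of an access. Measured latency also depends on the other timings, tRFC, and the memory controller, so treat it as a first-order comparison.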
 
Great review. My first and immediate thought was "If this review doesn't have detailed latency figures and frametimes....".

Spot on. Love TechPowerUp. I have never in my life talked crap about this site. That's saying a lot too. Lol.
 
12 GB modules for DESKTOPS, COME OON....:confused:
Seems I'm fine with 24 GB and don't really need 32, and I don't like "overkill" builds. Value ftw lol


on-topic:
"beloved" for its crazy price? Looks nice, definitely, but Apple-style pricing just kills any logic in buying one.
 
As I've said ever since AM5 debuted, AMD's DDR5 memory controller and IF design is embarrassing: low bandwidth and high latency. Meanwhile, the competition is already talking about DDR5-10000+ support and already exceeds 130 GB/s of bandwidth when tuned, and their upcoming chips will be better than this.

Even with this 8000 MT/s memory, it's nowhere near 2x the bandwidth of my DDR4-3800 CL14 RAM. Intel just about manages to double my 60 GB/s, and that's exactly what I would expect as a minimum going from DDR4-3800 to DDR5-8000. Also, I get 54 ns latency (no safe mode tricks); this stuff is 10 ns more! I can see you can get about 61 ns with a better kit.

The funny part is that AMD said they made no changes to the IF and memory controller in the upcoming Zen5 range. A huge mistake. No wonder AMD depends on the x3D cache-grab to band-aid this mess.
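
As a reference point for the "2x" expectation, the theoretical peaks can be worked out in Python. This sketch assumes a 128-bit total data bus for a dual-channel desktop in both generations (2 x 64-bit for DDR4, 4 x 32-bit subchannels for DDR5); the measured figures people quote are benchmark results and will land below these peaks.

```python
def peak_gbs(mt_per_s, total_bus_bits=128):
    """Theoretical peak bandwidth in GB/s for a given transfer rate and bus width."""
    return mt_per_s * 1e6 * (total_bus_bits / 8) / 1e9


ddr4_3800 = peak_gbs(3800)
ddr5_8000 = peak_gbs(8000)
ratio = ddr5_8000 / ddr4_3800

print(f"DDR4-3800 peak: {ddr4_3800:.1f} GB/s")
print(f"DDR5-8000 peak: {ddr5_8000:.1f} GB/s")
print(f"ratio:          {ratio:.2f}x")
```

On paper the step from DDR4-3800 to DDR5-8000 is about 2.1x, so how close a platform gets to doubling is mostly a question of how efficiently its memory controller and fabric use the DRAM.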
 
As I've said ever since AM5 debuted, AMD's DDR5 memory controller and IF design is embarrassing: low bandwidth and high latency. Meanwhile, the competition is already talking about DDR5-10000+ support and already exceeds 130 GB/s of bandwidth when tuned, and their upcoming chips will be better than this.

Even with this 8000 MT/s memory, it's nowhere near 2x the bandwidth of my DDR4-3800 CL14 RAM. Intel just about manages to double my 60 GB/s, and that's exactly what I would expect as a minimum going from DDR4-3800 to DDR5-8000. Also, I get 54 ns latency (no safe mode tricks); this stuff is 10 ns more! I can see you can get about 61 ns with a better kit.

The funny part is that AMD said they made no changes to the IF and memory controller in the upcoming Zen5 range. A huge mistake. No wonder AMD depends on the x3D cache-grab to band-aid this mess.

While the raw latency numbers are a bit higher, DDR5 tends to greatly outperform DDR4 thanks to its signaling and sheer bandwidth, especially once you go above 7200. I got a ~35% improvement over the 6400 C30 I was running on the Z690 Ace by running 7600 C36 on the Apex Encore with the same kit.

12 GB modules for DESKTOPS, COME OON....:confused:
Seems I'm fine with 24 GB and don't really need 32, and I don't like "overkill" builds. Value ftw lol

on-topic:
"beloved" for its crazy price? Looks nice, definitely, but Apple-style pricing just kills any logic in buying one.

32 GB kits are already being overwhelmed by some newer games. They tend to perform the best, but the new 48 GB kits are the new "value" spot, IMO. They also clock about the same as the 16 GB SR modules, on average. As far as pricing goes, I mean, it's Trident Royal. These cater to the same audience as Corsair's Dominator kits. IMO, money well spent. But YMMV.
 