
EPYC 9965 (192 cores ZEN5c), a design error?

The more cores a CPU has, the more cache memory it needs, since increasing the number of cores increases the "competition" between the cores for access to RAM. And with a larger amount of cache, more data is copied in advance from RAM to cache and read directly from it by the cores, thus preventing cores from idling (and consequently losing performance) due to delays in accessing data in main RAM.

With the EPYC 9965 CPU (192 cores), AMD did exactly the opposite of what logic suggests: it greatly increased the number of cores and halved the amount of L3 cache memory.
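A rough per-core comparison makes the point clearer. The core counts and L3 totals below are the commonly published figures for these SKUs, so treat them as illustrative and check them against AMD's own pages:

Code:
# Back-of-envelope L3-per-core comparison.
# Core counts and L3 totals are commonly published figures for these SKUs;
# treat them as illustrative and verify against AMD's own spec pages.
skus = {
    "EPYC 9965 (192x Zen 5c)": (192, 384),   # (cores, total L3 in MB)
    "EPYC 9755 (128x Zen 5)": (128, 512),
    "EPYC 9575F (64x Zen 5)": (64, 256),
}

for name, (cores, l3_mb) in skus.items():
    print(f"{name}: {l3_mb / cores:.1f} MB of L3 per core")

# EPYC 9965 (192x Zen 5c): 2.0 MB of L3 per core
# EPYC 9755 (128x Zen 5): 4.0 MB of L3 per core
# EPYC 9575F (64x Zen 5): 4.0 MB of L3 per core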

It seems that, in the video encoding test in the link below (done by Tom's Hardware), exactly what was described above happened: the EPYC 9965 CPU (192 cores) had poor performance, similar to the EPYC 9575F CPU, which has only 64 cores.

And on the EPYC CPU specifications pages, AMD did the "favor" of not showing which type of core the processor has (whether ZEN5, ZEN5c, etc.), nor does it show which instruction sets the processor supports or how much cache memory each CPU/chiplet has:





Source:
 
Boost clocks...
 
It seems that, in the video encoding test in the link below (done by Tom's Hardware), exactly what was described above happened: the EPYC 9965 CPU (192 cores) had poor performance, similar to the EPYC 9575F CPU, which has only 64 cores.
There will be less cache-sensitive workloads - to be fair, it's really up to the server buyer / intended workload to dictate what you need it for and which product is best for the job.
There will be a similar issue for Intel's all E-core server parts.

Nobody ever said you can swap one for the other and not see an impact.
The reduced cache size will have an impact in a number of ways, such as on processes that spread threads over many cores, not just on memory buffering. Again, workload choice will make a difference.

And on the EPYC CPU specifications pages, AMD did the "favor" of not showing which type of core the processor has (whether ZEN5, ZEN5c, etc.), nor does it show which instruction sets the processor supports or how much cache memory each CPU/chiplet has:
I have to admit, not identifying the core type isn't great but the cache amounts, etc., will give it away (for those who know what they're looking for).
As for CPU capabilities, unlike Intel, Zen and Zen-c cores share the same CPU caps.
A full list is handy to have, although at this point in time it supports pretty much everything except Intel's AVX10 (or AVX-512, second attempt).
 
Last edited:
Get a load of that test rig.

I could think of better things to do with it than encode videos... like virtualize several small companies' worth of servers.
 
It seems that, in the video encoding test in the link below (done by Tom's Hardware), exactly what was described above happened: the EPYC 9965 CPU (192 cores) had poor performance, similar to the EPYC 9575F CPU, which has only 64 cores.
You are hyperfocusing on a single task. First of all, video encoding has an upper limit on how much it can be parallelized, and that limit also depends on the encoder being used and on the config of said encoder.
For SVT-AV1, here are some different behaviours for different presets:
Screenshot 2024-10-15 at 00.34.52.png


A 16-core beating a 64-core CPU.
Screenshot 2024-10-15 at 00.36.34.png

Here the 64-core with higher frequencies takes the lead.

Another thing is that those encoding tests are NOT dependent on memory bandwidth. This can be easily seen here:
Screenshot 2024-10-15 at 00.38.34.png


Anyhow, Zen 5c is not meant to be encoding videos; it's meant for hyperscalers that are looking for really high core density.

And on the EPYC CPU specifications pages, AMD did the "favor" of not showing which type of core the processor has (whether ZEN5, ZEN5c, etc.), nor does it show which instruction sets the processor supports or how much cache memory each CPU/chiplet has:
AMD's spec page is awfully bad, but they did give a list of which models had Zen5c cores and which did not:
Screenshot 2024-10-15 at 00.40.41.png


The instruction set includes AVX-512; for the caches you need to dig somewhere else.
 
I don't get to use many AMD-branded prototypes. Whose case is that? Doesn't look like a Supermicro or MiTAC. Custom water loop, or is it one of those Dynatron setups? They were a bit scant on the details of the system, and I can't see many of the part numbers.
 
"SVT-AV1's greatest strength is its parallelization capability, where it outclasses other AV1 encoders by a significant margin. SVT-AV1's parallelization techniques do not involve tiling & don't harm video quality, & can comfortably utilize up to 16 cores given 1080p source video. This is while maintaining competitive coding efficiency to mainline aomenc. Perceptually, mainline SVT-AV1 is outperformed by well-tuned community forks of aomenc, but according to many the gap has begun to close with the introduction of SVT-AV1-PSY."
 
Anyhow, Zen 5c is not meant to be encoding videos; it's meant for hyperscalers that are looking for really high core density.

I'm still going to read your entire post and the whole topic carefully.

But even so, it is as I said in the first post: the more cores a CPU has, the more cache it needs so that the cores are always well supplied with data from RAM and don't sit idle waiting for data from main memory.


can comfortably utilize up to 16 cores given 1080p source video

So, SVT-AV1 is only capable of dividing the frame into blocks of 480x270 pixels (1920÷4 x 1080÷4), one per core? That must be why other CPUs with more than 64 cores did not perform much better encoding the 4K video in the test. (A 1920x1080 video has 16 blocks of 480x270 pixels, and a 3840x2160 video has 64 such blocks.)
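Checking my arithmetic with a quick script (the 480x270 block size is just my reading of the '16 cores at 1080p' quote, not something the SVT-AV1 developers state):

Code:
# How many 480x270 blocks fit in a frame at each resolution
# (integer division, ignoring partial blocks).
BLOCK_W, BLOCK_H = 480, 270

for label, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    blocks = (w // BLOCK_W) * (h // BLOCK_H)
    print(f"{label}: {blocks} blocks of {BLOCK_W}x{BLOCK_H}")

# 1080p: 16 blocks of 480x270
# 4K: 64 blocks of 480x270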

If SVT-AV1 can only give each core a block of 480x270 pixels, that is very poor programming. The team developing SVT-AV1 should have already optimized it to work very well with CPUs with hundreds of cores, dividing the video into very small blocks for each core.
 
Last edited:
SVT-AV1 rev 1.2.xx sucks performance-wise. It doesn't utilize all cores 100%. There was even a short time where it only used 1 core.
If you run some of the old 1.1 versions of SVT-AV1, they run much faster and encode at almost double the FPS.
New features like variance-boost, variance-boost-strength and variance-octile might have something to do with the performance decrease.
 
It seems that, in the video encoding test in the link below (done by Tom's Hardware), exactly what was described above happened: the EPYC 9965 CPU (192 cores) had poor performance, similar to the EPYC 9575F CPU, which has only 64 cores.
This is a bit like saying "Hey this CPU is inferior to that GPU when it comes to rendering".

I.e. you're not wrong, but you're asking the wrong question.
 
toms said:
the testing for the 192-core model isn't yet done

You can see more benchmarks on Phoronix. The Zen 5c cores have their niche applications.
And if you are buying these kinds of CPUs, you are probably paid to study the one slide with the SKUs.
 
It's just a massively niche product where the extra cores will only benefit certain kinds of workloads. Need to know what you're buying rather than just saying more cores = better.

Actually that's the same right down the product stack to a large degree. Generalised reviews and benchmarks can be incredibly misleading if your use case isn't the same...
 
I'm still going to read your entire post and the whole topic carefully.

But even so, it is as I said in the first post: the more cores a CPU has, the more cache it needs when running memory-intensive workloads so that the cores are always well supplied with data from RAM and don't sit idle waiting for data from main memory.
FTFY
 
The more cores a CPU has, the more cache memory it needs, since increasing the number of cores increases the "competition" between the cores for access to RAM. And with a larger amount of cache, more data is copied in advance from RAM to cache and read directly from it by the cores, thus preventing cores from idling (and consequently losing performance) due to delays in accessing data in main RAM.

With the EPYC 9965 CPU (192 cores), AMD did exactly the opposite of what logic suggests: it greatly increased the number of cores and halved the amount of L3 cache memory.
That's because it's aimed at cloud and virtualization workloads, an area where ARM is trying to make inroads. EPYC 9965 Turin Dense is a perfect counter to them. Same with Intel Xeon 6 "Sierra Forest".

The logic is that the core size is 25% smaller than a regular Zen 5 core, and the smaller caches make it smaller still.
 
These are perfect chips to host hundreds of VMs with bursty workloads: think things that are sold as SaaS to a lot of enterprise customers, scalable webapps/webservers that scale up and down with load during the day, and things like load balancers for large orgs.

It also means people can now condense tens of physical servers into single-box platforms, cutting down costs in power/cooling/U space.
 
That's because it's aimed at cloud and virtualization workloads, an area where ARM is trying to make inroads. EPYC 9965 Turin Dense is a perfect counter to them. Same with Intel Xeon 6 "Sierra Forest".

The logic is that the core size is 25% smaller than a regular Zen 5 core, and the smaller caches make it smaller still.

The EPYC 9965 processor has 192 cores and only 12 DDR5 memory channels (each split into two 32-bit subchannels, for 24 in total). Therefore, there is only one 32-bit subchannel for every 8 x86 cores, and the cores are constantly "competing" with each other for access to RAM.
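Spelling the ratio out both ways, since DDR5 splits each channel into two 32-bit subchannels:

Code:
# Cores per memory channel on the EPYC 9965 (192 cores, 12 DDR5 channels).
cores = 192
channels = 12                  # full DDR5 channels
subchannels = channels * 2     # each DDR5 channel = two 32-bit subchannels

print(cores / channels)        # 16.0 cores per full DDR5 channel
print(cores / subchannels)     # 8.0 cores per 32-bit subchannel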

AMD chose to make the EPYC CPUs in an MCM scheme, instead of a single die, for several reasons... In this scheme, there is an increase in latency when accessing main RAM and, to reduce or compensate for this increase in latency, the chiplets need large amounts of cache memory so they are always supplied, in advance, with data from RAM.

Apparently, AMD chose to reduce the amount of cache memory in its "c" chiplets because cache memory (SRAM), like DRAM, does not shrink with new lithography as well as the rest of the chiplet's logic does. When there is a major advance in the process used for certain parts of a chip, from 14 to 7 nm for example (50% smaller), the SRAM cache area decreases by only 3 to 5% (and sometimes it doesn't decrease at all, remaining the same as on the previous node). In short, cache memory occupies a large area (mm²) of silicon, making the chip much more expensive.
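A toy die-area example of why that matters; the 50/50 starting split and the exact shrink factors below are made-up illustrative numbers, not AMD's:

Code:
# If logic area halves on the new node but SRAM barely shrinks,
# cache eats a growing share of the die. All numbers are invented
# purely to illustrate the scaling argument above.
logic_mm2, sram_mm2 = 50.0, 50.0     # old node: 100 mm2 die, half of it cache
new_logic = logic_mm2 * 0.5          # logic area shrinks ~50%
new_sram = sram_mm2 * 0.96           # SRAM area shrinks only ~4%

total = new_logic + new_sram
print(f"new die: {total:.1f} mm2, cache share: {new_sram / total:.0%}")
# new die: 73.0 mm2, cache share: 66%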

In any case, reducing the amount of cache memory ALWAYS ends up being a false economy, as it directly affects the performance of the x86 cores: if the data is not in cache, the cores lose a lot of performance sitting idle while the memory controller fetches it from RAM.


If the processor has more cache per core, the cores spend less time idle. A processor with fewer cores but more cache per core can do as much as, or more than, one with more cores and less cache per core.
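The usual way to put a number on this is average memory access time; a minimal sketch with placeholder latencies (generic figures, not measurements of any EPYC part):

Code:
# AMAT = hit_time + miss_rate * miss_penalty.
# Latency values are generic placeholders, not measurements of any specific CPU.
def amat(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    return hit_ns + miss_rate * miss_penalty_ns

# More cache per core generally means a lower miss rate for the same workload.
print(amat(hit_ns=10, miss_rate=0.02, miss_penalty_ns=100))  # 12.0 ns
print(amat(hit_ns=10, miss_rate=0.10, miss_penalty_ns=100))  # 20.0 ns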

See also:



 
The EPYC 9965 processor has 192 cores and only 12 DDR5 memory channels (each split into two 32-bit subchannels, for 24 in total). Therefore, there is only one 32-bit subchannel for every 8 x86 cores, and the cores are constantly "competing" with each other for access to RAM.

Tl;dr
You are undoubtedly correct in your theoretical statements. But... if you can draw up a design with all those 192 cores, far more cache, and basically anything else you want, I will be very interested in how you recreate it in silicon. It will not fit on the die; the area would be too large. Perhaps by 2030, on a much smaller node produced with the new ASML 5200 scanners. But those will work with smaller maximum die sizes, so hardly... You'd probably need a special custom die. The thing is, unless you're a multi-millionaire, you're unlikely to be able to afford to buy something of this class.
 
This CPU is not aimed at video encoding, but I'm sure your vendor will have something that'll work wonders in your specific video encoding workload if you are intending to buy this scale of product. If you don't have a vendor, then this thing is not aimed at you at all.

It's meant to replace several older servers with a single node, say older Xeon Platinums that top out at 56 cores. As many have said before, this is for hyperscalers who sell their stuff by the core (and memory). To place 192 cores on a single chip, you need to cut the cache sizes; it's as simple as that. With 192 cores in a 1U platform, you can replace 2-5 older-gen servers with a single unit, giving you more opportunities for business, as there are usually two things at a premium in datacenters: space and power.
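Even on core count alone the consolidation math is easy to sketch (dual-socket on both sides is my assumption; power, memory and licensing deliberately left out):

Code:
# Core-count-only consolidation estimate: how many dual-socket 56-core
# Xeon servers one dual-socket EPYC 9965 node matches. Purely illustrative;
# it ignores per-core performance, memory capacity and licensing.
old_cores = 2 * 56     # dual-socket, 56 cores per socket
new_cores = 2 * 192    # dual-socket, 192 cores per socket

print(round(new_cores / old_cores, 1))  # ~3.4 older servers' worth of cores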
 
I could think of better things to do with it than encode videos
In a world where streaming video is a major industry, you underestimate just how important encoding video really is.

Could we please not discuss the garbage that is Cloudflare?
 
In a world where streaming video is a major industry, you underestimate just how important encoding video really is.


Could we please not discuss the garbage that is Cloudflare?
Anyone in that space that is pushing lots of video streaming or requires better scaling will be unlikely to be using CPU only for encode/transcode.
Hell, even Intel had a Xe product for that before the graphics cards were out - even zen c cores with extra density can't beat an ASIC or dedicated IP-ASIC-like blocks (when it comes to efficiency - obviously throw 100+ CPU cores at a problem and it can probably beat a lowly simple ASIC in just 1 task/stream).
 
Anyone in that space that is pushing lots of video streaming or requires better scaling will be unlikely to be using CPU only for encode/transcode.
Hell, even Intel had a Xe product for that before the graphics cards were out - even zen c cores with extra density can't beat an ASIC or dedicated IP-ASIC-like blocks (when it comes to efficiency - obviously throw 100+ CPU cores at a problem and it can probably beat a lowly simple ASIC in just 1 task/stream).
You're not understanding the context here. It doesn't matter if there's something better.
 
You're not understanding the context here. It doesn't matter if there's something better.
It seems all but a few are following different contexts of what products are good at what tasks in this thread.

I get your point and performance doing a certain task (even if it's not the ideal platform / candidate for it) will be an important metric in terms of rating performance, but that in itself defines the purpose that these things are used for.
Yeah, for sure MS/Amazon/Google/etc., will spin you up an instance of the crappiest ALU/FPU performing cloud VM to use for a set of tasks which doesn't utilise them properly - that's on you for choosing the wrong option. But...
In a world where streaming video is a major industry
... In that world, any 'major industry' participant will be looking to get the best bang per buck, and if that means specialist / specific hardware to achieve it, then awesome. You think Google, Amazon, etc., are using pure CPU power to do the video gruntwork on their offerings (like Twitch, YouTube, etc.)?
The irony is that having a server filled with several GPU cards to handle all that transcoding workload means the Zen c CPUs are actually a fine tradeoff, as they'll generally not be taxed with workloads they really aren't optimised for and can easily handle the shifting of data using less energy.
Somewhat akin to when crypto miners were connecting way more than 4 GPUs to standard motherboards, sometimes with the cheapest CPUs they could find - just keep them GPUs fed...
 
Last edited:
I don't get it. They literally have three product lines with differing amounts of cache per core depending on the workload. Obviously some tasks are less dependent on cache and are highly parallel, which is where this shines. If you need more cache, get one of the regular Zen 5 variants with 128 cores. Need even more cache? Go X3D.

How this is a design error is beyond me. They fit as many cores as they could because of customer requests. I'd wager a guess and say video encoding wasn't part of their consideration, nor were other cache sensitive apps.

These CPUs aren't meant for general tasks; whoever deploys them would only do so knowing that the workload they are optimizing for would be fastest on the EPYC 9965. And there are plenty of tasks where that is the case.
 