Hygon Prepares 128-Core, 512-Threaded x86 CPU with Four-Way SMT and AVX-512 Support

efikkan · May 10, 2025

ncrs said:
You are redefining what "IPC" means to suit your argument. I gave you detailed test results which you simply ignore. There's not much more I can do here.

Not true. The definition of IPC has always been the same; instructions per clock for a CPU core. Facts are not subject to your opinion, and yet you keep twisting and diverting when confronted with the truth…

It's primarily the CPU vendors themselves at fault for creating confusion and turning "IPC" into a marketing gimmick. (But big tech YouTubers/websites also commonly misuse technical terms, and many, despite having been into tech for years, still lack deep knowledge of CPU architectures, machine code and software design.) IPC and performance per clock may be very different, especially when CPUs have different performance characteristics, or when benchmarks use different feature levels or ISAs altogether. Take for instance one CPU running a test with AVX-512 and one with AVX2: the first will execute fewer instructions per clock yet have higher performance than the latter. Or compare Zen 2/3 to the Skylake family: Zen has more execution ports but a weaker front-end, resulting in some workloads performing significantly better on one or the other.

The same is by all indicators the case for this Hygon CPU too; it's by far easier to achieve some performance by adding lots of execution ports first, and then optimize how to feed them later. And to some extent for Zen 5 too; increasing ALUs 4->6 didn't have a major impact across the board like "leakers" expected, but it will likely lead to gains when the front-end matures with Zen 6 and later revisions.
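
To make the AVX-512 vs AVX2 point concrete, here is a minimal sketch with made-up numbers. The lane counts (8 and 16 32-bit floats per instruction) follow from the 256-bit and 512-bit vector widths; the sustained instruction rates and the element count are assumptions picked for illustration, not measurements of any real CPU.

```python
# Illustration only: sustained_ipc values and ELEMENTS are hypothetical.
ELEMENTS = 4_800_000  # 32-bit floats to process


def run(elements, lanes, sustained_ipc):
    """Model a pure vector loop: one instruction handles `lanes` elements."""
    instructions = elements / lanes
    cycles = instructions / sustained_ipc
    return {
        "instructions": instructions,
        "cycles": cycles,
        "ipc": instructions / cycles,              # instructions per clock
        "elements_per_clock": elements / cycles,   # performance per clock
    }


avx2 = run(ELEMENTS, lanes=8, sustained_ipc=2.0)     # assumed rate
avx512 = run(ELEMENTS, lanes=16, sustained_ipc=1.5)  # assumed rate

for name, r in (("AVX2", avx2), ("AVX-512", avx512)):
    print(f"{name:8s} IPC={r['ipc']:.2f}  elements/clock={r['elements_per_clock']:.1f}")
```

Under these assumptions the AVX-512 run has the lower IPC but finishes the same work in fewer cycles, which is why IPC and performance per clock are not interchangeable when the instruction mix differs.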

igormp · May 10, 2025

remixedcat said:
Not with music production. A VST runs everything on one thread, and when it pushes that thread hard... see my post above.

Okay, it doesn't benefit all applications, and there are many that actually get worse performance, but we're talking about the general case and not really solely about music production.

efikkan said:
Absolutely not, it's a common misconception that IPC is performance per clock, when it's not, it's the amount of instructions the CPU is able to churn through. Whether there is one, two or more threads sharing a core's resources, the IPC remains constant. SMT does improve the saturation of the core for some workloads, but the total performance will only converge towards that of a single thread fully saturating the core, never above that. This should be basic knowledge about CPUs.

IPC is instructions per clock, I don't see what you're trying to get at with a different definition.
If a core's EUs cannot be saturated by a single thread's instruction stream, and a trick such as SMT lets you saturate them so that more instructions are retired per clock, then IPC does go up, period.
As mentioned above, it seems like you're just dismissing factual benchmarks for no apparent reason.
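
A toy model of that argument, as a rough sketch: a core retires at most `ISSUE_WIDTH` instructions per clock, and each hardware thread offers a repeating pattern of "ready" instructions per cycle. The width, the patterns and the function name `core_ipc` are all hypothetical, and the model ignores real effects such as carrying unissued instructions over to the next cycle or cache contention.

```python
from itertools import cycle

ISSUE_WIDTH = 4  # hypothetical per-core limit on instructions retired per clock


def core_ipc(thread_patterns, cycles=1000, width=ISSUE_WIDTH):
    """Per-core IPC when each thread offers a repeating ready-instruction pattern."""
    streams = [cycle(pattern) for pattern in thread_patterns]
    retired = 0
    for _ in range(cycles):
        ready = sum(next(s) for s in streams)  # instructions offered this cycle
        retired += min(width, ready)           # the core cannot exceed its width
    return retired / cycles


stalling_a = [3, 0, 1, 0]  # a thread that often stalls
stalling_b = [0, 3, 0, 1]  # same behaviour, stalls on different cycles
dense = [4, 4, 4, 4]       # a thread that saturates the core by itself

print(core_ipc([stalling_a]))              # one stalling thread
print(core_ipc([stalling_a, stalling_b]))  # two stalling threads via SMT
print(core_ipc([dense]))                   # one saturating thread
print(core_ipc([dense, dense]))            # SMT cannot go past the width
```

In this model the per-core IPC doubles when a second stalling thread fills the idle slots, while a thread that already saturates the core gains nothing, so both observations in this thread can hold at once depending on the workload.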

ncrs · May 10, 2025

efikkan said:
Not true. The definition of IPC has always been the same; instructions per clock for a CPU core. Facts are not subject to your opinion, and yet you keep twisting and diverting when confronted with the truth…

It's primarily the CPU vendors themselves at fault for creating confusion and turning "IPC" into a marketing gimmick. (But big tech YouTubers/websites also commonly misuse technical terms, and many, despite having been into tech for years, still lack deep knowledge of CPU architectures, machine code and software design.) IPC and performance per clock may be very different, especially when CPUs have different performance characteristics, or when benchmarks use different feature levels or ISAs altogether. Take for instance one CPU running a test with AVX-512 and one with AVX2: the first will execute fewer instructions per clock yet have higher performance than the latter. Or compare Zen 2/3 to the Skylake family: Zen has more execution ports but a weaker front-end, resulting in some workloads performing significantly better on one or the other.

The same is by all indicators the case for this Hygon CPU too; it's by far easier to achieve some performance by adding lots of execution ports first, and then optimize how to feed them later. And to some extent for Zen 5 too; increasing ALUs 4->6 didn't have a major impact across the board like "leakers" expected, but it will likely lead to gains when the front-end matures with Zen 6 and later revisions.

A few posts back I pasted a benchmark in which Zen 5 with SMT doubles its instructions per cycle (which is the same as clock, just to be 100% sure of our definitions), not throughput, once the Op Cache is exhausted. That is the fact I'm using in my argument; what's yours? So far you've used statements of how you think things work (or should work), or quotes that are specifically about only certain classes of workloads.
You accuse me of different things while doing some of them yourself, ironically.

https://substack-post-media.s3.amazonaws.com/public/images/2659f108-5039-4dfc-ae47-8e4b8a8f9ba3_1140x530.png

Anyway, I think I'm done with this discussion. It made me run a few benchmarks and do a fair bit of reading, so thanks for that.

remixedcat · May 10, 2025

igormp said:
Okay, it doesn't benefit all applications, and there are many that actually get worse performance, but we're talking about the general case and not really solely about music production.

But it's the best example and the most demanding on a per-core basis.

igormp · May 10, 2025

remixedcat said:
But it's the best example and the most demanding on a per-core basis.

That's relative. "most demanding" according to what?
Int throughput? FP? Branching? Memory? It would be nice if you could bring up some metrics or profiling data pointing to one of those, so it's easy to compare with other scenarios.

I also don't think it's a "best example" of anything, given that it's not really a widely used task to begin with, at least compared to other use cases.
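
As a rough idea of what such data could look like, the sketch below turns raw counter totals into a few ratios that hint at where a workload is bound. The counter names, the function `derive_metrics` and every number are placeholders invented for illustration; real values would have to come from a profiler run on the actual workload.

```python
def derive_metrics(counters):
    """Turn raw event totals into comparable per-workload ratios."""
    return {
        "ipc": counters["instructions"] / counters["cycles"],
        "branch_miss_rate": counters["branch_misses"] / counters["branches"],
        "cache_miss_rate": counters["cache_misses"] / counters["cache_references"],
    }


# Placeholder totals, not measurements of any real plugin or CPU.
sample = {
    "instructions": 2_000_000,
    "cycles": 1_000_000,
    "branches": 400_000,
    "branch_misses": 8_000,
    "cache_references": 50_000,
    "cache_misses": 5_000,
}

metrics = derive_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing these ratios between, say, a VST host and a compile job would show whether "most demanding" means low IPC from memory stalls, heavy branching, or something else.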
