
Hygon Prepares 128-Core, 512-Thread x86 CPU with Four-Way SMT and AVX-512 Support

AleksandarK

News Editor
Chinese server CPU maker Hygon, which builds on Zen core IP licensed from AMD through a joint venture, has published a roadmap for the C86-5G, its most powerful server processor to date, featuring up to 128 cores and an astonishing 512 threads. Thanks to a complete microarchitectural redesign, the new chip delivers more than 17 percent higher instructions per cycle (IPC) than its predecessor. It also supports the AVX-512 vector instruction set and four-way simultaneous multithreading (SMT), making it a strong contender for highly parallel workloads. Sixteen channels of DDR5-5600 memory feed data-intensive tasks, while CXL 2.0 interconnect support enables seamless scaling across multiple sockets. Built on an undisclosed semiconductor node, the C86-5G includes advanced power management and a hardened security engine. With 128 lanes of PCIe 5.0, it offers ample bandwidth for accelerators, NVMe storage, and high-speed networking. Hygon positions this flagship CPU as ideal for artificial intelligence training clusters, large-scale analytics platforms, and virtualized enterprise environments.

The C86-5G is the culmination of five years of steady development. The journey began with the C86-1G, an AMD-licensed design that served as a testbed for domestic engineers. It offered up to 32 cores, 64 threads, eight channels of DDR4-2666 memory, and 128 lanes of PCIe 3.0. Its goal was to absorb proven technology and build local know-how. Next came the C86-2G, which kept the same core count but introduced a revamped floating-point unit, 21 custom security instructions, and hardware-accelerated features for memory encryption, virtualization, and trusted computing. This model marked Hygon's first real step into independent research and development. With the C86-3G, Hygon rolled out a fully homegrown CPU core and system-on-chip framework. Memory support increased to DDR4-3200, I/O doubled to PCIe 4.0, and on-die networking included four 10 GbE and eight 1 GbE ports. The C86-4G raised the bar further by doubling compute density to 64 cores and 128 threads, boosting IPC by around 15 percent and adding 12-channel DDR5-4800 memory plus 128 lanes of PCIe 5.0. Socket options expanded to dual and quad configurations. Now, with the C86-5G, Hygon aims to show it can compete head-to-head with global server CPU leaders, underscoring China's growing capabilities in high-performance computing.



View at TechPowerUp Main Site | Source
 
Why 4 threads per core? Two threads keep the core almost fully busy, so more threads gain little to nothing.
 
Why 4 threads per core? Two threads keep the core almost fully busy, so more threads gain little to nothing.
I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
 
Why 4 threads per core? Two threads keep the core almost fully busy, so more threads gain little to nothing.
IIRC IBM POWER has SMT4, for very specific scientific workloads. Think nuclear decomposition.
 
I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
Oh yes, because hundreds of highly trained and qualified microprocessor engineers NEVER push something that doesn't work right *cough cough* 13th-gen Intel *cough* *cough* AMD Bulldozer *cough*.
 
Curious to know what the die size is on this thing.
 
Chinese server CPU maker Hygon, which owns an x86 CPU license from AMD
What is the source for this statement? As far as I know AMD is not capable of sub-licensing x86 without Intel's approval, and that's not what happened with Hygon.
AnandTech's analysis from 2020:

AMD Does Due Diligence

Simply stating ‘AMD sublicensed the IP of one of its x86 designs’ sounds a bit farfetched on most days of the week. If either AMD or Intel believed that the opportunity to let others sell its CPU designs was profitable, how come it took until 2015/2016 to ever come to fruition? Part of this story covers that while there was clearly some money in it for AMD here, it didn’t fall foul of any Intel-AMD licensing agreements. And most importantly, it didn’t contravene any US laws regarding the export of high-performance computing intellectual property.

This last point is important. The US government gives every CPU that comes out of Intel, AMD, and others, a value based on its performance. This is some combination of FLOPs and power, and those that surpass a specific threshold are deemed too powerful to be sold in certain markets. This includes semi-custom processors, where AMD/Intel fiddle with the core count/frequency and provide off-roadmap parts.

AMD at the time made the following statement:

Starting in 2015, AMD diligently and proactively briefed the Department of Defense, the Department of Commerce and multiple other agencies within the U.S. Government before entering into the joint ventures. AMD received no objections whatsoever from any agency to the formation of the joint ventures or to the transfer of technology – technology which was of lower performance than other commercially available processors. In fact, prior to the formation of the joint ventures and the transfer of technology, the Department of Commerce notified AMD that the technology proposed was not restricted or otherwise prohibited from being transferred. Given this clear feedback, AMD moved ahead with the joint ventures.
AMD had contacted the DoD and DoC, as well as all others, and had been given the green light. The new microarchitecture was deemed of low enough performance to not hit any of the export bans. AMD was also given crystal clear confirmation that the ‘technology proposed was not restricted or otherwise prohibited from being transferred’, which is a rather stark statement. At this point it should be clear that AMD may have submitted a modified version of its IP to the relevant US departments, rather than the microarchitecture we saw in the Ryzen 1000-series. This is part of what this review is about.
 
I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
Coming from the country that invented tofu-dreg construction... I doubt it.
 
"21 custom security instructions"
I do wonder what those entail :(

Why 4 threads per core? Two threads keep the core almost fully busy, so more threads gain little to nothing.
SMT is a relic of the past, and stopped making sense for user-interactive workloads after quad cores, but will stick around for a while in the server space, partly due to marketing reasons, but also because there are certain server workloads where it sort-of "makes sense", but that rationale is still shrinking. This is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency). Remember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).
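To make the "stalled most of the time" case concrete, here's a minimal pointer-chasing sketch in C (sizes and iteration counts are arbitrary illustration, not a tuned benchmark). Every load depends on the previous one and almost always misses cache, so the core idles for most cycles, and that idle time is exactly what a second (or fourth) hardware thread can soak up:

#include <stdio.h>
#include <stdlib.h>

#define N (1u << 24)   /* ~16M nodes, far bigger than any cache */

int main(void) {
    size_t *order = malloc((size_t)N * sizeof *order);
    size_t *next  = malloc((size_t)N * sizeof *next);
    for (size_t i = 0; i < N; i++) order[i] = i;
    /* Fisher-Yates shuffle, then link the shuffled order into one big cycle */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < N; i++) next[order[i]] = order[(i + 1) % N];
    /* Each load depends on the previous one, so the core mostly waits on
       memory; running one such chain per SMT thread overlaps those stalls. */
    size_t p = 0;
    for (long k = 0; k < 100000000L; k++) p = next[p];
    printf("%zu\n", p);   /* keep the chain live */
    free(order); free(next);
    return 0;
}

Run one chain per logical CPU and aggregate throughput climbs, even though each individual chain gets slightly slower.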

While modern x86 microarchitectures from Intel and AMD aren't anywhere close to saturating the CPU resources, their continuing advancement has made SMT less and less useful over time. So the fewer idle cycles there are, the less "free performance" can be extracted through SMT, which is probably what you're thinking about.

Meanwhile, Intel's upcoming Diamond Rapids and hopefully Nova Lake will introduce APX, which according to their documentation should bring a significant uplift in throughput.

I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
They probably have extracted the performance they could the easiest way within their time-frame and constraints, and the end result is a CPU with lots of resources on the execution side, but with a very weak front-end to feed it.
It could also be that their SMT implementation works differently from Intel's and AMD's, e.g. executing two of four threads intermixed (where Intel/AMD switch between two threads). If this happens to be the case, the saturation for each thread would be dreadful.

For instance PPC with its 8-way SMT is(was?) popular for certain java workloads, which are so inefficient that they barely execute at all. :p (more like a traffic jam…)
 
Long time since a new CPU came with SMT4 or higher, cool to see.
Let's see how it performs in practice.

Remember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).
It could also be that their SMT implementation works differently from Intel's and AMD's, e.g. executing two of four threads intermixed (where Intel/AMD switch between two threads). If this happens to be the case, the saturation for each thread would be dreadful.
Going for SMT makes your front-end way simpler as well, and allows you to do some fancy strategies to increase IPC and maximize EU utilization for some given scenarios.
Zen 5, as an example, has 2x 4-wide decoders, which are pretty bog-standard to implement (compared to Intel's 6-wide and larger implementations). A single thread will end up bottlenecked by it, and is not able to make use of both decoders, but with SMT it's possible to basically double up the IPC.
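In round numbers, and assuming decode were the only limiter (it rarely is, hence "basically"):

\[
\text{1 thread: } \le 4 \text{ decoded ops/cycle}, \qquad \text{2 threads: } \le 2 \times 4 = 8 \text{ decoded ops/cycle.}
\]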

Intel has that fancy 3x3-wide decode cluster that a single thread can use, but those are used in the E-cores, which lack a µop cache.

Given how that Hygon CPU is meant for servers and not your usual desktop use-case, I believe it does make more sense to go with SMT, especially for IO-bound workloads that basically fit the description you gave:
This is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency).
Many server-ish workloads can pretty much be summarized like that (especially in web-related stuff), and SMT on Zen CPUs gives a really significant boost to throughput, and even latency in some scenarios.
 
"21 custom security instructions"
I do wonder what those entails :(
It's most likely Chinese crypto instructions as in SM3 and SM4 which are already supported by some RISC-V, ARM cores and Intel Arrow/Lunar Lake.
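If they are indeed SM-family instructions, the software story is already in place, since generic crypto APIs expose these algorithms today. Here's a minimal sketch using OpenSSL's EVP interface (OpenSSL 1.1.1+ provides EVP_sm4_cbc(); the key, IV, and plaintext below are dummy values), which could in principle be backed by such hardware instructions:

/* build: cc sm4demo.c -lcrypto */
#include <openssl/evp.h>
#include <stdio.h>

int main(void) {
    unsigned char key[16] = "0123456789abcdef";   /* dummy 128-bit key */
    unsigned char iv[16]  = "fedcba9876543210";   /* dummy IV */
    unsigned char in[16]  = "attack at dawn!!";   /* one 16-byte block */
    unsigned char out[32];                        /* room for padding */
    int len = 0, total = 0;

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_sm4_cbc(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, out, &len, in, sizeof in);
    total = len;
    EVP_EncryptFinal_ex(ctx, out + total, &len);
    total += len;
    EVP_CIPHER_CTX_free(ctx);

    for (int i = 0; i < total; i++) printf("%02x", out[i]);
    putchar('\n');
    return 0;
}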
SMT is a relic of the past, and stopped making sense for user-interactive workloads after quad cores, but will stick around for a while in the server space, partly due to marketing reasons, but also because there are certain server workloads where it sort-of "makes sense", but that rationale is still shrinking. This is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency). Remember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).

While modern x86 microarchitectures from Intel and AMD aren't anywhere close to saturating the CPU resources, their continuing advancement has made SMT less and less useful over time. So the fewer idle cycles there are, the less "free performance" can be extracted through SMT, which is probably what you're thinking about.
AMD doesn't agree since they built Zen 5 specifically for SMT. It has dual 4-way decoders with each dedicated to one thread. NVIDIA doesn't agree since their next ARM Vera CPU will feature SMT. Intel disagrees since their workstation and server CPUs based on P-cores will keep including SMT. It's only E-core designs that won't.
It's still easy to saturate a modern x86 core with an integer load, for example the 7-zip benchmark scales to almost 100% on both AMD and Intel SMT. Floating point is a different story, but still possible to extract tangible benefits especially on modern implementations (Zen 4+, Alder Lake+).
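For anyone who wants to measure this themselves, a rough Linux-only sketch (the sibling numbering is an assumption; logical CPUs 0 and 1 share a core on some topologies, 0 and N/2 on others, so check lscpu -e first): pin a dependent integer loop to one logical CPU, time it, then run it on both siblings of the same core and compare aggregate throughput.

/* build: cc -O2 -pthread smt_scaling.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdint.h>

static void *work(void *arg) {
    int cpu = *(int *)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    /* dependent integer chain, no memory traffic */
    uint64_t x = 1;
    for (long i = 0; i < 2000000000L; i++)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return (void *)(uintptr_t)x;   /* keep the result live */
}

int main(void) {
    int cpus[2] = {0, 1};   /* assumed SMT siblings -- verify with lscpu -e */
    pthread_t t[2];
    for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, work, &cpus[i]);
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    puts("done -- compare wall time of the 2-thread run vs. one thread");
    return 0;
}

If the two-thread run on siblings takes barely longer than the one-thread run, SMT is nearly doubling throughput for that instruction mix.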
Meanwhile, Intel's upcoming Diamond Rapids and hopefully Nova Lake will introduce APX, which according to their documentation should bring a significant uplift in throughput.
APX is solving different issues, chiefly register pressure. It's not like APX server P-core CPUs will not feature SMT, or at least I haven't read anything that would suggest it.
They probably have extracted the performance they could the easiest way within their time-frame and constraints, and the end result is a CPU with lots of resources on the execution side, but with a very weak front-end to feed it.
It could also be that their SMT implementation works differently from Intel's and AMD's, e.g. executing two of four threads intermixed (where Intel/AMD switch between two threads). If this happens to be the case, the saturation for each thread would be dreadful.
I'm not sure about "Intel/AMD switch between two threads" when both are executing at the same time inside the core, and in the case of AMD Zen 5 are even being decoded at the same time. Intel also has an 8-wide decoder which supposedly can be split. I haven't seen any confirmation that it happens for SMT, but I suspect it does.
For instance PPC with its 8-way SMT is(was?) popular for certain java workloads, which are so inefficient that they barely execute at all. :p (more like a traffic jam…)
It was more for the Oracle database side where operations were simple, but had to be kept "intact" (without context switching) in order to optimize throughput while maintaining latency. POWER8 cores were also heavily overbuilt compared to modern x86 - basically two full cores in one with shared caches, which allowed higher-order SMT. I'd say the closest x86 design would be the infamous Bulldozer cores, but POWER went further and didn't share the FP unit.
Coincidentally Oracle's own SPARC CPUs also supported up to SMT8.
 
Zen 5, as an example, has 2x 4-wide decoders…
Zen 5 implements two-ahead branch prediction, in an effort to reduce the cost of mispredictions by having the alternative branch ready to be executed. Such improvements are just another example of reducing idle clock cycles, which in return means gains from SMT will be reduced.

I haven't seen any evidence of significant gains from this yet, but with refinement and combined with APX it has some great (theoretical) potential.

Going for SMT makes your front-end way simpler as well, and allows you to do some fancy strategies to increase IPC and maximize EU utilization for some given scenarios.<snip>
A single thread will end up bottlenecked by it, and is not able to make use of both decoders, but with SMT it's possible to basically double up the IPC.
Just to be clear, SMT the way Intel and AMD implement it doesn't improve IPC at all. It just tries to keep the core fed, as if it were one single thread saturating the core.

APX is solving different issues, chiefly register pressure.
Actually not, the benefits from less register shuffling are just an added bonus, and a rather minimal one to be honest.
APX is about maximizing the efficiency of the branch predictor to saturate the CPU which is very clearly explained in the official documentation:

The performance features introduced so far will have a limited impact on workloads that suffer from a large number of conditional branch mispredictions. As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates the performance of such workloads. Branch predictor improvements can mitigate this only to a limited extent as data-dependent branches are fundamentally hard to predict.

To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intel® Pentium® Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by today’s compilers, but they are too limited for the broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions).

Intel APX adds conditional forms of load, store, and compare/test instructions and adds an option for the compiler to suppress the status flag writes of common instructions. These enhancements expand the applicability of if-conversion to much larger code regions, cutting down on the number of branches that may incur misprediction penalties.

So as you can clearly see, this is very much about saturating the CPU.

How successful it will be remains to be seen. This is pretty much in line with what myself and other programmers have requested for many years; if anything, I'm wondering if it's enough.
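For the non-compiler folks, the "if-conversion" being described looks like this in C. gcc and clang at -O2 already turn the second version into a branchless CMOV on x86, and APX extends the same transformation to conditional loads, stores, and compares (an illustrative sketch, not APX-specific code):

#include <stdio.h>
#include <stddef.h>

/* Branchy version: one data-dependent, hard-to-predict branch per element. */
long sum_pos_branchy(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0)            /* mispredicts often on random data */
            s += a[i];
    return s;
}

/* If-converted version: the compiler can emit CMOV instead of a branch,
   so there is nothing left to mispredict. */
long sum_pos_branchless(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += (a[i] > 0) ? a[i] : 0;
    return s;
}

int main(void) {
    long a[] = {3, -1, 4, -1, 5, -9, 2, 6};
    size_t n = sizeof a / sizeof a[0];
    printf("%ld %ld\n", sum_pos_branchy(a, n), sum_pos_branchless(a, n));
    return 0;
}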
 
Why 4 threads per core? Two threads keep the core almost fully busy, so more threads gain little to nothing.
Not true for highly threaded workloads, otherwise SPARC would never have existed.
 
I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
4-way SMT would be very bad for music production, as it requires higher speeds rather than moar coars. VSTs, or virtual instruments, are fully emulated pieces of music hardware done in software that emulate all the aspects of an actual instrument, and they are very CPU-intense and need very high clock speeds and IPC. If the DSP usage maxes out you get cutouts and bad stuttering. Music production requires realtime performance and very low DPC latency as well. I know these 4-way SMT CPUs are going to do badly for music production! Each core will work too hard and there will be quadruple-digit latency!! In some cases people even disable hyperthreading to get better performance while making and performing music!

I have a few systems that are only 4c8t that do better with certain tracks than one of my 8c16t systems, due to higher clock speeds and lower DPC. I have some synth patches I made that will bring any CPU to its knees.
 
Zen 5 implements two-ahead branch prediction, in an effort to reduce the cost of mispredictions by having the alternative branch ready to be executed. Such improvements are just another example of reducing idle clock cycles, which in return means gains from SMT will be reduced.
Yes, but those are not mutually exclusive. There are still considerable gains from SMT within Zen 5 nonetheless.

Just to be clear, SMT the way Intel and AMD implement it doesn't improve IPC at all. It just tries to keep the core fed, as if it were one single thread saturating the core.
It does improve IPC in absolute terms in practice, given that a single thread is not able to effectively saturate the core. As an example, see this micro-benchmark from Chips and Cheese:

You can argue that it's a workaround for a front-end bottleneck (which I'd agree with), but that doesn't change the end results.
 
It does improve IPC in absolute terms in practice, given that a single thread is not able to effectively saturate the core. As an example, see this micro-benchmark from Chips and Cheese:
Not with music production. A VST runs everything on one thread, and when it pushes it hard... see my above post...
 
Zen 5 implements two-ahead branch prediction, in an effort to reduce the cost of mispredictions by having the alternative branch ready to be executed. Such improvements are just another example of reducing idle clock cycles, which in return means gains from SMT will be reduced.

I haven't seen any evidence of significant gains from this yet, but with refinement and combined with APX it has some great (theoretical) potential.
Improvements to the branch prediction affect both SMT threads since both raw decoding and branch prediction+opcache are active at the same time in Zen 5:
Both the fetch+decode and op cache pipelines can be active at the same time, and both feed into the in-order micro-op queue.
(source - AMD via Chips and Cheese)
Just to be clear, SMT the way Intel and AMD implement it doesn't improve IPC at all. It just tries to keep the core fed, as if it were one single thread saturating the core.
No, SMT does increase IPC, and in the case of Zen 5 it doubles it when the op cache runs out, as expected from the decoder design:
https://substack-post-media.s3.amazonaws.com/public/images/2659f108-5039-4dfc-ae47-8e4b8a8f9ba3_1140x530.png

(source - Chips and Cheese)
Even if the Op Cache is disabled:
https://substack-post-media.s3.amazonaws.com/public/images/693afadc-4b79-47d2-a214-b30346371254_2171x1000.png

(source - Chips and Cheese)
Actually not, the benefits from less register shuffling are just an added bonus, and a rather minimal one to be honest.
APX is about maximizing the efficiency of the branch predictor to saturate the CPU which is very clearly explained in the official documentation:

The performance features introduced so far will have a limited impact on workloads that suffer from a large number of conditional branch mispredictions. As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates the performance of such workloads. Branch predictor improvements can mitigate this only to a limited extent as data-dependent branches are fundamentally hard to predict.

To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intel® Pentium® Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by today’s compilers, but they are too limited for the broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions).

Intel APX adds conditional forms of load, store, and compare/test instructions and adds an option for the compiler to suppress the status flag writes of common instructions. These enhancements expand the applicability of if-conversion to much larger code regions, cutting down on the number of branches that may incur misprediction penalties.


So as you can clearly see, this is very much about saturating the CPU.

How successful it will be remains to be seen. This is pretty much in line with what myself and other programmers have requested for many years; if anything, I'm wondering if it's enough.
What you quoted affects one type of workload, and doesn't invalidate SMT in any way. As I wrote before I haven't read anything that makes APX SMT-phobic ;)
 
It does improve IPC in absolute terms in practice, given that a single thread is not able to effectively saturate the core.
Absolutely not. It's a common misconception that IPC means performance per clock; it doesn't, it's the number of instructions the CPU is able to churn through per cycle. Whether there are one, two, or more threads sharing a core's resources, the core's IPC ceiling remains constant. SMT does improve the saturation of the core for some workloads, but the total performance will only converge towards a single thread fully saturating the core, never above that. This should be basic knowledge about CPUs.
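To put that claim in symbols (a simplified model that treats the core's issue width as the hard ceiling; I_t is the instructions retired by thread t over the same C cycles):

\[
\mathrm{IPC}_{\text{core}} = \frac{\sum_{t} I_t}{C} \;\le\; \mathrm{IPC}_{\text{1T, saturated}} \;\le\; W_{\text{issue}}
\]

SMT can push the sum toward the ceiling; it cannot lift the ceiling.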

What you quoted affects one type of workload, and doesn't invalidate SMT in any way. As I wrote before I haven't read anything that makes APX SMT-phobic ;)
I never claimed APX was "SMT-phobic"; the two will probably co-exist for a while. But every microarchitectural improvement that results in a better-saturated core means fewer stalls, and therefore fewer idle "free" clock cycles for SMT to utilize. As you can clearly see in the quote from earlier about APX ("As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates the performance of such workloads."), it is very clearly about keeping the CPU saturated. The more saturated the core is from one thread, the less there is to gain from SMT; this is basic logical deduction, and it is why we've seen fewer and fewer cases where SMT is significantly beneficial as CPUs advance.

On top of that, given the intricate complexity of implementing SMT in a modern CPU pipeline, the resulting transistor "costs" and design constraints, and all the nasty security implications, it naturally comes to a point where the effort is better spent on creating a more efficient architecture without SMT. This is why Intel's client CPUs have already moved on, and others will eventually follow.
 
"21 custom security instructions"
I do wonder what those entails :(


SMT is a relic of the past, and stopped making sense for user-interactive workloads after quad cores, but will stick around for a while in the server space, partly due to marketing reasons, but also because there are certain server workloads where it sort-of "makes sense", but that rationale is still shrinking. This is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency). Remember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).

While modern x86 microarchitectures from Intel and AMD aren't anywhere close to saturating the CPU resources, their continuing advancement has made SMT less and less useful over time. So the fewer idle cycles there are, the less "free performance" can be extracted through SMT, which is probably what you're thinking about.

Meanwhile, Intel's upcoming Diamond Rapids and hopefully Nova Lake will introduce APX, which according to their documentation should bring a significant uplift in throughput.


They probably have extracted the performance they could the easiest way within their time-frame and constraints, and the end result is a CPU with lots of resources on the execution side, but with a very weak front-end to feed it.
It could also be that their SMT implementation works differently from Intel's and AMD's, e.g. executing two of four threads intermixed (where Intel/AMD switch between two threads). If this happens to be the case, the saturation for each thread would be dreadful.

For instance PPC with its 8-way SMT is(was?) popular for certain java workloads, which are so inefficient that they barely execute at all. :p (more like a traffic jam…)

I think you explained it well. An analogy would perhaps be a narrow corridor where you allow one person through at a time, back to back; then you decide to allow two side by side. More people get through overall, but it's a less pleasant experience with the cramped space.
 
Absolutely not. It's a common misconception that IPC means performance per clock; it doesn't, it's the number of instructions the CPU is able to churn through per cycle. Whether there are one, two, or more threads sharing a core's resources, the core's IPC ceiling remains constant. SMT does improve the saturation of the core for some workloads, but the total performance will only converge towards a single thread fully saturating the core, never above that. This should be basic knowledge about CPUs.
You are redefining what "IPC" means to suit your argument. I gave you detailed test results which you simply ignore. There's not much more I can do here.
The more saturated the core is from one thread, the less there is to gain from SMT; this is basic logical deduction, and it is why we've seen fewer and fewer cases where SMT is significantly beneficial as CPUs advance.
That's not what we've been seeing. SMT performance and efficiency in x86 has been increasing. Zen 5 is able to achieve more with it than for example Zen 2. Same for Intel P-cores - they scale way better than their early SMT implementations.
On top of that, given the intricate complexity of implementing SMT in a modern CPU pipeline, the resulting transistor "costs" and design constraints, and all the nasty security implications, it naturally comes to a point where the effort is better spent on creating a more efficient architecture without SMT. This is why Intel's client CPUs have already moved on, and others will eventually follow.
Intel is not "moving on" from SMT in general. Their P-cores in server/workstation designs will keep using it. It's just their consumer designs that don't implement it. As I wrote before, even NVIDIA is introducing SMT into their next server ARM Vera CPUs.
From the linked CnC article when they discussed SMT with AMD:
The 2T point gets emphasis here. AMD is well aware that Intel is planning to leave SMT out of their upcoming Lunar Lake mobile processor. Zen 5 takes the opposite approach, maintaining SMT support even in mobile products like Strix Point. AMD found that SMT let them maintain maximum 1T performance while enjoying the higher throughput enabled by running two threads in a core for multithreaded workloads. They also found SMT gave them better power efficiency in those multithreaded loads, drawing a clear contrast with Intel’s strategy.
 