Thursday, May 8th 2025

Hygon Prepares 128-Core, 512-Threaded x86 CPU with Four-Way SMT and AVX-512 Support

Chinese server CPU maker Hygon, which owns a Zen core IP from AMD, has published a roadmap for the C86-5G, its most powerful server processor to date, featuring up to 128 cores and an astonishing 512 threads. Thanks to a complete microarchitectural redesign, the new chip delivers more than 17 percent higher instructions per cycle (IPC) than its predecessor. It also supports the AVX-512 vector instruction set and four-way simultaneous multithreading, making it a strong contender for highly parallel workloads. Sixteen channels of DDR5-5600 memory feed data-intensive tasks, while CXL 2.0 interconnect support enables seamless scaling across multiple sockets. Built on an undisclosed semiconductor node, the C86-5G includes advanced power management and a hardened security engine. With 128 lanes of PCIe 5.0, it offers ample bandwidth for accelerators, NVMe storage, and high-speed networking. Hygon positions this flagship CPU as ideal for artificial intelligence training clusters, large-scale analytics platforms, and virtualized enterprise environments.
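For readers curious how a 4-way SMT part presents itself to an operating system: on Linux, each core's hardware threads appear in a sysfs sibling list (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`), which can be parsed in a few lines. The sample string below is hypothetical, assuming the 512 logical CPUs enumerate each physical core's four threads together:

```python
# Parse a sysfs thread_siblings_list entry to find a core's SMT width.
# The sample input is a hypothetical layout for a 128-core, 4-way SMT CPU;
# the file format itself (comma-separated entries, optional ranges) is standard.

def smt_width(siblings_list: str) -> int:
    """Count logical CPUs in a sibling list like '0,128,256,384' or '0-3'."""
    count = 0
    for part in siblings_list.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            count += hi - lo + 1
        else:
            count += 1
    return count

print(smt_width("0,128,256,384"))  # one core's four hardware threads -> 4
```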

The C86-5G is the culmination of five years of steady development. The journey began with the C86-1G, an AMD-licensed design that served as a testbed for domestic engineers. It offered up to 32 cores, 64 threads, eight channels of DDR4-2666 memory, and 128 lanes of PCIe 3.0. Its goal was to absorb proven technology and build local know-how. Next came the C86-2G, which kept the same core count but introduced a revamped floating-point unit, 21 custom security instructions, and hardware-accelerated features for memory encryption, virtualization, and trusted computing. This model marked Hygon's first real step into independent research and development. With the C86-3G, Hygon rolled out a fully homegrown CPU core and system-on-chip framework. Memory support increased to DDR4-3200, I/O doubled to PCIe 4.0, and on-die networking included four 10 GbE and eight 1 GbE ports. The C86-4G raised the bar further by doubling compute density to 64 cores and 128 threads, boosting IPC by around 15 percent and adding 12-channel DDR5-4800 memory plus 128 lanes of PCIe 5.0. Socket options expanded to dual and quad configurations. Now, with the C86-5G, Hygon has shown it can compete head-to-head with global server CPU leaders, reinforcing confidence in China's growing capabilities in high-performance computing.
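As a rough sanity check on the generational memory figures above, peak theoretical bandwidth follows from channels × transfer rate × bytes per transfer (assuming the standard 64-bit effective channel width for DDR4/DDR5):

```python
def peak_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Peak theoretical memory bandwidth in GB/s.

    channels * MT/s * bytes per transfer gives MB/s; divide by 1000 for GB/s.
    """
    return channels * mts * bytes_per_transfer / 1000

# C86-5G: 16 channels of DDR5-5600
print(peak_bandwidth_gbs(16, 5600))  # 716.8 GB/s
# C86-4G: 12 channels of DDR5-4800
print(peak_bandwidth_gbs(12, 4800))  # 460.8 GB/s
# C86-1G: 8 channels of DDR4-2666
print(peak_bandwidth_gbs(8, 2666))   # 170.624 GB/s
```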
Source: via HXL on X
29 Comments on Hygon Prepares 128-Core, 512-Threaded x86 CPU with Four-Way SMT and AVX-512 Support

#1
Shrek
Why 4 threads per core? Two threads keeps the core almost fully busy, so more threads gains little to nothing.
Posted on Reply
#2
stickleback123
ShrekWhy 4 threads per core? Two threads keeps the core almost fully busy, so more threads gains little to nothing.
I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
Posted on Reply
#3
AleksandarK
News Editor
ShrekWhy 4 threads per core? Two threads keeps the core almost fully busy, so more threads gains little to nothing.
IIRC IBM Power has 4 SMT, very specific scientific workloads. Think nuclear decomposition.
Posted on Reply
#4
stickleback123
AleksandarKIIRC IBM Power has 4 SMT, very specific scientific workloads. Think nuclear decomposition.
Power8 has 8 way SMT!
Posted on Reply
#5
AleksandarK
News Editor
stickleback123Power8 has 8 way SMT!
I haven't checked POWER in so long, I forgot the specifics... sad :(
Posted on Reply
#6
TheinsanegamerN
stickleback123I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
Oh yes, because hundreds of highly trained and qualified microprocessor engineers NEVER push something that doesn't work right *cough cough* 13th gen Intel *cough**cough* AMD Bulldozer *cough*.
Posted on Reply
#7
MxPhenom 216
ASIC Engineer
Curious to know what the die size is on this thing.
Posted on Reply
#8
ncrs
Chinese server CPU maker Hygon, which owns an x86 CPU license from AMD
What is the source for this statement? As far as I know AMD is not capable of sub-licensing x86 without Intel's approval, and that's not what happened with Hygon.
AnandTech's analysis from 2020:

AMD Does Due Diligence

Simply stating ‘AMD sublicensed the IP of one of its x86 designs’ sounds a bit farfetched on most days of the week. If either AMD or Intel believed that the opportunity to let others sell its CPU designs was profitable, how come it took until 2015/2016 to ever come to fruition? Part of this story covers that while there was clearly some money in it for AMD here, it didn’t fall foul of any Intel-AMD licensing agreements. And most importantly, it didn’t contravene any US laws regarding the export of high-performance computing intellectual property.

This last point is important. The US government gives every CPU that comes out of Intel, AMD, and others, a value based on its performance. This is some combination of FLOPs and power, and those that surpass a specific threshold are deemed too powerful to be sold in certain markets. This includes semi-custom processors, where AMD/Intel fiddle with the core count/frequency and provide off-roadmap parts.

AMD at the time made the following statement:
Starting in 2015, AMD diligently and proactively briefed the Department of Defense, the Department of Commerce and multiple other agencies within the U.S. Government before entering into the joint ventures. AMD received no objections whatsoever from any agency to the formation of the joint ventures or to the transfer of technology – technology which was of lower performance than other commercially available processors. In fact, prior to the formation of the joint ventures and the transfer of technology, the Department of Commerce notified AMD that the technology proposed was not restricted or otherwise prohibited from being transferred. Given this clear feedback, AMD moved ahead with the joint ventures.
AMD had contacted the DoD and DoC, as well as all others, and had been given the green light. The new microarchitecture was deemed of low enough performance to not hit any of the export bans. AMD was also given crystal clear confirmation that the ‘technology proposed was not restricted or otherwise prohibited from being transferred’, which is a rather stark statement. At this point it should be clear that AMD may have submitted a modified version of its IP to the relevant US departments, rather than the microarchitecture we saw in the Ryzen 1000-series. This is part of what this review is about.
Posted on Reply
#9
AleksandarK
News Editor
ncrsWhat is the source for this statement? As far as I know AMD is not capable of sub-licensing x86 without Intel's approval, and that's not what happened with Hygon.
AnandTech's analysis from 2020:
My bad, fixed
Posted on Reply
#10
cal5582
stickleback123I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
coming from the country that invented tofu-dreg construction.... i doubt it.
Posted on Reply
#11
efikkan
"21 custom security instructions"
I do wonder what those entail :(
ShrekWhy 4 threads per core? Two threads keeps the core almost fully busy, so more threads gains little to nothing.
SMT is a relic of the past, and stopped making sense for user-interactive workloads after quad cores, but will stick around for a while in the server space, partly due to marketing reasons, but also because there are certain server workloads where it sort-of "makes sense", but that rationale is still shrinking. This is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency). Remember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).

While modern x86 microarchitectures from Intel and AMD aren't anywhere close to saturating the CPU's resources, their continuing advancement has made SMT less and less useful over time. The fewer idle cycles there are, the less "free performance" can be extracted through SMT, which is probably what you're thinking about.

Meanwhile, Intel's upcoming Diamond Rapids and hopefully Nova Lake will introduce APX, which according to their documentation should bring a significant uplift in throughput.
stickleback123I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
They probably have extracted the performance they could the easiest way within their time-frame and constraints, and the end result is a CPU with lots of resources on the execution side, but with a very weak front-end to feed it.
It could also be that their SMT implementation works differently from Intel's and AMD's, e.g. executing two of four threads intermixed (where Intel/AMD switch between two threads). If this happens to be the case, the saturation for each thread would be dreadful.

For instance PPC with its 8-way SMT is(was?) popular for certain java workloads, which are so inefficient that they barely execute at all. :p (more like a traffic jam…)
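The stall-bound case described above can be put into a toy model: if one thread keeps the execution units busy only a fraction p of cycles (the rest lost to cache misses and mispredictions), then N independent SMT threads keep the core busy roughly 1-(1-p)^N of the time. This ignores contention over caches and front-end resources, so treat it as an upper bound, not a prediction:

```python
def smt_utilization(p: float, n: int) -> float:
    """Approximate core utilization with n SMT threads, each busy a fraction p.

    Assumes threads stall independently and ignores shared-resource contention,
    so this is an optimistic upper bound.
    """
    return 1 - (1 - p) ** n

# A thread stalled 75% of the time barely uses the core alone...
print(round(smt_utilization(0.25, 1), 3))  # 0.25
# ...but four such threads can keep it busy roughly 68% of cycles.
print(round(smt_utilization(0.25, 4), 3))  # 0.684
# A well-fed thread (p = 0.8) leaves little for a second thread to recover.
print(round(smt_utilization(0.80, 2), 3))  # 0.96
```

This also illustrates the diminishing-returns argument: the better a single thread saturates the core, the smaller the remaining gap SMT can fill.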
Posted on Reply
#12
igormp
Long time since a new CPU came with SMT4 or higher, cool to see.
Let's see how it performs in practice.
efikkanRemember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).
efikkanIt could also be that their SMT implementation works different from Intel and AMD, e.g. executing two of four threads intermixed (where Intel/AMD switches between two threads). If this happens to be the case, the saturation for each thread would be dreadful.
Going for SMT makes your front-end way simpler as well, and allows you to do some fancy strategies to increase IPC and maximize EU utilization for some given scenarios.
Zen 5, as an example, has 2x 4-wide decoders, which are pretty bog-standard to implement (compared to Intel's 6-wide and larger implementations). A single thread will end up bottlenecked by it, and is not able to make use of both decoders, but with SMT it's possible to basically double up the IPC.

Intel has that fancy 3x3-wide cluster that a single thread can use, but those are used in the E-cores, which lack a µop-cache.

Given how that Hygon CPU is meant for servers and not your usual desktop use-case, I believe it does make more sense to go with SMT, specially for IO bound workloads that basically fit into the description you gave:
efikkanThis is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency).
Many server-ish workloads can pretty much be summarized to that (specially in web-related stuff) and SMT on Zen CPUs gives a really significant boost to throughput, and even latency in some scenarios.
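The IO-bound pattern described here is easy to demonstrate at the software level: threads that spend most of their time waiting overlap almost perfectly, which is the same behavior SMT exploits in hardware when a thread stalls. A sketch using OS threads as a stand-in:

```python
import threading
import time

def fake_request(results: list, i: int) -> None:
    # Simulate a request that is almost entirely IO wait (e.g. a DB round trip).
    time.sleep(0.1)
    results.append(i)

results: list = []
start = time.perf_counter()
threads = [threading.Thread(target=fake_request, args=(results, i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Eight 0.1 s waits overlap into roughly 0.1 s of wall time, not 0.8 s.
print(f"{len(results)} requests in {elapsed:.2f}s")
```

The same logic applies one level down: while one hardware thread waits on a cache miss, its SMT siblings can issue work, so throughput-oriented server workloads see the largest gains.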
Posted on Reply
#13
Blazko79
Unreal 6 will be multithreaded so we know how good software actually is.
Posted on Reply
#14
ncrs
efikkan"21 custom security instructions"
I do wonder what those entails :(
It's most likely Chinese crypto instructions such as SM3 and SM4, which are already supported by some RISC-V and ARM cores as well as Intel Arrow/Lunar Lake.
efikkanSMT is a relic of the past, and stopped making sense for user-interactive workloads after quad cores, but will stick around for a while in the server space, partly due to marketing reasons, but also because there are certain server workloads where it sort-of "makes sense", but that rationale is still shrinking. This is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency). Remember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).

While modern x86 microarchitectures from Intel and AMD aren't anywhere close to saturating the CPU resources, their continuing advancement have made SMT less and less useful over time. So the less idle cycles there are, the less "free performance" can be extracted through SMT, which is probably what you're thinking about.
AMD doesn't agree since they built Zen 5 specifically for SMT. It has dual 4-way decoders with each dedicated to one thread. NVIDIA doesn't agree since their next ARM Vera CPU will feature SMT. Intel disagrees since their workstation and server CPUs based on P-cores will keep including SMT. It's only E-core designs that won't.
It's still easy to saturate a modern x86 core with an integer load, for example the 7-zip benchmark scales to almost 100% on both AMD and Intel SMT. Floating point is a different story, but still possible to extract tangible benefits especially on modern implementations (Zen 4+, Alder Lake+).
efikkanMeanwhile, Intel's upcoming Diamond Rapids and hopefully Nova Lake will introduce APX, which according to their documentation should bring a significant uplift in throughput.
APX is solving different issues, chiefly of register pressure. It's not like APX server P-core CPUs will not feature SMT or at least I haven't read anything that would suggest it.
efikkanThey probably have extracted the performance they could the easiest way within their time-frame and constraints, and the end result is a CPU with lots of resources on the execution side, but with a very weak front-end to feed it.
It could also be that their SMT implementation works different from Intel and AMD, e.g. executing two of four threads intermixed (where Intel/AMD switches between two threads). If this happens to be the case, the saturation for each thread would be dreadful.
I'm not sure about "Intel/AMD switches between two threads" when both are executing at the same time inside the core, and in the case of AMD Zen 5 are even being decoded at the same time. Intel also has an 8-wide decoder which supposedly can be split. I haven't seen any confirmation that it happens for SMT, but I suspect it does.
efikkanFor instance PPC with its 8-way SMT is(was?) popular for certain java workloads, which are so inefficient that they barely execute at all. :p (more like a traffic jam…)
It was more for the Oracle database side where operations were simple, but had to be kept "intact" (without context switching) in order to optimize throughput while maintaining latency. POWER 8 cores were also heavily overbuilt compared to modern x86 - basically two full cores in one with shared caches, that allowed higher order SMT. I'd say the closest in design for x86 would be the infamous Bulldozer cores, but POWER went further and didn't share the FP unit.
Coincidentally Oracle's own SPARC CPUs also supported up to SMT8.
Posted on Reply
#15
_roman_
For those servers that still run on legacy x86.
Posted on Reply
#16
efikkan
igormpZen 5, as an example, has 2x 4-wide decoders…
Zen 5 implements two-ahead branch prediction, in an effort to reduce the cost of mispredictions by having the alternative branch ready to be executed. Such improvements are just another example of reducing idle clock cycles, which in return means gains from SMT will be reduced.

I haven't seen any evidence of significant gains from this yet, but with refinement and combined with APX it has some great (theoretical) potential.
igormpGoing for SMT makes your front-end way simpler as well, and allows you to do some fancy strategies to increase IPC and maximize EU utilization for some given scenarios.<snip>
A single thread will end up bottlenecked by it, and is not able to make use of both decoders, but with SMT it's possible to basically double up the IPC.
Just to be clear, SMT the way Intel and AMD implements it doesn't improve IPC at all. It just tries to keep the core fed, as if it was one single thread saturating the core.
ncrsAPX is solving different issues, chiefly of register pressure.
Actually not, the benefits from less register shuffling is just an added bonus, and a rather minimal one to be honest.
APX is about maximizing the efficiency of the branch predictor to saturate the CPU which is very clearly explained in the official documentation:

The performance features introduced so far will have a limited impact on workloads that suffer from a large number of conditional branch mispredictions. As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates the performance of such workloads. Branch predictor improvements can mitigate this only to a limited extent as data-dependent branches are fundamentally hard to predict.

To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intel® Pentium® Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by today’s compilers, but they are too limited for the broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions).

Intel APX adds conditional forms of load, store, and compare/test instructions and adds an option for the compiler to suppress the status flag writes of common instructions. These enhancements expand the applicability of if-conversion to much larger code regions, cutting down on the number of branches that may incur misprediction penalties.

So as you can clearly see, this is very much about saturating the CPU.

How successful it will be remains to be seen. This is pretty much in line with what myself and other programmers have requested for many years; if anything, I'm wondering if it's enough.
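The if-conversion the APX documentation describes can be illustrated in shape, if not in machine code. Real if-conversion happens in the compiler, which emits CMOV (or, with APX, conditional loads/stores) instead of a branch; this sketch only mirrors the transformation, computing the predicate as a value and selecting arithmetically rather than branching:

```python
# If-conversion replaces a data-dependent branch with conditional operations.
# The branchless form below mimics what a compiler does when it emits
# CMOV-style selects instead of a hard-to-predict branch.

def clamp_branchy(x: int, limit: int) -> int:
    if x > limit:          # data-dependent branch: unpredictable if x is random
        return limit
    return x

def clamp_branchless(x: int, limit: int) -> int:
    over = x > limit                      # predicate computed as a value (0 or 1)
    return limit * over + x * (1 - over)  # select without branching, CMOV-style

for x in (3, 7, 10):
    assert clamp_branchy(x, 5) == clamp_branchless(x, 5)
print(clamp_branchless(10, 5))  # 5
```

The point of APX's conditional loads/stores is to let compilers apply this transformation to much larger code regions than CMOV alone allows, trading a misprediction penalty for a little extra straight-line work.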
Posted on Reply
#17
R-T-B
ShrekWhy 4 threads per core? Two threads keeps the core almost fully busy, so more threads gains little to nothing.
Not true for highly threaded workloads, otherwise SPARC would never have existed.
Posted on Reply
#18
remixedcat
stickleback123I think it's possible, not certain but possible, that the team of hundreds of highly trained and qualified microprocessor engineers might know something about this? Other high end CPUs have used more than 2 way SMT in the past.

They'll have simulated this every which way to Sunday.
4-way SMT would be very bad for music production, as it needs higher clock speeds rather than moar coars. VSTs, or virtual instruments, are fully emulated pieces of music hardware done in software that emulate all the aspects of an actual instrument, and they are very CPU-intensive and need very high clock speeds and IPC. If DSP usage maxes out you get dropouts and bad stuttering. Music production requires realtime performance and very low DPC latency as well. I know these 4-way SMT CPUs are going to do badly for music production! Each core will work too hard and there will be quadruple-digit latency!! In some cases people even disable hyperthreading to get better performance while making and performing music!

I have a few systems that are only 4c8t that do better with certain tracks than one of my 8c16t systems due to higher clock speeds and lower DPC. I have some synth patches I made that will bring any CPU to its knees.
Posted on Reply
#19
igormp
efikkanZen 5 implements two-ahead branch prediction, in an effort to reduce the cost of mispredictions by having the alternative branch ready to be executed. Such improvements are just another example of reducing idle clock cycles which in return means gains from SMT will be reduced.
Yes, but those are not mutually exclusive. There are still considerable gains from SMT within Zen 5 nonetheless.
efikkanJust to be clear, SMT the way Intel and AMD implements it doesn't improve IPC at all. It just tries to keep the core fed, as if it was one single thread saturating the core.
It does improve IPC in absolute terms in practice, given that a single thread is not able to effectively saturate the core. As an example, see this micro-benchmark from chips and cheese:
chipsandcheese.com/p/amds-ryzen-9950x-zen-5-on-desktop

You can argue that it's a workaround for a front-end bottleneck (which I'd agree with), but that doesn't change the end results.
Posted on Reply
#20
remixedcat
igormpIt does improve IPC in absolute terms in practice, given that a single thread is not able to effectively saturate the core. As an example, see this micro-benchmark from chips and cheese:
chipsandcheese.com/p/amds-ryzen-9950x-zen-5-on-desktop
Not with music production. A VST runs everything on one thread, and when it pushes hard... see my above post...
Posted on Reply
#21
ncrs
efikkanZen 5 implements two-ahead branch prediction, in an effort to reduce the cost of mispredictions by having the alternative branch ready to be executed. Such improvements are just another example of reducing idle clock cycles which in return means gains from SMT will be reduced.

I haven't seen any evidence of significant gains from this yet, but with refinement and combined with APX it has some great (theoretical) potential.
Improvements to the branch prediction affect both SMT threads since both raw decoding and branch prediction+opcache are active at the same time in Zen 5:
Both the fetch+decode and op cache pipelines can be active at the same time, and both feed into the in-order micro-op queue.
(source - AMD via Chips and Cheese)
efikkanJust to be clear, SMT the way Intel and AMD implements it doesn't improve IPC at all. It just tries to keep the core fed, as if it was one single thread saturating the core.
No, SMT does increase IPC, and in case of Zen 5 it doubles it when Op Cache runs out as expected from the decoder design:

(source - Chips and Cheese)
Even if the Op Cache is disabled:

(source - Chips and Cheese)
efikkanActually not, the benefits from less register shuffling is just an added bonus, and a rather minimal one to be honest.
APX is about maximizing the efficiency of the branch predictor to saturate the CPU which is very clearly explained in the official documentation:

The performance features introduced so far will have a limited impact on workloads that suffer from a large number of conditional branch mispredictions. As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates the performance of such workloads. Branch predictor improvements can mitigate this only to a limited extent as data-dependent branches are fundamentally hard to predict.

To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intel® Pentium® Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by today’s compilers, but they are too limited for the broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions).

Intel APX adds conditional forms of load, store, and compare/test instructions and adds an option for the compiler to suppress the status flag writes of common instructions. These enhancements expand the applicability of if-conversion to much larger code regions, cutting down on the number of branches that may incur misprediction penalties.


So as you can clearly see, this is very much about saturating the CPU.

How successful it will be remains to be seen. This is pretty much in line with what myself and other programmers have requested for many years; if anything, I'm wondering if it's enough.
What you quoted affects one type of workload, and doesn't invalidate SMT in any way. As I wrote before I haven't read anything that makes APX SMT-phobic ;)
Posted on Reply
#22
efikkan
igormpIt does improve IPC in absolute terms in practice, given that a single thread is not able to effectively saturate the core.
Absolutely not. It's a common misconception that IPC means performance per clock; it's the number of instructions the CPU is able to churn through. Whether there is one, two or more threads sharing a core's resources, the IPC remains constant. SMT does improve the saturation of the core for some workloads, but the total performance will only converge towards a single thread fully saturating the core, never above that. This should be basic knowledge about CPUs.
ncrsWhat you quoted affects one type of workload, and doesn't invalidate SMT in any way. As I wrote before I haven't read anything that makes APX SMT-phobic ;)
I never claimed APX was "SMT-phobic"; they will probably co-exist for a while. But each improvement in microarchitecture that better saturates the core results in fewer stalls, and therefore fewer idle "free" clock cycles for SMT to utilize. As you can clearly see in the quote from earlier about APX; "As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates the performance of such workloads.", it is very clearly about keeping the CPU saturated. The more saturated the core is from one thread, the less there is to gain from SMT; this is basic logical deduction, and it is why we've seen fewer and fewer cases where SMT is significantly beneficial as CPUs advance.

On top of that, given the intricate complexity of implementing SMT in the pipeline of modern CPUs, the resulting transistor "costs" and design constraints, and all the nasty security implications, it naturally comes to a point where the effort can be better spent creating a more efficient architecture without SMT. This is why Intel's client CPUs have already moved on, and others will eventually follow.
Posted on Reply
#23
chrcoluk
efikkan"21 custom security instructions"
I do wonder what those entails :(


SMT is a relic of the past, and stopped making sense for user-interactive workloads after quad cores, but will stick around for a while in the server space, partly due to marketing reasons, but also because there are certain server workloads where it sort-of "makes sense", but that rationale is still shrinking. This is limited to workloads where the core is stalled most of the time thanks to cache misses and mispredictions, each worker thread is async, and the only thing that matters is overall throughput (not latency). Remember, the 4 threads will compete over caches and front-end resources, so the effective throughput for a single thread for the intended workload would have to be pretty miserable in order to justify 4-way SMT (or even 8-way like with PPC).

While modern x86 microarchitectures from Intel and AMD aren't anywhere close to saturating the CPU resources, their continuing advancement have made SMT less and less useful over time. So the less idle cycles there are, the less "free performance" can be extracted through SMT, which is probably what you're thinking about.

Meanwhile, Intel's upcoming Diamond Rapids and hopefully Nova Lake will introduce APX, which according to their documentation should bring a significant uplift in throughput.


They probably have extracted the performance they could the easiest way within their time-frame and constraints, and the end result is a CPU with lots of resources on the execution side, but with a very weak front-end to feed it.
It could also be that their SMT implementation works different from Intel and AMD, e.g. executing two of four threads intermixed (where Intel/AMD switches between two threads). If this happens to be the case, the saturation for each thread would be dreadful.

For instance PPC with its 8-way SMT is(was?) popular for certain java workloads, which are so inefficient that they barely execute at all. :p (more like a traffic jam…)
I think you explained it well. An analogy would perhaps be a narrow corridor where you allow one person through at a time, back to back; then you decide to allow two side by side. More people get through overall, but it's a less pleasant experience in the cramped space.
Posted on Reply
#24
ncrs
efikkanAbsolutely not. It's a common misconception that IPC means performance per clock; it's the number of instructions the CPU is able to churn through. Whether there is one, two or more threads sharing a core's resources, the IPC remains constant. SMT does improve the saturation of the core for some workloads, but the total performance will only converge towards a single thread fully saturating the core, never above that. This should be basic knowledge about CPUs.
You are redefining what "IPC" means to suit your argument. I gave you detailed test results which you simply ignore. There's not much more I can do here.
efikkanThe more saturated the core is from one thread, the less gains there will be from SMT, this should be obvious and is basic logical deduction, and is why we've seen fewer and fewer cases where SMT is significantly beneficial as CPUs advances.
That's not what we've been seeing. SMT performance and efficiency in x86 has been increasing. Zen 5 is able to achieve more with it than for example Zen 2. Same for Intel P-cores - they scale way better than their early SMT implementations.
efikkanOn top of that, the intricate complexity of implementing SMT in the pipeline in modern CPUs, with the resulting transistor "costs" and design constraints, and all the nasty security implications, it naturally comes to a point where the efforts can be better spent by creating a more efficient architecture without SMT. This is why Intel's client CPUs have already moved on, and others will eventually follow.
Intel is not "moving on" from SMT in general. Their P-cores in server/workstation designs will keep using it. It's just their consumer designs that don't implement it. As I wrote before, even NVIDIA is introducing SMT into their next server ARM Vera CPUs.
From the linked CnC article when they discussed SMT with AMD:
The 2T point gets emphasis here. AMD is well aware that Intel is planning to leave SMT out of their upcoming Lunar Lake mobile processor. Zen 5 takes the opposite approach, maintaining SMT support even in mobile products like Strix Point. AMD found that SMT let them maintain maximum 1T performance while enjoying the higher throughput enabled by running two threads in a core for multithreaded workloads. They also found SMT gave them better power efficiency in those multithreaded loads, drawing a clear contrast with Intel’s strategy.
Posted on Reply
#25
efikkan
ncrsYou are redefining what "IPC" means to suit your argument. I gave you detailed test results which you simply ignore. There's not much more I can do here.
Not true. The definition of IPC has always been the same; instructions per clock for a CPU core. Facts are not subject to your opinion, and yet you keep twisting and diverting when confronted with the truth…

It's primarily the CPU vendors themselves at fault for creating confusion and turning "IPC" into a marketing gimmick. (Big tech YouTubers/websites also commonly misuse technical terms, and while many have been into tech for years, they still lack deep knowledge of CPU architectures, machine code and software design.) IPC and performance per clock may be very different, especially when you have different performance characteristics, or even benchmark with different feature levels or ISAs altogether. Take for instance one CPU running a test with AVX-512 and one with AVX2; the first will execute fewer instructions per clock yet deliver higher performance than the latter. Or compare Zen 2/3 to the Skylake family: Zen has more execution ports but a weaker front-end, resulting in some workloads performing significantly better on one or the other.

The same is by all indicators the case for this Hygon CPU too; it's far easier to achieve some performance by adding lots of execution ports first and then optimizing how to feed them later. And to some extent for Zen 5 too; increasing the ALUs from 4 to 6 didn't have a major impact across the board like "leakers" expected, but it will likely lead to gains when the front-end matures with Zen 6 and later revisions.
Posted on Reply