We found the Missing Performance: Zen 5 Tested with SMT Disabled

Introduction

We recently published our reviews of the AMD Ryzen 7 9700X and Ryzen 5 9600X processors, which are based on the Zen 5 architecture. These new chips offer performance on par with current offerings in their market segments, and they are highly efficient thanks to being built on TSMC's low-power 4 nm N4P process, so much so that AMD rates them at a 65 W TDP. AMD advertised a roughly 16% IPC increase for Zen 5 over the previous Zen 4 architecture. While we are fully aware that IPC gains don't translate linearly into gaming performance, we were surprised to find that the 9700X and 9600X are no more than 3% faster than their predecessors in our most CPU-bound 720p gaming benchmarks.

This gap between AMD's claims and the measured results took the tech press by surprise. Some reviewers initially wondered whether they had received bad samples, and there has been plenty of discussion and drama in online communities. Like everyone else, we ran several rounds of re-testing and experimented with various settings to better understand the architecture. In the course of that testing, we found some interesting core-scheduling behavior and set out to investigate it.



Games do not need dozens of CPU cores; they don't even need a dozen. Intel knows this, and gives its desktop processors no more than 8 performance cores (P-cores), which are the cores designed to handle gaming. This puts chips like the Ryzen 7 9700X in competition with rivals such as the Core i7-14700K, which has 8 P-cores but a total core count of 20. The remaining 12 E-cores don't come into the picture, because Thread Director tends to keep gaming workloads away from them.

During our testing, we observed that Windows 11 scheduled workloads on the 9700X in a way that tried to saturate a single core first, placing work on both of its logical threads. The placement also favored the CPPC2 "best" or "second-best" core (marked gold and silver in Ryzen Master), which on its own makes sense. However, if one highly demanding single-threaded workload is already running on a core, scheduling another demanding workload on that core's second thread lowers overall performance. It would be better to place the two on separate cores, where each has access to the full resources of its core. We therefore set out to see whether this is an SMT-specific problem.
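To make that argument concrete, here is a minimal Python sketch of a toy throughput model. It assumes that a lone thread on a core runs at a normalized throughput of 1.0, that two threads sharing one core through SMT get a combined 1.25 (the 25% SMT yield is an arbitrary assumption; real yields vary widely by workload), and that logical processors 2k and 2k+1 are siblings on physical core k. None of these numbers are measurements from our testing, and the function names are hypothetical.

```python
# Toy model, NOT measured data: a lone thread on a core runs at 1.0,
# two SMT siblings on one core share an assumed combined 1.0 + SMT_YIELD.
SMT_YIELD = 0.25  # assumption; real SMT gains vary widely by workload


def core_of(lp: int) -> int:
    # Assumption: logical processors 2k and 2k+1 are SMT siblings on core k
    return lp // 2


def per_thread_throughput(placement: dict[str, int]) -> dict[str, float]:
    """placement maps a thread name to the logical processor it runs on."""
    threads_per_core: dict[int, int] = {}
    for lp in placement.values():
        core = core_of(lp)
        threads_per_core[core] = threads_per_core.get(core, 0) + 1

    result = {}
    for name, lp in placement.items():
        if threads_per_core[core_of(lp)] == 1:
            result[name] = 1.0  # thread has the whole core to itself
        else:
            # two siblings split the core's combined SMT throughput evenly
            result[name] = (1.0 + SMT_YIELD) / 2
    return result


packed = per_thread_throughput({"game": 0, "helper": 1})  # both on core 0
spread = per_thread_throughput({"game": 0, "helper": 2})  # cores 0 and 1
print(packed)  # {'game': 0.625, 'helper': 0.625}
print(spread)  # {'game': 1.0, 'helper': 1.0}
```

Under these assumptions, packing both demanding threads onto one core leaves each running at well under full speed, while spreading them keeps both at 1.0 as long as idle cores are available.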

The motherboard's UEFI setup program lets you disable SMT, so we disabled it on the 9700X and ran a few quick gaming benchmarks from our test suite. The initial results made us curious, and as we tested more games, we noticed small but noteworthy performance gains that held up broadly, across all resolutions and in most game tests.

In this article, we'll present our findings and walk through a selection of them to help you understand the discovery, and we also include results from the full test suite in case you spot something we missed. We are not claiming that "SMT off" is the solution to Zen 5 performance. It isn't, because there are many scenarios where SMT improves performance with only a minimal increase in power. Still, the data suggests tangible benefits from improved scheduling, and we believe there must be ways to achieve them without giving up SMT entirely.
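One way to approximate "one thread per core" for a single program, without changing anything in the UEFI, is to restrict its CPU affinity to the first logical processor of each physical core. The minimal sketch below computes such an affinity mask. It assumes that SMT siblings are numbered 2k and 2k+1, which you should verify against your own system's topology; the function name is hypothetical, and we did not test this approach for this article.

```python
def one_thread_per_core_mask(physical_cores: int) -> int:
    """Affinity bitmask selecting one logical processor per physical core.

    Assumption: logical processors 2k and 2k+1 are SMT siblings on core k,
    so setting every even bit picks exactly one thread from each core.
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (2 * core)  # first logical processor of this core
    return mask


mask = one_thread_per_core_mask(8)  # 8-core part such as the Ryzen 7 9700X
print(f"{mask:X}")  # 5555
```

On Windows, a hexadecimal mask like this can be passed to the command prompt's `start /affinity` switch. Note that this is not equivalent to disabling SMT: other software can still run on the sibling threads.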

What's SMT?

Let's bring you up to speed on what simultaneous multithreading (SMT) is. Introduced to the PC by Intel in the early 2000s under the Hyper-Threading name, SMT is a way to put a CPU core's idle hardware resources to use by exposing the core to the OS as two logical processors, so that two threads can run on it at once. AMD implemented SMT starting with its first generation of Ryzen in 2017. Both companies have published developer documentation to help software developers optimize for SMT parallelism, and both work actively with Microsoft to improve how the scheduler handles different workloads. This is no simple task: you have to consider not only performance, but also power usage and efficiency, and ideally you'd want to rotate workloads across cores to spread out the heat.

Proper scheduling plays a crucial role in the processor's power management, because which cores are loaded determines the boost frequencies they can reach. By default, AMD selects the 2 best cores of the 8-core CCD for their ability to sustain the highest frequencies, and marks them as "preferred cores" under the ACPI CPPC (Collaborative Processor Performance Control) standard. A gaming workload should saturate both of these preferred cores, with spillover work extending to more cores. The second thread (logical processor) isn't free in terms of power: it has its own power overhead, on top of the energy spent using the core's hardware resources to execute it. On the other hand, confining threads to as few cores as possible lets the processor power down the idle cores to save energy. There are several theories about what could be happening here, which we will dive into in this article as we present our findings.
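The two placement strategies discussed here can be contrasted in a short Python sketch. The per-core ranking values are made up, the function names are hypothetical, and this is not how the Windows scheduler is implemented. "Pack" fills both threads of the best-ranked core before moving on, resembling the behavior we observed; "spread" gives each core one thread in ranking order before touching any SMT sibling. As before, it assumes logical processors 2k and 2k+1 are siblings on core k.

```python
def cores_by_rank(perf: list[int]) -> list[int]:
    # Higher value = better core; ties keep their original core order
    return sorted(range(len(perf)), key=lambda core: -perf[core])


def pack_first(perf: list[int]) -> list[int]:
    """Fill both logical processors of each core, best core first."""
    order = []
    for core in cores_by_rank(perf):
        order += [2 * core, 2 * core + 1]  # assumed sibling numbering
    return order


def spread_first(perf: list[int]) -> list[int]:
    """One logical processor per core in rank order, then the siblings."""
    ranked = cores_by_rank(perf)
    return [2 * c for c in ranked] + [2 * c + 1 for c in ranked]


# Hypothetical ranking values for an 8-core CCD (NOT real CPPC data):
# core 4 is the "gold" core, core 1 the "silver" core.
perf = [5, 7, 3, 4, 8, 2, 1, 0]
print(pack_first(perf)[:4])    # [8, 9, 2, 3]
print(spread_first(perf)[:4])  # [8, 2, 0, 6]
```

With two demanding threads, the pack order puts both on the gold core, while the spread order puts one each on the gold and silver cores, which is the placement we argued for above.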
Sep 19th, 2024 14:01 EDT