1usmus Power Plan for AMD Ryzen - New Developments




In early November, we brought you the 1usmus Custom Power Plan for AMD Ryzen. This custom power plan helps you improve system performance by teaching Windows how to correctly use the Collaborative Processor Performance Control (CPPC) feature on Zen 2. The article received great feedback from readers and experts. AMD also launched its own investigation into the processor behavior we discussed in our article, and found that many of the performance and boosting oddities trace back to the Windows scheduler. In the meantime, Microsoft fortunately released the "KB4524570" update for Windows 10, which provides many under-the-hood improvements to the Windows scheduler, helping with boost behavior and frequencies.

The second part of the problem is at the BIOS level, specifically in the CPPC and C-state settings, which should default to "Enabled". Commenting on my Power Plan in an interview with PC World, AMD's head of technical marketing for processors, Robert Hallock, confirmed our findings about CPPC and C-states, stating that those features are enabled by "default", which unfortunately doesn't reflect reality.

Apart from enthusiasts and pro users, most people don't recheck their BIOS settings and may suffer lower boost performance simply because some motherboard manufacturers don't seem to know how important these settings are for CPU boost behavior. In this article, I'm publishing a second Power Plan for Ryzen that helps even users with the latest Windows updates. The download link can be found further down in the write-up.

To back this up, I'd like to detail the chronology of an interesting series of events that is hard to believe was accidental:

November 4 - Publication of the original article.
November 7 - Urgent release of BIOS v170 for my motherboard, the MSI MEG X570 GODLIKE. In particular, it fixed CPPC and enabled Global C-state Control and AMD Cool'n'Quiet by default (which wasn't the case before), but none of this is mentioned in the firmware's change log. It instead lists only two items:
Key changes with the updated BIOS:
  • Improved system boot up time.
  • Improved PCI-E device compatibility.
November 11 - Microsoft urgently releases a cumulative update, "KB4524570." The release notes are also silent about fixes related to ACPI and the task scheduler. Quoting Microsoft:
  This security update includes quality improvements. Key changes include:
  • This build includes all the improvements from Windows 10, version 1903.
  • Addresses an issue in the Keyboard Lockdown Subsystem that might not filter key input correctly.
  • Provides protections against the Intel® Processor Machine Check Error vulnerability (CVE-2018-12207). Use the registry setting as described in the Guidance KB article. (This registry setting is disabled by default.)
  • Provides protections against the Intel® Transactional Synchronization Extensions (Intel® TSX) Transaction Asynchronous Abort vulnerability (CVE-2019-11135). Use the registry settings as described in the Windows Client and Windows Server articles. (These registry settings are enabled by default for Windows Client OS editions and Windows Server OS editions.)
  • Security updates to Microsoft Scripting Engine, Internet Explorer, Windows App Platform and Frameworks, Microsoft Edge, Windows Fundamentals, Windows Cryptography, Windows Virtualization, Windows Linux, Windows Kernel, Windows Datacenter Networking, and the Microsoft JET Database Engine.
  • No additional issues were documented for this release.

What Actually Changed

I did some testing of my own to investigate what changed between those updates.

Test System:
  • Ryzen 9 3900X
  • EKWB watercooling
  • MSI MEG X570 GODLIKE (BIOS 7C34v160, AGESA and 7C34v170, AGESA)
  • G.Skill Trident Z Royal DDR4-3600 C16 dual-channel
  • Windows 10 64-bit 1903 and 1909
  • AMD Chipset Driver

BIOS v160, Windows 1903 without KB4524570, Ryzen Balanced

This graph shows the state of the system before the BIOS update and Microsoft's cumulative update from November 15. It shows a single-threaded load on a clean operating system with no background program activity, yet 9 of the 12 cores were involved, which points to obvious problems with CPPC. Moreover, each time the test was restarted, different cores boosted. Such behavior is a consequence of an ineffective heterogeneous mode that isn't performance-oriented and of the incorrect implementation of CPPC.

Each core that stays awake lowers the maximum single-core boost, because a load spread across n threads is subject to its own limits on EDC, voltage, and temperature (we will not consider other factors of AVFS operation in this material). Let me remind you that in games, such context switching and data movement between CCXs results in stuttering (and, in particular, a reduced frame rate), which in turn affects your gaming experience.

BIOS v170, Windows 1903 with KB4524570, 1usmus Ryzen Universal

The changes observed with the new BIOS are very significant:
  • The single-threaded load always stays within the same CCX.
  • Maximum boost falls on the best core; that is, CPPC now works correctly.
  • Unused cores sleep.
  • The Windows scheduler attempts to hold a single-threaded load on a single core.

My modded BIOS v130 + SMU 46.24.00, Windows 1903, Ryzen Balanced

Keen observers among you may notice that the fresh SMU 46.54.00 that comes with the AGESA Combo PI microcode has lowered frequencies on most cores relative to the old BIOS for reviewers. For CCD1 (die 1), the differences are around -75, -25, +25, -50, -25, and -25 MHz for cores 0 through 5, respectively. Of course, frequency alone is not an indicator of real performance, and in this case, it is much more important to keep the load on a single, clearly superior (favored) core while keeping the rest of the cores asleep.

I also want to mention that these frequency measurements are rather crude, since the current monitoring method is based on knowledge of the actual bus clock (BCLK) and sampling of core ratios at specific points in time.

HWiNFO Author Martin writes:
"It has become a common practice for several years to report instant (discrete) clock values for CPUs. This method is based on knowledge of the actual bus clock (BCLK) and sampling of core ratios at specific time points. The resulting clock is then a simple result of ratio times BCLK. Such approach worked quite well in the past, but is not longer sufficient. Over the years, CPUs have become very dynamic components that can change their operating parameters hundreds of times per second depending on several factors including workload amount, temperature limits, thermal/VR current and power limits, turbo ratios, dynamic TDP, etc. While this method still represents actual clock values and ratios reported match defined P-States, it has become insufficient to provide a good overview of CPU dynamics, especially when parameters are fluctuating with a much higher frequency than any software is able to capture. Another disadvantage is that cores in modern CPUs that have no workload are being suspended (lower C-States). In such case when software attempts to poll their status, it will wake them up briefly and thus the clock obtained doesn't respect the sleeping state.

Hence a new approach needs to be used called the Effective clock. This method relies on hardware's capability to sample the actual clock state (all its levels) across a certain interval, including sleeping (halted) states. The software then queries the counter over a specific polling period, which provides the average value of all clock states that occurred in the given interval. HWiNFO v6.13-3955 Beta introduces reporting of this clock. Many users might be surprised how different this clock is in comparison to the traditional clock values reported. But please note that this effective value is the average clock across the polling interval used in HWiNFO. This new method has been tested on several CPUs and has shown to provide more accurate results especially in scenarios with extremely fluctuating values.

On "Zen 2" (Matisse) systems this method can provide results closer to Ryzen Master (RM) per-core clock values, especially because it respects sleeping cores. It is assumed that such cores are marked as sleeping by RM when the effective (average) clock is below a certain threshold (somewhere around 50 MHz and below). Please note, that RM uses a different (proprietary) technique to measure clocks, so there might be some differences between the effective clock in HWiNFO and RM. While we work with AMD on the best way to access more accurate data to measure clock and voltage values, this remains the only method. Additionally, the current effective clock method is an architectural feature meaning that it doesn't depend on a certain CPU model, but is rather universal across a broad range of CPU families."
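
The difference between the two readings described in the quote can be illustrated with a small sketch. This is not HWiNFO's implementation; it is a toy model that assumes a 100 MHz BCLK, a hypothetical `ClockState` record, and that halted time counts as 0 MHz in the average.

```python
from dataclasses import dataclass

BCLK_MHZ = 100.0  # assumed bus clock; real tools use the measured value


@dataclass
class ClockState:
    """One stretch of time a core spent in a given state (hypothetical model)."""
    duration_ms: float
    ratio: float   # core multiplier while running
    halted: bool   # True while the core sleeps in a low C-state


def discrete_clock(state: ClockState) -> float:
    """Traditional reading: ratio times BCLK at the sampled instant."""
    return state.ratio * BCLK_MHZ


def effective_clock(states: list[ClockState]) -> float:
    """Time-weighted average clock over the interval; halted time counts as 0 MHz."""
    total_ms = sum(s.duration_ms for s in states)
    if total_ms == 0:
        return 0.0
    active = sum(s.duration_ms * s.ratio * BCLK_MHZ for s in states if not s.halted)
    return active / total_ms


# A mostly idle core: asleep for 990 ms, briefly awake at 43.0x for 10 ms.
interval = [
    ClockState(duration_ms=990.0, ratio=43.0, halted=True),
    ClockState(duration_ms=10.0, ratio=43.0, halted=False),
]

# Polling wakes the core, so the instantaneous reading looks like full speed.
print(discrete_clock(interval[1]))  # 4300.0
print(effective_clock(interval))    # 43.0
```

In this toy case the discrete reading suggests a core running at 4.3 GHz, while the averaged value lands below the roughly 50 MHz sleeping threshold mentioned in the quote.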

Adaptive Clock Stretching

Another nuance of clock frequency monitoring is "Adaptive Clock Stretching", a technique that dynamically lengthens the clock cycle (that is, lowers the frequency) to withstand voltage drops without increasing the overall supply voltage.

Once a droop in the supply voltage is detected and its extent determined, the clock-stretching logic reduces the frequency to compensate. Unfortunately, I can't provide you with more specific data due to my non-disclosure agreement, but I can give an example from the Steamroller generation. Back then, the voltage-drop threshold was 2.5%, and the clock period was stretched by 7%, which provided a good balance between maintaining high frequencies and improving Vmin. Another interesting capability of this technology is a configurable cycle count: the processor can "swallow" the voltage drop for a certain number of cycles before activating stretching.
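
To put the Steamroller figures into perspective, here is a quick calculation. It is only a sketch: the 4000 MHz clock and the 1.40 V supply are made-up illustration values, and the function names are hypothetical. The one relationship it relies on is that frequency is the inverse of the clock period.

```python
def stretched_frequency(nominal_mhz: float, stretch_pct: float) -> float:
    """Frequency after the clock period is lengthened by stretch_pct percent."""
    return nominal_mhz / (1.0 + stretch_pct / 100.0)


def should_stretch(nominal_v: float, measured_v: float,
                   threshold_pct: float = 2.5) -> bool:
    """True when the supply has dropped by at least threshold_pct of nominal."""
    droop_pct = (nominal_v - measured_v) / nominal_v * 100.0
    return droop_pct >= threshold_pct


nominal_mhz = 4000.0  # illustrative clock, not a Steamroller specification
stretched = stretched_frequency(nominal_mhz, 7.0)

print(round(stretched, 1))                            # 3738.3
print(round((1 - stretched / nominal_mhz) * 100, 2))  # 6.54

# Illustrative voltages: a 0.04 V drop from 1.40 V is about 2.9%.
print(should_stretch(1.40, 1.36))  # True
print(should_stretch(1.40, 1.38))  # False
```

Note that a 7% longer period costs about 6.5% of frequency, not 7%, because the frequency scales with 1/1.07.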

This is yet another piece of the puzzle, and it makes even clearer what incredibly complex technological marvels modern processors are. I am glad that my interaction with AMD will allow you to gain additional performance and improve your gaming comfort.

I have to admit I very much enjoy playing with AMD hardware to find additional performance, especially for you gamers out there. I just hope that these findings will be incorporated into BIOS and driver updates to benefit an even larger userbase.

General Advice for Ryzen Users

  • Pay close attention to operating system and BIOS updates. OS and firmware vendors have lately been changing things far too often with too little documentation. Serious problems and their fixes will often not be mentioned in the changelogs.
  • To update your BIOS, always use BIOS flashback (when possible/available) and clear the CMOS after flashing. Otherwise, Windows might not see the changes in the ACPI tables that describe the processor's configuration.
  • Keep an eye out for Ryzen chipset driver updates. These drivers are the most important link between the BIOS and the OS, and they also indirectly affect the OS scheduler.
  • Global C-state Control, CPPC Preferred Cores, and AMD Cool'n'Quiet should always be set to "Enabled".
  • CPU Cooling: the boost frequency of the Zen 2 processors is very dependent on temperature. AMD calculated their rated boost clocks at 50°C.

    Depending on the processor, maximum boost will go down with temperature:
    - 3900/3950: 75 MHz per 10°C
    - 3800/3700: 50 MHz per 10°C
    - 3600/3500: 35 MHz per 10°C

    That's why some people with poor cooling may see lower frequencies. This includes case airflow and high ambient temperatures.
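
As a rough illustration of the figures above, the sketch below estimates how much boost is lost at a given temperature. It assumes the drop is linear above the 50°C reference and that nothing is gained below it; both are simplifying assumptions, and the function name is hypothetical.

```python
REFERENCE_TEMP_C = 50.0  # temperature at which rated boost was calculated

# MHz lost per 10 °C, keyed by the model groups listed above.
DERATING_MHZ_PER_10C = {
    "3900/3950": 75.0,
    "3800/3700": 50.0,
    "3600/3500": 35.0,
}


def boost_loss_mhz(model_group: str, temp_c: float) -> float:
    """Estimated boost loss, assuming a linear drop above the reference."""
    excess_c = max(0.0, temp_c - REFERENCE_TEMP_C)
    return DERATING_MHZ_PER_10C[model_group] * excess_c / 10.0


print(boost_loss_mhz("3900/3950", 70.0))  # 150.0
print(boost_loss_mhz("3800/3700", 65.0))  # 75.0
print(boost_loss_mhz("3600/3500", 45.0))  # 0.0
```

Under these assumptions, a 3900-class processor running at 70°C would give up around 150 MHz of its maximum boost.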

I'd also like to thank Oleg Kasumov for the research assistance.

Updated 1usmus Ryzen Universal power profile

I have prepared a new 1usmus Ryzen Universal power profile which should benefit all Windows 10 builds with any BIOS. The main difference from "Ryzen Balanced" is that low-threaded workloads (1–4 threads) will see better CPU utilization.

AMD's latest power profile update works much better than before, but it sets the scheduler to "may use best cores" (auto), trusting that Microsoft's OS kernel does the right thing. I've changed that to "must use best cores" to ensure the best cores really get used.

I'm sure AMD will consider these changes in the future, but as long as these improvements are not implemented, you can use my power profile for better boost behavior. Get it by using the link below. Please refer to the original article for installation instructions (they're not difficult).

DOWNLOAD: 1usmus Ryzen Power Plan v1.1