Sunday, October 8th 2017

AMD Introduces Dynamic Local Mode for Threadripper: up to 47% Performance Gain

AMD has published a blog post describing an upcoming feature for its Threadripper processors called "Dynamic Local Mode", which should noticeably improve gaming performance on AMD's latest flagship CPUs.
Threadripper uses four dies in a multi-chip package, of which only two have a direct access path to the memory modules. The other two dies have to rely on Infinity Fabric for all their memory accesses, which comes with a significant latency penalty. Many compute-heavy applications can run their workloads out of the CPU cache or need very little memory access; these are not affected. Other applications, especially games, spread their workload over multiple cores, some of which end up with higher memory latency than expected, resulting in suboptimal performance.

The concept of multiple processors having different memory access paths is called NUMA (Non-Uniform Memory Access). While it is technically possible for software to detect the NUMA configuration and attach each thread to the ideal processor core, most applications are not NUMA-aware, and adoption is very slow, probably because few systems use such a design.
On Threadripper, Ryzen Master lets users switch between "Local Memory Access" mode and "Distributed Memory Access" mode. The latter is the default for Threadripper and gives the highest performance in compute applications. Local Mode, on the other hand, is better suited to games, but switching between the modes requires a reboot, which is very inconvenient for users.
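
To illustrate what NUMA-aware placement involves, here is a minimal Python sketch. The topology table, the choice of nodes 0 and 2 as the ones with attached memory, and the helper names are assumptions made for this example, not values taken from AMD; real software would query the operating system for the actual layout.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical topology for illustration only: four dies (NUMA nodes) with
# eight logical CPUs each. Which nodes have memory attached is an assumption.
TOPOLOGY: Dict[int, List[int]] = {
    0: list(range(0, 8)),
    1: list(range(8, 16)),
    2: list(range(16, 24)),
    3: list(range(24, 32)),
}
NODES_WITH_LOCAL_MEMORY: Set[int] = {0, 2}


def cpus_with_local_memory(topology: Dict[int, List[int]],
                           local_nodes: Set[int]) -> Set[int]:
    """Collect the logical CPUs that sit on a die with direct memory access."""
    return {cpu
            for node, cpus in topology.items()
            if node in local_nodes
            for cpu in cpus}


def affinity_mask(cpus: Iterable[int]) -> int:
    """Encode a set of logical CPU numbers as a bitmask (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


local_cpus = cpus_with_local_memory(TOPOLOGY, NODES_WITH_LOCAL_MEMORY)
print(hex(affinity_mask(local_cpus)))  # 0xff00ff
```

On Linux, the resulting CPU set could be passed to `os.sched_setaffinity(0, local_cpus)` to pin the current process; Windows expresses the same idea as a bitmask like the one computed here.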

AMD's new "Dynamic Local Mode" seeks to remove that requirement by introducing a background process that continually monitors the CPU usage of all running applications and pushes the busier ones onto the cores that have direct memory access. It does so by adjusting each application's process affinity mask, which determines which processors the application may be scheduled on. Applications that need very little CPU time are in turn pushed onto the cores without direct memory access, because fast execution matters less for them.
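
The behaviour described above can be sketched roughly as follows. This is not AMD's implementation: the CPU sets, the 10% threshold, the process names and the usage figures are all hypothetical, and the sketch only computes a placement plan rather than changing any real process.

```python
from typing import Dict, FrozenSet

# Assumed layout: CPUs 0-7 and 16-23 sit on dies with direct memory access,
# the remaining CPUs reach memory only through the interconnect.
LOCAL_CPUS: FrozenSet[int] = frozenset(range(0, 8)) | frozenset(range(16, 24))
REMOTE_CPUS: FrozenSet[int] = frozenset(range(8, 16)) | frozenset(range(24, 32))

# Assumed cut-off between "busy" and "idle" processes, in percent CPU usage.
BUSY_THRESHOLD_PERCENT = 10.0


def plan_affinity(cpu_usage: Dict[str, float],
                  threshold: float = BUSY_THRESHOLD_PERCENT
                  ) -> Dict[str, FrozenSet[int]]:
    """Map each process to the CPU set it should be allowed to run on.

    Busy processes get the cores with direct memory access; lightly loaded
    ones are moved to the remaining cores.
    """
    plan: Dict[str, FrozenSet[int]] = {}
    for name, usage in cpu_usage.items():
        plan[name] = LOCAL_CPUS if usage >= threshold else REMOTE_CPUS
    return plan


# Made-up usage sample; a real monitor would refresh this periodically.
sample = {"game.exe": 62.0, "browser.exe": 4.5, "updater.exe": 0.3}
plan = plan_affinity(sample)
for name, cpus in plan.items():
    print(name, "->", "local" if cpus == LOCAL_CPUS else "remote")
```

A background service would repeat this on a timer and apply each CPU set through the operating system's affinity call, which is why the approach needs no reboot.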
This update will be available in Ryzen Master starting October 29 and will be enabled automatically unless the user manually disables it. AMD also plans to open the feature up to even more users by including Dynamic Local Mode as a default package in the AMD Chipset Drivers.
Source: AMD Blog Post

86 Comments on AMD Introduces Dynamic Local Mode for Threadripper: up to 47% Performance Gain

#2
bug
Not sure why this isn't in the CPU driver and needs a whole application to handle it.
#3
Nephilim666
Because it is changing affinity of processes. Good luck getting a driver to do that under WDF and getting WHQL certification.

bug said:
Not sure why this isn't in the CPU driver and needs a whole application to handle it.
#5
NC37
Nice, but until it makes its way into the regular Ryzen lines, it just won't have enough impact.
#6
R0x0r
Nice! Better gaming performance is always warmly welcomed.
One more reason why a developer AND gamer can consider buying a Threadripper.
#7
Frick
Fishfaced Nincompoop
NC37 said:
Nice, but until it makes its way into the regular Ryzen lines, it just won't have enough impact.
AFAIK Ryzen doesn't use NUMA.
#8
gorillacookies
Saying "Jim Keller is a genius" might be an understatement.
#9
qubit
Overclocked quantum bit
This is great competition and will help to put the wind up Intel. It all depends on that framerate improvement though, of course.

It's a better architecture from a performance standpoint to have a monolithic design like Intel, but AMD's approach has the advantage of scalability, which can be just as important.
#10
Mysteoa
So AMD can fix what Microsoft can't fix in the scheduler.
#11
natr0n
My dual Xeon uses NUMA. :)
#12
beautyless
If it could handle this automatically, great.
#13
T1beriu
NC37 said:
Nice, but until it makes its way into the regular Ryzen lines, it just won't have enough impact.
Regular Ryzen CPUs don't need this. The optimization is useless there because regular Ryzen CPUs use only a single die.
#14
bug
qubit said:
This is great competition and will help to put the wind up Intel. It all depends on that framerate improvement though, of course.

It's a better architecture from a performance standpoint to have a monolithic design like Intel, but AMD's approach has the advantage of scalability, which can be just as important.
Not really, Threadripper isn't something I'd buy primarily for gaming ;)
But as always, improving on the weak spots makes the whole package stronger overall.
#15
oxidized
Interesting stuff; hopefully it will help close the gap with Intel in terms of gaming performance.
#16
eidairaman1
The Exiled Airman
Nephilim666 said:
Because it is changing affinity of processes. Good luck getting a driver to do that under WDF and getting WHQL certification.
You have been here since 2006, never filled out your specs and hardly post, yet you would on a topic relating to AMD. Hmmm
#17
TheinsanegamerN
gorillacookies said:
Saying "Jim Keller is a genius" might be an understatement.
Jim Keller has been at Intel since the launch of OG Ryzen. This is mostly AMD's own work.
#18
Rhyseh
eidairaman1 said:
You have been here since 2006, never filled out your specs and hardly post, yet you would on a topic relating to AMD. Hmmm
This whole comment seems wholly pointless so I thought I'd add another one.
#19
qubit
Overclocked quantum bit
bug said:
Not really, Threadripper isn't something I'd buy primarily for gaming ;)
But as always, improving on the weak spots makes the whole package stronger overall.
I'm replying generally, not to your particular case. And anyway, if it gives a better framerate in games, it will also give better general application performance.

Before I get "corrected" by anyone, I'm talking about a like-for-like comparison, e.g. a 16-core Intel CPU compared with a 16-core AMD CPU with the same or very similar clock speeds. A better IPC from Intel, coupled with a monolithic design, will result in a performance win for Intel in everything.
#20
bug
TheinsanegamerN said:
Jim Keller has been at Intel since the launch of OG Ryzen. This is mostly AMD's own work.
On a project that's Jim Keller's brainchild.
As he himself acknowledges, it's always more about the team, one man can't possibly do all the work to build a CPU these days. Still, a series of highly successful CPUs trail his work history.

qubit said:
I'm replying generally, not to your particular case. And anyway, if it gives a better framerate in games, it will also give better general application performance.

Before I get "corrected" by anyone, I'm talking about a like-for-like comparison, e.g. a 16-core Intel CPU compared with a 16-core AMD CPU with the same or very similar clock speeds. A better IPC from Intel, coupled with a monolithic design, will result in a performance win for Intel in everything.
Well, I still have to correct you :D
Intel's IPC is virtually the same as AMD's now (within a few percent). Intel's win is because of higher frequencies.
#21
qubit
Overclocked quantum bit
bug said:
On a project that's Jim Keller's brainchild.
As he himself acknowledges, it's always more about the team, one man can't possibly do all the work to build a CPU these days. Still, a series of highly successful CPUs trail his work history.


Well, I still have to correct you :D
Intel's IPC is virtually the same as AMD's now (within a few percent). Intel's win is because of higher frequencies.
AMD is still, what, 10% slower on IPC? That's still a win for Intel. Also, notice that I said Intel would win, not by how much. That depends on specific cases which is outside the scope of my comment.
#22
Dave65
Makes me all warm inside, but really needs to be in the driver, unless I am missing something.
#23
eidairaman1
The Exiled Airman
Mysteoa said:
So AMD can fix what Microsoft can't fix in the scheduler.
Because Microsoft is too lazy to

qubit said:
I'm replying generally, not to your particular case. And anyway, if it gives a better framerate in games, it will also give better general application performance.

Before I get "corrected" by anyone, I'm talking about a like-for-like comparison, e.g. a 16-core Intel CPU compared with a 16-core AMD CPU with the same or very similar clock speeds. A better IPC from Intel, coupled with a monolithic design, will result in a performance win for Intel in everything.
Considering the scheduler in Windows is stupid, this would fix the gaming performance through Ryzen Master, like AMD advertised for the 2990WX CPU.

Either way, it's a win for either team.
#24
qubit
Overclocked quantum bit
eidairaman1 said:
Considering the scheduler in Windows is stupid, this would fix the gaming performance through Ryzen Master, like AMD advertised for the 2990WX CPU.

Either way, it's a win for either team.
Yes, it would certainly improve performance and I'm quite keen to see how close AMD can get to Intel with this fix.
#25
eidairaman1
The Exiled Airman
Rhyseh said:
This whole comment seems wholly pointless so I thought I'd add another one.
He has a negative overtone, which makes me wonder. By the way, who are you? Never seen you before now.