Monday, November 18th 2019

MATLAB MKL Codepath Tweak Boosts AMD Ryzen MKL Performance Significantly

MATLAB is a popular math computing environment used by engineering firms, universities, and other research institutes. Some of its operations can be made to leverage the Intel MKL (Math Kernel Library), which is poorly optimized for, and notoriously slow on, AMD Ryzen processors. Reddit user Nedflanders1976 devised a way to recover anywhere from 20 to 300 percent performance on Ryzen and Ryzen Threadripper processors by forcing MATLAB to use advanced instruction sets such as AVX2. By default, MKL queries the processor's vendor ID string, and if it sees anything other than "GenuineIntel," it falls back to SSE, a significant performance disadvantage for "AuthenticAMD" Ryzen processors, which have full SSE4, AVX, and AVX2 implementations.

The tweak, meant to be applied manually by AMD Ryzen users, forces MKL to use AVX2 regardless of the result of the CPU vendor ID query. It is as simple as it is effective: a four-line Windows batch file with a set of arguments starts MATLAB with MKL in AVX2 mode. You can also make the tweak "permanent" by creating a system environment variable, which applies to all instances of MATLAB, not just those spawned by the batch file. Nedflanders1976 also posted a benchmark script that highlights the performance impact of AVX2, though you can use your own scripts and post results.
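The batch-file approach can be sketched as follows. This is an illustrative launcher, not Nedflanders1976's exact script; the MATLAB install path is an assumption you would adjust for your system, while `MKL_DEBUG_CPU_TYPE=5` is the variable the tweak relies on:

```bat
@echo off
:: Hypothetical launcher (save e.g. as matlab_avx2.bat).
:: MKL_DEBUG_CPU_TYPE=5 tells MKL to take its AVX2 codepath
:: regardless of the CPU vendor ID string.
set MKL_DEBUG_CPU_TYPE=5
"C:\Program Files\MATLAB\R2019b\bin\matlab.exe"
```

For the "permanent" variant, the same variable can instead be set once as a system environment variable (for example with `setx MKL_DEBUG_CPU_TYPE 5`), after which it applies to every MATLAB instance.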
Source: Nedflanders1976 (Reddit)

67 Comments on MATLAB MKL Codepath Tweak Boosts AMD Ryzen MKL Performance Significantly

#26
OGoc
DeathtoGnomes
With this, remember the transition from x32 to x64, how often applications and games had 2 different executables to use, which, shockingly, depended on the CPU. Since a script is an easy fix, I don't see the need for a separate executable. I have seen in the past executables tagged separately for Intel and AMD, though it's been so long I can't remember when or what exactly, but I think it was during the XP/Vista OS days
Yup, had to install drivers in Windows XP and Vista for my dual core Opteron 165 (DFI Expert! for any who remember). "AMDx64..." something.
Posted on Reply
#27
lynx29
After reading this thread, I can 100% say I am glad I decided to go AMD this round, and I already have plans to sell my 3600 when 4800x hits my doorstep. Nothing else changes, except possibly selling my 1080 ti for a navi 6700 xt when that comes out.
Posted on Reply
#28
Mysteoa
ratirt
You have slightly missed the point. Intel didn't spend money to optimize for Intel's CPUs, but to make competing companies' processors use a different code path that cripples their performance. What it means is: if you are Intel, you go the fast way (which the competition could handle as well, but it's exclusive); if you are not, you are stuck with code that is slow as molasses.
They didn't spend the money in the first place, but they are going to do it now, just to keep them from optimizing for AMD. Until someone publicly exposes the problem, like in this case.
Posted on Reply
#29
Mouth of Sauron
I can't remember any reason in the last 10 or so years why Intel should be praised. They did remarkable things in the past, while they had competition. Over the de facto no-competition years, Intel made ONE architecture and exploited it for ten years, making chips cheaper for Intel to produce but never cheaper for the customers. 10% benefit in 10 years: great job, thank you Intel! Moving entry/mid/high GPU prices to previously unheard-of levels: thank you, NVIDIA! Praise to both!

There are many things that made me start hating Intel, but probably the biggest for me was the suppression of HSA. I'm talking about a strong AMD (with the fabs, with highly competitive products) acquiring ATI in order to bring a new level of efficiency to computing in general, and how Intel effectively stopped it.

HSA should have been a step above CUDA, OpenCL, and similar standards. HSA should've taken the developer out of the equation: they would write things normally, and HSA would be handled at the compiler level.

HSA Foundation members are AMD, ARM, Samsung, MediaTek, Qualcomm, and Texas Instruments... Who is missing? Of course Intel, because it has no on-chip GPU worth speaking of, and of course NVIDIA, because it has no CPU at all...

For those who aren't familiar with HSA... Both CPU and GPU do calculations, except floating-point work is many times faster on a GPU while some other tasks are CPU-exclusive. HSA should've represented a 'marriage' of CPU and GPU on the same die, with each task assigned to the part that does it better, cooperating over shared resources.

Why did it fail? Because of the ill-fated AMD Fusion project. Mistakes were made, solutions were delayed, the Bulldozer (and onward) architectures were bad, etc., ending in a weak AMD with products that couldn't compete with Intel. On the other, uglier side, both Intel and NVIDIA actively sabotaged the progress for selfish reasons. Say, what are the components of a "typical" supercomputer? Many Intel CPUs and many NVIDIA GPUs.

Would an AMD APU with HSA actually in use have made a difference? I think yes. I think this may still happen, now that AMD has competitive products on both the CPU and GPU side. I think it could make a difference in home computing, too. I think we have lower-quality products today on the software side, thanks to shady business practices. I really liked the HSA idea :)
Posted on Reply
#30
john_
Can we really say that MATLAB's programmers can create such a program, yet at the same time are so incompetent that they can't do something achievable with 4 lines of code? Maybe someone who is using the program with a Ryzen should sue them for the time lost waiting for results that the program was already capable of presenting 2-3 times faster.
Posted on Reply
#33
R-T-B
notb
As for MKL - it's used by a lot of computing software. Why? Because it makes stuff run faster on Intel CPUs. Why would it not be used? This is how computing works. Intel has given developers an API to speed up their programs. Why is Intel attacked for this on this forum? It should be praised.

AMD is also allowed to offer an API optimized for Zen. And I'm sure software developers will gladly implement it as AMD CPUs gain popularity.

For a decade there was really no reason to optimize software for AMD.
Because it's anticompetitive and fragments the marketplace?

Compilers have had the ability to generate binaries that use flags for what they are running on since... forever. Intel should absolutely not be praised for what it is doing here.
Posted on Reply
#34
First Strike
This is a modification you really apply at your own risk. You know, within one update Intel could make some modifications that "unintentionally" cause numerical bugs on user-modified systems. There may well be some already.
Posted on Reply
#35
notb
First of all, before I answer a few posts, I have to say... I'm really shocked and disappointed by how little some of you know about software and these kinds of APIs.
It's not like I expect everyone to be a developer, but having a minimal understanding of how software works would be useful while discussing this topic...
I think I expected more...
R-T-B
Because it's anticompetitive and fragments the marketplace?

Compilers have had the ability to generate binaries that use flags for what they are running on since... forever. Intel should absolutely not be praised for what it is doing here.
Intel is entitled to provide an API optimized for their CPUs (that's the whole point).
AMD is entitled to provide a similar product.

The main goal of software like Matlab is not to support CPU market or promote competition.
The goal is to compute efficiently. And since Intel provides an API that makes Matlab faster on Intel CPUs, why would they not use it? In fact, shouldn't we demand that they use it? Because the gains on Intel CPUs are really significant.

AMD can also provide such an API, and I'm sure MathWorks (like every other major software maker) will happily provide a backend optimized for AMD.

And most importantly: I have no idea why the criticism is aimed at Intel and not MathWorks. Intel gives no guarantee of MKL performance on AMD platforms. In fact, they might as well block it completely. It's the software maker's role to provide compatibility.

If AMD felt this is unfair or "anticompetitive", they should have pointed this out. They didn't. Why?
PanicLake
The fact (or problem) is that, as demonstrated by this article, you don't actually need an AMD-provided API to achieve better performance.
ratirt
Crippling other companies' products is not speeding your own product up, although it makes yours look better in comparison.
The article the OP is referring to proves that you can work around the crippling procedure Intel has implemented for AMD processors.
Yes. MKL run on an AMD CPU is very slow because it falls back to the simplest instruction set.
Of course we could argue whether this is OK or not. Intel can't guarantee that AMD CPUs will keep supporting AVX2. What if they suddenly stopped? Would we then blame Intel for making a library that crashes on the competition's CPUs?
But let's not do that.
Let's focus on the very simple fact: it's an Intel library. For the most part it's not open source. It's not designed to become a market-wide standard. And it simply shouldn't be used with AMD CPUs.

Similarly, we could criticize Nvidia because CUDA doesn't work with Radeon. Or blame Ford because their navigation system doesn't work in a Toyota. Why would it?

I mean... seriously... it's Matlab. Hardware makers should fight for performance in such applications. Intel does. Nvidia does.
AMD doesn't. And AMD fanboys - instead of expecting AMD to try harder - criticize everyone else.
Posted on Reply
#36
OSdevr
Anyone know if this applies to GNU Octave as well?
Posted on Reply
#37
Cheeseball
OSdevr
Anyone know if this applies to GNU Octave as well?
GNU Octave will compile under any competent C++ compiler, including Intel's C++ Compiler, GCC, and Clang. It isn't any faster with any specific compiler, as it is not heavy (and not meant to be) on vectorized processing compared to MATLAB.

notb
Similarly, we could criticize Nvidia because CUDA doesn't work with Radeon.
Technically CUDA can work on any modern GPU through HIP, but it won't be as efficient (due to AMD's target architecture, not because of a limitation).

Also, MATLAB isn't being unfair by going with Intel's MKL. They need it for AVX-512 and efficient FFTs. If your code is mostly linear algebra and doesn't implement any NLP, then MATLAB on AMD CPUs should be fine, which is why they implemented the MKL_DEBUG_CPU_TYPE=5 environment variable in the first place.
Posted on Reply
#38
OSdevr
Cheeseball
Technically CUDA can work on any modern GPU through HIP, but it won't be as efficient (due to AMD's target architecture, not because of a limitation).
I'm pretty sure HIP is dead.
Posted on Reply
#39
Cheeseball
OSdevr
I'm pretty sure HIP is dead.
Yeah, they moved on to using ROCm directly (which works well with LLVM).
Posted on Reply
#40
tabascosauz
notb
First of all, before I answer a few posts, I have to say... I'm really shocked and disappointed by how little some of you know about software and these kinds of APIs.
It's not like I expect everyone to be a developer, but having a minimal understanding of how software works would be useful while discussing this topic...
I think I expected more...
Maybe those far-fetched analogies don't work nearly as well as you thought they would in your head, since every mainstream and higher-end chip from the two camps in recent history is compatible with at least AVX. AVX instructions are not a partisan matter, unlike CUDA, which is a fundamental part of Nvidia architectures, or car head units, which employ a vastly different variety of hardware and software platforms. I mean, unless you really want to pull out a random Nehalem or Thuban CPU...

AMD doesn't support AVX-512, but their CPUs can make use of AVX at the very least (just like Intel since Sandy Bridge) and AVX2 (just like Intel since Haswell). Even on Intel, AVX-512 support for its various instruction subsets is still selective and patchy on the whopping two current platforms that support it (Xeon Phi and non-mainstream Skylake). By that logic, Broadwell-E and Haswell-E should also be kicked all the way down to hilarious SSE despite supporting AVX and AVX2. But the "GenuineIntel" string means that they aren't, now, are they?
Posted on Reply
#41
biffzinker
[MEDIA=twitter]1196517513909227522[/MEDIA]

Wonder why Intel would suggest LegitReviews use MATLAB as a CPU benchmark?
Posted on Reply
#42
Vya Domus
First Strike
This is a modification that you are really risking your own lives. You know, within one update Intel can make some modifications that "unintentionally" cause numerical bugs on some user-modified systems. There may well be some already.
What's ironic is that Intel's compiler is known for generating unsafe optimizations and using floating-point math that goes outside the IEEE standard all the time, by default. And many wonder why it's faster; it's not like Intel's engineers know some crazy algorithms that others don't.
Posted on Reply
#43
ShredBird
W1zzard
Anyone using Matlab here? Would love to get some real-life scenario data for my CPU reviews
I've got 10 years of MATLAB experience under my belt. I've got some scripts from my master's degree that were heavily multi-threaded and can definitely put a CPU to work. I also use MATLAB extensively at work for data acquisition and signal processing, and I've tapped it for some machine learning tasks. I'm very glad this made it to the news feed, because I just got a brand-new ThreadRipper 2990WX workstation that we intend to use MATLAB on quite extensively, and given how good a value Zen has been, we'll be getting a lot more of them, so I will definitely be making these tweaks.

There are some tricks to measuring MATLAB performance: because it is not a compiled language but does just-in-time compilation, you can get run-to-run variations in performance. Feel free to send me a PM, I'd be happy to give you my 2 cents where I can.
Posted on Reply
#44
Assimilator
notb
First of all, before I answer a few posts, I have to say... I'm really shocked and disappointed by how little some of you know about software and these kinds of APIs.
It's not like I expect everyone to be a developer, but having a minimal understanding of how software works would be useful while discussing this topic...
I think I expected more...

...
Thank you. I don't understand why it's so difficult for people to understand that MKL is Intel's software; therefore Intel is free to do whatever they want with it. There would be an issue if they were forcing people to use that software, but they aren't - nothing's stopping AMD from writing a library that does the exact same thing as MKL but isn't artificially hobbled on non-Intel CPUs, and nothing's stopping MATLAB from using that replacement library. Except for the fact that it doesn't exist.

Is it a s**tty, idiotic, anticompetitive practice? Yes. Does it achieve anything else than making Intel look like idiots? No. Is it their right to do this? Absolutely.

Finally, I'm not sure why this is even making news now. It's been common knowledge in ICC since 2009 and in MKL since 2013, so it's hardly new, and anyone who knows anything about ICC or MKL already knows how to patch the offending check out.
Posted on Reply
#45
OSdevr
Cheeseball
GNU Octave will compile under any competent C++ compiler, including Intel's C++ Compiler, Gcc and Clang. It isn't any faster with any specific compiler as it is not heavy (and not meant to be) on vectorized processing compared to MATLAB.
Wait, really? Matrices are one of (if not THE) fundamental data types in MATLAB, and parallel operations are the norm. I'm surprised Octave isn't similarly optimized for them, though that would be typical of GNU software.
Posted on Reply
#46
Cheeseball
OSdevr
Wait, really? Matrices are one of (if not THE) fundamental data types in MATLAB and parallel operations are the norm. I'm surprised Octave isn't similarly optimized for them, though that would be typical of GNU's software.
Octave is pretty optimized for what it is. Just don't expect it to be faster than MATLAB or Scilab. Remember that you should use an Octave build suited to your purpose (e.g. FFT) rather than a generic build, unless you're just starting out or doing more general numerical stuff.
Posted on Reply
#47
saikamaldoss
I remember posting my findings about Crysis on the ngohq website. It made quite a big noise at the time. Nvidia influenced the game developer to have the game use 4x2 AA instead of 4x4 sampling when the user selected 4xAA in game, by detecting the vendor ID. It's a shame they have such a shady business model.
Posted on Reply
#48
notb
tabascosauz
AMD doesn't support AVX-512, but their CPUs can make use of AVX at the very least (just like Intel since Sandy Bridge) and AVX2 (just like Intel since Haswell). Even on Intel, AVX-512 support for its various instruction subsets is still selective and patchy on the whopping two current platforms that support it (Xeon Phi and non-mainstream Skylake). By that logic, Broadwell-E and Haswell-E should also be kicked all the way down to hilarious SSE despite supporting AVX and AVX2. But the "GenuineIntel" string means that they aren't, now, are they?
I don't intend to go into the technicalities of how this works. It does. Libraries and compilers guarantee that unsupported instructions aren't sent to the CPU (unless you force them in assembly).

As I said earlier: Intel MKL is not supposed to serve the whole market. It's not universal. It's their software - made for their hardware.
They took things that existed (BLAS, LAPACK, FFT, etc.) and rewrote them to make the best use of what Intel CPUs can provide. That's it.

MKL is not meant to replace the open-source libraries. Software makers can (and should) provide a separate implementation for AMD - just like they would have to for ARM etc.
Intel and AMD share the same fundamental architecture, but there are significant differences in instruction sets (not just AVX-512; also DNN extensions, and more will follow soon).

Is Matlab optimally coded for AMD CPUs? No. But it's MathWorks' and AMD's fault, not Intel's.

Assimilator
Is it a s**tty, idiotic, anticompetitive practice? Yes.
Why is this anticompetitive? And if it is, then who is to blame?

If someone said MathWorks promotes Intel (i.e. Intel pays them not to make an AMD version), it would smell of flat-Earth conspiracy, but I couldn't really prove it wrong.
But the thesis in this discussion, that Intel should optimize their software for competing hardware, is just bizarre.

https://software.intel.com/en-us/mkl
code:

Supported Hardware
Intel® Xeon® processor
Intel® Core™ processor family
Intel Atom® processor
Intel® Xeon Phi™ processor


It's official. AMD isn't supported. Can we move on? :)
Posted on Reply
#49
Assimilator
notb
Why is this anticompetitive? And if yes, then who is to blame?
Anticompetitive in that it deliberately generates worse code for non-Intel processors for no good reason. Intel created the CPUID instruction back in 1993 to allow programs to determine what features the system processor supports; it's absolutely impossible that the people who write ICC and MKL are unaware of this. And any coder fresh out of school could tell you that using a string to check for features is bad on so many levels. So there's zero possibility this is a mistake or incompetence, which can only mean it's intended to disadvantage non-Intel CPUs, and that is the very definition of anticompetitive behaviour.

Given the above, and the fact that MKL is essentially the only library available that does what it does, it could likely be argued that Intel's behaviour here violates antitrust laws. Certainly, if someone wanted to sue Intel on this basis, they would likely have a better chance than when they were sued for doing this in ICC - at that time Intel was able to weasel their way out of a deserved smackdown by virtue of the fact that consumers weren't forced to use ICC, as there were other compilers that could be used. Yes, you could argue that AMD has had, and does have, the opportunity to create a competing library - but everyone knows how difficult it is to dislodge the market incumbent, even with a superior product.

Honestly though, I don't care if this breaks the law or not, it's just really terrible and unnecessary behaviour that goes against the grain of everything that is responsible and ethical software engineering. I don't like to blow the "all software should be free" horn, but this is an example where it's really necessary.
Posted on Reply
#50
ShredBird
As a regular user of MATLAB, I have to say this really falls on MathWorks from my perspective. MATLAB runs on the Java virtual machine so that it can be hardware-agnostic and high-level. As a company, you want MATLAB to perform its best on any hardware, so whether that means using a proprietary library or a free one, they should strive to optimize for many architectures. Intel is totally in the right to have MKL fall back to SSE if a compatible processor is not detected; it's their library for their processors. However, shame on MathWorks for not investigating the implications of this design choice for AMD systems.

To be fair, Intel has been dominating the processor market for over a decade, so for the clients where performance really matters, they were probably running Intel already. But as the tables turn, there is more scrutiny on their design choices (and much deserved).
Posted on Reply