Thursday, October 10th 2019
AMD TRX40 Chipset Not Compatible with 1st and 2nd Gen Threadrippers
AMD is putting the finishing touches on its 3rd generation Ryzen Threadripper HEDT processor lineup, and the first wave of these chips, starting with a 24-core model, will launch alongside the AMD TRX40 chipset. It turns out that the chipset won't be compatible with 1st and 2nd generation Ryzen Threadripper processors. The upcoming 3rd generation Threadripper chips won't be backwards-compatible with the AMD X399 chipset, either. We've been hearing rumors of this segmentation from reliable sources for a few days now, and tech journalist ReHWolution has just tweeted confirmation, having obtained information on upcoming motherboards from a leading brand.
The underlying reason behind this restriction remains a mystery. We know that the EPYC "Rome" MCM is pin-compatible with first-generation EPYC "Naples" chips, since the newer chips are drop-in compatible with older servers via a BIOS update. The TR4 socket, too, is nearly identical to SP3r2, except that four of its eight memory channels are blanked out. It remains to be seen whether, for TRX40 motherboards, AMD re-purposed these unused pins for something else, such as additional PCIe connectivity or additional power pins. We'll find out in November, when AMD is expected to launch these chips.
Source:
ReHWolution (Twitter)
66 Comments on AMD TRX40 Chipset Not Compatible with 1st and 2nd Gen Threadrippers
Blunder, on my part: oooooops ...
Intel provides Intel MKL (the library that does the dirty work) and they make sure it uses AVX-512 as much as possible.
They've managed to use AVX-512 e.g. for a lot of common algebra stuff (in BLAS, LINPACK/LAPACK) and for FFT. You can google these terms if they seem mysterious.
Programming is mostly high-level. You don't have to intentionally write a program to use AVX-512. You don't have to know what AVX is. You don't have to have any idea of how CPUs work.
Let's say you want to solve a very simple problem - a system of linear equations. You know:
Ax=B
It's a single line of code in Python with NumPy:
[ICODE]x = linalg.solve(A, B)[/ICODE]
And it'll use AVX-512.
Many renderers use Intel's Embree kernels.
Anyway, your current rendering engine may not benefit from AVX-512, but you could always switch to a different one - if it made more sense paired with Intel CPUs. Right?
The list of renderers benefiting from AVX is quite significant (from an Intel presentation: www.embree.org/data/embree-siggraph-2018-final.pdf):
While all code is fundamentally math, most code is more logic than singular "big" equations. So libraries like this are mostly relevant for research and other specific use cases, not for code in general.
If you want your native C/C++ code to leverage SIMD, it's usually done through intrinsics. This is required to actually implement algorithms at a lower level, and it will of course yield much greater performance benefits.
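To give a flavor of what intrinsics look like, here's a minimal sketch of my own (the function is hypothetical; AVX2 + FMA rather than AVX-512, since that's what most desktop CPUs can run; compile with e.g. -mavx2 -mfma):
[CODE]
#include <immintrin.h>  /* AVX/FMA intrinsics */
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i], 8 floats per iteration. */
void saxpy_avx2(float a, const float *x, float *y, size_t n)
{
    __m256 va = _mm256_set1_ps(a);          /* broadcast a into all 8 lanes */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);   /* fused multiply-add on 8 lanes */
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                      /* scalar tail for the leftovers */
        y[i] = a * x[i] + y[i];
}
[/CODE]
The structure (broadcast, vector loop, scalar tail) stays the same whatever SIMD width you target.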
There is huge performance potential in using AVX in applications, libraries, and the OS itself. Intel has its experimental "Clear Linux" distro, which features some standard libraries and applications with AVX optimizations. If you take a standard Linux distro or Windows, most of it is compiled for the ISA level of the old Athlon 64 (x86-64 with SSE2/3). Optimizing standard libraries will of course help all applications that use them, though not as much as optimizing the individual applications themselves. Still, I wish this were a feature you could enable in OSes, because increased performance ultimately also gives you higher power efficiency, if that's what you want.
That VRM isn't a weakling and can easily handle a 250 W part. Mine is also watercooled, as I use a monoblock to cool it. Never goes above 40 °C.
I suspect Intel paid Chaos Group to add Embree features just to get their foot in the door and have another industry name they could add to the marketing image you presented.
Sadly, it's worthless in the current generation and will need far more work to ever be a viable option for production renders.
Support is limited to specific functions and the displacement modifier. Memory requirements double, and precision drops from double- to single-precision. The result is artifacts everywhere :\
You can also use Embree for motion blur, but it doesn't support multiple geometry samples, so it's noisy and ugly. It's also actually slower than the default V-Ray motion blur, LOL.
I honestly don't care what code or hardware my renders run on. I get a budget, and the brief is to generate error-free frames as fast as possible within that budget. RTX and Embree fail the error-free requirement, you see, and Intel typically fails the budget requirement.
It wasn't always this way; the render farm is getting pretty big now, with about 70 AMD machines in the CPU pool, but three years ago it was all Intel. Performance/$ and performance/watt are things Intel has always been really bad at, and now Ryzen is soundly beating them on both fronts, often even if you compare AMD with AVX2 against Intel with AVX-512.
I hope this is obvious and acceptable for everyone.
No one said that suddenly all calculations are done using AVX-512.
The main idea I wanted to get across is that you don't have to consciously call Intel MKL. That's the whole point of high-level programming, after all.
Some people seem to think utilizing AVX-512 needs rewriting of programs. Or that the code will only work on Intel, because it's full of "multiplyUsingAvx512" or whatever.
And as more and more libraries use AVX-512 (via Intel MKL or otherwise), more and more problems run faster on modern Intel CPUs. That's the phenomenon @xkm1948 noticed.
That's the phenomenon we notice all the time as drivers and libraries evolve.
The only difference with AVX-512 is that it offers a sudden, large boost. The OS - yes. The software and libraries? Not really.
Most industry-standard software/environments will utilize Intel MKL by default on Intel systems. That includes things like Matlab, Autodesk solvers, Anaconda, and many rendering engines (as noted earlier). In other words: one doesn't have to care about it (and one shouldn't, because these products are made for analysts/engineers).
On Windows everything is usually supplied with software, so one doesn't really have to care.
On Linux programs tend to use OS-wide libraries / interpreters / compilers, so sometimes using Intel MKL requires a bit of work. But it shouldn't be an issue for people who already tolerate Linux.
E.g. compiling NumPy with Intel MKL:
software.intel.com/en-us/articles/numpyscipy-with-intel-mkl
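If I remember the gist of that article correctly, it boils down to dropping a site.cfg next to NumPy's setup.py pointing at MKL before building; the paths below are assumptions for a typical install, so check them against the article:
[CODE]
; site.cfg in the NumPy source tree (paths are an assumption,
; adjust to wherever MKL lives on your system)
[mkl]
library_dirs = /opt/intel/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
mkl_libs = mkl_rt
lapack_libs =
[/CODE]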
But honestly, there's really no good reason not to use compilers or distributions provided by Intel (like C++, Python), since they usually work better. And you've already paid for them in the "Intel tax" or whatever you want to call it. :)
Of course, all of this is only relevant for admins and home users. In a pro situation, an analyst/engineer is provided with an optimized environment.

Which rendering engine? No offense, but your life must be quite sad if you think Intel has to pay anyone to convince them to add optimizations for Xeons (~95% of servers).
Software companies just do it. They don't compete with Intel. They compete with other software companies, who may add these optimizations as well.
That's how computing works.
I'm kinda suspicious when they change the motherboard naming convention from the X prefix to TRX, and there's no leaked BIOS from any motherboard manufacturer.
Oh well, guess I'll wait for reviews of the 3950X versus the lowest Threadripper, and their total platform cost, before deciding to jump.
As for optimizing programs in general: in most code bases, only a fraction of the code is actually performance-critical. Even in those rare cases where applications contain assembly optimizations, it's usually just a couple of tiny spots at the choke point of an algorithm, in a tight loop somewhere, where using assembly results in a 50% gain or so. Most such cases are ideal for SIMD, which means you would use AVX intrinsics instead of raw assembly (these map more or less directly to AVX instructions), and then you may get a 10-50x gain instead.
Optimizing applications usually focuses on optimizing algorithms and some of the core engine surrounding them. Most performance-critical paths are fairly limited in terms of the code involved, and throughout the rest of the application code your high-level abstractions will not make any significant impact on performance at all. But in that critical path, all bloat, like abstractions, function calls, non-linear memory accesses, and branching, comes at a cost. So the first step of optimizing is removing abstractions and branching (to the extent possible), then cache-optimizing with linear memory layouts (this step may require rewriting the code surrounding the algorithm); see the sketch below. By this point you should have tight loops without function calls, memory allocations, etc. inside them. Only then are you ready to use SIMD, but you will also get huge gains from doing so. This kind of optimization is generally not possible in languages higher-level than C++. You may still be able to interface with libraries containing low-level optimizations and get some gains there, but that would be it.
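To make the "linear memory layout" step concrete, the classic transformation is array-of-structs to struct-of-arrays. A toy sketch of my own (hypothetical particle types):
[CODE]
#include <stddef.h>

/* Array-of-structs: fields are interleaved, so a loop touching only x
   wastes most of every cache line it pulls in. */
struct ParticleAoS { float x, y, z, mass; };

/* Struct-of-arrays: each field is contiguous, so a hot loop streams
   through memory linearly and maps straight onto SIMD lanes. */
struct ParticlesSoA { float *x, *y, *z, *mass; };

/* Trivially vectorizable: contiguous loads/stores, no branching. */
void advance_x(struct ParticlesSoA *p, const float *vx, float dt, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        p->x[i] += vx[i] * dt;
}
[/CODE]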
One piece of good news about writing code for SIMD is that the preparation is the same regardless of instruction set, so upgrading existing code from SSE to AVX, or to a newer version of AVX, is simple: just change a few lines of code and tweak some loops. The hard work is the preparation for SIMD itself. That's where you're wrong.
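To illustrate how mechanical that upgrade is, here's the same hypothetical scaling loop of mine in SSE and in AVX; only the types, the intrinsic prefixes, and the stride change:
[CODE]
#include <immintrin.h>
#include <stddef.h>

/* dst[i] = src[i] * s -- SSE: 4 floats per step */
void scale_sse(float *dst, const float *src, float s, size_t n)
{
    __m128 vs = _mm_set1_ps(s);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vs));
    for (; i < n; ++i)                  /* scalar tail */
        dst[i] = src[i] * s;
}

/* Identical structure in AVX: 8 floats per step (compile with -mavx) */
void scale_avx(float *dst, const float *src, float s, size_t n)
{
    __m256 vs = _mm256_set1_ps(s);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i, _mm256_mul_ps(_mm256_loadu_ps(src + i), vs));
    for (; i < n; ++i)                  /* scalar tail */
        dst[i] = src[i] * s;
}
[/CODE]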
I mentioned Intel Clear Linux; one of its main features is the optimizations to libc, the runtime for the C standard library. Almost every native application uses this library. The gains are usually in the 5-30% range, so nothing fantastic, but that's free performance for most of your applications, and who doesn't want that? The only reason I don't use it is that it's an experimental rolling-release distro, and I need a stable machine for work and don't have time to spend all day troubleshooting bugs. If it were properly supported by e.g. Ubuntu, I would switch to it.
The counterpart of libc for MS Visual Studio is msvcrt.dll and msvcpxx.dll, which most of your heavier applications and games rely on. There are of course some optimizations in MS's standard library, and they even open-sourced some of it recently, but from what I can see it's mostly older SSE. If these were updated to utilize AVX2 or even AVX-512, I'm sure many Windows users would appreciate it. The problem is compatibility: they would either have to ship two versions or drop support for older hardware. Another misconception:
Most Windows software relies on MS Visual Studio's runtime libraries, including everything bundled with Windows itself, which is why most Windows applications don't have to "care". The only time they do is when they are compiled with a more recent Visual Studio version, as Visual Studio duplicates the library for every new version, which I assume is its approach to compatibility.
This is no different from Linux, except it uses libc instead, which comes bundled with every Linux distro. Most Linux software is also POSIX-compliant, which makes it easy to port to BSD, macOS, and even Windows (which has partial compliance).
The differences which may have confused you concern GUI libraries etc., since there is no single "GUI API" like on Windows. Linux applications may rely on GTK, Qt, wxWidgets, and many others. When such applications are ported to Windows, they usually need runtimes for those libraries; examples of such applications you may be familiar with include Firefox, VLC, LibreOffice, GIMP, Handbrake, VirtualBox, Teamviewer, etc.
The RT farm is outside the scope of this thread anyway, since that's GPU-dependent, and NEXT is a little too early to call production-ready, simply because of the massive VRAM/RAM overheads incurred with hybrid rendering.

My life probably is pretty sad; I flew all the way to Siggraph this summer to listen to Vlado Koylazov (Chaos Group lead developer) on stage. Maybe I misinterpreted the segment on sponsorship and support, but both Nvidia and Intel are providing 'incentives'. That means a combination of financial support and developers on either side to get things working, both of which I interpret as "costs money". There's also the PR/marketing/promotional side of it, which is arguably cost-free, but I'd imagine that can't happen without the financial incentives greasing the wheels first.