
AMD TRX40 Chipset Not Compatible with 1st and 2nd Gen Threadrippers

Joined
Sep 26, 2012
Messages
862 (0.20/day)
Location
Australia
System Name ATHENA
Processor AMD 7950X
Motherboard ASUS Crosshair X670E Extreme
Cooling ASUS ROG Ryujin III 360, 13 x Lian Li P28
Memory 2x32GB Trident Z RGB 6000Mhz CL30
Video Card(s) ASUS 4090 STRIX
Storage 3 x Kingston Fury 4TB, 4 x Samsung 870 QVO
Display(s) Acer X38S, Wacom Cintiq Pro 15
Case Lian Li O11 Dynamic EVO
Audio Device(s) Topping DX9, Fluid FPX7 Fader Pro, Beyerdynamic T1 G2, Beyerdynamic MMX300
Power Supply Seasonic PRIME TX-1600
Mouse Xtrfy MZ1 - Zy' Rail, Logitech MX Vertical, Logitech MX Master 3
Keyboard Logitech G915 TKL
VR HMD Oculus Quest 2
Software Windows 11 + Universal Blue
AMD acting like Intel? It's pretty obvious power delivery is a huge factor here, or are people forgetting how gimped the 32-core Threadripper 2 parts were just to fit into the TR4 socket?
 
Joined
Mar 21, 2016
Messages
2,197 (0.74/day)
There we go, AMD starting to behave like Intel.
They've got a long way to go to match Intel's overall shenanigans, however. Honestly, I get it with the HEDT platform and with the memory channel changes. I'm sure this HEDT board replacing X399 will probably last an additional generation or two on top of that. It would have been good if things were different, though I can understand it a lot more than the socket 1151 situation. If Z270/Z370 had been quad channel, they would've been an easier pill to swallow and justify. At least in the case of TRX40 there's a more justifiable reason behind it, and AMD is essentially transforming it into a cheaper cost-of-entry equivalent to Epyc from the generation prior, with probably another generation or two beyond. To me it's a different situation. It's still far better than what Intel was offering for its workstation boards, and even more so than what Intel was doing in the mainstream with socket 1151. As I said, AMD has a long way to go to match Intel's shenanigans, like abusing its power in a monopolistic, anticompetitive manner.
 

HTC

Joined
Apr 1, 2008
Messages
4,604 (0.78/day)
Location
Portugal
System Name HTC's System
Processor Ryzen 5 2600X
Motherboard Asrock Taichi X370
Cooling NH-C14, with the AM4 mounting kit
Memory G.Skill Kit 16GB DDR4 F4 - 3200 C16D - 16 GTZB
Video Card(s) Sapphire Nitro+ Radeon RX 480 OC 4 GB
Storage 1 Samsung NVMe 960 EVO 250 GB + 1 3.5" Seagate IronWolf Pro 6TB 7200RPM 256MB SATA III
Display(s) LG 27UD58
Case Fractal Design Define R6 USB-C
Audio Device(s) Onboard
Power Supply Corsair TX 850M 80+ Gold
Mouse Razer Deathadder Elite
Software Ubuntu 19.04 LTS
Isn't Epyc a full SoC? I never heard or read that Epyc mobos had chipsets...

My bad: somehow I missed the word "chipset" in the topic and was referring to the socket instead.

Blunder on my part: oooooops...
 
Joined
Jun 28, 2016
Messages
3,595 (1.26/day)
Fair enough and important to note, but just as important is the fact that not everyone is doing the same things as you.
Bioinformatics is a niche use of computing, not the sole use, or AMD would be in trouble.
Bioinformatics may be, but how is that even relevant? Underneath it's just math. And math is fairly mainstream in computing. :-D

Intel provides Intel MKL (the library that does the dirty work) and they make sure it uses AVX-512 as much as possible.
They've managed to use AVX-512 e.g. for a lot of common algebra stuff (in BLAS, LINPACK/LAPACK) and for FFT. You can google these terms if they seem mysterious.

Programming is mostly high-level. You don't have to intentionally write a program to use AVX-512. You don't have to know what AVX is. You don't have to have any idea of how CPUs work.

Let's say you want to solve a very simple problem - a system of linear equations. You know:
Ax=B
It's a single line of code in Python with NumPy:
x = linalg.inv(A).dot(B)
And it'll use AVX-512.
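If anyone wants to actually run that, here's a self-contained sketch; the random test system and the allclose check are just my additions for illustration:

import numpy as np

# Toy system Ax = B, made diagonally dominant so it's guaranteed invertible
rng = np.random.default_rng(0)
A = rng.random((2000, 2000)) + 2000 * np.eye(2000)
B = rng.random(2000)

# Same line as above, just with the numpy import spelled out.
# (np.linalg.solve(A, B) is the numerically preferred route, but either way the
# heavy lifting lands in the BLAS/LAPACK backend, which on a typical Intel/Anaconda
# setup is MKL, and MKL picks the widest SIMD path the CPU supports, AVX-512 included.)
x = np.linalg.inv(A).dot(B)

print(np.allclose(A @ x, B))  # should print True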
 
Joined
Jun 28, 2016
Messages
3,595 (1.26/day)
Yeah, we're not in the bioinformatics business. The CPU farm tends to get used for raytracing using a couple of different renderers at the moment. Neither uses AVX-512.
Are you sure? ;-)
Many renderers use the Intel Embree kernels.

Anyway, your current rendering engine may not benefit from AVX-512, but you could always switch to a different one - if it made more sense paired with Intel CPUs. Right?

The list of renderers benefiting from AVX is quite significant (from an Intel presentation, https://www.embree.org/data/embree-siggraph-2018-final.pdf):

(attachment 133819: slide from the presentation above listing renderers that use Embree)
 
Joined
Mar 18, 2008
Messages
5,717 (0.97/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
Bioinformatics may be, but how is that even relevant? Underneath it's just math. And math is fairly mainstream in computing. :-D

Intel provides Intel MKL (the library that does the dirty work) and they make sure it uses AVX-512 as much as possible.
They've managed to use AVX-512 e.g. for a lot of common algebra stuff (in BLAS, LINPACK/LAPACK) and for FFT. You can google these terms if they seem mysterious.

Programming is mostly high-level. You don't have to intentionally write a program to use AVX-512. You don't have to know what AVX is. You don't have to have any idea of how CPUs work.

Let's say you want to solve a very simple problem - a system of linear equations. You know:
Ax=B
It's a single line of code in Python with NumPy:
x = linalg.inv(A).dot(B)
And it'll use AVX-512.

Dude, that explains quite a lot! I am nowhere near the math foundation of a pure computer science specialist. I was wondering how the heck a lot of my tools suddenly ran faster on an Intel system with AVX-512. I guess all those Linux updates and patches made improvements at the close-to-metal level?
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Bioinformatics may be, but how is that even relevant? Underneath it's just math. And math is fairly mainstream in computing. :-D

Intel provides Intel MKL (the library that does the dirty work) and they make sure it uses AVX-512 as much as possible.
They've managed to use AVX-512 e.g. for a lot of common algebra stuff (in BLAS, LINPACK/LAPACK) and for FFT. You can google these terms if they seem mysterious.

Programming is mostly high-level. You don't have to intentionally write a program to use AVX-512. You don't have to know what AVX is. You don't have to have any idea of how CPUs work.

Let's say you want to solve a very simple problem - a system of linear equations. You know:
Ax=B
It's a single line of code in Python with NumPy:
x = linalg.inv(A).dot(B)
And it'll use AVX-512.
Libraries like MKL work by having your code pass data to them to perform a larger mathematical computation, like multiplying large matrices, rather than taking over every calculation in a program. There is of course overhead involved, and MKL may even use multiple threads and different algorithms depending on the parameters you pass to it, so each computation needs to be of a certain size before it becomes beneficial.
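A rough way to see that size threshold for yourself (purely illustrative; absolute timings depend on your machine and on which BLAS backend NumPy is linked against):

import time
import numpy as np

# Tiny vs. large matrix multiplication: the small one is dominated by call
# overhead, the big one is where the vectorized, multithreaded BLAS kernels
# (MKL, OpenBLAS, ...) actually pay off.
for n in (8, 4096):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.perf_counter()
    a @ b  # dispatched to the BLAS backend
    print(f"{n}x{n} matmul: {time.perf_counter() - start:.6f} s")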

While all code is fundamentally math, most code is more logic than singular "big" equations. So libraries like this are mostly relevant for research and other specific use cases, not code in general.

If you want your native C/C++ code to leverage SIMD, it's usually done through intrinsics; that's what's required to actually implement algorithms at a lower level, and it will of course yield much greater performance benefits.

There is huge performance potential in using AVX in applications, libraries and the OS itself. Intel has its experimental "Clear Linux" distro, which features some standard libraries and applications with AVX optimizations. If you take a standard Linux distro or Windows, most of it is compiled for the ISA level of old Athlon64 (so x86-64 with SSE2/3). Optimizing standard libraries will of course help all applications which use them, though not as much as optimizing individual applications. Still, I wish this were a feature you could enable in OSes, because increased performance will also ultimately give you higher power efficiency, if that's what you want.
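If you're curious which AVX levels your own CPU exposes, versus the baseline most distro binaries are actually built for, here's a quick Linux-only check (just an illustration; it reads /proc/cpuinfo):

# Lists the AVX-related feature flags the CPU reports (Linux only).
# Whether the binaries you run actually use them is a separate question;
# most distro packages still target the plain x86-64 baseline.
with open("/proc/cpuinfo") as cpuinfo:
    flags = next(line for line in cpuinfo if line.startswith("flags")).split()

print(sorted(flag for flag in flags if flag.startswith("avx")))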
 
Joined
Dec 29, 2010
Messages
3,455 (0.71/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
X399 boards kinda sucked anyways...
 
Joined
Oct 26, 2008
Messages
2,244 (0.40/day)
System Name Budget AMD System
Processor Threadripper 1900X @ 4.1Ghz (100x41 @ 1.3250V)
Motherboard Gigabyte X399 Aorus Gaming 7
Cooling EKWB X399 Monoblock
Memory 4x8GB GSkill TridentZ RGB 14-14-14-32 CR1 @ 3266
Video Card(s) XFX Radeon RX Vega₆⁴ Liquid @ 1,800Mhz Core, 1025Mhz HBM2
Storage 1x ADATA SX8200 NVMe, 1x Seagate 2.5" FireCuda 2TB SATA, 1x 500GB HGST SATA
Display(s) Vizio 22" 1080p 60hz TV (Samsung Panel)
Case Corsair 570X
Audio Device(s) Onboard
Power Supply Seasonic X Series 850W KM3
Software Windows 10 Pro x64
AMD acting like Intel? It's pretty obvious power delivery is a huge factor here, or are people forgetting how gimped the 32-core Threadripper 2 parts were just to fit into the TR4 socket?

My VRMs are perfectly fine running their full intended output.



That VRM isn't a weakling and can easily handle a 250W part. Mine is also watercooled, as I use a monoblock, and it never goes above 40°C.
 
Joined
Jun 2, 2017
Messages
7,928 (3.15/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitech Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64, Steam, GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
Joined
Feb 20, 2019
Messages
7,305 (3.86/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Odyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Are you sure? ;-)
Many renderers use the Intel Embree kernels.

Anyway, your current rendering engine may not benefit from AVX-512, but you could always switch to a different one - if it made more sense paired with Intel CPUs. Right?

The list of renderers benefiting from AVX is quite significant (from an Intel presentation, https://www.embree.org/data/embree-siggraph-2018-final.pdf):

View attachment 133819
Yep, I'm sure.

I suspect Intel paid Chaos Group to add Embree features just to get their foot in the door and have another industry name they could add to the marketing image you presented.
Sadly, it's worthless in the current generation and will need far more work to ever be a viable option for production renders.

Support is limited to specific functions and the displacement mod. Memory requirements double, and precision drops from double to single precision. The result is artifacts everywhere :\
You can also use Embree for motion blur, but it doesn't support multiple geometry samples, so it's noisy and ugly. It's also actually slower than the default V-Ray motion blur, LOL.

I honestly don't care what code or hardware the renders run on. I get a budget, and the brief is to generate error-free frames as fast as possible within that budget. RTX and Embree fail the error-free requirement, you see, and Intel typically fails the budget requirement.

It wasn't always this way; the render farm is getting pretty big now, with about 70 AMD machines in the CPU pool, but three years ago it was all Intel. Performance/$ and performance/watt are things Intel has always been really bad at, and now Ryzen is soundly beating them on both fronts, often even if you compare AMD with AVX2 against Intel with AVX-512.
 
Joined
Dec 29, 2010
Messages
3,455 (0.71/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
Can you please expand on that sentiment?

There were a lot of issues with the boards in the beginning. Ask anyone who ran TR when they came out; it was crazy. I went through a handful of boards myself.
 
Joined
Jun 28, 2016
Messages
3,595 (1.26/day)
Libraries like MKL work by having your code pass data to them to perform a larger mathematical computation, like multiplying large matrices, rather than taking over every calculation in a program. There is of course overhead involved, and MKL may even use multiple threads and different algorithms depending on the parameters you pass to it, so each computation needs to be of a certain size before it becomes beneficial.
Well, of course. AVX-512 (just like any other optimization feature) will be used when it's expected to make the program faster, not slower. :)
I hope this is obvious and acceptable for everyone.
No one said that suddenly all calculations are done using AVX-512.

The main idea I wanted to get across is that you don't have to consciously call Intel MKL. That's the whole point of high-level programming, after all.
Some people seem to think utilizing AVX-512 requires rewriting programs, or that the code will only work on Intel because it's full of "multiplyUsingAvx512" or whatever.

And as more and more libraries use AVX-512 (via Intel MKL or otherwise), more and more problems run faster on modern Intel CPUs. That's the phenomenon @xkm1948 noticed.
That's the phenomenon we notice all the time as drivers and libraries evolve.
The only difference with AVX-512 is that it offers a sudden, large boost.
If you take a standard Linux distro or Windows, most of it is compiled for the ISA level of old Athlon64 (so x86-64 with SSE2/3).
The OS - yes. The software and libraries? Not really.

Most industry-standard software/environments will utilize Intel MKL by default on Intel systems. That includes things like Matlab, Autodesk solvers, Anaconda and many rendering engines (as noted earlier). In other words: one doesn't have to care about it (and shouldn't have to, because these products are made for analysts/engineers).

On Windows everything is usually supplied with software, so one doesn't really have to care.
On Linux programs tend to use OS-wide libraries / interpreters / compilers, so sometimes using Intel MKL requires a bit of work. But it shouldn't be an issue for people who already tolerate Linux.
E.g. compiling NumPy with Intel MKL:
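For a quick sanity check of whether an existing NumPy build actually ended up linked against MKL (this isn't the build procedure itself, just a way to verify the result):

import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build was linked against.
# An Anaconda install on Intel will typically mention "mkl" here,
# while a plain pip wheel usually shows OpenBLAS instead.
np.show_config()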

But honestly, there's really no good reason not to use the compilers or distributions provided by Intel (for C++, Python and so on), since they usually work better. And you've already paid for them in the "Intel tax" or whatever you want to call it. :)

Of course all of this is only relevant for admins and home users. In a pro situation an analyst/engineer is provided with an optimized environment.

Yep, I'm sure.
Which rendering engine?
I suspect Intel paid Chaos Group to add Embree features just to get their foot in the door and have another industry name they could add to the marketing image you presented.
No offense, but your life must be quite sad if you think Intel has to pay anyone to convince them to add optimizations for Xeons (~95% of servers).
Software companies just do it. They don't compete with Intel. They compete with other software companies, who may add these optimizations as well.
That's how computing works.
 
Joined
Sep 28, 2012
Messages
964 (0.23/day)
System Name Poor Man's PC
Processor AMD Ryzen 5 7500F
Motherboard MSI B650M Mortar WiFi
Cooling ID Cooling SE 206 XT
Memory 32GB GSkill Flare X5 DDR5 6000Mhz
Video Card(s) XFX Merc 310 RX 7900 XT
Storage XPG Gammix S70 Blade 2TB + 8 TB WD Ultrastar DC HC320
Display(s) Mi Gaming Curved 3440x1440 144Hz
Case Asus A21
Audio Device(s) MPow Air Wireless + Mi Soundbar
Power Supply Enermax Revolution DF 650W Gold
Mouse Logitech MX Anywhere 3
Keyboard Logitech Pro X + Kailh box heavy pale blue switch + Durock stabilizers
VR HMD Meta Quest 2
Benchmark Scores Who need bench when everything already fast?
Bummer, this explains why AMD has been mostly silent about their upcoming Threadripper.
I was already kinda suspicious when they changed the motherboard naming convention from the X prefix to TRX, and there's no leaked BIOS from any motherboard manufacturer.
Oh well, guess I'll wait for reviews comparing the 3950X against the lowest Threadripper, and their total platform cost, before deciding to jump.
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The main idea I wanted to get across is that you don't have to consciously call Intel MKL. That's the whole point of high-level programming, after all.
Some people seem to think utilizing AVX-512 requires rewriting programs, or that the code will only work on Intel because it's full of "multiplyUsingAvx512" or whatever.
AVX(2) has proven to scale very well on AMD too, sometimes even better relatively speaking vs. no AVX, which probably has to do with the preparations needed to use any kind of SIMD, which actually help lighten the load on the front end of the CPU. Too bad AMD is still not implementing AVX-512. Once client applications start to utilize it properly, people will not look back.

As to optimizing programs in general: in most code bases only a fraction of the code is actually performance critical. Even in those rare cases where applications contain assembly optimizations, it's usually just a couple of tiny spots at the choke point of an algorithm in a tight loop somewhere, where using assembly results in a 50% gain or something. Most such cases are ideal for SIMD, which means you would use AVX intrinsics instead of assembly instructions (even though these essentially map directly to AVX instructions), and now you may get a 10-50x gain instead.

Optimizing applications usually focuses on optimizing algorithms and some of the core engine surrounding them. Most performance-critical paths are fairly limited in terms of the code involved, and throughout the rest of the application code your high-level abstractions will not make any significant impact on performance at all. But in that critical path, all bloat like abstractions, function calls, non-linear memory accesses and branching will come at a cost. So the first step of optimizing this is removing abstractions and branching (to the extent possible), then cache-optimizing it with linear memory layouts (this step may require rewriting the code surrounding the algorithm). By this point you should have tight loops without function calls, memory allocations etc. inside them. Only then are you ready to use SIMD, but you will also get huge gains from doing so. Doing this kind of optimization is generally not possible in languages higher-level than C++. You may still be able to interface with libraries containing low-level optimizations and get some gains there, but that would be it.

One piece of good news about writing code for SIMD is that the preparations are the same, so upgrading existing code from SSE to AVX or a newer version of AVX is simple: just change a few lines of code and tweak some loops. The hard work is the preparation for SIMD.

The OS - yes. The software and libraries? Not really.
That's where you're wrong.
I mentioned Intel's Clear Linux; one of its main features is the optimizations to libc, the runtime for the C standard library. Almost every native application uses this library. The gains are usually in the 5-30% range, so nothing fantastic, but that's a free performance gain for most of your applications, and who doesn't want that? The only reason I don't use it is that it's an experimental rolling-release distro, and I need a stable machine for work and don't have time to spend all day troubleshooting bugs. If it were properly supported by e.g. Ubuntu, I would switch to it.

The counterparts of libc for MS Visual Studio are msvcrt.dll and msvcpxx.dll, which most of your heavier applications and games rely on. There are of course some optimizations in MS's standard library, and they even open-sourced some of it recently, but from what I can see it's mostly older SSE. If these were updated to utilize AVX2 or even AVX-512, I'm sure many Windows users would appreciate it. The problem is compatibility: they would either have to ship two versions or drop hardware support.

On Windows everything is usually supplied with software, so one doesn't really have to care.
On Linux programs tend to use OS-wide libraries / interpreters / compilers, so sometimes using Intel MKL requires a bit of work. But it shouldn't be an issue for people who already tolerate Linux.
Another misconception.
Most Windows software relies on MS Visual Studio's runtime libraries, including everything bundled with Windows itself, which is why most Windows applications don't have to "care". The only time they do is when they are compiled with a more recent Visual Studio version, as Visual Studio chooses to duplicate the library for every new version, which I assume is their approach to compatibility.
This is no different from Linux, except it uses libc instead, which comes bundled with every Linux distro. Most Linux software is also POSIX-compliant, which makes it easy to port to BSD, macOS, and even Windows (which has partial compliance).

What may have confused you are the GUI libraries etc., since there is not one "GUI API" like on Windows. Linux applications may rely on GTK, Qt, wxWidgets and many others. When such applications are ported to Windows, they usually need runtimes for those libraries; examples you may be familiar with include Firefox, VLC, LibreOffice, GIMP, HandBrake, VirtualBox, TeamViewer, etc.
 
Joined
Feb 20, 2019
Messages
7,305 (3.86/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Oddyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Which rendering engine?

3.6, distributed bucket renders. I'm pretty sure the team is not using RT for the CPU farm, since that farm is for models/scenes that use features not supported in RT.
The RT farm is outside the scope of this thread anyway, since that's GPU-dependent, and NEXT is a little too early to call production-ready, simply because of the massive VRAM/RAM overheads incurred with hybrid rendering.

No offense, but your life must be quite sad if you think Intel has to pay anyone to convince them to add optimizations for Xeons (~95% of servers).
Software companies just do it. They don't compete with Intel. They compete with other software companies, who may add these optimizations as well.
That's how computing works.
My life probably is pretty sad; I flew all the way to SIGGRAPH this summer to listen to Vlado Koylazov (Chaos Group's lead developer) on stage. Maybe I misinterpreted the segment on sponsorship and support, but both Nvidia and Intel are providing 'incentives'. That means a combination of financial support and developers on either side to get things working, both of which I interpret as "costs money". There's also the PR/marketing/promotional side of it, which is arguably cost-free, but I'd imagine that can't happen without the financial incentives greasing the wheels first.
 