Linus Torvalds Finds AVX-512 an Intel Gimmick to Invent and Win at Benchmarks

dragontamer5788 · Jul 13, 2020

efikkan said:
One of the interesting things about AVX is the vast feature set, which extends far beyond just arithmetic. It also supports things like comparisons with masks, which essentially enables you to do conditionals without branching logic, and the feature set of AVX-512 is almost like a new instruction set. The potential here is huge, but it's still "inaccessible" to most programmers. If we get to a point where writing clean C code can be compiled into decent AVX instructions, even with more complex calculations and some basic conditionals, that would be huge for the adoption of AVX.

That's actually what makes me most excited about AVX512. All of these new AVX512 features allow auto-vectorization to happen far more easily. The details are complicated, but... let's just say that NVidia CUDA and AMD OpenCL have been doing this stuff for over a decade on GPUs. Intel is finally giving CPU compilers the ability to do what GPU compilers have been doing all along. It requires some additional support from the CPU instruction set to ease auto-vectorization and provide more SIMD-based branching controls. But once provided, the theory is already well studied from 1980s SIMD computers and is well known.

Honestly, Linus Torvalds is very clearly out of his depth in this subject matter. I'm no expert, but I can confidently say that I know more than Linus on this subject based on what he's saying here.

AVX and AVX2 are over a decade behind GPU-SIMD computers. AVX512 finally brings parity to CPU-autovectorizers to what GPUs have been doing since 2006. AVX512 is actually a really well designed instruction set... but Intel is certainly messing up the business side of things IMO.
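
To make the "SIMD-based branching controls" idea concrete, here is a rough plain-C model of how a per-lane conditional can be handled with a mask instead of a jump. It is only a sketch under assumptions: the names `LANES`, `lanes_less_than`, `lanes_select` and `clamp_or_increment` are made up for illustration, the loops stand in for single vector instructions, and input values are assumed small enough that `x[i] + 1` does not overflow.

```c
#include <stdint.h>

#define LANES 16  /* 16 x 32-bit lanes = 512 bits (illustrative) */

/* Compare every lane and pack the results into one bit per lane. */
uint16_t lanes_less_than(const int32_t a[LANES], const int32_t b[LANES]) {
    uint16_t mask = 0;
    for (int i = 0; i < LANES; i++)
        mask |= (uint16_t)((a[i] < b[i]) << i);
    return mask;
}

/* Per lane: take then_v where the mask bit is set, else_v otherwise. */
void lanes_select(uint16_t mask, const int32_t then_v[LANES],
                  const int32_t else_v[LANES], int32_t out[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = ((mask >> i) & 1) ? then_v[i] : else_v[i];
}

/* out[i] = (x[i] < limit[i]) ? x[i] + 1 : limit[i]
   Both sides are computed for all lanes; the mask picks the result. */
void clamp_or_increment(const int32_t x[LANES], const int32_t limit[LANES],
                        int32_t out[LANES]) {
    int32_t then_v[LANES];
    for (int i = 0; i < LANES; i++)
        then_v[i] = x[i] + 1;
    uint16_t mask = lanes_less_than(x, limit);
    lanes_select(mask, then_v, limit, out);
}
```

The point of the model is that the "if" turns into data (a 16-bit mask) rather than control flow, which is roughly what GPU compilers have been doing with per-thread predicates.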

efikkan · Jul 13, 2020

dragontamer5788 said:
That's actually what makes me most excited about AVX512. All of these new AVX512 features allow auto-vectorization to happen far more easily. The details are complicated, but... let's just say that NVidia CUDA and AMD OpenCL have been doing this stuff for over a decade on GPUs. Intel is finally giving CPU compilers the ability to do what GPU compilers have been doing all along. It requires some additional support from the CPU instruction set to ease auto-vectorization and provide more SIMD-based branching controls. But once provided, the theory is already well studied from 1980s SIMD computers and is well known.

Yes, and the interesting thing is that this would solve most of the scaling problems with code, which as you probably know are branching and cache misses. Most branching inside algorithms doesn't actually affect the bigger control flow of the code, but put just 3-4 of these together and you're pretty much guaranteed one or more stalls. I often call these "false branching", and sometimes do clever things to try to eliminate them, like bitwise operations, conditional moves etc. But AVX can resolve a lot of this; it really comes down to being able to write clean, readable code which translates into optimal AVX instructions. I still find it a daunting task to write anything but smaller pieces using intrinsics though.
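
As a small illustration of removing a "false branch" with bitwise operations, the sketch below sums only the elements above a threshold, once with an `if` and once without. The function names are hypothetical, and whether a given compiler emits a jump, a conditional move or vector code for either version depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum only the elements above a threshold, using an if statement. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Same result with no branch in the loop body: the comparison yields
   0 or 1, negating it yields 0 or all-ones, and the AND either keeps
   the element or zeroes it. */
int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t keep = -(int32_t)(v[i] > threshold);
        sum += v[i] & keep;
    }
    return sum;
}
```

Both functions return the same value for any input; the second one just makes the data dependence explicit so nothing in the loop body needs to be predicted.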

dragontamer5788 said:
Honestly, Linus Torvalds is very clearly out of his depth in this subject matter. I'm no expert, but I can confidently say that I know more than Linus on this subject based on what he's saying here.

I have tremendous respect for Mr Torvalds and am a big fan of his two software creations, and I know he is a very smart man. But this doesn't make every outburst from him gold, and most of what he said here is not accurate.

The only part I could agree about is some of the more application specific instructions (like "AI" stuff). I believe a standard ISA should be generic compute and logic, not application specific. So in my opinion, throw out all the AES, zip, jpeg(!) etc. acceleration instructions, and give us four 512-bit FMA-sets instead.

dragontamer5788 · Jul 13, 2020

I often call these "false branching", and sometimes do clever things to try to eliminate them, like bitwise operations, conditional moves etc

My favorite is "max", "min", and similar operations.

Consider your typical "comparison" for a sorting problem. You'd think you need an "if" statement, but in reality... you can make do with:

Code:

higher = max(a, b);
lower = min(a, b);

The max/min version of the code is branchless at the lowest level, thanks to instructions like vpmaxud. And all of a sudden, your for-loop starts to look far more auto-vectorizable and branchless.
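
For a slightly bigger picture, here is a minimal sketch of a whole sort built only from that min/max compare-exchange. It uses odd-even transposition sort, which is my choice for illustration and not something named in the thread; `min_u32`, `max_u32` and `sort_minmax` are hypothetical helper names. Unsigned 32-bit values are used to match what vpmaxud operates on.

```c
#include <stddef.h>
#include <stdint.h>

/* Compilers commonly lower these to min/max or conditional-move
   instructions rather than jumps, though that is not guaranteed. */
static inline uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static inline uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Odd-even transposition sort: n passes, each made of independent
   compare-exchanges on neighbouring pairs. */
void sort_minmax(uint32_t *v, size_t n) {
    for (size_t pass = 0; pass < n; pass++) {
        for (size_t i = pass % 2; i + 1 < n; i += 2) {
            uint32_t lower  = min_u32(v[i], v[i + 1]);
            uint32_t higher = max_u32(v[i], v[i + 1]);
            v[i]     = lower;
            v[i + 1] = higher;
        }
    }
}
```

It does O(n^2) work, so it is not a replacement for a real sort, but every compare-exchange within a pass touches a different pair, which is the property that makes this style friendly to SIMD.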

Kanan · Jul 13, 2020

dragontamer5788 said:
Honestly, Linus Torvalds is very clearly out of his depth in this subject matter. I'm no expert, but I can confidently say that I know more than Linus on this subject based on what he's saying here.

I'm pretty sure it was one of his usual rants; he does that sometimes. I too agree that AVX512 is definitely far from useless, BUT its availability, as well as the feature set itself, is far too fragmented. Linus's point still holds: Intel made a mess of it.

efikkan · Jul 13, 2020

dragontamer5788 said:
My favorite is "max", "min", and similar operations.

Consider your typical "comparison" for a sorting problem. You'd think you need an "if" statement, but in reality... you can make do with:

Code:

higher = max(a, b);
lower = min(a, b);

The max/min version of the code is branchless at the lowest level, thanks to instructions like vpmaxud. And all of a sudden, your for-loop starts to look far more auto-vectorizable and branchless.

Yeah, that's the kind of stuff I've been doing, like mostly creating simple inline functions with vector and matrix maths, but not whole algorithms yet. But SIMD is very suited for algorithms designed in a data oriented approach, I imagine for things like line intersections, collisions, etc. I'm sure some software architects' heads will explode though

mtcn77 · Jul 13, 2020

Cheeseball said:
But for AI and machine learning this is advantageous

Quadros can handle FP64 fine. What's lacking is FP16

What about tensors? I think vectors count as rank 1 tensors, so we should be able to compare the two.

dragontamer5788 · Jul 13, 2020

Kanan said:
I'm pretty sure it was one of his usual rants; he does that sometimes. I too agree that AVX512 is definitely far from useless, BUT its availability, as well as the feature set itself, is far too fragmented. Linus's point still holds: Intel made a mess of it.

Yeah, Linus definitely has a habit of ranting online and leaving his field of expertise. And to be fair: so do I. We're only human after all. It just means that you gotta be on guard and always critically read what Linus is saying. He's clearly a smart guy (probably smarter than me in most aspects of programming). But don't ever grow complacent.

AVX512's main issues are business related. It's locked out of mainstream Skylake chips (typical i7s), so it's not really a common compilation target. It was originally a Knights Landing feature (aka: Xeon Phi), which is a dead end.

efikkan said:
Yeah, that's the kind of stuff I've been doing, like mostly creating simple inline functions with vector and matrix maths, but not whole algorithms yet. But SIMD is very suited for algorithms designed in a data oriented approach, I imagine for things like line intersections, collisions, etc. I'm sure some software architects' heads will explode though

I suggest reading through this dissertation by the way: https://www.cs.cmu.edu/~guyb/papers/Ble90.pdf

Blelloch's dissertation from 1990 would seem out-of-date at first glance. But in reality, modern SIMD machines (both AVX512 and GPUs) are heavily based on the Connection Machine (CM-2) he used as the basis of his dissertation. As such, his dissertation reads amazingly close to modern machines.

Dr. Blelloch's more recent papers map more closely to modern machines: https://www.cs.cmu.edu/~guyb/

Just some food for thought. I wouldn't try to do the "flattened nested parallelism" from the top-down in every algorithm. It's unlikely to be fast on all modern architectures. But what's interesting is that Dr. Blelloch has proven an equivalence between recursive definitions and the prefix scan-operations. As such, we have a "universal gadget" to try to convert recursive forms of algorithms into prefix-sum, prefix-max, and similar operations.

Not that the gadget is always efficient on a modern SIMD machine. It's absolutely not... but maybe restating the problem in a prefix-sum style provides insight and gives you ideas for a more efficient algorithm.

---------

You don't have to go very far to be amazed. As early as Chapter 1, Dr. Blelloch converts recursive quicksort (yes, quicksort) into prefix sum operations.
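
To give a flavour of how a quicksort step can be phrased with prefix sums, here is a simplified single-level partition in plain C. This is a sketch of the general idea only, not the dissertation's exact formulation (which works on whole segments at once); `exclusive_scan` and `split_by_pivot` are hypothetical names, and the scan is written serially here even though on a SIMD machine it would run in parallel.

```c
#include <stddef.h>
#include <stdint.h>

/* Exclusive prefix sum: out[i] = in[0] + ... + in[i-1].
   Returns the total of all inputs. */
size_t exclusive_scan(const size_t *in, size_t *out, size_t n) {
    size_t running = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = running;
        running += in[i];
    }
    return running;
}

/* Stable partition around a pivot using a scan instead of swaps.
   flags and offsets are caller-provided scratch arrays of length n.
   Returns how many elements are less than the pivot. */
size_t split_by_pivot(const int32_t *src, int32_t *dst, size_t n,
                      int32_t pivot, size_t *flags, size_t *offsets) {
    for (size_t i = 0; i < n; i++)
        flags[i] = (size_t)(src[i] < pivot);
    size_t n_less = exclusive_scan(flags, offsets, n);
    for (size_t i = 0; i < n; i++) {
        /* offsets[i] = earlier elements below the pivot;
           i - offsets[i] = earlier elements at or above it. */
        size_t target = flags[i] ? offsets[i] : n_less + (i - offsets[i]);
        dst[target] = src[i];
    }
    return n_less;
}
```

Every element computes its own destination independently once the scan is done, so the only step with a sequential dependency is the scan itself, and that is exactly the primitive these machines are good at.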

efikkan · Jul 13, 2020

dragontamer5788 said:
AVX512's main issues are business related. It's locked out of mainstream Skylake chips (typical i7s), so it's not really a common compilation target. It was originally a Knights Landing feature (aka: Xeon Phi), which is a dead end.

It's important to remember that Intel's intention was to release Skylake-SP/X and Ice Lake (client) pretty close together. Coffee Lake(s) and Comet Lake were emergency backup plans. So if anything, their business failure is in failing to have a backported Sunny Cove etc. just in case 10nm failed. This AVX-512 inconsistency was never their intention, but still ultimately their "fault".

dragontamer5788 said:
I suggest reading through this dissertation by the way: https://www.cs.cmu.edu/~guyb/papers/Ble90.pdf
…

Thanks.
Some good corona-times reading

trparky · Jul 14, 2020

efikkan said:
No it does not. Unless the CPU reaches a thermal or power limit, it will not throttle the whole CPU, it does not slow down all cores. Loads of applications use AVX to some extent in the background, including compression, web browsers and pretty much anything which deals with video.

Then tell me why there is an AVX Offset in UEFI?

If I understand the concept of the AVX Offset correctly, it's a setting that if you set it at 5 the processor will down-clock from the highest speed by that setting. In the case of a setting of 5 the processor will down-clock by 500 MHz when executing AVX instructions.
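
The arithmetic behind that description can be written out as a tiny helper. This is only a sketch of the offset math as described above: `avx_clock_mhz` is a made-up name, and the 100 MHz base clock and the example multiplier in the usage note are assumptions, not values from any particular board.

```c
/* Core clock under AVX load = (multiplier - offset) * base clock.
   Assumes the offset is expressed in multiplier steps. */
unsigned avx_clock_mhz(unsigned multiplier, unsigned avx_offset,
                       unsigned bclk_mhz) {
    unsigned effective =
        multiplier > avx_offset ? multiplier - avx_offset : 0;
    return effective * bclk_mhz;
}
```

With a hypothetical 50x multiplier on a 100 MHz base clock, `avx_clock_mhz(50, 5, 100)` gives 4500, i.e. 500 MHz below the 5000 MHz you get with an offset of 0.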

windwhirl · Jul 14, 2020

trparky said:
Then tell me why there is an AVX Offset in UEFI?

If I understand the concept of the AVX Offset correctly, it's a setting that if you set it at 5 the processor will down-clock from the highest speed by that setting. In the case of a setting of 5 the processor will down-clock by 500 MHz when executing AVX instructions.

Don't know about that AVX Offset thing in UEFI (I don't do overclocks, after all), but you may be referring to this:

SIMD instructions lowering CPU frequency

I read this article. It talked about why AVX-512 instruction: Intel’s latest processors have advanced instructions (AVX-512) that may cause the core, or maybe the rest of the CPU to run slower b...

stackoverflow.com

Frequency Behavior - Intel - WikiChip

The Frequency Behavior of Intel's CPUs is complex and is governed by multiple mechanisms that perform dynamic frequency scaling based on the available headroom.

en.wikichip.org

It's documented behavior that Intel processors have different frequency sets according to whatever is running on it.

windwhirl said:
TLDR, it seems to affect only Turbo frequencies, in the first place, and how much it will downclock will depend on the type and number of instructions executed. AVX512 does trigger this throttling a bit more, while AVX and AVX2 do it less or don't even do so at all.

R-T-B · Jul 14, 2020

Also, since hyperthreading lets at most two threads run on a core, you'll rarely "slow down" an integer thread on the same core as an AVX instruction. Most of the time, it will rapidly downclock for AVX, execute that instruction with reduced clocks (and still better performance than if it hadn't), and then switch back and do whatever integer thing it was doing at full speed. No penalty. The only situation where there would be a penalty is if it literally executed some kind of AVX and had TIME LEFT OVER (unlikely) to then execute an integer instruction, which would be forced to execute at the lower clock. This is exceedingly rare in practice, I'd picture.

efikkan · Jul 14, 2020

trparky said:
Then tell me why there is an AVX Offset in UEFI?

If I understand the concept of the AVX Offset correctly, it's a setting that if you set it at 5 the processor will down-clock from the highest speed by that setting. In the case of a setting of 5 the processor will down-clock by 500 MHz when executing AVX instructions.

The claim was that any AVX code would impact any other code running on the CPU, and that's simply not the case. A single core can throttle with a lot of AVX, but the CPU runs AVX all the time without any problem.

The purpose of the AVX offset is for overclockers to push non-AVX workloads to a higher clock speed.

R-T-B said:
Also, since hyperthreading lets at most two threads run on a core, you'll rarely "slow down" an integer thread on the same core as an AVX instruction. Most of the time, it will rapidly downclock for AVX, execute that instruction with reduced clocks (and still better performance than if it hadn't), and then switch back and do whatever integer thing it was doing at full speed. No penalty. The only situation where there would be a penalty is if it literally executed some kind of AVX and had TIME LEFT OVER (unlikely) to then execute an integer instruction, which would be forced to execute at the lower clock. This is exceedingly rare in practice, I'd picture.

The CPUs are superscalar, so technically they can execute both integer instructions and vector instructions at the same time, and they often do. E.g. you have a loop with dense math: the math is AVX, but the loop is not. But it's not a problem, as the alternative would be to run much more code, so even if a few instructions technically run slower, the overall workload is still a lot faster.

Running the same calculations as AVX greatly reduces the instruction count and the clock cycles needed. It also effectively unrolls the loops further, which again reduces the loop code and the branching associated with it. Denser code also helps the data caches, instruction caches, data dependencies and branch prediction.
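
The effect on loop overhead can be sketched in plain C without any intrinsics. In the hedged model below, the inner fixed-length loop stands in for a single 8-wide vector multiply-add (compilers often turn such a loop into vector code, but that depends on compiler and flags); `WIDTH`, `axpy_scalar` and `axpy_blocked` are hypothetical names, and the returned trip count is only there to make the difference visible.

```c
#include <stddef.h>

#define WIDTH 8  /* 8 floats = one 256-bit AVX register */

/* y[i] += a * x[i], one element per loop trip. Returns the trip count. */
size_t axpy_scalar(float a, const float *x, float *y, size_t n) {
    size_t trips = 0;
    for (size_t i = 0; i < n; i++, trips++)
        y[i] += a * x[i];
    return trips;
}

/* Same math in blocks of WIDTH. The loop counter and branch are still
   scalar work, but they now run once per block instead of once per
   element. */
size_t axpy_blocked(float a, const float *x, float *y, size_t n) {
    size_t trips = 0;
    size_t i = 0;
    for (; i + WIDTH <= n; i += WIDTH, trips++) {
        for (size_t lane = 0; lane < WIDTH; lane++)
            y[i + lane] += a * x[i + lane];
    }
    for (; i < n; i++, trips++)  /* scalar tail for leftover elements */
        y[i] += a * x[i];
    return trips;
}
```

For 20 elements the scalar version takes 20 trips and the blocked one takes 6 (two full blocks plus four tail elements), which is the "less loop code and branching" effect in miniature.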

Kanan · Jul 14, 2020

dragontamer5788 said:
Yeah, Linus definitely has a habit of ranting online and leaving his field of expertise. And to be fair: so do I. We're only human after all. It just means that you gotta be on guard and always critically read what Linus is saying. He's clearly a smart guy (probably smarter than me in most aspects of programming). But don't ever grow complacent

Linus Torvalds is well appreciated by me anyway. I respect people who are publicly bold, direct and honest; it is a rare trait. His most famous moment was when he gave Nvidia the middle finger at a conference, which was well deserved. Big companies must always be tested and questioned; they should not get a free pass, or they will always abuse it in the name of capitalism and their shareholders.

trparky · Jul 14, 2020

efikkan said:
The claim was that any AVX code would impact any other code running on the CPU, and that's simply not the case. A single core can throttle with a lot of AVX, but the CPU runs AVX all the time without any problem.

The purpose of the AVX offset is for overclockers to push non-AVX workloads to a higher clock speed.

So, in other words, nothing to be alarmed about. It's there but it's not going to cause too many slowdowns unless your cooling setup is really that shitty.

dragontamer5788 · Jul 14, 2020

Hmmm... I recall some very, very, very smart people discussing AVX512 downclocking / slowdown issues. I don't recall what they said about it however.

My perspective is that these microarchitectural issues (ie: downclocking or whatnot) will absolutely change by the next major "tick-tock" architecture from Intel. Intel's first implementation of any SIMD has always been crappy.

When AVX was first released, it was executed 128-bits at a time (Sandy Bridge). It was missing integer instructions: that's right, you could do 53-bit double-precision multiplies but you couldn't do 32-bit integer multiplies. All sorts of terrible. Eventually, Haswell + AVX2 came out and fixed the issues, finally making the AVX transition mostly worthwhile over SSE instructions. But all of the flamewars from the early 2010s about "is AVX worth it" look hopelessly outdated in today's environment.

I guess my point is... don't judge the AVX512 instruction set based on its current implementation (ie: Skylake-X). Skylake-X is clearly a "bad" implementation of AVX512. We should instead judge AVX512 based on its future viability. Focusing too much on Skylake-X's performance quirks will make our comments obsolete quicker.

-------------

Case in point: the CNS AVX512 chip (yeah, Via-chips. Surprise!!) can support AVX512 at full clock speeds. It does this by implementing all AVX512 instructions as 256-bit instructions executed over 2x clock ticks. No downclocking involved at all. Maybe this 2x256-bit methodology will be superior in the future, and Intel will copy it. Or maybe Intel figures out the 512-bit power issues and removes the need for downclocking.

Even as a 2x256-bit implementation, AVX512 has enough bonuses (auto-vectorization instructions, opcode masks, scatter instructions, extended register sets) that it's worthwhile to use.
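
As a rough software model of the 2x256-bit idea, the sketch below performs a 16-lane masked add as two 8-lane halves, splitting the 16-bit mask along with the data. It is a conceptual illustration only and says nothing about how any real chip schedules the halves; `masked_add_8` and `masked_add_16_as_2x8` are hypothetical names, and the mask uses merge semantics (lanes with a 0 bit keep their old value).

```c
#include <stdint.h>

/* 8-lane merge-masked add: lanes whose mask bit is 0 keep their
   previous dst value. */
static void masked_add_8(int32_t *dst, const int32_t *a, const int32_t *b,
                         uint8_t k) {
    for (int i = 0; i < 8; i++)
        if ((k >> i) & 1)
            dst[i] = a[i] + b[i];
}

/* 16-lane masked add issued as two 8-lane halves.
   Returns the number of passes taken. */
int masked_add_16_as_2x8(int32_t dst[16], const int32_t a[16],
                         const int32_t b[16], uint16_t k) {
    masked_add_8(dst,     a,     b,     (uint8_t)(k & 0xFF));  /* low half  */
    masked_add_8(dst + 8, a + 8, b + 8, (uint8_t)(k >> 8));    /* high half */
    return 2;
}
```

The caller sees the same result as a single 16-lane operation would give; only the number of passes differs, which is why the instruction set features stay useful even when the hardware is narrower.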

windwhirl · Jul 14, 2020

dragontamer5788 said:
I guess my point is... don't judge the AVX512 instruction set based on its current implementation (ie: Skylake-X). Skylake-X is clearly a "bad" implementation of AVX512. We should instead judge AVX512 based on its future viability. Focusing too much on Skylake-X's performance quirks will make our comments obsolete quicker.

That's what I'm looking forward to with AVX-512: seeing how Intel implements it in their next products and what improvements they make.

And if that chart is correct, a larger subset available on more mainstream CPUs (not just top-of-the-line Extreme Edition CPUs or Xeons) could make it worthwhile for devs and programmers of all kinds of work to use it.

efikkan · Jul 14, 2020

dragontamer5788 said:
My perspective is that these microarchitectural issues (ie: downclocking or whatnot) will absolutely change by the next major "tick-tock" architecture from Intel. Intel's first implementation of any SIMD has always been crappy.
<snip>
Case in point: the CNS AVX512 chip (yeah, Via-chips. Surprise!!) can support AVX512 at full clock speeds. It does this by implementing all AVX512 instructions as 256-bit instructions executed over 2x clock ticks. No downclocking involved at all. Maybe this 2x256-bit methodology will be superior in the future, and Intel will copy it. Or maybe Intel figures out the 512-bit power issues and removes the need for downclocking.

Intel's power issues are probably related to the node. The AVX-512 units are pretty large and need to be in sync. I assume at 10nm and 7nm the voltage needed will be less, and the power much more under control.

Via's decision to do it over two cycles probably has to do with saving die space. Zen(1) did something similar with AVX2.

windwhirl said:
That's what I'm looking forward to with AVX-512: seeing how Intel implements it in their next products and what improvements they make.
View attachment 162210
And if that chart is correct, a larger subset available on more mainstream CPUs (not just top-of-the-line Extreme Edition CPUs or Xeons) could make it worthwhile for devs and programmers of all kinds of work to use it.

While those charts might look a bit intimidating, most of the common features are covered by the F and CD sets, and these also require the most die space.
BTW; you can see the massive list of instructions in the F set here.

R-T-B · Jul 15, 2020

efikkan said:
The CPUs are superscalar, so technically they can execute both integer instructions and vector instructions at the same time, and they often do. E.g. you have a loop with dense math: the math is AVX, but the loop is not. But it's not a problem, as the alternative would be to run much more code, so even if a few instructions technically run slower, the overall workload is still a lot faster.

Ah yes, you are correct, even if the conclusion is technically the same.
