
AMD Zen 5 "Strix Point" Processors Rumored To Feature big.LITTLE Core Design

Joined
Apr 12, 2013
Messages
6,749 (1.68/day)
Apple is doing it out of greed, so gotta be cool. #3.5mmAudioJacks
They aren't doing one with x86 & the other with ARM, so no.
ARM hardware is a lot more power efficient so what if the OS could run on 15W of ARM hardware while the x86 cores slept?
No one's doing that, at least in the immediate future. There was a leak(?) that AMD might do something like that IIRC with the K12 or something but that's it?
 
Joined
Jul 9, 2015
Messages
3,413 (1.06/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores a good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
They aren't doing one with x86 & the other with ARM, so no.
And your point was?
They aren't doing one with an audio jack and one without, either.

Giving customers choice is really not Apple style, so what was your point again?
 
Joined
Apr 12, 2013
Messages
6,749 (1.68/day)
Why would you even want to have two clusters with entirely different ISAs?
They're not mixing and matching ISAs in the same chip; that's what you quoted. As for big.LITTLE, it is the future as far as efficiency is concerned. You can hate it or like it as much as you want, but there's no beating it right now.
 
Joined
Jan 8, 2017
Messages
8,929 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Purely a theoretical thing, because of the way the world is moving, with Apple causing a shift over to ARM.

ARM hardware is a lot more power efficient so what if the OS could run on 15W of ARM hardware while the x86 cores slept?
It can't work like that: applications make system calls frequently, which means all of the software would still end up using the ARM cores all the time, not just the OS.

Targeting different cores based on the type of instruction isn't impossible; it has been done to some degree in recent SoCs. Apparently Samsung's chips are capable of tracking threads and executing 64-bit code on the "big" cores and 32-bit code on the "middle" and "small" ones. To what extent that actually happens in practice I don't know; my guess is not a whole lot, since the scheduling and context switching could be very expensive.

Big cores are surprisingly energy efficient these days. LITTLE cores manage to win in some applications, however: low and slow. In particular, very low clocks (like 200 MHz) and much lower voltages with non-CPU-heavy tasks. If the CPU is the bottleneck, you probably want the big core. DDR4 uses a good chunk of power, as do the L3, L2, and L1 caches and the memory controller. As long as there's work to do, a bigger core can beat LITTLE cores in efficiency.
That's not the purpose of the little cores; they're not meant to offer the highest degree of energy efficiency. Their role is simply to offer lower power consumption in absolute terms, even if the efficiency is actually worse overall.

The problem with the big cores is that they leak power and become increasingly inefficient the lower the clock speed and utilization get. So it turns out that when the workload is very light, the smaller cores end up consuming less power even if the execution is much slower and less efficient, and that's very useful because burst performance doesn't always matter.
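To put rough, invented numbers on that: say a big core draws 4 W and finishes a background task in 1 s, spending 4 J, while a little core draws 0.5 W and takes 10 s, spending 5 J. The little core is less efficient per unit of work, yet it never draws more than 0.5 W, and that cap on absolute power is the whole point when the work is a trickle rather than a burst.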
 
Joined
May 31, 2016
Messages
4,324 (1.50/day)
Location
Currently Norway
System Name Bro2
Processor Ryzen 5800X
Motherboard Gigabyte X570 Aorus Elite
Cooling Corsair h115i pro rgb
Memory 16GB G.Skill Flare X 3200 CL14 @3800Mhz CL16
Video Card(s) Powercolor 6900 XT Red Devil 1.1v@2400Mhz
Storage M.2 Samsung 970 Evo Plus 500MB/ Samsung 860 Evo 1TB
Display(s) LG 27UD69 UHD / LG 27GN950
Case Fractal Design G
Audio Device(s) Realtec 5.1
Power Supply Seasonic 750W GOLD
Mouse Logitech G402
Keyboard Logitech slim
Software Windows 10 64 bit
So it is still mostly about power. I really wonder how much power this little-core approach will save under low workloads versus a big core.
If AMD goes that route, I wonder if there will still be 8, 12, or 16 big cores with smaller ones to accompany them, or whether the number of big cores will be lowered.
 

las

Joined
Nov 14, 2012
Messages
1,533 (0.37/day)
System Name Obsolete / Waiting for Zen 5 or Arrow Lake
Processor i9-9900K @ 5.2 GHz @ 1.35v / No AVX Offset
Motherboard AsRock Z390 Taichi
Cooling Custom Water
Memory 32GB G.Skill @ 4000/CL15
Video Card(s) Gainward RTX 4090 Phantom / Undervolt + OC
Storage Samsung 990 Pro 2TB + WD SN850X 1TB + 64TB NAS/Server
Display(s) 27" 1440p IPS @ 280 Hz + 77" QD-OLED @ 144 Hz VRR
Case Fractal Design Meshify C
Audio Device(s) Asus Essence STX / Upgraded Op-Amps
Power Supply Corsair RM1000x / Native 12VHPWR
Mouse Logitech G Pro Wireless Superlight
Keyboard Corsair K60 Pro / MX Low Profile Speed
Software Windows 10 Pro x64
At this point AMD is milking it; TSMC 5nm has been available for some time, but they're dragging it out to 2022.
Why bother? Scalpers, miners, and whatever stupid thing pops up next could make future chips impossible to obtain at normal prices.

You think it's AMD's decision to wait on 5nm? Haha. AMD will be able to use 5nm when Apple is done with it. Not a second before. Apple always gets priority at TSMC.

Apple will switch to 4nm for their 2022 releases in Q4. Then Zen 4 can use 5nm sometime in 2022.

Intel 10nm SuperFin happens in Q3 this year, and its density is close to TSMC 7nm.

I can't wait to see Intel 7nm vs AMD on TSMC 5 or 4nm in 2023 tho.

I will upgrade in 2023-2024 again, perfect timing. A big leap will happen in 2023-2024 for sure, and DDR5 will have matured by then too, plus PCIe 5.0 will be standard across the board. GPU prices and availability should have normalized too. New rig incoming.
 
Joined
Jan 3, 2021
Messages
2,660 (2.21/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
x64 and ARMv8 don't even have the same memory ordering model. They have ZERO possibility of inter-core communication. Since they are fundamentally unable to coexist, why not solder an extra Raspberry chip on your board?
When there's a need, there's a way.

Will there be a need to run both x64 and Arm applications natively on the same system? Quite possibly, I say, as a part of the transition from x86 to Arm. The transition can take many years and never even be complete, so many apps will be available for one or the other architecture exclusively. Both will need to run efficiently, without translation, on the PC.

Will there be a way? It's fundamentally, in all caps, possible. The x64 code would use the x64 version of system libraries, and those would avoid calling Arm libraries whenever possible. A call to Arm would have to wake up a (possibly sleeping) Arm core, and apart from that, it would be more complicated and slower than a simple subroutine call. Arm system libraries would have to be able to process data with Arm or x64 byte order*, and that's hard. And so on. It's up to Microsoft to decide if it's worth the hassle.

As for the Raspberry ... my 4-core PC probably has more than 4 Arm cores hiding in the SSD and other peripherals, no need for another one.

* or not? From Wikipedia: "Some instruction set architectures allow running software of either endianness on a bi-endian architecture. This includes ARM AArch64..."
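To make the byte-order hazard concrete, here's a minimal Python sketch using the standard struct module (purely illustrative; as the footnote says, AArch64 in practice runs little-endian like x64, so this may stay a theoretical worry):

Code:
import struct

value = 0xDEADBEEF

le = struct.pack("<I", value)  # little-endian bytes, as x64 stores a 32-bit int
be = struct.pack(">I", value)  # big-endian bytes

print(le.hex())  # efbeadde
print(be.hex())  # deadbeef

# A library handed raw bytes must know which order the producer used,
# or it silently reconstructs the wrong number:
print(struct.unpack(">I", le)[0] == value)  # False: wrong byte-order assumption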

"Schedulers aren't smart enough" to make these decisions. Heck, I don't think anyone is really smart enough to figure out the problem right now.
Schedulers that learn and adapt to each use case. I suspect we already have them in our PCs, in some form, as scheduling is a very complex task even without heterogeneous cores (due to NUMA, hyperthreading, etc.)
 
Joined
Apr 24, 2020
Messages
2,560 (1.76/day)
That's not the purpose of the little cores; they're not meant to offer the highest degree of energy efficiency. Their role is simply to offer lower power consumption in absolute terms, even if the efficiency is actually worse overall.

The problem with the big cores is that they leak power and become increasingly inefficient the lower the clock speed and utilization get. So it turns out that when the workload is very light, the smaller cores end up consuming less power even if the execution is much slower and less efficient, and that's very useful because burst performance doesn't always matter.

Yup yup. You get it. But I'm trying to emphasize that point because I feel like a lot of other people confuse the issue.

Schedulers that learn and adapt to each use case. I suspect we already have them in our PCs, in some form, as scheduling is a very complex task even without heterogeneous cores (due to NUMA, hyperthreading, etc.)

Schedulers are fancy learning algorithms, where "learning" is the old school 1980s definition, and not the modern "Deep Learning Neural Net" definition. I'm no expert, but read up on how 2.6.xx Linux's "Fair Scheduler" works.


I'm sure Linux has been upgraded since then, but that's what was taught in my college years, so it's the only scheduler I'm really familiar with. The Wikipedia link has a decent description:

When the scheduler is invoked to run a new process:

  1. The leftmost node of the scheduling tree is chosen (as it will have the lowest spent execution time), and sent for execution.
  2. If the process simply completes execution, it is removed from the system and scheduling tree.
  3. If the process reaches its maximum execution time or is otherwise stopped (voluntarily or via interrupt) it is reinserted into the scheduling tree based on its new spent execution time.
  4. The new leftmost node will then be selected from the tree, repeating the iteration.

The leftmost node in a Red/Black tree is the node that has the highest priority. Priorities change based on dynamic scheduling: that is, Linux is adding and subtracting from the priority number in an attempt to maximize responsiveness, throughput, and other statistics. It's pretty dumb, all things considered, but these algorithms work quite well when all cores are similar.
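To make that pick-leftmost / reinsert loop concrete, here's a toy sketch in Python with a heap standing in for the red-black tree (task names and times are invented):

Code:
import heapq

# Each entry: (spent_execution_time, task_name, remaining_work). The heap's
# smallest element plays the role of the tree's leftmost node.
ready = [(0.0, "editor", 3.0), (0.0, "compiler", 9.0), (0.0, "indexer", 5.0)]
heapq.heapify(ready)

TIMESLICE = 1.0  # maximum execution time before a task is reinserted

while ready:
    spent, name, remaining = heapq.heappop(ready)  # least spent time runs next
    ran = min(TIMESLICE, remaining)
    spent += ran
    remaining -= ran
    if remaining > 0:
        heapq.heappush(ready, (spent, name, remaining))  # reinsert with new spent time
    else:
        print(f"{name} finished after {spent:.1f}s of CPU time")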

Modern schedulers also account for "hot" cores, where the L1/L2/L3 is already primed with the data associated with a task (aka thread affinity), and for NUMA (the distance data has to travel to get to RAM). There are also issues like priority inversion: Task A is high priority for some reason, Task Excel-Spreadsheet is low priority, but Task A is waiting on Task Excel-Spreadsheet, so the scheduler needs to detect this situation and temporarily increase Excel-Spreadsheet's priority so that Task A can resume sooner.
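The classic band-aid for that last one is priority inheritance; here's a toy sketch of the idea (not how any real kernel spells it):

Code:
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority        # effective priority the scheduler sees
        self.base_priority = priority   # what it goes back to after the boost

def block_on(waiter, holder):
    # Priority inheritance: the lock holder temporarily runs at the waiting
    # task's priority so it can finish and release the lock sooner.
    if waiter.priority > holder.priority:
        holder.priority = waiter.priority

def release(holder):
    holder.priority = holder.base_priority  # boost ends when the lock is freed

task_a = Task("Task A", priority=90)
excel = Task("Excel-Spreadsheet", priority=10)
block_on(task_a, excel)
print(excel.priority)  # 90 while it holds the lock Task A needs
release(excel)
print(excel.priority)  # back to 10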

------------

I guess you can say that "schedulers" are adaptive like branch predictors and L1 caches. They follow a set of dumb rules that works in practice, allowing for basic levels of adaptation. But there's no AI here; it's just a really good set of dumb rules that's been tweaked over the past 40 years to get good results on modern processors.

Scheduling is provably NP-complete: the only way to find the optimal schedule is to try all combinations of choices. Alas, if you did that, you'd spend more time scheduling than running the underlying programs! Schedulers need to run in less than ~10 microseconds to be effective (any slower and you start eating way more time than the underlying programs).

----------------

Honestly? I think the main solution is just to have a programmer flag. Just like thread affinity / NUMA affinity, you can use heuristics as a "sane default" that won't really work in all cases. Any programmer who knows about modern big.LITTLE architecture can just say "allocate a little-thread" (a thread whose affinity is set to a little core) explicitly, because said programmer knows that their thread works best on LITTLE for some reason.
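On Linux you can already express that kind of hint today with CPU affinity; a sketch, assuming a hypothetical part where CPUs 4-7 are the little cores (Linux-only call, and it fails if those CPUs don't exist):

Code:
import os

LITTLE_CORES = {4, 5, 6, 7}  # assumed layout: CPUs 0-3 big, 4-7 little

# Pin the current process (pid 0 means "self") to the little cluster.
os.sched_setaffinity(0, LITTLE_CORES)

print(os.sched_getaffinity(0))  # -> {4, 5, 6, 7}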

That's how the problem is "solved" for NUMA and core affinity already. Might as well keep that solution. Then, have Windows developers go through all of the system processes and test individually which ones work better on LITTLE vs big cores, manually tweaking the configuration of Windows until it's optimal.

If you can't solve the problem in code, solve the problem with human effort. There may be thousands of Windows-processes, but you only have to do the categorization step once. Give a few good testers / developers 6 months on the problem, and you'll probably get adequate results that will improve over the next 2 years.
 
Joined
Jan 28, 2021
Messages
845 (0.72/day)
When there's a need, there's a way.

Will there be a need to run both x64 and Arm applications natively on the same system? Quite possibly, I say, as a part of the transition from x86 to Arm. The transition can take many years and never even be complete, so many apps will be available for one or the other architecture exclusively. Both will need to run efficiently, without translation, on the PC.

Will there be a way? It's fundamentally, in all caps, possible. The x64 code would use the x64 version of system libraries, and those would avoid calling Arm libraries whenever possible. A call to Arm would have to wake up a (possibly sleeping) Arm core, and apart from that, it would be more complicated and slower than a simple subroutine call. Arm system libraries would have to be able to process data with Arm or x64 byte order*, and that's hard. And so on. It's up to Microsoft to decide if it's worth the hassle.

As for the Raspberry ... my 4-core PC probably has more than 4 Arm cores hiding in the SSD and other peripherals, no need for another one.
It doesn't work like that; the OS and the CPU have to be aware of everything running, so you can't just run the OS on ARM and then run certain apps on x86. You'd need an OS compiled for both ISAs and a CPU with a front-end that could somehow manage different address spaces and pipelines for both architectures. While within the realm of the "possible", it would be insanely complicated, expensive, and completely pointless, and it would defeat the purpose of using the streamlined and minimalistic ARM ISA in the first place.

ARM CPUs exist in your SSD as closed systems; the OS isn't aware they exist, and that's why it works: only the firmware and maybe the driver are aware of the ARM cores. All Zen CPUs actually have an ARM CPU in them now as part of their security platform, but again, these are closed systems that the user and the OS are not aware of.

 

Mussels

Freshwater Moderator
Staff member
Joined
Oct 6, 2004
Messages
58,413 (8.19/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Philips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Philips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
They could if the OS was designed for it
As an example, the OS runs x86 with an ARM emulator that's compatible with a secondary ARM CPU

We're gonna end up seeing more and more ARM in the mobile/laptop space; I suspect we'll see desktops emulating/hardware-supporting ARM rather than ARM supporting x86
 
Joined
Apr 24, 2020
Messages
2,560 (1.76/day)
They could if the OS was designed for it
As an example, the OS runs x86 with an ARM emulator that's compatible with a secondary ARM CPU

We're gonna end up seeing more and more ARM in the mobile/laptop space; I suspect we'll see desktops emulating/hardware-supporting ARM rather than ARM supporting x86

Have you ever written a program with support for coprocessors and/or different modes of operation?

IE: Jazelle (ARM's Java-bytecode emulator)? The x87 floating point coprocessor was relatively easy, but still kinda weird. There's also the coprocessors on Rasp. Pi (called PIO): https://www.raspberrypi.org/blog/what-is-pio/. Cell phones also have DSP chips, and modern x86 computers often have embedded iGPUs that are very similar to coprocessors.

They're all interesting and cool. But a giant pain in the ass. I don't think the typical mainstream programmer (or system engineer) would want to deal with this crap. Coprocessors with an alternative instruction set add a stupid amount of complexity to any project. It's absolutely doable, but... it's not really something you just do willy-nilly.

Jazelle is arguably an outright failure. Rasp. Pi PIO is useful for GHz-level functionality, so there's a huge benefit in performance and flexibility. However, with only 128 bytes of code space and something like 32 bytes of SRAM, it's not exactly an easy coprocessor. (It's so small precisely so that it can achieve GHz-level capabilities.) iGPUs and DSPs are entirely different architectures that grossly improve performance. (Kinda like PIO: by changing the computer dramatically, performance can be improved.)

----------

ARM and x86? They're both application-level instruction sets. In fact, ARM and x86 are so similar these days that it isn't very hard to emulate one on the other (see Rosetta). Both are deeply speculative, branch-predicted, out-of-order, superscalar, pipelined cores with large register files plus a SIMD subset (some 512-bit like SVE / AVX-512, some 128-bit like SSE or NEON), and that SIMD subset includes AES cryptography with pmull / pclmulqdq for GCM acceleration. As such, ARM and x86 can emulate each other almost perfectly.

It's not like DSPs or iGPUs or Rasp. Pi's PIO (which are fundamentally different machine models). ARM and x86 have basically stolen each other's designs from the top down and are really, really similar these days. The only exceptions I can think of are ARM's "brev" and Intel's pdep / pext instructions, but pretty much every other instruction can be found in the other instruction set. (I guess ARM / Intel took different approaches to their SIMD byte-swapping routines... but we're reaching into the obscure to find differences.)
 
Joined
Jan 28, 2021
Messages
845 (0.72/day)
They could if the OS was designed for it
As an example, the OS runs x86 with an ARM emulator that's compatible with a secondary ARM CPU

We're gonna end up seeing more and more ARM in the mobile/laptop space; I suspect we'll see desktops emulating/hardware-supporting ARM rather than ARM supporting x86
Lol, stop equating things that are 'possible' with things that make any kind of sense in the slightest. The OS would have to be compiled into two different code bases to understand the two different ISAs. The overhead and bloat in the kernel would negate any efficiency benefit of ARM right there.

And the hardware would be even more of a bloated mess, and die space is precious real estate. Maybe the front-end of the CPU could be shared between the ISAs, but performance would be garbage and you'd have a ton of wasted silicon. If you didn't share the front-end, then it's like two CPUs in one, and you'd need some kind of hardware mediator that knows what the different cores are doing; again, performance would be garbage with even more wasted silicon.

If ARM ever takes off on Windows, the way to do it would be essentially what Apple is doing with Rosetta. At the end of the day all CPUs do the same thing: simple maths, and loads and stores of those values. The key would be to build the front-end of your ARM CPU and your translation software together so they can efficiently translate x86 instructions into ARM instructions, as once the instructions are broken down they are all doing the same things. To that end the hardware could be customized to run programs written for a different ISA, but you'd never do two completely different cores, front to back, in one CPU.
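To be clear about what "translate" means here, a deliberately silly toy of the idea in Python (the mnemonics and the mapping are invented; real translators like Rosetta work on machine code and have to handle registers, flags, and memory-model differences):

Code:
# Toy ahead-of-time translation: made-up x86-style text ops to ARM-style ones.
X86_TO_ARM = {"mov": "mov", "add": "add", "imul": "mul", "jmp": "b"}

def translate(x86_lines):
    out = []
    for line in x86_lines:
        op, _, operands = line.partition(" ")
        out.append(f"{X86_TO_ARM.get(op, op)} {operands}".strip())
    return out

print(translate(["mov eax, 1", "add eax, 2", "jmp done"]))
# -> ['mov eax, 1', 'add eax, 2', 'b done']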
 
Joined
Mar 23, 2005
Messages
4,061 (0.58/day)
Location
Ancient Greece, Acropolis (Time Lord)
System Name RiseZEN Gaming PC
Processor AMD Ryzen 7 5800X @ Auto
Motherboard Asus ROG Strix X570-E Gaming ATX Motherboard
Cooling Corsair H115i Elite Capellix AIO, 280mm Radiator, Dual RGB 140mm ML Series PWM Fans
Memory G.Skill TridentZ 64GB (4 x 16GB) DDR4 3200
Video Card(s) ASUS DUAL RX 6700 XT DUAL-RX6700XT-12G
Storage Corsair Force MP500 480GB M.2 & MP510 480GB M.2 - 2 x WD_BLACK 1TB SN850X NVMe 1TB
Display(s) ASUS ROG Strix 34” XG349C 180Hz 1440p + Asus ROG 27" MG278Q 144Hz WQHD 1440p
Case Corsair Obsidian Series 450D Gaming Case
Audio Device(s) SteelSeries 5Hv2 w/ Sound Blaster Z SE
Power Supply Corsair RM750x Power Supply
Mouse Razer Death-Adder + Viper 8K HZ Ambidextrous Gaming Mouse - Ergonomic Left Hand Edition
Keyboard Logitech G910 Orion Spectrum RGB Gaming Keyboard
Software Windows 11 Pro - 64-Bit Edition
Benchmark Scores I'm the Doctor, Doctor Who. The Definition of Gaming is PC Gaming...
Incoming "Strix Point? LOL AMD must be working with ASUS on naming LOLOLOL" jokes.

It'll be interesting to see how and if x86 makes the full transition to these core configs. Software will need to catch up, but that's just a universal constant at this point. The best-built software framework is still half a decade behind the hardware.
Software has always been years behind. It would be nice to have software fully take advantage of today's hardware, and have them both advance together on similar time frames.
 
Joined
Sep 28, 2012
Messages
963 (0.23/day)
System Name Poor Man's PC
Processor AMD Ryzen 5 7500F
Motherboard MSI B650M Mortar WiFi
Cooling ID Cooling SE 206 XT
Memory 32GB GSkill Flare X5 DDR5 6000Mhz
Video Card(s) Sapphire Pulse RX 6800 XT
Storage XPG Gammix S70 Blade 2TB + 8 TB WD Ultrastar DC HC320
Display(s) Mi Gaming Curved 3440x1440 144Hz
Case Cougar MG120-G
Audio Device(s) MPow Air Wireless + Mi Soundbar
Power Supply Enermax Revolution DF 650W Gold
Mouse Logitech MX Anywhere 3
Keyboard Logitech Pro X + Kailh box heavy pale blue switch + Durock stabilizers
VR HMD Meta Quest 2
Benchmark Scores Who need bench when everything already fast?
Somehow, almost everyone forgets how AMD rested on its laurels after the Athlon 64 was released. Intel rethought its misguided Pentium 4 strategy and kicked AMD's ass with the Core 2 Duo / Quad series, and later the Core series. What did AMD do? It created its own counterpart to the Pentium 4, focused on empty GHz, and for years it was not able to create a processor that competed with Intel. They suddenly succeeded after many years, and everyone forgets that the path to AMD's current position was not straightforward and was similar to Intel's current situation. Intel hasn't collapsed, and I'm just waiting for the same prophets to come back to Intel and laugh at AMD, because that's all they can do: stick with whoever is stronger at the moment and laugh at the opponent (it's hypocrisy).

Don't you think that applies to Intel too? I mean, they really held the crown for years, with lots of resources and budget, but were still providing 4 cores on desktop for a decade. In 2017, they proved that they too are capable of making more than 4 cores, because Zen was launched. I'm sure they can compete again, not just cling to brand recognition and the glories of the past.


Only if Intel decides to reverse the flow of time, which at this point isn't completely out of the question, and isn't illegal under Moore's Law. But then you'd need to wait for Comet Lake, Coffee Lake Refresh, Coffee Lake, Kaby Lake, Skylake and Broadwell to come first. All of them still on 14 nm, mind you.

RKL was the first time they got out of their Skylake comfort zone, and it was a really uphill battle for them. But I'm not really worried about Intel; they have the capability and the resources for it.

--------------------------------------------------------------------------------------------------------------

On topic, there's a possibility for AMD if they borrow some "console architecture": a smaller core composed as RISC (a modified CISC Zen core) alongside a regular x86 Zen core, with specific I/O requests on the address bus, a crossbar on the control bus, and Infinity Fabric enhanced as a wide data bus.
 
Joined
Jan 3, 2021
Messages
2,660 (2.21/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Schedulers are fancy learning algorithms, where "learning" is the old school 1980s definition, and not the modern "Deep Learning Neural Net" definition. I'm no expert, but read up on how 2.6.xx Linux's "Fair Scheduler" works.


I'm sure Linux has been upgraded since then, but that's what was taught in my college years, so it's the only scheduler I'm really familiar with. The Wikipedia link has a decent description:

The leftmost node in a Red/Black tree is the node that has the highest priority. Priorities change based on dynamic scheduling: that is, Linux is adding and subtracting from the priority number in an attempt to maximize responsiveness, throughput, and other statistics. It's pretty dumb, all things considered, but these algorithms work quite well when all cores are similar.

Modern schedulers also account for "hot" cores, where the L1/L2/L3 is already primed with the data associated with a task (aka thread affinity), and for NUMA (the distance data has to travel to get to RAM). There are also issues like priority inversion: Task A is high priority for some reason, Task Excel-Spreadsheet is low priority, but Task A is waiting on Task Excel-Spreadsheet, so the scheduler needs to detect this situation and temporarily increase Excel-Spreadsheet's priority so that Task A can resume sooner.

------------

I guess you can say that "schedulers" are adaptive like branch predictors and L1 caches. They follow a set of dumb rules that works in practice, allowing for basic levels of adaptation. But there's no AI here; it's just a really good set of dumb rules that's been tweaked over the past 40 years to get good results on modern processors.

Scheduling is provably NP-complete: the only way to find the optimal schedule is to try all combinations of choices. Alas, if you did that, you'd spend more time scheduling than running the underlying programs! Schedulers need to run in less than ~10 microseconds to be effective (any slower and you start eating way more time than the underlying programs).

----------------

Honestly? I think the main solution is just to have a programmer flag. Just like thread affinity / NUMA affinity, you can use heuristics as a "sane default" that won't really work in all cases. Any programmer who knows about modern big.LITTLE architecture can just say "allocate a little-thread" (a thread whose affinity is set to a little core) explicitly, because said programmer knows that their thread works best on LITTLE for some reason.

That's how the problem is "solved" for NUMA and core affinity already. Might as well keep that solution. Then, have Windows developers go through all of the system processes and test individually which ones work better on LITTLE vs big cores, manually tweaking the configuration of Windows until it's optimal.

If you can't solve the problem in code, solve the problem with human effort. There may be thousands of Windows-processes, but you only have to do the categorization step once. Give a few good testers / developers 6 months on the problem, and you'll probably get adequate results that will improve over the next 2 years.
Thanks for that. I didn't mean ML when I mentioned schedulers that learn and adapt; I meant what you described. However, CFS, as it's described, "aims to maximize overall CPU utilization while also maximizing interactive performance"; it doesn't try to minimise power consumption.

That programmer flag ... yes, I had a similar idea, but more like a numerical parameter that tells how much a program benefits from running on a faster core. After some thinking, I don't think that many developers and testers would be able and willing to determine that for system processes (or applications, for that matter). One would need to measure performance and power consumption, and do it under various CPU load conditions. Differences would be subtle, not great, and a system process can hardly be tested in isolation. So, instead of manual flagging (or in addition to it), the scheduler would have to consider some power-related data. The CPU cores, in turn, would have to provide that data by means of some kind of performance counters and energy meters.
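The energy-meter half of that already exists on some hardware. On Linux with Intel's RAPL counters exposed through powercap, you can sample package energy from userspace; a sketch (the path and the read permissions vary by system):

Code:
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # CPU package 0, microjoules

def read_energy_uj():
    with open(RAPL) as f:
        return int(f.read())

before = read_energy_uj()
time.sleep(1.0)
after = read_energy_uj()
# The counter wraps around eventually; this sketch ignores that.
print(f"package power over the last second: ~{(after - before) / 1e6:.2f} W")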

A hell of a scheduler, right? How is scheduling done on Android, where cores of different sizes are the most common case?
 
Joined
Apr 24, 2020
Messages
2,560 (1.76/day)
A hell of a scheduler, right? How is scheduling done on Android, where cores of different sizes are the most common case?

Android is just Linux underneath. The Completely Fair Scheduler is over 10 years old, and again, was just something I knew from college.

After a brief 5-minute search (sometimes you just gotta know the right keywords), it seems that the "Energy Aware Scheduler" is the current state of the art for Linux (and therefore Android): https://community.arm.com/developer...p-blog/posts/energy-aware-scheduling-in-linux. It seems to do what you say: it measures power consumption and runs metrics to "assume" the future power consumption of tasks. From there, it chooses cores which will minimize task-energy usage.

At least, state of the art for 2019. So I'll assume that's what's going on for now unless someone else tells me of a more recent Linux scheduler.

The details of the EAS are discussed here: https://lore.kernel.org/lkml/20181016101513.26919-1-quentin.perret@arm.com/
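In the spirit of what that writeup describes, here's a toy of the placement decision in Python (capacities and power numbers are invented): estimate the energy cost of putting a task on each candidate CPU and pick the cheapest.

Code:
# Toy energy model: each CPU type has a capacity and a power cost at full load.
CPUS = {
    "big0":    {"capacity": 1024, "power_at_full": 2000},  # invented mW figures
    "little0": {"capacity": 436,  "power_at_full": 400},
}

def placement_cost(cpu, task_util):
    spec = CPUS[cpu]
    util = min(task_util / spec["capacity"], 1.0)  # fraction of the core the task needs
    return util * spec["power_at_full"]            # power attributable to the task

task_util = 120  # a light background task
best = min(CPUS, key=lambda cpu: placement_cost(cpu, task_util))
print(best)  # -> little0; the little core is cheaper for this light task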
 
Joined
Dec 28, 2012
Messages
3,478 (0.84/day)
System Name Skunkworks
Processor 5800x3d
Motherboard x570 unify
Cooling Noctua NH-U12A
Memory 32GB 3600 mhz
Video Card(s) asrock 6800xt challenger D
Storage Sabarent rocket 4.0 2TB, MX 500 2TB
Display(s) Asus 1440p144 27"
Case Old arse cooler master 932
Power Supply Corsair 1200w platinum
Mouse *squeak*
Keyboard Some old office thing
Software openSUSE tumbleweed/Mint 21.2
Somehow, almost everyone forgets how AMD rested on its laurels after the Athlon 64 was released. Intel rethought its misguided Pentium 4 strategy and kicked AMD's ass with the Core 2 Duo / Quad series, and later the Core series. What did AMD do? It created its own counterpart to the Pentium 4, focused on empty GHz, and for years it was not able to create a processor that competed with Intel. They suddenly succeeded after many years, and everyone forgets that the path to AMD's current position was not straightforward and was similar to Intel's current situation. Intel hasn't collapsed, and I'm just waiting for the same prophets to come back to Intel and laugh at AMD, because that's all they can do: stick with whoever is stronger at the moment and laugh at the opponent (it's hypocrisy).
People also forget that while resting on their laurels, AMD was charging over $1000 for their top-end part while only being 5-7% faster than the $600 parts.

AMD ALSO caught nvidia with the HD 5000 series, then proceeded to rebrandeon the lineup as the HD 6000s, only to get BTFOd by fermi 2.0 and had to rush the 6900 series to market (which never quite caught up to nvidia).

Or going WAY back, they got used to releasing processors on Intel's chipsets and sockets, and figuring this would last forever, were left with their pants down when Intel refused to license Socket 370.

AMD is no stranger to falling asleep at the wheel and half-assing things. They haven't had as many chances as Intel has, but it does happen.
 
Joined
Aug 17, 2017
Messages
274 (0.11/day)
is there anything that amd can come up with on their own and not copy? nope.
 

Mussels

Freshwater Moderator
Staff member
Joined
Oct 6, 2004
Messages
58,413 (8.19/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Philips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Philips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
is there anything that amd can come up with on their own and not copy? nope.

What kind of weirdass comment is that? AMD have been the first in almost every category except this one... ever heard of x64? IMC? Even the current MCM/CCX design...
 
Joined
Jan 3, 2021
Messages
2,660 (2.21/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
What kind of weirdass comment is that? AMD have been the first in almost every category except this one... ever heard of x64? IMC? Even the current MCM/CCX design...
Intel copied Lisa S. Something probably went wrong in the process, as the analysis of her corporate genome reveals large parts labelled "marketing".
 

Vinc

New Member
Joined
May 7, 2021
Messages
1 (0.00/day)
You're a little too far back.
Warhol was cancelled due to semiconductor shortages, and primarily in favor of Zen 4. There will be a small refresh but nothing else (Ryzen 5000 XTX). Zen 4 will be a monstrous architecture. A core-count bump is very unlikely with Raphael. The release of Zen 4 will be around Q3/Q4 2022. AMD is not going to release anything major anytime soon due to Intel's Alder Lake being a meme.
 
Joined
Mar 24, 2019
Messages
620 (0.33/day)
Location
Denmark - Aarhus
System Name Iglo
Processor 5800X3D
Motherboard TUF GAMING B550-PLUS WIFI II
Cooling Arctic Liquid Freezer II 360
Memory 32 gigs - 3600 MHz
Video Card(s) EVGA GeForce GTX 1080 SC2 GAMING
Storage NvmE x2 + SSD + spinning rust
Display(s) BenQ XL2420Z - lenovo both 27" and 1080p 144/60
Case Fractal Design Meshify C TG Black
Audio Device(s) Logitech Z-2300 2.1 200w Speaker /w 8 inch subwoofer
Power Supply Seasonic Prime Ultra Platinum 550w
Mouse Logitech G900
Keyboard Corsair k100 Air Wireless RGB Cherry MX
Software win 10
Benchmark Scores Super-PI 1M T: 7,993 s :CinebR20: 5755 point GeekB: 2097 S-11398-M 3D :TS 7674/12260
Joined
Oct 9, 2010
Messages
31 (0.01/day)
System Name Game PC
Processor i7 970 @ 4.2GHz
Motherboard Asus Rampage III Extreme
Cooling Water rad one 420x140 rad two 280x140
Memory 6GB 1600 memory, it's the one component that does not really benefit from speed
Video Card(s) 2x 5870 matrix 2GB 1x the old GeForce GTX 275 for PhysX
Storage 60GB Vertex, 60GB vertex II, 320GB WD Caviar SE, +23TB on server
Display(s) 3x 26" Asus VW266H 1920x1200 for a Eyefinity setup
Case Corsair 800D
Audio Device(s) Xonar DX
Power Supply HX850
Software Windows 7
Anyway, I wouldn't mind if all AMD processors had an IGP as well.
I find it very strange that AMD never put in an IGP/APU. I would have been happy with something like R300 (Radeon 9700) performance, and it's not like 3~4% / 50~75M extra transistors on the 2B-transistor I/O die would really hurt their bottom line.

It would be good enough for HTPCs, (home) servers, or internet/office PCs, or as a fallback for diagnostics or if your GPU dies; anything would be better than nothing.
 
Joined
Jan 28, 2021
Messages
845 (0.72/day)
I find it very strange that AMD never put in an IGP/APU. I would have been happy with something like R300 (Radeon 9700) performance, and it's not like 3~4% / 50~75M extra transistors on the 2B-transistor I/O die would really hurt their bottom line.

It would be good enough for HTPCs, (home) servers, or internet/office PCs, or as a fallback for diagnostics or if your GPU dies; anything would be better than nothing.
The thing you have to keep in mind is that with Zen 1/2/3 the goal was mainly to get back into the datacenter; that's where the growth is and the margins are high. Going forward, with chiplets giving AMD more flexibility, and AMD having more engineering resources and enough market share and volume to justify it, we'll probably see more core designs rather than the same Zen compute module used on literally everything on desktop and server with a certain number of cores disabled to fit SKUs. Some of those new designs will probably have a GPU chiplet on package, or yeah, maybe they'll build it into the I/O die.
 
Joined
Aug 17, 2017
Messages
274 (0.11/day)
AMD = After other Manufacturers Do it.

When will they be the first to create their own new technology? Never!
 