Monday, June 14th 2021

AMD Files Patent for its Own x86 Hybrid big.LITTLE Processor

AMD is developing its own x86 hybrid processor technology, modeled on the Arm big.LITTLE hybrid CPU core topology that also inspired Intel's hybrid processors. In such a design, the processor has two kinds of CPU cores with very different performance-per-Watt bands: one kind focuses on performance and remains dormant under light processing loads, while the other handles most lightweight processing loads that don't require powerful cores. This is easier said than done, as the two kinds of cores feature significantly different CPU core microarchitectures and instruction sets.

AMD has filed a patent describing a method for switching processing workloads between the two CPU core types on the fly. Unlike homogeneous CPU core designs, where the workload from one core is seamlessly picked up by another over a victim cache such as the L3, handover between the two core types involves some logic. According to the patent application, in an AMD hybrid processor the two CPU core types are interfaced over the processor's main switching fabric rather than a victim cache, much in the same way as the CPU cores and integrated GPU are separated in current-gen AMD APUs.

AMD's CPU core type switching logic is dictated by a number of factors, such as the CPU core utilization of the low-power core, its memory utilization, the need for instruction sets found only on the performance core, and machine architecture states. The patent also briefly references a power-management mechanism that saves system power by gating the two core types based on utilization. Power savings are the primary objective of any big.LITTLE topology.
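The switching factors listed in the patent can be pictured as a simple decision policy. The following is a minimal, hypothetical Python model, not AMD's method: the thresholds (`UTIL_UP`, `UTIL_DOWN`, `MEM_UP`), the hysteresis between them, and every class and function name are assumptions for illustration, since the patent summary gives no concrete numbers. It moves work to the performance core when the low-power core is saturated, when memory pressure is high, or when the workload needs instructions only the big core has. It then copies a stand-in for the architectural state across and power-gates whichever core is left idle.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreType(Enum):
    LITTLE = "low-power"
    BIG = "performance"


@dataclass
class WorkloadSample:
    utilization: float         # load on the currently active core, 0.0 to 1.0
    memory_utilization: float  # memory-traffic pressure, 0.0 to 1.0
    needs_big_only_isa: bool   # workload uses instructions only the big core has


# Hypothetical thresholds chosen for illustration; not taken from the patent.
UTIL_UP = 0.90    # little core this busy -> move up to the big core
UTIL_DOWN = 0.30  # big core this idle -> move back down
MEM_UP = 0.75     # heavy memory pressure -> prefer the big core


def choose_core(current: CoreType, s: WorkloadSample) -> CoreType:
    """Pick the core type for the next interval (hypothetical policy)."""
    if s.needs_big_only_isa:
        return CoreType.BIG  # only the big core can execute this code
    if current is CoreType.LITTLE:
        if s.utilization > UTIL_UP or s.memory_utilization > MEM_UP:
            return CoreType.BIG
        return CoreType.LITTLE
    # Currently on the big core: drop down only when the load is clearly light.
    if s.utilization < UTIL_DOWN and s.memory_utilization <= MEM_UP:
        return CoreType.LITTLE
    return CoreType.BIG


@dataclass
class ArchState:
    """Stand-in for the architectural state handed over on a switch."""
    pc: int = 0
    registers: dict = field(default_factory=dict)


@dataclass
class HybridCpu:
    active: CoreType = CoreType.LITTLE
    state: ArchState = field(default_factory=ArchState)
    powered: set = field(default_factory=lambda: {CoreType.LITTLE})

    def step(self, sample: WorkloadSample) -> CoreType:
        target = choose_core(self.active, sample)
        if target is not self.active:
            self.powered.add(target)  # wake the target core
            # Copy the architectural state across (the "handover").
            self.state = ArchState(self.state.pc, dict(self.state.registers))
            self.powered.discard(self.active)  # power-gate the core we left
            self.active = target
        return self.active


if __name__ == "__main__":
    cpu = HybridCpu()
    print(cpu.step(WorkloadSample(0.95, 0.20, False)))  # busy little core -> BIG
    print(cpu.step(WorkloadSample(0.10, 0.10, False)))  # light load -> LITTLE
```

Separate up and down thresholds are a common way to avoid bouncing between cores when the load hovers around a single cutoff; a real implementation would live in hardware and firmware rather than in Python.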

The patent description can be accessed here.
Source: Kepler_L2 (Twitter)

21 Comments on AMD Files Patent for its Own x86 Hybrid big.LITTLE Processor

#1
ratirt
I wonder why AMD wants to make the big.LITTLE architecture a thing. They have a noticeable advantage in efficiency.

From the graphs it looks like the big and small cores are separated and connected via the fabric. So different chiplets? I suppose so.
#2
Crackong
ratirt said: "I wonder why AMD wants to make the big.LITTLE architecture a thing. They have a noticeable advantage in efficiency.

From the graphs it looks like the big and small cores are separated and connected via the fabric. So different chiplets? I suppose so."
Picture no. 1 shows this is an APU design.
They might want to implement a big.LITTLE architecture in mobile APUs to further enhance power-saving capabilities.
#3
LTUGamer
If they do use it, I hope it will be only in laptop processors. Desktop users don't need that.

However, there is a chance they are patenting it just so they can use it in the future, IF it is ever needed.
#4
theGryphon
ratirt said: "I wonder why AMD wants to make the big.LITTLE architecture a thing. They have a noticeable advantage in efficiency.

From the graphs it looks like the big and small cores are separated and connected via the fabric. So different chiplets? I suppose so."
Mobile and next-gen consoles can use this. Despite certain advantages, AMD still hasn't caught on in mobile. They cannot fall behind the curve while Intel starts marketing its big.LITTLE implementation as the best thing since sliced bread. AMD cannot gain market share by being just a bit better than Intel, thanks to fanboys and a subsidy-drunk OEM marketplace.
#5
R0H1T
theGryphon said: "Mobile and next-gen consoles can use this. Despite certain advantages, AMD still hasn't caught on in mobile. They cannot fall behind the curve while Intel starts marketing its big.LITTLE implementation as the best thing since sliced bread. AMD cannot gain market share by being just a bit better than Intel, thanks to fanboys and a subsidy-drunk OEM marketplace."
By mobile, do you mean laptops or ULP devices like tablets? Because AMD certainly has picked up in the laptop segment; it's just the wafer crunch that is severely hampering their market share growth! As for Intel, yeah, much like ARM, AMD probably also wants to list more cores, even if they're small ones, as a marketing point.
#6
TumbleGeorge
I hope this doesn't come out in real desktop products. It's too much of a copy of Intel.
#7
Danish78
Does AMD have a little core in development then? Their AM1 stuff is pretty old.
#8
theGryphon
R0H1T said: "By mobile, do you mean laptops or ULP devices like tablets? Because AMD certainly has picked up in the laptop segment; it's just the wafer crunch that is severely hampering their market share growth! As for Intel, yeah, much like ARM, AMD probably also wants to list more cores, even if they're small ones, as a marketing point."
I'm including laptops as well as tablets. AMD is still getting the low-to-mid-tier treatment, while top-tier models are mostly Intel. There are even examples of AMD models being intentionally crippled in components, materials and configuration. Intel is fighting back hard, and not only in the R&D and marketing departments, if you know what I mean. That's why AMD can use any advantage it can get, even if it's just to tick a marketing box. This is more than just a marketing angle if you ask me. There are many real use cases...
TumbleGeorge said: "I hope this doesn't come out in real desktop products. It's too much of a copy of Intel."
ARM, you mean. Sigh.
Real (whatever that means) desktop products are only one segment, you know that, right? This is very relevant in many environments where power and heat are, and always will be, bottlenecks...
#9
TumbleGeorge
theGryphon said: "ARM, you mean."
Yes... from ARM. But from Intel too... AMD's "Atom": the future Zen 5 + Zen 4 cores in the latest known APU on the horizon.
#10
napata
LTUGamer said: "If they do use it, I hope it will be only in laptop processors. Desktop users don't need that.

However, there is a chance they are patenting it just so they can use it in the future, IF it is ever needed."
I don't agree. For most users, even on desktop, the CPU spends a lot of time in low-power scenarios, such as browsing and watching media, where big.LITTLE will reduce power draw. Even games are mostly low-power scenarios. I have a very power-hungry Intel CPU, and in most games it draws between 20 and 60 W with a 3060 Ti at 1080p. These are places where big.LITTLE can push power draw down.

Power draw under low usage is already a weak spot for AMD: there they draw more power than Intel, even though under full load they destroy Intel on performance per watt. So it's a good thing they're trying to improve on this.
#11
Vya Domus
I hope they keep this stuff away from desktops.
#12
DeathtoGnomes
Vya Domus said: "I hope they keep this stuff away from desktops."
I kinda see this design being intended for entry-level, low-budget stuff. I doubt it could match the performance of a full-fledged chip.
#13
Punkenjoy
napata said: "I don't agree. For most users, even on desktop, the CPU spends a lot of time in low-power scenarios, such as browsing and watching media, where big.LITTLE will reduce power draw. Even games are mostly low-power scenarios. I have a very power-hungry Intel CPU, and in most games it draws between 20 and 60 W with a 3060 Ti at 1080p. These are places where big.LITTLE can push power draw down.

Power draw under low usage is already a weak spot for AMD: there they draw more power than Intel, even though under full load they destroy Intel on performance per watt. So it's a good thing they're trying to improve on this."
Power draw is one of the things that can benefit from big.LITTLE cores, but it is not the only one.

Die utilization is another. Smaller, simpler cores use far less space relative to their performance, so you can fit more of them in the same area. Big cores are really for single-threaded loads. You could probably get higher performance per unit of die area in highly multithreaded code by using more, smaller cores in the same space. I am not sure about games (although if the main thread runs on a big core, I don't see why it wouldn't be good), but many workloads could benefit from it, like video encoding, rendering, etc.

Larger caches might also change how CPUs are made: if the L3 always contains what the CPU wants to execute, the core may no longer need all the mechanisms that are there to keep overall performance up while the CPU waits for data.
#14
yeeeeman
ratirt said: "I wonder why AMD wants to make the big.LITTLE architecture a thing. They have a noticeable advantage in efficiency.

From the graphs it looks like the big and small cores are separated and connected via the fabric. So different chiplets? I suppose so."
Because big.LITTLE can bring more efficiency benefits. Imagine a 6500U with 8 cores, 4 small and 4 big, and 20 hours of battery life instead of the 10 hours we get now.
I also don't understand why people are so against this on desktop. People, leave the engineering side of things to the engineers. If at the end of the day you get better performance, that is what matters.
Another important point mentioned previously is that in the same die space you can get more performance with more, smaller cores than with fewer big cores. Why? Because keeping a big core with a huge OoO window working efficiently, that is, using as much of its resources as possible, is more difficult than maximizing smaller cores.

So in the end, comparing 16 big cores vs. 8 big cores plus 16 smaller ones, the 8+16 configuration might bring you more performance with similar die space usage, pluuus you are more efficient when idle or partially loaded thanks to those smaller cores.
#15
persondb
yeeeeman said: "So in the end, comparing 16 big cores vs. 8 big cores plus 16 smaller ones, the 8+16 configuration might bring you more performance with similar die space usage, pluuus you are more efficient when idle or partially loaded thanks to those smaller cores."
Yeah, die utilization is a big thing. To add to that, for Intel, apparently one of their big cores (Golden Cove) uses as much space as 4(!!!!) little cores (Gracemont). So it would be a scenario of 16 big cores vs. 8+32 cores.
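The area trade-off described here comes down to simple arithmetic. Below is a minimal Python sketch, assuming the reported ratio of one big core to four little cores holds exactly. `little_rel_perf` (a little core's multithreaded throughput relative to a big core) is a hypothetical parameter, not a measured figure, and perfect multithreaded scaling is assumed.

```python
# Assumption taken from the comment: one big core (Golden Cove) occupies
# the same die area as four little cores (Gracemont). Units: little-core areas.
BIG_AREA = 4
LITTLE_AREA = 1


def die_area(big: int, little: int) -> int:
    """Total core area in little-core units."""
    return big * BIG_AREA + little * LITTLE_AREA


def little_cores_in_freed_area(big_total: int, big_kept: int) -> int:
    """How many little cores fit in the area freed by dropping big cores."""
    return (big_total - big_kept) * BIG_AREA // LITTLE_AREA


def mt_throughput(big: int, little: int, little_rel_perf: float) -> float:
    """Ideal multithreaded throughput in big-core units (perfect scaling assumed)."""
    return big + little * little_rel_perf


if __name__ == "__main__":
    extra = little_cores_in_freed_area(16, 8)
    print(extra)                                # 32
    print(die_area(16, 0), die_area(8, extra))  # 64 64
    # Hypothetical: a little core delivers half a big core's throughput.
    print(mt_throughput(16, 0, 0.5), mt_throughput(8, extra, 0.5))  # 16.0 24.0
```

Under these assumptions the 8+32 layout wins on multithreaded throughput whenever a little core delivers more than a quarter of a big core's throughput (8 + 32p > 16 exactly when p > 0.25). Single-threaded performance, memory bandwidth and scheduling overheads are ignored.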

I actually think power efficiency on desktop is also an objective for Intel, for the same reason they want ATX12VO: power regulations from states and governments.
#16
Punkenjoy
persondb said: "Yeah, die utilization is a big thing. To add to that, for Intel, apparently one of their big cores (Golden Cove) uses as much space as 4(!!!!) little cores (Gracemont). So it would be a scenario of 16 big cores vs. 8+32 cores.

I actually think power efficiency on desktop is also an objective for Intel, for the same reason they want ATX12VO: power regulations from states and governments."
Indeed.

Well, people talk a lot about IPC, but work accomplished per joule is still a critical point on every platform, including desktop. Nobody wants to deal with a 1000 W CPU.

"Little" cores aren't only smaller; they use less power for the same amount of work. This means you can do more work within the same power envelope. In the end, this is what everyone wants on every platform.
#17
InVasMani
The way I see it, a big.LITTLE approach could be used to vary the L1/L2 cache between dies, vary the core count, and vary the instruction sets. I could see Intel/AMD actually making per-core design changes within an actual chip die. The more multi-core these chips become, the more AMD/Intel will probably want to leverage specifically tailored advantages within the cores, similar to what AMD did going from RDNA to RDNA 2 to eke out more performance and efficiency from the design. If you have an 8c or 16c chip die, half or a quarter of it might have subtle differences in L1/L2 cache, instruction sets, etc. down the road.

Much of the low-hanging fruit from multi-core, multi-die CPU designs might run its course, so putting less emphasis on general-purpose design might be the way forward, in favor of task-specific speedups for latency-sensitive workloads that are only lightly threaded, for example.

I just see a doorway to leveraging different L1/L2 cache sizes and design structures between chip dies. Why wouldn't you want an L1 or L2 cache with slightly lower latency than the one on another die? And why might you instead want slightly higher L1/L2 latency in exchange for avoiding the miss penalty of accessing the next cache level? There are obvious performance and power-efficiency reasons why caches are designed a particular way. It's just not possible for them to be perfect across all use cases in terms of performance and efficiency; there is always a balance, and I see this as a nice way of achieving a better one. I do think the L3 cache could absolutely be shared, with tasks combined across it in highly parallel workloads. For lighter, less parallel workloads, I could see a round-robin cache approach based on the chip die, where a task selects the best-suited L1/L2 cache design on the fly, using one over another to speed up a given part of the task at hand.
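The latency-versus-miss-penalty trade-off raised here is usually reasoned about with the textbook average memory access time (AMAT) formula. The numbers below are invented purely for illustration (two hypothetical L1 designs and an assumed 40-cycle penalty to the next level) and are not figures for any AMD or Intel core:

```latex
\begin{align*}
\text{AMAT} &= t_{\text{hit}} + m \cdot t_{\text{penalty}} \\
\text{AMAT}_{\text{A}} &= 4 + 0.10 \times 40 = 8 \text{ cycles} && \text{(faster, smaller L1)} \\
\text{AMAT}_{\text{B}} &= 5 + 0.05 \times 40 = 7 \text{ cycles} && \text{(slower, larger L1)}
\end{align*}
```

Here t_hit is the hit latency, m the miss rate and t_penalty the cost of going to the next level. With these assumed numbers the slower cache wins because it halves the miss rate; with a smaller penalty (say 14 cycles, giving 5.4 vs. 5.7 cycles) the faster cache comes out ahead.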
#18
ADB1979
AMD has a few potential options here:

Make a new cut-down Zen optimized for die space and low power.

Bring back AMD's Atom competitor with significant updates.

Use ARM cores.

Using ARM cores has been my thought for some time. MS has the ability to do so, and the Linux people surely do as well. Plus, AMD has produced its own ARM CPUs before, so it already has some experience with ARM. Possibly the most significant point here (apart from the software) is that AMD and Samsung are already collaborating, with Samsung fabricating AMD parts and licensing AMD RDNA GPU cores for its mobile chips.
#19
InVasMani
AMD should just take a chip like the 3300X, shrink it to 5 nm, then make two other variations of it and combine three of those chips on the same substrate. The variations would alter the L1 and L2 cache data width and the instruction set sizes and designs slightly, in opposite directions, while the 3300X shrunk down to 5 nm would sit right in between. Basically, on every L1 cache clock cycle they could pick the most optimal of three different L1 cache designs, avoiding a cache miss penalty and an L2 access, and the same goes for the L2 cache, avoiding a miss penalty and an access to the slower L3. It's pretty clear to me that the cache structure of different dies and the instruction sets on them are ways to increase IPC and avoid cache miss penalties. The L3 cache could obviously be shared while the other cache levels are differentiated more. I feel that is the way to approach big.LITTLE, and down the road it could be done more on a per-core or per-core-cluster level within a chip die, similar to how GPUs handle some parts of the design, like how RDNA evolved into RDNA 2.
#20
Punkenjoy
ADB1979 said: "AMD has a few potential options here:

Make a new cut-down Zen optimized for die space and low power.

Bring back AMD's Atom competitor with significant updates.

Use ARM cores.

Using ARM cores has been my thought for some time. MS has the ability to do so, and the Linux people surely do as well. Plus, AMD has produced its own ARM CPUs before, so it already has some experience with ARM. Possibly the most significant point here (apart from the software) is that AMD and Samsung are already collaborating, with Samsung fabricating AMD parts and licensing AMD RDNA GPU cores for its mobile chips."
It's true that you have Windows and Linux versions running on ARM. But that does not mean it would make sense to use ARM cores as the little cores. You can't execute x64 code on arm64 cores.

The rumors are they would use Zen 4 cores as the little cores and Zen 5+ as the big cores, all x64.
#21
ADB1979
Punkenjoy said: "It's true that you have Windows and Linux versions running on ARM. But that does not mean it would make sense to use ARM cores as the little cores. You can't execute x64 code on arm64 cores."
The entire code base would have to be different, not just the 64-bit stuff. However, my line of thinking here is that the "little" cores are there for background tasks and things that simply do not need blazing performance.

MicroShaft and Linux already run the OS and apps on ARM cores, so I see no (good) reason why the OS in either case could not literally be run one on top of the other. I would run the "base" OS on the Arm cores, so that the base system natively handles low-power and background system tasks, with the "upper" OS running on the big AMD64 (x86) cores for things that need it.

Arm cores have already been heavily optimized for web apps and video. The relatively small performance loss of running web apps (primarily browsers) on the "little" Arm cores would likely be unnoticeable except in benchmarks, and browsers are the only "low-power" app I am aware of that is deceptively needy of high-performance CPUs.

On the MS side, there would undoubtedly be a massive kludge, but as gaming and other apps that need high performance are slowly moving away from MS to Linux anyway, I would expect Linux to be the ideal OS platform, or I should say, dual-OS platform. There is also no real reason why such an Arm/AMD64 hybrid CPU cannot be used with Linux on the Arm cores as the "base" OS and Windows on top of that; in many respects this would be the ideal solution.

There could be great boons for system security in going this route of essentially running two OSes, one on top of the other, which is commonplace with servers.

There is still a long wait before this happens, so we will likely not get anything concrete for a couple of years. Until then, this is pure speculation and a thought experiment.
Punkenjoy said: "You can't execute x64 code on arm64 cores."
Intel's upcoming solution has a couple of similar problems itself. The first is likely not a real issue at all: the "small cores" do not have Hyper-Threading. The second is that the "small cores" cannot execute AVX-512. AVX-512 is still not commonly used, and when it is, it is typically in high-performance apps anyway.

Seeing how Intel / MicroShaft / Linux get around these issues will be interesting, as will the whole operation of the scheduler. From what I have heard, MS has been putting a lot of effort into this (with Intel), which could give us clues as to how AMD goes about it.

Interesting times ahead, especially for those who will want to use (or just test) how Alder Lake performs on W10 vs. W11, and whether MS / Intel make Alder Lake's big.LITTLE CPUs work properly at all on W10. We shall find out in a few months.
Copyright © 2004-2021 www.techpowerup.com. All rights reserved.
All trademarks used are properties of their respective owners.