Wednesday, August 28th 2019

AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit

AMD reached a settlement in the Class Action Lawsuit filed against it over alleged false marketing of the core counts of its eight-core FX-series processors based on the "Bulldozer" microarchitecture. Each member of the Class receives a one-time payout of USD $35 per chip, while the company takes a hit of $12.1 million. The lawsuit dates back to 2015, when Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely marketing its FX-series "Bulldozer" processors as having 8 CPU cores. Over the following four years the case gained traction, and a Class Action was built against AMD this January.

In the months that followed the January set-up of a 12-member jury to examine the case, lawyers representing the Class and AMD argued over the underlying technology that makes "Bulldozer" a multi-core processor, and eventually discussed what a fair settlement for the Class would be. They agreed on a number - $12.1 million, or roughly $35 per chip AMD sold - which they agreed was "fair," yet significantly less than the "$60 million in premiums" consumers contended they paid for these processors. Sifting through these numbers, it's important to understand what the Class consists of: U.S. consumers who opted in to the Class Action and who bought an 8-core processor based on the "Bulldozer" microarchitecture. It excludes consumers of every other "Bulldozer" derivative (4-core and 6-core parts, APUs, and follow-ups to "Bulldozer" such as "Piledriver," "Excavator," etc.).
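For a rough sense of scale, the reported figures imply the following back-of-the-envelope arithmetic (the actual class size and attorney-fee split were not part of the report, so this is illustrative only):

```python
# Rough arithmetic implied by the reported settlement figures.
settlement_total = 12_100_000   # USD, total fund
payout_per_chip = 35            # USD, reported per-chip payout

# Implied number of eligible chips if the whole fund went to payouts
# (in practice, fees and administration costs come out of the fund).
implied_chips = settlement_total / payout_per_chip
print(f"~{implied_chips:,.0f} chips")  # ~345,714 chips

# Per-chip premium implied by the "$60 million in premiums" claim,
# under the same chip count.
claimed_premium_total = 60_000_000  # USD
print(f"~${claimed_premium_total / implied_chips:.0f} claimed per chip")  # ~$174
```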
Image Credit: Taylor Alger | Source: The Register

288 Comments on AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit

#26
thedukesd1
FordGT90Concept said:
Software was compiled for complete processors, not dual-thread--asymmetric processor designs. Bulldozer could have been great if software was compiled to take advantage of it. ...
Well, AMD at that point didn't have the highest market share in the PC CPU area. Also, when you develop a new CPU, you need to make sure that older software, which might never see any optimization for your new CPU architecture, runs better on your new CPUs too.
Both are important. The first because, if you are the market leader, newer software will probably be optimized for your newer architecture and maybe some of the older software will see optimizations. The second because, if older software runs like crap on your new CPU architecture, you might not really sell that many CPUs.

By development I mean more than the CPU design phase; I also include the sample-testing phase. During sample testing it's impossible not to see the chips underperforming with existing software, and in that situation you need to go back to the design phase and fix what is wrong.
As the underdog, if you expect software to get optimized for your architecture, you are kinda suicidal. The software developers are not really gonna bother; they're gonna point out that their software works better on your older architecture and on your competition's CPUs, they're not gonna waste their time on your architecture, and as a result your CPUs are gonna perform badly and you are not gonna sell...
#27
londiste
RichF said:
What is relevant is the technical definition of a core. As I already posted, FPUs are not even required for a CPU core.
Technical definition of core includes or implies it being independent. Bulldozer modules are independent, Bulldozer cores are not.

If you want to read the discussion/argument going back and forth for several rounds, go through that thread. We probably won't be able to bring anything new to the table here.
FordGT90Concept said:
It's all in this thread:
https://www.techpowerup.com/forums/threads/bulldozer-core-count-debate-comes-back-to-haunt-amd.251758/
#28
RichF
londiste said:
Technical definition of core includes or implies it being independent.
So, a core is the entire chip. I've never seen that definition before.
#29
londiste
RichF said:
So, a core is the entire chip. I've never seen that definition before.
In a multi-core processor, each core is effectively an independent CPU.

While wiki is not always the best source, its description is pretty accurate:
https://en.wikipedia.org/wiki/Multi-core_processor
A multi-core processor is a computer processor integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions, as if the computer had several processors.
#30
FordGT90Concept
"I go fast!1!11!1!"
seronx said:
None of the general purposes cores get stalled from anything on the FPU core. They are distinct from the FPU.
You have two threads in the core: both threads have a 256-bit FMAC operation required before they can progress. Both threads are stalled while the shared FPU round robin executes the operations: only one thread per cycle is being worked on. We see this a lot in Bulldozer multithreaded benchmarks, where it only performs at about 120% of a quad core versus 200%. Bulldozer is great at 7-zip because of the MIPS nature of 7-zip. In most other tests, it looks comparable to Intel's Hyper-Threading (but slightly faster because it can simultaneously execute two integer operations).

Judging by performance, it's more like an SMT quad-core than a non-SMT octo-core. Legit octo-cores (AMD and Intel alike) bitch slap Bulldozer. :laugh: ...and it's reflected in the increase of transistors too.
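The scaling described above can be illustrated with a toy throughput model (my own simplification with made-up workload mixes, not a model of AMD's actual pipeline):

```python
def module_throughput(fp_fraction, cores_per_module=2):
    """Toy model of one Bulldozer-style module: each core retires one op
    per cycle on its own integer cluster, but FP ops from both cores go
    through a single shared FPU served round-robin (one thread/cycle).
    fp_fraction is the share of each thread's ops that are FP."""
    int_ops = cores_per_module * (1 - fp_fraction)   # runs in parallel
    fp_ops = min(cores_per_module * fp_fraction, 1.0)  # FPU caps at 1 op/cycle
    return int_ops + fp_ops

# Pure integer load: the two "cores" scale like a real dual core.
print(module_throughput(0.0))  # 2.0 ops/cycle -> 200% of one core
# Pure FP load: the shared FPU serializes both threads.
print(module_throughput(1.0))  # 1.0 ops/cycle -> 100% of one core
# Mixed loads land in between, which is the benchmark pattern noted above.
print(module_throughput(0.7))
```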
#31
RichF
londiste said:
In a multi-core processor, each core is effectively an independent CPU.
But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.

So, I'm seeing a paradoxical claim. It's independent except that it's not.
#32
londiste
RichF said:
But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.
What would you consider a CPU? What does a chip or a logic circuit have to do to be a CPU?

RichF said:
So, I'm seeing a paradoxical claim. It's independent except that it's not.
A CPU or processing unit, as in the wiki sentence above, is defined by performing instructions - in this case x86 instructions. CPU and core are defined as the same. A CPU/core includes the front-end and its execution resources, along with the control circuitry to manage them. Today L2 cache is usually included in the core, while by academic definition it is outside it. Both by definition and in practice, this bunch of things is independent.

Memory controller, higher level caches like L3 and newer additions to CPU die like IO are not core functions.

In a multicore CPU, one core can be disabled completely and remaining cores will work as expected. This has always been the case, in practice this is usually implemented with a setting in BIOS.
#33
RichF
londiste said:
What would you consider a CPU? What does a chip or a logic circuit have to do to be a CPU?
I'm merely pointing out that there are problems with the independence angle. Taken to its full extent, it ends up with the notion that multi-core CPUs have to have fully independent CPUs, more than one, on a single die.
#34
thedukesd1
RichF said:
What is relevant is the technical definition of a core. As I already posted, FPUs are not even required for a CPU core. Please, at least, try to rebut what I've said instead of ignoring my arguments entirely. I've done more than just point out that FPUs aren't required.
The industry (AMD included) for years made it like this: 1 core = 1 FPU.

Ok. At Bulldozer's release date, what could you use a CPU for that totally lacked FPUs (and also had nothing to emulate an FPU)?
By what you are writing, you are basically saying that FPU count doesn't matter. So I decided to push it to the limit and ask you what use a CPU that totally lacked FPUs had at Bulldozer's release date.
#35
londiste
RichF said:
I'm merely pointing out that there are problems with the independence angle. Taken to its full extent, it ends up with the notion that multi-core CPUs have to have-fully independent CPUs, more than one, on a single die.
This notion is completely accurate.
#36
RichF
thedukesd1 said:
The industry (AMD included) for years made it like this: 1 core = 1 FPU.

Ok. At Bulldozer's release date, what could you use a CPU for that totally lacked FPUs (and also had nothing to emulate an FPU)?
By what you are writing, you are basically saying that FPU count doesn't matter. So I decided to push it to the limit and ask you what use a CPU that totally lacked FPUs had at Bulldozer's release date.
Many CPUs had been sold without FPUs. There may have even been an Alpha at the time, shortly before, or after. It doesn't matter. FPUs were never a requirement to have a CPU.

Your argument is like saying that RAM wasn't really RAM until DDR. Doubling the data rate doesn't make RAM into RAM. It merely makes it DDR. An FPU, similarly, is an addition upon the basic spec. FPUs could be eliminated again from CPUs and emulated in software. The performance would be bad for FPU-dependent processing but it would still run. Software FPU was used for many years. The FPU is a superset of the CPU core.
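For what it's worth, the integer-only arithmetic behind the software-FPU era RichF alludes to is easy to sketch. Below is fixed-point math, one common FPU-less technique, rather than a real IEEE-754 soft-float library; the Q16.16 format is chosen purely for illustration:

```python
# Q16.16 fixed-point: represent x as round(x * 2**16) in a plain integer,
# so all "floating point" math is done with integer ops only.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS

def to_fix(x: float) -> int:
    return round(x * ONE)

def fix_mul(a: int, b: int) -> int:
    # The product of two Q16.16 numbers carries 32 fractional bits;
    # shift back down to 16 to stay in format.
    return (a * b) >> FRAC_BITS

def to_float(a: int) -> float:
    return a / ONE

a, b = to_fix(3.25), to_fix(2.5)
print(to_float(fix_mul(a, b)))  # 8.125 (exact: both values fit Q16.16)
```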
londiste said:
This notion is completely accurate.
Do you have a better citation than a wiki? The last time I checked, multi-core CPUs could share things, like a common pool of cache, and cannot operate with some of the things they share disabled.

Besides, if you believe it's accurate then you really shouldn't use the terminology "multi-core CPU". Instead, you should use the terminology "multi-CPU die". This is because a core is a subset of a CPU. In multi-core CPUs in particular there is the expectation of resource sharing. That is what separates "core" from "CPU".
#37
londiste
RichF said:
Do you have a better citation than a wiki? The last time I checked, multi-core CPUs could share things, like the way Broadwell C's cores shared the L4 cache.
I don't have good academic sources handy to reference. The basics should be the same in course materials, for example:
https://teachcomputerscience.com/cpu/

Using both AMD's Zen and Intel's Skylake block diagrams from WikiChip as an example here. Check the SoC (entire die) diagram as well as the core parts that follow:
https://en.wikichip.org/wiki/amd/microarchitectures/zen#Block_Diagram
https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Block_Diagram

Any one core of these can be disabled independently, both in theory and in practice.
IO/Memory controllers, L3 cache and Infinity Fabric/Ring Bus are not part of the core.
#38
FordGT90Concept
"I go fast!1!11!1!"
RichF said:
But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.

So, I'm seeing a paradoxical claim. It's independent except that it's not.
They are independent. Each core fetches instructions, decodes them, executes them, and stuffs the results back into memory where they can be used again.

In Bulldozer, there's actually four instruction fetchers: one for the core, one for integer cluster 1, one for integer cluster 2, and one for the floating point cluster. As far as software is concerned, it only sees the first: the one for the core. This is why Microsoft had a hell of a time trying to get Windows thread dispatching to work right. Windows had to be modified to more intelligently control threading so it wouldn't inadvertently move threads around in a manner that would overwhelm one core while leaving another idle. Older versions of Windows (I think it was 7) addressed this by making each thread (associated with an integer cluster) a "processor." This solved the problem of Windows shifting threads around but it created a problem of overwhelming the floating point cluster because it couldn't appropriately load balance integer and floating point!

Years passed and in Windows 10 (I think 8 too), Microsoft finally tackled the problem by making the Windows thread scheduler aware of physical processors and logical processors. Because of this, Windows can now appropriately delegate floating point (by physical processor) and integer (by logical processor). Why? Because it's a relevant problem for all SMT implementations. Ryzen, Core i#, and Pentium 4 w/ HT only have one integer cluster and one floating point cluster each, but they can accept two threads. The underlying hardware has to be managed in a similar fashion to Bulldozer's. The only difference is that tiny little detail of Bulldozer having two integer clusters instead of one.

FX-8350 has four cores, each core has two integer clusters. They are not equivalent. A core knows how to do SIMD (single instruction multiple data); an integer cluster does not. Integer clusters are fundamentally calculators, not processors. They lack awareness (have no knowledge of parallelism), logic (Boolean tests, branching, etc.), and access (only have the data they are given) to be processors.
#39
RichF
londiste said:
I don't have good academic sources handy to reference. The basics should be the same in course materials, for example:
https://teachcomputerscience.com/cpu/

Using both AMD's Zen and Intel's Skylake block diagrams from WikiChip as an example here. Check the SoC (entire die) diagram as well as the core parts that follow:
https://en.wikichip.org/wiki/amd/microarchitectures/zen#Block_Diagram
https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Block_Diagram

Any one core of these can be disabled independently, both in theory and in practice.
IO/Memory controllers, L3 cache and Infinity Fabric/Ring Bus are not part of the core.
Zen is a more Intel-like design than Bulldozer. Pairing it with Skylake doesn't resolve this issue.

As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the 1 integer core per module setting.
#40
londiste
RichF said:
Zen is a more Intel-like design than Bulldozer. Pairing it with Skylake doesn't resolve this issue.
Zen and Skylake are by-the-book multicore CPUs. So have been the rest of x86 CPUs including AMD's K8 and K10.
#41
FordGT90Concept
"I go fast!1!11!1!"
RichF said:
As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the 1 integer core per module setting.
All that does is tell the core fetcher to only accept one thread instead of two. Fundamentally no different than disabling Hyperthreading.
#42
RichF
londiste said:
Zen and Skylake are by-the-book multicore CPUs. So have been the rest of x86 CPUs including AMD's K8 and K10.
By Intel's book.

Diverging from common design doesn't mean lawsuit. :rolleyes:
FordGT90Concept said:
All that does is tell the core fetcher to only accept one thread instead of two. Fundamentally no different than disabling Hyperthreading.
Citation please. Everything I've read said FX CPUs' CMT is not SMT. I am also interested in how that BIOS setting works, considering that you are saying it's mislabeled by Gigabyte.
#43
londiste
FordGT90Concept said:
In Bulldozer, there's actually four instruction fetchers: one for the core, one for integer cluster 1, one for integer cluster 2, and one for the floating point cluster. As far as software is concerned, it only sees the first: the one for the core. This is why Microsoft had a hell of a time trying to get Windows thread dispatching to work right. Windows had to be modified to more intelligently control threading so it wouldn't inadvertently move threads around in a manner that would overwhelm one core while leaving another idle. Older versions of Windows (I think it was 7) addressed this by making each thread (associated with an integer cluster) a "processor." This solved the problem of Windows shifting threads around but it created a problem of overwhelming the floating point cluster because it couldn't appropriately load balance integer and floating point!
There is only one fetch, this is in the frontend of the Bulldozer module. There are multiple schedulers for dispatched micro-ops owing to split execution stage. By the way, Thuban and Zen also have split execution stage and separate schedulers for Integer and FPU clusters.

Microsoft's problem with scheduling was strange. The eventual fix was a change in how a Bulldozer CPU was handled. Initially Bulldozer was treated as a full 8-core processor, and the situation improved considerably when it started to be treated as a 4-core with SMT (it is noteworthy that Linux did the same much sooner). This inherently addressed both problems plaguing scheduling:
- Moving threads around to undesired cores. A simple example is the second core in a single module whose first core is already loaded.
- For the same reason, any FPU-heavy load was now more likely to go to an unused module, largely negating the shared-FPU issue.
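The placement difference described here can be sketched as a toy scheduler (my own simplification; real OS schedulers and topology enumeration are far more involved):

```python
def place_threads(n_threads, n_modules=4, module_aware=True):
    """Assign threads to (module, core) slots on a Bulldozer-style chip.
    Module-aware placement fills core 0 of every module before using any
    core 1, so threads don't share a module's FPU until they have to."""
    if module_aware:
        # Spread across modules first, like SMT-sibling-aware scheduling.
        order = [(m, c) for c in range(2) for m in range(n_modules)]
    else:
        # Naive: pack both cores of a module before moving to the next.
        order = [(m, c) for m in range(n_modules) for c in range(2)]
    return order[:n_threads]

# 4 FP-heavy threads on a 4-module chip:
print(place_threads(4, module_aware=False))  # both cores of modules 0 and 1: two shared FPUs
print(place_threads(4, module_aware=True))   # core 0 of every module: no FPU sharing
```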
#45
seronx
FordGT90Concept said:
You have two threads in the core: both threads have a 256-bit FMAC operation required before they can progress. Both threads are stalled while the shared FPU round robin executes the operations: only one thread per cycle is being worked on.
Both threads are not stalled if both cores schedule a 256-bit op.
2x 80-bit Lo
2x 64-bit Hi
2x 128-bit Mid

If it's a 256-bit op, it would execute Lo twice for 128-bit and Hi twice for 128-bit on a single port for each thread. Lo to Hi register moves are 1-cycle; Lo(Hi) to Lo(Hi) register moves are 0-cycle. So, if the second thread is dependent on the first thread, it could execute the first half on both, etc.

The FPU design is built with two cores in mind. The front-end is built with two cores in mind. The L2 and its interface are built with two cores in mind. There are physically only two cores in the Bulldozer module, as it is the world's first monolithic dual-core x86 architecture.
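In the abstract, the two-half execution seronx describes is just cracking a wide vector op into narrower ones. A schematic sketch (the 8 x 32-bit lane layout and masking here are illustrative, not AMD's actual datapath):

```python
def add_256(a, b):
    """Schematic: a 256-bit vector add executed as two 128-bit halves,
    the way hardware with 128-bit datapaths cracks wide SIMD ops.
    Vectors are 8 x 32-bit lanes; each half covers 4 lanes."""
    lo = [(x + y) & 0xFFFFFFFF for x, y in zip(a[:4], b[:4])]  # low 128 bits
    hi = [(x + y) & 0xFFFFFFFF for x, y in zip(a[4:], b[4:])]  # high 128 bits
    return lo + hi  # one logical op, two issue slots on narrow hardware

a = list(range(8))
b = [10] * 8
print(add_256(a, b))  # [10, 11, 12, 13, 14, 15, 16, 17]
```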
#46
RichF
londiste said:
Initially Bulldozer was treated as full 8-core processor and situation improved considerably when it started to be treated as 4-core with SMT
Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit instead of combining 128s. Does that mean Zen 1 didn't have any real cores in it?

How the Windows scheduler acts with a design it wasn't made for is an interesting topic but hardly proof.
#47
thedukesd1
RichF said:
Many CPUs had been sold without FPUs. There may have even been an Alpha at the time, shortly before, or after. It doesn't matter. FPUs were never a requirement to have a CPU.

Your argument is like saying that RAM wasn't really RAM until DDR. Doubling the data rate doesn't make RAM into RAM. It merely makes it DDR. An FPU, similarly, is an addition upon the basic spec. FPUs could be eliminated again from CPUs and emulated in software. The performance would be bad for FPU-dependent processing but it would still run. Software FPU was used for many years. The FPU is a superset of the CPU core.
The FPU has become part of a core because it started to be needed often and because emulation sucks in terms of performance.
We are not living in the days when code was properly optimized. Today we are living in the days when "buy a better CPU", "buy more RAM", etc. is normal...

If I code something and I know for sure that whoever is leading keeps changing his/her mind about what he/she wants, then you can bet I will basically abuse FPU usage just so I don't have to go back and change things. You can say it's my fault; I will say it's the leader's fault, because he/she clearly has no clear goals in mind.
And trust me, the "insert bad words" leader who keeps changing his/her mind about what he/she wants also wants things done fast, totally ignoring the fact that his/her instability is making things slower and less optimized.

There is not gonna be a real difference between 4 or 8 heavy FPU threads run on the 8150. This is a reason to raise the question of whether the 8150 is an 8-core or a 4-core. Someone might not even say what type of threads they're pushing on the 8150, just run such a benchmark and show that there is basically no real difference between 4 and 8 threads on the 8150. Do I have to say what type of threads I'm pushing on a CPU? I think not.
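That prediction can be phrased with a toy model (my own simplification, assuming one FP op per cycle per module FPU and ideal thread placement):

```python
def fp_throughput(n_threads, n_modules=4):
    """Toy model of FP-heavy threads on a 4-module chip: each module has
    one FPU retiring 1 FP op/cycle whether one or both of its cores are
    loaded, so aggregate FP throughput caps at the module count."""
    return min(n_threads, n_modules)

print(fp_throughput(4))  # 4 FP ops/cycle
print(fp_throughput(8))  # still 4 - the shared FPUs are the ceiling
```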
#48
RichF
thedukesd1 said:
The FPU has become part of a core because it started to be needed often and because emulation sucks in terms of performance.
1) FPU emulation has always been vastly slower than having a hardware FPU. So, pointless.

2) I'll repeat my question: Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit instead of combining 128s. Does that mean Zen 1 didn't have any real cores in it?

Zen 1 can't execute 256-bit AVX independently. It has to combine at the 128-bit level. So, it doesn't have any real cores, eh? Not only is it slower at doing 256-bit AVX, it can't do it independently.
#49
londiste
RichF said:
Everything I've read said FX CPUs' CMT is not SMT.
CMT is a little bit of this, a little bit of that. SMT implies there are no added execution resources for additional threads. AMD added an Integer Cluster for CMT. Aside from that they are very similar.

Keep in mind that added Integer Cluster does not necessarily mean a huge boost in execution resources. Bulldozer Integer Clusters contain 4 pipes each (2 ALU, 2 AGU) and FPU contains 3 pipes (2 FMAC + MMX). At the same time, Zen's Integer Cluster has 6 pipes (4 ALU, 2 AGU) and FPU has 3 pipes (2 FMAC + MMX).
#50
xtreemchaos
I've a Bulldozer and 2 Piledrivers which I love. I've moved on now but still keep them for old times' sake. To tell the truth, it don't mean much to me if they're 4 or 8 core; they're a part of my life, I enjoyed them, and they made me happy.