Wednesday, August 28th 2019

AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit

AMD has reached a settlement in the class-action lawsuit filed against it over allegedly false marketing of the core counts of its eight-core FX-series processors based on the "Bulldozer" microarchitecture. Each member of the Class receives a one-time payout of USD $35 per chip, while the company takes a hit of $12.1 million. The lawsuit dates back to 2015, when Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely marketing its FX-series "Bulldozer" processors as having 8 CPU cores. Over the following four years the case gained traction, and a Class Action was built against AMD this January.

In the months that followed the January set-up of a 12-member Jury to examine the case, lawyers representing the Class and AMD argued over the underlying technology that makes "Bulldozer" a multi-core processor, and eventually discussed what a fair settlement would be for the Class. They settled on a number: $12.1 million, or roughly $35 per chip AMD sold, which they agreed was "fair," and yet significantly less than the "$60 million in premiums" consumers contended they paid for these processors. Sifting through these numbers, it's important to understand what the Class consists of: U.S. consumers who opted to join the Class Action and who bought an 8-core processor based on the "Bulldozer" microarchitecture. It excludes buyers of every other "Bulldozer" derivative (4-core and 6-core parts, APUs, and follow-ups to "Bulldozer" such as "Piledriver," "Excavator," etc.).
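As a rough sanity check on the figures above, the short Python sketch below divides the settlement by the per-chip payout and compares it with the claimed premiums. It assumes the entire $12.1 million is paid out at exactly $35 per chip, which ignores legal fees and administration costs, so the result is only an upper-bound illustration. The function names are hypothetical.

```python
# Figures quoted in the article.
SETTLEMENT_USD = 12_100_000
PAYOUT_PER_CHIP_USD = 35
CLAIMED_PREMIUMS_USD = 60_000_000


def implied_chip_count(settlement_usd: float, payout_per_chip_usd: float) -> float:
    """Chips covered if the whole settlement were paid out per chip (assumption)."""
    return settlement_usd / payout_per_chip_usd


def settlement_share(settlement_usd: float, claimed_usd: float) -> float:
    """Settlement as a fraction of the premiums consumers said they paid."""
    return settlement_usd / claimed_usd


if __name__ == "__main__":
    chips = implied_chip_count(SETTLEMENT_USD, PAYOUT_PER_CHIP_USD)
    share = settlement_share(SETTLEMENT_USD, CLAIMED_PREMIUMS_USD)
    print(f"Implied chips covered: about {chips:,.0f}")
    print(f"Settlement vs. claimed premiums: about {share:.0%}")
```

Under that assumption the settlement covers on the order of 346,000 chips and amounts to about a fifth of the claimed premiums.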
Image Credit: Taylor Alger Source: The Register

288 Comments on AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit

#176
FordGT90Concept
"I go fast!1!11!1!"
RichF, post: 4106376, member: 154826"
The article I linked to said a judge has to approve the settlement.
Yes, because only a judge can close a case. Plaintiff files a case -> plaintiff and defendant present evidence/make case -> either goes to trial or settles -> judge closes case (serves as a witness to the settlement). Unless it goes to trial, the court just enforces procedure.

RichF, post: 4106380, member: 154826"
Steamroller did not replace Piledriver. It was originally supposed to be designed for the high-performance bracket but AMD changed course and made Steamroller a weaker product to fit into the niche of reduced power consumption and production cost. That is why it never came in 8 cores and it was made on the inferior 28nm node.

Piledriver was the direct replacement for Bulldozer and it was never replaced until Zen 1.
Just because they never sold it as an 8-threaded product doesn't detract from the fact that they saw the need for a change and did it.
Posted on Reply
#177
RichF
FordGT90Concept, post: 4106386, member: 60463"
Yes, because only a judge can close a case. Plaintiff files a case -> plaintiff and defendant present evidence/make case -> either goes to trial or settles -> judge closes case (serves as a witness to the settlement). Unless it goes to trial, the court just enforces procedure.

Just because they never sold it as an 8-threaded product doesn't detract from the fact that they saw the need for a change and did it.
Neither of those responses are effective rebuttals. I'm disengaging from you in this topic from this point forward.
Posted on Reply
#178
FordGT90Concept
"I go fast!1!11!1!"
seronx, post: 4106381, member: 86156"
Um, that(those) instruction decode(s) isn't part of the core.
Every other microprocessor architecture on the market disagrees so, either you're wrong (and AMD too) or everyone else is.

1d10t, post: 4106383, member: 110464"
One question remains: how does this court filing apply? Will it apply to all Bulldozer uArch and derivatives?
The law firm will have to collect the information of those that want to join the class action settlement and in that, they will specify what products specifically apply to the class action lawsuit. It might already say in the settlement but...can't be arsed to go digging.
Posted on Reply
#179
seronx
FordGT90Concept, post: 4106386, member: 60463"
Just because they never sold it as an 8-threaded product doesn't detract from the fact that they saw the need for a change and did it.
Technically, what they did was make it worse;
-> Processor models 00h–1Fh can perform an instruction block fetch every cycle, while model 30h–4Fh processors can perform a block fetch every 2 cycles.
-> In processor models 00h–1Fh, the decode unit scans two of these windows in a given cycle decoding a maximum of four instructions. In processor models 30h–4Fh, the two decode units scan two of these windows every two cycles decoding a maximum of four instructions.

How is that a good change? It is two times slower than the previous generation.
Bulldozer fetches up to 32B every cycle.
Steamroller fetches up to 16B every cycle.
Bulldozer decodes up to 4 macro-instructions every cycle.
Steamroller decodes up to 2 macro-instructions every cycle.
Posted on Reply
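The per-cycle figures in post #179 follow from dividing each "every N cycles" figure by N. The Python sketch below redoes that arithmetic. It takes the 32-byte fetch block size from the post itself and, as a labeled assumption, also shows the aggregate if both decode units in a module reach their maximum independently. Which of the two figures is the fair comparison is the point the posters dispute, and the code does not settle it.

```python
from fractions import Fraction


def per_cycle(amount: int, cycles: int) -> Fraction:
    """Average amount per clock cycle for work done once every `cycles` cycles."""
    return Fraction(amount, cycles)


# Figures as quoted in the post: a 32-byte fetch block and a 4-instruction decode maximum.
FETCH_BLOCK_BYTES = 32
DECODE_MAX = 4

bulldozer_fetch = per_cycle(FETCH_BLOCK_BYTES, 1)    # one block every cycle
steamroller_fetch = per_cycle(FETCH_BLOCK_BYTES, 2)  # one block every 2 cycles

bulldozer_decode = per_cycle(DECODE_MAX, 1)          # shared decode unit
steamroller_decode_unit = per_cycle(DECODE_MAX, 2)   # a single decode unit

# Assumption: both decode units in a module hit their maximum at the same time.
steamroller_decode_module = 2 * steamroller_decode_unit

if __name__ == "__main__":
    print(f"Fetch, bytes/cycle: {bulldozer_fetch} vs {steamroller_fetch}")
    print(f"Decode per unit, instr/cycle: {bulldozer_decode} vs {steamroller_decode_unit}")
    print(f"Decode per module, instr/cycle: {bulldozer_decode} vs {steamroller_decode_module}")
```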
#180
RichF
seronx, post: 4106389, member: 86156"
Technically, what they did was make it worse;
-> Processor models 00h–1Fh can perform an instruction block fetch every cycle, while model 30h–4Fh processors can perform a block fetch every 2 cycles.
-> In processor models 00h–1Fh, the decode unit scans two of these windows in a given cycle decoding a maximum of four instructions. In processor models 30h–4Fh, the two decode units scan two of these windows every two cycles decoding a maximum of four instructions.

How is that a good change? It is two times slower than the previous generation.
The Stilt also said that the 32nm SOI process, particularly once it had matured, offered better characteristics than 28nm for high-performance parts. 28nm bulk was used because it was cheaper to make chips with, not because it was an upgrade.

Obviously, if AMD had decided to follow through with its original intention, it would have made Steamroller in no less than 8 core parts and wouldn't have cut away other things like cache. Steamroller was, obviously (as there was no 8-core part — not even a 6-core part), designed mainly to fit into the roles of reduced power consumption and reduced production cost. The minor IPC improvements from Steamroller and Excavator came at the cost of frequency and core count, both of which trumped the IPC gains in the high-performance realm — particularly when compared with mature-process Piledriver at performance-optimal clock, which is probably around 4.4 GHz. The designs were further hampered by an inferior socket/VRM spec and 28nm process.
Posted on Reply
#181
FordGT90Concept
"I go fast!1!11!1!"
seronx, post: 4106389, member: 86156"
Technically, what they did was make it worse;
-> Processor models 00h–1Fh can perform an instruction block fetch every cycle, while model 30h–4Fh processors can perform a block fetch every 2 cycles.
-> In processor models 00h–1Fh, the decode unit scans two of these windows in a given cycle decoding a maximum of four instructions. In processor models 30h–4Fh, the two decode units scan two of these windows every two cycles decoding a maximum of four instructions.

How is that a good change? It is two times slower than the previous generation.
Bulldozer fetches up to 32B every cycle.
Steamroller fetches up to 16B every cycle.
Bulldozer decodes up to 4 macro-instructions every cycle.
Steamroller decodes up to 2 macro-instructions every cycle.
There's two decoders per core in Steamroller: the throughput is the same when comparing apples to apples...less the opportunity for collision/blocking because the decoders are independent.
https://www.anandtech.com/show/6201/amd-details-its-3rd-gen-steamroller-architecture
One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:
Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle.
Don’t expect a doubling of performance since it’s rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller.
If they were really independent cores in the first place then this change wouldn't matter. Integer clusters aren't cores. They never were and they never will be.
Posted on Reply
#182
seronx
FordGT90Concept, post: 4106393, member: 60463"
There's two decoders per core in Steamroller: the throughput is the same when comparing apples to apples...less the opportunity for collision/blocking because the decoders are independent.
There was no collision or blocking in Bulldozer, btw.
FordGT90Concept, post: 4106393, member: 60463"
If they were really independent cores in the first place then this change wouldn't matter. Integer clusters aren't cores. They never were and they never will be.
They aren't integer clusters, they are cores.
Posted on Reply
#183
GreiverBlade
FordGT90Concept, post: 4106393, member: 60463"
There's two decoders per core in Steamroller: the throughput is the same when comparing apples to apples...less the opportunity for collision/blocking because the decoders are independent.
https://www.anandtech.com/show/6201/amd-details-its-3rd-gen-steamroller-architecture
decoder don't define core, which are what the INT/LS (EX/LS) unit are ... and there are indeed 2 of them per module ...

one more time
GreiverBlade, post: 4106382, member: 105443"
steamroller has the same INT/LS (EX/LS) pair of core per module ... they just splitted the decode in 2 soooooo "2 INT/LS (EX/LS) 1 decode" is a single core and "2 INT/LS (EX/LS) 2 decode" is a dual core .... sooooo the core are defined by the decode unit? (hint they are not, that class action lawsuit was only a mean to cash on the fact that BD was slower than intel ... although on certain heavily threaded applications ... they weren't but those who use that wouldn't fill a class action lawsuit ... because it only really mattered in gaming performance ... thus: pissing in the wind)
point... of... view...


nooowwww i think we all should stop ... because it's becoming ridiculous, i am right, seronx is right, you are right (ok in a 2:1 ratio about point of view but well can't have the same point of view ... right?)

oh man ... how much i would gladly pay to settle this (and keep my point of view on what define a core.)
Posted on Reply
#184
FordGT90Concept
"I go fast!1!11!1!"
seronx, post: 4106397, member: 86156"
There was no collision or blocking in Bulldozer, btw.
Then explain why AMD changed it and AnandTech explicitly said it resulted in the "largest performance improvement."
Posted on Reply
#185
seronx
FordGT90Concept, post: 4106400, member: 60463"
Then explain why AMD changed it and AnandTech explicitly said AMD changed it for a large "performance improvement."
There was no large performance improvement from the fetch/decode switch. It was made to reduce the overall power consumption, less work every cycle means faster clocks. You talk about Steamroller's supposed front-end improvements, yet ignore the butchered floating-point unit? Again, a change meant to reduce power consumption, not performance improvement.
Posted on Reply
#186
GreiverBlade
"pissing in the wind" : "To waste time on a pointless or fruitless task; do something that is ineffective. You can make a complaint if you like, but you'll just be pissing in the wind."
Posted on Reply
#187
RichF
Piledriver was a small modification of Bulldozer that was released not long after its release. The high-performance sector of AMD's processor business remained frozen from the time Piledriver was released until Zen 1 replaced it. The only improvement was a minor tightening of leakage due to the maturation of the 32nm SOI node, which resulted in the 8370E. Steamroller and Excavator are basically irrelevant to this discussion.

• Neither were released in 6+ cores.

• Neither were released on a high-performance node.

• Neither were released on a high-performance VRM socket spec.

• Neither were released with high-performance amounts of cache.

Minor improvements to IPC pale in comparison to the lack of cores and frequency in Steamroller and Excavator, except in the niche they targeted where power consumption and cheap production cost were favored over performance.
Posted on Reply
#188
FordGT90Concept
"I go fast!1!11!1!"
GreiverBlade, post: 4106399, member: 105443"
decoder don't define core, which are what the INT/LS (EX/LS) unit are ... and there are indeed 2 of them per module …
A core can't do anything without instruction decoding. A core is also independent. The two facts combined are contradictory by AMD's definition of a core but not the industry's definition of a core. AMD, therefore, was not truthful in advertising.

seronx, post: 4106401, member: 86156"
There was no large performance improvement from the fetch/decode switch. It was made to reduce the overall power consumption, less work every cycle means faster clocks.

You talk about Steamroller's supposed front-end improvements, yet ignore the butchered floating-point unit?
Read the article. Splitting decode meant adding transistor real estate which naturally means higher power consumption; however, those are also transistors they can shut off when that thread isn't being used. You know, a feature of independent cores. Again, this is more proof that the integer clusters aren't cores. With each iteration, AMD split more and more hardware to make it more closely mimic a dual core, but it never got there.


An AMD Bulldozer/Piledriver/Steamroller/Excavator "module" is a "core" and AMD concurred by settling. 'nuff said.
Posted on Reply
#189
GreiverBlade
ffs... how come unwatching thread doesn't work ... did they lie to me?

FordGT90Concept, post: 4106404, member: 60463"
An AMD Bulldozer/Piledriver/Steamroller/Excavator "module" is a "core" and AMD concurred by settling. 'nuff said.
tho the core of that core... "module" ... is made of 2 core ... sharing a scheduler and a FP but there isn't 2 core ... riiiiight

as i said
GreiverBlade, post: 4106399, member: 105443"
decoder don't define core, which are what the INT/LS (EX/LS) unit are ... and there are indeed 2 of them per module ...

one more time


point... of... view...


nooowwww i think we all should stop ... because it's becoming ridiculous, i am right, seronx is right, you are right (ok in a 2:1 ratio about point of view but well can't have the same point of view ... right?)

oh man ... how much i would gladly pay to settle this (and keep my point of view on what define a core.)
GreiverBlade, post: 4106402, member: 105443"
"pissing in the wind" : "To waste time on a pointless or fruitless task; do something that is ineffective. You can make a complaint if you like, but you'll just be pissing in the wind."
Posted on Reply
#190
seronx
FordGT90Concept, post: 4106404, member: 60463"
Read the article. Splitting decode meant adding transistor real estate which naturally means higher power consumption; however, those are also transistors they can shut off when that thread isn't being used. You know, a feature of independent cores. Again, this is more proof that the integer clusters aren't cores. With each iteration, AMD split more and more hardware to make it more closely mimic a dual core, but it never got there.
That is not proof. I have already talked about the split decode, it wasn't a performance enhancement but a power enhancement. A core doesn't get four macro-ops every cycle, it only gets two macro-ops every cycle with Steamroller. If the second core is second class in Bulldozer, it is definitely third class in Steamroller.

A decode is not a feature of independent cores. The decode is independent of the cores. All of this is antagonistic to your reasoning. Which is more proof that the replicated parts in Bulldozer are in fact cores.
Posted on Reply
#191
64K
GreiverBlade, post: 4106405, member: 105443"
ffs... how come unwatching thread doesn't work ... did they lie to me?
They've sucked you into this debate and they will never let you leave now. :p
Posted on Reply
#192
londiste
This seems to boil down to terminology and expected functionality.

Minimal core is basically an ALU with a couple registers.
Realistically a core does need some control circuitry, this fits to things like Bulldozer parts or GPUs (CU, CUDA Core, EU).
In most literature this gets called an execution core, as it's not all that useful by itself in a big complex processor and is part of the execution stage or unit.

When talking about Bulldozer, the question boils down to expected functionality. What exactly should core be able to run?
- If it's what is generally referred to as micro-ops, then most pipes qualify as cores.
- In the way Bulldozer works, I think these were called macro-ops but if it needs some control, integer units qualify as cores. Technically, FP unit could qualify as well.
- If we want a core to run x86 instructions there really is no way around frontend.
None of these is wrong.
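To make the point about terminology concrete, here is a minimal Python sketch that counts "cores" on an FX-8350-style chip under different definitions. It assumes four modules, each with two integer clusters, one shared floating-point unit, and one shared front-end. The class, the function, and the definition labels are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One Bulldozer-style module (unit counts are assumptions for illustration)."""
    integer_clusters: int = 2
    fp_units: int = 1
    front_ends: int = 1


def count_cores(modules: list[Module], definition: str) -> int:
    """Count cores under one of several competing definitions."""
    if definition == "integer_cluster":
        # A core is any integer execution unit with its own control logic.
        return sum(m.integer_clusters for m in modules)
    if definition == "integer_or_fp":
        # As above, but the shared FP unit is also counted as a core.
        return sum(m.integer_clusters + m.fp_units for m in modules)
    if definition == "x86_front_end":
        # A core must fetch and decode x86 instructions by itself.
        return sum(m.front_ends for m in modules)
    raise ValueError(f"unknown definition: {definition}")


if __name__ == "__main__":
    chip = [Module() for _ in range(4)]
    for definition in ("integer_cluster", "integer_or_fp", "x86_front_end"):
        print(f"{definition}: {count_cores(chip, definition)} cores")
```

The same silicon comes out as 8, 12, or 4 cores depending only on the definition chosen, which is the disagreement running through this thread.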
Posted on Reply
#193
Vya Domus
FordGT90Concept, post: 4106368, member: 60463"
You don't settle if the facts are on your side.
So why didn't the plaintiffs go further with this instead of accepting the settlement?
Posted on Reply
#194
Chomiq
GreiverBlade, post: 4106405, member: 105443"
ffs... how come unwatching thread doesn't work ... did they lie to me?


tho the core of that core... "module" ... is made of 2 core ... sharing a scheduler and a FP but there isn't 2 core ... riiiiight

as i said
Posted on Reply
#195
seronx
londiste, post: 4106414, member: 169790"
- If we want a core to run x86 instructions there really is no way around frontend.
Global Front-end; translates x86 into native instructions. No physical core has x86 fetch, x86 decode, branch predictors, etc.

However every physical core has a control unit, an instruction bus, a data bus, and the datapath.
Buffer for inflight native bundles, scheduler to control all parts, a datapath to execute instructions, and a way to load and store data.

The global front-end also has the capability of Intel's "Anaphase": it can project a virtual core across all physical cores in the design, as it contains a second-level control unit, instruction bus, and data bus, but no physical datapaths.

Intel bought the company for cheap, from those that developed Pentium 3. So, the definition of a core is actually patented by Intel now.

Notice what is missing in this physical SMT4 quad-core with a single physical SMT4 core idled? That is right no decode!
Posted on Reply
#196
FordGT90Concept
"I go fast!1!11!1!"
Vya Domus, post: 4106419, member: 169281"
So why didn't the plaintiffs go further with this instead of accepting the settlement?
It would only go to trial if AMD chose to fight it.


I was looking through AMD Zen slides and came across this one:


Which reminded me of a diagram from the Hot Chips PDF:


Aren't they strikingly similar? Zen's picture is undeniably a "core" (see the last line on the right). It makes no sense to redefine what a "core" is for Bulldozer when it was well established before and after Bulldozer existed.



Even going all the way back to the original Pentium, instruction decoding was not decoupled from the core (because you'd have a calculator instead of a processor):
http://home.etf.rs/~vm/tutorial/micro/mps/mps.htm



FX-8350 is a quad-core, eight-thread processor. AMD simply chose to add a second integer cluster (and later a decoder) to accelerate the second thread. There's nothing wrong with that. What is wrong is that AMD misrepresented their product to the public.
Posted on Reply
#197
seronx
FordGT90Concept, post: 4106426, member: 60463"
FX-8350 is a quad-core, eight-thread processor.
You have yet to prove that.
Posted on Reply
#199
Vya Domus
FordGT90Concept, post: 4106426, member: 60463"
It makes no sense to redefine what a "core" is for Bulldozer when it was well established before and after Bulldozer existed.
We have to continue with this endless limbo but, when was it established and by who ?

All of you naysayers keep repeating this over and over and yet all material out there disagrees with you. It's accepted that a core doesn't need to fetch, decode and execute instructions on its own and can be something as simple as a SIMD unit. The only level at which fetching and decoding must happen (as in a requirement) is at the "processor" level which may or may not contain multiple cores.

I posed this question many times but I never got a definitive answer, are you telling me that the authors of the conjoined cores paper mislabeled the subject of their research ?

Were cores such as AMD's the norm? No, but that doesn't mean they weren't part of this generic classification of "cores". Pointing fingers and saying this block does not look identical to this other block is a really, really primitive way of arguing about this. You are essentially throwing any information that goes more than skin deep out the window.
Posted on Reply
#200
FordGT90Concept
"I go fast!1!11!1!"
seronx, post: 4106436, member: 86156"
You have yet to prove that.
Yes, I did. Many references provided. At no point until AMD pursued CMT was an integer cluster called a "core." It might be called an "execution core" but that's a huge distinction from a multiprocessor environment where core means independent processor.

Vya Domus, post: 4106438, member: 169281"
We have to continue with this endless limbo but, when was it established and by who ?
Alan Turing and the Turing Machine which all CPUs mimic.
Posted on Reply