
AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit

A core has cores.
A processor has processors.

Something ain't right. This bizarre recursion needs to be addressed somehow.

An if statement is just a conditional jump that only needs an ALU, or an execution core, to work. Nothing special is required that can only be found in some other type of core.

Moreover, the execution of every instruction, no matter how complex, can be driven pretty much exclusively by lookup tables.

You can have a memory where every address corresponds to an instruction and the value at that address represents the control signals required for its execution. From then on you just need the ALU and wiring.

Of course actual CPUs aren't really implemented like this, but this proves everything other than the execution core is redundant. That's the only dynamic bit required for a processor.
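
To make the lookup-table idea concrete, here is a minimal sketch in Python. The opcodes, the control-signal names and the tiny register machine are all made up for illustration; it only shows that a table lookup plus an ALU is enough to run an if statement, not how any real CPU is built.

```python
# Toy control store: the opcode is the "address", the dict is the "value"
# holding the control signals. All opcodes and signal names are hypothetical.
CONTROL_STORE = {
    0x0: {"alu_op": "add", "write_reg": True, "jump_if_zero": False},
    0x1: {"alu_op": "sub", "write_reg": True, "jump_if_zero": False},
    0x2: {"alu_op": "sub", "write_reg": False, "jump_if_zero": True},  # compare and branch
}


def alu(op, a, b):
    """The only 'dynamic' part: compute a result from two operands."""
    return a + b if op == "add" else a - b


def run(program, regs):
    """Run a list of (opcode, dst, src, target) tuples on a list of registers."""
    regs = list(regs)  # work on a copy so the caller's list is untouched
    pc = 0
    while pc < len(program):
        opcode, dst, src, target = program[pc]
        signals = CONTROL_STORE[opcode]  # the table lookup
        result = alu(signals["alu_op"], regs[dst], regs[src])
        if signals["write_reg"]:
            regs[dst] = result
        if signals["jump_if_zero"] and result == 0:
            pc = target  # the conditional jump is just ALU output plus wiring
        else:
            pc += 1
    return regs


# "if r0 != r1: r2 += r0" expressed as a compare-and-branch over the add.
PROGRAM = [
    (0x2, 0, 1, 2),  # r0 - r1 == 0 -> skip to index 2 (end of program)
    (0x0, 2, 0, 0),  # r2 = r2 + r0
]

print(run(PROGRAM, [5, 5, 0]))  # [5, 5, 0]
print(run(PROGRAM, [5, 3, 0]))  # [5, 3, 5]
```

The branch never needs anything beyond the ALU result and a table entry saying "jump if zero".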
 
Prior to settling that matter, they still defined what a core is ... which is kinda confusing ... well, a core is a core, be it an execution, CUDA, tensor, stream or plain simple core, depending on the point of view.
 
That's a lot of Ryzen CPUs, 12.1 mil, ugh!
 
It's clear that the ones who will win in this process are the lawyers. Besides, it actually has eight cores, but they were a kludge (a "gambiarra") built from modular cores. And before the Intel fanboys comment something stupid, remember that the Core 2 Quad was two glued Core 2 Duos. But because the Windows scheduler is bad on AMD's processors, it says 4/8; nothing more stupid on Microsoft's part.
 
...Core 2 Quad were two glued Core 2 Duo...
Core 2 Duo was a legit module containing two complete processors (aka cores). Core 2 Quad was a multi-chip module where two Core 2 Duo dies were attached to the same package, exposing a total of four complete processors on the front-side bus.
core2quad.jpg

Here's what the individual dies look like ("core replication is obvious"):
core2duodie.jpg


In the case of Bulldozer, each "module" only contained one complete processor (aka core). That's why in the literature, it's called a "conjoined-core."
Kumar et al said:
This paper proposes conjoined-core chip multiprocessing – topologically feasible resource sharing between adjacent cores of a chip multiprocessor to reduce die area with minimal impact on performance and hence improving the overall computational efficiency
"conjoined-core" is referring to "chip"-level "core" which is synonymous with "processor" consistent with Pentium D and Athlon 64 X2 (which were out at the time).
"adjacent cores" is referring to execution "cores" which share resources in a "conjoined-core." These do not qualify as "processors."
 
They totally qualify as cores and in the papers the authors refer to the arrangement as being a pair of cores, as in two.

They don't mean two execution cores or waffles or anything else, they mean just two cores.
 
Sure they do. Further in, here's another quote:
Conjoined-core chip multiprocessing deviates from a conventional chip multiprocessor design by sharing selected hardware structures between adjacent cores to improve processor efficiency.
They're using two different definitions of "core" interchangeably.

"Conjoined-core" refers to this:
A core is part of a CPU that receives instructions and performs calculations, or actions, based on those instructions.
"Adjacent cores" refers to this:
The execution unit contains the data registers and the ALU.

The use of the phrase "execution core" is *rare* outside of conjoined-core literature.

So you see the problem? Bulldozer "execution cores" lack the hardware to decode AMD64 instructions, which is a function of the "core" (aka processor). "Execution cores" as defined in Bulldozer lack the hardware necessary to be considered a "core": they are merely "execution units." ...and these are the wheels that turn the gears of false advertising.
 
This is purely an invention of yours, as are most of your arguments.

They simply do not ever make a distinction between the kinds of cores that they are talking about because they don't have to, a core is a core in any circumstance. "Adjacent" simply refers to the pair of cores that share the resources, nothing less nothing more.

Here are quotes in which it's crystal clear what they mean by those adjacent cores in relation to the traditional cores:

Wires connecting the FPU to the left core and the right core can be interdigitated, so no additional horizontal wiring tracks are required

"Connecting the FPU to the left and right core.". An FPU classifies as an execution core, clearly they don't mean it's shared between other execution cores.

A core can alternate accesses between the two banks. It can fetch 4 instructions every cycle but only if their desired bank is available. A core has access to bank 0 one cycle, bank 1 the next, etc., with the other core having the opposite allocation

There you go, each core fetches instructions, in an alternating fashion. It cannot get any more obvious than this: they mean cores, as in not execution cores. An execution core can't fetch instructions on its own.
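
The alternating allocation described in that quote can be sketched as a simple parity rule. This is only one reading of the sentence; the function names and the 0/1 numbering of cores and banks are assumptions, not anything taken from the paper.

```python
def bank_for(core_id, cycle):
    """Bank (0 or 1) that core `core_id` (0 or 1) may access on a given cycle."""
    return (cycle + core_id) % 2


def schedule(cycles):
    """List of (cycle, bank for core 0, bank for core 1) tuples."""
    return [(c, bank_for(0, c), bank_for(1, c)) for c in range(cycles)]


for cycle, bank0, bank1 in schedule(4):
    print(f"cycle {cycle}: core 0 -> bank {bank0}, core 1 -> bank {bank1}")
```

With this rule the two cores never target the same bank on the same cycle, which is the "opposite allocation" the quote mentions.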

You are simply wrong, end of story.
 
"Conjoined-core" is never plural. Think of another context where "conjoined" is commonly used: "conjoined-twins." Note "twins" is plural because they are, in fact, separate entities but they both share a birth defect: being joined to each other.

If the authors' intent was truly to say monolithic-core and conjoined-core were indistinguishable, they would have used the plural form of core: "cores." They do not, because they're not independent processors; they are in fact very dependent on each other. The two combined, therefore, make an indivisible new entity: a conjoined-core.

"left and right core" are referring to execution units, not the whole "conjoined-core."

"A core can alternate accesses between the two banks" is referring to the "conjoined-core" where the "two banks" are the "execution units."

As I said, and you just demonstrated again, the article is using two definitions of "core" interchangeably. It's a technical document that assumes the reader will understand the difference.
 
Execution cores can't fetch instructions. They mean fully functional cores; if they meant one core, then which is the other core that they are talking about?

You are out of touch with the technical aspects of these papers.
 
An execution core can't fetch instructions on its own.
And this is where Bulldozer is hilarious: there are actually two types of instructions:
1) x86 which is what the "conjoined-core" exposes to the system.
2) microOPs which is what the "adjacent cores" process and aren't directly accessible.

They both fetch their respective instructions. This is probably why they love using two meanings of "core." But only one of them matters to the public.


Not that it matters. In Steamroller, they split instruction decode too but the "module" is still a "conjoined-core" sharing resources--aka a "core" (not plural).

Remember how Sun designed a conjoined-core on steroids? Why do you think they never released it? My guess: poor performance like AMD saw. Even after four generations of conjoined-core designs, AMD abandoned it entirely. Sun's chip likely had the same problems AMD's chip did, but fourfold, because they shared a crapload more than AMD did. There was no market for a chip that performs that badly, so they never launched it. The cost to support it (hardware platforms and software) would have compounded the losses.
 
And here you end up contradicting yourself.

You've battled for the last couple of pages to prove execution cores can't be cores because they are just "glorified calculators". But now what do you know, turns out a calculator can even fetch instructions from memory, hmm.

A "core" relies on nothing other than memory subsystems to carry out instructions.

It's settled, they are cores.
 
How can they calculate if they have no data? Point is, microOPs afford very little capability; hence, glorified calculators.

Anyway, the x86 decoder (as in all processors) hands the microOPs to the execution units on a silver platter known as the L1 Instruction Cache. You know where I'm going with this.
 
Everything a processor executes consists of microOPs. Either everything is a glorified calculator or nothing is.
 
The Bulldozer "execution unit" is incapable of processing FADD. There are different types of execution units, and the processor (aka "core") has to make sure the appropriate data gets to the appropriate unit and then collate the results.

"conjoined-core" is very, very different from "adjacent cores."
 
You are clutching at straws with what a Bulldozer core can or can't do. There's no question that its capabilities are more limited compared to a conventional core, but it's a core nonetheless: it can fetch, decode and execute instructions on its own. If any of those stages are blocked by another core, it's a different matter, but the two are very much obviously distinct entities.
 
AMD disagrees with your assessment:
bulldozer_011.jpg
 
Well, for one there aren't two execution units, there are four. Two for integer, two for floating point and they can be driven independently by two threads with limitations.

That makes it a dual core.
 
Now you're confusing execution units for components of them (ALUs, AGUs, MMXs, and FMACs). More detailed slide:
bulldozer-fpu.jpg
 
You literally have it spelled out for you mate.

Dual 128-bit FMAC pipes.

Plus the two integer clusters, four. Four execution units, two for integer, two for floating point.

If you want to break them down, fine, you'd have:

- 2x two ALUs
- 2x two AGUs
- 2x 128-bit FP units

But they are grouped like that for a reason, because each integer cluster can be used by one thread and the two FP units can either be shared or used by one thread in the case of 256-bit instructions.
 
Look at the picture again. These are pipelines which are part of the execution units (two integer, one floating point):
4 ALUs ((EX/MUL pipeline + EX/DIV pipeline) * 2)
4 AGUs (AGen pipeline * 4)
2 128-bit MMX pipelines
2 128-bit FMAC pipelines

That's a total of 12 pipelines for each Bulldozer conjoined-core. Each thread has 4 pipelines (2 x ALU + 2 x AGU) dedicated to it. When counting the FPU, pipeline usage can expand up to 8 when performing an AVX + 2 MMX instruction. In these instances, the other thread is deprived of progress on FPU tasks.
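
As a quick sanity check on that arithmetic, here is a small tally in Python. The counts come straight from the list above; the variable names are just labels picked for this sketch.

```python
# Pipelines per Bulldozer module, as listed above.
PIPELINES = {"ALU": 4, "AGU": 4, "MMX": 2, "FMAC": 2}
INTEGER_CLUSTERS = 2  # one integer cluster per thread

total = sum(PIPELINES.values())
per_thread_integer = (PIPELINES["ALU"] + PIPELINES["AGU"]) // INTEGER_CLUSTERS
shared_fpu = PIPELINES["MMX"] + PIPELINES["FMAC"]
per_thread_peak = per_thread_integer + shared_fpu  # one thread taking the whole FPU

print(total)               # 12
print(per_thread_integer)  # 4
print(per_thread_peak)     # 8
```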


Still don't know why you insist on carrying on with this train of thought: the decoder and fetcher in Bulldozer are undeniably shared and "cores" don't share logic. It's a "conjoined core," which means the whole of it is a "core," not specific components as AMD would have you believe. AMD intentionally called the execution units "cores" to mislead the public with respect to its performance (overselling the capabilities of its product).
 
I am looking and I see 4 groups, two for integer, two for floating point. This is better illustrated here, one blue block, one green and two yellow. That's the higher level grouping of these execution units.

[Attachment 130550]


The problem here is that you are getting confused because your definitions of what is an execution core or whatever fall into a strange twilight zone. It's neither a core nor an ALU; the only thing left is a collection of ALUs/FPUs, of which a Bulldozer module has 4.

Everyone either thinks in terms of cores or execution units (ALUs or FPUs). You are making this unnecessarily difficult in your pursuit of differentiating cores from anything else.

Still don't know why you insist on carrying on with this train of thought: the decoder and fetcher in Bulldozer are undeniably shared and "cores" don't share logic.

Because even though logic is shared, multiple instructions end up being processed. That's the whole point: get work done with less logic.
 
I am looking and I see 4 groups, two for integer, two for floating point. This is better illustrated here, one blue block, one green and two yellow. That's the higher level grouping of these execution units.

[Attachment 130550]
I see four cores as clearly indicated by fetchers and decoders.

Oh look, Zen looks similar:
AMD-Zen-Quad-Core-Unit-Block-Diagram-640x360.jpg

Look at the text below the diagram: AMD is referring to the whole (from Fetch to L2) as the core (not just the integer execution unit). AMD doesn't get to change the rules for its own advantage on Bulldozer. It was well understood what a "core" was before and after Bulldozer debuted.

Oh look! Zen even has 2 x 256-bit FMACs + 1 x MMX per core! Gee, I wonder why Bulldozer gets dragged through the mud for being pokey. Maybe it's because AMD *really* skimped on floating-point performance in the name of supporting more integer-heavy threads? Considering Zen's design, it's clear AMD believed this was a mistake in Bulldozer.

The problem here is that you are getting confused because your definitions of what is an execution core or whatever fall into a strange twilight zone. It's neither a core nor an ALU; the only thing left is a collection of ALUs/FPUs, of which a Bulldozer module has 4.
These phrases are not my own. They're phrases used in different literature to describe the same circuits. I keep changing phrasing to stay consistent with the sourced documents. To be perfectly clear: "integer cluster" = "execution core" = "adjacent core," which is not to be confused with the singular "core," which is synonymous with "processor."

The best way to describe Bulldozer is thusly:
FX-8350 is a quad-core processor with each core accepting two threads. The integer payload of each thread is executed by a dedicated integer cluster while the floating-point payload is handed off to the shared floating-point cluster. The result of this design is accelerated performance in multi-threaded, integer-heavy scenarios like 7-zip compression; however, in any workload that strains the processor cores' shared resources (like AVX), performance tanks.
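
The whole disagreement comes down to which unit gets counted. A minimal sketch of the two counting rules, assuming the layout described in this thread (one shared front end, two integer clusters and one shared floating-point cluster per module); the class and field names are hypothetical labels, not AMD terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One Bulldozer module as described in this thread."""
    front_ends: int = 1        # shared fetch + decode
    integer_clusters: int = 2  # one per thread
    fp_clusters: int = 1       # shared by both threads


def count_cores(modules, unit):
    """Count 'cores' by summing whichever unit is taken to define a core."""
    return sum(getattr(m, unit) for m in modules)


FX_8350 = [Module() for _ in range(4)]  # four modules

print(count_cores(FX_8350, "front_ends"))        # 4
print(count_cores(FX_8350, "integer_clusters"))  # 8
```

Same silicon, two answers: 4 if a core needs its own fetch and decode, 8 if an integer cluster is enough.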
 
I see four cores as clearly indicated by fetchers and decoders.

I see eight cores, each pair of two cores sharing some fetch and decode logic. You can see in the picture posted by yourself that the module has 4 decode units, enough to feed two independent threads, at the very least, and enough execution units to be driven by them.

The result of this design is accelerated performance in multi-threaded, integer-heavy scenarios like 7-zip compression; however, in any workload that strains the processor cores' shared resources (like AVX), performance tanks.

Again, it's irrelevant whether performance tanks or not. CPUs have behaved differently because of the way they use resources all throughout history; the first Pentium that had MMX suffered major performance degradation in other workloads when MMX was used because it would stall other pipelines.
 