
AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit

When "Bulldozer" was released, how many PC CPUs (sold as new) had fewer FPU cores than integer cores?

The "Bulldozer" architecture was and still is a disgrace.
The FX-4100, running at 3.6 GHz (boosting to 3.8 GHz), is advertised as a 4-core CPU.
The Athlon II X4 640 runs at 3 GHz (no boost) and is a 4-core CPU.
The main problem is that in single-threaded performance the FX-4100 scales worse than the Athlon II X4 640, and in multi-threaded performance the FX-4100 is actually worse than the Athlon II X4 640.
So basically we had a newer architecture with worse performance per clock than the older architecture. This is something you see during the development phase. So what's the point of pushing a product onto the market that sucks compared to the older architecture?!?

Why AMD persisted with the Faildozer revisions is beyond my understanding. At that time the only real upgrade path from a K10-family CPU was an Intel CPU (visible in AMD's shitty market share during the Faildozer era).

If it were me making the call at AMD, after seeing how crap the Faildozer architecture is, I would have checked whether it was possible to get somewhat higher clocks out of the K10 family, add SSE 4.1/4.2 and AVX, and actually work on a new CPU architecture (trust me, without a single person who took part in the development of Faildozer).

Not really related to this topic, but I kinda got tired and sick of people praising/defending AMD. They kinda forget that AMD actually wanted to sell Faildozer at prices that had nothing to do with the reality of that architecture; that on Windows AMD's drivers were and still are pure junk; that AMD decided to stop releasing drivers for Win 8.1, an OS that still has more than 3 years to live; and that each Ryzen release has been full of problems (you'd kinda expect the 3rd iteration to be smooth, but not in AMD's case... most likely because they just rush products to market without real testing, something I said about AMD 7 years ago). I'm not bashing AMD, just pointing out problems that AMD just doesn't care about! I don't like Intel or Nvidia either; the deal is that Intel, Nvidia and AMD all use shady marketing techniques (people accused NVIDIA of crippling the performance of older architectures in drivers; if you think AMD is better, maybe you should check again, because AMD might have done something far worse).
The performance of Bulldozer isn't relevant here.

The value of Bulldozer isn't relevant here.

How "shady" various corporations are isn't relevant here, with the possible exception of pointing out that corporations are shady by definition because profit is extracted by selling things for more than they're worth. In order to do that, people must be tricked into parting with more of their money than they should.

What is relevant is the technical definition of a core. As I already posted, FPUs are not even required to have a CPU core. Please, at least, try to rebut what I've said instead of ignoring my arguments entirely. I've done more than just point out that FPUs aren't required.
 
Software was compiled for complete processors, not dual-thread--asymmetric processor designs. Bulldozer could have been great if software was compiled to take advantage of it. ...

Well, AMD at that point didn't have the highest market share in the PC CPU market. Also, when you develop a new CPU you need to make sure that older software, which might never be optimized for your new architecture, runs better on your new CPUs.
Both are important. The first because if you are the market leader, newer software will probably be optimized for your newer architecture, and maybe some older software will see optimizations too. The second because if older software runs like crap on your new CPU architecture, you might not sell that many CPUs.

By development I mean more than the CPU design phase; I also mean the sample testing phase. During sample testing it's impossible not to see the chips underperforming with existing software, and in that situation you need to go back to the design phase and fix what is wrong.
As the underdog, if you expect software to get optimized for your architecture then you are kinda suicidal. Software developers are not really gonna bother; they're gonna point out that their software works better on your older architecture and on your competition's CPUs, and they're not gonna waste their time on your architecture. As a result your CPUs are gonna perform badly and you are not gonna sell...
 
What is relevant is the technical definition of a core. As I already posted, FPUs are not even required to have a CPU core.
Technical definition of core includes or implies it being independent. Bulldozer modules are independent, Bulldozer cores are not.

If you want to read the discussion/argument going back and forth for several rounds, go through that thread. We probably won't be able to bring anything new to the table here.
It's all in this thread:
 
So, a core is the entire chip. I've never seen that definition before.
In a multi-core processor, each core is effectively an independent CPU.

While wiki is not always the best source, its description is pretty accurate:
https://en.wikipedia.org/wiki/Multi-core_processor said:
A multi-core processor is a computer processor integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions, as if the computer had several processors.
 
None of the general purposes cores get stalled from anything on the FPU core. They are distinct from the FPU.
You have two threads in the core: both threads have a 256-bit FMAC operation required before they can progress. Both threads are stalled while the shared FPU round robin executes the operations: only one thread per cycle is being worked on. We see this a lot in Bulldozer multithreaded benchmarks where it only performs at about 120% of a quad core versus 200%. Bulldozer is great at 7-zip because of the MIPS nature of 7-zip. In most other tests, it looks comparable to Intel's Hyper-Threading (but slightly faster because it can simultaneously execute two integer operations).
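To put rough numbers on the shared-FPU argument above, here is a minimal Python sketch. It is a toy worst-case model, not a description of the real hardware: it assumes each thread's integer work runs on its own integer cluster, and that the shared FPU services only one thread's 256-bit op at a time. The function name `module_scaling` and the `fp_fraction` parameter are made up for this example.

```python
def module_scaling(fp_fraction: float) -> float:
    """Throughput of two threads on one module relative to one thread.

    Toy worst-case model (an assumption, not measured behaviour):
    integer work runs in parallel on the two integer clusters, while
    FPU work from both threads is serialized through the shared FPU.
    """
    if not 0.0 <= fp_fraction <= 1.0:
        raise ValueError("fp_fraction must be between 0 and 1")
    # One thread alone needs 1 time unit per unit of work.
    # Two threads together need (1 - f) for the parallel integer part
    # plus 2 * f for the serialized FPU part, i.e. 1 + f in total.
    time_two_threads = (1.0 - fp_fraction) + 2.0 * fp_fraction
    return 2.0 / time_two_threads


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 1.0):
        print(f"FP fraction {f:.2f}: {module_scaling(f):.2f}x one thread")
```

In this model pure integer work scales 2x and pure 256-bit FP work does not scale at all. A scaling of about 1.2x falls out when roughly two thirds of the work goes through the FPU, but that is only what the toy model says, not a measurement.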

Judging by performance, it's more like an SMT quad-core than a non-SMT octo-core. Legit octo-cores (AMD and Intel alike) bitch slap Bulldozer. :laugh: ...and it's reflected in the increase of transistors too.
 
In a multi-core processor, each core is effectively an independent CPU.
But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.

So, I'm seeing a paradoxical claim. It's independent except that it's not.
 
But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.
What would you consider a CPU? What does a chip or a logic circuit have to do to be a CPU?

So, I'm seeing a paradoxical claim. It's independent except that it's not.
CPU or processing unit as in the wiki sentence above is defined by performing instructions - in this case x86 instructions. CPU and core are defined as the same. CPU/core includes front-end, its execution resources along with control circuitry to manage it. Today L2 cache is usually included in the core while by academic definition it is outside it. Both by definition and in practice this bunch of things is independent.

Memory controller, higher level caches like L3 and newer additions to CPU die like IO are not core functions.

In a multicore CPU, one core can be disabled completely and remaining cores will work as expected. This has always been the case, in practice this is usually implemented with a setting in BIOS.
 
What would you consider a CPU? What does a chip or a logic circuit have to do to be a CPU?
I'm merely pointing out that there are problems with the independence angle. Taken to its full extent, it ends up with the notion that multi-core CPUs have to have fully independent CPUs, more than one, on a single die.
 
What is relevant is the technical definition of a core. As I already posted, FPUs are not even required to have a CPU core. Please, at least, try to rebut what I've said instead of ignoring my arguments entirely. I've done more than just point out that FPUs aren't required.

The industry (AMD included) for years made it like this: 1 core = 1 FPU.

OK. At Bulldozer's release date, what could you use a CPU for that totally lacked FPUs (and also had nothing to emulate an FPU)?
By what you are writing, you are basically saying that FPU count doesn't matter. So I decided to push it to the limit and ask you what use a CPU that totally lacked FPUs had at Bulldozer's release date.
 
I'm merely pointing out that there are problems with the independence angle. Taken to its full extent, it ends up with the notion that multi-core CPUs have to have fully independent CPUs, more than one, on a single die.
This notion is completely accurate.
 
The industry (AMD included) for years made it like this: 1 core = 1 FPU.

OK. At Bulldozer's release date, what could you use a CPU for that totally lacked FPUs (and also had nothing to emulate an FPU)?
By what you are writing, you are basically saying that FPU count doesn't matter. So I decided to push it to the limit and ask you what use a CPU that totally lacked FPUs had at Bulldozer's release date.
Many CPUs had been sold without FPUs. There may have even been an Alpha at the time, shortly before, or after. It doesn't matter. FPUs were never a requirement to have a CPU.

Your argument is like saying that RAM wasn't really RAM until DDR. Doubling the data rate doesn't make RAM into RAM. It merely makes it DDR. An FPU, similarly, is an addition upon the basic spec. FPUs could be eliminated again from CPUs and emulated in software. The performance would be bad for FPU-dependent processing but it would still run. Software FPU was used for many years. The FPU is a superset of the CPU core.
This notion is completely accurate.
Do you have a better citation than a wiki? The last time I checked, multi-core CPUs could share things, like a common pool of cache and cannot operate with some of the things they share disabled.

Besides, if you believe it's accurate then you really shouldn't use the terminology "multi-core CPU". Instead, you should use the terminology "multi-CPU die". This is because a core is a subset of a CPU. In multi-core CPUs in particular there is the expectation of resource sharing. That is what separates "core" from "cpu".
 
Do you have a better citation than a wiki? The last time I checked, multi-core CPUs could share things, like the way Broadwell C's cores shared the L4 cache.
I don't have good academic sources handy to reference. The basics should be the same in course materials, for example:

Using both AMD's Zen and Intel's Skylake block diagrams from WikiChip as an example here. Check the SoC (entire die) diagram as well as the core parts that follow:

Any one core of these can be disabled independently, both in theory and in practice.
IO/Memory controllers, L3 cache and Infinity Fabric/Ring Bus are not part of the core.
 
But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.

So, I'm seeing a paradoxical claim. It's independent except that it's not.
They are independent. Each core fetches instructions, decodes them, executes them, and stuffs the results back into memory where they can be used again.

In Bulldozer, there's actually four instruction fetchers: one for the core, one for integer cluster 1, one for integer cluster 2, and one for the floating point cluster. As far as software is concerned, it only sees the first: the one for the core. This is why Microsoft had a hell of a time trying to get Windows thread dispatching to work right. Windows had to be modified to more intelligently control threading so it wouldn't inadvertently move threads around in a manner that would overwhelm one core while leaving another idle. Older versions of Windows (I think it was 7) addressed this by making each thread (associated with an integer cluster) a "processor." This solved the problem of Windows shifting threads around but it created a problem of overwhelming the floating point cluster because it couldn't appropriately load balance integer and floating point!

Years passed and in Windows 10 (I think 8 too), Microsoft finally tackled the problem by making the Windows thread scheduler aware of physical processors and logical processors. Because of this, Windows can now appropriately delegate floating point (by physical processor) and integer (by logical processors). Why? Because it's a relevant problem for all SMT implementations. Ryzen, Core i#, and Pentium 4 w/ HT only have one integer cluster and one floating point cluster each but they can accept two threads. The underlying hardware has to be managed in a similar fashion to Bulldozer's. The only difference is that tiny little detail of Bulldozer having two integer clusters instead of one.

FX-8350 has four cores, each core has two integer clusters. They are not equivalent. A core knows how to do SIMD (single instruction multiple data); an integer cluster does not. Integer clusters are fundamentally calculators, not processors. They lack awareness (have no knowledge of parallelism), logic (Boolean tests, branching, etc.), and access (only have the data they are given) to be processors.
 
I don't have good academic sources handy to reference. The basics should be the same in course materials, for example:

Using both AMD's Zen and Intel's Skylake block diagrams from WikiChip as an example here. Check the SoC (entire die) diagram as well as the core parts that follow:

Any one core of these can be disabled independently, both in theory and in practice.
IO/Memory controllers, L3 cache and Infinity Fabric/Ring Bus are not part of the core.
Zen is a more Intel-like design than Bulldozer. Pairing it with Skylake doesn't resolve this issue.

As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the 1 integer core per module setting.
 
Zen is a more Intel-like design than Bulldozer. Pairing it with Skylake doesn't resolve this issue.
Zen and Skylake are by-the-book multicore CPUs. So have been the rest of x86 CPUs including AMD's K8 and K10.
 
As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the 1 integer core per module setting.
All that does is tell the core fetcher to only accept one thread instead of two. Fundamentally no different than disabling Hyperthreading.
 
Zen and Skylake are by-the-book multicore CPUs. So have been the rest of x86 CPUs including AMD's K8 and K10.
By Intel's book.

Diverging from common design doesn't mean lawsuit. :rolleyes:
All that does is tell the core fetcher to only accept one thread instead of two. Fundamentally no different than disabling Hyperthreading.
Citation please. Everything I've read said FX CPUs' CMT is not SMT. I am also interested in how that BIOS setting works, considering that you are saying it's mislabeled by Gigabyte.
 
In Bulldozer, there's actually four instruction fetchers: one for the core, one for integer cluster 1, one for integer cluster 2, and one for the floating point cluster. As far as software is concerned, it only sees the first: the one for the core. This is why Microsoft had a hell of a time trying to get Windows thread dispatching to work right. Windows had to be modified to more intelligently control threading so it wouldn't inadvertently move threads around in a manner that would overwhelm one core while leaving another idle. Older versions of Windows (I think it was 7) addressed this by making each thread (associated with an integer cluster) a "processor." This solved the problem of Windows shifting threads around but it created a problem of overwhelming the floating point cluster because it couldn't appropriately load balance integer and floating point!
There is only one fetch, this is in the frontend of the Bulldozer module. There are multiple schedulers for dispatched micro-ops owing to split execution stage. By the way, Thuban and Zen also have split execution stage and separate schedulers for Integer and FPU clusters.

Microsoft's problem with scheduling was strange. The eventual fix was a change in how a Bulldozer CPU was being handled. Initially Bulldozer was treated as a full 8-core processor, and the situation improved considerably when it started to be treated as a 4-core with SMT (it is noteworthy that Linux did the same much sooner). This inherently addressed both problems plaguing scheduling:
- Moving threads around to undesired cores. Simple example is a second core in single module where first core is already loaded.
- Because of the same reason any FPU-heavy load was now more likely to go to unused module largely negating the shared FPU issue.
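The effect of the two points above can be sketched with a toy placement model in Python. This is only an illustration under stated assumptions: a 4-module, 2-cores-per-module chip, a "naive" scheduler that simply fills cores in enumeration order, and a "module-aware" one that uses one core per module before doubling up. The real Windows and Linux schedulers are far more involved, and the names `place_threads` and `shared_modules` are hypothetical.

```python
from collections import Counter


def place_threads(n_threads, modules=4, cores_per_module=2, module_aware=True):
    """Return the (module, core) slots given to the first n_threads threads."""
    slots = [(m, c) for m in range(modules) for c in range(cores_per_module)]
    if module_aware:
        # Hand out core 0 of every module first, then core 1 of every module.
        slots.sort(key=lambda slot: (slot[1], slot[0]))
    return slots[:n_threads]


def shared_modules(placement):
    """Count modules where more than one core is busy (shared front-end/FPU)."""
    busy_per_module = Counter(module for module, _ in placement)
    return sum(1 for busy in busy_per_module.values() if busy > 1)


if __name__ == "__main__":
    for aware in (False, True):
        placement = place_threads(4, module_aware=aware)
        print("module-aware" if aware else "naive", placement,
              "shared modules:", shared_modules(placement))
```

With four threads, the naive order ends up with two modules running both cores, while the module-aware order leaves every thread alone in its module. Once all eight cores are in use, both orders share every module.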
 
A lawsuit for just 35 bucks each lol
 
You have two threads in the core: both threads have a 256-bit FMAC operation required before they can progress. Both threads are stalled while the shared FPU round robin executes the operations: only one thread per cycle is being worked on.
Both threads are not stalled if both cores schedule a 256-bit op.
2x 80-bit Lo
2x 64-bit Hi
2x 128-bit Mid

For a 256-bit op, it would execute Lo twice for 128-bit and Hi twice for 128-bit on a single port for each thread. A Lo-to-Hi register move takes 1 cycle; a Lo(Hi)-to-Lo(Hi) register move takes 0 cycles. So, if the second thread is dependent on the first thread it could execute the first half on both. etc, etc, etc.

The FPU design is built with two cores in mind. The front-end is built with two cores in mind. The L2+interface is built with two cores in mind. There are physically only two cores in the Bulldozer module, as it is the world's first monolithic dual-core x86 architecture.
 
Initially Bulldozer was treated as a full 8-core processor, and the situation improved considerably when it started to be treated as a 4-core with SMT
Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit instead of combining 128s. Does that mean Zen 1 didn't have any real cores in it?

How the Windows scheduler acts with a design it wasn't made for is an interesting topic but hardly proof.
 
Many CPUs had been sold without FPUs. There may have even been an Alpha at the time, shortly before, or after. It doesn't matter. FPUs were never a requirement to have a CPU.

Your argument is like saying that RAM wasn't really RAM until DDR. Doubling the data rate doesn't make RAM into RAM. It merely makes it DDR. An FPU, similarly, is an addition upon the basic spec. FPUs could be eliminated again from CPUs and emulated in software. The performance would be bad for FPU-dependent processing but it would still run. Software FPU was used for many years. The FPU is a superset of the CPU core.

The FPU has become part of a core because it started to be needed often and because emulation sucks in terms of performance.
We are not living in the days when code was properly optimized. Today we are living in the days when "buy a better CPU", "buy more RAM", etc. is normal...

If I code something and I know for sure that whoever is leading keeps changing his/her mind about what he/she wants, then you can bet I will basically abuse FPU usage so I don't have to go back and change things. You can say it's my fault; I will say it's the fault of whoever is leading, because he/she clearly has no clear goals in mind.
And trust me, the "insert bad words" leader who keeps changing his/her mind also wants things done fast, totally ignoring the fact that his/her instability is making things slower and less optimized.

There is not gonna be a real difference between 4 or 8 heavy FPU threads run on the 8150. This is a reason to raise the question of whether the 8150 is an 8-core or a 4-core. Someone might not even say what type of threads they are pushing on the 8150, just run such a benchmark, and show that there is basically no real difference between 4 and 8 threads. Do I have to say what type of threads I'm pushing on a CPU? I think not.
 
The FPU has become part of a core because it started to be needed often and because emulation sucks in terms of performance.
1) FPU emulation has always been vastly slower than having a hardware FPU. So, pointless.

2) I'll repeat my question: Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit instead of combining 128s. Does that mean Zen 1 didn't have any real cores in it?

Zen 1 can't execute 256-bit AVX independently. It has to combine at the 128-bit level. So, it doesn't have any real cores, eh? Not only is it slower at doing 256-bit AVX, it can't do it independently.
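For anyone unsure what "combining 128s" means here, a small Python sketch under simplifying assumptions: a 256-bit vector is modelled as eight 32-bit lanes, and the only thing that differs between the two cases is how many lanes the datapath handles per pass. The function `vector_add` and its `datapath_lanes` parameter are invented for this illustration and say nothing about real latencies or scheduling.

```python
def vector_add(a, b, datapath_lanes):
    """Add two equal-length vectors, datapath_lanes elements per pass.

    Returns (result, passes). With 8 lanes of 32 bits (256 bits total),
    a 4-lane datapath stands in for a 128-bit unit and needs two passes,
    while an 8-lane datapath stands in for a 256-bit unit and needs one.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if datapath_lanes <= 0:
        raise ValueError("datapath_lanes must be positive")
    result = []
    passes = 0
    for start in range(0, len(a), datapath_lanes):
        end = start + datapath_lanes
        # One pass: add the lanes that fit into the datapath at once.
        result.extend(x + y for x, y in zip(a[start:end], b[start:end]))
        passes += 1
    return result, passes


if __name__ == "__main__":
    a = list(range(8))   # eight 32-bit lanes = one 256-bit vector
    b = [10] * 8
    print(vector_add(a, b, datapath_lanes=4))  # 128-bit style: two passes
    print(vector_add(a, b, datapath_lanes=8))  # 256-bit style: one pass
```

Both calls return the same sums; only the number of passes differs.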
 
Everything I've read said FX CPUs' CMT is not SMT.
CMT is a little bit of this, a little bit of that. SMT implies there are no added execution resources for additional threads. AMD added an Integer Cluster for CMT. Aside from that they are very similar.

Keep in mind that added Integer Cluster does not necessarily mean a huge boost in execution resources. Bulldozer Integer Clusters contain 4 pipes each (2 ALU, 2 AGU) and FPU contains 3 pipes (2 FMAC + MMX). At the same time, Zen's Integer Cluster has 6 pipes (4 ALU, 2 AGU) and FPU has 3 pipes (2 FMAC + MMX).
 