Tuesday, August 24th 2010

AMD Details Bulldozer Processor Architecture

AMD is finally going to embrace a truly next-generation x86 processor architecture built from the ground up. AMD's current architecture, K10(.5) "Stars", is an evolution of the more commercially successful K8 architecture, but it never matched K8's market success, as it was overshadowed by competing Intel architectures. AMD codenamed its latest design "Bulldozer", and it features an x86 core design that is radically different from anything we've seen from either processor giant. With this design, AMD thinks it can outdo both the HyperThreading and multi-core approaches to parallelism in one shot, as well as "bulldoze" through serial workloads with a broad integer unit of eight pipelines per core (compared to three on K10 and four on Westmere). Two almost-independent blocks of integer processing units share a common floating-point unit with two 128-bit FMACs.

AMD is also working on a multithreading technology of its own to rival Intel's HyperThreading. It exploits Bulldozer's split integer units backed by a shared floating-point unit, a design AMD believes is so efficient that each SMT worker thread can be deemed a core in its own right, with each such "core" potentially backed by further competing threads. AMD is also working on another microarchitecture codenamed "Bobcat", a low-power counterpart to Bulldozer, with which it will take on the low-power and high-performance-per-watt segments that extend from all-in-one PCs all the way down to hand-held devices and 8-inch tablets. We will explore the Bulldozer architecture in some detail.

Bulldozer: The Turbo Diesel Engine
In many respects, the Bulldozer architecture is comparable to a diesel engine: lower RPM (clock speeds), high torque (instructions per clock). When implemented, Bulldozer-based processors could outperform competing processor architectures at much lower clock speeds, thanks to one critical area AMD seems to have finally addressed: instructions per clock (IPC). Unlike the 65 nm "Barcelona" and 45 nm "Shanghai" architectures, which raised effective IPC indirectly by other means (such as backing the cores with a level-3 cache and raising the uncore/northbridge clock speeds), the 32 nm Bulldozer features a genuinely broad integer unit with eight integer pipelines split into two portions, each portion having its own scheduler and L1 data cache.
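To make the diesel analogy concrete, here is a minimal Python sketch that assumes single-thread throughput is roughly IPC multiplied by clock speed. The IPC and clock figures are hypothetical placeholders, not measured Bulldozer or competitor numbers; the point is only that a wider core can match or beat a faster-clocked one.

```python
# Minimal sketch: throughput modeled as IPC x clock frequency.
# All IPC and clock values below are hypothetical, chosen only to
# illustrate the "low RPM, high torque" trade-off.

def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Instructions retired per second for a given IPC and clock (GHz)."""
    return ipc * clock_ghz * 1e9


def clock_needed(target_ips: float, ipc: float) -> float:
    """Clock (GHz) a core with the given IPC needs to reach target_ips."""
    return target_ips / (ipc * 1e9)


# Hypothetical "high-revving" design: lower IPC, higher clock.
high_clock = instructions_per_second(ipc=1.0, clock_ghz=3.6)
# Hypothetical "high-torque" design: higher IPC, lower clock.
high_ipc = instructions_per_second(ipc=1.5, clock_ghz=2.6)

print(f"high-clock design: {high_clock:.2e} instructions/s")
print(f"high-IPC design:   {high_ipc:.2e} instructions/s")
print(f"clock the high-IPC design needs to match: "
      f"{clock_needed(high_clock, ipc=1.5):.2f} GHz")
```

Under these assumed numbers, the higher-IPC core only needs about 2.4 GHz to keep pace with the 3.6 GHz design, which is the trade-off the analogy describes.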



Parallelism: A Radical Approach?
Back when analysts were pinning high hopes on the Barcelona architecture, those hopes were fueled by early reports that AMD was using 128-bit-wide floating-point units, which led analysts to believe that AMD might have conquered its biggest nemesis: floating-point performance, and with it raw math-crunching ability. That did not quite come to pass, because the processor's overall number-crunching ability was pegged to its floating-point performance, with the integer units ignored.



AMD split the eight integer pipelines per core into two blocks, each with four integer pipelines, its own integer scheduler, and an L1 data cache. These are the lowest level of "dedicated components", dedicated to individual processor threads. Between the two blocks sits a shared floating-point unit with two 128-bit FMACs, arbitrated by a floating-point scheduler. The fetch/decode stage, the L2 cache, and the FPU constitute the "shared" components.
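The split between dedicated and shared resources can be summarized in a small Python model. This is only a sketch of the layout as described here; the class and field names (`IntegerBlock`, `Module`) are hypothetical and do not correspond to any AMD tool or document.

```python
# Minimal sketch of one Bulldozer "module" as described in this article:
# two dedicated integer blocks plus components shared by both.
# Class and field names are hypothetical, for illustration only.
from dataclasses import dataclass, field


@dataclass
class IntegerBlock:
    """Dedicated per-thread resources inside a module."""
    pipelines: int = 4
    has_scheduler: bool = True
    has_l1_data_cache: bool = True


@dataclass
class Module:
    """Two integer blocks plus the shared front end, FPU and L2."""
    integer_blocks: list = field(
        default_factory=lambda: [IntegerBlock(), IntegerBlock()])
    fmac_count: int = 2           # two FMACs in the shared FPU
    fmac_width_bits: int = 128    # each FMAC is 128 bits wide
    shared: tuple = ("fetch/decode", "FPU + FP scheduler", "L2 cache")

    def integer_pipelines(self) -> int:
        # Total integer pipelines across both dedicated blocks.
        return sum(b.pipelines for b in self.integer_blocks)

    def fp_width_bits(self) -> int:
        # Combined width of the shared FMACs.
        return self.fmac_count * self.fmac_width_bits


m = Module()
print("integer pipelines per module:", m.integer_pipelines())  # 8
print("shared FMAC width (bits):", m.fp_width_bits())          # 256
print("shared components:", ", ".join(m.shared))
```

Keeping the integer blocks as a list makes it easy to see that everything inside `IntegerBlock` is duplicated per thread, while everything in `shared` exists once per module.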



AMD is implementing a simultaneous multithreading (SMT) technology that can give each of the "dedicated" components (in this case, each integer unit) a thread of its own, while sharing certain components with the other integer unit, effectively making each set of dedicated components a "core" in its own right. This way, the actual core of the Bulldozer die is deemed a "module", a pairing of two cores, and the Bulldozer die (chip) features n modules depending on the model.
So now you have an eight-core chip with a much smaller die and a lower transistor count than a hypothetical 32 nm K10 8-core processor. It is unclear whether AMD wants to push SMT further down to the "core" level and run two threads simultaneously over the dedicated components, but one thing is certain: AMD has embraced SMT in some form. In all this, chip-level parallelism is transparent to the operating system: it will only see a fixed number of logical processors, without any special software or driver requirement.
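A minimal Python sketch of how the module count translates into the logical processors the operating system sees, assuming two cores per module as described above and, by default, no further SMT inside each core. The function name `logical_processors` is hypothetical; `os.cpu_count()` is the standard-library call that reports whatever logical processors the OS has enumerated.

```python
# Minimal sketch: module count -> logical processors visible to the OS.
# Assumes two cores per module (per this article); the function name is
# hypothetical, for illustration only.
import os

CORES_PER_MODULE = 2


def logical_processors(modules: int, threads_per_core: int = 1) -> int:
    """Logical processors exposed to the OS for a chip with `modules` modules."""
    if modules < 1:
        raise ValueError("a chip needs at least one module")
    return modules * CORES_PER_MODULE * threads_per_core


# Different products simply use a different number of modules.
for n in (1, 2, 3, 4):
    print(f"{n} module(s) -> {logical_processors(n)} logical processors")

# If AMD pushed SMT down to each core (the open question above),
# the count would double:
print("4 modules, 2 threads/core ->", logical_processors(4, threads_per_core=2))

# The OS just enumerates logical processors; no special driver is involved.
# os.cpu_count() reports that number (or None if it cannot be determined).
print("this machine reports:", os.cpu_count())
```

The same arithmetic covers the product-scaling point made later in the article: changing the number of modules changes the core count, without the OS needing to know anything about modules.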

So in one go, AMD shot up its integer performance. A thread either uses one integer unit with its four pipelines, or works across both integer units, arbitrated by the fetch/decode stage, along with the shared FPU.

Outside the modules
At the chip level, there's a large L3 cache, a northbridge that integrates the PCI-Express root complex, and an integrated memory controller. Since the northbridge is entirely on the chip, the processor does not need a HyperTransport link to communicate with the rest of the system. It connects to the chipset (which is now relegated to a southbridge role, much like Intel's Ibex Peak) using A-Link Express, which, like DMI, is essentially a PCI-Express link. It is important to note that all modules and extra-modular components are present on the same piece of silicon. Because of this design change, Bulldozer processors will come in totally new packages that are not backwards compatible with older AMD sockets such as AM3 or AM2(+).
Expectations
Not surprisingly, AMD isn't talking up Bulldozer as the next big thing since dual-core processors (something it did with Barcelona). AMD currently offers 8-core and 12-core processors codenamed "Magny-Cours", which are multi-chip modules of Shanghai (4-core) and Istanbul (6-core) dies. AMD expects an 8-core Bulldozer implementation (built from four modules) to deliver 50% higher performance-per-watt than Magny-Cours.
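What a "50% higher performance-per-watt" claim implies can be worked through in a few lines of Python. Only the 1.5x ratio comes from AMD's claim; the baseline performance score and power draw are hypothetical placeholders, not Magny-Cours specifications.

```python
# Minimal sketch of what "50% higher performance-per-watt" implies.
# Baseline values are hypothetical placeholders; only the 1.5x ratio
# reflects AMD's stated expectation.

def perf_per_watt(performance: float, watts: float) -> float:
    return performance / watts


baseline_perf = 100.0    # hypothetical benchmark score for the baseline
baseline_watts = 115.0   # hypothetical power draw of the baseline
improvement = 1.5        # "50% higher performance-per-watt"

target_ppw = perf_per_watt(baseline_perf, baseline_watts) * improvement

# Two ways to realize the same ratio:
same_power_perf = target_ppw * baseline_watts   # 1.5x performance, equal power
same_perf_watts = baseline_perf / target_ppw    # equal performance, 2/3 the power

print(f"at {baseline_watts:.0f} W: performance {same_power_perf:.1f}")
print(f"at performance {baseline_perf:.0f}: {same_perf_watts:.1f} W")
```

Whichever baseline numbers are used, the ratio means either 1.5 times the performance in the same power envelope or the same performance at two-thirds of the power, or some mix of the two.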



Market Segments
As shown in the graphic earlier, AMD's modular design allows it to create different products simply by controlling the number of modules on the die (by whichever method). With this, AMD will have processors ready for most PC and server market segments, from desktop PCs and enthusiast-grade PCs to notebooks and servers. AMD expects to have a full-fledged lineup in 2011. The first Bulldozer CPUs will be sold to the server market.


Hot Chips 22 Presentation by AMD on the Bobcat Architecture
Below are as-is slides from AMD's Hot Chips presentation on the Bobcat architecture.

283 Comments on AMD Details Bulldozer Processor Architecture

#1
Super XP
AMD's diagram is interesting: it shows TWO L3 caches and TWO NBs for a 4-module, 8-core Bulldozer CPU. What do you guys make of this? Could this be how we get the so-called quad-channel integrated memory controller previously rumoured, in the form of 2 x dual-channel IMCs? :D

Mark my words, if Bulldozer is indeed based on a quad-channel interface, it's going to perform like a Bulldozer!!!!
#2
trickson
OH, I have such a headache
I just hope they finally have the right stuff.
#3
JATownes
by: Super XP
AMD's diagram is interesting: it shows TWO L3 caches and TWO NBs for a 4-module, 8-core Bulldozer CPU. What do you guys make of this? Could this be how we get the so-called quad-channel integrated memory controller previously rumoured, in the form of 2 x dual-channel IMCs? :D

Mark my words, if Bulldozer is indeed based on a quad-channel interface, it's going to perform like a Bulldozer!!!!
From what I understand this is correct. Looking for source...
Edit:
We’d be surprised if a Bulldozer APU had more than the four memory channels of a Magny-Cours CPU, but not that it would be quicker – Magny-Cours CPUs are comprised of two 6-core CPU dies, so the quad-channel memory controller is really two dual-channel units split across the two dies rather than one homogenous mega-controller.
Source

So if Magny-Cours is two dual-channel IMCs, it stands to reason that Bulldozer will implement the same tech.
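For what it's worth, the peak-bandwidth math backs this up: two dual-channel controllers add up to the same theoretical peak as one "quad-channel" controller. A quick Python sketch, assuming 64-bit DDR3 channels and DDR3-1333 as an example speed (an assumption, not a confirmed Bulldozer spec):

```python
# Minimal sketch: theoretical peak DRAM bandwidth of N 64-bit DDR3 channels.
# DDR3-1333 is an assumed example speed, not a confirmed Bulldozer spec.

BYTES_PER_TRANSFER = 8  # each DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second)."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s * 1e6 / 1e9


dual = peak_bandwidth_gbs(2, 1333)
two_duals = 2 * dual
quad = peak_bandwidth_gbs(4, 1333)

print(f"one dual-channel IMC:  {dual:.2f} GB/s")
print(f"two dual-channel IMCs: {two_duals:.2f} GB/s")
print(f"one quad-channel IMC:  {quad:.2f} GB/s")
```

Real-world throughput also depends on how memory is interleaved across the two controllers, which this peak figure doesn't capture.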
#4
1c3d0g
Yawn. Let's see how this performs first, which is hopefully not as bad as Intel's craptastic Atoms... :(
#5
mastrdrver
by: JATownes
From what I understand this is correct. Looking for source...
Edit:

Source

So if Magny-Cours is two dual-channel IMCs, it stands to reason that Bulldozer will implement the same tech.
Seeing as Bulldozer is said to work in G34 and C32 sockets, showing dual memory and northbridge controllers isn't surprising. I think it is C32, AMD's 2P format, from which the 1P high-end desktop Bulldozer will be derived.
#6
W1zzard
I added the full slide deck from the Hot Chips conference to the first post.
#7
Wile E
Power User
All I want is for AMD to be able to compete on the high end desktop market again.
#8
Atom_Anti
Where does the information that the northbridge is completely on the chip come from? :eek:

None of the other sites are saying anything like that, and it will be AM3 compatible :cool:!
#9
Hayder_Master
OK, OK, OK, AMD, you're going to make great CPUs, but what about the damn AMD motherboard chipsets? Any improvements?
#10
TheMailMan78
Big Member
by: hayder.master
OK, OK, OK, AMD, you're going to make great CPUs, but what about the damn AMD motherboard chipsets? Any improvements?
What's wrong with the 890?
#12
inferKNOX
by: crazyeyesreaper
More on topic, I don't mind a new socket either; it will be a nice change. Hopefully they fix the CPU HSF mounting issues currently seen on 939, AM2, AM2+ and AM3. It would be nice to have more freedom of heatsink choice without it crippling my RAM selection due to tall heatspreaders.
I think it's more a RAM-is-too-close-to-CPU issue, rather than a CPU HSF mounting issue.

by: Valdez
well, amd can have quad channel too, it's not intel only tech :)
I'm against the idea of having quad channel.:o
Needing four RAM sticks to get the benefit of higher speed sucks when you think about one of them dying on you. If possible, they could at least make it switch between modes: with 2 sticks it runs dual channel, with 3 sticks triple channel, and with 4 sticks quad channel.:rockout:

EDIT:
by: hayder.master
OK, OK, OK, AMD, you're going to make great CPUs, but what about the damn AMD motherboard chipsets? Any improvements?
No doubt those chipsets will have PCIe 3.0, integrated & USB 3.0 at the very least.
#13
jmcslob
Sabine: Mainstream mobile platform based on the Llano APU, which will see a quad-core Stars-based CPU and DirectX 11-class graphics processor tied together on the same piece of silicon, manufactured using 32 nm lithography. Sabine is expected to arrive in 2011.

Brazos: Ultra low-power mobile platform based on the Ontario APU, which will see a dual-core Bobcat-based CPU and DirectX 11-class graphics processor tied together on the same piece of silicon. Brazos is expected to arrive in 2011, and will allow AMD to drive netbooks, along with form factors the company’s hardware hasn’t yet appeared in (possibly tablets).

Scorpius: Enthusiast desktop platform based on AMD’s Zambezi processor and discrete graphics (AMD, of course, specifies an ATI GPU). The platform requires a quad-core CPU or higher, DDR3 memory, and a revised Socket AM3 interface. Availability is expected in 2011.

Lynx: Mainstream desktop platform based on AMD’s Llano APU. It’ll feature up to four CPU cores, a single graphics core (integrated onto the APU, naturally), and DDR3 memory. Availability is expected in 2011.

Components:

Llano: This is going to be AMD’s first APU, combining a quad-core Stars-based CPU and DirectX 11-class GPU on a single piece of silicon. It’ll be manufactured using a 32 nm SOI process, support DDR3 memory, and include core-level power gating. Because there are brand new capabilities in play here, it should surprise no one that Llano will drop into a new socket interface. Availability is expected in 2011.

Ontario: While the Llano APU absorbs much of AMD’s risk in shifting to 32 nm manufacturing (since it employs a familiar CPU microarchitecture and more mature manufacturing process), Ontario will be the first APU to employ AMD’s Bobcat CPU microarchitecture. Ontario is manufactured at 40 nm, armed with DirectX 11-class graphics, and expected in 2011.

Zambezi: Per AMD, Zambezi will be the first desktop processor based on the company’s Bulldozer architecture. Featuring as many as eight cores, Zambezi-based offerings will incorporate as many as four processor “modules.” AMD plans to use 32 nm manufacturing, and early reports suggest Socket AM3 compatibility (along with DDR3 memory support). Zambezi is not an APU, but rather is meant to be paired with discrete graphics.

Interlagos/Valencia: Code-names for AMD’s upcoming 16-core and eight-core Opteron processors, respectively, both based on the Bulldozer microarchitecture. Interlagos will drop into the existing G34 interface, while Valencia is C32-compatible. Both families will be manufactured using 32 nm SOI lithography, will support DDR3 (including load-reduced DIMMs and 1.25 V memory modules), and are expected in 2011.
#14
Hayder_Master
by: TheMailMan78
Whats wrong with the 890?
Can you compare it with X58?
#15
jmcslob
by: hayder.master
Can you compare it with X58?
I think with a matched processor you could
the MSI 890FXA-GD70 AM3 comes to mind
#16
TheMailMan78
Big Member
by: hayder.master
Can you compare it with X58?
Apples to oranges. Nvidia will never let AMD/ATI run SLI natively on an 890 or "990" chipset. If they did, then yeah.
#17
inferKNOX
by: jmcslob
Scorpius: Enthusiast desktop platform based on AMD’s Zambezi processor and discrete graphics (AMD, of course, specifies an ATI GPU). The platform requires a quad-core CPU or higher, DDR3 memory, and a revised Socket interface. Availability is expected in 2011.

Components:

Zambezi: Per AMD, Zambezi will be the first desktop processor based on the company’s Bulldozer architecture. Featuring as many as eight cores, Zambezi-based offerings will incorporate as many as four processor “modules.” AMD plans to use 32 nm manufacturing, and reports suggest Socket AM3 incompatibility (along with DDR3 memory support). Zambezi is not an APU, but rather is meant to be paired with discrete graphics.
corrected ;)
#19
nt300
Remember, AMD is sort of new to the chipset business despite ATI's previous experience. The 790FX was a start, and a good one. The 890FX was better, yes, and helped AMD gain more experience. Now the upcoming 990FX, or as some may call it, the Bulldozer chipset, should be what they've been building toward for years, and it should be feature-rich just like Intel's high-end chipsets.
#20
cadaveca
My name is Dave
by: Bloodcrazz
misinformed rofl, the new socket you're talking about is for the server platform.
Zambezi will be AM3+ (AM3r2).
sigh, who's misinformed now.
http://www.extremetech.com/article2/0,2845,2368186,00.asp
by: btarunr
Old roadmap is old. AMD told us it's a different socket just last week.
I dunno...personally, I find ExtremeTech not that "techie". They are a Ziff Davis publication, no?

I tend to trust BTA here. The way I look at it, the only way I see decent performance boost overall is with a new socket, so I won't ever use current boards with these upcoming chips...seems a waste of possible resources.


AMD has said for a long time that Zambezi would be AM3+, not exactly the same as the current AM3 socket. That suggests to me that because of separate NB/memory controllers, they can disable some functionality on these chips as needed (AMD's "Modular Design").

Oh, and by the way, the August 24th article hosted on that site makes no mention of socket plans, except this:
AMD also told us that it will introduce a new AM3+ socket for consumer versions of Bulldozer CPUs. AM2 and AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards.
#21
jpierce55
by: toyo
Judging by the number of comments until mine, I can tell there's lots of scepticism about AMD's new CPU line... I guess they just delayed it for too long.

However, it seems they were kinda broke, and maybe it is only the 5800 series success that put Bulldozer back on the drawing board. I thought they abandoned the project for lack of resources or something.

Whatever the reality is, I hope it is worth waiting for... AMD deserves a high-end CPU that will kick Intel's line in the arse... it's maybe time for another performance crown switch like in the good old Athlon vs Pentium days...
I have skepticism due to all the hype they put on the Phenom, a really sorry processor for the hype. I hope AMD pulls it off. I doubt it will be up to par with Intel, but maybe a little closer, and hopefully a bit cheaper. I hope my next cpu is an AMD.
#22
Bloodcrazz
by: cadaveca
I dunno...personally, I find ExtremeTech not that "techie". They are a Ziff Davis publication, no?

I tend to trust BTA here. The way I look at it, the only way I see decent performance boost overall is with a new socket, so I won't ever use current boards with these upcoming chips...seems a waste of possible resources.


AMD has said for a long time that Zambezi would be AM3+, not exactly the same as the current AM3 socket. That suggests to me that because of separate NB/memory controllers, they can disable some functionality on these chips as needed (AMD's "Modular Design").

Oh, and by the way, the August 24th article hosted on that site makes no mention of socket plans, except this:
AMD was saying this before the 890FX. Why wouldn't AMD then fit this new AM3r2 socket on the 890?
It says AM3+ will work with older chips, but new chips won't work on AM3.
The 890FX will be the base for Scorpius, as was planned all along; most sites are saying this, not just that one, and so are people who were at Hot Chips.
So he was misinformed.
#23
cadaveca
My name is Dave
It's hard to judge how accurate any info from AMD is, at this point.

AMD officials were hyping 3 GHz Phenom chips. This never materialized. They set a precedent there...for lying about future products (they WERE supposed to have a policy of not discussing future products, I thought).

So, until we have retail products, I am skeptical of any of this info, and if they use 890FX for Zambezi, I definitely won't be buying.



What has me really curious is the motivation behind releasing such details so early...why? They're 6 months away from any launch, at the least, and they are pimping products in the public domain? Seems fishy to me.
#24
Bloodcrazz
http://www.tomshardware.com/reviews/bulldozer-bobcat-hot-chips,2724-2.html
Scorpius: Enthusiast desktop platform based on AMD’s Zambezi processor and discrete graphics (AMD, of course, specifies an ATI GPU). The platform requires a quad-core CPU or higher, DDR3 memory, and a revised Socket AM3 interface. Availability is expected in 2011.
Tom's is wrong too.
lol everyone turn on panic rofl but panic was the only one that was right
#25
cadaveca
My name is Dave
by: Tom's
likely, Socket AM3 desktop platforms as well
:shadedshu

There is so much conflicting info out there...makes you wonder...:rolleyes: