Tuesday, August 24th 2010

AMD Details Bulldozer Processor Architecture

AMD is finally going to embrace a truly next-generation x86 processor architecture built from the ground up. AMD's current architecture, the K10(.5) "Stars", is an evolution of the K8 architecture, but it never matched K8's market success, as it was overshadowed by competing Intel architectures. AMD codenamed its latest design "Bulldozer", and it features an x86 core design that is radically different from anything we've seen from either processor giant. With this design, AMD thinks it can outdo both the HyperThreading and multi-core approaches to parallelism in one shot, as well as "bulldoze" through serial workloads with a broad set of eight integer pipelines per core (compared to 3 on K10 and 4 on Westmere). Two almost-independent blocks of integer processing units share a common floating point unit with two 128-bit FMACs.

AMD is also working on a multi-threading technology of its own to rival Intel's HyperThreading. It exploits Bulldozer's split integer processing backed by a shared floating point unit, a design AMD believes to be so efficient that each SMT worker thread can be deemed a core in its own right, and could further be backed by competing threads per "core". AMD is also working on another micro-architecture codenamed "Bobcat", a smaller, low-power design with which it will take on the low-power and high performance-per-watt segments, which extend from all-in-one PCs all the way down to hand-held devices and 8-inch tablets. We will explore the Bulldozer architecture in some detail.

Bulldozer: The Turbo Diesel Engine
In many respects, the Bulldozer architecture is comparable to a diesel engine: lower RPM (clock speeds), higher torque (instructions per clock). When implemented, Bulldozer-based processors could outperform competing processor architectures at much lower clock speeds, thanks to one critical area AMD seems to have finally addressed: instructions per clock (IPC). The 65 nm "Barcelona" and 45 nm "Shanghai" architectures raised performance per clock by indirect means (such as backing the cores with a level-3 cache and raising the uncore/northbridge clock speeds). The 32 nm Bulldozer, by contrast, actually features a broad integer unit with eight integer pipelines split into two portions, each portion having its own scheduler and L1 data cache.



Parallelism: A Radical Approach?
Back when analysts were pinning high hopes on the Barcelona architecture, those hopes were fueled by early reports suggesting that AMD was using 128-bit wide floating point units, leading analysts to believe that AMD may have conquered its biggest nemesis: floating point performance, and with it pure math-crunching ability. However, that wasn't to be, because the processor's overall number-crunching ability had been pegged to its floating point performance alone, ignoring the integer units.



AMD split the eight integer pipelines per core into two blocks, each block having four integer pipelines, an integer scheduler for those, and an L1 data cache. These constitute the lowest level of "dedicated" components, dedicated to processor threads. Between the two blocks sits a shared floating point unit with two 128-bit FMACs, arbitrated by a floating point scheduler. The fetch/decode, an L2 cache, and the FPU constitute the "shared" components.
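The split between dedicated and shared components can be sketched as a small data model. This is only an illustration of the layout described above; the class and field names (`IntegerBlock`, `SharedFPU`, `Module`) are hypothetical and do not come from AMD.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IntegerBlock:
    """One 'dedicated' block: four integer pipelines, a scheduler, an L1 data cache."""
    pipelines: int = 4
    has_scheduler: bool = True
    has_l1_data_cache: bool = True


@dataclass
class SharedFPU:
    """The floating point unit shared by both integer blocks."""
    fmac_units: int = 2
    fmac_width_bits: int = 128


@dataclass
class Module:
    """Hypothetical model of one unit as described in the article."""
    integer_blocks: List[IntegerBlock] = field(
        default_factory=lambda: [IntegerBlock(), IntegerBlock()]
    )
    fpu: SharedFPU = field(default_factory=SharedFPU)
    shared_components: Tuple[str, ...] = ("fetch/decode", "L2 cache", "FPU")

    def integer_pipelines(self) -> int:
        # Total integer pipelines across both dedicated blocks.
        return sum(block.pipelines for block in self.integer_blocks)

    def fpu_width_bits(self) -> int:
        # Combined width of the two FMACs in the shared FPU.
        return self.fpu.fmac_units * self.fpu.fmac_width_bits


if __name__ == "__main__":
    m = Module()
    print(m.integer_pipelines(), "integer pipelines,", m.fpu_width_bits(), "bits of FMAC width")
```

The point of the model is simply that the integer resources are duplicated per thread while the FPU and front end are not.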



AMD is implementing a simultaneous multithreading (SMT) technology that lets each set of "dedicated" components (in this case, an integer unit) deal with a thread of its own, while sharing certain components with the other integer unit, effectively making each set of dedicated components a "core" in its own right. This way, the actual core of the Bulldozer die is deemed a "module", a superset of two cores, and the Bulldozer die (chip) features n modules, depending on the model.
So now you have a chip with eight cores (four modules), with a much smaller die size and transistor count than a hypothetical 32 nm K10 8-core processor. It is unclear whether AMD wants to push SMT further down to the "core" level and run two threads simultaneously over each set of dedicated components, but one thing is for sure: AMD has embraced SMT in some form or another. In all this, the chip-level parallelism is transparent to the operating system, which will only see a fixed number of logical processors, without any special software or driver requirement.
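As a quick sketch of the counting described above, the number of logical processors the operating system sees follows directly from the module count. The function name and the default of two cores per module are assumptions for illustration, based on the article's description.

```python
def logical_processors(modules: int, cores_per_module: int = 2) -> int:
    """Logical processors exposed to the OS for a chip with `modules` modules.

    Assumes each module presents two "cores", as described in the article,
    and no further SMT inside each core.
    """
    if modules < 1 or cores_per_module < 1:
        raise ValueError("modules and cores_per_module must be positive")
    return modules * cores_per_module


if __name__ == "__main__":
    for modules in (2, 3, 4):
        print(f"{modules} modules -> {logical_processors(modules)} logical processors")
```

If AMD did push SMT down to the "core" level, `cores_per_module` would stay at two and a separate threads-per-core factor would be needed; that case is deliberately left out here.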

So in one go, AMD shot up its integer performance. A thread either makes use of one integer unit with its four pipelines, or deals with both integer units, arbitrated by the fetch/decode, along with the shared FPU.

Outside the modules
At the chip level, there's a large L3 cache, a northbridge that integrates the PCI-Express root complex, and an integrated memory controller. Since the northbridge is completely on the chip, the processor does not need a HyperTransport link to talk to the rest of the system. It connects to the chipset (which is now relegated to a southbridge, much like Intel's Ibex Peak) using A-Link Express, which, like DMI, is essentially a PCI-Express link. It is important to note that all modules and extra-modular components are present on the same piece of silicon. Because of this design change, Bulldozer processors will come in totally new packages that are not backwards compatible with older AMD sockets such as AM3 or AM2(+).

Expectations
Not surprisingly, AMD isn't talking about Bulldozer as the next big thing since dual-core processors (something it did with Barcelona). AMD currently has 8-core and 12-core processors codenamed "Magny-Cours", which are multi-chip modules of Shanghai (4-core) and Istanbul (6-core) dies. AMD expects an 8-core Bulldozer implementation (built with four modules) to have 50% higher performance-per-watt compared to Magny-Cours.
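To put the 50% performance-per-watt figure in perspective, here is a small worked sketch of what such a gain implies at the two extremes: equal power or equal performance. The function names are hypothetical, and the calculation assumes the gain holds uniformly, which real chips may not do.

```python
def relative_performance_at_equal_power(perf_per_watt_gain: float) -> float:
    """Performance relative to the baseline when both chips draw the same power."""
    return 1.0 + perf_per_watt_gain


def relative_power_at_equal_performance(perf_per_watt_gain: float) -> float:
    """Power relative to the baseline when both chips deliver the same performance."""
    return 1.0 / (1.0 + perf_per_watt_gain)


if __name__ == "__main__":
    gain = 0.50  # the 50% figure quoted for 8-core Bulldozer vs. Magny-Cours
    print(f"Same power: {relative_performance_at_equal_power(gain):.2f}x the performance")
    print(f"Same performance: {relative_power_at_equal_performance(gain):.2f}x the power")
```

In other words, a 50% performance-per-watt gain could show up as 1.5 times the performance at the same power, about two-thirds of the power at the same performance, or anything in between.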



Market Segments
As mentioned in the graphic above, AMD's modular design allows it to create different products by simply controlling the number of modules on the die (by whichever method). With this, AMD will have processors ready for most PC and server market segments, all the way from desktop PCs, enthusiast-grade PCs, and notebooks to servers. AMD expects to have a full-fledged lineup in 2011. The first Bulldozer CPUs will be sold to the server market.


Hotchips 22 Presentation by AMD on the Bobcat Architecture
Below are as-is slides from AMD's Hotchips presentation on the Bobcat architecture.

283 Comments on AMD Details Bulldozer Processor Architecture

#1
3volvedcombat
Well, you guys that are crying about the Bulldozer cores being on a different platform, that's what you get for just buying without being responsible and thinking about it.

I would not go purchase something unless its out already- or all details- road maps- and even sites pre-listing the hardware at a certain price pop up.

You guys went and purchased 890fx boards expecting 8 core on them- when there were barely quad cores and thuban dies were barely expected 3-4 months later.

w00t on the epic purchases some of you guys made- I say live with your 890fx platform because it could still be a while before bulldozer comes out, because if you THINK about it,
AMD wants money right now- so they're not just going to completely disown the 6 core and 4 core chips which are out right now. So if they do release it even later than they say- that's good.

Because they're just trying to get the brilliant quad cores and thuban cores sold- till they displace them with 200-300 dollar bulldozer chips.

You DO NOT WANT them to release bulldozer anytime soon or i smell a 500 dollar AMD cpu

even the thought of a 400 dollar amd cpu scares me :eek:

Let them sell the chips right now- get some profit- stop production on some deneb chips- lower the prices of the thuban dies and 890fx motherboards- and then release a whole new platform at previous thuban die prices

and I'll be happy they're staying cheap with such epic 8 core designs.
#2
HolyCow02
Sounds completely awesome. Can't wait to see some numbers on these when they come out, whenever that will be.
#3
Mindweaver
Moderato®™
They will have to beat i7 to price a chip over 300 bucks. I see them putting out a feeler chip to show the new chip can beat an i7 before they just put out a 500 chip. But that being said I CAN'T WAIT! hehehe These are great days we are living in my friends! :toast:
#4
Valdez
If it won't be am3 compatible, i'm sure AMD will release new, 32nm steppings for am3 platform based on the current generation.
#5
largon
Finally a new socket.
I was kinda worried since it appeared until today that Bulldozer landed on AM3.
Now the quad channel RAM claim also makes sense.

PS.
Laughing at that one silly kid who's pissed at AMD.
#6
mastrdrver
by: btarunr
This is pure FUD, but:

I personally expect the consumer Bulldozer package to have nearly 1000~1400 pins, just not arranged like AM3/2 or compatible with it.

The processor may continue to have a dual-channel DDR3 memory controller (maybe higher memory clock speeds of 1833 MHz support to give higher bandwidth).

Processor will need pins to give out 40 PCI-Express 2.1 lanes (incl. the A-Link III which is x4).

No HyperTransport pins on the consumer packages. The 2P/4P Opteron package might be bigger, as it needs pins for 1 or 2 16-bit HyperTransport links (to neighbouring sockets).
by: DigitalUK
i think i remember seeing bulldozer was going to have quad-channel memory, i could be wrong. but i cant wait for bulldozer been waiting for years, wish i could pre order now..:)
1866 MHz is supposed to be supported from the get-go with Llano, so I expect Bulldozer to support it too. Also, only dual channel is known to be supported. That quad channel thing is for high end desktop Sandy Bridge.
#7
suraswami
I guess that this 'BullDozer' is mainly aimed at the server market, where they have lost almost all of their market share. Wherever I go I see Intel dominated servers, as against Opty dominated servers a few years back. This is the place where both these companies make the most money.

New Socket = better, in my opinion as they don't need to form a bubble and work within it. Would have been nice to get a proc for AM3 socket, but as such I am tired of existing AM2...AM3 sockets.

If this processor really shines (without the lapping :laugh:), AMD will be back on money making on their CPUs.

Go AMD Go :rockout:
#8
Valdez
by: mastrdrver
That quad channel thing is for high end desktop Sandy Bridge.
well, amd can have quad channel too, it's not intel only tech :)
#9
techtard
Good thing I read this, I was about to pull the trigger on a 890fx mobo, 1055 and some ram.
I guess my trusty old frankensteined system can live for another few months until we get some performance numbers on Bulldozer.

If it lives up to the hype, then maybe it will force Intel to cut prices and we all win.
#10
wahdangun
by: CDdude55
Man, 8 cores and 16 threads just seems so pointless right now. The software isn't keeping up.:(
If only Nvidia allowed proper coding of PhysX, let more cores be utilized and didn't use that ancient x87 code, we would surely be seeing superb PhysX effects WITHOUT needing any GPU power, so the GPU could concentrate on flexing its muscle in the graphics department and our hexacore CPUs wouldn't be a waste of silicon.


Bring on Bullet physics, and please, Intel, optimize that Havok engine so it will use all six cores.
#11
OneCool
I want it!!

HD 6870 = I want it!!
#12
crazyeyesreaper
Chief Broken Rig
I STILL can't see why ppl are bitching about 890fx. I mean, for fucking goodness sake, bulldozer won't be here for another full god damn year, get over it. 890fx will have had nearly a 2 year life span by then as well, so everyone needs to stop bitching; 2 years in the tech world is a long god damn time. I look at it this way: by the time bulldozer comes out I will have fully enjoyed and used my hardware to its fullest and gotten my money's worth from it. Seriously... 2007-2011, my machine's base configuration will be 4 years old by the time Bulldozer hits... ppl are seriously going to complain about socket longevity? I went from athlon x2 to 940be, then jumped to 965 and grabbed a used mobo. Seriously, 4 years on a machine in terms of being an enthusiast is a long ass time. The tech world constantly moves forward, just as the sun rises and sets and the world spins. If you don't like those facts of life, go play russian roulette with all the chambers filled.

More on topic, I don't mind a new socket either; it will be a nice change. Hopefully they fix the cpu HSF mounting issues currently seen on 939, AM2, AM2+ and AM3. It would be nice to have more freedom of heatsink choice without it crippling my ram selection due to tall heatspreaders.
#13
erocker
Who's complaining? The current socket has had a good and long life, I see no reason to complain. :ohwell:

by: crazyeyesreaper
I'm talking about the guy who bitched for nearly half the thread about 890fx... when, if he used his brain, he would realize that when bulldozer does come out, 890fx will be nearly 2 years old
It's funny how one person can ruin a thread. Ignore the trolls.
#14
CDdude55
Crazy 4 TPU!!!
by: wahdangun
If only Nvidia allowed proper coding of PhysX, let more cores be utilized and didn't use that ancient x87 code, we would surely be seeing superb PhysX effects WITHOUT needing any GPU power, so the GPU could concentrate on flexing its muscle in the graphics department and our hexacore CPUs wouldn't be a waste of silicon.


Bring on Bullet physics, and please, Intel, optimize that Havok engine so it will use all six cores.
But the whole point of PhysX is so the GPU does the physics processing instead of the CPU. Whether or not a CPU can utilize all of its cores is a matter of software taking advantage of those cores. How is PhysX holding the CPU back? Even if PhysX is poorly coded, how would that affect the CPU? I don't understand how in any way PhysX could be holding back a part that it has nothing to do with.

But of course, everything has to be Nvidia's fault, right? :shadedshu
#15
crazyeyesreaper
Chief Broken Rig
I'm talking about the guy who bitched for nearly half the thread about 890fx... when, if he used his brain, he would realize that when bulldozer does come out, 890fx will be nearly 2 years old
#16
DannibusX
AMD needs a socket change to compete. I have no problems with it. They've done a great job keeping their processors backward compatible to this point. Who says that future processors won't be backward compatible with whatever socket that Bulldozer gets?
#17
crazyeyesreaper
Chief Broken Rig
AMD said it lol ^ But yeah, seriously: AM2 to AM2+ to AM3, by the time bulldozer gets here we'll have had nearly 5 years of the same socket. It was indeed time for a change :toast:
#18
CDdude55
Crazy 4 TPU!!!
Don't see what the big deal is with them changing sockets. Sockets get old and eventually need to be replaced with a newer one with better tech on board. I don't see why someone would get mad at that.
#19
wolf
Performance Enthusiast
Remember that if you still have an AM2+ or AM3 socket you will still get whatever new processors AMD releases until bulldozer. I doubt they are going to stop in their tracks from now until it and its new platform are released.

I for one am looking forward to plopping a 6 core into my 785G in about 9-12 months.
#20
CDdude55
Crazy 4 TPU!!!
by: wolf
Remember that if you still have an AM2+ or AM3 socket you will still get whatever new processors AMD releases until bulldozer. I doubt they are going to stop in their tracks from now until it and its new platform are released.

I for one am looking forward to plopping a 6 core into my 785G in about 9-12 months.
I agree.

I also hope the 980x or i7 970 drop down in price eventually so I can go 6 core too. :)
#21
cadaveca
My name is Dave
So, new socket? that makes me VERY happy.
#22
Super XP
IMO, a 50% increase in throughput over Phenom II does not necessarily mean a 50% performance increase, with, yes, 33% more cores.
It sounds like AMD chose their wording very intelligently.
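
The arithmetic behind that point can be sketched quickly. This is a back-of-the-envelope illustration, assuming throughput scales linearly with core count; the function name is hypothetical, and the 50% and 33% figures are the ones quoted in this comment.

```python
def per_core_gain(total_throughput_gain: float, core_count_gain: float) -> float:
    """Per-core throughput gain implied by a total gain spread over more cores.

    Assumes throughput scales linearly with the number of cores.
    """
    return (1.0 + total_throughput_gain) / (1.0 + core_count_gain) - 1.0


if __name__ == "__main__":
    gain = per_core_gain(total_throughput_gain=0.50, core_count_gain=1.0 / 3.0)
    print(f"Implied per-core gain: {gain:.1%}")
```

Under that assumption, 50% more throughput from a third more cores works out to roughly 12.5% more throughput per core.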
#23
cavemanthreeonesix
Glad to hear some concrete news about the bulldozer, new platform is definitely a positive move forward imho, K10 has definitely peaked so hopefully they can move forward from that.

Only downside is I'm split on whether the crosshair IV extreme is going to be worth getting anymore, if it ever comes out...
#24
crazyeyesreaper
Chief Broken Rig
You've got 1 year+, so it might be, might not be. Up to you.
#25
mastrdrver
Fwiw on Bulldozer being backward compatible, Tech Report is saying they "expect compatibility............although specifics about that are still murky."

Notably though is that they confirmed that they are compatible with current C32 and G34 for the server side. Maybe that is why they think there might be a chance for AM3 compatibility?

by: Valdez
well, amd can have quad channel too, it's not intel only tech :)
You're absolutely right, but the real question is why? Not even the 6 core/12 thread Westmere chips fully use triple channel for high end desktop, though it would be a great feature on server systems.

Now some will say what about the gpu being moved on die. With the rumor of a maximum of ~400 SP being on a Llano some would argue that adding more channels would be beneficial here. Thing is more channels require more die room being reserved for that.

Why not just support higher frequencies instead? Save die space for something else (or just take the space savings and the power you save too). If I'm not mistaken, dual channel with 1866mhz memory has higher bandwidth than the current Intel triple channel that is on the 9xx chips (obviously this is generally speaking).
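
That comparison can be checked with a rough peak-bandwidth calculation. This is a sketch under stated assumptions: 64-bit DDR3 channels, nominal transfer rates, and DDR3-1066 as the officially supported speed on the Intel 9xx parts; the function name is hypothetical.

```python
def peak_bandwidth_gb_s(channels: int, transfer_rate_mt_s: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes).

    channels           -- number of memory channels
    transfer_rate_mt_s -- transfers per second in millions (e.g. 1866 for DDR3-1866)
    bus_width_bits     -- width of one channel; 64 bits is assumed for DDR3
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * transfer_rate_mt_s * bytes_per_transfer / 1000.0


if __name__ == "__main__":
    dual_1866 = peak_bandwidth_gb_s(channels=2, transfer_rate_mt_s=1866)
    triple_1066 = peak_bandwidth_gb_s(channels=3, transfer_rate_mt_s=1066)
    print(f"Dual-channel DDR3-1866:   {dual_1866:.1f} GB/s")
    print(f"Triple-channel DDR3-1066: {triple_1066:.1f} GB/s")
```

On those assumptions, dual-channel DDR3-1866 comes out ahead on paper, at roughly 29.9 GB/s against 25.6 GB/s. Triple channel at higher, unofficial memory speeds would change the picture.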