Discussion in 'News' started by btarunr, Dec 19, 2011.
Just throwing it out there: Intel has released just as many chips with issues, if not more. They had their own TLB bug, the P4, bad chipsets, not to mention the Itanium fiasco. There is nothing wrong with a hotfix; again, Intel has had its own batches of them. As of right now, Bulldozer is the best-selling CPU in AMD's lineup. If it were that shitty, everyone would save their couple of bucks and get a Thuban. Get off your Intel high horse and look at the big picture. The P4's hyperthreading sucked, but not a single person out there is complaining about it on current Intel chips. This is AMD's first new design since K8, which was still heavily K7-based. You might remember that Intel's current design is still the end result of the P3. Give AMD a generation to work some kinks out. Hell, look at the APU performance jump clock for clock; they are doing something right.
If you are talking about Bulldozer, it is a true eight-core...
Stop defending it... It has 8 weak cores that have 33% less execution throughput than the competition's cores.
Sandy Bridge has 3 ALUs and 3 AGUs per core (threads compete for those 3 ALUs and 3 AGUs with hyperthreading).
Bulldozer has 2 ALUs and 2 AGUs per core (threads don't compete because there are TWO CORES!).
It's not hard to notice that it has fewer execution resources, not a longer pipeline.
It takes Bulldozer 3 cycles to do six 64-bit executions, whereas it takes Sandy Bridge 2 cycles.
Bulldozer, though, can do sixteen 64-bit ALU calcs and sixteen 64-bit AGU calcs with all cores, while Sandy Bridge can do twelve 64-bit ALU calcs and twelve 64-bit AGU calcs (with hyperthreading it's the same old twelve and twelve, no increase, sillies).
Bulldozer is meant for servers that need scalability with thread count... and Bulldozer does scale with thread count.
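The per-core and per-chip arithmetic in the post above can be checked with a few lines of Python. This is a minimal sketch that takes the quoted unit counts at face value. It assumes an 8-core (4-module) Bulldozer against a 4-core Sandy Bridge, counts peak issue slots only, and ignores latency, port restrictions and everything else that decides real throughput.

```python
import math

# Integer execution units per core, as quoted in the post (taken at face value).
# Assumption: an 8-core Bulldozer part versus a 4-core Sandy Bridge part.
BULLDOZER = {"cores": 8, "alus": 2, "agus": 2}
SANDY_BRIDGE = {"cores": 4, "alus": 3, "agus": 3}


def cycles_for(ops: int, units_per_core: int) -> int:
    """Minimum cycles one core needs to issue `ops` ops, one op per unit per cycle."""
    return math.ceil(ops / units_per_core)


def chip_peak(chip: dict) -> tuple:
    """Peak (ALU, AGU) ops per cycle summed over every core on the chip.

    Hyperthreading adds no execution units, so it does not change these totals.
    """
    return chip["cores"] * chip["alus"], chip["cores"] * chip["agus"]


def per_core_alu_deficit(weaker: dict, stronger: dict) -> float:
    """Fraction by which one core has fewer ALUs than the other."""
    return 1 - weaker["alus"] / stronger["alus"]


if __name__ == "__main__":
    print(cycles_for(6, BULLDOZER["alus"]))     # 3
    print(cycles_for(6, SANDY_BRIDGE["alus"]))  # 2
    print(chip_peak(BULLDOZER))                 # (16, 16)
    print(chip_peak(SANDY_BRIDGE))              # (12, 12)
    print(f"{per_core_alu_deficit(BULLDOZER, SANDY_BRIDGE):.0%}")  # 33%
```

Both sides' numbers fall out of the same model: each Bulldozer core is a third narrower, but the chip as a whole has more units, so it only pulls ahead when there are enough threads to keep all eight cores busy.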
I count 4 modules (cores)
It also performs similarly to the previous generation, which had the same 3 ALU/3 AGU layout as Intel. It has 8 differently structured cores.
The threads don't compete; all hyperthreading does is allow another set of instructions to be sent down the pipeline. It was originally a band-aid for Intel's long-pipelined NetBurst-based chips. AMD's new design gives you 2 separate threads, something hyperthreading can never do.
What does either of those have to do with anything? It is still a "short" pipeline CPU in comparison to the P4. Due to its design, it is not comparable to Intel in execution resources.
Again, AMD's K7-K10h chips all offered the same 3/3 setup and did not offer an improvement, except for K7/K8 vs NetBurst. Core 2 Duo and up, when Intel went back to 3/3, were the first competitive offerings. The main reason NetBurst failed, in Intel's eyes, was a lack of clock scaling. The original design was said to scale to 8 GHz, and at that speed its long pipelines and 2/2 design would have held a performance edge.
Yup, Bulldozer does what it was designed for, and in heavily multithreaded apps it holds its own. With future chips offering a more refined design, it will likely smoke some multithreaded benchmarks, especially since it has already proven it clocks higher.
According to your source, Bulldozer does hold its own. Nowhere in my statement did I call it the best. I said it does what it was designed for: heavy multithreading is Bulldozer's bread and butter. Price for performance makes no difference to the vast majority of the companies running these kinds of chips. The AMD box, at the time of that writing, outperformed the Intel and K10h-based boxes. It doesn't matter if it had more RAM, better hard drives or more cores. The point is that the system was designed to do exactly that in a server environment, and it succeeded in industry-standard benchmarks. CEOs don't look at AnandTech; they look at the sheet of paper HP hands them that says quite clearly that, while the cost is higher, performance per unit is higher. Fewer units at higher performance means less space.
Looking at this picture
BD doesn't look like a smart design. Really, why would you have an L3 cache the same size as the L2? L3 is slower than L2... but if it is the same size, what benefit does it add? Only prefetching, aka "netburst"ing opcodes and data. It isn't acting as a cache but as a prefetcher, in which case it doesn't need to be 8MB... it might as well just be 64K.
Redesign BD right away! A quick win would be to take the L3 down to 64K, saving die space and power and making the fab cost and end price much cheaper. I bet performance would be within 3%. Double the L1, if not quadruple it, and performance would be up 10% on a still smaller die footprint and lower power consumption.
And get the processor to operate symmetrically rather than asymmetrically. All this nonsense about affinity-locking 2 threads and getting a "turbo boost" effect: kill it. Separate those cores with a little of the space saved from cutting the L3. And kill turbo boost, but raise all clocks to their max. Cooling will be better now that they are spaced out and there isn't heat from the L3.
Each module can only use its own 2MB L2 cache; however, the module can use the entire 8MB L3 if it needs to.
As for the argument earlier: the Bulldozer die, analyzed the way AMD designed it, has 4 ALUs and 4 AGUs per module. You would consider each module a core. You cannot consider the individual "cores" within the modules to be cores, since they share the early pipeline stages. They are called integer cores. Each integer core carries a 4-way 16kB L1 data cache, while the 64kB instruction cache is shared by the module. In a nutshell, it's two halves of a single brain, independent and codependent at the same time.
I wonder what the latency is between the different banks of L3. With decent memory controllers and DDR3, the relative performance gain from L3 cache keeps shrinking... perhaps it's time to drop the L3, beef up the L1/L2 and separate those pipelines.
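The L3 argument can be framed with the textbook average memory access time (AMAT) model, where each level costs its hit time plus its miss rate times the cost of the next level. The sketch below uses made-up latencies and miss rates (hypothetical placeholders, not Bulldozer measurements) purely to show the shape of the trade-off: an L3 only pays off while its hit time plus its share of misses is cheaper than going straight to DRAM, so faster memory shrinks its benefit.

```python
# Average memory access time (AMAT):
#   cost(level) = hit_time + local_miss_rate * cost(next level)
# Every number below is a hypothetical placeholder, not a measured Bulldozer figure.


def amat(levels, memory_latency):
    """levels: (hit_time_cycles, local_miss_rate) pairs ordered from L1 outward."""
    cost = float(memory_latency)
    # Fold from the outermost cache back toward L1.
    for hit_time, miss_rate in reversed(levels):
        cost = hit_time + miss_rate * cost
    return cost


L1 = (4, 0.05)
L2 = (20, 0.30)
L3 = (60, 0.50)

for dram in (200, 120):
    with_l3 = amat([L1, L2, L3], dram)
    without_l3 = amat([L1, L2], dram)
    print(f"DRAM {dram} cycles: with L3 {with_l3:.1f}, without L3 {without_l3:.1f}")
```

With these placeholder numbers the L3 saves 0.6 cycles per access (7.4 vs 8.0) when DRAM costs 200 cycles, and nothing at all (6.8 vs 6.8) when DRAM costs 120 cycles.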
Those are 8 cores. Just looking at that, you wouldn't notice the repeated ALU/AGU subsets and the dedicated datapaths each one has.
With hyperthreading, threads compete for the execution resources...
The 8MB of L3 is used mostly for big prefetches, and it is used by all modules and all cores.
No, the design was that the 2 AGLUs were able to execute non-memory workloads (with later versions able to have all EX/AGLUs be AGLUs that can output 4 adds, 4 subtracts, 4 multiplies, 4 divides or 4 memory ops per cycle in any order, as long as it output four, and this is per core)... Each module has two cores. You can consider the individual cores in the module to be cores, since they have dedicated datapaths, instruction buses, data buses, and control units...
Don't impose your definition of what a core is if you are 100% wrong!
Simply mind blowing how a single cjop can cause such a fusd
I think your keyboard is broken.
No, I'm pretty sure he's gyt it rufht. Why make a fusd?
Speed-posting on a newfangled smartphone; the screen needs calibration.
They don't have entirely separate datapaths. The initial pipeline stages are shared between the integer cores.
These share the instruction set per module, not per core, and share all of the other resources.
Which is what was already said.
So each module acts as one core, giving 4 ALU/4 AGU per cycle? That's exactly what I just said. Each module has dedicated datapaths, instruction buses, data buses and control units.
All of it is shared within the module, not within the integer core. The integer cores are not independent of the modules; if they were, it would be a true 8-core unit, no different from a Phenom X8 of sorts. This is not that. The integer cores share everything except a 16kB L1.
There are two definitions of a core, and Bulldozer fits neither.
Each core has 2 EXALUs and 2 AGLUs. The original claim that there were going to be 4 AGLUs was a rumour started by Dresdenboy.
Again, each core has dedicated datapaths, instruction buses, data buses, and control units.
2 DATAPATHS, 2 IBUSES, 2 DBUSES, 2 CONTROL UNITS => 2 CORES, NOTHING IS SHARED
IT IS EIGHT CORES!
TECHNICAL DEFINITIONS PLACE BULLDOZER of the OROCHI DIE AT EIGHT CORES
There are TWO integer DATAPATHS, 2 CONTROL UNITS, 2 INSTRUCTION BUSES, 2 DATA BUSES. GET IT INTO THAT GODDAMN BONEHEAD OF YOURS THAT THIS IS TWO CORES
Accept the facts and move on, cdawall. I am tired of your idiocy.
*Reads through this thread and shakes head*
You guys are relentless on debating...
Let's go through your image specifically. Having separate datapaths means nothing when there is still only a single unit: 2 roads to the same place, if you will.
The module is not actually split into 2 cores; that is the idea behind Bulldozer, fitting more into the package. In the image I split it for simplicity, but the only physically separate sections for the cores are the actual integer calculation sections with their cache. Everything else is shared. Again, separate paths to the same place don't make the place any more split; the cores still have to share. Any communication outside of the module goes core->module->IO, not core->IO, once again making the cores dependent on the module itself and further disqualifying them as true cores in the sense of a K10 or SB style CPU. This is a new design with separate integer cores within modules. They are not the same kind of core that anything else has used up to this point. While an 8150 has 8 integer cores, it does not have 8 separate processing modules like a Phenom X8 would.
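The disagreement is really about which resources sit in the module and which sit in each integer core. Here is a toy Python model of the split as described in the post above. The resource labels are a simplified, hypothetical breakdown (fetch/decode, the 64kB instruction cache, the FPU and the 2MB L2 counted as shared; ALUs, AGUs and a 16kB L1 data cache owned by each integer core), not a complete block diagram.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Simplified, hypothetical resource split for one Bulldozer module.


@dataclass
class IntegerCore:
    """Resources each integer core owns outright."""
    alus: int = 2
    agus: int = 2
    l1d_kb: int = 16


@dataclass
class Module:
    """Two integer cores behind one shared front end."""
    shared: Tuple[str, ...] = (
        "fetch/decode", "64kB L1 instruction cache", "FPU", "2MB L2 cache",
    )
    cores: List[IntegerCore] = field(
        default_factory=lambda: [IntegerCore(), IntegerCore()]
    )

    def path_to_io(self, core_index: int) -> List[str]:
        """Off-module traffic goes core -> module -> I/O, never core -> I/O directly."""
        if not 0 <= core_index < len(self.cores):
            raise IndexError("no such integer core in this module")
        return [f"integer core {core_index}", "module", "I/O"]


@dataclass
class Chip:
    modules: List[Module]
    l3_mb: int = 8  # one L3 shared by every module

    def module_count(self) -> int:
        return len(self.modules)

    def integer_core_count(self) -> int:
        return sum(len(m.cores) for m in self.modules)

    def total_alus(self) -> int:
        return sum(c.alus for m in self.modules for c in m.cores)


fx_8150 = Chip(modules=[Module() for _ in range(4)])
print(fx_8150.module_count(), fx_8150.integer_core_count(), fx_8150.total_alus())  # 4 8 16
print(fx_8150.modules[0].path_to_io(1))  # ['integer core 1', 'module', 'I/O']
```

Whether the result counts as 4 cores or 8 depends on whether you think the `shared` list or the `IntegerCore` fields define a core, which is exactly the point being argued.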
I've been making that argument since before it launched and nobody seemed to care. Thank you for perfectly detailing what I couldn't.
It took me about 3 hours of reading and looking at different architectural designs to figure out how to finally phrase it. Thanks, AMD, for making shit more difficult again.
Um... like I said, it's not an 8-core.
BTW, thanks for detailing it, cdawall. Really. I honestly didn't have the time, and you did a better job than I could have. (Internet high five!) Bulldozer is only a fail to people who rested all of their childhood expectations on a piece of silicon to enhance their mortality, OR Intel fanboys with small manhoods. Anyone with a brain can see what its design is for. Sometimes you don't get what you want; you get what you need.
Think about this: you have a scheduling issue, something that obviously needs to get fixed. I don't think a 10% average improvement is unreasonable.
The same can be said of a busy doctor's office or a hospital: if you don't schedule appointments properly, you end up with patient bottlenecks.
We will know the facts soon enough, in Q1 2012, and hopefully this 2-part patch will improve efficiency within Bulldozer and bring it closer to the way it was meant to run and perform.
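The thread doesn't spell out what the patch changes. One plausible policy for a module-aware scheduler (an assumption here, not a description of the actual patch) is to put lightly threaded work on separate modules first, so two busy threads only share a front end and FPU once every module already has one. A toy model of that placement rule, using a hypothetical `place_threads` helper:

```python
from collections import Counter
from typing import List, Tuple


def place_threads(n_threads: int, modules: int = 4, cores_per_module: int = 2,
                  spread_first: bool = True) -> List[Tuple[int, int]]:
    """Return (module, core) slots for n_threads under a toy placement policy.

    spread_first=True fills core 0 of every module before any core 1;
    spread_first=False packs both cores of a module before moving on.
    """
    if spread_first:
        order = [(m, c) for c in range(cores_per_module) for m in range(modules)]
    else:
        order = [(m, c) for m in range(modules) for c in range(cores_per_module)]
    if n_threads > len(order):
        raise ValueError("more threads than integer cores")
    return order[:n_threads]


def contended_modules(placement: List[Tuple[int, int]]) -> int:
    """Count modules where more than one thread shares the front end and FPU."""
    per_module = Counter(module for module, _ in placement)
    return sum(1 for busy in per_module.values() if busy > 1)


for policy in (True, False):
    slots = place_threads(4, spread_first=policy)
    print("spread" if policy else "packed", slots, contended_modules(slots))
```

In this model, four threads spread out leave every module with a single thread, while packing them doubles up on two modules. With eight threads both policies fill every slot and end up identical, so a placement fix like this only helps lightly threaded loads.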