Monday, December 19th 2011

AMD FX 8150 with Microsoft KB2592546 Put Through 'Before and After' Patch Tests
To the surprise of many, last week, Microsoft rolled out a patch (KB2592546) for Windows that it claimed would improve performance of systems running AMD processors based on the "Bulldozer" architecture. The patch works by making the OS aware of the way Bulldozer cores are structured, so it could effectively make use of the parallelism at its disposal. Sadly, a couple of days later, it pulled that patch. Meanwhile, SweClockers got enough time to do a "before and after" performance test of the AMD FX-8150 processor, using this patch.
The results of SweClockers' tests are tabled below. "tidigare" is before, "nytt" is after, and "skillnad" is change. The reviewer put the chip through a wide range of tests, including synthetic CPU-intensive tests (both single and multi-threaded), and real-world gaming performance tests. The results are less than impressive. Perhaps, that's why the patch was redacted.
Source:
SweClockers
The results of SweClockers' tests are tabled below. "tidigare" is before, "nytt" is after, and "skillnad" is change. The reviewer put the chip through a wide range of tests, including synthetic CPU-intensive tests (both single and multi-threaded), and real-world gaming performance tests. The results are less than impressive. Perhaps, that's why the patch was redacted.
96 Comments on AMD FX 8150 with Microsoft KB2592546 Put Through 'Before and After' Patch Tests
Stop defending it...It has 8 Weak Cores...that have 33% less execution throughput than the competition core
Sandy Bridge has 3 ALUs and 3 AGUs per core(Threads compete for those 3 ALUs and 3 AGUs in Hyperthreading)
Bulldozer has 2 ALUs and 2 AGUs per core(Threads don't compete because there is TWO CORES!)
It's not really hard to notice that it has less execution resources not a longer pipeline
It takes 3 Cycles to do six 64bit executions for Bulldozer where it takes 2 cycles to do six 64bit executions for Sandy Bridge
Bulldozer though with all cores can do sixteen 64bit ALUs calcs. and do sixteen 64bit AGUs calcs. while Sandy Bridge can do twelve 64bit ALUs calcs. and twelve 64bit AGUs calcs.(with hyperthreading same old twelve 64bit ALUs calcs. and twelve 64bit AGUs calcs. no increase sillies)
Bulldozer is meant for Servers that need scalability with thread count...and Bulldozer does scale with thread count
I count 4 modules (cores) It also performs similar to the previous generation with the same 3ALU/3AGU as Intel. It has 8 differently structured cores. The threads don't compete all hyperthreading does is allow another set of instructions to be sent down the pipeline. It was originally a band-aid for Intel's long pipelined netburst based chips. AMD's new design gave you 2 separate threads something Hyperthreading can never do. Whats either of those have to do with anything. It is still a "short" pipeline CPU in comparison to P4. Due to design it is not comparable to Intel in execution resources. Again AMD's K7-K10h chips all offered the same 3/3 setup of calcs and did not offer an improvement except with K7/K8 vs netburst. Core 2 Duo and up when Intel went back to 3/3 were the first competitive offerings. The main reason netburst failed in Intel's eyes was a lack of clock scaling. Original design was said to scale to 8ghz and at that speed its long pipelines and 2/2 design would have held a performance edge. Yup Bulldozer does what it was designed for and in heavily multithreaded apps it holds its own. With future chips offering a more refined design it will likely smoke some multithreading benchmarks. Especially since they already have proven it clocks higher.
:rolleyes:
BD doesnt look like a smart design. Really, why would you have L3 cache the same size as L2? L3 is slower than L2... but if it is the same size... what benefit does it add? Only prefetching algorithms aka "netburst"ing opcode and data. It isnt acting as a cache, but as a prefetcher. In which case, it doesnt need to be 2GB... it might at well just be 64K.
Redesign BD right away! A quick win would be to take L3 down to 64K... saving die space and power and making fab cost and end price much cheaper. I bet performance would be within 3% mark. Double L1 if not quadruple and performance would be up 10% and still on lower die footprint and power consumption.
And get the processor to operate symmetrically rather than asymmetrically. All this nonsense about affinity locking 2 threads and getting a "turbo boost" effect. Kill it. Separate those cores with a little space saved from cutting L3. And kill turbo boost but raise all clocks to their max. Cooling will be better now they are spaced and there isnt heat from L3.
As for the argument early the bulldozer die when analyzed the way AMD designed it has 4 ALU and 4 AGU per module. You would consider each module as a core. You cannot consider individual "cores" within the modules cores since they share the early pipelines. They are called integer cores. Each integer core carries a 4 way 16kB L1 data cache and a 64kB instruction cache. In a nutshell its two halves to a single brain, independent and codependent at the same time.
Don't impose your definition of what a core is if you are 100% wrong!
All shared within the module not within the integer core. The integer cores are not independant of the modules if they were it would be a true 8 core unit. No different than a Phenom X8 of sorts. This is not that. The integer cores share everything except a 16kB L1. There are two definitions of a core and bulldozer fis neither.
Again each core has dedicated datapaths, instruction buses, data buses, and control units..
2 DATAPATHS, 2 IBUSES, 2DBUSES, 2ConUNITS => 2 CORES NOTHING IS SHARED
IT IS EIGHT CORES!
TECHNICAL DEFINITIONS PLACE BULLDOZER of the OROCHI DIE AT EIGHT CORES
Accept the facts and move on cdawall I am tired of your idiocy
You guys are relentless on debating...
The module is not actually split into 2 cores that is the idea behind Bulldozer fit more into the package. In the image I split it for simplicity the only section physically separate for the cores is the actual integer calculation sections with their cache. Everything else is shared again separate paths to the same place don't make the place anymore split. The cores would still have to share. Any communications outside of the module go core->module->IO not core->IO once again making the dependent of the module itself further making them not into a true core as is normal for a K10 or SB style CPU. This is a new design with separate integer cores within modules. They are not the same cores as anything else to this point utilizes. While an 8150 has 8 integer cores it does not have 8 separate processing modules like a Phenom X8 would.
BTW thanks for detailing it cdawall. Really. I honestly didn't have the time and you did a better job then I could have. (Internet high five!) Bulldozer is only a fail to people who rested all of thier childhood expectations on a piece of silicon to enhance their mortality OR Intel fanboys who have small manhood's. Anyone with a brain can see what its design is for. Sometimes you don't get what you want, you get what you need.
The same can be said in a busy doctor's office or in a hospital, if you don't schedule appointments properly, you end up running into patient bottlenecks.
We will know the facts soon enough in Q1 2012, and hopefully this 2 part patch will help efficiency within the Bulldozer and make it close to the way it was meant to run and perform.