Originally Posted by Completely Bonkers
Looking at this picture
BD doesnt look like a smart design. Really, why would you have L3 cache the same size as L2? L3 is slower than L2... but if it is the same size... what benefit does it add? Only prefetching algorithms aka "netburst"ing opcode and data. It isnt acting as a cache, but as a prefetcher. In which case, it doesnt need to be 2GB... it might at well just be 64K.
Redesign BD right away! A quick win would be to take L3 down to 64K... saving die space and power and making fab cost and end price much cheaper. I bet performance would be within 3% mark. Double L1 if not quadruple and performance would be up 10% and still on lower die footprint and power consumption.
And get the processor to operate symmetrically rather than asymmetrically. All this nonsense about affinity locking 2 threads and getting a "turbo boost" effect. Kill it. Separate those cores with a little space saved from cutting L3. And kill turbo boost but raise all clocks to their max. Cooling will be better now they are spaced and there isnt heat from L3.
Each module can only use its 2MB L2 cache however the module could use the entire 8MB L3 if it needed.
As for the argument early the bulldozer die when analyzed the way AMD designed it has 4 ALU and 4 AGU per module. You would consider each module as a core. You cannot consider individual "cores" within the modules cores since they share the early pipelines. They are called integer cores. Each integer core carries a 4 way 16kB L1 data cache and a 64kB instruction cache. In a nutshell its two halves to a single brain, independent and codependent at the same time.