
AMD Orochi "Bulldozer" Die Holds 16 MB Cache

Incorrect... the L1 instruction cache is shared by one module (2 cores), the L2 is shared by two modules, and the L3 is shared by all modules...

And about the 8 kB L1 data cache... I remember seeing the spec from AnandTech three months ago. However, I found that Wikipedia listed a 16 kB L1 cache... I'd rather believe AnandTech's source...

You have 2 opinions to choose from:

1. A reporter who has never touched the product

or

2. The director of product marketing for servers at AMD


Choose carefully, there will be a test at the end of the class.
 
You have 2 opinions to choose from:

1. A reporter who has never touched the product

or

2. The director of product marketing for servers at AMD


Choose carefully, there will be a test at the end of the class.

Well put :toast:
 
L1 cache is not 8 kB. Check my blog in a week or so for the answer. There is an L1 instruction cache shared between two cores, L1 data per core, and L2 shared between two cores. L3 is shared at the die level.

Wow, that's some aggressive increase and management of cache. I hope performance improves; AMD's success is good for us, it might bring prices down and performance up.
 
I hope performance improves; AMD's success is good for us, it might bring prices down and performance up.

I agree. Luckily I can wait until after Bulldozer is out to make my next big upgrade, so if it's a great CPU / good value I'll of course go for it, and if not, I'd hope Intel's current generation will have seen a price drop by then.

Although I admit I'd prefer to go with AMD, as I had an AMD K6 back in the day and was very happy with it, and I was lucky enough to go through three AMD CPUs on my current motherboard. I'd love to keep supporting them and, if I'm lucky, go through another two or three CPUs on my next motherboard, as I hate the idea of having to change motherboards every time I upgrade my CPU.
 
Incorrect... the L1 instruction cache is shared by one module (2 cores), the L2 is shared by two modules, and the L3 is shared by all modules...

L2 is shared between two cores within one module.

And yes, JF-AMD indeed is director of product marketing for servers at AMD. Waiting for W1zzard to give him his title. He may have known details about Orochi months before anyone else did.
 
L2 is shared between two cores within one module.

And yes, JF-AMD indeed is director of product marketing for servers at AMD. Waiting for W1zzard to give him his title. He may have known details about Orochi months before anyone else did.

So it's basically a Core 2 Duo?
 
I can't wait to see the numbers. I also hope that the new architecture still overclocks well. Competition at the top end would be killer. I want my $1000 chips to either become $500 chips, or become twice as fast for the $1000.
 
Oh boy... Here we go.
they are trying to fix the single-thread performance hit due to the smaller L1 data/instruction caches.
As if they would have had any problem slapping in L1s equally sized or larger than Hammer's... It's not like this is AMD's first CPU architecture ever, or that adding such an amount would be of any die-area concern. And for comparison, Nehalem has 32 kB per core, 16 kB per thread AND a tiny 256 kB L2; I bet Intel must be struggling with a similar performance hit.
each core "only" had 8 kB of L1 data
Err... No.
Each Bulldozer module has two sets of integer pipelines, and both of them have a dedicated 16 kB L1D. 16+16 kB in total per module, 16 kB per thread.
while the instruction cache is shared per module, which is only 64 kB "2-way" (could have been less... I think...)
Bulldozer's L1I is 64kB, that's been public for some time now. About the bracketed comment; you think it could have been smaller, or you aren't sure what size it is?
which is roughly 40 kB per core, compared to Core's 64 kB per core. Big disadvantage.
If you say so...
so all they can do is add more L3 cache to increase performance (...) the same thing Intel did when it realized Northwood's poor L1 cache would drag down performance, so it increased the L2 cache from 256 kB to 512 kB.
And by coincidence, Intel is doing the same. "Obviously" they too must be patching Core m-arch's "poor L1s and L2s" by adding cache levels and continuously increasing their size.
however orochi is 8 module 16 core processor
No. Orochi is a 4-module, 8-thread core.
so featuring 16 MB of L3 meant each core can use up to 1 MB of L3. Still way below Nehalem's 2 MB per core.
Durrr...
Bulldozer does not have a 16 MB L3; even reading the thread title should give away that the L3 is 8 MB. 2 MB L2 + 2 MB L3 per module, that is. Thus, per module, Orochi has 8× as much L2 vs. Nehalem and an equal L3 ratio.
also, unlike Intel's architecture, AMD's cache is heavily determined by the pipeline stages.
Strange conclusion, considering the public (that includes me and you) doesn't know Bulldozer's exact pipeline length yet.
lower stage pipeline won't take advantage on bigger cache. but since bulldozer will featuring 4+ghz i doubt this will be at least 20+ stage pipeline in this processor.
Broken sentence. What are you trying to say?
You do believe it is 20+ stage or you do not?
Also, the clock rates are completely unknown to public.
but despite all these features, if Intel decides to increase Ivy Bridge's L2 cache from 256 kB per core to 512 kB per core, AMD will experience the same horror they faced when Core 2 came out.
Oh really? Now one can only wonder why Intel didn't see such a shortcoming of their L2 before taping out Nehalem and Sandy Bridge... They must have missed the fact that their chips' L2 had shrunk to a fraction of the size compared to Conroe and Penryn.

PS.
In case you find some parts of my reply sarcastic, it is highly likely you are right.

Abstract for those with the "TL;DR" -syndrome:
Burger, please get your facts straight. The factual errors I've pointed out are public knowledge; go read up on them. And please do pay attention to writing proper English; often it is impossible to figure out what you're trying to say, as many of your sentences are missing words and the words that are there are often misspelled.
 
largon, save your breath; he even argued with the AMD guy and called his info false, lol
JF-AMD, thank you for your contribution to the thread.

I'm really looking forward to Bulldozer and I hope it succeeds, both in Server and Desktop markets :)
 
Durrr...
Bulldozer does not have a 16 MB L3; even reading the thread title should give away that the L3 is 8 MB. 2 MB L2 + 2 MB L3 per module, that is. Thus, per module, Orochi has 8× as much L2 vs. Nehalem and an equal L3 ratio.

Sorry largon, but it's 2 MB L2 per module, 8 MB L3 shared between all four modules. There is no L3 cache at the sub-modular level. Hence the total cache is 16 MB (AMD denotes total L2 + L3 as "total cache").
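The totals in that correction can be sanity-checked with a quick sketch; note the 2 MB-L2-per-module and 8 MB-shared-L3 figures are this thread's speculation, not AMD-confirmed numbers:

```python
# Orochi cache totals as described in this thread (speculative, not AMD-confirmed)
modules = 4
l2_per_module_mb = 2   # private to each module
l3_shared_mb = 8       # one pool shared by all four modules

total_l2_mb = modules * l2_per_module_mb      # 8 MB of L2
total_cache_mb = total_l2_mb + l3_shared_mb   # 16 MB "total cache" (L2 + L3)

print(total_l2_mb, total_cache_mb)  # → 8 16
```

Which is exactly how a 16 MB headline figure coexists with an 8 MB L3.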
 
Sorry largon, but it's 2 MB L2 per module, 8 MB L3 shared between all four modules. There is no L3 cache at the sub-modular level. Hence the total cache is 16 MB (AMD denotes total L2 + L3 as "total cache").
You're misinterpreting me. My "2MB L3 per module" is only a way to state a ratio, not actual configuration.


the problem is that the 64 kB L1 instruction cache and the L2 cache are uncore. That is a huge difference. It will make each Bulldozer core theoretically have only 8 kB of L1 cache with no L2 cache built in.
What?
That's just not true. Bulldozer's L1I and L2 are fully integrated parts of the BD module and they run at core freq, and no less.
they need a larger L1 cache because their L1 cache is way slower than Intel's cache.
Bulldozer has 4T L1 latency, same as Nehalem's.
and now their L1 cache on each core is only 8 kB. It is hard to imagine they can outperform any Intel line...
Especially if the one "imagining things" is using incorrect numbers...
instruction prediction, the same thing Intel did a long time ago back in the NetBurst days. Such a feature only works when you have a ridiculous number of pipeline stages and a trace cache.
What can I say; once again you astound (but do not surprise) me by posting utter nonsense.
but despite everything they had done with it, it still ended up performing pathetically in every benchmark
Feeling particularly "blue", perhaps? And by saying that I'm not referring to mood.

But what can you do, a troll is a troll is a troll.
 
I have a question that JF-AMD may not know the answer to, since he is in the server section, but I want to ask: will AMD offer 6-core (3-module) or 4-core (2-module) products at a lower price in the future?

Or will the variation be in the clock rates of the Orochi design?
 
Will AMD improve the southbridge, hard drive performance, and such?
Your NBs are quite good.

Second, will these be so different compared to K8/K10/K10.5 that vMotion won't work from K10.5 -> Bulldozer?

If we still can, I'll be praising AMD for my servers for a few more years! :P
 
Will AMD improve the southbridge, hard drive performance, and such?

There's nothing particularly bad with AMD's storage performance with a proper mode (AHCI or RAID) and proper driver (AMD over Microsoft) installed. The RAID controller sucked only till SB600 southbridge (which had a Silicon Image logic that wasn't implemented so well). SB700/SB710/SB750 is on par with ICH10/R, SB850 has no match (SATA 6 Gb/s).
 
There's nothing particularly bad with AMD's storage performance with a proper mode (AHCI or RAID) and proper driver (AMD over Microsoft) installed. The RAID controller sucked only till SB600 southbridge (which had a Silicon Image logic that wasn't implemented so well). SB700/SB710/SB750 is on par with ICH10/R, SB850 has no match (SATA 6 Gb/s).

Still not up there. I wonder why an SSD scores 7.3 with my SB750 but 7.5 with my ICH10/R in Windows.
Why does it have about 10 MB/s more sequential throughput, better 4K, 512, and so on? It's not by much.
But it's getting beaten by both NVIDIA and Intel.

http://www.tomshardware.com/reviews/ich10r-sb750-780a,2374-10.html
I just googled a bit to find some review. Never trusted Tom's too much, but yeah :p

It's not like I'm banging my head against the wall over my SSD performance, it's just that there's more to get here!
 
Those are access times (in the URL you posted). The lower the better. You can see how SB750 and ICH10R are on par in most access-time tests. Anyway, 7.3 to 7.5 is a big deviation in WPI, but maybe other factors were at play (for example, you may have tested the ICH10R system on a clean(er) installation than the SB750 system).
 
Windows numbers are inaccurate. In my build, a Western Digital 640 GB on a Gigabyte 790XT-UD4P was scoring 5.9 on the IDE interface. After a format I changed IDE to AHCI, and it now scores 7.5. I don't know why; I ran the test many times and got the same result. (By the way, SB750 is the southbridge.)

Now I want to ask JF-AMD another question, related to the previous one. On his blog he mentions that 33% more cores yield 50% more performance. The comparison was between Magny-Cours (12-core) and Interlagos (16-core, Bulldozer architecture). Will client processors see the same performance increase? Since going from 6 to 8 cores is also nearly a 33% increase, should we expect a 50% performance jump from Phenom II? And if it happens, will it come with an equal increase in price?
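The extrapolation behind that question can be written out explicitly. This is only the naive arithmetic the question relies on, not a confirmed projection for client parts:

```python
# Server data point cited above: Magny-Cours (12 cores) -> Interlagos (16 cores)
# reportedly gives +50% throughput from +33% more cores.
server_core_ratio = 16 / 12                            # ≈ 1.33x, i.e. +33% cores
server_perf_ratio = 1.50                               # 1.50x throughput (claimed)
per_core_gain = server_perf_ratio / server_core_ratio  # ≈ 1.125x per-core uplift

# Naively applying the same per-core uplift to a 6 -> 8 core client jump
# (pure speculation; server throughput scaling may not transfer to client loads):
client_core_ratio = 8 / 6                              # also ≈ +33% cores
naive_client_ratio = client_core_ratio * per_core_gain # ≈ 1.50x, i.e. ~+50%

print(f"per-core uplift ≈ {per_core_gain:.3f}x, "
      f"naive client gain ≈ {naive_client_ratio - 1:.0%}")
```

The symmetry (both jumps add ~33% cores) is what makes the question tempting, even though the workloads differ.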
 
Again, the client processors will perform differently. Client systems will use a lower number of DIMMs and usually lower-latency memory (DDR3 servers use failsafe 1066 MHz @ 9-9-9-24T settings as a standard). Client processors have 3 of the 4 HT links disabled, etc., etc. So a server-to-client comparison isn't apples-to-apples.
 
Again, the client processors will perform differently. Client systems will use a lower number of DIMMs and usually lower-latency memory (servers use failsafe 1066 MHz @ 9-9-9-24T). Client processors have 3 of the 4 HT links disabled, etc., etc. So a server-to-client comparison isn't apples-to-apples.


I totally agree with you :). But imagine (and this is speculation) a performance jump of up to 40%. It would match or even outperform Sandy Bridge. If that happens, what will the prices be? I hope they won't increase prices the way Intel does.
 
Folks, all we have disclosed in public about cache is the L1 size (that I posted earlier.)

We have not disclosed L2 or L3 sizes, so whatever you quote is not confirmed, only speculation.

L1 is within the core. L2 is within the module. L3 is within the die.
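JF-AMD's three sharing levels can be expressed as a small sketch. The module/die counts below match the 4-module Orochi discussed in this thread, but the mapping itself and the helper name are my own illustration:

```python
# Sharing levels per JF-AMD: L1 is within the core, L2 within the module,
# L3 within the die. Counts assume a 4-module, 2-cores-per-module Orochi die.
CORES_PER_MODULE = 2
MODULES_PER_DIE = 4

SCOPE_TO_CORES = {
    "core": 1,                                  # private to one core (e.g. L1D)
    "module": CORES_PER_MODULE,                 # shared by a module's two cores
    "die": CORES_PER_MODULE * MODULES_PER_DIE,  # shared by every core on the die
}

def cores_sharing(scope):
    """Number of cores that contend for one instance of a cache at this scope."""
    return SCOPE_TO_CORES[scope]

for level, scope in [("L1", "core"), ("L2", "module"), ("L3", "die")]:
    print(f"{level}: shared by {cores_sharing(scope)} core(s)")
```

So the contention grows from 1 core at L1, to 2 at L2, to all 8 at L3, which is the whole point of the per-level sharing JF-AMD describes.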
 
I totally agree with you :). But imagine (and this is speculation) a performance jump of up to 40%. It would match or even outperform Sandy Bridge. If that happens, what will the prices be?

If AMD has a faster processor architecture, it will ask whatever it wants to. It's a corporation.

Just as Intel asks $999 for its Extreme Edition SKUs, AMD used to ask for the same $999 for its FX SKUs (back when K8 was the best client CPU architecture out there). Even today AMD can try to ask for more than $275, if it wants to develop the QuadFX platform. Enthusiasts always have $999 to spend on one Core i7 or two DSDC-capable Phenom II chips in the s1207 package. It's just that AMD's client CPU team has to wake up to that realization. Power and board costs are lame excuses.
 
Windows numbers are inaccurate. In my build, a Western Digital 640 GB on a Gigabyte 790XT-UD4P was scoring 5.9 on the IDE interface. After a format I changed IDE to AHCI, and it now scores 7.5. I don't know why; I ran the test many times and got the same result. (By the way, SB750 is the southbridge.)

Now I want to ask JF-AMD another question, related to the previous one. On his blog he mentions that 33% more cores yield 50% more performance. The comparison was between Magny-Cours (12-core) and Interlagos (16-core, Bulldozer architecture). Will client processors see the same performance increase? Since going from 6 to 8 cores is also nearly a 33% increase, should we expect a 50% performance jump from Phenom II? And if it happens, will it come with an equal increase in price?

You can't do the math that way, but there will be a very good performance gain.

With servers you are measuring throughput, which is how much stuff you can jam through a pipe at full utilization. Client loads are more bursty, so throughput is a less relevant measure.
 
You can't do the math that way, but there will be a very good performance gain.

With servers you are measuring throughput, which is how much stuff you can jam through a pipe at full utilization. Client loads are more bursty, so throughput is a less relevant measure.


Thanks JF!
 