
AMD Orochi "Bulldozer" Die Holds 16 MB Cache

Incorrect... the L1 instruction cache is shared by one module (2 cores), the L2 is shared by two modules, and the L3 is shared by all modules...

And about the 8 kB L1 data cache... I remember seeing the spec from AnandTech three months ago. However, I found that Wikipedia listed a 16 kB L1 cache... I'd rather believe AnandTech's source...

You have 2 opinions to choose from:

1. A reporter who has never touched the product

or

2. The director of product marketing for servers at AMD


Choose carefully, there will be a test at the end of the class.
 
You have 2 opinions to choose from:

1. A reporter who has never touched the product

or

2. The director of product marketing for servers at AMD


Choose carefully, there will be a test at the end of the class.

Well put :toast:
 
L1 cache is not 8 kB. Check my blog in a week or so for the answer. There is an L1 instruction cache shared between two cores, L1 data per core, and L2 shared between two cores. L3 is shared at the die level.

Wow, that's some aggressive increase and management of cache. I hope performance improves; AMD's success is good for us, it might bring prices down and performance up.
 
I hope performance improves; AMD's success is good for us, it might bring prices down and performance up.

I agree. Luckily I can wait until after Bulldozer is out to make my next big upgrade, so if it's a great CPU / good value I'll of course go for it, and if not, I'd hope Intel's current generation will have seen a price drop by then.

Although I admit I'd prefer to go with AMD, as I had an AMD K6 back in the day and was very happy with it, and I was lucky enough to go through three AMD CPUs on my current motherboard. I'd love to keep supporting them and, if I'm lucky, go through another two or three CPUs on my next motherboard, as I hate the idea of having to change motherboards every time I upgrade my CPU.
 
Incorrect... the L1 instruction cache is shared by one module (2 cores), the L2 is shared by two modules, and the L3 is shared by all modules...

L2 is shared between two cores within one module.

And yes, JF-AMD indeed is director of product marketing for servers at AMD. Waiting for W1zzard to give him his title. He may have known details about Orochi months before anyone else did.
 
L2 is shared between two cores within one module.

And yes, JF-AMD indeed is director of product marketing for servers at AMD. Waiting for W1zzard to give him his title. He may have known details about Orochi months before anyone else did.

So it's basically a Core 2 Duo?
 
I can't wait to see the numbers. I also hope that the new architecture still overclocks well. Competition at the top end would be killer. I want my $1000 chips to either become $500 chips, or become twice as fast for the $1000.
 
Oh boy... Here we go.
they are trying to fix the single-thread performance hit due to the smaller L1 data/instruction caches.
As if they would have had any problem slapping in L1s equally sized or larger than Hammer's... It's not like this is AMD's first CPU architecture ever, or that adding such an amount would be of any die-area concern. And for comparison, Nehalem has 32 kB per core, 16 kB per thread AND a tiny 256 kB L2; I bet Intel must be struggling with a similar performance hit.
each core "only" had 8 kB of L1 data
Err... No.
Each Bulldozer module has two sets of integer pipelines, and both of them have a dedicated 16 kB L1D. 16+16 kB in total per module, 16 kB per thread.
while the instruction cache is shared per module, which is only 64 kB "2-way" (could have been less... I think...)
Bulldozer's L1I is 64kB, that's been public for some time now. About the bracketed comment; you think it could have been smaller, or you aren't sure what size it is?
which is roughly 40 kB per core, compared to Core's 64 kB per core. Big disadvantage.
If you say so...
so all they can do is add more L3 cache to increase performance (...) the same thing Intel did when it realized Northwood's poor L1 cache would drag down performance, so it increased the L2 cache from 256 kB to 512 kB.
And by coincidence, Intel is doing the same. "Obviously" they too must be patching Core m-arch's "poor L1s and L2s" by adding cache levels and continuously increasing their size.
however orochi is 8 module 16 core processor
No. Orochi is a 4-module, 8-thread core.
so featuring 16 MB of L3 meant each core can use up to 1 MB of L3. Still way below Nehalem's 2 MB per core.
Durrr...
Bulldozer does not have a 16 MB L3; even reading the thread title should give away that the L3 is 8 MB. 2 MB L2 + 2 MB L3 per module, that is. Thus, per module, Orochi has 8× as much L2 vs. Nehalem and an equal L3 ratio.
also, unlike Intel's architecture, AMD's cache is heavily determined by the pipeline stages.
Strange conclusion, considering the public (that includes me and you) doesn't know Bulldozer's exact pipeline length yet.
lower stage pipeline won't take advantage on bigger cache. but since bulldozer will featuring 4+ghz i doubt this will be at least 20+ stage pipeline in this processor.
Broken sentence. What are you trying to say?
You do believe it is 20+ stage or you do not?
Also, the clock rates are completely unknown to public.
but despite all these features, if Intel decides to increase Ivy Bridge's L2 cache from 256 kB per core to 512 kB per core, AMD will experience the same horror they faced when Core 2 came out.
Oh really? Now one can only wonder why Intel didn't see such a shortcoming of their L2 before taping out Nehalem and Sandy Bridge... They must have missed the fact that their chips' L2 had shrunk to a fraction of the size compared to Conroe and Penryn.

PS.
In case you find some parts of my reply sarcastic, it is highly likely you are right.

Abstract for those with the "TL;DR" -syndrome:
Burger, please get your facts straight. The factual errors I've pointed out are public knowledge; go read up on them. And please do pay attention to writing proper English; often it is impossible to figure out what you're trying to say, as many of your sentences are missing words and the words that are there are often misspelled.
 
largon, save your breath; he even argued with the AMD guy and called his info false, lol
JF-AMD, thank you for your contribution to the thread.

I'm really looking forward to Bulldozer and I hope it succeeds, both in Server and Desktop markets :)
 
Durrr...
Bulldozer does not have a 16 MB L3; even reading the thread title should give away that the L3 is 8 MB. 2 MB L2 + 2 MB L3 per module, that is. Thus, per module, Orochi has 8× as much L2 vs. Nehalem and an equal L3 ratio.

Sorry largon, but it's 2 MB L2 per module, 8 MB L3 shared between all four modules. There is no L3 cache at the sub-modular level. Hence the total cache is 16 MB (AMD denotes total L2 + L3 as "total cache").
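The totals in that correction can be sanity-checked with a quick sketch; note the 2 MB-L2-per-module and 8 MB-shared-L3 figures are this thread's speculation, not AMD-confirmed numbers:

```python
# Orochi cache totals as described in this thread (speculative, not AMD-confirmed)
modules = 4
l2_per_module_mb = 2   # private to each module
l3_shared_mb = 8       # one pool shared by all four modules

total_l2_mb = modules * l2_per_module_mb      # 8 MB of L2
total_cache_mb = total_l2_mb + l3_shared_mb   # 16 MB "total cache" (L2 + L3)

print(total_l2_mb, total_cache_mb)  # → 8 16
```

Which is exactly how a 16 MB headline figure coexists with an 8 MB L3.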
 
Sorry largon, but it's 2 MB L2 per module, 8 MB L3 shared between all four modules. There is no L3 cache at the sub-modular level. Hence the total cache is 16 MB (AMD denotes total L2 + L3 as "total cache").
You're misinterpreting me. My "2MB L3 per module" is only a way to state a ratio, not actual configuration.


the problem is that the 64 kB L1 instruction cache and the L2 cache are uncore. That is a huge difference. It will make each Bulldozer core theoretically have only 8 kB of L1 cache with no L2 cache built in.
What?
That's just not true. Bulldozer's L1I and L2 are fully integrated parts of the BD module and they run at core freq, and no less.
they need a larger L1 cache because their L1 cache is way slower than Intel's cache.
Bulldozer has 4T L1 latency, same as Nehalem's.
and now their L1 cache on each core is only 8 kB. It is hard to imagine they can outperform any Intel line...
Especially if the one "imagining things" is using incorrect numbers...
instruction prediction, the same thing Intel did a long time ago back in the NetBurst days. Such a feature only works when you have a ridiculous number of pipeline stages and a trace cache.
What can I say; once again you astound (but do not surprise) me by posting utter nonsense.
but despite everything they had done with it, it still ended up performing pathetically in every benchmark
Feeling particularly "blue", perhaps? And by saying that I'm not referring to mood.

But what can you do, a troll is a troll is a troll.
 
I have a question that JF-AMD may not know the answer to, since he is in the server section, but I want to ask: will AMD offer 6-core (3-module) or 4-core (2-module) products at a lower price in the future?

Or will the variation be in the clock rates of the Orochi design?
 
Will AMD improve the southbridge, hard drive performance, and such?
Your NBs are quite good.

Second, will these be so different compared to K8/K10/K10.5 that vMotion won't work from K10.5 -> Bulldozer?

If we still can, I'll be praising AMD for my servers for a few more years! :P
 
Will AMD improve the southbridge, hard drive performance, and such?

There's nothing particularly bad with AMD's storage performance with a proper mode (AHCI or RAID) and proper driver (AMD over Microsoft) installed. The RAID controller sucked only till SB600 southbridge (which had a Silicon Image logic that wasn't implemented so well). SB700/SB710/SB750 is on par with ICH10/R, SB850 has no match (SATA 6 Gb/s).
 
There's nothing particularly bad with AMD's storage performance with a proper mode (AHCI or RAID) and proper driver (AMD over Microsoft) installed. The RAID controller sucked only till SB600 southbridge (which had a Silicon Image logic that wasn't implemented so well). SB700/SB710/SB750 is on par with ICH10/R, SB850 has no match (SATA 6 Gb/s).

Still not up there. I wonder why an SSD scores 7.3 with my SB750 but 7.5 with my ICH10/R in Windows.
Why does it have about 10 MB/s more sequential throughput, better 4K, 512, and so on? It's not by much.
But it's getting beaten by both NVIDIA and Intel.

http://www.tomshardware.com/reviews/ich10r-sb750-780a,2374-10.html
I just googled a bit to find some review. Never trusted Tom's too much, but yeah :p

It's not like I'm banging my head against the wall over my SSD performance, it's just that there's more to get here!
 
Those are access times (in the URL you posted). The lower the better. You can see how SB750 and ICH10R are on par in most access-time tests. Anyway, 7.3 to 7.5 is a big deviation in WPI, but maybe other factors were at play (for example, you may have tested the ICH10R system on a clean(er) installation than the SB750 system).
 
Windows numbers are inaccurate. In my build, a Western Digital 640 GB on a Gigabyte 790XT-UD4P was scoring 5.9 on the IDE interface. After a format I changed IDE to AHCI, and it now scores 7.5. I don't know why; I ran the test many times and got the same result. (By the way, SB750 is the southbridge.)

Now I want to ask JF-AMD another question, related to the previous one. On his blog he mentions that 33% more cores yield 50% more performance. The comparison was between Magny-Cours (12-core) and Interlagos (16-core, Bulldozer architecture). Will client processors see the same performance increase? Since going from 6 to 8 cores is also nearly a 33% increase, should we expect a 50% performance jump from Phenom II? And if it happens, will it come with an equal increase in price?
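The extrapolation behind that question can be written out explicitly. This is only the naive arithmetic the question relies on, not a confirmed projection for client parts:

```python
# Server data point cited above: Magny-Cours (12 cores) -> Interlagos (16 cores)
# reportedly gives +50% throughput from +33% more cores.
server_core_ratio = 16 / 12                            # ≈ 1.33x, i.e. +33% cores
server_perf_ratio = 1.50                               # 1.50x throughput (claimed)
per_core_gain = server_perf_ratio / server_core_ratio  # ≈ 1.125x per-core uplift

# Naively applying the same per-core uplift to a 6 -> 8 core client jump
# (pure speculation; server throughput scaling may not transfer to client loads):
client_core_ratio = 8 / 6                              # also ≈ +33% cores
naive_client_ratio = client_core_ratio * per_core_gain # ≈ 1.50x, i.e. ~+50%

print(f"per-core uplift ≈ {per_core_gain:.3f}x, "
      f"naive client gain ≈ {naive_client_ratio - 1:.0%}")
```

The symmetry (both jumps add ~33% cores) is what makes the question tempting, even though the workloads differ.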
 
Again, the client processors will perform differently. Client systems will use a lower number of DIMMs and usually lower-latency memory (DDR3 servers use failsafe 1066 MHz @ 9-9-9-24T settings as a standard). Client processors have 3 of the 4 HT links disabled, etc., etc. So a server-to-client comparison isn't apples-to-apples.
 
Again, the client processors will perform differently. Client systems will use a lower number of DIMMs and usually lower-latency memory (servers use failsafe 1066 MHz @ 9-9-9-24T). Client processors have 3 of the 4 HT links disabled, etc., etc. So a server-to-client comparison isn't apples-to-apples.


I totally agree with you :). But imagine (and this is speculation) a performance jump of up to 40%. It would match or even outperform Sandy Bridge. If that happens, what will the prices be? I hope they won't increase prices the way Intel does.
 
Folks, all we have disclosed in public about cache is the L1 size (that I posted earlier.)

We have not disclosed L2 or L3 sizes, so whatever you quote is not confirmed, only speculation.

L1 is within the core. L2 is within the module. L3 is within the die.
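JF-AMD's three sharing levels can be expressed as a small sketch. The module/die counts below match the 4-module Orochi discussed in this thread, but the mapping itself and the helper name are my own illustration:

```python
# Sharing levels per JF-AMD: L1 is within the core, L2 within the module,
# L3 within the die. Counts assume a 4-module, 2-cores-per-module Orochi die.
CORES_PER_MODULE = 2
MODULES_PER_DIE = 4

SCOPE_TO_CORES = {
    "core": 1,                                  # private to one core (e.g. L1D)
    "module": CORES_PER_MODULE,                 # shared by a module's two cores
    "die": CORES_PER_MODULE * MODULES_PER_DIE,  # shared by every core on the die
}

def cores_sharing(scope):
    """Number of cores that contend for one instance of a cache at this scope."""
    return SCOPE_TO_CORES[scope]

for level, scope in [("L1", "core"), ("L2", "module"), ("L3", "die")]:
    print(f"{level}: shared by {cores_sharing(scope)} core(s)")
```

So the contention grows from 1 core at L1, to 2 at L2, to all 8 at L3, which is the whole point of the per-level sharing JF-AMD describes.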
 
I totally agree with you :). But imagine (and this is speculation) a performance jump of up to 40%. It would match or even outperform Sandy Bridge. If that happens, what will the prices be?

If AMD has a faster processor architecture, it will ask whatever it wants to. It's a corporation.

Just as Intel asks $999 for its Extreme Edition SKUs, AMD used to ask for the same $999 for its FX SKUs (back when K8 was the best client CPU architecture out there). Even today AMD can try to ask for more than $275, if it wants to develop the QuadFX platform. Enthusiasts always have $999 to spend on one Core i7 or two DSDC-capable Phenom II chips in the s1207 package. It's just that AMD's client CPU team has to wake up to that realization. Power and board costs are lame excuses.
 
Windows numbers are inaccurate. In my build, a Western Digital 640 GB on a Gigabyte 790XT-UD4P was scoring 5.9 on the IDE interface. After a format I changed IDE to AHCI, and it now scores 7.5. I don't know why; I ran the test many times and got the same result. (By the way, SB750 is the southbridge.)

Now I want to ask JF-AMD another question, related to the previous one. On his blog he mentions that 33% more cores yield 50% more performance. The comparison was between Magny-Cours (12-core) and Interlagos (16-core, Bulldozer architecture). Will client processors see the same performance increase? Since going from 6 to 8 cores is also nearly a 33% increase, should we expect a 50% performance jump from Phenom II? And if it happens, will it come with an equal increase in price?

You can't do the math that way, but there will be a very good performance gain.

With servers you are measuring throughput, which is how much stuff you can jam through a pipe at full utilization. Client loads are more bursty, so throughput is a less relevant measure.
 
You can't do the math that way, but there will be a very good performance gain.

With servers you are measuring throughput, which is how much stuff you can jam through a pipe at full utilization. Client loads are more bursty, so throughput is a less relevant measure.


Thanks JF!
 