
AMD Orochi ''Bulldozer'' Die Holds 16 MB Cache

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,209 (7.69/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Documents related to the "Orochi" 8-core processor by AMD, based on its next-generation Bulldozer architecture, reveal a cache hierarchy that comes as a bit of a surprise. Earlier this month, at a GlobalFoundries-hosted conference, AMD displayed the first die-shot of the Orochi die, which legibly showed key features including the four Bulldozer modules, which hold two cores each, and large L2 caches. On coarse visual inspection, the L2 cache of each module seems to cover 35% of its area. The L3 cache is located along the center of the die. The documents seen by X-bit Labs reveal that each Bulldozer module has its own 2 MB L2 cache shared between its two cores, and that an 8 MB L3 cache is shared between all four modules (8 cores).

This takes the total cache amount of Orochi all the way up to 16 MB. This hierarchy suggests that AMD wants to give individual cores access to a large amount of faster cache (that's a whopping 2048 KB compared to 512 KB per core on Phenom, and 256 KB per core on Core i7), which facilitates faster inter-core, intra-module communication. Inter-module communication is enhanced by the 8 MB L3 cache. Compared to the current "Istanbul" six-core K10-based die, that's a 77% increase in total cache for a 33% increase in core count, and a 300% increase in the L2 cache available to a core. Orochi is built on a 32 nm GlobalFoundries process; it is sure to have a very high transistor count.
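
The cache arithmetic in the article can be checked with a short sketch. The Orochi figures come from the text above; the Istanbul figures (six cores with 512 KB of L2 each plus a 6 MB shared L3) are assumed here, since the article does not spell them out, and the function names are made up for illustration.

```python
KB_PER_MB = 1024


def total_cache_mb(l2_blocks, l2_kb_each, l3_mb):
    """Return combined L2 + L3 capacity in MB."""
    return l2_blocks * l2_kb_each / KB_PER_MB + l3_mb


def pct_increase(new, old):
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


# Orochi: 4 modules x 2 MB L2, plus 8 MB L3 (figures from the article).
orochi_mb = total_cache_mb(4, 2048, 8)

# Istanbul: 6 cores x 512 KB L2, plus 6 MB L3 (assumed, not stated in the article).
istanbul_mb = total_cache_mb(6, 512, 6)

print("Orochi total cache (MB):", orochi_mb)
print("Istanbul total cache (MB):", istanbul_mb)
print("Total cache increase (%):", pct_increase(orochi_mb, istanbul_mb))
print("Core count increase (%):", pct_increase(8, 6))
# Counts the whole 2048 KB module L2 as usable by a single core.
print("L2 per core increase (%):", pct_increase(2048, 512))
```

Note that the 300% figure counts a module's full 2 MB L2 as available to one core. Splitting it evenly between the module's two cores gives 1 MB per core, which is a 100% increase over 512 KB.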

 
Joined
Apr 17, 2010
Messages
133 (0.03/day)
Location
Hobart, Australia
System Name B82REZ-2010
Processor AMD 1090T at 4ghz - 1.425v
Motherboard ASrock 890GX Extreme 3
Cooling Corsair H50 - Push/Pull with Noctua NF-P12's Connected Via Zalman Fan Controller
Memory 8GB Corsair Dominator's 1600mhz (Dual Memory Config)
Video Card(s) Asus 5850
Storage 60GB Corsair SSD, 2x 1TB, 1x 750GB, 5x 2TB
Display(s) 23" Full HD LG LCD, 19" Acer LCD
Case Cooler Master HAF X
Audio Device(s) Onboard
Power Supply Corsair 850HX Modular PSU
Software Windows 7 Ultimate 64-Bit
BL GG Intel Fanboys. AMD is back! :nutkick:
 
Joined
Oct 5, 2008
Messages
1,802 (0.32/day)
Location
ATL, GA
System Name My Rig
Processor AMD 3950X
Motherboard X570 TUFF GAMING PLUS
Cooling EKWB Custom Loop, Lian Li 011 G1 distroplate/DDC 3.1 combo
Memory 4x16GB Corsair DDR4-3466
Video Card(s) MSI Seahawk 2080 Ti EKWB block
Storage 2TB Auros NVMe Drive
Display(s) Asus P27UQ
Case Lian Li 011-Dynamic XL
Audio Device(s) JBL 30X
Power Supply Seasonic Titanium 1000W
Mouse Razer Lancehead
Keyboard Razer Widow Maker Keyboard
Software Window's 10 Pro
I'll believe it's a performance gain when I see the benchmarks. Regardless of which side you take, competition is always good for the consumer.
 
Joined
Sep 25, 2007
Messages
5,965 (0.99/day)
Location
New York
Processor AMD Ryzen 9 5950x, Ryzen 9 5980HX
Motherboard MSI X570 Tomahawk
Cooling Be Quiet Dark Rock Pro 4(With Noctua Fans)
Memory 32Gb Crucial 3600 Ballistix
Video Card(s) Gigabyte RTX 3080, Asus 6800M
Storage Adata SX8200 1TB NVME/WD Black 1TB NVME
Display(s) Dell 27 Inch 165Hz
Case Phanteks P500A
Audio Device(s) IFI Zen Dac/JDS Labs Atom+/SMSL Amp+Rivers Audio
Power Supply Corsair RM850x
Mouse Logitech G502 SE Hero
Keyboard Corsair K70 RGB Mk.2
VR HMD Samsung Odyssey Plus
Software Windows 10
Wait for benchmarks before you start that; we've been through this before with AMD.
 

wolf

Performance Enthusiast
Joined
May 7, 2007
Messages
7,703 (1.25/day)
System Name MightyX
Processor Ryzen 5800X3D
Motherboard Gigabyte X570 I Aorus Pro WiFi
Cooling Scythe Fuma 2
Memory 32GB DDR4 3600 CL16
Video Card(s) Asus TUF RTX3080 Deshrouded
Storage WD Black SN850X 2TB
Display(s) LG 42C2 4K OLED
Case Coolermaster NR200P
Audio Device(s) LG SN5Y / Focal Clear
Power Supply Corsair SF750 Platinum
Mouse Corsair Dark Core RBG Pro SE
Keyboard Glorious GMMK Compact w/pudding
VR HMD Meta Quest 3
Software case populated with Artic P12's
Benchmark Scores 4k120 OLED Gsync bliss

ebolamonkey3

New Member
Joined
Apr 9, 2010
Messages
773 (0.15/day)
Location
Atlanta/Marietta, GA
System Name Norbert
Processor Intel Core i7 920
Motherboard Gigabyte X58A-UD5
Cooling Corsair H50 with 2x Scythe GT AP-14
Memory 3x 2gb G.Skill 1600Mhz C9 DDR3
Video Card(s) MSI Twin Frozr II GTX 465 GE & EVGA GTS 450 SC
Storage 2x 1Tb Samsung Sprinpoint F3 7200rpm
Display(s) Dell U3011, Dell 2408WFP, Samsung 2693HM
Case Lian Li V1020R
Audio Device(s) Creative X-Fi Titanium
Power Supply Seasonic X-750
Software Windows 7 Ultimate 64bit
2011 is shaping up to be quite an interesting year :)
 

Completely Bonkers

New Member
Joined
Feb 6, 2007
Messages
2,576 (0.41/day)
Processor Mysterious Engineering Prototype
Motherboard Intel 865
Cooling Custom block made in workshop
Memory Corsair XMS 2GB
Video Card(s) FireGL X3-256
Display(s) 1600x1200 SyncMaster x 2 = 3200x1200
Software Windows 2003
I remember the "massive cache" Gallatin P4's over Northwood. Didn't make more than 5% difference clock for clock except in very special circumstances.

So let's wait for benchmarks.

I would have thought there would be better gains by rethinking cache and memory entirely, possibly producing a separate socket for L3 cache just like in the old days. It would be so much cheaper to do it that way, you could easily pack 256MB cache. Yes, the latency would be worse than current on-die L3 cache, but with the space, heat and transistors saved, you could bump up L1 and L2 cache and win back any performance losses. Plus you could build your L3 cache to order.
 

DaMulta

My stars went supernova
Joined
Aug 3, 2006
Messages
16,168 (2.51/day)
Location
Oklahoma T-Town
System Name Work in progress
Processor AMD 955---4Ghz
Motherboard MSi GD70
Cooling OcZ Phase/water
Memory Crucial2GB kit (1GBx2), Ballistix 240-pin DIMM, DDR3 PC3-16000
Video Card(s) CrossfireX 2 X HD 4890 1GB OCed to 1000Mhz
Storage SSD 64GB
Display(s) Envision 24'' 1920x1200
Case Using the desk ATM
Audio Device(s) Sucky onboard for now :(
Power Supply 1000W TruePower Quattro
That's it????? I wait for the day with 16 cores with 64MB of Cache
 
Joined
Sep 1, 2009
Messages
1,169 (0.22/day)
Location
CO
System Name 4k
Processor AMD 5800x3D
Motherboard MSI MAG b550m Mortar Wifi
Cooling Corsair H100i
Memory 4x8Gb Crucial Ballistix 3600 CL16 bl8g36c16u4b.m8fe1
Video Card(s) Nvidia Reference 3080Ti
Storage ADATA XPG SX8200 Pro 1TB
Display(s) LG 48" C1
Case CORSAIR Carbide AIR 240 Micro-ATX
Audio Device(s) Asus Xonar STX
Power Supply EVGA SuperNOVA 650W
Software Microsoft Windows10 Pro x64
Well, it seems Bulldozer is going to be faster when communicating with memory and other cores. I think if AMD just did that to a Phenom II chip it would speed it up significantly. I really can't wait to see Bulldozer in action.
 

bear jesus

New Member
Joined
Aug 12, 2010
Messages
1,534 (0.31/day)
Location
Britland
System Name Gaming temp// HTPC
Processor AMD A6 5400k // A4 5300
Motherboard ASRock FM2A75 PRO4// ASRock FM2A55M-DGS
Cooling Xigmatek HDT-D1284 // stock phenom II HSF
Memory 4GB 1600mhz corsair vengeance // 4GB 1600mhz corsair vengeance low profile
Storage 64gb sandisk pulse SSD and 500gb HDD // 500gb HDD
Display(s) acer 22" 1680x1050
Power Supply Seasonic G-450 // Corsair CXM 430W
I would hope more, faster cache is a good thing, but the main thing I'm interested in is how each module performs. I'm seriously thinking about getting a high-end Sandy Bridge or Bulldozer to last me a couple of years or so, which means I want as many cores, and as fast, as possible, since I would hope more software will use more cores over the next few years.
 

Rebelstar

New Member
Joined
Sep 3, 2010
Messages
71 (0.01/day)
Location
Minsk, Belarus
Display(s) custom thin bezel eyefinity 1x3 portrait
Case Cooler Master HAF 932
Power Supply CoolerMaster SilentPro M850
Software Windows 7 Enterprise x64
I'm a total noob in CPU technologies, but I think 16 MB of cache is freaking cool, right?
 
Joined
Jul 20, 2009
Messages
217 (0.04/day)
Processor Xeon E5 1650 V4
Motherboard MSI X99A SLI PLUS
Cooling HYPER 212 EVO
Memory 64gb DDR4 2133
Video Card(s) XFX RADEON RX 480 8GB
Storage Samsung PM951 512GB NVMe SSD
Display(s) LG 34" Ultrawide + AOC 27"
Power Supply EVGA 750 Watt
Mouse Logitech M280
Keyboard Dell SK-8135
it is sure to have a very high transistor count.

So does Fermi. I hope AMD has the TDP under control, otherwise Sandy Bridge will kick butt.
 

bear jesus

New Member
Joined
Aug 12, 2010
Messages
1,534 (0.31/day)
I'm a total noob in CPU technologies, but I think 16 MB of cache is freaking cool, right?

It could be if put to good use, but the cores are what really matter. Either way, we won't really know until the reviews.
 
Joined
Feb 17, 2007
Messages
1,238 (0.20/day)
Location
SoCal
Processor AMD Phenom II 1055T @ 3.6ghz 1.3V
Motherboard Asus M5A97 EVO
Cooling Xigmatek SD1284
Memory 2x4GB Patriot Sector 5 PC3-12800 @ 7-8-7-24-1T 1.7V
Video Card(s) XFX Radeon HD 7950 DD @ 1100/1350 1.185V
Storage OCZ Agility 3 120GB + 2x7200.12 500GB Raid1
Display(s) QNIX QX2710 27" LCD 1440p @ 120hz
Case Cooler Master 690M
Audio Device(s) Realtek ALC892
Power Supply Enermax Liberty 620W Eco Edition
Software Windows 7 Professional x64 / Ubuntu 12.04 x64
One design win I really commend AMD for is their use of dynamic cache allocation between the "cores" on a module. While many assume the sharing of cache (and other items like the FPU) will hurt single-threaded performance, that really isn't the case. When only one core is active per module, it has complete control over all the resources; thus a single core will have 2 MB of L2 cache at its disposal! Also, when both cores on a module are active, they can share the resources unevenly (i.e. one core with 0.5 MB of L2 and another with 1.5 MB of L2 is possible). Very cool technology.

For Bulldozer, there will be the option to have the OS prefer loading one core per module (like cores 1, 3, 5, 7) rather than just filling them up by modules (1, 2, 3, 4). Both have benefits and faults: the first route has higher performance, but also higher power consumption; the second would be the exact opposite.
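
The two placement policies described above can be sketched as follows. This is only an illustration of the ordering, not an actual OS scheduler interface; the function names and the 1-based core numbering (cores 1 and 2 in the first module, and so on) are assumptions that follow the example in this post.

```python
def placement_order(modules=4, cores_per_module=2, spread=True):
    """Return 1-based core IDs in the order threads would be placed.

    spread=True  -> one core per module first (1, 3, 5, 7, then 2, 4, 6, 8)
    spread=False -> fill each module before moving on (1, 2, 3, 4, ...)
    """
    # layout[m] lists the core IDs that belong to module m.
    layout = [
        [m * cores_per_module + c + 1 for c in range(cores_per_module)]
        for m in range(modules)
    ]
    if spread:
        # Take the first core of every module, then the second core of every module.
        return [module[c] for c in range(cores_per_module) for module in layout]
    # Walk the modules in order, using all cores of one before the next.
    return [core for module in layout for core in module]


def modules_in_use(order, threads, cores_per_module=2):
    """Count how many distinct modules the first `threads` cores occupy."""
    return len({(core - 1) // cores_per_module for core in order[:threads]})


spread = placement_order(spread=True)
packed = placement_order(spread=False)

print("spread order:", spread)
print("packed order:", packed)
print("modules busy with 4 threads (spread):", modules_in_use(spread, 4))
print("modules busy with 4 threads (packed):", modules_in_use(packed, 4))
```

With four threads, the spread order keeps all four modules busy, so each thread gets a module's shared resources to itself, while the packed order uses only two modules and leaves the other two idle, which is where the power saving would come from.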

As far as the sharing of the FPU goes, in servers it will make hardly any difference. In the desktop segment, AMD argues that if you are doing something that takes up enough FPU performance to slow down its modules, then you should be doing it on the GPU instead.
 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.63/day)
I like this news. I have been saying for a couple of years now that AMD's cache design needed to change, and here they are doing something about it. That makes me even more interested in Bulldozer tech.
 

bear jesus

New Member
Joined
Aug 12, 2010
Messages
1,534 (0.31/day)
One design win I really commend AMD for is their use of dynamic cache allocation between the "cores" on a module. While many assume the sharing of cache (and other items like the FPU) will hurt single-threaded performance, that really isn't the case. When only one core is active per module, it has complete control over all the resources; thus a single core will have 2 MB of L2 cache at its disposal! Also, when both cores on a module are active, they can share the resources unevenly (i.e. one core with 0.5 MB of L2 and another with 1.5 MB of L2 is possible). Very cool technology.

For Bulldozer, there will be the option to have the OS prefer loading one core per module (like cores 1, 3, 5, 7) rather than just filling them up by modules (1, 2, 3, 4). Both have benefits and faults: the first route has higher performance, but also higher power consumption; the second would be the exact opposite.

As far as the sharing of the FPU goes, in servers it will make hardly any difference. In the desktop segment, AMD argues that if you are doing something that takes up enough FPU performance to slow down its modules, then you should be doing it on the GPU instead.

I never knew it would be set up like that. It makes me even more sure I want to wait for Bulldozer for my next full upgrade: if it is a good CPU at a good price I can go for one, and if not I can get something from Sandy Bridge a little cheaper (hoping price drops come over the time I've waited and, if consumers are lucky, more price drops come with or after Bulldozer).
 

cheezburger

New Member
Joined
Sep 6, 2010
Messages
265 (0.05/day)
System Name no bases
Processor E8400/e5300/qx9770
Motherboard rampage formula/DG41TY/p5q DELUXE
Cooling stock DTC cooler&copper core
Memory titanium XTC DDR2 800 2gbx4/2gbx2/ballistix 2GBx4 DDR2-800
Video Card(s) evga gtx 460 oc/zotac 9600gt amp/evga gtx 580
Storage WD cavior black 2TB 16mb eSATA 2/500gb 16mb ATA133/ OCZSSD2-1ONX32G + samsung 320gb 8mb ESATA
Case cm 690/GZ-x2/antec qaudro 1200w
Power Supply antec quattro 1200w/zumax 500w v2/antec HCG 900w
Software windows server 2008 sp2/windows xp x64 pro sp2c/windows server 2008 sp1
No surprise. They are trying to fix the single-thread performance hit caused by the smaller L1 data/instruction caches. Each core "only" has 8 KB of L1 data cache, while the instruction cache is shared per module and is only 64 KB "2-way" (it could be less... I think...), which is roughly 40 KB per core compared to Core's 64 KB per core. Big disadvantage. So all they can do is add more L3 cache to raise performance, or at least hope not to lose performance, without tweaking too much of an existing architecture that has already taped out and is going to be released in 3 months. Intel did the same thing when it realized Northwood's poor L1 cache would drag down performance: it increased the L2 cache from 256 KB to 512 KB. However, Orochi is an 8-module, 16-core processor, so featuring 16 MB of L3 means each core can use up to 1 MB of L3, still way below Nehalem's 2 MB per core. Also, unlike Intel's architecture, AMD's cache is heavily determined by the number of pipeline stages; a shorter pipeline won't take advantage of a bigger cache. But since Bulldozer will feature 4+ GHz, I suspect there will be a pipeline of at least 20+ stages in this processor. Despite all these features, if Intel decides to increase Ivy Bridge's L2 cache from 256 KB per core to 512 KB per core, AMD will experience the same horror it faced when Core 2 came out.
 

HTC

Joined
Apr 1, 2008
Messages
4,595 (0.79/day)
Location
Portugal
System Name HTC's System
Processor Ryzen 5 2600X
Motherboard Asrock Taichi X370
Cooling NH-C14, with the AM4 mounting kit
Memory G.Skill Kit 16GB DDR4 F4 - 3200 C16D - 16 GTZB
Video Card(s) Sapphire Nitro+ Radeon RX 480 OC 4 GB
Storage 1 Samsung NVMe 960 EVO 250 GB + 1 3.5" Seagate IronWolf Pro 6TB 7200RPM 256MB SATA III
Display(s) LG 27UD58
Case Fractal Design Define R6 USB-C
Audio Device(s) Onboard
Power Supply Corsair TX 850M 80+ Gold
Mouse Razer Deathadder Elite
Software Ubuntu 19.04 LTS
I wonder how hot these CPUs will get ...
 

ROad86

New Member
Joined
Sep 24, 2010
Messages
21 (0.00/day)
Processor AMD Phenom II x4 B55
Motherboard Gigabyte MA790XT-UD4P
Cooling SilverStone Nitrogon NT06 Evolution+Noiseblocker BlackSilentPro
Memory Corsair XMS3 4GB
Video Card(s) Saphire Radeon 4870
Storage WD 640 Black + WD 500 Blue
Case Antec P193
Power Supply Corsair CMPSU-650TX
Software Win 7 Professional 64bit
No surprise. They are trying to fix the single-thread performance hit caused by the smaller L1 data/instruction caches. Each core "only" has 8 KB of L1 data cache, while the instruction cache is shared per module and is only 64 KB "2-way" (it could be less... I think...), which is roughly 40 KB per core compared to Core's 64 KB per core. Big disadvantage. So all they can do is add more L3 cache to raise performance, or at least hope not to lose performance, without tweaking too much of an existing architecture that has already taped out and is going to be released in 3 months. Intel did the same thing when it realized Northwood's poor L1 cache would drag down performance: it increased the L2 cache from 256 KB to 512 KB. However, Orochi is an 8-module, 16-core processor, so featuring 16 MB of L3 means each core can use up to 1 MB of L3, still way below Nehalem's 2 MB per core. Also, unlike Intel's architecture, AMD's cache is heavily determined by the number of pipeline stages; a shorter pipeline won't take advantage of a bigger cache. But since Bulldozer will feature 4+ GHz, I suspect there will be a pipeline of at least 20+ stages in this processor. Despite all these features, if Intel decides to increase Ivy Bridge's L2 cache from 256 KB per core to 512 KB per core, AMD will experience the same horror it faced when Core 2 came out.



First, Orochi is a 4-module, 8-core design. Second, what matters is not only the size of the cache but how fast it is. Third, how well instruction prediction works is very important; if the design is good, you don't need a big L1 cache, which increases cost and die size. And yes, 2 MB per module, or 1 MB per core, is the amount Bulldozer will have.
 
Joined
Dec 26, 2006
Messages
3,444 (0.55/day)
Location
Northern Ontario Canada
Processor Ryzen 5700x
Motherboard Gigabyte X570S Aero G R1.1 BiosF5g
Cooling Noctua NH-C12P SE14 w/ NF-A15 HS-PWM Fan 1500rpm
Memory Micron DDR4-3200 2x32GB D.S. D.R. (CT2K32G4DFD832A)
Video Card(s) AMD RX 6800 - Asus Tuf
Storage Kingston KC3000 1TB & 2TB & 4TB Corsair LPX
Display(s) LG 27UL550-W (27" 4k)
Case Be Quiet Pure Base 600 (no window)
Audio Device(s) Realtek ALC1220-VB
Power Supply SuperFlower Leadex V Gold Pro 850W ATX Ver2.52
Mouse Mionix Naos Pro
Keyboard Corsair Strafe with browns
Software W10 22H2 Pro x64
I want one, a server version with 8 or 16 GB of ecc ram :D I don't know why though since I don't even work 1 core on my 955BE
 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.63/day)
I wonder how hot these CPUs will get ...

Very hot... apparently we'll see a clockspeed decrease (which I assume is due to the high levels of cache), but IPC will increase. I'm kinda expecting 2.4 GHz or so... maybe lower... for launch chips.
 

bear jesus

New Member
Joined
Aug 12, 2010
Messages
1,534 (0.31/day)
Very hot... apparently we'll see a clockspeed decrease (which I assume is due to the high levels of cache), but IPC will increase. I'm kinda expecting 2.4 GHz or so... maybe lower... for launch chips.

Just a good reason for me to get my first real water cooling setup :D (assuming I am happy with the reviews of Bulldozer)
 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.63/day)
I don't know anything about it, really. However, there is mention of the clockspeed decrease on the AMD blog site. Now that we have the info on cache size... 1+1=2. Of course, there's lots of time between now and launch. It seems to me they are refining the process, and fixing a few bugs, at this point.
 

ROad86

New Member
Joined
Sep 24, 2010
Messages
21 (0.00/day)
I want one, a server version with 8 or 16 GB of ecc ram :D I don't know why though since I don't even work 1 core on my 955BE


Haha me too!!! :laugh:
 

bear jesus

New Member
Joined
Aug 12, 2010
Messages
1,534 (0.31/day)
I don't know anything about it, really. However, there is mention of the clockspeed decrease on the AMD blog site. Now that we have the info on cache size... 1+1=2. Of course, there's lots of time between now and launch. It seems to me they are refining the process, and fixing a few bugs, at this point.

Hmm, I wonder if they will follow Intel's lead (referring to the cooler that comes with the top-end i7s) by using a better cooler for the high-end CPUs if they run hot. It would be nice to see a better cooler than the current ones, as I am not really a fan of them.
 