Copy and Paste
Multi-Core Processors said:
AMD Family 15h processors have multiple compute units, each containing its own L2 cache and two
cores. The cores share their compute unit’s L2 cache. Each core incorporates the complete x86
instruction set logic and L1 data cache. Compute units share the processor’s L3 cache and
Northbridge.
Internal Instruction Formats said:
AMD Family 15h processors perform four types of primitive operations:
• Integer (arithmetic or logic)
• Floating-point (arithmetic)
• Load
• Store
The AMD64 instruction set is complex. Instructions have variable-length encoding and many
perform multiple primitive operations. AMD Family 15h processors do not execute these complex
instructions directly, but, instead, decode them internally into simpler fixed-length instructions called
macro-ops. Processor schedulers subsequently break down macro-ops into sequences of even simpler
instructions called micro-ops, each of which specifies a single primitive operation.
A macro-op is a fixed-length instruction that:
• Expresses, at most, one integer or floating-point operation and one load and/or store operation.
• Is the primary unit of work managed (that is, dispatched and retired) by the processor.
A micro-op is a fixed-length instruction that:
• Expresses one and only one of the primitive operations that the processor can perform (for example, a load).
Comparing | AMD64 instructions | Macro-ops | Micro-ops
Complexity | Complex | Average | Simple
Operations | A single instruction may specify one or more of each of the following: integer or floating-point, load, store | A single macro-op may specify, at most, one integer or floating-point operation and one of the following: load, store, or load and store to the same address | A single micro-op specifies only one of the following primitive operations: integer or floating-point, load, or store
Encoded length | Variable (instructions are different lengths) | Fixed (all macro-ops are the same length) | Fixed (all micro-ops are the same length)
Regularized instruction fields | No (field locations and definitions vary among instructions) | Yes (field locations and definitions are the same for all macro-ops) | Yes (field locations and definitions are the same for all micro-ops)
Instruction Type | Description
FastPath Single | Decodes directly into one macro-op in microprocessor hardware.
FastPath Double | Decodes directly into two macro-ops in microprocessor hardware.
Microcode | Decodes into one or more (usually three or more) macro-ops using the on-chip microcode-engine ROM (MROM).
AMD Instruction Set Enhancements said:
The AMD Family 15h processor has been enhanced with the following new instructions:
• XOP and AVX support—Extended Advanced Vector Extensions provide enhanced instruction
encodings and non-destructive operands with an extended set of 128-bit (XMM) and 256-bit
(YMM) media registers
• FMA instructions—support for floating-point fused multiply accumulate instructions
• Fractional extract instructions—extract the fractional portion of vector and scalar single-precision
and double-precision floating-point operands
• Support for new vector conditional move instructions
• VPERMILx instructions—allow selective permutation of packed double- and single-precision
floating point operands
• VPHADDx/VPSUBx—support for packed horizontal add and subtract instructions
• Support for packed multiply, add and accumulate instructions
• Support for new vector shift and rotate instructions
Floating-Point Improvements said:
AMD Family 15h processors add support for 128-bit floating-point execution units. As a result, the
throughput of both single-precision and double-precision floating-point SIMD vector operations has
improved by 2X over the previous generation of AMD processors.
Users may notice differences in program results when using the fused multiply-accumulate (FMAC) instructions. These differences do not mean the new results are less accurate than those produced by separate ADD and MUL instructions; they result from combining the multiply and the add into a single instruction. Executed as separate instructions, ADD and MUL each round their result to within ½ unit in the last place for the given precision, so the two-instruction sequence accumulates two rounding steps and its combined result is not accurate to within ½ unit in the last place.
By fusing the two operations into a "single" instruction, a fused multiply-accumulate (FMAC) applies only one final rounding and delivers a result accurate to within ½ unit in the last place. The difference between performing "separate" ADDs and MULs and doing a "single" FMAC is therefore the cause of differences in the least significant bit of program results.
Instruction Fetching Improvements said:
While previous AMD64 processors had a single 32-byte fetch window, AMD Family 15h processors
have two 32-byte fetch windows, from which four μops can be selected. These fetch windows, when
combined with the 128-bit floating-point execution unit, allow the processor to sustain a
fetch/dispatch/retire sequence of four instructions per cycle. Most instructions decode to a single μop,
but fastpath double instructions decode to two μops. ALU instructions can also issue four μops per
cycle, and microcoded instructions should be considered single-issue. Thus, there is not necessarily a one-to-one correspondence between the decode size of assembler instructions and the capacity of the 32-byte fetch window, and producing optimal assembler code requires considerable attention to the details of the underlying programming constraints. Assembly language programmers can now group more instructions together but must still concern
themselves with the possibility that an instruction may span a 32-byte fetch window. In this regard, it
is also advisable to align hot loops to 32 bytes instead of 16 bytes, especially in the case of loops for
large SIMD instructions.
Instruction Decode and Floating-Point Pipe Improvements said:
Several integer and floating-point instructions have improved latencies and decode types on
AMD Family 15h processors. Furthermore, the FPU pipes utilized by several floating-point
instructions have changed. These changes can influence instruction choice and scheduling for
compilers and hand-written assembly code.
Notable Performance Improvements said:
Several enhancements to the AMD64 architecture have resulted in significant performance
improvements in AMD Family 15h processors, including:
• Improved performance of shuffle instructions
• Improved data transfer between floating-point registers and general purpose registers
• Improved floating-point register to floating-point register moves
• Optimization of repeated move instructions
• More efficient PUSH/POP stack operations
• 1-Gbyte paging
Improved Bandwidth Decode Type for Shuffle Instructions said:
The floating-point logic in AMD Family 15h processors uses three separate execution positions or
pipes called FADD, FMUL and FSTORE. This is illustrated in Figure 1 on page 32 in Appendix A.
Current AMD Family 15h processors support two SIMD logical/shuffle units, one in the FMUL pipe
and another in the FADD pipe, while previous AMD64 processors have only one SIMD
logical/shuffle unit in the FMUL pipe. As a result, the SIMD shuffle instructions can be processed at
twice the previous bandwidth on AMD Family 15h processors. Furthermore, the PSHUFD and
SHUFPx shuffle instructions are now DirectPath instructions instead of VectorPath instructions on
AMD Family 15h processors and take advantage of the 128-bit floating point execution units. Hence,
these instructions get a further 2X boost in bandwidth, resulting in an overall improvement of 4X in
bandwidth compared to the previous generation of AMD processors.
It is more efficient to use SHUFPx and PSHUFD instructions rather than combinations of more than one MOVLHPS/MOVHLPS/UNPCKx/PUNPCKx instruction to perform shuffle operations.
Floating-Point Register-to-Register Moves said:
On previous AMD processors, floating-point register-to-register moves could only go through the
FADD and FMUL pipes. On AMD Family 15h processors, floating-point register-to-register moves
can also go through the FSTORE pipe, thereby improving overall throughput.
Large Page Support said:
AMD Family 15h processors now have better large-page support, incorporating new 1-GB paging along with improvements to 2-MB and 4-KB paging.
The L1 data TLB and L2 data TLB now support 1GB pages, a benefit to applications making large
data-set random accesses.
The L1 instruction TLB, L1 data TLB and L2 data TLB have increased the number of entries for
2MB pages. This improves the performance of software that uses 2MB code or data or code mixed
with data virtual pages.
The L1 data TLB has also increased the number of entries for 4KB pages.
Key Features said:
AMD Family 15h processors include many features designed to improve software performance. The
internal design, or microarchitecture, of these processors provides the following key features:
• Up to 8 Compute Units (CUs) with 2 cores per CU
• Integrated DDR3 memory controller (two on some models) with memory prefetcher
• 64-Kbyte L1 instruction cache per CU
• 16-Kbyte L1 data cache per core
• Unified L2 cache shared between cores of CU
• Shared L3 cache on chip (for supported platforms)
• 32-byte instruction fetch
• Instruction predecode and branch prediction during cache-line fills
• Decoupled prediction and instruction fetch pipelines
• Four-way instruction decoding
• Dynamic scheduling and speculative execution
• Two-way integer execution
• Two-way address generation
• Two-way 128-bit wide floating-point execution
• Legacy single-instruction multiple-data (SIMD) instruction extensions, as well as support for
XOP, FMA4, VPERMILx, and Advanced Vector Extensions (AVX).
• Superforwarding
• Prefetch into L2 or L1 data cache
• Deep out-of-order integer and floating-point execution
• HyperTransport™ technology
Microarchitecture of AMD Family 15h Processors said:
AMD Family 15h processors implement the AMD64 instruction set by means of macro-ops (the
primary units of work managed by the processor) and micro-ops (the primitive operations executed in
the processor's execution units). These are simple fixed-length operations designed to include direct
support for AMD64 instructions and adhere to the high-performance principles of fixed-length
encoding, regularized instruction fields, and a large register set. This enhanced microarchitecture
enables higher processor core performance and promotes straightforward extensibility for future
designs.
Superscalar Processor said:
The AMD Family 15h processor is an aggressive, out-of-order, superscalar processor. It can fetch, decode, and issue up to four instructions per cycle using decoupled fetch and branch prediction units and three independent instruction schedulers, consisting of two integer schedulers and one floating-point scheduler.
These processors can fetch 32 bytes per cycle and can scan two 16-byte instruction windows for up to
four micro-ops, which can be dispatched together in a single cycle. The actual number of micro-ops dispatched may be lower, depending on a number of factors, such as decode limits (for example, the number of loads and stores that can issue together) and whether instructions can be split across the 16-byte windows. The processors move integer instructions through the replicated integer clusters
and floating point instructions through the shared floating point unit (FPU).
L1 Instruction Cache said:
The out-of-order execution engine of AMD Family 15h processors contains a 64-Kbyte, 2-way set-associative L1 instruction cache. Each line in this cache is 64 bytes long. However, only 32 bytes are
fetched in every cycle. Functions associated with the L1 instruction cache are instruction loads,
instruction prefetching, instruction predecoding, and branch prediction. Requests that miss in the L1
instruction cache are fetched from the L2 cache or, subsequently, from the L3 cache or system
memory.
On misses, the L1 instruction cache generates fill requests to a naturally aligned 64-byte line
containing the instructions and the next sequential line of bytes (a prefetch). Because code typically
exhibits spatial locality, prefetching is an effective technique for avoiding decode stalls. Cache-line
replacement is based on a least-recently-used replacement algorithm.
Predecoding begins as the L1 instruction cache is filled. Predecode information is generated and
stored alongside the instruction cache. This information is used to help efficiently identify the
boundaries between variable length AMD64 instructions.
L1 Data Cache said:
The AMD Family 15h processor contains a 16-Kbyte, 4-way predicted L1 data cache with two 128-bit ports. This is a write-through cache that supports up to two 128-bit loads per cycle. It is divided into 16 banks, each 16 bytes wide. In addition, the L1 cache is protected from single-bit errors through
the use of parity. There is a hardware prefetcher that brings data into the L1 data cache to avoid
misses. The L1 data cache has a 4-cycle load-to-use latency. Only one load can be performed from a
given bank of the L1 cache in a single cycle.
L2 Cache said:
The AMD Family 15h processor has one shared L2 cache per compute unit. This full-speed on-die L2
cache is mostly inclusive relative to the L1 cache. The L2 is a write-through cache. Every time a store
is performed in a core, that address is written into both the L1 data cache of the core the store belongs
to and the L2 cache (which is shared between the two cores). The L2 cache has an 18- to 20-cycle load-to-use latency.
L3 Cache said:
The AMD Family 15h processor supports a maximum of 8MB of L3 cache per die, distributed among
four L3 sub-caches which can each be up to 2MB in size. The L3 cache is considered a non-inclusive
victim cache architecture optimized for multi-core AMD processors. Only L2 evictions cause
allocations into the L3 cache. Requests that hit in the L3 cache can either leave the data in the L3
cache—if it is likely the data is being accessed by multiple cores—or remove the data from the L3
cache (and place it solely in the L1 cache, creating space for other L2 victim/copy-backs), if it is likely the data is only being accessed by a single core. Furthermore, the L3 cache of the AMD Family
15h processor also features a number of micro-architectural improvements that enable higher
bandwidth.
Branch-Prediction said:
To predict and accelerate branches, AMD Family 15h processors employ a combination of next-address logic, a 2-level branch target buffer (BTB) for branch identification and direct target
prediction, a return address stack used for predicting return addresses, an indirect target predictor for
predicting indirect jump and call addresses, a hybrid branch predictor for predicting conditional
branch directions, and a fetch window tracking structure (BSR). Predicted-taken branches incur a 1-
cycle bubble in the branch prediction pipeline when they are predicted by the L1 BTB, and a 4-cycle
bubble in the case where they are predicted by the L2 BTB. The minimum branch misprediction
penalty is 20 cycles in the case of conditional and indirect branches and 15 cycles for unconditional
direct branches and returns.
The BTB is a tagged two-level set associative structure accessed using the fetch address of the current
window. Each BTB entry includes information about a branch and its target. The L1 BTB contains
128 sets of 4 ways for a total of 512 entries, while the L2 BTB has 1024 sets of 5 ways for a total of
5120 entries.
The hybrid branch predictor is used for predicting conditional branches. It consists of a global
predictor, a local predictor and a selector that tracks whether each branch is correlating better with the
global or local predictor. The selector and local predictor are indexed with a linear address hash. The
global predictor is accessed via a 2-bit address hash and a 12-bit global history.
AMD Family 15h processors implement a separate 512-entry indirect target array used to predict
indirect branches with multiple dynamic targets.
In addition, the processors implement a 24-entry return address stack to predict return addresses from
a near or far call. Most of the time, as calls are fetched, the next return address is pushed onto the
return stack and subsequent returns pop a predicted return address off the top of the stack. However,
mispredictions sometimes arise during speculative execution. Mechanisms exist to restore the stack to
a consistent state after these mispredictions.
Instruction Fetch and Decode said:
AMD Family 15h processors can theoretically fetch 32B of instructions per cycle and send these
instructions to the Decode Unit (DE) in 16B windows through the 16-entry (per-thread) Instruction
Byte Buffer (IBB). The Decode Unit can only scan two of these 16B windows in a given cycle for up
to four instructions. If four instructions partially or wholly exist in more than two of these windows,
only those instructions within the first and second windows will be decoded. Aligning to 16B
boundaries is important to achieve full decode performance.
Integer Execution said:
The integer execution unit for the AMD Family 15h processor consists of two components:
• the integer datapath
• the instruction scheduler and retirement control
These two components are responsible for all integer execution (including address generation) as well
as coordination of all instruction retirement and exception handling. The instruction scheduler and retirement control tracks instruction progress through dispatch, issue, execution, and eventual retirement. Scheduling for integer operations is fully data-dependency driven, proceeding out-of-order based on the validity of source operands and the availability of execution resources.
Since the Bulldozer core implements a floating point co-processor model of operation, most
scheduling and execution decisions of floating-point operations are handled by the floating point unit.
However, the scheduler does track the completion status of all outstanding operations and is the final
arbiter for exception processing and recovery.
Translation-Lookaside Buffer said:
A translation-lookaside buffer (TLB) holds the most-recently-used page mapping information. It
assists and accelerates the translation of virtual addresses to physical addresses.
The AMD Family 15h processors utilize a two-level TLB structure.
L1 Instruction TLB Specifications said:
The AMD Family 15h processor contains a fully-associative L1 instruction TLB with 48 4-Kbyte
page entries and 24 2-Mbyte or 1-Gbyte page entries. 4-Mbyte pages require two 2-Mbyte entries;
thus, the number of entries available for 4-Mbyte pages is one half the number of 2-Mbyte page
entries.
L1 Data TLB Specifications said:
The AMD Family 15h processor contains a fully-associative L1 data TLB with 32 entries for 4-
Kbyte, 2-Mbyte, and 1-Gbyte pages. 4-Mbyte pages require two 2-Mbyte entries; thus, the number of
entries available for 4-Mbyte pages is one half the number of 2-Mbyte page entries.
L2 Instruction TLB Specifications said:
The AMD Family 15h processor contains a 4-way set-associative L2 instruction TLB with 512 4-
Kbyte page entries.
L2 Data TLB Specifications said:
The AMD Family 15h processor contains an L2 data TLB and page walk cache (PWC) with 1024 4-
Kbyte, 2-Mbyte or 1-Gbyte page entries (8-way set-associative). 4-Mbyte pages require two 2-Mbyte
entries; thus, the number of entries available for 4-Mbyte pages is one half the number of 2-Mbyte
page entries.
Integer Unit said:
The integer unit consists of two components, the integer scheduler, which feeds the integer execution
pipes, and the integer execution unit, which carries out several types of operations discussed below.
The integer unit is duplicated for each thread pair.
Integer Scheduler said:
The scheduler can receive and schedule up to four micro-ops (μops) in a dispatch group per cycle.
The scheduler tracks operand availability and dependency information as part of its task of issuing
μops to be executed. It also assures that older μops which have been waiting for operands are
executed in a timely manner. The scheduler also manages register mapping and renaming.
*Might be an error: the four micro-ops are actually four macro-ops, because the next section says "Macro-ops are broken down into micro-ops in the schedulers."
Integer Execution Unit said:
There are four integer execution units per core: two units that handle all arithmetic, logical, and shift operations (EX), and two that handle address generation and simple ALU operations (AGLU). Together these form an integer cluster, and there are two such clusters per compute unit.
Macro-ops are broken down into micro-ops in the schedulers. Micro-ops are executed when their
operands are available, either from the register file or result buses. Micro-ops from a single operation
can execute out-of-order. In addition, a particular integer pipe can execute two micro-ops from
different macro-ops (one in the ALU and one in the AGLU) at the same time. The scheduler can receive up to four macro-ops per cycle. This group of macro-ops is
called a dispatch group.
EX0 contains a variable latency non-pipelined integer divider. EX1 contains a pipelined integer
multiplier. The AGLUs contain a simple ALU to execute arithmetic and logical operations and
generate effective addresses. A load and store unit (LSU) reads and writes data to and from the L1 data cache. The integer scheduler sends a completion status to the ICU when the outstanding micro-ops for a given macro-op are executed.
The LZCNT and POPCNT operations are handled in a pipelined unit attached to EX0.
Floating-Point Unit said:
The AMD Family 15h processor floating point unit (FPU) was designed to provide four times the raw
FADD and FMUL bandwidth as the original AMD Opteron and Athlon 64 processors. It achieves this
by means of two 128-bit fused multiply-accumulate (FMAC) units which are supported by a 128-bit
high-bandwidth load-store system. The FPU is a coprocessor model that is shared between the two
cores of one AMD Family 15h compute unit. As such it contains its own scheduler, register files and
renamers and does not share them with the integer units. This decoupling provides optimal
performance of both the integer units and the FPU. In addition to the two FMACs, the FPU also
contains two 128-bit integer units which perform arithmetic and logical operations on AVX, MMX
and SSE packed integer data.
A 128-bit integer multiply accumulate (IMAC) unit is incorporated into FPU pipe 0. The IMAC
performs integer fused multiply and accumulate, and similar arithmetic operations on AVX, MMX
and SSE data. A crossbar (XBAR) unit is integrated into FPU pipe 1 to execute the permute
instruction along with shifts, packs/unpacks and shuffles. There is an FPU load-store unit which
supports up to two 128-bit loads and one 128-bit store per cycle.
FPU Features Summary and Specifications:
• The FPU can receive up to four ops per cycle. These ops can only be from one thread, but the
thread may change every cycle. Likewise the FPU is four wide, capable of issue, execution and
completion of four ops each cycle. Once received by the FPU, ops from multiple threads can be
executed.
• Within the FPU, up to two loads per cycle can be accepted, possibly from different threads.
• There are four logical pipes: two FMAC and two packed integer. For example, two 128-bit
FMAC and two 128-bit integer ALU ops can be issued and executed per cycle.
• Two 128-bit FMAC units. Each FMAC supports four single precision or two double-precision
ops.
• FADDs and FMULs are implemented within the FMACs.
• x87 FADDs and FMULs are also handled by the FMAC.
• Each FMAC contains a variable latency divide/square root machine.
• Only one 256-bit operation can issue per cycle; however, an extra cycle can be incurred, as in the case of a FastPath Double, if both micro-ops cannot issue together.
*Might be another error: where it says micro-ops, they may actually mean macro-ops.
Load-Store Unit said:
The AMD Family 15h processor load-store (LS) unit handles data accesses. There are two LS units per compute unit, or one per core. The LS unit supports two 128-bit loads per cycle and one 128-bit store per cycle. There is a 24-entry store queue, which buffers stored data until it can be written to the data cache. The load queue has 40 entries and holds load operations until after the load has been
completed and delivered to the integer unit or the FPU. The LS unit is composed of two largely independent
pipelines enabling the execution of two memory operations per cycle.
Finally, the LS unit helps ensure that the architectural load and store ordering rules are preserved (a
requirement for AMD64 architecture compatibility).
Adding more with edits
I hate this Table tool!!!
Also, there might be errors in this, since some lines are copied and pasted (this has been out since April).
I am posting this here because this is a "Bulldozer" information thread.