Monday, July 11th 2011

AMD FX-8130P Processor Benchmarks Surface

Here is a tasty scoop of benchmark results purported to be those of the AMD FX-8130P, the next high-end processor from the green team. The FX-8130P was paired with a Gigabyte 990FXA-UD5 motherboard and 4 GB of dual-channel Kingston HyperX DDR3-2000 memory running at DDR3-1866. A GeForce GTX 580 handled graphics. The chip was clocked at 3.20 GHz (16 x 200 MHz). Testing began with benchmarks that aren't very multi-core intensive. In Super Pi 1M, the chip clocked in at 19.5 seconds. In the AIDA64 Cache and Memory benchmark, the L1 cache appears extremely fast, while L2, L3 and memory performance is only a slight improvement over the previous generation of Phenom II processors.
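The clock and memory figures above reduce to simple arithmetic. The sketch below uses hypothetical helper names: it multiplies out the 16 x 200 MHz clock and estimates the theoretical peak bandwidth of dual-channel DDR3-1866. It assumes the standard 64-bit (8-byte) DDR3 channel width, which the article does not state. Treat the bandwidth figure as a ceiling; measured AIDA64 read and copy results always land below it.

```python
# Hypothetical helpers for the test-setup arithmetic. The 8-byte (64-bit)
# channel width is the standard DDR3 DIMM bus width, assumed here.

def core_clock_mhz(multiplier: int, reference_clock_mhz: float) -> float:
    """Core clock = CPU multiplier x reference clock."""
    return multiplier * reference_clock_mhz


def peak_dram_bandwidth_gb_s(transfers_mt_s: float, channels: int,
                             bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second).

    DDR3 ratings such as DDR3-1866 are transfer rates in MT/s; the I/O
    clock itself runs at half that (about 933 MHz for DDR3-1866).
    """
    return transfers_mt_s * 1e6 * channels * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(core_clock_mhz(16, 200))                      # 3200
    print(round(peak_dram_bandwidth_gb_s(1866, 2), 2))  # 29.86
```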
Moving on to multi-threaded tests, Fritz Chess yielded a speed-up of over 29.5x over the benchmark's reference, at 14,197 kilonodes per second. In the x264 benchmark, the first pass encoded at roughly 136 fps and the second pass at roughly 45 fps. The system scored 3045 points in PCMark7 and P6265 in 3DMark11 (Performance preset). The results suggest that this chip will be highly competitive with Intel's LGA1155 Sandy Bridge quad-core chips, but as usual, we ask you to take the data with a pinch of salt.
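The Fritz Chess result also implies what the benchmark's reference score is. In this minimal sketch the function names are hypothetical. Dividing the reported 14,197 kN/s by 29.5 gives roughly 481 kN/s. Because the speed-up was reported as "over" 29.5x, the actual reference is at or slightly below that figure.

```python
# Hypothetical helpers relating a Fritz Chess score to its relative speed.

def implied_reference_kn_s(score_kn_s: float, relative_speed: float) -> float:
    """Reference score implied by a result and its reported relative speed."""
    return score_kn_s / relative_speed


def relative_speed(score_kn_s: float, reference_kn_s: float) -> float:
    """Relative speed of a result against a given reference score."""
    return score_kn_s / reference_kn_s


if __name__ == "__main__":
    ref = implied_reference_kn_s(14_197, 29.5)
    print(f"{ref:.1f} kN/s")  # 481.3 kN/s
```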
Source: DonanimHaber

317 Comments on AMD FX-8130P Processor Benchmarks Surface

#151
Benetanegia
seronx wrote:
4 fetch/decode/store per cycle for 32-bit

2 fetch/decode/store per cycle for 64-bit

and in theory, if there were registers for it,

1 fetch/decode/store per cycle for 128-bit

Still a single unit. More than one fetch/decode/store operation per cycle has been the norm in every architecture for longer than I can remember. Previous AMD chips did up to 3; now it's 4. I see the improvement, but it's still only one fetch/decode unit per module nonetheless.

The line is definitely blurred, but you can call a BD module one core as easily as you can call it two cores. Because of the single fetch unit, I'm more inclined to call it one core.
#152
seronx
TheLaughingMan wrote:
SSE5 was replaced with several smaller instruction sets that were redesigned to work better with AVX. This happened right after the AMD/Intel contract was renegotiated. So SSE5, as far as the name is concerned, will not be on Bulldozer.

Some reviews showed a while back (years ago) that if you changed the vendor name reported to some benchmark programs, you would magically get better numbers. A VIA C7 that was reported to the software as either an AMD or an Intel processor improved its memory and per-clock performance. The per-clock gain could be justified, since the VIA C7 acquired SSE3 rather late and the software needed a patch to use it. The memory performance change was just BS.

And there has been no confirmation of the naming scheme, to my knowledge.

For what I use, SSE5 (XOP, CVT16, FMA4) is highly important.

www.xbitlabs.com/news/cpu/display/20110313113629_Four_AMD_Bulldozer_Chips_Incoming_Details_Revealed.html
Benetanegia wrote:
Still a single unit. More than one fetch/decode/store operation per cycle has been the norm in every architecture for longer than I can remember. Previous AMD chips did up to 3; now it's 4. I see the improvement, but it's still only one fetch/decode unit per module nonetheless.

The line is definitely blurred, but you can call a BD module one core as easily as you can call it two cores. Because of the single fetch unit, I'm more inclined to call it one core.

It's an 8-core.

It has 4 fetch/decode/store units, not one per module.

Phenom II could only do 3 fetch/decodes per clock or 3 stores per clock.
#153
LAN_deRf_HA
cadaveca wrote:
Cache speed and memory performance are very tightly linked. We are talking about the combination of BOTH. Increase L3 speed, and memory bandwidth goes up with it. I can show this very simply with both AMD and Intel chips.

You can say that SPi favors Intel chips... but then again, if you want to go down that road, so do the majority of applications out there... any app favors the faster performance on 1155. Like I posted above, I don't really care if an app favors one over the other... the fact of the matter is that the end user gets better performance on 1155, not how Intel got there.

Both? Alright, so we already know AMD has better RAM bandwidth; let's look at the cache. Btw, 775 doesn't even have L3.

Note that the DDR3 on Phenom only gives it a 0.2 latency boost on L2. It would win either way.

The biggest difference I see is that read and copy are switched. Overall, it appears AMD is faster on cache as well. Yet Super Pi still does better on 775 despite all that. So what purpose does Super Pi really serve in comparing AMD and Intel chips if the architecture has a bigger impact than the memory speeds?
#154
seronx
LAN_deRf_HA wrote:
The biggest difference I see is that read and copy are switched. Overall, it appears AMD is faster on cache as well. Yet Super Pi still does better on 775 despite all that. So what purpose does Super Pi really serve in comparing AMD and Intel chips if the architecture has a bigger impact than the memory speeds?

None. Super Pi doesn't use a living architecture like wPrime does.

x87 vs. SSE:
SSE wins.

Name applications that came out this year that use x87.
#155
TheLaughingMan
seronx wrote:
None. Super Pi doesn't use a living architecture like wPrime does.

x87 vs. SSE:
SSE wins.

Name applications that came out this year that use x87.

x87 is like 5 years old and completely obsolete. The Phenom II is running Super Pi with the SSE instruction sets up to SSE3. Intel gets the benefit of the full SSE4, SSE4.1 and SSE4.2.

So of course there are no x87 programs. Why would anyone do that?
#156
Benetanegia
seronx wrote:
It has 4 fetch/decode/store units, not one per module.

False.
Shared Instruction Fetch

Sharing between cores is a key element of Bulldozer’s architecture, and it starts with the front end. The front-end has been entirely overhauled and is now responsible for feeding both cores within a module. Bulldozer’s front end includes branch prediction, instruction fetching, instruction decoding and macro-op dispatch. These stages are effectively multi-threaded with single cycle switching between threads. The arbitration between the two cores is determined by a number of factors including fairness, pipeline occupancy and stalling events.
Basically, each module has one fetch/decode unit capable of issuing 4 macro-ops per cycle (the same as Intel since Nehalem, or earlier; I'm not actually sure). So while a Phenom II X6 had six units in total, each capable of issuing 3 macro-ops, the 8-"core" BD has 4 units, each capable of issuing 4 macro-ops.
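To make the shared front end concrete, here is a toy model under loudly simplified assumptions: one front end per module alternates strictly between its two cores each cycle and hands the selected core 4 macro-ops. The function names are hypothetical, and the real arbiter's fairness, occupancy and stall logic is not modelled. The chip-wide figures from the post (4 x 4 vs. 6 x 3) fall out of a one-line multiply.

```python
from typing import List


def round_robin_frontend(cycles: int, has_work: List[bool], width: int = 4) -> List[int]:
    """Toy model (hypothetical) of one front end shared by a module's cores.

    Each cycle the front end serves one core that has work, rotating between
    cores in strict round-robin order, and delivers `width` macro-ops to it.
    Real Bulldozer arbitration also weighs fairness, pipeline occupancy and
    stalls; none of that is modelled here.
    """
    n = len(has_work)
    delivered = [0] * n
    next_core = 0
    for _ in range(cycles):
        for offset in range(n):
            core = (next_core + offset) % n
            if has_work[core]:
                delivered[core] += width
                next_core = (core + 1) % n  # rotate so the other core goes next
                break
    return delivered


def chip_frontend_width(units: int, macro_ops_per_unit: int) -> int:
    """Peak macro-ops per cycle summed over all front ends on the chip."""
    return units * macro_ops_per_unit


if __name__ == "__main__":
    print(round_robin_frontend(100, [True, True]))   # [200, 200]
    print(round_robin_frontend(100, [True, False]))  # [400, 0]
    print(chip_frontend_width(4, 4), chip_frontend_width(6, 3))  # 16 18
```

In this model, with both cores busy each one averages 2 macro-ops per cycle, while a lone busy core gets the full 4. That is one way to see why the per-thread decode bandwidth, and with it the "one core or two" argument, depends on what the sibling core is doing.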
#157
repman244
More stuff for you guys to discuss :laugh:

support.amd.com/us/Processor_TechDocs/47414.pdf
The following performance caveats apply when using streaming stores on AMD Family 15h cores.
• When writing out a single stream of data sequentially, performance of AMD Family 15h processors is comparable to previous generations of AMD processors.
• When writing out two streams of data, AMD Family 15h version 1 processors can be up to three times slower than previous-generation AMD processors. AMD Family 15h version 2 processor performance is approximately 1.5 times slower than previous AMD processors.
• When writing out four non-temporal streams, AMD Family 15h version 1 can be up to three times slower than previous AMD processors. AMD Family 15h version 2 processor performance is comparable to previous AMD processors.
• Using non-temporal stores but not writing out an entire cacheline may cause performance to be up to six times slower than previous AMD processors.

*goes away to get more popcorn*
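The quoted caveats read easily as a small lookup table. This is a sketch, not an AMD API. The table and function names are hypothetical, and the factors are transcribed from the excerpt as worst-case slowdowns versus previous-generation AMD parts, with 1.0 meaning "comparable". Stream counts that the excerpt does not mention return None rather than a guess.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: worst-case slowdown vs. previous-generation AMD
# processors, transcribed from the Family 15h streaming-store caveats.
# Keys are (silicon version, number of write streams); 1.0 = "comparable".
STREAM_SLOWDOWN: Dict[Tuple[int, int], float] = {
    (1, 1): 1.0,  # single sequential stream: comparable
    (2, 1): 1.0,
    (1, 2): 3.0,  # version 1, two streams: up to 3x slower
    (2, 2): 1.5,  # version 2, two streams: about 1.5x slower
    (1, 4): 3.0,  # version 1, four non-temporal streams: up to 3x slower
    (2, 4): 1.0,  # version 2, four streams: comparable
}

# Non-temporal stores that do not write out a whole cache line: up to 6x slower.
PARTIAL_LINE_SLOWDOWN = 6.0


def worst_case_slowdown(version: int, streams: int,
                        full_cache_lines: bool = True) -> Optional[float]:
    """Return the documented worst-case slowdown, or None if not documented."""
    if not full_cache_lines:
        return PARTIAL_LINE_SLOWDOWN
    return STREAM_SLOWDOWN.get((version, streams))


if __name__ == "__main__":
    print(worst_case_slowdown(1, 2))  # 3.0
    print(worst_case_slowdown(2, 2))  # 1.5
    print(worst_case_slowdown(1, 3))  # None: not covered by the excerpt
```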
#158
cadaveca
My name is Dave
LAN_deRf_HA wrote:
Both? Alright, so we already know AMD has better RAM bandwidth; let's look at the cache. Btw, 775 doesn't even have L3.

i46.tinypic.com/ta1se8.png
img219.imageshack.us/img219/3314/cachemem2m.png

Note that the DDR3 on Phenom only gives it a 0.2 latency boost on L2. It would win either way.

The biggest difference I see is that read and copy are switched. Overall, it appears AMD is faster on cache as well. Yet Super Pi still does better on 775 despite all that. So what purpose does Super Pi really serve in comparing AMD and Intel chips if the architecture has a bigger impact than the memory speeds?

I'm sorry, but your comparison here is inaccurate. You've got AMD with DDR3 and Intel with DDR2. I dunno about you, but I ran my 775 on DDR3 as soon as DDR3 boards came out. In fact, my old 775 board, a Foxconn BlackOps that supports DDR3, is on its way to EasyRhino right now.

Anyway, the point was that SuperPi can directly relate to SOME apps and how they perform, and is in no way meant to be used as a comparison for all performance scenarios.

And I do have screenshots from that platform. I'll not fall for the obvious problems in your comparison; your troll failed, sry. :laugh:
#159
faramir
DeerSteak wrote:
Unsurprisingly, as you've been prone to do today, seronx, you didn't actually read what you were responding to. I'm starting to think it's a reading comprehension deficiency.

That, or a bit of this. That's another link for him.
#160
bucketface
AMD counts integer clusters as "cores". There are 8 integer clusters on BD, so they say 8 cores.
@cadaveca
Isn't it the cache on AMD chips that is significantly lower-performing, and not so much memory (RAM) bandwidth? From what I've seen, memory bandwidth on AMD isn't that far behind Intel. Also, Super Pi tests at or below the chip's cache size should be limited only by cache bandwidth/latency; the larger tests should show the combined effects of cache and memory.
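The point about test size versus cache size can be sketched as a "which level does this working set fit in" check. The function name is hypothetical, and the cache sizes are assumptions labelled in the code: 16 KB L1D per core and 2 MB L2 per module as quoted elsewhere in this thread, plus the 8 MB shared L3 listed for the 8-core Zambezi parts. Associativity, sharing between the two cores of a module, and inclusion/exclusion behaviour between levels are all ignored, so this is only a first cut.

```python
from typing import List, Tuple

KIB = 1024
MIB = 1024 * KIB

# Assumed example hierarchy (hypothetical inputs): 16 KB L1D per core and
# 2 MB L2 per module as quoted in this thread, 8 MB shared L3 for Zambezi.
ZAMBEZI_LEVELS: List[Tuple[str, int]] = [
    ("L1D", 16 * KIB),
    ("L2", 2 * MIB),
    ("L3", 8 * MIB),
]


def smallest_fitting_level(working_set_bytes: int,
                           levels: List[Tuple[str, int]] = ZAMBEZI_LEVELS) -> str:
    """Name of the smallest cache level that can hold the whole working set.

    Ignores associativity, sharing and inclusion, so it is only a first cut.
    """
    for name, capacity in levels:
        if working_set_bytes <= capacity:
            return name
    return "DRAM"


if __name__ == "__main__":
    for size in (8 * KIB, 1 * MIB, 4 * MIB, 32 * MIB):
        print(size, smallest_fitting_level(size))
```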

From my understanding, if AMD were to go out of business, Intel would get carved up into bite-sized chunks that would have to compete with each other. Anyway, why would you want the competition to fold? It just leads to higher prices. Ideally you want at least 3 major players in a market, each controlling roughly equal market share; that way you get lots of competition and good prices.
#161
seronx
Benetanegia wrote:
False.

Basically, each module has one fetch/decode unit capable of issuing 4 macro-ops per cycle (the same as Intel since Nehalem, or earlier; I'm not actually sure). So while a Phenom II X6 had six units in total, each capable of issuing 3 macro-ops, the 8-"core" BD has 4 units, each capable of issuing 4 macro-ops.

I don't care anymore.

I got confused by this picture.

You were right, but my mind remembered something else.

The 64KB L1I is divided by 2 for each core: 32KB L1I per core, just like Intel.
#162
GenTarkin
TheLaughingMan wrote:
x87 is like 5 years old and completely obsolete. The Phenom II is running Super Pi with the SSE instruction sets up to SSE3. Intel gets the benefit of the full SSE4, SSE4.1 and SSE4.2.

So of course there are no x87 programs. Why would anyone do that?

Um, dude, Phenom II doesn't use any SSE for SuperPI... neither does SB.
SuperPI only uses x87 for its codebase, so that's what runs on both processors in that benchmark.
It makes no sense for any modern uarch to strive for x87 prowess... so I'm pretty sure SuperPI is the last thing on AMD's mind, if it ever was to begin with =P
It just so happens SB is better at x87 stuff... who cares!
I wish people would drop SuperPI altogether; it's meaningless nowadays... yet people use it to leave a good or bad taste in their mouth about an upcoming uarch... freakin retarded way to make first impressions of a new uarch!!!
#163
H82LUZ73
Pestilence wrote:
How do you figure? Preliminary pricing has the 8-core BD at 330 dollars, and the 990FX boards are priced around the same as P67 boards.

As for overclocking: Sandy Bridge processors are 95W TDP parts; the 8-core BD is 140W. Which do you think is going to have an easier time overclocking?

By 1% to 2% ser. There will be no miracle 10% gains.

There are 125W and 95W parts for Bulldozer. The 186 is an engineering sample, so it leaks more than a B2 chip.
#164
seronx
H82LUZ73 wrote:
There are 125W and 95W parts for Bulldozer. The 186 is an engineering sample, so it leaks more than a B2 chip.

I didn't think about that :banghead: and it doesn't help that it's leaking that much at 3.2GHz lol

All FX chips are overclockable.

95 Watts FX-X110, 125 Watts FX-8130P

GenTarkin wrote:
Um, dude, Phenom II doesn't use any SSE for SuperPI... neither does SB.
SuperPI only uses x87 for its codebase, so that's what runs on both processors in that benchmark.
It makes no sense for any modern uarch to strive for x87 prowess... so I'm pretty sure SuperPI is the last thing on AMD's mind, if it ever was to begin with =P
It just so happens SB is better at x87 stuff... who cares!
I wish people would drop SuperPI altogether; it's meaningless nowadays... yet people use it to leave a good or bad taste in their mouth about an upcoming uarch... freakin retarded way to make first impressions of a new uarch!!!

Exactly.
#165
LAN_deRf_HA
cadaveca wrote:
I'm sorry, but your comparison here is inaccurate. You've got AMD with DDR3 and Intel with DDR2. I dunno about you, but I ran my 775 on DDR3 as soon as DDR3 boards came out. In fact, my old 775 board, a Foxconn BlackOps that supports DDR3, is on its way to EasyRhino right now.

Anyway, the point was that SuperPi can directly relate to SOME apps and how they perform, and is in no way meant to be used as a comparison for all performance scenarios.

And I do have screenshots from that platform. I'll not fall for the obvious problems in your comparison; your troll failed, sry. :laugh:

OK. I've had enough of this crap from you. Every time you get pushed into a corner over one of your assumptions, you clamp down into this "lalalala I can't hear you" mode. That wasn't even remotely trollish to anyone but you. I explained the extent of the effect of the DDR3, which I had confirmed before posting. Here, see for yourself: www.legitreviews.com/article/902/6/

It's OK to have some confidence in your assumptions, but you take it too far. Thinking I'm trolling you? Wth, man.
#166
cadaveca
My name is Dave
Yes, you are trolling, because although SuperPi is not indicative of real-world performance, it does correlate with overall memory performance, as seen in F1 2010.

You started off saying AMD had better RAM performance, but it does not; it only looks that way in your screenshots because you've got DDR2 vs. DDR3. That's using skewed results that show what you want, rather than the truth. Start with factual comments, and I'll not call you a troll.

I've been doing cache speed comparisons since Socket 754. If you search other forums for my posts, you'll find I even compared 1MB vs. 2MB CPUs. You're not informing me (or anyone else) of anything.
#167
THANATOS
seronx, I won't deny that BD has a balanced amount of resources, so there won't be a bottleneck. And one of the links I provided was from AMD, not just some smart-ass guy, even if it was quite an old presentation.
The thing is, for me it would be 4 cores with 8 integer clusters, not 8 cores, because for me a dual core is CMP: 2 identical cores that share at most the L3 cache (for data sharing between cores), HyperTransport and the integrated memory controller, and in some cases an IGP, as in Llano or SB.
That's why I think they would be better off calling it 4 cores with AMD-threading or something like that, and not 8 cores just because a small part of the core die, just 12%, is doubled. That part is not a core but an integer unit (cluster), just one part of a core. Intel SB with HT also has an increase in die size thanks to HT, meaning something was doubled, though not as much as in an AMD module, yet no one calls it an 8 core even though it can work with 8 threads. Why does doubling integer units mean double the number of cores, while doubling registers and some other things means just 4 cores?
(Sorry, I couldn't find what was actually doubled except some registers in the P4, but HT has improved a lot since then, even if I still think the module is the right choice and not HT.)

devguy, you wrote that the L3 cache, the integrated memory controller and the HyperTransport link are shared, and that because of that, Deneb should be just one core if BD isn't an 8 core, or something to that effect. That's a bad comparison in my opinion.
The L3 cache is there specifically so each core can access data from the others. What other reason would there be for it, if the L2 cache is faster? Otherwise, making the L2 larger would be better for performance than creating a new, slower cache.
The IMC is for the CPU to communicate with the memory modules, so why should each core have its own IMC?
HyperTransport, or the Intel equivalent, is the same as the IMC: just communication between the CPU and the northbridge, southbridge or another CPU.
As far as I can recall, none of them was ever included in a core, at least not the IMC and HTT.
It's enough to just look at a BD module and a Deneb core: the difference is just twice the number of integer clusters, but integer clusters alone were never called cores, so why should they be now?
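The 4-vs-8 disagreement comes down to which per-module resource one chooses to call a core. Here is a minimal sketch under that framing, with a hypothetical data class whose counts follow this thread's description of a module: two integer clusters, one shared fetch/decode front end and one shared FPU. It illustrates the counting argument only; it is not a model of the hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleResources:
    """Hypothetical per-module resource counts, as described in this thread."""
    integer_clusters: int = 2
    fetch_decode_units: int = 1
    shared_fpus: int = 1


def core_count(modules: int, resources: ModuleResources, counted_as_core: str) -> int:
    """Count 'cores' by treating one chosen per-module resource as the core."""
    return modules * getattr(resources, counted_as_core)


if __name__ == "__main__":
    bd = ModuleResources()
    print(core_count(4, bd, "integer_clusters"))    # 8: AMD's count
    print(core_count(4, bd, "fetch_decode_units"))  # 4: the front-end count
```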
#168
TheLaughingMan
Technically your current title doesn't have "mod" in it. Your mod status is implied. lol
#169
Benetanegia
TheLaughingMan wrote:
Technically your current title doesn't have "mod" in it. Your mod status is implied. lol

lol he means "Super PI Mod v1.5". :laugh:
#170
seronx
KRONOSFX wrote:
seronx, I won't deny that BD has a balanced amount of resources, so there won't be a bottleneck. And one of the links I provided was from AMD, not just some smart-ass guy, even if it was quite an old presentation.
The thing is, for me it would be 4 cores with 8 integer clusters, not 8 cores, because for me a dual core is CMP: 2 identical cores that share at most the L3 cache (for data sharing between cores), HyperTransport and the integrated memory controller, and in some cases an IGP, as in Llano or SB.
That's why I think they would be better off calling it 4 cores with AMD-threading or something like that, and not 8 cores just because a small part of the core die, just 12%, is doubled. That part is not a core but an integer unit (cluster), just one part of a core. Intel SB with HT also has an increase in die size thanks to HT, meaning something was doubled, though not as much as in an AMD module, yet no one calls it an 8 core even though it can work with 8 threads. Why does doubling integer units mean double the number of cores, while doubling registers and some other things means just 4 cores?
(Sorry, I couldn't find what was actually doubled except some registers in the P4, but HT has improved a lot since then, even if I still think the module is the right choice and not HT.)

Everything was doubled:
2 x 128-bit SSE (1 x 256-bit AVX add+multiply)
2 x 16KB L1D
1 x 64KB L1I instead of 1 x 32KB L1I
64+64 and 32+32+32+32 registers instead of 64 and 32+32 registers
512KB (Phenom II) to 1MB L2 (Regor/Llano) to 2MB L2 (Zambezi)

Too lazy to look up more that was doubled.

The formula has changed a bit:
Two identical cores now share L2 (for Zambezi)
Several modules now share L3 (for Zambezi)

Rather old dissection
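Multiplying the per-module list out gives chip-wide totals for a 4-module, 8-core part. This sketch uses hypothetical names and takes the list's Zambezi figures (two 16 KB L1D caches, one 64 KB L1I and one 2 MB L2 per module) as inputs. The even per-core split is bookkeeping only, since L1I and L2 are shared within a module rather than partitioned.

```python
from typing import Dict

KIB = 1024

# Hypothetical inputs: per-module cache capacities taken from the list above.
PER_MODULE_BYTES: Dict[str, int] = {
    "L1D": 2 * 16 * KIB,  # two 16 KB L1D caches, one per integer core
    "L1I": 64 * KIB,      # one 64 KB L1I shared by the module
    "L2": 2048 * KIB,     # one 2 MB L2 shared by the module
}


def chip_totals(modules: int) -> Dict[str, int]:
    """Total capacity per cache type across all modules, in bytes."""
    return {name: modules * size for name, size in PER_MODULE_BYTES.items()}


def even_per_core_share(cores_per_module: int = 2) -> Dict[str, int]:
    """Capacity per core if each module's caches were split evenly.

    L1I and L2 are shared within a module, so this is a bookkeeping figure,
    not a hardware partition.
    """
    return {name: size // cores_per_module for name, size in PER_MODULE_BYTES.items()}


if __name__ == "__main__":
    print({k: v // KIB for k, v in chip_totals(4).items()})
    # {'L1D': 128, 'L1I': 256, 'L2': 8192}  (in KB)
```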
#171
devguy
KRONOSFX wrote:
devguy, you wrote that the L3 cache, the integrated memory controller and the HyperTransport link are shared, and that because of that, Deneb should be just one core if BD isn't an 8 core, or something to that effect. That's a bad comparison in my opinion.
The L3 cache is there specifically so each core can access data from the others. What other reason would there be for it, if the L2 cache is faster? Otherwise, making the L2 larger would be better for performance than creating a new, slower cache.
The IMC is for the CPU to communicate with the memory modules, so why should each core have its own IMC?
HyperTransport, or the Intel equivalent, is the same as the IMC: just communication between the CPU and the northbridge, southbridge or another CPU.
As far as I can recall, none of them was ever included in a core, at least not the IMC and HTT.
It's enough to just look at a BD module and a Deneb core: the difference is just twice the number of integer clusters, but integer clusters alone were never called cores, so why should they be now?

Here's a quote from JF-AMD that you should read:
Um, old school processors had the FPU in a separate socket and few ever populated it. Are you telling me that everything prior to Pentium was a "zero core" or "half core" processor?

Processors are full of shared and discrete components. Memory controller, L2 cache, L3 cache, Northbridge, HT links, etc. All of that stuff can be shared. Why don't you give each core a memory controller? When we went from single core with a single memory controller to dual core with a single memory controller, where was the outrage? You can't really call that a "dual core" with only a single memory controller....

The world is apparently never going to completely agree on what a core is. MOST of the world looks at integer execution clusters as the "core".

Here is something that we can all agree on: They will have a performance level, they will have a power consumption and they will have a price. And those will be the things that people compare to today. I am actually happy to be living in a world that does not force me to make my processors exactly like my competitors.
#172
bucketface
cadaveca wrote:
I've been doing cache speed comparisons since Socket 754. If you search other forums for my posts, you'll find I even compared 1MB vs. 2MB CPUs. You're not informing me (or anyone else) of anything.

Is that at me?...

OK, can you clear something up for me? Doesn't Super Pi mostly stress cache bandwidth/latency, especially at the lower tests, like those below 8MB? I thought it was AMD's cache that was slower than Intel's and not so much the IMC, or does QPI significantly outpace it?

cadaveca wrote:
The question must be raised: is such detailed analysis even necessary? Are we comparing cache, the CPU memory subsystem (which for me is caches, controller, and system memory), the system memory subsystem, or just overall performance?

I raised this point earlier... I care about game performance, so until I get game performance comparisons, none of this really matters to me. Bulldozer could be the slowest CPU ever, but if in some magical way it makes my games play better, then it's a win for me. So, what's really important for you? Games, or something else?

Just curious: you were discussing Phenom memory performance vs. Sandy or something earlier, and I thought it might have some relevance to this. Also, below 8MB the cache is what's being tested, and after that both the cache and the rest of the memory subsystem. I'm not sure what I'm saying anymore... too tired.
Games... look at my rig. I spent $70 on the CPU and $180 on the Gfx. :D
#173
cadaveca
My name is Dave
bucketface wrote:
Is that at me?...

No, not at all! ;) It's aimed at those posting comments like "x87 is now useless". No kidding x87 is useless, as is SuperPi. However, the way the runtime works creates high memory traffic, and that's what we are analyzing, not how fast the CPU does x87, nor how long it takes to calculate Pi to so many digits. It's not a "real-world" performance benchmark, it's a "simulated" performance benchmark, which means, by the nature of those definitions, that it must not be accepted as fact without special considerations. Raising points about the validity of those benchmarks is stating the obvious, and as such, I consider it trolling.
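For readers wondering what the runtime actually computes: Super PI is widely reported to use the Gauss-Legendre algorithm, which repeatedly averages two numbers held at the full target precision and roughly doubles the number of correct digits per iteration. The sketch below is a plain Python illustration of that algorithm using the standard `decimal` module; it is not Super PI's own code (which, as noted in this thread, runs on x87), and the function name is hypothetical. At millions of digits each operand is megabytes in size, which is consistent with the point that large runs generate heavy memory traffic.

```python
from decimal import Decimal, localcontext


def gauss_legendre_pi(digits: int) -> str:
    """Return pi truncated to `digits` decimal places (illustrative sketch)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # a few guard digits against rounding error
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / 4
        p = Decimal(1)
        # Correct digits roughly double per iteration, so log2(digits) + 1 is ample.
        for _ in range(digits.bit_length() + 1):
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        pi = (a + b) ** 2 / (4 * t)
    return str(pi)[: digits + 2]  # "3." followed by `digits` decimals


if __name__ == "__main__":
    print(gauss_legendre_pi(30))  # 3.141592653589793238462643383279
```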
bucketface wrote:
OK, can you clear something up for me? Doesn't Super Pi mostly stress cache bandwidth/latency, especially at the lower tests, like those below 8MB? I thought it was AMD's cache that was slower than Intel's and not so much the IMC.

The question must be raised: is such detailed analysis even necessary? Are we comparing cache, the CPU memory subsystem (which for me is caches, controller, and system memory), the system memory subsystem, or just overall performance?

I raised this point earlier... I care about game performance, so until I get game performance comparisons, none of this really matters to me. Bulldozer could be the slowest CPU ever, but if in some magical way it makes my games play better, then it's a win for me. So, what's really important for you? Games, or something else?
#174
Crap Daddy
So, going back to the screenshots posted at the beginning of this thread, which sparked enthusiasm from some and skepticism from others, we can conclude that:

The AIDA cache and memory benchmark is a disaster for BD, SuperPi the same, and the other benchmarks were run at unknown clocks, so we don't have a true comparison with SB.

We'll have to wait a little longer to really compare BD and SB.
#175
LAN_deRf_HA
cadaveca wrote:
Yes, you are trolling, because although SuperPi is not indicative of real-world performance, it does correlate with overall memory performance, as seen in F1 2010.

You started off saying AMD had better RAM performance, but it does not; it only looks that way in your screenshots because you've got DDR2 vs. DDR3. That's using skewed results that show what you want, rather than the truth. Start with factual comments, and I'll not call you a troll.

I've been doing cache speed comparisons since Socket 754. If you search other forums for my posts, you'll find I even compared 1MB vs. 2MB CPUs. You're not informing me (or anyone else) of anything.

AMD has had better RAM bandwidth than 775 for quite a while now. This shouldn't be news to you. Stop being so stupid about this. It was a CACHE comparison, as I stated, not a RAM comparison. I explained the difference in its effect on cache, which was not enough to skew the comparison. If you're going to argue something, actually argue it; don't just run away and deflect. Reread my posts and look at all 4 screens (the two I posted and the two I linked to), then try again, because you seem to have radically misunderstood what was being discussed.