
AMD Gives Bulldozer 6-core a Speed-Bump with FX-6200

You have to factor in that the Intel CPU has a longer pipeline and wins most of the time... pushing it even more towards 2x

Either a CPU gives performance results 2x faster, or it doesn't. Contrary to your earlier suggestion, in the examples that you give, it doesn't.

Most applications don't care about ISAs, but most benchmarks do.

In most cases, some of these applications won't exist in the native environment
(x87, MMX, 3DNow!, and 64-bit SSE can't exist in x86-64; in x86-64 you have to use SSE2, SSE3, SSE4, SSE5 (AVX+FMA+AVX2+Gather+XOP)).

In most music conversion you see MMX and SSE being used the most, while 64-bit music converters use SSE4. The 64-bit music converter is faster than the 32-bit one, but the 32-bit version is still used more...

Consumers = Relatively Stupid....in these cases

Smart consumers like myself know to wait for applications to use the new ISAs before jumping ship or listening to unimportant reviewers trying to persuade less-informed consumers into making dumb decisions.

Most of the most demanding programs I use (and will use I would think for at least another few years) are represented (by themselves or very similar programs) in that table.
 
Yes, Bulldozer has 8 "cores", but it shares a lot of resources between them. So, in workloads reliant on those shared resources, it'll perform like a quad.
The second sentence here is not true! Yes, the 81xx has only 4 FPUs (one per module), but those FPUs have double the resources of the old FPUs in the PhIIs. And the new one is smarter as well.

This is why you see Phenom x6 beating it in some threaded applications.

What applications are you talking about?

It's simple: BD's single-core performance is worse than PhII's, but it has more cores. So it performs worse in lightly threaded applications and better in well-threaded ones (assuming the clocks are similar).
 
What applications are you talking about?

It's simple: BD's single-core performance is worse than PhII's, but it has more cores. So it performs worse in lightly threaded applications and better in well-threaded ones (assuming the clocks are similar).

With clock speeds the same, there are quite a lot of multithreaded applications where BD suffers vs. Phenom x6. Even with its clock-speed advantage, however:

[Image: 41699.png]


Notice that this application can clearly make good use of 8 threads - the HT enabled i7 thrashes the otherwise near-identical i5.

Another example is F@H. Also, notice that Microsoft's Windows 7 patch today acknowledges this point - it makes sure that tasks are spread between BD modules to avoid bottlenecks.
 
Notice that this application can clearly make good use of 8 threads - the HT enabled i7 thrashes the otherwise near-identical i5.
Hmm, surprising. But I don't think it is because of shared resources, as compiling is an integer task. Of course, there is only one front-end per module, but that's not an issue in other cases. I think the cause must be the relatively slow caches.

Another example is F@H.
Hmm, even more surprising, as F@H is floating-point intensive, which BD is otherwise not bad at. It's probably the caches again.

Also, notice that Microsoft's Windows 7 patch today acknowledges this point - it makes sure that tasks are spread between BD modules to avoid bottlenecks.
We don't yet know what that patch does. Some say it actually packs the threads onto as few modules as it can, to allow Max. Turbo Core to kick in more frequently. (Which would be a bad idea, I think, as there is more to gain from using only one core per module when only a few threads are active.)
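
To make the two possibilities concrete, here is a minimal sketch of the "spread" and "pack" placement policies being argued about. It is only a toy model, not what the Windows patch actually does: the 4-module, 2-cores-per-module layout matches the 81xx as described above, but the core numbering and the function names (spread, pack, modules_in_use) are made up for illustration.

```python
# Toy model (assumption): 4 modules with 2 cores each, numbered so that
# cores 0-1 sit in module 0, cores 2-3 in module 1, and so on.
MODULES = 4
CORES_PER_MODULE = 2


def spread(n_threads):
    """Use one core in every module first, then start on the second cores."""
    order = [m * CORES_PER_MODULE + c
             for c in range(CORES_PER_MODULE)
             for m in range(MODULES)]
    return order[:n_threads]


def pack(n_threads):
    """Fill both cores of a module before touching the next module."""
    return list(range(MODULES * CORES_PER_MODULE))[:n_threads]


def modules_in_use(cores):
    """Count how many distinct modules a set of core IDs occupies."""
    return len({core // CORES_PER_MODULE for core in cores})


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, "threads: spread ->", spread(n), "pack ->", pack(n))
```

With 4 threads, spread puts one thread in each of the 4 modules (no sharing of the front-end or FPU), while pack squeezes them into 2 modules and leaves the other 2 idle, which is the case where Turbo Core would have the most headroom.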
 