Thursday, July 14th 2011

FX-Series Processors Clock Speeds 'Revealed'

In several earlier articles like this one, we learned the model numbers and even possible prices of AMD's next-generation FX-series desktop processors, but the clock speeds stayed under wraps until a table listing them leaked. AMD's FX series consists of eight-core FX-81xx parts, six-core FX-61xx parts, and quad-core FX-41xx parts, the latter two probably harvested from the Zambezi silicon by disabling modules (groups of two closely interconnected cores with some shared resources). Most, if not all, FX-series chips have unlocked multipliers, making them a breeze to overclock. All chips come in the AM3+ package and feature 8 MB of L3 cache and 2 MB of L2 cache per module.

Leading the pack is the FX-8150, with a clock speed of 3.6 GHz and a TurboCore speed of 4.2 GHz, a 600 MHz boost. The next chip, the FX-8120, has a boost of close to a GHz: it is clocked at 3.1 GHz and goes all the way up to 4 GHz with TurboCore. It will be available in 125W and 95W TDP variants. Next up is the FX-8100, with a 2.8 GHz clock speed that goes up to 3.7 GHz, another 900 MHz boost. The scene then shifts to six-core chips. No clock speeds were given for the FX-6120; the FX-6100, on the other hand, is clocked at 3.3 GHz with a 3.9 GHz Turbo. The FX-4100 is the only quad-core part whose clock speeds this source gives: 3.6 GHz, with a tiny 200 MHz boost to 3.8 GHz. There is no consistent pattern in turbo headroom across models, so we ask you to take these figures with a pinch of salt.
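The boost figures above are simply the difference between the two listed clocks. A minimal Python sketch that recomputes them from the leaked table (the dictionary only restates the article's numbers; the FX-6120 is left out because no clocks were given) makes the uneven spread easy to see:

```python
# Base and TurboCore clocks in GHz, restated from the leaked table above.
CLOCKS_GHZ = {
    "FX-8150": (3.6, 4.2),
    "FX-8120": (3.1, 4.0),
    "FX-8100": (2.8, 3.7),
    "FX-6100": (3.3, 3.9),
    "FX-4100": (3.6, 3.8),
}


def turbo_headroom_mhz(base_ghz: float, turbo_ghz: float) -> int:
    """Return the TurboCore boost in MHz, rounded to hide float noise."""
    return round((turbo_ghz - base_ghz) * 1000)


for model, (base, turbo) in CLOCKS_GHZ.items():
    print(f"{model}: {base} -> {turbo} GHz, +{turbo_headroom_mhz(base, turbo)} MHz")
```

The rounding step matters: 4.2 - 3.6 is not exactly 0.6 in binary floating point, so the raw difference would print with trailing noise.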

Source: DonanimHaber

412 Comments on FX-Series Processors Clock Speeds 'Revealed'

#1
seronx
Damn_Smooth said:
Yes, I know that. But which cores will it use?
It will use all that they need

The Windows OS will schedule the threads, not the software or the CPU

If it is a 4-core app you will see any 2 modules being used

If it is a 3-core app you will see any 1 module being used + any 1 half-utilized module
#2
Damn_Smooth
seronx said:
It will use all that they need

The Windows OS will schedule the threads, not the software or the CPU

If it is a 4-core app you will see 2 modules used in any order

If it is a 3-core app you will see any 1 module being used + any 1 half-utilized module
So there is no way to make it use 1 core per module? That would make more sense so that all 4 threads are getting 100% of the resources.

In a 4 threaded application.
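Whether 4 threads land on 4 different modules is up to the OS scheduler, but a program can restrict itself with CPU affinity. A minimal Python sketch of building a one-core-per-module CPU set, assuming (hypothetically) that logical CPUs 2k and 2k+1 are the two cores of module k; a real program should read the topology from the OS rather than assume this numbering:

```python
def one_core_per_module(num_modules: int, cores_per_module: int = 2) -> set[int]:
    """Return the first logical CPU of each module.

    ASSUMPTION (hypothetical numbering): CPUs m*cores_per_module through
    m*cores_per_module + cores_per_module - 1 belong to module m.
    """
    return {m * cores_per_module for m in range(num_modules)}


def affinity_bitmask(cpus: set[int]) -> int:
    """Pack a CPU set into a bitmask with one bit per logical CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Four modules, as on an eight-core FX-81xx part.
cpus = one_core_per_module(4)
print(sorted(cpus), hex(affinity_bitmask(cpus)))
```

On Linux the set could be passed to `os.sched_setaffinity(0, cpus)`; on Windows, `SetThreadAffinityMask` takes the bitmask form. Pinning like this gives up the scheduler's freedom to move threads around in exchange for keeping each thread on its own module, provided the numbering assumption holds.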
#3
seronx
Damn_Smooth said:
So there is no way to make it use 1 core per module? That would make more sense so that all 4 threads are getting 100% of the resources.

In a 4 threaded application.
In real-world tasks there will be no difference between 1 core used and 2 cores used

1 core in a module has access to 100% of the resources
2 cores in a module has access to 100% of the resources

The idea of CMT is to make 2 cores use the same resources to increase throughput/speed

1 module provides 2x the resources 1 core needs

4 cores being used in any pattern or setup will have 100% access to all the resources it needs

Simply put you do not need to worry about the module as a whole

Everything that needs to be dedicated is dedicated and everything that needs to be shared is shared
#4
Wile E
Power User
seronx said:
AMD Zambezi has more IPC per module compared to Intel IPC per core

and AMD Zambezi mimics Tri-channel simply due to how many predictors the IMC has
I'll believe it when I see it.
seronx said:
In real-world tasks there will be no difference between 1 core used and 2 cores used

1 core in a module has access to 100% of the resources
2 cores in a module has access to 100% of the resources only half of the time.

The idea of CMT is to make 2 cores use the same resources to increase throughput/speed

1 module provides 2x the resources 1 core needs

4 cores being used in any pattern or setup will have 100% access to all the resources it needs

Simply put you do not need to worry about the module as a whole

Everything that needs to be dedicated is dedicated and everything that needs to be shared is shared in an ideal world, but we don't live in an ideal world.
Fixed.
#5
seronx
Wile E said:

False
#6
Wile E
Power User
seronx said:
False
Fabrication
#7
seronx
Wile E said:
Fabrication
You got it backwards

There is 800% resources

1 core can only access 100% of those resources

It is a hardware limitation

2 cores will use 200% of those

So, 1 core will completely use the stuff dedicated to it and only half the stuff shared with it; there is only so much 1 core can do

The floating point unit is a dedicated entity shared between both cores, so it does not behave like what we think of as a normal FPU

That is why it is called a Flex FPU

1 core in a module has access to 100% of the resources in 1 module
2 cores in a module has access to 200% of the resources in 1 module

Module holds all the resources needed for 2 cores to run in it
There is no performance hit in this design

Performance to Resources used
100% -> 200% -> 300% -> 400% -> 500% -> 600% -> 700% -> 800%
{50% -> 100%} -> {150% -> 200%} -> {250% -> 300%} -> {350% -> All}

With 4 cores used, in any modules, it will utilize half of the CPU's total resources
With 2 cores used, in any module, it will utilize 1/4 of the CPU's total resources
With 6 cores used, it will utilize 3/4 of the CPU's total resources
With 8 cores used, it will utilize all of the CPU's total resources

Module holds 100% of the stuff needed for 1 core and 2 cores to operate without a bottleneck

---and now for something totally different--------

Brad_Hawthorne@EVGA
I helped setup an AMD event on the 15th and staffed the event on the 16th. The two systems I worked with were Bulldozer 3.4ghz engineering samples. I pulled up the system control panel to confirm the chips as 3.4ghz ES. I have mixed thoughts about it. I liked that they did 3.4ghz on the stock AMD heatsink by default. On the other hand said:
Everything was cranked up to maximum in the settings, running at 5760x1080. No noticeable lag spikes or FPS issues with real-world use in game. I believe the video cards in the rigs were 6990s. The port config was two DVI and two mini-DP. Ran two of the projectors via DVI and the third via a mini-DP-to-DVI adapter. The rigs were connected to D-Box motion-actuated racing chairs with Logitech G27 setups. I have pics and video of the rig configurations on my Canon T2i. I just finished driving from Dallas to Wichita in 6 hours, though, so I'm a bit tired. Will update pics when I wake up later today.


What he does^
#8
Wile E
Power User
If both threads going to a module need to access the fpu, one has to wait. By definition, that's not 100% resource availability.

And eyefinity setups prove absolutely nothing. Gaming is an absolutely terrible metric to judge cpu performance.

Sorry, but I would happily bet money that Intel still wins IPC per core per clock.
#9
Damn_Smooth
Here is a quote from JF-AMD that says that you don't get 100% out of two cores in a module.
OK, daddy is going to do some math, everyone follow along please.

First: There is only ONE performance number that has been legally cleared, 16-core Interlagos will give 50% more throughput than 12-core Opteron 6100. This is a statement about throughput and about server workloads only. You CANNOT make any client performance assumptions about that statement.

Now, let's get started.

First, everything that I am about to say below is about THROUGHPUT and throughput is different than speed. If you do not understand that, then please stop reading here.

Second, ALL comparisons are against the same cores; these are not comparisons across different generations, nor are they comparisons against different architectures.

Assume that a processor core has 100% throughput.

Adding a second core to an architecture is typically going to give ~95% greater throughput. There is obviously some overhead because the threads will stall, the threads will wait for each other and the threads may share data. So, two completely independent cores would equal 195% (100% for the first core, 95% for the second core.)


Looking at SPEC int and SPEC FP, Hyperthreading gives you 14% greater throughput for integer and 22% greater throughput for FP. Let's just average the two together.

One core is 100%. Two threads on that one core with Hyperthreading are 118%. Everyone following so far? We have 195% for 2 threads on 2 cores and we have 118% for 2 threads on 1 core.

Now, one bulldozer core is 100%. Running 2 threads on 2 separate modules would lead to ~195%; it's consistent with running on two independent cores.

Running 2 threads on the same module is ~180%.

You can see why the strategy is more appealing than HT when it comes to threaded workloads. And, yes, the world is becoming more threaded.

Now, where does the 90% come from? What is 180% /2? 90%.

People have argued that there is a 10% overhead for sharing because you are not getting 200%. But, as we saw before, 2 cores actually only equals 195%, so the net per core if you divide the workload is actually 97.5%, so it is roughly a 7-8% delta from just having cores.

Now, before anyone starts complaining about this overhead and saying that AMD is compromising single thread performance (because the fanboys will), keep in mind that a processor with HT equals ~118% for 2 threads, so per thread that equals 59%, so there is a ~36% hit for HT. This is specifically why I think that people need to stay away from talking about it. If you want to pick on AMD for the 7-8%, you have to acknowledge the ~36% hit from HT. But ultimately that is not how people judge these things. Having 5 people in a car consumes more gas than driving alone, but nobody talks about the increase in gas consumption because it is so much less than 5 individual cars driving to the same place.
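The per-thread figures in this post are each two-thread total divided by two. A short Python sketch that reproduces them from the post's own numbers, with nothing else assumed. Measured against the 97.5% per-thread baseline, the HT gap works out to 38.5 points; the post's ~36% matches comparing 59% with the 95% second-core figure instead.

```python
# Two-thread throughput totals from the quoted post (one core alone = 100%).
TWO_THREAD_THROUGHPUT = {
    "two independent cores": 195.0,
    "one Bulldozer module": 180.0,
    "one core with Hyperthreading": 100.0 + (14.0 + 22.0) / 2,  # SPECint/SPECfp average
}


def per_thread(total_pct: float, threads: int = 2) -> float:
    """Split a combined throughput figure evenly across its threads."""
    return total_pct / threads


baseline = per_thread(TWO_THREAD_THROUGHPUT["two independent cores"])  # 97.5
for name, total in TWO_THREAD_THROUGHPUT.items():
    share = per_thread(total)
    print(f"{name}: {share:.1f}% per thread, {baseline - share:.1f} points below baseline")
```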

So, now you know the approximate metrics about how the numbers work out. But what does that mean to a processor? Well, let's do some rough math to show where the architecture shines.

An Orochi die has 8 cores. Let's say, for sake of argument, that if we blew up the design and said not modules, only independent cores, we'd end up with about 6 cores.

Now let's compare the two with the assumption that all of the cores are independent on one and in modules on the other. For sake of argument we will assume that all cores scale identically and that all modules scale identically. The fact that incremental cores scale to something less than 100% is already comprehended in the 180% number, so don't fixate on that. In reality the 3rd core would not be at 95% but we are holding that constant for example.

Mythical 6-core bulldozer:
100% + 95% + 95% + 95% + 95% + 95% = 575%

Orochi die with 4 modules:
180% + 180% + 180% + 180% = 720%

What if we had just done a 4 core and added HT (keeping in the same die space):
100% + 95% + 95% + 95% + 18% + 18% + 18% + 18% = 457%

What about a 6 core with HT (has to assume more die space):
100% + 95% + 95% + 95% + 95% + 95% + 18% + 18% + 18% + 18% + 18% + 18% = 683%
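The four totals above follow from the per-unit assumptions stated in the post: 100% for the first core, 95% for each additional core, 180% per two-core module, and +18% per Hyperthreaded core. A Python sketch that encodes exactly those assumptions and recomputes each scenario; like the post, it is a same-core throughput illustration, not a performance estimate:

```python
FIRST_CORE = 100.0  # the first core
EXTRA_CORE = 95.0   # each additional independent core (held constant, as in the post)
MODULE = 180.0      # one two-core Bulldozer module
HT_BONUS = 18.0     # extra throughput Hyperthreading adds to one core


def cores(n: int) -> float:
    """n independent cores: 100% for the first, 95% for each one after it."""
    return FIRST_CORE + EXTRA_CORE * (n - 1) if n > 0 else 0.0


def modules(n: int) -> float:
    """n two-core modules at 180% each."""
    return MODULE * n


def cores_with_ht(n: int) -> float:
    """n independent cores, each gaining 18% from Hyperthreading."""
    return cores(n) + HT_BONUS * n


scenarios = {
    "mythical 6-core": cores(6),
    "Orochi, 4 modules": modules(4),
    "4 cores + HT": cores_with_ht(4),
    "6 cores + HT": cores_with_ht(6),
}
for name, pct in scenarios.items():
    print(f"{name}: {pct:.0f}%")
```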

(Spoiler alert - this is a comparison using the same cores, do NOT start saying that there is a 25% performance gain over a 6-core Thuban, which I am sure someone is already starting to type.)

The reality is that by making the architecture modular and by sharing some resources you are able to squeeze more throughput out of the design than if you tried to use independent cores or tried to use HT. In the last example I did not take into consideration that the HT circuitry would have delivered an extra 5% circuitry overhead....

Every design has some degree of tradeoff involved, there is no free lunch. The goal behind BD was to increase core count and get more throughput. Because cores scale better than HT, it's the most predictable way to get there.

When you do the math on die space vs. throughput, you find that adding more cores is the best way to get to higher throughput. Taking a small hit on overall performance but having the extra space for additional cores is a much better tradeoff in my mind.

Nothing I have provided above would allow anyone to make a performance estimate of BD vs. either our current architecture or our competition, so, everyone please use this as a learning experience and do not try to make a performance estimate, OK?
http://www.xtremesystems.org/forums/showthread.php?267050-What-to-Expect-From-AMD-at-ISSCC-2011&p=4755711#post4755711

So now I'm back to hoping that they can figure out a way to make 4 threads run on 4 modules.
#10
seronx
Wile E said:
If both threads going to a module need to access the fpu, one has to wait. By definition, that's not 100% resource availability.

And eyefinity setups prove absolutely nothing. Gaming is an absolutely terrible metric to judge cpu performance.

Sorry, but I would happily bet money that Intel still wins IPC per core per clock.
The FPU isn't a resource and both threads aren't going to wait for the FPU since it is tasked way differently than in CMP

256bit commands are done at the module level not the core level
(Meaning AVX support is the same as Intel's AVX for compatibility)

SSE5 is where it is at for AMD(XOP, CVT16, FMA4)

SSE is done at the core level(8xSSE)(SSE5 128bit)
AVX is done at the module level(4xAVX)(AVX 128bit+AVX 128bit)

Sorry, but I would bet money that AMD and Intel have equal IPC per core per clock

Damn_Smooth said:
Here is a quote from JF-AMD that says that you don't get 100% out of two cores in a module.

So now I'm back to hoping that they can figure out a way to make 4 threads run on 4 modules.
He isn't talking about what I am talking about

It's harder to explain once you go from the throughput world to the speed world

There is no overhead....that is the issue
The core CMP issue is there but isn't really bad

CMT scales on the module level not the core level
200% -> 397.5% -> 595% -> 792.5%
vs CMP
100% -> 197.5% -> 295% -> 392.5% -> 490% -> 587.5% -> 685% -> 782.5%

Do you see the trade off?
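The two sequences above use the poster's own increments: 200% for the first module and +197.5% per extra module, versus 100% for the first core and +97.5% per extra core. A Python sketch that regenerates them. The JF-AMD post quoted earlier puts two threads on one module at ~180%, not 200%, so these figures rest entirely on the poster's assumptions.

```python
def cmt_scaling(modules: int) -> list[float]:
    """Cumulative throughput per added module: 200% first, +197.5% each after."""
    return [200.0 + 197.5 * i for i in range(modules)]


def cmp_scaling(cores: int) -> list[float]:
    """Cumulative throughput per added core: 100% first, +97.5% each after."""
    return [100.0 + 97.5 * i for i in range(cores)]


print(cmt_scaling(4))
print(cmp_scaling(8))
```

At eight cores the two sequences end at 792.5% and 782.5%, so under these assumptions the gap is 10 points.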
#11
Wile E
Power User
seronx said:
The FPU isn't a resource and both threads aren't going to wait for the FPU since it is tasked way differently than in CMP

256bit commands are done at the module level not the core level
(Meaning AVX support is the same as Intel's AVX for compatibility)

SSE5 is where it is at for AMD(XOP, CVT16, FMA4)

SSE is done at the core level(8xSSE)(SSE5 128bit)
AVX is done at the module level(4xAVX)(AVX 128bit+AVX 128bit)

Sorry, but I would bet money that AMD and Intel have equal IPC per core per clock



He isn't talking about what I am talking about

It's harder to explain once you go from the throughput world to the speed world
Bullshit, if both threads need floating point, one has to wait, plain and simple fact. I don't care about SSE5, that's a completely irrelevant distraction, and doesn't change the point at all.
#12
seronx
Wile E said:
Bullshit, if both threads need floating point, one has to wait, plain and simple fact. I don't care about SSE5, that's a completely irrelevant distraction, and doesn't change the point at all.
No...there is no waiting for each core

AVX done on both cores or done half-length
128bit AVX+128bit AVX
2x128bit AVX

You not understanding this is lousy tidings

Damn_Smooth said:
Here is a quote from JF-AMD that says that you don't get 100% out of two cores in a module.

So now I'm back to hoping that they can figure out a way to make 4 threads run on 4 modules.
Back to you

4 core Orochi vs 4 core Phenom II

Both score 4000~
But in multithreading there is an overhead but the Orochi design alleviates that to the module level and not to the core level

4 core Orochi will get a real world score of 15000~ where in a no-overhead world it will get 16k
4 core Phenom II will get a real world score of 14000~ where in a no-overhead world it will get 16k

The distance gets even bigger with more cores

8 core Orochi vs 8 core Phenom II

4000 again
8 core Orochi will get a real world score of 30000 where in a no-overhead world it will get 32k
8 core Phenom II will get a real world score of 28000 where in a no-overhead world it will get 32k

But that is at the same clocks and for the same IPC

Phenom II has 3 IPC per core while Zambezi has 4 IPC per core(this is where the 25% comes in)

and Zambezi will have a higher clock

Same clocks though
Phenom II 3.4GHz
4200
Zambezi ignoring all the extra stuff that increases a little bit
5000~ (I'm going to say it will get 5000ish, ±400)

Phenom II 8C - 29400
Zambezi 8C - 34500

But that is if it is well programmed
#13
Wile E
Power User
Thanks, but I'll wait for the real info to release.
#14
seronx
Wile E said:
Thanks, but I'll wait for the real info to release.


Is this real enough?

128bit Execution per Core
256bit Execution per Module
#15
xenocide
o.O

128-bit - 32 FLOPS
256-bit - 64 FLOPS

wait what?
#16
seronx
xenocide said:
o.O

128-bit - 32 FLOPS
256-bit - 64 FLOPS

wait what?
It's due to the FMACs

Intel doesn't have FMACs

1x128bit
or
1x256bit
per core

AMD has FMACs

1x128bit
per core
1x256bit
per module

The best I can come up with
#17
Benetanegia
I'm amazed at the sheer amount of BS that you can write in one night. lol Not trying to be offensive, it's almost a compliment.

Anyway, FMAC has nothing to do with that. FMAC is the way the math is done. AMD used 2x128-bit FMAC units, which means 2 fused multiply-accumulate units.

Intel used 1x 256 bit FMUL and 1x 256 bit FADD. The result is similar.
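Why the result is similar can be checked with peak-rate arithmetic: lanes per unit (register width divided by element width), times units, times FLOPs per lane (2 for a fused multiply-add, 1 for a plain add or multiply). A Python sketch, assuming every pipe issues every cycle, which real code rarely achieves. The Bulldozer figure is per module (shared by two cores); the Intel figure is per core.

```python
def peak_flops_per_cycle(units: int, width_bits: int, elem_bits: int, flops_per_lane: int) -> int:
    """Theoretical peak floating point operations per clock cycle."""
    lanes = width_bits // elem_bits  # e.g. 128-bit / 32-bit = 4 single-precision lanes
    return units * lanes * flops_per_lane


# Bulldozer module: two 128-bit FMAC units; each FMA counts as 2 FLOPs per lane.
bd_sp = peak_flops_per_cycle(units=2, width_bits=128, elem_bits=32, flops_per_lane=2)
bd_dp = peak_flops_per_cycle(units=2, width_bits=128, elem_bits=64, flops_per_lane=2)

# Intel core as described above: one 256-bit FMUL plus one 256-bit FADD,
# i.e. two 256-bit units doing 1 FLOP per lane each.
intel_sp = peak_flops_per_cycle(units=2, width_bits=256, elem_bits=32, flops_per_lane=1)
intel_dp = peak_flops_per_cycle(units=2, width_bits=256, elem_bits=64, flops_per_lane=1)

print(f"Bulldozer module: {bd_sp} SP / {bd_dp} DP FLOPs per cycle")
print(f"Intel core:       {intel_sp} SP / {intel_dp} DP FLOPs per cycle")
```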

The difference is that BD can use 1x 128 bit for each "core", which may or may not be an advantage for legacy code that is heavily parallelized (8 threads). In the server arena this might be a real advantage; on the desktop it will most probably help nothing (8 threads required).

What AMD doesn't say either is that the 128+128 = 256 bit operation is slower than the "native" 256 bit operation, so slower for AVX, there is overhead. Pretending there is not, is just like believing in fairies.

Or just like believing that GlobalFoundries or not, the yields are the same for an old architecture and a new architecture. :rolleyes:
#18
AphexDreamer
I have no idea what is going on here or what is being said but would honestly like to know.

I do plan on picking up one of these processors, of course after they are actually out and I've read a couple of reviews.

Should I worry about all the info that is being thrown around here?
#19
Benetanegia
AphexDreamer said:
Should I worry about all the info that is being thrown around here?
As someone who only wants to buy the thing, not really. Wait until the reviews are performed and make your decision based on the performance for your preferred applications.

We just like to talk about and predict performance based on our knowledge of the architecture and the different tech utilized.
#20
seronx
Benetanegia said:
blah blah blah
Using 128bit+128bit is 6% slower; I am too lazy to google up what we already should know

The yields are better

Because they have been producing AMD Zambezi chips since
Late August 2010(8 weeks after Bulldozer was taped out)
Late August 2010 -Late October 2010 = A1
November 2010 - January 2011 = B0
February 2011- April 2011 = B1
May 2011 - July 2011 = B2
^That span of time I am pretty sure they don't have yield issues

Since, the desktop market likes the legacy benchies it will do great

AphexDreamer said:
I have no idea what is going on here or what is being said but would honestly like to know.

I do plan on picking up one of these processors, of course after they are actually out and I've read a couple of reviews.

Should I worry about all the info that is being thrown around here?
No, you shouldn't; we are bickering about stuff you won't have to worry about

Benetanegia said:
As someone who only wants to buy the thing, not really. Wait until the reviews are performed and make your decision based on the performance for your preferred applications.

We just like to talk about and predict performance based on our knowledge of the architecture and the different tech utilized.
Basically
#21
theoneandonlymrk
seronx said:
No, you shouldn't; we are bickering about stuff you won't have to worry about
and what you know little about. For all you know they may allow the use of the 2x128-bit FPU by 1 core if it's underused, meaning it would have 100+% resources. im not saying it will, just saying that not one person on here knows what them mofos are !actually! gonna be capable of regarding speed or function. you're spreading FUD, simples, and clearly sat on an AMD dildo:eek:

not once have i seen you call it your imho:wtf:, no, you're waffling like it's a fact. and no spreadsheets or screenies please, ive seen em all and been following it as long as everyone else on TPU:D
#22
Benetanegia
seronx said:
Using 128bit+128bit is 6% slower; I am too lazy to google up what we already should know

The yields are better

Because they have been producing AMD Zambezi chips since
Late August 2010(8 weeks after Bulldozer was taped out)
Late August 2010 -Late October 2010 = A1
November 2010 - January 2011 = B0
February 2011- April 2011 = B1
May 2011 - July 2011 = B2
^That span of time I am pretty sure they don't have yield issues
And why did they have so many revisions? Because yields were not good, my friend. And now they delayed it again, for which reason? Obvious. They still have some issues.
Since, the desktop market likes the legacy benchies it will do great
Even if that was true:

legacy apps == poor multi-threading == don't dream of 4 threads being fully utilized, let alone 8 == 128+128 bit advantage goes down the drain.

And when major developers start using AVX in 1-2 years tops, BD will have the disadvantage. "Only" 6% if you will (I want proof btw), still a big one considering that the die size increases too.
#23
seronx
theoneandonlymrk said:
and what you know little about. For all you know they may allow the use of the 2x128-bit FPU by 1 core if it's underused, meaning it would have 100+% resources. im not saying it will, just saying that not one person on here knows what them mofos are !actually! gonna be capable of regarding speed or function. you're spreading FUD, simples, and clearly sat on an AMD dildo:eek:

not once have i seen you call it your imho:wtf:, no, you're waffling like it's a fact. and no spreadsheets or screenies please, ive seen em all and been following it as long as everyone else on TPU:D
http://blogs.amd.com/work/2010/10/25/the-new-flex-fp/

You need to read this
One of the most interesting features planned for our next generation core architecture, which features the new “Bulldozer” core, is something called the “Flex FP”, which delivers tremendous floating point capabilities for technical and financial applications.

For those of you not familiar with floating point math, this is the high level stuff, not 1+1 integer math that most applications use. Technical applications and financial applications that rely on heavy-duty use of floating point math could see huge increases in performance over our existing architectures, as well as far more flexibility.

The heart of this new feature is a flexible floating point unit called the Flex FP. This is a single floating point unit that is shared between two integer cores in a module (so a 16-core “Interlagos” would have 8 Flex FP units). Each Flex FP has its own scheduler; it does not rely on the integer scheduler to schedule FP commands, nor does it take integer resources to schedule 256-bit executions. This helps to ensure that the FP unit stays full as floating point commands occur. Our competitors’ architectures have had single scheduler for both integer and floating point, which means that both integer and floating point commands are issued by a single shared scheduler vs. having dedicated schedulers for both integer and floating point executions.

There will be some instruction set extensions that include SSSE3, SSE 4.1 and 4.2, AVX, AES, FMA4, XOP, PCLMULQDQ and others.

One of these new instruction set extensions, AVX, can handle 256-bit FP executions. Now, let’s be clear, there is no such thing as a 256-bit command. Single precision commands are 32-bit and double precision are 64-bit. With today’s standard 128-bit FPUs, you execute four single precision commands or two double precision commands in parallel per cycle. With AVX you can double that, executing eight 32-bit commands or four 64-bit commands per cycle – but only if your application supports AVX. If it doesn’t support AVX, then that flashy new 256-bit FPU only executes in 128-bit mode (half the throughput). That is, unless you have a Flex FP.

In today’s typical data center workloads, the bulk of the processing is integer and a smaller portion is floating point. So, in most cases you don’t want one massive 256-bit floating point unit per core consuming all of that die space and all of that power just to sit around watching the integer cores do all of the heavy lifting. By sharing one 256-bit floating point unit per every 2 cores, we can keep die size and power consumption down, helping hold down both the acquisition cost and long-term management costs.

The Flex FP unit is built on two 128-bit FMAC units. The FMAC building blocks are quite robust on their own. Each FMAC can do an FMAC, FADD or a FMUL per cycle. When you compare that to competitive solutions that can only do an FADD on their single FADD pipe or an FMUL on their single FMUL pipe, you start to see the power of the Flex FP – whether 128-bit or 256-bit, there is flexibility for your technical applications. With FMAC, the multiplication or addition commands don’t start to stack up like a standard FMUL or FADD; there is flexibility to handle either math on either unit. Here are some additional benefits:

Non-destructive DEST via FMA4 support (which helps reduce register pressure)
Higher accuracy (via elimination of intermediate round step)
Can accommodate FMUL OR FADD ops (if an app is FADD limited, then both FMACs can do FADDs, etc), which is a huge benefit
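The "higher accuracy" point comes from rounding once instead of twice. A small Python sketch that emulates a fused multiply-add by computing a*b + c exactly with `fractions.Fraction` and rounding only the final result, then compares it with a separate multiply followed by an add. The input values are arbitrary choices picked to expose the difference; this emulates FMA semantics in software and is not how the hardware computes it.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded once to the nearest double (FMA semantics)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Separate FMUL then FADD: the product is rounded before the add is rounded."""
    product = a * b  # first rounding: 1 - 2**-60 rounds to exactly 1.0
    return product + c  # second rounding


a, b, c = 1 + 2**-30, 1 - 2**-30, -1.0
print(separate_multiply_add(a, b, c))  # 0.0: the small term vanished in the first rounding
print(fused_multiply_add(a, b, c))     # -2**-60, the exact answer
```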

The new AES instructions allow hardware to accelerate the large base of applications that use this type of standard encryption (FIPS 197). The “Bulldozer” Flex FP is able to execute these instructions, which operate on 16 Bytes at a time, at a rate of 1 per cycle, which provides 2X more bandwidth than current offerings.

By having a shared Flex FP the power budget for the processor is held down. This allows us to add more integer cores into the same power budget. By sharing FP resources (that are often idle in any given cycle) we can add more integer execution resources (which are more often busy with commands waiting in line). In fact, the Flex FP is designed to reduce its active idle power consumption to a mere 2% of its peak power consumption.

The Flex FP gives you the best of both worlds: performance where you need it yet smart enough to save power when you don’t need it.

The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores. With each cycle, either core can operate on 256 bits of parallel data via two 128-bit instructions or one 256-bit instruction, OR each of the integer cores can execute 128-bit commands simultaneously. This is not something hard coded in the BIOS or in the application; it can change with each processor cycle to meet the needs at that moment. When you consider that most of the time servers are executing integer commands, this means that if a set of FP commands need to be dispatched, there is probably a high likelihood that only one core needs to do this, so it has all 256-bit to schedule.
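A toy Python model of the per-cycle sharing described above: two 128-bit pipes per module, and each core asks for 0, 1 or 2 slots in a cycle (a 256-bit operation needs both). The pipe count comes from the post; the tie-break rule used when the two cores together want more than two slots is a hypothetical choice, not AMD's actual scheduling policy. The model shows both sides of the argument in this thread: under light demand each core gets what it asks for, and under full contention one core's work waits a cycle.

```python
PIPES = 2  # two 128-bit FMAC pipes per Flex FP, per the quoted post


def allocate(req0: int, req1: int, favor_core0: bool = True) -> tuple[int, int]:
    """Grant 128-bit pipe slots to (core0, core1) for a single cycle.

    req0, req1: slots each core requests (0-2; a 256-bit op needs 2).
    favor_core0: HYPOTHETICAL tie-break, used only when demand exceeds PIPES.
    Slots that are not granted must wait for a later cycle.
    """
    for req in (req0, req1):
        if not 0 <= req <= PIPES:
            raise ValueError("each core can request 0, 1 or 2 slots")
    if req0 + req1 <= PIPES:
        return req0, req1  # no contention: both cores are fully served
    if favor_core0:
        return req0, PIPES - req0  # core1 gets only what is left over
    return PIPES - req1, req1  # core0 gets only what is left over


print(allocate(2, 0))  # one core uses the full 256 bits
print(allocate(1, 1))  # each core runs a 128-bit op at the same time
print(allocate(2, 2))  # contention: core1's 256-bit op waits
```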

Floating point operations typically have longer latencies so their utilization is typically much lower; two threads are able to easily interleave with minimal performance impact. So the idea of sharing doesn’t necessarily present a dramatic trade-off because of the types of operations being handled.



As you can see, the flexibility of the FPU really gives total flexibility to the system, designed to deliver optimized performance per core per cycle.

Also, each of our pipes can seamlessly handle SSE or AVX as well as FMUL, FADD, or FMAC providing the greatest flexibility for any given application. Existing apps will be able to take full advantage of our hardware with potential for improvement by leveraging the new ISAs.

Obviously, there are benefits of recompiled code that will support the new AVX instructions. But, if you think that you will have some older 128-bit FP code hanging around (and let’s face it, you will), then don’t you think having a flexible floating point solution is a more flexible choice for your applications? For applications to support the new 256-bit AVX capabilities they will need to be recompiled; this takes time and testing, so I wouldn’t expect to see rapid movement to AVX until well after platforms are available on the streets. That means in the meantime, as we all work through this transition, having flexibility is a good thing. Which is why we designed the Flex FP the way that we have.

If you have gotten this far, you are probably thinking that the technical discussion might be a bit beyond a guy with a degree in economics. I’d like to take a moment to thank Jay Fleischman and Kevin Hurd, two geniuses who really understand how all of these pieces fit together to make the Flex FP really unique in the industry.
Benetanegia said:
And why did they have so many revisions? Because yields were not good, my friend. And now they delayed it again, for which reason? Obvious. They still have some issues.
And those issues weren't yield bent


Benetanegia said:

Even if that was true:

legacy apps == poor multi-threading == don't dream of 4 threads being fully utilized, let alone 8 == 128+128 bit advantage goes down the drain.

And when major developers start using AVX in 1-2 years tops, BD will have the disadvantage. "Only" 6% if you will (I want proof btw), still a big one considering that the die size increases too.
I am mainly talking about Cinebench, wPrime, and those other benchies
#24
theoneandonlymrk
seronx said:
You need to read this
yes cos ive not seen that before
#25
seronx
theoneandonlymrk said:
:ohwell:
The cores do not need the Floating Point Unit

The Floating Point Unit needs the cores

:roll:

Oh me....forgetting about that single fact