Wednesday, August 13th 2014

Intel Haswell TSX Erratum as Grave as AMD Barcelona TLB Erratum

Intel's "Haswell" micro-architecture introduced the transactional synchronization extensions (TSX) as part of its upgraded feature-set over its predecessor. The instructions are designed to speed up certain types of multithreaded software, and although they're too new for any major software vendor to implement, some of the more eager independent software developers began experimenting with them, only to discover that TSX is buggy and can cause critical software failures.

The buggy TSX implementation on Core "Haswell" processors was discovered by a developer outside Intel, who reported it to the company, which then labeled it as an erratum (a known design flaw). Intel is addressing the situation by releasing a micro-code update to motherboard manufacturers, who will then release it as a BIOS update to customers. The update disables TSX on affected products (Core and Xeon "Haswell" retail, and "Broadwell-Y" engineering samples).
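
For readers who want to see whether their own system still reports TSX, here is a minimal sketch, assuming a Linux machine. The kernel lists CPU feature flags in /proc/cpuinfo, where the two halves of TSX appear as `hle` and `rtm`; a CPU running the patched micro-code would presumably no longer list them. The helper names `tsx_flags` and `read_cpuinfo` are made up for this example.

```python
def tsx_flags(cpuinfo_text):
    """Return the TSX-related flags ('hle', 'rtm') found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines carry feature names; ignore everything else.
        if sep and key.strip() == "flags":
            found |= {"hle", "rtm"} & set(value.split())
    return found


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string where it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    flags = tsx_flags(read_cpuinfo())
    print("TSX flags reported:", ", ".join(sorted(flags)) or "none")
```

This only reads what the operating system reports; it does not execute any TSX instructions, so it should be safe to run on an affected chip.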

TechReport's Scott Wasson draws a parallel between the TSX erratum and the infamous translation lookaside buffer (TLB) erratum of AMD's "Barcelona" chips, which caused the company to temporarily halt production of its first single-die quad-core Opteron processors, and release similar "performance-impacting" micro-code updates for its consumer Phenom X4 processors. Expect your motherboard vendor to dish out a BIOS update with Intel's micro-code patch very soon.

Source: TechReport

20 Comments on Intel Haswell TSX Erratum as Grave as AMD Barcelona TLB Erratum

#1
BiggieShady
The patch should disable TSX instructions on the CPU and prevent you from blue screening your PC if you are developing bleeding-edge applications using TSX instructions. It's funny how the people who are affected are the ones that don't want TSX disabled.
Posted on Reply
#2
HumanSmoke
"Intel Haswell TSX Erratum as Grave as AMD Barcelona TLB Erratum"

Seems a bit sensationalist doesn't it?
Barcelona's TLB was producing erroneous results in enterprise workloads from day one. As far as I'm aware the only TSX enabled processors are desktop Haswell and whatever Broadwell samples are being sent around...and the grand total of software applications using TSX totals zero.
Posted on Reply
#3
Scrizz
by: HumanSmoke
"Intel Haswell TSX Erratum as Grave as AMD Barcelona TLB Erratum"

Seems a bit sensationalist doesn't it?
Barcelona's TLB was producing erroneous results in enterprise workloads from day one. As far as I'm aware the only TSX enabled processors are desktop Haswell and whatever Broadwell samples are being sent around...and the grand total of software applications using TSX totals zero.
this

it reeks of flame bait?
Posted on Reply
#4
cyneater
Haswell processors below 45xx as well as R-series and K-series (with unlocked multiplier) SKUs do not support TSX.
Posted on Reply
#5
john_
It's something that is on every Haswell. It's not really something minor. It could be even worse than the TLB bug on Barcelona considering the market share of AMD back then and the market share of Intel today. TSX affects much more people. The difference here is that in the case of Haswell people just don't give a sh!t because Haswell is anyway the fastest cpu available. So disabling something that we haven't seen yet on a processor that is the fastest available isn't really something that will make someone lose his sleep. On the other hand Barcelona was something that everyone was expecting like the second coming of AMD. Performance wasn't there and the TLB bug was just the icing on the (bad) cake. Making something that wasn't fast enough perform even slower because of a bug was just terrible.

The title in my opinion is justified. It is the conditions that are totally different. It's how, because of different conditions, we sometimes look at the same thing from two totally different angles.
Posted on Reply
#6
Thefumigator
Barcelona TLB Bug was nasty, however it triggered in rare occasions for the common user.
Haswell TSX bug is stupid. As if Intel didn't have enough to make their things right... c'mon intel...
Posted on Reply
#7
Assimilator
When I saw the title I had to check if I was on SemiAccurate. I'd expect this sort of fear-mongering click-baiting headline from ol' Charlie, not TPU. Factually, the bug affects far fewer people than the TLB bug ever did, which makes it far less serious.

by: john_
TSX affects much more people.
Show me a piece of commercial software that uses TSX? Right, you can't, because such software doesn't exist. TSX is too niche and too new to have such support. And it doesn't affect the performance of any applications, except those that are written explicitly to use it... which is a very low number.

On the other hand, the Barcelona TLB bug "fix" affected ALL software - anyone who bought an AMD CPU with the TLB bug experienced reduced performance overall.

by: john_
The difference here is that in the case of Haswell people just don't give a sh!t because Haswell is anyway the fastest cpu available.
No, we don't give a s**t because there's no reason to. The ordinary users who make up 99.999999% of Haswell's marketshare will never use TSX, so they won't be affected.

by: Thefumigator
Haswell TSX bug is stupid. As if Intel didn't have enough to make their things right... c'mon intel...
TSX has been around since 2012 and hasn't seen wide adoption. Designing silicon is difficult. Put the two together and it's not difficult to figure out how a bug like this could get into production silicon.
Posted on Reply
#8
GhostRyder
Things happen and Intel is cleaning it up, not that big a deal...
Posted on Reply
#9
eidairaman1
I find it funny that people are downplaying this problem like intel is a saint. This is big enough of a problem that Intel is releasing a patch...
Posted on Reply
#10
HumanSmoke
by: eidairaman1
I find it funny that people are downplaying this problem like intel is a saint. This is big enough of a problem that Intel is releasing a patch...
Not really. The error has been found before any harm was done since no TSX workloads are in use. It comes under the heading of annoying and costly for Intel who will no doubt be sending a lot of Haswell-E CPUs to a recycling facility, but in the end, the consumer is aware of the issue before they ever have cause to be affected by it.
Kind of sucks if you bought Haswell because of the TSX extension, but I'm picking most people don't even know what it is let alone have it on their must have list.
by: john_
It's something that is on every Haswell. It's not really something minor. It could be even worse than the TLB bug on Barcelona considering the market share of AMD back then and the market share of Intel today.
That doesn't make a great deal of sense to me. Overall market share has nothing to do with it - even market shares comparing Haswell users to Opteron users (while more relevant - I don't believe Haswell owners make up the 14.6% of their market that Opteron did in Q4 2007). AMD had to cease in-progress shipments of a flagship enterprise part that was causing errors in current workloads.
by: john_
TSX affects much more people.
How can it affect more people when there is no software that could take advantage of TSX even if it were available?
by: john_
The difference here is that in the case of Haswell people just don't give a sh!t because Haswell is anyway the fastest cpu available.
No, they don't (maybe) give a sh!t because they couldn't use TSX in any case.
by: john_
So disabling something that we haven't seen yet on a processor that is the fastest available isn't really something that will make someone lose his sleep.
More to the point, disabling something you couldn't use anyway doesn't make the bug a dealbreaker for most people.
by: john_
On the other hand Barcelona was something that everyone was expecting like the second coming of AMD. Performance wasn't there and the TLB bug was just the icing on the (bad) cake. Making something that wasn't fast enough perform even slower because of a bug was just terrible.
And that is another very large distinction. Disabling TSX doesn't affect performance using other ISA extensions, whereas the workaround for the TLB bug for Barcelona came at a ~5-20% performance penalty.
Posted on Reply
#11
MikeMurphy
I wonder if it's too late to fix on Broadwell.
Posted on Reply
#12
HumanSmoke
by: MikeMurphy
I wonder if it's too late to fix on Broadwell.
The TSX bug is fixable, it would however need a silicon respin.
For the Broadwells already circulating (Broadwell-Y)? No, but I think only OEMs have them for validation at the moment.
From the Intel spokesflunky's wording, the reason for the bug has been found, so it just comes down to how quickly new lithography masks can be set up and the 8-12 weeks for fabbing the chips. When it's implemented depends on how seriously Intel views the bug. If it's serious then they'd likely get on to it quickly and the new chips will receive new sSpec codes at the very least.
Posted on Reply
#14
boberino
by: btarunr
Intel's "Haswell" micro-architecture introduced the transactional synchronization extensions (TSX) as part of its upgraded feature-set over its predecessor. The instructions are designed to speed up certain types of multithreaded software, and although they're too new for any major software vendor to implement, some of the more eager independent software developers began experimenting with them, only to discover that TSX is buggy and can cause critical software failures.

The buggy TSX implementation on Core "Haswell" processors was discovered by a developer outside Intel, who reported it to the company, which then labeled it as an erratum (a known design flaw). Intel is addressing the situation by releasing a micro-code update to motherboard manufacturers, who will then release it as a BIOS update to customers. The update disables TSX on affected products (Core and Xeon "Haswell" retail, and "Broadwell-Y" engineering samples).

TechReport's Scott Wasson draws a parallel between the TSX erratum, and the infamous translation lookaside buffer (TLB) erratum of AMD's "Barcelona" chips, which caused the company to temporarily halt production of its first single-die quad-core Opteron processors, and release similar "performance-impacting" micro-code updates for its consumer Phenom X4 processors. Expect your motherboard vendor to dish out a BIOS update with Intel's micro-code patch very soon.

Source: TechReport
TechReport's Scott Wasson said this:
Also, because the problem is apparently restricted to the use of TSX instructions, this erratum isn't likely to prompt the sort of dire consequences the TLB erratum in AMD's Barcelona chip did. As we exclusively reported at the time, the Barcelona TLB problem caused AMD to stop the shipment of Opteron processors and issue a performance-impacting microcode patch for consumer Phenom CPUs. By contrast, the most unfortunate impact of this TSX erratum may be to slow the development of TSX-capable software.
Any chance you can help me understand why I shouldn't chalk this up to another misrepresentation of a regurgitated article?
Posted on Reply
#15
Steevo
by: boberino
TechReport's Scott Wasson said this:



Any chance you can help me understand why I shouldn't chalk this up to another misrepresentation of a regurgitated article?
http://www.anandtech.com/show/6290/making-sense-of-intel-haswell-transactional-synchronization-extensions/4

How about all the companies that may have purchased Haswell hardware with the intention of using it to improve database performance, along with other time sensitive software performance, and now.....can't.

http://www.anandtech.com/show/6290/making-sense-of-intel-haswell-transactional-synchronization-extensions


Like banks, and like most of all other financial institutions, where consumers are demanding faster deposit, transfer, and availability times, which requires...... you guessed it, faster database processing for millions of transactions per hour.

Good troll post asshat.
Posted on Reply
#16
HumanSmoke
by: Steevo
How about all the companies that may have purchased Haswell hardware with the intention of using it to improve database performance, along with other time sensitive software performance, and now.....can't. Like banks, and like most of all other financial institutions, where consumers are demanding faster deposit, transfer, and availability times, which requires...... you guessed it, faster database processing for millions of transactions per hour.
Like banks? When was the last time you saw a big company server or datacentre using consumer desktop CPUs?
AFAIK, enterprise systems will continue to be Xeon E5 (Grantley) and E7 (Brickland). The former might initially be affected, but since their launches will be staggered it's probably a safe bet that not all will be. The E7 line (Haswell-EX - which is being pushed as the cloud computing big data SKUs) will have TSX enabled.
by: Steevo
Good troll post asshat.
Given your argument isn't exactly watertight how about toning down the insults?
Posted on Reply
#17
Jizzler
Guess I'll be holding off on my Haswell purchase. Kind of like when I didn't purchase those Opterons and instead bought Xeon systems :twitch:

The degree of effect these bugs have on computing is only half the discussion as to whether or not they are equally grave. They are not, given the circumstances, but in my case they resulted in similar ends. That matters most to a bottom-line company such as Intel.

Now, will the bottom-line be affected equally? Probably not, given that Intel has the experience and resources to make a speedy recovery while keeping a happy face about it. Just saying that this aspect should not be forgotten as other companies are not as lucky as Intel. Their underestimation has led to much more dire consequences.
Posted on Reply
#18
boberino
by: Steevo
http://www.anandtech.com/show/6290/making-sense-of-intel-haswell-transactional-synchronization-extensions/4

How about all the companies that may have purchased Haswell hardware with the intention of using it to improve database performance, along with other time sensitive software performance, and now.....can't.

http://www.anandtech.com/show/6290/making-sense-of-intel-haswell-transactional-synchronization-extensions


Like banks, and like most of all other financial institutions, where consumers are demanding faster deposit, transfer, and availability times, which requires...... you guessed it, faster database processing for millions of transactions per hour.

Good troll post asshat.
Since you do not appear to comprehend the point and purpose of my post I will clarify instead of reciprocating your "asshat" comment.

btarunr used TechReport as the source and quoted content from the article in his repost. My problem is the fact that the headline doesn't match the article specified and quoted as the source. If btarunr wanted to use that headline he needed to find different/additional reference material. This is not the first time a news piece at TPU has had an unqualified and unrelated headline as clickbait. And like the last time I had already read the source material prior to seeing his repost and was scratching my head as to how someone came up with that headline from the content that was sourced.
Posted on Reply
#19
Steevo
by: HumanSmoke
Like banks? When was the last time you saw a big company server or datacentre using consumer desktop CPUs?
AFAIK, enterprise systems will continue to be Xeon E5 (Grantley) and E7 (Brickland). The former might initially be affected, but since their launches will be staggered it's probably a safe bet that not all will be. The E7 line (Haswell-EX - which is being pushed as the cloud computing big data SKUs) will have TSX enabled.

Given your argument isn't exactly watertight how about toning down the insults?
http://www.newegg.com/Product/Product.aspx?Item=N82E16819116908

They are already out, and have the errata.
http://www.pcworld.com/article/2464880/intel-finds-specialized-tsx-enterprise-bug-on-haswell-broadwell-cpus.html

As does Broadwell (E5), so the new enterprise chips are going to have it until Intel finds a hardware fix. OEMs already have the chips. So we have hardware in the wild, production systems being built, that cannot support a new feature that was being pushed. For those who have based their purchasing decision on this, it's going to suck. Will the OEM or Intel provide new chips to those people? It wouldn't be the first time Intel has had to do that, although the last time was perhaps before you were born.

My asshat comment was about the critiquing of an article title. If you find it to be clickbait, go elsewhere, or do research on the topic and realize it is something show-stopping for end users and OEMs.

http://techreport.com/news/26911/errata-prompts-intel-to-disable-tsx-in-haswell-early-broadwell-cpus

Intel doesn't have a current timeline for the fix, and really, considering how much complexity it takes and how much improvement it was offering, they might not be able to implement an actual full fix, perhaps limiting the number of threads or cores that can use it.

Also, damn that is a long list of broken things, 5 pages

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf
Posted on Reply
#20
TheHunter
Yeah, all Haswells and DC and the new, not yet released Haswell-E will have it disabled with a microcode BIOS update.

Apparently normal Broadwell-DT will be fixed.
Posted on Reply