Tuesday, January 2nd 2018

Intel Secretly Firefighting a Major CPU Bug Affecting Datacenters?

There are ominous signs that Intel may be secretly fixing a major security vulnerability affecting its processors, one that threatens to severely damage its brand equity among datacenter and cloud-computing customers. The vulnerability reportedly lets users of one virtual machine (VM) access the data of another VM on the same physical machine (a leak of memory contents). Amazon, Google, and Microsoft, the three big cloud providers, are among those affected, and Intel is reportedly in embargoed communications with engineers from all three to release a software patch that fixes the bug. The trouble is that the patch inflicts an unavoidable performance penalty of 30-35%, which changes the economics of using Intel processors versus AMD ones.
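
To put that penalty in perspective, here is a rough back-of-the-envelope sketch of how many hosts a provider would need to keep the same aggregate throughput after a slowdown. The fleet size of 1,000 hosts is a made-up figure for illustration, and the model assumes throughput scales linearly with host count; only the 30-35% range comes from the report above.

```python
import math


def hosts_needed(current_hosts: int, slowdown: float) -> int:
    """Hosts required to match today's aggregate throughput after a slowdown.

    `slowdown` is the fraction of per-host performance lost (0.30 means 30%).
    Assumes throughput scales linearly with the number of hosts.
    """
    if not 0 <= slowdown < 1:
        raise ValueError("slowdown must be in [0, 1)")
    return math.ceil(current_hosts / (1 - slowdown))


# Hypothetical fleet of 1,000 hosts, evaluated at both ends of the reported range.
for slowdown in (0.30, 0.35):
    extra = hosts_needed(1000, slowdown) - 1000
    print(f"{slowdown:.0%} slowdown: {extra} extra hosts per 1,000")
```

Note that a 30% loss per host calls for roughly 43% more hosts, not 30% more, because the remaining 70% of capacity has to cover the whole workload.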

Signs of Intel secretly fixing the bug surfaced with rapid changes to the Linux kernel, made without the usual public visibility of the accompanying documentation. The bulk of the changes involve "kernel page table isolation," a feature that prevents VMs from reading each other's data, but at a performance cost. Developers note that these changes are being introduced "very fast" by Linux kernel update standards, and are even being backported to older kernel versions (something that's extremely rare). Since this is a hardware vulnerability, Linux isn't the only vulnerable software platform. Microsoft has been working on a Windows kernel patch for this issue since November 2017. AMD x86 processors (such as Opteron, Ryzen, and EPYC) are immune to this vulnerability.
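
As the name suggests, page table isolation keeps kernel mappings out of the page tables that are active while user code runs, and swaps tables on every entry to and exit from the kernel. The toy model below is only a sketch of that idea, not how the kernel implements it: the class, page names, and switch counter are all hypothetical, and the real patches still keep a small piece of kernel entry code mapped in user mode, which this model ignores.

```python
from dataclasses import dataclass

KERNEL = "kernel"
USER = "user"


@dataclass
class AddressSpace:
    """Toy process address space: maps a page name to its owner."""
    pages: dict
    isolation: bool = False
    table_switches: int = 0

    def visible_pages(self, mode: str) -> set:
        """Pages present in the page table that is active in the given mode."""
        if self.isolation and mode == USER:
            # With isolation, the user-mode table simply omits kernel pages.
            return {name for name, owner in self.pages.items() if owner == USER}
        # Without isolation, one shared table maps everything; kernel pages
        # are guarded only by a permission check the hardware must enforce.
        return set(self.pages)

    def syscall(self) -> None:
        """Enter and leave the kernel once, counting page-table switches."""
        if self.isolation:
            self.table_switches += 2  # one switch on entry, one on exit


pages = {"app_code": USER, "app_heap": USER, "kernel_secrets": KERNEL}
shared = AddressSpace(dict(pages))
isolated = AddressSpace(dict(pages), isolation=True)

for space in (shared, isolated):
    for _ in range(1000):
        space.syscall()
    print(sorted(space.visible_pages(USER)), space.table_switches)
```

The switch counter hints at where the cost comes from: workloads that enter the kernel often (heavy I/O, many system calls) pay for the extra table switches far more than compute-bound ones.
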
Source: Reddit

53 Comments on Intel Secretly Firefighting a Major CPU Bug Affecting Datacenters?

#1
eidairaman1
The Exiled Airman
OUCH! @biffzinker had created a thread already before this appeared...
Posted on Reply
#2
Prima.Vera
We need more details regarding this.
Are only the XEONS affected, or ALL processors??? Personally, I don't want a Windows update to gimp my CPU performance just because it might have a memory leak if I run VM software. Screw that. I'm not using my desktop to run VMs anyway, or if I do, it's for my own personal access anyway.
Posted on Reply
#3
Jism
I remember reading somewhere a while ago that Intel skimped on the testing process, and that more 'risk' was willing to be taken when releasing CPUs. If this is true, then Intel has a serious problem, lol. It will definitely affect lots of servers equipped with Intel hardware, and will probably bring a classic lawsuit from here to Tokyo, with companies claiming 25 ~ 35% of their funds back.

VMs should be isolated at all costs.
Posted on Reply
#4
lilunxm12
Prima.Vera said:
We need more details regarding this.
Are only the XEONS affected, or ALL processors??? Personally, I don't want a Windows update to gimp my CPU performance just because it might have a memory leak if I run VM software. Screw that. I'm not using my desktop to run VMs anyway, or if I do, it's for my own personal access anyway.
I believe the performance degrade only affects virtualization applications. I'm wrong.
Posted on Reply
#5
R0H1T
lilunxm12 said:
I believe the performance degrade only affects virtualization applications.
This is a virtual memory bug, nothing to do with virtualization! The performance hit could be massive, like real big ~ https://www.phoronix.com/scan.php?page=article&item=linux-415-x86pti
Also by the looks of it all major core uarchs are affected, including desktop chips, another one in a line of massive Intel ***kups :laugh:
Posted on Reply
#6
lilunxm12
R0H1T said:
This is a virtual memory bug, nothing to do with virtualization! The performance hit could be massive, like real big ~ https://www.phoronix.com/scan.php?page=article&item=linux-415-x86pti
Another in a line of massive Intel ***kups :laugh:
To me it's a security vulnerability rather than a general bug; apparently no functional feature is affected. I just assumed Intel might be able to selectively patch the affected use cases and leave other parts untouched, but it turns out I'm wrong.
That's a really big fuck up for all customers. Wondering whether Intel will get a class-action lawsuit for failing to deliver promised performance.
Posted on Reply
#7
R0H1T
lilunxm12 said:
To me it's a security vulnerability rather than a general bug; apparently no functional feature is affected. I just assumed Intel might be able to selectively patch the affected use cases and leave other parts untouched, but it turns out I'm wrong.
That's a really big fuck up for all customers. Wondering whether Intel will get a class-action lawsuit for failing to deliver promised performance.
This will also affect every major cloud provider, not only Intel but all of them are at risk, especially when it comes to securing their data.
Posted on Reply
#8
xkm1948
R0H1T said:
This will also affect every major cloud provider, not only Intel but all of them are at risk, especially when it comes to securing their data.
Never used cloud, never trusted cloud.
Posted on Reply
#9
R0H1T
xkm1948 said:
Never used cloud, never trusted cloud.
I'm sure AWS, Azure, Google, Alibaba customers are (un)happy looking at this :shadedshu:
But seriously this could be huge if any data from even one of your clients has leaked, it could potentially be devastating for Intel, as well as every cloud provider atm.
Posted on Reply
#10
Jism
xkm1948 said:
Never used cloud, never trusted cloud.
If you hire a VPS or something similar, you're often put on a large machine with many virtual machines active. As far as the above article goes, it looks like, due to a memory leak, users of one VM are able to read the data of another VM, which is a security threat. It's being handled by the CPU, and the flaw looks to be inside the CPU. Hence the BIOS fix and hence the performance impact. Cloud or not, it makes no sense to deploy a complete 1U server for just one customer when you can put 50 VMs into one big 4U server loaded with CPUs, RAM, SSDs and such.

Cloud is just another fancy term for more and more services being offered in a datacentre, such as remote backups, web hosting, VoIP and such. Before cloud you had the same services, but today's standard is that it's more reliable than ever. Intel has a serious situation if this is true, with the performance impact being up to 35%, lol. I remember an article somewhere in which someone who worked at Intel said the manager basically told them, "we need to skimp on testing CPUs and push them out a lot faster than usual."

If you skimp on testing, these bugs cannot be found quickly enough, and once the chips are deployed en masse, Intel has a serious situation.
Posted on Reply
#11
cdawall
where the hell are my stars
So this bug has existed since core solo. How many people has it actually affected?
Posted on Reply
#12
btarunr
Editor & Senior Moderator
Prima.Vera said:
Are only the XEONS affected, or ALL processors???
Literally every Intel processor supporting virtualization (VT-x). That's pretty much every Core, Pentium, Celeron, and Xeon processor launched since 2007.
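
For anyone on Linux who wants to check their own machine against that criterion, here is a minimal sketch. It relies on /proc/cpuinfo reporting Intel CPUs with the vendor string GenuineIntel and VT-x as the vmx flag; the function names are made up for illustration, and matching this criterion says nothing definitive about whether a given chip is actually vulnerable.

```python
from pathlib import Path


def parse_cpuinfo(text: str):
    """Return (vendor_id, flags) taken from the first CPU entry in the text."""
    vendor, flags = None, set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key == "vendor_id" and vendor is None:
            vendor = value.strip()
        elif sep and key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def is_intel_with_vtx(text: str) -> bool:
    """True if the CPU is an Intel part that advertises VT-x (the vmx flag)."""
    vendor, flags = parse_cpuinfo(text)
    return vendor == "GenuineIntel" and "vmx" in flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        print("Intel CPU with VT-x:", is_intel_with_vtx(cpuinfo.read_text()))
```

Keep in mind the vmx flag can be missing if VT-x is disabled in firmware, so a negative result here is not proof of anything.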
Posted on Reply
#13
R-T-B
cdawall said:
So this bug has existed since core solo. How many people has it actually affected?
I'm betting the number will go massively up following its disclosure.
Posted on Reply
#14
R0H1T
R-T-B said:
I'm betting the number will go massively up following its disclosure.
Speculation is that the actual bug was discovered in 2016 ~ https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/998707-initial-benchmarks-of-the-performance-impact-resulting-from-linux-s-x86-security-changes?p=998753#post998753
Really, PTI is KASLR. So the PTI patches are KASLR fixed, and fixing KASLR fixes the problem of userspace programs accessing kernel working memory on Intel x86 processors that may or may not be properly read-only. The AMD MMU design does not have this issue, since it automatically forbids lower ring levels from accessing higher-ring-level memory without assignment.

So this is the 2016 flaw that the PTI patches fix. How bad that 2016 flaw was missed because working memory turning read/write in kernel space had been overlooked.

According to reports it allows writing to kernel memory due to some bug in the CPU when doing speculative execution.

That is not the only way kernel memory can end up read/write and, with the Intel flaw, be writable from userspace. The PTI change on x86 is closing a defect found in 2016 whose severity people failed to understand.

Yes there was a problem in 2016 of that only beats KASLR not considering that in the process of beating KASLR it was touching kernel space memory from userspace and that kind of sweep and mapping could be used to locate where DMA buffers and other items that change between read only to read write.

Basically 2017 it was worked out how bad that 2016 KASLR breaching bug really was. Yes the PTI patches are doing everything that is required to fix that 2016 bug not in fact fixing where the cpu set kernel space memory read/write instead of read only. If userspace can no longer access those blocks of memory the threat is neutralised. AMD MMU userspace code cannot do those attacks.

So the 30 percent performance hit that was talked about at that Black Hat is what Intel CPUs are having to take now, with optimisations being done to attempt to claw back that performance loss.
The number of users affected could be huge; the number of actual exploits (if any) in the wild is unknown at this point.
Posted on Reply
#15
nem..
It's just a copy-paste, but I hope it's useful.

Massive design flaw in Intel CPUs found, reduces performance
https://www.tweaktown.com/news/60357/massive-design-flaw-intel-cpus-found-reduces-performance/index.html


The affected Intel processors will not just face a security vulnerability, but a huge performance drop of between 5-30% once the OS has been fixed. Intel processors have a bug that can't be fixed with a microcode update, meaning Microsoft has to issue a fix at the Windows level, or you'll be forced into the arms of an AMD processor, which isn't affected.

How bad is the security issue? Well, an affected processor could have the contents of its kernel memory accessed, which is where super-secure things like passwords, log-ins, and more can be found.

The Register, who first reported on the story, explains: "At worst, the hole could be abused by programs and logged-in users to read the contents of the kernel's memory. Suffice to say, this is not great. The kernel's memory space is hidden from user processes and programs because it may contain all sorts of secrets, such as passwords, login keys, files cached from disk, and so on. Imagine a piece of JavaScript running in a browser, or malicious software running on a shared public cloud server, able to sniff sensitive kernel-protected data".

With a huge 5-30% decrease in performance, AMD is going to have a massive win here - buy Ryzen CPUs and receive a CPU that will perform better than an equally priced Intel CPU, post-OS patch. This will send shock waves through the industry, and completely change benchmarking for things like me - as once I patch my OS, a 5-30% performance drop affects absolutely everything I do.

This is an x86 level problem, so AMD isn't out of the crap yet - something we'll be keeping an eye on as this story progresses.

For Intel, well... I'm sure I'll wait for a comment from them once this article goes live, as I will reach out and ask for comment and I'm sure that email will get lost. Maybe they can blame the security bug in my Core i7-8700K, heh.

Initial Benchmarks Of The Performance Impact Resulting From Linux's x86 Security Changes

https://www.phoronix.com/scan.php?page=article&item=linux-415-x86pti&num=2

https://www.phoronix.com/scan.php?page=article&item=linux-415-x86pti&num=1
Posted on Reply
#16
xkm1948
Definitely a bad 2018 start for intel
Posted on Reply
#17
R-T-B
R0H1T said:
Speculation is that the actual bug was discovered in 2016
Right, but we are still waiting on full disclosure apparently.
Posted on Reply
#18
eidairaman1
The Exiled Airman
This doesn't bode well on any level. Let's not minimize this but actually call it what it really is: a major flaw in Intel's architecture.
Posted on Reply
#19
R-T-B
eidairaman1 said:
This doesn't bode well on any level. Let's not minimize this but actually call it what it really is: a major flaw in Intel's architecture.
That's what most errata is. It will likely require an on-silicon fix to not hurt performance.

AMD had a pretty big one around Ryzen launch too. It just only affected linux, so no one cared.
Posted on Reply
#20
eidairaman1
The Exiled Airman
R-T-B said:
That's what most errata is. It will likely require an on-silicon fix to not hurt performance.

AMD had a pretty big one around Ryzen launch too. It just only affected linux, so no one cared.
Considering a microcode update won't be able to fix it, it will take an OS patch and a new hardware stepping.
Posted on Reply
#22
R0H1T
R-T-B said:
Right, but we are still waiting on full disclosure apparently.
IMO such a large flaw is only disclosed/fixed when vendors are forced to. So either ~

a) some white hat was gonna disclose this exploit pretty soon.
b) there are a sizable number of exploits in the wild, so this needs to be patched asap.

OR all of the above.
Posted on Reply
#23
eidairaman1
The Exiled Airman
R0H1T said:
IMO such a large flaw is only disclosed/fixed when vendors are forced to. So either ~

a) some white hat was gonna disclose this exploit pretty soon.
b) there are a sizable number of exploits in the wild, that this needs to be patched asap.

OR all of the above.
We could only hope that it was A to "keep them honest"
Posted on Reply
#24
theGryphon
Beginning of the end for Intel?

Thank God I went with Ryzen this time... yeah, crap can surface on them later too, but man, this is big!
Posted on Reply
#25
eidairaman1
The Exiled Airman
theGryphon said:
Beginning of the end for Intel?

Thank God I went with Ryzen this time... yeah, crap can surface on them later too, but man, this is big!
I wouldn't say the end as they have plenty of profit and revenue to stay afloat, but they need to address this flaw quickly by allowing cpu replacements to the biggest "pay to play" companies and keep their stock holders happy.
Posted on Reply