Massive Bug in Intel, AMD, ARM and other Processors in Handling VM Memory

Over the last few days, we have been hearing about a massive bug in the way processors implement VM memory handling.

Intel has been frantically working on fixing host machine kernels in order to prevent one VM from accessing the memory of other VMs that run on the same machine.

Of course, most users are not going to attempt to read someone else's memory, so in that sense most of us are relatively safe. But all of those who are not so lucky and have uninvited guests on their systems are running at very high risk. The information available from Intel says the memory is accessible only in Read mode. Read only at least prevents someone from changing your VM memory and taking over. In Read/Write, the hacker could take over your machine and ALL your data. At this point, it looks like it only works on data that you have in memory, which is already really bad.

Intel and others have been working on fixes for this bug since November 2017, which explains the large number of updates to the Linux Kernel (the Intel developers are responsible for writing the low-level Virtual Machine code in the Linux Kernel).

I guess we all hope to get more news about this in the near future. My take is that since we've already seen many upgrades to the Linux Kernel, the bug is likely already fixed or close to being fixed. However, I read that IBM is about to reboot their entire cloud system, so the fix would not yet have been applied at the lowest level (i.e. the fix has to be applied to the main OS, the one the hardware machine boots into, not just the VMs). Microsoft announced a date, Jan 10, to reboot their Azure cloud system.

At this point I have not been able to find much about other companies rebooting for this problem. Since fixes have been in the works since Nov 2017 and DigitalOcean did several reboots since then, it could very well be that the problem is already resolved there. But we'll need more information to make sure.

Note that clouds are quite numerous. Some are private, which means those are secure since only one company runs on that cloud. However, anyone who makes use of a cheap VM here and another cheap VM there... is affected. That includes businesses such as Netflix that make use of the AWS (Amazon) cloud system. Dropbox is another system running on a cloud.

Microsoft has Azure. There is DigitalOcean. Linode is a big one. GoDaddy has one; I'm not too sure about its extent, though. VMware, Oracle, Verizon, 1&1, Red Hat, MegaCloud, CloudSigma, Hyve, Ubiquity, ToggleBox, Navisite, Vultr, and more...

Note that many say that the AMD virtual environments are not affected. This is not the case. All the processors have a similar issue. So if you have a virtual machine, you are most certainly affected unless your host has already rebooted their machine with a patch. I know DigitalOcean rebooted a couple of times since Nov 2017, but it looks like the full patch for that VM problem has been made available only recently (Jan 2018).

The patch is written in software, making use of more MMU logic and kernel tests (from what I understand). With the VM implementation fixed, things are likely to run up to 30% slower than before. Hopefully this is only for memory intensive work and not the general functionality of the VMs. From one graph I have found, PostgreSQL accesses will be even more than 30% slower with the patch installed. This is pretty significant. Intel is promising that they will continue to work on the fixes in order to lower the impact. They have to have a fix very quickly (obviously?!) and therefore can't offer the best of the best on day one.

Note that some companies, such as Rackspace, make use of other providers' systems. Rackspace sells many AWS servers in their offerings. What they sell you depends on your requirements. In such a case, if you don't already know which cloud system you're really on, you will have to call them to find out.

Is there something you can do in the meantime?

I would suggest you upgrade to the latest kernel because, even if the host also has to be upgraded and rebooted, having the latest kernel on your VM is already going to be a good thing. Don't forget that in most cases you need to (1) upgrade to the latest, and (2) reboot. Kernel changes are effective only after a reboot. (It is possible to switch between kernels without a reboot, but on a VM a reboot takes less than a minute and it's going to be much safer than a hot switch, really!)
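To check where your own VM stands after the upgrade and reboot, here is a minimal Python sketch. It relies on two assumptions that may not hold on your system: that your kernel is recent enough to expose status files under /sys/devices/system/cpu/vulnerabilities (older kernels simply don't have that directory), and that your distribution creates /var/run/reboot-required when a reboot is pending (a Debian/Ubuntu convention). The function names are mine, for illustration only. Also keep in mind this only tells you about the kernel inside your VM, not about the host.

```python
import platform
from pathlib import Path

# Assumption: newer Linux kernels expose one status file per known CPU
# vulnerability in this directory. Older kernels do not have it at all.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")

# Assumption: Debian/Ubuntu style flag file created when a reboot is pending.
REBOOT_FLAG = Path("/var/run/reboot-required")


def read_vulnerability_status(vuln_dir=VULN_DIR):
    """Return {name: status} for each status file, or {} if the directory is missing."""
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


def reboot_pending(flag_file=REBOOT_FLAG):
    """Return True if the reboot flag file exists."""
    return Path(flag_file).exists()


if __name__ == "__main__":
    # The kernel actually running, which may be older than the one installed.
    print("Running kernel:", platform.release())

    status = read_vulnerability_status()
    if not status:
        print("No vulnerability status files found; this kernel may predate them.")
    for name, text in status.items():
        print(f"{name}: {text}")

    if reboot_pending():
        print("A reboot is pending; the upgraded kernel is not running yet.")
```

If the script still shows the old kernel version after an upgrade, the reboot step (2) has not happened yet.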

Intel says they will give additional information on Jan 10, when the latest patch is fully available.