By Monday morning, many of the major disruptions from the flawed CrowdStrike security update late last week had cleared up. Flight delays and cancellations were no longer front-page news, and multiple Starbucks locations near me are taking orders through the app once again.
But the cleanup effort continues. Microsoft estimates that around 8.5 million Windows systems were affected by the issue, which involved a buggy .sys file that was automatically pushed to Windows PCs running the CrowdStrike Falcon security software. Once downloaded, that update caused Windows systems to display the dreaded Blue Screen of Death and enter a boot loop.
“While software updates may occasionally cause disturbances, significant incidents like the CrowdStrike event are infrequent,” wrote Microsoft VP of Enterprise and OS Security David Weston in a blog post. “We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one percent of all Windows machines. While the percentage was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services.”
The “easy” fix documented by both CrowdStrike (whose direct fault this is) and Microsoft (which has taken a lot of the blame for it in mainstream reporting, partly because of an unrelated July 18 Azure outage that had hit shortly before) was to reboot affected systems over and over again in the hopes that they would pull down a new update file before they could crash. For systems where that method hasn’t worked—and Microsoft has recommended customers reboot as many as 15 times to give computers a chance to download the update—the recommended fix has been to delete the bad .sys file manually. This allows the system to boot and download a fixed file, resolving the crashes without leaving machines unprotected.
To help ease the pain of that process, Microsoft over the weekend released a recovery tool that helps to automate the repair process on some affected systems; it involves creating bootable media using a 1GB-to-32GB USB drive, booting from that USB drive, and using one of two options to repair your system. For devices that can’t boot via USB—sometimes this is disabled on corporate systems for security reasons—Microsoft also documents a PXE boot option for booting over a network.
WinPE to the rescue
The bootable drive uses the WinPE environment, a lightweight, command-line-driven version of Windows typically used by IT administrators to apply Windows images and perform recovery and maintenance operations.
One repair option boots directly into WinPE and deletes the affected file without requiring administrator privileges. But if your drive is protected by BitLocker or another disk-encryption product, you’ll need to manually enter your recovery key so that WinPE can read data on the drive and delete the file. According to Microsoft’s documentation, the tool should automatically delete the bad CrowdStrike update without user intervention once it can read the disk.
If you are using BitLocker, the second recovery option attempts to boot Windows into Safe Mode using the recovery key stored in your device’s TPM to automatically unlock the disk, as happens during a normal boot. Safe Mode loads the minimum set of drivers that Windows needs to boot, allowing you to locate and delete the CrowdStrike driver file without running into the BSOD issue. The file is located at Windows/System32/Drivers/CrowdStrike/C-00000291*.sys
on affected systems, or users can run “repair.cmd” from the USB drive to automate the fix.
For its part, CrowdStrike has set up a “remediation and guidance hub” for affected customers. As of Sunday, the company said it was “test[ing] a new technique to accelerate impacted system remediation,” but it hasn’t shared more details as of this writing. The other fixes outlined on that page include rebooting multiple times, manually deleting the affected file, or using Microsoft’s boot media to help automate the fix.
The CrowdStrike outage didn’t just delay flights and make it harder to order coffee. It also affected doctor’s offices and hospitals, 911 emergency services, hotel check-in and key card systems, and work-issued computers that were online and grabbing updates when the flawed update was sent out. In addition to providing fixes for client PCs and virtual machines hosted in its Azure cloud, Microsoft says it has been working with Google Cloud Platform, Amazon Web Services, and “other cloud providers and stakeholders” to provide fixes to Windows VMs running in its competitors’ clouds.