Major outages at CrowdStrike, Microsoft leave the world with BSODs and confusion

A passenger sits on the floor as long queues form at the check-in counters at Ninoy Aquino International Airport, on July 19, 2024 in Manila, Philippines.

Ezra Acayan/Getty Images

Millions of people outside the IT industry are learning what CrowdStrike is today, and that’s a real bad thing. Meanwhile, Microsoft is also catching blame for global network outages, and between the two, it’s unclear as of Friday morning just who caused what.

After cybersecurity firm CrowdStrike shipped an update to its Falcon Sensor software, which protects mission-critical systems, Blue Screens of Death (BSODs) started taking down Windows-based systems. The problems started in Australia and followed the dateline from there. TV networks, 911 call centers, and even the Paris Olympics were affected. Banks and financial systems in India, South Africa, Thailand, and other countries fell as computers suddenly crashed. Some individual workers discovered that their work-issued laptops were booting to blue screens on Friday morning.

Airlines, never the most agile of networks, were particularly hard-hit, with American Airlines, United, Delta, and Frontier among the US airlines overwhelmed Friday morning.

CrowdStrike CEO George Kurtz posted on X (formerly Twitter) at 5:45 am Eastern time that the firm was working on “a defect found in a single content update for Windows hosts,” with Mac and Linux hosts unaffected. “This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed,” Kurtz wrote. Kurtz told NBC’s Today Show Friday morning that CrowdStrike is “deeply sorry for the impact that we’ve caused to customers.”

A CrowdStrike engineer posted in the official CrowdStrike subreddit that the workaround steps involve booting affected Windows systems into Safe Mode or the Recovery Environment, navigating to a CrowdStrike directory, deleting a .sys file, and rebooting. Even if this works, it’s not something that can be done through a network push, so a lot of manual work remains to be done.

Multiple outages, unclear blame

Microsoft services were, in a seemingly terrible coincidence, also down overnight Thursday into Friday. Multiple Azure services went down Thursday evening, with the cause cited as “a backend cluster management workflow [that] deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region.”

News reporting on these outages has so far blamed either Microsoft, CrowdStrike, or an unclear mixture of the two as the responsible party for various outages. The confusion may be unavoidable, given that the outages are all happening on one platform, Windows. Microsoft itself issued an “Awareness” regarding the CrowdStrike BSOD issue on virtual machines running Windows. The firm was frequently updating it Friday, with a fix that may or may not surprise IT veterans.

“We’ve received feedback from customers that several reboots (as many as 15 have been reported) may be required, but overall feedback is that reboots are an effective troubleshooting step at this stage,” Microsoft wrote in the bulletin. Alternatively, Microsoft recommends that customers who have a backup from “before 19:00 UTC on the 18th of July” restore it, or attach the OS disk to a repair VM and then delete the file (Windows/System32/Drivers/CrowdStrike/C00000291*.sys) at the heart of the boot loop.
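For admins scripting the repair-VM route, the file deletion could be sketched roughly as below. This is an illustration, not vendor guidance: the directory and the C00000291*.sys pattern come from Microsoft's bulletin, but the function names, the os_root parameter (wherever the affected OS disk happens to be mounted), and the dry-run default are assumptions made for this example.

```python
from pathlib import Path

# Directory and file pattern as quoted in Microsoft's bulletin.
CROWDSTRIKE_DIR = Path("Windows/System32/Drivers/CrowdStrike")
PATTERN = "C00000291*.sys"


def find_matching_files(os_root):
    """Return files under os_root that match the bulletin's pattern.

    os_root is a hypothetical parameter: the mount point or drive
    letter of the affected OS disk once attached to a repair VM.
    """
    target = Path(os_root) / CROWDSTRIKE_DIR
    if not target.is_dir():
        return []
    return sorted(p for p in target.glob(PATTERN) if p.is_file())


def remove_matching_files(os_root, dry_run=True):
    """Delete matching files and return the list of paths found.

    dry_run defaults to True (an assumption of this sketch), so
    nothing is deleted unless the caller explicitly opts in.
    """
    matches = find_matching_files(os_root)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches
```

Calling remove_matching_files with only a mount point lists what would be removed; passing dry_run=False performs the deletion. Other .sys files in the same directory are left untouched, because only names beginning with C00000291 match the pattern.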

Security consultant Troy Hunt was quoted as describing the dual failures as “the largest IT outage in history,” saying, “basically what we were all worried about with Y2K, except it’s actually happened this time.”

United Airlines told Ars that it was “resuming some flights, but expect schedule disruptions to continue throughout Friday,” and had issued waivers for customers to change travel plans. American Airlines posted early Friday that it had re-established its operations by 5 am Eastern, but expected delays and cancellations throughout Friday.

Ars has reached out to CrowdStrike, Microsoft, and a number of airlines for comment and will update this post with any response.

This is a developing story and this post will be updated as new information is available.
