On Friday, a large global outage affecting Microsoft's services caused major disruptions across industries worldwide. The incident halted operations at airports, financial institutions, emergency services and a range of other businesses, drawing attention to how heavily these industries rely on Microsoft technology. Windows users in India also reported seeing the dreaded "Blue Screen of Death" (BSOD) on their PCs, compounding the problem for the technology company. The sudden shutdowns disrupted organisations to varying degrees, from minor inconvenience to complete stoppage. Microsoft's efforts to restore normal service are ongoing.
The Impact -
Numerous Microsoft Windows users around the world encountered BSOD errors, leading to the loss of unsaved data and the disruption of vital processes. Offices, businesses and other users who rely on Microsoft for work and education lost productivity and data for the day.
The outage also severely affected airports across the world. In India, Bengaluru airport saw long queues as check-in systems failed, forcing airlines such as IndiGo, Akasa Air and SpiceJet to issue boarding passes manually. Delhi airport also faced brief disruptions.
Amid the worldwide disruption, the National Stock Exchange (NSE) and the Bombay Stock Exchange (BSE) in India reported no impact from the global Microsoft systems outage. The Indian Computer Emergency Response Team (CERT-In), however, issued a 'Critical' severity rating for the incident.
The United States’ Federal Aviation Administration reported that major airlines, including American, United and Delta, grounded all flights due to the issue. The outage also affected hospitals, where patients were unable to receive medications, and emergency 911 services were interrupted in various states, deepening the crisis.
In Australia, Sydney Airport faced check-in issues, and the New South Wales Police Force acknowledged the problem on social media. The outage affected multiple industries, including retail, where Woolworths shops experienced malfunctioning checkout systems. ABC News 24 failed to broadcast news packages and Melbourne Airport reported check-in delays. The disruption also reached Europe, affecting check-in processes in Amsterdam and Berlin Brandenburg airport, where flights were cancelled due to a ‘technical problem’, according to the authorities.
The disruption was not limited to specific regions. The issue affected key institutions worldwide, including the London Stock Exchange, which experienced outage-related troubles. News outlets like Sky News in the UK went off the air.
Commenting on the incident, Prabhu Ram, VP - Industry Research Group at CyberMedia Research, stated, “The extensive disruption caused by the CrowdStrike-Microsoft outage is a clear black swan occurrence, highlighting the importance of many redundancies to minimise single points of failure. Automating procedures can improve efficiency, but it also magnifies the effect of errors. This event serves as a reminder for organisations to properly test automated operations and maintain comprehensive disaster recovery strategies. While CrowdStrike's image suffers, the cybersecurity sector's development is not likely to be hampered, while competitors may win market share at its expense.”
Why It Happened -
While theories about the cause of the outage circulated, several industry officials offered substantial information.
Omer Grossman, CyberArk's Chief Information Officer, stressed the seriousness of the Microsoft outage, which was triggered by a software update to CrowdStrike's EDR product, and described it as one of the most critical cyber concerns of 2024. He pointed to the twin problems of restoring business continuity, which requires manual fixes on each endpoint because the affected systems crash, and determining the cause, which might range from human error to a sophisticated cyberattack. Grossman also underlined the importance of CrowdStrike's forthcoming analysis and updates. According to him, there were two key issues-
Restoring Business Continuity- Endpoints that crashed into the Blue Screen of Death could not be updated remotely, so each one required a manual fix. This process was expected to take many days.
Identifying the Cause- Possible causes ranged from human error, such as publishing an update without adequate quality control, to the more complex scenario of a sophisticated cyberattack.
CrowdStrike, an American cybersecurity technology company, later clarified that the outages were not caused by a security event or hack. CEO George Kurtz stated on social networking site X, "This is not a security incident or a cyberattack. The problem has been discovered, isolated and a solution has been implemented." According to Satnam Narang, Senior Staff Research Engineer at Tenable, the outage was caused by an update to security software that requires high-level privileges on the underlying operating system. "This event is unprecedented, and the ramifications of it are still developing," Narang told reporters.
Ironically, the misuse of AI and technology surfaced in the middle of this major technological glitch when Vincent Flibustier, a prankster, falsely claimed credit for the Microsoft outage that affected Windows customers worldwide. He shared an edited photo of himself outside a CrowdStrike office, joking that he had caused the outage by releasing an update before taking the day off. This instance demonstrates how easily disinformation can spread, particularly when mixed with AI-generated pictures. The real outage is known to have been caused by an update to 'Falcon Sensor', an anti-virus program by CrowdStrike, not by anything Flibustier did.
Microsoft's and CrowdStrike's Response -
On July 19, Microsoft reported that it was investigating a number of issues with Azure in the Central US region. However, customers in India and around the world filed complaints, indicating that the problem was more widespread. Microsoft is working to fix the global outage and restore services. The broad impact across sectors highlights how much day-to-day operations depend on these technical systems.
CrowdStrike is currently working with clients affected by a defect found in a single content update for Windows hosts. The firm says it has identified and isolated the problem and deployed a fix. It recommends that organisations communicate with CrowdStrike staff through official channels for the latest updates.
Solution -
The recent worldwide outage of Microsoft's services exposed the complexity and the drawbacks of depending entirely on software and systems without backups or contingency resources. In the meantime, the Indian Computer Emergency Response Team (CERT-In) recommended the following steps to resolve the BSOD issue-
1. Boot Windows into Safe Mode or Windows Recovery Environment.
2. Navigate to the directory `C:\Windows\System32\drivers\CrowdStrike`.
3. Locate the file matching `C-00000291*.sys` and delete it.
4. Boot Windows normally.
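
For administrators who want to check a machine before deleting anything, steps 2 and 3 can be sketched in Python. This is only an illustrative sketch, not an official CERT-In or CrowdStrike tool: the function names and the dry-run default are hypothetical, and the wildcard in the file pattern is an assumption based on the instruction to find the file "matching" that name. It would also need to run from an environment where the Windows drive is accessible, such as Safe Mode.

```python
from pathlib import Path

# Directory and file pattern taken from the steps above.
# The "*" wildcard is an assumption about what "matching" means.
DRIVER_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
PATTERN = "C-00000291*.sys"


def find_faulty_files(driver_dir, pattern=PATTERN):
    """Return the files in driver_dir whose names match the pattern."""
    directory = Path(driver_dir)
    if not directory.is_dir():
        # Directory missing: nothing to report on this machine.
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_faulty_files(driver_dir, dry_run=True):
    """List matching files and delete them only when dry_run is False."""
    matches = find_faulty_files(driver_dir)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run: only report what would be deleted.
    for path in remove_faulty_files(DRIVER_DIR, dry_run=True):
        print(f"would delete: {path}")
```

Running it with `dry_run=True` first lets the list of matches be reviewed; passing `dry_run=False` performs the deletion described in step 3.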
What else can be done?
Outages of this scale and severity are rare, but organisations must be prepared for them in a digitally dependent environment where not everyone has the awareness, team or resources to respond to such situations.
Companies can take several proactive steps to avoid or lessen the effects of disruptions. Implementing redundancy and backup mechanisms is critical, including multiple data centres, regular backups and off-site storage. Multi-cloud strategies distribute risk across different cloud services. Regular security audits and updates help identify and address issues. Creating and testing incident response plans ensures prompt action during outages, while strong monitoring and alerting systems allow real-time tracking and a timely response to potential problems.
The global outage affecting Microsoft's services brought much of the digital world to a halt, affecting critical industries such as aviation, finance, emergency services and retail. Although the cause, a defective CrowdStrike update, has been found and corrected, the event highlights how crucially modern operations rely on technology. Microsoft and the affected organisations are working relentlessly to restore normal service and prevent a repeat of such incidents.