In recent weeks, we have seen two devastating IT outages spread across the globe – from the initial CrowdStrike outage affecting 8.5 million Windows devices to the latest Microsoft DDoS-related shutdown. While the immediate impacts are still unclear, we can expect both outages to have significant long-term repercussions.
The immediate crisis is estimated to have already cost US Fortune 500 companies up to $5.4 billion in damages, with companies in the banking and healthcare sectors expected to be hardest hit. In addition, the disruption left countless organisations scrambling to restore their systems and protect their data, creating a chaotic environment ripe for exploitation. This upheaval not only exposed vulnerabilities but also weakened cybersecurity defences, making businesses much more susceptible to cybercriminals who are quick to take advantage of crises.
As we navigate the aftermath of global technology disruptions, what can IT and security leaders learn from the days when the digital world stood still?
Vice President, Global Information Security Advisor at BlackBerry Cybersecurity.
The cracks in our global digital infrastructure
Ultimately, the outages highlighted the often-overlooked physical and logistical challenges of managing a distributed IT infrastructure. As the crisis unfolded, it became clear that resolving the problem required rebooting systems in safe mode with administrator privileges. However, this is a painstaking and time-consuming process, particularly for large, dispersed companies. Many organizations also struggled to access and repair remote systems, particularly those in hard-to-reach locations.
This is evident from the sheer volume and diversity of sectors that were affected by the collapse, from banks and airlines to hotels and hospitals. It showed us how a single point of failure can spread across the intricate web of our digital infrastructure and impact multiple industries. At the same time, the scale of the service disruption highlighted the importance of skilled IT support and strong managed security service providers (MSSPs). Above all, we immediately saw professionals from Microsoft, SonicWall, and SentinelOne working together to diagnose and resolve the issue. Their collective efforts underscore the immense value of industry collaboration, which remains one of the cybersecurity industry’s greatest assets.
Key learnings from global IT disruptions
When a major incident occurs, there is always a trail of lessons left to uncover. These disruptions signal a crucial moment for all organizations to assess their software supply chain and the operational risks to their business. This is especially true for cybersecurity software that operates deep within our software stacks, where adversaries attack but also where one bad line of code can bring down the entire system.
As the immediate impacts of the global service disruption subside, CIOs and CISOs must ask themselves: do we have the right balance in place to deliver the disaster recovery and business continuity needed when this inevitably happens again? If the question is difficult to answer, then IT and security leaders should consider the following:
1. Improve process discipline – Robust management processes are essential, particularly for security tool updates. Security managers should ensure that rigorous testing protocols are in place before deploying updates across the infrastructure. If a vendor manages this process, it is essential to inquire about their remediation plans for problematic updates.
2. Implement multi-vendor strategies – While consolidation has been popular, this incident underscores the importance of strategically diversifying vendors to mitigate risks and avoid single points of failure. A critical examination of the current setup to identify potential single points of failure should be a priority. Then, consider robust managed detection and response (MDR) solutions with open XDR capabilities that are best suited to support a diverse IT or security stack. The alternative locks users into a single vendor and leaves them exposed to potential vulnerabilities.
3. Strengthen endpoint protection – Outages are often caused by legacy cybersecurity practices, such as complex EDRs and heavy endpoint agents, that pose a significant and unnecessary risk to infrastructure. Using lightweight AI at the endpoint can prevent these types of outages by protecting your environment without heavy agents and regular updates that put your operations at risk.
4. Integrate AI responsibly – While seemingly unrelated, it is essential to develop clear policies for the integration of AI into cybersecurity operations. This foresight will help prevent future large-scale issues as AI becomes more integrated into technology stacks. While AI offers a promising path for the future, it has by no means reached its final state. Therefore, IT and security leaders must remain vigilant and adaptable and be prepared to address the evolving vulnerabilities that AI may introduce with an innovative yet responsible approach.
5. Take advantage of real-time communication capabilities – Since the disruption affected some of the world’s most critical systems, networks and applications, the response required speed, accuracy and accountability. In this case, a critical event management (CEM) solution can provide real-time visibility to ensure a fast and informed response to recover from the business disruption. At the same time, this will provide a paper trail of incident communications to demonstrate that the situation was handled with responsibility and compliance as a priority.
6. Ensure regular testing to eliminate blind spots – Understanding your vulnerabilities and risks through regular testing is critical, not just when implementing new software, but consistently over time. To protect against potential threat actors looking to take advantage of IT disruptions, a combination of internal and external AI-enabled penetration testing assessments remains vital. These will reveal how an external threat actor could compromise assets through ever-evolving tactics, techniques, and procedures. The performance and security of your systems are only as good as their least secure hardware and software components. Blind spots must therefore be addressed as a priority to keep businesses operating as usual.
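One common way to put the first recommendation into practice, testing updates before deploying them across the infrastructure, is a staged or ring-based rollout. The Python sketch below is illustrative only: the `Ring` type, the `staged_rollout` function, the `apply_update` and `is_healthy` callbacks and the 2% failure threshold are all hypothetical names and values chosen for this example, not any vendor's actual deployment mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    """A group of hosts that receives an update at the same stage (hypothetical type)."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.02,  # assumed threshold, tune per environment
) -> List[str]:
    """Deploy ring by ring and halt once a ring exceeds the failure threshold.

    Returns the names of the rings that completed within the threshold.
    """
    completed: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            apply_update(host)
            if not is_healthy(host):
                failures += 1
        # Stop before the update reaches any later, larger ring.
        if ring.hosts and failures / len(ring.hosts) > max_failure_rate:
            break
        completed.append(ring.name)
    return completed
```

In this sketch, a small canary ring absorbs a faulty update, and the broad ring is never touched if an earlier ring fails its health checks. A real deployment would also need rollback and monitoring over time, which are omitted here.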
These global technological disruptions were a stark reminder of the critical need for digital independence and robust management processes. Now, industry leaders must turn these lessons into actionable strategies, using this experience to build more resilient and adaptable cybersecurity frameworks. In this field, it is not a question of if the next crisis will occur, but when. The strength of the cybersecurity industry lies not only in our individual expertise, but also in our collective response to challenges. By fostering collaboration, embracing strategic complexity, and continually improving processes, future crises can be met with greater confidence and effectiveness.
This article was produced as part of TechRadarPro's Expert Insights channel, where we showcase the best and brightest minds in the tech industry today. The views expressed here are those of the author, and not necessarily those of TechRadarPro or Future plc.