Unplanned downtime is costing the world's largest companies $400 billion a year, or about 9% of their profits, according to a new report. This equates to about $9,000 lost for every minute of system failure or service degradation.
The report, published by data management platform Splunk, also found that it takes 75 days for a Forbes Global 2000 company's revenue to recover to its pre-incident level.
Downtime directly results in financial losses through lost revenue, regulatory fines, and overtime wages for staff who fix the problem. The report also revealed hidden costs that take longer to have an impact, such as decreased shareholder value, stagnant developer productivity, and reputational damage.
The Hidden Costs of Downtime report surveyed 2,000 executives, including CFOs, CMOs, engineers, and IT and security professionals, from Global 2000 companies in 53 countries and a variety of industries. They provided insights into where downtime was originating, how it was affecting their business, and how to reduce it.
Causes of downtime include cybersecurity-related human errors
Downtime incidents experienced by large enterprises can be classified into one of two categories: security incidents (e.g., phishing attacks) or application or infrastructure issues (e.g., software crashes). According to the report, the average Global 2000 company experiences 466 hours of cybersecurity-related downtime and 456 hours of application or infrastructure-related downtime.
“While availability for most systems is several nines, downtime on hundreds, or perhaps thousands, of systems adds up,” the authors wrote.
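The authors' point about "nines" can be checked with simple arithmetic. The sketch below is an illustration rather than a calculation from the report: it assumes every system has the same availability, that outages are simply summed across systems, and the function names and fleet sizes are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_downtime_hours(nines: int) -> float:
    """Expected yearly downtime for one system with the given number of nines.

    Three nines means 99.9% availability, i.e. an unavailability of 10**-3.
    """
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability


def fleet_downtime_hours(nines: int, systems: int) -> float:
    """Summed yearly downtime across a fleet of identical systems.

    This adds up per-system downtime; it does not model overlapping outages.
    """
    return annual_downtime_hours(nines) * systems


if __name__ == "__main__":
    # Hypothetical fleet sizes, chosen only to show how the total grows.
    for nines in (3, 4, 5):
        for systems in (1, 100, 1000):
            total = fleet_downtime_hours(nines, systems)
            print(f"{nines} nines x {systems:>4} systems: {total:8.2f} hours/year")
```

Under these assumptions, a single system at three nines is down for about 8.76 hours a year, while 1,000 systems at four nines accumulate roughly 876 hours between them.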
The top cause of downtime incidents cited by respondents was cybersecurity-related human error, such as clicking on a phishing link. This was followed by ITOps-related human error (e.g., infrastructure misconfigurations, capacity issues, and application code errors). On average, it takes 18 hours to detect downtime or service degradation (such as latency) resulting from human error, and another 67 to 76 hours to recover.
Software failures are the third leading cause of downtime, and they become a greater risk as organizations adopt more complex development and deployment practices. Malware attacks rank fourth.
The report revealed that more than half of executives know the root causes of downtime in their organizations, but choose not to address them. This may be because they don't want to add to the technical debt of legacy systems or they don't have a plan to decommission the problematic application. Additionally, only 42% of technology executives perform a postmortem after a downtime incident to isolate and fix the cause, as such reviews can be difficult and time-consuming.
Direct costs of downtime
Lost revenue is by far the largest cost of downtime, averaging $49 million per year for each Global 2000 company. The second largest is regulatory fines, at $22 million, as many jurisdictions impose strict downtime regulations, such as the Digital Operational Resilience Act for the EU financial sector.
Other major cost sinks include repairing brand reputation. According to CMOs, it costs an average of $14 million to carry out the necessary brand trust campaigns and another $13 million to repair relationships with the public, investors and the government. It takes about 60 days to fully restore the brand to health.
Despite the advice of cyber professionals, 67% of CFOs recommend that their board of directors pay the ransom to get out of a ransomware attack, whether directly to the perpetrator, through insurance, via a third party, or all three. The payments cost Global 2000 companies a total of $19 million a year.
Hidden costs of downtime
Beyond the immediate financial costs of downtime, respondents cited a host of other costly knock-on effects. For example, 28% said a downtime event decreased their shareholder value, with the average share price falling 2.5%. It took an average of 79 days for a large company's stock to recover to where it was previously.
Other hidden costs of downtime events include delayed time to market and stalled developer innovation, cited by 74% and 64% of respondents, respectively. The latter is a result of technical teams moving from high-value work to applying patches and participating in postmortems. Similarly, in marketing departments, downtime causes teams and budgets to pivot toward crisis management, thereby losing productivity in other areas.
Customer lifetime value can also be affected by downtime, according to 40% of respondents, as an outage will negatively impact the customer experience and therefore their loyalty to the organization. In fact, 29% of companies surveyed say they know they have lost customers as a result of an incident.
How businesses can avoid downtime
Advice from resilience leaders
Splunk's report identified several ways companies can avoid downtime, either because respondents found them useful or because they were practiced by the top 10% of companies that proved most resilient to disruptions.
Companies in this latter category, the so-called “resilience leaders,” retain $17 million more of their revenue, pay $10 million less in fines, and save $7 million in ransomware payments. They also recover 23% and 28% faster than average from downtime related to cybersecurity and applications or infrastructure, respectively. As a result, hidden costs, such as poor customer experience, have less impact.
Resilience leaders invest more than other surveyed organizations in the following areas:
- Security tools: $12 million more.
- Observability tools: $2.4 million more.
- Additional infrastructure capacity: $8 million more.
- Cyber insurance premiums: $11 million more.
- Backups: $10 million more.
Generative AI can also be used to reduce downtime, as it can equip teams with the information they need to get back online quickly. The report found that resilience leaders expand the use of AI functions four times faster than other respondents. Additionally, 74% of companies using discrete AI tools and 64% integrating AI into existing tools to address downtime found it beneficial.
Tips from Splunk
The report authors also provided tips to avoid downtime based on their experience.
- Have a downtime plan. Instrument each application, follow a runbook for detecting outages, and identify the engineers who own each system. Practice drills and tabletop exercises.
- Conduct postmortems. Observability tools make it easy to isolate root causes and implement solutions.
- Establish a clear data governance policy. Rules on intellectual property, especially on feeding it into large language models, will help protect the organization from data leaks.
- Connect teams and tools. Teams that share tools, data, and context will find it easier to collaborate, troubleshoot, and identify the root cause of downtime.
- Employ predictive analytics. Solutions powered by AI and ML can recognize patterns and alert teams when downtime may occur.
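As a rough illustration of the last tip, the sketch below flags latency samples that deviate sharply from a rolling baseline, which is one simple way an alerting tool might notice degradation before a full outage. It is a minimal example under stated assumptions, not the method used by Splunk or any specific product: the class name `LatencyMonitor`, the window size, and the threshold are all hypothetical, and production systems typically use far richer models.

```python
from collections import deque
from statistics import mean, stdev


class LatencyMonitor:
    """Flags latency samples far above a rolling baseline (one-sided z-score)."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)  # recent "normal" latencies, in ms
        self.threshold = threshold           # standard deviations above the mean
        self.min_samples = min_samples       # baseline size needed before alerting

    def observe(self, latency_ms: float) -> bool:
        """Record a sample and return True if it looks anomalous."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0 and (latency_ms - mu) / sigma > self.threshold:
                anomalous = True
        if not anomalous:
            # Only normal samples update the baseline, so a spike does not
            # immediately raise the bar for detecting the next one.
            self.samples.append(latency_ms)
        return anomalous


if __name__ == "__main__":
    monitor = LatencyMonitor()
    # Hypothetical latency readings in milliseconds.
    for reading in [100, 102, 98, 101, 99, 100, 103, 97, 250, 100]:
        if monitor.observe(reading):
            print(f"ALERT: latency of {reading} ms is well above the recent baseline")
```

Keeping anomalous samples out of the baseline is a design choice that favors repeated alerts during a sustained incident; a real deployment would also need to handle gradual drift and seasonal patterns.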
“Business disruption is inevitable. When digital systems fail unexpectedly, companies not only lose substantial revenue and risk facing regulatory fines, but they also lose customer trust and reputation,” said Gary Steele, president of Go-to-Market at Cisco and GM at Splunk, in a press release.
“The way an organization reacts, adapts and evolves in the face of disruption is what sets it apart as a leader. A critical element of a resilient enterprise is a unified approach to security and observability to quickly detect and resolve issues across your entire digital footprint.”