The volume of sensitive data that companies store in non-production environments such as development, testing, analytics, and AI/ML is increasing, according to a new report. Executives are also increasingly concerned about protecting it, and incorporating it into new AI products doesn’t help.
The Delphix 2024 State of Data Compliance and Security Report revealed that 74% of organizations handling sensitive data have increased the amount of data stored in non-production environments, also known as lower-level environments, over the past year. Additionally, 91% are concerned about the increased exposure this can create, putting them at risk of breaches and penalties for non-compliance.
The amount of consumer data held by businesses is increasing overall, driven by the growing number of online consumers and by businesses' ongoing digital transformation efforts. IDC predicts that by 2025, the global datasphere will grow to 163 zettabytes, ten times the 16.1 zettabytes of data generated in 2016.
As a result, the amount of sensitive data being stored, such as personally identifiable information, protected health information and financial details, is also increasing.
Sensitive data is often created and stored in production, or live, environments such as CRM or ERP systems, which have strict controls and limited access. However, standard IT operations often result in data being copied multiple times to non-production environments, allowing access by more personnel and increasing the risk of a breach.
The report's findings are the result of a survey by software vendor Perforce of 250 senior employees at organizations with at least 5,000 employees handling sensitive consumer data.
SEE: National public data breach: 2.7 billion records leaked on the Dark Web
More than half of companies have already suffered a data breach
More than half of respondents said they had already experienced a breach of sensitive data stored in non-production environments.
Other evidence suggests the problem is getting worse: an Apple-commissioned study found a 20% increase in data breaches from 2022 to 2023. Indeed, 61% of Americans have learned that their personal data was breached or compromised at some point.
The Perforce report found that 42% of organizations surveyed have experienced ransomware attacks. Ransomware in particular is a growing threat globally; a Malwarebytes study released this month found that ransomware attacks worldwide have increased by 33% in the past year.
Part of the problem is that global supply chains are becoming longer and more complex, increasing the number of potential entry points for attackers. A report from the Identity Theft Resource Center found that the number of organizations affected by supply chain attacks increased by more than 2,600 percentage points between 2018 and 2023. In addition, ransomware payments exceeded $1 billion (£790 million) for the first time in 2023, making it an increasingly lucrative exploit for attackers.
AI is the biggest culprit behind consumer data insecurity
As companies embrace AI in their business processes, it’s becoming increasingly difficult to control what data goes where.
AI systems often require the use of sensitive consumer data to train and operate, and the complexity of the algorithms and potential integration with external systems can create new attack vectors that are difficult to manage. In fact, the report found that AI and machine learning are the leading causes of the growth of sensitive data in non-production environments, as cited by 60% of respondents.
“AI environments may be less controlled and protected than production environments,” the report’s authors wrote. “As a result, they may be easier to breach.”
Business decision-makers are aware of this risk: 85% express concerns about regulatory non-compliance in AI environments. While many AI-specific regulations are in their early stages, the GDPR requires personal data used in AI systems to be processed lawfully and transparently, and there are several applicable laws at the state level in the US.
WATCH: Executive order on artificial intelligence: White House releases 90-day progress report
In August, the EU AI Act came into force, setting strict rules on the use of AI for facial recognition and safeguards for general-purpose AI systems. Companies that fail to comply with the legislation face fines ranging from €7.5 million (US$8.1 million) or 1.5% of global turnover up to €35 million (US$38 million) or 7% of turnover, depending on the infringement and the size of the company. More AI-specific regulations of this kind are expected to emerge in other regions in the near future.
Other concerns about sensitive data in AI environments, cited by more than 80% of respondents in the Perforce study, include the use of low-quality data as input into their AI models, re-identification of personal data, and theft of model training data, which may include intellectual property and trade secrets.
Companies are concerned about the financial cost of insecure data
Another major reason why large companies are so concerned about the insecurity of their data is the prospect of receiving a hefty fine for non-compliance. Consumer data is subject to increasingly strict regulations, such as GDPR and HIPAA, which can be confusing and change frequently.
Many regulations, such as GDPR, apply penalties based on annual turnover, so larger companies face higher costs. Perforce’s report found that 43% of respondents have already had to pay fines or make adjustments for non-compliance, and 52% have experienced audit issues and failures related to non-production data.
But the cost of a data breach can go beyond the fine, as suspended operations also cut into revenue. A recent Splunk report found that the leading cause of downtime incidents was cybersecurity-related human error, such as clicking on a phishing link.
Unplanned service disruptions cost the world’s largest companies $400 billion a year, with contributing factors including direct revenue loss, diminished shareholder value, stagnant productivity, and reputational damage. In fact, the costs of ransomware damage are projected to exceed $265 billion by 2031.
According to IBM, the average cost of a data breach in 2024 is $4.88 million, up 10% from 2023. The tech giant’s report adds that 40% of breaches occurred on data stored in various environments, such as the public cloud and on-premises, and that these cost more than $5 million on average and took the longest to identify and contain. This shows that business leaders are right to be concerned about data proliferation.
WATCH: Nearly 10 billion passwords leaked in biggest leak ever
Taking steps to protect data in non-production environments can be resource intensive
There are ways to protect data stored in non-production environments, such as masking sensitive data. However, the Perforce report found that businesses have several reasons for being reluctant to do so, including that respondents find it difficult and time-consuming, and that it can slow down the organization.
- Nearly a third are concerned that this could slow down software development, as securely replicating production databases to non-production environments can take weeks.
- 36% say that masked data can be unrealistic and therefore affect software quality.
- 38% think security protocols can inhibit a company's ability to track and comply with regulations.
The report also revealed that 86% of organizations allow data compliance exceptions in non-production environments to avoid the hassle of storing data securely. These exceptions include using a limited data set, data minimization, or obtaining consent from the data subject.
Recommendations for protecting sensitive data in non-production environments
The Perforce team outlined the four main ways that organizations can protect their sensitive data in non-production environments:
- Static data masking: Permanently replaces sensitive values with fictitious but realistic equivalents.
- Data loss prevention (DLP): A perimeter-defense security approach that detects potential data breaches and thefts and attempts to prevent them.
- Data encryption: Reversibly converts data into ciphertext so that only authorized users holding the key can access it.
- Strict access control: A policy that classifies users by roles and other attributes, and configures those users' access to data sets based on these categories.
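To illustrate the first of these techniques, the sketch below shows what static data masking might look like in practice. This is a minimal, hypothetical example, not Perforce's implementation: the field names, the `mask_record` helper, and the hash-bucket approach are all assumptions chosen so that each sensitive value is replaced deterministically with a realistic-looking stand-in, preserving referential consistency across copies while making the originals unrecoverable from the masked set.

```python
import hashlib

# Hypothetical static data masking sketch: sensitive values are permanently
# replaced with fictitious but realistic equivalents before a data set is
# copied to a non-production environment. Hashing makes the replacement
# deterministic, so the same input always masks to the same output.

FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Morgan", "Casey"]

def _bucket(value: str, size: int) -> int:
    """Deterministically map a value to a bucket via SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % size

def mask_name(name: str) -> str:
    """Replace a real name with a realistic fictitious one."""
    return FIRST_NAMES[_bucket(name, len(FIRST_NAMES))]

def mask_email(email: str) -> str:
    """Keep the shape of an email address but hide the identity."""
    local, _, _domain = email.partition("@")
    return f"user{_bucket(local, 10_000):04d}@example.com"

def mask_record(record: dict) -> dict:
    """Return a masked copy of a customer record for non-production use."""
    return {
        "name": mask_name(record["name"]),
        "email": mask_email(record["email"]),
        # Non-sensitive fields pass through unchanged.
        "plan": record["plan"],
    }

masked = mask_record({"name": "Ada Lovelace", "email": "ada@corp.com", "plan": "pro"})
```

Because the mapping is one-way, a masked development database stays usable for testing while a breach of that environment exposes no real customer identities.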
The authors wrote: “Protecting sensitive data in general is not easy. AI and machine learning increase that complexity.
“Tools that specialize in protecting sensitive data in other non-production environments (e.g., development, testing, and analytics) are well positioned to help you protect your AI environment.”