AI-generated code is causing disruption and security issues in businesses


Companies using AI to generate code are experiencing downtime and security issues. The team at Sonar, a provider of code quality and security products, has heard first-hand stories of repeated outages even at major financial institutions, where the developers responsible for the code blame AI.

AI tools are far from perfect when it comes to generating code. Researchers at Bilkent University found that the latest versions of ChatGPT, GitHub Copilot, and Amazon CodeWhisperer generated correct code only 65.2%, 46.3%, and 31.1% of the time, respectively.

Part of the problem is that AI is notoriously bad at math, as it struggles with logic. Programmers also aren’t necessarily good at writing prompts, because “AI doesn’t do things consistently or work like code,” according to Wharton AI professor Ethan Mollick.

WATCH: OpenAI introduces 'Strawberry' model, optimized for complex coding and math

Could 'poor reviews' be a factor?

By the end of 2023, more than half of organizations said they had encountered security issues with bad AI-generated code “sometimes” or “frequently,” according to a Snyk survey. But the problem could get worse, with 90% of enterprise software engineers using AI code assistants by 2028, according to Gartner.

Tariq Shaukat, CEO of Sonar and former president of Bumble and Google Cloud, is already “hearing more and more about the issue.” He told TechRepublic in an interview: “Companies are deploying AI code generation tools more frequently, and the generated code is being put into production, leading to outages and/or security issues.

“This is typically due to insufficient reviews, either because the company has not implemented robust code quality and review practices or because developers review AI-written code less than they would their own code.

“When asked about buggy AI-generated code, the most common response from developers is ‘it’s not my code,’ meaning they feel less responsible because they didn’t write it.”

SEE: 31% of organizations using generative AI ask it to write code (2023)

He stressed that this is not due to a lack of care on the developers’ part, but rather to a lack of interest in “editing the code” and to quality control processes that were not prepared for the speed of AI adoption.

The laissez-faire effect

Furthermore, a 2023 study from Stanford University that looked at how users interact with AI code assistants found that those who use them “wrote significantly less secure code” but were “more likely to believe they wrote secure code.” This suggests that simply using AI tools can lead programmers to adopt a more laissez-faire attitude when it comes to reviewing their work.

It’s human nature to be tempted by a shortcut, particularly when under pressure from a manager or a release schedule, but putting full trust in AI can undermine the quality of code reviews and a developer’s understanding of how the code interacts with the wider application.

The CrowdStrike outage in July highlighted how widespread the damage can be when a critical system fails. While that incident was not specifically related to AI-generated code, it was caused by an error in the validation process that allowed “problematic content data” to be deployed, demonstrating the importance of the human element when vetting critical content.

Developers themselves are well aware of the potential dangers of using AI in their work. According to a Stack Overflow report, only 43% of developers are confident in the accuracy of AI tools, up just 1% from 2023, while AI’s favorability rating among developers fell from 77% last year to 72% this year.

But despite the risk, engineering departments haven't been put off by AI coding tools, largely because of the efficiency benefits. A survey by OutSystems found that more than 75% of software executives cut their development time in half thanks to AI-powered automation. This is also making developers happier, Shaukat told TechRepublic, because they're spending less time on routine tasks.

What is 'code churn'?

The time savings from productivity gains could be offset by the effort required to fix issues caused by AI-generated code.

GitClear researchers inspected 153 million lines of code originally written between January 2020 and December 2023 (when the use of AI coding assistants skyrocketed) that had been altered in some way. They observed an increase in the amount of code that had to be fixed or rolled back less than two weeks after it was created — called “code churn,” which indicates instability.

The researchers project that code churn will double in 2024 compared to the pre-AI baseline of 2021, and that more than 7% of all code changes will be rolled back within two weeks.
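As a rough illustration of how a churn metric like this can be estimated, the Python sketch below walks a repository’s git history and counts how often a file is modified again within two weeks of a previous change. It is a file-level approximation written for this article, not GitClear’s line-level methodology, and the repository path and 14-day window are assumptions that simply mirror the definition above.

# Rough, file-level approximation of "code churn": how often a file is edited
# again within 14 days of a previous edit. Illustrative sketch only, not
# GitClear's methodology.
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=14)

def file_change_dates(repo_path="."):
    """Map each file in the repo's history to the dates on which it was changed."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only",
         "--pretty=format:%ad", "--date=short"],
        capture_output=True, text=True, check=True,
    ).stdout
    changes = defaultdict(list)
    current_date = None
    for line in log.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            # Commit header lines carry the date; everything else is a file path.
            current_date = datetime.strptime(line, "%Y-%m-%d")
        except ValueError:
            if current_date is not None:
                changes[line].append(current_date)
    return changes

def churn_ratio(repo_path="."):
    """Fraction of repeat edits that land within 14 days of the previous edit."""
    total, churned = 0, 0
    for dates in file_change_dates(repo_path).values():
        dates.sort()
        for previous, current in zip(dates, dates[1:]):
            total += 1
            if current - previous <= CHURN_WINDOW:
                churned += 1
    return churned / total if total else 0.0

if __name__ == "__main__":
    print(f"Approximate churn ratio: {churn_ratio():.1%}")

Run from inside a git repository, it prints a single percentage. A real churn analysis would work at the line level and account for authorship, but the sketch shows the general shape of the measurement.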

The percentage of copied-and-pasted code also increased significantly during the study period. This goes against the popular “DRY,” or “Don’t Repeat Yourself,” mantra among programmers, as repeated code can lead to increased maintenance effort, more bugs, and inconsistency in a codebase.
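To make that concern concrete, here is a small, hypothetical Python example (the function names and validation rule are invented for illustration) of the duplication DRY warns against: the same check pasted into two places means a future fix has to be applied twice, and the copies can drift apart unnoticed.

# Hypothetical example of copy-pasted logic, the pattern DRY warns against.
def validate_customer_email(email: str) -> bool:
    return "@" in email and "." in email.split("@")[-1]

def validate_supplier_email(email: str) -> bool:
    # Pasted from validate_customer_email: a rule change now has to be made
    # twice, and the two copies can fall out of sync without anyone noticing.
    return "@" in email and "." in email.split("@")[-1]

# DRY version: one shared helper, so fixes and rule changes happen in one place.
def is_plausible_email(email: str) -> bool:
    local, _, domain = email.partition("@")
    return bool(local) and "." in domain

def validate_customer_email_dry(email: str) -> bool:
    return is_plausible_email(email)

def validate_supplier_email_dry(email: str) -> bool:
    return is_plausible_email(email)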

But as to whether the productivity time savings associated with AI code assistants are being negated by cleanup operations, Shaukat said it's too early to tell.

SEE: Top security tools for developers

“Our experience tells us that typical developers accept suggestions from code generators about 30% of the time. That’s significant,” he said. “When the system is designed correctly, with the right tools and processes in place, any cleanup work is manageable.”

However, developers must still be held accountable for the code they ship, especially when using AI tools. If they aren't, code that causes downtime will slip through the cracks.

Shaukat told TechRepublic: “CEOs, CIOs, and other corporate leaders need to analyze their processes in light of the increased use of AI in code generation and prioritize taking necessary safeguards.

“If they can’t, there will be frequent outages, more bugs, a loss of developer productivity, and increased security risks. AI tools are meant to be trusted, but verified.”
