Despite its overnight success at launch, ChatGPT is still struggling to stand out in some areas, particularly coding support, new research claims.
Generative AI tools such as GitHub's Copilot have been positioned as ideal solutions to programming problems, and some developers have adopted them to speed up their workflows and free up more time for productive work.
However, a new study by researchers at Purdue University found that more than half (52%) of the answers ChatGPT produced to programming questions were incorrect.
ChatGPT helps with coding
The researchers analyzed 517 Stack Overflow questions, comparing ChatGPT's answers with human ones, and found that the errors were widespread: more than half (54%) were conceptual misunderstandings, around a third (36%) were factual inaccuracies, a similar share (28%) were logical errors in the code, and 12% were terminological errors. The categories overlap, as a single answer can contain more than one type of error.
The study also criticized ChatGPT for producing unnecessarily long and complex responses that contain more detail than needed, which can confuse and distract readers. Yet in a small accompanying survey of just 12 programmers, a third preferred ChatGPT's articulate, textbook-style answers, highlighting how easily its polished tone can mask errors.
The implications are significant: errors in code can propagate downstream, ultimately causing larger problems that affect multiple teams or entire organizations.
The researchers conclude: “Since ChatGPT produces a large number of incorrect answers, our results emphasize the need for caution and awareness regarding the use of ChatGPT answers in programming tasks.”
In addition to urging caution, the researchers call for further work to identify and mitigate such errors, as well as greater transparency and communication around potential inaccuracies.