New research has claimed that large language models, specifically GPT-4 (which powers certain versions of ChatGPT and several Microsoft Copilot-branded generative AI products), can analyze financial statements more accurately than humans.
The findings from researchers at the University of Chicago suggest significant implications for the future of financial analysis and decision making as AI becomes more common.
The study also highlights the versatility of generic, multipurpose LLMs like GPT-4, which can offer similar capabilities to more specialized tools, noting: “We find that the prediction accuracy of the LLM is on par with the performance of a closely trained state-of-the-art machine learning model.”
LLMs are excellent at analyzing financial reports.
In their testing, the researchers found that GPT-4 outperformed human analysts even without textual context, achieving an accuracy of 60% compared with the 53-57% range for human analysts.
However, success did not come without some initial legwork, as the paper details the researchers' use of chain-of-thought prompts to elicit more appropriate and accurate responses.
Additionally, the study found that GPT-4 and human analysts complement each other well: while the LLM excels in areas where humans can be inefficient or biased, humans add value when additional context is required.
GPT-4's capabilities were attributed to its broad knowledge base and theoretical understanding, allowing it to draw conclusions from data patterns even without specific financial training. Although the model was shown to have some limitations, progress has already been seen in the latest model, GPT-4o, which significantly improves efficiency while becoming multimodal.
While there has been plenty of skepticism surrounding generative AI's readiness to replace human workers, its role as an integral support tool is becoming increasingly evident as workers prepare to work alongside efficiency-boosting technology.