Today’s business leaders recognize that some applications of generative AI have great potential to help their businesses run better, even if they are still working out exactly how to apply it and what the ultimate return on investment will be. Indeed, as companies turn their generative AI prototypes into at-scale solutions, they must weigh factors such as the technology’s cost, accuracy, and latency to determine its long-term value.
The ever-expanding landscape of large language models (LLMs), combined with the fear of making the wrong choice, leaves some companies in a quandary. LLMs come in all shapes and sizes and serve different purposes, and the truth is that no single LLM will solve every problem. So how can a company determine which one is right for it?
Here we discuss how to make the best choice so your business can confidently deploy generative AI.
Choose your level of LLM sophistication: the sooner the better
Some companies are conservative about adopting an LLM, running pilot projects and waiting for the next generation of models to see how it might change their approach to generative AI. Their reluctance to commit may be justified, as jumping in too early without proper testing could mean big losses. But generative AI is a rapidly evolving technology, with new foundation models being introduced seemingly every week, so being too conservative and continuing to wait for the technology to mature may mean never making any progress.
That said, there are three levels of sophistication that companies can consider when it comes to generative AI. The first is a simple wrapper application around GPT, designed to interact with OpenAI’s language models and provide an interface for text completion and conversation-based interactions. The next level of sophistication is the use of an LLM with retrieval-augmented generation (RAG). RAG allows companies to enhance their LLM’s output with proprietary and/or private data. GPT-4, for example, is a powerful LLM that can understand nuanced language and even reason.
However, it has not been trained on any specific company’s data, which can lead to inaccuracies, inconsistencies, or irrelevancies (hallucinations). Companies can reduce hallucinations by using approaches like RAG, which let them combine the capabilities of a base LLM with data unique to their business. (It should be noted that long-context models like Claude 3 could eventually render RAG obsolete. And while many are still in their early stages, we all know how quickly technology advances, so obsolescence may come sooner rather than later.)
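To make the distinction between the first two levels concrete, here is a minimal Python sketch using the OpenAI SDK. The model name and the retrieve() helper are illustrative stand-ins for whichever hosted model and vector store a company actually uses, not a prescription.

```python
# Minimal sketch of the first two levels of sophistication, assuming the
# OpenAI Python SDK (>=1.0); retrieve() is a hypothetical placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_plain(question: str) -> str:
    """Level one: a thin wrapper around a hosted model."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

def retrieve(question: str, k: int = 3) -> list[str]:
    """Placeholder: in practice this would query your vector store
    for the k most relevant internal documents."""
    return ["<relevant company document snippets would go here>"]

def ask_with_rag(question: str) -> str:
    """Level two: ground the same model in proprietary data before answering."""
    context = "\n\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "Say you don't know if the context is insufficient."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```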
At the third level of generative AI sophistication, a company runs its own models. For example, it can take an open-source model, fine-tune it with proprietary data, and run it on its own IT infrastructure rather than on a third-party offering like OpenAI’s. It should be noted that this third level requires supervision by engineers trained in machine learning.
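For illustration, serving an open-weight model on your own infrastructure can be as simple as the sketch below, which assumes the Hugging Face transformers library; the model name is only an example, and fine-tuning (for instance with LoRA adapters) would sit on top of this serving step.

```python
# A minimal sketch of level three: hosting an open-weight model yourself,
# assuming the Hugging Face transformers library. The model choice and
# generation settings are illustrative, not recommendations.
from transformers import pipeline

# Runs entirely on your own infrastructure -- no data leaves your environment.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any open-weight model you host
    device_map="auto",                            # spread across available GPUs
)

prompt = "Summarize our refund policy for a customer support reply."
result = generator(prompt, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"])
```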
Apply the right LLM to the right use case
Given the options available and the differences in cost and capability, companies need to determine exactly what they plan to accomplish with their LLM. For example, if you are an e-commerce company, human support agents are trained to intervene when a customer is at risk of abandoning their cart and help them decide whether to complete the purchase. A chat interface can achieve the same result at a tenth of the cost. In this case, it may be worthwhile for the e-commerce company to invest in running its own LLM, with engineers monitoring it.
But a larger model isn’t always cost-effective or even necessary. If you run a banking app, you can’t afford mistakes in transactions, so you’ll want tighter control. Developing your own model or taking an open-source one, fine-tuning it, applying carefully designed input and output filters, and hosting it yourself gives you all the control you need. And for companies that simply want to optimize the quality of their customer experience, a well-performing LLM from a third-party vendor is a good choice.
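As a rough illustration of those input and output filters, here is a minimal sketch; the rules and the call_model() stub are placeholders for whatever guardrails and self-hosted model endpoint a bank would actually deploy.

```python
# A minimal sketch of input/output filtering around a self-hosted model;
# the patterns and the call_model() stub are hypothetical placeholders.
import re

BLOCKED_INPUT = re.compile(r"\b(?:\d[ -]*?){13,19}\b")   # naive card-number pattern
BLOCKED_OUTPUT = ("guaranteed return", "cannot fail")     # claims a bank must never make

def call_model(prompt: str) -> str:
    """Stub for a request to your self-hosted, fine-tuned model."""
    return "Your last transaction of $42.10 was posted on Tuesday."

def guarded_completion(user_input: str) -> str:
    # Input filter: refuse prompts containing sensitive payment data.
    if BLOCKED_INPUT.search(user_input):
        return "Please don't share card numbers in chat."

    draft = call_model(user_input)

    # Output filter: block responses the compliance team has ruled out.
    if any(phrase in draft.lower() for phrase in BLOCKED_OUTPUT):
        return "I can't provide that information. A human agent will follow up."
    return draft
```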
A note on observability
Regardless of the LLM chosen, it is critical to understand how it behaves in production. As technology stacks become increasingly complex, it can be difficult to pinpoint performance issues that arise in an LLM. Additionally, because LLMs interact with the rest of the stack in very different ways from traditional software, there are entirely new metrics to track, such as token execution time, hallucinations, skew, and drift. That’s where observability comes in, providing end-to-end visibility across the stack to ensure uptime, reliability, and operational efficiency. In short, adding an LLM without this visibility could greatly undermine how a company measures the technology’s ROI.
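As a simple illustration, the sketch below wraps each LLM call to record latency and token usage, again assuming the OpenAI SDK; in practice these fields would be shipped to an observability platform rather than written to a logger.

```python
# A minimal sketch of per-request LLM telemetry, assuming the OpenAI SDK;
# in production these fields would feed your observability platform.
import logging
import time
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
client = OpenAI()

def observed_completion(prompt: str, model: str = "gpt-4o") -> str:
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    latency_ms = (time.perf_counter() - start) * 1000
    usage = response.usage

    # The kinds of metrics worth tracking on every call: latency, token
    # counts (a proxy for cost), and which model answered. Drift and
    # hallucination checks would be layered on top of records like these.
    logging.info(
        "llm_call model=%s latency_ms=%.0f prompt_tokens=%d completion_tokens=%d",
        model, latency_ms, usage.prompt_tokens, usage.completion_tokens,
    )
    return response.choices[0].message.content
```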
The road to generative AI is exciting and fast-paced, if a little daunting. Understanding your company’s needs and finding the right approach to meet them will not only ensure short-term benefits, but will also lay the foundation for ideal business outcomes in the future.
This article was produced as part of TechRadarPro's Expert Insights channel, where we showcase the best and brightest minds in the tech industry today. The views expressed here are those of the author, and not necessarily those of TechRadarPro or Future plc. If you're interested in contributing, find out more here: