Anyone who has used generative AI for any length of time will be more than familiar with hallucinations: cases where an AI system generates false or misleading information, a failure often rooted in limitations of its training data or model design. These inaccuracies can arise unpredictably and vary widely in severity, from minor errors to substantial fabrications that can skew decision-making.
Lamini Memory Tuning aims to cut hallucinations sharply, from a 50% rate down to 5%, a 90% relative reduction. The technique embeds precise facts directly into LLMs and reportedly achieves accuracy of up to 95%, a significant jump over the roughly 50% accuracy of previous methods.
By tuning millions of expert adapters, such as LoRAs (Low-Rank Adaptation), on top of any open-source LLM, Lamini Memory Tuning is said to ensure precise retention of facts, from historical events to complex technical details, without the high latency and cost typically associated with that level of precision.
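To make the idea concrete, here is a minimal sketch of attaching a single expert adapter to an open-source model with Hugging Face's transformers and peft libraries. The model name and hyperparameters are illustrative assumptions, not Lamini's actual configuration, and Memory Tuning reportedly trains millions of such adapters rather than one.

```python
# Minimal sketch: one LoRA adapter on an open-source LLM via transformers + peft.
# "gpt2" and the hyperparameters below are stand-ins, not Lamini's configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for a larger open-source LLM

lora_config = LoraConfig(
    r=8,                         # low rank keeps each adapter tiny
    lora_alpha=16,
    target_modules=["c_attn"],   # attention projection in GPT-2; differs per architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because each adapter is tiny relative to the base model, training one per narrow slice of facts is what makes scaling to millions of "memory experts" plausible.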
Mixture of Memory Experts
This method, inspired by mind maps, selectively activates only the most relevant experts from an index at inference time, drastically reducing unnecessary computation.
For example, the company says that when asked to recall specific facts about the Roman Empire, the system retrieves only the information it needs about Julius Caesar, the aqueducts, or the legions, avoiding activation of irrelevant model weights.
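The routing step can be pictured with a toy example. The sketch below is not Lamini's implementation; it stands in for the real system's learned index with simple keyword matching, just to show how a query activates a couple of relevant experts while the rest stay untouched.

```python
# Toy illustration of routing a query to relevant "memory experts".
# A real system would use a learned index/embeddings, not keyword overlap.

EXPERT_INDEX = {
    "julius_caesar":  {"caesar", "julius", "dictator"},
    "aqueducts":      {"aqueduct", "aqueducts", "water"},
    "legions":        {"legion", "legions", "army"},
    "egypt_pyramids": {"pyramid", "giza"},   # irrelevant to Roman-Empire queries
}

def route(query: str, top_k: int = 2) -> list[str]:
    """Return up to top_k experts whose keywords match the query."""
    words = set(query.lower().replace("?", "").split())
    scores = {name: len(words & keys) for name, keys in EXPERT_INDEX.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked[:top_k] if scores[name] > 0]

# Only these experts' adapter weights would be applied at inference time.
print(route("Who built the Roman aqueducts?"))            # -> ['aqueducts']
print(route("How many legions did Julius Caesar command?"))  # -> ['julius_caesar', 'legions']
```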
The underlying technology behind Lamini Memory Tuning is a sparse activation framework known as Mixture of Memory Experts (MoME), which scales to a number of facts limited only by the size of the training data. Lamini says this approach not only improves model responsiveness but also significantly reduces computational demands, making it a viable way to boost LLM performance across a range of applications.
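As an illustration of how a sparsely activated bank of memory experts might be wired into a layer, here is a short PyTorch sketch. The class name, sizes, and routing interface are assumptions made for exposition, not Lamini's released architecture; the point is that only the experts selected by the router touch the forward pass, which is where the claimed savings in computation come from.

```python
# Sketch (author's interpretation, not Lamini's code) of a sparsely activated
# memory-expert layer: a frozen base projection plus a large bank of low-rank
# experts, of which only a routed subset is added to the output.
import torch
import torch.nn as nn

class MoMELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 1000, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(dim, dim)          # stands in for frozen pretrained weights
        self.base.requires_grad_(False)
        # Each expert i is a tiny low-rank update W_i = B_i @ A_i
        self.A = nn.Parameter(torch.randn(num_experts, rank, dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, dim, rank))

    def forward(self, x: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim); expert_ids: indices chosen by the router/index.
        out = self.base(x)
        for i in expert_ids.tolist():            # only the selected experts are touched
            out = out + x @ self.A[i].T @ self.B[i].T
        return out

layer = MoMELayer(dim=64)
x = torch.randn(2, 64)
selected = torch.tensor([3, 42])                 # e.g. supplied by the retrieval index
y = layer(x, selected)
print(y.shape)  # torch.Size([2, 64])
```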