Google's Gemini AI has only been around for two months at the time of writing, and the company is already launching its next-generation model, Gemini 1.5.
The announcement post gets into the nitty-gritty and explains all the AI improvements in detail. It's all pretty technical, but the main takeaway is that Gemini 1.5 will offer “dramatically improved performance.” This was achieved by implementing a “mixture of experts architecture” (or MoE for short) that sees multiple AI models working together in unison. Adopting this structure made Gemini easier to train and faster at learning complicated tasks than before.
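To make the "multiple models working together" idea more concrete, here is a toy sketch of one common way MoE systems are built: a router scores the experts, and only the top few are run and blended. This is an illustration under stated assumptions, not a description of how Google built Gemini 1.5. The experts, the router scores and the `moe_forward` function are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, router_scores, top_k=2):
    """Run only the top_k highest-scoring experts on x and blend their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_scores[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))


# Three hypothetical "experts"; in a real model each would be a small neural network.
experts = [lambda x: 2 * x, lambda x: x + 10, lambda x: -x]

# Hypothetical router scores for this input; a real router learns them during training.
router_scores = [1.0, 1.0, -5.0]

# The third expert scores lowest, so it is never run for this input.
output = moe_forward(3.0, experts, router_scores, top_k=2)
print(output)
```

Because only some experts run for any given input, a design like this can keep the cost per request down even when the overall model is large.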
There are plans to roll out the update to all three major versions of the AI, but the only one launching today for early testing is Gemini 1.5 Pro.
What's unique is that the model has “a context window of up to 1 million tokens.” Tokens, as they relate to generative AI, are the smallest pieces of data that LLMs (large language models) use “to process and generate text.” Larger context windows allow the AI to handle more information at once. And one million tokens is a huge number, far exceeding what GPT-4 Turbo can do. The OpenAI engine, for comparison, has a context window limit of 128,000 tokens.
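For a sense of scale, the two limits quoted above can be compared with a few lines of Python. The window sizes come from the article. The four-characters-per-token rule of thumb and the helper names are assumptions for illustration only, since real tokenizers vary by model and language.

```python
# Context window limits quoted in the article, in tokens.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Pro": 1_000_000,
    "GPT-4 Turbo": 128_000,
}

# Assumption: roughly 4 characters of English text per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic only)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_window(token_count: int, window: int) -> bool:
    """True if a prompt of token_count tokens fits within the window."""
    return token_count <= window


# How many times larger the Gemini 1.5 Pro window is than GPT-4 Turbo's.
ratio = CONTEXT_WINDOWS["Gemini 1.5 Pro"] / CONTEXT_WINDOWS["GPT-4 Turbo"]
print(f"Window ratio: {ratio:.1f}x")

# A hypothetical 300,000-token document checked against both limits.
for model, window in CONTEXT_WINDOWS.items():
    print(model, fits_in_window(300_000, window))
```

In this sketch, the hypothetical 300,000-token document fits within the one-million-token window but not within the 128,000-token one.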
Gemini Pro in action
With all these numbers, the question is: what does Gemini 1.5 Pro look like in action? Google made several videos showing off the AI's capabilities. Admittedly, they are quite interesting, as they reveal how the updated model can analyze and summarize large amounts of text according to a prompt.
In one example, the team gave Gemini 1.5 Pro the 400+ page transcript of the Apollo 11 lunar mission. The demo showed that the AI could “understand, reason and identify” certain details in the document. The prompter asked the AI to locate “comic moments” during the mission. After 30 seconds, Gemini 1.5 Pro managed to find some jokes that the astronauts made while in space, identifying who told them and explaining the references made.
These analysis skills carry over to other modalities. In another demo, the development team gave the AI a 44-minute Buster Keaton movie. They uploaded a sketch of a gushing water tower and then asked for the timestamp of a scene involving a water tower. Sure enough, it found the exact part ten minutes into the movie. Note that this was done without any explanation of the drawing itself or any text besides the question. Gemini 1.5 Pro understood that it was a water tower without additional help.
Experimental technology
The model is not available to the general public at the moment. It is currently being offered for free as a preview to “developers and enterprise customers” via Google's AI Studio and Vertex AI platforms. The company warns testers that they may experience long latency, since the model is still experimental. However, there are plans to improve speeds in the future.
We've reached out to Google for information on when people can expect the release of Gemini 1.5 and Gemini 1.5 Ultra, as well as the broader release of these next-generation AI models. This story will be updated later. Until then, check out TechRadar's roundup of the best AI content generators for 2024.