At an event in San Francisco in November, Sam Altman, CEO of artificial intelligence company OpenAI, was asked what surprises the field would bring in 2024.
Online chatbots like OpenAI’s ChatGPT will take “a leap forward that no one expected,” Altman immediately responded.
Sitting next to him, James Manyika, a Google executive, nodded and said, “One more.”
The AI industry this year will be defined by one main characteristic: a remarkably rapid improvement of the technology as advances complement each other, allowing AI to generate new types of media, imitate human reasoning in new ways and seep into the physical world through a new generation of robots.
In the coming months, AI-powered image generators like DALL-E and Midjourney will instantly deliver videos as well as still images. And little by little, they will merge with chatbots like ChatGPT.
That means chatbots will expand far beyond digital text by handling photos, videos, diagrams, tables, and other media. They will exhibit behavior that more closely resembles human reasoning and tackle increasingly complex tasks in fields such as mathematics and science. As the technology moves into robots, it will also help solve problems beyond the digital world.
Many of these developments have already begun to emerge within major research laboratories and in technological products. But in 2024, the power of these products will increase significantly and they will be used by many more people.
“The rapid progress of AI will continue,” said David Luan, CEO of Adept, an AI startup. “It is unavoidable.”
OpenAI, Google, and other tech companies are advancing AI much faster than other technologies because of the way the underlying systems are built.
Most software applications are created by engineers, one line of computer code at a time, which is often a slow and tedious process. Companies are improving AI more quickly because the technology is based on neural networks, mathematical systems that can learn skills by analyzing digital data. By identifying patterns in data such as Wikipedia articles, books, and digital texts scraped from the Internet, a neural network can learn to generate text on its own.
This year, technology companies plan to provide artificial intelligence systems with more data (including images, sounds and more text) than people can understand. As these systems learn the relationships between these various types of data, they will learn to solve increasingly complex problems, preparing them for life in the physical world.
(The New York Times sued OpenAI and Microsoft last month for copyright infringement of news content related to artificial intelligence systems.)
None of this means that AI will be able to match the human brain any time soon. While AI companies and entrepreneurs aim to create what they call “artificial general intelligence” (a machine that can do anything the human brain can do), this remains a daunting task. Despite its rapid advances, AI is still in its early stages.
Here’s a guide to how AI will change this year, starting with the nearest-term advances, which will in turn lead to further gains in its capabilities.
Instant Videos
Until now, AI-powered apps primarily generated text and still images in response to prompts. DALL-E, for example, can create photorealistic images in seconds from requests like “a rhino diving off the Golden Gate Bridge.”
But this year, companies like OpenAI, Google, Meta, and New York-based Runway are likely to roll out image generators that allow people to generate videos as well. These companies have already prototyped tools that can instantly create videos from short text prompts.
Tech companies are likely to bring the powers of image and video generators to chatbots, making them more powerful.
‘Multimodal’ chatbots
Chatbots and image generators, originally developed as standalone tools, are gradually merging. When OpenAI introduced a new version of ChatGPT last year, the chatbot could generate images as well as text.
AI companies are building “multimodal” systems, meaning AI can handle multiple types of media. These systems learn skills by analyzing photographs, text, and potentially other types of media, including diagrams, charts, sounds, and videos, so they can then produce their own text, images, and sounds.
That’s not all. Because these systems also learn the relationships between different types of media, they will be able to understand one type of media and respond with another. In other words, someone can feed an image into a chatbot and it will respond with text.
“Technology will become smarter and more useful,” said Ahmad Al-Dahle, who heads the generative AI group at Meta. “It will do more things.”
Multimodal chatbots will get things wrong, just as text-only chatbots make mistakes. Technology companies are working to reduce errors as they strive to create chatbots that can reason like a human.
Better ‘reasoning’
When Altman talks about AI taking a leap forward, he’s referring to chatbots that are better at “reasoning” so they can take on more complex tasks, such as solving complicated math problems and generating detailed computer programs.
The goal is to build systems that can carefully and logically solve a problem through a series of discrete steps, each of which builds on the one before it. This is how humans reason, at least in some cases.
Leading scientists disagree about whether chatbots can really reason like this. Some argue that these systems simply appear to reason while repeating behaviors they have seen in Internet data. But OpenAI and others are building systems that can more reliably answer complex questions involving subjects like mathematics, computer programming, physics and other sciences.
“As systems become more reliable, they will become more popular,” said Nick Frosst, a former Google researcher who helps run Cohere, an artificial intelligence startup.
If chatbots are better at reasoning, they can become “AI agents.”
‘AI agents’
As companies teach AI systems how to solve complex problems step by step, they can also improve chatbots’ ability to use software applications and websites on your behalf.
Basically, researchers are transforming chatbots into a new type of autonomous system called an artificial intelligence agent. That means chatbots can use software applications, websites, and other online tools, including spreadsheets, online calendars, and travel sites. People could then transfer tedious office work to chatbots. But these agents could also eliminate jobs entirely.
Chatbots already operate as agents in small ways. They can schedule meetings, edit files, analyze data, and create bar charts. But these tools don’t always work as well as they should. Agents break down completely when applied to more complex tasks.
This year, AI companies are set to introduce agents that are more trustworthy. “You should be able to delegate any tedious, everyday IT work to an agent,” Luan said.
This could include tracking expenses in an app like QuickBooks or recording vacation days in an app like Workday. In the long term, these agents will extend beyond software and Internet services and into the world of robotics.
Smarter robots
In the past, robots were programmed to perform the same task over and over again, such as picking up boxes that were always the same size and shape. But using the same type of technology that underpins chatbots, researchers are giving robots the power to handle more complex tasks, including ones they’ve never seen before.
Just as chatbots can learn to predict the next word in a sentence by analyzing large amounts of digital text, a robot can learn to predict what will happen in the physical world by analyzing countless videos of objects being pushed, lifted, and moved.
“These technologies can absorb enormous amounts of data. And as they absorb data, they can learn how the world works, how physics works, how you interact with objects,” said Peter Chen, a former OpenAI researcher who runs Covariant, a robotics startup.
This year, AI will power robots that operate behind the scenes, like mechanical arms that fold shirts in a laundromat or sort through piles of stuff inside a warehouse. Tech titans like Elon Musk are also working to move humanoid robots into people’s homes.