OpenAI, the maker of ChatGPT, has introduced Sora, its artificial intelligence engine for converting text prompts into video. Think Dall-E (also developed by OpenAI), but for movies instead of still images.
It's still very early days for Sora, but the AI model is already generating plenty of buzz on social media, with multiple clips doing the rounds that look as though they were created by a professional team of filmmakers.
Here we'll explain everything you need to know about OpenAI Sora: what it's capable of, how it works, and when you'll be able to use it yourself. The era of AI text-to-video filmmaking has arrived.
OpenAI Sora release date and price
In February 2024, OpenAI Sora was made available to “red teams,” i.e. people whose job it is to test the security and stability of a product. OpenAI has also invited a select number of visual artists, designers and filmmakers to test the video generation capabilities and provide feedback.
“We are sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon,” OpenAI says.
In other words, the rest of us can't use it yet. At the moment, there is no indication of when Sora might be available to the general public or how much we will have to pay to access it.
We can make some rough guesses about the timescale based on what happened with ChatGPT. Before that AI chatbot was released to the public in November 2022, OpenAI put out a precursor model called InstructGPT earlier that year. OpenAI DevDay also usually takes place annually in November.
It's certainly possible, then, that Sora could follow a similar pattern and be released to the public at a similar point in 2024. But this is just speculation for now, and we'll update this page as soon as we have a clearer indication of a Sora release date.
As for the price, we also have no clue as to how much Sora might cost. As a guide, ChatGPT Plus, which offers access to OpenAI's newest large language models (LLMs) and Dall-E, currently costs $20 (around £16 / AU$30) per month.
But generating a video demands much more computing power than generating a single image with Dall-E, and the process takes longer too. It therefore remains unclear whether Sora, which is effectively a research preview, can become an affordable consumer product.
What is OpenAI Sora?
You may be familiar with generative AI models, such as Google Gemini for text and Dall-E for images, which can produce new content based on large amounts of training data. If you ask ChatGPT to write you a poem, for example, what you'll receive will be based on many, many poems that the AI has already absorbed and analyzed.
OpenAI Sora is a similar idea, but for video clips. You give it a text prompt, like “woman walking down a city street at night” or “car driving through a forest,” and you get back a video. As with AI image models, you can be very specific about what should be included in the clip and the style of footage you want to see.
To get a better idea of how this works, check out some of the example videos posted by Sam Altman, CEO of OpenAI – not long after Sora was introduced to the world, Altman responded to prompts submitted on social media, returning videos for requests such as “a wizard wearing a pointy hat and a blue robe with white stars casting a spell that shoots lightning from his hand and holding an old tome in his other hand.”
How does OpenAI Sora work?
On a simplified level, the technology behind Sora is the same technology that lets you search for images of a dog or cat on the web. Show an AI enough photos of a dog or cat and it can detect the same patterns in new images; likewise, if you train an AI on a million videos of a sunset or a waterfall, it will be able to generate its own.
Of course, there's a lot of complexity behind this, and OpenAI has provided a deep dive into how its AI model works. It's trained on “Internet-scale data” to know what realistic videos look like, first analyzing the clips to know what you're looking at, and then learning to produce its own versions when prompted.
So, ask Sora to produce a clip of a fish tank and it'll return an approximation based on all the fish tank videos it's seen. It makes use of what are known as visual patches, smaller building blocks that help the AI understand what should go where, and how the different elements of a video should interact and progress, frame by frame.
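OpenAI hasn't published Sora's code, but the basic idea of visual patches – slicing a video into small spacetime blocks that the model treats like tokens – can be sketched in a few lines. Everything below is a hypothetical illustration: the function name, the patch sizes, and the array layout are our own choices for demonstration, not details from Sora itself.

```python
import numpy as np

# Toy illustration of "visual patches": chop a video (frames x height x width x channels)
# into small spacetime blocks. Each block becomes one row, i.e. one "token" the
# model can reason about. Patch sizes here are invented for the example.
def to_patches(video, pt=2, ph=4, pw=4):
    f, h, w, c = video.shape
    patches = (video
               .reshape(f // pt, pt, h // ph, ph, w // pw, pw, c)
               .transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch grid together
               .reshape(-1, pt * ph * pw * c))   # one flattened patch per row
    return patches

video = np.zeros((8, 16, 16, 3))       # 8 frames of 16x16 RGB
print(to_patches(video).shape)         # (64, 96): 64 patches of 96 values each
```

The point of this representation is that a video of any length and resolution becomes a simple sequence of uniform blocks, which is the format transformer-style models are built to consume.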
Sora is based on a diffusion model, in which the AI starts with a “noisy” response and then works toward a “clean” output through a series of feedback loops and prediction calculations. You can see this in the frames above, where a video of a dog playing in the snow goes from meaningless blobs to something that actually looks realistic.
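The denoising loop at the heart of diffusion can be shown with a deliberately tiny example. This is not Sora's algorithm – in a real model, a trained neural network estimates the noise at each step – so here we stand in for that network with the true answer, purely so the iterative noise-to-clean refinement is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "diffusion": begin with pure noise and repeatedly remove a fraction
# of the estimated noise. A real diffusion model predicts the noise with a
# neural network; here we cheat and compute it exactly, to show the loop.
target = np.array([1.0, -1.0, 0.5, 0.0])   # stand-in for a "clean" output
x = rng.normal(size=target.shape)          # step 0: meaningless noise

for step in range(50):                     # iterative refinement
    predicted_noise = x - target           # a trained model would estimate this
    x = x - 0.1 * predicted_noise          # strip away a fraction of the noise

print(np.round(x, 3))                      # now very close to the clean target
```

Each pass only removes a little noise, which is why diffusion outputs emerge gradually rather than in a single step – and part of why generating video this way is so computationally expensive.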
And like other generative AI models, Sora uses transformer technology (the last T in ChatGPT stands for Transformer). Transformers use a variety of sophisticated data analysis techniques to process reams of data: they can understand the most important and least important parts of what is being analyzed and discover the surrounding context and relationships between these chunks of data.
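That weighing of “most important and least important parts” is what attention, the core transformer operation, computes. Below is a minimal, generic scaled dot-product attention sketch – standard textbook code, not anything specific to Sora's architecture, which OpenAI hasn't released.

```python
import numpy as np

# Minimal scaled dot-product attention: each token scores every other token,
# and the softmax weights decide how much each part of the sequence matters
# when building the output for a given position.
def attention(q, k, v):
    scores = q @ k.T / np.sqrt(k.shape[-1])          # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ v                               # weighted mix of values

tokens = np.random.default_rng(1).normal(size=(5, 8))  # 5 tokens, 8 dims each
out = attention(tokens, tokens, tokens)                # self-attention
print(out.shape)                                       # (5, 8)
```

Applied to the visual patches described above, this is how a transformer can relate what happens in one corner of one frame to what happens elsewhere in the video, many frames later.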
What we don't fully know is where OpenAI sourced its training data; the company hasn't said which video libraries have been used to power Sora, though we do know it has partnerships with content databases such as Shutterstock. In some cases, you can see similarities between the training data and the results Sora produces.
What can you do with OpenAI Sora?
At the moment, Sora is capable of producing HD videos of up to one minute, without any accompanying sound, from text prompts. If you want to see some examples of what's possible, we've put together a list of 11 cool Sora shorts for you to watch, including fluffy Pixar-style animated characters and astronauts in knitted helmets.
“Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user's prompt,” OpenAI says, but that's not all. It can also generate videos from still images, fill in missing frames in existing videos, and seamlessly join multiple videos together. It can even create still images, or produce endless loops from clips provided to it.
It can also produce simulations of video games such as Minecraft, again based on large amounts of training data that teach it what a game like Minecraft should look like. We've already seen a demo in which Sora controls a player in a Minecraft-style environment while accurately rendering the surrounding details.
OpenAI acknowledges some of Sora's current limitations. The physics don't always make sense, with people disappearing, transforming, or blending into other objects. Sora isn't plotting a scene with individual actors and props; rather, it's doing an enormous number of calculations about where pixels should go from frame to frame.
In Sora's videos, people may move in ways that defy the laws of physics, or details (like the bite of a cookie) may not be remembered from one frame to the next. OpenAI is aware of these issues and is working to fix them, and you can check out some of the examples on the OpenAI Sora website to see what we mean.
Despite those mistakes, OpenAI hopes Sora can eventually evolve into a realistic simulator of physical and digital worlds. In the coming years, Sora's technology could be used to generate imaginary virtual worlds for us to explore, or to let us fully explore real places that have been replicated in AI.
How can you use OpenAI Sora?
At the moment, you can't use Sora without an invitation: it appears that OpenAI is handpicking individual creators and testers to help prepare its video-generating AI model for a full public release. It remains to be seen how long this preview period will last, whether months or years, but OpenAI has previously shown its willingness to move as quickly as possible with its AI projects.
Based on existing technologies that OpenAI has made public (Dall-E and ChatGPT), it seems likely that Sora will initially be available as a web application. Since its launch, ChatGPT has gotten smarter and added new features, including custom bots, and Sora will likely follow the same path when it fully launches.
Before that happens, OpenAI says it wants to implement some guardrails: It won't be able to generate videos that show extreme violence, sexual content, hate images, or images of celebrities. There are also plans to combat misinformation by including metadata on Sora's videos indicating that they were generated by AI.