OpenAI's Sora generates photorealistic videos

OpenAI on February 15 unveiled an impressive new text-to-video model called Sora that can create photorealistic or cartoon-like moving images from natural language text prompts. Sora is not yet available to the public. Instead, OpenAI has given Sora to red teamers (security researchers who mimic the techniques used by threat actors) to assess potential harms or risks. OpenAI has also offered Sora to select visual and audio designers and artists for feedback on how best to optimize Sora for creative work.

OpenAI's emphasis on safety around Sora is standard for generative AI today, but it also shows the importance of taking precautions when it comes to AI that could be used to create convincing fake images, which could, for example, harm the reputation of an organization.

What is Sora?

Sora is a generative AI diffusion model. Sora can generate multiple characters, complex backgrounds, and realistic-looking movements in videos up to one minute long. It can create multiple shots within a single video while keeping the characters and visual style consistent, which makes Sora an effective storytelling tool.

In the future, Sora could be used to generate videos to accompany content, to promote content or products on social media, or to illustrate points in business presentations. While it shouldn't replace the creative minds of professional video creators, Sora could help them produce content more quickly and easily. There's no information on pricing yet, but OpenAI may eventually offer an option to incorporate Sora into its ChatGPT Enterprise subscription.

“Media and entertainment will be the industry vertical that can be the first to adopt models like these,” Gartner analyst and distinguished vice president Arun Chandrasekaran told TechRepublic in an email. “Business functions like marketing and design within businesses and technology companies could also be early adopters.”

How do I access Sora?

Unless you have already received access from OpenAI as part of its red team or creative beta testing, it is not possible to access Sora now. OpenAI released Sora to select visual artists, designers, and filmmakers to learn how to optimize Sora specifically for creative uses. Additionally, OpenAI has given access to red team researchers specializing in disinformation, hate content, and bias. Chandrasekaran said OpenAI's initial release of Sora is “a good approach and consistent with OpenAI's practices on secure model release.”

“Of course, this alone will not be enough, and they must implement practices to root out bad actors who gain access to these models or their nefarious uses,” Chandrasekaran said.

How does Sora work?

Sora is a diffusion model built on a transformer architecture: it starts from an image of meaningless noise and gradually refines it into a coherent one that matches the prompt. The research OpenAI conducted to create its DALL-E and GPT models, particularly the recaptioning technique from DALL-E 3, was a stepping stone toward the creation of Sora.
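To make the idea of gradual refinement concrete, here is a toy sketch in Python. It is not OpenAI's method and shows nothing of Sora's internals. The function names, the eight-value "image," and the step count are all hypothetical. Most importantly, the denoiser here cheats by looking at the target directly, whereas a real diffusion model uses a trained neural network to estimate the noise from the noisy sample and the text prompt.

```python
import random


def mean_abs_error(sample, target):
    """Average absolute gap between the current sample and the target."""
    return sum(abs(s - t) for s, t in zip(sample, target)) / len(target)


def toy_denoise(target, steps=50, seed=0):
    """Refine pure noise toward `target` in small steps (illustration only).

    `target` stands in for what a trained network would infer from the
    prompt. A real model never sees the answer; this stand-in does.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    history = []
    for step in range(steps):
        remaining = steps - step
        # Hypothetical stand-in for the learned denoiser: treat the gap
        # between the current sample and the target as the noise estimate.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove a fraction of the estimated noise; the fraction grows as
        # fewer steps remain, so the final step removes what is left.
        x = [xi - pn / remaining for xi, pn in zip(x, predicted_noise)]
        history.append(mean_abs_error(x, target))
    return x, history


# A tiny one-dimensional "image" standing in for a frame of video.
target = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
final, history = toy_denoise(target)
print(f"error after first step: {history[0]:.4f}")
print(f"error after last step:  {history[-1]:.4f}")
```

Each pass strips away a little more of the estimated noise, so the sample drifts from static toward the target. That step-by-step cleanup is the part of the process the word "diffusion" refers to.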


Sora's videos don't always look completely realistic

Sora still has trouble telling left from right or following complex descriptions of events that happen over time, such as prompts that call for a specific camera movement. Videos created with Sora can often be spotted by their cause-and-effect errors, OpenAI said, such as when a person bites into a cookie but doesn't leave a mark.

For example, interactions between characters may show confusion (especially around limbs) or uncertainty in terms of numbers (e.g., how many wolves are in the video below at any given time?).

What are OpenAI's security precautions around Sora?

With the right prompts and adjustments, the videos Sora makes can easily be mistaken for live-action videos. OpenAI is aware of the potential defamation or misinformation issues arising from this technology. OpenAI plans to apply to Sora the same content filters it applies to DALL-E 3, which block “extreme violence, sexual content, hate images, celebrity likenesses or the intellectual property of others,” according to OpenAI.

If Sora is released to the public, OpenAI plans to mark content created with Sora with C2PA metadata. The metadata can be viewed by selecting the image and choosing the File Information or Properties menu options. People who create AI-generated images can still remove the metadata, either deliberately or accidentally. OpenAI currently has nothing in place to prevent users of its image generator, DALL-E 3, from removing metadata.

“It's already [difficult] and it will become increasingly impossible to detect AI-generated content by humans,” Chandrasekaran said. “Venture capitalists are investing in startups that create deepfake detection tools, and they (deepfake detection tools) can be part of a company's armor. However, in the future, public-private partnerships will need to identify, often at the time of creation, machine-generated content.”

Who are Sora's competitors?

Sora's photorealistic videos are quite distinctive, but similar services exist. Runway provides enterprise-ready text-to-video AI generation. Fliki can create limited videos with voice sync for social media storytelling. Generative AI can now also reliably add content to or edit conventionally shot videos.

On February 8, Apple researchers published a paper about Keyframer, their proposed large language model that can create animated, stylized images.

TechRepublic has reached out to OpenAI for more information on Sora.