OpenAI's Sora: The Cutting-Edge AI That's Revolutionizing Video Production
Artificial Intelligence · February 16, 2024
OpenAI has unveiled Sora, a generative video model that can turn a short text description into a detailed, high-definition film clip up to one minute long. While still in development, Sora already shows impressive handling of complex visual data and occlusion. It is not yet available to the public.
OpenAI has built a striking new generative video model called Sora that can take a short text description and turn it into a detailed, high-definition film clip up to a minute long. It is a notable advance in text-to-video generation, and OpenAI frames it as a step toward AI systems that can understand and interpret complex visual data.
Based on four sample videos that OpenAI shared with MIT Technology Review ahead of today’s announcement, the San Francisco–based firm has pushed the envelope of what’s possible with text-to-video generation (a hot new research direction that we flagged as a trend to watch in 2024).
"We think building models that can understand video, and understand all these very complex interactions of our world, is an important step for all future AI systems," says Tim Brooks, a scientist at OpenAI.
But there’s a disclaimer. OpenAI gave us a preview of Sora (which means sky in Japanese) under conditions of strict secrecy. In an unusual move, the firm would only share information about Sora if we agreed to wait until after news of the model was made public to seek the opinions of outside experts. [Editor’s note: We’ve updated this story with outside comment below.] OpenAI has not released a technical report or demonstrated the model actually working. And it says it won’t be releasing Sora anytime soon.
The first generative models that could produce video from snippets of text appeared in late 2022. But early examples from Meta, Google, and a startup called Runway were glitchy and grainy. Since then, the tech has been getting better fast. Runway's Gen-2 model, released last year, can produce short clips that come close to matching big-studio animation in their quality. But most of these examples are still only a few seconds long.
The sample videos from OpenAI’s Sora are high-definition and full of detail. OpenAI also says it can generate videos up to a minute long. One video of a Tokyo street scene shows that Sora has learned how objects fit together in 3D: the camera swoops into the scene to follow a couple as they walk past a row of shops.
OpenAI also claims that Sora handles occlusion well. One problem with existing models is that they can fail to keep track of objects when they drop out of view. For example, if a truck passes in front of a street sign, the sign might not reappear afterward.
In a video of a papercraft underwater scene, Sora has added what look like cuts between different pieces of footage, and the model has maintained a consistent style between them.
It’s not perfect. In the Tokyo video, cars to the left look smaller than the people walking beside them. They also pop in and out between the tree branches. "There’s definitely some work to be done in terms of long-term coherence," says Brooks. "For example, if someone goes out of view for a long time, they won’t come back. The model kind of forgets that they were supposed to be there."
Tech tease
Impressive as they are, the sample videos shown here were no doubt cherry-picked to show Sora at its best. Without more information, it is hard to know how representative they are of the model’s typical output.
It may be some time before we find out. OpenAI’s announcement of Sora today is a tech tease, and the company says it has no current plans to release it to the public. Instead, Sora is another illustration of how far generative models have come, and a reminder of how quickly they’re advancing.