Bits and Bytes OnLine: Text-to-video: OpenAI’s Sora

OpenAI, a US-based research organization, has developed Sora, a text-to-video technology.

The name Sora means ‘sky’ in Japanese, hinting at its “the sky is the limit” possibilities.

Sora is an advanced program that uses algorithms and extensive training data to transform written text into high-quality videos.

Its technology allows you to generate professional-grade videos with multiple moving characters and diverse visual styles simply by writing a text description.

Sora can even take still images and transform them into videos, extend existing videos, and fill in missing film segments.

Frames from an AI-generated video based on Mark Ollig programming his ZX81 Sinclair computer using BASIC.

OpenAI’s Sora software was released in February of this year to cybersecurity professionals known as “red teamers.”

The software will undergo susceptibility testing to address any vulnerabilities that malware or hackers could exploit.

OpenAI is still improving Sora’s performance and adding coding to prevent the creation of unethical video content.

After viewing some of the Sora AI-generated videos available on their website, I came away impressed by their realism. The movements, lighting, and textures were strikingly lifelike.

I installed the Sora app (last updated March 27, 2024) and made two AI videos, one from my text description and the other from a 1982 photo of me with added movements based on brief text input.

In addition to Sora, OpenAI has developed a suite of well-known AI applications.

These include GPT-4, a powerful AI language model, and ChatGPT, a resourceful interactive AI chatbot.

OpenAI also created DALL-E, an AI system that generates realistic images and artwork from simple text descriptions.

The name playfully combines artist Salvador Dalí and Pixar’s animated robot Wall-E.

Another notable OpenAI application is Codex, which can translate natural language instructions into computer code.

As an AI-based tool, Sora generates videos from text input by drawing on patterns learned from vast amounts of training data.

Sora has various potential applications across different fields.

For instance, educators can use it to enhance their lessons with interactive simulations, which can assist students in comprehending intricate concepts.

Marketers can also leverage Sora’s capabilities to create visually appealing and engaging campaigns that can grab the attention of their target audience.

Designers of various specialties, including product, UX/UI (user experience/user interface), graphic, and motion graphics design, can also put Sora to use.

Sora can also be used to create videos that complement music.

Fashion and interior designers can use Sora’s high level of accuracy to create precise prototypes, which can help them refine their designs more efficiently.

Sora’s ability to generate videos in various styles – from photorealistic to imaginative animation – opens up a world of creative possibilities.

In the 1970s, the telecommunications industry used specialized programming languages like Protel (Procedure Oriented Type Enforcing Language) to manage its complex systems.

Developed by Bell-Northern Research, Protel drew inspiration from structured programming language models like PASCAL, created by Niklaus Wirth in 1970.

PASCAL’s emphasis on readable code and well-defined data structures made it a powerful tool for building reliable software for both educational and industry settings.

My experience with Protel dates back to 1986 when I worked at Winsted Telephone Company.

I used a text-based command-line interpreter to program and maintain a Nortel DMS-10 digital voice-switching platform using Protel. I would later use it with the larger DMS-100, 250, and 500 switches.

PASCAL is named after Blaise Pascal, a 17th-century mathematician.

Although PASCAL’s influence is noticeable in modern programming design principles, AI systems like Sora demand more specialized tools.

Today’s AI developers use programming languages like Python, a versatile and widely used language, to analyze complex data and build intelligent systems that can learn from that data.
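
To give a feel for what "learning from data" means in Python, here is a minimal sketch. It is a toy illustration only and has nothing to do with Sora's actual code, which is proprietary. The function name fit_line, the sample data, and the settings are all assumptions made up for this example. The program is shown a handful of points that lie on the line y = 2x + 1 and gradually adjusts two numbers until its guesses match the data.

```python
# Toy example: learn the line y = 2x + 1 from sample points.
# All names and numbers here are illustrative assumptions.

def fit_line(points, learning_rate=0.01, steps=5000):
    """Fit y = w*x + b to (x, y) pairs with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Average gradient of the squared error over all points.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        # Nudge both numbers a small step in the direction that reduces error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Sample data generated from the "hidden" rule y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]
w, b = fit_line(data)
print(f"learned w={w:.2f}, b={b:.2f}")
```

The program is never told the rule; it recovers values close to 2 and 1 from the examples alone. Large AI systems apply the same basic idea to billions of adjustable numbers instead of two.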

Sora’s core modeling software is proprietary, but it leverages powerful programming languages and AI frameworks for efficiency and adaptability.

It reportedly uses Python along with C++, a high-performance language, to optimize video generation speeds.

Additionally, Sora employs deep learning framework technologies like PyTorch or TensorFlow.

These and other high-level programming technologies are used to build and deploy the complex neural networks behind AI applications.

Sora utilizes a transformer architecture, a design known for its versatility in tasks like language processing and image generation. That versatility allows it to excel in video creation, demonstrating capabilities beyond the architecture’s original design.
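
For readers curious about what sits inside a transformer, the sketch below shows its central calculation, called scaled dot-product attention, written in plain Python. This is not Sora's code, which is proprietary. The function names and the three small "patch" vectors are assumptions invented for illustration. The idea is that every piece of the input looks at every other piece, scores how relevant each one is, and then blends them according to those scores.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Score this query against every key (dot product, then scale).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output is a weighted blend of all value vectors.
        blended = [sum(w * v[i] for w, v in zip(weights, values))
                   for i in range(len(values[0]))]
        outputs.append(blended)
    return outputs

# Three made-up "patch" vectors standing in for pieces of a video frame.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(patches, patches, patches)
for row in mixed:
    print([round(x, 3) for x in row])
```

Real models run this step many times over, with learned numbers deciding what counts as relevant, but the blending shown here is the same.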

Early AI video-generating systems faced challenges in accurately depicting spatial details, generating realistic interactions, and fully understanding cause and effect within scenes.

For example, in an AI-generated video, a person might take a bite of a cookie, yet the cookie would remain whole afterward.

This simple error highlights the AI model’s limitation in tracking how an object’s state changes over time.

Sora’s ability to track and understand object states (object consistency) is a notable step towards human-like reasoning for AI systems.

Its ability to generate highly detailed characters and landscapes and its natural language comprehension are impressive.

This type of technology is another step into AI’s future potential to better understand and interact with us and our increasingly complex world.

To learn about and see some Sora AI-generated videos, visit (openai.com/sora).

Technical details about Sora can be seen at (tinyurl.com/SoraTechnical).

(Below is the AI-generated video made from a photo of me taken in 1982)

Click twice to play!

Photo of me from 1982 that the AI generated the video from

Below is the AI-generated video based on my text description of a man in his mid-60s with a white beard sitting at a table in a coffee shop reading the newspaper.

Below is an AI-generated video I created from a 1976 photo I took of my Dad proudly maneuvering his Kayot pontoon on Gull Lake, near Brainerd, MN. Dad loved having all of us on the pontoon, and we had access to seven separate lakes. Many happy times and good memories for me.

Below is the photo I took:

Friday, April 5, 2024
