OpenAI unveils sCM, a new model that generates video media 50 times faster


Two researchers on the OpenAI team have developed a new type of continuous-time consistency model (sCM) that they say can generate video media 50 times faster than currently used models. Cheng Lu and Yang Song published a paper describing their new model on the arXiv preprint server. They also posted an introductory document on the company's website.
Among the machine learning methods used to train AI applications, diffusion models, sometimes called denoising diffusion probabilistic models or score-based generative models, are a class of latent-variable generative models.
Such models generally have three main components: a forward process, a reverse process, and a sampling procedure. These models form the basis for generating visual products such as videos or still images, although they have also been applied elsewhere, for example to audio generation.
As with other machine learning models, diffusion models are trained on large amounts of data. Most of them then run hundreds of denoising steps to generate a final product, which is why they typically take a few moments to complete their tasks.
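The forward/reverse structure and the many-step sampling loop can be sketched with a toy one-dimensional example. This is purely illustrative and not OpenAI's code: the noise schedule, the closed-form denoiser (standing in for a trained network), and the step count are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(x0, t, num_steps=100):
    """Forward process: corrupt clean data x0 with Gaussian noise whose
    level grows with step t (a simple linear toy schedule)."""
    sigma = t / num_steps
    return x0 + sigma * rng.standard_normal(x0.shape)

def toy_denoise_step(x, sigma):
    """One reverse step: shrink the sample toward the clean-data mean.
    A real diffusion model would apply a learned score/denoiser network;
    this closed-form rule assumes standard-normal toy data."""
    return x * (1.0 - sigma**2 / (sigma**2 + 1.0))

def sample(num_steps=100, dim=4):
    """Sampling procedure: start from pure noise and apply the reverse
    step repeatedly -- hundreds of network evaluations in practice,
    which is where most of the generation time goes."""
    x = rng.standard_normal(dim)
    for t in range(num_steps, 0, -1):
        x = toy_denoise_step(x, t / num_steps)
    return x

print(sample().shape)
```

The point of the sketch is the loop: each of the `num_steps` iterations would cost one full network evaluation in a real model, which is what makes conventional diffusion sampling slow.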
In contrast, Lu and Song developed a model that does all its work in just two steps. They say this reduction in steps significantly reduced the time it took their model to generate a video, without any loss in quality.
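The two-step idea behind consistency-style samplers can be sketched as follows. Again this is an illustration, not OpenAI's sCM implementation: `consistency_fn` is a stand-in for a trained network, here replaced by the closed-form denoiser for standard-normal toy data, and the noise levels `sigma_max` and `sigma_mid` are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def consistency_fn(x, sigma):
    """Stand-in for a trained consistency network: maps a sample at noise
    level sigma directly to an estimate of the clean sample in a single
    evaluation, rather than by many small denoising steps."""
    return x / (1.0 + sigma**2)

def two_step_sample(dim=4, sigma_max=80.0, sigma_mid=0.8):
    # Step 1: map pure noise straight to a clean estimate.
    x = sigma_max * rng.standard_normal(dim)
    x0 = consistency_fn(x, sigma_max)
    # Step 2: re-noise to an intermediate level and map back once more,
    # refining the estimate. Total cost: two network evaluations.
    x = x0 + sigma_mid * rng.standard_normal(dim)
    return consistency_fn(x, sigma_mid)

print(two_step_sample().shape)
```

Two evaluations instead of hundreds is the source of the speedup, provided the network has been trained so that a single jump lands close to the data distribution.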
The new model uses more than 1.5 billion parameters and can produce a sample video in a fraction of a second on a machine with a single A100 GPU. This is approximately 50 times faster than models currently in use.
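A back-of-the-envelope check of the roughly 50x figure (our arithmetic, not numbers from the paper): if sampling cost scales with the number of network evaluations, cutting a typical sampler's step count down to two gives a speedup of the same order.

```python
# Assumed typical step count for a conventional diffusion sampler.
baseline_steps = 100
scm_steps = 2  # the two-step sampler described by Lu and Song

speedup = baseline_steps / scm_steps
print(speedup)  # → 50.0
```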
The researchers note that their new model requires significantly less computing power than other models, which is also a persistent problem with AI applications in general as their usage skyrockets. They also note that their new approach has already undergone benchmarking to compare their results with other models, both those currently in use and those being developed by other teams. They suggest that their model should enable real-time generative AI applications in the near future.
More information: Cheng Lu et al, Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models, arXiv (2024). DOI: 10.48550/arxiv.2410.11081
Journal information: arXiv