Google has unveiled Lumiere, a multimodal AI video generation tool capable of creating 5-second videos from text and images.

Google Lumiere supports text-to-video and image-to-video generation and offers options for creating stylized videos.

Google unveiled its latest artificial intelligence (AI) model, Lumiere, last week. The new model is a multimodal video generation tool that can produce 5-second videos. It supports both text-to-video and image-to-video generation and joins existing AI models such as Runway Gen-2 and Pika 1.0. According to Google, Lumiere uses a Space-Time U-Net (STUNet) architecture that changes how motion is generated in AI video so that it appears more realistic. The model is not yet available to the public.

In an accompanying preprint paper, the research team behind Lumiere explained that the major innovation in motion comes from generating the entire video in a single pass rather than stitching together individual still frames. As a result, the spatial aspects of the video (the objects in it) and the temporal aspects (how those objects move) are generated simultaneously. For viewers, the result is motion that looks closer to how things move in the real world. To achieve this, Lumiere generates 80 frames per clip, or 16 frames per second over 5 seconds, compared with 25 frames from Stable Video Diffusion.

“By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales,” the paper added.
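
To make the idea of down- and up-sampling in both space and time concrete, here is a minimal toy sketch in Python. It is not Lumiere's code: plain averaging and value repetition stand in for the learned layers of the network, and the clip size (8x8 pixels), the factor of 2, and all function names are assumptions chosen for illustration. Only the 80-frame clip length comes from the article.

```python
# Toy illustration only: average pooling and nearest-neighbour repetition
# stand in for learned down- and up-sampling layers. Not Lumiere's code.

def make_clip(frames, height, width):
    """Build a toy grayscale clip as nested lists indexed [t][y][x]."""
    return [[[float(t + y + x) for x in range(width)]
             for y in range(height)]
            for t in range(frames)]

def shape(clip):
    """Return (frames, height, width) of a nested-list clip."""
    return (len(clip), len(clip[0]), len(clip[0][0]))

def downsample_space_time(clip, factor=2):
    """Average non-overlapping blocks over time, height and width together."""
    t_len, h, w = shape(clip)
    out = []
    for t in range(0, t_len - factor + 1, factor):
        frame = []
        for y in range(0, h - factor + 1, factor):
            row = []
            for x in range(0, w - factor + 1, factor):
                block = [clip[t + dt][y + dy][x + dx]
                         for dt in range(factor)
                         for dy in range(factor)
                         for dx in range(factor)]
                row.append(sum(block) / len(block))
            frame.append(row)
        out.append(frame)
    return out

def upsample_space_time(clip, factor=2):
    """Repeat each value along width, height and time (nearest neighbour)."""
    out = []
    for frame in clip:
        big_frame = []
        for row in frame:
            big_row = [v for v in row for _ in range(factor)]
            big_frame.extend(list(big_row) for _ in range(factor))
        out.extend([list(r) for r in big_frame] for _ in range(factor))
    return out

clip = make_clip(frames=80, height=8, width=8)  # 80 frames, as in the article
coarse = downsample_space_time(clip)            # fewer frames AND fewer pixels
restored = upsample_space_time(coarse)          # back to the original shape
print(shape(clip), shape(coarse), shape(restored))
```

Running the sketch prints `(80, 8, 8) (40, 4, 4) (80, 8, 8)`: the coarse version halves the frame count as well as the resolution, which is the temporal part the researchers flag as important, since spatial-only down-sampling would leave all 80 frames in place.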


While Google Lumiere cannot currently be tested, its website is live, and enthusiasts can browse a range of videos generated with the model. Visitors can also examine the text prompts and input images used to create each output. The tool offers several features: stylized generation in various video styles, cinemagraphs, in which only a selected part of an image is animated, and inpainting, in which the AI fills in a masked-out region of a video or image based on the provided prompt.

Google Lumiere competes with existing AI models such as Runway Gen-2 and Pika Labs’ Pika 1.0, both of which are publicly accessible and offer similar multimodal capabilities for video generation and editing.