Lumiere: A New Approach to Realistic Video Generation
Researchers from Google, Weizmann Institute of Science, and Tel Aviv University have proposed Lumiere, a space-time diffusion model for realistic video generation.
What is Lumiere?
Lumiere is a video diffusion model that lets users generate realistic and stylized videos and edit existing footage.
Users provide a natural-language prompt describing the desired scene, and the model generates a video to match.
They can also upload a still image and pair it with a prompt to turn it into a dynamic video. Lumiere additionally supports video inpainting, cinemagraphs, and stylized generation.
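To make the text-to-video step concrete, here is a minimal sketch of how a diffusion sampler turns a prompt into a clip. It assumes a standard DDIM-style denoising loop; the `denoiser` callable, the noise schedule, and the tensor shapes are hypothetical stand-ins, since Lumiere's model and sampling code are not public.

```python
import torch

@torch.no_grad()
def sample_video(denoiser, prompt_embedding, steps=50, shape=(1, 3, 16, 64, 64)):
    """Start from pure noise and iteratively denoise, conditioning every
    step on the prompt. `denoiser` is a hypothetical model that predicts
    the noise in x given (x, step, condition)."""
    x = torch.randn(shape)                       # pure Gaussian noise
    alphas = torch.linspace(1e-3, 0.999, steps)  # toy noise schedule: noisy -> clean
    for t in range(steps):
        a = alphas[t]
        a_next = alphas[t + 1] if t + 1 < steps else torch.tensor(1.0)
        eps = denoiser(x, t, prompt_embedding)              # predicted noise
        x0 = (x - (1 - a).sqrt() * eps) / a.sqrt()          # estimate of the clean clip
        x = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps  # deterministic DDIM step
    return x

# Smoke test with a dummy denoiser that predicts zero noise:
print(sample_video(lambda x, t, c: torch.zeros_like(x), None).shape)
# torch.Size([1, 3, 16, 64, 64])
```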
A Different Approach to Video Synthesis
Lumiere departs from existing models by focusing on synthesizing videos with realistic, diverse, and coherent motion.
Competitors such as Runway and Pika offer similar capabilities, but their models rely on a cascaded design: a base model generates a sparse set of keyframes, and separate temporal super-resolution models fill in the frames between them. Because no single stage sees the whole video, temporal consistency and realistic motion are hard to achieve.
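To see why, here is an outline of a cascaded pipeline in code. This is a generic illustration rather than any vendor's actual implementation: `generate_keyframes` is a hypothetical stand-in for a base diffusion model, and simple interpolation stands in for the temporal super-resolution stage.

```python
import torch
import torch.nn.functional as F

def generate_keyframes(prompt: str, n_keyframes: int = 16) -> torch.Tensor:
    # Stand-in for a base model that produces a short, low-frame-rate clip
    # of shape (T, C, H, W); a real system would run a diffusion model here.
    return torch.randn(n_keyframes, 3, 128, 128)

def temporal_super_resolution(frames: torch.Tensor, factor: int = 5) -> torch.Tensor:
    # Stand-in for a temporal super-resolution model that fills in frames
    # between keyframes. Because each pass sees only a local window of
    # keyframes, motion that spans many keyframes can drift out of sync.
    video = frames.permute(1, 0, 2, 3).unsqueeze(0)  # (1, C, T, H, W)
    video = F.interpolate(video, scale_factor=(factor, 1, 1),
                          mode="trilinear", align_corners=False)
    return video.squeeze(0).permute(1, 0, 2, 3)      # (T * factor, C, H, W)

video = temporal_super_resolution(generate_keyframes("a bear playing a guitar"))
print(video.shape)  # torch.Size([80, 3, 128, 128])
```

The structural problem is visible in the pipeline itself: global motion is decided once, at keyframe time, and every later stage can only interpolate locally between decisions it cannot revisit.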
Lumiere closes this gap with a Space-Time U-Net (STUNet) architecture that downsamples and upsamples the video in both space and time, generating the clip's entire temporal duration in a single pass. Processing all frames together yields more realistic and coherent motion.
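The toy PyTorch module below sketches that idea under simplified assumptions. It factorizes 3D convolutions into spatial and temporal parts and, crucially, downsamples and upsamples the time axis as well as space, so the bottleneck reasons over the whole clip at once. The real STUNet builds on a pretrained text-to-image diffusion model and is considerably more elaborate.

```python
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """Factorized space-time convolution: a spatial conv followed by a
    temporal conv, a common way to extend an image U-Net to video."""
    def __init__(self, channels: int):
        super().__init__()
        # A (1, 3, 3) kernel mixes information only across height and width ...
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        # ... while a (3, 1, 1) kernel mixes information only across time.
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))
        self.act = nn.SiLU()

    def forward(self, x):  # x: (batch, channels, time, height, width)
        return self.act(self.temporal(self.act(self.spatial(x))))

class TinySTUNet(nn.Module):
    """Toy U-Net that, unlike keyframe-based designs, downsamples and
    upsamples the *time* axis too, so its bottleneck sees the whole clip."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.enc = SpaceTimeBlock(channels)
        self.down = nn.Conv3d(channels, channels, 2, stride=2)         # halves T, H, W
        self.mid = SpaceTimeBlock(channels)
        self.up = nn.ConvTranspose3d(channels, channels, 2, stride=2)  # restores T, H, W
        self.dec = SpaceTimeBlock(channels)

    def forward(self, x):
        skip = self.enc(x)
        h = self.up(self.mid(self.down(skip)))
        return self.dec(h + skip)  # U-Net-style skip connection

clip = torch.randn(1, 32, 16, 64, 64)  # 16 frames of 64x64 features
print(TinySTUNet()(clip).shape)        # torch.Size([1, 32, 16, 64, 64])
```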
Performance and Limitations
The researchers compared Lumiere with video models from Pika, Runway, and Stability AI, as well as with Google's ImagenVideo. In their evaluation, Lumiere's 5-second clips exhibited greater motion magnitude, better temporal consistency, and higher overall quality.
Lumiere does have limitations, however. It cannot generate videos composed of multiple shots or containing transitions between scenes, which the authors leave as a challenge for future research.
And although the research has been published, the models are not yet available for public testing.