Google has announced a new AI video generation model, Lumiere, developed jointly with researchers from the Weizmann Institute of Science and Tel Aviv University. Lumiere is a space-time diffusion model: its architecture generates the spatial and the temporal structure of the video (the objects and how they move and change) at the same time. Instead of stitching an animation together from many separately generated details or frames, the entire video is created from start to finish in a single pass, so the end result looks more realistic than existing solutions.
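To make the single-pass idea more concrete, here is a minimal, purely illustrative sketch in Python. It is not Google's code: the denoiser, tensor shapes and simplified update rule are all assumptions, but they show how a space-time diffusion loop treats the whole clip as one tensor rather than generating keyframes and filling in the gaps afterwards.

```python
import torch

# Toy sketch (not Lumiere's actual code): the point is that the denoiser sees
# the whole clip as one space-time volume, so motion across all frames is
# modelled jointly instead of being stitched together from keyframes.

def denoise_whole_clip(denoiser, frames=80, height=64, width=64, steps=50):
    """Hypothetical single-pass video diffusion loop over a full clip."""
    # One tensor holds every frame: (batch, channels, time, height, width).
    video = torch.randn(1, 3, frames, height, width)
    for t in reversed(range(steps)):
        # A space-time denoiser would process both space AND time here,
        # so every step updates all frames together.
        noise_estimate = denoiser(video, t)
        video = video - noise_estimate / steps  # deliberately simplified update
    return video

# Stand-in denoiser so the sketch runs end to end; a real model would be a
# trained space-time network, not this placeholder.
dummy_denoiser = lambda video, t: 0.01 * video
clip = denoise_whole_clip(dummy_denoiser, frames=16, height=32, width=32, steps=10)
print(clip.shape)  # torch.Size([1, 3, 16, 32, 32])
```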
AI companies often demonstrate their technology with animals because it is still difficult to produce coherent, undistorted humans whose movements do not look unnatural. The text-to-video system currently produces five-second clips at a resolution of 1024 x 1024 pixels. The researchers did not specify where the training set of 30 million videos, typically 80 frames long at 16 fps, was collected, but it was presumably gathered from publicly available video repositories such as YouTube.
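Those figures are consistent with each other: 80 frames at 16 frames per second works out to exactly the five seconds of footage mentioned above, as the quick check below shows (plain Python arithmetic, not anything from the paper).

```python
# 80 frames played back at 16 frames per second gives a five-second clip.
frames_per_clip = 80
frames_per_second = 16
clip_length_seconds = frames_per_clip / frames_per_second
print(clip_length_seconds)  # 5.0
```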
The possible uses are wide-ranging: with the model you can not only create a video from scratch using text prompts, but also turn an existing still image into an animated one, or restyle existing clips based on a reference image. For now, though, this remains theoretical, because Google has not said when, or whether, the model will be made available to a wider audience, and even then it is likely to be offered as a paid service.
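Since Google has published no API for Lumiere, the sketch below is entirely hypothetical: the class and field names are invented purely to illustrate the three modes of use described above (text-to-video, animating a still image, and restyling with a reference image), not to document a real service.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request structure; Lumiere has no public API, so these names
# are illustrative only.

@dataclass
class VideoGenerationRequest:
    prompt: str                            # text description of the clip
    source_image: Optional[str] = None     # still image to animate (image-to-video)
    style_reference: Optional[str] = None  # image whose look the clip should copy

# Text-to-video: a prompt alone.
text_only = VideoGenerationRequest(prompt="a bear playing a ukulele on a beach")

# Image-to-video: animate an existing still image.
from_image = VideoGenerationRequest(prompt="make the waves roll in",
                                    source_image="beach.jpg")

# Stylized generation: borrow the look of a reference image.
stylised = VideoGenerationRequest(prompt="a running dog",
                                  style_reference="watercolour.png")
```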
Generative AI models that produce video are still rudimentary, but the field has developed significantly over the past couple of years. In 2022, Google introduced its first text-to-video model, Imagen Video, which creates short 1280 x 768 videos from text prompts with varying quality. Last March, the startup Runway came out with its Gen-2 video generation model, which can create two-minute clips.
Gen-1 could only transform existing videos, reworking 3D animations or smartphone footage according to various styles and instructions. The more advanced second generation, by contrast, no longer needs any raw material to create videos; the user only has to enter a few text prompts describing the kind of animation they want to see. The technology has its limits, of course: at the moment it produces very short clips that are not especially realistic, the quality leaves room for criticism, and the frame rate is low. The same is true of the other video-generating AI models currently available.