At DeepMind they usually do things very well, and after surprising us with AlphaGo and AlphaFold, this division has gotten down to work and launched a new AI video generation model called Veo 2. The timing could hardly be better, especially considering that Sora has proven to be rather immature as an AI video generator.
A much better model on paper. Veo 2 can generate video clips of up to two minutes at resolutions up to 4K DCI (4,096 x 2,160). That is four times the resolution offered by Sora, the OpenAI model, and six times its maximum duration, even though Sora had until now been the clear benchmark in this segment.
How to access it. For now, Veo 2 is accessible through VideoFX, Google’s experimental video creation tool, as well as through Vertex AI. In this first deployment the tool is gated behind a waiting list (and is not available in Spain at the moment), and generated videos are currently limited to eight seconds in length at 720p resolution. Sora, by comparison, can generate 1080p videos of up to 20 seconds.
From text (and image) to video. Veo 2 can generate video from a text prompt, but it can also take a reference image together with a text prompt and generate video based on that image. But that is not the most important thing.
Veo 2 “understands” physics. According to DeepMind, the model has a better “understanding” of physics and of camera controls, which, those responsible say, allows it to generate sharper video clips. That is to say: textures and images are better defined, especially in scenes with a lot of movement. In addition, the camera’s point of view can be controlled more precisely to capture objects and people from different angles.
This is promising. That better understanding of physics is noticeable, for example, in videos involving fluids, or in which light and shadow play a special role. The videos Google has offered as a demonstration are probably fragments in which the result is especially striking, but the results are certainly very, very promising.
Consistency remains a challenge. Those responsible at DeepMind themselves admit there is room for improvement: coherence and consistency remain a challenge, for example when maintaining a character’s features across a clip. Even so, there are demonstrations in which the realism and consistency achieved by Veo 2 appear superior to Sora’s.
Sora bites the dust. It seemed that OpenAI was the great reference in this market thanks to Sora, but the videos now being shown leave the OpenAI model in a bad light. This can be seen in the clips shared by DeepMind on YouTube and on X, but especially in videos shared by some users who already have early access to Veo 2. Notable examples include the video in which someone cuts tomatoes and, above all, a video of someone eating spaghetti that is a far cry from the infamous Will Smith spaghetti meme.
The advantage of having YouTube. Training these models is usually complicated, but here Google and DeepMind have the advantage of their access to YouTube. At the launch of the first version of Veo they already indicated that the model “may” have been trained with YouTube content “in accordance with Google’s agreement with YouTube creators.” The same seems to have happened with this second iteration, and access to that immense amount of content can of course contribute a lot to making its models more powerful.
Image | Google DeepMind
In techopiniones | Chatbots and generative AI seemed like the industry’s way forward in AI. Now there are new darlings: agents.