
Nvidia unveils VideoLDM, a text-to-video generator
Nvidia has developed VideoLDM, a neural network that generates short, realistic videos from text descriptions.
The model produces clips of roughly five seconds at resolutions up to 2048×1280 pixels and 24 frames per second, and handles both simple and complex prompts.
VideoLDM builds on the latent diffusion approach behind Stable Diffusion. The model comprises about 4.1 billion parameters, of which only 2.7 billion were trained on video; the rest are reused from a pretrained image model.
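That parameter split reflects the core design described in the paper: the pretrained image model's spatial layers are kept fixed, and newly added temporal layers, trained on video, mix information across frames so the clip stays consistent over time. The PyTorch snippet below is a minimal, hypothetical sketch of that idea, not Nvidia's code; all class names, layer choices, and sizes are illustrative assumptions.

```python
# Sketch of the VideoLDM idea (illustrative, not Nvidia's implementation):
# a frozen per-frame "image" layer followed by a trainable temporal layer.
import torch
import torch.nn as nn

class TemporalMixer(nn.Module):
    """Self-attention over the time axis; only these weights would be trained on video."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold space into the batch so attention runs along the frame axis.
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        out, _ = self.attn(self.norm(seq), self.norm(seq), self.norm(seq))
        out = (seq + out).reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return out

class VideoBlock(nn.Module):
    """Frozen spatial layer applied per frame, then a trainable temporal layer."""
    def __init__(self, channels: int):
        super().__init__()
        # Stand-in for a pretrained image-LDM layer; its weights stay fixed.
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.spatial.requires_grad_(False)
        self.temporal = TemporalMixer(channels)  # the only part trained on video

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = x.shape
        x = self.spatial(x.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        return self.temporal(x)

latents = torch.randn(1, 8, 32, 16, 16)  # 8 frames of 16x16 latents, 32 channels
print(VideoBlock(32)(latents).shape)     # torch.Size([1, 8, 32, 16, 16])
```

In the full model, blocks of this kind sit inside the denoising network of a latent diffusion pipeline; the sketch only shows how a single frozen spatial layer and a trainable temporal layer can be interleaved.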
The company said it had made “significant progress” in a short time: according to the developers, VideoLDM began generating detailed videos matching the descriptions after just a month of training.
The developers have published several examples of the model's output on their site.
The model can also generate driving scenes; these videos have a resolution of 1024×512 pixels and last up to five minutes.
VideoLDM can simulate specific driving scenarios and predict the behavior of objects on the road, which, according to the developers, makes the generated footage realistic.
The paper has been accepted to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), which will be held in Vancouver from June 18 to 22. It is unclear whether Nvidia plans to release the algorithm publicly.
In April, Meta unveiled a tool for image and video segmentation.
In March, Microsoft released a preview version of Bing Image Creator.