Google DeepMind is developing an artificial intelligence technology to create soundtracks for videos.
Google’s AI research division and other organizations have previously built video-generation models, but these could not produce accompanying audio. DeepMind’s V2A (video-to-audio) technology is meant to close that gap.
“Video generation models are advancing at an incredible pace, but many current systems do not produce a soundtrack. One of the next important steps towards film generation is the creation of soundtracks for these silent videos,” stated DeepMind.
DeepMind’s V2A technology pairs a video with a text prompt to generate music, sound effects, and dialogue, for example: “Pulsating underwater jellyfish, marine life, ocean.” The diffusion model underlying V2A is trained on a dataset of sounds, dialogue transcripts, and video clips.
For one demo video, the sound was generated from the following prompts: cinema, thriller, horror film, music, tension, atmosphere, footsteps on concrete.
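To make the prompt-plus-video conditioning concrete, here is a minimal toy sketch of the idea in Python. DeepMind has not published a V2A API, so every name below (the text encoder, the denoiser, the video feature vector) is a hypothetical stand-in; the loop only illustrates how a diffusion model iteratively refines noise into audio under joint video and text conditioning.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(prompt: str, dim: int = 64) -> np.ndarray:
    # Hypothetical text encoder: a deterministic, hash-seeded random vector.
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def denoise_step(audio: np.ndarray, cond: np.ndarray, noise_level: float) -> np.ndarray:
    # Stand-in for the learned denoiser: nudges the noisy waveform toward a
    # conditioning-dependent target, more strongly as the noise level falls.
    freq = 1.0 + abs(cond[0]) % 5.0
    target = np.sin(np.linspace(0.0, 2.0 * np.pi * freq, audio.size))
    return audio + (1.0 - noise_level) * 0.1 * (target - audio)

def generate_audio(video_features: np.ndarray, prompt: str,
                   n_samples: int = 16_000, steps: int = 50) -> np.ndarray:
    # Core V2A idea: start from pure noise and iteratively refine it into a
    # waveform, conditioned jointly on video features and the text prompt.
    cond = np.concatenate([video_features, embed_text(prompt)])
    audio = rng.standard_normal(n_samples)
    for i in range(steps):
        noise_level = 1.0 - i / steps  # anneal from 1 (all noise) toward 0
        audio = denoise_step(audio, cond, noise_level)
    return audio

# Stand-in for per-frame visual features; prompt quoted from the article.
video_features = rng.standard_normal(64)
waveform = generate_audio(video_features, "Pulsating underwater jellyfish, marine life, ocean")
print(waveform.shape)  # (16000,): one second of audio at a 16 kHz sample rate
```

A real V2A system would replace denoise_step with a large trained network and embed_text with a learned language encoder, but the refine-noise-under-conditioning loop is the same in spirit.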
DeepMind acknowledges that the technology is not yet perfect and that the generated audio is not always high-quality or convincing. Further refinement and testing are needed before V2A launches fully.
In February, OpenAI introduced a new generative AI model, Sora, which allows text to be transformed into video.
In June, scientists from Harvard and DeepMind created a virtual rat with artificial intelligence as its brain.
Previously, the Google subsidiary unveiled Genie, a generative AI model for creating games.
