
Meta unveils AI models for processing photos and videos
Meta has announced Emu Video and Emu Edit, two generative AI tools for creating video and editing images.
Today we’re sharing two new advances in our generative AI research: Emu Video & Emu Edit.
Details ➡️ https://t.co/qm8aejgNtd
These new models deliver exciting results in high quality, diffusion-based text-to-video generation & controlled image editing w/ text instructions.
— AI at Meta (@AIatMeta) November 16, 2023
Both neural networks are built on Meta's Emu image-generation model and remain in testing. According to the company, they have already shown potential benefits for artists, animators, and other creative professionals.
Emu Video generates video clips from input text and attached images, at a resolution of 512×512 and 16 frames per second.
The neural network was trained using a ‘factorized’ approach that splits generation into two stages, which lets the tool respond to different kinds of input.
“First, [Emu Video] creates an image based on the text prompt, and then it generates video based on both the text and the generated image. This ‘factorized’, or split, approach to generation allows us to train video models efficiently,” Meta explained.
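The two-stage idea can be pictured with a toy sketch. Everything in it is hypothetical: `text_to_image`, `image_to_video`, and the `Frame` placeholder are stand-ins for real diffusion models, not Meta's code or API, and the default clip length of 4 seconds is an assumption made for illustration. The second stage is assumed to receive both the prompt and the stage-one image. Only the 512×512 resolution and the 16 frames per second come from the announcement.

```python
from dataclasses import dataclass
from typing import List

WIDTH, HEIGHT = 512, 512  # resolution reported in the announcement
FPS = 16                  # frame rate reported in the announcement


@dataclass(frozen=True)
class Frame:
    """Placeholder for one image; a real model would return pixel data."""
    width: int
    height: int
    description: str


def text_to_image(prompt: str) -> Frame:
    """Stage 1 (hypothetical stub): produce an image conditioned on the text."""
    return Frame(WIDTH, HEIGHT, f"image for: {prompt}")


def image_to_video(prompt: str, first_frame: Frame, seconds: int) -> List[Frame]:
    """Stage 2 (hypothetical stub): produce frames conditioned on text and image."""
    total_frames = seconds * FPS
    return [
        Frame(first_frame.width, first_frame.height, f"frame {i} of: {prompt}")
        for i in range(total_frames)
    ]


def generate_video(prompt: str, seconds: int = 4) -> List[Frame]:
    """Factorized generation: text -> image, then (text, image) -> video.

    The 4-second default is an assumption for this sketch.
    """
    image = text_to_image(prompt)
    return image_to_video(prompt, image, seconds)


if __name__ == "__main__":
    clip = generate_video("a corgi surfing a wave")
    print(len(clip), clip[0].width, clip[0].height)
```

Splitting the work this way means the second stage never has to invent the scene from text alone; it only has to animate an image that already exists.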
Emu Edit can remove or add a background in a photo, perform color and geometric transformations, and handle both local and global edits.
The AI was trained on a dataset of 10 million samples, each consisting of an input image, a task description, and a target output image.
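One way to picture a single training sample is as a three-field record. This is a minimal sketch under stated assumptions: the class name `EditSample`, its field names, and the use of file paths are hypothetical and are not taken from Meta's dataset format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EditSample:
    """One hypothetical training record: (input image, instruction, target image)."""
    input_image: str   # path to the source image (assumed representation)
    instruction: str   # text description of the editing task
    target_image: str  # path to the expected result (assumed representation)


def is_complete(sample: EditSample) -> bool:
    """Return True only if every field holds a non-empty string."""
    values = (getattr(sample, f.name) for f in fields(sample))
    return all(isinstance(v, str) and bool(v.strip()) for v in values)


if __name__ == "__main__":
    sample = EditSample(
        input_image="photos/beach.png",
        instruction="replace the background with a snowy mountain",
        target_image="photos/beach_snow.png",
    )
    print(is_complete(sample))
```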
“While Emu Video, Emu Edit and similar new technologies undoubtedly cannot replace professional artists, they will help people express themselves in new ways: from an art director conceiving a new concept, or a video editor bringing a clip to life, to a friend sharing a unique birthday greeting,” the company emphasised.
Earlier, Meta introduced a suite of AI tools that included a voice assistant, neural networks with different personalities, “smart glasses” and a sticker generator.
In August, the company announced AudioCraft, a neural network that creates sounds and music from text descriptions.