Chinese tech giant Alibaba Cloud has launched the multimodal AI model Qwen2.5-Omni-7B, capable of processing text, images, audio, and video, as well as generating text and voice responses in real time.
The neural network has 7 billion parameters. According to company representatives, it can run on edge devices such as phones and laptops without sacrificing efficiency or performance.
“This unique combination makes the model an ideal foundation for developing flexible, cost-effective AI agents that deliver tangible benefits, especially intelligent voice applications,” the announcement stated.
As an example of Qwen2.5-Omni-7B’s applications, the company highlighted its potential to improve the lives of visually impaired individuals, helping them better navigate their surroundings. In another scenario, the model can analyze available ingredients via video and offer step-by-step cooking instructions.
“Qwen2.5-Omni-7B delivers remarkable performance across all modalities, competing with specialized models of comparable size,” the company emphasized.
The company attributes this performance to a new Thinker-Talker architecture and a high-quality dataset used for training.
The model is open-source and available on Hugging Face, GitHub, ModelScope, and Qwen Chat.
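Since the weights are published openly, the snippet below sketches one way to load the model in Python with Hugging Face `transformers`. The `Qwen2_5OmniForConditionalGeneration` and `Qwen2_5OmniProcessor` class names and the `return_audio` flag are taken from the model card and are assumptions here; they may differ across `transformers` releases.

```python
# Minimal sketch: loading Qwen2.5-Omni-7B from Hugging Face and running a
# text-only query. Class names and flags follow the published model card
# and may change between library versions -- illustrative, not canonical.
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

MODEL_ID = "Qwen/Qwen2.5-Omni-7B"

model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # let the library pick bf16/fp16 where supported
    device_map="auto",    # place layers on whatever hardware is available
)
processor = Qwen2_5OmniProcessor.from_pretrained(MODEL_ID)

# A text-only conversation; the same chat format also accepts image,
# audio, and video entries in the content list.
conversation = [
    {"role": "user",
     "content": [{"type": "text", "text": "Describe what you can do in one sentence."}]},
]

prompt = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=prompt, return_tensors="pt", padding=True).to(model.device)

# return_audio=False skips speech synthesis and returns only text tokens.
text_ids = model.generate(**inputs, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```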
Earlier in March, Alibaba introduced the reasoning-focused AI model QwQ-32B.
In January, the Chinese tech giant announced Qwen2.5-Max, which it claims is more powerful than DeepSeek-V3.
