Amazon Unveils Nova Sonic AI Model for Voice Interaction

Amazon has introduced a new generative AI model, Nova Sonic, designed for voice interaction. The company claims its performance rivals leading solutions from OpenAI and Google in terms of speed, speech recognition, and conversation quality. 

Amazon describes Nova Sonic as the “most economical” voice AI model on the market, approximately 80% cheaper than OpenAI’s GPT-4o. It is available through the Bedrock developer platform.

“Components” of the model are already being used in the updated Alexa+ assistant. Nova Sonic can hold a two-way dialogue, speaking “at the right moment” by taking the speaker’s pauses and interruptions into account.

Nova Sonic employs a “unified model architecture,” which is purportedly superior to the approach of combining separate solutions for speech recognition, text conversion, response generation, and audio output. 

Amazon claims that Nova Sonic makes fewer speech recognition errors than its competitors, and that it understands what users mean even if they mumble, mispronounce words, or are in noisy environments.

On the Multilingual LibriSpeech benchmark, which measures speech recognition across various languages and dialects, Nova Sonic achieved an error rate of 4.2% for English, French, Italian, German, and Spanish. That is roughly four misrecognized words in every 100.
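To make the figure concrete, here is a minimal sketch of how such a rate is typically computed. It assumes the quoted number is a standard word error rate (word-level edit distance divided by the number of reference words). The article does not describe Amazon's exact scoring procedure, and the function name `word_error_rate` is purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: plain whitespace tokenization with no text normalization;
    real benchmarks usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a 4.2% rate corresponds to about 4.2 substituted, dropped, or inserted words per 100 words of reference transcript.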

On the Augmented Multi Party Interaction (AMI) benchmark, which measures performance in conversations with multiple participants, Nova Sonic was 46.7% more accurate than OpenAI’s GPT-4o-transcribe. Amazon also claims the best speed in the industry, with an average perceived latency of 1.09 seconds.

The company believes its new solution can be used to create various tools such as customer service bots or AI agents for the travel industry.

In April, Amazon updated the Nova Reel video generator to version 1.1. Users can create “multi-frame” clips up to two minutes long with “style continuity.”

In December 2024, the company introduced a new generation of Amazon Nova AI models for a wide range of tasks. The neural networks are capable of processing text, images, and video.
