Amazon Unveils Nova Sonic AI Model for Voice Interaction

ForkLog

12 months ago

Amazon has introduced a new generative AI model, Nova Sonic, designed for voice interaction. The company claims its performance rivals leading solutions from OpenAI and Google in terms of speed, speech recognition, and conversation quality.

Amazon describes Nova Sonic as the “most economical” voice AI model on the market, saying it is approximately 80% cheaper than OpenAI’s GPT-4o. It is available through the Bedrock developer platform.

The neural network’s “components” are already being used in the updated Alexa+ assistant. The model can hold a two-way dialogue, speaking “at the right moment” by accounting for the speaker’s pauses and interruptions.

Nova Sonic employs a “unified model architecture,” which is purportedly superior to the approach of combining separate solutions for speech recognition, text conversion, response generation, and audio output.

Excited about the launch of Amazon Nova Sonic, our new speech-to-speech model that helps make AI voice applications feel remarkably natural.

It’s designed to understand not just what people say, but how they say it – working with tone, style, and conversation flow including… pic.twitter.com/QRvP4LWYQN

— Andy Jassy (@ajassy) April 8, 2025

Amazon claims that Nova Sonic makes fewer speech recognition errors than its competitors and that it understands user intent even when speakers mumble, mispronounce words, or are in noisy environments.

In the Multilingual LibriSpeech benchmark, which measures speech recognition across various languages and dialects, Nova Sonic achieved a word error rate of 4.2% for English, French, Italian, German, and Spanish. This means it misrecognized approximately four out of every 100 words.
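To make the "four out of every 100 words" reading concrete, here is a minimal sketch of how a word error rate (WER) is commonly computed. It assumes the 4.2% figure is a standard WER: the word-level edit distance between the reference transcript and the model's transcript, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative only. The article does not describe the benchmark's exact scoring or text normalization, so this is not Amazon's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: plain whitespace tokenization, no text normalization.
    Real benchmarks usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of five.
print(word_error_rate("turn on the kitchen lights",
                      "turn on the chicken lights"))  # 0.2
```

By this definition, a WER of 4.2% corresponds to about 4.2 substituted, dropped, or inserted words per 100 words of reference text.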

In the Augmented Multi Party Interaction (AMI) benchmark, which covers conversations with multiple participants, Nova Sonic was 46.7% more accurate than OpenAI’s GPT-4o-transcribe. Amazon also claims the best speed in the industry, with an average perceived latency of 1.09 seconds.

Amazon just dropped something BIG for voice AI.

It’s called Amazon Nova Sonic.

And it might change how we talk to machines forever.

Natural, human-like voice conversations no clunky delays.

Here’s why it matters (and what it can do): pic.twitter.com/2jblM3xTrB

— Brendan (@jowettbrendan) April 9, 2025

The company believes its new solution can be used to create various tools such as customer service bots or AI agents for the travel industry.

In April, Amazon updated the Nova Reel video generator to version 1.1. Users can create “multi-frame” clips up to two minutes long with “style continuity.”

In December 2024, the company introduced a new generation of Amazon Nova AI models for a wide range of tasks. The neural networks are capable of processing text, images, and video.