Tue Apr 08 2025

Amazon introduces a new AI-based voice model, Nova Sonic.

Amazon has introduced a new AI-based voice model called Nova Sonic, which the company claims is competitive with the advanced voice models from OpenAI and Google.

On Tuesday, Amazon unveiled Nova Sonic, a new generative AI model that natively processes voice and generates natural-sounding speech. According to the company, Nova Sonic's performance is competitive with cutting-edge voice models from OpenAI and Google on benchmarks that measure speed, speech recognition, and conversational quality. The model is Amazon's answer to newer voice models, such as the one powering ChatGPT's Voice Mode, which feel more natural to talk to than the more rigid models behind earlier versions of Amazon Alexa.

Recent technical advances have made older models, and the digital assistants built on them, such as Alexa and Apple's Siri, feel stiff by comparison. Nova Sonic is available through Bedrock, Amazon's platform for building enterprise AI applications, via a new bi-directional streaming API. In a statement, Amazon called Nova Sonic "the most cost-effective AI voice model" on the market, saying it costs about 80% less than OpenAI's GPT-4o.

Parts of Nova Sonic are already being used in Alexa+, the upgraded version of Amazon's digital assistant, according to Rohit Prasad, Senior Vice President and Chief Scientist of AGI at the company. Prasad said in an interview that Nova Sonic builds on Amazon's experience with "large orchestration systems," which form the technical foundation of Alexa. Compared to other AI voice models, Nova Sonic excels at routing user requests to different APIs, allowing it to "know" when it needs to retrieve real-time information from the internet, query a specific data source, or take action in an external application using the appropriate tool.

During a two-way conversation, Nova Sonic waits to speak "at the appropriate moment," taking the speaker's pauses and interruptions into account. It also generates a text transcript of the user's speech, which developers can use for various applications. According to Prasad, the model makes fewer speech recognition errors than other models, which means it is better at understanding what users intend even when they speak unclearly, misspeak, or are in a noisy environment.

On Multilingual LibriSpeech, a benchmark that measures speech recognition across various languages and dialects, Amazon reported that Nova Sonic achieved a word error rate (WER) of 4.2% averaged across English, French, Italian, German, and Spanish. This means that approximately four out of every 100 words from the model differed from a human transcription in those languages. On another benchmark that assesses noisy interactions with multiple participants, Augmented Multi-Party Interaction, Nova Sonic was 46.7% more accurate, in terms of WER, than OpenAI's GPT-4o transcription model. The model also has industry-leading speed, with an average perceived latency of 1.09 seconds, ahead of GPT-4o at 1.18 seconds, according to evaluations from Artificial Analysis.
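
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, insertions, and deletions) between a human reference transcript and the model's output, divided by the number of words in the reference. The function name and the sample sentences are illustrative assumptions, not Amazon's evaluation code, and real benchmarks usually apply additional text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; only lowercases and splits
    on whitespace, with no punctuation or number normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two of the six reference words are transcribed wrongly.
reference = "turn on the kitchen lights please"
hypothesis = "turn on the chicken light please"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 33.3%
```

By the same definition, a WER of 4.2% corresponds to roughly four word-level errors per 100 reference words.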

Prasad said that Nova Sonic is part of Amazon's broader strategy to build artificial general intelligence (AGI), which the company defines as "AI systems that can do anything a human can do on a computer." Looking ahead, he said Amazon plans to release more AI models that understand different modalities, including images, video, and voice, as well as other sensory data relevant to interacting with the physical world. Amazon's AGI division, which Prasad oversees, appears to play an increasingly important role in the company's product strategy. Amazon recently launched a preview of Nova Act, a browser-based AI model that seems to power elements of Alexa+ and the Buy for Me feature on Amazon. Following the launch of Nova Sonic, the company intends to make more of its internal AI models available for developers to use.

artificial intelligence virtual assistants voice recognition
