Telegram (AI) YouTube Facebook X
Ру
Google Unveils AI Models for Text-to-Speech, Robotics, and Gemini on macOS

Google Unveils AI Models for Text-to-Speech, Robotics, and Gemini on macOS

Google unveils Gemini 3.1 Flash TTS with improved sound quality and expressiveness.

Google has released Gemini 3.1 Flash TTS, an updated speech synthesis model based on the Gemini 3 generation. It features improved sound quality, expressiveness, and more precise control, supporting over 70 languages.

The neural network enables developers, companies, and ordinary users to create applications with a voice AI interface.

Gemini 3.1 Flash TTS is now available:

  • for developers — in early access via Gemini API and Google AI Studio;
  • for enterprises — in Vertex AI;
  • for Workspace users — through the Google Vids service.

Enhanced Speech Quality and Control

The model scored 1211 points in the Artificial Analysis TTS rating. This score is based on the preferences of thousands of respondents who participated in blind audio quality testing.

image
Source: Google.

Artificial Analysis has classified the model among the most attractive solutions due to its combination of high-quality speech synthesis and low cost.

LLM stands out for its ability to generate natural dialogues involving multiple speakers.

New Audio Tags

Version 3.1 Flash TTS introduces audio tags — a tool for managing style, pace, and manner of speech.

“Early developers and corporate testers are already seeing the results of 3.1 Flash TTS, noting its impressive control and expressiveness. They told us how audio tags provide a new level of creative precision, transforming simple text into high-quality voice performance,” the company blog states.

AI Model for Robotics

Alongside Gemini 3.1 Flash TTS, the corporation introduced Gemini Robotics-ER 1.6. This AI model is designed to enable robots to perform complex tasks in real-world conditions through enhanced cognitive functions and “embodied” thinking.

The neural network specializes in spatial perception, action planning, and evaluating their success. It shows significant improvements over its predecessor and Gemini 3.0 Flash in tasks related to spatial and physical reasoning.

Gemini Robotics-ER 1.6 can interpret data from complex measuring instruments and observe indicators through viewports. This capability was developed by Google DeepMind specialists in collaboration with Boston Dynamics for industrial sector needs.

“Such capabilities allow autonomous seeing, understanding, and responding to real challenges,” commented Marco da Silva, Vice President of the Spot project at Boston Dynamics.

In security threat detection tests, the new model outperformed Gemini 3.0 Flash by 6% in text scenarios and 10% in video analysis.

The integration of LLM into real workflows has already begun: Boston Dynamics has integrated Gemini and Gemini Robotics-ER 1.6 into its own Orbit AIVI-Learning platform.

Gemini on macOS

Additionally, Google released a native Gemini application for macOS. It is accessible by pressing Option + Space. Features include the ability to share a window for instant context transfer.

The application supports image generation with Nano Banana, video creation with Veo, and other familiar tools.

Back in April, Google introduced Gemma 4 — a new family of open AI models for advanced reasoning and agent workflows.

Подписывайтесь на ForkLog в социальных сетях

Telegram (основной канал) Facebook X
Found a mistake? Select it and press CTRL+ENTER

Рассылки ForkLog: держите руку на пульсе биткоин-индустрии!

We use cookies to improve the quality of our service.

By using this website, you agree to the Privacy policy.

OK