
Google has rolled out a major update to its Live API within the Gemini API, introducing a new native audio model available in preview. This update makes voice agents more reliable and natural in real-time conversations.
In a blog post, Google highlighted two key additions: more robust function calling and more natural conversations.
Function calling lets voice agents connect to external data and services, typically for tasks such as booking appointments or fetching real-time information. Google says the updated model doubles the accuracy of single function calls and significantly improves multi-call scenarios, reducing errors and failed requests. This enhancement matters because voice interactions demand real-time processing, where a failed call is immediately audible to the user.
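To make the mechanism concrete, here is a minimal sketch of how a tool might be declared and wired into a Live API session configuration. The `book_appointment` function, its parameters, and the exact config shape are illustrative assumptions, not details from Google's announcement; consult the Gemini API documentation for the authoritative schema.

```python
# Hypothetical function the voice agent can call to book appointments.
# Name, description, and parameters are assumptions for illustration.
book_appointment = {
    "name": "book_appointment",
    "description": "Book an appointment at a given date and time.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "ISO date, e.g. 2025-06-01"},
            "time": {"type": "string", "description": "24-hour time, e.g. 14:30"},
        },
        "required": ["date", "time"],
    },
}

# Session config passed when opening a Live API connection; during the
# conversation, the model decides when to emit a call to this function,
# and the client executes it and returns the result.
live_config = {
    "response_modalities": ["AUDIO"],
    "tools": [{"function_declarations": [book_appointment]}],
}

print(live_config["tools"][0]["function_declarations"][0]["name"])  # book_appointment
```

The improvement Google describes targets the step where the model chooses which declared function to call and with what arguments, which is where multi-call conversations previously accumulated errors.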
The second update, which enhances the natural flow of conversations, lets the model handle interruptions, pauses, and side conversations better. For instance, the model can pause when background chatter or a side question occurs and resume once the user is ready. This makes voice interactions less robotic and more human-like.
Google cited internal improvements, noting that the model reduces incorrect interruptions when users pause to think or remain silent for a moment. It also features better interruption detection, allowing voice agents to recognise when a user intentionally interrupts and respond accordingly.
Google later plans to add “thinking” capabilities, which would allow the model to take a beat for complex queries that require deeper processing. The tech giant cited early adopters like Ava, an AI-powered family operating system, noting that the update improves handling of noisy real-world inputs and reduces the need for workaround prompts.