Google’s Gemini 2.0 AI Model Offers Expanded Capabilities

Latest iteration of Google’s Gemini AI model brings multimodal output, native tool use and agentic abilities to the search giant’s products

Google has announced Gemini 2.0, the latest model in its line of large language models aimed at organising the world’s information. 

Sundar Pichai, CEO of Google and its parent company Alphabet, said in a statement that Gemini 2.0 “will enable us to build new AI agents that bring us closer to our vision of a universal assistant,” and noted that the model incorporates “new advances in multimodality – like native image and audio output – and native tool use.”


“If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful,” he said. “I can't wait to see what this next era brings.”

Pichai said the new model's capabilities are “underpinned by decade-long investments in our differentiated full-stack approach to AI innovation.” It is built on custom hardware like the company’s sixth-generation Tensor Processing Units (TPUs), which powered all of the training and inference for Gemini 2.0.

Gemini 2.0 Flash available to developers and users

Google is also releasing Gemini 2.0 Flash, an experimental version of the model with “low latency and enhanced performance at the cutting edge of our technology, at scale,” according to Demis Hassabis, CEO of Google’s AI research unit DeepMind, and Koray Kavukcuoglu, Google DeepMind’s CTO.


“Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model yet for developers, with enhanced performance at similarly fast response times,” they said. “Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed.”

The model is available now to developers via Google’s AI APIs and to users of the Gemini AI chatbot. Gemini users globally can access a chat-optimised version of the model by selecting it in the model dropdown on the desktop and mobile web versions of the app. It will be available in the Gemini mobile apps soon.


Hassabis and Kavukcuoglu said that in addition to supporting multimodal inputs like images, video and audio, Gemini 2.0 Flash “now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio.” The model can also natively call tools like Google Search, code execution and third-party user-defined functions.

To help developers build applications with the new model, Google is also releasing a Multimodal Live API that supports real-time audio and video streaming input, as well as the ability to use multiple combined tools.
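As a rough illustration of the developer surface described above, the sketch below assembles a request body for the Gemini API's `generateContent` endpoint that supplies a text prompt and declares Google Search as a natively callable tool. The field names follow Google's public generativelanguage API conventions, but the exact schema, tool declaration and model identifier (`gemini-2.0-flash-exp`) are assumptions for illustration, not official documentation:

```python
import json

# Assumed model id and endpoint, following Google's generativelanguage
# API conventions; verify against the official Gemini API reference.
MODEL = "gemini-2.0-flash-exp"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Assemble a generateContent request body that sends a user prompt
    and declares Google Search as a tool the model may call natively."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # Native tool use, per the announcement; the empty object marks
        # the tool as enabled with default settings.
        "tools": [{"google_search": {}}],
    }

body = build_request("Summarise today's top AI research news.")
print(json.dumps(body, indent=2))
```

In practice a developer would POST this body with an API key; the Multimodal Live API extends the same model access with streaming audio and video input over a persistent connection.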

Research prototypes showcase agentic AI abilities

Google also showcased several research prototypes built with Gemini 2.0 that aim to demonstrate the ‘agentic’ abilities of the model to take actions and accomplish tasks on behalf of users.

Key facts
  • 2x: Gemini 2.0 Flash outperforms the 1.5 Pro model on key benchmarks at twice the speed.
  • 83.5%: Project Mariner, a research prototype built with Gemini 2.0, achieved a state-of-the-art result of 83.5% on the WebVoyager benchmark, which tests agent performance on real-world web tasks.
  • 1bn: Google's AI Overviews feature in Search, which will incorporate Gemini 2.0 capabilities, now reaches 1 billion people.

Project Astra, first introduced at the company’s I/O developer conference, is a prototype universal AI assistant that Google has been testing with a small group of users. The latest version built with Gemini 2.0 features “better dialogue” with the ability to converse in multiple languages, new tool use capabilities, improved memory and lower latency.

“We’re working to bring these types of capabilities to Google products like Gemini app, our AI assistant, and to other form factors like glasses,” Pichai said. “And we’re starting to expand our trusted tester program to more people, including a small group that will soon begin testing Project Astra on prototype glasses.”

Another product, Project Mariner, is “an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser,” Hassabis and Kavukcuoglu said. Via an experimental Chrome browser extension, the agent is able to “understand and reason across information in your browser screen” and complete tasks for users.


The DeepMind executives said Project Mariner achieved state-of-the-art results on the WebVoyager benchmark, which tests AI agent performance on real-world web tasks. “It’s still early, but Project Mariner shows that it’s becoming technically possible to navigate within a browser, even though it's not always accurate and slow to complete tasks today, which will improve rapidly over time,” they said.

Finally, Jules is an experimental AI code agent that integrates with the GitHub software development platform. “It can tackle an issue, develop a plan and execute it, all under a developer's direction and supervision,” according to the DeepMind executives. “This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding.”

Gemini 2.0 coming to more Google products

Pichai said Gemini 2.0 is already being tested in a limited fashion in Google’s AI Overviews feature in Search, with the advanced reasoning capabilities of the model being used to “tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries and coding.” He said the feature will roll out more broadly early next year.

Google Gemini was first launched in December 2023

“Early next year, we’ll expand Gemini 2.0 to more Google products,” he said. “No product has been transformed more by AI than Search. Our AI Overviews now reach one billion people, enabling them to ask entirely new types of questions – quickly becoming one of our most popular Search features ever.”


Explore the latest edition of AI Magazine and be part of the conversation at our global conference series, Tech & AI LIVE

Discover all our upcoming events and secure your tickets today.


AI Magazine is a BizClik brand
