- Google has launched the new Gemini 2.0 model, which is more powerful than its predecessor and features multimodal capabilities.
- The update includes a Deep Research tool that uses advanced reasoning to compile research summaries.
- Additionally, the AI agent Project Mariner has been introduced, capable of performing tasks on a computer on behalf of the user.
- Google announced improvements to the AI Overviews search query summary system.
On December 11, Google unveiled Gemini 2.0, which the company calls its "most powerful AI model to date."
Welcome to the world, Gemini 2.0 ✨ our most capable AI model yet.
We’re first releasing an experimental version of 2.0 Flash ⚡ It has better performance, new multimodal output, @Google tool use — and paves the way for new agentic experiences. https://t.co/ywY2oZv76p pic.twitter.com/1Wgcr3m2Ip
— Google DeepMind (@GoogleDeepMind) December 11, 2024
Google initially released an experimental version, 2.0 Flash, which offers better performance and multimodal output and, in the company's words, "paves the way for new agentic experiences."
Gemini 2.0 Flash surpasses 1.5 Pro on key benchmarks while running at twice the speed. It can generate images, text, and speech in multiple languages, and Google reports significant improvements in coding and image analysis.
2.0 Flash replaces 1.5 Pro as the flagship model. Unlike the previous generation, it can call external tools and services such as Google Search, code execution, and third-party APIs.
As our workhorse model, Gemini 2.0 Flash outperforms 1.5 Pro on key benchmarks, at twice the speed.
It can generate images mixed with text as well as customizable text-to-speech multilingual audio. 2.0 Flash can also call tools like @Google Search, code execution and third-party… pic.twitter.com/OVicGFnJdP
— Google DeepMind (@GoogleDeepMind) December 11, 2024
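Third-party tool use of this sort is surfaced in the Gemini API as function calling. The snippet below is a minimal sketch of that flow using the google-generativeai Python SDK; the get_exchange_rate function is a hypothetical stub, and the model ID gemini-2.0-flash-exp is assumed from the experimental release.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Hypothetical tool for illustration: any plain Python function with type
# hints and a docstring can be exposed to the model as a callable tool.
def get_exchange_rate(currency_from: str, currency_to: str) -> float:
    """Returns the current exchange rate between two currencies."""
    return 0.95  # stubbed value; a real tool would call an external API

model = genai.GenerativeModel(
    "gemini-2.0-flash-exp",  # assumed experimental model ID
    tools=[get_exchange_rate],
)

# With automatic function calling, the SDK executes the tool when the model
# requests it and feeds the result back before the final answer is returned.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("How many euros would 100 US dollars buy?")
print(response.text)
```

Built-in tools such as Google Search grounding are configured separately; the custom function above simply stands in for the third-party services mentioned in the announcement.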
Gemini 2.0 Flash is available in a chat-optimized version to all users, while the experimental multimodal model with text-to-speech and image generation is accessible to developers via the Gemini API in Google AI Studio and Vertex AI. In the coming months, the enhanced model will gradually roll out to products such as Android Studio, Chrome DevTools, Firebase, and Gemini Code Assist.
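As a rough illustration of that developer access, here is a minimal sketch that sends a mixed image-and-text prompt to the experimental model, touching on the image-analysis improvements mentioned above. It assumes the google-generativeai SDK, an API key created in Google AI Studio, and the gemini-2.0-flash-exp model ID.

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # key created in Google AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed model ID

# Mixed text-and-image prompt: the SDK accepts a list of parts, so a PIL
# image can be passed alongside a question about it.
image = PIL.Image.open("chart.png")  # placeholder local file
response = model.generate_content([image, "What trend does this chart show?"])
print(response.text)
```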
Google’s AI Agent
Google’s AI division, DeepMind, has introduced its first AI agent capable of independently operating on the internet.
Project Mariner, built on Gemini 2.0, is available to a limited group of testers. It controls the Chrome browser: moving the cursor, clicking buttons, filling out forms, and navigating websites much as a human would.
Once the agent is set up, a chat window appears on the right side of the browser, where users can give it instructions, such as building a grocery shopping cart from an attached list.
After the user specifies the details, the agent goes to the supermarket's website, finds the required items, and adds them to the virtual cart. It works slowly, taking roughly five seconds per button click.
Project Mariner cannot enter credit card numbers or other payment information, accept cookies, or sign service agreements. These limits are intentional, keeping users in control.
The agent can be used to search for flights, hotels, and recipes, to make purchases, and for other tasks. While it is executing a request, the computer cannot be used for anything else.
Project Mariner is not yet available to the general public, and its public release date is unknown.
Other AI Agents
In addition to Project Mariner, Google has introduced several other AI agents for more specialized tasks:
- Deep Research helps explore complex topics by creating multi-step research plans. It is not intended for solving mathematical or logical problems, writing code, or analyzing data;
- Jules writes code and integrates into GitHub workflows; it will appear in 2025;
- another agent focuses on assisting players in video games; its release date is unknown.
Gemini Learns to Reason
The updated version of Gemini has received the Deep Research feature, which uses “advanced reasoning” and “long-context capabilities” to create concise research summaries. Briefs can be exported to Google Docs for further editing.
The new Deep Research feature from Google feels like one of the most appropriately “Google-y” uses of AI to date, and is quite impressive.
I’ve had access for a bit and it does very good initial reports on almost any topic. The paywalls around academic sources puts some limits. pic.twitter.com/dwSqr6aKGZ
— Ethan Mollick (@emollick) December 11, 2024
The service searches the internet for information relevant to a query, acting as a kind of research assistant. The result of its deliberations is presented as a brief summary with links to sources. The procedure is as follows:
- The user writes a query.
- Deep Research creates a “multi-step research plan.”
- The user confirms the start of the analysis.
- Deep Research conducts the research over several minutes and generates a response.
The feature is available to subscribers of the paid Gemini Advanced tier.
AI Overviews to Become Smarter and Multimodal
Google announced improvements to AI Overviews, its search query summary system. The service will reportedly soon handle "more complex topics" as well as "multimodal" and "multi-step" searches, including advanced mathematical queries and programming tasks.
Testing of the functionality will begin this week, with a broader rollout at the start of next year.
The enhancement of AI Overviews is driven by the launch of Gemini 2.0.
In November, Google trained the Gemini chatbot to remember contextual information about users’ lives, interests, and preferences.
