Google has introduced Gemini 2.0, its new AI model for the agentic era. The tech giant on Wednesday unveiled the first model in the Gemini 2.0 family – Gemini 2.0 Flash, a workhorse model with low latency and enhanced performance. The company also shared the frontier of its agentic research through prototypes enabled by Gemini 2.0’s native multimodal capabilities.
What is Gemini 2.0?
In their article, Demis Hassabis, CEO of Google DeepMind, and Koray Kavukcuoglu, CTO of Google DeepMind, said that Gemini 2.0 Flash builds on the success of 1.5 Flash – the company’s most popular model yet for developers. 2.0 Flash outperforms 1.5 Pro on some key benchmarks at twice the speed, and it also comes with new capabilities.
Along with support for multimodal inputs such as images, video, and audio, 2.0 Flash also supports multimodal outputs such as natively generated images mixed with text and steerable, multilingual text-to-speech (TTS) audio. The company said that 2.0 Flash can also natively call tools such as Google Search and code execution, as well as third-party user-defined functions.
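For developers, native tool calling is exposed through the Gemini API. The snippet below is a minimal sketch using the google-generativeai Python SDK with the built-in code-execution tool; the model identifier "gemini-2.0-flash-exp" and the placeholder API key are assumptions for illustration, not details confirmed in this article.

```python
# Hedged sketch: ask the model to solve a problem using its native
# code-execution tool. Model name and API key are placeholder assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key from Google AI Studio

# Enable the built-in code-execution tool so the model can write and run code.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools="code_execution")

response = model.generate_content(
    "Write and run Python code to compute the 20th Fibonacci number."
)
print(response.text)  # includes the generated code and its execution result
```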
Gemini 2.0 Flash performance
On performance, 2.0 Flash shows a significant advance across most benchmarks compared with Gemini 1.5 Flash and Gemini 1.5 Pro. The new model excels at code generation, scoring 92.9 per cent on Natural2Code – a significant improvement over Gemini 1.5 Pro’s 85.4 per cent. On mathematical reasoning, 2.0 Flash scored 89.7 per cent on MATH and 63 per cent on HiddenMath, also a considerable improvement over past iterations.
On factuality, 2.0 Flash scored 83.6 per cent, showing an enhanced ability to offer accurate responses, and it scored 62.1 per cent on GPQA, indicating better reasoning capabilities. The model saw a dip in long-context understanding (MRCR, 1M tokens) with 69.2 per cent, against Gemini 1.5 Pro’s 82.6 per cent. The new model improves significantly on multimodal tasks with 70.7 per cent on MMMU; audio recognition, however, remains low at 9.8 per cent.
Overall, the new Gemini 2.0 Flash outperforms its predecessors, especially in coding, math, and multimodal understanding. The benchmark scores shared by Google position 2.0 Flash as a robust model for complex tasks.
Gemini 2.0 Flash: Availability
The new Gemini 2.0 Flash is available as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI, with multimodal input and text output open to all developers, and text-to-speech and native image generation limited to early-access partners. The company said general availability will follow in January, along with more model sizes. To help developers build dynamic and interactive applications, Google is also releasing a new Multimodal Live API with real-time audio and video-streaming input and the ability to use multiple, combined tools.
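As a rough illustration of the "multimodal input, text output" path open to all developers, here is a minimal sketch using the google-generativeai Python SDK; the model identifier "gemini-2.0-flash-exp", the API key, and the image file name are placeholder assumptions rather than details from the article.

```python
# Hedged sketch: multimodal input (an image plus a text prompt) with text-only
# output via the Gemini API. Model name, API key, and file name are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # obtain a key from Google AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed experimental model id

image = Image.open("chart.png")  # any local image file
response = model.generate_content([image, "Summarise the trend shown in this chart."])

print(response.text)  # the model's text answer about the image
```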
Gemini users around the world will be able to access a chat-optimised version of 2.0 Flash experimental by selecting it in the model drop-down on desktop and mobile web, and it will be available in the Gemini mobile app soon. Google has said that Gemini 2.0 will be integrated into more Google products early next year. Source: The Indian Express