Google Unveils Gemini 1.5 Pro AI Model in Public Preview, Introduces New Features

Gemini 1.5 Pro AI Model Features Advanced Native Speech Understanding

Google has introduced its advanced artificial intelligence (AI) model, Gemini 1.5 Pro, in a public preview on Tuesday. This model, noted for having the largest context window in AI, was initially announced in February and had been available for developers to test in Google AI Studio for two months. Now open to the public, users can explore its capabilities and create or access API keys to build applications using this large language model (LLM). Alongside its public release, Google has incorporated several new features into Gemini 1.5 Pro.

The public preview launch took place during Google’s annual Cloud Next event. The standard version of Gemini 1.5 Pro features a 128,000 token context window, significantly larger than the 32,000 tokens offered by its predecessor, Gemini 1.0. Additionally, a special variant of Gemini 1.5 Pro boasts an impressive context window of one million tokens. Tokens, which can be syllables, words, or word subsections, are essential data units for AI models. The context window determines how much information the AI can access to provide relevant responses based on prompt keywords.

To put this into perspective, a context window of one million tokens equates to approximately 700,000 words, akin to ten average-sized 300-page books. This extensive context window allows the AI to understand a broader range of information, enabling it to generate more accurate and contextually relevant responses. This feature is particularly beneficial when users need the AI to analyze large files for specific information.

Rowan Cheung, an X (formerly Twitter) user, shared his early experiences with the Gemini AI model. In one post, he described uploading the entire NBA dunk contest and asking which dunk received the highest score. Impressively, Gemini 1.5 was able to pinpoint the perfect 50 dunk and provide detailed information from its extensive context video understanding.

 

 

Gemini 1.5 Pro also comes with several new functionalities. Google has integrated native audio and speech support, allowing the AI to understand verbal prompts. Additionally, a File API for handling files, system instructions, and JSON mode have been added to give developers better control over the model. The AI model’s multimodal capabilities enable it to analyze images and videos effectively. Currently, Gemini 1.5 Pro is available in over 180 countries, including India.

This release marks a significant advancement in AI technology, showcasing Google’s commitment to expanding the capabilities and accessibility of AI models. The inclusion of a large context window and various new features makes Gemini 1.5 Pro a powerful tool for developers and users alike.

The AI model comes with several new features as well. Google has added native audio or speech support, and Gemini 1.5 Pro can understand verbal prompts. Alongside, a File API for handling files, system instructions, and JSON mode have also been added for developers to have better control over the model. It also comes with its multimodal capability and can analyse images and videos. The AI model is currently available in more than 180 countries including India.