Google I/O 2024

Google is fully committing to AI, and it made sure everyone knew it by mentioning the word “AI” more than 120 times during the keynote at its I/O developer conference on Tuesday.

However, not all of Google’s AI announcements were groundbreaking. Some were incremental updates, while others restated previous developments. To help distinguish the most important innovations, we’ve compiled a list of the top new AI products and features introduced at Google I/O 2024.

Generative AI in Google Search

Google plans to use generative AI to organize its search results pages.

The appearance of these AI-organized pages will vary based on the search query. They may include AI-generated summaries of reviews, discussions from social media platforms like Reddit, and AI-created suggestion lists.

Initially, Google will display AI-enhanced results when it detects a user seeking ideas, such as for trip planning. Soon, this feature will expand to searches for dining options and recipes, eventually encompassing movies, books, hotels, e-commerce, and more.

Additionally, users will be able to customize their AI Overview, with options to simplify the language or delve into more detailed explanations. This feature will prove especially handy for those new to a topic or looking to provide simplified explanations for curious children. The update is set to roll out soon for English queries in the U.S. in Search Labs.


Project Astra and Gemini Live

Project Astra and Gemini Live
Image Credits: Google

Google is enhancing Gemini, its AI-powered chatbot, to better understand the world around its users.

The company introduced a new feature in Gemini called Gemini Live, which enables users to have “in-depth” voice chats with Gemini on their smartphones. Users can interrupt Gemini while it’s speaking to clarify their questions, and the chatbot will respond by adapting to their speech patterns in real time. Additionally, Gemini can see and respond to users’ surroundings through photos or videos captured by their smartphones’ cameras.

Gemini Live, set to launch later this year, can answer questions about objects or scenes captured by a smartphone’s camera, such as identifying a neighborhood or naming a part on a broken bicycle. It can even recall things it has recently seen, like your glasses sitting on the table. This feature is powered by technical advancements from Project Astra, a new initiative within Google DeepMind to develop AI-powered applications and agents capable of real-time, multimodal understanding.
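Gemini Live itself isn’t exposed as a developer API, but the multimodal question-answering underneath it can be approximated today with Google’s generative AI Python SDK. Here is a minimal sketch, assuming you have a Gemini API key and a saved camera frame (the file name broken_bike.jpg is hypothetical):

```python
# pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumption: you have a Gemini API key
model = genai.GenerativeModel("gemini-1.5-flash")

# Ask a question about an image, much like pointing a phone camera at a scene.
frame = Image.open("broken_bike.jpg")  # hypothetical saved camera frame
response = model.generate_content(
    [frame, "Which part of this bicycle looks broken, and what is it called?"]
)
print(response.text)
```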

Google Veo

Google Veo
GIF Credits: Google

Google is challenging OpenAI’s Sora with Veo, an AI model that generates roughly minute-long 1080p video clips from text prompts.

Veo can generate visuals in various cinematic styles, including landscape shots and time lapses, and can edit and adjust footage it has already generated. The model understands camera movements and visual effects from prompts, including descriptors like “pan,” “zoom,” and “explosion.” It also has a basic grasp of physics, such as fluid dynamics and gravity, which enhances the realism of its videos.

Additionally, Veo supports masked editing for specific areas of a video and can generate videos from still images, similar to generative models like Stability AI’s Stable Video. Notably, Veo can create longer videos from a sequence of prompts that form a narrative, extending beyond the one-minute limit.

Google Ask Photos

Google Ask Photos
Image Credits: Google

Google Photos is receiving an AI upgrade with the introduction of Ask Photos, an experimental feature powered by Google’s Gemini family of generative AI models.

Set to launch this summer, Ask Photos will enable users to search their Google Photos collection using natural language queries that leverage Gemini’s understanding of photo content and metadata.

For example, instead of searching for specific objects like “One World Trade,” users can perform broader and more complex searches, such as “best photo from each of the National Parks I visited.” In this case, Gemini would analyze factors like lighting, blurriness, and background clarity to identify the best photos while also using geolocation data and dates to find the relevant images.
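Ask Photos is a built-in Google Photos feature rather than a public API, but the pattern described above (filter on metadata, then rank candidates by visual quality) can be sketched with the public Gemini API. The album list, file names, and scoring prompt below are hypothetical stand-ins for Google Photos’ internal index:

```python
# pip install google-generativeai pillow
import json

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# Hypothetical local album standing in for Google Photos' index:
# geolocation and dates have already narrowed the candidates to one park.
album = [
    ("img_0312.jpg", {"location": "Yosemite National Park", "date": "2023-06-14"}),
    ("img_0548.jpg", {"location": "Yosemite National Park", "date": "2023-06-15"}),
]

def score_photo(path: str, meta: dict) -> float:
    """Have the model rate lighting, sharpness, and composition from 0 to 10."""
    prompt = (
        f"Metadata: {json.dumps(meta)}. Rate this photo's overall quality "
        "(lighting, sharpness, background clarity) from 0 to 10. "
        "Reply with the number only."
    )
    response = model.generate_content([Image.open(path), prompt])
    return float(response.text.strip())

best_path, _ = max(album, key=lambda item: score_photo(*item))
print("Best Yosemite photo:", best_path)
```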

Gemini features in Gmail

Gemini features in Gmail
Image Credits: Google

Gmail users are in for a productivity boost with Gemini, which will soon enable searching, summarizing, and drafting emails, as well as acting on them for more involved tasks like processing returns.

During a demo at I/O, Google showcased how a parent could stay updated on their child’s school activities by asking Gemini to summarize recent school emails. Gemini goes beyond analyzing email bodies; it also examines attachments such as PDFs, producing a summary with key points and action items.

Using a sidebar within Gmail, users can leverage Gemini to organize receipts from emails, store them in a Google Drive folder, or extract information and input it into a spreadsheet. For frequent tasks like expense tracking, Gemini can even offer to automate the workflow for future use, catering to business travelers and others with similar needs.
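These Workspace integrations aren’t something developers call directly, but a similar pipeline can be assembled from public APIs: pull recent messages with the Gmail API, then summarize them with Gemini. A rough sketch, assuming OAuth credentials are already stored in token.json and using a hypothetical school sender address:

```python
# pip install google-api-python-client google-auth google-generativeai
import google.generativeai as genai
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# Assumption: OAuth has already been completed and token.json holds
# credentials with the gmail.readonly scope.
creds = Credentials.from_authorized_user_file("token.json")
gmail = build("gmail", "v1", credentials=creds)

# Fetch the past week's messages from a hypothetical school sender.
listing = gmail.users().messages().list(
    userId="me", q="from:newsletter@school.example newer_than:7d"
).execute()

snippets = []
for ref in listing.get("messages", []):
    msg = gmail.users().messages().get(userId="me", id=ref["id"]).execute()
    snippets.append(msg.get("snippet", ""))

summary = model.generate_content(
    "Summarize these school emails as key points and action items:\n\n"
    + "\n---\n".join(snippets)
)
print(summary.text)
```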

Detecting Scams on Calls

Scam Alerts
Image Credits: Google

Google provided a sneak peek of an AI-powered feature designed to alert users to potential scams during phone calls.

This upcoming capability, slated for a future version of Android, utilizes Gemini Nano, the smallest version of Google’s generative AI models. Operating entirely on the device, Gemini Nano listens in real time for “conversation patterns commonly associated with scams.”

While a specific release date has yet to be announced, Google is showcasing what Gemini Nano makes possible. Notably, the feature will be opt-in, ensuring user consent. Even though Nano runs entirely on-device, the system still listens to users’ conversations, so the privacy implications are worth weighing.
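Gemini Nano runs on-device through Android’s AICore stack and isn’t callable from a desktop script, but the underlying idea of screening transcript chunks for known scam patterns can be illustrated with the cloud Gemini API as a stand-in. The prompt, the one-word output contract, and the example transcript below are illustrative assumptions, not Google’s implementation:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # cloud stand-in; the real feature stays on-device
model = genai.GenerativeModel("gemini-1.5-flash")

SCREEN_PROMPT = (
    "You screen phone-call transcripts for patterns commonly associated with "
    "scams: urgent demands to move money, requests for gift cards or one-time "
    "codes, impersonation of banks or government agencies. "
    "Answer with exactly one word, SCAM or OK.\n\nTranscript:\n{chunk}"
)

def looks_like_scam(chunk: str) -> bool:
    """Classify one chunk of a rolling transcript window."""
    response = model.generate_content(SCREEN_PROMPT.format(chunk=chunk))
    return response.text.strip().upper().startswith("SCAM")

# Example chunk from a live transcription source (not shown here).
if looks_like_scam("Your account is compromised. Buy gift cards and read me the codes."):
    print("Warning: this call matches common scam patterns.")
```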

AI for TalkBack accessibility

AI for TalkBack accessibility
GIF Credits: Google

Google is elevating its TalkBack accessibility feature on Android by incorporating generative AI capabilities.

In the near future, TalkBack will leverage Gemini Nano to provide verbal descriptions of objects to aid low-vision and blind users. For instance, TalkBack might narrate an item of clothing as follows: “A close-up of a black and white gingham dress. The dress is short, with a collar and long sleeves. It is tied at the waist with a big bow.”

Google reports that TalkBack users encounter approximately 90 unlabeled images per day. With Nano’s assistance, the system will be able to offer contextual descriptions on its own, potentially eliminating the need for manually added labels.

As Google’s latest announcements shape the future of AI development, making it more accessible and user-friendly, tools like WorkBot position organizations to reap the benefits of streamlined productivity and informed decision-making. WorkBot connects multiple databases, files, and URLs in one place while maintaining privacy and access controls.

Book a demo with our experts today to learn how WorkBot can help your organization maximize the potential of AI and stay ahead of the competition.