Google Gemini Tries Outsmarting ChatGPT With Photo and Video AI

Google's latest AI update, Gemini, brings video and photo understanding to its Bard AI chatbot and Pixel 8 phone.

Google Introduces Gemini Model to Bard AI Chatbot

Google has added a new AI model called Gemini to its Bard AI chatbot, aiming to give the chatbot native understanding of video, audio, and photos. The new model is currently available in dozens of countries and supports text-based chat in English. Google plans to roll out additional multimedia features, such as recognizing hand gestures in videos or solving drawing puzzles, in the near future.

Gemini marks a departure from traditional text-based chat AI models. With Gemini, Google intends to develop AI models that can process richer information and mimic the complex communication abilities of humans in our three-dimensional world. This represents a step closer to creating AI that feels like a helpful collaborator rather than just a smart piece of software.

Three Versions of Gemini for Different Computing Power Levels

Gemini comes in three versions tailored for different levels of computing power. The first, Gemini Nano, is designed for mobile phones and will power new features on Google's Pixel 8 phones. Gemini Pro, optimized for fast responses, will run in Google's data centers and power a new version of Bard. Finally, Gemini Ultra, currently limited to a test group, will be available in a new Bard Advanced chatbot set to launch in early 2024. Google has not yet disclosed pricing details for Gemini Ultra.

Gemini shows how quickly the generative AI field is advancing. Google is already on its third major AI model revision, surpassing top competitor OpenAI. Google plans to incorporate the new technology across its popular products like search, Chrome, Google Docs, and Gmail, reaching billions of users worldwide.

Gemini's Impressive Multimedia Capabilities

Gemini's capabilities extend beyond text-based chat. The AI model has been trained on text, programming code, images, audio, and video so that it can process multimedia input efficiently. Examples provided by Google include correctly recognizing the next shape in a series, drawing connections between photos and facts, converting bar charts into labeled tables, and even identifying errors in physics problems and explaining them.

While Gemini shows promise, its accuracy and performance in live scenarios have yet to be fully tested. Google is currently conducting "red teaming" to identify potential vulnerabilities and ensure responsible implementation of Gemini's capabilities. The company says it is committed to addressing AI risks collaboratively with governments and other stakeholders as AI technology continues to advance.