Shortly after news spread that Google was pushing back the release of its long-awaited AI model, Gemini, Google announced its launch.
As part of the release, they published a demo showcasing impressive – downright unbelievable – capabilities from Gemini. Well, you know what they say about things being too good to be true.
Let’s dig into what went wrong with the demo and how Gemini stacks up against OpenAI’s GPT-4.
What is Google Gemini?
Rivaling OpenAI’s GPT-4, Gemini is a multimodal AI model, meaning it can process text, image, audio and code inputs.
(For a long time, ChatGPT was unimodal, only processing text, until it graduated to multimodality this year.)
Gemini comes in three versions:
- Nano: It’s the least powerful version of Gemini, designed to operate on mobile devices like phones and tablets. It’s best for simple, everyday tasks like summarizing an audio file and writing copy for an email.
- Pro: This version can handle more complex tasks like language translation and marketing campaign ideation. This is the version that now powers Google AI tools like Bard and Google Assistant.
- Ultra: The biggest and most powerful version of Gemini, with access to large datasets and processing power to complete tasks like solving scientific problems and creating advanced AI apps.
Ultra isn’t yet available to consumers, with a rollout scheduled for early 2024, as Google runs final tests to ensure it’s safe for commercial use. Gemini Nano will power Google’s Pixel 8 Pro phone, which has AI features built in.
Gemini Pro, on the other hand, will power Google tools like Bard starting today and is accessible via API through Google AI Studio and Google Cloud Vertex AI.
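As a rough illustration of what API access looks like, here is a minimal sketch of building a request body for Gemini Pro over REST. The endpoint path and JSON shape are assumptions based on Google's Generative Language API conventions, not details confirmed in this article; check Google AI Studio's documentation for the authoritative format.

```python
import json

# Assumed endpoint for Gemini Pro via the Generative Language API --
# treat the exact path and version segment as assumptions.
ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/"
            "models/gemini-pro:generateContent")

def build_request(prompt: str) -> dict:
    """Build the JSON body for a single-turn text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

# No network call is made here; this only shows the request shape.
body = build_request("Summarize the Gemini launch in one sentence.")
print(json.dumps(body))
```

In practice you would POST this body to the endpoint with an API key from Google AI Studio, or route the same model through Google Cloud Vertex AI with its own authentication.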
Was Google's Gemini demo deceptive?
Google published a six-minute YouTube demo showcasing Gemini’s skills in language, game creation, logic and spatial reasoning, cultural understanding, and more.
If you watch the video, it’s easy to be wowed.
Gemini is able to recognize a duck from a simple drawing, understand a sleight of hand trick, and complete visual puzzles – to name a few tasks.
However, after the video earned over 2 million views, a Bloomberg report revealed that it had been cut and stitched together in a way that inflated Gemini's performance.
Google did share a disclaimer at the beginning of the video: “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.”
However, Bloomberg points out they left out a few important details:
- The video wasn’t done in real time or via voice output, suggesting that conversations won’t be as smooth as shown in the demo.
- The model used in the video is Gemini Ultra, which is not yet available to the public.
In reality, Gemini processed the demo's inputs as still images and written prompts.
It's like when you're showing everyone your dog's best trick.
You share the video via text and everyone's impressed. But when friends come over in person, they see it actually takes a pile of treats, plenty of petting, patience, and repeating yourself 100 times to get the trick out of them.
Let's do some side-by-side comparison.
In this 8-second clip, we see a person’s hand gesturing as if they’re playing the game used to settle all friendly disputes. Gemini responds, “I know what you’re doing. You’re playing rock-paper-scissors.”
But what actually happened behind the scenes involved a lot more spoon-feeding.
In the real demo, the user submitted each hand gesture individually and asked Gemini to describe what it saw.
From there, the user combined all three images, asked Gemini again and included a huge hint.
While it’s still impressive how Gemini is able to process images and understand context, the video downplays how much steering is required for Gemini to generate the right answer.
Although this has gotten Google a lot of criticism, some point out that it's not uncommon for companies to use editing to create more seamless, idealized use cases in their demos.
Gemini vs. GPT-4
Since its release, GPT-4, created by OpenAI, has been the most powerful AI model on the market, and Google and other AI players have been hard at work on a model that can beat it.
Google first teased Gemini in September, suggesting that it would beat out GPT-4, and technically, it delivered.
Gemini outperforms GPT-4 in a number of benchmarks set by AI researchers.
However, the Bloomberg article points out something important.
For a model that took this long to release, the fact that it’s only marginally better than GPT-4 is not the huge win Google was aiming for.
OpenAI released GPT-4 back in March; only now is Google releasing Gemini, and it outperforms GPT-4 by just a few percentage points.
So, how long will it take for OpenAI to release an even bigger and better version? Judging by the last year, it probably won't be long.
For now, Gemini seems to be the better option, but that won't be clear until early 2024, when Ultra rolls out.