
By Lucas Mearian, Senior Reporter

Google throws down the gauntlet with Gemini — its multimodal genAI engine

News analysis | Feb 08, 2024 | 5 mins
Artificial Intelligence, Augmented Reality, Chatbots

Google has introduced the market's first native multimodal generative AI model capable of ingesting and providing content based on text, audio, images, and video.

[Image: Gemini graphic. Credit: Google]

Google on Thursday announced it has reconstructed and renamed its Bard chatbot — now called Gemini — to offer enterprises and consumers the industry’s first multimodal generative AI (genAI) platform that no longer relies only on text to provide human-like responses.

The release of Gemini represents a direct challenge to Microsoft’s Copilot, which is based on OpenAI’s ChatGPT, and every other chatbot based only on large language model (LLM) technology.

“It [Gemini] is currently the only native multimodal generative AI model available,” said Chirag Dekate, a vice president analyst at Gartner. “Google is no longer playing catch-up. Now, it is the other way around.”

Dekate called Gemini “a really big deal” because with a multimodal model, a single genAI engine performs individual tasks more accurately because it’s learning from a vastly larger body of knowledge. It essentially catapults Google to the head of the genAI pack.

Google first unveiled its Gemini AI model in December, touting multimodal capabilities that allow it to combine different types of information — inputs and outputs — including text, code, audio, images, and video.

Unlike LLM-only AI engines such as OpenAI’s GPT, Meta’s Llama 2, or even Google’s own PaLM 2 — all of which power today’s chatbots — Gemini doesn’t rely on that same technology. Instead, it can be trained using all types of media and content.

That matters because an enterprise can now create a chatbot that is no longer confined to answering queries from the text on which its LLM was trained.

“When I watch a movie, I am watching the video, I am reading the text (subtitles), I am listening to the audio, and it’s all happening simultaneously creating a hyper immersive experience,” Dekate said. “This is multimodality in a nutshell. Compare this to experiencing a movie by reading its script alone (LLM); this is the difference between LLM and multimodality.”

Last year, Dekate said, was a year of ideation as enterprises and consumers learned about genAI and chatbots in the wake of ChatGPT’s release in late 2022. Now, enterprises better understand the possibilities of genAI and are opening their wallets to spend significant sums infusing back-end and front-end systems with it.

If you’re a healthcare company, for example, trying to design a more immersive chatbot for physicians, a multimodal genAI engine can ingest a physician’s audio snippets, radiological images, and MRI video scans to create vastly more accurate prognoses and treatment outcomes.

“This creates a hyper-immersive, personal experience. None of this is possible using a simple LLM experience,” Dekate said. “If Google can enable enterprises and consumers to experience this multimodal experience, then Google has the chance to change the market share.”

In 2024, spending on genAI solutions is expected to reach $40 billion, up from $19.4 billion in 2023. By 2027, genAI spending is expected to hit $143 billion, with a five-year compound annual growth rate of 73.3%, according to research firm IDC.

“What we saw last year was the emergence of task-specific models — text-to-text, text-to-image, text-to-video, image-to-text, etc.,” Dekate said. “Each task had its own model. So, if you have a narrow task of text-to-text, then LLMs perform well.”

Google’s $20-per-month Gemini subscription model also appears aimed at taking market share from leader Microsoft.

US customers can subscribe for $19.99 a month to access Gemini Advanced, which includes the more powerful Ultra 1.0 AI model. Subscribers receive two terabytes of cloud storage that typically costs $9.99 a month, and will soon gain access to Gemini in Gmail and Google’s productivity suite.

Google’s new One AI Premium plan is its answer to Microsoft and its genAI partner OpenAI, which developed the GPT LLM that powers ChatGPT.

“Part of it is competing with Microsoft, and part of it is to offer premium services to its premium [customers], mostly business office users who are already paying,” said Jack Gold, principal analyst at J.Gold Associates. “Also, if you charge a fee, you limit the number of users that would have signed on for free. That gives you the opportunity to fix any problems seen by a more limited number of users, and provides a revenue stream to keep up the engineering, rather than relying on ads to pay for it.”

There is also the issue of cost for Google, because it’s not cheap to train a large AI model in data centers.

“Not sure how they get paid with running all that AI in the background, which takes a lot more processing power, and power is one of the biggest expenses of running a cloud/data center,” Gold said.

“What’s amazing about Gemini is that it’s so good at so many things,” said Google DeepMind CEO Demis Hassabis. “As we started getting to the end of the training, we started to see that Gemini was better than any other model out there on these very important benchmarks. For example, in each of the 50 different subject areas that we tested it on, it’s as good as the best expert human in those areas.”

OpenAI’s ChatGPT Plus a year ago pioneered the market for buying early access to AI models and other features, while Microsoft recently announced a competing subscription for AI in programs such as Word and Excel. Both subscriptions cost $20 a month in the United States.