Gemini Has Liftoff

Real Vision
December 7, 2023

Hi Visionaries,

It’s David here, co-creator of the Exponentialist.

This note marks the start of something new, and I’m excited to share it with you.

From now on, I’ll write to you each week with a quick slice of technology news and analysis. I’ll cover a key story or happening from the last seven days, and put down some first thoughts to help us all make sense of it.

Think of these notes as postcards from the Exponential Age: the fastest and most consequential transition in human history.

They also form a kind of scrapbook of my ongoing thinking and research. I draw on all this and more for my deep work and longform essays in the Exponentialist, where Raoul and I are building models and frameworks that can empower us all to navigate the future, both as investors and human beings.

Enough preamble; here’s this week’s note. Enjoy!

Gemini has Liftoff

This week, major news out of Google’s DeepMind AI division.

The DeepMind team announced Gemini, a multi-modal LLM that looks to have pushed back the frontiers when it comes to these kinds of AI models.

Gemini can speak in real time. It understands text and image inputs, and can combine them in novel ways. In one example, it gives ideas for toys to make out of blue and pink wool.

It can write code to a competition standard. In tests it outperformed 85% of the human competitors it was compared against; that means it’s excellent even when compared to some of the best human coders on the planet.

Gemini can even perform sophisticated verbal and spatial reasoning, and handle advanced mathematics. Imagine if you’d had this to help with your homework.

This is significant; OpenAI’s GPT-4 is notoriously bad at maths and logic puzzles.

And Google are, of course, taking direct aim at OpenAI with this launch. Gemini comes in three variants: Ultra, Pro, and Nano. Users can access the Pro version now via Bard, and the Ultra model will soon be made available to enterprise clients.

⚡⚡ The Exponentialist Take:

I first wrote about Gemini back in May, when I highlighted Google’s plans to launch a multi-modal GPT-4 killer. Well, here it is.

It will take time for users to independently verify the claims DeepMind are making, but there’s no denying Gemini looks impressive.

Scratch the surface, though, and we can discern important underlying signals about the future development of large language models, and about the AI revolution we’re all experiencing.

This AI outperforms GPT-3.5 when it comes to linguistic tasks such as brainstorming and copy drafting. But it’s the multi-modal nature of Gemini that’s really significant; in particular, its ability to reason has the potential to mark a huge breakthrough.

LLMs are trained to do next word prediction; that means they’re brilliant at sounding right but lack any underlying ability to know whether what they’re saying really is right, or even makes sense. DeepMind seem to have addressed this shortcoming effectively; Gemini can handle sophisticated symbolic logic tasks and more.
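
To make "next word prediction" concrete, here is a toy sketch in Python. It is an assumption-laden illustration, not how Gemini or GPT-4 actually work: real LLMs use large neural networks over tokens, whereas this simply counts which word most often follows another in a tiny made-up corpus. The function names and the corpus are hypothetical, invented for this example.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following

def predict_next(model, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Made-up training text: the false claim appears more often than the true one.
corpus = (
    "the moon is made of cheese . "
    "the moon is made of cheese . "
    "the moon is made of rock ."
)
model = train_bigram_model(corpus)
print(predict_next(model, "of"))  # picks the most common follower, not the true one
```

The toy model completes "the moon is made of" with whatever it saw most often. It has no way to check whether that is right, which is the shortcoming described above in miniature.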

The emergence of an LLM that can act as a true reasoning partner is a real advance. It will unlock new use cases across the sciences, maths, and multiple forms of knowledge work. And it should haunt the dreams of all at OpenAI, including CEO Sam Altman.

During Altman’s brief time away from OpenAI, we learned about the company’s reported work on the still-mysterious Q* algorithm. Most believe Q* is about reasoning capabilities, too; for more on all this see this week’s AI Firehose discussion with me, Ash Bennington, and Mikhail Voloshin, which was recorded a couple of days before Gemini launched.

All this points in one direction: we’re hitting the limits of the performance improvements that can be achieved simply by training LLMs on ever larger data sets. And those at DeepMind and OpenAI, of course, know it.

If that’s the case, then the near future belongs to AI tools that bring multiple models — trained on language, code, images, and more — together, and weave in new reasoning abilities. In other words, massively multi-modal is where it’s at. Gemini seems an early signal of this. And we could see really exciting advances in this space.

Finally, a word for Alphabet CEO Sundar Pichai: kudos.

Alphabet’s AI engineers invented the transformer model that underpins this generative AI moment. Then the company went missing, while OpenAI became the poster child.

Gemini puts Alphabet firmly back in the race. And given the recent fiasco at OpenAI, Pichai this week looks increasingly like a man who has played a canny long game.

It’s ever-more clear: we’re amid a revolution when it comes to our relationship with knowledge. We’re only at the beginnings of figuring out what human and machine intelligence will do together. Don’t blink; it’s going to be a fascinating 2024.

Thanks for reading…!

I’ll keep watching this and multiple other technology revolutions, and working to make sense of them for all of us as investors, and humans.

And I’ll be back next week with another postcard from the Exponential Age. In the meantime, if you’d like to join us in the Exponentialist, you can become a charter member right here.
