The REAL Potential of Generative AI - Y Combinator

## Metadata
- Author: **Y Combinator**
- Full Title: The REAL Potential of Generative AI
- Category: #podcasts
- URL: https://share.snipd.com/episode/71b0c4eb-0f6c-42d9-bfed-37e1dad7a4e9
## Highlights
- Pre-trained Models: A New Frontier in Language Understanding
Key takeaways:
• Language models are an old technology: statistical models that predict the next word in a text given the previous words.
• Today's scaled-up language models can finish sentences and solve maths problems, but doing so requires them to have picked up world knowledge and some form of reasoning from their training data.
Transcript:
Speaker 1
Yeah, so language models themselves are a really old concept and old technology. And really all it is is a statistical model of words in the English language. You take a big bunch of text and you try to predict what word will come next, given a few previous words. So for "the cat sat on the...", "mat" is the most likely word, and then you have a distribution over all the other words in your vocabulary. As you scale the language models, both in terms of the number of parameters they have, but also in the size of the data set that they're trained on, it turns out that they continue to get better and better at this prediction task. Eventually you have to start doing things like having world knowledge. You know, early on the language model is learning letter frequencies and word frequencies. And that's fairly straightforward. And that's kind of what we're used to from predictive text in our phones. But if the language model is going to be able to finish the sentence "today the president of the United States is X", it has to have learned who the president of the United States is. If it's going to finish a sentence that's a maths problem, it has to be able to solve that maths problem. And so where we are today is that, you know, I think starting from GPT-1 and 2, but then GPT-3 was really the one where I think everyone said, okay, something is very, very different here. We now have these models of language that are just models of the words, right? They don't know anything about the outside world. There's loads of debates about whether they actually understand language, but they are able to do this task extremely well. And the only way to do that is to have gotten better at, you know, some form of reasoning and some form of knowledge.
Speaker 2
Oh, what are some of the challenges of using a pre-trained model like ChatGPT?
Speaker 1
So one of the big ones is that they have a tendency to confidently bullshit or hallucinate stuff. I think that Friedman described it as alternating between spooky and kooky. ([Time 0:01:41](https://share.snipd.com/snip/9c610e3f-a2af-4f18-a9f7-0aab52617fa6))
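To make the next-word prediction idea concrete, here is a minimal sketch of a bigram model in Python: it counts which word follows which in a tiny corpus and turns those counts into a probability distribution over the vocabulary. This is only an illustration of the statistical idea described above; GPT-style models do the same prediction task with neural networks, a much larger vocabulary, a much longer context, and vastly more data.
```python
from collections import Counter, defaultdict

# A minimal bigram language model: count which word follows which,
# then turn the counts into a probability distribution over the vocabulary.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_distribution(prev_word):
    """Return P(next word | prev_word) estimated from the tiny corpus."""
    counts = following[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# In this toy corpus "the" is followed by "cat", "mat", "dog", and "rug"
# with equal probability; a large neural language model performs the same
# kind of prediction, just with far more context and knowledge baked in.
print(next_word_distribution("the"))
```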
- How to Build a Large Language Model Product with GPT-3
Key takeaways:
• Developers building a large language model product for the first time need help with prototyping, evaluation, and customization.
• These tasks can be difficult and require a lot of iteration.
• The products tend to be more subjective than earlier machine learning products, so evaluation is harder.
• Fine-tuning and experiment frameworks make it easier to customize a base model like GPT-3 to the user's needs.
Transcript:
Speaker 2
Got it. If a developer is trying to build an app using a large language model and is doing it for the first time, what problems are they likely to encounter and how do you guys help them address some of those problems?
Speaker 1
Yeah. So we typically help developers with kind of three key problems. One is prototyping, then evaluation, and finally customization. And maybe I can sort of talk about each of those. So at the early stages of developing a new large language model product, you have to try and get a good prompt that works well for your use case, and that tends to be highly iterative. You have hundreds of different versions of these things lying around, so managing the complexity of that versioning and experimenting, that's something we help with. Then the use cases that people are building now tend to be a lot more subjective than you might have had with machine learning before. And so evaluation is a lot harder. You can't just calculate accuracy on a test set. And so helping developers understand "how well is my app working with my end customers?" is the next thing that we really make easy. And finally, customization. Everyone has access to the same base models. Everyone can use GPT-3. But if you want to build something differentiated, you need to find a way to customize the model to your use case, to your end users, to your context. And we make that much easier, both through fine-tuning and also through a framework for running experiments. We can help you get a product to market faster. But most importantly, once you're there, we can help you make something that your users prefer over the base models. ([Time 0:07:37](https://share.snipd.com/snip/4e7bbc4c-a4f9-4a64-a4fb-59201ccd85e9))
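The prompt versioning and preference-based evaluation workflow described above can be sketched roughly as follows. This is a toy illustration, not the speaker's actual product: the prompt templates, the `record_feedback` helper, and the thumbs-up/thumbs-down ratings are all assumptions made for the example, and `generate()` stands in for whatever model API you call.
```python
from collections import Counter
from datetime import datetime, timezone

# Toy sketch: track which prompt version produced which output, collect
# end-user ratings, and compare versions by user preference rather than
# by accuracy on a labelled test set.

PROMPT_VERSIONS = {
    "v1": "Summarise the following support ticket in one sentence:\n{ticket}",
    "v2": "You are a support agent. Summarise this ticket for a colleague:\n{ticket}",
}

feedback_log = []

def record_feedback(prompt_version, ticket, output, user_rating):
    """Store the prompt version, input, model output, and the user's rating."""
    feedback_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        "input": ticket,
        "output": output,
        "rating": user_rating,  # e.g. "up" or "down" from the end user
    })

def preferred_version():
    """Compare prompt versions by the share of positive ratings each received."""
    positives, totals = Counter(), Counter()
    for entry in feedback_log:
        totals[entry["prompt_version"]] += 1
        if entry["rating"] == "up":
            positives[entry["prompt_version"]] += 1
    return {v: positives[v] / totals[v] for v in totals}
```
In practice the hard part is exactly what the speaker describes: keeping hundreds of prompt variants organised and getting enough real user feedback to tell them apart.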
- The Openness of OpenAI's GPT-3 Model
Key takeaways:
• The barriers to entry for training a model like GPT-3 are mostly capital and talent.
• The people needed are still very specialized and very smart, and you need lots of money to pay for GPUs.
• Beyond that, there isn't much secret sauce.
Transcript:
Speaker 1
So I don't think that's the dynamic that's at play here. To me, the barriers to entry of training one of these models are mostly capital and talent. The people needed are still very specialized and very smart, and you need lots of money to pay for GPUs. But beyond that, I don't see that much secret sauce. Like, OpenAI, for all the criticism they get, have actually been pretty open, and DeepMind have been pretty open. They've published a lot about how they've achieved what they've achieved. And so the main barrier to replicating something like GPT-3 is: can you get enough compute, and can you get smart people, and can you get the data? And more people are following on their heels. There's some question about whether or not the feedback data might give them a flywheel. I'm a little bit skeptical of that, that it would give them so much that no one could catch up.
Speaker 2
Why? That seems pretty compelling. If they have a two-year head start and thousands and thousands of apps get built, then the lead they have in terms of feedback data would seem to be pretty compelling.
Speaker 1
So I think the feedback data is great for narrower applications. If you're building an end-user application, then I think you can get a lot of differentiation through feedback and customization. But they're building this very general model that has to be good at everything. And so they can't just let it become bad at code whilst it gets good at something else, which others can do. ([Time 0:14:00](https://share.snipd.com/snip/b9e00e05-ef09-48af-863e-cc7d6c7b196e))
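One concrete way an end-user application can turn that feedback into differentiation is to recycle positively rated outputs as fine-tuning data. The sketch below is an assumption-laden illustration, not a recipe from the episode: it reuses the hypothetical feedback log from the earlier sketch, and the prompt/completion JSONL layout follows the GPT-3-era fine-tuning format, so check your provider's current documentation before relying on it.
```python
import json

def to_finetuning_examples(feedback_log):
    """Keep only outputs end users rated positively and format them for fine-tuning."""
    examples = []
    for entry in feedback_log:
        if entry.get("rating") == "up":
            examples.append({
                "prompt": entry["input"],
                # Leading space follows the old GPT-3 fine-tuning convention.
                "completion": " " + entry["output"],
            })
    return examples

def write_jsonl(examples, path="finetune.jsonl"):
    """Write one JSON object per line, the format expected by GPT-3-era fine-tuning."""
    with open(path, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")
```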