Future of Generative AI [David Foster] - Machine Learning Street Talk (MLST)

## Metadata
- Author: **Machine Learning Street Talk (MLST)**
- Full Title: Future of Generative AI [David Foster]
- Category: #podcasts
- URL: https://share.snipd.com/episode/3a72bb43-7d39-486f-b805-4bf0c1074cbd
## Highlights
- David Foster on Generative Deep Learning and the Second Edition of his Book
Key takeaways:
- David Foster is a well-known author of the book 'Generative Deep Learning'
- The book has a second edition and covers topics such as generative modeling and deep learning methods like variational autoencoders, GANs, and autoregressive models
- The book also explores advanced GANs, music generation, world models, and multimodal models, among other applications
Transcript:
Speaker 2
Cool. So I'm here with David Foster. Welcome to MLST. Well, thanks very much for having me on. Just to be amazing. Well, David is very, very well known because he wrote this book, Generative Deep Learning. And I myself read this book in about 2018 or 19 when it first came out. And there's now a second edition. And we're going to be running a competition actually. So I think what we're going to do, we've got three ebooks to give away and the three best comments on the YouTube, right? Unfortunately, we don't have podcast comments. The three best high-effort comments, we will send you an ebook. So there you go. You've got two weeks. Anyway, so in the second edition of the book, I'm just looking at the table of contents. So there's a chapter on generative modeling, deep learning, some of the methods, so variational autoencoders, GANs, autoregressive models, normalizing flow models, energy-based models, and diffusion models. And then you speak about some of the applications. So there's transformers, advanced GANs, music generation, and world models, and multimodal models. So why don't you just give us a little bit of a story about the first edition of the book? What was it like to write it? And what's coming now in this second edition?
Speaker 1
Yeah, well, thanks again for having me on. Pleasure to be here. This whole journey of writing this book, I feel like I've completely overlapped with how generative modeling has evolved over the last five years, from a very niche field back in 2018, when I first started writing, through to now probably the hottest topic in tech, I would say. And I think the process of writing the book has been one where I've had to keep up with the latest trends and technologies. ([Time 0:00:00](https://share.snipd.com/snip/173ae838-57b2-440a-8e05-f99441455d9d))
- Autoregressive Models and the Schematic Nature of Language and Art
Key takeaways:
• Autoregressive models feed previous iterations back in as a kind of memory, which teaches us that language and art are schematic.
• These models have surprised us by showing how predictable language and art are.
• There is a discussion between connectionist models and traditional cognitive models on the von Neumann architecture.
• Vector databases are being considered as the memory store.
• The question arises whether there is a difference between mimicry and the real thing if they are indistinguishable.
Transcript:
Speaker 2
Yeah, I think it was a surprise to everyone. And these autoregressive models, because there's always a discussion between connectionist models and traditional cognitive models, or even like the von Neumann architecture, because they have a separate addressable memory. But now with these autoregressive models, it's almost like the previous iterations become a memory because they're fed back in. But they are still truncated models, and surprisingly, they've kind of taught us that language and even art is schematic. And it's very, very predictable. And then there's the philosophy of, well, you know, if mimicry is sufficiently indistinguishable from the real thing, is there an actual difference?
Speaker 1
Yeah, exactly. I think we're looking at things like vector databases now as being that memory store. So yeah, people are now thinking of the GPT ([Time 0:09:48](https://share.snipd.com/snip/97d96d75-c778-4cda-958c-a6e340521f32))
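A minimal sketch of the feedback loop described here, assuming a toy bigram table in place of a real model: each sampled token is appended to the context and fed back in, and the context is truncated to a fixed window, so the only "memory" is the recent sequence itself.

```python
import random

# Toy illustration of the autoregressive feedback loop discussed above: each
# sampled token is appended to the context and fed back in, and the context is
# truncated to a fixed window. The bigram table stands in for a real model.
BIGRAMS = {
    "language": {"is": 0.7, "models": 0.3},
    "is": {"schematic": 0.6, "predictable": 0.4},
    "schematic": {"and": 1.0},
    "and": {"predictable": 1.0},
    "predictable": {"language": 1.0},
}
CONTEXT_WINDOW = 4  # truncated context, like a fixed-size attention window


def next_token_distribution(context):
    """Return a next-token distribution given only the truncated context."""
    last = context[-1]
    return BIGRAMS.get(last, {"language": 1.0})


def generate(prompt, steps=8):
    context = list(prompt)
    for _ in range(steps):
        dist = next_token_distribution(context[-CONTEXT_WINDOW:])
        tokens, probs = zip(*dist.items())
        context.append(random.choices(tokens, weights=probs)[0])
    return " ".join(context)


print(generate(["language", "is"]))
```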
- The Challenge of Information Compression in Creating Intelligent Models
Key takeaways:
- Vector databases are being seen as the memory store for autoregressive models and transformers.
- Compression of ideas and information is a challenge for creating intelligence.
- Autoregressive models and transformers also face the challenge of deciding how much previous memory to store as compressible vectors for reference at inference time.
Transcript:
Speaker 1
Yeah, exactly. I think we're looking at things like vector databases now as being that memory store. So yeah, people are now thinking of the GPT model as almost like the action taker, or how do we take information from this vector database and push it into something new into the future? But also, you know, compression of ideas and compression of information has always been a large problem of creating intelligence, because, you know, it's often the case that, well, we are bombarded with information, and we have to decide as humans, how much of this do we want to store? How much do we want to throw away? What do we want to compress it down into? Do we need to compress it so much that it doesn't really matter if we can't decompress it into perfect information? Is that good enough to survive? And so we're coming up with exactly the same problem now with autoregressive models and transformers. You know, how much do we want to take previous memory of what's been said and store this as part of a compressible vector that you can then just reference at inference time? And obviously too much compression, and you destroy the very nature of what you're trying to remember. And then not enough, and you've bloated your memory full of things that you don't need to remember at all. So I think what's going to be interesting is this blend between the creative and almost action-taking GPT-style decoder model and static memory stores. We're starting to see companies like Pinecone with vector databases really, yeah, start to take the forefront of this and say, look, this is how you need to, as a company, start working with GPT models, because they are, you know, they are limited in the nature of their context window. So, you know, as a company, it's all very well to sort of just prompt a GPT and get a response. But ultimately, there needs to be some sort of store of your company's information so that you can actually pull, you know, live information from a database. And the interplay between this and the large language model is going to be super interesting, I think, going forward. ([Time 0:10:29](https://share.snipd.com/snip/23f5086c-1dac-48c7-9380-84307cc4c28a))
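A minimal sketch of the retrieval pattern described above, with brute-force cosine similarity in numpy standing in for a vector database such as Pinecone; the `embed` function is a hypothetical placeholder, and a real system would call an embedding model and a hosted index.

```python
import numpy as np

# Sketch of the retrieval pattern described above: compress documents into
# vectors, store them, and at inference time pull the most relevant ones back
# into the prompt of a GPT-style model. Brute-force cosine similarity stands
# in for a managed vector database, and `embed` is a hypothetical placeholder
# for a real embedding model.
def embed(text, dim=64):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # toy embedding, stable within a run
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)


documents = [
    "Q3 revenue grew 12% year on year.",
    "The onboarding guide for new engineers.",
    "Refund policy: customers have 30 days to return items.",
]
index = np.stack([embed(d) for d in documents])  # the "memory store"


def retrieve(query, k=2):
    scores = index @ embed(query)            # cosine similarity (vectors are unit norm)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]


query = "What is our returns policy?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to the language model
```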
- The Future of AI: From Probabilistic Models to Symbolic Expression
Key takeaways:
- The use of transformer and decoder style models may not be cutting edge in the future
- There may be a swing towards symbolic expression of intelligence
- Gary Marcus and others promote the need for something more than probabilistic models
- Probabilistic models assign nonzero probability even to highly unlikely events, whereas humans can effectively rule such events out
- Humans rely on a store of knowledge to avoid mistakes
Transcript:
Speaker 1
Yeah, exactly. And I think, you know, when we look back, maybe five, 10 years down the road, I wonder how much of what we consider to be cutting edge will still be transformer and decoder style models and how much we'll have swung towards the symbolic, you know, expression of intelligence that's, you know, I guess promoted by Gary Marcus and so on, where they're saying, look, we need something else here. This isn't enough. We can't just probabilistically, you know, stochastically jump our way to intelligence. We have to in some way have a store that we seem to have as humans, where we can't make the mistake of, if someone says, give me 100 cities or towns in the UK beginning with the letter S, we can't make a mistake on three of them because it's obvious to us. And so it's not something that we would even make with a tiny percentage probability. And the challenge, I guess, with probabilistic models is that they will always assign some nonzero probability, even to the least likely events. It seems that we, by contrast, are able to zero probabilities. Now, what mechanism we use to do that, we don't really know. You know, we've obviously built up neural networks by and large as a model of how things work in the brain. And I know many people would say, you know, they're very different. And that's true. And how the brain works and how neural networks work are fundamentally different, but it's undeniable there are parallels. And so, you know, as neural networks work out of the box, there is no way to easily zero a probability without some sort of forcing of that information onto the model. They will backpropagate error, but not to the point where you're going to ever, you know, push a zero probability out to a particular output node. So, you know, how do we build this in? There needs to be new ideas, I think, that come to generative modeling to actually push hard factual information into the model without it being purely probabilistic. ([Time 0:13:12](https://share.snipd.com/snip/02a09cae-6663-4eb5-bcec-f0c291e07120))
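The nonzero-probability point can be made concrete: a softmax over logits never outputs an exact zero, and the usual workaround is to force zeros from outside the model by masking disallowed logits to minus infinity before sampling (constrained decoding). A toy numpy illustration with a made-up vocabulary:

```python
import numpy as np

# A softmax never assigns exactly zero probability: even a very negative logit
# yields a small positive probability. Forcing a zero has to be imposed from
# outside the model, e.g. by masking disallowed tokens' logits to -inf before
# sampling (constrained decoding). Toy vocabulary and logits for illustration.
vocab = ["Sheffield", "Swansea", "Paris", "Southampton"]
logits = np.array([2.0, 1.5, -8.0, 1.0])  # "Paris" is unlikely, but not impossible


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


print(dict(zip(vocab, softmax(logits).round(6))))       # Paris still gets a tiny nonzero probability

# Hard constraint: towns beginning with "S" only.
mask = np.array([w.startswith("S") for w in vocab])
constrained = np.where(mask, logits, -np.inf)
print(dict(zip(vocab, softmax(constrained).round(6))))  # Paris is now exactly 0.0
```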
- Examining the Relationship between Language and Intelligence
Key takeaways:
• The traditional view is that language is an offshoot of intelligence, but what if understanding language gives us secrets to intelligence?
• Recent progress in language generation has shone a spotlight on the question of what intelligence really is.
• There is a question about which way around the causality of understanding language and intelligence should be approached.
• It is interesting to explore how much of reality can be represented by a stream of tokens, despite actual reality being three-dimensional.
Transcript:
Speaker 1
I think that's a really interesting idea, is that we've always sort of seen language and linguistics as an offshoot of intelligence, and that we need to first of all understand intelligence, and then we'll understand how language works and why babies learn language the way they do. But what if we need to understand language first? I think what we're seeing now is that it gives us secrets to intelligence that were previously hidden. We're almost doing this the other way around. We still haven't understood intelligence, but we're suddenly really making strides in language generation. And that has suddenly shone this spotlight onto what intelligence is that previously wasn't there. So that's the first thing, I think, which way round is the causality. But then, yeah, also, the point I made just before around how much of reality can be actually truly represented by a stream of tokens, even though we live in a three-dimensional spatial world with one time dimension, that we can compress this into effectively a one-dimensional line of tokens. And that there is enough captured by that stream that we can decompress it, using our own brains, back into what reality looks like to us, we can read things and it really does give the impression of true reality. So I think what's going to be interesting in future, you mentioned world models. So at the moment, we've got a very, very strong world model called GPT-4, and that world model is based on a world where only tokens exist. There is no notion of objects, there's no notion of time, there is no notion of movement, action, observation, it's just a stream of integers, effectively, that's it. ([Time 0:17:11](https://share.snipd.com/snip/aaf36043-20ee-48f7-bc6c-96fabf265cc9))
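The "stream of integers" point can be made concrete with a toy word-level tokenizer: whatever scene the text describes, the model only ever sees a one-dimensional sequence of token IDs. Real systems use subword tokenizers, but the idea is the same.

```python
# Toy word-level tokenizer: however rich the scene being described, the model
# only ever receives a one-dimensional stream of integers. Real systems use
# subword tokenizers, but the point is the same.
text = "the red ball rolls off the table and falls to the floor"
words = text.split()
vocab = {w: i for i, w in enumerate(sorted(set(words)))}
token_stream = [vocab[w] for w in words]

print(token_stream)                                 # just a 1-D list of integers
inverse = {i: w for w, i in vocab.items()}
print(" ".join(inverse[t] for t in token_stream))   # our brains "decompress" it back
```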
- "Language and Reality": Exploring the Relationship between Language Models and Action Taking
Key takeaways:
• The discussion centers around whether action taking should be treated separately from language models or as just another token.
• The relationship between language and reality may be even tighter than previously thought.
• Language seems to be a necessity for giving the impression of true intelligence.
• The importance of language may be rooted in something very primitive and deep.
• The speaker is a fan of LeCun and his JEPA architecture.
Transcript:
Speaker 1
At the moment, GPT doesn't have anything to say about action taking beyond, well, treat it like another token, and you can play a game within GPT because it just treats everything as tokens. But do we need to treat action taking separately? Do we need to really think about this as a different mechanism through which we embed information and receive information from large language models? So yeah, I think you're right though that the relationship between language and reality is much tighter than we thought. It doesn't seem to just be a happy offshoot of intelligence that we, it's wonderful that we can all talk, it seems to be a necessity, it seems to be something that's really important to give the impression of true intelligence. So who knows where that's going to lead, but so much to think about there.
Speaker 2
I know, it seems to be so important, it might be almost platonic or something very primitive and very, very deep about it. I wanted to touch on a couple of things you said. I mean, first of all, I'm a big fan of LeCun and his JEPA architecture, you know, we've done lots of shows on all of his stuff. He's always had this idea of using self-supervised learning and deliberately creating a rich representation space, because he said as soon as you supervise, there's a wonderful image in his DINO paper where he kind of like showed the richness of the representation space becoming truncated. But unfortunately, as soon as you fine-tune a model even to perform actions, you suffer this similar truncation of representation space. ([Time 0:19:50](https://share.snipd.com/snip/dd3c584e-dec6-40a4-96be-b371889345fa))
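One way to picture "treat action taking as just another token" is a decision-transformer-style stream in which observations and actions share one vocabulary and are interleaved into a single sequence; the grid-world tokens and policy table below are hypothetical stand-ins for a trained model.

```python
# Sketch of "actions as just another token": observations and actions share one
# vocabulary and are interleaved into a single stream, so the same next-token
# machinery that completes text also "decides" what to do. The grid-world
# tokens and the policy table are hypothetical illustrations.
OBS_TOKENS = ["<wall_ahead>", "<open_ahead>", "<goal_visible>"]
ACT_TOKENS = ["<turn_left>", "<move_forward>", "<stop>"]
VOCAB = {tok: i for i, tok in enumerate(OBS_TOKENS + ACT_TOKENS)}

# Stand-in for a trained model's next-token preference after each observation.
POLICY = {
    "<wall_ahead>": "<turn_left>",
    "<open_ahead>": "<move_forward>",
    "<goal_visible>": "<stop>",
}

episode = ["<open_ahead>"]            # the stream starts with an observation
while episode[-1] != "<stop>":
    action = POLICY[episode[-1]]      # "predict" the next token, which is an action
    episode.append(action)
    if action == "<move_forward>":
        episode.append("<goal_visible>")  # the environment replies with a new observation token
    elif action == "<turn_left>":
        episode.append("<open_ahead>")

print(episode)
print([VOCAB[t] for t in episode])    # to the model it is still just a stream of integers
```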
- The Objective Function of Minimizing Free Energy and Building Accurate Generative World Models in AI
Key takeaways:
• Our observations and actions in the world serve the same objective function: minimizing free energy and improving the accuracy of our generative world model.
• No single theory of mind is currently correct and all have aspects that AI practitioners should take notice of.
• Not to become obsessed with one particular AI technology such as GPT and treat it as the answer to everything.
• Friston's idea of a cybernetic loop involves an agent taking actions and receiving percepts, which relates to the same objective function of minimizing free energy.
Transcript:
Speaker 1
And that actually perhaps the way in which we observe and act in the world is all working towards the same objective function, which is to minimize free energy and to try to make sure our generative world model is as accurate as possible, so that we can contemplate in our own minds how the future might look and therefore work towards goals. I think there's a ton of things to unpack within these ideas. You know, no one theory of mind at the moment has it all correct, and I think they've probably all got things that we need to, as generative AI practitioners, take notice of, and that we shouldn't just get obsessed by something like GPT and sort of treat it as the answer to everything.
Speaker 2
Interesting. Yeah. So I had a great chat with Friston recently, and as a refresher for folks, he's got this idea of a cybernetic loop, and a cybernetic loop is basically what you just said. It's just, I have an agent and I take actions and I get percepts in, and he really leans on this idea of there being a didactic exchange. His words, not my usual flowery language. But this idea of having this collective intelligence of agents that are demarcated by boundaries, and they have this cyclic causal dependency basically, and you get this emergent self-organization and interesting behavior. ([Time 0:35:12](https://share.snipd.com/snip/52173890-3a56-4112-bfb1-4ce6b60ea154))
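A loose toy sketch of the cybernetic loop being described, with made-up numbers: the agent holds a belief over a hidden state, updates it from each percept, and then acts so that the outcome it expects is as unsurprising as possible under its own generative model. This is a simplified illustration in the spirit of active inference, not Friston's full formalism.

```python
import numpy as np

# Toy perception-action loop: perception is a Bayesian belief update from each
# percept, and action picks the option whose predicted outcome is least
# surprising under the agent's own model. Numbers are made up.
states = ["food_left", "food_right"]
p_cue_given_state = np.array([[0.8, 0.2],   # row 0: p(cue="left"  | state)
                              [0.2, 0.8]])  # row 1: p(cue="right" | state)
belief = np.array([0.5, 0.5])               # prior over the hidden state


def perceive(belief, cue):
    """Bayesian update of the belief after a (possibly noisy) cue."""
    posterior = p_cue_given_state[cue] * belief
    return posterior / posterior.sum()


def choose_action(belief):
    """Go to the side whose predicted outcome is least surprising, i.e. smallest -log q(s)."""
    surprise = -np.log(belief)
    return states[int(np.argmin(surprise))]


for cue in [0, 0, 1, 0]:                    # stream of percepts (0 = "left" cue, 1 = "right" cue)
    belief = perceive(belief, cue)
    print(f"cue={cue}, belief={belief.round(3)}, action -> {choose_action(belief)}")
```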
- The Intersection of Generative AI and Active Inference
Key takeaways:
• There is potential for synergy between generative modeling and active inference.
• The speaker wonders what would happen if comparable money and time were poured into the concept of active inference.
• The intersection of generative modeling and active inference could lead to interesting exploration.
• Friston's work resolves what is normally a dichotomy between exploration and exploitation.
Transcript:
Speaker 1
It's getting something right about its future. It's trying to be one step ahead of itself all the time. So I do hope that there's a collision of ideas from, you know, the generative AI that we're seeing today from OpenAI and other large companies, and the more niche area of active inference and Friston's work. Because, you know, I do wonder, if the same amount of money and time was poured into the concept of active inference, which I think has always been fascinating and an elegant way to describe intelligence, whether there is a lot of benefit in those two fields really understanding each other in great depth. And I wonder how many people in generative modeling on the large language model side really understand active inference, and also vice versa. I think there's some great Venn diagram space there; the intersection of those two is going to be a really interesting area to explore going forward.
Speaker 2
Yeah, I mean, since we're on the subject of Friston. So, yeah, the main thing that he solved is this kind of, what is normally a dichotomy between exploration and exploitation. He's got this beautiful formalism and he calls it energy and entropy. I think the entropy is basically like a kind of KL divergence, which is where you're comparing probability distributions. Exactly. Yeah. ([Time 0:40:01](https://share.snipd.com/snip/59174748-8566-4ebc-a3aa-f0b01b3ff33e))
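To make the energy-and-entropy phrasing concrete: for a discrete belief q(s) and generative model p(o, s), variational free energy can be written as energy minus entropy, and it equals the KL divergence from q to the true posterior minus the log evidence. A toy numpy check with made-up numbers:

```python
import numpy as np

# Toy check of the "energy and entropy" phrasing: for a discrete belief q(s)
# and a generative model p(o, s), variational free energy can be written as
#   F = E_q[-log p(o, s)] - H[q]        (energy minus entropy)
# and equals KL(q || p(s | o)) - log p(o), so minimising F both fits the data
# and keeps the belief close to the true posterior. All numbers are made up.
q = np.array([0.7, 0.3])                      # approximate posterior belief over 2 states
p_joint = np.array([0.4, 0.1])                # p(o, s) for the observed o and each state

energy = np.sum(q * -np.log(p_joint))         # E_q[-log p(o, s)]
entropy = -np.sum(q * np.log(q))              # H[q]
free_energy = energy - entropy

p_o = p_joint.sum()                           # model evidence p(o)
posterior = p_joint / p_o                     # true posterior p(s | o)
kl = np.sum(q * np.log(q / posterior))        # KL(q || p(s | o))

print(f"F = {free_energy:.4f}")
print(f"KL - log p(o) = {kl - np.log(p_o):.4f}")   # matches F
```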
- The Potential Strata of AI Models and Humans Sharing Information
Key takeaways:
• The idea of an undercurrent of AI models speaking to each other and exchanging information and data is interesting.
• Companies may start building different models with different specialties and there will be some interface between them.
• Getting AI models to want to share information is a challenge at the point of inference; the wanting has to come from some sort of internal function.
Transcript:
Speaker 1
I mean, I think that's an interesting area as well, isn't it? What we might sort of see proliferate is the idea that we might have, on top of the strata of human existence, an undercurrent of AI models, all also speaking to each other and exchanging information, and engaging with each other and delivering information between each other, and also between strata, so to us and from us. And I wonder if that's one of the areas that we might start to see developing, is that we don't just have one company trying to build one model to understand everything, but there are different models that have different specialties, and that there is some sort of interface between them, probably natural language, through which they can share and want, and this is where the wanting comes in, they need to want to share information. Now, how do you embed that? You know, because this is at the point of inference, remember. We can't train them further, you know; once they're out in the wild, this want has to come from some sort of internal function. And so, why should they want to care to talk to this GPT model rather than this GPT model over here? So, yeah, I'm just thinking out loud now, sort of how that might happen. ([Time 0:50:43](https://share.snipd.com/snip/c0488571-8a26-4030-b1f1-d3aa8ef26a55))
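A minimal sketch of the "different models with different specialties, with some interface between them" idea: a router scores an incoming request against each specialist's description and forwards it, with natural language as the shared interface. The specialists and the keyword-overlap scoring are hypothetical; a real router would use embeddings or another model.

```python
# Sketch of the "strata of specialist models" idea: several models with
# different specialties sit behind a router, and natural language is the
# interface between them. The specialists, their descriptions, and the
# keyword-overlap scoring are hypothetical stand-ins.
SPECIALISTS = {
    "code_model":  {"description": "writes and reviews python code and bug fixes",
                    "respond": lambda q: f"[code_model] here is a patch for: {q}"},
    "legal_model": {"description": "contracts policy compliance legal questions",
                    "respond": lambda q: f"[legal_model] legal view on: {q}"},
    "music_model": {"description": "melody harmony chord progression music generation",
                    "respond": lambda q: f"[music_model] a chord progression for: {q}"},
}


def route(query):
    """Pick the specialist whose description overlaps most with the query."""
    words = set(query.lower().split())
    scores = {name: len(words & set(info["description"].split()))
              for name, info in SPECIALISTS.items()}
    best = max(scores, key=scores.get)
    return SPECIALISTS[best]["respond"](query)


print(route("please review this python code for bug fixes"))
print(route("draft a chord progression in a minor key"))
```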
- The Challenges of Dynamically Discovering and Consuming API with SingularityNET
Key takeaways:
• Ben Goertzel has been advocating for years for the creation of SingularityNET, which enables the use of multiple APIs.
• The problem with using multiple APIs includes relevance, interface, semantics and reputation.
• Hybrid architectures bring a risk of brittleness that can limit the effectiveness of decentralized intelligence.
Transcript:
Speaker 2
I don't know, fascinating though, isn't it? Like, well, why don't we do that? Well, that's exactly what Ben Goertzel has been advocating for years with his SingularityNET, and we interviewed him at the time, and the problem is not only relevance; the interface is a problem because you have the semantics problem. It's very, very brittle, it's very, very difficult to have a specification of an API that can be dynamically discovered and consumed by others, but then there's also some kind of a reputation system. I might want to consume this API over another API. And in principle, I think it's a really good idea, because I love this idea of collective intelligence and decentralized intelligence. I don't want to have one single monolithic model. But by the same token, when you start talking about hybrid architectures, it introduces a form of brittleness. ([Time 0:51:56](https://share.snipd.com/snip/5aadc20d-96b0-4f1a-b441-74a42154190a))
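A small sketch of the discovery-plus-reputation problem being raised, with made-up services and scores: each service advertises a capability and carries a reputation, the caller picks the matching service with the best reputation, and the score is updated after each call. This deliberately glosses over the harder semantics problem discussed above.

```python
# Small sketch of the discovery-plus-reputation idea raised above: services
# advertise a capability tag and carry a reputation score, callers pick the
# matching service with the best score, and the score is updated after each
# call. All service names and numbers are made up.
registry = [
    {"name": "summarise-v1", "capability": "summarisation", "reputation": 0.62},
    {"name": "summarise-v2", "capability": "summarisation", "reputation": 0.81},
    {"name": "translate-v1", "capability": "translation",   "reputation": 0.90},
]


def discover(capability):
    """Return matching services, best reputation first."""
    matches = [s for s in registry if s["capability"] == capability]
    return sorted(matches, key=lambda s: s["reputation"], reverse=True)


def record_outcome(service, success, lr=0.1):
    """Nudge reputation toward 1.0 on success and toward 0.0 on failure."""
    target = 1.0 if success else 0.0
    service["reputation"] += lr * (target - service["reputation"])


chosen = discover("summarisation")[0]
print("calling", chosen["name"])
record_outcome(chosen, success=False)      # a bad response lowers its standing
print("updated reputation:", round(chosen["reputation"], 3))
```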