aliases: [Machine Learning Street Talk (MLST),#80 AIDAN GOMEZ CEO Cohere - Language as Software]
#80 AIDAN GOMEZ [CEO Cohere] - Language as Software - Machine Learning Street Talk (MLST)
- Author: **Machine Learning Street Talk (MLST)**
- Full Title: #80 AIDAN GOMEZ [CEO Cohere] - Language as Software
- Category: #podcasts
- URL: https://share.snipd.com/episode/b0c7cf98-8d5f-4dc0-a5b1-cb9131407093
- Transformers Aren't the Last Architecture
I would be extremely disappointed if this is as creative and high-performing as we can get. I think that transformers took off because of their scaling properties and also because of a network effect: the community consolidated around this one architecture and built up infrastructure to support its adoption.
I would be extremely disappointed if this is as creative and high-performing as we can get. I think that transformers took off because of their scaling properties and also because of a network effect: the community consolidated around this one architecture, and we started to build all of this infrastructure specifically for transformers. It was a network effect, and it had very nice scaling properties. And so the community really came together around this architecture and built up infrastructure to support its adoption. I think that's what's led to their proliferation or their success. I hope that it's not the last architecture; that would be super, super disappointing and boring. You point out that they're not Turing complete. I should clarify that I'm not a linguist. I'm not too familiar with Chomsky hierarchies or the implications of the DeepMind paper. But one thing that's interesting about it, when I read it, is that they're not speaking in theoretical terms; they're speaking in empirical terms of what functions are achievable from normal initialization. ([Time 0:11:08](https://share.snipd.com/snip/8f23a68b-181a-46d6-894d-ad139bca48ea))
- What's coming after transformers?
In theory, a transformer is a universal approximator, and it might even be Turing complete. But in practice, if you can't explore all permutations of parameters, it's very true that it'll find the simplest function that satisfies the task. I think that's a good guiding lens when thinking about what architectures come next. Where do we go from here? What are the sorts of components we need to add into neural networks to support them in representing these more complex functions? I do think that transformers are limited, and I really hope they're not our final architecture. I hope that we come up with something that's significantly better, and I see promising efforts along those directions. I think that RETRO from DeepMind, augmenting transformers with a searchable memory, is a huge step forward. The next thing we need to support is the ability for these transformers to keep state over long time horizons, to be able to write into their own memory in order to make notes about what they've seen in the past. ([Time 0:12:33](https://share.snipd.com/snip/ecf65ef0-348d-491c-88be-5f8d03b67987))
- Tags: #ai
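The idea in the snip above, a model that can search a memory and also write notes into it, can be sketched in a few lines. This is only a toy illustration, not RETRO's actual mechanism: the `NoteMemory` class, the `embed` function, and the bag-of-words similarity are all hypothetical stand-ins chosen for brevity, where a real system would use learned embeddings and feed the retrieved text into the model itself.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: a real system uses a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NoteMemory:
    """Hypothetical external memory that can be written to and searched."""

    def __init__(self):
        self.notes = []  # list of (text, embedding) pairs

    def write(self, text):
        """Store a note about something seen in the past."""
        self.notes.append((text, embed(text)))

    def search(self, query, k=1):
        """Return the k stored notes most similar to the query."""
        q = embed(query)
        ranked = sorted(self.notes, key=lambda note: cosine(q, note[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_context(memory, query, k=1):
    """Prepend retrieved notes to the query, as a model's input might be augmented."""
    return "\n".join(memory.search(query, k) + [query])


if __name__ == "__main__":
    memory = NoteMemory()
    memory.write("the user prefers metric units")
    memory.write("the meeting is on friday")
    print(build_context(memory, "when is the meeting"))
```

The design point is the separation of concerns: the memory persists across calls, so state survives beyond whatever fits in a single input, which is the long-horizon property the snip asks for.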
- What's Stopping Machine Learning From Getting Out There?
There's a huge compute barrier, right? Like to train these big models, you need a supercomputer and tons of data. The important thing is that now you can build with large language models because you're given an interface. It doesn't require three years of study to get up to speed. So that's really what we're pushing for. We want to put this stuff into the hands of every single developer.
Like the ability to write really compelling tags, the ability to few-shot prompt and get pretty good performance on a huge swath of problems. But it just hasn't been changing the fabric of consumer applications. And I'm a consumer. You're a consumer. We all use these apps. And so as a researcher who's seeing the potential of the technology, it's super frustrating. You just wonder, what's stopping this from getting out there into apps faster? And I think two of the reasons are, first, there's this huge compute barrier, right? Like to train these big models, you need a supercomputer and tons of data. That's very difficult to use and collect. So the compute is definitely one of the big barriers. But the second piece is that really it's the people, right? Like at the moment, yeah, we have millions of developers on our planet, but a tiny, tiny, tiny, tiny fraction of them actually know how to do this specific thing, machine learning and training models. And so there's not a lot of people out there actually doing the work to integrate this into every product on Earth. And so for us at Cohere, what we want to do is just blow that open, put this stuff into the hands of every single developer. It doesn't matter what your specialty is, if you're a database dev or mobile, like whatever you do, it doesn't matter. The important thing is that now you can build with large language models because you're given an interface, which doesn't require, you know, three years of study to get up to speed. So that's really what we're pushing for. ([Time 0:16:39](https://share.snipd.com/snip/0458ebf8-ea4e-4422-ad5a-a6de24ae6ae8))
- OpenAI Service Suddenly Got a Lot Better Recently
So it feels more two-way and community-oriented. We're trying to build the right product for our users and the most useful product possible. And so the way we do that is just through dialogue and conversation and people asking for the thing that they need and want.
OpenAI's service suddenly kind of got a lot better recently. And I think they call it DaVinci 2. It's a bit of a mystery because I reviewed GPT-3 when it first came out a couple of years ago and recently it seems much better. And if I understand correctly, they've done some kind of fine-tuning using reinforcement learning to align it to human preferences, InstructGPT or something like that. I don't even know if that's the case. But I just wondered if you could comment on that and do you folks plan to do something similar?
Yeah. So we don't call them instruct models. We call them command models because of the "co" in Cohere. But we do have something currently in private beta. Hopefully we'll release it soon. But yeah, it has a huge impact on model performance. Like the ability to specify an instruction, specify an intent, describe the type of problem that you're solving, completely changes model performance. ([Time 0:26:50](https://share.snipd.com/snip/94a86e22-f716-431e-9ce2-c56d66d5c8c5))
- Tags: #ai
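The contrast in the snip above, between showing a model examples and directly specifying an instruction, comes down to how the prompt string is assembled. The sketch below is an illustration only: the function names and the `Input:`/`Output:` layout are assumptions made for this example, not Cohere's or OpenAI's prompt format, and no model is actually called.

```python
def build_few_shot_prompt(examples, query):
    """Convey the task through worked input/output pairs, then the new input."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def build_instruction_prompt(instruction, query):
    """State the intent directly, then give the input."""
    return f"{instruction}\n\nInput: {query}\nOutput:"


if __name__ == "__main__":
    # Hypothetical sentiment task used purely for illustration.
    examples = [("I loved it", "positive"), ("Terrible service", "negative")]
    print(build_few_shot_prompt(examples, "Pretty good overall"))
    print("---")
    print(build_instruction_prompt(
        "Classify the sentiment of the input as positive or negative.",
        "Pretty good overall",
    ))
```

With a base model, the few-shot form has to imply the task from the pattern; with an instruction-tuned model, the second form lets the caller describe the problem outright.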