#tech #idea #computing #intelligence #large-language-models #personalization #business #google-Search #ParlAI #ChatGPT #GPT4 #information #organic-reality #artificial-reality

# [[Epistemic status]]

#shower-thought #to-digest

# Related

- [[Computing/Intelligence/Large language models are not personal enough - how to fix it]]
- [[Business/Google Search chances of survival vs large language models]]
- [[Computing/Ideas/GPT index and ParlAI]]
- [[Computing/ChatGPT]]
- [[Computing/Intelligence/ChatGPT is a proxy for GPT4]]
- [[Computing/Google Search vs Large language model]]

# TODO

> [!TODO] TODO

# Tech idea

A tech idea with no revenue in mind.

## Problem

- No problem, just fun.

## Solution

- I used to run a Discord bot for [[Langa 😛]] that ran [[Seeker search augmented conversational bot]], using Google Search results as context, through https://github.com/langa-me/cerche.
- I wonder what [[ChatGPT]] would do with it.
- E.g. you start talking to ChatGPT about the weather and so on, then ask it to solve your tokenizer problem; it would google "tokenizer" to compensate for its stale training data and get updated with the latest information (see the sketch at the end of this note).
- For example, if I ask ChatGPT about [[Byte-pair encoding]]:

> What are the AI models using [[Byte-pair encoding]]?

> There are several AI models that use Byte-pair encoding for natural language processing tasks such as language translation, sentiment analysis, and language modeling. Some of the popular models include:
>
> 1. GPT-2 (Generative Pre-trained Transformer 2)
> 2. T5 (Text-to-Text Transfer Transformer)
> 3. RoBERTa (Robustly Optimized BERT Pretraining Approach)
> 4. XLM-RoBERTa (Cross-lingual Language Model RoBERTa)
> 5. ALBERT (A Lite BERT)
> 6. ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
>
> All these models use Byte-pair encoding to tokenise the text and represent it in a numeric format that can be understood by the neural network. Byte-pair encoding helps the models to achieve better performance in capturing the meaning and context of the text, especially for languages with complex grammatical rules and morphological structures.

Louis: it does not even mention [[GPT3]] or [[ChatGPT]]; I suspect that with some context from Google Search it would have evoked them.

In a way, as explored in [[Google Search vs Large language model]], the [[Large language model|LLM]] vs Google comparison comes down to this:

- Google has access to more recent [[Information|information]] (until the data from [[Organic reality|organic reality]] has been transferred to [[Artificial reality|artificial reality]]).
- LLMs are limited to the information in their training data.

So maybe it makes sense to plug the Google search heuristic into LLMs?
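A minimal sketch of what that plumbing could look like, assuming the official OpenAI Python SDK and a hypothetical `google_search` helper (not a real API here; cerche, SerpAPI, or any search endpoint could play that role):

```python
# Sketch: search-augmented chat. `google_search` is a placeholder, not a real API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def google_search(query: str, n: int = 3) -> list[str]:
    """Hypothetical helper: return the top-n result snippets for `query`.
    In the Langa Discord bot this role was played by cerche
    (https://github.com/langa-me/cerche)."""
    raise NotImplementedError("plug in cerche, SerpAPI, or similar here")


def search_augmented_answer(question: str) -> str:
    # 1. Pull fresh context from organic reality (the live web).
    snippets = google_search(question)
    context = "\n".join(f"- {s}" for s in snippets)
    # 2. Hand the snippets to the LLM so it can answer past its training cutoff.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "Prefer the search results below when they are "
                           f"more recent than your training data:\n{context}",
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


# e.g. search_augmented_answer("What AI models use byte-pair encoding?")
# should then be able to mention GPT-3 and ChatGPT, not only pre-cutoff models.
```

With this loop, the byte-pair encoding question above would carry current search snippets into the prompt, which is exactly the gap the un-augmented ChatGPT answer showed.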