#tech #idea #computing #intelligence #large-language-models #personalization #business #google-Search #ParlAI #ChatGPT #GPT4 #information #organic-reality #artificial-reality
# [[Epistemic status]]
#shower-thought #to-digest
# Related
- [[Computing/Intelligence/Large language models are not personal enough - how to fix it]]
- [[Business/Google Search chances of survival vs large language models]]
- [[Computing/Ideas/GPT index and ParlAI]]
- [[Computing/ChatGPT]]
- [[Computing/Intelligence/ChatGPT is a proxy for GPT4]]
- [[Computing/Google Search vs Large language model]]
# TODO
> [!TODO] TODO
# Tech idea
Tech idea with no revenue in mind.
## Problem
- no problem, just fun
## Solution
- I used to run a Discord bot for [[Langa 😛]] that ran [[Seeker search augmented conversational bot]], feeding Google Search results in as context, via https://github.com/langa-me/cerche
- I wonder what [[ChatGPT]] would do with it
- e.g. you start talking to ChatGPT about the weather and such, then ask it to solve your tokenizer problem; it would google "tokenizer" to compensate for its stale training data and answer with up-to-date information (see the sketch after the transcript below)
- for example, if I ask ChatGPT about [[Byte-pair encoding]]:
> **Q:** What are the AI models using [[Byte-pair encoding]]?
>
> **ChatGPT:** There are several AI models that use Byte-pair encoding for natural language processing tasks such as language translation, sentiment analysis, and language modeling. Some of the popular models include:
> 1. GPT-2 (Generative Pre-trained Transformer 2)
> 2. T5 (Text-to-Text Transfer Transformer)
> 3. RoBERTa (Robustly Optimized BERT Pretraining Approach)
> 4. XLM-RoBERTa (Cross-lingual Language Model RoBERTa)
> 5. ALBERT (A Lite BERT)
> 6. ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
>
> All these models use Byte-pair encoding to tokenise the text and represent it in a numeric format that can be understood by the neural network. Byte-pair encoding helps the models to achieve better performance in capturing the meaning and context of the text, especially for languages with complex grammatical rules and morphological structures.
Louis: It does not even mention [[GPT3]] or [[ChatGPT]]; I guess that with some context from Google Search it would have brought them up.
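A minimal sketch of the idea, assuming the pre-1.0 `openai` Python client; `google_search` is a hypothetical helper standing in for any search backend (cerche, SerpAPI, or similar), not a real library call:

```python
# Hedged sketch: prepend Google results to the ChatGPT prompt.
# Assumes the pre-1.0 `openai` Python client (openai.ChatCompletion).
import openai

def google_search(query: str, k: int = 3) -> list[str]:
    """Hypothetical helper: return the top-k result snippets for `query`."""
    raise NotImplementedError("plug in cerche, SerpAPI, or similar")

def answer_with_search(question: str) -> str:
    # Fetch fresh context and hand it to the model as a system message.
    snippets = google_search(question)
    context = "\n".join(f"- {s}" for s in snippets)
    messages = [
        {
            "role": "system",
            "content": (
                "Use the search results below; they may be more recent "
                "than your training data.\n\nSearch results:\n" + context
            ),
        },
        {"role": "user", "content": question},
    ]
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages
    )
    return resp.choices[0].message.content

print(answer_with_search("What AI models use byte-pair encoding?"))
```

With snippets mentioning GPT-3 or ChatGPT sitting in the context window, the answer above would presumably include them.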
In a way, as explored in [[Google Search vs Large language model]], the [[Large language model|LLM]] vs Google trade-off comes down to this:
- Google has access to more recent [[Information|information]] (until the data from [[Organic reality|organic reality]] has been transferred to [[Artificial reality|artificial reality]])
- LLMs are limited to the information in their training data
So maybe it makes sense to plug the Google heuristic into LLMs?
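One possible shape for that heuristic, continuing the sketch above (same `openai` import and `answer_with_search`): first ask the model whether the question needs information newer than its training cutoff, and only route fresh topics through Google. This routing trick is purely an assumption, not an established recipe:

```python
def needs_search(question: str) -> bool:
    """Assumed routing trick: let the model self-report staleness."""
    probe = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                "Answer strictly YES or NO: does answering this require "
                "information newer than your training data?\n" + question
            ),
        }],
    )
    return "YES" in probe.choices[0].message.content.upper()

def answer(question: str) -> str:
    # Fresh topics go through Google; the rest stay pure LLM.
    if needs_search(question):
        return answer_with_search(question)  # sketch above
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```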