#nlp #word-embeddings #contextualized-word-embeddings #transformers #pre-trained-language-models #specialized-models #machine-learning #ai-models

Created at 260423

# [Anonymous feedback](https://www.admonymous.co/louis030195)

# [[Epistemic status]]

#shower-thought

Last modified date: 260423

Commit: 0

# Related

- [[Computing/Embeddings in the human mind]]
- [[Computing/Embeddings]]
- [[Computing/Intelligence/Machine Learning/Embedding is the dark matter of intelligence]]
- [[Computing/Intelligence/Joint embedding]]
- [[Learnings from using LLM in products]]

# The history of embeddings

The shift from using embeddings as a component inside AI models to building specialized models whose sole purpose is to generate embeddings can be traced through the evolution of natural language processing (NLP). The transition breaks down into a few key phases:

1. **Word embeddings**: Early NLP relied on techniques like bag-of-words and TF-IDF to represent text, but these sparse representations failed to capture the semantic meaning of words. Word embeddings such as Word2Vec, GloVe, and FastText addressed this by representing each word as a dense vector in a continuous vector space, so that semantically related words end up close together.
2. **Embeddings in AI models**: With word embeddings available, researchers began using them as components of larger models, typically as input layers for deep learning architectures. An embedding layer maps each word to its dense vector, which is then fed into the subsequent layers of the network.
3. **Contextualized word embeddings**: Static word embeddings assign a single vector per word regardless of context, which limits what they can capture. Contextualized embeddings such as ELMo generate a representation for each word based on the context it appears in, and were plugged into downstream models in the same way as static embeddings.
4. **Transformers and pre-trained language models**: The Transformer architecture introduced by Vaswani et al. in 2017 revolutionized NLP by capturing long-range dependencies and context far more effectively. It led to pre-trained language models like BERT, GPT, and RoBERTa, which, once fine-tuned on a specific task, produced state-of-the-art results across a range of NLP tasks.
5. **Specialized models for generating embeddings**: Once the potential of pre-trained language models was clear, attention shifted to producing high-quality embeddings as an output in their own right. The idea is to train models whose embeddings are tailored to a particular task or domain, leading to better performance than generic representations. These specialized models can be seen as an extension of pre-trained language models, optimized specifically for producing embeddings (a minimal usage sketch follows below).

In summary, the move from embeddings as an internal part of AI models to specialized embedding models has been driven by the constant pursuit of better performance on NLP tasks. The evolution runs from word embeddings, to contextualized word embeddings, to pre-trained language models, culminating in models built specifically to generate embeddings.
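To make step 5 concrete, here is a minimal sketch of generating embeddings with a specialized embedding model. It assumes the `sentence-transformers` library and the `all-MiniLM-L6-v2` checkpoint are available; any comparable embedding model would be used the same way.

```python
# Minimal sketch: generating embeddings with a specialized embedding model.
# Assumes the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint
# are installed/downloadable; any comparable embedding model works similarly.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Word2Vec maps words to dense vectors.",
    "BERT produces contextualized representations.",
    "The weather is nice today.",
]

# encode() returns one dense vector per sentence (384 dimensions for this model)
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
similarity = np.dot(embeddings[0], embeddings[1])
print(f"similarity(sentence 0, sentence 1) = {similarity:.3f}")
```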
## History of embeddings infra

People used to store embeddings directly on disk, then moved to specialized similarity-search libraries like FAISS, and more recently to vector databases such as Pinecone, Weaviate, and Qdrant. Even then, you usually still end up with an embedding index on disk at some point to run your [[KNN]] search.
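A minimal sketch of that workflow with FAISS, assuming the `faiss` Python package (e.g. `faiss-cpu`) is installed; the embeddings are random stand-ins and the file name `embeddings.index` is illustrative.

```python
# Minimal sketch: exact KNN over embeddings with FAISS, persisted to disk.
# Assumes the faiss Python package (e.g. faiss-cpu) is installed; the file
# name "embeddings.index" and the random vectors are illustrative only.
import faiss
import numpy as np

d = 384                                            # embedding dimensionality
xb = np.random.rand(10_000, d).astype("float32")   # stand-in for real embeddings
faiss.normalize_L2(xb)                             # normalize so inner product == cosine

index = faiss.IndexFlatIP(d)                       # exact (brute-force) inner-product index
index.add(xb)

# Persist the index to disk and load it back -- the "embedding index on disk"
# step mentioned above.
faiss.write_index(index, "embeddings.index")
index = faiss.read_index("embeddings.index")

xq = np.random.rand(1, d).astype("float32")        # query embedding
faiss.normalize_L2(xq)
scores, ids = index.search(xq, 5)                  # top-5 nearest neighbours
print(ids[0], scores[0])
```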