#vicarious-learning #ux #automatic-rlhf #prediction-is-compression #philosophy #humans #machinamorphize #nips-2022 #llm-infra #huggingface #text-generation-inference #deep-learning #continuous-batching-at-scale

Created at 210523

# [Anonymous feedback](https://www.admonymous.co/louis030195)

# [[Epistemic status]]

#shower-thought

Last modified date: 210523

Commit: 0

# Related

- [[Computing/Vicarious learning - UX]]
- [[Computing/Automatic RLHF]]
- [[Computing/Prediction is compression]]
- [[Philosophy/Humans/Machinamorphize]]
- [[Computing/NIPS 2022]]

# Vicarious learning - LLM infra

## Huggingface

Particularly interesting: https://github.com/huggingface/text-generation-inference/tree/main/router

Related to [[Deep learning continuous batching at scale]] (see the sketch at the end of this note).

The difference when scaling an [[Large language model|LLM]] (i.e. serving many users) compared to traditional software is that you usually prefer a single model, potentially split across several pieces of hardware, whereas traditional software tends to scale "horizontally": you run many instances on different machines.

In a way this comes back to a more philosophical thought I had. It seems intelligence has the property of requiring very close spatial relationships. Models like [[GPT3]] probably run within 100 square meters or less, while the human mind, the most astonishing object in the [[Our universe|universe]] to our knowledge, fits in the small box we call a head. In the same way that neurons need to be tightly linked and close to each other, the parts of a neural network need to communicate very quickly, and the most microscopic delay would break everything.

Let's take a thought experiment: imagine it is 2050 and our civilization hasn't gone extinct yet. Would you imagine a superintelligence, a GPT-25, split across a long distance, say two planets?
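Below is a minimal Python sketch of the continuous-batching idea the TGI router implements, not its actual code (the real router is written in Rust, and the `Request`, `model_step`, and `serve` names here are hypothetical): new requests join the in-flight batch at every decoding step instead of waiting for the whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    generated: list[str] = field(default_factory=list)

def model_step(batch: list[Request]) -> None:
    # Stand-in for one forward pass: emit one token per in-flight request.
    for req in batch:
        req.generated.append("<tok>")

def serve(queue: deque[Request], max_batch_size: int = 4) -> None:
    batch: list[Request] = []
    while queue or batch:
        # Continuous batching: refill free slots from the queue at every
        # step, instead of waiting for the current batch to finish.
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        model_step(batch)
        # Retire requests that hit their token budget; their slots are
        # reused by newly queued requests on the next iteration.
        batch = [r for r in batch if len(r.generated) < r.max_new_tokens]

if __name__ == "__main__":
    q = deque(Request(f"prompt {i}", max_new_tokens=2 + i % 3) for i in range(8))
    serve(q)
```

The design point this illustrates: the scheduler's unit of work is one decoding step over whatever is in flight, which is part of why a single large model instance can serve many users at once rather than being scaled horizontally like traditional software.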