Reinforcement learning from human feedback

#reinforcement-learning, #artificial-intelligence, #self-supervised-learning, #human-feedback, #chatgpt, #common-intelligence, #school, #environment #ai #llm

Created at 220323

# [Anonymous feedback](https://www.admonymous.co/louis030195)

# [[Epistemic status]]

#shower-thought

Last modified date: 220323

Commit: 0

# Related

# TODO

> [!TODO] TODO

# Reinforcement learning from human feedback

Reinforcement learning from human feedback (RLHF) is usually a second training stage. First, the [[Artificial intelligence|AI]] trains itself on a large amount of data ([[Self supervised learning]]). Then humans teach it, through feedback on its outputs, to better follow human wishes.

An analogy between ChatGPT and humans: humans are born with a natural, general, common intelligence (the counterpart of [self-supervised learning](https://en.wikipedia.org/wiki/Self-supervised_learning)), and we are then nurtured by school and our environment (the counterpart of [reinforcement learning from human feedback](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback), RLHF).
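
The two stages can be sketched with a toy example. This is only an illustration, not how ChatGPT is actually trained: the "model" is a probability distribution over three canned responses, the "pretraining" step is a stand-in that copies corpus frequencies, and the "human feedback" step is a plain policy-gradient update on hand-written scores. All names and numbers (`corpus_counts`, `human_rewards`, `lr`, `steps`) are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pretrain(corpus_counts):
    """Stage 1 (stand-in for self-supervised learning):
    the model simply imitates how often each response appears in the data."""
    return [math.log(c) for c in corpus_counts]


def finetune_with_feedback(logits, rewards, lr=1.0, steps=200):
    """Stage 2 (stand-in for RLHF): gradient ascent on the expected human reward.
    For a softmax policy, d E[r] / d logit_i = p_i * (r_i - E[r])."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return logits


# Hypothetical data: the raw corpus contains more rude text than helpful text.
responses = ["rude answer", "unhelpful answer", "helpful answer"]
corpus_counts = [50, 30, 20]
# Hypothetical human scores for each response.
human_rewards = [-1.0, 0.0, 1.0]

pretrained = pretrain(corpus_counts)
before = softmax(pretrained)
after = softmax(finetune_with_feedback(pretrained, human_rewards))

for name, b, a in zip(responses, before, after):
    print(f"{name}: before={b:.3f} after={a:.3f}")
```

In the terms of the analogy, `pretrain` plays the role of the intelligence we are born with, and `finetune_with_feedback` plays the role of school and environment: the same model ends up preferring the response that humans scored highest. Real RLHF systems typically learn a reward model from human comparisons and constrain how far the tuned model may drift from the pretrained one, which this sketch leaves out.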