#reinforcement-learning, #artificial-intelligence, #self-supervised-learning, #human-feedback, #chatgpt, #common-intelligence, #school, #environment #ai #llm
Created at 220323
# [Anonymous feedback](https://www.admonymous.co/louis030195)
# [[Epistemic status]]
#shower-thought
Last modified date: 220323
Commit: 0
# Related
# TODO
> [!TODO] TODO
# Reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is usually applied after a first step in which the [[Artificial intelligence|AI]] trains itself on a large amount of data ([[Self supervised learning]]). Humans then teach it to better follow human wishes, typically by rating or ranking its outputs.
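
To make the two steps concrete, here is a toy sketch, not how ChatGPT is actually trained. It assumes the "pretrained model" is just a fixed probability table over three canned responses, and that human feedback arrives as pairwise preferences. The names `fit_rewards`, `rlhf_policy`, `beta` and the example data are all hypothetical. Rewards are fitted with a Bradley-Terry model, and the policy update uses the closed-form solution of a KL-regularized reward objective instead of the policy-gradient methods (such as PPO) used in practice.

```python
import math


def fit_rewards(n_items, preferences, lr=0.1, steps=500):
    """Fit one scalar reward per response from pairwise human preferences.

    Bradley-Terry model: P(winner preferred over loser) = sigmoid(r[winner] - r[loser]).
    We maximize the log-likelihood of the observed preferences by gradient ascent.
    """
    rewards = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for winner, loser in preferences:
            p_win = 1.0 / (1.0 + math.exp(-(rewards[winner] - rewards[loser])))
            # Gradient of log(p_win) with respect to each of the two rewards.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        rewards = [r + lr * g for r, g in zip(rewards, grads)]
    return rewards


def rlhf_policy(pretrained_probs, rewards, beta=1.0):
    """Reweight the pretrained policy toward high-reward responses.

    This is the closed-form maximizer of E[reward] - beta * KL(new || pretrained):
    new(y) is proportional to pretrained(y) * exp(reward(y) / beta).
    """
    weights = [p * math.exp(r / beta) for p, r in zip(pretrained_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


# Step 1 (stand-in for self-supervised pretraining): a fixed, made-up policy.
responses = ["helpful answer", "rambling answer", "rude answer"]
pretrained_probs = [0.2, 0.5, 0.3]

# Step 2 (human feedback): each pair is (preferred index, rejected index).
preferences = [(0, 1), (0, 2), (1, 2), (0, 1)]

rewards = fit_rewards(len(responses), preferences)
tuned_probs = rlhf_policy(pretrained_probs, rewards)

for text, before, after in zip(responses, pretrained_probs, tuned_probs):
    print(f"{text}: {before:.2f} -> {after:.2f}")
```

The `beta` parameter controls how far the tuned policy may drift from the pretrained one: a large `beta` keeps it close to what it learned on its own, a small `beta` lets human preferences dominate.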
An analogy between ChatGPT and humans: humans are born with natural, general, common intelligence ([self-supervised learning](https://en.wikipedia.org/wiki/Self-supervised_learning)), and we are then nurtured by school and our environment ([reinforcement learning from human feedback](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback), RLHF).