RLbook2020trimmed

## Metadata
- Full Title: RLbook2020trimmed
- Category: #books
## Highlights
- Whereas the reward signal indicates what is good in an immediate sense, a value function specifies what is good in the long run (Page 28)
- Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state (Page 28; see the worked definition in the notes below)
- In fact, the most important component of almost all reinforcement learning algorithms we consider is a method for efficiently estimating values. The central role of value estimation is arguably the most important thing that has been learned about reinforcement learning over the last six decades (Page 29)
- For example, solution methods such as genetic algorithms, genetic programming, simulated annealing, and other optimization methods never estimate value functions. These methods apply multiple static policies each interacting over an extended period of time with a separate instance of the environment. The policies that obtain the most reward, and random variations of them, are carried over to the next generation of policies, and the process repeats (Page 29; see the policy-search sketch in the notes below)
- If the space of policies is sufficiently small, or can be structured so that good policies are common or easy to find—or if a lot of time is available for the search—then evolutionary methods can be effective (Page 30)
- The concepts of value and value function are key to most of the reinforcement learning methods that we consider in this book. We take the position that value functions are important for efficient search in the space of policies. The use of value functions distinguishes reinforcement learning methods from evolutionary methods that search directly in policy space guided by evaluations of entire policies (Page 35)
- If you maintain estimates of the action values, then at any time step there is at least one action whose estimated value is greatest. We call these the greedy actions. When you select one of these actions, we say that you are exploiting your current knowledge of the values of the actions. If instead you select one of the nongreedy actions, then we say you are exploring, because this enables you to improve your estimate of the nongreedy action’s value (Page 48; see the ε-greedy sketch in the notes below)
- Particularly relevant is the key feature of reinforcement learning that it takes long-term consequences of decisions into account (Page 497)
- The problem of ensuring that a reinforcement learning agent’s goal is attuned to our own remains a challenge (Page 499)
- How do you make sure that an agent gets enough experience to learn a high-performing policy, all the while not harming its environment, other agents, or itself (or more realistically, while keeping the probability of harm acceptably low)? (Page 499)
- Tags: #ai
- One of the most pressing areas for future reinforcement learning research is to adapt and extend methods developed in control engineering with the goal of making it acceptably safe to fully embed reinforcement learning agents into physical environments (Page 500)
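
## Notes
The page 28 highlights describe the value of a state informally. A worked form of that statement, written as a sketch in the book's usual notation (v_π is the state-value function for policy π, γ the discount rate, and R_{t+k+1} the rewards received after time t):

```latex
% State-value function for policy \pi: the expected total discounted reward
% accumulated from time t onward, given that the agent starts in state s and follows \pi.
v_\pi(s) \doteq \mathbb{E}_\pi\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right],
\qquad 0 \le \gamma \le 1 .
```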
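
The page 29 highlight contrasts value-based reinforcement learning with evolutionary methods that only score whole policies. A minimal sketch of that loop, under assumed interfaces that are not from the book: `env_factory()` builds a fresh environment whose `reset()` returns a state and whose `step(action)` returns `(state, reward, done)`, `policy(state)` returns an action, and `mutate(policy)` returns a randomly perturbed copy.

```python
import random

def evaluate(policy, env_factory, episodes=10):
    """Score one static policy by the total reward it collects.
    No value function is estimated anywhere in this loop."""
    env = env_factory()  # separate environment instance for this policy
    total = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done = env.step(policy(state))
            total += reward
    return total

def evolve(population, env_factory, mutate, generations=50, keep=5):
    """Carry the highest-scoring policies, plus random variations of them,
    over to the next generation, and repeat."""
    for _ in range(generations):
        scored = sorted(population, key=lambda p: evaluate(p, env_factory), reverse=True)
        elite = scored[:keep]
        variations = [mutate(random.choice(elite)) for _ in range(len(population) - keep)]
        population = elite + variations
    return population[0]  # best policy from the final scored generation
```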
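
The page 48 highlight on greedy and exploratory actions corresponds to the ε-greedy rule the book develops for the bandit setting. A minimal sketch, assuming the action-value estimates Q and the selection counts N are plain Python lists indexed by action (that representation is an illustration choice, not the book's):

```python
import random

def epsilon_greedy(Q, epsilon=0.1):
    """With probability epsilon, explore: pick an action uniformly at random.
    Otherwise exploit: pick a greedy action (largest estimated value),
    breaking ties at random."""
    if random.random() < epsilon:
        return random.randrange(len(Q))
    best = max(Q)
    return random.choice([a for a, q in enumerate(Q) if q == best])

def update(Q, N, action, reward):
    """Incremental sample-average update of the chosen action's value estimate."""
    N[action] += 1
    Q[action] += (reward - Q[action]) / N[action]
```

Choosing a nongreedy action gives up some immediate reward, but the resulting sample improves that action's estimate, which is the exploration/exploitation trade-off the highlight describes.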