#ai
#computing
#reinforcement-learning
formalism / jargon to say "update a value estimate from another estimate at every step, instead of waiting for the full return at the end of the episode"
e.g. without bootstrapping:
$G_{t} \doteq R_{t+1}+\gamma R_{t+2}+\gamma^{2} R_{t+3}+\cdots+\gamma^{T-t-1} R_{T}$
with bootstrapping (n-step return, truncated after $n$ rewards and completed with the current value estimate):
$G_{t: t+n} \doteq R_{t+1}+\gamma R_{t+2}+\cdots+\gamma^{n-1} R_{t+n}+\gamma^{n} V_{t+n-1}\left(S_{t+n}\right)$
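
A minimal Python sketch of the two returns above, to make the difference concrete. It assumes `rewards[k]` holds $R_{t+k+1}$ and that `bootstrap_value` stands in for the estimate $V_{t+n-1}(S_{t+n})$; the function names are illustrative, not from any library.

```python
from typing import Sequence


def mc_return(rewards: Sequence[float], gamma: float) -> float:
    """Full return G_t: needs every reward up to the terminal R_T."""
    g = 0.0
    # Work backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def n_step_return(
    rewards: Sequence[float], gamma: float, n: int, bootstrap_value: float
) -> float:
    """n-step return G_{t:t+n}: first n rewards plus gamma^n * V(S_{t+n})."""
    if n < 1 or n > len(rewards):
        raise ValueError("n must be between 1 and len(rewards)")
    # Start from the value estimate (the bootstrap), then fold in n rewards.
    g = bootstrap_value
    for r in reversed(rewards[:n]):
        g = r + gamma * g
    return g


rewards = [1.0, 0.0, 2.0]  # R_{t+1}, R_{t+2}, R_{t+3}; episode ends here
gamma = 0.5

full = mc_return(rewards, gamma)  # 1 + 0.5*0 + 0.25*2 = 1.5
boot = n_step_return(rewards, gamma, n=2, bootstrap_value=4.0)  # 1 + 0 + 0.25*4 = 2.0
```

The bootstrapped version only looks at the first `n` rewards, so it can be computed `n` steps after time $t$, at the cost of depending on how good the value estimate is.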