#ai
#computing
The reward at step $t$ depends on the current state, the action taken, and the next state:
$r_t = R(s_t, a_t, s_{t+1})$
Finite-horizon undiscounted return: the plain sum of rewards obtained in a fixed window of $T$ steps,
$R(\tau) = \sum_{t=0}^T r_t.$
Infinite-horizon discounted return: the sum of all rewards ever obtained by the agent, discounted by how far in the future they're obtained, with discount factor $\gamma \in (0, 1)$,
$R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t.$
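A minimal sketch of both returns for a finite sample trajectory; the reward values and $\gamma = 0.9$ are illustrative, not from the notes:

```python
def finite_horizon_return(rewards):
    """Undiscounted return: plain sum of rewards over the window."""
    return sum(rewards)

def discounted_return(rewards, gamma=0.9):
    """Discounted return: sum of gamma^t * r_t over the trajectory."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 2.0, 1.0]
print(finite_horizon_return(rewards))        # 4.0
print(discounted_return(rewards, gamma=0.9)) # 1 + 0 + 2*0.81 + 1*0.729 = 3.349
```

Note how the discounted return weights later rewards less: the same trajectory sums to 4.0 undiscounted but only 3.349 with $\gamma = 0.9$.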