#ai #computing

The reward at each step is $r_t = R(s_t, a_t, s_{t+1})$. Two common notions of return over a trajectory $\tau$:

- **Finite-horizon undiscounted return**: the sum of rewards obtained in a fixed window of steps, $R(\tau) = \sum_{t=0}^T r_t.$
- **Infinite-horizon discounted return**: the sum of all rewards ever obtained by the agent, but discounted by how far off in the future they're obtained, $R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t,$ where $\gamma \in (0, 1)$ is the discount factor.
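The two returns can be sketched directly from the formulas; a minimal example, assuming rewards arrive as a plain list (function names are illustrative):

```python
def finite_horizon_return(rewards):
    """Undiscounted return: plain sum of rewards over the fixed window."""
    return sum(rewards)

def discounted_return(rewards, gamma=0.99):
    """Discounted return: each reward r_t is weighted by gamma**t."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 2.0]
print(finite_horizon_return(rewards))         # 1.0 + 0.0 + 2.0 = 3.0
print(discounted_return(rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

In practice the infinite sum is truncated at the episode's end; the $\gamma^t$ weighting is what keeps it finite when rewards go on forever.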