S&B: Exercises 3.1, 3.4, 3.5, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.15, 3.16.
S&B: Exercises 4.1, 4.2, 4.7
(i) What is an absorbing state?
(ii) Suppose you were modelling a path-planning problem as an MDP. The goal state in this case must be made an absorbing state. To encourage the agent to minimize the length of the path taken, what sort of rewards should be given to the goal and non-goal states, respectively?
(iii) Given an absorbing state s with reward r, and a discount factor gamma, what is the expected return from s, V(s)?
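As a reminder of the mechanics involved in (iii) (a sketch, not an answer key): under the common convention that an absorbing state yields the same reward r on every self-transition, the return is a geometric series. Check which reward convention your MDP formulation uses before applying this.

```latex
% Return from an absorbing state s that yields reward r every step,
% assuming 0 <= gamma < 1 so the series converges:
V(s) = r + \gamma r + \gamma^2 r + \dots
     = \sum_{t=0}^{\infty} \gamma^t r
     = \frac{r}{1 - \gamma}
```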
Compare the policy iteration algorithm to the value iteration algorithm, pointing out their similarities and differences.
Note: S&B Section 4.6 discusses these in the context of what they call "Generalised policy iteration", while the Kaelbling et al. (1996) survey paper compares their performance.
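As a starting point for the comparison, the two algorithms can be run side by side on a toy MDP. The sketch below is an illustrative assumption, not part of the exercises: a deterministic 4-state chain where state 3 is an absorbing goal (reward 0, ties into exercise (ii)), every other step costs -1, and gamma is 0.9. Value iteration folds the max into each backup, while policy iteration alternates full evaluation sweeps with greedy improvement; both should converge to the same value function here.

```python
GAMMA = 0.9
N = 4                  # states 0..3; state 3 is the absorbing goal (assumed)
ACTIONS = (-1, +1)     # move left / move right

def step(s, a):
    """Deterministic transition; the goal is absorbing with zero reward."""
    if s == N - 1:
        return s, 0.0              # absorbing: stay put, reward 0
    s2 = min(max(s + a, 0), N - 1)
    return s2, -1.0                # -1 per step encourages short paths

def value_iteration(tol=1e-8):
    """Sweep Bellman *optimality* backups until the values stop changing."""
    V = [0.0] * N
    while True:
        delta = 0.0
        for s in range(N):
            best = max(r + GAMMA * V[s2]
                       for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def policy_iteration(tol=1e-8):
    """Alternate full policy evaluation with greedy policy improvement."""
    pi = [ACTIONS[0]] * N
    V = [0.0] * N
    while True:
        # Evaluation: iterate the Bellman *expectation* update for pi.
        while True:
            delta = 0.0
            for s in range(N):
                s2, r = step(s, pi[s])
                v = r + GAMMA * V[s2]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Improvement: act greedily with respect to the evaluated V.
        stable = True
        for s in range(N):
            best_a = max(ACTIONS,
                         key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
            if best_a != pi[s]:
                pi[s], stable = best_a, False
        if stable:
            return V, pi
```

Running both, the value functions agree (V = [-2.71, -1.9, -1, 0] for this chain), which is the "same fixed point, different route" observation the comparison question is after.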