CSE443 Reasoning Under Uncertainty

Exercise Sheet 5: MDPs and Dynamic Programming

For material covered in Lectures L7 and L8

www.csse.monash.edu.au/~annn/443/Ex5.html

  1. Markov Decision Processes
    S&B: Exercises 3.1, 3.4, 3.5, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.15, 3.16.

  2. Dynamic Programming algorithms
    S&B: Exercises 4.1, 4.2, 4.7.

  3. Absorbing States
    (i) What is an absorbing state?

    (ii) Suppose you were modelling a path-planning problem as an MDP. The goal state in this case must be made an absorbing state. To minimize the length of the path taken, what sort of rewards should be given to the goal and non-goal states, respectively?

    (iii) Given an absorbing state s with reward r and discount factor gamma, what is the expected return of being in state s, V(s)? (A small illustrative sketch related to absorbing states appears at the end of this sheet.)

  4. Comparing Dynamic Programming algorithms
    Compare the policy iteration algorithm to the value iteration algorithm, pointing out their similarities and differences. (A sketch of both algorithms on a toy MDP appears at the end of this sheet.)

    Note: S&B Section 4.6 discusses these in the context of what they call "Generalised policy iteration", while the Kaelbling et al. (1996) survey paper compares their performance.
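
Sketch 1 (absorbing states): a minimal illustration added as a hint, not part of the exercises above. It models a tiny deterministic path-planning chain A -> B -> G in which the goal G is absorbing, i.e. every action leaves it in G. The states, the step cost of -1, the zero goal reward, the discount factor, and the convention that the reward of the current state is collected at each step are all assumptions made up for this sketch.

    # Hypothetical toy chain A -> B -> G; the goal G loops back to itself.
    gamma = 0.9

    # Deterministic transitions for a single action "right".
    next_state = {("A", "right"): "B", ("B", "right"): "G", ("G", "right"): "G"}
    # Illustrative step cost in non-goal states, zero reward at the goal.
    reward = {"A": -1.0, "B": -1.0, "G": 0.0}

    def discounted_return(state, steps=100):
        """Truncated return sum_t gamma^t * r_t, following the only action."""
        total, discount = 0.0, 1.0
        for _ in range(steps):
            total += discount * reward[state]
            discount *= gamma
            state = next_state[(state, "right")]
        return total

    for s in ("A", "B", "G"):
        print(s, discounted_return(s))
    # With a non-zero reward r in the absorbing state, the same loop would
    # accumulate r + gamma*r + gamma^2*r + ..., a geometric series.

Because every non-goal step costs -1, a longer path accumulates more negative discounted reward, which is the connection between reward design and path length asked about in part (ii).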
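
Sketch 2 (value iteration vs. policy iteration): a minimal sketch of the two algorithms on a made-up 3-state, 2-action MDP; the transition probabilities, rewards, discount factor, and use of NumPy are all assumptions for illustration only. It is meant to highlight that both methods are built around the same one-step Bellman backup, while differing in how much policy evaluation happens between improvements.

    import numpy as np

    n_states, n_actions = 3, 2
    gamma = 0.9

    # Made-up dynamics: P[a, s, s'] is a transition probability, R[a, s] a reward.
    P = np.zeros((n_actions, n_states, n_states))
    P[0] = [[0.8, 0.2, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
    P[1] = [[0.1, 0.9, 0.0], [0.2, 0.0, 0.8], [0.0, 0.0, 1.0]]
    R = np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.0]])

    def q_backup(V):
        """One-step lookahead: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']."""
        return R + gamma * P @ V

    def value_iteration(tol=1e-8):
        # Repeated 'max' Bellman backups; no explicit policy until the very end.
        V = np.zeros(n_states)
        while True:
            V_new = q_backup(V).max(axis=0)
            if np.max(np.abs(V_new - V)) < tol:
                return q_backup(V_new).argmax(axis=0), V_new
            V = V_new

    def policy_iteration():
        # Alternates full policy evaluation (here an exact linear solve) with
        # greedy improvement, using the same backup as value iteration.
        policy = np.zeros(n_states, dtype=int)
        while True:
            P_pi = P[policy, np.arange(n_states)]   # dynamics under the current policy
            R_pi = R[policy, np.arange(n_states)]   # rewards under the current policy
            V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
            new_policy = q_backup(V).argmax(axis=0)
            if np.array_equal(new_policy, policy):
                return policy, V
            policy = new_policy

    print(value_iteration())
    print(policy_iteration())

Both calls should print the same greedy policy and, up to the tolerance, the same value function; value iteration applies one maximising backup per sweep, whereas policy iteration evaluates each policy fully before improving it.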