What if you didn’t have any hard goals..? And got rewards continually? And have stochastic actions? MDPs as Utility-based problem solving agents.