Using Hierarchical Reinforcement Learning to Balance Conflicting Sub-problems
By: Stephen Robertson
Supervisor: Phil Sterne
Presentation Outline
• Project Motivation
• Project Aim
• Rules of the Gridworld
• Flat Reinforcement Learning
• Feudal Reinforcement Learning
• State Variable Combination Approach
Project Motivation
• Reinforcement Learning is an attractive form of machine learning, but the curse of dimensionality makes it inefficient on complex problems
• Hierarchical Reinforcement Learning is a method for dealing with this curse of dimensionality
Project Aim
• Apply various Hierarchical Reinforcement Learning algorithms to a complex gridworld problem
• Compare the algorithms to each other and to flat Reinforcement Learning
Rules of the Gridworld
• Possible Actions: Left, Right, Up, Down and Rest
• Collecting food and drink increases nourishment and hydration respectively
• After landing on the tree, the creature carries wood, which it can use to repair its shelter
Rules of the Gridworld
• Resting in a repaired shelter increases health in proportion to the shelter's condition
• Landing on the lion decreases health and results in a direct punishment
• After every 4 steps, nourishment, hydration, and shelter condition each decrease by 1; after every 10 steps, health decreases by 1 (see the sketch after this list)
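As an illustration, the decay rules above could be implemented along these lines (a minimal sketch in Python; the class name Creature, the attribute names, and the starting values are illustrative assumptions, not taken from the project code):

    class Creature:
        """Minimal sketch of the gridworld's periodic decay rules."""

        def __init__(self):
            # The slides discretise each variable into 5 levels; 0..4 assumed here.
            self.nourishment = 4
            self.hydration = 4
            self.shelter_condition = 4
            self.health = 4
            self.steps = 0

        def tick(self):
            """Apply the decay rules after each action."""
            self.steps += 1
            if self.steps % 4 == 0:      # every 4 steps
                self.nourishment = max(0, self.nourishment - 1)
                self.hydration = max(0, self.hydration - 1)
                self.shelter_condition = max(0, self.shelter_condition - 1)
            if self.steps % 10 == 0:     # every 10 steps
                self.health = max(0, self.health - 1)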
Flat Reinforcement Learning
• Sarsa with eligibility traces, Sarsa(λ), was used (a sketch of the update follows this list)
• To get Flat Reinforcement Learning working, the task needed to be simplified slightly
• Limited to a 6x6 gridworld
• Nourishment, Hydration, Health and Shelter Condition discretised into 5 levels each
• Total states: 6 x 6 x 5 x 5 x 5 x 5 x 2 = 45000 (the final factor of 2 presumably encodes whether the creature is carrying wood)
• Manageable
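For reference, the core update looks roughly like this (a sketch of tabular Sarsa(λ) with accumulating traces; the hyperparameter values are illustrative, not the project's):

    import numpy as np

    # Tabular Sarsa(λ) with accumulating eligibility traces (sketch).
    ALPHA, GAMMA, LAMBDA = 0.1, 0.99, 0.9   # illustrative hyperparameters
    N_STATES, N_ACTIONS = 45000, 5          # actions: left, right, up, down, rest

    def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next):
        """One backup: spread the TD error over all recently visited pairs."""
        delta = r + GAMMA * Q[s_next, a_next] - Q[s, a]
        E[s, a] += 1.0                      # accumulating trace
        Q += ALPHA * delta * E              # in-place update of the whole table
        E *= GAMMA * LAMBDA                 # decay every trace

    Q = np.zeros((N_STATES, N_ACTIONS))     # action-value table
    E = np.zeros_like(Q)                    # eligibility traces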
Flat Reinforcement Learning
• The given task requires a large amount of exploration to find the optimal solution
• Exploration is total at first, then decreases gradually until the agent exploits exclusively
• Optimistically initialising the tables to the maximum possible reward of 6400 encourages efficient exploration (a sketch follows this list)
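Both ideas could be realised as follows (a sketch; the linear annealing schedule and the episode count are assumptions, only the initial value of 6400 comes from the slides):

    import numpy as np

    MAX_REWARD = 6400.0                  # optimistic initial value, from the slides
    N_EPISODES = 10000                   # illustrative schedule length

    # Optimistic initialisation: untried actions look maximally rewarding,
    # so even the greedy policy is drawn towards unexplored pairs.
    Q = np.full((45000, 5), MAX_REWARD)

    def epsilon(episode):
        """Anneal linearly from pure exploration to pure exploitation."""
        return max(0.0, 1.0 - episode / N_EPISODES)

    def choose_action(state, eps, rng=np.random.default_rng()):
        if rng.random() < eps:
            return int(rng.integers(5))  # explore: random action
        return int(np.argmax(Q[state]))  # exploit: greedy action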
Flat Reinforcement Learning Results
Feudal Reinforcement Learning
• Needs to be modified for the given problem
• In the simple maze problem for which it was originally proposed, state variables change independently and never by more than 1
• In that maze problem, high-level actions can be defined as the same as the low-level actions
Feudal Reinforcement Learning
• The main difficulty with the complex problem is that the high-level actions are hard to define
• State variables can change simultaneously and by more than 1, e.g. the creature can move left and fully satisfy its hunger in one step, changing two state variables at once
• High-level actions are therefore defined as desired high-level states (a sketch follows this list)
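In code, that definition might look something like this (a sketch; the table sizes, names, and the intrinsic reward value are illustrative assumptions):

    import numpy as np

    # The manager's "action" is a desired high-level state, not a primitive move.
    N_HIGH_STATES = 50                   # illustrative abstract state-space size
    manager_Q = np.zeros((N_HIGH_STATES, N_HIGH_STATES))  # current -> desired

    def manager_action(high_state):
        """Pick the high-level state the manager wants the worker to reach."""
        return int(np.argmax(manager_Q[high_state]))

    def worker_reward(high_state, goal_state):
        """The worker is rewarded intrinsically for satisfying its manager."""
        return 1.0 if high_state == goal_state else 0.0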
Feudal Reinforcement Learning Results
• Feudal Reinforcement Learning failed badly on this task
State Variable Combination Approach
• In a problem with conflicting sub-problems, each sub-problem tends to be defined by a limited set of state variables
• Sub-agents are created, each in charge of a limited subset of the state variables
• Some sub-agents will be inherently equipped to solve a sub-problem
• Some sub-agents will hold no useful information
• By creating a sub-agent for every possible combination, we minimise the amount of designer intervention (see the sketch after this list)
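Enumerating every combination is straightforward (a sketch; the variable names and the seven-variable list are illustrative assumptions based on the gridworld description):

    from itertools import combinations

    # One sub-agent per non-empty subset of the state variables.
    STATE_VARS = ("x", "y", "nourishment", "hydration",
                  "health", "shelter_condition", "carrying_wood")

    sub_agents = {}
    for r in range(1, len(STATE_VARS) + 1):
        for subset in combinations(STATE_VARS, r):
            # Each sub-agent learns over only its own variables, so its
            # state space (and hence its learning problem) is much smaller.
            sub_agents[subset] = {}      # placeholder for a per-subset Q-table

    print(len(sub_agents))               # 2**7 - 1 = 127 sub-agents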
Examples of Sub-agents
Choosing between sub-agents
• If we obey the sub-agent that predicts the highest reward for the current state, the best action should be chosen
• The problem is that sub-agents holding no useful information may falsely predict a high reward
• The reliability of each sub-agent therefore also needs to be taken into account
• This is achieved by tracking the variance of each sub-agent's predicted rewards (see the sketch after this list)
• High variance = unreliable prediction
• Low variance = reliable prediction
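One way to combine the two signals (a sketch; penalising each prediction by one standard deviation is an assumed rule, the slides only say that a high-variance prediction is unreliable):

    import math

    class SubAgent:
        """Predicts a reward per action and tracks how reliable those
        predictions have been, via Welford's running-variance algorithm."""

        def __init__(self, n_actions=5):
            self.q = [0.0] * n_actions   # predicted reward per action
            self.n, self.mean, self.m2 = 0, 0.0, 0.0

        def record_error(self, error):
            """Update the running variance of this agent's prediction errors."""
            self.n += 1
            delta = error - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (error - self.mean)

        @property
        def variance(self):
            return self.m2 / self.n if self.n > 1 else float("inf")

        def best(self):
            """Return (predicted reward, action) for the favourite action."""
            a = max(range(len(self.q)), key=lambda i: self.q[i])
            return self.q[a], a

    def pick_action(agents):
        """Obey the sub-agent with the best variance-penalised prediction."""
        def score(agent):
            reward, _ = agent.best()
            penalty = math.sqrt(agent.variance) if agent.n > 1 else float("inf")
            return reward - penalty
        _, action = max(agents, key=score).best()
        return action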
Results
Questions?