Using Hierarchical Reinforcement Learning to Balance Conflicting Sub-problems
By: Stephen Robertson
Supervisor: Phil Sterne
Presentation Outline
• Project Motivation
• Project Aim
• Rules of the Gridworld
• Flat Reinforcement Learning
• Feudal Reinforcement Learning
• State Variable Combination Approach
Project Motivation
• Reinforcement Learning is an attractive form of machine learning, but the curse of dimensionality makes it inefficient on complex problems
• Hierarchical Reinforcement Learning is a method for dealing with this curse of dimensionality
Project Aim
• Apply various Hierarchical Reinforcement Learning algorithms to a complex gridworld problem
• Compare the algorithms to each other and to flat Reinforcement Learning
Rules of the Gridworld
• Possible Actions: Left, Right, Up, Down and Rest
• Collecting food and drink increases nourishment and hydration respectively
• After landing on the tree, the creature carries wood, which it can use to repair its shelter
Rules of the Gridworld
• Resting in a repaired shelter increases health in proportion to the shelter's condition
• Landing on the lion decreases health and results in a direct punishment
• After every 4 steps, nourishment, hydration, and shelter condition each decrease by 1; after every 10 steps, health decreases by 1 (see the sketch after this list)
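As an illustration, the decay rules above could be implemented along these lines (a minimal sketch in Python; the class name Creature, the attribute names, and the starting values are illustrative assumptions, not taken from the project code):

    class Creature:
        """Minimal sketch of the gridworld's periodic decay rules."""

        def __init__(self):
            # The slides discretise each variable into 5 levels; 0..4 assumed here.
            self.nourishment = 4
            self.hydration = 4
            self.shelter_condition = 4
            self.health = 4
            self.steps = 0

        def tick(self):
            """Apply the decay rules after each action."""
            self.steps += 1
            if self.steps % 4 == 0:      # every 4 steps
                self.nourishment = max(0, self.nourishment - 1)
                self.hydration = max(0, self.hydration - 1)
                self.shelter_condition = max(0, self.shelter_condition - 1)
            if self.steps % 10 == 0:     # every 10 steps
                self.health = max(0, self.health - 1)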
Flat Reinforcement Learning
• Sarsa with eligibility traces, Sarsa(λ), was used (a sketch of the update follows this list)
• To get Flat Reinforcement Learning working, the task needed to be simplified slightly
• Limited to a 6x6 gridworld
• Nourishment, Hydration, Health and Shelter Condition discretised into 5 levels each
• Total states: 6 x 6 x 5 x 5 x 5 x 5 x 2 = 45000 (the final factor of 2 presumably encodes whether the creature is carrying wood)
• Manageable
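For reference, the core update looks roughly like this (a sketch of tabular Sarsa(λ) with accumulating traces; the hyperparameter values are illustrative, not the project's):

    import numpy as np

    # Tabular Sarsa(λ) with accumulating eligibility traces (sketch).
    ALPHA, GAMMA, LAMBDA = 0.1, 0.99, 0.9   # illustrative hyperparameters
    N_STATES, N_ACTIONS = 45000, 5          # actions: left, right, up, down, rest

    def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next):
        """One backup: spread the TD error over all recently visited pairs."""
        delta = r + GAMMA * Q[s_next, a_next] - Q[s, a]
        E[s, a] += 1.0                      # accumulating trace
        Q += ALPHA * delta * E              # in-place update of the whole table
        E *= GAMMA * LAMBDA                 # decay every trace

    Q = np.zeros((N_STATES, N_ACTIONS))     # action-value table
    E = np.zeros_like(Q)                    # eligibility traces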
Flat Reinforcement Learning
• The given task requires a large amount of exploration to find the optimal solution
• Exploration is total at first, then decreases gradually until the agent exploits exclusively
• Optimistically initialising the tables to the maximum possible reward of 6400 encourages efficient exploration (a sketch follows this list)
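Both ideas could be realised as follows (a sketch; the linear annealing schedule and the episode count are assumptions, only the initial value of 6400 comes from the slides):

    import numpy as np

    MAX_REWARD = 6400.0                  # optimistic initial value, from the slides
    N_EPISODES = 10000                   # illustrative schedule length

    # Optimistic initialisation: untried actions look maximally rewarding,
    # so even the greedy policy is drawn towards unexplored pairs.
    Q = np.full((45000, 5), MAX_REWARD)

    def epsilon(episode):
        """Anneal linearly from pure exploration to pure exploitation."""
        return max(0.0, 1.0 - episode / N_EPISODES)

    def choose_action(state, eps, rng=np.random.default_rng()):
        if rng.random() < eps:
            return int(rng.integers(5))  # explore: random action
        return int(np.argmax(Q[state]))  # exploit: greedy action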
Flat Reinforcement Learning Results
Feudal Reinforcement Learning
• Needs to be modified for the given problem
• In the simple maze problem for which it was originally proposed, state variables change independently and never by more than 1
• In that maze problem, high-level actions can be defined as the same as the low-level actions
Feudal Reinforcement Learning
• The main difficulty with the complex problem is that the high-level actions are hard to define
• State variables can change simultaneously and by more than 1, e.g. the creature can move left and fully satisfy its hunger in one step, changing two state variables at once
• High-level actions are therefore defined as desired high-level states (a sketch follows this list)
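In code, that definition might look something like this (a sketch; the table sizes, names, and the intrinsic reward value are illustrative assumptions):

    import numpy as np

    # The manager's "action" is a desired high-level state, not a primitive move.
    N_HIGH_STATES = 50                   # illustrative abstract state-space size
    manager_Q = np.zeros((N_HIGH_STATES, N_HIGH_STATES))  # current -> desired

    def manager_action(high_state):
        """Pick the high-level state the manager wants the worker to reach."""
        return int(np.argmax(manager_Q[high_state]))

    def worker_reward(high_state, goal_state):
        """The worker is rewarded intrinsically for satisfying its manager."""
        return 1.0 if high_state == goal_state else 0.0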
Feudal Reinforcement Learning Results
• Feudal Reinforcement Learning failed badly on this task
State Variable Combination Approach
• In a problem with conflicting sub-problems, each sub-problem tends to be defined by a limited set of state variables
• Sub-agents are created, each in charge of a limited subset of the state variables
• Some sub-agents will be inherently equipped to solve a sub-problem
• Some sub-agents will hold no useful information
• By creating a sub-agent for every possible combination, we minimise the amount of designer intervention (see the sketch after this list)
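Enumerating every combination is straightforward (a sketch; the variable names and the seven-variable list are illustrative assumptions based on the gridworld description):

    from itertools import combinations

    # One sub-agent per non-empty subset of the state variables.
    STATE_VARS = ("x", "y", "nourishment", "hydration",
                  "health", "shelter_condition", "carrying_wood")

    sub_agents = {}
    for r in range(1, len(STATE_VARS) + 1):
        for subset in combinations(STATE_VARS, r):
            # Each sub-agent learns over only its own variables, so its
            # state space (and hence its learning problem) is much smaller.
            sub_agents[subset] = {}      # placeholder for a per-subset Q-table

    print(len(sub_agents))               # 2**7 - 1 = 127 sub-agents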
Examples of Sub-agents
Choosing between sub-agents
• If we obey the sub-agent that predicts the highest reward for the current state, the best action should be chosen
• The problem is that sub-agents holding no useful information may falsely predict a high reward
• The reliability of each sub-agent therefore also needs to be taken into account
• This is achieved by tracking the variance of each sub-agent's predicted rewards (see the sketch after this list)
• High variance = unreliable prediction
• Low variance = reliable prediction
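One way to combine the two signals (a sketch; penalising each prediction by one standard deviation is an assumed rule, the slides only say that a high-variance prediction is unreliable):

    import math

    class SubAgent:
        """Predicts a reward per action and tracks how reliable those
        predictions have been, via Welford's running-variance algorithm."""

        def __init__(self, n_actions=5):
            self.q = [0.0] * n_actions   # predicted reward per action
            self.n, self.mean, self.m2 = 0, 0.0, 0.0

        def record_error(self, error):
            """Update the running variance of this agent's prediction errors."""
            self.n += 1
            delta = error - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (error - self.mean)

        @property
        def variance(self):
            return self.m2 / self.n if self.n > 1 else float("inf")

        def best(self):
            """Return (predicted reward, action) for the favourite action."""
            a = max(range(len(self.q)), key=lambda i: self.q[i])
            return self.q[a], a

    def pick_action(agents):
        """Obey the sub-agent with the best variance-penalised prediction."""
        def score(agent):
            reward, _ = agent.best()
            penalty = math.sqrt(agent.variance) if agent.n > 1 else float("inf")
            return reward - penalty
        _, action = max(agents, key=score).best()
        return action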
Results
Questions?