Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.
-
Upload
hope-carbonell -
Category
Documents
-
view
214 -
download
0
Transcript of Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.
![Page 1: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/1.jpg)
Learning To Use Memory
Nick Gorski & John LairdSoar Workshop 2011
![Page 2: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/2.jpg)
2
Agent
Memory & Learning
Memory Environment
action
observations
reward
![Page 3: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/3.jpg)
3
Agent
Actions, Internal & External
Environment
action
observations
reward
{go left, go right, eat food, bid 5 5s, pick a flower}
action
{store, retrieve, maintain}
Memory
![Page 4: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/4.jpg)
5
Agent
Internal Reinforcement Learning
Environment
action
observations
reward
Memory
ReinforcementLearning
Action Selection
![Page 5: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/5.jpg)
6
Assumptions
• Custom framework, not using Soar• Simple memory models• Simple tasks
![Page 6: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/6.jpg)
7
Learning to Use Memory
• Research Question: – When can agents learn to use memory?
• Idea: – Investigate dynamics of memory and environment
independently• Need:– Simple, parameterized task
![Page 7: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/7.jpg)
8
An Interactive TMaze
LEFT (observation)
{forward} (avail. actions)
![Page 8: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/8.jpg)
9
An Interactive TMaze
DECIDE(observation)
{left, right} (avail. actions)
![Page 9: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/9.jpg)
10
An Interactive TMaze
+1 (reward)
![Page 10: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/10.jpg)
11
TMaze
A/B
C
Base TMaze
Question: how much knowledge is needed to perform this task?
![Page 11: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/11.jpg)
12
Parameterized TMazes
A/B
C
Base TMaze Temporal Delay
A/B
C C
# Dependent Actions
D D
A/BX/Y
C
Concurrent Knowledge
A B W X Y Z
C
Amt. of Knowledge
A/B
C
2nd Order Knowledge
![Page 12: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/12.jpg)
13
Two Working Memory Models
• Internal action toggles between memory states
• Less expressive• Ungrounded knowledge
• Internal action stores current observation
• More expressive• Grounded knowledge
01
Bit memory
toggleA/B
Gated WM
gate
![Page 13: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/13.jpg)
14
TMaze
A/B
C
Base TMaze
![Page 14: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/14.jpg)
15
Bit Memory & TMaze
• Methodology:– Modify memory to
attribute blame• Interfering behavior
in choice location• Doesn’t manifest
with GWM
![Page 15: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/15.jpg)
16
State Diagram: Bit Memory TMaze
L L 1
true percept mem
L L 0
true percept mem
R R 1
true percept mem
R R 0
true percept mem
L C 1
true percept mem
R C 1
true percept mem
R C 0
true percept mem
L C 0
true percept mem
toggle
STARTING STATES
up upup up
left right
left right left right
toggle
toggle toggleleft right
![Page 16: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/16.jpg)
17
State Diagram: GWM TMaze
L L L
true percept mem
L L
true percept mem
R R
true percept mem
R R R
true percept mem
L C L
true percept mem
L C
true percept mem
R C
true percept mem
R C R
true percept mem
L C C
true percept mem
R C C
true percept mem
gate
STARTING STATES
gate
up upup up
gate gategate gateleft right
left right left right
left right
![Page 17: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/17.jpg)
18
Number of Dependent Actions
C
# Dependent Actions
D D
![Page 18: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/18.jpg)
19
What We’ve Learned
• Our machine learning intuition is often wrong (and yours probably is, too!)
• Chicken & Egg Problem• State ambiguity is very problematic to learning
![Page 19: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/19.jpg)
20
Chicken & Egg Problem
• Prospective uses of memory are hard• Case study: bit memory & TMazes
1/0
Bit memory
• Endemic across memory models
A/B
C
Base TMaze Chicken & Egg Problem:• Must learn an association between 1 & 0 and A & B• Must learn an association between 1 & 0 and left & right• To be effective, can’t self- interfere with memory in C!
![Page 20: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/20.jpg)
21
Implications for Soar
• Soar natively supports learning internal acts.• Next step: learning to use Soar’s memories• Learning alongside hand-coded procedural
knowledge is potentially strong approach• Soar got the WM model right• RL will never be a magic bullet
![Page 21: Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.](https://reader037.fdocuments.us/reader037/viewer/2022110304/5519ba755503465b578b495b/html5/thumbnails/21.jpg)
22
Nuggets & Coal
• Nearly finished!• Better understanding of
RL + memory, and thus Soar 9
• Parameterized, empirical evaluations of RL gaining traction
• Optimality not only metric of performance
• Not quite finished!• Qualitative results, but
no closed form results yet
• No recent results for long term memories
• Not immediately applicable to Soar