Bill Murdock 1/44
J. William Murdock
Intelligent Decision Aids Group
Navy Center for Applied Research in Artificial Intelligence
Naval Research Laboratory, Code 5515
Washington, DC
[email protected]
http://bill.murdocks.org

Presentation for University of Maryland CMSC 722 (AI Planning)

Planning in the Context of Model-Based Adaptation
Bill Murdock 2/44
Adaptation
• People adapt very well.
  – They figure out how to do new things.
  – If something doesn't work, they try something else.
  – They understand how and why they are doing things.
• Computer programs do not adapt very well.
  – They can only do what they are programmed for.
  – They keep making the same mistakes.
  – They have no understanding of themselves.

Can we make computer programs adapt?
Bill Murdock 3/44
REM (Reflective Evolutionary Mind)
• Operating environment for intelligent agents
• Provides support for adaptation to new functional requirements
• Uses functional models, generative planning, and reinforcement learning
• J. William Murdock and Ashok K. Goel
Bill Murdock 4/44
Planning and REM
• REM is not a planning system; it's an agent environment.
  – Agents in REM don't just decide what to do, they actually do things.
  – They may base their next action on the results of past actions.
  – Once they have taken an action, they may not be able to undo that action.
  – REM consists of two major modules: execution of agents and adaptation of agents.
• However, research on REM has involved many planning issues.
  – Agents in REM are represented as hierarchies of tasks and methods; thus execution of agents resembles HTN planning.
  – One mechanism for adaptation of agents is traditional generative planning.
  – Agents encoded in REM may perform planning.
  – An agent in REM can act in a deterministic, logically-defined simulated environment; such agents blur the distinction between deciding what to do and doing things.
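The HTN-like execution of a task/method hierarchy can be sketched roughly as follows. This is a toy illustration, not REM's actual representation: the dictionary structure, the `execute` function, and the web-browsing action names are all invented for the example.

```python
# Hypothetical sketch of HTN-style execution: a non-primitive task delegates
# to its method's subtasks; a primitive task acts in the environment.
def execute(task, primitives):
    """Execute a task, returning a trace of primitive actions performed."""
    if task["name"] in primitives:                # primitive task: do it
        return [primitives[task["name"]]()]
    trace = []
    for subtask in task["method"]:                # non-primitive: decompose
        trace.extend(execute(subtask, primitives))
    return trace

# Toy agent: "browse" decomposes into request, receive, display.
agent = {"name": "browse",
         "method": [{"name": "request"}, {"name": "receive"}, {"name": "display"}]}
actions = {"request": lambda: "requested",
           "receive": lambda: "received",
           "display": lambda: "displayed"}
print(execute(agent, actions))  # ['requested', 'received', 'displayed']
```

Unlike a pure planner, each primitive here actually runs when it is reached, which is why execution cannot simply be backtracked.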
Bill Murdock 5/44
Example: Web Browsing Agent
• A mock-up of web browsing software
• Based on Mosaic for X Windows, version 2.4
• Imitates not only behavior but also internal process and information of Mosaic 2.4

[Figure: a document of initially unknown type (ps, pdf, txt, or html) arriving at the browser]
Bill Murdock 6/44
Example: Disassembly and Assembly
• Software agent for disassembly in the domain of cameras
  – Information about cameras
  – Information about relevant actions
    • e.g., pulling, unscrewing, etc.
  – Information about disassembly processing
    • e.g., decide how to disconnect subsystems from each other and then decide how to disassemble those subsystems separately.
• Agent now needs to assemble a camera
Bill Murdock 7/44
TMK (Task-Method-Knowledge)
• TMK models provide the agent with knowledge of its own design.
• TMK encodes:
  – Tasks: functional specification / requirements and results
  – Methods: behavioral specification / composition and control
  – Knowledge: domain concepts and relations

[Figure: an Access task decomposed into Request, Receive, and Store subtasks, operating over knowledge of URLs, servers, documents, etc., spanning Remote and Local]
Bill Murdock 8/44
REM Reasoning Process

[Figure: an unimplemented task plus a set of input values passes through Adaptation, producing an ADAPTED method and an ADAPTED implemented task; an implemented task with a method and a set of input values passes through Execution, producing a trace and a set of output values]
Bill Murdock 9/44
Adaptation Process

[Figure: a task and a set of input values enter the adaptation process, which produces an ADAPTED method and an ADAPTED implemented task via one of four routes: Proactive Model Transfer (from a similar implemented task and its method), Failure-Driven Model Transfer (from an existing method and an execution trace), Generative Planning, or a Situator (for Q-Learning)]
Bill Murdock 10/44
Execution Process

[Figure: an implemented task, a method, and a set of input values drive a loop of Select Method, Select Next Task Within Method, and Execute Primitive Task, producing a trace and a set of output values]
Bill Murdock 11/44
Selection: Q-Learning
• Popular, simple form of reinforcement learning.
• In each state, each possible decision is assigned an estimate of its potential value ("Q").
• For each decision, preference is given to higher Q values.
• Each decision is reinforced, i.e., its Q value is altered based on the results of the actions.
• These results include actual success or failure and the Q values of next available decisions.
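The reinforcement rule described above can be sketched as a standard tabular Q-learning update. The state/action names and the learning-rate and discount values are illustrative, not REM's actual configuration.

```python
# Minimal tabular Q-learning update: move Q(state, action) toward
# reward + gamma * max over next decisions, as described above.
def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Update and return the Q value for (state, action)."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(state, action)]

Q = {}
# Reinforce a decision that led directly to success (reward = 1).
q_update(Q, "select-method", "method-A", 1.0, "done", ["method-A", "method-B"])
print(Q[("select-method", "method-A")])  # 0.5
```

Note how the update blends the immediate result (success/failure) with the Q values of the next available decisions, exactly the two result sources listed above.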
Bill Murdock 12/44
Q-Learning in REM
• Decisions are made for method selection and for selecting new transitions within a method.
• A decision state is a point in the reasoning (i.e., task, method) plus a set of all decisions which have been made in the past.
• Initial Q values are set to 0.
• Decides on the option with the highest Q value, or randomly selects an option with probabilities weighted by Q value (configurable).
• A decision receives positive reinforcement when it leads immediately (without any other decisions) to the success of the overall task.
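The two configurable selection regimes (greedy vs. Q-weighted random) can be sketched as follows; the option names and the small shift that keeps weights non-negative are illustrative choices, not details from REM.

```python
import random

# Sketch of the two selection regimes: pick the highest-Q option, or pick
# randomly with probabilities weighted by (shifted) Q values.
def choose(options, Q, greedy=True, rng=random):
    qs = [Q.get(o, 0.0) for o in options]
    if greedy:
        return options[qs.index(max(qs))]
    lo = min(qs)
    weights = [q - lo + 1e-6 for q in qs]   # shift so weights are positive
    return rng.choices(options, weights=weights)[0]

Q = {"method-A": 0.8, "method-B": 0.1}
print(choose(["method-A", "method-B"], Q))  # method-A
```

With all Q values initialized to 0, the weighted-random mode initially explores uniformly, then drifts toward reinforced decisions.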
Bill Murdock 13/44
Task-Method-Knowledge Language (TMKL)
• A new, powerful formalism of TMK developed for REM.
• Uses LOOM, a popular off-the-shelf knowledge representation framework: concepts, relations, etc.

REM models not only the tasks of the domain but also itself in TMKL.
Bill Murdock 14/44
Tasks in TMKL
• All tasks can have input & output parameter lists and given & makes conditions.
• A non-primitive task must have one or more methods which accomplish it.
• A primitive task must include one or more of the following: source code, a logical assertion, a specified output value.
• Unimplemented tasks have neither methods nor a primitive implementation.
Bill Murdock 15/44
TMKL Task
(define-task communicate-with-www-server
  :input (input-url)
  :output (server-reply)
  :makes (:and (document-at-location (value server-reply)
                                     (value input-url))
               (document-at-location (value server-reply)
                                     local-host))
  :by-mmethod (communicate-with-server-method))
Bill Murdock 16/44
Methods in TMKL
• Methods have provided and additional result conditions which specify incidental requirements and results.
• In addition, a method specifies a start transition for its processing control.
• Each transition specifies requirements for using it and a new state that it goes to.
• Each state has a task and a set of outgoing transitions.
Bill Murdock 17/44
Simple TMKL Method
(define-mmethod external-display
  :provided (:not (internal-display-tag (value server-tag)))
  :series (select-display-command
           compile-display-command
           execute-display-command))
Bill Murdock 18/44
Complex TMKL Method

(define-mmethod make-plan-node-children-mmethod
  :series (select-child-plan-node
           make-subplan-hierarchy
           add-plan-mappings
           set-plan-node-children))

(tell (transition>links make-plan-node-children-mmethod-t3
                        equivalent-plan-nodes
                        child-equivalent-plan-nodes)
      (transition>next make-plan-node-children-mmethod-t5
                       make-plan-node-children-mmethod-s1)
      (:create make-plan-node-children-terminate transition)
      (reasoning-state>transition make-plan-node-children-mmethod-s1
                                  make-plan-node-children-terminate)
      (:about make-plan-node-children-terminate
              (transition>provided
                '(terminal-addam-value (value child-plan-node)))))
Bill Murdock 19/44
Knowledge in TMKL
Foundation: LOOM
– Concepts, instances, relations
– Concepts and relations are instances and can have facts about them.

Knowledge representation in TMKL involves LOOM + some TMKL-specific reflective concepts and relations.
Bill Murdock 20/44
Some TMKL Knowledge Modeling

(defconcept location)
(defconcept computer :is-primitive location)
(defconcept url :is-primitive location :roles (text))
(defrelation text :range string :characteristics :single-valued)
(defrelation document-at-location :domain reply :range location)
(tell (external-state-relation document-at-location))
Bill Murdock 21/44
Sample Meta-Knowledge in TMKL
• relation characteristics
  – single-valued / multiple-valued
  – symmetric, commutative
• relations over relations
  – external/internal
  – state/definitional
• generic relations
  – same-as
  – instance-of
  – inverse-of
• concepts involving concepts
  – thing
  – meta-concept
  – concept
Bill Murdock 22/44
Web Browsing Agent
• Interactive domain: Web agent is affected by the user and by the network
• Dynamic domain: Both users and networks often change
• Knowledge-intensive domain: Documents, networks, servers, local software, etc.

Mock-up of a web browser: steps through the web-browsing process
Bill Murdock 23/44
Tasks and Methods of Web Agent

[Figure: task-method hierarchy. Process URL is accomplished by the Process URL Method, which decomposes into Communicate with WWW Server and Display File. The Communicate with WWW Server Method decomposes into Request from Server and Receive from Server. The Display File Method decomposes into Interpret Reply and Display Interpreted File, with External Display and Internal Display as alternatives; External Display comprises Select Display Command, Compile Display Command, and Execute Display Command, while Internal Display uses Execute Internal Display.]
Bill Murdock 24/44
Example: PDF Viewer
• The web agent is asked to browse the URL for a PDF file. It does not have any information about external viewers for PDF.
• Because the agent already has a task for browsing URLs, it is executed first.
• When the system fails, the user provides feedback indicating the correct viewer.
• Failure-Driven Model Transfer
Bill Murdock 25/44
Web Agent Adaptation

[Figure: the External Display method originally runs Select Display Command, Compile Display Command, and Execute Display Command in series. After adaptation, Select Display Command becomes a task with two alternatives — Select Display Command Base Method and Select Display Command Alternate Method — wrapping Select Display Command Base Task and Select Display Command Alternate Task.]
Bill Murdock 26/44
Physical Device Disassembly
• ADDAM: Legacy software agent for case-based, design-level disassembly planning and (simulated) execution
• Interactive: Agent connects to a user specifying goals and to a complex physical environment
• Dynamic: New designs and demands
• Knowledge-intensive: Designs, plans, etc.
Bill Murdock 27/44
Disassembly → Assembly
• A user with access to the ADDAM disassembly agent wishes to have this agent instead do assembly.
• ADDAM has no assembly method and thus must adapt first.
• Since assembly is similar to disassembly, REM selects Proactive Model Transfer.
Bill Murdock 28/44
Adaptation Using Relation Mapping
• Requires a model for an existing agent which has a task similar to the desired task.
  – e.g., disassembly is similar to assembly
• Specifically, the effects (:makes slot) of the two tasks must match except for one term, and that one term must be connected by a single relation.
  – e.g., disassembly produces a disassembled world state, assembly produces an assembled world state, and (inverse-of disassembled assembled) is known.
• Uses that relation to alter lower-level tasks and methods which relate to it.
  – as indicated by the TMK model
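The matching condition above can be sketched as a simple check: the two :makes conditions must be identical except for one term, and the differing terms must be linked by a known relation. This is a heavily simplified illustration (conditions as flat same-length lists, only inverse-of as the linking relation), not ADDAM's or REM's actual matcher.

```python
# Illustrative table of known inverse relations.
INVERSES = {"disassembled": "assembled", "assembled": "disassembled"}

def mapping_relation(makes_old, makes_new):
    """Return the differing (old, new) term pair if exactly one term differs
    and the terms are known inverses; otherwise None."""
    diffs = [(a, b) for a, b in zip(makes_old, makes_new) if a != b]
    if len(diffs) == 1 and INVERSES.get(diffs[0][0]) == diffs[0][1]:
        return diffs[0]
    return None

print(mapping_relation(["disassembled", "device"], ["assembled", "device"]))
# ('disassembled', 'assembled')
```

Once the single linking relation is found, it is this pair that drives the rewriting of the lower-level tasks and methods.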
Bill Murdock 29/44
Pieces of ADDAM which are key to Disassembly → Assembly

[Figure: Disassemble is accomplished by Plan Then Execute Disassembly, which decomposes into Adapt Disassembly Plan and Execute Plan. Adapt Disassembly Plan uses Topology Based Plan Adaptation, comprising Make Plan Hierarchy (with the Make Equivalent Plan Nodes Method: Make Equivalent Plan Node, Add Equivalent Plan Node) and Map Dependencies (Select Dependency, Assert Dependency). Execute Plan uses Hierarchical Plan Execution (Select Next Action, Execute Action).]
Bill Murdock 30/44
New Adapted Task in Disassembly → Assembly

[Figure: the adapted Assemble task mirrors the disassembly hierarchy, with COPIED versions of Plan Then Execute Disassembly, Adapt Disassembly Plan, Execute Plan, Topology Based Plan Adaptation, Make Plan Hierarchy, Make Equivalent Plan Nodes Method, Make Equivalent Plan Node, Add Equivalent Plan Node, Map Dependencies, Select Dependency, and Hierarchical Plan Execution (Select Next Action, Execute Action unchanged); Assert Dependency is INVERTED, and Inversion Task 1 and Inversion Task 2 are INSERTED.]
Bill Murdock 31/44
Task: Assert Dependency

Before:
define-task Assert-Dependency
  input: target-before-node, target-after-node
  asserts: (node-precedes (value target-before-node)
                          (value target-after-node))

After:
define-task Mapped-Assert-Dependency
  input: target-before-node, target-after-node
  asserts: (node-follows (value target-before-node)
                         (value target-after-node))
Bill Murdock 32/44
Task: Make Equivalent Plan Node

define-task make-equivalent-plan-node
  input: base-plan-node, parent-plan-node, equivalent-topology-node
  output: equivalent-plan-node
  makes: (:and
           (plan-node-parent (value equivalent-plan-node)
                             (value parent-plan-node))
           (plan-node-object (value equivalent-plan-node)
                             (value equivalent-topology-node))
           (:implies (plan-action (value base-plan-node))
                     (type-of-action (value equivalent-plan-node)
                                     (type-of-action (value base-plan-node)))))
  by procedure ...
Bill Murdock 33/44
Task: Inserted-Reversal-Task

define-task inserted-reversal-task
  input: equivalent-plan-node
  asserts: (type-of-action
             (value equivalent-plan-node)
             (inverse-of
               (type-of-action
                 (value equivalent-plan-node))))
Bill Murdock 34/44
Adaptation Using Generative Planning
• Does not require a pre-existing model. Only requires operators and a set of facts (initial state).
• Invokes Graphplan
  – Operators = those primitive tasks known to the agent which can be translated into Graphplan's operator language
  – Facts = known assertions which involve relations referred to by the operators
  – Goal = makes condition of main task
• Translates plan into more general method by turning specific objects into parameters
• Stores method for later reuse
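The generalization step — turning specific objects in the returned plan into parameters — can be sketched as follows. The plan format (operator name plus ground arguments) and the ?x0-style parameter names are illustrative assumptions, not REM's actual encoding.

```python
# Sketch of generalizing a ground plan into a reusable method: each distinct
# object is replaced, consistently, by a fresh parameter.
def generalize(plan):
    """Return (parameterized method, object-to-parameter mapping)."""
    params = {}
    method = []
    for op, *args in plan:
        new_args = [params.setdefault(a, f"?x{len(params)}") for a in args]
        method.append((op, *new_args))
    return method, params

plan = [("unscrew", "screw-3", "cover-1"), ("pull", "cover-1")]
print(generalize(plan)[0])
# [('unscrew', '?x0', '?x1'), ('pull', '?x1')]
```

Consistent substitution matters: "cover-1" maps to the same parameter in both steps, so the stored method preserves the plan's internal structure while shedding the specific objects.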
Bill Murdock 35/44
Adaptation Using Situated Learning
• Does not require a pre-existing model. Does not even require preconditions and postconditions of the operators.
• Creates a method which performs any action, checks to see if the desired state has been achieved, and if not, loops.
• All decision making is done by Q-learning during execution.
• Over time, the Q-learning mechanism learns to select actions which tend to lead to desirable results.
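The generated act-check-loop method can be sketched as below. Everything here is illustrative: the action names, the toy state (a set of performed actions), and the pluggable `choose` function standing in for the Q-learning decision maker described above.

```python
import random

# Sketch of the situated-learning method: pick any action, apply it, check
# whether the desired state holds, and loop if not.
def situated_method(actions, goal_holds, choose=random.choice, max_steps=1000):
    """Repeatedly choose and apply an action until the goal test passes."""
    state, steps = set(), 0
    while not goal_holds(state) and steps < max_steps:
        act = choose(actions)     # in REM this choice is made by Q-learning
        state.add(act)            # toy effect: just record the action
        steps += 1
    return goal_holds(state)

print(situated_method(["a", "b", "c"], lambda s: "c" in s))  # True
```

The method itself encodes no domain knowledge at all; all of the intelligence accumulates in the Q values as the loop's decisions are reinforced.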
Bill Murdock 37/44
Roof Assembly

[Figure: log-scale plot of elapsed time (seconds, 1 to 1,000,000) versus number of boards (1 to 7), comparing Situated Learning, Relation Mapping, and Generative Planning]
Bill Murdock 38/44
Modified Roof Assembly: No Conflicting Goals

[Figure: log-scale plot of elapsed time (seconds, 1 to 100,000) versus number of boards (1 to 7), comparing Situated Learning, Relation Mapping, and Generative Planning]
Bill Murdock 39/44
Computational Costs
• Reasoning about models incurs some costs.
  – For very easy problems, this overhead may not be justified.
  – For other problems, the benefits enormously outweigh these costs.

Models can localize planning and learning.
Bill Murdock 40/44
Knowledge Requirements
• Someone has to build an agent.
• Builder should know what that agent does and how it does it → can make model.
• Analyst may be able to understand builder's notes, etc. → can make model.
• Some evidence for this in the context of software engineering / architectural extraction.
Bill Murdock 41/44
REM and SHOP: Comparison
• Primitive actions in REM involve interacting with a (possibly simulated) external environment.
  – Performing the same action twice in the same situation may have different results; actions may be difficult or impossible to reverse.
  – Thus REM can't backtrack; if it fails, it has to start over with updated Q-values or symbolic adaptation.
  – SHOP actions can include calls to arbitrary LISP code. However, that code can't perform actions that aren't undone by backtracking.
  – Backtracking in SHOP is expensive, but not nearly as expensive as starting over.
• REM is able to build new methods and modify existing methods. Methods in SHOP do not change.
Bill Murdock 42/44
REM and SHOP: Implications
• On problems that REM and SHOP can both address, SHOP is probably much faster because it can backtrack.
• One can write SHOP methods which leave many more decisions unresolved than one can for REM.
• However, SHOP can't address problems where the effects of operators cannot be backtracked over.
• If SHOP does not have a method for some task, it always fails. REM can build new methods for tasks.
• If REM's methods for some task produce incorrect results, REM can modify those methods. SHOP cannot do this.
Bill Murdock 43/44
Current Work: AHEAD
• Theme: Analyzing hypotheses regarding asymmetric threats (e.g., criminals, terrorists).
  – Input: Hypotheses regarding a potential threat
  – Output: Argument for and/or against the hypotheses
• Technique: Analogy over functional models
  – An extension to TMKL will encode known behaviors for asymmetric threats and the purposes that the behaviors serve.
  – Analogical reasoning will enable retrieval and mapping of new hypotheses to existing models.
  – Models will provide arguments about how observed actions do or do not support the purposes of the hypothesized behavior.
• Naval Research Laboratory / DARPA Evidence Extraction and Link Discovery program
• David Aha, J. William Murdock, Len Breslow
Bill Murdock 44/44
Summary
• REM (Reflective Evolutionary Mind)
  – Operating environment for agents that adapt
• TMKL (Task-Method-Knowledge Language)
  – The language for agents in REM
  – Functional modeling language for encoding computational processes
• Adaptation
  – Some kinds of adaptation can be performed using specialized model-based techniques
  – Others require more generic planning & learning mechanisms (localized using models)