
Proactive planning using a hybrid temporal influence diagram for human assistive robots

Woo Young Kwon and Il Hong Suh

Abstract: For a robot to interact with a person effectively, it needs to predict future events that the person will cause. By predicting events, a robot can take preparatory actions that reduce the waiting time and greatly improve the interaction. To select the best proactive actions and the best times for those actions, we propose a hybrid temporal influence diagram that can make proactive plans that minimize expected waiting times in a human–robot interaction. To validate our proposed method, we show experimental results for a robotic assistant in a manual assembly task.

I. INTRODUCTION

Seamless interaction between a human and a robot is an important issue in human-centered robotic applications. When people request assistance from a robot, they expect the robot to finish the requested work and report back as soon as possible. However, many human–robot interactions follow a request-and-react pattern that becomes rigid turn-taking and induces delays. When the robot's reactions are frequently delayed, many people become frustrated and annoyed. To make the interaction seamless by reducing the waiting time, a robotic system should predict future situations on the basis of human intentions and behaviors. Predictive abilities facilitate anticipation and smart decision-making by allowing a robot to decide which action to perform in order to obtain or avoid a predicted situation and to minimize the waiting time for both humans and robots.

For example, a smart assistant robot in a kitchen may predict events in the cooking process and prepare equipment or ingredients accordingly. Robotic assistants on a manufacturing line that includes human workers may predict assembly tasks and prepare the components and tools that will be required. This anticipation allows the robot to deliver appropriate services without explicit requests, thus minimizing the interaction time between a human and a robot. Therefore, proactivity is essential for applications such as physical human–robot interactions [1], socially assistive robotics [2], and human–robot cooperative assembly tasks [3], [4].

Prediction in human–robot interactions requires predicting the nature and the time of an event simultaneously [6]. In order to predict human intentions and behaviors, robots are generally required to learn, in real time, the complex causal and temporal relationships that exist between multiple events. Moreover, these relationships can change in a dynamic and non-deterministic fashion.

W. Y. Kwon is with the Department of Electronics and Computer Engineering, Hanyang University, Korea.

I. H. Suh is with the Division of Computer Science and Engineering, College of Engineering, Hanyang University, Korea. All correspondence should be addressed to I. H. Suh.

Several researchers have proposed probabilistic approaches to predict future events from both temporal and causal perspectives [7], [8]. Among these approaches, the dynamic Bayesian network (DBN) proposed by Dean and Kanazawa [9] is one of the most widely known methods for predicting sequential events. The DBN provides a method to represent and infer temporal sequences of events by discretizing time and creating an instance of each random variable at each point in time [10]. DBN approaches are widely used for predictions of stochastic events. However, DBNs have several limitations for temporally predicting human-related events, because they use uniform time granularity and the first-order Markov assumption. Based on continuous time representations, Nodelman et al. [11] have proposed the continuous time Bayesian network. Moreover, Doya [12] has proposed reinforcement learning in the continuous time domain. However, in these methods the state-transition time can only be represented as an exponential distribution, because they still use the first-order Markov assumption in state transitions.

In order to predict both the nature and the time of an event simultaneously, the hybrid temporal Bayesian network was proposed, which models the time of an event as an explicit random variable in a continuous time domain [13], [14]. This method is based on a separation between the occurrence of an event and the start time of the event. By using this approach, both causal and temporal relationships between pairs of events can be represented within one framework.

However, proactivity entails more than only prediction of a future situation. Decision-theoretic approaches are required for planning robotic behaviors in order to minimize the waiting time with respect to a predicted future situation. Influence diagrams (IDs) [15] are a well-known mathematical representation of a decision situation in a probabilistic manner. However, they do not explicitly model the time of an event as a random variable. Therefore, IDs have limitations for proactive planning of robotic actions. In this paper, we propose a hybrid temporal ID (HTID) by combining a hybrid temporal Bayesian network with an ID. Thus, a robot system can make a proactive plan in order to facilitate a seamless human–robot interaction by reducing the sum of waiting times.

The rest of this paper is organized as follows. Section II presents the hybrid temporal Bayesian network to predict both the causality and time of a future situation in a probabilistic manner. Section III presents the HTID to determine simultaneously the nature and time of a proactive action. Section IV describes several methods of proactive planning. Section V presents the experimental results, and Section VI follows with concluding remarks.

2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6-10, 2013

978-1-4673-5642-8/13/$31.00 ©2013 IEEE

[Figure: observation trials (cases) of event occurrences over time, with the occurrence probability and temporal probability of each state: X = x1 (0.4), X = x2 (0.3), X = x3 (0.3).]

Fig. 1. Two-dimensional temporal diagrams of an event.

II. HYBRID TEMPORAL BAYESIAN NETWORK REVISITED

In this section, we revisit the hybrid temporal Bayesian network (HTBN) framework presented in [14], which represents the uncertainty and time of an event simultaneously as a single node. A connection between two nodes represents a causal and temporal relationship.

A. Temporal representation of probabilistic events

We now consider a temporal event $X$ with discrete states. Fig. 1 shows a representation of both aspects of the temporal event: the occurrence probability of the event, $X$, and the occurrence time of the event, $T_X$. Three different types of lines represent the occurrence time of an event with states $X = x_i$. The temporal probability of an event during a specific time interval can be modeled as the joint probability of a static random variable corresponding to the frequency of the event and a temporal random variable corresponding to the start time of the event. For example, the probability of lunch starting between 11 a.m. and 1 p.m. can be modeled as the joint probability of two random variables: the frequency of a lunch event each day and the start time of the lunch event. This means that the time interval probability of an event $X = x_i$ from $t_1$ to $t_2$ has the same meaning as the temporal probability of an event within a specific time interval. This is represented as $P(X = x_i, t_1 < T_X < t_2)$, where $X$ represents the relative frequency of the event and $T_X$ represents the start time of the event $X$. For brevity, we write $x_i$ for $X = x_i$. Then, the time interval probability can be rewritten as

$$P(x_i, t_1 < T_X < t_2) = P(x_i)\,P(t_1 < T_X < t_2 \mid x_i) = P(x_i) \int_{t_1}^{t_2} f_{X_i}(t_X)\, dt_X, \tag{1}$$

where $f_{X_i}(t_X)$ is the probability density function (PDF) of the temporal random variable $T_X$ when $X = x_i$, and $t_X$ is a value of $T_X$.
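As a worked illustration of (1), the following minimal sketch evaluates the time interval probability for the lunch example. It assumes a single Gaussian PDF for the start time; the numbers and the function name `interval_probability` are hypothetical and are not taken from the experiments.

```python
from statistics import NormalDist


def interval_probability(p_event, start_time, t1, t2):
    """P(x_i, t1 < T_X < t2) = P(x_i) * integral of f_{X_i} over (t1, t2)."""
    return p_event * (start_time.cdf(t2) - start_time.cdf(t1))


# Hypothetical numbers: lunch happens on 90% of days and starts around
# 12:00 with a standard deviation of 30 minutes (times in hours).
p_lunch = 0.9
lunch_start = NormalDist(mu=12.0, sigma=0.5)

prob = interval_probability(p_lunch, lunch_start, 11.0, 13.0)
print(f"P(lunch, 11 < T_X < 13) = {prob:.3f}")
```

The occurrence probability scales the whole result, so an event that rarely happens has a low interval probability even when its start time is highly predictable.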

[Figure: two panels, (a) and (b), over the nodes U, T_U, X, and T_X.]

Fig. 2. Causal and temporal relationship between two events: (a) conceptual network, (b) representation as a hybrid Bayesian network.

In fact, it is necessary to model causal and temporal relationships between events rather than modeling the intra-relationships of a temporal event. If two temporal events $U$ and $X$ occur sequentially, there can be a causal relationship such that $U$ is the cause and $X$ is the effect. Moreover, there can also be a temporal relationship between the temporal random variables $T_U$ and $T_X$. These causal and temporal relationships between two events are shown in Fig. 2(a), where the double-line notation represents coupled random variables with both causal and temporal probability. Considering the temporal event, $X$ and $T_X$ should be treated as an inseparable pair. These time intervals can be derived from the conditional probability of two events.

Given the observations of the temporal event $U$ with $U = u_j$ and $T_U = t_U$, the time interval probability of the temporal event $X$ with $X = x_i$ and $t_1 \le T_X \le t_2$ can be represented as $P(x_i, t_1 \le T_X \le t_2 \mid u_j, t_U)$. By using the conditional independence represented by the hybrid Bayesian network shown in Fig. 2(b), the time interval probability of the conditional temporal event is represented as

$$P(x_i, t_1 \le T_X \le t_2 \mid u_j, t_U) = P(x_i \mid u_j)\,P(t_1 \le T_X \le t_2 \mid t_U, x_i, u_j) = P(x_i \mid u_j) \int_{t_1}^{t_2} f_{Z_{ij}}(t_X - t_U)\, dt_X, \tag{2}$$

where $Z_{ij}$ is a conditional temporal random variable of the time interval between $T_U$ and $T_X$, given $U = u_j$ and $X = x_i$, and $f_{Z_{ij}}$ is the PDF of $Z_{ij}$. The causal and temporal relationships in (2) can be represented by the hybrid Bayesian network in Fig. 2(b).
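The factorization in (2) can be sketched as a lookup in a conditional probability table followed by an integral over the delay distribution. The sketch below assumes one Gaussian per pair $(u_j, x_i)$ for $Z_{ij}$; the state names, table entries, and parameters are hypothetical.

```python
from statistics import NormalDist

# Hypothetical parameters for one edge U -> X (not taken from the paper).
# CPT: P(X = x_i | U = u_j), keyed by (u_j, x_i).
CPT = {
    ("pick_body", "request_front"): 0.7,
    ("pick_body", "request_rear"): 0.3,
}
# Delay Z_ij = T_X - T_U in seconds, one Gaussian per (u_j, x_i).
DELAY = {
    ("pick_body", "request_front"): NormalDist(mu=8.0, sigma=2.0),
    ("pick_body", "request_rear"): NormalDist(mu=15.0, sigma=3.0),
}


def conditional_interval_probability(x, u, t_u, t1, t2):
    """P(x, t1 <= T_X <= t2 | u, t_u), following the factorization in (2)."""
    z = DELAY[(u, x)]
    # The density is a function of the elapsed time t_X - t_U only.
    return CPT[(u, x)] * (z.cdf(t2 - t_u) - z.cdf(t1 - t_u))


p = conditional_interval_probability(
    "request_front", "pick_body", t_u=10.0, t1=16.0, t2=20.0
)
print(f"P(request_front, 16 <= T_X <= 20 | pick_body at t = 10) = {p:.3f}")
```

Because only the elapsed time enters the density, the same parameters apply no matter when the cause is observed.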

B. Semantics of Hybrid Temporal Bayesian Network

An HTBN is defined by the joint probability of both temporal and causal relationships, where discrete nodes represent the frequency of events and continuous nodes represent the times of those events. In an HTBN, events are categorized into two types: temporal events and causal events.

First, a temporal event is employed when the time of the event has non-trivial information for modeling a certain situation. For example, an event of eating lunch can be represented by using both the occurrence and its time. Therefore, it includes a continuous random variable for time and a discrete random variable for causality. In this paper, we will use a coupled notation for a temporal event, $\mathbf{X} = \{X, T_X\}$.

Next, a causal event is employed when the time of an event is unnecessary for modeling a certain situation. For example, some events, such as a set of outcomes in probability theory, are irrelevant to time. When we consider a human–robot interaction scenario, personal characteristics such as appearance, name, age, and gender remain unchanged during the interactions. Therefore, it is unnecessary to model the times of those characteristics as random variables. These time-irrelevant events are modeled as discrete random variables, as in conventional hybrid Bayesian networks. Fig. 3(a) shows the conditional independence among temporal events and causal events in a similar way to a Bayesian network, while Fig. 3(b) shows its hybrid Bayesian network representation using discrete random variables and continuous temporal random variables.

[Figure: (a) network over temporal events X, Y_i ... Y_j, U_i ... U_j and causal events D_i ... D_j; (b) the same network over the nodes X, T_X, U_1 ... U_N, T_U1 ... T_UN, Y_1 ... Y_O, T_Y1 ... T_YO, and D_1 ... D_M.]

Fig. 3. HTBN representation: (a) a conceptual representation of HTBN, (b) HTBN as a hybrid Bayesian network.

As a result, an HTBN consists of three types of random variables:
• A discrete causal node of a temporal event (drawn as a rectangle in Fig. 3)
• A continuous temporal node of a temporal event (drawn as a circle)
• A discrete node of a causal event (drawn as a rectangle)

The semantics of an HTBN for an edge between two temporal events, such as $\mathbf{X} \to \mathbf{Y}$, are as follows:
• A causal relationship between two discrete causal random variables: $X \to Y$
• A temporal relationship between two continuous temporal random variables: $T_X \to T_Y$
• A hybrid relationship between a discrete causal random variable and a continuous temporal random variable for the same temporal event: $X \to T_X$ and $Y \to T_Y$
• A hybrid relationship between a causal parent node and a temporal child node: $X \to T_Y$

Moreover, the semantics of an HTBN for an edge between a causal event and a temporal event, such as $D \to \mathbf{X}$, are as follows:
• A causal relationship between two discrete causal random variables: $D \to X$

Finally, an HTBN employs a mixture of conditional Gaussian distributions for continuous temporal nodes and conditional probability tables for discrete causal nodes.
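The edge semantics listed above can be sketched as a small expansion routine that turns one conceptual HTBN edge into the edges of the underlying hybrid Bayesian network. The function names and the `T_` naming convention for temporal nodes are illustrative assumptions.

```python
def expand_temporal_edge(parent, child):
    """Expand a conceptual edge between two temporal events into hybrid BN edges."""
    return [
        (parent, child),                # causal: X -> Y
        (f"T_{parent}", f"T_{child}"),  # temporal: T_X -> T_Y
        (parent, f"T_{parent}"),        # hybrid, same event: X -> T_X
        (child, f"T_{child}"),          # hybrid, same event: Y -> T_Y
        (parent, f"T_{child}"),         # hybrid, causal parent to temporal child: X -> T_Y
    ]


def expand_causal_edge(causal_parent, temporal_child):
    """Expand an edge from a time-irrelevant causal event D to a temporal event X."""
    return [(causal_parent, temporal_child)]  # causal only: D -> X


for edge in expand_temporal_edge("X", "Y"):
    print(edge)
print(expand_causal_edge("D", "X"))
```

Keeping the expansion explicit makes it easy to check which conditional probability table or conditional Gaussian each node of the hybrid network needs.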

III. HYBRID TEMPORAL INFLUENCE DIAGRAM

A. Influence diagram

As mentioned above, seamless human–robot interactions require predictions of future situations. However, anticipating a future situation requires more than predicting it. In addition to the prediction problem, which can be solved by Bayesian networks, additional elements for decision making are required, such as decision variables, utility functions, and their relations. An ID is a probabilistic network for reasoning about decision making under uncertainty that can be seen as a Bayesian network augmented with decision nodes and value nodes [16].

An ID is a directed acyclic graph with three types of nodes and three types of arcs between nodes. A chance node represents a random variable whose value is governed by some probability distribution. It has the same meaning as in Bayesian networks. A decision node represents a decision variable whose value is to be chosen by the decision maker and which has deterministic values. A value node represents a real-valued utility function, and it cannot have a child. Arcs ending in chance nodes indicate that a probabilistic relationship might exist between the two events, as in Bayesian networks. Arcs ending in a decision node indicate that the decision at their heads is made on the basis of the known outcomes of all the nodes at their tails. Arcs ending in value nodes indicate functional relations for utility functions. An ID can model a complex decision problem in a compact form.
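To make the roles of the three node types concrete, the following minimal sketch evaluates a conventional (non-temporal) ID with one chance node, one decision node, and one value node. The states, probabilities, and utilities are hypothetical.

```python
# Chance node: which subassembly the person will request (hypothetical belief).
P_REQUEST = {"front": 0.6, "rear": 0.4}

# Decision node: which subassembly the robot prepares.
DECISIONS = ["prepare_front", "prepare_rear"]

# Value node: utility as a function of its parents (decision, chance state).
UTILITY = {
    ("prepare_front", "front"): 1.0,
    ("prepare_front", "rear"): -1.0,
    ("prepare_rear", "front"): -1.0,
    ("prepare_rear", "rear"): 1.0,
}


def expected_utility(decision):
    """Sum the utility over the chance node, weighted by its probability."""
    return sum(p * UTILITY[(decision, state)] for state, p in P_REQUEST.items())


best = max(DECISIONS, key=expected_utility)
print(best, round(expected_utility(best), 3))
```

This selects what to do but says nothing about when to do it, which is the gap that the temporal extension in the next subsection addresses.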

B. Hybrid temporal influence diagram

IDs are widely used for decision making under conditions of uncertainty; however, they have limitations if used for proactive human–robot interactions. IDs can provide a solution in terms of deciding the best actions, but deciding the times of those actions is a separate problem that IDs alone cannot solve. Therefore, we propose a novel method to determine both the nature and time of the actions by extending the HTBN with temporal decision nodes and temporal value nodes. The proposed decision network is called a hybrid temporal ID (HTID). As with an HTBN, a double-line notation is used for temporal chance nodes, temporal decision nodes, and temporal value nodes. A double-lined node indicates that both causal and temporal information are represented in that one node.

First, we present a formal definition of the HTID. An HTID is a directed acyclic graph $G = (\mathcal{U}, E)$, where the set of nodes $\mathcal{U}$ can be partitioned into three disjoint subsets: temporal chance nodes $\mathcal{U}_C$, temporal decision nodes $\mathcal{U}_D$, and temporal value nodes $\mathcal{U}_V$. Each node $\mathbf{C}$ in $\mathcal{U}_C$ is a composite random variable with discrete states and continuous time, $\mathbf{C} = \{C, T_C\}$, as for an HTBN. Moreover, each node $\mathbf{D}$ in $\mathcal{U}_D$ is a composite deterministic variable with discrete states and continuous time, $\mathbf{D} = \{D, t_D\}$, where $D$ represents a set of decisions and $t_D$ represents the time of a decision. Finally, each node $\mathbf{V}$ in $\mathcal{U}_V$ has an associated temporal utility function. The arguments of the temporal utility function are the discrete state values and the times of the predecessors of the value node.

[Figure: HTID with chance, decision, and temporal utility nodes; observed nodes O_1 ... O_N and unobserved nodes U_1 ... U_M.]

Fig. 4. Single temporal decision problem.

Fig. 4 shows an example of an HTID for a single decision problem, where $\mathbf{O}_1, \cdots, \mathbf{O}_N$ are observed temporal chance nodes and $\mathbf{U}_1, \cdots, \mathbf{U}_M$ are unobserved temporal chance nodes. Arcs into the temporal utility nodes represent functional relations for a temporal utility function. The arc between a temporal chance node and a temporal decision node indicates the temporal order for decisions; that is, the decision for a node should be made after the related temporal chance node is observed. In the diagram, the temporal utility is given as $U(c_i, t_C, d_j, t_D)$, using the state values and times of the related chance node and the related decision node. Moreover, the probability of the temporal chance node is given by $P(c_i, T_C \mid o_{1\cdots N}, t_{O_{1\cdots N}})$, where $o_{1\cdots N}$ and $t_{O_{1\cdots N}}$ abbreviate the set of discrete observations $o_1, \cdots, o_N$ and the set of times $t_{O_1}, \cdots, t_{O_N}$, respectively. Therefore, the expected utility of a decision $D = d_j$ at time $t_D$ for temporal observations $o_1, \cdots, o_N$ and times $t_{O_{1\cdots N}}$ is given by

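$$EU(d_j, t_D \mid o_{1\cdots N}, t_{O_{1\cdots N}}) = \int_{t_C} \int_{t_{U_{1\cdots M}}} \sum_{C} \sum_{\mathbf{U}_{1\cdots M}} U(C, t_C, d_j, t_D)\, P(C, t_C, \mathbf{U}_{1\cdots M}, t_{U_{1\cdots M}} \mid o_{1\cdots N}, t_{O_{1\cdots N}})\, dt_C\, dt_{U_{1\cdots M}}, \tag{3}$$

where $P(C, t_C, \mathbf{U}_{1\cdots M}, t_{U_{1\cdots M}} \mid o_{1\cdots N}, t_{O_{1\cdots N}})$ can be inferred by using the HTBN. By maximizing the expected temporal utility, we can select the best decision and its time.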

IV. PROACTIVE PLANNING WITH HYBRID TEMPORAL INFLUENCE DIAGRAM

A. Proactive planning for a single action

Let us consider human cooperation with an assistive robot. When a person needs help from an assistive robot, the human intention is “I wish to seek help from a robot.” Based on this intention, the person will request a service from the assistive robot. After the robot recognizes the intentional human behavior of requesting a service, it has to prepare and provide the requested service. However, there will be delays between the request and the provision of the requested service. Such delays make people frustrated and annoyed, and they also decrease the efficiency of human–robot cooperation. Providing responses or services in time is important in human-centered robot applications. In other words, time delays of interactions can be used to measure the efficiency of human–robot cooperation. Delays between the request for a service and the provision of the requested service are defined by using the kind of intentional human behavior $h_i$ and its time $t_H$, as well as the kind of service provided $r_j$ and its time $t_R$. Therefore, the utility function is expressed as $U(h_i, t_H, r_j, t_R)$. Fig. 5 shows a schematic of a proactive decision scenario using an HTID. Fig. 5(a) represents a probabilistic model for human behavior, and Fig. 5(b) represents a probabilistic model for proactive robotic action. These two models are related by a temporal value function.

[Figure: (a) human behavior model with non-intentional human behaviors O_1 ... O_N and U_1 ... U_M leading to the intentional human behavior; (b) proactive action model with the start of a proactive action and the result of a proactive action; both connect to a temporal utility node.]

Fig. 5. An HTID representation for a single proactive action.

In order to provide the right service at the right time, a robot is required to know the kind of intentional human behavior $h_i$, such as requesting a service, and its time $t_H$. Before an intentional human behavior is observed, it can be predicted in a probabilistic way from any relevant preceding behaviors. We call these preceding behaviors unintentional human behaviors. From the exemplary HTID in Fig. 5, the temporal probability of an intentional human behavior, $H = h_i$, $T_H = t_H$, given sets of discrete and temporal evidence $o_{1\cdots N}$ and $t_{O_{1\cdots N}}$, is represented as

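$$P(h_i, t_H \mid o_{1\cdots N}, t_{O_{1\cdots N}}) = \sum_{\mathbf{U}_{1\cdots M}} \int_{t_{U_{1\cdots M}}} P(h_i, t_H, \mathbf{U}_{1\cdots M}, t_{U_{1\cdots M}} \mid o_{1\cdots N}, t_{O_{1\cdots N}})\, dt_{U_{1\cdots M}} = P(h_i \mid o_{1\cdots N}) \sum_{j=1}^{N} \mathcal{N}(t_H - t_{O_j}; \mu_{ij}, \sigma^2_{ij}). \tag{4}$$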

This time interval probability can be inferred from an HTBN. After probabilistic modeling of human behaviors, the next step for proactive planning is modeling robotic actions. In general, providing robotic services is time-consuming, and thus there are some delays between a human requesting a service and a robot providing the requested service. In order to create a situation where a robot is ready to provide a service, each robot should proactively start several actions. We model a proactive action with two nodes: a decision node for the start of the proactive action and a chance node for the result of the proactive action. Once a robot starts a proactive action $s_j$ at time $t_S$, the resultant event and its time are given by

$$P(r_i, t_R \mid s_j, t_S) = P(r_i \mid s_j)\, \mathcal{N}(t_R - t_S; \mu_{ij}, \sigma^2_{ij}). \tag{5}$$

[Figure: HTID with non-intentional human behaviors O_1 ... O_N and U_1 ... U_M, two intentional human behaviors, two proactive actions (each with a start node and a result node), and two temporal utility nodes.]

Fig. 6. HTID representation for multiple proactive actions.

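In summary, from (3)–(5), the expected utility of a proactive action is given by

$$EU(s_i, t_S \mid o_{1\cdots N}, t_{O_{1\cdots N}}) = \sum_{H,R} \int_{t_H} \int_{t_R} U(H, t_H, R, t_R)\, P(H, t_H \mid o_{1\cdots N}, t_{O_{1\cdots N}})\, P(R, t_R \mid s_i, t_S)\, dt_H\, dt_R, \tag{6}$$

where the unobserved nodes $\mathbf{U}_{1\cdots M}$ and their times have already been marginalized out in (4).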

After several operations, (6) yields a function of $t_S$. By maximizing the expected utility, the best proactive action and its time are computed. Therefore, the optimal policy for a proactive action $s_i$ and its time $t_S$ is given by

$$\delta_S = \arg\max_{s_i \in S,\; t_S \in (-\infty, \infty)} EU(s_i, t_S \mid o_{1\cdots N}, t_{O_{1\cdots N}}). \tag{7}$$
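The maximization in (7) can be sketched numerically as a grid search over the action and its start time. The sketch below makes several simplifying assumptions that are not part of the original method: a single Gaussian component for the time of the human behavior, an action that always yields the matching result, a utility of $\mathcal{N}(t_H - t_R; 0, 1)$ for a matching service and $-1$ otherwise (the form used later in the experiments), and hypothetical parameter values. Under these assumptions the integrals in (6) have a closed form, because the expectation of $\mathcal{N}(d; 0, 1)$ for a Gaussian delay $d$ with mean $m$ and variance $v$ equals $\mathcal{N}(m; 0, 1 + v)$.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical model parameters (not from the paper), times in seconds.
P_H = {"front": 0.8, "rear": 0.2}                       # P(h | o)
H_DELAY = {"front": (20.0, 3.0), "rear": (25.0, 4.0)}   # mean, std of t_H - t_O
R_DELAY = {"front": (12.0, 1.0), "rear": (15.0, 1.5)}   # mean, std of t_R - t_S
MISMATCH_UTILITY = -1.0


def expected_utility(s, t_s, t_o=0.0):
    """Closed-form version of (6), assuming action s always yields result r = s."""
    total = 0.0
    for h, p_h in P_H.items():
        if h != s:
            total += p_h * MISMATCH_UTILITY
            continue
        mu_h, sd_h = H_DELAY[h]
        mu_r, sd_r = R_DELAY[s]
        m = (t_o + mu_h) - (t_s + mu_r)  # mean of the delay t_H - t_R
        v = sd_h ** 2 + sd_r ** 2        # variance of the delay t_H - t_R
        total += p_h * gaussian_pdf(m, 0.0, 1.0 + v)
    return total


def best_action(t_o=0.0, t_min=0.0, t_max=40.0, step=0.1):
    """Grid search for the (action, start time) pair that maximizes the expected utility."""
    n = int(round((t_max - t_min) / step))
    candidates = [(s, t_min + k * step) for s in P_H for k in range(n + 1)]
    return max(candidates, key=lambda c: expected_utility(c[0], c[1], t_o))


action, start_time = best_action()
print(action, round(start_time, 1))
```

With these numbers the search selects the more probable request and starts the action so that its expected completion time coincides with the expected time of the request. Restricting `t_min` to the present time keeps the plan executable.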

B. Proactive planning for multiple actions

In practice, many decision problems are complex and consist of multiple actions, chance nodes, and utilities. Fig. 6 shows a schematic of a proactive planning scenario for multiple actions. There are two proactive actions, $\mathbf{S}_1 = \{S_1, T_{S_1}\}$ and $\mathbf{S}_2 = \{S_2, T_{S_2}\}$, as well as two intentional human behaviors, $\mathbf{H}_1 = \{H_1, T_{H_1}\}$ and $\mathbf{H}_2 = \{H_2, T_{H_2}\}$. The delay between the result of the $i$th robotic proactive action and the $i$th intentional human behavior is modeled by the temporal utility function $U_i$.

The arc to the temporal decision $\mathbf{S}_2$ from $\mathbf{R}_1$ is called an information arc. Semantically, it specifies that $\mathbf{R}_1$ is observed before the robot decides on $\mathbf{S}_2$. Thus, this arc specifies the temporal ordering between the decision node and its parents. In order to determine the optimal policy for multiple proactive actions, we need to specify a temporal ordering over all variables in the HTID. The notation $A \prec B$ denotes that $A$ precedes $B$. In other words, $\prec$ indicates a total ordering among the variables in the HTID.

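The temporal ordering over all variables in Fig. 6 is given by

$$I_0 \prec D_1 \prec I_1 \prec D_2 \prec I_2, \tag{8}$$

where $D_1 = \mathbf{S}_1$, $D_2 = \mathbf{S}_2$, $I_0 = \{\}$, $I_1 = \{\mathbf{R}_1\}$, and $I_2 = \{\mathbf{H}_1, \mathbf{H}_2, \mathbf{R}_2, \mathbf{O}_{1\cdots N}, \mathbf{U}_{1\cdots M}\}$.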

Given a temporal ordering over all variables in the HTID, the optimal policy of the HTID is determined by extending the optimal policy equation of the conventional ID [17] as follows:

Theorem 1: Let an HTID be a hybrid temporal influence diagram over $\mathcal{U} = \mathcal{U}_C \cup \mathcal{U}_D \cup \mathcal{U}_V$. Let the temporal order of the variables be described as $I_0 \prec D_1 \prec \cdots \prec D_n \prec I_n$, and let $V = \sum_i V_i$. Then an optimal policy for $D_i$ is

$$\delta_{D_i}(I_0, D_1, \ldots, I_{i-1}) = \arg\max_{D_i} \sum_{I_i} \max_{D_{i+1}} \cdots \max_{D_n} \sum_{I_n} P(\mathcal{U}_C \mid \mathcal{U}_D)\, V.$$

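For example, the optimal policies for $\mathbf{S}_1$ and $\mathbf{S}_2$ in Fig. 6 are given by

$$\delta_{S_1} = \arg\max_{s_1 \in S_1,\; t_{S_1} \in (-\infty, \infty)} \sum_{R_1} \int_{t_{R_1}} \max_{s_2 \in S_2,\; t_{S_2} \in (T_{R_1}, \infty)} \sum_{H_1} \sum_{H_2} \int_{t_{H_1}} \int_{t_{H_2}} P(\mathcal{U}_C \mid \mathcal{U}_D)\, V\, dt_{H_1}\, dt_{H_2}, \tag{9}$$

and

$$\delta_{S_2} = \arg\max_{s_2 \in S_2,\; t_{S_2} \in (T_{R_1}, \infty)} \sum_{H_1} \sum_{H_2} \int_{t_{H_1}} \int_{t_{H_2}} P(\mathcal{U}_C \mid \mathcal{U}_D)\, V\, dt_{H_1}\, dt_{H_2}, \tag{10}$$

respectively. Here,

$$P(\mathcal{U}_C \mid \mathcal{U}_D) = P(R_1, t_{R_1} \mid s_1, t_{S_1})\, P(R_2, t_{R_2} \mid s_2, t_{S_2})\, P(H_1, t_{H_1})\, P(H_2, t_{H_2}) \tag{11}$$

and

$$V = V_1(R_1, t_{R_1}, H_1, t_{H_1}) + V_2(R_2, t_{R_2}, H_2, t_{H_2}). \tag{12}$$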

In (9) and (10), the range of $t_{S_2}$ is given by $t_{S_2} \in (T_{R_1}, \infty)$, which is a time constraint for the optimization. That is, the proactive action $\mathbf{S}_2$ should begin after the previous proactive action $\mathbf{S}_1$ is finished.
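The nested maximization with the time constraint can be illustrated with a deliberately simplified sketch. It assumes deterministic action durations (so that $T_{R_1}$ is known once $t_{S_1}$ is chosen and the sum and integral over $\mathbf{R}_1$ collapse), one fixed action type per step, a Gaussian time for each human behavior, and the utility $\mathcal{N}(t_H - t_R; 0, 1)$. All parameter values and function names are hypothetical.

```python
import math


def gaussian_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical parameters (seconds): predicted times of the two intentional
# human behaviors and fixed durations of the two proactive actions.
H_MEAN = [20.0, 25.0]
H_STD = [3.0, 3.0]
DURATION = [10.0, 10.0]


def expected_value(i, t_start):
    """Expected utility V_i when action i starts at t_start (closed form)."""
    delay_mean = H_MEAN[i] - (t_start + DURATION[i])
    return gaussian_pdf(delay_mean, 1.0 + H_STD[i] ** 2)


def plan_two_actions(now=0.0, horizon=40.0, step=0.5):
    """Maximize V_1 + V_2 subject to t_S1 >= now and t_S2 >= t_R1."""
    n = int(round((horizon - now) / step))
    grid = [now + k * step for k in range(n + 1)]
    best = None
    for t_s1 in grid:
        t_r1 = t_s1 + DURATION[0]                # action 1 finishes here
        feasible = [t for t in grid if t >= t_r1]
        if not feasible:
            continue
        # Inner maximization over the second action, as in (10).
        t_s2 = max(feasible, key=lambda t: expected_value(1, t))
        total = expected_value(0, t_s1) + expected_value(1, t_s2)
        if best is None or total > best[0]:
            best = (total, t_s1, t_s2)
    return best[1], best[2]


t_s1, t_s2 = plan_two_actions()
print(t_s1, t_s2)
```

Without the constraint, the two actions would start at 10 s and 15 s and overlap. With the constraint, the plan shifts the first action earlier and the second later, so both services deviate from their ideal times and some delay remains.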

V. EXPERIMENTS

In order to evaluate the proposed proactive planning method, we consider a toy car assembly task, as shown in Fig. 7. To complete the task, a person has to assemble the toy car shown in Fig. 7(b) from the various parts shown in Fig. 7(c). During the task, an assistive robot has to predict which type of subassembly the person will require and when. Moreover, the robot has to provide the required subassembly based on the predicted human intention before the person actually makes a request. The experimental environment is shown in Fig. 7(a). Here, all parts are located in boxes on the table. A position sensitive device (PSD) is attached inside each box in order to detect the hand entering and exiting the box. Two participants assemble a toy car that consists of 10 components and 10 assembly steps, as shown in Fig. 7(b).

Fig. 7. Illustration of a toy car assembly task: (a) experimental environment for the task, (b) an assembled toy car, (c) all components of the toy car.

[Figure: HTID with human activity nodes (Body, 1st Axis, 1st Joint, 1st Wheel, 2nd Wheel, 2nd Joint, Hood, Intention of 2nd Subassembly), robot activity nodes (Start to make a subassembly, Finished to make a subassembly), and a Temporal Utility node.]

Fig. 8. Example of HTID model for proactive action selection scenario.

We conducted 100 independent experiments of the assembly task to obtain data sets. From sensory signals, the system records experimental data, including which kind of part the person picks up and the time. By using these time-stamped data, we can model the HTBN and HTID for proactivity. In the two following scenarios, initial parameters are learned from experimental data by using the maximum likelihood estimation method. These initial parameters are updated at each trial by using maximum likelihood estimation based on the experimental data obtained at that trial.
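The per-trial parameter update can be sketched as an incremental maximum likelihood estimate of a Gaussian delay. The sketch uses Welford's running update, which is a choice made here for illustration rather than a method stated in the paper; the class name and the sample delays are hypothetical.

```python
class RunningGaussian:
    """Incremental maximum likelihood estimate of a Gaussian's mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the estimate (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Maximum likelihood variance (divides by n, not n - 1)."""
        return self._m2 / self.n if self.n else 0.0


# Hypothetical delays (seconds) between picking up a part and requesting
# a subassembly, observed over three trials.
delay_model = RunningGaussian()
for observed_delay in [8.0, 10.0, 12.0]:
    delay_model.update(observed_delay)

print(delay_model.mean, delay_model.variance)
```

Updating after every trial lets the predicted request times adapt to the individual worker without storing the full history.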

A. Proactive action selection scenario

In the first experiment, proactive planning for a single action is employed to show the interaction between a human and an assistive robot. Fig. 8 shows an HTID model for a human–robot cooperative scenario. All variables except 1st Axis and Intention of 2nd Subassembly have two discrete states: {occurrence, unknown}. The chance variables 1st Axis and Intention of 2nd Subassembly have the states {front, rear}. The upper part of Fig. 8 represents human activity. During the manual assembly task, the person uses a front or rear subassembly made by the assistive robot. Of these two subassemblies, the assistive robot makes and delivers only one. The nature and timing of the subassembly depend on human activities. The purpose of proactive planning in this scenario is to provide an appropriate subassembly at the right time with minimum delay. Therefore, we designed the temporal utility function as follows:

$$\begin{aligned}
U(H = \text{front}, R = \text{front}, t_H, t_R) &= \mathcal{N}(t_H - t_R; 0, 1) \\
U(H = \text{rear}, R = \text{rear}, t_H, t_R) &= \mathcal{N}(t_H - t_R; 0, 1) \\
U(H = \text{front}, R = \text{rear}, t_H, t_R) &= -1 \\
U(H = \text{rear}, R = \text{front}, t_H, t_R) &= -1
\end{aligned} \tag{13}$$

[Figure: plot of task execution time [sec] (axis range 50 to 75) against trials (1 to 6) for the proactive and reactive conditions.]

Fig. 9. Comparison of total task execution times for proactive action selection.

[Figure: HTID with human activity nodes (Body, 1st Joint, 2nd Joint, Hood, Intention of 1st Subassembly, Intention of 2nd Subassembly), robot activity nodes (Start to make 1st subassembly, Finished to make 1st subassembly, Start to make 2nd subassembly, Finished to make 2nd subassembly), and two temporal utility nodes.]

Fig. 10. Example of HTID model for proactive planning scenario.

When the robot provides the right type of subassembly, a standard normal density of the delay is employed as the temporal utility function. When the robot provides a type of subassembly different from the one demanded by the human activity, a constant negative penalty is used as the temporal utility. Parameters for the HTID models are initially given by experimental data from demonstrations of reactive action selection. Ten demonstrations are used for the initial parameters. Next, repetitive tasks are performed for proactive action selection.
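The temporal utility in (13) translates directly into a small function. The sketch below is one possible encoding; the function name and argument order are illustrative.

```python
import math


def temporal_utility(h, r, t_h, t_r, penalty=-1.0):
    """Temporal utility of (13): standard normal density of the delay, or a penalty."""
    if h != r:
        return penalty  # the wrong subassembly was provided
    delay = t_h - t_r
    return math.exp(-0.5 * delay * delay) / math.sqrt(2.0 * math.pi)


print(temporal_utility("front", "front", t_h=30.0, t_r=30.0))  # service exactly on time
print(temporal_utility("front", "front", t_h=30.0, t_r=33.0))  # service 3 s late
print(temporal_utility("front", "rear", t_h=30.0, t_r=30.0))   # wrong subassembly
```

Because the density depends on the delay only through its square, finishing early and finishing late by the same amount are penalized equally, and the utility peaks when the service is completed exactly at the time of the request.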

A comparison between experimental results for reactive action selection and proactive action selection is shown in Fig. 9. Parameters for the HTID are incrementally learned from trials. As the number of trials increases, the time delays in the interaction are reduced. Thus, the overall task execution time is reduced.

B. Proactive planning scenario for multiple actions

In the second experiment, proactive planning for multiple actions is employed. Fig. 10 shows an HTID model for a human–robot cooperative scenario. The upper part of Fig. 10 represents human activity. During the manual assembly task, the person uses two subassemblies made by the assistive robot. The utility function employed is the same as that in (13), and the parameter learning for the HTID model in Fig. 10 is the same as that in Section V-A.

A comparison between experimental results for on-demand (reactive) action selection and proactive planning is shown in Fig. 11. As the number of trials increases, the overall task execution time is reduced. However, the improvement in task execution time is not remarkable compared to that for a single proactive action selection, although the robot provides much more assistance than in the other experiment. The reason is that the robot cannot make proactive plans with no delay.

[Figure: plot of task execution time [sec] (axis range 50 to 80) against trials (1 to 6) for the proactive and reactive conditions.]

Fig. 11. Comparison of total task execution times for proactive planning.

[Figure: two panels, (a) and (b), each plotting probability against time from t = 0 for Proactive Action 1, Human Intentional Behavior 1, Proactive Action 2, and Human Intentional Behavior 2.]

Fig. 12. Exemplary temporal diagrams for proactive planning: (a) proactive plans without temporal constraints, (b) proactive plans with temporal constraints.

Fig. 12 shows why the robot cannot make proactive plans with no delay. If a robot finishes proactive actions at the expected time of an intentional human behavior, as shown in Fig. 12(a), there is no delay. However, zero delay is unrealistic due to temporal constraints. In fact, action planning has two temporal constraints: (1) a robot cannot perform two actions at the same time, and (2) the start time of a planned action should be after the present time. When these two temporal constraints are applied, the actual proactive plan is as given in Fig. 12(b), where the total execution time is greater than in Fig. 12(a). These temporal constraints are used in the optimal policy equations (9) and (10).

VI. CONCLUSION

In this paper, we have proposed a novel hybrid temporal influence diagram that can be used for proactive planning of seamless human–robot interactions. Our main contribution is that our proposed planning method is able to infer the best actions and their timing in order to minimize delays in human–robot interactions. By probabilistic temporal prediction of human-related situations, we model the temporal utility function as the delay between the expected time of a human intention and the provision time of a robotic service. By maximizing the temporal utility function based on a decision-theoretic approach, a proactive assistive robot can infer which action will reduce delays in human–robot interaction. Our experimental results show that the proposed method decreases the total task execution time by anticipating preparatory actions through predictions of human behaviors.

ACKNOWLEDGMENT

This work was supported by the Global Frontier R&D Program on “Human-centered Interaction for Coexistence” funded by the National Research Foundation of Korea grant funded by the Korean Government (MEST) (NRF-M1AXA003-2011-0028353). This work was also supported by the Industrial Strategic Technology Development Program (10044009) funded by the Ministry of Knowledge Economy (MKE, Korea).

REFERENCES

[1] T. Takeda, Y. Hirata, and K. Kosuge, “Dance partner robot cooperative motion generation with adjustable length of dance step stride based on physical interaction,” in 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2007, pp. 3258–3263.

[2] J. Kinugawa, Y. Tanaka, Y. Kawaai, Y. Sugahara, and K. Kosuge, “A path generation method for collision risk reduction and quantitative evaluation of assembly task partner robot,” in Advanced Intelligent Mechatronics (AIM), 2011 IEEE/ASME International Conference on. IEEE, 2011, pp. 409–415.

[3] Y. Demiris, “Knowing when to assist: Developmental issues in lifelong assistive robotics.” in Conference proceedings:... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, vol. 1, 2009, p. 3357.

[4] G. Hoffman and C. Breazeal, “Cost-based anticipatory action selection for human-robot fluency,” IEEE Transactions on Robotics, vol. 23, no. 5, p. 952, 2007.

[5] M. Huber, M. Rickert, A. Knoll, T. Brandt, and S. Glasauer, “Human-robot interaction in handing-over tasks,” in Robot and Human Interactive Communication, 2008. RO-MAN 2008. The 17th IEEE International Symposium on, 2008, pp. 107–112.

[6] N. Sebanz and G. Knoblich, “Prediction in joint action: What, when, and where,” Topics in Cognitive Science, vol. 1, no. 2, pp. 353–367, 2009.

[7] S. F. Galan, G. Arroyo-Figueroa, F. J. Díez, and L. E. Sucar, “Comparison of two types of event bayesian networks: A case study,” Appl. Artif. Intell., vol. 21, no. 3, pp. 185–209, 2007.

[8] S. Haider and A. Zaidi, “Transforming Timed Influence Nets into Time Sliced Bayesian Networks*,” Journal of Approximate Reasoning, vol. 30, pp. 181–202, 2002.

[9] T. Dean and K. Kanazawa, “A model for reasoning about persistence and causation,” Computational Intelligence, vol. 5, no. 2, pp. 142–150, 1989.

[10] G. Arroyo-Figueroa and L. Sucar, “Temporal Bayesian network of events for diagnosis and prediction in dynamic domains,” Applied Intelligence, vol. 23, no. 2, pp. 77–86, 2005.

[11] U. Nodelman, C. Shelton, and D. Koller, “Continuous time Bayesian networks,” in Proceedings of the Eighteenth International Conference on Uncertainty in Artificial Intelligence, 2002, pp. 378–387.

[12] K. Doya, “Reinforcement learning in continuous time and space,” Neural computation, vol. 12, no. 1, pp. 219–245, 2000.

[13] W. Kwon and I. Suh, “Towards proactive assistant robots for human assembly tasks,” in Proceedings of the 6th international conference on Human-robot interaction. ACM, 2011, pp. 175–176.

[14] W. Kwon and I. Suh, “A temporal bayesian network with application to design of a proactive robotic assistant,” in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 3685–3690.

[15] J. Pearl, “Influence diagrams-historical and personal perspectives,” Decision Analysis, vol. 2, no. 4, pp. 232–234, 2005.

[16] R. Howard and J. Matheson, “Influence diagrams,” Decision Analysis, vol. 2, no. 3, pp. 127–143, 2005.

[17] F. Jensen and T. Nielsen, Bayesian networks and decision graphs. Springer Verlag, 2007.
