
Asking for Help Using Inverse Semantics

Stefanie Tellex, Ross Knepper, Adrian Li, Daniela Rus, Nicholas Roy

How can robots collaborate with people using natural language?

● Following instructions: “Put the metal crate on the truck.”

Symbol Grounding Problem

“The pallet of boxes on the left.”


Different speakers refer to the same objects and actions in many different ways:

Move the pallet from the truck.

Remove the pallet from the back of the truck.

Offload the metal crate from the truck.

Pick up the silver container from the truck bed.


How can robots collaborate with people using natural language?

● Following instructions: “Put the metal crate on the truck.”

● Asking questions: “What does 'the metal crate' refer to?”

● Requesting help: “Hand me the black leg that is under the table.”

Offload the metal crate from the truck.

What does 'the metal crate' refer to?

The box pallet near the ammo pallet.

“Put the pallet on the truck.”

What does 'the truck' refer to?

What does 'the pallet' refer to?

Information-theoretic Human-Robot Dialog

● Identify uncertain parts of the command: Generalized Grounding Graphs model word meanings.

● Ask a targeted question: a metric based on entropy selects questions (a sketch follows this slide).

● Use information from the answer to infer better actions: merge graphs based on linguistic coreference.

Joint work with Robin Deits, Pratiksha Thaker
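To make the entropy metric concrete, here is a minimal sketch, not the authors' implementation: each phrase has a distribution over candidate groundings, and the robot asks about the phrase whose distribution has the highest entropy. The grounding names and probabilities are made up for illustration.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Assumed grounding distributions; the objects and numbers are illustrative only.
beliefs = {
    "the metal crate": {"box_pallet": 0.4, "ammo_pallet": 0.35, "tire_pallet": 0.25},
    "the truck":       {"truck_1": 0.9, "truck_2": 0.1},
}

# Ask a targeted question about the most uncertain phrase.
phrase = max(beliefs, key=lambda ph: entropy(beliefs[ph]))
print(f"What does '{phrase}' refer to?")  # -> What does 'the metal crate' refer to?
```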


Help me!


Please hand me the white table leg.


Solution Overview

1. Detect the failure.

2. Infer an action to fix the problem.

3. Infer a natural language sentence describing the action.

4. Replan from the updated state after the human has provided help.

Prior Work

Understanding Language: Matuszek et al., 2012; MacMahon et al., 2006; Dzifcak et al., 2009; Kollar et al., 2010; Tellex et al., 2011.

Generating Language: Jurafsky and Martin, 2008; Reiter and Dale, 2000; Striegnitz et al., 2011; Garoufi and Koller, 2011; Chen and Mooney, 2011; Golland et al., 2010; Krahmer et al., 2012.

This work: both understanding and generation.

Prior Work: Unifying Generation and Understanding

● Goodman and Stuhlmueller (2013)
– Bayesian approach to generate and understand language.
– Bag-of-words models of semantics demonstrated in simulation.

● Vogel et al. (2013)
– DEC-POMDP showing that Gricean maxims emerge from multiagent interaction.
– Bag-of-words models of semantics demonstrated in simulation.

● This work
– Bayesian approach to generate and understand grounded language for robots.
– Compositional grounded semantics demonstrated on an end-to-end robotic system.

● Dragan and Srinivasa (2012)
– Analogous mathematical framework for gesture interpretation and production.


Assembly System

● STRIPS-style symbolic planner to assemble a piece of furniture (Knepper et al., 2013).

● Pre- and post-conditions for each action:

action attach_leg_to_top(robot(Robot), leg(Leg), table_top(TableTop)) {
    pre {
        robot.arm.holding == leg;
        table_top.hole[0].attached_to == None;
    }
    post {
        robot.arm.holding = None;
        table_top.hole[0].attached_to = leg.hole;
        leg.hole.attached_to = table_top.hole[0];
    }
}
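To make the planner's use of these conditions concrete, here is a minimal Python sketch, not the authors' implementation, of how a STRIPS-style planner might test an action's preconditions against a symbolic state and apply its postconditions; the flat dict-based state encoding of attach_leg_to_top is an assumption for illustration.

```python
# Minimal STRIPS-style action application over a flat dict-based state.
def applicable(state, pre):
    """An action is applicable when every precondition holds in the state."""
    return all(state.get(var) == val for var, val in pre.items())

def apply_action(state, post):
    """Applying an action overwrites state variables with the postconditions."""
    new_state = dict(state)
    new_state.update(post)
    return new_state

# Hypothetical encoding of attach_leg_to_top from the slide.
pre  = {"robot.arm.holding": "leg", "table_top.hole[0].attached_to": None}
post = {"robot.arm.holding": None,
        "table_top.hole[0].attached_to": "leg.hole",
        "leg.hole.attached_to": "table_top.hole[0]"}

state = {"robot.arm.holding": "leg",
         "table_top.hole[0].attached_to": None,
         "leg.hole.attached_to": None}
if applicable(state, pre):
    state = apply_action(state, post)
print(state)
```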


Infer an Action

● Rule-based heuristic to generate the symbolic request from the failed symbolic condition (a lookup sketch follows the table):

Failed symbolic condition | Symbolic request
part.visible == True; | locate_part(robot, part)
robot.arm.holding == leg; | give_part(robot, part)
leg.aligned == top.hole[0]; | align_with_hole(leg, top, hole)
leg.hole.attached == top.hole[0]; | screw_in_leg(leg, top, hole)
top.upside_down == True; | flip(top)
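A minimal sketch of how such a rule table might be realized, assuming the failed condition arrives as a string; the table contents mirror the slide, everything else is illustrative.

```python
# Hypothetical rule table: failed symbolic condition -> symbolic request.
REPAIR_RULES = {
    "part.visible == True":             "locate_part(robot, part)",
    "robot.arm.holding == leg":         "give_part(robot, part)",
    "leg.aligned == top.hole[0]":       "align_with_hole(leg, top, hole)",
    "leg.hole.attached == top.hole[0]": "screw_in_leg(leg, top, hole)",
    "top.upside_down == True":          "flip(top)",
}

def symbolic_request(failed_condition):
    """Look up the help request for a failed condition, or None if unknown."""
    return REPAIR_RULES.get(failed_condition)

print(symbolic_request("robot.arm.holding == leg"))  # give_part(robot, part)
```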


Infer a Natural Language Sentence

● Input: symbolic request.
● Output: natural language request.

Template baseline:

Symbolic request | Natural language request
locate_part(robot, part) | Find the part.
give_part(robot, part) | Give me the part.
align_with_hole(leg, top, hole) | Align the part with the hole.
screw_in_leg(leg, top, hole) | Screw in the leg.
flip(top) | Flip the table.

Hand me the part.

Hand me the white leg.

Hand me the white leg that is on the table.

[Figure: the scene for “Hand me the white leg that is on the table,” with groundings γ1, γ2, γ3 labeled.]

γk are groundings: objects, places, paths, and events in the external world. Each γk corresponds to a constituent phrase in the language input.

Semantics

Understanding (forward semantics): what groundings match the language?

$\arg\max_{\gamma_1 \ldots \gamma_N} f(\gamma_1 \ldots \gamma_N, \text{language})$

In the G3 framework (Tellex et al., 2011), this is probabilistic inference over a factored model:

$\arg\max_{\gamma_1 \ldots \gamma_N} p(\gamma_1 \ldots \gamma_N \mid \text{language}) = \arg\max_{\gamma_1 \ldots \gamma_N} \frac{1}{Z} \prod_i g_i(\gamma_1 \ldots \gamma_N, \text{language})$

Searching for Groundings

“Hand me the white leg on the black table.”

$\arg\max_{\gamma_1 \ldots \gamma_N} \frac{1}{Z} \prod_i g_i(\gamma_1 \ldots \gamma_N, \text{language})$

The search scores candidate groundings for each constituent, working up from phrases to the whole command (a brute-force sketch follows). For example:

g1(γ, “the black table”) = 0.1 for one candidate table, 0.9 for another.

g1(γ, “the white leg on the black table”) = 0.1 for one candidate leg, 0.9 for another.

g1(γ, “Hand me the white leg on the black table”) = 0.1 for one candidate action, 0.9 for another.
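A minimal brute-force sketch of this grounding search; the candidate objects and factor scores are made up, standing in for learned G3 factors over a perceived scene.

```python
import math
from itertools import product

# Made-up candidate groundings per phrase (stand-ins for perceived objects).
CANDIDATES = {
    "the black table": ["table_black", "table_white"],
    "the white leg":   ["leg_white_1", "leg_black_1"],
}
# Made-up factor scores (stand-ins for learned G3 factors).
SCORES = {
    ("the black table", "table_black"): 0.9,
    ("the black table", "table_white"): 0.1,
    ("the white leg", "leg_white_1"):   0.9,
    ("the white leg", "leg_black_1"):   0.1,
}

def g(phrase, grounding):
    """Stand-in factor: how well does this grounding match this phrase?"""
    return SCORES.get((phrase, grounding), 0.1)

def best_groundings():
    """Brute-force argmax over joint assignments of the factor product."""
    phrases = list(CANDIDATES)
    assignments = product(*CANDIDATES.values())
    score = lambda gs: math.prod(g(ph, gr) for ph, gr in zip(phrases, gs))
    return dict(zip(phrases, max(assignments, key=score)))

print(best_groundings())  # {'the black table': 'table_black', 'the white leg': 'leg_white_1'}
```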

Training the Semantics Model

Prompt: “Type the words you would use to ask a person to carry out the action you see in this video.”

Example responses:
– Pick up a black table leg off of the floor.
– Pick up the black table leg.
– Pick up the black table leg.
– Walk over to the white table.
– Place black leg on white table bottom.
– Locate the black table leg on the floor by the white table.
– Find the black table leg and attach it to the white table.
– Hand me the black table leg.

Robotic Demonstrations of G3

Tellex et al., AAAI 2011; Kollar, Tellex et al., HRI 2010; Huang, Tellex et al., IROS 2010; Tellex et al., JHRI 2013; Tellex et al., MLJ 2013.

Semantics

Understanding (forward semantics): what groundings match the language?

$\arg\max_{\gamma_1 \ldots \gamma_N} f(\gamma_1 \ldots \gamma_N, \text{language})$

Generation (inverse semantics): what language specifies the groundings?

$\arg\max_{\text{language}} f(\gamma_1 \ldots \gamma_N, \text{language})$

Searching for Sentences

Context-Free Grammar:

S  → VP NP
S  → VP NP PP
NP → NP PP
NP → the white leg | the black leg | me | the white table | the black table
PP → TO NP
TO → under | on | near
VP → flip | give | pick up | place
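A minimal sketch of enumerating candidate sentences from this grammar; the recursive rule NP → NP PP is omitted here so the expansion stays finite, whereas the real search bounds candidates with beam search and scores each one with the semantics model.

```python
from itertools import product

# The slide's grammar; NP -> NP PP is omitted to keep the expansion finite.
GRAMMAR = {
    "S":  [["VP", "NP"], ["VP", "NP", "PP"]],
    "NP": [["the white leg"], ["the black leg"], ["me"],
           ["the white table"], ["the black table"]],
    "PP": [["TO", "NP"]],
    "TO": [["under"], ["on"], ["near"]],
    "VP": [["flip"], ["give"], ["pick up"], ["place"]],
}

def expand(symbol):
    """Yield every string derivable from symbol; non-keys are terminals."""
    if symbol not in GRAMMAR:
        yield symbol
        return
    for rhs in GRAMMAR[symbol]:
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield " ".join(parts)

sentences = list(expand("S"))
print(len(sentences), "candidates; e.g.:", sentences[0])  # 320 candidates; e.g.: flip the white leg
```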

Example candidate: “Pick up the white leg near the black table,” with parse

(S (VP Pick up) (NP (NP the white leg) (PP (TO near) (NP the black table))))

Each constituent is grounded and scored, e.g. g1(γ1, “the black table”) = 0.1 for one candidate table and 0.9 for another. Beam search keeps the highest-scoring candidates, which are evaluated against the target action, give_part(robot, part), with groundings γ1, γ2, γ3 for the constituent phrases.

Forward Semantics

Understanding: what groundings match the language?

$\arg\max_{\gamma_1 \ldots \gamma_N} \frac{1}{Z} \prod_i g_i(\gamma_1 \ldots \gamma_N, \text{language})$

Generation: what language specifies the groundings?

$\arg\max_{\text{language},\, \gamma_1 \ldots \gamma_N} f(\gamma_1 \ldots \gamma_N, \text{language})$

Inverse Semantics

$\arg\max_{\text{language},\, \gamma_1 \ldots \gamma_N} p(\gamma_1 \ldots \gamma_N \mid \text{language}) = \arg\max_{\text{language},\, \gamma_1 \ldots \gamma_N} \frac{\prod_i g_i(\gamma_1 \ldots \gamma_N, \text{language})}{\sum_{\Gamma'} \prod_i g_i(\gamma_1' \ldots \gamma_N', \text{language})}$

Evaluating the normalizer, which sums over all grounding assignments Γ′, is equivalent to the problem of language understanding!

Inverse Semantics

For the candidate “Give the robot the white leg.”:

$\frac{0.9}{0.9 + 0.9 + 0.9 + K} \approx 0.33$

The intended grounding scores 0.9, but competing groundings score just as well, so the normalized probability is low.

For the candidate “Give the robot the white leg that is on the black table.”:

$\frac{0.7}{0.7 + 0.1 + 0.1 + K} \approx 0.78$

The more specific sentence scores lower on the intended grounding but much higher after normalization, so it is the better request.
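A minimal sketch of the inverse-semantics criterion, reusing the slide's numbers; the candidate sentences and raw factor products are assumptions, and a real system would search sentences with the CFG and score them with learned G3 factors.

```python
# Unnormalized factor products for each candidate sentence over all grounding
# assignments; index 0 is the intended grounding. Numbers mirror the slide.
CANDIDATES = {
    "Give the robot the white leg.":                            [0.9, 0.9, 0.9],
    "Give the robot the white leg that is on the black table.": [0.7, 0.1, 0.1],
}

def inverse_semantics_score(products, intended=0):
    """p(intended grounding | sentence): intended product over the normalizer."""
    return products[intended] / sum(products)

for sentence, products in CANDIDATES.items():
    print(f"{inverse_semantics_score(products):.2f}  {sentence}")  # 0.33, then 0.78

best = max(CANDIDATES, key=lambda s: inverse_semantics_score(CANDIDATES[s]))
print("Chosen request:", best)
```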


Replan From Current State

● The human may have
– helped differently than expected.
– failed to help in time.
– caused side-effects.

A sketch of such a replanning loop follows.
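A minimal sketch of the closed loop; observe_state, plan, execute, and request_help are hypothetical stand-ins for the system's perception, symbolic planner, controller, and request generator.

```python
def run(goal, observe_state, plan, execute, request_help):
    """Replan from the observed state after every step, including after help."""
    state = observe_state()
    while not goal(state):
        step = plan(state, goal)      # symbolic planner over the current state
        if step is None:              # no applicable action: ask a human for help
            request_help(state)
        else:
            execute(step)
        state = observe_state()       # never assume the help happened as requested
```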

Evaluation Overview

● Does the inverse-semantics method generate requests that are easier to understand than other methods?
– Online corpus-based evaluation.

● Does our approach work in an end-to-end system?
– User study in a real-world furniture assembly system.

Corpus-Based Evaluation

“Give me the white leg that is on the black table.”

Which video is the best response to the natural language request?

Corpus-Based Evaluation: Results

Generation Algorithm | Example | Success Rate (%)
Chance | – | 20
“Help me” | “Help me.” | 21 ± 8.0
Templates | “Hand me part 2.” | 47 ± 5.7
Inverse Semantics w/o normalizer (Approach 1) | “Give me the white leg.” | 52.3 ± 5.7
Inverse Semantics (Approach 2) | “Give me the white leg that is on the black table.” | 64.3 ± 5.4
Hand-written Request | “Take the table leg that is on the table and place it in the robot’s hand.” | 94 ± 4.7

Evaluation Overview

● Does the inverse-semantics method generate requests that are easier to understand than other methods?
– Corpus-based evaluation on AMT.

● Does our approach work in an end-to-end system?
– User study in a real-world furniture assembly system.

User Study

● Human-robot team assembled IKEA furniture in parallel for 15 minutes.

● Robots asked for help when they encountered failures.
– Three staged failures (e.g., a part out of reach on the table).
– Many unstaged failures (e.g., a part slipped out of the robot's grasp).

● The human provided whatever help they felt was appropriate.

● Robots continued operating autonomously.

User Study Results: Objective Metrics

[Bar chart: % error-free interaction, Baseline vs. Inverse Semantics, axis 0–90%; higher is better.]

User Study Results: Objective Metrics

[Bar chart: % overall success rate, Baseline vs. Inverse Semantics, axis 0–90%; higher is better.]

User Study Results: Subjective Metrics

[Bar chart: “prefer parallelism” rating, Baseline vs. Inverse Semantics, axis 0–0.9; higher is better.]


Future Work

● Planning in very large state spaces.
● Grounded dialog.

Affordance-Aware Planning

Cooking Game

Multimodal POMDPs for Collaborative Robots

● Estimate the human's mental state from language, gesture, and perceptual observations.

● Solve POMDPs with very large observation spaces.

Contributions

● Defined a Bayesian algorithm for generating natural language requests for help.

● Demonstrated in a corpus-based evaluation that people understand the generated requests better than baseline methods.

● Assessed strengths and limitations in an end-to-end system with a real-world user study.

Asking for Help Using Inverse Semantics

Stefanie Tellex, Ross Knepper, Adrian Li, Daniela Rus, Nicholas Roy

User Study Results: Subjective Metrics

[Bar chart: “effective communication” rating, Baseline vs. Inverse Semantics, axis 0–4; higher is better.]

User Study Results: Objective Metrics

[Bar chart: seconds elapsed without handoffs, Baseline vs. Inverse Semantics, axis 0–35 s.]

Training

● Collect a parallel corpus of language paired with groundings.

[Figure: “Hand me the white leg that is on the table,” with groundings γ1, γ2, γ3 labeled in the scene; each γk corresponds to a constituent phrase in the language input.]

Corpus-Based Evaluation

● Each user responded to 20 help requests.

● Five algorithms for generating requests:
– “Help me.”
– Templates
– Approach 1
– Approach 2
– Hand-written requests

● Total of 900 trials.

Limitations of Corpus-Based Evaluation

● Ambiguity between “near” and “under.”
● Canned set of 5 choices.
● No indication of how language generation behaves in the context of the full system.

User Comments

“Help me” baseline:
– “I think if the robot was clearer or I saw it assemble the desk before, I would know more about what it was asking me.”
– “Did not really feel like ‘working together as a team.’”

Inverse semantics:
– “More fun than working alone.”
– “There was a sense of being much more productive than I would have been on my own.”