Modeling Dialogue - Bolzano-Bozen, 15-26 August,...

91
Modeling Dialogue Building Highly Responsive Conversational Agents David Schlangen, Stefan Kopp with Sören Klett CITEC // Bielefeld University

Transcript of Modeling Dialogue - Bolzano-Bozen, 15-26 August,...

Page 1: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Modeling Dialogue Building Highly Responsive

Conversational Agents

David Schlangen, Stefan Kopp with Sören Klett

CITEC // Bielefeld University

Page 2: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Takeaways from Day 1• Responsive agents: minimize time between event and response,

respond to many more types of events than “end of turn”

• Dialogue participants

• try to reach mutual understanding

• continuously monitor whether they have reached it

• and, if necessary, repair ASAP;

• so if you don’t react, you risk repair.

• Responsiveness is built into the fabric of dialogue / builds the fabric.

• Reducing it makes (spoken) dialogue harder. (Brannigan et al. 2011)

Page 3: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Overview of Course• Day 1: Motivation, Phenomena

• Day 2: Technical Challenges, Approaches

• Day 3: Introduction to Technical Framework

• Day 4: Tasks & Hands-On Exercises

• Day 5: Reports, Discussion

Page 4: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Modeling Dialogue Building Highly Responsive

Conversational Agents

Day 2: Technical Challenges, Approaches

David Schlangen, Stefan Kopp with Sören Klett

CITEC // Bielefeld University

Page 5: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR

NLU

DM

NLG

TTS / Behaviour

Dialogue System Modules

Automatic Speech

Recognition

Natural Language

Understanding

Dialogue Management

Natural Language Generation

Text to Speech / Behaviour Realisation

Page 6: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Overview of Day 2• Information Flow in Incremental Dialogue Processing

• Incremental

• ASR

• NLU

• DM

• NLG / NVBG

• Synthesis / Realizer

“Respond to many more types of events than “end of turn” • To do:

• Create these events • Generate appropriate

responses.

Page 7: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Non-Incremental vs. Incremental Processing

User:

System:

User:

System:

7

Page 8: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

User:

System:

2.1 Challenges

• Requires reconceptualisation of information flow• Introduces (even more...) uncertainty

“Incremental Units” model (Schlangen & Skantze EACL 2009, Dialogue & Discourse 2011)

8

Page 9: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR

NLU

DM

NLG

TTS / Behaviour

Automatic Speech

Recognition

Natural Language

Understanding

Dialogue Management

Natural Language Generation

Text to Speech / Behaviour Realisation

Page 10: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

ASR

NLU

DM

NLG

TTS / Behaviour

Automatic Speech

Recognition

Natural Language

Understanding

Dialogue Management

Natural Language Generation

Text to Speech / Behaviour Realisation

Page 11: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

Page 12: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

takewordrec.

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

Page 13: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

take thewordrec.

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

Page 14: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

take the redwordrec.

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

Page 15: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

wordrec. take the red cross

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

Page 16: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

wordrec. take the red cross

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

Page 17: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

wordrec. take

the IU model – Assumptions –

sem

Page 18: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

wordrec. take

the IU model – Assumptions –

sem take

Page 19: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

wordrec. take the red cross

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

sem take x rd() cross()

Page 20: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

• IS may have to be revised, in light of newer information

fourfour

forty

Page 21: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

• IS may have to be revised, in light of newer information

ASR take the right

Page 22: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

• IS may have to be revised, in light of newer information

rightASR take the red

Page 23: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model – Assumptions –

rightASR take the red

Sem LFa b c

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

• IS may have to be revised, in light of newer information

Page 24: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

• IS may have to be revised, in light of newer information

takewordrec.

the IU model – Assumptions –

moment when we are observing IS

moments when partial events are hypothesised to

have happenedmoment when agent updated IS

Page 25: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model – Assumptions –

• Information state is updated with minimal units of information, as soon as they can be hypothesised

• “Higher-level” hypotheses can be formed on the basis of “lower-level” ones.

• IS may have to be revised, in light of newer information

righttake the red

LFa b cIUs Updates 2 types of relation: * same-level links * grounded-in links

Page 26: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the IU model • Implemented in InproTK ( http://www.inpro.tk ),

Jindigo (Skantze), IPAACA (Kopp & Buschmeier), HiRAF (Klett et al.)

Page 27: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Overview of Day 2• Information Flow in Incremental Dialogue Processing

• Incremental

• ASR

• NLU

• DM

• NLG / NVBG

• Synthesis / Realizer

Page 28: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR

NLU

DM

NLG

TTS / Behaviour

Dialogue System Modules

Automatic Speech

Recognition

Natural Language

Understanding

Dialogue Management

Natural Language Generation

Text to Speech / Behaviour Realisation

Page 29: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR NLU DM NLG TTS

mechanisms for computing them?

representations of partial results?

evaluation?

configurations, interactions?

architectures?

systems?

iASR: (Baumann, Atterer, Schlangen; NAACL 2009) (Baumann, Buß, Atterer, Schlangen; Interspeech 2009)

EOT

iNLU: (Atterer, Baumann, Schlangen, Interspeech 2009) (Atterer & Schlangen, SRSL 2009) (Schlangen, Baumann, Atterer, SIGdial 2009) (Heintze, Baumann, Schlangen, SIGdial 2010)(Peldszus, Buß, Baumann, Schlangen, EACL 2012)

iEOT: (Schlangen, Interspeech 2006), (Baumann; ESSLLI 2008), (Atterer, Baumann, Schlangen, Coling 2009)

iDM: (Buß & Schlangen, Semdial 2010; Buß, Baumann, Schlangen, SIGdial 2010; Buß & Schlangen, Semdial 2011)

evaluation: (Baumann, Buß, Schlangen, D&D 2011)

AGMo, impl.: (Schlangen & Skantze, EACL 2009, D&D 2011), (Schlangen et al., SIGdial 2010)

Systems: (Schlangen & Skantze, EACL 2009) (Buß & Schlangen Semdial 2010, 2011)

completion: (Baumann & Schlangen, SIGdial 2011)

annotated bibliography:http://www.inpro.tk

(see also http://www.dsg-bielefeld.de)

(Baumann & Schlangen,ACL demo 2012,Interspeech 2012; Buschmeier et al. SIGdial 2012)

29

Page 30: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Part II Challenges and Approaches

2.2 iASR

ASR

ASR creates a lot of instability on the right frontier. Tradeoff between stability and latency.

(See (Bauman et al. 2009 ff.), http://inpro.tk )

Page 31: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Part II Challenges and Approaches

2.2 iNLU

ASR NLU

Page 32: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

(incremental) NLU

• input: utterances

• output: meaning representations

• the task: extract (intended) meaning from utterance

• incremental: input and/or output are IUs;input IUs are indvidual wordsoutput IUs are ?

32

Page 33: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

(incremental) NLU

33

logical form, keywords, frame, etc.

?

?

?

?

?

• what is this? (representations)

• how is it built? (methods)

Page 34: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

incremental NLU

34

logical form, keywords, frame, etc.

Page 35: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

incremental NLU

35

logical form, keywords, frame, etc.

guess what is going to be said

“restart-incremental” (vs. fully incremental)

Page 36: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

what has been tried?

• predict whole representation: one (massively) multi-class problem [ ICT (Sagae et al. 2009, DeVault et al. 2011, 2013), (Heintze et al. 2010)

36

We can give you power generator

moveclinicWe can

We clinicmove

mustorder

Page 37: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

what has been tried?

• predict whole representation: one (massively) multi-class problem [ ICT (Sagae et al. 2009, DeVault et al. 2011, 2013), (Heintze et al. 2010)

37

We can give you power generator

moveclinicWe can

We clinicmove

mustorder

136 classes (= 135 possible meanings + 1 OOD)

maxEnt classifier

re-queried for each prefix (restart incremental)

Page 38: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

what has been tried?

• predict whole representation: one (massively) multi-class problem [ ICT (Sagae et al. 2009, DeVault et al. 2011, 2013), (Heintze et al. 2010)

38

We can give you power generator

moveclinicWe can

We clinicmove

mustorder

136 classes (= 135 possible meanings + 1 OOD)

maxEnt classifier

re-queried for each prefix (restart incremental)

2nd classf. that predicts when it is as good as it gets(bc. then you can act)

Page 39: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

svm

what has been tried?

• predict whole representation: one (massively) multi-class problem [ ICT (Sagae et al. 2009, DeVault et al. 2011, 2013), (Heintze et al. 2010)

39

FlightsGOAL = FLIGHT TOLOC.CITY_NAME = San Francisco FROMLOC.CITY_NAME = New York

Flights arrivingGOAL = FLIGHT TOLOC.CITY_NAME = San Francisco ARRIVE_TIME.TIME = 10am

Flights arriving in Chicag after

GOAL = FLIGHT TOLOC.CITY_NAME = Chicago ARRIVE_TIME.TIME_RELATIVE = after ARRIVE_TIME.TIME = 11pm

Page 40: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

svm

what has been tried?

• predict whole representation: one (massively) multi-class problem [ ICT (Sagae et al. 2009, DeVault et al. 2011, 2013), (Heintze et al. 2010)

40

FlightsGOAL = FLIGHT TOLOC.CITY_NAME = San Francisco FROMLOC.CITY_NAME = New York

Flights arrivingGOAL = FLIGHT TOLOC.CITY_NAME = San Francisco ARRIVE_TIME.TIME = 10am

Flights arriving in Chicag after

GOAL = FLIGHT TOLOC.CITY_NAME = Chicago ARRIVE_TIME.TIME_RELATIVE = after ARRIVE_TIME.TIME = 11pm

3159 classes (!!) (2594 of which occur only once!)The curse of combinatorics.. |"From x to y"| = (#Cities)2

not a good domain for guessing final meaning

Page 41: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

1 svm per frame element(+ class n/a)

what has been tried?

• separate classifiers for each slot (semi-aligned representation) [ (Heintze et al. 2010) ]

41

Flights GOAL = FLIGHT

Flights arriving GOAL = FLIGHT TOLOC.CITY_NAME = San Francisco

Flights arriving in Chicag after GOAL = FLIGHT TOLOC.CITY_NAME = Chicago

Page 42: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

1 svm per frame element(+ class n/a)

what has been tried?

• separate classifiers for each slot (semi-aligned representation) [ (Heintze et al. 2010) ]

42

Flights GOAL = FLIGHT

Flights arriving GOAL = FLIGHT TOLOC.CITY_NAME = San Francisco

Flights arriving in Chicag after GOAL = FLIGHT TOLOC.CITY_NAME = Chicago

Similar shape (still can't predict future well), but better performance

Page 43: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

what has been tried?

• predict whole representation: one (massively) multi-class problem [ ICT (Sagae et al. 2009, DeVault et al. 2011, 2013), (Heintze et al. 2010) ]

• separate classifiers for each slot (semi-aligned representation) [ (Heintze et al. 2010) ]

• reformulate as tagging task (fully aligned representation) [ (Heintze et al. 2010) ]

• purely incremental semantics contruction [ (Peldszus et al. 2012, Peldszus & Schlangen 2012) ]

43

Page 44: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

what has been tried?

• purely incremental semantics contruction [ (Peldszus et al. 2012, Peldszus & Schlangen 2012) ]

44

delete

grammar(left-factorized, left-corner transformed)

incr. parser (top-down, prob. beam-search)

delete the

underspecified sem. reprs. (fully incrementally built; always interpretable)

Page 45: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

what has been tried?• purely incremental semantics contruction

[ (Peldszus et al. 2012, Peldszus & Schlangen 2012) ]

• produces fully linked (grounded in) representations

• possible advantage: allows more interactions between (sub-)modules

• if no interpretation found, try diff. parse45

Parser Inter-preter

Page 46: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

what can you do with it?

• predict whole representation: one (massively) multi-class problem [ ICT (Sagae et al. 2009, DeVault et al. 2011, 2013), (Heintze et al. 2010) ]

• separate classifiers for each slot (semi-aligned representation) [ (Heintze et al. 2010) ]

• reformulate as tagging task (fully aligned representation) [ (Heintze et al. 2010) ]

• purely incremental semantics contruction

46

preparereply

react toparts

Page 47: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Part II Challenges and Approaches

2.3 iDM

ASR NLU DM

Page 48: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

(incremental) DM

• input: semantic representation

• output: decision on system action

• the task: decide how to (re-)act

• incremental: input may not be based in complete utterance, may be revoked;within-turn actions possible

48

Page 49: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

(incremental) Dialogue Management

49

IS-Update, Action Selection

?

?

?

?

?

Page 50: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

(incremental) Dialogue Management

50

SIL

?

?

?

?

?

prepare BC

cancel BC

prepare BC

execute BC

...

The Numbers system (Skantze & Schlangen, EACL 2009)

Page 51: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

the numbers system

Page 52: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR

NLU

main DM

NLG

hybrid DMseparate incremental component, "normal" DM

• popular for virtual agents• can lead to "mhm mhm Sorry,

I did not understand.."

w/ standardendpointing

TTS / Behaviour

reactive layer

incremental /continuous

pro-

sody

Page 53: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

(incremental) Dialogue Management

53

Normal DM + Incremental Interaction Manager (Selfridge et al. 2012; Khouzaimi et al. 2016)

?

?

?

?

?

hu?

hu?

which city?[execute query]...

Normal DM IISblocked

blocked

blockedpassed through

Page 54: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

(incremental) Dialogue Management

54

?

?

?

?

?

hu?

hu?

which city?[execute query]...

Normal DM + Incremental Interaction Manager (Selfridge et al. 2012; Khouzaimi et al. 2016)

Normal DM IISblocked

blocked

blockedpassed through

Incremental Interaction Manager tries out input on dialogue manager, proposed action is only taken if deemed interesting (forward-looking), otherwise is filtered out and DM state reset.

Page 55: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Summary DM• incremental DM enables handling of additional

behaviours (completions, delivery in installments)

• design space:

• from keeping non-incremental DM, but adding more reactive second channel, to

• real incrementality

• truly incremental DM decreases importance of notion of "utterance"; makes collaboration on utterances possible

• still an even wider open field, no standards yet, not really re-usable components

• (PO)MDPs??

Page 56: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR NLU

56

DM

Page 57: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR NLU

57

NLG DM

NVBG

Page 58: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Natural Language Generation (NLG)

Traditional approaches: all processing is utterance-initial• potentially slow• inflexible, unable to change to ongoing utterances

Speech Output in Typical Dialogue Systems

current point in time

There's an appointment today at 4:25 titled: ‘SigDial Talk’ with the note: ‘be on time’.

noisenoise

when?when?

calendarentrychanges

user feedback

● potentially slow, as all processing is utterance-initial● inflexible: unable to change the ongoing utterance

{{

Page 59: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Incremental NLG

Potentially better to generate, synthesize and deliver in smaller chunks • less utterance-initial processing — faster onset• can take changes into account — react to feedback, requests, noise, …

Potentially Better: Incremental Speech Output

current point in time

There's an appointment today at 4:25 titled: ‘SigDial Talk’ with the note: ‘be on time’.

● less utterance-initial processing * faster onset

!!

Potentially Better: Incremental Speech Output

current point in time

There's an appointment today at 4:25 titled: ‘SigDial Talk’ with the note: ‘be on time’.

● incremental output may take changes into account● react to user feedback / requests / noise / …

when?when?

at 4:25,titled:‘SigDial Talk’…

Page 60: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Incremental NLG

Granularity of chunks: size of incremental generation units?• determines responsiveness to changes• determines context available for further processing

• Smaller units?• ideally: word-by-word• but surface structure cannot be generated

strictly left-to-right and word-by-word

• Bigger units?• enable coherent prosodic realization• fewer inputs lead to lower overhead• but limited responsiveness

Granularity of Incremental Chunksfor Language Generation

● granularity � size of the units determines responsiveness to changes determines context available for further processing

● ideally: generate word-by-word however, this may be infeasible

● surface structure cannot always be produced purely left-to-right and word-by-word

NP

DET: indef N: sing

aanor: ? crocodile

alligator

Page 61: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Incremental NLG

➔ sub-utterance chunk size• corresponding to intonation phrases (roughly)• mildly incremental generation

Approach: two stage planning process• micro-content-planning: generates

micro-planning tasks, chooses which one to generate next

• micro-planning proper: generates surface form for each IMPT, changes generation parameters

• communicate via a sharedinformation state

Utterance IC1 IC2 ICn …

Utteranceoutline IMPT1 IMPT2 IMPTn …

MCP

– {U1, …}– KB1

– {Ui, …}– KB2

– {Uk, …}– KBn

MPP

…state

t

uses JavaSPUD (DeVault 2008)(Buschmeier et al., SigDIAL 2012)

Page 62: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

From incremental to responsive generation

Responsive generation • incremental generation allows for dynamic, adapted creation of later sub-

utterance chunks • decisions about adaptations are delayed almost until the preceding increment

finishes• adaptation to state in both components

• MCP: which IMPT next? repair/comment?• MPP: influence generation parameters, such as verbosity, redundancy

Example: verbosity• length of utterance increment• MPP uses predefined resources for desired degree of verbosity

Page 63: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

iNLG + Speech Synthesis

• use of incremental speech synthesis (INPRO_iSS; Timo Baumann’s course)• synthesizes just-in-time, some look-ahead to keep buffers filled

Incremental Speech Output:Overview

● using a crawling vocoder that performs HMM optimization and vocoding in real-time

iSSiNLG

utteranceIU chunkIU1 with subjectthe

w ɪ əðð ʒb tɛs ʌ kd

moves along with time

crawlingvocoder

Incremental Speech Output:Overview

● when nearing completion, update-messages are sent(from phonemes to chunk to iNLG)

iSSiNLG

utteranceIU chunkIU1 with subjectthe

w ɪ əðð ʒb tɛs ʌ kd

crawlingvocoder

nearing completion? trigger iNLG

on ongoing: update chunk

Incremental Speech Output:Overview

● and iNLG adds another chunkIU before synthesis runs out of speech

● it's appended to the ongoing synthesis

● the process repeats until all chunks are synthesized

iSSiNLG

utteranceIU chunkIU1 with subjectthe

w ɪ əðð ʒb tɛs ʌ kd

crawlingvocoder

chunkIU2

Page 64: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

iNLG + Speech Synthesis

Results with iNLG + iSS (Buschmeier et al., SigDIAL 2012):• reduces latency over a non-incremental baseline

• information presentation of calendar entries, with random noise: adaptive presentation after noise is rated more naturalstop-and-restart >* stop-and-wait ~ ignore-and-continue

Results for Utterance Onset Timing

● both iNLG and iSS are much faster than non-incremental processing

● (linguistic pre-processing is not yet incrementalized)

averaged over 9 stimuli, time in milliseconds

processing step non-incr. incrementalNLG 361 52

Synth. (ling. processing) 217 222 Synth. (HMM & vocoding) 1004 21

Total 1582 295

Page 65: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

NVBG — nonverbal behavior generation

• Task: Generation of nonverbal behaviors• selection, coordination („fission“), synchronization• as a function of intended meaning, dialogue function, discourse function,

speaker state, information state, …

• Early approaches (Cassell et al. 2001) and current practical ones use simple formalism (rules or transducers; Marsella et al.) to formulate mapping

• Integrated microplanning (multimodal grammar) (Kopp et al. 2004)

• Recent approaches focus on one or few modalities, learned from data

Referent Features

Discourse/Linguistic Context

Previous Gesture

Thematization

Technique

CommGoal

ShapeProp

Position

MainAxis

Childnodes

Symmetry

Handedness

Gesture (y/n)

Handshape

Gesture (y/n) Technique

Handshape

Palm Orient. Finger Orient.Movement

TypeMovement Direction

NP type

Handedness

InfoState

V5; all; NG=82, NNP=129

Page 66: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR NLU

66

NLG DM

NVBG

Page 67: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR NLU

67

NLG DM

NVBG

TTS

NVBR

Page 68: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASR NLU

68

NLG DM

NVBG

Realizer

Page 69: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Incremental behavior realization

• Final task in an end-to-end system: realize behavior into perceivable output• speech, prosody — text-to-speech synthesis• nonverbal behavior (face, gesture, gaze, head, posture, …) — computer

graphics for virtual agents, motor control for physical robot agents• other modalities/media — visualization, acoustic cues, …

• Main challenges, often in trade-offs• quality: expressivity, intelligibility, naturalness, lifelikeness, sample rate, …• efficiency: latency, computational cost (time, memory)• flexibility: controllability, adaptivity to external or

internal constraints, …• synchrony: internal coherence (e.g. temporal

coordination) between modalities, sync withexternal events

Page 70: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

SAIBA framework

• Three-stage structure of behavior generation in many existing ECAs• Idea: modularization and separation of stages (treated as black boxes)• Enable interoperability and exchange of modules• Definition of interfaces between stages — common markup languages

• Behavior Markup Language (BML) • Function Markup Language

Intent Planning

Behavior Planning

Behavior Realization

FML BML

FeedbackFeedback

Page 71: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Different realizers, one BML

Page 72: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Smartbody (ICT, USC LA)

• http://www.smartbody-anim.org/• Focus: very realistic behavior

• Motion Capture or artistcreated animations

• Support for recorded voices

Page 73: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

LiteBody

• http://relationalagents.com/litebody.html• Webbased, 2D,lightweight• Used in long-term studies• Robust

Page 74: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ROS BML Realizer

• http://sourceforge.net/projects/rosbmlrealizer/• Uses the Robotic Operation System (ROS)• realizes BML on robot body

Page 75: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

AsapRealizer

• Designed to allow fluent interaction• Fluent, very interactive behavior realization• Interruptions, on-the-fly-adaptation, incrementality, reactivity• Extensibility• with a virtual human or robot

Page 76: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

BML design

• Describes occurrence of behaviors• Relative timing of behaviors• Form of behaviors• Realizer-independent• But allows extensions for realizer-dependent behavior

Page 77: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

<bml xmlns="http://www.bml-initiative.org/bml/bml-1.0" id="bml1">

<speech id="speech-1"> <text> Look, it’s over <sync id="sync-1" /> there. </text> </speech>

<pointing id="point1" ready="speech-1:sync-1" mode="LEFT_HAND" target="camera" />

</bml>

Look, it’s over there.

syncronisationconstraint

behaviours

77

BML example• Specification of a co-speech deictic gesture

Page 78: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

BML behaviors

• Gesture• Head• Gaze• Speech• Locomotion• Posture• Facial expression

eyes

torso

legs

Coordination

Page 79: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

BML phases and sync-points

<bml><gaze id="gaze1" target="AUDIENCE"/><speech start=”gaze1:ready” id="speech1"> <text>Welcome ladies and gentlemen!

</text></speech>

</bml>

Page 80: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

BML feedback from realizer

• To provide the behavior planner with information on• Delivered behaviors: Progress feedback• Delivery failures: Warning feedback• Predicted timing and form decisions: Prediction feedback

<predictionfeedback> <gesture id=“b1” lexeme=“beat” mode=“RIGHT_HAND” start=“0” ready=“1” strokeStart=“1” strokeEnd=“2” relax=“2” end=“3”/> </predictionfeedback>

<bml id=“bml1”> <gesture id=“b1” lexeme=“BEAT”/> </bml>

To realizer:

From realizer:

Page 81: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Behavior realization for responsive agents

We require the realizer to enable a lot of things:

• Mid-utterance (self-)interruption• Seamless turn-taking (i.e. respond quickly to external events)• Fighting over the turn using louder speech, speeding up/slowing down, …• Responding to listener feedback, e.g. delaying speech until the listener

has finished speaking or resuming before their delivery is finished• Employing fillers to keep or attain the turn, without having a full plan at

hand• Retain multimodal synchrony when adapting a behavior• …

Page 82: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Incremental behavior realization: ASAP

• BML extensions BMLA and BMLIS to enable incremental, adaptive and interruptive speech and behavior realization

• ASAP realizer (artificial social agents platform)• incremental construction of plans• continuous modification of the timing and shape of ongoing behavior• fluent connection of increments• interface to Inpro_iSS and other TTS engines, animation engines, robots

Page 83: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

ASAP realizer architecture

BehaviorPlanner

BML BML

BML

BML

Scheduler

ExecutionEngines

feedback

flexible behavior plan

peg board

bottom-up shape modifications

composition

bottom-uptime modifications

feedbackcoarticulation

plan additions/modificationsAnticipators

entry/interrupt interrupt targets

IN_PREP

PENDING

scheduling finished [preplan]

LURKING

block activated

scheduling finished [not preplan]

IN_EXEC

append targets DONE and chunk targets SUBSIDING / realign,activate onStarts

DONE

all behaviors in DONE

interrupt

interrupt

interrupt SUBSIDING

all behaviors in SUBSIDING or DONE

interrupt

entry/interrupt interrupt targets

IN_PREP

PENDING

scheduling finished [preplan]

LURKING

block activated

scheduling finished [not preplan]

IN_EXEC

append targets DONE / activate onStarts

DONE

all behaviors in DONE

interrupt

interrupt

interrupt

Elckerlyc Asap

Preparation for and reaction to anticipated events

Page 84: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Example: Modeling turn taking dynamics

Page 85: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Extensions for fluent interaction

• Interruption — more than just stopping• find earliest feasible interruption points• gracefully remove behavior

• Parameter value change • even at execution time• for running behavior

<bmla:interrupt id="i1" target="bml1" start="shake1:stroke" exclude="speech1,gesture1"/>

<bmla:parametervaluechange id="p1" target="bml1:speech1" paramId="volume" start="bml1:speech1:sync1" end=„bml1:speech1:sync1+1"/>

Page 86: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Extensions for fluent interaction

• Incremental composition • compose behavior out of smaller BML blocks• fine-grained composition: append/prepend, chunk before/after

<bml id="bml3" bmla:appendAfter="bml1,bml2" bmla:prependBefore=„bml4"/>

<bml id="bml3" bmla:chunkAfter=„bml1,bml2"/>

Page 87: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Example: incremental planning and realization

Asa

pR

ealiz

er

<bml id="bmli" bmla:interrupt="bml1"/>

IU <

-> B

MLbml1 bml2 bml3

Beh

avio

rPla

nn

er

revoke

add

<bml id="bml3" bmla:appendAfter="bml2">…</bml>start

end

status<predictionFeedback>

<bml id="bml2" globalStart="1"/></predictionFeedback>

update

startend

status

<blockProgress id="bml4:end" globalTime ="2"/>

bml4 commit

Page 88: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Overview of Day 2• Dialogue Processing Flow: ASR —NLU — DM — NLG / NVBG — Realizer

• all components must run incrementally and interact via local updates

• IU model:

• IS updated with minimal units of information, as soon as hypothesised

• “Higher-level” hypotheses formed on basis of “lower-level” ones

• IS may have to be revised, in light of newer information

• Hybrid system / DM: main DM + reactive layer

• Incremental generation is faster and adapts more naturally to disturbances

• Incremental realization requires plan construction, interruption, continuous modification, fluent connection of increments, based on prediction of events

Page 89: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Questions?

Page 90: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

End of Day 2Tomorrow: Introduction to technical framework

Page 91: Modeling Dialogue - Bolzano-Bozen, 15-26 August, 2016esslli2016.unibz.it/wp-content/uploads/2015/10/... · 2016-08-23 · Modeling Dialogue Building Highly Responsive Conversational

Literature

• http://www.dsg-bielefeld.de

• http://scs.techfak.uni-bielefeld.de

• Branigan, H. P., Catchpole, C. M., & Pickering, M. J. (2011). What makes dialogues easy to understand? Language and Cognitive Processes, 26(10), 1667–1686. doi:10.1080/01690965.2010.524765

• Clark, H. H. (1996). Using Language. Cambridge, England: Cambridge University Press.