Statistical Dialogue Modelling. Milica Gašić, Dialogue Systems Group.


Slide 1

Slide 2
Statistical Dialogue Modelling. Milica Gašić, Dialogue Systems Group.

Slide 3
Why are current methods poor?

Slide 4
The dialogue system pipeline: Speech Recogniser, Semantic Decoder, Dialogue Manager, Natural Language Generator, Text to Speech Synthesiser. Example: "I'm looking for a restaurant" is decoded as inform(type=restaurant); the system responds with request(food), realised as "What kind of food would you like?". The Dialogue Manager consists of a Dialogue Model (what does the user want?) and a Dialogue Policy (what to say back to the user?).

Slide 5
Elements of dialogue management. [Diagram: turns 1 to T, each with an observation o_t, a state s_t and an action a_t.] Observations are what the system hears, states are what the user wants, and actions are what the system says.

Slide 6
Example (commercial systems). Turn 1, user: "I'm looking for a Thai restaurant." Semantic decoder output: hello(type=restaurant) 0.6, inform(type=restaurant,food=thai) 0.4. Only the top hypothesis is used, so the state records type=restaurant with confidence 0.6 and the food information in the second hypothesis is lost; the system replies "You are looking for a restaurant, right?". Turn 2, user: "Thai." Semantic decoder output: hello() 0.5, inform(food=turkish) 0.3, inform(food=thai) 0.2. Again only the top hypothesis is used, so the food slot stays empty.

Slide 7
Example (baseline tracker in the practical session). The same two turns and semantic decoder outputs. For each slot the tracker keeps the highest-confidence value observed so far: type: restaurant 1.0; food: Thai 0.4, Turkish 0.3. System: "What kind of food do you want?"

Slide 8
Example (focus tracker in the practical session). The same two turns and semantic decoder outputs. The focus tracker accumulates evidence across turns, giving type: restaurant 1.0 and food: Thai 0.4, Turkish 0.3, so after asking "What kind of food do you want?" the system can follow up with "Did you say Thai or Turkish?".

Slide 9
Challenges in dialogue modelling: How to define the state space? How to tractably maintain the belief state? Which actions to take?

Slide 10
1. How to represent the dialogue state? The state needs to capture what happened before (the dialogue history, so that the Markov property holds), what the user wants (the user goal, since this is task-oriented dialogue), and what the user says (the user act, robustly to errors).

Slide 11
An ontology defines the possible user goals, for example: type (restaurant, hotel), area (north, south), food (Chinese, Indian), stars (for hotels).

Slide 12
Example of a belief state (practical session). Goal-labels: food=British 1.0, area=dontcare 0.97. Method-label: byconstraints 0.72. Requested-slots: pricerange 0.88.

Slide 13
2. How to track the belief state?

Slide 14
Generative vs discriminative models. In a discriminative model the state depends on the observation; in a generative model the state generates the observation.

Slide 15
The focus dialogue state tracker, a discriminative model. For each state element s, the belief is updated as b'(s) = slu(s) + (1 - Σ_{s'} slu(s')) b(s), where slu(s) is the probability that the semantic decoder gives to the state element s (to be implemented at the practical session). A sketch of this update follows below.
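The snippet below is a minimal Python sketch of the focus update on slide 15, not the practical-session code: the function name update_slot_belief and the dict representation of the belief are assumptions made only for this example. Run on the food-slot confidences from slides 6 to 8, it reproduces the belief shown on slide 8 (Thai 0.4, Turkish 0.3).

```python
def update_slot_belief(old_belief, slu_probs):
    """Focus-style update of the belief over one slot for a single turn.

    old_belief : dict mapping slot value -> probability carried over from
                 the previous turn (empty at the start of the dialogue).
    slu_probs  : dict mapping slot value -> confidence assigned by the
                 semantic decoder in the current turn.
    """
    # Probability mass the semantic decoder left unassigned this turn;
    # it determines how much of the previous belief is carried forward.
    remaining = 1.0 - sum(slu_probs.values())
    values = set(old_belief) | set(slu_probs)
    return {v: slu_probs.get(v, 0.0) + remaining * old_belief.get(v, 0.0)
            for v in values}

# Food slot, confidences from slides 6-8:
turn1 = update_slot_belief({}, {"thai": 0.4})                    # inform(food=thai) 0.4
turn2 = update_slot_belief(turn1, {"turkish": 0.3, "thai": 0.2}) # turn 2 hypotheses
print(turn2)  # thai ~ 0.4, turkish ~ 0.3, as on slide 8
```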
Slide 16
Partially Observable Markov Decision Process (POMDP), a generative model. The state is unobservable and depends on the previous state and action through P(s_{t+1} | s_t, a_t), the transition probability. The state generates a noisy observation through P(o_t | s_t), the observation probability.

Slide 17
A bit of theory: belief propagation. We work with probabilities conditional on the observations and are interested in the marginal p(x | D), where the evidence D = {D_a, D_b} is split into the part D_a on one side of node x and the part D_b on the other.

Slide 18
Belief propagation (continued). D_b is split further into D_c and D_d.

Slide 19
Belief propagation (continued). [Diagram: a chain of nodes a, b, c with evidence D_a, D_b, D_c attached.]

Slide 20
Belief propagation (continued). [Diagram: nodes a and b with evidence D_a and D_b.]

Slide 21
Belief state tracking. The belief update b_{t+1}(s_{t+1}) ∝ P(o_{t+1} | s_{t+1}) Σ_{s_t} P(s_{t+1} | s_t, a_t) b_t(s_t) requires summation over all possible states at every dialogue turn, which is intractable.

Slide 22
Dialogue state factorisation. Decompose the state s_t into conditionally independent elements: the user goal g_t, the user action u_t and the dialogue history d_t.

Slide 23
Belief update for the factored state. It still requires summation over all possible goals, and over all possible histories and user actions, which is intractable.

Slide 24
The Bayesian Update of Dialogue State (BUDS) system further decomposes the dialogue state, makes the belief state update tractable, and learns the shape of the distribution.

Slide 25
Bayesian network model for dialogue. [Diagram: the goal, user-act and history nodes are split per slot, e.g. g_t^food, u_t^food, d_t^food and g_t^area, u_t^area, d_t^area, connected across turns to g_{t+1}^food and so on.]

Slide 26
Belief tracking in the network. For each node x: start on one side and keep computing p(x | D_a); then start from the other end and keep computing p(D_b | x); to get the marginal, simply multiply the two.

Slide 27
3. Which action to take? Given the belief state type: restaurant 1.0, food: Thai 0.4, Turkish 0.3, the system could take request(food) "What kind of food do you want?", confirm(food=Thai) "Did you say you want Thai food?", or select(food=Thai, food=Turkish) "Do you want Thai or Turkish?".

Slide 28
Dialogue as a sequential decision process. [Diagram: states s_t, observations o_t, actions a_t and rewards r_t over turns 1 to T.] We want to optimise the sequence of actions based on the reward.

Slide 29
Reward function. The reward is a measure of how good the dialogue is.

Slide 30
Q-function. The Q-function measures the expected discounted reward that can be obtained when an action is taken in a belief state, taking into account the rewards of future actions; optimising the Q-function is equivalent to optimising the policy: Q^π(b, a) = E_π[ Σ_k γ^k r_{t+k} | b_t = b, a_t = a ], where γ in (0, 1] is the discount factor, r is the reward, b is the starting belief state, a is the starting action, and the expectation is taken with respect to the policy π.
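As a toy illustration of slide 30 (not material from the lecture or the practical), the sketch below computes a discounted return for one dialogue and picks the action that maximises an estimated Q-function. The reward scheme of -1 per turn plus a bonus for success, the numbers, and all names are assumptions made only for this example.

```python
GAMMA = 0.9  # discount factor, gamma in (0, 1]

def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma^k * r_k over one dialogue: the quantity whose
    expectation under the policy the Q-function measures."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# A four-turn dialogue: -1 per turn, +20 added at the end because it succeeded.
print(discounted_return([-1, -1, -1, 19]))  # ~ 11.14

def greedy_action(q_estimates):
    """Policy derived from the Q-function: pick the highest-scoring action."""
    return max(q_estimates, key=q_estimates.get)

# Hypothetical Q estimates for the belief state in the slide 27 example.
q_estimates = {"request(food)": 9.0,
               "confirm(food=Thai)": 10.5,
               "select(food=Thai,food=Turkish)": 11.2}
print(greedy_action(q_estimates))  # select(food=Thai,food=Turkish)
```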
Slide 31
How to optimise the Q-function? On-line, in interaction with the environment. Standard methods take too many dialogues, so they need a simulated user; alternatively, use a sample-efficient method (e.g. Gaussian processes or Kalman filters) and optimise in direct interaction with human users.

Slide 32
Learning in interaction with real people. Success rate: simulator-trained 93.5 +/- 1.2 %, on-line-trained 96.8 +/- 0.9 %.

Slide 33
Conclusions. Statistical dialogue modelling requires a compact state representation; dialogue models can be generative or discriminative; the policy is optimised to maximise the reward.