Transcript of Lecture 25: CS573 Advanced Artificial Intelligence, Milind Tambe, Computer Science Dept and Information Science Inst, University of Southern California

Page 1: Lecture 25: CS573 Advanced Artificial Intelligence

Milind Tambe
Computer Science Dept and Information Science Inst
University of Southern California
Tambe@usc.edu

Page 2: Surprise Quiz II: Part I

[Figure: Bayesian network with A as the parent of both B and C]

P(A) = 0.05

A   P(B)
T   0.9
F   0.05

A   P(C)
T   0.7
F   0.01

Questions: Surprise

Page 3: Markov

Page 4: Markov

Page 5: Dynamic Belief Nets

[Figure: DBN unrolled over time slices Xt → Xt+1 → Xt+2, with evidence nodes Et, Et+1, Et+2 attached to the corresponding state nodes]

In each time slice:
• Xt = hidden state variables
• Et = observable evidence variables

Page 6: Types of Inference

Filtering or monitoring: P(Xt | e1, e2, …, et)
– Keep track of the probability distribution over current states
– Like a POMDP belief state
– P(@ISI | c1, c2, …, ct) and P(N@ISI | c1, c2, …, ct)

Prediction: P(Xt+k | e1, e2, …, et) for some k > 0
– E.g., P(@ISI 3 hours from now | c1, c2, …, ct)

Smoothing or hindsight: P(Xk | e1, e2, …, et) for 0 <= k < t
– What is the state of the user at 11 AM, given observations at 9 AM, 10 AM, 11 AM, 1 PM, 2 PM?

Most likely explanation: Given a sequence of observations, find the sequence of states that is most likely to have generated the observations (speech recognition)
– argmax_{x1:t} P(x1:t | e1:t)

Page 7: Filtering: P(Xt+1 | e1, e2, …, et+1)

f1:t+1 = P(Xt+1 | e1:t+1)
       = Norm * P(et+1 | Xt+1) * Σxt P(Xt+1 | xt) * P(xt | e1:t)

• e1:t+1 = e1, e2, …, et+1
• P(xt | e1:t) = f1:t
• f1:t+1 = Norm-const * FORWARD(f1:t, et+1)

RECURSION

Page 8: Computing Forward f1:t+1

• For our example of tracking user location:

• f1:t+1 = Norm-const * FORWARD (f1:t, ct+1)

• Actually it is a vector, not a single quantity

• f1:2 = P(L2 | c1, c2) means computing both components, <P(L2 = @ISI | c1, c2), P(L2 = N@ISI | c1, c2)>, and then normalizing

Hope you tried out all the computations from the last lecture at home!
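A minimal sketch of this forward update for the two-state location example (my own illustration, not from the slides), using the transition and sensor numbers quoted later in the lecture and the filtered estimate f1:1 = <0.818, 0.182> from the last lecture:

```python
# One forward (filtering) step for the two-state user-location example.
# State order: [@ISI, N@ISI]; numbers are the ones quoted in the lecture.

def forward(f_prev, T, obs_likelihood):
    """f_{1:t+1} = Norm * P(e_{t+1} | X_{t+1}) * sum_{xt} P(X_{t+1} | xt) * f_{1:t}"""
    n = len(f_prev)
    # one-step prediction: sum out the previous state
    predicted = [sum(T[i][j] * f_prev[i] for i in range(n)) for j in range(n)]
    # weight by the sensor model for the new evidence
    unnorm = [obs_likelihood[j] * predicted[j] for j in range(n)]
    z = sum(unnorm)                          # the normalization constant
    return [u / z for u in unnorm]

T = [[0.7, 0.3],                             # T[i][j] = P(L_{t+1}=j | L_t=i)
     [0.3, 0.7]]
likelihood_c_true = [0.9, 0.2]               # P(c=true | @ISI), P(c=true | N@ISI)

f_1 = [0.818, 0.182]                         # f_{1:1} = P(L1 | c1) from the last lecture
f_2 = forward(f_1, T, likelihood_c_true)     # incorporate c2 = true
print(f_2)                                   # ≈ [0.883, 0.117]
```

Running this reproduces the f1:2 = <0.883, 0.117> obtained on the HMM slides below.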

Page 9: Robotic Perception

[Figure: DBN for robotic perception: states Xt → Xt+1 → Xt+2 with evidence Et, Et+1, Et+2, and action nodes At-1, At, At+1 feeding into the next state]

• At = action at time t (observed evidence)
• Xt = state of the environment at time t
• Et = observation at time t (observed evidence)

Page 10: Robotic Perception

• Similar to the filtering task seen earlier
• Differences:

  • Must take into account action evidence

    Norm * P(et+1 | Xt+1) * Σxt P(Xt+1 | xt, at) * P(xt | e1:t)

    POMDP belief update?

  • Must note that the variables are continuous

    P(Xt+1 | e1:t+1, a1:t)
      = Norm * P(et+1 | Xt+1) * ∫ P(Xt+1 | xt, at) * P(xt | e1:t, a1:t-1) dxt

(A small discrete sketch of this action-conditioned update follows.)
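A rough discrete sketch of the action-conditioned belief update (my own illustration; the two-state robot, the action names "stay"/"move", and all the numbers are made up for this example):

```python
# Discrete version of the action-conditioned (POMDP-style) belief update:
#   b'(x') = Norm * P(e | x') * sum_x P(x' | x, a) * b(x)
# Hypothetical two-state robot; all numbers below are illustrative assumptions.

def belief_update(belief, action, evidence, T, O):
    n = len(belief)
    # predict: sum out the previous state, using the transition model for this action
    predicted = [sum(T[action][x][x2] * belief[x] for x in range(n)) for x2 in range(n)]
    # correct: weight by the observation likelihood, then normalize
    unnorm = [O[x2][evidence] * predicted[x2] for x2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# T[a][x][x'] = P(x' | x, a); O[x'][e] = P(e | x')   (made-up values)
T = {"stay": [[0.9, 0.1], [0.1, 0.9]],
     "move": [[0.2, 0.8], [0.8, 0.2]]}
O = [[0.7, 0.3],   # P(e=0 | x'=0), P(e=1 | x'=0)
     [0.2, 0.8]]   # P(e=0 | x'=1), P(e=1 | x'=1)

b = [0.5, 0.5]
b = belief_update(b, "move", 1, T, O)
print(b)           # ≈ [0.27, 0.73] with these assumed numbers
```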

Page 11: Prediction

Prediction is filtering without incorporating new evidence: P(Xt+k | e1, e2, …, et) for some k > 0

– E.g., P(L3 = @ISI | c1)
  = ΣL2 P(L3 = @ISI | L2) * P(L2 | c1)
  = P(L3=@ISI | L2=@ISI) * P(L2=@ISI | c1) + P(L3=@ISI | L2=N@ISI) * P(L2=N@ISI | c1)
  = 0.7 * 0.6272 + 0.3 * 0.3728
  = 0.43904 + 0.11184 ≈ 0.55

– P(L4 = @ISI | c1) = ΣL3 P(L4 = @ISI | L3) * P(L3 | c1)
  = 0.7 * 0.55 + 0.3 * 0.45 = 0.52

(P(L2 | c1) = <0.6272, 0.3728> was computed in the last lecture.)

Page 12: Prediction

– P(L5 | c1) = 0.7 * 0.52 + 0.3 * 0.48 = 0.508
– P(L6 | c1) ≈ 0.7 * 0.5 + 0.3 * 0.5 = 0.5 … (converging to 0.5)

The predicted distribution of user location converges to a fixed point
– The stationary distribution of the Markov process
– Mixing time: the time taken to reach the fixed point

Prediction is useful only if k << mixing time
– The more uncertainty there is in the transition model,
– the shorter the mixing time, and the more difficult it is to make predictions

(A short sketch of this convergence follows.)
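A small sketch of the prediction recursion (my own, not from the slides), starting from P(L2 | c1) = <0.6272, 0.3728> and showing the drift toward the stationary distribution <0.5, 0.5>:

```python
# Prediction: repeatedly apply the transition model with no new evidence.
# State order: [@ISI, N@ISI]; numbers are the lecture's.

T = [[0.7, 0.3],          # T[i][j] = P(L_{t+1}=j | L_t=i)
     [0.3, 0.7]]

p = [0.6272, 0.3728]      # P(L2 | c1), computed in the last lecture
for k in range(3, 9):
    p = [sum(T[i][j] * p[i] for i in range(2)) for j in range(2)]
    print(f"P(L{k} = @ISI | c1) = {p[0]:.4f}")
# Prints 0.5509, 0.5204, 0.5081, 0.5033, 0.5013, 0.5005 -> converging to 0.5
```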

Page 13: Smoothing

P(Xk | e1, e2, …, et) for 0 <= k < t

P(Lk | c1, c2, …, ct) = Norm * P(Lk | c1, c2, …, ck) * P(ck+1, …, ct | Lk)
                      = Norm * f1:k * bk+1:t

bk+1:t is a backward message, analogous to our earlier forward message.

Hence the algorithm is called the forward-backward algorithm.

Page 14: bk+1:t backward message

bk+1:t = P(ek+1:t | Xk)
       = P(ek+1, ek+2, …, et | Xk)
       = Σxk+1 P(ek+1, ek+2, …, et | Xk, xk+1) * P(xk+1 | Xk)

[Figure: slices Xk → Xk+1 → Xk+2 with evidence nodes Ek, Ek+1, Ek+2]

Page 15: bk+1:t backward message

bk+1:t = P(ek+1:t | Xk)
       = P(ek+1, ek+2, …, et | Xk)
       = Σxk+1 P(ek+1, ek+2, …, et | Xk, xk+1) * P(xk+1 | Xk)
       = Σxk+1 P(ek+1, ek+2, …, et | xk+1) * P(xk+1 | Xk)
       = Σxk+1 P(ek+1 | xk+1) * P(ek+2:t | xk+1) * P(xk+1 | Xk)

Page 16: bk+1:t backward message

P(ek+1:t | Xk) = bk+1:t
             = Σxk+1 P(ek+1 | xk+1) * P(ek+2:t | xk+1) * P(xk+1 | Xk)

bk+1:t = BACKWARD(bk+2:t, ek+1)

RECURSION

Page 17: Example of Smoothing

P(L1 = @ISI | c1, c2)
  = Norm * P(Lk | c1, c2, …, ck) * P(ck+1, …, ct | Lk)      (here k = 1, t = 2)
  = Norm * P(L1 | c1) * P(c2 | L1)
  = Norm * 0.818 * P(c2 | L1)

P(c2 | L1 = @ISI) = P(ek+1:t | Xk)
  = Σxk+1 P(ek+1 | xk+1) * P(ek+2:t | xk+1) * P(xk+1 | Xk)
  = ΣL2 P(c2 | L2) * P(c3:2 | L2) * P(L2 | L1)
  = (0.9 * 1 * 0.7) + (0.2 * 1 * 0.3) = 0.69

Page 18: Example of Smoothing

P(c2 | L1 = @ISI) = ΣL2 P(c2 | L2) * P(L2 | L1)
  = (0.9 * 0.7) + (0.2 * 0.3) = 0.69

P(L1 = @ISI | c1, c2) = Norm * 0.818 * 0.69 = Norm * 0.56442
P(L1 = N@ISI | c1, c2) = Norm * 0.182 * 0.41 = Norm * 0.0746

After normalization: P(L1 = @ISI | c1, c2) = 0.883

The smoothed estimate 0.883 is greater than the filtered estimate P(L1=@ISI | c1) = 0.818! WHY?
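A minimal pure-Python sketch (my own, not from the slides) that reproduces this forward-backward combination for the two-state example:

```python
# Smoothing P(L1 | c1, c2) = Norm * f_{1:1} * b_{2:2} for the two-state example.
# State order: [@ISI, N@ISI]; numbers are those used in the lecture.

T = [[0.7, 0.3],            # T[i][j] = P(L_{t+1}=j | L_t=i)
     [0.3, 0.7]]
likelihood_c = [0.9, 0.2]   # P(c=true | @ISI), P(c=true | N@ISI)

f_1 = [0.818, 0.182]        # forward message f_{1:1} = P(L1 | c1)

# Backward message b_{2:2}(L1) = sum_{L2} P(c2 | L2) * 1 * P(L2 | L1)
b_2 = [sum(likelihood_c[j] * T[i][j] for j in range(2)) for i in range(2)]
print(b_2)                  # [0.69, 0.41]

# Combine the two messages and normalize
unnorm = [f_1[i] * b_2[i] for i in range(2)]
z = sum(unnorm)
print([u / z for u in unnorm])   # ≈ [0.883, 0.117]
```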

Page 19: HMM

Page 20: HMM

Hidden Markov Models: speech recognition is perhaps the most popular application
– Any speech recognition researchers in class?
– Waibel and Lee
– Dominance of HMMs in speech recognition since the 1980s
– Under ideal, isolated conditions they report 99% accuracy
– Accuracy drops with noise and multiple speakers

HMMs find applications everywhere; just try putting "HMM" into Google.

First we gave the Bellman update to AI (and other sciences); now we make our second huge contribution to AI: the Viterbi algorithm!

Page 21: HMM

The simple nature of HMMs allows simple and elegant algorithms.

Transition model P(Xt+1 | Xt) for all values of Xt
– Represented as an |S| x |S| matrix
– For our example: matrix "T"
– Tij = P(Xt = j | Xt-1 = i)

    T = | 0.7  0.3 |
        | 0.3  0.7 |

Sensor model also represented as a diagonal matrix
– Diagonal entries give P(et | Xt = i)
– et is the evidence, e.g., ct = true
– Matrix Ot

    Ot = | 0.9  0   |   (for ct = true)
         | 0    0.2 |

Page 22: HMM

• f1:t+1 = Norm-const * FORWARD(f1:t, ct+1)
         = Norm-const * P(ct+1 | Lt+1) * ΣLt P(Lt+1 | Lt) * P(Lt | c1, c2, …, ct)
         = Norm-const * Ot+1 * T^T * f1:t

f1:2 = P(L2 | c1, c2) = Norm-const * O2 * T^T * f1:1

     = Norm-const * | 0.9  0   |   | 0.7  0.3 |   | 0.818 |
                    | 0    0.2 | * | 0.3  0.7 | * | 0.182 |

Page 23: Transpose

Page 24: HMM

• f1:2 = P(L2 | c1, c2) = Norm-const * O2 * T^T * f1:1

       = Norm-const * | 0.9  0   |   | 0.7  0.3 |   | 0.818 |
                      | 0    0.2 | * | 0.3  0.7 | * | 0.182 |

       = Norm-const * | 0.63  0.27 |   | 0.818 |
                      | 0.06  0.14 | * | 0.182 |

       = Norm * <(0.63 * 0.818 + 0.27 * 0.182), (0.06 * 0.818 + 0.14 * 0.182)>

       = Norm * <0.564, 0.074>

       = <0.883, 0.117> after normalization

Page 25: Backward in HMM

P(ek+1:t | Xk) = bk+1:t
             = Σxk+1 P(ek+1 | xk+1) * P(ek+2:t | xk+1) * P(xk+1 | Xk)
             = T * Ok+1 * bk+2:t

b2:2 = P(c2 | L1) = | 0.7  0.3 |   | 0.9  0   |
                    | 0.3  0.7 | * | 0    0.2 | * b3:2

Page 26: Backward

• bk+1:t = T * Ok+1 * bk+2:t

• b2:2 = T * O2 * b3:2, where b3:2 = <1, 1>

       = | 0.7  0.3 |   | 0.9  0   |   | 1 |
         | 0.3  0.7 | * | 0    0.2 | * | 1 |

       = <0.69, 0.41>

Page 27: Key Results for HMMs

• f1:t+1 = Norm-const * Ot+1 * T^T * f1:t

• bk+1:t = T * Ok+1 * bk+2:t
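A short numpy sketch of these two matrix equations (my own, not from the slides), using the lecture's T, O2, and f1:1, and then combining the two messages to reproduce the smoothed estimate:

```python
import numpy as np

# State order: [@ISI, N@ISI]; numbers are the ones used in the lecture.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])        # T[i, j] = P(L_{t+1}=j | L_t=i)
O2 = np.diag([0.9, 0.2])          # P(c2=true | L2) as a diagonal matrix
f_1 = np.array([0.818, 0.182])    # f_{1:1} = P(L1 | c1)

# Forward: f_{1:t+1} = Norm-const * O_{t+1} * T^T * f_{1:t}
f_2 = O2 @ T.T @ f_1
f_2 /= f_2.sum()
print(f_2)                        # ≈ [0.883, 0.117]

# Backward: b_{k+1:t} = T * O_{k+1} * b_{k+2:t}, with b_{t+1:t} = <1, 1>
b_2 = T @ O2 @ np.ones(2)
print(b_2)                        # ≈ [0.69, 0.41]

# Smoothing: P(L1 | c1, c2) = Norm * f_{1:1} * b_{2:2}
smoothed = f_1 * b_2
smoothed /= smoothed.sum()
print(smoothed)                   # ≈ [0.883, 0.117]
```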

Page 28: Inference in DBN

How to do inference in a DBN in general? Could unroll the loop forever…

[Figure: DBN unrolled over slices Xt, Xt+1, Xt+2, Xt+3 with evidence nodes Et, Et+1, Et+2, Et+3]

• Slices added beyond the last observation have no effect on inference. WHY?
• So only keep slices within the observation period

Page 29: Inference in DBN

[Figure: the burglary-network analogy: Alarm with children JohnCalls and MaryCalls, alongside DBN slices Xt, Et, Xt+1, Et+1]

• Slices added beyond the last observation have no effect on inference. WHY?
• P(Alarm | JohnCalls) is independent of the unobserved MaryCalls

Page 30: Complexity of inference in DBN

Keep at most two slices in memory
– Start with slice 0
– Add slice 1
– "Sum out" slice 0 (get a probability distribution over the slice 1 state; we don't need to go back to slice 0 anymore, like POMDPs)
– Add slice 2, sum out slice 1, …

Constant time and space per update

Unfortunately, the update is exponential in the number of state variables, so we need approximate inference algorithms

Page 31: Solving DBNs in General

Exact methods:
– Compute-intensive
– Variable elimination from Chapter 14

Approximate methods:
– Particle filtering has gained popularity
– Run N samples together through the slices of the DBN
– All N samples together constitute the forward message (see the sketch below)
– Highly efficient
– Hard to provide theoretical guarantees
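A rough sketch of particle filtering for the two-state location example (my own illustration, not the lecture's code; the complement P(c=false | state) = 1 - P(c=true | state) is an assumption):

```python
import random

# Particle filter for the two-state location HMM used in the lecture.
# States: 0 = @ISI, 1 = N@ISI. Transition 0.7/0.3; P(c=true | @ISI)=0.9, P(c=true | N@ISI)=0.2.
# P(c=false | state) = 1 - P(c=true | state) is assumed for illustration.

T = [[0.7, 0.3], [0.3, 0.7]]
P_C_TRUE = [0.9, 0.2]

def particle_filter_step(particles, evidence):
    # 1. Propagate each particle through the transition model
    moved = [0 if random.random() < T[x][0] else 1 for x in particles]
    # 2. Weight each particle by the likelihood of the evidence
    weights = [P_C_TRUE[x] if evidence else 1.0 - P_C_TRUE[x] for x in moved]
    # 3. Resample N particles in proportion to the weights
    return random.choices(moved, weights=weights, k=len(particles))

N = 10000
particles = [0 if random.random() < 0.818 else 1 for _ in range(N)]  # approx f_{1:1}
particles = particle_filter_step(particles, evidence=True)           # observe c2 = true
print(sum(1 for x in particles if x == 0) / N)   # ≈ 0.883 = P(L2=@ISI | c1, c2)
```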

Page 32: Next Lecture

Continue with Chapter 15

Page 33: Student Evaluations

Page 34: Surprise Quiz II: Part II

[Figure: DBN with states Xt → Xt+1, evidence Et on Xt, and two evidence variables Et+1 and E't+1 on Xt+1]

Xt+1  P(E')
T     0.7
F     0.01

Xt+1  P(E)
T     0.8
F     0.01

Xt    P(Xt+1)
T     0.5
F     0.5

Question:

Page 35: Most Likely Path

Given a sequence of observations, find the sequence of states that is most likely to have generated these observations.

E.g., in the E-Elves example, suppose we observe

[activity, activity, no-activity, activity, activity]

What is the most likely explanation of the user's presence at ISI over the course of the day?
– Did the user step out at time = 3?
– Or was the user present all the time, but in a meeting at time 3?

argmax_{x1:t} P(x1:t | e1:t)

Page 36: Not so simple…

One idea: use smoothing to find the posterior distribution at each time step
– E.g., compute P(L1=@ISI | c1:5) vs P(L1=N@ISI | c1:5), and take the max
– Do the same for P(L2=@ISI | c1:5) vs P(L2=N@ISI | c1:5), and take the max
– Find the maximum-probability sequence this way

Why might this be different from computing what we want (the most likely sequence)?

maxx1:t+1 P(x1:t+1 | e1:t+1) via the Viterbi algorithm:

  Norm * P(et+1 | Xt+1) * maxxt ( P(Xt+1 | xt) * maxx1..xt-1 P(x1, …, xt-1, xt | e1:t) )
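A compact sketch of the Viterbi recursion for the two-state location example (my own, not from the slides); the transition and P(c=true | state) numbers are the lecture's, while the uniform prior over L1 and treating "no-activity" as c=false are assumptions:

```python
# Viterbi: most likely state sequence for the two-state location HMM.
# Transition and P(c=true | state) are the lecture's numbers; the uniform prior over L1
# and P(c=false | state) = 1 - P(c=true | state) are assumptions for illustration.

STATES = ["@ISI", "N@ISI"]
T = {"@ISI": {"@ISI": 0.7, "N@ISI": 0.3},
     "N@ISI": {"@ISI": 0.3, "N@ISI": 0.7}}
P_C_TRUE = {"@ISI": 0.9, "N@ISI": 0.2}
PRIOR = {"@ISI": 0.5, "N@ISI": 0.5}

def viterbi(observations):
    # m[s] = probability of the best path ending in state s; back[t][s] = its predecessor
    m = {s: PRIOR[s] * (P_C_TRUE[s] if observations[0] else 1 - P_C_TRUE[s]) for s in STATES}
    back = []
    for obs in observations[1:]:
        prev_m, m, ptr = m, {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev_m[p] * T[p][s])
            lik = P_C_TRUE[s] if obs else 1 - P_C_TRUE[s]
            m[s] = lik * prev_m[best_prev] * T[best_prev][s]
            ptr[s] = best_prev
        back.append(ptr)
    # Reconstruct the best path by following the back-pointers
    last = max(STATES, key=lambda s: m[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# [activity, activity, no-activity, activity, activity] from the slide
print(viterbi([True, True, False, True, True]))
# -> ['@ISI', '@ISI', 'N@ISI', '@ISI', '@ISI'] with these assumed numbers,
#    i.e., the "stepped out at time 3" explanation wins.
```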