MPE, MAP and approximations


Page 1: MPE, MAP and approximations

MPE, MAP AND APPROXIMATIONS

Lecture 10: Statistical Methods in AI/ML
Vibhav Gogate, The University of Texas at Dallas

Readings: AD Chapter 10

Page 2: MPE, MAP and approximations

What will we cover?
• MPE = most probable explanation
  • The tuple with the highest probability in the joint distribution Pr(X|e)
• MAP = maximum a posteriori
  • Given a subset of variables Y, the tuple with the highest probability in the distribution Pr(Y|e)
• Exact algorithms
  • Variable elimination
  • DFS search
  • Branch and bound search
• Approximations
  • Upper bounds
  • Local search

Page 3: MPE, MAP and approximations

Running Example: Cheating in UTD CS Population

Sex (S), Cheating (C), Tests (T1 and T2) and Agreement (A)

Page 4: MPE, MAP and approximations

Most likely instantiations
• A person takes a test and the test administrator says:
  • The two tests agree (A = true)
• What is the most likely group that the individual belongs to?
• Query: most likely instantiation of Sex and Cheating given evidence A = true
• Is this a MAP or an MPE problem?
• Answer: Sex = male and Cheating = no.

Page 5: MPE, MAP and approximations

MPE is a special case of MAP
• Most likely instantiation of all variables given A = true:
  • S = female, C = no, T1 = negative and T2 = negative
• The MPE projected onto the MAP variables does not yield the correct answer:
  • S = female, C = no is incorrect!
  • S = male, C = no is correct!
• We will distinguish between:
  • MPE and MAP probabilities
  • MPE and MAP instantiations
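To make the distinction concrete, here is a minimal brute-force sketch on a hypothetical two-variable joint distribution (the numbers are invented for illustration, not the lecture's network): projecting the MPE assignment onto the MAP variable gives a different answer than solving MAP directly.

```python
# Made-up joint Pr(Y, X | e); Y plays the role of the MAP variable, X is summed out.
joint = {
    ('a', 1): 0.40, ('a', 2): 0.00,
    ('b', 1): 0.30, ('b', 2): 0.30,
}

# MPE: the full assignment (Y, X) with the highest probability.
mpe = max(joint, key=joint.get)                  # ('a', 1)

# MAP over Y: first sum X out, then maximize over Y.
marginal_y = {}
for (y, _x), p in joint.items():
    marginal_y[y] = marginal_y.get(y, 0.0) + p
map_y = max(marginal_y, key=marginal_y.get)      # 'b'

print("MPE assignment:", mpe)                    # projects onto Y = 'a' ...
print("MAP assignment:", map_y)                  # ... but the MAP answer is Y = 'b'
```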

Page 6: MPE, MAP and approximations

Bucket Elimination for MPE
• Same schematic algorithm as before
• Replace the "elimination operator" by a "maximization operator"

MAX_S applied to the factor

  S       C    Value
  male    yes  0.05
  male    no   0.95
  female  yes  0.01
  female  no   0.99

yields a factor over C (values to be filled in):

  C    Value
  yes
  no

Collect all instantiations that agree on all other variables except S, and compute the maximum value.

Page 7: MPE, MAP and approximations

Bucket Elimination for MPE
• Same schematic algorithm as before
• Replace the "elimination operator" by a "maximization operator"

MAX_S applied to the factor

  S       C    Value
  male    yes  0.05
  male    no   0.95
  female  yes  0.01
  female  no   0.99

yields the factor

  C    Value
  yes  0.05
  no   0.99

Collect all instantiations that agree on all other variables except S and return the maximum value among them (see the sketch below).
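A minimal sketch of the maximization operator, assuming factors are stored as (scope, table) pairs where the table maps value tuples to numbers; this representation is an illustrative choice, not something specified in the lecture.

```python
def max_out(scope, table, var):
    """Maximize `var` out of the factor (scope, table)."""
    idx = scope.index(var)
    new_scope = scope[:idx] + scope[idx + 1:]
    new_table = {}
    for assignment, value in table.items():
        reduced = assignment[:idx] + assignment[idx + 1:]
        # keep the maximum over all rows that agree on the remaining variables
        new_table[reduced] = max(new_table.get(reduced, 0.0), value)
    return new_scope, new_table

# The factor from the slide: maximizing S out leaves a factor over C.
scope = ('S', 'C')
table = {('male', 'yes'): 0.05, ('male', 'no'): 0.95,
         ('female', 'yes'): 0.01, ('female', 'no'): 0.99}
print(max_out(scope, table, 'S'))
# (('C',), {('yes',): 0.05, ('no',): 0.99})
```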

Page 8: MPE, MAP and approximations

Bucket elimination: order (S, C, T1, T2)

Evidence: A=true

Bucket S:  MAX_S over its factors produces the message ψ(C, T2)
Bucket C:  MAX_C over its factors (including ψ(C, T2)) produces ψ(T1, T2)
Bucket T1: MAX_T1 over its factors (including ψ(T1, T2)) produces ψ(T2)
Bucket T2: MAX_T2 over the remaining ψ(T2) messages gives the MPE probability (a sketch of the full procedure follows below)
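A compact sketch of the schematic algorithm, under the same hypothetical (scope, table) factor representation used above; the helper names are assumptions for illustration, not the lecture's code.

```python
from itertools import product

def be_mpe_value(domains, factors, order):
    """Bucket elimination for MPE: multiply each bucket, maximize its variable out."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not bucket:
            continue
        # combined scope of the bucket (duplicates removed, order preserved)
        scope = tuple(dict.fromkeys(v for s, _ in bucket for v in s))
        message = {}
        for values in product(*(domains[v] for v in scope)):
            row = dict(zip(scope, values))
            p = 1.0
            for s, table in bucket:
                p *= table[tuple(row[v] for v in s)]
            key = tuple(row[v] for v in scope if v != var)
            message[key] = max(message.get(key, 0.0), p)
        factors.append((tuple(v for v in scope if v != var), message))
    # once every variable is eliminated, only constant factors remain
    result = 1.0
    for _, table in factors:
        result *= table[()]
    return result
```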

Page 9: MPE, MAP and approximations

Bucket elimination: Recovering MPE tuple

Evidence: A=true

The forward pass leaves each bucket with the messages it received and produced: bucket S contributes ψ(C, T2), bucket C contributes ψ(T1, T2), bucket T1 contributes ψ(T2), and bucket T2 yields the MPE probability. To recover the MPE tuple, visit the buckets in reverse elimination order (T2, T1, C, S): at each bucket, instantiate its variable to the value that maximizes the product of the bucket's factors given the values already chosen (sketched below).
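A sketch of that traceback, assuming `buckets[var]` holds the factors placed in that variable's bucket during elimination (original factors plus received messages), in the same hypothetical (scope, table) representation as before.

```python
def recover_mpe_tuple(domains, buckets, order):
    assignment = {}
    for var in reversed(order):                  # reverse elimination order
        best_value, best_p = None, -1.0
        for value in domains[var]:
            assignment[var] = value
            p = 1.0
            for scope, table in buckets[var]:
                p *= table[tuple(assignment[v] for v in scope)]
            if p > best_p:
                best_value, best_p = value, p
        assignment[var] = best_value             # fix the maximizing value
    return assignment
```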

Page 10: MPE, MAP and approximations

Bucket elimination: MPE vs PE (Z)
• Maximization vs summation
• Complexity: same
• Time and space exponential in the width w of the given order: O(n exp(w+1))

Page 11: MPE, MAP and approximations

BE and Hidden Markov models

• BE_MPE in the order S1, S2, S3, ... is equivalent to the Viterbi algorithm

• BE_PE in the order S1, S2, S3, ... is equivalent to the Forward algorithm
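A small sketch of that correspondence: eliminating S1, S2, ... with the maximization operator reproduces the Viterbi recursion. The array names (init, trans, emit, obs) are illustrative assumptions, not the lecture's notation.

```python
import numpy as np

def viterbi_value(init, trans, emit, obs):
    """max over s_1..s_T of P(s_1..s_T, o_1..o_T) for an HMM, where
    init[i] = P(S1 = i), trans[i, j] = P(S_{t+1} = j | S_t = i),
    emit[i, o] = P(O_t = o | S_t = i)."""
    msg = init * emit[:, obs[0]]                     # bucket of S1
    for o in obs[1:]:
        # maximize out the previous state, then absorb the next observation
        msg = (msg[:, None] * trans).max(axis=0) * emit[:, o]
    return msg.max()

# Replacing .max(axis=0) with .sum(axis=0) (and .max() with .sum()) turns this
# into the Forward algorithm, i.e. BE_PE along the same order.
```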

Page 12: MPE, MAP and approximations

OR search for MPE

• At leaf nodes, compute the probability by taking the product of all factors

• Select the path with the highest leaf probability (a sketch follows below)
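A sketch of plain depth-first OR search for MPE: assign variables one at a time and score each complete assignment (leaf) by the product of all factors, using the same hypothetical (scope, table) representation as in the earlier sketches.

```python
def or_search_mpe(domains, factors, variables, assignment=None):
    assignment = assignment if assignment is not None else {}
    if len(assignment) == len(variables):            # leaf: full assignment
        p = 1.0
        for scope, table in factors:
            p *= table[tuple(assignment[v] for v in scope)]
        return p, dict(assignment)
    var = variables[len(assignment)]
    best = (-1.0, None)
    for value in domains[var]:                       # branch on each value
        assignment[var] = value
        candidate = or_search_mpe(domains, factors, variables, assignment)
        if candidate[0] > best[0]:
            best = candidate
        del assignment[var]
    return best
```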

Page 13: MPE, MAP and approximations

Branch and Bound Search

• An oracle gives you an upper bound on the MPE
• Prune nodes whose upper bound is smaller than the current MPE solution (see the sketch below)
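A sketch of depth-first branch and bound for MPE. Here `upper_bound(assignment)` stands in for the oracle on the slide (for example, mini-bucket elimination); it must never underestimate the best completion of the partial assignment. The representation and names are assumptions for illustration.

```python
def bnb_mpe(domains, factors, variables, upper_bound, assignment=None, best=None):
    assignment = assignment if assignment is not None else {}
    best = best if best is not None else [-1.0, None]     # incumbent MPE solution
    if len(assignment) == len(variables):                  # leaf: score it
        p = 1.0
        for scope, table in factors:
            p *= table[tuple(assignment[v] for v in scope)]
        if p > best[0]:
            best[0], best[1] = p, dict(assignment)
        return best
    var = variables[len(assignment)]
    for value in domains[var]:
        assignment[var] = value
        # prune: skip the subtree if even its upper bound cannot beat the incumbent
        if upper_bound(assignment) > best[0]:
            bnb_mpe(domains, factors, variables, upper_bound, assignment, best)
        del assignment[var]
    return best
```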

Page 14: MPE, MAP and approximations


Mini-Bucket Approximation: Idea
Split a bucket into mini-buckets => bound complexity

bucket(Y) = { φ1, …, φr, φr+1, …, φn }
is split into { φ1, …, φr } and { φr+1, …, φn }

g  = MAX_Y ∏_{i=1..n} φi
h1 = MAX_Y ∏_{i=1..r} φi
h2 = MAX_Y ∏_{i=r+1..n} φi

g ≤ h1 × h2
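A numerical check of g ≤ h1 × h2 on two made-up factors φ1(Y, A) and φ2(Y, B): maximizing Y out of each factor separately (the mini-buckets) can only overestimate maximizing Y out of their product (the exact bucket). The numbers are invented for illustration.

```python
phi1 = {('y0', 'a0'): 0.2, ('y0', 'a1'): 0.8, ('y1', 'a0'): 0.6, ('y1', 'a1'): 0.4}
phi2 = {('y0', 'b0'): 0.9, ('y0', 'b1'): 0.1, ('y1', 'b0'): 0.3, ('y1', 'b1'): 0.7}

for a in ('a0', 'a1'):
    for b in ('b0', 'b1'):
        g = max(phi1[(y, a)] * phi2[(y, b)] for y in ('y0', 'y1'))   # exact bucket
        h1 = max(phi1[(y, a)] for y in ('y0', 'y1'))                 # mini-bucket 1
        h2 = max(phi2[(y, b)] for y in ('y0', 'y1'))                 # mini-bucket 2
        assert g <= h1 * h2
        print(a, b, round(g, 3), "<=", round(h1 * h2, 3))
```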

Page 15: MPE, MAP and approximations

Mini-Bucket Elimination (max mini-bucket size = 3 variables)

Evidence: A=true

Order: C, S, T1, T2
Bucket C:  MAX_C over its mini-buckets produces the messages ψ(S, T1) and ψ(S, T2)
Bucket S:  MAX_S over ψ(S, T1), ψ(S, T2) and its other factors produces ψ(T1, T2)
Bucket T1: MAX_T1 over ψ(T1, T2) produces ψ(T2)
Bucket T2: MAX_T2 over the remaining ψ(T2) messages gives an upper bound on the MPE probability

Page 16: MPE, MAP and approximations

Mini-bucket (i-bounds)
• A parameter i controls the size of (number of variables in) each mini-bucket
• The algorithm is exponential in i: O(n exp(i))
• Example:
  • i = 2: quadratic
  • i = 3: cubic
  • etc.
• The higher the i-bound, the better the upper bound
• In practice, i-bounds as high as 22-25 can be used (a sketch of the partitioning step follows below)
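A sketch of how an i-bound can be enforced: greedily pack a bucket's factors into mini-buckets whose combined scope stays within i variables. Greedy packing is an illustrative choice, not necessarily what a real implementation does.

```python
def partition_bucket(bucket, i_bound):
    """bucket: list of (scope, table) factors -> list of mini-buckets."""
    mini_buckets = []                      # entries: [set_of_vars, list_of_factors]
    for scope, table in bucket:
        for entry in mini_buckets:
            if len(entry[0] | set(scope)) <= i_bound:   # still fits under the bound
                entry[0].update(scope)
                entry[1].append((scope, table))
                break
        else:
            mini_buckets.append([set(scope), [(scope, table)]])
    return [factors for _, factors in mini_buckets]
```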

Page 17: MPE, MAP and approximations

Branch and Bound Search

• Oracle = MBE(i) at each node of the search tree
• Prune nodes whose upper bound is smaller than the current MPE solution

Page 18: MPE, MAP and approximations

Computing MAP probabilities: Bucket Elimination

• Given MAP variables M
• We can compute the MAP probability using bucket elimination by first summing out all non-MAP variables and then maximizing out the MAP variables.
• By summing out the non-MAP variables we are effectively computing the joint marginal Pr(M, e) in factored form.
• By maximizing out the MAP variables M, we are effectively solving an MPE problem over the resulting marginal.
• The variable order used in BE_MAP is constrained: it requires the MAP variables M to appear last in the order (see the sketch below).
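A sketch of BE_MAP under that constrained order: sum out the non-MAP variables first, then maximize out the MAP variables M. The `eliminate` helper generalizes the earlier BE_MPE sketch by taking the combination operator; the names and factor representation are assumptions for illustration.

```python
from itertools import product

def eliminate(domains, factors, var, combine):
    bucket = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    if not bucket:
        return rest
    scope = tuple(dict.fromkeys(v for s, _ in bucket for v in s))
    message = {}
    for values in product(*(domains[v] for v in scope)):
        row = dict(zip(scope, values))
        p = 1.0
        for s, table in bucket:
            p *= table[tuple(row[v] for v in s)]
        key = tuple(row[v] for v in scope if v != var)
        message[key] = p if key not in message else combine(message[key], p)
    return rest + [(tuple(v for v in scope if v != var), message)]

def be_map_value(domains, factors, map_vars, non_map_vars):
    for var in non_map_vars:                       # sum out non-MAP variables first
        factors = eliminate(domains, factors, var, lambda a, b: a + b)
    for var in map_vars:                           # then maximize out MAP variables
        factors = eliminate(domains, factors, var, max)
    result = 1.0
    for _, table in factors:                       # remaining constant factors
        result *= table[()]
    return result
```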

Page 19: MPE, MAP and approximations

MAP and constrained width

• Treewidth = 2
• MAP variables = {Y1, ..., Yn}
• Any order in which the MAP variables come last has width greater than or equal to n
• BE_MPE is linear while BE_MAP is exponential

Page 20: MPE, MAP and approximations

MAP and constrained width

Page 21: MPE, MAP and approximations

MAP by branch and bound search
• MAP can be solved using depth-first branch-and-bound search, just as we did for MPE.
• Algorithm BB_MAP resembles the one for computing MPE with two exceptions.
  • Exception 1: The search space consists only of the MAP variables.
  • Exception 2: We use a version of MBE_MAP for computing the bounds, ordering all MAP variables after the non-MAP variables.

Page 22: MPE, MAP and approximations

MAP by Local Search
• Given a network with n variables and an elimination order of width w
• Complexity: O(r n exp(w+1)), where r is the number of local search steps
• Start with an initial random instantiation of the MAP variables
• Neighbors of an instantiation m are the instantiations that result from changing the value of one variable in m
• Score of a neighbor m: Pr(m, e)
• How to compute Pr(m, e)? Bucket elimination. (A sketch of the search loop follows below.)
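A sketch of that local search: start from a random instantiation of the MAP variables and greedily move to the best improving neighbour (one variable's value changed). Here `score(m)` stands for Pr(m, e), which the lecture computes with bucket elimination; in this sketch it is simply a callable we are handed.

```python
import random

def local_search_map(map_vars, domains, score, max_steps=100):
    m = {v: random.choice(domains[v]) for v in map_vars}
    best = score(m)
    for _ in range(max_steps):
        improved = False
        for v in map_vars:
            for value in domains[v]:
                if value == m[v]:
                    continue
                neighbour = dict(m, **{v: value})  # flip one variable's value
                s = score(neighbour)
                if s > best:
                    m, best, improved = neighbour, s, True
        if not improved:
            break                                  # local maximum reached
    return m, best
```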

Page 23: MPE, MAP and approximations

MAP: Local search algorithm

Page 24: MPE, MAP and approximations

Recap
• Exact MPE and MAP
  • Bucket elimination
  • Branch and bound search
• Approximations
  • Mini-bucket elimination
  • Branch and bound search
  • Local search