

CS498-EA Reasoning in AI
Lecture #15

Instructor: Eyal Amir

Fall Semester 2011

Summary of last time: Inference

• We presented the variable elimination (VE) algorithm
  – Specifically, VE for finding the marginal P(Xi) over one variable Xi from X1,…,Xn
  – Choose an ordering of the variables for elimination
  – Eliminate one variable Xj at a time:
    (a) Move unneeded terms (those not involving Xj) outside the summation over Xj
    (b) Create a new potential function fXj(·) over the other variables appearing in the terms of the summation at (a)

• Works for both BNs and MFs (Markov Fields)
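As a reminder of how VE operates in practice, here is a minimal sketch (not from the lecture) that computes the marginal P(X3) in a three-variable chain X1 → X2 → X3; the CPT numbers are made up for illustration.

```python
import numpy as np

# Chain BN: X1 -> X2 -> X3, all binary (illustrative CPTs).
p_x1 = np.array([0.6, 0.4])                      # P(X1)
p_x2_given_x1 = np.array([[0.7, 0.3],            # P(X2 | X1), rows = x1
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],            # P(X3 | X2), rows = x2
                          [0.4, 0.6]])

# Eliminate X1: move P(X3|X2) outside the sum over X1 and create
# a new potential f_X1(x2) = sum_x1 P(x1) P(x2|x1).
f_x1 = np.einsum('a,ab->b', p_x1, p_x2_given_x1)     # = P(X2)

# Eliminate X2: new potential f_X2(x3) = sum_x2 f_X1(x2) P(x3|x2).
f_x2 = np.einsum('b,bc->c', f_x1, p_x3_given_x2)     # = P(X3)

print("P(X3) =", f_x2)   # sums to 1
```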

Today

1. Treewidth methods:
   1. Variable elimination
   2. Clique tree algorithm
   3. Treewidth

Junction Tree

• Why junction tree?
  – Foundations for "Loopy Belief Propagation" approximate inference
  – More efficient for some tasks than VE
  – We can avoid cycles if we turn highly-interconnected subsets of the nodes into "supernode" clusters

• Objective
  – Compute P(V = v | E = e)
  – v is a value of a variable V, and e is the evidence for a set of variables E

Properties of Junction Tree

• An undirected tree
• Each node is a cluster (a nonempty set of variables)
• Running intersection property:
  – Given two clusters X and Y, all clusters on the path between X and Y contain X ∩ Y
• Separator sets (sepsets):
  – The intersection of two adjacent clusters

(figure: example junction tree ABD —AD— ADE —DE— DEF, with cluster ABD and sepset DE highlighted)
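The running intersection property can be checked mechanically: for each variable, the clusters containing it must induce a connected subtree. Below is a small sketch (not from the lecture), assuming the networkx library and using frozensets of variable names as tree nodes; the example mirrors the ABD/ADE/DEF tree above.

```python
import networkx as nx

def has_running_intersection(tree: nx.Graph) -> bool:
    """Each node of `tree` is a frozenset of variables. For every variable,
    the clusters containing it must induce a connected subtree."""
    variables = set().union(*tree.nodes)
    for v in variables:
        containing = [c for c in tree.nodes if v in c]
        if not nx.is_connected(tree.subgraph(containing)):
            return False
    return True

# Example from the slide: ABD -- ADE -- DEF.
abd, ade, def_ = frozenset("ABD"), frozenset("ADE"), frozenset("DEF")
T = nx.Graph([(abd, ade), (ade, def_)])
print(has_running_intersection(T))   # True: e.g. D appears in all three clusters, which lie on one path
```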

Potentials

• Potentials:
  – Denoted by φ_X : dom(X) → R≥0, a nonnegative function over the instantiations of a set of variables X

• Marginalization
  – φ_Y = Σ_{X\Y} φ_X, the marginalization of φ_X onto Y ⊆ X

• Multiplication
  – φ_Z = φ_X · φ_Y, the multiplication of φ_X and φ_Y, where Z = X ∪ Y
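A minimal sketch (not from the lecture) of these two operations on table potentials, assuming binary variables and numpy arrays whose axes correspond to the variables; the numbers and variable names are illustrative only.

```python
import numpy as np

# A potential over {A, B}: axis 0 = A, axis 1 = B (nonnegative table).
phi_ab = np.array([[0.5, 1.0],
                   [2.0, 0.1]])

# Marginalization: project phi_{A,B} onto B by summing out A (the X\Y part).
phi_b = phi_ab.sum(axis=0)

# Multiplication: combine phi_{A,B} with a potential over {B, C};
# the result is a potential over the union {A, B, C}.
phi_bc = np.array([[0.3, 0.7],
                   [0.9, 0.2]])
phi_abc = phi_ab[:, :, None] * phi_bc[None, :, :]   # axes: A, B, C
print(phi_b, phi_abc.shape)   # (2,) table and a (2, 2, 2) table
```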

Properties of Junction Tree

• Belief potentials:
  – Map each instantiation of a cluster or sepset to a real number

• Constraints:
  – Consistency: for each cluster X and neighboring sepset S,
      Σ_{X\S} φ_X = φ_S
  – The joint distribution:
      P(U) = Π_i φ_{X_i} / Π_j φ_{S_j}

Properties of Junction Tree

• If a junction tree satisfies these properties, it follows that:
  – For each cluster (or sepset) X,  φ_X = P(X)
  – The probability distribution of any variable V can be computed from any cluster (or sepset) X that contains V:
      P(V) = Σ_{X\{V}} φ_X

Building Junction Trees

DAG → Moral Graph → Triangulated Graph → Identifying Cliques → Junction Tree

Constructing the Moral Graph

(figure: example DAG over nodes A–H)

Constructing The Moral Graph

• Add undirected edges between all co-parents that are not currently joined ("marrying" the parents)

(figure: the DAG over nodes A–H with the added co-parent edges)

Constructing The Moral Graph

• Add undirected edges between all co-parents that are not currently joined ("marrying" the parents)

• Drop the directions of the arcs

(figure: resulting moral graph over nodes A–H)
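A sketch of moralization on an adjacency-list DAG (not from the lecture). The edge list below is an assumption: it is reconstructed to be consistent with the cliques that appear on the later slides, so treat it as illustrative.

```python
from itertools import combinations

# Parents of each node (assumed edge list for the eight-node example).
parents = {
    "A": [], "B": ["A"], "C": ["A"], "D": ["B"],
    "E": ["C"], "F": ["D", "E"], "G": ["C"], "H": ["E", "G"],
}

# Start from the undirected skeleton: drop the arc directions.
moral_edges = {frozenset((child, p)) for child, ps in parents.items() for p in ps}

# "Marry the parents": connect every pair of co-parents that is not already joined.
for ps in parents.values():
    for u, v in combinations(ps, 2):
        moral_edges.add(frozenset((u, v)))

print(sorted(tuple(sorted(e)) for e in moral_edges))
# adds D-E (co-parents of F) and E-G (co-parents of H)
```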

Triangulating

• An undirected graph is triangulated iff every cycle of length > 3 contains a chord, i.e. an edge connecting two nonadjacent nodes of the cycle

(figure: triangulated version of the moral graph over nodes A–H)
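One standard way to triangulate is to simulate elimination: repeatedly remove a node and connect all of its remaining neighbours; the added fill-in edges make the graph chordal. A small sketch (not from the lecture) on an adjacency-set representation:

```python
def triangulate(adj, order):
    """adj: dict node -> set of neighbours (undirected graph).
    Returns the fill-in edges added when eliminating nodes in the given order."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    fill_in = set()
    for v in order:
        nbrs = list(adj[v])
        # Connect all pairs of v's remaining neighbours.
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                u, w = nbrs[i], nbrs[j]
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill_in.add(frozenset((u, w)))
        # Eliminate v.
        for u in nbrs:
            adj[u].discard(v)
        del adj[v]
    return fill_in
```

On the assumed moral graph above, an ordering such as H, G, F, C, B, D, E, A adds the fill-in edges A–D and A–E, which is consistent with the cliques listed on the next slide.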

Identifying Cliques

• A clique is a subgraph of an undirected graph that is complete and maximal

(figure: cliques of the triangulated graph over A–H: ABD, ADE, ACE, DEF, CEG, EGH)
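In practice the maximal cliques of the triangulated graph can be enumerated with a library call. A sketch using networkx (not from the lecture); the edge list is again an assumption, reconstructed so that its maximal cliques match those on the slide.

```python
import networkx as nx

# Assumed moralized + triangulated example graph.
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "D"),
              ("C", "E"), ("C", "G"), ("D", "E"), ("D", "F"), ("E", "F"),
              ("E", "G"), ("E", "H"), ("G", "H")])

cliques = [frozenset(c) for c in nx.find_cliques(G)]   # maximal cliques
print(sorted("".join(sorted(c)) for c in cliques))
# ['ABD', 'ACE', 'ADE', 'CEG', 'DEF', 'EGH']
```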

Junction Tree

• A junction tree is a subgraph of the clique graph that
  – is a tree
  – contains all the cliques
  – satisfies the running intersection property

(figure: clique graph over ABD, ADE, ACE, DEF, CEG, EGH, and a junction tree:
 ABD —AD— ADE —AE— ACE —CE— CEG —EG— EGH, with DEF attached to ADE via sepset DE)
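A junction tree can be obtained by weighting each clique-graph edge with the size of the intersection (the would-be sepset) and taking a maximum-weight spanning tree. A sketch with networkx (not from the lecture), using the clique list above:

```python
import networkx as nx
from itertools import combinations

cliques = [frozenset(c) for c in ("ABD", "ADE", "ACE", "DEF", "CEG", "EGH")]

# Clique graph: connect cliques that share variables, weighted by overlap size.
clique_graph = nx.Graph()
clique_graph.add_nodes_from(cliques)
for c1, c2 in combinations(cliques, 2):
    sep = c1 & c2
    if sep:
        clique_graph.add_edge(c1, c2, weight=len(sep))

# A maximum-weight spanning tree of the clique graph is a junction tree
# (for a triangulated graph it satisfies the running intersection property).
jt = nx.maximum_spanning_tree(clique_graph)
for c1, c2 in jt.edges:
    print("".join(sorted(c1)), "--", "".join(sorted(c1 & c2)), "--", "".join(sorted(c2)))
```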

Principle of Inference

DAG → Junction Tree → (initialization) → Inconsistent Junction Tree → (propagation) → Consistent Junction Tree → (marginalization) → P(V = v | E = e)

Example: Create Join Tree

HMM with 2 time steps: X1 → X2, with observations Y1 (from X1) and Y2 (from X2)

Junction Tree: {X1,Y1} —X1— {X1,X2} —X2— {X2,Y2}

Example: Initialization

Variable | Associated cluster | Potential function
X1       | X1,Y1              | φ_{X1,Y1} = P(X1)
Y1       | X1,Y1              | φ_{X1,Y1} = P(X1) P(Y1 | X1)
X2       | X1,X2              | φ_{X1,X2} = P(X2 | X1)
Y2       | X2,Y2              | φ_{X2,Y2} = P(Y2 | X2)

(junction tree: {X1,Y1} —X1— {X1,X2} —X2— {X2,Y2})
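A sketch of this initialization step in numpy (not from the lecture); the CPT values are made up, only the assignment of factors to clusters follows the table above.

```python
import numpy as np

# Illustrative CPTs for a binary two-step HMM.
p_x1 = np.array([0.5, 0.5])                          # P(X1)
p_y1_given_x1 = np.array([[0.8, 0.2], [0.3, 0.7]])   # P(Y1 | X1), rows = x1
p_x2_given_x1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(X2 | X1), rows = x1
p_y2_given_x2 = np.array([[0.8, 0.2], [0.3, 0.7]])   # P(Y2 | X2), rows = x2

# Cluster potentials (axes follow the variable order in the cluster name).
phi_x1y1 = p_x1[:, None] * p_y1_given_x1     # cluster {X1,Y1}: P(X1) P(Y1|X1)
phi_x1x2 = p_x2_given_x1.copy()              # cluster {X1,X2}: P(X2|X1)
phi_x2y2 = p_y2_given_x2.copy()              # cluster {X2,Y2}: P(Y2|X2)

# Sepset potentials start out as all-ones tables.
phi_sep_x1 = np.ones(2)
phi_sep_x2 = np.ones(2)
```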

Example: Collect Evidence

• Choose an arbitrary clique, e.g. {X1,X2}, into which all potential functions will be collected.

• Recursively call the neighboring cliques for messages:

• 1. Call {X1,Y1}:
  – 1. Projection:  φ_X1 = Σ_{{X1,Y1}\{X1}} φ_{X1,Y1} = Σ_{Y1} P(X1,Y1) = P(X1)
  – 2. Absorption:  φ_{X1,X2} ← φ_{X1,X2} · φ_X1 / φ_X1^old = P(X2 | X1) P(X1) = P(X1,X2)

Example: Collect Evidence (cont.)

• 2. Call {X2,Y2}:
  – 1. Projection:  φ_X2 = Σ_{{X2,Y2}\{X2}} φ_{X2,Y2} = Σ_{Y2} P(Y2 | X2) = 1
  – 2. Absorption:  φ_{X1,X2} ← φ_{X1,X2} · φ_X2 / φ_X2^old = P(X1,X2)

(junction tree: {X1,Y1} —X1— {X1,X2} —X2— {X2,Y2})

Example: Distribute Evidence

• Pass messages recursively to neighboring nodes

• Pass message from {X1,X2} to {X1,Y1}:
  – 1. Projection:  φ_X1 = Σ_{{X1,X2}\{X1}} φ_{X1,X2} = Σ_{X2} P(X1,X2) = P(X1)
  – 2. Absorption:  φ_{X1,Y1} ← φ_{X1,Y1} · φ_X1 / φ_X1^old = P(X1,Y1) · P(X1) / P(X1) = P(X1,Y1)

Example: Distribute Evidence (cont.)

• Pass message from {X1,X2} to {X2,Y2}:
  – 1. Projection:  φ_X2 = Σ_{{X1,X2}\{X2}} φ_{X1,X2} = Σ_{X1} P(X1,X2) = P(X2)
  – 2. Absorption:  φ_{X2,Y2} ← φ_{X2,Y2} · φ_X2 / φ_X2^old = P(Y2 | X2) · P(X2) / 1 = P(X2,Y2)

(junction tree: {X1,Y1} —X1— {X1,X2} —X2— {X2,Y2})
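Putting the collect and distribute passes together, here is a self-contained sketch (not from the lecture, same made-up CPTs as above). After the full sweep each cluster potential equals the joint over its variables, as on the slides.

```python
import numpy as np

p_x1 = np.array([0.5, 0.5])
p_y1_given_x1 = np.array([[0.8, 0.2], [0.3, 0.7]])
p_x2_given_x1 = np.array([[0.9, 0.1], [0.2, 0.8]])
p_y2_given_x2 = np.array([[0.8, 0.2], [0.3, 0.7]])

# Initialization (as before).
phi_x1y1 = p_x1[:, None] * p_y1_given_x1
phi_x1x2 = p_x2_given_x1.copy()
phi_x2y2 = p_y2_given_x2.copy()
sep_x1 = np.ones(2)
sep_x2 = np.ones(2)

def pass_message(src, src_axis_to_sum, sep_old, dst, dst_axis):
    """Projection: marginalize src onto the sepset (sum out src_axis_to_sum).
    Absorption: scale dst by new/old sepset potential along dst_axis."""
    sep_new = src.sum(axis=src_axis_to_sum)
    ratio = np.divide(sep_new, sep_old, out=np.zeros_like(sep_new), where=sep_old != 0)
    shape = [1, 1]
    shape[dst_axis] = 2
    return sep_new, dst * ratio.reshape(shape)

# Collect toward the root {X1,X2}.
sep_x1, phi_x1x2 = pass_message(phi_x1y1, 1, sep_x1, phi_x1x2, 0)  # from {X1,Y1}
sep_x2, phi_x1x2 = pass_message(phi_x2y2, 1, sep_x2, phi_x1x2, 1)  # from {X2,Y2}

# Distribute from the root back out.
sep_x1, phi_x1y1 = pass_message(phi_x1x2, 1, sep_x1, phi_x1y1, 0)  # to {X1,Y1}
sep_x2, phi_x2y2 = pass_message(phi_x1x2, 0, sep_x2, phi_x2y2, 0)  # to {X2,Y2}

print(phi_x1x2)          # P(X1, X2)
print(phi_x2y2)          # P(X2, Y2)
print(phi_x1x2.sum())    # 1.0 after the full sweep
```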

Example: Inference with evidence

• Assume we want to compute P(X2 | Y1=0, Y2=1) (state estimation)

• Assign likelihoods to the potential functions during initialization:

  φ_{X1,Y1} = 0 if Y1 = 1,  P(X1, Y1=0) if Y1 = 0
  φ_{X2,Y2} = 0 if Y2 = 0,  P(Y2=1 | X2) if Y2 = 1

Example: Inference with evidence (cont.)

• Repeating the same steps as in the previous case, we obtain:

  φ_{X1,Y1} = 0 if Y1 = 1,  P(X1, Y1=0, Y2=1) if Y1 = 0
  φ_X1      = P(X1, Y1=0, Y2=1)
  φ_{X1,X2} = P(X1, Y1=0, X2, Y2=1)
  φ_X2      = P(Y1=0, X2, Y2=1)
  φ_{X2,Y2} = 0 if Y2 = 0,  P(Y1=0, X2, Y2=1) if Y2 = 1
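A sketch (not from the lecture) of the same propagation with the evidence Y1 = 0, Y2 = 1 entered by zeroing the inconsistent entries at initialization; the answer P(X2 | Y1=0, Y2=1) is obtained by normalizing the root marginal. CPTs are the same made-up numbers as before.

```python
import numpy as np

p_x1 = np.array([0.5, 0.5])
p_y1_given_x1 = np.array([[0.8, 0.2], [0.3, 0.7]])
p_x2_given_x1 = np.array([[0.9, 0.1], [0.2, 0.8]])
p_y2_given_x2 = np.array([[0.8, 0.2], [0.3, 0.7]])

# Initialization with evidence: zero out entries inconsistent with Y1=0, Y2=1.
phi_x1y1 = p_x1[:, None] * p_y1_given_x1
phi_x1y1[:, 1] = 0.0                    # keep only Y1 = 0
phi_x2y2 = p_y2_given_x2.copy()
phi_x2y2[:, 0] = 0.0                    # keep only Y2 = 1
phi_x1x2 = p_x2_given_x1.copy()

# Collect to the root {X1,X2} (old sepset potentials are all ones).
phi_x1x2 *= phi_x1y1.sum(axis=1)[:, None]    # message over X1: P(X1, Y1=0)
phi_x1x2 *= phi_x2y2.sum(axis=1)[None, :]    # message over X2: P(Y2=1 | X2)

# Root now holds P(X1, Y1=0, X2, Y2=1); marginalize and normalize.
p_x2_and_e = phi_x1x2.sum(axis=0)            # P(X2, Y1=0, Y2=1)
p_x2_given_e = p_x2_and_e / p_x2_and_e.sum() # P(X2 | Y1=0, Y2=1)
print(p_x2_given_e)
```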

Next Time

• Learning BNs and MFs

THE END

Example: Naïve Bayesian Model

• A common model in early diagnosis:
  – Symptoms are conditionally independent given the disease (or fault)

• Thus, if
  – X1,…,Xp denote the symptoms exhibited by the patient (headache, high fever, etc.), and
  – H denotes the hypothesis about the patient's health,

• then P(X1,…,Xp,H) = P(H) P(X1|H) … P(Xp|H)

• This naïve Bayesian model allows a compact representation
  – It does, however, embody strong independence assumptions
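A small sketch (not from the lecture) of this factorization, computing the posterior P(H | x1,…,xp) for made-up numbers with a binary hypothesis and three binary symptoms.

```python
import numpy as np

# Illustrative naive Bayes model: binary hypothesis H, three binary symptoms.
p_h = np.array([0.95, 0.05])                      # P(H): healthy / ill
p_x_given_h = np.array([                          # P(Xi = 1 | H) for each symptom i
    [0.10, 0.80],   # headache
    [0.05, 0.70],   # high fever
    [0.20, 0.60],   # fatigue
])

def posterior(symptoms):
    """P(H | x1..xp) = P(H) * prod_i P(xi | H), normalized over H."""
    joint = p_h.copy()
    for i, x in enumerate(symptoms):
        joint *= p_x_given_h[i] if x == 1 else 1.0 - p_x_given_h[i]
    return joint / joint.sum()

print(posterior([1, 1, 0]))   # posterior over H given the observed symptoms
```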

Elimination on Trees

• Formally, for any tree, there is an elimination ordering with induced width = 1

Theorem

• Inference on trees is linear in the number of variables
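A sketch (not from the lecture) of how such an ordering can be found: repeatedly eliminate a leaf of the tree, so every eliminated variable has at most one remaining neighbour (induced width 1). Plain Python; the example tree is made up.

```python
def tree_elimination_order(adj):
    """adj: dict node -> set of neighbours of an undirected tree.
    Returns an ordering in which every eliminated node has <= 1 remaining neighbour."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while adj:
        # Pick any current leaf (or the last remaining node).
        v = next(u for u, ns in adj.items() if len(ns) <= 1)
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order

# Example tree: a star with centre A.
tree = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
print(tree_elimination_order(tree))   # e.g. ['B', 'C', 'D', 'A']
```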