Recent Development on Elimination Ordering Group 1.

Post on 20-Dec-2015


By Florence Lo and Andrew Yan

Table of Contents

1. Problem Statement
2. Background
3. Motivations
4. Approaches to Elimination Orderings
5. An approximation algorithm for triangulation
6. Q & A

Problem Statement

Minimizing the total cost of vertex elimination

Background

A Bayesian network (BN) specifies a complete joint probability distribution (JPD) over all variables. Given the JPD, one can answer any inference query by summing out the irrelevant variables. Assuming there are n variables with 2 states each, the JPD has 2^n entries, so this takes O(2^n) time.

Background (cont)

To make inference more efficient, 2 exact inference algorithms are used:

1) Variable Elimination Algorithm (VE)

2) Clique Tree Propagation (CTP)

Variable Elimination (VE)

Use the factored representation of the JPD to do marginalization efficiently. The key idea is to “push sums in” as far as possible when summing out irrelevant terms.

The complexity of VE depends on the cost of eliminating the variables (i.e. on the elimination ordering).
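As a rough illustration of pushing sums in, the sketch below computes P(C) on a hypothetical chain A -> B -> C of binary variables, once by summing the full joint and once by eliminating A and then B. The network, its probability tables and the function names are assumptions made for this example only.

```python
from itertools import product

# Hypothetical chain network A -> B -> C, all variables binary.
p_a = [0.6, 0.4]                          # P(A)
p_b_given_a = [[0.7, 0.3], [0.2, 0.8]]    # P(B | A), indexed [a][b]
p_c_given_b = [[0.9, 0.1], [0.5, 0.5]]    # P(C | B), indexed [b][c]


def marginal_c_brute_force():
    """Sum the full joint over A and B: visits all 2^3 joint entries."""
    result = [0.0, 0.0]
    for a, b, c in product(range(2), repeat=3):
        result[c] += p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    return result


def marginal_c_variable_elimination():
    """Eliminate A, then B, so no intermediate table has more than 2 entries."""
    # Sum out A: tau(b) = sum_a P(a) P(b | a)
    tau_b = [sum(p_a[a] * p_b_given_a[a][b] for a in range(2)) for b in range(2)]
    # Sum out B: P(c) = sum_b tau(b) P(c | b)
    return [sum(tau_b[b] * p_c_given_b[b][c] for b in range(2)) for c in range(2)]


brute = marginal_c_brute_force()
ve = marginal_c_variable_elimination()
```

Both functions return the same marginal. On a chain the saving is small, but the brute-force loop grows as 2^n while the elimination version only ever multiplies tables over two variables.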

Clique Tree Propagation (CTP)

Inference in a BN is formulated as message passing in a junction tree.

The first step in CTP is to triangulate the graph. One way is to eliminate the vertices one by one, adding the extra edges along the way (i.e. following an elimination ordering).
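A minimal sketch of triangulation by elimination: when a vertex is eliminated, all of its remaining neighbours are connected to each other, and the edges added this way are the fill. The function name and the dictionary-of-sets graph representation are assumptions for illustration.

```python
from itertools import combinations


def triangulate_by_elimination(adjacency, order):
    """Return the fill edges added when vertices are eliminated in `order`.

    `adjacency` maps each vertex to the set of its neighbours (undirected graph).
    Vertex labels must be sortable so that edges can be reported as (small, large).
    """
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}  # working copy
    fill = set()
    for v in order:
        neighbours = graph[v]
        # Connect every pair of not-yet-eliminated neighbours of v.
        for a, b in combinations(sorted(neighbours), 2):
            if b not in graph[a]:
                graph[a].add(b)
                graph[b].add(a)
                fill.add((a, b))
        # Remove v from the remaining graph.
        for u in neighbours:
            graph[u].discard(v)
        del graph[v]
    return fill


# A 4-cycle 0-1-2-3-0 is the smallest graph that is not triangulated.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
fill_from_0 = triangulate_by_elimination(cycle, [0, 1, 2, 3])
fill_from_1 = triangulate_by_elimination(cycle, [1, 0, 2, 3])
```

Starting from vertex 0 adds the chord (1, 3), while starting from vertex 1 adds the chord (0, 2). Either chord triangulates the cycle.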

Discussion

Different elimination orderings lead to different costs

Finding an optimal elimination ordering is NP-hard
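To see how much the ordering can matter, the sketch below scores two orderings of a star graph. It assumes the cost of eliminating a vertex is the size of the table over that vertex and its current neighbours, i.e. states^(degree + 1). This cost model and the name `elimination_cost` are illustrative assumptions, not definitions taken from the slides.

```python
from itertools import combinations


def elimination_cost(adjacency, order, states=2):
    """Total table size produced by eliminating vertices in `order`."""
    graph = {v: set(n) for v, n in adjacency.items()}  # working copy
    total = 0
    for v in order:
        nbrs = graph.pop(v)
        total += states ** (len(nbrs) + 1)   # table over v and its neighbours
        for u in nbrs:
            graph[u].discard(v)
        for a, b in combinations(nbrs, 2):   # fill-in among the neighbours
            graph[a].add(b)
            graph[b].add(a)
    return total


# Star graph: centre 0 connected to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
centre_first = elimination_cost(star, [0, 1, 2, 3, 4])
leaves_first = elimination_cost(star, [1, 2, 3, 4, 0])
```

Eliminating the centre first turns the four leaves into a clique and costs 62 in total, while eliminating the leaves first adds no edges and costs 18.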

3 Approaches to Establish Elimination Orderings

1. Elimination Ordering Heuristics

2. Triangulation

3. Simulated Annealing

Elimination Ordering Heuristics

Maximum Cardinality Search (Tarjan et al. 1985)

Minimum Deficiency Heuristics (Bertele et al. 1972)

Minimum Degree Heuristics (Rose 1972)
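Two of the heuristics above can be sketched as one greedy loop with different scoring functions: minimum degree picks the vertex with the fewest remaining neighbours, and minimum deficiency picks the vertex whose elimination adds the fewest fill edges. The function names and the tie-breaking rule (smallest label first) are assumptions for illustration, and maximum cardinality search is not covered here.

```python
from itertools import combinations


def greedy_ordering(adjacency, score):
    """Repeatedly eliminate the vertex with the lowest score."""
    graph = {v: set(n) for v, n in adjacency.items()}  # working copy
    order = []
    while graph:
        # Ties are broken by vertex label because min() keeps the first minimum.
        v = min(sorted(graph), key=lambda u: score(graph, u))
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
        for a, b in combinations(nbrs, 2):   # add fill edges
            graph[a].add(b)
            graph[b].add(a)
        order.append(v)
    return order


def degree(graph, v):
    """Minimum degree score: number of remaining neighbours."""
    return len(graph[v])


def deficiency(graph, v):
    """Minimum deficiency score: fill edges that eliminating v would add."""
    return sum(1 for a, b in combinations(graph[v], 2) if b not in graph[a])


# Star graph: centre 0 connected to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
min_degree_order = greedy_ordering(star, degree)
min_deficiency_order = greedy_ordering(star, deficiency)
```

On the star graph both heuristics eliminate leaves before the centre, so no fill edge is ever added.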

Discussions

Easy to implement
Concepts are easy
Linear time complexity
Good approximation to the optimal solution

Triangulation

Objectives

1. The size of the largest clique is minimized

2. Minimal triangulation

Minimal Triangulation

Computing a minimal triangulation consists in embedding a given graph in a triangulated graph by adding a set of edges (called a fill). If the set of edges added is inclusion-minimal, the fill is said to be minimal, and the corresponding triangulated graph is called a minimal triangulation.
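One way to test whether a fill is minimal uses a known characterisation due to Rose, Tarjan and Lueker: a triangulation is minimal exactly when removing any single fill edge leaves a graph that is no longer triangulated. The sketch below applies that test with a simple (not linear-time) chordality check. All function names are assumptions for illustration.

```python
from itertools import combinations


def is_chordal(adjacency):
    """A graph is triangulated iff simplicial vertices can be removed until none remain."""
    graph = {v: set(n) for v, n in adjacency.items()}  # working copy
    while graph:
        # A vertex is simplicial when its neighbours are pairwise adjacent.
        simplicial = next(
            (v for v in graph
             if all(b in graph[a] for a, b in combinations(graph[v], 2))),
            None,
        )
        if simplicial is None:
            return False
        for u in graph.pop(simplicial):
            graph[u].discard(simplicial)
    return True


def with_edges(adjacency, edges):
    """Return a copy of the graph with the given edges added."""
    graph = {v: set(n) for v, n in adjacency.items()}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def is_minimal_fill(adjacency, fill):
    """True if `fill` triangulates the graph and no single fill edge can be dropped."""
    fill = set(fill)
    if not is_chordal(with_edges(adjacency, fill)):
        return False  # not a triangulation at all
    return all(not is_chordal(with_edges(adjacency, fill - {e})) for e in fill)


cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
one_chord = is_minimal_fill(cycle, {(0, 2)})
two_chords = is_minimal_fill(cycle, {(0, 2), (1, 3)})
```

For the 4-cycle, a single chord is a minimal fill, while adding both chords is a triangulation that is not minimal.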

Discussion

Minimal triangulation generally improves the total cost

May get stuck in local minima of the cost function

NP-hard

Simulated Annealing

A stochastic optimization algorithm for finding a global minimum cost configuration of NP-complete combinatorial problems whose cost functions have many local minima

A combination of deterministic descent search and a Monte Carlo method

Accepts an increase in the cost function with a positive probability that depends on the state of the search process

Pseudo-code

Select an initial solution s0
Select an initial temperature t0 > 0 and set t = t0
Select a temperature reduction function α
Repeat
    Repeat
        randomly select s from N(s0)
        δ = f(s) - f(s0)
        If δ < 0 then
            s0 = s
        else
            generate random x uniformly in the range (0, 1)
            if x < exp(-δ/t) then s0 = s
    until iteration_count = nrep
    Set t = α(t)
Until stopping condition = true
s0 is the approximation to the optimal solution
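The pseudo-code can be sketched in Python for the elimination ordering problem. The problem-specific choices here are assumptions made for illustration: a solution is a permutation of the vertices, the neighbourhood N(s0) swaps two positions, the cost f is the total table size (states^(degree + 1) per eliminated vertex), cooling is geometric, and the search stops when the temperature drops below `t_min`. Unlike the pseudo-code, the sketch also remembers the best ordering seen.

```python
import math
import random
from itertools import combinations


def elimination_cost(adjacency, order, states=2):
    """Assumed cost function f: total table size for the given ordering."""
    graph = {v: set(n) for v, n in adjacency.items()}
    total = 0
    for v in order:
        nbrs = graph.pop(v)
        total += states ** (len(nbrs) + 1)
        for u in nbrs:
            graph[u].discard(v)
        for a, b in combinations(nbrs, 2):
            graph[a].add(b)
            graph[b].add(a)
    return total


def anneal(adjacency, t0=10.0, alpha=0.9, nrep=50, t_min=0.01, seed=0):
    rng = random.Random(seed)
    current = sorted(adjacency)                    # initial solution s0
    current_cost = elimination_cost(adjacency, current)
    best, best_cost = list(current), current_cost
    t = t0
    while t > t_min:                               # assumed stopping condition
        for _ in range(nrep):
            # Neighbour: swap two positions of the current ordering.
            i, j = rng.sample(range(len(current)), 2)
            candidate = list(current)
            candidate[i], candidate[j] = candidate[j], candidate[i]
            delta = elimination_cost(adjacency, candidate) - current_cost
            # Always accept improvements; accept worse moves with prob. exp(-delta/t).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        t *= alpha                                 # temperature reduction α(t) = alpha * t
    return best, best_cost


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
best_order, best_cost = anneal(star)
```

The parameters `t0`, `alpha`, `nrep` and `t_min` are the control parameters that usually need experimentation. The defaults here are arbitrary.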

2 Categories of Decisions to Make

1. Parameters of the annealing algorithm

- t0, the cooling schedule (governed by nrep and α), and the stopping conditions

2. Problem-specific decisions: the choice of the space of feasible solutions, the form of the cost function, and the neighborhood structure employed

Discussion

Time consuming
Extensive experiments on the control parameters
Enhancements and modifications to speed up computation time

An approximation algorithm for triangulation

By Ann Becker and Dan Geiger in 1996
Same state space size
Optimality criterion
– Cliquewidth, k
(2α+1)-approximation
– α = approximation ratio for 3-way vertex cut
O(2^((2α+1)k) · n · poly(n))
– poly(n) = linear programming
Divide and conquer

The algorithm

Triangulate(G, W, k)
    If |V| < (2α+1)k then
        make a clique out of G
    Else
        Find a W-decomposition (X, A, B, C) of G wrt (k, α)
        If not found
            Return “cliquewidth > k”
        W_A ← W ∩ A, W_B ← W ∩ B, W_C ← W ∩ C
        Call Triangulate(G[A ∪ X], W_A, k)
        Call Triangulate(G[B ∪ X], W_B, k)
        Call Triangulate(G[C ∪ X], W_C, k)
        Make a clique of G[W ∪ X]

Trial and error

Try k = 1, 2, 3, … until success

Example (k=3, α=1)

Improvement

Processing the input
– Simplicial vertex
– Removed repeatedly
– Improves the running time
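The preprocessing step can be sketched as follows. A vertex is simplicial when its neighbours already form a clique, so removing it needs no fill edges and only shrinks the graph handed to the main algorithm. The function name and the scan order are assumptions for illustration.

```python
from itertools import combinations


def strip_simplicial(adjacency):
    """Repeatedly remove simplicial vertices; return (removed, remaining graph)."""
    graph = {v: set(n) for v, n in adjacency.items()}  # working copy
    removed = []
    changed = True
    while changed:
        changed = False
        for v in sorted(graph):
            # v is simplicial when every pair of its neighbours is adjacent.
            if all(b in graph[a] for a, b in combinations(graph[v], 2)):
                for u in graph.pop(v):
                    graph[u].discard(v)
                removed.append(v)
                changed = True
                break  # rescan, since removing v may create new simplicial vertices
    return removed, graph


# A 4-cycle 0-1-2-3-0, plus vertex 4 attached to the edge 0-1, plus leaf 5 on 4.
g = {0: {1, 3, 4}, 1: {0, 2, 4}, 2: {1, 3}, 3: {0, 2}, 4: {0, 1, 5}, 5: {4}}
removed, core = strip_simplicial(g)
```

Here vertex 5 is removed first, which makes vertex 4 simplicial. What remains is the 4-cycle, which has no simplicial vertex.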

Improvement

Termination condition
– |V| < (2α+1)k: form a clique
– Junction tree instead
– W as a clique
– Approximation may be improved

Improvement

Post-processing the output
– The output may not be a minimal triangulation
– Remove some added edges so that the graph is still triangulated
– Kjaerulff’s algorithm

In practice

Time complexity O(2^((2α+1)k) · n · poly(n))
|W| < k in most cases
– 2^(4.66k) → 2^(2k)
W consists of two subsets
– No 3-way vertex cut
Step further for k
    Find a W-decomposition (X, A, B, C) of G wrt (k, α)
    If not found
        Return “cliquewidth > k”

A tighter bound

Let l be the size of the largest clique in the output
Test against (2α+1)k
– The ratio can be smaller than 2α+1
– Optimal cliquewidth can be larger than k
l/k instead of 2α+1
Instance-specific a posteriori bound

The weighted problem

Different state spaces for each vertex
– w(v) = log2(state space size)
– w(clique) = sum of w(v) in the clique
Weighted W-decomposition
When the recursion terminates, run a greedy algorithm (minimum weight heuristics)
(2α+1)m
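A small sketch of the weights defined above. Because w(v) is a base-2 logarithm, summing weights over a clique corresponds to multiplying state space sizes, so 2^w(clique) is the size of the clique's table. The variable names and state counts are hypothetical.

```python
import math


def vertex_weight(num_states):
    """w(v) = log2(state space size of v)."""
    return math.log2(num_states)


def clique_weight(clique, num_states):
    """w(clique) = sum of w(v) over the vertices in the clique."""
    return sum(vertex_weight(num_states[v]) for v in clique)


# Hypothetical variables with 2, 4 and 8 states.
states = {"A": 2, "B": 4, "C": 8}
w = clique_weight(["A", "B", "C"], states)   # 1 + 2 + 3
table_size = 2 ** w                          # 2 * 4 * 8
```

With all variables binary, every vertex has weight 1 and the weight of a clique is simply its size, which recovers the unweighted problem.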

Results

Medianus I (43 vertices, 110 edges)
Compared with the enhanced minimum weight heuristics
– Better when the state space increases
l/k = 10/6 (not 3.66)
Run time: one or two minutes

Discussion

O(2^(4.66k) · n · poly(n))
– Polynomial for k = O(log n)
Exponential time for an arbitrary inference

What is cliquewidth?

Undirected graph
The size of the largest clique in the junction tree of the graph in which the size of the largest clique is minimized


What is 3-way vertex cut?

A weighted undirected graph
3 vertices
A set of vertices of minimum weight whose removal leaves the three vertices disconnected
4/3-approximation
2-approximation


What is W-decomposition?

An integer k ≥ 1, a real number α ≥ 1
A graph G = (V, E), |V| ≥ (2α+1)k
W ⊆ V
A decomposition (X, A, B, C) wrt (k, α)
– |W| < (α+1)k
– |X| < k
– |(W ∩ A) ∪ X| < (α+1)k
– |(W ∩ B) ∪ X| < (α+1)k
– |(W ∩ C) ∪ X| < (α+1)k


What is decomposition?

A graph G = (V, E)
A partition (X, A, B, C)
– A, B ≠ ∅
– No edges between A, B, C
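The definition can be checked mechanically. The sketch below assumes that A and B must be non-empty (the symbol after "A, B" is missing in the transcript, so this is an assumption) and that "no edges between A, B, C" means no edge joins two different sets among A, B and C. The function name is hypothetical.

```python
def is_decomposition(adjacency, X, A, B, C):
    """Check that (X, A, B, C) is a decomposition of the graph.

    `adjacency` maps each vertex to the set of its neighbours.
    """
    X, A, B, C = set(X), set(A), set(B), set(C)
    parts = [X, A, B, C]
    # The four sets must be pairwise disjoint and together cover V.
    if sum(len(p) for p in parts) != len(adjacency):
        return False
    if set().union(*parts) != set(adjacency):
        return False
    # Assumption: A and B must be non-empty (C and X may be empty).
    if not A or not B:
        return False
    # No edge may join two different sets among A, B and C.
    for part in (A, B, C):
        others = (A | B | C) - part
        if any(adjacency[v] & others for v in part):
            return False
    return True


# Path 0 - 1 - 2: removing the middle vertex separates the two ends.
path = {0: {1}, 1: {0, 2}, 2: {1}}
good = is_decomposition(path, X={1}, A={0}, B={2}, C=set())
bad = is_decomposition(path, X=set(), A={0, 1}, B={2}, C=set())
```

In the second call the edge 1-2 runs between A and B, so the partition is rejected.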
