Compiling Graphical Models

Post on 10-Jan-2016


Compiling Graphical Models. Adnan Darwiche, University of California, Los Angeles. UAI’06 Tutorial. Compilation: Historical Motivation. Separate inference into two phases. Offline: compile model into a structure. Online: use structure to answer queries.


Compiling Graphical Models

Adnan Darwiche
University of California, Los Angeles

UAI’06 Tutorial

Compilation: Historical Motivation

Separate inference into two phases:
• Offline: Compile model into a structure
• Online: Use structure to answer queries

Goal: Push as much work as possible into the offline phase to optimize online inference time

Best initial example:
• Offline: Compile a Bayesian network into a jointree
• Online: Use jointree to answer multiple queries efficiently

Compilation: Modern Motivation

Exploit model structure in inference:

Global structure:
• Exhibited in model topology
• Measured by treewidth
• Exploited by most (non-compilation) algorithms

Local structure:
• Exhibited in model parameters
• Type 1: Determinism
• Type 2: Context-specific independence

Main theme: local structure is best exploited in the context of compilation

Compilation: Theoretical Implications

Unifies inference paradigms:
• Variable elimination
• Jointree (tree clustering)
• Conditioning

Compilation as a trace of classical inference

Bayesian Networks

[Figure: Bayesian network with nodes Battery Age, Alternator, Fan Belt, Battery Charge Delivered, Battery Power, Starter, Radio, Lights, Engine Turn Over, Gas Gauge, Gas, Fuel Pump, Fuel Line, Distributor, Spark Plugs, Engine Start. Caption: Local Knowledge]

Bayesian Networks

[Figure: the same network, with the CPT of Lights given Battery Power]

Battery Power   Lights = ON   Lights = OFF
OK              .99           .01
WEAK            .20           .80
DEAD            0             1

If Battery Power = OK, then Lights = ON (99%)

….

Bayesian Networks

[Figure: the same network]

Global Structure: Treewidth w

O(n exp(w))

Local Structure: CSI and Determinism

[Figure: the same network, illustrating Context-Specific Independence (CSI)]

Local Structure: CSI and Determinism

[Figure: the same network, with the CPT of Lights given Battery Power]

Battery Power   Lights = ON   Lights = OFF
OK              .99           .01
WEAK            .20           .80
DEAD            0             1

Determinism: If Battery Power = DEAD, then Lights = OFF

Today’s Models …

Characterized by:
• Richness in local structure (determinism, CSI)
• Massiveness in size (100,000's of variables not uncommon)
• High connectivity (treewidth > 50, > 100)

Enabled by:
• High-level modeling tools: relational, first order
• New application areas (synthesis): bioinformatics (e.g. linkage analysis), sensor networks

Exploiting local structure is a must!

High Order Specifications: Relational Models…

burglary(v)=0.005;
alarm(v)=(burglary(v):0.95,0.01);
calls(v,w)= (neighbor(v,w): (prankster(v)): (alarm(w):0.9,0.05), (alarm(w):0.9,0)),0);
alarmed(v)= n-or{calls(w,v)|w:neighbor(w,v)}

Primula

Friends and Smokers (Richardson & Domingos, 2004)

• M individuals
• Relations such as smokes(p), cancer(p), friend(p1,p2)
• Logical constraints such as: if one of p's friends smokes, then p smokes.

Sample Query: probability that a given person has cancer

M    Network params   Treewidth* w   CNF Vars   CNF Clauses
1    34               3              12         31
4    1,552            13             414        1,390
7    7,714            36             1,995      6,916
10   21,760           70             5,565      19,525
13   46,930           118            11,934     42,133
16   86,464           172            21,912     77,656
19   143,602          244            36,309     129,010
22   221,584          316            55,935     199,111
25   323,650          412            81,600     290,875
28   453,040          528            114,114    407,218
29   502,802          560            126,614    451,965

Students(Pasula & Russell, 2001)

• P professors
• S students
• Various relations, such as famous(p), well-funded(p), success(s), advises(p,s)

Sample Query: probability a professor is well-funded given success of advised students

Students-Profs   Network params   Treewidth w   CNF Vars
04-08            11,566           72            3,099
04-16            21,070           101           5,859
05-10            20,688           128           5,624
05-20            38,168           148           10,734
06-12            33,454           176           9,209
06-24            62,302           233           17,693

Genetic Linkage Analysis

• Ordering genes on a chromosome and determining distance between them
• Useful for predicting and detecting diseases
• Associating functionality of genes with their location on the chromosome

[Figure: Gene 1, Gene 2, Gene 3]

Pedigrees + Phenotype + Genotype

DBNs from Speech Applications

Coding Networks

Tutorial Outline

• Theoretical foundations
• Online query answering algorithms
• Offline compilation algorithms
• Applications
• Concluding remarks

Theoretical Foundations

Graphical Model (Bayesian, Markov Networks): is a Multi-Linear Function (MLF)

Compiled Model: is an Arithmetic Circuit (AC)

Compilation process: Factoring MLF into AC

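Multi-Linear Functions → Arithmetic Circuits (Factoring)

Network: A → B

f = λa λb θa θb|a + λa λ~b θa θ~b|a + λ~a λb θ~a θb|~a + λ~a λ~b θ~a θ~b|~a

[Figure: arithmetic circuit of + and * nodes over the indicators λ and parameters θ that factors f]

A Differential Approach to Inference in Bayesian Networks, JACM-03 (Darwiche)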

Factoring Multi-linear Functions (MLFs)

MLF: a + ad + abd + abcd

[Figure: Arithmetic Circuit (AC) of * and + nodes over the leaves a, b, c, d, 1]

An MLF has an exponential number of terms, yet it may be represented by an AC with polynomial size!

• A graphical model defines an MLF

• Evaluating the MLF for a given evidence gives the probability of evidence

• The inference problem can be formulated as factoring the MLF of a graphical model

Circuit Complexity: Size of smallest AC that computes the MLF
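To make the factoring idea concrete, here is a minimal sketch that compares the MLF from the slide, written term by term, with one nested factorization of it. The particular factorization a * (1 + d * (1 + b * (1 + c))) is an assumption chosen for illustration; the circuit drawn on the slide may be structured differently.

```python
import itertools

def mlf_expanded(a, b, c, d):
    # The MLF written term by term: a + ad + abd + abcd
    # (6 multiplications, 3 additions).
    return a + a * d + a * b * d + a * b * c * d

def mlf_factored(a, b, c, d):
    # An assumed factorization of the same function:
    # a * (1 + d * (1 + b * (1 + c)))  (3 multiplications, 3 additions).
    return a * (1 + d * (1 + b * (1 + c)))

# Both forms agree on every point of a small integer grid.
points = list(itertools.product([0, 1, 2, 3], repeat=4))
agree = all(mlf_expanded(*p) == mlf_factored(*p) for p in points)
```

The factored form does the same job with fewer operations. That saving is what the AC size measures.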

Graphical Models as MLFs

A       B       Pr(.)
true    true    .03
true    false   .27
false   true    .56
false   false   .14

Pr(a) = .03 + .27 = .3

Pr(~b) = .27 + .14 = .41

Each row is multiplied by the indicators of its values:

A       B       Pr(.)
true    true    λa * λb * .03
true    false   λa * λ~b * .27
false   true    λ~a * λb * .56
false   false   λ~a * λ~b * .14

F(λ~a, λ~b, λa, λb) = .03λaλb + .27λaλ~b + .56λ~aλb + .14λ~aλ~b

Pr(a,~b)= F(λ~a:0, λ~b:1, λa:1 , λb:0) = .27

Pr(a)= F(λ~a:0, λ~b:1, λa:1 , λb:1) = .03+.27
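The two evaluations above can be reproduced with a short sketch. It stores the joint table from the slides and evaluates F by setting each indicator to 0 when it contradicts the evidence and to 1 otherwise. The function and argument names (`F`, `indicators`, `pr`, `lam_a`, ...) are illustrative choices, not part of the tutorial.

```python
# Joint distribution from the slides, keyed by the truth values of (A, B).
JOINT = {
    (True, True): 0.03,
    (True, False): 0.27,
    (False, True): 0.56,
    (False, False): 0.14,
}

def F(lam_a, lam_not_a, lam_b, lam_not_b):
    """Evaluate the MLF: one term per row, weighted by the row's indicators."""
    total = 0.0
    for (a, b), prob in JOINT.items():
        ind_a = lam_a if a else lam_not_a
        ind_b = lam_b if b else lam_not_b
        total += ind_a * ind_b * prob
    return total

def indicators(evidence):
    """Map evidence such as {"A": True} to indicator values.

    An indicator is 0 if its value contradicts the evidence, 1 otherwise.
    """
    def lam(var, value):
        return 1 if evidence.get(var, value) == value else 0
    return {
        "lam_a": lam("A", True),
        "lam_not_a": lam("A", False),
        "lam_b": lam("B", True),
        "lam_not_b": lam("B", False),
    }

def pr(evidence):
    """Probability of evidence, obtained by evaluating F."""
    return F(**indicators(evidence))
```

With this sketch, `pr({"A": True, "B": False})` corresponds to Pr(a,~b) and `pr({"A": True})` to Pr(a). Empty evidence sets every indicator to 1, so F sums the whole table.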

Network: A → B, A → C, with parameters θa, θb|a, θc|a

A   B    C    Pr(.)
a   b    c    θa θb|a θc|a
a   b    ~c   θa θb|a θ~c|a
a   ~b   c    θa θ~b|a θc|a
a   ~b   ~c   θa θ~b|a θ~c|a
.   .    .    …

With evidence indicators:

A   B    C    Pr(.)
a   b    c    λa λb λc θa θb|a θc|a
a   b    ~c   λa λb λ~c θa θb|a θ~c|a
a   ~b   c    λa λ~b λc θa θ~b|a θc|a
a   ~b   ~c   λa λ~b λ~c θa θ~b|a θ~c|a
.   .    .    …

F = λa λb λc θa θb|a θc|a + λa λb λ~c θa θb|a θ~c|a + λa λ~b λc θa θ~b|a θc|a + λa λ~b λ~c θa θ~b|a θ~c|a + ….

Network over A, B, C, D, with parameters θa, θb|a, θc|a, θd|bc

F = λa λb λc λd θa θb|a θc|a θd|bc + λa λb λc λ~d θa θb|a θc|a θ~d|bc + ….

Each term has 2n variables (n indicators, n parameters)

Each variable has degree one (multi-linear function)

Multi-Linear Functions → Arithmetic Circuits (Factoring)

[Figure repeated: the MLF f of the network over A and B and the arithmetic circuit that factors it]

Online Query Answering

Complexity: Time and space linear in the AC size

Queries:
• Probability of evidence, with evidence flipping/fast retraction
• Variable and family marginals
• MPE: most probable explanation
• Sensitivity analysis (derivatives)

Evaluating the Polynomial

F(e) = Pr(e)

PR: Probability of Evidence

[Figure: the Bayesian network, annotated with Pr(e)]

The Partial Derivatives

∂F/∂λx (e) = Pr(e − X, x)

PR: Probability of Evidence Flips

[Figure: the Bayesian network with a variable X marked, annotated first with Pr(e) and then with Pr(e − X, x)]

The Partial Derivatives

θx|u · ∂F/∂θx|u (e) = Pr(e, x, u)

PR: Family Marginals

[Figure: the Bayesian network with a family X, U marked, annotated with Pr(e, x, u)]

Multi-Linear Functions → Arithmetic Circuits (Factoring)

[Figure repeated: the MLF f of the network over A and B and the arithmetic circuit that factors it]

Circuit Evaluation and Differentiation: Marginals

[Figure: the circuit evaluated and differentiated under evidence a. Leaf values: λa = 1, λ~a = 0, λb = 1, λ~b = 1, θa = .3, θ~a = .7, θb|a = .1, θ~b|a = .9, θb|~a = .8, θ~b|~a = .2. Each node carries its value and its derivative.]

f(a) = .3 = Pr(a)

∂f/∂λb (a) = .03 = Pr(a, b)

∂f/∂λ~a (a) = .7 = Pr(~a)

Two passes only:
• Probability of evidence (with evidence flipping)
• Node marginals
• Family marginals
• Sensitivity
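The two passes can be sketched as follows: an upward pass that computes the value of every node, then a downward pass that accumulates the partial derivative of the root with respect to every node. The circuit is the one for the network over A and B, with the parameter values used on the slides. The `Node` class and the leaf names (`lam_a`, `theta_b|a`, ...) are hypothetical, and this plain version keeps separate value and derivative registers rather than implementing the register-saving schemes discussed next.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    op: str                       # "leaf", "+" or "*"
    children: list = field(default_factory=list)
    name: str = ""
    value: float = 0.0            # filled in by the upward pass
    deriv: float = 0.0            # filled in by the downward pass

def topological_order(root):
    """Return the nodes with children before parents (post-order on the DAG)."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for child in node.children:
            visit(child)
        order.append(node)
    visit(root)
    return order

def evaluate_and_differentiate(root, leaf_values):
    order = topological_order(root)
    for node in order:                      # upward pass: values
        node.deriv = 0.0
        if node.op == "leaf":
            node.value = leaf_values[node.name]
        elif node.op == "+":
            node.value = sum(c.value for c in node.children)
        else:
            node.value = 1.0
            for c in node.children:
                node.value *= c.value
    root.deriv = 1.0
    for node in reversed(order):            # downward pass: derivatives
        if node.op == "+":
            for c in node.children:
                c.deriv += node.deriv
        elif node.op == "*":
            for i, c in enumerate(node.children):
                contribution = node.deriv
                for j, other in enumerate(node.children):
                    if j != i:
                        contribution *= other.value
                c.deriv += contribution
    return root.value

def build_circuit():
    names = ["lam_a", "lam_~a", "lam_b", "lam_~b", "theta_a", "theta_~a",
             "theta_b|a", "theta_~b|a", "theta_b|~a", "theta_~b|~a"]
    leaves = {n: Node("leaf", name=n) for n in names}
    times = lambda *kids: Node("*", list(kids))
    plus = lambda *kids: Node("+", list(kids))
    b_given_a = plus(times(leaves["lam_b"], leaves["theta_b|a"]),
                     times(leaves["lam_~b"], leaves["theta_~b|a"]))
    b_given_not_a = plus(times(leaves["lam_b"], leaves["theta_b|~a"]),
                         times(leaves["lam_~b"], leaves["theta_~b|~a"]))
    root = plus(times(leaves["lam_a"], leaves["theta_a"], b_given_a),
                times(leaves["lam_~a"], leaves["theta_~a"], b_given_not_a))
    return root, leaves

# Evidence a: lam_~a is 0, every other indicator is 1.
EVIDENCE_A = {"lam_a": 1, "lam_~a": 0, "lam_b": 1, "lam_~b": 1,
              "theta_a": 0.3, "theta_~a": 0.7, "theta_b|a": 0.1,
              "theta_~b|a": 0.9, "theta_b|~a": 0.8, "theta_~b|~a": 0.2}

root, leaves = build_circuit()
pr_a = evaluate_and_differentiate(root, EVIDENCE_A)
```

After the call, `pr_a` holds the probability of evidence, `leaves["lam_b"].deriv` holds Pr(a, b), and `leaves["lam_~a"].deriv` holds the probability after flipping the evidence on A.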

Efficient Eval/Diff Schemes

Assume alternating levels of +/* nodes, with one parent per *node

Method A: Two registers per +node (no registers for *nodes)

Method B: One register per node (use for values in upward pass, then override with derivatives in downward pass)

Method C: One register per node, one bit per *node

Circuit Optimization: MPE

[Figure: the same circuit with its + nodes replaced by maximization (m) nodes, evaluated under evidence a. The root evaluates to .27.]

MPE(a) = .27

Circuit Optimization: MPE

[Figure: the sub-circuit selected by the maximization nodes]

MPE: a, ~b
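As a sketch of the maximizer circuit, the code below writes the circuit for the network over A and B as nested tuples, with `"max"` in place of `+`, and returns both the MPE value and the indicators on the maximizing branch. The tuple encoding and the leaf names are assumptions made for this example only.

```python
# Maximizer circuit: same shape as the AC for f, with "+" replaced by "max".
MAX_CIRCUIT = (
    "max",
    ("*", "lam_a", "theta_a",
     ("max", ("*", "lam_b", "theta_b|a"), ("*", "lam_~b", "theta_~b|a"))),
    ("*", "lam_~a", "theta_~a",
     ("max", ("*", "lam_b", "theta_b|~a"), ("*", "lam_~b", "theta_~b|~a"))),
)

THETA = {"theta_a": 0.3, "theta_~a": 0.7, "theta_b|a": 0.1,
         "theta_~b|a": 0.9, "theta_b|~a": 0.8, "theta_~b|~a": 0.2}

def mpe(node, values):
    """Return (value, indicators chosen) for a maximizer circuit."""
    if isinstance(node, str):
        chosen = {node} if node.startswith("lam_") else set()
        return values[node], chosen
    op, *children = node
    results = [mpe(child, values) for child in children]
    if op == "max":
        # Keep only the best branch (the first one on ties).
        return max(results, key=lambda r: r[0])
    value, chosen = 1.0, set()
    for child_value, child_chosen in results:
        value *= child_value
        chosen |= child_chosen
    return value, chosen

# Evidence a: lam_~a is 0, every other indicator is 1.
values = {**THETA, "lam_a": 1, "lam_~a": 0, "lam_b": 1, "lam_~b": 1}
mpe_value, mpe_indicators = mpe(MAX_CIRCUIT, values)
```

Under evidence a, the selected indicators are `lam_a` and `lam_~b`, which is the instantiation a, ~b with probability .27.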

Custom Hardware for Evaluating ACs

Adharapurapu, Ercegovac (2004)

Offline Compilation

Factoring MLFs into ACs:
• Jointree: Embeds AC
• Variable Elimination: Trace is an AC
• Recursive Conditioning: Trace is an AC

Reduction to Logic: CNF to d-DNNF compilation

Compiling using Jointrees

Classical Jointree Algorithm:
• Convert model into jointree
• Jointree propagation (two passes)

Modern interpretation:
• Jointree embeds an AC that factors MLF
• Jointree propagation is evaluating/differentiating the embedded AC

A Jointree Embeds an AC…

[Figure: a jointree with clusters including AB, AC, AD, AE, and the AC it embeds, whose nodes correspond to the entries of the clusters and separators]

Inward pass evaluates circuit
Outward pass differentiates circuit
[Hugin, Shenoy-Shafer, …]

A Differential Semantics for Jointree Algorithms, AIJ-04 (with James Park)


Jointree Flavors

• Shenoy-Shafer: Method A
• Hugin: Method B (loses information)
• Zero-Conscious Hugin (new): Method C (best of A, B)

Compiling using Variable Elimination (VE)

VE operates on factors:
• Mappings from variable instantiations to real numbers

VE performs two operations on factors:
• Multiply two factors
• Sum out a variable from a factor

Factors have different representations:
• Tables
• More structured representations (decision trees/graphs)
• Overhead problem for structured factors
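The two factor operations can be sketched on tabular factors as follows. The `Factor` class is a hypothetical minimal representation (Boolean variables only, a dictionary as the table), filled with the values of the factors TA and TB used in this tutorial.

```python
from itertools import product

class Factor:
    """A tabular factor over Boolean variables (illustrative representation)."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)      # maps tuples of booleans to numbers

def multiply(f, g):
    """Pointwise product over the union of the two variable sets."""
    variables = f.variables + tuple(v for v in g.variables
                                    if v not in f.variables)
    table = {}
    for values in product([True, False], repeat=len(variables)):
        row = dict(zip(variables, values))
        f_entry = f.table[tuple(row[v] for v in f.variables)]
        g_entry = g.table[tuple(row[v] for v in g.variables)]
        table[values] = f_entry * g_entry
    return Factor(variables, table)

def sum_out(f, var):
    """Add up the entries that agree on every variable except var."""
    idx = f.variables.index(var)
    variables = f.variables[:idx] + f.variables[idx + 1:]
    table = {}
    for values, entry in f.table.items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + entry
    return Factor(variables, table)

TA = Factor(["A"], {(True,): 0.3, (False,): 0.7})
TB = Factor(["A", "B"], {(True, True): 0.1, (True, False): 0.9,
                         (False, True): 0.8, (False, False): 0.2})

joint = multiply(TA, TB)          # factor over (A, B)
marginal_b = sum_out(joint, "A")  # factor over (B,)
```

Here `joint` reproduces the joint table (.03, .27, .56, .14) and `marginal_b` gives Pr(b) and Pr(~b).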

Tabular Factors

Network: A → B

A       TA
true    .3
false   .7

A       B       TB
true    true    .1
true    false   .9
false   true    .8
false   false   .2

Structured Factors: Algebraic Decision Diagrams (ADDs)

[Figure: an ADD over the variables X, Y, Z with terminal values .1, .9, .5]

Networks with Local Structure

Network      Max Clust   Vars   Card      Total Parms   %Det   %Distinct
alarm        7.2         37     2...4     752           0.9    24.6
bm           20          1005   2...2     6972          99.6   100
diabetes     17.2        413    3...21    461069        78.2   17.6
hailfinder   11.7        56     2...11    3741          15.7   26.9
mildew       21.4        35     3...100   547158        93.2   25.1
mm           23          1220   2...2     8326          98.7   75
munin1       26.8        189    1...21    19466         66.5   61.2
munin2       18.6        1003   2...21    83920         63.3   69.5
munin3       17.8        1044   1...21    85855         63.1   71.3
munin4       21.4        1041   1...21    98183         64.5   65.3
pathfinder   15          109    2...63    97851         56.1   5.1
pigs         17.4        441    3...3     8427          56.2   23.9
students     22          376    2...2     2616          90.7   79.3
tcc4f        10          105    2...2     3236          0.4    35.6
water        19.9        32     3...4     13484         54     57

VE: Tabular vs ADD Representations of Factors

Network      Tabular Time (ms)   ADD Time (ms)   Improvement
alarm        31                  360             0.086
barley       307                 14,049          0.022
bm-5-3       4,892               658             7.435
diabetes     949                 33,220          0.029
hailfinder   48                  515             0.093
link         1,688               2,658           0.635
mm-3-8-3     2,166               843             2.569
mildew       72                  92,602          0.001
munin1       155                 1,255           0.124
munin2       204                 3,170           0.064
munin3       350                 5,049           0.069
munin4       406                 4,361           0.093
pathfinder   51                  5,213           0.01
pigs         69                  597             0.116
st-3-2       186                 362             0.514
tcc4f        29                  153             0.19
water        76                  1,015           0.075

Compiling using Variable Elimination (VE)

By using symbolic factors and corresponding operations: VE compiles out an AC

VE with tabular factors:
• Generates ACs similar to those embedded in jointree

VE with structured factors:
• Generates much smaller ACs
• Overhead pushed into offline phase

Factors

[Tables repeated: the tabular factors TA and TB of the network over A and B]

Symbolic Factors

A       TA
true    θa * λa
false   θ~a * λ~a

A       B       TB
true    true    θb|a * λb
true    false   θ~b|a * λ~b
false   true    θb|~a * λb
false   false   θ~b|~a * λ~b

Summing out Variable B

A       T’B
true    θb|a * λb + θ~b|a * λ~b
false   θb|~a * λb + θ~b|~a * λ~b

Multiplying Factors

A       TA T’B
true    θa * λa * (θb|a * λb + θ~b|a * λ~b)
false   θ~a * λ~a * (θb|~a * λb + θ~b|~a * λ~b)

Summing out Variable A

θa * λa * (θb|a * λb + θ~b|a * λ~b) + θ~a * λ~a * (θb|~a * λb + θ~b|~a * λ~b)

VE factors MLF into AC (Bottom up Construction)

[Figure repeated: the MLF f of the network over A and B and the arithmetic circuit that factors it]

•Time and space complexity of generating AC is similar to Variable Elimination: Exponential only in treewidth

•Generated ACs similar to those embedded in Jointree

•Recall: AC can be used to answer multiple queries!
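The bottom-up construction can be sketched by running variable elimination on symbolic factors whose entries are circuit nodes instead of numbers. In this illustration a symbolic factor is a pair `(variables, table)`, a circuit node is a nested tuple `("*", x, y)` or `("+", x, y)`, and leaves are strings. These encodings and names are assumptions of the sketch, not the representation used by any particular system.

```python
from itertools import product

def multiply(f, g):
    """Multiply symbolic factors: each entry becomes a '*' node."""
    f_vars, f_table = f
    g_vars, g_table = g
    variables = f_vars + tuple(v for v in g_vars if v not in f_vars)
    table = {}
    for values in product([True, False], repeat=len(variables)):
        row = dict(zip(variables, values))
        left = f_table[tuple(row[v] for v in f_vars)]
        right = g_table[tuple(row[v] for v in g_vars)]
        table[values] = ("*", left, right)
    return variables, table

def sum_out(f, var):
    """Sum out a variable: merged entries become '+' nodes."""
    variables, table = f
    idx = variables.index(var)
    new_table = {}
    for values, entry in table.items():
        key = values[:idx] + values[idx + 1:]
        new_table[key] = ("+", new_table[key], entry) if key in new_table else entry
    return variables[:idx] + variables[idx + 1:], new_table

def evaluate(node, values):
    """Evaluate a compiled circuit for given leaf values."""
    if isinstance(node, str):
        return values[node]
    op, left, right = node
    x, y = evaluate(left, values), evaluate(right, values)
    return x + y if op == "+" else x * y

def compile_network():
    """Offline phase: eliminate B, then A, following the slides."""
    ta = (("A",), {(True,): ("*", "theta_a", "lam_a"),
                   (False,): ("*", "theta_~a", "lam_~a")})
    tb = (("A", "B"), {(True, True): ("*", "theta_b|a", "lam_b"),
                       (True, False): ("*", "theta_~b|a", "lam_~b"),
                       (False, True): ("*", "theta_b|~a", "lam_b"),
                       (False, False): ("*", "theta_~b|~a", "lam_~b")})
    _, table = sum_out(multiply(ta, sum_out(tb, "B")), "A")
    return table[()]

THETA = {"theta_a": 0.3, "theta_~a": 0.7, "theta_b|a": 0.1,
         "theta_~b|a": 0.9, "theta_b|~a": 0.8, "theta_~b|~a": 0.2}

ac = compile_network()   # compiled once, then reused for many queries
pr_a = evaluate(ac, {**THETA, "lam_a": 1, "lam_~a": 0, "lam_b": 1, "lam_~b": 1})
pr_a_not_b = evaluate(ac, {**THETA, "lam_a": 1, "lam_~a": 0, "lam_b": 0, "lam_~b": 1})
```

The same compiled `ac` answers both queries. Only the indicator values change between calls.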

Structured Factors: Algebraic Decision Diagrams (ADDs)

[Figure: an ADD over the variables X, Y, Z with terminal values .1, .9, .5]

Symbolic ADD

[Figure: the same ADD over X, Y, Z with terminals labeled 1, 2, 3]

•Modify standard ADD operations (multiply, sum-out) to operate on symbolic ADDs

•Run variable elimination with symbolic ADDs

•Compile out an AC

•Asymptotic complexity is no worse than variable elimination

•Overhead of ADDs is pushed into offline phase

•Generated AC can be much smaller

•Online inference can be much faster

Networks with Local Structure

[Table repeated: networks with local structure]

Tabular vs ADD: Standard VE

[Table repeated: VE time (ms) with tabular vs ADD representations of factors]

Tabular vs ADD: VE Compilations

             Time (s)                       AC size
Network      Ace        ADD-VE   Improv.    Tabular-VE      ADD-VE       Improv.
alarm        0.3        3.9      0.1        3,534           3,030        1.2
barley       8,190.20   122.8    66.7       66,467,777      24,653,744   2.7
bm-5-3       0.8        6        0.1        75,591,750      14,836       5095.2
diabetes     1,710.00   110.3    15.5       34,728,957      17,219,042   2
hailfinder   0.7        1.2      0.5        72,755          25,992       2.8
link         -          699.7    -          127,262,777     89,097,450   1.4
mildew       3,125.20   218.9    14.3       16,094,592      3,352,330    4.8
mm-3-8-3     1.5        11.9     0.1        36,635,566      108,428      337.9
munin1       1,005.10   316.7    3.2        1,260,407,123   31,409,970   40.1
munin2       198.4      31.7     6.3        20,295,426      5,662,218    3.6
munin3       188.4      17.6     10.7       16,987,088      3,503,242    4.8
munin4       205        37.8     5.4        76,028,532      6,869,760    11.1
pathfinder   4.9        5.8      0.9        796,588         44,468       17.9
pigs         23.1       10       2.3        4,925,388       2,558,680    1.9
st-3-2       0.5        2.4      0.2        19,374,934      22,070       877.9
tcc4f        0.9        1.1      0.8        33,408          22,612       1.5
water        3          20.7     0.1        15,996,054      170,428      93.9

ADD-VE vs Jointree: Online Inference Time (ms)

Network      Jointree   ADD-VE    Improv.
alarm        166        32        5.2
barley       65,226     35,209    1.9
bm-5-3       89,593     83        1079.4
diabetes     29,316     20,421    1.4
hailfinder   245        70        3.5
link         223,542    175,769   1.3
mildew       10,077     4,522     2.2
mm-3-8-3     34,001     198       171.7
munin1       669,915    37,451    17.9
munin2       17,857     7,180     2.5
munin3       13,351     4,945     2.7
munin4       42,754     8,683     4.9
pathfinder   1,332      102       13.1
pigs         3,020      2,814     1.1
st-3-2       17,536     82        213.9
tcc4f        281        73        3.8
water        16,676     251       66.4

Computing all marginals, for 16 pieces of random evidence

Work on structured representations of factors is now much more relevant and practical.

Compiling by Reduction to Logic

• Algebraic: MLFs / ACs
• Logical: CNF / d-DNNF

Factoring MLF into AC can be reduced to factoring CNF into d-DNNF

CNF to d-DNNF compilers are very powerful (natural for exploiting determinism and CSI)

Compiler: http://reasoning.cs.ucla.edu/c2d

Reduction to Logic

[Figure: Multi-Linear Function → (Encode) → CNF → d-DNNF → (Decode) → Arithmetic Circuit]

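Multi-linear function: a c + a b c + c

Encode → Propositional theory: c ^ (a ∨ ¬b)

Compile → Smooth d-DNNF

Decode → Arithmetic Circuit

[Figure: the smooth d-DNNF over the literals of a, b, c, and the arithmetic circuit over the leaves a, b, c, 1 obtained by decoding it]

MLFs → ACs; CNFs → d-DNNF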

Deterministic, Decomposable NNF

[Figure: an NNF circuit of and/or nodes over the literals of A, B, C, D, E]

Deterministic: Disjuncts are logically disjoint

Decomposable: Conjuncts share no variables
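The two properties can be checked directly on small circuits. The sketch below represents an NNF as nested tuples `("and", ...)` and `("or", ...)` over literal strings such as `"A"` and `"~A"`, which is an encoding assumed for this example. Decomposability is checked syntactically. Determinism is checked by brute-force enumeration, which is only feasible for tiny examples.

```python
from itertools import product

def variables(node):
    """Variables mentioned below a node."""
    if isinstance(node, str):
        return {node.lstrip("~")}
    return set().union(*(variables(child) for child in node[1:]))

def holds(node, world):
    """Truth value of a node in a world (dict from variable to bool)."""
    if isinstance(node, str):
        return not world[node[1:]] if node.startswith("~") else world[node]
    op, *children = node
    results = [holds(child, world) for child in children]
    return all(results) if op == "and" else any(results)

def is_decomposable(node):
    """Conjuncts of every and-node share no variables."""
    if isinstance(node, str):
        return True
    op, *children = node
    if op == "and":
        seen = set()
        for child in children:
            child_vars = variables(child)
            if seen & child_vars:
                return False
            seen |= child_vars
    return all(is_decomposable(child) for child in children)

def is_deterministic(node):
    """Disjuncts of every or-node are logically disjoint (brute force)."""
    if isinstance(node, str):
        return True
    op, *children = node
    if op == "or":
        names = sorted(variables(node))
        for values in product([False, True], repeat=len(names)):
            world = dict(zip(names, values))
            if sum(holds(child, world) for child in children) > 1:
                return False
    return all(is_deterministic(child) for child in children)

good = ("or", ("and", "A", "B"), ("and", "~A", "C"))   # a d-DNNF
overlapping = ("or", ("and", "A", "B"), ("and", "A", "C"))
shared = ("and", "A", ("or", "~A", "B"))
```

Here `good` passes both checks, `overlapping` is decomposable but not deterministic (both disjuncts hold when A, B, C are all true), and `shared` is not decomposable because A appears in both conjuncts.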

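Compiling CNFs into d-DNNFs, AAAI-02, ECAI-04

Compiler at http://reasoning.cs.ucla.edu/c2d

Recursive Conditioning for Compilation

[Figure: a CNF with clauses over the variables A, B, C and A, D, E. Conditioning on A decomposes it into a part over B, C and a part over D, E; the compilation is an or node over the two cases of A, each an and node over the sub-compilations.]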

Why Logic?

Encoding local structure is easy:
• Determinism encoded by adding clauses, e.g. when θc|a = 0
• CSI encoded by collapsing variables, e.g. when θc|ab = θc|a~b

A natural environment to exploit local structure:
• DD-backtracking, clause learning, …
• Non-structural decomposition
• Non-structural (formula) caching

Local Structure

Network: A, B, C → S

Tabular CPT:

A    B    C    Pr(S|A,B,C)
a    b    c    0.95
a    b    ~c   0.95
a    ~b   c    0.20
a    ~b   ~c   0.05
~a   b    c    0.00
~a   b    ~c   0.00
~a   ~b   c    0.00
~a   ~b   ~c   0.00

• Functional constraints
• Context-specific independence

Determinism

Since θs|~abc = 0, the encoding
λ~a ∧ λb ∧ λc ∧ λs ↔ θs|~abc
is replaced by the clause
¬λ~a ∨ ¬λb ∨ ¬λc ∨ ¬λs

Context-Specific Independence

Since θs|abc = θs|ab~c, the encodings
λa ∧ λb ∧ λc ∧ λs ↔ θs|abc
λa ∧ λb ∧ λ~c ∧ λs ↔ θs|ab~c
are collapsed into
λa ∧ λb ∧ λs ↔ θs|ab

[Figure: a belief network over X and Y is encoded as a CNF over its indicator (λ) and parameter (θ) variables, compiled into a smooth d-DNNF, and decoded into an Arithmetic Circuit]

The Ace System: http://reasoning.cs.ucla.edu/ace

ADD-VE vs Logic (Ace): Compile Times

[Table repeated: compile time (s) of Ace vs ADD-VE, and AC size of Tabular-VE vs ADD-VE]

Network Nodes Parameters Max Cluster

mastermind_04_08_03 1418 9802 26

mastermind_06_08_03 1814 12754 37

mastermind_10_08_03 2606 18658 54

mastermind_03_08_04 2288 16008 31

mastermind_04_08_04 2616 18488 39

mastermind_03_08_05 3692 26186 40

students_03_02 376 2616 25

students_03_12 1346 9856 59

students_04_16 2827 21070 101

students_05_20 5064 38168 148

students_06_24 8201 62302 233

blockmap_05_03 1005 6972 23

blockmap_10_03 6848 48758 52

blockmap_15_03 18787 132436 68

blockmap_20_03 43356 307220 92

blockmap_22_03 59404 423452 104

ADD-VE vs Logic (Ace)

Network   Offline Time (min)   AC Nodes   AC Edges   Online Inference Time (s)

mastermind_04_08_03 1 71,666 541,356 0.05

mastermind_06_08_03 1 258,228 1,523,888 0.15

mastermind_10_08_03 3 1,293,323 4,315,566 0.68

mastermind_03_08_04 2 186,351 4,859,201 0.3

mastermind_04_08_04 5 932,355 19,457,308 1.73

mastermind_03_08_05 10 1,359,391 55,417,639 4.33

students_03_02 1 7,927 37,281 0.01

students_03_12 1 24,219 113,876 0.02

students_04_16 3 181,166 815,461 0.09

students_05_20 7 1,319,834 5,236,257 1.84

students_06_24 33 9,922,233 36,450,231 12.97

blockmap_05_03 1 2,833 20,636 0.01

blockmap_10_03 2 17,749 974,817 0.06

blockmap_15_03 6 47,475 7,643,307 0.38

blockmap_20_03 30 105,602 40,172,434 2.45

blockmap_22_03 61 144,136 76,649,302 4.67

ADD-VE vs Logic (Ace)

Effect of Local Structure

Local Structure Encoded   Pathfinder       Water              Munin4
None                      981,178          13,777,166         116,136,985
Det + CSI                 42,810 (4%)      134,140 (1%)       5,762,690 (5%)
Det                       130,380 (13%)    138,501 (1%)       9,997,267 (9%)
CSI                       200,787 (20%)    11,111,104 (81%)   17,612,036 (15%)

Compilation vs Direct Inference

Grid problems here…

Compilation vs Direct Inference

Grid size   Treewidth w   Det   Cachet (sec)   Ace offline (sec)   Ace online (sec)   Offline/Online
16x16       25            50%   2236           220                 2.072              1079
22x22       36            75%   2757           349                 2.178              2024
34x34       60            90%   1584           79                  0.419              3783

Average over 10 random instances for each grid

Ace available at http://reasoning.cs.ucla.edu/ace

Applications

• Relational Models
• Diagnosis
• Genetic Linkage Analysis

burglary(v)=0.005;
alarm(v)=(burglary(v):0.95,0.01);
calls(v,w)= (neighbor(v,w): (prankster(v)): (alarm(w):0.9,0.05), (alarm(w):0.9,0)),0);
alarmed(v)= n-or{calls(w,v)|w:neighbor(w,v)}

Primula/Ace: Upcoming Release

Friends and Smokers (Richardson & Domingos, 2004)

M individuals Relations such as

smokes(p), cancer(p), friend(p1,p2)

Logical constraints such as: if one of p's friends smokes, then p smokes.

Sample Query: probability that given person has cancer

Friends & Smokers

M    Network params   Treewidth w   CNF Vars   CNF Clauses   AC Edges   Online Time (sec)   Offline Time (sec)
1    34               3             12         31            18         0                   0.03
4    1,552            13            414        1,390         293        0.003               0.44
7    7,714            36            1,995      6,916         1,295      0.006               1.92
10   21,760           70            5,565      19,525        3,512      0.005               6.66
13   46,930           118           11,934     42,133        7,430      0.013               12.8
16   86,464           172           21,912     77,656        13,535     0.022               21.68
19   143,602          244           36,309     129,010       22,313     0.035               38.36
22   221,584          316           55,935     199,111       34,250     0.058               90.67
25   323,650          412           81,600     290,875       49,832     0.079               162.45
28   453,040          528           114,114    407,218       69,545     0.114               274.2
29   502,802          560           126,614    451,965       77,118     0.119               275.17

Students(Pasula & Russell, 2001)

• P professors
• S students
• Various relations, such as famous(p), well-funded(p), success(s), advises(p,s)

Sample Query: probability a professor is well-funded given success of advised students

Students

Students-Profs   Network params   Treewidth w   CNF Vars   CNF Clauses   AC Edges     Online Time (sec)   Offline Time (min)
04-08            11,566           72            3,099      11,099        445,410      0.0530              2
04-16            21,070           101           5,859      21,115        815,461      0.0930              3
05-10            20,688           128           5,624      20,279        2,531,230    0.2885              3
05-20            38,168           148           10,734     38,889        5,236,257    1.8439              7
06-12            33,454           176           9,209      33,353        16,936,504   3.2120              14
06-24            62,302           233           17,693     64,325        36,450,231   12.9663             33

Diagnosis QMR-like: Effect of Encoding Evidence

600 diseases (D) and 4100 features (F)

Feature Fj is a noisy-or of parent diseases Di

(11 parents chosen randomly)

Sample Query: probability of disease given partial evidence on features.
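A noisy-or feature can be sketched as follows: the feature stays off only if every present parent disease independently fails to trigger it. The disease names, the per-disease strengths and the optional `leak` term are hypothetical values for illustration, since the tutorial does not list the parameters of the QMR-like network.

```python
def noisy_or(present_diseases, strengths, leak=0.0):
    """Pr(feature = true | parents) under a noisy-or model.

    strengths[d] is the probability that disease d alone turns the
    feature on; leak is the probability it turns on with no cause.
    """
    pr_all_fail = 1.0 - leak
    for disease in present_diseases:
        pr_all_fail *= 1.0 - strengths[disease]
    return 1.0 - pr_all_fail

# Hypothetical strengths for three of a feature's parents.
strengths = {"D1": 0.8, "D2": 0.5, "D3": 0.1}

p_none = noisy_or([], strengths)              # no disease present
p_one = noisy_or(["D1"], strengths)           # D1 only
p_two = noisy_or(["D1", "D2"], strengths)     # 1 - (0.2 * 0.5)

# A tabular CPT over 11 binary parents has 2**11 rows per feature value,
# while the noisy-or is specified by one strength per parent.
rows_in_table = 2 ** 11
```

This compactness is a form of local structure. A full table hides it, but a logical encoding can expose it.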

[Figure: two-layer network with diseases D1, D2, D3, …, Dm as parents of features F1, F2, …, Fn]

Treewidth: 586-589
CNF variables: 94,900
CNF clauses: 188,600

No. True Features   AC Edges    Online Time (sec)   Offline Time (sec)
0                   48,100      0.05                23.73
3                   52,830      0.05                23.86
6                   57,638      0.05                23.81
9                   62,547      0.05                23.82
12                  67,632      0.05                24.19
15                  73,321      0.04                23.6
18                  81,629      0.05                24.95
21                  109,335     0.05                30.95
25                  434,445     0.08                155.12
27                  1,141,674   0.17                469.7
28                  1,691,833   0.23                728.52
29                  2,352,820   0.3                 1,046.93

Genetic Linkage Analysis

• Ordering genes on a chromosome and determining distance between them
• Useful for predicting and detecting diseases
• Associating functionality of genes with their location on the chromosome

[Figure: Gene 1, Gene 2, Gene 3]

Pedigrees + Phenotypes + Genotypes → Arithmetic Circuit

[Figure: Gene 1, Gene 2]

State of the Art Linkage

Pedigree   Offline (sec)   AC Edges     Online (sec)   Superlink 1.4 (sec)
EE33       25.33           2,070,707    0.59           1,046.72
EE37       61.29           1,855,410    0.39           1,381.61
EE30       376.78          27,997,686   8.37           815.33
EE23       89.47           3,986,816    1.08           502.02
EE18       283.96          23,632,200   6.63           248.11

Model Compilation: Factoring MLFs into ACs

Classical algorithms factor MLFs into ACs:
• Jointree embeds AC
• Variable elimination constructs AC bottom up
• Recursive conditioning constructs AC top down

Factoring MLFs into ACs can be reduced to logical reasoning

Exploiting local structure to build smaller ACs:
• Compiling models with very high treewidth is commonplace
• Boundary between exact and approximate inference is much changed

Public systems now available!