
Model Counting of Query Expressions: Limitations of Propositional Methods

Paul Beame (1), Jerry Li (2), Sudeepa Roy (1), Dan Suciu (1)

(1) University of Washington, (2) MIT

Probabilistic Databases

AsthmaPatient
  Ann         x1   0.3
  Bob         x2   0.1

Friend
  Ann  Joe    y1   0.9
  Ann  Tom    y2   0.5
  Bob  Tom    y3   0.7

Smoker
  Joe         z1   0.5
  Tom         z2   1.0

Boolean query Q: ∃x ∃y AsthmaPatient(x) ∧ Friend(x, y) ∧ Smoker(y)

• Tuples are probabilistic (and independent)
  ▫ “Ann” is present with probability 0.3, i.e. Pr(x1) = 0.3
• Lineage F_{Q,D} = (x1 ∧ y1 ∧ z1) ∨ (x1 ∧ y2 ∧ z2) ∨ (x2 ∧ y3 ∧ z2)
  ▫ Q is true on D ⇔ F_{Q,D} is true
• What is the probability that Q is true on D?
• Two main evaluation techniques: lifted vs. grounded inference
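The probability of the lineage can be checked by brute force on this tiny instance. The sketch below enumerates all possible worlds and sums the weights of those that satisfy the lineage. Only Pr(x1) = 0.3 is stated explicitly on the slide; the other tuple probabilities in `PROB`, and the names `PROB`, `LINEAGE` and `lineage_probability`, are assumptions made for illustration.

```python
from itertools import product

# Tuple probabilities. Only Pr(x1) = 0.3 is stated explicitly on the slide;
# the remaining values are assumed for this illustration.
PROB = {"x1": 0.3, "x2": 0.1,
        "y1": 0.9, "y2": 0.5, "y3": 0.7,
        "z1": 0.5, "z2": 1.0}

# Lineage of Q on D as a DNF: each term is a set of tuple variables.
LINEAGE = [{"x1", "y1", "z1"}, {"x1", "y2", "z2"}, {"x2", "y3", "z2"}]


def lineage_probability(terms, prob):
    """Sum the weights of all possible worlds in which some term is true."""
    names = sorted(prob)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        # Tuples are independent, so the weight of a world is a product.
        weight = 1.0
        for name in names:
            weight *= prob[name] if world[name] else 1.0 - prob[name]
        if any(all(world[v] for v in term) for term in terms):
            total += weight
    return total


print(lineage_probability(LINEAGE, PROB))
```

This enumeration takes time exponential in the number of tuples, which is why the choice between lifted and grounded evaluation matters on real databases.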

Lifted Inference

Q: ∃x ∃y AsthmaPatient(x) ∧ Friend(x, y) ∧ Smoker(y)

Work with the explicit query structure, i.e. the first-order logic

Dichotomy Theorem [Dalvi, Suciu ’12]: For any UCQ, evaluating it is either
▫ #P-hard, or
▫ polynomial-time computable using lifted inference
▫ and there is a simple condition to tell which case holds

Grounded Inference

F_{Q,D} = (x1 ∧ y1 ∧ z1) ∨ (x1 ∧ y2 ∧ z2) ∨ (x2 ∧ y3 ∧ z2)

Work with the Boolean formula

Folklore sentiment: Lifted inference is strictly stronger than grounded inference

We give the first clear proof of this

Outline

• Background: Model Counting, DPLL algorithms
  ▫ Extensions (Caching & Component Analysis)
  ▫ Knowledge Compilation (FBDDs & Decision-DNNF)

• Our Contributions
  ▫ Statement of separation
  ▫ Sketch of FBDD lower bound

• Conclusions

Model Counting

• Probability Computation Problem:
  Given F, and independent Pr(x), Pr(y), Pr(z), …, compute Pr(F)

• Model Counting Problem:
  Given a Boolean formula F, compute #F = #Models (satisfying assignments) of F

e.g. F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)
#F = number of assignments to x, y, u, z, w which make F = true
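As a small illustration of the model counting problem, the sketch below counts the models of the example formula by enumerating all 2^5 assignments. It takes the example as F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z), the reading that agrees with the probability 5/8 shown on the DPLL slides; the function names are illustrative.

```python
from itertools import product

VARS = ["x", "y", "u", "w", "z"]


def F(a):
    """Example formula, read as (x or y) and (x or u or w) and (not x or u or w or z)."""
    return ((a["x"] or a["y"])
            and (a["x"] or a["u"] or a["w"])
            and ((not a["x"]) or a["u"] or a["w"] or a["z"]))


def count_models(formula, variables):
    """Brute-force #F: try every assignment and count the satisfying ones."""
    return sum(bool(formula(dict(zip(variables, bits))))
               for bits in product([False, True], repeat=len(variables)))


models = count_models(F, VARS)
print(models, models / 2 ** len(VARS))
```

Under the uniform distribution, Pr(F) is simply #F divided by 2^5, which links the two problems stated on this slide.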

Known Model Counting Algorithms [Survey by Gomes et al. ’09]

Search-based/DPLL-based (explore the assignment space and count the satisfying assignments)
• CDP [Birnbaum et al. ’99]
• Relsat [Bayardo Jr. et al. ’97, ’00]
• Cachet [Sang et al. ’05]
• SharpSAT [Thurley ’06]

Knowledge compilation-based (compile F into a “computation-friendly” form)
• c2d [Darwiche ’04]
• Dsharp [Muise et al. ’12]
• …

Both techniques explicitly or implicitly
• use DPLL-based algorithms
• produce FBDD or Decision-DNNF compiled forms (output or trace) [Huang-Darwiche ’05, ’07]

DPLL Algorithms

Davis, Putnam, Logemann, Loveland [Davis et al. ’60, ’62]

F: (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)

Assume uniform distribution for simplicity

// basic DPLL:
Function Pr(F):
    if F = false then return 0
    if F = true then return 1
    select a variable x, return ½ Pr(F_{x=0}) + ½ Pr(F_{x=1})

[Figure: DPLL search tree for F. The root branches on x. The x = 0 subtree has residual formula y ∧ (u ∨ w) with probability 3/8; the x = 1 subtree has residual formula u ∨ w ∨ z with probability 7/8; hence Pr(F) = 5/8. Deeper nodes have residual formulas u ∨ w (probability ¾) and w (probability ½).]
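The basic DPLL recursion can be sketched in a few lines of Python. This is a minimal illustration under the uniform distribution, not the code of any particular model counter: the CNF encoding (clauses as frozensets of (variable, polarity) pairs), the helper `condition`, and the choice of branching variable are all assumptions.

```python
from fractions import Fraction

# A CNF formula is a frozenset of clauses; a clause is a frozenset of
# literals; a literal is a (variable, polarity) pair.
F = frozenset({
    frozenset({("x", True), ("y", True)}),
    frozenset({("x", True), ("u", True), ("w", True)}),
    frozenset({("x", False), ("u", True), ("w", True), ("z", True)}),
})


def condition(cnf, var, value):
    """Fix var := value: drop satisfied clauses, remove falsified literals."""
    out = set()
    for c in cnf:
        if (var, value) in c:
            continue  # clause satisfied
        out.add(frozenset(lit for lit in c if lit[0] != var))
    return frozenset(out)


def pr(cnf):
    """Pr(F) under the uniform distribution, by exhaustive branching."""
    if frozenset() in cnf:      # an empty clause: F = false
        return Fraction(0)
    if not cnf:                 # no clauses left: F = true
        return Fraction(1)
    var = min(v for c in cnf for (v, _) in c)   # any variable of F works
    half = Fraction(1, 2)
    return half * pr(condition(cnf, var, False)) + half * pr(condition(cnf, var, True))


print(pr(F))
```

Exact rationals (`Fraction`) are used so that the result can be compared directly with the value 5/8 from the slide.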

DPLL Algorithms

F: (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)

The trace is a decision tree for F

[Figure: the DPLL search tree for F, with root x, residual formulas y ∧ (u ∨ w) and u ∨ w ∨ z below it, and probabilities 3/8, 7/8 and 5/8.]

Extensions to DPLL

• Caching Subformulas

• Component Analysis

• Conflict Directed Clause Learning
  ▫ Affects the efficiency of the algorithm, but not the final “form” of the trace

Extensions to DPLL: Caching

F: (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)

// basic DPLL:
Function Pr(F):
    if F = false then return 0
    if F = true then return 1
    select a variable x, return ½ Pr(F_{x=0}) + ½ Pr(F_{x=1})

// DPLL with caching:
    Cache F and Pr(F); look it up before computing

[Figure: DPLL search tree for F in which the residual formulas u ∨ w and w each appear more than once.]
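A sketch of the caching extension: residual formulas are stored in a dictionary keyed by the formula itself, and looked up before branching. The branching order `ORDER`, the counter `stats`, and the CNF encoding are assumptions chosen so that the residual formula u ∨ w is reached twice on the example.

```python
from fractions import Fraction

ORDER = ["x", "y", "z", "u", "w"]   # assumed branching order

F = frozenset({
    frozenset({("x", True), ("y", True)}),
    frozenset({("x", True), ("u", True), ("w", True)}),
    frozenset({("x", False), ("u", True), ("w", True), ("z", True)}),
})


def condition(cnf, var, value):
    """Fix var := value: drop satisfied clauses, remove falsified literals."""
    out = set()
    for c in cnf:
        if (var, value) in c:
            continue
        out.add(frozenset(lit for lit in c if lit[0] != var))
    return frozenset(out)


cache = {}
stats = {"hits": 0}


def pr_cached(cnf):
    """DPLL with caching: look the residual formula up before computing."""
    if frozenset() in cnf:
        return Fraction(0)
    if not cnf:
        return Fraction(1)
    if cnf in cache:
        stats["hits"] += 1
        return cache[cnf]
    present = {v for c in cnf for (v, _) in c}
    var = next(v for v in ORDER if v in present)
    half = Fraction(1, 2)
    result = half * pr_cached(condition(cnf, var, False)) \
        + half * pr_cached(condition(cnf, var, True))
    cache[cnf] = result
    return result


result = pr_cached(F)
print(result, stats["hits"])
```

With this order, the branch x = 0, y = 1 and the branch x = 1, z = 0 both lead to u ∨ w, so the second visit is answered from the cache.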

Caching & FBDDs

F: (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)

The trace is a decision-DAG for F:
FBDD (Free Binary Decision Diagram) or ROBP (Read Once Branching Program)

• Every variable is tested at most once on any path

[Figure: the DPLL search tree for F with repeated residual formulas merged into a DAG; the node for u ∨ w is shared.]
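To make the FBDD definition concrete, the sketch below writes down one possible decision-DAG for the example formula, reading it as F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z), and checks both the read-once property and agreement with F on every assignment. The dictionary encoding and the node names are assumptions made for illustration.

```python
from itertools import product

# Each internal node maps to (variable, child if 0, child if 1);
# the leaves are the Python booleans.
FBDD = {
    "n_x": ("x", "n_y", "n_z"),
    "n_y": ("y", False, "n_u"),
    "n_z": ("z", "n_u", True),
    "n_u": ("u", "n_w", True),   # shared node for the subformula u OR w
    "n_w": ("w", False, True),
}
ROOT = "n_x"


def evaluate(node, a):
    """Follow the path selected by assignment a down to a leaf."""
    while not isinstance(node, bool):
        var, lo, hi = FBDD[node]
        node = hi if a[var] else lo
    return node


def is_read_once(node, seen=frozenset()):
    """Check that no path from node tests a variable twice."""
    if isinstance(node, bool):
        return True
    var, lo, hi = FBDD[node]
    if var in seen:
        return False
    return is_read_once(lo, seen | {var}) and is_read_once(hi, seen | {var})


def F(a):
    return ((a["x"] or a["y"]) and (a["x"] or a["u"] or a["w"])
            and ((not a["x"]) or a["u"] or a["w"] or a["z"]))


VARS = ["x", "y", "u", "w", "z"]
agrees = all(evaluate(ROOT, dict(zip(VARS, bits))) == bool(F(dict(zip(VARS, bits))))
             for bits in product([False, True], repeat=len(VARS)))
print(agrees, is_read_once(ROOT))
```

The DAG has five internal nodes, whereas the corresponding decision tree would repeat the nodes for u ∨ w and w.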

Extensions to DPLL: Component Analysis

F: (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)

// basic DPLL:
Function Pr(F):
    if F = false then return 0
    if F = true then return 1
    select a variable x, return ½ Pr(F_{x=0}) + ½ Pr(F_{x=1})

// DPLL with component analysis (and caching):
    if F = G ∧ H where G and H have disjoint sets of variables
        Pr(F) = Pr(G) × Pr(H)

[Figure: the decision-DAG for F; the residual formula y ∧ (u ∨ w) splits into the independent components y and u ∨ w.]
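Component analysis can be sketched as follows: before branching, the clauses are grouped into connected components that share no variables, and the probabilities of the components are multiplied. Caching is left out to keep the sketch short; the helper names and the CNF encoding are assumptions.

```python
from fractions import Fraction

F = frozenset({
    frozenset({("x", True), ("y", True)}),
    frozenset({("x", True), ("u", True), ("w", True)}),
    frozenset({("x", False), ("u", True), ("w", True), ("z", True)}),
})


def condition(cnf, var, value):
    """Fix var := value: drop satisfied clauses, remove falsified literals."""
    out = set()
    for c in cnf:
        if (var, value) in c:
            continue
        out.add(frozenset(lit for lit in c if lit[0] != var))
    return frozenset(out)


def components(cnf):
    """Group clauses into connected components with disjoint variable sets."""
    remaining = set(cnf)
    parts = []
    while remaining:
        seed = remaining.pop()
        group, variables = {seed}, {v for (v, _) in seed}
        changed = True
        while changed:
            changed = False
            for c in list(remaining):
                if variables & {v for (v, _) in c}:
                    remaining.remove(c)
                    group.add(c)
                    variables |= {v for (v, _) in c}
                    changed = True
        parts.append(frozenset(group))
    return parts


def pr(cnf):
    """DPLL with component analysis: Pr(G and H) = Pr(G) * Pr(H)."""
    if frozenset() in cnf:
        return Fraction(0)
    if not cnf:
        return Fraction(1)
    parts = components(cnf)
    if len(parts) > 1:
        result = Fraction(1)
        for part in parts:
            result *= pr(part)
        return result
    var = min(v for c in cnf for (v, _) in c)
    half = Fraction(1, 2)
    return half * pr(condition(cnf, var, False)) + half * pr(condition(cnf, var, True))


print(pr(F), len(components(condition(F, "x", False))))
```

On the example, setting x = 0 leaves y ∧ (u ∨ w), which splits into the two components y and u ∨ w.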

Components & Decision-DNNF

F: (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)

The trace is a Decision-DNNF [Huang-Darwiche ’05, ’07]:
FBDD + “Decomposable” AND-nodes (the two sub-DAGs do not share variables)

[Figure: the decision-DAG for F in which the node for y ∧ (u ∨ w) is an AND node whose two children are the sub-DAGs for y and for u ∨ w.]

How much power does component analysis add?

Theorem [BLRS]: decision-DNNF for F of size N ⇒ FBDD for F of size N^{log N + 1} [UAI ’13]

Conversion works even when we allow negation and arbitrary decomposable binary gates. [ICDT ’14]

Corollary: Exponential lower bound for FBDD(F) ⇒ exponential lower bound for decision-DNNF(F)
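The step from the theorem to the corollary can be written out as a short calculation. The sketch assumes base-2 logarithms and a lower bound of the form FBDD(F) ≥ 2^{cn} for some constant c > 0; N denotes the size of the smallest decision-DNNF for F.

```latex
\begin{align*}
2^{cn} \;\le\; \mathrm{FBDD}(F) \;&\le\; N^{\log_2 N + 1} \;=\; 2^{(\log_2 N)^2 + \log_2 N} \\
\Longrightarrow\quad (\log_2 N)^2 + \log_2 N \;&\ge\; cn \\
\Longrightarrow\quad \log_2 N \;&=\; \Omega(\sqrt{n}) \\
\Longrightarrow\quad N \;&=\; 2^{\Omega(\sqrt{n})}
\end{align*}
```

The quasi-polynomial blow-up of the conversion is what turns a 2^{Ω(n)} bound for FBDDs into a 2^{Ω(√n)} bound for decision-DNNFs rather than preserving the exponent.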

Implications for Lower Bounds?

• All real-world exact model counters compile into FBDDs or decision-DNNFs

• By conversion, an exponential size lower bound for FBDDs implies an exponential lower bound for decision-DNNFs

• Thus it suffices to consider FBDDs

Outline

• Background: Model Counting, DPLL algorithms
  ▫ Extensions (Caching & Component Analysis)
  ▫ Knowledge Compilation (FBDDs & Decision-DNNF)

• Our Contributions
  ▫ Statement of separation
  ▫ Sketch of FBDD lower bound

• Conclusions

An important class of queries

H1 = R(x)S(x,y) ∨ S(x,y)T(y)

Hk = R(x)S1(x,y) ∨ ... ∨ Si(x,y)Si+1(x,y) ∨ ... ∨ Sk(x,y)T(y)

where the disjuncts are named
  hk0 = R(x)S1(x,y)
  hki = Si(x,y)Si+1(x,y)
  hkk = Sk(x,y)T(y)

▫ [Dalvi, Suciu ’12]: Hk is #P-hard to evaluate
▫ Known to “capture” hardness for probabilistic DB queries
▫ But, some functions of the hki are poly-time computable using lifted inference,
  e.g. (h30 ∨ h32) ∧ (h30 ∨ h33) ∧ (h31 ∨ h33)

New Lower Bounds

Theorem: For all k, FBDD(Hk) = 2^{Ω(n)}, which implies Decision-DNNF(Hk) = 2^{Ω(√n)}

Theorem: Any Boolean function f of hk0, ..., hkk that depends on all of them requires FBDD(f) = 2^{Ω(n)}, which implies Decision-DNNF(f) = 2^{Ω(√n)}

Corollary: Grounded inference requires 2^{Ω(√n)} time even on probabilistic DB instances with poly(n) time algorithms using lifted inference.

Implies separation between grounded and lifted inference

Proof for H1

H1 = R(x)S(x,y) ∨ S(x,y)T(y)

Over the complete database of size n,

H1 = ⋁_{i,j ∈ [n]} R(i)S(i,j) ∨ ⋁_{i,j ∈ [n]} S(i,j)T(j)
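The grounding of H1 over the complete database can be generated mechanically. The sketch below builds the lineage as a monotone DNF with one term R(i)S(i,j) and one term S(i,j)T(j) for every pair (i, j); the tuple encoding of the variables and the name `ground_h1` are assumptions for illustration.

```python
def ground_h1(n):
    """Terms of the lineage of H1 over the complete database with domain {1..n}."""
    terms = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            terms.append(frozenset({("R", i), ("S", i, j)}))
            terms.append(frozenset({("S", i, j), ("T", j)}))
    return terms


terms = ground_h1(3)
variables = set().union(*terms)
print(len(terms), len(variables))
```

For domain size n the lineage has 2n² terms over n² + 2n Boolean variables, so the formula itself is small; the lower bound concerns the size of any FBDD that represents it.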

Q: why is H1 hard for FBDDs?

Matrix view

        T(1)    T(2)    T(3)    T(4)    T(5)
R(1)    S(1,1)  S(1,2)  S(1,3)  S(1,4)  S(1,5)
R(2)    S(2,1)  S(2,2)  S(2,3)  S(2,4)  S(2,5)
R(3)    S(3,1)  S(3,2)  S(3,3)  S(3,4)  S(3,5)
R(4)    S(4,1)  S(4,2)  S(4,3)  S(4,4)  S(4,5)
R(5)    S(5,1)  S(5,2)  S(5,3)  S(5,4)  S(5,5)

H1 = ⋁_{i,j ∈ [n]} R(i)S(i,j) ∨ ⋁_{i,j ∈ [n]} S(i,j)T(j)

Matrix view (continued)

[Figure: the same 5 × 5 matrix of R(i), S(i,j), T(j) variables, repeated over several slides while an FBDD for H1 is built step by step: the root tests R(1) with outgoing 0- and 1-edges, and later nodes test S(1,1) and S(1,5).]

H1 = ⋁_{i,j ∈ [n]} R(i)S(i,j) ∨ ⋁_{i,j ∈ [n]} S(i,j)T(j)

Can’t Cache!

A “unit rule” for FBDDs

Variable x in a formula Φ is a unit if Φ = x ∨ G

An FBDD follows the unit rule if each node tests a unit variable whenever possible

Can we assume that FBDDs follow the unit rule?

[Figure: a unit node testing x; its 1-edge leads to the leaf 1 and its 0-edge leads to an FBDD for G.]

A “unit rule” for FBDDs

Lemma: Given an FBDD of size N for a monotone DNF formula Φ, there exists an FBDD for Φ that follows the unit rule and has size at most |var(Φ)| · N.

Proof: Alter the FBDD to test units whenever possible, then restore the read-once property
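To illustrate what a unit is, the sketch below conditions a monotone DNF on one variable and lists the units of the result. It relies on the observation that, in a monotone DNF, x is a unit exactly when the single-variable term {x} is present; the term encoding and the helper names are assumptions, and the formula used is the grounding of H1 for domain size 2.

```python
def ground_h1(n):
    """Monotone DNF for H1 over the complete database with domain {1..n}."""
    terms = set()
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            terms.add(frozenset({("R", i), ("S", i, j)}))
            terms.add(frozenset({("S", i, j), ("T", j)}))
    return terms


def condition_dnf(terms, var, value):
    """Fix var := value in a monotone DNF given as a set of frozenset terms."""
    out = set()
    for t in terms:
        if var not in t:
            out.add(t)
        elif value:
            out.add(t - {var})   # var is true: remove it from the term
        # var is false: the term is falsified and disappears
    return out


def units(terms):
    """Variables x that occur as a singleton term, so the formula is x OR G."""
    return {next(iter(t)) for t in terms if len(t) == 1}


phi = ground_h1(2)
after = condition_dnf(phi, ("S", 1, 1), True)
print(units(phi), units(after))
```

Setting S(1,1) = 1 turns the terms R(1)S(1,1) and S(1,1)T(1) into the units R(1) and T(1), which an FBDD following the unit rule would have to test next.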

Bound for H1

• Idea: specify a set of “admissible” partial paths A so that:
  1. None of them cache
  2. Each takes n - 1 degrees of freedom to specify

Given this set A:
▫ Each partial path in A must end at a unique node (they don’t cache)
▫ There are 2^{n-1} such paths (n - 1 degrees of freedom)

This implies that the FBDD has at least 2^{n-1} nodes, one for each path in A

Admissible Paths

Let A be the set of partial paths P which
1. Don’t end at a leaf node
2. Touch n - 1 rows and/or columns, but not more
3. Never set R(i) = S(ij) = T(j) = 0, for any i, j

Bound for H1

Proposition: If P, Q are paths in A which end at the same node v, then they test the same set of R and T variables, and assign them the same values.

Proof:
• Suppose P sets R(i) and Q does not
• Then the subformula at v cannot contain any term R(i)S(ij)
• So Q sets every S(ij) = 0 or every T(j) = 1 (unit rule)
• Hence #Col(Q) = n, a contradiction

Paths don’t cache

Intuition: given that R(i), T(j) are set, S(ij) is determined

Since two paths that end at the same node v set the same R, T variables, they set the same S variables

R(i)  S(ij)  T(j)
 0     1      0
 1     0      0
 0     0      1
 1     0      1
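The table can be reproduced by a small enumeration. The sketch below assumes that an admissible path, since it has not reached a leaf, satisfies neither R(i)S(ij) nor S(ij)T(j) for the pair (i, j), and that the all-zero setting is excluded by the third admissibility condition; under these assumptions exactly the four rows above survive, and S(ij) is a function of R(i) and T(j).

```python
from itertools import product

rows = []
for r, s, t in product([0, 1], repeat=3):
    term_true = (r and s) or (s and t)   # R(i)S(ij) or S(ij)T(j) is satisfied
    all_zero = (r, s, t) == (0, 0, 0)    # ruled out for admissible paths
    if not term_true and not all_zero:
        rows.append((r, s, t))

# On the surviving rows, S(ij) is determined by the pair (R(i), T(j)).
s_given_rt = {(r, t): s for r, s, t in rows}

print(rows)
print(s_given_rt)
```

Because each (R(i), T(j)) pair appears in exactly one surviving row, two admissible paths that agree on the R and T variables must also agree on the S variables they set.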

n - 1 degrees of freedom

Let P be an admissible path; w.l.o.g. |Row(P)| = n - 1

At each node where we first visit a row, we could have chosen either edge and still been admissible!

⇒ n - 1 degrees of freedom
⇒ 2^{n-1} distinct admissible paths
⇒ FBDD(H1) = 2^{Ω(n)}

[Figure: an example path testing R(2), S(1,4) and S(5,4).]

Proof for Hk

Same basic structure. We only need to change the definition of an admissible path.

Let A be the set of partial paths P which
1. Don’t end at a leaf node
2. Touch n - 1 rows and/or columns, but not more
3. For every i, j, set the variables consistently with the following table (shown for k = 3):

R(i)  S1(ij)  S2(ij)  S3(ij)  T(j)
 0     1       0       1       0
 1     0       1       0       1
 0     0       1       0       1
 1     0       1       0       1

New Lower Bounds

Theorem: For all k, FBDD(Hk) = 2^{Ω(n)}, which implies Decision-DNNF(Hk) = 2^{Ω(√n)}

Theorem: Any Boolean function f of hk0, ..., hkk that depends on all of them requires FBDD(f) = 2^{Ω(n)}, which implies Decision-DNNF(f) = 2^{Ω(√n)}

Corollary: Grounded inference requires 2^{Ω(√n)} time even on probabilistic DB instances with poly(n) time algorithms using lifted inference.

Implies separation between grounded and lifted inference

Boolean combinations of hk0, ..., hkk

f is a Boolean function that depends on all its inputs

Ψ = f(hk0, hk1, …, hkk)

We give a reduction from any FBDD for Ψ to an FBDD for Hk

Intuitively: to compute Ψ using an FBDD, you must compute hk0, hk1, …, hkk, so the same FBDD can also be used to compute Hk


Summary

• FBDDs and decision-DNNFs bound the power of known model counting algorithms

• Exponential lower bounds on FBDDs & decision-DNNFs

• Which implies a separation between lifted and grounded inference


Open Problems

• A polynomial conversion of decision-DNNFs to FBDDs?

• General Dichotomy theorem for grounded inference?

• Approximate model counting?


Thank You

Questions?