


arXiv:1608.05763v1 [cs.AI] 20 Aug 2016

Inference in Probabilistic Logic Programs using Lifted Explanations∗

Arun Nampally and C. R. Ramakrishnan
Computer Science Department, Stony Brook University, Stony Brook, NY 11794.
(e-mail: [email protected], [email protected])

Abstract

In this paper, we consider the problem of lifted inference in the context of Prism-like probabilistic logic programming languages. Traditional inference in such languages involves the construction of an explanation graph for the query and computing probabilities over this graph. When evaluating queries over probabilistic logic programs with a large number of instances of random variables, traditional methods treat each instance separately. For many programs and queries, we observe that explanations can be summarized into substantially more compact structures, which we call lifted explanation graphs. In this paper, we define lifted explanation graphs and operations over them. In contrast to existing lifted inference techniques, our method for constructing lifted explanations naturally generalizes existing methods for constructing explanation graphs. To compute the probability of query answers, we solve recurrences generated from the lifted graphs. We show examples where the use of our technique reduces the asymptotic complexity of inference.

1 Introduction

Background. Probabilistic Logic Programming (PLP) provides a declarative programming framework to specify and use combinations of logical and statistical models. A number of programming languages and systems have been proposed and studied under the framework of PLP, e.g., PRISM (Sato and Kameya 1997), Problog (De Raedt et al. 2007), PITA (Riguzzi and Swift 2011) and Problog2 (Dries et al. 2015). These languages have similar declarative semantics based on the distribution semantics (Sato and Kameya 2001). Moreover, the inference algorithms used in many of these systems to evaluate the probability of query answers, e.g. PRISM, Problog and PITA, are based on a common notion of explanation graphs.

At a high level, the inference procedure follows traditional query evaluation over logic programs. Outcomes of random variables, i.e., the probabilistic choices, are abduced during query evaluation. Each derivation of an answer is associated with a set of outcomes of random variables, called its explanation, under which the answer is supported by the derivation. Systems differ on how the explanations are represented and manipulated. Explanation graphs in PRISM are represented using tables, and under the mutual exclusion assumption, multiple explanations are combined by adding entries to tables. In Problog and PITA, explanation graphs are represented by Binary Decision Diagrams (BDDs), with probabilistic choices mapped to propositional variables in BDDs.

∗This work was supported in part by NSF Grant IIS-1447549.

Driving Problem. Inference based on explanation graphs does not scale well to logical/statistical models with large numbers of random processes and variables. Several approximate inference techniques have been proposed to estimate the probability of answers when exact inference is infeasible. In general, large logical/statistical models involve families of independent, identically distributed (i.i.d.) random variables. Moreover, in many models, inference often depends on the outcomes of random processes but not on the identities of the random variables with the particular outcomes. However, query-based inference methods will instantiate each random variable, and the explanation graph will represent each of their outcomes. Even when the graph may ultimately exhibit symmetry with respect to random variable identities, and many parts of the graph may be shared, the computation that produced these graphs may not be shared. This paper presents a structure for representing explanation graphs compactly by exploiting the symmetry with respect to i.i.d. random variables, and a procedure to build this structure without enumerating each instance of a random process.

Illustration. We illustrate the problem and our approach using the simple example in Figure 1, which shows a program describing a process of tossing a number of i.i.d. coins, and evaluating if at least two of them came up "heads". The example is specified in an extension of the PRISM language, called Px. The explicit random processes of PRISM enable a clearer exposition of our approach. In PRISM and Px, a special predicate of the form msw(p, i, v) describes, given a random process p that defines a family of i.i.d. random variables, that v is the value of the i-th random variable in the family. The argument i of msw is called the instance argument of the predicate. In this paper, we consider Param-Px, a further extension of Px to define parameterized programs. In Param-Px, a built-in predicate in is used to specify membership; e.g., x in s means x is a member of an enumerable set s. The size of s is specified by a separate population directive.

The program in Figure 1 defines a family of random variables with outcomes in {h, t}, generated by toss. The instances that index these random variables are drawn from the set coins. Finally, the predicate twoheads is defined to hold if the tosses of at least two distinct coins come up "heads".

State of the Art, and Our Solution. Inference in PRISM, Problog and PITA follows the structure of the derivations for a query. Consider the program in Figure 1(a) and let the cardinality of the set of coins be n. The query twoheads will take Θ(n²) time, since it will construct bindings to both X and Y in the clause defining twoheads. However, the size of an explanation graph is Θ(n); see Figure 1(b). Computing the probability of the query over this graph will also take Θ(n) time.

In this paper, we present a technique to construct a symbolic version of an explanation graph, called a lifted explanation graph, that represents instances symbolically and avoids enumerating the instances of random processes such as toss. The lifted explanation graph for the query twoheads is shown in Figure 1(c). Unlike traditional explanation graphs where nodes are specific instances


% Two distinct tosses show "h"
twoheads :-
    X in coins,
    msw(toss, X, h),
    Y in coins,
    {X < Y},
    msw(toss, Y, h).

% Cardinality of coins:
:- population(coins, 100).

% Distribution parameters:
:- set_sw(toss, categorical([h:0.5, t:0.5])).

[Figure 1(b): the ground explanation graph is a chain of nodes (toss,1), (toss,2), …, (toss,n); at each node the outgoing edges are labeled h and t, leading to the next node or to the leaves 0 and 1. Figure 1(c): the lifted explanation graph has the quantified constraint ∃X.∃Y.{X < Y} at the top, with nodes (toss,X) and (toss,Y), edges labeled h and t, and leaves 0 and 1.]

(a) Simple Px program (b) Ground expl. graph (c) Lifted expl. graph

Figure 1: Example program and ground explanation graph

f1(n) = h1(1, n)

h1(i, n) = g1(i, n) + (1 − f1) · h1(i+1, n)   if i < n
h1(i, n) = g1(i, n)                            if i = n

g1(i, n) = π · f2(i, n)        f1 = π

f2(i, n) = h2(i+1, n)   if i < n
f2(i, n) = 0            otherwise

h2(j, n) = g2 + (1 − f2) · h2(j+1, n)   if j < n
h2(j, n) = g2                            if j = n

g2 = π        f2 = π

Figure 2: Recurrences for computing probabilities for Example in Fig. 1

of random variables, nodes in the lifted explanation graph may be parameterized by their instance (e.g., (toss,X) instead of (toss,1)). A set of constraints on these variables specifies the allowed groundings.

Note that the graph size is independent of the size of the population. Moreover, the graph can be constructed in time independent of the population size as well. Probability computation is performed by first deriving recurrences based on the graph's structure and then solving the recurrences. The recurrences for probability computation derived from the graph in Fig. 1(c) are shown in Fig. 2. In the figure, the equations with subscript 1 are derived from the root of the graph; those with subscript 2 from the left child of the root; π is the probability that toss is "h". Note that the probability of the query, f1(n), can be computed in Θ(n) time from the recurrences.

These recurrences can be solved in O(n) time with tabling or dynamic programming. Moreover, in certain cases, it is possible to obtain a closed form from a recurrence. For instance, noting that g2 is independent of its parameters, we get h2(j, n) = 1 − (1 − π)^(n−j+1).
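As a concrete illustration (ours, not part of the paper's Px system), the recurrences of Figure 2 can be evaluated by memoized recursion in Python, with π = 0.5 as set by the set_sw directive of Figure 1:

```python
from functools import lru_cache

PI = 0.5  # probability that a toss comes up "h" (Figure 1)

def query_prob(n):
    """Probability of twoheads for n coins via the Figure 2 recurrences.

    Each h-value is computed once and memoized, so the evaluation takes
    Theta(n) time, matching the complexity of solving the recurrences
    by tabling.
    """
    @lru_cache(maxsize=None)
    def h2(j):
        # h2(j, n) = g2 + (1 - f2) * h2(j+1, n) if j < n, g2 if j = n,
        # with g2 = f2 = PI
        return PI if j == n else PI + (1 - PI) * h2(j + 1)

    def f2(i):
        # f2(i, n) = h2(i+1, n) if i < n, 0 otherwise
        return h2(i + 1) if i < n else 0.0

    @lru_cache(maxsize=None)
    def h1(i):
        # h1(i, n) = g1(i, n) + (1 - f1) * h1(i+1, n) if i < n,
        # g1(i, n) if i = n, with g1(i, n) = PI * f2(i, n) and f1 = PI
        g1 = PI * f2(i)
        return g1 if i == n else g1 + (1 - PI) * h1(i + 1)

    return h1(1)  # f1(n) = h1(1, n)
```

For small n this agrees with the direct formula for "at least two heads among n fair coins", 1 − (1 − π)^n − nπ(1 − π)^(n−1), and h2 matches the closed form noted above.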

Lifted explanations vs. Lifted Inference. Our work is a form of lifted inference, a set of techniques that have been intensely studied in the context of first-order graphical models and Markov Logic Networks (Poole 2003; Braz et al. 2005; Milch et al. 2008). Essentially, lifted explanations provide a way to perform lifted inference over PLPs by leveraging their query evaluation mechanism. Directed first-order graphical models (Kisynski 2010) can be readily cast as PLPs, and our technique can be used to perform lifted inference over such models. Our solution, however, does not cover techniques based on counting elimination (Braz et al. 2005; Milch et al. 2008).

It should be noted that Problog2 does not construct query-specific explanation graphs. Instead, it uses a knowledge compilation approach where the models of a program are represented by a propositional boolean formula. These formulae, in turn, are represented in a compact standard form such as d-DNNFs or SDDs (Darwiche 2001; Darwiche 2011). Query answers and their probabilities are then computed using linear-time algorithms over these structures.

The knowledge compilation approach has been extended to do a generalized form of lifted inference using first-order model counting (Van den Broeck et al. 2011). This technique performs lifted inference, including inversion and counting elimination, over a large class of first-order models. However, first-order model counting is defined only when the problem can be stated in a first-order constrained CNF form. Problems such as the example in Figure 1 cannot be written in that form. To address this, a skolemization procedure which eliminates existential quantifiers and converts to first-order CNF without adding function symbols was proposed by Van den Broeck et al. (2014). While the knowledge compilation approach takes a core lifted inference procedure and applies it to a class of logic programs, our approach generalizes existing inference techniques to perform a form of lifted inference.

Contributions. The technical contribution of this paper is twofold.

1. We define a lifted explanation structure, and operations over these structures (see Section 3). We also give a method to construct such structures during query evaluation, closely following the techniques used to construct explanation graphs.

2. We define a technique to compute probabilities over such structures by deriving and solving recurrences (see Section 4). We provide examples to illustrate the complexity gains of our technique over traditional inference.

The rest of the paper begins by defining parameterized Px programs and their semantics (Section 2). After presenting the main technical work, the paper concludes with a discussion of related work (Section 5).

2 Parameterized Px Programs

The PRISM language follows Prolog's syntax. It adds a binary predicate msw to introduce random variables into an otherwise familiar Prolog program. Specifically, in msw(s, v), s is a "switch" that represents a random process which generates a family of random variables, and v is bound to the value of a variable in that family. The domain and distribution parameters of the switches are specified using value facts and set_sw directives, respectively. Given a switch s, we use Ds to denote the domain of s, and πs : Ds → [0,1] to denote its probability distribution.

The model-theoretic distribution semantics explicitly identifies each member of a random variable family with an instance parameter. In the PRISM system, the binary msw is interpreted stochastically, generating a new member of the random variable family whenever an msw is encountered during inference.

2.1 Px and Inference

The Px language extends the PRISM language in three ways. Firstly, the msw switches in Px are ternary, with the addition of an explicit instance parameter. This brings the language closer to the formalism presented when describing PRISM's semantics (Sato and Kameya 2001). Secondly, Px aims to compute the distribution semantics with no assumptions on the structure of the explanations. Thirdly, in contrast to PRISM, the switches in Px can be defined with a wide variety of univariate distributions, including continuous distributions (such as Gaussian) and infinite discrete distributions (such as Poisson). However, in this paper, we consider only programs with finite discrete distributions.

Exact inference of Px programs with finite discrete distributions uses explanation graphs with the following structure.

Definition 1 (Ground Explanation Graph). Let S be the set of ground switches in a Px program P, and Ds be the domain of switch s ∈ S. Let T be the set of all ground terms over symbols in P. Let "≺" be a total order over S × T such that (s1, t1) ≺ (s2, t2) if either t1 < t2, or t1 = t2 and s1 < s2.

A ground explanation tree over P is a rooted tree γ such that:

• Leaves in γ are labeled 0 or 1.

• Internal nodes in γ are labeled (s, z) where s ∈ S is a switch, and z is a ground term over symbols in P.

• For a node labeled (s, z), there are k outgoing edges to subtrees, where k = |Ds|. Each edge is labeled with a unique v ∈ Ds.

• Let (s1, z1), (s2, z2), …, (sk, zk), c be the sequence of node labels in a root-to-leaf path in the tree, where c ∈ {0, 1}. Then (si, zi) ≺ (sj, zj) if i < j for all i, j ≤ k. As a corollary, node labels along any root-to-leaf path in the tree are unique.

An explanation graph is a DAG representation of a ground explanation tree.

We use φ to denote explanation graphs. We use (s, t)[vi : φi] to denote an explanation graph whose root is labeled (s, t), with each edge labeled vi (ranging over a suitable index set i), leading to subgraph φi.

Consider a sequence of alternating node and edge labels in a root-to-leaf path: (s1, z1), v1, (s2, z2), v2, …, (sk, zk), vk, c. Each such path enumerates a set of random variable valuations s1[z1] = v1, s2[z2] = v2, …, sk[zk] = vk. When c = 1, the set of valuations forms an explanation. An explanation graph thus represents a set of explanations.

Note that explanation trees and graphs resemble decision diagrams. Indeed, explanation graphs are implemented using Binary Decision Diagrams (Bryant 1992) in PITA and Problog, and Multi-Valued Decision Diagrams (Srinivasan et al. 1990) in Px. The union of two sets of explanations can be seen as an "or" operation over corresponding explanation graphs. Pair-wise union of explanations in two sets is an "and" operation over corresponding explanation graphs.
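To make these "or"/"and" operations concrete, here is a small Python sketch of the standard apply-style recursion over ordered decision diagrams; the tuple-based node encoding, and the omission of memoization and node sharing, are our simplifications, not the paper's implementation:

```python
def apply_op(op, a, b):
    """Combine two ordered explanation diagrams with a boolean operation.

    Nodes are ('leaf', 0 or 1) or (label, {outcome: child}); labels are
    (switch, instance) pairs, and Python's tuple order stands in for the
    total order of Def. 1.  No memoization or node sharing is done here.
    """
    if a[0] == 'leaf' and b[0] == 'leaf':
        return ('leaf', op(a[1], b[1]))
    if a[0] == 'leaf' or (b[0] != 'leaf' and b[0] < a[0]):
        a, b = b, a                      # make a the node with the smaller label
    label, kids = a
    if b[0] == label:                    # same label: descend both, per outcome
        return (label, {v: apply_op(op, c, b[1][v]) for v, c in kids.items()})
    return (label, {v: apply_op(op, c, b) for v, c in kids.items()})

def holds(node, world):
    """Evaluate a diagram under a complete valuation world[(switch, instance)]."""
    while node[0] != 'leaf':
        node = node[1][world[node[0]]]
    return node[1]
```

For two single-switch diagrams testing "toss 1 is h" and "toss 2 is h", apply_op with boolean and/or yields diagrams whose valuations match conjunction and disjunction of the two explanations.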

Inference via Program Transformation. Inference in Px is performed analogously to that in PITA (Riguzzi and Swift 2011). Concretely, inference is done by translating a Px program to one that explicitly constructs explanation graphs, performing tabled evaluation of the derived program, and computing the probability of answers from the explanation graphs. We describe the translation for definite pure programs; programs with built-ins and other constructs can be translated in a similar manner.


First, every clause containing a disequality constraint is replaced by two clauses using less-than constraints. Next, for every user-defined atom A of the form p(t1, t2, …, tn), we define exp(A, E) as the atom p(t1, t2, …, tn, E) with a new predicate p/(n+1), with E as an added "explanation" argument. For such atoms A, we also define head(A, E) as the atom p′(t1, t2, …, tn, E) with a new predicate p′/(n+1). A goal G is a conjunction of atoms, where G = (G1, G2) for goals G1 and G2, or G is an atom A. Function exp is extended to goals such that exp((G1, G2), E) = ((exp(G1, E1), exp(G2, E2)), and(E1, E2, E)), where and is a predicate in the translated program that combines two explanations using conjunction, and E1 and E2 are fresh variables. Function exp is also extended to msw atoms such that exp(msw(p, i, v), E) is rv(p, i, v, E), where rv is a predicate that binds E to an explanation graph with root labeled (p, i), with an edge labeled v leading to a 1 child, and all other edges leading to 0.

Each clause of the form A :− G in a Px program is translated to a new clause head(A, E) :− exp(G, E). For each predicate p/n, we define p(X1, X2, …, Xn, E) to be such that E is the disjunction of all E′ for p′(X1, X2, …, Xn, E′). As in PITA, this is done using answer subsumption.

Computing Answer Probabilities. The probability of an answer is determined by first materializing the explanation graph, and then computing the probability over the graph. The probability associated with a node in the graph is computed as the sum of the products of the probabilities associated with its children and the corresponding edge probabilities. The probability associated with an explanation graph φ, denoted prob(φ), is the probability associated with its root. This can be computed in time linear in the size of the graph by using dynamic programming or tabling.
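The bottom-up computation can be sketched as follows; the node encoding and the hand-built graph (the ground explanation graph of Figure 1(b) for three coins) are our illustrative assumptions:

```python
def prob(root, dist):
    """Probability of an explanation graph by dynamic programming.

    Nodes are 0, 1, or (switch, instance, {outcome: child}), and
    dist[switch][outcome] is the outcome's probability.  A node's value is
    the sum over its outgoing edges of the edge probability times the
    child's value; memoizing on node identity keeps the computation linear
    in the size of the DAG.
    """
    cache = {}
    def value(n):
        if n in (0, 1):
            return float(n)
        if id(n) not in cache:
            s, _, kids = n
            cache[id(n)] = sum(dist[s][v] * value(c) for v, c in kids.items())
        return cache[id(n)]
    return value(root)

# Ground explanation graph in the style of Figure 1(b) for three coins:
# the "one head seen" and "no head seen" states after each toss share
# subgraphs in the DAG.
dist = {'toss': {'h': 0.5, 't': 0.5}}
one_more = ('toss', 3, {'h': 1, 't': 0})          # one more head needed
seen_one = ('toss', 2, {'h': 1, 't': one_more})   # one head seen so far
seen_none = ('toss', 2, {'h': one_more, 't': 0})  # no head seen yet
graph = ('toss', 1, {'h': seen_one, 't': seen_none})
```

Here prob(graph, dist) evaluates to 0.5, the probability that at least two of three fair coins show heads.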

2.2 Syntax and Semantics of Parameterized Px Programs

Parameterized Px, called Param-Px for short, is a further extension of the Px language. The first feature of this extension is the specification of populations and instances to specify the ranges of instance parameters of msws.

Definition 2 (Population). A population is a named finite set, with a specified cardinality. A population has the following properties:

1. Elements of a population may be atomic, or depth-bounded ground terms.

2. Elements of a population are totally ordered using the default term order.

3. Distinct populations are disjoint.

Populations and their cardinalities are specified in a Param-Px program by population facts. For example, the program in Figure 1(a) defines a population named coins of size 100. The individual elements of this set are left unspecified. When necessary, element/2 facts may be used to define distinguished elements of a population. For example, element(fred, persons) defines a distinguished element "fred" in population persons. In the presence of element facts, the elements of a population are ordered as follows: the order of element facts specifies the order among the distinguished elements, and all distinguished elements occur before the other, unspecified elements in the order.


Definition 3 (Instance). An instance is an element of a population. In a Param-Px program, a built-in predicate in/2 can be used to draw an instance from a population. All instances of a population can be drawn by backtracking over in.

For example, in Figure 1(a), X in coins binds X to an instance of the population coins. An instance variable is one that occurs as the instance argument in an msw predicate in a clause of a Param-Px program. For example, in Figure 1(a), X and Y in the clause defining twoheads are instance variables.

Constraints. The second extension in Param-Px is atomic constraints, of the form {t1 = t2}, {t1 ≠ t2} and {t1 < t2}, where t1 and t2 are variables or constants, to compare instances of a population. We use braces "{·}" to distinguish the constraints from Prolog's built-in comparison operators.

Types. We use populations in a Param-Px program to confer types on program variables. Each variable that occurs in an "in" predicate is assigned a unique type. More specifically, X has type p if X in p occurs in a program, where p is a population; X is untyped otherwise. We extend this notion of types to constants and switches as well. A constant c has type p if there is a fact element(c, p); c is untyped otherwise. A switch s has type p if there is an msw(s, X, t) in the program and X has type p; s is untyped otherwise.

Definition 4 (Well-typedness and Typability). A Param-Px program is well-typed if:

1. For every constraint in the program of the form {t1 = t2}, {t1 ≠ t2} or {t1 < t2}, the types of t1 and t2 are identical.

2. Types of arguments of every atom on the r.h.s. of a clause are identical to the types ofcorresponding parameters of l.h.s. atoms of matching clauses.

3. Every switch in the program has a unique type.

A Param-Px program is typable if we can add literals of the form X in p (where p is a population) to the r.h.s. of clauses such that the resulting program is well-typed.

The first two conditions of well-typedness ensure that only instances from the same population are compared in the program. The last condition imposes that the instances of random variables generated by a switch s are all indexed by elements drawn from the same population. In the rest of the paper, unless otherwise specified, we assume all Param-Px programs under consideration are well-typed.

Semantics of Param-Px Programs. Each Param-Px program can be readily transformed into a non-parameterized "ordinary" Px program. Each population fact is used to generate a set of in/2 facts enumerating the elements of the population. Other constraints are replaced by their counterparts in Prolog: e.g., {X < Y} with X < Y. Finally, each msw(s, i, t) is preceded by i in p, where p is the type of s. The semantics of the original parameterized program is defined by the semantics of the transformed program.


3 Lifted Explanations

In this section we formally define lifted explanation graphs. These are a generalization of the ground explanation graphs defined earlier, and are introduced in order to represent ground explanations compactly. As illustrated in Figure 1 in the Introduction, the compactness of lifted explanations is a result of summarizing the instance information. Constraints over instances form a basic building block of lifted explanations. We use the following constraint domain for this purpose.

3.1 Constraints on Instances

Definition 5 (Instance Constraints). Let V be a set of instance variables, with subranges of the integers as domains, such that m is the largest positive integer in the domain of any variable. Atomic constraints on instance variables are of one of the following two forms: X < aY ± k, X = aY ± k, where X, Y ∈ V, a ∈ {0, 1}, and k is a non-negative integer ≤ m + 1. The language of constraints over bounded integer intervals, denoted by L(V, m), is a set of formulae η, where η is a non-empty set of atomic constraints representing their conjunction.

Note that each formula in L(V, m) denotes a convex region in Z^|V|, and hence the language is closed under conjunction and existential quantification.

Let vars(η) be the set of instance variables in an instance constraint η. A substitution σ : vars(η) → [1..m] that maps each variable to an element in its domain is a solution to η if each constraint in η is satisfied by the mapping. The set of all solutions of η is denoted by [[η]]. The constraint formula η is unsatisfiable if [[η]] = ∅. We say that η |= η′ if every σ ∈ [[η]] is a solution to η′.

Note also that instance constraints are a subclass of the well-known integer octagonal constraints (Miné 2006) and can be represented canonically by difference bound matrices (DBMs) (Yovine 1998; Larsen et al. 1997), permitting efficient algorithms for conjunction and existential quantification. Given a constraint on n variables, a DBM is an (n+1) × (n+1) matrix with rows and columns indexed by the variables (and a special "zero" row and column). For variables X and Y, the entry in cell (X, Y) of the DBM represents the upper bound on X − Y. For a variable X, the value at cell (X, 0) is X's upper bound and the value at cell (0, X) is the negation of X's lower bound.

Geometrically, each entry in the DBM representing η is a "face" of the region representing [[η]]. The negation of an instance constraint η can be represented by a set of mutually exclusive instance constraints. Geometrically, this can be seen as the set of convex regions obtained by complementing the "faces" of the region representing [[η]]. Note that when η has n variables, the number of instance constraints in ¬η is bounded by the number of faces of [[η]], and hence by O(n²).

Let ¬η represent the set of mutually exclusive instance constraints representing the negation of η. Then the disjunction of two instance constraints η and η′ can be represented by the set of mutually exclusive instance constraints (η ∧ ¬η′) ∪ (η′ ∧ ¬η) ∪ {η ∧ η′}, where we overload ∧ to represent the element-wise conjunction of an instance constraint with a set of constraints.

An existentially quantified formula of the form ∃X.η can be represented by a DBM obtained by removing the rows and columns corresponding to X in the DBM representation of η. We denote this simple procedure to obtain ∃X.η from η by Q(X, η).


Definition 6 (Range). Given a constraint formula η ∈ L(V, m), and X ∈ vars(η), let σX(η) = {v | σ ∈ [[η]], σ(X) = v}. Then range(X, η) is the interval [l, u], where l = min(σX(η)) and u = max(σX(η)).

Since the constraint formulas represent convex regions, it follows that each variable's range is an interval. Note that the range of a variable can be readily obtained in constant time from the entries for that variable in the zero row and zero column of the constraint's DBM representation.
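The DBM operations described above (conjunction via entry-wise minimum followed by closure, quantification by dropping a row and column, and range lookup from the zero row and column) can be sketched as follows; the dictionary-based matrix and the (x, y, k) atom encoding for x − y ≤ k are our simplification of the standard presentation:

```python
import itertools

INF = float('inf')

def make_dbm(variables, m, atoms):
    """Closed DBM for a conjunction of atoms over domains [1..m].

    Indices are the variables plus the special index 0 (the constant zero);
    entry d[x, y] is an upper bound on x - y.  Each atom (x, y, k) states
    x - y <= k; conjunction is entry-wise minimum, and a Floyd-Warshall
    closure makes every bound tight.
    """
    idx = list(variables) + [0]
    d = {(i, j): (0 if i == j else INF) for i in idx for j in idx}
    for x in variables:
        d[x, 0] = min(d[x, 0], m)    # x <= m
        d[0, x] = min(d[0, x], -1)   # -x <= -1, i.e. x >= 1
    for x, y, k in atoms:
        d[x, y] = min(d[x, y], k)
    for t, i, j in itertools.product(idx, idx, idx):
        d[i, j] = min(d[i, j], d[i, t] + d[t, j])
    return d, idx

def quantify(d, idx, x):
    """Q(X, eta): existential quantification drops X's row and column."""
    rest = [i for i in idx if i != x]
    return {(i, j): d[i, j] for i in rest for j in rest}, rest

def var_range(d, x):
    """range(X, eta): lower bound from the zero row, upper from the zero column."""
    return (-d[0, x], d[x, 0])
```

For η = {X < Y} (encoded as X − Y ≤ −1) with m = 3, closure tightens X's upper bound to 2 and Y's lower bound to 2, so range(X, η) = [1, 2] and range(Y, η) = [2, 3]; quantifying away X leaves Y's range unchanged.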

3.2 Lifted Explanation Graphs

Definition 7 (Lifted Explanation Graph). Let S be the set of ground switches in a Param-Px program P, Ds be the domain of switch s ∈ S, m be the sum of the cardinalities of all populations in P, and C be the set of distinguished elements of the populations in P. A lifted explanation graph over variables V is a pair (Ω : η, ψ) which satisfies the following conditions:

1. Ω : η is the notation for ∃Ω.η, where η ∈ L(V, m) is either a satisfiable constraint formula or the single atomic constraint false, and Ω ⊆ vars(η) is the set of quantified variables in η. When η is false, Ω = ∅.

2. ψ is a singly rooted DAG which satisfies the following conditions:

• Internal nodes are labeled (s, t) where s ∈ S and t ∈ V ∪ C.

• Leaves are labeled either 0 or 1.

• Each internal node has an outgoing edge for each outcome in Ds.

• If a node labeled (s, t) has a child labeled (s′, t′), then η |= t < t′, or η |= t = t′ and (s, c) ≺ (s′, c) for any ground term c (see Def. 1).

Similar to ground explanation graphs (Def. 1), the DAG components of lifted explanation graphs are represented by textual patterns (s, t)[αi : ψi], where (s, t) is the label of the root and ψi is the DAG associated with the edge labeled αi. Irrelevant parts may be denoted "_" to reduce clutter. In the lifted explanation graph shown in Figure 1(c), the Ω : η part would be {X, Y} : {X < Y}. We now define the standard notion of bound and free variables over lifted explanation graphs.

Definition 8 (Bound and free variables). Given a lifted explanation graph (Ω : η, ψ), a variable X ∈ vars(η) is called a bound variable if X ∈ Ω; otherwise it is called a free variable.

A lifted explanation graph is said to be well-structured if every pair of nodes (s, X) and (s′, X) with the same bound variable X has a common ancestor with X as the instance variable. In the rest of the paper, we assume that all lifted explanation graphs are well-structured.

Definition 9 (Substitution operation). Given a lifted explanation graph (Ω : η, ψ) and a variable X ∈ vars(η), the substitution of X in the lifted explanation graph with a value k from its domain, denoted by (Ω : η, ψ)[k/X], is defined as follows:

(Ω : η, ψ)[k/X] = (∅ : false, 0)                      if η[k/X] is unsatisfiable

(Ω : η, ψ)[k/X] = (Ω \ X : η[k/X], ψ[k/X])            if η[k/X] is satisfiable

((s, t)[αi : ψi])[k/X] = (s, k)[αi : ψi[k/X]]          if t = X

((s, t)[αi : ψi])[k/X] = (s, t)[αi : ψi[k/X]]          if t ≠ X

0[k/X] = 0

1[k/X] = 1

In the above definition, η[k/X] refers to the standard notion of substitution. The definition of the substitution operation can be generalized to mappings on sets of variables. Let σ be a substitution that maps variables to their values. By (Ω : η, ψ)σ we denote the lifted explanation graph obtained by sequentially performing the substitution operation on each variable X in the domain of σ.

Lemma 1 (Substitution lemma). If (Ω : η, ψ) is a lifted explanation graph and X ∈ vars(η), then (Ω : η, ψ)[k/X], where k is a value in the domain of X, is a lifted explanation graph.

When a substitution [k/X] is applied to a lifted explanation graph and η[k/X] is unsatisfiable, the result is (∅ : false, 0), which is clearly a lifted explanation graph. When η[k/X] is satisfiable, the variable is removed from Ω and occurrences of X in ψ are replaced by k. The resultant DAG clearly satisfies the conditions imposed by Def. 7. Finally, we note that a ground explanation graph φ (Def. 1) is a trivial lifted explanation graph (∅ : true, φ). This constitutes the informal proof of Lemma 1.

3.3 Semantics of Lifted Explanation Graphs

The meaning of a lifted explanation graph (Ω : η, ψ) is given by the ground explanation tree represented by it.

Definition 10 (Grounding). Let (Ω : η, ψ) be a closed lifted explanation graph, i.e., one with no free variables. Then the ground explanation tree represented by (Ω : η, ψ), denoted Gr((Ω : η, ψ)), is given by the function Gr(Ω, η, ψ). When [[η]] = ∅, then Gr(_, η, _) = 0. We consider the cases when [[η]] ≠ ∅. The grounding of leaves is defined as Gr(_, _, 0) = 0 and Gr(_, _, 1) = 1. When the instance argument of the root is a constant, grounding is defined as Gr(Ω, η, (s, t)[αi : ψi]) = (s, t)[αi : Gr(Ω, η, ψi)]. When the instance argument is a bound variable, the grounding is defined as

Gr(Ω, η, (s, t)[αi : ψi]) ≡ ∨_{c ∈ range(t, η)} (s, c)[αi : Gr(Ω \ {t}, η[c/t], ψi[c/t])].

In the above definition, ψ[c/t] represents the tree obtained by replacing every occurrence of t in the tree with c. The disjunct (s, c)[αi : Gr(Ω \ {t}, η[c/t], ψi[c/t])] in the above definition is denoted φ(s,c) when the lifted explanation graph is clear from the context. The grounding of the lifted explanation graph in Figure 1(c), when there are three coins, is shown in Figure 3. Note that in the figure the disjuncts corresponding to the grounding of the left subtree of the lifted explanation graph have been combined using the ∨ operation. In a similar way, the two disjuncts corresponding to the root would also be combined. Further note that the third disjunct, corresponding to grounding X with 3, is omitted because it has all 0 children and would therefore be collapsed into a single 0 node.
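The grounding semantics can be sanity-checked by brute force. For the coin example of Figure 1(c), the closed graph ({X, Y} : X < Y, …) denotes the ground disjunction over all pairs c < d of "toss[c] = h and toss[d] = h", i.e., "at least two heads". The sketch below (with assumed parameters n = 3 coins and P(h) = 0.5) sums the probability of every world satisfying that disjunction:

```python
from itertools import product

# Brute-force check of the grounding semantics for Figure 1(c):
# the ground tree is the disjunction OR_{c<d} (toss[c]=h AND toss[d]=h).
# Assumed parameters: n = 3 coins, P(h) = 0.5 per coin.
n, p_h = 3, 0.5

prob = 0.0
for world in product("ht", repeat=n):
    w = 1.0
    for outcome in world:
        w *= p_h if outcome == "h" else (1.0 - p_h)
    # Does some ground disjunct (pair c < d, both heads) hold in this world?
    if any(world[c] == "h" and world[d] == "h"
           for c in range(n) for d in range(c + 1, n)):
        prob += w
# For three fair coins, "at least two heads" has probability 4/8 = 0.5.
```

This enumerates all 2^n worlds, which is exactly the blow-up that the lifted recurrences of Section 4 avoid.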


[Figure 3: the grounding of the lifted explanation graph of Figure 1(c) for three coins; a disjunction of ground trees rooted at (toss, 1) and (toss, 2), with edges labeled h and t leading to nodes (toss, 2), (toss, 3), and to 0/1 leaves.]

Figure 3: Lifted expl. graph grounding example

3.4 Operations on Lifted Explanation Graphs

And/Or Operations. Let (Ω : η, ψ) and (Ω′ : η′, ψ′) be two lifted explanation graphs. We now define "∧" and "∨" operations on them. The "∧" and "∨" operations are carried out in two steps. First, the constraint formulas of the inputs are combined. The key issue in defining these operations is to ensure the right order among the graph nodes (see criterion 3 of Def. 7). However, the free variables in the operands may have no known order among them. Since an arbitrary order cannot be imposed, the operations are defined in a relational, rather than functional, form. We use the notation (Ω : η, ψ) ⊕ (Ω′ : η′, ψ′) → (Ω″ : η″, ψ″) to denote that (Ω″ : η″, ψ″) is a result of (Ω : η, ψ) ⊕ (Ω′ : η′, ψ′). When an operation returns multiple answers due to ambiguity in the order of free variables, the answers that are inconsistent with the final order are discarded. We assume that the variables in the two lifted explanation graphs are standardized apart such that the bound variables of (Ω : η, ψ) and (Ω′ : η′, ψ′) are all distinct, and different from the free variables of (Ω : η, ψ) and (Ω′ : η′, ψ′). Let ψ = (s, t)[αi : ψi] and ψ′ = (s′, t′)[α′i : ψ′i].

Combining constraint formulae

Q(Ω, η) ∧ Q(Ω′, η′) is unsatisfiable. Then the orders among free variables in η and η′ are incompatible.

• The ∧ operation is defined as (Ω : η, ψ) ∧ (Ω′ : η′, ψ′) → (∅ : false, 0).

• The ∨ operation simply returns the two inputs as outputs:

(Ω : η, ψ) ∨ (Ω′ : η′, ψ′) → (Ω : η, ψ)
(Ω : η, ψ) ∨ (Ω′ : η′, ψ′) → (Ω′ : η′, ψ′)

Q(Ω, η) ∧ Q(Ω′, η′) is satisfiable. Then the orders among free variables in η and η′ are compatible.

• The ∧ operation is defined as (Ω : η, ψ) ∧ (Ω′ : η′, ψ′) → (Ω ∪ Ω′ : η ∧ η′, ψ ∧ ψ′).

• The ∨ operation is defined as

(Ω : η, ψ) ∨ (Ω′ : η′, ψ′) → (Ω ∪ Ω′ : η ∧ ¬η′, ψ)
(Ω : η, ψ) ∨ (Ω′ : η′, ψ′) → (Ω ∪ Ω′ : η′ ∧ ¬η, ψ′)
(Ω : η, ψ) ∨ (Ω′ : η′, ψ′) → (Ω ∪ Ω′ : η ∧ η′, ψ ∨ ψ′)


Combining DAGs. We now describe the ∧ and ∨ operations on the two DAGs ψ and ψ′ in the presence of a single constraint formula. The general form of the operation is (Ω : η, ψ ⊕ ψ′).

Base cases: The base cases are as follows (symmetric base cases are defined analogously).

(Ω : η, 0 ∨ ψ′) → (Ω : η, ψ′)
(Ω : η, 1 ∨ ψ′) → (Ω : η, 1)
(Ω : η, 0 ∧ ψ′) → (Ω : η, 0)
(Ω : η, 1 ∧ ψ′) → (Ω : η, ψ′)

Recursion: When the base cases do not apply, we try to compare the roots of ψ and ψ′. The root nodes are compared as follows: we say (s, t) = (s′, t′) if η |= t = t′ and s = s′; otherwise (s, t) < (s′, t′) (analogously (s′, t′) < (s, t)) if η |= t < t′, or if η |= t = t′ and (s, c) ≺ (s′, c) for any ground term c. If neither of these two relations holds, then the roots are not comparable, denoted (s, t) ≁ (s′, t′).

a. (s, t) < (s′, t′): (Ω : η, ψ ⊕ ψ′) → (Ω : η, (s, t)[αi : ψi ⊕ ψ′])

b. (s′, t′) < (s, t): (Ω : η, ψ ⊕ ψ′) → (Ω : η, (s′, t′)[α′i : ψ ⊕ ψ′i])

c. (s, t) = (s′, t′): (Ω : η, ψ ⊕ ψ′) → (Ω : η, (s, t)[αi : ψi ⊕ ψ′i])

d. (s, t) ≁ (s′, t′)

i. t is a free variable or a constant, and t′ is a free variable (the symmetric case is analogous).

(Ω : η, ψ ⊕ ψ′) → (Ω : η ∧ t < t′, ψ ⊕ ψ′)
(Ω : η, ψ ⊕ ψ′) → (Ω : η ∧ t = t′, ψ ⊕ ψ′)
(Ω : η, ψ ⊕ ψ′) → (Ω : η ∧ t′ < t, ψ ⊕ ψ′)

ii. t is a free variable or a constant, and t′ is a bound variable (the symmetric case is analogous).

(Ω : η, ψ ⊕ ψ′) → (Ω : η ∧ t < t′, ψ ⊕ ψ′)
∨ (Ω : η ∧ t = t′, ψ ⊕ ψ′)
∨ (Ω : η ∧ t′ < t, ψ ⊕ ψ′)

Note that in the above definition, all three lifted explanation graphs use the same variable name for the bound variable t′. Lifted explanation graphs can easily be standardized apart on the fly, and henceforth we assume that this operation is applied as and when required.


iii. t and t′ are bound variables. Let range(t, η) = [l1, u1] and range(t′, η) = [l2, u2]. We can conclude that range(t, η) and range(t′, η) are overlapping; otherwise (s, t) and (s′, t′) could have been ordered. Without loss of generality, we assume that l1 ≤ l2 and consider the various cases of overlap as follows. When l1 = l2 and u1 = u2:

(Ω : η, ψ ⊕ ψ′) → (Ω ∪ {t″} : η ∧ l1 − 1 < t″ ∧ t″ − 1 < u1 ∧ t″ < t ∧ t″ < t′,
    (s, t″)[αi : (ψi[t″/t] ⊕ ψ′i[t″/t′]) ∨ (ψi[t″/t] ⊕ ψ′) ∨ (ψ′i[t″/t′] ⊕ ψ)])

When ⊕ is ∨, the result can be simplified as

(Ω : η, ψ ⊕ ψ′) → (Ω ∪ {t″} \ {t, t′} : η[t″/t, t″/t′], (s, t″)[αi : ψi[t″/t] ∨ ψ′i[t″/t′]])

When l1 = l2 and u1 < u2, the result is

(Ω : η ∧ t′ − 1 < u1, ψ ⊕ ψ′) ∨ (Ω : η ∧ u1 < t′, ψ ⊕ ψ′)

When l1 = l2 and u2 < u1, the result is

(Ω : η ∧ t = t′, ψ ⊕ ψ′) ∨ (Ω : η ∧ u2 < t, ψ ⊕ ψ′)

When l1 < l2 and u1 = u2, the result is

(Ω : η ∧ t = t′, ψ ⊕ ψ′) ∨ (Ω : η ∧ t < l2, ψ ⊕ ψ′)

When l1 < l2 and u1 < u2, the result is

(Ω : η ∧ u1 < t′, ψ ⊕ ψ′) ∨ (Ω : η ∧ t < l2 ∧ t′ − 1 < u1, ψ ⊕ ψ′) ∨ (Ω : η ∧ t = t′, ψ ⊕ ψ′)

When l1 < l2 and u2 < u1, the result is

(Ω : η ∧ u2 < t, ψ ⊕ ψ′) ∨ (Ω : η ∧ t < l2, ψ ⊕ ψ′) ∨ (Ω : η ∧ t = t′, ψ ⊕ ψ′)

Lemma 2 (Correctness of "∧" and "∨" operations). Let (Ω : η, ψ) and (Ω′ : η′, ψ′) be two lifted explanation graphs with free variables X1, X2, . . . , Xn. Let Σ be the set of all substitutions mapping each Xi to a value in its domain. Then, for every σ ∈ Σ and ⊕ ∈ {∧, ∨},

Gr(((Ω : η, ψ) ⊕ (Ω′ : η′, ψ′))σ) = Gr((Ω : η, ψ)σ) ⊕ Gr((Ω′ : η′, ψ′)σ)

Quantification.

Definition 11 (Quantification). The operation quantify((Ω : η, ψ), X) changes a free variable X ∈ vars(η) into a quantified variable. It is defined as

quantify((Ω : η, ψ), X) = (Ω ∪ {X} : η, ψ), if X ∈ vars(η)


Lemma 3 (Correctness of quantify). Let (Ω : η, ψ) be a lifted explanation graph, and let σ−X be a substitution mapping all the free variables in (Ω : η, ψ) except X to values in their domains. Let Σ be the set of mappings σ such that σ maps all free variables to values in their domains and is identical to σ−X at all variables except X. Then the following holds:

Gr(quantify((Ω : η, ψ), X)σ−X) = ∨_{σ ∈ Σ} Gr((Ω : η, ψ)σ)

Construction of Lifted Explanation Graphs. Lifted explanation graphs for a query are constructed by transforming the Param-Px program P into one that explicitly constructs a lifted explanation graph, following a procedure similar to the one outlined in Section 2 for constructing ground explanation graphs. The main difference is the use of existential quantification. Let A :− G be a program clause, and let vars(G) − vars(A) be the set of variables in G that are not in A. If any of these variables has a type, then the variable used as an instance argument in G is existentially quantified. Such clauses are then translated as head(A, Eh) :− exp(G, Eg), quantify(Eg, Vs, Eh), where Vs is the set of typed variables in vars(G) − vars(A). A minor difference is the treatment of constraints: exp is extended to atomic constraints ϕ such that exp(ϕ, E) binds E to (∅ : ϕ, 1).

We order the populations and map the elements of the populations to natural numbers as follows. The population that comes first in the order is mapped to natural numbers in the range 1..m, where m is the cardinality of this population. Any constants in this population are mapped to natural numbers at the low end of the range. The next population in the order is mapped to natural numbers starting from m + 1, and so on. Thus, each typed variable is assigned a domain of contiguous positive values.
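The mapping above can be sketched in a few lines of Python. The population names and cardinalities below are hypothetical; only the contiguous-block layout matters:

```python
# Sketch of the population-to-integer mapping: each population gets a
# contiguous block of positive integers, the first block starting at 1.

def number_populations(populations):
    """populations: ordered list of (name, cardinality) pairs.
    Returns {name: range} over contiguous positive integers."""
    mapping, start = {}, 1
    for name, cardinality in populations:
        mapping[name] = range(start, start + cardinality)
        start += cardinality
    return mapping

# Hypothetical populations: 100 dice followed by 3 coins.
doms = number_populations([("dice", 100), ("coins", 3)])
# doms["dice"] is 1..100, doms["coins"] is 101..103
```

A typed variable then simply ranges over the block assigned to its population, which is what makes interval constraints such as X < Y meaningful.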

The rest of the program transformation remains the same; the underlying graphs are constructed using the lifted operators. To illustrate some of the operations described in this section, we present another example Param-Px program (Figure 4) and show the construction of its lifted explanation graphs (Figures 5 and 6).

Dice Example. The listing in Figure 4 shows a Param-Px program where the query tests whether, on rolling a set of dice, we get at least two "ones" or two "twos". The lifted explanation graph for the first clause is obtained by first taking the conjunction of three lifted explanation graphs: one for the constraint X < Y and two corresponding to the msw goals, and then quantifying the variables X and Y. The lifted explanation graph for the second clause is constructed in a similar fashion. They are shown in Figure 5. Note that we omitted the edge labels to avoid clutter, as they are obvious. Next, the two lifted explanation graphs need to be combined by an ∨ operation. Let us denote the two lifted explanation graphs to be combined as (Ω : η, ψ) and (Ω′ : η′, ψ′). When the ∨ operation is performed, it is noticed that Q(Ω, η) ∧ Q(Ω′, η′) is satisfiable. In fact, Q(Ω, η) and Q(Ω′, η′) both evaluate to true. Therefore, only one result is returned:

(Ω ∪ Ω′ : η ∧ η′, ψ ∨ ψ′)

The operation to be performed is the one described in recursive case d, subcase iii, in the description above. Here we note that (roll, X) ≁ (roll, X′) and, further, the ranges of both instance variables are the same. Therefore the following simplified operation is performed:

(Ω : η, ψ ⊕ ψ′) → (Ω ∪ {X″} \ {X, X′} : η[X″/X, X″/X′], (s, X″)[αi : ψi[X″/X] ∨ ψ′i[X″/X′]])

The result is shown in Figure 6.


% Get two "ones" or two "twos"
q :-
    X in dice,
    msw(roll, X, 1),
    Y in dice,
    X < Y,
    msw(roll, Y, 1).

q :-
    X in dice,
    msw(roll, X, 2),
    Y in dice,
    X < Y,
    msw(roll, Y, 2).

% Cardinality of dice:
:- population(dice, 100).

% Distribution parameters:
:- set_sw(roll, categorical([1:1/6, 2:1/6, 3:1/6, 4:1/6, 5:1/6, 6:1/6])).

Figure 4: Rolling dice Param-Px example

[Figure 5 shows the two lifted explanation graphs: one with constraint ∃X∃Y. X < Y, rooted at (roll, X) with child (roll, Y), and one with constraint ∃X′∃Y′. X′ < Y′, rooted at (roll, X′) with child (roll, Y′); in each, the branch for the tested outcome leads toward a 1 leaf and all other outcome branches lead to 0 leaves.]

Figure 5: Lifted expl. graphs for clauses in dice example

[Figure 6 shows the combined graph, rooted at (roll, X″) with children (roll, Y) and (roll, Y′), under the constraint ∃X″∃Y∃Y′. X″ < Y ∧ X″ < Y′.]

Figure 6: Final lifted expl. graph for dice example


4 Lifted Inference using Lifted Explanations

In this section we describe a technique to compute answer probabilities in a lifted fashion from closed lifted explanation graphs. This technique works on a restricted class of lifted explanation graphs satisfying a property we call the frontier subsumption property.

Definition 12 (Frontier). Given a closed lifted explanation graph (Ω : η, ψ), the frontier of ψ w.r.t. X ∈ Ω, denoted frontierX(ψ), is the set of non-zero maximal subtrees of ψ which do not contain a node with X as the instance variable.

Analogous to the set representation of explanations described in Section 2.1, we consider set representations of lifted explanations, i.e., root-to-leaf paths in the DAGs of lifted explanation graphs that end in a "1" leaf.

We consider term substitutions that can be applied to lifted explanations. These substitutions replace a variable by a term and further apply standard rewriting rules, such as simplification of algebraic expressions. As before, we allow term mappings that specify a set of term substitutions.

Definition 13 (Frontier subsumption property). A closed lifted explanation graph (Ω : η, ψ) satisfies the frontier subsumption property w.r.t. X ∈ Ω if, under the term mappings σ1 = {X ± k + 1/Y | ⟨X ± k < Y⟩ ∈ η} and σ2 = {X + 1/X}, every tree φ ∈ frontierX(ψ) satisfies the following condition: for every lifted explanation E2 in ψ, there is a lifted explanation E1 in φ such that E1σ1 is a sub-explanation (i.e., subset) of E2σ2.

A lifted explanation graph is said to satisfy the frontier subsumption property if it is satisfied for each bound variable. This property can be checked in a bottom-up fashion for all bound variables in the graph. The tree obtained by replacing all subtrees in frontierX(ψ) by 1 in ψ is denoted ψX.

Examples of the frontier subsumption property. Note that the lifted explanation graph in Figure 1(c) satisfies the frontier subsumption property. The frontier w.r.t. the instance variable X contains the left subtree rooted at (toss, Y). The single lifted explanation in ψ is {toss[X] = h, toss[Y] = h}. The single explanation in the frontier tree is {toss[Y] = h}. Under the mappings σ1 = {X + 1/Y} and σ2 = {X + 1/X}, we can see that {toss[Y] = h}σ1 is a sub-explanation of {toss[X] = h, toss[Y] = h}σ2. This property is not satisfied, however, for the dice example. The frontier w.r.t. X″ (see Figure 6) contains the two subtrees rooted at (roll, Y) and (roll, Y′). Now there are two lifted explanations in ψ: {roll[X″] = 1, roll[Y] = 1} and {roll[X″] = 2, roll[Y′] = 2}. Let us consider the first tree in the frontier. It contains a single lifted explanation, {roll[Y] = 1}. Under the mappings σ1 = {X″ + 1/Y} and σ2 = {X″ + 1/X″}, {roll[Y] = 1}σ1 is not a sub-explanation of {roll[X″] = 2, roll[Y′] = 2}σ2.
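The two checks above can be mechanized directly. The sketch below is an assumption-laden encoding (atoms as tuples, terms as variable-plus-offset pairs), not the authors' implementation; it reproduces the positive coin-example check and the failing dice-example check:

```python
# Frontier-subsumption check on explanation sets. An atom is
# (switch, term, outcome); a term is (variable, offset), so ("X", 1)
# stands for X+1. A term mapping sends a variable to a term.

def apply_mapping(mapping, explanation):
    out = set()
    for switch, (var, off), outcome in explanation:
        var2, off2 = mapping.get(var, (var, 0))
        out.add((switch, (var2, off + off2), outcome))
    return out

def subsumes(e1, e2, sigma1, sigma2):
    """Is e1*sigma1 a sub-explanation (i.e., subset) of e2*sigma2?"""
    return apply_mapping(sigma1, e1) <= apply_mapping(sigma2, e2)

# Coin example (Fig. 1(c)): {toss[Y]=h}s1 is inside {toss[X]=h, toss[Y]=h}s2.
e_psi = {("toss", ("X", 0), "h"), ("toss", ("Y", 0), "h")}
e_frontier = {("toss", ("Y", 0), "h")}
ok = subsumes(e_frontier, e_psi, {"Y": ("X", 1)}, {"X": ("X", 1)})

# Dice example (Fig. 6): {roll[Y]=1}s1 is NOT inside
# {roll[X'']=2, roll[Y']=2}s2, so the property fails.
e2 = {("roll", ("X''", 0), 2), ("roll", ("Y'", 0), 2)}
bad = subsumes({("roll", ("Y", 0), 1)}, e2,
               {"Y": ("X''", 1)}, {"X''": ("X''", 1)})
```

Here `ok` comes out true and `bad` false, matching the analysis of the two examples.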

For closed lifted explanation graphs satisfying the frontier subsumption property, the probability of query answers can be computed using the following set of recurrences. With each subtree ψ = (s, t)[αi : ψi] of the DAG of the lifted explanation graph, we associate the function f(σ, ψ), where σ is a (possibly incomplete) mapping of variables in Ω to values in their domains.

Definition 14 (Probability recurrences). Given a closed lifted explanation graph (Ω : η, ψ), we define f(σ, ψ) (as well as g(σ, ψ) and h(σ, ψ) wherever applicable) for a partial mapping σ of variables in Ω to values in their domains, based on the structure of ψ. As before, ψ = (s, t)[αi : ψi].


Case 1: ψ is a 0 leaf node. Then f(σ, 0) = 0.

Case 2: ψ is a 1 leaf node. Then f(σ, 1) = 1 if [[ησ]] ≠ ∅, and 0 otherwise.

Case 3: tσ is a constant. Then f(σ, ψ) = ∑_{αi ∈ Ds} πs(αi) · f(σ, ψi) if [[ησ]] ≠ ∅, and 0 otherwise.

Case 4: tσ ∈ Ω, and range(t, ησ) = (l, u). Then

f(σ, ψ) = h(σ[l/t], ψ) if [[ησ]] ≠ ∅, and 0 otherwise

h(σ[c/t], ψ) = g(σ[c/t], ψ) + ((1 − P(ψt)) × h(σ[c+1/t], ψ)), if c < u
h(σ[c/t], ψ) = g(σ[c/t], ψ), if c = u

g(σ, ψ) = ∑_{αi ∈ Ds} πs(αi) · f(σ, ψi) if [[ησ]] ≠ ∅, and 0 otherwise

In the above definition, σ[c/t] refers to a new partial mapping obtained by augmenting σ with the substitution [c/t], and P(ψt) is the sum of the probabilities of all branches leading to a 1 leaf in ψt. The functions f, g and h defined above can be readily specialized for each ψ. Moreover, the parameter σ can be replaced by the tuple of values actually used by a function. These rewriting steps yield recurrences such as those shown in Fig. 2. Note that P(ψt) can be computed using recurrences as well (shown as f in Fig. 2).
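As a worked illustration, the recurrences of Def. 14 can be specialized by hand to the coin graph of Fig. 1(c), which asks for at least two heads among n coins. The sketch below assumes n coins with P(h) = p; the inner node (toss, Y) has the closed solution 1 − (1 − p)^(n − c), and the outer h-recurrence is evaluated with memoization, as suggested by the dynamic-programming remark that follows:

```python
from functools import lru_cache

# Recurrences of Def. 14 specialized to the graph of Fig. 1(c):
# (toss, X)[h : (toss, Y)[h : 1, t : 0], t : 0] under the constraint X < Y.
# Assumed parameters: n coins, P(h) = p.

def prob_two_heads(n, p):
    q = 1.0 - p

    def f_inner(c):
        # Inner node (toss, Y), range(Y) = [c+1, n]: probability that
        # some later toss is heads; its h-recurrence sums to 1 - q**(n-c).
        return 1.0 - q ** (n - c)

    @lru_cache(maxsize=None)
    def h(c):
        # g(c) = p * f_inner(c); P(psi^X) = p (frontier replaced by 1).
        g = p * f_inner(c)
        return g if c == n else g + q * h(c + 1)   # h(c) = g(c) + (1-P)h(c+1)

    return h(1)   # f(sigma, psi) = h(sigma[l/X], psi) with l = 1
```

The result agrees with the direct combinatorial value 1 − q^n − n·p·q^(n−1), while the recurrence itself runs in O(n) time instead of enumerating 2^n worlds.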

Definition 15 (Probability of Lifted Explanation Graph). Let (Ω : η, ψ) be a closed lifted explanation graph. Then the probability of explanations represented by the graph, prob((Ω : η, ψ)), is the value of f({}, ψ).

Theorem 4 (Correctness of Lifted Inference). Let (Ω : η, ψ) be a closed lifted explanation graph, and let φ = Gr((Ω : η, ψ)) be the corresponding ground explanation graph. Then prob((Ω : η, ψ)) = prob(φ).

Given a closed lifted explanation graph, let k be the maximum number of instance variables along any root-to-leaf path. Then the function f(σ, ψ) for the leaf will have to be computed for each mapping of the k variables. Each recurrence equation itself is either of constant size or bounded by the number of children of a node. Using dynamic programming (possibly implemented via tabling), a solution to the recurrence equations can be computed in polynomial time.

Theorem 5 (Efficiency of Lifted Inference). Let ψ be a closed lifted explanation graph, n the size of the largest population, and k the largest number of instance variables along any root-to-leaf path in ψ. Then f({}, ψ) can be computed in O(|ψ| × n^k) time.


There are two sources of further optimization in the generation and evaluation of recurrences. First, certain recurrences may be transformed into closed-form formulae, which can be evaluated more efficiently. For instance, the closed-form formula for h(σ, ψ) for the subtree rooted at the node (toss, Y) in Fig. 1(c) can be evaluated in O(log(n)) time, while a naive evaluation of the recurrence takes O(n) time. Second, certain functions f(σ, ψ) need not be evaluated for every mapping σ because they may be independent of certain variables. For example, leaves are always independent of the mapping σ.
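The first optimization can be illustrated on that (toss, Y) subtree. Its h-recurrence is h(c) = p + (1 − p)·h(c + 1) with h(u) = p, whose closed form is 1 − (1 − p)^(u − c + 1); the power can be computed by repeated squaring, which is where the O(log n) bound comes from. The parameter values below are illustrative assumptions:

```python
# Naive vs. closed-form evaluation of the h-recurrence for the subtree
# rooted at (toss, Y):  h(c) = p + (1-p)*h(c+1),  h(u) = p.

def h_naive(c, u, p):
    """O(u - c) evaluation by unrolling the recurrence."""
    acc = p                      # h(u) = p
    for _ in range(u - c):
        acc = p + (1.0 - p) * acc
    return acc

def h_closed(c, u, p):
    """Closed form 1 - (1-p)**(u-c+1); the exponentiation can be done
    with O(log(u-c)) multiplications by repeated squaring."""
    return 1.0 - (1.0 - p) ** (u - c + 1)
```

For u − c + 1 = 3 tosses with p = 0.5 both give 1 − 0.5³ = 0.875, i.e., the probability that at least one of the remaining tosses is heads.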

Other Examples. There are a number of simple probabilistic models that cannot be tackled by other lifted inference techniques but can be encoded in Param-Px and solved using our technique. For one such example, consider an urn with n balls, where the color of each ball is given by a distribution. Determining the probability that there are at least two green balls is easy to phrase as a directed first-order graphical model. However, lifted inference over such models can no longer be applied if we need to determine the probability of at least two green or two red balls. The probability computation for one of these events can be viewed as a generalization of noisy-OR probability computation; however, dealing with the union requires handling the intersection of the two events, due to which the O(log(N)) time computation is no longer feasible.

For a more complex example, we use an instance of a collective graphical model (Sheldon and Dietterich 2011). In particular, consider a system of n agents where each agent moves between various states in a stochastic manner. Consider a query to evaluate whether there are at least k agents in a given state s at a given time t. Note that we cannot compile a model of this system into clausal form without knowing the query. This system can be represented as a PRISM/Px program by modeling each agent's evolution as a Markov model. The size of the lifted explanation graph, and the number of recurrences for this query, is O(k·t). The recurrences are evaluated along three dimensions: time, total number of agents, and number of agents in state s, resulting in a time complexity of O(n·k·t).

5 Related Work and Discussion

First-order graphical models (Poole 2003; Braz et al. 2005) are compact representations of propositional graphical models over populations. The key concepts in this field are those of parameterized random variables and parfactors. A parameterized random variable stands for a population of i.i.d. propositional random variables (obtained by grounding the logical variables). Parfactors are factors (potential functions) on parameterized random variables. By allowing a large number of identical factors to be specified in a first-order fashion, first-order graphical models provide a representation that is independent of the population size. A key problem, then, is to perform lifted probabilistic inference over these models, i.e., without grounding the factors unnecessarily. The earliest such technique was inversion elimination, due to Poole (2003). When summing out a parameterized random variable (i.e., all its groundings), it is observed that if all the logical variables in a parfactor are contained in the parameterized random variable, it can be summed out without grounding the parfactor.

The idea of inversion elimination, though powerful, exploits only one of the many forms of symmetry present in first-order graphical models. Another kind of symmetry present in such models is that the values of an intermediate factor may depend on the histogram of propositional random


variable outcomes, rather than their exact assignment. This symmetry is exploited by counting elimination (Braz et al. 2005) and elimination by counting formulas (Milch et al. 2008).

Van den Broeck et al. (2011) presented a form of lifted inference that uses constrained CNF theories with positive and negative weight functions over predicates as input. Here the task of probabilistic inference is transformed into one of weighted model counting. To do the latter, the CNF theory is compiled into a structure known as first-order deterministic decomposable negation normal form. The compiled representation allows lifted inference by avoiding grounding of the input theory. This technique is applicable as long as the model can be formulated as a constrained CNF theory.

Bellodi et al. (2014) present another approach to lifted inference for probabilistic logic programs. The idea is to convert a ProbLog program to a parfactor representation and use a modified version of the generalized counting first-order variable elimination algorithm (Taghipour et al. 2013) to perform lifted inference. Problems where the model size depends on the query, such as models with temporal aspects, are difficult to solve with the knowledge compilation approach.

In this paper, we presented a technique for lifted inference in probabilistic logic programs using lifted explanation graphs. This technique is a natural generalization of inference techniques based on ground explanation graphs, and follows the two-step approach: generation of an explanation graph, and a subsequent traversal to compute probabilities. While the size of the lifted explanation graph is often independent of the population, computation of probabilities may take time that is polynomial in the size of the population. A more sophisticated approach to computing probabilities from lifted explanation graphs, by generating closed-form formulae where possible, will enable more efficient inference. Another direction of research would be to generate hints for lifted inference based on program constructs such as aggregation operators. Finally, our future work is focused on performing lifted inference over probabilistic logic programs that represent undirected and discriminative models.

Acknowledgments. This work was supported in part by NSF grants IIS 1447549 and CNS 1405641. We thank Andrey Gorlin for discussions and review of this work.

A Proofs

Lemma 2 (Correctness of "∧" and "∨" operations). Let (Ω : η, ψ) and (Ω′ : η′, ψ′) be two lifted explanation graphs with free variables X1, X2, . . . , Xn. Let Σ be the set of all substitutions mapping each Xi to a value in its domain. Then, for every σ ∈ Σ and ⊕ ∈ {∧, ∨},

Gr(((Ω : η, ψ) ⊕ (Ω′ : η′, ψ′))σ) = Gr((Ω : η, ψ)σ) ⊕ Gr((Ω′ : η′, ψ′)σ)

Proof. The proof is by structural induction.

Case 1: Q(Ω, η) ∧ Q(Ω′, η′) is unsatisfiable.

Case 1.1: ∧ operation. That Q(Ω, η) ∧ Q(Ω′, η′) is unsatisfiable implies that, for each σ ∈ Σ, one or both of Q(Ω, η)σ and Q(Ω′, η′)σ are unsatisfiable, i.e., one or both of ∃Ω.ησ and ∃Ω′.η′σ are unsatisfiable. By Definition 9, one or both of (Ω : η, ψ)σ and (Ω′ : η′, ψ′)σ must be (∅ : false, 0). By Definition 10, we know that Gr((∅ : false, 0)) = 0. Therefore Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ) = 0. The ∧ operation in this case is defined as

(Ω : η, ψ) ∧ (Ω′ : η′, ψ′) → (∅ : false, 0)

Therefore, Gr(((Ω : η, ψ) ∧ (Ω′ : η′, ψ′))σ) = Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ).

Case 1.2: ∨ operation. Assume without loss of generality that (Ω′ : η′, ψ′)σ = (∅ : false, 0). Then, by Definition 10, Gr((Ω : η, ψ)σ) ∨ Gr((Ω′ : η′, ψ′)σ) = Gr((Ω : η, ψ)σ). Since the definition of the ∨ operation in this case is to backtrack and return both (Ω : η, ψ) and (Ω′ : η′, ψ′), under the substitution σ, (Ω′ : η′, ψ′) will be discarded. Therefore, Gr(((Ω : η, ψ) ∨ (Ω′ : η′, ψ′))σ) = Gr((Ω : η, ψ)σ) ∨ Gr((Ω′ : η′, ψ′)σ).

Case 2: Q(Ω, η) ∧ Q(Ω′, η′) is satisfiable.

Case 2.1: ∧ operation. The ∧ operation is defined as (Ω : η, ψ) ∧ (Ω′ : η′, ψ′) → (Ω ∪ Ω′ : η ∧ η′, ψ ∧ ψ′).

Case 2.1.1: (Q(Ω, η) ∧ Q(Ω′, η′))σ is unsatisfiable. This implies that at least one of Q(Ω, η)σ and Q(Ω′, η′)σ is unsatisfiable, i.e., at least one of ∃Ω.ησ and ∃Ω′.η′σ is unsatisfiable. By Definition 9, at least one of (Ω : η, ψ)σ and (Ω′ : η′, ψ′)σ is equal to (∅ : false, 0). By Definition 10, Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ) = 0. Further, if one of ∃Ω.ησ and ∃Ω′.η′σ is unsatisfiable, then ∃(Ω ∪ Ω′).(η ∧ η′)σ is also unsatisfiable. Therefore, by Definition 9, Gr((Ω ∪ Ω′ : η ∧ η′, ψ ∧ ψ′)σ) = 0. Therefore Gr(((Ω : η, ψ) ∧ (Ω′ : η′, ψ′))σ) = Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ).

Case 2.1.2: (Q(Ω, η) ∧ Q(Ω′, η′))σ is satisfiable.

Case 2.1.2.1: ψ = 0 (analogously ψ′ = 0). By Definition 10, Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ) = 0. Based on the definition of the ∧ operation, (Ω ∪ Ω′ : η ∧ η′, ψ ∧ ψ′)σ = (Ω ∪ Ω′ : η ∧ η′, 0)σ. By Definition 10, Gr((Ω ∪ Ω′ : η ∧ η′, 0)σ) = 0. Therefore, Gr(((Ω : η, ψ) ∧ (Ω′ : η′, ψ′))σ) = Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ).

Case 2.1.2.2: ψ = 1 (analogously ψ′ = 1). By Definition 10, Gr((Ω : η, ψ)σ) = 1. Therefore Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ) = Gr((Ω′ : η′, ψ′)σ). Based on the definition of the ∧ operation, (Ω ∪ Ω′ : η ∧ η′, ψ ∧ ψ′) = (Ω ∪ Ω′ : η ∧ η′, ψ′). By Definition 9, (Ω ∪ Ω′ : η ∧ η′, ψ′)σ = (Ω ∪ Ω′ : (η ∧ η′)σ, ψ′σ) and (Ω′ : η′, ψ′)σ = (Ω′ : η′σ, ψ′σ). We can claim that Gr((Ω′ : η′σ, ψ′σ)) = Gr((Ω ∪ Ω′ : (η ∧ η′)σ, ψ′σ)) because range(t, η′σ) = range(t, (η ∧ η′)σ) for any t ∈ Ω′: there are no variables in common between ησ and η′σ, and ησ ∧ η′σ is satisfiable by the assumptions.

Case 2.1.2.3: Neither ψ nor ψ′ is a leaf node and (s, t) < (s′, t′) (analogously (s′, t′) < (s, t)). Since (Q(Ω, η) ∧ Q(Ω′, η′))σ is satisfiable, we can conclude that ησ and η′σ are satisfiable. By Definition 9, Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ) = Gr(Ω : ησ, ψσ) ∧ Gr(Ω′ : η′σ, ψ′σ). Let us consider the case where t is either a constant or a free variable. Since (s, t) < (s′, t′), it implies that (s, tσ) ≺ (s′, t′σ). By Definition 10, Gr(Ω : ησ, ψσ) ∧ Gr(Ω′ : η′σ, ψ′σ) = (s, tσ)[αi : Gr(Ω, ησ, ψiσ) ∧ Gr(Ω′, η′σ, ψ′σ)]. Based on the definition of the ∧ operation, (Ω ∪ Ω′ : η ∧ η′, ψ ∧ ψ′)σ = (Ω ∪ Ω′ : η ∧ η′, (s, t)[αi : ψi ∧ ψ′])σ. Therefore, Gr((Ω ∪ Ω′ : η ∧ η′, ψ ∧ ψ′)σ) = Gr(Ω ∪ Ω′, (η ∧ η′)σ, (s, t)[αi : ψi ∧ ψ′]σ) = (s, tσ)[αi : Gr(Ω ∪ Ω′, (η ∧ η′)σ, (ψi ∧ ψ′)σ)]. By the inductive hypothesis, Gr(Ω, ησ, ψiσ) ∧ Gr(Ω′, η′σ, ψ′σ) = Gr(Ω ∪ Ω′, (η ∧ η′)σ, (ψi ∧ ψ′)σ). Now consider the case when t ∈ Ω. By Definition 10,

Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ) = Gr(Ω, ησ, ψσ) ∧ Gr(Ω′, η′σ, ψ′σ) = ∨_{c ∈ range(t, ησ)} (s, c)[αi : Gr(Ω \ {t}, ησ[c/t], ψiσ[c/t]) ∧ Gr(Ω′, η′σ, ψ′σ)].

Similarly, Gr(Ω ∪ Ω′, (η ∧ η′)σ, (s, t)[αi : ψi ∧ ψ′]σ) = ∨_{c ∈ range(t, (η ∧ η′)σ)} (s, c)[αi : Gr((Ω ∪ Ω′) \ {t}, (η ∧ η′)σ[c/t], (ψi ∧ ψ′)σ[c/t])]. But range(t, ησ) = range(t, (η ∧ η′)σ), and by the inductive hypothesis, Gr(Ω \ {t}, ησ[c/t], ψiσ[c/t]) ∧ Gr(Ω′, η′σ, ψ′σ) = Gr((Ω ∪ Ω′) \ {t}, (η ∧ η′)σ[c/t], (ψi ∧ ψ′)σ[c/t]).

Case 2.1.2.4: Neither ψ nor ψ′ is a leaf node and (s, t) = (s′, t′). Since the variables in the lifted explanation graphs are standardized apart, this implies that neither t nor t′ is a bound variable. Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ) = (s, tσ)[αi : Gr(Ω, ησ, ψiσ) ∧ Gr(Ω′, η′σ, ψ′iσ)]. Similarly, Gr(Ω ∪ Ω′, (η ∧ η′)σ, (s, t)[αi : ψi ∧ ψ′i]σ) = (s, tσ)[αi : Gr(Ω ∪ Ω′, (η ∧ η′)σ, (ψi ∧ ψ′i)σ)]. By the inductive hypothesis, Gr(Ω, ησ, ψiσ) ∧ Gr(Ω′, η′σ, ψ′iσ) = Gr(Ω ∪ Ω′, (η ∧ η′)σ, (ψi ∧ ψ′i)σ).

Case 2.1.2.5: Neither ψ nor ψ′ is a leaf node, (s, t) ≁ (s′, t′), and t is a free variable or a constant and t′ is a free variable. Consider the case where t and t′ are free variables, or t is a constant and t′ is a free variable. Based on σ, exactly one of (s, tσ) < (s′, t′σ), (s, tσ) = (s′, t′σ), and (s′, t′σ) < (s, tσ) will hold. According to the definition of the ∧ operation, three lifted explanation graphs are returned: (Ω ∪ Ω′ : η ∧ η′ ∧ t < t′, ψ ∧ ψ′), (Ω ∪ Ω′ : η ∧ η′ ∧ t = t′, ψ ∧ ψ′), and (Ω ∪ Ω′ : η ∧ η′ ∧ t′ < t, ψ ∧ ψ′). Under the substitution σ, only one will be retained, and the proof then proceeds as in the two preceding cases.

Case 2.1.2.6: Neither ψ nor ψ′ is a leaf node, (s, t) ≁ (s′, t′), and t is a free variable or a constant and t′ is a bound variable. Consider Gr((Ω : η, ψ)σ) ∧ Gr((Ω′ : η′, ψ′)σ). This can be written as

(s, tσ)[αi : Gr(Ω, ησ, ψiσ)] ∧ ∨_{c ∈ range(t′, η′σ)} (s, c)[α′i : Gr(Ω′ \ {t′}, η′σ[c/t′], ψ′iσ[c/t′])].

By using continuity of the range, we can rewrite it as

(s, tσ)[αi : Gr(Ω, ησ, ψiσ)] ∧
( ∨_{c ∈ range(t′, (η′ ∧ t < t′)σ)} (s, c)[α′i : Gr(Ω′ \ {t′}, η′σ[c/t′], ψ′iσ[c/t′])]
∨ (s, tσ)[α′i : Gr(Ω′ \ {t′}, η′σ[tσ/t′], ψ′iσ[tσ/t′])]
∨ ∨_{c ∈ range(t′, (η′ ∧ t′ < t)σ)} (s, c)[α′i : Gr(Ω′ \ {t′}, η′σ[c/t′], ψ′iσ[c/t′])] )


By distributivity of ∧ over ∨ for ground explanation graphs, we can rewrite it as

      (s,tσ)[α_i : Gr(Ω,ησ,ψ_iσ)] ∧ ∨_{c∈range(t′,(η′∧t<t′)σ)} (s,c)[α′_i : Gr(Ω′\{t′}, η′σ[c/t′], ψ′_iσ[c/t′])]
    ∨ (s,tσ)[α_i : Gr(Ω,ησ,ψ_iσ)] ∧ (s,tσ)[α′_i : Gr(Ω′\{t′}, η′σ[tσ/t′], ψ′_iσ[tσ/t′])]
    ∨ (s,tσ)[α_i : Gr(Ω,ησ,ψ_iσ)] ∧ ∨_{c∈range(t′,(η′∧t′<t)σ)} (s,c)[α′_i : Gr(Ω′\{t′}, η′σ[c/t′], ψ′_iσ[c/t′])]

This can be re-written as

      Gr((Ω : η,ψ)σ) ∧ Gr((Ω′ : η′∧t<t′, ψ′)σ)
    ∨ Gr((Ω : η,ψ)σ) ∧ Gr((Ω′ : η′∧t=t′, ψ′)σ)
    ∨ Gr((Ω : η,ψ)σ) ∧ Gr((Ω′ : η′∧t′<t, ψ′)σ)

By the inductive hypothesis this is equal to

    ∨_{ϕ ∈ {t<t′, t=t′, t′<t}} Gr((Ω∪Ω′ : η∧η′∧ϕ, ψ∧ψ′)σ)
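The "continuity of range" step used above is, at bottom, a case split on how a bound variable's value compares with a pivot: a disjunction over a contiguous range decomposes into the parts below, at, and above the pivot. A minimal sketch of this identity, with a Boolean predicate standing in for the grounding of a subtree (the names are illustrative only):

```python
def disj_over(vals, p):
    # disjunction of p(c) over a range of instance values
    return any(p(c) for c in vals)

def split_at(vals, pivot, p):
    # split the range around the pivot: c < pivot, c == pivot, c > pivot
    below = disj_over([c for c in vals if c < pivot], p)
    at = pivot in vals and p(pivot)
    above = disj_over([c for c in vals if c > pivot], p)
    return below or at or above

# exhaustive check on a small range: the split never changes the disjunction
vals = list(range(1, 8))
for pivot in range(0, 10):
    for target in range(0, 10):
        p = lambda c, target=target: c == target
        assert disj_over(vals, p) == split_at(vals, pivot, p)
```

In the proof, the "at the pivot" disjunct is the tree whose instance argument is tσ; its constraint becomes unsatisfiable (and the disjunct vanishes) when tσ lies outside the range.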

Case 2.1.1.7: Neither ψ nor ψ′ is a leaf node, (s,t) ≁ (s′,t′), and t, t′ are bound variables. When l1 = l2 and u1 = u2, Gr((Ω : η,ψ)σ) ∧ Gr((Ω′ : η′,ψ′)σ) can be written as

    ∨_{c∈[l1,u1]} (s,c)[α_i : Gr((Ω,η,ψ_i)σ[c/t])] ∧ ∨_{c′∈[l2,u2]} (s,c′)[α_i : Gr((Ω′,η′,ψ′_i)σ[c′/t′])]

Let the sequence of positive integers in the interval [l1,u1] be ⟨l1 = k1, k2, ..., kn = u1⟩. Then the above expression can be re-written as

      ((s,k1)[α_i : Gr((Ω,η,ψ_i)σ[k1/t])] ∧ ∨_{c′∈[k1,kn]} (s,c′)[α_i : Gr((Ω′,η′,ψ′_i)σ[c′/t′])])
    ∨ ((s,k2)[α_i : Gr((Ω,η,ψ_i)σ[k2/t])] ∧ ∨_{c′∈[k1,kn]} (s,c′)[α_i : Gr((Ω′,η′,ψ′_i)σ[c′/t′])])
    ∨ ...
    ∨ ((s,kn)[α_i : Gr((Ω,η,ψ_i)σ[kn/t])] ∧ ∨_{c′∈[k1,kn]} (s,c′)[α_i : Gr((Ω′,η′,ψ′_i)σ[c′/t′])])


This can again be re-written as

      ∨_{c∈[k1,kn]} ((s,c)[α_i : Gr((Ω,η,ψ_i)σ[c/t])] ∧ (s,c)[α_i : Gr((Ω′,η′,ψ′_i)σ[c/t′])])
    ∨ ∨_{d∈[k1,kn−1]} ((s,d)[α_i : Gr((Ω,η,ψ_i)σ[d/t])] ∧ ∨_{e∈[d+1,kn]} (s,e)[α_i : Gr((Ω′,η′,ψ′_i)σ[e/t′])])
    ∨ ∨_{f∈[k1,kn−1]} ((s,f)[α_i : Gr((Ω′,η′,ψ′_i)σ[f/t′])] ∧ ∨_{g∈[f+1,kn]} (s,g)[α_i : Gr((Ω,η,ψ_i)σ[g/t])])

By moving the substitutions out we get the following equivalent expression.

      ∨_{c∈[k1,kn]} ((s,t)[α_i : Gr(Ω,η,ψ_i)]σ[c/t] ∧ (s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[c/t′])
    ∨ ∨_{d∈[k1,kn−1]} ((s,t)[α_i : Gr(Ω,η,ψ_i)]σ[d/t] ∧ ∨_{e∈[d+1,kn]} (s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[e/t′])
    ∨ ∨_{f∈[k1,kn−1]} ((s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[f/t′] ∧ ∨_{g∈[f+1,kn]} (s,t)[α_i : Gr(Ω,η,ψ_i)]σ[g/t])

Since [d+1,kn] = ∅ when d = kn and [f+1,kn] = ∅ when f = kn, the above expression can be re-written as follows.

      ∨_{c∈[k1,kn]} ((s,t)[α_i : Gr(Ω,η,ψ_i)]σ[c/t] ∧ (s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[c/t′])
    ∨ ∨_{d∈[k1,kn]} ((s,t)[α_i : Gr(Ω,η,ψ_i)]σ[d/t] ∧ ∨_{e∈[d+1,kn]} (s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[e/t′])
    ∨ ∨_{f∈[k1,kn]} ((s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[f/t′] ∧ ∨_{g∈[f+1,kn]} (s,t)[α_i : Gr(Ω,η,ψ_i)]σ[g/t])
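The decomposition carried out above is an instance of a general Boolean identity: a conjunction of two disjunctions over the same index range splits into a diagonal part (equal indices) plus two strictly ordered triangular parts. A small exhaustive check of the identity, with abstract Boolean tables standing in for the ground subtrees (nothing here is specific to explanation graphs):

```python
from itertools import product

def lhs(A, B, ks):
    # ( OR_c A[c] ) AND ( OR_c' B[c'] )
    return any(A[c] for c in ks) and any(B[c] for c in ks)

def rhs(A, B, ks):
    # diagonal part plus the two strictly ordered triangular parts
    diag = any(A[c] and B[c] for c in ks)
    upper = any(A[d] and any(B[e] for e in ks if e > d) for d in ks)
    lower = any(B[f] and any(A[g] for g in ks if g > f) for f in ks)
    return diag or upper or lower

# exhaustive check over all truth assignments on a 4-element range
ks = range(4)
for bits in product([False, True], repeat=2 * len(ks)):
    A = dict(zip(ks, bits[:4]))
    B = dict(zip(ks, bits[4:]))
    assert lhs(A, B, ks) == rhs(A, B, ks)
```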


First we transform the above expression by using c in place of d and f. Further, we perform a simple renaming operation with a new variable t′′ to get the following equivalent expression.

      ∨_{c∈[k1,kn]} ((s,t′′)[α_i : Gr(Ω,η,ψ_i)]σ[t′′/t][c/t′′] ∧ (s,t′′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[t′′/t′][c/t′′])
    ∨ ∨_{c∈[k1,kn]} ((s,t′′)[α_i : Gr(Ω,η,ψ_i)]σ[t′′/t][c/t′′] ∧ ∨_{e∈[c+1,kn]} (s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[e/t′])
    ∨ ∨_{c∈[k1,kn]} ((s,t′′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[t′′/t′][c/t′′] ∧ ∨_{g∈[c+1,kn]} (s,t)[α_i : Gr(Ω,η,ψ_i)]σ[g/t])

The above expression can now be simplified as follows.

    ∨_{c∈[k1,kn]} (s,t′′)[α_i :
        (Gr(Ω,η,ψ_i)σ[t′′/t][c/t′′] ∧ Gr(Ω′,η′,ψ′_i)σ[t′′/t′][c/t′′])
      ∨ (Gr(Ω,η,ψ_i)σ[t′′/t][c/t′′] ∧ ∨_{e∈[c+1,kn]} (s,t′)[α_i : Gr(Ω′,η′,ψ′_i)]σ[e/t′])
      ∨ (Gr(Ω′,η′,ψ′_i)σ[t′′/t′][c/t′′] ∧ ∨_{g∈[c+1,kn]} (s,t)[α_i : Gr(Ω,η,ψ_i)]σ[g/t]) ]

If the range of t′′ can be forced to be [l1,u1] and we introduce the additional constraints t′′ < t and t′′ < t′, the above grounding expression is equivalent to the grounding of the following lifted explanation graph:

    (Ω∪{t′′} : η ∧ l1−1 < t′′ ∧ t′′−1 < u1 ∧ t′′ < t ∧ t′′ < t′,
      (s,t′′)[α_i : (ψ_i[t′′/t] ⊕ ψ′_i[t′′/t′]) ∨ (ψ_i[t′′/t] ⊕ ψ′) ∨ (ψ′_i[t′′/t′] ⊕ ψ)])

Therefore the theorem is proved in this case. The proof for the remaining cases is analogous and straightforward: based on the values of l1, u1, l2 and u2 we obtain disjuncts in which the ranges of the root variables are either identical or non-overlapping, and both of these cases have already been proved.


Case 2.2: Operation ∨

Case 2.2.1: When Q(Ω,η) is not identical to Q(Ω′,η′). The definition of the ∨ operation returns several lifted explanation graphs. If Q(Ω,η)σ and Q(Ω′,η′)σ are both unsatisfiable, then Gr((Ω : η,ψ)σ) ∨ Gr((Ω′ : η′,ψ′)σ) = 0. Further, all the lifted explanation graphs returned by the definition of ∨ have unsatisfiable constraints; therefore, the theorem is proved in this case. When Q(Ω,η)σ is satisfiable but Q(Ω′,η′)σ is not (or vice versa), Gr((Ω : η,ψ)σ) ∨ Gr((Ω′ : η′,ψ′)σ) = Gr((Ω : η,ψ)σ). By the definition of the ∨ operation, only the first lifted explanation graph has a satisfiable constraint, so the theorem is proved. The same reasoning applies in the symmetric case.

Case 2.2.2: When Q(Ω,η) is identical to Q(Ω′,η′) and Q(Ω,η)σ is unsatisfiable, the proof of the theorem is trivial. So we consider the case where Q(Ω,η)σ and Q(Ω′,η′)σ are both satisfiable and Q(Ω,η) may or may not be identical to Q(Ω′,η′).

Case 2.2.2.1: When ψ = 0 (analogously ψ′ = 0). Here Gr((Ω : η,ψ)σ) ∨ Gr((Ω′ : η′,ψ′)σ) = Gr((Ω′ : η′,ψ′)σ). Similarly, Gr((Ω∪Ω′ : η∧η′, ψ′)σ) = Gr((Ω′ : η′,ψ′)σ). Therefore the theorem is proved.

Case 2.2.2.2: When ψ = 1 (analogously ψ′ = 1). Here Gr((Ω : η,ψ)σ) ∨ Gr((Ω′ : η′,ψ′)σ) = 1. Similarly, Gr((Ω∪Ω′ : η∧η′, 1)σ) = 1. Therefore the theorem is proved.

Case 2.2.2.3: Neither ψ nor ψ′ is a leaf node and (s,t) < (s′,t′) (analogously (s′,t′) < (s,t)). The proof is analogous to Case 2.1.1.3.

Case 2.2.2.4: Neither ψ nor ψ′ is a leaf node and (s,t) = (s′,t′). The proof is analogous to Case 2.1.1.4.

Case 2.2.2.5: Neither ψ nor ψ′ is a leaf node, (s,t) ≁ (s′,t′), t is a free variable or a constant, and t′ is a free variable. The proof is analogous to Case 2.1.1.5.

Case 2.2.2.6: Neither ψ nor ψ′ is a leaf node, (s,t) ≁ (s′,t′), t is a free variable or a constant, and t′ is a bound variable. The proof is analogous to Case 2.1.1.6.

Case 2.2.2.7: Neither ψ nor ψ′ is a leaf node, (s,t) ≁ (s′,t′), and t, t′ are bound variables. The proof is analogous to Case 2.1.1.7. However, we need to show that the simplified form is valid. Again consider the RHS

    ∨_{c∈[l1,u1]} (s,c)[α_i : Gr((Ω,η,ψ_i)σ[c/t])] ∧ ∨_{c′∈[l2,u2]} (s,c′)[α_i : Gr((Ω′,η′,ψ′_i)σ[c′/t′])]

We can perform two renaming operations and rewrite the above expression as

    ∨_{c∈[l1,u1]} (s,c)[α_i : Gr((Ω,η,ψ_i)σ[t′′/t][c/t′′])] ∧ ∨_{c′∈[l2,u2]} (s,c′)[α_i : Gr((Ω′,η′,ψ′_i)σ[t′′/t′][c′/t′′])]


It is straightforward to see that the above grounding is equivalent to

    Gr(((Ω∪Ω′∪{t′′})\{t,t′} : (η∧η′)[t′′/t, t′′/t′], (s,t′′)[α_i : ψ_i[t′′/t] ∨ ψ′_i[t′′/t′]]))
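The validity of this merged form reduces to the identity that two disjunctions over the same range combine index-wise: ∨_c A(c) ∨ ∨_c B(c) = ∨_c (A(c) ∨ B(c)). A quick exhaustive check, with Boolean tables standing in for the groundings of the two subtrees (names are illustrative only):

```python
from itertools import product

def merged(A, B, ks):
    # single root over t'': index-wise disjunction of the two subtrees
    return any(A[c] or B[c] for c in ks)

def separate(A, B, ks):
    # two separate roots, each ranging over the same interval
    return any(A[c] for c in ks) or any(B[c] for c in ks)

ks = range(3)
for bits in product([False, True], repeat=2 * len(ks)):
    A = dict(zip(ks, bits[:3]))
    B = dict(zip(ks, bits[3:]))
    assert merged(A, B, ks) == separate(A, B, ks)
```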

Lemma 3 (Correctness of quantify). Let (Ω : η,ψ) be a lifted explanation graph, and let σ−X be a substitution mapping all the free variables in (Ω : η,ψ) except X to values in their domains. Let Σ be the set of substitutions σ such that σ maps all free variables to values in their domains and is identical to σ−X at all variables except X. Then the following holds:

    Gr(quantify((Ω : η,ψ), X)σ−X) = ∨_{σ∈Σ} Gr((Ω : η,ψ)σ)

Proof. Let us first consider the case where the root of ψ is (s,X) for some switch s. Then quantify((Ω : η,ψ), X) = (Ω∪{X} : η, ψ). If ησ−X is unsatisfiable, then Gr(quantify((Ω : η,ψ), X)σ−X) = 0, and ∨_{σ∈Σ} Gr((Ω : η,ψ)σ) = 0, since ησ is also unsatisfiable for every σ ∈ Σ. On the other hand, if ησ−X is satisfiable,

    Gr(quantify((Ω : η,ψ), X)σ−X) = Gr((Ω∪{X} : η, ψ)σ−X)
                                  = Gr((Ω∪{X} : ησ−X, ψσ−X))
                                  = ∨_{c∈range(X,ησ−X)} (s,c)[α_i : Gr(Ω\{X}, ησ−X[c/X], ψ_iσ−X[c/X])]

Next,

    ∨_{σ∈Σ} Gr((Ω : η,ψ)σ) = ∨_{c∈σX(ησ−X)} (s,c)[α_i : Gr(Ω\{X}, ησ−X[c/X], ψ_iσ−X[c/X])]

By continuity of range, range(X, ησ−X) = σX(ησ−X). Therefore the theorem is proved in this case. Now consider the case where X does not occur in the root of the lifted explanation graph. Since the lifted explanation graph is well-structured, there is a subtree ψ′ of ψ such that the root of ψ′ contains X and all occurrences of X are within ψ′. If we remove the subtree ψ′ from ψ, then Gr(quantify((Ω : η,ψ), X)σ−X) = ∨_{σ∈Σ} Gr((Ω : η,ψ)σ), since all the disjuncts on the right-hand side are identical to each other and to the ground explanation tree on the left-hand side. Therefore, we need only show that the grounding of the subtree ψ′, when X is a quantified variable, is the same as ∨_{σ∈Σ} Gr(Ω, ησ, ψ′σ), which we have already shown.

Theorem 4 (Correctness of Lifted Inference). Let (Ω : η,ψ) be a closed lifted explanation graph, and let φ = Gr(Ω : η,ψ) be the corresponding ground explanation graph. Then prob((Ω : η,ψ)) = prob(φ).

Proof. Consider the following modification of the grounding algorithm for lifted explanation graphs. An extra argument is added to Gr(Ω,η,ψ) to make it Gr(Ω,η,ψ,σ). Whenever a variable is substituted by a value from its domain, σ is augmented to record the substitution. Further, the set Ω and the constraint formula η are not altered when recursively grounding subtrees; rather, ησ is tested for satisfiability, and tσ is tested for membership in Ω, to determine whether a node contains a bound variable. The grounding of a lifted explanation graph (Ω : η,ψ) is given by Gr(Ω,η,ψ,∅), where ∅ is the empty substitution. It is


easy to see that the ground explanation tree produced by this modified procedure is the same as that produced by the procedure given in Definition 10.

We will prove that if Gr(Ω,η,ψ,σ) = φ, then prob(φ) = f(σ,ψ). We prove this by structural induction on ψ.

Case 1: If ψ is a 0 leaf node, then Gr(Ω,η,0,σ) = 0. Therefore prob(φ) = f(σ,ψ) = 0.

Case 2: If ψ is a 1 leaf node and ησ is satisfiable, then Gr(Ω,η,ψ,σ) = 1 and prob(φ) = f(σ,ψ). On the other hand, if ησ is not satisfiable, then Gr(Ω,η,ψ,σ) = 0 and prob(φ) = f(σ,ψ).

Case 3: If ψ = (s,t)[α_i : ψ_i], tσ ∉ Ω, and ησ is satisfiable, then φ = (s,tσ)[α_i : Gr(Ω,η,ψ_i,σ)]. Therefore, prob(φ) = Σ_{α_i∈Ds} π_s(α_i) · prob(Gr(Ω,η,ψ_i,σ)). But f(σ,ψ) = Σ_{α_i∈Ds} π_s(α_i) · f(σ,ψ_i). Therefore, by the inductive hypothesis, the theorem is proved in this case. If ησ is unsatisfiable, then φ = 0, and therefore prob(φ) = f(σ,ψ).
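Case 3 is the usual recursive probability computation at a switch node: the node's probability is the π-weighted sum of its children's probabilities, since a switch's outcomes are mutually exclusive. A minimal sketch, assuming a node is a pair of a switch name and a dict from outcomes to subtrees (this representation is illustrative, not the paper's data structure):

```python
def prob(node, pi):
    # 0/1 leaves are the ground truth values of the explanation graph
    if node in (0, 1):
        return float(node)
    switch, children = node
    # weighted sum over the switch's mutually exclusive outcomes
    return sum(pi[switch][a] * prob(sub, pi) for a, sub in children.items())

pi = {"s": {"a": 0.3, "b": 0.7}, "r": {"a": 0.5, "b": 0.5}}
# explanations: s=a, or s=b and r=a
tree = ("s", {"a": 1, "b": ("r", {"a": 1, "b": 0})})
assert abs(prob(tree, pi) - (0.3 + 0.7 * 0.5)) < 1e-9
```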

Case 4: If ψ = (s,t)[α_i : ψ_i], tσ ∈ Ω, and ησ is satisfiable. In this case φ is defined as the disjunction of the ground trees (s,c)[α_i : Gr(Ω,η,ψ_i,σ[c/t])], where c ∈ range(t,ησ). Let us order the ground trees in increasing order of the value c. Given two trees φ(s,c) and φ(s,c+1) corresponding to values c, c+1 ∈ range(t,ησ), the ∨ operation on ground trees would recursively perform disjunction of φ(s,c+1) with the subtrees in the following set:

    Fr = {φ′ | φ′ is a maximal subtree of φ(s,c) without c as the instance argument of any node}

The set Fr contains ground trees corresponding to the trees in frontier_t(ψ) and possibly 0 leaves. Since we assumed that the frontier subsumption property is satisfied, for every φ′ ∈ Fr that is not a 0 leaf, every explanation in φ(s,c+1) contains a subexplanation in φ′. Therefore, φ(s,c) ∨ φ(s,c+1) can be computed equivalently as φ(s,c) ∨ (¬ψ_t[c/t] ∧ φ(s,c+1)). Since ¬ψ_t[c/t] contains only internal nodes with instance argument c, the explanations of ¬ψ_t[c/t] are independent of the explanations in φ(s,c+1). Further, the explanations of φ(s,c) are mutually exclusive with the explanations in ¬ψ_t[c/t]. Therefore the probability prob(φ(s,c) ∨ φ(s,c+1)) can be computed as prob(φ(s,c)) + (1 − prob(ψ_t[c/t])) · prob(φ(s,c+1)). The probability of the complete disjunction

∨_{c∈range(t,ησ)} (s,c)[α_i : Gr(Ω,η,ψ_i,σ[c/t])] is obtained by the expression

    prob(φ(s,l)) + (1 − prob(ψ_t[l/t])) × (
      prob(φ(s,l+1)) + (1 − prob(ψ_t[l+1/t])) × (
        ···
          (1 − prob(ψ_t[u−1/t])) × prob(φ(s,u)) ··· ))

Now consider f(σ,ψ) for the same ψ: f(σ,ψ) = h(σ[l/t],ψ). The expansion of h(σ[l/t],ψ) is as follows:

    g(σ[l/t],ψ) + (1 − prob(ψ_t[l/t])) × (
      g(σ[l+1/t],ψ) + (1 − prob(ψ_t[l+1/t])) × (
        ···
          (1 − prob(ψ_t[u−1/t])) × g(σ[u/t],ψ) ··· ))


For a given ground tree φ(s,c), prob(φ(s,c)) = Σ_{α_i∈Ds} π_s(α_i) · prob(Gr(Ω,η,ψ_i,σ[c/t])). Similarly, g(σ[c/t],ψ) = Σ_{α_i∈Ds} π_s(α_i) · f(σ[c/t],ψ_i). But by the inductive hypothesis, prob(Gr(Ω,η,ψ_i,σ[c/t])) = f(σ[c/t],ψ_i). Therefore prob(φ(s,c)) = g(σ[c/t],ψ), and hence prob(φ) = f(σ,ψ). When ησ is not satisfiable, prob(φ) = f(σ,ψ) = 0. Therefore, the theorem is proved.
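The nested product-sum expression in Case 4 is a right fold over the range [l,u]. A sketch of evaluating it iteratively, where p[c] stands for prob(φ(s,c)) and q[c] for prob(ψ_t[c/t]); the list-based interface is an assumption for illustration:

```python
def disjunction_prob(p, q):
    # evaluates p[l] + (1-q[l]) * (p[l+1] + (1-q[l+1]) * ( ... p[u] ))
    # by folding from the right; q[u] only multiplies the empty tail (0),
    # matching the absence of a (1 - prob(psi_t[u/t])) factor above
    acc = 0.0
    for pc, qc in zip(reversed(p), reversed(q)):
        acc = pc + (1.0 - qc) * acc
    return acc

p = [0.2, 0.3, 0.1]   # per-instance probabilities prob(phi_(s,c))
q = [0.2, 0.3, 0.1]   # per-instance factors prob(psi_t[c/t])
# hand expansion: 0.2 + 0.8 * (0.3 + 0.7 * 0.1)
assert abs(disjunction_prob(p, q) - (0.2 + 0.8 * (0.3 + 0.7 * 0.1))) < 1e-9
```

Evaluated this way, the probability of the whole disjunction costs one pass over the range, which is the source of the lifted method's savings over enumerating ground explanations.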

References

[Bellodi et al. 2014] Bellodi, E., Lamma, E., Riguzzi, F., Costa, V. S., and Zese, R. 2014. Lifted variable elimination for probabilistic logic programming. TPLP 14, 4-5, 681–695.

[Braz et al. 2005] Braz, R. d. S., Amir, E., and Roth, D. 2005. Lifted first-order probabilistic inference. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence. 1319–1325.

[Bryant 1992] Bryant, R. E. 1992. Symbolic Boolean manipulation with ordered binary-decision diagrams. ACM Computing Surveys (CSUR) 24, 3, 293–318.

[Darwiche 2001] Darwiche, A. 2001. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics 11, 1-2, 11–34.

[Darwiche 2011] Darwiche, A. 2011. SDD: A new canonical representation of propositional knowledge bases. In International Joint Conference on Artificial Intelligence. 819–826.

[De Raedt et al. 2007] De Raedt, L., Kimmig, A., and Toivonen, H. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence. Vol. 7. 2462–2467.

[Dries et al. 2015] Dries, A., Kimmig, A., Meert, W., Renkens, J., Van den Broeck, G., Vlasselaer, J., and De Raedt, L. 2015. ProbLog2: Probabilistic logic programming. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD. 312–315.

[Kisynski 2010] Kisynski, J. J. 2010. Aggregation and constraint processing in lifted probabilistic inference. Ph.D. thesis, The University of British Columbia, Vancouver.

[Larsen et al. 1997] Larsen, K. G., Larsson, F., Pettersson, P., and Yi, W. 1997. Efficient verification of real-time systems: Compact data structure and state-space reduction. In IEEE RTSS'97. 14–24.

[Milch et al. 2008] Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., and Kaelbling, L. P. 2008. Lifted probabilistic inference with counting formulas. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence. 1062–1068.


[Mine 2006] Miné, A. 2006. The octagon abstract domain. Higher-Order and Symbolic Computation 19, 1, 31–100.

[Poole 2003] Poole, D. 2003. First-order probabilistic inference. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence. Vol. 3. 985–991.

[Riguzzi and Swift 2011] Riguzzi, F. and Swift, T. 2011. The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theory and Practice of Logic Programming 11, 4-5, 433–449.

[Sato and Kameya 1997] Sato, T. and Kameya, Y. 1997. PRISM: A language for symbolic-statistical modeling. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. Vol. 97. 1330–1339.

[Sato and Kameya 2001] Sato, T. and Kameya, Y. 2001. Parameter learning of logic programs for symbolic-statistical modeling. Journal of Artificial Intelligence Research, 391–454.

[Sheldon and Dietterich 2011] Sheldon, D. R. and Dietterich, T. G. 2011. Collective graphical models. In Advances in Neural Information Processing Systems 24. 1161–1169.

[Srinivasan et al. 1990] Srinivasan, A., Kam, T., Malik, S., and Brayton, R. K. 1990. Algorithms for discrete function manipulation. In International Conference on Computer-Aided Design, ICCAD. 92–95.

[Taghipour et al. 2013] Taghipour, N., Fierens, D., Davis, J., and Blockeel, H. 2013. Lifted variable elimination: Decoupling the operators from the constraint language. J. Artif. Intell. Res. (JAIR) 47, 393–439.

[Van den Broeck et al. 2014] Van den Broeck, G., Meert, W., and Darwiche, A. 2014. Skolemization for weighted first-order model counting. In Principles of Knowledge Representation and Reasoning.

[Van den Broeck et al. 2011] Van den Broeck, G., Taghipour, N., Meert, W., Davis, J., and De Raedt, L. 2011. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence. 2178–2185.

[Yovine 1998] Yovine, S. 1998. Model-checking timed automata. In Embedded Systems. Number 1494 in LNCS. 114–152.
