
Transcript of nasslli2010_4

  • Slide 1

    MARKOV LOGIC: NASSLLI 2010

    Mathias Niepert

  • Slide 2

    MARKOV LOGIC: INTUITION

    A logical KB is a set of hard constraints

    on the set of possible worlds

    Let's make some of them soft constraints:

    When a world violates a formula, it becomes less

    probable, not impossible

    Give each formula a weight

    (Higher weight → stronger constraint)

    $P(\text{world}) \propto \exp\left(\sum \text{weights of formulas it satisfies}\right)$

  • Slide 3

    MARKOV LOGIC: DEFINITION

    A Markov Logic Network (MLN) is a set of pairs

    $(F_i, w_i)$ where

    $F_i$ is a formula in first-order logic; $w_i$ is a real-valued weight

    Together with a finite set of constants C, it defines a Markov network with

    One binary node for each grounding of each predicate

    in the MLN. The value of the node is 1 if the ground

    atom is true, and 0 otherwise.

    One feature for each grounding of each formula $F_i$ in the MLN, with the corresponding weight $w_i$
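    As a concrete illustration of this definition, the following is a minimal sketch (not Alchemy's representation) that enumerates the binary nodes of the ground network; the predicate names and constants are assumptions chosen to match the example used later in the lecture.

    # One binary node per grounding of each predicate over the constants C.
    from itertools import product

    predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}   # predicate -> arity
    constants = ["Anna", "Bob"]                             # the finite set C

    ground_atoms = [(pred, args)
                    for pred, arity in predicates.items()
                    for args in product(constants, repeat=arity)]

    print(len(ground_atoms))   # 2 + 2 + 4 = 8 ground atoms (nodes) for two constants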

  • Slide 4

    LOG-LINEAR MODELS

    A distribution is a log-linear model over a Markov network H if it is associated with

    A set of features $F = \{f_1(D_1), \ldots, f_k(D_k)\}$, where each $D_i$ is a complete subgraph (clique) of $H$,
    A set of weights $w_1, \ldots, w_k$ such that

    $P(X_1, \ldots, X_n) = \frac{1}{Z}\exp\left[\sum_{i=1}^{k} w_i f_i(D_i)\right]$
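    A small numerical sketch of this definition (a toy model, not an example from the slides): two binary variables forming a single clique, one "agreement" feature, and one made-up weight.

    # P(X1, X2) = (1/Z) exp( w1 * f1(X1, X2) ) for a single-edge Markov network.
    from itertools import product
    from math import exp

    features = [lambda x1, x2: float(x1 == x2)]   # f1 fires when X1 = X2
    weights = [1.1]                               # illustrative weight

    def unnormalized(x1, x2):
        return exp(sum(w * f(x1, x2) for w, f in zip(weights, features)))

    Z = sum(unnormalized(*x) for x in product([0, 1], repeat=2))
    print({x: round(unnormalized(*x) / Z, 3) for x in product([0, 1], repeat=2)})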

  • Slide 5

    ASSUMPTIONS

    1. Unique names. Different constants refer to

    different objects. (Genesereth & Nilsson, 1987)

    2. Domain closure. The only objects in the

    domain are those representable using the

    constant and function symbols (Genesereth &

    Nilsson, 1987)

    3. Known functions. For each function, the value

    of that function applied to every possible tuple of

    arguments is known, and is an element of C

  • Slide 6

    EXAMPLE: FRIENDS & SMOKERS

    Smoking causes cancer.
    Friends have similar smoking habits.

  • Slide 7

    EXAMPLE: FRIENDS & SMOKERS

    Smoking causes cancer:  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    Friends have similar smoking habits:  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$

  • Slide 8

    EXAMPLE: FRIENDS & SMOKERS

    1.5  Smoking causes cancer:  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  Friends have similar smoking habits:  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$

  • Slide 9

    EXAMPLE: FRIENDS & SMOKERS

    1.5  Smoking causes cancer:  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  Friends have similar smoking habits:  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)

  • Slide 10

    EXAMPLE: FRIENDS & SMOKERS

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    [Ground network: nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

  • Slide 11

    EXAMPLE: FRIENDS & SMOKERS

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    [Ground network: nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]


  • Slide 14

    MARKOV LOGIC NETWORKS

    An MLN is a template for ground Markov networks
    Probability of a world $x$:
    $P(x) = \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(x)\right)$
    where $w_i$ is the weight of formula $i$ and $n_i(x)$ is the number of true groundings of formula $i$ in $x$
    Typed variables and constants greatly reduce the size of the ground Markov net
    Functions, existential quantifiers, infinite and continuous domains, etc. are possible

  • Slide 15

    EXAMPLE: FRIENDS & SMOKERS

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)

    Ground formulas and weights:
    1.5  $Smokes(A) \Rightarrow Cancer(A)$
    1.5  $Smokes(B) \Rightarrow Cancer(B)$
    1.1  $Friends(A,A) \Rightarrow (Smokes(A) \Leftrightarrow Smokes(A))$
    1.1  $Friends(A,B) \Rightarrow (Smokes(A) \Leftrightarrow Smokes(B))$
    1.1  $Friends(B,A) \Rightarrow (Smokes(B) \Leftrightarrow Smokes(A))$
    1.1  $Friends(B,B) \Rightarrow (Smokes(B) \Leftrightarrow Smokes(B))$

    $P(Smokes(A){=}T, Smokes(B){=}F, Friends(A,A){=}T, Friends(A,B){=}T, Friends(B,A){=}F, Friends(B,B){=}T, Cancer(A){=}F, Cancer(B){=}T)$
    $= \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(x)\right) = \frac{1}{Z}\exp(1.5 \cdot 1 + 1.1 \cdot 3)$
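    The counts in this computation can be checked mechanically. The following is a minimal sketch (my own, not Alchemy or any tool from the slides) that recomputes $n_1(x)=1$ and $n_2(x)=3$ for the world above and the resulting unnormalized weight.

    # Sketch: recompute n_i(x) for the world above and exp(1.5*1 + 1.1*3).
    from itertools import product
    from math import exp

    world = {("Smokes", ("A",)): True,  ("Smokes", ("B",)): False,
             ("Cancer", ("A",)): False, ("Cancer", ("B",)): True,
             ("Friends", ("A", "A")): True,  ("Friends", ("A", "B")): True,
             ("Friends", ("B", "A")): False, ("Friends", ("B", "B")): True}
    consts = ["A", "B"]
    implies = lambda a, b: (not a) or b

    # n1: true groundings of  Smokes(x) => Cancer(x)
    n1 = sum(implies(world[("Smokes", (x,))], world[("Cancer", (x,))]) for x in consts)
    # n2: true groundings of  Friends(x,y) => (Smokes(x) <=> Smokes(y))
    n2 = sum(implies(world[("Friends", (x, y))],
                     world[("Smokes", (x,))] == world[("Smokes", (y,))])
             for x, y in product(consts, repeat=2))

    print(n1, n2)                    # 1 3
    print(exp(1.5 * n1 + 1.1 * n2))  # unnormalized probability, before dividing by Z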

  • Slide 16

    RELATION TO FIRST-ORDER LOGIC

    Infinite weights → first-order logic
    Satisfiable KB, positive weights → satisfying assignments = modes of the distribution
    Markov logic allows inconsistencies (contradictions between formulas)

  • Slide 17

    MAP INFERENCE IN

    MARKOV LOGIC NETWORKS

    Problem: Find most likely state of world y given

    evidence e

    $\arg\max_y P(y \mid e)$    (y: query, e: evidence)

  • Slide 18

    MAP INFERENCE

    Problem: Find most likely state of world y given

    evidence e

    $n_i$ is the feature corresponding to formula $F_i$
    $w_i$ is the weight corresponding to formula $F_i$
    $\arg\max_y \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(y, e)\right)$

  • Slide 19

    MAP INFERENCE

    Problem: Find most likely state of world y given

    evidence e

    $n_i$ is the feature corresponding to formula $F_i$
    $w_i$ is the weight corresponding to formula $F_i$
    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 20

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B) are true
    [Ground network: Friends(A,B), Friends(B,A), Smokes(B) set to true; Smokes(A), Cancer(A), Cancer(B), Friends(A,A), Friends(B,B) unknown (?)]

  • Slide 21

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B) are true
    [Ground network, MAP state: Smokes(A), Cancer(A), Cancer(B) true; Friends(A,A), Friends(B,B) false]

  • Slide 22

    MAP INFERENCE

    Problem: Find most likely state of world given

    evidence e

    This is the weighted MAX-SAT problem

    Use a weighted MAX-SAT solver (e.g., MaxWalkSAT)
    Better: integer linear programming

    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 23

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B)

    Ground clauses and weights:
    $\lnot Smokes(A) \lor Cancer(A)$  1.5
    $\lnot Smokes(B) \lor Cancer(B)$  1.5
    $\lnot Smokes(A) \lor Cancer(B)$  1.5
    $\lnot Smokes(B) \lor Cancer(A)$  1.5
    $\lnot Friends(A,B) \lor \lnot Smokes(A) \lor Smokes(B)$  0.55
    $\lnot Friends(A,B) \lor \lnot Smokes(B) \lor Smokes(A)$  0.55

  • Slide 24

    RELATION TO STATISTICAL MODELS

    Special cases: Markov networks

    Bayesian networks

    Log-linear models

    Exponential models

    Max. entropy models

    Hidden Markov models

    Obtained by making all predicates zero-arity

    Every probability distribution over discrete or finite-

    precision numeric variables can be represented as a

    Markov logic network.

  • Slide 25

    MARKOV LOGIC

    Declarative language, several challenges

    Inference

    Weight Learning

    Structure Learning

    Many ways to perform probabilistic inference

    Conditional probability query

    MAP query

    There's a large body of work on probabilistic

    inference in graphical models

    We'll talk about some of these methods and how

    they can be put to work in Markov logic networks

  • Slide 26

    INFERENCE IN GRAPHICAL MODELS

    Conditional probability queries $P(Y \mid E = e)$, where $Y \subseteq X$, $E \subseteq X$, and $Y$ and $E$ are disjoint
    Let $W = X \setminus (Y \cup E)$ be the variables that are neither query nor evidence; then
    $P(Y \mid E = e) = \frac{P(Y, e)}{P(e)}$
    $P(y, e) = \sum_w P(y, e, w)$
    $P(e) = \sum_y P(y, e)$
    $P(e)$ can be computed by reusing the previous computation
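    As a minimal sketch of these three equations (toy numbers, not a slide example), take three binary variables Y, E, W with an explicit joint table and compute P(Y | E = 1) by summing out W and renormalizing.

    from itertools import product

    # Toy joint P(y, e, w); the eight probabilities are made up and sum to 1.
    probs = [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10]
    joint = dict(zip(product([0, 1], repeat=3), probs))

    e_obs = 1
    p_y_e = {y: sum(joint[(y, e_obs, w)] for w in [0, 1]) for y in [0, 1]}  # P(y, e) = sum_w P(y, e, w)
    p_e = sum(p_y_e.values())                                               # P(e)   = sum_y P(y, e)
    print({y: p_y_e[y] / p_e for y in [0, 1]})                              # P(Y | E = e)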

  • Slide 27

    COMPLEXITY OF INFERENCE

    The process of summing out the joint is not satisfactory, as it leads to the exponential blowup that the graphical model representation was supposed to avoid

    Problem: Exponential blow-up in the worst case is

    unavoidable

    Worse: Approximate inference is also NP-hard

    But: we really care about the cases that we encounter in practice, not the worst case

  • Slide 28

    COMPLEXITY OF INFERENCE

    Theoretical analysis can focus on Bayesian

    networks as they can be converted to Markov

    networks with no increase in representation size

    First question: How do we encode a BN?

    DAG structure + worst-case representation of each CPD as a table of size $|Val(\{X_i\} \cup Pa_{X_i})|$

    Decision Problem BN-Pr-DP:

    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, and a value $x \in Val(X)$, decide whether $P_B(X = x) > 0$

    BN-Pr-DP is NP-complete

  • Slide 29

    COMPLEXITY OF INFERENCE

    BN-Pr-DP is in NP:

    We guess an assignment $\mathbf{x}$ to the network variables
    We check whether $X = x$ holds in $\mathbf{x}$ and whether $P(\mathbf{x}) > 0$

    The latter can be accomplished in linear time using

    the chain rule of BNs

    BN-Pr-DP is NP-hard:

    Reduction from 3-SAT

    Given any 3-SAT formula $f$ we can create a Bayesian network $B$ with a distinguished binary variable $X$ such that $f$ is satisfiable if and only if $P_B(X = x^1) > 0$
    The BN has to be constructible in polynomial time

  • Slide 30

    COMPLEXITY OF INFERENCE

    For each propositional variable $q_i$, one root node $Q_i$ with $P(q_i^1) = 0.5$
    For each clause $c_j$, one node $C_j$ with an edge from $Q_i$ to $C_j$ if $q_i$ or $\lnot q_i$ occurs in clause $c_j$

  • Slide 31

    COMPLEXITY OF INFERENCE

    CPD of the clause node for $c_1 = q_1 \lor \lnot q_3$:

    q1  q3 | P(c1=0)  P(c1=1)
     0   0 |    0        1
     1   0 |    0        1
     0   1 |    1        0
     1   1 |    0        1
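    The deterministic CPD of a clause node can be written down directly. Below is a small sketch (the function name and representation are my own) that rebuilds the table above for $c_1 = q_1 \lor \lnot q_3$.

    from itertools import product

    def clause_cpt(literals):
        """literals: list of (variable, polarity). Returns P(clause = 1 | parents)."""
        cpt = {}
        for values in product([0, 1], repeat=len(literals)):
            satisfied = any(val == (1 if positive else 0)
                            for (_, positive), val in zip(literals, values))
            cpt[values] = 1.0 if satisfied else 0.0
        return cpt

    # c1 = q1 v ~q3: parents (q1, q3)
    for (q1, q3), p in clause_cpt([("q1", True), ("q3", False)]).items():
        print(f"q1={q1} q3={q3}  P(c1=1)={p}")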

  • Slide 32

    COMPLEXITY OF INFERENCE

    We cannot connect the $C_i$ ($i = 1, \ldots, m$) directly to the variable $X$, as the CPD for $X$ would be exponentially large
    We introduce $m-2$ AND nodes

  • Slide 33

    COMPLEXITY OF INFERENCE

    Now, $X$ has value 1 if and only if all of the clauses

    are satisfied

    All nodes have at most three parents and,

    therefore, the size of the BN is polynomial in the

    size of $f$

  • Slide 34

    COMPLEXITY OF INFERENCE

    The prior probability of each possible assignment is $1/2^n$
    $P(X = x^1)$ = number of satisfying assignments to $f$, divided by $2^n$
    $f$ has a satisfying assignment iff $P(x^1) > 0$

  • Slide 35

    COMPLEXITY OF INFERENCE

    Probabilistic inference is a numerical problem, not a decision problem

    We can use a similar construction to show the

    following problem is #P-complete

    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, and a value $x \in Val(X)$, compute $P_B(X = x)$

    We have to do a weighted count of instantiations

    #P is the set of the counting problems associated

    with the decision problems in the set NP

    A #P problem must be at least as hard as the corresponding NP problem

  • Slide 36

    COMPLEXITY OF APPROXIMATE INFERENCE

    Goal is to compute P(Y|e)

    An estimate $r$ has relative error $\epsilon$ for $P(y \mid e)$ if:
    $\frac{r}{1+\epsilon} \le P(y \mid e) \le r(1+\epsilon)$
    We can use a similar construction again to show that the following problem is NP-hard:
    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, and a value $x \in Val(X)$, find a number $r$ that has relative error $\epsilon$ for $P_B(X = x)$

  • Slide 37

    COMPLEXITY OF APPROXIMATE INFERENCE

    Goal is to compute P(Y|e)

    An estimate $r$ has absolute error $\epsilon$ for $P(y \mid e)$ if:
    $|r - P(y \mid e)| \le \epsilon$
    Computing $P(X = x)$ up to some absolute error $\epsilon$ has a randomized polynomial-time algorithm
    However, when evidence is introduced, we are back to NP-hardness
    The following problem is NP-hard for any $\epsilon \in (0, 1/2)$:
    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, a value $x \in Val(X)$, and an observation $E = e$, find a number $r$ that has absolute error $\epsilon$ for $P_B(X = x \mid e)$

  • Slide 38

    MONTE CARLO PRINCIPLE

    Consider the game of solitaire:

    What's the probability of winning

    a game?

    Hard to compute analytically

    because winning or losing depends on a complex procedure

    of reorganizing cards

    Let's play a few hands, and see
    empirically how many do in fact win
    Idea: Approximate a probability

    distribution using only samples

    from that distribution

    [Figure: five sample games: Lose, Lose, Win, Lose, Lose]
    Chance of winning is 1 in 5!
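    In code, the Monte Carlo principle is just this (a sketch; play_game() is a stand-in for the complex card-reorganizing procedure, with an assumed 20% true win rate):

    import random

    def play_game():                      # hypothetical solitaire simulator
        return random.random() < 0.2      # assumed true win probability

    plays = [play_game() for _ in range(10000)]
    print(sum(plays) / len(plays))        # empirical win rate, roughly "1 in 5"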

  • Slide 39

    SAMPLING FROM A BAYESIAN NETWORK

    Generate samples (particles) from a Bayesian

    network using a random number generator

    [Figure: Bayesian network with nodes Difficulty (d0 easy, d1 difficult), Intelligence (i0 normal, i1 high), Grade (g1 A, g2 B, g3 C), SAT (s0 low, s1 high), and Letter (l0 weak, l1 strong), with their CPTs]
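    A runnable sketch of forward sampling for the network in the figure; the CPD numbers below are illustrative stand-ins, not the values from the original figure. Variables are sampled in topological order, so one particle costs time linear in the network size.

    import random

    def sample_from(dist):                 # draw a value from {value: probability}
        r, acc = random.random(), 0.0
        for value, p in dist.items():
            acc += p
            if r < acc:
                return value
        return value

    def forward_sample():
        d = sample_from({"d0": 0.6, "d1": 0.4})                      # Difficulty
        i = sample_from({"i0": 0.7, "i1": 0.3})                      # Intelligence
        grade_cpd = {("d0", "i0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
                     ("d0", "i1"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
                     ("d1", "i0"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
                     ("d1", "i1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2}}
        g = sample_from(grade_cpd[(d, i)])                           # Grade | D, I
        s = sample_from({"s0": 0.95, "s1": 0.05} if i == "i0"
                        else {"s0": 0.2, "s1": 0.8})                 # SAT | I
        l = sample_from({"l0": 0.9, "l1": 0.1} if g == "g3"
                        else {"l0": 0.4, "l1": 0.6})                 # Letter | G
        return {"D": d, "I": i, "G": g, "S": s, "L": l}

    print(forward_sample())   # one particle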


  • Slide 44

    SAMPLING

    One sample can be computed in linear time

    The sampling process generates a set of particles $D = \{x[1], \ldots, x[M]\}$
    When computing $P(y)$, the estimate is simply the fraction of particles in which $y$ was seen:
    $\hat{P}_D(y) = \frac{1}{M}\sum_{m=1}^{M} \mathbf{1}\{y[m] = y\}$
    with $\mathbf{1}$ the indicator function and $y[m]$ the assignment to $Y$ in particle $x[m]$
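    A one-line instance of this estimator (toy particles, assumed values):

    particles = ["win", "lose", "lose", "win", "lose", "lose", "lose", "win", "lose", "lose"]
    y = "win"
    p_hat = sum(1 for y_m in particles if y_m == y) / len(particles)   # (1/M) sum 1{y[m] = y}
    print(p_hat)   # 0.3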

  • Slide 45

    EXAMPLE: BAYESIAN NETWORK INFERENCE

    Suppose we have a Bayesian network with variables $X$
    Our state space is the set of all possible assignments of values to the variables
    We can draw a sample in time that is linear in the size of the network
    Draw $N$ samples, use them to

    approximate the joint

    1st sample: D=d0,I=i1,G=g2,S=s0, L=l1

    2nd sample: D=d1,I=i1,G=g1,S=s1, L=l1

  • Slide 46

    REJECTION SAMPLING

    Suppose we have a Bayesian network with variables $X$
    We wish to condition on some evidence $E = e$ and compute the posterior over $Y = X - E$

    Draw samples and reject them

    when not compatible with

    evidence e

    Inefficient if the evidence is itself improbable: we must reject a large number of samples

    1st sample: D=d0,I=i1,G=g2,S=s0, L=l1 reject

    2nd sample: D=d1,I=i1,G=g1,S=s1, L=l1 accept
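    A minimal rejection-sampling sketch (a two-variable stand-in for forward sampling from the full network; the probabilities are assumptions): draw from the prior, keep only particles that agree with the evidence, and estimate the posterior from the survivors.

    import random

    def sample_joint():                                  # stand-in forward sampler
        i = "i1" if random.random() < 0.3 else "i0"      # P(i1) = 0.3 (assumed)
        p_l1 = 0.8 if i == "i1" else 0.4                 # P(l1 | i) (assumed)
        l = "l1" if random.random() < p_l1 else "l0"
        return {"I": i, "L": l}

    evidence = {"L": "l1"}
    accepted = [x for x in (sample_joint() for _ in range(20000))
                if all(x[k] == v for k, v in evidence.items())]
    print(sum(x["I"] == "i1" for x in accepted) / len(accepted))   # estimate of P(i1 | l1)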

  • Slide 47

    SAMPLING IN MARKOV LOGIC NETWORKS

    Sampling is performed on the ground Markov

    logic network

    Alchemy uses a variant of the MCMC (Markov

    Chain Monte Carlo) method

    Can answer arbitrary queries of the form $P(F_i \mid M_{L,C})$, where $M_{L,C}$ is the ground Markov network defined by the MLN $L$ and the constants $C$
    Example: $P(Cancer(Alice) \mid M_{L,C})$

  • Slide 48

    MAP INFERENCE IN GRAPHICAL MODELS

    The following problem is NP-complete:

    Given a BN $B$ over $\mathcal{X}$ and a number $t$, decide whether there exists an assignment $\mathbf{x}$ to $\mathcal{X}$ such that $P(\mathbf{x}) > t$.

    There exist several algorithms for MAP inference with reasonable performance on most practical

    problems

  • Slide 49

    MAP INFERENCE IN MARKOV LOGIC

    NETWORKS

    Problem: Find most likely state of world y given

    evidence e

    $n_i$ is the feature corresponding to formula $F_i$
    $w_i$ is the weight corresponding to formula $F_i$
    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 50

    MAP INFERENCE IN MARKOV LOGIC

    NETWORKS

    Problem: Find most likely state of world given

    evidence e

    This is the weighted MAX-SAT problem

    Use a weighted MAX-SAT solver (e.g., MaxWalkSAT)
    Better: integer linear programming
    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 51

    THE MAXWALKSAT ALGORITHM

    for i ← 1 to max-tries do
        solution = random truth assignment
        for j ← 1 to max-flips do
            if weights(sat. clauses) > threshold then
                return solution
            c ← random unsatisfied clause
            with probability p
                flip a random variable in c
            else
                flip variable in c that maximizes weights(sat. clauses)
    return failure, best solution found
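    A compact, runnable sketch of the loop above (my own rendering, not Alchemy's implementation). A clause is a (weight, literals) pair and a literal is (ground atom, polarity); the small clause set at the bottom is taken from the Friends & Smokers example.

    import random

    def maxwalksat(clauses, variables, max_tries=10, max_flips=1000, p=0.5, threshold=None):
        def sat_weight(assign):                       # weights(sat. clauses)
            return sum(w for w, lits in clauses
                       if any(assign[v] == pos for v, pos in lits))
        threshold = sum(w for w, _ in clauses) if threshold is None else threshold
        best = None, float("-inf")
        for _ in range(max_tries):
            assign = {v: random.choice([True, False]) for v in variables}
            for _ in range(max_flips):
                weight = sat_weight(assign)
                if weight > best[1]:
                    best = dict(assign), weight
                if weight >= threshold:
                    return assign, weight
                unsat = [lits for w, lits in clauses
                         if not any(assign[v] == pos for v, pos in lits)]
                if not unsat:
                    return assign, weight
                c = random.choice(unsat)              # random unsatisfied clause
                if random.random() < p:
                    v = random.choice(c)[0]           # flip a random variable in c
                else:                                 # greedy: best flip within c
                    v = max((var for var, _ in c),
                            key=lambda var: sat_weight({**assign, var: not assign[var]}))
                assign[v] = not assign[v]
        return best                                   # failure: best assignment found

    clauses = [(1.5, [("Smokes(A)", False), ("Cancer(A)", True)]),
               (1.5, [("Smokes(B)", False), ("Cancer(B)", True)]),
               (0.55, [("Friends(A,B)", False), ("Smokes(B)", False), ("Smokes(A)", True)])]
    atoms = {v for _, lits in clauses for v, _ in lits}
    print(maxwalksat(clauses, atoms))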

  • Slide 52

    MAP INFERENCE IN MARKOV LOGIC

    NETWORKS

    We've tried Alchemy (and MaxWalkSAT) and the

    results were poor

    Better results with integer linear programming

    (ILP)

    ILP performs exact inference

    Works very well on the problems we are

    concerned with

    Originated in the field of operations research

  • Slide 53

    LINEAR PROGRAMMING

    A linear programming problem is the problem of

    maximizing (or minimizing) a linear function subject

    to a finite number of linear constraints

    Standard form of linear programming:
    maximize $\sum_{j=1}^{n} c_j x_j$
    subject to $\sum_{j=1}^{n} a_{ij} x_j \ge b_i \quad (i = 1, 2, \ldots, m)$
    $x_j \ge 0 \quad (j = 1, 2, \ldots, n)$

  • Slide 54


    INTEGER LINEAR PROGRAMMING

    An integer linear programming problem is the

    problem of maximizing (or minimizing) a linear

    function subject to a finite number of linear

    constraints

    Difference to LP: variables are only allowed to have integer values

    maximize $\sum_{j=1}^{n} c_j x_j$
    subject to $\sum_{j=1}^{n} a_{ij} x_j \ge b_i \quad (i = 1, 2, \ldots, m)$
    $x_j \ge 0 \quad (j = 1, 2, \ldots, n)$
    $x_j \in \{\ldots, -1, 0, 1, \ldots\}$

  • Slide 55

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B)

    Ground clauses and weights:
    $\lnot Smokes(A) \lor Cancer(A)$  1.5
    $\lnot Smokes(B) \lor Cancer(B)$  1.5
    $\lnot Smokes(A) \lor Cancer(B)$  1.5
    $\lnot Smokes(B) \lor Cancer(A)$  1.5
    $\lnot Friends(A,B) \lor \lnot Smokes(A) \lor Smokes(B)$  0.55
    $\lnot Friends(A,B) \lor \lnot Smokes(B) \lor Smokes(A)$  0.55

  • Slide 56

    MAP INFERENCE - EXAMPLE

    $\lnot Smokes(A) \lor Cancer(A)$  1.5

    Introduce a new variable for each ground atom: $s_a$, $c_a$
    Introduce a new variable for each formula: $x_j$

    Add the following three constraints:

    $s_a + x_j \ge 1$
    $-c_a + x_j \ge 0$
    $-x_j - s_a + c_a \ge -1$
    Add $1.5\, x_j$ to the objective function

    maximize $\sum_{j=1}^{n} c_j x_j$
    subject to $\sum_{j=1}^{n} a_{ij} x_j \ge b_i \quad (i = 1, 2, \ldots, m)$
    $x_j \in \{0, 1\}$
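    To see that these constraints do what the slide claims, here is a brute-force check over the 0/1 assignments for the single clause above (a sketch, not an ILP solver): $x_j$ is forced to 1 exactly when the clause is satisfied, so maximizing $1.5\, x_j$ prefers assignments that satisfy it.

    from itertools import product

    best = None
    for sa, ca, xj in product([0, 1], repeat=3):
        feasible = (sa + xj >= 1) and (-ca + xj >= 0) and (-xj - sa + ca >= -1)
        if feasible and (best is None or 1.5 * xj > best[0]):
            best = (1.5 * xj, {"Smokes(A)": sa, "Cancer(A)": ca, "x_j": xj})
    print(best)   # objective 1.5, with the clause ~Smokes(A) v Cancer(A) satisfied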

  • Slide 57

    TOMORROW

    Ontology Matching with Markov Logic

    Weight Learning

    Experiments