Transcript of nasslli2010_4
-
MARKOV LOGIC
NASSLLI 2010
Mathias Niepert
-
MARKOV LOGIC: INTUITION
A logical KB is a set of hard constraints on the set of possible worlds
Let's make some of them soft constraints: when a world violates a formula, it becomes less probable, not impossible
Give each formula a weight (higher weight → stronger constraint)

$P(\text{world}) \propto \exp\big(\textstyle\sum \text{weights of formulas it satisfies}\big)$
-
MARKOV LOGIC: DEFINITION
A Markov Logic Network (MLN) is a set of pairs (F_i, w_i) where
F_i is a formula in first-order logic
w_i is a real-valued weight
Together with a finite set of constants C, it defines a Markov network with
One binary node for each grounding of each predicate in the MLN. The value of the node is 1 if the ground atom is true, and 0 otherwise.
One feature for each grounding of each formula F_i in the MLN, with the corresponding weight w_i
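To make the template-to-network step concrete, here is a minimal Python sketch (the names and encoding are ours, not Alchemy's API) that enumerates the binary nodes a ground Markov network gets from a set of predicates and constants:

```python
from itertools import product

constants = ["Anna", "Bob"]                             # the finite set C
predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}   # name -> arity

# One binary node per grounding of each predicate
nodes = [f"{pred}({','.join(args)})"
         for pred, arity in predicates.items()
         for args in product(constants, repeat=arity)]
print(nodes)   # 2 + 2 + 4 = 8 ground atoms for two constants
```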
-
LOG-LINEAR MODELS
A distribution is a log-linear model over a Markov network H if it is associated with
A set of features F = {f_1(D_1), …, f_k(D_k)}, where each D_i is a complete subgraph (clique) of H,
A set of weights w_1, …, w_k such that

$P(X_1, \ldots, X_n) = \frac{1}{Z}\exp\left[\sum_{i=1}^{k} w_i\, f_i(D_i)\right]$
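A toy instance of this definition, as a sketch in Python: two illustrative features with made-up weights over two binary variables, normalized by the partition function Z.

```python
from itertools import product
from math import exp

# Two illustrative features (made-up weights): f1 fires on the clique {A, B}
# when A and B agree; f2 fires on the singleton clique {A} when A = 1.
weights = [1.5, 1.1]
features = [lambda x: 1.0 if x["A"] == x["B"] else 0.0,
            lambda x: float(x["A"])]

def unnormalized(x):
    return exp(sum(w * f(x) for w, f in zip(weights, features)))

worlds = [dict(zip("AB", vals)) for vals in product([0, 1], repeat=2)]
Z = sum(unnormalized(x) for x in worlds)        # partition function
for x in worlds:
    print(x, round(unnormalized(x) / Z, 4))     # probabilities sum to 1
```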
-
ASSUMPTIONS
1. Unique names. Different constants refer to different objects. (Genesereth & Nilsson, 1987)
2. Domain closure. The only objects in the domain are those representable using the constant and function symbols. (Genesereth & Nilsson, 1987)
3. Known functions. For each function, the value of that function applied to every possible tuple of arguments is known, and is an element of C.
-
EXAMPLE: FRIENDS & SMOKERS
Friends have similar smoking habits.
Smoking causes cancer.
-
EXAMPLE: FRIENDS & SMOKERS
∀x Smokes(x) ⇒ Cancer(x)
∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Friends have similar smoking habits.
Smoking causes cancer.
-
EXAMPLE: FRIENDS & SMOKERS
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Friends have similar smoking habits.
Smoking causes cancer.
-
EXAMPLE: FRIENDS & SMOKERS
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
Friends have similar smoking habits.
Smoking causes cancer.
-
EXAMPLE: FRIENDS & SMOKERS
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
Ground atoms: Smokes(A), Smokes(B), Cancer(A), Cancer(B)
-
EXAMPLE: FRIENDS & SMOKERS
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
Ground atoms: Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)
-
MARKOV LOGIC NETWORKS
An MLN is a template for ground Markov networks
Probability of a world x:

$P(x) = \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(x)\right)$

where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x
Typed variables and constants greatly reduce the size of the ground Markov net
Functions, existential quantifiers, etc.: infinite and continuous domains are possible
-
EXAMPLE: FRIENDS & SMOKERS
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)

Ground formulas:
1.5: Smokes(A) ⇒ Cancer(A)
1.5: Smokes(B) ⇒ Cancer(B)
1.1: Friends(A,A) ⇒ (Smokes(A) ⇔ Smokes(A))
1.1: Friends(A,B) ⇒ (Smokes(A) ⇔ Smokes(B))
1.1: Friends(B,A) ⇒ (Smokes(B) ⇔ Smokes(A))
1.1: Friends(B,B) ⇒ (Smokes(B) ⇔ Smokes(B))

For the world x with Smokes(A)=T, Smokes(B)=F, Friends(A,A)=T, Friends(A,B)=T, Friends(B,A)=F, Friends(B,B)=T, Cancer(A)=F, Cancer(B)=T:

$P(x) = \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(x)\right) = \frac{1}{Z}\exp(1.5 \cdot 1 + 1.1 \cdot 3)$
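The slide's numbers can be checked by brute force. Below is a sketch (our own encoding, not Alchemy) that enumerates all 2^8 worlds over the ground atoms, counts the true groundings n_1 and n_2 of the two formulas, and normalizes by Z:

```python
from itertools import product
from math import exp

CONSTS = ["A", "B"]
ATOMS = ([f"Smokes({c})" for c in CONSTS] + [f"Cancer({c})" for c in CONSTS] +
         [f"Friends({x},{y})" for x in CONSTS for y in CONSTS])

def n_true_groundings(w):
    # n1: Smokes(x) => Cancer(x); n2: Friends(x,y) => (Smokes(x) <=> Smokes(y))
    n1 = sum((not w[f"Smokes({x})"]) or w[f"Cancer({x})"] for x in CONSTS)
    n2 = sum((not w[f"Friends({x},{y})"]) or
             (w[f"Smokes({x})"] == w[f"Smokes({y})"])
             for x in CONSTS for y in CONSTS)
    return n1, n2

def unnormalized(w):
    n1, n2 = n_true_groundings(w)
    return exp(1.5 * n1 + 1.1 * n2)

worlds = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=8)]
Z = sum(unnormalized(w) for w in worlds)

x = dict.fromkeys(ATOMS, False)
x.update({"Smokes(A)": True, "Cancer(B)": True, "Friends(A,A)": True,
          "Friends(A,B)": True, "Friends(B,B)": True})
print(n_true_groundings(x))   # (1, 3), matching the exponent 1.5*1 + 1.1*3
print(unnormalized(x) / Z)    # this world's probability
```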
-
RELATION TO FIRST-ORDER LOGIC
Infinite weights → first-order logic
Satisfiable KB, positive weights → satisfying assignments = modes of the distribution
Markov logic allows inconsistencies (contradictions between formulas)
-
MAP INFERENCE IN
MARKOV LOGIC NETWORKS
Problem: Find the most likely state of the world y (the query) given evidence e

$\arg\max_y P(y \mid e)$
-
MAP INFERENCE
Problem: Find the most likely state of the world y given evidence e
n_i is the feature corresponding to formula F_i
w_i is the weight corresponding to formula F_i

$\arg\max_y \frac{1}{Z_e}\exp\left(\sum_i w_i\, n_i(y, e)\right)$
-
MAP INFERENCE
Problem: Find the most likely state of the world y given evidence e
n_i is the feature corresponding to formula F_i
w_i is the weight corresponding to formula F_i

$\arg\max_y \sum_i w_i\, n_i(y, e)$
-
MAP INFERENCE
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B) = true
[Figure: the ground network with Friends(A,B), Friends(B,A), Smokes(B) set to true and the remaining atoms Smokes(A), Cancer(A), Cancer(B), Friends(A,A), Friends(B,B) marked "?"]
-
MAP INFERENCE
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B) = true
MAP state: Smokes(A) = true, Cancer(A) = true, Cancer(B) = true, Friends(A,A) = false, Friends(B,B) = false
-
MAP INFERENCE
Problem: Find the most likely state of the world y given evidence e

$\arg\max_y \sum_i w_i\, n_i(y, e)$

This is the weighted MAX-SAT problem
Use a weighted MAX-SAT solver (e.g. MaxWalkSAT)
Better: Integer Linear Programming
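Before turning to MAX-SAT solvers, the tiny example above can be solved by exhaustive search. The sketch below (our own encoding) clamps the evidence atoms and maximizes the weighted count of true groundings; it recovers the MAP state shown two slides earlier:

```python
from itertools import product

CONSTS = ["A", "B"]
EVIDENCE = {"Friends(A,B)": True, "Friends(B,A)": True, "Smokes(B)": True}
UNKNOWN = ["Smokes(A)", "Cancer(A)", "Cancer(B)", "Friends(A,A)", "Friends(B,B)"]

def score(w):
    n1 = sum((not w[f"Smokes({x})"]) or w[f"Cancer({x})"] for x in CONSTS)
    n2 = sum((not w[f"Friends({x},{y})"]) or
             (w[f"Smokes({x})"] == w[f"Smokes({y})"])
             for x in CONSTS for y in CONSTS)
    return 1.5 * n1 + 1.1 * n2          # sum_i w_i * n_i(y, e); no Z needed

best = max((dict(EVIDENCE, **dict(zip(UNKNOWN, vals)))
            for vals in product([False, True], repeat=len(UNKNOWN))),
           key=score)
print({a: v for a, v in best.items() if a not in EVIDENCE})
# {'Smokes(A)': True, 'Cancer(A)': True, 'Cancer(B)': True,
#  'Friends(A,A)': False, 'Friends(B,B)': False}
```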
-
MAP INFERENCE
¬Smokes(A) ∨ Cancer(A)   1.5
¬Smokes(B) ∨ Cancer(B)   1.5
¬Friends(A,B) ∨ ¬Smokes(A) ∨ Smokes(B)   0.55
¬Friends(A,B) ∨ ¬Smokes(B) ∨ Smokes(A)   0.55
(and the analogous 0.55-weight clause pairs for the other Friends groundings)

1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B)
-
RELATION TO STATISTICAL MODELS
Special cases:
Markov networks
Bayesian networks
Log-linear models
Exponential models
Max. entropy models
Hidden Markov models
Obtained by making all predicates zero-arity
Every probability distribution over discrete or finite-precision numeric variables can be represented as a Markov logic network.
-
MARKOV LOGIC
Declarative language, several challenges:
Inference
Weight Learning
Structure Learning
Many ways to perform probabilistic inference:
Conditional probability query
MAP query
There's a large body of work on probabilistic inference in graphical models
We'll talk about some of these methods and how they can be put to work in Markov logic networks
-
INFERENCE IN GRAPHICAL MODELS
Conditional probability queries P(Y | E = e), where Y ⊆ X, E ⊆ X, and Y and E are disjoint
Let W = X − Y − E be the variables that are neither query nor evidence; then

$P(Y \mid E = e) = \frac{P(Y, e)}{P(e)}$
$P(y, e) = \sum_{w} P(y, e, w)$
$P(e) = \sum_{y} P(y, e)$

P(e) can be computed reusing the previous computation
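These three equations translate directly into code. A sketch with a made-up joint distribution over three binary variables (the table values are illustrative only):

```python
# Hypothetical joint P(A, B, C) over three binary variables; entries sum to 1.
JOINT = {(0,0,0): 0.20, (0,0,1): 0.10, (0,1,0): 0.05, (0,1,1): 0.15,
         (1,0,0): 0.10, (1,0,1): 0.10, (1,1,0): 0.10, (1,1,1): 0.20}

# Query Y = {A}, evidence E: C = 1, summed-out variables W = {B}
def p_y_and_e(a):
    return sum(JOINT[(a, b, 1)] for b in (0, 1))   # P(y,e) = sum_w P(y,e,w)

p_e = sum(p_y_and_e(a) for a in (0, 1))            # P(e) = sum_y P(y,e)
for a in (0, 1):
    print(f"P(A={a} | C=1) = {p_y_and_e(a) / p_e:.3f}")   # P(Y,e) / P(e)
```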
-
COMPLEXITY OF INFERENCE
Summing out the joint is not satisfactory: it leads to the exponential blowup that the graphical-model representation was supposed to avoid
Problem: exponential blow-up in the worst case is unavoidable
Worse: approximate inference is also NP-hard
But: we really care about the cases we encounter in practice, not the worst case
-
COMPLEXITY OF INFERENCE
Theoretical analysis can focus on Bayesian networks, as they can be converted to Markov networks with no increase in representation size
First question: How do we encode a BN?
DAG structure + worst-case representation of each CPD as a table of size |Val({X_i} ∪ Pa_{X_i})|
Decision problem BN-Pr-DP:
Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), decide whether P_B(X = x) > 0
BN-Pr-DP is NP-complete
-
COMPLEXITY OF INFERENCE
BN-Pr-DP is in NP:
We guess an assignment ξ to the network variables
We check whether X = x holds in ξ and whether P(ξ) > 0
The latter can be accomplished in linear time using the chain rule of BNs
BN-Pr-DP is NP-hard:
Reduction from 3-SAT
Given any 3-SAT formula f, we can create a Bayesian network B with a distinguished binary variable X such that f is satisfiable if and only if P_B(X = 1) > 0
The BN has to be constructible in polynomial time
-
COMPLEXITY OF INFERENCE
For each propositional variable q_i, one root node Q_i with P(Q_i = 1) = 0.5
For each clause c_j, one node C_j with an edge from Q_i to C_j if q_i or ¬q_i occurs in the clause c_j
-
COMPLEXITY OF INFERENCE
CPD of the clause node C_1 for c_1 = q_1 ∨ ¬q_3:

              c_1 = 0   c_1 = 1
q_1=0, q_3=0     0         1
q_1=1, q_3=0     0         1
q_1=0, q_3=1     1         0
q_1=1, q_3=1     0         1
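The clause CPDs are deterministic and easy to generate. A sketch that builds the table above from a clause's literals (the literal encoding is ours):

```python
from itertools import product

def clause_cpd(literals, n_parents):
    """Deterministic CPD: maps each parent assignment to P(c = 1).
    A literal is a pair (parent_index, negated)."""
    return {assignment: float(any(assignment[i] != neg for i, neg in literals))
            for assignment in product([0, 1], repeat=n_parents)}

# c1 = q1 v ~q3 with parents (q1, q3): q1 -> (0, False), ~q3 -> (1, True)
for parents, p_true in clause_cpd([(0, False), (1, True)], 2).items():
    print(parents, "->", p_true)
# (0, 0) -> 1.0   (0, 1) -> 0.0   (1, 0) -> 1.0   (1, 1) -> 1.0
```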
-
COMPLEXITY OF INFERENCE
We cannot connect the C_i (i = 1, …, m) directly to the variable X, as the CPD for X would be exponentially large
We introduce m − 2 AND nodes
-
COMPLEXITY OF INFERENCE
Now, X has value 1 if and only if all of the clauses are satisfied
All nodes have at most three parents and, therefore, the size of the BN is polynomial in the size of f
-
COMPLEXITY OF INFERENCE
Prior probability of each possible assignment is 1/2^n
P(X = 1) = number of satisfying assignments of f, divided by 2^n
f has a satisfying assignment iff P(X = 1) > 0
-
COMPLEXITY OF INFERENCE
Probabilistic inference is a numerical problem, not a decision problem
We can use a similar construction to show that the following problem is #P-complete:
Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), compute P_B(X = x)
We have to do a weighted count of instantiations
#P is the set of counting problems associated with the decision problems in NP
A #P problem must be at least as hard as the corresponding NP problem
-
COMPLEXITY OF APPROXIMATE INFERENCE
Goal is to compute P(Y | e)
An estimate r has relative error ε for P(y | e) if:

$\frac{r}{1+\epsilon} \le P(y \mid e) \le r\,(1+\epsilon)$

We can use a similar construction again to show that the following problem is NP-hard:
Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), find a number r that has relative error ε for P_B(X = x)
-
COMPLEXITY OF APPROXIMATE INFERENCE
Goal is to compute P(Y | e)
An estimate r has absolute error ε for P(y | e) if:

$|P(y \mid e) - r| \le \epsilon$

Computing P(X = x) up to some absolute error ε has a randomized polynomial-time algorithm
However, when evidence is introduced, we're back to NP-hardness
The following problem is NP-hard for any ε ∈ (0, 1/2):
Given a BN B over X, a variable X ∈ X, a value x ∈ Val(X), and an observation E = e, find a number r that has absolute error ε for P_B(X = x | e)
-
MONTE CARLO PRINCIPLE
Consider the game of solitaire:
What's the probability of winning a game?
Hard to compute analytically, because winning or losing depends on a complex procedure of reorganizing cards
Let's play a few hands and see empirically how many we do in fact win
Idea: approximate a probability distribution using only samples from that distribution
[Figure: five simulated hands: Lose, Lose, Win, Lose, Lose. Chance of winning is 1 in 5!]
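The Monte Carlo principle in code, as a minimal sketch: play_game() is a stand-in for the complex win/lose procedure, here a dummy random process with a made-up win probability.

```python
import random

def play_game():
    """Stand-in for a full solitaire simulation (made-up win probability)."""
    return random.random() < 0.2

wins = sum(play_game() for _ in range(100_000))
print("estimated P(win) =", wins / 100_000)   # fraction of winning hands, ~0.2
```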
-
SAMPLING FROM A BAYESIAN NETWORK
Generate samples (particles) from a Bayesian
network using a random number generator
[Figure: the student network with CPTs over Difficulty (d0 easy, d1 difficult), Intelligence (i0 normal, i1 high), Grade (g1 A, g2 B, g3 C), SAT (s0 low, s1 high), and Letter (l0 weak, l1 strong)]
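A sketch of forward (ancestral) sampling from the student network in the figure: each variable is drawn given its already-sampled parents, in topological order. The CPT numbers below are illustrative, not taken from the slides.

```python
import random

def draw(dist):
    """Sample a value from {value: probability}."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value                       # guard against rounding error

def sample_student():
    d = draw({"easy": 0.6, "difficult": 0.4})
    i = draw({"normal": 0.7, "high": 0.3})
    g = draw({("easy", "normal"):      {"A": 0.30, "B": 0.40, "C": 0.30},
              ("easy", "high"):        {"A": 0.90, "B": 0.08, "C": 0.02},
              ("difficult", "normal"): {"A": 0.05, "B": 0.25, "C": 0.70},
              ("difficult", "high"):   {"A": 0.50, "B": 0.30, "C": 0.20}}[(d, i)])
    s = draw({"low": 0.95, "high": 0.05} if i == "normal"
             else {"low": 0.20, "high": 0.80})
    l = draw({"A": {"weak": 0.10, "strong": 0.90},
              "B": {"weak": 0.40, "strong": 0.60},
              "C": {"weak": 0.99, "strong": 0.01}}[g])
    return {"D": d, "I": i, "G": g, "S": s, "L": l}

print(sample_student())   # e.g. {'D': 'easy', 'I': 'normal', 'G': 'B', ...}
```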
-
SAMPLING
One sample can be computed in linear time
The sampling process generates a set of particles D = {x[1], …, x[M]}
When computing P(y), the estimate is simply the fraction of particles in which y was seen:

$\hat{P}_D(y) = \frac{1}{M}\sum_{m=1}^{M} \mathbf{1}\{y[m] = y\}$

with 1{·} the indicator function and y[m] the assignment to Y in particle x[m]
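The estimator in code, reusing the hypothetical sample_student() sketch from the forward-sampling block above:

```python
particles = [sample_student() for _ in range(10_000)]   # D = {x[1], ..., x[M]}
p_hat = sum(x["G"] == "A" for x in particles) / len(particles)
print("estimated P(G = A) =", p_hat)    # fraction of particles where y holds
```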
-
EXAMPLE: BAYESIAN NETWORK INFERENCE
Suppose we have a Bayesian network with variables X
Our state space is the set of all possible assignments of values to the variables
We can draw a sample in time that is linear in the size of the network
Draw N samples, use them to approximate the joint
1st sample: D=d0, I=i1, G=g2, S=s0, L=l1
2nd sample: D=d1, I=i1, G=g1, S=s1, L=l1
-
REJECTION SAMPLING
Suppose we have a Bayesian network with variables X
We wish to condition on some evidence E = e and compute the posterior over Y = X − E
Draw samples and reject them when they are not compatible with the evidence e
Inefficient if the evidence is itself improbable: we must reject a large number of samples
1st sample: D=d0, I=i1, G=g2, S=s0, L=l1 → reject
2nd sample: D=d1, I=i1, G=g1, S=s1, L=l1 → accept
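A sketch of rejection sampling, again reusing the hypothetical sample_student() from above; note how few particles survive when the evidence is improbable, which is exactly the slide's caveat.

```python
def rejection_sample(n, evidence):
    """Draw n particles; keep only those compatible with the evidence."""
    kept = []
    for _ in range(n):
        x = sample_student()
        if all(x[var] == val for var, val in evidence.items()):
            kept.append(x)                        # accept
    return kept

kept = rejection_sample(100_000, {"L": "strong", "S": "high"})
print("accepted:", len(kept))                     # most samples are rejected
if kept:
    print("P(I=high | L=strong, S=high) ~",
          sum(x["I"] == "high" for x in kept) / len(kept))
```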
-
SAMPLING IN MARKOV LOGIC NETWORKS
Sampling is performed on the ground Markov logic network
Alchemy uses a variant of the MCMC (Markov chain Monte Carlo) method
Can answer arbitrary queries of the form P(F_i | MLN_{L,C})
Example: P(Cancer(Alice) | MLN_{L,C})
-
MAP INFERENCE IN GRAPHICAL MODELS
The following problem is NP-complete:
Given a BN B over X and a number t, decide whether there exists an assignment ξ to X such that P(ξ) > t
There exist several algorithms for MAP inference with reasonable performance on most practical problems
-
MAP INFERENCE IN MARKOV LOGIC
NETWORKS
Problem: Find the most likely state of the world y given evidence e
n_i is the feature corresponding to formula F_i
w_i is the weight corresponding to formula F_i

$\arg\max_y \sum_i w_i\, n_i(y, e)$
-
MAP INFERENCE IN MARKOV LOGIC
NETWORKS
Problem: Find the most likely state of the world y given evidence e

$\arg\max_y \sum_i w_i\, n_i(y, e)$

This is the weighted MAX-SAT problem
Use a weighted MAX-SAT solver (e.g. MaxWalkSAT)
Better: Integer Linear Programming
-
THE MAXWALKSAT ALGORITHM
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if weight(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip the variable in c that maximizes weight(sat. clauses)
return failure, best solution found
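A runnable Python rendering of the pseudocode above (a sketch of MaxWalkSAT, not the reference implementation); a clause is a weighted list of (variable, wanted truth value) literals.

```python
import random

def sat_weight(clauses, assign):
    """Total weight of clauses satisfied under the assignment."""
    return sum(w for lits, w in clauses
               if any(assign[v] == wanted for v, wanted in lits))

def maxwalksat(clauses, variables, max_tries=10, max_flips=1000, p=0.5,
               threshold=None):
    if threshold is None:
        threshold = sum(w for _, w in clauses) - 1e-9   # demand all weight
    best = None
    for _ in range(max_tries):
        assign = {v: random.random() < 0.5 for v in variables}  # random start
        for _ in range(max_flips):
            if sat_weight(clauses, assign) > threshold:
                return assign
            unsat = [lits for lits, w in clauses
                     if not any(assign[v] == t for v, t in lits)]
            if not unsat:
                return assign
            c = random.choice(unsat)            # random unsatisfied clause
            if random.random() < p:
                v, _ = random.choice(c)         # noisy move: random var in c
            else:                               # greedy move: best flip in c
                v = max((var for var, _ in c),
                        key=lambda var: sat_weight(
                            clauses, {**assign, var: not assign[var]}))
            assign[v] = not assign[v]
            if best is None or sat_weight(clauses, assign) > sat_weight(clauses, best):
                best = dict(assign)
    return best                                 # failure: best solution found

# Tiny usage example with two weighted ground clauses from the running example
clauses = [([("Smokes(A)", False), ("Cancer(A)", True)], 1.5),
           ([("Smokes(B)", False), ("Cancer(B)", True)], 1.5)]
print(maxwalksat(clauses, ["Smokes(A)", "Cancer(A)", "Smokes(B)", "Cancer(B)"]))
```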
-
MAP INFERENCE IN MARKOV LOGIC
NETWORKS
We've tried Alchemy (and MaxWalkSAT) and the results were poor
Better results with integer linear programming (ILP)
ILP performs exact inference
Works very well on the problems we are concerned with
Originated in the field of operations research
-
LINEAR PROGRAMMING
A linear programming problem is the problem of maximizing (or minimizing) a linear function subject to a finite number of linear constraints
Standard form of linear programming:

maximize   $\sum_{j=1}^{n} c_j x_j$
subject to $\sum_{j=1}^{n} a_{ij} x_j \le b_i \quad (i = 1, \ldots, m)$
           $x_j \ge 0 \quad (j = 1, \ldots, n)$
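As a sketch, a small instance of this standard form solved with SciPy's linprog (which minimizes, so the objective is negated); all coefficients below are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])            # made-up objective coefficients c_j
A = np.array([[1.0, 1.0],           # made-up constraint coefficients a_ij
              [2.0, 1.0]])
b = np.array([4.0, 5.0])            # right-hand sides b_i

# linprog minimizes, so negate c to maximize; bounds encode x_j >= 0
res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print("x* =", res.x, "objective =", -res.fun)   # x* = [1. 3.], objective = 9.0
```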
-
INTEGER LINEAR PROGRAMMING
An integer linear programming problem is the problem of maximizing (or minimizing) a linear function subject to a finite number of linear constraints
Difference to LP: variables are only allowed to take integer values

maximize   $\sum_{j=1}^{n} c_j x_j$
subject to $\sum_{j=1}^{n} a_{ij} x_j \le b_i \quad (i = 1, \ldots, m)$
           $x_j \ge 0 \quad (j = 1, \ldots, n)$
           $x_j \in \{\ldots, -1, 0, 1, \ldots\}$
-
MAP INFERENCE
¬Smokes(A) ∨ Cancer(A)   1.5
¬Smokes(B) ∨ Cancer(B)   1.5
¬Friends(A,B) ∨ ¬Smokes(A) ∨ Smokes(B)   0.55
¬Friends(A,B) ∨ ¬Smokes(B) ∨ Smokes(A)   0.55
(and the analogous 0.55-weight clause pairs for the other Friends groundings)

1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B)
-
MAP INFERENCE - EXAMPLE
¬Smokes(A) ∨ Cancer(A)   1.5
Introduce a new variable for each ground atom: s_a, c_a
Introduce a new variable for each ground clause: x_j
Add the following three constraints:
s_a + x_j ≥ 1
−c_a + x_j ≥ 0
−x_j − s_a + c_a ≥ −1
Add 1.5·x_j to the objective function

maximize   $\sum_{j=1}^{n} c_j x_j$
subject to $\sum_{j=1}^{n} a_{ij} x_j \ge b_i \quad (i = 1, \ldots, m)$
           $x_j \in \{0, 1\}$
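A sketch of this encoding using the PuLP modeling library (variable names are ours): one binary variable per ground atom, one per clause, the three constraints above, and the weighted clause variable in the objective.

```python
from pulp import LpProblem, LpVariable, LpMaximize, value

prob = LpProblem("map_inference", LpMaximize)
s_a = LpVariable("smokes_A", cat="Binary")     # ground atom Smokes(A)
c_a = LpVariable("cancer_A", cat="Binary")     # ground atom Cancer(A)
x_j = LpVariable("clause_j", cat="Binary")     # ~Smokes(A) v Cancer(A), w = 1.5

prob += 1.5 * x_j                              # objective: add 1.5 * x_j
prob += s_a + x_j >= 1                         # the slide's three constraints
prob += -c_a + x_j >= 0
prob += -x_j - s_a + c_a >= -1

prob.solve()
print({v.name: value(v) for v in prob.variables()})
```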
-
TOMORROW
Ontology Matching with Markov Logic
Weight Learning
Experiments