
Transcript of nasslli2010_4

  • Slide 1

    MARKOV LOGIC: NASSLLI 2010

    Mathias Niepert

  • Slide 2

    MARKOV LOGIC: INTUITION

    A logical KB is a set of hard constraints

    on the set of possible worlds

    Let's make some of them soft constraints:

    When a world violates a formula, it becomes less

    probable, not impossible

    Give each formula a weight

    (Higher weight → stronger constraint)

    $P(\text{world}) \propto \exp\left(\sum \text{weights of formulas it satisfies}\right)$

  • Slide 3

    MARKOV LOGIC: DEFINITION

    A Markov Logic Network (MLN) is a set of pairs

    $(F_i, w_i)$ where

    $F_i$ is a formula in first-order logic; $w_i$ is a real-valued weight

    Together with a finite set of constants C, it defines a Markov network with

    One binary node for each grounding of each predicate

    in the MLN. The value of the node is 1 if the ground

    atom is true, and 0 otherwise.

    One feature for each grounding of each formula $F_i$ in the MLN, with the corresponding weight $w_i$
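    As a concrete illustration of this definition, the following is a minimal sketch (not Alchemy's representation) that enumerates the binary nodes of the ground network; the predicate names and constants are assumptions chosen to match the example used later in the lecture.

    # One binary node per grounding of each predicate over the constants C.
    from itertools import product

    predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}   # predicate -> arity
    constants = ["Anna", "Bob"]                             # the finite set C

    ground_atoms = [(pred, args)
                    for pred, arity in predicates.items()
                    for args in product(constants, repeat=arity)]

    print(len(ground_atoms))   # 2 + 2 + 4 = 8 ground atoms (nodes) for two constants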

  • Slide 4

    LOG-LINEAR MODELS

    A distribution is a log-linear model over a Markov network H if it is associated with

    A set of features $F = \{f_1(D_1), \ldots, f_k(D_k)\}$, where each $D_i$ is a complete subgraph (clique) of $H$,
    A set of weights $w_1, \ldots, w_k$ such that

    $P(X_1, \ldots, X_n) = \frac{1}{Z}\exp\left[\sum_{i=1}^{k} w_i f_i(D_i)\right]$
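    A small numerical sketch of this definition (a toy model, not an example from the slides): two binary variables forming a single clique, one "agreement" feature, and one made-up weight.

    # P(X1, X2) = (1/Z) exp( w1 * f1(X1, X2) ) for a single-edge Markov network.
    from itertools import product
    from math import exp

    features = [lambda x1, x2: float(x1 == x2)]   # f1 fires when X1 = X2
    weights = [1.1]                               # illustrative weight

    def unnormalized(x1, x2):
        return exp(sum(w * f(x1, x2) for w, f in zip(weights, features)))

    Z = sum(unnormalized(*x) for x in product([0, 1], repeat=2))
    print({x: round(unnormalized(*x) / Z, 3) for x in product([0, 1], repeat=2)})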

  • Slide 5

    ASSUMPTIONS

    1. Unique names. Different constants refer to

    different objects. (Genesereth & Nilsson, 1987)

    2. Domain closure. The only objects in the

    domain are those representable using the

    constant and function symbols (Genesereth &

    Nilsson, 1987)

    3. Known functions. For each function, the value

    of that function applied to every possible tuple of

    arguments is known, and is an element of C

  • Slide 6

    EXAMPLE: FRIENDS & SMOKERS

    Smoking causes cancer.
    Friends have similar smoking habits.

  • Slide 7

    EXAMPLE: FRIENDS & SMOKERS

    Smoking causes cancer:  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    Friends have similar smoking habits:  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$

  • Slide 8

    EXAMPLE: FRIENDS & SMOKERS

    1.5  Smoking causes cancer:  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  Friends have similar smoking habits:  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$

  • Slide 9

    EXAMPLE: FRIENDS & SMOKERS

    1.5  Smoking causes cancer:  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  Friends have similar smoking habits:  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)

  • Slide 10

    EXAMPLE: FRIENDS & SMOKERS

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    [Ground network: nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

  • Slide 11

    EXAMPLE: FRIENDS & SMOKERS

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    [Ground network: nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]


  • Slide 14

    MARKOV LOGIC NETWORKS

    An MLN is a template for ground Markov networks
    Probability of a world $x$:
    $P(x) = \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(x)\right)$
    where $w_i$ is the weight of formula $i$ and $n_i(x)$ is the number of true groundings of formula $i$ in $x$
    Typed variables and constants greatly reduce the size of the ground Markov net
    Functions, existential quantifiers, infinite and continuous domains, etc. are possible

  • Slide 15

    EXAMPLE: FRIENDS & SMOKERS

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)

    Ground formulas and weights:
    1.5  $Smokes(A) \Rightarrow Cancer(A)$
    1.5  $Smokes(B) \Rightarrow Cancer(B)$
    1.1  $Friends(A,A) \Rightarrow (Smokes(A) \Leftrightarrow Smokes(A))$
    1.1  $Friends(A,B) \Rightarrow (Smokes(A) \Leftrightarrow Smokes(B))$
    1.1  $Friends(B,A) \Rightarrow (Smokes(B) \Leftrightarrow Smokes(A))$
    1.1  $Friends(B,B) \Rightarrow (Smokes(B) \Leftrightarrow Smokes(B))$

    $P(Smokes(A){=}T, Smokes(B){=}F, Friends(A,A){=}T, Friends(A,B){=}T, Friends(B,A){=}F, Friends(B,B){=}T, Cancer(A){=}F, Cancer(B){=}T)$
    $= \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(x)\right) = \frac{1}{Z}\exp(1.5 \cdot 1 + 1.1 \cdot 3)$
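    The counts in this computation can be checked mechanically. The following is a minimal sketch (my own, not Alchemy or any tool from the slides) that recomputes $n_1(x)=1$ and $n_2(x)=3$ for the world above and the resulting unnormalized weight.

    # Sketch: recompute n_i(x) for the world above and exp(1.5*1 + 1.1*3).
    from itertools import product
    from math import exp

    world = {("Smokes", ("A",)): True,  ("Smokes", ("B",)): False,
             ("Cancer", ("A",)): False, ("Cancer", ("B",)): True,
             ("Friends", ("A", "A")): True,  ("Friends", ("A", "B")): True,
             ("Friends", ("B", "A")): False, ("Friends", ("B", "B")): True}
    consts = ["A", "B"]
    implies = lambda a, b: (not a) or b

    # n1: true groundings of  Smokes(x) => Cancer(x)
    n1 = sum(implies(world[("Smokes", (x,))], world[("Cancer", (x,))]) for x in consts)
    # n2: true groundings of  Friends(x,y) => (Smokes(x) <=> Smokes(y))
    n2 = sum(implies(world[("Friends", (x, y))],
                     world[("Smokes", (x,))] == world[("Smokes", (y,))])
             for x, y in product(consts, repeat=2))

    print(n1, n2)                    # 1 3
    print(exp(1.5 * n1 + 1.1 * n2))  # unnormalized probability, before dividing by Z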

  • Slide 16

    RELATION TO FIRST-ORDER LOGIC

    Infinite weights → first-order logic
    Satisfiable KB, positive weights → satisfying assignments = modes of the distribution
    Markov logic allows inconsistencies (contradictions between formulas)

  • Slide 17

    MAP INFERENCE IN

    MARKOV LOGIC NETWORKS

    Problem: Find most likely state of world y given

    evidence e

    $\arg\max_y P(y \mid e)$    (y: query, e: evidence)

  • Slide 18

    MAP INFERENCE

    Problem: Find most likely state of world y given

    evidence e

    $n_i$ is the feature corresponding to formula $F_i$
    $w_i$ is the weight corresponding to formula $F_i$
    $\arg\max_y \frac{1}{Z}\exp\left(\sum_i w_i\, n_i(y, e)\right)$

  • Slide 19

    MAP INFERENCE

    Problem: Find most likely state of world y given

    evidence e

    $n_i$ is the feature corresponding to formula $F_i$
    $w_i$ is the weight corresponding to formula $F_i$
    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 20

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B) are true
    [Ground network: Friends(A,B), Friends(B,A), Smokes(B) set to true; Smokes(A), Cancer(A), Cancer(B), Friends(A,A), Friends(B,B) unknown (?)]

  • Slide 21

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B) are true
    [Ground network, MAP state: Smokes(A), Cancer(A), Cancer(B) true; Friends(A,A), Friends(B,B) false]

  • Slide 22

    MAP INFERENCE

    Problem: Find most likely state of world given

    evidence e

    This is the weighted MAX-SAT problem

    Use a weighted MAX-SAT solver (e.g., MaxWalkSAT)
    Better: integer linear programming

    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 23

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B)

    Ground clauses and weights:
    $\lnot Smokes(A) \lor Cancer(A)$  1.5
    $\lnot Smokes(B) \lor Cancer(B)$  1.5
    $\lnot Smokes(A) \lor Cancer(B)$  1.5
    $\lnot Smokes(B) \lor Cancer(A)$  1.5
    $\lnot Friends(A,B) \lor \lnot Smokes(A) \lor Smokes(B)$  0.55
    $\lnot Friends(A,B) \lor \lnot Smokes(B) \lor Smokes(A)$  0.55

  • Slide 24

    RELATION TO STATISTICAL MODELS

    Special cases: Markov networks

    Bayesian networks

    Log-linear models

    Exponential models

    Max. entropy models

    Hidden Markov models

    Obtained by making all predicates zero-arity

    Every probability distribution over discrete or finite-

    precision numeric variables can be represented as a

    Markov logic network.

  • Slide 25

    MARKOV LOGIC

    Declarative language, several challenges

    Inference

    Weight Learning

    Structure Learning

    Many ways to perform probabilistic inference

    Conditional probability query

    MAP query

    There's a large body of work on probabilistic

    inference in graphical models

    We'll talk about some of these methods and how

    they can be put to work in Markov logic networks

  • Slide 26

    INFERENCE IN GRAPHICAL MODELS

    Conditional probability queries $P(Y \mid E = e)$, where $Y \subseteq X$, $E \subseteq X$, and $Y$ and $E$ are disjoint
    Let $W = X \setminus (Y \cup E)$ be the variables that are neither query nor evidence; then
    $P(Y \mid E = e) = \frac{P(Y, e)}{P(e)}$
    $P(y, e) = \sum_w P(y, e, w)$
    $P(e) = \sum_y P(y, e)$
    $P(e)$ can be computed by reusing the previous computation
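    As a minimal sketch of these three equations (toy numbers, not a slide example), take three binary variables Y, E, W with an explicit joint table and compute P(Y | E = 1) by summing out W and renormalizing.

    from itertools import product

    # Toy joint P(y, e, w); the eight probabilities are made up and sum to 1.
    probs = [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10]
    joint = dict(zip(product([0, 1], repeat=3), probs))

    e_obs = 1
    p_y_e = {y: sum(joint[(y, e_obs, w)] for w in [0, 1]) for y in [0, 1]}  # P(y, e) = sum_w P(y, e, w)
    p_e = sum(p_y_e.values())                                               # P(e)   = sum_y P(y, e)
    print({y: p_y_e[y] / p_e for y in [0, 1]})                              # P(Y | E = e)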

  • Slide 27

    COMPLEXITY OF INFERENCE

    The process of summing out the joint is not satisfactory, as it leads to the exponential blowup that the graphical model representation was supposed to avoid

    Problem: Exponential blow-up in the worst case is

    unavoidable

    Worse: Approximate inference is also NP-hard

    But: we really care about the cases that we encounter in practice, not the worst case

  • Slide 28

    COMPLEXITY OF INFERENCE

    Theoretical analysis can focus on Bayesian

    networks as they can be converted to Markov

    networks with no increase in representation size

    First question: How do we encode a BN?

    DAG structure + worst-case representation of each CPD as a table of size $|Val(\{X_i\} \cup Pa_{X_i})|$

    Decision Problem BN-Pr-DP:

    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, and a value $x \in Val(X)$, decide whether $P_B(X = x) > 0$

    BN-Pr-DP is NP-complete

  • Slide 29

    COMPLEXITY OF INFERENCE

    BN-Pr-DP is in NP:

    We guess an assignment $\mathbf{x}$ to the network variables
    We check whether $X = x$ holds in $\mathbf{x}$ and whether $P(\mathbf{x}) > 0$

    The latter can be accomplished in linear time using

    the chain rule of BNs

    BN-Pr-DP is NP-hard:

    Reduction from 3-SAT

    Given any 3-SAT formula $f$ we can create a Bayesian network $B$ with a distinguished binary variable $X$ such that $f$ is satisfiable if and only if $P_B(X = x^1) > 0$
    The BN has to be constructible in polynomial time

  • Slide 30

    COMPLEXITY OF INFERENCE

    For each propositional variable $q_i$, one root node $Q_i$ with $P(q_i^1) = 0.5$
    For each clause $c_j$, one node $C_j$ with an edge from $Q_i$ to $C_j$ if $q_i$ or $\lnot q_i$ occurs in clause $c_j$

  • Slide 31

    COMPLEXITY OF INFERENCE

    CPD of the clause node for $c_1 = q_1 \lor \lnot q_3$:

    q1  q3 | P(c1=0)  P(c1=1)
     0   0 |    0        1
     1   0 |    0        1
     0   1 |    1        0
     1   1 |    0        1
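    The deterministic CPD of a clause node can be written down directly. Below is a small sketch (the function name and representation are my own) that rebuilds the table above for $c_1 = q_1 \lor \lnot q_3$.

    from itertools import product

    def clause_cpt(literals):
        """literals: list of (variable, polarity). Returns P(clause = 1 | parents)."""
        cpt = {}
        for values in product([0, 1], repeat=len(literals)):
            satisfied = any(val == (1 if positive else 0)
                            for (_, positive), val in zip(literals, values))
            cpt[values] = 1.0 if satisfied else 0.0
        return cpt

    # c1 = q1 v ~q3: parents (q1, q3)
    for (q1, q3), p in clause_cpt([("q1", True), ("q3", False)]).items():
        print(f"q1={q1} q3={q3}  P(c1=1)={p}")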

  • Slide 32

    COMPLEXITY OF INFERENCE

    We cannot connect the $C_i$ ($i = 1, \ldots, m$) directly to the variable $X$, as the CPD for $X$ would be exponentially large
    We introduce $m-2$ AND nodes

  • Slide 33

    COMPLEXITY OF INFERENCE

    Now, $X$ has value 1 if and only if all of the clauses

    are satisfied

    All nodes have at most three parents and,

    therefore, the size of the BN is polynomial in the

    size of $f$

  • Slide 34

    COMPLEXITY OF INFERENCE

    The prior probability of each possible assignment is $1/2^n$
    $P(X = x^1)$ = number of satisfying assignments to $f$, divided by $2^n$
    $f$ has a satisfying assignment iff $P(x^1) > 0$

  • Slide 35

    COMPLEXITY OF INFERENCE

    Probabilistic inference is a numerical problem, not a decision problem

    We can use a similar construction to show the

    following problem is #P-complete

    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, and a value $x \in Val(X)$, compute $P_B(X = x)$

    We have to do a weighted count of instantiations

    #P is the set of the counting problems associated

    with the decision problems in the set NP

    A #P problem must be at least as hard as the corresponding NP problem

  • Slide 36

    COMPLEXITY OF APPROXIMATE INFERENCE

    Goal is to compute P(Y|e)

    An estimate $r$ has relative error $\epsilon$ for $P(y \mid e)$ if:
    $\frac{r}{1+\epsilon} \le P(y \mid e) \le r(1+\epsilon)$
    We can use a similar construction again to show that the following problem is NP-hard:
    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, and a value $x \in Val(X)$, find a number $r$ that has relative error $\epsilon$ for $P_B(X = x)$

  • Slide 37

    COMPLEXITY OF APPROXIMATE INFERENCE

    Goal is to compute P(Y|e)

    An estimate $r$ has absolute error $\epsilon$ for $P(y \mid e)$ if:
    $|r - P(y \mid e)| \le \epsilon$
    Computing $P(X = x)$ up to some absolute error $\epsilon$ has a randomized polynomial-time algorithm
    However, when evidence is introduced, we are back to NP-hardness
    The following problem is NP-hard for any $\epsilon \in (0, 1/2)$:
    Given a BN $B$ over $\mathcal{X}$, a variable $X \in \mathcal{X}$, a value $x \in Val(X)$, and an observation $E = e$, find a number $r$ that has absolute error $\epsilon$ for $P_B(X = x \mid e)$

  • Slide 38

    MONTE CARLO PRINCIPLE

    Consider the game of solitaire:

    What's the probability of winning

    a game?

    Hard to compute analytically

    because winning or losing depends on a complex procedure

    of reorganizing cards

    Let's play a few hands, and see
    empirically how many do in fact win
    Idea: Approximate a probability

    distribution using only samples

    from that distribution

    [Figure: five sample games: Lose, Lose, Win, Lose, Lose]
    Chance of winning is 1 in 5!
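    In code, the Monte Carlo principle is just this (a sketch; play_game() is a stand-in for the complex card-reorganizing procedure, with an assumed 20% true win rate):

    import random

    def play_game():                      # hypothetical solitaire simulator
        return random.random() < 0.2      # assumed true win probability

    plays = [play_game() for _ in range(10000)]
    print(sum(plays) / len(plays))        # empirical win rate, roughly "1 in 5"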

  • Slide 39

    SAMPLING FROM A BAYESIAN NETWORK

    Generate samples (particles) from a Bayesian

    network using a random number generator

    [Figure: Bayesian network with nodes Difficulty (d0 easy, d1 difficult), Intelligence (i0 normal, i1 high), Grade (g1 A, g2 B, g3 C), SAT (s0 low, s1 high), and Letter (l0 weak, l1 strong), with their CPTs]
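    A runnable sketch of forward sampling for the network in the figure; the CPD numbers below are illustrative stand-ins, not the values from the original figure. Variables are sampled in topological order, so one particle costs time linear in the network size.

    import random

    def sample_from(dist):                 # draw a value from {value: probability}
        r, acc = random.random(), 0.0
        for value, p in dist.items():
            acc += p
            if r < acc:
                return value
        return value

    def forward_sample():
        d = sample_from({"d0": 0.6, "d1": 0.4})                      # Difficulty
        i = sample_from({"i0": 0.7, "i1": 0.3})                      # Intelligence
        grade_cpd = {("d0", "i0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
                     ("d0", "i1"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
                     ("d1", "i0"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
                     ("d1", "i1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2}}
        g = sample_from(grade_cpd[(d, i)])                           # Grade | D, I
        s = sample_from({"s0": 0.95, "s1": 0.05} if i == "i0"
                        else {"s0": 0.2, "s1": 0.8})                 # SAT | I
        l = sample_from({"l0": 0.9, "l1": 0.1} if g == "g3"
                        else {"l0": 0.4, "l1": 0.6})                 # Letter | G
        return {"D": d, "I": i, "G": g, "S": s, "L": l}

    print(forward_sample())   # one particle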


  • Slide 44

    SAMPLING

    One sample can be computed in linear time

    The sampling process generates a set of particles $D = \{x[1], \ldots, x[M]\}$
    When computing $P(y)$, the estimate is simply the fraction of particles in which $y$ was seen:
    $\hat{P}_D(y) = \frac{1}{M}\sum_{m=1}^{M} \mathbf{1}\{y[m] = y\}$
    with $\mathbf{1}$ the indicator function and $y[m]$ the assignment to $Y$ in particle $x[m]$
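    A one-line instance of this estimator (toy particles, assumed values):

    particles = ["win", "lose", "lose", "win", "lose", "lose", "lose", "win", "lose", "lose"]
    y = "win"
    p_hat = sum(1 for y_m in particles if y_m == y) / len(particles)   # (1/M) sum 1{y[m] = y}
    print(p_hat)   # 0.3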

  • Slide 45

    EXAMPLE: BAYESIAN NETWORK INFERENCE

    Suppose we have a Bayesian network with variables $X$
    Our state space is the set of all possible assignments of values to the variables
    We can draw a sample in time that is linear in the size of the network
    Draw $N$ samples, use them to

    approximate the joint

    1st sample: D=d0,I=i1,G=g2,S=s0, L=l1

    2nd sample: D=d1,I=i1,G=g1,S=s1, L=l1

  • Slide 46

    REJECTION SAMPLING

    Suppose we have a Bayesian network with variables $X$
    We wish to condition on some evidence $E = e$ and compute the posterior over $Y = X - E$

    Draw samples and reject them

    when not compatible with

    evidence e

    Inefficient if the evidence is itself improbable: we must reject a large number of samples

    1st sample: D=d0,I=i1,G=g2,S=s0, L=l1 reject

    2nd sample: D=d1,I=i1,G=g1,S=s1, L=l1 accept
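    A minimal rejection-sampling sketch (a two-variable stand-in for forward sampling from the full network; the probabilities are assumptions): draw from the prior, keep only particles that agree with the evidence, and estimate the posterior from the survivors.

    import random

    def sample_joint():                                  # stand-in forward sampler
        i = "i1" if random.random() < 0.3 else "i0"      # P(i1) = 0.3 (assumed)
        p_l1 = 0.8 if i == "i1" else 0.4                 # P(l1 | i) (assumed)
        l = "l1" if random.random() < p_l1 else "l0"
        return {"I": i, "L": l}

    evidence = {"L": "l1"}
    accepted = [x for x in (sample_joint() for _ in range(20000))
                if all(x[k] == v for k, v in evidence.items())]
    print(sum(x["I"] == "i1" for x in accepted) / len(accepted))   # estimate of P(i1 | l1)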

  • Slide 47

    SAMPLING IN MARKOV LOGIC NETWORKS

    Sampling is performed on the ground Markov

    logic network

    Alchemy uses a variant of the MCMC (Markov

    Chain Monte Carlo) method

    Can answer arbitrary queries of the form $P(F_i \mid M_{L,C})$, where $M_{L,C}$ is the ground Markov network defined by the MLN $L$ and the constants $C$
    Example: $P(Cancer(Alice) \mid M_{L,C})$

  • Slide 48

    MAP INFERENCE IN GRAPHICAL MODELS

    The following problem is NP-complete:

    Given a BN $B$ over $\mathcal{X}$ and a number $t$, decide whether there exists an assignment $\mathbf{x}$ to $\mathcal{X}$ such that $P(\mathbf{x}) > t$.

    There exist several algorithms for MAP inference with reasonable performance on most practical

    problems

  • Slide 49

    MAP INFERENCE IN MARKOV LOGIC

    NETWORKS

    Problem: Find most likely state of world y given

    evidence e

    $n_i$ is the feature corresponding to formula $F_i$
    $w_i$ is the weight corresponding to formula $F_i$
    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 50

    MAP INFERENCE IN MARKOV LOGIC

    NETWORKS

    Problem: Find most likely state of world given

    evidence e

    This is the weighted MAX-SAT problem

    Use a weighted MAX-SAT solver (e.g., MaxWalkSAT)
    Better: integer linear programming
    $\arg\max_y \sum_i w_i\, n_i(y, e)$

  • Slide 51

    THE MAXWALKSAT ALGORITHM

    for i ← 1 to max-tries do
        solution = random truth assignment
        for j ← 1 to max-flips do
            if weights(sat. clauses) > threshold then
                return solution
            c ← random unsatisfied clause
            with probability p
                flip a random variable in c
            else
                flip variable in c that maximizes weights(sat. clauses)
    return failure, best solution found
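    A compact, runnable sketch of the loop above (my own rendering, not Alchemy's implementation). A clause is a (weight, literals) pair and a literal is (ground atom, polarity); the small clause set at the bottom is taken from the Friends & Smokers example.

    import random

    def maxwalksat(clauses, variables, max_tries=10, max_flips=1000, p=0.5, threshold=None):
        def sat_weight(assign):                       # weights(sat. clauses)
            return sum(w for w, lits in clauses
                       if any(assign[v] == pos for v, pos in lits))
        threshold = sum(w for w, _ in clauses) if threshold is None else threshold
        best = None, float("-inf")
        for _ in range(max_tries):
            assign = {v: random.choice([True, False]) for v in variables}
            for _ in range(max_flips):
                weight = sat_weight(assign)
                if weight > best[1]:
                    best = dict(assign), weight
                if weight >= threshold:
                    return assign, weight
                unsat = [lits for w, lits in clauses
                         if not any(assign[v] == pos for v, pos in lits)]
                if not unsat:
                    return assign, weight
                c = random.choice(unsat)              # random unsatisfied clause
                if random.random() < p:
                    v = random.choice(c)[0]           # flip a random variable in c
                else:                                 # greedy: best flip within c
                    v = max((var for var, _ in c),
                            key=lambda var: sat_weight({**assign, var: not assign[var]}))
                assign[v] = not assign[v]
        return best                                   # failure: best assignment found

    clauses = [(1.5, [("Smokes(A)", False), ("Cancer(A)", True)]),
               (1.5, [("Smokes(B)", False), ("Cancer(B)", True)]),
               (0.55, [("Friends(A,B)", False), ("Smokes(B)", False), ("Smokes(A)", True)])]
    atoms = {v for _, lits in clauses for v, _ in lits}
    print(maxwalksat(clauses, atoms))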

  • Slide 52

    MAP INFERENCE IN MARKOV LOGIC

    NETWORKS

    We've tried Alchemy (and MaxWalkSAT) and the

    results were poor

    Better results with integer linear programming

    (ILP)

    ILP performs exact inference

    Works very well on the problems we are

    concerned with

    Originated in the field of operations research

  • Slide 53

    LINEAR PROGRAMMING

    A linear programming problem is the problem of

    maximizing (or minimizing) a linear function subject

    to a finite number of linear constraints

    Standard form of linear programming:
    maximize $\sum_{j=1}^{n} c_j x_j$
    subject to $\sum_{j=1}^{n} a_{ij} x_j \ge b_i \quad (i = 1, 2, \ldots, m)$
    $x_j \ge 0 \quad (j = 1, 2, \ldots, n)$

  • Slide 54


    INTEGER LINEAR PROGRAMMING

    An integer linear programming problem is the

    problem of maximizing (or minimizing) a linear

    function subject to a finite number of linear

    constraints

    Difference to LP: variables are only allowed to have integer values

    maximize $\sum_{j=1}^{n} c_j x_j$
    subject to $\sum_{j=1}^{n} a_{ij} x_j \ge b_i \quad (i = 1, 2, \ldots, m)$
    $x_j \ge 0 \quad (j = 1, 2, \ldots, n)$
    $x_j \in \{\ldots, -1, 0, 1, \ldots\}$

  • Slide 55

    MAP INFERENCE

    1.5  $\forall x\; Smokes(x) \Rightarrow Cancer(x)$
    1.1  $\forall x,y\; Friends(x,y) \Rightarrow (Smokes(x) \Leftrightarrow Smokes(y))$
    Two constants: Anna (A) and Bob (B)
    Evidence: Friends(A,B), Friends(B,A), Smokes(B)

    Ground clauses and weights:
    $\lnot Smokes(A) \lor Cancer(A)$  1.5
    $\lnot Smokes(B) \lor Cancer(B)$  1.5
    $\lnot Smokes(A) \lor Cancer(B)$  1.5
    $\lnot Smokes(B) \lor Cancer(A)$  1.5
    $\lnot Friends(A,B) \lor \lnot Smokes(A) \lor Smokes(B)$  0.55
    $\lnot Friends(A,B) \lor \lnot Smokes(B) \lor Smokes(A)$  0.55

  • Slide 56

    MAP INFERENCE - EXAMPLE

    $\lnot Smokes(A) \lor Cancer(A)$  1.5

    Introduce a new variable for each ground atom: $s_a$, $c_a$
    Introduce a new variable for each formula: $x_j$

    Add the following three constraints:

    $s_a + x_j \ge 1$
    $-c_a + x_j \ge 0$
    $-x_j - s_a + c_a \ge -1$
    Add $1.5\, x_j$ to the objective function

    maximize $\sum_{j=1}^{n} c_j x_j$
    subject to $\sum_{j=1}^{n} a_{ij} x_j \ge b_i \quad (i = 1, 2, \ldots, m)$
    $x_j \in \{0, 1\}$
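    To see that these constraints do what the slide claims, here is a brute-force check over the 0/1 assignments for the single clause above (a sketch, not an ILP solver): $x_j$ is forced to 1 exactly when the clause is satisfied, so maximizing $1.5\, x_j$ prefers assignments that satisfy it.

    from itertools import product

    best = None
    for sa, ca, xj in product([0, 1], repeat=3):
        feasible = (sa + xj >= 1) and (-ca + xj >= 0) and (-xj - sa + ca >= -1)
        if feasible and (best is None or 1.5 * xj > best[0]):
            best = (1.5 * xj, {"Smokes(A)": sa, "Cancer(A)": ca, "x_j": xj})
    print(best)   # objective 1.5, with the clause ~Smokes(A) v Cancer(A) satisfied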

  • Slide 57

    TOMORROW

    Ontology Matching with Markov Logic

    Weight Learning

    Experiments