Markov Logic: A Representation Language for Natural Language Semantics


Transcript of Markov Logic: A Representation Language for Natural Language Semantics

Page 1: Markov Logic: A Representation Language for Natural Language Semantics

Markov Logic: A Representation Language for Natural Language Semantics

Pedro Domingos
Dept. of Computer Science & Engineering
University of Washington

(Based on joint work with Stanley Kok, Matt Richardson, and Parag Singla)

Page 2: Markov Logic: A Representation Language for Natural Language Semantics

Overview

Motivation
Background
Representation
Inference
Learning
Applications
Discussion

Page 3: Markov Logic: A Representation Language for Natural Language Semantics

Motivation

Natural language is characterized by:
Complex relational structure
High uncertainty (ambiguity, imperfect knowledge)

First-order logic handles relational structure
Probability handles uncertainty
Let's combine the two

Page 4: Markov Logic: A Representation Language for Natural Language Semantics

Markov Logic [Richardson & Domingos, 2006]

Syntax: First-order logic + weights
Semantics: Templates for Markov nets
Inference: Weighted satisfiability + MCMC
Learning: Voted perceptron + ILP

Page 5: Markov Logic: A Representation Language for Natural Language Semantics

Overview

Motivation
Background
Representation
Inference
Learning
Applications
Discussion

Page 6: Markov Logic: A Representation Language for Natural Language Semantics

Markov Networks

Undirected graphical models. [Graph: four nodes A, B, C, D, with cliques {A, B} and {B, C, D}.]

Potential functions defined over cliques:

$$P(X) = \frac{1}{Z} \prod_c \phi_c(x_c), \qquad Z = \sum_X \prod_c \phi_c(x_c)$$

$$\phi(A,B) = \begin{cases} 3.7 & \text{if } A \wedge B \\ 2.1 & \text{if } A \wedge \neg B \\ 0.7 & \text{otherwise} \end{cases} \qquad \phi(B,C,D) = \begin{cases} 2.3 & \text{if } B \wedge C \wedge D \\ 5.1 & \text{otherwise} \end{cases}$$

Page 7: Markov Logic: A Representation Language for Natural Language Semantics

Markov Networks

Undirected graphical models. [Graph: the same four nodes A, B, C, D.]

Potential functions defined over cliques. In log-linear form, with $w_i$ the weight of feature $i$ (a sketch of this follows below):

$$P(X) = \frac{1}{Z} \exp\left(\sum_i w_i f_i(X)\right), \qquad Z = \sum_X \exp\left(\sum_i w_i f_i(X)\right)$$

$$f_1(A,B) = \begin{cases} 1 & \text{if } A \wedge B \\ 0 & \text{otherwise} \end{cases} \qquad f_2(B,C,D) = \begin{cases} 1 & \text{if } B \wedge C \wedge D \\ 0 & \text{otherwise} \end{cases}$$
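To make the log-linear form concrete, here is a minimal Python sketch (not from the talk) that enumerates all states of the four-variable example, scores each with exp(sum_i w_i f_i(X)), and normalizes by Z. The weights 1.5 and 0.8 are illustrative assumptions.

```python
import itertools
import math

# Features of the example network; the weights are assumed for illustration.
features = [
    (lambda s: s["A"] and s["B"], 1.5),             # f1(A,B)
    (lambda s: s["B"] and s["C"] and s["D"], 0.8),  # f2(B,C,D)
]

states = [dict(zip("ABCD", bits))
          for bits in itertools.product([False, True], repeat=4)]

# Unnormalized score of each state: exp(sum_i w_i f_i(X))
score = {tuple(s.values()): math.exp(sum(w * f(s) for f, w in features))
         for s in states}
Z = sum(score.values())                    # partition function

P = {x: v / Z for x, v in score.items()}   # P(X) = score(X) / Z
print(P[(True, True, False, False)])       # P(A, B, not C, not D)
```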

Page 8: Markov Logic: A Representation Language for Natural Language Semantics

First-Order Logic

Constants, variables, functions, predicates. E.g.: Anna, X, mother_of(X), friends(X, Y)

Grounding: replace all variables by constants. E.g.: friends(Anna, Bob)

World (model, interpretation): an assignment of truth values to all ground predicates

Page 9: Markov Logic: A Representation Language for Natural Language Semantics

Overview

Motivation
Background
Representation
Inference
Learning
Applications
Discussion

Page 10: Markov Logic: A Representation Language for Natural Language Semantics

Markov Logic Networks

A logical KB is a set of hard constraints on the set of possible worlds

Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible

Give each formula a weight (higher weight = stronger constraint):

$$P(\text{world}) \propto \exp\left(\sum \text{weights of formulas it satisfies}\right)$$

Page 11: Markov Logic: A Representation Language for Natural Language Semantics

Definition

A Markov Logic Network (MLN) is a set of pairs (F, w), where:
F is a formula in first-order logic
w is a real number

Together with a set of constants, it defines a Markov network with (see the sketch below):
One node for each grounding of each predicate in the MLN
One feature for each grounding of each formula F in the MLN, with the corresponding weight w
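A minimal sketch of this grounding step in Python; the names and data representation are my own, not Alchemy's API, and the formula and its weight 1.5 are taken from the Friends & Smokers example on the next slides.

```python
import itertools

constants = ["Anna", "Bob"]

def ground_predicate(name, arity):
    """One node per grounding of the predicate."""
    return [(name,) + args
            for args in itertools.product(constants, repeat=arity)]

nodes = (ground_predicate("Smokes", 1)
         + ground_predicate("Cancer", 1)
         + ground_predicate("Friends", 2))

# One feature per grounding of each formula, all sharing the formula's
# weight. Formula: Smokes(x) => Cancer(x), weight 1.5 (next slides).
def smokes_implies_cancer(world, x):
    return (not world[("Smokes", x)]) or world[("Cancer", x)]

features = [(smokes_implies_cancer, (x,), 1.5) for x in constants]
print(len(nodes), "nodes;", len(features), "ground features")
```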

Page 12: Markov Logic: A Representation Language for Natural Language Semantics

Example: Friends & Smokers

∀x,y Friends(x,y) => (Smokes(x) <=> Smokes(y))    (weight 1.1)
∀x Smokes(x) => Cancer(x)    (weight 1.5)

Suppose we have two constants: Anna (A) and Bob (B)

[Diagram: ground nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B).]

Page 13: Markov Logic: A Representation Language for Natural Language Semantics

Example: Friends & Smokers

∀x,y Friends(x,y) => (Smokes(x) <=> Smokes(y))    (weight 1.1)
∀x Smokes(x) => Cancer(x)    (weight 1.5)

Suppose we have two constants: Anna (A) and Bob (B)

[Diagram: the full ground network, adding Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B) to the nodes above, with edges linking atoms that appear together in ground formulas.]


Page 16: Markov Logic: A Representation Language for Natural Language Semantics

More on MLNs

MLN is a template for ground Markov nets
Typed variables and constants greatly reduce the size of the ground Markov net
Functions, existential quantifiers, etc.
MLN without variables = Markov network (subsumes graphical models)

Page 17: Markov Logic: A Representation Language for Natural Language Semantics

Relation to First-Order Logic

Infinite weights => first-order logic
Satisfiable KB, positive weights => satisfying assignments = modes of the distribution
MLNs allow contradictions between formulas

Page 18: Markov Logic: A Representation Language for Natural Language Semantics

Overview

Motivation
Background
Representation
Inference
Learning
Applications
Discussion

Page 19: Markov Logic: A Representation Language for Natural Language Semantics

MPE/MAP Inference

Find the most likely truth values of non-evidence ground atoms given the evidence

Apply a weighted satisfiability solver (maximizes the sum of the weights of satisfied clauses)

MaxWalkSat algorithm [Kautz et al., 1997] (sketched below):
Start with a random truth assignment
With probability p, flip the atom that maximizes the weight sum; else flip a random atom in an unsatisfied clause
Repeat n times
Restart m times
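A minimal runnable sketch of MaxWalkSat as described above. The data representation is my own assumption: each clause is a weight plus a list of (atom, required truth value) literals, and the greedy step picks within the chosen clause, as in the original algorithm.

```python
import random

def maxwalksat(clauses, atoms, p=0.5, n_flips=1000, m_restarts=10):
    """Minimal MaxWalkSat sketch [Kautz et al., 1997].
    clauses: list of (weight, literals); literals is a list of
    (atom, truth_value) pairs, and a clause is satisfied when any
    literal matches the current assignment."""
    def sat_weight(a):
        return sum(w for w, lits in clauses
                   if any(a[x] == v for x, v in lits))
    best, best_w = None, float("-inf")
    for _ in range(m_restarts):                        # restart m times
        a = {x: random.random() < 0.5 for x in atoms}  # random assignment
        for _ in range(n_flips):                       # repeat n times
            unsat = [lits for w, lits in clauses
                     if not any(a[x] == v for x, v in lits)]
            if not unsat:
                break
            clause = random.choice(unsat)
            def weight_if_flipped(x):
                a[x] = not a[x]
                w = sat_weight(a)
                a[x] = not a[x]
                return w
            if random.random() < p:
                # flip the atom in the clause that maximizes
                # the total weight of satisfied clauses
                x = max((x for x, _ in clause), key=weight_if_flipped)
            else:
                # flip a random atom in the unsatisfied clause
                x = random.choice(clause)[0]
            a[x] = not a[x]
        w = sat_weight(a)
        if w > best_w:
            best, best_w = dict(a), w
    return best, best_w
```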

Page 20: Markov Logic: A Representation Language for Natural Language Semantics

Conditional Inference

P(Formula|MLN,C) = ?
MCMC: sample worlds, check whether the formula holds

P(Formula1|Formula2,MLN,C) = ?
If Formula2 is a conjunction of ground atoms:
First construct the minimal subset of the network necessary to answer the query (a generalization of KBMC)
Then apply MCMC (or other inference)

Page 21: Markov Logic: A Representation Language for Natural Language Semantics

Ground Network Construction

Initialize the Markov net to contain all query predicates
For each node in the network (one reading of this loop is sketched below):
Add the node's Markov blanket to the network
Remove any evidence nodes
Repeat until done
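One plausible reading of this procedure in Python; this is a sketch in which `markov_blanket` and the evidence set are assumed given, and evidence atoms are treated as conditioned-on boundaries rather than expanded.

```python
def construct_ground_network(query_atoms, markov_blanket, evidence):
    """Sketch of the ground-network construction above.
    markov_blanket(atom) -> set of neighboring ground atoms."""
    network = set(query_atoms)           # initialize with all query predicates
    frontier = list(query_atoms)
    while frontier:                      # repeat until done
        atom = frontier.pop()
        for nb in markov_blanket(atom):  # add the node's Markov blanket
            if nb in evidence:           # evidence nodes are conditioned on,
                continue                 # so they are not added (removed)
            if nb not in network:
                network.add(nb)
                frontier.append(nb)
    return network
```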

Page 22: Markov Logic: A Representation Language for Natural Language Semantics

Probabilistic Inference

Recall:

$$P(X) = \frac{1}{Z} \exp\left(\sum_i w_i f_i(X)\right), \qquad Z = \sum_X \exp\left(\sum_i w_i f_i(X)\right)$$

Exact inference is #P-complete. Conditioning on the Markov blanket is easy:

$$P(x \mid MB(x)) = \frac{\exp\left(\sum_i w_i f_i(x)\right)}{\exp\left(\sum_i w_i f_i(x{=}0)\right) + \exp\left(\sum_i w_i f_i(x{=}1)\right)}$$

Gibbs sampling exploits this.

Page 23: Markov Logic: A Representation Language for Natural Language Semantics

Markov Chain Monte Carlo

Gibbs sampler (a sketch follows below):
1. Start with an initial assignment to the nodes
2. One node at a time, sample the node given the others
3. Repeat
4. Use the samples to compute P(X)

Apply to the ground network
Initialization: MaxWalkSat
Can use multiple chains
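A minimal single-chain Gibbs sampler sketch over a ground network, using the Markov-blanket conditional from the previous slide; the feature representation is my own assumption (scoring all features rather than only the blanket's, which is correct but slower).

```python
import math
import random

def gibbs(atoms, features, n_samples=10000, init=None):
    """Sketch of a Gibbs sampler for a ground network.
    features: list of (weight, f) with f(world) -> bool;
    init: e.g. a MaxWalkSat assignment (slide: used for initialization)."""
    world = dict(init) if init else {x: random.random() < 0.5 for x in atoms}
    counts = {x: 0 for x in atoms}

    def score(w):  # sum_i w_i f_i(world)
        return sum(wt * f(w) for wt, f in features)

    for _ in range(n_samples):
        for x in atoms:                 # one node at a time
            world[x] = False; s0 = score(world)
            world[x] = True;  s1 = score(world)
            p1 = math.exp(s1) / (math.exp(s0) + math.exp(s1))
            world[x] = random.random() < p1
        for x in atoms:                 # use samples to estimate P(x = true)
            counts[x] += world[x]
    return {x: counts[x] / n_samples for x in atoms}
```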

Page 24: Markov Logic: A Representation Language for Natural Language Semantics

Overview

Motivation
Background
Representation
Inference
Learning
Applications
Discussion

Page 25: Markov Logic: A Representation Language for Natural Language Semantics

Learning

Data is a relational database
Closed world assumption (if not: EM)

Learning parameters (weights):
Generatively: pseudo-likelihood
Discriminatively: voted perceptron + MaxWalkSat

Learning structure:
Generalization of feature induction in Markov nets
Learn and/or modify clauses
Inductive logic programming with pseudo-likelihood as the objective function

Page 26: Markov Logic: A Representation Language for Natural Language Semantics

Generative Weight Learning

Maximize likelihood (or posterior)
Use gradient ascent (one step is sketched below):

$$\frac{\partial}{\partial w_i} \log P(X) = f_i(X) - E_Y[f_i(Y)]$$

The first term is the feature count according to the data; the second is the expected feature count according to the model. Requires inference at each step (slow!)
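As a sketch, one gradient-ascent step then looks like this; the learning rate and the way expected counts are estimated (e.g. by MCMC) are assumptions.

```python
def gradient_step(weights, data_counts, expected_counts, lr=0.1):
    """One gradient-ascent step on the log-likelihood.
    data_counts[i]           = f_i(x) on the training data
    expected_counts(weights) = E[f_i(Y)] under the current model,
                               e.g. estimated by MCMC (slow!)"""
    expected = expected_counts(weights)
    return [w + lr * (d - e)
            for w, d, e in zip(weights, data_counts, expected)]
```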

Page 27: Markov Logic: A Representation Language for Natural Language Semantics

Pseudo-Likelihood [Besag, 1975]

Likelihood of each variable given its Markov blanket in the data:

$$PL(X) = \prod_x P(x \mid MB(x))$$

Does not require inference at each step
Widely used

Page 28: Markov Logic: A Representation Language for Natural Language Semantics

Optimization

$$\frac{\partial}{\partial w_i} \log PL(x) = \sum_x \left[\, nsat_i(x) - p(x{=}0)\, nsat_i(x{=}0) - p(x{=}1)\, nsat_i(x{=}1) \,\right]$$

where nsat_i(x=v) is the number of satisfied groundings of clause i in the training data when x takes value v (a sketch of this computation follows).

Parameter tying over groundings of the same clause
Most terms are not affected by changes in the weights
After initial setup, each iteration takes O(# ground predicates x # first-order clauses)
Maximize using L-BFGS [Liu & Nocedal, 1989]
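A sketch of computing this gradient from precomputed counts; the data layout and the `p_given_mb` helper are my own assumptions.

```python
def pl_gradient(weights, nsat, p_given_mb):
    """Pseudo-likelihood gradient from the formula above.
    nsat[i][x] = (n_data, n_x0, n_x1): satisfied groundings of
    clause i with variable x at its data value, at 0, and at 1
    (all other variables fixed to their data values).
    p_given_mb(weights, x, v) = P(x = v | MB(x))."""
    return [sum(n_data
                - p_given_mb(weights, x, 0) * n_x0
                - p_given_mb(weights, x, 1) * n_x1
                for x, (n_data, n_x0, n_x1) in clause.items())
            for clause in nsat]
```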

Page 29: Markov Logic: A Representation Language for Natural Language Semantics

Discriminative Weight Learning

Gradient of the conditional log-likelihood:

$$\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]$$

The first term is the number of true groundings of the formula in the DB; the second is the expected number of true groundings (slow!)

Approximate the expected count by the MAP count

Page 30: Markov Logic: A Representation Language for Natural Language Semantics

Voted Perceptron [Collins, 2002]

Used for discriminative training of HMMs
Expected count in the gradient approximated by the count in the MAP state
MAP state found using the Viterbi algorithm
Weights averaged over all iterations

initialize w_i = 0
for t = 1 to T do
  find the MAP configuration using Viterbi
  w_i <- w_i + eta * (training count - MAP count)
end for

Page 31: Markov Logic: A Representation Language for Natural Language Semantics

Voted Perceptron for MLNs [Singla & Domingos, 2004]

HMM is a special case of MLN
Expected count in the gradient approximated by the count in the MAP state
MAP state found using MaxWalkSat
Weights averaged over all iterations

initialize w_i = 0
for t = 1 to T do
  find the MAP configuration using MaxWalkSat
  w_i <- w_i + eta * (training count - MAP count)
end for
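The loop above as a runnable Python sketch; the `map_counts` oracle, the learning rate, and the list representation are assumptions.

```python
def voted_perceptron(n_clauses, training_counts, map_counts, T=100, eta=0.1):
    """Voted perceptron for MLN weight learning, per the pseudocode above.
    map_counts(w) -> per-clause true-grounding counts in the MAP state
    (found with MaxWalkSat); training_counts are the counts in the data."""
    w = [0.0] * n_clauses               # initialize w_i = 0
    total = [0.0] * n_clauses
    for _ in range(T):                  # for t = 1 to T
        mc = map_counts(w)              # MAP configuration via MaxWalkSat
        w = [wi + eta * (tc - mi)       # w_i += eta * (training - MAP)
             for wi, tc, mi in zip(w, training_counts, mc)]
        total = [s + wi for s, wi in zip(total, w)]
    return [s / T for s in total]       # average weights over all iterations
```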

Page 32: Markov Logic: A Representation Language for Natural Language Semantics

Overview

Motivation
Background
Representation
Inference
Learning
Applications
Discussion

Page 33: Markov Logic: A Representation Language for Natural Language Semantics

Applications to Date

Entity resolution (Cora, BibServ)
Information extraction for biology (won the LLL-2005 competition)
Probabilistic Cyc
Link prediction
Topic propagation in scientific communities
Etc.

Page 34: Markov Logic: A Representation Language for Natural Language Semantics

Entity Resolution

Most logical systems make the unique names assumption. What if we don't?
Equality predicate: Same(A,B), or A = B
Equality axioms:
Reflexivity, symmetry, transitivity
For every unary predicate P: x1 = x2 => (P(x1) <=> P(x2))
For every binary predicate R: x1 = x2 ^ y1 = y2 => (R(x1,y1) <=> R(x2,y2))
Etc.

But in Markov logic these are soft and learnable
Can also introduce the reverse direction:
R(x1,y1) ^ R(x2,y2) ^ x1 = x2 => y1 = y2
Surprisingly, this is all that's needed

Page 35: Markov Logic: A Representation Language for Natural Language Semantics

Example: Citation Matching

Page 36: Markov Logic: A Representation Language for Natural Language Semantics

Markov Logic Formulation: Predicates

Are two bibliography records the same? SameBib(b1,b2)

Are two field values the same? SameAuthor(a1,a2), SameTitle(t1,t2), SameVenue(v1,v2)

How similar are two field strings? Predicates for ranges of the cosine TF-IDF score (a sketch of computing these follows):
TitleTFIDF.0(t1,t2) is true iff TF-IDF(t1,t2) = 0
TitleTFIDF.2(t1,t2) is true iff 0 < TF-IDF(t1,t2) < 0.2
Etc.
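A sketch of how such evidence predicates might be computed; the tokenization, the precomputed `idf` table, and the bucket-naming helper are all assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_cosine(s1, s2, idf):
    """Cosine similarity of TF-IDF vectors for two field strings;
    idf maps word -> inverse document frequency (assumed precomputed)."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] * idf.get(w, 0.0) ** 2 for w in v1)
    n1 = math.sqrt(sum((c * idf.get(w, 0.0)) ** 2 for w, c in v1.items()))
    n2 = math.sqrt(sum((c * idf.get(w, 0.0)) ** 2 for w, c in v2.items()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def tfidf_bucket(t1, t2, idf, step=0.2):
    """Name of the evidence predicate that holds for this pair:
    TitleTFIDF.0 iff score = 0, TitleTFIDF.2 iff 0 < score < 0.2, etc."""
    s = tfidf_cosine(t1, t2, idf)
    if s == 0.0:
        return "TitleTFIDF.0"
    upper = min(int(s / step) + 1, round(1 / step))  # bucket upper bound
    return f"TitleTFIDF.{round(upper * step * 10)}"
```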

Page 37: Markov Logic: A Representation Language for Natural Language Semantics

Markov Logic Formulation: Formulas

Unit clauses (defaults): !SameBib(b1,b2)

Two fields are the same => the corresponding bib. records are the same:
Author(b1,a1) ^ Author(b2,a2) ^ SameAuthor(a1,a2) => SameBib(b1,b2)

Two bib. records are the same => the corresponding fields are the same:
Author(b1,a1) ^ Author(b2,a2) ^ SameBib(b1,b2) => SameAuthor(a1,a2)

High similarity score => two fields are the same:
TitleTFIDF.8(t1,t2) => SameTitle(t1,t2)

Transitive closure (not incorporated in the experiments):
SameBib(b1,b2) ^ SameBib(b2,b3) => SameBib(b1,b3)

25 predicates, 46 first-order clauses

Page 38: Markov Logic: A Representation Language for Natural Language Semantics

What Does This Buy You?

Objects are matched collectively
Multiple types matched simultaneously
Constraints are soft, and their strengths can be learned from data
Easy to add further knowledge
Constraints can be refined from data
Standard approach still embedded

Page 39: Markov Logic: A Representation Language for Natural Language Semantics

Example

Subset of a Bibliography Database

Record | Title                            | Author          | Venue
B1     | Object Identification using CRFs | Linda Stewart   | PKDD 04
B2     | Object Identification using CRFs | Linda Stewart   | 8th PKDD
B3     | Learning Boolean Formulas        | Bill Johnson    | PKDD 04
B4     | Learning of Boolean Formulas     | William Johnson | 8th PKDD

Page 40: Markov Logic: A Representation Language for Natural Language Semantics

Standard Approach [Fellegi & Sunter, 1969]

[Diagram: record-match nodes b1=b2? and b3=b4?, each linked to field-similarity (evidence) nodes. For b1/b2: Sim(Object Identification using CRFs, Object Identification using CRFs) (Title), Sim(Linda Stewart, Linda Stewart) (Author), Sim(PKDD 04, 8th PKDD) (Venue). For b3/b4: Sim(Learning Boolean Formulas, Learning of Boolean Formulas) (Title), Sim(Bill Johnson, William Johnson) (Author), Sim(PKDD 04, 8th PKDD) (Venue).]

Page 41: Markov Logic: A Representation Language for Natural Language Semantics

What’s Missing?

[Diagram: the same network as on the previous slide.]

If from b1=b2 you infer that "PKDD 04" is the same as "8th PKDD", how can you use that to help figure out whether b3=b4?

Page 42: Markov Logic: A Representation Language for Natural Language Semantics

Merging the Evidence Nodes

[Diagram: as before, but the two venue evidence nodes are merged into a single Sim(PKDD 04, 8th PKDD) node shared by b1=b2? and b3=b4?.]

Still does not solve the problem. Why?

Page 43: Markov Logic: A Representation Language for Natural Language Semantics

Introducing Field-Match Nodes

Full representation in the Collective Model

[Diagram: field-match nodes are introduced between the record-match and evidence nodes: b1.A=b2.A?, b1.T=b2.T?, b1.V=b2.V?, b3.A=b4.A?, b3.T=b4.T?, b3.V=b4.V?. Each record-match node (b1=b2?, b3=b4?) connects to its three field-match nodes, which in turn connect to the field-similarity evidence nodes; the two venue field-match nodes share the single Sim(PKDD 04, 8th PKDD) evidence node.]

Page 44: Markov Logic: A Representation Language for Natural Language Semantics

Flow of Information

[Animated sequence: evidence that the titles and authors of b1 and b2 are identical drives b1=b2? toward "yes"; this in turn supports b1.V=b2.V?, i.e. that "PKDD 04" and "8th PKDD" denote the same venue; through the shared venue evidence, that conclusion propagates to b3.V=b4.V? and helps establish b3=b4?.]


Page 49: Markov Logic: A Representation Language for Natural Language Semantics

Experiments

Databases:
Cora [McCallum et al., IRJ, 2000]: 1295 records, 132 papers
BibServ.org [Richardson & Domingos, ISWC-03]: 21,805 records, unknown # of papers

Goal: de-duplicate bib. records, authors, and venues
Pre-processing: form canopies [McCallum et al., KDD-00]
Compared with naïve Bayes (the standard method), etc.
Measured area under the precision-recall curve (AUC)
Our approach wins across the board

Page 50: Markov Logic: A Representation Language for Natural Language Semantics

Results: Matching Venues on Cora

[Bar chart: AUC by system for MLN(VP), MLN(ML), MLN(PL), KB, CL, NB, BN; the plotted values are 0.771, 0.061, 0.342, 0.096, 0.047, 0.339, 0.339.]

Page 51: Markov Logic: A Representation Language for Natural Language Semantics

Overview

Motivation
Background
Representation
Inference
Learning
Applications
Discussion

Page 52: Markov Logic: A Representation Language for Natural Language Semantics

Relation to Other Approaches

Representation | Logical language    | Probabilistic language
Markov logic   | First-order logic   | Markov nets
RMNs           | Conjunctive queries | Markov nets
PRMs           | Frame systems       | Bayes nets
KBMC           | Horn clauses        | Bayes nets
SLPs           | Horn clauses        | Bayes nets

Page 53: Markov Logic: A Representation Language for Natural Language Semantics

Going Further

First-order logic is not enough
We can "Markovize" other representations in the same way
Lots to do

Page 54: Markov Logic: A Representation Language for Natural Language Semantics

Summary

NLP involves relational structure and uncertainty
Markov logic combines first-order logic and probabilistic graphical models:
Syntax: First-order logic + weights
Semantics: Templates for Markov networks
Inference: MaxWalkSat + KBMC + MCMC
Learning: Voted perceptron + PL + ILP
Applications to date: entity resolution, IE, etc.
Software: Alchemy
http://www.cs.washington.edu/ai/alchemy