CRFs and Joint Inference in NLP Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles Sutton, Aron Culotta, Xuerui Wang, Ben Wellner, Fuchun Peng, Michael Hay.

Page 1

CRFs and Joint Inference in NLP

Andrew McCallum

Computer Science Department

University of Massachusetts Amherst

Joint work with Charles Sutton, Aron Culotta, Xuerui Wang, Ben Wellner, Fuchun Peng, Michael Hay.

Page 2

From Text to Actionable Knowledge

[Figure: pipeline diagram with stages Spider, Filter, Document collection, IE (Segment, Classify, Associate, Cluster), Database, Data Mining (Discover patterns: entity types, links / relations, events), and Actionable knowledge (Prediction, Outlier detection, Decision support)]

Page 3

[Figure: pipeline diagram with stages Spider, Filter, Document collection, IE (Segment, Classify, Associate, Cluster), Database, Data Mining (Discover patterns: entity types, links / relations, events), and Actionable knowledge (Prediction, Outlier detection, Decision support), annotated with Uncertainty Info, Emerging Patterns, and Joint Inference]

Page 4

An HLT Pipeline

SNA, KDD, Events

TDT, Summarization

Coreference

Relations

NER

Parsing

MT

ASR

Errors cascade & accumulate

Page 5

An HLT Pipeline

SNA, KDD

TDT, Summarization

Coreference

Relations

NER

Parsing

MT

ASR

Unified, joint inference.

Page 6

[Figure: pipeline diagram with stages Spider, Filter, Document collection, IE (Segment, Classify, Associate, Cluster), Database, Data Mining (Discover patterns: entity types, links / relations, events), and Actionable knowledge (Prediction, Outlier detection, Decision support), annotated with Uncertainty Info, Emerging Patterns, and Joint Inference]

Page 7

[Figure: pipeline diagram with stages Spider, Filter, Document collection, IE (Segment, Classify, Associate, Cluster), Probabilistic Model, Data Mining (Discover patterns: entity types, links / relations, events), and Actionable knowledge (Prediction, Outlier detection, Decision support), labeled "Unified Model"]

Solution:

Conditional Random Fields [Lafferty, McCallum, Pereira]

Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]

Discriminatively-trained undirected graphical models

Complex Inference and Learning

Just what we researchers like to sink our teeth into!

Page 8

(Linear Chain) Conditional Random Fields

[Lafferty, McCallum, Pereira 2001]

Undirected graphical model, trained to maximize conditional probability of output sequence given input sequence.

[Figure: finite state model and graphical model. FSM states (output seq) y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3}, . . . sit above observations (input seq) x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3}, . . .]

input seq: said Jones a Microsoft VP …

output seq: OTHER PERSON OTHER ORG TITLE …

$$p(y \mid x) = \frac{1}{Z_x} \prod_t \Phi(y_t, y_{t-1}, x, t)$$

where

$$\Phi(y_t, y_{t-1}, x, t) = \exp\left( \sum_k \lambda_k f_k(y_t, y_{t-1}, x, t) \right)$$

Wide-spread interest, positive experimental results in many applications.

Noun phrase, Named entity [HLT’03], [CoNLL’03]

Protein structure prediction [ICML’04]

IE from Bioinformatics text [Bioinformatics ‘04],…

Asian word segmentation [COLING’04], [ACL’04]

IE from Research papers [HLT’04]

Object classification in images [CVPR ‘04]
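The linear-chain definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the feature functions, the weight values in `WEIGHTS`, and the `START` pseudo-label are all assumptions made up for the example sentence on the slide. The normalizer Z_x is computed with the forward algorithm, and the best labeling is found by brute-force enumeration only because the example is tiny (Viterbi would be the usual choice).

```python
import itertools
import math

LABELS = ["OTHER", "PERSON", "ORG", "TITLE"]
X = ["said", "Jones", "a", "Microsoft", "VP"]

# Hypothetical weights lambda_k, keyed by feature name; unlisted features weigh 0.
WEIGHTS = {
    "word=jones,label=PERSON": 2.0,
    "word=microsoft,label=ORG": 2.0,
    "word=vp,label=TITLE": 2.0,
    "capitalized=False,label=OTHER": 1.0,
    "prev=PERSON,label=OTHER": 0.5,
}

def features(y_t, y_prev, x, t):
    """Names of the (assumed) binary features f_k that fire at position t."""
    word = x[t]
    return [
        f"word={word.lower()},label={y_t}",
        f"capitalized={word[0].isupper()},label={y_t}",
        f"prev={y_prev},label={y_t}",
    ]

def log_potential(weights, y_t, y_prev, x, t):
    # log Phi(y_t, y_{t-1}, x, t) = sum_k lambda_k f_k(y_t, y_{t-1}, x, t)
    return sum(weights.get(name, 0.0) for name in features(y_t, y_prev, x, t))

def sequence_score(weights, y, x):
    # Sum of log potentials along the chain (log of the unnormalized product).
    prev, total = "START", 0.0
    for t, y_t in enumerate(y):
        total += log_potential(weights, y_t, prev, x, t)
        prev = y_t
    return total

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(weights, x):
    """log Z_x via the forward algorithm, linear in the sequence length."""
    alpha = {y: log_potential(weights, y, "START", x, 0) for y in LABELS}
    for t in range(1, len(x)):
        alpha = {
            y: logsumexp([alpha[yp] + log_potential(weights, y, yp, x, t)
                          for yp in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))

def prob(weights, y, x):
    # p(y | x) = exp(score(y, x)) / Z_x
    return math.exp(sequence_score(weights, y, x) - log_partition(weights, x))

def best_sequence(weights, x):
    # Brute force over all labelings; fine only for toy inputs.
    return max(itertools.product(LABELS, repeat=len(x)),
               key=lambda y: sequence_score(weights, y, x))
```

Calling `best_sequence(WEIGHTS, X)` returns the most probable labeling under these assumed weights, and `prob(WEIGHTS, y, X)` gives the conditional probability of any labeling `y`.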

Page 9

Outline

• Motivating Joint Inference for NLP.

• Brief introduction of Conditional Random Fields

• Joint inference: Motivation and examples

– Joint Labeling of Cascaded Sequences (Belief Propagation)

– Joint Labeling of Distant Entities (BP by Tree Reparameterization)

– Joint Co-reference Resolution (Graph Partitioning)

– Joint Segmentation and Co-ref (Sparse BP)

– Joint Extraction and Data Mining (Iterative)

• Topical N-gram models

Page 10

Jointly labeling cascaded sequences

Factorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

Page 11

Jointly labeling cascaded sequences

Factorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

Page 12

Jointly labeling cascaded sequences

Factorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

But errors cascade: every stage must be perfect for the pipeline to do well.

Page 13

Jointly labeling cascaded sequences

Factorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

Joint prediction of part-of-speech and noun-phrase in newswire, matching accuracy with only 50% of the training data.

Inference: Loopy Belief Propagation
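The cascade problem can be seen in a toy example. The sketch below is only an illustration under made-up scores: it decodes a single token with two coupled labels (part-of-speech and noun-phrase membership), once as a pipeline and once jointly. It uses exhaustive search rather than the loopy belief propagation named on the slide, and the tag sets and score tables are hypothetical.

```python
import itertools

POS_TAGS = ["NOUN", "VERB"]
NP_TAGS = ["IN_NP", "OUT_NP"]

# Hypothetical local scores for the part-of-speech layer.
POS_SCORE = {"NOUN": 1.0, "VERB": 1.2}

# Hypothetical compatibility scores between the two layers.
NP_SCORE = {
    ("NOUN", "IN_NP"): 2.0,
    ("NOUN", "OUT_NP"): 0.1,
    ("VERB", "IN_NP"): 0.1,
    ("VERB", "OUT_NP"): 0.5,
}

def joint_score(pos, np_tag):
    return POS_SCORE[pos] + NP_SCORE[(pos, np_tag)]

def decode_pipeline():
    # Stage 1 commits to a POS tag without looking at the NP layer.
    pos = max(POS_TAGS, key=lambda p: POS_SCORE[p])
    # Stage 2 can only choose the best NP tag given that commitment.
    np_tag = max(NP_TAGS, key=lambda n: NP_SCORE[(pos, n)])
    return pos, np_tag

def decode_joint():
    # Joint decoding scores both layers together.
    return max(itertools.product(POS_TAGS, NP_TAGS),
               key=lambda pair: joint_score(*pair))
```

With these numbers the pipeline locks in the slightly higher-scoring POS tag and ends with a lower total score than the joint decoder, which is the kind of early commitment that joint inference is meant to avoid.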

Page 14

2. Jointly labeling distant mentions

Skip-chain CRFs

Senator Joe Green said today … . Green ran for …

[Sutton, McCallum, SRL 2004]

Dependency among similar, distant mentions ignored.

Page 15

2. Jointly labeling distant mentions

Skip-chain CRFs

Senator Joe Green said today … . Green ran for …

[Sutton, McCallum, SRL 2004]

14% reduction in error on most repeated field in email seminar announcements.

Inference: Tree reparameterization BP

[Wainwright et al, 2002]

See also [Finkel, et al, 2005]
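A skip-chain model adds long-range edges on top of the usual linear chain. The sketch below shows one way such a graph could be built for the example sentence. The rule used here, connecting identical capitalized tokens, is an assumption for illustration; the slide only says that similar, distant mentions should be linked.

```python
from collections import defaultdict

TOKENS = "Senator Joe Green said today … . Green ran for …".split()

def linear_edges(n):
    """Edges between adjacent positions, as in a linear-chain CRF."""
    return [(i, i + 1) for i in range(n - 1)]

def skip_edges(tokens):
    """Edges (i, j), i < j, between identical capitalized tokens (assumed rule)."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            positions[tok].append(i)
    edges = []
    for idxs in positions.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                edges.append((idxs[a], idxs[b]))
    return sorted(edges)
```

Here `skip_edges(TOKENS)` links the two occurrences of "Green", so evidence from "Senator Joe Green" can influence the label of the later, less informative mention. The extra edges create loops, which is why approximate inference is needed.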

Page 16

3. Joint co-reference among all pairs

Affinity Matrix CRF

“Entity resolution”, “Object correspondence”

[Figure: three mentions, ". . . Mr Powell . . .", ". . . Powell . . .", and ". . . she . . .", connected by pairwise Y/N co-reference variables, with edge weights shown as 45, 99, and 11]

[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]

~25% reduction in error on co-reference of proper nouns in newswire.

Inference: Correlational clustering, graph partitioning

[Bansal, Blum, Chawla, 2002]
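Co-reference as graph partitioning can be sketched as follows: given a signed affinity for every pair of mentions, pick the partition that maximizes the total affinity inside clusters. The code below enumerates all partitions, which is feasible only for a handful of mentions and is not the approximation algorithm cited on the slide. The edge assignments and the negative sign on one weight are assumptions for illustration, since the extracted slide text does not show which weight belongs to which pair.

```python
MENTIONS = ["Mr Powell", "Powell", "she"]

# Hypothetical signed affinities between mention pairs.
WEIGHT = {
    frozenset(("Mr Powell", "Powell")): 45.0,
    frozenset(("Powell", "she")): -99.0,
    frozenset(("Mr Powell", "she")): 11.0,
}

def partitions(items):
    """Yield every partition of a list as a list of clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either 'first' starts its own cluster...
        yield [[first]] + part
        # ...or it joins one of the existing clusters.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def partition_score(part, weight):
    """Sum of affinities over all pairs placed in the same cluster."""
    total = 0.0
    for cluster in part:
        for a in range(len(cluster)):
            for b in range(a + 1, len(cluster)):
                total += weight[frozenset((cluster[a], cluster[b]))]
    return total

def best_partition(mentions, weight):
    return max(partitions(mentions), key=lambda p: partition_score(p, weight))
```

Under these assumed weights the best partition groups "Mr Powell" with "Powell" and leaves "she" alone: merging all three would pay the large negative affinity, even though one pairwise decision taken in isolation would favor linking "she" to "Mr Powell".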

Page 17

4. Joint segmentation and co-reference

[Wellner, McCallum, Peng, Hay, UAI 2004]

Extraction from and matching of research paper citations.

[Figure: graphical model with nodes labeled o, s, c, y, and p, and layers labeled Segmentation, Citation attributes, Co-reference decisions, Database field values, and World Knowledge]

Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.

Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.

35% reduction in co-reference error by using segmentation uncertainty.

6-14% reduction in segmentation error by using co-reference.

Inference: Sparse Generalized Belief Propagation [Pal, Sutton, McCallum, 2005]

see also [Marthi, Milch, Russell, 2003]

Page 18

Joint IE and Coreference from Research Paper Citations

Textual citation mentions (noisy, with duplicates)

[Figure: image not recoverable from the extraction]

Paper database, with fields, clean, duplicates collapsed

AUTHORS              TITLE       VENUE
Cowell, Dawid…       Probab…     Springer
Montemerlo, Thrun…   FastSLAM…   AAAI…
Kjaerulff            Approxi…    Technic…

4. Joint segmentation and co-reference

Page 19

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

Citation Segmentation and Coreference

Page 20

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

Citation Segmentation and Coreference

Page 21

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

2) Resolve coreferent citations

Citation Segmentation and Coreference

Y?N

Page 22

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

2) Resolve coreferent citations

3) Form canonical database record

Citation Segmentation and Coreference

AUTHOR = Brenda Laurel
TITLE = Interface Agents: Metaphors with Character
PAGES = 355-366
BOOKTITLE = The Art of Human-Computer Interface Design
EDITOR = T. Smith
PUBLISHER = Addison-Wesley
YEAR = 1990

Y?N

Resolving conflicts

Page 23

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

2) Resolve coreferent citations

3) Form canonical database record

Citation Segmentation and Coreference

AUTHOR = Brenda Laurel
TITLE = Interface Agents: Metaphors with Character
PAGES = 355-366
BOOKTITLE = The Art of Human-Computer Interface Design
EDITOR = T. Smith
PUBLISHER = Addison-Wesley
YEAR = 1990

Y?N

Perform jointly.

Page 24

x

s

Observed citation

CRF Segmentation

IE + Coreference Model

J Besag 1986 On the…

AUT AUT YR TITL TITL

Page 25

x

s

Observed citation

CRF Segmentation

IE + Coreference Model

Citation mention attributes

J Besag 1986 On the…

AUTHOR = “J Besag”
YEAR = “1986”
TITLE = “On the…”

c

Page 26

x

s

IE + Coreference Model

c

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

Structure for each citation mention

Page 27

x

s

IE + Coreference Model

c

Binary coreference variables for each pair of mentions

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

Page 28

x

s

IE + Coreference Model

c

y n

n

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

Binary coreference variables for each pair of mentions

Page 29

y n

n

x

s

IE + Coreference Model

c

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

Research paper entity attribute nodes

AUTHOR = “P Smyth”
YEAR = “2001”
TITLE = “Data Mining…”
...

Page 30

Inference by Sparse “Generalized BP”

Exact inference on these linear-chain regions

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

From each chain pass an N-best List into coreference

[Pal, Sutton, McCallum 2005]

Page 31

Inference by Sparse “Generalized BP”

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

Approximate inference by graph partitioning…

…integrating out uncertainty in samples of extraction

Scale to 1M citations with Canopies

[McCallum, Nigam, Ungar 2000]

[Pal, Sutton, McCallum 2005]
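The canopies idea cited above is to use a cheap distance to group items into overlapping subsets, then run the expensive pairwise co-reference model only within each subset. The following is a minimal sketch under stated assumptions: the cheap distance (one minus Jaccard overlap of lowercased whitespace tokens) and the two threshold values are illustrative choices, not values from the talk.

```python
CITATIONS = [
    "J Besag 1986 On the…",
    "Smyth . 2001 Data Mining…",
    "Smyth , P Data mining…",
]

def cheap_distance(a, b):
    """1 - Jaccard overlap of lowercased tokens (an assumed cheap metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / len(ta | tb)

def canopies(items, distance, t_loose, t_tight):
    """Overlapping canopies: loose threshold for membership, tight for removal."""
    assert 0.0 <= t_tight <= t_loose
    remaining = list(range(len(items)))
    result = []
    while remaining:
        center = remaining[0]
        dists = {i: distance(items[center], items[i]) for i in range(len(items))}
        # Everything within the loose threshold joins this canopy.
        result.append([i for i in range(len(items)) if dists[i] <= t_loose])
        # Everything within the tight threshold can no longer be a center.
        remaining = [i for i in remaining if dists[i] > t_tight]
    return result

def candidate_pairs(canopy_list):
    """Pairs that share a canopy; only these need the expensive comparison."""
    pairs = set()
    for canopy in canopy_list:
        for a in range(len(canopy)):
            for b in range(a + 1, len(canopy)):
                pairs.add((canopy[a], canopy[b]))
    return pairs
```

For example, `candidate_pairs(canopies(CITATIONS, cheap_distance, 0.7, 0.6))` keeps only the pair of Smyth citations, so the Besag citation is never compared against them with the expensive model.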

Page 32

y n

n

Inference by Sparse “Generalized BP”

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

Exact (exhaustive) inference over entity attributes

[Pal, Sutton, McCallum 2005]

Page 33

y n

n

Inference by Sparse “Generalized BP”

J Besag 1986 On the…

Smyth . 2001 Data Mining…

Smyth , P Data mining…

Revisit exact inference on IE linear chain, now conditioned on entity attributes

[Pal, Sutton, McCallum 2005]

Page 34

Parameter Estimation: Piecewise Training

[Sutton & McCallum 2005]

Divide-and-conquer parameter estimation

[Figure: model graph with co-reference variables labeled y, n, n]

IE linear-chain: exact MAP

Coref graph edge weights: MAP on individual edges

Entity attribute potentials: MAP, pseudo-likelihood

In all cases: climb MAP gradient with quasi-Newton method

Page 35

4. Joint segmentation and co-reference

[Wellner, McCallum, Peng, Hay, UAI 2004]

Extraction from and matching of research paper citations.

[Figure: graphical model with nodes labeled o, s, c, y, and p, and layers labeled Segmentation, Citation attributes, Co-reference decisions, Database field values, and World Knowledge]

Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.

Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.

35% reduction in co-reference error by using segmentation uncertainty.

6-14% reduction in segmentation error by using co-reference.

Inference: Variant of Iterated Conditional Modes [Besag, 1986]
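Iterated Conditional Modes is coordinate ascent: each variable in turn is set to the value that most improves the score with all other variables held fixed, until nothing changes. The sketch below is the generic procedure, not the variant used in the cited work. The two variables (`seg`, `coref`), their domains, and the score table are hypothetical stand-ins for one segmentation decision and one co-reference decision.

```python
# Hypothetical joint scores for one segmentation label and one coref decision.
SCORE_TABLE = {
    ("AUTHOR", "N"): 1.0,
    ("EDITOR", "N"): 0.5,
    ("AUTHOR", "Y"): 2.0,
    ("EDITOR", "Y"): 3.0,
}

DOMAINS = {"seg": ["AUTHOR", "EDITOR"], "coref": ["N", "Y"]}

def score(assignment):
    return SCORE_TABLE[(assignment["seg"], assignment["coref"])]

def icm(initial, domains, score_fn, max_sweeps=20):
    """Greedy coordinate ascent; converges to a local optimum of score_fn."""
    assignment = dict(initial)
    for _ in range(max_sweeps):
        changed = False
        for name in domains:
            current = score_fn(assignment)
            for value in domains[name]:
                candidate = {**assignment, name: value}
                candidate_score = score_fn(candidate)
                if candidate_score > current:
                    assignment, current, changed = candidate, candidate_score, True
        if not changed:
            break
    return assignment
```

Starting from `{"seg": "AUTHOR", "coref": "N"}`, the first sweep changes only the co-reference decision, and that change is what makes the other segmentation label attractive on the second sweep. In general ICM is only guaranteed to reach a local optimum.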

Page 36

Outline

• Motivating Joint Inference for NLP.

• Brief introduction of Conditional Random Fields

• Joint inference: Motivation and examples

– Joint Labeling of Cascaded Sequences (Belief Propagation)

– Joint Labeling of Distant Entities (BP by Tree Reparameterization)

– Joint Co-reference Resolution (Graph Partitioning)

– Joint Segmentation and Co-ref (Sparse BP)

– Joint Extraction and Data Mining (Iterative)

• Topical N-gram models

Page 37

“George W. Bush’s father is George H. W. Bush (son of Prescott Bush).”

Page 38

“George W. Bush’s father is George H. W. Bush (son of Prescott Bush).”

Page 39

“George W. Bush’s father is George H. W. Bush (son of Prescott Bush).”

Page 40

“George W. Bush’s father is George H. W. Bush (son of Prescott Bush).”

?

Page 41

Relation Extraction as Sequence Labeling

George W. Bush

…George H. W. Bush (son of Prescott Bush) …

Father Grandfather

Page 42

Learning Relational Database Features

George W. Bush

…George H. W. Bush (son of Prescott Bush) …

Father Grandfather

Name | Son
Prescott Bush | George H. W. Bush
George H. W. Bush | George W. Bush

Search DB for “relational paths” between subject and token

Subject_Is_SonOf_SonOf_Token=1.0
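The search for relational paths can be sketched as a breadth-first search over the database. This is an illustrative sketch only: the table contents come from the slide, while the edge direction, the `max_len` limit, and the function names are assumptions.

```python
from collections import deque

# (name, son) rows from the slide's table.
SON_TABLE = [
    ("Prescott Bush", "George H. W. Bush"),
    ("George H. W. Bush", "George W. Bush"),
]

def build_edges(son_table):
    """Map each person to (relation, other person) edges; son -> 'SonOf' -> parent."""
    edges = {}
    for name, son in son_table:
        edges.setdefault(son, []).append(("SonOf", name))
    return edges

def relational_path(edges, subject, token, max_len=3):
    """Shortest list of relation names leading from subject to token, or None."""
    queue = deque([(subject, [])])
    seen = {subject}
    while queue:
        node, path = queue.popleft()
        if node == token and path:
            return path
        if len(path) == max_len:
            continue
        for relation, neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [relation]))
    return None

def path_feature(path):
    """Turn a path into a feature name in the style shown on the slide."""
    return "Subject_Is_" + "_".join(path) + "_Token"
```

With subject "George W. Bush" and token "Prescott Bush", the path found is two `SonOf` steps, which yields the feature name shown on the slide. A sequence-labeling CRF can then learn that this feature is strong evidence for the Grandfather label.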

Page 43

Highly weighted relational paths

• Many Family equivalences

– Sibling=Parent_Offspring

– Cousin=Parent_Sibling_Offspring

• College=Parent_College

• Religion=Parent_Religion

• Ally=Opponent_Opponent

• Friend=Person_Same_School

• Preliminary results: nice performance boost using relational features (~8% absolute F1)

Page 44

Testing on Unknown Entities

John F. Kennedy

… son of Joseph P. Kennedy, Sr. and Rose Fitzgerald

Name | Son
Joseph P. Kennedy | John F. Kennedy
Rose Fitzgerald | John F. Kennedy

Father Mother

Fill DB with “first-pass” CRF

Use relational features with “second-pass” CRF

Page 45

Next Steps

• Feature induction to discover complex rules

• Measure relational features’ sensitivity to noise in DB

• Collective inference among related relations

Page 46

Outline

• Motivating Joint Inference for NLP.

• Brief introduction of Conditional Random Fields

• Joint inference: Motivation and examples

– Joint Labeling of Cascaded Sequences (Belief Propagation)

– Joint Labeling of Distant Entities (BP by Tree Reparameterization)

– Joint Co-reference Resolution (Graph Partitioning)

– Joint Segmentation and Co-ref (Sparse BP)

– Joint Extraction and Data Mining (Iterative)

• Topical N-gram models

Page 47

Topical N-gram Model - Our first attempt

[Figure: plate diagram with per-token variables z1 to z4, w1 to w4, and y1 to y4, and plate labels T, W, and D; parameter symbols were lost in extraction. Annotation on the figure: {0, 1, 1:2, 2:2, 1:3, 2:3, 3:3}]

Wang & McCallum

Page 48

Beyond bag-of-words

[Figure: plate diagram with per-token variables z1 to z4 and w1 to w4, and plate labels T, W, and D; parameter symbols were lost in extraction]

Wallach

Page 49

LDA-COL (Collocation) Model

[Figure: plate diagram with per-token variables z1 to z4, w1 to w4, and y1 to y4, and plate labels T, W, and D; parameter symbols were lost in extraction]

Griffiths & Steyvers

Page 50

Topical N-gram Model

[Figure: plate diagram with per-token variables z1 to z4, w1 to w4, and y1 to y4, and plate labels T, W, and D; parameter symbols were lost in extraction]

Wang & McCallum

Page 51

Topical N-gram Model

[Figure: plate diagram with per-token variables z1 to z4, w1 to w4, and y1 to y4, and plate labels T, W, and D; parameter symbols were lost in extraction]

Wang & McCallum

Page 52

Topic Comparison

LDA: learning, optimal, reinforcement, state, problems, policy, dynamic, action, programming, actions, function, markov, methods, decision, rl, continuous, spaces, step, policies, planning

Topical N-grams (2+): reinforcement learning, optimal policy, dynamic programming, optimal control, function approximator, prioritized sweeping, finite-state controller, learning system, reinforcement learning RL, function approximators, markov decision problems, markov decision processes, local search, state-action pair, markov decision process, belief states, stochastic policy, action selection, upright position, reinforcement learning methods

Topical N-grams (1): policy, action, states, actions, function, reward, control, agent, q-learning, optimal, goal, learning, space, step, environment, system, problem, steps, sutton, policies

Page 53

Topic Comparison

LDA: motion, visual, field, position, figure, direction, fields, eye, location, retina, receptive, velocity, vision, moving, system, flow, edge, center, light, local

Topical N-grams (2+): receptive field, spatial frequency, temporal frequency, visual motion, motion energy, tuning curves, horizontal cells, motion detection, preferred direction, visual processing, area mt, visual cortex, light intensity, directional selectivity, high contrast, motion detectors, spatial phase, moving stimuli, decision strategy, visual stimuli

Topical N-grams (1): motion, response, direction, cells, stimulus, figure, contrast, velocity, model, responses, stimuli, moving, cell, intensity, population, image, center, tuning, complex, directions

Page 54

Topic Comparison

LDA: word, system, recognition, hmm, speech, training, performance, phoneme, words, context, systems, frame, trained, speaker, sequence, speakers, mlp, frames, segmentation, models

Topical N-grams (2+): speech recognition, training data, neural network, error rates, neural net, hidden markov model, feature vectors, continuous speech, training procedure, continuous speech recognition, gamma filter, hidden control, speech production, neural nets, input representation, output layers, training algorithm, test set, speech frames, speaker dependent

Topical N-grams (1): speech, word, training, system, recognition, hmm, speaker, performance, phoneme, acoustic, words, context, systems, frame, trained, sequence, phonetic, speakers, mlp, hybrid

Page 55

Summary

• Joint inference can avoid accumulating errors in a pipeline from extraction to data mining.

• Examples

– Factorial finite state models

– Jointly labeling distant entities

– Coreference analysis

– Segmentation uncertainty aiding coreference & vice-versa

– Joint Extraction and Data Mining

• Many examples of sequential topic models.