Dynamic Conditional Random Fields for Labeling and Segmenting Sequences
![Page 1: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/1.jpg)
Dynamic Conditional Random Fields for Labeling and Segmenting Sequences
Khashayar Rohanimanesh
Joint work with
Charles Sutton and Andrew McCallum
University of Massachusetts Amherst
![Page 2: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/2.jpg)
Noun Phrase Segmentation (CoNLL-2000; Tjong Kim Sang and Buchholz, 2000)
NP:    B I I B I I O O O
Words: Rockwell International Corp. 's Tulsa unit said it signed
NP:    B I I O B I O B I
Words: a tentative agreement extending its contract with Boeing Co.
NP:    O O B I O B B I I
Words: to provide structural parts for Boeing 's 747 jetliners.
![Page 3: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/3.jpg)
Named Entity Recognition
CRICKET - MILLNS SIGNS FOR BOLAND
CAPE TOWN 1996-08-22
South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.
Labels and examples:
PER: Yayuk Basuki, Innocent Butare
ORG: 3M, KDP, Leicestershire
LOC: Leicestershire, Nirmal Hriday, The Oval
MISC: Java, Basque, 1,000 Lakes Rally
[McCallum & Li, 2003]
![Page 4: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/4.jpg)
Information Extraction
Seminar Announcements [Peshkin, Pfeffer 2003]
a seminar entitled "Nanorheology of Polymers & Complex Fluids," at 4:30 p.m, Monday, Feb. 27, in Wean Hall 7500. The seminar will be given by Professor Steven Granick
Fields tagged on the slide: STIME, LOC, SPEAK
Biological Abstracts [Skounakis, Craven, Ray 2003]
SNC1, a gene from the yeast Saccharomyces cerevisiae, encodes a homolog of vertebrate synaptic vesicle-associated membrane proteins (VAMPs) or synaptobrevins.
Fields tagged on the slide: PROTEIN, LOC
Extracted relation: subcellular-localization(SNC1, vesicle)
![Page 5: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/5.jpg)
Simultaneous noun-phrase & part-of-speech tagging
NP:    B I I B I I O O O
POS:   N N N O N N V O V
Words: Rockwell International Corp. 's Tulsa unit said it signed
NP:    B I I O B I O B I
POS:   O J N V O N O N N
Words: a tentative agreement extending its contract with Boeing Co.
![Page 6: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/6.jpg)
Probabilistic Sequence Labeling
![Page 7: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/7.jpg)
Linear-Chain CRFs
Finite-State
[Figure: finite-state view of the model; each transition is labeled c(·,·)]
![Page 8: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/8.jpg)
Linear-Chain CRFs
Graphical Model
[Figure: linear chain over labels y and observations x, with a potential Φ(·,·) on each edge]
Training
Um… what's Φ?
![Page 9: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/9.jpg)
Linear-Chain CRFs
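Graphical Model, Training
Rewrite as:
$$\Phi(\cdot) = \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$
for some features $f_k$ and weights $\lambda_k$.
Now solve for $\lambda_k$ by convex optimization.
[Figure: linear chain over labels y and observations x]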
![Page 10: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/10.jpg)
General CRFs
A CRF is an undirected, conditionally-trained graphical model.
Train $\lambda_k$ by convex optimization to maximize conditional log-likelihood.
Features $f_k$ can be arbitrary, overlapping, domain-specific.
![Page 11: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/11.jpg)
CRF Training
Train $\lambda_k$ by convex optimization to maximize conditional log-likelihood.
![Page 12: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/12.jpg)
Optimization Methods
• Generalized Iterative Scaling (GIS)
– Improved Iterative Scaling
• First-order methods
– Non-linear conjugate gradient
• Second-order methods
– Limited-memory quasi-Newton (BFGS)
![Page 13: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/13.jpg)
From Generative to Conditional
Model and properties:
• HMMs: models the observation
• MEMMs: does not model the observation; label bias problem
• Linear-chain CRFs: does not model the observation; eliminates the label bias problem
![Page 14: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/14.jpg)
Dynamic CRFs
![Page 15: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/15.jpg)
Simultaneous noun-phrase & part-of-speech tagging
NP:    B I I B I I O O O
POS:   N N N O N N V O V
Words: Rockwell International Corp. 's Tulsa unit said it signed
NP:    B I I O B I O B I
POS:   O J N V O N O N N
Words: a tentative agreement extending its contract with Boeing Co.
![Page 16: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/16.jpg)
Features
• Word identity: "International"
• Capitalization: Xxxxxxx
• Character classes: contains digits
• Character n-gram: …ment
• Lexicon memberships: in list of company names
• WordNet synset: (speak, say, tell)
• …
• Part of speech: Proper Noun
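A minimal sketch of how feature types like these might be computed for one token. The function name `token_features`, the feature-key strings, and the tiny `COMPANY_LEXICON` are illustrative assumptions, not taken from the talk; the WordNet feature is omitted because it needs an external resource.

```python
import re

# Hypothetical lexicon; the slide only says "in list of company names".
COMPANY_LEXICON = {"Rockwell", "Boeing"}

def token_features(word, pos_tag=None):
    """Return a dict of binary features for a single token."""
    # Capitalization shape: uppercase -> X, lowercase -> x, digit -> 9.
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "9", shape)
    feats = {
        "word=" + word.lower(): 1,                         # word identity
        "shape=" + shape: 1,                               # capitalization pattern
        "has_digit": int(any(c.isdigit() for c in word)),  # character class
        "suffix4=" + word[-4:].lower(): 1,                 # character n-gram (last 4 chars)
        "in_company_lexicon": int(word in COMPANY_LEXICON),
    }
    if pos_tag is not None:
        feats["pos=" + pos_tag] = 1                        # part of speech, if available
    return feats

print(token_features("International", "N"))
```

Because the features are just keys in a dict, overlapping and domain-specific features can be added without changing the rest of the model.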
![Page 17: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/17.jpg)
Multiple Nested Predictions on the Same Sequence
Noun phrase
Part-of-speech (output prediction)
Word identity (input observation)
Rockwell Int’l Corp. 's Tulsa
![Page 18: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/18.jpg)
Multiple Nested Predictions on the Same Sequence
Noun phrase (output prediction)
Part-of-speech (input observation)
Word identity (input observation)
Rockwell Int’l Corp. 's Tulsa
But errors in each stage are compounding. Uncertainty from one stage to the next is not preserved.
![Page 19: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/19.jpg)
Cascaded Predictions
Named-entity tag
Part-of-speech
Segmentation (output prediction)
Chinese character (input observation)
![Page 20: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/20.jpg)
Cascaded Predictions
Named-entity tag
Part-of-speech (output prediction)
Segmentation (input observation)
Chinese character (input observation)
![Page 21: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/21.jpg)
Cascaded Predictions
Named-entity tag (output prediction)
Part-of-speech (input observation)
Segmentation (input observation)
Chinese character (input observation)
Even more stages here, so compounding of errors is worse.
![Page 22: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/22.jpg)
Joint Prediction: Cross-Product over Labels
Segmentation+POS+NE (output prediction)
Chinese character (input observation)
2 × 45 × 11 = 990 possible states
e.g.: state label = (Wordbeg, Noun, Person)
O(T × 990²) running time
O(|V| × 990²) parameters
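The state-space arithmetic on this slide can be checked directly. The sketch below multiplies out the label-set sizes and the per-position transition count; the sentence length `T = 20` is a made-up value used only to show how the O(T × 990²) term grows.

```python
# Label-set sizes quoted on the slide: segmentation, part-of-speech, named entity.
seg_labels, pos_labels, ne_labels = 2, 45, 11

# One joint state per combination of (segmentation, POS, NE) labels.
joint_states = seg_labels * pos_labels * ne_labels

# A linear-chain model over joint states scores every state pair at each position.
transitions_per_step = joint_states ** 2

sentence_length = 20  # hypothetical T, for illustration only
pair_scores_per_sentence = sentence_length * transitions_per_step

print(joint_states, transitions_per_step, pair_scores_per_sentence)
```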
![Page 23: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/23.jpg)
Joint Prediction: Factorial CRF
Named-entity tag (output prediction)
Part-of-speech (output prediction)
Segmentation (output prediction)
Chinese character (input observation)
O(|V| × 990) parameters
![Page 24: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/24.jpg)
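Linear-Chain to Factorial CRFs: Model Definition
Linear-chain (labels y, observations x):
$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \Phi_y(y_t, y_{t-1})\, \Phi_{xy}(x_t, y_t)$$
where
$$\Phi(\cdot) = \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$
Factorial (label chains u, v, w, observations x):
$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \Phi_u(u_t, u_{t-1})\, \Phi_v(v_t, v_{t-1})\, \Phi_w(w_t, w_{t-1})\, \Phi_{uv}(u_t, v_t)\, \Phi_{vw}(v_t, w_t)\, \Phi_{wx}(w_t, x_t)$$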
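To make the linear-chain definition concrete, here is a small sketch that scores a labeling as a sum of log-potentials and normalizes by enumerating every labeling. Enumeration is only feasible for toy inputs and stands in for the dynamic program a real implementation would use. The label set, the feature-key tuples, and the choice to skip the transition term at the first position are assumptions made for illustration.

```python
import itertools
import math

LABELS = ["B", "I", "O"]

def log_potential(weights, features):
    """log Phi = sum_k lambda_k * f_k over the active binary features."""
    return sum(weights.get(f, 0.0) for f in features)

def score(x, y, weights):
    """Unnormalized log-score of label sequence y for observations x."""
    total = 0.0
    for t in range(len(x)):
        # Observation potential on (x_t, y_t).
        total += log_potential(weights, [("obs", x[t], y[t])])
        # Transition potential on (y_{t-1}, y_t); skipped at t = 0 (an assumption).
        if t > 0:
            total += log_potential(weights, [("trans", y[t - 1], y[t])])
    return total

def prob(x, y, weights):
    """p(y | x), with Z(x) computed by brute-force enumeration."""
    z = sum(math.exp(score(x, list(c), weights))
            for c in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(x, y, weights)) / z

weights = {("obs", "Rockwell", "B"): 2.0, ("trans", "B", "I"): 1.0}
print(prob(["Rockwell", "International"], ["B", "I"], weights))
```

A factorial model would add one such transition term per label chain plus the between-chain terms, with the same normalization over all joint labelings.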
![Page 25: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/25.jpg)
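Linear-Chain to Factorial CRFs: Log-likelihood Training
Linear-chain (labels y, observations x):
$$\frac{\partial L}{\partial \lambda_k} = \sum_i \sum_t f_k(x^{(i)}, y_t^{(i)}, y_{t-1}^{(i)}) - \sum_i \sum_t \sum_{\mathbf{y}} p(\mathbf{y} \mid x^{(i)})\, f_k(x^{(i)}, y_t, y_{t-1}) - \frac{\lambda_k}{\sigma^2}$$
Factorial (label chains u, v, w, observations x; shown for chain u):
$$\frac{\partial L}{\partial \lambda_k} = \sum_i \sum_t f_k(x^{(i)}, u_t^{(i)}, u_{t-1}^{(i)}) - \sum_i \sum_t \sum_{\mathbf{u}} p(\mathbf{u} \mid x^{(i)})\, f_k(x^{(i)}, u_t, u_{t-1}) - \frac{\lambda_k}{\sigma^2}$$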
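The gradient of a penalized conditional log-likelihood has the familiar form "observed feature counts minus expected feature counts minus a prior term". The sketch below computes it by brute-force enumeration for a toy linear chain. It assumes a Gaussian prior with variance `sigma2` (so the prior term is λ_k/σ²); the feature keys and label set are illustrative.

```python
import itertools
import math

LABELS = ["B", "I", "O"]

def feature_counts(x, y):
    """Sum of f_k(x, y_t, y_{t-1}) over positions t, as a dict keyed by feature."""
    counts = {}
    for t in range(len(x)):
        keys = [("obs", x[t], y[t])]
        if t > 0:
            keys.append(("trans", y[t - 1], y[t]))
        for k in keys:
            counts[k] = counts.get(k, 0.0) + 1.0
    return counts

def score(x, y, lam):
    return sum(lam.get(k, 0.0) * c for k, c in feature_counts(x, y).items())

def log_likelihood(data, lam, sigma2):
    total = 0.0
    for x, y in data:
        z = sum(math.exp(score(x, c, lam))
                for c in itertools.product(LABELS, repeat=len(x)))
        total += score(x, y, lam) - math.log(z)
    # Gaussian prior penalty (assumed): sum_k lambda_k^2 / (2 sigma^2).
    return total - sum(v * v for v in lam.values()) / (2.0 * sigma2)

def gradient(data, lam, sigma2):
    grad = {k: -v / sigma2 for k, v in lam.items()}       # prior term
    for x, y in data:
        for k, c in feature_counts(x, y).items():          # observed counts
            grad[k] = grad.get(k, 0.0) + c
        labelings = list(itertools.product(LABELS, repeat=len(x)))
        scores = [score(x, c, lam) for c in labelings]
        z = sum(math.exp(s) for s in scores)
        for c, s in zip(labelings, scores):                # expected counts
            p = math.exp(s) / z
            for k, n in feature_counts(x, c).items():
                grad[k] = grad.get(k, 0.0) - p * n
    return grad
```

In practice the expected counts come from pairwise marginals computed by an inference routine rather than from enumeration, which is why inference cost dominates training.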
![Page 26: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/26.jpg)
Dynamic CRFs
Undirected, conditionally-trained analogue to Dynamic Bayes Nets (DBNs)
[Figure: three example structures: Factorial, Higher-Order, Hierarchical]
![Page 27: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/27.jpg)
Need for Inference
• Marginal distributions $p(y_t, y_{t-1} \mid \mathbf{x})$: used during training
• Most-likely (Viterbi) labeling $\arg\max_{\mathbf{y}} p(\mathbf{y} \mid \mathbf{x})$: used to label a sequence
9000 training instances × 100 maximizer iterations = 900,000 calls to inference algorithm!
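For a linear chain, the most-likely labeling can be found exactly with the Viterbi recursion. The sketch below works on log-potentials supplied as plain lists; the argument names `obs_scores` and `trans_scores` and the toy numbers are assumptions for illustration.

```python
def viterbi(obs_scores, trans_scores):
    """obs_scores[t][j]: log-potential of label j at position t.
    trans_scores[i][j]: log-potential of moving from label i to label j.
    Returns the highest-scoring sequence of label indices."""
    n_labels = len(trans_scores)
    best = list(obs_scores[0])   # best score of any path ending in each label
    back = []                    # back-pointers, one list per position t >= 1
    for t in range(1, len(obs_scores)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            i_star = max(range(n_labels), key=lambda i: best[i] + trans_scores[i][j])
            pointers.append(i_star)
            new_best.append(best[i_star] + trans_scores[i_star][j] + obs_scores[t][j])
        best = new_best
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: best[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

obs = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
trans = [[0.0, -2.0], [-2.0, 0.0]]
print(viterbi(obs, trans))
```

Replacing the max with a sum (in probability space) gives the forward pass used to compute the marginals needed during training.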
![Page 28: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/28.jpg)
Inference (Exact): Junction Tree
[Figure: factorial CRF with NP and POS chains]
Max-clique: 3 × 45 × 45 = 6075 assignments
![Page 29: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/29.jpg)
Inference (Exact): Junction Tree
[Figure: factorial CRF with NER, POS, and SEG chains]
Max-clique: 3 × 45 × 45 × 11 = 66825 assignments
![Page 30: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/30.jpg)
Inference (Approximate): Loopy Belief Propagation
[Figure: loopy graph over nodes v1–v6; each edge carries messages m_i(v_j) in both directions]
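A minimal sketch of sum-product message passing on a six-node loopy graph. The edge list is inferred from the message labels on this slide (1-2, 2-3, 1-4, 2-5, 3-6, 4-5, 5-6). The parallel update schedule, the fixed iteration count, and the single shared symmetric edge potential are simplifying assumptions; on a graph with cycles the resulting beliefs are approximations, not exact marginals.

```python
import math

EDGES = [(1, 2), (2, 3), (1, 4), (2, 5), (3, 6), (4, 5), (5, 6)]

def loopy_bp(unary, pair, n_iters=50):
    """unary[v][a]: node potential for node v in state a.
    pair[a][b]: symmetric edge potential shared by all edges (an assumption).
    Returns approximate marginals (beliefs) for every node."""
    n_states = len(pair)
    neighbors = {v: [] for v in unary}
    for s, t in EDGES:
        neighbors[s].append(t)
        neighbors[t].append(s)
    # msgs[(s, t)][b]: message from node s to node t about t being in state b.
    msgs = {(s, t): [1.0] * n_states for s in neighbors for t in neighbors[s]}
    for _ in range(n_iters):
        new = {}
        for (s, t) in msgs:
            out = []
            for b in range(n_states):
                total = 0.0
                for a in range(n_states):
                    # Product of messages into s from all neighbors except t.
                    incoming = 1.0
                    for r in neighbors[s]:
                        if r != t:
                            incoming *= msgs[(r, s)][a]
                    total += unary[s][a] * pair[a][b] * incoming
                out.append(total)
            z = sum(out)
            new[(s, t)] = [m / z for m in out]   # normalize for numerical stability
        msgs = new
    beliefs = {}
    for v in neighbors:
        b = [unary[v][a] * math.prod(msgs[(r, v)][a] for r in neighbors[v])
             for a in range(n_states)]
        z = sum(b)
        beliefs[v] = [p / z for p in b]
    return beliefs
```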
![Page 31: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/31.jpg)
Inference (Approximate): Tree Re-parameterization
[Figure: graph with nodes 1–6 and edges 12, 14, 23, 25, 36, 45, 56]
[Wainwright, Jaakkola, Willsky 2001]
![Page 33: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/33.jpg)
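Inference (Approximate): Tree Re-parameterization
[Figure: graph with nodes 1–6 carrying node terms p1–p6; edges 12, 14, 23, 25, and 36 carry the ratios p12/(p1 p2), p14/(p1 p4), p23/(p2 p3), p25/(p2 p5), and p36/(p3 p6); edges 45 and 56 carry no ratio]
[Wainwright, Jaakkola, Willsky 2001]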
![Page 35: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/35.jpg)
Experiments: Simultaneous noun-phrase & part-of-speech tagging
• Data from CoNLL Shared Task 2000 (Newswire)
– Training subsets of various sizes: from 223 to 894 sentences
– Features include: word identity, neighboring words, capitalization, lexicons of parts-of-speech, company names (1,358227 feature functions!)
NP:    B I I B I I O O O
POS:   N N N O N N V O V
Words: Rockwell International Corp. 's Tulsa unit said it signed
NP:    B I I O B I O B I
POS:   O J N V O N O N N
Words: a tentative agreement extending its contract with Boeing Co.
![Page 36: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/36.jpg)
Experiments: Simultaneous noun-phrase & part-of-speech tagging
Two experiments
• Compare exact and approximate inference
• Compare accuracy of cascaded CRFs and Factorial DCRFs
NP:    B I I B I I O O O
POS:   N N N O N N V O V
Words: Rockwell International Corp. 's Tulsa unit said it signed
NP:    B I I O B I O B I
POS:   O J N V O N O N N
Words: a tentative agreement extending its contract with Boeing Co.
![Page 37: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/37.jpg)
Noun Phrase Accuracy
![Page 38: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/38.jpg)
Accuracy
POS-tagger, (Brill, 1994) F1 for NP on 8936: 93.87
![Page 39: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/39.jpg)
Summary
• Many natural language tasks are solved by chaining error-prone subtasks.
• Approach: Jointly solve all subtasks in a single graphical model.
– Learn dependence between subtasks
– Allow higher-level predictions to inform lower-level ones
• Improved joint and POS accuracy over cascaded model, but NP accuracy lower.
• Current work: Emphasize one subtask
![Page 40: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/40.jpg)
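Maximize Marginal Likelihood (Ongoing work)
[Figure: factorial CRF with NP and POS chains]
$$O(\lambda) = \sum_i \log p(\mathrm{np}^{(i)} \mid x^{(i)}) = \sum_i \log \sum_{\mathrm{pos}} p(\mathrm{np}^{(i)}, \mathrm{pos} \mid x^{(i)})$$
$$\frac{\partial O}{\partial \lambda_k} = \sum_i \sum_{\mathrm{pos}} p(\mathrm{pos} \mid \mathrm{np}^{(i)}, x^{(i)})\, f_k(\mathrm{pos}, \mathrm{np}^{(i)}, x^{(i)}) - \sum_i \sum_{\mathrm{pos}} \sum_{\mathrm{np}} p(\mathrm{pos}, \mathrm{np} \mid x^{(i)})\, f_k(\mathrm{pos}, \mathrm{np}, x^{(i)})$$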
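As a toy illustration of the marginal objective, the sketch below sums a joint distribution over the POS label to get the log-probability of an NP label alone. It handles a single position with a hand-written table of joint log-scores; the function name `log_marginal` and the scores are made up for illustration and do not come from the talk.

```python
import math

def log_marginal(joint_scores, np_label):
    """joint_scores[(np, pos)]: unnormalized log-score of a joint labeling.
    Returns log p(np_label | x) = log sum_pos p(np_label, pos | x)."""
    log_z = math.log(sum(math.exp(s) for s in joint_scores.values()))
    log_num = math.log(sum(math.exp(s)
                           for (n, _), s in joint_scores.items() if n == np_label))
    return log_num - log_z

scores = {("B", "N"): math.log(3.0), ("B", "V"): 0.0,
          ("O", "N"): 0.0, ("O", "V"): 0.0}
print(math.exp(log_marginal(scores, "B")))
```

Training on this quantity rewards the model for getting the NP label right regardless of which POS label it would have paired with it.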
![Page 41: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/41.jpg)
Thank you!
![Page 43: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/43.jpg)
State-of-the-art Performance
• POS tagging:
– 97% (Brill, 1999)
• NP chunking:
– 94.38% (Sha and Pereira)
– 94.39% (?)
![Page 44: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/44.jpg)
Alternatives to Traditional Joint
• Optimize Marginal Likelihood
• Optimize Utility
• Optimize Margin (M3N) [Taskar, Guestrin, Koller 2003]
![Page 46: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/46.jpg)
Undirected Graphical Models
Directed
Undirected
![Page 47: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/47.jpg)
Hidden Markov Models
Graphical Model, Training
[Figure: HMM graphical model; each edge is labeled with a conditional probability p(·|·)]
p(·,·) = p(·) p(·|·) p(·|·) p(·|·) p(·|·) p(·|·)
![Page 48: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813763550346895d9ef5b7/html5/thumbnails/48.jpg)
Hidden Markov Models
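Finite-State
[Figure: finite-state view; each transition is labeled with a conditional probability p(·|·)]
Graphical Model
[Figure: HMM graphical model; each edge is labeled with a conditional probability p(·|·)]
p(·,·) = p(·) p(·|·) p(·|·) p(·|·) p(·|·) p(·|·)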