CS388: Natural Language Processing Lecture 5: Named En ...gdurrett/courses/sp2021/...CS388: Natural...
Transcript of CS388: Natural Language Processing Lecture 5: Named En ...gdurrett/courses/sp2021/...CS388: Natural...
CS388:NaturalLanguageProcessing
GregDurre8
Lecture5:NamedEn=tyRecogni=on,CRFs
Administrivia
‣ Project1duenextThursday
‣Mini1gradingunderway
Recall:HMMs
‣ Inferenceproblem:
‣ Viterbi:
y1 y2 yn
x1 x2 xn
… P (y,x) = P (y1)nY
i=2
P (yi|yi�1)nY
i=1
P (xi|yi)
argmaxyP (y|x) = argmaxyP (y,x)
P (x)
‣ Training:maximumlikelihoodes=ma=on(count+normalize)
scorei(s) = maxyi�1
P (s|yi�1)P (xi|s)scorei�1(yi�1)
y = (y1, ..., yn)Output‣ Inputx = (x1, ..., xn)
Recall:ViterbiAlgorithm
slidecredit:DanKlein
‣ Computescoresfornext=mestep(scoreofop=maltagsequenceendingwithtagiat=mestept)
Viterbi/HMMs:OtherResources
‣ Lecturenotesfrommyundergradcourse(postedonline)
‣ EisensteinChapter7.3butthenota=oncoversamoregeneralcasethanwhat’sdiscussedforHMMs
‣ Jurafsky+Mar=n8.4.5
‣WeignoretheSTOPtokenhere.It’snotinthetagsetandjustdon’tusetheseprobabili=es
ThisLecture
‣ Condi=onalrandomfields
‣ Next=me:finishupNERsystems
‣ FeaturesforNER
‣ InferenceandLearninginCRFs
NamedEn=tyRecogni=on
BarackObamawilltraveltoHangzhoutodayfortheG20mee=ng.
PERSON LOC ORG
B-PER I-PER O O O B-LOC B-ORGO O O O O
‣ BIOtagset:begin,inside,outside
‣WhymightanHMMnotdosowellhere?
‣ LotsofO’s
‣ Sequenceoftags—shouldweuseanHMM?
‣ Insufficientfeatures/capacitywithmul=nomials(especiallyforunks)
HMMsProsandCons
‣ Condi=onalrandomfields:logis=cregression+featuresonpairsofy’s
‣ Bigadvantage:transi=ons,scoringpairsofadjacenty’s
y1 y2 yn
x1 x2 xn
…
‣ Bigdownside:notabletoincorporateusefulwordcontextinforma=on
‣ Solu=on:switchfromgenera=vetodiscrimina=vemodel(condi=onalrandomfields)sowecancondi=onontheen=reinput.
Condi=onalRandomFields
Condi=onalRandomFields
‣ Flexiblediscrimina=vemodelfortaggingtasksthatcanusearbitraryfeaturesoftheinput.Similartologis=cregression,butstructured
BarackObamawilltraveltoHangzhoutodayfortheG20mee=ng.
B-PER I-PER
Curr_word=Barack&Label=B-PERNext_word=Obama&Label=B-PERCurr_word_starts_with_capital=True&Label=B-PERPosn_in_sentence=1st&Label=B-PERLabel=B-PER&Next-Label=I-PER…
TaggingwithLogis=cRegression‣ Logis=cregressionovereachtagindividually:
P (yi = y|x, i) = exp(w>f(y, i,x))Py02Y exp(w>f(y0, i,x))
<latexit sha1_base64="dDjgKeKxIN481MkDmhKPLvs3x3A=">AAADgnicrVJdb9MwFHWTASN8dfDIi0VVrUVVSTak8UClCV54LBLdhpouchxns+Y4ke2MWcb/g9/FG78G3DZja4vghStZOjn3nptz7ZtWjEoVhj9anr915+697fvBg4ePHj9p7zw9kmUtMJngkpXiJEWSMMrJRFHFyEklCCpSRo7Ti/fz/PElEZKW/JPSFZkV6IzTnGKkHJXstL6NezqhI/01LpA6T3NzZQe0D0cwzgXCJiZXVe869cWexqqs4PV3bnt6QAc3yn7fmljWRWL0LowpX1ZixMxna+E/e+2uNwu6498CbUexoiwj5oaxm64rUWaJoaPInnK4nK2RaZvQ1fqguzrmwnkj/YvPW/1WDf/P4W3S7oTDcBFwE0QN6IAmxkn7e5yVuC4IV5ghKadRWKmZQUJRzIgN4lqSCuELdEamDnJUEDkzixWysOuYDOalcIcruGBvKwwqpNRF6irnHuV6bk7+KTetVf5mZiivakU4Xv4orxlUJZzvI8yoIFgx7QDCgjqvEJ8j9ybKbW3gLiFaH3kTHO0No/3h3sfXncN3zXVsg+fgBeiBCByAQ/ABjMEE4NZPr+sNvVf+lv/Sj/z9ZanXajTPwEr4b38Bi9IlZA==</latexit>
“differentfeatures”approachtofeaturesforasingletag
Probabilityoftheithwordgemngassignedtagy(B-PER,etc.)
TaggingwithLogis=cRegression
‣ SetZequaltotheproductofdenominators;we’lldiscussthisinafewslides
‣ Logis=cregressionovereachtagindividually:
P (yi = y|x, i) = exp(w>f(y, i,x))Py02Y exp(w>f(y0, i,x))
<latexit sha1_base64="dDjgKeKxIN481MkDmhKPLvs3x3A=">AAADgnicrVJdb9MwFHWTASN8dfDIi0VVrUVVSTak8UClCV54LBLdhpouchxns+Y4ke2MWcb/g9/FG78G3DZja4vghStZOjn3nptz7ZtWjEoVhj9anr915+697fvBg4ePHj9p7zw9kmUtMJngkpXiJEWSMMrJRFHFyEklCCpSRo7Ti/fz/PElEZKW/JPSFZkV6IzTnGKkHJXstL6NezqhI/01LpA6T3NzZQe0D0cwzgXCJiZXVe869cWexqqs4PV3bnt6QAc3yn7fmljWRWL0LowpX1ZixMxna+E/e+2uNwu6498CbUexoiwj5oaxm64rUWaJoaPInnK4nK2RaZvQ1fqguzrmwnkj/YvPW/1WDf/P4W3S7oTDcBFwE0QN6IAmxkn7e5yVuC4IV5ghKadRWKmZQUJRzIgN4lqSCuELdEamDnJUEDkzixWysOuYDOalcIcruGBvKwwqpNRF6irnHuV6bk7+KTetVf5mZiivakU4Xv4orxlUJZzvI8yoIFgx7QDCgjqvEJ8j9ybKbW3gLiFaH3kTHO0No/3h3sfXncN3zXVsg+fgBeiBCByAQ/ABjMEE4NZPr+sNvVf+lv/Sj/z9ZanXajTPwEr4b38Bi9IlZA==</latexit>
=1
Zexp
nX
i=1
w>f(yi, i,x)
!
<latexit sha1_base64="9mWNFhX2lavGP1rdwRTHCIBRYmE=">AAADR3ichVJNj9MwEHWyfCzhqwtHLhbValOpqpoFCS6VVnDhWCS6u1BnI8dxWmudD9kObGT877hw5cZf4MIBhDjiNql22yIYKdLzm3mTN6OJS86kGg6/Ou7Otes3bu7e8m7fuXvvfmfvwbEsKkHohBS8EKcxlpSznE4UU5yeloLiLOb0JD5/ucifvKdCsiJ/o+qShhme5SxlBCtLRXtOuD/264iN6o8ow2oep/rC9FkPjiBKBSYa0YvSX6U+mDOkihKu3qnx6z7rXyp7PaORrLJI1wcQsbypJJjrt8bA//Y62GzmWXcrojYjpBhPqL5kzLbrUhRJpNkoMGc5bGZrZbWJ2Hq9t5oyMPpd4w9xmiq/maFt8g/HVzqvW0eCzeaqF3W6w8FwGXAbBC3ogjbGUecLSgpSZTRXhGMpp8GwVKHGQjHCqfFQJWmJyTme0amFOc6oDPXyDgzct0wC00LYL1dwyV5VaJxJWWexrVw4lZu5Bfm33LRS6fNQs7ysFM1J86O04lAVcHFUMGGCEsVrCzARzHqFZI7tYpU9Pc8uIdgceRscHw6CJ4PD10+7Ry/adeyCR+Ax8EEAnoEj8AqMwQQQ55Pzzfnh/HQ/u9/dX+7vptR1Ws1DsBY7zh/X7xNL</latexit>
‣ Scoreofapredic=on:sumofweightsdotfeaturesovereachindividualpredictedtag(thisisasimpleCRFbutnotthegeneralform)
“differentfeatures”approachtofeaturesforasingletag
P (y = y|x) =nY
i=1
P (yi = yi|x, i)<latexit sha1_base64="u6eeugJqXQbeP8PnEClvMeO7rAk=">AAAE83ic1VTLbhMxFHU7A7Th0RSWbCyiqIkaVZmCBJtIFWxYBom0pXE68jiexKrnIdsDHRn/BhsWIMSWn2HH3+BkJuRFeYgVVxrpzH0cn2PZDlLOpGq3v21sOu616ze2tis3b92+s1PdvXssk0wQ2iMJT8RpgCXlLKY9xRSnp6mgOAo4PQkunk3qJ6+pkCyJX6o8pYMIj2IWMoKVTfm7zna928h91snfogircRDqS9NiTdiBKBSYaEQv08as9MacI5WkcPYfmkbeYq35ZLNpNJJZ5Ot8DyIWF50Ec/3KGPhbrr1Vskr3R39uOkgxPqR6njELoqeSU5EMfc06njmPYWGsHMqNz5YtVuozj57RZ4U6xGmoGoWDkuUXeheol4UjwUZjZVf4S/n/osanV+vZh39KopZI5lizfc9cYfJsIn3Kv+7wPzLiV2vtg/Y04DrwSlADZXT96lc0TEgW0VgRjqXse+1UDTQWihFOTQVlkqaYXOAR7VsY44jKgZ7eWQPrNjOEYSLsFys4zS5OaBxJmUeB7Zwolau1SfJntX6mwicDzeI0UzQmxUJhxqFK4OQBgEMmKFE8twATwaxWSMbYHjxln4mK3QRv1fI6OD488B4eHL54VDt6Wm7HFrgPHoAG8MBjcASegy7oAeKkzjvng/PRzdz37if3c9G6uVHO3ANL4X75DrNZtXc=</latexit>
‣ Overalltags:
‣ Condi=onalmodel:xisobserved,yisn’t
Example:“EmissionFeatures”fe
BarackObamawilltravelB-PERI-PEROO
BarackObamawilltravelB-PERB-PEROO
feats=fe(B-PER,i=1,x)+fe(I-PER,i=2,x)+fe(O,i=3,x)+fe(O,i=4,x)
feats=fe(B-PER,i=1,x)+fe(B-PER,i=2,x)+fe(O,i=3,x)+fe(O,i=4,x)
[CurrWord=Obama&label=I-PER,PrevWord=Barack&label=I-PER,CurrWordIsCapitalized&label=I-PER,…]
AddingStructure
‣Wewanttobeabletolearnthatsometagsdon’tfollowothertags—wanttohavefeaturesontagpairs
=1
Zexp
nX
i=1
w>f(yi, i,x)
!
<latexit sha1_base64="9mWNFhX2lavGP1rdwRTHCIBRYmE=">AAADR3ichVJNj9MwEHWyfCzhqwtHLhbValOpqpoFCS6VVnDhWCS6u1BnI8dxWmudD9kObGT877hw5cZf4MIBhDjiNql22yIYKdLzm3mTN6OJS86kGg6/Ou7Otes3bu7e8m7fuXvvfmfvwbEsKkHohBS8EKcxlpSznE4UU5yeloLiLOb0JD5/ucifvKdCsiJ/o+qShhme5SxlBCtLRXtOuD/264iN6o8ow2oep/rC9FkPjiBKBSYa0YvSX6U+mDOkihKu3qnx6z7rXyp7PaORrLJI1wcQsbypJJjrt8bA//Y62GzmWXcrojYjpBhPqL5kzLbrUhRJpNkoMGc5bGZrZbWJ2Hq9t5oyMPpd4w9xmiq/maFt8g/HVzqvW0eCzeaqF3W6w8FwGXAbBC3ogjbGUecLSgpSZTRXhGMpp8GwVKHGQjHCqfFQJWmJyTme0amFOc6oDPXyDgzct0wC00LYL1dwyV5VaJxJWWexrVw4lZu5Bfm33LRS6fNQs7ysFM1J86O04lAVcHFUMGGCEsVrCzARzHqFZI7tYpU9Pc8uIdgceRscHw6CJ4PD10+7Ry/adeyCR+Ax8EEAnoEj8AqMwQQQ55Pzzfnh/HQ/u9/dX+7vptR1Ws1DsBY7zh/X7xNL</latexit>
‣ Score:sumofweightsdotfefeaturesovereachpredictedtag(“emissions”)plussumofweightsdotftfeaturesovertagpairs(“transi=ons”)
P (y = y|x) =nY
i=1
P (yi = yi|x, i)<latexit sha1_base64="u6eeugJqXQbeP8PnEClvMeO7rAk=">AAAE83ic1VTLbhMxFHU7A7Th0RSWbCyiqIkaVZmCBJtIFWxYBom0pXE68jiexKrnIdsDHRn/BhsWIMSWn2HH3+BkJuRFeYgVVxrpzH0cn2PZDlLOpGq3v21sOu616ze2tis3b92+s1PdvXssk0wQ2iMJT8RpgCXlLKY9xRSnp6mgOAo4PQkunk3qJ6+pkCyJX6o8pYMIj2IWMoKVTfm7zna928h91snfogircRDqS9NiTdiBKBSYaEQv08as9MacI5WkcPYfmkbeYq35ZLNpNJJZ5Ot8DyIWF50Ec/3KGPhbrr1Vskr3R39uOkgxPqR6njELoqeSU5EMfc06njmPYWGsHMqNz5YtVuozj57RZ4U6xGmoGoWDkuUXeheol4UjwUZjZVf4S/n/osanV+vZh39KopZI5lizfc9cYfJsIn3Kv+7wPzLiV2vtg/Y04DrwSlADZXT96lc0TEgW0VgRjqXse+1UDTQWihFOTQVlkqaYXOAR7VsY44jKgZ7eWQPrNjOEYSLsFys4zS5OaBxJmUeB7Zwolau1SfJntX6mwicDzeI0UzQmxUJhxqFK4OQBgEMmKFE8twATwaxWSMbYHjxln4mK3QRv1fI6OD488B4eHL54VDt6Wm7HFrgPHoAG8MBjcASegy7oAeKkzjvng/PRzdz37if3c9G6uVHO3ANL4X75DrNZtXc=</latexit>
‣ Thisisasequen=alCRF
P (y = y|x) = 1
Zexp
nX
i=1
w>fe(yi, i,x) +nX
i=2
w>ft(yi�1, yi, i,x)
!
<latexit sha1_base64="lyRXhBWi93dh2TsKkX2Efg4P9mc=">AAAC0HicfVJNj9MwEHXC11K+Chy5WFRIXVGqpCDBpdIKLlyQCqK7K+o2clyntdZxInsCjbwW4srP48Yv4G/gdlPo7iJGsvTmjefNeMZpKYWBKPoZhFeuXrt+Y+9m69btO3fvte8/ODRFpRkfs0IW+jilhkuh+BgESH5cak7zVPKj9OTNOn70mWsjCvUR6pJPc7pQIhOMgqeS9q9Rl+QUlmlmazckIOSc27+MO93ildvHQ0wyTZmNnf3kMOGrkkieQZeYKk+sGMZupvA24YubESjKP37mEt5tCtQuET3R29F+irciAzezyv1PBnZkrHgWu96u784LEy0WS9hP2p2oH20MXwZxAzqosVHS/kHmBatyroBJaswkjkqYWqpBMMldi1SGl5Sd0AWfeKhozs3Ubhbi8BPPzHFWaH8U4A27m2Fpbkydp/7mulNzMbYm/xWbVJC9mlqhygq4YmeFskpiKPB6u3guNGcgaw8o08L3itmS+p2B/wMtP4T44pMvg8NBP37eH7x/0Tl43YxjDz1Cj1EXxeglOkBv0QiNEQveBSY4DVz4IVyFX8NvZ1fDoMl5iM5Z+P03x2PmcQ==</latexit>
Example
BarackObamawilltravelB-PERI-PEROO
BarackObamawilltravelB-PERB-PEROO
feats=fe(B-PER,i=1,x)+fe(I-PER,i=2,x)+fe(O,i=3,x)+fe(O,i=4,x)+ft(B-PER,I-PER,i=1,x)+ft(I-PER,O,i=2,x)+ft(O,O,i=3,x)
feats=fe(B-PER,i=1,x)+fe(B-PER,i=2,x)+fe(O,i=3,x)+fe(O,i=4,x)+ft(B-PER,B-PER,i=1,x)+ft(B-PER,O,i=2,x)+ft(O,O,i=3,x)
‣ Obamacanstartanewnameden=ty(emissionfeatslookokay),butwe’renotlikelytohavetwoPERen==esinarow(transi=onfeats)
Sequen=alCRFs
‣ Cri=calproperty:thisstructureisgoingtoallowustousedynamicprogramming(Viterbi)tosumormaxoverallsequences
P (y = y|x) = 1
Zexp
nX
i=1
w>fe(yi, i,x) +nX
i=2
w>ft(yi�1, yi, i,x)
!
<latexit sha1_base64="lyRXhBWi93dh2TsKkX2Efg4P9mc=">AAAC0HicfVJNj9MwEHXC11K+Chy5WFRIXVGqpCDBpdIKLlyQCqK7K+o2clyntdZxInsCjbwW4srP48Yv4G/gdlPo7iJGsvTmjefNeMZpKYWBKPoZhFeuXrt+Y+9m69btO3fvte8/ODRFpRkfs0IW+jilhkuh+BgESH5cak7zVPKj9OTNOn70mWsjCvUR6pJPc7pQIhOMgqeS9q9Rl+QUlmlmazckIOSc27+MO93ildvHQ0wyTZmNnf3kMOGrkkieQZeYKk+sGMZupvA24YubESjKP37mEt5tCtQuET3R29F+irciAzezyv1PBnZkrHgWu96u784LEy0WS9hP2p2oH20MXwZxAzqosVHS/kHmBatyroBJaswkjkqYWqpBMMldi1SGl5Sd0AWfeKhozs3Ubhbi8BPPzHFWaH8U4A27m2Fpbkydp/7mulNzMbYm/xWbVJC9mlqhygq4YmeFskpiKPB6u3guNGcgaw8o08L3itmS+p2B/wMtP4T44pMvg8NBP37eH7x/0Tl43YxjDz1Cj1EXxeglOkBv0QiNEQveBSY4DVz4IVyFX8NvZ1fDoMl5iM5Z+P03x2PmcQ==</latexit>
‣ HowdoesthiscomparetoHMMs?
HMMsvs.CRFs
y1 y2 yn
x1 x2 xn
…
‣ Bothmodelsareexpressibleindifferentfactorgraphnota=on
P (y,x) = P (y1)P (x1|y1)P (y2|y1)P (x2|y2) . . .
y1 y2 yn…
�t
�e
P (y|x) = 1
Z
nY
i=2
exp(�t(yi�1, yi))nY
i=1
exp(�e(yi, i,x))
‣ Phisare“poten=als”,usedinthegeneralCRFformula=on
P (y,x) = P (y1)nY
i=2
P (yi|yi�1)nY
i=1
P (xi|yi)<latexit sha1_base64="DwFPL8cTe3GQA7nLvAL1TZ0b6wg=">AAADkXicbVJbb9MwFHYTLiPA6LZHXiyqilSUqukmAQ9FFbwg8VIkug7qLnJcp7WWm2xna5T69/B/eOPf4DZZlXY9kqXP5/t8bj5eEjAhu91/NcN89PjJ06Nn1vMXL49f1U9OL0WcckJHJA5ifuVhQQMW0ZFkMqBXCac49AI69m6+rvnxLeWCxdFPmSV0GuJ5xHxGsNQu96T2pzm0sxUKsVx4fr5ULdiHyOeY5IguE/ueuFNudo1knEDfrohbKkciDd08ewsRi+CGIjjIfykF9wNokToYw2pWVOcHJFog6VLyMBexL0O8VPZ4N4al29ho9D1Tq3tY7cdR+e+iKHhX5EAB9eUEFh2wfk9dR9B3pZ3p23tHtTOXteC7Le8UPNU8a7N2JT3ibL6QU2u4rSlT7d2Z6im7TguihMezbbK1k8EVLBNWaKeglyXNWm690e10NwYfAqcEDVDa0K3/RbOYpCGNJAmwEBOnm8hpjrlkJKDKQqmgCSY3eE4nGkY4pGKabzZKwab2zKAfc30iCTfe6osch0JkoaeV6y7FPrd2HuImqfQ/TnMWJamkESkS+WkAZQzX6wlnjFMig0wDTDjTtUKywPr7pF5iSw/B2W/5IbjsdZzzTu/HRWPwpRzHEXgN3gAbOOADGIBvYAhGgBjHxoXRNz6bZ+Ync2CWWqNWvjkDO2Z+/w+GlSQt</latexit>
CRF
HMM
HMMsvs.CRFs
‣ HMMs:inthestandardHMM,emissionsconsideronewordata=me
‣ CRFssupportfeaturesovermanywordssimultaneously,non-independentfeatures(e.g.,suffixesandprefixes),notgenera=vemodels
‣ NaiveBayes:logis=cregression::HMMs:CRFslocalvs.globalnormaliza=on<->genera=vevs.discrimina=ve
(locallynormalizeddiscrimina=vemodelsdoexist(MEMMs))
CRFsinGeneral
anyreal-valuedscoringfunc=onofitsarguments
‣ CRFs:discrimina=vemodelwiththefollowingform:
P (y|x) = 1
Z
Y
k
exp(�k(x,y))
normalizer
‣ Ourspecialcase:linearfeature-basedpoten=als �k(x,y) = w>fk(x,y)
P (y|x) = 1
Zexp
nX
k=1
w>fk(x,y)
!
‣ Problem:intractableinferenceinthegeneralcase!Compu=ngZrequiresanexponentsum
FeaturesforNER
BasicFeaturesforNER
BarackObamawilltraveltoHangzhoutodayfortheG20mee=ng.
OB-LOC
Transi=ons:
Emissions: Ind[B-LOC&Currentword=Hangzhou]Ind[B-LOC&Prevword=to]
ft(yi�1, yi) = Ind[yi�1 & yi]
fe(y6, 6,x) =
P (y|x) / expw>
"nX
i=2
ft(yi�1, yi) +nX
i=1
fe(yi, i,x)
#
=Ind[O—B-LOC]Ind[yi-1—yi]
EmissionFeaturesforNER
Leicestershireisaniceplacetovisit…
Itookavaca=ontoBoston
Applereleasedanewversion…
AccordingtotheNewYorkTimes…
ORG
ORG
LOC
LOC
TexasgovernorGregAbboJsaid
LeonardoDiCapriowonanaward…
PER
PER
LOC
�e(yi, i,x)
EmissionFeaturesforNER
‣ Contextfeatures(can’tuseinHMM!)‣Wordsbefore/auer‣ Tagsbefore/auer
‣Wordfeatures(canuseinHMM)‣ Capitaliza=on‣Wordshape‣ Prefixes/suffixes‣ Lexicalindicators
‣ Gaze8eers‣Wordclusters
Leicestershire
Boston
Applereleasedanewversion…
AccordingtotheNewYorkTimes…
CRFsOutline
‣Model: P (y|x) = 1
Z
nY
i=2
exp(�t(yi�1, yi))nY
i=1
exp(�e(yi, i,x))
‣ Inference
‣ Learning
P (y|x) / expw>
"nX
i=2
ft(yi�1, yi) +nX
i=1
fe(yi, i,x)
#
InferenceandLearninginCRFs
Compu=ng(arg)maxes
y1 y2 yn…
�e
�t
‣ :canuseViterbiexactlyasinHMMcase
‣ andplaytheroleofthePsnow,usetheexactsameViterbidynamicprogramexp(�t(yi�1, yi)) exp(�e(yi, i,x))
P (y|x) = 1
Z
nY
i=2
exp(�t(yi�1, yi))nY
i=1
exp(�e(yi, i,x))
argmaxyP (y|x)
maxy1,...,yn
e�t(yn�1,yn)e�e(yn,n,x) · · · e�e(y2,2,x)e�t(y1,y2)e�e(y1,1,x)
InferenceinGeneralCRFs
y1 y2 yn…
�e
�t
‣ Candoefficientinferenceinanytree-structuredCRF
‣Max-productalgorithm:generaliza=onofViterbitoarbitrarytree-structuredgraphs(sum-productisgeneraliza=onofforward-backward)
CRFsOutline
‣Model: P (y|x) = 1
Z
nY
i=2
exp(�t(yi�1, yi))nY
i=1
exp(�e(yi, i,x))
‣ Inference:argmaxP(y|x)fromViterbi
‣ Learning
P (y|x) / expw>
"nX
i=2
ft(yi�1, yi) +nX
i=1
fe(yi, i,x)
#
TrainingCRFs
‣ Gradientisanalogoustologis=cregression:goldfeats—expectedfeats
P (y|x) / expw>
"nX
i=2
ft(yi�1, yi) +nX
i=1
fe(yi, i,x)
#
P (y|x) / expw>f(x, y)‣ Logis=cregression:
‣ ForCRFs:maximize L(y⇤,x) = logP (y⇤|x)
intractable!
@
@wL(y⇤,x) =
nX
i=2
ft(y⇤i�1, y
⇤i ) +
nX
i=1
fe(y⇤i , i,x)
�Ey
"nX
i=2
ft(yi�1, yi) +nX
i=1
fe(yi, i,x)
#
TrainingCRFs
‣ Let’sfocusonemissionfeatureexpecta=on
@
@wL(y⇤,x) =
nX
i=2
ft(y⇤i�1, y
⇤i ) +
nX
i=1
fe(y⇤i , i,x)
�Ey
"nX
i=2
ft(yi�1, yi) +nX
i=1
fe(yi, i,x)
#
Ey
"nX
i=1
fe(yi, i,x)
#=
X
y2YP (y|x)
"nX
i=1
fe(yi, i,x)
#=
nX
i=1
X
y2YP (y|x)fe(yi, i,x)
=nX
i=1
X
s
P (yi = s|x)fe(s, i,x)
TrainingCRFs
=nX
i=1
X
s
P (yi = s|x)fe(s, i,x)sumover=mesteps
sumovertagsfeatsofthattagatthatstep
marginalprobability
Forward-BackwardAlgorithm‣ Howdowecomputethesemarginals?P (yi = s|x)
P (yi = s|x) =X
y1,...,yi�1,yi+1,...,yn
P (y|x)
‣WhatdidViterbicompute? P (ymax|x) = maxy1,...,yn
P (y|x)
‣ Cancomputemarginalswithdynamicprogrammingaswellusingforward-backward
Forward-BackwardAlgorithm
P (y3 = 2|x) =sum of all paths through state 2 at time 3
sum of all paths
Forward-BackwardAlgorithm
slidecredit:DanKlein
P (y3 = 2|x) =sum of all paths through state 2 at time 3
sum of all paths
=
‣ Easiestandmostflexibletodoonepasstocomputeandonetocompute
Forward-BackwardAlgorithm
‣ Ini=al:
‣ Recurrence:
‣ SameasViterbibutsumminginsteadofmaxing!
‣ Thesequan==esgetverysmall!Storeeverythingaslogprobabili=es
↵1(s) = exp(�e(s, 1,x))<latexit sha1_base64="H++BwoHSkR4cFwia4mrESCMTX7M=">AAACE3icbVBNS8NAEN3Ur1q/oh69LBahlVKSKuhFEL14VLC20JSw2U7apZtk2d1IS+h/8OJf8eJBEa9evPlv3NYetPXBwOO9GWbmBYIzpR3ny8otLC4tr+RXC2vrG5tb9vbOnUpSSaFOE57IZkAUcBZDXTPNoSkkkCjg0Aj6l2O/cQ9SsSS+1UMB7Yh0YxYySrSRfPvQI1z0iO+WVBmfYQ8GouSJHvOhpCpuxYuI7gVhNhiVy75ddKrOBHieuFNSRFNc+/an10loGkGsKSdKtVxH6HZGpGaUw6jgpQoEoX3ShZahMYlAtbPJTyN8YJQODhNpKtZ4ov6eyEik1DAKTOf4RjXrjcX/vFaqw9N2xmKRaojpz6Iw5VgneBwQ7jAJVPOhIYRKZm7FtEckodrEWDAhuLMvz5O7WtU9qtZujovnF9M48mgP7aMSctEJOkdX6BrVEUUP6Am9oFfr0Xq23qz3n9acNZ3ZRX9gfXwDGCWcbA==</latexit>
↵t(st) =X
st�1
↵t�1(st�1) exp(�e(st, t,x))<latexit sha1_base64="MsrARqnMVgv4lQtn8MktFrMi2DM=">AAAChHicbVFNT9wwEHVCS+n2g2175GKxWjWR0lUMreilCNFLjyB1AWmzshyvw1o4iWVPEKsov4R/1Rv/pk4IogsdydLze2/G45lUK2khju88f+PFy81XW68Hb96+e789/PDxzJaV4WLKS1Wai5RZoWQhpiBBiQttBMtTJc7Tq5+tfn4tjJVl8RtWWsxzdlnITHIGjqLD23HClF4ySgIb4h84ETc6SPRSUhHYiERJzmCZZvVNE4aD3gqBpdCZbZXT2tIavpCmwb3c3YKeDdcrUohgveb4UYeHpKitH2I6HMWTuAv8HJAejFAfJ3T4J1mUvMpFAVwxa2ck1jCvmQHJlWgGSWWFZvyKXYqZgwXLhZ3X3RAbPHbMAmelcacA3LH/ZtQst3aVp87Z9m+fai35P21WQfZ9XstCVyAKfv9QVikMJW43ghfSCA5q5QDjRrpeMV8ywzi4vQ3cEMjTLz8HZ3sTsj/ZO/06Ojrux7GFdtAuChBBB+gI/UInaIq453mfvdgj/qYf+fv+t3ur7/U5n9Ba+Id/AUKdvsc=</latexit> exp(�t(st�1, st))
<latexit sha1_base64="+aaYS+pf0KkwlV5IpyMknO6c7lg=">AAAChHicbVFNT9wwEHUCpXT7tdBjLxYr1ERKVzG0gksr1F44gtQFpM3KcrwOa+Eklj1BrKL8kv4rbvybOiGoLDCSpef33ozHM6lW0kIc33n+2vqrjdebbwZv373/8HG4tX1my8pwMeGlKs1FyqxQshATkKDEhTaC5akS5+nV71Y/vxbGyrL4A0stZjm7LGQmOQNH0eHf3YQpvWCUBDbEP3AibnSQ6IWkIrARiZKcwSLN6psmDAcPXggshc5tq5zWltbwlTQN7uXuFvRsuFqSQgSrRf/L8JATteVDTIejeBx3gZ8D0oMR6uOEDm+TecmrXBTAFbN2SmINs5oZkFyJZpBUVmjGr9ilmDpYsFzYWd0NscG7jpnjrDTuFIA79nFGzXJrl3nqnG379qnWki9p0wqyw1ktC12BKPj9Q1mlMJS43QieSyM4qKUDjBvpesV8wQzj4PY2cEMgT7/8HJztjcn+eO/02+joVz+OTfQZ7aAAEXSAjtAxOkETxD3P++LFHvE3/Mjf97/fW32vz/mEVsL/+Q8xFr7H</latexit>
Forward-BackwardAlgorithm
‣ Ini=al:�n(s) = 1
‣ Recurrence:
‣ Bigdifferences:countemissionforthenext=mestep(notcurrentone)
�t(st) =X
st+1
�t+1(st+1) exp(�e(st+1, t+ 1,x))
<latexit sha1_base64="o+YDBL44WMQt+0KfoSuWzIAvSK8=">AAAC3XicbVLLbtQwFHXCq4RHp7BkYzGqlIgwSgpS2SBVsGFZJKatmIwsx+N0rDqJZd+gjqJIbFiAEFv+ix3/wQfguClipr2SreNz7r3Hr1xJYSBJfnv+jZu3bt/Zuhvcu//g4fZo59GRqRvN+JTVstYnOTVciopPQYDkJ0pzWuaSH+dnb3v9+BPXRtTVB1gpPi/paSUKwShYioz+7GZUqiUlaWgi/Bpn/FyFmVoKwkMTp3FWUljmRXveRVFwmQuhIeCyTVOS1pAWnqddhwfZrcKBjdZbEohhs+k/HS6L4r5/hIMs53Ct3TNn51S3CAdy082RsZ3WPMlonEwSF/gqSAcwRkMcktGvbFGzpuQVMEmNmaWJgnlLNQgmeRdkjeGKsjN6ymcWVrTkZt661+nwrmUWuKi1HRVgx/5f0dLSmFWZ28x+j2ZT68nrtFkDxat5KyrVAK/YhVHRSAw17p8aL4TmDOTKAsq0sHvFbEk1ZWA/RGAvId088lVwtDdJX0z23r8cH7wZrmMLPUFPUYhStI8O0Dt0iKaIeR+9z95X75tP/C/+d//HRarvDTWP0Vr4P/8Cs27g5w==</latexit> exp(�t(st, st+1))<latexit sha1_base64="MMxuphUYCcsoCkHdOiUL0s5PNcs=">AAAC+nicbVLLbtQwFHVSHu3wmpYlG4tRpUSEUVKQYINUwYZlkZi20mRkOR6nY9VJLPsGOgr5FDYsQIgtX9Idf4PjZhAz7ZVsHZ9z7z1+ZUoKA3H8x/O3bt2+c3d7Z3Dv/oOHj4a7e8emqjXjE1bJSp9m1HApSj4BAZKfKs1pkUl+kp2/6/STT1wbUZUfYan4rKBnpcgFo2ApsusN91Mq1YKSJDAhfoNTfqGCVC0E4YGJkigtKCyyvLlow3CwyoXAEHDZpi5IY0gDz5O2xb3sVkHPhustCUSw2fSfDquiqOsfYqtlHG70e+b8nOoWQU9u2jkystO66bonRKvqEJPhKB7HLvB1kPRghPo4IsPLdF6xuuAlMEmNmSaxgllDNQgmeTtIa8MVZef0jE8tLGnBzaxxT9fifcvMcV5pO0rAjv2/oqGFMcsis5nd9s2m1pE3adMa8tezRpSqBl6yK6O8lhgq3P0DPBeaM5BLCyjTwu4VswXVlIH9LQN7Ccnmka+D44Nx8mJ88OHl6PBtfx3b6Al6igKUoFfoEL1HR2iCmPfZ++p99374X/xv/k//11Wq7/U1j9Fa+L//AuUc6iA=</latexit>
Forward-BackwardAlgorithm
�n(s) = 1
P (s3 = 2|x) = ↵3(2)�3(2)Pi ↵3(i)�3(i)
‣Whatdoesthedenominatorheremean?
‣ Doesthisexplainwhybetaiswhatitis?
↵1(s) = exp(�e(s, 1,x))<latexit sha1_base64="H++BwoHSkR4cFwia4mrESCMTX7M=">AAACE3icbVBNS8NAEN3Ur1q/oh69LBahlVKSKuhFEL14VLC20JSw2U7apZtk2d1IS+h/8OJf8eJBEa9evPlv3NYetPXBwOO9GWbmBYIzpR3ny8otLC4tr+RXC2vrG5tb9vbOnUpSSaFOE57IZkAUcBZDXTPNoSkkkCjg0Aj6l2O/cQ9SsSS+1UMB7Yh0YxYySrSRfPvQI1z0iO+WVBmfYQ8GouSJHvOhpCpuxYuI7gVhNhiVy75ddKrOBHieuFNSRFNc+/an10loGkGsKSdKtVxH6HZGpGaUw6jgpQoEoX3ShZahMYlAtbPJTyN8YJQODhNpKtZ4ov6eyEik1DAKTOf4RjXrjcX/vFaqw9N2xmKRaojpz6Iw5VgneBwQ7jAJVPOhIYRKZm7FtEckodrEWDAhuLMvz5O7WtU9qtZujovnF9M48mgP7aMSctEJOkdX6BrVEUUP6Am9oFfr0Xq23qz3n9acNZ3ZRX9gfXwDGCWcbA==</latexit>
↵t(st) =X
st�1
↵t�1(st�1) exp(�e(st, t,x))<latexit sha1_base64="MsrARqnMVgv4lQtn8MktFrMi2DM=">AAAChHicbVFNT9wwEHVCS+n2g2175GKxWjWR0lUMreilCNFLjyB1AWmzshyvw1o4iWVPEKsov4R/1Rv/pk4IogsdydLze2/G45lUK2khju88f+PFy81XW68Hb96+e789/PDxzJaV4WLKS1Wai5RZoWQhpiBBiQttBMtTJc7Tq5+tfn4tjJVl8RtWWsxzdlnITHIGjqLD23HClF4ySgIb4h84ETc6SPRSUhHYiERJzmCZZvVNE4aD3gqBpdCZbZXT2tIavpCmwb3c3YKeDdcrUohgveb4UYeHpKitH2I6HMWTuAv8HJAejFAfJ3T4J1mUvMpFAVwxa2ck1jCvmQHJlWgGSWWFZvyKXYqZgwXLhZ3X3RAbPHbMAmelcacA3LH/ZtQst3aVp87Z9m+fai35P21WQfZ9XstCVyAKfv9QVikMJW43ghfSCA5q5QDjRrpeMV8ywzi4vQ3cEMjTLz8HZ3sTsj/ZO/06Ojrux7GFdtAuChBBB+gI/UInaIq453mfvdgj/qYf+fv+t3ur7/U5n9Ba+Id/AUKdvsc=</latexit> exp(�t(st�1, st))
<latexit sha1_base64="+aaYS+pf0KkwlV5IpyMknO6c7lg=">AAAChHicbVFNT9wwEHUCpXT7tdBjLxYr1ERKVzG0gksr1F44gtQFpM3KcrwOa+Eklj1BrKL8kv4rbvybOiGoLDCSpef33ozHM6lW0kIc33n+2vqrjdebbwZv373/8HG4tX1my8pwMeGlKs1FyqxQshATkKDEhTaC5akS5+nV71Y/vxbGyrL4A0stZjm7LGQmOQNH0eHf3YQpvWCUBDbEP3AibnSQ6IWkIrARiZKcwSLN6psmDAcPXggshc5tq5zWltbwlTQN7uXuFvRsuFqSQgSrRf/L8JATteVDTIejeBx3gZ8D0oMR6uOEDm+TecmrXBTAFbN2SmINs5oZkFyJZpBUVmjGr9ilmDpYsFzYWd0NscG7jpnjrDTuFIA79nFGzXJrl3nqnG379qnWki9p0wqyw1ktC12BKPj9Q1mlMJS43QieSyM4qKUDjBvpesV8wQzj4PY2cEMgT7/8HJztjcn+eO/02+joVz+OTfQZ7aAAEXSAjtAxOkETxD3P++LFHvE3/Mjf97/fW32vz/mEVsL/+Q8xFr7H</latexit>
�t(st) =X
st+1
�t+1(st+1) exp(�e(st+1, t+ 1,x))
<latexit sha1_base64="o+YDBL44WMQt+0KfoSuWzIAvSK8=">AAAC3XicbVLLbtQwFHXCq4RHp7BkYzGqlIgwSgpS2SBVsGFZJKatmIwsx+N0rDqJZd+gjqJIbFiAEFv+ix3/wQfguClipr2SreNz7r3Hr1xJYSBJfnv+jZu3bt/Zuhvcu//g4fZo59GRqRvN+JTVstYnOTVciopPQYDkJ0pzWuaSH+dnb3v9+BPXRtTVB1gpPi/paSUKwShYioz+7GZUqiUlaWgi/Bpn/FyFmVoKwkMTp3FWUljmRXveRVFwmQuhIeCyTVOS1pAWnqddhwfZrcKBjdZbEohhs+k/HS6L4r5/hIMs53Ct3TNn51S3CAdy082RsZ3WPMlonEwSF/gqSAcwRkMcktGvbFGzpuQVMEmNmaWJgnlLNQgmeRdkjeGKsjN6ymcWVrTkZt661+nwrmUWuKi1HRVgx/5f0dLSmFWZ28x+j2ZT68nrtFkDxat5KyrVAK/YhVHRSAw17p8aL4TmDOTKAsq0sHvFbEk1ZWA/RGAvId088lVwtDdJX0z23r8cH7wZrmMLPUFPUYhStI8O0Dt0iKaIeR+9z95X75tP/C/+d//HRarvDTWP0Vr4P/8Cs27g5w==</latexit> exp(�t(st, st+1))<latexit sha1_base64="MMxuphUYCcsoCkHdOiUL0s5PNcs=">AAAC+nicbVLLbtQwFHVSHu3wmpYlG4tRpUSEUVKQYINUwYZlkZi20mRkOR6nY9VJLPsGOgr5FDYsQIgtX9Idf4PjZhAz7ZVsHZ9z7z1+ZUoKA3H8x/O3bt2+c3d7Z3Dv/oOHj4a7e8emqjXjE1bJSp9m1HApSj4BAZKfKs1pkUl+kp2/6/STT1wbUZUfYan4rKBnpcgFo2ApsusN91Mq1YKSJDAhfoNTfqGCVC0E4YGJkigtKCyyvLlow3CwyoXAEHDZpi5IY0gDz5O2xb3sVkHPhustCUSw2fSfDquiqOsfYqtlHG70e+b8nOoWQU9u2jkystO66bonRKvqEJPhKB7HLvB1kPRghPo4IsPLdF6xuuAlMEmNmSaxgllDNQgmeTtIa8MVZef0jE8tLGnBzaxxT9fifcvMcV5pO0rAjv2/oqGFMcsis5nd9s2m1pE3adMa8tezRpSqBl6yK6O8lhgq3P0DPBeaM5BLCyjTwu4VswXVlIH9LQN7Ccnmka+D44Nx8mJ88OHl6PBtfx3b6Al6igKUoFfoEL1HR2iCmPfZ++p99374X/xv/k//11Wq7/U1j9Fa+L//AuUc6iA=</latexit>
Compu=ngMarginals
y1 y2 yn…
�e
�t
P (y|x) = 1
Z
nY
i=2
exp(�t(yi�1, yi))nY
i=1
exp(�e(yi, i,x))
Z =X
y
nY
i=2
exp(�t(yi�1, yi))nY
i=1
exp(�e(yi, i,x))
‣ ForbothHMMsandCRFs:
‣ Normalizingconstant
P (yi = s|x) = forwardi(s)backwardi(s)Ps0 forwardi(s
0)backwardi(s0)
ZforCRFs,P(x)forHMMs
‣ AnalogoustoP(x)forHMMs
Posteriorsvs.Probabili=es
P (yi = s|x) = forwardi(s)backwardi(s)Ps0 forwardi(s
0)backwardi(s0)
‣ Posteriorisderivedfromtheparametersandthedata(condi=onedonx!)
HMM
CRF
Modelparameter(usuallymul=nomialdistribu=on)
Inferredquan=tyfromforward-backward
Inferredquan=tyfromforward-backward
Undefined(modelisbydefini=oncondi=onedonx)
P (xi|yi), P (yi|yi�1) P (yi|x), P (yi�1, yi|x)
TrainingCRFs
‣ Transi=onfeatures:needtocompute
‣ …butyoucanbuildapre8ygoodsystemwithoutlearnedtransi=onfeatures(useheuris=cweights,orjustenforceconstraintslikeB-PER->I-ORGisillegal)
P (yi = s1, yi+1 = s2|x)usingforward-backwardaswell
‣ Foremissionfeatures:
goldfeatures—expectedfeaturesundermodel
@
@wL(y⇤,x) =
nX
i=1
fe(y⇤i , i,x)�
nX
i=1
X
s
P (yi = s|x)fe(s, i,x)
CRFsOutline
‣Model: P (y|x) = 1
Z
nY
i=2
exp(�t(yi�1, yi))nY
i=1
exp(�e(yi, i,x))
‣ Inference:argmaxP(y|x)fromViterbi
‣ Learning:runforward-backwardtocomputeposteriorprobabili=es;then
P (y|x) / expw>
"nX
i=2
ft(yi�1, yi) +nX
i=1
fe(yi, i,x)
#
@
@wL(y⇤,x) =
nX
i=1
fe(y⇤i , i,x)�
nX
i=1
X
s
P (yi = s|x)fe(s, i,x)
PseudocodeandTips
Pseudocode
foreachepoch
foreachexample
extractfeaturesoneachemissionandtransi=on(lookupincache)
computemarginalprobabili=eswithforward-backward
computepoten=alsphibasedonfeatures+weights
accumulategradientoverallemissionsandtransi=ons
Implementa=onTipsforCRFs‣ Cachingisyourfriend!Cachefeaturevectorsespecially
‣ Trytoreduceredundantcomputa=on,e.g.ifyoucomputeboththegradientandtheobjec=vevalue,don’trerunthedynamicprogram
‣ Ifthingsaretooslow,runaprofilerandseewhere=meisbeingspent.Forward-backwardshouldtakemostofthe=me
‣ Exploitsparsityinfeaturevectorswherepossible,especiallyinfeaturevectorsandgradients
‣ Doalldynamicprogramcomputa=oninlogspacetoavoidunderflow
DebuggingTipsforCRFs‣ Hardtoknowwhetherinference,learning,orthemodelisbroken!
‣ Computetheobjec=ve—isop=miza=onworking?
‣ Learning:istheobjec=vegoingdown?Trytofit1example/10examples.Areyouapplyingthegradientcorrectly?
‣ Inference:checkgradientcomputa=on(mostlikelyplaceforbugs)
‣ Isthesameforalli?‣ Doprobabili=esnormalizecorrectly+look“reasonable”?(Nearlyuniformwhenuntrained,thenslowlyconvergingtotherightthing)
‣ Ifobjec=veisgoingdownbutmodelperformanceisbad:
‣ Inference:checkperformanceifyoudecodethetrainingset
X
s
forwardi(s)backwardi(s)
NextTime
‣ FinishdiscussingNER
‣ Neuralnetworks