Lexical Semantics Wrap-up and Midterm Review
How do we know when a word has more than one sense?
• ATIS examples
• Which flights serve breakfast?
• Does America West serve Philadelphia?
• The "zeugma" test:
• ?Does United serve breakfast and San Jose?
Synonyms
• Words that have the same meaning in some or all contexts.
• filbert / hazelnut
• couch / sofa
• big / large
• automobile / car
• vomit / throw up
• water / H2O
• Two lexemes are synonyms if they can be successfully substituted for each other in all situations
• If so they have the same propositional meaning
Synonyms
• But there are few (or no) examples of perfect synonymy.
• Why should that be?
• Even if many aspects of meaning are identical
• Still may not preserve acceptability based on notions of politeness, slang, register, genre, etc.
• Example:
• water and H2O
Some more terminology
• Lemmas and wordforms
• A lexeme is an abstract pairing of meaning and form
• A lemma or citation form is the grammatical form that is used to represent a lexeme.
• Carpet is the lemma for carpets
• Dormir is the lemma for duermes.
• Specific surface forms carpets, sung, duermes are called wordforms
• The lemma bank has two senses:
• Instead, a bank can hold the investments in a custodial account in the client's name
• But as agriculture burgeons on the east bank, the river will shrink even more.
• A sense is a discrete representation of one aspect of the meaning of a word
Synonymyisarelationbetweensensesratherthanwords
• Consider the words big and large
• Are they synonyms?
• How big is that plane?
• Would I be flying on a large or small plane?
• How about here:
• Miss Nelson, for instance, became a kind of big sister to Benjamin.
• ?Miss Nelson, for instance, became a kind of large sister to Benjamin.
• Why?
• big has a sense that means being older, or grown up
• large lacks this sense
Antonyms
• Senses that are opposites with respect to one feature of their meaning
• Otherwise, they are very similar!
• dark / light
• short / long
• hot / cold
• up / down
• in / out
• More formally: antonyms can
• define a binary opposition or be at opposite ends of a scale (long/short, fast/slow)
• Be reversives: rise/fall, up/down
Hyponymy
• One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
• car is a hyponym of vehicle
• dog is a hyponym of animal
• mango is a hyponym of fruit
• Conversely
• vehicle is a hypernym/superordinate of car
• animal is a hypernym of dog
• fruit is a hypernym of mango
superordinate: vehicle, fruit, furniture, mammal
hyponym: car, mango, chair, dog
Hypernymy more formally
• Extensional:
• The class denoted by the superordinate
• extensionally includes the class denoted by the hyponym
• Entailment:
• A sense A is a hyponym of sense B if being an A entails being a B
• Hyponymy is usually transitive
• (A hypo B and B hypo C entails A hypo C)
• Why would hypernyms/hyponyms be important to constructing a meaning representation?
II. WordNet
• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
• Versions for other languages are under development
Category Unique Forms
Noun 117,097
Verb 11,488
Adjective 22,141
Adverb 4,601
WordNet
• Where it is:
• https://wordnet.princeton.edu/
Format of WordNet Entries
WordNet Noun Relations
WordNet Verb Relations
WordNet Hierarchies
How is "sense" defined in WordNet?
• The set of near-synonyms for a WordNet sense is called a synset (synonym set); it's their version of a sense or a concept
• Example: chump as a noun to mean
• 'a person who is gullible and easy to take advantage of'
• Each of these senses shares this same gloss
• Thus for WordNet, the meaning of this sense of chump is this list.
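Not part of the original slides: a minimal sketch of how to inspect synsets, glosses, and hypernyms yourself with NLTK's WordNet interface, assuming the nltk package and its WordNet data are installed.

```python
# A minimal sketch (not from the slides): inspecting WordNet synsets with NLTK.
# Assumes `pip install nltk`; the download call fetches the WordNet data.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

# All noun senses (synsets) of "chump", each with its gloss and synonym list
for synset in wn.synsets("chump", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())
    print("  lemmas:", [lemma.name() for lemma in synset.lemmas()])

# Hypernyms of one sense of "car", illustrating the hyponym/hypernym relation
car = wn.synsets("car", pos=wn.NOUN)[0]
print(car.hypernyms())
```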
WordNet example
Questions?
Midterm
• Format
• Multiple choice questions
• Short answer questions
• Problem solving
• What will it cover?
• Anything covered in class
• From reading that supports material in class
• Math as needed for neural nets, machine learning, smoothing
Midterm
• Closed book, no notes, no electronics
• Will avoid asking you to recall formulas
• That said, you should know how to compute the probability of n-grams, of POS tags, basics for smoothing, language modeling, how to do computation for neural nets.
• Will cover anything from the beginning through today
• Sample midterm questions posted
Top topics
• Viterbi algorithm
• Dependency parsing
• RNNs
Questions?
Viterbi and POS
Two kinds of probabilities (1)
• Tag transition probabilities P(t_i | t_{i-1})
• Determiners likely to precede adjs and nouns
• That/DT flight/NN
• The/DT yellow/JJ hat/NN
• So we expect P(NN|DT) and P(JJ|DT) to be high
• But P(DT|JJ) to be low
• Compute P(NN|DT) by counting in a labeled corpus:
P(NN|DT) = Count(DT, NN) / Count(DT)
Two kinds of probabilities (2)
• Word likelihood probabilities P(w_i | t_i)
• VBZ (3sg Pres verb) likely to be "is"
• Compute P(is|VBZ) by counting in a labeled corpus:
P(is|VBZ) = Count(VBZ, is) / Count(VBZ)
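Not part of the original slides: a minimal sketch of both counts on a tiny, made-up tagged corpus, just to make the maximum-likelihood estimates concrete.

```python
# A minimal sketch (not from the slides) of estimating both probabilities
# by counting in a small, hypothetical labeled corpus.
from collections import Counter

tagged = [("That", "DT"), ("flight", "NN"), ("The", "DT"), ("yellow", "JJ"),
          ("hat", "NN"), ("is", "VBZ"), ("The", "DT"), ("flight", "NN")]

tags = [t for _, t in tagged]
tag_counts = Counter(tags)
bigram_counts = Counter(zip(tags, tags[1:]))
word_tag_counts = Counter(tagged)

# Transition probability P(NN | DT) = Count(DT, NN) / Count(DT)
p_nn_given_dt = bigram_counts[("DT", "NN")] / tag_counts["DT"]

# Word likelihood P(is | VBZ) = Count(VBZ, is) / Count(VBZ)
p_is_given_vbz = word_tag_counts[("is", "VBZ")] / tag_counts["VBZ"]

print(p_nn_given_dt, p_is_given_vbz)
```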
An Example: the verb "race"
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
• People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
• How do we pick the right tag?
Disambiguating "race"
[Figure: the two candidate tag sequences for "race" (VB vs. NN) are compared step by step over several slides.]
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
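Not part of the original slides: the same comparison checked quickly in Python.

```python
# Quick check (not from the slides) of the two products above.
p_vb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)
print(f"verb reading: {p_vb:.2e}, noun reading: {p_nn:.2e}")
# the verb reading wins by roughly three orders of magnitude
```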
HMMs
Hidden Markov Models
• We don't observe POS tags
• We infer them from the words we see
• Observed events: the words we see
• Hidden events: the POS tags
Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
• See hot weather: we're in state hot
• But in part-of-speech tagging (and other things)
• The output symbols are words
• The hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
• This means we don't know which state we are in.
Hidden Markov Models
• States Q = q_1, q_2 … q_N
• Observations O = o_1, o_2 … o_T
• Each observation is a symbol from a vocabulary V = {v_1, v_2, …, v_V}
• Transition probabilities
• Transition probability matrix A = {a_ij}
• Observation likelihoods
• Output probability matrix B = {b_i(k)}
• Special initial probability vector π
π_i = P(q_1 = i),  1 ≤ i ≤ N
a_ij = P(q_t = j | q_{t-1} = i),  1 ≤ i, j ≤ N
b_i(k) = P(X_t = o_k | q_t = i)
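Not part of the original slides: a minimal sketch, with made-up numbers, of how π, A, and B can be stored as arrays; the asserts correspond to the constraints on the next slide.

```python
# A minimal sketch (not from the slides): HMM parameters for a toy 2-tag,
# 3-word model, stored as NumPy arrays. All numbers are made up.
import numpy as np

states = ["NN", "VB"]              # hidden states (tags)
vocab = ["race", "want", "to"]     # observation symbols

pi = np.array([0.6, 0.4])          # pi[i]   = P(q_1 = i)
A = np.array([[0.7, 0.3],          # A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # B[i, k] = P(o_t = v_k | q_t = i)
              [0.2, 0.3, 0.5]])

# The constraints on the next slide: each of these must sum to 1.
assert np.isclose(pi.sum(), 1)
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
```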
Hidden Markov Models
• Some constraints
π_i = P(q_1 = i),  1 ≤ i ≤ N
∑_{j=1}^{N} a_ij = 1,  1 ≤ i ≤ N
∑_{k=1}^{M} b_i(k) = 1
∑_{j=1}^{N} π_j = 1
Assumptions
• Markov assumption:
P(q_i | q_1 … q_{i-1}) = P(q_i | q_{i-1})
• Output-independence assumption:
P(o_t | o_1^{t-1}, q_1^t) = P(o_t | q_t)
Three fundamental problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?
Decoding
• The best hidden sequence
• Weather sequence in the ice cream task
• POS sequence given an input sentence
• We could use argmax over the probability of each possible hidden state sequence
• Why not?
• Viterbi algorithm
• Dynamic programming algorithm
• Uses a dynamic programming trellis
• Each trellis cell v_t(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence
Viterbi intuition: we are looking for the best 'path'
[Figure: trellis for "promised to back the bill", with the candidate tags (VBD, VBN, TO, VB, JJ, NN, RB, DT, NNP) listed in each word's column and the best path traced through states S1–S5.]
Slide from Dekang Lin
Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.
• An extension of a path from state i at time t-1 is computed by multiplying:
• the previous Viterbi path probability v_{t-1}(i)
• the transition probability a_ij
• the observation likelihood b_j(o_t)
The Viterbi Algorithm
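The algorithm figure did not survive extraction; below is a minimal Python sketch (not from the slides) of the recurrence described above, usable with toy π, A, B arrays like those sketched earlier.

```python
# A minimal Viterbi sketch (not from the slides), following the recurrence
# v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t), with backpointers for the path.
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: list of observation indices; returns (best_path, best_prob)."""
    N, T = A.shape[0], len(obs)
    v = np.zeros((T, N))
    backptr = np.zeros((T, N), dtype=int)

    v[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t]]
            backptr[t, j] = np.argmax(scores)     # best predecessor state
            v[t, j] = scores[backptr[t, j]]

    # follow backpointers from the best final state
    path = [int(np.argmax(v[T - 1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], v[T - 1].max()
```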
The A matrix for the POS HMM
What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?
The B matrix for the POS HMM
Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.
Problem
• I want to race (possible states: PPS VB TO NN)
Viterbi example
[Figure: the trellis for "I want to race" is filled in step by step over several slides, starting at t = 1 from the start state (i = S) into each tag column (e.g. j = NN); most cells at t = 1 are 0, with intermediate values such as .041 × … and .025 appearing as the computation proceeds. The A and B matrices for the POS HMM are repeated for reference.]
Show the 4 formulas you would use to compute the value at this node and the max.
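Not an answer key, just a template (not in the original slides): for a cell at t = 2 (observation o_2 = "want") and some tag j, the four products feeding the max, one per possible predecessor tag, have this shape.

```latex
% Template only (not from the slides): the four candidate scores for v_2(j),
% one per possible predecessor tag, with o_2 = "want".
v_2(j) = \max\big(
  v_1(\text{PPS}) \, a_{\text{PPS},j} \, b_j(\text{want}),\;
  v_1(\text{VB})  \, a_{\text{VB},j}  \, b_j(\text{want}),\;
  v_1(\text{TO})  \, a_{\text{TO},j}  \, b_j(\text{want}),\;
  v_1(\text{NN})  \, a_{\text{NN},j}  \, b_j(\text{want})
\big)
```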
Dependency Parsing
Dependency parsing
• An example from the NYTimes today:
Last week, on the third floor of a small building in San Francisco's Mission District, a woman scrambled the tiles of a Rubik's Cube
Dependency parsing
• An example from the NYTimes today:
Last week, on the third floor, a woman scrambled the tiles of a Rubik's Cube
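The parse trees themselves did not survive extraction; as a rough substitute (not part of the original slides), here is how one might get a dependency parse of the example with spaCy, assuming the small English model is installed.

```python
# A minimal sketch (not from the slides): dependency parsing the NYT example
# with spaCy. Assumes `pip install spacy` and
# `python -m spacy download en_core_web_sm` have been run.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Last week, on the third floor, a woman scrambled the tiles of a Rubik's Cube")

# Each token points to its head with a labeled dependency relation.
for token in doc:
    print(f"{token.text:10s} --{token.dep_:>6s}--> {token.head.text}")
```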
RNNs and LSTMs
Recurrent Neural Networks
[Figure: a single RNN cell: input x1 and previous hidden state h0 feed through weights W_x and W_h and a sigmoid σ to produce h1.]
h_t = σ(W_h h_{t-1} + W_x x_t)
Slide from Radev
RNN
[Figure: the same cell with an output layer: h1 is multiplied by W_y and passed through a softmax to produce y1.]
h_t = σ(W_h h_{t-1} + W_x x_t)
y_t = softmax(W_y h_t)
Slide from Radev
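Not in the original slides: a minimal NumPy sketch of these two equations for one time step, with made-up dimensions and random weights.

```python
# A minimal sketch (not from the slides) of one RNN time step:
#   h_t = sigmoid(W_h h_{t-1} + W_x x_t),  y_t = softmax(W_y h_t)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_in, d_hid, d_out = 4, 3, 2           # made-up dimensions
rng = np.random.default_rng(0)
W_x = rng.normal(size=(d_hid, d_in))
W_h = rng.normal(size=(d_hid, d_hid))
W_y = rng.normal(size=(d_out, d_hid))

def rnn_step(x_t, h_prev):
    h_t = sigmoid(W_h @ h_prev + W_x @ x_t)
    y_t = softmax(W_y @ h_t)
    return h_t, y_t

h = np.zeros(d_hid)                    # h_0
for x in rng.normal(size=(3, d_in)):   # e.g. embeddings for "The cat sat"
    h, y = rnn_step(x, h)
```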
RNN
[Figure: the cell unrolled over "The cat sat": inputs x1, x2, x3 produce hidden states h1, h2, h3 through shared weights W_x and W_h, and h3 goes through W_y and a softmax to produce y3.]
Slide from Radev
Updating Parameters of an RNN
[Figure: the unrolled network over "The cat sat" with a cost computed on the softmax output y3.]
• Backpropagation through time
Slide from Radev
RNN – I had in mind your facts, buddy, not hers.
[Figure: the RNN unrolled over "I had in mind …", with a sigmoid output at the final step.]
• In this overview, w refers to the weights
• But there are different kinds of weights
• Let's be more specific
RNN – I had in mind your facts, buddy, not hers.
[Figure: the same unrolled network, with the recurrent weights now labeled U (the input weights remain W_x).]
RNN – I had in mind your facts, buddy, not hers.
[Figure: the same unrolled network with weights W and U labeled.]
• W are the weights: multiplying the word embedding matrix W by x_t yields the embedding for x_t
• U is another weight matrix
• h_0 is often not specified
• h is the hidden layer
RNN – I had in mind your facts, buddy, not hers.
[Figure: the same unrolled network, with the hidden-state update written out.]
h_t = σ(U h_{t-1} + W x_t)
RNN – I had in mind your facts, buddy, not hers.
[Figure: the unrolled network with a sigmoid output at the final step.]
• y = positive? y = negative?
• The final embedding is run through the sigmoid function -> [0,1]; 1 = positive, 0 = negative
• Often the final h is used as the embedding for the sentence
Updating Parameters of an RNN
[Figure: the unrolled network over "I had in …" with a cost computed on the sigmoid output.]
• Backpropagation through time
• Gold label = 0 (negative)
• Adjust weights using the gradient
• Repeat many times with all examples
Slide from Radev
Transforming RNN to LSTM
u_t = σ(W_h h_{t-1} + W_x x_t)
[Figure: the RNN cell redrawn with its output relabeled u1.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
[Figure: a cell state c0 is added alongside h0.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
c_t = f_t ⊙ c_{t-1} + i_t ⊙ u_t
[Figure: the new cell state c1 is computed from c0 and u1 via the forget gate f1 and the input gate i1.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
f_t = σ(W_hf h_{t-1} + W_xf x_t)
[Figure: the forget gate f1 is computed from x1 and h0 with its own weights W_hf and W_xf.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
i_t = σ(W_hi h_{t-1} + W_xi x_t)
[Figure: the input gate i1 is computed from x1 and h0 with weights W_hi and W_xi.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
h_t = o_t ⊙ tanh(c_t)
[Figure: the full LSTM cell: u1, f1, and i1 produce c1, and the output gate o1 combined with tanh(c1) produces h1.]
[slides from Catherine Finegan-Dollak]
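Not part of the original slides: a minimal NumPy sketch of one LSTM step following the equations above; the output gate o_t is assumed to take the same form as f_t and i_t (its formula is not written out on the slides), and u_t uses a sigmoid as on the earlier slide.

```python
# A minimal LSTM-step sketch (not from the slides), with made-up dimensions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_hid = 4, 3                       # made-up dimensions
rng = np.random.default_rng(0)
W = {name: (rng.normal(size=(d_hid, d_hid)), rng.normal(size=(d_hid, d_in)))
     for name in ("u", "f", "i", "o")}   # (W_h*, W_x*) pair per gate

def lstm_step(x_t, h_prev, c_prev):
    u_t = sigmoid(W["u"][0] @ h_prev + W["u"][1] @ x_t)   # candidate update
    f_t = sigmoid(W["f"][0] @ h_prev + W["f"][1] @ x_t)   # forget gate
    i_t = sigmoid(W["i"][0] @ h_prev + W["i"][1] @ x_t)   # input gate
    o_t = sigmoid(W["o"][0] @ h_prev + W["o"][1] @ x_t)   # output gate (assumed form)
    c_t = f_t * c_prev + i_t * u_t                        # c_t = f_t ⊙ c_{t-1} + i_t ⊙ u_t
    h_t = o_t * np.tanh(c_t)                              # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t

h, c = np.zeros(d_hid), np.zeros(d_hid)                   # h_0, c_0
for x in rng.normal(size=(3, d_in)):                      # e.g. embeddings for "The cat sat"
    h, c = lstm_step(x, h, c)
```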
LSTM for Sequences
[Figure: the LSTM cell unrolled over "The cat sat": each step takes x_t, h_{t-1}, and c_{t-1} and produces h_t and c_t through the gates u, f, i, and o.]
[slides from Catherine Finegan-Dollak]
Problem 10 from sample midterm questions