Lexical Semantics Wrap-up and Midterm Review
How do we know when a word has more than one sense?
• ATIS examples
• Which flights serve breakfast?
• Does America West serve Philadelphia?
• The "zeugma" test:
• ?Does United serve breakfast and San Jose?
Synonyms
• Words that have the same meaning in some or all contexts.
• filbert / hazelnut
• couch / sofa
• big / large
• automobile / car
• vomit / throw up
• water / H2O
• Two lexemes are synonyms if they can be successfully substituted for each other in all situations
• If so they have the same propositional meaning
Synonyms
• But there are few (or no) examples of perfect synonymy.
• Why should that be?
• Even if many aspects of meaning are identical
• Still may not preserve acceptability based on notions of politeness, slang, register, genre, etc.
• Example:
• water and H2O
Some more terminology
• Lemmas and wordforms
• A lexeme is an abstract pairing of meaning and form
• A lemma or citation form is the grammatical form that is used to represent a lexeme.
• Carpet is the lemma for carpets
• Dormir is the lemma for duermes.
• Specific surface forms carpets, sung, duermes are called wordforms
• The lemma bank has two senses:
• Instead, a bank can hold the investments in a custodial account in the client's name
• But as agriculture burgeons on the east bank, the river will shrink even more.
• A sense is a discrete representation of one aspect of the meaning of a word
Synonymyisarelationbetweensensesratherthanwords
• Consider the words big and large
• Are they synonyms?
• How big is that plane?
• Would I be flying on a large or small plane?
• How about here:
• Miss Nelson, for instance, became a kind of big sister to Benjamin.
• ?Miss Nelson, for instance, became a kind of large sister to Benjamin.
• Why?
• big has a sense that means being older, or grown up
• large lacks this sense
Antonyms
• Senses that are opposites with respect to one feature of their meaning
• Otherwise, they are very similar!
• dark / light
• short / long
• hot / cold
• up / down
• in / out
• More formally: antonyms can
• define a binary opposition or be at opposite ends of a scale (long/short, fast/slow)
• Be reversives: rise/fall, up/down
Hyponymy
• One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
• car is a hyponym of vehicle
• dog is a hyponym of animal
• mango is a hyponym of fruit
• Conversely
• vehicle is a hypernym/superordinate of car
• animal is a hypernym of dog
• fruit is a hypernym of mango
superordinate: vehicle, fruit, furniture, mammal
hyponym: car, mango, chair, dog
Hypernymy more formally
• Extensional:
• The class denoted by the superordinate
• extensionally includes the class denoted by the hyponym
• Entailment:
• A sense A is a hyponym of sense B if being an A entails being a B
• Hyponymy is usually transitive
• (A hypo B and B hypo C entails A hypo C)
• Why would hypernyms/hyponyms be important to constructing a meaning representation?
II. WordNet
• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
• Versions for other languages are under development
Category Unique Forms
Noun 117,097
Verb 11,488
Adjective 22,141
Adverb 4,601
WordNet
• Where it is:
• https://wordnet.princeton.edu/
Format of WordNet Entries
WordNet Noun Relations
WordNet Verb Relations
WordNet Hierarchies
How is "sense" defined in WordNet?
• The set of near-synonyms for a WordNet sense is called a synset (synonym set); it's their version of a sense or a concept
• Example: chump as a noun to mean
• 'a person who is gullible and easy to take advantage of'
• Each of these senses shares this same gloss
• Thus for WordNet, the meaning of this sense of chump is this list.
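Not part of the original slides: a minimal sketch of how to inspect synsets, glosses, and hypernyms yourself with NLTK's WordNet interface, assuming the nltk package and its WordNet data are installed.

```python
# A minimal sketch (not from the slides): inspecting WordNet synsets with NLTK.
# Assumes `pip install nltk`; the download call fetches the WordNet data.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

# All noun senses (synsets) of "chump", each with its gloss and synonym list
for synset in wn.synsets("chump", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())
    print("  lemmas:", [lemma.name() for lemma in synset.lemmas()])

# Hypernyms of one sense of "car", illustrating the hyponym/hypernym relation
car = wn.synsets("car", pos=wn.NOUN)[0]
print(car.hypernyms())
```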
WordNet example
Questions?
Midterm
• Format
• Multiple choice questions
• Short answer questions
• Problem solving
• What will it cover?
• Anything covered in class
• From reading that supports material in class
• Math as needed for neural nets, machine learning, smoothing
Midterm
• Closed book, no notes, no electronics
• Will avoid asking you to recall formulas
• That said, you should know how to compute the probability of n-grams, of POS tags, basics for smoothing, language modeling, how to do computation for neural nets.
• Will cover anything from the beginning through today
• Sample midterm questions posted
Top topics
• Viterbi algorithm
• Dependency parsing
• RNNs
Questions?
Viterbi and POS
Two kinds of probabilities (1)
• Tag transition probabilities P(t_i | t_{i-1})
• Determiners likely to precede adjs and nouns
• That/DT flight/NN
• The/DT yellow/JJ hat/NN
• So we expect P(NN|DT) and P(JJ|DT) to be high
• But P(DT|JJ) to be low
• Compute P(NN|DT) by counting in a labeled corpus:
P(NN|DT) = Count(DT, NN) / Count(DT)
Two kinds of probabilities (2)
• Word likelihood probabilities P(w_i | t_i)
• VBZ (3sg Pres verb) likely to be "is"
• Compute P(is|VBZ) by counting in a labeled corpus:
P(is|VBZ) = Count(VBZ, is) / Count(VBZ)
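Not part of the original slides: a minimal sketch of both counts on a tiny, made-up tagged corpus, just to make the maximum-likelihood estimates concrete.

```python
# A minimal sketch (not from the slides) of estimating both probabilities
# by counting in a small, hypothetical labeled corpus.
from collections import Counter

tagged = [("That", "DT"), ("flight", "NN"), ("The", "DT"), ("yellow", "JJ"),
          ("hat", "NN"), ("is", "VBZ"), ("The", "DT"), ("flight", "NN")]

tags = [t for _, t in tagged]
tag_counts = Counter(tags)
bigram_counts = Counter(zip(tags, tags[1:]))
word_tag_counts = Counter(tagged)

# Transition probability P(NN | DT) = Count(DT, NN) / Count(DT)
p_nn_given_dt = bigram_counts[("DT", "NN")] / tag_counts["DT"]

# Word likelihood P(is | VBZ) = Count(VBZ, is) / Count(VBZ)
p_is_given_vbz = word_tag_counts[("is", "VBZ")] / tag_counts["VBZ"]

print(p_nn_given_dt, p_is_given_vbz)
```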
An Example: the verb "race"
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
• People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
• How do we pick the right tag?
Disambiguating "race"
[Figure: the two candidate tag sequences for "race" (VB vs. NN) are compared step by step over several slides.]
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
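Not part of the original slides: the same comparison checked quickly in Python.

```python
# Quick check (not from the slides) of the two products above.
p_vb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)
print(f"verb reading: {p_vb:.2e}, noun reading: {p_nn:.2e}")
# the verb reading wins by roughly three orders of magnitude
```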
HMMs
Hidden Markov Models
• We don't observe POS tags
• We infer them from the words we see
• Observed events: the words we see
• Hidden events: the POS tags
Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
• See hot weather: we're in state hot
• But in part-of-speech tagging (and other things)
• The output symbols are words
• The hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
• This means we don't know which state we are in.
Hidden Markov Models
• States Q = q_1, q_2 … q_N
• Observations O = o_1, o_2 … o_T
• Each observation is a symbol from a vocabulary V = {v_1, v_2, …, v_V}
• Transition probabilities
• Transition probability matrix A = {a_ij}
• Observation likelihoods
• Output probability matrix B = {b_i(k)}
• Special initial probability vector π
π_i = P(q_1 = i),  1 ≤ i ≤ N
a_ij = P(q_t = j | q_{t-1} = i),  1 ≤ i, j ≤ N
b_i(k) = P(X_t = o_k | q_t = i)
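Not part of the original slides: a minimal sketch, with made-up numbers, of how π, A, and B can be stored as arrays; the asserts correspond to the constraints on the next slide.

```python
# A minimal sketch (not from the slides): HMM parameters for a toy 2-tag,
# 3-word model, stored as NumPy arrays. All numbers are made up.
import numpy as np

states = ["NN", "VB"]              # hidden states (tags)
vocab = ["race", "want", "to"]     # observation symbols

pi = np.array([0.6, 0.4])          # pi[i]   = P(q_1 = i)
A = np.array([[0.7, 0.3],          # A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # B[i, k] = P(o_t = v_k | q_t = i)
              [0.2, 0.3, 0.5]])

# The constraints on the next slide: each of these must sum to 1.
assert np.isclose(pi.sum(), 1)
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
```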
Hidden Markov Models
• Some constraints
π_i = P(q_1 = i),  1 ≤ i ≤ N
∑_{j=1}^{N} a_ij = 1,  1 ≤ i ≤ N
∑_{k=1}^{M} b_i(k) = 1
∑_{j=1}^{N} π_j = 1
Assumptions
• Markov assumption:
P(q_i | q_1 … q_{i-1}) = P(q_i | q_{i-1})
• Output-independence assumption:
P(o_t | o_1^{t-1}, q_1^t) = P(o_t | q_t)
Three fundamental problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?
Decoding
• The best hidden sequence
• Weather sequence in the ice cream task
• POS sequence given an input sentence
• We could use argmax over the probability of each possible hidden state sequence
• Why not?
• Viterbi algorithm
• Dynamic programming algorithm
• Uses a dynamic programming trellis
• Each trellis cell v_t(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence
Viterbi intuition: we are looking for the best 'path'
[Figure: trellis for "promised to back the bill", with the candidate tags (VBD, VBN, TO, VB, JJ, NN, RB, DT, NNP) listed in each word's column and the best path traced through states S1–S5.]
Slide from Dekang Lin
Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.
• An extension of a path from state i at time t-1 is computed by multiplying:
• the previous Viterbi path probability v_{t-1}(i)
• the transition probability a_ij
• the observation likelihood b_j(o_t)
The Viterbi Algorithm
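The algorithm figure did not survive extraction; below is a minimal Python sketch (not from the slides) of the recurrence described above, usable with toy π, A, B arrays like those sketched earlier.

```python
# A minimal Viterbi sketch (not from the slides), following the recurrence
# v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t), with backpointers for the path.
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: list of observation indices; returns (best_path, best_prob)."""
    N, T = A.shape[0], len(obs)
    v = np.zeros((T, N))
    backptr = np.zeros((T, N), dtype=int)

    v[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t]]
            backptr[t, j] = np.argmax(scores)     # best predecessor state
            v[t, j] = scores[backptr[t, j]]

    # follow backpointers from the best final state
    path = [int(np.argmax(v[T - 1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], v[T - 1].max()
```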
The A matrix for the POS HMM
What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?
The B matrix for the POS HMM
Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.
Problem
• I want to race (possible states: PPS VB TO NN)
Viterbi example
[Figure: the trellis for "I want to race" is filled in step by step over several slides, starting at t = 1 from the start state (i = S) into each tag column (e.g. j = NN); most cells at t = 1 are 0, with intermediate values such as .041 × … and .025 appearing as the computation proceeds. The A and B matrices for the POS HMM are repeated for reference.]
Show the 4 formulas you would use to compute the value at this node and the max.
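Not an answer key, just a template (not in the original slides): for a cell at t = 2 (observation o_2 = "want") and some tag j, the four products feeding the max, one per possible predecessor tag, have this shape.

```latex
% Template only (not from the slides): the four candidate scores for v_2(j),
% one per possible predecessor tag, with o_2 = "want".
v_2(j) = \max\big(
  v_1(\text{PPS}) \, a_{\text{PPS},j} \, b_j(\text{want}),\;
  v_1(\text{VB})  \, a_{\text{VB},j}  \, b_j(\text{want}),\;
  v_1(\text{TO})  \, a_{\text{TO},j}  \, b_j(\text{want}),\;
  v_1(\text{NN})  \, a_{\text{NN},j}  \, b_j(\text{want})
\big)
```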
Dependency Parsing
Dependency parsing
• An example from the NYTimes today:
Last week, on the third floor of a small building in San Francisco's Mission District, a woman scrambled the tiles of a Rubik's Cube
Dependency parsing
• An example from the NYTimes today:
Last week, on the third floor, a woman scrambled the tiles of a Rubik's Cube
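The parse trees themselves did not survive extraction; as a rough substitute (not part of the original slides), here is how one might get a dependency parse of the example with spaCy, assuming the small English model is installed.

```python
# A minimal sketch (not from the slides): dependency parsing the NYT example
# with spaCy. Assumes `pip install spacy` and
# `python -m spacy download en_core_web_sm` have been run.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Last week, on the third floor, a woman scrambled the tiles of a Rubik's Cube")

# Each token points to its head with a labeled dependency relation.
for token in doc:
    print(f"{token.text:10s} --{token.dep_:>6s}--> {token.head.text}")
```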
RNNs and LSTMs
Recurrent Neural Networks
[Figure: a single RNN cell: input x1 and previous hidden state h0 feed through weights W_x and W_h and a sigmoid σ to produce h1.]
h_t = σ(W_h h_{t-1} + W_x x_t)
Slide from Radev
RNN
[Figure: the same cell with an output layer: h1 is multiplied by W_y and passed through a softmax to produce y1.]
h_t = σ(W_h h_{t-1} + W_x x_t)
y_t = softmax(W_y h_t)
Slide from Radev
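Not in the original slides: a minimal NumPy sketch of these two equations for one time step, with made-up dimensions and random weights.

```python
# A minimal sketch (not from the slides) of one RNN time step:
#   h_t = sigmoid(W_h h_{t-1} + W_x x_t),  y_t = softmax(W_y h_t)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_in, d_hid, d_out = 4, 3, 2           # made-up dimensions
rng = np.random.default_rng(0)
W_x = rng.normal(size=(d_hid, d_in))
W_h = rng.normal(size=(d_hid, d_hid))
W_y = rng.normal(size=(d_out, d_hid))

def rnn_step(x_t, h_prev):
    h_t = sigmoid(W_h @ h_prev + W_x @ x_t)
    y_t = softmax(W_y @ h_t)
    return h_t, y_t

h = np.zeros(d_hid)                    # h_0
for x in rng.normal(size=(3, d_in)):   # e.g. embeddings for "The cat sat"
    h, y = rnn_step(x, h)
```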
RNN
[Figure: the cell unrolled over "The cat sat": inputs x1, x2, x3 produce hidden states h1, h2, h3 through shared weights W_x and W_h, and h3 goes through W_y and a softmax to produce y3.]
Slide from Radev
Updating Parameters of an RNN
[Figure: the unrolled network over "The cat sat" with a cost computed on the softmax output y3.]
• Backpropagation through time
Slide from Radev
RNN – I had in mind your facts, buddy, not hers.
[Figure: the RNN unrolled over "I had in mind …", with a sigmoid output at the final step.]
• In this overview, w refers to the weights
• But there are different kinds of weights
• Let's be more specific
RNN – I had in mind your facts, buddy, not hers.
[Figure: the same unrolled network, with the recurrent weights now labeled U (the input weights remain W_x).]
RNN – I had in mind your facts, buddy, not hers.
[Figure: the same unrolled network with weights W and U labeled.]
• W are the weights: multiplying the word embedding matrix W by x_t yields the embedding for x_t
• U is another weight matrix
• h_0 is often not specified
• h is the hidden layer
RNN – I had in mind your facts, buddy, not hers.
[Figure: the same unrolled network, with the hidden-state update written out.]
h_t = σ(U h_{t-1} + W x_t)
RNN – I had in mind your facts, buddy, not hers.
[Figure: the unrolled network with a sigmoid output at the final step.]
• y = positive? y = negative?
• The final embedding is run through the sigmoid function -> [0,1]; 1 = positive, 0 = negative
• Often the final h is used as the embedding for the sentence
Updating Parameters of an RNN
[Figure: the unrolled network over "I had in …" with a cost computed on the sigmoid output.]
• Backpropagation through time
• Gold label = 0 (negative)
• Adjust weights using the gradient
• Repeat many times with all examples
Slide from Radev
Transforming RNN to LSTM
u_t = σ(W_h h_{t-1} + W_x x_t)
[Figure: the RNN cell redrawn with its output relabeled u1.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
[Figure: a cell state c0 is added alongside h0.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
c_t = f_t ⊙ c_{t-1} + i_t ⊙ u_t
[Figure: the new cell state c1 is computed from c0 and u1 via the forget gate f1 and the input gate i1.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
f_t = σ(W_hf h_{t-1} + W_xf x_t)
[Figure: the forget gate f1 is computed from x1 and h0 with its own weights W_hf and W_xf.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
i_t = σ(W_hi h_{t-1} + W_xi x_t)
[Figure: the input gate i1 is computed from x1 and h0 with weights W_hi and W_xi.]
[slides from Catherine Finegan-Dollak]
Transforming RNN to LSTM
h_t = o_t ⊙ tanh(c_t)
[Figure: the full LSTM cell: u1, f1, and i1 produce c1, and the output gate o1 combined with tanh(c1) produces h1.]
[slides from Catherine Finegan-Dollak]
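Not part of the original slides: a minimal NumPy sketch of one LSTM step following the equations above; the output gate o_t is assumed to take the same form as f_t and i_t (its formula is not written out on the slides), and u_t uses a sigmoid as on the earlier slide.

```python
# A minimal LSTM-step sketch (not from the slides), with made-up dimensions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_hid = 4, 3                       # made-up dimensions
rng = np.random.default_rng(0)
W = {name: (rng.normal(size=(d_hid, d_hid)), rng.normal(size=(d_hid, d_in)))
     for name in ("u", "f", "i", "o")}   # (W_h*, W_x*) pair per gate

def lstm_step(x_t, h_prev, c_prev):
    u_t = sigmoid(W["u"][0] @ h_prev + W["u"][1] @ x_t)   # candidate update
    f_t = sigmoid(W["f"][0] @ h_prev + W["f"][1] @ x_t)   # forget gate
    i_t = sigmoid(W["i"][0] @ h_prev + W["i"][1] @ x_t)   # input gate
    o_t = sigmoid(W["o"][0] @ h_prev + W["o"][1] @ x_t)   # output gate (assumed form)
    c_t = f_t * c_prev + i_t * u_t                        # c_t = f_t ⊙ c_{t-1} + i_t ⊙ u_t
    h_t = o_t * np.tanh(c_t)                              # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t

h, c = np.zeros(d_hid), np.zeros(d_hid)                   # h_0, c_0
for x in rng.normal(size=(3, d_in)):                      # e.g. embeddings for "The cat sat"
    h, c = lstm_step(x, h, c)
```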
LSTM for Sequences
[Figure: the LSTM cell unrolled over "The cat sat": each step takes x_t, h_{t-1}, and c_{t-1} and produces h_t and c_t through the gates u, f, i, and o.]
[slides from Catherine Finegan-Dollak]
Problem 10 from sample midterm questions