-
CS388: Natural Language Processing
Greg Durrett
Lecture 25: Mul
-
Administrivia
‣ Project 2 back today/tomorrow
‣ TACC allocation
-
Dealing with other languages
‣ Many algorithms so far have been developed for English
‣ Some structures like constituency
-
This Lecture
‣ Morphological richness: effects and challenges
‣ Cross-lingual tagging and parsing
‣ Morphology tasks: analysis, inflection
-
Morphology
-
What is morphology?
‣ Study of how words form
‣ Derivation:
estrangement (n)
become (v) => unbecoming (adj)
inflammable
‣ Inflection:
I become / she becomes
‣ Mostly applies to verbs and nouns
-
Morphological Inflection
-
Morphological Inflection
-
Noun Inflection
-
Irregular Inflection
-
Agglutination
-
Morphologically-Rich Languages
‣ Many languages used all over the world have much richer morphology than English (Chinese is the main exception)
-
Morphologically-Rich Languages
‣ Great resources for challenging your assumptions
-
Morphological Analysis/Inflection
-
Morphological Analysis: Hungarian
Ám a kormány egyetlen adó csökkentését sem javasolja.
kormány: n=singular|case=nominative|proper=no
egyetlen: deg=positive|n=singular|case=nominative
adó: n=singular|case=nominative|proper=no
csökkentését: n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular
javasolja: mood=indicative|t=present|p=3rd|n=singular|def=yes
But the government does not recommend reducing taxes.
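The analyses in this example are attribute=value pairs joined by `|`. A minimal sketch of reading one into a dictionary, assuming every tag follows that format; the function name `parse_features` is hypothetical and not from the lecture:

```python
def parse_features(tag):
    """Split 'k1=v1|k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    features = {}
    for pair in tag.split("|"):
        # partition keeps working even if a value is missing
        key, _, value = pair.partition("=")
        features[key.strip()] = value.strip()
    return features


# Analysis of the accusative, possessed noun from the Hungarian sentence
noun_tag = "n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular"
noun_features = parse_features(noun_tag)
print(noun_features["case"])
```

Treating the analysis as a set of independent features, rather than one opaque label, is what lets a model predict case, number, and person separately.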
-
Morphological Analysis
‣ Given a word, need to predict what its morphological features are
‣ Lots of work on Arabic inflection
-
Predic
-
Predic
-
Morphological Reinflection
-
Word Segmentation
-
Morpheme Segmentation
‣ un+becom+ing: we should be able to recognize these common pieces and split them off
‣ How do we do this?
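One very simple answer is to strip affixes from a fixed list. The sketch below assumes a small hand-written affix inventory (`PREFIXES`, `SUFFIXES`, and `segment` are hypothetical names). It is a baseline for illustration only, not the method the lecture goes on to present:

```python
PREFIXES = ("un", "re")        # hypothetical affix inventory
SUFFIXES = ("ing", "ed", "s")


def segment(word, min_stem=3):
    """Strip at most one known prefix and one known suffix from `word`."""
    before, after = [], []
    stem = word
    for prefix in PREFIXES:
        # only split if enough of the word is left to be a plausible stem
        if stem.startswith(prefix) and len(stem) - len(prefix) >= min_stem:
            before.append(prefix)
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
            after.append(suffix)
            stem = stem[:-len(suffix)]
            break
    return before + [stem] + after


print("+".join(segment("unbecoming")))
```

A fixed list over-segments easily: this sketch also splits "under" into un+der, which is one reason to learn the pieces from data instead.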
-
Morpheme Segmentation
-
Chinese Word Segmentation
-
Cross-Lingual Tagging and Parsing
-
Cross-Lingual Tagging
‣ Labeling POS datasets is expensive
‣ Can we transfer annotation
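One common way to transfer annotation is to project tags across word alignments in parallel text. The sketch below assumes the alignment is already given as (source index, target index) pairs; `project_tags` and the `"X"` placeholder for unaligned tokens are illustrative choices, not something the slide specifies:

```python
def project_tags(source_tags, alignment, target_length, unknown="X"):
    """Copy each source tag onto the target token it is aligned to."""
    target_tags = [unknown] * target_length
    for src_idx, tgt_idx in alignment:
        target_tags[tgt_idx] = source_tags[src_idx]
    return target_tags


source_tags = ["PRON", "VERB", "PRON"]   # I like them
alignment = [(0, 0), (1, 2), (2, 1)]     # I-Je, like-aime, them-les
projected = project_tags(source_tags, alignment, target_length=3)
print(projected)                         # tags for: Je les aime
```

In practice alignments are noisy and many target tokens end up unaligned, so projected tags are usually treated as weak supervision rather than gold labels.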
-
Unsupervised Tagging
‣Mul
-
Tagging by Annotation
-
Cross-Lingual Tagging
Das and Petrov (2011)
‣ EM-HMM/feature HMM: unsupervised methods with a greedy mapping from learned tags to gold tags
‣ Projection
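An unsupervised HMM produces anonymous tag ids, so evaluating it requires mapping those ids onto gold tags. A minimal sketch of one common greedy (many-to-one) mapping, where each learned tag takes the gold tag it co-occurs with most often; the exact procedure in Das and Petrov (2011) may differ, and `greedy_tag_mapping` is a hypothetical name:

```python
from collections import Counter, defaultdict


def greedy_tag_mapping(induced, gold):
    """Map each induced tag id to the gold tag it most often co-occurs with."""
    counts = defaultdict(Counter)
    for cluster, tag in zip(induced, gold):
        counts[cluster][tag] += 1
    return {cluster: tag_counts.most_common(1)[0][0]
            for cluster, tag_counts in counts.items()}


# Toy data: one induced tag id and one gold tag per token
induced = [0, 1, 2, 0, 1, 0]
gold = ["PRON", "VERB", "NOUN", "PRON", "VERB", "NOUN"]

mapping = greedy_tag_mapping(induced, gold)
accuracy = sum(mapping[c] == g for c, g in zip(induced, gold)) / len(gold)
print(mapping, accuracy)
```

Because the mapping is chosen using the gold tags, accuracy measured this way is optimistic: it rewards the tagger for any consistent clustering.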
-
Cross-Lingual Parsing
McDonald et al. (2011)
‣ Now that we can POS tag other languages, can we parse them too?
‣ Direct transfer: train a parser over POS sequences in one language, then apply it to another language
train:
I like tomatoes
PRON VERB NOUN
Parser trained to accept tag input: VERB is the head of PRON and NOUN
parse new data:
Je les aime
PRON PRON VERB
I like them
PRON VERB PRON
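The point of direct transfer is that the parser only ever sees tags, so the same model can be applied to any language with a compatible tagset. The sketch below uses a hard-coded rule as a toy stand-in for a trained parser (an assumption made purely for illustration; `parse_from_tags` is a hypothetical name):

```python
def parse_from_tags(tags):
    """Toy stand-in for a delexicalized parser.

    Returns one head index per token (-1 marks the root). The first VERB
    is the root and heads every other token; a real system would learn
    attachments from a treebank. Raises ValueError if there is no VERB.
    """
    root = tags.index("VERB")
    return [-1 if i == root else root for i in range(len(tags))]


english = parse_from_tags(["PRON", "VERB", "NOUN"])   # I like tomatoes
french = parse_from_tags(["PRON", "PRON", "VERB"])    # Je les aime
print(english, french)
```

The French sentence has a different word order, yet the tag-only rule still attaches both pronouns to the verb, which is the behaviour direct transfer relies on.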
-
Cross-Lingual Parsing
McDonald et al. (2011)
‣Mul
-
Cross-Lingual Word Representations
-
Mul
-
Mul
-
Mul
-
Mul
-
Mul
-
Mul
Hindi (Devanagari). Transfers well despite different alphabets!
‣ Japanese => English: different script and very different syntax
-
Mul
-
Where are we now?
‣ Universal dependencies: treebanks (+ tags) for 70+ languages
‣ Many languages are still
-
Takeaways
‣ Many languages have richer morphology than English and pose distinct