CS388: Natural Language Processing, Greg Durrett, Lecture 25: Multilinguality and Morphology (when your parser works in 90 different languages)

  • Administrivia

    ‣ Project 2 back today/tomorrow

    ‣ TACC allocation…

  • Dealing with other languages

    ‣ Many algorithms so far have been developed for English

    ‣ Some structures like constituency…

  • This Lecture

    ‣ Morphological richness: effects and challenges

    ‣ Cross-lingual tagging and parsing

    ‣ Morphology tasks: analysis, inflection…

  • Morphology

  • What is morphology?

    ‣ Study of how words form

    ‣ Derivation … estrangement (n)

    become (v) => unbecoming (adj)

    I become / she becomes

    ‣ Inflection … inflammable

    ‣ Mostly applies to verbs and nouns
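The "I become / she becomes" contrast is a small example of inflection. The sketch below generates English third-person singular present forms from a handful of spelling rules. The rule set, the `IRREGULAR` table, and the function name are illustrative assumptions, and they cover only a sliver of English:

```python
# Toy English inflector: base verb => third-person singular present ("she ___").
# The rules and the IRREGULAR table are illustrative assumptions, not a full grammar.
IRREGULAR = {"be": "is", "have": "has", "do": "does", "go": "goes"}
VOWELS = set("aeiou")


def third_person_singular(verb: str) -> str:
    if verb in IRREGULAR:
        return IRREGULAR[verb]
    # Sibilant endings take -es: watch => watches, pass => passes.
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        return verb + "es"
    # Consonant + y becomes -ies: carry => carries (but play => plays).
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ies"
    # Default: add -s, as in become => becomes.
    return verb + "s"


for v in ["become", "watch", "carry", "play", "be"]:
    print(v, "=>", third_person_singular(v))
```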

  • Morphological Inflection

  • Noun Inflection

  • Irregular Inflection

  • Agglutination

  • Morphologically-RichLanguages

    ‣ Many languages used all over the world have much richer morphology than English (Chinese is the main exception)

  • Morphologically-RichLanguages

    ‣ Great resources for challenging your assumptions…

  • Morphological Analysis/Inflection

  • Morphological Analysis: Hungarian

    Ám a kormány egyetlen adó csökkentését sem javasolja.

    n=singular|case=nominative|proper=no

    deg=positive|n=singular|case=nominative

    n=singular|case=nominative|proper=no

    n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular

    mood=indicative|t=present|p=3rd|n=singular|def=yes

    But the government does not recommend reducing taxes.
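The Hungarian analyses are written as pipe-separated `key=value` bundles. Here is a minimal sketch for reading one bundle into a dictionary. It assumes every field has the `key=value` shape shown, and the function name `parse_features` is hypothetical:

```python
# Parse a pipe-separated feature bundle ("key=value|key=value|...") into a dict.
def parse_features(bundle: str) -> dict[str, str]:
    features = {}
    for pair in bundle.split("|"):
        if not pair:
            continue  # tolerate empty fields, e.g. from a trailing "|"
        key, _, value = pair.partition("=")
        features[key] = value
    return features


analysis = parse_features("n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular")
print(analysis["case"], analysis["pperson"])  # accusative 3rd
```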

  • Morphological Analysis

    ‣ Given a word, need to predict what its morphological features are

    ‣ Lots of work on Arabic inflection…
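The following is a deliberately simple illustration of this prediction problem. The model backs off from the longest word suffix seen in training to shorter ones, and finally to the global majority bundle. The class name and the toy English training pairs are hypothetical. A suffix-only model also ignores the prefixes and root-and-pattern morphology that matter for Arabic:

```python
from collections import Counter, defaultdict


class SuffixAnalyzer:
    """Hypothetical baseline: predict a feature bundle from the longest suffix seen in training."""

    def __init__(self, max_suffix: int = 4):
        self.max_suffix = max_suffix
        self.table = defaultdict(Counter)  # suffix -> counts of feature bundles

    def train(self, pairs):
        for word, feats in pairs:
            for k in range(1, min(self.max_suffix, len(word)) + 1):
                self.table[word[-k:]][feats] += 1
            self.table[""][feats] += 1  # empty suffix: global fallback

    def predict(self, word: str) -> str:
        # Back off from the longest suffix to shorter ones, then to the global majority.
        for k in range(min(self.max_suffix, len(word)), 0, -1):
            if word[-k:] in self.table:
                return self.table[word[-k:]].most_common(1)[0][0]
        return self.table[""].most_common(1)[0][0]


analyzer = SuffixAnalyzer()
analyzer.train([
    ("walks", "t=present|p=3rd|n=singular"),
    ("runs", "t=present|p=3rd|n=singular"),
    ("walked", "t=past"),
    ("jumped", "t=past"),
    ("walking", "aspect=progressive"),
])
for w in ["talked", "sings", "talking"]:
    print(w, analyzer.predict(w))
```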

  • Predicti…

  • Morphological Reinflection
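Reinflection asks for the word form that carries a given set of target features. The sketch below is a minimal edit-rule approach. It assumes a single fixed target bundle (hypothetically, English past tense produced from the base form). Each training pair yields a rule "strip suffix A, append suffix B", read off after the pair's longest common prefix. All names are illustrative:

```python
# Hypothetical reinflection baseline for ONE target feature bundle (here: English past tense).
def edit_rule(source: str, target: str) -> tuple[str, str]:
    i = 0  # length of the longest common prefix
    while i < min(len(source), len(target)) and source[i] == target[i]:
        i += 1
    return source[i:], target[i:]  # (suffix to strip, suffix to append)


def learn_rules(pairs):
    return [edit_rule(src, tgt) for src, tgt in pairs]


def reinflect(word: str, rules) -> str:
    # Among rules whose strip-suffix matches the word, prefer the longest one.
    applicable = [(strip, add) for strip, add in rules if word.endswith(strip)]
    if not applicable:
        return word
    strip, add = max(applicable, key=lambda rule: len(rule[0]))
    return word[: len(word) - len(strip)] + add


rules = learn_rules([("walk", "walked"), ("carry", "carried")])
print(rules)                      # [('', 'ed'), ('y', 'ied')]
print(reinflect("hurry", rules))  # hurried
print(reinflect("jump", rules))   # jumped
```

Ties are a weak point of this scheme. A pair like love/loved would yield the rule ('', 'd'), which ties with ('', 'ed') on every word, so practical systems condition rules on more context.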

  • Word Segmentation

  • Morpheme Segmentation

    ‣ … un+becom+ing: we should be able to recognize these common pieces and split them off

    ‣ How do we do this?
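One naive answer is to greedily strip known affixes while keeping a minimum-length stem. The sketch below does this with a hand-picked affix list; the `PREFIXES`, `SUFFIXES`, and `MIN_STEM` values are assumptions:

```python
# Toy greedy affix-stripping segmenter; the affix lists are hand-picked assumptions.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ing", "ed", "ment", "s"]
MIN_STEM = 3  # never strip an affix if it would leave a stem shorter than this


def segment(word: str) -> list[str]:
    prefixes, suffixes = [], []
    # Strip prefixes from the front until none applies.
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
                prefixes.append(p)
                word = word[len(p):]
                stripped = True
                break
    # Strip suffixes from the back until none applies.
    stripped = True
    while stripped:
        stripped = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
                suffixes.insert(0, s)
                word = word[: -len(s)]
                stripped = True
                break
    return prefixes + [word] + suffixes


print("+".join(segment("unbecoming")))     # un+becom+ing
print("+".join(segment("estrangements")))  # estrange+ment+s
```

This reproduces un+becom+ing. Greedy stripping also over-segments words such as "string" (str+ing), however, which is one reason to learn segmentations from data instead.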

  • Chinese Word Segmentation
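A classic dictionary-based baseline for Chinese word segmentation is forward maximum matching. At each position, take the longest dictionary word that matches, falling back to a single character. The sketch below is a minimal version; the tiny lexicon is a hypothetical example:

```python
# Forward maximum matching for word segmentation (dictionary-based baseline).
def max_match(text: str, lexicon: set[str]) -> list[str]:
    max_len = max(map(len, lexicon))
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


lexicon = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}
print(" / ".join(max_match("我喜欢自然语言处理", lexicon)))  # 我 / 喜欢 / 自然语言 / 处理
```

Greedy matching can mis-segment ambiguous strings and cannot handle words missing from the lexicon, which is why statistical segmenters are usually preferred.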

  • Cross-Lingual Tagging and Parsing

  • Cross-Lingual Tagging

    ‣ Labeling POS datasets is expensive

    ‣ Can we transfer annotation…

  • Unsupervised Tagging

    ‣ Multi…

  • Tagging by Annotation…

  • Cross-Lingual Tagging

    Das and Petrov (2011)

    ‣ EM-HMM/feature HMM: unsupervised methods with a greedy mapping from learned tags to gold tags

    ‣ Projection…
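An unsupervised tagger's induced clusters must be mapped to gold tags before it can be scored. The slide does not say which greedy mapping Das and Petrov (2011) used. This sketch assumes a greedy one-to-one matching, in which each induced cluster and each gold tag is used at most once; the toy data are made up:

```python
from collections import Counter


def greedy_one_to_one(induced, gold):
    """Repeatedly take the most frequent (cluster, gold tag) pair whose cluster and tag are unassigned."""
    counts = Counter(zip(induced, gold))
    mapping, used_tags = {}, set()
    # Sort by descending count; break ties deterministically by the pair's string form.
    for (cluster, tag), _ in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        if cluster not in mapping and tag not in used_tags:
            mapping[cluster] = tag
            used_tags.add(tag)
    return mapping


def mapped_accuracy(induced, gold, mapping):
    # Unmapped clusters predict None, which never matches a gold tag.
    correct = sum(mapping.get(c) == g for c, g in zip(induced, gold))
    return correct / len(gold)


induced = [0, 0, 1, 1, 2]
gold = ["NOUN", "NOUN", "VERB", "VERB", "NOUN"]
m = greedy_one_to_one(induced, gold)
print(m, mapped_accuracy(induced, gold, m))
```

A many-to-one variant would instead map every cluster to its most frequent gold tag. That is more forgiving, because several clusters can share one tag.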
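The basic projection step copies tags across word alignments from a tagged language to an untagged one. Das and Petrov (2011) build considerably more on top of this: they propagate projected labels over a similarity graph before training a tagger. This is therefore only a minimal sketch of the core idea. The function name and the `(source_index, target_index)` alignment format are assumptions:

```python
# Project POS tags from a tagged source sentence onto a target sentence via word alignments.
def project_tags(src_tags, tgt_len, alignment):
    """alignment: list of (src_index, tgt_index) pairs, e.g. produced by a word aligner."""
    tgt_tags = ["X"] * tgt_len  # unaligned target words keep the placeholder "X"
    for s, t in alignment:
        tgt_tags[t] = src_tags[s]  # if several source words align to t, the last one wins
    return tgt_tags


# English "I like them" => French "Je les aime" (I-Je, like-aime, them-les).
src_tags = ["PRON", "VERB", "PRON"]
alignment = [(0, 0), (1, 2), (2, 1)]
print(project_tags(src_tags, 3, alignment))  # ['PRON', 'PRON', 'VERB']
```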

  • Cross-Lingual Parsing

    McDonald et al. (2011)

    ‣ Now that we can POS tag other languages, can we parse them too?

    ‣ Direct transfer: train a parser over POS sequences in one language, then apply it to another language

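    [Figure: direct transfer from English to French]

    train: "I like tomatoes" (PRON VERB NOUN)

    Parser trained to accept tag input, e.g., it learns that VERB is the head of PRON and NOUN

    parse new data: "Je les aime" (PRON PRON VERB), i.e., "I like them" (PRON VERB PRON)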
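Below is a heavily simplified sketch of the direct-transfer idea. The "parser" only ever sees POS tags, so it can be trained on English tag sequences and run on French ones. The attachment heuristic is a hypothetical stand-in, not the parser used by McDonald et al. (2011): each word attaches to the nearest word bearing its most frequent training head tag, and any cycles are repaired.

```python
from collections import Counter, defaultdict

ROOT = -1  # head index used for the root token


def train_delexicalized(treebank):
    """treebank: list of (tags, heads); heads[i] is the index of token i's head, or ROOT."""
    head_tags, root_tags = defaultdict(Counter), Counter()
    for tags, heads in treebank:
        for i, h in enumerate(heads):
            if h == ROOT:
                root_tags[tags[i]] += 1
            else:
                head_tags[tags[i]][tags[h]] += 1
    return head_tags, root_tags


def parse_tags(tags, model):
    head_tags, root_tags = model
    # Root: first token whose tag was most often the root in training (else token 0).
    best_root_tag = root_tags.most_common(1)[0][0]
    root = next((i for i, t in enumerate(tags) if t == best_root_tag), 0)
    heads = [ROOT] * len(tags)
    for i, t in enumerate(tags):
        if i == root:
            continue
        heads[i] = root  # fallback: attach to the root
        if head_tags[t]:
            want = head_tags[t].most_common(1)[0][0]
            candidates = [j for j, u in enumerate(tags) if u == want and j != i]
            if candidates:
                heads[i] = min(candidates, key=lambda j: abs(j - i))
    # Break any cycles by reattaching the offending token directly to the root.
    for i in range(len(tags)):
        seen, j = set(), i
        while j != root and j not in seen:
            seen.add(j)
            j = heads[j]
        if j != root:
            heads[i] = root
    return heads


english = [(["PRON", "VERB", "NOUN"], [1, ROOT, 1])]  # "I like tomatoes"
model = train_delexicalized(english)
print(parse_tags(["PRON", "PRON", "VERB"], model))  # "Je les aime" => [2, 2, -1]
```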

  • Cross-Lingual Parsing

    McDonald et al. (2011)

    ‣ Multi…

  • Cross-Lingual Word Representations
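One widely used way to obtain cross-lingual word representations is to align two monolingual embedding spaces with an orthogonal linear map learned from a seed dictionary of translation pairs. This is offered as an assumption, not necessarily the approach this lecture covers. Below is a minimal NumPy sketch of the closed-form orthogonal Procrustes solution, checked on synthetic data:

```python
import numpy as np


# Orthogonal Procrustes: the orthogonal W minimizing ||X W - Y||_F is U @ Vt,
# where U, S, Vt is the SVD of X.T @ Y. Row i of X (source language) and
# row i of Y (target language) embed the i-th seed translation pair.
def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


rng = np.random.default_rng(0)
d, n = 5, 50
X = rng.normal(size=(n, d))                        # synthetic "source" embeddings
true_W, _ = np.linalg.qr(rng.normal(size=(d, d)))  # a random orthogonal map
Y = X @ true_W                                     # synthetic "target" embeddings
W = procrustes(X, Y)
print(np.allclose(W, true_W))
```

Constraining W to be orthogonal preserves distances within the source space. A source word can then be translated by finding the nearest target embedding to its mapped vector.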

  • Multi…

  • Multi…

    ‣ … Hindi (Devanagari). Transfers well despite different alphabets!

    ‣ Japanese => English: different script and very different syntax

  • Multi…

  • Where are we now?

    ‣ Universal Dependencies: treebanks (+ tags) for 70+ languages

    ‣ Many languages are still…
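Universal Dependencies treebanks are distributed in the CoNLL-U format. Each token sits on its own line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Comment lines begin with `#`, and a blank line ends each sentence. Below is a minimal reader sketch that, for simplicity, skips multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `1.1`). The sample sentence and its annotations are hypothetical:

```python
# Minimal CoNLL-U reader (sketch): yields each sentence as a list of token dicts.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines):
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # multiword-token range or empty node
            continue
        token = dict(zip(FIELDS, cols))
        token["id"], token["head"] = int(token["id"]), int(token["head"])
        # FEATS is either "_" or a "Key=Value|Key=Value" list.
        token["feats"] = {} if token["feats"] == "_" else dict(
            kv.split("=", 1) for kv in token["feats"].split("|"))
        sentence.append(token)
    if sentence:
        yield sentence


sample = """# text = I like them
1\tI\tI\tPRON\tPRP\tCase=Nom|Number=Sing|Person=1|PronType=Prs\t2\tnsubj\t_\t_
2\tlike\tlike\tVERB\tVBP\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_
3\tthem\tthey\tPRON\tPRP\tCase=Acc|Number=Plur|Person=3|PronType=Prs\t2\tobj\t_\t_

"""
for sent in read_conllu(sample.splitlines()):
    print([(t["form"], t["upos"], t["head"]) for t in sent])
```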

  • Takeaways

    ‣ Many languages have richer morphology than English and pose distinct…