CS11-737: Multilingual Natural Language Processing ...
Transcript of CS11-737: Multilingual Natural Language Processing ...
![Page 1: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/1.jpg)
CS11-737: Multilingual Natural Language Processing
Yulia Tsvetkov
Morphological Analysis and Inflection
![Page 2: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/2.jpg)
What is a word
Bob’s handy man is a do-it-yourself kinda guy, isn’t he?
![Page 3: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/3.jpg)
Morphology
The study of the formation and internal structure of words
![Page 4: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/4.jpg)
Morpheme
Image from Lori Levin and David R. Mortensen’s draft book “Human Languages for Artificial Intelligence”
![Page 5: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/5.jpg)
Words are made of morphemes
Bob’s handy man is a do-it-yourself kinda guy, isn’t he?
freemorpheme
boundmorphemes
Example by Austin Matthews
![Page 6: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/6.jpg)
Morphological processes
● concatenation● affixation = stem+affix
○ prefix○ suffix
● non-concatenative affixation○ infix
● compounding = stem+stem
stemprefix + stemprefix + stem + suffix=circumfixation
=
![Page 7: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/7.jpg)
Tagalog
● Tagalog○ stem - bundok ○ singular - mabundok○ plural - mabubundok○ gloss - ‘mountainous’
Example from Lori Levin and David R. Mortensen’s draft book “Human Languages for Artificial Intelligence”
![Page 8: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/8.jpg)
Arabic, Chinese
● Arabic○ root and pattern morphology
● Chinese○ compound words
![Page 9: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/9.jpg)
Morphological functions
● Derivational morphemes ○ bound morphemes used to create new words ○ is these affixes are attached to a new base, the
resulting combination yields a word with a new meaning
○ often derived word belongs to a different syntactic class
● Inflectional morphemes○ bound morphemes used to mark grammatical
distinctions○ change the form but not POS tag or the key meaning
of the word
=
![Page 10: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/10.jpg)
Interlinear glossed text (IGT)
● https://www.eva.mpg.de/lingua/resources/glossing-rules.php
![Page 11: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/11.jpg)
Interlinear glossed text (IGT)
● https://www.eva.mpg.de/lingua/resources/glossing-rules.php
![Page 12: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/12.jpg)
Types of morphological categories and functions
1. Nounsa. NUMBER: Singular, Dual, Pluralb. GENDER (natural & grammatical): Masculine, Feminine, Neuter (Animate, Vegetable; AND AGREEMENTc. DEFINITENESS: Definite, Indefinited. POSSESSION: 1st, 2nd, 3rd; Singular & Plurale. NOUN CLASS (Grammatical gender): Declension types I, II, III, etc.f. CASE PARADIGM (DECLENSION)
2. Adjectivesa. RELATIONAL : QUALITATIVE : DEFECTIVEb. DEGREE: Comparative and Superlative
3. Verbsa. TRANSITIVITY: Transitive, Intransitiveb. ASPECT: Perfective, Imperfectivec. TENSE: Distant Past, Past, Present, Future, Distant Futured. VOICE: Active, Passive e. MOOD: Indicative, Imperative, Subjunctivef. Conjugation Class: I, II, III Conjugations and Conjugations: 1st, 2nd, 3rd Person, Sg, Pl Agreement
![Page 13: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/13.jpg)
Morphological typology
● Isolating or Analytic○ Vietnamese, Chinese, English
● Synthetic○ Fusional or Flexional
■ German, Greek, Russian■ Templatic: Hebrew and Arabic
○ Agglutinative or Agglutinating■ Finnish, Turkish, Malayalam, Swahili
○ Polysynthetic ■ Inuit, Yupik
![Page 14: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/14.jpg)
(Cettolo, Girardi, & Federico, 2012)
Type-token curves
![Page 15: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/15.jpg)
Why is rich morphology a challenge for NLP?
● High type-token ratio due to the large variety of grammatical features expressed with morphology
○ This leads to the lexical sparsity and out-of-vocabulary words
● In language generation long-range relations between words need to be enforced for modeling morphological agreement
○ This leads to agreement errors
● Morphological properties vary across languages and language families, and mapping of morphological features across languages is a challenge
○ This is exacerbated by the variability of morphological rules and irregularities (e.g. dance → danced → danced but eat → ate → eaten)
○ This leads to problems in transfer learning, translation errors, and biases in translation
![Page 16: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/16.jpg)
Types of morphological processing
● Analysis○ morphological parsing○ morphological segmentation
● Generation○ inflection generation ○ paradigm completion
● Acquisition of inflectional morphology
![Page 17: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/17.jpg)
Morphological analysis
![Page 18: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/18.jpg)
Morphological analysis with FSTs
![Page 19: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/19.jpg)
Morphological analysis with RNNs
Canonical segmentation
a. surface segmentation: achievability → achiev+abil+ity
b. canonical segmentation achievability → achieve+able+ity
1. Character bidirectional GRU encoder with attention
2. GRU decoder produces output characters3. Neural reranker for segments to identify
canonical segments
![Page 20: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/20.jpg)
Evaluation of morphological analysis
● Error rate● Edit distance ● Morpheme F1
![Page 22: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/22.jpg)
1. Inflection generation
2. Paradigm completion
Morphological generation
![Page 23: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/23.jpg)
The SIGMORPHON shared tasks
● Cross-lingual transfer for morphological inflection● Morphological analysis in context● Morphological paradigm completion
![Page 24: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/24.jpg)
Morphological inflection generation
![Page 25: CS11-737: Multilingual Natural Language Processing ...](https://reader030.fdocuments.us/reader030/viewer/2022012709/61a912d60aa25977e514d7e9/html5/thumbnails/25.jpg)
Paper for class discussion
● https://www.aclweb.org/anthology/D19-1091.pdf● Read the paper● Provide critique to a part of the paper (e.g., focusing on an individual
component of proposed model architecture or a part of experimental setup)● Propose directions for follow-up work