CSA4050: Advanced Topics in NLP
Computational Morphology II
Introduction to Two-Level Morphology
4.12.2001 CSA405 Lecture 2lev 2
The Problem
• So far we have assumed that words are formed by strict concatenation of component morphemes, as in the following example: en + large + ment + s
• This assumption is convenient because it imposes a 1:1 correspondence between segmentation of the string and lookup of lexical items (which may be of different types, e.g. roots, affixes, particles, etc.)
• The problem is that this assumption is unrealistic: spelling changes at morpheme boundaries often break the 1:1 correspondence.
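To make the strict-concatenation assumption concrete, here is a minimal Python sketch that segments a word purely by looking up substrings in a toy lexicon. The lexicon contents and the function name `segment` are illustrative assumptions, not part of the course material. It recovers en + large + ment + s, but finds no segmentation at all for words whose spelling changes at a morpheme boundary.

```python
# Toy lexicon of roots and affixes; its contents are an assumption for illustration.
LEXICON = {"en", "large", "ment", "s", "church", "carry", "ed"}

def segment(word: str) -> list[list[str]]:
    """Return every way of splitting `word` into lexicon entries by strict concatenation."""
    if word == "":
        return [[]]                      # the empty string has exactly one (empty) split
    splits = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in LEXICON:            # 1:1 lookup: the substring itself must be listed
            for rest in segment(word[i:]):
                splits.append([prefix] + rest)
    return splits

print(segment("enlargements"))  # [['en', 'large', 'ment', 's']]
print(segment("churches"))      # []  (church + s is spelled with "es")
print(segment("carried"))       # []  (carry + ed is spelled with "i")
```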
English Spelling Rules
• Final consonant doubling: begin + ing = beginning
• s to es: church + s = churches
• y to i: carry + ed = carried
• Final e deletion: rake + ing = raking
• n to m: in + practical = impractical
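The rules above can be approximated by a short Python sketch that joins two morphemes and patches the boundary. The function `attach` and its conditions are hypothetical, crude heuristics chosen to reproduce these five examples; real English orthography needs more (for instance, consonant doubling depends on stress, so this sketch would wrongly produce "visitting").

```python
VOWELS = set("aeiou")

def is_consonant(ch: str) -> bool:
    return ch.isalpha() and ch not in VOWELS

def attach(left: str, right: str) -> str:
    """Join two morphemes, applying crude (hypothetical) spelling heuristics at the boundary."""
    # n to m: a final n becomes m before p, b or m (in + practical)
    if left.endswith("n") and right[:1] in ("p", "b", "m"):
        return left[:-1] + "m" + right
    # s to es: plural s after a sibilant ending (church + s)
    if right == "s" and left.endswith(("s", "x", "z", "ch", "sh")):
        return left + "es"
    # y to i: consonant + y before a suffix not starting with i (carry + ed, carry + s)
    if (len(left) >= 2 and left.endswith("y") and is_consonant(left[-2])
            and right[:1] != "i"):
        return left[:-1] + ("ies" if right == "s" else "i" + right)
    # final e deletion before a vowel-initial suffix (rake + ing)
    if left.endswith("e") and right[:1] in VOWELS:
        return left[:-1] + right
    # final consonant doubling after consonant-vowel-consonant (begin + ing);
    # stress is ignored, so this over-applies to words like "visit"
    if (len(left) >= 3 and right[:1] in VOWELS and is_consonant(left[-1])
            and left[-1] not in "wxy" and left[-2] in VOWELS
            and is_consonant(left[-3])):
        return left + left[-1] + right
    return left + right                  # default: plain concatenation

for pair in [("begin", "ing"), ("church", "s"), ("carry", "ed"),
             ("rake", "ing"), ("in", "practical")]:
    print(" + ".join(pair), "=", attach(*pair))
```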
Semitic Languages
dħalt
dħalt
daħal
daħlet
dħalna
dħaltu
daħlu
• Deletion of vowel
• Changes or insertion of vowel
• Non-concatenative morphology
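One way to picture non-concatenative morphology is to interleave a consonantal root with a vowel pattern instead of appending pieces. In this Python sketch the root d-ħ-l and the CV templates are assumptions chosen to regenerate the forms listed above; the template notation and the name `interdigitate` are hypothetical and are not a claim about how Maltese grammars analyse these verbs.

```python
def interdigitate(root: tuple[str, ...], template: str) -> str:
    """Fill each 'C' slot of a CV template with the next root consonant."""
    if template.count("C") != len(root):
        raise ValueError("template and root disagree on the number of consonants")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)

ROOT = ("d", "ħ", "l")   # assumed triconsonantal root
TEMPLATES = ["CaCaC", "CCaCt", "CaCCet", "CCaCna", "CCaCtu", "CaCCu"]  # hypothetical

for template in TEMPLATES:
    print(template, "->", interdigitate(ROOT, template))
```

Note that the stem vowels move around (daħal, dħal-, daħl-), so no single stem string can simply be concatenated with the endings.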
Handling Spelling Rules
• Such phenomena usually occur at morpheme boundaries, and prevent direct lookup of the surface string in the lexicon.
• The solution is to suppose that two strings are involved:
  – The surface string: that which appears on the page.
  – The lexical string: that which is used to index items in the lexicon.
• What kind of mapping exists between the two strings?
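One common answer, used in two-level morphology, is a symbol-by-symbol correspondence in which a lexical symbol with no surface realisation is paired with a null symbol. This Python sketch assumes `0` as the null symbol and `+` as the morpheme boundary; the helper names are hypothetical.

```python
NULL = "0"  # surface counterpart of a lexical symbol that is not realised

def align(lexical: str, surface: str) -> list[str]:
    """Pair the two strings symbol by symbol as 'lexical:surface'."""
    if len(lexical) != len(surface):
        raise ValueError("aligned strings must have equal length")
    return [f"{lx}:{sf}" for lx, sf in zip(lexical, surface)]

def page_form(surface: str) -> str:
    """What appears on the page: the surface string with nulls removed."""
    return surface.replace(NULL, "")

def lexicon_keys(lexical: str) -> list[str]:
    """What indexes the lexicon: the lexical string split at '+' boundaries."""
    return lexical.split("+")

lexical, surface = "carry+ed", "carri0ed"
print(align(lexical, surface))
# ['c:c', 'a:a', 'r:r', 'r:r', 'y:i', '+:0', 'e:e', 'd:d']
print(page_form(surface), lexicon_keys(lexical))  # carried ['carry', 'ed']
```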
Lexical Transformations
[Figure: a transformation relating the LEXICAL STRING to the SURFACE STRING]
Phonological Rules
• Morphological rules are a reflection of phonological changes.
• Assumption: lexical/surface transformation is rule governed.
• Phonological rule systems had been studied extensively within the generative tradition established by Chomsky during the 1970s
Typical Phonological Rule
• A typical rule has the following shape: Phon1 -> Phon2 // Lcontext __ Rcontext
• Meaning: phoneme Phon1 is transformed into phoneme Phon2 if it occurs between left context Lcontext and right context Rcontext
• Example: [B] -> [P] // __ #
• B is pronounced like P when it is word-final (cf. Maltese kelb)
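A rule of this shape can be sketched in Python as a regular-expression substitution, with lookbehind and lookahead standing in for the contexts. `apply_rule` is a hypothetical helper: it assumes contexts are literal strings (no feature classes such as "any voiced obstruent") and treats `#` as the word boundary.

```python
import re

def apply_rule(word: str, target: str, replacement: str,
               left: str = "", right: str = "") -> str:
    """Rewrite target -> replacement // left __ right; '#' marks the word boundary."""
    if left == "#":
        before = r"\A"                          # start of the word
    elif left:
        before = f"(?<={re.escape(left)})"      # literal left context
    else:
        before = ""
    if right == "#":
        after = r"\Z"                           # end of the word
    elif right:
        after = f"(?={re.escape(right)})"       # literal right context
    else:
        after = ""
    pattern = before + re.escape(target) + after
    # Contexts are lookarounds, so every match is checked against the input string.
    return re.sub(pattern, lambda m: replacement, word)

print(apply_rule("kelb", "b", "p", right="#"))  # kelp
print(apply_rule("bab", "b", "p", right="#"))   # bap  (only the final b changes)
```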
Properties of Phonological Rules within the Generative Tradition
• Rules are rewrite rules
• Rules apply sequentially
• Rules are ordered
• Rules may act upon their own output (cyclic rules)
• Effects of rules are not always reversible
• Collections of rules have Turing power
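Two of these properties, ordering and irreversibility, show up in a small Python sketch of a sequential cascade. Rules A and B use hypothetical abstract symbols: applying B first removes the c that A's context needs (B bleeds A), so the two orders give different outputs. The last line shows that after word-final devoicing the surface form alone does not reveal whether the final consonant was lexically b or p; the lexical form `kelp` is hypothetical.

```python
import re

def cascade(word: str, rules) -> str:
    """Apply ordered rewrite rules in sequence; each rule sees the previous output."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

# Hypothetical rules written as (regex, replacement) pairs.
RULE_A = (r"a(?=c)", "b")    # a -> b // __ c
RULE_B = (r"c\Z", "d")       # c -> d // __ #
DEVOICE = (r"b\Z", "p")      # b -> p // __ #

print(cascade("ac", [RULE_A, RULE_B]))   # bd
print(cascade("ac", [RULE_B, RULE_A]))   # ad  (B removes the c that A needs)
print(cascade("kelb", [DEVOICE]), cascade("kelp", [DEVOICE]))  # kelp kelp
```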
C. Douglas Johnson (1972)
• A theory of phonology with the right properties could be implemented using only finite state machinery.
• Each rule is associated with a finite state transducer (FST).
• All rules operate simultaneously, thus eliminating the delicate problems of ordering associated with sequential cascades of rules.
• The collection of FS rules operating in parallel is mathematically equivalent to a single FST representing the intersection of the component FSTs
• Johnson’s work was mainly theoretical. He did not address computational issues, in particular how to compute the intersection of multiple FSTs.
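Johnson's equivalence can be illustrated with the product construction. If every rule pairs each lexical symbol with exactly one surface symbol (using an explicit `0`), each rule is an automaton over "lexical:surface" pairs, and the rules running in parallel accept exactly what their intersection accepts. The two rules, the pair alphabet and all names in this Python sketch are hypothetical, and the rules are deliberately incomplete (rule 1 restricts where y:i may occur but does not force it).

```python
from collections import deque

# Pair symbols are written "lexical:surface"; "0" is the null symbol.
ALPHABET = ["c:c", "a:a", "r:r", "y:y", "y:i", "+:0", "+:+", "e:e", "d:d"]

# Each hypothetical rule is a DFA step function; returning None means "reject".
def r1(state, pair):
    """y:i may only occur immediately before +:0."""
    if state == 1:                          # just read y:i
        return 0 if pair == "+:0" else None
    return 1 if pair == "y:i" else 0

def r2(state, pair):
    """The boundary + is never realised as + on the surface."""
    return None if pair == "+:+" else 0

RULES = [(r1, 0, {0}), (r2, 0, {0})]       # (step, start state, final states)

def intersect(rules, alphabet):
    """Product construction: one explicit DFA accepting what every rule accepts."""
    start = tuple(s for _, s, _ in rules)
    table, seen, queue = {}, {start}, deque([start])
    while queue:
        state = queue.popleft()
        for pair in alphabet:
            nxt = tuple(step(q, pair) for (step, _, _), q in zip(rules, state))
            if None in nxt:                 # some rule rejects: no transition
                continue
            table[state, pair] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    finals = {s for s in seen
              if all(q in f for (_, _, f), q in zip(rules, s))}
    return start, table, finals

def accepts(machine, pairs) -> bool:
    start, table, finals = machine
    state = start
    for pair in pairs:
        state = table.get((state, pair))
        if state is None:
            return False
    return state in finals

machine = intersect(RULES, ALPHABET)
print(accepts(machine, "c:c a:a r:r r:r y:i +:0 e:e d:d".split()))  # True
print(accepts(machine, "c:c a:a r:r r:r y:i e:e d:d".split()))      # False
```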
Finite State Machinery
• FS Automaton (FSA)
  – For recognition and generation of regular languages.
  – All operations over regular languages have corresponding operations over the corresponding FSAs.
• FS Transducer (FST)
  – Like an FSA, but with output as well as input.
  – For recognition and generation of regular relations.
  – Some operations over regular languages (e.g. intersection and complementation) have no corresponding operations over FSTs, because regular relations are not closed under them.
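A standard counterexample shows why intersection does not carry over to transducers. Writing each relation as a set of (input, output) pairs, both relations below are regular, as their transducer expressions show:

```latex
\begin{aligned}
R_1 &= \{(a^n,\ b^n c^m) : n, m \ge 0\} = (a{:}b)^*\,(\varepsilon{:}c)^* \\
R_2 &= \{(a^n,\ b^m c^n) : n, m \ge 0\} = (\varepsilon{:}b)^*\,(a{:}c)^* \\
R_1 \cap R_2 &= \{(a^n,\ b^n c^n) : n \ge 0\}
\end{aligned}
```

The output side of $R_1 \cap R_2$ is $\{b^n c^n\}$, which is not a regular language, whereas the output projection of any regular relation is regular; hence $R_1 \cap R_2$ is not a regular relation. Equal-length relations, in which every symbol is paired with exactly one symbol and an explicit 0 stands in for the empty string, are closed under intersection; this is one motivation for the one-to-one symbol pairing used in two-level morphology.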
Kimmo Koskenniemi (1983)
• Worked on the morphology of Finnish and developed a system of finite state transducers.
• Devised a computational framework for executing collections of finite state transducers in parallel.
Koskenniemi’s Model
[Figure: FST1, FST2, FST3, … FSTn positioned between the LEXICAL STRING and the SURFACE STRING]
• The interpreter executes the FSTs round-robin, keeping them in lock-step before moving the head.
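The lock-step interpreter can be sketched in Python: at each position every rule automaton consumes the same lexical:surface pair, and the heads advance only once all of them have accepted it. The rule automata and names are hypothetical, and this sketch only checks a given lexical/surface pair; an analyser or generator would additionally search over the possible symbols on the unknown side.

```python
# Hypothetical rule automata over "lexical:surface" pairs; step returns None to reject.
def r1(state, pair):
    """y:i may only occur immediately before +:0."""
    if state == 1:
        return 0 if pair == "+:0" else None
    return 1 if pair == "y:i" else 0

def r2(state, pair):
    """The boundary symbol + is never realised as + on the surface."""
    return None if pair == "+:+" else 0

RULES = [(r1, 0, {0}), (r2, 0, {0})]    # (step, start state, final states)

def run_in_lockstep(rules, lexical: str, surface: str) -> bool:
    """Step every rule automaton over the same pair before advancing the heads."""
    if len(lexical) != len(surface):
        return False
    states = [start for _, start, _ in rules]
    for lx, sf in zip(lexical, surface):          # one head position per iteration
        pair = f"{lx}:{sf}"
        for i, (step, _, _) in enumerate(rules):  # round-robin over the automata
            states[i] = step(states[i], pair)
            if states[i] is None:                 # any rejection fails the pair
                return False
    return all(q in finals for q, (_, _, finals) in zip(states, rules))

print(run_in_lockstep(RULES, "carry+ed", "carri0ed"))  # True
print(run_in_lockstep(RULES, "carry+ed", "carri+ed"))  # False
```

Unlike a precomputed intersection, this runs the component machines side by side without ever building their product.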
Martin Kay and Ron Kaplan (1981)
• Kay and Kaplan (both at Xerox PARC) were very interested in the computational issues underlying morphological processing.
• In particular, they studied the problems of
  – how to combine FSTs in parallel (computing the intersection of regular relations);
  – how to combine FSTs in series (computing the composition of FSTs).
• Restrictions on rules have pleasant consequences
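Combining FSTs in series can be sketched for the simplest case: deterministic transducers that emit exactly one output symbol per input symbol and accept in every state. Composition then pairs up the states of the two machines and feeds each output of the first into the second. The two rules (a -> b // c __ and b -> d // c __), the representation and all names in this Python sketch are hypothetical; general FSTs with ε-transitions or nondeterminism need a more careful construction.

```python
# A deterministic letter-to-letter transducer is a pair (start_state, step),
# where step(state, symbol) returns (output_symbol, next_state).
# Every state is treated as accepting (an assumption of this sketch).

def run(fst, word: str) -> str:
    """Translate `word` with a single transducer."""
    state, step = fst
    out = []
    for ch in word:
        o, state = step(state, ch)
        out.append(o)
    return "".join(out)

def compose(t1, t2):
    """Series combination: a product machine whose states pair t1's and t2's states."""
    (start1, step1), (start2, step2) = t1, t2

    def step(state, ch):
        q1, q2 = state
        mid, q1 = step1(q1, ch)       # t1 reads the input symbol...
        out, q2 = step2(q2, mid)      # ...and t2 reads what t1 wrote
        return out, (q1, q2)

    return (start1, start2), step

# Hypothetical rule a -> b // c __ ; the state records whether the last input was c.
def a_to_b(after_c, ch):
    return ("b" if after_c and ch == "a" else ch), ch == "c"

# Hypothetical rule b -> d // c __ ; same idea, applied to T1's output.
def b_to_d(after_c, ch):
    return ("d" if after_c and ch == "b" else ch), ch == "c"

T1, T2 = (False, a_to_b), (False, b_to_d)
both = compose(T1, T2)
print(run(both, "cacb"), run(T2, run(T1, "cacb")))  # cdcd cdcd
```

The composed machine gives the same result as running the cascade, but as a single transducer.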
Restrictions on Rules
• With the restriction that a rule shall not apply to its own output, Kaplan and Kay showed that combining the corresponding relations under the operations of intersection, composition and union keeps the result within a closed subclass of the relations computable by FSTs.
• They then spent many years designing and implementing a calculus for describing and combining FSTs based upon regular expressions.
Summary
• Generative phonology (Chomsky, generative tradition): multilevel cascades of rules
• Johnson: parallel rules
• Kaplan/Kay: calculus
• Koskenniemi: parallel rules
• KIMMO: PC-Kimmo
• Xerox tools: xfst/twolc/lexc