Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8...
Transcript of Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8...
![Page 1: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/1.jpg)
1
Morphology
September 2010
Jugal Kalita, University of Colorado at Colorado Springs
Adapted from Kathy McCoy, University of Delaware
![Page 2: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/2.jpg)
2
What is Morphology?
• The study of how words are composed of morphemes (the smallest meaning-bearing units of a language)
– Stems – core meaning units in a lexicon
– Affixes (prefixes, suffixes, infixes, circumfixes) – bits and pieces that combine with stems to modify their meanings and grammatical functions (can have multiple ones)
• Immaterial • Trying • Absobl**dylutely, Man-f**king-hattan (infixing bloody or fucking) • Unreadable
![Page 3: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/3.jpg)
3
Why is Morphology Important to the Lexicon?
Full listing versus Minimal Redundancy • true, truer, truest, truly, untrue, truth, truthful,
truthfully, untruthfully, untruthfulness • Untruthfulness = un- + true + -th + -ful + -ness • These morphemes appear to be productive • By representing knowledge about the internal
structure of words and the rules of word formation, we can save room and search time.
![Page 4: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/4.jpg)
4
Need to do Morphological Parsing
Morphological Parsing (or Stemming)
• Taking a surface input and breaking it down into its morphemes
• foxes breaks down into the morphemes fox (noun stem) and –es (plural suffix)
• rewrites breaks down into re- (prefix) and write (stem) and –s (suffix)
![Page 5: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/5.jpg)
5
Two Broad Classes of Morphology
• Inflectional Morphology – Combination of stem and morpheme resulting in word of
same class – Usually fills a syntactic feature such as agreement – E.g., plural –s, past tense -ed
• Derivational Morphology – Combination of stem and morpheme usually results in a
word of a different class – Meaning of the new word may be hard to predict – E.g., +ation in words such as computerization
![Page 6: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/6.jpg)
6
Word Classes
• By word class, we have in mind familiar notions like noun and verb that we discussed a bit in the previous lecture.
• Right now we’re concerned with word classes because the way that stems and affixes combine is based to a large degree on the word class of the stem.
![Page 7: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/7.jpg)
7
English Inflectional Morphology • Word stem combines with grammatical morpheme
– Usually produces word of same class – Usually serves a syntactic function (e.g., agreement)
like likes or liked bird birds
• Nominal morphology – Plural forms
• s or es • Irregular forms (next slide) • Mass vs. count nouns (email or emails)
– Possessives
![Page 8: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/8.jpg)
8
Complication in Morphology • It can get a little complicated by the fact that some
words misbehave (refuse to follow the rules) • The terms regular and irregular will be used to refer
to words that follow the rules and those that don’t.
Regular (Nouns) • Singular (cat, thrush) • Plural (cats, thrushes) • Possessive (cat’s thrushes’)
Irregular (Nouns) • Singular (mouse, ox) • Plural (mice, oxen)
![Page 9: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/9.jpg)
9
• Verbal inflection – Main verbs (sleep, like, fear) are relatively regular
• -s, ing, ed • And productive (i.e., can be used with newly formed
verbs): Emailed, instant-messaged, faxed, Imed, SMSed • But eat/ate/eaten, catch/caught/caught: These are
irregular. – Primary (be, have, do) and modal verbs (can, will, must) are
often irregular and not productive • Be: am/is/are/were/was/will/been/being
– Irregular verbs few (~250) but frequently occurring – English verbal inflection is much simpler than e.g., Latin,
Sanskrit or German
![Page 10: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/10.jpg)
10
Regular and Irregular Verbs
• Regulars… – Walk, walks, walking, walked, walked
• Irregulars – Eat, eats, eating, ate, eaten – Catch, catches, catching, caught, caught – Cut, cuts, cutting, cut, cut
![Page 11: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/11.jpg)
11
Derivational Morphology
• Derivational morphology is somewhat messy. – There is usually only a partial pattern to what is acceptable – Irregular meaning change – Changes of word class
![Page 12: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/12.jpg)
12
English Derivational Morphology
• Word stem combines with grammatical morpheme – Usually produces a word of a different class – More complicated than inflectional
• Example: nominalization – -ize verbs -ation nouns – generalize, realize generalization, realization – verb -er nouns – Murder, spell murderer, speller
• Example: verbs, nouns adjectives – embrace, pity embraceable, pitiable – care, wit careless, witless
![Page 13: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/13.jpg)
13
• Example: adjective adverb – happy happily
• More complicated to model than inflection – Less productive: *science-less, *concern-less, *go-able,
*sleep-able – It’s difficult to know what works with which types of words – Meanings of derived terms harder to predict by rule
• clueless, careless, nerveless
![Page 14: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/14.jpg)
14
Derivational Examples • Verb/Adj to Noun
-ation computerize computerization
-ee appoint appointee
-er kill killer
-ness fuzzy fuzziness
![Page 15: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/15.jpg)
15
Derivational Examples • Noun/Verb to Adj
-al Computation Computational
-able Embrace Embraceable
-less Clue Clueless
![Page 16: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/16.jpg)
16
Compute
• Many paths are possible… • Start with compute
– Computer -> computerize -> computerization – Computation -> computational – Computer -> computerize -> computerizable – Compute -> computee
![Page 17: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/17.jpg)
17
Parsing
• Taking a surface input and identifying its components and underlying structure
• Morphological parsing: parsing a word into stem and affixes and identifying the parts and their relationships – Stem and features:
• goose goose +N +SG or goose +V • geese goose +N +PL • gooses goose +V +3SG
– Bracketing: indecipherable [in [[de [cipher]] able]]
![Page 18: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/18.jpg)
18
Why parse words?
• For spell-checking – Is muncheble a legal word?
• To identify a word’s part-of-speech (pos) – For sentence parsing, for machine translation, …
• To identify a word’s stem – For information retrieval
![Page 19: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/19.jpg)
19
What do we need to build a morphological parser?
• Lexicon: stems and affixes (w/ corresponding pos) • Morphotactics of the language: model of the order in
which morphemes can be affixed to a stem. E.g., plural morpheme follows noun in English
• Orthographic rules: spelling modifications that occur when affixation occurs – in il in context of l (in- + legal)
![Page 20: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/20.jpg)
20
Morphotactic Models • English nominal inflection
q0 q2 q1
plural (-s) reg-n
irreg-sg-n
irreg-pl-n
• Inputs: cats, goose, geese
![Page 21: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/21.jpg)
21
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 22: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/22.jpg)
22
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 23: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/23.jpg)
23
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 24: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/24.jpg)
24
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 25: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/25.jpg)
25
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 26: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/26.jpg)
26
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 27: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/27.jpg)
27
• Derivational morphology: adjective fragment
q5 q0
q1 q2 un-
adj-root
-er, -ly, -est
ε
• Adj-root: clear, happy, real, big, red
![Page 28: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/28.jpg)
28
• Derivational morphology: adjective fragment
q5 q0
q1 q2 un-
adj-root
-er, -ly, -est
ε
• Adj-root: clear, happy, real, big, red
• BUT: unbig, redly, realest
![Page 29: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/29.jpg)
29
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 30: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/30.jpg)
30
Antworth data on English Adjectives
• Big, bigger, biggest • Cool, cooler, coolest, cooly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
![Page 31: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/31.jpg)
31
• Derivational morphology: adjective fragment
q3
q5
q4
q0
q1 q2 un-
adj-root1
-er, -ly, -est
ε
adj-root1
adj-root2
-er, -est
• Adj-root1: clear, happy, real
• Adj-root2: big, red
![Page 32: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/32.jpg)
32
FSAs and the Lexicon
• First we’ll capture the morphotactics – The rules governing the ordering of affixes in a language.
• Then we’ll add in the actual words
![Page 33: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/33.jpg)
33
Using FSAs to Represent the Lexicon and Do Morphological
Recognition • Lexicon: We can expand each non-terminal in our
NFSA into each stem in its class (e.g. adj_root2 = {big, red}) and expand each such stem to the letters it includes (e.g. red r e d, big b i g)
q0
q1 ε
r e
q2
q4
q3
-er, -est d b
g q5 q6 i
q7
![Page 34: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/34.jpg)
34
Limitations
• To cover all of e.g. English will require very large FSAs with consequent search problems – Adding new items to the lexicon means recomputing the
FSA – Non-determinism
• FSAs can only tell us whether a word is in the language or not – what if we want to know more? – What is the stem? – What are the affixes and what sort are they? – We used this information to build our FSA: can we get it
back?
![Page 35: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/35.jpg)
35
Parsing/Generation vs. Recognition
• Recognition is usually not quite what we need. – Usually if we find some string in the language we need to find the
structure in it (parsing) – Or we have some structure and we want to produce a surface
form (production/generation)
• Example – From “cats” to “cat +N +PL”
![Page 36: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/36.jpg)
36
Finite State Transducers
• The simple story – Add another tape – Add extra symbols to the transitions
– On one tape we read “cats”, on the other we write “cat +N +PL”
![Page 37: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/37.jpg)
37
Parsing with Finite State Transducers
• cats cat +N +PL • Kimmo Koskenniemi’s two-level morphology
– Words represented as correspondences between lexical level (the morphemes) and surface level (the orthographic word)
– Morphological parsing: building mappings between the lexical and surface levels
c a t +N +PL
c a t s
![Page 38: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/38.jpg)
38
Finite State Transducers
• FSTs map between one set of symbols and another using an FSA whose alphabet Σ is composed of pairs of symbols from input and output alphabets
• In general, FSTs can be used for – Translator (Hello:Ciao) – Parser/generator (Hello:How may I help you?) – To map between the lexical and surface levels of Kimmo’s
2-level morphology
![Page 39: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/39.jpg)
39
• FST is a 5-tuple consisting of – Q: set of states {q0,q1,q2,q3,q4} – Σ: an alphabet of complex symbols, each an i/o pair s.t. i ∈ I
(an input alphabet) and o ∈ O (an output alphabet) and Σ is in I x O
– q0: a start state – F: a set of final states in Q {q4} – δ(q,i:o): a transition function mapping Q x Σ to Q – Emphatic Sheep Quizzical Cow
q0 q4 q1 q2 q3
b:m a:o a:o
a:o !:?
![Page 40: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/40.jpg)
40
Transitions
• c:c means read a c on one tape and write a c on the other • +N:ε means read a +N symbol on one tape and write nothing on
the other • +PL:s means read +PL and write an s
c:c a:a t:t +N:ε +PL:s
![Page 41: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/41.jpg)
41
FST for a 2-level Lexicon
• E.g.
Reg-n Irreg-pl-n Irreg-sg-n
c a t g o:e o:e s e g o o s e
q0 q1 q2 q3 c a t
q1 q3 q4 q2
s e:o e:o e
q0 q5
g
![Page 42: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/42.jpg)
42
FST for English Nominal Inflection
q0 q7
+PL:^s#
Combining (cascade or composition) this FSA with FSAs for each noun type replaces e.g. reg-n with every regular noun representation in the lexicon. ^ is morpheme boundary; # is word boundary
q1 q4
q2 q5
q3 q6
reg-n
irreg-n-sg
irreg-n-pl
+N:ε
+PL:#
+SG:#
+SG:#
+N:ε
+N:ε
![Page 43: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/43.jpg)
43
Problems with a 1-level FST
• Of course, its not as easy as – “cat +N +PL” <-> “cats”
• Or even dealing with the irregulars geese, mice and oxen
• But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes
![Page 44: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/44.jpg)
Examples of Spelling Changes Name Rule Example Consonant Doubling 1-letter consonant doubled
before –ing/-ed beg/begging run/running
E deletion Silent e dropped before –ing and -ed
make/making
E insertion e added after –s, -z, -x, -ch, -sh before -s
miss/misses watch/watches mash/mashes
Y replacement -y changes to –ie before –s, -i before -ed
try/tries
K insertion Verbs ending in vowel + -c add -k
panic/panicked
44
![Page 45: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/45.jpg)
45
Multi-Tape Machines
• To deal with this we can simply add more tapes and use the output of one tape machine as the input to the next
• So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols
![Page 46: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/46.jpg)
46
Multi-Level Tape Machines
• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
• ^ is morpheme boundary; # is word boundary
![Page 47: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/47.jpg)
47
Orthographic Rules and FSTs • Define additional FSTs to implement rules such as
consonant doubling (beg begging), ‘e’ deletion (make making), ‘e’ insertion (watch watches), etc.
Lexical f o x +N +PL
Intermediate f o x ̂ s #
Surface f o x e s
![Page 48: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/48.jpg)
48
Lexical to Intermediate Level
![Page 49: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/49.jpg)
49
Intermediate to Surface • The add an “e” rule as in fox^s# <-> foxes
![Page 50: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/50.jpg)
50
Note
• A key feature of this machine is that it doesn’t do anything to inputs to which it doesn’t apply.
• Meaning that they are written out unchanged to the output tape.
![Page 51: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/51.jpg)
51
![Page 52: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/52.jpg)
52
![Page 53: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/53.jpg)
53
• Note: These FSTs can be used for generation as well as recognition by simply exchanging the input and output alphabets (e.g. ^s#:+PL)
![Page 54: Morphology - University of Colorado Colorado Springsjkalita/work/cs589/2010/Morphology1.pdf8 Complication in Morphology • It can get a little complicated by the fact that some words](https://reader033.fdocuments.us/reader033/viewer/2022042114/5e9071879538532f1e4d6875/html5/thumbnails/54.jpg)
54
Summing Up • FSTs provide a useful tool for implementing a
standard model of morphological analysis, Kimmo’s two-level morphology – Key is to provide an FST for each of multiple levels of
representation and then to combine those FSTs using a variety of operators (cf AT&T FSM Toolkit)
– Other (older) approaches are still widely used, e.g. the rule-based Porter Stemmer