![Page 1: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/1.jpg)
Linguistically-motivated, statistically-driven
induction of morphology
Erwin Chan
Dept. of Computer and Information Science
University of Pennsylvania
![Page 2: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/2.jpg)
Overview
• Problem: induction of morphology from unannotated text
• Main idea: knowledge of linguistic and statistical properties of morphology allows for a simple induction algorithm
• Develops ideas from previous work:
  – Goldsmith (2001)
  – Schone & Jurafsky (2000)
  – Yarowsky & Wicentowski (2000, 2004)
![Page 3: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/3.jpg)
Outline
1. Goals of morphology induction
2. Linguistic model of morphology
3. Statistical model of morphology
4. Induction algorithm
5. Conclusion, relevance to cognitive science
![Page 4: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/4.jpg)
Computational modeling of language acquisition
Raw corpus → Induction algorithm (“fully” unsupervised) → Linguistic knowledge
![Page 5: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/5.jpg)
Desired properties of output
1. Analysis of input data
   – morphology, POS, parse
2. Generalize analysis
   – produce tool to apply to new data
   – morphological analyzer, POS tagger, parser
![Page 6: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/6.jpg)
Generalize morphological structure
• Word-specific morphological analysis
  dogs = dog + s
  cats = cat + s
  churches = church + es
  finches = finch + es
• Out-of-vocabulary words?
• Summarize phonological properties
  If ends in ch, add es; otherwise add s
![Page 7: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/7.jpg)
Morphophonological rules
• generative phonology, finite-state morphology
• Analysis: inflected → base form
• Generation: base form → inflected
• Rule specifies:
  – rewrite pattern
  – context of application
• N.PL rule ( $ is the null suffix ):
  $ → es / ch _ #
  $ → s / _ #
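As a rough illustration (not part of the original slides), the N.PL rule can be read as an ordered list of (context, suffix) pairs in which the first matching context wins. The names `NPL_RULES` and `pluralize` are hypothetical, and only the generation direction is sketched.

```python
# Ordered rewrite rules for N.PL: (required ending of the base, suffix to add).
# The more specific context comes first; the empty context matches any base.
NPL_RULES = [("ch", "es"), ("", "s")]


def pluralize(base: str) -> str:
    """Generation direction: base form -> inflected form."""
    for context, suffix in NPL_RULES:
        if base.endswith(context):
            return base + suffix
    raise ValueError("no rule applies")


for noun in ["dog", "cat", "church", "finch"]:
    print(noun, "->", pluralize(noun))
```

Because the rule stores a context rather than a word list, it also applies to out-of-vocabulary bases.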
![Page 8: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/8.jpg)
Towards induction of rules
• This presentation: from a corpus,
  – Select words to be base forms
  – Formulate rewrite patterns (transforms)
• Future: learn other rule components
  – context of application
  – POS categories (e.g. “Noun”)
  – fine-grained inflectional categories (Noun.PL)
  – allomorphs
![Page 9: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/9.jpg)
Outline
1. Goals of morphology induction
2. Linguistic model of morphology
3. Statistical model of morphology
4. Induction algorithm
5. Conclusion, relevance to cognitive science
![Page 10: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/10.jpg)
Linguistic model of morphology
• Model that generates inflectional morph paradigms
  – Base forms
  – Transforms
  – Transform signatures
• Simplifying assumptions:
  – One inflectional property per word (not adequate for agglutinative languages: Finnish)
  – Omit derivational morphology
![Page 11: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/11.jpg)
Base-and-transforms model of morphological paradigms
• Apply transforms to base form to generate each inflection
[Diagram: three lexemes (Lexeme 1, Lexeme 2, Lexeme 3), each with a base form from which its inflections are generated]
![Page 12: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/12.jpg)
Base forms
• Same inflectional type across lexemes for a particular POS category
  – e.g. Nom.Sg for all nouns
• Representation in lexicon
• Surface form
  – not abstract, underlying
![Page 13: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/13.jpg)
Transforms
• Specifies conversion process between base and inflected forms
• Similar to a rule, but omits context of application
• Tuple of 2 regular expressions (X, Y)
  – X: portion of the base form that is replaced
  – Y: portion of the inflected form that replaces it
![Page 14: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/14.jpg)
Transform examples (for English)
Base form   Inflected form   Transform
eat         eating           ( $, ing )
time        times            ( $, s )
time        timing           ( e, ing )
hang        hung             ( *a*, *u* )   non-concat.
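A minimal sketch of how a suffixal transform (X, Y) could be applied, assuming that `$` denotes the null suffix and that X must match the end of the base form. The function name `apply_transform` is hypothetical, and the non-concatenative case ( \*a\*, \*u\* ) is deliberately not handled.

```python
from typing import Optional, Tuple

NULL = "$"  # null suffix marker used on the slides


def apply_transform(base: str, transform: Tuple[str, str]) -> Optional[str]:
    """Strip X from the end of `base` and append Y.

    Returns None when `base` does not end in X, i.e. the transform
    does not apply. Only suffixal transforms are covered here.
    """
    x, y = ("" if part == NULL else part for part in transform)
    if not base.endswith(x):
        return None
    stem = base[: len(base) - len(x)]
    return stem + y


print(apply_transform("eat", ("$", "ing")))
print(apply_transform("time", ("$", "s")))
print(apply_transform("time", ("e", "ing")))
```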
![Page 15: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/15.jpg)
Transform signatures
• Summarizes the inflections of a set of words
  – set of base forms × set of transforms
  – each base form belongs to exactly one transform signature

           Base forms       Transforms
t-sig #1   { time, save }   { ( $, s ) ( e, ing ) }
t-sig #2   { walk }         { ( $, s ) }

generates: time, times, timing, save, saves, saving, walk, walks
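The two transform signatures can be expanded mechanically. The sketch below assumes suffixal transforms only and treats `$` as the null suffix; the names `T_SIGS` and `generate` are illustrative, not taken from the slides.

```python
def generate(signatures):
    """Expand each (base forms, transforms) pair into its surface words."""
    words = []
    for bases, transforms in signatures:
        for base in bases:
            words.append(base)  # the base form is itself a surface word
            for x, y in transforms:
                x = "" if x == "$" else x
                y = "" if y == "$" else y
                if base.endswith(x):
                    words.append(base[: len(base) - len(x)] + y)
    return words


T_SIGS = [
    (["time", "save"], [("$", "s"), ("e", "ing")]),
    (["walk"], [("$", "s")]),
]

print(generate(T_SIGS))
# ['time', 'times', 'timing', 'save', 'saves', 'saving', 'walk', 'walks']
```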
![Page 16: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/16.jpg)
Comparison to stem-suffix signatures
• Stem-suffix signature (Goldsmith 2001,2007)
Stems Suffixes
sig #1 { time, save, walk } { $, s }
sig #2 { tim, sav } { ing }
• Compare lexical representations
  – stem-suffix sig: multiple stems for a lexeme
  – transform sig: one base form per lexeme
![Page 17: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/17.jpg)
Outline
1. Goals of morphology induction
2. Linguistic model of morphology
3. Statistical model of morphology
4. Induction algorithm
5. Conclusion, relevance to cognitive science
![Page 18: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/18.jpg)
Statistical model of morphology
• Need to show learnability of linguistic model
• Understand distribution of data: look for patterns that hold across languages
• Propose simple model of distribution of inflections
• Implications for linguistic model
![Page 19: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/19.jpg)
Examine annotated corpora
• Word representation: (lemma, infl. category)
  e.g. went = ( go, verb-past-tense )
• Collapse phonological sub-classes
  e.g. N.Masc.Sg → N.Sg
       N.Fem.Sg → N.Sg
![Page 20: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/20.jpg)
Spanish newswire verbs
[Figure: log(freq) plotted by lemma and inflection; annotated “Sparse data”]
![Page 21: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/21.jpg)
CHILDES adult Spanish verbs
[Figure: log(freq) plotted by lemma and inflection]
![Page 22: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/22.jpg)
Dist. of inflectional categories
• (roughly) Zipfian
• Slovene nouns
• 3 inflections don’t occur at all
Inflection # types Inflection # types
N.Nom.Sg 7950 N.Inst.Pl 1630
N.Gen.Sg 5967 N.Dat.Sg 1515
N.Acc.Sg 5157 N.Gen.Dual 876
N.Nom.Pl 4154 N.Nom.Dual 682
N.Gen.Pl 3900 N.Dat.Pl 626
N.Inst.Sg 3334 N.Acc.Dual 586
N.Loc.Sg 3252 N.Loc.Dual 160
N.Acc.Pl 2967 N.Inst.Dual 120
N.Loc.Pl 1848 N.Dat.Dual 14
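One informal way to look at the "roughly Zipfian" claim is to fit a line to log(type count) against log(rank) for the Slovene noun table. This is only a sketch: the least-squares fit and the name `loglog_slope` are assumptions, not the author's procedure. An ideal Zipf distribution would give a slope near -1.

```python
import math

# Type counts from the Slovene noun table, in rank order.
SLOVENE_NOUN_TYPES = [7950, 5967, 5157, 4154, 3900, 3334, 3252, 2967, 1848,
                      1630, 1515, 876, 682, 626, 586, 160, 120, 14]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


print(round(loglog_slope(SLOVENE_NOUN_TYPES), 2))
```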
![Page 23: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/23.jpg)
High type frequency of base form
• Most type-frequent inflection accords with intuitive notions of what inflection a base form should be
  – Slovene: A.Pos.Nom.Sg.Indef, N.Nom.Sg, V.Main.Ind.Pres.3.Sg
  – Swedish: A.Pos.Sg.Indef.Nom, N.Sg.Indef.Nom, V.Inf.Act
  – Spanish: A.Sg, N.Sg, V.Inf
![Page 24: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/24.jpg)
Multinomial distribution
• Urn-and-balls problem
  – Assume inflectional categories have constant prob.
  – Choose lexeme and number of words, then generate inflections according to their prob. dist.
• Let an inflection set be the inflectional types of the words generated for a particular lexeme
• What is the prob. dist. over inflection sets? Can calculate from multinomial
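One way to carry out this calculation is inclusion-exclusion over the multinomial: the probability that n tokens of a lexeme realise exactly the categories in S is the alternating sum, over subsets T of S, of (total probability of T) raised to the power n. The sketch below assumes independent draws with fixed category probabilities; the category names and probabilities are made up for illustration.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items`, including the empty one, as a tuple."""
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def inflection_set_prob(observed, probs, n):
    """P(the n tokens of a lexeme realise exactly the categories in `observed`)."""
    observed = tuple(observed)
    total = 0.0
    for sub in subsets(observed):
        mass = sum(probs[c] for c in sub)          # P(one token falls in `sub`)
        sign = (-1) ** (len(observed) - len(sub))  # inclusion-exclusion sign
        total += sign * mass ** n
    return total


# Hypothetical category probabilities for illustration only.
PROBS = {"base": 0.6, "past": 0.3, "rare": 0.1}
for infl_set in subsets(tuple(PROBS)):
    if infl_set:
        print(infl_set, round(inflection_set_prob(infl_set, PROBS, 3), 4))
```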
![Page 25: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/25.jpg)
Inflection sets and base forms
• If base form is usually most frequent, multinomial predicts:
  – Inflection set with base relatively high prob
  – Inflection set without base relatively low prob
  – If a rare inflection occurs, its base form is likely to occur
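The last prediction can be checked with a short calculation under the same multinomial assumption: given that a rare inflection shows up among a lexeme's n tokens, how likely is it that the base form shows up too? The probabilities below are hypothetical and `prob_base_given_rare` is an illustrative name.

```python
def prob_base_given_rare(p_base, p_rare, n):
    """P(base form observed | rare inflection observed) over n independent tokens."""
    p_rare_seen = 1 - (1 - p_rare) ** n
    # Inclusion-exclusion for "both categories appear at least once".
    p_both_seen = (1 - (1 - p_base) ** n - (1 - p_rare) ** n
                   + (1 - p_base - p_rare) ** n)
    return p_both_seen / p_rare_seen


# Hypothetical values: a frequent base category and a rare inflection.
for n in (2, 5, 20):
    print(n, round(prob_base_given_rare(0.5, 0.01, n), 3))
```

The conditional probability grows with the number of tokens, which is why (base, inflected) pairs tend to be available even for rare inflections of reasonably frequent lexemes.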
![Page 26: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/26.jpg)
Occurrence of base in infl sets
• Percentage of inflection sets of size >= 2 that contain most type-freq inflection
Adj Noun Verb
Slovene 64% 68% 80%
Greek 89% 83% 62%
Swedish 80% 84% 57%
Spanish 82%
Sp CHILDES 70%
![Page 27: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/27.jpg)
Implications for linguistic model
• Zipfian + multinomial distributions predict
that data will exist in corpus to support rule learning
– Prominence of base form
– (base, inflected) exist even for rare inflections
![Page 28: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/28.jpg)
Outline
1. Goals of morphology induction
2. Linguistic model of morphology
3. Statistical model of morphology
4. Induction algorithm
5. Conclusion, relevance to cognitive science
![Page 29: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/29.jpg)
Overview of induction algorithm
• Learn transform signatures for portion of vocab
  – Select words to be base forms
• Construct increasingly complex data structures
  1. suffixes
  2. transforms
  3. transform signatures
• Ranking and filtering based on linguistic and statistical models
![Page 30: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/30.jpg)
Additional simplifying assumptions
• Assume language is suffixing
• Not learning POS categories
![Page 31: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/31.jpg)
Step 1. Suffixes
• Find 50 most type-frequent suffixes
• Keep track of words that end in each suffix
ing: { beating, eating, cheating, etc. }
• Rank by number of types
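Step 1 might be sketched as follows. The maximum suffix length (`max_len`) and the requirement that at least one character remain before the suffix are assumptions not stated on the slide; `$` stands for the null suffix and is therefore counted for every word type.

```python
from collections import defaultdict


def top_suffixes(vocab, k=50, max_len=4):
    """Return the k most type-frequent suffixes with the words that end in them."""
    words_by_suffix = defaultdict(set)
    for word in set(vocab):
        words_by_suffix["$"].add(word)  # null suffix: every word type
        # Assumption: keep at least one character in front of the suffix.
        for n in range(1, min(max_len, len(word) - 1) + 1):
            words_by_suffix[word[-n:]].add(word)
    ranked = sorted(words_by_suffix.items(),
                    key=lambda kv: (-len(kv[1]), kv[0]))
    return ranked[:k]


VOCAB = ["beating", "eating", "cheating", "eat", "beat"]
for suffix, words in top_suffixes(VOCAB, k=5):
    print(suffix, len(words), sorted(words))
```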
![Page 32: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/32.jpg)
Most type-frequent suffixes (Brown)
Rank. Suffix # types Rank. Suffix # types
1. $ 42596 41. les 237
2. s 10730 42. ses 230
3. e 4967 43. et 224
4. d 4800 44. ck 223
5. ed 3868 45. ding 220
6. y 3648 46. ning 219
7. n 3226 47. ded 219
8. g 3107 48. ment 217
9. ng 2951 49. ngs 216
10. ing 2869 50. rd 211
![Page 33: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/33.jpg)
Step 2. Transforms
• For each pair of suffixes s1 and s2, construct 2 transforms: (s1,s2) and (s2,s1)
  – Don’t allow deletion: ( _ , $)
• Hypothesize base forms (next slide)
• Rank transforms by # of base forms
• Keep top 50
![Page 34: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/34.jpg)
Transform construction
[Diagram: the set of s1 words and the set of s2 words are linked by the relation (s1,s2); the linked s1 words are the base forms for (s1,s2)]
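A possible reading of Step 2, sketched under stated assumptions: a word ending in s1 is hypothesized as a base form for (s1, s2) when replacing s1 with s2 yields another word in the vocabulary. The helper names are hypothetical, and the non-empty-stem check is an added assumption.

```python
from itertools import permutations


def strip_null(suffix):
    """Map the null-suffix marker '$' to the empty string."""
    return "" if suffix == "$" else suffix


def base_forms(vocab, s1, s2):
    """Words stem+s1 in vocab whose counterpart stem+s2 is also in vocab."""
    a, b = strip_null(s1), strip_null(s2)
    found = set()
    for word in vocab:
        if word.endswith(a) and len(word) > len(a):
            stem = word[: len(word) - len(a)]
            if stem + b in vocab:
                found.add(word)
    return found


def top_transforms(vocab, suffixes, k=50):
    """Rank ordered suffix pairs by their number of hypothesized base forms."""
    vocab = set(vocab)
    scored = []
    for s1, s2 in permutations(suffixes, 2):
        if s2 == "$":  # no deletion transforms ( _ , $ )
            continue
        bases = base_forms(vocab, s1, s2)
        if bases:
            scored.append(((s1, s2), bases))
    scored.sort(key=lambda item: (-len(item[1]), item[0]))
    return scored[:k]


VOCAB = {"walk", "walks", "walking", "walked", "time", "times", "timing"}
for transform, bases in top_transforms(VOCAB, ["$", "s", "ing", "e", "ed"]):
    print(transform, sorted(bases))
```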
![Page 35: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/35.jpg)
Top transforms (Brown corpus)
Rank. Transform # base forms Rank. Transform # base forms
1. ( $, s ) 5257 41. ( on, ng ) 229
2. ( ing, ed ) 1922 42. ( ng, on ) 229
3. ( ed, ing ) 1922 43. ( $, r ) 221
4. ( $, 's ) 1609 44. ( ion, e ) 216
5. ( $, ed ) 1481 45. ( e, ion ) 216
6. ( $, ing ) 1335 46. ( y, e ) 214
7. ( $, ly ) 1069 47. ( e, y ) 214
8. ( $, d ) 1041 48. ( $, al ) 213
9. ( s, ed ) 925 49. ( y, ed ) 212
10. ( ed, s ) 925 50. ( ed, y ) 212
![Page 36: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/36.jpg)
Step 3. Transform signatures
• Intersect base form sets of different transforms
[Diagram: two overlapping sets, the base forms for transform 1 ( $, s ) and the base forms for transform 2 ( $, ing ). The three regions (transform 1 only, transform 2 only, both transforms) give 3 transform signatures]
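The intersection step can be approximated by grouping each base form under the exact set of transforms it takes part in, which partitions the base forms so that each belongs to one signature. This is a sketch under that assumption; the function name and the toy data are illustrative.

```python
from collections import defaultdict


def transform_signatures(bases_by_transform):
    """Group base forms by the exact set of transforms that apply to them."""
    transforms_of = defaultdict(set)
    for transform, bases in bases_by_transform.items():
        for base in bases:
            transforms_of[base].add(transform)
    signatures = defaultdict(set)
    for base, transforms in transforms_of.items():
        signatures[frozenset(transforms)].add(base)
    return dict(signatures)


# Toy data: two transforms whose base-form sets overlap in one word.
BASES = {
    ("$", "s"): {"walk", "time", "dog"},
    ("$", "ing"): {"walk", "eat"},
}
for transforms, bases in transform_signatures(BASES).items():
    print(sorted(transforms), sorted(bases))
```

With two overlapping transforms this yields three signatures, one per region of the overlap.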
![Page 37: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/37.jpg)
Rank, filter transform signatures
• Rank by number of words
• Go down list and filter:
  #4. ($,s) ($,ed) ($,ing)
  #5. (s,ed) (s,ing)   missing base form: transforms consisting of “derived” suffixes
![Page 38: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/38.jpg)
Filter transform signatures
• Remove redundant signatures
(want a grammar of minimal size)
#1 ($,s)
#2 ($,'s)
#14 ($,s) ($,'s)   redundant: combination of #1 and #2
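One possible interpretation of the redundancy filter, offered only as a sketch: walking down the ranked list, a signature is dropped when its transform set equals the union of higher-ranked kept signatures that are proper subsets of it. The slides do not spell out the exact criterion, so this rule and the name `filter_redundant` are assumptions.

```python
def filter_redundant(ranked_signatures):
    """Drop signatures that are unions of higher-ranked, already kept ones."""
    kept = []
    for sig in ranked_signatures:
        sig = frozenset(sig)
        # Union of all kept signatures strictly contained in this one.
        covered = frozenset().union(*(k for k in kept if k < sig))
        if covered != sig:
            kept.append(sig)
    return kept


RANKED = [
    {("$", "s")},
    {("$", "'s")},
    {("$", "s"), ("$", "ed"), ("$", "ing")},
    {("$", "s"), ("$", "'s")},  # combination of the first two
]
for sig in filter_redundant(RANKED):
    print(sorted(sig))
```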
![Page 39: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/39.jpg)
Final transform signatures
1. ( $, s )
2. ( $, 's )
3. ( $, s ) ( $, ed ) ( $, ing )
4. ( $, ly )
5. ( $, s ) ( $, d ) ( e, ing )
6. ( y, ies )
7. ( $, ly ) ( $, ness )
8. ( $, s ) ( $, ed) ( $, ing ) ( $, er ) ( $, ers )
9. ( $, ed ) ( $, ing ) ( $, es )
10. ( $, ' )
11. ( $, s ) ( $, al )
12. ( $, e )
13. ( $, y )
[Slide annotations: “spurious”; “deletion from base”]
![Page 40: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/40.jpg)
Evaluation: precision of relation
• Precision:
  – Whether (base, derived-from-base) relationship is inflectional
  – Gold standard: Brown corpus lemmas
  – 96.7% correct
![Page 41: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/41.jpg)
Error Analysis
1. Agglutinative morphology
   Inflected    Base        Gold base
   survivors’   survivors   survivor
2. Gold standard doesn’t have deriv base
   hunters      hunt        hunter
3. Spurious morphological relationship
   hone         hon         hone
   louise       louis       louise
![Page 42: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/42.jpg)
Evaluation: vocab coverage
• Brown open-class POS categories
  – 31709 base forms
  – 539494 tokens (all inflections)
• 13 transform signatures
  – 5846 base forms = 18.4% coverage
  – 113165 tokens = 21.0% coverage
• (include redundant: 27%, 41.9% coverage)
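The coverage percentages follow directly from the counts on this slide; a small check (the helper name `coverage` is illustrative):

```python
def coverage(covered: int, total: int) -> float:
    """Percentage of `total` accounted for by `covered`."""
    return 100.0 * covered / total


print(f"{coverage(5846, 31709):.1f}%")     # base-form coverage: 18.4%
print(f"{coverage(113165, 539494):.1f}%")  # token coverage: 21.0%
```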
![Page 43: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/43.jpg)
How to expand coverage
• Have initial, high-precision set of base forms
• Bootstrap
  – Find other inflections of base forms
  – Use new inflections to acquire more base forms
  – Repeat
![Page 44: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/44.jpg)
Why induction algorithm works
• Exploits combinatorics of multinomial
• Find legitimate morphological relationships
  – Intersection filters non-linguistic features
  – only linguistic features likely to co-occur across large portion of vocabulary
• Find base forms
  – t-sigs with base more probable than t-sigs without, so t-sigs with base are ranked high
![Page 45: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/45.jpg)
Comparison to other algorithms
• Components:
  – spelling and frequencies
  – set intersection, set cover (greedy approx. algorithm)
  – knowledge of base-and-transforms model
• Doesn’t use:
  – entropy
  – parameter optimization
  – minimum description length
  – transitional probability between characters
  – distributional semantics
![Page 46: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/46.jpg)
Outline
1. Goals of morphology induction
2. Linguistic model of morphology
3. Statistical model of morphology
4. Induction algorithm
5. Conclusion, relevance to cognitive science
![Page 47: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/47.jpg)
Summary
• Task: induction of morphology from raw data
  – Importance of generalization
  – Generalization through morphophonological rules
• Linguistic model:
  – Base forms, transforms, transform signatures
  – Improved lexical representation
![Page 48: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/48.jpg)
Summary
• Statistical model:
  – Zipf + Multinomial → prominence of base forms
  – Data distribution sufficient to learn ling. model
• Induction algorithm:
  – build increasingly complex representations: suffix → transform → transform signature
  – uses knowledge of linguistic and statistical models
![Page 49: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/49.jpg)
Main ideas
• Knowledge of linguistic and statistical properties of morphology allows for a simple induction algorithm
• Look for “universal” properties of data
• Incorporate “universals” into algorithm
as a learning bias
![Page 50: Linguistically-motivated, statistically-driven induction of morphology Erwin Chan Dept. of Computer and Information Science University of Pennsylvania.](https://reader035.fdocuments.us/reader035/viewer/2022070400/56649e765503460f94b775fb/html5/thumbnails/50.jpg)
Relevance to cognitive science
• Linguistics:
  – Statistical / algorithmic evidence for rules
  – Statistical origin of rules?
• Psycholinguistics:
  – “Past tense” learning models (R&M, Pinker)
  – presupposes list of (base, inflected) forms
• Computational linguistics:
  – towards induction of phonological rules and finite-state models of morphology