Finite-State Methods in Natural Language Processing
Lauri Karttunen
LSA 2005 Summer Institute
July 27, 2005
Course Outline
July 18: Intro to computational morphology; XFST
Readings:
Lauri Karttunen, “Finite-State Constraints”, The Last Phonological Rule, J. Goldsmith (ed.), pages 173-194, University of Chicago Press, 1993.
Karttunen and Beesley, “25 Years of Finite-State Morphology”
Chapter 1: “Gentle Introduction” (B&K)
July 20: Regular expressions; more on XFST
Readings:
Chapter 2: “Systematic Introduction”
Chapter 3: “The XFST interface”
July 25: More on XFST: Date Parser; concatenative morphotactics: the LEXC language
Readings:
Chapter 4: “The LEXC Language”
July 27: Constraining non-local dependencies: Flag Diacritics; complex morphotactics and alternations: Finnish Numerals
Readings:
Chapter 5: “Flag Diacritics”
August 1: Non-concatenative morphotactics: reduplication, interdigitation; realizational morphology
Readings:
Chapter 8: “Non-Concatenative Morphotactics”
Gregory T. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge U. Press, 2001. (An excerpt)
Lauri Karttunen, “Computing with Realizational Morphology”, Lecture Notes in Computer Science, Volume 2588, Alexander Gelbukh (ed.), 205-216, Springer Verlag, 2003.
August 3: Optimality theory
Readings:
Paul Kiparsky, “Finnish Noun Inflection”, Generative Approaches to Finnic and Saami Linguistics, Diane Nelson and Satu Manninen (eds.), pp. 109-161, CSLI Publications, 2003.
Nine Elenbaas and René Kager, “Ternary rhythm and the lapse constraint”, Phonology 16, 273-329.
Syllabification revisited
define MarkNonDiphthongs [ [. .] -> "." || [HighV | MidV] _ LowV,   # i.a, e.a
    LowV _ MidV,                   # a.e
    i _ [MidV - e],                # i.o, i.ä
    u _ [MidV - o],                # u.e
    y _ [MidV - ö],                # y.e
    $V i _ e,                      # poiki.en
    V u _ o,                       #
    $V y _ ö,                      #
    $V [MidV | LowV] _ [u|y] C C|.#.]];   # oike.us
define Syllabify [ C* V+ C* @-> ... "." || _ C V ];
regex FinnWords .o. MarkNonDiphthongs .o. Syllabify;
Constraints
[Figure: lexicon network diagram; arc labels include ge, hund, bon, eg, et, o, a, ec, j, n, with the tags MF+, +Fem and +Pl marked]
MF%+ => _ ~$[%+Fem] %+Pl ;
Constraining by composition
xfst[0]: read lexc < adj-noun-tags.lexc
Root...2, Nouns...2, NounRoots...4, Nmf...5, ....
Building lexicon...Minimizing...Done!
2.7 Kb. 45 states, 70 arcs, Circular.
xfst[1]: up gehundino
MF+hund+Noun+Fem+Sg
xfst[1]: regex "MF+" => _ ~$["+Fem"] "+Pl" ;
1.2 Kb, 2 states, 7 arcs, Circular
xfst[2]: compose
3.2 Kb, 61 states, 89 arcs, Circular
xfst[1]: up gehundino
xfst[1]: *** Not accepted ***
Fewer words, bigger network.
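The restriction "MF+" => _ ~$["+Fem"] "+Pl" can be read as a check on the analysis string: every MF+ must be followed by +Pl, with no +Fem in between. The following is a minimal Python sketch of that reading, not the xfst implementation. The function name and the list-of-symbols representation are assumptions made for illustration, and the plural analysis in the usage lines is a hypothetical example.

```python
def satisfies_mf_constraint(symbols):
    """Return True if every "MF+" is followed by "+Pl" with no "+Fem" in between.

    `symbols` is an analysis string split into multicharacter symbols
    (an assumed representation, chosen for this sketch).
    """
    for i, sym in enumerate(symbols):
        if sym != "MF+":
            continue
        licensed = False
        for later in symbols[i + 1:]:
            if later == "+Fem":
                break            # +Fem before any +Pl: context fails
            if later == "+Pl":
                licensed = True  # reached +Pl without seeing +Fem
                break
        if not licensed:
            return False
    return True


# The analysis of "gehundino" from the session above is rejected.
print(satisfies_mf_constraint(["MF+", "hund", "+Noun", "+Fem", "+Sg"]))
# A hypothetical plural analysis without +Fem passes.
print(satisfies_mf_constraint(["MF+", "hund", "+Noun", "+Pl"]))
```

Composing the lexicon with the compiled constraint has the same filtering effect, but it is done once at compile time, which is why the resulting network is larger.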
Esperanto with Flags
Multichar_Symbols
+Noun +Adj +Nsuff +ASuff +Nize
+Pl +Sg +Acc MF+
+Aug +Dim +Fem Op+ Neg+ @U.MF.Yes@ @U.MF.No@

LEXICON Root
Nouns ;
Adjectives ;

LEXICON Nouns
NounRoots ;
@U.MF.Yes@ Ge ;

LEXICON Ge
MF+:ge NounRoots ;

LEXICON NounRoots
bird Nmf ;
hund Nmf ;
kat Nmf ;

LEXICON Nmf
+Noun:0 AugDimFem ;

LEXICON AugDimFem
@U.MF.No@ Fem ;
+Dim:et AugDimFem ;
+Aug:eg AugDimFem ;
Nend ;
Adjend ;

LEXICON Fem
+Fem:in AugDimFem ;
Constraining by flags
xfst[0]: read lexc < esperanto-flags.lexc
xfst[1]: up gehundino
xfst[1]:
xfst[1]: down MF+hund+Noun+Fem+NSuff+Sg
xfst[1]:
xfst[1]: set obey-flags off
variable obey-flags = off
xfst[1]: up gehundino
MF+hund+Noun+Fem+NSuff+Sg
xfst[1]: set show-flags on
variable show-flags = on
xfst[1]: down [input and output garbled in the source; with show-flags on, the printed strings include the flag diacritics @U.MF.Yes@ and @U.MF.No@]
Flags in the sigma
xfst[1]: print sigma
MF+ Neg+ Op+ a b c d e f g h i j k l m n o r
t u v +ASuff +Acc +Adj +Aug +Dim +Fem +Nsuff
+Nize +Noun +Pl +Sg @U.MF.No@ @U.MF.Yes@
Size: 35
@U.MF.Yes@: UNIFY feature 'MF' with value 'Yes'
@U.MF.No@: UNIFY feature 'MF' with value 'No'
2 flag diacritics
Eliminating flags
xfst[1]: eliminate flag MF
3.2 Kb. 61 states 89 arcs, Circular
xfst[1]: print sigma
MF+ Neg+ Op+ a b c d e f g h i j k l m n o r t u
v +ASuff +Acc +Adj +Aug +Dim +Fem +NSuff +Nize +Noun +Pl +Sg
Size: 33
The eliminate flag command composes the network with constraint networks that have the same effect as the flag diacritics that are removed.
Flag Diacritics
Special symbols for encoding features, that is, attribute-value pairs.
Checked at runtime to avoid the cost of compiling them into the structure of the network
If a check fails, the path is abandoned.
Attributes and Values
Epsilon arcs with feature constraints.
@U.Feature.Value@   Unify ‘Feature’ with ‘Value’ if possible.
@C.Feature@   Set ‘Feature’ to the unspecified value.
Rules
There can be any number of attributes.
An attribute can have any number of values.
If the value of an attribute is unspecified, it unifies successfully with any given value and is set to that value.
If the value of an attribute is specified, it unifies only with the given value.
Actions: Unify, Positive Set
@U.Feature.Value@   Unify Value with the current setting of Feature, if possible. Otherwise fail.
@P.Feature.Value@   Set Feature to Value regardless of the current setting. Always succeeds.
More Actions: Negative Set, Clear
@N.Feature.Value@   Set Feature to the complement of Value regardless of the current setting. Always succeeds.
@C.Feature@   Make Feature be unspecified. Always succeeds.
More Actions: Require
@R.Feature.Value@   Succeed if Feature is set to Value. Otherwise fail.
@R.Feature@   Succeed if Feature has been set to some value. Otherwise fail.
More Actions: Equality
@E.Feature1.Feature2@   Succeed if Feature1 has the same value as Feature2. Otherwise fail.
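The actions described on the preceding slides can be sketched as a small interpreter that walks a path and checks its flag diacritics at runtime. This is a minimal Python sketch, not the xfst runtime. The function name, the state representation, and the treatment of corner cases (a Require test against a negatively set feature, an Equality test on two unset features) are assumptions made for illustration.

```python
import re

# One flag diacritic: @Action.Feature@ or @Action.Feature.Value@
FLAG = re.compile(r"^@([UPNCRE])\.([^.@]+)(?:\.([^.@]+))?@$")


def check_path(symbols):
    """Return True if all flag diacritics along the path succeed."""
    state = {}  # feature -> (value, positive); absent means unspecified
    for sym in symbols:
        m = FLAG.match(sym)
        if m is None:
            continue  # ordinary symbol, no check
        action, feat, value = m.groups()
        current = state.get(feat)
        if action == "U":            # Unify
            if current is None:
                state[feat] = (value, True)
            else:
                cur_val, positive = current
                if positive and cur_val != value:
                    return False
                if not positive:
                    if cur_val == value:
                        return False
                    state[feat] = (value, True)
        elif action == "P":          # Positive set
            state[feat] = (value, True)
        elif action == "N":          # Negative set: complement of value
            state[feat] = (value, False)
        elif action == "C":          # Clear
            state.pop(feat, None)
        elif action == "R":          # Require
            if value is None:
                if current is None:
                    return False
            elif current != (value, True):
                # assumption: only a positive setting satisfies @R.F.V@
                return False
        elif action == "E":          # Equality; "value" names the second feature
            # assumption: two unspecified features compare as equal
            if state.get(feat) != state.get(value):
                return False
    return True


# Path for "gehundino" in the Esperanto lexicon: the two Unify flags clash.
print(check_path(["@U.MF.Yes@", "MF+", "hund", "+Noun", "@U.MF.No@", "+Fem"]))
# Path for "hundino": only @U.MF.No@ is encountered.
print(check_path(["hund", "+Noun", "@U.MF.No@", "+Fem"]))
```

When a check fails the path is abandoned, which corresponds to the early `return False` in the sketch.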
Eliminating flags
The constraints on "@U.FEATURE.VALUE@" have the form
~[?* PROHIBIT_FLAGS ~$[ALLOW_FLAGS] SELF ?*]
Constraint for eliminating @U.MF.No@:
~[?* ["@U.MF.Yes@"] # prohibit
~$["@P.MF.No@" | "@C.MF@"] # allow
"@U.MF.No@"
?*]
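The constraint pattern above excludes any string in which a prohibited flag is followed by the flag being eliminated, unless an allowing flag intervenes. The following is a minimal Python sketch of that membership test, assuming paths are given as lists of symbols. The function and parameter names are hypothetical and only mirror the placeholders PROHIBIT_FLAGS, ALLOW_FLAGS and SELF.

```python
def violates(symbols, prohibit, allow, self_flag):
    """True if the path matches ?* PROHIBIT ~$[ALLOW] SELF ?*."""
    armed = False  # a prohibiting flag was seen, with no allowing flag since
    for sym in symbols:
        if sym in prohibit:
            armed = True
        elif sym in allow:
            armed = False
        elif sym == self_flag and armed:
            return True
    return False


prohibit = {"@U.MF.Yes@"}
allow = {"@P.MF.No@", "@C.MF@"}

# @U.MF.Yes@ followed directly by @U.MF.No@: excluded by the constraint.
print(violates(["@U.MF.Yes@", "ge", "hund", "@U.MF.No@"],
               prohibit, allow, "@U.MF.No@"))
# An intervening @C.MF@ resets the feature, so the path is kept.
print(violates(["@U.MF.Yes@", "@C.MF@", "@U.MF.No@"],
               prohibit, allow, "@U.MF.No@"))
```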
Finnish Numerals
Numbers and Numerals
The mapping from integers 0, 1, 2, 3 … to the corresponding numerals one, two, three… is a regular relation.
Some languages have a very simple numeral system, some are more complicated: seventy-three, soixante-treize, drei-und-siebzig
We can compile transducers that map between the numbers and the corresponding numerals.
Number-to-Numeral transducer
Generation: 105 -> hundred five, hundred and five, one hundred and five
Analysis: hundred five -> 105
The Goal Ahead: Finnish
Analysis
sadanviiden
105+Sg+Gen
hundred and five (Sg Gen)
Generation
28+Ord+Pl+Gen
kahdensienkymmenensienkahdeksansien
twenty-eighth (Pl Gen)
Finnish Numerals
Compound numerals are written as one word:
2 • 1000 + 5 • 100 + 3 • 10 + 1 = 2531
kaksituhattaviisisataakolmekymmentäyksi
Express ordinality, number, and case:
sata+Sg+Nom (100)   sata      sata+Ord+Sg+Nom (100th)   sadas
sata+Sg+Gen (100)   sadan     sata+Ord+Sg+Gen (100th)   sadannen
sata+Pl+Gen (100)   satojen   sata+Ord+Pl+Gen (100th)   sadansien
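The decomposition 2 • 1000 + 5 • 100 + 3 • 10 + 1 can be sketched procedurally for the simplest case, the singular nominative cardinal. This is a minimal Python sketch, not the finite-state solution developed in these slides. The function name and the 1 to 9999 range are assumptions, and the -toista forms for 11 to 19 come from standard Finnish rather than from the slides.

```python
UNITS = ["", "yksi", "kaksi", "kolme", "neljä", "viisi",
         "kuusi", "seitsemän", "kahdeksan", "yhdeksän"]

# (value, form used for exactly one unit, partitive form used after a multiplier)
POWERS = [(1000, "tuhat", "tuhatta"),
          (100, "sata", "sataa"),
          (10, "kymmenen", "kymmentä")]


def finnish_cardinal(n):
    """Singular nominative cardinal for 1..9999, written as one word."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch covers 1..9999 only")
    parts = []
    for value, nominative, partitive in POWERS:
        count, n = divmod(n, value)
        if value == 10 and count == 1 and n > 0:
            parts.append(UNITS[n] + "toista")   # 11..19
            n = 0
        elif count == 1:
            parts.append(nominative)            # tuhat, sata, kymmenen
        elif count > 1:
            parts.append(UNITS[count] + partitive)
    if n:
        parts.append(UNITS[n])
    return "".join(parts)


print(finnish_cardinal(2531))  # kaksituhattaviisisataakolmekymmentäyksi
```

The sketch handles only one cell of the paradigm. Covering ordinals, plural, and all the cases with agreement across the parts is what motivates the lexicon-plus-rules approach in the following slides.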
Singular vs. Plural
Numerals generally occur with singular nouns
kaksi+Sg+Gen kenkä+Sg+Gen
kahden kengän omistaja
(owner of two shoes)
Sets and public events may be in plural
kaksi+Pl+Gen kenkä+Pl+Gen
kaksien kenkien omistaja
(owner of two pairs of shoes)
kolme+Ord+Pl+Nom olympialainen+Pl+Nom
kolmannet olympialaiset
(third olympic games)
yksi+Pl+Nom hää+Pl+Nom
yhdet häät
(one wedding)
Morphotactics
All parts of compound numerals agree in all respects
two thousand five hundred (2500)
kaksi+Sg+Gen tuhat+Sg+Gen viisi+Sg+Gen sata+Sg+Gen
kahden tuhannen viiden sadan
two ten eighth (28th)
kaksi+Ord+Pl+Gen kymmenen+Ord+Pl+Gen kahdeksan+Ord+Pl+Gen
kahde ns i en kymmene ns i en kahdeksa ns i en
Singular nominative is exceptional
Numeral with a noun
kaksi+Gen kenkä+Gen
kahden kengän (two shoes)
kaksi+Nom kenkä+Part
kaksi kenkää (two shoes)
Compound numeral
kaksi+Gen tuhat+Gen viisi+Gen sata+Gen kolme+Gen (2503)
kahden tuhannen viiden sadan kolmen
kaksi+Nom tuhat+Part viisi+Nom sata+Part kolme+Nom (2503)
(kaksi • tuhatta) + (viisi • sataa) + kolme
Morphological Alternations
Semiregular stem alternations
yksi+Sg+Nom : yksi (one)
yksi+Sg+Ess : yhtenä
yksi+Sg+Gen : yhden
yksi+Sg+Part : yhtä
yksi+Pl+Gen : yksien
Irregular stem alternations
yksi+Ord+Sg+Nom : ensimmäinen (first)
Regular suffix alternations
Vowel harmony
kolme+Sg+Part : kolmea vs. neljä+Sg+Part : neljää
Illative vowel
kolme+Sg+Ill : kolmeen vs. neljä+Sg+Ill : neljään
Partitive t
yksi+Sg+Part : yhtä vs. neljä+Sg+Part : neljää
Solution for Finnish
Maps a number with morphological tags into an inflected Finnish numeral. Encodes morphotactic constraints.
Numbers/Finnish Transducer
lexc source lexicon
.o.
Looping lexicon with all the forms of all Finnish single numerals concatenated in all possible ways. Composed with morphophonological rules.
Example
Numbers/Finnish Transducer
2 5 +Ord +Pl +Gen
kaksi +Ord +Pl +Gen kymmenen +Ord +Pl +Gen viisi +Ord +Pl +Gen
lexc source lexicon
.o.
kaksi +Pl +Nom kymmenen +Part VIISI +Ord +Gen
kahdet kymmentä viidennen (ungrammatical)
kaksi +Ord +Pl +Gen kymmenen +Ord +Pl +Gen viisi +Ord +Pl +Gen
kahdensien kymmenensien viidensien
Sublexicon for One
LEXICON Yksi
YKSI+Sg:yksi Nom; ! singular nominative
YKSI+Sg:yhde WeakGrade; ! weak stem (most cases)
YKSI+Sg:yhte StrongGrade; ! strong stem (essive, ill.)
YKSI+Sg:yht Par; ! partitive stem
YKSI:yks PlStem1; ! plural stem
YKSI+Ord1+Sg:ensimmäinen Nom; ! singular nominative
YKSI+Ord1+Sg:ensimmäise AnyGrade; ! weak/strong stem
YKSI+Ord1+Sg:ensimmäis Par; ! partitive stem
YKSI+Ord+Sg:yhdes Nom; ! singular nominative
YKSI+Ord+Sg:yhdenne WeakGrade; ! weak stem
YKSI+Ord+Sg:yhdente StrongGrade; ! strong stem
YKSI+Ord+Sg:yhdet Par; ! partitive stem
YKSI+Ord:yhdens PlStem1; ! plural stem
Some sublexicons
LEXICON WeakGrade
SgGen; ! Singular Genitive
PlNom; ! Plural Nominative
InvarWeak; ! Invariant (plural and singular) cases
LEXICON InvarWeak
+Tra:ksi Next; ! Translative “into”
+Ine:ssA Next; ! Inessive “in”
+Ela:ltA Next; ! Elative “from” (inside)
+Ade:llA Next; ! Adessive “on”
+Abl:ltA Next; ! Ablative “from” (outside)
+All:lle Next; ! Allative “onto”
+Abe:ttA Next; ! Abessive “without”
Sample paths for Two
kaksi+Sg+Nom   kaksi+Sg+Gen   kaksi+Sg+Ess
kaksi          kahde n        kahte na
kaksi+Sg+Par   kaksi+Pl+Gen   kaksi+Pl+Ill
kah TA         kaks i en      kaks i Vn
kaksi+Ord+Sg+Nom   kaksi+Ord1+Sg+Nom
kahde s            toinen
kaksi+Ord+Sg+Ill   kaksi+Ord1+Sg+Ill
kahde nte Vn       toise Vn
Morphophonological rules
define BackV [a | o | u];
define FrontV [ä | ö | y];
define Vow [BackV | FrontV | i | e];
define VHarmony [A -> a || BackV ~$[FrontV] _
                 .o.
                 A -> ä];
define IllativeV [V -> a || a (h) _ ,
                  V -> e || e (h) _ , … ]
define PartitiveT [T -> 0 || \Vow Vow _ ];
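The effect of the three rules on lexical strings such as kolmeTA or neljäVn can be imitated with ordinary string operations. This is a minimal Python sketch under stated assumptions, not a compilation of the xfst rules. The function names are hypothetical; the illative rule is assumed to copy whichever vowel precedes, since the slide shows only the a and e cases; the order of application and the final step that realizes a surviving T as t are also assumptions.

```python
import re

BACK = "aou"
FRONT = "äöy"
VOW = BACK + FRONT + "ie"


def partitive_t(s):
    # T -> 0 || \Vow Vow _ : drop T after a non-vowel followed by a vowel
    return re.sub(rf"(?<=[^{VOW}][{VOW}])T", "", s)


def vowel_harmony(s):
    # A -> a if a back vowel precedes with no front vowel in between, else ä
    out = []
    for i, ch in enumerate(s):
        if ch != "A":
            out.append(ch)
            continue
        harmonic = [c for c in s[:i] if c in BACK + FRONT]
        out.append("a" if harmonic and harmonic[-1] in BACK else "ä")
    return "".join(out)


def illative_v(s):
    # V copies the preceding vowel, optionally across h (assumed general pattern)
    return re.sub(rf"([{VOW}])(h?)V", r"\1\2\1", s)


def realize(s):
    s = partitive_t(s)
    s = vowel_harmony(s)
    s = illative_v(s)
    return s.replace("T", "t")  # assumption: a surviving T surfaces as t


for lexical in ["kolmeTA", "neljäTA", "kahTA", "kolmeVn", "neljäVn"]:
    print(lexical, "->", realize(lexical))
```

In the finite-state setting the rules are compiled into transducers and composed with the lexicon, so no procedural ordering of this kind is needed at runtime.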
Example again
Numbers/Finnish Transducer
2 5 +Ord +Pl +Gen
KAKSI +Ord +Pl +Gen KYMMENEN +Ord +Pl +Gen VIISI +Ord +Pl +Gen
lexc source lexicon
.o.
morpho-phonological rules
.o.
KAKSI +Pl +Nom KYMMENEN +Part VIISI +Ord +Gen (ungrammatical)
kahdet kymmentä viidennen
KAKSI +Ord +Pl +Gen KYMMENEN +Ord +Pl +Gen VIISI +Ord +Pl +Gen
kahdensien kymmenensien viidensien
Remaining problems
Special ordinals for yksi (one), kaksi (two)
ensimmäinen (1st) vs. kahdeskymmenesyhdes (21st)
Compose the lexicon with an appropriate filter to eliminate unwanted variants.
No internal tags
2+Sg+Gen00+Sg+Gen
Delete them: 0 <- Tag || _ $[\Tag Tag+] .#. ;
Singular nominative as partitive in compounds
%+Nom -> %+Par // %+Sg %+Nom ~$Tag %+Sg _ ;
Ordinal/Plural/Case agreement
Flag diacritics!
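The tag-deletion rule 0 <- Tag || _ $[\Tag Tag+] .#. removes every tag that is still followed, somewhere to its right, by a non-tag symbol and then one or more tags. In other words only the final run of tags survives. The following is a minimal Python sketch of that effect on a list of symbols. The function names are hypothetical, and treating any symbol that starts with "+" as a tag is an assumption made for illustration.

```python
def is_tag(sym):
    # assumption for this sketch: tags are the symbols that start with "+"
    return sym.startswith("+")


def delete_internal_tags(symbols):
    """Drop each tag that has a later non-tag symbol immediately followed by a tag."""
    last_boundary = -1  # index of the last non-tag that is directly followed by a tag
    for j in range(len(symbols) - 1):
        if not is_tag(symbols[j]) and is_tag(symbols[j + 1]):
            last_boundary = j
    return [s for i, s in enumerate(symbols)
            if not (is_tag(s) and i < last_boundary)]


# 2+Sg+Gen 00+Sg+Gen keeps only the final tags: 200+Sg+Gen
print(delete_internal_tags(["2", "+Sg", "+Gen", "0", "0", "+Sg", "+Gen"]))
```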
Flags for Finnish numerals
@U.Type.Card@ @U.Type.Ord@
@U.Number.Sg@ @U.Number.Pl@
@U.Case.Nom@ @U.Case.Gen@ @U.Case.Par@ @U.Case.Tra@
@U.Case.Ess@ @U.Case.Abe@ @U.Case.Ine@ @U.Case.Ela@
@U.Case.Ill@ @U.Case.Ade@ @U.Case.Abl@ @U.Case.All@
@U.Case.Com@ @U.Case.Ins@
3 00 +Sg +Gen @U.Type.Card@ @U.Num.Sg@ @U.Case.Gen@ @U.Type.Card@ @U.Num.Sg@ @U.Case.Gen@
k o lmen s a dan
300+Sg+Gen
kolmensadan
Conclusion
Mapping from numbers to numerals can be done in a simple and elegant way even for languages with complex morphology.
Necessary for text-to-speech applications.
Tervetuloa kahdensienkymmenensienkahdeksansien olympialaisten avajaisiin!
Welcome to the opening ceremonies of the 28th Olympic Games!
Demo!