Human Language Technology
Finite State Transducers
October 2009 HLT: finite state transducers 2
Acknowledgement
• Material in this lecture is derived/copied in part from:
  – Richard Sproat, CL46 lectures
  – Lauri Karttunen, LSA lectures 2005
  – Shuly Wintner, 2008, Malta
Three Key Concepts
• Finite State Transducers
• Regular Relations
• Computational Morphology
A Regular Set
L1 = {ab, abab, ababab, ...}
Two Regular Sets
L1 = {ab, abab, ababab, ...}
L2 = {ba, baba, bababa, ...}
A Regular Relation L1 x L2
L1 = {ab, abab, ababab, ...}
L2 = {ba, baba, bababa, ...}
or, as a set of pairs: {("ab","ba"), ("abab","baba"), ...}
Some closure properties for regular relations
• Concatenation: [R1 R2]
• Power: R^n
• Reversal
• Inversion: R^-1
• Composition: R1 ○ R2
Concatenation and Power
• Concatenation
  R1 = {("a","b")}
  R2 = {("c","d")}
  [R1 R2] = {("ac","bd")}
• Power
  R1+ = {("a","b"), ("aa","bb"), ("aaa","bbb"), ...}
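The two operations above can be sketched on finite relations stored as Python sets of string pairs. This is only an illustration under that assumption; the function names `concat` and `power` are made up here and are not xfst commands.

```python
from itertools import product

def concat(r1, r2):
    """Concatenate two relations: pair (x1, y1) followed by (x2, y2) gives (x1x2, y1y2)."""
    return {(x1 + x2, y1 + y2) for (x1, y1), (x2, y2) in product(r1, r2)}

def power(r, n):
    """R^n for n >= 1: the relation r concatenated with itself n times."""
    result = r
    for _ in range(n - 1):
        result = concat(result, r)
    return result

R1 = {("a", "b")}
R2 = {("c", "d")}

print(concat(R1, R2))   # the single pair ("ac", "bd")
print(power(R1, 3))     # the single pair ("aaa", "bbb")
```

R1+ is then the union of `power(R1, n)` for all n >= 1, which is infinite and so cannot be listed as a finite set.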
Composition
• R1 ○ R2 denotes the composition of relations R1 and R2.
• Definition: if R1 contains <x,y> and R2 contains <y,z>, then R1 ○ R2 contains <x,z>.
• R1 and R2 must be relations. If either is just a language, it is assumed to abbreviate the identity relation on that language.
• R1 ○ R2 is written [R1 .o. R2] in xfst
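A minimal sketch of this definition for finite relations held as Python sets of pairs. The helper names `compose` and `identity` are illustrative assumptions, not part of xfst.

```python
def compose(r1, r2):
    """R1 o R2: contains (x, z) whenever r1 has (x, y) and r2 has (y, z)."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def identity(language):
    """Treat a language (set of strings) as the identity relation on it."""
    return {(w, w) for w in language}

R1 = {("a", "b"), ("aa", "bb")}
R2 = {("b", "c")}

print(compose(R1, R2))               # only "a" survives: ("a", "c")
print(compose(identity({"a"}), R1))  # a language used as a relation: ("a", "b")
```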
Closure Properties of Regular Languages and Relations
Operation         Regular Languages   Regular Relations
Union             yes                 yes
Concatenation     yes                 yes
Iteration         yes                 yes
Intersection      yes                 no
Subtraction       yes                 no
Complementation   yes                 no
Composition       n/a                 yes
Morphology as a Regular Relation
surface language: cat, cats, mice, lives, ...
lexical language: cat, cat+N+PL, mouse+N+PL, life+N+PL, live+V+3SING, ...
or, as a set of pairs: {("cat","cat"), ("cats","cat+N+PL"), ...}
Part-of-Speech Tagging
• I know some new tricks
  PRON V DET ADJ N
• said the Cat in the Hat
  V DET N P DET N
Singular-to-plural mapping:
• Singular: cat hat ox child mouse sheep
• Plural: cats hats oxen children mice sheep
Three Key Concepts
• Finite State Transducers
• Regular Relations
• Computational Morphology
FSA
[Figure: a finite state automaton with an arc labelled a]
Used for:
• Recognition
• Generation
Finite State Transducers
• A finite state transducer (FST) is essentially a finite state automaton (FSA) that works on two (or more) tapes.
• The most common way to think about transducers is as a kind of translating machine which works by reading from one tape and writing onto the other.
FST Definition
• A 2-way FST is a quintuple (K, s, F, Σi × Σo, δ) where
  – Σi, Σo are the input and output alphabets
  – K is a finite set of states
  – s ∈ K is the initial state
  – F ⊆ K is the set of final states
  – δ is a transition relation of type K × Σi × Σo × K
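One way to hold such a quintuple in code, sketched under simplifying assumptions: the class name `FST` and its field names are invented for this example, the alphabets are left implicit in the transitions, and the `accepts` method only handles transducers without ε-transitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FST:
    states: frozenset       # K
    start: object           # s, a member of K
    finals: frozenset       # F, a subset of K
    transitions: frozenset  # tuples (state, input_symbol, output_symbol, next_state)

    def accepts(self, upper, lower):
        """Recognition of a string pair, assuming no epsilon transitions."""
        if len(upper) != len(lower):
            return False
        current = {self.start}
        for i_sym, o_sym in zip(upper, lower):
            current = {q2 for (q1, a, b, q2) in self.transitions
                       if q1 in current and a == i_sym and b == o_sym}
        return bool(current & self.finals)

# A one-state machine that loops on the symbol pair a:b.
AB_LOOP = FST(states=frozenset({0}), start=0, finals=frozenset({0}),
              transitions=frozenset({(0, "a", "b", 0)}))

print(AB_LOOP.accepts("aa", "bb"))
print(AB_LOOP.accepts("aa", "ba"))
```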
FST
[Figure: a finite state transducer with an arc labelled a on the upper tape and b on the lower tape]
Used for:
• Recognition
• Generation
• Translation
A Very Simple Transducer
[Figure: a transducer with a single arc, a above b]
• Relation: { ("a","b") }
• Notation: a:b encodes the transition
A Very Simple Transducer
[Figure: the arc with a above b, also written as a:b]
A Very Simple Transducer
[Figure: the arc a:b, where a is the upper side and b is the lower side]
Symbol Pairs
• Symbols vs. symbol pairs
  – In general, no distinction is made in xfst between
      a     the language {"a"}
      a:a   the identity relation {("a","a")}
A (more interesting) Transducer
• Relation: { ("a","b"), ("aa","bb"), ... }
• Notation: a:b*
• N.B. with this notation a and b must be single symbols
[Figure: a single state with a loop labelled a:b]
Transducers Have Several Modes of Operation
• generation mode: It writes on both tapes: a string of a's on one tape and a string of b's on the other. Both strings have the same length.
• recognition mode: It accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's.
• translation mode (left to right): It reads a's from the first tape and writes a b onto the second tape for every a that it reads.
• translation mode (right to left): It reads b's from the second tape and writes an a onto the first tape for every b that it reads.
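The recognition and translation modes can be sketched for the a:b loop machine as follows. The transition table layout and the names `translate` and `recognize` are assumptions made for this example; generation mode is left out, and ε-transitions are not handled.

```python
# Transitions as (state, upper_symbol, lower_symbol, next_state).
TRANSITIONS = {(0, "a", "b", 0)}
START, FINALS = 0, {0}

def translate(word, reverse=False):
    """Translate upper to lower (or lower to upper if reverse); returns all outputs."""
    configs = {(START, "")}          # (current state, output written so far)
    for sym in word:
        next_configs = set()
        for state, out in configs:
            for (q1, upper, lower, q2) in TRANSITIONS:
                src, dst = (lower, upper) if reverse else (upper, lower)
                if q1 == state and src == sym:
                    next_configs.add((q2, out + dst))
        configs = next_configs
    return {out for state, out in configs if state in FINALS}

def recognize(upper, lower):
    """Recognition mode: is the pair (upper, lower) in the relation?"""
    return lower in translate(upper)

print(translate("aaa"))                 # left to right
print(translate("bb", reverse=True))    # right to left
print(recognize("aa", "bb"))
```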
The Basic Idea
• Morphology is regular
• Morphology is finite state
Morphology is Regular
• The relation between the surface forms of a language and the corresponding lexical forms can be described as a regular relation, e.g.
{ ("leaf+N+Pl","leaves"),("hang+V+Past","hung"),...}
• Regular relations are closed under operations such as concatenation, iteration, union, and composition.
• Complex regular relations can be derived from simpler relations.
Morphology is finite-state
• A regular relation can be defined using the metalanguage of regular expressions.
    [{talk} | {walk} | {work}]
    [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
• A regular expression can be compiled into a finite-state transducer that implements the relation computationally.
Compilation
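Regular expression:
    [{talk} | {walk} | {work}]
    [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
[Figure: the finite-state transducer compiled from the regular expression. From the initial state, paths spell the stems talk, walk and work; the suffix arcs are labelled +Base:0, +3rdSg:s, +Progr:i 0:n 0:g and +Past:e 0:d and lead to the final state.]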
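Generation: work+3rdSg --> works
[Figure: the same transducer with the stem arcs written as identity pairs (t:t, a:a, l:l, k:k, w:w, o:o, r:r) and the suffix arcs +Base:0, +3rdSg:s, +Progr:i 0:n 0:g, +Past:e 0:d]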
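Analysis: talked --> talk+Past
[Figure: the same transducer with the stem arcs written as identity pairs (t:t, a:a, l:l, k:k, w:w, o:o, r:r) and the suffix arcs +Base:0, +3rdSg:s, +Progr:i 0:n 0:g, +Past:e 0:d]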
XFST Demo 2
Start xfst:
    % xfst
    xfst[0]:
Compile a regular expression:
    xfst[0]: regex
    [{talk} | {walk} | {work}]
    [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
Apply the result:
    xfst[1]: apply up walked
    walk+Past
    xfst[1]: apply down talk+SgGen3
    talks
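The behaviour of the demo can be imitated in Python by listing the finite relation directly. This is only a sketch of what the relation contains, not of how xfst compiles or stores it; `apply_up` and `apply_down` are plain functions named after the xfst commands.

```python
STEMS = ["talk", "walk", "work"]
# Lexical tag on the upper side, surface ending on the lower side.
SUFFIXES = {"+Base": "", "+SgGen3": "s", "+Progr": "ing", "+Past": "ed"}

# The relation as a set of (lexical form, surface form) pairs.
RELATION = {(stem + tag, stem + ending)
            for stem in STEMS
            for tag, ending in SUFFIXES.items()}

def apply_down(lexical):
    """Generation: lexical form to surface form(s)."""
    return sorted(surface for lex, surface in RELATION if lex == lexical)

def apply_up(surface):
    """Analysis: surface form to lexical form(s)."""
    return sorted(lex for lex, surf in RELATION if surf == surface)

print(apply_up("walked"))
print(apply_down("talk+SgGen3"))
```

The relation has 3 stems times 4 suffixes, so 12 pairs; a compiled transducer shares the suffix arcs between the stems instead of listing every pair.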
Lexical transducer
[Figure: a finite-state transducer path pairing the citation form and inflection codes "v o u l o i r +IndP +SG +P3" with the inflected form "v e u t"]
• Bidirectional: generation or analysis
• Compact and fast
• Comprehensive systems have been built for over 40 languages:
  – English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, Korean, Basque, Greek, Arabic, Hebrew, Bulgarian, …
How lexical transducers are made
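[Figure: a lexicon regular expression (morphotactics) and rule regular expressions (alternations) go through the compiler to give a lexicon FST and rule FSTs, which are combined by composition into a lexical transducer (a single FST). Example path: "f a t +Adj +Comp" paired with "f a t t e r".]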
Sequential Model
[Figure: a cascade fst 1, fst 2, ..., fst n relating the lexical form to the surface form through intermediate forms]
• An ordered sequence of rewrite rules (Chomsky & Halle '68) can be modeled by a cascade of finite-state transducers (Johnson '72, Kaplan & Kay '81).
Parallel Model
• A set of parallel two-level rules (constraints), compiled into finite-state automata and interpreted as transducers (Koskenniemi '83).
[Figure: fst 1, fst 2, ..., fst n applied side by side between the lexical form and the surface form]
Sequential vs. Parallel rules
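[Figure: two ways of arriving at a single FST. Sequential rules (Chomsky & Halle 1968): rule 1, ..., rule n relate the lexical form to the surface form through intermediate forms and are combined with compose. Parallel rules (Koskenniemi 1983): rule 1, rule 2, ..., rule n all apply between the lexical form and the surface form and are combined with intersect.]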
Sequential vs. Parallel Rules
• Sequential rules are combined by means of composition.
  – Advantage: FSTs are closed under composition.
  – Disadvantage: the result is sensitive to the order in which the rules are applied.
• Parallel rules are combined by means of intersection.
  – In general, FSTs are not closed under intersection.
  – … but FSTs without ε-transitions are closed under intersection.
Crossproduct
• A .x. B The relation that maps every string in A to every string in B, and vice versa
• A:B Same as [A .x. B].
[Figure: a path with arcs a:x b:y c:0]
Three ways of writing the same relation:
    a b c .x. x y
    [a b c] : [x y]
    {abc}:{xy}
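A small sketch of the crossproduct on finite languages, together with the symbol-by-symbol alignment shown in the figure. The function names and the left-to-right alignment with `0` padding are assumptions made for illustration.

```python
from itertools import product, zip_longest

def crossproduct(upper_language, lower_language):
    """A .x. B: pair every string in A with every string in B."""
    return set(product(upper_language, lower_language))

def align(upper, lower, epsilon="0"):
    """Pair symbols left to right, padding the shorter string with epsilon."""
    return [f"{u}:{l}" for u, l in zip_longest(upper, lower, fillvalue=epsilon)]

print(crossproduct({"abc"}, {"xy"}))   # the single pair ("abc", "xy")
print(align("abc", "xy"))              # ['a:x', 'b:y', 'c:0']
```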
Composition
• A .o. B The relation C such that if A maps x to y and B maps y to z, C maps x to z.
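{abc} .o. [a:A | b:B | c:C | d:D]*
[Figure: the automaton for a b c, composed with a one-state transducer looping on a:A, b:B, c:C and d:D, gives the path a:A b:B c:C]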
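The example can be traced in Python by treating the language {abc} as an identity relation and the starred transducer as a symbol-by-symbol mapping. The names `star_map` and `compose_language` are illustrative only.

```python
# The symbol pairs of [a:A | b:B | c:C | d:D].
PAIRS = {"a": "A", "b": "B", "c": "C", "d": "D"}

def star_map(word):
    """Apply the starred transducer; None if some symbol has no pair."""
    if all(sym in PAIRS for sym in word):
        return "".join(PAIRS[sym] for sym in word)
    return None

def compose_language(language):
    """Language .o. transducer: keep only the strings the transducer accepts."""
    return {(w, star_map(w)) for w in language if star_map(w) is not None}

print(compose_language({"abc"}))   # the single pair ("abc", "ABC")
```

The d:D arc plays no part in the result because no string in {abc} contains d.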
Transducers are not closed under intersection
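[Figure: two transducers. T1 uses arcs c:a and ε:b; T2 uses arcs ε:a and c:b.]
T1(c^n) = { a^n b^m | m ≥ 0 }
T2(c^n) = { a^m b^n | m ≥ 0 }
(T1 ∩ T2)(c^n) = { a^n b^n }
The lower language of the intersection, { a^n b^n }, is not regular, so T1 ∩ T2 is not a regular relation.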
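The counterexample can be checked on a bounded fragment of the two relations. The bound and the function names are assumptions for this sketch; the real relations are infinite.

```python
def t1(n, max_m):
    """Pairs (c^n, a^n b^m) for m up to max_m."""
    return {("c" * n, "a" * n + "b" * m) for m in range(max_m + 1)}

def t2(n, max_m):
    """Pairs (c^n, a^m b^n) for m up to max_m."""
    return {("c" * n, "a" * m + "b" * n) for m in range(max_m + 1)}

def intersection(max_n):
    """Pairs present in both relations, for n up to max_n."""
    result = set()
    for n in range(max_n + 1):
        result |= t1(n, max_n) & t2(n, max_n)
    return result

for upper, lower in sorted(intersection(3)):
    print(repr(upper), "->", repr(lower))
```

Every pair that survives has the form (c^n, a^n b^n), which requires counting and is therefore beyond a finite-state device.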
Xerox RE Operators
• $   containment
• =>  restriction
• ->  replacement
  – These make it easier to describe complex languages and relations without extending the formal power of finite-state systems.
Containment
$a
Equivalent expression: [?* a ?*]
[Figure: the automaton for $a, with arcs labelled a and ?]
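For comparison, the same language can be tested in Python, either with a substring check or with a regular expression that mirrors [?* a ?*]. This uses Python's `re` module, not xfst, and the function names are made up.

```python
import re

# Python counterpart of [?* a ?*]: anything, then a, then anything.
CONTAINS_A = re.compile(r".*a.*", re.DOTALL)

def in_contains_a(word):
    """True if word belongs to the language $a."""
    return CONTAINS_A.fullmatch(word) is not None

def contains(symbol, word):
    """The same test written as a plain substring check."""
    return symbol in word

for w in ["banana", "xyz", "a", ""]:
    print(repr(w), in_contains_a(w), contains("a", w))
```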
Restriction
a => b _ c
"Any a must be preceded by b and followed by c."
Equivalent expression: ~[~[?* b] a ?*] & ~[?* a ~[c ?*]]
[Figure: the automaton for the restriction, with arcs labelled a, b, c and ?]
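The restriction describes a language, so it can be sketched as a membership test. The function below assumes single-symbol target and contexts, as in the example; its name and parameters are hypothetical.

```python
def satisfies_restriction(word, target="a", left="b", right="c"):
    """True if every occurrence of target is preceded by left and followed by right."""
    for i, sym in enumerate(word):
        if sym != target:
            continue
        preceded = i > 0 and word[i - 1] == left
        followed = i + 1 < len(word) and word[i + 1] == right
        if not (preceded and followed):
            return False
    return True

for w in ["bac", "xbacx", "ac", "ba", "xyz"]:
    print(w, satisfies_restriction(w))
```

Strings without any a, such as "xyz", are in the language: the rule only constrains where a may occur.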
Replacement
a b -> b a
"Replace 'ab' by 'ba'."
Equivalent expression: [[~$[a b] [[a b] .x. [b a]]]* ~$[a b]]
[Figure: the transducer for the replacement, with arcs labelled a, b, ?, a:b and b:a]
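In the downward direction, this particular rule behaves like a single left-to-right string replacement, which Python's `str.replace` can imitate. This is a sketch of the input-output behaviour only; the transducer is also usable in the upward direction, which the function below does not model.

```python
def replace_down(word):
    """Replace every occurrence of 'ab' by 'ba' in one pass.

    The output is not rescanned, so an 'ab' created by a replacement
    is left alone.
    """
    return word.replace("ab", "ba")

for w in ["ab", "abab", "aab", "xyz"]:
    print(w, "->", replace_down(w))
```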
Replacement + Marking
a|e|i|o|u -> %[ ... %]
p o t a t o
p[o]t[a]t[o]
[Figure: the marking transducer, with arcs 0:[ and 0:] inserted around the vowels a, e, i, o, u, and ? for any other symbol]
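The marking rule corresponds to a regular-expression substitution that re-inserts the matched text between brackets. The sketch below uses Python's `re.sub`; the function name is invented.

```python
import re

def mark_vowels(word):
    """Wrap every vowel in square brackets, leaving other symbols unchanged."""
    return re.sub(r"[aeiou]", r"[\g<0>]", word)

print(mark_vowels("potato"))   # p[o]t[a]t[o]
```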
Conditional Replacement
The relation that replaces A by B between L and R, leaving everything else unchanged.
    A -> B    replacement
    L _ R     context
Sequential application
N -> m / _ p
p -> m / m _
k a N p a n
k a m p a n
k a m m a n
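The two rules can be tried out with Python regular expressions, using a lookahead for the right context and a lookbehind for the left context. This is a sketch of the rules' effect on strings, not of the transducers themselves; the function names are made up.

```python
import re

def nasal_assimilation(form):
    """N -> m / _ p : N becomes m before p."""
    return re.sub(r"N(?=p)", "m", form)

def p_to_m(form):
    """p -> m / m _ : p becomes m after m."""
    return re.sub(r"(?<=m)p", "m", form)

lexical = "kaNpan"
intermediate = nasal_assimilation(lexical)
surface = p_to_m(intermediate)
print(lexical, "->", intermediate, "->", surface)

# Applying the rules in the other order gives a different result.
print(nasal_assimilation(p_to_m(lexical)))
```

The second print shows why rule order matters: with the rules swapped, the p is not yet preceded by m when the second rule is tried, so the output stays at kampan.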
Sequential application in detail
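[Figure: the two rule transducers. The first (N -> m / _ p) has states 0, 1, 2 and arcs labelled N:m, N, m, p and ?; the second (p -> m / m _) has states 0, 1 and arcs labelled p:m, m, p and ?.]
k a N p a n
k a m p a n
k a m m a n
State sequence in the first transducer:  0 0 0 2 0 0 0
State sequence in the second transducer: 0 0 0 1 0 0 0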
Composition
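[Figure: the single transducer obtained by composing the two rules, with states 0 to 3 and arcs labelled N:m, N, m, p, p:m and ?]
k a N p a n
k a m m a n
State sequence: 0 0 0 3 0 0 0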
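The effect of composing the two rules (N -> m / _ p, then p -> m / m _) can be sketched as a single pass over the lexical string with no intermediate form. The code below is a hand-written illustration of that idea, not the automaton in the figure, and it is compared against applying the rules one after the other.

```python
import re
from itertools import product

def sequential(form):
    """Apply N -> m / _ p, then p -> m / m _, via an intermediate string."""
    intermediate = re.sub(r"N(?=p)", "m", form)
    return re.sub(r"(?<=m)p", "m", intermediate)

def composed(form):
    """One pass, no intermediate string: decide each output symbol directly."""
    out = []
    for i, sym in enumerate(form):
        prev = form[i - 1] if i > 0 else ""
        nxt = form[i + 1] if i + 1 < len(form) else ""
        if sym == "N" and nxt == "p":
            out.append("m")                  # N before p surfaces as m
        elif sym == "p" and prev in ("m", "N"):
            out.append("m")                  # p after m, or after an N that became m
        else:
            out.append(sym)
    return "".join(out)

print(composed("kaNpan"))   # kamman

# Brute-force check on all short strings over a small alphabet.
ALPHABET = "kaNpmn"
ALL_SHORT = ["".join(t) for n in range(5) for t in product(ALPHABET, repeat=n)]
print(all(composed(w) == sequential(w) for w in ALL_SHORT))
```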