Chapter 3 Regular Expressions and Languages
Giza Pyramids, Egypt
Outline
3.1 Regular Expressions
3.2 Finite Automata & Regular Expressions
3.3 Applications of RE’s
3.4 Algebraic Laws for RE’s
3.1 Regular Expressions
Use of regular expressions
– A regular expression is a kind of generator for languages.
– It offers a “declarative” way of expressing strings of symbols.
– It defines all and only the regular languages (a theorem).
Applications of regular expressions
– Used as commands for finding strings in Web browsers or text-formatting systems (such as the UNIX grep command).
– Used in lexical-analyzer generators (such as Lex or Flex). A lexical analyzer breaks a source program into “tokens” (keywords, identifiers, signs, …).
– The grep command searches files or standard input globally for lines matching a given regular expression, and prints them to the program’s standard output.
Operators of regular expressions
– Review of three operations on languages $L$ and $M$:
  Union: $L \cup M = \{x \mid x \in L \text{ or } x \in M\}$
  Concatenation: $LM = \{xy \mid x \in L,\ y \in M\}$
  – Example (powers of $L$): $L^0 = \{\epsilon\}$, $L^1 = L$, $L^2 = LL$, …
  Closure (or star, or Kleene closure): $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$
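As a quick illustration, the three operations can be tried on small finite languages. This Python sketch is a minimal one under our own assumptions: a language is a Python set of strings, the empty string "" plays the role of $\epsilon$, and the function names are hypothetical. Because $L^*$ is usually infinite, `star_upto` returns only its strings of length at most `max_len`.

```python
def union(L, M):
    """L ∪ M: the strings that are in L or in M."""
    return L | M

def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}

def power(L, i):
    """L^i, starting from L^0 = {""}, the set holding only the empty string."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def star_upto(L, max_len):
    """The strings of L* = L^0 ∪ L^1 ∪ L^2 ∪ ... of length at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more piece from L; keep only new, short ones.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

print(sorted(power({"0", "1"}, 2)))      # L^2 for L = {0, 1}
print(sorted(star_upto({"0", "1"}, 2)))  # the short strings of {0, 1}*
print(star_upto(set(), 5))               # the closure of the empty language
```

The last call returns {''}, which agrees with $\emptyset^* = \{\epsilon\}$ in Example 3.1.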
Example 3.1 (languages)
– $\emptyset^* = \{\epsilon\}$, because $\emptyset^0 = \{\epsilon\}$ and $\emptyset^i = \emptyset$ for every $i \ge 1$.
– If $L = \{0, 1\}$, then $L^0 = \{\epsilon\}$, $L^1 = L$, $L^2 = \{00, 01, 10, 11\}$, …
– If $L$ is the set of all strings of 0’s (including $\epsilon$), then it can be proved that $L^*$ is $L$ itself (see the textbook for the proof).
3.1.2 Building Regular Expressions
– Recursive definition of a regular expression (RE) $E$ and the language it defines, $L(E)$:
  Basis:
  – The constants $\boldsymbol{\epsilon}$ and $\boldsymbol{\emptyset}$ are RE’s, defining the languages $\{\epsilon\}$ and $\emptyset$, respectively: $L(\boldsymbol{\epsilon}) = \{\epsilon\}$, $L(\boldsymbol{\emptyset}) = \emptyset$.
  – If $a$ is a symbol, then $\mathbf{a}$ is an RE, defining the language $\{a\}$: $L(\mathbf{a}) = \{a\}$. (Note: the RE $\mathbf{a}$ is in boldface.)
  – A variable like $L$ (capitalized and italic) represents any language.
– Recursive definition of an RE (cont’d):
  Induction: given two RE’s $E$ and $F$,
  – $E + F$ is an RE such that $L(E + F) = L(E) \cup L(F)$ (union);
  – $EF$ is an RE such that $L(EF) = L(E)L(F)$ (concatenation);
  – $E^*$ is an RE such that $L(E^*) = (L(E))^*$ (closure);
  – $(E)$ is an RE such that $L((E)) = L(E)$ (parenthesization).
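The recursive definition translates directly into a recursive function on RE syntax trees. This is a sketch only: the class names (`Empty`, `Eps`, `Sym`, `Union`, `Concat`, `Star`) and the function `lang` are hypothetical, and since $L(E)$ may be infinite, `lang(E, n)` returns only the strings of length at most `n`.

```python
from dataclasses import dataclass

# One class per clause of the recursive definition. Parenthesization needs no
# class of its own: it only decides the shape of the tree.
@dataclass(frozen=True)
class Empty:          # the RE ∅
    pass

@dataclass(frozen=True)
class Eps:            # the RE ε
    pass

@dataclass(frozen=True)
class Sym:            # the RE a, for a single symbol a
    a: str

@dataclass(frozen=True)
class Union:          # E + F
    e: object
    f: object

@dataclass(frozen=True)
class Concat:         # EF
    e: object
    f: object

@dataclass(frozen=True)
class Star:           # E*
    e: object

def lang(E, n):
    """The strings of L(E) whose length is at most n (L(E) may be infinite)."""
    if isinstance(E, Empty):
        return set()                                  # L(∅) = ∅
    if isinstance(E, Eps):
        return {""}                                   # L(ε) = {ε}
    if isinstance(E, Sym):
        return {E.a} if len(E.a) <= n else set()      # L(a) = {a}
    if isinstance(E, Union):
        return lang(E.e, n) | lang(E.f, n)            # L(E) ∪ L(F)
    if isinstance(E, Concat):
        return {x + y for x in lang(E.e, n) for y in lang(E.f, n)
                if len(x + y) <= n}                   # L(E)L(F)
    if isinstance(E, Star):
        base = lang(E.e, n)                           # (L(E))*, built piece by piece
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not an RE: {E!r}")

# Examples (2/4) and (4/4): G = 01* and K = ε + a*.
G = Concat(Sym("0"), Star(Sym("1")))
K = Union(Eps(), Star(Sym("a")))
print(sorted(lang(G, 4)))
print(lang(K, 3) == lang(Star(Sym("a")), 3))
```

Each branch of `lang` mirrors one clause of the basis or the induction above.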
Examples (supplemental) (1/4)
– The RE $F = \mathbf{1}$ “expresses” the language $L(\mathbf{1}) = \{1\}$.
– The RE $E = \mathbf{1}^*$:
  Language expressed by $E$:
  $L = L(E) = L(\mathbf{1}^*) = (L(\mathbf{1}))^* = (\{1\})^*$ (closure of a language)
  $= \{\epsilon, 1, 11, 111, 1111, \dots\}$
  $= \{1^n \mid n \ge 0\}$
Examples (supplemental) (2/4)
– The RE $G = \mathbf{0}\mathbf{1}^*$:
  Language expressed by $G$:
  $L = L(G) = L(\mathbf{0}\mathbf{1}^*) = L(\mathbf{0})L(\mathbf{1}^*)$ (concatenation)
  $= \{0\}\{\epsilon, 1, 11, 111, 1111, \dots\}$
  $= \{0, 01, 011, 0111, \dots\}$
  $= \{01^n \mid n \ge 0\}$
Examples (supplemental) (3/4)
– The RE $H = \mathbf{1} + \mathbf{0}\mathbf{1}^*$:
  Language expressed by $H$:
  $L = L(H) = L(\mathbf{1} + \mathbf{0}\mathbf{1}^*) = L(\mathbf{1}) \cup L(\mathbf{0}\mathbf{1}^*)$
  $= \{1\} \cup \{0, 01, 011, 0111, \dots\}$
  $= \{1, 0, 01, 011, 0111, \dots\}$
  $= \{1\} \cup \{01^n \mid n \ge 0\}$
Examples (supplemental) (4/4)
– The RE $K = \boldsymbol{\epsilon} + \mathbf{a}^*$:
  Language expressed by $K$:
  $L = L(K) = L(\boldsymbol{\epsilon} + \mathbf{a}^*) = L(\boldsymbol{\epsilon}) \cup L(\mathbf{a}^*)$
  $= \{\epsilon\} \cup \{\epsilon, a, aa, aaa, \dots\}$
  $= \{\epsilon, a, aa, aaa, \dots\}$
  $= L(\mathbf{a}^*)$
  That is, we have the following RE equalities:
  $\boldsymbol{\epsilon} + \mathbf{a}^* = \mathbf{a}^* = \mathbf{a}^* + \boldsymbol{\epsilon}$
Example 3.2
– An RE defining the language of strings of alternating 0’s and 1’s (including the empty string) is either of the two below:
  $(01)^* + (10)^* + 0(10)^* + 1(01)^*$
  (the four terms cover strings of the forms 0…1, 1…0, 0…0, and 1…1, respectively)
  $(\epsilon + 1)(01)^*(\epsilon + 0)$
  (Why? See the textbook.)
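One way to gain confidence in Example 3.2 (a spot check, not a proof) is to compare both RE’s with the definition on all short binary strings. The sketch assumes Python’s `re` module, where union is written `|` and $(\epsilon + x)$ can be written `x?` (the UNIX ? operator of Section 3.3.1); `alternating` and `check` are hypothetical helper names.

```python
import re
from itertools import product

# Example 3.2 in Python's regex syntax: "+" becomes "|", (ε + x) becomes x?.
FIRST = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
SECOND = re.compile(r"1?(01)*0?")

def alternating(w):
    """True iff no two adjacent symbols of w are equal."""
    return all(a != b for a, b in zip(w, w[1:]))

def check(max_len):
    """Compare both RE's with the definition on every binary string up to max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            w = "".join(t)
            expected = alternating(w)
            assert bool(FIRST.fullmatch(w)) == expected, w
            assert bool(SECOND.fullmatch(w)) == expected, w
    return True

print(check(8))
```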
3.1.3 Precedence of RE operators
– Precedence:
  Highest: $*$ (closure)
  Next: $\cdot$ (concatenation, left to right)
  Last: $+$ (union, left to right)
  Use parentheses anywhere to resolve ambiguity.
– Example 3.3: three possible groupings of $01^* + 1$; only the first follows the precedence rules above:
  $(0(1^*)) + 1$, by the precedence above (the same RE as $01^* + 1$)
  $(01)^* + 1$ (another meaning, which needs explicit parentheses)
  $0(1^* + 1)$ (a third meaning, which needs explicit parentheses)
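Python’s `re` module follows the same precedence (closure binds tightest, then concatenation, then union, written `|`), so it can be used to see that the three groupings define different languages. The sample strings and the helper name `matches` are arbitrary choices for this sketch.

```python
import re

# The groupings of Example 3.3, in Python's regex syntax ("|" for +).
GROUPINGS = {
    "(0(1*)) + 1": r"(0(1*))|1",   # the grouping given by the precedence rules
    "01* + 1":     r"01*|1",       # the same RE written without parentheses
    "(01)* + 1":   r"(01)*|1",     # another meaning
    "0(1* + 1)":   r"0(1*|1)",     # a third meaning
}
SAMPLES = ["", "0", "1", "01", "011", "0101"]

def matches(pattern, samples=SAMPLES):
    """The samples that the whole pattern matches."""
    return [w for w in samples if re.fullmatch(pattern, w)]

for name, pattern in GROUPINGS.items():
    print(f"{name:12} matches {matches(pattern)}")
```

The first two patterns match the same samples, consistent with $01^* + 1$ being read as $(0(1^*)) + 1$.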
3.2 FA’s & RE’s
Theorems to be proved:
– Every language defined by a DFA is also defined by an RE.
– Every language defined by an RE is also defined by an $\epsilon$-NFA.
Relations among the theorems (the yellow arcs in the figure are the ones to be proved here):
[Figure: conversions among $\epsilon$-NFA, NFA, DFA, and RE]
3.2.1 From DFA’s to RE’s
– Theorem 3.4: If $L = L(A)$ for some DFA $A$, then there is an RE $R$ such that $L = L(R)$.
  Proof. We progressively construct sets of strings described by RE’s of a particular form, $R_{ij}^{(k)}$, until the entire set of accepted strings (i.e., the language $L(A)$) is obtained. Assume the states are $\{1, 2, \dots, n\}$, with 1 as the start state.
Meaning of $R_{ij}^{(k)}$:
– The RE $R_{ij}^{(k)}$ denotes the set of strings $w$ such that
  – $w$ is the label of a path from state $i$ to state $j$ in DFA $A$, and
  – the path has no intermediate node whose number is larger than $k$.
Meaning of $R_{ij}^{(k)}$ (cont’d):
– To construct $R_{ij}^{(k)}$, we use induction, starting at $k = 0$ and stopping at $k = n$ (the largest state number). When $k = n$, $i = 1$, and $j$ is an accepting state, $R_{1j}^{(n)}$ defines the set of strings accepted by DFA $A$ along paths from the start state to that accepting state $j$.
Meaning of $R_{ij}^{(k)}$ (cont’d):
Basis:
– When $k = 0$: since all state numbers are $\ge 1$, a path from $i$ to $j$ can have no intermediate state at all. This leaves 2 cases:
  (1) an arc (a single transition) from $i$ to $j$;
  (2) a path of length 0, consisting of node $i$ alone (possible only when $i = j$).
Meaning of $R_{ij}^{(k)}$ (cont’d):
Basis (cont’d):
– If $i \ne j$, only case (1) is possible, leading to 3 subcases:
  no symbol labels a transition from $i$ to $j$: $R_{ij}^{(0)} = \emptyset$
  one symbol $a$ labels the transition: $R_{ij}^{(0)} = a$
  multiple symbols $a_1, a_2, \dots, a_m$ label the transitions: $R_{ij}^{(0)} = a_1 + a_2 + \cdots + a_m$
Meaning of $R_{ij}^{(k)}$ (supplemental):
Basis (cont’d), $i \ne j$:
  $R_{ij}^{(0)} = \emptyset$  [Figure: states $q_i$ and $q_j$ with no arc between them]
  $R_{ij}^{(0)} = a$  [Figure: an arc from $q_i$ to $q_j$ labeled $a$]
  $R_{ij}^{(0)} = a_1 + a_2 + \cdots + a_m$  [Figure: an arc from $q_i$ to $q_j$ labeled $a_1 + \cdots + a_m$]
Meaning of $R_{ij}^{(k)}$ (cont’d):
Basis (cont’d):
– If $i = j$, the length-0 path of case (2) always exists and contributes $\epsilon$, in addition to any loop arcs from $i$ to itself (case (1)). The 3 subcases become:
  no symbol labels a loop: $R_{ii}^{(0)} = \epsilon$
  one symbol $a$ labels the loop: $R_{ii}^{(0)} = \epsilon + a$
  multiple symbols $a_1, a_2, \dots, a_m$ label loops: $R_{ii}^{(0)} = \epsilon + a_1 + a_2 + \cdots + a_m$
Meaning of $R_{ij}^{(k)}$ (supplemental):
Basis (cont’d), $i = j$:
  $R_{ii}^{(0)} = \epsilon$  [Figure: state $q_i$ with no loop]
  $R_{ii}^{(0)} = \epsilon + a$  [Figure: state $q_i$ with a loop labeled $a$]
  $R_{ii}^{(0)} = \epsilon + a_1 + a_2 + \cdots + a_m$  [Figure: state $q_i$ with a loop labeled $a_1 + \cdots + a_m$]
Induction (to compute $R_{ij}^{(k)}$):
– Suppose there is a path from $i$ to $j$ that goes through no state numbered higher than $k$. Then two cases must be considered:
  (1) The path does not go through state $k$ at all; such paths are described by $R_{ij}^{(k-1)}$.
  (2) The path goes through state $k$ at least once; then it may be broken into 3 pieces:
    – from $i$ to $k$, without passing through $k$ in between: $R_{ik}^{(k-1)}$;
    – from $k$ back to $k$, zero or more times: $(R_{kk}^{(k-1)})^*$ (recursive);
    – from $k$ to $j$, without passing through $k$ in between: $R_{kj}^{(k-1)}$.
Illustration of the paths represented by $R_{ij}^{(k)}$:
[Figure: a path $i \to \cdots \to k \to \cdots \to j$; the first piece is $R_{ik}^{(k-1)}$, the loops at $k$, circulating zero or more times, are $(R_{kk}^{(k-1)})^*$, and the last piece is $R_{kj}^{(k-1)}$]
Induction (cont’d):
– The three pieces are concatenated to give
  $R_{ik}^{(k-1)}(R_{kk}^{(k-1)})^*R_{kj}^{(k-1)}$.
– Combining (1) and (2), we get the RE defining “all the paths from $i$ to $j$ that go through no state higher than $k$” as
  $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}(R_{kk}^{(k-1)})^*R_{kj}^{(k-1)}$.
Induction (cont’d):
– Construct the $R_{ij}^{(k)}$ in increasing order of $k$ until $k = n$; at the last step, only the terms with $i = 1$ are needed.
– For the accepting states $j_1, j_2, \dots, j_m$, the union below is the result:
  $R_{1j_1}^{(n)} + R_{1j_2}^{(n)} + \cdots + R_{1j_m}^{(n)}$
  where $\{j_1, j_2, \dots, j_m\} = F$ is the set of final states.
(End of proof of theorem)
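The induction can be run mechanically. The following Python sketch computes every $R_{ij}^{(k)}$ as a string, assuming the states are numbered 1 to $n$ with 1 as the start state, and using the ASCII stand-ins `@` for $\emptyset$ and `e` for $\epsilon$ (so `e` must not be an input symbol). It applies only equalities 1 and 2 listed later ($\emptyset R = R\emptyset = \emptyset$ and $\emptyset + R = R$), so its output is correct but far longer than a hand-simplified result. All function names are hypothetical.

```python
import re
from itertools import product

# ASCII stand-ins (an assumption of this sketch): "@" for ∅ and "e" for ε.
EMPTY, EPS = "@", "e"

def plus(x, y):
    """Union, simplified with ∅ + R = R + ∅ = R."""
    if x == EMPTY:
        return y
    if y == EMPTY:
        return x
    return f"{x}+{y}"

def cat(*parts):
    """Concatenation, simplified with ∅R = R∅ = ∅."""
    if EMPTY in parts:
        return EMPTY
    return "".join(f"({p})" for p in parts)

def star(x):
    """Closure; ∅* = ε."""
    return EPS if x == EMPTY else f"({x})*"

def dfa_to_re(n, delta, finals, alphabet):
    """Union of R_1j^(n) over final states j, for a DFA with states 1..n (start 1)."""
    states = range(1, n + 1)
    # Basis: the symbols on arcs i -> j, plus ε when i == j.
    R = {}
    for i in states:
        for j in states:
            terms = ([EPS] if i == j else []) + [a for a in alphabet if delta.get((i, a)) == j]
            R[i, j] = "+".join(terms) if terms else EMPTY
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1).
    for k in states:
        R = {(i, j): plus(R[i, j], cat(R[i, k], star(R[k, k]), R[k, j]))
             for i in states for j in states}
    result = EMPTY
    for j in sorted(finals):
        result = plus(result, R[1, j])
    return result

def to_python_regex(r):
    """Rewrite for Python's re: "+" -> "|", ε -> empty string, ∅ -> never-matching (?!)."""
    return "(?!)" if r == EMPTY else r.replace("+", "|").replace(EPS, "")

# Example 3.5: state 1 loops on 1 and moves to 2 on 0; state 2 loops on 0 and 1.
DELTA = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}

def dfa_accepts(w):
    q = 1
    for a in w:
        q = DELTA[q, a]
    return q == 2

def words(max_len, alphabet="01"):
    return ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]

pattern = to_python_regex(dfa_to_re(2, DELTA, {2}, "01"))
print(all(bool(re.fullmatch(pattern, w)) == dfa_accepts(w) for w in words(6)))
```

For Example 3.5, the generated pattern should agree with the DFA (and with $1^*0(0 + 1)^*$) on all strings up to length 6; this is a spot check of the construction, not a proof of equivalence.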
Example 3.5
– Convert the following DFA into an RE.
  [Figure: DFA with start state 1 and accepting state 2; state 1 loops on 1 and goes to state 2 on 0; state 2 loops on 0, 1]
– The $R_{ij}^{(0)}$ may be constructed as follows (details follow):
  $R_{11}^{(0)} = \epsilon + 1$
  $R_{12}^{(0)} = 0$
  $R_{21}^{(0)} = \emptyset$
  $R_{22}^{(0)} = \epsilon + 0 + 1$
Example 3.5 (cont’d)
– $R_{11}^{(0)} = \epsilon + 1$ because $\delta(1, 1) = 1$ (a loop back to state 1 itself).
– $R_{12}^{(0)} = 0$ because $\delta(1, 0) = 2$ (an arc out to state 2).
– $R_{21}^{(0)} = \emptyset$ because there is no arc from state 2 to state 1.
– $R_{22}^{(0)} = \epsilon + 0 + 1$ because $\delta(2, 0) = 2$ and $\delta(2, 1) = 2$ (loops back to state 2 itself).
Example 3.5 (cont’d)
– We could then compute all $R_{ij}^{(k)}$ for $k = 1$ and $k = 2$.
– However, to save time, we may instead compute only the necessary terms of $R_{ij}^{(k)}$, working backward from the final states.
Example 3.5 (cont’d)
– There is only one final state, 2, so we only have to compute
  $R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)}(R_{22}^{(1)})^*R_{22}^{(1)}$.
– Hence we only have to compute $R_{12}^{(1)}$ and $R_{22}^{(1)}$, without computing $R_{21}^{(1)}$ and $R_{11}^{(1)}$.
– To compute each of these terms, we need some RE equalities to simplify intermediate results.
Some equalities ($R$ is an RE):
1. $\emptyset R = R\emptyset = \emptyset$ ($\emptyset$ is the annihilator for concatenation)
2. $\emptyset + R = R + \emptyset = R$ ($\emptyset$ is the identity for union)
3. $\epsilon R = R\epsilon = R$ ($\epsilon$ is the identity for concatenation)
4. $(\epsilon + a)^* = a^* = (a + \epsilon)^*$
5. $(\epsilon + a)a^* = (a^* + aa^*) = a^* + a^+ = a^*$
6. $a^*(\epsilon + a) = (a^* + a^*a) = a^* + a^+ = a^*$
(all provable by easy deduction; in 4 to 6, $a$ may be any RE)
To compute $R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)}(R_{22}^{(1)})^*R_{22}^{(1)}$:
– $R_{12}^{(1)} = R_{12}^{(0)} + R_{11}^{(0)}(R_{11}^{(0)})^*R_{12}^{(0)}$
  $= 0 + (\epsilon + 1)(\epsilon + 1)^*0$ (by substitution)
  $= 0 + (\epsilon + 1)1^*0$ (by 4.)
  $= 0 + 1^*0$ (by 5.)
  $= (\epsilon + 1^*)0$ (by the distributive law)
  $= 1^*0$ (since $\epsilon + 1^* = 1^*$, as in Examples (4/4))
To compute $R_{12}^{(2)}$ (cont’d):
– $R_{22}^{(1)} = R_{22}^{(0)} + R_{21}^{(0)}(R_{11}^{(0)})^*R_{12}^{(0)}$
  $= (\epsilon + 0 + 1) + \emptyset(\epsilon + 1)^*0$ (by substitution)
  $= (\epsilon + 0 + 1) + \emptyset$ (by 1.)
  $= \epsilon + 0 + 1$ (by 2.)
To compute $R_{12}^{(2)}$ (cont’d):
– Finally,
  $R_{12}^{(2)} = 1^*0 + 1^*0(\epsilon + 0 + 1)^*(\epsilon + 0 + 1)$ (by substitution)
  $= 1^*0 + 1^*0(0 + 1)^*(\epsilon + 0 + 1)$ (by 4.)
  $= 1^*0 + 1^*0(0 + 1)^*$ (by 6.)
  $= 1^*0(\epsilon + (0 + 1)^*)$ (by the distributive law)
  $= 1^*0(0 + 1)^*$ (since $\epsilon + (0 + 1)^* = (0 + 1)^*$)
Check the correctness of the final result $R_{12}^{(2)} = 1^*0(0 + 1)^*$: it is correct, as can be seen by looking at the DFA directly (any number of 1’s in state 1, a 0 to reach state 2, then any string of 0’s and 1’s).
The above method also works for NFA’s and $\epsilon$-NFA’s.
3.2.2 Converting DFA’s to RE’s by Eliminating States (another way)
– Step 1: regard the symbols on the arcs as RE’s.
– Step 2: eliminate states one at a time with the following conversion (for a complete diagram, see the textbook): to eliminate a state $s$ that has a self-loop $S$, for each predecessor $q_1$ (with an arc $Q_1$ into $s$) and each successor $q_2$ (with an arc $P_1$ out of $s$), replace the arc $R_{11}$ from $q_1$ to $q_2$ by an arc labeled $R_{11} + Q_1S^*P_1$ (a missing arc counts as $\emptyset$).
– Step 3: collect the RE’s for all the final states.
[Fig. 3.7 (partial): state $s$ with self-loop $S$, an arc $Q_1$ from $q_1$ to $s$, an arc $P_1$ from $s$ to $q_2$, and an arc $R_{11}$ from $q_1$ to $q_2$]
[Fig. 3.8 (partial): after eliminating $s$, the arc from $q_1$ to $q_2$ is labeled $R_{11} + Q_1S^*P_1$]
Details of Step 3:
(1) For each final state $q$, eliminate all states as above except $q$ and the start state $q_0$.
(2) If $q \ne q_0$, a 2-state automaton is left:
  [Fig. 3.9: start state $q_0$ with self-loop $R$, an arc $S$ from $q_0$ to $q$, an arc $T$ from $q$ back to $q_0$, and a self-loop $U$ on $q$]
  The corresponding RE is $(R + SU^*T)^*SU^*$ (provable by the first method).
(3) If $q = q_0$, eliminate every state other than $q_0$ as well, leaving only the start state $q_0$ with a self-loop $R$ (an example follows):
  [Fig. 3.10: start state $q_0$ with self-loop $R$]
  The corresponding RE is $R^*$.
(4) Collect (take the union of) the results derived as above for each final state to get the final result.
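The state-elimination procedure can be sketched the same way. Assumptions of this Python sketch: arcs are a dictionary from (state, state) pairs to RE strings, a missing arc stands for $\emptyset$, `@` and `e` are ASCII stand-ins for $\emptyset$ and $\epsilon$, and all function names are hypothetical. For each final state it eliminates every state except that final state and the start state, applies the formula of Fig. 3.9 (case (2)) or Fig. 3.10 (case (3)), and takes the union of the results, as in Step 3. Only $\emptyset$-simplifications are performed.

```python
import re
from itertools import product

EMPTY, EPS = "@", "e"   # ASCII stand-ins for ∅ and ε (an assumption of this sketch)

def plus(x, y):
    if x == EMPTY:
        return y
    if y == EMPTY:
        return x
    return f"{x}+{y}"

def cat(*parts):
    return EMPTY if EMPTY in parts else "".join(f"({p})" for p in parts)

def star(x):
    return EPS if x == EMPTY else f"({x})*"

def eliminate(arcs, states, s):
    """Remove state s: each arc q1 -> q2 becomes R11 + Q1 S* P1 (Figs. 3.7 and 3.8)."""
    loop = star(arcs.get((s, s), EMPTY))
    preds = [p for p in states if p != s and (p, s) in arcs]
    succs = [q for q in states if q != s and (s, q) in arcs]
    new = {key: r for key, r in arcs.items() if s not in key}
    for p in preds:
        for q in succs:
            new[p, q] = plus(new.get((p, q), EMPTY), cat(arcs[p, s], loop, arcs[s, q]))
    return new

def automaton_to_re(states, arcs, start, finals):
    """Step 3: reduce the automaton once per final state, then take the union."""
    result = EMPTY
    for q in finals:
        remaining, a = list(states), dict(arcs)
        for s in states:
            if s not in (start, q):
                a = eliminate(a, remaining, s)
                remaining.remove(s)
        R = a.get((start, start), EMPTY)
        if q == start:                     # case (3): R*
            result = plus(result, star(R))
        else:                              # case (2): (R + S U* T)* S U*
            S, T = a.get((start, q), EMPTY), a.get((q, start), EMPTY)
            U = a.get((q, q), EMPTY)
            result = plus(result, cat(star(plus(R, cat(S, star(U), T))), S, star(U)))
    return result

def to_python_regex(r):
    return "(?!)" if r == EMPTY else r.replace("+", "|").replace(EPS, "")

def words(max_len, alphabet="01"):
    return ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]

# The automaton of Example 3.6 after Step 1 (arc labels are already RE's).
ARCS = {("A", "A"): "0+1", ("A", "B"): "1", ("B", "C"): "0+1", ("C", "D"): "0+1"}
pattern = to_python_regex(automaton_to_re(["A", "B", "C", "D"], ARCS, "A", ["C", "D"]))
print(pattern)
```

On the automaton of Example 3.6, the result is an unsimplified RE equivalent to $(0 + 1)^*1(0 + 1) + (0 + 1)^*1(0 + 1)(0 + 1)$, i.e., the strings whose second or third symbol from the end is 1.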
An example of Case (3) (supplemental)
– Suppose two states are left: the start state $q_0$ with self-loop $X$, another state $q$ with self-loop $V$, an arc $Y$ from $q_0$ to $q$, and an arc $Z$ from $q$ back to $q_0$.
– Regard $q_0$ as two separate states ($q_1$ and $q_2$ in Figs. 3.7 and 3.8), take $q$ as $s$, and apply Figs. 3.7 & 3.8 to eliminate $q$, with $S = V$, $R_{11} = X$, $Q_1 = Y$, $P_1 = Z$:
  the self-loop on $q_0$ becomes $R_{11} + Q_1S^*P_1 = X + YV^*Z$.
[Figs. 3.7 & 3.8 (partial), instantiated with $q_1 = q_2 = q_0$]
An example of Case (3) (supplemental, cont’d)
– Use the result $R_{11} + Q_1S^*P_1 = X + YV^*Z$ as $R$ in Fig. 3.10:
  [Figure: start state $q_0$ with self-loop $R = X + YV^*Z$]
– The final result is $R^* = (X + YV^*Z)^*$.
– This will be used in your homework.
Example 3.5 revisited
– Match the DFA of Example 3.5 against Fig. 3.9, with $q_1$ as the start state and $q_2$ as the final state: $R = 1$, $S = 0$, $T = \emptyset$, $U = 0 + 1$.
– Applying the 2-state formula derived above directly gives
  $(R + SU^*T)^*SU^* = (1 + 0(0 + 1)^*\emptyset)^*0(0 + 1)^* = 1^*0(0 + 1)^*$. Correct!
Example 3.6
– Step 1: regard all symbols on the arcs as RE’s. We get:
  [Figure (before): NFA with start state A, which loops on 0, 1; arcs A → B on 1, B → C on 0, 1, and C → D on 0, 1; C and D are final states]
  [Figure (after): the same automaton with each label 0, 1 replaced by the RE 0 + 1]
Example 3.6 (cont’d)
– Step 2: to remove B, use the conversion of Figs. 3.7 and 3.8 with
  $s = B$, $q_1 = A$, $q_2 = C$, $S = \emptyset$, $Q_1 = 1$, $P_1 = 0 + 1$, $R_{11} = \emptyset$,
  so $R_{11} + Q_1S^*P_1 = \emptyset + 1\emptyset^*(0 + 1) = 1\epsilon(0 + 1) = 1(0 + 1)$.
[Figure: Figs. 3.7 and 3.8 (partial) next to the automaton of Step 1]
Example 3.6 (cont’d)
– For the final state D, we have to remove C further, with
  $s = C$, $q_1 = A$, $q_2 = D$, $S = \emptyset$, $Q_1 = 1(0 + 1)$, $P_1 = 0 + 1$, $R_{11} = \emptyset$,
  so $R_{11} + Q_1S^*P_1 = \emptyset + 1(0 + 1)\emptyset^*(0 + 1) = 1(0 + 1)(0 + 1)$.
[Figure: the automaton after removing B: start state A with self-loop $0 + 1$, an arc from A to C labeled $1(0 + 1)$, and an arc from C to D labeled $0 + 1$]
Example 3.6 (cont’d)
– By the 2-state conversion of Fig. 3.9, with $R = 0 + 1$, $q_1 = A$, $q_2 = D$, $S = 1(0 + 1)(0 + 1)$, $T = \emptyset$, $U = \emptyset$, we get
  $(R + SU^*T)^*SU^* = (0 + 1 + 1(0 + 1)(0 + 1)\emptyset^*\emptyset)^*1(0 + 1)(0 + 1)\emptyset^* = (0 + 1)^*1(0 + 1)(0 + 1)$
[Figure: start state A with self-loop $0 + 1$ and an arc from A to D labeled $1(0 + 1)(0 + 1)$, matched against Fig. 3.9]
Example 3.6 (cont’d)
– For the other final state C, we start from the automaton obtained after removing B:
  [Figure: start state A with self-loop $0 + 1$, an arc from A to C labeled $1(0 + 1)$, and an arc from C to D labeled $0 + 1$]
– We have to eliminate D, using the conversion of Figs. 3.7 and 3.8.
Example 3.6 (cont’d)
– Since D has no successor (and C before it is a final state), deleting D has no effect on the other parts, resulting in:
  [Figure: start state A with self-loop $0 + 1$ and an arc from A to C labeled $1(0 + 1)$]
– By the 2-state conversion of Fig. 3.9, with $R = 0 + 1$, $S = 1(0 + 1)$, $T = \emptyset$, $U = \emptyset$, we get
  $(R + SU^*T)^*SU^* = (0 + 1 + 1(0 + 1)\emptyset^*\emptyset)^*1(0 + 1)\emptyset^* = (0 + 1)^*1(0 + 1)$
Example 3.6 (cont’d)
– The final result is the sum of the previous two derivation results:
  $(0 + 1)^*1(0 + 1) + (0 + 1)^*1(0 + 1)(0 + 1)$
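As a spot check (not a proof), the final result can be compared with the NFA of Example 3.6 on all short strings. In this sketch the NFA is simulated by tracking the set of states reachable so far; the transition table is read off the figure, and the names `NFA`, `nfa_accepts`, and `agree` are hypothetical.

```python
import re
from itertools import product

# The NFA of Example 3.6: start state A, final states C and D.
NFA = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
       ("B", "0"): {"C"}, ("B", "1"): {"C"},
       ("C", "0"): {"D"}, ("C", "1"): {"D"}}
FINALS = {"C", "D"}

def nfa_accepts(w):
    """Track the set of states the NFA can be in after each symbol."""
    current = {"A"}
    for a in w:
        current = set().union(*(NFA.get((q, a), set()) for q in current))
    return bool(current & FINALS)

# The final result of Example 3.6, with "+" written as "|".
RESULT = re.compile(r"(0|1)*1(0|1)|(0|1)*1(0|1)(0|1)")

def agree(max_len):
    """True if RESULT and the NFA agree on every binary string up to max_len."""
    return all(bool(RESULT.fullmatch(w)) == nfa_accepts(w)
               for n in range(max_len + 1)
               for w in map("".join, product("01", repeat=n)))

print(agree(8))
```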
3.2.3 Converting RE’s to Automata
– Theorem 3.7: Every language defined by an RE is also defined by an FA.
  Proof. We construct an $\epsilon$-NFA with exactly one start state and one accepting state.
  Basis. There are three cases, as shown below.
  [Figure: RE = $\epsilon$: an $\epsilon$-arc from the start state to the accepting state]
  [Figure: RE = $\emptyset$: a start state and an accepting state with no arc]
  [Figure: RE = $a$: an arc labeled $a$ from the start state to the accepting state]
Induction. Three cases need to be considered.
(1) RE = $R + S$
  [Figure: a new start state with $\epsilon$-arcs to the start states of the automata for $R$ and $S$, and $\epsilon$-arcs from their accepting states to a new accepting state]
(2) RE = $RS$
  [Figure: the accepting state of the automaton for $R$ joined by an $\epsilon$-arc to the start state of the automaton for $S$]
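(3) RE = $R^*$
  [Figure: a new start state and a new accepting state; $\epsilon$-arcs go from the new start state to the start of $R$’s automaton and directly to the new accepting state, and from $R$’s accepting state back to $R$’s start state and to the new accepting state]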
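The basis and the three induction cases correspond to the following recursive Python sketch. It assumes a hypothetical tuple encoding of RE’s, numbers states with `itertools.count`, and represents an $\epsilon$-transition by the label `None`; `accepts` simulates the resulting $\epsilon$-NFA using $\epsilon$-closures. The RE of Example 3.8 is used as the test case.

```python
from itertools import count, product

# RE's as nested tuples (a hypothetical encoding for this sketch):
# ("eps",), ("empty",), ("sym", a), ("union", R, S), ("cat", R, S), ("star", R).

def thompson(r, fresh):
    """Return (start, accept, arcs) of an ε-NFA for r; arcs are (p, label, q), label None = ε."""
    kind = r[0]
    if kind in ("eps", "empty", "sym"):                      # Basis
        s, f = next(fresh), next(fresh)
        if kind == "eps":
            return s, f, [(s, None, f)]
        if kind == "empty":
            return s, f, []
        return s, f, [(s, r[1], f)]
    if kind == "star":                                       # Induction (3)
        s1, f1, a1 = thompson(r[1], fresh)
        s, f = next(fresh), next(fresh)
        return s, f, a1 + [(s, None, s1), (s, None, f), (f1, None, s1), (f1, None, f)]
    s1, f1, a1 = thompson(r[1], fresh)
    s2, f2, a2 = thompson(r[2], fresh)
    if kind == "union":                                      # Induction (1)
        s, f = next(fresh), next(fresh)
        return s, f, a1 + a2 + [(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)]
    if kind == "cat":                                        # Induction (2)
        return s1, f2, a1 + a2 + [(f1, None, s2)]
    raise ValueError(f"unknown RE node: {kind!r}")

def eclose(states, arcs):
    """All states reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for x, label, y in arcs:
            if x == p and label is None and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def accepts(nfa, w):
    start, accept, arcs = nfa
    current = eclose({start}, arcs)
    for a in w:
        current = eclose({y for x, label, y in arcs if x in current and label == a}, arcs)
    return accept in current

# Example 3.8: (0 + 1)*1(0 + 1)
ZERO_OR_ONE = ("union", ("sym", "0"), ("sym", "1"))
E38 = ("cat", ("cat", ("star", ZERO_OR_ONE), ("sym", "1")), ZERO_OR_ONE)
NFA38 = thompson(E38, count())
print([w for w in map("".join, product("01", repeat=3)) if accepts(NFA38, w)])
# ['010', '011', '110', '111']
```

Each call allocates fresh state numbers, so the two copies of $0 + 1$ in Example 3.8 get separate sub-automata, as in part (c) of the example.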
– Example 3.8 (see Fig. 3.18 in the textbook): convert the RE $(0 + 1)^*1(0 + 1)$ into an $\epsilon$-NFA.
  (a) $0 + 1$
  [Figure: $\epsilon$-NFA for $0 + 1$]
  (b) $(0 + 1)^*$
  [Figure: $\epsilon$-NFA for $(0 + 1)^*$]
  (c) $(0 + 1)^*1(0 + 1)$
  Connect every two parts by an $\epsilon$-transition.
  [Figure: $\epsilon$-NFA for $(0 + 1)^*1(0 + 1)$, joining the automaton of (b), an automaton for 1, and the automaton of (a)]
3.3 Applications of RE’s
Two examples of uses of RE’s:
– Lexical analysis
– Text search
3.3.1 RE’s in UNIX
– The RE’s used in UNIX are extended versions of RE’s; some of the extensions (such as back-references to previously matched text) allow non-regular languages to be recognized.
– Rules for character classes:
  The symbol . (dot) stands for any character.
  $[a_1a_2\dots a_k]$ stands for $a_1 + a_2 + \cdots + a_k$
  $[a_1\text{-}a_k]$ stands for $[a_1a_2\dots a_k]$ (a range of characters)
  e.g., [0-9] stands for [01…9], i.e., $0 + 1 + \cdots + 9$
        [A-Z] stands for $A + B + \cdots + Z$
        [A-Za-z0-9] stands for the set of all letters and digits
        [-+.0-9] stands for the characters that can form a signed decimal number
  Special notations
  e.g., [:digit:] = [0-9], [:alpha:] = [A-Za-z], [:alnum:] = [A-Za-z0-9] (in practice written inside brackets, e.g., [[:digit:]])
– Operators used in UNIX:
  | for union, in place of + in RE’s
  ? for “zero or one of”: $R? = \epsilon + R$
  + for “one or more of”: $R+ = RR^*$ (= $R^+$)
  {n} for “n copies of”: $R\{5\} = RRRRR$ (= $R^5$)
– * is still used in UNIX, with its usual meaning.
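Python’s `re` module is not a UNIX tool, but it accepts these operators with the same meaning, so it can be used to experiment with them. A caveat of this sketch: POSIX classes such as [:digit:] are not supported by Python’s `re`, so plain ranges are used; the sample strings and the helper name `matching` are our own choices.

```python
import re

# (UNIX-style RE, the RE it abbreviates in the book's notation, sample strings)
EXAMPLES = [
    (r"0|1",    "0 + 1",        ["0", "1", "01"]),
    (r"01?",    "0(ε + 1)",     ["0", "01", "011"]),
    (r"(01)+",  "01(01)*",      ["", "01", "0101"]),
    (r"1{3}",   "111",          ["11", "111", "1111"]),
    (r"[0-2]*", "(0 + 1 + 2)*", ["", "2021", "123"]),
]

def matching(pattern, samples):
    """The samples that the whole pattern matches."""
    return [w for w in samples if re.fullmatch(pattern, w)]

for unix, book, samples in EXAMPLES:
    print(f"{unix:8} ~ {book:14} matches {matching(unix, samples)}")
```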
3.3.2 Lexical analysis
– Example recalled (from Chapter 1):
  '[A-Z][a-z]*[ ][A-Z][A-Z]'
  means the following RE:
  $(A + B + \cdots + Z)(a + b + \cdots + z)^*\_(A + B + \cdots + Z)(A + B + \cdots + Z)$
  where _ means a blank.
  It can be used to represent addresses like Ithaca NY, Buffalo NY, …
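The pattern can be tried with Python’s `re` module, which interprets these character classes the same way. The function name `find_city_state` and the sample text are hypothetical.

```python
import re

# The Chapter 1 pattern: a capitalized word, a blank, then two capital letters.
CITY_STATE = re.compile(r"[A-Z][a-z]*[ ][A-Z][A-Z]")

def find_city_state(text):
    """All non-overlapping matches, scanning left to right."""
    return CITY_STATE.findall(text)

print(find_city_state("I live in Ithaca NY and work in Buffalo NY."))
```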
– Each command (rule) of the UNIX tools lex and flex has the form:
  UNIX-style RE   {code for the lexical analyzer to execute when the RE is matched}
– Examples:
  else                    {return(ELSE);}
  [A-Za-z][A-Za-z0-9]*    {code to enter the found identifier in the symbol table; return(ID);}
  >=                      {return(GE);}
  …
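The way a lex-generated analyzer applies such rules can be sketched in Python. This is not lex itself: the actions are reduced to returning a token name, the `WS` rule for skipping blanks is an added assumption, and `tokenize` is a hypothetical name. It follows the conflict resolution documented for lex and flex: take the longest match, and if several rules match equally long prefixes, take the rule listed first.

```python
import re

# Rules in priority order, as in a lex/flex specification. The actions are
# reduced to token names; "WS" (skip blanks) is an added assumption.
RULES = [
    ("ELSE", re.compile(r"else")),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("GE", re.compile(r">=")),
    ("WS", re.compile(r"[ \t\n]+")),
]

def tokenize(src):
    """Longest match wins; among equally long matches, the earliest rule wins."""
    tokens, pos = [], 0
    while pos < len(src):
        best_len, best_name = 0, None
        for name, pattern in RULES:
            m = pattern.match(src, pos)
            if m and len(m.group()) > best_len:   # strict ">" keeps the earlier rule on ties
                best_len, best_name = len(m.group()), name
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {src[pos]!r}")
        if best_name != "WS":
            tokens.append((best_name, src[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("else x1 >= elsewhere"))
```

Because of the longest-match rule, `elsewhere` becomes a single ID; because of rule order, `else` alone becomes ELSE rather than ID.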
3.3.3 Finding Patterns in Text
– We can use UNIX RE’s to search for patterns in Web pages.
– Example: a UNIX RE for street addresses (incomplete):
  '[0-9]+[A-Z]?[ ][A-Z][a-z]*([ ][A-Z][a-z]*)*[ ](Street|St\.|Avenue|Ave\.|Road|Rd\.)'
  e.g., 123A Main Street, 20 Ta Hsueh Rd., …
– Notes:
  1. There is an inconsistency in the textbook; blanks should be written as [ ] (see p. 4 and p. 113 in the textbook).
  2. The backslash distinguishes a literal dot from the dot used for “any character”.
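The address pattern can be tried with Python’s `re` module; with blanks written as `[ ]` as in Note 1, the pattern text carries over unchanged. The variable name `ADDRESS` and the sample strings are our own.

```python
import re

# The (incomplete) address RE of this section, in Python syntax.
ADDRESS = re.compile(
    r"[0-9]+[A-Z]?[ ][A-Z][a-z]*([ ][A-Z][a-z]*)*[ ]"
    r"(Street|St\.|Avenue|Ave\.|Road|Rd\.)"
)

for text in ["123A Main Street", "20 Ta Hsueh Rd.", "Main Street"]:
    print(f"{text!r}: {bool(ADDRESS.fullmatch(text))}")

m = ADDRESS.search("Visit us at 20 Ta Hsueh Rd. today")
print(m.group() if m else None)
```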
3.4 Algebraic Laws for RE’s
Purpose:
– To derive “high-level” algebraic laws for equivalent RE’s.
  Two RE’s are said to be equivalent if the languages they define are identical.
  The RE’s discussed here include variables, instead of just constants like 0, 1, a, 01, …
3.4.1 Associativity & Commutativity
– Assume $L$, $M$, and $N$ are RE’s (variables).
– Commutative law for union: $L + M = M + L$
– Associative law for union: $(L + M) + N = L + (M + N)$
– Associative law for concatenation: $(LM)N = L(MN)$
(Note: the commutative law for concatenation is false; e.g., $01 \ne 10$.)
3.4.2 Identities and Annihilators
– $\emptyset$ is the identity for union ($\emptyset + L = L + \emptyset = L$)
– $U$ is the annihilator for union ($U + L = L + U = U$)
– $\epsilon$ is the identity for concatenation ($\epsilon L = L\epsilon = L$)
– $\emptyset$ is the annihilator for concatenation ($\emptyset L = L\emptyset = \emptyset$)
3.4.3 Distributive Laws
– Left distributive law of concatenation over union:
  $L(M + N) = LM + LN$
– Right distributive law of concatenation over union:
  $(M + N)L = ML + NL$
Note: $U$ (in 3.4.2) denotes the universal language, i.e., the set of all strings over the alphabet.
3.4.4 The Idempotent Law
– Idempotent law for union:
  $L + L = L$
Note: “idempotent” (a mathematical term) describes an operation that, applied to an element and itself, gives back that same element; here, the union of $L$ with itself is $L$.
3.4.5 Laws Involving Closures
– $(L^*)^* = L^*$
– $\emptyset^* = \epsilon^* = \epsilon$
– $L^+ = L^*L = LL^*$
– $(L + M)^* = (L^*M^*)^*$
– $L^* = L^+ + \epsilon$ (easy)
– $L? = \epsilon + L$ (the definition of ? given before)
(for proofs, see the textbook)
3.4.6 & 3.4.7 Discovering Laws for RE’s and A Test for an RE Algebraic Law
– It can be proved that
  $(L + M)^* = (L^*M^*)^*$ is true iff $(a + b)^* = (a^*b^*)^*$ is true.
3.4.6 & 3.4.7 Discovering Laws for RE’s and A Test for an RE Algebraic Law (cont’d)
– That is, replace the variables in an RE equality with distinct single symbols, and check whether the resulting concrete RE equality can be proved true; if so, the original RE equality is also true. (The test applies to laws built from union, concatenation, and closure.)
  Proof. By Theorems 3.13 and 3.14; for details, see the textbook. (iff = if and only if)
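The test reduces a law with variables to a concrete equality. A cheap sanity check of a concrete equality (weaker than the exact check in the textbook, which compares the languages completely) is to compare both sides on all short strings. The sketch assumes Python’s `re` syntax (`|` for +, an empty alternative for $\epsilon$); `same_language` and its default bound are hypothetical choices.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=8):
    """Compare two concrete RE's (Python syntax) on every string up to max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True

# Concrete version of (L + M)* = (L*M*)*, with L -> a and M -> b.
print(same_language(r"(a|b)*", r"(a*b*)*"))
# Concrete version of LM = ML, which is not a law.
print(same_language(r"ab", r"ba"))
```

A mismatch disproves an equality, but agreement up to a fixed length only suggests it.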
3.4.7a Some RE Equalities (supplemental)
– $\emptyset^* = \epsilon^* = \epsilon$
– $rr^* = r^*r$
– $r^* = r^*r^* = (r^*)^* = r^* + r^*$
– $r^* = \epsilon + rr^* = \epsilon + r^*r = \epsilon + r^* = (\epsilon + r)^* = (\epsilon + r)r^*$
– $r^* = (r + r^2 + \cdots + r^k)^*$ ($k \ge 1$)
(for proofs, see the text and exercises of Chapter 6 in my Chinese textbook)
3.4.7a Some RE Equalities (supplemental, cont’d)
– $r^* = \epsilon + r + r^2 + \cdots + r^{k-1} + r^kr^*$ ($k \ge 1$)
– $(p + q)^* = (p^* + q^*)^* = (p^*q^*)^* = p^*(qp^*)^* = (p^*q)^*p^*$
– $(pq)^*p = p(qp)^*$
– $(p^*q)^* = \epsilon + (p + q)^*q$
– $(pq^*)^* = \epsilon + p(p + q)^*$
(for proofs, see the text and exercises of Chapter 6 in my Chinese textbook)