06L3
8/13/2019 06L3
2014/2/3 CSC 3130 Formal Languages and Automata Theory 1
DFA → RE
How to construct a RE for the following DFA?
By observation: (0+1)*0+
A systematic method?
Assume that we are given a DFA with states labeled $q_0, q_1, \ldots, q_n$, where $q_0$ is the start state.
[Figure: DFA with two states, q0 (start) and q1; edges labeled 0, 0, 1, 1]
-
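DFA → RE
We define $R_{ij}^{k}$ to denote the set of all strings that take the DFA M from $q_i$ to $q_j$ with intermediate states drawn only from $q_0, q_1, \ldots, q_k$.
e.g.
$R_{00}^{-1} = \{\varepsilon, 0\}$, $R_{00}^{0} = \{\varepsilon, 0, 00, 000, \ldots\} = 0^{*}$
$R_{01}^{-1} = \{1\}$, $R_{01}^{0} = \{1, 01, 001, 0001, \ldots\} = 0^{*}1$
Note that $101 \notin R_{01}^{0}$. Why?
[Figure: example DFA with states q0 (start) and q1; edges labeled 0, 0, 1, 1]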
-
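DFA → RE
\[
R_{ij}^{-1} =
\begin{cases}
\{a \mid \delta(q_i, a) = q_j\} & \text{if } i \neq j \\
\{a \mid \delta(q_i, a) = q_j\} \cup \{\varepsilon\} & \text{if } i = j
\end{cases}
\]
\[
R_{ij}^{k} = R_{ij}^{k-1} \cup R_{ik}^{k-1} \left(R_{kk}^{k-1}\right)^{*} R_{kj}^{k-1}
\]
[Figure: a path in M from $q_i$ to $q_j$ that passes through $q_k$]
We can recursively build the RE for $R_{ij}^{0}, R_{ij}^{1}, R_{ij}^{2}, \ldots, R_{ij}^{n}$.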
-
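DFA → RE
Let $r_{ij}^{k}$ be a RE for $R_{ij}^{k}$. Compute $r_{ij}^{-1}$ for all $i, j = 0, \ldots, n$, then $r_{ij}^{0}$ for all $i, j = 0, \ldots, n$, then $r_{ij}^{1}$ for all $i, j = 0, \ldots, n$, and so on until $r_{ij}^{n}$ for all $i, j = 0, \ldots, n$, according to the formula:
\[
r_{ij}^{k} = r_{ij}^{k-1} + r_{ik}^{k-1} \left(r_{kk}^{k-1}\right)^{*} r_{kj}^{k-1}
\]
The RE for the language accepted by M:
\[
r_{0 j_1}^{n} + r_{0 j_2}^{n} + \cdots + r_{0 j_p}^{n}
\]
where $q_0$ is the start state and $F = \{q_{j_1}, q_{j_2}, \ldots, q_{j_p}\}$.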
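The recursive construction can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `dfa_to_regex` and `dfa_accepts`, the dictionary encoding of the transition function, and the two-state example DFA are all assumptions made for the sketch (the transitions in the slide's figure are not fully recoverable). States are numbered 0 to n-1 with state 0 as the start state, so the loop index k runs from 0 where the slides start from -1. The output uses Python `re` syntax, where `|` plays the role of `+`.

```python
import re
from itertools import product

def dfa_to_regex(n_states, alphabet, delta, accepting):
    """Return a Python-`re` pattern for the DFA's language, or None if it is empty.

    None stands for the empty language and "" for the empty string.
    """
    def union(a, b):
        if a is None:
            return b
        if b is None:
            return a
        return a if a == b else f"(?:{a}|{b})"

    def concat(a, b):
        if a is None or b is None:
            return None
        return a + b

    def star(a):
        if a is None or a == "":
            return ""            # both the empty language and {""} star to {""}
        return f"(?:{a})*"

    # Base case: single symbols, plus the empty string when i == j.
    r = [[None] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for a in alphabet:
            j = delta[(i, a)]
            r[i][j] = union(r[i][j], re.escape(a))
        r[i][i] = union(r[i][i], "")

    # Inductive step: allow state k as an additional intermediate state.
    for k in range(n_states):
        r = [[union(r[i][j],
                    concat(concat(r[i][k], star(r[k][k])), r[k][j]))
              for j in range(n_states)]
             for i in range(n_states)]

    # Union over all accepting states, starting from state 0.
    result = None
    for j in sorted(accepting):
        result = union(result, r[0][j])
    return result

def dfa_accepts(delta, accepting, s):
    """Simulate the DFA from state 0 on input s."""
    state = 0
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting

# Hypothetical example DFA (an assumption): accepts binary strings ending in 1.
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 0, (1, "1"): 1}
accepting = {1}
regex = dfa_to_regex(2, "01", delta, accepting)

# Cross-check the pattern against the DFA on every string of length < 6.
for length in range(6):
    for chars in product("01", repeat=length):
        s = "".join(chars)
        assert (re.fullmatch(regex, s) is not None) == dfa_accepts(delta, accepting, s)
```

The pattern produced this way is correct but far from minimal, since no algebraic simplification is applied, and its size grows quickly with the number of states.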
-
DFA, NFA, ε-NFA and RE
[Figure: conversion diagram linking DFA, NFA, ε-NFA and RE]
DFA, NFA, ε-NFA and RE are all equivalent.
A language that can be described by any of them is called a Regular Language.
-
Algebraic Rules for RE
Commutative Rule
Associative Rule
Distributive Rule
Identity
-
Commutative & Associative Rules
Let L, M and N be regular expressions. Which of the following are correct?
Commutative Rules:
L + M = M + L
LM = ML
Associative Rules:
(L + M) + N = L + (M + N)
(LM)N = L(MN)
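One way to explore these candidate rules is to try them on small finite languages. The sketch below is an illustration only: the helper `concat` and the sample languages are assumptions chosen for the example. A single instance can refute a rule, but it cannot prove one.

```python
def concat(A, B):
    """Concatenation of two finite languages given as sets of strings."""
    return {a + b for a in A for b in B}

# Sample languages, chosen arbitrarily for the experiment.
L, M, N = {"a"}, {"b", "bb"}, {"c"}

union_commutes = (L | M) == (M | L)
concat_commutes = concat(L, M) == concat(M, L)      # {"ab", "abb"} vs {"ba", "bba"}
union_associates = ((L | M) | N) == (L | (M | N))
concat_associates = concat(concat(L, M), N) == concat(L, concat(M, N))
```

On this instance only `concat_commutes` comes out false, which is enough to show that LM = ML does not hold in general.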
-
Distributive Rules
Which of the following are correct?
Left Distributive Rules:
L(M + N) = LM + LN
L + (MN) = (L + M)(L + N)
Right Distributive Rules:
(M + N)L = ML + NL
(MN) + L = (M + L)(N + L)
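The four candidates can be tested on a concrete instance. This is a small sketch under assumptions: languages are modeled as finite Python sets, and the helper `concat` and the single-string languages are made up for the example. Agreement on one instance is only supporting evidence, while a mismatch is a genuine counter-example.

```python
def concat(A, B):
    """Concatenation of two finite languages given as sets of strings."""
    return {a + b for a in A for b in B}

L, M, N = {"a"}, {"b"}, {"c"}

# Concatenation over union, from the left and from the right.
left_ok = concat(L, M | N) == concat(L, M) | concat(L, N)
right_ok = concat(M | N, L) == concat(M, L) | concat(N, L)

# Union over concatenation, from the left and from the right.
union_left = L | concat(M, N)            # {"a", "bc"}
product_left = concat(L | M, L | N)      # {"aa", "ac", "ba", "bc"}
union_right = concat(M, N) | L           # {"bc", "a"}
product_right = concat(M | L, N | L)     # {"bc", "ba", "ac", "aa"}
```

Here the string "a" belongs to `union_left` but not to `product_left`, so union does not distribute over concatenation.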
-
Identities
What is the identity for union?
? + L = L + ? = L
What is the identity for concatenation?
?L = L? = L
-
Other Rules for Kleene Closure
(L*)* = L*
$L^{+} = LL^{*} = L^{*}L$
$L^{*} = L^{+} + \varepsilon$
$\emptyset^{*} = \varepsilon$
$\varepsilon^{*} = \varepsilon$
-
Class Discussion
Which of the following are correct, and why?
L + ML = (L + M)L
(L + M)* = (L*M*)*
(To prove, we need to show that any string generated by the RE on the right can also be generated by the RE on the left, and vice versa. Try using the above algebraic rules. To disprove, we need to find a counter-example.)
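Before attempting a proof, it can help to compare both sides on all short strings. The sketch below is an experiment, not a proof: it fixes L = a and M = b as one concrete instance (an assumption), and the helper name `language` is made up for the example. It uses Python's `re` module, where `|` stands for `+`.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    words = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if re.fullmatch(pattern, w):
                words.add(w)
    return words

# First claim with L = a, M = b: L + ML versus (L + M)L.
lhs1 = language(r"a|ba", "ab", 4)        # {"a", "ba"}
rhs1 = language(r"(?:a|b)a", "ab", 4)    # {"aa", "ba"}

# Second claim with L = a, M = b: (L + M)* versus (L*M*)*.
lhs2 = language(r"(?:a|b)*", "ab", 4)
rhs2 = language(r"(?:a*b*)*", "ab", 4)
```

The first pair differs (the string "a" is on the left only), which gives a counter-example. The second pair agrees on every string up to length 4, which suggests the identity is worth proving but does not establish it.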
-
Applications of RE
Two common applications of RE:
Lexical analysis in compilers
Finding patterns in text
-
Lexical Analyzer
Recognize tokens in a program's source code.
The tokens can be variable names, reserved words, operators, numbers, etc.
Each kind of token can be specified as an RE, e.g., a variable name is of the form [A-Za-z][A-Za-z0-9]*. We can then construct an ε-NFA to recognize it automatically.
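As a quick illustration, the variable-name pattern can be tried directly with Python's `re` module. The names `IDENTIFIER` and `is_identifier` are assumptions made for this sketch. Note that Python's engine is a backtracking matcher, not the ε-NFA construction described on the slide.

```python
import re

# The variable-name RE from the slide: a letter followed by letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(s):
    """True if the whole string s is a variable name of that form."""
    return IDENTIFIER.fullmatch(s) is not None
```

For instance, `is_identifier("x1")` holds, while `is_identifier("1x")` and `is_identifier("")` do not, because the first character must be a letter.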
-
Lexical Analyzer
By putting all these ε-NFAs together, we obtain one that can recognize different kinds of tokens in the input string.
We can convert this ε-NFA to an NFA and then to a DFA, and implement this DFA as a deterministic program: the lexical analyzer.
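The idea of combining one RE per token kind into a single recognizer can be sketched with Python's `re` module. Everything specific here is an assumption for illustration: the token names, the tiny keyword list, the operator set, and the function name `tokenize`. Python's `re` is a backtracking matcher, so this mirrors only the specification side, not the ε-NFA to DFA conversion itself.

```python
import re

# One RE per token kind (hypothetical token set). Earlier entries win ties.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[-+*/=<>]"),
    ("SKIP", r"\s+"),
]

# Combine all token REs into one pattern with a named group per kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling `tokenize("while x1 < 10")` yields the pairs `("KEYWORD", "while")`, `("IDENT", "x1")`, `("OP", "<")`, `("NUMBER", "10")`.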
-
Text Search
grep in Unix stands for Global (search for) Regular Expression and Print.
Unix has its own notations for regular expressions:
Dot . stands for any single character.
[a1a2...ak] stands for {a1, a2, ..., ak}, e.g., [bcd12] stands for the set {b, c, d, 1, 2}.
[x-y] stands for all characters from x to y in the ASCII sequence.
-
Text Search
| means or, i.e., + in our normal notation.
* means Kleene star, as in our normal notation.
? means zero or one, e.g., R? is ε + R.
+ means one or more, e.g., R+ is RR*.
{n} means n copies of, e.g., R{5} is RRRRR.
(You can find out more with man grep and man regex.)
We can use these notations to search for string patterns in text.
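These operators can be tried out in Python, whose `re` module shares this part of the notation with Unix extended regular expressions. The sketch below is only an illustration: the helper name `matches` and the sample strings are made up for it.

```python
import re

def matches(pattern, s):
    """True if the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

examples = {
    "dot": matches(r"b.d", "bxd"),                      # . is any single character
    "set": matches(r"[bcd12]", "c"),                    # one character from the set
    "set_miss": matches(r"[bcd12]", "a"),
    "range": matches(r"[a-f]", "e"),                    # a to f in ASCII order
    "range_miss": matches(r"[a-f]", "g"),
    "or": matches(r"ab|cd", "cd"),                      # | is union
    "star_empty": matches(r"a*", ""),                   # zero copies allowed
    "optional": matches(r"ab?", "a") and matches(r"ab?", "ab"),
    "plus_empty": matches(r"a+", ""),                   # + needs at least one copy
    "count": matches(r"a{5}", "aaaaa"),                 # exactly five copies
    "count_miss": matches(r"a{5}", "aaaa"),
}
```

The entries ending in `_miss` and the entry `plus_empty` are false; all the others are true.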
-
Text Search
For example, credit card numbers:
[0-9]{16} | [0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}
For example, phone numbers:
[0-9]{8} | [0-9]{3}-[0-9]{5} | 852-[0-9]{8} | 852-[0-9]{3}-[0-9]{5}
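Both patterns can be checked in Python. This is a sketch under assumptions: the spaces around | on the slide are read as layout only and dropped, the names `CARD`, `PHONE`, `is_card` and `is_phone` are invented for the example, and the sample numbers are made up.

```python
import re

# Slide patterns with the spaces around | removed (assumed to be layout only).
CARD = re.compile(r"[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")
PHONE = re.compile(r"[0-9]{8}|[0-9]{3}-[0-9]{5}|852-[0-9]{8}|852-[0-9]{3}-[0-9]{5}")

def is_card(s):
    """True if the whole string has one of the two credit card formats."""
    return CARD.fullmatch(s) is not None

def is_phone(s):
    """True if the whole string has one of the four phone number formats."""
    return PHONE.fullmatch(s) is not None
```

Whole-string matching is used here on purpose. When searching inside longer text, Python tries the alternatives from left to right and keeps the first one that succeeds, so the order of the alternatives can change which substring is reported.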