06L3

8/13/2019 06L3


2014/2/3 CSC 3130 Formal Languages and Automata Theory 1

DFA → RE

How to construct a RE for the following DFA?

By observation: (0+1)*0 + ε

A systematic method?

Assume that we are given a DFA with states labeled $q_0, q_1, \ldots, q_n$, where $q_0$ is the start state.

[Figure: a two-state DFA with states q0 (start) and q1, and transitions labeled 0 and 1.]



DFA → RE

We define $R_{ij}^{k}$ to denote the set of all strings that take the DFA $M$ from $q_i$ to $q_j$ while passing only through intermediate states among $q_0, q_1, \ldots, q_k$.

$R_{00}^{-1} = \{\varepsilon, 0\}$, $R_{00}^{0} = \{\varepsilon, 0, 00, 000, \ldots\} = 0^*$

$R_{01}^{-1} = \{1\}$, $R_{01}^{0} = \{1, 01, 001, 0001, \ldots\} = 0^*1$

Note that $101 \notin R_{01}^{0}$. Why?

[Figure: the example DFA used above, with states q0 (start) and q1 and transitions labeled 0 and 1.]
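The membership question above can be explored with a small simulation. This is a minimal sketch, not part of the original slides: the transition table `DELTA` is an assumption reconstructed from the example sets (the slide text does not spell out the transitions leaving q1), and `in_R` is a hypothetical helper name.

```python
# Transition table for the two-state example DFA, keyed by (state index, symbol).
# ASSUMPTION: the transitions out of q1 are reconstructed, not stated in the slide text.
DELTA = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 0, (1, "1"): 1,
}

def in_R(i, j, k, w):
    """Return True if w takes the DFA from q_i to q_j using only
    intermediate states with index <= k (k = -1 allows none at all)."""
    state = i
    for pos, symbol in enumerate(w):
        state = DELTA[(state, symbol)]
        is_last = pos == len(w) - 1
        # Every state entered before the final symbol is an intermediate state.
        if not is_last and state > k:
            return False
    return state == j

print(in_R(0, 1, 0, "001"))  # True: the path stays in q0 until the last step
print(in_R(0, 1, 0, "101"))  # False: the path enters q1 as an intermediate state
```

Under this assumed table, 101 is rejected because its path q0, q1, q0, q1 visits q1 in the middle, and index 1 exceeds k = 0.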



DFA → RE

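$$R_{ij}^{-1} = \begin{cases} \{a \mid \delta(q_i, a) = q_j\} & \text{if } i \neq j \\ \{a \mid \delta(q_i, a) = q_j\} \cup \{\varepsilon\} & \text{if } i = j \end{cases}$$

$$R_{ij}^{k} = R_{ij}^{k-1} \cup R_{ik}^{k-1} (R_{kk}^{k-1})^* R_{kj}^{k-1}$$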

[Figure: a path in M from qi to qj that passes through qk.]

We can recursively build the REs for $R_{ij}^{0}, R_{ij}^{1}, R_{ij}^{2}, \ldots, R_{ij}^{n}$.



DFA → RE

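Let $r_{ij}^{k}$ be an RE for $R_{ij}^{k}$. Compute $r_{ij}^{-1}$ for all $i, j = 0, \ldots, n$, then $r_{ij}^{0}$ for all $i, j = 0, \ldots, n$, then $r_{ij}^{1}$ for all $i, j = 0, \ldots, n$, and so on, until $r_{ij}^{n}$ for all $i, j = 0, \ldots, n$, according to the formula:

$$r_{ij}^{k} = r_{ij}^{k-1} + r_{ik}^{k-1} (r_{kk}^{k-1})^* r_{kj}^{k-1}$$

The RE for the language accepted by $M$:

$$r_{0 j_1}^{n} + r_{0 j_2}^{n} + \cdots + r_{0 j_p}^{n}$$

where $q_0$ is the start state and $F = \{q_{j_1}, q_{j_2}, \ldots, q_{j_p}\}$.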
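The recursive construction can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the course's reference implementation: the helper names (`union`, `concat`, `star`, `dfa_to_re`) are hypothetical, `None` stands for the empty language, states are numbered 0 to n-1 (so k runs from -1 to n-1), and the example DFA at the bottom is assumed. No simplification is attempted, so the resulting expression is correct but long.

```python
EPS = "ε"  # the empty string; None below stands for the empty language ∅

def union(a, b):
    """RE string for a + b."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else f"{a}+{b}"

def concat(a, b):
    """RE string for the concatenation ab."""
    if a is None or b is None:
        return None              # ∅ concatenated with anything is ∅
    if a == EPS:
        return b
    if b == EPS:
        return a
    return f"({a})({b})"

def star(a):
    """RE string for a*."""
    if a is None or a == EPS:
        return EPS               # ∅* = ε* = ε
    return f"({a})*"

def dfa_to_re(n, alphabet, delta, start, finals):
    """Build an RE string for a DFA with states 0..n-1 via r_ij^k."""
    # Base case k = -1: single symbols, plus ε on the diagonal.
    r = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for a in alphabet:
                if delta[(i, a)] == j:
                    r[i][j] = union(r[i][j], a)
            if i == j:
                r[i][j] = union(r[i][j], EPS)
    # Inductive step: additionally allow state k as an intermediate state.
    for k in range(n):
        loop = star(r[k][k])
        r = [[union(r[i][j], concat(concat(r[i][k], loop), r[k][j]))
              for j in range(n)] for i in range(n)]
    # Union over all accepting states, starting from the start state.
    result = None
    for f in finals:
        result = union(result, r[start][f])
    return result

# ASSUMED example: a two-state DFA in which state 0 is both start and accepting.
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 0, (1, "1"): 1}
print(dfa_to_re(2, "01", DELTA, 0, [0]))
```

Each level k only needs the table from level k-1, which is why the sketch overwrites `r` in place of keeping all n+1 tables.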



DFA, NFA, ε-NFA and RE

[Figure: diagram relating DFA, NFA, ε-NFA and RE.]

DFA, NFA, ε-NFA and RE are all equivalent.

A language that can be described by them is called a Regular Language.



Algebraic Rules for RE

Commutative Rule

Associative Rule

Distributive Rule

Identity



Commutative & Associative Rules

Let L, M and N be regular expressions. Which of the following are correct?

Commutative Rules

L + M = M + L

LM = ML

Associative Rules

(L + M) + N = L + (M + N)

(LM)N = L(MN)
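One way to test the candidate rules above is to try them on small finite languages. The sketch below is only an experiment: the sample languages L, M, N and the helper `concat` are assumptions chosen for illustration. A single example can disprove a rule, but it cannot prove one.

```python
def concat(X, Y):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}

# Hypothetical sample languages; "" plays the role of ε.
L, M, N = {"a"}, {"b"}, {"c", ""}

print((L | M) == (M | L))                                  # union, both orders
print(concat(L, M) == concat(M, L))                        # {"ab"} versus {"ba"}
print(((L | M) | N) == (L | (M | N)))                      # grouping of union
print(concat(concat(L, M), N) == concat(L, concat(M, N)))  # grouping of concatenation
```

Here the second comparison fails, because concatenating "a" with "b" gives "ab" in one order and "ba" in the other.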



Distributive Rules

Which of the following are correct?

Left Distributive Rules

L(M + N) = LM + LN

L + (MN) = (L + M)(L + N)

Right Distributive Rules

(M + N)L = ML + NL

(MN) + L = (M + L)(N + L)
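The four distributive candidates can be tried on concrete languages in the same spirit. This is a hedged sketch: the one-string languages L, M, N and the helper `concat` are assumptions for illustration, and agreement on one example is evidence rather than proof.

```python
def concat(X, Y):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}

# Hypothetical sample languages.
L, M, N = {"a"}, {"b"}, {"c"}

candidates = {
    "L(M + N) = LM + LN":        (concat(L, M | N), concat(L, M) | concat(L, N)),
    "L + (MN) = (L + M)(L + N)": (L | concat(M, N), concat(L | M, L | N)),
    "(M + N)L = ML + NL":        (concat(M | N, L), concat(M, L) | concat(N, L)),
    "(MN) + L = (M + L)(N + L)": (concat(M, N) | L, concat(M | L, N | L)),
}

for rule, (left, right) in candidates.items():
    print(rule, "holds on this example:", left == right)
```

On these languages, concatenation distributes over union from either side, while union does not distribute over concatenation: for instance L + (MN) is {"a", "bc"} but (L + M)(L + N) also contains "aa".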



Identities

What is the identity for union?

? + L = L + ? = L

What is the identity for concatenation?

?L = L? = L



Other Rules for Kleene Closure

$(L^*)^* = L^*$

$L^+ = LL^* = L^*L$

$L^* = L^+ + \varepsilon$

$\emptyset^* = \varepsilon$

$\varepsilon^* = \varepsilon$



Class Discussion

Which of the following are correct, and why?

L + ML = (L + M)L

(L + M)* = (L*M*)*

(To prove, we need to show that any string generated by the RE on the right can also be generated by the RE on the left, and vice versa. Try using the above algebraic rules. To disprove, we need to find a counter-example.)
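Before attempting a proof, it can help to hunt for a counter-example mechanically. The sketch below compares both sides of each candidate on all strings up to a fixed length. The bound `MAX_LEN`, the sample languages, and the helper names are assumptions for illustration. Agreement up to the bound suggests a law may hold but does not prove it; a mismatch is a genuine counter-example.

```python
def concat(X, Y, max_len):
    """Concatenation of two languages, keeping only strings of length <= max_len."""
    return {x + y for x in X for y in Y if len(x + y) <= max_len}

def star(X, max_len):
    """All strings of X* with length <= max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from X; keep only new ones.
        frontier = concat(frontier, X, max_len) - result
        result |= frontier
    return result

MAX_LEN = 4              # hypothetical bound, chosen only to keep the sets small
L, M = {"a"}, {"b"}      # hypothetical sample languages

# Candidate 1: L + ML = (L + M)L
left1 = L | concat(M, L, MAX_LEN)
right1 = concat(L | M, L, MAX_LEN)
print(left1 == right1, sorted(left1 ^ right1))

# Candidate 2: (L + M)* = (L*M*)*
left2 = star(L | M, MAX_LEN)
right2 = star(concat(star(L, MAX_LEN), star(M, MAX_LEN), MAX_LEN), MAX_LEN)
print(left2 == right2)
```

With these sample languages the first candidate already fails ("a" is generated only by the left side and "aa" only by the right), while the two sides of the second agree on every string up to the bound.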



Applications of RE

Two common applications of RE:

Lexical analysis in compilers

Finding patterns in text



Lexical Analyzer

Recognize tokens in a program's source code.

The tokens can be variable names, reserved words, operators, numbers, etc.

Each kind of token can be specified as an RE, e.g., a variable name is of the form [A-Za-z][A-Za-z0-9]*. We can then construct an ε-NFA to recognize it automatically.



Lexical Analyzer

By putting all these ε-NFAs together, we obtain one that can recognize different kinds of tokens in the input string.

We can convert this ε-NFA to an NFA and then to a DFA, and implement this DFA as a deterministic program: the lexical analyzer.
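The token-specification step can be sketched with Python's `re` module. This is an illustration only: the token names and the operator set are assumptions, and Python's `re` is a backtracking matcher rather than an explicitly constructed DFA, so the sketch shows how one RE per token kind combines into a single recognizer, not the automaton conversion itself.

```python
import re

# One RE per kind of token; names and the operator set are hypothetical.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("NAME",   r"[A-Za-z][A-Za-z0-9]*"),   # the variable-name RE from the slide
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"[ \t]+"),                 # whitespace, recognized but discarded
]

# Combine all token REs into one RE using union (written | in this notation).
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (kind, lexeme) pairs, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x1 = y + 42"))
```

Reserved words are left out of the sketch; a common approach is to match them as names first and then look them up in a keyword table.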



Text Search

grep in Unix stands for Global (search for) Regular Expression and Print.

Unix has its own notations for regular expressions:

Dot . stands for any character.

[a1a2...ak] stands for {a1, a2, ..., ak}, e.g., [bcd12] stands for the set {b, c, d, 1, 2}.

[x-y] stands for all characters from x to y in the ASCII sequence.



Text Search

| means or, i.e., + in our normal notation.

* means Kleene star, as in our normal notation.

? means zero or one, e.g., R? is ε + R

+ means one or more, e.g., R+ is RR*

{n} means n copies of, e.g., R{5} is RRRRR

(You can find out more by man grep, man regex)

We can use these notations to search for string patterns in text.
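These operators can be tried out directly. The sketch below uses Python's `re` module, which shares the operators listed above with Unix extended regular expressions. The sample patterns and the helper `matches` are assumptions for illustration.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"colou?r", "color"))    # ?   : zero or one 'u'
print(matches(r"colou?r", "colour"))
print(matches(r"ab+", "a"))            # +   : needs at least one 'b'
print(matches(r"(ab){3}", "ababab"))   # {3} : exactly three copies of 'ab'
print(matches(r"cat|dog", "dog"))      # |   : union
print(matches(r"[b-d].", "c!"))        # a character from b to d, then any character
```

Only the third call reports no match, since `ab+` requires at least one b.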



Text Search

For example, credit card numbers:

[0-9]{16} | [0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}

For example, phone numbers:

[0-9]{8} | [0-9]{3}-[0-9]{5} | 852-[0-9]{8} | 852-[0-9]{3}-[0-9]{5}
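The two patterns above can be used to scan text as follows. This is a minimal sketch using Python's `re` module: the sample text is made up, the spaces around `|` are dropped because a space inside a pattern would be matched literally, and the `\b` word-boundary anchors are an added assumption so that a match cannot start or stop in the middle of a longer run of digits.

```python
import re

# Patterns from the slide, wrapped in \b ... \b (an assumption, see above).
CARD = re.compile(r"\b(?:[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})\b")
PHONE = re.compile(
    r"\b(?:[0-9]{8}|[0-9]{3}-[0-9]{5}|852-[0-9]{8}|852-[0-9]{3}-[0-9]{5})\b"
)

# Made-up sample text for illustration.
text = "Call 852-12345678 or 123-45678; card 1111-2222-3333-4444."

print(PHONE.findall(text))
print(CARD.findall(text))
```

On this sample, the phone pattern finds the two phone-like strings and the card pattern finds the single hyphenated 16-digit number.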
