06L3


  • 8/13/2019 06L3


    2014/2/3 CSC 3130 Formal Languages and Automata Theory 1

    DFA → RE

    How do we construct an RE for the following DFA?

    By observation: (0+1)*0 + ε

    A systematic method?

    Assume that we are given a DFA with states labeled q0, q1, …, qn, where q0 is the start state.

    [Figure: DFA with states q0 (start) and q1; edge labels 0, 0, 1, 1]

    DFA → RE

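    We define a term $R_{ij}^{k}$ to denote the set of all strings that take the DFA M from $q_i$ to $q_j$ with intermediate states going through $q_0, q_1, q_2, \ldots$, or $q_k$ only.

    e.g.

    $R_{00}^{-1} = \{\varepsilon, 0\}$, $R_{00}^{0} = \{\varepsilon, 0, 00, 000, \ldots\} = 0^{*}$

    $R_{01}^{-1} = \{1\}$, $R_{01}^{0} = \{1, 01, 001, 0001, \ldots\} = 0^{*}1$

    Note that $101 \notin R_{01}^{0}$. Why?

    [Figure: example DFA with states q0 (start) and q1; edge labels 0, 0, 1, 1]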
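The definition can be tried out by brute force. The sketch below is only an illustration: the transition table is inferred from the example sets on this slide (the q1 row in particular is an assumption, since the diagram itself is not reproduced here), and the names `DELTA`, `in_R` and `enumerate_R` are made up for this example. A string belongs to the set for (i, j, k) when it leads from state i to state j and every state visited strictly in between has index at most k.

```python
from itertools import product

# Transition table inferred from the example sets on the slide.
# The q1 row is an assumption; the original diagram is not available.
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 0, (1, "1"): 1}

def in_R(i, j, k, w):
    """True if w takes the DFA from q_i to q_j and every intermediate
    state (endpoints excluded) has index <= k."""
    state = i
    for pos, symbol in enumerate(w):
        state = DELTA[(state, symbol)]
        is_last = pos == len(w) - 1
        if not is_last and state > k:
            return False
    return state == j

def enumerate_R(i, j, k, max_len):
    """All strings of length <= max_len in R_ij^k, shortest first."""
    found = []
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if in_R(i, j, k, w):
                found.append(w)
    return found

print(enumerate_R(0, 0, -1, 3))   # strings from q0 to q0 with no intermediate state
print(enumerate_R(0, 1, 0, 4))    # strings from q0 to q1 passing through q0 only
print(in_R(0, 1, 0, "101"))       # 101 visits q1 in the middle
```

Under these assumptions, 101 is rejected for k = 0 because its path visits q1 before the final step, and it is accepted once k = 1 allows q1 as an intermediate state.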

    DFA → RE

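    $R_{ij}^{-1} = \{a \mid \delta(q_i, a) = q_j\}$ if $i \neq j$

    $R_{ij}^{-1} = \{a \mid \delta(q_i, a) = q_j\} \cup \{\varepsilon\}$ if $i = j$

    $R_{ij}^{k} = R_{ij}^{k-1} \cup R_{ik}^{k-1} (R_{kk}^{k-1})^{*} R_{kj}^{k-1}$

    [Figure: a path in M from $q_i$ to $q_j$ passing through $q_k$]

    We can recursively build the RE for $R_{ij}^{0}, R_{ij}^{1}, R_{ij}^{2}, \ldots, R_{ij}^{n}$.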
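As a check on the recurrence, one entry can be worked out by hand for the earlier example. The derivation below takes i = 0, j = 1, k = 0 and uses only the base sets stated on the previous slide, $R_{00}^{-1} = \{\varepsilon, 0\}$ and $R_{01}^{-1} = \{1\}$; it is a worked illustration, not part of the original slides.

```latex
\begin{align*}
R_{01}^{0} &= R_{01}^{-1} \cup R_{00}^{-1}\,(R_{00}^{-1})^{*}\,R_{01}^{-1} \\
           &= \{1\} \cup \{\varepsilon, 0\}\,\{\varepsilon, 0\}^{*}\,\{1\} \\
           &= \{1\} \cup 0^{*}1 \\
           &= 0^{*}1
\end{align*}
```

The result agrees with the set $\{1, 01, 001, 0001, \ldots\}$ listed for the example.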

    DFA → RE

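    Let $r_{ij}^{k}$ be an RE for $R_{ij}^{k}$. Compute $r_{ij}^{-1}$ for all i, j = 0…n, then $r_{ij}^{0}$ for all i, j = 0…n, then $r_{ij}^{1}$ for all i, j = 0…n, …, until $r_{ij}^{n}$ for all i, j = 0…n, according to the formula:

    $r_{ij}^{k} = r_{ij}^{k-1} + r_{ik}^{k-1} (r_{kk}^{k-1})^{*} r_{kj}^{k-1}$

    The RE for the language accepted by M:

    $r_{0j_1}^{n} + r_{0j_2}^{n} + \cdots + r_{0j_p}^{n}$

    where $q_0$ is the start state and $F = \{q_{j_1}, q_{j_2}, \ldots, q_{j_p}\}$.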
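The procedure above can be sketched in Python as follows. This is a minimal illustration under several assumptions: the output uses Python `re` syntax (`|` in place of +, the empty string in place of ε, and `None` in place of the empty language), the helper names are invented for this example, and the two-state transition table with q0 as the only accepting state is inferred rather than taken from the slides. No simplification is attempted, so the resulting pattern is long.

```python
import re
from itertools import product

def union(a, b):
    # None stands for the empty language (no strings at all).
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else f"(?:{a}|{b})"

def concat(*parts):
    # Concatenating with the empty language gives the empty language.
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def star(a):
    # The star of the empty language, or of epsilon, is epsilon ("").
    return "" if a is None or a == "" else f"(?:{a})*"

def dfa_to_regex(n_states, delta, finals, alphabet="01"):
    """Return a Python `re` pattern for the DFA's language (start state 0)."""
    # Base case r_ij^{-1}: single symbols, plus epsilon when i == j.
    r = [[None] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in range(n_states):
            for a in alphabet:
                if delta[(i, a)] == j:
                    r[i][j] = union(r[i][j], a)
            if i == j:
                r[i][j] = union(r[i][j], "")
    # Inductive step: additionally allow state k as an intermediate state.
    for k in range(n_states):
        r = [[union(r[i][j], concat(r[i][k], star(r[k][k]), r[k][j]))
              for j in range(n_states)] for i in range(n_states)]
    # Union over all accepting states, starting from state 0.
    result = None
    for j in finals:
        result = union(result, r[0][j])
    return result

def accepts(delta, finals, w):
    """Run the DFA directly on w, starting from state 0."""
    state = 0
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def agrees_up_to(pattern, delta, finals, max_len, alphabet="01"):
    """Compare the pattern with the DFA on every string up to max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if (re.fullmatch(pattern, w) is not None) != accepts(delta, finals, w):
                return False
    return True

# Assumed example DFA: two states, q0 is both start and accepting state.
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 0, (1, "1"): 1}
PATTERN = dfa_to_regex(2, DELTA, finals={0})
print(agrees_up_to(PATTERN, DELTA, {0}, 6))
```

Comparing the pattern with a direct simulation on all short strings does not prove equivalence, but it is a cheap sanity check on the construction.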

    DFA, NFA, ε-NFA and RE

    [Figure: diagram relating DFA, NFA, ε-NFA and RE]

    DFA, NFA, ε-NFA and RE are all equivalent.

    A language that can be described by them is called a Regular Language.

    Algebraic Rules for RE

    Commutative Rule

    Associative Rule

    Distributive Rule

    Identity

    Commutative & Associative Rules

    Let L, M and N be regular expressions. Which of the following are correct?

    Commutative Rules

    L + M = M + L

    LM = ML

    Associative Rules

    (L + M) + N = L + (M + N)

    (LM)N = L(MN)
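One way to probe these four candidate rules is to model languages as finite Python sets of strings and compare both sides. The sketch below is an illustration only; the sample languages and the helper `concat` are assumptions. Agreement on one sample does not prove a rule, but a single disagreement is enough to refute it.

```python
def concat(L, M):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in L for y in M}

# Arbitrary sample languages (assumptions for this illustration).
L, M, N = {"a"}, {"b"}, {"c", "cc"}

union_commutes = (L | M) == (M | L)
concat_commutes = concat(L, M) == concat(M, L)      # {"ab"} versus {"ba"}
union_associates = ((L | M) | N) == (L | (M | N))
concat_associates = concat(concat(L, M), N) == concat(L, concat(M, N))

print(union_commutes, concat_commutes, union_associates, concat_associates)
```

On this sample only the commutativity of concatenation fails, since ab and ba are different strings.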

    Distributive Rules

    Which of the following are correct?

    Left Distributive Rules

    L(M + N) = LM + LN

    L + (MN) = (L + M)(L + N)

    Right Distributive Rules

    (M + N)L = ML + NL

    (MN) + L = (M + L)(N + L)
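The four candidates can be tested on small finite languages, modeled as Python sets of strings. This is a sketch with assumed sample languages and an assumed helper `concat`; it can refute a rule by counter-example but cannot prove one.

```python
def concat(L, M):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in L for y in M}

# Arbitrary single-string sample languages (assumptions for this illustration).
L, M, N = {"a"}, {"b"}, {"c"}

# Concatenation distributing over union.
left_concat_over_union = concat(L, M | N) == (concat(L, M) | concat(L, N))
right_concat_over_union = concat(M | N, L) == (concat(M, L) | concat(N, L))

# Union "distributing" over concatenation.
left_union_over_concat = (L | concat(M, N)) == concat(L | M, L | N)
right_union_over_concat = (concat(M, N) | L) == concat(M | L, N | L)

print(left_concat_over_union, right_concat_over_union)
print(left_union_over_concat, right_union_over_concat)
```

Here L + MN is {a, bc}, whereas (L + M)(L + N) is {aa, ac, ba, bc}, so the second candidate in each group fails on this sample.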

    Identities

    What is the identity for union?

    ? + L = L + ? = L

    What is the identity for concatenation?

    ?L = L? = L
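Candidate answers can be tried out with languages modeled as Python sets of strings. The sketch below assumes the usual candidates, the empty language ∅ for union and the language {ε} for concatenation, and checks them on one arbitrary sample language; the names are illustrative.

```python
def concat(L, M):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in L for y in M}

EMPTY = set()      # the empty language, written ∅
EPSILON = {""}     # the language containing only the empty string ε

L = {"0", "01", "110"}   # arbitrary sample language (an assumption)

union_identity_holds = (EMPTY | L == L) and (L | EMPTY == L)
concat_identity_holds = concat(EPSILON, L) == L and concat(L, EPSILON) == L

# ∅ does not work as an identity for concatenation: it wipes the language out.
empty_annihilates = concat(EMPTY, L) == set() and concat(L, EMPTY) == set()

print(union_identity_holds, concat_identity_holds, empty_annihilates)
```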

    Other Rules for Kleene Closure

    (L*)* = L*

    L⁺ = LL* = L*L

    L* = L⁺ + ε

    ∅* = ε

    ε* = ε
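Because L* is infinite in general, these rules can only be spot-checked on strings up to some bounded length. The sketch below does that with languages as Python sets; the bound `MAX_LEN`, the sample language and the helper names are assumptions chosen for illustration.

```python
def concat(L, M, max_len):
    """Concatenation, keeping only strings of length <= max_len."""
    return {x + y for x in L for y in M if len(x) + len(y) <= max_len}

def star(L, max_len):
    """All strings of L* with length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend the newest strings by one more factor from L.
        frontier = concat(frontier, L, max_len) - result
        result |= frontier
    return result

def plus(L, max_len):
    """All strings of L+ = LL* with length <= max_len."""
    return concat(L, star(L, max_len), max_len)

MAX_LEN = 6
L = {"a", "bb"}   # arbitrary sample language (an assumption)

star_of_star = star(star(L, MAX_LEN), MAX_LEN) == star(L, MAX_LEN)
plus_both_ways = plus(L, MAX_LEN) == concat(star(L, MAX_LEN), L, MAX_LEN)
star_from_plus = star(L, MAX_LEN) == plus(L, MAX_LEN) | {""}
star_of_empty = star(set(), MAX_LEN) == {""}
star_of_epsilon = star({""}, MAX_LEN) == {""}

print(star_of_star, plus_both_ways, star_from_plus, star_of_empty, star_of_epsilon)
```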

    Class Discussion

    Which of the following are correct, and why?

    L + ML = (L + M)L

    (L + M)* = (L*M*)*

    (To prove, we need to show that any string generated by the RE on the right can also be generated by the RE on the left, and vice versa. Try using the above algebraic rules. To disprove, we need to find a counter-example.)
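A quick experiment can suggest which of the two claims to try to prove and which to try to refute. The sketch below compares both sides on small sample languages, truncating every language to strings of length at most `N_MAX`; the sample languages, the bound and the helper names are assumptions. A mismatch is a genuine counter-example, while a match on bounded strings is only evidence and still needs a proof.

```python
def concat(L, M, max_len):
    """Concatenation, keeping only strings of length <= max_len."""
    return {x + y for x in L for y in M if len(x) + len(y) <= max_len}

def star(L, max_len):
    """All strings of L* with length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, L, max_len) - result
        result |= frontier
    return result

L, M = {"a"}, {"b"}   # arbitrary sample languages (an assumption)
N_MAX = 6

# First claim: L + ML versus (L + M)L.
lhs1 = L | concat(M, L, N_MAX)
rhs1 = concat(L | M, L, N_MAX)

# Second claim: (L + M)* versus (L*M*)*, compared up to length N_MAX.
lhs2 = star(L | M, N_MAX)
rhs2 = star(concat(star(L, N_MAX), star(M, N_MAX), N_MAX), N_MAX)

print(sorted(lhs1), sorted(rhs1))
print(lhs2 == rhs2)
```

On this sample the first claim fails: the left side contains a, which the right side cannot produce because every string of (L + M)L has length 2 here.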

    Applications of RE

    Two common applications of RE:

    Lexical analysis in compilers

    Finding patterns in text

    Lexical Analyzer

    Recognize tokens in a program's source code.

    The tokens can be variable names, reserved words, operators, numbers, etc.

    Each kind of token can be specified as an RE, e.g., a variable name is of the form [A-Za-z][A-Za-z0-9]*. We can then construct an ε-NFA to recognize it automatically.
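The variable-name pattern above can be tried directly with Python's `re` module, whose syntax for character classes and the star agrees with the notation used here. This is only a sketch: the function name is made up, and Python's engine matches by backtracking rather than by building the ε-NFA described on the slide.

```python
import re

# The pattern from the slide: a letter followed by any number of letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_variable_name(s):
    """True if the whole string s has the form [A-Za-z][A-Za-z0-9]*."""
    return IDENTIFIER.fullmatch(s) is not None

for candidate in ["x1", "total", "1x", "a_b", ""]:
    print(repr(candidate), is_variable_name(candidate))
```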

    Lexical Analyzer

    By putting all these ε-NFAs together, we obtain one that can recognize different kinds of tokens in the input string.

    We can convert this ε-NFA to an NFA and then to a DFA, and implement this DFA as a deterministic program, the lexical analyzer.
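The idea of combining one RE per token kind into a single recognizer can be sketched with Python's `re` module. The token kinds, their patterns and the names `TOKEN_SPEC`, `MASTER` and `tokenize` are assumptions made up for this illustration. Python's matcher backtracks instead of running a DFA, so this shows the combination step only, not the ε-NFA to DFA conversion.

```python
import re

# One pattern per kind of token (hypothetical token set for illustration).
# Earlier entries take priority when several patterns match at one position.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("NAME", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[+\-*/=<>]"),
    ("SKIP", r"\s+"),
]

# Union of all token patterns, each in its own named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("if x1 = 42 + y"))
```

Listing KEYWORD before NAME is a design choice: reserved words also match the variable-name pattern, so the order decides which kind is reported.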

    Text Search

    grep in Unix stands for Global (search for) Regular Expression and Print.

    Unix has its own notations for regular expressions:

    Dot . stands for any character.

    [a1a2…ak] stands for {a1, a2, …, ak}, e.g., [bcd12] stands for the set {b, c, d, 1, 2}.

    [x-y] stands for all characters from x to y in the ASCII sequence.

    Text Search

    | means or, i.e., + in our normal notation.

    * means Kleene star, as in our normal notation.

    ? means zero or one, e.g., R? is ε + R

    + means one or more, e.g., R+ is RR*

    {n} means n copies of, e.g., R{5} is RRRRR

    (You can find out more with man grep and man regex.)

    We can use these notations to search for string patterns in text.
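These operators can be tried out in Python, whose `re` module uses the same symbols for them. The sketch below pairs each notation with one string that should match and one that should not; the sample patterns and the helper `matches` are assumptions for illustration. With grep itself, the -E option is typically needed to write |, ?, + and {n} without backslashes.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, string expected to match, string expected not to match)
CHECKS = [
    (r"a|b", "b", "c"),              # | means or
    (r"ab*", "abbb", "ba"),          # * is the Kleene star
    (r"ab?", "a", "abb"),            # ? means zero or one
    (r"ab+", "abb", "a"),            # + means one or more
    (r"a{5}", "aaaaa", "aaaa"),      # {n} means n copies
    (r"[bcd12].", "cz", "az"),       # a character set, then any character
    (r"[a-f]+", "cafe", "cage"),     # a character range
]

RESULTS = [(p, matches(p, good), matches(p, bad)) for p, good, bad in CHECKS]
for pattern, good_ok, bad_ok in RESULTS:
    print(pattern, good_ok, bad_ok)
```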

    Text Search

    For example, credit card numbers:

    [0-9]{16} | [0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}

    For example, phone numbers:

    [0-9]{8} | [0-9]{3}-[0-9]{5} | 852-[0-9]{8} | 852-[0-9]{3}-[0-9]{5}
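Both patterns can be exercised in Python as a rough check. In the sketch below the spaces around | are removed, on the assumption that they are there for readability only (inside a real pattern they would be matched literally); the names `CARD`, `PHONE`, `is_card_number` and `is_phone_number` are illustrative.

```python
import re

# The two patterns from the slide, with the spaces around | removed.
CARD = re.compile(r"[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")
PHONE = re.compile(
    r"[0-9]{8}|[0-9]{3}-[0-9]{5}|852-[0-9]{8}|852-[0-9]{3}-[0-9]{5}"
)

def is_card_number(s):
    """True if the whole string matches the credit card pattern."""
    return CARD.fullmatch(s) is not None

def is_phone_number(s):
    """True if the whole string matches the phone number pattern."""
    return PHONE.fullmatch(s) is not None

print(is_card_number("1234-5678-9012-3456"))
print(is_phone_number("852-123-45678"))
print(PHONE.findall("call 12345678 or 852-123-45678"))
```

The patterns check the shape of the digits only; they say nothing about whether a number is actually valid or in use.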