CMSC 330: Organization of Programming Languages

Transcript of CMSC 330: Organization of Programming Languages

Page 1: CMSC 330:  Organization of Programming Languages

CMSC 330: Organization of Programming Languages

Theory of Regular Expressions

Finite Automata

Page 2: CMSC 330:  Organization of Programming Languages

CMSC 330 2

Regular Expression Review

• Terms
– Alphabet
– String
– Language
– Regular expression (“regex”)

• Operations
– Concatenation
– Union
– Kleene closure

• Ruby vs. theoretical regexes

Page 3: CMSC 330:  Organization of Programming Languages

Previous Course Review

• {s | cond(s)} means the set of all strings s such that the given condition applies

• s ∈ A means s is an element of the set A

• De Morgan’s Laws (X^C denotes the complement of X):
(A ∩ B)^C = A^C ∪ B^C
(A ∪ B)^C = A^C ∩ B^C

• Quantifiers and Qualifiers
– Existential quantifier (“there exists”): ∃
– Universal quantifier (“for all”): ∀
– Qualifier (“such that”): s.t.
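De Morgan's laws can be spot-checked on small finite sets. The sketch below is only an illustration: the universe U and the sets A and B are made-up values, and the complement is taken relative to U.

```python
# Hypothetical universe and sets, chosen only for illustration.
U = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def complement(s):
    """Complement of s relative to the universe U."""
    return U - s

# (A ∩ B)^C = A^C ∪ B^C
law1 = complement(A & B) == complement(A) | complement(B)
# (A ∪ B)^C = A^C ∩ B^C
law2 = complement(A | B) == complement(A) & complement(B)
print(law1, law2)
```

A check on one pair of sets is not a proof, but it is a quick way to catch a mis-stated law.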

Page 4: CMSC 330:  Organization of Programming Languages

Implementing Regular Expressions

• We can implement regular expressions by turning them into a finite automaton
– A “machine” for recognizing a regular language

Page 5: CMSC 330:  Organization of Programming Languages

Example

• Machine starts in start or initial state

• Repeat until the end of the string is reached:
– Scan the next symbol s of the string
– Take transition edge labeled with s

• The string is accepted if the automaton is in a final or accepting state when the end of the string is reached

[Figure: example automaton, with callouts labeling the states, the start state, the final state, and a transition on 1]

Page 6: CMSC 330:  Organization of Programming Languages

Example

0 0 1 0 1 1: accepted

Page 7: CMSC 330:  Organization of Programming Languages

Example

0 0 1 0 1 0: not accepted

Page 8: CMSC 330:  Organization of Programming Languages

What Language is This?

• All strings over Σ = {0, 1} that end in 1

• What is a regular expression for this language?
(0|1)*1
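The regular expression can be tried out directly. The sketch below uses Python's `re` module rather than Ruby (an assumption made for this example) and `fullmatch`, so that the whole string must belong to the language, as in the theoretical definition.

```python
import re

# (0|1)*1 : any string over {0, 1} whose last symbol is 1
ENDS_IN_ONE = re.compile(r"(0|1)*1")

def ends_in_one(s):
    """True if the entire string s matches (0|1)*1."""
    return ENDS_IN_ONE.fullmatch(s) is not None

print(ends_in_one("001011"))  # the string from the accepted example
print(ends_in_one("001010"))  # the string from the rejected example
```

Practical regex engines search for a match anywhere in the string by default, which is why the sketch asks for a full match explicitly.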

Page 9: CMSC 330:  Organization of Programming Languages

Formal Definition

• A deterministic finite automaton (DFA) is a 5-tuple (Σ, Q, q0, F, δ) where
– Σ is an alphabet
– Q is a nonempty set of states
– q0 ∊ Q is the start state
– F ⊆ Q is the set of final states
– δ : Q x Σ → Q specifies the DFA's transitions

Page 10: CMSC 330:  Organization of Programming Languages

More on DFAs

• A finite state automaton can have more than one final state

• A string is accepted as long as there is at least one path to a final state

Page 11: CMSC 330:  Organization of Programming Languages

Our Example, Stated Formally

Σ = {0, 1}

Q = {S0, S1}

q0 = S0

F = {S1}

δ = { (S0, 0, S0), (S0, 1, S1), (S1, 0, S0), (S1, 1, S1) }

δ     0     1
S0    S0    S1
S1    S0    S1
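The 5-tuple above translates almost directly into code. The following is a minimal sketch, assuming a dictionary keyed by (state, symbol) as the encoding of δ; the names `run` and `accepts` are chosen for this example.

```python
# The example DFA, written as the 5-tuple (Σ, Q, q0, F, δ).
SIGMA = {"0", "1"}
Q = {"S0", "S1"}
Q0 = "S0"
F = {"S1"}
DELTA = {
    ("S0", "0"): "S0", ("S0", "1"): "S1",
    ("S1", "0"): "S0", ("S1", "1"): "S1",
}

def run(string):
    """Return the list of states visited while scanning string."""
    state = Q0
    trace = [state]
    for symbol in string:
        state = DELTA[(state, symbol)]  # exactly one move per symbol
        trace.append(state)
    return trace

def accepts(string):
    """Accept if the DFA ends in a final state."""
    return run(string)[-1] in F

print(run("001011"), accepts("001011"))
print(run("001010"), accepts("001010"))
```

Because δ is a function, each input string has exactly one trace, and the run takes time proportional to the length of the string.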

Page 12: CMSC 330:  Organization of Programming Languages

Another Example

(a,b,c notation shorthand for three self loops)

string    state at end    accepts?
aabcc     S2              Y
acc       S2              Y
bbc       S2              Y
aabbb     S1              Y
aa        S0              Y
ε         S0              Y
acba      S3              N

Page 13: CMSC 330:  Organization of Programming Languages

Another Example (cont’d)

What language does this DFA accept? a*b*c*

S3 is a dead state – a nonfinal state with no transition to another state
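The diagram for this DFA is not part of the transcript, so the sketch below uses a transition table inferred from the string table and from the description of S3 as a dead state. Treat the table as an assumption: it is one DFA for a*b*c* that reproduces every row of the string table.

```python
START = "S0"
FINAL = {"S0", "S1", "S2"}   # assumption: every state except the dead state
DELTA = {
    "S0": {"a": "S0", "b": "S1", "c": "S2"},
    "S1": {"a": "S3", "b": "S1", "c": "S2"},
    "S2": {"a": "S3", "b": "S3", "c": "S2"},
    "S3": {"a": "S3", "b": "S3", "c": "S3"},  # dead state: three self loops
}

def end_state(string):
    """State the DFA is in after reading the whole string."""
    state = START
    for symbol in string:
        state = DELTA[state][symbol]
    return state

for s in ["aabcc", "acc", "bbc", "aabbb", "aa", "", "acba"]:
    state = end_state(s)
    print(repr(s), state, "Y" if state in FINAL else "N")
```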

Page 14: CMSC 330:  Organization of Programming Languages

Shorthand Notation

• If a transition is omitted, assume it goes to a dead state that is not shown

[Figure: a DFA drawn with omitted transitions] is short for [Figure: the same DFA with the dead state drawn explicitly]

Language?

Strings over {0,1,2,3} with alternating even and odd digits, beginning with odd digit
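The shorthand has a natural counterpart in code: store only the transitions that are drawn and treat a missing entry as a move to the dead state. The machine below is a hypothetical DFA for the language described, since the figure is not in the transcript; in particular, the choice of final states (and therefore the rejection of the empty string) is an assumption.

```python
START = "S0"
FINAL = {"S1", "S2"}          # assumption: any nonempty alternating string is accepted
PARTIAL_DELTA = {
    ("S0", "1"): "S1", ("S0", "3"): "S1",   # must begin with an odd digit
    ("S1", "0"): "S2", ("S1", "2"): "S2",   # odd digit followed by even
    ("S2", "1"): "S1", ("S2", "3"): "S1",   # even digit followed by odd
}

def accepts(string):
    """Run the DFA; a missing transition means the implicit dead state."""
    state = START
    for symbol in string:
        state = PARTIAL_DELTA.get((state, symbol))
        if state is None:      # fell into the dead state: no way back
            return False
    return state in FINAL

print(accepts("1030"), accepts("13"), accepts("0"))
```

Returning early is safe because a dead state is nonfinal and has no transition to another state.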

Page 15: CMSC 330:  Organization of Programming Languages

What Lang. Does This DFA Accept?

a*b*c* again, so DFAs are not unique

Page 16: CMSC 330:  Organization of Programming Languages

Equivalent DFAs

=

These DFAs are equivalent in the sense that they accept the same language.

Page 17: CMSC 330:  Organization of Programming Languages

DFA State Semantics

– S0 = “Haven’t seen anything yet” OR “seen zero or more b’s” OR “Last symbol seen was a b”
– S1 = “Last symbol seen was an a”
– S2 = “Last two symbols seen were ab”
– S3 = “Last three symbols seen were abb”

• Language?
• (a|b)*abb
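The state descriptions are enough to write down a transition table, even though the diagram is missing from the transcript. The table below is inferred from those descriptions (an assumption), and the sketch compares the resulting DFA against Python's `re` on every string over {a, b} of length at most 6.

```python
import re
from itertools import product

# Inferred from the state semantics: each state records the longest
# useful suffix of the input seen so far ("", "a", "ab", "abb").
DELTA = {
    "S0": {"a": "S1", "b": "S0"},
    "S1": {"a": "S1", "b": "S2"},
    "S2": {"a": "S1", "b": "S3"},
    "S3": {"a": "S1", "b": "S0"},
}

def dfa_accepts(string):
    state = "S0"
    for symbol in string:
        state = DELTA[state][symbol]
    return state == "S3"

PATTERN = re.compile(r"(a|b)*abb")

# Collect every short string on which the DFA and the regex disagree.
mismatches = [
    "".join(w)
    for n in range(7)
    for w in product("ab", repeat=n)
    if dfa_accepts("".join(w)) != (PATTERN.fullmatch("".join(w)) is not None)
]
print(mismatches)
```

An exhaustive comparison on short strings does not prove equivalence, but it is a cheap sanity check for a hand-built table.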

Page 18: CMSC 330:  Organization of Programming Languages

Review

• Basic parts of a regular expression?
concatenation, |, *, ε, ∅, {a}

• What does a DFA do?
implements regular expressions; accepts strings

• Basic parts of a DFA?
alphabet, set of states, start state, final states, transition function (Σ, Q, q0, F, δ)

Page 19: CMSC 330:  Organization of Programming Languages

Notes about the DFA definition

• Cannot have more than one transition leaving a state on the same symbol
– the transition function must be a valid function

• Cannot have transitions with no or empty labels
– the transitions must be labeled by alphabet symbols
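Both restrictions can be checked mechanically when transitions are given as (state, symbol, state) triples. The helper below is a hypothetical sketch written for this note; it does not insist that every state has a transition on every symbol, on the assumption that omitted transitions go to an implicit dead state.

```python
def check_dfa_transitions(sigma, triples):
    """Return a list of reasons why `triples` cannot be a DFA's δ."""
    problems = []
    seen = {}
    for (p, symbol, q) in triples:
        if symbol not in sigma:
            # Covers empty labels (ε-moves) and symbols outside Σ.
            problems.append(f"label {symbol!r} on {p}->{q} is not in the alphabet")
        elif (p, symbol) in seen and seen[(p, symbol)] != q:
            problems.append(f"{p} has two transitions on {symbol!r}")
        else:
            seen[(p, symbol)] = q
    return problems

good = [("S0", "0", "S0"), ("S0", "1", "S1"), ("S1", "0", "S0"), ("S1", "1", "S1")]
bad = good + [("S0", "1", "S0"), ("S1", "", "S0")]

print(check_dfa_transitions({"0", "1"}, good))
print(check_dfa_transitions({"0", "1"}, bad))
```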

Page 20: CMSC 330:  Organization of Programming Languages

Nondeterministic Finite Automata (NFA)

• An NFA is a 5-tuple (Σ, Q, q0, F, δ) where
– Σ is an alphabet
– Q is a nonempty set of states
– q0 ∊ Q is the start state
– F ⊆ Q is the set of final states
– δ ⊆ Q x (Σ ∪ {ε}) x Q specifies the NFA's transitions

• Transitions on ε are allowed
– can optionally take these transitions without consuming any input

• Can have more than one transition for a given state and symbol

• An NFA accepts s if there is at least one path from its start state to a final state on s

Page 21: CMSC 330:  Organization of Programming Languages

NFA for (a|b)*abb

• ba
– Has paths to either S0 or S1
– Neither is final, so rejected

• babaabb
– Has paths to different states
– One leads to S3, so accepted
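"At least one path" can be checked without backtracking by tracking the set of all states the NFA could be in. The transitions below are an assumption inferred from the two traces above (the diagram is not in the transcript); this particular NFA has no ε-transitions.

```python
START = "S0"
FINAL = {"S3"}
# Each (state, symbol) pair maps to a SET of possible next states.
DELTA = {
    ("S0", "a"): {"S0", "S1"},   # the nondeterministic choice
    ("S0", "b"): {"S0"},
    ("S1", "b"): {"S2"},
    ("S2", "b"): {"S3"},
}

def reachable(string):
    """All states some path could be in after reading string."""
    current = {START}
    for symbol in string:
        current = {q for p in current for q in DELTA.get((p, symbol), set())}
    return current

def accepts(string):
    """Accept if at least one path ends in a final state."""
    return bool(reachable(string) & FINAL)

print(sorted(reachable("ba")), accepts("ba"))
print(sorted(reachable("babaabb")), accepts("babaabb"))
```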

Page 22: CMSC 330:  Organization of Programming Languages

Why are NFAs useful?

• Sometimes an NFA is much easier to understand than its equivalent DFA

Page 23: CMSC 330:  Organization of Programming Languages

Another example DFA

• Language?
• (ab|aba)*

Page 24: CMSC 330:  Organization of Programming Languages

NFA for (ab|aba)*

• aba
– Has paths to states S0, S1

• ababa
– Has paths to S0, S1
– Need to use ε-transition

(ab(ε|a))*
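With ε-transitions, the set-of-states simulation needs one extra step: after every move, add all states reachable through ε alone. The three-state NFA below is an assumption reconstructed from the traces above and from the form (ab(ε|a))*; the empty string is used here as the encoding of the ε label.

```python
EPS = ""                       # assumed encoding of the ε label
START = "S0"
FINAL = {"S0"}
DELTA = {
    ("S0", "a"): {"S1"},
    ("S1", "b"): {"S2"},
    ("S2", "a"): {"S0"},       # finish "aba"
    ("S2", EPS): {"S0"},       # finish "ab" without consuming input
}

def eps_closure(states):
    """Add every state reachable from `states` using only ε-transitions."""
    closed, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in DELTA.get((p, EPS), set()):
            if q not in closed:
                closed.add(q)
                stack.append(q)
    return closed

def reachable(string):
    current = eps_closure({START})
    for symbol in string:
        moved = {q for p in current for q in DELTA.get((p, symbol), set())}
        current = eps_closure(moved)
    return current

def accepts(string):
    return bool(reachable(string) & FINAL)

print(sorted(reachable("aba")), sorted(reachable("ababa")))
```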

Page 25: CMSC 330:  Organization of Programming Languages

Relating R.E.'s to DFAs and NFAs

• Regular expressions, NFAs, and DFAs accept the same languages!

[Diagram: DFA, NFA, and RE connected by three “can transform” arrows; one arrow is marked “(we’ll discuss this next)”]

Page 26: CMSC 330:  Organization of Programming Languages

Reducing Regular Expressions to NFAs

• Goal: Given regular expression e, construct NFA: <e> = (Σ, Q, q0, F, δ)
– Remember, REs are defined inductively; i.e. recursively

• Base case: a

<a> = ({a}, {S0, S1}, S0, {S1}, {(S0, a, S1)} )

Page 27: CMSC 330:  Organization of Programming Languages

Reduction (cont’d)

• Base case: ε

<ε> = (∅, {S0}, S0, {S0}, ∅)

• Base case: ∅

<∅> = (∅, {S0, S1}, S0, {S1}, ∅)

Page 28: CMSC 330:  Organization of Programming Languages

Reduction (cont’d)

• Concatenation: AB

<A> <B>

Page 29: CMSC 330:  Organization of Programming Languages

Reduction (cont’d)

• Concatenation: AB

– <A> = (ΣA, QA, qA, {fA}, δA)

– <B> = (ΣB, QB, qB, {fB}, δB)

– <AB> = (ΣA∪ΣB, QA∪QB, qA, {fB}, δA∪ δB∪ {(fA, ε, qB)} )

<A> <B>
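The concatenation rule can be written as a function on 5-tuples. This is a sketch under a few assumptions: NFAs are Python tuples of sets, the string "ε" stands for the ε label, and a counter hands out fresh state names so that QA and QB never collide (the rule itself leaves the renaming implicit).

```python
from itertools import count

EPS = "ε"            # assumed encoding of the ε label
_ids = count()       # source of fresh state numbers

def fresh():
    """Return a state name not used before: 'S0', 'S1', ..."""
    return f"S{next(_ids)}"

def symbol(a):
    """Base case <a>: two states joined by one transition on a."""
    s, f = fresh(), fresh()
    return ({a}, {s, f}, s, {f}, {(s, a, f)})

def concat(A, B):
    """<AB>: start at qA, finish at fB, and link fA to qB with ε."""
    sigma_a, q_a, start_a, final_a, delta_a = A
    sigma_b, q_b, start_b, final_b, delta_b = B
    (f_a,) = final_a   # the reduction always yields exactly one final state
    return (sigma_a | sigma_b, q_a | q_b, start_a, final_b,
            delta_a | delta_b | {(f_a, EPS, start_b)})

sigma, states, start, finals, delta = concat(symbol("a"), symbol("b"))
print(sorted(states), start, finals)
print(sorted(delta))
```

The result for ab has four states and three transitions: one per symbol plus the single ε link added by the concatenation.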

Page 30: CMSC 330:  Organization of Programming Languages

Practice

• Draw the NFA for these regular expressions using the reduction method:
– ab
– hello

• Write the formal (5-tuple) NFA for the same regular expressions

Page 31: CMSC 330:  Organization of Programming Languages

Reduction (cont’d)

• Union: (A|B)

Page 32: CMSC 330:  Organization of Programming Languages

Reduction (cont’d)

• Union: (A|B)

– <A> = (ΣA, QA, qA, {fA}, δA)

– <B> = (ΣB, QB, qB, {fB}, δB)

– <(A|B)> = (ΣA∪ΣB, QA∪QB∪{S0,S1}, S0, {S1}, δA∪δB∪{(S0,ε,qA), (S0,ε,qB), (fA,ε,S1), (fB,ε,S1)})
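The union rule can be sketched the same way. As assumptions of this example, NFAs are tuples of Python sets, "ε" is the label for ε-transitions, and fresh state names come from a counter, so the two new states are not literally called S0 and S1 as in the formula.

```python
from itertools import count

EPS = "ε"            # assumed encoding of the ε label
_ids = count()

def fresh():
    """Return a state name not used before: 'S0', 'S1', ..."""
    return f"S{next(_ids)}"

def symbol(a):
    """Base case <a>: two states joined by one transition on a."""
    s, f = fresh(), fresh()
    return ({a}, {s, f}, s, {f}, {(s, a, f)})

def union(A, B):
    """<(A|B)>: new start and final states, wired to A and B with ε."""
    sigma_a, q_a, start_a, final_a, delta_a = A
    sigma_b, q_b, start_b, final_b, delta_b = B
    (f_a,) = final_a
    (f_b,) = final_b
    s, f = fresh(), fresh()
    new_edges = {(s, EPS, start_a), (s, EPS, start_b),
                 (f_a, EPS, f), (f_b, EPS, f)}
    return (sigma_a | sigma_b, q_a | q_b | {s, f}, s, {f},
            delta_a | delta_b | new_edges)

sigma, states, start, finals, delta = union(symbol("a"), symbol("b"))
print(len(states), len(delta), start, finals)
```

For a|b this gives 2 + 2 + 2 = 6 states and 1 + 1 + 4 = 6 transitions.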

Page 33: CMSC 330:  Organization of Programming Languages

Practice

• Draw the NFA for these regular expressions using exactly the reduction method:
– ab|bc
– hello|hi

• Write the formal NFA for the same regular expressions

Page 34: CMSC 330:  Organization of Programming Languages

Reduction (cont’d)

• Closure: A*

Page 35: CMSC 330:  Organization of Programming Languages

Reduction (cont’d)

• Closure: A*

– <A> = (ΣA, QA, qA, {fA}, δA)

– <A*> = (ΣA, QA∪{S0,S1}, S0, {S1}, δA∪{(fA,ε,S1), (S0,ε,qA), (S0,ε,S1), (S1,ε,S0)})
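A sketch of the closure rule, together with a small ε-aware acceptance check so the result can be exercised. The tuple encoding, the "ε" label, the fresh-name counter, and the helper names are all assumptions of this example.

```python
from itertools import count

EPS = "ε"            # assumed encoding of the ε label
_ids = count()

def fresh():
    return f"S{next(_ids)}"

def symbol(a):
    """Base case <a>: two states joined by one transition on a."""
    s, f = fresh(), fresh()
    return ({a}, {s, f}, s, {f}, {(s, a, f)})

def star(A):
    """<A*>: new start s and final f; s→qA, fA→f, s→f (skip), f→s (repeat)."""
    sigma_a, q_a, start_a, final_a, delta_a = A
    (f_a,) = final_a
    s, f = fresh(), fresh()
    new_edges = {(f_a, EPS, f), (s, EPS, start_a), (s, EPS, f), (f, EPS, s)}
    return (sigma_a, q_a | {s, f}, s, {f}, delta_a | new_edges)

def accepts(nfa, string):
    """Set-of-states simulation that follows ε-transitions."""
    _, _, start, finals, delta = nfa

    def closure(states):
        closed, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for (src, label, dst) in delta:
                if src == p and label == EPS and dst not in closed:
                    closed.add(dst)
                    stack.append(dst)
        return closed

    current = closure({start})
    for ch in string:
        current = closure({dst for (src, label, dst) in delta
                           if src in current and label == ch})
    return bool(current & finals)

a_star = star(symbol("a"))
print(accepts(a_star, ""), accepts(a_star, "aaa"), accepts(a_star, "b"))
```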

Page 36: CMSC 330:  Organization of Programming Languages

Practice

• Draw the NFA for these regular expressions using exactly the reduction method:
– (ab|bc*)*
– hello|(hi)*

• Write the formal NFA for the same regular expressions

Page 37: CMSC 330:  Organization of Programming Languages

Reduction Complexity

• Given a regular expression A of size n...
Size = # of symbols + # of operations

• How many states+transitions does <A> have?
– 2+1 for each symbol
– 0+1 for each concatenation
– 2+4 added for each union
– 2+4 added for each closure
– O(n)
– That’s pretty good!
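The per-operator counts can be added up by recursion over the regular expression. The sketch below assumes a regex is encoded as nested tuples ("sym", "cat", "alt", "star"), an encoding invented for this example, and applies the counts to (ab|bc*)*.

```python
# (states added, transitions added) for each kind of node.
ADDED = {"sym": (2, 1), "cat": (0, 1), "alt": (2, 4), "star": (2, 4)}

def size(e):
    """Size n = number of symbols + number of operations."""
    return 1 + sum(size(sub) for sub in e[1:] if isinstance(sub, tuple))

def cost(e):
    """Total (states, transitions) of <e> under the reduction."""
    states, transitions = ADDED[e[0]]
    for sub in e[1:]:
        if isinstance(sub, tuple):
            s, t = cost(sub)
            states += s
            transitions += t
    return states, transitions

def sym(a):
    return ("sym", a)

# (ab|bc*)*
example = ("star", ("alt", ("cat", sym("a"), sym("b")),
                           ("cat", sym("b"), ("star", sym("c")))))

print(size(example), cost(example))
```

Every node contributes at most 2 + 4 = 6, so states plus transitions is at most 6n, which is the O(n) bound.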

Page 38: CMSC 330:  Organization of Programming Languages

ε-closure

• We say p –ε→ q
– if it is possible to transition from state p to state q taking only ε-transitions
– if ∃ p, p1, p2, … pn, q ∈ Q (p ≠ q) such that (p, ε, p1) ∈ δ, (p1, ε, p2) ∈ δ, …, (pn, ε, q) ∈ δ

• For any state p, the ε-closure of p is defined as the set of states q such that p –ε→ q
– {q | p –ε→ q}

Page 39: CMSC 330:  Organization of Programming Languages

Example

• What’s the ε-closure of S2 in this NFA?

• {S2, S0}
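Computing an ε-closure is a graph search that follows only ε-labeled edges. The NFA in this example is not shown in the transcript, so the transition set below is a hypothetical stand-in in which S2 has a single ε-transition to S0. Following the answer above, the sketch counts the starting state as part of its own closure.

```python
EPS = "ε"   # assumed encoding of the ε label

# Hypothetical NFA, as a set of (state, label, state) triples.
DELTA = {
    ("S0", "a", "S1"),
    ("S1", "b", "S2"),
    ("S2", "a", "S0"),
    ("S2", EPS, "S0"),
}

def eps_closure(p, delta):
    """States reachable from p using zero or more ε-transitions."""
    closure = {p}
    frontier = [p]
    while frontier:
        state = frontier.pop()
        for (src, label, dst) in delta:
            if src == state and label == EPS and dst not in closure:
                closure.add(dst)
                frontier.append(dst)
    return closure

print(sorted(eps_closure("S2", DELTA)))
print(sorted(eps_closure("S0", DELTA)))
```

Each state enters the frontier at most once, so the search finishes after visiting every state at most one time.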