Implementation of Lexical Analysis (Scanning)
CS780(Prasad) L4Lexer 1
Adapted from material by: Prof. Alex Aiken and Prof. George Necula (UCB)
Prof. Saman Amarasinghe (MIT)
Outline
• Specifying lexical structure using regular expressions
• Recognizing tokens using finite automata
Deterministic Finite Automata (DFAs)
Non-deterministic Finite Automata (NFAs)
• Implementation of regular expressions
RegExp => NFA => DFA => Tables
Regular Expressions => Lexical Spec.
1. Write a regular expression for the lexemes of each token
• Number = digit+ (Kleene plus)
• Keyword = ‘if’ | ‘else’ | … (Union)
• Identifier = letter (letter | digit)*
• OpenPar = ‘(’
• …
2. Construct R, matching all lexemes for all tokens
R = Keyword | Identifier | Number | …
= R1 | R2 | …
(cont’d)
3. Let input be the sequence of characters x1…xn
• For each 1 ≤ i ≤ n, check if x1…xi ∈ L(R)
4. It must be that x1…xi ∈ L(Rj) for some j
5. Remove x1…xi from input and go to (3)
Problem: There are ambiguities in the algorithm.
Ambiguities
• How much input is used? What if x1…xi ∈ L(R) and also x1…xK ∈ L(R)?
Rule: Pick the longest possible string in L(R)
The “maximal munch”: << vs <, != vs !
• Which token is used? What if x1…xi ∈ L(Rj) and also x1…xi ∈ L(Rk)?
Rule: Use the rule listed first (j if j < k)
Treats “if” as a keyword, not an identifier
Error Handling
• What if no rule matches a prefix of the input?
• Problem: Can get stuck …
• Solution:
Write a rule matching all “bad” strings
Put it last (catch-all clause)
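The two disambiguation rules and the catch-all clause can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the rule names, the patterns, and the function `scan` are made up for this sketch, and Python's `re` module stands in for the automata developed later in these slides.

```python
import re

# Rules in priority order; the names and patterns are illustrative assumptions.
RULES = [
    ("Keyword",    r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Number",     r"[0-9]+"),
    ("Op",         r"<<|<|!=|!"),      # longer alternatives first
    ("OpenPar",    r"\("),
    ("Space",      r"\s+"),
    ("Bad",        r"."),              # catch-all clause, listed last
]
COMPILED = [(name, re.compile(pattern, re.DOTALL)) for name, pattern in RULES]


def scan(text):
    """Split text into (token, lexeme) pairs, skipping white space."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_lexeme = None, ""
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Maximal munch: keep the longest lexeme. The strict '>' means
            # that on a tie the rule listed first wins.
            if match and len(match.group()) > len(best_lexeme):
                best_name, best_lexeme = name, match.group()
        # "Bad" matches any single character, so the scanner never gets stuck.
        if best_name != "Space":
            tokens.append((best_name, best_lexeme))
        pos += len(best_lexeme)
    return tokens


print(scan("if iffy << 42 !"))
```

Here `if` is reported as a Keyword because Keyword is listed before Identifier, while `iffy` is an Identifier because the longer match wins. An input such as `a#` yields a `Bad` token for `#` instead of stopping the scan.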
Summary
• Regular expressions provide a concise notation for string patterns.
• Use in lexical analysis requires small extensions:
To resolve ambiguities
To handle errors
• Good algorithms known
Require only a single pass over the input
Few operations per character (table lookup)
Parity Problem
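$\Sigma = \{0, 1\}$, $\omega \in \Sigma^*$
$\mathit{parity} : \Sigma^* \rightarrow \mathit{boolean}$
$\mathit{parity}(\omega) \Leftrightarrow \omega$ contains an even number of 1s.
[State diagram: states EP (even parity) and OP (odd parity); each state loops on 0, and 1 moves between EP and OP.]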
Finite automaton = Recognizer
Basic Features
• Consumes the entire input string.
• Remembers the parity of the bit string by abstracting from the number of 1s in the string.
• Finite amount of memory required for this purpose.
Observe that counting requires unbounded memory, while computing the parity requires a very small and fixed amount of memory.
• Accepts/Rejects the input in a deterministic fashion.
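A minimal sketch of the parity recognizer in Python may make these features concrete. The state names EP and OP come from the slide; the dictionary `DELTA`, the function `parity`, and the choice of EP as the start state (the empty string contains zero 1s) are assumptions of this sketch.

```python
# EP = an even number of 1s seen so far, OP = an odd number of 1s seen so far.
DELTA = {
    ("EP", "0"): "EP", ("EP", "1"): "OP",
    ("OP", "0"): "OP", ("OP", "1"): "EP",
}


def parity(bits):
    """Return True iff bits contains an even number of 1s."""
    state = "EP"                      # the only memory the machine has
    for ch in bits:                   # consume the entire input, left to right
        state = DELTA[(state, ch)]    # exactly one possible move per symbol
    return state == "EP"


print(parity("1010"), parity("0010"))
```

The single variable `state` takes one of two values no matter how long the input is, which is the "fixed amount of memory" the slide refers to.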
Deterministic Finite State Automaton (DFA)
$M = (Q, \Sigma, \delta, q_0, F)$
Q: Finite set of states
Σ: Finite alphabet
δ: Transition function (total function from Q × Σ to Q)
q0: Initial/Start state
F: Set of final/accepting states
Finite Automata State Graphs
• The start state
• State
• An accepting state
• A transition (an arrow labeled with an input symbol, such as a)
Operation of the machine
[Figure: an input tape holding 0 0 1, read by the tape head of a finite control that is in state q0.]
• Read the current letter of input under the tape head.
• Transit to a new state depending on the current input and the current state, as dictated by the transition function.
• Halt after consuming the entire input.
Associating Language with DFA
• Machine configuration: $[q, \omega]$ where $q \in Q$, $\omega \in \Sigma^*$
• Yields relation: $[q, a\omega] \vdash_M [\delta(q, a), \omega]$
• Language: $\{\, \omega \in \Sigma^* \mid [q_0, \omega] \vdash_M^* [q, \varepsilon] \wedge q \in F \,\}$
Example
• Set of strings over {a,b} that contain bb: (a | b)* bb (a | b)*
• Design states by partitioning Σ*.
Strings containing bb: q2
Strings not containing bb:
Strings that end in b: q1
Strings that do not end in b: q0
Initial state: q0
Final state: q2
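State Diagram and Table
[State diagram: q0 loops on a and goes to q1 on b; q1 goes to q0 on a and to q2 on b; q2 loops on a and b.]
δ    a    b
q0   q0   q1
q1   q0   q2
q2   q2   q2
$Q = \{q_0, q_1, q_2\}$, $\Sigma = \{a, b\}$, $F = \{q_2\}$
$[q_0, aab] \vdash^* [q_1, \varepsilon]$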
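Strings over {a,b} that do not contain bb
[State diagram: the same states and transitions as for the strings that contain bb; here q0 and q1 are the accepting states.]
δ    a    b
q0   q0   q1
q1   q0   q2
q2   q2   q2
$Q = \{q_0, q_1, q_2\}$, $\Sigma = \{a, b\}$, $F = \{q_0, q_1\}$
$[q_0, ba] \vdash^* [q_0, \varepsilon]$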
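The two machines above share one transition table and differ only in F. A short Python sketch shows this; the names `DELTA`, `run`, `contains_bb`, and `avoids_bb` are assumptions of the sketch, while the table entries are the ones on the slides.

```python
# Transition table shared by both machines (rows: states, columns: a and b).
DELTA = {
    "q0": {"a": "q0", "b": "q1"},
    "q1": {"a": "q0", "b": "q2"},
    "q2": {"a": "q2", "b": "q2"},
}


def run(word, start="q0"):
    """Return the state reached after consuming the whole word."""
    state = start
    for ch in word:
        state = DELTA[state][ch]
    return state


def contains_bb(word):
    return run(word) in {"q2"}            # F = {q2}


def avoids_bb(word):
    return run(word) in {"q0", "q1"}      # F = {q0, q1}, the complement of {q2}


print(run("aab"), run("ba"))              # the two computations shown on the slides
```

Because δ is a total function, every word ends in exactly one state, so swapping accepting and non-accepting states complements the language.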
Strings over {a,b} containing even number of a’s and odd number of b’s.
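[Diagram: Σ* is partitioned by the parity of a’s (Ea, Oa) and the parity of b’s (Eb, Ob), giving the four states [Ea,Eb], [Ea,Ob], [Oa,Eb], and [Oa,Ob]. Each a switches between Ea and Oa; each b switches between Eb and Ob.]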
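The four states are pairs that track two parities at once. A small Python sketch of that idea follows; encoding E as 0 and O as 1, and the names `step` and `accepts`, are assumptions made for illustration. The start state [Ea,Eb] and accepting state [Ea,Ob] follow from the language description.

```python
def step(state, ch):
    """One transition of the pair automaton: a flips the first bit, b the second."""
    a_parity, b_parity = state
    if ch == "a":
        return (1 - a_parity, b_parity)
    if ch == "b":
        return (a_parity, 1 - b_parity)
    raise ValueError(f"symbol not in the alphabet: {ch!r}")


def accepts(word):
    state = (0, 0)                 # [Ea,Eb]: no a's and no b's read yet
    for ch in word:
        state = step(state, ch)
    return state == (0, 1)         # [Ea,Ob]: even a's and odd b's


print(accepts("aab"), accepts("ab"))
```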
Example
• Alphabet {0,1}
• What language does this recognize?
[State diagram not recoverable from the extracted text; only edge labels 0 and 1 survive.]
Nondeterministic Finite Automata
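[Diagram: in a DFA, state qi has exactly one a-transition (to qj); in an NFA, qi may have several a-transitions (for example, to qj and to qk).]
$\delta_{DFA} : Q \times \Sigma \rightarrow Q$ (total function)
$\delta_{NFA} : Q \times \Sigma \rightarrow \mathit{Pow}(Q)$ (total function)
$\delta_{NFA} \subseteq Q \times \Sigma \times Q$ (subset relation)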
NFA State Diagram (Strings over {a,b} ending in bb)
[State diagram: q0 loops on a and b and goes to q1 on b; q1 goes to q2 on b; q2 is accepting.]
δ    a      b
q0   {q0}   {q0,q1}
q1   φ      {q2}
q2   φ      φ
$Q = \{q_0, q_1, q_2\}$, $\Sigma = \{a, b\}$, $F = \{q_2\}$
(a | b)* bb
$[q_0, abb] \vdash [q_0, bb] \vdash [q_0, b] \vdash [q_0, \varepsilon]$
$[q_0, abb] \vdash [q_0, bb] \vdash [q_0, b] \vdash [q_1, \varepsilon]$
Halts in non-accepting state after consuming the input.
$[q_0, abb] \vdash [q_0, bb] \vdash [q_1, b] \vdash [q_2, \varepsilon]$
Halts in accepting state after consuming the input.
$abb \in L(NFA)$
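Rather than following one computation at a time, a program can track every state the NFA could be in. The Python sketch below does this for the NFA for strings ending in bb; the names `DELTA`, `reachable`, and `nfa_accepts` are assumptions, and missing table entries stand for φ.

```python
# Transition table of the NFA for (a | b)* bb; missing entries mean the empty set.
DELTA = {
    ("q0", "a"): {"q0"},
    ("q0", "b"): {"q0", "q1"},
    ("q1", "b"): {"q2"},
}
FINAL = {"q2"}


def reachable(word):
    """Return the set of states in which some computation on word can end."""
    current = {"q0"}
    for ch in word:
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return current


def nfa_accepts(word):
    # Accept iff at least one computation halts in an accepting state.
    return bool(reachable(word) & FINAL)


print(sorted(reachable("abb")), nfa_accepts("abb"))
```

For `abb` the reachable set is {q0, q1, q2}, which matches the three computations listed above: two end in non-accepting states and one ends in q2.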
Introducing ε-transitions into NFA
$\delta : Q \times (\Sigma \cup \{\varepsilon\}) \rightarrow P(Q)$
• An ε-transition causes the machine to change its state non-deterministically, without consuming any input.
$L(DFAs) \subseteq L(NFAs)$
Deterministic and Nondeterministic Automata
• Deterministic Finite Automata (DFA)
One transition per input per state
No ε-moves
• Nondeterministic Finite Automata (NFA)
Can have multiple transitions for one input in a given state
Can have ε-moves
• Finite automata have finite memory
Need only to encode the current state
How do we associate a language with NFA?
• A DFA can take only one path through the state graph
• NFAs can choose
Whether to make ε-moves
Or which of the multiple transitions for a single input to take
Accept if there exists an accepting computation.
Reject if all computations are non-accepting.
• Every DFA is an NFA. However, does non-determinism make NFAs strictly more expressive (powerful) than DFAs?
$L(NFAs) \supseteq L(DFAs)$
For type 0 languages (Turing Machines) and type 3 languages (regular languages), non-determinism does not add expressive power.
For context-free languages (pushdown automata), non-determinism does enhance the expressive power. For context-sensitive languages (linear bounded automata), whether it does is an open problem.
NFA State Diagram (a | b)* bb (a | b)*
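[NFA: q0 loops on a and b and goes to q1 on b; q1 goes to q2 on b; q2 (accepting) loops on a and b.]
[DFA: q0 loops on a and goes to q1 on b; q1 goes to q0 on a and to q2 on b; q2 (accepting) loops on a and b.]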
NFA for (a | b)* (aa | bb) (a | b)*
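[NFA state diagram: the start state loops on a and b; one branch reads aa and another reads bb, each through an intermediate state, before reaching an accepting state that loops on a and b. The exact state names are not recoverable from the extracted text.]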
NFA vs. DFA
• Equivalent machines
[Figure: an NFA and an equivalent DFA over {0,1}; the state diagrams are not recoverable from the extracted text.]
NFA vs. DFA
• NFAs and DFAs recognize the same set of languages (regular languages).
• DFAs are faster to execute
There are no choices to consider.
• For a given language, NFA can be simpler than DFA
• DFA can be exponentially larger than NFA.
Reg. Expr. to Finite Automata
• High-level sketch
Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA
Regular Expressions to NFA (1)
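• For each kind of regular expr., define an NFA
Notation: NFA for rexp M
[Figure: a box labeled M with one start state and one accepting state.]
• For ε
[Figure: an ε-edge from the start state to the accepting state.]
• For input a
[Figure: an edge labeled a from the start state to the accepting state.]
• For φ
[Figure: a start state and an accepting state with no transition between them.]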
Regular Expressions to NFA (2)
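• For AB
[Figure: the NFA for A followed by the NFA for B, joined by an ε-edge from A’s accepting state to B’s start state.]
• For A | B
[Figure: a new start state with ε-edges into the NFAs for A and for B, and ε-edges from their accepting states to a new accepting state.]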
Regular Expressions to NFA (3)
• For A*
[Figure: ε-edges let the machine skip A entirely, enter A, and return from A’s accepting state to repeat A any number of times.]
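The case-by-case construction above can be sketched as a small recursive function. This is one common variant of the construction, not necessarily the exact wiring drawn on the slides; the tuple encoding of regular expressions and the names `Builder`, `closure`, and `matches` are assumptions of the sketch.

```python
import itertools


class Builder:
    """Builds an NFA fragment (start, accept) for each regular expression node."""

    def __init__(self):
        self.edges = []                       # (source, label, target); label None is ε
        self._ids = itertools.count()

    def new_state(self):
        return next(self._ids)

    def build(self, node):
        kind = node[0]
        if kind == "cat":                     # AB: A's accept --ε--> B's start
            a_start, a_accept = self.build(node[1])
            b_start, b_accept = self.build(node[2])
            self.edges.append((a_accept, None, b_start))
            return a_start, b_accept
        start, accept = self.new_state(), self.new_state()
        if kind == "eps":
            self.edges.append((start, None, accept))
        elif kind == "sym":
            self.edges.append((start, node[1], accept))
        elif kind == "alt":                   # A | B: ε into and out of each branch
            for branch in node[1:]:
                b_start, b_accept = self.build(branch)
                self.edges.append((start, None, b_start))
                self.edges.append((b_accept, None, accept))
        elif kind == "star":                  # A*: skip A, enter A, or repeat A
            a_start, a_accept = self.build(node[1])
            self.edges += [(start, None, a_start), (start, None, accept),
                           (a_accept, None, a_start), (a_accept, None, accept)]
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return start, accept


def closure(states, edges):
    """All states reachable from the given states using ε-edges only."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is None and dst not in result:
                result.add(dst)
                stack.append(dst)
    return result


def matches(regex, word):
    builder = Builder()
    start, accept = builder.build(regex)
    current = closure({start}, builder.edges)
    for ch in word:
        moved = {dst for src, label, dst in builder.edges
                 if src in current and label == ch}
        current = closure(moved, builder.edges)
    return accept in current


# (1|0)*1 written as nested tuples
REGEX = ("cat", ("star", ("alt", ("sym", "1"), ("sym", "0"))), ("sym", "1"))
print(matches(REGEX, "0101"), matches(REGEX, "10"))
```

Each fragment has exactly one start state and one accepting state, which is what lets the cases be composed without looking inside the sub-automata.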
Thompson’s Construction
Reg. Expr. −> NFA conversion
• Consider the regular expression (1|0)*1
• The corresponding NFA is
[Figure: NFA with states A through J; edges are labeled ε, 0, and 1.]
NFA to DFA: The Trick
• Simulate the NFA, in parallel
Michael Rabin and Dana Scott’s work
• Each state of DFA = a non-empty subset of states of the NFA
• Start state = the set of NFA states reachable through ε-moves from NFA start state
• Add a transition S →a S’ to DFA iff S’ is the set of NFA states reachable from some state in S after seeing the input ‘a’, considering ε-moves as well
NFA to DFA: Remark
• An NFA may be in many states at any time.
• How many different states?
If there are N states, the NFA must be in some subset of those N states
• How many subsets are there?
2^N − 1 non-empty subsets = finitely many
• NFA −> DFA conversion is at the heart of tools such as flex. In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations. (DFA Minimization)
Myhill-Nerode’s work
NFA −> DFA Example
[Figure: the NFA for (1|0)*1 with states A through J.]
[Resulting DFA: the start state is ABCDHI; from every DFA state, 0 leads to FGHIABCD and 1 leads to EJGHIABCD.]
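The subset construction for this example can be sketched in Python. The NFA edges below are an assumption: the extracted slide text does not preserve them, so they were chosen to be consistent with the DFA states ABCDHI, FGHIABCD, and EJGHIABCD listed above. The names `EPS`, `MOVE`, `eps_closure`, and `subset_construction` are likewise illustrative.

```python
from collections import deque

# Assumed ε-edges and labeled edges of the NFA for (1|0)*1 (states A..J).
EPS = {
    "A": {"B", "H"}, "B": {"C", "D"}, "E": {"G"}, "F": {"G"},
    "G": {"A", "H"}, "H": {"I"},
}
MOVE = {("C", "1"): {"E"}, ("D", "0"): {"F"}, ("I", "1"): {"J"}}
START, ACCEPT, ALPHABET = "A", "J", "01"


def eps_closure(states):
    """NFA states reachable from the given states through ε-moves alone."""
    result, todo = set(states), list(states)
    while todo:
        for nxt in EPS.get(todo.pop(), ()):
            if nxt not in result:
                result.add(nxt)
                todo.append(nxt)
    return frozenset(result)


def subset_construction():
    """Return (start, table) where table[S][a] is the DFA state reached from S on a."""
    start = eps_closure({START})
    table, queue = {}, deque([start])
    while queue:
        current = queue.popleft()
        if current in table:
            continue                          # already expanded
        table[current] = {}
        for symbol in ALPHABET:
            moved = set()
            for q in current:
                moved |= MOVE.get((q, symbol), set())
            target = eps_closure(moved)
            if target:                        # only non-empty subsets become DFA states
                table[current][symbol] = target
                queue.append(target)
    return start, table


start, table = subset_construction()
for state, row in table.items():
    name = "".join(sorted(state))
    print(name, {a: "".join(sorted(t)) for a, t in row.items()}, ACCEPT in state)
```

Only three of the 2^10 − 1 possible subsets are ever reached, which is why the DFA stays small here even though the worst case is exponential.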
Implementation
• A DFA can be implemented by a 2D table T
One dimension is “states”
Other dimension is “input symbol”
For every transition Si →a Sk define T[i,a] = k
• DFA “execution”
If in state Si and input ‘a’, read T[i,a] = k and skip to state Sk
Very efficient
Table Implementation of a DFA
[State diagram: states S, T, and U; from every state, 0 leads to T and 1 leads to U.]
     0    1
S    T    U
T    T    U
U    T    U
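A direct rendering of the table as a 2D array might look as follows. The table entries are the ones on the slide; taking S as the start state and U as the only accepting state is an assumption (it matches the DFA for (1|0)*1 from the earlier example), as are the names `TABLE`, `SYMBOLS`, and `run_dfa`.

```python
STATES = {"S": 0, "T": 1, "U": 2}     # row index of each state
SYMBOLS = {"0": 0, "1": 1}            # column index of each input symbol

# TABLE[i][a] = k encodes the transition Si →a Sk.
TABLE = [
    [1, 2],   # S: 0 → T, 1 → U
    [1, 2],   # T: 0 → T, 1 → U
    [1, 2],   # U: 0 → T, 1 → U
]
ACCEPTING = {STATES["U"]}             # assumption: U is the accepting state


def run_dfa(text):
    state = STATES["S"]               # assumption: S is the start state
    for ch in text:
        state = TABLE[state][SYMBOLS[ch]]   # one table lookup per character
    return state in ACCEPTING


print(run_dfa("0101"), run_dfa("10"))
```

All three rows are identical in this table, so a generator could share one row among the three states to save space at the cost of an extra indirection.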
Finite State Machine Minimization
Language over {H,T} : Strings with even number of H
Minimizing DFA: The Idea
• Two states s and t should be merged unless there is some way to tell them apart.
Initially, assume that all states are equivalent until proven otherwise.
• How can we tell if two states s and t are different?
If one is accepting and the other is not.
If, on input c, s −> x and t −> y, and we already know x is different from y.
Minimizing DFA: The Algorithm
• Overall, it partitions the set of states.
• Initial partition:
(Accepting states, Non-accepting states)
• Refinement:
In each pass, find a new partition of the states such that s and t are in the same new block if and only if states s and t were in the same old block, and, on each input c, states s and t go to states in the same old block.
Example DFA
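[State diagram, as far as it can be recovered from the edge labels: q0 goes to q1 on a and to q4 on b; q1 goes to q2 on a and to q3 on b; q4 goes to q5 on a and to q6 on b; q2 and q5 go to q7 on a,b; q3 and q6 loop on b and go to q7 on a; q7 loops on a,b. States q1 through q6 are accepting.]
(a ∪ b)(a ∪ b*)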
Refinement of State Partitions
• { {q0,q7}, {q1,q2,q3,q4,q5,q6} }
• { {q0}, {q7}, {q1,q2,q3,q4,q5,q6} }
On any transition
• { {q0}, {q7}, {q1,q4}, {q2,q3,q5,q6} }
On “a” transition
• { {q0}, {q7}, {q1,q4}, {q2,q5}, {q3,q6} }
On “b” transition
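The refinement can be sketched in Python as follows. The transition table is an assumption reconstructed from the edge labels of the example DFA for (a ∪ b)(a ∪ b*) and from the partitions listed above; the names `DELTA`, `ACCEPTING`, and `minimize` are illustrative. Unlike the slide, which splits on one input symbol at a time, this sketch compares all input symbols in each pass.

```python
# Assumed transition table of the example DFA (q7 is the dead state).
DELTA = {
    "q0": {"a": "q1", "b": "q4"},
    "q1": {"a": "q2", "b": "q3"},
    "q2": {"a": "q7", "b": "q7"},
    "q3": {"a": "q7", "b": "q3"},
    "q4": {"a": "q5", "b": "q6"},
    "q5": {"a": "q7", "b": "q7"},
    "q6": {"a": "q7", "b": "q6"},
    "q7": {"a": "q7", "b": "q7"},
}
ACCEPTING = {"q1", "q2", "q3", "q4", "q5", "q6"}


def minimize(delta, accepting, alphabet="ab"):
    """Return the blocks of equivalent states as a set of frozensets."""
    # Initial partition: accepting states versus non-accepting states.
    block_of = {s: int(s in accepting) for s in delta}
    while True:
        ids, refined = {}, {}
        for s in delta:
            # Two states stay together iff they share the old block and, on
            # every input, go to states that share an old block.
            signature = (block_of[s],) + tuple(block_of[delta[s][c]] for c in alphabet)
            refined[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block_of.values())):
            break                      # no block was split: the partition is stable
        block_of = refined
    blocks = {}
    for s, b in block_of.items():
        blocks.setdefault(b, set()).add(s)
    return {frozenset(b) for b in blocks.values()}


for block in sorted(minimize(DELTA, ACCEPTING), key=sorted):
    print(sorted(block))
```

On this input the result is the final partition listed above: q0 and q7 alone, and the pairs {q1,q4}, {q2,q5}, {q3,q6}.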
Example DFA showing equivalent states
[State diagram: the example DFA again, with the equivalent states {q1,q4}, {q2,q5}, and {q3,q6} marked.]
(a ∪ b)(a ∪ b*)
Example Minimum DFA
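[State diagram: q0 goes to {q1,q4} on a,b; {q1,q4} goes to {q2,q5} on a and to {q3,q6} on b; {q2,q5} goes to q7 on a,b; {q3,q6} loops on b and goes to q7 on a; q7 loops on a,b.]
(a ∪ b)(a ∪ b*)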
Summary
• Lexer creates tokens out of a text stream.
• Tokens are defined using regular expressions.
• Regular expressions can be mapped to Nondeterministic Finite Automata (NFA)
• NFA is transformed to a DFA
By removing non-determinism, and
By minimizing states.
• Executing a DFA is straightforward.
• Common scanner generator tools
lex, flex in C
jlex in Java
What’s Next?
Program (character stream)
→ Lexical Analyzer (Scanner) → Token Stream
→ Syntax Analyzer (Parser) → Parse Tree
→ Intermediate Code Generator → Intermediate Representation
→ Intermediate Code Optimizer → Optimized Intermediate Representation
→ Code Generator → Assembly code
Animating Lexical Analysis
Lexical Analyzer in Action
Input: f o r   v a r 1   =   1 0   v a r 1   < =
Token stream: for ID(“var1”) eq_op Num(10) ID(“var1”) leq_op
• Partition input program text into subsequences of characters corresponding to tokens
• Attach the corresponding attributes to the tokens
• Eliminate white space and comments
Animating NFA construction
(-| ε) · (0|1|2|3|4|5|6|7|8|9)+ · (. · (0|1|2|3|4|5|6|7|8|9)*)?
[Animation: the NFA for this expression is built up step by step, first the optional minus sign (edges labeled - and ε), then the digit loop for (0|…|9)+, then the optional fraction (an edge labeled . followed by the digit loop for (0|…|9)*). Only the edge labels survive in the extracted text.]
Animating Token Recognition
String Matching
Input: - 1 2 . 8
[Animation: the NFA for the number expression is run on the input one character at a time; the highlighted current states are not recoverable from the extracted text.]
Animating Closure Construction
NFA to DFA conversion
1. Closure
• The closure of a state is the set of states that can be reached from that state without consuming any of the input
Closure(S) is the smallest set T such that
$$T = S \cup \Bigl(\bigcup_{s \in T} \mathrm{edge}(s, \varepsilon)\Bigr)$$
• Algorithm
$T \leftarrow S$
repeat
$T' \leftarrow T$
$T \leftarrow T' \cup \bigl(\bigcup_{s \in T'} \mathrm{edge}(s, \varepsilon)\bigr)$
until $T = T'$
(-| ε) · (0|1|2|3|4|5|6|7|8|9)+ · (. · (0|1|2|3|4|5|6|7|8|9)*)?
[Animation: the closure algorithm is run on the NFA for this expression (states 1 through 8) with S = {1}. The successive values are:]
S = {1}, T = {}, T’ = {}
S = {1}, T = {1}, T’ = {}
S = {1}, T = {1}, T’ = {1}
S = {1}, T = {1, 2}, T’ = {1}
S = {1}, T = {1, 2}, T’ = {1, 2}
The loop stops when T = T’, so the closure of {1} is {1, 2}.
Question: What is closure(3)?
S = {3}, T = ??
Answer: S = {3}, T = {2, 3, 4, 8}
Animating DFAEdge Construction
NFA to DFA conversion
2. DFAedge
• Given a symbol and a state, what states can you reach?
What is DFAedge({1}, 3)?
d = {1}: DFAedge({1}, 3) = ??
d = {1}, edge({1}, 3) = {}: DFAedge({1}, 3) = {}
d = {1}, closure({1}) = {1,2}: DFAedge({1}, 3) = {}
d = {1, 2}, edge({2}, 3) = {3}: DFAedge({1}, 3) = {3}
d = {1, 2}, closure({3}) = {2,3,4,8}: DFAedge({1}, 3) = {2,3,4,8}
Question: What is DFAedge({3}, .)?
d = {3}: DFAedge({3}, .) = ??
Answer: DFAedge({3}, .) = {5,6,8}
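The closure algorithm and DFAedge can be checked against these answers with a short Python sketch. The NFA edges are an assumption: the diagram is not recoverable from the extracted text, so the edges below were chosen to reproduce the closures the slides report. The names `EDGES`, `add_edge`, `edge`, `closure`, and `dfa_edge` are illustrative, and the empty string stands for ε.

```python
DIGITS = "0123456789"
EDGES = {}                                  # (state, label) -> set of target states


def add_edge(src, label, dst):
    EDGES.setdefault((src, label), set()).add(dst)


# Assumed NFA for (-|ε)·(0|…|9)+·(.·(0|…|9)*)? with states 1..8.
add_edge(1, "-", 2)
add_edge(1, "", 2)
for digit in DIGITS:
    add_edge(2, digit, 3)
    add_edge(6, digit, 7)
add_edge(3, "", 2)
add_edge(3, "", 4)
add_edge(4, ".", 5)
add_edge(4, "", 8)
add_edge(5, "", 6)
add_edge(6, "", 8)
add_edge(7, "", 6)


def edge(s, c):
    return EDGES.get((s, c), set())


def closure(S):
    """Fixed-point loop from the slide: repeat T' <- T; T <- T' ∪ ... until T = T'."""
    T = set(S)
    while True:
        T_prev = set(T)
        for s in T_prev:
            T |= edge(s, "")
        if T == T_prev:
            return T


def dfa_edge(d, c):
    # As in the slides' walkthrough: close d, follow the c-edges, close the result.
    moved = set()
    for s in closure(d):
        moved |= edge(s, c)
    return closure(moved)


print(sorted(closure({3})), sorted(dfa_edge({1}, "3")), sorted(dfa_edge({3}, ".")))
```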
Animating DFA Construction
NFA to DFA conversion
[Animation: the DFA is built from the NFA for the number expression (states 1 through 8), one step per slide:]
closure({1}) = {1, 2}
DFAedge({1,2}, -) = {2}; closure({2}) = {2}
DFAedge({1,2}, 0..9) = {3}; closure({3}) = {2,3,4,8}
DFAedge({2}, 0..9) = {3}; closure({3}) = {2,3,4,8}
DFAedge({2,3,4,8}, 0..9) = {3}; closure({3}) = {2,3,4,8}
DFAedge({2,3,4,8}, .) = {5}; closure({5}) = {5,6,8}
DFAedge({5,6,8}, 0..9) = {7}; closure({7}) = {6,7,8}
DFAedge({6,7,8}, 0..9) = {7}; closure({7}) = {6,7,8}
[Resulting DFA: states {1,2}, {2}, {2,3,4,8}, {5,6,8}, and {6,7,8}, connected by the edges labeled -, 0..9, and . listed above.]
What is the minimal DFA for the following?