1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.


Transcript of 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Page 1: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Chapter 3 Scanning - Theory and Practice

Prof Chung.

Page 2: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Outline
3.1 Overview
3.2 Regular Expressions
3.4 Finite Automata and Scanners
3.5 Using a Scanner Generator (LEX, introduced in the TA course: LEX introduction)
3.7 Practical Considerations
3.8 Translating Regular Expressions into Finite Automata
3.9 Summary

Modified from http://www.cs.ualberta.ca/~amaral/courses/680/

Page 3: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Overview (1)
Formal notations are necessary for specifying the precise structure of tokens.
Example: a quoted string in Pascal
Can a string split across a line?
Is a null string allowed?
Is .1 or 10. OK?
The 1..10 problem

Scanner generators: tables, programs

What formal notations should we use?

Page 4: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Overview (2)
Lexical analyzer (scanner) role:
Produce a sequence of tokens for the parser
Strip out comments and whitespace
Associate a line number with each error message
Expand macros

[Figure: the source program enters the Lexical Analyzer; the Parser calls getNextToken and receives a token; the Parser's output goes to semantic analysis; the Lexical Analyzer and the Parser both use the Symbol Table]

Page 5: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Regular Expressions (1)
Tokens are built from symbols of a finite vocabulary.
We use regular expressions to define the structures of tokens.

Set definition
The sets of strings defined by regular expressions are termed regular sets.
∅ is a regular expression denoting the empty set.
λ is a regular expression denoting the set that contains only the empty string.
A string s is a regular expression denoting a set containing only s.

Page 6: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Regular Expressions (2)
If A and B are regular expressions, so are:

A | B (alternation)
A regular expression formed by A or B
(a)|(b) = {a, b}

AB or A•B (concatenation)
A regular expression formed by A followed by B
(a)(b) = {ab}

A* (Kleene closure)
A regular expression formed by zero or more repetitions of A
a* = {λ, a, aa, aaa, …}

More complex example:
(a|b|c)* = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, …}

Page 7: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Regular Expressions (3)
Some notational conveniences:

P+ = PP* (at least one occurrence of P)
Not(A) = V - A, where A is a set of characters from the vocabulary V
Not(S) = V* - S, where S is a set of strings
A^k = AA…A (k copies)
A? = optional, zero or one occurrence of A

More complex examples
Let D = (0 | 1 | 2 | 3 | 4 | ... | 9)
Let L = (A | B | ... | Z | a | b | ... | z)

comment = -- Not(EOL)* EOL          ex: --hello12_34 \n
decimal = D+ . D+                   ex: 123.456
ident = L (L | D | _)*              ex: A1a5_6
comments = ((# | λ) Not(#))*        ex: #A435#3
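The ident and decimal definitions above can be checked by hand-written matchers. The following is only a sketch: the function names is_ident and is_decimal are made up for illustration, and the letter class L is assumed to be what isalpha accepts in the default "C" locale (A-Z and a-z).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* ident = L (L | D | _)* : one letter, then letters, digits or underscores. */
bool is_ident(const char *s) {
    assert(s != NULL);
    if (!isalpha((unsigned char)*s)) return false;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_') return false;
    return true;
}

/* decimal = D+ . D+ : one or more digits, a period, one or more digits. */
bool is_decimal(const char *s) {
    assert(s != NULL);
    if (!isdigit((unsigned char)*s)) return false;
    while (isdigit((unsigned char)*s)) s++;
    if (*s != '.') return false;
    s++;
    if (!isdigit((unsigned char)*s)) return false;
    while (isdigit((unsigned char)*s)) s++;
    return *s == '\0';   /* the whole string must be consumed */
}
```

With these definitions the slide's examples A1a5_6 and 123.456 are accepted, while 1abc, 123. and .5 are rejected.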

Page 8: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

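Regular Expressions (4)
Are regular expressions as powerful as CFGs? No: for example, the set { [^i ]^i | i ≥ 1 } of balanced brackets can be described by a CFG but not by a regular expression.

Regular grammar: every production has the form
A → a or A → aB
or
A → a or A → Ba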

Page 9: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (1)
A finite automaton (FA) can be used to recognize the tokens specified by a regular expression.

An FA consists of:
A finite set of states S
A set of input symbols Σ (the input symbol alphabet, also written V)
A set of transitions (or moves) from one state to another, labeled with characters in V
A special start state s0 (only one)
A set of final, or accepting, states F

FA = {S, Σ, s0, F, move}

Page 10: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (2)

[Figure: legend for transition diagrams, showing the symbols used for a state, a transition, the start state and a final state]

Example on the next page.

Page 11: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (3)
Example: a transition diagram

[Figure: transition diagram with transitions labeled a, b and c]

This machine accepts (abc+)+

Page 12: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (4)
Another example:

(0|1)*0(0|1)(0|1)

[Figure: transition diagram with states 1 to 5 and transitions labeled 0 and 0,1]

Page 13: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

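Finite Automata and Scanners (5)
Another example:

ID = L(L|D)*(_(L|D)+)*

A data structure can be translated for many REs or FAs.

[Figure: transition diagram for ID, with transitions labeled L, L | D and _; the parts for L(L|D)* and (_(L|D)+)* are marked, with the note "final for two * symbols"]

What is the difference? Answer: the number of times "_" may appear.
item 2 = item 3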

Page 14: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (6)
Another example:

RealLit = (D+(λ|.))|(D*.D+)

Page 15: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (7)
Two kinds of FA:
Deterministic: the next transition is unique
Nondeterministic: otherwise

[Figure: a state with two outgoing transitions, both labeled a]

Which path should we select?

Page 16: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (8)
A transition diagram and the equivalent transition table

[Figure: transition diagram with states 1 to 4: 1 to 2 on /, 2 to 3 on /, 3 to 3 on Not(Eol), 3 to 4 on Eol]

State   Character
        /    Eol    a    b    …
1       2
2       3
3       3    4      3    3    3
4

Page 17: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (9)
Any regular expression can be translated into a DFA that accepts the set of strings denoted by the regular expression.

The translation can be done:
Automatically by a scanner generator: LEX (TA course)
Manually by a programmer

Coding the DFA takes two forms:
1. Table-driven, commonly produced by a scanner generator
2. Explicit control, produced automatically or by hand

Page 18: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (10)
Scanner Driver Interpreting a Transition Table (table-driven)

/* Note: CurrentChar is already set to the current input character. */
State = StartState;
while (TRUE) {
    /* Look up the successor of the current state in the transition table. */
    NextState = T[State][CurrentChar];
    if (NextState == ERROR)
        break;
    State = NextState;
    CurrentChar = getchar();
}
if (is_final_state(State))
    /* Return or process valid token. */;
else
    lexical_error(CurrentChar);
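As a concrete instance of the table-driven driver above, here is a minimal sketch that scans a // comment using the four-state DFA from the transition table shown earlier. The function name scan_comment, the three character classes and the use of 0 as the error entry are assumptions made for this illustration; they are not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

enum { ERR = 0, NUM_STATES = 5 };            /* states 1..4; 0 is the error entry */
enum { C_SLASH, C_EOL, C_OTHER, NUM_CLASSES };

/* Map a character to its column in the transition table. */
static int char_class(char c) {
    if (c == '/') return C_SLASH;
    if (c == '\n') return C_EOL;
    return C_OTHER;
}

/* T[state][class]: state 1 is the start state, state 4 is final. */
static const int T[NUM_STATES][NUM_CLASSES] = {
    /* 0 */ {ERR, ERR, ERR},
    /* 1 */ {2,   ERR, ERR},
    /* 2 */ {3,   ERR, ERR},
    /* 3 */ {3,   4,   3  },
    /* 4 */ {ERR, ERR, ERR},
};

/* Returns the length of the comment at the start of s, or 0 if there is none. */
size_t scan_comment(const char *s) {
    assert(s != NULL);
    int state = 1;
    size_t n = 0;
    while (s[n] != '\0') {
        int next = T[state][char_class(s[n])];
        if (next == ERR)
            break;
        state = next;
        n++;
    }
    return state == 4 ? n : 0;
}
```

Only the table changes when the token definitions change; the driver loop stays the same, which is why scanner generators commonly emit this form.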

Page 19: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (11)
Scanner with Fixed Token Definition (explicit control)

if (CurrentChar == '/') {
    CurrentChar = getchar();
    if (CurrentChar == '/') {
        /* Skip the rest of the comment; also stop at end of file. */
        do {
            CurrentChar = getchar();
        } while (CurrentChar != '\n' && CurrentChar != EOF);
    } else {
        ungetc(CurrentChar, stdin);
        lexical_error(CurrentChar);
    }
}
else
    lexical_error(CurrentChar);
/* Return or process valid token. */

Page 20: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Finite Automata and Scanners (12) Transducer

We may perform some actions during state transition.

A scanner can be turned into a transducer by the appropriate insertion of actions based on state transitions
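A small sketch of a transducer, assuming the token is an unsigned integer literal D+: the action attached to each digit transition accumulates the literal's value. The function name scan_int is hypothetical, and overflow is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Scans D+ at the start of s and returns the number of characters consumed.
   The action on every digit transition updates *value. */
size_t scan_int(const char *s, long *value) {
    assert(s != NULL && value != NULL);
    size_t n = 0;
    *value = 0;
    while (isdigit((unsigned char)s[n])) {
        *value = *value * 10 + (s[n] - '0');   /* action on the transition */
        n++;
    }
    return n;
}
```

The recognizer alone would only report that 123 is a valid literal; with the inserted action it also produces the number 123.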


Page 21: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Using a Scanner Generator
By TA….

Page 22: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Practical Considerations (1)
Reserved words
Usually, all keywords are reserved in order to simplify parsing.
If keywords were not reserved in Pascal, we could even write:
begin begin; end; end; begin; end
if else then if = else;

The problem with reserved words is that they are too numerous. COBOL has several hundred reserved words, for example ZERO, ZEROS and ZEROES.

Page 23: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Practical Considerations (2)
Compiler directives and listing source lines

Compiler options, e.g. optimization, profiling, etc.
Handled by the scanner or by semantic routines
Complex pragmas are treated like other statements.

Source inclusion, e.g. #include in C
Handled by the preprocessor or the scanner

Conditional compilation, e.g. #if, #endif in C
Useful for creating program versions

Page 24: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Practical Considerations (3)
Entry of identifiers into the symbol table

Who is responsible for entering symbols into the symbol table? The scanner?

Consider this example: { int abc; … { int abc; } }
The two declarations of abc are in different scopes and name different variables.

Page 25: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Practical Considerations (4)
How to handle end-of-file?
Create a special EOF token. An EOF token is useful in a CFG.

Multicharacter lookahead
Blanks are not significant in Fortran:
DO 10 I = 1,100     Beginning of a loop
DO 10 I = 1.100     An assignment statement, the same as DO10I=1.100

A Fortran scanner can determine whether the O is the last character of a DO token only after reading as far as the comma.

Page 26: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Practical Considerations (5)
Multicharacter lookahead (cont'd)

In Ada and Pascal, to scan 10..100:
There are three tokens: 10, .. and 100.
Two-character lookahead is needed after the 10 to recognize the (..).

It is easy to build a scanner that can perform general backup:
If we reach a situation in which we are not in a final state and cannot scan any more characters, we extract characters from the right end of the buffer and queue them for rescanning, until we reach a prefix of the scanned characters flagged as a valid token.

Example on the next page.

Page 27: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Practical Considerations (6)
An FA That Scans Integer and Real Literals and the Subrange Operator

[Figure: the FA, with transitions labeled D (digit) and ● (period)]

Buffered Token   Token Flag
1                Integer Literal
12               Integer Literal
12.              Invalid
12.3             Real Literal
12.3e            Invalid
12.3e+           Invalid

The detailed operation of each case is on the following pages.

Page 28: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Practical Considerations (7) to (13)
An FA That Scans Integer and Real Literals and the Subrange Operator: tracing the input string 12.3e+q

[Figure: the FA and the buffer, redrawn on each slide as one more character is read]

Input read   Buffered Token   Token Flag
1            1                Integer Literal
2            12               Integer Literal
.            12.              Invalid
3            12.3             Real Literal
e            12.3e            Invalid
+            12.3e+           Invalid
q            (no transition)

On q the scanner cannot scan any more characters and is not in an accept state, so backup is invoked: + and e are extracted from the right end of the buffer and queued for rescanning, which leaves 12.3, a prefix flagged as a valid token (Real Literal).

Input string: 12.3e+q
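The backup behaviour traced above can be sketched in C. This is an illustration under stated assumptions, not the FA from the slides: the state names are invented, the exponent part is assumed to be e, an optional sign and digits, and instead of queueing characters for rescanning the sketch remembers the end of the last prefix flagged as a valid token, which gives the same result.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_INVALID, TOK_INT, TOK_REAL } TokenFlag;

/* Hypothetical state names; S_DEAD means "cannot scan any more characters". */
enum { S_START, S_INT, S_DOT, S_FRAC, S_EXP, S_SIGN, S_EXPDIG, S_DEAD };

static int next_state(int state, char c) {
    int d = isdigit((unsigned char)c);
    switch (state) {
    case S_START:  return d ? S_INT : S_DEAD;
    case S_INT:    return d ? S_INT : (c == '.' ? S_DOT : S_DEAD);
    case S_DOT:    return d ? S_FRAC : S_DEAD;
    case S_FRAC:   return d ? S_FRAC : ((c == 'e' || c == 'E') ? S_EXP : S_DEAD);
    case S_EXP:    return d ? S_EXPDIG : ((c == '+' || c == '-') ? S_SIGN : S_DEAD);
    case S_SIGN:   return d ? S_EXPDIG : S_DEAD;
    case S_EXPDIG: return d ? S_EXPDIG : S_DEAD;
    default:       return S_DEAD;
    }
}

/* Token flag of the buffered text when the FA is in the given state. */
static TokenFlag flag_of(int state) {
    switch (state) {
    case S_INT:                 return TOK_INT;
    case S_FRAC: case S_EXPDIG: return TOK_REAL;
    default:                    return TOK_INVALID;
    }
}

/* Returns the length of the longest valid token at the start of input
   and stores its flag; characters past that length are left to rescan. */
size_t scan_number(const char *input, TokenFlag *flag) {
    assert(input != NULL && flag != NULL);
    int state = S_START;
    size_t pos = 0, last_valid = 0;
    *flag = TOK_INVALID;
    while (input[pos] != '\0') {
        state = next_state(state, input[pos]);
        if (state == S_DEAD)
            break;
        pos++;
        if (flag_of(state) != TOK_INVALID) {   /* remember this valid prefix */
            last_valid = pos;
            *flag = flag_of(state);
        }
    }
    return last_valid;                          /* back up to here */
}
```

For 12.3e+q the function returns 4 with the flag Real Literal, and for 10..100 it returns 2 with the flag Integer Literal, so the two periods remain available for the subrange operator.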

Page 36: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Outline
3.1 Overview
3.2 Regular Expressions
3.3 Finite Automata and Scanners
3.4 Using a Scanner Generator
3.5 Practical Considerations
3.6 Translating Regular Expressions into Finite Automata
    Creating Deterministic Automata
    Optimizing Finite Automata
3.7 Tracing Example

Page 37: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (1)
Regular expressions are equivalent to FAs.
The main job of a scanner generator is to transform a regular expression definition into an equivalent FA:

A regular expression -> Nondeterministic FA -> Deterministic FA -> Optimized Deterministic FA (minimize # of states)

The important step is NFA -> DFA.

Page 38: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (2)
We can transform any regular expression into an NFA with the following properties:
There is a unique final state.
The final state has no successors.
Every other state has at least one successor.

Example: a nondeterministic finite automaton (NFA)
Regular expression: (a|b)*abb
Input: babb

[Figure: NFA with states 0 to 3 and transitions labeled a and b, annotated "unique final state", "final state has no successor" and "either one or two successors"]

Page 39: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (3)
We need to review the definition of regular expressions:

Item 1: λ is the null string.
Item 2: a is a character of the vocabulary.
Item 3: | is the "or" operation. Example: A|B
Item 4: ● is the catenation operation. Example: AB
Item 5: * is the repetition operation. Example: A*

More examples on the next pages.

Page 40: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (4)
NFA for λ (the null string)
NFA for a (a one-character string), where a is a character of the vocabulary

[Figure: the two NFAs, the second with a single transition labeled a]

Page 41: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (5)
NFA for A | B

[Figure: the NFA for A | B, built from the NFA for A and the NFA for B]

Page 42: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (6)
NFA for A ● B

[Figure: the NFA for A ● B, built from the NFA for A followed by the NFA for B]

Page 43: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (7)
NFA for A*

[Figure: the NFA for A*, built from the NFA for A, with one path for 0 repetitions and one for 1 or more repetitions]

Page 44: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (8)
Construct an NFA for the regular expression 01*|1, that is, (0(1*))|1.

[Figure: step 1, the NFA for 1*]

Page 45: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (9)
Construct an NFA for the regular expression 01*|1, that is, (0(1*))|1.

[Figure: step 2, the NFA for 0 connected to the NFA for 1*]

Page 46: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (10)
Construct an NFA for the regular expression 01*|1, that is, (0(1*))|1.

[Figure: step 3, the NFAs for 01* and 1 joined by alternation]

Page 47: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (11)
What is the problem with an NFA? Answer: it may be ambiguous, which makes it difficult to program.

A nondeterministic finite automaton (NFA): (a|b)*abb
Input: babb

[Figure: NFA with start state 0 and states 1, 2, 3; state 0 has two transitions on a]

Ambiguous: after b has been read, the next input a can follow either transition.

Which one should we select?
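One way around the question, sketched below, is not to select at all: the simulation keeps the set of all states the NFA could be in, which is also the idea behind the NFA to DFA conversion. The transitions are assumed to be the usual ones for (a|b)*abb (0 to 0 on a and b, 0 to 1 on a, 1 to 2 on b, 2 to 3 on b), and the function names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A set of NFA states is a bit mask: bit k is set when state k is active. */
static unsigned step(unsigned set, char c) {
    unsigned next = 0;
    if (set & 1u) {                           /* state 0 */
        if (c == 'a') next |= 1u | 2u;        /* stay in 0 or move to 1 */
        if (c == 'b') next |= 1u;             /* stay in 0 */
    }
    if ((set & 2u) && c == 'b') next |= 4u;   /* 1 to 2 on b */
    if ((set & 4u) && c == 'b') next |= 8u;   /* 2 to 3 on b */
    return next;
}

/* Accepts when the final state 3 is among the active states at the end. */
bool nfa_accepts(const char *input) {
    assert(input != NULL);
    unsigned set = 1u;                        /* start in {0} */
    for (; *input != '\0'; input++)
        set = step(set, *input);
    return (set & 8u) != 0;
}
```

On babb the active sets are {0}, {0}, {0, 1}, {0, 2} and {0, 3}, so the input is accepted without ever committing to one path.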

Page 48: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Translating Regular Expressions into Finite Automata (12)
What is the problem with an NFA? Answer: it may be ambiguous, which makes it difficult to program.

A deterministic finite automaton (DFA): b*abb
Input: babb

[Figure: DFA with start state 0 and states 1, 2, 3, with transitions labeled a and b]

No ambiguity: there is a unique path.

Page 49: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Creating Deterministic Automata (1)
The transformation from an NFA N to an equivalent DFA M works by what is sometimes called the subset construction.

An example for each step follows.
Initial NFA: 01*|1, that is, (0(1*))|1

[Figure: the NFA for 01*|1 with states 1 to 10, transitions on 0 and 1, and start state 1]

More detailed operation on the next pages.

Page 50: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Creating Deterministic Automata (2)
Step 1: compute the λ-closure of every NFA state.

[Figure: the NFA for 01*|1 with states 1 to 10, repeated on each slide of this sequence with one closure highlighted at a time]

1. λ-closure(1) = {1, 2, 8}
2. λ-closure(2) = {2}
3. λ-closure(3) = {3, 4, 5, 7, 10}
4. λ-closure(4) = {4, 5, 7, 10}
5. λ-closure(5) = {5}
6. λ-closure(6) = {5, 6, 7, 10}
7. λ-closure(7) = {5, 7, 10} (the last slide of the sequence lists {7, 10})
8. λ-closure(8) = {8}
9. λ-closure(9) = {9, 10}
10. λ-closure(10) = {10}

For example, from state 3 the λ-transitions reach state 4, then states 5 and 7, then state 10, and then nothing new.

These are all the closures. Closures that are subsets of another closure are then deleted, which leaves the closures of states 1, 3, 6 and 9.

Page 74: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Creating Deterministic Automata (3)
Step 1:
The initial state of M is the set of states reachable from the initial state of N by λ-transitions.
This set is usually called the λ-closure or ε-closure.

This is the algorithm used in the example above.

Page 75: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Creating Deterministic Automata (4)
Step 2: create the successor states.
Take any state S of M and any character c, and compute S's successor under c.
S is identified with some set of N's states, {n1, n2, …}.
Find all possible successor states to {n1, n2, …} under c, obtaining a set {m1, m2, …}.
T = close({m1, m2, …})

S = {n1, n2, …} moves to T = close({m1, m2, …}) under c.


Page 77: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Creating Deterministic Automata (5)
Step 2:
First, rename the closures to simplify the work flow:
λ-closure(1) = {1, 2, 8}            A = {1, 2, 8}
λ-closure(3) = {3, 4, 5, 7, 10}     B = {3, 4, 5, 7, 10}
λ-closure(6) = {5, 6, 7, 10}        C = {5, 6, 7, 10}
λ-closure(9) = {9, 10}              D = {9, 10}

[Figure: the NFA for 01*|1 with states 1 to 10]

More operations on the next page.

Page 78: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

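Creating Deterministic Automata (6)

[Figure: the NFA for 01*|1 with the state sets {1, 2, 8}, {3, 4, 5, 7, 10}, {5, 6, 7, 10} and {9, 10} outlined]

A : {1, 2, 8}
B : {3, 4, 5, 7, 10}
C : {5, 6, 7, 10}
D : {9, 10}

[Figure: the resulting DFA. A is the start state, with transitions A to B on 0, A to D on 1, B to C on 1 and C to C on 1. D has no out-degree. B, C and D are final.]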

Page 79: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Creating Deterministic Automata (7)
Step 2:

void make_deterministic(nondeterministic_fa N, deterministic_fa *M)
{
    set_of_fa_states T;

    /* The initial state of M is the closure of N's initial state. */
    M->initial_state = SET_OF(N.initial_state);
    close(&M->initial_state);
    Add M->initial_state to M->states;
    while (states or transitions can be added) {
        choose S in M->states and c in Alphabet;
        T = SET_OF(y in N.states SUCH_THAT x->y under c for some x in S);
        close(&T);
        if (T not in M->states)
            add T to M->states;
        Add the transition to M->transitions: S->T under c;
    }
    /* A state of M is final if it contains N's final state. */
    M->final_states = SET_OF(S in M->states SUCH_THAT N.final_state in S);
}
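The pseudocode above can be turned into a small runnable sketch for the 01*|1 example. Everything specific here is an assumption made for illustration: sets of NFA states are stored as bit masks, the edge list is reconstructed from the closures listed on the slides (with state 7 leading only to state 10), and the Dfa type and helper names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_DFA 16
#define BIT(s) (UINT32_C(1) << (s))

typedef struct { int from, to; char sym; } Edge;   /* sym == 0 marks a lambda edge */

/* Assumed edges of the NFA for 01*|1: states 1..10, start 1, final 10. */
static const Edge edges[] = {
    {1, 2, 0}, {1, 8, 0}, {2, 3, '0'}, {3, 4, 0}, {4, 5, 0}, {4, 7, 0},
    {5, 6, '1'}, {6, 5, 0}, {6, 7, 0}, {7, 10, 0}, {8, 9, '1'}, {9, 10, 0},
};
static const int n_edges = (int)(sizeof edges / sizeof edges[0]);

/* close: add every state reachable through lambda edges. */
static uint32_t close_set(uint32_t set) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n_edges; i++)
            if (edges[i].sym == 0 && (set & BIT(edges[i].from))
                && !(set & BIT(edges[i].to))) {
                set |= BIT(edges[i].to);
                changed = 1;
            }
    }
    return set;
}

/* All states y such that x->y under c for some x in set. */
static uint32_t move_set(uint32_t set, char c) {
    uint32_t out = 0;
    for (int i = 0; i < n_edges; i++)
        if (edges[i].sym == c && (set & BIT(edges[i].from)))
            out |= BIT(edges[i].to);
    return out;
}

typedef struct {
    int n;                        /* number of DFA states found */
    uint32_t states[MAX_DFA];     /* each DFA state is a set of NFA states */
    int trans[MAX_DFA][2];        /* successor on '0' and '1', or -1 for none */
    int is_final[MAX_DFA];
} Dfa;

void make_deterministic(Dfa *m) {
    static const char alphabet[2] = {'0', '1'};
    assert(m != NULL);
    m->n = 1;
    m->states[0] = close_set(BIT(1));
    for (int s = 0; s < m->n; s++) {          /* m->n grows as states are added */
        for (int a = 0; a < 2; a++) {
            uint32_t t = close_set(move_set(m->states[s], alphabet[a]));
            m->trans[s][a] = -1;
            if (t == 0)
                continue;                     /* no successor under this symbol */
            int idx = 0;
            while (idx < m->n && m->states[idx] != t)
                idx++;
            if (idx == m->n) {                /* T is not in M->states yet */
                assert(m->n < MAX_DFA);
                m->states[m->n++] = t;
            }
            m->trans[s][a] = idx;
        }
    }
    for (int s = 0; s < m->n; s++)
        m->is_final[s] = (m->states[s] & BIT(10)) != 0;
}
```

Under these assumptions the construction finds four DFA states, {1, 2, 8}, {3, 4, 5, 7, 10}, {9, 10} and {5, 6, 7, 10}, which are the sets named A, B, D and C on the slides.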

Page 80: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Optimizing Finite Automata (1)
Minimize the number of states

Every DFA has a unique smallest equivalent DFA.
Given a DFA M, we use its transition table to construct the equivalent minimal DFA.
Initially, we draw the transition table from the DFA diagram.

[Figure: DFA with start state A and transitions A to B on 0, A to D on 1, B to C on 1, C to C on 1]

State   Character
        0    1
A       B    D
B            C
C            C
D

A: start state
B, C, D: final states

Page 81: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Optimizing Finite Automata (2)
Minimize the number of states

Original table:
State   Character
        0    1
A       B    D
B            C
C            C
D

Optimize: B is equivalent to C.

New table:
State    Character
         0         1
A        {B, C}    D
{B, C}             {B, C}
D

[Figure: the new DFA with states A, {B, C} and D]

A: start state
B, C, D: final states
Special: B can be merged into C because B and C are both final states and have the same transitions (no move on 0, a move to C on 1).
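The merge can be checked mechanically. The sketch below is not the procedure from the slides: it uses partition refinement, starting from the split into final and non-final states and splitting blocks until the states in a block agree on the block of every successor. The table encodes the four-state DFA above with A, B, C, D as indices 0 to 3; a missing transition is treated as a move to an implicit dead state, which is adequate for this example.

```c
#include <assert.h>
#include <stddef.h>

#define N 4          /* DFA states A, B, C, D as indices 0..3 */
#define SYMS 2       /* input symbols 0 and 1 */
#define DEAD (-1)    /* no transition */

static const int trans[N][SYMS] = {
    {1, 3},          /* A: 0 to B, 1 to D */
    {DEAD, 2},       /* B: 1 to C */
    {DEAD, 2},       /* C: 1 to C */
    {DEAD, DEAD},    /* D: no transitions */
};
static const int is_final[N] = {0, 1, 1, 1};

/* Block of the successor of state s under symbol c, or DEAD. */
static int succ_block(const int block[N], int s, int c) {
    return trans[s][c] == DEAD ? DEAD : block[trans[s][c]];
}

/* Fills block[s] with a block number; states with equal numbers are
   equivalent. Returns the number of blocks (states of the minimal DFA). */
int minimize(int block[N]) {
    assert(block != NULL);
    int nblocks = 0;
    for (int s = 0; s < N; s++)
        block[s] = is_final[s];               /* initial split: final or not */
    for (;;) {
        int next[N], count = 0;
        for (int s = 0; s < N; s++) {
            next[s] = -1;
            for (int t = 0; t < s && next[s] == -1; t++) {
                if (block[t] != block[s])
                    continue;
                int same = 1;
                for (int c = 0; c < SYMS; c++)
                    if (succ_block(block, s, c) != succ_block(block, t, c))
                        same = 0;
                if (same)
                    next[s] = next[t];        /* s joins the block of t */
            }
            if (next[s] == -1)
                next[s] = count++;            /* s opens a new block */
        }
        for (int s = 0; s < N; s++)
            block[s] = next[s];
        if (count == nblocks)                 /* nothing was split: done */
            return nblocks;
        nblocks = count;
    }
}
```

For this table the result has three blocks, {A}, {B, C} and {D}. D is final like B and C but stays separate because it has no transition on 1.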

Page 82: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Additional simplifying rules (removing parentheses)
"*" has the highest precedence and is left associative.
Concatenation has the second-highest precedence and is left associative.
"|" has the lowest precedence and is left associative.
E.g., (a)|((b)*(c)) == a|b*c

Page 83: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Outline
3.1 Overview
3.2 Regular Expressions
3.3 Finite Automata and Scanners
3.4 Using a Scanner Generator
3.5 Practical Considerations
3.6 Translating Regular Expressions into Finite Automata
3.7 Tracing Example

Modified from http://www.cs.ualberta.ca/~amaral/courses/680/

Page 84: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (1)
Review: steps of a scanner generator

A regular expression -> Nondeterministic FA -> Deterministic FA -> Optimized Deterministic FA (minimize # of states)

The important step is NFA -> DFA.

Page 85: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (2)
Regular expressions (compare the inputs if and ifa)

if                  {return IF;}
[a-z][a-z0-9]*      {return ID;}
[0-9]+              {return NUM;}
.                   {error();}

Page 86: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (3)
Translate from RE to NFA

A regular expression -> Nondeterministic FA -> Deterministic FA -> Optimized Deterministic FA (minimize # of states)

Page 87: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (4)
if {return IF;}

The NFA for a symbol i is: start -> (1) -i-> (2)
The NFA for a symbol f is: start -> (1) -f-> (2)
The NFA for the regular expression if is: start -> (1) -i-> (2) -f-> (3), where state 3 accepts IF

Page 88: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

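Tracing Example (5)
[a-z][a-z0-9]* {return ID;}

[Figure: NFA for ID with states 1 to 4: a transition on a-z out of the start state 1, then a loop on a-z and 0-9]

Page 89: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (6)
[0-9]+ {return NUM;}

[Figure: NFA for NUM with states 1 to 5: a transition on 0-9 out of the start state 1, then a loop on 0-9]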

Page 90: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (9)

[Figure: the four NFAs side by side: IF (1 to 2 on i, 2 to 3 on f), ID (a-z, then a-z or 0-9 repeated), NUM (0-9, then 0-9 repeated) and error (1 to 2 on any character but \n)]

Page 91: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (10)
Translate from NFA to DFA

A regular expression -> Nondeterministic FA -> Deterministic FA -> Optimized Deterministic FA (minimize # of states)

Page 92: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

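Tracing Example (11)
Full NFA diagram

[Figure: the combined NFA with states 1 to 15. From the start state 1, one branch recognizes IF (on i, then f), one ID (on a-z, then a-z or 0-9 repeated), one NUM (on 0-9, then 0-9 repeated) and one error (on any character).]

Special case: handled in the final states.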

Page 93: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example (12)

[Figure: the full NFA with states 1 to 15, as on the previous slide]

1. λ-closure(1) = {1, 4, 9, 14}
2. λ-closure(2) = {2}
3. λ-closure(3) = {3}
4. λ-closure(5) = {5, 6, 8}
5. λ-closure(7) = {6, 7, 8}
6. λ-closure(8) = {6, 8}
7. λ-closure(10) = {10, 11, 13}
8. λ-closure(12) = {11, 12, 13}
9. λ-closure(13) = {11, 13}
10. λ-closure(15) = {15}

Page 94: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example(13)

04/21/2394

DFA States = {1-4-9-14}

1-4-9-14

Now we need to compute:

move(1-4-9-14,a-h) = {5,15}

-closure({5,15}) = {5,6,8,15}

a-h 5-6-8-15

2 3 84 5 6 7

914

1

a-z

0-90-9

a-z

0-9i

f

IF

error

NUM

ID

anycharacter

10 11 12 1315

Page 95: 1 Chapter 3 Scanning - Theory and Practice Prof Chung. 10/8/2015.

Tracing Example(16)

04/21/2395

DFA States = {1-4-9-14}

move(1-4-9-14, i) = {2,5,15}

ε-closure({2,5,15}) = {2,5,6,8,15}

DFA so far:
1-4-9-14 --a-h--> 5-6-8-15
1-4-9-14 --i--> 2-5-6-8-15

Tracing Example(21)

DFA States = {1-4-9-14}

move(1-4-9-14, j-z) = {5,15}

ε-closure({5,15}) = {5,6,8,15}

DFA so far:
1-4-9-14 --a-h--> 5-6-8-15
1-4-9-14 --i--> 2-5-6-8-15
1-4-9-14 --j-z--> 5-6-8-15

Tracing Example(22)

DFA States = {1-4-9-14}

move(1-4-9-14, 0-9) = {10,15}

ε-closure({10,15}) = {10,11,13,15}

DFA so far:
1-4-9-14 --a-h--> 5-6-8-15
1-4-9-14 --i--> 2-5-6-8-15
1-4-9-14 --j-z--> 5-6-8-15
1-4-9-14 --0-9--> 10-11-13-15

Tracing Example(23)

DFA States = {1-4-9-14}

move(1-4-9-14, other) = {15}

ε-closure({15}) = {15}

DFA so far:
1-4-9-14 --a-h--> 5-6-8-15
1-4-9-14 --i--> 2-5-6-8-15
1-4-9-14 --j-z--> 5-6-8-15
1-4-9-14 --0-9--> 10-11-13-15
1-4-9-14 --other--> 15

Tracing Example(24)

DFA states = {1-4-9-14}

The analysis for 1-4-9-14 is complete. We mark it and pick another state in the DFA to analyze. (Practice)

DFA so far:
1-4-9-14 --a-h--> 5-6-8-15
1-4-9-14 --i--> 2-5-6-8-15
1-4-9-14 --j-z--> 5-6-8-15
1-4-9-14 --0-9--> 10-11-13-15
1-4-9-14 --other--> 15
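The "mark a state, then pick another unmarked state" loop can be written as a worklist algorithm. The sketch below is an illustration under assumptions: the NFA edges are reconstructed from the moves and closures in these slides, and each input class is probed with one representative character (the `CLASSES` mapping is hypothetical, with "+" standing in for "other").

```python
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)

# (source, label, target); a label of None stands for "any character".
CHAR_EDGES = [
    (1, {"i"}, 2), (2, {"f"}, 3),
    (4, LETTERS, 5), (6, LETTERS | DIGITS, 7),
    (9, DIGITS, 10), (11, DIGITS, 12),
    (14, None, 15),
]
EPS = {1: {4, 9, 14}, 5: {8}, 7: {8}, 8: {6}, 10: {13}, 12: {13}, 13: {11}}
# One representative character per input class.
CLASSES = {"0-9": "0", "a-e": "a", "f": "f", "g-h": "g",
           "i": "i", "j-z": "j", "other": "+"}


def closure(states):
    result, stack = set(states), list(states)
    while stack:
        for t in EPS.get(stack.pop(), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return frozenset(result)


def move(states, ch):
    return {dst for src, label, dst in CHAR_EDGES
            if src in states and (label is None or ch in label)}


def subset_construction():
    """Build the DFA by repeatedly marking a state and expanding it."""
    start = closure({1})
    marked, worklist, transitions = set(), [start], {}
    while worklist:
        state = worklist.pop()
        if state in marked:
            continue
        marked.add(state)  # the analysis of this state is now complete
        for name, ch in CLASSES.items():
            target = closure(move(state, ch))
            if target:  # an empty set means "no transition"
                transitions[(state, name)] = target
                worklist.append(target)
    return start, marked, transitions


start, states, transitions = subset_construction()
for s in sorted(states, key=sorted):
    print("-".join(map(str, sorted(s))))
```

Under these assumptions the loop ends with eight DFA states, which can be compared against the result of working the practice by hand.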

Tracing Example(25)

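DFA states and the tokens they accept:

1-4-9-14 (start)
2-5-6-8-15: ID
5-6-8-15: ID
6-7-8: ID
3-6-7-8: IF
10-11-13-15: NUM
11-12-13: NUM
15: error

Transitions:

1-4-9-14 --a-h--> 5-6-8-15
1-4-9-14 --i--> 2-5-6-8-15
1-4-9-14 --j-z--> 5-6-8-15
1-4-9-14 --0-9--> 10-11-13-15
1-4-9-14 --other--> 15
2-5-6-8-15 --f--> 3-6-7-8
2-5-6-8-15 --a-e, g-z, 0-9--> 6-7-8
5-6-8-15 --a-z,0-9--> 6-7-8
3-6-7-8 --a-z,0-9--> 6-7-8
6-7-8 --a-z,0-9--> 6-7-8
10-11-13-15 --0-9--> 11-12-13
11-12-13 --0-9--> 11-12-13

See p. 118 of Aho-Sethi-Ullman and p. 29 of Appel.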

Tracing Example(26)

A regular expression -> Nondeterministic FA -> Deterministic FA -> Optimized Deterministic FA (minimize # of states)

Current step: Minimize DFA

Tracing Example(27)

Transition Table (DFA states A-H)

State  0-9  a-e  f   g-h  i   j-z  other
A      D    C    C   C    B   C    E
B      G    G    F   G    G   G    -
C      G    G    G   G    G   G    -
D      H    -    -   -    -   -    -
E      -    -    -   -    -   -    -
F      G    G    G   G    G   G    -
G      G    G    G   G    G   G    -
H      H    -    -   -    -   -    -
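A transition table like this one can drive a scanner directly. The following is a minimal sketch: the table rows are copied from the slide, but the `ACCEPT` mapping from states to tokens is an assumption (inferred from which NFA final states each DFA state contains), and `column` and `scan` are illustrative names.

```python
# One string per state; position = input class, "-" = no transition.
# Input classes, in order: 0-9, a-e, f, g-h, i, j-z, other.
TABLE = {
    "A": "DCCCBCE",
    "B": "GGFGGG-",
    "C": "GGGGGG-",
    "D": "H------",
    "E": "-------",
    "F": "GGGGGG-",
    "G": "GGGGGG-",
    "H": "H------",
}
# Assumed token for each accepting state.
ACCEPT = {"B": "ID", "C": "ID", "D": "NUM", "E": "error",
          "F": "IF", "G": "ID", "H": "NUM"}


def column(ch):
    """Map a character to its input-class index in the table."""
    if "0" <= ch <= "9":
        return 0
    if "a" <= ch <= "e":
        return 1
    if ch == "f":
        return 2
    if ch in "gh":
        return 3
    if ch == "i":
        return 4
    if "j" <= ch <= "z":
        return 5
    return 6


def scan(text, pos=0):
    """Return (token, lexeme) for the longest match starting at pos, or None."""
    state, last, i = "A", None, pos
    while i < len(text):
        nxt = TABLE[state][column(text[i])]
        if nxt == "-":
            break
        state, i = nxt, i + 1
        if state in ACCEPT:
            last = (ACCEPT[state], text[pos:i])  # remember the latest accept
    return last


for sample in ("if", "if8", "42+", "+"):
    print(sample, scan(sample))
```

Remembering the last accepting state is what gives the longest-match behavior: the scanner keeps going while transitions exist and reports the most recent accept once it is stuck.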

Tracing Example(28)

In the transition table for DFA states A-H, rows C, F, and G are identical, and so are rows D and H. Merging F and G into C, and H into D, gives:

New Transition Table-1

State  0-9  a-e  f   g-h  i   j-z  other
A      D    C    C   C    B   C    E
B      C    C    C   C    C   C    -
C      C    C    C   C    C   C    -
D      D    -    -   -    -   -    -
E      -    -    -   -    -   -    -

Tracing Example(29)

In New Transition Table-1, rows B and C are now identical, so C is merged into B:

New Transition Table-2

State  0-9  a-e  f   g-h  i   j-z  other
A      D    B    B   B    B   B    E
B      B    B    B   B    B   B    -
D      D    -    -   -    -   -    -
E      -    -    -   -    -   -    -
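The merging steps can be checked with a small partition-refinement routine. This is a sketch, not the slides' own procedure: `minimize` and the two label maps are illustrative, and it assumes the original eight-state table. Reproducing the four-state result requires treating the IF state F as an ordinary ID state; if IF is kept as a distinct token, the refinement stops at six states instead.

```python
# Original eight-state table; input classes: 0-9, a-e, f, g-h, i, j-z, other.
TABLE = {
    "A": "DCCCBCE", "B": "GGFGGG-", "C": "GGGGGG-", "D": "H------",
    "E": "-------", "F": "GGGGGG-", "G": "GGGGGG-", "H": "H------",
}
# Assumption matching the slides' merge: IF is folded into ID.
SLIDE_LABELS = {"B": "ID", "C": "ID", "F": "ID", "G": "ID",
                "D": "NUM", "H": "NUM", "E": "error"}
# Alternative: keep IF as its own token.
EXACT_LABELS = dict(SLIDE_LABELS, F="IF")


def minimize(table, labels):
    """Split states until states in a block agree on label and successor blocks."""
    block = {s: labels.get(s, "non-accepting") for s in table}
    while True:
        signature = {
            s: (block[s], tuple(block.get(t, "dead") for t in table[s]))
            for s in table
        }
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in table}
        if len(ids) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block


merged = minimize(TABLE, SLIDE_LABELS)
exact = minimize(TABLE, EXACT_LABELS)
print(len(set(merged.values())), "states when IF is treated as ID")
print(len(set(exact.values())), "states when IF stays a separate token")
```

The difference between the two runs is the reason IF needs separate handling once B, C, F, and G have been merged.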

Tracing Example(30)

Merged states of the original DFA (A-H): B=C=F=G, D=H

New DFA (from New Transition Table-2):

A (start) --a-z--> B
A --0-9--> D
A --other--> E
B --a-z,0-9--> B
D --0-9--> D

Accepting states: B (ID, with IF as a special case), D (NUM), E (error)

IF can be handled by look-ahead programming.
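One way to recover IF on top of the four-state DFA is sketched below. Note the swapped-in technique: the slide mentions look-ahead programming, whereas this sketch uses a reserved-word lookup on the scanned identifier, which is a different but common approach. The `KEYWORDS` table, `classify`, and `scan` are hypothetical names.

```python
KEYWORDS = {"if": "IF"}  # reserved words checked after an ID is scanned

# Minimized DFA from New Transition Table-2, with a-z collapsed into "letter".
TABLE = {
    "A": {"digit": "D", "letter": "B", "other": "E"},
    "B": {"digit": "B", "letter": "B"},
    "D": {"digit": "D"},
    "E": {},
}
ACCEPT = {"B": "ID", "D": "NUM", "E": "error"}


def classify(ch):
    """Map a character to the input class used by the minimized DFA."""
    if "0" <= ch <= "9":
        return "digit"
    if "a" <= ch <= "z":
        return "letter"
    return "other"


def scan(text, pos=0):
    """Longest match at pos; identifiers found in KEYWORDS become keywords."""
    state, last, i = "A", None, pos
    while i < len(text):
        nxt = TABLE[state].get(classify(text[i]))
        if nxt is None:
            break
        state, i = nxt, i + 1
        last = (ACCEPT[state], text[pos:i])  # every non-start state accepts
    if last and last[0] == "ID":
        last = (KEYWORDS.get(last[1], "ID"), last[1])
    return last


for sample in ("if", "iffy", "x1", "007"):
    print(sample, scan(sample))
```

The lookup keeps the DFA small, since adding more keywords only grows the table of reserved words rather than the automaton.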


Chapter 3 End

Any Question?


In-class quiz (1+): What is the optimized DFA for 1+?


NFA for 1+ (states 1-4), using 1+ = 1 . 1* (this method can be used):

1. ε-closure(1) = {1, 2}
2. ε-closure(2) = {2}
3. ε-closure(3) = {3, 4, 2}
4. ε-closure(4) = {4, 2}

DFA states:

A = ε-closure(1) = {1, 2} (start)
B = ε-closure(3) = {3, 4, 2}

State  0   1
A      -   B
B      -   B

DFA: A (start) --1--> B, B --1--> B

The DFA cannot be optimized (merged) any further, because A is the start state (not final) and B is the final state.
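The two-state answer can be sanity-checked by simulating it. A minimal sketch, assuming the table above with A as the start state and B as the only final state; the `accepts` helper is illustrative.

```python
# DFA for 1+: A is the start state, B is the only final state.
TABLE = {"A": {"1": "B"}, "B": {"1": "B"}}
FINAL = {"B"}


def accepts(text):
    """Return True if the DFA ends in a final state after reading text."""
    state = "A"
    for ch in text:
        state = TABLE[state].get(ch)
        if state is None:  # no transition: reject
            return False
    return state in FINAL


for sample in ("", "1", "111", "101"):
    print(repr(sample), accepts(sample))
```

The empty string is rejected, which shows why A and B cannot be merged: merging them would make the start state final and turn 1+ into 1*.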