We are given the following regular definition: if -> if then -> then else -> else relop -> |>|>= id...

IMPLEMENTING A LEXICAL ANALYZER USING FINITE AUTOMATA


We are given the following regular definition:

if -> if
then -> then
else -> else
relop -> < | <= | = | <> | > | >=
id -> letter(letter|digit)*
num -> digit+ (.digit+)? (E(+|-)?digit+)?
letter -> [a-z] | [A-Z]
digit -> [0-9]
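The num definition can be checked by hand with a small matcher. The following is a minimal sketch, assuming a NUL-terminated input string; the function name match_num is hypothetical and does not appear in the slides. It returns the length of the longest prefix that matches num, or 0 if there is no match.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides): length of the longest
   prefix of s matching
       num -> digit+ (.digit+)? (E(+|-)?digit+)?
   or 0 if s does not start with a num. */
size_t match_num(const char *s)
{
    size_t i = 0;

    /* digit+ : at least one digit is required */
    if (!isdigit((unsigned char)s[i])) return 0;
    while (isdigit((unsigned char)s[i])) i++;

    /* (.digit+)? : the '.' only counts if a digit follows it */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i])) i++;
    }

    /* (E(+|-)?digit+)? : the exponent only counts if it has digits */
    if (s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        if (isdigit((unsigned char)s[j])) {
            while (isdigit((unsigned char)s[j])) j++;
            i = j;
        }
    }
    return i;
}
```

Both optional parts are accepted only when complete, so an input such as 1. or 6E yields the lexeme 1 or 6 and leaves the trailing character for the next token.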


Recognize the keywords if, then, else and the lexemes relop, id, num.

delim -> blank | tab | newline
ws -> delim+

If a match for ws is found, the lexical analyzer does not return a token to the parser. It proceeds to find the token following the white space and returns that to the parser.
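The white-space rule can be sketched in C as follows. This is only an illustration, assuming the input is a NUL-terminated string indexed by position; is_delim and skip_ws are hypothetical names, not functions from the slides.

```c
#include <stddef.h>

/* delim -> blank | tab | newline */
static int is_delim(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: consume ws -> delim+ starting at pos and return
   the index where the next lexeme begins. No token is produced for the
   skipped characters. */
size_t skip_ws(const char *input, size_t pos)
{
    while (is_delim(input[pos])) pos++;
    return pos;
}
```

The terminating NUL is not a delimiter, so the loop stops at the end of the input.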


TRANSITION DIAGRAMS

A transition diagram (TD) depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.

The TD keeps track of information about the characters seen as the forward pointer scans the input.

Positions in a TD are drawn as circles called states.

States are connected by arrows called edges.

Edges leaving state s have labels indicating the input characters that can appear next after the transition diagram has reached state s.


Start state: the state where control resides when we begin to recognize a token.

If no valid transition exists from the current state, this indicates failure.

Accepting state: a state in which a token has been found.

* marks a state in which retraction of the forward pointer must take place.

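[Figure: transition diagram for identifiers. start -> state 0; state 0 -> state 1 on letter; state 1 loops on letter/digit; state 1 -> state 2* on delimiter.]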
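The identifier diagram can be simulated directly with a state variable. The following is a minimal sketch under some assumptions: scan_id is a hypothetical name, the input is a NUL-terminated string, and any character that is not a letter or digit is treated as the delimiter that leads to state 2.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical simulation of the identifier diagram (states 0, 1, 2*).
   Returns the length of the identifier lexeme, or 0 when state 0 has no
   valid transition (failure). */
size_t scan_id(const char *input)
{
    size_t forward = 0;   /* forward pointer, as an index */
    int state = 0;        /* start state */

    for (;;) {
        char c = input[forward++];   /* read a character and advance */
        switch (state) {
        case 0:
            if (isalpha((unsigned char)c)) state = 1;
            else return 0;           /* no valid transition: failure */
            break;
        case 1:
            if (!isalnum((unsigned char)c)) state = 2;
            break;                   /* letter or digit: stay in state 1 */
        }
        if (state == 2) {            /* accepting state marked with * */
            forward--;               /* retract: c is not part of the lexeme */
            return forward;
        }
    }
}
```

The retraction in state 2 is what the * denotes: the character that ended the identifier was read only to detect the end, so it is handed back for the next token.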


There may be several transition diagrams.

If failure occurs while following one transition diagram, retract the forward pointer to where it was in the start state of that diagram and activate the next transition diagram.

If failure occurs in all transition diagrams, a lexical error is detected and the error-recovery routines are invoked.

e.g. in Fortran, DO 5 I=1.25 is an assignment to the variable DO5I, whereas DO 5 I=1,25 begins a DO loop; the two cannot be told apart until the . or , is seen.
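The strategy of trying one diagram after another can be sketched as a table of matcher functions. Everything here is illustrative: the names diagram_fn, td_relop, td_id, td_digits and first_match are hypothetical, and the digits diagram is a simplified stand-in for the full num diagrams.

```c
#include <ctype.h>
#include <stddef.h>

/* Each diagram returns the lexeme length, or 0 on failure. */
typedef size_t (*diagram_fn)(const char *);

static size_t td_relop(const char *s)
{
    if (s[0] == '<') return (s[1] == '=' || s[1] == '>') ? 2 : 1;
    if (s[0] == '>') return (s[1] == '=') ? 2 : 1;
    return s[0] == '=' ? 1 : 0;
}

static size_t td_id(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t td_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Try the diagrams in order. Returns the index of the first diagram
   that accepts and stores the lexeme length in *len, or returns -1
   when every diagram fails (a lexical error). */
int first_match(const char *lexeme_start, size_t *len)
{
    static const diagram_fn diagrams[] = { td_relop, td_id, td_digits };
    size_t count = sizeof diagrams / sizeof diagrams[0];

    for (size_t i = 0; i < count; i++) {
        /* Every call restarts at lexeme_start, which plays the role of
           retracting the forward pointer after a failure. */
        *len = diagrams[i](lexeme_start);
        if (*len > 0) return (int)i;
    }
    return -1;
}
```

A return value of -1 corresponds to the point where the error-recovery routines would be invoked.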


RECOGNITION OF RESERVED WORDS

Initialize appropriately the symbol table in which information about identifiers is stored.

Enter the reserved words into the symbol table before any characters in the input are seen.

Make a note in the symbol table of the token to be returned when the keyword is identified.

The return statement next to the accepting state uses gettoken() and install_id() to obtain the token and attribute value.

When a lexeme is identified, the symbol table is checked:

If the lexeme is found as a keyword, install_id() returns 0.

If it is an identifier, install_id() returns a pointer to its symbol table entry.

gettoken() returns the corresponding token.
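One way to realize this scheme is a table that already holds the keywords when scanning starts. This is a sketch under stated assumptions, not the slides' implementation: the slides call install_id() and gettoken() without arguments, whereas here the lexeme is passed in explicitly, and the table layout, its size of 64, and the 31-character lexeme limit are all invented for illustration.

```c
#include <stddef.h>
#include <string.h>

enum token_name { IF = 1, THEN, ELSE, ID };

/* Assumed layout: lexemes are shorter than 32 characters. */
struct entry { char lexeme[32]; enum token_name token; };

/* The reserved words are entered before any input is seen. */
static struct entry symtab[64] = {
    { "if", IF }, { "then", THEN }, { "else", ELSE }
};
static size_t n_entries = 3;

static struct entry *lookup(const char *lexeme)
{
    for (size_t i = 0; i < n_entries; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return &symtab[i];
    return NULL;
}

/* Returns NULL (0) for a keyword; otherwise a pointer to the
   identifier's entry, which is created the first time it is seen. */
struct entry *install_id(const char *lexeme)
{
    struct entry *e = lookup(lexeme);
    if (e != NULL) return e->token == ID ? e : NULL;
    if (n_entries == 64) return NULL;  /* table full: not handled here */
    e = &symtab[n_entries++];
    strncpy(e->lexeme, lexeme, sizeof e->lexeme - 1);
    e->lexeme[sizeof e->lexeme - 1] = '\0';
    e->token = ID;
    return e;
}

/* Returns the token noted in the table for this lexeme, or ID when
   the lexeme is not a keyword. */
enum token_name gettoken(const char *lexeme)
{
    struct entry *e = lookup(lexeme);
    return e != NULL ? e->token : ID;
}
```

Because keywords and identifiers share one table, the identifier diagram needs no separate states for if, then and else; the lookup decides which token is returned.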


RECOGNITION OF NUMBERS

When the accepting state is reached, call a procedure install_num() that enters the lexeme into the table of numbers and returns a pointer to the created entry.

The lexical analyzer then returns the token NUM.
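A table of numbers could look like the following. This is a hypothetical sketch: the slides call install_num() without arguments and do not say what an entry contains, so the explicit lexeme parameter, the struct num_entry layout, and the fixed table size are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed entry layout: the lexeme text plus its numeric value. */
struct num_entry { char lexeme[32]; double value; };

static struct num_entry num_table[64];
static size_t n_nums = 0;

/* Hypothetical install_num: appends the lexeme to the table of numbers
   and returns a pointer to the created entry, or NULL if the table is
   full. */
struct num_entry *install_num(const char *lexeme)
{
    struct num_entry *e;

    if (n_nums == 64) return NULL;
    e = &num_table[n_nums++];
    strncpy(e->lexeme, lexeme, sizeof e->lexeme - 1);
    e->lexeme[sizeof e->lexeme - 1] = '\0';
    e->value = strtod(lexeme, NULL);   /* accepts forms such as 1.5E2 */
    return e;
}
```

The returned pointer would serve as the attribute value that accompanies the token NUM.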


IMPLEMENTING LEXICAL ANALYZER

Token nexttoken( )
{
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            /* skip white space and advance the start of the lexeme */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        case 1:
            c = nextchar();
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2:
            token.attribute = LE;
            token.name = relop;
            return token;


        case 8:
            retract(1);
            token.attribute = GT;
            token.name = relop;
            return token;
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11:
            retract(1);
            entry = install_id();
            name = gettoken();
            token.name = name;
            token.attribute = entry;
            return token;
        /* cases 12-24 here for numbers */


        case 25:
            c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26:
            c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27:
            retract(1);
            install_num();
            return (NUM);
        }
    }
}


CODE FOR NEXT STATE

int state = 0, start = 0;
int lexical_value;

int fail()
{
    /* retract the forward pointer, then move on to the next diagram */
    forward = token_beginning;
    switch (start) {
    case 0:  start = 9;  break;
    case 9:  start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover(); break;
    default: /* compiler error */ break;
    }
    return start;
}