john-nicholson -
IMPLEMENTING A LEXICAL ANALYZER USING FINITE AUTOMATA
We are given the following regular definition:
if -> if
then -> then
else -> else
relop -> < | <= | = | <> | > | >=
id -> letter(letter|digit)*
num -> digit+ (. digit+)? (E (+|-)? digit+)?
letter -> [a-z] | [A-Z]
digit -> [0-9]
Recognize the keywords if, then, else and the lexemes relop, id, num.
delim -> blank | tab | newline
ws -> delim+
If a match for ws is found, the lexical analyzer does not return a token to the parser. It proceeds to find the token following the white space and returns that to the parser.
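The num definition above can be checked by hand with a small matcher. The following is a minimal sketch, not part of the original slides: the function name match_num and its convention of returning the matched prefix length (0 for no match) are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest prefix of s matching
   num -> digit+ (. digit+)? (E (+|-)? digit+)?   or 0 if no prefix matches. */
size_t match_num(const char *s)
{
    assert(s != NULL);
    size_t i = 0;

    /* digit+ : at least one digit is required */
    while (isdigit((unsigned char)s[i])) i++;
    if (i == 0) return 0;

    /* (. digit+)? : take the fraction only if a digit follows the dot */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i])) i++;
    }

    /* (E (+|-)? digit+)? : take the exponent only if it is complete */
    if (s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        if (isdigit((unsigned char)s[j])) {
            while (isdigit((unsigned char)s[j])) j++;
            i = j;
        }
    }
    return i;
}
```

An incomplete optional part is simply left unconsumed, so for the input 3E+ only the leading 3 is taken as the number.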
TRANSITION DIAGRAMS
A transition diagram (TD) depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.
A TD keeps track of information about the characters seen as the forward pointer scans the input.
Positions in a TD are drawn as circles called states.
States are connected by arrows called edges.
Edges leaving state s have labels indicating the input characters that can appear next after the transition diagram has reached state s.
Start state: the state where control resides when we begin to recognize a token.
The absence of a valid transition indicates failure.
Accepting state: a state in which a token has been found.
* marks a state in which retraction must take place.
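[Transition diagram for identifiers: start -> state 0; edge "letter" from state 0 to state 1; loop "letter/digit" on state 1; edge "delimiter" from state 1 to accepting state 2, marked *]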
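The identifier diagram can be simulated directly with a state variable and a forward index. This is a minimal sketch under stated assumptions: the name scan_id and the choice to return the lexeme length are hypothetical, and any character that is not a letter or digit is treated as the delimiter that leads to state 2.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical simulation of the three-state diagram for
   id -> letter(letter|digit)*.
   Returns the lexeme length, or 0 if the diagram fails in state 0. */
size_t scan_id(const char *input)
{
    assert(input != NULL);
    size_t forward = 0;   /* index of the next character to read */
    int state = 0;
    char c;

    for (;;) {
        switch (state) {
        case 0:                       /* start state: expect a letter */
            c = input[forward++];
            if (isalpha((unsigned char)c)) state = 1;
            else return 0;            /* no valid transition: failure */
            break;
        case 1:                       /* loop on letter/digit */
            c = input[forward++];
            if (!isalnum((unsigned char)c)) state = 2;
            break;
        case 2:                       /* accepting state marked * */
            forward--;                /* retract: the delimiter is not part of the lexeme */
            return forward;
        }
    }
}
```

For the input count1 = 0 the function returns 6: the blank is read in state 1, then given back by the retraction in state 2.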
There may be several transition diagrams.
If failure occurs while following one transition diagram, retract the forward pointer to where it was in the start state of that diagram and activate the next transition diagram.
If failure occurs in all transition diagrams, a lexical error has been detected and the error recovery routines are invoked.
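e.g. DO 5 I=1.25 versus DO 5 I=1,25. In Fortran, where blanks are not significant, the first is an assignment to the variable DO5I and the second is the start of a DO loop; the two cannot be told apart until the . or the , is seen.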
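The fallback from one diagram to the next can be sketched as a loop over recognizers that all start from the same lexeme beginning. Everything named here (diagram_fn, td_id, td_digits, first_accepting_diagram) is hypothetical, and the two diagrams are deliberately simplified stand-ins for the real ones.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Each hypothetical diagram returns the number of characters it accepts,
   or 0 to signal failure. */
typedef size_t (*diagram_fn)(const char *);

/* Simplified identifier diagram: letter(letter|digit)* */
static size_t td_id(const char *s)
{
    size_t i = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[i])) i++;
    return i;
}

/* Simplified number diagram: digit+ */
static size_t td_digits(const char *s)
{
    size_t i = 0;
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* Tries the diagrams in order. Returns the index of the first diagram that
   accepts, or -1 when every diagram fails (a lexical error). */
int first_accepting_diagram(const char *lexeme_beginning)
{
    static const diagram_fn diagrams[] = { td_id, td_digits };
    const size_t count = sizeof diagrams / sizeof diagrams[0];

    assert(lexeme_beginning != NULL);
    for (size_t k = 0; k < count; k++) {
        /* Retraction: every diagram restarts at the lexeme beginning. */
        const char *forward = lexeme_beginning;
        if (diagrams[k](forward) > 0) return (int)k;
    }
    return -1;   /* all diagrams failed: error recovery would start here */
}
```

The order of the array plays the role of the chain of start states: a diagram is only tried after all earlier ones have failed on the same input position.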
RECOGNITION OF RESERVED WORDS
Initialize appropriately the symbol table in which information about identifiers is stored.
Enter the reserved words into the symbol table before any characters in the input are seen.
Make a note in the symbol table of the token to be returned when the keyword is identified.
The return statement next to the accepting state uses gettoken() and install_id() to obtain the token and the attribute value.
When a lexeme is identified, the symbol table is checked.
If the lexeme is found as a keyword, install_id() returns 0.
If it is an identifier, a pointer to its symbol table entry is returned.
gettoken() returns the corresponding token.
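A symbol table preloaded with the reserved words might look like the sketch below. It is an illustration only: the slides call install_id() and gettoken() without arguments, whereas here the lexeme is passed as a parameter, the "pointer" is an array index, and the token codes, table sizes, and the lookup helper are all assumptions.

```c
#include <assert.h>
#include <string.h>

enum { ID = 256, IF, THEN, ELSE };                 /* hypothetical token codes */
enum { N_KEYWORDS = 3, MAX_ENTRIES = 64, MAX_LEXEME = 32 };

struct entry { char lexeme[MAX_LEXEME]; int token; };

/* Reserved words are entered before any input is seen, together with the
   token to return for each of them. */
static struct entry symtab[MAX_ENTRIES] = {
    { "if", IF }, { "then", THEN }, { "else", ELSE }
};
static int n_entries = N_KEYWORDS;

/* Linear search; returns the entry index or -1 if the lexeme is absent. */
static int lookup(const char *lexeme)
{
    for (int i = 0; i < n_entries; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return i;
    return -1;
}

/* Returns 0 for a keyword, otherwise the index of the identifier's entry,
   creating the entry on first sight. */
int install_id(const char *lexeme)
{
    assert(lexeme != NULL && strlen(lexeme) < MAX_LEXEME);
    int i = lookup(lexeme);
    if (i >= 0) return i < N_KEYWORDS ? 0 : i;

    assert(n_entries < MAX_ENTRIES);
    strcpy(symtab[n_entries].lexeme, lexeme);      /* length checked above */
    symtab[n_entries].token = ID;
    return n_entries++;
}

/* Returns the keyword's own token, or ID for any other lexeme. */
int gettoken(const char *lexeme)
{
    assert(lexeme != NULL);
    int i = lookup(lexeme);
    return i >= 0 ? symtab[i].token : ID;
}
```

Because the keywords occupy the first slots, the value 0 can never be the index of an identifier, so it is safe to use it as the "this was a keyword" result.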
RECOGNITION OF NUMBERS
When the accepting state is reached, call a procedure install_num() that enters the lexeme into a table of numbers and returns a pointer to the created entry.
The lexical analyzer then returns the token NUM.
IMPLEMENTING LEXICAL ANALYZER

    Token nexttoken()
    {
        while (1) {
            switch (state) {
            case 0:
                c = nextchar();
                if (c == blank || c == tab || c == newline) {
                    /* skip white space and move the lexeme start along */
                    state = 0;
                    lexeme_beginning++;
                }
                else if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else state = fail();
                break;
            case 1:
                c = nextchar();
                if (c == '=') state = 2;
                else if (c == '>') state = 3;
                else state = 4;
                break;
            case 2:
                token.attribute = LE;
                token.name = relop;
                return token;
            /* cases 3-7 here for the remaining relops */
            case 8:
                retract(1);
                token.attribute = GT;
                token.name = relop;
                return token;
            case 9:
                c = nextchar();
                if (isletter(c)) state = 10;
                else state = fail();
                break;
            case 10:
                c = nextchar();
                if (isletter(c)) state = 10;
                else if (isdigit(c)) state = 10;
                else state = 11;
                break;
            case 11:
                retract(1);
                entry = install_id();
                name = gettoken();
                token.name = name;
                token.attribute = entry;
                return token;
            /* cases 12-24 here for numbers */
            case 25:
                c = nextchar();
                if (isdigit(c)) state = 26;
                else state = fail();
                break;
            case 26:
                c = nextchar();
                if (isdigit(c)) state = 26;
                else state = 27;
                break;
            case 27:
                retract(1);
                /* return a Token, like the other accepting states */
                token.attribute = install_num();
                token.name = NUM;
                return token;
            }
        }
    }
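Cases 3 to 7 are not shown in the listing above. The sketch below shows how the whole relop diagram (states 0 to 8) might be coded as a standalone function. The numbering of states 3 to 7 and their attributes are an assumption that follows the usual layout of this diagram, and the names scan_relop, struct relop, and NONE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum relop_attr { NONE = 0, LT, LE, EQ, NE, GT, GE };

/* Result of one scan: the attribute found and the number of characters used. */
struct relop { enum relop_attr attr; size_t length; };

/* Hypothetical standalone version of states 0-8 for
   relop -> < | <= | = | <> | > | >=  */
struct relop scan_relop(const char *input)
{
    assert(input != NULL);
    struct relop r = { NONE, 0 };
    size_t forward = 0;
    int state = 0;
    char c;

    for (;;) {
        switch (state) {
        case 0:
            c = input[forward++];
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else return r;                 /* failure: not a relop */
            break;
        case 1:
            c = input[forward++];
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: r.attr = LE; r.length = forward; return r;
        case 3: r.attr = NE; r.length = forward; return r;
        case 4:                            /* assumed: < followed by other */
            forward--;                     /* retract the lookahead character */
            r.attr = LT; r.length = forward; return r;
        case 5: r.attr = EQ; r.length = forward; return r;
        case 6:
            c = input[forward++];
            if (c == '=') state = 7;
            else state = 8;
            break;
        case 7: r.attr = GE; r.length = forward; return r;
        case 8:
            forward--;                     /* retract the lookahead character */
            r.attr = GT; r.length = forward; return r;
        }
    }
}
```

States 4 and 8 are the two retracting states: they are entered only after reading one character too many, so that character is given back before the token is returned.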
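CODE FOR NEXT STATE

    int state = 0, start = 0;
    int lexical_value;

    int fail()
    {
        /* retract the forward pointer to the start of the lexeme */
        forward = lexeme_beginning;
        /* move on to the start state of the next transition diagram */
        switch (start) {
        case 0:  start = 9;  break;
        case 9:  start = 12; break;
        case 12: start = 20; break;
        case 20: start = 25; break;
        case 25: recover();  break;
        default: /* compiler error */ break;
        }
        return start;
    }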