Recognition of Tokens - people.hsc.edupeople.hsc.edu/faculty-staff/robbk/Coms480/Lectures... ·...
Transcript of Recognition of Tokens - people.hsc.edupeople.hsc.edu/faculty-staff/robbk/Coms480/Lectures... ·...
Recognition of TokensLecture 3
Section 3.4
Robb T. Koether
Hampden-Sydney College
Mon, Jan 19, 2015
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 1 / 21
1 A Class of Tokens
2 The Input Buffer
3 Transition Diagrams
4 Writing the Lexer
5 Assignment
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 2 / 21
Outline
1 A Class of Tokens
2 The Input Buffer
3 Transition Diagrams
4 Writing the Lexer
5 Assignment
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 3 / 21
A Class of Tokens
We will explore and demonstrate the concepts of a lexer by usinga simple class of tokens.
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id → letter (letter | digit)∗
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 4 / 21
Whitespace
In addition to recognizing tokens, the lexer must strip whitespacefrom the input.Whitespace is not a token, but it must be recognized by the lexer.
ws → (blank | tab | newline)+
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 5 / 21
Outline
1 A Class of Tokens
2 The Input Buffer
3 Transition Diagrams
4 Writing the Lexer
5 Assignment
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 6 / 21
The Input Buffer
The input to the lexer is a stream of characters.We may consider the characters to be residing in a buffer.We mark two positions in the buffer.
lexemeBeginforward
The pointer lexemeBegin holds the starting position of the currenttoken.The pointer forward points to the current symbol.
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 7 / 21
The Input Buffer
The lexer begins in the start state with the current symbol (pointedto by both lexemeBegin and forward).The process moves from state to state by following the transitionswhose labels match the current symbol (forward).This continues until no further moves are possible.
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 8 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
The input buffer
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
Advance one symbol
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
Could be an identifier; could be a keyword
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
It is the keyword int
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
Skip whitespace
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
Could be an identifier; could be a keyword
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
It is an identifier
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
This is an operator, but which one?
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
The Input Buffer
Example (Processing a Statement)
c o u n t = 0 ;i n t
lexemeBegin
forward
It is the assignment operator
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 9 / 21
Outline
1 A Class of Tokens
2 The Input Buffer
3 Transition Diagrams
4 Writing the Lexer
5 Assignment
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 10 / 21
Transition Diagrams
Definition (Transition Diagram)A transition diagram is a directed graph.It consists of a finite set of nodes, called states.One state is designated the start state.The directed edges between states represent transitions.Each transition is labeled with a symbol (or possibly a regularexpression).A subset of the set of states is designated the accepting states.The remaining states are rejecting states.
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 11 / 21
Transition Diagrams
Example (Relational Operators)Consider the relational operators <, >, <=, >=, =, and <>.The first symbol may be <, >, =, or something else.If the first symbol is <, then the next symbol may be =, >, orsomething else.If the first symbol is >, then the next symbol may be = orsomething else.If the first symbol is =, then the next symbol is something else.
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 12 / 21
Transition Diagrams
Example (Relational Operators)
1
0
=
=
>
<
>
=
other
other
2
3
4
5
6
8
7
LE
NE
LT and retract
EQ
GE
GT and retract
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 13 / 21
Transition Diagrams
Example (Identifiers)
9letter other
10 11 identifierand retract
letter | digit
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 14 / 21
Transition Diagrams
Example (Keywords)
if other keyword
and retract
t h e n other keywordand retract
e l s e other keywordand retract
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 15 / 21
Transition Diagrams
Draw the transition diagram for numbers
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 16 / 21
Outline
1 A Class of Tokens
2 The Input Buffer
3 Transition Diagrams
4 Writing the Lexer
5 Assignment
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 17 / 21
Writing the Lexer
The lexer is the program that implements the transition diagram.We could use
A switch statement, and/orAn if-else structure.
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 18 / 21
Writing the Lexer
The Lexer for Relational OperatorsToken getRelop(){
int state = 0;char c = get_next_symbol();while (c == ’<’ || c == ’=’ || c == ’>’){
switch (state){
case 0:if (c == ’<’) state = 1;else if (c == ’=’) state = 2;else if (c == ’>’) state = 3;else fail();break;
...case 8:
retract();return Token(GT);
}}
}
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 19 / 21
Outline
1 A Class of Tokens
2 The Input Buffer
3 Transition Diagrams
4 Writing the Lexer
5 Assignment
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 20 / 21
Assignment
AssignmentRead Section 3.4.Exercises 1, 2(c)(i).
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 21 / 21