Programming Language Concepts (CIS 635)
Elsa L Gunter
4303 GITC
NJIT, www.cs.njit.edu/~elsa/635
Copyright 2002 Elsa L. Gunter
Sample Grammar
<expr> ::= <term> | <term> + <expr>
| <term> - <expr>
<term> ::= <factor> | <factor> * <term>
| <factor> / <term>
<factor> ::= <id> | ( <expr> )
Tokens as SML Datatypes
• Tokens: + - * / ( ) <id>
• These become an SML datatype
datatype token =
Id_token of string
| Left_parenthesis | Right_parenthesis
| Times_token | Divide_token
| Plus_token | Minus_token
Parsing Token Streams
• We will create three mutually recursive parsing functions:
expr : (token option * (unit -> token option)) -> (bool * (token option * (unit -> token option)))
term : (token option * (unit -> token option)) -> (bool * (token option * (unit -> token option)))
factor : (token option * (unit -> token option)) -> (bool * (token option * (unit -> token option)))
Parsing an Expression
<expr> ::= <term> [( + | - ) <expr> ]
fun expr tokens =
  (case term tokens
     of ( true , tokens_after_term) =>
        (case tokens_after_term
           of ( SOME Plus_token , tokens_after_plus) =>
Parsing a Plus Expression
<expr> ::= <term> + <expr>
fun expr tokens =
  (case term tokens
     of ( true , tokens_after_term) =>
        (case tokens_after_term
           of ( SOME Plus_token , tokens_after_plus) =>
Parsing a Plus Expression
<expr> ::= <term> + <expr>
(case expr (tokens_after_plus(), tokens_after_plus)
   of ( true , tokens_after_expr) =>
      ( true , tokens_after_expr)
What If No Expression After Plus
<expr> ::= <term> + <expr>
| ( false , rem_tokens) =>
  ( false , rem_tokens))
• Code for Minus_token is almost identical
What If No Plus or Minus
<expr> ::= <term>
| _ => ( true , tokens_after_term))
What If No Term
<expr> ::= <term> [( + | - ) <expr> ]
| ( false , rem_tokens) =>
  ( false , rem_tokens))
• Code for term is the same as for expr, except that it calls factor where expr calls term, and it replaces addition with multiplication and subtraction with division
Parsing Factor as Id
<factor> ::= <id>
and factor (SOME (Id_token id_name) , tokens) =
  ( true , (tokens(), tokens))
Parsing Factor as Parenthesized Expression
<factor> ::= ( <expr> )
| factor (SOME Left_parenthesis , tokens) =
  (case expr (tokens(), tokens)
     of ( true , tokens_after_expr) =>
Parsing Factor as Parenthesized Expression
<factor> ::= ( <expr> )
(case tokens_after_expr
   of ( SOME Right_parenthesis , tokens_after_rparen ) =>
      ( true , (tokens_after_rparen(), tokens_after_rparen))
What if No Right Parenthesis
<factor> ::= ( <expr> )
| _ => ( false , tokens_after_expr))
What If No Expression After Left Parenthesis
<factor> ::= ( <expr> )
| ( false , rem_tokens) =>
  ( false , rem_tokens))
What If No Id or Left Parenthesis
<factor> ::= <id> | ( <expr> )
| factor tokens = ( false , tokens)
Parsing - in C
• Assume global variable nextToken that holds the latest token removed from the token stream
• Assume subroutine lex( ) to analyze the character stream, find the next token at the head of that stream, and update nextToken with that token
• Assume subroutine error( ) to raise an exception
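These three assumptions could be sketched in C roughly as follows. This is only an illustration: the token code values, the input pointer, errorCount, setInput, BAD_CODE, EOF_CODE, and the rule that an <id> is a letter followed by letters or digits are all hypothetical and not taken from the slides. The global is named nextToken, as in the C code on the following slides, and error( ) counts failures because C has no exceptions to raise.

```c
#include <ctype.h>

/* Token codes: the slides only name some of these; the values are assumed. */
enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, EOF_CODE, BAD_CODE };

static const char *input = "";   /* remaining character stream (assumed) */
int nextToken = EOF_CODE;        /* latest token removed from the stream */
int errorCount = 0;              /* stands in for raising an exception */

void error(void) { errorCount++; }

/* Find the token at the head of the character stream, store it in nextToken. */
void lex(void) {
    while (isspace((unsigned char)*input)) input++;      /* skip blanks */
    if (*input == '\0') { nextToken = EOF_CODE; return; }
    if (isalpha((unsigned char)*input)) {
        /* <id>: a letter followed by letters or digits (assumed rule) */
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input++) {
    case '+': nextToken = PLUS_CODE; break;
    case '-': nextToken = MINUS_CODE; break;
    case '*': nextToken = TIMES_CODE; break;
    case '/': nextToken = DIVIDE_CODE; break;
    case '(': nextToken = LEFT_PAREN_CODE; break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    default:  nextToken = BAD_CODE; error(); break;      /* unknown character */
    }
}

/* Point the lexer at a new string and load the first token. */
void setInput(const char *s) { input = s; lex(); }
```

After setInput("x1 + (y)"), repeated calls to lex( ) step nextToken through ID_CODE, PLUS_CODE, LEFT_PAREN_CODE, ID_CODE, RIGHT_PAREN_CODE and finally EOF_CODE.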
Parsing expr – in C
<expr> ::= <term> [( + | - ) <expr> ]
void expr ( ) {
  term ( );
  if (nextToken == PLUS_CODE) {
    lex ( );
    expr ( );
  }
  else if (nextToken == MINUS_CODE) {
    lex ( );
    expr ( );
  }
}
SML Code
fun expr tokens =
  (case term tokens
     of ( true , tokens_after_term) =>
        (case tokens_after_term
           of ( SOME Plus_token , tokens_after_plus) =>
              (case expr (tokens_after_plus(), tokens_after_plus)
                 of ( true , tokens_after_expr) =>
                    ( true , tokens_after_expr)
Parsing expr – in C (optimized)
<expr> ::= <term> [( + | - ) <expr> ]
void expr ( ) {
  term ( );
  while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
    lex ( );
    term ( );
  }
}
Parsing factor – in C
<factor> ::= <id>
void factor ( ) {
  if (nextToken == ID_CODE)
    lex ( );
Parsing Factor as Id
<factor> ::= <id>
and factor (SOME (Id_token id_name) , tokens) =
  ( true , (tokens(), tokens))
Parsing factor – in C
<factor> ::= ( <expr> )
  else if (nextToken == LEFT_PAREN_CODE) {
    lex ( );
    expr ( );
    if (nextToken == RIGHT_PAREN_CODE)
      lex ( );
Comparable SML Code
| factor (SOME Left_parenthesis , tokens) =
  (case expr (tokens(), tokens)
     of ( true , tokens_after_expr) =>
        (case tokens_after_expr
           of ( SOME Right_parenthesis , tokens_after_rparen ) =>
              ( true , (tokens_after_rparen(), tokens_after_rparen))
Parsing factor – in C
    else error ( ); /* Right parenthesis missing */
  }
  else error ( ); /* Neither <id> nor ( was found at start */
}
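Assembled into one piece, the C fragments might look like the sketch below. To keep it short, lex( ) here reads from a pre-tokenized array rather than a character stream. The token code values, TIMES_CODE, DIVIDE_CODE, EOF_CODE, the tokens pointer, the failed flag, and the parse wrapper are assumptions made for illustration, and error( ) sets a flag instead of raising an exception. The term function is written by analogy with expr, as the slides describe.

```c
/* Token codes (values assumed). */
enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, EOF_CODE };

const int *tokens;   /* pre-tokenized input; must end with EOF_CODE */
int nextToken;       /* latest token removed from the stream */
int failed;          /* set by error() */

/* Advance to the next token, never moving past EOF_CODE. */
void lex(void) { if (nextToken != EOF_CODE) nextToken = *++tokens; }

void error(void) { failed = 1; }

void expr(void);

/* <factor> ::= <id> | ( <expr> ) */
void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();   /* right parenthesis missing */
    } else
        error();       /* neither <id> nor ( found at start */
}

/* <term> ::= <factor> [( * | / ) <term>] */
void term(void) {
    factor();
    if (nextToken == TIMES_CODE || nextToken == DIVIDE_CODE) {
        lex();
        term();
    }
}

/* <expr> ::= <term> [( + | - ) <expr>] */
void expr(void) {
    term();
    if (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        expr();
    }
}

/* Returns 1 if the whole token array forms an <expr>, otherwise 0. */
int parse(const int *ts) {
    tokens = ts;
    nextToken = *tokens;
    failed = 0;
    expr();
    return !failed && nextToken == EOF_CODE;
}
```

Like the SML version, which returns a bool alongside the remaining tokens, parse reports success or failure; the extra check for EOF_CODE rejects input with leftover tokens after a complete expression.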
Error cases in SML
(* No right parenthesis *)
| _ => ( false , tokens_after_expr))
(* No expression found *)
| ( false , rem_tokens) =>
( false , rem_tokens))
(* Neither <id> nor left parenthesis found *)
| factor tokens = ( false , tokens)
Lexers – Simple Parsers
• Lexers are parsers driven by regular grammars
• Use character codes and arithmetic comparisons rather than case analysis to determine syntactic category for each character
• Often some semantic action must be taken
  – Compute a number or build a string and record it in a symbol table
Example
• <pos> = <digit> <pos> | <digit>
• <digit> = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
fun digit c =
  (case Char.ord c
     of n => if n >= Char.ord #"0" andalso
                n <= Char.ord #"9"
             then SOME (n - Char.ord #"0")
             else NONE)
Example
fun pos [] = (NONE, [])
  | pos (chars as ch::rem_chars) =
    (case digit ch
       of NONE => (NONE, chars)
        | SOME n =>
          (case pos rem_chars
             of (NONE, more_chars) => (SOME (10, n), more_chars)
              | (SOME (p, m), more_chars) =>
                (SOME (10*p, (p*n)+m), more_chars)))
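For comparison, the same lexer could be sketched in C as below. This is an iterative variant rather than a translation of the recursive SML: because the loop reads digits left to right, it can accumulate the value directly and does not need to return the power of ten that the SML pair carries. The function names follow the SML, but the signatures, the -1 and NULL failure results, and the value out-parameter are assumptions. Overflow of very long digit strings is not handled.

```c
#include <stddef.h>

/* <digit> = 0 | 1 | ... | 9
   Returns the digit's value, or -1 if c is not a digit.
   Uses character-code comparisons, as in the SML version. */
int digit(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* <pos> = <digit> <pos> | <digit>
   Reads the longest run of digits at the start of s.
   On success stores the number in *value and returns a pointer to the
   remaining characters; returns NULL if s does not start with a digit. */
const char *pos(const char *s, unsigned long *value) {
    int d = digit(*s);
    unsigned long n = 0;
    if (d < 0) return NULL;
    while (d >= 0) {
        n = 10 * n + (unsigned long)d;   /* semantic action: compute the number */
        d = digit(*++s);
    }
    *value = n;
    return s;
}
```

Called on "405+x", pos stores 405 in *value and returns a pointer to the "+x" that follows the digits.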
Problems for Recursive-Descent Parsing
• Left Recursion:
A ::= Aw
translates to a subroutine that loops forever
• Indirect Left Recursion:
A ::= Bw
B ::= Av
causes the same problem
Problems for Recursive-Descent Parsing
• Parser must always be able to choose the next action based only on the very next token
• Pairwise Disjointedness Test: Can we always determine which rule (in the non-extended BNF) to choose based on just the first token?
Pairwise Disjointedness Test
• For each rule
A ::= y
Calculate
FIRST (y) = {a | y =>* aw} ∪ {ε | if y =>* ε}
• For each pair of rules A ::= y and A ::= z,
require FIRST(y) ∩ FIRST(z) = { }
• Test too strong: Can’t handle
<expr> ::= <term> [ ( + | - ) <expr> ]
Example
Grammar:
<S> ::= <A> a <B> b
<A> ::= <A> b | b
<B> ::= a <B> | a
FIRST (<A> b) = {b}
FIRST (b) = {b}
Rules for <A> not pairwise disjoint
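The test can be mechanized for small grammars. The sketch below rests on simplifying assumptions that are not in the slides: each symbol is a single letter (uppercase for nonterminals, lowercase for terminals), every right-hand side is non-empty, and no rule derives the empty string, so FIRST of a right-hand side depends only on its first symbol. The names rule, first_nt, first_rhs, and pairwise_disjoint are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* One rule lhs ::= rhs. Uppercase = nonterminal, lowercase = terminal. */
struct rule { char lhs; const char *rhs; };

/* FIRST sets of nonterminals 'A'..'Z', as bitsets over terminals 'a'..'z'. */
static unsigned first_nt[26];

/* FIRST of a right-hand side: with no empty rules, only its first symbol matters. */
static unsigned first_rhs(const char *rhs) {
    unsigned char c = (unsigned char)rhs[0];
    return islower(c) ? 1u << (c - 'a') : first_nt[c - 'A'];
}

/* Returns 1 if every pair of rules with the same left-hand side
   has disjoint FIRST sets, otherwise 0. */
int pairwise_disjoint(const struct rule *g, int n) {
    int changed = 1;
    memset(first_nt, 0, sizeof first_nt);
    while (changed) {                     /* grow FIRST sets to a fixed point */
        changed = 0;
        for (int i = 0; i < n; i++) {
            unsigned *set = &first_nt[g[i].lhs - 'A'];
            unsigned merged = *set | first_rhs(g[i].rhs);
            if (merged != *set) { *set = merged; changed = 1; }
        }
    }
    for (int i = 0; i < n; i++)           /* compare each pair of rules */
        for (int j = i + 1; j < n; j++)
            if (g[i].lhs == g[j].lhs &&
                (first_rhs(g[i].rhs) & first_rhs(g[j].rhs)) != 0)
                return 0;
    return 1;
}
```

Encoding the example grammar as S ::= AaBb, A ::= Ab | b, B ::= aB | a makes pairwise_disjoint return 0, since both rules for A have FIRST set {b}. The two rules for B also overlap, because both begin with a.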