Programming Language Concepts (CIS 635)

Elsa L Gunter

4303 GITC

NJIT, www.cs.njit.edu/~elsa/635









Parsing Token Streams

• We will create three mutually recursive parsing functions:

expr : token option * (unit -> token option) -> bool * (token option * (unit -> token option))

term : token option * (unit -> token option) -> bool * (token option * (unit -> token option))

factor : token option * (unit -> token option) -> bool * (token option * (unit -> token option))


Parsing an Expression

<expr> ::= <term> [( + | - ) <expr> ]

fun expr tokens =
  (case term tokens
     of (true, tokens_after_term) =>
       (case tokens_after_term
          of (SOME Plus_token, tokens_after_plus) =>


Parsing a Plus Expression

<expr> ::= <term> + <expr>

(case expr (tokens_after_plus(), tokens_after_plus)
   of (true, tokens_after_expr) =>
        (true, tokens_after_expr)




What If No Expression After Plus

| (false, rem_tokens) =>
    (false, rem_tokens))

• Code for Minus_token is almost identical


What If No Plus or Minus

<expr> ::= <term>

| _ => (true, tokens_after_term))


What if No Term

<expr> ::= <term> [( + | - ) <expr> ]

| (false, rem_tokens) => (false, rem_tokens))

• Code for term is the same as for expr, except that addition is replaced by multiplication and subtraction by division


Parsing Factor as Id

<factor> ::= <id>

and factor (SOME (Id_token id_name), tokens) =
      (true, (tokens(), tokens))


Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )

  | factor (SOME Left_parenthesis, tokens) =
      (case expr (tokens(), tokens)
         of (true, tokens_after_expr) =>



(case tokens_after_expr
   of (SOME Right_parenthesis, tokens_after_rparen) =>
        (true, (tokens_after_rparen(), tokens_after_rparen))


What if No Right Parenthesis


| _ => ( false , tokens_after_expr))





What If No Expression After Left Parenthesis

| (false, rem_tokens) => (false, rem_tokens))


What If No Id or Left Parenthesis

<factor> ::= <id> | ( <expr> )

| factor tokens = ( false , tokens)


Parsing - in C

• Assume a global variable nextToken that holds the latest token removed from the token stream

• Assume a subroutine lex( ) that analyzes the character stream, finds the next token at the head of that stream, and updates nextToken with that token

• Assume subroutine error( ) to raise an exception
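The three assumptions above can be sketched in C as follows. This is only one possible arrangement: the token codes other than those used on the slides, the `input` pointer, and the `setInput` helper are hypothetical names introduced for illustration, and since C has no exceptions, `error` here simply reports and exits.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Token codes; TIMES_CODE, DIV_CODE, EOF_CODE and UNKNOWN_CODE are
   assumed names that do not appear on the slides. */
enum {
    ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIV_CODE,
    LEFT_PAREN_CODE, RIGHT_PAREN_CODE, EOF_CODE, UNKNOWN_CODE
};

static const char *input = "";  /* remaining character stream (assumed) */
int nextToken = EOF_CODE;       /* latest token removed from the stream */

/* Find the token at the head of the character stream and store its
   code in nextToken. */
void lex(void) {
    while (isspace((unsigned char)*input))
        input++;                               /* skip blanks */
    if (*input == '\0') {
        nextToken = EOF_CODE;
        return;
    }
    if (isalpha((unsigned char)*input)) {
        /* identifier: a letter followed by letters or digits */
        while (isalnum((unsigned char)*input))
            input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input++) {
        case '+': nextToken = PLUS_CODE;        break;
        case '-': nextToken = MINUS_CODE;       break;
        case '*': nextToken = TIMES_CODE;       break;
        case '/': nextToken = DIV_CODE;         break;
        case '(': nextToken = LEFT_PAREN_CODE;  break;
        case ')': nextToken = RIGHT_PAREN_CODE; break;
        default:  nextToken = UNKNOWN_CODE;     break;
    }
}

/* Stand-in for "raise an exception": report the problem and stop. */
void error(void) {
    fprintf(stderr, "syntax error\n");
    exit(EXIT_FAILURE);
}

/* Hypothetical helper: point the lexer at a string and read one token. */
void setInput(const char *s) {
    input = s;
    lex();
}
```

After `setInput("ab + (c1)")`, repeated calls to `lex()` step `nextToken` through an identifier, a plus, a left parenthesis, an identifier, a right parenthesis, and finally the end-of-input code.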


Parsing expr – in C

<expr> ::= <term> [( + | - ) <expr> ]

void expr ( ) {
  term ( );
  if (nextToken == PLUS_CODE) {
    lex ( );
    expr ( );
  }
  else if (nextToken == MINUS_CODE) {
    lex ( );
    expr ( );
  }
}


SML Code

fun expr tokens =
  (case term tokens
     of (true, tokens_after_term) =>
       (case tokens_after_term
          of (SOME Plus_token, tokens_after_plus) =>
            (case expr (tokens_after_plus(), tokens_after_plus)
               of (true, tokens_after_expr) =>
                    (true, tokens_after_expr)


Parsing expr – in C (optimized)

<expr> ::= <term> [( + | - ) <expr> ]

void expr ( ) {
  term ( );
  while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
    lex ( );
    term ( );
  }
}


Parsing factor – in C

<factor> ::= <id>

void factor ( ) {
  if (nextToken == ID_CODE)
    lex ( );


Parsing Factor as Id

<factor> ::= <id>

and factor (SOME (Id_token id_name), tokens) =
      (true, (tokens(), tokens))




  else if (nextToken == LEFT_PAREN_CODE) {
    lex ( );
    expr ( );
    if (nextToken == RIGHT_PAREN_CODE)
      lex ( );


Comparable SML Code

  | factor (SOME Left_parenthesis, tokens) =
      (case expr (tokens(), tokens)
         of (true, tokens_after_expr) =>
           (case tokens_after_expr
              of (SOME Right_parenthesis, tokens_after_rparen) =>
                   (true, (tokens_after_rparen(), tokens_after_rparen))



    else
      error ( );  /* Right parenthesis missing */
  }
  else
    error ( );    /* Neither <id> nor ( was found at start */
}


Error cases in SML

(* No right parenthesis *)

| _ => ( false , tokens_after_expr))

(* No expression found *)

| (false, rem_tokens) => (false, rem_tokens))

(* Neither <id> nor left parenthesis found *)

| factor tokens = ( false , tokens)
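The C fragments for expr and factor can be assembled into one small recognizer, sketched below under several assumptions. The rule used for term follows the earlier remark that term mirrors expr with multiplication and division. The one-character lexer, the `accepts` wrapper, and the use of `setjmp`/`longjmp` to stand in for "raise an exception" are illustrative choices, not part of the slides.

```c
#include <setjmp.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIV_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, EOF_CODE, UNKNOWN_CODE };

static const char *input;   /* remaining characters (assumed representation) */
static int nextToken;
static jmp_buf onError;     /* where error() jumps back to */

/* Simplified lexer: every token is one character, any lowercase
   letter is an identifier, and blanks are not skipped. */
static void lex(void) {
    char c = *input;
    if (c == '\0') { nextToken = EOF_CODE; return; }
    input++;
    if (c >= 'a' && c <= 'z') nextToken = ID_CODE;
    else if (c == '+') nextToken = PLUS_CODE;
    else if (c == '-') nextToken = MINUS_CODE;
    else if (c == '*') nextToken = TIMES_CODE;
    else if (c == '/') nextToken = DIV_CODE;
    else if (c == '(') nextToken = LEFT_PAREN_CODE;
    else if (c == ')') nextToken = RIGHT_PAREN_CODE;
    else nextToken = UNKNOWN_CODE;
}

static void error(void) { longjmp(onError, 1); }

static void expr(void);

/* <factor> ::= <id> | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();   /* right parenthesis missing */
    } else
        error();       /* neither <id> nor ( found at start */
}

/* <term> ::= <factor> [( * | / ) <term> ]   (assumed by analogy with expr) */
static void term(void) {
    factor();
    if (nextToken == TIMES_CODE || nextToken == DIV_CODE) {
        lex();
        term();
    }
}

/* <expr> ::= <term> [( + | - ) <expr> ] */
static void expr(void) {
    term();
    if (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        expr();
    }
}

/* Returns 1 if the whole string s is an <expr>, 0 otherwise. */
int accepts(const char *s) {
    input = s;
    if (setjmp(onError) != 0)
        return 0;              /* error() was called during parsing */
    lex();
    expr();
    return nextToken == EOF_CODE;
}
```

With this sketch, `accepts("a+b*(c-d)")` succeeds, while `accepts("a+")` and `accepts("(a")` fail inside `factor`, and `accepts("a)")` fails because input remains after the expression.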


Lexers – Simple Parsers

• Lexers are parsers driven by regular grammars

• Use character codes and arithmetic comparisons rather than case analysis to determine syntactic category for each character

• Often some semantic action must be taken: compute a number, or build a string and record it in a symbol table


Example

• <pos> ::= <digit> <pos> | <digit>

• <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

fun digit c =
  (case Char.ord c
     of n => if n >= Char.ord #"0" andalso n <= Char.ord #"9"
             then SOME (n - Char.ord #"0")
             else NONE)


Example

fun pos [] = (NONE, [])
  | pos (chars as ch::rem_chars) =
      (case digit ch
         of NONE => (NONE, chars)
          | SOME n =>
              (case pos rem_chars
                 of (NONE, more_chars) => (SOME (10, n), more_chars)
                  | (SOME (p, m), more_chars) =>
                      (SOME (10*p, (p*n)+m), more_chars)))
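For comparison with the C parser shown earlier, the same lexer idea can be sketched in C. This is an illustrative translation rather than a line-by-line port: it uses -1 where the SML code uses NONE, it builds the number from left to right instead of carrying a power of ten back from the recursive call, and it does not guard against overflow.

```c
#include <stddef.h>

/* Value of a digit character, or -1 if c is not a digit
   (-1 plays the role of NONE). Uses character-code arithmetic. */
int digit(char c) {
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}

/* <pos> ::= <digit> <pos> | <digit>
   Reads the longest run of digits at the start of s. Stores the number
   in *value when at least one digit was read, and returns the count of
   characters consumed (0 means s does not start with a digit). */
size_t pos(const char *s, long *value) {
    size_t n = 0;
    long v = 0;
    int d;
    while ((d = digit(s[n])) >= 0) {
        v = 10 * v + d;   /* semantic action: compute the number */
        n++;
    }
    if (n > 0)
        *value = v;
    return n;
}
```

Called on "123abc", `pos` consumes three characters and stores 123; called on "x", it consumes nothing and leaves `*value` untouched.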


Problems for Recursive-Descent Parsing

• Left Recursion:

A ::= Aw

translates to a subroutine that loops forever

• Indirect Left Recursion:

A ::= Bw

B ::= Av

causes the same problem
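To make the looping concrete, take a hypothetical left-recursive grammar A ::= A b | a, which generates a, ab, abb, and so on. A direct translation would start with `void A(void) { A(); ... }` and call itself before consuming any input. A common workaround, sketched below, is to rewrite the rule in the equivalent iterative form A ::= a { b }. The names `parseA`, `acceptsA`, and `p` are made up for this example.

```c
/* Recognizer for A ::= a { b }, the loop form of the left-recursive
   rule A ::= A b | a. Each character of the input is one token. */

static const char *p;   /* remaining input (assumed representation) */

static int parseA(void) {
    if (*p != 'a')
        return 0;        /* every sentence must start with a */
    p++;
    while (*p == 'b')    /* then any number of b's */
        p++;
    return 1;
}

/* Returns 1 if the whole string s is derivable from A, 0 otherwise. */
int acceptsA(const char *s) {
    p = s;
    return parseA() && *p == '\0';
}
```

The loop consumes one token per iteration, so the recognizer always terminates, which the direct recursive translation does not.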


Problems for Recursive-Descent Parsing

• Parser must always be able to choose the next action based only on the very next token

• Pairwise Disjointedness Test: can we always determine which rule (in the non-extended BNF) to choose based on just the first token?


Pairwise Disjointedness Test

• For each rule

A ::= y

calculate

FIRST(y) = {a | y =>* aw} ∪ {ε | y =>* ε}

• For each pair of rules A ::= y and A ::= z, require

FIRST(y) ∩ FIRST(z) = { }

• Test too strong: can’t handle

<expr> ::= <term> [ ( + | - ) <expr> ]

(in non-extended BNF, every alternative for <expr> begins with <term>, so the FIRST sets overlap)


Example

Grammar:

<S> ::= <A> a <B> b

<A> ::= <A> b | b

<B> ::= a <B> | a

FIRST (<A> b) = {b}

FIRST (b) = {b}

Rules for <A> not pairwise disjoint
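The check can be mechanized. The sketch below encodes the example grammar with S, A, B standing for <S>, <A>, <B> and lowercase letters for terminals, computes FIRST sets by iterating to a fixed point, and tests each pair of rules for a nonterminal. It assumes the grammar has no empty right-hand sides, so FIRST of a right-hand side depends only on its leading symbol. The names `Rule`, `computeFirst`, and `pairwiseDisjoint` are illustrative.

```c
#include <string.h>

typedef struct { char lhs; const char *rhs; } Rule;

/* Example grammar from the slide. */
static const Rule rules[] = {
    {'S', "AaBb"}, {'A', "Ab"}, {'A', "b"}, {'B', "aB"}, {'B', "a"}
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* first[X - 'A'] is a bit set: bit i set means terminal 'a' + i
   is in FIRST(X). */
static unsigned first[26];

/* FIRST of a right-hand side. With no empty rules (an assumption of
   this sketch) only the leading symbol matters. */
static unsigned firstOfRhs(const char *rhs) {
    char x = rhs[0];
    if (x >= 'a' && x <= 'z')
        return 1u << (x - 'a');
    return first[x - 'A'];
}

/* Repeat until no FIRST set grows any further. */
void computeFirst(void) {
    int changed = 1;
    memset(first, 0, sizeof first);
    while (changed) {
        changed = 0;
        for (int i = 0; i < NRULES; i++) {
            unsigned *f = &first[rules[i].lhs - 'A'];
            unsigned updated = *f | firstOfRhs(rules[i].rhs);
            if (updated != *f) {
                *f = updated;
                changed = 1;
            }
        }
    }
}

/* Returns 1 if every pair of rules for nonterminal nt has disjoint
   FIRST sets, 0 otherwise. Call computeFirst() beforehand. */
int pairwiseDisjoint(char nt) {
    for (int i = 0; i < NRULES; i++)
        for (int j = i + 1; j < NRULES; j++)
            if (rules[i].lhs == nt && rules[j].lhs == nt &&
                (firstOfRhs(rules[i].rhs) & firstOfRhs(rules[j].rhs)) != 0)
                return 0;
    return 1;
}
```

After `computeFirst()`, `pairwiseDisjoint('A')` returns 0, matching the conclusion above. The same call for B also returns 0, since both of its alternatives begin with a.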