Page 2: Programming Language Concepts (CIS 635)

Copyright 2002 Elsa L. Gunter

Sample Grammar

<expr> ::= <term> | <term> + <expr>
         | <term> - <expr>
<term> ::= <factor> | <factor> * <term>
         | <factor> / <term>
<factor> ::= <id> | ( <expr> )

Tokens as SML Datatypes

• Tokens: + - * / ( ) <id>

• Each token becomes a constructor of an SML datatype:

datatype token =
    Id_token of string
  | Left_parenthesis | Right_parenthesis
  | Times_token | Divide_token
  | Plus_token | Minus_token
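For comparison with the C code later in these notes, here is one way the same token type might be written in C. This is a minimal sketch, not course code: it assumes a tagged struct in which the enum tag plays the role of the SML constructor and a fixed-size name field (32 bytes, an arbitrary choice) holds the string carried by Id_token. The names TokenKind, Token, id_token, and simple_token are hypothetical.

```c
#include <string.h>

/* Hypothetical C analogue of the SML token datatype: an enum tag
   plus a payload that only identifier tokens use. */
typedef enum {
    ID_TOKEN, LEFT_PARENTHESIS, RIGHT_PARENTHESIS,
    TIMES_TOKEN, DIVIDE_TOKEN, PLUS_TOKEN, MINUS_TOKEN
} TokenKind;

typedef struct {
    TokenKind kind;
    char name[32];   /* meaningful only when kind == ID_TOKEN (size is an assumption) */
} Token;

/* Counterpart of the constructor Id_token : string -> token. */
Token id_token(const char *s) {
    Token t = { ID_TOKEN, "" };
    strncpy(t.name, s, sizeof t.name - 1);   /* truncate overly long names */
    return t;
}

/* Counterpart of the nullary constructors such as Plus_token. */
Token simple_token(TokenKind k) {
    Token t = { k, "" };
    return t;
}
```

Unlike the SML datatype, nothing stops C code from reading name on a non-identifier token; the tag must be checked by hand.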

Parsing Token Streams

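• We will create three mutually recursive parsing functions, all of the same type:

expr : (token option * (unit -> token option)) -> (bool * (token option * (unit -> token option)))

term : (token option * (unit -> token option)) -> (bool * (token option * (unit -> token option)))

factor : (token option * (unit -> token option)) -> (bool * (token option * (unit -> token option)))

• Each takes the current token (if any) paired with a function that produces the next token, and returns whether parsing succeeded together with the remaining token stream in the same form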
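The same threading idea can be sketched in C, which has no built-in option type or closures. Each hypothetical function below takes the input and the index of the current token, and returns the index just past what it parsed, with -1 standing in for false. The sketch assumes single-character tokens where any letter is an <id>, and it follows the right-recursive grammar directly.

```c
#include <ctype.h>

/* Each function returns the index just past what it parsed, or -1 on
   failure, mirroring the SML result (bool * remaining tokens).
   Assumption: every token is one character and any letter is an <id>. */
static int expr(const char *s, int i);

static int factor(const char *s, int i) {
    if (isalpha((unsigned char)s[i]))            /* <factor> ::= <id> */
        return i + 1;
    if (s[i] == '(') {                           /* <factor> ::= ( <expr> ) */
        int j = expr(s, i + 1);
        if (j >= 0 && s[j] == ')')
            return j + 1;
    }
    return -1;                                   /* neither <id> nor ( */
}

static int term(const char *s, int i) {
    int j = factor(s, i);
    if (j >= 0 && (s[j] == '*' || s[j] == '/'))  /* <factor> ( * | / ) <term> */
        return term(s, j + 1);
    return j;                                    /* just <factor>, or failure */
}

static int expr(const char *s, int i) {
    int j = term(s, i);
    if (j >= 0 && (s[j] == '+' || s[j] == '-'))  /* <term> ( + | - ) <expr> */
        return expr(s, j + 1);
    return j;                                    /* just <term>, or failure */
}

/* Whole-input check: the parse must succeed and consume every character. */
int accepts(const char *s) {
    int j = expr(s, 0);
    return j >= 0 && s[j] == '\0';
}
```

For example, accepts("a+b*(c-d)") returns 1, while accepts("a+") returns 0 because term fails on the empty remainder after the +.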

Parsing an Expression

<expr> ::= <term> [( + | - ) <expr> ]

fun expr tokens =
    (case term tokens
      of ( true , tokens_after_term) =>
         (case tokens_after_term
           of ( SOME Plus_token , tokens_after_plus) =>

Parsing a Plus Expression

<expr> ::= <term> + <expr>

fun expr tokens =
    (case term tokens
      of ( true , tokens_after_term) =>
         (case tokens_after_term
           of ( SOME Plus_token , tokens_after_plus) =>

Parsing a Plus Expression

<expr> ::= <term> + <expr>

              (case expr (tokens_after_plus(), tokens_after_plus)
                of ( true , tokens_after_expr) =>
                   ( true , tokens_after_expr)

What If No Expression After Plus

<expr> ::= <term> + <expr>

                 | ( false , rem_tokens) =>
                   ( false , rem_tokens))

• Code for Minus_token is almost identical

What If No Plus or Minus

<expr> ::= <term>

            | _ => ( true , tokens_after_term))

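What if No Term

<expr> ::= <term> [( + | - ) <expr> ]

      | ( false , rem_tokens) =>
        ( false , rem_tokens))

• Code for term is the same as for expr, except that it calls factor where expr calls term, recurses on term instead of expr, and matches Times_token and Divide_token where expr matches Plus_token and Minus_token (multiplication replaces addition, division replaces subtraction)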

Parsing Factor as Id

<factor> ::= <id>

and factor (SOME (Id_token id_name) , tokens) =
      ( true , (tokens(), tokens))

Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )

  | factor (SOME Left_parenthesis , tokens) =
      (case expr (tokens(), tokens)
        of ( true , tokens_after_expr) =>

Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )

           (case tokens_after_expr
             of ( SOME Right_parenthesis , tokens_after_rparen ) =>
                ( true , (tokens_after_rparen(), tokens_after_rparen))

What if No Right Parenthesis

<factor> ::= ( <expr> )

| _ => ( false , tokens_after_expr))

What If No Expression After Left Parenthesis

<factor> ::= ( <expr> )

         | ( false , rem_tokens) =>
           ( false , rem_tokens))

What If No Id or Left Parenthesis

<factor> ::= <id> | ( <expr> )

| factor tokens = ( false , tokens)

Parsing - in C

• Assume global variable nextToken that holds the latest token removed from the token stream

• Assume subroutine lex( ) to analyze the character stream, find the next token at the head of that stream, and update nextToken with that token

• Assume subroutine error( ) to report a syntax error (conceptually, by raising an exception)

Parsing expr – in C

<expr> ::= <term> [( + | - ) <expr> ]

void expr ( ) {
    term ( );
    if (nextToken == PLUS_CODE) {
        lex ( );
        expr ( );
    }
    else if (nextToken == MINUS_CODE) {
        lex ( );
        expr ( );
    }
}

SML Code

fun expr tokens =
    (case term tokens
      of ( true , tokens_after_term) =>
         (case tokens_after_term
           of (SOME Plus_token, tokens_after_plus) =>
              (case expr (tokens_after_plus(), tokens_after_plus)
                of ( true , tokens_after_expr) =>
                   ( true , tokens_after_expr)

Parsing expr – in C (optimized)

<expr> ::= <term> [( + | - ) <expr> ]

void expr ( ) {
    term ( );
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex ( );
        term ( );
    }
}

Parsing factor – in C

<factor> ::= <id>

void factor ( ) {
    if (nextToken == ID_CODE)
        lex ( );

Parsing Factor as Id

<factor> ::= <id>

and factor (SOME (Id_token id_name) , tokens) =
      ( true , (tokens(), tokens))

Parsing factor – in C

<factor> ::= ( <expr> )

    else if (nextToken == LEFT_PAREN_CODE) {
        lex ( );
        expr ( );
        if (nextToken == RIGHT_PAREN_CODE)
            lex ( );

Comparable SML Code

  | factor (SOME Left_parenthesis , tokens) =
      (case expr (tokens(), tokens)
        of ( true , tokens_after_expr) =>
           (case tokens_after_expr
             of ( SOME Right_parenthesis , tokens_after_rparen ) =>
                ( true , (tokens_after_rparen(), tokens_after_rparen))

Parsing factor – in C

        else
            error ( ); /* Right parenthesis missing */
    }
    else
        error ( ); /* Neither <id> nor ( was found at start */
}
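Putting the C fragments together gives a complete recognizer. The sketch below is one possible assembly, not the original course code. It assumes a string as the character stream, one-character tokens with any letter counted as an <id>, token codes defined in an enum, and an error( ) that simulates raising an exception with setjmp/longjmp. It also writes term in the same loop form as the optimized expr, which the slides only show for expr.

```c
#include <ctype.h>
#include <setjmp.h>

/* Assumed token codes; the slides name these but do not define them. */
enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, EOF_CODE, UNKNOWN_CODE };

static int nextToken;          /* latest token removed from the stream */
static const char *input;      /* remaining characters */
static jmp_buf on_error;       /* where error() "raises" to */

static void lex(void) {
    while (*input == ' ') input++;            /* skip blanks */
    char c = *input;
    if (c == '\0') { nextToken = EOF_CODE; return; }
    input++;
    if (isalpha((unsigned char)c)) nextToken = ID_CODE;
    else switch (c) {
        case '+': nextToken = PLUS_CODE; break;
        case '-': nextToken = MINUS_CODE; break;
        case '*': nextToken = TIMES_CODE; break;
        case '/': nextToken = DIVIDE_CODE; break;
        case '(': nextToken = LEFT_PAREN_CODE; break;
        case ')': nextToken = RIGHT_PAREN_CODE; break;
        default:  nextToken = UNKNOWN_CODE; break;
    }
}

static void error(void) { longjmp(on_error, 1); }

static void expr(void);

static void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();           /* right parenthesis missing */
    }
    else
        error();               /* neither <id> nor ( found at start */
}

static void term(void) {
    factor();
    while (nextToken == TIMES_CODE || nextToken == DIVIDE_CODE) {
        lex();
        factor();
    }
}

static void expr(void) {
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        term();
    }
}

/* Returns 1 if s is a complete <expr>, 0 otherwise. */
int parse(const char *s) {
    input = s;
    if (setjmp(on_error) != 0)
        return 0;              /* error() was called */
    lex();                     /* prime nextToken */
    expr();
    return nextToken == EOF_CODE;
}
```

The final check against EOF_CODE rejects inputs such as "a b", where a valid <expr> is followed by leftover tokens.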

Error cases in SML

(* No right parenthesis *)

| _ => ( false , tokens_after_expr))

(* No expression found *)

| ( false , rem_tokens) =>

( false , rem_tokens))

(* Neither <id> nor left parenthesis found *)

| factor tokens = ( false , tokens)

Lexers – Simple Parsers

• Lexers are parsers driven by regular grammars

• Use character codes and arithmetic comparisons rather than case analysis to determine the syntactic category of each character

• Often some semantic action must be taken: compute a number, or build a string and record it in a symbol table
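A minimal sketch of both points, assuming an ASCII-compatible character set: classify decides a character's category with range comparisons on its code, and scan_identifier performs a semantic action by copying an identifier into a small fixed-size symbol table. The identifier rule (a letter followed by letters or digits), the table sizes, and all function names are assumptions for illustration.

```c
#include <string.h>

/* Category by code ranges rather than one case per character.
   C guarantees '0'..'9' are contiguous; contiguous letters is an
   ASCII assumption. */
enum CharClass { DIGIT, LETTER, OTHER };

enum CharClass classify(char c) {
    if (c >= '0' && c <= '9') return DIGIT;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return LETTER;
    return OTHER;
}

/* A tiny symbol table: identifier names stored in insertion order. */
#define MAX_SYMS 64
#define MAX_LEN  32
static char symtab[MAX_SYMS][MAX_LEN];
static int nsyms = 0;

/* Returns the table index of name, adding it if absent; -1 if full. */
int intern(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], name) == 0) return i;
    if (nsyms == MAX_SYMS) return -1;
    strncpy(symtab[nsyms], name, MAX_LEN - 1);
    symtab[nsyms][MAX_LEN - 1] = '\0';
    return nsyms++;
}

/* Scans an identifier at the start of s, records it in the symbol table
   (index stored in *index), and returns the number of characters consumed
   (0 if s does not start with a letter). */
int scan_identifier(const char *s, int *index) {
    char buf[MAX_LEN];
    int n = 0;
    if (classify(s[0]) != LETTER) return 0;
    while (classify(s[n]) != OTHER) {
        if (n < MAX_LEN - 1) buf[n] = s[n];   /* keep at most MAX_LEN-1 chars */
        n++;
    }
    buf[n < MAX_LEN - 1 ? n : MAX_LEN - 1] = '\0';
    *index = intern(buf);
    return n;
}
```

Scanning the same name twice returns the same table index, which is what lets later phases compare identifiers by index instead of by string.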


• <pos> ::= <digit> <pos> | <digit>

• <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

fun digit c =
    (case Char.ord c
      of n => if n >= Char.ord #"0" andalso
                 n <= Char.ord #"9"
              then SOME (n - Char.ord #"0")
              else NONE)


fun pos [] = (NONE, [])
  | pos (chars as ch :: rem_chars) =
      (case digit ch
        of NONE => (NONE, chars)
         | SOME n =>
             (case pos rem_chars
               of (NONE, more_chars) => (SOME (10, n), more_chars)
                | (SOME (p, m), more_chars) =>
                    (SOME (10 * p, (p * n) + m), more_chars)))
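To make the pair returned by pos easier to follow, here is a direct C transcription, a sketch with hypothetical names. On success, power is 10 raised to the number of digits read and value is their numeric value. When a digit n is followed by digits with result (p, m), n contributes n * p, which is exactly the (p*n)+m step in the SML code.

```c
/* found == 0 plays the role of NONE; *rest plays the role of the
   remaining character list. */
typedef struct { int found; long power; long value; } PosResult;

/* Value of c if it is '0'..'9', else -1 (standing in for NONE). */
static int digit(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

PosResult pos(const char *chars, const char **rest) {
    PosResult none = { 0, 0, 0 };
    int n = digit(chars[0]);
    if (n < 0) {                       /* no digit here (includes end of string) */
        *rest = chars;
        return none;
    }
    PosResult r = pos(chars + 1, rest);   /* sets *rest past the digits */
    if (!r.found) {                    /* n was the last digit: SOME (10, n) */
        PosResult one = { 1, 10, n };
        return one;
    }
    /* n is scaled by p = 10^(digits after it), then m is added */
    PosResult more = { 1, 10 * r.power, r.power * n + r.value };
    return more;
}
```

For the input "407x" this yields power 1000 and value 407, and leaves rest pointing at the x.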

Problems for Recursive-Descent Parsing

• Left Recursion:

A ::= Aw

translates to a subroutine whose first action is to call itself without consuming any input, so it recurses forever

• Indirect Left Recursion:

A ::= Bw

B ::= Av

causes the same problem
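A common remedy, which these slides do not cover, rewrites A ::= A w | b as A ::= b { w } and implements the repetition with a loop, much as the optimized C expr does. Below is a sketch for a hypothetical rule <list> ::= <list> , x | x with single-character tokens; the names list and accepts_list are illustrative only.

```c
/* Hypothetical left-recursive rule:  <list> ::= <list> , x | x
   A direct translation would begin  void list(void) { list(); ... }
   and call itself before consuming a token, never terminating.
   Rewritten as  <list> ::= x { , x }  it becomes a loop. */
static const char *p;   /* current position in the input */

static int list(void) {
    if (*p != 'x') return 0;       /* must start with the base case x */
    p++;
    while (*p == ',') {            /* each iteration matches one ", x" */
        p++;
        if (*p != 'x') return 0;
        p++;
    }
    return 1;
}

int accepts_list(const char *s) {
    p = s;
    return list() && *p == '\0';
}
```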

Problems for Recursive-Descent Parsing

• Parser must always be able to choose the next action based only on the very next token

• Pairwise disjointness test: can we always determine which rule (in the non-extended BNF) to choose based on just the first token?

Pairwise Disjointness Test

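• For each rule

A ::= y

define

$\mathrm{FIRST}(y) = \{a \mid y \Rightarrow^* a w\} \cup \{\varepsilon \mid y \Rightarrow^* \varepsilon\}$

where a ranges over terminals and w over strings of grammar symbols

• For each pair of rules A ::= y and A ::= z, require

$\mathrm{FIRST}(y) \cap \mathrm{FIRST}(z) = \emptyset$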

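• Test too strong: can't handle

<expr> ::= <term> [ ( + | - ) <expr> ]

since in non-extended BNF its three alternatives <term>, <term> + <expr>, and <term> - <expr> all begin with <term>, so their FIRST sets are identical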


Grammar:

<S> ::= <A> a <B> b

<A> ::= <A> b | b

<B> ::= a <B> | a

FIRST (<A> b) = {b}

FIRST (b) = {b}

Rules for <A> not pairwise disjoint