Formal Syntax of Programming Languages Mooly Sagiv
Transcript of Formal Syntax of Programming Languages Mooly Sagiv
![Page 1: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/1.jpg)
Formal Syntax of
Programming Languages
Mooly Sagiv
http://www.cs.tau.ac.il/~msagiv/courses/pl16.html
![Page 2: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/2.jpg)
Benefits of formal definitions
• Intellectual
• Better understanding
• Formal proofs
• Mechanical checks by computer
• Tool generation
– Consistency
– Entailment
– Query evaluation
![Page 3: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/3.jpg)
What is a good formal definition?
• Natural
• Concise
• Easy to understand
• Permits effective mechanical reasoning
![Page 4: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/4.jpg)
Syntax vs. Semantics
• The pattern of formation
of sentences or phrases in
a language
• Examples
– Regular expressions
– Context free grammars
• The study or science of
meaning in language
• Examples
– Interpreter
– Compiler
– Better mechanisms will be
given in the course
![Page 5: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/5.jpg)
Benefits of formal syntax for
programming language
• Intellectual
• Simplicity
• Better understanding
– Interaction between different parts
• Abstraction
– Portability
• Tool generations
– Parser
![Page 6: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/6.jpg)
Tokens
• Basic units of the programming language
• Usually defined by regular expressions
• Examples
– Keywords
• “if”
• “while”
– Identifier {letter}({letter}|{digit}|_)*
– Numbers {0-9}+
![Page 7: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/7.jpg)
Example Tokens
Type Examples
ID foo n_14 last
NUM 73 00 517 082
REAL 66.1 .5 10. 1e67 5.5e-10
IF if
COMMA ,
NOTEQ !=
LPAREN (
RPAREN ) 7
![Page 8: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/8.jpg)
Example Non Tokens
Type Examples
comment /* ignored */
preprocessor directive #include <foo.h>
#define NUMS 5, 6
macro NUMS
whitespace \t \n \b
8
![Page 9: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/9.jpg)
Recursive Syntax Definitions
• The syntax of programming languages is naturally defined
recursively
• Valid program are represented as syntax trees
![Page 10: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/10.jpg)
Expression Definitions
• Every identifier is an expression
• If E1 and E2 are expressions and op is a binary operation
then so is ‘E1 op E2’ is an expression
<E> id | <E> <op> <E>
<op> + | - | * | /
![Page 11: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/11.jpg)
Statement Definitions
• If id is a identifier and E is an expression then ‘id := E’ is a
statement
• If S1 and S2 are statements and E is an expression then
– ‘S1 ; S2’ is a statement
– ‘if (E) S1 else S2’ is a statement
<S> id := <E>
<S> <S> ; <S>
<S> if (<E>) <S> else <S>
![Page 12: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/12.jpg)
C Example
void match0(char *s)
/* find a zero */
{
if (!strcmp(s, “0.0”))
return 0 ;
}
<Proc>
<S>
!
<S>
ID <Exp-List>
<E>
<E>
<E>
<Exp-List>
Exp
return <E>
Type
<ArgList>
<Arg
>
Type
ID
*
void
char s
strcmp
ID s
FNUM
“0.0”
NUM
“0”
ID
match0
![Page 13: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/13.jpg)
Context Free Grammars
• Non-terminals
– Start non-terminal
• Terminals (tokens)
• Context Free Rules
<Non-Terminal> Symbol Symbol … Symbol
![Page 14: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/14.jpg)
Example Context Free Grammar
1 <S> <S> ; <S>
2 <S> id := <E>
3 <S> print (<L>)
4 <E> id
5 <E> num
6 <E> <E> + <E>
7 <E> (<S>, <E>)
8 <L> <E>
9 <L> <L>, <E>
![Page 15: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/15.jpg)
Derivations
• Show that a sentence is in the grammar (valid program)
– Start with the start symbol
– Repeatedly replace one of the non-terminals by a right-hand side of a production
– Stop when the sentence contains terminals only
• Rightmost derivation
• Leftmost derivation
![Page 16: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/16.jpg)
Parse Trees
• The trace of a derivation
• Every internal node is labeled by a non-terminal
• Each symbol is connected to the deriving non-
terminal
![Page 17: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/17.jpg)
Example Parse Tree <<S>>
<S> ; <S>
<S> ; id := E
id := <E> ; id := <E>
id := num ; id := <E>
id := num ; id := <E> + <E>
id := num ; id := <E> + num
id := num ; id :=num + num
; <S> <S>
<S>.
id := <E> id := <E>
num <E> + <E>
num num
![Page 18: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/18.jpg)
Ambiguous Grammars
• Two leftmost derivations
• Two rightmost derivations
• Two parse trees
![Page 19: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/19.jpg)
A Grammar for Arithmetic Expressions
1 <E> <E> + <E>
2 <E> <E> * <E>
3 <E> id
4 <E> (<E>)
![Page 20: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/20.jpg)
Drawbacks of
Ambiguous Grammars
• Ambiguous semantics
• Parsing complexity
• May affect other phases
• But how can we express the syntax of PL
using non-ambiguous gramars?
![Page 21: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/21.jpg)
Non Ambiguous Grammar
for Arithmetic Expressions
Ambiguous grammar
1 <E> <E> + <T>
2 <E> <T>
3 <T> <T> * <F>
4 <T> <F>
5 <F> id
6 <F> (<E>)
1 <E> <E> + <E>
2 <E> <E> * <E>
3 <E> id
4 <E> (<E>)
![Page 22: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/22.jpg)
Non Ambiguous Grammars
for Arithmetic Expressions
Ambiguous grammar
1 <E> <E> + <T>
2 <E> <T>
3 T <T> * <F>
4 T <F>
5 F id
6 F (<E>)
1 <E> <E> + <E>
2 <E> <E> * <E>
3 <E> id
4 <E> (<E>)
1 <E> <E> * <T>
2 <E> <T>
3 <T> <F> + <T>
4 <T> <F>
5 <F> id
6 <F> (<E>)
![Page 23: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/23.jpg)
Parser Generators
• Input: A context free grammar
• Output: A parser for this grammar
– Reports syntax error
– Generates syntax tree
• Example tools
– yacc, bison, CUP
![Page 24: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/24.jpg)
Pushdown Automaton
control
parser-table
input
stack
$
$ u t w
V
![Page 25: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/25.jpg)
Efficient Parsers
• Pushdown automata
• Deterministic
• Report an error as soon as the input is not a
prefix of a valid program
• Not usable for all context free grammars
cup
context free grammar
tokens parser
“Ambiguity
errors”
parse tree
![Page 26: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/26.jpg)
Designing a parser
language design
context-free grammar design
Cup
parser (Java program)
![Page 27: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/27.jpg)
Kinds of Parsers
• Top-Down (Predictive Parsing) LL – Construct parse tree in a top-down matter
– Find the leftmost derivation
– For every non-terminal and token predict the next production
– Preorder tree traversal
• [Bottom-Up LR – Construct parse tree in a bottom-up manner
– Find the rightmost derivation in a reverse order
– For every potential right hand side and token decide when a production is found
– Postorder tree traversal]
![Page 28: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/28.jpg)
Top-Down Parsing
1
t1 t2
input
5
4
3
2
![Page 29: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/29.jpg)
Bottom-Up Parsing
t1 t2 t4 t5 t6 t7 t8
input
1 2
3
![Page 30: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/30.jpg)
Example Grammar for Predictive Parsing
<S> id := <E>
<S> <S> ; <S>
<S> if (<E>) <S> else <S>
<E> <T> <EP>
<T> id | (<E>)
<EP> | + <E>
![Page 31: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/31.jpg)
Example Predictive Parser
def parse_S(): if id(input): match(input, id) match(input, assign) parse_E() elif if_tok(input): match(input, if_tok) match(input, lp) parse_E() match(input, rp) parse_S() match(input, else_tok) parse_S() else: syntax_error()
def parse_EP(): if plus(input): match(input, plus) parse_E() elif rp(input) or else_tok(input) or eof(input): return // else: syntax_error()
def parse_E(): parse_T() parse_EP() def parse_T(): if id(input): match(input, id) elif lp(input): match(input, lp) parse_E() match(input, rp) else: syntax_error()
<S> id := <E>
<S> if (<E>) <S> else <S>
<E> <T> <EP>
<T> id | (<E>)
<EP> | + <E>
<S> <S> ; <S>
![Page 32: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/32.jpg)
Bad Example for Predictive Parsing
<S> <A> c | <B> d
<A> a
<B> a
def parse_S():
if a(input):
match(input, a)
parse_A()
match(input, c)
elif a(input):
match(input, a)
parse_A()
match(input, c)
else report syntax error
![Page 33: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/33.jpg)
Efficient Predictive Parsers
• Pushdown automata/Recursive descent
• Deterministic
• Report an error as soon as the input is not a
prefix of a valid program
• Not usable for all context free grammars
Parser
Generator
context free grammar
tokens parser
“Ambiguity
errors”
parse tree
![Page 34: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/34.jpg)
Predictive Parser Generation
• First assume that all non-terminals are not
nullable
– No possibility for <A> *
• Define for every string of grammar symbols
– First() = { t | : * t }
• The grammar is LL(1) if for every two
grammar rules <A> and <A>
– First() First() =
![Page 35: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/35.jpg)
Bad Example for Predictive Parsing
<S> <A> c | <B> d
<A> a
<B> a
First()
a {a}
c {c}
d {d}
A {a}
B {a}
<S> {a}
Ac {a}
Bd {a}
![Page 36: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/36.jpg)
Computing First Sets
• For tokens t, define First(t) = {t}
• For Non-terminals <A>, defines First(<A>)
inductively
– If <A> V then First(V) First(A)
• Can be computed iteratively
• For = V define
First() = First(V)
![Page 37: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/37.jpg)
Computing First Iteratively
For each token t, First(t) := {t}
For each non-terminal <A>, First(<A>) := {}
while changes occur do
if there exists a non-terminal <A> and
a rule <A> V and
a token t First(V) and
t First(<A>)
add t to First(<A>)
![Page 38: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/38.jpg)
A Simple Example
<E> ( <E> ) | ID
First(ID) First(<E>)
{ID} {}
{ID}
{ID, (}
![Page 39: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/39.jpg)
Constructing Predictive Parser
• Construct First sets
• If the grammar is not LL(1) report an error
• Otherwise construct a predictive parser
• A procedure for every non-terminals
• For tokens t First() apply the rule
<A>
– Otherwise report an error
![Page 40: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/40.jpg)
Handling Nullable Non-Terminals
• Which tokens predicate empty derivations?
• For a non-terminal A define
– Follow(A) = { t | ,: <S> * <A> t }
• Follow can be computed iteratively
• First need to be updated too
![Page 41: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/41.jpg)
Predictive Parser
<S> id := <E>
<S> if (<E>) <S> else <S>
<E> id <EP> | (<E>)
<EP> | + <E> <EP>
def parse_S(): if id(input): match(input, id) match(input, assign) parse_E() elif if_tok(input): match(input, if_tok) match(input, lp) parse_E() match(input, rp) parse_S() match(input, else_tok) parse_S() else: syntax_error()
def parse_EP(): if plus(input): match(input, plus) parse_E() parse_EP() elif rp(input) or else_tok(input) or eof(input): return // else: syntax_error()
def parse_E(): if id(input): match(input, id) parse_EP() elif lp(input): match(input, lp) parse_E() match(input, rp) else: syntax_error()
![Page 42: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/42.jpg)
Handling Nullable Non-Terminals
• For a non-terminal <A> define
– Follow(<A>) = { t | : <S> * <A>t }
• For a rule <A>
– If is nullable then
select(<A> ) = First() Follow(<A>)
• Otherwise select(<A> ) = First()
• The grammar is LL(1) if for every two grammar
rules <A> and <A>
– Select(A ) Select(A ) =
![Page 43: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/43.jpg)
Computing First For Nullable
Non-Terminals For each token t, First(t) := {t}
For each non-terminal <A>, First(<A>) = {}
while changes occur do
if there exists a non-terminal <A> and
a rule <A> V1 V2 … Vn and
V1, V2, …, Vn-1 are nullable
a token t First(Vn) and
t First(<A>)
add t to First(<A>)
![Page 44: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/44.jpg)
An Imperative View
• Create a table with
#Non-Terminals #Tokens
Entries
• If t select(<A> ) apply the rule
“apply the rule <A> ”
• Empty entries correspond to syntax errors
![Page 45: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/45.jpg)
A Simple Example
<E> ( <E> ) | ID
( ) ID $
<E> <E> (<E>) <E>id
![Page 46: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/46.jpg)
Left Factoring
• If a grammar contains two productions
– <A> a | a
then it it is not LL(1)
• Left factor A a A’
A’ |
• Can be done for a general grammar
• The resulting grammar may or may not be
LL(1)
![Page 47: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/47.jpg)
Left Recursion
• Left recursive grammar is never LL(1)
– <A> <A>a | b
• Convert into a right-recursive grammar
• Can be done for a general grammar
• The resulting grammar may or may not be
LL(1)
![Page 48: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/48.jpg)
History & Theoretical Properties
• Rich history in linguistics (Panini, Chomsky)
• Used in Algol’60 (Naur)
• Decidable problems – Emptiness
– Finiteness
– Derivation
– LL(1) grammar
• Undecidable problems – Inclusion
– Equivalence
– Ambiguity
– “LL(1) language”
![Page 49: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/49.jpg)
Abstract Syntax
• An ambiguous grammar used to concisely
define the hierarchy of sentences
• Define the structure of a tree
Exp ::= Exp BinOp Exp | ID | Const
![Page 50: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/50.jpg)
Context Sensitive Features
• Some PL features cannot be defined using
context free grammars
– Identifiers
– Types
• Some are awkward to define
![Page 51: Formal Syntax of Programming Languages Mooly Sagiv](https://reader030.fdocuments.us/reader030/viewer/2022032709/623bc9713d7ad85ec62af168/html5/thumbnails/51.jpg)
Summary
• Context free grammars provide a natural way to
define the syntax of programming languages
• Ambiguity may be resolved
• Predictive parsing is natural
– Good error messages
– Natural error recovery
– But not expressive enough
• [Bottom-up parsing is more expressible
– Good tools, Bison, yacc, CUPP]
• Useful beyond PL