Muhammad Idrees, Lecturer, University of Lahore
Top-Down Parsing
Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. Equivalently, it can be viewed as an attempt to construct a parse tree for the input, starting from the root and creating the nodes of the parse tree in preorder.
Top-Down Parsing (cont’d)
We now consider a general form of top-down parsing, called recursive descent, that may involve backtracking, that is, making repeated scans of the input. However, backtracking parsers are not seen frequently.
Top-Down Parsing (cont’d)
One reason is that backtracking is rarely needed to parse programming language constructs. In situations like natural language parsing, backtracking is still not very efficient, and tabular methods such as the dynamic programming algorithm or the method of Earley are preferred.
Example.
Consider the grammar,
S → cAd
A → ab | a
and the input string w = cad. To construct a parse tree for this string top down, we initially create a tree consisting of a single node labeled S.
Example (cont’d)
An input pointer points to “c”, the first symbol of w. We then use the first production for S to expand the tree and obtain the tree of Fig. (a).
    S
  / | \
 c  A  d
Fig. (a)
Example (cont’d)
The leftmost leaf, labeled “c”, matches the first symbol of w, so we advance the input pointer to “a”, the second symbol of w, and consider the next leaf, labeled A. We then expand A using its first alternative to obtain the tree of Fig. (b).
    S
  / | \
 c  A  d
   / \
  a   b
Fig. (b)
Example (cont’d)
We now have a match for the second input symbol so we advance the input pointer to “d”, the third input symbol and compare “d” against the next leaf, labeled “b”. Since “b” does not match “d”, we report failure and go back to A to see whether there is another alternative for A that we have not tried but that might produce a match.
Example (cont’d)
In going back to A, we must reset the input pointer to position 2, the position it had when we first came to A, which means that the procedure for A must store the input pointer in a local variable. We now try the second alternative for A to obtain the tree of Fig. (c). The leaf “a” matches the second symbol of w and the leaf “d” matches the third symbol. Since we have produced a parse tree for w, we halt and announce successful completion of parsing.
    S
  / | \
 c  A  d
    |
    a
Fig. (c)
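The walk-through above can be sketched as a small backtracking recognizer. This is only an illustrative sketch, not the lecturer's code: the names GRAMMAR, parse_seq, accepts and the trace list are assumptions made for this example, and positions are counted from 0, so the slide's "position 2" is index 1 here. The parameter pos plays the role of the input pointer saved in a local variable: when an alternative fails, the next one restarts from the same pos.

```python
# Grammar from the example: S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse_seq(symbols, text, pos, trace):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: try each alternative in order. `pos` is unchanged
        # between attempts, which is the "reset the input pointer" step.
        for alternative in GRAMMAR[first]:
            trace.append((first, " ".join(alternative), pos))
            for after in parse_seq(alternative, text, pos, trace):
                yield from parse_seq(rest, text, after, trace)
    elif pos < len(text) and text[pos] == first:
        # Terminal: it matches, so advance the input pointer by one symbol.
        yield from parse_seq(rest, text, pos + 1, trace)
    # A terminal mismatch yields nothing, which signals failure to the caller.


def accepts(text, trace=None):
    """Return True if `text` can be derived from S using all of the input."""
    trace = [] if trace is None else trace
    return any(end == len(text) for end in parse_seq(["S"], text, 0, trace))


if __name__ == "__main__":
    steps = []
    print(accepts("cad", steps))
    for nonterminal, alternative, position in steps:
        print(f"expand {nonterminal} -> {alternative} at position {position}")
```

For the input cad, the trace records the expansion of S, the failed attempt A → a b, and the retry with A → a from the same position, mirroring Fig. (a), (b) and (c).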
Top-down Parsing
To find a leftmost derivation for an input string
Construct a parse tree from the root
Example
S → cAd
A → ab | a
Input w = cad
[Figure: three parse trees. S with children c A d; the same tree with A expanded to a b; the same tree with A expanded to a.]
Example
S → cAd
A → ab | a
input: cad
[Figure: successive snapshots of the parse of cad. S is expanded to c A d; A is expanded to a b; the leaf b does not match d (Problem: backtrack); A is expanded to a instead and the parse completes.]
Parsing – Top-Down & Predictive
Top-Down Parsing: the parse tree / derivation of a token string is built in a top-down fashion.
For example, consider:
type → simple
     | ↑ id
     | array [ simple ] of type
simple → integer
     | char
     | num dotdot num
Suppose the input is:
array [ num dotdot num ] of integer
Parsing would begin with the start symbol:
type → ???
Top-Down Parse
Input: array [ num dotdot num ] of integer
[Figure: parse trees built as the lookahead symbol moves through the input. The start symbol type is expanded to array [ simple ] of type; simple is then expanded to num dotdot num.]
Top-Down Parse
Input: array [ num dotdot num ] of integer
[Figure: the parse continues. The remaining type node is expanded to simple, and that simple is expanded to integer.]
Recursive Descent or Predictive Parsing
The parser operates by attempting to match tokens in the input stream.
We use both the grammar and the input below to motivate the code for the algorithm.
Input: array [ num dotdot num ] of integer
type → simple
     | ↑ id
     | array [ simple ] of type
simple → integer
     | char
     | num dotdot num
procedure match ( t : token ) ;
begin
    if lookahead = t then
        lookahead := nexttoken
    else error
end ;
Top-Down Algorithm (Continued)
procedure simple ;
begin
    if lookahead = integer then
        match ( integer )
    else if lookahead = char then
        match ( char )
    else if lookahead = num then begin
        match ( num ) ; match ( dotdot ) ; match ( num )
    end
    else error
end ;
Top-Down Algorithm (Continued)
procedure type ;
begin
    if lookahead is in { integer, char, num } then
        simple
    else if lookahead = ‘↑’ then begin
        match ( ‘↑’ ) ; match ( id )
    end
    else if lookahead = array then begin
        match ( array ) ; match ( ‘[’ ) ; simple ; match ( ‘]’ ) ; match ( of ) ; type
    end
    else error
end ;
Tracing
Input: array [ num dotdot num ] of integer
To initialize the parser:
    set global variable: lookahead = array
    call procedure: type
Procedure call to type with lookahead = array results in the actions:
    match ( array ) ; match ( ‘[’ ) ; simple ; match ( ‘]’ ) ; match ( of ) ; type
Procedure call to simple with lookahead = num results in the actions:
    match ( num ) ; match ( dotdot ) ; match ( num )
Procedure call to type with lookahead = integer results in the actions:
    simple
Procedure call to simple with lookahead = integer results in the actions:
    match ( integer )
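The procedures match, simple and type, and the trace above, can be sketched in Python as follows. This is a hedged illustration rather than the original code: the class name PredictiveParser, the helper parse, the "$" end-of-input marker and the actions list are assumptions of this sketch, and the symbol that precedes id in the second alternative of type is written as the ASCII token "^". The input is assumed to be already split into space-separated tokens.

```python
class ParseError(Exception):
    """Raised when the lookahead token fits no alternative."""


class PredictiveParser:
    def __init__(self, tokens):
        # "$" is an assumed end-of-input marker appended after the real tokens.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.lookahead = self.tokens[0]
        self.actions = []  # log of procedure calls and matches, for tracing

    def match(self, t):
        # Consume the lookahead if it equals t, otherwise report an error.
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.actions.append(f"match({t})")
        self.pos += 1
        self.lookahead = self.tokens[self.pos]

    def simple(self):
        self.actions.append("simple")
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in simple")

    def type_(self):  # trailing underscore avoids shadowing the builtin `type`
        self.actions.append("type")
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in type")


def parse(text):
    """Parse a space-separated token string and return the action log."""
    parser = PredictiveParser(text.split())
    parser.type_()
    if parser.lookahead != "$":
        raise ParseError(f"trailing input at {parser.lookahead!r}")
    return parser.actions


if __name__ == "__main__":
    for action in parse("array [ num dotdot num ] of integer"):
        print(action)
```

Because each alternative starts with a different token, one lookahead symbol is enough to choose the production, so no backtracking is needed.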
Compiler Phases – Front End
[Figure: block diagram of the front end. Blocks: Scanner, Parser, Semantic Action / Checking, Intermediate Representation. Arrow labels: Start, Request Token, Get Token, Semantic Error.]
Big Picture
Parsing: matching the code we are translating against the rules of a grammar, and building a representation of the code.
Scanning: an abstraction that simplifies the parsing process by converting the raw text input into a stream of known objects called tokens.
A grammar dictates the syntactic rules of a language, i.e., how a legal sentence in the language can be formed.
The lexical rules of a language dictate how a legal word in the language is formed by concatenating symbols from the alphabet of the language.
Overall Operation
The parser is in control of the overall operation.
It demands that the scanner produce a token.
The scanner reads the input file into the token buffer and forms a token.
The token is returned to the parser.
The parser attempts to match the token.
Failure: syntax error!
Success:
– Does nothing and returns to get the next token, or
– Takes a semantic action
Overall Operation (cont’d)
Semantic action: look up the variable name
– If found: okay
– If not: put it in the symbol table
If the semantic checks succeed, do code generation.
Return to get the next token.
No more tokens? Done!
Tokenization
Input File ↔ Token Buffer
What does the Token Buffer contain?
– The token being identified
Why a two-way street?
– Characters can be read
– and unread
– Termination of a token
Example
Input: main()
Token buffer after each character is read (most recent character on the left): m, am, iam, niam, (niam
The “(” terminates the token and is unread, leaving niam in the buffer.
Keyword: main
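The read/unread behaviour in this example can be sketched with a one-character pushback. This is a minimal sketch under stated assumptions: the class Scanner, its methods, the KEYWORDS set and the token kinds KEYWORD, VAR and SYMBOL are names invented for the illustration, not part of the slides.

```python
import io

# Assumed keyword set for this sketch; the slides only show "main" as a keyword.
KEYWORDS = {"main", "int", "float"}


class Scanner:
    def __init__(self, text):
        self.src = io.StringIO(text)
        self.pushed_back = []  # characters that were read and then unread

    def read(self):
        # Prefer a character that was unread earlier; "" means end of input.
        if self.pushed_back:
            return self.pushed_back.pop()
        return self.src.read(1)

    def unread(self, ch):
        # Give a character back so the next read() returns it again.
        if ch:
            self.pushed_back.append(ch)

    def next_token(self):
        ch = self.read()
        while ch.isspace():
            ch = self.read()
        if ch == "":
            return None  # no more tokens
        if ch.isalpha():
            buffer = [ch]  # the token buffer: the token being identified
            ch = self.read()
            while ch.isalnum():
                buffer.append(ch)
                ch = self.read()
            # `ch` ended the word but belongs to the next token: unread it.
            self.unread(ch)
            word = "".join(buffer)
            return ("KEYWORD" if word in KEYWORDS else "VAR", word)
        return ("SYMBOL", ch)


if __name__ == "__main__":
    scanner = Scanner("main()")
    token = scanner.next_token()
    while token is not None:
        print(token)
        token = scanner.next_token()
```

The scanner only learns that main has ended by reading the “(”, which is why the character has to be handed back before the token is returned.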
Grammar Rules
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<PARAMS> → NULL
<PARAMS> → VAR <VAR-LIST>
<VAR-LIST> → , VAR <VAR-LIST>
<VAR-LIST> → NULL
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<ASSIGN-STMT> → VAR = <EXPR> ;
<EXPR> → VAR
<EXPR> → VAR <OP> <EXPR>
<OP> → +
<OP> → -
<TYPE> → INT
<TYPE> → FLOAT
Demo
Input: main() { int a,b; a = b;}
[Figure: the Parser talks to the Scanner, which fills the Token Buffer from the input.]
Parser to Scanner: "Please, get me the next token"
The Scanner reads characters into the token buffer (most recent character on the left): m, am, iam, niam, (niam
The “(” is unread, leaving niam in the buffer.
Scanner to Parser: Token: main
Parser: "I recognize this"
Parsing (Matching)
Start matching using a rule.
When a match takes place at a certain position, move further (get the next token and repeat the process).
If expansion needs to be done, choose the appropriate rule. (How to decide which rule to choose?)
If no rule is found, declare an error.
If several rules are found, the parser cannot choose between them; the grammar (set of rules) may be ambiguous.
Grammar ambiguous? Language ambiguous?
Scanning & Parsing Combined
Input: main() { int a,b; a = b;}
Parser to Scanner: "Please, get me the next token"
Token: MAIN
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
Parser to Scanner: "Please, get me the next token"
Token: OPENPAR
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
Token: CLOSEPAR
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<PARAMS> → NULL
Token: CLOSEPAR
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
Scanning & Parsing Combined (cont’d)
Token: CURLYOPEN
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
Token: INT
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<TYPE> → INT
Scanning & Parsing Combined (cont’d)
Token: VAR, then Token: ',' [COMMA], then Token: VAR, then Token: ';'
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<VAR-LIST> → , VAR <VAR-LIST>
<VAR-LIST> → NULL
Token: ';'
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
Scanning & Parsing Combined (cont’d)
Token: VAR, then Token: '=', then Token: VAR
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<ASSIGN-STMT> → VAR = <EXPR> ;
<EXPR> → VAR
Token: ';'
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<ASSIGN-STMT> → VAR = <EXPR> ;
Token: CURLYCLOSE
<C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
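The whole demo can be sketched as a parser that pulls tokens from a scanner one at a time. This is an illustrative sketch under assumptions, not the lecturer's implementation: the names TOKEN_SPEC, scan, Parser and the seen list are invented here, the token names SEMI, COMMA, ASSIGN, PLUS, MINUS and EOF are assumed labels for the punctuation tokens, and the two <EXPR> rules are merged into one procedure (match VAR, then continue only if an operator follows) so that a single lookahead token suffices.

```python
import re

# Token kinds follow the grammar; SEMI, COMMA, ASSIGN, PLUS, MINUS and EOF
# are names assumed for this sketch.
TOKEN_SPEC = [
    ("MAIN", r"main\b"), ("INT", r"int\b"), ("FLOAT", r"float\b"),
    ("VAR", r"[A-Za-z_]\w*"),
    ("OPENPAR", r"\("), ("CLOSEPAR", r"\)"),
    ("CURLYOPEN", r"\{"), ("CURLYCLOSE", r"\}"),
    ("COMMA", r","), ("SEMI", r";"), ("ASSIGN", r"="),
    ("PLUS", r"\+"), ("MINUS", r"-"), ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Generator: produces one token kind each time the parser asks for it."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        pos = m.end()
        if m.lastgroup != "SKIP":
            yield m.lastgroup
    yield "EOF"


class Parser:
    def __init__(self, text):
        self.scanner = scan(text)
        self.seen = []  # tokens in the order the scanner delivered them
        self.lookahead = self.next_token()

    def next_token(self):  # "Please, get me the next token"
        tok = next(self.scanner)
        self.seen.append(tok)
        return tok

    def match(self, kind):
        if self.lookahead != kind:
            raise SyntaxError(f"expected {kind}, found {self.lookahead}")
        if kind != "EOF":
            self.lookahead = self.next_token()

    def c_prog(self):  # <C-PROG>
        self.match("MAIN"); self.match("OPENPAR")
        self.params()
        self.match("CLOSEPAR")
        self.main_body()
        self.match("EOF")

    def params(self):  # <PARAMS>: NULL unless a VAR follows
        if self.lookahead == "VAR":
            self.match("VAR"); self.var_list()

    def var_list(self):  # <VAR-LIST>: zero or more ", VAR"
        while self.lookahead == "COMMA":
            self.match("COMMA"); self.match("VAR")

    def main_body(self):  # <MAIN-BODY>
        self.match("CURLYOPEN")
        self.decl_stmt(); self.assign_stmt()
        self.match("CURLYCLOSE")

    def decl_stmt(self):  # <DECL-STMT>, with <TYPE> folded in
        if self.lookahead not in ("INT", "FLOAT"):
            raise SyntaxError(f"expected a type, found {self.lookahead}")
        self.match(self.lookahead)
        self.match("VAR"); self.var_list(); self.match("SEMI")

    def assign_stmt(self):  # <ASSIGN-STMT>
        self.match("VAR"); self.match("ASSIGN"); self.expr(); self.match("SEMI")

    def expr(self):  # <EXPR>: VAR, optionally followed by <OP> <EXPR>
        self.match("VAR")
        if self.lookahead in ("PLUS", "MINUS"):
            self.match(self.lookahead); self.expr()


if __name__ == "__main__":
    parser = Parser("main() { int a,b; a = b;}")
    parser.c_prog()
    print(parser.seen)
```

Because scan is a generator, no token is produced until the parser requests it, which matches the "parser is in control" arrangement described earlier.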