Compiler Construction
Compiler Construction
Syntax Analysis
Top-down parsing
Syntax Analysis, continued
Syntax analysis
Last week we covered:
- The goal of syntax analysis
- Context-free grammars
- Top-down parsing (a simple but weak parsing method)

Today, we will:
- Wrap up top-down parsing, including LL(1)
- Start on bottom-up parsing
  - Shift-reduce parsers
  - LR parsers: SLR(1), LR(1), LALR(1)
Top-Down Parsing
Recursive descent (Last Week)
Recursive descent parsers simply try to build a parse tree, top-down, and BACKTRACK on failure.
Recursion and backtracking are inefficient.
It would be better if we always knew the correct action to take.
It would be better if we could avoid recursive procedure calls during parsing.
PREDICTIVE PARSERS can solve both problems.
Predictive parsers
A predictive parser always knows which production to use, so backtracking is not necessary.
Example: for the productions
stmt -> if ( expr ) stmt else stmt
      | while ( expr ) stmt
      | for ( stmt expr stmt ) stmt
a recursive descent parser would always know which production to use, depending on the input token.
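A minimal sketch of that idea: the first token of each alternative is different, so one token of lookahead selects the production. The token `expr` stands in for a whole expression and the `other` statement is a hypothetical base case added so the recursion can end; neither simplification comes from the slides.

```python
def parse_stmt(tokens, pos=0):
    """Recognize one stmt starting at tokens[pos]; return the position after it."""
    def expect(tok, p):
        # Match a single terminal or fail; no backtracking is ever needed.
        if p >= len(tokens) or tokens[p] != tok:
            raise SyntaxError(f"expected {tok!r} at position {p}")
        return p + 1

    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "if":        # stmt -> if ( expr ) stmt else stmt
        p = expect("(", pos + 1)
        p = expect("expr", p)
        p = expect(")", p)
        p = parse_stmt(tokens, p)
        p = expect("else", p)
        return parse_stmt(tokens, p)
    if tok == "while":     # stmt -> while ( expr ) stmt
        p = expect("(", pos + 1)
        p = expect("expr", p)
        p = expect(")", p)
        return parse_stmt(tokens, p)
    if tok == "for":       # stmt -> for ( stmt expr stmt ) stmt
        p = expect("(", pos + 1)
        p = parse_stmt(tokens, p)
        p = expect("expr", p)
        p = parse_stmt(tokens, p)
        p = expect(")", p)
        return parse_stmt(tokens, p)
    if tok == "other":     # hypothetical base case, not in the slide's grammar
        return pos + 1
    raise SyntaxError(f"no production for token {tok!r}")


tokens = "while ( expr ) if ( expr ) other else other".split()
print(parse_stmt(tokens) == len(tokens))
```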
Transition diagrams
Transition diagrams can describe recursive parsers, just like they can describe lexical analyzers, but the diagrams are slightly different.
Construction:
1. Eliminate left recursion from G
2. Left factor G
3. For each non-terminal A, do
   1. Create an initial and final (return) state
   2. For each production A -> X1 X2 … Xn, create a path from the initial to the final state with edges X1 X2 … Xn.
Using transition diagrams
Begin in the start state for the start symbol.
When we are in state s with an edge labeled by terminal a to state t, if the next input symbol is a, move to state t and advance the input pointer.
For an edge to state t labeled with non-terminal A, jump to the transition diagram for A, and when finished, return to state t.
For an edge labeled ε, move immediately to t.
Example (4.15 in text): parse the string “id + id * id”
Example transition diagrams
An expression grammar with left recursion and ambiguity removed:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
Corresponding transition diagrams:
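Each diagram can be read as one procedure: a terminal edge matches a token, a nonterminal edge calls that nonterminal's procedure, and an ε edge simply returns. The sketch below is one possible rendering of that reading for the grammar above; the class and method names are illustrative only.

```python
class Parser:
    """Predictive recursive-descent recognizer for the expression grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Follow an edge labeled with a terminal: consume it or fail.
        if self.look != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.look!r}")
        self.pos += 1

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | ε
        if self.look == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise take the ε edge and return

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | ε
        if self.look == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.look == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")   # the whole input must be consumed
    except SyntaxError:
        return False
    return True


print(accepts("id + id * id".split()))
```

The procedures still recurse; only the backtracking is gone.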
Predictive parsing without recursion
To get rid of the recursive procedure calls, we maintain our own stack.
The parsing table and parsing program
The table is a 2D array M[A,a] where A is a nonterminal symbol and a is a terminal or $.
At each step, the parser considers the top-of-stack symbol X and input symbol a:
- If both are $, accept.
- If they are the same terminal, pop X and advance the input.
- If X is a terminal that differs from a, call an error recovery routine.
- If X is a nonterminal, consult M[X,a]. If M[X,a] is “ERROR”, call an error recovery routine. Otherwise, if M[X,a] is a production of the grammar X -> UVW, replace X on the stack with WVU (U on top).
Example
Use the table-driven predictive parser to parse
id + id * id
assuming the parsing table for the expression grammar.
Initial stack is $E
Initial input is id + id * id $
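The table itself does not survive in this transcript. The sketch below assumes the table that the FIRST/FOLLOW construction yields for the expression grammar E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id, and runs the stack-based algorithm on it. The dictionary layout and function name are illustrative choices.

```python
# M[A, a] -> right-hand side to push; a missing key means ERROR.
# An empty tuple is the ε production.
TABLE = {
    ("E", "id"): ("T", "E'"),       ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),  ("E'", ")"): (),  ("E'", "$"): (),
    ("T", "id"): ("F", "T'"),       ("T", "("): ("F", "T'"),
    ("T'", "+"): (),  ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (),  ("T'", "$"): (),
    ("F", "id"): ("id",),           ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order (a leftmost derivation)."""
    stack = ["$", start]            # top of stack is the end of the list
    buf = list(tokens) + ["$"]
    i = 0
    used = []
    while True:
        X, a = stack[-1], buf[i]
        if X == "$" and a == "$":
            return used             # accept
        if X not in NONTERMINALS:   # X is a terminal (or a premature $)
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            i += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # leftmost RHS symbol ends up on top
            used.append((X, rhs))


for lhs, rhs in predictive_parse("id + id * id".split()):
    print(lhs, "->", " ".join(rhs) or "ε")
```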
Building a predictive parse table
We still don’t know how to create M, the parse table.
The construction requires two functions: FIRST and FOLLOW.
For a string of grammar symbols α, FIRST(α) is the set of terminals that begin the strings derived from α. If α =*> ε, then ε is also in FIRST(α).
FOLLOW(A) for nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the last symbol in a sentential form, then $ is also in FOLLOW(A).
How to compute FIRST(α)
1. If X is a terminal, FIRST(X) = {X}.
2. Otherwise (X is a nonterminal),
   1. If X -> ε is a production, add ε to FIRST(X)
   2. If X -> Y1 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and Y1…Yi-1 =*> ε. If all of Y1 … Yk derive ε, add ε to FIRST(X).
Given FIRST(X) for all single symbols X:
Let FIRST(X1…Xn) start as FIRST(X1) - {ε}.
If ε ∈ FIRST(X1), then add FIRST(X2) - {ε}, and so on. If ε is in every FIRST(Xi), add ε as well.
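These rules amount to a fixed-point iteration: keep applying them until no FIRST set grows. A small sketch, using the expression grammar as sample data; the dictionary encoding of the grammar (an empty list for an ε production) is an assumption made for illustration.

```python
EPS = "ε"

# Nonterminal -> list of alternatives; [] encodes the ε production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST(X1 ... Xn), given the FIRST sets of the nonterminals so far."""
    result = set()
    for X in symbols:
        fx = first[X] if X in first else {X}   # terminal: FIRST(X) = {X}
        result |= fx - {EPS}
        if EPS not in fx:
            return result                      # X cannot vanish, stop here
    result.add(EPS)                            # every symbol can derive ε
    return result


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                             # repeat until nothing grows
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


for A, s in compute_first(GRAMMAR).items():
    print(f"FIRST({A}) =", sorted(s))
```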
How to compute FOLLOW(A)
- Place $ in FOLLOW(S) (for S the start symbol).
- If A -> α B β, then FIRST(β) - {ε} is placed in FOLLOW(B).
- If there is a production A -> α B or a production A -> α B β where β =*> ε, then everything in FOLLOW(A) is in FOLLOW(B).
Repeatedly apply these rules until no FOLLOW set changes.
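The same fixed-point style works for FOLLOW. In this sketch the FIRST sets for the expression grammar are written out by hand to keep the example short, and the grammar encoding (an empty list for ε) is an illustrative assumption.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets of the nonterminals, supplied by hand for this grammar.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})          # terminal: FIRST(X) = {X}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)                     # empty or fully nullable string
    return result


def compute_follow(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")              # rule 1
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue        # FOLLOW is defined for nonterminals
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}        # rule 2
                    if EPS in beta_first:
                        new |= follow[A]            # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


for A, s in compute_follow(GRAMMAR, "E").items():
    print(f"FOLLOW({A}) =", sorted(s))
```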
Example FIRST and FOLLOW
For our favorite grammar:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
What are FIRST() and FOLLOW() for all nonterminals?
Parse table construction with FIRST/FOLLOW
Basic idea: if A -> α and a is in FIRST(α), then we expand A to α any time the current input is a and the top of stack is A.
Algorithm:
For each production A -> α in G, do:
- For each terminal a in FIRST(α), add A -> α to M[A,a]
- If ε ∈ FIRST(α), for each terminal b in FOLLOW(A), add A -> α to M[A,b]
- If ε ∈ FIRST(α) and $ is in FOLLOW(A), add A -> α to M[A,$]
Make each undefined entry in M[ ] an ERROR
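A sketch of this construction for the expression grammar, with FIRST and FOLLOW written out by hand. Each table cell holds a list so that a multiply defined entry would show up as a list with more than one production; that representation, like the names used here, is an illustrative choice. Because $ is stored inside the FOLLOW sets, the second and third steps collapse into one.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],     # [] encodes the ε production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})        # terminal: FIRST(X) = {X}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_table(grammar, follow):
    table = {}                        # (A, a) -> list of right-hand sides
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            f = first_of_string(rhs)
            columns = f - {EPS}
            if EPS in f:
                columns |= follow[A]  # includes $ when $ is in FOLLOW(A)
            for a in columns:
                table.setdefault((A, a), []).append(rhs)
    return table                      # absent keys are the ERROR entries


table = build_table(GRAMMAR, FOLLOW)
for (A, a), entries in sorted(table.items()):
    print(f"M[{A}, {a}] =", [" ".join(r) or EPS for r in entries])
```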
Example predictive parse table construction
For our favorite grammar:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
What is the predictive parsing table?
LL(1) grammars
The predictive parsing table construction can be applied to ANY grammar.
But sometimes, M[ ] might have multiply defined entries.
Example: for if-else statements and left factoring:
stmt -> if ( expr ) stmt optelse
optelse -> else stmt | ε
When we have “optelse” on the stack and “else” in the input, we have a choice of how to expand optelse (“else” is in FOLLOW(optelse) so either rule is possible).
LL(1) grammars
If the predictive parsing construction for G leads to a parse table M[ ] WITHOUT multiply defined entries, we say “G is LL(1)”:
- First L: Left-to-right scan of the input
- Second L: Leftmost derivation
- 1: 1 symbol of lookahead
LL(1) grammars
Necessary and sufficient conditions for G to be LL(1):
For every pair of distinct productions A -> α | β:
1. There does not exist a terminal a such that a ∈ FIRST(α) and a ∈ FIRST(β)
2. At most one of α and β derives ε
3. If β =*> ε, then FIRST(α) does not intersect with FOLLOW(A).
This is the same as saying the predictive parser always knows what to do!
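The three conditions can be checked mechanically for one pair of alternatives. The sketch below takes the FIRST sets of the two alternatives and FOLLOW(A) as inputs; the function name and the message strings are made up for illustration, and the FOLLOW set used for optelse assumes stmt is the start symbol.

```python
EPS = "ε"


def ll1_violations(first_alpha, first_beta, follow_a):
    """Return the violated conditions for A -> α | β (empty list: no conflict)."""
    problems = []
    if (first_alpha - {EPS}) & (first_beta - {EPS}):
        problems.append("1: FIRST(α) and FIRST(β) share a terminal")
    if EPS in first_alpha and EPS in first_beta:
        problems.append("2: both alternatives derive ε")
    # Condition 3 is applied in both directions, since either side may be nullable.
    if EPS in first_beta and (first_alpha - {EPS}) & follow_a:
        problems.append("3: FIRST(α) overlaps FOLLOW(A)")
    if EPS in first_alpha and (first_beta - {EPS}) & follow_a:
        problems.append("3: FIRST(β) overlaps FOLLOW(A)")
    return problems


# E' -> + T E' | ε with FOLLOW(E') = { ), $ }: no conflict.
print(ll1_violations({"+"}, {EPS}, {")", "$"}))

# optelse -> else stmt | ε with FOLLOW(optelse) = { else, $ }: condition 3 fails.
print(ll1_violations({"else"}, {EPS}, {"else", "$"}))
```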
Top-down parsing summary
RECURSIVE DESCENT parsers are easy to build, but inefficient, and might require backtracking.
TRANSITION DIAGRAMS help us build recursive descent parsers.
For LL(1) grammars, it is possible to build PREDICTIVE PARSERS with no recursion automatically:
1. Compute FIRST() and FOLLOW() for all nonterminals
2. Fill in the predictive parsing table
3. Use the table-driven predictive parsing algorithm
Bottom-Up Parsing
Bottom-up parsing
Now, instead of starting with the start symbol and working our way down, we will start at the bottom of the parse tree and work our way up.
The style of parsing is called SHIFT-REDUCE.
SHIFT refers to pushing input symbols onto a stack.
REDUCE refers to “reduction steps” during a parse:
- We take a substring matching the RHS of a rule
- Then replace it with the symbol on the LHS of the rule
If you can reduce until you have just the start symbol, you have succeeded in parsing the input string.
Reduction example
Grammar:
S -> aABe
A -> Abc | b
B -> d
Input: abbcbcde
Reduction steps:
abbcbcde
aAbcbcde
aAbcde
aAde
aABe
S <-- SUCCESS!
In reverse, the reduction traces out a rightmost derivation.
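The reduction steps can be replayed mechanically. In this sketch the position of each reduction is supplied by hand, because deciding where to reduce is exactly the problem a real parser has to solve; the helper name and the `(index, (lhs, rhs))` encoding are illustrative.

```python
def replay(sentence, reductions):
    """Apply each (index, (lhs, rhs)) reduction and collect the sentential forms."""
    forms = [sentence]
    current = sentence
    for index, (lhs, rhs) in reductions:
        if current[index:index + len(rhs)] != rhs:
            raise ValueError(f"{rhs!r} not found at position {index} in {current!r}")
        current = current[:index] + lhs + current[index + len(rhs):]
        forms.append(current)
    return forms


steps = [
    (1, ("A", "b")),      # abbcbcde -> aAbcbcde
    (1, ("A", "Abc")),    # aAbcbcde -> aAbcde
    (1, ("A", "Abc")),    # aAbcde   -> aAde
    (2, ("B", "d")),      # aAde     -> aABe
    (0, ("S", "aABe")),   # aABe     -> S
]

forms = replay("abbcbcde", steps)
print(" => ".join(reversed(forms)))   # read backwards: a rightmost derivation
```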
Handles
The HANDLE is the part of a sentential form that gets reduced in a backwards rightmost derivation.
Sometimes part of a sentential form will match a RHS in G, but if that string is NOT reduced in the backwards rightmost derivation, it is NOT a handle.
Shift-reduce parsing, then, is really all about finding the handle at each step then reducing the handle.
If we can always find the handle, we never have to backtrack.
Repeatedly finding and reducing the handle is called HANDLE PRUNING.
Shift-reduce parsing with a stack
A stack helps us find the handle for each reduction step.
The stack holds grammar symbols.
An input buffer holds the input string.
$ marks the bottom of the stack and the end of input.
Algorithm:
1. Shift 0 or more input symbols onto the stack, until a handle β is on top of the stack.
2. Reduce β to the LHS of the appropriate production.
3. Repeat until we see $S on stack and $ in input.
Shift-reduce example
Grammar:
E -> E + E
E -> E * E
E -> ( E )
E -> id
w = id + id * id

   STACK   INPUT       ACTION
1. $       id+id*id$   shift
Shift-reduce parsing actions
SHIFT: The next input symbol is pushed onto the stack.
REDUCE: When the parser knows the right end of a handle is on the stack, the handle is replaced with the corresponding LHS.
ACCEPT: Announce success (input is $, stack is $S).
ERROR: The input contained a syntax error; call an error recovery routine.
Conflicts during shift/reduce parsing
Like predictive parsers, sometimes a shift-reduce parser won’t know what to do.
A SHIFT/REDUCE conflict occurs when the parser can’t decide whether to shift the input symbol or reduce the current top of stack.
A REDUCE/REDUCE conflict occurs when the parser doesn’t know which of two or more rules to use for reduction.
A grammar whose shift-reduce parser contains conflicts is said to be “not LR”.
Example shift/reduce conflict
Ambiguous grammars are NEVER LR.
stmt -> if ( expr ) stmt
      | if ( expr ) stmt else stmt
      | other
If we have a shift-reduce parser in configuration
STACK                    INPUT
… if ( expr ) stmt       else … $
what to do?
- We could reduce “if ( expr ) stmt” to “stmt” (assuming the else is part of a different surrounding if-else statement)
- We could also shift the “else” (assuming this else goes with the current if)
Example reduce/reduce conflict
Some languages use () for function calls AND array refs.
stmt -> id ( parameter_list )
stmt -> expr := expr
parameter_list -> parameter_list , parameter
parameter_list -> parameter
parameter -> id
expr -> id ( expr_list )
expr -> id
expr_list -> expr_list , expr
expr_list -> expr
Example reduce/reduce conflict
For input A(I,J) we would get token stream id(id,id)
The first three tokens would certainly be shifted:
STACK            INPUT
… id ( id        , id ) …
The id on top of the stack needs to be reduced, but we have two choices: parameter -> id OR expr -> id
The stack gives no clues. To know which rule to use, we need to look up the first ID in the symbol table to see if it is a procedure name or an array name.
One solution is to have the lexer return “procid” for procedure names. Then the shift-reduce parser can look into the stack to decide which reduction to use.
LR (Bottom-Up) Parsers
Relationship between parser types
LR parsing
A major type of shift-reduce parsing is called LR(k).
“L” means left-to-right scanning of the input
“R” means rightmost derivation (constructed in reverse)
“k” means lookahead of k input symbols (if omitted, assume k=1)
LR parsers have very nice properties:
- They can recognize almost all programming language constructs for which we can write a CFG
- They are the most powerful type of non-backtracking shift-reduce parser, and are very efficient
- They can parse a proper superset of the grammars parsable by predictive parsers
- They tell you as soon as possible when there’s a syntax error.
DISADVANTAGE: hard to build by hand (we need something like yacc)
LR parsing
LR parsing
The parser’s structure is similar to predictive parsing.
The STACK now stores pairs (Xi, si).
- Xi is a grammar symbol.
- si is a STATE.
The parse table now has two parts: ACTION and GOTO.
The action table specifies whether to SHIFT, REDUCE, ACCEPT, or flag an ERROR given the state on the stack and the current input.
The goto table specifies what state to go to after a reduction is performed.
Parser configurations
A CONFIGURATION of the LR parser is a pair (STACK, INPUT):
( s0 X1 s1 … Xm sm, ai ai+1 … an $ )
The stack configuration is just a list of the states and grammar symbols currently on the stack.
The input configuration is the list of unprocessed input symbols.
Together, the configuration represents a right-sentential form X1 … Xm ai ai+1 … an (some intermediate step in a rightmost derivation of the input from the start symbol).
The LR parsing algorithm
At each step, the parser is in some configuration.
The next move depends on reading ai from the input and sm from the top of the stack.
- If action[sm,ai] = shift s, we execute a SHIFT move, entering the configuration ( s0 X1 s1 … Xm sm ai s, ai+1 … an $ ).
- If action[sm,ai] = reduce A -> β, then we enter the configuration ( s0 X1 s1 … Xm-r sm-r A s, ai ai+1 … an $ ), where r = | β | and s = goto[sm-r,A]. The input is unchanged, because a reduction consumes no input.
- If action[sm,ai] = accept, we’re done.
- If action[sm,ai] = error, we call an error recovery routine.
LR parsing example
Grammar:
1. E -> E + T
2. E -> T
3. T -> T * F
4. T -> F
5. F -> ( E )
6. F -> id
LR parsing example
CONFIGURATIONS
STACK   INPUT            ACTION
0       id * id + id $   shift 5
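The ACTION and GOTO tables for this grammar are not preserved in the transcript. The sketch below assumes the usual 12-state SLR table for the six numbered productions (consistent with the "shift 5" in the first configuration) and implements the LR driver loop. It keeps only states on the stack, since the grammar symbols are not needed to drive the parse; the table encoding (`"s5"`, `"r6"`, `"acc"`) is an illustrative choice.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0:  {"id": "s5", "(": "s4"},
    1:  {"+": "s6", "$": "acc"},
    2:  {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3:  {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4:  {"id": "s5", "(": "s4"},
    5:  {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6:  {"id": "s5", "(": "s4"},
    7:  {"id": "s5", "(": "s4"},
    8:  {"+": "s6", ")": "s11"},
    9:  {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers used, in the order they are reduced."""
    stack = [0]                          # state stack; the top is stack[-1]
    buf = list(tokens) + ["$"]
    i = 0
    reductions = []
    while True:
        act = ACTION[stack[-1]].get(buf[i])
        if act is None:
            raise SyntaxError(f"unexpected {buf[i]!r} in state {stack[-1]}")
        if act == "acc":
            return reductions
        if act[0] == "s":                # shift: push the state, consume input
            stack.append(int(act[1:]))
            i += 1
        else:                            # reduce by production n
            n = int(act[1:])
            lhs, length = PRODUCTIONS[n]
            del stack[-length:]          # pop |β| states
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(n)


print(lr_parse("id * id + id".split()))
```

Reading the returned list backwards gives the rightmost derivation of the input.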
LR grammars
If it is possible to construct an LR parse table for G, we say “G is an LR grammar”.
LR parsers DO NOT need to scan the entire stack to decide what to do (other shift-reduce parsers might).
Instead, the STATE symbol summarizes all the information needed to make the decision of what to do next.
The GOTO function corresponds to a DFA that knows how to find the HANDLE by reading the top of the stack downwards.
In the example, we only looked at 1 input symbol at a time. This means the grammar is LR(1).
How to construct an LR parse table?
We will look at 3 methods:
- Simple LR (SLR): simple but not very powerful
- Canonical LR: very powerful but too many states
- LALR: almost as powerful with many fewer states
yacc uses the LALR algorithm.