CH4p3.1 CSE 4100 Chapter 4 - Part 3: Bottom-Up Parsing Prof. Steven A. Demurjian Computer Science &...
-
date post
21-Dec-2015 -
CH4p3.1
CSE4100
Chapter 4 - Part 3: Bottom-Up Parsing
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT [email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre
CH4p3.2
CSE4100
Basic Intuition
Recall that LL(k) works
  TOP-DOWN
  With a LEFTMOST Derivation
  Predicts the right production to select based on lookahead
Our new motto: LR(k) works
  BOTTOM-UP
  With a RIGHTMOST Derivation
  Commits to the production choice after seeing the whole body (right-hand side), working in “reverse”
CH4p3.3
CSE4100
Bottom-Up Parsing
Inverse or Complement of Top-Down Parsing
  Top-Down Parsing Utilizes “Start Symbol” and Attempts to Derive the Input String using Productions
  Bottom-Up Parsing Makes Modifications to the Input String which Allow it to Reduce to the Start Symbol
For Example, Consider Grammar & Derivations:
  S → a A B e
  A → A b c | b
  B → d
What Does Each Derivation Represent?
  Top-Down ---- Leftmost Derivation
    S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde
  Bottom-Up ---- Rightmost Derivation in Reverse!
    abbcde ⇐ aAbcde ⇐ aAde ⇐ aABe ⇐ S
CH4p3.4
CSE4100
Type of Derivation
Grammar:
  S → a A B e
  A → A b c | b
  B → d
Key Issues:
  How do we Determine which Substring to “Reduce”?
  How do we Know which Production Rule to Use?
  What is the General Processing for BUP?
  How are Conflicts Resolved?
  What Types of BUP are Considered?
TDP: S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde
BUP: S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
  Is a rightmost derivation that happens in reverse!
CH4p3.5
CSE4100
What is a Handle?
Defn: A Right-Sentential Form is a Sentential Form that has Been Derived in a Rightmost Derivation
  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
  Every form in this derivation is a Right-Sentential Form
A Handle is a Substring of a Right-Sentential Form that:
  Appears on the Right-Hand Side of a Production Rule
  Can be Used to Reduce the Right-Sentential Form via a Substitution in a Step of a RM Derivation
  Formally, a handle of a Right-Sentential Form γ is a rule A → β and a position in γ where β occurs, s.t.
    S ⇒*rm αAw ⇒rm αβw = γ
Example: in the derivation
  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
  the handles of the four forms are aABe, d, Abc, and b, respectively
  Abc is the Right-Hand Side of Rule A → Abc at Position 2 in Right-Sentential Form γ = aAbcde
CH4p3.6
CSE4100
What is a Handle?
Consider again...
  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
  S → aABe    A → Abc | b    B → d
CH4p3.7
CSE4100
Handle Pruning
What bottom-up really means...
  abbcde is reduced to aAbcde (handle b, rule A → b)
CH4p3.8
CSE4100
Handle Pruning
  aAbcde is reduced to aAde (handle Abc, rule A → Abc)
CH4p3.9
CSE4100
Handle Pruning
  aAde is reduced to aABe (handle d, rule B → d)
CH4p3.10
CSE4100
Handle Pruning
  aABe is reduced to S (handle aABe, rule S → aABe)
CH4p3.11
CSE4100
What’s Going on in Parse Tree?
Consider Right-Sentential Form α β w and Rule A → β
[Figure: parse tree rooted at S with frontier α β w, where A is the parent of β]
  What Does α Signify? Input Processed, Still on Parsing Stack
  What Does β Represent? Candidate Handle to be Reduced
  What Does w Contain? Input yet to be Consumed
CH4p3.12
CSE4100
Bottom-Up Parsing …
Recognize the body of the last production applied in the rightmost derivation
Replace the symbol sequence of that body by the LHS (head) of the Production Rule, Based on “Current” Input
Repeat
At the end
  Either we are left with the start symbol: Success!
  Or we get “stuck” somewhere: Syntax error!
Key Issue: If there are Multiple Handles for the “Same” Sentential Form, then the Grammar G is Ambiguous
CH4p3.13
CSE4100
General Processing of BUP
Basic mechanisms
  “Shift”
  “Reduce”
Basic data-structure
  A stack of grammar symbols (Terminals and Non-Terminals)
Basic idea
  Shift input symbols on the stack until the entire handle of the last rightmost reduction is on the stack
  When the body of the last RM reduction is on Stack, reduce it by replacing the body by the left-hand side (head) of the Production Rule
  When only start symbol is left, we are done
CH4p3.14
CSE4100
Example
  Stack     Input      Action
  $         abbcde$    Shift
  $a        bbcde$     Shift
  $ab       bcde$      Reduce (handle b, rule A → b)
  $aA       bcde$      Shift
  $aAb      cde$       Shift
  $aAbc     de$        Reduce (handle Abc, rule A → Abc)
  $aA       de$        Shift
  $aAd      e$         Reduce (handle d, rule B → d)
  $aAB      e$         Shift
  $aABe     $          Reduce (handle aABe, rule S → aABe)
  $S        $          Accept
For each Reduce step: the Handle is on top of the stack, and the Rule to Reduce with is the one whose body matches it
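The trace above can be reproduced with a small sketch. It is not the table-driven method developed later in these slides: it assumes single-character grammar symbols and simply backtracks when a reduction turns out not to be a handle (for example, reducing the second b by A → b). The names `GRAMMAR` and `shift_reduce` are made up for this illustration.

```python
# Grammar from the slides: S -> aABe, A -> Abc | b, B -> d
# (assumption: every grammar symbol is a single character).
GRAMMAR = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]


def shift_reduce(tokens, start="S", grammar=GRAMMAR):
    """Return the (stack, input, action) steps of a successful parse, or None."""
    failed = set()  # configurations already known to be dead ends

    def search(stack, rest):
        if stack == start and not rest:
            return [("$" + stack, "$", "Accept")]
        if (stack, rest) in failed:
            return None
        # Try each production whose body is on top of the stack (longest body first).
        for head, body in sorted(grammar, key=lambda p: -len(p[1])):
            if stack.endswith(body):
                reduced = stack[: len(stack) - len(body)] + head
                tail = search(reduced, rest)
                if tail is not None:
                    step = ("$" + stack, rest + "$", f"Reduce {head} -> {body}")
                    return [step] + tail
        # Otherwise (or if every reduction failed) shift the next input symbol.
        if rest:
            tail = search(stack + rest[0], rest[1:])
            if tail is not None:
                return [("$" + stack, rest + "$", "Shift")] + tail
        failed.add((stack, rest))
        return None

    return search("", tokens)


steps = shift_reduce("abbcde")
for stack, rest, action in steps:
    print(f"{stack:<8}{rest:<10}{action}")
```

Backtracking is only practical for toy inputs; the LR tables built in the rest of the chapter replace the search with a deterministic decision at every step.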
CH4p3.19
CSE4100
Key Observation
At any point in time
  Content of the stack is a prefix of a right-sentential form
  This prefix is called a viable prefix
Check again!
  Below = all the right-sentential forms of a rightmost derivation
  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
Stack contents during the parse:
  $, $a, $ab, $aA, $aAb, $aAbc, $aA, $aAd, $aAB, $aABe, $S
CH4p3.20
CSE4100
What is General Processing for BUP?
Utilize a Stack Implementation:
  Stack Contains Grammar Symbols (Terminals and Non-Terminals) Shifted from the Input
  Input is Examined w.r.t. Stack/Current State
General Operation: Options to Process Stack Include:
  Shift Symbols from Input onto Stack
  When Handle β is on Top of Stack, Reduce by using Rule A → β:
    Pop all Symbols of Handle β
    Push Non-Terminal A onto Stack
  When Configuration of Stack/Input is ($S, $), ACCEPT
  Error Occurs when Handle Can’t be Found or S is on Stack with Non-Empty Input
CH4p3.21
CSE4100
Consider the Example Below
[Figure not recovered in this transcript]
CH4p3.22
CSE4100
What are Possible Grammar Conflicts?
Shift-Reduce (S/R) Conflict:
  Given Content of Stack and Current Input, there is More than One Option of What to do Next
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
  Consider Stack as below with input of token else
    $ …. if expr then stmt
  Do we Reduce if expr then stmt to stmt?
  Do we Shift “else” onto Stack?
CH4p3.23
CSE4100
What are Possible Grammar Conflicts?
Reduce-Reduce (R/R) Conflict:
  stmt → id ( parameter_list )
  parameter_list → parameter_list , parameter
  parameter → id
  expr → id ( expression_list ) | id
  expression_list → expression_list , expr | expr
  Consider Stack as below with input of token
    $ …. id (id, … , id) ….
  Do we Reduce to stmt?
  Do we Reduce to expr?
CH4p3.24
CSE4100
Bottom-Up Parsing Techniques
LR(k) Parsers
  Left to Right Input Scanning (L)
  Construct a Rightmost Derivation in Reverse (R)
  Use k Lookahead Symbols for Decisions
Advantages
  Well Suited to Almost All PLs
  Most General Approach/Efficiently Implemented
  Detects Syntax Errors Very Quickly
Disadvantages
  Difficult to Build by Hand
  Need Tools to Assist Parser Construction (Yacc, Bison)
CH4p3.25
CSE4100
Components of an LR Parser
[Figure: Grammar → Table Generator → Parsing Table; Input Tokens → Driver Routines (consulting the Parsing Table) → Output Parse Tree]
  Parsing Table: Differs Based on Grammar/Lookaheads
  Driver Routines: Common to all LR Parsers
CH4p3.26
CSE4100
Three Classes of LR Parsers
Simple LR (SLR) or LR(0)
  Easiest but Limited in Grammar Applicability
  Grammar Leads to S/R and R/R Conflicts
Canonical LR
  Powerful but Expensive
  LR(k) – Usually LR(1)
Lookahead LR (LALR) – In Between Two
Two Fold Focus:
  Parser Table Construction – Item and Item Sets
  Examination of LR Parsing Algorithm
CH4p3.27
CSE4100
LR Parser Structure
action[sm , ai ] is Parsing Table with Four Options
  1. Shift s onto Stack
  2. Reduce by Rule
  3. Accept ($,$)
  4. Report an Error
goto[sm , ai ] determines next state for action
Question: What does following Represent?
  (s0 X1 s1 X2 ... Xm-1 sm-1 Xm sm , ai ai+1 ... an $)
  Each si is a state; each Xi is a Grammar symbol (Terminal or non-terminal)
  It stands for the right-sentential form X1 X2 ... Xm-1 Xm ai ai+1 ... an
[Figure: the LR Parsing Program reads INPUT a1 ... ai ai+1 ... an $, keeps the stack s0 X1 ... sm-1 Xm sm (sm on top), consults the action and goto tables, and produces OUTPUT]
CH4p3.28
CSE4100
What is the Parsing Table?
Combination of State, Action, and Goto
  Shift s5 means shift input symbol and state 5
  Reduce r2 means reduce using rule 2
  goto state/NT indicates the next state
CH4p3.29
CSE4100
Actions Against Configuration
Configuration: (s0 X1 s1 X2 ... Xm-1 sm-1 Xm sm , ai ai+1 ... an $)
action[sm , ai ] =
  1. Shift s in Parsing Table – Move ai and state s to Stack (s becomes sm+1):
     (s0 X1 s1 X2 ... Xm-1 sm-1 Xm sm ai sm+1 , ai+1 ... an $)
  2. Reduce A → β means Remove 2×| β| symbols from stack (| β| grammar symbols and | β| states) and Push A along with state s = goto[sm-r , A] onto stack, where r = | β|
     Uses the State Exposed after Popping (sm-r) to determine goto
  3. Accept – Parsing Complete
  4. Error – Call recovery Routine
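The four actions can be sketched as a short driver loop. This is an illustration under assumptions, not the course's implementation: the `ACTION`/`GOTO` dictionaries encode the classic SLR table for the expression grammar E → E + T | T, T → T * F | F, F → ( E ) | id, with reduce entries numbered from E → E + T as rule 1, and the names `RULES` and `lr_parse` are invented here.

```python
# rule number -> (head, length of body); numbering is an assumption (E -> E + T is 1)
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the rule numbers used, in reduction order (rightmost derivation reversed)."""
    stack = [0]                      # s0 X1 s1 ... Xm sm, top of stack at the end
    tokens = list(tokens) + ["$"]
    pos, output = 0, []
    while True:
        state, a = stack[-1], tokens[pos]
        act = ACTION[state].get(a)
        if act is None:              # 4. error: no table entry
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if act == "acc":             # 3. accept
            return output
        if act[0] == "s":            # 1. shift: push ai and the new state
            stack += [a, int(act[1:])]
            pos += 1
        else:                        # 2. reduce A -> beta
            rule = int(act[1:])
            head, r = RULES[rule]
            del stack[len(stack) - 2 * r:]        # pop 2*|beta| entries
            exposed = stack[-1]                   # state s(m-r)
            stack += [head, GOTO[exposed][head]]
            output.append(rule)


print(lr_parse(["id", "*", "id", "+", "id"]))
```

Note that the reduce branch looks up goto with the state exposed after popping, which is why the pop must happen before the lookup.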
CH4p3.30
CSE4100
How Does BUP Work?
Stack Input Action
CH4p3.31
CSE4100
Another Detailed Example
CH4p3.32
CSE4100
Constructing Parsing Tables
Three Types of Parsers (SLR, Canonical, LALR) all have Shared Concept for Parsing Table Construction
An Item Characterizes for Each Grammar Rule
  What we’ve Seen or Derived
  What we’ve Yet to See or Derive
Consider the Grammar Rule: E → E + T
  There are Four Items for this Rule
    E → . E + T
    E → E . + T
    E → E + . T
    E → E + T .
  E → E . + T Means we’ve Derived E and have yet to Derive + T, so we are Expecting “+” Next
Note: A → ε has Item A → .
In an Item, What is Left of “.” Has Been Seen/Derived; What is Right of “.” is To Be Seen/Derived
CH4p3.33
CSE4100
Another Characterization of Items
Consider the Grammar Rule: E → E + T
  There are Four Items for this Rule
    E → . E + T
    E → E . + T
    E → E + . T
    E → E + T .
This Represents Summary of History of Parse
Each Item Refers to:
  What’s Been Placed on Stack (Left of “.”)
  What Remains to Reduce for a Rule (Right of “.”)
Example: E → E + . T
  Left of “.” (on stack): Seen a string derived from E +; Found input through the “+”
  Right of “.” (left to derive/reduce): Looking for String Derivable from T; Yet to process input for T
CH4p3.34
CSE4100
Start with SLR Parsing Table Construction
Step 1: Construct an Augmented Grammar whose New Start Symbol has a Single Alternative/Production Rule
Now, Every Derivation Starts with the Production Rule: E’ → E $
Original:
  E → E + T
  E → T
  T → T * F
  T → F
  F → ( E )
  F → id
Augmented:
  E’ → E $
  E → E + T
  E → T
  T → T * F
  T → F
  F → ( E )
  F → id
CH4p3.35
CSE4100
Start with SLR Parsing Table Construction
Step 2: Construct the Closure of All Items
  Intuitively, if A → α . B β is in Closure, we would Expect to see B β at Some Point in Derivation
  If B → γ is a Production Rule, Expect to see a Substring Derivable from γ in Future
Step 3: Compute the GOTO (Item_Set, X), where X is a Grammar Symbol
  Intuitively, Identifies Which Items are Valid for Viable Prefix γ
  Utilized to Determine Next Action (State) for the Parser
  Note: Different from goto as Previously Discussed!
CH4p3.36
CSE4100
Calculating Closure
Closure ([I]) where I is Set of Items
  1. All Items in I are in Closure ([I])
  2. If A → α . B β in Closure ([I]) and B → γ is a Production Rule, then Add B → . γ to Closure ([I])
  3. Repeat Step 2 Until there are No New Items Added
I0 = Closure ([E’ → . E $]) --- Add in Following Items
  E’ → . E $     - Rule 1 - Any Rules E → γ? - Yes…
  E → . E + T    - Rule 2
  E → . T        - Rule 3 - Any Rules T → γ? - Yes…
  T → . T * F    - Rule 4
  T → . F        - Rule 5 - Any Rules F → γ? - Yes…
  F → . ( E )    - Rule 6
  F → . id       - Rule 7
Rules:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id
CH4p3.37
CSE4100
What’s Next Step?
Recall the Parsing Table
  States are 0, 1, 2, … 11 which Correspond to Item Sets
  actions based on Input and Current State
  goto is What State to Transition to Next
  This is a Push Down Automaton!
What are Three Critical Functions to Calculate?
  State closure: To compute the set of productions in a given state
  Transition function: To compute the states reachable from a given state
  Items: To compute the set of states in the PDA
CH4p3.38
CSE4100
What is Important Part of Process?
Viable Prefix Definition
  (1) a string that equals a prefix of a right-sentential form up to (and including) its unique handle.
  (2) any prefix of a string that satisfies (1)
  Essentially a prefix of a right-sentential form that does not extend past the handle
  May be inclusive of entire handle (right hand side of a production rule)
Examples of Viable Prefixes are: a, aA, aAd, aAbc, ab, aAb, …
Not viable prefixes: aAde, Abc, aAA, …
CH4p3.39
CSE4100
What is The Big Deal ?
Consider the stack again
  $, $a, $ab, $aA, $aAb, $aAbc, $aA, $aAd, $aAB, $aABe, $S
Each Stack Content is a Prefix of a Right-Sentential Form
They are all Viable Prefixes
When Parsing, two Alternatives. Answer: We are either
  lengthening a viable prefix
  pruning a handle
In other words...
  States represent viable prefixes
  We transition between viable prefixes!
CH4p3.40
CSE4100
Intuition for this Process
Objective
  Turn a Grammar into a PDA
We want
  A PDA
  With states that capture viable prefixes
We have
  A grammar
  With production rules
We know that
  Production rules are used to derive handles
  Viable prefixes are (string) prefixes of right-sentential forms that extend no further than the handle
CH4p3.41
CSE4100
Example
Consider augmented grammar given below….
Assume that
  We start the parsing (with E’) and therefore
  We are at the initial state of the PDA
  We have some input: (e.g., id + id * id)
Questions
  Which productions are activated at this point ?
  In other words, which productions could be used to match the rest of the input ?
Rules:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id
CH4p3.42
CSE4100
Example II
Consider the Derivation Given Below…
  E’ ⇒ E $         by (1)
     ⇒ E + T $     by (2)
     ⇒ T + T $     by (3)
     ⇒ F + T $     by (5)
     ⇒ id + T $    by (7)
     ....
In Example, Production Rules: 1,2,3,5,7 are active and utilized to “lead” to the viable prefix “id”
Rules:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id
CH4p3.43
CSE4100
State I0
PDA State (Closure([E’ → . E $]))
A PDA State is...
  The set of productions that are active in the state
Question
  How do we compute that from G ?
Rules:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id
Building I0 step by step:
  Start with:           E’ → . E $
  E follows “.”, add:   E → . E + T,  E → . T
  T follows “.”, add:   T → . T * F,  T → . F
  F follows “.”, add:   F → . ( E ),  F → . id
CH4p3.44
CSE4100
PDA Transition
How can we leave state I0 ?
  On any symbol to the right of “.” in I0: E, T, F, (, id
What does it mean to leave I0 ?
  Terminals – means that we’ve Consumed the terminal from the input stream
  Non-terminals – means that we have pushed onto the stack the non-terminal, input, and states that will allow for a future reduction
State I0:
  E’ → . E $
  E → . E + T
  E → . T
  T → . T * F
  T → . F
  F → . ( E )
  F → . id
This defines the GOTO Function!
CH4p3.45
CSE4100
The GOTO Function
GOTO(I, X) is Defined for
  An item set I
  A grammar symbol (non-terminal or terminal) X
GOTO(I, X) = Closure({items [A → α X . β] where A → α . X β in I})
Algorithmically:
  Look for Rules of Form: A → α . X β
  Identify the Grammar Symbols in I to Right of “.”
  Group all A → α . X β with Same “X” to Form a New State
  Compute the Closure of the New State for All X
This leads to …
CH4p3.46
CSE4100
Destination states
State I0:
  E’ → . E $,  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id
GOTO(I0, E) = State I1:   E’ → E . $,  E → E . + T
GOTO(I0, T) = State I2:   E → T . ,  T → T . * F
GOTO(I0, F) = State I3:   T → F .
GOTO(I0, ( ) = State I4:  F → ( . E ),  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id
GOTO(I0, id ) = State I5: F → id .
CH4p3.47
CSE4100
Destination states
For GOTO(I0, ( ) we compute Closure([F → ( . E )])
  Since E → E + T and E → T, include E → . E + T, E → . T
  Since T → T * F and T → F, include T → . T * F, T → . F
  Since F → ( E ) and F → id, include F → . ( E ), F → . id
Now, compute GOTO(I1, X ) for X = E, T, F, ( , id
State I0:
  E’ → . E $,  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id
GOTO(I0, ( ) = State I4:
  F → ( . E ),  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id
CH4p3.48
CSE4100
What Does it Mean when “.” at End of Rule?
GOTO(I0, T) = State I2:   E → T . ,  T → T . * F
GOTO(I0, F) = State I3:   T → F .
GOTO(I0, id ) = State I5: F → id .
For the Three States above, the “.” Occurs at the end of an Item
  E → T .  and  T → F .  and  F → id .
Each of these is a “Reduction” to Replace
  T by E on Stack
  F by T on Stack
  id by F on Stack
State I0:
  E’ → . E $,  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id
CH4p3.49
CSE4100
Represents the Possible Next Steps in a Derivation
Consider Symbol Directly to Right of “.”
  That is what we Expect to see Next in a Derivation
  For two Rules, we Expect to See “E”:  E’ → . E $  and  E → . E + T
Move “.” to Right to Consume “E” for Both Production Rules
  We’ve Seen “E”
  We expect to see What Follows “.” Next
Now, Compute: Closure([E’ → E . $, E → E . + T]) = State I1
How is this Interpreted …
State I0:
  E’ → . E $,  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id
GOTO(I0, E) = State I1:
  E’ → E . $,  E → E . + T
CH4p3.50
CSE4100
Continue Process to Yield …
The State Machine also Represents Viable Prefixes
  Possible Combinations that appear on Parsing Stack
CH4p3.51
CSE4100
Viable Prefixes and Valid Items
Consider a Derivation:
  S’ ⇒*rm α A w ⇒rm α β1 β2 w
Let α β1 be a Viable Prefix
  A → β1 . β2 is a Valid Item for α β1 if the above derivation exists
When α β1 is on the Parsing Stack – Two Cases:
  If β2 ≠ ε Then we Don’t have Handle on Stack
  If β2 = ε Then Perhaps A → β1 is the Reduction
However, Reduction Choice may not be Limited to a Single Production Rule:
  There may be two or more Valid Items for the Same Viable Prefix!
  Shift/Reduce or Reduce/Reduce Conflicts Possible!
CH4p3.52
CSE4100
How Does this Relate to State Machine?
Consider the Viable Prefix E+T*
Each State in Machine Represents a Set of One or More Items
Specifically, for E+T*, we end up in State I7 if you Follow the Transitions of the State Machine
CH4p3.53
CSE4100
Consider the State
Item Set is:
  T → T * . F
  F → . ( E )
  F → . id
with three possible derivations:
  E’ ⇒ E ⇒ E + T ⇒ E + T * F
  E’ ⇒ E ⇒ E + T ⇒ E + T * F ⇒ E + T * ( E )
  E’ ⇒ E ⇒ E + T ⇒ E + T * F ⇒ E + T * id
Which do you Choose? Why?
CH4p3.54
CSE4100
End Result of Process?
Machine that Contains
  All Item Set States
  Transitions Between States on Terminals and Non-Terminals
What do we need this for?
  To Construct the Parsing Table!
CH4p3.55
CSE4100
What’s Next Step?
Constructing SLR Parsing table
  action[state,symbol]
  goto[state,symbol]
Easy Part of this Process:
  Determining “shift” actions
  Examine Machine for all terminal transitions
  These are “shifts” from one state to next
  Push both the terminal and state onto parsing stack
More Difficult Part of this Process:
  Reductions are Items with “.” at End of Item
  Two Questions
    What is the “input” that Determines Correct Reduction?
    What is the “state” to push onto Stack?
CH4p3.56
CSE4100
Recall First and Follow Calculations
Recall the Grammar:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id
First (E’) = First (E) = First (T) = { (, id }
Follow (E’) = { $ }
Follow (E) = { First( + T ), First( ) ), First ( $ ) } = { +, ), $ }
Follow (T) = { Follow (E), First ( * F ) } = { +, ), $, * }
Follow (F) = { Follow (T) } = { +, ), $, * }
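The sets above can be checked with a short fixed-point computation. This is a sketch under two assumptions: the grammar has no ε-rules (so only the first symbol of a body contributes to First), and the end marker $ is seeded into Follow(E’) rather than written in rule 1. The function names are illustrative.

```python
GRAMMAR = {
    "E'": [["E"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_sets(g):
    """First sets for a grammar without epsilon rules."""
    first = {A: set() for A in g}
    changed = True
    while changed:
        changed = False
        for A, bodies in g.items():
            for body in bodies:
                X = body[0]
                new = first[X] if X in g else {X}
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def follow_sets(g, start, first):
    """Follow sets; '$' is placed in Follow(start) up front."""
    follow = {A: set() for A in g}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for A, bodies in g.items():
            for body in bodies:
                for i, X in enumerate(body):
                    if X not in g:
                        continue            # terminals have no Follow set
                    if i + 1 < len(body):   # something follows X in this body
                        nxt = body[i + 1]
                        new = first[nxt] if nxt in g else {nxt}
                    else:                   # X ends the body: inherit Follow(A)
                        new = follow[A]
                    if not new <= follow[X]:
                        follow[X] |= new
                        changed = True
    return follow


FIRST = first_sets(GRAMMAR)
FOLLOW = follow_sets(GRAMMAR, "E'", FIRST)
for A in GRAMMAR:
    print(A, sorted(FIRST[A]), sorted(FOLLOW[A]))
```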
CH4p3.57
CSE4100
Return to Item Sets
Suppose an Item Set Contains the Item: A → α .
When we Reach this Item it is Time to Reduce and Replace α on the Stack with A
However, What is the “Input” under which this Reduction is Allowed to Occur?
  Want to Replace α with A
  Reading some current input x
  Only Do the Reduction if x in Follow (A)
Consider Two Reductions in a Same Item Set: A → α . and B → α . and current input x
  If x in Follow (A), reduce using A → α
  If x in Follow (B), reduce using B → α
  If x in both, Reduce/Reduce Conflict!
We’ll See Two Examples Shortly …
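The Follow-based rule for placing reductions can be sketched as follows. Items are represented as (head, body, dot position) tuples, the Follow sets are hard-coded from the previous slide, and `reduce_actions` is a hypothetical helper name, not part of any tool.

```python
FOLLOW = {
    "E": {"+", ")", "$"},
    "T": {"+", ")", "$", "*"},
    "F": {"+", ")", "$", "*"},
}


def reduce_actions(item_set, follow=FOLLOW):
    """Map each input symbol x to the production to reduce by in this state."""
    actions = {}
    for head, body, dot in item_set:
        if dot != len(body):
            continue                      # "." not at the end: not a reduction
        for x in follow[head]:            # reduce only if x is in Follow(head)
            if x in actions and actions[x] != (head, body):
                raise ValueError(f"reduce/reduce conflict on {x!r}")
            actions[x] = (head, body)
    return actions


# State I2 = { E -> T . , T -> T . * F }
I2 = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}
print(sorted(reduce_actions(I2)))
```

For I2 the reduction by E → T is entered under +, ) and $ only, which leaves the * column free for the shift.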
CH4p3.58
CSE4100
Back to Item Sets/State Machine
[Figure: item sets/state machine with color annotations]
  RED underlines are all shifts with associated gotos
  BLUE circles are all gotos for non-terminals
  GREEN underlines are all reductions
  Reductions are based on Follow
CH4p3.59
CSE4100
Action and goto tables
Action contains shifts, reduction, and accept (green)
All other states are error states
Goto contains the next state to shift onto the stack

Action (S = shift/transition, Rn = reduce, acc = accept):
State  id   +    *    (    )    $    E    T    F
0      S              S              S    S    S
1           S                   acc
2           R2   S         R2   R2
3           R4   R4        R4   R4
4      S              S              S    S    S
5           R6   R6        R6   R6
6      S              S                   S    S
7      S              S                        S
8           S              S
9           R1   S         R1   R1
10          R3   R3        R3   R3
11          R5   R5        R5   R5

Note: the Rn entries count productions starting from E → E + T as R1 (R2: E → T, R3: T → T * F, R4: T → F, R5: F → ( E ), R6: F → id), i.e., one less than the rule numbers listed below.

Rules:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id

Goto (next state):
State  id   +    *    (    )    $    E    T    F
0      5              4              1    2    3
1           6
2                7
3
4      5              4              8    2    3
5
6      5              4                   9    3
7      5              4                        10
8           6              11
9                7
10
11
CH4p3.60
CSE4100
Formal Algorithms
To Calculate the Parsing Table, we Require Three Algorithms
  State closure: To compute the set of productions in a given state
  Transition function: To compute the states reachable from a given state
  Items: To compute the set of states in the PDA
Algorithms from Prof. Michel …
CH4p3.61
CSE4100
State Closure Algorithm

function closure(set{Item} I) : set{Item}
{
   set{Item} J0 = I;
   i = 0;
   repeat
      Ji+1 = Ji;
      for each A → α . B β in Ji and
          each B → γ in P s.t. B → . γ not in Ji
         Ji+1 = Ji+1 ∪ { B → . γ }
      i = i + 1;
   until Ji = Ji-1;
   return Ji;
}
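A runnable version of the closure computation might look like the sketch below. It uses a worklist instead of the J0, J1, … sequence, represents an item as a (head, body, dot position) tuple, and augments the grammar with E’ → E (leaving the end marker $ to the accept action); these representation choices are assumptions made for the example.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar=GRAMMAR):
    """Add B -> . gamma for every item with a non-terminal B right after the dot."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in grammar:
            B = body[dot]
            for gamma in grammar[B]:
                item = (B, gamma, 0)
                if item not in result:      # only genuinely new items are added
                    result.add(item)
                    work.append(item)
    return frozenset(result)


I0 = closure({("E'", ("E",), 0)})
for head, body, dot in sorted(I0):
    print(head, "->", " ".join(body[:dot]), ".", " ".join(body[dot:]))
```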
CH4p3.62
CSE4100
GOTO Function

function GOTO(set{Item} s, symbol X) : set{Item}
{
   set{Item} J = ∅;
   for each c in s
      if c of the form A → α . X β
         J = J ∪ { A → α X . β }
   return closure(J);
}
CH4p3.63
CSE4100
All State Functions (set-of-items)

function items(Grammar G’) : set{State}
{
   set{State} C0 = { closure({S’ → . S}) };
   i = 0;
   repeat
      Ci+1 = Ci;
      for each S in Ci and each symbol X in G’
         Z = GOTO(S, X);
         if Z ≠ ∅ AND Z not in Ci
            then Ci+1 = Ci+1 ∪ { Z };
      i = i + 1;
   until Ci = Ci-1;
   return Ci;
}
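Putting the three algorithms together, the sketch below enumerates the item sets of the expression grammar. It makes the same assumptions as before (items as (head, body, dot) tuples, augmentation E’ → E without an explicit $), and the symbol order in `SYMBOLS` is chosen so that the breadth-first numbering lines up with states 0 to 11 of the slides.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]


def closure(items):
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for gamma in GRAMMAR[body[dot]]:
                item = (body[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, X):
    """Move the dot over X in every item that allows it, then close the result."""
    moved = {(h, b, d + 1) for (h, b, d) in state if d < len(b) and b[d] == X}
    return closure(moved) if moved else frozenset()


def items(start_item):
    """All reachable item sets, in breadth-first discovery order."""
    states = [closure({start_item})]
    i = 0
    while i < len(states):
        for X in SYMBOLS:
            Z = goto(states[i], X)
            if Z and Z not in states:     # non-empty and not seen before
                states.append(Z)
        i += 1
    return states


states = items(("E'", ("E",), 0))
print(len(states), "states")
for n, state in enumerate(states):
    moves = {X: states.index(goto(state, X)) for X in SYMBOLS if goto(state, X)}
    print(n, moves)
```

The printed transitions are exactly the shift and goto entries of the SLR table; the reduce entries come from the items with the dot at the end together with the Follow sets.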
CH4p3.64
CSE4100
Using Ambiguous Grammars
Ambiguous Grammars will Cause Multiple Entries for a given state/terminal in Parsing Table
Results in Two Types of Conflicts
  Shift/Reduce Conflicts
  Reduce/Reduce Conflicts
Compiler Writing Tools (Yacc, Bison, etc.) Automatically Resolve these by:
  For Shift/Reduce – chooses Shift
  For Reduce/Reduce – Reduce by “earlier” rule
Consider Two Examples
  Dangling Else
  Simplified Expression Grammar
CH4p3.65
CSE4100
Dangling Else Ambiguity
Recall the Grammar:
  stmt → if expr then stmt else stmt
       | if expr then stmt
       | other
Rewrite the Grammar as:
  s → i s e s | i s | a
  Essentially collapsing “if expr then” into “i” and “else” into “e”, with “s” for stmt and “a” representing all other statements
Now Compute LR(0) Items and SLR Parsing Table
CH4p3.66
CSE4100
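The Item Sets for the Grammar
I0: s’ → . s,  s → . i s e s,  s → . i s,  s → . a
I1: s’ → s .
I2: s → i . s e s,  s → i . s,  s → . i s e s,  s → . i s,  s → . a
I3: s → a .
I4: s → i s . e s,  s → i s .
I5: s → i s e . s,  s → . i s e s,  s → . i s,  s → . a
I6: s → i s e s .
Transitions:
  I0: on s to I1, on i to I2, on a to I3
  I2: on s to I4, on i to I2, on a to I3
  I4: on e to I5
  I5: on s to I6, on i to I2, on a to I3
Follow(s’) = { $ }   Follow(s) = { $, e }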
CH4p3.67
CSE4100
The Parsing table
State   action                      goto
        i      e      a      $      s
0       s2            s3            1
1                            acc
2       s2            s3            4
3              r3            r3
4              s5/r2         r2
5       s2            s3            6
6              r1            r1
Follow(s’) = { $ }   Follow(s) = { $, e }
Rules:
  1: s → i s e s
  2: s → i s
  3: s → a
Notice s/r conflict for action[4,e]
  if <expr> then <stmt> else <stmt>
  If shift on else what is the result w.r.t. language?
  If reduce on else what is the result w.r.t. language?
CH4p3.68
CSE4100
Solution to Dangling Else
Pick Shift over Reduce: action[4, e] = s5
Consider input iiaea which is equivalent to:
  if <expr> then if <expr> then <stmt> else <stmt>
Parser as follows w.r.t. stack/input:
  Stack        Input   Action
  $ ….         ea$     shift e
  $ ….e        a$      shift a
  $ ….e...a    $       reduce using s → a
  $ ….e        $       reduce using s → i s e s
  $ ..i..      $       reduce using s → i s
  $            $       accept
Using this approach, we eliminate the need for a more complex unambiguous grammar with more rules
CH4p3.69
CSE4100
Example 2 – Simplified Expression Grammar
Consider the Grammar:
  E → E + E | E * E | ( E ) | id
What’s Problem with this Grammar?
Why would this Grammar be Preferable?
  Employ Techniques Similar to Previous Example to Remove Multiple Table Entries
  Result is to Achieve both Associative and Precedence Behavior for + and *
  Change Assoc/Precedence by Changing Table
  No more Extra Work
  Improve Performance
CH4p3.70
CSE4100
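First, Calculate Item Sets
I0: E’ → . E,  E → . E + E,  E → . E * E,  E → . ( E ),  E → . id
I1: E’ → E . ,  E → E . + E,  E → E . * E
I2: E → ( . E ),  E → . E + E,  E → . E * E,  E → . ( E ),  E → . id
I3: E → id .
I4: E → E + . E,  E → . E + E,  E → . E * E,  E → . ( E ),  E → . id
I5: E → E * . E,  E → . E + E,  E → . E * E,  E → . ( E ),  E → . id
I6: E → ( E . ),  E → E . + E,  E → E . * E
I7: E → E + E . ,  E → E . + E,  E → E . * E
I8: E → E * E . ,  E → E . + E,  E → E . * E
I9: E → ( E ) .
Transitions:
  I0: on E to I1, on ( to I2, on id to I3
  I1: on + to I4, on * to I5
  I2: on E to I6, on ( to I2, on id to I3
  I4: on E to I7, on ( to I2, on id to I3
  I5: on E to I8, on ( to I2, on id to I3
  I6: on ) to I9, on + to I4, on * to I5
  I7: on + to I4, on * to I5
  I8: on + to I4, on * to I5
Follow(E’) = { $ }   Follow(E) = { $, +, *, ) }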
CH4p3.71
CSE4100
Consider States I7 and I8
State I7
  E → E + E .   action[7,+] = reduce by E → E + E
                action[7,*] = reduce by E → E + E
                action[7,)] = reduce by E → E + E
                action[7,$] = reduce by E → E + E
  E → E . + E   action[7,+] = shift to state 4
  E → E . * E   action[7,*] = shift to state 5
  Conflicts:
    action[7,+] = reduce by E → E + E or shift to state 4
    action[7,*] = reduce by E → E + E or shift to state 5
State I8
  The same two conflicts arise on + and *, with reduce by E → E * E in place of E → E + E
How is Each Conflict Resolved?
CH4p3.72
Parsing Table:

State   id    +     *     (     )     $     E
0       s3                s2                1
1             s4    s5                acc
2       s3                s2                6
3             r4    r4          r4    r4
4       s3                s2                7
5       s3                s2                8
6             s4    s5          s9
7             r1    s5          r1    r1
8             r2    r2          r2    r2
9             r3    r3          r3    r3

Rules (numbered as used by the r entries above):
    0  E' → E
    1  E → E + E
    2  E → E * E
    3  E → (E)
    4  E → id

State 7 on +: reduce using rule 1, since "+" is left assoc
State 7 on *: shift "*" onto stack since it has higher precedence
State 8: reduce using rule 2 regardless of + or *
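The way the entries for states 7 and 8 were chosen can be sketched as a precedence comparison, similar in spirit to what Yacc-style tools do with operator declarations. This is a minimal sketch under assumptions: the `PREC` table and the `resolve` function are hypothetical names, and only left-associative operators are exercised.

```python
# Hypothetical precedence table: token -> (level, associativity).
# A higher level binds tighter.
PREC = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_op, lookahead):
    """Pick 'shift' or 'reduce' when E -> E rule_op E is complete
    and the lookahead is another operator."""
    rule_level, _ = PREC[rule_op]
    tok_level, tok_assoc = PREC[lookahead]
    if tok_level > rule_level:
        return "shift"   # lookahead binds tighter, keep reading
    if tok_level < rule_level:
        return "reduce"  # completed rule binds tighter
    # Same level: associativity decides.
    return "reduce" if tok_assoc == "left" else "shift"


for state, op in ((7, "+"), (8, "*")):
    for tok in ("+", "*"):
        print(f"action[{state},{tok}] = {resolve(op, tok)}")
```

Changing a level or an associativity in `PREC` changes the table entry without touching the grammar, which is the point of "Change Assoc/Precedence by Changing Table".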
CH4p3.73
Canonical Parser Table Construction
Not all Parser Tables are Created Equally!
Differentiate between SLR/LR(0), LR(1), and LALR(1) (Yacc/Bison)
Key Issue: Utilization of Lookaheads
    SLR – Current Input
    LR(1) – Current Input plus Next Token
    LR(k) – Current Input plus Next k Tokens
Consider id + id * id
    SLR/LR(0): Current Input
    LR(1):
        – id determines if shift or reduce
        – 2nd token (+) determines rule
        – if conflict, 2nd token can break tie
        – on-the-fly disambiguation
        – sometimes shift, sometimes reduce
        – depends on that 2nd token
CH4p3.74
Recall the Prior Grammar
Item set I0 as given below (LR(0)) …
For LR(1) items, we must consider the basis on which the rule causes a shift on a lookahead terminal
When we put E' → . E into the LR(1) set, we must also consider the first terminal that appears after E
This is the lookahead…

LR(0)
    E' → . E
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . Id

Step 1: LR(1)
    E' → . E, $
    E → . E + T, $
    E → . T, $
    What appears after E in the 2nd item?

Step 2: LR(1)
    E' → . E, $
    E → . E + T, $/+
    E → . T, $
    If it appears after E, what else does it appear after?

Step 3: LR(1)
    E' → . E, $
    E → . E + T, $/+
    E → . T, $/+
CH4p3.75
Another Way to View Process …
Closure[E' → . E] begins with placing:
    E' → . E, $
into the item set…
Since E → E + T, we place:
    E → . E + T, $
into the item set, carrying along lookahead $ from E' → . E, $
Now, for E → . E + T, what can "E" on the right hand side be replaced with? E → E + T again!
If we do this replacement, we need to ask what is the lookahead that follows E on the r.h.s. in E → E + T?
    We calculate First(+T), the remainder of the rule
    This is "+", so we add in this additional lookahead:
        E' → . E, $
        E → . E + T, $
        E → . E + T, +
We abbreviate this as …
        E' → . E, $
        E → . E + T, $/+
CH4p3.76
Continuing …
Since E → T, we add: E → . T, $/+ into the set
Now, what does T go to? T → T * F and T → F
So we add:
    T → . T * F, $/+ and T → . F, $/+ into the set
What can T go to? T → T * F
What is the First token following T? First(*F) = *
So, add in * to get: T → . T * F, $/+/*
Since T → F, we also add "*" to yield: T → . F, $/+/*
Are we done?
CH4p3.77
Continuing …
Since T → . F, we now consider the two F rules: F → ( E ) and F → Id
We add in the items:
    F → . ( E ), $/+/*
    F → . Id, $/+/*
bringing along the lookaheads from T → . F, $/+/*
The lookaheads in this case are: First(what follows F concatenated with $/+/*)
Nothing follows F in T → . F, so this is $/+/*!
We arrive at item set I0:

LR(1)
    E' → . E, $
    E → . E + T, $/+
    E → . T, $/+
    T → . T * F, $/+/*
    T → . F, $/+/*
    F → . ( E ), $/+/*
    F → . Id, $/+/*
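The closure steps walked through on the last few slides can be sketched in code. This is a minimal sketch, assuming the grammar has no empty productions (true for this expression grammar), which lets First be computed one symbol at a time. The names `GRAMMAR`, `first_of_symbol`, `closure`, and `summarize` are hypothetical.

```python
GRAMMAR = {
    "E'": [["E"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["Id"]],
}


def first_of_symbol(sym, seen=frozenset()):
    """First set of one symbol; valid only because no production is empty."""
    if sym not in GRAMMAR:
        return {sym}          # a terminal starts with itself
    if sym in seen:
        return set()          # left recursion: already being expanded
    result = set()
    for body in GRAMMAR[sym]:
        result |= first_of_symbol(body[0], seen | {sym})
    return result


def closure(items):
    """LR(1) closure; each item is (head, body, dot position, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue          # dot at the end or before a terminal
        rest = body[dot + 1:]
        # Lookaheads are First(rest followed by la); rest is never nullable here.
        lookaheads = first_of_symbol(rest[0]) if rest else {la}
        for prod in GRAMMAR[body[dot]]:
            for b in lookaheads:
                item = (body[dot], tuple(prod), 0, b)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items


def summarize(items):
    """Group lookaheads per core item, like the $/+/* shorthand."""
    merged = {}
    for head, body, dot, la in items:
        merged.setdefault((head, body, dot), set()).add(la)
    return merged


I0 = summarize(closure({("E'", ("E",), 0, "$")}))
for (head, body, dot), las in sorted(I0.items()):
    rhs = " ".join(body[:dot] + (".",) + body[dot:])
    print(f"{head} -> {rhs}, {'/'.join(sorted(las))}")
```

The grouped result should contain the same seven items as I0 above, with $ for E', $/+ for the E items, and $/+/* for the T and F items.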
CH4p3.78
Another Example … LR(0) Sets
Grammar:
    S' → S
    S → CC
    C → cC | d
Follow(S') = {$}    Follow(S) = {$}    Follow(C) = {c, d, $}

I0: S' → .S    S → .CC    C → .cC    C → .d
I1: S' → S.
I2: S → C.C    C → .cC    C → .d
I3: C → c.C    C → .cC    C → .d
I4: C → d.
I5: S → CC.
I6: C → cC.
[Figure: state machine over I0 to I6, with arcs labeled S, C, c, d]
CH4p3.79
Now Consider … LR(1) Sets
Follow(S') = {$}    Follow(S) = {$}    Follow(C) = {c, d, $}

I0: S' → .S, $    S → .CC, $    C → .cC, c/d    C → .d, c/d
I1: S' → S., $
I2: S → C.C, $    C → .cC, $    C → .d, $
I3: C → c.C, c/d    C → .cC, c/d    C → .d, c/d
I4: C → d., c/d
I5: S → CC., $
I6: C → c.C, $    C → .cC, $    C → .d, $
I7: C → d., $
I8: C → cC., c/d
I9: C → cC., $
[Figure: state machine over I0 to I9, with arcs labeled S, C, c, d]
CH4p3.80
Parsing Table
Easy to Construct from the State Machine …
    Shifts on terminals (arcs)
    Reductions based on lookaheads
    Gotos as with SLR case

State   c     d     $     S     C
0       s3    s4          1     2
1                   acc
2       s6    s7                5
3       s3    s4                8
4       r3    r3
5                   r1
6       s6    s7                9
7                   r3
8       r2    r2
9                   r2

Rules: 1  S → CC    2  C → cC    3  C → d
CH4p3.81
What's the Real Problem Here?
Grammar we used had 3 Production Rules
    Result was 10 LR(1) states!
    For Expression Grammar (slide 58), LR(1) would have 22 states!
Lookahead LR Parsing (LALR), on which Compiler Tools (Yacc, Bison) are Based, Achieves Similar Results with Fewer States
    Objective is to Create LR(1) Sets
    Identify Sets with Similar Cores (Items are the same but lookaheads may be different)
    Merge Sets with Similar Cores
    Factor of 10 in Reduction of States
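The create, identify, and merge steps can be sketched for the three-rule grammar S → CC, C → cC | d. This is a minimal sketch, assuming no empty productions and no left recursion (both hold for this grammar); `first`, `closure`, `goto`, `lr1_states`, and `core` are hypothetical names, and real generators typically avoid building the full LR(1) collection first.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}


def first(sym):
    """First set of one symbol (no empty productions, no left recursion)."""
    if sym not in GRAMMAR:
        return {sym}
    return set().union(*(first(body[0]) for body in GRAMMAR[sym]))


def closure(items):
    """LR(1) closure over items (head, body, dot, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue
        rest = body[dot + 1:]
        lookaheads = first(rest[0]) if rest else {la}
        for prod in GRAMMAR[body[dot]]:
            for b in lookaheads:
                item = (body[dot], prod, 0, b)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)


def goto(state, sym):
    """Advance the dot over sym in every item that allows it, then close."""
    moved = {(h, b, d + 1, la) for h, b, d, la in state
             if d < len(b) and b[d] == sym}
    return closure(moved)


def lr1_states():
    """Canonical collection of LR(1) item sets."""
    start = closure({("S'", ("S",), 0, "$")})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for sym in {b[d] for _, b, d, _ in state if d < len(b)}:
            nxt = goto(state, sym)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states


def core(state):
    """The items of a state with lookaheads dropped."""
    return frozenset((h, b, d) for h, b, d, _ in state)


states = lr1_states()
cores = {core(s) for s in states}
print(len(states), "LR(1) states,", len(cores), "distinct cores")
```

For this grammar the sketch finds 10 LR(1) states and 7 distinct cores, so merging by core leaves 7 LALR states.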
CH4p3.82
What are the Similar Cores?

I0: S' → .S, $    S → .CC, $    C → .cC, c/d    C → .d, c/d
I1: S' → S., $
I2: S → C.C, $    C → .cC, $    C → .d, $
I3: C → c.C, c/d    C → .cC, c/d    C → .d, c/d
I4: C → d., c/d
I5: S → CC., $
I6: C → c.C, $    C → .cC, $    C → .d, $
I7: C → d., $
I8: C → cC., c/d
I9: C → cC., $
[Figure: state machine over I0 to I9, with arcs labeled S, C, c, d]
CH4p3.83
Resulting State Machine …

I0: S' → .S, $    S → .CC, $    C → .cC, c/d    C → .d, c/d
I1: S' → S., $
I2: S → C.C, $    C → .cC, $    C → .d, $
I36: C → c.C, c/d/$    C → .cC, c/d/$    C → .d, c/d/$
I47: C → d., $/c/d
I5: S → CC., $
I89: C → cC., $/c/d
[Figure: state machine over I0, I1, I2, I36, I47, I5, I89, with arcs labeled S, C, c, d]
CH4p3.84
… With Simplified Parsing Table

State   c     d     $     S     C
0       s36   s47         1     2
1                   acc
2       s36   s47               5
36      s36   s47               89
47      r3    r3    r3
5                   r1
89      r2    r2    r2
CH4p3.85
Parser Generators
The entire process we describe can be automated
    Computation of the machine states
    Computation of the lookaheads
    Computation of the action and goto tables
    Optimization of the LALR tables.
Therefore...
    Tools exist to do this for you!
CH4p3.86
Parser Generators II
In the C/C++ world
    Most famous parser generator: YACC, LALR(1)
    Most used parser generator: BISON, LALR(1)
    Table-driven leftmost: PCCTS, LL(k)
In the Java world
    Several alternatives
        CUP (a BISON/YACC lookalike), LALR(1)
        JACK, LALR(1)
CH4p3.87
Big PictureBig Picture
CH4p3.88
The Road Ahead
What are we missing?
    A parse tree!
How can we get one?
    By augmenting the grammar!
    With actions [pieces of Java code]
Purpose of actions
    Manufacture the tree as a side-effect of parsing.
Reading
    Syntax directed translation via Attribute Grammars
    Yacc
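The idea of actions that manufacture the tree during parsing can be sketched with the simplified LALR table for S → CC, C → cC | d. This is a minimal sketch in Python rather than the Java the slides mention; the value stack, the tuple-based tree nodes, and the names `RULES`, `ACTION`, `GOTO`, and `parse` are assumptions for illustration, with rules numbered 1: S → CC, 2: C → cC, 3: C → d.

```python
# Each rule: (head, body length, action run at reduce time).
# The action receives the values of the body symbols and returns a tree node.
RULES = {
    1: ("S", 2, lambda c1, c2: ("S", c1, c2)),
    2: ("C", 2, lambda _c, sub: ("C", "c", sub)),
    3: ("C", 1, lambda _d: ("C", "d")),
}

ACTION = {
    (0, "c"): ("s", 36), (0, "d"): ("s", 47),
    (1, "$"): ("acc", None),
    (2, "c"): ("s", 36), (2, "d"): ("s", 47),
    (36, "c"): ("s", 36), (36, "d"): ("s", 47),
    (47, "c"): ("r", 3), (47, "d"): ("r", 3), (47, "$"): ("r", 3),
    (5, "$"): ("r", 1),
    (89, "c"): ("r", 2), (89, "d"): ("r", 2), (89, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (36, "C"): 89}


def parse(tokens):
    """Parse and return the tree built by the reduce-time actions."""
    states, values = [0], []
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        act = ACTION.get((states[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        kind, num = act
        if kind == "acc":
            return values[-1]
        if kind == "s":
            states.append(num)
            values.append(tokens[pos])  # a shifted token is its own value
            pos += 1
        else:
            head, length, action = RULES[num]
            args = values[-length:]
            del values[-length:]
            del states[-length:]
            values.append(action(*args))  # tree node built as a side effect
            states.append(GOTO[(states[-1], head)])


print(parse("cdd"))
```

The value stack runs in parallel with the state stack, so each reduction sees exactly the subtrees of its body symbols.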