CH4p3.1 CSE 4100 Chapter 4 - Part 3: Bottom-Up Parsing Prof. Steven A. Demurjian Computer Science &...

CH4p3.1

CSE4100

Chapter 4 - Part 3: Bottom-Up Parsing

Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT
[email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818

Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre

CH4p3.2

CSE4100

Basic Intuition

Recall that LL(k) works TOP-DOWN
  With a LEFTMOST Derivation
  Predicts the right production to select based on lookahead

Our new motto: LR(k) works BOTTOM-UP
  With a RIGHTMOST Derivation
  Commits to the production choice after seeing the whole body (right hand side), working in “reverse”

CH4p3.3

CSE4100

Bottom-Up Parsing

Inverse or Complement of Top-Down Parsing
  Top-Down Parsing Utilizes “Start Symbol” and Attempts to Derive the Input String using Productions
  Bottom-Up Parsing Makes Modifications to the Input String which Allow it to Reduce to the Start Symbol

For Example, Consider Grammar & Derivations:
  S → a A B e
  A → A b c | b
  B → d

What Does Each Derivation Represent?
  Top-Down ---- Leftmost Derivation
    S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde
  Bottom-Up ---- Rightmost Derivation in Reverse!
    abbcde ⇐ aAbcde ⇐ aAde ⇐ aABe ⇐ S   (read left to right as a sequence of reductions)

CH4p3.4

CSE4100

Type of Derivation

Grammar:
  S → a A B e
  A → A b c | b
  B → d

TDP: S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde
BUP: S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
  Is a rightmost derivation that happens in reverse!

Key Issues:
  How do we Determine which Substring to “Reduce”?
  How do we Know which Production Rule to Use?
  What is the General Processing for BUP?
  How are Conflicts Resolved?
  What Types of BUP are Considered?

CH4p3.5

CSE4100

What is a Handle?

Defn: A Right-Sentential Form is a Sentential Form that has Been Derived in a Rightmost Derivation
  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
  (every form in this derivation is a Right-Sentential Form)

A Handle is a Substring of a Right-Sentential Form that:
  Appears on the Right Hand Side of a Production Rule
  Can be Used to Reduce the Right-Sentential Form via a Substitution in a Step of a RM Derivation
  Formally: a rule A → β and a position in Right-Sentential Form γ s.t. S ⇒*rm αAw ⇒rm αβw = γ, where β occurs at that position (immediately after α)

Example: the Handle of each Right-Sentential Form in S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
  aABe:   handle aABe (S → aABe)
  aAde:   handle d (B → d)
  aAbcde: handle Abc (A → Abc)
  abbcde: handle b at Position 2 (A → b)
  Abc is the Right Hand Side of Rule A → Abc at Position 2 in Right-Sentential Form γ = aAbcde

CH4p3.6

CSE4100

What is a Handle?

Consider again...
  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

  S → aABe
  A → Abc | b
  B → d

CH4p3.7

CSE4100

Handle Pruning

What bottom-up really means...
  abbcde reduces to aAbcde   (handle b, rule A → b)

CH4p3.8

CSE4100

Handle Pruning

  aAbcde reduces to aAde   (handle Abc, rule A → Abc)

CH4p3.9

CSE4100


  aAde reduces to aABe   (handle d, rule B → d)

CH4p3.10

CSE4100


  aABe reduces to S   (handle aABe, rule S → aABe)
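The four pruning steps above can be replayed mechanically. The sketch below is only an illustration: the handle positions are hand-picked for the input abbcde (a real parser has to discover them), positions are 0-based rather than the 1-based "Position 2" used earlier, and the names `HANDLES` and `prune_handles` are made up for this example.

```python
# Grammar from the slides: S -> aABe, A -> Abc | b, B -> d
# Each step is (position, body, head): the handle's 0-based start index,
# the production body found there, and the head that replaces it.
# The positions are hand-picked for this input (an assumption of the sketch).
HANDLES = [
    (1, "b", "A"),
    (1, "Abc", "A"),
    (2, "d", "B"),
    (0, "aABe", "S"),
]

def prune_handles(sentence, handles):
    """Return every right-sentential form visited while reducing `sentence`."""
    forms = [sentence]
    current = sentence
    for pos, body, head in handles:
        # The handle must literally appear at the stated position.
        assert current[pos:pos + len(body)] == body, (current, pos, body)
        current = current[:pos] + head + current[pos + len(body):]
        forms.append(current)
    return forms

forms = prune_handles("abbcde", HANDLES)
print(" <= ".join(forms))
```

Reading the printed forms from right to left gives the rightmost derivation S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde.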

CH4p3.11

CSE4100

What’s Going on in Parse Tree?

Consider Right Sentential Form αβw and Rule A → β

[Parse tree: root S; frontier α A w, with A expanding to β]

What Does α Signify? Input Processed, Still on Parsing Stack
What Does β Represent? Candidate Handle to be Reduced
What Does w Contain? Input yet to be Consumed

CH4p3.12

CSE4100

Bottom-Up Parsing …

Recognize the body of the last production applied in the rightmost derivation
Replace the symbol sequence of that body by the left hand side (head) of the Production Rule, Based on “Current” Input
Repeat
At the end, either
  We are left with the start symbol: Success!
  Or we get “stuck” somewhere: Syntax error!

Key Issue: If there are Multiple Handles for the “Same” Sentential Form, then the Grammar G is Ambiguous

CH4p3.13

CSE4100

General Processing of BUP

Basic mechanisms
  “Shift”
  “Reduce”
Basic data-structure
  A stack of grammar symbols (Terminals and Non-Terminals)
Basic idea
  Shift input symbols onto the stack until the entire handle of the last rightmost reduction is on the stack
  When the body of the last RM reduction is on the Stack, reduce it by replacing the body with the left hand side of the Production Rule
  When only the start symbol is left, we are done.

CH4p3.14

CSE4100

Example

Stack     Input      Action    Handle    Rule to Reduce with
$         abbcde$    Shift
$a        bbcde$     Shift
$ab       bcde$      Reduce    b         A → b
$aA       bcde$      Shift
$aAb      cde$       Shift
$aAbc     de$        Reduce    Abc       A → Abc
$aA       de$        Shift
$aAd      e$         Reduce    d         B → d
$aAB      e$         Shift
$aABe     $          Reduce    aABe      S → aABe
$S        $          Accept
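The trace above can be checked by replaying it against a stack. This is a minimal sketch that assumes the shift/reduce decisions are already known (they are copied from the example); it does not decide when to shift or reduce, which is the job of the parsing tables introduced later. The names `SCRIPT` and `run` are illustrative only.

```python
# Scripted actions for input "abbcde" with S -> aABe, A -> Abc | b, B -> d.
# ("shift",) moves one input symbol; ("reduce", head, body) pops the body.
SCRIPT = [
    ("shift",), ("shift",), ("reduce", "A", "b"),
    ("shift",), ("shift",), ("reduce", "A", "Abc"),
    ("shift",), ("reduce", "B", "d"),
    ("shift",), ("reduce", "S", "aABe"),
]

def run(script, text, start="S"):
    stack, rest = ["$"], list(text) + ["$"]
    trace = []
    for step in script:
        trace.append(("".join(stack), "".join(rest), step[0]))
        if step[0] == "shift":
            stack.append(rest.pop(0))
        else:
            _, head, body = step
            # The handle must sit on top of the stack before we reduce.
            assert stack[-len(body):] == list(body)
            del stack[-len(body):]
            stack.append(head)
    # Accept only in configuration ($S, $).
    accepted = stack == ["$", start] and rest == ["$"]
    trace.append(("".join(stack), "".join(rest), "accept" if accepted else "error"))
    return accepted, trace

accepted, trace = run(SCRIPT, "abbcde")
for stack, rest, action in trace:
    print(f"{stack:<8}{rest:<9}{action}")
```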

CH4p3.19

CSE4100

Key Observation

At any point in time
  Content of the stack is a prefix of a right-sentential form
  This prefix is called a viable prefix
Check again!
  Below = all the right-sentential forms of a rightmost derivation
  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

Stack contents during the parse:
  $
  $a
  $ab
  $aA
  $aAb
  $aAbc
  $aA
  $aAd
  $aAB
  $aABe
  $S

CH4p3.20

CSE4100

What is General Processing for BUP?

Utilize a Stack Implementation:
  Stack Contains Grammar Symbols (Terminals and Non-Terminals)
  Input is Examined w.r.t. Stack/Current State
General Operation: Options to Process Stack Include:
  Shift Symbols from Input onto Stack
  When Handle β is on Top of Stack, Reduce by using Rule A → β:
    Pop all Symbols of Handle β
    Push Non-Terminal A onto Stack
  When Configuration is ($S, $), ACCEPT
  Error Occurs when Handle Can’t be Found or S is on Stack with Non-Empty Input

CH4p3.21

CSE4100

Consider the Example Below

CH4p3.22

CSE4100

What are Possible Grammar Conflicts?

Shift-Reduce (S/R) Conflict:
  Content of Stack and Reading Current Input
  More than One Option of What to do Next

  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other

Consider Stack as below with input of token else
  $ …. if expr then stmt
  Do we Reduce if expr then stmt to stmt?
  Do we Shift “else” onto Stack?

CH4p3.23

CSE4100

What are Possible Grammar Conflicts?

Reduce-Reduce (R/R) Conflict:
  stmt → id ( parameter_list )
  parameter_list → parameter_list , parameter
  parameter → id
  expr → id ( expression_list ) | id
  expression_list → expression_list , expr | expr

Consider Stack as below with input of token
  $ …. id ( id , … , id ) ….
  Do we Reduce to stmt?
  Do we Reduce to expr?

CH4p3.24

CSE4100

Bottom-Up Parsing Techniques

LR(k) Parsers
  Left to Right Input Scanning (L)
  Construct a Rightmost Derivation in Reverse (R)
  Use k Lookahead Symbols for Decisions
Advantages
  Well Suited to Almost All PLs
  Most General Approach/Efficiently Implemented
  Detects Syntax Errors Very Quickly
Disadvantages
  Difficult to Build by Hand
  Tools to Assist Parser Construction (Yacc, Bison)

CH4p3.25

CSE4100

Components of an LR Parser

[Figure: Grammar → Table Generator → Parsing Table; Input Tokens → Driver Routines (which consult the Parsing Table) → Output Parse Tree]
  Parsing Table: Differs Based on Grammar/Lookaheads
  Driver Routines: Common to all LR Parsers

CH4p3.26

CSE4100

Three Classes of LR Parsers

Simple LR (SLR) or LR(0)
  Easiest but Limited in Grammar Applicability
  Many Grammars Lead to S/R and R/R Conflicts
Canonical LR
  Powerful but Expensive
  LR(k) – Usually LR(1)
Lookahead LR (LALR) – In Between the Two
Two Fold Focus:
  Parser Table Construction – Item and Item Sets
  Examination of LR Parsing Algorithm

CH4p3.27

CSE4100

LR Parser Structure

[Figure: the LR Parsing Program reads the INPUT a1 ... ai ai+1 ... an $, keeps a stack holding (top to bottom) sm Xm sm-1 Xm-1 … X1 s0, consults the action and goto tables, and produces the OUTPUT]

action[sm , ai ] is Parsing Table with Four Options
  1. Shift: push ai and state s onto Stack
  2. Reduce by Rule
  3. Accept ($,$)
  4. Report an Error
goto[s , X ] determines the next state for an action: the state pushed on a shift (X a terminal) or after a reduction (X a non-terminal)

Question: What does following Represent?
  (s0 X1 s1 X2 ... Xm-1 sm-1 Xm sm , ai ai+1 ... an $)
  Each si is a state; each Xi is a Grammar symbol (Terminal or non-terminal)
  It stands for the right-sentential form X1 X2 ... Xm-1 Xm ai ai+1 ... an

CH4p3.28

CSE4100

What is the Parsing Table?

Combination of State, Action, and Goto
  Shift s5 means shift input symbol and state 5
  Reduce r2 means reduce using rule 2
  goto state/NT indicates the next state

CH4p3.29

CSE4100

Actions Against Configuration

Configuration: (s0 X1 s1 X2 ... Xm-1 sm-1 Xm sm , ai ai+1 ... an $)

action[sm , ai ] =
  1. Shift s in Parsing Table – Move ai and state s onto the Stack:
     (s0 X1 s1 X2 ... Xm-1 sm-1 Xm sm ai s , ai+1 ... an $)
  2. Reduce A → β – Remove 2×|β| symbols from the stack, exposing state sm-r (r = |β|), and Push A along with state s = goto[sm-r , A] onto the stack
     Uses the Prior State exposed after popping to determine goto
  3. Accept – Parsing Complete
  4. Error – Call recovery Routine
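The four actions can be sketched as a small table-driven loop. The tables below are an assumption of this example: they are the classic SLR tables for the expression grammar E → E + T | T, T → T * F | F, F → ( E ) | id, with rules numbered r1 to r6 in that order and the augmented rule left unnumbered. The names `RULES`, `ACTION`, `GOTO`, and `lr_parse` are illustrative, not part of the course material.

```python
# Rule number -> (head, length of body); numbering r1..r6 is an assumption.
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the rule numbers used, in order, or raise SyntaxError."""
    stack = [0]                      # s0 X1 s1 ... Xm sm
    tokens = list(tokens) + ["$"]
    pos, reductions = 0, []
    while True:
        state, a = stack[-1], tokens[pos]
        act = ACTION[state].get(a)
        if act is None:              # 4. error
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if act == "acc":             # 3. accept
            return reductions
        if act[0] == "s":            # 1. shift: push a_i and the new state
            stack += [a, int(act[1:])]
            pos += 1
        else:                        # 2. reduce A -> beta: pop 2*|beta| entries
            head, size = RULES[int(act[1:])]
            del stack[-2 * size:]
            # stack[-1] is now the exposed state used for the goto lookup
            stack += [head, GOTO[stack[-1]][head]]
            reductions.append(int(act[1:]))

print(lr_parse("id + id * id".split()))
```

The returned list is the rightmost derivation in reverse: the last rule applied in the derivation is the first one reduced.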

CH4p3.30

CSE4100

How Does BUP Work?

[Table with columns Stack, Input, Action; contents not recovered]

CH4p3.31

CSE4100

Another Detailed Example

CH4p3.32

CSE4100

Constructing Parsing Tables

Three Types of Parsers (SLR, Canonical, LALR) all have a Shared Concept for Parsing Table Construction
An Item Characterizes for Each Grammar Rule
  What we’ve Seen or Derived
  What we’ve Yet to See or Derive
Consider the Grammar Rule: E → E + T
  There are Four Items for this Rule
    E → . E + T
    E → E . + T
    E → E + . T
    E → E + T .
  E → E . + T Means we’ve Derived E and have yet to Derive + T, so we are Expecting “+” Next
Note: A → ε has Item A → .

  [Has Been Seen/Derived] . [To Be Seen/Derived]
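Enumerating the items of a rule is just a matter of placing the dot at every position of the body. A minimal sketch follows; the function name `items_of` and the string rendering are choices made for this example.

```python
def items_of(head, body):
    """All LR(0) items of head -> body, with the dot at each position."""
    items = []
    for dot in range(len(body) + 1):
        symbols = list(body[:dot]) + ["."] + list(body[dot:])
        items.append(f"{head} -> {' '.join(symbols)}")
    return items

plus_items = items_of("E", ("E", "+", "T"))
epsilon_items = items_of("A", ())     # A -> epsilon has the single item A -> .
print("\n".join(plus_items))
```

A rule with a body of length n always has n + 1 items, which is why the ε-rule has exactly one.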

CH4p3.33

CSE4100

Another Characterization of Items

Consider the Grammar Rule: E → E + T
  There are Four Items for this Rule
    E → . E + T
    E → E . + T
    E → E + . T
    E → E + T .
This Represents a Summary of the History of the Parse
Each Item Refers to:
  What’s Been Placed on Stack (Left of “.”)
  What Remains to Reduce for a Rule (Right of “.”)

E → E + . T
  Left of “.” (on stack): Seen a string derived from E +; Found input through the “+”
  Right of “.” (left to derive/reduce): Looking for String Derivable from T; Yet to process input for T

CH4p3.34

CSE4100

Start with SLR Parsing Table Construction

Step 1: Construct an Augmented Grammar which has a Single Alternative/Production Rule for the Start Symbol:

  Original          Augmented
                    E’ → E $
  E → E + T         E → E + T
  E → T             E → T
  T → T * F         T → T * F
  T → F             T → F
  F → ( E )         F → ( E )
  F → id            F → id

Now, Every Derivation Starts with the Production Rule: E’ → E $

CH4p3.35

CSE4100

Start with SLR Parsing Table Construction

Step 2: Construct the Closure of All Items
  Intuitively, if A → α . B β is in Closure, we would Expect to see B β at Some Point in Derivation
  If B → γ is a Production Rule, Expect to see a Substring Derivable from γ in Future
Step 3: Compute the GOTO (Item_Set, X), where X is a Grammar Symbol
  Intuitively, Identifies Which Items are Valid for Viable Prefix γ
  Utilized to Determine Next Action (State) for the Parser
  Note: Different from goto as Previously Discussed!

CH4p3.36

CSE4100

Calculating Closure

Closure ([I]) where I is Set of Items
  1. All Items in I are in Closure ([I])
  2. If A → α . B β is in Closure ([I]) and B → γ is a Production Rule, then Add B → . γ to Closure ([I])
  3. Repeat Step 2 Until there are No New Items Added

Grammar:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id

I0 = Closure ([E’ → . E $]) --- Add in Following Items
  E’ → . E $      - Rule 1
    Any Rules E → γ? Yes…
  E → . E + T     - Rule 2
  E → . T         - Rule 3
    Any Rules T → γ? Yes…
  T → . T * F     - Rule 4
  T → . F         - Rule 5
    Any Rules F → γ? Yes…
  F → . ( E )     - Rule 6
  F → . id        - Rule 7

CH4p3.37

CSE4100

What’s Next Step?

Recall the Parsing Table
  States are 0, 1, 2, … 11 which Correspond to Item Sets
  actions based on Input and Current State
  goto is What State to Transition to Next
  This is a Pushdown Automaton!
What are Three Critical Functions to Calculate?
  State closure
    To compute the set of productions in a given state
  Transition function
    To compute the states reachable from a given state
  Items
    To compute the set of states in the PDA

CH4p3.38

CSE4100

What is Important Part of Process?

Viable Prefix Definition
  (1) a string that equals a prefix of a right-sentential form up to (and including) its unique handle.
  (2) any prefix of a string that satisfies (1)
  Essentially a prefix of a right-sentential form
  May be inclusive of entire handle (right hand side of a production rule)
Examples of Viable Prefixes are:
  a, aA, aAd, aAbc, ab, aAb,…
Not viable prefixes: aAde, Abc, aAA,…

CH4p3.39

CSE4100

What is The Big Deal ?

Consider the stack again
  $, $a, $ab, $aA, $aAb, $aAbc, $aA, $aAd, $aAB, $aABe, $S
Each Content of the Stack is a Prefix of a right-sentential form
They are all Viable Prefixes
When Parsing, two Alternatives. We are either:
  lengthening a viable prefix
  pruning a handle
In other words...
  States represent viable prefixes
  We transition between viable prefixes!

CH4p3.40

CSE4100

Intuition for this Process

Objective
  Turn a Grammar into a PDA
We want
  A PDA
  With states that capture viable prefixes
We have
  A grammar
  With production rules
We know that
  Production rules are used to derive handles
  Viable prefixes are (string) prefixes of right-sentential forms that end no later than the handle

CH4p3.41

CSE4100

Example

Consider augmented grammar given below….
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id
Assume that
  We start the parsing (with E’) and therefore
  We are at the initial state of the PDA
  We have some input: (e.g., id + id * id)
Questions
  Which productions are activated at this point ?
  In other words, which productions could be used to match the rest of the input ?

CH4p3.42

CSE4100

Example II

Consider the Derivation Given Below…
  E’ ⇒ E $          by (1)
     ⇒ E + T $      by (2)
     ⇒ T + T $      by (3)
     ⇒ F + T $      by (5)
     ⇒ id + T $     by (7)
     ....

In Example, Production Rules: 1,2,3,5,7 are active and utilized to “lead” to the viable prefix “id”

  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id

CH4p3.43

CSE4100

State I0

PDA State: Closure([E’ → . E $])
A PDA State is...
  The set of productions that are active in the state
Question
  How do we compute that from G ?

  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id

Building the closure step by step:
  Start with:          E’ → . E $
  Add rules for E:     E → . E + T,  E → . T
  Add rules for T:     T → . T * F,  T → . F
  Add rules for F:     F → . ( E ),  F → . id

State I0:
  E’ → . E $
  E → . E + T
  E → . T
  T → . T * F
  T → . F
  F → . ( E )
  F → . id

CH4p3.44

CSE4100

PDA Transition

How can we leave state I0 ?
  On any symbol directly to the right of a “.” in I0: E, T, F, (, id
What does it mean to leave I0 ?
  Terminals – means that we’ve Consumed the terminal from the input stream
  Non-terminals – means that we have pushed onto the stack the non-terminal, input, and states that will allow for a future reduction

This defines the GOTO Function!

CH4p3.45

CSE4100

The GOTO Function

GOTO(I, X) is Defined for
  An item set I
  A grammar symbol (non-terminal or terminal) X
GOTO(I, X) = Closure({ [A → α X . β] where [A → α . X β] is in I })
Algorithmically:
  Look for Rules of Form: A → α . X β
  Identify the Grammar Symbols in I to Right of “.”
  Group all A → α . X β with Same “X” and Move the “.” past X to Form a New State
  Compute the Closure of the New State for All X
This leads to …

CH4p3.46

CSE4100

Destination states

From State I0:
  GOTO(I0, E) = State I1
    E’ → E . $
    E → E . + T
  GOTO(I0, T) = State I2
    E → T .
    T → T . * F
  GOTO(I0, F) = State I3
    T → F .
  GOTO(I0, ( ) = State I4
    F → ( . E )
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
  GOTO(I0, id ) = State I5
    F → id .

CH4p3.47

CSE4100

Destination states

For GOTO(I0, ( ) we compute Closure([F → ( . E )])
  Since E → E + T and E → T, include E → . E + T, E → . T
  Since T → T * F and T → F, include T → . T * F, T → . F
  Since F → ( E ) and F → id, include F → . ( E ), F → . id
Now, compute GOTO(I1, X ) for X = E, T, F, ( , id

State I0:
  E’ → . E $,  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id
State I4 = GOTO(I0, ( ):
  F → ( . E ),  E → . E + T,  E → . T,  T → . T * F,  T → . F,  F → . ( E ),  F → . id

CH4p3.48

CSE4100

What Does it Mean when “.” at End of Rule?

  GOTO(I0, T) = State I2:   E → T .    T → T . * F
  GOTO(I0, F) = State I3:   T → F .
  GOTO(I0, id ) = State I5: F → id .

For the Three States above, the “.” Occurs at the end of an Item
  E → T . and T → F . and F → id .
Each of these is a “Reduction” to Replace
  T by E on Stack
  F by T on Stack
  id by F on Stack

CH4p3.49

CSE4100

Represents the Possible Next Steps in a Derivation

Consider the Symbol Directly to the Right of “.”
  That is what we Expect to see Next in a Derivation
  For two Rules in State I0, we Expect to See “E”:
    E’ → . E $
    E → . E + T
Move “.” to Right to Consume “E” for Both Production Rules
  We’ve Seen “E”
  We expect to see What Follows “.” Next
Now, Compute: Closure([E’ → E . $, E → E . + T]) = State I1 = GOTO(I0, E)
    E’ → E . $
    E → E . + T
How is this Interpreted …

CH4p3.50

CSE4100

Continue Process to Yield …

The State Machine also Represents Viable Prefixes
Possible Combinations that appear on Parsing Stack

CH4p3.51

CSE4100

Viable Prefixes and Valid Items

Consider a Derivation:
  S’ ⇒*rm α A w ⇒rm α β1 β2 w
Let α β1 be a Viable Prefix
  A → β1 . β2 is a Valid Item for α β1 if the above derivation exists
When α β1 is on the Parsing Stack – Two Cases:
  If β2 ≠ ε Then we Don’t have Handle on Stack
  If β2 = ε Then Perhaps A → β1 is the Reduction
However, Reduction Choice may not be Limited to a Single Production Rule:
  There may be two or more Valid Items for the Same Viable Prefix!
  Shift/Reduce or Reduce/Reduce Conflicts Possible!

CH4p3.52

CSE4100

How Does this Relate to State Machine?

Consider the Viable Prefix E+T*
Each State in Machine Represents a Set of One or More Items
Specifically, for E+T*, we end up in State I7 if you Follow the Transitions of the State Machine

CH4p3.53

CSE4100

Consider the State

Item Set is:
  T → T * . F
  F → . ( E )
  F → . id
with three possible derivations:
  E’ ⇒ E ⇒ E + T ⇒ E + T * F
  E’ ⇒ E ⇒ E + T ⇒ E + T * F ⇒ E + T * ( E )
  E’ ⇒ E ⇒ E + T ⇒ E + T * F ⇒ E + T * id
Which do you Choose? Why?

CH4p3.54

CSE4100

End Result of Process?

Machine that Contains
  All Item Set States
  Transitions Between States on Terminals and Non-Terminals
What do we need this for?
  To Construct the Parsing Table!

CH4p3.55

CSE4100

What’s Next Step?

Constructing SLR Parsing table
  action[state,symbol]
  goto[state,symbol]
Easy Part of this Process:
  Determining “shift” actions
  Examine Machine for all terminal transitions
  These are “shifts” from one state to next
  Push both the terminal and state onto parsing stack
More Difficult Part of this Process:
  Reductions are Items with “.” at End of Item
  Two Questions
    What is the “input” that Determines Correct Reduction?
    What is the “state” to push onto Stack?

CH4p3.56

CSE4100

Recall First and Follow Calculations

Recall the Grammar:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id

First (E’) = First (E) = First (T) = { (, id }

Follow (E’) = { $ }
Follow (E) = First (+T) ∪ First ( ) ) ∪ First ($) = { +, ), $ }
Follow (T) = Follow (E) ∪ First (*F) = { +, ), $, * }
Follow (F) = Follow (T) = { +, ), $, * }
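These sets can be recomputed with a small fixed-point iteration. The sketch assumes the grammar has no ε-productions (true here), which lets FIRST look only at the first symbol of each body; it also treats "$" as an ordinary terminal and seeds Follow(E’) with "$" by convention. The names `first_sets` and `follow_sets` are illustrative.

```python
GRAMMAR = [  # (head, body); "$" is treated as an ordinary terminal
    ("E'", ("E", "$")),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first_sets(grammar):
    """FIRST for a grammar with no epsilon-productions (an assumption)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            x = body[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first

def follow_sets(grammar, first, start="E'"):
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            for i, x in enumerate(body):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(body):       # something follows x in the body
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                       # x ends the body: inherit Follow(head)
                    new = follow[head]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow

FIRST = first_sets(GRAMMAR)
FOLLOW = follow_sets(GRAMMAR, FIRST)
for n in ("E'", "E", "T", "F"):
    print(n, sorted(FIRST[n]), sorted(FOLLOW[n]))
```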

CH4p3.57

CSE4100

Return to Item Sets

Suppose an Item Set Contains the Item: A → α .
When we Reach this Item it is Time to Reduce and Replace α on the Stack with A
However, What is the “Input” under which this Reduction is Allowed to Occur?
  Want to Replace α with A
  Reading some current input x
  Only Do the Reduction if x in Follow (A)
Consider Two Reductions in the Same Item Set: A → α . and B → α . and current input x
  If x in Follow (A), reduce using A → α
  If x in Follow (B), reduce using B → α
  If x in both, Reduce/Reduce Conflict!
We’ll See Two Examples Shortly …

CH4p3.58

CSE4100

Back to Item Sets/State Machine

[Figure: state machine of item sets, annotated as follows]
  RED underlines are all shifts with associated gotos
  BLUE circles are all gotos for non-terminals
  GREEN underlines are all reductions
  Reductions are based on Follow

CH4p3.59

CSE4100

Action and goto tables

Action contains shifts, reductions, and accept
All other entries are errors
Goto contains the next state to shift onto the stack
In the combined table below, a shift entry sN pairs the action “S” with its goto state N, and rN reduces by rule N of the original grammar (r1: E → E + T, r2: E → T, r3: T → T * F, r4: T → F, r5: F → ( E ), r6: F → id), i.e. one less than the augmented numbering listed after the table

State |  id    +     *     (     )     $    |  E    T    F
0     |  s5                s4               |  1    2    3
1     |        s6                      acc  |
2     |        r2    s7          r2    r2   |
3     |        r4    r4          r4    r4   |
4     |  s5                s4               |  8    2    3
5     |        r6    r6          r6    r6   |
6     |  s5                s4               |       9    3
7     |  s5                s4               |            10
8     |        s6                s11        |
9     |        r1    s7          r1    r1   |
10    |        r3    r3          r3    r3   |
11    |        r5    r5          r5    r5   |

Augmented grammar:
  1: E’ → E $
  2: E → E + T
  3: E → T
  4: T → T * F
  5: T → F
  6: F → ( E )
  7: F → id

CH4p3.60

CSE4100

Formal Algorithms

To Calculate the Parsing Table, we Require Three Algorithms
  State closure
    To compute the set of productions in a given state
  Transition function
    To compute the states reachable from a given state
  Items
    To compute the set of states in the PDA
Algorithms from Prof. Michel …

CH4p3.61

CSE4100

State Closure Algorithm

function closure(set{Item} I) : set{Item}
{
    set{Item} J0 = I;
    i = 0;
    repeat
        Ji+1 = Ji;
        for each A → α.Bβ in Ji and each B → γ in P s.t. B → .γ not in Ji
            Ji+1 = Ji+1 ∪ { B → .γ }
        i = i + 1;
    until Ji = Ji-1;
    return Ji;
}
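A runnable version of the closure computation might look like the sketch below. It is not a transcription of the pseudocode: it uses a worklist instead of the repeat/until loop, and it represents an item as a hypothetical tuple (head, body, dot_position). The grammar is the augmented expression grammar used throughout.

```python
GRAMMAR = [
    ("E'", ("E", "$")),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]

def closure(items, grammar=GRAMMAR):
    """LR(0) closure; an item is (head, body, dot_position)."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot == len(body):
            continue                      # dot at the end: nothing expected
        expected = body[dot]              # symbol right after the dot
        for h, b in grammar:
            item = (h, b, 0)
            if h == expected and item not in result:
                result.add(item)
                work.append(item)
    return frozenset(result)

def show(item):
    head, body, dot = item
    return f"{head} -> {' '.join(body[:dot] + ('.',) + body[dot:])}"

I0 = closure({("E'", ("E", "$"), 0)})
for line in sorted(show(i) for i in I0):
    print(line)
```

Returning a `frozenset` makes a state hashable and comparable, which is convenient when states are later collected into a set or list.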

CH4p3.62

CSE4100

GOTO Function

function GOTO(set{Item} s, symbol X) : set{Item}
{
    set{Item} J = ∅;
    for each c in s
        if c of the form A → α.Xβ
            J = J ∪ { A → αX.β }
    return closure(J);
}
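The GOTO function translates almost line for line. The sketch below repeats a compact closure so that it runs on its own; items are again hypothetical (head, body, dot_position) tuples, and the state names I1, I4, I5 follow the destination states shown earlier.

```python
GRAMMAR = [
    ("E'", ("E", "$")),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        for head, rhs in GRAMMAR:
            item = (head, rhs, 0)
            if dot < len(body) and head == body[dot] and item not in result:
                result.add(item)
                work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Advance the dot over `symbol` in every item that expects it."""
    moved = {(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("E'", ("E", "$"), 0)})
I1 = goto(I0, "E")      # E' -> E . $   and   E -> E . + T
I4 = goto(I0, "(")      # F -> ( . E ) plus the closure items
I5 = goto(I0, "id")     # F -> id .
print(len(I1), len(I4), len(I5))
```

An empty result, as for `goto(I0, "+")`, means there is no transition on that symbol.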

CH4p3.63

CSE4100

All State Functions (set-of-items)

function items(Grammar G’) : set{State}
{
    set{State} C0 = { closure({S’ → .S}) };
    i = 0;
    repeat
        Ci+1 = Ci;
        for each S in Ci and each symbol X in G’
            Z = GOTO(S, X);
            if Z ≠ ∅ AND Z not in Ci
                then Ci+1 = Ci+1 ∪ { Z };
        i = i + 1;
    until Ci = Ci-1;
    return Ci;
}
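Putting closure and GOTO together gives the full collection of states. This sketch makes two assumptions worth flagging: the augmented rule is written E’ → E (as in the pseudocode’s S’ → .S, without the explicit $), and states are discovered in sorted-symbol order, so the indices need not match the numbering I0 to I11 used on the slides.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
SYMBOLS = sorted({s for _, body in GRAMMAR for s in body})

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        for head, rhs in GRAMMAR:
            item = (head, rhs, 0)
            if dot < len(body) and head == body[dot] and item not in result:
                result.add(item)
                work.append(item)
    return frozenset(result)

def goto(state, symbol):
    return closure({(h, b, d + 1) for h, b, d in state
                    if d < len(b) and b[d] == symbol})

def items():
    """Collect every state reachable from the initial item set."""
    states = [closure({("E'", ("E",), 0)})]
    for state in states:              # the list grows as new states appear
        for x in SYMBOLS:
            z = goto(state, x)
            if z and z not in states:
                states.append(z)
    return states

STATES = items()
print(len(STATES), "states")
```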

CH4p3.64

CSE4100

Using Ambiguous Grammars

Ambiguous Grammars will Cause Multiple Entries for a given state/terminal in Parsing Table
Results in Two Types of Conflicts
  Shift/Reduce Conflicts
  Reduce/Reduce Conflicts
Compiler Writing Tools (Yacc, Bison, etc.) Automatically Resolve these by:
  For Shift/Reduce – chooses Shift
  For Reduce/Reduce – Reduce by “earlier” rule
Consider Two Examples
  Dangling Else
  Simplified Expression Grammar

CH4p3.65

CSE4100

Dangling Else Ambiguity

Recall the Grammar:
  stmt → if expr then stmt else stmt
       | if expr then stmt
       | other
Rewrite the Grammar as:
  s → i s e s | i s | a
  Essentially “i” abbreviates “if expr then”, “e” abbreviates “else”, and “a” represents all other statements
Now Compute LR(0) Items and SLR Parsing Table

CH4p3.66

CSE4100

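The Item Sets for the Grammar

I0: s’ → . s
    s → . i s e s
    s → . i s
    s → . a
I1: s’ → s .
I2: s → i . s e s
    s → i . s
    s → . i s e s
    s → . i s
    s → . a
I3: s → a .
I4: s → i s . e s
    s → i s .
I5: s → i s e . s
    s → . i s e s
    s → . i s
    s → . a
I6: s → i s e s .

Transitions:
  I0 on s → I1, I0 on i → I2, I0 on a → I3
  I2 on s → I4, I2 on i → I2, I2 on a → I3
  I4 on e → I5
  I5 on s → I6, I5 on i → I2, I5 on a → I3

Follow(s’) = { $ }
Follow(s) = { $, e }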

CH4p3.67

CSE4100

The Parsing table

Rules:
  1: s → i s e s
  2: s → i s
  3: s → a

State |  i     e       a     $    |  s
0     |  s2            s3         |  1
1     |                      acc  |
2     |  s2            s3         |  4
3     |        r3            r3   |
4     |        s5/r2         r2   |
5     |  s2            s3         |  6
6     |        r1            r1   |

Follow(s’) = { $ }
Follow(s) = { $, e }

Notice s/r conflict for action[4,e] on: if <expr> then <stmt> else <stmt>
  If shift on else, what is the result w.r.t. language?
  If reduce on else, what is the result w.r.t. language?
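The conflict can also be found mechanically by building the SLR action entries and looking for cells with more than one entry. The sketch below is illustrative only: the Follow sets are hard-coded from the slide rather than computed, the augmented rule is written s’ → s, and the state numbers depend on the discovery order chosen here (which happens to reproduce I0 to I6).

```python
GRAMMAR = [("s'", ("s",)), ("s", ("i", "s", "e", "s")), ("s", ("i", "s")), ("s", ("a",))]
FOLLOW = {"s'": {"$"}, "s": {"$", "e"}}      # as given on the slide
TERMINALS = ("i", "e", "a")

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        for head, rhs in GRAMMAR:
            item = (head, rhs, 0)
            if dot < len(body) and head == body[dot] and item not in result:
                result.add(item)
                work.append(item)
    return frozenset(result)

def goto(state, x):
    return closure({(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == x})

def build_states():
    states = [closure({("s'", ("s",), 0)})]
    for state in states:                      # list grows while we iterate
        for x in ("s",) + TERMINALS:
            z = goto(state, x)
            if z and z not in states:
                states.append(z)
    return states

def slr_actions(states):
    """Map (state, terminal) to the set of candidate actions."""
    table = {}
    for n, state in enumerate(states):
        for head, body, dot in state:
            if dot < len(body):
                if body[dot] in TERMINALS:    # shift on the terminal after the dot
                    target = states.index(goto(state, body[dot]))
                    table.setdefault((n, body[dot]), set()).add(f"s{target}")
            elif head == "s'":
                table.setdefault((n, "$"), set()).add("acc")
            else:                             # reduce on every token in Follow(head)
                rule = GRAMMAR.index((head, body))
                for t in FOLLOW[head]:
                    table.setdefault((n, t), set()).add(f"r{rule}")
    return table

states = build_states()
table = slr_actions(states)
conflicts = {cell: acts for cell, acts in table.items() if len(acts) > 1}
print(conflicts)
```

Choosing the shift for that one cell, as Yacc-style tools do by default, removes the conflict without changing the grammar.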

CH4p3.68

CSE4100

Solution to Dangling Else

Pick Shift over Reduce: action[4, e] = s5
Consider input iiaea which is equivalent to:
  if <expr> then if <expr> then <stmt> else <stmt>
Parser as follows w.r.t. stack/input:

  Stack         Input    Action
  $ ….          ea$      shift e
  $ ….e         a$       shift a
  $ ….e...a     $        reduce using s → a
  $ ….e         $        reduce using s → i s e s
  $ ..i..       $        reduce using s → i s
  $             $        accept

Using this approach, we eliminate the need for a more complex unambiguous grammar with more rules

CH4p3.69

CSE4100

Example 2 – Simplified Expression Grammar

Consider the Grammar:
  E → E + E | E * E | ( E ) | id
What’s the Problem with this Grammar?
Why would this Grammar be Preferable?
  Employ Techniques Similar to Previous Example to Remove Multiple Table Entries
  Result is to Achieve both Associative and Precedence Behavior for + and *
  Change Assoc/Precedence by Changing Table
  No more Extra Work
  Improve Performance

CH4p3.70

CSE4100

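First, Calculate Item Sets

I0: E’ → . E
    E → . E + E
    E → . E * E
    E → . ( E )
    E → . id
I1: E’ → E .
    E → E . + E
    E → E . * E
I2: E → ( . E )
    E → . E + E
    E → . E * E
    E → . ( E )
    E → . id
I3: E → id .
I4: E → E + . E
    E → . E + E
    E → . E * E
    E → . ( E )
    E → . id
I5: E → E * . E
    E → . E + E
    E → . E * E
    E → . ( E )
    E → . id
I6: E → ( E . )
    E → E . + E
    E → E . * E
I7: E → E + E .
    E → E . + E
    E → E . * E
I8: E → E * E .
    E → E . + E
    E → E . * E
I9: E → ( E ) .

Transitions:
  I0 on E → I1, on ( → I2, on id → I3
  I1 on + → I4, on * → I5
  I2 on E → I6, on ( → I2, on id → I3
  I4 on E → I7, on ( → I2, on id → I3
  I5 on E → I8, on ( → I2, on id → I3
  I6 on ) → I9, on + → I4, on * → I5
  I7 on + → I4, on * → I5
  I8 on + → I4, on * → I5

Follow(E’) = { $ }
Follow(E) = { $, +, *, ) }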

CH4p3.71

CSE4100

Consider States I7 and I8

State I7
  E → E + E .
    action[7,+] = reduce by E → E + E
    action[7,*] = reduce by E → E + E
    action[7,)] = reduce by E → E + E
    action[7,$] = reduce by E → E + E
  E → E . + E
    action[7,+] = shift to state 4
  E → E . * E
    action[7,*] = shift to state 5
  Conflicts:
    action[7,+] = reduce by E → E + E or shift to state 4
    action[7,*] = reduce by E → E + E or shift to state 5
State I8
  The same two conflicts arise on + and *, with the reduction by E → E * E

How is Each Conflict Resolved?

CH4p3.72

CSE4100

Parsing Table:

State |  id    +     *     (     )     $    |  E
0     |  s3                s2               |  1
1     |        s4    s5                acc  |
2     |  s3                s2               |  6
3     |        r4    r4          r4    r4   |
4     |  s3                s2               |  7
5     |  s3                s2               |  8
6     |        s4    s5          s9         |
7     |        r1    s5          r1    r1   |
8     |        r2    r2          r2    r2   |
9     |        r3    r3          r3    r3   |

Rules (numbered to match the rN entries in the table):
  0: E’ → E
  1: E → E + E
  2: E → E * E
  3: E → ( E )
  4: E → id

State 7: “+” is left assoc, so reduce using rule 1 on “+”; Shift “*” onto stack since it has higher precedence
State 8: Reduce using rule 2 regardless of + or *
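The resolution of the four conflicting cells follows one simple rule: compare the precedence of the operator in the completed rule with the precedence of the lookahead, and fall back on associativity when they tie. The sketch below illustrates that idea only; the precedence numbers and the name `resolve` are assumptions of this example, not the internals of any particular tool.

```python
PRECEDENCE = {"+": 1, "*": 2}          # higher number binds tighter
LEFT_ASSOC = {"+", "*"}

def resolve(rule_operator, lookahead):
    """Choose between reducing E -> E op E and shifting `lookahead`."""
    if PRECEDENCE[lookahead] > PRECEDENCE[rule_operator]:
        return "shift"
    if PRECEDENCE[lookahead] < PRECEDENCE[rule_operator]:
        return "reduce"
    # Same precedence: left-associative operators reduce first.
    return "reduce" if rule_operator in LEFT_ASSOC else "shift"

decisions = {(op, la): resolve(op, la) for op in "+*" for la in "+*"}
for (op, la), choice in decisions.items():
    print(f"rule E -> E {op} E, lookahead {la}: {choice}")
```

Changing the entries of `PRECEDENCE` or `LEFT_ASSOC` changes the table cells without touching the grammar, which is the point made under “Change Assoc/Precedence by Changing Table”.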

CH4p3.73

CSE4100

Canonical Parser Table Construction

Not all Parser Tables are Created Equally!
Differentiate between SLR/LR(0), LR(1), and LALR(1) (Yacc/Bison)
Key Issue: Utilization of Lookaheads
  SLR – Current Input
  LR(1) – Current Input plus Next Token
  LR(k) – Current Input plus Next k Tokens

Consider id + id * id
  SLR/LR(0): Current Input
  LR(1):
    – id determines if shift or reduce
    – 2nd token (+) determines rule
    – if conflict, 2nd token can break tie
    – on-the-fly disambiguation
    – sometimes shift, sometimes reduce
    – depends on that 2nd token

CH4p3.74


Recall the Prior Grammar

Item set I0 as given below …
For LR(1) items, we must consider the basis on which the rule causes a shift on a lookahead terminal
When we put E’ → . E into the LR(1) set, we must also consider the first terminal that appears after E
This is the lookahead…

LR(0)
    E’ → . E
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id

Step 1: LR(1)
    E’ → . E, $
    E → . E + T, $
    E → . T, $
    What appears after E in the 2nd item?

Step 2: LR(1)
    E’ → . E, $
    E → . E + T, $/+
    E → . T, $
    If it appears after E, what else does it appear after?

Step 3: LR(1)
    E’ → . E, $
    E → . E + T, $/+
    E → . T, $/+

CH4p3.75


Another Way to View Process …

Closure[E’ → . E] begins with placing:
    E’ → . E, $
into the item set…
Since E → E + T, we place:
    E → . E + T, $
into the item set, carrying along lookahead $ from E’ → . E, $
Now, for E → . E + T, what can “E” on the right-hand side be replaced with?
    E → E + T again!
If we do this replacement, we need to ask: what is the lookahead that follows E on the r.h.s. in E → E + T?
    We calculate First(+T), the remainder of the rule
    This is “+”, so we add in this additional lookahead:
        E’ → . E, $
        E → . E + T, $
        E → . E + T, +
    We abbreviate this as …
        E’ → . E, $
        E → . E + T, $/+

CH4p3.76


Continuing …

Since E → T, we add:
    E → . T, $/+
into the set
Now, what does T go to?
    T → T * F and T → F
So we add:
    T → . T * F, $/+ and T → . F, $/+
into the set
What can T go to? T → T * F
What is the First token following T? First(*F) = *
So, add in * to get:
    T → . T * F, $/+/*
Since T → F, we also add “*” to yield:
    T → . F, $/+/*
Are we done?

CH4p3.77


Continuing …

Since T → . F, we now consider the two F rules:
    F → ( E ) and F → id
We add in the items:
    F → . ( E ), $/+/*
    F → . id, $/+/*
bringing along the lookaheads from T → . F, $/+/*
The lookaheads in this case are:
    First(what follows F concatenated with $/+/*)
This is $/+/*!
We arrive at item set I0:

LR(1)
    E’ → . E, $
    E → . E + T, $/+
    E → . T, $/+
    T → . T * F, $/+/*
    T → . F, $/+/*
    F → . ( E ), $/+/*
    F → . id, $/+/*
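The step-by-step construction of I0 can be checked with a short worklist computation. The following is a minimal sketch, assuming a grammar without epsilon productions (true for this expression grammar); the function names `first_sets` and `closure` and the tuple encoding of items are choices made for this illustration only.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def first_sets(grammar):
    """FIRST for every nonterminal; assumes no epsilon productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def closure(items, grammar):
    """LR(1) closure; an item is (lhs, body, dot, lookahead)."""
    first = first_sets(grammar)
    result = set(items)
    work = list(items)
    while work:
        lhs, body, dot, la = work.pop()
        if dot == len(body) or body[dot] not in grammar:
            continue                      # nothing to expand
        rest = body[dot + 1:]
        if rest:                          # lookahead = First(rest of rule)
            sym = rest[0]
            lookaheads = first[sym] if sym in grammar else {sym}
        else:                             # nothing follows: carry lookahead along
            lookaheads = {la}
        for prod in grammar[body[dot]]:
            for b in lookaheads:
                item = (body[dot], prod, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


i0 = closure({("E'", ("E",), 0, "$")}, GRAMMAR)

# Group lookaheads per core, matching the $/+/* abbreviation on the slides.
by_core = {}
for lhs, body, dot, la in i0:
    by_core.setdefault((lhs, body, dot), set()).add(la)

for (lhs, body, dot), las in sorted(by_core.items()):
    rhs = " ".join(body[:dot] + (".",) + body[dot:])
    print(f"{lhs} -> {rhs}, {'/'.join(sorted(las))}")
```

The grouped result has seven cores, with the same lookahead sets as the I0 derived by hand above.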

CH4p3.78


Another Example … LR(0) Sets

Grammar:
    S’ → S
    S → CC
    C → cC | d

Follow(S’) = {$}
Follow(S) = {$}
Follow(C) = {c, d, $}

I0: S’ → . S
    S → . CC
    C → . cC
    C → . d

I1: S’ → S .

I2: S → C . C
    C → . cC
    C → . d

I3: C → c . C
    C → . cC
    C → . d

I4: C → d .

I5: S → CC .

I6: C → cC .

[Figure: transition diagram connecting item sets I0 to I6; edges are labeled S, C, c and d.]

CH4p3.79


Now Consider … LR(1) Sets

Follow(S’) = {$}
Follow(S) = {$}
Follow(C) = {c, d, $}

I0: S’ → . S, $
    S → . CC, $
    C → . cC, c/d
    C → . d, c/d

I1: S’ → S ., $

I2: S → C . C, $
    C → . cC, $
    C → . d, $

I3: C → c . C, c/d
    C → . cC, c/d
    C → . d, c/d

I4: C → d ., c/d

I5: S → CC ., $

I6: C → c . C, $
    C → . cC, $
    C → . d, $

I7: C → d ., $

I8: C → cC ., c/d

I9: C → cC ., $

[Figure: transition diagram connecting item sets I0 to I9; edges are labeled S, C, c and d.]

CH4p3.80


Parsing Table

Easy to Construct from the State Machine …
    Shifts on terminals (arcs)
    Reductions based on lookaheads
    Gotos as with SLR case

        action            goto
State   c     d     $     S     C
0       s3    s4          1     2
1                   acc
2       s6    s7                5
3       s3    s4                8
4       r3    r3
5                   r1
6       s6    s7                9
7                   r3
8       r2    r2
9                   r2

Rules:
    1  S → CC
    2  C → cC
    3  C → d
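A table like this is all a shift-reduce driver needs. The sketch below transcribes the LR(1) table for S → CC, C → cC | d into Python dictionaries and runs the standard LR loop over it. The rule numbering (1: S → CC, 2: C → cC, 3: C → d) is inferred from the r1/r2/r3 entries, and the function name `parse` is an assumption of this example.

```python
# Rule number -> (left-hand side, length of the rule body).
RULES = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

ACTION = {
    (0, "c"): "s3", (0, "d"): "s4",
    (1, "$"): "acc",
    (2, "c"): "s6", (2, "d"): "s7",
    (3, "c"): "s3", (3, "d"): "s4",
    (4, "c"): "r3", (4, "d"): "r3",
    (5, "$"): "r1",
    (6, "c"): "s6", (6, "d"): "s7",
    (7, "$"): "r3",
    (8, "c"): "r2", (8, "d"): "r2",
    (9, "$"): "r2",
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}


def parse(tokens):
    """Return the rule numbers used, in reduction order, or None on error.

    The returned sequence is a rightmost derivation in reverse.
    """
    stack = [0]                         # stack of state numbers
    tokens = list(tokens) + ["$"]       # append the end marker
    pos = 0
    reductions = []
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:                 # empty table cell: syntax error
            return None
        if act == "acc":
            return reductions
        if act[0] == "s":               # shift: push state, consume token
            stack.append(int(act[1:]))
            pos += 1
        else:                           # reduce: pop the body, then goto
            rule = int(act[1:])
            lhs, length = RULES[rule]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(rule)


print(parse("cdd"))   # accepted: c d d is C C with C -> cC -> cd
print(parse("cd"))    # rejected: only one C
```

For the input cdd the driver visits states 3, 4, 8, 2, 7, 5 and 1 before accepting.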

CH4p3.81


What’s the Real Problem Here?

Grammar we used has 3 Production Rules
    Result was 10 LR(1) states!
    For Expression Grammar (slide 58), LR(1) would have 22 states!
Lookahead LR Parsing (LALR), on which Compiler Tools (Yacc, Bison) are Based, Achieves Similar Results with Fewer States
    Objective is to Create LR(1) Sets
    Identify Sets with Similar Cores (Items are the same but lookaheads may be different)
    Merge Sets with Similar Cores
    Factor of 10 in Reduction of States
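The three LALR steps (create the LR(1) sets, identify equal cores, merge) can be sketched for the small grammar S → CC, C → cC | d. This is a minimal illustration: FIRST sets are written out by hand because the grammar has no epsilon productions, and the names `closure`, `goto`, `lr1_states` and `core` are chosen for this example rather than taken from any tool.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}
# FIRST of each grammar symbol, written out by hand for this grammar.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}}


def closure(items):
    """LR(1) closure; an item is (lhs, body, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, body, dot, la = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue
        rest = body[dot + 1:]
        lookaheads = FIRST[rest[0]] if rest else {la}
        for prod in GRAMMAR[body[dot]]:
            for b in lookaheads:
                item = (body[dot], prod, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it."""
    moved = {(l, b, d + 1, la) for l, b, d, la in state
             if d < len(b) and b[d] == symbol}
    return closure(moved) if moved else None


def lr1_states():
    """Canonical collection of LR(1) item sets."""
    states = [closure({("S'", ("S",), 0, "$")})]
    for state in states:                # the list grows while we iterate
        for symbol in ("S", "C", "c", "d"):
            nxt = goto(state, symbol)
            if nxt is not None and nxt not in states:
                states.append(nxt)
    return states


def core(state):
    """The items of a state with their lookaheads dropped."""
    return frozenset((l, b, d) for l, b, d, _ in state)


states = lr1_states()
merged = {}
for state in states:                    # union the states that share a core
    merged.setdefault(core(state), set()).update(state)

print(len(states), "LR(1) states")
print(len(merged), "states after merging equal cores")
```

For this grammar the merge takes 10 states down to 7, combining the pairs that differ only in their lookaheads.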

CH4p3.82


What are the Similar Cores?

I1: S’ → S ., $

I2: S → C . C, $
    C → . cC, $
    C → . d, $

I3: C → c . C, c/d
    C → . cC, c/d
    C → . d, c/d

I4: C → d ., c/d

I5: S → CC ., $

I6: C → c . C, $
    C → . cC, $
    C → . d, $

I7: C → d ., $

I8: C → cC ., c/d

I9: C → cC ., $

[Figure: the LR(1) transition diagram again; edges are labeled S, C, c and d.]

CH4p3.83


Resulting State Machine …

I1: S’ → S ., $

I2: S → C . C, $
    C → . cC, $
    C → . d, $

I36: C → c . C, c/d/$
     C → . cC, c/d/$
     C → . d, c/d/$

I47: C → d ., $/c/d

I5: S → CC ., $

I89: C → cC ., $/c/d

[Figure: transition diagram over the merged states; edges are labeled S, C, c and d.]

CH4p3.84


… With Simplified Parsing Table

        action            goto
State   c     d     $     S     C
0       s36   s47         1     2
1                   acc
2       s36   s47               5
36      s36   s47               89
47      r3    r3    r3
5                   r1
89      r2    r2    r2

CH4p3.85


Parser Generators

The entire process we describe can be automated
    Computation of the machine states
    Computation of the lookaheads
    Computation of the action and goto tables
    Optimization of the LALR tables
Therefore...
    Tools exist to do this for you!

CH4p3.86


Parser Generators II

In the C/C++ world
    YACC – LALR(1): most famous parser generator
    BISON – LALR(1): most used parser generator
    PCCTS – LL(k): table-driven leftmost
In the Java world
    Several alternatives
        CUP (a BISON/YACC lookalike) – LALR(1)
        JACK (later renamed JavaCC) – LL(k)

CH4p3.87


Big Picture

CH4p3.88


The Road Ahead

What are we missing?
    A parse tree!
How can we get one?
    By augmenting the grammar!
    With actions [pieces of Java code]
Purpose of actions
    Manufacture the tree as a side-effect of parsing.
Reading
    Syntax directed translation via Attribute Grammars
    Yacc
