LR(k) Parsing CPSC 388 Ellen Walker Hiram College.

LR(k) Parsing

CPSC 388, Ellen Walker, Hiram College

Bottom Up Parsing

• Start with tokens
• Build up rule RHS (right side)
• Replace RHS by LHS
• Done when stack is only start symbol

• (Working from leaves of tree to root)

Operations in Bottom-up Parsing

• Shift
– Push the terminal from the beginning of the string to the top of the stack
• Reduce
– Replace the string xyz at the top of the stack by a nonterminal A (assuming A->xyz)

• Accept (when stack is $S’; empty input)

Sample Parse

• S’ -> S; S -> aSb | bSa | SS | ε
• String: abba

– Stack = $, input = abba$; shift
– Stack = $a, input = bba$; reduce S->ε
– Stack = $aS, input = bba$; shift
– Stack = $aSb, input = ba$; reduce S->aSb
– Stack = $S, input = ba$; shift

Sample Parse (cont)

– Stack = $S, input = ba$; shift
– Stack = $Sb, input = a$; reduce S->ε
– Stack = $SbS, input = a$; shift
– Stack = $SbSa, input = $; reduce S->bSa
– Stack = $SS, input = $; reduce S->SS
– Stack = $S, input = $; reduce S’->S
– Stack = $S’, input = $; accept
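The shift and reduce steps of this sample parse can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the `replay` function and the `ACTIONS` list are illustrative names, and the empty alternative of S is represented by the empty string.

```python
# Replay the shift/reduce sequence from the sample parse of "abba".
# Grammar: S' -> S ; S -> aSb | bSa | SS | (empty)
def replay(tokens, actions):
    stack, rest = ["$"], list(tokens) + ["$"]
    for act in actions:
        if act == "shift":
            stack.append(rest.pop(0))      # move the next terminal onto the stack
        else:
            lhs, rhs = act                 # reduce: replace the RHS on top by the LHS
            if rhs:
                assert stack[-len(rhs):] == list(rhs), (stack, act)
                del stack[-len(rhs):]
            stack.append(lhs)
    return "".join(stack), "".join(rest)

# Each reduce is written as (LHS, RHS); "" is the empty right-hand side.
ACTIONS = [
    "shift", ("S", ""), "shift", ("S", "aSb"),
    "shift", ("S", ""), "shift", ("S", "bSa"),
    ("S", "SS"), ("S'", "S"),
]

final_stack, remaining = replay("abba", ACTIONS)
print(final_stack, remaining)
```

The replay only checks that each reduction finds its right-hand side on top of the stack; choosing which action to take is the job of the DFA described in the following slides.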

LR(k) Parsing

• LR(0) grammars can be parsed with no lookahead (stack only)

• LR(1) grammars need 1 token of lookahead

• LR(k), k>1 use multi-token lookahead

• Most “real” grammars are LR(1)

Shift vs. Reduce

• First, build NFA of LR(0) items
• Transform NFA to DFA
• If no DFA state contains a conflict (the action in every state is unambiguous), grammar is LR(0) - use DFA directly to parse (states indicate shift vs. reduce)

• Otherwise, use SLR(1) algorithm

LR(0) Items

• Rules with . between stack & input
• For S->(S) | a, the LR(0) items are:
S -> .(S)   S -> (.S)   S -> (S.)   S -> (S).
S -> .a   S -> a.

• S -> .(S) and S-> .a are initial items

• S-> (S). and S->a. are complete items
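Enumerating the items is a matter of placing the marker at every position of every rule. A minimal sketch, assuming a grammar stored as a dict from nonterminal to a list of right-hand-side strings (the function name `lr0_items` is illustrative, not from the slides):

```python
def lr0_items(grammar):
    """Return every LR(0) item as a string, with '.' marking the position."""
    items = []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            for dot in range(len(rhs) + 1):
                items.append(f"{lhs} -> {rhs[:dot]}.{rhs[dot:]}")
    return items

GRAMMAR = {"S": ["(S)", "a"]}
items = lr0_items(GRAMMAR)

# Initial items have the marker first; complete items have it last.
initial = [i for i in items if i.split(" -> ")[1].startswith(".")]
complete = [i for i in items if i.endswith(".")]
print(items)
```

For an ε-rule the single item (for example A -> .) is both initial and complete.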

Building NFA

• Each LR(0) item is a state
• Shift transitions: A -> .aB --a--> A -> a.B
• Change of goal transitions: S -> x.Ay --ε--> A -> .aB

More on NFA

• Initial state is “S’ -> .S”
• No final state, but acceptance happens in S’->S. state
• Complete LR(0) items have no outbound transitions
– We’ll worry about getting past them later
• No “reduce transitions”
– “shift” on non-terminal used during reduce

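NFA: S-> (S) | Ab ; A -> aA | ε

Shift transitions:
S’ -> .S --S--> S’ -> S.
S -> .(S) --(--> S -> (.S)
S -> (.S) --S--> S -> (S.)
S -> (S.) --)--> S -> (S).
S -> .Ab --A--> S -> A.b
S -> A.b --b--> S -> Ab.
A -> .aA --a--> A -> a.A
A -> a.A --A--> A -> aA.

Change of goal (ε) transitions:
S’ -> .S --ε--> S -> .(S)
S’ -> .S --ε--> S -> .Ab
S -> (.S) --ε--> S -> .(S)
S -> (.S) --ε--> S -> .Ab
S -> .Ab --ε--> A -> .aA
S -> .Ab --ε--> A -> .
A -> a.A --ε--> A -> .aA
A -> a.A --ε--> A -> .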
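The two kinds of NFA transition can be generated directly from the grammar. The sketch below is an illustration under assumptions that are not in the slides: items are represented as `(lhs, rhs, dot)` triples, every grammar symbol is a single character, and `nfa_edges` is a hypothetical helper name.

```python
GRAMMAR = {"S'": ["S"], "S": ["(S)", "Ab"], "A": ["aA", ""]}

def nfa_edges(grammar):
    """Items are (lhs, rhs, dot) triples; edges are (source, label, target)."""
    edges = []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            for dot, sym in enumerate(rhs):
                src = (lhs, rhs, dot)
                # Shift transition: move the marker over sym.
                edges.append((src, sym, (lhs, rhs, dot + 1)))
                # Change of goal: marker before a nonterminal, so add an
                # epsilon edge to each initial item of that nonterminal.
                for alt in grammar.get(sym, []):
                    edges.append((src, "ε", (sym, alt, 0)))
    return edges

edges = nfa_edges(GRAMMAR)
eps = [e for e in edges if e[1] == "ε"]
print(len(edges), len(eps))
```

Complete items such as `("A", "", 0)` never appear as a source, which matches the remark that they have no outbound transitions.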

NFA -> DFA

• Compute ε-closure (closure items)
– All are initial items

• Use subset construction (kernel items)

• Grammar + kernel items are sufficient (closure items can be inferred)

• DFA is computed directly by YACC, etc.

DFA Construction Details

• For each symbol (terminal or nonterminal) after the marker, create a shift transition. The items reached by these transitions are the kernel items of the target state.

S’ -> .S --S--> S’ -> S.

DFA Construction Details

• If there are multiple shift transitions on the same symbol, these are combined into the same state.

• (Because the NFA will be in all those states at once).

Adding Closure Items

• When the marker is immediately before a non-terminal symbol, the closure items are all of the initial forms for the new symbol, e.g.
– S’ -> .S (kernel item)
– S -> .(S) (closure item)
– S -> .Ab (closure item)

• These denote the change of goal transitions (which are all epsilon-transitions)
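Closure can be computed with a simple worklist. This is a sketch under assumptions (items as `(lhs, rhs, dot)` triples, single-character symbols, a hypothetical `closure` helper), not the exact procedure used by any particular tool.

```python
GRAMMAR = {"S'": ["S"], "S": ["(S)", "Ab"], "A": ["aA", ""]}

def closure(kernel, grammar):
    """Add the initial items of every nonterminal that follows a marker."""
    items = set(kernel)
    work = list(kernel)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for alt in grammar[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items

state0 = closure({("S'", "S", 0)}, GRAMMAR)
for item in sorted(state0):
    print(item)
```

Note that closure is applied repeatedly: S -> .Ab has the marker before A, so the initial items A -> .aA and A -> . also join the state, in addition to the two closure items listed above.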

DFA “Final” States

• The DFA doesn’t actually accept the string, so the concept of “final” isn’t the same

• In JFLAP, mark any state where a reduction can take place as final

DFA S-> (S) | Ab ; A -> aA | ε
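The DFA for this grammar can be built by combining closure with a "goto" step that moves the marker over one symbol. The following sketch makes several assumptions: items are `(lhs, rhs, dot)` triples, symbols are single characters, and states are numbered in breadth-first order, so the numbers will generally differ from those JFLAP or the slides assign.

```python
GRAMMAR = {"S'": ["S"], "S": ["(S)", "Ab"], "A": ["aA", ""]}

def closure(kernel):
    items, work = set(kernel), list(kernel)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alt in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(state, symbol):
    # Kernel items of the target: every item that can shift over `symbol`.
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved)

def build_dfa():
    start = closure({("S'", "S", 0)})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop(0)
        symbols = sorted({r[d] for (_, r, d) in state if d < len(r)})
        for sym in symbols:
            target = goto(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges

states, edges = build_dfa()
print(len(states), "states,", len(edges), "transitions")
```

Because all items that shift on the same symbol are moved together in `goto`, multiple shift transitions on one symbol end up in a single target state, as described in the construction details.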

LR(0) Parsing

• At each step, push a state onto the stack, and do an action based on the current state
– A->a.xb (not a complete item): if x is terminal, shift.
– A->aXb. (a complete item): reduce by A->aXb

When Not LR(0)?

• Shift-reduce conflict
– State contains both a complete item and a “shift” item (with leading terminal)
• Reduce-reduce conflict
– State contains 2 or more complete items.
• Previous example is not LR(0)! (Why?)
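The two conflict definitions translate into a short check on a single DFA state. This is a sketch with assumed names (`lr0_conflicts`, `STATE0`); the state below is the start state of the example grammar S -> (S) | Ab ; A -> aA | ε, written out by hand as `(lhs, rhs, dot)` triples.

```python
TERMINALS = set("()ab")

def lr0_conflicts(state, terminals):
    """Classify LR(0) conflicts in one state (a collection of items)."""
    complete = [i for i in state if i[2] == len(i[1])]
    shifts = [i for i in state if i[2] < len(i[1]) and i[1][i[2]] in terminals]
    found = []
    if complete and shifts:
        found.append("shift-reduce")
    if len(complete) > 1:
        found.append("reduce-reduce")
    return found

# Start state: S' -> .S, S -> .(S), S -> .Ab, A -> .aA, A -> .
STATE0 = [("S'", "S", 0), ("S", "(S)", 0), ("S", "Ab", 0), ("A", "aA", 0), ("A", "", 0)]
print(lr0_conflicts(STATE0, TERMINALS))
```

Here the complete item A -> . sits in the same state as items that shift on ( and a, which is one way to see why lookahead is needed for this grammar.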

Simple LR(1)

• If a shift is possible, do it
• Else if there is a complete item for A, and the next terminal is in Follow(A), reduce by that rule. Compute the next state by taking the A link from the state left on top of the stack after the rule's RHS is popped (that is, before pushing A)

• Otherwise, there is a parse error

SLR(1) Table

• Rows are states, columns are symbols (terminal and nonterminal)

• Table entries (3 types):
– sn: shift & goto state n (only for terminals)
– rk: reduce using rule k (rule #’s start at 0 in JFLAP)
– n: goto state n (only for nonterminals, after reduction)

Transitions and Table Entries

• Transition from state m to state n on terminal x
– Put sn in table[m][x]
• Transition from state m to state n on nonterminal X
– Put n in table[m][X]
• State m has a complete item for rule k, and terminal x is in FOLLOW of the LHS of rule k
– Put rk in table[m][x]
• State m contains the complete item S’ -> S.
– Put acc (accept) in table[m][$]
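These four rules are enough to fill the table once the DFA and the Follow sets are known. The sketch below hard-codes its inputs as assumptions: `EDGES` and `COMPLETE` are read off the example table for S -> (S) | Ab ; A -> aA | ε shown later in these slides, and the rule numbering (0: S’->S, 1: S->(S), 2: S->Ab, 3: A->aA, 4: A->ε) is inferred from that table.

```python
TERMINALS = set("()ab$")
RULES = ["S'->S", "S->(S)", "S->Ab", "A->aA", "A->ε"]   # rule numbers 0..4
FOLLOW = {"S": {"$", ")"}, "A": {"b"}}

# DFA transitions: (state, symbol) -> state.
EDGES = {
    (0, "("): 2, (0, "a"): 3, (0, "A"): 7, (0, "S"): 1,
    (2, "("): 2, (2, "a"): 3, (2, "A"): 7, (2, "S"): 5,
    (3, "a"): 3, (3, "A"): 4, (5, ")"): 6, (7, "b"): 8,
}
# Rule numbers of the complete items in each state.
COMPLETE = {0: [4], 1: [0], 2: [4], 3: [4], 4: [3], 6: [1], 8: [2]}

def build_table():
    table = {}

    def put(m, x, entry):
        # setdefault stores the entry if the cell is empty; a different
        # existing entry means the grammar is not SLR(1).
        if table.setdefault((m, x), entry) != entry:
            raise ValueError(f"conflict in state {m} on {x}")

    for (m, x), n in EDGES.items():
        put(m, x, f"s{n}" if x in TERMINALS else n)
    for m, rule_numbers in COMPLETE.items():
        for k in rule_numbers:
            if k == 0:
                put(m, "$", "acc")
            else:
                lhs = RULES[k].split("->")[0]
                for x in FOLLOW[lhs]:
                    put(m, x, f"r{k}")
    return table

table = build_table()
print(table[(0, "b")], table[(6, "$")], table[(1, "$")])
```

Raising on a doubly-filled cell is a design choice for this sketch; a generator such as YACC would instead report the conflict and apply a default resolution.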

SLR(1) Example

• Grammar
– S -> (S) | Ab
– A -> aA | ε
• Firsts
– S: (, a, b
– A: a, ε
• Follows
– S: $, )
– A: b
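The First and Follow sets above can be checked with the usual fixed-point iteration. This is a minimal sketch with assumed names (`first_of`, `first_follow`); it treats every symbol as a single character and uses "ε" as a marker inside First sets.

```python
GRAMMAR = {"S'": ["S"], "S": ["(S)", "Ab"], "A": ["aA", ""]}
EPS = "ε"

def first_of(seq, first):
    """First set of a string of symbols, given First sets of nonterminals."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})       # for a terminal, First is the terminal itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                        # every symbol could vanish (or seq is empty)
    return out

def first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:                      # repeat until no set grows
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[lhs])
                first[lhs] |= first_of(rhs, first)
                changed |= len(first[lhs]) != before
                for i, sym in enumerate(rhs):
                    if sym in grammar:
                        tail = first_of(rhs[i + 1:], first)
                        new = tail - {EPS}
                        if EPS in tail:
                            new |= follow[lhs]
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return first, follow

first, follow = first_follow(GRAMMAR, "S'")
print(first["S"], first["A"], follow["S"], follow["A"])
```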

SLR(1) Example Table

Rules (as numbered in the table): 0: S’->S, 1: S->(S), 2: S->Ab, 3: A->aA, 4: A->ε

State   (    )    a    b    $    A    S
0       s2        s3   r4        7    1
1                           acc
2       s2        s3   r4        7    5
3                 s3   r4        4
4                      r3
5            s6
6            r1             r1
7                      s8
8            r2             r2

SLR(1) Example

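• Stack, input, action (state numbers as in the SLR(1) example table):
$0 (aab)$ s2
$0(2 aab)$ s3
$0(2a3 ab)$ s3
$0(2a3a3 b)$ r4 (A->ε)
$0(2a3a3A4 b)$ r3 (A->aA)
$0(2a3A4 b)$ r3 (A->aA)
$0(2A7 b)$ s8

SLR(1) Example cont.

• $0(2A7 b)$ s8
• $0(2A7b8 )$ r2 (S->Ab)
• $0(2S5 )$ s6
• $0(2S5)6 $ r1 (S->(S))
• $0S1 $ acc (accept!)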
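A table-driven parser needs only the table, the rule lengths, and a stack of states. The sketch below uses the entries from the SLR(1) example table; the rule numbering (0: S’->S, 1: S->(S), 2: S->Ab, 3: A->aA, 4: A->ε) is inferred from that table, and `parse` is an illustrative name. Only states are kept on the stack, since the grammar symbols shown in a trace are implied by them.

```python
# (LHS, length of RHS) for rules 0..4
RULES = [("S'", 1), ("S", 3), ("S", 2), ("A", 2), ("A", 0)]

TABLE = {
    0: {"(": "s2", "a": "s3", "b": "r4", "A": 7, "S": 1},
    1: {"$": "acc"},
    2: {"(": "s2", "a": "s3", "b": "r4", "A": 7, "S": 5},
    3: {"a": "s3", "b": "r4", "A": 4},
    4: {"b": "r3"},
    5: {")": "s6"},
    6: {")": "r1", "$": "r1"},
    7: {"b": "s8"},
    8: {")": "r2", "$": "r2"},
}

def parse(text):
    """Return True if text is accepted, False on a parse error."""
    states, tokens, pos = [0], list(text) + ["$"], 0
    while True:
        action = TABLE[states[-1]].get(tokens[pos])
        if not isinstance(action, str):   # empty cell (or a goto entry): error
            return False
        if action == "acc":
            return True
        if action[0] == "s":              # shift: push the new state, consume a token
            states.append(int(action[1:]))
            pos += 1
        else:                             # reduce: pop |RHS| states, then follow the goto
            lhs, length = RULES[int(action[1:])]
            if length:
                del states[-length:]
            states.append(TABLE[states[-1]][lhs])

print(parse("(aab)"), parse("b"), parse("(a"))
```

For an ε-rule nothing is popped, so the goto is taken from the current top state; this is why the `if length:` guard is needed (`del states[-0:]` would clear the whole stack).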

Another SLR(1) Grammar to Try

• S -> zMNz
• M -> aMa
• M -> z
• N -> bNb
• N -> z

Parsing Conflicts in SLR(1)

• Shift-reduce conflict
– Prefer shift over reduce
• Reduce-reduce conflicts
– Error in design of grammar (usually)
– Possible to designate a grammar-specific choice

Dangling Else

• Remember: if C if C else S
– Shift-preference puts else with inner if!
– To put else with outer if, inner “if C” must be reduced to S first

• Good example of how language “evolved” to make it easy for the compiler!

More than SLR(1)

• SLR(k) Parsing
– Multiple-token lookahead (for shifts) and multiple-token follow information (for reductions)
• General LR(1) parsing
– Include lookaheads in DFA construction
• LALR(1) parsing
– Simplified state diagram for general LR(1)
– What YACC / Bison uses
