LR(k) Parsing CPSC 388 Ellen Walker Hiram College.


Bottom-Up Parsing

• Start with the tokens
• Build up the right-hand side (RHS) of a rule on the stack
• Replace the RHS by the rule's left-hand side (LHS)
• Done when the stack holds only the start symbol

• (Working from the leaves of the tree to the root)

Operations in Bottom-up Parsing

• Shift
– Push the terminal from the beginning of the input onto the top of the stack

• Reduce
– Replace the string xyz at the top of the stack by a nonterminal A (assuming A -> xyz)

• Accept (when the stack is $S’ and the input is empty)

Sample Parse

• S’ -> S; S -> aSb | bSa | SS | ε
• String: abba

– Stack = $, input = abba$; shift
– Stack = $a, input = bba$; reduce S -> ε
– Stack = $aS, input = bba$; shift
– Stack = $aSb, input = ba$; reduce S -> aSb
– Stack = $S, input = ba$; shift

Sample Parse (cont)

– Stack = $Sb, input = a$; reduce S -> ε
– Stack = $SbS, input = a$; shift
– Stack = $SbSa, input = $; reduce S -> bSa
– Stack = $SS, input = $; reduce S -> SS
– Stack = $S, input = $; reduce S’ -> S
– Stack = $S’, input = $; accept
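The trace above can be replayed mechanically. Below is a minimal Python sketch, assuming the shift/reduce choices are supplied by hand (the slides have not yet said how a parser makes them); `replay` and `ACTIONS` are illustrative names only. Each reduce checks that the rule's right-hand side really is on top of the stack.

```python
# Replays the hand-chosen shift/reduce sequence of the sample parse of "abba".
# Grammar: S' -> S ; S -> aSb | bSa | SS | ε   (ε is written as "" below)
# `replay` and `ACTIONS` are hypothetical names, not from any library.

def replay(tokens, actions):
    """Apply ('shift',), ('reduce', lhs, rhs) and ('accept',) actions in order.

    Returns (stack, remaining input) pairs, one per action, showing the
    configuration just before that action is applied.
    """
    stack = ["$"]
    rest = list(tokens) + ["$"]
    trace = []
    for act in actions:
        trace.append(("".join(stack), "".join(rest)))
        if act[0] == "shift":
            stack.append(rest.pop(0))          # move next terminal onto the stack
        elif act[0] == "reduce":
            _, lhs, rhs = act
            n = len(rhs)
            # The handle on top of the stack must be exactly the rule's RHS.
            if n and stack[-n:] != list(rhs):
                raise ValueError(f"cannot reduce by {lhs} -> {rhs or 'ε'}")
            del stack[len(stack) - n:]         # pops nothing when rhs is ε
            stack.append(lhs)
        elif act[0] == "accept":
            if stack != ["$", "S'"] or rest != ["$"]:
                raise ValueError("accept with non-empty input or wrong stack")
            return trace
    raise ValueError("action list ended without accept")

ACTIONS = [
    ("shift",), ("reduce", "S", ""), ("shift",), ("reduce", "S", "aSb"),
    ("shift",), ("reduce", "S", ""), ("shift",), ("reduce", "S", "bSa"),
    ("reduce", "S", "SS"), ("reduce", "S'", "S"), ("accept",),
]

for (stack, rest), act in zip(replay("abba", ACTIONS), ACTIONS):
    print(f"{stack:6} {rest:6} {act[0]}")
```

A wrong choice (for example, reducing by S -> aSb right after the first shift) raises `ValueError`, which is exactly the decision problem the LR machinery below is meant to solve.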

LR(k) Parsing

• LR(0) grammars can be parsed with no lookahead (the stack alone decides each action)

• LR(1) grammars need 1 token of lookahead

• LR(k) grammars, k > 1, use k tokens of lookahead

• Most “real” grammars are LR(1)

Shift vs. Reduce

• First, build the NFA of LR(0) items
• Transform the NFA to a DFA
• If no DFA state has a conflict, the grammar is LR(0): use the DFA directly to parse (states indicate shift vs. reduce)

• Otherwise, use the SLR(1) algorithm

LR(0) Items

• Items are rules with a marker (.) between what is already on the stack and what is still expected from the input
• For S -> (S) | a, the LR(0) items are:
– S -> .(S)   S -> (.S)   S -> (S.)   S -> (S).
– S -> .a   S -> a.

• S -> .(S) and S -> .a are initial items

• S -> (S). and S -> a. are complete items
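Because an LR(0) item is just a rule plus a marker position, the items can be listed mechanically. A minimal Python sketch, assuming rules are stored as (lhs, rhs-tuple) pairs (a representation chosen here for illustration):

```python
# Enumerate the LR(0) items of S -> (S) | a.
# Each rule is (lhs, rhs) with rhs a tuple of symbols; an item is (lhs, rhs, dot).
GRAMMAR = [("S", ("(", "S", ")")), ("S", ("a",))]

def lr0_items(grammar):
    """One item per marker position 0..len(rhs) in each rule."""
    return [(lhs, rhs, dot) for lhs, rhs in grammar for dot in range(len(rhs) + 1)]

def show(item):
    """Render an item in the slides' notation, e.g. 'S -> (.S)'."""
    lhs, rhs, dot = item
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"

items = lr0_items(GRAMMAR)
initial = [show(i) for i in items if i[2] == 0]             # marker at the far left
complete = [show(i) for i in items if i[2] == len(i[1])]    # marker at the far right
print([show(i) for i in items])
```

A rule with n symbols on the right contributes n + 1 items, so this grammar has 4 + 2 = 6.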

Building NFA

• Each LR(0) item is a state
• Shift transitions
– A -> .aB  --a-->  A -> a.B

• Change-of-goal transitions (ε-transitions)
– S -> x.Ay  --ε-->  A -> .aB

More on NFA

• Initial state is S’ -> .S
• No final state, but acceptance happens in the S’ -> S. state

• Complete LR(0) items have no outbound transitions
– We’ll worry about getting past them later

• No “reduce transitions”
– Instead, the “shift” transition on a nonterminal is used during a reduce

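NFA: S -> (S) | Ab ; A -> aA | ε

[Figure: NFA of LR(0) items with states S’ -> .S, S’ -> S., S -> .(S), S -> (.S), S -> (S.), S -> (S)., S -> .Ab, S -> A.b, S -> Ab., A -> .aA, A -> a.A, A -> aA., A -> . ; shift transitions labeled (, S, ), A, b, a and ε change-of-goal transitions]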
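The NFA for this grammar can be generated directly from the two transition kinds on the “Building NFA” slide. A minimal Python sketch, assuming items are (rule number, marker position) pairs and that `None` stands for an ε label (both are choices made here for illustration):

```python
# NFA of LR(0) items for S' -> S ; S -> (S) | Ab ; A -> aA | ε
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ("A", "b")),
    ("A", ("a", "A")),
    ("A", ()),                      # A -> ε
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def nfa_edges(grammar):
    """Return (from_item, label, to_item) edges; label None means ε.

    An item is (rule number, marker position).
    """
    edges = []
    for r, (_, rhs) in enumerate(grammar):
        for dot, sym in enumerate(rhs):        # complete items get no edges
            # Shift transition: move the marker past the next symbol.
            edges.append(((r, dot), sym, (r, dot + 1)))
            if sym in NONTERMINALS:
                # Change-of-goal transitions: ε to each initial item of sym.
                for r2, (lhs2, _) in enumerate(grammar):
                    if lhs2 == sym:
                        edges.append(((r, dot), None, (r2, 0)))
    return edges

def show(item):
    r, dot = item
    lhs, rhs = GRAMMAR[r]
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"

for src, label, dst in nfa_edges(GRAMMAR):
    print(f"{show(src):12} --{label or 'ε'}--> {show(dst)}")
```

The result has 13 states (one per item), 8 shift edges and 8 ε edges; A -> . and the other complete items have no outgoing edges, as the slide says.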

NFA -> DFA

• Compute ε-closures (closure items)
– Closure items are always initial items

• Use the subset construction (kernel items)

• Grammar + kernel items are sufficient (closure items can be inferred)

• YACC and similar tools compute the DFA directly, without building the NFA first

DFA Construction Details

• For each symbol (terminal or nonterminal) after the marker, create a shift transition. The items it leads to are kernel items, e.g.
– S’ -> .S  --S-->  S’ -> S.

DFA Construction Details

• If there are multiple shift transitions on the same symbol, these are combined into the same state.

• (Because the NFA will be in all those states at once).

Adding Closure Items

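• When the marker is immediately before a nonterminal, the closure items are all of the initial items for that nonterminal (repeated for any new item whose marker is also before a nonterminal), e.g.
– S’ -> .S (kernel item)
– S -> .(S) (closure item)
– S -> .Ab (closure item)
– A -> .aA and A -> . (closure items, added because of S -> .Ab)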

• These denote the change of goal transitions (which are all epsilon-transitions)
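Closure plus the merging of same-symbol transitions is all the subset construction needs. A minimal Python sketch for the running grammar, assuming the (rule number, marker position) item representation; `closure`, `goto`, and `build_dfa` are illustrative names, and states are numbered in discovery order, which need not match any particular drawing of the DFA.

```python
# DFA (canonical collection of LR(0) item sets) for
#   S' -> S ; S -> (S) | Ab ; A -> aA | ε
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ("A", "b")),
    ("A", ("a", "A")),
    ("A", ()),                      # A -> ε
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add closure items: for X -> u.Bv add every B -> .w, repeatedly."""
    result = set(items)
    work = list(items)
    while work:
        r, dot = work.pop()
        rhs = GRAMMAR[r][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for r2, (lhs2, _) in enumerate(GRAMMAR):
                if lhs2 == rhs[dot] and (r2, 0) not in result:
                    result.add((r2, 0))
                    work.append((r2, 0))
    return frozenset(result)

def goto(state, sym):
    """Move the marker past sym in every item that allows it (together these
    are the kernel items of one DFA state), then add their closure items."""
    kernel = {(r, dot + 1) for r, dot in state
              if dot < len(GRAMMAR[r][1]) and GRAMMAR[r][1][dot] == sym}
    return closure(kernel)

def build_dfa():
    """Subset construction; returns (states, transitions {(i, symbol): j})."""
    states = [closure({(0, 0)})]            # start from S' -> .S
    trans = {}
    i = 0
    while i < len(states):
        after_marker = {GRAMMAR[r][1][d] for r, d in states[i] if d < len(GRAMMAR[r][1])}
        for sym in sorted(after_marker):
            target = goto(states[i], sym)
            if target not in states:        # same item set = same DFA state
                states.append(target)
            trans[(i, sym)] = states.index(target)
        i += 1
    return states, trans

states, trans = build_dfa()
for n, state in enumerate(states):
    # JFLAP-style "final" states: a complete item means a reduction can happen
    can_reduce = any(d == len(GRAMMAR[r][1]) for r, d in state)
    print(n, sorted(state), "(reduce possible)" if can_reduce else "")
```

The start state comes out as S’ -> .S, S -> .(S), S -> .Ab, A -> .aA, A -> ., matching the closure example above, and the whole DFA has 9 states.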

DFA “Final” States

• The DFA doesn’t actually accept the string, so the concept of “final” isn’t the same

• In JFLAP, mark any state where a reduction can take place as final

DFA S-> (S) | Ab ; A -> aA | ε

LR(0) Parsing

• At each step, push a state onto the stack, and choose an action based on the current state
– A -> a.xb (not a complete item): if x is a terminal, shift
– A -> axb. (a complete item): reduce by A -> axb

When Not LR(0)?

• Shift-reduce conflict
– State contains both a complete item and a “shift” item (a terminal right after the marker)

• Reduce-reduce conflict
– State contains 2 or more complete items

• Previous example is not LR(0)! (Why?)
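The two conflict tests can be run over every DFA state of the previous example. A minimal Python sketch, assuming the nine item sets below, which were written out by hand from the closure and subset construction and numbered to match the SLR(1) example table later in these slides:

```python
# LR(0) conflict check for S' -> S ; S -> (S) | Ab ; A -> aA | ε
GRAMMAR = [
    ("S'", ("S",)),                 # rule 0
    ("S", ("(", "S", ")")),         # rule 1
    ("S", ("A", "b")),              # rule 2
    ("A", ("a", "A")),              # rule 3
    ("A", ()),                      # rule 4: A -> ε
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

# DFA states as sets of (rule, marker position) items (hand-derived).
STATES = [
    {(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)},  # 0: start
    {(0, 1)},                                  # 1: S' -> S.
    {(1, 1), (1, 0), (2, 0), (3, 0), (4, 0)},  # 2: after (
    {(3, 1), (3, 0), (4, 0)},                  # 3: after a
    {(3, 2)},                                  # 4: A -> aA.
    {(1, 2)},                                  # 5: S -> (S.)
    {(1, 3)},                                  # 6: S -> (S).
    {(2, 1)},                                  # 7: S -> A.b
    {(2, 2)},                                  # 8: S -> Ab.
]

def lr0_conflicts(states):
    """List (state, kind, details) for every LR(0) conflict."""
    found = []
    for n, state in enumerate(states):
        complete = sorted(r for r, d in state if d == len(GRAMMAR[r][1]))
        # terminals right after the marker ("shift" items)
        shifts = sorted({GRAMMAR[r][1][d] for r, d in state
                         if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] not in NONTERMINALS})
        if complete and shifts:
            found.append((n, "shift-reduce", shifts))
        if len(complete) > 1:
            found.append((n, "reduce-reduce", complete))
    return found

for conflict in lr0_conflicts(STATES):
    print(conflict)
```

In this sketch every conflict involves the complete item A -> . sitting next to shift items on ( or a, so without lookahead the parser cannot tell whether to reduce by A -> ε.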

Simple LR(1)

• If a shift is possible on the next terminal, do it
• Else if there is a complete item for A, and the next terminal is in Follow(A), reduce by that rule. Compute the next state by taking the A link from the last state left on the stack (after popping the rule's right-hand side) before pushing A

• Otherwise, there is a parse error

SLR(1) Table

• Rows are states, columns are symbols (terminal and nonterminal)

• Table entries (3 types):
– sn: shift and go to state n (only for terminals)
– rk: reduce using rule k (rule numbers start at 0 in JFLAP)
– n: go to state n (only for nonterminals, after a reduction)

Transitions and Table Entries

• Transition from state m to state n on terminal x
– Put sn in table[m][x]

• Transition from state m to state n on nonterminal X
– Put n in table[m][X]

• State m has a complete item for rule k, and terminal x is in FOLLOW of the LHS of rule k
– Put rk in table[m][x]

• State m contains S’ -> S.
– Put acc (accept) in table[m][$]

SLR(1) Example

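• Grammar (rules numbered as in JFLAP)
– 0: S’ -> S
– 1: S -> (S)
– 2: S -> Ab
– 3: A -> aA
– 4: A -> ε

• Firsts
– First(S) = { (, a, b }
– First(A) = { a, ε }

• Follows
– Follow(S) = { $, ) }
– Follow(A) = { b }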
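The First and Follow sets above come from the usual fixed-point iteration. A minimal Python sketch for this grammar, assuming the string "ε" marks the empty string and that the augmented start symbol S’ gets $ in its Follow set:

```python
# First and Follow sets for S' -> S ; S -> (S) | Ab ; A -> aA | ε
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ("A", "b")),
    ("A", ("a", "A")),
    ("A", ()),                          # A -> ε
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
EPS = "ε"

def first_of_seq(seq, first):
    """First set of a symbol string; contains EPS if the whole string can vanish."""
    out = set()
    for sym in seq:
        if sym not in NONTERMINALS:      # a terminal ends the scan
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:        # sym cannot vanish, so stop here
            return out
    out.add(EPS)
    return out

def first_sets(grammar):
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:                       # iterate until nothing new is added
        changed = False
        for lhs, rhs in grammar:
            new = first_of_seq(rhs, first) - first[lhs]
            if new:
                first[lhs] |= new
                changed = True
    return first

def follow_sets(grammar, first, start="S'"):
    follow = {a: set() for a in NONTERMINALS}
    follow[start].add("$")               # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of_seq(rhs[i + 1:], first)
                new = tail - {EPS}
                if EPS in tail:          # sym can end the rule: inherit Follow(lhs)
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

first = first_sets(GRAMMAR)
follow = follow_sets(GRAMMAR, first)
print(first)
print(follow)
```

Because A can vanish, b joins First(S) through S -> Ab, and ) joins Follow(S) through S -> (S).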

SLR(1) Example Table

State   (     )     a     b     $     A     S
0       s2          s3    r4          7     1
1                               acc
2       s2          s3    r4          7     5
3                   s3    r4          4
4                         r3
5             s6
6             r1                r1
7                         s8
8             r2                r2
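The table above follows mechanically from the DFA and the Follow sets, using the rules on the “Transitions and Table Entries” slide. A minimal Python sketch, assuming the Follow sets are copied from the example slide rather than computed; states are numbered in discovery order, so the numbers differ from the slide's, but the 21 entries correspond one-to-one after renaming states.

```python
# SLR(1) table for S' -> S ; S -> (S) | Ab ; A -> aA | ε
GRAMMAR = [
    ("S'", ("S",)),                 # rule 0
    ("S", ("(", "S", ")")),         # rule 1
    ("S", ("A", "b")),              # rule 2
    ("A", ("a", "A")),              # rule 3
    ("A", ()),                      # rule 4: A -> ε
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
FOLLOW = {"S": {"$", ")"}, "A": {"b"}}   # copied from the example slide

def closure(items):
    """Repeatedly add B -> .w for every item with the marker before B."""
    result, work = set(items), list(items)
    while work:
        r, dot = work.pop()
        rhs = GRAMMAR[r][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for r2, (lhs2, _) in enumerate(GRAMMAR):
                if lhs2 == rhs[dot] and (r2, 0) not in result:
                    result.add((r2, 0))
                    work.append((r2, 0))
    return frozenset(result)

def goto(state, sym):
    """Kernel items reached on sym, plus their closure."""
    return closure({(r, d + 1) for r, d in state
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == sym})

def slr_table():
    """Return (states, table, conflicts); table maps (state, symbol) -> entry."""
    states = [closure({(0, 0)})]
    table, conflicts = {}, []

    def put(key, entry):
        if table.get(key, entry) != entry:   # cell already holds something else
            conflicts.append((key, table[key], entry))
        else:
            table[key] = entry

    i = 0
    while i < len(states):
        state = states[i]
        # Transitions: "sn" on terminals, plain "n" (goto) on nonterminals
        for sym in sorted({GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            j = states.index(target)
            put((i, sym), str(j) if sym in NONTERMINALS else f"s{j}")
        # Complete items: accept for S' -> S., otherwise reduce on FOLLOW(lhs)
        for r, d in sorted(state):
            lhs, rhs = GRAMMAR[r]
            if d == len(rhs):
                if r == 0:
                    put((i, "$"), "acc")
                else:
                    for x in sorted(FOLLOW[lhs]):
                        put((i, x), f"r{r}")
        i += 1
    return states, table, conflicts

states, table, conflicts = slr_table()
columns = ["(", ")", "a", "b", "$", "A", "S"]
for n in range(len(states)):
    print(n, [table.get((n, c), "") for c in columns])
```

An empty `conflicts` list is what makes the grammar SLR(1): the reduction by A -> ε is confined to lookahead b, where no shift competes with it.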

SLR(1) Example

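• Parse of (aab), showing stack, remaining input, and table action:

Stack          Input     Action
$0             (aab)$    s2
$0(2           aab)$     s3
$0(2a3         ab)$      s3
$0(2a3a3       b)$       r4 (A -> ε), goto 4
$0(2a3a3A4     b)$       r3 (A -> aA), goto 4
$0(2a3A4       b)$       r3 (A -> aA), goto 7
$0(2A7         b)$       s8

SLR(1) Example cont.

$0(2A7b8       )$        r2 (S -> Ab), goto 5
$0(2S5         )$        s6
$0(2S5)6       $         r1 (S -> (S)), goto 1
$0S1           $         accept!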
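The trace above is what a table-driven SLR(1) driver produces. A minimal Python sketch that hard-codes the example table (rules numbered as on the slide: 1 S -> (S), 2 S -> Ab, 3 A -> aA, 4 A -> ε); following the trace's notation, the stack alternates states and grammar symbols, and `parse` is an illustrative name.

```python
# Table-driven SLR(1) parse using the example table for
#   S -> (S) | Ab ; A -> aA | ε
RULES = {1: ("S", 3), 2: ("S", 2), 3: ("A", 2), 4: ("A", 0)}   # rule: (lhs, |rhs|)
ACTION = {
    (0, "("): "s2", (0, "a"): "s3", (0, "b"): "r4",
    (1, "$"): "acc",
    (2, "("): "s2", (2, "a"): "s3", (2, "b"): "r4",
    (3, "a"): "s3", (3, "b"): "r4",
    (4, "b"): "r3",
    (5, ")"): "s6",
    (6, ")"): "r1", (6, "$"): "r1",
    (7, "b"): "s8",
    (8, ")"): "r2", (8, "$"): "r2",
}
GOTO = {(0, "A"): 7, (0, "S"): 1, (2, "A"): 7, (2, "S"): 5, (3, "A"): 4}

def parse(text):
    """Return the (stack, input, action) trace; raise SyntaxError on an empty cell."""
    stack = [0]                      # states and symbols alternate; state on top
    tokens = list(text) + ["$"]
    pos = 0
    trace = []
    while True:
        state, tok = stack[-1], tokens[pos]
        act = ACTION.get((state, tok))
        trace.append(("$" + "".join(map(str, stack)), "".join(tokens[pos:]), act))
        if act is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act == "acc":
            return trace
        if act[0] == "s":            # shift: push the token and the new state
            stack += [tok, int(act[1:])]
            pos += 1
        else:                        # reduce: pop 2 entries per RHS symbol, then goto
            lhs, n = RULES[int(act[1:])]
            del stack[len(stack) - 2 * n:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]

for row in parse("(aab)"):
    print(*row)
```

Reducing by A -> ε pops nothing and only pushes A with its goto state, which is why the trace shows A appearing on the stack without any input being consumed.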

Another SLR(1) Grammar to Try

• S -> zMNz
• M -> aMa
• M -> z
• N -> bNb
• N -> z

Parsing Conflicts in SLR(1)

• Shift-reduce conflict
– Prefer shift over reduce

• Reduce-reduce conflicts
– Error in design of grammar (usually)
– Possible to designate a grammar-specific choice

Dangling Else

• Remember: if C if C S else S
– Shift preference puts the else with the inner if!
– To put the else with the outer if, the inner “if C S” must be reduced to S first

• Good example of how language “evolved” to make it easy for the compiler!
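The dangling-else conflict shows up even in a tiny grammar. A minimal Python sketch, assuming the hypothetical abbreviations i for “if C”, e for “else”, and x for any other statement, giving S’ -> S ; S -> iS | iSeS | x. Follow(S) = { $, e } is worked out by hand. Shifts are entered into the table before reductions, so a shift-reduce clash keeps the shift, which is the preference described above.

```python
# Dangling else, toy version (hypothetical abbreviations):
#   i = "if C", e = "else", x = any other statement
GRAMMAR = [
    ("S'", ("S",)),                 # rule 0
    ("S", ("i", "S")),              # rule 1: if C S
    ("S", ("i", "S", "e", "S")),    # rule 2: if C S else S
    ("S", ("x",)),                  # rule 3
]
NONTERMINALS = {"S'", "S"}
# FOLLOW(S) by hand: $ (from S' -> S) and e (from S -> iS e S)
FOLLOW = {"S": {"$", "e"}}

def closure(items):
    """Repeatedly add B -> .w for every item with the marker before B."""
    result, work = set(items), list(items)
    while work:
        r, dot = work.pop()
        rhs = GRAMMAR[r][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for r2, (lhs2, _) in enumerate(GRAMMAR):
                if lhs2 == rhs[dot] and (r2, 0) not in result:
                    result.add((r2, 0))
                    work.append((r2, 0))
    return frozenset(result)

def goto(state, sym):
    """Kernel items reached on sym, plus their closure."""
    return closure({(r, d + 1) for r, d in state
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == sym})

def table_prefer_shift():
    """SLR(1) table where an existing shift wins any shift-reduce clash."""
    states = [closure({(0, 0)})]
    table, conflicts = {}, []
    i = 0
    while i < len(states):
        state = states[i]
        # Shifts and gotos are entered first...
        for sym in sorted({GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            j = states.index(target)
            table[(i, sym)] = str(j) if sym in NONTERMINALS else f"s{j}"
        # ...so a reduction that lands on an occupied cell is recorded and dropped.
        for r, d in sorted(state):
            lhs, rhs = GRAMMAR[r]
            if d < len(rhs):
                continue
            if r == 0:
                table[(i, "$")] = "acc"
            else:
                for x in sorted(FOLLOW[lhs]):
                    if (i, x) in table:
                        conflicts.append(((i, x), table[(i, x)], f"r{r}"))
                    else:
                        table[(i, x)] = f"r{r}"
        i += 1
    return table, conflicts

table, conflicts = table_prefer_shift()
print(conflicts)
```

The single conflict is in the state holding both S -> iS. and S -> iS.eS: on lookahead e the table keeps the shift, so the else is shifted while the inner if is still on the stack and ends up attached to it.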

More than SLR(1)

• SLR(k) parsing
– Multiple-token lookahead (for shifts) and multiple-token follow information (for reductions)

• General LR(1) parsing
– Include lookaheads in the DFA construction

• LALR(1) parsing
– Simplified state diagram for general LR(1): states with the same LR(0) items are merged
– What YACC / Bison uses