CH4.1

CSE244

More on LR Parsing

Aggelos Kiayias
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
[email protected]
http://www.cse.uconn.edu/~akiayias


CH4.2

Picture So Far

SLR construction: based on the canonical collection of LR(0) items, which gives rise to the canonical LR(0) parsing table.

No multiply defined table entries => the grammar is called “SLR(1)”.

More general class: LR(1) grammars, using the notion of LR(1) item and the canonical LR(1) parsing table.

CH4.3

LR(1) Items

DEF. An LR(1) item is a production with a marker together with a terminal, e.g. [S → aA.Be, c].

Intuition: it indicates how much of a certain production we have seen already (aA), what we could expect next (Be), and a lookahead that agrees with what should follow in the input if we ever reduce by the production S → aABe. By incorporating such lookahead information into the item concept we can make wiser reduce decisions.

The lookahead of an LR(1) item is used directly only when considering reduce actions (i.e. when the marker is at the rightmost position).

The core of an LR(1) item [S → aA.Be, c] is the LR(0) item S → aA.Be.

Different LR(1) items may share the same core.
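The item and core notions can be sketched as a small data structure. This is only an illustration: the class name `Item` and its field names are assumptions made for this example, not notation from the slides.

```python
from typing import NamedTuple, Tuple


class Item(NamedTuple):
    """An LR(1) item: a production, a marker position, and a lookahead terminal."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int          # index of the marker inside rhs
    lookahead: str

    def core(self):
        """The LR(0) part of the item (everything except the lookahead)."""
        return (self.lhs, self.rhs, self.dot)

    def is_complete(self):
        """True when the marker is at the rightmost position (a reduce item)."""
        return self.dot == len(self.rhs)

    def __str__(self):
        body = list(self.rhs)
        body.insert(self.dot, ".")
        return f"[{self.lhs} -> {' '.join(body)}, {self.lookahead}]"


x = Item("S", ("a", "A", "B", "e"), 2, "c")
y = Item("S", ("a", "A", "B", "e"), 2, "d")
print(x)                      # the item from the definition
print(x.core() == y.core())   # two different items sharing one core
```

Keeping the lookahead as a separate field makes it easy to compare items by core only, which is what the LALR construction relies on.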

CH4.4

Usefulness of LR(1) items

E.g. if we have two LR(1) items of the form [A → α., a] and [B → β., b], we will take advantage of the lookahead to decide which reduction to use (the same setting would perhaps produce a reduce/reduce conflict in the SLR approach).

How the notion of validity changes:

An item [A → β₁.β₂, a] is valid for a viable prefix αβ₁ if we have a rightmost derivation that yields αAaw, which in one step yields αβ₁β₂aw.

CH4.5

Constructing the Canonical Collection of LR(1) items

Initial item: [S’ → .S, $]

Closure (more refined): if [A → α.Bβ, a] belongs to the set of items, and B → γ is a production of the grammar, then we add the item [B → .γ, b] for all b ∈ FIRST(βa).

Goto (the same): a state containing [A → α.Xβ, a] will move to a state containing [A → αX.β, a] with label X.

Every state is closed according to Closure. Every state has transitions according to Goto.
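The closure and goto rules can be tried out on a small grammar. The following is a minimal sketch, assuming a grammar without empty productions (so FIRST of a string is FIRST of its first symbol) and using the grammar S’ → S, S → C C, C → c C | d. All function and variable names are choices made for this example.

```python
GRAMMAR = {            # S' -> S,  S -> C C,  C -> c C | d
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def first_sets(grammar):
    """FIRST of every nonterminal (assumes there are no empty productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                x = body[0]
                new = first[x] if x in grammar else {x}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)


def first_of(symbols):
    """FIRST of a non-empty string of symbols (no empty productions assumed)."""
    x = symbols[0]
    return FIRST[x] if x in GRAMMAR else {x}


def closure(items):
    """Items are tuples (lhs, rhs, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:       # [A -> alpha . B beta, a]
            B, beta = rhs[dot], rhs[dot + 1:]
            for b in first_of(beta + (a,)):              # b in FIRST(beta a)
                for gamma in GRAMMAR[B]:
                    item = (B, gamma, 0, b)              # [B -> . gamma, b]
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, X):
    """Move the marker over X in every item that allows it, then close."""
    moved = {(l, r, d + 1, a) for (l, r, d, a) in state if d < len(r) and r[d] == X}
    return closure(moved) if moved else None


def canonical_collection():
    start = closure({("S'", ("S",), 0, "$")})            # initial item [S' -> .S, $]
    states, transitions = [start], {}
    for state in states:                                 # the list grows while we iterate
        symbols = {r[d] for (_, r, d, _) in state if d < len(r)}
        for X in sorted(symbols):
            target = goto(state, X)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), X)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
print(len(states), "states")
```

States are numbered in discovery order here, so the numbering need not agree with a hand construction; only the sets of items and the transitions between them matter.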

CH4.6

Constructing the LR(1) Parsing Table

Shift actions (same): if [A → α.bβ, a] is in state Ik and Ik moves to state Im with label b, then we add the action action[k, b] = “shift m”.

Reduce actions (more refined): if [A → α., a] is in state Ik, then we add the action “Reduce A → α” into action[k, a]. Observe that we don’t use information from FOLLOW(A) anymore.

Goto part of the table is as before.
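The shift and reduce rules can be sketched on a hand-built collection. The toy grammar S’ → S, S → a, its three states, and all names below are assumptions chosen to keep the example short. The accept entry for [S’ → S., $] follows the usual convention and is not spelled out on the slide.

```python
# Hand-built LR(1) collection for the toy grammar  S' -> S,  S -> a.
# Items are tuples (lhs, rhs, dot, lookahead).
STATES = [
    {("S'", ("S",), 0, "$"), ("S", ("a",), 0, "$")},   # I0
    {("S'", ("S",), 1, "$")},                          # I1
    {("S", ("a",), 1, "$")},                           # I2
]
TRANSITIONS = {(0, "S"): 1, (0, "a"): 2}
NONTERMINALS = {"S'", "S"}


def build_tables(states, transitions, nonterminals, start="S'"):
    action, goto = {}, {}

    def set_action(k, a, value):
        # A second, different value for the same entry means the grammar is not LR(1).
        if action.get((k, a), value) != value:
            raise ValueError(f"conflict in action[{k}, {a}]")
        action[(k, a)] = value

    for k, state in enumerate(states):
        for lhs, rhs, dot, a in state:
            if dot < len(rhs):
                b = rhs[dot]
                if b not in nonterminals:                # shift: marker before a terminal
                    set_action(k, b, ("shift", transitions[(k, b)]))
            elif lhs == start:
                set_action(k, a, ("accept",))            # usual convention (assumption)
            else:
                set_action(k, a, ("reduce", lhs, rhs))   # only on the item's own lookahead
    for (k, X), m in transitions.items():
        if X in nonterminals:                            # goto part, as before
            goto[(k, X)] = m
    return action, goto


action, goto = build_tables(STATES, TRANSITIONS, NONTERMINALS)
print(action)
print(goto)
```

Note that the reduce entry is placed only under the lookahead stored in the item; FOLLOW is never consulted.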

CH4.7

Example I

S’ → S
S → C C
C → c C | d

FIRST
S: c d
C: c d

construction

CH4.8

Example II

S’ → S
S → L = R | R
L → * R | id
R → L

FIRST
S: * id
L: * id
R: * id

CH4.9

LR(1) is more general than SLR(1):

S’ → S
S → L = R | R
L → * R | id
R → L

I0 = { [S’ → .S, $], [S → .L = R, $], [S → .R, $], [L → .* R, = / $], [L → .id, = / $], [R → .L, $] }

I1 = { [S’ → S., $] }

I2 = { [S → L. = R, $], [R → L., $] }

I3 = { [S → R., $] }

I4 = { [L → *.R, = / $], [R → .L, = / $], [L → .* R, = / $], [L → .id, = / $] }

I5 = { [L → id., = / $] }

I6 = { [S → L = .R, $], [R → .L, $], [L → .* R, $], [L → .id, $] }

I7 = { [L → *R., = / $] }

I8 = { [R → L., = / $] }

I9 = { [L → *.R, $], [R → .L, $], [L → .* R, $], [L → .id, $] }

I10 = { [L → *R., $] }

I11 = { [L → id., $] }

I12 = { [R → L., $] }

action[2, =] ? s6 (because of S → L. = R). THERE IS NO CONFLICT ANYMORE: the only reduce item in I2 is [R → L., $], whose lookahead is $ and not =.
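To see why SLR(1) has trouble in state I2 while LR(1) does not, one can compute FOLLOW(R) for this grammar and compare it with the lookahead of [R → L., $]. The sketch below assumes no empty productions; the helper names are made up for this example.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
           ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST of every nonterminal (no empty productions in this grammar)."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(first):
    """FOLLOW of every nonterminal, computed as a fixed point."""
    follow = {nt: set() for nt in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, X in enumerate(rhs):
                if X not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):                 # something follows X in the body
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                                # X ends the body: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[X]:
                    follow[X] |= new
                    changed = True
    return follow


follow = follow_sets(first_sets())
shift_on = {"="}                  # from [S -> L. = R, $] in I2
slr_reduce_on = follow["R"]       # SLR reduces R -> L on all of FOLLOW(R)
lr1_reduce_on = {"$"}             # lookahead of [R -> L., $] in I2
print("SLR clash on:", sorted(slr_reduce_on & shift_on))
print("LR(1) clash on:", sorted(lr1_reduce_on & shift_on))
```

The symbol = enters FOLLOW(R) through L → * R together with S → L = R, yet in state I2 no = can legitimately follow a reduction to R; the LR(1) lookahead records exactly that.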

CH4.10

LALR Parsing

Canonical sets of LR(1) items:
The number of states is much larger than in the SLR construction.
LR(1): order of thousands of states for a standard programming language.
SLR(1): order of hundreds of states for a standard programming language.

LALR(1) (lookahead-LR) is a tradeoff:
Collapse states of the LR(1) table that have the same core (the “LR(0)” part of each state).
LALR never introduces a shift/reduce conflict if LR(1) doesn’t.
It might introduce a reduce/reduce conflict (that did not exist in the LR(1) table).
Still much better than SLR(1) (it handles a larger set of grammars), while the table is smaller than the LR(1) table: its size is about that of the SLR(1) table.

This is what Yacc and most compilers employ.

CH4.11

Collapsing states with the same core.

E.g., if I3 and I6 collapse, then whenever the LALR(1) parser puts I36 on the stack, the LR(1) parser would have either I3 or I6.

A shift/reduce conflict would not be introduced by the LALR “collapse”. Indeed, if the LALR(1) table has a shift/reduce conflict, this conflict should also exist in the LR(1) version: this is because two states with the same core would have the same outgoing arrows.

On the other hand, a reduce/reduce conflict may be introduced.

Still LALR(1) is preferred: table size proportional to SLR(1).

Direct construction is also possible.
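Collapsing by core can be sketched as grouping states by their set of LR(0) cores and taking the union of the lookaheads. The data below copies a few states from the LR(1) collection of the grammar S → L = R | R, L → * R | id, R → L shown earlier; the dictionary encoding and the function name `merge_by_core` are assumptions for this example.

```python
from collections import defaultdict

# Each state maps an LR(0) core (written as a string) to its set of lookaheads.
LR1_STATES = {
    2: {"S -> L . = R": {"$"}, "R -> L .": {"$"}},
    4: {"L -> * . R": {"=", "$"}, "R -> . L": {"=", "$"},
        "L -> . * R": {"=", "$"}, "L -> . id": {"=", "$"}},
    5: {"L -> id .": {"=", "$"}},
    7: {"L -> * R .": {"=", "$"}},
    8: {"R -> L .": {"=", "$"}},
    9: {"L -> * . R": {"$"}, "R -> . L": {"$"},
        "L -> . * R": {"$"}, "L -> . id": {"$"}},
    10: {"L -> * R .": {"$"}},
    11: {"L -> id .": {"$"}},
    12: {"R -> L .": {"$"}},
}


def merge_by_core(states):
    """Collapse states whose sets of cores coincide; lookaheads are unioned."""
    groups = defaultdict(list)
    for number, items in states.items():
        groups[frozenset(items)].append(number)      # the dict keys are the cores
    merged = {}
    for cores, numbers in groups.items():
        name = tuple(sorted(numbers))                # e.g. (4, 9) plays the role of I49
        merged[name] = {c: set().union(*(states[n][c] for n in numbers))
                        for c in cores}
    return merged


lalr = merge_by_core(LR1_STATES)
for name in sorted(lalr):
    print(name, lalr[name])
```

State 2 is not merged with states 8 and 12 even though all three contain the core R → L., because merging compares the whole set of cores of a state, not individual items.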

CH4.12

Error Recovery in LR Parsing

An error is detected when, for a given stack $...Ii and input symbols s…s’…$, it holds that action[i, s] = empty.

Panic-mode error recovery.

CH4.13

Panic Recovery Strategy I

Scan down the stack till a state Ij is found such that:
Ij moves with the non-terminal A to some state Ik
Ik moves with s’ to some state Ik’

Proceed as follows:
Pop all states till Ij
Push A and state Ik
Discard all symbols from the input till s’

There may be many choices as above.
[Essentially the parser in this way determines that a string that is produced by A has an error; it assumes it is correct and advances.]

Error message: construct of type “A” has error at location X
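Strategy I can be sketched as a search over the stack and the remaining input. The table fragment below is hypothetical data in the style of an expression grammar, and the function name is an assumption. The stack holds state numbers only, so "push A and state Ik" becomes pushing k.

```python
# Hypothetical fragment of an LR table, for illustration only.
ACTION = {(0, "id"): "s3", (0, "("): "s2", (1, "+"): "s4", (1, "*"): "s5",
          (4, "id"): "s3", (4, "("): "s2", (7, "*"): "s5"}
GOTO = {(0, "E"): 1, (4, "E"): 7}


def panic_recover(stack, tokens, pos):
    """Return (new_stack, new_pos, A), or None if no recovery point exists."""
    for depth in range(len(stack) - 1, -1, -1):          # scan down the stack
        j = stack[depth]
        for (state, A), k in GOTO.items():               # I_j moves with A to I_k
            if state != j:
                continue
            for p in range(pos, len(tokens)):            # first s' that I_k can shift
                if ACTION.get((k, tokens[p]), "").startswith("s"):
                    # pop down to I_j, push I_k, discard input up to s'
                    return stack[:depth + 1] + [k], p, A
    return None


# Stack after reading "E +", then the unexpected input ") * id $".
result = panic_recover([0, 1, 4], [")", "*", "id", "$"], 0)
print(result)
```

This sketch takes the first choice it finds, preferring states near the top of the stack; as the slide notes, other choices are possible.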

CH4.14

Panic Recovery Strategy II

Scan down the stack till a state Ij is found such that:
Ij moves with the terminal t to some state Ik
Ik with s’ has a valid action.

Proceed as follows:
Pop all states till Ij
Push t and state Ik
Discard all symbols from the input till s’

There may be many choices as above.

Error message: “missing t”

CH4.15

Example

E’ → E
E → E + E | E * E | ( E ) | id

        action                                  goto
state   id    +       *       (     )     $     E
0       s3    e1      e1      s2    e2    e1    1
1       e3    s4      s5      e3    e2    acc
2       s3    e1      e1      s2    e2    e1    6
3       r4    r4      r4      r4    r4    r4
4       s3    e1      e1      s2    e2    e1    7
5       s3    e1      e1      s2    e2    e1    8
6       e3    s4      s5      e3    s9    e4
7       r1    r1      s5      r1    r1    r1
8       r2    r2      r2      r2    r2    r2
9       r3    r3      r3      r3    r3    r3

CH4.16

Collection of LR(0) items

E’ → E
E → E + E | E * E | ( E ) | id

I0: E’ → .E, E → .E + E, E → .E * E, E → .( E ), E → .id

I1: E’ → E., E → E. + E, E → E. * E

I2: E → (.E ), E → .E + E, E → .E * E, E → .( E ), E → .id

I3: E → id.

I4: E → E + .E, E → .E + E, E → .E * E, E → .( E ), E → .id

I5: E → E * .E, E → .E + E, E → .E * E, E → .( E ), E → .id

I6: E → ( E. ), E → E. + E, E → E. * E

I7: E → E + E., E → E. + E, E → E. * E

I8: E → E * E., E → E. + E, E → E. * E

I9: E → ( E ).

Follow(E’) = { $ }
Follow(E) = { +, *, ), $ }

CH4.17

The parsing table

state   id    +       *       (     )     $     E
0       s3                    s2                1
1             s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3

CH4.18

Error-handling

state   id    +       *       (     )     $     E
0       s3    e1              s2                1
1             s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3

CH4.19

Error-handling

I0: E’ → .E, E → .E + E, E → .E * E, E → .( E ), E → .id

I2: E → (.E ), E → .E + E, E → .E * E, E → .( E ), E → .id

I5: E → E * .E, E → .E + E, E → .E * E, E → .( E ), E → .id

I8: E → E * E., E → E. + E, E → E. * E

e1: Push E into the stack and move to state 1
“missing operand”

e1: Push id into the stack and change to state 3
“missing operand”

CH4.20

Error-handling

state   id    +       *       (     )     $     E
0       s3    e1      e1      s2          e1    1
1             s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3

CH4.21

Error-handling

state   id    +       *       (     )     $     E
0       s3    e1      e1      s2    e2    e1    1
1             s4      s5            e2    acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3    e1              s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3

CH4.22

Error-handling

e2: Remove “)” from the input.
“unbalanced right parenthesis”

Try the input id+)

CH4.23

Error-handling state 1

state   id    +       *       (     )     $     E
0       s3    e1      e1      s2    e2    e1    1
1       e3    s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3

CH4.24

Error-Handling

I1: E’ → E., E → E. + E, E → E. * E

I3: E → id.

I4: E → E + .E, E → .E + E, E → .E * E, E → .( E ), E → .id

I6: E → ( E. ), E → E. + E, E → E. * E

I7: E → E + E., E → E. + E, E → E. * E

I9: E → ( E ).

e3: Push + into the stack and change to state 4
“missing operator”
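The completed error table from the Example slide (CH4.15) can be run on the suggested input id+). The driver below is a minimal sketch: the stack holds state numbers only, e1 uses the "push id, go to state 3" variant, and the behaviour of e4 (push “)” and go to state 9, “missing right parenthesis”) is an assumption based on the standard textbook treatment, since the slides do not describe it.

```python
TERMS = ["id", "+", "*", "(", ")", "$"]
ROWS = {                       # action part of the table, one string per state
    0: "s3 e1 e1 s2 e2 e1",
    1: "e3 s4 s5 e3 e2 acc",
    2: "s3 e1 e1 s2 e2 e1",
    3: "r4 r4 r4 r4 r4 r4",
    4: "s3 e1 e1 s2 e2 e1",
    5: "s3 e1 e1 s2 e2 e1",
    6: "e3 s4 s5 e3 s9 e4",
    7: "r1 r1 s5 r1 r1 r1",
    8: "r2 r2 r2 r2 r2 r2",
    9: "r3 r3 r3 r3 r3 r3",
}
ACTION = {(s, t): a for s, row in ROWS.items() for t, a in zip(TERMS, row.split())}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}          # goto on the only nonterminal E
RHS_LEN = {1: 3, 2: 3, 3: 3, 4: 1}         # E+E, E*E, (E), id


def parse(tokens):
    """Parse a token list and return the list of error messages (empty if none)."""
    stack, pos, errors = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        act = ACTION[(stack[-1], tokens[pos])]
        if act == "acc":
            return errors
        kind, n = act[0], int(act[1:])
        if kind == "s":                    # shift
            stack.append(n)
            pos += 1
        elif kind == "r":                  # reduce by production n
            del stack[-RHS_LEN[n]:]
            stack.append(GOTO_E[stack[-1]])
        elif n == 1:                       # e1: pretend an id was seen
            stack.append(3)
            errors.append("missing operand")
        elif n == 2:                       # e2: drop the ")" from the input
            pos += 1
            errors.append("unbalanced right parenthesis")
        elif n == 3:                       # e3: pretend a + was seen
            stack.append(4)
            errors.append("missing operator")
        else:                              # e4: assumed behaviour, see lead-in
            stack.append(9)
            errors.append("missing right parenthesis")


print(parse(["id", "+", ")"]))
```

On id+) the parser first discards the stray “)” and then, at end of input, supplies the operand that + is still waiting for, so two messages are reported and parsing still reaches accept.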

CH4.25

Intro to Translation

Side-effects and Translation Schemes.

Do the construction as before but:
A side-effect in front of a symbol will be executed in a state when we make the move following that symbol to another state.
Side-effects on the rightmost end are executed during reduce actions.

E’ → E
E → E + E {print(+)}
  | E * E {print(*)}
  | {parenthesis++} ( E ) {parenthesis--}
  | id { print(id); print(parenthesis); }

Do for example id*(id+id)$

Side-effects are attached to the symbols to the right of them.
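The suggested exercise id*(id+id)$ can be traced with a small driver. This sketch reuses the shift/reduce entries of the expression table from the Example slide (CH4.15), with the error entries replaced by "--"; the function name and the choice to collect the printed values in a list are assumptions for this example.

```python
TERMS = ["id", "+", "*", "(", ")", "$"]
ROWS = {                       # "--" marks an empty (error) entry
    0: "s3 -- -- s2 -- --",
    1: "-- s4 s5 -- -- acc",
    2: "s3 -- -- s2 -- --",
    3: "-- r4 r4 -- r4 r4",
    4: "s3 -- -- s2 -- --",
    5: "s3 -- -- s2 -- --",
    6: "-- s4 s5 -- s9 --",
    7: "-- r1 s5 -- r1 r1",
    8: "-- r2 r2 -- r2 r2",
    9: "-- r3 r3 -- r3 r3",
}
ACTION = {(s, t): a for s, row in ROWS.items() for t, a in zip(TERMS, row.split())}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}
RHS_LEN = {1: 3, 2: 3, 3: 3, 4: 1}         # E+E, E*E, (E), id


def translate(tokens):
    """Run the parser and return everything the side-effects would print."""
    out, parenthesis = [], 0
    stack, pos = [0], 0
    tokens = list(tokens) + ["$"]
    while True:
        tok = tokens[pos]
        act = ACTION[(stack[-1], tok)]
        if act == "acc":
            return out
        if act == "--":
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        n = int(act[1:])
        if act[0] == "s":
            if tok == "(":
                parenthesis += 1           # side-effect in front of "(": runs on the move
            stack.append(n)
            pos += 1
        else:                              # side-effects at the right end run on reduce
            if n == 1:
                out.append("+")
            elif n == 2:
                out.append("*")
            elif n == 3:
                parenthesis -= 1
            else:
                out += ["id", parenthesis]
            del stack[-RHS_LEN[n]:]
            stack.append(GOTO_E[stack[-1]])


print(translate(["id", "*", "(", "id", "+", "id", ")"]))
```

The output lists each id followed by the parenthesis depth at which it was reduced, and the operators appear in postfix order because their side-effects run at reduce time.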