LR parser ::= an efficient bottom-up parser for a large and useful class of context-free grammars.
The “L” stands for left-to-right scan of the input; the “R” for constructing a rightmost derivation in reverse.
Why LR parsers are attractive:
(1) LR parsers can be constructed for most programming languages.
(2) The LR parsing method is more general than the LL parsing method.
(3) LR parsers can detect syntactic errors as soon as possible.
But it is too much work to implement an LR parser by hand for a typical programming-language grammar. ⇒ Parser generator
LR Parsing
(Figure: nested grammar classes SLR ⊂ LALR ⊂ CLR)
Techniques for producing LR parsing tables:
Simple LR (SLR) - LR(0) items, FOLLOW
Canonical LR (CLR) - LR(1) items
Lookahead LR (LALR) - ① LR(1) items, or ② LR(0) items plus lookaheads
(Figure: model of an LR parser: an input buffer a1 … ai … an $, a stack with state Sm on top, a driver routine, and a parsing table.)
Stack: $S_0 X_1 S_1 X_2 S_2 \cdots X_m S_m$, where $S_i$ is a state and $X_i \in V$.
Configuration of an LR parser:
$(S_0 X_1 S_1 \cdots X_m S_m,\ a_i a_{i+1} \cdots a_n \$)$
(stack contents, unscanned input)
(Figure: layout of the LR parsing table. Rows are states; the ACTION table has one column per terminal and the GOTO table one column per nonterminal.)
LR parsing table = ACTION table + GOTO table.
The LR parsing algorithm ::= the same as the shift-reduce parsing algorithm.
Four actions: shift, reduce, accept, error.
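1. ACTION[Sm, ai] = shift S:
$(S_0 X_1 S_1 \cdots X_m S_m,\ a_i a_{i+1} \cdots a_n \$) \vdash (S_0 X_1 S_1 \cdots X_m S_m\, a_i\, S,\ a_{i+1} \cdots a_n \$)$
2. ACTION[Sm, ai] = reduce A → α, with |α| = r:
pop 2r entries, $(S_0 X_1 S_1 \cdots X_m S_m,\ a_i a_{i+1} \cdots a_n \$) \to (S_0 X_1 S_1 \cdots X_{m-r} S_{m-r},\ a_i a_{i+1} \cdots a_n \$)$,
then push A and $S = \mathrm{GOTO}(S_{m-r}, A)$: $(S_0 X_1 S_1 \cdots X_{m-r} S_{m-r}\, A\, S,\ a_i a_{i+1} \cdots a_n \$)$
3. ACTION[Sm, ai] = accept: parsing is completed.
4. ACTION[Sm, ai] = error: the parser has discovered an error and calls an error-recovery routine.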
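The shift and reduce moves above can be written as two small functions on configurations. This is a minimal sketch that assumes a hypothetical encoding: the stack is a Python list alternating states and grammar symbols ([S0, X1, S1, ..., Xm, Sm]), and the unscanned input is a list ending in "$". The names `shift` and `reduce_by` are illustrative and do not come from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a configuration (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $):
# the stack alternates states and symbols; the input is a list of tokens.
Config = Tuple[List, List[str]]


def shift(config: Config, s: int) -> Config:
    """ACTION[Sm, ai] = shift S: push ai and then S, consuming ai."""
    stack, inp = config
    return stack + [inp[0], s], inp[1:]


def reduce_by(config: Config, lhs: str, r: int,
              goto: Dict[Tuple[int, str], int]) -> Config:
    """ACTION[Sm, ai] = reduce A -> alpha with |alpha| = r."""
    stack, inp = config
    # Pop 2r entries (r symbols and their r states), exposing S_{m-r}.
    base = stack[: len(stack) - 2 * r]
    # Push A and S = GOTO(S_{m-r}, A); the input ai ... an $ is unchanged.
    return base + [lhs, goto[(base[-1], lhs)]], inp
```

Consider the configuration ([0, "LIST", 1, ",", 4], ["a", "$"]). Calling `shift(c, 3)` yields ([0, "LIST", 1, ",", 4, "a", 3], ["$"]). Calling `reduce_by` with "ELEMENT", r = 1 and GOTO(4, ELEMENT) = 5 then yields ([0, "LIST", 1, ",", 4, "ELEMENT", 5], ["$"]). Because the input is untouched by a reduce, the same lookahead ai is consulted again on the next step.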
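G: 1. LIST → LIST , ELEMENT 2. LIST → ELEMENT 3. ELEMENT → a
Parsing table (ACTION columns: a , $; GOTO columns: LIST, ELEMENT):

| state | a | , | $ | LIST | ELEMENT |
|---|---|---|---|---|---|
| 0 | s3 |  |  | 1 | 2 |
| 1 |  | s4 | acc |  |  |
| 2 |  | r2 | r2 |  |  |
| 3 |  | r3 | r3 |  |  |
| 4 | s3 |  |  |  | 5 |
| 5 |  | r1 | r1 |  |  |

where sj means shift and stack state j, ri means reduce by the production numbered i, acc means accept, and blank means error.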
Input: a , a
Parsing configurations (the first row is the initial configuration):

| STACK | INPUT | ACTION |
|---|---|---|
| 0 | a , a $ | s3 |
| 0 a 3 | , a $ | r3, GOTO 2 |
| 0 ELEMENT 2 | , a $ | r2, GOTO 1 |
| 0 LIST 1 | , a $ | s4 |
| 0 LIST 1 , 4 | a $ | s3 |
| 0 LIST 1 , 4 a 3 | $ | r3, GOTO 5 |
| 0 LIST 1 , 4 ELEMENT 5 | $ | r1, GOTO 1 |
| 0 LIST 1 | $ | accept |
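The trace above can be reproduced with a small table-driven driver. This is a sketch that assumes a hypothetical dictionary encoding of the parsing table for G. A missing ACTION entry is treated as error, and error recovery is not modeled. The names `PRODS`, `ACTION`, `GOTO` and `lr_parse` are illustrative.

```python
# Productions of G, numbered as in the text: number -> (left side, |right side|).
PRODS = {1: ("LIST", 3), 2: ("LIST", 1), 3: ("ELEMENT", 1)}

# ACTION table: (state, terminal) -> ("s", j) | ("r", i) | ("acc", None).
ACTION = {
    (0, "a"): ("s", 3),
    (1, ","): ("s", 4), (1, "$"): ("acc", None),
    (2, ","): ("r", 2), (2, "$"): ("r", 2),
    (3, ","): ("r", 3), (3, "$"): ("r", 3),
    (4, "a"): ("s", 3),
    (5, ","): ("r", 1), (5, "$"): ("r", 1),
}

# GOTO table: (state, nonterminal) -> state.
GOTO = {(0, "LIST"): 1, (0, "ELEMENT"): 2, (4, "ELEMENT"): 5}


def lr_parse(tokens):
    """Return (accepted, trace); each trace row is (STACK, INPUT, ACTION)."""
    stack = [0]                      # initial configuration: (S0, a1 ... an $)
    inp = list(tokens) + ["$"]
    trace = []
    while True:
        row = (" ".join(map(str, stack)), " ".join(inp))
        act = ACTION.get((stack[-1], inp[0]))   # blank entry means error
        if act is None:
            trace.append(row + ("error",))
            return False, trace
        kind, arg = act
        if kind == "s":                          # shift: push ai and state j
            stack += [inp[0], arg]
            inp = inp[1:]
            trace.append(row + (f"s{arg}",))
        elif kind == "r":                        # reduce by production arg
            lhs, r = PRODS[arg]
            del stack[len(stack) - 2 * r:]       # pop r symbols and r states
            nxt = GOTO[(stack[-1], lhs)]
            stack += [lhs, nxt]
            trace.append(row + (f"r{arg} GOTO {nxt}",))
        else:                                    # accept
            trace.append(row + ("accept",))
            return True, trace


if __name__ == "__main__":
    ok, trace = lr_parse(["a", ",", "a"])
    for stack, rest, action in trace:
        print(f"{stack:<26}{rest:<10}{action}")
```

Running it on a , a prints the same eight rows as the trace table. An input such as a a stops with error in state 3, because that row has no entry under a.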
Methods for constructing an LR parsing table from a grammar: ① SLR ② LALR ③ CLR
Definition: an LR(0) item ::= a production with a dot at some position of its right side.
ex) A → XYZ ∈ P gives [A → .XYZ], [A → X.YZ], [A → XY.Z], [A → XYZ.]
mark symbol ::= the symbol after the dot, if it exists.
kernel item ::= [A → α.β] where α ≠ ε, or A = S'.
closure item ::= [A → .α] (A ≠ S'): an item added by performing the CLOSURE operation.
reduce item ::= [A → α.]
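The item vocabulary above can be made concrete with a short sketch. It assumes a hypothetical encoding of an LR(0) item as a (lhs, rhs, dot) triple, where rhs is a tuple of symbols and dot is the position of the dot, and it names the augmented start symbol "S'". The helpers `lr0_items`, `show`, `mark_symbol` and `kinds` are illustrative only.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of the production lhs -> rhs (rhs is a tuple of symbols)."""
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item in the [A -> X . Y Z] notation."""
    lhs, rhs, dot = item
    return f"[{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}]"


def mark_symbol(item):
    """The symbol after the dot, or None for a reduce item."""
    _, rhs, dot = item
    return rhs[dot] if dot < len(rhs) else None


def kinds(item, aug_start="S'"):
    """Classify an item as kernel or closure, and additionally as reduce."""
    lhs, rhs, dot = item
    result = ["kernel" if dot > 0 or lhs == aug_start else "closure"]
    if dot == len(rhs):
        result.append("reduce")
    return result
```

For A → XYZ, `lr0_items` yields the four items listed in the example, and `kinds` marks [A → XYZ.] as both a kernel item and a reduce item. For an ε-production A → ε, the single item [A → .] comes out as both a closure item and a reduce item.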
[A → α.β] means that an input string derivable from α has just been seen, and if we next see an input string derivable from β, we may be able to reduce by the production A → αβ.
Definition: Augmented grammar
$G = (V_N, V_T, P, S)$
$G' = (V_N \cup \{S'\}, V_T, P \cup \{S' \to S\}, S')$
where S' is a new start symbol not in VN.
The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs when and only when the parser is about to reduce by S' → S.
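Building G' from G is mechanical. This is a minimal sketch that assumes productions are (lhs, rhs) pairs with rhs a tuple. The new start symbol is formed by appending primes until it is fresh; the slides themselves name it ACCEPT in a later example. The function name `augment` is hypothetical.

```python
def augment(nonterminals, terminals, productions, start):
    """Return G' = (VN ∪ {S'}, VT, P ∪ {S' -> S}, S') for G = (VN, VT, P, S).

    productions is a list of (lhs, rhs) pairs with rhs a tuple of symbols.
    """
    new_start = start + "'"
    # S' must be a new symbol, not already used anywhere in the grammar.
    while new_start in nonterminals or new_start in terminals:
        new_start += "'"
    # Putting S' -> S first gives it production number 0.
    return (nonterminals | {new_start}, terminals,
            [(new_start, (start,))] + list(productions), new_start)
```

Placing S' → S at index 0 is a convenience. A table builder can then treat a reduce item of production 0 as accept.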
If $S \Rightarrow_{rm}^{*} \alpha A \omega \Rightarrow_{rm} \alpha \beta_1 \beta_2 \omega$, then $\alpha\beta_1$ is a viable prefix.
“A viable prefix is a prefix of a right-sentential form that does not continue past the right end of the handle of that sentential form.”
We say item [A → β1.β2] is valid for a viable prefix αβ1 if there is a derivation $S \Rightarrow_{rm}^{*} \alpha A \omega \Rightarrow_{rm} \alpha \beta_1 \beta_2 \omega$.
“In general, an item will be valid for many viable prefixes.”
Canonical collection of LR(0) items ::= the set of valid items for each viable prefix that can appear on the stack of an LR parser. Computation: the CLOSURE and GOTO functions.
Definition:
$\mathrm{CLOSURE}(I) = I \cup \{[B \to .\gamma] \mid [A \to \alpha.B\beta] \in \mathrm{CLOSURE}(I),\ B \to \gamma \in P\}$
Meaning: [A → α.Bβ] in CLOSURE(I) indicates that, at some point in the parsing process, we next expect to see a substring derivable from Bβ as input. If B → γ is a production, we would also expect to see a substring derivable from γ at this point. For this reason, we also include [B → .γ] in CLOSURE(I).
Computing algorithm:
Algorithm CLOSURE(I);
begin
  CLOSURE := I;
  repeat
    for each item [A → α.Bβ] ∈ CLOSURE and each production B → γ ∈ P do
      if [B → .γ] ∉ CLOSURE then
        CLOSURE := CLOSURE ∪ {[B → .γ]}
      fi
    end for
  until no change
end.
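The algorithm transcribes almost line for line into Python. This is a sketch that assumes items are encoded as hypothetical (lhs, rhs, dot) triples, with rhs a tuple, and the grammar as a list of (lhs, rhs) productions. The repeat-until-no-change loop is kept as a `changed` flag.

```python
def closure(items, grammar):
    """CLOSURE(I) for LR(0) items.

    items:   iterable of (lhs, rhs, dot) triples, rhs a tuple of symbols
    grammar: list of productions (lhs, rhs)
    """
    result = set(items)
    changed = True
    while changed:                        # repeat ... until no change
        changed = False
        for lhs, rhs, dot in list(result):
            if dot == len(rhs):
                continue                  # reduce item: nothing after the dot
            b = rhs[dot]                  # [A -> alpha . B beta]
            for head, body in grammar:
                if head == b and (head, body, 0) not in result:
                    result.add((head, body, 0))   # add [B -> . gamma]
                    changed = True
    return frozenset(result)
```

Encode Example 2 that follows as `[("S", ("A", "S")), ("S", ("b",)), ("A", ("S", "A")), ("A", ("a",))]`. Then `closure({("S", ("A", "S"), 1)}, ...)` returns exactly the five items listed there. Returning a `frozenset` lets item sets be compared and stored later when building the canonical collection.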
Example 1) E' → E
E → E + T | T, T → T * F | F, F → (E) | id
CLOSURE({[E' → .E]}) = {[E' → .E], [E → .E+T], [E → .T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}.
CLOSURE({[E → E.+T]}) = {[E → E.+T]}.
Example 2) S → AS | b, A → SA | a
CLOSURE({[S → A.S]}) = {[S → A.S], [S → .AS], [S → .b], [A → .SA], [A → .a]}.
Definition: $\mathrm{GOTO}(I, X) = \mathrm{CLOSURE}(\{[A \to \alpha X.\beta] \mid [A \to \alpha.X\beta] \in I\})$.
Meaning: if I is the set of items that are valid for some viable prefix γ, then GOTO(I, X) is the set of items that are valid for the viable prefix γX.
ex) I = {[E' → E.], [E → E.+T]}: GOTO(I, +) = CLOSURE({[E → E+.T]})
= {[E → E+.T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}
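GOTO(I, X) is a one-line move of the dot followed by CLOSURE. This is a sketch that assumes the same hypothetical (lhs, rhs, dot) item triples. CLOSURE is recomputed here with a worklist, which reaches the same fixed point as the repeat-until-no-change loop, so that the block stands alone.

```python
def closure(items, grammar):
    """CLOSURE(I), computed with a worklist instead of repeat-until-no-change."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs):
            for head, body in grammar:
                new = (head, body, 0)
                if head == rhs[dot] and new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)


def goto(items, x, grammar):
    """GOTO(I, X) = CLOSURE({[A -> alpha X . beta] | [A -> alpha . X beta] in I})."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved, grammar)
```

With the expression grammar encoded using `*` for the multiplication operator, GOTO(I, +) for I = {[E' → E.], [E → E.+T]} gives the five items of the example. GOTO(I, X) for a symbol X that is not a mark symbol of I is the empty set.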
$C_0 = \{\mathrm{CLOSURE}(\{[S' \to .S]\})\} \cup \{\mathrm{GOTO}(I, X) \mid I \in C_0,\ X \in V\}$
We are now ready to give the algorithm to construct C0, the canonical collection of sets of LR(0) items for an augmented grammar; the algorithm is as follows:
Construction algorithm of C0:
Algorithm Canonical_Collection;
begin
  C0 := { CLOSURE({[S' → .S]}) };
  repeat
    for each I ∈ C0 do
      Closure := CLOSURE(I);
      for each X ∈ MARK SYMBOL of Closure do
        J := GOTO(I, X);
        if J = Ji for some Ji ∈ C0 then GOTO[I, X] := Ji
        else GOTO[I, X] := J; C0 := C0 ∪ {J}
        fi
      end for
    end for
  until no change
end.
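The construction of C0 can be sketched as a worklist over item sets. This sketch assumes hypothetical (lhs, rhs, dot) item triples and a grammar list whose first production is the augmented S' → S. Instead of repeating until no change, it processes each set exactly once in the order it was added, which reaches the same collection. Mark symbols are visited in sorted order, so state numbers may differ from a hand-built numbering.

```python
def closure(items, grammar):
    """CLOSURE(I) via a worklist."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs):
            for head, body in grammar:
                new = (head, body, 0)
                if head == rhs[dot] and new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)


def goto(items, x, grammar):
    """GOTO(I, X): advance the dot over X, then take the closure."""
    return closure({(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}, grammar)


def canonical_collection(grammar):
    """Return (C0 as a list of item sets, GOTO transitions {(i, X): j}).

    grammar[0] is assumed to be the augmented production S' -> S.
    """
    aug_lhs, aug_rhs = grammar[0]
    states = [closure({(aug_lhs, aug_rhs, 0)}, grammar)]   # C0 := {CLOSURE({[S' -> .S]})}
    transitions = {}
    i = 0
    while i < len(states):            # every set added to C0 is processed once
        marks = sorted({r[d] for _, r, d in states[i] if d < len(r)})  # mark symbols
        for x in marks:
            target = goto(states[i], x, grammar)
            if target not in states:  # J is new: C0 := C0 ∪ {J}
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions
```

Apply it to the augmented LIST grammar of the following example. It yields six item sets and six transitions: the same sets as I0 through I5, possibly numbered in a different order.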
G: LIST → LIST , ELEMENT
   LIST → ELEMENT
   ELEMENT → a
Augmented grammar
G': ACCEPT → LIST
    LIST → LIST , ELEMENT
    LIST → ELEMENT
    ELEMENT → a
C0: I0 = CLOSURE({[ACCEPT → .LIST]})
= {[ACCEPT → .LIST], [LIST → .LIST , ELEMENT], [LIST → .ELEMENT], [ELEMENT → .a]}.
GOTO(I0, LIST) = I1 = {[ACCEPT → LIST.], [LIST → LIST. , ELEMENT]}.
GOTO(I0, ELEMENT) = I2 = {[LIST → ELEMENT.]}.
GOTO(I0, a) = I3 = {[ELEMENT → a.]}.
GOTO(I1, ',') = I4 = {[LIST → LIST , .ELEMENT], [ELEMENT → .a]}.
GOTO(I4, ELEMENT) = I5 = {[LIST → LIST , ELEMENT.]}.
GOTO(I4, a) = I3.
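GOTO graph ::= a directed graph in which the nodes are labeled by the sets of items and the edges by grammar symbols.
Ex) (Figure: GOTO graph of C0 above) nodes I0 to I5; edges I0 on LIST: I1, I0 on ELEMENT: I2, I0 on a: I3, I1 on ',': I4, I4 on ELEMENT: I5, I4 on a: I3.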
Example 1) G: PR → b DL ; SL e    (rename PR as P)
DL → d ; DL | d    (rename DL as D)
SL → s ; SL | s    (rename SL as S)
Renamed G: P → bD;Se, D → d;D | d, S → s;S | s
Example 2) G: S → S + A | A
A → (S) | a(S) | a
• For an ε-production A → ε, the LR(0) item [A → .] is both a closure item and a reduce item.
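C0 (figure: GOTO graph of the LR(0) item sets for the renamed grammar of Example 1, augmented with P' → P):
I0 = {[P' → .P], [P → .bD;Se]}; on P: I1, on b: I2
I1 = {[P' → P.]}
I2 = {[P → b.D;Se], [D → .d;D], [D → .d]}; on D: I3, on d: I4
I3 = {[P → bD.;Se]}; on ';': I5
I4 = {[D → d.;D], [D → d.]}; on ';': I6
I5 = {[P → bD;.Se], [S → .s;S], [S → .s]}; on S: I7, on s: I8
I6 = {[D → d;.D], [D → .d;D], [D → .d]}; on D: I9, on d: I4
I7 = {[P → bD;S.e]}; on e: I10
I8 = {[S → s.;S], [S → s.]}; on ';': I11
I9 = {[D → d;D.]}
I10 = {[P → bD;Se.]}
I11 = {[S → s;.S], [S → .s;S], [S → .s]}; on S: I12, on s: I8
I12 = {[S → s;S.]}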
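Parsing table
(Figure: layout of the parsing table. Rows are states 0, 1, 2, 3, …; the ACTION table has one column per symbol in VT ∪ {$}, with entries shift, reduce, accept, or error; the GOTO table has one column per symbol in VN, with state numbers as entries.)
Three methods:
SLR (Simple LR) - C0, FOLLOW
CLR (Canonical LR) - C1
LALR (Lookahead LR) - C1, or C0 with lookaheads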
State i is constructed from Ii, where Ii ∈ C0.
The size of the parsing table depends on the number of states in the collection used, and |C0| << |C1|.
The size of the parsing table:
SLR: |V| × |C0|
CLR: |V| × |C1|
LALR: |V| × |C0|
SLR ::= the method of constructing the SLR parsing table from C0.
Constructing algorithm, for C0 = {I0, I1, I2, ..., In}:
1. ACTION[i, a] := "shift j" if [A → α.aβ] ∈ Ii and GOTO(Ii, a) = Ij.
2. ACTION[i, a] := "reduce A → α" for all a ∈ FOLLOW(A), if [A → α.] ∈ Ii and A ≠ S'.
3. ACTION[i, $] := "accept" if [S' → S.] ∈ Ii.
4. GOTO[i, A] := j if GOTO(Ii, A) = Ij.
5. All undefined entries are "error", and the initial state is i if [S' → .S] ∈ Ii.
Reduce items are resolved using FOLLOW: the reduce action is entered only under the symbols in FOLLOW(A).
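Rules 1 to 5 can be sketched on top of CLOSURE, GOTO and the canonical collection. The sketch makes several assumptions. Items are hypothetical (lhs, rhs, dot) triples. The grammar's first production is S' → S. FOLLOW sets are passed in precomputed. When two different actions land in the same entry, the first one is kept and the clash is recorded as a conflict rather than raised. The function names are illustrative.

```python
def closure(items, grammar):
    """CLOSURE(I) for LR(0) items (lhs, rhs, dot), via a worklist."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs):
            for head, body in grammar:
                new = (head, body, 0)
                if head == rhs[dot] and new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)


def goto(items, x, grammar):
    """GOTO(I, X): advance the dot over X, then take the closure."""
    return closure({(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}, grammar)


def canonical_collection(grammar):
    """C0 as a list of item sets plus transitions {(i, X): j}; grammar[0] is S' -> S."""
    states = [closure({(grammar[0][0], grammar[0][1], 0)}, grammar)]
    trans, i = {}, 0
    while i < len(states):
        for x in sorted({r[d] for _, r, d in states[i] if d < len(r)}):
            target = goto(states[i], x, grammar)
            if target not in states:
                states.append(target)
            trans[(i, x)] = states.index(target)
        i += 1
    return states, trans


def slr_table(grammar, follow):
    """Apply rules 1-5; follow maps each nonterminal A (other than S') to FOLLOW(A)."""
    states, trans = canonical_collection(grammar)
    nonterminals = {lhs for lhs, _ in grammar}
    aug_start = grammar[0][0]
    action, goto_table, conflicts = {}, {}, []

    def enter(i, a, act):
        old = action.setdefault((i, a), act)
        if old != act:                          # a second action: not SLR(1)
            conflicts.append((i, a, old, act))

    for (i, x), j in trans.items():
        if x in nonterminals:
            goto_table[(i, x)] = j              # rule 4
        else:
            enter(i, x, ("shift", j))           # rule 1
    for i, items in enumerate(states):
        for lhs, rhs, dot in items:
            if dot < len(rhs):
                continue
            if lhs == aug_start:
                enter(i, "$", ("accept",))      # rule 3
            else:
                for a in follow[lhs]:           # rule 2: only on FOLLOW(A)
                    enter(i, a, ("reduce", lhs, rhs))
    # Rule 5: every (state, symbol) pair not in action is an error; state 0 is initial.
    return action, goto_table, conflicts
```

For the grammar A → L, L → L , E | E, E → a of the next example, with FOLLOW(L) = FOLLOW(E) = {',', $}, this produces ten ACTION entries, three GOTO entries and no conflicts. That matches the table given for that grammar.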
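C0 (figure: GOTO graph):
I0 = {[A → .L], [L → .L,E], [L → .E], [E → .a]}; on L: I1, on E: I2, on a: I3
I1 = {[A → L.], [L → L.,E]}; on ',': I4
I2 = {[L → E.]}
I3 = {[E → a.]}
I4 = {[L → L,.E], [E → .a]}; on E: I5, on a: I3
I5 = {[L → L,E.]}
G: 0. A → L (A: ACCEPT, L: LIST, E: ELEMENT) 1. L → L , E 2. L → E 3. E → a
FOLLOW(A) = {$}, FOLLOW(L) = {',', $}, FOLLOW(E) = {',', $}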
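The FOLLOW sets above are stated without derivation. The sketch below computes them with the usual fixed-point iteration over FIRST and FOLLOW, a technique not spelled out in this section. It assumes productions are (lhs, rhs) pairs, ε-productions are empty tuples, and "$" is the end marker. The function name `first_follow` is hypothetical.

```python
def first_follow(grammar, start):
    """Fixed-point computation of FIRST and FOLLOW for every nonterminal.

    grammar: list of (lhs, rhs) with rhs a tuple; an empty tuple is an ε-production.
    """
    nonterminals = {lhs for lhs, _ in grammar}
    first = {A: set() for A in nonterminals}
    follow = {A: set() for A in nonterminals}
    nullable = set()
    follow[start].add("$")                   # $ follows the start symbol

    def first_of(seq):
        """FIRST of a symbol string, plus whether the whole string can derive ε."""
        out = set()
        for sym in seq:
            if sym not in nonterminals:      # a terminal ends the scan
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            f, eps = first_of(rhs)
            if not f <= first[lhs] or (eps and lhs not in nullable):
                first[lhs] |= f
                if eps:
                    nullable.add(lhs)
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nonterminals:
                    # For A -> alpha B beta: FOLLOW(B) gets FIRST(beta),
                    # plus FOLLOW(A) when beta can derive ε.
                    f, eps = first_of(rhs[i + 1:])
                    new = (f | follow[lhs]) if eps else f
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow
```

With the grammar above encoded as `[("A", ("L",)), ("L", ("L", ",", "E")), ("L", ("E",)), ("E", ("a",))]` and start symbol "A", the result is FOLLOW(A) = {$}, FOLLOW(L) = {',', $}, FOLLOW(E) = {',', $}, as listed.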
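Parsing table (ACTION columns: a , $; GOTO columns: L, E):

| state | a | , | $ | L | E |
|---|---|---|---|---|---|
| I0 | s3 |  |  | 1 | 2 |
| I1 |  | s4 | acc |  |  |
| I2 |  | r2 | r2 |  |  |
| I3 |  | r3 | r3 |  |  |
| I4 | s3 |  |  |  | 5 |
| I5 |  | r1 | r1 |  |  |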
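G: 1. S → L = R 2. S → R 3. L → *R 4. L → id 5. R → L
C0 (figure: GOTO graph of the LR(0) item sets, augmented with S' → S):
I0 = {[S' → .S], [S → .L=R], [S → .R], [L → .*R], [L → .id], [R → .L]}; on S: I1, on L: I2, on R: I3, on *: I4, on id: I5
I1 = {[S' → S.]}
I2 = {[S → L.=R], [R → L.]}; on =: I6
I3 = {[S → R.]}
I4 = {[L → *.R], [R → .L], [L → .*R], [L → .id]}; on R: I7, on L: I8, on *: I4, on id: I5
I5 = {[L → id.]}
I6 = {[S → L=.R], [R → .L], [L → .*R], [L → .id]}; on R: I9, on L: I8, on *: I4, on id: I5
I7 = {[L → *R.]}
I8 = {[R → L.]}
I9 = {[S → L=R.]}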
Consider I2:
ACTION[2, =] := “shift 6” (from [S → L.=R] ∈ I2 and GOTO(I2, =) = I6)
ACTION[2, =] := “reduce R → L” (∵ = ∈ FOLLOW(R))
⇒ shift-reduce conflict, so G is not SLR(1).
Shift-reduce conflict vs. reduce-reduce conflict
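The check on I2 can be mechanized for a single item set. This is a minimal sketch under several assumptions. Items are hypothetical (lhs, rhs, dot) triples. The FOLLOW sets are those of this grammar: FOLLOW(S) = {$} and FOLLOW(L) = FOLLOW(R) = {=, $}. `slr_conflicts` is an illustrative helper that reports both conflict kinds named above.

```python
def slr_conflicts(items, follow, nonterminals):
    """List SLR(1) conflicts in one item set, given FOLLOW sets."""
    # Terminals after the dot produce shift actions (nonterminals go to GOTO).
    shift_on = {rhs[dot] for _, rhs, dot in items
                if dot < len(rhs) and rhs[dot] not in nonterminals}
    # Reduce items [A -> alpha .] produce reduce actions on FOLLOW(A).
    reduce_items = sorted((lhs, rhs) for lhs, rhs, dot in items if dot == len(rhs))
    conflicts = []
    for lhs, rhs in reduce_items:
        for a in sorted(follow[lhs] & shift_on):
            conflicts.append(("shift-reduce", a, f"{lhs} -> {' '.join(rhs)}"))
    for k, (l1, r1) in enumerate(reduce_items):
        for l2, r2 in reduce_items[k + 1:]:
            for a in sorted(follow[l1] & follow[l2]):
                conflicts.append(("reduce-reduce", a,
                                  f"{l1} -> {' '.join(r1)} / {l2} -> {' '.join(r2)}"))
    return conflicts
```

For I2 = {[S → L.=R], [R → L.]} this reports a single shift-reduce conflict on =, against R → L. An item set such as I9 = {[S → L=R.]} reports none.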