
Parsing

Jaruloj Chongstitvatana
Department of Mathematics and Computer Science
Chulalongkorn University

2

Outline

- Top-down parsing
  - Recursive-descent parsing
  - LL(1) parsing
    - LL(1) parsing algorithm
    - First and follow sets
    - Constructing LL(1) parsing table
    - Error recovery
- Bottom-up parsing
  - Shift-reduce parsers
  - LR(0) parsing
    - LR(0) items
    - Finite automata of items
    - LR(0) parsing algorithm
    - LR(0) grammar
  - SLR(1) parsing
    - SLR(1) parsing algorithm
    - SLR(1) grammar
    - Parsing conflict

3

Introduction

Parsing is the process of constructing a syntactic structure (i.e. a parse tree) from a stream of tokens.

We have already learned how to describe the syntactic structure of a language using a (context-free) grammar.

So, is this all a parser needs to do?

Stream of tokens + Context-free grammar → Parser → Parse tree

2301373 4

Top-Down Parsing
- A parse tree is created from root to leaves.
- The traversal of the parse tree is a preorder traversal.
- Traces a leftmost derivation.
- Two types:
  - Backtracking parser: tries different structures and backtracks if one does not match the input.
  - Predictive parser: guesses the structure of the parse tree from the next input.

Bottom-Up Parsing
- A parse tree is created from leaves to root.
- The traversal of the parse tree is a reversal of postorder traversal.
- Traces a rightmost derivation.
- More powerful than top-down parsing.

5

Parse Trees and Derivations

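Top-down parsing (leftmost derivation):
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Bottom-up parsing (rightmost derivation):
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id

[Figure: parse trees for id + id * id, as built by top-down and by bottom-up parsing]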

6

Top-down Parsing

What does a parser need to decide?
- Which production rule is to be used at each point in time?

How to guess? What is the guess based on?
- What is the next token?
  - Reserved word if, open parenthesis, etc.
- What is the structure to be built?
  - If statement, expression, etc.

7

Top-down Parsing

Why is it difficult?
- Cannot decide until later
  - Next token: if
  - Structure to be built: St
      St → MatchedSt | UnmatchedSt
      UnmatchedSt → if (E) St | if (E) MatchedSt else UnmatchedSt
      MatchedSt → if (E) MatchedSt else MatchedSt | ...
- Production with empty string
  - Next token: id
  - Structure to be built: par
      par → parList | ε
      parList → exp , parList | exp

8

Recursive-Descent

- Write one procedure for each set of productions with the same nonterminal on the LHS.
- Each procedure recognizes a structure described by a nonterminal.
- A procedure calls other procedures if it needs to recognize other structures.
- A procedure calls the match procedure if it needs to recognize a terminal.

9

Recursive-Descent: Example

Grammar:
  E → E O F | F
  O → + | -
  F → ( E ) | id

procedure F
{ switch token
  { case (:  match('('); E; match(')');
    case id: match(id);
    default: error;
  }
}

For this grammar:
- We cannot decide which rule to use for E, and
- If we choose E → E O F, it leads to infinite recursion:

procedure E
{ E; O; F; }

Rewrite the grammar into EBNF:
  E ::= F {O F}
  O ::= + | -
  F ::= ( E ) | id

procedure E
{ F;
  while (token=+ or token=-)
  { O; F; }
}
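A minimal Python sketch of a recognizer for the EBNF grammar E ::= F {O F}, with one method per nonterminal and a match procedure for terminals. The class name Parser, the exception ParseError, the helper recognize and the use of "$" as an end marker are assumptions made for this illustration; tokens are given as a list of strings.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar (hypothetical name)."""


class Parser:
    """Recognizer for E ::= F {O F}, O ::= + | -, F ::= ( E ) | id."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    @property
    def token(self):
        # The next token; it is not consumed until match() succeeds.
        return self.tokens[self.pos]

    def match(self, expected):
        if self.token == expected:
            self.pos += 1  # plays the role of getToken
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def E(self):
        self.F()
        while self.token in ("+", "-"):  # the {O F} repetition
            self.O()
            self.F()

    def O(self):
        if self.token in ("+", "-"):
            self.match(self.token)
        else:
            raise ParseError(f"expected + or -, found {self.token!r}")

    def F(self):
        if self.token == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.token == "id":
            self.match("id")
        else:
            raise ParseError(f"unexpected token {self.token!r}")


def recognize(tokens):
    """Return True if the whole token list is derivable from E."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
    except ParseError:
        return False
    return True
```

For example, recognize(["id", "+", "(", "id", "-", "id", ")"]) succeeds, while recognize(["id", "+"]) fails because F finds the end marker.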

10

Match procedure

procedure match(expTok)
{ if (token==expTok)
    then getToken
    else error
}

The token is not consumed until getToken is executed.

11

Problems in Recursive-Descent

- Difficult to convert grammars into EBNF
- Cannot decide which production to use at each point
- Cannot decide when to use an ε-production A → ε

12

LL(1) Parsing

LL(1)
- Read input from (L) left to right
- Simulate (L) leftmost derivation
- 1 lookahead symbol

Use a stack to simulate leftmost derivation
- Part of the sentential form produced in the leftmost derivation is stored in the stack.
- Top of stack is the leftmost nonterminal symbol in the fragment of the sentential form.

13

Concept of LL(1) Parsing

- Simulate leftmost derivation of the input.
- Keep part of the sentential form in the stack.
- If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it out of the stack.
- If the symbol on the top of stack is a nonterminal X, replace it with Y if we have a production rule X → Y.
  - Which production will be chosen, if there are both X → Y and X → Z?

14

Example of LL(1) Parsing

Grammar:
  E → T X
  X → A T X | ε
  A → + | -
  T → F N
  N → M F N | ε
  M → *
  F → ( E ) | n

Input: ( n + ( n ) ) * n $

[Figure: animation of the parsing stack, starting with E on top of $, as the input is consumed; parsing is finished when only $ remains]

Leftmost derivation traced by the parser:
E ⇒ TX ⇒ FNX ⇒ (E)NX ⇒ (TX)NX ⇒ (FNX)NX ⇒ (nNX)NX ⇒ (nX)NX ⇒ (nATX)NX ⇒ (n+TX)NX ⇒ (n+FNX)NX ⇒ (n+(E)NX)NX ⇒ (n+(TX)NX)NX ⇒ (n+(FNX)NX)NX ⇒ (n+(nNX)NX)NX ⇒ (n+(nX)NX)NX ⇒ (n+(n)NX)NX ⇒ (n+(n)X)NX ⇒ (n+(n))NX ⇒ (n+(n))MFNX ⇒ (n+(n))*FNX ⇒ (n+(n))*nNX ⇒ (n+(n))*nX ⇒ (n+(n))*n

15

LL(1) Parsing Algorithm

Push the start symbol onto the stack (above the bottom marker $)
WHILE the stack is not empty ($ is not on top of the stack) or the stream of tokens is not empty (the next input token is not $)
    SWITCH (Top of stack, next token)
        CASE (terminal a, a):
            Pop stack; Get next token
        CASE (nonterminal A, token a):
            IF the parsing table entry M[A, a] is not empty THEN
                Get A → X1 X2 ... Xn from the parsing table entry M[A, a]
                Pop stack; Push Xn ... X2 X1 onto the stack in that order
            ELSE Error
        OTHER: Error
Accept    (reached only in the case ($, $))
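A hedged Python sketch of the table-driven loop, using the grammar E → T X, X → A T X | ε, A → + | -, T → F N, N → M F N | ε, M → *, F → ( E ) | n from the LL(1) example. The table TABLE was filled in by hand from the First and Follow sets of that grammar, and the names TABLE, NONTERMINALS and ll1_parse are assumptions of this illustration. An ε-production is represented by an empty tuple.

```python
NONTERMINALS = {"E", "X", "A", "T", "N", "M", "F"}

# M[nonterminal, lookahead] -> right-hand side; () stands for ε.
TABLE = {
    ("E", "("): ("T", "X"), ("E", "n"): ("T", "X"),
    ("X", "+"): ("A", "T", "X"), ("X", "-"): ("A", "T", "X"),
    ("X", ")"): (), ("X", "$"): (),
    ("A", "+"): ("+",), ("A", "-"): ("-",),
    ("T", "("): ("F", "N"), ("T", "n"): ("F", "N"),
    ("N", "*"): ("M", "F", "N"),
    ("N", "+"): (), ("N", "-"): (), ("N", ")"): (), ("N", "$"): (),
    ("M", "*"): ("*",),
    ("F", "("): ("(", "E", ")"), ("F", "n"): ("n",),
}


def ll1_parse(tokens, start="E"):
    """Return True if the token sequence is accepted, False on any error."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # top of stack is the end of the list
    i = 0
    while not (stack[-1] == "$" and tokens[i] == "$"):
        top, tok = stack[-1], tokens[i]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                return False          # empty table entry
            stack.pop()
            stack.extend(reversed(rhs))  # push Xn ... X1, so X1 ends on top
        elif top == tok:
            stack.pop()               # terminal matches the next token
            i += 1
        else:
            return False              # terminal mismatch, or $ with input left
    return True                       # the ($, $) case: accept
```

With single-character tokens, ll1_parse("(n+(n))*n") accepts the input of the example, and ll1_parse("n+") rejects because M[T, $] is empty.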

16

LL(1) Parsing Table

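If the nonterminal N is on the top of stack and the next token is t, which production rule should be used?

Choose a rule N → X such that
- X ⇒* tY, or
- X ⇒* ε and S ⇒* WNtY

[Figure: parse-tree sketches of the two cases: t derived from X below N, and t following N when X derives ε]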

17

First Set

Let X be ε or be in V or T.

First(X) is the set of the first terminals in any sentential form derived from X.
- If X is a terminal or ε, then First(X) = {X}.
- If X is a nonterminal and X → X1 X2 ... Xn is a rule, then
  - First(X1) - {ε} is a subset of First(X)
  - First(Xi) - {ε} is a subset of First(X) if for all j<i First(Xj) contains ε
  - ε is in First(X) if for all j ≤ n First(Xj) contains ε

18

Examples of First Set

Example 1:
  exp → exp addop term | term
  addop → + | -
  term → term mulop factor | factor
  mulop → *
  factor → (exp) | num

  First(addop) = {+, -}
  First(mulop) = {*}
  First(factor) = {(, num}
  First(term) = {(, num}
  First(exp) = {(, num}

Example 2:
  st → ifst | other
  ifst → if ( exp ) st elsepart
  elsepart → else st | ε
  exp → 0 | 1

  First(exp) = {0, 1}
  First(elsepart) = {else, ε}
  First(ifst) = {if}
  First(st) = {if, other}

19

Algorithm for finding First(A)

For all terminals a, First(a) = {a}
For all nonterminals A, First(A) := {}
While there are changes to any First(A)
    For each rule A → X1 X2 ... Xn
        For each Xi in {X1, X2, …, Xn}
            If for all j<i First(Xj) contains ε, Then
                add First(Xi) - {ε} to First(A)
        If ε is in First(X1), First(X2), ..., and First(Xn)
            Then add ε to First(A)

Rules applied by the algorithm:
- If A is a terminal or ε, then First(A) = {A}.
- If A is a nonterminal, then for each rule A → X1 X2 ... Xn, First(A) contains First(X1) - {ε}. If also for some i<n, First(X1), First(X2), ..., and First(Xi) contain ε, then First(A) contains First(Xi+1) - {ε}.
- If First(X1), First(X2), ..., and First(Xn) contain ε, then First(A) also contains ε.
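The fixed-point iteration above can be sketched in Python as follows. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples, with () for an ε-production), the marker EPS and the function name first_sets are assumptions of this sketch; any symbol that is not a key of the dict is treated as a terminal.

```python
EPS = "ε"  # marker for the empty string (assumed representation)


def first_sets(grammar):
    """Compute First(A) for every nonterminal A of the grammar."""
    first = {a: set() for a in grammar}

    def first_of(symbol):
        # First(a) = {a} for a terminal a.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:  # repeat while any First(A) changes
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[a])
                all_nullable = True
                for x in rhs:
                    fx = first_of(x)
                    first[a] |= fx - {EPS}
                    if EPS not in fx:
                        all_nullable = False
                        break  # later symbols cannot start the string
                if all_nullable:
                    first[a].add(EPS)  # every Xi derives ε (or rhs is empty)
                if len(first[a]) != before:
                    changed = True
    return first


# Expression grammar without left recursion, used as sample input.
GRAMMAR = {
    "exp": [("term", "exp'")],
    "exp'": [("addop", "term", "exp'"), ()],
    "addop": [("+",), ("-",)],
    "term": [("factor", "term'")],
    "term'": [("mulop", "factor", "term'"), ()],
    "mulop": [("*",)],
    "factor": [("(", "exp", ")"), ("num",)],
}
```

Calling first_sets(GRAMMAR) gives, for instance, {"(", "num"} for exp and {"+", "-", "ε"} for exp'.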

20

Finding First Set: An Example

Grammar:
  exp → term exp’
  exp’ → addop term exp’ | ε
  addop → + | -
  term → factor term’
  term’ → mulop factor term’ | ε
  mulop → *
  factor → ( exp ) | num

Nonterminal   First
exp           ( num
exp’          + - ε
addop         + -
term          ( num
term’         * ε
mulop         *
factor        ( num

21

Follow Set

Let $ denote the end of input tokens.
- If A is the start symbol, then $ is in Follow(A).
- If there is a rule B → X A Y, then First(Y) - {ε} is in Follow(A).
- If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).

22

Algorithm for Finding Follow(A)

Follow(S) = {$}
FOR each A in V-{S}
    Follow(A) = {}
WHILE change is made to some Follow sets
    FOR each production A → X1 X2 ... Xn,
        FOR each nonterminal Xi
            Add First(Xi+1 Xi+2...Xn) - {ε} into Follow(Xi).
            (NOTE: If i=n, Xi+1 Xi+2...Xn = ε)
            IF ε is in First(Xi+1 Xi+2...Xn) THEN
                Add Follow(A) to Follow(Xi)

Rules applied by the algorithm:
- If A is the start symbol, then $ is in Follow(A).
- If there is a rule A → Y X Z, then First(Z) - {ε} is in Follow(X).
- If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).
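A self-contained Python sketch of the Follow computation, under the same assumed grammar encoding as a dict from nonterminal to a list of right-hand-side tuples (() for ε). The names first_of_string, first_sets and follow_sets are hypothetical; the First sets are recomputed here so that the block stands alone.

```python
EPS, END = "ε", "$"  # assumed markers for the empty string and end of input


def first_of_string(symbols, first, grammar):
    """First set of a string X1 ... Xn; the empty string yields {ε}."""
    result = set()
    for x in symbols:
        fx = first[x] if x in grammar else {x}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def first_sets(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {a: set() for a in grammar}
    follow[start].add(END)  # Follow(S) = {$}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue  # only nonterminals have Follow sets
                    rest = first_of_string(rhs[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]  # the rest can vanish
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow


GRAMMAR = {
    "exp": [("term", "exp'")],
    "exp'": [("addop", "term", "exp'"), ()],
    "addop": [("+",), ("-",)],
    "term": [("factor", "term'")],
    "term'": [("mulop", "factor", "term'"), ()],
    "mulop": [("*",)],
    "factor": [("(", "exp", ")"), ("num",)],
}
```

For this grammar, follow_sets(GRAMMAR, "exp") yields {"$", ")"} for exp and {"*", "+", "-", ")", "$"} for factor.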

23

Finding Follow Set: An Example

Grammar:
  exp → term exp’
  exp’ → addop term exp’ | ε
  addop → + | -
  term → factor term’
  term’ → mulop factor term’ | ε
  mulop → *
  factor → ( exp ) | num

Nonterminal   First      Follow
exp           ( num      $ )
exp’          + - ε      $ )
addop         + -        ( num
term          ( num      + - $ )
term’         * ε        + - $ )
mulop         *          ( num
factor        ( num      * + - $ )

24

Constructing LL(1) Parsing Tables

FOR each nonterminal A and each production A → X
    FOR each token a in First(X)
        Add A → X to M(A, a)
    IF ε is in First(X) THEN
        FOR each element a in Follow(A)
            Add A → X to M(A, a)
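A small Python sketch of this construction for the expression grammar with exp’ and term’. The First and Follow sets are written out by hand rather than computed, productions are numbered 1 to 11, and all names (PRODUCTIONS, FIRST, FOLLOW, first_of_string, build_table) are assumptions of this illustration. Each table entry is a list of production numbers, so an entry with more than one number would reveal that the grammar is not LL(1).

```python
EPS = "ε"

PRODUCTIONS = {
    1: ("exp", ("term", "exp'")),
    2: ("exp'", ("addop", "term", "exp'")),
    3: ("exp'", ()),
    4: ("addop", ("+",)),
    5: ("addop", ("-",)),
    6: ("term", ("factor", "term'")),
    7: ("term'", ("mulop", "factor", "term'")),
    8: ("term'", ()),
    9: ("mulop", ("*",)),
    10: ("factor", ("(", "exp", ")")),
    11: ("factor", ("num",)),
}

FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}

FOLLOW = {
    "exp": {"$", ")"}, "exp'": {"$", ")"}, "addop": {"(", "num"},
    "term": {"+", "-", ")", "$"}, "term'": {"+", "-", ")", "$"},
    "mulop": {"(", "num"}, "factor": {"*", "+", "-", ")", "$"},
}


def first_of_string(symbols):
    """First set of a right-hand side; terminals are their own First set."""
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_table():
    """Map (nonterminal, token) to the list of applicable production numbers."""
    table = {}
    for number, (a, rhs) in PRODUCTIONS.items():
        f = first_of_string(rhs)
        lookaheads = f - {EPS}
        if EPS in f:
            lookaheads |= FOLLOW[a]  # the production can derive ε
        for token in lookaheads:
            table.setdefault((a, token), []).append(number)
    return table
```

Here build_table()[("term'", "*")] is [7] and build_table()[("exp'", ")")] is [3]; no entry holds more than one production.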

25

Example: Constructing LL(1) Parsing Table

Nonterminal   First         Follow
exp           {(, num}      {$, )}
exp’          {+, -, ε}     {$, )}
addop         {+, -}        {(, num}
term          {(, num}      {+, -, ), $}
term’         {*, ε}        {+, -, ), $}
mulop         {*}           {(, num}
factor        {(, num}      {*, +, -, ), $}

Productions:
   1  exp → term exp’
   2  exp’ → addop term exp’
   3  exp’ → ε
   4  addop → +
   5  addop → -
   6  term → factor term’
   7  term’ → mulop factor term’
   8  term’ → ε
   9  mulop → *
  10  factor → ( exp )
  11  factor → num

Parsing table (n stands for num):
          (    )    +    -    *    n    $
exp       1                        1
exp’           3    2    2              3
addop               4    5
term      6                        6
term’          8    8    8    7         8
mulop                         9
factor    10                       11

26

LL(1) Grammar

A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.

27

LL(1) Parsing Table for a Non-LL(1) Grammar

  1  exp → exp addop term
  2  exp → term
  3  term → term mulop factor
  4  term → factor
  5  factor → ( exp )
  6  factor → num
  7  addop → +
  8  addop → -
  9  mulop → *

First(exp) = { (, num }
First(term) = { (, num }
First(factor) = { (, num }
First(addop) = { +, - }
First(mulop) = { * }

          (     )    +    -    *    num   $
exp       1,2                       1,2
term      3,4                       3,4
factor    5                         6
addop                7    8
mulop                          9

28

Causes of Non-LL(1)

What causes non-LL(1)?
- Left recursion
- Left factor

29

Left Recursion

Immediate left recursion
- A → A X | Y
- A → A X1 | A X2 | … | A Xn | Y1 | Y2 | ... | Ym
- Can be removed very easily:
  - A → Y A’, A’ → X A’ | ε
      (A = Y X*)
  - A → Y1 A’ | Y2 A’ | ... | Ym A’, A’ → X1 A’ | X2 A’ | … | Xn A’ | ε
      (A = {Y1, Y2, …, Ym} {X1, X2, …, Xn}*)

General left recursion
- A ⇒ X ⇒* A Y
- Can be removed when there is no empty-string production and no cycle in the grammar
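The removal of immediate left recursion can be sketched in Python as below. The encoding (alternatives as tuples of symbols, () for ε), the function name remove_immediate_left_recursion and the choice of naming the new nonterminal by appending an apostrophe are assumptions of this sketch; it also assumes that the primed name is not already used in the grammar.

```python
def remove_immediate_left_recursion(a, alternatives):
    """Rewrite A -> A X1 | ... | A Xn | Y1 | ... | Ym without left recursion.

    Returns a dict with the new productions for A and, if needed, A'.
    """
    # The Xi: what follows A in the left-recursive alternatives.
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
    # The Yj: alternatives that do not start with A.
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
    if not recursive:
        return {a: list(alternatives)}  # nothing to do
    a_prime = a + "'"
    return {
        a: [y + (a_prime,) for y in others],                    # A  -> Yj A'
        a_prime: [x + (a_prime,) for x in recursive] + [()],   # A' -> Xi A' | ε
    }
```

For exp → exp + term | exp - term | term this gives exp → term exp' and exp' → + term exp' | - term exp' | ε.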

30

Removal of Immediate Left Recursion

Grammar:
  exp → exp + term | exp - term | term
  term → term * factor | factor
  factor → ( exp ) | num

Remove left recursion:
  exp → term exp’
  exp’ → + term exp’ | - term exp’ | ε
  term → factor term’
  term’ → * factor term’ | ε
  factor → ( exp ) | num

In regular-expression form:
  exp = term ((+ | -) term)*
  term = factor (* factor)*

31

General Left Recursion

Bad news!
- Can only be removed when there is no empty-string production and no cycle in the grammar.

Good news!
- Never seen in grammars of any programming languages

32

Left Factoring

A left factor causes non-LL(1):
- Given A → X Y | X Z, both A → X Y and A → X Z can be chosen when A is on top of the stack and a token in First(X) is the next token.

A → X Y | X Z can be left-factored as
  A → X A’ and A’ → Y | Z
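A deliberately simplified Python sketch of one left-factoring step. It only handles the case where all alternatives of A share a common prefix X, which covers the pattern A → X Y | X Z above; a full algorithm would also group alternatives that share different prefixes. The names common_prefix and left_factor, the tuple encoding of alternatives and the primed name for the new nonterminal are assumptions of this sketch.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol tuples."""
    prefix = []
    for column in zip(*sequences):  # stops at the shortest alternative
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(a, alternatives):
    """Factor A -> X Y | X Z into A -> X A' and A' -> Y | Z (one step)."""
    prefix = common_prefix(alternatives)
    if len(alternatives) < 2 or not prefix:
        return {a: list(alternatives)}  # no common left factor
    a_prime = a + "'"
    n = len(prefix)
    return {
        a: [prefix + (a_prime,)],
        a_prime: [rhs[n:] for rhs in alternatives],  # () stands for ε
    }
```

For example, left_factor("seq", [("st", ";", "seq"), ("st",)]) produces seq → st seq' and seq' → ; seq | ε.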

33

Example of Left Factor

ifSt → if ( exp ) st else st | if ( exp ) st
can be left-factored as
  ifSt → if ( exp ) st elsePart
  elsePart → else st | ε

seq → st ; seq | st
can be left-factored as
  seq → st seq’
  seq’ → ; seq | ε

34


35

Bottom-up Parsing

- Use an explicit stack to perform a parse
- Simulate rightmost derivation (R) from left (L) to right, thus called LR parsing
- More powerful than top-down parsing
  - Left recursion does not cause a problem
- Two actions
  - Shift: take the next input token onto the stack
  - Reduce: replace a string B on top of the stack by a nonterminal A, given a production A → B

36

Example of Shift-reduce Parsing

Grammar:
  S’ → S
  S → (S)S | ε

Reverse of rightmost derivation, from left to right:
   1  ( ( ) )
   2  ( ( ) )
   3  ( ( ) )
   4  ( ( S ) )
   5  ( ( S ) )
   6  ( ( S ) S )
   7  ( S )
   8  ( S )
   9  ( S ) S
  10  S’ ⇒ S

Parsing actions:
  Stack            Input        Action
  $                ( ( ) ) $    shift
  $ (              ( ) ) $      shift
  $ ( (            ) ) $        reduce S → ε
  $ ( ( S          ) ) $        shift
  $ ( ( S )        ) $          reduce S → ε
  $ ( ( S ) S      ) $          reduce S → ( S ) S
  $ ( S            ) $          shift
  $ ( S )          $            reduce S → ε
  $ ( S ) S        $            reduce S → ( S ) S
  $ S              $            accept

37

Example of Shift-reduce Parsing (continued)

The same parse as on the previous slide, with two annotations:
- Viable prefix: the contents of the stack at a step, e.g. ( ( S ) S
- Handle: the string on top of the stack that is reduced next, e.g. ( S ) S

38

Terminologies

- Right sentential form: a sentential form in a rightmost derivation
- Viable prefix: a sequence of symbols on the parsing stack
- Handle: a right sentential form + the position where a reduction can be performed + the production used for the reduction
- LR(0) item: a production with a distinguished position in its RHS

Right sentential form: ( S ) S and ( ( S ) S )

Viable prefix: ( S ) S, ( S ), ( S, ( and ( ( S ) S, ( ( S ), ( ( S, ( (, (

Handle ( S ) S. with S ( S ) S . with S ( ( S ) S . ) with S ( S ) S

LR(0) item: S → ( S ) S. , S → ( S ) . S , S → ( S . ) S , S → ( . S ) S , S → . ( S ) S

39

Shift-reduce parsers

- There are two possible actions: shift and reduce.
- Parsing is completed when the input stream is empty and the stack contains only the start symbol.
- The grammar must be augmented:
  - a new start symbol S’ is added
  - a production S’ → S is added
  - This makes sure that parsing is finished when S’ is on top of the stack, because S’ never appears on the RHS of any production.

40

LR(0) parsing

Keep track of what is left to be done in the parsing process by using finite automata of items.

An item A → w . B y means:
- A → w B y may be used for the reduction in the future,
- at the time, we know we have already constructed w in the parsing process,
- if B is constructed next, we get the new item A → w B . y

41

LR(0) items

- LR(0) item: a production with a distinguished position in the RHS
- Initial item: an item with the distinguished position at the leftmost end of the production
- Complete item: an item with the distinguished position at the rightmost end of the production
- Closure item of x: item x together with the items which can be reached from x via ε-transitions
- Kernel item: an original item, not including closure items

42

Finite automata of items

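Grammar:
  S’ → S
  S → (S)S
  S → ε

Items:
  S’ → .S     S’ → S.
  S → .(S)S   S → (.S)S   S → (S.)S   S → (S).S   S → (S)S.
  S → .

[Figure: NFA of items]
Transitions on symbols:
  S’ → .S    --S-->  S’ → S.
  S → .(S)S  --(-->  S → (.S)S
  S → (.S)S  --S-->  S → (S.)S
  S → (S.)S  --)-->  S → (S).S
  S → (S).S  --S-->  S → (S)S.
ε-transitions: from every item with the position just before S to S → .(S)S and to S → .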

43

DFA of LR(0) Items

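[Figure: the NFA of items and the DFA obtained from it]

DFA states (sets of LR(0) items):
  0: S’ → .S,  S → .(S)S,  S → .
  1: S’ → S.
  2: S → (.S)S,  S → .(S)S,  S → .
  3: S → (S.)S
  4: S → (S).S,  S → .(S)S,  S → .
  5: S → (S)S.

Transitions:
  0 --S--> 1    0 --(--> 2
  2 --(--> 2    2 --S--> 3
  3 --)--> 4
  4 --(--> 2    4 --S--> 5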
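A hedged Python sketch of how such a DFA of LR(0) items can be built with closure and goto operations, for the grammar S’ → S, S → (S)S | ε. An item is encoded as a pair (production index, position of the dot); the names GRAMMAR, closure, goto and build_dfa are assumptions of this sketch, and the state numbers it produces depend on the order of discovery, so they need not match any numbering used on the slides.

```python
# Productions as (LHS, RHS); () is the ε-production.
GRAMMAR = [("S'", ("S",)), ("S", ("(", "S", ")", "S")), ("S", ())]
NONTERMINALS = {"S'", "S"}


def closure(items):
    """Add every initial item reachable through ε-transitions."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in items:
                    items.add((q, 0))
                    work.append((q, 0))
    return frozenset(items)


def goto(state, symbol):
    """State reached from `state` on `symbol`, or None if there is none."""
    moved = {(p, dot + 1) for p, dot in state
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == symbol}
    return closure(moved) if moved else None


def build_dfa():
    """Return the list of states and the transition map (state, symbol) -> state."""
    states = [closure({(0, 0)})]  # start from S' -> .S
    transitions = {}
    for state in states:  # the list grows while it is being traversed
        symbols = {GRAMMAR[p][1][dot] for p, dot in state
                   if dot < len(GRAMMAR[p][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions
```

For this grammar build_dfa() finds six states and seven transitions, in agreement with the DFA of LR(0) items described above.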

44

LR(0) parsing algorithm

Item in state                        Token    Action
A → x.By, where B is a terminal      B        shift B and push the state s containing A → xB.y
A → x.By, where B is a terminal      not B    error
A → x.                               -        reduce with A → x (i.e. pop x, back up to the state s on top of the stack) and push A with the new state d(s,A)
S’ → S.                              none     accept
S’ → S.                              any      error

45

LR(0) Parsing Table

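State   Action   Rule         (    a    )    A
0       shift                 3    2         1
1       reduce   A’ → A
2       reduce   A → a
3       shift                 3    2         4
4       shift                           5
5       reduce   A → (A)

[Figure: DFA of LR(0) items for A’ → A, A → (A) | a]
  0: A’ → .A,  A → .(A),  A → .a
  1: A’ → A.
  2: A → a.
  3: A → (.A),  A → .(A),  A → .a
  4: A → (A.)
  5: A → (A).
Transitions: 0 --A--> 1, 0 --a--> 2, 0 --(--> 3, 3 --(--> 3, 3 --a--> 2, 3 --A--> 4, 4 --)--> 5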

46

Example of LR(0) Parsing

Using the LR(0) parsing table above (grammar A’ → A, A → (A) | a):

Stack            Input          Action
$0               ( ( a ) ) $    shift
$0(3             ( a ) ) $      shift
$0(3(3           a ) ) $        shift
$0(3(3a2         ) ) $          reduce
$0(3(3A4         ) ) $          shift
$0(3(3A4)5       ) $            reduce
$0(3A4           ) $            shift
$0(3A4)5         $              reduce
$0A1             $              accept
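The trace can be reproduced with a short Python sketch of an LR(0) driver for the grammar A’ → A, A → (A) | a. The dictionaries ACTION and GOTO are a hand transcription of the parsing table, and the names ACTION, GOTO and lr0_parse are assumptions of this illustration. For brevity the stack holds states only, whereas the trace shows grammar symbols interleaved with states.

```python
# Per-state action: ("shift",) or ("reduce", LHS, length of RHS).
ACTION = {
    0: ("shift",),
    1: ("reduce", "A'", 1),  # A' -> A
    2: ("reduce", "A", 1),   # A  -> a
    3: ("shift",),
    4: ("shift",),
    5: ("reduce", "A", 3),   # A  -> (A)
}

# Transitions of the DFA on terminals and on the nonterminal A.
GOTO = {
    (0, "("): 3, (0, "a"): 2, (0, "A"): 1,
    (3, "("): 3, (3, "a"): 2, (3, "A"): 4,
    (4, ")"): 5,
}


def lr0_parse(tokens):
    """Return True if the token sequence is accepted."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states, top at the end
    i = 0
    while True:
        state = stack[-1]
        action = ACTION[state]
        if action[0] == "shift":
            target = GOTO.get((state, tokens[i]))
            if target is None:
                return False      # no transition on this token: error
            stack.append(target)
            i += 1
        else:
            _, lhs, length = action
            if lhs == "A'":
                # Reducing with A' -> A means accept, if the input is used up.
                return tokens[i] == "$"
            del stack[-length:]   # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
```

With single-character tokens, lr0_parse("((a))") accepts, and lr0_parse("(a") fails when state 4 has no transition on $.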

47

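Non-LR(0) Grammar

[Figure: DFA of LR(0) items for S’ → S, S → (S)S | ε]
  0: S’ → .S,  S → .(S)S,  S → .
  1: S’ → S.
  2: S → (.S)S,  S → .(S)S,  S → .
  3: S → (S.)S
  4: S → (S).S,  S → .(S)S,  S → .
  5: S → (S)S.
Transitions: 0 --S--> 1, 0 --(--> 2, 2 --(--> 2, 2 --S--> 3, 3 --)--> 4, 4 --(--> 2, 4 --S--> 5

Conflict
- Shift-reduce conflict: a state contains a complete item A → x. and a shift item A → x.By
- Reduce-reduce conflict: a state contains more than one complete item

Here states 0, 2 and 4 each contain the complete item S → . and the shift item S → .(S)S, so this grammar has shift-reduce conflicts.

A grammar is an LR(0) grammar if there is no conflict in the grammar.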

48

SLR(1) parsing

- Simple LR with 1 lookahead symbol
- Examine the next token before deciding to shift or reduce:
  - If the next token is the token expected in an item, then it can be shifted onto the stack.
  - If a complete item A → x. is constructed and the next token is in Follow(A), then reduction can be done using A → x.
  - Otherwise, an error occurs.
- Can avoid conflicts

49

SLR(1) parsing algorithm

Item in state                    Token               Action
A → x.By (B is a terminal)       B                   shift B and push the state s containing A → xB.y
A → x.By (B is a terminal)       not B               error
A → x.                           in Follow(A)        reduce with A → x (i.e. pop x, back up to the state s on top of the stack) and push A with the new state d(s,A)
A → x.                           not in Follow(A)    error
S’ → S.                          none                accept
S’ → S.                          any                 error

50

SLR(1) grammar

Conflict
- Shift-reduce conflict: a state contains a shift item A → x.Wy such that W is a terminal, and a complete item B → z. such that W is in Follow(B).
- Reduce-reduce conflict: a state contains more than one complete item whose Follow sets have a token in common.

A grammar is an SLR(1) grammar if there is no conflict in the grammar.

51

SLR(1) Parsing Table

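Grammar: A → (A) | a    (rule 1: A → (A), rule 2: A → a)

[Figure: DFA of LR(0) items]
  0: A’ → .A,  A → .(A),  A → .a
  1: A’ → A.
  2: A → a.
  3: A → (.A),  A → .(A),  A → .a
  4: A → (A.)
  5: A → (A).
Transitions: 0 --A--> 1, 0 --a--> 2, 0 --(--> 3, 3 --(--> 3, 3 --a--> 2, 3 --A--> 4, 4 --)--> 5

Reductions are entered only for tokens in Follow(A) = { ), $ }.

State   (    a    )    $    A
0       S3   S2             1
1                      AC
2                 R2   R2
3       S3   S2             4
4                 S5
5                 R1   R1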

52

SLR(1) Grammar not LR(0)

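Grammar: S → (S)S | ε    (rule 1: S → (S)S, rule 2: S → ε)

[Figure: DFA of LR(0) items]
  0: S’ → .S,  S → .(S)S,  S → .
  1: S’ → S.
  2: S → (.S)S,  S → .(S)S,  S → .
  3: S → (S.)S
  4: S → (S).S,  S → .(S)S,  S → .
  5: S → (S)S.
Transitions: 0 --S--> 1, 0 --(--> 2, 2 --(--> 2, 2 --S--> 3, 3 --)--> 4, 4 --(--> 2, 4 --S--> 5

State   (    )    $    S
0       S2   R2   R2   1
1                 AC
2       S2   R2   R2   3
3            S4
4       S2   R2   R2   5
5            R1   R1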
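A Python sketch of an SLR(1) driver for S → (S)S | ε, with the parsing table transcribed by hand into the dictionaries ACTION and GOTO. The names RULES, ACTION, GOTO and slr1_parse are assumptions of this illustration, and the stack holds states only. Unlike the LR(0) case, the action is looked up with both the state and the next token.

```python
# Rule number -> (LHS, length of RHS); rule 2 is the ε-production.
RULES = {1: ("S", 4), 2: ("S", 0)}

# ACTION[state, token]: ("s", state) shift, ("r", rule) reduce, ("acc",) accept.
ACTION = {
    (0, "("): ("s", 2), (0, ")"): ("r", 2), (0, "$"): ("r", 2),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, ")"): ("s", 4),
    (4, "("): ("s", 2), (4, ")"): ("r", 2), (4, "$"): ("r", 2),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}

GOTO = {(0, "S"): 1, (2, "S"): 3, (4, "S"): 5}


def slr1_parse(tokens):
    """Return True if the token sequence is accepted."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states, top at the end
    i = 0
    while True:
        action = ACTION.get((stack[-1], tokens[i]))
        if action is None:
            return False              # empty table entry: error
        if action[0] == "s":
            stack.append(action[1])
            i += 1
        elif action[0] == "r":
            lhs, length = RULES[action[1]]
            if length:                # nothing to pop for an ε-production
                del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True               # accept
```

With single-character tokens, slr1_parse("(())") and slr1_parse("") accept, while slr1_parse("(()") is rejected in state 3 on $.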

53

Disambiguating Rules for Parsing Conflict

Shift-reduce conflict
- Prefer shift over reduce
  - In the case of nested if statements, preferring shift over reduce implies the most closely nested rule for the dangling else

Reduce-reduce conflict
- Error in design

54

Dangling Else

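Grammar (rules numbered as used in the table):
  S’ → S
  1  S → I
  2  S → other
  3  I → if S
  4  I → if S else S

State   if   else   other   $     S   I
0       S4          S3            1   2
1                           ACC
2            R1             R1
3            R2             R2
4       S4          S3            5   2
5            S6             R3
6       S4          S3            7   2
7            R4             R4

In state 5 the next token else allows both a shift and a reduction by rule 3; the table keeps the shift (S6).

[Figure: DFA of LR(0) items]
  0: S’ → .S,  S → .I,  S → .other,  I → .if S,  I → .if S else S
  1: S’ → S.
  2: S → I.
  3: S → other.
  4: I → if .S,  I → if .S else S,  S → .I,  S → .other,  I → .if S,  I → .if S else S
  5: I → if S.,  I → if S. else S
  6: I → if S else .S,  S → .I,  S → .other,  I → .if S,  I → .if S else S
  7: I → if S else S.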

End

55