Compilers section 4.7

29
COMPILER CONSTRUCTION PRINCIPLES, TECHNIQUES AND TOOLS 2 ND EDITION ALFRED. V AHO MUHAMMAD YASIR KHAN AIR UNIVERSITY MULTAN

Transcript of Compilers section 4.7

Page 1: Compilers section 4.7

COMPILER CONSTRUCTIONPRINCIPLES, TECHNIQUES AND TOOLS2ND EDITION

ALFRED. V AHO

MUHAMMAD YASIR KHAN

AIR UNIVERSITY MULTAN

Page 2: Compilers section 4.7

TODAY'S AGENDA

• 4.7 More Powerful LR Parser• 4.7.1 Canonical LR(1) Items

• 4.7.2 Constructing LR(1) Sets of Items

• 4.7.3 Canonical LR(1) Parsing Table

• 4.7.4 Constructing LALR Parsing Table

• 4.7.5 Efficient Construction of LALR Parsing Tables

• 4.7.6 Compaction of LR Parsing Tables

Page 3: Compilers section 4.7

INTRODUCTION

There are two different Methods

• Canonical LR or “LR” method

It makes full use of the lookahead Symbol(s).

this method uses a large set of items, called LR(1) items.

• LookaHead or “LALR” Method:

based on LR(0) sets of items parser,

has many fewer states than typical parsers based on LR(1) items.

(LALR is method of choice in most situations.)

Page 4: Compilers section 4.7

4.7.1 CANONICAL LR(1) ITEMS

• in the SLR method, state i calls for reduction by A a if the set of items Ii contains item [A a·] and a is in FOLLOW (A) .

• In some situations,

when state i appears on top of the stack, the viable prefix βa on the stack is such that βA cannot be followed by a in any right-sentential form.

Thus, the reduction by A a should be invalid on input a.

Page 5: Compilers section 4.7

EXAMPLE DEMONSTRATION

S L=R I R

L *R I id

R L

I2: S L· = R

R L·

Example 4.48, where in state 2 we had item R L·, which could correspond to A a above, and a could be the = sign, which is in FOLLOW (R) . Thus, the SLR parser calls for reduction by R L in state 2 with = as the next input. However, there is no right-sentential form of the grammar in Example 4.48 that begins R = ...Thus state 2, which is the state corresponding to viable prefix L only, should not really call for reduction of that L to R.

Page 6: Compilers section 4.7

CANONICAL LR(1) PARSING

• In the beginning, all we know is that we have not read any input (S'S), we hope to parse an S and after that we should expect to see a $ as lookahead. We write this as: S'S, $

• Now, consider a general item A, x. It means that we have parsed an , we hope to parse and after those we should expect an x. Recall that if there is a production , we should add to the state. What kind of lookahead should we expect to see after we have parsed ?

• We should expect to see whatever starts a . If is empty or can vanish, then we should expect to see an x after we have parsed (and reduced it to B)

Page 7: Compilers section 4.7

CONTINUE…

• Formally, we say LR( l) item [A a.β, a] is valid for a viable prefix γ if

• there is a derivation S => δ Aw => γa β w , where

• 1. γ = δa , and

• 2. Either a is the first symbol of w, or w is ϵ and a is $.

rm rm

Page 8: Compilers section 4.7

4.7.2 CONSTRUCTING LR(1) SETS OF ITEMS

• The method for building the collection of sets of valid LR( l ) items is essentially the same as the one for building the canonical collection of sets of LR ( O) items.

• We need only to modify the two procedures CLOSURE and GOTO .

Page 9: Compilers section 4.7

• The closure function for LR(1) items is then defined as follows:

For each item A, x in state I, each production in the grammar,and each terminal b in FIRST(x), add , b to I

If a state contains core item with multiple possible

lookaheads b1, b2,..., we write , b1/b2 as shorthand for ,

b1 and , b2

Page 10: Compilers section 4.7

For better understanding Please Open chapter 4

section 4.7.2Page # 263

Example 4.54Compilers 2ns Edition

BY Alferd V. Aho

Page 11: Compilers section 4.7

4.7.3 CANONICAL LR(1) PARSING TABLES

• The table is created in the same way as SLR, except we now use the possible lookahead tokens saved in each state, instead of the FOLLOW sets.

• Note that the conflict that had appeared in the SLR parser is now gone.

• However, the LR(1) parser has many more states. This is not very practical.

Page 12: Compilers section 4.7
Page 13: Compilers section 4.7

CANONICAL LR(1)

• The table formed from the parsing action and goto functions produced by

• Algorithm is called the canonical LR( l) parsing table.

• An LR parser using this table is called a canonical- LR( l) parser.

• If the parsing action function has no multiply defined entries, then the given grammar is called an LR (1) grammar.

Page 14: Compilers section 4.7
Page 15: Compilers section 4.7

4.7.4 CONSTRUCTING LALR PARSING TABLE• LALR (lookahead LR) technique is often used in practice,

because

• Tables ob tained by it are considerably smaller than the canonical LR tables

• most common syntactic constructs of programming languages can be expressed con veniently by an LALR grammar.

Page 16: Compilers section 4.7

• For a comparison of parser size

• SLR and LALR tables for a grammar always have the same number of states

• The canonical LR table would typically have several thousand states for the same-size language.

Page 17: Compilers section 4.7

EXAMPLE REFERENCE: PAGE 263 EXAMPLE 4.54

• S' S S CC ( 4.55)

C cC I d

• Regular Language : c * d c * d

I4: C d·, c/d I7: C d· , $

Take Union of then I47

Set of items represented as[C d. , c/d/$]

Page 18: Compilers section 4.7

MERGE WILL CAUSE REDUCE/REDUCE CONFLICT!

• Example 4.58 : Consider the grammar

S' S

S a A d I b B d I a B e I b A e

A c

B c

{[A c. , d], [B c. e]} valid for Viable Prefix ac

{[A c. , e], [B c. , d]} valid for Viable Prefix bc

String Generated by Grammar:

acd , ace , bcd, bce

Conflict arise when taking union.A c. , d/eB c. , d/e

Page 19: Compilers section 4.7

SOLUTION:

• The general idea is to construct the sets of LR( 1 ) items, and if no

conflicts arise, merge sets with common cores.

Page 20: Compilers section 4.7

EXAMPLE 4.60S’ SS CC,C cC,C d

136: C c·C, c/d/$ C

·cC, c/d/$ C · d,

c/d/$ I47: C d., c/d/$

189: C cC· , c/d/ $

Page 21: Compilers section 4.7

4.7.5 EFFICIENT CONSTRUCTION OF LALR PARSING TABLES

Rules to construct efficient LALR parsing tables:

• We can represent any set of LR(O) or LR(l) items 1 by its kernel, [S' ·S, $]

• We can construct the LALR( l ) item kernels from the LR(O) -item kernels by a process of propagation and spontaneous generation of lookaheads.

• If we have the LALR( l ) kernels , we can generate the LALR( l ) parsing table by closing each kernel, using the function CLOSURE and then computing table entries by Algorithm

Page 22: Compilers section 4.7

EXAMPLE 4.61 Need Lookahead to make LALR(1)

Page 23: Compilers section 4.7

• There are two ways a lookahead b can get attached to an LR ( O ) item B γ.δ in some set of LALR( l ) items J:

• There is a set of items I, with a kernel item A a·β, a, and J = GOTO (I, X), and the construction of GOTO (CLOSURE({[A a.β,

a] }) ,X ) Such a lookahead b is said to be generated spontaneously for B γ.δ

• As a special case, lookahead $ is generated spontaneously for the item S' ·S in the

initial set of items.

• All is as in (1), but a = b, and GOTO (CLOSURE({[A --+ a.β, b] } ),X ), contains

[B γ.δ, b) only because A a.β has b as one of its associated lookaheads . In such

a case, we say that lookaheads propagate from A a.β in the kernel of I to B γ.δ in the

kernel of J.

• Note that propagation does not depend on the particular lookahead symbol; either

all lookaheads propagate from one item to another, or none do .

Page 24: Compilers section 4.7

4.7.6 COMPACTION OF LR PARSING TABLES• One useful technique for compacting the action field is

to recognize that usually many rows of the action table are identical.

• We can save considerable space, at little cost in time , if we create a pointer for each state into a one-dimensional array.

• To access information from this array, we assign each terminal a number from zero to one less than the number of terminals, and we use this integer as an offset from the pointer value for each state.

Page 25: Compilers section 4.7

2ND WAY…• Further space efficiency can be achieved at the expense of a

somewhat slower parser by creating a list for the actions of each state.

• The list consists of (terminal-symbol, action) pairs. The most frequent action for a state can be placed at the end of the list.

• in place of a terminal we may use the notation "any," meaning that if the current input symbol has not been found so far on the list, we should do that action no matter what the input is.

• Moreover, error entries can safely be replaced by reduce actions

Page 26: Compilers section 4.7
Page 27: Compilers section 4.7

• We can also encode the GOTO table by a list , but here it appears more efficient to make a list of pairs for each nonterminal A.

• Each pair on the list for A is of the form (currentState, nextState), indicating

GOTO [curr entState, A] = next State

Page 28: Compilers section 4.7
Page 29: Compilers section 4.7

THANK YOU…

• For more better understanding please review book articles.

Questions are welcome.