1 Parsers and Grammar. 2 Categories of Grammar Rules Declarations or definitions....

Parsers and Grammar

Categories of Grammar Rules

Declarations or definitions.

AttributeDeclaration ::= [ final ] [ static ] [ access ] datatype identifier [ = expression ] { , identifier [ = expression ] } ;

access ::= 'public' | 'protected' | 'private'

Statements: assignment, if, for, while, do_while.

Expressions, such as the examples in these slides.

Structures such as statement blocks, methods, and entire classes.

StatementBlock ::= '{' { Statement; } '}'

Parsing Algorithms (1)

Parsing algorithms are broadly divided into LL and LR.

LL algorithms start from a left-side nonterminal, beginning with the start symbol, and choose a right-side production that matches the upcoming tokens. This is top-down parsing.

LR algorithms try to match tokens to the right sides of productions, then replace each matched group of symbols with the left-side nonterminal. They continue until the entire input has been "reduced" to the start symbol. This is bottom-up parsing.

LALR (look-ahead LR) parsers are a special case of LR: they place a few extra restrictions on the grammar in exchange for much smaller parse tables.

Reference: Sebesta, section 4.3 - 4.5.

Parsing Algorithms (2)

Look-ahead: algorithms must look at the next token(s) to decide between alternate productions for the current tokens.

LALR(1) means LALR with 1 token of look-ahead.

LL(1) means LL with 1 token of look-ahead.

LL algorithms are simpler and easier to visualize.

LR algorithms are more powerful: they can parse some grammars that LL cannot, such as grammars with left-recursive rules.

yacc, bison, and CUP generate LALR(1) parsers

Recursive-descent is a useful LL algorithm that "every computer professional should know" [Louden].

Top-down (LL) Parsing Example

For the input: z = (2*x + 5)*y - 7;

tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;
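The step from the input text to the token stream can be sketched with a small regex-based scanner. This is only an illustration: the token class names ID and NUMBER come from the slides, while the patterns, the helper name tokenize, and the choice to let operators stand for themselves are assumptions.

```python
import re

# Token classes for the slide's example. ID and NUMBER are the slide's names;
# the patterns and the OP/SKIP classes are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[=()*/+\-;]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return the token stream as a list of strings such as 'ID' or '+'."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind == "OP":
            tokens.append(match.group())  # operators stand for themselves
        elif kind != "SKIP":
            tokens.append(kind)           # ID and NUMBER keep only their class
        pos = match.end()
    return tokens


print(" ".join(tokenize("z = (2*x + 5)*y - 7;")))
# prints: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;
```

A real scanner would also keep each token's text (z, 2, x, ...) for later phases; it is dropped here because the parsing examples only look at token classes.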

Grammar rules (as before):

assignment => ID = expression ;

expression => expression + term
            | expression - term
            | term

term => term * factor
      | term / factor
      | factor

factor => ( expression )
        | ID
        | NUMBER

Top-down Parsing Example (2)

The top-down parser starts from the start symbol and replaces left-side nonterminals with right-side productions until the result matches the input.

tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

assignment
ID = expression ;
ID = expression - term ;
ID = term - term ;
ID = term * factor - term ;
ID = factor * factor - term ;
ID = ( expression ) * factor - term ;
ID = ( expression + term ) * factor - term ;
ID = ( term + term ) * factor - term ;
ID = ( term * factor + term ) * factor - term ;
ID = ( factor * ID + factor ) * factor - term ;
ID = ( NUMBER * ID + NUMBER ) * factor - term ;
ID = ( NUMBER * ID + NUMBER ) * ID - factor ;
ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

Top-down Parsing Example (3)

Problem in example: we had to look ahead many tokens in order to know which production to use.

This isn't necessary provided that we know the grammar is parsable using LL (top-down) methods.

There are conditions on the grammar that we can test to verify this. (see: The Parsing Problem)

Later we will study the recursive-descent algorithm, which does top-down parsing with minimal look-ahead.
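As a preview, here is a minimal recursive-descent sketch for the assignment grammar. It assumes the left-recursive rules are rewritten with repetition, that is, expression => term { (+|-) term } and term => factor { (*|/) factor }. That rewriting, the class name Parser, and the tuple-shaped parse tree are assumptions for illustration, not taken from the slides. Each nonterminal becomes one method, and every decision looks only at the next token.

```python
class Parser:
    """Recursive-descent parser with one method per nonterminal."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token}, found {self.peek()}")
        self.pos += 1
        return token

    def assignment(self):
        # assignment => ID = expression ;
        self.expect("ID")
        self.expect("=")
        tree = self.expression()
        self.expect(";")
        self.expect("$")
        return ("=", "ID", tree)

    def expression(self):
        # expression => term { (+|-) term }
        tree = self.term()
        while self.peek() in ("+", "-"):
            op = self.expect(self.peek())
            tree = (op, tree, self.term())
        return tree

    def term(self):
        # term => factor { (*|/) factor }
        tree = self.factor()
        while self.peek() in ("*", "/"):
            op = self.expect(self.peek())
            tree = (op, tree, self.factor())
        return tree

    def factor(self):
        # factor => ( expression ) | ID | NUMBER
        if self.peek() == "(":
            self.expect("(")
            tree = self.expression()
            self.expect(")")
            return tree
        if self.peek() in ("ID", "NUMBER"):
            return self.expect(self.peek())
        raise SyntaxError(f"unexpected token {self.peek()}")


tokens = "ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;".split()
print(Parser(tokens).assignment())
```

The loops in expression and term build left-nested tuples, so the tree keeps the left associativity that the original left-recursive rules express.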

Bottom-up (LR) Parsing Example (1)

tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

parser stack ... action

ID ... shift (read first token)
factor ... reduce
factor = ... shift
FAIL: can't match any rules (reduce). Backtrack and try again.
ID = ( NUMBER ... shift
ID = ( factor ... reduce
ID = ( term * ... reduce, shift
ID = ( term * ID ... shift
ID = ( term * factor ... reduce
ID = ( term ... reduce
ID = ( term + ... shift
ID = ( expression + NUMBER ... reduce, shift
ID = ( expression + factor ... reduce
ID = ( expression + term ... reduce

Bottom-up Parsing Example (2)

tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

parser stack ... action

ID = ( expression ... reduce
ID = ( expression ) ... shift
ID = factor ... reduce
ID = factor * ... shift
ID = term * ID ... reduce, shift
ID = term * factor ... reduce
ID = term ... reduce
ID = term - ... shift
ID = expression - ... reduce
ID = expression - NUMBER ... shift
ID = expression - factor ... reduce
ID = expression - term ... reduce
ID = expression ; ... reduce, shift
assignment ... reduce

SUCCESS: the input has been reduced to the start symbol.
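The shift and reduce steps can be sketched as a small hand-coded recognizer. This is not the table-driven algorithm that an LR parser generator produces: the reduction choices are written out by hand for this one grammar, and the function names reduce_once and parse are assumptions. It uses one token of look-ahead, which is enough to avoid the backtracking shown in the first trace (the leading ID is not reduced to factor when the next token is =).

```python
def reduce_once(stack, lookahead):
    """Try one reduction on top of the stack; return True if one was applied."""
    if not stack:
        return False
    top = stack[-1]
    below = stack[-3:-1]  # the two symbols under the top, if present
    if top in ("ID", "NUMBER"):
        if top == "ID" and lookahead == "=":
            return False              # assignment target: keep it as ID
        stack[-1] = "factor"          # factor => ID | NUMBER
    elif top == ")" and below == ["(", "expression"]:
        stack[-3:] = ["factor"]       # factor => ( expression )
    elif top == "factor":
        if len(below) == 2 and below[0] == "term" and below[1] in ("*", "/"):
            stack[-3:] = ["term"]     # term => term * factor | term / factor
        else:
            stack[-1] = "term"        # term => factor
    elif top == "term" and lookahead not in ("*", "/"):
        if len(below) == 2 and below[0] == "expression" and below[1] in ("+", "-"):
            stack[-3:] = ["expression"]   # expression => expression +/- term
        else:
            stack[-1] = "expression"      # expression => term
    elif stack == ["ID", "=", "expression", ";"]:
        stack[:] = ["assignment"]     # assignment => ID = expression ;
    else:
        return False
    return True


def parse(tokens):
    """Return (accepted, trace); trace lists each action with the resulting stack."""
    stack, trace = [], []
    remaining = list(tokens) + ["$"]  # "$" marks the end of input
    i = 0
    while True:
        lookahead = remaining[i]
        if reduce_once(stack, lookahead):
            trace.append(("reduce", " ".join(stack)))
        elif lookahead == "$":
            break
        else:
            stack.append(lookahead)
            i += 1
            trace.append(("shift", " ".join(stack)))
    return stack == ["assignment"], trace


accepted, trace = parse("ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;".split())
for action, stack_text in trace:
    print(f"{stack_text} ... {action}")
print("SUCCESS" if accepted else "FAIL")
```

Holding back the reduction of term while the look-ahead is * or / is what gives multiplication higher precedence than addition in this sketch.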

Bottom-up Parsing Example (3)

LR parsing processes the input stream from the Left and tries to match the input to the Right side of a production.

When something matches, it reduces the expression to a left side non-terminal symbol.

Repeat the process until the entire input stream is matched.

This could potentially be an O(n^3) task, but Knuth and others devised a table-based algorithm that is O(n).

The Parsing Problem

Top-down parsers must decide which production to use based on the current symbol, and perhaps "peeking" at the next symbol (or two...).

Predictive parser: a parser that bases its actions on the next available token (called single symbol look-ahead).

Two conditions are necessary: [see Louden, p. 108-110]

The Parsing Problem (cont.)

Condition 1: the ability to choose between multiple alternatives, such as: A → α1 | α2 | ... | αn

Define First(α) = the set of all tokens that can be the first token of any string derived from α.

Then a predictive parser can be used for rule A if the First sets of the alternatives are pairwise disjoint:

First(αi) ∩ First(αj) is empty for all i ≠ j.

Condition 2: the ability of the parser to detect the presence of an optional element, such as A → [ α ].

Can the parser detect for certain when α is present?
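Condition 1 can be checked mechanically. The sketch below computes First sets for the left-recursive expression grammar used in the earlier examples and tests whether the alternatives of each rule are pairwise disjoint. It assumes a grammar with no empty productions, and the dictionary layout and function names are illustrative choices.

```python
# The expression grammar from the parsing examples: nonterminal -> alternatives.
GRAMMAR = {
    "expression": [["expression", "+", "term"], ["expression", "-", "term"], ["term"]],
    "term": [["term", "*", "factor"], ["term", "/", "factor"], ["factor"]],
    "factor": [["(", "expression", ")"], ["ID"], ["NUMBER"]],
}


def first_sets(grammar):
    """First set of every nonterminal (assumes no empty productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_of(alt, first):
    """First set of one alternative: determined by its leading symbol."""
    head = alt[0]
    return first[head] if head in first else {head}


def is_predictive(nt, grammar, first):
    """Condition 1: First sets of the alternatives are pairwise disjoint."""
    alts = grammar[nt]
    return all(
        first_of(a, first).isdisjoint(first_of(b, first))
        for i, a in enumerate(alts)
        for b in alts[i + 1:]
    )


first = first_sets(GRAMMAR)
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), "predictive:", is_predictive(nt, GRAMMAR, first))
```

Under these assumptions only factor passes the test. Every alternative of expression and term can begin with (, ID, or NUMBER, which is why the left-recursive form has to be rewritten before a predictive parser can use it.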

The Parsing Problem (cont.)

Example: list → expr [ list ].

How do we know that list isn't part of expr?

Define Follow(α) = the set of all tokens that can follow the non-terminal α in some production. Use a special symbol ($) to represent the end of input if α can be the end of input.

Example: Follow( factor ) = { +, -, *, /, ), $ } while Follow( term ) = { +, -, ), $ } (for the expression grammar written without left recursion, where * and / can only follow a factor).

Then a predictive parser can detect the presence of the optional symbol α if First(α) ∩ Follow(α) is empty.
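Follow sets can be computed with the same kind of fixed-point iteration as First sets. The sketch below uses an LL(1) form of the expression grammar in which the optional parts are explicit; the helper nonterminals expr_rest and term_rest, the ε marker, and the function names are assumptions introduced for illustration.

```python
EPS, END = "ε", "$"

# LL(1) form of the expression grammar; expr_rest and term_rest are hypothetical
# helper nonterminals that stand for the optional repeated parts.
GRAMMAR = {
    "expression": [["term", "expr_rest"]],
    "expr_rest": [["+", "term", "expr_rest"], ["-", "term", "expr_rest"], [EPS]],
    "term": [["factor", "term_rest"]],
    "term_rest": [["*", "factor", "term_rest"], ["/", "factor", "term_rest"], [EPS]],
    "factor": [["(", "expression", ")"], ["ID"], ["NUMBER"]],
}


def first_of(symbols, first):
    """First set of a symbol sequence; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # the start symbol can be followed by end of input
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # only nonterminals have Follow sets
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "expression")
for nt in ("factor", "term"):
    print(f"Follow({nt}) =", sorted(follow[nt]))
```

For this grammar the computation gives Follow(factor) = { +, -, *, /, ), $ } and Follow(term) = { +, -, ), $ }. The optional tail of an expression starts with + or -, and neither token is in Follow(expr_rest) = { ), $ }, so a single token of look-ahead decides whether the optional part is present.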

Review and Thought Questions

Lexics vs. Syntax vs. Semantics

Division between lexical and syntactic structure is not fixed:

a number can be a token or defined by a grammar rule.

Implementation can often decide:

scanners are faster

parsers are more flexible

error checking of number format as a regex is simpler

Division between syntax and semantics is not fixed:

we could define separate rules for IntegerNumber and FloatingPtNumber, IntegerTerm, FloatingPtTerm, ... in order to specify which mixed-mode operations are allowed,

or specify this as part of semantics.

Numbers: Scan or Parse?

We can construct numbers from digits using the scanner or the parser. Which is easier or better?

Scanner: define numbers as tokens:

number : -?\d+

Parser: grammar rules define numbers (digits are tokens):

number => '-' unsignednumber | unsignednumber

unsignednumber => unsignednumber digit | digit

digit => 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
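The two approaches can be compared side by side. In this sketch the scanner version is a single regular expression, written -?\d+ in Python regex syntax so that the sign is optional, and the parser version walks the characters one digit token at a time. The function names scan_number and parse_number are assumptions, and the left-recursive unsignednumber rule is implemented as a loop.

```python
import re

# Scanner approach: the whole number is one token described by a regex.
# re.ASCII restricts \d to 0-9 so both versions accept the same strings.
NUMBER_RE = re.compile(r"-?\d+", re.ASCII)


def scan_number(text):
    """True if the entire text is a number token."""
    return NUMBER_RE.fullmatch(text) is not None


def parse_number(chars):
    """Parser approach: number => '-' unsignednumber | unsignednumber."""
    pos = 0
    if pos < len(chars) and chars[pos] == "-":
        pos += 1  # optional leading sign
    start = pos
    # unsignednumber => unsignednumber digit | digit, written as a loop
    while pos < len(chars) and chars[pos] in "0123456789":
        pos += 1
    return pos > start and pos == len(chars)


for sample in ["42", "-7", "-", "4-2"]:
    print(repr(sample), scan_number(sample), parse_number(sample))
```

Both functions accept the same strings here; the regex version is shorter, which supports the point that checking number format in the scanner is simpler.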

Is Java 'Class' grammar context-free?

A class may have static and instance attributes.

An inner class or local class has the same syntax as a top-level class, but:

may not contain static members (except static constants)

inner class may access outer class using OuterClass.this

local class cannot be "public"

Does this mean the syntax for a class depends on context?

Alternative operator notation

Some languages use prefix notation: operator comes first

expr => + expr expr | * expr expr | NUMBER

Examples:

* + 2 3 4 means (2 + 3) * 4

+ 2 * 3 4 means 2 + (3 * 4)

Using prefix notation, we don't have to worry about the precedence of different operators in BNF rules!
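A short evaluator shows why no precedence rules are needed: the single rule for prefix expressions maps onto one recursive function, and the position of each operator fixes the grouping. The function name eval_prefix and the use of integer operands are assumptions for illustration.

```python
def eval_prefix(tokens):
    """Evaluate a prefix expression: + expr expr | * expr expr | NUMBER."""

    def expr(pos):
        # Returns (value, position of the next unread token).
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok in ("+", "*"):
            left, pos = expr(pos + 1)   # first operand
            right, pos = expr(pos)      # second operand
            return (left + right if tok == "+" else left * right), pos
        return int(tok), pos + 1        # NUMBER

    value, pos = expr(0)
    if pos != len(tokens):
        raise SyntaxError("extra tokens after expression")
    return value


print(eval_prefix("* + 2 3 4".split()))  # (2 + 3) * 4, prints 20
print(eval_prefix("+ 2 * 3 4".split()))  # 2 + (3 * 4), prints 14
```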