Context-Free Grammars 24 October 2013 OSU CSE 1


BL Compiler Structure


Figure: source code (string of characters) → Tokenizer → string of tokens (“words”) → Parser → abstract program → CodeGenerator → object code (string of integers)

The parser is arguably the most interesting, and most difficult, piece of the BL compiler.

Plan for the BL Parser

• Design a context-free grammar (CFG) to specify syntactically valid BL programs

• Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)


A grammar is a set of formation rules for strings in a language.


A grammar is context-free if it satisfies certain technical conditions described herein.

Languages

• A language is a set of strings over some alphabet Σ

• If L is a language, then mathematically L is a set of string of Σ (in conventional notation, L ⊆ Σ*, where Σ* is the set of all finite strings over Σ)

Aside: Characters vs. Tokens

• In the following examples of CFGs, we deal with languages over the alphabet of individual characters (e.g., Java’s char values): Σ = character

• In the BL project, we deal with languages over an alphabet of tokens (to be explained later)


Example: Real-Number Constants

• Some syntactically valid real-number constants (i.e., some strings in the “language of valid real-number constants”):
  – 37.0
  – 44615.22E16
  – 99241.
  – 18.E-93

CFG Rewrite Rules

real-const → digit-seq . digit-seq |
             digit-seq . digit-seq exponent |
             digit-seq . |
             digit-seq . exponent
exponent → E digit-seq |
           E + digit-seq |
           E - digit-seq
digit-seq → digit digit-seq |
            digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


Each of the four rules above is a rewrite rule (a replacement rule), which describes how strings in the language may be formed.


A name on the left of a rewrite rule (e.g., digit-seq) is called a non-terminal symbol.


The special CFG symbol → means “can be rewritten as” or “can be replaced by”.


The special CFG symbol | means “or”, i.e., there are multiple possible “rewrites” for the same non-terminal.


So the rule for real-const, with its four alternatives separated by |, means exactly the same thing as these four separate rewrite rules:

real-const → digit-seq . digit-seq
real-const → digit-seq . digit-seq exponent
real-const → digit-seq .
real-const → digit-seq . exponent


One non-terminal symbol (normally the one on the left of the first rewrite rule, here real-const) is called the start symbol.


A symbol from the alphabet on the right-hand side of a rewrite rule (e.g., E or 7) is called a terminal symbol.


To remember the name: terminal symbols are what you end up with when generating strings in the language (see below).

Four Components of a CFG

• Non-terminal symbols for this CFG:
  – real-const, exponent, digit-seq, digit
• Terminal symbols for this CFG:
  – ., E, +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
• Start symbol for this CFG:
  – real-const
• Rewrite rules for this CFG:
  – (see above)
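To make the four components concrete, here is a minimal Java sketch that stores them as plain data. It is an illustration under stated assumptions, not code from the BL project: the class RealConstGrammar and its fields are hypothetical names, each non-terminal maps to its list of alternatives (one per |), and each alternative is the list of symbols on its right-hand side.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical encoding of the four components of the real-const CFG.
public class RealConstGrammar {

    // 1. Non-terminal symbols
    static final Set<String> NON_TERMINALS =
            Set.of("real-const", "exponent", "digit-seq", "digit");

    // 2. Terminal symbols (single characters from the alphabet)
    static final Set<String> TERMINALS = Set.of(
            ".", "E", "+", "-",
            "0", "1", "2", "3", "4", "5", "6", "7", "8", "9");

    // 3. Start symbol
    static final String START = "real-const";

    // 4. Rewrite rules: each non-terminal maps to its alternatives (one per "|"),
    //    and each alternative is the sequence of symbols on its right-hand side.
    static final Map<String, List<List<String>>> RULES = Map.of(
            "real-const", List.of(
                    List.of("digit-seq", ".", "digit-seq"),
                    List.of("digit-seq", ".", "digit-seq", "exponent"),
                    List.of("digit-seq", "."),
                    List.of("digit-seq", ".", "exponent")),
            "exponent", List.of(
                    List.of("E", "digit-seq"),
                    List.of("E", "+", "digit-seq"),
                    List.of("E", "-", "digit-seq")),
            "digit-seq", List.of(
                    List.of("digit", "digit-seq"),
                    List.of("digit")),
            "digit", List.of(
                    List.of("0"), List.of("1"), List.of("2"), List.of("3"), List.of("4"),
                    List.of("5"), List.of("6"), List.of("7"), List.of("8"), List.of("9")));

    // Consistency check: the start symbol is a non-terminal, the rule names are
    // exactly the non-terminals, and every right-hand-side symbol is either a
    // terminal or a non-terminal.
    static boolean isWellFormed() {
        if (!NON_TERMINALS.contains(START) || !RULES.keySet().equals(NON_TERMINALS)) {
            return false;
        }
        for (List<List<String>> alternatives : RULES.values()) {
            for (List<String> rhs : alternatives) {
                for (String symbol : rhs) {
                    if (!TERMINALS.contains(symbol) && !NON_TERMINALS.contains(symbol)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("well-formed: " + isWellFormed());
        System.out.println(START + " has " + RULES.get(START).size() + " alternatives");
    }
}
```

Keeping terminals and non-terminals in separate sets is what lets the check tell them apart; in the BL project the terminal symbols would be tokens rather than characters.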


Derivations

• A derivation of a string of terminal symbols consists of a sequence of specific rewrite-rule applications that begins with the start symbol and continues until only terminal symbols remain
  – A string is in the language of the CFG iff there is a derivation that leads to it
• The symbol ⇒ indicates a derivation step, i.e., a specific rewrite-rule application


Example: Derivation of 5.6E10

• Begin with the start symbol: real-const

• ... and pick one possible rewrite:
  real-const → digit-seq . digit-seq | digit-seq . digit-seq exponent | digit-seq . | digit-seq . exponent

Which rewrite is appropriate to derive 5.6E10?

• This is the first step of the derivation:
  real-const ⇒ digit-seq . digit-seq exponent


• Choose a non-terminal to rewrite (here, the first digit-seq):
  real-const ⇒ digit-seq . digit-seq exponent

• ... and pick one possible rewrite:
  digit-seq → digit digit-seq | digit

Which rewrite is appropriate to derive 5.6E10?

• This is the second step of the derivation:
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent


• Choose a non-terminal to rewrite (here, digit):
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent

• ... and pick one possible rewrite:
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

• This is the third step of the derivation:
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent
             ⇒ 5 . digit-seq exponent


• Choose a non-terminal to rewrite (here, the remaining digit-seq):
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent
             ⇒ 5 . digit-seq exponent

• ... and pick one possible rewrite:
  digit-seq → digit digit-seq | digit

One Derivation of 5.6E10

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent
           ⇒ 5 . digit-seq exponent
           ⇒ 5 . digit exponent
           ⇒ 5 . 6 exponent
           ⇒ 5 . 6 E digit-seq
           ⇒ 5 . 6 E digit digit-seq
           ⇒ 5 . 6 E 1 digit-seq
           ⇒ 5 . 6 E 1 digit
           ⇒ 5 . 6 E 1 0


Note that a derivation is used in this way to generate a string in the language of the CFG.
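Generating a string this way is mechanical enough to simulate. The Java sketch below (a hypothetical class LeftmostDerivation, not course code) hard-codes the real-const rules, always rewrites the leftmost non-terminal, and is told which alternative to pick at each step as a 0-based index into that non-terminal’s alternatives. The choice list in main is an assumption picked to reproduce the derivation of 5.6E10 above, which happens to rewrite the leftmost non-terminal every time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replays a leftmost derivation in the real-const CFG.
public class LeftmostDerivation {

    // Rewrite rules; each alternative is a space-separated right-hand side.
    static final Map<String, String[]> RULES = Map.of(
            "real-const", new String[] {
                "digit-seq . digit-seq", "digit-seq . digit-seq exponent",
                "digit-seq .", "digit-seq . exponent"},
            "exponent", new String[] {"E digit-seq", "E + digit-seq", "E - digit-seq"},
            "digit-seq", new String[] {"digit digit-seq", "digit"},
            "digit", new String[] {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"});

    // Applies one rewrite per entry of choices, always to the leftmost
    // non-terminal, and returns every sentential form, start symbol first.
    static List<String> derive(String start, int... choices) {
        List<String> form = new ArrayList<>(List.of(start));
        List<String> steps = new ArrayList<>();
        steps.add(String.join(" ", form));
        for (int choice : choices) {
            int i = 0;
            while (i < form.size() && !RULES.containsKey(form.get(i))) {
                i++; // skip terminal symbols
            }
            if (i == form.size()) {
                throw new IllegalArgumentException("only terminal symbols remain");
            }
            String[] rhs = RULES.get(form.get(i))[choice].split(" ");
            form.remove(i);                     // remove the non-terminal...
            form.addAll(i, Arrays.asList(rhs)); // ...and splice in its rewrite
            steps.add(String.join(" ", form));
        }
        return steps;
    }

    public static void main(String[] args) {
        // Choices that reproduce the derivation of 5.6E10 shown above.
        List<String> steps = derive("real-const", 1, 1, 5, 1, 6, 0, 0, 1, 1, 0);
        System.out.println(steps.get(0));
        for (int k = 1; k < steps.size(); k++) {
            System.out.println("  => " + steps.get(k));
        }
    }
}
```

Each printed line is one sentential form, and the last one contains only terminal symbols. A different choice list, or a rule that always rewrites the rightmost non-terminal instead, produces other derivations of the same string.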

Another Derivation of 5.6E10

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit-seq . digit-seq E digit-seq
           ⇒ digit-seq . digit-seq E digit digit-seq
           ⇒ digit-seq . digit-seq E digit digit
           ⇒ digit-seq . digit-seq E digit 0
           ⇒ digit-seq . digit-seq E 1 0
           ⇒ digit-seq . digit E 1 0
           ⇒ digit-seq . 6 E 1 0
           ⇒ digit . 6 E 1 0
           ⇒ 5 . 6 E 1 0

Derivation Trees

• A derivation tree depicts a derivation (such as those above) as a tree
• Note that the order in which rewrites are done is sometimes arbitrary
  – A tree captures the required temporal order of rewrites from top to bottom
  – A tree captures the required spatial order among terminal symbols from left to right
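One way to see this in code is the minimal Java sketch below; the class DerivationTree and its methods are hypothetical names chosen for illustration. Each node holds a symbol, and the children of a non-terminal are the right-hand side of the rewrite applied to it. Printing from the root down shows the top-to-bottom order of rewrites, and reading the leaves from left to right recovers the derived string.

```java
import java.util.List;

// Hypothetical sketch of a derivation tree: each node holds a symbol, and the
// children of a non-terminal are the right-hand side of the rewrite applied to it.
public class DerivationTree {

    final String symbol;
    final List<DerivationTree> children;

    DerivationTree(String symbol, DerivationTree... children) {
        this.symbol = symbol;
        this.children = List.of(children);
    }

    // Concatenates the leaves from left to right; in a complete tree every
    // leaf is a terminal symbol.
    String derivedString() {
        if (children.isEmpty()) {
            return symbol;
        }
        StringBuilder sb = new StringBuilder();
        for (DerivationTree child : children) {
            sb.append(child.derivedString());
        }
        return sb.toString();
    }

    // Prints one node per line, root first, children indented below it.
    void print(String indent) {
        System.out.println(indent + symbol);
        for (DerivationTree child : children) {
            child.print(indent + "  ");
        }
    }

    static DerivationTree t(String symbol, DerivationTree... children) {
        return new DerivationTree(symbol, children);
    }

    // The derivation tree for 5.6E10 in the real-const CFG.
    static DerivationTree example() {
        return t("real-const",
                t("digit-seq", t("digit", t("5"))),
                t("."),
                t("digit-seq", t("digit", t("6"))),
                t("exponent",
                        t("E"),
                        t("digit-seq",
                                t("digit", t("1")),
                                t("digit-seq", t("digit", t("0"))))));
    }

    public static void main(String[] args) {
        DerivationTree tree = example();
        tree.print("");
        System.out.println(tree.derivedString());
    }
}
```

The tree records which rewrite produced each node but not when it happened, which is why several derivations can share one tree.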


A Derivation Tree for 5.6E10

real-const
  digit-seq
    digit
      5
  .
  digit-seq
    digit
      6
  exponent
    E
    digit-seq
      digit
        1
      digit-seq
        digit
          0


This tree captures both derivations previously illustrated (and all others) for 5.6E10.

Other Examples

• Can you find a derivation tree for 5.E3?
  – If so, it’s in the language of the CFG; otherwise it’s not in that language
• Can you find a derivation tree for .6E10?
  – If so, it’s in the language of the CFG; otherwise it’s not in that language
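A CFG can also be read in the other direction, to decide whether a given string is in the language. The following Java sketch (a hypothetical class RealConstRecognizer, not the BL parser) has one method per non-terminal, in the spirit of the recursive-descent plan. Two simplifications are assumptions of this sketch: digit-seq is handled as a loop over one or more digits, which derives the same strings as the recursive rule, and the four real-const alternatives are folded into one method with an optional digit-seq and an optional exponent after the ".".

```java
// Hypothetical sketch: decides membership in the real-const language,
// one method per non-terminal.
public class RealConstRecognizer {

    private final String s;
    private int pos = 0;

    private RealConstRecognizer(String s) {
        this.s = s;
    }

    // true iff the whole string can be derived from real-const
    static boolean isRealConst(String s) {
        RealConstRecognizer r = new RealConstRecognizer(s);
        return r.realConst() && r.pos == s.length();
    }

    // real-const: digit-seq, then ".", then optional digit-seq, then optional
    // exponent (this covers all four alternatives of the rule)
    private boolean realConst() {
        if (!digitSeq() || !accept('.')) {
            return false;
        }
        if (pos < s.length() && isDigit(s.charAt(pos))) {
            digitSeq();
        }
        if (pos < s.length() && s.charAt(pos) == 'E') {
            return exponent();
        }
        return true;
    }

    // exponent -> E digit-seq | E + digit-seq | E - digit-seq
    private boolean exponent() {
        if (!accept('E')) {
            return false;
        }
        if (!accept('+')) {
            accept('-'); // the sign is optional
        }
        return digitSeq();
    }

    // digit-seq -> digit digit-seq | digit, i.e., one or more digits
    private boolean digitSeq() {
        if (pos >= s.length() || !isDigit(s.charAt(pos))) {
            return false;
        }
        while (pos < s.length() && isDigit(s.charAt(pos))) {
            pos++; // digit -> 0 | 1 | ... | 9
        }
        return true;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Consumes c if it is the next character.
    private boolean accept(char c) {
        if (pos < s.length() && s.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"5.6E10", "5.E3", ".6E10", "37.0", "18.E-93"}) {
            System.out.println(t + " -> " + isRealConst(t));
        }
    }
}
```

A recognizer like this only answers yes or no; a parser would additionally build the derivation tree (or, in BL, the Program object) as it goes.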


A Famous CFG

expr → expr add-op term | term
term → term mult-op factor | factor
factor → ( expr ) | digit-seq
add-op → + | -
mult-op → * | DIV | REM
digit-seq → digit digit-seq | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
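The plan for the BL parser is to turn a grammar into a recursive-descent parser. As a hedged illustration of that idea (not the BL parser), here is a Java evaluator for this expression CFG with one method per non-terminal; ExprEvaluator and its methods are hypothetical names. Two assumptions: the left-recursive rules for expr and term are coded as loops, the usual workaround because a method for expr that starts by calling itself would never return, and DIV and REM are taken to behave like Java’s integer / and %.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: evaluates strings of the expression CFG by recursive
// descent, one method per non-terminal.
public class ExprEvaluator {

    private final List<String> tokens;
    private int pos = 0;

    private ExprEvaluator(List<String> tokens) {
        this.tokens = tokens;
    }

    static long evaluate(String text) {
        ExprEvaluator p = new ExprEvaluator(tokenize(text));
        long value = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens.get(p.pos));
        }
        return value;
    }

    // Splits text into digit sequences, words (DIV, REM), and single characters.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int j = i + 1;
            if (isDigit(c)) {
                while (j < text.length() && isDigit(text.charAt(j))) j++;
            } else if (Character.isLetter(c)) {
                while (j < text.length() && Character.isLetter(text.charAt(j))) j++;
            }
            if (!Character.isWhitespace(c)) {
                out.add(text.substring(i, j));
            }
            i = j;
        }
        return out;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // expr -> expr add-op term | term, as a loop; operators group to the left.
    private long expr() {
        long value = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            long rhs = term();
            value = op.equals("+") ? value + rhs : value - rhs;
        }
        return value;
    }

    // term -> term mult-op factor | factor, handled the same way.
    private long term() {
        long value = factor();
        while (peek().equals("*") || peek().equals("DIV") || peek().equals("REM")) {
            String op = tokens.get(pos++);
            long rhs = factor();
            switch (op) {
                case "*": value = value * rhs; break;
                case "DIV": value = value / rhs; break; // assumed: Java integer division
                default: value = value % rhs; break;    // REM, assumed: Java remainder
            }
        }
        return value;
    }

    // factor -> ( expr ) | digit-seq
    private long factor() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            long value = expr();
            if (!peek().equals(")")) {
                throw new IllegalArgumentException("expected )");
            }
            pos++;
            return value;
        }
        if (!t.isEmpty() && isDigit(t.charAt(0))) {
            pos++;
            return Long.parseLong(t);
        }
        throw new IllegalArgumentException("unexpected token: " + t);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("4+6*2"));
        System.out.println(evaluate("(4+6)*2"));
    }
}
```

On 4+6*2 the loop in term finishes 6*2 before expr adds 4, mirroring the shape of the derivation tree for this grammar, so main prints 16 and then 20 for (4+6)*2.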


Example: 4+6*2

• Find a derivation tree for 4+6*2


A Derivation Tree for 4+6*2

expr
  expr
    term
      factor
        digit-seq
          digit
            4
  add-op
    +
  term
    term
      factor
        digit-seq
          digit
            6
    mult-op
      *
    factor
      digit-seq
        digit
          2

Example: (4+6)*2

• Find a derivation tree for (4+6)*2
• How is it different from the previous one?

A Simpler CFG for Expressions

expr → expr op expr | ( expr ) | digit-seq
op → + | - | * | DIV | REM
digit-seq → digit digit-seq | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

One Derivation Tree for 4+6*2

expr
  expr
    digit-seq
      digit
        4
  op
    +
  expr
    expr
      digit-seq
        digit
          6
    op
      *
    expr
      digit-seq
        digit
          2

Another Derivation Tree for 4+6*2

expr
  expr
    expr
      digit-seq
        digit
          4
    op
      +
    expr
      digit-seq
        digit
          6
  op
    *
  expr
    digit-seq
      digit
        2

Ambiguity

• The second (simpler) CFG for arithmetic expressions is ambiguous because some strings in the language of the CFG have more than one derivation tree
• As is often the case, ambiguity is bad
  – If you want to use the derivation tree as the basis for evaluating the expression, only one of the derivation trees shown above results in the right answer (which one?)
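Ambiguity can be observed by brute force. This Java sketch (a hypothetical class AmbiguityDemo) enumerates every derivation tree of a token sequence under the simpler CFG by trying each alternative for expr on every sub-range of tokens, and records the value each tree would give. Assumptions of the sketch: the input is already split into tokens, each digit-seq token is read as a decimal number, DIV and REM behave like Java’s / and %, and the search is exponential, so it is only meant for short inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: for the ambiguous CFG
//   expr -> expr op expr | ( expr ) | digit-seq
// lists the value of every derivation tree of a token sequence.
public class AmbiguityDemo {

    // One entry per derivation tree of tokens[i..j] from expr.
    static List<Long> values(String[] tokens, int i, int j) {
        List<Long> out = new ArrayList<>();
        if (i == j && !tokens[i].isEmpty()
                && tokens[i].chars().allMatch(c -> c >= '0' && c <= '9')) {
            out.add(Long.parseLong(tokens[i]));                 // expr -> digit-seq
        }
        if (j - i >= 2 && tokens[i].equals("(") && tokens[j].equals(")")) {
            out.addAll(values(tokens, i + 1, j - 1));           // expr -> ( expr )
        }
        for (int k = i + 1; k < j; k++) {                       // expr -> expr op expr
            if (!isOp(tokens[k])) {
                continue;
            }
            for (long left : values(tokens, i, k - 1)) {
                for (long right : values(tokens, k + 1, j)) {
                    out.add(apply(tokens[k], left, right));
                }
            }
        }
        return out;
    }

    static boolean isOp(String t) {
        return t.equals("+") || t.equals("-") || t.equals("*")
                || t.equals("DIV") || t.equals("REM");
    }

    // DIV and REM are assumed to behave like Java's / and %.
    static long apply(String op, long a, long b) {
        switch (op) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            case "DIV": return a / b;
            default: return a % b;
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"4", "+", "6", "*", "2"};
        List<Long> results = values(tokens, 0, tokens.length - 1);
        System.out.println(results.size() + " derivation trees, values " + results);
    }
}
```

For 4+6*2 it finds two derivation trees, matching the two drawn above, and they produce two different values, which is exactly the problem when a tree is used as the basis for evaluation. Adding parentheses, as in (4+6)*2, leaves only one tree.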

