Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input...

19
Looking ahead in Looking ahead in javacc javacc 2/28/06 2/28/06

Transcript of Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input...

Page 1: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

Looking ahead in javaccLooking ahead in javacc

2/28/062/28/06

Page 2: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

2

What’s LOOKAHEAD?What’s LOOKAHEAD?

• The job of a parser is The job of a parser is to read an input to read an input stream and determine stream and determine whether or not the whether or not the input stream is in the input stream is in the grammar. grammar.

• This can be quite time This can be quite time consuming. consuming.

• Consider the following Consider the following exampleexample

void Input() :void Input() :{}{}{{ "a" BC() "c""a" BC() "c"}}

void BC() :void BC() :{}{}{{ "b" [ "c" ]"b" [ "c" ]}}

What strings are

matched?

Page 3: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

3

Matching “abc”Matching “abc”Step 1Step 1 Starting; there’s only one choice here - the char must be 'a' – Starting; there’s only one choice here - the char must be 'a' –

which it is, so OK.which it is, so OK.

Step 2Step 2 Proceeding to non-terminal BC; again, there’s Proceeding to non-terminal BC; again, there’sonly one choice for the next input character - it must be 'b'. This is in only one choice for the next input character - it must be 'b'. This is in

line w/ the input - fineline w/ the input - fine

Step 3Step 3 We now come to a We now come to a "choice point""choice point" in the grammar. We can in the grammar. We can eithereither

go inside the [...] and match it, or ignore it altogether. We decidego inside the [...] and match it, or ignore it altogether. We decideto go inside. So the next input character must be a 'c'. We areto go inside. So the next input character must be a 'c'. We areagain OK.again OK.

Step 4Step 4 Now we have completed with non-terminal BC and go back to Now we have completed with non-terminal BC and go back tonon-terminal Input. Now the grammar says the next character must benon-terminal Input. Now the grammar says the next character must beyet another 'c'. But there are no more input characters. So we haveyet another 'c'. But there are no more input characters. So we havea problema problem..

Page 4: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

4

Steps ContinuedSteps Continued

Step 5Step 5. In the general case, we conclude a bad choice happened . In the general case, we conclude a bad choice happened somewhere. In this case, we made the bad choice in Step 3; so somewhere. In this case, we made the bad choice in Step 3; so backtrack to step 3 and make another choice. backtrack to step 3 and make another choice.

Step 6.Step 6. We have now backtracked and made the other choice we We have now backtracked and made the other choice we couldcouldhave made at Step 3 - namely, ignore the [...]. Now we have have made at Step 3 - namely, ignore the [...]. Now we have completedcompletedwith non-terminal BC and go back to non-terminal Input. Now thewith non-terminal BC and go back to non-terminal Input. Now thegrammar says the next character must be yet another 'c'. The nextgrammar says the next character must be yet another 'c'. The nextinput character is a 'c', so we are OK now.input character is a 'c', so we are OK now.

Step 7Step 7. We realize we have reached the end of the grammar (end of. We realize we have reached the end of the grammar (end ofnon-terminal Input) successfully. This means we have successfullynon-terminal Input) successfully. This means we have successfullymatched the string "abc" to the grammar.matched the string "abc" to the grammar.

Backtracking is to be avoided!

Page 5: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

5

RethinkingRethinking

• The amount of time taken is a The amount of time taken is a function of how the grammar is function of how the grammar is written. written.

• Many grammars can be written to Many grammars can be written to cover the same set of inputs - or the cover the same set of inputs - or the same language (i.e., there can be same language (i.e., there can be multiple equivalent grammars for the multiple equivalent grammars for the same input language). same input language).

• What about the grammar above?What about the grammar above?

Page 6: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

6

What can be said of these?What can be said of these?

void Input() : {} { "a" "b" "c" [ "c" ] }

void Input() :{}{ "a" ( BC1() | BC2() )}

void BC1() :{}{ "b" "c" "c"}

void BC2() :{}{ "b" "c" [ "c" ]}

void Input() :{}{ "a" "b" "c" "c"| "a" "b" "c"}

GoodGood

BadBad

UglyUgly

Page 7: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

7

Looking AheadLooking Ahead• Backtracking performance is unacceptable so most parsers don’t backtrack in this Backtracking performance is unacceptable so most parsers don’t backtrack in this

general manner (if at all), rather they make decisions at choice points based on general manner (if at all), rather they make decisions at choice points based on limited information and then commit to it.limited information and then commit to it.

• Parsers generated by javacc make decisions at choice points based on some Parsers generated by javacc make decisions at choice points based on some exploration of tokens further ahead in the input stream, and once they make such a exploration of tokens further ahead in the input stream, and once they make such a decision, they commit to it. i.e.,No backtracking is performed once a decision is decision, they commit to it. i.e.,No backtracking is performed once a decision is made.made.

• The process of exploring tokens further in the input stream is termed "looking ahead" The process of exploring tokens further in the input stream is termed "looking ahead" into the input stream - hence our use of the term "LOOKAHEAD".into the input stream - hence our use of the term "LOOKAHEAD".

• Since some of these decisions may be made with less than perfect information you Since some of these decisions may be made with less than perfect information you need to know something about LOOKAHEAD to make your grammar work correctly.need to know something about LOOKAHEAD to make your grammar work correctly.

• The two ways in which you make the choice decisions work properly are:The two ways in which you make the choice decisions work properly are:

1.1. . Modify the grammar to make it simpler.. Modify the grammar to make it simpler.

2.2. . Insert hints at the more complicated choice points to help the parser make the right . Insert hints at the more complicated choice points to help the parser make the right choices.choices.

Page 8: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

8

Four Choice Points in javaccFour Choice Points in javacc

1.1. An expansion of the form: ( exp1 | exp2 | ... ). In this case, An expansion of the form: ( exp1 | exp2 | ... ). In this case, the generated parser has to somehow determine which of the generated parser has to somehow determine which of exp1, exp2, etc. to select to continue parsing. exp1, exp2, etc. to select to continue parsing.

2.2. . An expansion of the form: ( exp )?. In this case, the . An expansion of the form: ( exp )?. In this case, the generated parser must somehow determine whether to generated parser must somehow determine whether to choose exp or to continue beyond the ( exp )? without choose exp or to continue beyond the ( exp )? without choosing exp.choosing exp.

3.3. An expansion of the form ( exp )*. In this case, the An expansion of the form ( exp )*. In this case, the generated parser must do the same thing as in the generated parser must do the same thing as in the previous case, and furthermore, after each time a previous case, and furthermore, after each time a successful match of exp (if exp was chosen) is completed, successful match of exp (if exp was chosen) is completed, this choice determination must be made again. this choice determination must be made again.

4.4. An expansion of the form ( exp )+. This is essentially An expansion of the form ( exp )+. This is essentially similar to the previous case with a mandatory first match similar to the previous case with a mandatory first match to exp to exp

Page 9: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

9

The Default AlgoThe Default Algo• The default choice determination The default choice determination

algorithm looks ahead 1 token in the algorithm looks ahead 1 token in the input stream and uses this to help input stream and uses this to help make its choice at choice pointsmake its choice at choice points

void basic_expr() :void basic_expr() :{}{}{{ <ID> "(" expr() ")"<ID> "(" expr() ")" // Choice 1// Choice 1|| "(" expr() ")""(" expr() ")" // Choice 2// Choice 2|| "new" <ID>"new" <ID> // Choice 3// Choice 3}}

The choice determination algorithm :

if (next token is <ID>) { choose Choice 1} else if (next token is "(") { choose Choice 2} else if (next token is "new") { choose Choice 3} else { produce an error message}

Page 10: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

10

A Modified GrammarA Modified Grammarvoid basic_expr() :

{}

{

<ID> "(" expr() ")“ // Choice 1

|

"(" expr() ")" // Choice 2

|

"new" <ID> // Choice 3

|

<ID> "." <ID> // Choice 4

}

What happans on

<ID>?

Why?

Warning: Choice conflict involving two expansions at line 25, column 3 and line 31, column 3 respectively. A common prefix is: <ID> Consider using a lookahead of 2 for earlier expansion.

Page 11: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

11

Another example Another example

void identifier_list() :void identifier_list() :{}{}{{ <ID> ( "," <ID> )*<ID> ( "," <ID> )*}}

• Suppose the first <ID> has already been matched and that Suppose the first <ID> has already been matched and that the parser has reached the choice point (the (...)* construct). the parser has reached the choice point (the (...)* construct). Here's how the choice determination algorithm works:Here's how the choice determination algorithm works:

while (next token is ",") {while (next token is ",") { choose the nested expansion (i.e., go into the (...)* choose the nested expansion (i.e., go into the (...)* construct)construct) consume the "," tokenconsume the "," token if (next token is <ID>) consume it, otherwise report errorif (next token is <ID>) consume it, otherwise report error}}

Note: the choice determination algorithm does not look beyond the (...)*

Page 12: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

12

What to do here?What to do here?

• When the default algorithm is making a choice at ( "," When the default algorithm is making a choice at ( "," <ID> )*, it will always go into the (...)* construct if the <ID> )*, it will always go into the (...)* construct if the next token is a ",".next token is a ",".

• It will do this even when identifier_list was called from It will do this even when identifier_list was called from funny_list and the token after the "," is an <INT>. funny_list and the token after the "," is an <INT>.

• Intuitively, the right thing to do in this situation is to Intuitively, the right thing to do in this situation is to skip the (...)* construct and return to funny_listskip the (...)* construct and return to funny_list

void funny_list() : {} { identifier_list() "," <INT> }

void identifier_list() :void identifier_list() :{}{}{{ <ID> ( "," <ID> )*<ID> ( "," <ID> )*}}

Page 13: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

13

A Concrete exampleA Concrete example

Consider Consider "id1, id2, 5","id1, id2, 5", the parser will complain that it the parser will complain that it encountered a 5 when it was expecting an <ID>. encountered a 5 when it was expecting an <ID>. NoteNote - when - when you built the parser, it would have given you the following you built the parser, it would have given you the following warning message:warning message:

Warning: Choice conflict in (...)* construct at line 25, column 8.Warning: Choice conflict in (...)* construct at line 25, column 8. Expansion nested within construct and expansion following Expansion nested within construct and expansion following constructhave common prefixes, one of which is: ",“ Consider constructhave common prefixes, one of which is: ",“ Consider using a lookahead of 2 or more for nested expansion.using a lookahead of 2 or more for nested expansion.

Essentially, JavaCC is saying it has detected a situation in yourEssentially, JavaCC is saying it has detected a situation in yourgrammar which may cause the default lookahead algorithm to grammar which may cause the default lookahead algorithm to do strange things. The generated parser will still work using do strange things. The generated parser will still work using the default lookahead algorithm - except that it probably the default lookahead algorithm - except that it probably doesn’t do what you expectdoesn’t do what you expect

Page 14: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

14

Multiple Token Lookaheads Multiple Token Lookaheads SpecsSpecs

• In the majority of situations, the default algorithm In the majority of situations, the default algorithm works just fine. In situations where it does not works just fine. In situations where it does not work well, work well, javacc javacc provides you with warning provides you with warning messages likethe ones shown above. messages likethe ones shown above.

• If you have If you have javacc file javacc file without producing any without producing any warnings, then the grammar is a LL(1) grammar. warnings, then the grammar is a LL(1) grammar.

• Essentially, LL(1) grammars are those that can be Essentially, LL(1) grammars are those that can be handled by top-down parsers (such as those handled by top-down parsers (such as those generatedby generatedby javaccjavacc using at most one token of using at most one token of LOOKAHEAD.LOOKAHEAD.

• There are two options for lookaheadsThere are two options for lookaheads

Page 15: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

15

LL(1)?LL(1)?

• When you derive table multiple When you derive table multiple entries in a row/column indicated an entries in a row/column indicated an errorerror

• See See www.cs.usfca.edu/galles/cs414/lecturwww.cs.usfca.edu/galles/cs414/lecture/lecture3.java.pdfe/lecture3.java.pdf

Page 16: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

16

Option 1 - Option 1 - Modify your grammar

• You can modify your grammar so that the You can modify your grammar so that the warning messages go away. That is, you can warning messages go away. That is, you can attempt to make your grammar LL(1) by making attempt to make your grammar LL(1) by making

some changes to itsome changes to it void basic_expr() :{}{ <ID> ( "(" expr() ")" | "." <ID> )| "(" expr() ")"| "new" <ID>}

void basic_expr() :

{}

{

<ID> "(" expr() ")“ // Choice 1

|

"(" expr() ")" // Choice 2

|

"new" <ID> // Choice 3

|

<ID> "." <ID> // Choice 4

}

FactorFactor

Page 17: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

17

Option 2 – Provide “Hints”Option 2 – Provide “Hints”• You can provide the generated parser with some hints to help it out in the non-LL(1) You can provide the generated parser with some hints to help it out in the non-LL(1)

situations that the warning messages bring to your attention.situations that the warning messages bring to your attention.• All such hints are specified using either setting the global LOOKAHEAD value to a All such hints are specified using either setting the global LOOKAHEAD value to a

larger value or by using the LOOKAHEAD(...) construct to provide a local hint.larger value or by using the LOOKAHEAD(...) construct to provide a local hint.• Picking Option 1 or Option 2 is often a design decision However Picking Option 1 or Option 2 is often a design decision However

– Option 1 makes your grammar perform better. JavaCC generated parsers can handle LL(1) Option 1 makes your grammar perform better. JavaCC generated parsers can handle LL(1) constructs much faster than other constructs.constructs much faster than other constructs.

– Option 2 is that you have a simplerOption 2 is that you have a simpler grammar - one that is easier to develop and maintain - one grammar - one that is easier to develop and maintain - one that focuses on human-friendliness and not machine-friendliness.that focuses on human-friendliness and not machine-friendliness.

– Sometimes Option 2 is the only choice - especially in the presence of user actions.Sometimes Option 2 is the only choice - especially in the presence of user actions.

void basic_expr() :void basic_expr() :{}{}{{ { initMethodTables(); } <ID> "(" expr() ")"{ initMethodTables(); } <ID> "(" expr() ")"|| "(" expr() ")""(" expr() ")"|| "new" <ID>"new" <ID>|| { initObjectTables(); } <ID> "." <ID>{ initObjectTables(); } <ID> "." <ID>}}

• Since the actions are different, left-factoring cannot be performed.Since the actions are different, left-factoring cannot be performed.

Page 18: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

18

• Global Option LOOKAHEADGlobal Option LOOKAHEAD

void basic_expr() :{}{ LOOKAHEAD(2) <ID> "(" expr() ")"// Choice 1| "(" expr() ")" // Choice 2| "new" <ID> // Choice 3| <ID> "." <ID> // Choice

4}

void basic_expr() :{}{ LOOKAHEAD(2) <ID> "(" expr() ")"// Choice 1| "(" expr() ")" // Choice 2| "new" <ID> // Choice 3| <ID> "." <ID> // Choice

4}

if (next 2 tokens are <ID> and "(" ) { choose Choice 1} else if (next token is "(") { choose Choice 2} else if (next token is "new") { choose Choice 3} else if (next token is <ID>) { choose Choice 4} else { produce an error message}

if (next 2 tokens are <ID> and "(" ) { choose Choice 1} else if (next token is "(") { choose Choice 2} else if (next token is "new") { choose Choice 3} else if (next token is <ID>) { choose Choice 4} else { produce an error message}

Page 19: Looking ahead in javacc 2/28/06. 2 What’s LOOKAHEAD? The job of a parser is to read an input stream and determine whether or not the input stream is in.

19

ReferencesReferences

• https://javacc.dev.java.net/doc/https://javacc.dev.java.net/doc/lookahead.htmllookahead.html