Fall 2013
Chart 2
Sub-phases of Syntactic Analysis
Grammars Revisited
Parsing
Abstract Syntax Trees
Scanning
Case Study: Syntactic Analysis in the Triangle Compiler
Chart 3
[Figure: compiler phases. Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code, with the Symbol Table alongside the phases.]
Chart 4
Main function
o Parse source program to discover its phrase structure
o Recursive-descent parsing
o Constructing an AST
o Scanning to group characters into tokens
Chart 5
Scanning (or lexical analysis)
o Source program transformed to a stream of tokens
• Identifiers
• Literals
• Operators
• Keywords
• Punctuation
o Comments and blank spaces discarded
Parsing
o To determine the source program's phrase structure
o Source program is input as a stream of tokens (from the Scanner)
o Treats each token as a terminal symbol
Representation of phrase structure
o AST
Chart 6
Scan the file character by character and group characters into words and punctuation (tokens); remove white space and comments
Example source:
let var y: Integer
in !new year
y := y+1
Tokens for this example:
let   var   y   :   Integer   in   y   :=   y   +   1
Note: !new year does not appear in the list of tokens. Comments are removed along with white space.
Chart 7
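[Figure: scanner data flow. The source text
let var y: Integer
in !new year
y := y+1
passes through the Input Converter into a Buffer as a character string (l e t _ v a r _ y : _ I n t e g e r _ i n . . . , where _ = space), and the Scanner emits the tokens below.]
Spelling    Kind
let         let
var         var
y           Ident.
:           colon
Integer     Ident.
in          in
y           Ident.
:=          becomes
y           Ident.
+           op.
1           Intlit.
(end)       eot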
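The data flow above can be sketched as a small hand-written scanner. This is only an illustration, not the Triangle scanner: the class name MiniScanner, the method scan, and the "<eot>" marker are made up for this sketch, and it returns only token spellings (no token kinds). It assumes a comment starts with ! and runs to the end of the line, and it uses one character of look-ahead to tell : from :=.

```java
import java.util.ArrayList;
import java.util.List;

class MiniScanner {
    // Returns the spellings of the tokens in text.
    // White space and comments (! up to end of line) are skipped.
    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                    // discard white space
            } else if (c == '!') {
                while (i < text.length() && text.charAt(i) != '\n') i++;  // discard comment
            } else if (Character.isLetter(c)) {
                int start = i;                          // identifier or keyword
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                tokens.add(text.substring(start, i));
            } else if (Character.isDigit(c)) {
                int start = i;                          // integer literal
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add(text.substring(start, i));
            } else if (c == ':' && i + 1 < text.length() && text.charAt(i + 1) == '=') {
                tokens.add(":=");                       // look-ahead: ":" followed by "="
                i += 2;
            } else {
                tokens.add(String.valueOf(c));          // single-character token
                i++;
            }
        }
        tokens.add("<eot>");                            // hypothetical end-of-text marker
        return tokens;
    }
}
```

Calling MiniScanner.scan on the three-line example yields the token spellings listed in the figure, with the comment gone.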
Chart 8
// literals, identifiers, operators...
INTLITERAL = 0, "<int>", CHARLITERAL = 1, "<char>", IDENTIFIER = 2, "<identifier>", OPERATOR = 3, "<operator>",
// reserved words - must be in alphabetical order...
ARRAY = 4, "array", BEGIN = 5, "begin", CONST = 6, "const", DO = 7, "do", ELSE = 8, "else", END = 9, "end", FUNC = 10, "func", IF = 11, "if", IN = 12, "in",
LET = 13, "let", OF = 14, "of", PROC = 15, "proc", RECORD = 16, "record", THEN = 17, "then", TYPE = 18, "type", VAR = 19, "var", WHILE = 20, "while",
// punctuation...
DOT = 21, ".", COLON = 22, ":", SEMICOLON = 23, ";", COMMA = 24, ",", BECOMES = 25, ":=", IS = 26, "~",
// brackets...
LPAREN = 27, "(", RPAREN = 28, ")", LBRACKET = 29, "[", RBRACKET = 30, "]", LCURLY = 31, "{", RCURLY = 32, "}",
// special tokens...
EOT = 33, "", ERROR = 34, "<error>"
Chart 9
Context-free grammars
o Generate a set of sentences
o Each sentence is a string of terminal symbols
o An unambiguous sentence has a unique phrase structure embodied in its syntax tree
Develop parsers from context-free grammars
Chart 10
A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols
Main features
o '|' separates alternatives
o '*' indicates that the previous item may be repeated zero or more times
o '(' and ')' are grouping parentheses
ε denotes the empty string, a special string of length 0
Chart 11
Algebraic Properties
o | is commutative and associative
• r|s = s|r
• r|(s|t) = (r|s)|t
o Concatenation is associative
• (rs)t = r(st)
o Concatenation distributes over |
• r(s|t) = rs|rt
• (s|t)r = sr|tr
o ε is the identity for concatenation
• εr = r
• rε = r
o * is idempotent
• r** = r*
• r* = (r|ε)*
Chart 12
Common Extensions
o r+ one or more of expression r, same as rr*
o r^k k repetitions of r
• r^3 = rrr
o ~r the characters not in the expression r
• ~[\t\n]
o r-z range of characters
• [0-9a-z]
o r? zero or one copy of expression r (used for fields of an expression that are optional)
Chart 13
Regular Expression for Representing Months
o Examples of legal inputs
• January represented as 1 or 01
• October represented as 10
o First Try: [0|1|ε][0-9]
• 0, 1, or ε followed by a number between 0 and 9
• Matches all legal inputs? Yes: 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
• Matches any illegal inputs? Yes: 0, 00, 18
Chart 14
Regular Expression for Representing Months
o Examples of legal inputs
• January represented as 1 or 01
• October represented as 10
o Second Try: [1-9]|(0[1-9])|(1[0-2])
• Any number between 1 and 9, or 0 followed by any number between 1 and 9, or 1 followed by any number between 0 and 2
• Matches all legal inputs? Yes: 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
• Matches any illegal inputs? No
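Both attempts can be checked mechanically. The sketch below uses java.util.regex; the class name MonthRegex and its method names are made up for illustration. The first try is written in Java syntax as [01]?[0-9] (an optional 0 or 1, then a digit); the second try is copied as it appears on the slide.

```java
import java.util.regex.Pattern;

class MonthRegex {
    // First try: optional 0 or 1 followed by any digit.
    static final Pattern FIRST_TRY = Pattern.compile("[01]?[0-9]");
    // Second try: 1-9, or 0 followed by 1-9, or 1 followed by 0-2.
    static final Pattern SECOND_TRY = Pattern.compile("[1-9]|(0[1-9])|(1[0-2])");

    // matches() succeeds only if the whole string is matched.
    static boolean firstTry(String s)  { return FIRST_TRY.matcher(s).matches(); }
    static boolean secondTry(String s) { return SECOND_TRY.matcher(s).matches(); }
}
```

With these definitions the first try accepts the illegal inputs 0, 00 and 18, while the second try rejects them and still accepts 1, 01, 09, 10 and 12.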
Chart 15
Regular Expression for Floating Point Numbers
o Examples of legal inputs
• 1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6, -2.5e+5
• Assume that a 0 is required before numbers less than 1 and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal
o Building the regular expression
• Assume digit → 0|1|2|3|4|5|6|7|8|9
• Handle simple decimals such as 1.0, 0.2, 3.14159
digit+.digit+ (1 or more digits, followed by ., followed by 1 or more decimals)
• Add an optional sign (only minus, no plus)
(-|ε)digit+.digit+ or -?digit+.digit+
Chart 16
Regular Expression for Floating Point Numbers (cont.)
o Building the regular expression (cont.)
• Format for the exponent
(E|e)(+|-)?(digit+)
• Adding it as an optional expression to the decimal part
(-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
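The final expression can be transcribed into java.util.regex syntax to try it on the sample inputs. This is a sketch: the class name FloatRegex is made up, digit is written as [0-9], and the decimal point is escaped because an unescaped . means "any character" in Java regular expressions.

```java
import java.util.regex.Pattern;

class FloatRegex {
    // (-|ε) digit+ . digit+ ( (E|e) (+|-)? digit+ )?
    static final Pattern FLOAT =
        Pattern.compile("-?[0-9]+\\.[0-9]+([Ee][+-]?[0-9]+)?");

    // True only if the whole string is a floating point number of this form.
    static boolean isFloat(String s) { return FLOAT.matcher(s).matches(); }
}
```

All seven legal examples from the slide match, as does 0003.14159. Inputs such as .5, +1.0 and 3.25e do not, and neither does an input without a decimal point such as 0011, because the expression requires the . and the digits after it.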
Chart 17
Extended BNF (EBNF)
o Combination of BNF and RE
o N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
o EBNF
• Right-hand side may use |, *, (, )
• Right-hand side may contain both terminal and nonterminal symbols
Chart 18
Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/
Generates:
e
a + b
a - b - c
a + (b * c)
a + (b + c) / d
a - (b - (c - (d - e)))
Chart 19
Left Factorization
XY | XZ is equivalent to X(Y | Z)
Before:
single-Command ::= V-name := Expression
| if Expression then single-Command
| if Expression then single-Command else single-Command
After:
single-Command ::= V-name := Expression
| if Expression then single-Command (ε | else single-Command)
Chart 20
Elimination of left recursion
N ::= X | NY is equivalent to N ::= X(Y)*
Identifier ::= Letter | Identifier Letter | Identifier Digit
Identifier ::= Letter | Identifier (Letter | Digit)
Identifier ::= Letter (Letter | Digit)*
Chart 21
Substitution of nonterminal symbols
Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N
Before:
single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
| …
Control-Variable ::= Identifier
To-or-Downto ::= to
| downto
After:
single-Command ::= for Identifier := Expression (to|downto) Expression do single-Command
| …
Chart 22
Starter set of an RE X
o starters[[X]]
o Set of terminal symbols that can start a string generated by X
Examples
o starters[[his | her | its]] = {h, i}
o starters[[(re)* set]] = {r, s}
Chart 23
Precise and complete definition of starters:
starters[[ε]] = {}
starters[[t]] = {t}, where t is a terminal symbol
starters[[X Y]] = starters[[X]] ∪ starters[[Y]], if X generates ε
starters[[X Y]] = starters[[X]], if X does not generate ε
starters[[X | Y]] = starters[[X]] ∪ starters[[Y]]
starters[[X*]] = starters[[X]]
To generalize for the starter set of an extended RE, add
o starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by production rule N ::= X
Chart 24
Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier
| ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/
starters[[Expression]] = starters[[primary-Expression (Operator primary-Expression)*]]
= starters[[primary-Expression]]
= starters[[Identifier]] ∪ starters[[( Expression )]]
= starters[[a | b | c | d | e]] ∪ { ( }
= {a, b, c, d, e, (}
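The rules for starters translate almost line by line into code. The sketch below covers plain REs only (ε, terminals, sequence, alternative, repetition); nonterminals are left out. The class RE and its factory methods are names invented for this illustration, and each terminal is a single character held in a String.

```java
import java.util.Set;
import java.util.TreeSet;

abstract class RE {
    abstract boolean generatesEmpty();   // can this RE generate ε?
    abstract Set<String> starters();     // starters[[this]]

    private static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }
    static RE empty() {                  // ε
        return new RE() {
            boolean generatesEmpty() { return true; }
            Set<String> starters() { return new TreeSet<>(); }
        };
    }
    static RE term(String t) {           // terminal symbol t
        return new RE() {
            boolean generatesEmpty() { return false; }
            Set<String> starters() { Set<String> s = new TreeSet<>(); s.add(t); return s; }
        };
    }
    static RE seq(RE x, RE y) {          // X Y
        return new RE() {
            boolean generatesEmpty() { return x.generatesEmpty() && y.generatesEmpty(); }
            Set<String> starters() {
                return x.generatesEmpty() ? union(x.starters(), y.starters()) : x.starters();
            }
        };
    }
    static RE alt(RE x, RE y) {          // X | Y
        return new RE() {
            boolean generatesEmpty() { return x.generatesEmpty() || y.generatesEmpty(); }
            Set<String> starters() { return union(x.starters(), y.starters()); }
        };
    }
    static RE star(RE x) {               // X*
        return new RE() {
            boolean generatesEmpty() { return true; }
            Set<String> starters() { return x.starters(); }
        };
    }
    static RE word(String w) {           // helper: a word as a sequence of its characters
        RE r = empty();
        for (int i = w.length() - 1; i >= 0; i--) {
            r = seq(term(String.valueOf(w.charAt(i))), r);
        }
        return r;
    }
}
```

For example, RE.alt(RE.word("his"), RE.alt(RE.word("her"), RE.word("its"))).starters() gives the set {h, i}, and RE.seq(RE.star(RE.word("re")), RE.word("set")).starters() gives {r, s}, as on the earlier slide.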
Chart 25
The purpose of scanning is to recognize tokens in the source program, that is, to group input characters (the source program text) into tokens.
Difference between parsing and scanning:
o Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands and analyzes the tokens for correctness and structure
o Scanning groups individual characters into tokens
Chart 26
[Figure: compiler phases, repeated from Chart 3. Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code, with the Symbol Table alongside the phases.]
Chart 27
[Figure: scanner data flow, repeated from Chart 7. Source text → Input Converter → Buffer (character string) → Scanner → tokens: let, var, y (Ident.), : (colon), Integer (Ident.), in, y (Ident.), := (becomes), y (Ident.), + (op.), 1 (Intlit.), eot.]
Chart 28
Handle keywords (reserved words)
o Recognizes identifiers and keywords
o Match explicitly
• Write a regular expression for each keyword
• Identifier is any alphanumeric string which is not a keyword
o Match as an identifier, perform lookup
• No special regular expressions for keywords
• When an identifier is found, perform a lookup into a preloaded keyword table
How does Triangle handle keywords? Discuss in terms of efficiency and ease of coding.
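The "match as an identifier, perform lookup" option can be sketched as follows. The class name KeywordLookup and the method classify are made up for this illustration; the table holds the reserved words from the token list on Chart 8, kept in alphabetical order so that a binary search can be used. This is one possible realization of the idea, not a statement of how any particular compiler does it.

```java
import java.util.Arrays;

class KeywordLookup {
    // Preloaded keyword table; must stay sorted for binarySearch to work.
    static final String[] KEYWORDS = {
        "array", "begin", "const", "do", "else", "end", "func", "if", "in",
        "let", "of", "proc", "record", "then", "type", "var", "while"
    };

    // Called after the scanner has recognized an identifier-shaped spelling.
    static String classify(String spelling) {
        return Arrays.binarySearch(KEYWORDS, spelling) >= 0 ? "keyword" : "identifier";
    }
}
```

With this table, classify("let") reports a keyword while classify("y") and classify("Integer") report identifiers. The scanner needs only one rule for identifier-shaped tokens, at the cost of one table lookup per identifier.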
Chart 29
Remove white space
o Tabs, spaces, new lines
Remove comments
o Single line
-- Ada comment
o Multi-line, start and end delimiters
{ Pascal comment }
/* C comment */
o Nested
o Runaway comments
• Nonterminated comments can't be detected until end of file
Chart 30
Perform look-ahead
o Multi-character tokens
1..10 vs. 1.10
&, &&
<, <=
etc.
Challenging input languages
o FORTRAN
• Keywords not reserved
• Blanks are not a delimiter
• Example (comma vs. decimal)
DO10I=1,5 start of a do loop (equivalent to a C for loop)
DO10I=1.5 an assignment statement, assignment to variable DO10I
Chart 31
Challenging input languages (cont.)
o PL/I, keywords not reserved
IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
Chart 32
Error Handling
o Error token passed to parser, which reports the error
o Recovery
• Delete the characters of the current token that have been read so far and restart scanning at the next unread character
• Delete the first character of the current lexeme and resume scanning from the next character
o Examples of lexical errors:
• 3.25e bad format for a constant
• Var#1 illegal character
o Some errors that are not lexical errors
• Mistyped keywords (e.g., Begim)
• Mismatched parentheses
• Undeclared variables
Chart 33
Issues
o Simpler design: parser doesn't have to worry about white space, etc.
o Improve compiler efficiency: allows the construction of a specialized and potentially more efficient processor
o Compiler portability is enhanced: input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner
Chart 34
What are the keywords in Triangle?
How are keywords and identifiers implemented in Triangle?
Is look-ahead implemented in Triangle?
o If so, how?
Chart 35
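[Figure: compiler phases as in Chart 3, but with the Parser and the Semantic Analyzer drawn as separate boxes. Source code enters the Lexical Analyzer, tokens flow to the Parser, the parse tree and intermediate representations flow through Intermediate Code Generation, Optimization and Assembly Code Generation to Assembly code, with the Symbol Table alongside.]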
Chart 36
Given an unambiguous, context-free grammar, parsing is
o Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
o Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.
Unambiguity is necessary so that every sentence of the grammar has exactly one syntax tree.
Chart 37
The syntax of programming language constructs is described by context-free grammars.
Advantages of unambiguous, context-free grammars
o A precise, yet easy-to-understand, syntactic specification of the programming language
o For certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed.
o Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors.
o Easier to add new constructs to the language if the implementation is based on a grammatical description of the language
Chart 38
Check the syntax (structure) of a program and create a tree representation of the program
Programming languages have non-regular constructs
o Nesting
o Recursion
Context-free grammars are used to express the syntax for programming languages
sequence of tokens → parser → syntax tree
Chart 39
Comprised of
o A set of tokens or terminal symbols
o A set of non-terminal symbols
o A set of rules or productions which express the legal relationships between symbols
o A start or goal symbol
Example:
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Tokens: -, +, 0, 1, 2, …, 9
Non-terminals: expr, digit
Start symbol: expr
Chart 40
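1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
[Figure: syntax tree for 3 + 8 - 2. The root expr has children expr, -, digit (2); that inner expr has children expr, +, digit (8); the innermost expr has the single child digit (3).]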
Chart 41
Given a grammar for a language and a program, how do you know if the syntax of the program is legal?
A legal program can be derived from the start symbol of the grammar
Grammar must be unambiguous and context-free
Chart 42
The derivation begins with the start symbol
At each step of a derivation the right-hand side of a grammar rule is used to replace a non-terminal symbol
Continue replacing non-terminals until only terminal symbols remain
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr - digit (Rule 1)
⇒ expr - 2 (Rule 4)
⇒ expr + digit - 2 (Rule 2)
⇒ expr + 8 - 2 (Rule 4)
⇒ digit + 8 - 2 (Rule 3)
⇒ 3 + 8 - 2 (Rule 4)
Chart 43
The rightmost non-terminal is replaced in each step
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr - digit (Rule 1)
expr - digit ⇒ expr - 2 (Rule 4)
expr - 2 ⇒ expr + digit - 2 (Rule 2)
expr + digit - 2 ⇒ expr + 8 - 2 (Rule 4)
expr + 8 - 2 ⇒ digit + 8 - 2 (Rule 3)
digit + 8 - 2 ⇒ 3 + 8 - 2 (Rule 4)
Chart 44
The leftmost non-terminal is replaced in each step
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr - digit (Rule 1)
expr - digit ⇒ expr + digit - digit (Rule 2)
expr + digit - digit ⇒ digit + digit - digit (Rule 3)
digit + digit - digit ⇒ 3 + digit - digit (Rule 4)
3 + digit - digit ⇒ 3 + 8 - digit (Rule 4)
3 + 8 - digit ⇒ 3 + 8 - 2 (Rule 4)
Chart 45
The leftmost non-terminal is replaced in each step
Step 1: expr ⇒ expr - digit (Rule 1)
Step 2: expr - digit ⇒ expr + digit - digit (Rule 2)
Step 3: expr + digit - digit ⇒ digit + digit - digit (Rule 3)
Step 4: digit + digit - digit ⇒ 3 + digit - digit (Rule 4)
Step 5: 3 + digit - digit ⇒ 3 + 8 - digit (Rule 4)
Step 6: 3 + 8 - digit ⇒ 3 + 8 - 2 (Rule 4)
[Figure: syntax tree for 3 + 8 - 2, with its nodes numbered 1 to 6 to match the derivation steps that create them.]
Chart 46
Parser examines terminal symbols of the input string, in order from left to right
Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)
Bottom-up parsing reduces a string w to the start symbol of the grammar.
o At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
Chart 47
Types of bottom-up parsing algorithms
o Shift-reduce parsing
• At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
o LR(k) parsing
• L is for left-to-right scanning of the input, R is for constructing a rightmost derivation in reverse, and k is for the number of input symbols of look-ahead that are used in making parsing decisions.
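A minimal shift-reduce sketch for the grammar expr → expr - digit | expr + digit | digit, digit → 0|…|9 is given below. It is hand-coded for this one grammar: the choice of which sub-string to reduce is written directly as if-tests rather than driven by LR tables, and the names ShiftReduce, parse and reduce are invented for the illustration. Tokens in the input are assumed to be separated by blanks.

```java
import java.util.ArrayList;
import java.util.List;

class ShiftReduce {
    // Returns the reductions performed, in order, or null if the input is rejected.
    static List<String> parse(String input) {
        List<String> stack = new ArrayList<>();
        List<String> trace = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            stack.add(token);                      // shift the next token
            while (reduce(stack, trace)) { }       // reduce while a handle is on top
        }
        boolean accepted = stack.size() == 1 && stack.get(0).equals("expr");
        return accepted ? trace : null;
    }

    // Tries one reduction on top of the stack; returns false if none applies.
    private static boolean reduce(List<String> st, List<String> trace) {
        int n = st.size();
        String top = st.get(n - 1);
        if (top.matches("[0-9]")) {                // rule 4: digit -> 0|1|...|9
            st.set(n - 1, "digit");
            trace.add("digit -> " + top);
            return true;
        }
        if (top.equals("digit") && n >= 3 && st.get(n - 3).equals("expr")
                && (st.get(n - 2).equals("+") || st.get(n - 2).equals("-"))) {
            String op = st.get(n - 2);             // rules 1 and 2: expr -> expr op digit
            st.subList(n - 3, n).clear();
            st.add("expr");
            trace.add("expr -> expr " + op + " digit");
            return true;
        }
        if (top.equals("digit") && n == 1) {       // rule 3: expr -> digit
            st.set(0, "expr");
            trace.add("expr -> digit");
            return true;
        }
        return false;
    }
}
```

For the input 3 + 8 - 2 the reductions come out as digit → 3, expr → digit, digit → 8, expr → expr + digit, digit → 2, expr → expr - digit. Read backwards, this is the rightmost derivation from Chart 43.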
Chart 48
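1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
[Figure: the syntax tree for 3 + 8 - 2 built bottom-up in successive snapshots above the input. Starting from the bare input, digit nodes are added above 3 and 8, then an expr node above the first digit.]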
Chart 49
[Figure: continuation of the bottom-up construction for 3 + 8 - 2. Further snapshots add a digit node above 2 and the remaining expr nodes, ending with the root expr.]
Chart 50
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Figure: bottom-up parse of a b b c d e; an A node is added above the first b.]
Reduction: abbcde is reduced to aAbcde (using A → b)
Chart 51
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Figure: bottom-up parse of a b b c d e; a second A node is added above A b c.]
Reduction: aAbcde is reduced to aAde (using A → Abc)
Chart 52
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Figure: bottom-up parse of a b b c d e; a B node is added above d.]
Reduction: aAde is reduced to aABe (using B → d)
Chart 53
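1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Figure: bottom-up parse of a b b c d e; the root S is added above a A B e.]
Reduction: aABe is reduced to S (using S → aABe)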
Chart 54
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat .
[Figure: bottom-up parse of the cat sees a rat . ; a Noun node is added above cat.]
Reduction: the cat sees a rat . is reduced to the Noun sees a rat . (using Noun → cat)
Chart 55
Grammar and example input as in Chart 54.
[Figure: bottom-up parse of the cat sees a rat . ; a Subject node is added above the Noun.]
Reduction: the Noun sees a rat . is reduced to Subject sees a rat . (using Subject → the Noun)
Chart 56
Grammar and example input as in Chart 54.
[Figure: bottom-up parse of the cat sees a rat . ; a Verb node is added above sees.]
Reduction: Subject sees a rat . is reduced to Subject Verb a rat . (using Verb → sees)
Chart 57
Grammar and example input as in Chart 54.
[Figure: bottom-up parse of the cat sees a rat . ; a second Noun node is added above rat.]
Reduction: Subject Verb a rat . is reduced to Subject Verb a Noun . (using Noun → rat)
Chart 58
Grammar and example input as in Chart 54.
[Figure: bottom-up parse of the cat sees a rat . ; an Object node is added above a Noun.]
Reduction: Subject Verb a Noun . is reduced to Subject Verb Object . (using Object → a Noun)
What would happen if we chose 'Subject → a Noun' instead of 'Object → a Noun'?
Chart 59
Grammar and example input as in Chart 54.
[Figure: completed bottom-up parse of the cat sees a rat . ; the root Sentence is added above Subject Verb Object . ]
Reduction: Subject Verb Object . is reduced to Sentence (using Sentence → Subject Verb Object .)
Chart 60
The parser examines the terminal symbols of the input string, in order from left to right.
The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).
An attempt to find the leftmost derivation for an input string
Chart 61
General rules for top-down parsers
o Start with just a stub for the root node
o At each step the parser takes the leftmost stub
o If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
o If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1, …, Xn (in order from left to right).
o Parsing succeeds when and if the whole input string is connected up to the syntax tree.
Chart 62
Two forms
o Backtracking parsers
• Guess which rule to apply, back up, and change choices if they cannot proceed
o Predictive parsers
• Predict which rule to apply by using look-ahead tokens
Backtracking parsers are not very efficient. We will cover predictive parsers.
Chart 63
Many types
o LL(1) parsing
• First L is for scanning the input from left to right; second L is for producing a leftmost derivation; 1 is for using one input symbol of look-ahead
• Table driven with an explicit stack to maintain the parse tree
o Recursive-descent parsing
• Uses recursive subroutines to traverse the parse tree
Chart 64
Lookahead in predictive parsing
o The lookahead token (next token in the input) is used to determine which rule should be used next
o For example:
1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'
Example input: 7 + 3 - 2
[Figure: parse tree grown top-down. term is expanded to num term'; num matches 7; with lookahead + the stub term' is expanded to '+' num term'.]
Chart 65
1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'
Example input: 7 + 3 - 2
[Figure: continuation. num matches 3; with lookahead - the next term' stub is expanded to '-' num term'.]
Chart 66
1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'
Example input: 7 + 3 - 2
[Figure: continuation. num matches 2, which completes the tree for 7 + 3 - 2; the last term' stub is left over.]
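The grammar above is small enough to turn into a predictive parser by hand. In this sketch each nonterminal gets one method, and the single lookahead character decides which alternative of term' to use. The class TermParser and its method names are assumptions made for the illustration; tokens are single characters and blanks are dropped before parsing.

```java
class TermParser {
    private final String input;   // the input with blanks removed
    private int pos = 0;          // index of the lookahead character

    TermParser(String text) { this.input = text.replace(" ", ""); }

    // '$' stands for "end of input" in this sketch.
    private char lookahead() { return pos < input.length() ? input.charAt(pos) : '$'; }

    // term -> num term'
    private void parseTerm() { parseNum(); parseTermPrime(); }

    // term' -> '+' num term' | '-' num term' | ε
    private void parseTermPrime() {
        char la = lookahead();
        if (la == '+' || la == '-') {
            pos++;                // consume the operator
            parseNum();
            parseTermPrime();
        }
        // any other lookahead: choose the ε alternative and consume nothing
    }

    // num -> '0' | '1' | ... | '9'
    private void parseNum() {
        if (!Character.isDigit(lookahead())) {
            throw new IllegalStateException("digit expected at position " + pos);
        }
        pos++;
    }

    // True if the whole input is a term.
    static boolean accepts(String text) {
        TermParser p = new TermParser(text);
        try {
            p.parseTerm();
        } catch (IllegalStateException e) {
            return false;
        }
        return p.pos == p.input.length();
    }
}
```

TermParser.accepts("7 + 3 - 2") succeeds, following the same expansions as the figures. Inputs such as "7 +" or "+ 7" are rejected as soon as the lookahead does not fit the rule being parsed.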
Chart 67
Top-down parsing algorithm
o Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar.
o The task of each method parseN is to parse a single N-phrase
o These parsing methods cooperate to parse complete sentences
Chart 68
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat .
[Figure: root node Sentence with four stubs Subject, Verb, Object and . above the input the cat sees a rat . ]
a. Decide which production rule to apply. Only one, #1. This step created four stubs.
Chart 69
Grammar and example input as in Chart 68.
[Figure: next snapshot of the top-down parse of the cat sees a rat . ; the tree has Sentence with children Subject, Verb, Object and . , and a Noun node under Subject.]
Chart 70
Grammar and example input as in Chart 68.
[Figure: next snapshot of the top-down parse; same nodes as before (Sentence, Subject, Verb, Object, . and one Noun), with more of the input connected.]
Chart 71
Grammar and example input as in Chart 68.
[Figure: next snapshot of the top-down parse; same nodes as before (Sentence, Subject, Verb, Object, . and one Noun), with more of the input connected.]
Chart 72
Grammar and example input as in Chart 68.
[Figure: next snapshot of the top-down parse; a second Noun node appears under Object.]
Chart 73
Grammar and example input as in Chart 68.
[Figure: next snapshot of the top-down parse; same nodes as before (two Noun nodes), with more of the input connected.]
Chart 74
Grammar and example input as in Chart 68.
[Figure: final snapshot of the top-down parse; the whole input the cat sees a rat . is connected to the syntax tree.]
Chart 75
One parsing method per nonterminal symbol:
parseSentence
parseSubject
parseObject
parseVerb
parseNoun
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Chart 76
parseSentence
    parseSubject
    parseVerb
    parseObject
    parseEnd
Rule being parsed: Sentence → Subject Verb Object .
Chart 77
parseSubject
    if input = "I"
        accept
    else if input = "a"
        accept
        parseNoun
    else if input = "the"
        accept
        parseNoun
    else error
Rule being parsed: Subject → I | a Noun | the Noun
Chart 78
parseNoun
    if input = "cat"
        accept
    else if input = "mat"
        accept
    else if input = "rat"
        accept
    else error
Rule being parsed: Noun → cat | mat | rat
Chart 79
parseObject
    if input = "me"
        accept
    else if input = "a"
        accept
        parseNoun
    else if input = "the"
        accept
        parseNoun
    else error
Rule being parsed: Object → me | a Noun | the Noun
Chart 80
parseVerb
    if input = "like"
        accept
    else if input = "is"
        accept
    else if input = "see"
        accept
    else if input = "sees"
        accept
    else error
Rule being parsed: Verb → like | is | see | sees
Chart 81
parseEnd
    if input = "."
        accept
    else error
Symbol being parsed: the terminating . of Sentence → Subject Verb Object .
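Put together, the parsing methods above form a complete recursive-descent recognizer for the Sentence grammar. The Java sketch below follows the pseudocode closely; the class SentenceParser, the helper names input, accept and error, and the "<eot>" marker are assumptions for this illustration. Tokens are assumed to be separated by blanks, so the final period must be written as its own token.

```java
import java.util.Arrays;
import java.util.List;

class SentenceParser {
    private final List<String> tokens;
    private int pos = 0;

    SentenceParser(String text) { tokens = Arrays.asList(text.trim().split("\\s+")); }

    // Current input token, or a marker when the input is exhausted.
    private String input() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }
    private void accept() { pos++; }
    private void error(String expected) {
        throw new IllegalStateException(expected + " expected, found " + input());
    }

    // Sentence -> Subject Verb Object .
    private void parseSentence() { parseSubject(); parseVerb(); parseObject(); parseEnd(); }

    // Subject -> I | a Noun | the Noun
    private void parseSubject() {
        if (input().equals("I")) accept();
        else if (input().equals("a") || input().equals("the")) { accept(); parseNoun(); }
        else error("Subject");
    }

    // Object -> me | a Noun | the Noun
    private void parseObject() {
        if (input().equals("me")) accept();
        else if (input().equals("a") || input().equals("the")) { accept(); parseNoun(); }
        else error("Object");
    }

    // Noun -> cat | mat | rat
    private void parseNoun() {
        if (Arrays.asList("cat", "mat", "rat").contains(input())) accept();
        else error("Noun");
    }

    // Verb -> like | is | see | sees
    private void parseVerb() {
        if (Arrays.asList("like", "is", "see", "sees").contains(input())) accept();
        else error("Verb");
    }

    private void parseEnd() {
        if (input().equals(".")) accept();
        else error(".");
    }

    // True if text is a complete sentence of the grammar.
    static boolean isSentence(String text) {
        SentenceParser p = new SentenceParser(text);
        try {
            p.parseSentence();
        } catch (IllegalStateException e) {
            return false;
        }
        return p.pos == p.tokens.size();
    }
}
```

SentenceParser.isSentence("the cat sees a rat .") succeeds, while the same text without the final period is rejected by parseEnd.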
Chart 82
Given a (suitable) context-free grammar
o Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations
• Always eliminate left recursion
• Always left-factorize whenever possible
o Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
o Make the parser consist of:
• A private variable currentToken
• Private parsing methods developed in the previous step
• Private auxiliary methods accept and acceptIt, both of which call the scanner
• A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken
Chart 83
“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”
o Bjarne Stroustrup
Chart 84
Did you really say that? Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it. I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.
Chart 85
For production rule N ::= X
o Convert the production rule to a parsing method named parseN
private void parseN() {
    parse X
}
o Refine parse ε to a dummy statement
o Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
o Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method
parseN();
o Refine parse X Y to
{
    parse X
    parse Y
}
o Refine parse X | Y to
switch (currentToken.kind) {
    cases in starters[[X]]:
        parse X
        break;
    cases in starters[[Y]]:
        parse Y
        break;
    default:
        report a syntax error
}
Chart 86
For X | Y
o Choose parse X only if the current token is one that can start an X-phrase
o Choose parse Y only if the current token is one that can start a Y-phrase
• starters[[X]] and starters[[Y]] must be disjoint
For X*
o Choose
while (currentToken.kind is in starters[[X]])
• starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context
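As a worked instance of these refinements, the sketch below transcribes the Expression grammar from Chart 18 into parsing methods: X | Y becomes a switch on the current token, and X* becomes a while loop guarded by the starters of X. The class ExpressionParser and the character-based stand-in for a scanner are assumptions made for this illustration. Each token is a single character, blanks are dropped, and '\0' plays the role of the end-of-text token.

```java
class ExpressionParser {
    private final String input;     // source text with blanks removed
    private int pos = 0;
    private char currentToken;

    ExpressionParser(String text) { input = text.replace(" ", ""); }

    // Stand-in for the scanner: fetch the next one-character token.
    private void scan() { currentToken = pos < input.length() ? input.charAt(pos++) : '\0'; }

    private void acceptIt() { scan(); }
    private void accept(char expected) {
        if (currentToken != expected) {
            throw new IllegalStateException("syntax error: expected " + expected);
        }
        scan();
    }

    private static boolean isOperator(char c) { return "+-*/".indexOf(c) >= 0; }

    // Expression ::= primary-Expression (Operator primary-Expression)*
    private void parseExpression() {
        parsePrimaryExpression();
        while (isOperator(currentToken)) {   // currentToken is in starters[[Operator ...]]
            acceptIt();
            parsePrimaryExpression();
        }
    }

    // primary-Expression ::= Identifier | ( Expression )
    private void parsePrimaryExpression() {
        switch (currentToken) {
            case 'a': case 'b': case 'c': case 'd': case 'e':   // starters[[Identifier]]
                acceptIt();
                break;
            case '(':                                            // starters[[( Expression )]]
                acceptIt();
                parseExpression();
                accept(')');
                break;
            default:
                throw new IllegalStateException("syntax error in primary-Expression");
        }
    }

    // Public entry point: load the first token, parse, then require end of text.
    static boolean parse(String text) {
        ExpressionParser p = new ExpressionParser(text);
        p.scan();
        try {
            p.parseExpression();
            p.accept('\0');
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

The two conditions on the slide hold here: the starters of the two alternatives, {a, b, c, d, e} and {(}, are disjoint, and the operators that start another loop iteration cannot follow an Expression in this grammar (only ) or end of text can).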
Chart 87
A grammar that satisfies both these conditions is called an LL(1) grammar
Recursive-descent parsing is suitable only for LL(1) grammars
Chart 88
Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.
Error repair usually occurs at two levels:
o Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
o Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.
Chart 89
Repair actions can be divided into insertions and deletions. Typically the compiler will use some look-ahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair.
Goals of error repair are:
o No input should cause the compiler to collapse
o Illegal constructs are flagged
o Frequently occurring errors are repaired gracefully
o Minimal stuttering or cascading of errors
LL-style parsing lends itself well to error repair, since the compiler uses the grammar's rules to predict what should occur next in the input
Chart 90
single-Command ::= ε
| V-name := Expression
| Identifier ( Actual-Parameter-Sequence )
| begin Command end
| let Declaration in single-Command
| if Expression then single-Command else single-Command
| while Expression do single-Command
V-name ::= Identifier
| V-name . Identifier
| V-name [ Expression ]
Identifier ::= Letter (Letter | Digit)*
Letter ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
Digit ::= 0|1|2|3|4|5|6|7|8|9
Chart 91
Starter Set for RE
o starters[[X]] is the set of terminal symbols that can start a string generated by X
Example
starters[[single-Command]] = {Identifier, begin, let, if, while}
• What about V-name vs. Identifier? Both the assignment and the call alternatives start with an Identifier.
• Use look-ahead when an Identifier is encountered, to look for := or (.
Chart 92
Production rule                                  Label                   Rule
Program ::= Command                              Program                 (1.14)
Command ::= V-name := Expression                 AssignCommand           (1.15a)
| Identifier ( Expression )                      CallCommand             (1.15b)
| Command ; Command                              SequentialCommand       (1.15c)
| if Expression then Command else Command        IfCommand               (1.15d)
| while Expression do Command                    WhileCommand            (1.15e)
| let Declaration in Command                     LetCommand              (1.15f)
Expression ::= Integer-Literal                   IntegerExpression       (1.16a)
| V-name                                         VnameExpression         (1.16b)
| Operator Expression                            UnaryExpression         (1.16c)
| Expression Operator Expression                 BinaryExpression        (1.16d)
V-name ::= Identifier                            SimpleVname             (1.17)
Declaration ::= const Identifier ~ Expression    ConstDeclaration        (1.18a)
| var Identifier : Type-denoter                  VarDeclaration          (1.18b)
| Declaration ; Declaration                      SequentialDeclaration   (1.18c)
Type-denoter ::= Identifier                      SimpleTypeDenoter       (1.19)
Chart 93
An explicit representation of the source program’s phrase structure
AST for Mini-Triangle
Chart 94
Program ASTs (P):
Program ::= Command    Program    (1.14)
[AST: node Program with one child C]
Command ASTs (C):
Command ::= V-name := Expression    AssignCommand    (1.15a)
[AST: node AssignCommand with children V and E]
| Identifier ( Expression )    CallCommand    (1.15b)
[AST: node CallCommand with children Identifier (a leaf carrying its spelling) and E]
| Command ; Command    SequentialCommand    (1.15c)
[AST: node SequentialCommand with children C1 and C2]
Chart 95
Command ASTs (C):
| if Expression then Command else Command    IfCommand    (1.15d)
[AST: node IfCommand with children E, C1 and C2]
| while Expression do Command    WhileCommand    (1.15e)
[AST: node WhileCommand with children E and C]
| let Declaration in Command    LetCommand    (1.15f)
[AST: node LetCommand with children D and C]
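One common way to represent such ASTs in Java is a small class hierarchy: an abstract class per nonterminal and a concrete subclass per label. The sketch below covers only some of the Mini-Triangle commands and expressions (CallCommand, LetCommand and the declarations are left out). The class names follow the labels on the slides, but the field names, constructors and the label method are assumptions made for this illustration, not the Triangle compiler's actual source.

```java
abstract class AST {
    // Label of the node; here simply the class name, e.g. "WhileCommand".
    String label() { return getClass().getSimpleName(); }
}

abstract class Command extends AST { }
abstract class Expression extends AST { }

class SimpleVname extends AST {                       // V-name ::= Identifier
    final String spelling;
    SimpleVname(String spelling) { this.spelling = spelling; }
}

class IntegerExpression extends Expression {          // Expression ::= Integer-Literal
    final String spelling;
    IntegerExpression(String spelling) { this.spelling = spelling; }
}

class VnameExpression extends Expression {            // Expression ::= V-name
    final SimpleVname v;
    VnameExpression(SimpleVname v) { this.v = v; }
}

class BinaryExpression extends Expression {           // Expression ::= Expression Operator Expression
    final Expression e1; final String op; final Expression e2;
    BinaryExpression(Expression e1, String op, Expression e2) { this.e1 = e1; this.op = op; this.e2 = e2; }
}

class AssignCommand extends Command {                 // children V, E
    final SimpleVname v; final Expression e;
    AssignCommand(SimpleVname v, Expression e) { this.v = v; this.e = e; }
}

class SequentialCommand extends Command {             // children C1, C2
    final Command c1; final Command c2;
    SequentialCommand(Command c1, Command c2) { this.c1 = c1; this.c2 = c2; }
}

class IfCommand extends Command {                     // children E, C1, C2
    final Expression e; final Command c1; final Command c2;
    IfCommand(Expression e, Command c1, Command c2) { this.e = e; this.c1 = c1; this.c2 = c2; }
}

class WhileCommand extends Command {                  // children E, C
    final Expression e; final Command c;
    WhileCommand(Expression e, Command c) { this.e = e; this.c = c; }
}
```

For example, the command y := y + 1 would be built as new AssignCommand(new SimpleVname("y"), new BinaryExpression(new VnameExpression(new SimpleVname("y")), "+", new IntegerExpression("1"))). A parser can build such nodes by having each parseN method return the node for the phrase it has just parsed.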