1 Lecture 5: Syntax Analysis (Section 2.2) CSCI 431 Programming Languages Fall 2002 A modification...

26
1 Lecture 5: Lecture 5: Syntax Analysis Syntax Analysis (Section 2.2) (Section 2.2) CSCI 431 Programming Languages CSCI 431 Programming Languages Fall 2002 Fall 2002 A modification of slides A modification of slides developed by Felix Hernandez- developed by Felix Hernandez- Campos at UNC Chapel Hill Campos at UNC Chapel Hill

Transcript of 1 Lecture 5: Syntax Analysis (Section 2.2) CSCI 431 Programming Languages Fall 2002 A modification...

11

Lecture 5: Lecture 5: Syntax AnalysisSyntax Analysis

(Section 2.2)(Section 2.2)

CSCI 431 Programming LanguagesCSCI 431 Programming Languages

Fall 2002Fall 2002

A modification of slides developed by Felix A modification of slides developed by Felix Hernandez-Campos at UNC Chapel HillHernandez-Campos at UNC Chapel Hill

22

Review: Compilation/InterpretationReview: Compilation/Interpretation

Compiler or InterpreterCompiler or Interpreter

Translation Translation ExecutionExecution

Source CodeSource Code

Target CodeTarget Code

Interpre-Interpre-tationtation

33

Review: Syntax AnalysisReview: Syntax Analysis

Compiler or InterpreterCompiler or Interpreter

Translation Translation Execution Execution

Source CodeSource Code• Specifying the Specifying the formform

of a programming of a programming

languagelanguage

– TokensTokens» Regular ExpressionsRegular Expressions

(also F.A.s & Reg. Grammars)(also F.A.s & Reg. Grammars)

– SyntaxSyntax» Context-FreeContext-Free

GrammarsGrammars(also P.D.A.s)(also P.D.A.s)

Target CodeTarget Code

44

Phases of CompilationPhases of Compilation

55

Syntax AnalysisSyntax Analysis

• Syntax:Syntax:– Webster’s definition: Webster’s definition: 1 a : the way in which linguistic 1 a : the way in which linguistic

elements (as words) are put together to form constituents elements (as words) are put together to form constituents (as phrases or clauses)(as phrases or clauses)

• The syntax of a programming languageThe syntax of a programming language– Describes its formDescribes its form

» Organization of tokensOrganization of tokens » Context Free Grammars (CFGs)Context Free Grammars (CFGs)

– Must be Must be recognizablerecognizable by compilers and interpreters by compilers and interpreters» ParsingParsing» LL and LR parsersLL and LR parsers

66

Context Free GrammarsContext Free Grammars

• CFGsCFGs– Add recursion to regular expressionsAdd recursion to regular expressions

» Nested constructionsNested constructions

– NotationNotationexpressionexpression identifieridentifier | | numbernumber | | -- expressionexpression | | (( expressionexpression )) | | expressionexpression operatoroperator expressionexpressionoperator operator ++ | | -- | | ** | | //

» Terminal symbolsTerminal symbols» Non-terminal symbolsNon-terminal symbols» Production rule (i.e. substitution rule)Production rule (i.e. substitution rule)

terminal symbol terminal symbol terminal and non-terminal symbols terminal and non-terminal symbols

77

ParsingParsing

• Parsing an arbitrary Context Free GrammarParsing an arbitrary Context Free Grammar– O(nO(n33))– Too slow for large programsToo slow for large programs

• Linear-time parsingLinear-time parsing– LL parsers (a ‘Left-to-right, Left-most’ derivation)LL parsers (a ‘Left-to-right, Left-most’ derivation)

» Recognize LL grammarRecognize LL grammar» Use a top-down strategyUse a top-down strategy

– LR parsers (a ‘Left-to-right, Right-most’ derivation)LR parsers (a ‘Left-to-right, Right-most’ derivation)» Recognize LR grammarRecognize LR grammar» Use a bottom-up strategyUse a bottom-up strategy

88

Parsing exampleParsing example

• Example: comma-separated list of identifierExample: comma-separated list of identifier

– CFGCFG

id_list id_list idid id_list_tailid_list_tailid_list_tail id_list_tail ,, id_list_tailid_list_tailid_list_tail id_list_tail ;;

– ParsingParsing

A, B, C;A, B, C;

99

Top-down derivation of Top-down derivation of A, B, C;A, B, C;

CFGCFG

Left-to-right,Left-to-right,Left-most derivationLeft-most derivation

LL(1) parsingLL(1) parsing

1010

Top-down derivation of Top-down derivation of A, B, C;A, B, C;

CFGCFG

1111

Bottom-up parsing of Bottom-up parsing of A, B, C;A, B, C;

CFGCFG

Left-to-right,Left-to-right,Right-most derivationRight-most derivation

LR parsingLR parsing(a shift-reduce parser)(a shift-reduce parser)

1212

Bottom-up parsing of Bottom-up parsing of A, B, C;A, B, C;

CFGCFG

1313

Bottom-up parsing of Bottom-up parsing of A, B, C;A, B, C;

CFGCFG

1414

LR Parsing vs. LL ParsingLR Parsing vs. LL Parsing

• LLLL– A ‘top-down’ or ‘predictive’ parserA ‘top-down’ or ‘predictive’ parser– Predict needed productions based on the current left-most Predict needed productions based on the current left-most

non-terminal in the tree and the current input tokennon-terminal in the tree and the current input token– The top-of-stack contains the left-most non-terminalThe top-of-stack contains the left-most non-terminal– The stack contains a record of what the parser expects to The stack contains a record of what the parser expects to

seesee

• LRLR– A ‘bottom-up’ or shift-reduce parserA ‘bottom-up’ or shift-reduce parser– Shifts tokens onto the stack until it recognizes a right-hand Shifts tokens onto the stack until it recognizes a right-hand

side then reduces those tokens to their left-hand sideside then reduces those tokens to their left-hand side– The stack contains a record of what the parser has already The stack contains a record of what the parser has already

seenseen

1515

An appropriate LR GrammarAn appropriate LR Grammar

id_listid_list id_list_prefixid_list_prefix ;;

id_list_prefixid_list_prefix id_list_prefixid_list_prefix ,, idid

idid

This grammar can’t be parsed top-down!This grammar can’t be parsed top-down!

Problems for LL grammars:Problems for LL grammars:

- left recursion, example above- left recursion, example above

- common prefixes, example:- common prefixes, example:

stmtstmt id := id := exprexpr | id ( | id (arg_listarg_list))

1616

LL(1) Grammar for the Calculator LL(1) Grammar for the Calculator LanguageLanguage

1717

LR(1) Grammar for the Calculator LR(1) Grammar for the Calculator LanguageLanguage

1818

Hierarchy of Linear ParsersHierarchy of Linear Parsers

• Basic containment relationshipBasic containment relationship– All CFGs can be recognized by LR parserAll CFGs can be recognized by LR parser– Only a subset of all the CFGs can be recognized by LL Only a subset of all the CFGs can be recognized by LL

parsersparsers

LL parsingLL parsing

CFGsCFGs LR parsingLR parsing

1919

Bigger PictureBigger Picture

• Chomsky Hierarchy of GrammarsChomsky Hierarchy of Grammars

RegularRegularGrammarGrammar

Context Free GrammarContext Free Grammar

Context Sensitive GrammarContext Sensitive Grammar

Unrestricted GrammarUnrestricted Grammar

2020

Implementation of an LL ParserImplementation of an LL Parser

• Two options:Two options:– A recursive descent parser (section 2.2.3)A recursive descent parser (section 2.2.3)

» For LL grammars onlyFor LL grammars only

– Parse table and a driver (section 2.2.5)Parse table and a driver (section 2.2.5)» LR parsers covered in section 2.2.6LR parsers covered in section 2.2.6

2121

Recursive Descent Parser ExampleRecursive Descent Parser Example

• LL(1) grammarLL(1) grammar

2222

Recursive Descent Parser ExampleRecursive Descent Parser Example

• Outline of Outline of

recursive parserrecursive parser

– This parser onlyThis parser onlyverifies syntaxverifies syntax

– matchmatch is isthe scannerthe scanner

2323

Recursive Descent Parser ExampleRecursive Descent Parser Example

2424

Recursive Descent Parser ExampleRecursive Descent Parser Example

2525

Recursive Descent Parser ExampleRecursive Descent Parser Example

A program that develops recursive decent A program that develops recursive decent parsers: parsers: JavaCC

2626

Semantic AnalysisSemantic Analysis

Compiler or InterpreterCompiler or Interpreter

Translation Translation Execution Execution

Source CodeSource Code• Specifying the Specifying the meaningmeaning

of a programming of a programming

languagelanguage

– Attribute GrammarsAttribute Grammars

Target CodeTarget Code