November 1, 2015, Azusa, CA, Sheldon X. Liang Ph....


April 20, 2023, Azusa, CA

Sheldon X. Liang Ph. D.

Computer Science at Azusa Pacific University

Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/

CS400 Compiler Construction


The Reason Why Lexical Analysis is a Separate Phase

• Simplifies the design of the compiler
  – Without a separate lexer, LL(1) or LR(1) parsing with 1 token lookahead would not be possible (multiple characters/tokens to match)
• Provides efficient implementation
  – Systematic techniques to implement lexical analyzers by hand or automatically from specifications
  – Stream buffering methods to scan input
• Improves portability
  – Non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs)


Interaction of the Lexical Analyzer with the Parser

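[Figure: Source Program → Lexical Analyzer → Parser. The lexical analyzer returns (token, tokenval) pairs; the parser asks for them with "get next token". Both the lexical analyzer and the parser use the Symbol Table and report errors.]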


Attributes of Tokens

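[Figure: the source text below enters the lexical analyzer, which passes the parser a stream of <token, tokenval> pairs, where tokenval is the token attribute.]

Source text: y := 31 + 28*x

Token stream: <id, “y”> <assign, > <num, 31> <+, > <num, 28> <*, > <id, “x”>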
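The token stream on this slide can be reproduced with a small hand-written scanner. The following is only a sketch under stated assumptions: the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, the patterns are guesses at what the course's lexer would accept, and a missing attribute is represented as `None`.

```python
import re

# Hypothetical token patterns (not from the slides), tried in this order.
TOKEN_SPEC = [
    ("num",    r"\d+"),
    ("id",     r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("+",      r"\+"),
    ("*",      r"\*"),
    ("skip",   r"\s+"),
]
# One alternation with a named group per token class: g0, g1, ...
MASTER = re.compile(
    "|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC))
)

def tokenize(source):
    """Yield (token, tokenval) pairs; tokenval is None when there is no attribute."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        name = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        lexeme = m.group()
        pos = m.end()
        if name == "skip":
            continue                  # whitespace produces no token
        if name == "num":
            yield (name, int(lexeme))  # attribute: numeric value
        elif name == "id":
            yield (name, lexeme)       # attribute: the lexeme itself
        else:
            yield (name, None)         # operators carry no attribute

tokens = list(tokenize("y := 31 + 28*x"))
for tok in tokens:
    print(tok)
```

In a fuller compiler the attribute of an `id` would more likely be a symbol-table entry than the raw lexeme; the lexeme is used here only to match the slide.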


Formalization


Lexical Analysis & Lexical Analyzer Generators

[Diagram labels: Regular Expressions; Finite Automata; RE; Conversion; FA; Lexer Design]


Keep in mind the following questions

• Token
  – Lexical units
  – Atomic parse element
  – Abstracted in syntax, e.g., Id
• Lexeme
  – Specific string making up a token
  – Value / attribute related to a token
  – Concrete in language, e.g., Amt
• Spec of patterns for tokens
  – Alphabet $\Sigma$: a finite set of symbols
  – String $s$: a finite sequence of symbols from $\Sigma$
  – Language: a specific set of strings


Tokens, Patterns, and Lexemes

• A token is a classification of lexical units
  – For example: id and num
• Lexemes are the specific character strings that make up a token
  – For example: abc and 123
• Patterns are rules describing the set of lexemes belonging to a token
  – For example: “letter followed by letters and digits” and “non-empty sequence of digits”
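To make the token / lexeme / pattern distinction concrete, the two example patterns can be written as regular expressions and used to classify lexemes. This is a minimal sketch: the dictionary `PATTERNS` and the function `classify` are hypothetical names, and restricting letters to ASCII is an assumption the slides do not state.

```python
import re

# Each token name maps to the pattern describing its lexemes.
PATTERNS = {
    "id":  re.compile(r"[A-Za-z][A-Za-z0-9]*"),  # letter followed by letters and digits
    "num": re.compile(r"[0-9]+"),                # non-empty sequence of digits
}

def classify(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    for token, pattern in PATTERNS.items():
        if pattern.fullmatch(lexeme):
            return token
    return None

print(classify("abc"))     # lexeme abc belongs to token id
print(classify("123"))     # lexeme 123 belongs to token num
print(classify("9lives"))  # matches neither pattern
```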


Specification of Patterns for Tokens: Definitions

• An alphabet $\Sigma$ is a finite set of symbols (characters)
• A string $s$ is a finite sequence of symbols from $\Sigma$
  – $|s|$ denotes the length of string $s$
  – $\varepsilon$ denotes the empty string, thus $|\varepsilon| = 0$
• A language is a specific set of strings over some fixed alphabet $\Sigma$


Specification of Patterns for Tokens: String Operations

• The concatenation of two strings $x$ and $y$ is denoted by $xy$
• The exponentiation of a string $s$ is defined by
  – $s^0 = \varepsilon$
  – $s^i = s^{i-1}s$ for $i > 0$
  – note that $s\varepsilon = \varepsilon s = s$
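The string operations on this slide map directly onto Python strings, with the empty string `""` standing in for $\varepsilon$. The helper name `str_power` is hypothetical and the code is only an illustrative sketch of the definition.

```python
def str_power(s, i):
    """Return s^i, following s^0 = epsilon and s^i = s^(i-1) s for i > 0."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    result = ""              # s^0 is the empty string
    for _ in range(i):
        result = result + s  # s^i = s^(i-1) followed by s
    return result

x, y = "ab", "c"
print(x + y)              # concatenation xy
print(str_power("ab", 3))  # ab concatenated with itself three times
```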


Specification of Patterns for Tokens: Language Operations

• Union: $L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$
• Concatenation: $LM = \{ xy \mid x \in L \text{ and } y \in M \}$
• Exponentiation: $L^0 = \{\varepsilon\}$; $L^i = L^{i-1}L$
• Kleene closure: $L^* = \bigcup_{i=0}^{\infty} L^i$
• Positive closure: $L^+ = \bigcup_{i=1}^{\infty} L^i$
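For finite languages, these operations can be tried out with Python sets of strings. The sketch below uses hypothetical function names, and because $L^*$ and $L^+$ are infinite in general, the two closures are only approximated up to a caller-supplied bound `max_i` on the exponent.

```python
def union(L, M):
    return L | M

def concat(L, M):
    return {x + y for x in L for y in M}

def lang_power(L, i):
    """L^0 = {epsilon}; L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def kleene_closure(L, max_i):
    """Union of L^0 .. L^max_i (a bounded approximation of L*)."""
    out = set()
    for i in range(max_i + 1):
        out |= lang_power(L, i)
    return out

def positive_closure(L, max_i):
    """Union of L^1 .. L^max_i (a bounded approximation of L+)."""
    out = set()
    for i in range(1, max_i + 1):
        out |= lang_power(L, i)
    return out

L, M = {"a", "b"}, {"0"}
print(sorted(union(L, M)))
print(sorted(concat(L, M)))
print(sorted(kleene_closure(L, 2)))
```

Note that the empty string belongs to the Kleene closure of any language, while it belongs to the positive closure only if it is already in $L$.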


Specification of Patterns for Tokens: Regular Expressions

• Basis symbols:
  – $\varepsilon$ is a regular expression denoting the language $\{\varepsilon\}$
  – $a \in \Sigma$ is a regular expression denoting $\{a\}$
• If $r$ and $s$ are regular expressions denoting languages $L(r)$ and $M(s)$ respectively, then
  – $r \mid s$ is a regular expression denoting $L(r) \cup M(s)$
  – $rs$ is a regular expression denoting $L(r)M(s)$
  – $r^*$ is a regular expression denoting $L(r)^*$
  – $(r)$ is a regular expression denoting $L(r)$
• A language defined by a regular expression is called a regular set
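The inductive definition can be turned into a small evaluator that computes the language a regular expression denotes. Everything here is an illustrative assumption: regular expressions are encoded as nested tuples, the function name `language` is hypothetical, and since most regular sets are infinite the result is cut off at strings of length `max_len`.

```python
# Hypothetical encoding of regular expressions as nested tuples:
#   ("eps",)  ("sym", a)  ("alt", r, s)  ("cat", r, s)  ("star", r)

def language(r, max_len):
    """Return every string of length <= max_len in the language denoted by r."""
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if len(r[1]) <= max_len else set()
    if kind == "alt":   # union of the two languages
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":   # concatenation of the two languages
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":  # Kleene closure, grown until no new short strings appear
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")

a, b = ("sym", "a"), ("sym", "b")
# (a|b)*abb
r = ("cat", ("star", ("alt", a, b)), ("cat", a, ("cat", b, b)))
print(sorted(language(r, 4)))
```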


Nondeterministic Finite Automata

• An NFA is a 5-tuple $(S, \Sigma, \delta, s_0, F)$ where
  – $S$ is a finite set of states
  – $\Sigma$ is a finite set of symbols, the alphabet
  – $\delta$ is a mapping from $S \times \Sigma$ to a set of states
  – $s_0 \in S$ is the start state
  – $F \subseteq S$ is the set of accepting (or final) states
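The 5-tuple can be written down almost literally as a data structure. The sketch below is an assumption-laden illustration: the class `NFA`, its field names, and the function `accepts` are hypothetical, and the example automaton (for the language of `(a|b)*abb`) is chosen here and does not come from the slides. It has no $\varepsilon$-transitions, so simulation only needs to track the set of currently reachable states.

```python
from typing import NamedTuple

class NFA(NamedTuple):
    states: frozenset     # S
    alphabet: frozenset   # Sigma
    delta: dict           # maps (state, symbol) to a set of states
    start: int            # s0
    accepting: frozenset  # F

def accepts(nfa, text):
    """Simulate the NFA by following all possible transitions at once."""
    current = {nfa.start}
    for ch in text:
        current = {t for s in current for t in nfa.delta.get((s, ch), set())}
    return bool(current & nfa.accepting)

# Example NFA for (a|b)*abb
nfa = NFA(
    states=frozenset({0, 1, 2, 3}),
    alphabet=frozenset({"a", "b"}),
    delta={
        (0, "a"): {0, 1},  # nondeterministic choice on 'a'
        (0, "b"): {0},
        (1, "b"): {2},
        (2, "b"): {3},
    },
    start=0,
    accepting=frozenset({3}),
)

print(accepts(nfa, "babb"))
print(accepts(nfa, "abba"))
```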


Conversion of an NFA into a DFA

• The subset construction algorithm converts an NFA into a DFA using:
  – $\varepsilon\text{-closure}(s) = \{s\} \cup \{t \mid s \xrightarrow{\varepsilon} \cdots \xrightarrow{\varepsilon} t\}$
  – $\varepsilon\text{-closure}(T) = \bigcup_{s \in T} \varepsilon\text{-closure}(s)$
  – $\text{move}(T, a) = \{t \mid s \xrightarrow{a} t \text{ and } s \in T\}$
• The algorithm produces:
  – Dstates is the set of states of the new DFA consisting of sets of states of the NFA
  – Dtran is the transition table of the new DFA
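The three helper functions and the worklist loop of the subset construction can be sketched as follows. The transition-table layout, the use of the empty string `EPS` as the $\varepsilon$ label, and the small example NFA for `a*b` are all assumptions made for illustration. As a simplification, this sketch also leaves the empty set (the dead state) out of `Dstates`, whereas textbook presentations often keep it.

```python
EPS = ""  # hypothetical label for epsilon-transitions

def eps_closure(delta, T):
    """All states reachable from the states in T by epsilon-transitions alone."""
    stack, closure = list(T), set(T)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(delta, T, a):
    """All states reachable from some state in T by one transition on a."""
    return {t for s in T for t in delta.get((s, a), ())}

def subset_construction(delta, start, alphabet):
    d0 = eps_closure(delta, {start})
    dstates, dtran, unmarked = {d0}, {}, [d0]
    while unmarked:
        T = unmarked.pop()
        for a in alphabet:
            U = eps_closure(delta, move(delta, T, a))
            if not U:
                continue              # no transition: dead state omitted
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return d0, dstates, dtran

# Example NFA for a*b: 0 -eps-> 1, 1 -a-> 1, 1 -eps-> 2, 2 -b-> 3
delta = {(0, EPS): {1}, (1, "a"): {1}, (1, EPS): {2}, (2, "b"): {3}}
d0, dstates, dtran = subset_construction(delta, 0, ["a", "b"])
print(sorted(sorted(state) for state in dstates))
```

A DFA state is accepting when it contains at least one accepting NFA state; here that is the DFA state containing NFA state 3.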


Got it with the following questions

• Tokens
  – Lexical units
  – Atomic parse element
  – Abstracted in syntax, e.g., Id
• Lexeme
  – Specific string making up a token
  – Value / attribute related to a token
  – Concrete in language, e.g., Amt
• Spec of patterns for tokens
  – Alphabet $\Sigma$: a finite set of symbols
  – String $s$: a finite sequence of symbols from $\Sigma$
  – Language: a specific set of strings


Thank you very much!

Questions?
