January 18, 2016, Azusa, CA. Sheldon X. Liang, Ph.D.

CS400 Compiler Construction
Sheldon X. Liang, Ph.D., Computer Science at Azusa Pacific University
Department of Computer Science, Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278
http://www.apu.edu/clas/computerscience/
May 15, 2022, Azusa, CA



Page 1:

May 3, 2023, Azusa, CA

Sheldon X. Liang, Ph.D.

Computer Science at Azusa Pacific University


CS400 Compiler Construction

Page 2:

Formalization


Lexical Analysis & Lexical Analyzer Generators

• Regular Expressions
• Finite Automata
• RE Conversion FA
• Lexer Design

Page 3:


Keep the following questions in mind

• Regular Expressions
  – A concise and flexible means for identifying strings of text
  – Written in a formal language
  – Interpreted by a RegEx processor
• Why RegEx
  – Precise definition of language
  – Layered definition of language
  – Lexical/Syntax/Semantic
• Further use of RegEx
  – Supportive foundation of Lexer
  – Formal communication
  – Common application

Page 4:


Why: Language Definition Problem

• How to precisely define language
• Layered structure of language definition
  – Start with a set of letters in language
  – Lexical structure - identifies “words” in language (each word is a sequence of letters)
  – Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words)
  – Semantics - meaning of program (specifies what result should be for each input)
  – Today’s topic: lexical and syntactic structures

Page 5:

Specification of Patterns for Tokens: Regular Expressions

• Basis symbols:
  – ε is a regular expression denoting language {ε}
  – a ∈ Σ is a regular expression denoting {a}
• If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
  – r|s is a regular expression denoting L(r) ∪ M(s)
  – rs is a regular expression denoting L(r)M(s)
  – r* is a regular expression denoting L(r)*
  – (r) is a regular expression denoting L(r)
• A language defined by a regular expression is called a regular set
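The three operators in the definition above (union, concatenation, and star) can be tried out with an existing RegEx processor. The following is a minimal sketch, assuming a POSIX system that provides <regex.h>; the helper name full_match and the 256-byte pattern buffer are illustrative choices, not part of the slides. It wraps the pattern in ^( … )$ so that the whole string must belong to the language.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole of `text` is in the language
 * of the POSIX extended regular expression `pattern`, 0 if it is not, and
 * -1 if the pattern is too long for the buffer or fails to compile. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, matched;

    assert(pattern != NULL && text != NULL);

    /* Anchor the pattern so that it must cover the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

For example, full_match("a|b", "a") exercises union, full_match("ab", "ab") concatenation, and full_match("a*", "aaa") star; the empty string is also in the language of a*.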


Page 8:

Specification of Patterns for Tokens: Regular Definitions

• Regular definitions introduce a naming convention:

    d1 → r1
    d2 → r2
    …
    dn → rn

  where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}

• Any dj in ri can be textually substituted in ri to obtain an equivalent set of definitions


Page 9:

Specification of Patterns for Tokens: Regular Definitions

• Example:

    letter → A | B | … | Z | a | b | … | z
    digit  → 0 | 1 | … | 9
    id     → letter ( letter | digit )*

• Regular definitions are not recursive:

    digits → digit digits | digit     wrong!
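Textual substitution of regular definitions can be imitated with the C preprocessor. The sketch below is one possible illustration, assuming a POSIX system with <regex.h>; the macro names LETTER, DIGIT, ID and the function is_id are hypothetical. Because each macro only refers to macros defined before it, the expansion always terminates, which mirrors the rule that regular definitions are not recursive.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Regular definitions written as POSIX extended-regex fragments.
 * Adjacent string literals are concatenated by the compiler, so ID expands
 * to the single pattern "[A-Za-z]([A-Za-z]|[0-9])*". */
#define LETTER "[A-Za-z]"
#define DIGIT  "[0-9]"
#define ID     LETTER "(" LETTER "|" DIGIT ")*"

/* Returns 1 if `s` as a whole is an id, 0 otherwise
 * (including the case where the pattern fails to compile). */
int is_id(const char *s)
{
    regex_t re;
    int matched;

    assert(s != NULL);
    if (regcomp(&re, "^" ID "$", REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    matched = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

With these definitions, is_id("count2") holds while is_id("2count") does not. A repetition such as digits would be written with the shorthand DIGIT "+" rather than by referring to itself.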


Page 10:

Specification of Patterns for Tokens: Notational Shorthand

• The following shorthands are often used:

    r+ = rr*
    r? = r | ε
    [a-z] = a | b | c | … | z

• Examples:

    digit → [0-9]
    num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
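The num definition can also be checked by hand, without a RegEx processor. The following is a minimal sketch in which each shorthand maps to a small piece of control flow: digit+ becomes a loop that requires at least one digit, and each ( … )? group becomes an if. The function names digits and is_num are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* digit+ : consumes one or more digits starting at p.
 * Returns the position after the digits, or NULL if there is no digit. */
static const char *digits(const char *p)
{
    if (!isdigit((unsigned char)*p))
        return NULL;
    while (isdigit((unsigned char)*p))
        p++;
    return p;
}

/* Returns 1 if the whole string s matches
 *   digit+ (. digit+)? ( E (+ | -)? digit+ )?
 * and 0 otherwise. */
int is_num(const char *s)
{
    const char *p;

    assert(s != NULL);
    p = digits(s);                      /* digit+ */
    if (p == NULL)
        return 0;
    if (*p == '.') {                    /* (. digit+)? */
        p = digits(p + 1);
        if (p == NULL)
            return 0;
    }
    if (*p == 'E') {                    /* ( E (+ | -)? digit+ )? */
        p++;
        if (*p == '+' || *p == '-')
            p++;
        p = digits(p);
        if (p == NULL)
            return 0;
    }
    return *p == '\0';                  /* nothing may follow the number */
}
```

Under this definition "6.02E+23" and "42" are accepted, while "1." and ".5" are rejected because a fraction needs digits on both sides of the point.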


Page 11:

Regular Definitions and Grammars

Grammar

    stmt → if expr then stmt
         | if expr then stmt else stmt
    expr → term relop term
         | term
    term → id
         | num

Regular definitions

    if    → if
    then  → then
    else  → else
    relop → < | <= | <> | > | >= | =
    id    → letter ( letter | digit )*
    num   → digit+ (. digit+)? ( E (+ | -)? digit+ )?
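Several relop alternatives share a prefix (for instance < and <=), so a recognizer has to look one character ahead and prefer the longest lexeme. The sketch below shows one way to do that; the enum relop token codes and the function scan_relop are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for the six relational operators. */
enum relop { RELOP_NONE, LT, LE, NE, GT, GE, EQ };

/* Recognizes the longest relop at the start of s.
 * Stores the lexeme length in *len and returns the operator code,
 * or RELOP_NONE (with *len set to 0) if s does not start with a relop. */
enum relop scan_relop(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    *len = 0;
    switch (s[0]) {
    case '<':
        /* s[0] is not '\0', so reading s[1] stays inside the string. */
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        *len = 1;
        return LT;
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }
        *len = 1;
        return GT;
    case '=':
        *len = 1;
        return EQ;
    default:
        return RELOP_NONE;
    }
}
```

On the input "<=b" this reports LE with length 2 rather than LT with length 1, which is the longest-match behavior a lexer normally wants.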


Page 12:

Coding Regular Definitions in Transition Diagrams


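id → letter ( letter | digit )*

Transition diagram:

    start --> 9
    9  --letter--> 10
    10 --letter or digit--> 10
    10 --other--> 11*   (accepting; * means retract one character)
    11: return(gettoken(), ())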
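The id diagram can be coded directly as a small state machine. The following is a sketch under a few assumptions: the input is a NUL-terminated string, the terminating '\0' counts as "other", and the function name scan_id is hypothetical. The state numbers 9, 10, and 11 follow the diagram, and the forward counter is retracted once on reaching state 11 because that state is marked with *.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Runs the id transition diagram on the start of `input`.
 * Returns the length of the identifier lexeme, or 0 if state 9 fails,
 * that is, if the input does not start with a letter. */
size_t scan_id(const char *input)
{
    size_t forward = 0;   /* index of the next character to read */
    int state = 9;

    assert(input != NULL);
    for (;;) {
        unsigned char c = (unsigned char)input[forward++];
        switch (state) {
        case 9:
            if (!isalpha(c))
                return 0;         /* no letter: this diagram fails */
            state = 10;
            break;
        case 10:
            if (!isalnum(c)) {
                /* "other" leads to state 11, which is marked *:
                 * retract the lookahead character and accept. */
                forward--;
                return forward;
            }
            break;                /* letter or digit: stay in state 10 */
        }
    }
}
```

For the input "x1+y" the machine reads x, 1, and +, retracts the +, and reports a lexeme of length 2.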

Page 13:

Coding Regular Definitions in Transition Diagrams: Code

token nexttoken()
{
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            /* skip white space and advance the start of the lexeme */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        case 1:
            …
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        …

/* Decides the next start state to check */
int fail()
{
    forward = token_beginning;
    switch (start) {
    case 0:  start = 9;  break;
    case 9:  start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover();  break;
    default: /* error */ break;
    }
    return start;
}


Page 14:

Common Application of Regular Expressions


Validate passwords and email addresses

Extract specific sections from an HTML page

Parse data files

Replace values (strings)
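As an illustration of the "parse data files" use, the sketch below splits one line of a hypothetical key=value file using capture groups. It assumes a POSIX system with <regex.h>; the file format, the pattern, and the function name parse_pair are assumptions made for this example only.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Parses one "key=value" line. On success copies the two parts into `key`
 * and `value` (each a buffer of `cap` bytes) and returns 1. Returns 0 when
 * the line does not have that shape or a part does not fit in its buffer. */
int parse_pair(const char *line, char *key, char *value, size_t cap)
{
    regex_t re;
    regmatch_t m[3];      /* m[0]: whole match, m[1]: key, m[2]: value */
    int ok = 0;

    assert(line != NULL && key != NULL && value != NULL);
    if (regcomp(&re, "^([A-Za-z_][A-Za-z0-9_]*)=(.*)$", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 3, m, 0) == 0) {
        size_t klen = (size_t)(m[1].rm_eo - m[1].rm_so);
        size_t vlen = (size_t)(m[2].rm_eo - m[2].rm_so);
        if (klen < cap && vlen < cap) {
            memcpy(key, line + m[1].rm_so, klen);
            key[klen] = '\0';
            memcpy(value, line + m[2].rm_so, vlen);
            value[vlen] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}
```

Note that the key part of the pattern has the same shape as the id definition (extended with the underscore), so the lexical ideas from the earlier slides carry over to everyday parsing tasks.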

Page 15:


Got it? Revisit the following questions

• Regular Expressions
  – A concise and flexible means for identifying strings of text
  – Written in a formal language
  – Interpreted by a RegEx processor
• Why RegEx
  – Precise definition of language
  – Layered definition of language
  – Lexical/Syntax/Semantic
• Further use of RegEx
  – Supportive foundation of Lexer
  – Formal communication
  – Common application

Page 16:

Thank you very much!

Questions?
