January 18, 2016, Azusa, CA, Sheldon X. Liang, Ph.D.

Post on 06-Jan-2018


CS400 Compiler Construction

Sheldon X. Liang Ph. D.

Computer Science at Azusa Pacific University

May 3, 2023, Azusa, CA

Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/

CS@APU: CS400 Compiler Construction

Lexical Analysis & Lexical Analyzer Generators

Formalization

• Regular Expressions
• Finite Automata
• RE Conversion to FA
• Lexer Design


Keep the following questions in mind

• Regular Expressions
  – A concise and flexible means for identifying strings of text
  – Written in a formal language
  – Interpreted by a RegEx processor
• Why RegEx
  – Precise definition of language
  – Layered definition of language
  – Lexical/Syntax/Semantic
• Further use of RegEx
  – Supportive foundation of Lexer
  – Formal communication
  – Common application ***


Why: Language Definition Problem

• How to precisely define a language
• Layered structure of language definition
  – Start with a set of letters in the language
  – Lexical structure: identifies “words” in the language (each word is a sequence of letters)
  – Syntactic structure: identifies “sentences” in the language (each sentence is a sequence of words)
  – Semantics: meaning of a program (specifies what the result should be for each input)
  – Today’s topic: lexical and syntactic structures

Specification of Patterns for Tokens: Regular Expressions

• Basis symbols:
  – ε is a regular expression denoting the language {ε}
  – a is a regular expression denoting {a}, for each symbol a of the alphabet
• If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
  – r|s is a regular expression denoting L(r) ∪ M(s)
  – rs is a regular expression denoting L(r)M(s)
  – r* is a regular expression denoting L(r)*
  – (r) is a regular expression denoting L(r)
• A language defined by a regular expression is called a regular set
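To make the three operators concrete, the following sketch hand-codes a recognizer for a single expression, a(b|c)*. The expression and the function name `matches_a_bc_star` are illustrative choices, not taken from the slides. Each step of the function corresponds to one construction rule: concatenation, alternation, and closure.

```c
#include <stdbool.h>

/* Hand-coded recognizer for the illustrative expression a(b|c)*.
 * Returns true only if the whole string s is in L(a(b|c)*). */
bool matches_a_bc_star(const char *s)
{
    /* Concatenation: the string must start with the basis symbol a. */
    if (*s != 'a')
        return false;
    s++;

    /* Closure over the alternation b|c: zero or more b's or c's. */
    while (*s == 'b' || *s == 'c')
        s++;

    /* Accept only if nothing is left over. */
    return *s == '\0';
}
```

With this sketch, "a" and "abccb" are accepted, while the empty string, "ba", and "abd" are rejected.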


Specification of Patterns for Tokens: Regular Definitions

• Regular definitions introduce a naming convention:
    d1 → r1
    d2 → r2
    …
    dn → rn
  where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}
• Any dj in ri can be textually substituted in ri to obtain an equivalent set of definitions


Specification of Patterns for Tokens: Regular Definitions

• Example:
    letter → A | B | … | Z | a | b | … | z
    digit → 0 | 1 | … | 9
    id → letter ( letter | digit )*
• Regular definitions are not recursive:
    digits → digit digits | digit    wrong!
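As a rough illustration, the letter, digit, and id definitions can be transcribed almost one-to-one into C. This is a minimal sketch that assumes ASCII input and checks whole strings only. The function names `is_letter`, `is_digit`, and `is_id` are chosen here for illustration. The repetition in id becomes a loop rather than a recursive definition, matching the rule that regular definitions are not recursive.

```c
#include <stdbool.h>

/* letter → A | B | … | Z | a | b | … | z  (ASCII letters only) */
static bool is_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* digit → 0 | 1 | … | 9 */
static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* id → letter ( letter | digit )* */
bool is_id(const char *s)
{
    /* The first character must be a letter (also rejects ""). */
    if (!is_letter(*s))
        return false;

    /* Every following character must be a letter or a digit. */
    for (s++; *s != '\0'; s++)
        if (!is_letter(*s) && !is_digit(*s))
            return false;
    return true;
}
```

Under these assumptions "x1" and "count" are identifiers, while "1x" and "a_b" are not.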


Specification of Patterns for Tokens: Notational Shorthand

• The following shorthands are often used:
    r+ = rr*
    r? = r | ε
    [a-z] = a | b | c | … | z
• Examples:
    digit → [0-9]
    num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
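The num definition maps naturally onto straight-line code: each `+` becomes a helper that demands at least one digit, and each `?` becomes an `if`. The sketch below is one possible hand translation that validates a whole string. The names `scan_digits` and `is_num` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Consumes digit+ starting at *p; returns false if no digit is present. */
static bool scan_digits(const char **p)
{
    if (!isdigit((unsigned char)**p))
        return false;
    while (isdigit((unsigned char)**p))
        (*p)++;
    return true;
}

/* num → digit+ (. digit+)? ( E (+ | -)? digit+ )? */
bool is_num(const char *s)
{
    if (!scan_digits(&s))                 /* digit+ */
        return false;
    if (*s == '.') {                      /* (. digit+)? */
        s++;
        if (!scan_digits(&s))
            return false;
    }
    if (*s == 'E') {                      /* ( E (+ | -)? digit+ )? */
        s++;
        if (*s == '+' || *s == '-')
            s++;
        if (!scan_digits(&s))
            return false;
    }
    return *s == '\0';                    /* nothing may be left over */
}
```

For example, "42", "3.14", and "6.02E+23" are accepted. "1.", ".5", and "1E" are rejected because a required digit+ is missing.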


Regular Definitions and Grammars

Grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | ε
    expr → term relop term
         | term
    term → id
         | num

Regular definitions:
    if → if
    then → then
    else → else
    relop → < | <= | <> | > | >= | =
    id → letter ( letter | digit )*
    num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
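Note that the lexemes if, then, and else match both their keyword definitions and the id definition. One common way to resolve this overlap is to scan every word as an id first and then look it up in a reserved-word table. The sketch below shows that lookup. The token codes and the function name `classify_word` are assumptions made for this example, not names from the slides.

```c
#include <string.h>

/* Token codes are hypothetical names chosen for this sketch. */
enum token { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID };

/* Classifies a lexeme that already matched letter ( letter | digit )*. */
enum token classify_word(const char *lexeme)
{
    static const struct { const char *text; enum token tok; } keywords[] = {
        { "if", TOK_IF }, { "then", TOK_THEN }, { "else", TOK_ELSE },
    };

    /* Reserved words take priority over the general id pattern. */
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].text) == 0)
            return keywords[i].tok;
    return TOK_ID;
}
```

Because the comparison is on the complete lexeme, a word such as "iff" is still classified as an id.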


Coding Regular Definitions in Transition Diagrams

id → letter ( letter | digit )*

Transition diagram:
    start → state 9
    state 9 --letter--> state 10
    state 10 --letter or digit--> state 10
    state 10 --other--> state 11*
    state 11: return(gettoken(), ())

The * on state 11 marks an accepting state that retracts the input by one character, because the "other" character is not part of the identifier.

Coding Regular Definitions in Transition Diagrams: Code

token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      /* skip white space and move the start of the lexeme along */
      if (c==blank || c==tab || c==newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c=='<') state = 1;
      else if (c=='=') state = 5;
      else if (c=='>') state = 6;
      else state = fail();       /* not a relop: try the next diagram */
      break;
    case 1:
      …
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …

int fail()
{ forward = token_beginning;     /* rewind to the start of the lexeme */
  switch (start) {
  case 0: start = 9; break;
  case 9: start = 12; break;
  case 12: start = 20; break;
  case 20: start = 25; break;
  case 25: recover(); break;
  default: /* error */ break;
  }
  return start;
}

fail() decides the next start state to check.
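The fragment above elides most states, so it cannot be compiled as shown. The following is a self-contained sketch of only the identifier diagram (states 9, 10, and 11). It assumes the input is a NUL-terminated string and uses `isalpha`/`isalnum` from `<ctype.h>` in place of `isletter` and `isdigit`. The function name `scan_id` and its return convention are choices made for this example. Where the original calls `fail()` to try another diagram, this sketch simply reports that no identifier was found.

```c
#include <ctype.h>
#include <stddef.h>

/* Runs the id diagram (states 9, 10, 11) on input starting at `start`.
 * Returns the lexeme length, or 0 if no identifier starts there. */
size_t scan_id(const char *input, size_t start)
{
    size_t forward = start;
    int state = 9;

    for (;;) {
        char c = input[forward++];               /* plays the role of nextchar() */
        switch (state) {
        case 9:
            if (isalpha((unsigned char)c)) state = 10;
            else return 0;                       /* the slides call fail() here */
            break;
        case 10:
            if (isalnum((unsigned char)c)) state = 10;
            else state = 11;                     /* saw an "other" character */
            break;
        }
        if (state == 11) {
            forward--;                           /* the * on state 11: retract */
            return forward - start;
        }
    }
}
```

The retraction matters for the caller: after `scan_id("x1 = 5", 0)` reports a lexeme of length 2, scanning can resume at the blank that ended the identifier.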


Common Application of Regular Expressions

• Validate passwords and email addresses
• Extract specific sections from an HTML page
• Parse data files
• Replace values (strings)
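As one example of the validation use case, the sketch below checks a string against a pattern using the POSIX `<regex.h>` interface (`regcomp`, `regexec`, `regfree`), so it assumes a POSIX system. The helper name `regex_matches` and the `SIMPLE_EMAIL` pattern are assumptions made for illustration. The pattern is deliberately simplified and accepts only a rough name@host.tld shape, far short of a complete email validator.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `text` matches the POSIX extended regular expression
 * `pattern`; returns false on no match or if the pattern does not compile. */
bool regex_matches(const char *pattern, const char *text)
{
    regex_t re;

    /* REG_NOSUB: only a yes/no answer is needed, not match positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    bool ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* Deliberately simplified email shape: name@host.tld */
#define SIMPLE_EMAIL "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
```

A call such as `regex_matches(SIMPLE_EMAIL, "student@apu.edu")` succeeds, while "not-an-email" and "a@b" are rejected. The anchors `^` and `$` force the whole string to match instead of just a substring.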


Got it? Review the following questions

• Regular Expressions
  – A concise and flexible means for identifying strings of text
  – Written in a formal language
  – Interpreted by a RegEx processor
• Why RegEx
  – Precise definition of language
  – Layered definition of language
  – Lexical/Syntax/Semantic
• Further use of RegEx
  – Supportive foundation of Lexer
  – Formal communication
  – Common application ***


Thank you very much!

Questions?
