January 18, 2016, Azusa, CA, Sheldon X. Liang Ph....
May 3, 2023, Azusa, CA
Sheldon X. Liang Ph. D.
Computer Science at Azusa Pacific University
Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/
CS400 Compiler Construction
Formalization
Lexical Analysis & Lexical Analyzer Generators
• Regular Expressions
• Finite Automata
• RE-to-FA Conversion
• Lexer Design
Keep in mind the following questions
• Regular Expressions
  – a concise and flexible means for identifying strings of text
  – written in a formal language
  – interpreted by a RegEx processor
• Why RegEx
  – Precise definition of a language
  – Layered definition of a language
  – Lexical / syntactic / semantic layers
• Further uses of RegEx
  – Foundation for the lexer
  – Formal communication
  – Common applications
Why: Language Definition Problem
• How to precisely define a language
• Layered structure of language definition
  – Start with the set of letters (the alphabet) of the language
  – Lexical structure - identifies "words" in the language (each word is a sequence of letters)
  – Syntactic structure - identifies "sentences" in the language (each sentence is a sequence of words)
  – Semantics - the meaning of a program (specifies what the result should be for each input)
  – Today's topic: lexical and syntactic structures
Specification of Patterns for Tokens: Regular Expressions
• Basis symbols:
  – ε is a regular expression denoting the language {ε}
  – a ∈ Σ is a regular expression denoting {a}
• If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
  – r | s is a regular expression denoting L(r) ∪ M(s)
  – rs is a regular expression denoting L(r)M(s)
  – r* is a regular expression denoting L(r)*
  – (r) is a regular expression denoting L(r)
• A language defined by a regular expression is called a regular set
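As a worked illustration, with the alphabet Σ = {a, b} chosen only for this example, applying the rules above to a few small expressions gives:

```latex
\begin{aligned}
L(a \mid b) &= \{a\} \cup \{b\} = \{a, b\} \\
L\big((a \mid b)(a \mid b)\big) &= \{a, b\}\{a, b\} = \{aa, ab, ba, bb\} \\
L(a^*) &= \{\varepsilon, a, aa, aaa, \ldots\} \\
L\big((a \mid b)^*\big) &= \{\varepsilon, a, b, aa, ab, ba, bb, aaa, \ldots\} \\
L(a \mid a^* b) &= \{a\} \cup \{b, ab, aab, \ldots\} = \{a, b, ab, aab, \ldots\}
\end{aligned}
```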
Specification of Patterns for Tokens: Regular Definitions
• Regular definitions introduce a naming convention:
  d1 → r1
  d2 → r2
  …
  dn → rn
  where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}
• Any dj in ri can be textually substituted in ri to obtain an equivalent set of definitions
Specification of Patterns for Tokens: Regular Definitions
• Example:
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*
• Regular definitions are not recursive:
digits → digit digits | digit    wrong!
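To see textual substitution at work, the sketch below expands the letter and digit definitions into the id definition and hands the resulting single pattern to the POSIX regex library (regcomp, regexec and regfree from <regex.h>, available on POSIX systems). It assumes ASCII letters and uses the bracket expressions [A-Za-z] and [0-9] as shorthand for the alternations A | B | … | z and 0 | 1 | … | 9; the macro names and matches_id are hypothetical, chosen for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Regular definitions, written as POSIX extended-regex fragments (ASCII assumed). */
#define LETTER "[A-Za-z]"
#define DIGIT  "[0-9]"

/* id -> letter ( letter | digit )*
 * Substituting LETTER and DIGIT textually (adjacent string literals are
 * concatenated by the compiler) leaves one regex with no names in it:
 * ^[A-Za-z]([A-Za-z]|[0-9])*$ ; ^ and $ anchor it to the whole string. */
#define ID_PATTERN "^" LETTER "(" LETTER "|" DIGIT ")*$"

/* Returns 1 if s is an id, 0 if it is not, -1 if the pattern fails to compile. */
int matches_id(const char *s)
{
    regex_t re;
    if (regcomp(&re, ID_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With these definitions, matches_id("x1") accepts, while "1x" and "a_b" are rejected because an id must start with a letter and may contain only letters and digits. The same expansion is why the recursive digits definition is ruled out: substitution would never terminate.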
Specification of Patterns for Tokens: Notational Shorthand
• The following shorthands are often used:
r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z
• Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
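The shorthands translate directly into control flow when a recognizer is coded by hand: r+ becomes one required step followed by a loop (r r*), and r? becomes a branch that is skipped (the ε case) when it cannot be completed. A minimal sketch for num, assuming ASCII input and only the uppercase E used in the definition; match_digits and match_num are illustrative names.

```c
#include <ctype.h>
#include <stddef.h>

/* digit+ : counts a run of digits. A result of 0 means the mandatory
 * first digit was missing, i.e. digit+ did not match. */
static size_t match_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Returns the length of the longest prefix of s matching
 *   num -> digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
 * or 0 if no prefix matches. Each optional group is taken only if it
 * can be completed; otherwise the epsilon branch is used. */
size_t match_num(const char *s)
{
    size_t n = match_digits(s);              /* mandatory integer part */
    if (n == 0)
        return 0;

    if (s[n] == '.') {                       /* ( . digit+ )? */
        size_t frac = match_digits(s + n + 1);
        if (frac > 0)
            n += 1 + frac;
    }

    if (s[n] == 'E') {                       /* ( E ( + | - )? digit+ )? */
        size_t k = n + 1;
        if (s[k] == '+' || s[k] == '-')      /* ( + | - )? */
            k++;
        size_t exp = match_digits(s + k);
        if (exp > 0)
            n = k + exp;
    }
    return n;
}
```

For example, match_num("6.02E+23") consumes all 8 characters, while match_num("12.") stops after "12" because the . digit+ group cannot be completed.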
Regular Definitions and Grammars
Grammar
stmt → if expr then stmt
     | if expr then stmt else stmt
expr → term relop term
     | term
term → id
     | num

Regular definitions
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
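Each token defined on the regular-definitions side can be recognized on its own, independently of the grammar. For relop, the sketch below assumes one character of lookahead: after < or > it peeks at the next character and consumes it only when it completes <=, <>, or >=. The enum names and match_relop are hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* Attribute values for the relop token (names chosen for this sketch). */
enum relop { RELOP_NONE, RELOP_LT, RELOP_LE, RELOP_NE, RELOP_GT, RELOP_GE, RELOP_EQ };

/* Recognizes relop -> < | <= | <> | > | >= | = at the start of s.
 * Stores the number of characters consumed in *len and returns the
 * attribute, or RELOP_NONE with *len = 0 if s does not start with a relop.
 * Peeking at s[1] without consuming it plays the role of retracting the
 * input in a transition diagram. */
enum relop match_relop(const char *s, size_t *len)
{
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return RELOP_LE; }
        if (s[1] == '>') { *len = 2; return RELOP_NE; }
        *len = 1; return RELOP_LT;
    case '>':
        if (s[1] == '=') { *len = 2; return RELOP_GE; }
        *len = 1; return RELOP_GT;
    case '=':
        *len = 1; return RELOP_EQ;
    default:
        *len = 0; return RELOP_NONE;
    }
}
```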
Coding Regular Definitions in Transition Diagrams
Transition diagram for id → letter ( letter | digit )*:
  start → state 9
  9 --letter--> 10
  10 --letter or digit--> 10
  10 --other--> 11* (accepting state; * means retract one input character)
  at state 11: return(gettoken(), ())
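The action at state 11 hands the accepted lexeme to gettoken(). A common arrangement, assumed here, is that this step separates reserved words such as if, then and else from ordinary identifiers. The sketch below walks states 9, 10 and 11 explicitly and then checks a small keyword table; scan_word, the token enum and the table are hypothetical illustrations, not the course's gettoken(), and isalpha/isalnum assume the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_ID, TOK_IF, TOK_THEN, TOK_ELSE };

/* Keywords from the regular definitions if -> if, then -> then, else -> else. */
static const struct { const char *word; enum token tok; } keywords[] = {
    { "if", TOK_IF }, { "then", TOK_THEN }, { "else", TOK_ELSE },
};

/* Simulates 9 --letter--> 10 --(letter|digit)--> 10 --other--> 11*.
 * Returns the token for the lexeme at the start of s and stores its
 * length in *len. The "other" character that leads to state 11 is not
 * counted, which corresponds to retracting the input by one character. */
enum token scan_word(const char *s, size_t *len)
{
    size_t n = 0;
    int state = 9;

    for (;;) {
        unsigned char c = (unsigned char)s[n];
        switch (state) {
        case 9:                                  /* start state */
            if (!isalpha(c)) { *len = 0; return TOK_NONE; }
            n++; state = 10;
            break;
        case 10:                                 /* inside the lexeme */
            if (isalnum(c)) n++;
            else state = 11;                     /* saw "other" */
            break;
        case 11:                                 /* accept: lexeme is s[0..n) */
            *len = n;
            for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
                if (strlen(keywords[i].word) == n &&
                    strncmp(s, keywords[i].word, n) == 0)
                    return keywords[i].tok;
            return TOK_ID;
        }
    }
}
```

Checking keywords after the id diagram accepts keeps a single diagram for all words, so "iffy" is still an id even though it begins with "if".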
Coding Regular Definitions in Transition Diagrams: Code
token nexttoken()
{
  while (1) {
    switch (state) {
    case 0:
      c = nextchar();
      if (c == blank || c == tab || c == newline) {
        state = 0;
        lexeme_beginning++;   /* skip white space */
      }
      else if (c == '<') state = 1;
      else if (c == '=') state = 5;
      else if (c == '>') state = 6;
      else state = fail();
      break;
    case 1: …
    case 9:
      c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10:
      c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …

/* fail() decides the next start state to check */
int fail()
{
  forward = token_beginning;   /* back up to the start of the lexeme */
  switch (start) {
  case 0:  start = 9;  break;
  case 9:  start = 12; break;
  case 12: start = 20; break;
  case 20: start = 25; break;
  case 25: recover();  break;
  default: /* error */ break;
  }
  return start;
}
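The chain of start states in fail() can also be expressed as an ordered list of recognizers that all begin at the same lexeme start. The sketch below is a shortened, hypothetical version with three diagrams (relop, id, and integers only) instead of the five start states 0, 9, 12, 20 and 25 used above, and it returns -1 at the point where fail() would call recover(). All function names are illustrative.

```c
#include <ctype.h>
#include <stddef.h>

/* Each recognizer returns how many characters of s it accepts (0 = fail). */
typedef size_t (*recognizer)(const char *s);

static size_t try_relop(const char *s)        /* < <= <> > >= = */
{
    if (s[0] == '<') return (s[1] == '=' || s[1] == '>') ? 2 : 1;
    if (s[0] == '>') return s[1] == '=' ? 2 : 1;
    return s[0] == '=' ? 1 : 0;
}

static size_t try_id(const char *s)           /* letter ( letter | digit )* */
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t try_int(const char *s)          /* digit+ (integers only here) */
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Diagrams in the order they are tried, like the start-state chain in fail(). */
static const recognizer diagrams[] = { try_relop, try_id, try_int };

/* Every attempt starts from the same lexeme_beginning, which is what
 * resetting forward achieves in fail(). Returns the index of the first
 * diagram that accepts (its length in *len), or -1 if all of them fail. */
int next_lexeme(const char *lexeme_beginning, size_t *len)
{
    for (int i = 0; i < (int)(sizeof diagrams / sizeof diagrams[0]); i++) {
        size_t n = diagrams[i](lexeme_beginning);
        if (n > 0) { *len = n; return i; }
    }
    *len = 0;
    return -1;
}
```

Because the first diagram that accepts wins, the order of the list matters, just as the order of start states in fail() does.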
Common Application of Regular Expressions
• Validate passwords and email addresses
• Extract specific sections from an HTML page
• Parse data files
• Replace values (strings)
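As one example of parsing data files, the sketch below uses capture groups from the POSIX regex library (regcomp, regexec, regfree and the regmatch_t offsets rm_so and rm_eo) to split lines of the form name = value. The line format, the restriction of the value to a single whitespace-free word, and the name parse_kv are all assumptions made for illustration.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical data-file line:  name = value
 * group 1 = name (letter or '_' first), group 2 = a whitespace-free value. */
#define KV_PATTERN \
    "^[[:space:]]*([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*=[[:space:]]*([^[:space:]]*)[[:space:]]*$"

/* Copies the name and value of a "name = value" line into the caller's
 * buffers (each of size cap, cap > 0), truncating if needed. Returns 1 on
 * a match, 0 if the line has another shape, -1 if the pattern fails to compile. */
int parse_kv(const char *line, char *name, char *value, size_t cap)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] = whole match, m[1] = name, m[2] = value */

    if (regcomp(&re, KV_PATTERN, REG_EXTENDED) != 0)
        return -1;
    int ok = regexec(&re, line, 3, m, 0) == 0;
    regfree(&re);
    if (!ok)
        return 0;

    char *out[2] = { name, value };
    for (int g = 1; g <= 2; g++) {
        size_t n = (size_t)(m[g].rm_eo - m[g].rm_so);
        if (n >= cap) n = cap - 1;            /* truncate to fit the buffer */
        memcpy(out[g - 1], line + m[g].rm_so, n);
        out[g - 1][n] = '\0';
    }
    return 1;
}
```

Restricting the value to a whitespace-free word keeps the match unambiguous, so the spaces around = can never end up inside a capture group.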
Got it? Revisit the following questions
• Regular Expressions
  – a concise and flexible means for identifying strings of text
  – written in a formal language
  – interpreted by a RegEx processor
• Why RegEx
  – Precise definition of a language
  – Layered definition of a language
  – Lexical / syntactic / semantic layers
• Further uses of RegEx
  – Foundation for the lexer
  – Formal communication
  – Common applications
Thank you very much!
Questions?