11
Syntax SpecificationSyntax SpecificationRegular ExpressionsRegular Expressions
22
Phases of CompilationPhases of Compilation
33
Syntax AnalysisSyntax Analysis
• Syntax:Syntax:– Webster’s definition: Webster’s definition: 1 a : the way in which linguistic 1 a : the way in which linguistic
elements (as words) are put together to form constituents elements (as words) are put together to form constituents (as phrases or clauses)(as phrases or clauses)
• The syntax of a programming languageThe syntax of a programming language– Describes its formDescribes its form
» i.e.i.e. Organization of tokens Organization of tokens (elements)(elements)
– Formal notationFormal notation» Context Free Grammars (CFGs)Context Free Grammars (CFGs)
44
Review: Formal definition of tokensReview: Formal definition of tokens
• A set of tokens is a set of strings over an alphabetA set of tokens is a set of strings over an alphabet– {read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}{read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}
• A set of tokens is a A set of tokens is a regular setregular set that can be defined by that can be defined by comprehension using a comprehension using a regular expressionregular expression
• For every regular set, there is a For every regular set, there is a deterministic finite deterministic finite automatonautomaton (DFA) that can recognize it (DFA) that can recognize it
– i.e.i.e. determine whether a string belongs to the set or not determine whether a string belongs to the set or not– Scanners extract tokens from source code in the same way Scanners extract tokens from source code in the same way
DFAs determine membershipDFAs determine membership
55
Regular ExpressionsRegular Expressions
• A regular expression (RE) is:A regular expression (RE) is:– A single characterA single character– The empty string, The empty string, – The The concatenationconcatenation of two regular expressions of two regular expressions
» Notation:Notation: RE RE11 RE RE22 ( (i.e. i.e. RERE11 followed by RE followed by RE22))– The The unionunion of two regular expressionsof two regular expressions
» Notation: Notation: RERE11 | RE | RE22
– The The closureclosure of a regular expression of a regular expression» Notation: Notation: RE*RE*» * is known as the * is known as the Kleene starKleene star» * * represents the concatenation of 0 or more stringsrepresents the concatenation of 0 or more strings
– Non-null enumerationNon-null enumeration» Notation: Notation: RERE++
» represents all non-null concatenations of RE (1 or more times)represents all non-null concatenations of RE (1 or more times)
66
Regular Expressions BasicsRegular Expressions Basics
Let alphabet Let alphabet ={a,b} ={a,b} (means a and b are its only letters)(means a and b are its only letters)
aa**=(=(, a, aa, aaa, ...}, a, aa, aaa, ...}
(ab)*=((ab)*=(, ab, abab, ababab, ...}, ab, abab, ababab, ...}
a a b=(a, b=(a, , b, bb, bb, ...}, b, bb, bb, ...}
(a (a b)* b)*== all strings containing a’s and b’s all strings containing a’s and b’s
(a*b*)*=(ab*)*= all strings containing a’s and b’s(a*b*)*=(ab*)*= all strings containing a’s and b’s
a*b*={aa*b*={aiibbjj| i >=0, j>=0)| i >=0, j>=0)
77
Building Regular ExpressionsBuilding Regular Expressions
Regular Expressions as LanguageRegular Expressions as Language
• ** while loopwhile loop–iterates 0 or more timesiterates 0 or more times
• concatenationconcatenation uvuv–sequential; first sequential; first u, u, then then vv
• uu vv OROR–select from one or the other or bothselect from one or the other or both
88
Description Description Regular Expression Regular Expression
Let Let ={a,b} – all expressions over this alphabet={a,b} – all expressions over this alphabet
Strings withStrings with
• exactly one aexactly one a b*ab*b*ab*
• exactly two a’sexactly two a’s b*ab*ab*b*ab*ab*
• one or more a’sone or more a’s (b*ab*)* or (a (b*ab*)* or (a b)*a (a b)*a (a b)* b)*
• even number of a’seven number of a’s (b*ab*ab*)*(b*ab*ab*)*
• even number of a’s and exactly one beven number of a’s and exactly one b
• (aa)*b(aa)* (aa)*b(aa)* (aa)*ab(aa)*a (aa)*ab(aa)*a
• odd number of a’sodd number of a’s (b*ab*ab*)*b*ab*(b*ab*ab*)*b*ab*
• that don’t contain aathat don’t contain aa (b (b ab)*( ab)*( a) a)
99
Regular Expression Regular Expression Description Description
Same alphabetSame alphabet
• (aa)*(aa)* even number of a’seven number of a’s
• (a (a b) (a b) (a b) (a b) (a b) (a b) (a b) b)
all strings of length 4all strings of length 4
• ((a ((a b) (a b) (a b) (a b) (a b) (a b) (a b))* b))*
strings of length divisible by 4strings of length divisible by 4
• (aa)* (aa)* ((a ((a b) (a b) (a b) (a b) (a b) (a b) (a b))* b))*
strings of a’s of length strings of a’s of length divisible by 4divisible by 4
1010
Token Definition ExampleToken Definition Example
• Numeric literals in PascalNumeric literals in Pascal– Definition of the token Definition of the token unsigned_numberunsigned_number
digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer unsigned_integer digitdigit digitdigit* | * | digitdigit++
unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( . ( ( . unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )
• Recursion is not allowed in Regular Expressions!Recursion is not allowed in Regular Expressions!
1111
ExerciseExercise
digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer unsigned_integer digitdigit digitdigit**
unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )
• Regular expression forRegular expression for– Decimal numbersDecimal numbers
number number … …
1212
ExerciseExercise
digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer unsigned_integer digitdigit digitdigit**
unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )
• Regular expression forRegular expression for– Decimal numbersDecimal numbers
number number ( + | – | ( + | – | ) ) unsigned_integer unsigned_integer ( ( ( ( unsigned_integer unsigned_integer ) | ) | ) )
1313
ExerciseExercise
digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer unsigned_integer digitdigit digitdigit**
unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )
• Regular expression forRegular expression for– IdentifiersIdentifiers
identifier identifier ……
1414
ExerciseExercise
digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer unsigned_integer digitdigit digitdigit**
unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )
• Regular expression forRegular expression for– IdentifiersIdentifiers
identifier identifier letter letter ( ( letterletter | digit | | digit | )* )*
letter letter a | b | c | … | z a | b | c | … | z
Top Related