Lexical Analysis, Regular Expressions & Finite State Machines
![Page 1: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/1.jpg)
Lexical Analysis, Regular Expressions & Finite State Machines
![Page 2: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/2.jpg)
Processing English
• Consider the following two sentences:
  • Hi, I am 22 years old. I come from Alabama.
  • 22 come Alabama I, old from am. Hi years I.
• Are they both correct?
• How do you know?
  • Same words, numbers and punctuation
• What did you do first?
  1. Find words, numbers and punctuation
  2. Then, check order (grammar rules)
![Page 3: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/3.jpg)
Finding Words and Numbers
• How did you find words, numbers and punctuation?
  • You have a definition of what each is, or looks like
  • For example, what is a number? a word?
• Although you are a bit more agile, the process was:
  1. Start with the first character
  2. If letter, assume word; if digit, assume number
  3. Scan left to right, 1 character at a time, until a punctuation mark (space, comma, etc.)
  4. Recognize the word or number
  5. If no more characters, done; otherwise return to 1
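The five steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `scan` and the token kinds `WORD`, `NUMBER` and `PUNCT` are made up here, and every non-space character that is neither a letter nor a digit is treated as punctuation.

```python
def scan(text):
    """Split text into (kind, value) pairs, reading one character at a time."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():                      # spaces only separate tokens
            i += 1
            continue
        if ch.isalpha():                      # letter: assume a word
            kind, same_kind = "WORD", str.isalpha
        elif ch.isdigit():                    # digit: assume a number
            kind, same_kind = "NUMBER", str.isdigit
        else:                                 # anything else: punctuation mark
            tokens.append(("PUNCT", ch))
            i += 1
            continue
        start = i
        while i < len(text) and same_kind(text[i]):
            i += 1                            # scan left to right
        tokens.append((kind, text[start:i]))  # recognize the word or number
    return tokens


print(scan("Hi, I am 22 years old."))
```

Run on the two sentences from the previous slide, this sketch yields the same collection of tokens in a different order, which is why a separate grammar check is needed afterwards.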
![Page 4: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/4.jpg)
Processing Code
• How do you process the following?
• What are the main parts into which to break the input?

void quote() {
  print( "To iterate is human, to recurse divine." + " - L. Peter Deutsch");
}

Schemes:
  childOf(X,Y)
  marriedTo(X,Y)
Facts:
  marriedTo('Zed','Bea').
  marriedTo('Jack','Jill').
  childOf('Jill','Zed').
  childOf('Sue','Jack').
Rules:
  childOf(X,Y) :- childOf(X,Z), marriedTo(Y,Z).
  marriedTo(X,Y) :- marriedTo(Y,X).
Queries:
  marriedTo('Bea','Zed')?
  childOf('Jill','Bea')?

def addABC(x):
    s = "ABC"
    return x + s
addABC(input("String: "))
![Page 5: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/5.jpg)
Example
def addABC ( x ) :
    s = "ABC"
    return x + s
addABC ( input ( "String: " ) )
![Page 6: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/6.jpg)
What are the Parts?
• They are called TOKENS
• The process is similar to English processing
• Lexical Analysis
  • Input: a program in some language
  • Output: a list of tokens (type, value, location)
![Page 7: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/7.jpg)
Example Revisited
Sample Input:
def addABC(x):
    s = "ABC"
    return x + s
addABC(input("String: "))
Sample Output:
(FUNDEF,"def",1)
(ID,"addABC",1)
(LEFT_PAREN,"(",1)
(ID,"x",1)
(RIGHT_PAREN,")",1)
(COLON,":",1)
(ID,"s",2)
(ASSIGN,"=",2)
(STRING,"'ABC'",2)
(FUNRET,"return",3)
(ID,"x",3)
(OPERATOR,"+",3)
(ID,"s",3)
(ID,"addABC",4)
(LEFT_PAREN,"(",4)
…
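One possible way to produce such a (type, value, location) list is a table of regular expressions tried in order. The sketch below is an assumption-laden illustration, not the course's lexer: the token names are copied from the sample output, while `TOKEN_SPEC`, `MASTER`, `tokenize` and the `ERROR` catch-all are names invented here, and only the handful of token kinds needed for the sample are covered. String values are kept as the raw lexeme, so the quoting may differ from the slide.

```python
import re

# Order matters: keywords must be tried before the general ID pattern.
TOKEN_SPEC = [
    ("FUNDEF", r"def\b"),
    ("FUNRET", r"return\b"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("STRING", r'"[^"\n]*"'),
    ("LEFT_PAREN", r"\("),
    ("RIGHT_PAREN", r"\)"),
    ("COLON", r":"),
    ("ASSIGN", r"="),
    ("OPERATOR", r"\+"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("ERROR", r"."),  # hypothetical catch-all for unexpected characters
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (type, value, line) triples."""
    line = 1
    tokens = []
    for match in MASTER.finditer(source):
        kind, value = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1              # track the location, emit nothing
        elif kind != "SKIP":
            tokens.append((kind, value, line))
    return tokens


PROGRAM = 'def addABC(x):\n    s = "ABC"\n    return x + s\naddABC(input("String: "))\n'
for token in tokenize(PROGRAM):
    print(token)
```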
![Page 8: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/8.jpg)
Program Compilation
• Lexical analysis is the first step of the process
[Diagram: Program -> Compiler -> Code. Inside the compiler: Program -> Lexical Analyzer -> Tokens (keywords, string literals, variables, …) -> Parser (syntax analysis) -> Internal Data -> Code Generator -> Code, or Interpreter (executed directly). Error messages are also produced.]
![Page 9: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/9.jpg)
Token Specification
• Regular Expressions
  • Pattern description for strings
  • Concatenation: abc -> “abc”
  • Boolean OR: ab|ac -> “ab”, “ac”
  • Kleene closure: ab* -> “a”, “ab”, “abbb”, etc.
  • Optional: ab?c -> “ac”, “abc”
  • One or more: ab+ -> “ab”, “abbb”
  • Group using ()
    • (a|b)c -> “ac”, “bc”
    • (a|b)*c -> “c”, “ac”, “bc”, “bac”, “abaaabbbabbaaaaac”, etc.
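The operators listed above can be checked directly with Python's standard `re` module, whose syntax for these basic operators matches the slide. The sketch below is just a sanity check: the `EXAMPLES` table and the helper name `accepts` are invented here, and `re.fullmatch` is used so that the whole string must match the pattern.

```python
import re

# Pattern -> strings the slide claims it describes.
EXAMPLES = {
    r"abc": ["abc"],
    r"ab|ac": ["ab", "ac"],
    r"ab*": ["a", "ab", "abbb"],
    r"ab?c": ["ac", "abc"],
    r"ab+": ["ab", "abbb"],
    r"(a|b)c": ["ac", "bc"],
    r"(a|b)*c": ["c", "ac", "bc", "bac", "abaaabbbabbaaaaac"],
}


def accepts(pattern, text):
    """True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, strings in EXAMPLES.items():
    for text in strings:
        print(pattern, repr(text), accepts(pattern, text))
```

Every line printed should end in `True`; strings such as "a" for `ab+` or "abc" for `(a|b)c` are rejected.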
![Page 10: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/10.jpg)
RegEx Extensions
• Exactly n: a³b+ -> “aaab”, “aaabb”, …
• [A-Z] = A|B|…|Z
• [ABC] = A|B|C
• [~aA] = any character but “a” or “A”
• \ = escape character (e.g., \* -> “*”)
• Whitespace characters
  • \s, \t, \n, \v
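For comparison, here is how these extensions look in Python's `re` syntax, which differs slightly from the slide's notation: "exactly n" is written with braces (`a{3}`) and a negated character class uses `^` (`[^aA]`). The helper name `accepts` is invented for this sketch.

```python
import re


def accepts(pattern, text):
    """True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(accepts(r"a{3}b+", "aaab"))   # exactly three a's, then one or more b's
print(accepts(r"a{3}b+", "aab"))    # only two a's, so this is rejected
print(accepts(r"[A-Z]", "Q"))       # any single uppercase letter
print(accepts(r"[ABC]", "B"))       # one of A, B or C
print(accepts(r"[^aA]", "b"))       # any character but "a" or "A"
print(accepts(r"\*", "*"))          # escaped star matches a literal "*"
print(all(accepts(r"\s", ws) for ws in " \t\n\v"))  # whitespace characters
```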
![Page 11: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/11.jpg)
Token Recognition
• Finite State Machine
• A DFSM is a 5-tuple (Σ,S,s0,δ,F)
  • Σ: finite, non-empty set of symbols (input alphabet)
  • S: finite, non-empty set of states
  • s0: member of S designated as start state
  • δ: state-transition function δ: S x Σ -> S
  • F: subset of S (final states, may be empty)
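The 5-tuple maps naturally onto a small data structure. The sketch below is one way to write it, with names (`DFSM`, `accepts`, `AB_STAR`, `q0`, `q1`) chosen here for illustration. One shortcut is taken: δ is stored as a partial table, and a missing entry is treated as an immediate reject instead of an explicit dead state.

```python
from dataclasses import dataclass


@dataclass
class DFSM:
    alphabet: frozenset   # Σ
    states: frozenset     # S
    start: str            # s0
    delta: dict           # δ as a table: (state, symbol) -> state
    finals: frozenset     # F

    def accepts(self, text):
        state = self.start
        for symbol in text:
            if (state, symbol) not in self.delta:
                return False          # assumption: missing transition means reject
            state = self.delta[(state, symbol)]
        return state in self.finals


# A machine for ab*: one "a", then any number of "b"s.
AB_STAR = DFSM(
    alphabet=frozenset("ab"),
    states=frozenset({"q0", "q1"}),
    start="q0",
    delta={("q0", "a"): "q1", ("q1", "b"): "q1"},
    finals=frozenset({"q1"}),
)

print(AB_STAR.accepts("abbb"))
print(AB_STAR.accepts("ba"))
```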
![Page 12: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/12.jpg)
FSM & RegEx
• abc
• a(b|c)
• ab*
• (a(b?c))+
[State diagrams for the four expressions above, with transitions labeled a, b and c.]
Note the special double-circle designation of a final/accepting state.
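As a rough stand-in for the diagrams, the four machines can be written as transition tables and compared against Python's `re` module on every short string over {a, b, c}. The state numbering and the names `MACHINES`, `run` and `agrees_with_regex` are choices made for this sketch; the slide's diagrams may number or arrange the states differently.

```python
import re
from itertools import product

# regex -> (transition table, set of accepting states); state 0 is the start.
MACHINES = {
    "abc": ({(0, "a"): 1, (1, "b"): 2, (2, "c"): 3}, {3}),
    "a(b|c)": ({(0, "a"): 1, (1, "b"): 2, (1, "c"): 2}, {2}),
    "ab*": ({(0, "a"): 1, (1, "b"): 1}, {1}),
    "(a(b?c))+": (
        {(0, "a"): 1, (1, "b"): 2, (1, "c"): 3, (2, "c"): 3, (3, "a"): 1},
        {3},
    ),
}


def run(delta, finals, text, start=0):
    """Follow the table one symbol at a time; a missing entry rejects."""
    state = start
    for symbol in text:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in finals


def agrees_with_regex(pattern, max_len=5):
    """Check the machine against re.fullmatch on all strings up to max_len."""
    delta, finals = MACHINES[pattern]
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            text = "".join(chars)
            if run(delta, finals, text) != (re.fullmatch(pattern, text) is not None):
                return False
    return True


for pattern in MACHINES:
    print(pattern, agrees_with_regex(pattern))
```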
![Page 13: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/13.jpg)
Finite State Transducer
• Extended FSM:
  • Γ: finite, non-empty set of symbols (output alphabet)
  • δ: state-transition function δ: S x Σ -> S x Γ
• FST consumes input symbols and emits output symbols
• Lexical analyzer
  • consume raw characters
  • emit tokens
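A small table-driven transducer makes the S x Σ -> S x Γ idea concrete. In this sketch each table entry gives the next state and an output symbol, with `None` standing for "emit nothing". The state names, the three input classes and the outputs `WORD` and `NUMBER` are all assumptions made for illustration, and any character that is neither a letter nor a digit is simply treated as a separator.

```python
def classify(ch):
    """Reduce the input alphabet to three classes (a simplification)."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "space"


# (state, input class) -> (next state, output symbol or None)
DELTA = {
    ("start", "letter"): ("word", None),
    ("start", "digit"): ("number", None),
    ("start", "space"): ("start", None),
    ("word", "letter"): ("word", None),
    ("word", "digit"): ("word", None),
    ("word", "space"): ("start", "WORD"),
    ("number", "digit"): ("number", None),
    ("number", "letter"): ("word", "NUMBER"),
    ("number", "space"): ("start", "NUMBER"),
}


def transduce(text):
    """Consume raw characters and return the list of emitted outputs."""
    state = "start"
    outputs = []
    for ch in text + " ":            # trailing separator flushes the last token
        state, out = DELTA[(state, classify(ch))]
        if out is not None:
            outputs.append(out)
    return outputs


print(transduce("x1 42 y"))
```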
![Page 14: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/14.jpg)
CS 236 Coolness Factor!
• Design our own language
  • Subset of Datalog (LP-like)
• Build an interpreter for our language
  • Lexical Analyzer (Project 1)
  • Parser (Project 2)
  • Interpreter (Projects 3 and 4)
  • Optimization (Project 5)
![Page 15: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/15.jpg)
Designing a Language
• Define the tokens
  • Elements of the language, punctuation, etc.
  • For example, what are they in C++?
• Recognize the tokens (lexical analysis)
• Define the grammar
  • Forms of correct sentences
  • For example, what are they in C++?
• Recognize the grammar (parsing)
• Interpret and execute the program
• C++ is a bit too complicated for us…
![Page 16: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/16.jpg)
Varied World Views

fct personlist siblings(person x) {
  return x’s siblings
}
fct int square(int x) {
  return x * x
}
fct boolean succeeds(person x) {
  if studies(x) return T else return F
}

fct boolean sibling(person x, person y) {
  if y is x’s sibling return T else return F
}
fct boolean square(int x, int y) {
  if y == x * x return T else return F
}
fct boolean succeeds(person x) {
  if studies(x) return T else return F
}

• Look up table or oracle
• No concerns with efficiency
![Page 17: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/17.jpg)
Logic Programming
• Assume: all functions are Boolean
• Compute using facts and rules
  • Facts are the known true values of the functions
  • Rules express relations among functions
• Example: studies(x), succeeds(x)
  • Facts: studies(Matt), studies(Jenny)
  • Rule: succeeds(x) :- studies(x)
• Closed-world Assumption
![Page 18: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/18.jpg)
Logic Programming
• Computing is like issuing queries
  • First check if it can be answered with facts
  • Second check if rules can be applied
• Examples
  • studies(Alex)?
    • NO (neither facts nor rules to establish it)
  • studies(Matt)?
    • YES (there is a fact about that)
  • succeeds(Jenny)?
    • YES (no fact, but a rule that if Jenny studies then she succeeds, and a fact that Jenny studies)
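The two-step check (facts first, then rules) can be mimicked with a set and a dictionary. This is a deliberately tiny sketch: `FACTS`, `RULES` and `query` are names invented here, each rule is limited to a single-atom body as in succeeds(x) :- studies(x), and cyclic rules are not handled.

```python
# Facts: the known true values of the Boolean functions.
FACTS = {("studies", "Matt"), ("studies", "Jenny")}

# Rules with a single-atom body: succeeds(x) :- studies(x).
RULES = {"succeeds": "studies"}


def query(predicate, person):
    """Closed-world query: True only if a fact or a rule establishes it."""
    if (predicate, person) in FACTS:      # first, try the facts
        return True
    body = RULES.get(predicate)           # second, try a rule
    if body is not None:
        return query(body, person)
    return False                          # nothing establishes it: answer NO


print(query("studies", "Alex"))
print(query("studies", "Matt"))
print(query("succeeds", "Jenny"))
```

The final `return False` is where the closed-world assumption shows up: whatever cannot be derived is taken to be false.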
![Page 19: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/19.jpg)
Functions of Several Arguments
• Examples
  • loves(x,y), parent(x,y), inclass(x,y)
  • loves(x,y) :- married(x,y)
• Computing
  • parent(Christophe, Samuel)?
    • Yes, if there is a fact that matches
  • parent(Christophe, X)?
    • Yes, if there is a value of X that would cause it to match a fact – return value of X
  • loves(X, Y)?
    • Yes, if there are values of X and Y that would make this true, either by matching a fact or via rules (e.g., married(Christophe, Isabelle)) – return values of X and Y
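Queries with variables amount to matching a pattern against facts and collecting the bindings. The sketch below assumes two facts, parent(Christophe, Samuel) and married(Christophe, Isabelle); the first is hypothetical, since the slide only says "if there is a fact that matches". The `Var` class and the names `match`, `FACTS`, `RULES` and `query` are also invented here, and the single rule passes its arguments through unchanged, as loves(x,y) :- married(x,y) does.

```python
class Var:
    """Marks a query argument as a variable rather than a constant."""
    def __init__(self, name):
        self.name = name


def match(pattern, fact):
    """Return a dict of variable bindings if fact fits pattern, else None."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if isinstance(p, Var):
            if bindings.setdefault(p.name, f) != f:
                return None               # same variable bound to two values
        elif p != f:
            return None                   # constant mismatch
    return bindings


FACTS = {
    "parent": [("Christophe", "Samuel")],      # assumed fact, for illustration
    "married": [("Christophe", "Isabelle")],
}
RULES = {"loves": "married"}                    # loves(x,y) :- married(x,y)


def query(predicate, *args):
    """Return one bindings dict per way the query can be made true."""
    results = []
    for fact in FACTS.get(predicate, []):
        bindings = match(args, fact)
        if bindings is not None:
            results.append(bindings)
    body = RULES.get(predicate)
    if body is not None:
        results.extend(query(body, *args))
    return results


print(query("parent", "Christophe", "Samuel"))
print(query("parent", "Christophe", Var("X")))
print(query("loves", Var("X"), Var("Y")))
```

An empty list plays the role of "No"; a list containing an empty dict means "Yes" with nothing to return.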
![Page 20: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/20.jpg)
When We Are Done
Sample Program:
Schemes:
  snap(S,N,A,P)
  csg(C,S,G)
  cn(C,N)
  ncg(N,C,G)
Facts:
  snap('12345','C. Brown','12 Apple St.','555-1234').
  snap('22222','P. Patty','56 Grape Blvd','555-9999').
  snap('33333','Snoopy','12 Apple St.','555-1234').
  csg('CS101','12345','A').
  csg('CS101','22222','B').
  csg('CS101','33333','C').
  csg('EE200','12345','B+').
  csg('EE200','22222','B').
Rules:
  cn(C,N) :- snap(S,N,A,P),csg(C,S,G).
  ncg(N,C,G) :- snap(S,N,A,P),csg(C,S,G).
Queries:
  cn('CS101',Name)?
  ncg('Snoopy',Course,Grade)?

Sample Execution:
cn('CS101',Name)? Yes(3)
  Name='C. Brown'
  Name='P. Patty'
  Name='Snoopy'
ncg('Snoopy',Course,Grade)? Yes(1)
  Course='CS101', Grade='C'
Demo…
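Both rules in the sample program join snap and csg on the shared variable S. A hand-written version of that join, using the facts from the slide, reproduces the sample execution. This is only a sketch of the idea: the names `SNAP`, `CSG`, `cn` and `ncg` mirror the schemes, but a real interpreter would evaluate rules generically instead of hard-coding each one.

```python
SNAP = [
    ("12345", "C. Brown", "12 Apple St.", "555-1234"),
    ("22222", "P. Patty", "56 Grape Blvd", "555-9999"),
    ("33333", "Snoopy", "12 Apple St.", "555-1234"),
]
CSG = [
    ("CS101", "12345", "A"),
    ("CS101", "22222", "B"),
    ("CS101", "33333", "C"),
    ("EE200", "12345", "B+"),
    ("EE200", "22222", "B"),
]


def cn():
    """cn(C,N) :- snap(S,N,A,P), csg(C,S,G): join on the student id S."""
    return sorted({(c, n) for (s, n, a, p) in SNAP for (c, s2, g) in CSG if s == s2})


def ncg():
    """ncg(N,C,G) :- snap(S,N,A,P), csg(C,S,G): same join, different columns."""
    return sorted({(n, c, g) for (s, n, a, p) in SNAP for (c, s2, g) in CSG if s == s2})


# cn('CS101',Name)?
print([n for (c, n) in cn() if c == "CS101"])
# ncg('Snoopy',Course,Grade)?
print([(c, g) for (n, c, g) in ncg() if n == "Snoopy"])
```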
![Page 21: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/21.jpg)
Project 1: Lexical Analyzer
Sample Input:
Queries:
    IsInRoomAtDH('Snoopy',R,'M',H)
#SchemesFactsRules
.
Sample Output:
(QUERIES,"Queries",1)
(COLON,":",1)
(ID,"IsInRoomAtDH",2)
(LEFT_PAREN,"(",2)
(STRING,"'Snoopy'",2)
(COMMA,",",2)
(ID,"R",2)
(COMMA,",",2)
(STRING,"'M'",2)
(COMMA,",",2)
(ID,"H",2)
(RIGHT_PAREN,")",2)
(COMMENT,"#SchemesFactsRules",3)
(PERIOD,".",4)
Total Tokens = 14
Define and find the tokens
![Page 22: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/22.jpg)
Basic FST for Project 1
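[State diagram of the FST. Recoverable elements:
• start: the initial state
• white space: entered on <space> | <tab> | <cr>; loops on <space> | <tab> | <cr>
• string: entered on ‘; loops on <character (except <cr> and <eof>)>; ends on ‘; <cr> or <eof> leads to error
• ident. or keywd.: entered on <letter>; loops on <letter> | <digit>; special check for keywords (Schemes, Facts, Rules, Queries)
• : or :- : entered on :, optionally followed by -
• eof: entered on <eof>
• error: entered on <any other char>
• … (other tokens)]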
![Page 23: Lexical Analysis, Regular Expressions & Finite State Machines](https://reader035.fdocuments.us/reader035/viewer/2022070314/56813a10550346895da1e894/html5/thumbnails/23.jpg)
Implementing a FST
State in Variable
state = START;
input = readChar();
while (state != ACCEPT) {
  if (state == START) {
    if (input == QUOTE) {
      input = readChar();
      state = STRING;
    } else if (input == ...) {
      ... other kinds of tokens ...
    }
  } else if (state == STRING) {
    if (input == QUOTE) {
      input = readChar();
      state = ACCEPT;
    } else {
      input = readChar();
      state = STRING;
    }
  }
}
State in Position in Code
input = readChar();
// begin in START state
if (input == QUOTE) {
  input = readChar();
  // now in STRING state
  while (input != QUOTE) {
    input = readChar();
    // stay in STRING state
  }
  input = readChar();
  // now in ACCEPT state
} else if (input == ...) {
  ... other kinds of tokens ...
}
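For readers who want to run the "state in variable" style, here is a Python sketch that recognizes only the string token. It departs from the slide code in two labeled ways: an `ERROR` state is added so that a missing closing quote (end of line or end of input) terminates the loop, and the input is indexed by position instead of calling `readChar()`. The function name `read_string` is invented here.

```python
START, STRING, ACCEPT, ERROR = "START", "STRING", "ACCEPT", "ERROR"
QUOTE = "'"


def read_string(text):
    """Return the quoted string at the start of text, or None if there is none."""
    pos = 0
    state = START
    while state not in (ACCEPT, ERROR):
        ch = text[pos] if pos < len(text) else None   # None stands for <eof>
        if state == START:
            state = STRING if ch == QUOTE else ERROR
        elif state == STRING:
            if ch == QUOTE:
                state = ACCEPT
            elif ch is None or ch == "\n":
                state = ERROR                         # unterminated string
            # any other character: stay in STRING
        if state != ERROR:
            pos += 1                                  # consume the character
    return text[:pos] if state == ACCEPT else None


print(read_string("'Snoopy',R,'M',H)"))
print(read_string("'no closing quote"))
```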