ICS 482 Natural Language Processing Regular Expression and Finite Automata
Transcript of ICS 482 Natural Language Processing Regular Expression and Finite Automata
March 1, 2009 Dr. Muhammed Al-mulhem
ICS 482 Natural Language Processing
Regular Expression and Finite Automata
Muhammed Al-Mulhem
March 1, 2009
Regular Expressions
Regular expression (RE): A formula for specifying a set of strings.
String: A sequence of alphanumeric characters (letters, numbers, spaces, tabs, and punctuation).
Regular Expression Patterns
RE String matched
woodchucks “interesting links to woodchucks and lemurs”
a “Sarah Ali stopped by Mona’s”
Ali says, “My gift please,” Ali says,”
book “all our pretty books”
! “Leave him behind!” said Sami
Specify Options and Range using [ ] and -
RE Match
[wW]ood Wood or wood
[abc] “a”, “b”, or “c”
[A-Z] an uppercase letter
[a-z] a lowercase letter
[0-9] a single digit
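As a minimal sketch, the bracket patterns above can be tried with Python's standard `re` module. The slides use Perl-style syntax, so Python is an assumption made here for illustration, and the sample sentence is invented.

```python
import re

# Hypothetical sample text, not taken from the slides.
text = "Wood from the woodshed cost 25 dollars in 2009."

# [wW]ood matches either capitalisation of the first letter.
wood_hits = re.findall(r"[wW]ood", text)

# [0-9] matches exactly one digit per match.
digit_hits = re.findall(r"[0-9]", text)

# [A-Z] matches a single uppercase letter.
upper_hits = re.findall(r"[A-Z]", text)

print(wood_hits)
print(digit_hits)
print(upper_hits)
```

Each bracket expression consumes exactly one character, which is why the two numbers in the text come back as six separate digits.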
RE Operators
RE Description
a* Zero or more a’s
a+ One or more a’s
a? Zero or one a’s
[ab]* Zero or more a’s or b’s. Matches aaa.., ababab.., bbbb..
[0-9]+ Sequence of one or more digits.
. Wildcard expression: matches any single character.
\b Matches a word boundary: /\bthe\b/ matches “the” but not “other”.
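The operators in this table can be checked one at a time. The sketch below uses Python's `re` module (an assumption, since the slides use Perl notation); the helper name `matches` and all test strings are invented for illustration.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the pattern matches the whole string (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

print(matches(r"a*", ""))             # * allows zero a's
print(matches(r"a+", ""))             # + needs at least one a
print(matches(r"colou?r", "color"))   # ? makes the u optional
print(matches(r"[ab]*", "abba"))      # any mix of a's and b's
print(matches(r"[0-9]+", "2009"))     # one or more digits
print(matches(r"b.g", "bug"))         # . stands for any single character

# \b anchors at a word boundary, so the "the" inside "other" is skipped.
boundary_hits = re.findall(r"\bthe\b", "the other one")
print(boundary_hits)
```

Using `fullmatch` rather than `search` forces the pattern to account for the entire string, which makes the difference between `*` and `+` visible on the empty string.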
Sidebar: Errors
Find all instances of the word “the” in a text.
– /the/
What about ‘The’?
– /[tT]he/
What about ‘Theater’, ‘Another’?
Sidebar: Errors
The process we just went through was based on fixing two kinds of errors:
– Matching strings that we should not have matched (there, then, other): false positives
– Not matching things that we should have matched (The): false negatives
Sidebar: Errors
Reducing the error rate for an application often involves two efforts:
– Increasing accuracy (minimizing false positives)
– Increasing coverage (minimizing false negatives)
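To make the two error types concrete, the sketch below counts them for three candidate patterns on one sentence. The sentence, the hand-labelled `gold` offsets, and the helper name `errors` are all invented for illustration; Python's `re` module stands in for the Perl notation of the slides.

```python
import re

# Hypothetical text; the intended targets are the standalone article "The"/"the".
text = "The other theater is there, past the bank."

# Hand-labelled start offsets of the two real articles in `text`.
gold = {0, 33}

def errors(pattern: str) -> tuple[int, int]:
    """Return (false_positives, false_negatives) for a pattern on `text`."""
    found = {m.start() for m in re.finditer(pattern, text)}
    return len(found - gold), len(gold - found)

print(errors(r"the"))         # misses "The", over-matches inside other words
print(errors(r"[tT]he"))      # finds "The", still over-matches
print(errors(r"\b[tT]he\b"))  # word boundaries remove the over-matches
```

Going from the first pattern to the second improves coverage (one fewer false negative), and going from the second to the third improves accuracy (three fewer false positives).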
Regular expressions
Basic regular expression patterns
Perl-based syntax (slightly different from other notations for regular expressions)
Disjunctions [abc]
Ranges [A-Z]
Negations [^Ss]
Optional and repeated characters ?, + and *
Wild cards .
Anchors \b and \B
Disjunction, grouping, and precedence |
Preceding character or nothing using ?
Wildcard
Negation using ^
Writing correct expressions
Exercise: write a regular expression to match the English article “the”:
/the/ Missed ‘The’
/[tT]he/ Included ‘the’ in ‘others’
/\b[tT]he\b/ Missed ‘the25’ ‘the_’
/[^a-zA-Z][tT]he[^a-zA-Z]/ Missed ‘The’ at the beginning of a line
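The last pattern still fails at the start of a line because it demands a non-letter before the word. One commonly given refinement, shown here as a sketch rather than as the slide's own answer, is to accept either a line edge or a non-letter on each side. The test strings are invented, and Python's `re` module is assumed in place of Perl.

```python
import re

# Assumed refinement: (^|non-letter) before and (non-letter|$) after the word.
pattern = re.compile(r"(^|[^a-zA-Z])([tT]he)([^a-zA-Z]|$)")

def find_article(s: str):
    """Return the matched article, or None when the string has no match."""
    m = pattern.search(s)
    return m.group(2) if m else None

for s in ["The cat", "see the25 dogs", "the_", "others", "bathe"]:
    print(s, "->", find_article(s))
```

The word itself sits in the second group because the surrounding groups also consume the neighbouring character, which a caller usually does not want in the result.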
A more complex example
Exercise: Write a Perl regular expression that will match “any PC with more than 500MHz and 32 Gb of disk space for less than $1000”:
Example
Price
– /\$[0-9]+/ # whole dollars
– /\$[0-9]+\.[0-9][0-9]/ # dollars and cents
– /\$[0-9]+(\.[0-9][0-9])?/ # cents optional
– /\b\$[0-9]+(\.[0-9][0-9])?\b/ # word boundaries
Specifications for processor speed
– /\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b/
Memory size
– /\b[0-9]+ *(Mb|[Mm]egabytes?)\b/
– /\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b/
Vendors
– /\b(Win95|WIN98|WINNT|WINXP *(NT|95|98|2000|XP)?)\b/
– /\b(Mac|Macintosh|Apple)\b/
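A rough sketch of how these patterns might be applied, using Python's `re` module in place of Perl. The advertisement text and the names `PRICE`, `SPEED`, and `DISK` are invented for illustration, and two deviations from the slide are flagged in the comments.

```python
import re

# `$` is escaped because an unescaped `$` is the end-of-line anchor. The leading
# \b is left out of this sketch: `$` is not a word character, so a \b in front
# of it would only match when a word character precedes the `$`.
PRICE = re.compile(r"\$[0-9]+(\.[0-9][0-9])?\b")
SPEED = re.compile(r"\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b")
# Assumed form of the disk pattern: one or more digits, optional fraction.
DISK = re.compile(r"\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b")

# Hypothetical advertisement text.
ad = "Desktop PC, 600 MHz, 40 Gb disk, only $899.99 this week"

price = PRICE.search(ad).group(0)
speed = SPEED.search(ad).group(0)
disk = DISK.search(ad).group(0)
print(price, speed, disk, sep=" | ")
```

The patterns only locate the mentions. Checking the numeric conditions in the exercise (more than 500 MHz, less than $1000) would need a further step that converts the matched digits to numbers.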
Advanced Operators – Aliases for common ranges
Underscore: Correct figure 2.6
\ to Reference special characters
Operators for counting
Finite State Automata
FSA recognizes the regular languages represented by regular expressions
– SheepTalk: /baa+!/
• Directed graph with labeled nodes and arc transitions
• Five states: q0 the start state, q4 the final state, 5 transitions
[Figure: q0 -b-> q1 -a-> q2 -a-> q3 -!-> q4, with a self-loop on q3 labeled a]
Formally
FSA is a 5-tuple consisting of
– Q: set of states {q0,q1,q2,q3,q4}
– Σ: an alphabet of symbols {a,b,!}
– q0: A start state
– F: a set of final states in Q {q4}
– δ(q,i): a transition function mapping Q x Σ to Q
[Figure: the SheepTalk automaton, q0 -b-> q1 -a-> q2 -a-> q3 -!-> q4, with a self-loop on q3 labeled a]
FSA recognizes (accepts) strings of a regular language
– baa!
– baaa!
– baaaa!
– …
A rejected input
a b a ! b
[Figure: the SheepTalk automaton]
State Transition Table
FSA can be represented with a State Transition Table (rows are states, columns are input symbols, Ø marks a missing transition)
State b a !
0 1 Ø Ø
1 Ø 2 Ø
2 Ø 3 Ø
3 Ø 3 4
4 Ø Ø Ø
[Figure: the SheepTalk automaton]
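The table translates almost directly into a table-driven recognizer. The following is a minimal sketch in Python; the names `TABLE`, `FINAL_STATES`, and `recognize` are invented here, and a missing dictionary entry plays the role of Ø.

```python
# Transition table for SheepTalk /baa+!/; absent entries correspond to Ø.
TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
FINAL_STATES = {4}

def recognize(tape: str) -> bool:
    """Return True if the automaton accepts the whole tape."""
    state = 0
    for symbol in tape:
        nxt = TABLE[state].get(symbol)
        if nxt is None:          # empty table cell: reject immediately
            return False
        state = nxt
    # All input consumed: accept only if we stopped in a final state.
    return state in FINAL_STATES

for s in ["baa!", "baaaa!", "ba!", "aba!b"]:
    print(s, recognize(s))
```

The loop never backtracks: each symbol is looked up once, so the running time is linear in the length of the input.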
Non-Deterministic FSAs for SheepTalk
[Figures: two non-deterministic automata for SheepTalk over states q0 to q4. The first has arc labels b, a, a, a, ! and the second has arc labels b, a, a, !. The arc layout is not preserved in the transcript.]
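A non-deterministic automaton can be simulated by tracking every state it might be in at once. The sketch below assumes one plausible layout for the first machine (two arcs labeled a leaving q2, one looping back to q2 and one going to q3), since the figure itself is not preserved; the names `NFA` and `nd_recognize` are invented. It uses set-of-states simulation rather than a backtracking search.

```python
# Assumed NFA for /baa+!/: state 2 has two arcs on "a" (stay in 2, or go to 3).
NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
FINAL_STATES = {4}

def nd_recognize(tape: str) -> bool:
    """Track every state the NFA could be in after each symbol."""
    current = {0}
    for symbol in tape:
        current = {t for s in current for t in NFA.get((s, symbol), set())}
        if not current:          # no state can continue: reject
            return False
    # Accept if at least one possible run ended in a final state.
    return bool(current & FINAL_STATES)

for s in ["baa!", "baaa!", "ba!"]:
    print(s, nd_recognize(s))
```

Keeping a set of states avoids having to guess which arc to follow on a, at the cost of a little more work per input symbol.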
Languages
A language is a set of strings
String: A sequence of letters
Tracing FSA - Initial Configuration
[Figure: DFA with states q0 to q5. The path q0 -a-> q1 -b-> q2 -b-> q3 -a-> q4 ends in the final state q4; every other transition (b from q0, a from q1, a from q2, b from q3, and a, b from q4) leads to q5, which loops on a, b.]
Input String: a b b a
Reading the Input
[Figure sequence: the automaton starts in q0 and reads a, b, b, a one symbol at a time, passing through q1, q2 and q3 and ending in q4.]
Output: “accept”
Rejection
Input String: a b a
[Figure sequence: on the same DFA, the automaton starts in q0, reads a, b, a and passes through q1 and q2 before ending in q5, which is not a final state.]
Output: “reject”
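The two traces can be replayed in a few lines of Python. This is a sketch: the transition function is read off the tracing slides (a, b, b, a leads to q4, every other move falls into q5), and the names `DELTA`, `FINAL`, and `trace` are invented.

```python
# Transition function of the six-state DFA, as read off the tracing slides.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q5",
    ("q1", "a"): "q5", ("q1", "b"): "q2",
    ("q2", "a"): "q5", ("q2", "b"): "q3",
    ("q3", "a"): "q4", ("q3", "b"): "q5",
    ("q4", "a"): "q5", ("q4", "b"): "q5",
    ("q5", "a"): "q5", ("q5", "b"): "q5",
}
FINAL = {"q4"}

def trace(word: str) -> tuple[list[str], str]:
    """Return the visited states and the verdict for `word`."""
    states = ["q0"]
    for symbol in word:
        states.append(DELTA[(states[-1], symbol)])
    verdict = "accept" if states[-1] in FINAL else "reject"
    return states, verdict

print(trace("abba"))
print(trace("aba"))
```

The verdict depends only on the last state reached, so the whole input must be read before the automaton can answer.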
Another Example
[Figure: DFA with states q0, q1 and q2. q0 loops on a; q0 -b-> q1; q1 -a,b-> q2; q2 loops on a, b.]
Input String: a a b
[Figure sequence: the automaton reads a, a, b, staying in q0 for both a's and ending in q1.]
Output: “accept”
Rejection
Input String: a b b
[Figure sequence: on the same DFA, the automaton reads a, b, b, moving from q0 to q1 on the first b and to q2 on the second.]
Output: “reject”
Formalities
Deterministic Finite Accepter (DFA)
$M = (Q, \Sigma, \delta, q_0, F)$
$Q$: set of states
$\Sigma$: input alphabet
$\delta$: transition function
$q_0$: initial state
$F$: set of final states
About Alphabets
Alphabet means we need a finite set of symbols in the input.
These symbols can and will stand for bigger objects that can have internal structure.
Input Alphabet $\Sigma$
$\Sigma = \{a, b\}$
Set of States $Q$
$Q = \{q_0, q_1, q_2, q_3, q_4, q_5\}$
Initial State $q_0$
Set of Final States $F$
$F = \{q_4\}$
[Figure: each of these slides shows the six-state DFA (q0 -a-> q1 -b-> q2 -b-> q3 -a-> q4, all other transitions to q5) with the component in question highlighted.]
Transition Function $\delta$
$\delta : Q \times \Sigma \to Q$
$\delta(q_0, a) = q_1$
$\delta(q_0, b) = q_5$
$\delta(q_2, b) = q_3$
Transition Function $\delta$
$\delta$ a b
$q_0$ $q_1$ $q_5$
$q_1$ $q_5$ $q_2$
$q_2$ $q_5$ $q_3$
$q_3$ $q_4$ $q_5$
$q_4$ $q_5$ $q_5$
$q_5$ $q_5$ $q_5$
Extended Transition Function $\delta^*$ (Reads the entire string)
$\delta^* : Q \times \Sigma^* \to Q$
$\delta^*(q_0, ab) = q_2$
$\delta^*(q_0, abba) = q_4$
$\delta^*(q_0, abbbaa) = q_5$
Observation: There is a walk from $q_0$ to $q_5$ with label $abbbaa$
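The extended transition function is usually defined recursively: reading the empty string leaves the state unchanged, and reading a string wa means reading w first and then taking one step of the ordinary transition function on a. The sketch below follows that standard definition (the slides give only examples, so the recursion is an assumption), with invented names `STATES`, `DELTA`, and `delta_star`.

```python
STATES = ["q0", "q1", "q2", "q3", "q4", "q5"]

# Default every transition to the trap state q5, then add the accepting path.
DELTA = {(q, c): "q5" for q in STATES for c in "ab"}
DELTA.update({
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "b"): "q3",
    ("q3", "a"): "q4",
})

def delta_star(state: str, word: str) -> str:
    """delta*(q, "") = q and delta*(q, w + a) = delta(delta*(q, w), a)."""
    if word == "":
        return state
    return DELTA[(delta_star(state, word[:-1]), word[-1])]

for w in ["ab", "abba", "abbbaa"]:
    print(w, delta_star("q0", w))
```

The three calls reproduce the three examples on the slides, and the chain of recursive calls corresponds to the walk through the graph mentioned in the observation.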
Example
$L(M) = \{abba\}$
[Figure: the six-state DFA $M$ with q4 marked accept]
Another Example
$L(M) = \{\lambda, ab, abba\}$, where $\lambda$ is the empty string
[Figure: the same DFA $M$ with q0, q2 and q4 marked accept]
More Examples
$L(M) = \{a^n b : n \ge 0\}$
[Figure: q0 loops on a; q0 -b-> q1 (accept); q1 -a,b-> q2 (trap state); q2 loops on a, b]
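One way to gain confidence in such a description is to compare the automaton against the equivalent regular expression a*b on every short string. The sketch below does this in Python; `DELTA`, `accepts`, and the length bound of 6 are choices made here for illustration.

```python
import re
from itertools import product

# q0 loops on a, b moves to the accepting state q1, and any further
# symbol falls into the trap state q2.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

def accepts(word: str) -> bool:
    state = "q0"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "q1"

# Cross-check against the regular expression a*b on all strings up to length 6.
words = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
mismatches = [w for w in words if accepts(w) != bool(re.fullmatch(r"a*b", w))]
print(mismatches)
```

An empty list of mismatches is evidence, not proof, that the two descriptions agree, since only strings up to the chosen length are tested.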
$L(M)$ = { all strings with prefix $ab$ }
[Figure: q0 -a-> q1 -b-> q2 (accept), q2 loops on a, b; q0 -b-> q3 and q1 -a-> q3; q3 loops on a, b]
$L(M)$ = { all strings without substring $001$ }
[Figure: DFA over {0, 1} with states labeled by the part of 001 seen so far, including 0, 00 and 001; the state 001 loops on 0, 1. The full arc layout is not preserved in the transcript.]
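A sketch of one automaton for this language, with each state named after the longest suffix of the input read so far that is also a prefix of 001. The arc layout is reconstructed from that idea rather than from the figure, so treat it as an assumption; `DELTA` and `accepts` are invented names.

```python
from itertools import product

# State names record how much of "001" has just been seen.
DELTA = {
    ("", "0"): "0",      ("", "1"): "",
    ("0", "0"): "00",    ("0", "1"): "",
    ("00", "0"): "00",   ("00", "1"): "001",
    ("001", "0"): "001", ("001", "1"): "001",   # trap: 001 has been seen
}

def accepts(word: str) -> bool:
    state = ""
    for symbol in word:
        state = DELTA[(state, symbol)]
    # Every state except the trap is accepting.
    return state != "001"

# Cross-check against a direct substring test on all strings up to length 8.
words = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
mismatches = [w for w in words if accepts(w) != ("001" not in w)]
print(mismatches)
```

Once the automaton enters the state 001 it can never leave, which mirrors the fact that no continuation can remove a substring that has already occurred.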
Regular Languages
A language $L$ is regular if there is a DFA $M$ such that $L = L(M)$
All regular languages form a language family
Example
The language $L = \{awa : w \in \{a,b\}^*\}$ is regular:
[Figure: DFA with states q0, q2, q3 and q4 and arc labels a, b, and a,b. The arc layout is not preserved in the transcript.]
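To show that a language is regular it is enough to exhibit one DFA for it. The sketch below gives a four-state automaton for strings that start and end with a and have length at least two. The state names are hypothetical and the arcs are an assumption, since the figure is not preserved; the cross-check uses the regular expression a[ab]*a.

```python
import re
from itertools import product

# Hypothetical state names: "start", "need_a" (last symbol was not a closing a),
# "last_a" (accepting: the string so far has the form awa), "trap" (began with b).
DELTA = {
    ("start", "a"): "need_a",  ("start", "b"): "trap",
    ("need_a", "a"): "last_a", ("need_a", "b"): "need_a",
    ("last_a", "a"): "last_a", ("last_a", "b"): "need_a",
    ("trap", "a"): "trap",     ("trap", "b"): "trap",
}

def accepts(word: str) -> bool:
    state = "start"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "last_a"

# Cross-check against a[ab]*a on all strings up to length 7.
words = ["".join(p) for n in range(8) for p in product("ab", repeat=n)]
mismatches = [w for w in words if accepts(w) != bool(re.fullmatch(r"a[ab]*a", w))]
print(mismatches)
```

The automaton only has to remember whether the first symbol was a and whether the most recent symbol was a, which is why four states suffice.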
Finite State Automata
Regular expressions can be viewed as a textual way of specifying the structure of finite-state automata.
More Formally
You can specify an FSA by enumerating the following things.
– The set of states: Q
– A finite alphabet: Σ
– A start state
– A set of accept/final states
– A transition function that maps QxΣ to Q