Finite Automata and Regular Expressions i206 Fall 2010 John Chuang Some slides adapted from Marti...
Finite Automata and Regular Expressions
i206 Fall 2010
John Chuang
Some slides adapted from Marti Hearst
[Comic: http://xkcd.com/208/]
What is a Computer?
Theory of computation:
- What are the fundamental capabilities and limitations of computers?
- Complexity theory: what makes some problems computationally hard and others easy?
- Computability theory: what makes some problems solvable and others unsolvable?
- Automata theory: definitions and properties of mathematical models of computation
Source: Sipser, 1997
Finite Automata
Also known as finite state automata (FSA) or finite state machines (FSM)
A simple mathematical model of a computer
Applications: hardware design, compiler design, text processing
A First Example
Touch-less hand-dryer
[State diagram: two states, DRYER OFF and DRYER ON, with transitions labeled "hand"]
Finite Automata State Graphs
• The start state
• An accepting state
• A transition (arrow labeled with input x)
• A state
Finite Automata
Transition: s1 -x-> s2
- In state s1 on input “x” go to state s2
If end of input
- If in accepting state => accept
- Else => reject
If no transition possible => reject
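The acceptance rules above can be sketched as a short Python function. The name `run_fa`, the dictionary encoding of transitions, and the two-state example automaton are all illustrative assumptions, not taken from the slides.

```python
def run_fa(transitions, start, accepting, inputs):
    """Return True if the automaton accepts the input sequence.

    transitions maps (state, symbol) -> next state. A missing entry means
    "no transition possible", which rejects immediately.
    """
    state = start
    for symbol in inputs:
        key = (state, symbol)
        if key not in transitions:
            return False           # no transition possible => reject
        state = transitions[key]   # in state s1 on input x go to state s2
    return state in accepting      # end of input: accept only in an accepting state


# Hypothetical automaton: accepts strings of a's and b's that end in "b".
toy = {
    ("s1", "a"): "s1",
    ("s1", "b"): "s2",
    ("s2", "a"): "s1",
    ("s2", "b"): "s2",
}

print(run_fa(toy, "s1", {"s2"}, "aab"))  # ends in the accepting state s2
print(run_fa(toy, "s1", {"s2"}, "aba"))  # ends in the non-accepting state s1
print(run_fa(toy, "s1", {"s2"}, "abc"))  # no transition on "c"
```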
Language of a FA
Language of finite automaton M: set of all strings accepted by M
Example:
[State diagram: states S and A; transitions labeled "letter" and "letter | digit"]
Which of the following are in the language?
- x, tmp2, 123, a?, 2apples
A language is called a regular language if it is recognized by some finite automaton
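One way to check the example strings is to simulate the automaton directly. This sketch assumes the diagram has start state S, a transition on a letter from S to A, a loop on A for letters and digits, and A as the only accepting state. The function name `in_language` is hypothetical, and "letter" is taken to mean the ASCII letters.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def in_language(text):
    """Simulate the assumed two-state automaton (S = start, A = accepting)."""
    state = "S"
    for ch in text:
        if state == "S" and ch in LETTERS:
            state = "A"                  # S --letter--> A
        elif state == "A" and (ch in LETTERS or ch in DIGITS):
            state = "A"                  # A --letter | digit--> A
        else:
            return False                 # no transition possible => reject
    return state == "A"                  # accept only if we end in A


for candidate in ["x", "tmp2", "123", "a?", "2apples"]:
    print(candidate, in_language(candidate))
```

Under these assumptions, only the strings that start with a letter and continue with letters or digits are accepted.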
Example
What is the language of this FA?
[State diagram: states S, A, and B; transitions labeled "+", "-", and "digit" (three times)]
Regular Expressions
Regular expressions are used to describe regular languages
Arithmetic expression example: (8+2)*3
Regular expression example: (\+|-)?[0-9]+
Example
What is the language of this FA?
Regular expression: (\+|-)?[0-9]+
[State diagram: states S, A, and B; transitions labeled "+", "-", and "digit" (three times)]
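To see that the automaton and the regular expression describe the same language, both can be run on the same inputs. The sketch below assumes one reading of the diagram: S goes to A on "+" or "-", S and A go to B on a digit, B loops on a digit, and B is the only accepting state. The function name `accepts_signed_integer` is hypothetical.

```python
import re

DIGITS = "0123456789"


def accepts_signed_integer(text):
    """Simulate the assumed three-state automaton (S = start, B = accepting)."""
    state = "S"
    for ch in text:
        if state == "S" and ch in "+-":
            state = "A"          # optional sign
        elif ch in DIGITS:
            state = "B"          # a digit leads to B from S, A, or B
        else:
            return False         # no transition possible => reject
    return state == "B"


PATTERN = re.compile(r"(\+|-)?[0-9]+")

for text in ["42", "+7", "-300", "", "+", "4-2", "--1"]:
    by_fa = accepts_signed_integer(text)
    by_regex = PATTERN.fullmatch(text) is not None
    print(repr(text), by_fa, by_regex)
```

`fullmatch` is used so that the regular expression, like the automaton, has to consume the whole input.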
Three Equivalent Representations
Finite automata
Regular expressions
Regular languages
Each can describe the others
Theorem:
For every regular expression, there is a deterministic finite-state automaton that defines the same language, and vice versa.
Adapted from Jurafsky & Martin 2000
Regex Rules
Goal: match patterns
// String of characters matches the same string
woodchuck    “how much wood does a woodchuck chuck?”
p    “you are a good programmer”
pink elephant    “this is not a pink elephant.”
!    “Keep your hands to yourself!”
. Wildcard; matches any character at that position
p.nt    pant, pint, punt
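A minimal Python sketch of literal matching and the wildcard, using the standard `re` module. The word list is a made-up test set.

```python
import re

# A plain string of characters matches itself anywhere in the text.
m = re.search(r"woodchuck", "how much wood does a woodchuck chuck?")
print(m.group(), m.start())

# "." matches any single character at that position
# (any character except a newline, by default).
words = ["pant", "pint", "punt", "pnt", "paint"]
matches = [w for w in words if re.fullmatch(r"p.nt", w)]
print(matches)
```

`re.search` looks for the pattern anywhere in the string, while `re.fullmatch` requires the whole string to match, which is why "pnt" and "paint" are filtered out.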
Regex Rules
? Zero or one occurrence of the preceding character/regex
woodchucks?    “how much wood does a woodchuck chuck?”
behaviou?r    “behaviour is the British spelling of behavior”
* Zero or more occurrences of the preceding character/regex
baa*    ba, baa, baaa, baaaa …
ba*    b, ba, baa, baaa, baaaa …
[ab]*    ε (the empty string), a, b, ab, ba, baaa, aaabbb, …
[0-9][0-9]*    any positive integer, or zero
cat.*cat    A string where “cat” appears twice anywhere
+ One or more occurrences of the preceding character/regex
ba+    ba, baa, baaa, baaaa …
{n} Exactly n occurrences of the preceding character/regex
ba{3}    baaa
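The quantifier rules can be checked with a few calls to Python's `re` module. The helper name `full` is hypothetical; it simply wraps `re.fullmatch`.

```python
import re


def full(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# ? : zero or one of the preceding item
print(full(r"woodchucks?", "woodchuck"), full(r"woodchucks?", "woodchucks"))
print(full(r"behaviou?r", "behavior"), full(r"behaviou?r", "behaviour"))

# * : zero or more, so ba* accepts a bare "b" while baa* does not
print(full(r"ba*", "b"), full(r"baa*", "b"), full(r"baa*", "baaa"))

# [ab]* also accepts the empty string
print(full(r"[ab]*", ""), full(r"[ab]*", "aaabbb"))

# + : one or more; {n} : exactly n
print(full(r"ba+", "b"), full(r"ba+", "baa"))
print(full(r"ba{3}", "baaa"), full(r"ba{3}", "baa"))

# .* lets "cat" appear twice with anything in between
twice = re.search(r"cat.*cat", "a cat chased another cat") is not None
print(twice)
```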
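Regex Rules
* is greedy:
<.*>    <a href=“index.html”>Home</a>    (matches the whole string, from the first < to the last >)
Lazy (non-greedy) quantifier:
<.*?>    <a href=“index.html”>Home</a>    (matches each tag separately, stopping at the first >)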
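A short Python sketch of the difference. The HTML string is the slide's example with straight quotes substituted for the typographic ones.

```python
import re

html = '<a href="index.html">Home</a>'

# Greedy: .* grabs as much as possible, so the match runs to the last ">".
greedy = re.findall(r"<.*>", html)

# Lazy: .*? grabs as little as possible, so each tag is matched on its own.
lazy = re.findall(r"<.*?>", html)

print(greedy)
print(lazy)
```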
Regex Rules
[] Disjunction (Union)
[wW]ood    “how much wood does a Woodchuck chuck?”
[abcd]*    “you are a good programmer”
[A-Za-z0-9]    (any letter or digit)
[A-Za-z]*    (any letter sequence)
| Disjunction (Union)
(cats?|dogs?)+    “It’s raining cats and a dog.”
( ) Grouping
(gupp(y|ies))*    “His guppy is the king of guppies.”
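The bracket, alternation, and grouping examples can be tried in Python as follows. The helper `all_matches` is a hypothetical convenience that collects the non-empty matches; it is needed because starred patterns such as `[abcd]*` also match the empty string at every position where nothing else fits.

```python
import re


def all_matches(pattern, text):
    """Return the full text of every non-empty match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text) if m.group(0)]


print(all_matches(r"[wW]ood", "how much wood does a Woodchuck chuck?"))
print(all_matches(r"[abcd]*", "you are a good programmer"))
print(all_matches(r"(cats?|dogs?)+", "It's raining cats and a dog."))
print(all_matches(r"(gupp(y|ies))*", "His guppy is the king of guppies."))

# A starred pattern can match zero repetitions, so the first match found
# by re.search is the empty string at position 0.
first = re.search(r"(gupp(y|ies))*", "His guppy is the king of guppies.")
print(repr(first.group(0)))
```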
Regex Rules
^ $ \b Anchors (start/end of input string; word boundary)
^The    “The cat in the hat.”
^The end\.$    “The end.”
^The .* end\.$    “The bitter end.”
(the)*    “I saw him the other day.”
(\bthe\b)*    “I saw him the other day.”
Special rule: when ^ is FIRST WITHIN BRACKETS it means NOT
[^A-Z]*    (anything not an upper case letter)
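A brief Python sketch of anchors, word boundaries, and negated classes. The strings "See The cat" and "ABCdefGHI123" are made-up test inputs, and `+` is used in place of `*` in the last pattern so that only non-empty runs are listed.

```python
import re

print(re.search(r"^The", "The cat in the hat.") is not None)
print(re.search(r"^The", "See The cat") is not None)   # "The" is not at the start
print(re.search(r"^The end\.$", "The end.") is not None)
print(re.search(r"^The .* end\.$", "The bitter end.") is not None)

sentence = "I saw him the other day."
# Without \b, "the" is also found inside "other".
plain = re.findall(r"the", sentence)
bounded = re.findall(r"\bthe\b", sentence)
print(len(plain), len(bounded))

# ^ placed first inside brackets negates the class.
not_upper = re.findall(r"[^A-Z]+", "ABCdefGHI123")
print(not_upper)
```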
Regex Rules
\ Escape characters
\. “The + and \ characters are missing.”
\+ “The + and \ characters are missing.”
\\ “The + and \ characters are missing.”
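In Python, escaped patterns are easiest to write as raw strings so that the backslashes reach the regex engine unchanged. This sketch runs the three escapes above on the slide's sentence. The call to `re.escape` is an addition not shown on the slide; it builds an escaped pattern from literal text.

```python
import re

text = r"The + and \ characters are missing."

dots = re.findall(r"\.", text)          # literal period
pluses = re.findall(r"\+", text)        # literal plus sign
backslashes = re.findall(r"\\", text)   # literal backslash
print(dots, pluses, backslashes)

# An unescaped "." is the wildcard, so it matches every character here.
print(len(re.findall(r".", text)) == len(text))

# re.escape escapes any special characters in a literal string for us.
literal = re.search(re.escape("+ and \\"), text)
print(literal.group())
```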
Operator Precedence
What is the difference?
- [a-z][a-z]|[0-9]*
- [a-z]([a-z]|[0-9])*
Operator Precedence
() highest
* + ? {}
sequences, anchors
| lowest
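The difference between the two patterns shows up when both are run against the same inputs. Because `|` has the lowest precedence, the first pattern is read as "two letters, or any run of digits". The variable names and the test strings are illustrative choices.

```python
import re

FIRST = r"[a-z][a-z]|[0-9]*"      # parsed as ([a-z][a-z]) | ([0-9]*)
SECOND = r"[a-z]([a-z]|[0-9])*"   # a letter followed by any mix of letters and digits


def compare(text):
    """Return (matches FIRST, matches SECOND) for a whole-string match."""
    return (
        re.fullmatch(FIRST, text) is not None,
        re.fullmatch(SECOND, text) is not None,
    )


for text in ["ab", "123", "", "a1b2", "x"]:
    print(repr(text), compare(text))
```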
Regex for Dollars
No commas
\$[0-9]+(\.[0-9][0-9])?
With commas
\$[0-9][0-9]?[0-9]?(,[0-9][0-9][0-9])*(\.[0-9][0-9])?
With or without commas
\$[0-9][0-9]?[0-9]?((,[0-9][0-9][0-9])*|[0-9]*)(\.[0-9][0-9])?
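The three dollar patterns can be compared side by side in Python. The sample amounts are made-up test inputs, and `check` is a hypothetical helper that reports, for each amount, whether the whole string matches each pattern.

```python
import re

NO_COMMAS = r"\$[0-9]+(\.[0-9][0-9])?"
WITH_COMMAS = r"\$[0-9][0-9]?[0-9]?(,[0-9][0-9][0-9])*(\.[0-9][0-9])?"
EITHER = r"\$[0-9][0-9]?[0-9]?((,[0-9][0-9][0-9])*|[0-9]*)(\.[0-9][0-9])?"


def check(amount):
    """Return [no-commas, with-commas, either] whole-string match results."""
    return [
        re.fullmatch(p, amount) is not None
        for p in (NO_COMMAS, WITH_COMMAS, EITHER)
    ]


for amount in ["$5", "$1234.50", "$1,234.50", "$12,34", "$1,234,567", "$.50"]:
    print(amount, check(amount))
```

Note that the comma pattern rejects "$1234.50" because it allows at most three digits before the first comma group, while the combined pattern accepts both spellings.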
Regex in Python
import re
result = re.search(pattern, string)
result = re.findall(pattern, string)
result = re.match(pattern, string)
Python documentation on regular expressions
- http://docs.python.org/library/re.html
- Some useful flags like IGNORECASE, MULTILINE, DOTALL, VERBOSE
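The three calls and the four flags listed above behave as in the following sketch. The sample text and the name `signed_int` are made up for illustration.

```python
import re

text = "Order 66 shipped; order 7 pending"

# match only tries the start of the string, so it finds nothing here.
print(re.match(r"[0-9]+", text))
# search returns the first match anywhere in the string.
print(re.search(r"[0-9]+", text).group())
# findall returns all non-overlapping matches as a list of strings.
print(re.findall(r"[0-9]+", text))

# IGNORECASE: letters match regardless of case.
print(re.findall(r"order", text, re.IGNORECASE))
# MULTILINE: ^ and $ also match at the start and end of each line.
print(re.findall(r"^\w+", "one\ntwo", re.MULTILINE))
# DOTALL: "." also matches a newline.
print(re.search(r"a.b", "a\nb", re.DOTALL) is not None)

# VERBOSE: whitespace and comments are allowed inside the pattern.
signed_int = re.compile(r"""
    (\+|-)?     # optional sign
    [0-9]+      # one or more digits
""", re.VERBOSE)
print(signed_int.fullmatch("-42") is not None)
```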