Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16...

44
Regular Expressions and Finite State Automata Gul Agha 2104 SC http://www.cs.uiuc.edu/class/fa05/cs421/current/ Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Elsa Gunter

Transcript of Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16...

Page 1: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

Regular Expressions andFinite State Automata

Gul Agha2104 SChttp://www.cs.uiuc.edu/class/fa05/cs421/current/

Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Elsa Gunter

Page 2: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 2

Language Syntax

Syntax is the description of which strings of symbols are meaningful expressions in a languageIt takes more than syntax to understand a language; need meaning (semantics) tooSyntax is the entry point

Page 3: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 3

Elements of Syntax

Character set – previously always ASCII, now often 64 character setsKeywords – usually reservedSpecial constants – cannot be assigned toIdentifiers – can be assigned toOperator symbolsDelimiters (parenthesis, braces, brackets,)Blanks (aka white space)

Page 4: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 4

Elements of Syntax

ExpressionsType expressionsDeclarations (in functional languages)Statements (in imperative languages)Subprograms

Page 5: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 5

Elements of Syntax

ModulesInterfacesClasses (for object-oriented languages)

Page 6: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 6

Formal Language Descriptions

Regular expressions, regular grammars, finite state automataContext-free grammars, BNF grammars, syntax diagrams

Whole family more of grammars and automata – covered in automata theory

Page 7: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 7

Grammars

Grammars are formal descriptions of which strings over a given character set are in a particular languageLanguage designers write grammarLanguage implementers use grammar to know what programs to acceptLanguage users use grammar to know how to write legitimate programs

Page 8: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 8

Regular Expressions

Start with a given character set –a, b, c…

Each character is a regular expressionThe regular expression b represents the set { b }

Page 9: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 9

Regular Expressions

If x and y are regular expressions, then xy is a regular expression

xy represents the set of all strings made concatenating a string in x with a string in y

If x and y are regular expressions, then x | yis a regular expression

x | y represents the set of strings in x or in y (or both)

Page 10: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 10

Regular Expressions

If x is a regular expression, then so is ( x )( x ) represents the same thing as x

If x is a regular expression, then so is x*x* represents strings made by concatenating zero or more strings from x

εrepresents the empty set

Page 11: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 11

Example Regular Expressions

(0|1)*1The set of all strings of 0’s and 1’s ending in 1, {1, 01, 11,…}

a*b(a*)The set of all strings of a’s and b’s with exactly one b

((01) |(10))*You tell me

Regular expressions (equivalently, regular grammars) important for lexing, breaking strings into recognized words

Page 12: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 12

Example: Lexing

Regular expressions good for describing lexemes (words) in a programming language

Identifier = (a | b | … | z | A | B | … | Z) (a | b |… | z | A | B | … | Z | 0 | 1 | … | 9)*Digit = (0 | 1 | … | 9)Number = (1 | … | 9)(0 | … | 9) | ~ (1 | … | 9)(0 | … | 9) Keywords: if = if, while = while,…

Page 13: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 13

Implementing Regular Expressions

Regular expressions can generate strings in a languageNot so good for recognizing when a string is in languageRegular expressions: which option to choose, how many repetitions to makeAnswer: finite state automata

Page 14: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 14

Finite State Automata

A finite state automata over an alphabet is:a directed graphedges are labeled with elements of alphabet,

or empty stringsome nodes (or states), marked as finalone node marked as start state

Syntax of FSA

Page 15: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 15

Example FSA

0 11

0Start State

Final State

Final State

1

0

10

Page 16: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 16

Deterministic FSA’s

If FSA has for every state exactly one edge for each letter in alphabet then FSA is deterministic

No edge labeled with ε

In general FSA in non-deterministic.NFSA also allows edges labeled by ε

Deterministic FSA special kind of non-deterministic FSA

Page 17: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 17

DFSA Language Recognition

Think of a DFSA as a board game; DFSA is boardYou have string as a deck of cards; one letter on each cardStart by placing a disc on the start state

Page 18: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 18

DFSA Language Recognition

Move the disc from one state to next along the edge labeled the same as top card in deck; discard top card

When you run out of cards:if you are in final state, you win; string is in languageif you are not in a final state, you lose; string is not in language

Page 19: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 19

DFSA Language Recognition -Summary

Given a string (over the alphabet)Start at start stateMove over edge labeled with first letter to new stateRemove first letter from stringRepeat until string goneIf end in final state then string in language

Semantics of FSA

Page 20: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 20

Example DFSA

Regular expression: (0 | 1)* 1Deterministic FSA

0 1

1

0

Page 21: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 21

Example DFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0 1

1

0

Page 22: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 22

Example DFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0 1

1

0

Page 23: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 23

Example DFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0 1

1

0

Page 24: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 24

Example DFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0 1

1

0

Page 25: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 25

Example DFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0 1

1

0

Page 26: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 26

Example DFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0 1

1

0

Page 27: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 27

Example DFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0 1

1

0

Page 28: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 28

NFSA generalize DFSA in two ways:Include edges labeled by ε

Allows process to non-deterministically change state

Each state can have zero, one or more edges labeled by each letter

Given a letter, non-deterministically choose an edge to use

Non-deterministic FSA’s

Page 29: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 29

NFSA Language Recognition

Play the same game as with DFSAFree move: move across an edge with empty string label without discarding cardWhen you run out of letters, if you are in final state, you win; string is in languageYou can take one or more moves back and try againIf have tried all possible paths without success, then you lose; string not in language

Page 30: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 30

FSA Language RecognitionMove the disc from one state to next if edge between labeled the same as top card in deck; discard top card Free move: move across an edge with empty string label without discarding cardWhen you run out of cards, if you are in final state, you win; string is in languageYou can take a move back and try anotherIf you have cards left and you have tried all possible edges without success, then you lose; string not in language

Page 31: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 31

Example NFSA

Regular expression: (0 | 1)* 1Non-deterministic FSA

0

1

1

Page 32: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 32

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0

1

1

Page 33: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 33

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0

1

1

Page 34: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 34

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0

1

1

Page 35: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 35

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0

1

1

Page 36: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 36

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1Guess

0

1

1

Page 37: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 37

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1Backtrack

0

1

1

Page 38: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 38

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1Guess again

0

1

1

Page 39: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 39

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1Guess

0

1

1

Page 40: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 40

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1Backtrack

Example NFSA

0

1

1

Page 41: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 41

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1Guess again

0

1

1

Page 42: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 42

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1

0

1

1

Page 43: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 43

Example NFSA

Regular expression: (0 | 1)* 1Accepts string 0 1 1 0 1Guess (Hurray!!)

0

1

1

Page 44: Regular Expressions and Finite State Automataakantor/teaching/cs421sp2006/...2/10/2006 CS 421 16 Deterministic FSA’s If FSA has for every state exactly one edge for each letter in

2/10/2006 CS 421 44

Rule Based Execution

SearchWhen stuck backtrack to last point with choices remainingExecuting the NFSA in last example was example of rule based executionFSA’s are rule-based programs; transitions between states (labeled edges) are rules; set of all FSA’s is programming language