LING 438/538 Computational Linguistics Sandiway Fong Lecture 11: 10/3.


LING 438/538 Computational Linguistics

Sandiway Fong

Lecture 11: 10/3

Administrivia

• homework 2
  – will be returned tomorrow (by email)
• homework 3
  – will be out on Thursday

Last Tuesday

• textbook
  – Chapter 2: Regular Expressions and Finite State Automata
• regular expressions
  – Unix grep
  – wildcard search in Microsoft Word
• implementing the FSA in Prolog
  – Method 1:
    • two-line program fsa/2, plus
    • transition/3 (δ function) and final_state/1
  – Method 2:
    • define each state, e.g. x, as a predicate, e.g. x/1,
    • taking the input list as an argument
  – non-determinism handled by Prolog’s computation rule
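
Method 2 can be sketched as follows. This assumes the three-state a+b+ machine (states s, x, y, with y the end state) that appears later in these slides; the clause layout is illustrative and may differ from the code shown in the previous lecture.

```prolog
% Method 2 sketch: one predicate per state.
% Each predicate takes the remaining input list as its argument.
s([a|L]) :- x(L).      % s --a--> x
x([a|L]) :- x(L).      % x --a--> x (loop)
x([b|L]) :- y(L).      % x --b--> y
y([]).                 % y is an end state: accept when the input is used up
y([b|L]) :- y(L).      % y --b--> y (loop)
```

The query ?- s([a,a,b,b]). succeeds and ?- s([b,a]). fails.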

Today’s Topic

• more on FSA
  – expressive power
  – limits

Determinism

• deterministic FSA (DFSA)
  – no ambiguity about where to go at any given state
• non-deterministic FSA (NDFSA)
  – no restriction on ambiguity (surprisingly, no increase in formal power)
• textbook
  – D-RECOGNIZE (FIGURE 2.13)
  – ND-RECOGNIZE (FIGURE 2.21)

    fsa(S,L) :- L = [C|M], transition(S,C,T), fsa(T,M).
    fsa(E,[]) :- end_state(E).
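
To run the two-line fsa/2 program, it needs a machine description. A minimal sketch, assuming the a+b+ machine with states s, x, y from the formal definition later in these slides; the transition/3 and end_state/1 facts below are an illustration, not copied from the slide.

```prolog
% Machine description (assumed): the a+b+ machine with states s, x, y.
transition(s, a, x).
transition(x, a, x).
transition(x, b, y).
transition(y, b, y).
end_state(y).

% Generic recognizer: consume one character per step,
% accept if the input runs out in an end state.
fsa(S, L) :- L = [C|M], transition(S, C, T), fsa(T, M).
fsa(E, []) :- end_state(E).
```

The query ?- fsa(s, [a,a,b]). succeeds and ?- fsa(s, [b,a]). fails. The same two clauses also run a non-deterministic machine unchanged: if several transition/3 facts match, Prolog tries them in order and backtracks on failure.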

NDFSA → (D)FSA

[discussed at the end of section 2.2 in the textbook]
• construct a new machine
  – each state of the new machine represents the set of possible states of the original machine when stepping through the input
• Note:
  – new machine is equivalent to old one (but may have more states)
  – new machine is deterministic
• example

[figure: an NDFSA with states s, x, y, z and arcs labeled a and b, and the equivalent DFSA with states s, {x,y}, {z}, {y,z}, {y}]
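
The set-of-states idea can be sketched by simulating the new machine on the fly instead of building it in advance. The NDFSA below is a hypothetical one (strings over {a,b} ending in ab), not the machine in the slide's figure, and the predicate names nd/3, final/1, step/3, run/2 and accepts/1 are assumptions made for this example. It assumes a Prolog such as SWI-Prolog where member/2, findall/3 and sort/2 are available.

```prolog
% Hypothetical NDFSA (not the slide's example): strings ending in "ab".
nd(s, a, s).
nd(s, b, s).
nd(s, a, x).    % non-deterministic choice on a
nd(x, b, y).
final(y).

% step(+Set, +Char, -Next): Next is the set of all states reachable
% from some state in Set on Char. sort/2 removes duplicates, so each
% set of states has a single canonical representation.
step(Set, C, Next) :-
    findall(T, (member(S, Set), nd(S, C, T)), Ts),
    sort(Ts, Next).

% run(+Set, +Input): deterministic run over sets of states.
% A set counts as an end state if it contains an original end state.
run(Set, []) :- member(S, Set), final(S), !.
run(Set, [C|M]) :- step(Set, C, Next), Next \== [], run(Next, M).

accepts(Input) :- run([s], Input).
```

On the input b a a b the run passes through the sets [s], [s], [s,x], [s,x], [s,y], and accepts because the last set contains y.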

ε-transitions

• jump from one state to another state with the empty character
  – ε-transition (textbook) or λ-transition
  – no increase in expressive power
• examples

[figures: example automata with arcs labeled a, b and ε]

what’s the equivalent without the ε-transition?
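
One way to handle ε-transitions in the fsa/2 style is an extra clause that changes state without consuming input. This is a sketch under assumptions: epsilon/2 is a hypothetical predicate name, and the machine (a*b*, with an ε-arc from s to x) is made up for illustration rather than taken from the figures.

```prolog
% Hypothetical machine for a*b*: loop on a in s, jump to x on ε, loop on b in x.
transition(s, a, s).
transition(x, b, x).
epsilon(s, x).
end_state(x).

fsa(S, [C|M]) :- transition(S, C, T), fsa(T, M).   % consume one character
fsa(S, L)     :- epsilon(S, T), fsa(T, L).         % ε: move, input unchanged
fsa(E, [])    :- end_state(E).
```

The query ?- fsa(s, [a,a,b]). succeeds and ?- fsa(s, [b,a]). fails. Note that a cycle of ε-arcs would make this simple version loop forever, since nothing is consumed along the cycle.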

Start State(s)

• Finite State Automata (FSA)
  – (Q, s, f, Σ, δ)
    1. set of states (Q): {s, x, y} (must be a finite set)
    2. start state (s): s
    3. end state(s) (f): y
    4. alphabet (Σ): {a, b}
    5. transition function δ:
       signature: character × state → state
       δ(a,s)=x  δ(a,x)=x  δ(b,x)=y  δ(b,y)=y

[figure: the FSA defined above, with states s (start, marked >), x, y and arcs labeled a, a, b, b]

FSA Properties

• FSAs (and thus regular languages) are preserved, i.e. maintain their FSA nature, under...
  – concatenation
  – union
  – intersection
  – complementation
  – and other operations...
  – [see section 2.3 of textbook]

concatenation

• concatenate two FSAs, result is an FSA
  – trick: use ε-transitions to link the automata
• example
  – [figure 2.24]

union

• disjunction (union) of two FSAs, result is an FSA
  – trick: use ε-transitions to link the automata
• example
  – [figure 2.26]

intersection

• (conjunction) intersect two FSAs, result is an FSA
  – trick: use (modified) set-of-states construction
• example

[figure: two FSAs, one with states s1, x, y and one with states s2, z, and their intersection, a machine with states {s1,s2}, {x,s2}, {y,z}]

look familiar? that’s because a+b* ∩ a*b+ = a+b+
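
The intersection can be sketched by running both machines in lockstep on the same input, so that each state of the combined machine is a pair of original states. The encodings of a+b* and a*b+ below are one plausible reading of the example, and the predicate names t1/3, f1/1, t2/3, f2/1 and both/2 are assumptions for this sketch.

```prolog
% Machine 1 (assumed encoding of a+b*): end states x and y.
t1(s1, a, x).
t1(x,  a, x).
t1(x,  b, y).
t1(y,  b, y).
f1(x).
f1(y).

% Machine 2 (assumed encoding of a*b+): end state z.
t2(s2, a, s2).
t2(s2, b, z).
t2(z,  b, z).
f2(z).

% both(+S1-S2, +Input): the pair S1-S2 is a state of the intersection machine.
% Accept only if both machines end in an end state.
both(S1-S2, []) :- f1(S1), f2(S2).
both(S1-S2, [C|M]) :- t1(S1, C, T1), t2(S2, C, T2), both(T1-T2, M).
```

On a a b b the pairs visited are s1-s2, x-s2, x-s2, y-z, y-z, matching the three combined states in the example. The query ?- both(s1-s2, [a,a,b,b]). succeeds, while ?- both(s1-s2, [a]). and ?- both(s1-s2, [b]). fail.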

complementation

• (complementation) the negation or opposite FSA
  – with respect to Σ*
    • the set of all possible strings from the alphabet
  – i.e. accepts everything original FSA rejects
  – and rejects everything original FSA accepts
  – result is still an FSA

Limits of Finite State Technology

• Language = set of strings
• case 1
  – suppose set is finite
  – e.g. L = {ba, abc, ccb, dd}
    • easy to encode as an FSA (by closure under union)
• case 2
  – set is infinite
  – ...

[figure: one small linear FSA per string of L, joined by ε-transitions from a new start state s0]
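
The finite case can be sketched in Prolog as a union of four small machines, one per string, linked from a new start state s0 by ε-arcs. The state names (p0, q0, ...) and the epsilon/2 predicate are hypothetical, chosen only for this example.

```prolog
% New start state s0 with an ε-arc to the start of each word's machine.
epsilon(s0, p0).
epsilon(s0, q0).
epsilon(s0, r0).
epsilon(s0, t0).

transition(p0, b, p1). transition(p1, a, p2).                          % ba
transition(q0, a, q1). transition(q1, b, q2). transition(q2, c, q3).   % abc
transition(r0, c, r1). transition(r1, c, r2). transition(r2, b, r3).   % ccb
transition(t0, d, t1). transition(t1, d, t2).                          % dd

end_state(p2).
end_state(q3).
end_state(r3).
end_state(t2).

fsa(S, [C|M]) :- transition(S, C, T), fsa(T, M).   % consume one character
fsa(S, L)     :- epsilon(S, T), fsa(T, L).         % follow an ε-arc
fsa(E, [])    :- end_state(E).
```

The query ?- fsa(s0, [a,b,c]). succeeds and ?- fsa(s0, [a,b]). fails. Any finite language can be encoded this way, at the cost of one chain of states per string.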

Limits of Finite State Technology

• Language = set of strings
• case 2
  – set is infinite
  – e.g. L = a+b+ = { ab, aab, abb, aabb, aaab, abbb, … }
    • “one or more a’s followed by one or more b’s”
    • we know this set is regular
  – however, consider L = {aⁿbⁿ | n ≥ 1} = { ab, aabb, aaabbb, … }
    • “same number of b’s as a’s…”
    • this set is not regular. Why?

[figure: FSA for a+b+ with states s, x, y and arcs labeled a, a, b, b]

The Limits of Finite State Technology

• [Formally, we can use the Pumping Lemma to prove this particular case.]
• informally,
  – we can build FSA for…
    • ab
    • aabb
    • aaabbb
    • …

[figure: three linear FSAs, one each for ab, aabb and aaabbb, with the end state marked]

The Limits of Finite State Technology

• we can merge the individual FSA for…
  – ab
  – aabb
  – aaabbb
• such direct encoding would require an infinite number of states
  – and we’re using Finite State Automata
• quite different from the infinity obtained by looping
  – freely iterate (no counting)

[figure: the merged FSA for ab, aabb and aaabbb, with a chain of a arcs and separate chains of b arcs]
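
To see what "counting" means here, the sketch below recognizes aⁿbⁿ in Prolog by carrying an explicit counter alongside the input. The counter can grow without bound, which is exactly the extra memory a finite set of states cannot provide. The predicate names anbn/1, count_as/4 and match_bs/2 are made up for this illustration and are not from the lecture.

```prolog
% anbn(+L): L is n a's followed by exactly n b's, n >= 1.
anbn(L) :- count_as(L, 0, N, Rest), N >= 1, match_bs(Rest, N).

% count_as(+L, +N0, -N, -Rest): strip leading a's, counting them from N0.
count_as([a|T], N0, N, Rest) :- !, N1 is N0 + 1, count_as(T, N1, N, Rest).
count_as(Rest, N, N, Rest).

% match_bs(+L, +N): L consists of exactly N b's.
match_bs([], 0).
match_bs([b|T], N) :- N > 0, N1 is N - 1, match_bs(T, N1).
```

The query ?- anbn([a,a,b,b]). succeeds, while ?- anbn([a,a,b]). and ?- anbn([a,b,b]). fail.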

The Limits of Finite State Technology

• example
  – L = a+b+ = { ab, abb, aab, aabb, aaab, abbb, … }
  – “one or more a’s followed by one or more b’s”
• Note:
  – can be divided into two independent halves
  – each half can be replaced by iteration

[figure: individual linear FSAs for ab, aab, abb, aabb, aaab and abbb]

[figure: the same individual FSAs, extended with longer chains (states up to s7) and joined by ε-transitions from a new start state s0]