Fuzzy Set Theory


• Definition: A fuzzy subset A of a universe of discourse U is characterized by a membership function $\mu_A : U \to [0,1]$, which associates with each element u of U a number $\mu_A(u)$ in the interval [0,1].

• Set theory: A = {a, b, c}. Subset of A: {a, c}.

• In ordinary set theory, an element is either in a set or not in a set, so $\mu_A(u)$ is either 0 or 1.

Set Theory

• Let U be the set of all elements (universe)

• There are three basic operations:

• A ∪ B = {elements in A or in B}.

• A ∩ B = {elements in both A and B}.

• Not A=U-A.

• Definition: Let U be the universe of discourse, A and B be two fuzzy subsets of U, and $\bar{A}$ be the complement of A relative to U. Also, let u be an element of U. Then,

$$\mu_{\bar{A}}(u) = 1 - \mu_A(u)$$

$$\mu_{A \cup B}(u) = \max\{\mu_A(u), \mu_B(u)\}$$

$$\mu_{A \cap B}(u) = \min\{\mu_A(u), \mu_B(u)\}$$
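The three fuzzy operations can be illustrated with a minimal Python sketch. It assumes a fuzzy subset is stored as a dictionary from elements of U to membership values, and that an element missing from a dictionary has membership 0; the function names are hypothetical.

```python
def fuzzy_complement(mu_a):
    """Membership of the complement: 1 - mu_A(u) for every stored u."""
    return {u: 1.0 - m for u, m in mu_a.items()}


def fuzzy_union(mu_a, mu_b):
    """Membership of A union B: max(mu_A(u), mu_B(u))."""
    universe = mu_a.keys() | mu_b.keys()
    # Assumption: an element absent from a dictionary has membership 0.
    return {u: max(mu_a.get(u, 0.0), mu_b.get(u, 0.0)) for u in universe}


def fuzzy_intersection(mu_a, mu_b):
    """Membership of A intersect B: min(mu_A(u), mu_B(u))."""
    universe = mu_a.keys() | mu_b.keys()
    return {u: min(mu_a.get(u, 0.0), mu_b.get(u, 0.0)) for u in universe}


# Illustrative (made-up) membership values.
A = {"x": 0.25, "y": 1.0}
B = {"x": 0.5, "y": 0.0}
print(fuzzy_union(A, B))
print(fuzzy_intersection(A, B))
print(fuzzy_complement(A))
```

With membership values restricted to 0 and 1, these functions reduce to the ordinary set operations.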

Fuzzy Information Retrieval

We first set up the term-term correlation matrix. For terms $k_i$ and $k_l$,

$$c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$$

where $n_i$ is the number of documents containing $k_i$, $n_l$ is the number of documents containing $k_l$, and $n_{i,l}$ is the number of documents containing both $k_i$ and $k_l$. Note that $c_{i,i} = 1$.
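As a rough sketch of the correlation formula, the Python function below builds the matrix from documents given as binary term vectors. The function name and the three sample documents are made up for illustration, and a term that occurs in no document is given correlation 0, which is an assumption the slides do not cover.

```python
def correlation_matrix(docs):
    """Term-term correlation c[i][l] = n_il / (n_i + n_l - n_il).

    docs: list of equal-length 0/1 tuples, one per document.
    """
    t = len(docs[0])
    # n[i] = number of documents containing term i.
    n = [sum(d[i] for d in docs) for i in range(t)]
    c = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for l in range(t):
            # n_il = number of documents containing both term i and term l.
            n_il = sum(1 for d in docs if d[i] and d[l])
            denom = n[i] + n[l] - n_il
            # Assumption: if neither term occurs anywhere, use 0.
            c[i][l] = n_il / denom if denom else 0.0
    return c


# Hypothetical collection: 3 documents over 3 terms.
docs = [(1, 1, 0), (1, 0, 0), (0, 1, 1)]
c = correlation_matrix(docs)
for row in c:
    print(row)
```

The denominator counts the documents containing at least one of the two terms, so each entry lies in [0, 1].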


We define a fuzzy set for each term $k_i$. In the fuzzy set for $k_i$, a document $d_j$ has a degree of membership $\mu_{i,j}$ computed as

$$\mu_{i,j} = 1 - \prod_{k_l \in d_j} (1 - c_{i,l})$$

Example: $c_{1,2} = 0.1$, $c_{1,3} = 0.21$.

$d_1 = (0, 1, 1, 0)$: $\mu_{1,1} = 1 - 0.9 \times 0.79 = 0.289$.

$d_2 = (1, 0, 0, 0)$: $\mu_{1,2} = 1 - 0 = 1$ (since $c_{1,1} = 1$). What about $d_3 = (1, 0, 1, 0)$?


Whenever the document $d_j$ contains a term that is strongly related to $k_i$, the document $d_j$ belongs to the fuzzy set of term $k_i$ with a high degree, i.e., $\mu_{i,j}$ is very close to 1.

Example: $c_{1,2} = 0.9$, $d_1 = (0, 1, 0, 0)$.

$\mu_{1,1} = 1 - (1 - 0.9) = 0.9$
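The membership computation can be sketched in Python as follows. The function name is hypothetical, and the correlation value for the fourth term, which the examples do not give, is set to 0 here purely as a placeholder.

```python
def membership(c_row, doc):
    """Degree of membership of a document in the fuzzy set of term i.

    c_row: correlations c[i][l] of term i with every term l.
    doc:   0/1 vector saying which terms the document contains.
    Returns 1 - product over terms l in the document of (1 - c[i][l]).
    """
    prod = 1.0
    for l, present in enumerate(doc):
        if present:
            prod *= 1.0 - c_row[l]
    return 1.0 - prod


# Row for term 1 from the first example; the last entry is an assumed placeholder.
c1 = [1.0, 0.1, 0.21, 0.0]
print(membership(c1, (0, 1, 1, 0)))  # 1 - 0.9 * 0.79
print(membership(c1, (1, 0, 0, 0)))  # the document contains term 1 itself

# Row for the second example, where c_{1,2} = 0.9 (other entries assumed).
c1_strong = [1.0, 0.9, 0.0, 0.0]
print(membership(c1_strong, (0, 1, 0, 0)))
```

Because $c_{i,i} = 1$, any document that contains $k_i$ itself gets membership 1 in the fuzzy set of $k_i$.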

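Query:

• The query is a Boolean formula, e.g.,

• q = $k_a$ and ($k_b$ or not $k_c$).

• In disjunctive normal form over ($k_a$, $k_b$, $k_c$): q = (1, 1, 1) or (1, 1, 0) or (1, 0, 0).

• Suppose q is $q = k_a \wedge (k_b \vee \neg k_c)$, with $q_{dnf} = cc_1 \vee cc_2 \vee cc_3$.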

[Figure 1. Fuzzy document sets $D_a$, $D_b$, $D_c$ for the query $q = [k_a \wedge (k_b \vee \neg k_c)]$. Each $cc_i$, $i \in \{1, 2, 3\}$, is a conjunctive component. $D_q = cc_1 + cc_2 + cc_3$ is the query fuzzy set.]

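$$\mu_{q,j} = \mu_{cc_1 + cc_2 + cc_3,\, j} = 1 - \prod_{i=1}^{3} (1 - \mu_{cc_i,j})$$

$$= 1 - (1 - \mu_{a,j}\,\mu_{b,j}\,\mu_{c,j}) \times (1 - \mu_{a,j}\,\mu_{b,j}\,(1 - \mu_{c,j})) \times (1 - \mu_{a,j}\,(1 - \mu_{b,j})\,(1 - \mu_{c,j}))$$

where $\mu_{i,j}$, $i \in \{a, b, c\}$, is the membership of $d_j$ in the fuzzy set associated with $k_i$, and $\mu_{q,j}$ is the membership of document j for query q.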
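The query membership formula can be sketched in Python under the assumption that the query is already in disjunctive normal form, with each conjunctive component given as a 0/1 tuple over the query terms. The function names and the membership values in the usage lines are hypothetical.

```python
def cc_membership(cc, mu):
    """Membership of a document in one conjunctive component.

    cc: 0/1 tuple over the query terms, e.g. (1, 1, 0) for k_a, k_b, not k_c.
    mu: memberships of the document in the fuzzy sets of those terms.
    A 1 contributes mu_i, a 0 contributes (1 - mu_i).
    """
    p = 1.0
    for bit, m in zip(cc, mu):
        p *= m if bit else (1.0 - m)
    return p


def query_membership(dnf, mu):
    """mu_q = 1 - product over components of (1 - mu_cc)."""
    prod = 1.0
    for cc in dnf:
        prod *= 1.0 - cc_membership(cc, mu)
    return 1.0 - prod


# q = k_a and (k_b or not k_c) in disjunctive normal form.
q_dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
# Made-up memberships (mu_a, mu_b, mu_c) of one document.
mu_doc = (1.0, 0.5, 0.5)
print(query_membership(q_dnf, mu_doc))
```

For these values each component has membership 0.25, so the result is 1 - 0.75^3.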

Exercise: Suppose there are 3 documents and 4 terms.

$d_1 = (1, 0, 1, 0)$, $d_2 = (1, 1, 0, 0)$, and $d_3 = (0, 1, 1, 0)$.

(1) Compute the term-term correlation matrix $c_{i,l}$.

(2) Compute $\mu_{i,j}$ (the membership of document j in the fuzzy set of term i).

(3) If the query is q = (1, 0, 0, 0) or (1, 1, 0, 0), compute $\mu_{q,k}$ for each document $d_k$.

A change to the last slide:

$$\mu_{q,j} = \mu_{cc_1 + cc_2 + cc_3,\, j} = \max\{\mu_{cc_1,j},\ \mu_{cc_2,j},\ \mu_{cc_3,j}\}$$

where $\mu_{cc_1,j}$, $\mu_{cc_2,j}$, $\mu_{cc_3,j}$ are computed as before.
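A minimal Python sketch of this max-based variant, assuming the same representation as before: each conjunctive component is a 0/1 tuple over the query terms, and the document's term memberships are given as a tuple. The function names and sample values are hypothetical.

```python
def cc_membership(cc, mu):
    """Membership in one conjunctive component: product of mu_i or (1 - mu_i)."""
    p = 1.0
    for bit, m in zip(cc, mu):
        p *= m if bit else (1.0 - m)
    return p


def query_membership_max(dnf, mu):
    """Max-based disjunction: the largest component membership."""
    return max(cc_membership(cc, mu) for cc in dnf)


q_dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
mu_doc = (1.0, 0.5, 0.5)  # made-up memberships (mu_a, mu_b, mu_c)
print(query_membership_max(q_dnf, mu_doc))
```

This follows the fuzzy union definition (max), whereas the product form combines the components algebraically and can exceed the largest single component.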

String Matching Allowing Errors

• Problem: Given a short pattern P of length m, a long text T of length n, and a maximum allowed number of errors k, find all the text positions where the pattern occurs with at most k errors.

Dynamic Programming

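• Let C[i,j] be the minimum number of errors needed to match the first i characters of the pattern against a substring of the text ending at position j; i and j are the indices for the pattern and the text.

• Three kinds of error: mismatch (a, b), insertion (a, ε) and deletion (ε, a).

$$C[0,j] = 0$$

$$C[i,0] = i$$

$$C[i,j] = \begin{cases} C[i-1,j-1] & \text{if } P_i = T_j \\ 1 + \min(C[i-1,j],\ C[i,j-1],\ C[i-1,j-1]) & \text{otherwise} \end{cases}$$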

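The matrix

The dynamic programming algorithm searches for 'survey' with at most two errors. The text in the matrix is 'surgery' preceded by the two characters 's' and 'x'. Entries of the last row with value at most 2 indicate matching positions (shown in bold on the original slide). Running time O(nm).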

         s  x  s  u  r  g  e  r  y

      0  0  0  0  0  0  0  0  0  0

   s  1  0  1  0  1  1  1  1  1  1

   u  2  1  1  1  0  1  2  2  2  2

   r  3  2  2  2  1  0  1  2  2  3

   v  4  3  3  3  2  1  1  2  3  3

   e  5  4  4  4  3  2  2  1  2  3

   y  6  5  5  5  4  3  3  2  2  2
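The recurrence and the matrix can be reproduced with a short Python sketch. The function names are hypothetical; the text 'sxsurgery' is read off the column header of the matrix.

```python
def approx_match_matrix(pattern, text):
    """Fill C[i][j] for i = 0..m (pattern) and j = 0..n (text)."""
    m, n = len(pattern), len(text)
    # C[0][j] = 0: a match may start at any text position.
    C = [[0] * (n + 1) for _ in range(m + 1)]
    # C[i][0] = i: i pattern characters against an empty text.
    for i in range(1, m + 1):
        C[i][0] = i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pattern[i - 1] == text[j - 1]:
                C[i][j] = C[i - 1][j - 1]
            else:
                C[i][j] = 1 + min(C[i - 1][j], C[i][j - 1], C[i - 1][j - 1])
    return C


def match_end_positions(pattern, text, k):
    """1-based text positions where a match with at most k errors ends."""
    last_row = approx_match_matrix(pattern, text)[-1]
    return [j for j in range(1, len(text) + 1) if last_row[j] <= k]


for row in approx_match_matrix("survey", "sxsurgery"):
    print(row)
print(match_end_positions("survey", "sxsurgery", 2))
```

The two nested loops visit every cell once, which gives the O(nm) running time.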

Exercise

• Let the text be ABCABCDDABEDF and the pattern be ABCDAB. Find the occurrences of the pattern with at most 1 error.

String Matching Allowing Errors (FAST Algorithm)

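• Just keep the cells with value at most k.

• This reduces the running time, because the exact values of cells above k are never needed to decide whether a match with at most k errors ends at a given position.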
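One way to realize this idea is a column-by-column computation that only updates cells up to one past the last cell with value at most k. The Python sketch below is an illustration under that assumption, not necessarily the exact algorithm the slide refers to; the function name is hypothetical, and every value above k is stored as k + 1.

```python
def search_with_cutoff(pattern, text, k):
    """Report 1-based end positions of matches with at most k errors,
    computing only the cells that can still hold a value <= k."""
    m = len(pattern)
    cap = k + 1  # stand-in for any value greater than k
    # Column 0 of the matrix: C[i][0] = i, capped at k + 1.
    col = [min(i, cap) for i in range(m + 1)]
    last = min(k, m)  # largest row index whose value is <= k
    hits = []
    for j, ch in enumerate(text, 1):
        top = min(last + 1, m)  # rows below top stay above k
        diag = col[0]           # old C[i-1][j-1]; row 0 is always 0
        for i in range(1, top + 1):
            old = col[i]        # old C[i][j-1]
            if pattern[i - 1] == ch:
                new = diag
            else:
                # col[i - 1] already holds the new value C[i-1][j].
                new = 1 + min(diag, col[i - 1], old)
            col[i] = min(new, cap)
            diag = old
        last = top
        while col[last] > k:    # col[0] == 0, so this stops
            last -= 1
        if last == m:
            hits.append(j)
    return hits


print(search_with_cutoff("survey", "sxsurgery", 2))
```

The work per text character is proportional to the number of active rows rather than to the full pattern length m.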

Regular expressions Matching

• Regular expression:

1. Any letter x in $\Sigma \cup \{\varepsilon\}$ is a regular expression, where $\Sigma$ is the set of all letters.

2. If A and B are regular expressions, then A|B, A.B and (A)* are regular expressions.

Regular expressions Matching (Not Required)

• Given a regular expression E and a string T, find all the substrings in T that match E.

• Let d(i) be the set of all states in the automaton that can be reached after T1T2…Ti has been read.

• Given d(i), d(i+1) can be computed easily.

• There is a starting state and a final state in the automaton.

• Whenever the final state is reached, we have found a substring in T that matches the expression.
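The step from d(i) to d(i+1) can be sketched in Python as a simulation of an automaton with ε-transitions. The function names are hypothetical, and the automaton below is hand-built for E = (A|AA).(B|AB) with its own state numbers, which are an assumption and not the state labels of the slide's figure.

```python
def epsilon_closure(states, eps):
    """All states reachable from `states` using only epsilon-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def match_end_positions(text, start, final, delta, eps):
    """1-based positions i where some substring of text ending at i matches.

    delta maps (state, letter) to a set of next states.
    """
    init = epsilon_closure({start}, eps)
    d = set(init)  # d(0)
    ends = []
    for i, ch in enumerate(text, 1):
        moved = set()
        for s in d:
            moved |= delta.get((s, ch), set())
        reached = epsilon_closure(moved, eps)
        if final in reached:
            ends.append(i)
        # Keep the start states active so a match can begin anywhere.
        d = reached | init
    return ends


# Hand-built automaton for (A|AA).(B|AB); state numbers are hypothetical.
delta = {(0, "A"): {1}, (1, "A"): {2}, (3, "B"): {5}, (3, "A"): {4}, (4, "B"): {5}}
eps = {1: [3], 2: [3]}
print(match_end_positions("ABBAB", 0, 5, delta, eps))
```

Each step looks at the outgoing transitions of every state in the current set, so its cost grows with the size of the automaton.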

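[Figure: construction of the automata $F_{A|B}$, $F_{A.B}$ and $F_{(A)*}$ from the automata $F_A$ and $F_B$, each with a starting state S and a final state f, connected by ε-transitions; followed by an example automaton with transitions on A, B and ε whose states are labeled a to l.]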

Example:

• E = (A|AA).(B|AB).

• T = ABBAB.

• d(1) = {a, b, d, c}

• d(2) = {a, b, d, e, f, g, i}

• d(3) = {a, b, c, e, f, g, i, h, l}

• d(4) = {a, b, d, c, j}

• d(5) = {a, b, d, e, f, g, i, k}

Running time

• $O(n^2)$, where n is the size of the automaton, since d(i) could contain O(n) states.
