Fuzzy Set Theory
• Definition: A fuzzy subset A of a universe of discourse U is characterized by a membership function $\mu_A: U \to [0,1]$, which associates with each element u of U a number $\mu_A(u)$ in the interval [0,1].
• Set Theory: A={a, b, c}. Subset of A: {a, c}.
• In ordinary set theory, an element is either in a set or not in it, so $\mu_A(u)$ is either 0 or 1.
Set Theory
• Let U be the set of all elements (universe)
• There are three basic operations:
• $A \cup B$ = {elements in A or in B}.
• $A \cap B$ = {elements in both A and B}.
• Not A = U − A.
• Definition: Let U be the universe of discourse, A and B be two fuzzy subsets of U, and $\bar{A}$ be the complement of A relative to U. Also, let u be an element of U. Then,
$$\mu_{\bar{A}}(u) = 1 - \mu_A(u)$$
$$\mu_{A \cup B}(u) = \max\{\mu_A(u), \mu_B(u)\}$$
$$\mu_{A \cap B}(u) = \min\{\mu_A(u), \mu_B(u)\}$$
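The three fuzzy operations can be sketched in a few lines. This is a minimal illustration, assuming a fuzzy subset is stored as a dictionary from element to membership value; the function names and the sample values are hypothetical.

```python
def fuzzy_complement(mu_a):
    """Membership of the complement: 1 - mu_A(u)."""
    return {u: 1.0 - m for u, m in mu_a.items()}


def fuzzy_union(mu_a, mu_b):
    """Membership of the union: max of the two memberships."""
    return {u: max(mu_a[u], mu_b[u]) for u in mu_a}


def fuzzy_intersection(mu_a, mu_b):
    """Membership of the intersection: min of the two memberships."""
    return {u: min(mu_a[u], mu_b[u]) for u in mu_a}


# Hypothetical fuzzy subsets over the same universe {u1, u2, u3}.
A = {"u1": 0.25, "u2": 0.75, "u3": 1.0}
B = {"u1": 0.5, "u2": 0.5, "u3": 0.0}

print(fuzzy_complement(A))
print(fuzzy_union(A, B))
print(fuzzy_intersection(A, B))
```

When every membership value is 0 or 1, these functions reduce to the ordinary complement, union, and intersection.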
Fuzzy Information Retrieval
We first set up the term-term correlation matrix. For terms $k_i$ and $k_l$,
$$c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$$
where $n_i$ is the number of documents containing $k_i$, $n_l$ is the number of documents containing $k_l$, and $n_{i,l}$ is the number of documents containing both $k_i$ and $k_l$. Note that $c_{i,i} = 1$.
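The correlation matrix can be computed directly from binary document vectors. The sketch below assumes each document is a tuple of 0/1 term indicators; the function name and sample documents are hypothetical. It also assumes that a term occurring in no document gets correlation 0, since the formula would otherwise divide by zero.

```python
def term_correlation(docs):
    """Return the matrix c[i][l] = n_il / (n_i + n_l - n_il)."""
    num_terms = len(docs[0])
    # n[i]: number of documents containing term i
    n = [sum(1 for d in docs if d[i]) for i in range(num_terms)]
    c = [[0.0] * num_terms for _ in range(num_terms)]
    for i in range(num_terms):
        for l in range(num_terms):
            # n_il: number of documents containing both terms
            n_il = sum(1 for d in docs if d[i] and d[l])
            denom = n[i] + n[l] - n_il
            # Assumption: correlation is 0 when neither term occurs anywhere.
            c[i][l] = n_il / denom if denom else 0.0
    return c


# Hypothetical collection: 4 documents over 3 terms.
docs = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
for row in term_correlation(docs):
    print(row)
```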
Fuzzy Information Retrieval
We define a fuzzy set for each term $k_i$. In the fuzzy set for $k_i$, a document $d_j$ has a degree of membership $\mu_{i,j}$ computed as
$$\mu_{i,j} = 1 - \prod_{k_l \in d_j} (1 - c_{i,l})$$
Example: $c_{1,2} = 0.1$, $c_{1,3} = 0.21$.
$d_1 = (0, 1, 1, 0)$: $\mu_{1,1} = 1 - 0.9 \times 0.79 = 0.289$.
$d_2 = (1, 0, 0, 0)$: $\mu_{1,2} = 1 - 0 = 1$ (since $c_{1,1} = 1$). What about $d_3 = (1, 0, 1, 0)$?
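The membership formula can be checked numerically with a short sketch. It assumes the correlations of term $k_1$ with all four terms are given as a list; the value used for the fourth term is hypothetical, since the example does not state it, and the function name is illustrative.

```python
def membership(c_row, doc):
    """Degree of membership of doc in the fuzzy set of one term.

    c_row[l] is the correlation between that term and term l;
    doc is a tuple of 0/1 indicators for the terms it contains.
    """
    product = 1.0
    for c_il, present in zip(c_row, doc):
        if present:
            product *= 1.0 - c_il
    return 1.0 - product


# Correlations of k1 with k1..k4; the last value is an assumed placeholder.
c1 = [1.0, 0.1, 0.21, 0.0]

print(membership(c1, (0, 1, 1, 0)))  # d1 from the example
print(membership(c1, (1, 0, 0, 0)))  # d2 from the example
```

Because $c_{1,1} = 1$, any document that contains $k_1$ itself contributes a factor of 0 to the product, so its membership is 1.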
Fuzzy Information Retrieval
Whenever the document $d_j$ contains a term that is strongly related to $k_i$, the document $d_j$ belongs to the fuzzy set of term $k_i$ with a high degree of membership, i.e., $\mu_{i,j}$ is very close to 1.
Example: $c_{1,2} = 0.9$, $d_1 = (0, 1, 0, 0)$.
$\mu_{1,1} = 1 - (1 - 0.9) = 0.9$
Query:
• A query is a Boolean formula, e.g.,
• $q = k_a$ and ($k_b$ or not $k_c$).
• In disjunctive normal form, q = (1, 1, 1) or (1, 1, 0) or (1, 0, 0).
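• Suppose q is $q = k_a \wedge (k_b \vee \neg k_c)$, with $q_{dnf} = cc_1 \vee cc_2 \vee cc_3$.
[Figure 1: diagram of the fuzzy sets $D_a$, $D_b$, $D_c$ with regions $cc_1$, $cc_2$, $cc_3$, and $D_q = cc_1 \cup cc_2 \cup cc_3$.]
Figure 1. Fuzzy document sets for the query $q = [k_a \wedge (k_b \vee \neg k_c)]$. Each $cc_i$, $i \in \{1, 2, 3\}$, is a conjunctive component. $D_q$ is the query fuzzy set.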
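$$\begin{aligned}
\mu_{q,j} &= \mu_{cc_1 + cc_2 + cc_3,\, j} \\
&= 1 - \prod_{i=1}^{3} (1 - \mu_{cc_i,\, j}) \\
&= 1 - (1 - \mu_{a,j}\,\mu_{b,j}\,\mu_{c,j}) \times (1 - \mu_{a,j}\,\mu_{b,j}\,(1 - \mu_{c,j})) \times (1 - \mu_{a,j}\,(1 - \mu_{b,j})\,(1 - \mu_{c,j}))
\end{aligned}$$
where $\mu_{i,j}$, $i \in \{a, b, c\}$, is the membership of $d_j$ in the fuzzy set associated with $k_i$, and $\mu_{q,j}$ is the membership of document j for query q.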
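The query membership can be evaluated mechanically from the disjunctive normal form. The sketch below assumes each conjunctive component is a tuple of 0/1 flags over $(k_a, k_b, k_c)$ and that the per-term memberships of the document are already known; the function names and the sample membership values are hypothetical.

```python
def conjunctive_membership(component, mu):
    """Product of mu (term required) or 1 - mu (term negated) over all terms."""
    value = 1.0
    for bit, m in zip(component, mu):
        value *= m if bit else (1.0 - m)
    return value


def query_membership(dnf, mu):
    """Algebraic sum over the conjunctive components: 1 - prod(1 - mu_cc)."""
    product = 1.0
    for component in dnf:
        product *= 1.0 - conjunctive_membership(component, mu)
    return 1.0 - product


# q = ka and (kb or not kc) in disjunctive normal form.
dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
# Hypothetical memberships (mu_a, mu_b, mu_c) of one document.
mu = (1.0, 0.5, 0.5)
print(query_membership(dnf, mu))
```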
Exercise: suppose there are 3 documents and 4 terms.
$d_1 = (1, 0, 1, 0)$, $d_2 = (1, 1, 0, 0)$, and $d_3 = (0, 1, 1, 0)$.
(1) Compute the term-term correlation matrix $c_{i,l}$.
(2) Compute $\mu_{i,j}$ (the membership of document j in the fuzzy set of term i).
(3) If the query is q = (1, 0, 0, 0) or (1, 1, 0, 0), compute $\mu_{q,k}$ for each document $d_k$.
Some changes to the formula on the earlier slide:
$\mu_{q,j} = \mu_{cc_1 + cc_2 + cc_3,\, j} = \max\{\mu_{cc_1,j}, \mu_{cc_2,j}, \mu_{cc_3,j}\}$,
where $\mu_{cc_1,j}$, $\mu_{cc_2,j}$, $\mu_{cc_3,j}$ are computed as before.
String Matching Allowing Errors
• Problem: Given a short pattern P of length m, a long text T of length n, and a maximum allowed number of errors k, find all the text positions where the pattern occurs with at most k errors.
Dynamic Programming
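• Let C[i,j] be the minimum number of errors needed to match the first i characters of P against a substring of T ending at position j; i and j are the indices for the pattern and the text.
• Three kinds of error: mismatch (a, b), insertion (a, ε), and deletion (ε, a).
$$C[0, j] = 0$$
$$C[i, 0] = i$$
$$C[i, j] = \text{if } (P_i = T_j) \text{ then } C[i-1, j-1] \text{ else } 1 + \min(C[i-1, j],\, C[i, j-1],\, C[i-1, j-1])$$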
The matrix
The dynamic programming algorithm searching for 'survey' in the text 'sxsurgery' (the column headers of the matrix) with two errors. Matching positions are the last-row entries with value at most 2 (shown in bold on the slide). Running time: O(nm).
      s  x  s  u  r  g  e  r  y
   0  0  0  0  0  0  0  0  0  0
s  1  0  1  0  1  1  1  1  1  1
u  2  1  1  1  0  1  2  2  2  2
r  3  2  2  2  1  0  1  2  2  3
v  4  3  3  3  2  1  1  2  3  3
e  5  4  4  4  3  2  2  1  2  3
y  6  5  5  5  4  3  3  2  2  2
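The recurrence translates almost line for line into code. The following is a minimal sketch; the function name is illustrative, and the text 'sxsurgery' is taken from the column headers of the matrix. It fills the whole matrix and reports the 1-based text positions where the last row is at most k.

```python
def approx_search(pattern, text, k):
    """Fill C[i][j] with the recurrence and return (C, matching end positions)."""
    m, n = len(pattern), len(text)
    # C[0][j] = 0: a match may start anywhere in the text.
    C = [[0] * (n + 1) for _ in range(m + 1)]
    # C[i][0] = i: matching i pattern characters against nothing costs i errors.
    for i in range(m + 1):
        C[i][0] = i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pattern[i - 1] == text[j - 1]:
                C[i][j] = C[i - 1][j - 1]
            else:
                C[i][j] = 1 + min(C[i - 1][j], C[i][j - 1], C[i - 1][j - 1])
    ends = [j for j in range(1, n + 1) if C[m][j] <= k]
    return C, ends


C, ends = approx_search("survey", "sxsurgery", 2)
for row in C:
    print(row)
print(ends)
```

The two nested loops visit each of the (m+1)(n+1) cells once, which is where the O(nm) running time comes from.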
Exercise
• Let the text be ABCABCDDABEDF and the pattern be ABCDAB. Find the occurrences of the pattern with at most 1 error.
String Matching Allowing Errors (FAST Algorithm)
• Just keep the cells with value at most k.
• This reduces the time complexity.
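One way to read "keep only the cells with value at most k" is to process the matrix column by column and stop each column just below the last cell whose value is still at most k. The sketch below is an assumption about what the slide intends, not the slide's own algorithm; the names `search_cutoff`, `cap`, and `last_active` are illustrative. Values above k are stored as k + 1, which does not change which cells are at most k.

```python
def search_cutoff(pattern, text, k):
    """Return 1-based end positions where pattern matches with at most k errors."""
    m = len(pattern)
    cap = k + 1  # any value above k is stored as k + 1
    # column[i] stands for C[i, j] at the current j; it is exact whenever <= k.
    column = [min(i, cap) for i in range(m + 1)]
    last_active = min(k, m)  # largest i with column[i] <= k
    ends = []
    for j, ch in enumerate(text, start=1):
        diag = column[0]  # C[i-1, j-1]; row 0 is always 0
        top = min(last_active + 1, m)  # rows below top cannot be <= k here
        for i in range(1, top + 1):
            left = column[i]  # C[i, j-1]
            if pattern[i - 1] == ch:
                new = diag
            else:
                new = 1 + min(left, column[i - 1], diag)
            column[i] = min(new, cap)
            diag = left
        # Move last_active back up to the lowest row that is still <= k.
        last_active = top
        while column[last_active] > k:
            last_active -= 1
        if last_active == m:
            ends.append(j)
    return ends


print(search_cutoff("survey", "sxsurgery", 2))
```

The inner loop touches only the active prefix of each column, so the work per text character depends on how many cells stay at or below k rather than on the full pattern length.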
Regular expressions Matching
• Regular expression:
1. Any letter x in $\Sigma \cup \{\varepsilon\}$ is a regular expression, where $\Sigma$ is the set of all letters.
2. If A and B are regular expressions, then A|B, A.B and (A)* are regular expressions.
Regular expressions Matching (Not Required)
• Given a regular expression E and a string T, find all the substrings in T that match E.
• Let d(i) be the set of all states in the automaton that can be reached after reading $T_1 T_2 \ldots T_i$.
• Given d(i), d(i+1) can be computed easily.
• There is a starting state and a final state in the automaton.
• Whenever the final state is reached, we have found a substring in T that matches the expression.
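[Figure: construction of the automata $F_{A|B}$, $F_{A.B}$, and $F_{(A)^*}$ from the sub-automata $F_A$ and $F_B$, with start state S, final state f, and ε-transitions.]
[Figure: example automaton with states a through l and transitions labelled A, B, and ε.]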
Example:
• E=(A|AA).(B|AB).
• T=ABBAB.
• d(1) = {a, b, d, c}
• d(2) = {a, b, d, e, f, g, i}
• d(3) = {a, b, c, e, f, g, i, h, l}
• d(4) = {a, b, d, c, j}
• d(5) = {a, b, d, e, f, g, i, k}
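The state-set idea can be tried out on the expression E and text T of this example. The sketch below is one possible realization, assuming the automaton is built with the usual ε-transition construction for |, concatenation, and *; the tuple encoding of the expression and all function names are hypothetical. State numbers are generated by the code and do not correspond to the labels a to l.

```python
from itertools import count


def build_nfa(expr):
    """Build an ε-NFA from a nested-tuple expression; return (start, final, eps, step)."""
    eps, step, ids = {}, {}, count()

    def new_state():
        s = next(ids)
        eps[s] = set()
        return s

    def build(node):
        kind = node[0]
        if kind == "lit":  # single letter
            s, f = new_state(), new_state()
            step[(s, node[1])] = f
            return s, f
        if kind == "cat":  # A.B
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            eps[f1].add(s2)
            return s1, f2
        if kind == "alt":  # A|B
            s, f = new_state(), new_state()
            for child in node[1:]:
                cs, cf = build(child)
                eps[s].add(cs)
                eps[cf].add(f)
            return s, f
        if kind == "star":  # (A)*
            s, f = new_state(), new_state()
            cs, cf = build(node[1])
            eps[s] |= {cs, f}
            eps[cf] |= {cs, f}
            return s, f
        raise ValueError(kind)

    start, final = build(expr)
    return start, final, eps, step


def closure(states, eps):
    """All states reachable from `states` through ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def match_ends(expr, text):
    """1-based positions in text where some substring matching expr ends."""
    start, final, eps, step = build_nfa(expr)
    initial = closure({start}, eps)
    current, ends = set(initial), []
    for i, ch in enumerate(text, start=1):
        moved = {step[(s, ch)] for s in current if (s, ch) in step}
        reached = closure(moved, eps)
        if final in reached:
            ends.append(i)
        current = reached | initial  # a new match may start at any position
    return ends


A, B = ("lit", "A"), ("lit", "B")
E = ("cat", ("alt", A, ("cat", A, A)), ("alt", B, ("cat", A, B)))  # (A|AA).(B|AB)
print(match_ends(E, "ABBAB"))
```

Each text character triggers one pass over the current state set followed by an ε-closure, so the cost per character is bounded by the size of the automaton and its transitions.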
Running time
• $O(n^2)$, where n is the size of the automaton, since d(s, i) could contain O(n) states.