Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has...

120
Rough Set Theory

Transcript of Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has...

Page 1: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rough Set Theory

Page 2: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

2

Introduction

Fuzzy set theory– Introduced by Zadeh in 1965 [1]– Has demonstrated its usefulness in chemistry

and in other disciplines [2-9] Rough set theory

– Introduced by Pawlak in 1985 [10,11]– Popular in many other disciplines [12]

Page 3: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Introduction

Rough set theory was developed by Zdzislaw Pawlak in the early 1980’s.

Representative Publications:– Z. Pawlak, “Rough Sets”, International Journal

of Computer and Information Sciences, Vol.11, 341-356 (1982).

– Z. Pawlak, Rough Sets - Theoretical Aspect of Reasoning about Data, Kluwer Academic Pubilishers (1991).

Page 4: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Introduction (2) The main goal of the rough set analysis is

induction of approximations of concepts. Rough sets constitutes a sound basis for KDD. It

offers mathematical tools to discover patterns hidden in data.

It can be used for feature selection, feature extraction, data reduction, decision rule generation, and pattern extraction (templates, association rules) etc.

identifies partial or total dependencies in data, eliminates redundant data, gives approach to null values, missing data, dynamic data and others.

Page 5: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Basic Concepts of Rough Sets

Information/Decision Systems (Tables) Indiscernibility Set Approximation Reducts and Core Rough Membership Dependency of Attributes

Page 6: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Information Systems/Tables

IS is a pair (U, A) U is a non-empty

finite set of objects. A is a non-empty finite

set of attributes such that for every

is called the value set of a.

aVUa :

.AaaV

Age LEMS

x 1 16-30 50x2 16-30 0x3 31-45 1-25x4 31-45 1-25

x5 46-60 26-49x6 16-30 26-49

x7 46-60 26-49

Page 7: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

7

Consider a data set containing the results of threemeasurements performed for 10 objects. The resultscan be organized in a matrix10x3.

Basic concepts of the rough sets theory

Measurements

Objects

Page 8: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Decision Systems/Tables

DS: is the decision

attribute (instead of one we can consider more decision attributes).

The elements of A are called the condition attributes.

Age LEMS Walk

x 1 16-30 50 yes x2 16-30 0 no x3 31-45 1-25 nox4 31-45 1-25 yes

x5 46-60 26-49 nox6 16-30 26-49 yes

x7 46-60 26-49 no

}){,( dAUT Ad

Page 9: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Issues in the Decision Table

The same or indiscernible objects may be represented several times.

Some of the attributes may be superfluous.

Page 10: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Indiscernibility

The equivalence relation A binary relation which is

reflexive (xRx for any object x) ,

symmetric (if xRy then yRx), and

transitive (if xRy and yRz then xRz). The equivalence class of an element

consists of all objects such that xRy.

XXR

Xx Xy

Rx][

Page 11: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

11

Basic concepts of the rough sets theory

Page 12: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

12

Basic concepts of the rough sets theory

Page 13: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Indiscernibility (2) Let IS = (U, A) be an information system, then

with any there is an associated equivalence relation:

where is called the B-indiscernibility relation.

If then objects x and x’ are indiscernible from each other by attributes from B.

The equivalence classes of the B-indiscernibility relation are denoted by

AB

)}'()(,|)',{()( 2 xaxaBaUxxBIND IS

)(BINDIS

),()',( BINDxx IS

.][ Bx

Page 14: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

An Example of Indiscernibility The non-empty subsets of

the condition attributes are {Age}, {LEMS}, and {Age, LEMS}.

IND({Age}) = {{x1,x2,x6}, {x3,x4}, {x5,x7}}

IND({LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x6,x7}}

IND({Age,LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x7}, {x6}}.

Age LEMS Walk

x 1 16-30 50 yes x2 16-30 0 no x3 31-45 1-25 nox4 31-45 1-25 yes

x5 46-60 26-49 nox6 16-30 26-49 yes

x7 46-60 26-49 no

Page 15: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Observations

An equivalence relation induces a partitioning of the universe.

The partitions can be used to build new subsets of the universe.

Subsets that are most often of interest have the same value of the decision attribute.

It may happen, however, that a concept such as “Walk” cannot be defined in a crisp manner.

Page 16: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Set Approximation

Let T = (U, A) and let and We can approximate X using only the information contained in B by constructing the B-lower and B-upper approximations of X, denoted and respectively, where

AB .UX

XB XB

},][|{ XxxXB B

}.][|{ XxxXB B

Page 17: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Set Approximation (2)

B-boundary region of X,

consists of those objects that we cannot decisively classify into X in B.

B-outside region of X,

consists of those objects that can be with certainty classified as not belonging to X.

A set is said to be rough if its boundary region is non-empty, otherwise the set is crisp.

,)( XBXBXBNB

,XBU

Page 18: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

An Example of Set Approximation Let W = {x | Walk(x) =

yes}.

The decision class, Walk, is rough since the boundary region is not empty.

Age LEMS Walk

x 1 16-30 50 yes x2 16-30 0 no x3 31-45 1-25 nox4 31-45 1-25 yes

x5 46-60 26-49 nox6 16-30 26-49 yes

x7 46-60 26-49 no

}.7,5,2{

},4,3{)(

},6,4,3,1{

},6,1{

xxxWAU

xxWBN

xxxxWA

xxWA

A

Page 19: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

An Example of Set Approximation (2)

yes

yes/no

no

{{x1},{x6}}

{{x3,x4}}

{{x2}, {x5,x7}}

AW

WA

Page 20: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

U

set X

U/R

R : subset of attributes

XR

XXR

Lower & Upper Approximations

Page 21: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Lower & Upper Approximations (2)

}:/{ XYRUYXR

}:/{ XYRUYXR

Lower Approximation:

Upper Approximation:

Page 22: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Lower & Upper Approximations(3)

X1 = {u | Flu(u) = yes}

= {u2, u3, u6, u7}

RX1 = {u2, u3} = {u2, u3, u6, u7, u8, u5}

X2 = {u | Flu(u) = no}

= {u1, u4, u5, u8}

RX2 = {u1, u4} = {u1, u4, u5, u8, u7, u6}X1R

X2R

U Headache Temp. FluU1 Yes Normal NoU2 Yes High YesU3 Yes Very-high YesU4 No Normal NoU5 NNNooo HHHiiiggghhh NNNoooU6 No Very-high YesU7 NNNooo HHHiiiggghhh YYYeeesssU8 No Very-high No

The indiscernibility classes defined by R = {Headache, Temp.} are {u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8}.

Page 23: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Lower & Upper Approximations (4)

R = {Headache, Temp.}U/R = { {u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8}}

X1 = {u | Flu(u) = yes} = {u2,u3,u6,u7}X2 = {u | Flu(u) = no} = {u1,u4,u5,u8}

RX1 = {u2, u3}

= {u2, u3, u6, u7, u8, u5}

RX2 = {u1, u4}

= {u1, u4, u5, u8, u7, u6}

X1R

X2R

u1

u4u3

X1 X2

u5u7u2

u6 u8

Page 24: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Properties of Approximations

YX

YBXBYXB

YBXBYXB

UUBUB BB

XBXXB

)()()(

)()()(

)()(,)()(

)(

)()( YBXB )()( YBXB implies and

Page 25: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Properties of Approximations (2)

)())(())((

)())(())((

)()(

)()(

)()()(

)()()(

XBXBBXBB

XBXBBXBB

XBXB

XBXB

YBXBYXB

YBXBYXB

where -X denotes U - X.

Page 26: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Four Basic Classes of Rough Sets

X is roughly B-definable, iff and

X is internally B-undefinable, iff and

X is externally B-undefinable, iff and

X is totally B-undefinable, iff and

)(XB,)( UXB

)(XB,)( UXB

)(XB

,)( UXB )(XB

.)( UXB

Page 27: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Accuracy of Approximation

  where |X| denotes the cardinality of

Obviously

If X is crisp with respect to B.

If X is rough with respect to B.

|)(|

|)(|)(

XB

XBXB

.X

.10 B,1)( XB,1)( XB

Page 28: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Issues in the Decision Table

The same or indiscernible objects may be represented several times.

Some of the attributes may be superfluous (redundant).

That is, their removal cannot worsen the classification.

Page 29: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Reducts

Keep only those attributes that preserve the indiscernibility relation and, consequently, set approximation.

There are usually several such subsets of attributes and those which are minimal are called reducts.

Page 30: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Dispensable & Indispensable Attributes

Let Attribute c is dispensable in T if , otherwise attribute c is indispensable in T.

.Cc

)()( }){( DPOSDPOS cCC

XCDPOSDUX

C /

)(

The C-positive region of D:

Page 31: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Independent

T = (U, C, D) is independent

  if all are indispensable in T. Cc

Page 32: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Reduct & Core

The set of attributes is called a reduct of C, if T’ = (U, R, D) is independent and

The set of all the condition attributes

indispensable in T is denoted by CORE(C).

where RED(C) is the set of all reducts of C.

CR

).()( DPOSDPOS CR

)()( CREDCCORE

Page 33: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

An Example of Reducts & Core

U Headache Musclepain

Temp. Flu

U1 Yes Yes Normal NoU2 Yes Yes High YesU3 Yes Yes Very-high YesU4 No Yes Normal NoU5 No No High NoU6 No Yes Very-high Yes

U Musclepain

Temp. Flu

U1,U4 Yes Normal No

U2 Yes High YesU3,U6 Yes Very-high YesU5 No High No

U Headache Temp. Flu

U1 Yes Norlmal NoU2 Yes High YesU3 Yes Very-high YesU4 No Normal NoU5 No High NoU6 No Very-high Yes

Reduct1 = {Muscle-pain,Temp.}

Reduct2 = {Headache, Temp.}

CORE = {Headache,Temp}       {MusclePain, Temp}      = {Temp}

Page 34: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Discernibility Matrix (relative to positive region)

Let T = (U, C, D) be a decision table, with

By a discernibility matrix of T, denoted M(T), we will

mean matrix defined as:

for i, j = 1,2,…,n such that or belongs to the C-positive region of D.

is the set of all the condition attributes that classify objects ui and uj into different classes.

}.,...,,{ 21 nuuuU

nn

)]()([)}()(:{)]()([

jiji

ji

udud Dd if ucuc Ccudud Dd if ijm

iu ju

ijm

Page 35: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Discernibility Matrix (relative to positive region) (2)

The equation is similar but conjunction is taken over all non-empty entries of M(T) corresponding to the indices i, j such that

or belongs to the C-positive region of D. denotes that this case does not need to be

considered. Hence it is interpreted as logic truth. All disjuncts of minimal disjunctive form of this

function define the reducts of T (relative to the positive region).

iu juijm

Page 36: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

36

Basic concepts of the rough sets theory

Page 37: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

37

Basic concepts of the rough sets theory

Page 38: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Discernibility Function (relative to objects)

For any ,Uui

}},...,2,1{,:{)( njijmuf ijj

iT

where (1) is the disjunction of all variables a such that if (2) if (3) if

ijm,ijma .ijm

),( falsemij .ijm),(truetmij .ijm

Each logical product in the minimal disjunctive normal form (DNF) defines a reduct of instance .iu

Page 39: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Examples of Discernibility Matrix

No a b c d

u1 a0 b1 c1 yu2 a1 b1 c0 nu3 a0 b2 c1 nu4 a1 b1 c1 y

C = {a, b, c}D = {d}

In order to discern equivalence classes of the decision attribute d, to preserve conditions described by the discernibility matrix for this table

u1 u2 u3

u2

u3

u4

a,c

b

c a,b

Reduct = {b, c}

cb

bacbca

)()(

Page 40: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Examples of Discernibility Matrix (2)

a b c d Eu1 1 0 2 1 1

u2 1 0 2 0 1

u3 1 2 0 0 2

u4 1 2 2 1 0

u5 2 1 0 0 2

u6 2 1 1 0 2

u7 2 1 2 1 1

u1 u2 u3 u4 u5 u6

u2

u3

u4

u5

u6

u7

b,c,d b,c

b b,d c,d

a,b,c,d a,b,c a,b,c,d

a,b,c,d a,b,c a,b,c,d

a,b,c,d a,b c,d c,d

Core = {b}

Reduct1 = {b,c}

Reduct2 = {b,d}

Page 41: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

An Illustrative ExampleAn Illustrative Exampleid Protocol Service Flag Land Label

t1 tcp http error 0 anomaly

t2 tcp telnet normal 1 normal

t3 udp telnet error 0 anomaly

t4 icmp http error 1 normal

t5 icmp telnet error 0 normal

t6 tcp telnet error 0 anomaly

t7 icmp telnet normal 1 normal

t8 tcp http normal 1 normalProtocol -- tcp, udp, icmp, etc.Service -- http, telnet, ftp, etc.Flag -- if a connection is normal or an error Land -- if connection is from/to the same host/port Label -- the service request is normal or anomaly.

Page 42: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Elementary SetsElementary Sets

U/IND(D)=U/Label={t1, t3, t6}, {t2, t4, t5, t7, t8}} U/Protocol={{t1, t2, t6, t8}, {t4, t5, t7}, {t3}} U/Service={{t1, t4, t8}, {t2, t3, t5, t6, t7}} U/Flag={{t1, t3, t4, t5, t6}, {t2, t7, t8}} U/Land={{t1, t3, t5, t6}, {t2, t4, t7, t8}}

Page 43: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rough Membership The rough membership function quantifies the

degree of relative overlap between the set X and the equivalence class to which x belongs.

The rough membership function can be interpreted as a frequency-based estimate of

where u is the equivalence class of IND(B).

Bx][

]1,0[: UBX |][|

|][|

B

BBX x

Xx

),|( uXxP

Page 44: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rough Membership (2)

The formulae for the lower and upper approximations can be generalized to some arbitrary level of precision        by means of the rough membership function

Note: the lower and upper approximations as originally formulated are obtained as a special case with

]1,5.0(

}.1)(|{

})(|{

xxXB

xxXBBX

BX

.1

Page 45: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Dependency of Attributes

Discovering dependencies between attributes is an important issue in KDD.

A set of attribute D depends totally on a set of attributes C, denoted if all values of attributes from D are uniquely determined by values of attributes from C.

,DC

Page 46: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Dependency of Attributes (2)

Let D and C be subsets of A. We will say that D depends on C in a degree k

denoted by if

where called C-positive

region of D.

),10( k

,DC k

||

|)(|),(

U

DPOSDCk C

),()(/

XCDPOSDUX

C

Page 47: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Dependency of Attributes (3)

Obviously

If k = 1 we say that D depends totally on C. If k < 1 we say that D depends partially

(in a degree k) on C.

.||

|)(|),(

/

DUX U

XCDCk

Page 48: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

What Are Issues of Real World ?

Very large data sets Mixed types of data (continuous valued,

symbolic data) Uncertainty (noisy data) Incompleteness (missing, incomplete data) Data change

Use of background knowledge

Page 49: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

A Rough Set Based KDD Process

Discretization based on RS and Boolean Reasoning (RSBR).

Attribute selection based RS with Heuristics (RSH).  

Rule discovery by GDT-RS.

Page 50: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Observations A real world data set always contains mixed

types of data such as continuous valued, symbolic data, etc.

When it comes to analyze attributes with real values, they must undergo a process called discretization, which divides the attribute’s value into intervals.

There is a lack of the unified approach to discretization problems so far, and the choice of method depends heavily on data considered.

Page 51: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Discretization based on RSBR

In the discretization of a decision table T = where is an interval of real values, we search for a partition of for any

Any partition of is defined by a sequence of the so-called cuts from

Any family of partitions can be identified with a set of cuts.

}),{,( dAU ),[ aaa wvV

aP aV .Aa

aVkvvv ...21 .aV

AaaP }{

Page 52: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Discretization Based on RSBR (2)

In the discretization process, we search for a set of cuts satisfying some natural conditions.

U a b d

x1 0.8 2 1x2 1 0.5 0x3 1.3 3 0x4 1.4 1 1x5 1.4 2 0x6 1.6 3 1x7 1.3 1 1

U a b d

x1 0 2 1x2 1 0 0x3 1 2 0x4 1 1 1x5 1 2 0x6 2 2 1x7 1 1 1

P P

P = {(a, 0.9),

(a, 1.5),

(b, 0.75),

(b, 1.5)}

Page 53: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

A Geometrical Representation of Data

0 0.8 1 1.3 1.4 1.6 a

b3

2

1

0.5

x1

x2

x3

x4x7

x5

x6

Page 54: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

A Geometrical Representation of Data and Cuts

0 0.8 1 1.3 1.4 1.6 a

b

3

2

1

0.5

x1

x2

x3

x4

x5

x6

x7

Page 55: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Discretization Based on RSBR (3)

The sets of possible values of a and b are defined by

The sets of values of a and b on objects from U are given by

a(U) = {0.8, 1, 1.3, 1.4, 1.6};

b(U) = {0.5, 1, 2, 3}.

);2,0[ Va ).4,0[ Vb

Page 56: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

The Set of Cuts on Attribute a

0.8 1.0 1.3 1.4 1.6

aap1

ap2ap3

ap4

1c 2c 3c 4c

Page 57: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

57

An illustrative example of application of therough set approach

Page 58: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

58

An illustrative example of application of therough set approach

Page 59: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

59

An illustrative example of application of therough set approach

Page 60: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

U b c d e

u1 0 2 1 1

u2 0 2 0 1

u3 2 0 0 2

u4 2 2 1 0

u5 1 0 0 2

u6 1 1 0 2

u7 1 2 1 1

Searching for CORE

Removing attribute a

Removing attribute a does not cause inconsistency.

Hence, a is not used as CORE.

Page 61: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Searching for CORE (2)

Removing attribute bU a c d e

u1 1 2 1 1

u2 1 2 0 1

u3 1 0 0 2

u4 1 2 1 0

u5 2 0 0 2

u6 2 1 0 2

u7 2 2 1 1

01214

11211

:

:

edcau

edcau

Removing attribute b cause inconsistency.

Hence, b is used as CORE.

Page 62: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Searching for CORE (3)

Removing attribute c

U a b d e

u1 1 0 1 1

u2 1 0 0 1

u3 1 2 0 2

u4 1 2 1 0

u5 2 1 0 2

u6 2 1 0 2

u7 2 1 1 1

Removing attribute c

does not cause inconsistency.

Hence, c is not used

as CORE.

Page 63: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Searching for CORE (4)

Removing attribute d

U a b c e

u1 1 0 2 1

u2 1 0 2 1

u3 1 2 0 2

u4 1 2 2 0

u5 2 1 0 2

u6 2 1 1 2

u7 2 1 2 1

Removing attribute d

does not cause inconsistency.

Hence, d is not used

as CORE.

Page 64: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Searching for CORE (5)

CORE(C)={b}

Initial subset R = {b}

Attribute b is the unique indispensable attribute.

Page 65: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

R={b}U a b c d e

u1 1 0 2 1 1

u2 1 0 2 0 1

u3 1 2 0 0 2

u4 1 2 2 1 0

u5 2 1 0 0 2

u6 2 1 1 0 2

u7 2 1 2 1 1

10 eb

U’ b e

u1 0 1

u2 0 1

u3 2 2

u4 2 0

u5 1 2

u6 1 2

u7 1 1

The instances containing b0 will not be considered.

T T’

Page 66: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

A Rough Set Based KDD Process

Discretization based on RS and Boolean Reasoning (RSBR).

Attribute selection based RS with Heuristics (RSH).  

Rule discovery by GDT-RS.

Page 67: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Main Features of GDT-RS

Unseen instances are considered in the discovery process, and the uncertainty of a rule, including its ability to predict possible instances, can be explicitly represented in the strength of the rule.

Biases can be flexibly selected for search control, and background knowledge can be used as a bias to control the creation of a GDT and the discovery process.

Page 68: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

A Sample DB

u1 a0 b0 c1 yu2 a0 b1 c1 yu3 a0 b0 c1 yu4 a1 b1 c0 nu5 a0 b0 c1 nu6 a0 b2 c1 yu7 a1 b1 c1 y

Condition attributes : a, b, c Va = {a0, a1} Vb = {b0, b1, b2} Vc = {c0, c1}Decision attribute : d, Vd = {y , n}

U a b c d

Page 69: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rule Representation

X Y with S X denotes the conjunction of the conditions

that a concept must satisfy Y denotes a concept that the rule describes S is a “measure of strength” of which the

rule holds

Page 70: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Finding Detection Rules With GRSFinding Detection Rules With GRS Input: UIS=<U,C,D,{Va}aC,f,g,d> and

classification precision factors P and N

Output: A set of classification rules with formatIF <condition> THEN <conclusion> <Certainty-Factor>

Procedure outline– Compute the dependency degree of the decision

attribute D on the condition attribute set C – Find the generalized attribute reducts of the condition

attribute set C according to the attribute dependency – Construct classification rules with certainty factors.

Page 71: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Classification RulesClassification Rules Using the reduct, positive rules are

– Rule 1: If C2=0 then D=1 (normal) with CF=0.85.

– Rule 2: If C2=1 then D=1 (normal) with CF=0.47.

– Rule 3: If C2=2 then D=1 (normal) with CF=0.10. Negative rules are

– Rule 4: If C2=0 then D=0 (abnormal) with CF=0.05.

– Rule 5: If C2=1 then D=0 (abnormal) with CF=0.33.

– Rule 6: If C2=2 then D=0 (abnormal) with CF=0.85.

Page 72: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rule Strength (1)

The strength of the generalization X   (BK is no used),

 

is the number of the observedinstances satisfying the ith generalization.

kPG

krelins

NPGN

lkl

k

PGPIp

PGsXs)()|(

)()(

))(1)(()( YXrXsYXS

)( krelins PGN

Page 73: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rule Strength (2)

The strength of the generalization X

(BK is used),

kPG

lkl

lklbkk

N

PGPIBKF

PGPIpPGsXs

)|(

)|()()(

Page 74: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rule Strength (3)

The rate of noises

is the number of instances belonging to the class Y within the instances satisfying the generalization X.

)(),()()( XN

YXNXN

relins

classinsrelinsYXr

),( YXN classins

Page 75: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rule Discovery by GDT-RS

Condition Attrs. : a, b, c a: Va = {a0, a1}   b: Vb = {b0, b1, b2}   c: Vc = {c0, c1}

Class : d:d: Vd = {y , n}

U a b c du1 a0 b0 c1 yu2 a0 b1 c1 yu3 a0 b0 c1 yu4 a1 b1 c0 nu5 a0 b0 c1 nu6 a0 b2 c1 nu7 a1 b1 c1 y

Page 76: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Regarding the Instances (Noise Rate = 0)

U a b c d    u1,u1’ u3,    u5

a0 b0 c1y,y,n

u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

67.03

11)'1(

33.03

21)'1(

}{

}{

ur

ur

n

y

)'1(

)'1(

)'1(

0Let

}{

}{

ud

Tur

Tur

T

noisen

noisey

noise

 と

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

Page 77: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generating Discernibility Vector for u2

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

7,2

6,2

4,2

2,2

1,2

}{

},{

}{

m

bm

cam

m

bm

u1’ u2 u4 u6 u7u2 b a,c b

Page 78: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Obtaining Reducts for u2

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

u1’ u2 u4 u6 u7u2 b a,c b

)()(

)()(

)()()()2(

cbba

cab

bcabufT

Page 79: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generating Rules from u2

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

)()()2( cbbaufT

{a0,b1} {b1,c1}

{a0b1}a0b1c0

a0b1c1(u2)

s({a0b1}) = 0.5

{b1c1}

a0b1c1(u2)

a1b1c1(u7)

s({b1c1}) = 10)}11({ ycbr

y

y

0)}10({ ybar

y

Page 80: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generating Rules from u2 (2)

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

1)01()2

12( with }11{

5.0)01()2

11( with }10{

Sycb

Syba

Page 81: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generating Discernibility Vector for u4

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

}{

},{

},,{

7,4

6,4

4,4

2,4

1,4

cm

m

m

cam

cbam

u1’ u2 u4 u6 u7u4 a,b,c a,c c

Page 82: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Obtaining Reducts for u4

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

)(

)()()()4(

c

ccacbaufT

u1’ u2 u4 u6 u7u4 a,b,c a,c c

Page 83: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generating Rules from u4

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

{c0}

{c0}

a0b0c0

a1b1c0(u4)

0)}0({6

1)0(

ncr

cs

n

)()4( cufT

a1b2c0

Page 84: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generating Rules from u4 (2)

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

167.0)01()6

11( with }0{ Snc

Page 85: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generating Rules from All Instances

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

u2: {a0b1}    y, S = 0.5 {b1c1}    y, S =1

u4: {c0}    n, S = 0.167

u6: {b2}    n, S=0.25

u7: {a1c1}    y, S=0.5    {b1c1}    y, S=1

Page 86: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

The Rule Selection Criteria in GDT-RS

Selecting the rules that cover as many instances as possible.

Selecting the rules that contain as little attributes as possible, if they cover the same number of instances.

Selecting the rules with larger strengths, if they have same number of condition attributes and cover the same number of instances.

Page 87: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generalization Belonging to Class y

a0b1c1(y)

a1b1c1(y)

*b1c1 1/2 1/2a1*c1 1/3a0b1* 1/2

{b1c1} y   with S = 1 u2 , u7{a1c1} y with S = 1/2 u7{a0b1} y with S = 1/2 u2

u2 u7

Page 88: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Generalization Belonging to Class n

a0b2c1(n)

a1b1c0(n)

**c0 1/6*b2* 1/4

c0 n with S = 1/6 u4b2 n with S = 1/4 u6

u4 u6

Page 89: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Results from the Sample DB( Noise Rate = 0 )

Certain Rules: Instances Covered

{c0} n with S = 1/6 u4

{b2} n with S = 1/4 u6

{b1c1} y with S = 1 u2 , u7

Page 90: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Possible Rules:Possible Rules:

b0 y with S = (1/4)(1/2)

a0 & b0 y with S = (1/2)(2/3)

a0 & c1 y with S = (1/3)(2/3)

b0 & c1 y with S = (1/2)(2/3)

Instances Covered: u1, u3, u5

Results from the Sample DB (2)(Noise Rate > 0)

Page 91: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Regarding Instances(Noise Rate > 0)

U a b c d    u1,u1’ u3,    u5

a0 b0 c1y,y,n

u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

67.03

11)'1(

33.03

21)'1(

}{

}{

ur

ur

n

y

yud

Tur

T

noisey

noise

)'1(

)'1(

5.0Let

}{

U a b c du1’ a0 b0 c1 yu2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

Page 92: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Rules Obtained from All Instacnes

U a b c du1’ a0 b0 c1 yu2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

u2: {a0b1}    y, S=0.5 {b1c1}    y, S=1

u4: {c0}    n, S=0.167

u6: {b2}    n, S=0.25

u7: {a1c1}    y, S=0.5 {b1c1}    y, S=1

u1’:{b0} y, S=1/4*2/3=0.167

Page 93: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Example of Using BKa0b0c0 a0b0c1 a0b1c0 a0b1c1 a0b2c0 a0b2c1 … a1b2c1

a0b0* 1/2 1/2a0b1* 1/2 1/2a0*c1 1/3 1/3 1/3a0** 1/6 1/6 1/6 1/6 1/6 1/6

BK :   a0 => c1, 100%

a0b0c0 a0b0c1 a0b1c0 a0b1c1 a0b2c0 a0b2c1 … a1b2c1a0b0* 0 1a0b1* 0 1a0*c1 1/3 1/3 1/3a0** 0 1/3 0 1/3 0 1/3

Page 94: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Changing Strength of Generalization by BK

U a b c du1’ a0 b0 c1u2 a0 b1 c1 yu4 a1 b1 c0 nu6 a0 b2 c1 nu7 a1 b1 c1 y

)()()2( cbbaufT

{a0,b1} {b1,c1}

{a0b1}a0b1c0

a0b1c1(u2)

s({a0b1}) = 0.5

0)}10({ ybar

{a0b1}a0b1c0

a0b1c1(u2)

s({a0b1}) = 1

0)}10({ ybar

1/2

1/2 100%

0%

a0 => c1, 100%

Page 95: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Algorithm 1Optimal Set of Rules

Step 1. Consider the instances with the same condition attribute values as one instance, called a compound instance.

Step 2. Calculate the rate of noises r for each compound instance.

Step 3. Select one instance u from U and create a discernibility vector for u.

Step 4. Calculate all reducts for the instance u by using the discernibility function.

Page 96: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Algorithm 1Optimal Set of Rules (2)

Step 5. Acquire the rules from the reducts for the instance u, and revise the strength of generalization of each rule.

Step 6. Select better rules from the rules (for u) acquired in Step 5, by using the heuristics for rule selection.

Step 7. If then go back to Step 3. Otherwise go to Step 8.

}.{uUU , U

Page 97: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Algorithm 1Optimal Set of Rules (3)

Step 8. Finish if the number of rules selected in Step 6 for each instance is 1. Otherwise find a minimal set of rules, which contains all of the instances in the decision table.

Page 98: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

The Issue of Algorithm 1

It is not suitable for the database with a large number of attributes.

Methods to Solve the Issue: Finding a reduct (subset) of condition

attributes in a pre-processing. Finding a sub-optimal solution using some

efficient heuristics.

Page 99: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Algorithm 2 Sub-Optimal Solution   

  Step1: Set R = {}, COVERED = {}, and SS = {all instances IDs}. For each class , divide the decision table T into two parts: current class    and other classes

Step2: From the attribute values    of the instances (where   means the jth value of attribute i, ),, SSITI kk

kI

cD

ijv

ijv

T.T

Page 100: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Algorithm 2Sub-Optimal Solution (2)

choose a value v with the maximal number of occurrence within the instances contained in T+ , and the minimal number of occurrence within the instances contained in T-.

Step3: Insert v into R. Step4: Delete the instance ID from SS if the

instance does not contain v.

Page 101: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Algorithm 2Sub-Optimal Solution (3)

Step5: Go back to Step2 until the noise rate is less than the threshold value.

Step6: Find out a minimal sub-set R’ of R according to their strengths. Insert

into RS. Set R = {}, copy the instance IDs

in SS to COVERED , and

set SS = {all instance IDs}- COVERED.

)'( cDR

Page 102: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Algorithm 2Sub-Optimal Solution (4)

Step8: Go back to Step2 until all instances of T+ are in COVERED.

Step9: Go back to Step1 until all classes are handled.

Page 103: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Time Complexity of Alg.1&2

Time Complexity of Algorithm 1:

Time Complexity of Algorithm 2:

Let n be the number of instances in a DB,

m the number of attributes,      the number of generalizations and is less than

))(( 23TGNmnmnO

)( 22nmO

) (T G N

).2( 1mO

Page 104: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Experiments

DBs that have been tested: meningitis, bacterial examination, cancer, mushroom, slope-in-collapse, earth-quack, contents-sell, …... Experimental methods:

– Comparing GDT-RS with C4.5– Using background knowledge or not– Selecting different allowed noise rates as the

threshold values– Auto-discretization or BK-based discretization.

Page 105: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Experiment 1(meningitis data) (2)

GDT-RS (auto-discretization):

bacteriastreptoculture

bacteriapriskGrounpacuteonsetseizure

bacteria

priskGrounpnegativeCCourseacuteonset

bacteriaCRPmsex

bacteriacellMonoabnormalctFind

bacteriacellPoly

)(

)()()0(

)()()(

)4()(

)15()(

)221(

Page 106: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Experiment 1(meningitis data) (3)

GDT-RS (auto-discretization):

virusnriskcellMonocellPoly

viruscellPolynormalctFindCRP

viruscellMonocellPolyCRP

)()15()221(

)221()()4(

)15()221()4(

Page 107: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Using Background Knowledge(meningitis data)

Never occurring together: EEGwave(normal) EEGfocus(+) CSFcell(low) Cell_Poly(high) CSFcell(low) Cell_Mono(high) Occurring with lower possibility: WBC(low) CRP(high) WBC(low) ESR(high) WBC(low) CSFcell(high)

Page 108: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Using Background Knowledge (meningitis data) (2)

Occurring with higher possibility:

WBC(high) CRP(high)

WBC(high) ESR(high)

WBC(high) CSF_CELL(high)

EEGfocus(+) FOCAL(+)

EEGwave(+) EEGfocus(+)

CRP(high) CSF_GLU(low)

CRP(high) CSF_PRO(low)

Page 109: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Explanation of BK

If the brain wave (EEGwave) is normal, the focus of brain wave (EEGfocus) is never abnormal.

If the number of white blood cells (WBC) is high, the inflammation protein (CRP) is also high.

Page 110: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Using Background Knowledge (meningitis data) (3)

rule1 is generated by BK

rule1:

4)/384(30

);()(

)10()5()(

ES

EVIRUSCULTURE

CSFcellESRacuteONSET

Page 111: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Using Background Knowledge (meningitis data) (4)

rule2 is replaced by rule2’

rule2:

rule2’:

./30

;)7,4[))((

ES

lEEGabnormaLOCEVIRUSDIAG

.4)/10(

;)7,4[)(

ES

lEEGabnormaLOCEEGfocus

Page 112: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Experiment 2(bacterial examination data)

Number of instances: 20,000   Number of condition attributes: 60   Goals:

– analyzing the relationship between the bacterium-detected attribute and other attributes

– analyzing what attribute-values are related to the sensitivity of antibiotics when the value of bacterium-detected is (+).

Page 113: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Attribute Selection(bacterial examination data)

Class-1 : bacterium-detected (+、-)

condition attributes : 11 Class-2 : antibiotic-sensibility

(resistant (R), sensibility(S))

condition attributes : 21  

Page 114: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Some Results (bacterial examination data)

Some of rules discovered by GDT-RS are the same as C4.5, e.g.,

Some of rules can only be discovered by GDT-RS, e.g.,

)3(lactamese bacterium-detected(+)

)10( 3quantityurine bacterium-detected(-)

)(1 pneumoniadisease bacterium-detected(-).

Page 115: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Experiment 3(gastric cancer data)

Instances number : 7520   Condition Attributes: 38   Classes :

– cause of death (specially, the direct death)– post-operative complication

Goals :– analyzing the relationship between the direct

death and other attributes– analyzing the relationship between the post-

operative complication and other attributes.

Page 116: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Result of Attribute Selection( gastric cancer data )

Class : the direct deathsex, location_lon1, location_lon2, location_cir1,

location_cir2, serosal_inva, peritoneal_meta,

lymphnode_diss, reconstruction, pre_oper_comp1,

post_oper_comp1, histological, structural_atyp,

growth_pattern, depth, lymphatic_inva,

vascular_inva, ln_metastasis, chemotherapypos

(19 attributes are selected)

Page 117: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Result of Attribute Selection (2)( gastric cancer data )

Class : post-operative complication

multi-lesions, sex, location_lon1, location_cir1,location_cir2, lymphnode_diss, maximal_diam,

reconstruction, pre_oper_comp1, histological,

stromal_type, cellular_atyp, structural_atyp,

growth_pattern, depth, lymphatic_inva,

chemotherapypos

(17 attributes are selected)

Page 118: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Experiment 4(slope-collapse data)

Instances number : 3436  – (430 places were collapsed, and 3006 were not)

Condition attributes: 32   Continuous attributes in condition attributes: 6

– extension of collapsed steep slope, gradient, altitude, thickness of surface of soil, No. of active fault, distance between slope and active fault.

Goal : find out what is the reason that causes the slope to be collapsed.

Page 119: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

Result of Attribute Selection( slope-collapse data )

9 attributes are selected from 32 condition attributes: altitude, slope azimuthal, slope shape, direction of

high rank topography, shape of transverse section, position of transition line, thickness of surface of soil, kind of plant, distance between slope and active fault.

(3 continuous attributes in red color)

Page 120: Rough Set Theory. 2 Introduction _ Fuzzy set theory –Introduced by Zadeh in 1965 [1] –Has demonstrated its usefulness in chemistry and in other disciplines.

The Discovered Rules (slope-collapse data)

s_azimuthal(2) s_shape(5) direction_high(8) plant_kind(3) ∧ ∧ ∧    S = (4860/E)

altitude[21,25) s_azimuthal(3) ∧ ∧ soil_thick(>=45) S = (486/E) s_azimuthal(4) direction_high(4) t_shape(1) tl_position(2) ∧ ∧ ∧ ∧

s_f_distance(>=9) S = (6750/E) altitude[16,17) s_azimuthal(3) ∧ ∧ soil_thick(>=45) ∧ s_f_distance(>=9)

S = (1458/E) altitude[20,21) t_shape(3) tl_position(2) plant_kind(6) ∧ ∧ ∧ ∧

s_f_distance(>=9) S = (12150/E) altitude[11,12) s_azimuthal(2) tl_position(1) S = (1215/E)∧ ∧ altitude[12,13) direction_high(9) tl_position(4) ∧ ∧ ∧ s_f_distance[8,9) S

= (4050/E) altitude[12,13) s_azimuthal(5) t_shape(5) ∧ ∧ ∧ s_f_distance[8,9) S =

(3645/E) …...