Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.
![Page 1: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/1.jpg)
Johanna GOLD
Rough Sets Theory
Logical Analysis of Data
Monday, November 26, 2007
![Page 2: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/2.jpg)
Introduction
Comparison of two theories for rule induction.
Different methodologies, same results?
![Page 3: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/3.jpg)
A set of objects is described by attributes. Each object belongs to a class. We want to induce decision rules.
Generalities
![Page 4: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/4.jpg)
There are two approaches:
- Rough Sets Theory (RST)
- Logical Analysis of Data (LAD)
Goal: compare them.
Approaches
![Page 5: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/5.jpg)
Contents
1. Rough Sets Theory
2. Logical Analysis of Data
3. Comparison
4. Inconsistencies
![Page 6: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/6.jpg)
An inconsistency: two examples have exactly the same values for all attributes but belong to two different classes.
Example: two sick people have the same symptoms but different diseases.
Inconsistencies
![Page 7: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/7.jpg)
RST does not correct or aggregate inconsistencies.
For each class, it determines a lower and an upper approximation.
Covered by RST
![Page 8: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/8.jpg)
Lower approximation: the objects that certainly belong to the class.
Upper approximation: the objects that may belong to the class.
Approximations
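As a minimal sketch of the two approximations, the Python snippet below groups cases that share identical attribute values and compares each group with the concept. The patient table and the function name `approximations` are hypothetical and only serve the illustration; they do not come from the slides.

```python
from collections import defaultdict


def approximations(rows, target_class):
    """rows: list of (case_id, attribute_tuple, decision)."""
    groups = defaultdict(set)  # cases sharing identical attribute values
    for case_id, attrs, _ in rows:
        groups[attrs].add(case_id)
    concept = {cid for cid, _, d in rows if d == target_class}
    lower, upper = set(), set()
    for members in groups.values():
        if members <= concept:   # whole group inside the class: certain
            lower |= members
        if members & concept:    # group touches the class: possible
            upper |= members
    return lower, upper


# Hypothetical data: patients 3 and 4 share symptoms but not the diagnosis.
patients = [
    (1, ("fever", "cough"), "flu"),
    (2, ("no fever", "cough"), "cold"),
    (3, ("fever", "headache"), "flu"),
    (4, ("fever", "headache"), "cold"),
]
lower, upper = approximations(patients, "flu")
print(lower, upper)
```

The inconsistent pair (3, 4) is excluded from the lower approximation of "flu" but kept in its upper approximation.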
![Page 9: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/9.jpg)
Lower approximation → certain rules
Upper approximation → possible rules
Impact on rules
![Page 10: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/10.jpg)
Rule induction on raw numerical data → poor rules → too many rules.
A pretreatment is needed.
Pretreatment
![Page 11: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/11.jpg)
Goal: convert numerical data into discrete data.
Principle: determine cut points that divide each domain into successive intervals.
Discretization
![Page 12: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/12.jpg)
First algorithm: LEM2.
Improved algorithms, which include the pretreatment: MLEM2, MODLEM, …
Algorithms
![Page 13: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/13.jpg)
Certain rules are induced from the lower approximation.
Possible rules are induced from the upper approximation.
The procedure is the same in both cases.
LEM2
![Page 14: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/14.jpg)
For an attribute x and one of its values v, the block [(x,v)] of the attribute-value pair (x,v) is the set of all cases in which attribute x has value v.
Ex: [(Age,21)] = {Martha}
[(Age,22)] = {David, Audrey}
Definitions (1)
![Page 15: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/15.jpg)
Let B be a non-empty lower or upper approximation of a concept represented by a decision-value pair (d,w).
Ex: (level,middle) → B = {obj1, obj5, obj7}
Definitions (2)
![Page 16: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/16.jpg)
Let T be a set of attribute-value pairs (a,v). Set B depends on set T if and only if:
$\emptyset \neq [T] = \bigcap_{(a,v) \in T} [(a,v)] \subseteq B$
Definitions (3)
![Page 17: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/17.jpg)
A set T is a minimal complex of B if and only if B depends on T and there is no proper subset T′ of T such that B depends on T′.
Definitions (4)
![Page 18: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/18.jpg)
Let $\mathcal{T}$ be a non-empty collection of non-empty sets of attribute-value pairs.
$\mathcal{T}$ is a set of sets T; each T is a set of pairs (a,v).
Definitions (5)
![Page 19: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/19.jpg)
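$\mathcal{T}$ is a local cover of B if and only if:
- Each member T of $\mathcal{T}$ is a minimal complex of B.
- $\bigcup_{T \in \mathcal{T}} [T] = B$
- $\mathcal{T}$ is minimal, i.e. it has the smallest possible number of members.
Definitions (6)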
![Page 20: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/20.jpg)
For each approximation of a concept of the decision table, LEM2 outputs a local cover.
It then converts the local covers into decision rules.
Algorithm principle
![Page 21: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/21.jpg)
Algorithm
![Page 22: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/22.jpg)
Among the possible blocks, we choose the one:
- with the highest priority;
- then with the largest intersection with the concept, $|[(a,v)] \cap \text{concept}|$;
- then with the smallest cardinality, $|[(a,v)]|$.
Heuristics details
![Page 23: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/23.jpg)
As long as the current set of pairs is not a minimal complex, pairs are added to it.
As long as the minimal complexes found do not form a local cover, minimal complexes are added.
Heuristics details
![Page 24: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/24.jpg)
Illustration through an example. We assume that the pretreatment has already been done.
Illustration
![Page 25: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/25.jpg)
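Data set
| Case | Height (cm) | Hair | Attraction |
|---|---|---|---|
| 1 | 160 | Blond | - |
| 2 | 170 | Blond | + |
| 3 | 160 | Red | + |
| 4 | 180 | Black | - |
| 5 | 160 | Black | - |
| 6 | 170 | Black | - |

Height and Hair are the attributes; Attraction is the decision.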
![Page 26: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/26.jpg)
For the attribute Height, we have the values 160, 170 and 180.
The pretreatment gives us two cut points: 165 and 175.
Cut points
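The slides do not say how the pretreatment picks its cut points. One simple rule that reproduces 165 and 175 is to take the midpoint between consecutive distinct values; the sketch below assumes that rule, and the function name `midpoint_cut_points` is made up for the illustration.

```python
def midpoint_cut_points(values):
    """Assumed rule: one cut point halfway between consecutive distinct values."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


heights = [160, 170, 160, 180, 160, 170]  # Height column of the data set
cuts = midpoint_cut_points(heights)
print(cuts)  # [165.0, 175.0]
```

Other discretization schemes (for example ones that only cut between cases of different classes) may give fewer cut points on other data.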
![Page 27: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/27.jpg)
[(Height, 160..165)] = {1,3,5}
[(Height, 165..180)] = {2,4,6}
[(Height, 160..175)] = {1,2,3,5,6}
[(Height, 175..180)] = {4}
[(Hair, Blond)] = {1,2}
[(Hair, Red)] = {3}
[(Hair, Black)] = {4,5,6}
Blocks [(a,v)]
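The blocks can be derived mechanically from the data set and the two cut points. The sketch below is one way to do it in Python; the dictionary layout and the helper name `build_blocks` are assumptions made for the illustration, and each cut point is taken to split the Height domain into a "below" block and an "at or above" block.

```python
# Data set from the illustration: case -> (height, hair, attraction).
DATA = {
    1: (160, "Blond", "-"), 2: (170, "Blond", "+"), 3: (160, "Red", "+"),
    4: (180, "Black", "-"), 5: (160, "Black", "-"), 6: (170, "Black", "-"),
}
CUTS = [165, 175]


def build_blocks(data, cuts):
    """Map each attribute-value pair to the set of cases it covers."""
    heights = [h for h, _, _ in data.values()]
    lo, hi = min(heights), max(heights)
    blocks = {}
    for c in cuts:
        # Two blocks per cut point: values below it and values at or above it.
        blocks[("Height", f"{lo}..{c}")] = {k for k, r in data.items() if r[0] < c}
        blocks[("Height", f"{c}..{hi}")] = {k for k, r in data.items() if r[0] >= c}
    for k, (_, hair, _) in data.items():
        blocks.setdefault(("Hair", hair), set()).add(k)
    return blocks


blocks = build_blocks(DATA, CUTS)
for pair, cases in blocks.items():
    print(pair, sorted(cases))
```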
![Page 28: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/28.jpg)
G = B = [(Attraction,-)] = {1,4,5,6}
Here there are no inconsistencies. If there were some, this is the point at which we would have to choose between the lower and the upper approximation.
First concept
![Page 29: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/29.jpg)
Pairs (a,v) such that [(a,v)] ∩ [(Attraction,-)] ≠ Ø:
- (Height,160..165)
- (Height,165..180)
- (Height,160..175)
- (Height,175..180)
- (Hair,Blond)
- (Hair,Black)
Eligible pairs
![Page 30: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/30.jpg)
We choose the most appropriate pair, that is, the pair (a,v) for which
| [(a,v)] ∩ [(Attraction,-)] |
is the largest, ties being broken by the smallest block. Here: (Hair, Black).
Choice of a pair
![Page 31: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/31.jpg)
The pair (Hair, Black) is a minimal complex because:
$[(\text{Hair},\text{Black})] \subseteq [(\text{Attraction},-)]$
Minimal complex
![Page 32: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/32.jpg)
B = [(Attraction,-)] – [(Hair,Black)]
= {1,4,5,6} - {4,5,6}
= {1}
New concept
![Page 33: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/33.jpg)
The candidates are the pairs (Height,160..165), (Height,160..175) and (Hair, Blond).
Since the intersections all have the same cardinality, we choose the pair whose block has the smallest cardinality:
(Hair, Blond)
Choice of a pair (1)
![Page 34: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/34.jpg)
Problem: (Hair, Blond) is not a minimal complex:
$[(\text{Hair},\text{Blond})] \not\subseteq [(\text{Attraction},-)]$
We therefore add the next pair: (Height,160..165).
Choice of a pair (2)
![Page 35: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/35.jpg)
{(Hair, Blond), (Height,160..165)} is a second minimal complex:
$[(\text{Hair},\text{Blond})] \cap [(\text{Height},160..165)] \subseteq [(\text{Attraction},-)]$
Minimal complex
![Page 36: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/36.jpg)
{{(Hair, Black)}, {(Hair, Blond), (Height, 160..165)}}
is a local cover of [(Attraction,-)].
End of the concept
![Page 37: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/37.jpg)
(Hair, Red) → (Attraction,+)
(Hair, Blond) & (Height,165..180) → (Attraction,+)
(Hair, Black) → (Attraction,-)
(Hair, Blond) & (Height,160..165) → (Attraction,-)
Rules
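The walkthrough can be replayed in code. The following is a compact sketch of LEM2 as described on the slides, not a reference implementation: the priority heuristic is left out (all pairs are treated as having equal priority), remaining ties are broken by pair name, and helper names such as `build_blocks`, `block_of` and `lem2` are invented for the example. The blocks are recomputed from the data set and the cut points 165 and 175.

```python
# Data set from the illustration: case -> (height, hair, attraction).
DATA = {
    1: (160, "Blond", "-"), 2: (170, "Blond", "+"), 3: (160, "Red", "+"),
    4: (180, "Black", "-"), 5: (160, "Black", "-"), 6: (170, "Black", "-"),
}
CUTS = [165, 175]  # cut points given by the pretreatment


def build_blocks(data, cuts):
    """Map each attribute-value pair to the set of cases it covers."""
    heights = [h for h, _, _ in data.values()]
    lo, hi = min(heights), max(heights)
    blocks = {}
    for c in cuts:
        blocks[("Height", f"{lo}..{c}")] = {k for k, r in data.items() if r[0] < c}
        blocks[("Height", f"{c}..{hi}")] = {k for k, r in data.items() if r[0] >= c}
    for k, (_, hair, _) in data.items():
        blocks.setdefault(("Hair", hair), set()).add(k)
    return blocks


def block_of(T, blocks):
    """[T]: intersection of the blocks of all pairs in T."""
    return set.intersection(*(blocks[t] for t in T))


def lem2(blocks, B):
    """Return a local cover of B as a list of minimal complexes."""
    G, cover = set(B), []
    while G:
        T = []
        while not T or not block_of(T, blocks) <= B:
            candidates = [t for t in blocks if t not in T and blocks[t] & G]
            # Largest intersection with G first, then smallest block.
            t = min(candidates,
                    key=lambda t: (-len(blocks[t] & G), len(blocks[t]), t))
            T.append(t)
            G = blocks[t] & G
        # Drop pairs that are not needed, so that T is a minimal complex.
        for t in list(T):
            rest = [x for x in T if x != t]
            if rest and block_of(rest, blocks) <= B:
                T = rest
        cover.append(T)
        G = B - set().union(*(block_of(C, blocks) for C in cover))
    # Drop complexes that are not needed, so that the cover is minimal.
    for C in list(cover):
        rest = [x for x in cover if x is not C]
        if rest and set().union(*(block_of(R, blocks) for R in rest)) == B:
            cover = rest
    return cover


blocks = build_blocks(DATA, CUTS)
for decision in ("+", "-"):
    concept = {k for k, r in DATA.items() if r[2] == decision}
    for T in lem2(blocks, concept):
        conditions = " & ".join(f"({a}, {v})" for a, v in T)
        print(f"{conditions} -> (Attraction, {decision})")
```

On this data set the sketch yields the same four rules as the slide. With inconsistent data, `B` would have to be a lower or upper approximation rather than the raw concept.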
![Page 38: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/38.jpg)
Contents
1. Rough Sets Theory
2. Logical Analysis of Data
3. Comparison
4. Inconsistencies
![Page 39: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/39.jpg)
LAD works on binary data.
The boolean approach is then extended to the non-binary case.
Principle
![Page 40: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/40.jpg)
Let S be the set of all observations.
Each observation is described by n attributes.
Each observation belongs to a class.
Definitions (1)
![Page 41: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/41.jpg)
The classification can be considered as a partition of S into two sets, $S^+$ and $S^-$.
An archive $(S^+, S^-)$ is represented by a boolean function Φ: $\Phi : S \to \{0,1\}$
Definitions (2)
![Page 42: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/42.jpg)
A literal is a boolean variable or its negation: $x_i$ or $\bar{x}_i$
A term is a conjunction of literals: $x_1 x_2 x_3 = x_1 \wedge x_2 \wedge x_3$
The degree of a term is its number of literals.
Definitions (3)
![Page 43: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/43.jpg)
A term T covers a point $p \in \{0,1\}^n$ if T(p) = 1.
The characteristic term of a point p is the unique term of degree n covering p.
Ex: $(0,1,1,0) \to \bar{x}_1 x_2 x_3 \bar{x}_4$
Definitions (4)
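These two definitions translate directly into code. In the sketch below a term is represented as a dictionary from 0-based variable index to the value its literal requires (1 for $x_i$, 0 for $\bar{x}_i$); this representation and the function names are choices made for the illustration only.

```python
def covers(term, point):
    """T(p) = 1 exactly when every literal of the term is satisfied by p."""
    return all(point[i] == v for i, v in term.items())


def characteristic_term(point):
    """The unique term of degree n covering the point: one literal per variable."""
    return dict(enumerate(point))


p = (0, 1, 1, 0)
T = characteristic_term(p)  # {0: 0, 1: 1, 2: 1, 3: 0}, i.e. not-x1, x2, x3, not-x4
print(T, covers(T, p), covers(T, (1, 1, 1, 0)))
```

The degree of a term is simply `len(term)` in this representation.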
![Page 44: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/44.jpg)
A term T is an implicant of a boolean function f if T(p) ≤ f(p) for all $p \in \{0,1\}^n$.
An implicant is called prime if it is minimal with respect to its degree: no literal can be removed from it without losing the implicant property.
Definitions (5)
![Page 45: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/45.jpg)
A positive pattern is a term covering at least one positive example and no negative example; it is prime if no literal can be removed from it without covering a negative example.
A negative pattern is a term covering at least one negative example and no positive example; it is prime under the symmetric condition.
Definitions (6)
![Page 46: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/46.jpg)
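Example
| | $a_1$ | $a_2$ | $a_3$ |
|---|---|---|---|
| $S^+$ | 1 | 1 | 0 |
| $S^+$ | 0 | 1 | 0 |
| $S^+$ | 1 | 0 | 1 |
| $S^-$ | 1 | 0 | 0 |
| $S^-$ | 0 | 0 | 1 |
| $S^-$ | 0 | 0 | 0 |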
![Page 47: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/47.jpg)
$a_1 a_3$ is a positive pattern:
- there is no negative example such that $a_1 = a_3 = 1$;
- there is one positive example: the 3rd line.

It is a positive prime pattern:
- $a_1$ alone covers one negative example: the 4th line;
- $a_3$ alone covers one negative example: the 5th line.
Example
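The check made on this slide can be written as a small test. The sketch below assumes, as the remarks above imply, that the first three lines of the example table are positive and the last three negative; terms are dictionaries from 0-based attribute index to required value, and all names are illustrative.

```python
POSITIVE = [(1, 1, 0), (0, 1, 0), (1, 0, 1)]  # assumed S+ (lines 1 to 3)
NEGATIVE = [(1, 0, 0), (0, 0, 1), (0, 0, 0)]  # assumed S- (lines 4 to 6)


def covers(term, point):
    """A term maps attribute indices to the value its literal requires."""
    return all(point[i] == v for i, v in term.items())


def is_positive_pattern(term):
    """Covers at least one positive example and no negative example."""
    return (any(covers(term, p) for p in POSITIVE)
            and not any(covers(term, q) for q in NEGATIVE))


def is_prime(term):
    """Prime: a pattern that stops being one as soon as any literal is removed."""
    return is_positive_pattern(term) and not any(
        is_positive_pattern({k: v for k, v in term.items() if k != i})
        for i in term)


a1_a3 = {0: 1, 2: 1}
print(is_positive_pattern(a1_a3), is_prime(a1_a3))
print(is_positive_pattern({0: 1}), is_positive_pattern({2: 1}))
```

Checking single-literal removals is enough for primality here, because removing more literals can only make a term cover more points.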
![Page 48: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/48.jpg)
Pattern generation is symmetric for positive and negative patterns.
Two approaches:
- Top-down
- Bottom-up
Pattern generation
![Page 49: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/49.jpg)
We associate each positive example with its characteristic term, which is a pattern.
We then remove literals one by one until we obtain a prime pattern.
Top-down
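A minimal sketch of the top-down procedure, under the assumption that literals are tried for removal in index order (the slides do not fix an order, and a different order can lead to a different prime pattern). The negative examples are those of the three-attribute example; function names are illustrative.

```python
NEGATIVE = [(1, 0, 0), (0, 0, 1), (0, 0, 0)]  # negative examples of the small example


def covers(term, point):
    return all(point[i] == v for i, v in term.items())


def top_down(point, negatives):
    """Shrink the characteristic term of a positive point into a prime pattern."""
    term = dict(enumerate(point))  # characteristic term: degree n
    for i in list(term):
        trial = {k: v for k, v in term.items() if k != i}
        # Keep the removal only if the shorter term still avoids all negatives.
        if not any(covers(trial, q) for q in negatives):
            term = trial
    return term


print(top_down((1, 0, 1), NEGATIVE))  # {0: 1, 2: 1}
print(top_down((1, 1, 0), NEGATIVE))  # {1: 1}
```

A single pass suffices: a literal that could not be removed from a longer term cannot be removed from a shorter one either.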
![Page 50: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/50.jpg)
We begin with terms of degree one:
- if a term does not cover any negative example, it is a pattern;
- otherwise, we add literals until we obtain a pattern.
Bottom-up
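One way to sketch the bottom-up approach is a breadth-first enumeration of terms by increasing degree, skipping any term that merely extends a pattern already found. This is an illustrative variant, not necessarily the exact procedure behind the slides; the data are the positive and negative examples of the three-attribute example, and `max_degree` is a parameter introduced for the sketch.

```python
from itertools import combinations, product

POSITIVE = [(1, 1, 0), (0, 1, 0), (1, 0, 1)]
NEGATIVE = [(1, 0, 0), (0, 0, 1), (0, 0, 0)]


def covers(term, point):
    return all(point[i] == v for i, v in term.items())


def bottom_up(positives, negatives, max_degree):
    """Enumerate positive patterns of degree <= max_degree, shortest first."""
    n = len(positives[0])
    patterns = []
    for degree in range(1, max_degree + 1):
        for idxs in combinations(range(n), degree):
            for vals in product((0, 1), repeat=degree):
                term = dict(zip(idxs, vals))
                # A term containing an already found pattern cannot be prime.
                if any(p.items() <= term.items() for p in patterns):
                    continue
                if (any(covers(term, p) for p in positives)
                        and not any(covers(term, q) for q in negatives)):
                    patterns.append(term)
    return patterns


found = bottom_up(POSITIVE, NEGATIVE, max_degree=2)
print(found)
```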
![Page 51: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/51.jpg)
We prefer short patterns → simplicity principle.
We also want to cover as many examples as possible with a single pattern → globality principle.
Hence a hybrid bottom-up / top-down approach.
Objectives
![Page 52: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/52.jpg)
Hybrid approach
We fix a degree D.
We start with a bottom-up approach to generate the patterns of degree lower than or equal to D.
For all the points that are not covered after this first phase, we proceed with the top-down approach.
![Page 53: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/53.jpg)
Extension from the binary case: binarization.
Two types of data:
- quantitative: age, height, …
- qualitative: color, shape, …
Extension to the non-binary case
![Page 54: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/54.jpg)
With each value v that a qualitative attribute x can take, we associate a boolean variable b(x,v):
- b(x,v) = 1 if x = v
- b(x,v) = 0 otherwise
Qualitative data
![Page 55: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/55.jpg)
For quantitative data, there are two types of associated variables:
- Level variables
- Interval variables
Quantitative data
![Page 56: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/56.jpg)
For each attribute x and each cut point t, we introduce a boolean variable b(x,t):
- b(x,t) = 1 if x ≥ t
- b(x,t) = 0 if x < t
Level variables
![Page 57: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/57.jpg)
For each attribute x and each pair of cut points t’, t’’ (t’ < t’’), we introduce a boolean variable b(x,t’,t’’):
- b(x,t’,t’’) = 1 if t’ ≤ x < t’’
- b(x,t’,t’’) = 0 otherwise
Interval variables
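Both kinds of variables are one-line predicates. The sketch below applies them to the seven values 1, 4, 2, 4, 3, 2, 4 with cut points 1.5, 2.5 and 3.5, borrowed from the first attribute of the example in this section; the function names are illustrative.

```python
from itertools import combinations


def level_variable(x, t):
    """b(x,t) = 1 if x >= t, else 0."""
    return int(x >= t)


def interval_variable(x, t1, t2):
    """b(x,t',t'') = 1 if t' <= x < t'', else 0."""
    return int(t1 <= x < t2)


x1 = [1, 4, 2, 4, 3, 2, 4]
cuts = [1.5, 2.5, 3.5]

levels = [[level_variable(x, t) for t in cuts] for x in x1]
# Pairs of cut points in the order (1.5, 2.5), (1.5, 3.5), (2.5, 3.5).
intervals = [[interval_variable(x, a, b) for a, b in combinations(cuts, 2)]
             for x in x1]

for x, lv, iv in zip(x1, levels, intervals):
    print(x, lv, iv)
```

With k cut points this produces k level variables and k(k-1)/2 interval variables per attribute.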
![Page 58: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/58.jpg)
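Example
| | Class | $x_1$ | $x_2$ | $x_3$ | $x_4$ |
|---|---|---|---|---|---|
| a | $S^+$ | 1 | green | yes | 31 |
| b | $S^+$ | 4 | blue | no | 29 |
| c | $S^+$ | 2 | blue | yes | 20 |
| d | $S^+$ | 4 | red | no | 22 |
| e | $S^-$ | 3 | red | yes | 20 |
| f | $S^-$ | 2 | green | no | 14 |
| g | $S^-$ | 4 | green | no | 7 |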
![Page 59: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/59.jpg)
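Example
Level variables for $x_1$: $b_1 : x_1 \geq 1.5$, $b_2 : x_1 \geq 2.5$, $b_3 : x_1 \geq 3.5$.
| | Class | $x_1$ | $b_1$ | $b_2$ | $b_3$ |
|---|---|---|---|---|---|
| a | $S^+$ | 1 | 0 | 0 | 0 |
| b | $S^+$ | 4 | 1 | 1 | 1 |
| c | $S^+$ | 2 | 1 | 0 | 0 |
| d | $S^+$ | 4 | 1 | 1 | 1 |
| e | $S^-$ | 3 | 1 | 1 | 0 |
| f | $S^-$ | 2 | 1 | 0 | 0 |
| g | $S^-$ | 4 | 1 | 1 | 1 |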
![Page 60: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/60.jpg)
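Example
Variables for $x_2$: $b_4 : x_2 = \text{green}$, $b_5 : x_2 = \text{blue}$, $b_6 : x_2 = \text{red}$.
| | Class | $x_2$ | $b_4$ | $b_5$ | $b_6$ |
|---|---|---|---|---|---|
| a | $S^+$ | green | 1 | 0 | 0 |
| b | $S^+$ | blue | 0 | 1 | 0 |
| c | $S^+$ | blue | 0 | 1 | 0 |
| d | $S^+$ | red | 0 | 0 | 1 |
| e | $S^-$ | red | 0 | 0 | 1 |
| f | $S^-$ | green | 1 | 0 | 0 |
| g | $S^-$ | green | 1 | 0 | 0 |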
![Page 61: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/61.jpg)
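Example
Variable for $x_3$: $b_7 : x_3 = \text{yes}$.
| | Class | $x_3$ | $b_7$ |
|---|---|---|---|
| a | $S^+$ | yes | 1 |
| b | $S^+$ | no | 0 |
| c | $S^+$ | yes | 1 |
| d | $S^+$ | no | 0 |
| e | $S^-$ | yes | 1 |
| f | $S^-$ | no | 0 |
| g | $S^-$ | no | 0 |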
![Page 62: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/62.jpg)
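Example
Level variables for $x_4$: $b_8 : x_4 \geq 17$, $b_9 : x_4 \geq 21$.
| | Class | $x_4$ | $b_8$ | $b_9$ |
|---|---|---|---|---|
| a | $S^+$ | 31 | 1 | 1 |
| b | $S^+$ | 29 | 1 | 1 |
| c | $S^+$ | 20 | 1 | 0 |
| d | $S^+$ | 22 | 1 | 1 |
| e | $S^-$ | 20 | 1 | 0 |
| f | $S^-$ | 14 | 0 | 0 |
| g | $S^-$ | 7 | 0 | 0 |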
![Page 63: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/63.jpg)
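Example
Interval variables for $x_1$: $b_{10} : 1.5 \leq x_1 < 2.5$, $b_{11} : 1.5 \leq x_1 < 3.5$, $b_{12} : 2.5 \leq x_1 < 3.5$.
| | Class | $x_1$ | $b_{10}$ | $b_{11}$ | $b_{12}$ |
|---|---|---|---|---|---|
| a | $S^+$ | 1 | 0 | 0 | 0 |
| b | $S^+$ | 4 | 0 | 0 | 0 |
| c | $S^+$ | 2 | 1 | 1 | 0 |
| d | $S^+$ | 4 | 0 | 0 | 0 |
| e | $S^-$ | 3 | 0 | 1 | 1 |
| f | $S^-$ | 2 | 1 | 1 | 0 |
| g | $S^-$ | 4 | 0 | 0 | 0 |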
![Page 64: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/64.jpg)
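Example
Interval variable for $x_4$: $b_{13} : 17 \leq x_4 < 21$.
| | Class | $x_4$ | $b_{13}$ |
|---|---|---|---|
| a | $S^+$ | 31 | 0 |
| b | $S^+$ | 29 | 0 |
| c | $S^+$ | 20 | 1 |
| d | $S^+$ | 22 | 0 |
| e | $S^-$ | 20 | 1 |
| f | $S^-$ | 14 | 0 |
| g | $S^-$ | 7 | 0 |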
![Page 65: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/65.jpg)
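Example
| | $b_1$ | $b_2$ | $b_3$ | $b_4$ | $b_5$ | $b_6$ | $b_7$ | $b_8$ | $b_9$ | $b_{10}$ | $b_{11}$ | $b_{12}$ | $b_{13}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| b | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| c | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| d | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| e | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| f | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| g | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |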
![Page 66: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/66.jpg)
A set of binary attributes is called a supporting set if the archive obtained by eliminating all the other attributes remains "contradiction-free".
A supporting set is irredundant if no proper subset of it is a supporting set.
Supporting set
![Page 67: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/67.jpg)
With each attribute $b_i$ we associate a variable $y_i$ such that $y_i = 1$ if the attribute belongs to the supporting set.
Application: elements a and e differ on attributes 1, 2, 4, 6, 9, 11, 12 and 13, so at least one of these attributes must be kept:
$y_1 + y_2 + y_4 + y_6 + y_9 + y_{11} + y_{12} + y_{13} \geq 1$
Variables
![Page 68: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/68.jpg)
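We do the same for all pairs of true and false observations:
$\sum_{i \in I(p',p'')} y_i \geq 1 \quad \forall p' \in S^+,\ p'' \in S^-$
where $I(p',p'')$ is the set of attributes on which p′ and p″ differ.
There is an exponential number of solutions; we choose the smallest set:
$\min \sum_{i=1}^{q} y_i$
Linear program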
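For a table as small as the one in this example, the covering problem can be explored without a solver. The sketch below swaps the linear program for a brute-force search over attribute subsets of increasing size, which is only practical for small q. It also assumes that observations a to d are the positive ones and e to g the negative ones, a split that is inferred from the cut points and patterns of the example; the names `ROWS`, `is_supporting` and `smallest_supporting_set` are illustrative.

```python
from itertools import combinations

# Binarized archive: one string of 13 bits (b1..b13) per observation.
ROWS = {
    "a": "0001001110000", "b": "1110100110000", "c": "1000101101101",
    "d": "1110010110000", "e": "1100011100111", "f": "1001000001100",
    "g": "1111000000000",
}
POSITIVE, NEGATIVE = "abcd", "efg"  # assumed class split


def is_supporting(cols):
    """True if no positive and negative observation agree on all kept columns."""
    def project(name):
        return tuple(ROWS[name][c] for c in cols)
    return not ({project(r) for r in POSITIVE} & {project(r) for r in NEGATIVE})


def smallest_supporting_set(q=13):
    """Brute force: try subsets by increasing size, return 1-based indices."""
    for size in range(1, q + 1):
        for cols in combinations(range(q), size):
            if is_supporting(cols):
                return [c + 1 for c in cols]
    return None


best = smallest_supporting_set()
print(best)
print(is_supporting((6, 8, 9)))  # columns b7, b9, b10
```

Each pair (p′, p″) of the constraint system corresponds here to requiring that the two projections differ on at least one kept column.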
![Page 69: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/69.jpg)
Positive patterns:
- $x_4 \geq 21$
- $x_3 = \text{yes}$ and $1.5 \leq x_1 < 2.5$

Negative patterns:
- $x_3 = \text{no}$ and $x_4 < 21$
- $x_3 = \text{no}$ and $1.5 \leq x_1 < 2.5$
- $x_4 < 21$ and ($x_1 < 1.5$ or $x_1 \geq 2.5$)
Solution of our example
![Page 70: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/70.jpg)
Contents
1. Rough Sets Theory
2. Logical Analysis of Data
3. Comparison
4. Inconsistencies
![Page 71: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/71.jpg)
LAD is more flexible than RST.
Its linear program allows the parameters to be modified.
Basic idea
![Page 72: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/72.jpg)
RST: (attribute, value) pairs.
LAD: binary variables.
Is there a correspondence?
Comparison: blocks / variables
![Page 73: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/73.jpg)
For an attribute a taking the values $v_1, v_2, v_3, \ldots$:
| RST | LAD |
|---|---|
| $(a, v_1)$ | $b_1 : a = v_1$ |
| $(a, v_2)$ | $b_2 : a = v_2$ |
| $(a, v_3)$ | $b_3 : a = v_3$ |

Qualitative data
![Page 74: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/74.jpg)
Discretization: convert numerical data into discrete data.
Principle: determine cut points that divide each domain into successive intervals:
$v_{\min} < p_1 < p_2 < \ldots < v_{\max}$
Quantitative data
![Page 75: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/75.jpg)
RST: for each cut point, we have two blocks:
- $(a, v_{\min}..p_1)$ and $(a, p_1..v_{\max})$
- $(a, v_{\min}..p_2)$ and $(a, p_2..v_{\max})$
Quantitative data
![Page 76: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/76.jpg)
LAD: for each cut point, we have a level variable:
- $b_1 : a \geq p_1$
- $b_2 : a \geq p_2$
- $b_3 : a \geq p_3$
- ...
Quantitative data
![Page 77: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/77.jpg)
LAD: for each pair of cut points, we have an interval variable:
- $b_{1;2} : p_1 \leq a < p_2$
- $b_{1;3} : p_1 \leq a < p_3$
- $b_{2;3} : p_2 \leq a < p_3$
- ...
Quantitative data
![Page 78: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/78.jpg)
Correspondence:
Level variable: $b_i : a \geq p_i$
- $b_i = 1 \Leftrightarrow (a, p_i..v_{\max})$
- $b_i = 0 \Leftrightarrow (a, v_{\min}..p_i)$
Quantitative data
![Page 79: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/79.jpg)
Quantitative data
Correspondence:
Interval variable: $b_{i;j} : p_i \leq a < p_j$
- $b_{i;j} = 1 \Leftrightarrow (a, p_i..v_{\max})$ AND $(a, v_{\min}..p_j)$
- $b_{i;j} = 0 \Leftrightarrow (a, v_{\min}..p_i)$ OR $(a, p_j..v_{\max})$
![Page 80: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/80.jpg)
Three parameters can change:
- the right-hand side of the constraints (one per constraint j);
- the coefficients of the objective function: $u_i$;
- the coefficients of the left-hand side of the constraints: $c_i^j$.
Variation of LP parameters
![Page 81: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/81.jpg)
We try to adapt the three heuristics:
- the highest priority;
- the largest intersection with the concept;
- the smallest cardinality.
Heuristics adaptation
![Page 82: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/82.jpg)
Priority on blocks → priority on attributes.
The priorities are introduced as weights in the objective function.
Minimization: the pairs with the highest priorities are chosen first.
The highest priority
![Page 83: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/83.jpg)
Problem: LAD has no notion of concept; everything is done symmetrically, at the same time.
The highest intersection
![Page 84: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/84.jpg)
Modification of the heuristic: use the difference between the intersection with one concept and the intersection with the other.
The higher, the better.
The highest intersection
![Page 85: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/85.jpg)
Goal of RST: find minimal complexes:
- find blocks covering the most examples of the concept: largest possible intersection with the concept;
- find blocks covering the fewest examples of the other concept: difference of intersections.
The highest intersection
![Page 86: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/86.jpg)
For LAD: difference between the number of times a variable takes the value 1 in $S^+$ and in $S^-$.
It is introduced as weights in the constraints: we choose first the variable with the highest difference.
The highest intersection
![Page 87: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/87.jpg)
Simple: the number of times a variable takes the value 1.
It is introduced as a weight in the constraints.
The smallest cardinality
![Page 88: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/88.jpg)
Two calculations are to be introduced:
- the highest difference;
- the smallest cardinality.

They are combined as the difference of the two calculations.
Weight of the constraints
![Page 89: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/89.jpg)
Before: every right-hand side is 1.
Problem: modifying the weights on the left-hand side alone has no meaning.
Right-hand side of the constraints
![Page 90: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/90.jpg)
Average of $c_{ij}$ compared to the number of attributes.
Average of $c_{ij}$ in each constraint.
Drawback: no real meaning.
Ideas of modification
![Page 91: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/91.jpg)
Do not touch the weights in the constraints: put everything into the coefficients of the objective function.
Priority of $u_i$: combines (nb of 1 in $S^+$), (nb of 1 in $S^-$) and the cardinality.
Ideas of modification
![Page 92: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/92.jpg)
Contents
1. Rough Sets Theory
2. Logical Analysis of Data
3. Comparison
4. Inconsistencies
![Page 93: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/93.jpg)
Use of two approximations: lower and upper.
Rule generation: certain rules (from the lower approximation) and possible rules (from the upper approximation).
For RST
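As an illustration of the two approximations, here is a small sketch using the standard rough-set definitions: objects with identical attribute values form an indiscernibility block; the lower approximation is the union of the blocks fully inside the concept, and the upper approximation is the union of the blocks that intersect it. The function name and the toy table are hypothetical.

```python
from collections import defaultdict

def approximations(objects, concept):
    """objects: dict name -> tuple of attribute values.
    concept: set of object names belonging to the class."""
    blocks = defaultdict(set)
    for name, values in objects.items():
        blocks[values].add(name)  # identical descriptions share a block

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= concept:      # block entirely inside the concept
            lower |= block
        if block & concept:       # block touches the concept
            upper |= block
    return lower, upper

# Toy decision table: x1 and x2 are indiscernible but only x1 is in the concept.
objects = {
    "x1": ("high", "yes"),
    "x2": ("high", "yes"),
    "x3": ("low", "no"),
    "x4": ("low", "yes"),
}
concept = {"x1", "x4"}

lower, upper = approximations(objects, concept)
print(sorted(lower), sorted(upper))
```

Certain rules would be induced from `lower`, possible rules from `upper`; the inconsistent pair `x1`/`x2` appears only in the upper approximation.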
![Page 94: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/94.jpg)
Classification mistakes: a positive point classified as negative, or the other way around.
Two different cases.
For LAD
![Page 95: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/95.jpg)
All other points are well classified: our point will not be covered.
If the number of non-covered points is high: generation of longer patterns.
If this number is small: erroneous classification, and we ignore these points from then on.
Pos. point classified as neg.
![Page 96: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/96.jpg)
Terms covering a lot of positive points also cover some negative points.
These negative points are probably wrongly classified: they are not taken into account when evaluating candidate terms.
Neg. point classified as pos.
![Page 97: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/97.jpg)
We introduce a ratio.
A term is still a candidate if the ratio between the negative and positive points it covers is smaller than:
$\frac{|S^-|}{|S^+|}$
Ratio
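A rough sketch of the ratio test, assuming a term is a conjunction of literals stored as a dict mapping a variable index to its required value. The threshold is passed in as `max_ratio` because its exact value is an assumption here; `covers`, `is_candidate` and the sample points are hypothetical names and data.

```python
def covers(term, point):
    """True if the point satisfies every literal of the term."""
    return all(point[i] == value for i, value in term.items())

def is_candidate(term, positives, negatives, max_ratio):
    """Keep a term if (covered negatives) / (covered positives) < max_ratio."""
    n_pos = sum(covers(term, p) for p in positives)
    n_neg = sum(covers(term, q) for q in negatives)
    if n_pos == 0:
        return False  # a term covering no positive point is of no use
    return n_neg / n_pos < max_ratio

# Hypothetical binarized data.
positives = [(1, 0, 1), (1, 1, 0), (1, 0, 0), (0, 1, 1)]
negatives = [(1, 1, 1), (0, 0, 1)]

keep_first = is_candidate({0: 1}, positives, negatives, max_ratio=0.5)
keep_third = is_candidate({2: 1}, positives, negatives, max_ratio=0.5)
print(keep_first, keep_third)
```

The term "variable 0 equals 1" covers three positive points and one negative point, so it stays a candidate; the term "variable 2 equals 1" covers as many negative as positive points and is rejected.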
![Page 98: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/98.jpg)
An inconsistency can be considered a classification mistake.
Inconsistency: two "identical" objects classified differently.
One of them is wrongly classified (approximations).
Inconsistencies and mistakes
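To make the definition concrete, the following sketch groups objects by their attribute values and reports every group whose members fall into more than one class. The data layout (name, attribute tuple, class label) and the function name are assumptions made for this example.

```python
from collections import defaultdict

def find_inconsistencies(rows):
    """rows: iterable of (name, attributes, label).
    Returns the groups of identically described objects that carry
    more than one class label."""
    names = defaultdict(list)
    labels = defaultdict(set)
    for name, attributes, label in rows:
        names[attributes].append(name)
        labels[attributes].add(label)
    return [names[a] for a in names if len(labels[a]) > 1]

# Hypothetical data: p1 and p2 are identical but classified differently.
rows = [
    ("p1", (1, 0), "C1"),
    ("p2", (1, 0), "C2"),
    ("p3", (0, 1), "C1"),
]

inconsistent = find_inconsistencies(rows)
print(inconsistent)
```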
![Page 99: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/99.jpg)
Let us consider an inconsistency in LAD: two points $p_1$ and $p_2$, two classes $C_1$ and $C_2$.
There are two possibilities:
$p_1$ is not covered by small-degree patterns.
$p_2$ is covered by patterns of $C_1$.
Equivalence?
![Page 100: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/100.jpg)
We have only one inconsistency.
The covered point is isolated; it is not taken into account.
Patterns of $C_1$ will be generated without the inconsistent point
-> lower approximation
1st case
![Page 101: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/101.jpg)
A point covered by the other concept's patterns is wrongly classified.
It is not taken into account for the candidate terms.
It is not taken into account for the pattern generation of $C_2$
-> lower approximation
2nd case
![Page 102: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/102.jpg)
Not taken into account for $C_2$, but not a problem for $C_1$.
For $C_1$: upper approximation
2nd case
![Page 103: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/103.jpg)
According to a ratio, LAD decides whether a point is well classified or not.
For an inconsistency, this is the same as considering:
the upper approximation of one class,
the lower approximation of the other.
With more than one inconsistency: we re-classify the points.
Equivalence?
![Page 104: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/104.jpg)
Conclusion
Complete data: we can try to match LAD and RST.
Inconsistencies: classification mistakes in LAD can correspond to approximations in RST.
Missing data: handled differently by the two approaches.
![Page 105: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/105.jpg)
Jerzy W. Grzymala-Busse, MLEM2 - Discretization During Rule Induction, Proceedings of the IIPWM'2003, International Conference on Intelligent Information Processing and WEB Mining Systems, Zakopane, Poland, June 2-5, 2003, 499-508. Springer-Verlag.
Jerzy W. Grzymala-Busse, Jerzy Stefanowski, Three Discretization Methods for Rule Induction, International Journal of Intelligent Systems, 2001.
Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Eddy Mayoraz, Ilya Muchnik, An Implementation of Logical Analysis of Data, RUTCOR Research Report 22-96, 1996.
Sources (1)
![Page 106: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/106.jpg)
Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Logical Analysis of Numerical Data, RUTCOR Research Report 04-97, 1997.
Jerzy W. Grzymala-Busse, Rough Set Strategies to Data with Missing Attribute Values, Proceedings of the Workshop on Foundation and New Directions in Data Mining, Melbourne, FL, USA, 2003.
Jerzy W. Grzymala-Busse, Sachin Siddhaye, Rough Set Approaches to Rule Induction from Incomplete Data, Proceedings of the IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, July 4, 2004, vol. 2, 923-930.
Sources (2)
![Page 107: Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649d6b5503460f94a4a5a2/html5/thumbnails/107.jpg)
Jerzy Stefanowski, Daniel Vanderpooten, Induction of Decision Rules in Classification and Discovery-Oriented Perspectives, International Journal of Intelligent Systems, 16 (1), 2001, 13-28.
Jerzy Stefanowski, The Rough Set based Rule Induction Technique for Classification Problems, Proceedings of 6th European Conference on Intelligent Techniques and Soft Computing EUFIT 98, Aachen, 7-10 Sept. 1998, 109-113.
Roman Slowinski, Jerzy Stefanowski, Salvatore Greco, Benedetto Matarazzo, Rough Sets Processing of Inconsistent Information in Decision Analysis, Control and Cybernetics 29, 379-404, 2000.
Sources (3)