Fuzzy-rough data mining
Transcript of Fuzzy-rough data mining
Fuzzy-rough data mining
Richard Jensen
Advanced Reasoning Group, University of Aberystwyth
[email protected]
http://users.aber.ac.uk/rkj
Outline
• Knowledge discovery process
• Fuzzy-rough methods
  – Feature selection and extensions
  – Instance selection
  – Classification/prediction
  – Semi-supervised learning
Knowledge discovery
• The process
• The problem of too much data
  – Requires storage
  – Intractable for data mining algorithms
  – Noisy or irrelevant data is misleading/confounding
Feature Selection
Feature selection
• Why dimensionality reduction/feature selection?
• Growth of information - need to manage this effectively
• Curse of dimensionality - a problem for machine learning and data mining
• Data visualisation - graphing data
[Diagram: high-dimensional data is intractable for the processing system; dimensionality reduction produces low-dimensional data that the processing system can handle.]
Why do it?
• Case 1: We’re interested in features
  – We want to know which are relevant
  – If we fit a model, it should be interpretable
• Case 2: We’re interested in prediction
  – Features are not interesting in themselves
  – We just want to build a good classifier (or other kind of predictor)
Feature selection process
• Feature selection (FS) preserves data semantics by selecting rather than transforming
• Subset generation: forwards, backwards, random…
• Evaluation function: determines ‘goodness’ of subsets
• Stopping criterion: decide when to stop subset search
[Diagram: the feature set feeds subset generation; each candidate subset’s suitability is assessed by the evaluation function; the loop continues until the stopping criterion is met, after which the selected subset is validated.]
Fuzzy-rough feature selection
Fuzzy-rough set theory
• Problems:
  – Rough set methods (usually) require data discretization beforehand
  – Extensions, e.g. tolerance rough sets, require thresholds
  – Also no flexibility in approximations
    • E.g. objects either belong fully to the lower (or upper) approximation, or not at all
Fuzzy-rough sets
[Figure: definitions of the fuzzy-rough lower approximation (built with an implicator) and upper approximation (built with a t-norm), shown alongside the corresponding crisp rough set definitions.]
Fuzzy-rough feature selection
• Based on fuzzy similarity
• Lower/upper approximations

Fuzzy similarity (e.g.):

$$R_a(x,y) = 1 - \frac{|a(x) - a(y)|}{|a_{\max} - a_{\min}|}, \qquad R_P(x,y) = \mathcal{T}_{a \in P}\,\{R_a(x,y)\}$$
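As a concrete illustration of this similarity relation (not part of the original slides), here is a minimal Python sketch using min as the t-norm; the function and variable names are invented for the example, and the attribute ranges are taken from the data.

```python
import numpy as np

def attribute_similarity(ax, ay, a_min, a_max):
    """R_a(x, y) = 1 - |a(x) - a(y)| / |a_max - a_min| for a single attribute."""
    rng = abs(a_max - a_min)
    return 1.0 if rng == 0 else 1.0 - abs(ax - ay) / rng

def fuzzy_similarity(x, y, mins, maxs):
    """R_P(x, y): combine the per-attribute similarities with a t-norm (min here)."""
    return min(attribute_similarity(xa, ya, lo, hi)
               for xa, ya, lo, hi in zip(x, y, mins, maxs))

# Toy data: three objects described by two numerical attributes.
X = np.array([[0.2, 5.0], [0.4, 7.0], [0.9, 1.0]])
mins, maxs = X.min(axis=0), X.max(axis=0)
print(fuzzy_similarity(X[0], X[1], mins, maxs))   # ~0.67
```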
FRFS: evaluation function
• Fuzzy positive region #1
• Fuzzy positive region #2 (weak)
• Dependency function
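The formulas themselves appear only in the slide images; for reference, the standard fuzzy-rough formulations from the literature (with implicator $\mathcal{I}$ and t-norm $\mathcal{T}$) are:

$$\mu_{R_P \downarrow X}(x) = \inf_{y \in \mathbb{U}} \mathcal{I}\big(\mu_{R_P}(x,y), \mu_X(y)\big), \qquad \mu_{R_P \uparrow X}(x) = \sup_{y \in \mathbb{U}} \mathcal{T}\big(\mu_{R_P}(x,y), \mu_X(y)\big)$$

$$\mu_{POS_{R_P}(Q)}(x) = \sup_{X \in \mathbb{U}/Q} \mu_{R_P \downarrow X}(x), \qquad \gamma'_P(Q) = \frac{\sum_{x \in \mathbb{U}} \mu_{POS_{R_P}(Q)}(x)}{|\mathbb{U}|}$$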
FRFS: finding reducts
• Fuzzy-rough QuickReduct
  – Evaluation: use the dependency function (or another fuzzy-rough measure)
  – Generation: greedy hill-climbing
  – Stopping criterion: when the maximal evaluation function value is reached (or to degree α)
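A minimal sketch of this greedy hill-climbing loop, assuming a `dependency(subset)` callable such as the γ′ measure above; the helper names and the α-scaled stopping rule are illustrative rather than the exact published algorithm.

```python
def quickreduct(features, dependency, alpha=1.0):
    """Greedy forward selection: repeatedly add the feature that gives the
    biggest increase in the fuzzy-rough dependency, stopping when no feature
    improves it or the (alpha-scaled) dependency of the full set is reached."""
    reduct, best = set(), dependency(set())
    target = alpha * dependency(set(features))
    while best < target:
        gains = {f: dependency(reduct | {f}) for f in set(features) - reduct}
        f_best, new_best = max(gains.items(), key=lambda kv: kv[1])
        if new_best <= best:              # no candidate improves the measure
            break
        reduct, best = reduct | {f_best}, new_best
    return reduct, best
```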
FRFS
• Other search methods
  – GAs, PSO, EDAs, Harmony Search, etc.
  – Backward elimination, plus-L minus-R, floating search, SAT, etc.
• Other subset evaluations
  – Fuzzy boundary region
  – Fuzzy entropy
  – Fuzzy discernibility function
Ant-based FS
Boundary region
[Diagram: a set X approximated via the equivalence classes [x]B: the lower approximation lies inside X, the upper approximation contains it, and the boundary region lies between the two.]
FRFS: boundary region
• The fuzzy lower and upper approximations define a fuzzy boundary region
• For each concept, minimise the boundary region
  – (also applicable to crisp RSFS)
• Results seem to show this is a more informed heuristic (but more computationally complex)
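One common way to write the fuzzy boundary region of a concept X (an assumption about the exact form used on the slide) is

$$\mu_{BND_{R_P}(X)}(x) = \mu_{R_P \uparrow X}(x) - \mu_{R_P \downarrow X}(x)$$

and the subset evaluation then aggregates these memberships over all objects and all decision concepts, with smaller totals indicating a better subset.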
Finding smallest reducts
• Usually too expensive to search exhaustively for reducts with minimal cardinality
• Reducts found via discernibility matrices through, e.g.:
  – Converting from CNF to DNF (expensive)
  – Hill-climbing search using clauses (non-optimal)
  – Other search methods - GAs etc. (non-optimal)
• SAT approach
  – Solve directly in SAT formulation
  – DPLL approach ensures optimal reducts
Fuzzy discernibility matrices
• Extension of the crisp approach
  – Previously, attributes had {0,1} membership to clauses
  – Now they have membership in [0,1]
• Fuzzy DMs can be used to find fuzzy-rough reducts
Formulation
• Fuzzy satisfiability
• In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
• For the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true
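One natural way to express this degree (an assumption, since the slide's own formula is only in the page image): if B is the set of variables currently assigned true and μC(a) is attribute a's membership in clause C, then the clause is satisfied to degree

$$SAT_B(C) = \mathcal{S}_{a \in B}\,\{\mu_C(a)\}$$

for some s-norm $\mathcal{S}$ (e.g. max); a subset is then sought that satisfies every clause to its maximum attainable degree.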
Example
DPLL algorithm
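The algorithm itself is shown only in the slide image; as a stand-in, here is a small branch-and-bound sketch for the crisp case (clauses represented as sets of attributes, and a reduct must intersect every clause). The fuzzy version additionally tracks clause satisfaction degrees.

```python
def smallest_reduct(clauses, chosen=frozenset(), best=None):
    """DPLL-style branch and bound: every (crisp) discernibility clause must
    contain at least one chosen attribute; branches that cannot beat the best
    reduct found so far are pruned."""
    if best is not None and len(chosen) >= len(best):
        return best                              # prune: cannot improve
    open_clauses = [c for c in clauses if not (c & chosen)]
    if not open_clauses:
        return set(chosen)                       # all clauses satisfied
    for a in sorted(open_clauses[0]):            # branch on the first open clause
        result = smallest_reduct(clauses, chosen | {a}, best)
        if result is not None and (best is None or len(result) < len(best)):
            best = result
    return best

clauses = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
print(smallest_reduct(clauses))                  # a smallest reduct, e.g. {'a', 'c'}
```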
Experimentation: results
FRFS: issues
• Problem – noise tolerance!
Vaguely quantified rough sets
• Pawlak rough set:
  – y belongs to the lower approximation of A iff all elements of Ry belong to A
  – y belongs to the upper approximation of A iff at least one element of Ry belongs to A
• VQRS:
  – y belongs to the lower approximation of A iff most elements of Ry belong to A
  – y belongs to the upper approximation of A iff at least some elements of Ry belong to A
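With fuzzy quantifiers Q_u ("most") and Q_l ("some"), the VQRS approximations are usually written along the following lines (the exact parametrisation used on the slide is not in the transcript):

$$\mu_{R \downarrow_{Q_u} A}(y) = Q_u\!\left(\frac{|R_y \cap A|}{|R_y|}\right), \qquad \mu_{R \uparrow_{Q_l} A}(y) = Q_l\!\left(\frac{|R_y \cap A|}{|R_y|}\right)$$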
VQRS-based feature selection
• Use the quantified lower approximation, positive region and dependency degree
  – Evaluation: the quantified dependency (can be crisp or fuzzy)
  – Generation: greedy hill-climbing
  – Stopping criterion: when the quantified positive region is maximal (or to degree α)
• Should be more noise-tolerant, but is non-monotonic
Progress
[Diagram: the progression from rough set theory (qualitative data) to fuzzy rough set theory (quantitative data) and on to VQRS, OWA-FRFS, fuzzy VPRS, ... (noisy data), with monotonicity also indicated.]
More issues...
• Problem #1: how to choose fuzzy similarity?
• Problem #2: how to handle missing values?
Interval-valued FRFS
• Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity
  – An IV fuzzy similarity relation yields an IV fuzzy-rough set
Interval-valued FRFS
• When comparing two object values for a given attribute – what to do if at least one is missing?
• Answer #2: Model missing values via the unit interval
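In other words (a sketch of the idea rather than the slide's exact formula): when a(x) or a(y) is unknown, nothing can be said about the similarity of x and y on that attribute, so the interval-valued similarity is taken to be maximally uncertain:

$$R_a(x,y) = [0, 1] \quad \text{if } a(x) \text{ or } a(y) \text{ is missing}$$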
Other measures
• Boundary region
• Discernibility function
Initial experimentation
[Diagram: the original dataset is split into cross-validation folds; one branch reduces the folds with type-1 FRFS and evaluates them with JRip, while the other first applies data corruption, reduces the folds with the IV-FRFS methods and evaluates them with JRip.]
Initial experimentation
Initial results: lower approx
Instance Selection
Instance selection: basic ideas
• Remove objects to keep the underlying approximations unchanged
[Figure: objects marked as not needed]
Instance selection: basic ideas
• Remove objects whose positive region membership is < 1
[Figure: noisy objects highlighted]
FRIS-I
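FRIS-I itself is shown only on the slide image; the following is a minimal Python sketch of the idea (keep instances whose fuzzy positive-region membership reaches a threshold τ), assuming NumPy arrays, the Kleene–Dienes implicator and a user-supplied similarity function. All names and defaults are illustrative.

```python
import numpy as np

def positive_region_memberships(X, y, similarity,
                                implicator=lambda a, b: max(1.0 - a, b)):
    """mu_POS(x): membership of each instance in the lower approximation of its
    own decision class, i.e. inf over all instances z of I(R(x, z), [z in class])."""
    n = len(X)
    pos = np.ones(n)
    for i in range(n):
        for j in range(n):
            same_class = 1.0 if y[i] == y[j] else 0.0
            pos[i] = min(pos[i], implicator(similarity(X[i], X[j]), same_class))
    return pos

def fris1(X, y, similarity, tau=1.0):
    """FRIS-I-style selection: keep instances whose membership is >= tau."""
    keep = positive_region_memberships(X, y, similarity) >= tau
    return X[keep], y[keep]
```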
FRIS-II
FRIS-III
Fuzzy rough instance selection
• Time complexity is a problem for FRIS-II and FRIS-III
• Less complex: fuzzy rough prototype selection
  – More on this later...
Fuzzy-rough classification and prediction
FRNN/VQNN
FRNN/VQNN
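A minimal sketch of FRNN-style classification (assumptions: Kleene–Dienes implicator, min t-norm, and scoring each class by the average of its lower and upper approximation memberships; this is a sketch of the idea rather than the published algorithm verbatim):

```python
import numpy as np

def frnn_classify(x, X_train, y_train, similarity,
                  implicator=lambda a, b: max(1.0 - a, b), t_norm=min):
    """Score each class by the mean of the fuzzy-rough lower and upper
    approximation memberships of x, and return the highest-scoring class."""
    best_class, best_score = None, -1.0
    for c in np.unique(y_train):
        lower, upper = 1.0, 0.0
        for xi, yi in zip(X_train, y_train):
            sim = similarity(x, xi)
            member = 1.0 if yi == c else 0.0
            lower = min(lower, implicator(sim, member))
            upper = max(upper, t_norm(sim, member))
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```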
Further developments
• FRNN and VQNN have limitations (for classification problems)
  – FRNN only uses one neighbour
  – VQNN is equivalent to FNN if the same similarity relation is used
• POSNN uses the positive region to also consider the quality of neighbours
  – E.g. instances in overlapping class regions are less interesting
  – More on this later...
Discovering rules via RST
• Equivalence classes
  – Form the antecedent part of a rule
  – The lower approximation tells us if this is predictive of a given concept (certain rules)
• Typically done in one of two ways:
  – Overlaying reducts
  – Building rules by considering individual equivalence classes (e.g. LEM2)
QuickRules framework
• The fuzzy tolerance classes used during this process can be used to create fuzzy rules
• When a reduct is found, the resulting rules cover all instances
[Diagram: the feature selection loop (generation, evaluation combined with rule induction, stopping criterion, validation), with the feature set feeding subset generation and each subset's suitability assessed before the search continues or stops.]
Harmony search approach
• R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction. Proceedings of the 21st International Conference on Fuzzy Systems, 2012.
Harmony search approach
Minimise (a – 2)² + (b – 3)⁴ + (c – 1)² + 3

| a | b | c | score |
|---|---|---|-------|
| 2 | 3 | 1 | 3 |
| 2 | 3 | 2 | 4 |
| 4 | 4 | 2 | 9 |
| 3 | 4 | 5 | 21 |

[Diagram: musicians improvise notes to form a harmony; harmonies are stored in the harmony memory and scored by a fitness function.]
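To make the toy example concrete, here is a minimal harmony search sketch in Python for the objective above; parameter values such as the memory size, HMCR and pitch-adjustment rate are illustrative.

```python
import random

def objective(a, b, c):
    """The toy objective from the slide: minimise (a-2)^2 + (b-3)^4 + (c-1)^2 + 3."""
    return (a - 2) ** 2 + (b - 3) ** 4 + (c - 1) ** 2 + 3

def harmony_search(n_iter=1000, memory_size=4, hmcr=0.9, par=0.3, domain=range(1, 6)):
    """Each 'musician' picks a note (value) either from the harmony memory or at
    random; the worst harmony in memory is replaced whenever a better one is found."""
    values = list(domain)
    memory = [[random.choice(values) for _ in range(3)] for _ in range(memory_size)]
    for _ in range(n_iter):
        new = []
        for var in range(3):
            if random.random() < hmcr:                    # take a note from memory
                note = random.choice(memory)[var]
                if random.random() < par:                 # small pitch adjustment
                    note = min(max(note + random.choice([-1, 1]), values[0]), values[-1])
            else:                                         # random note
                note = random.choice(values)
            new.append(note)
        worst = max(memory, key=lambda h: objective(*h))
        if objective(*new) < objective(*worst):
            memory[memory.index(worst)] = new
    return min(memory, key=lambda h: objective(*h))

best = harmony_search()
print(best, objective(*best))   # should approach [2, 3, 1] with score 3
```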
Key notion mapping

| Harmony Search | Hybrid Rule Induction | Numerical Optimisation |
|---|---|---|
| Harmony | Rule set | Solution |
| Fitness | Combined evaluation | Evaluation |
| Musician | Fuzzy rule rx | Variable |
| Note | Feature subset | Value |
Comparison vs QuickRules
• HarmonyRules: 56.33 ± 10.00
• QuickRules: 63.1 ± 11.89
• Rule cardinality distribution for the ‘web’ dataset (2556 features)
Fuzzy-rough semi-supervised learning
Semi-supervised learning (SSL)
• Lies somewhere between supervised and unsupervised learning
• Why use it?
  – Data is expensive to label/classify
  – Labels can also be difficult to obtain
  – Large amounts of unlabelled data are available
• When is SSL useful?
  – Small number of labelled objects but large number of unlabelled objects
Semi-supervised learning
• A number of methods for SSL – self-learning, generative models, etc.
  – Labelled data objects – usually small in number
  – Unlabelled data objects – usually large in number
  – A set of features describes the objects
  – The class label tells us only which class the labelled objects belong to
• SSL therefore attempts to learn labels (or structure) for data which has no labels
  – Labelled data provides ‘clues’ for the unlabelled data
Co-training
[Diagram: the labelled dataset is split over two feature subsets; learner 1 and learner 2 are trained on subset 1 and subset 2 respectively, and their predictions are used to label the unlabelled data.]
Self-learning
[Diagram: a learner trained on the labelled dataset makes predictions for the unlabelled data; the newly labelled data objects are added back to the labelled dataset.]
Fuzzy-rough self learning (FRSL)
• Basic idea is to propagate labels using the upper and lower approximations
  – Label only those objects which belong to the lower approximation of a class to a high degree
  – Can use the upper approximation to decide on ties
• Attempts to minimise mis-labelling and subsequent reinforcement
• Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE’11), pp. 2465–2471, 2011.
FRSL
[Diagram: the labelled dataset trains a fuzzy-rough learner, which makes predictions for the unlabelled data; objects whose lower approximation membership is 1 become labelled data objects (Yes branch), while the remainder stay in the unlabelled pool (No branch).]
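A rough sketch of this self-labelling loop, assuming a `lower_membership(x, c, X_lab, y_lab)` callable that returns the fuzzy-rough lower approximation membership of x in class c given the current labelled data; all names here are illustrative, and tie-breaking via the upper approximation is omitted.

```python
def frsl(X_lab, y_lab, X_unlab, lower_membership, threshold=1.0):
    """FRSL-style self-labelling: repeatedly label those unlabelled objects whose
    lower approximation membership for some class reaches the threshold."""
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    changed = True
    while changed and X_unlab:
        changed, remaining = False, []
        for x in X_unlab:
            scores = {c: lower_membership(x, c, X_lab, y_lab) for c in set(y_lab)}
            c_best, score = max(scores.items(), key=lambda kv: kv[1])
            if score >= threshold:
                X_lab.append(x)
                y_lab.append(c_best)
                changed = True
            else:
                remaining.append(x)
        X_unlab = remaining
    return X_lab, y_lab, X_unlab
```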
Experimentation (Problem 1)
SS-FCM
FNN
FRSL
Experimentation (Problem 2)
[Scatter plot legend: cluster 1, cluster 2, labelled 1, labelled 2]
SS-FCM
FNN
FRSL
Conclusion
• Looked at fuzzy-rough methods for data mining
  – Feature selection, finding optimal reducts
  – Handling missing values and other problems
  – Classification/prediction
  – Instance selection
  – Semi-supervised learning
• Future work
  – Imputation, better rule induction and instance selection methods, more semi-supervised methods, optimizations, instance/feature weighting
FR methods in Weka
• Weka implementations of all fuzzy-rough methods can be downloaded from: http://users.aber.ac.uk/rkj/book/wekafull.jar
• KEEL version available soon (hopefully!)