Ensemble with Neighbor Rules Voting
Itt Romneeyangkurn, Sukree Sinthupinyo
Faculty of Computer Science
Thammasat University
Outline
Introduction and Motivation
Preliminaries
- Decision Tree
- Ensemble of Decision Trees
- Simple Majority Rule
- Simple Majority Class
Experiment Process
- Making Rules into Groups
- Majority Rule+ and Majority Class+
Results of Experiment
Conclusions and Future Work
Introduction to Decision Trees
Decision trees are widely used in data mining and machine learning
Well-known algorithms: CART, C4.5, ID3, etc.
Advantages:
- Simple to understand and interpret
- Requires little data preparation
- Handles both numerical and categorical data
- Uses a white-box model
- Possible to validate a model using statistical tests
- Robust; performs well on large data in a short time
Introduction to Decision Trees
[Figure: an example decision tree. The root attribute Outlook branches on the values Sunny, Overcast, and Rain. Sunny leads to a Humidity node (High → No, Normal → Yes), Overcast leads directly to Yes, and Rain leads to a Wind node (Strong → No, Weak → Yes). Internal nodes are attributes, edges are attribute values, and leaves are classes.]
Decision Tree Ensemble
From a single classifier to an ensemble of classifiers
An ensemble is usually more accurate than an individual classifier
Examples: AdaBoost, Bagging, Random Forest, etc.
Decision Tree Ensemble
[Figure: individual classifier vs. ensemble of classifiers. Left: the original training set trains a single decision tree DT with rule set R. Right: training sets 1..n for trees 1..n, each drawn from the original training set, train decision trees DT1..DTn with rule sets R1..Rn.]
Bootstrapping
Bootstrap samples are also called replications
Created by uniformly sampling m times with replacement from a dataset of size m
Used to train multiple classifiers, e.g., CART, nearest-neighbor classifiers, C4.5, etc.
This work uses 10 bootstrap replications (see the sketch after the example below)
Bootstrapping
Example of Original Dataset
Original Dataset: 1,2,3,4,5,6,7,8
Example of Bootstrap Replications
1st Bootstrap: 2,7,8,3,7,6,3,1
2nd Bootstrap: 7,8,5,6,4,2,7,1
3rd Bootstrap: 3,6,2,7,5,6,2,2
4th Bootstrap: 4,5,1,4,6,4,3,8
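The slides give no code, but the replication process above is easy to sketch. A minimal illustrative example in Python (the function name and random seed are mine, not from the slides):

```python
import random

def bootstrap_replication(dataset):
    """Draw len(dataset) samples uniformly with replacement (one replication)."""
    return [random.choice(dataset) for _ in range(len(dataset))]

original = [1, 2, 3, 4, 5, 6, 7, 8]
random.seed(0)
for i in range(4):
    print(f"Bootstrap {i + 1}:", bootstrap_replication(original))
```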
Simple Majority Vote (Bagging)
Advantages:
- Improves classification accuracy
- Reduces variance
- Helps avoid over-fitting
Method (sketched in code below):
- Generate T bootstrap samples
- Train one classifier on each sample
- The majority vote among the resulting T decision trees is the final output
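A rough sketch of the method, not the authors' code: C4.5 is not available in scikit-learn, so this example substitutes the library's CART-style DecisionTreeClassifier.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, T=10, seed=0):
    """Train T trees on bootstrap samples; a simple majority vote picks each label."""
    rng = np.random.default_rng(seed)
    m = len(X_train)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)  # bootstrap sample, drawn with replacement
        trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
    votes = np.array([tree.predict(X_test) for tree in trees])  # shape (T, n_test)
    return [Counter(column).most_common(1)[0][0] for column in votes.T]

X, y = load_iris(return_X_y=True)
print(bagging_predict(X, y, X[:5]))
```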
Simple Majority Vote (Bagging)
[Figure: Simple Majority Vote example. Trees DT1..DT10 classify the test instance as A, A, A, B, B, A, A, B, A, A. With A = 7 votes and B = 3 votes, the simple majority vote outputs A.]
Simple Majority Class
Based on Bagging; it differs only in how votes are counted
Each of the T decision trees contributes the class counts of the training instances that match the rule classifying the test instance; the class with the largest total is the final output (see the sketch below)
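A minimal sketch of this voting scheme, assuming a decision tree's leaf stands in for a "rule" (scikit-learn's tree.apply maps an instance to its leaf; the helper name is mine):

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def majority_class_vote(trees, train_sets, x):
    """Each tree adds the class counts of the training instances that fall in
    the same leaf (rule) as x; the class with the largest total wins."""
    tally = Counter()
    for tree, (X_tr, y_tr) in zip(trees, train_sets):
        leaf = tree.apply(x.reshape(1, -1))[0]        # the rule that classifies x
        tally.update(y_tr[tree.apply(X_tr) == leaf])  # training data matching it
    return tally.most_common(1)[0][0]

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(["A", "A", "B", "B"])
tree = DecisionTreeClassifier().fit(X, y)
print(majority_class_vote([tree], [(X, y)], np.array([2.5])))  # -> 'B'
```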
Simple Majority Class
[Figure: Simple Majority Class example. For each of DT1..DT10, the rule that classifies the test instance is matched against the original training set, giving per-tree class counts such as A: 8, B: 3. Summing over all ten trees gives A = 69 and B = 28, so the Simple Majority Class output is A.]
Similarity Between Rules
Continuous attributes: use the overlap between the two rules' ranges
Discrete attributes: use the number of attribute values in common between both rules
$$\text{sim}_{\text{continuous}} = \frac{\text{overlap between the ranges of each rule}}{\text{ranges of each rule individually}}$$

$$\text{sim}_{\text{discrete}} = \frac{\text{attribute values shared by both rules}}{\text{average of the attribute values covered by the two rules individually}}$$
For more on similarity between rules, see "Bootstrapping Rule Induction to Achieve Rule Stability and Reduction"
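One plausible reading of the two measures above in Python (hedged: the exact definitions are in the cited paper; the normalizations here are my assumptions):

```python
def continuous_similarity(lo1, hi1, lo2, hi2):
    """Overlap of two rules' ranges, normalized by the average of the two
    individual range widths (an assumed reading of the slide's formula)."""
    overlap = max(0.0, min(hi1, hi2) - max(lo1, lo2))
    avg_width = ((hi1 - lo1) + (hi2 - lo2)) / 2.0
    return overlap / avg_width if avg_width > 0 else 0.0

def discrete_similarity(values1, values2):
    """Attribute values shared by both rules, normalized by the average
    number of values each rule covers individually."""
    shared = len(set(values1) & set(values2))
    avg_covered = (len(set(values1)) + len(set(values2))) / 2.0
    return shared / avg_covered if avg_covered > 0 else 0.0

print(continuous_similarity(0, 10, 5, 15))               # 0.5
print(discrete_similarity({"sunny", "rain"}, {"rain"}))  # ~0.67
```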
Experiment
Nine well-known benchmark data sets (see the table below)
Ten-fold cross-validation
Based on C4.5, run in default mode with pruning enabled
Data Set      | Instances | Attributes | Classes
Balance-Scale |       625 |          4 |       3
Bridges       |       105 |         10 |       6
Car           |     1,728 |          6 |       4
Dermatology   |       366 |         34 |       6
Hayes-Roth    |       132 |          4 |       3
Labor-Neg     |        40 |         16 |       2
Soybean       |       307 |         35 |      19
TAE           |       151 |          3 |       3
Zoo           |       101 |         16 |       7
Experiment
1. Generate 10 bootstrap samples of the original training set.
2. Generate 10 classifiers using C4.5.
3. Compute the similarity between the rules of all classifiers.
4. Make all rules into groups (sketched below).
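A sketch of the grouping step under a similarity threshold (the 0.8 threshold comes from the next slide; the interval-based similarity is only a stand-in):

```python
def group_rules(rules, similarity, threshold=0.8):
    """For each rule, collect the indices of every other rule whose
    similarity meets the threshold (a rule may join several groups)."""
    return {i: [j for j, s in enumerate(rules)
                if j != i and similarity(r, s) >= threshold]
            for i, r in enumerate(rules)}

def interval_similarity(a, b):
    """Toy stand-in: range overlap over the average range width."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / (((a[1] - a[0]) + (b[1] - b[0])) / 2.0)

rules = [(0, 10), (1, 10), (20, 30)]
print(group_rules(rules, interval_similarity))  # {0: [1], 1: [0], 2: []}
```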
Group of Rules with Similarity Value 0.8
[Figure: grouping rules across trees. Each tree DTi contributes rules R(i,1), R(i,2), ... Example similarities to R(1,1): R(1,2) = 0.2302, R(1,3) = -0.7495, R(2,1) = 0.9454, R(2,2) = 0.7382. With a similarity threshold of 0.8, the group of R(1,1) contains R(1,1), R(2,1), R(3,2), R(5,4), R(6,2), R(7,1), and R(10,3).]
Majority Rule+
Based on Bagging; it differs only in how votes are counted
The majority vote among the classes of the rule-members (the rules in the matching rule's group) is the final output (see the sketch below)
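A minimal sketch of the tally (illustrative only; the group labels in the usage line follow the example on the next slide):

```python
from collections import Counter

def majority_rule_plus(group_labels_per_tree):
    """Each rule in the matching rule's group votes with its own class;
    votes are summed over all trees and the majority class wins."""
    tally = Counter()
    for labels in group_labels_per_tree:
        tally.update(labels)
    return tally.most_common(1)[0][0]

# Group of Rule 2 from the example: classes A, A, B, A, B, B, B -> A=3, B=4
print(majority_rule_plus([["A", "A", "B", "A", "B", "B", "B"]]))  # -> 'B'
```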
Majority Rule+
[Figure: Majority Rule+ example. For the rule that classifies the test instance in each of DT1..DT10, every rule in its group votes with its own class. The group of Rule 2 of DT1, for instance, contains R(1,2), R(2,2), R(3,1), R(4,2), R(6,2), R(8,2), and R(9,1) with classes A, A, B, A, B, B, B, giving A = 3 and B = 4. Summing over all trees gives A = 27 and B = 36, so Majority Rule+ outputs B.]
Majority Class+
Based on Bagging; it differs only in how votes are counted
The majority vote among the classes of the training instances that match the rule-members is the final output (see the sketch below)
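The same tally as Majority Rule+, but over training-instance counts rather than one vote per rule (a sketch; the counts in the usage line mirror the example on the next slide):

```python
from collections import Counter

def majority_class_plus(group_counts_per_tree):
    """Each tree contributes, for every rule in the matching rule's group,
    the class counts of the training instances that rule matches; the
    class with the largest grand total wins."""
    tally = Counter()
    for counts in group_counts_per_tree:
        tally.update(counts)  # dict mapping class label -> matched instances
    return tally.most_common(1)[0][0]

# First two trees of the example: DT1 gives A: 2, B: 10; DT2 gives A: 2, B: 8
print(majority_class_plus([{"A": 2, "B": 10}, {"A": 2, "B": 8}]))  # -> 'B'
```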
Majority Class+
[Figure: Majority Class+ example. For each of DT1..DT10, every rule in the group of the rule that classifies the test instance is matched against the original training set, and the class counts of the matching instances are accumulated (e.g., A: 2, B: 10 for the first tree). Summing over all trees gives A = 33 and B = 92, so Majority Class+ outputs B.]
Comparing Bagging and Majority Rule+
Accuracy (%) ± standard deviation; columns 0.6-0.9 are Majority Rule+ similarity thresholds; ⊕/⊖ mark results significantly better/worse than Bagging.

Data Set      | Bagging     | 0.6           | 0.7           | 0.8           | 0.9
Balance-Scale | 78.58±4.09  | 78.09±3.43    | 79.21±3.66    | 79.70±4.79 ⊕  | 80.01±4.54 ⊕
Bridges       | 59.91±14.86 | 61.82±15.68 ⊕ | 59.00±18.20   | 61.73±16.57   | 61.73±16.57
Car           | 93.81±1.37  | 89.76±2.00 ⊖  | 90.34±2.33 ⊖  | 93.17±1.87    | 94.33±1.97
Dermatology   | 95.36±4.04  | 95.90±3.29 ⊕  | 95.91±3.90 ⊕  | 96.19±3.88 ⊕  | 95.63±3.69
Hayes-Roth    | 74.89±8.65  | 74.89±13.02   | 75.66±11.94   | 74.95±11.96   | 73.41±11.60
Labor-Neg     | 67.50±22.50 | 67.50±22.50   | 67.50±22.50   | 67.50±22.50   | 67.50±22.50
Soybean       | 85.67±6.39  | 86.62±5.62    | 85.02±5.45    | 86.62±5.75    | 86.96±4.85
TAE           | 43.00±15.95 | 45.00±14.08   | 45.00±14.08   | 43.00±15.95   | 43.00±15.95
Zoo           | 92.00±7.48  | 92.00±7.48    | 92.00±7.48    | 94.00±6.63 ⊕  | 94.00±6.63 ⊕
Comparing Simple Majority Class and Majority Class+
Accuracy (%) ± standard deviation; columns 0.6-0.9 are Majority Class+ similarity thresholds; ⊕/⊖ mark results significantly better/worse than Simple Majority Class.

Data Set      | Simple Majority Class | 0.6           | 0.7           | 0.8           | 0.9
Balance-Scale | 81.45±5.47            | 80.81±4.81 ⊖  | 80.81±4.32    | 81.29±5.68    | 81.45±5.74
Bridges       | 63.64±19.13           | 59.82±17.00 ⊖ | 62.64±19.37   | 63.64±17.78   | 64.55±18.89
Car           | 94.68±1.28            | 91.67±1.29 ⊖  | 93.40±1.19 ⊖  | 94.45±1.48    | 94.56±1.77
Dermatology   | 91.51±6.23            | 91.25±6.29    | 93.44±4.65 ⊕  | 93.71±5.37 ⊕  | 93.71±4.05 ⊕
Hayes-Roth    | 70.38±10.69           | 73.30±12.73   | 74.12±11.15 ⊕ | 72.64±10.52 ⊕ | 71.15±9.71
Labor-Neg     | 67.50±22.50           | 67.50±22.50   | 67.50±22.50   | 67.50±22.50   | 67.50±22.50
Soybean       | 87.28±5.41            | 63.94±11.39 ⊖ | 78.84±6.99 ⊖  | 85.00±3.38 ⊖  | 87.60±3.87
TAE           | 43.67±14.49           | 39.67±14.64 ⊖ | 44.33±15.06   | 44.33±15.06   | 43.67±14.49
Zoo           | 91.00±11.36           | 40.64±15.20 ⊖ | 84.09±13.61 ⊖ | 92.00±6.00    | 92.00±6.00
Appropriate Similarity Value
Comparing Bagging and Majority Rule+

Similarity Value | Significantly Better | Significantly Worse
0.6              | 2                    | 1
0.7              | 1                    | 1
0.8              | 3                    | 0
0.9              | 2                    | 0

Comparing Simple Majority Class and Majority Class+

Similarity Value | Significantly Better | Significantly Worse
0.6              | 0                    | 6
0.7              | 2                    | 3
0.8              | 2                    | 1
0.9              | 1                    | 0
The best similarity value between rules is 0.8
Conclusions
Majority vote with neighbor rules improves accuracy over traditional simple majority vote.
The best similarity value between rules is 0.8.
Future Work
Run experiments on 10-15 more data sets from the UCI repository
Cluster the rules using the similarity value and derive a single classifier, to reduce time and resources
Apply the 0.8 similarity value to other decision tree ensemble methods, such as AdaBoost
The End