Ensemble with Neighbor Rules Voting
Itt Romneeyangkurn, Sukree Sinthupinyo
Faculty of Computer Science
Thammasat University
Outline
Introduction and Motivation
Preliminaries
- Decision Tree
- Ensemble of Decision Trees
- Simple Majority Rule
- Simple Majority Class
Experiment Process
- Making Rules into Groups
- Majority Rule+ and Majority Class+
Results of Experiment
Conclusions and Future Work
Introduction to Decision Trees
Decision trees are widely used in data mining and machine learning
Well-known algorithms: CART, C4.5, ID3, etc.
Advantages:
- Simple to understand and interpret
- Requires little data preparation
- Handles both numerical and categorical data
- Uses a white-box model
- Possible to validate a model using statistical tests
- Robust; performs well on large data in a short time
Introduction to Decision Trees
[Figure: an example decision tree. The root attribute Outlook branches on the values Sunny, Overcast, and Rain. Sunny leads to a Humidity node (High → No, Normal → Yes), Overcast leads directly to Yes, and Rain leads to a Wind node (Strong → No, Weak → Yes). Internal nodes are attributes, edges are attribute values, and leaves are classes.]
Decision Tree Ensemble
From a single classifier to an ensemble of classifiers
An ensemble is usually more accurate than an individual classifier
Examples: AdaBoost, Bagging, Random Forest, etc.
Decision Tree Ensemble
[Figure: individual classifier vs. ensemble of classifiers. Left: the original training set trains a single decision tree DT with rule set R. Right: training sets 1..n for trees 1..n, each drawn from the original training set, train decision trees DT1..DTn with rule sets R1..Rn.]
Bootstrapping
Bootstrap samples are also called replications
Created by uniformly sampling m times with replacement from a dataset of size m
Used to train multiple classifiers, e.g., CART, nearest-neighbor classifiers, C4.5, etc.
This work uses 10 bootstrap replications (see the sketch after the example below)
Bootstrapping
Example of Original Dataset
Original Dataset: 1,2,3,4,5,6,7,8
Example of Bootstrap Replications
1st Bootstrap: 2,7,8,3,7,6,3,1
2nd Bootstrap: 7,8,5,6,4,2,7,1
3rd Bootstrap: 3,6,2,7,5,6,2,2
4th Bootstrap: 4,5,1,4,6,4,3,8
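The slides give no code, but the replication process above is easy to sketch. A minimal illustrative example in Python (the function name and random seed are mine, not from the slides):

```python
import random

def bootstrap_replication(dataset):
    """Draw len(dataset) samples uniformly with replacement (one replication)."""
    return [random.choice(dataset) for _ in range(len(dataset))]

original = [1, 2, 3, 4, 5, 6, 7, 8]
random.seed(0)
for i in range(4):
    print(f"Bootstrap {i + 1}:", bootstrap_replication(original))
```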
Simple Majority Vote (Bagging)
Advantages:
- Improves classification accuracy
- Reduces variance
- Helps avoid over-fitting
Method (sketched in code below):
- Generate T bootstrap samples
- Train one classifier on each sample
- The majority vote among the resulting T decision trees is the final output
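A rough sketch of the method, not the authors' code: C4.5 is not available in scikit-learn, so this example substitutes the library's CART-style DecisionTreeClassifier.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, T=10, seed=0):
    """Train T trees on bootstrap samples; a simple majority vote picks each label."""
    rng = np.random.default_rng(seed)
    m = len(X_train)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)  # bootstrap sample, drawn with replacement
        trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
    votes = np.array([tree.predict(X_test) for tree in trees])  # shape (T, n_test)
    return [Counter(column).most_common(1)[0][0] for column in votes.T]

X, y = load_iris(return_X_y=True)
print(bagging_predict(X, y, X[:5]))
```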
Simple Majority Vote (Bagging)
[Figure: Simple Majority Vote example. Trees DT1..DT10 classify the test instance as A, A, A, B, B, A, A, B, A, A. With A = 7 votes and B = 3 votes, the simple majority vote outputs A.]
Simple Majority Class
Based on Bagging; it differs only in how votes are counted
Each of the T decision trees contributes the class counts of the training instances that match the rule classifying the test instance; the class with the largest total is the final output (see the sketch below)
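A minimal sketch of this voting scheme, assuming a decision tree's leaf stands in for a "rule" (scikit-learn's tree.apply maps an instance to its leaf; the helper name is mine):

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def majority_class_vote(trees, train_sets, x):
    """Each tree adds the class counts of the training instances that fall in
    the same leaf (rule) as x; the class with the largest total wins."""
    tally = Counter()
    for tree, (X_tr, y_tr) in zip(trees, train_sets):
        leaf = tree.apply(x.reshape(1, -1))[0]        # the rule that classifies x
        tally.update(y_tr[tree.apply(X_tr) == leaf])  # training data matching it
    return tally.most_common(1)[0][0]

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(["A", "A", "B", "B"])
tree = DecisionTreeClassifier().fit(X, y)
print(majority_class_vote([tree], [(X, y)], np.array([2.5])))  # -> 'B'
```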
Simple Majority Class
[Figure: Simple Majority Class example. For each of DT1..DT10, the rule that classifies the test instance is matched against the original training set, giving per-tree class counts such as A: 8, B: 3. Summing over all ten trees gives A = 69 and B = 28, so the Simple Majority Class output is A.]
Similarity Between Rules
Continuous attributes: use the overlap between the two rules' ranges
Discrete attributes: use the number of attribute values in common between both rules
$$\text{sim}_{\text{continuous}} = \frac{\text{overlap between the ranges of each rule}}{\text{ranges of each rule individually}}$$

$$\text{sim}_{\text{discrete}} = \frac{\text{attribute values shared by both rules}}{\text{average of the attribute values covered by the two rules individually}}$$
For more on similarity between rules, see "Bootstrapping Rule Induction to Achieve Rule Stability and Reduction"
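One plausible reading of the two measures above in Python (hedged: the exact definitions are in the cited paper; the normalizations here are my assumptions):

```python
def continuous_similarity(lo1, hi1, lo2, hi2):
    """Overlap of two rules' ranges, normalized by the average of the two
    individual range widths (an assumed reading of the slide's formula)."""
    overlap = max(0.0, min(hi1, hi2) - max(lo1, lo2))
    avg_width = ((hi1 - lo1) + (hi2 - lo2)) / 2.0
    return overlap / avg_width if avg_width > 0 else 0.0

def discrete_similarity(values1, values2):
    """Attribute values shared by both rules, normalized by the average
    number of values each rule covers individually."""
    shared = len(set(values1) & set(values2))
    avg_covered = (len(set(values1)) + len(set(values2))) / 2.0
    return shared / avg_covered if avg_covered > 0 else 0.0

print(continuous_similarity(0, 10, 5, 15))               # 0.5
print(discrete_similarity({"sunny", "rain"}, {"rain"}))  # ~0.67
```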
Experiment
Nine well-known benchmark data sets (see the table below)
Ten-fold cross-validation
Based on C4.5, run in default mode with pruning enabled
Data Set      | Instances | Attributes | Classes
Balance-Scale |       625 |          4 |       3
Bridges       |       105 |         10 |       6
Car           |     1,728 |          6 |       4
Dermatology   |       366 |         34 |       6
Hayes-Roth    |       132 |          4 |       3
Labor-Neg     |        40 |         16 |       2
Soybean       |       307 |         35 |      19
TAE           |       151 |          3 |       3
Zoo           |       101 |         16 |       7
Experiment
1. Generate 10 bootstrap samples of the original training set.
2. Generate 10 classifiers using C4.5.
3. Compute the similarity between the rules of all classifiers.
4. Make all rules into groups (sketched below).
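A sketch of the grouping step under a similarity threshold (the 0.8 threshold comes from the next slide; the interval-based similarity is only a stand-in):

```python
def group_rules(rules, similarity, threshold=0.8):
    """For each rule, collect the indices of every other rule whose
    similarity meets the threshold (a rule may join several groups)."""
    return {i: [j for j, s in enumerate(rules)
                if j != i and similarity(r, s) >= threshold]
            for i, r in enumerate(rules)}

def interval_similarity(a, b):
    """Toy stand-in: range overlap over the average range width."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / (((a[1] - a[0]) + (b[1] - b[0])) / 2.0)

rules = [(0, 10), (1, 10), (20, 30)]
print(group_rules(rules, interval_similarity))  # {0: [1], 1: [0], 2: []}
```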
Group of Rules with Similarity Value 0.8
[Figure: grouping rules across trees. Each tree DTi contributes rules R(i,1), R(i,2), ... Example similarities to R(1,1): R(1,2) = 0.2302, R(1,3) = -0.7495, R(2,1) = 0.9454, R(2,2) = 0.7382. With a similarity threshold of 0.8, the group of R(1,1) contains R(1,1), R(2,1), R(3,2), R(5,4), R(6,2), R(7,1), and R(10,3).]
Majority Rule+
Based on Bagging; it differs only in how votes are counted
The majority vote among the classes of the rule-members (the rules in the matching rule's group) is the final output (see the sketch below)
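A minimal sketch of the tally (illustrative only; the group labels in the usage line follow the example on the next slide):

```python
from collections import Counter

def majority_rule_plus(group_labels_per_tree):
    """Each rule in the matching rule's group votes with its own class;
    votes are summed over all trees and the majority class wins."""
    tally = Counter()
    for labels in group_labels_per_tree:
        tally.update(labels)
    return tally.most_common(1)[0][0]

# Group of Rule 2 from the example: classes A, A, B, A, B, B, B -> A=3, B=4
print(majority_rule_plus([["A", "A", "B", "A", "B", "B", "B"]]))  # -> 'B'
```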
Majority Rule+
[Figure: Majority Rule+ example. For the rule that classifies the test instance in each of DT1..DT10, every rule in its group votes with its own class. The group of Rule 2 of DT1, for instance, contains R(1,2), R(2,2), R(3,1), R(4,2), R(6,2), R(8,2), and R(9,1) with classes A, A, B, A, B, B, B, giving A = 3 and B = 4. Summing over all trees gives A = 27 and B = 36, so Majority Rule+ outputs B.]
Majority Class+
Based on Bagging; it differs only in how votes are counted
The majority vote among the classes of the training instances that match the rule-members is the final output (see the sketch below)
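The same tally as Majority Rule+, but over training-instance counts rather than one vote per rule (a sketch; the counts in the usage line mirror the example on the next slide):

```python
from collections import Counter

def majority_class_plus(group_counts_per_tree):
    """Each tree contributes, for every rule in the matching rule's group,
    the class counts of the training instances that rule matches; the
    class with the largest grand total wins."""
    tally = Counter()
    for counts in group_counts_per_tree:
        tally.update(counts)  # dict mapping class label -> matched instances
    return tally.most_common(1)[0][0]

# First two trees of the example: DT1 gives A: 2, B: 10; DT2 gives A: 2, B: 8
print(majority_class_plus([{"A": 2, "B": 10}, {"A": 2, "B": 8}]))  # -> 'B'
```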
Majority Class+
[Figure: Majority Class+ example. For each of DT1..DT10, every rule in the group of the rule that classifies the test instance is matched against the original training set, and the class counts of the matching instances are accumulated (e.g., A: 2, B: 10 for the first tree). Summing over all trees gives A = 33 and B = 92, so Majority Class+ outputs B.]
Comparing Bagging and Majority Rule+
Accuracy (%) ± standard deviation; columns 0.6-0.9 are Majority Rule+ similarity thresholds; ⊕/⊖ mark results significantly better/worse than Bagging.

Data Set      | Bagging     | 0.6           | 0.7           | 0.8           | 0.9
Balance-Scale | 78.58±4.09  | 78.09±3.43    | 79.21±3.66    | 79.70±4.79 ⊕  | 80.01±4.54 ⊕
Bridges       | 59.91±14.86 | 61.82±15.68 ⊕ | 59.00±18.20   | 61.73±16.57   | 61.73±16.57
Car           | 93.81±1.37  | 89.76±2.00 ⊖  | 90.34±2.33 ⊖  | 93.17±1.87    | 94.33±1.97
Dermatology   | 95.36±4.04  | 95.90±3.29 ⊕  | 95.91±3.90 ⊕  | 96.19±3.88 ⊕  | 95.63±3.69
Hayes-Roth    | 74.89±8.65  | 74.89±13.02   | 75.66±11.94   | 74.95±11.96   | 73.41±11.60
Labor-Neg     | 67.50±22.50 | 67.50±22.50   | 67.50±22.50   | 67.50±22.50   | 67.50±22.50
Soybean       | 85.67±6.39  | 86.62±5.62    | 85.02±5.45    | 86.62±5.75    | 86.96±4.85
TAE           | 43.00±15.95 | 45.00±14.08   | 45.00±14.08   | 43.00±15.95   | 43.00±15.95
Zoo           | 92.00±7.48  | 92.00±7.48    | 92.00±7.48    | 94.00±6.63 ⊕  | 94.00±6.63 ⊕
Comparing Simple Majority Class and Majority Class+
Accuracy (%) ± standard deviation; columns 0.6-0.9 are Majority Class+ similarity thresholds; ⊕/⊖ mark results significantly better/worse than Simple Majority Class.

Data Set      | Simple Majority Class | 0.6           | 0.7           | 0.8           | 0.9
Balance-Scale | 81.45±5.47            | 80.81±4.81 ⊖  | 80.81±4.32    | 81.29±5.68    | 81.45±5.74
Bridges       | 63.64±19.13           | 59.82±17.00 ⊖ | 62.64±19.37   | 63.64±17.78   | 64.55±18.89
Car           | 94.68±1.28            | 91.67±1.29 ⊖  | 93.40±1.19 ⊖  | 94.45±1.48    | 94.56±1.77
Dermatology   | 91.51±6.23            | 91.25±6.29    | 93.44±4.65 ⊕  | 93.71±5.37 ⊕  | 93.71±4.05 ⊕
Hayes-Roth    | 70.38±10.69           | 73.30±12.73   | 74.12±11.15 ⊕ | 72.64±10.52 ⊕ | 71.15±9.71
Labor-Neg     | 67.50±22.50           | 67.50±22.50   | 67.50±22.50   | 67.50±22.50   | 67.50±22.50
Soybean       | 87.28±5.41            | 63.94±11.39 ⊖ | 78.84±6.99 ⊖  | 85.00±3.38 ⊖  | 87.60±3.87
TAE           | 43.67±14.49           | 39.67±14.64 ⊖ | 44.33±15.06   | 44.33±15.06   | 43.67±14.49
Zoo           | 91.00±11.36           | 40.64±15.20 ⊖ | 84.09±13.61 ⊖ | 92.00±6.00    | 92.00±6.00
Appropriate Similarity Value
Comparing Bagging and Majority Rule+

Similarity Value | Significantly Better | Significantly Worse
0.6              | 2                    | 1
0.7              | 1                    | 1
0.8              | 3                    | 0
0.9              | 2                    | 0

Comparing Simple Majority Class and Majority Class+

Similarity Value | Significantly Better | Significantly Worse
0.6              | 0                    | 6
0.7              | 2                    | 3
0.8              | 2                    | 1
0.9              | 1                    | 0
The best similarity value between rules is 0.8
Conclusions
Majority vote with neighbor rules improves accuracy over traditional simple majority vote.
The best similarity value between rules is 0.8.
Future Work
Run experiments on 10-15 more data sets from the UCI repository
Cluster the rules using the similarity value and derive a single classifier, to reduce time and resources
Apply the 0.8 similarity value to other decision tree ensemble methods, such as AdaBoost
The End