wRACOG: A Gibbs Sampling-Based Oversampling Technique

Posted on 21-May-2015


This paper was presented at the International Conference on Data Mining, 2013.


Barnan Das, School of Electrical Engineering and Computer Science

Washington State University

wRACOG: A Gibbs Sampling-Based Oversampling Technique
Barnan Das, Narayanan C. Krishnan, Diane J. Cook


Imbalanced Class Distribution


Automated Prompting for Older Adults


Automated Prompting for Older Adults

Class Distribution

Minority class: 149 data points
Majority class: 3831 data points
Total number of data points: 3980
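The counts above make the imbalance concrete. A small arithmetic sketch (class counts taken from the slide; variable names are illustrative) shows the minority share, the imbalance ratio, and why plain accuracy is misleading here: a classifier that always predicts the majority class already scores about 96%.

```python
# Class counts from the class-distribution slide (149 minority, 3831 majority).
minority, majority = 149, 3831
total = minority + majority

minority_share = minority / total      # fraction of minority-class points
imbalance_ratio = majority / minority  # majority points per minority point
majority_baseline = majority / total   # accuracy of always predicting "majority"

print(total)                       # 3980
print(f"{minority_share:.1%}")     # 3.7%
print(f"{imbalance_ratio:.1f}")    # 25.7
print(f"{majority_baseline:.1%}")  # 96.3%
```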

Solution?


Preprocessing

Sampling
• Over-sampling the minority class
• Under-sampling the majority class

Oversampling
• Based on the spatial location of samples in Euclidean space
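SMOTE, one of the methods compared later, is the standard example of oversampling driven by spatial location. The sketch below shows only its core interpolation idea, assuming numeric features: each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_like`, `k=3` and the seed are illustrative choices, not the reference SMOTE implementation.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling (sketch): each synthetic point lies on the line
    segment between a random minority sample and one of its k nearest
    minority-class neighbours in Euclidean space."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of x by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (2.5, 2.5)]
new_points = smote_like(minority, n_new=5)
```

Because every new point is an interpolation between existing points, this approach never looks at how attributes depend on each other, which is the gap the probability-distribution approach targets.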

Proposed Approach


• Preprocessing technique to oversample the minority class
• Approximate the discrete probability distribution of the minority class using Chow-Liu’s algorithm
• Generate new minority class data points using Gibbs sampling

Approximating Discrete Probability Distribution


Minority Class

Mutual Information Between Attributes

$I(x_i, x_j)$ for $i = 1, 2, \ldots, n-1$ and $j = 2, 3, \ldots, n$, with $i < j$

Maximum-weighted Dependence Tree

Chow-Liu Dependence Tree
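Chow-Liu's algorithm approximates the joint distribution by a tree, $P(\mathbf{x}) \approx \prod_k P(x_k \mid x_{\pi(k)})$, where $\pi(k)$ is the parent of attribute $k$, and picks the tree as the maximum-weight spanning tree whose edge weights are $I(x_i, x_j) = \sum P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)}$. A minimal sketch for discrete attributes follows; the function names, the use of Prim's algorithm (Kruskal's would do equally well), the natural log and rooting the tree at attribute 0 are all assumptions for illustration, not the authors' code.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information I(x_i, x_j) (natural log) between discrete columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    # sum over observed pairs of P(a, b) * log(P(a, b) / (P(a) * P(b)))
    return sum(
        (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )

def chow_liu_tree(data):
    """Maximum-weight spanning tree over pairwise mutual information.
    Returns {attribute: parent}, rooted (arbitrarily) at attribute 0."""
    n_attr = len(data[0])
    weight = {}
    for i, j in combinations(range(n_attr), 2):  # all pairs with i < j
        weight[i, j] = weight[j, i] = mutual_information(data, i, j)
    parent = {0: None}
    # Prim's algorithm: repeatedly add the heaviest edge leaving the current tree
    while len(parent) < n_attr:
        u, v = max(
            ((u, v) for u in parent for v in range(n_attr) if v not in parent),
            key=lambda edge: weight[edge],
        )
        parent[v] = u
    return parent

# Toy minority-class data: attribute 1 copies attribute 0; attribute 2 is unrelated.
minority = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = chow_liu_tree(minority)
```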

Gibbs Sampling


For all attributes

Chow-Liu Dependence Tree
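A Gibbs sampler sweeps over all attributes and resamples each one from its conditional distribution given the current values of the others. With a dependence tree, that conditional is proportional to $P(x_k \mid x_{\pi(k)}) \prod_{c \in \mathrm{children}(k)} P(x_c \mid x_k)$. The sketch below assumes discrete attributes, a tree supplied as a `{attribute: parent}` dict, Laplace smoothing with `alpha=1.0`, and a chain started from a random minority sample; these choices and the function names are hypothetical, not taken from the RACOG implementation.

```python
import random
from collections import Counter

def fit_conditionals(data, parent, alpha=1.0):
    """Laplace-smoothed estimates of P(x_k = value | x_parent(k)) from data.
    `parent` maps each attribute to its parent in the dependence tree (root -> None)."""
    values = [sorted({row[k] for row in data}) for k in range(len(data[0]))]
    joint, margin = Counter(), Counter()
    for row in data:
        for k, p in parent.items():
            pv = None if p is None else row[p]
            joint[k, row[k], pv] += 1
            margin[k, pv] += 1

    def cond(k, value, state):
        p = parent[k]
        pv = None if p is None else state[p]
        return (joint[k, value, pv] + alpha) / (margin[k, pv] + alpha * len(values[k]))

    return cond, values

def gibbs_chain(data, parent, n_sweeps, seed=0):
    """One Markov chain: start from a random data point, then in every sweep
    resample each attribute from its full conditional, which in a tree only
    involves the attribute's parent and children."""
    rng = random.Random(seed)
    cond, values = fit_conditionals(data, parent)
    children = {k: [c for c, p in parent.items() if p == k] for k in parent}
    state = list(rng.choice(data))
    chain = []
    for _ in range(n_sweeps):
        for k in range(len(state)):  # "for all attributes"
            weights = []
            for v in values[k]:
                state[k] = v
                w = cond(k, v, state)  # P(x_k = v | parent)
                for c in children[k]:
                    w *= cond(c, state[c], state)  # P(child | x_k = v)
                weights.append(w)
            state[k] = rng.choices(values[k], weights=weights)[0]
        chain.append(tuple(state))
    return chain

minority = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = {0: None, 1: 0, 2: 0}  # e.g. the output of a Chow-Liu step
chain = gibbs_chain(minority, tree, n_sweeps=50)
```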

Gibbs Sampling

Figure: Minority Class Samples, Majority Class Samples, Markov Chains

RACOG & wRACOG: RApidly COnverging Gibbs sampler (RACOG) and its wrapper-based variant (wRACOG)


RACOG and wRACOG differ in how samples are selected from the Markov chains.

RACOG:
• Based on burn-in and lag
• Stopping criterion: predefined number of iterations
• Effectiveness of new samples is not judged
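Burn-in and lag selection can be sketched as list slicing: discard the first states of each chain, then keep every lag-th state. The function name `racog_select` and the values `burn_in=5`, `lag=5` are illustrative assumptions; integers stand in for sampled data points.

```python
def racog_select(chains, burn_in, lag):
    """RACOG-style selection (sketch): from every Markov chain, discard the first
    `burn_in` states, then keep every `lag`-th remaining state. The chains run
    for a predefined number of iterations, and kept samples are not evaluated."""
    selected = []
    for chain in chains:
        selected.extend(chain[burn_in::lag])
    return selected

# Two toy "chains" of 20 states each
chains = [list(range(1, 21)), list(range(101, 121))]
print(racog_select(chains, burn_in=5, lag=5))  # [6, 11, 16, 106, 111, 116]
```

The lag thins out consecutive, highly correlated states, and the burn-in skips states still influenced by the starting point.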

wRACOG:
• Iterative training on the dataset, adding misclassified data points
• Stopping criterion: no further improvement of the performance measure (TP rate)
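The wrapper loop can be sketched as follows: each round, draw new minority samples, keep only those the current model misclassifies, retrain, and stop once the TP rate on a validation set stops improving. The candidate source, the validation split, `max_rounds` and the 1-D nearest-class-mean classifier (a hypothetical stand-in for C4.5, SVM and the other classifiers) are all assumptions; this is not the authors' exact procedure.

```python
def tp_rate(model, predict, validation, positive=1):
    """Sensitivity (TP rate) of `model` on the minority-class points of `validation`."""
    pos = [x for x, y in validation if y == positive]
    return sum(predict(model, x) == positive for x in pos) / len(pos)

def wracog_loop(train_set, validation, draw_candidates, train, predict,
                positive=1, max_rounds=100):
    """Wrapper-based selection (sketch): add newly drawn minority samples that the
    current model misclassifies; stop when the validation TP rate stops improving."""
    data = list(train_set)
    model = train(data)
    best = tp_rate(model, predict, validation, positive)
    for _ in range(max_rounds):
        candidates = draw_candidates()  # e.g. states taken from the Gibbs chains
        added = [(x, positive) for x in candidates if predict(model, x) != positive]
        trial = data + added
        trial_model = train(trial)
        score = tp_rate(trial_model, predict, validation, positive)
        if score <= best:
            break  # no further improvement: keep the previous training set
        data, model, best = trial, trial_model, score
    return data, model

# Hypothetical stand-in classifier: 1-D nearest class mean.
def train(data):
    def mean(cls):
        xs = [x for x, y in data if y == cls]
        return sum(xs) / len(xs)
    return mean(0), mean(1)

def predict(model, x):
    m0, m1 = model
    return 1 if abs(x - m1) < abs(x - m0) else 0

batches = iter([[3.0, 4.0], [4.5], [4.8]])
data, model = wracog_loop(
    train_set=[(0.0, 0), (1.0, 0), (2.0, 0), (10.0, 1)],
    validation=[(4.0, 1), (5.0, 1), (1.5, 0)],
    draw_candidates=lambda: next(batches),
    train=train,
    predict=predict,
)
```

In this toy run the first batch raises the validation TP rate from 0 to 1, so both points are kept; the second batch brings no improvement and the loop stops. Reverting to the previous training set when the score does not improve is one design choice; keeping the last batch would be another.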

Experimental Setup


Datasets

• prompting
• abalone
• car
• nursery
• letter
• connect-4

Classifiers

• C4.5 decision tree
• SVM
• k-Nearest Neighbor
• Logistic Regression

Other Methods

• SMOTE
• SMOTEBoost
• RUSBoost

Results (Sensitivity)


Results (G-mean)
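Sensitivity (the TP rate of the minority class) and G-mean, the geometric mean of sensitivity and specificity, are the standard definitions sketched below. They are not necessarily the authors' evaluation code. Using the prompting dataset's class counts, a classifier that always predicts the majority class reaches high accuracy but zero sensitivity and zero G-mean.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp, fn, tn, fp

def sensitivity(y_true, y_pred, positive=1):
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn)

def g_mean(y_true, y_pred, positive=1):
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))

# Prompting dataset class counts; a classifier that always predicts the majority class
y_true = [1] * 149 + [0] * 3831
always_majority = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(f"{accuracy:.3f}")                     # 0.963
print(sensitivity(y_true, always_majority))  # 0.0
print(g_mean(y_true, always_majority))       # 0.0
```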


Results (ROC)
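ROC results are commonly summarized by the area under the curve (AUC); the slide does not say how the curves were summarized, so this is an assumption. The sketch uses AUC's rank interpretation: the probability that a random minority point receives a higher score than a random majority point, with ties counted as one half. The scores are made-up toy values.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via its rank interpretation: the probability that a
    random minority (positive) point scores higher than a random majority point,
    counting ties as one half."""
    wins = 0.0
    for s_pos in pos_scores:
        for s_neg in neg_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy classifier scores (hypothetical, for illustration only)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(auc)  # 11 of the 12 positive/negative pairs are ranked correctly
```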


New Samples Generated


Iterations of Gibbs Sampler


Conclusion


• An oversampling technique to address imbalanced class distributions

• Takes the probability distribution of the minority class into account

• Performs better than the other methods compared (SMOTE, SMOTEBoost, RUSBoost) in these experiments


Backup Slides
