Data Mining For Credit Card Fraud: A Comparative Study
Xxxxxxxx | DSCI 5240 | Dr. Nick Evangelopoulos
Graduate Presentation
Graduate Presentation | DSCI 5240 | Xxxxxxx 2
Overview
- Credit Card Fraud
- Data Mining Techniques
- Data
- Experimental Setup
- Results
Credit Card Fraud
- Two types:
  - Application fraud
    - Obtaining new cards using false information
  - Behavioral fraud
    - Mail theft
    - Stolen/lost card
    - Counterfeit card
Credit Card Fraud
- Online revenue loss due to fraud (source: cybersource.com)
Data Mining Techniques
- Logistic Regression
  - Used to predict the outcome of a categorical dependent variable
  - Fraud variable is binary
- Support Vector Machines
- Random Forest
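As a rough illustration of why logistic regression suits a binary fraud variable, the sketch below maps a linear score to a probability between 0 and 1 with the logistic (sigmoid) function. The feature values, weights, and bias are hypothetical and are not taken from the study.

```python
import math


def fraud_probability(features, weights, bias):
    """Logistic model: P(fraud) = 1 / (1 + exp(-(bias + w . x)))."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical, already-scaled features and coefficients (illustration only).
weights = [1.5, -2.0]
bias = 0.0
p_neutral = fraud_probability([0.0, 0.0], weights, bias)  # zero score
p_risky = fraud_probability([2.0, 0.5], weights, bias)    # positive score

# A transaction is flagged when its probability exceeds a chosen threshold.
flagged = p_risky > 0.5
```

A score of zero corresponds to a probability of exactly 0.5, so the sign of the linear score decides the class at the default threshold.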
Support Vector Machines (SVM)
- Supervised learning models with associated learning algorithms that analyze data and recognize patterns
- Linear classifiers that work in a high-dimensional feature space that is a non-linear mapping of the input space
- Two properties of SVM
  - Kernel representation
  - Margin optimization
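The kernel representation can be sketched in a few lines: the classifier never builds the high-dimensional feature space explicitly, it only evaluates a kernel between a new point and the support vectors. The example below uses the Gaussian radial basis function, which the experimental setup names as the kernel. The support vectors, coefficients, bias, and `gamma` are made-up values for illustration, and the margin optimization that would produce them is not shown.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Kernel SVM decision value: sum_i (alpha_i * y_i) * K(sv_i, x) + bias."""
    total = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, dual_coefs))
    return total + bias


# Hypothetical trained model: one legitimate and one fraud support vector.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
dual_coefs = [-1.0, 1.0]  # alpha_i * y_i, with y = -1 legitimate, +1 fraud
bias = 0.0

near_fraud = svm_decision((2.0, 2.0), support_vectors, dual_coefs, bias)
near_legit = svm_decision((0.0, 0.0), support_vectors, dual_coefs, bias)
```

The sign of the decision value gives the predicted class; its magnitude can be used to rank transactions.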
Random Forest (RF)
- Ensemble of classification trees
- Performs well when individual members are dissimilar
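A minimal sketch of the ensemble idea, assuming the common majority-vote rule for combining trees: each member votes and the most frequent class wins. The three one-rule "trees" and their feature names are hypothetical stand-ins; a real forest would grow each tree on a bootstrap sample with a random subset of attributes at each node.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the class that receives the most votes across the trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical, deliberately dissimilar members: each looks at a different attribute.
trees = [
    lambda x: int(x["amount"] > 500),
    lambda x: int(x["txn_last_24h"] > 10),
    lambda x: int(x["foreign"]),
]

transaction = {"amount": 900, "txn_last_24h": 2, "foreign": True}
prediction = forest_predict(trees, transaction)  # two of three members vote fraud
```

Because the members rely on different attributes, a mistake by one of them can be outvoted by the others, which is the intuition behind preferring dissimilar trees.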
Data: Datasets
- 13 months of data (Jan 2006 – Jan 2007)
- 50 million credit card transactions on 1 million credit cards
- 2,420 known fraudulent transactions involving 506 credit cards
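The counts above imply an extreme class imbalance, which the short calculation below makes explicit. It treats the slide's rounded figures (50 million and 1 million) as exact, so the results are approximate: roughly 0.005% of transactions and about 0.05% of cards are known to be fraudulent.

```python
total_transactions = 50_000_000
fraud_transactions = 2_420
total_cards = 1_000_000
fraud_cards = 506

# Share of transactions and of cards that are known to be fraudulent.
fraud_rate = fraud_transactions / total_transactions
card_rate = fraud_cards / total_cards

# Legitimate transactions per known fraudulent one.
legit_per_fraud = (total_transactions - fraud_transactions) / fraud_transactions

print(f"transaction fraud rate: {fraud_rate:.5%}")
print(f"card fraud rate:        {card_rate:.4%}")
print(f"legitimate per fraud:   {legit_per_fraud:,.0f}")
```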
Percentage of transactions by transaction type
Data Selection
Primary attributes in Dataset
Derived Attributes
Experimental Setup
- For SVM, the Gaussian radial basis function was used as the kernel function
- For Random Forest, the number of attributes considered at each node and the number of trees were set
- Data were sampled at different rates using random undersampling of the majority class
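Random undersampling of the majority class can be sketched as follows: keep every fraud record and draw just enough legitimate records at random to reach a chosen fraud rate. The function name, the toy label vector, and the 10% target are assumptions for illustration; the slide does not state the exact sampling rates or procedure used.

```python
import random


def undersample_majority(labels, target_fraud_rate, seed=0):
    """Return sorted row indices: all fraud rows plus a random subset of
    legitimate rows sized so fraud makes up about target_fraud_rate."""
    if not 0 < target_fraud_rate <= 1:
        raise ValueError("target_fraud_rate must be in (0, 1]")
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    # Solve n_fraud / (n_fraud + n_legit) = target for n_legit.
    n_legit = round(len(fraud_idx) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit_idx))  # cannot keep more than exist
    rng = random.Random(seed)               # seeded for reproducibility
    kept_legit = rng.sample(legit_idx, n_legit)
    return sorted(fraud_idx + kept_legit)


# Toy data: 20 fraud rows (label 1) and 980 legitimate rows (label 0).
labels = [1] * 20 + [0] * 980
sample = undersample_majority(labels, target_fraud_rate=0.10)
sample_fraud_rate = sum(labels[i] for i in sample) / len(sample)
```

Calling the function with several target rates yields training sets with different fraud proportions while the fraud cases themselves stay the same.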
Training and testing data
Results
Proportion of fraud captured at different depths
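One way to compute this kind of measure, assuming "depth" means the top fraction of transactions after ranking them by model score: count how much of all known fraud falls inside that top slice. The scores and labels below are invented for illustration and are not results from the study.

```python
def fraud_capture_rate(scores, labels, depth):
    """Fraction of all fraud cases found in the top `depth` share of
    transactions when ranked by descending score."""
    # Slice size, rounded to the nearest whole transaction (at least one).
    n_top = max(1, round(depth * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:n_top])
    total_fraud = sum(labels)
    return captured / total_fraud if total_fraud else 0.0


# Hypothetical model scores (higher = more suspicious) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

capture_at_10 = fraud_capture_rate(scores, labels, depth=0.10)
capture_at_50 = fraud_capture_rate(scores, labels, depth=0.50)
capture_at_100 = fraud_capture_rate(scores, labels, depth=1.00)
```

A model that ranks fraud well captures a large share of it at small depths, which matters when only a small fraction of transactions can be reviewed.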
Fraud Capture Rate with Different Fraud Rates in Training Data
Conclusion
- Examined the performance of two data mining techniques
  - SVM and RF, together with logistic regression
- Used a real-life data set from Jan 2006 – Jan 2007
- Used random undersampling to sample the data
- Random forest showed much higher performance at upper file depths
- SVM performance at the upper file depths tended to increase with a lower proportion of fraud in the training data
- Random forest demonstrated overall better performance
Questions