2013 Credit Card Fraud Detection: Why Theory Doesn't Adjust to Practice

Copyright © 2013, SAS Institute Inc. All rights reserved. #analytics2013

Credit Card Fraud Detection: Why Theory Doesn't Adjust to Practice

Alejandro Correa Bahnsen, Luxembourg University
Andrés Gonzalez Montoya, Scotia Bank


Presentation at the SAS Analytics Conference 2013, London, UK. Presenter: Alejandro Correa Bahnsen


Introduction

[Figure: Europe fraud evolution, Internet transactions (millions of euros), 2007 to 2012E; vertical axis from € 500 to € 800]

Introduction

[Figure: US fraud evolution, online revenue lost due to fraud (billions of dollars), 2001 to 2012; vertical axis from $0 to $5.0]

Introduction

• Increasing fraud levels around the world
• Different technologies and legal requirements make it harder to control
• There is a need for advanced fraud detection systems

Agenda

• Introduction
• Transaction flow
• Database
• Evaluation of algorithms
• If-Then rules (Expert Rules)
• Financial measure
• Predictive modeling
• Logistic Regression
• Cost Sensitive Logistic Regression

Simplified transaction flow

[Diagram: a transaction passing through the network, with a "Fraud??" check]

Data

• Large European card processing company
• 2012 card present transactions
• 750,000 transactions
• 3,500 frauds
• 0.467% fraud rate
• 148,562 EUR lost due to fraud on the test dataset

[Figure: timeline of the 2012 data by month, January to December, divided into Train and Test periods]

Data

• Raw attributes

TRXID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud
1      1          2/1/12 6:00  580     Ger       Internet  Airlines        No
2      1          2/1/12 6:15  120     Eng       Present   Car Rent        No
3      2          2/1/12 8:20  12      Bel       Present   Hotel           Yes
4      1          3/1/12 4:15  60      Esp       ATM       ATM             No
5      2          3/1/12 9:18  8       Fra       Present   Retail          No
6      1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes

• Other attributes: age, country of residence, postal code, type of card

Data

• Derived attributes

Trx ID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud  No. of Trx, same client, last 6 hours  Sum, same client, last 7 days
1       1          2/1/12 6:00  580     Ger       Internet  Airlines        No     0                                      0
2       1          2/1/12 6:15  120     Eng       Present   Car Renting     No     1                                      580
3       2          2/1/12 8:20  12      Bel       Present   Hotel           Yes    0                                      0
4       1          3/1/12 4:15  60      Esp       ATM       ATM             No     0                                      700
5       2          3/1/12 9:18  8       Fra       Present   Retail          No     0                                      12
6       1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes    1                                      760

Derived attributes are built as a combination of the following criteria:

By           Group              Last      Function
Client       None               hour      Count
Credit Card  Transaction Type   day       Sum(Amount)
             Merchant           week      Avg(Amount)
             Merchant Category  month
             Merchant Country   3 months
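The two derived columns in the example table can be reproduced with a small look-back aggregation. This is a minimal sketch, not the authors' implementation: the function name `derive` is hypothetical, and the dates are read as day/month/year, which is an assumption (it is the reading under which the table's 7-day sums come out as shown).

```python
from datetime import datetime, timedelta

# Raw transactions from the example table: (trx_id, client_id, timestamp, amount).
# Assumption: "2/1/12" means 2 January 2012 (day/month/year).
TRANSACTIONS = [
    (1, 1, datetime(2012, 1, 2, 6, 0), 580),
    (2, 1, datetime(2012, 1, 2, 6, 15), 120),
    (3, 2, datetime(2012, 1, 2, 8, 20), 12),
    (4, 1, datetime(2012, 1, 3, 4, 15), 60),
    (5, 2, datetime(2012, 1, 3, 9, 18), 8),
    (6, 1, datetime(2012, 1, 3, 9, 55), 1210),
]


def derive(transactions, window, aggregate):
    """For each transaction, aggregate the amounts of the same client's
    earlier transactions that fall inside the look-back window."""
    derived = []
    for _trx_id, client, when, _amount in transactions:
        previous = [
            amt
            for _, other_client, other_when, amt in transactions
            if other_client == client and when - window <= other_when < when
        ]
        derived.append(aggregate(previous))
    return derived


# "By client, last 6 hours, Count" and "By client, last 7 days, Sum(Amount)".
count_last_6h = derive(TRANSACTIONS, timedelta(hours=6), len)
sum_last_7d = derive(TRANSACTIONS, timedelta(days=7), sum)
```

Other combinations from the criteria table (for example grouping by merchant, or averaging instead of summing) would follow the same pattern with a different filter or aggregate function.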

Evaluation

• Confusion matrix

Predicted class (p_i)   True class: Fraud (y_i = 1)   True class: Legitimate (y_i = 0)
Fraud (p_i = 1)         TP                            FP
Legitimate (p_i = 0)    FN                            TN

• Misclassification = $1 - \frac{TP + TN}{TP + TN + FP + FN}$
• Recall = $\frac{TP}{TP + FN}$
• Precision = $\frac{TP}{TP + FP}$
• F-Score = $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
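The four measures can be computed directly from the confusion matrix counts. A minimal sketch with made-up labels (the `y_true` and `y_pred` lists are illustrative, not data from the study); returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN with fraud (label 1) as the positive class."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_pred):
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return misclassification, recall, precision and F-score."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "misclassification": 1 - (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
    }


# Illustrative labels only: 3 frauds among 10 transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
```

Note that with a 0.467% fraud rate, a model that labels every transaction as legitimate has a misclassification of only 0.467% while its recall is 0.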

Fraud algorithms

• If-Then rules
• Predictive modeling
• Logistic Regression
• Decision Trees
• Random Forest
• Cost Sensitive Logistic Regression

[Diagram: transaction flow through the network, with the "Fraud??" check]

If-Then rules (Expert rules)

• “Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions.”
• Example of rules
  • More than 4 ATM transactions in one hour?
  • More than 2 transactions in 5 minutes?
  • Magnetic stripe transaction then internet transaction?

If-Then rules (Expert rules)

• More than 4 ATM transactions in one hour?
• More than 2 transactions in 5 minutes?
• Magnetic stripe transaction then internet transaction?

[Diagram: transaction flow through the network, with the "Fraud??" check]

If one or more rules are activated, then decline the transaction.
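The three example rules and the decline decision can be sketched as follows. This is an illustration under stated assumptions, not the processor's actual rule engine: the function names and the type labels ("ATM", "Internet", "MagStripe") are hypothetical, counts are assumed to include the new transaction, and "then" is read as "the immediately preceding transaction on the same card".

```python
from datetime import datetime, timedelta


def count_recent(history, now, window, kind=None):
    """Count past transactions within `window` before `now`, optionally of one type."""
    return sum(
        1
        for when, trx_kind in history
        if now - window <= when <= now and (kind is None or trx_kind == kind)
    )


def triggered_rules(history, when, kind):
    """Return the names of the rules fired by a new transaction.

    history: list of (timestamp, type) for the same card, oldest first.
    Counts include the new transaction itself (an assumption).
    """
    fired = []
    atm_count = count_recent(history, when, timedelta(hours=1), "ATM")
    if kind == "ATM":
        atm_count += 1
    if atm_count > 4:
        fired.append("more_than_4_atm_in_one_hour")
    if count_recent(history, when, timedelta(minutes=5)) + 1 > 2:
        fired.append("more_than_2_trx_in_5_minutes")
    # Assumption: "then" means the immediately preceding transaction.
    if kind == "Internet" and history and history[-1][1] == "MagStripe":
        fired.append("magnetic_stripe_then_internet")
    return fired


def decide(history, when, kind):
    """Decline if one or more rules are activated, otherwise approve."""
    return "decline" if triggered_rules(history, when, kind) else "approve"


history = [
    (datetime(2012, 1, 2, 10, 0), "MagStripe"),
    (datetime(2012, 1, 2, 10, 2), "Present"),
]
burst = decide(history, datetime(2012, 1, 2, 10, 3), "Internet")
quiet = decide([], datetime(2012, 1, 2, 10, 3), "Internet")
```

Here `burst` is declined because the new transaction is the third within 5 minutes, while `quiet` is approved because no rule fires on an empty history.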

If-Then rules (Expert rules)

• Problems with rules
  • New fraud patterns are not detected
  • Only simple rules can be created
• Advantages of rules
  • Easy to implement
  • Very easy to interpret

If-Then rules (Expert rules)

Results

Misclassification  Recall  Precision  F1-Score
1.04%              31%     17%        22%

Financial evaluation

• Motivation
  • False positives carry a different cost than false negatives
  • Frauds range from a few to thousands of euros (dollars, pounds, etc.)

There is a need for a real comparison measure

Financial evaluation

• Cost matrix

Predicted class (p_i)   True class: Fraud (y_i = 1)   True class: Legitimate (y_i = 0)
Fraud (p_i = 1)         Ca                            Ca
Legitimate (p_i = 0)    Amt                           0

where:

Ca   Administrative costs
Amt  Amount of transaction i

• Evaluation measure
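Read cell by cell, the cost matrix charges the administrative cost Ca whenever a transaction is flagged as fraud, and the transaction amount whenever a fraud is missed. A minimal sketch of a total-cost measure built that way; the function name `total_cost`, the value of Ca, and the sample data are assumptions for illustration, not figures from the study.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the cost-matrix entries over all transactions.

    Flagged as fraud (p = 1): administrative cost, whatever the true class.
    Missed fraud (p = 0, y = 1): the transaction amount is lost.
    Correctly accepted (p = 0, y = 0): no cost.
    """
    cost = 0.0
    for y, p, amount in zip(y_true, y_pred, amounts):
        if p == 1:
            cost += admin_cost
        elif y == 1:
            cost += amount
    return cost


# Illustrative data only; admin_cost = 2.5 is a hypothetical value of Ca.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
amounts = [500, 120, 60, 8]

flagged_cost = total_cost(y_true, y_pred, amounts, admin_cost=2.5)
no_model_cost = total_cost(y_true, [0, 0, 0, 0], amounts, admin_cost=2.5)
```

With no model (nothing flagged), the cost is the sum of all fraud amounts, which is how a "cost with no model" baseline equal to the total fraud losses arises.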

If-Then rules

Results

Misclassification  Recall  Precision  F1-Score
1.04%              31%     17%        22%

Cost      Cost No Model
€ 95,520  € 148,562

148,562 EUR are the losses due to fraud in the test database (2 months).

Predictive modeling

Predictive modeling is the use of statistical and mathematical techniques to discover patterns in data in order to make predictions.

Predictive modeling

[Figure: plot of amount of transaction against number of transactions last day; points marked as Normal Transaction or Fraud]

Predictive modeling

[Figure: plot of amount of transaction, number of transactions last day, and amount spent on internet last month; points marked as Normal Transaction or Fraud]

Logistic Regression

• Model

• Cost Function

• Cost Matrix

Predicted class (p_i)   True class: Fraud (y_i = 1)   True class: Legitimate (y_i = 0)
Fraud (p_i = 1)         0                             1
Legitimate (p_i = 0)    1                             0
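The slide's model and cost function formulas are not reproduced in this transcript. A minimal sketch, assuming the standard textbook forms: the sigmoid model h(x) = 1 / (1 + exp(-θ·x)) and the average log-loss, which penalizes both kinds of error symmetrically, in line with the 0/1 cost matrix. All names and sample values are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function; fine for moderate z (no overflow guard in this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """Estimated probability of fraud for feature vector x (x[0] = 1 for the intercept)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def log_loss(theta, X, y):
    """Average negative log-likelihood, the usual logistic regression cost function."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = predict_proba(theta, x_i)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1 - h)
    return -total / len(y)


# Illustrative data: intercept plus one feature.
X = [[1.0, 0.2], [1.0, -0.4]]
y = [1, 0]
baseline_loss = log_loss([0.0, 0.0], X, y)  # every prediction is 0.5
```

Nothing in this objective depends on the transaction amount, so a missed fraud of a few euros counts exactly as much as one of thousands of euros.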

Logistic Regression

Results

Misclassification  Recall  Precision  F1-Score
0.52%              0%      2%         0%

Cost       Cost No Model
€ 148,196  € 148,562

148,562 EUR are the losses due to fraud in the test database (2 months).

Logistic Regression

Sub-sampling procedure: select all the frauds and a random sample of the legitimate transactions.

Fraud percentage        0.467%   1%       5%      10%     20%     50%
Number of transactions  620,000  310,000  62,000  31,000  15,500  5,200
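The sub-sampling procedure keeps every fraud and draws just enough legitimate transactions to reach a target fraud percentage. A minimal sketch; the function name `undersample`, the fixed seed, and the toy labels are assumptions for illustration.

```python
import random


def undersample(labels, fraud_share, seed=0):
    """Return sorted row indices: all frauds plus a random sample of legitimate
    transactions sized so that frauds make up `fraud_share` of the result."""
    frauds = [i for i, y in enumerate(labels) if y == 1]
    legitimate = [i for i, y in enumerate(labels) if y == 0]
    wanted = round(len(frauds) * (1 - fraud_share) / fraud_share)
    wanted = min(wanted, len(legitimate))  # cannot sample more than exist
    rng = random.Random(seed)
    return sorted(frauds + rng.sample(legitimate, wanted))


# Toy data: 10 frauds among 1,000 transactions (1% fraud rate).
labels = [1] * 10 + [0] * 990
sample_10 = undersample(labels, 0.10)
sample_50 = undersample(labels, 0.50)
```

Raising the target fraud share shrinks the training set quickly, since the number of frauds is fixed and only the legitimate sample changes.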

Logistic Regression

Results: selecting the algorithm by cost

[Figure: cost (€ axis, 0 to 160,000) with recall, precision, misclassification and F1-score (% axis, 0% to 70%) for each training set: No Model, All, 1%, 5%, 10%, 20%, 50%. Cost labels in the order extracted: € 148,562, € 148,196, € 142,510, € 112,103, € 79,838, € 65,870, € 46,530]

Logistic Regression

• The best model selected using the traditional F1-Score does not give the best results in terms of cost

• The model selected by cost is trained using less than 1% of the database, so a lot of information is excluded

• The algorithm is trained to (approximately) minimize the misclassification but is then evaluated based on cost

• Why not train the algorithm to minimize the cost instead?

Cost Sensitive Logistic Regression

• Cost Matrix

Predicted class (p_i)   True class: Fraud (y_i = 1)   True class: Legitimate (y_i = 0)
Fraud (p_i = 1)         Ca                            Ca
Legitimate (p_i = 0)    Amt                           0

• Cost Function

• Objective

Find θ that minimizes the cost function (Genetic Algorithms)
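One natural way to turn the cost matrix into a training objective is to replace the hard prediction with the model's fraud probability and average the expected cost per transaction. The slide's own formula is not reproduced in this transcript, so the form below is an assumption derived from the cost matrix, and the names and sample values (including Ca = 2.0) are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function; fine for moderate z (no overflow guard in this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def cost_sensitive_loss(theta, X, y, amounts, admin_cost):
    """Average expected cost under the cost matrix.

    For a fraud (y = 1): h * Ca + (1 - h) * Amt_i.
    For a legitimate transaction (y = 0): h * Ca.
    h is the predicted fraud probability for the transaction.
    """
    total = 0.0
    for x_i, y_i, amount in zip(X, y, amounts):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        fraud_term = h * admin_cost + (1.0 - h) * amount
        legit_term = h * admin_cost
        total += y_i * fraud_term + (1 - y_i) * legit_term
    return total / len(y)


# Illustrative data: intercept plus one feature; Ca = 2.0 is hypothetical.
X = [[1.0, 0.3], [1.0, -0.7]]
y = [1, 0]
amounts = [100.0, 40.0]
loss_at_zero = cost_sensitive_loss([0.0, 0.0], X, y, amounts, admin_cost=2.0)
```

Unlike the log-loss, this objective weights each fraud by its own amount. Any optimizer that only needs function values, such as the genetic algorithms named on the slide, could be applied to it.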

Cost Sensitive Logistic Regression

• Cost Function

• Gradient

• Hessian
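The gradient formula on this slide is not reproduced in the transcript. As a sketch only, assume the cost function is the average expected cost per transaction, h_i * Ca + y_i * (1 - h_i) * Amt_i, with h_i the sigmoid of θ·x_i. Its gradient is then the average of (Ca - y_i * Amt_i) * h_i * (1 - h_i) * x_i, which the code below checks against a finite-difference estimate. All names and values are illustrative; the Hessian is not covered.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def expected_cost(theta, X, y, amounts, admin_cost):
    """Assumed cost function: mean of h*Ca + y*(1 - h)*Amt."""
    total = 0.0
    for x_i, y_i, amount in zip(X, y, amounts):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total += h * admin_cost + y_i * (1.0 - h) * amount
    return total / len(y)


def expected_cost_gradient(theta, X, y, amounts, admin_cost):
    """Analytic gradient: mean of (Ca - y*Amt) * h * (1 - h) * x."""
    grad = [0.0] * len(theta)
    for x_i, y_i, amount in zip(X, y, amounts):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        slope = (admin_cost - y_i * amount) * h * (1.0 - h)
        for j, value in enumerate(x_i):
            grad[j] += slope * value
    return [g / len(y) for g in grad]


def numeric_gradient(theta, X, y, amounts, admin_cost, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        diff = expected_cost(up, X, y, amounts, admin_cost) - expected_cost(
            down, X, y, amounts, admin_cost
        )
        grad.append(diff / (2 * eps))
    return grad


# Illustrative data; Ca = 2.0 is a hypothetical administrative cost.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 0, 1]
amounts = [100.0, 40.0, 250.0]
theta = [0.1, -0.3]
analytic = expected_cost_gradient(theta, X, y, amounts, 2.0)
numeric = numeric_gradient(theta, X, y, amounts, 2.0)
```

The sign of each term shows the trade-off: for a fraud with Amt_i above Ca, raising the predicted probability lowers the cost, while for a legitimate transaction it raises it.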

Cost Sensitive Logistic Regression

[Figure: cumulative distribution of transaction amount (0% to 100%) for Legitimate and Fraud transactions; amounts marked on the chart: €49, €370, €124, €196]

Cost Sensitive Logistic Regression

Results

[Figure: cost (€ axis, 0 to 160,000) with recall, precision and F1-score (% axis, 0% to 100%) for each training set: No Model, All, 1%, 5%, 10%, 20%, 50%. Cost labels in the order extracted: € 148,562, € 31,174, € 37,785, € 66,245, € 67,264, € 73,772, € 85,724]

Cost Sensitive Logistic Regression

Results

[Figure: cost (€ axis, 0 to 160,000) with recall, precision and F1-score (% axis, 0% to 80%) by model. Models in the order extracted: No Model, If-Then rules, Logistic Regression, Cost Sensitive Logistic Regression, Decision Trees, Random Forests. Cost labels in the order extracted: € 148,562, € 95,520, € 46,530, € 31,174, € 35,466, € 34,203]

Conclusion

• Selecting models based on traditional statistics does not give the best results in terms of cost

• Models should be evaluated taking into account real financial costs of the application

• Algorithms should be developed to incorporate those financial costs

Contact information

Alejandro Correa Bahnsen

University of Luxembourg

Luxembourg

[email protected]

http://www.linkedin.com/in/albahnsen

http://www.slideshare.net/albahnsen

Thank You!!

Alejandro Correa Bahnsen Andres Gonzalez Montoya
