Transcript
Page 1: 2013 credit card fraud detection why theory dosent adjust to practice

Copyright © 2013, SAS Institute Inc. All rights reserved. #analytics2013

Credit Card Fraud Detection: Why Theory Doesn't Adjust to Practice

Alejandro Correa Bahnsen, Luxembourg University

Andrés Gonzalez Montoya, Scotia Bank

Page 2: 2013 credit card fraud detection why theory dosent adjust to practice

Introduction

[Figure: Europe fraud evolution, Internet transactions (millions of euros), 2007 to 2012E; vertical axis from € 500 to € 800]

Page 3: 2013 credit card fraud detection why theory dosent adjust to practice

Introduction

[Figure: US fraud evolution, online revenue lost due to fraud (billions of dollars), 2001 to 2012; vertical axis from $0 to $5.0]

Page 4: 2013 credit card fraud detection why theory dosent adjust to practice

• Increasing fraud levels around the world

• Different technologies and legal requirements make fraud harder to control

• There is a need for advanced fraud detection systems

Introduction

Page 5: 2013 credit card fraud detection why theory dosent adjust to practice

• Introduction

• Transaction flow

• Database

• Evaluation of algorithms

• If-Then rules (Expert Rules)

• Financial measure

• Predictive modeling

• Logistic Regression

• Cost Sensitive Logistic Regression

Agenda

Page 6: 2013 credit card fraud detection why theory dosent adjust to practice

Simplified transaction flow

[Diagram labels: Network, Fraud??]

Page 7: 2013 credit card fraud detection why theory dosent adjust to practice

Data

• Large European card processing company

• 2012 card-present transactions

• 750,000 transactions

• 3,500 frauds

• 0.467% fraud rate

• 148,562 EUR lost due to fraud on the test dataset

[Figure: timeline of the months Jan to Dec, divided into Train and Test periods]

Page 8: 2013 credit card fraud detection why theory dosent adjust to practice

Data

• Raw attributes

TRXID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud
1      1          2/1/12 6:00  580     Ger       Internet  Airlines        No
2      1          2/1/12 6:15  120     Eng       Present   Car Rent        No
3      2          2/1/12 8:20  12      Bel       Present   Hotel           Yes
4      1          3/1/12 4:15  60      Esp       ATM       ATM             No
5      2          3/1/12 9:18  8       Fra       Present   Retail          No
6      1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes

• Other attributes: age, country of residence, postal code, type of card

Page 9: 2013 credit card fraud detection why theory dosent adjust to practice

Data

• Derived attributes

Trx ID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud  Trx 6h  Sum 7d
1       1          2/1/12 6:00  580     Ger       Internet  Airlines        No     0       0
2       1          2/1/12 6:15  120     Eng       Present   Car Renting     No     1       580
3       2          2/1/12 8:20  12      Bel       Present   Hotel           Yes    0       0
4       1          3/1/12 4:15  60      Esp       ATM       ATM             No     0       700
5       2          3/1/12 9:18  8       Fra       Present   Retail          No     0       12
6       1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes    1       760

Trx 6h = No. of Trx, same client, last 6 hours

Sum 7d = Sum, same client, last 7 days

– Combination of the following criteria:

By           Group              Last      Function
Client       None               hour      Count
Credit Card  Transaction Type   day       Sum(Amount)
             Merchant           week      Avg(Amount)
             Merchant Category  month
             Merchant Country   3 months
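The two derived attributes in the table can be reproduced with a small look-back aggregation. The sketch below is an illustration only, not the authors' implementation: the function name `derive` is hypothetical, and the dates are read as day/month/year, which is an assumption (it is the reading under which the 7-day sums in the table come out right).

```python
from datetime import datetime, timedelta

# The six example transactions from the slide.
# Assumption: dates are day/month/year, so 2/1/12 is 2 January 2012.
TRANSACTIONS = [
    # (trx_id, client_id, timestamp, amount)
    (1, 1, datetime(2012, 1, 2, 6, 0), 580),
    (2, 1, datetime(2012, 1, 2, 6, 15), 120),
    (3, 2, datetime(2012, 1, 2, 8, 20), 12),
    (4, 1, datetime(2012, 1, 3, 4, 15), 60),
    (5, 2, datetime(2012, 1, 3, 9, 18), 8),
    (6, 1, datetime(2012, 1, 3, 9, 55), 1210),
]


def derive(transactions, window, func):
    """For each transaction, apply func to the amounts of the same
    client's earlier transactions inside the look-back window."""
    out = {}
    for trx_id, client, ts, _amount in transactions:
        past = [
            amt
            for _, c, t, amt in transactions
            if c == client and ts - window <= t < ts
        ]
        out[trx_id] = func(past)
    return out


# "By client, last 6 hours, Count" and "By client, last 7 days, Sum(Amount)"
count_6h = derive(TRANSACTIONS, timedelta(hours=6), len)
sum_7d = derive(TRANSACTIONS, timedelta(days=7), sum)

for trx_id in sorted(count_6h):
    print(trx_id, count_6h[trx_id], sum_7d[trx_id])
```

Other combinations from the criteria table (a different grouping, window or function) would follow the same pattern by changing the filter, the `timedelta` or the aggregation passed in.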

Page 10: 2013 credit card fraud detection why theory dosent adjust to practice

Evaluation

• Confusion matrix

                        True class (y_i)
Predicted class (p_i)   Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)         TP                FP
Legitimate (p_i = 0)    FN                TN

• Misclassification = $1 - \frac{TP + TN}{TP + TN + FP + FN}$

• Recall = $\frac{TP}{TP + FN}$

• Precision = $\frac{TP}{TP + FP}$

• F-Score = $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
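As a quick illustration of these four measures, the sketch below computes them from confusion-matrix counts. The function name `evaluate` and the first set of counts are made up for the example. The second call uses the dataset size quoted earlier (750,000 transactions, 3,500 frauds) to show what a model that never flags fraud would score.

```python
def evaluate(tp, fp, fn, tn):
    """Compute the slide's four measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    misclassification = 1 - (tp + tn) / total
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "misclassification": misclassification,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
    }


# Illustrative counts, not taken from the slides.
example = evaluate(tp=30, fp=70, fn=20, tn=880)

# A model that labels every transaction legitimate on a dataset with
# 3,500 frauds out of 750,000 transactions.
never_flag = evaluate(tp=0, fp=0, fn=3500, tn=746500)

print(example)
print(never_flag)
```

The second case has a misclassification rate below 0.5% while catching no fraud at all, which is one reason misclassification alone is a weak measure on data this imbalanced.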

Page 11: 2013 credit card fraud detection why theory dosent adjust to practice

• Introduction

• Transaction flow

• Database

• Evaluation of algorithms

• If-Then rules (Expert Rules)

• Financial measure

• Predictive modeling

• Logistic Regression

• Cost Sensitive Logistic Regression

Agenda

Page 12: 2013 credit card fraud detection why theory dosent adjust to practice

Fraud

Algorithms

• If-Then rules

• Predictive modeling

• Logistic Regression

• Decision Trees

• Random Forest

• Cost Sensitive Logistic Regression

Fraud??

Network

Page 13: 2013 credit card fraud detection why theory dosent adjust to practice

• “Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions.”

• Example of rules

• More than 4 ATM transactions in one hour?

• More than 2 transactions in 5 minutes?

• Magnetic stripe transaction then internet transaction?

If-Then rules (Expert rules)

Page 14: 2013 credit card fraud detection why theory dosent adjust to practice

• More than 4 ATM transactions in one hour?

• More than 2 transactions in 5 minutes?

• Magnetic stripe transaction then internet transaction?

If-Then rules (Expert rules)

Fraud??

Network

If one or more rules are activated, then decline the transaction
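A minimal sketch of how the three example rules and the decline decision might be coded. Everything here is hypothetical: the dictionary layout, the type labels ("ATM", "MagStripe", "Internet"), and the choice to count the current transaction inside each window are assumptions. The third rule is read as "the previous transaction used the magnetic stripe and the current one is an internet transaction", since the slide gives no time window for it.

```python
from datetime import datetime, timedelta


def count_recent(transactions, now, window, trx_type=None):
    """Count transactions inside [now - window, now], optionally of one type."""
    return sum(
        1
        for t in transactions
        if now - window <= t["time"] <= now
        and (trx_type is None or t["type"] == trx_type)
    )


def rules_fired(history, trx):
    """history: earlier transactions of the same card, oldest first.
    trx: the transaction being authorized (it is counted in the windows)."""
    seen = history + [trx]
    now = trx["time"]
    fired = []
    if count_recent(seen, now, timedelta(hours=1), "ATM") > 4:
        fired.append("more than 4 ATM transactions in one hour")
    if count_recent(seen, now, timedelta(minutes=5)) > 2:
        fired.append("more than 2 transactions in 5 minutes")
    # Assumed reading of the third rule (no time window is given).
    if history and history[-1]["type"] == "MagStripe" and trx["type"] == "Internet":
        fired.append("magnetic stripe transaction then internet transaction")
    return fired


def decide(history, trx):
    """Decline when one or more rules fire, otherwise approve."""
    return "decline" if rules_fired(history, trx) else "approve"


t0 = datetime(2012, 1, 2, 6, 0)
burst = [
    {"time": t0, "type": "Present"},
    {"time": t0 + timedelta(minutes=2), "type": "Present"},
]
third = {"time": t0 + timedelta(minutes=4), "type": "Present"}
print(decide(burst, third), rules_fired(burst, third))
```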

Page 15: 2013 credit card fraud detection why theory dosent adjust to practice

• Problems with rules

• New fraud patterns are not detected

• Only simple rules can be created

• Advantages of rules

• Easy to implement

• Very easy to interpret

If-Then rules (Expert rules)

Page 16: 2013 credit card fraud detection why theory dosent adjust to practice

If-Then rules (Expert rules)

Results

Misclassification  Recall  Precision  F1-Score
1.04%              31%     17%        22%

Page 17: 2013 credit card fraud detection why theory dosent adjust to practice

Financial evaluation

• Motivation

• False positives carry a different cost than false negatives

• Frauds range from a few to thousands of euros (dollars, pounds, etc.)

There is a need for a real comparison measure

Page 18: 2013 credit card fraud detection why theory dosent adjust to practice

Financial evaluation

• Cost matrix

                        True class (y_i)
Predicted class (p_i)   Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)         Ca                Ca
Legitimate (p_i = 0)    Amt               0

where:

Ca   Administrative costs
Amt  Amount of transaction i

• Evaluation measure
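The formula for the evaluation measure is not present in this transcript. One measure that follows directly from the cost matrix is the sum of the matrix entries over all transactions: every transaction flagged as fraud costs Ca, and every missed fraud costs its amount. The sketch below assumes that reading. The function name `total_cost` and the administrative cost of 5 are hypothetical (the slides do not state Ca), and the amounts and labels are the six example transactions from the Data slide.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the cost-matrix entry of every transaction.

    Flagged as fraud (TP or FP): admin_cost.
    Missed fraud (FN): the transaction amount.
    Correctly passed (TN): 0.
    """
    cost = 0.0
    for y, p, amt in zip(y_true, y_pred, amounts):
        if p == 1:
            cost += admin_cost
        elif y == 1:
            cost += amt
    return cost


# Six example transactions; transactions 3 and 6 are frauds.
amounts = [580, 120, 12, 60, 8, 1210]
y_true = [0, 0, 1, 0, 0, 1]
CA = 5  # hypothetical administrative cost

no_model = total_cost(y_true, [0] * 6, amounts, CA)        # nothing flagged
flag_last = total_cost(y_true, [0, 0, 0, 0, 0, 1], amounts, CA)
flag_all = total_cost(y_true, [1] * 6, amounts, CA)

print(no_model, flag_last, flag_all)
```

With no model, the cost equals the total fraud amount, which matches how the slides report "Cost No Model" as the losses due to fraud.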

Page 19: 2013 credit card fraud detection why theory dosent adjust to practice

If-Then rules

Results

Misclassification  Recall  Precision  F1-Score
1.04%              31%     17%        22%

Cost      Cost No Model
€ 95,520  € 148,562

148,562 EUR are the losses due to fraud in the test database (2 months)

Page 20: 2013 credit card fraud detection why theory dosent adjust to practice

• Introduction

• Transaction flow

• Database

• Evaluation of algorithms

• If-Then rules (Expert Rules)

• Financial measure

• Predictive modeling

• Logistic Regression

• Cost Sensitive Logistic Regression

Agenda

Page 21: 2013 credit card fraud detection why theory dosent adjust to practice

Predictive modeling is the use of statistical and mathematical techniques to discover patterns in data in order to make predictions

Predictive modeling

Page 22: 2013 credit card fraud detection why theory dosent adjust to practice

Predictive modeling

[Figure: scatter plot of amount of transaction against number of transactions last day; legend: Normal Transaction, Fraud]

Page 23: 2013 credit card fraud detection why theory dosent adjust to practice

Predictive modeling

[Figure: scatter plot of amount of transaction against number of transactions last day; legend: Normal Transaction, Fraud]

Page 24: 2013 credit card fraud detection why theory dosent adjust to practice

Predictive modeling

[Figure: plot with axes amount of transaction, number of transactions last day, and amount spent on internet last month; legend: Normal Transaction, Fraud]

Page 25: 2013 credit card fraud detection why theory dosent adjust to practice

Logistic Regression

• Model

• Cost Function

• Cost Matrix

                        True class (y_i)
Predicted class (p_i)   Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)         0                 1
Legitimate (p_i = 0)    1                 0
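The model and cost function formulas are not present in this transcript. The sketch below uses the textbook form of logistic regression, which is assumed (not confirmed) to be what the slide showed: the model is the sigmoid of a linear score, and the cost function is the average negative log-likelihood. All names and the toy data are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta, x):
    """Model: estimated probability of fraud, h = sigmoid(theta . x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def log_loss(theta, X, y):
    """Cost function: average negative log-likelihood."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for xi, yi in zip(X, y):
        h = min(max(predict_proba(theta, xi), eps), 1 - eps)
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / len(y)


# Toy data: a bias term plus one feature.
X = [[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]]
y = [0, 0, 1, 1]

loss_zero = log_loss([0.0, 0.0], X, y)   # uninformative model
loss_fit = log_loss([-3.0, 2.0], X, y)   # separates the two classes
print(loss_zero, loss_fit)
```

Note that this cost function treats both kinds of error alike, in line with the 0/1 cost matrix on the slide: neither the transaction amount nor the administrative cost enters it.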

Page 26: 2013 credit card fraud detection why theory dosent adjust to practice

Logistic Regression

Results

Misclassification  Recall  Precision  F1-Score
0.52%              0%      2%         0%

Cost       Cost No Model
€ 148,196  € 148,562

148,562 EUR are the losses due to fraud in the test database (2 months)

Page 27: 2013 credit card fraud detection why theory dosent adjust to practice

Logistic Regression

Sub-sampling procedure:

Select all the frauds and a random sample of the legitimate transactions.

Fraud percentage  0.467%   1%       5%      10%     20%     50%
Transactions      620,000  310,000  62,000  31,000  15,500  5,200
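The sub-sampling procedure can be sketched as follows: keep every fraud, then draw just enough legitimate transactions at random for the frauds to make up the target percentage. The function name `subsample`, the fixed seed and the toy label vector are assumptions for illustration.

```python
import random


def subsample(labels, target_fraud_rate, seed=0):
    """Return sorted indices of all frauds plus a random sample of
    legitimate transactions sized so that frauds make up roughly
    target_fraud_rate of the result."""
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    n_legit = round(len(fraud_idx) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit_idx))  # cannot draw more than exist
    rng = random.Random(seed)
    return sorted(fraud_idx + rng.sample(legit_idx, n_legit))


# Toy data: 50 frauds among 10,000 transactions (0.5% fraud rate).
labels = [1] * 50 + [0] * 9950

for rate in (0.01, 0.05, 0.10, 0.20, 0.50):
    idx = subsample(labels, rate)
    frauds = sum(labels[i] for i in idx)
    print(rate, len(idx), frauds / len(idx))
```

The trade-off is the one visible in the table: the higher the target fraud percentage, the fewer legitimate transactions the model ever sees.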

Page 28: 2013 credit card fraud detection why theory dosent adjust to practice

Logistic Regression

Results

Training set  No Model   All        1%         5%         10%       20%       50%
Cost          € 148,562  € 148,196  € 142,510  € 112,103  € 79,838  € 65,870  € 46,530

[Chart series: Cost (€ 0 to € 160,000); Recall, Precision, Misclassification, F1-Score (0% to 70%)]

Selecting the algorithm by Cost

Page 29: 2013 credit card fraud detection why theory dosent adjust to practice

Logistic Regression

• The best model selected using the traditional F1-Score does not give the best results in terms of cost

• The model selected by cost is trained using less than 1% of the database, meaning a lot of information is excluded

• The algorithm is trained to minimize the misclassification (approx.) but is then evaluated based on cost

• Why not train the algorithm to minimize the cost instead?

Page 30: 2013 credit card fraud detection why theory dosent adjust to practice

Cost Sensitive Logistic Regression

• Cost Matrix

                        True class (y_i)
Predicted class (p_i)   Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)         Ca                Ca
Legitimate (p_i = 0)    Amt               0

• Cost Function

• Objective

Find θ that minimizes the cost function (Genetic Algorithms)
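The cost function itself is not present in this transcript. A natural candidate, assumed here rather than taken from the slide, replaces the hard prediction in the cost matrix with the predicted fraud probability h, giving an expected cost per transaction of h·Ca + (1 − h)·Amt for a fraud and h·Ca for a legitimate transaction. The slide names genetic algorithms as the optimizer but gives no configuration, so the search below is a toy version (keep the best half, uniform crossover, Gaussian mutation) with made-up settings. Ca = 5 and the amount scaling are likewise hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_cost(theta, X, y, amounts, admin_cost):
    """Average cost-matrix entry, weighted by predicted fraud probability."""
    total = 0.0
    for xi, yi, amt in zip(X, y, amounts):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        if yi == 1:
            total += h * admin_cost + (1 - h) * amt
        else:
            total += h * admin_cost
    return total / len(y)


def genetic_minimize(cost, dim, pop_size=30, generations=60, seed=0):
    """Toy genetic search; all settings are illustrative assumptions."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]  # selection: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # uniform crossover followed by Gaussian mutation
            children.append([rng.choice(g) + rng.gauss(0, 0.3) for g in zip(a, b)])
        pop = parents + children
    return min(pop, key=cost)


# Six example transactions: bias term plus amount in thousands of euros.
amounts = [580, 120, 12, 60, 8, 1210]
y = [0, 0, 1, 0, 0, 1]
X = [[1.0, a / 1000] for a in amounts]
CA = 5  # hypothetical administrative cost


def objective(theta):
    return expected_cost(theta, X, y, amounts, CA)


baseline = objective([0.0, 0.0])
best = genetic_minimize(objective, dim=2)
print(baseline, objective(best), best)
```

Unlike the log-likelihood, this objective weights each fraud by its amount, so missing the € 1,210 fraud is penalized far more heavily than missing the € 12 one.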

Page 31: 2013 credit card fraud detection why theory dosent adjust to practice

• Cost Function

• Gradient

• Hessian

Cost Sensitive Logistic Regression
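The cost function, gradient and Hessian formulas are not present in this transcript. Assuming the expected-cost form in which a fraud contributes h·Ca + (1 − h)·Amt and a legitimate transaction contributes h·Ca, each term is linear in h with slope k = Ca − Amt (fraud) or k = Ca (legitimate), so the gradient with respect to θ is the average of k·h·(1 − h)·x. The sketch below implements that and checks it against finite differences. The Hessian is omitted, and the names, Ca = 5 and the data are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost_and_gradient(theta, X, y, amounts, admin_cost):
    """Expected cost (assumed form) and its analytic gradient."""
    n = len(y)
    cost = 0.0
    grad = [0.0] * len(theta)
    for xi, yi, amt in zip(X, y, amounts):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        # cost_i = k * h + const, so k is d(cost_i)/dh
        k = admin_cost - amt if yi == 1 else admin_cost
        cost += k * h + (amt if yi == 1 else 0.0)
        for j, v in enumerate(xi):
            grad[j] += k * h * (1 - h) * v  # chain rule: dh/dtheta_j = h(1-h)x_j
    return cost / n, [g / n for g in grad]


def numeric_gradient(theta, X, y, amounts, admin_cost, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        c_up, _ = cost_and_gradient(up, X, y, amounts, admin_cost)
        c_down, _ = cost_and_gradient(down, X, y, amounts, admin_cost)
        grad.append((c_up - c_down) / (2 * eps))
    return grad


amounts = [580, 120, 12, 60, 8, 1210]
y = [0, 0, 1, 0, 0, 1]
X = [[1.0, a / 1000] for a in amounts]
theta = [0.2, -0.4]

cost, analytic = cost_and_gradient(theta, X, y, amounts, 5)
numeric = numeric_gradient(theta, X, y, amounts, 5)
print(cost, analytic, numeric)
```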

Page 32: 2013 credit card fraud detection why theory dosent adjust to practice

Cost Sensitive Logistic Regression

[Figure: cumulative distribution (0% to 100%) of transaction amount for Legitimate and Fraud transactions; annotated amounts: €49, €370, €124, €196]

Page 33: 2013 credit card fraud detection why theory dosent adjust to practice

Cost sensitive Logistic Regression

Results

Training set  No Model   All       1%        5%        10%       20%       50%
Cost          € 148,562  € 31,174  € 37,785  € 66,245  € 67,264  € 73,772  € 85,724

[Chart series: Cost (€ 0 to € 160,000); Recall, Precision, F1-Score (0% to 100%)]

Page 34: 2013 credit card fraud detection why theory dosent adjust to practice

Cost sensitive Logistic Regression

Results

Model                               Cost
No Model                            € 148,562
If-Then rules                       € 95,520
Logistic Regression                 € 46,530
Cost Sensitive Logistic Regression  € 31,174
Decision Trees                      € 35,466
Random Forests                      € 34,203

[Chart series: Cost (€ 0 to € 160,000); Recall, Precision, F1-Score (0% to 80%)]
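To put the reported costs on a common scale, the short calculation below expresses each one as a fraction of the no-model loss that the model avoids. The costs are the figures from this comparison; the savings measure itself is an addition for illustration, not something the slides report.

```python
NO_MODEL_COST = 148_562  # losses due to fraud on the test set, in EUR

MODEL_COSTS = {
    "If-Then rules": 95_520,
    "Logistic Regression": 46_530,
    "Cost Sensitive Logistic Regression": 31_174,
    "Decision Trees": 35_466,
    "Random Forests": 34_203,
}

# Fraction of the no-model loss avoided by each model.
savings = {
    name: (NO_MODEL_COST - cost) / NO_MODEL_COST
    for name, cost in MODEL_COSTS.items()
}
best = min(MODEL_COSTS, key=MODEL_COSTS.get)

for name, share in sorted(savings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
print("lowest cost:", best)
```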

Page 35: 2013 credit card fraud detection why theory dosent adjust to practice

Conclusion

• Selecting models based on traditional statistics does not give the best results in terms of cost

• Models should be evaluated taking into account real financial costs of the application

• Algorithms should be developed to incorporate those financial costs

Page 36: 2013 credit card fraud detection why theory dosent adjust to practice

Contact information

Alejandro Correa Bahnsen

University of Luxembourg

Luxembourg

[email protected]

http://www.linkedin.com/in/albahnsen

http://www.slideshare.net/albahnsen

Page 37: 2013 credit card fraud detection why theory dosent adjust to practice

Thank You!!

Alejandro Correa Bahnsen

Andres Gonzalez Montoya
