Data Innovation Summit - Made in Belgium 2015

Data Innovation Summit 2015

Andrea Dal Pozzolo, Olivier Caelen, and Gianluca Bontempi

Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information

26/3/2015

About Me

•  I’m a PhD student

•  Machine Learning techniques for Fraud Detection in electronic transactions.

Academic partner: MLG - ULB

•  Researchers expert in data mining, computational modelling, statistics and their applications to fraud detection, bioinformatics and time series prediction.

Industrial partner: Worldline

•  Worldline is a leader in electronic payment services.

•  Its team of experts in Brussels has more than 25 years of expertise in fraud detection.

The problem

•  Growing presence of frauds

•  It is not easy for a human analyst to detect fraudulent patterns --> need automatic systems for fraud detection

Challenges

1.  Concept drift (i.e. customers’ spending habits change)

2.  Unbalanced classification (i.e. few frauds)

[Figure: scatter plot with axes dataset$X1 and dataset$X2]

3.  True class labels are available only for the few transactions that are alerted and checked.

[Figure: predictive model]

Goal of Detection

With a limited budget, few transactions can be manually checked.

Goal: limiting the false alerts for a given budget
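One way to make this goal measurable is the precision among the alerted transactions: if investigators can check only a fixed number of transactions, count how many of the highest-scored ones are real frauds. The sketch below is illustrative only; the function name `alert_precision`, the scores, and the labels are assumptions, not taken from the slides.

```python
def alert_precision(scores, labels, budget):
    """Fraction of true frauds among the `budget` highest-scored transactions."""
    if budget <= 0 or not scores:
        raise ValueError("need a positive budget and at least one transaction")
    # Rank transaction indices from most to least risky.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    alerted = ranked[:budget]
    # Labels are 1 for fraud and 0 for genuine, so the sum counts true alerts.
    return sum(labels[i] for i in alerted) / len(alerted)


# Hypothetical fraud scores and true labels for six transactions.
scores = [0.91, 0.15, 0.78, 0.05, 0.66, 0.30]
labels = [1, 0, 0, 0, 1, 0]
precision = alert_precision(scores, labels, budget=3)
```

With a budget of three alerts, one false alert is raised here, so fewer false alerts for the same budget means a higher value of this measure.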

Two types of transactions

[Figure: timeline with marks t−δ, t−1 and t, showing the feedbacks F_t (supervised samples) and the delayed samples D_{t−δ}. Legend: all fraudulent transactions of a day; all genuine transactions of a day; fraudulent transactions in the feedback; genuine transactions in the feedback.]

Data streams

•  Feedbacks:

– Classifier dependent

– Small set of risky transactions

[Figure: data streams over Day 1, Day 2 and Day 3, with feedbacks F_t, F_{t−1}, ..., F_{t−6} and delayed samples D_{t−7}, D_{t−8}, D_{t−9} along the time axis. Legend: fraudulent transactions; genuine transactions; fraudulent feedback; genuine feedback.]

•  Delayed samples:

– Large set of mostly genuine transactions
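The two streams can be sketched as follows, assuming that feedbacks are the highest-scored transactions of each of the last δ days and that every transaction older than that is fully labelled. The function names, the `risk` field, and the values δ = 7 and budget = 2 are hypothetical choices for illustration.

```python
def daily_feedbacks(transactions, score, budget):
    """Return the `budget` riskiest transactions of one day (those that get checked)."""
    return sorted(transactions, key=score, reverse=True)[:budget]


def split_streams(days, t, delta, score, budget):
    """Data usable at day t: recent feedbacks and older, fully labelled delayed samples."""
    first_feedback_day = max(0, t - delta + 1)
    feedbacks = []
    # Days t-delta+1 .. t: only the alerted transactions have a label.
    for day in range(first_feedback_day, t + 1):
        feedbacks.extend(daily_feedbacks(days[day], score, budget))
    # Days up to t-delta: every transaction has a label by now.
    delayed = [tx for day in range(first_feedback_day) for tx in days[day]]
    return feedbacks, delayed


# Ten hypothetical days with four transactions each.
days = [[{"day": d, "risk": r} for r in (0.1, 0.9, 0.4, 0.7)] for d in range(10)]
feedbacks, delayed = split_streams(
    days, t=9, delta=7, score=lambda tx: tx["risk"], budget=2
)
```

In this toy stream the feedback set is small and contains only high-risk transactions, while the delayed set contains every transaction of the older days.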

Learning strategy

[Figure: two timelines comparing the proposed solution (classifiers F_t and W^D_t, aggregated as A^W) with the standard solution (a single classifier W_t).]
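A minimal sketch of aggregating the two classifiers, assuming each one outputs a fraud probability for a transaction. The slides do not state the aggregation rule, so the weighted average below (a plain average when `weight=0.5`) and the function name `aggregate` are assumptions.

```python
def aggregate(p_feedback, p_delayed, weight=0.5):
    """Combine the fraud probabilities of the feedback and delayed classifiers.

    `weight` is the share given to the feedback classifier; 0.5 is a plain
    average. The averaging rule itself is an assumption of this sketch.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_feedback + (1.0 - weight) * p_delayed


# Hypothetical outputs of the two classifiers for the same transaction.
combined = aggregate(p_feedback=0.8, p_delayed=0.2)
```

Because the feedback set is much smaller than the delayed set, an equal weight gives each feedback sample far more influence than it would have in a single classifier trained on the pooled data.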

Concept drift adaptation

•  Learn from new concepts and forget outdated transactions using a sliding window.

[Figure: sliding window moving along the time axis from Day 1 to Day 2, with labels D_{t−9}, F_t and W^D_t.]
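The sliding window can be sketched with a fixed-length queue of daily batches: adding a new day silently drops the oldest one. The class name `SlidingWindow` and the window size of three days are illustrative assumptions.

```python
from collections import deque


class SlidingWindow:
    """Keep only the most recent `size` days of transactions for training."""

    def __init__(self, size):
        # A deque with maxlen discards the oldest day when a new one arrives.
        self.days = deque(maxlen=size)

    def add_day(self, transactions):
        self.days.append(list(transactions))

    def training_set(self):
        # Flatten the retained days into one list of training samples.
        return [tx for day in self.days for tx in day]


window = SlidingWindow(size=3)
for day in range(5):
    window.add_day([f"day{day}-tx{i}" for i in range(2)])
training = window.training_set()
```

After five days only days 2, 3 and 4 remain in the training set, so a classifier retrained on it forgets the oldest transactions.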

Unbalanced delayed samples

•  Randomly remove genuine transactions to balance the delayed samples before training.

[Figure: undersampling turns the unbalanced dataset into a balanced dataset.]
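Random undersampling can be sketched as below. The 1:1 ratio between genuine and fraudulent transactions, the function name `undersample`, and the fixed seed are assumptions; the slides only say that the sample is balanced.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop genuine samples (label 0) until both classes have equal size."""
    rng = random.Random(seed)
    frauds = [i for i, y in enumerate(labels) if y == 1]
    genuine = [i for i, y in enumerate(labels) if y == 0]
    # Keep every fraud and an equally sized random subset of genuine samples.
    kept_genuine = rng.sample(genuine, min(len(frauds), len(genuine)))
    keep = sorted(frauds + kept_genuine)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Hypothetical delayed sample: 18 genuine transactions and 2 frauds.
samples = list(range(20))
labels = [0] * 18 + [1] * 2
balanced_samples, balanced_labels = undersample(samples, labels)
```

All frauds are kept, and only as many genuine transactions as there are frauds survive the sampling.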

Experiments on a real dataset

[Figure: results for W and A^W]

Experiments on a synthetic dataset with Concept Drift

[Figure: results for W and A^W]

Benefits of the proposed solution

•  Exploits the feedbacks from investigators

•  Meets realistic working conditions

•  Gives large influence to feedbacks w.r.t. delayed samples.

Conclusion

•  Alert-feedback interaction has to be considered in designing fraud detection systems.

•  Feedbacks from investigators have to be handled separately.

•  Aggregating two distinct classifiers is an effective solution for concept drift.

Future work

1.  Adaptive classifier aggregations

2.  Implementation into Big Data architectures

3.  This work will be continued with the BruFence project

BruFence

•  Big Data Mining for Fraud Detection and Security

•  2015-2018 funded by Innoviris (Brussels Region).

• ULB - QualSec

• ULB - MLG

• UCL - MLG

• Worldline

• Steria

• NViso

BruFence - consortium

Spice


Thank you for listening