Data Innovation Summit 2015
Andrea Dal Pozzolo, Olivier Caelen, and Gianluca Bontempi
Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
26/3/2015
About Me
• I’m a PhD student.
• Research topic: Machine Learning techniques for Fraud Detection in electronic transactions.
Academic partner: MLG - ULB
• Researchers with expertise in data mining, computational modelling, statistics and their applications to fraud detection, bioinformatics and time series prediction.
Industrial partner: Worldline
• Worldline is a leader in electronic payment services.
• In Brussels, Worldline has a team of experts with more than 25 years of experience in fraud detection.
The problem
• Growing number of frauds
• It is not easy for a human analyst to detect fraudulent patterns --> need for automatic fraud detection systems
Goal of Detection
• With a limited budget, only a few transactions can be checked manually.
• Goal: limit the false alerts for a given budget.
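The budget constraint can be sketched as follows, assuming the detector outputs one fraud score per transaction and the investigators can check only the k highest-scoring ones. The scores, labels and k = 3 are hypothetical. Limiting false alerts for a given budget then amounts to maximizing the precision among those k alerts.

```python
def top_k_alerts(scores, k):
    """Return the indices of the k transactions with the highest fraud scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def alert_precision(alerts, labels):
    """Fraction of alerted transactions that are actually fraudulent (label 1)."""
    if not alerts:
        return 0.0
    return sum(labels[i] for i in alerts) / len(alerts)


# Hypothetical day: 8 transactions, budget of k = 3 manual checks.
scores = [0.05, 0.91, 0.12, 0.78, 0.40, 0.88, 0.02, 0.30]
labels = [0, 1, 0, 0, 0, 1, 0, 1]

alerts = top_k_alerts(scores, k=3)
print(alerts)                                     # [1, 5, 3]
print(round(alert_precision(alerts, labels), 3))  # 0.667
```

Here one of the three alerts (index 3) is a false alert, and the fraud at index 7 is missed because it falls outside the budget.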
Two types of transactions
[Figure: timeline from day t−δ to day t. Feedbacks F_t (investigated alerts) are supervised samples available at day t; delayed samples D_{t−δ} (all transactions of day t−δ) are labeled only after δ days. Legend: all fraudulent / all genuine transactions of a day; fraudulent / genuine transactions in the feedback.]
Data streams
• Feedbacks:
  – Classifier dependent
  – Small set of risky transactions
• Delayed samples:
  – Large set of mostly genuine transactions
[Figure: data stream over Day 1, Day 2 and Day 3. The feedbacks F_t, F_{t−1}, …, F_{t−6} cover the most recent days, while the delayed samples D_{t−7}, D_{t−8}, D_{t−9} cover older days. Legend: fraudulent / genuine transactions; fraudulent / genuine feedback.]
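The day indices in the figure can be reproduced with a small helper, assuming δ = 7 (feedbacks F_t, …, F_{t−6}, delayed samples from D_{t−7} backwards) and a hypothetical window of 3 days of delayed samples. The function name and the `window` parameter are illustrative, not taken from the slides.

```python
def supervised_days(t, delta, window):
    """Days whose labels are available at day t (hypothetical helper).

    Feedbacks: days t-delta+1 .. t (investigated alerts, labeled quickly).
    Delayed samples: the `window` days ending at day t-delta
    (all transactions, labeled only after delta days).
    """
    feedback_days = list(range(t - delta + 1, t + 1))
    delayed_days = list(range(t - delta - window + 1, t - delta + 1))
    return feedback_days, delayed_days


fb, dl = supervised_days(t=100, delta=7, window=3)
print(fb)  # [94, 95, 96, 97, 98, 99, 100]  -> F_{t-6} .. F_t
print(dl)  # [91, 92, 93]                   -> D_{t-9} .. D_{t-7}
```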
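Learning strategy
[Figure: Standard solution: a single classifier W_t trained on all supervised samples in a time window. Proposed solution: a classifier F_t trained on the feedbacks F_t, …, F_{t−6} and a classifier W^D_t trained on the delayed samples D_{t−7}, D_{t−8}, …, aggregated into A^W.]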
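The slides do not state how F_t and W^D_t are aggregated. A minimal sketch, assuming the aggregate A^W scores each transaction with a weighted average of the two posterior fraud probabilities, with a hypothetical weight `alpha` = 0.5:

```python
def aggregate_posteriors(p_feedback, p_delayed, alpha=0.5):
    """Combine fraud probabilities from two classifiers (assumed weighted average).

    p_feedback: P(fraud) from the classifier trained on feedbacks (F_t)
    p_delayed:  P(fraud) from the classifier trained on delayed samples (W^D_t)
    alpha:      weight of the feedback classifier (hypothetical, 0.5 here)
    """
    if len(p_feedback) != len(p_delayed):
        raise ValueError("both classifiers must score the same transactions")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * pf + (1 - alpha) * pd for pf, pd in zip(p_feedback, p_delayed)]


# Hypothetical posteriors for three transactions.
p_f = [0.9, 0.2, 0.6]  # from F_t
p_w = [0.5, 0.1, 0.2]  # from W^D_t
print([round(p, 3) for p in aggregate_posteriors(p_f, p_w)])  # [0.7, 0.15, 0.4]
```

The aggregated scores can then be ranked to fill the daily alert budget. Keeping two classifiers means the small feedback set is never drowned out by the much larger delayed set, which is not the case when both are pooled into a single W_t.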
Concept drift adaptation
• Learn from new concepts and forget outdated transactions using a sliding window.
[Figure: the sliding window moving from Day 1 to Day 2 over the feedbacks F_t, …, F_{t−6} and the delayed samples D_{t−7}, D_{t−8}, D_{t−9}, with classifiers F_t and W^D_t.]
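A sliding window that forgets outdated days can be sketched with a bounded `collections.deque`: appending a new day automatically drops the oldest one. The class name, the window size of 3 days and the per-day batches are hypothetical; retraining the classifier on the retained samples is left out.

```python
from collections import deque


class SlidingWindow:
    """Keep only the most recent `size` days of labeled transactions (hypothetical)."""

    def __init__(self, size):
        self.days = deque(maxlen=size)  # the oldest day is dropped automatically

    def add_day(self, day_batch):
        """Add one day of (transaction, label) pairs."""
        self.days.append(day_batch)

    def training_set(self):
        """Flatten the retained days into one list of (transaction, label) pairs."""
        return [sample for batch in self.days for sample in batch]


window = SlidingWindow(size=3)
for day in range(1, 5):
    window.add_day([(f"tx_day{day}", 0)])
    # A classifier would be retrained here on window.training_set() (not shown).

print([tx for tx, _ in window.training_set()])  # ['tx_day2', 'tx_day3', 'tx_day4']
```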
Unbalanced delayed samples
• Randomly remove genuine transactions to balance the delayed samples before training.
[Figure: unbalanced dataset → undersampling → balanced dataset.]
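Random undersampling as described above can be sketched in a few lines: all frauds are kept and a random subset of genuine transactions of the same size is drawn. The 1:1 target ratio, the fixed seed and the toy delayed sample are assumptions for illustration.

```python
import random


def undersample(samples, seed=0):
    """Randomly drop genuine transactions (label 0) until both classes have equal size.

    samples: list of (transaction, label) pairs, label 1 = fraud, 0 = genuine.
    """
    rng = random.Random(seed)
    frauds = [s for s in samples if s[1] == 1]
    genuine = [s for s in samples if s[1] == 0]
    kept_genuine = rng.sample(genuine, k=min(len(frauds), len(genuine)))
    balanced = frauds + kept_genuine
    rng.shuffle(balanced)
    return balanced


# Hypothetical delayed sample: 2 frauds among 20 transactions.
delayed = [(i, 1 if i in (3, 11) else 0) for i in range(20)]
balanced = undersample(delayed)
print(len(balanced), sum(label for _, label in balanced))  # 4 2
```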
Benefits of the proposed solution
• Exploits the feedbacks from investigators
• Meets realistic working conditions
• Gives feedbacks a large influence relative to delayed samples
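To see why separate handling matters, compare the share of feedbacks in a pooled training set with a fixed weight in an aggregate. The daily volumes below (100 investigated alerts and 10,000 transactions per day, 7 days each) are hypothetical, and using the training-set share as a proxy for influence is a simplification.

```python
def feedback_share_pooled(n_feedback, n_delayed):
    """Fraction of training samples that are feedbacks when both sets are pooled."""
    return n_feedback / (n_feedback + n_delayed)


# Hypothetical volumes: 7 days of feedbacks vs 7 days of delayed samples.
share = feedback_share_pooled(7 * 100, 7 * 10_000)
print(round(share, 4))  # 0.0099
```

Pooled, the feedbacks make up about 1% of the training data; with two separate classifiers combined by an equal-weight average (an assumption, see the learning strategy), the feedback classifier contributes half of every score regardless of volume.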
Conclusion
• The alert-feedback interaction has to be considered when designing fraud detection systems.
• Feedbacks from investigators have to be handled separately.
• Aggregating two distinct classifiers is an effective solution for concept drift.
Future work
1. Adaptive classifier aggregation
2. Implementation in Big Data architectures
3. This work will be continued within the BruFence project
BruFence
• Big Data Mining for Fraud Detection and Security
• 2015-2018, funded by Innoviris (Brussels Region).
• ULB - QualSec
• ULB - MLG
• UCL - MLG
• Worldline
• Steria
• NViso
BruFence - consortium
Spice