Data Mining for Fraud Detection
description
Transcript of Data Mining for Fraud Detection
![Page 1: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/1.jpg)
CS490D:Introduction to Data Mining
Prof. Chris Clifton
April 14, 2004
Fraud and Misuse Detection
![Page 2: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/2.jpg)
What is Fraud Detection?
• Identify wrongful actions– Is right and wrong universal?– If so, why not just prevent wrong actions
• Identify actions by the wrong people
• Identify suspect actions– Legal– But probably not right
![Page 3: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/3.jpg)
In Data Mining terms…
• Classification?– Classify into fraudulent and non-fraudulent
behavior– What do we need to do this?
• Outlier Detection– Assume non-fraudulent behavior is normal– Find the exceptions
• Problems?
![Page 4: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/4.jpg)
– +–
Solution: Differential Profiling
• Determine individual behavior– What is normal for the
individual– What separates one
individual from another
• Gives profile of individual behavior
• How do we do this?Profile
ClassificationMining
ProfileProfile
+ +–
![Page 5: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/5.jpg)
Has this been done?Intrusion Detection (Lane&Brodley)
• Profiled computer users based on command sequences– Command– Some (but not all) argument information– Sequence information
![Page 6: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/6.jpg)
ResultsAccuracy Time to Alarm
![Page 7: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/7.jpg)
Scaling Issues
• What happens with millions of users?– Credit card– Cell phone
• What about new users?
• Ideas?
![Page 8: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/8.jpg)
Multi-user profiles
• Cluster users
• Develop profiles for clusters– E.g., differential profiling
• Old customers: Do they match profile for their cluster?– Allows wider range of acceptable behavior
• New customer: Do they match any profile?
![Page 9: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/9.jpg)
Data mining for detection and prevention
![Page 10: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/10.jpg)
Matching known fraud/non-compliance
• Which new cases are similar to known cases?
• How can we define similarity?• How can we rate or score
similarity?
![Page 11: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/11.jpg)
Anomalies and irregularities
• How can we detect anomalous or unusual behavior?
• What do we mean by usual?• Can we rate or score cases on
their degree of anomaly?
![Page 12: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/12.jpg)
Techniques used to identify fraud
Predict and Classify– Regression
algorithms (predict numeric outcome): neural networks, CART, Regression, GLM
– Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression
Group and Find Associations
– Clustering/Grouping algorithms: K-means, Kohonen, 2Step, Factor analysis
– Association algorithms: apriori, GRI, Capri, Sequence
![Page 13: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/13.jpg)
Techniques for finding fraud:
• Predict the expected value for a claim, compare that with the actual value of the claim.
• Those cases that fall far outside the expected range should be evaluated more closely
![Page 14: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/14.jpg)
Techniques for finding fraud:
•Build a profile of the characteristics of fraudulent behavior.•Pull out the cases that meet the historical characteristics of fraud.
Decision Trees and Rules
![Page 15: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/15.jpg)
Techniques for finding fraud:
• Group behavior using a clustering algorithm
• Find groups of events using the association algorithms
• Identify outliers and investigate
Clustering and Associations
![Page 16: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/16.jpg)
Fraud detection using CRISP-DM
Provides a systematic way to detect fraud and abuse
Ensures auditing and investigative efforts are maximized
Continually assesses and updates models to identify new emerging fraud patterns
Leads to higher recoupments
![Page 17: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/17.jpg)
Data mining in action: Fraud, waste and abusecase studies
![Page 18: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/18.jpg)
Payment Error Prevention
…used this information to focus their auditing effort
The US Health Care Finance Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices and...
![Page 19: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/19.jpg)
Payment error prevention solution
• Clementine™• Using audited discharge records, built
profiles of appropriate decisions such as diagnosis coding and admission
• Matched new cases• Cases not matching are audited
![Page 20: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/20.jpg)
Payment error prevention results
• Detected 50% of past incorrect payments – resulting in significant recovery of funding lost to payment errors
• PRO analysts able to use resultant Clementine models to prevent future error
![Page 21: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/21.jpg)
Billing and payment fraud
Identified suspicious cases to focus investigations
The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions and...
![Page 22: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/22.jpg)
Billing and payment fraud solution
• Clementine• Detection models based on known fraud
patterns• Analyzed all transactions – scored based on
similarity to these known patterns• High scoring transactions flagged for
investigation
![Page 23: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/23.jpg)
Billing and payment fraud results
• Identified over 1,200 payments for further investigation
• Integrated the detection process• Anomaly detection methods (e.g.,
clustering) will serve as ‘sentinel’ systems for previously undetected fraud patterns
![Page 24: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/24.jpg)
Audit selection
Focused audit investigations on cases with the highest likely
adjustments
The Washington State Department of Revenue needed to detect erroneous tax returns and...
![Page 25: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/25.jpg)
Audit selection solution
• Clementine• Using previously audited returns • Model adjustment (recovery) per auditor
hour based on return information• Models will then score future returns
showing highest potential adjustment
![Page 26: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/26.jpg)
Audit selection results
• Maximizes auditors’ time by focusing on cases likely to yield the highest return
• Closes the ‘tax gap’
![Page 27: Data Mining for Fraud Detection](https://reader031.fdocuments.us/reader031/viewer/2022031704/546e856bb4af9fa5268b46d1/html5/thumbnails/27.jpg)
Data mining - key to detecting and preventing fraud, waste and abuse
• Learn from the past– High quality, evidence based
decisions• Predict
– Prevent future instances• React to changing circumstances
– Models kept current, from latest data