+
Adaptive Fraud Detection
by Tom Fawcett and Foster Provost
Presented by: Lara Nargozian
Updated from the last 3 years' presentations by Adam Boyer, Yunfei Zhao and Ahmen Abdeen Hamed
+Why?
Solving real-world problems that are very important to each and every one of us
Provide a framework that can be adapted to solve similar problems
Use data mining algorithms and techniques learned this semester
    Rule learning
    Classification
Fun to learn about
2
+Outline
Problem Description
    Cellular cloning fraud problem
    Why it is important
    Current strategies
Construction of Fraud Detector
    Framework
    Rule learning, monitor construction, evidence combination
Experiments and Evaluation
    Data used in this study
    Data preprocessing
    Comparative results
Conclusion
Exam Questions
3
+The Problem
How to detect suspicious changes in user behavior to identify and prevent cellular fraud
    Non-legitimate users, aka bandits, gain illicit access to a legitimate user's, or victim's, account
Solution useful in other contexts
    Identifying and preventing credit card fraud, toll fraud, and computer intrusion
4
+Cellular Fraud - Cloning
Cloning Fraud
    A kind of superimposition fraud (parasitic)
    Fraudulent usage is superimposed upon (added to) the legitimate usage of an account.
    Causes inconvenience to customers and great expense to cellular service providers.
5
+Cellular Communications and Cloning Fraud
Mobile Identification Number (MIN) and Electronic Serial Number (ESN)
    Identify a specific account
    Periodically transmitted unencrypted whenever the phone is on
Cloning occurs when a customer's MIN and ESN are programmed into a cellular phone not belonging to the customer
    Bandit can make virtually unlimited, untraceable calls at someone else's expense
6
+Interest in Reducing Cloning Fraud
Fraud is detrimental in several ways:
    Fraudulent usage congests cell sites
    Fraud incurs land-line usage charges
    Cellular carriers must pay costs to other carriers for usage outside the home territory
    Crediting process is costly to the carrier and inconvenient to the customer
7
+Strategies for Dealing with Cloning Fraud
Pre-call Methods
    Identify and block fraudulent calls as they are made
    Validate the phone or its user when a call is placed
Post-call Methods
    Identify fraud that has already occurred on an account so that further fraudulent usage can be blocked
    Periodically analyze call data on each account to determine whether fraud has occurred.
8
+Pre-call Methods
Personal Identification Number (PIN)
    PIN cracking is possible with more sophisticated equipment.
RF Fingerprinting
    Method of identifying phones by their unique transmission characteristics
Authentication
    Reliable and secure private key encryption method.
    Requires special hardware capability
    An estimated 30 million non-authenticatable phones are in use in the US alone (in 1997)
9
+Post-call Methods
Collision Detection
    Analyze call data for temporally overlapping calls
Velocity Checking
    Analyze the locations and times of consecutive calls
Disadvantage of the above methods
    Usefulness depends upon a moderate level of legitimate activity
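As a rough illustration of the two checks above, the following Python sketch tests a pair of calls for a time overlap (collision) and for an implausible travel speed (velocity). The function names, the coordinate inputs, and the 900 km/h speed limit are assumptions made for this example; the slides do not specify how carriers implement either check.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def calls_collide(start_a, duration_a, start_b, duration_b):
    """Collision check: True if two calls on one account overlap in time."""
    end_a = start_a + duration_a
    end_b = start_b + duration_b
    return start_a < end_b and start_b < end_a


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def velocity_violation(time_a, pos_a, time_b, pos_b, max_kmh=900.0):
    """Velocity check: True if moving between two calls would need more than max_kmh.

    max_kmh is a hypothetical limit chosen for this sketch.
    """
    hours = abs((time_b - time_a).total_seconds()) / 3600.0
    dist = distance_km(*pos_a, *pos_b)
    if hours == 0:
        return dist > 0
    return dist / hours > max_kmh


# Approximate coordinates (assumed values) for Brooklyn, NY and Boston, MA.
BROOKLYN = (40.68, -73.94)
BOSTON = (42.36, -71.06)

t0 = datetime(1995, 1, 10, 1, 45, 36)
print(calls_collide(t0, timedelta(minutes=5), t0 + timedelta(minutes=3), timedelta(minutes=1)))
print(velocity_violation(t0, BOSTON, t0 + timedelta(minutes=10), BROOKLYN))
```

Both checks need legitimate calls close in time to the bandit's calls, which is why their usefulness depends on the level of legitimate activity.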
10
+Another Post-call Method (Main focus of this paper)
User Profiling
    Analyze calling behavior to detect usage anomalies suggestive of fraud
    Works well with low-usage customers
    Good complement to collision and velocity checking because it covers cases the others might miss
11
Sample Frauded Account
Date     Time      Day  Duration    Origin         Destination    Fraud
1/01/95  10:05:01  Mon  13 minutes  Brooklyn, NY   Stamford, CT
1/05/95  14:53:27  Fri  5 minutes   Brooklyn, NY   Greenwich, CT
1/08/95  09:42:01  Mon  3 minutes   Bronx, NY      Manhattan, NY
1/08/95  15:01:24  Mon  9 minutes   Brooklyn, NY   Brooklyn, NY
1/09/95  15:06:09  Tue  5 minutes   Manhattan, NY  Stamford, CT
1/09/95  16:28:50  Tue  53 seconds  Brooklyn, NY   Brooklyn, NY
1/10/95  01:45:36  Wed  35 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  01:46:29  Wed  34 seconds  Boston, MA     Yonkers, NY    Bandit
1/10/95  01:50:54  Wed  39 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  11:23:28  Wed  24 seconds  Brooklyn, NY   Congers, NY
1/11/95  22:00:28  Thu  37 seconds  Boston, MA     Boston, MA     Bandit
1/11/95  22:04:01  Thu  37 seconds  Boston, MA     Boston, MA     Bandit
12
+The Need to be Adaptive
Patterns of fraud are dynamic: bandits constantly change their strategies in response to new detection techniques
Levels of fraud can change dramatically from month to month
Costs of missing fraud or dealing with false alarms change with inter-carrier contracts
13
+
Automatic Construction of Profiling Fraud Detectors
+One Approach
Build a fraud detection system by classifying calls as being fraudulent or legitimate
However, there are two problems that make simple classification techniques infeasible.
15
+Problems with Simple Classification
Context
    A call that would be unusual for one customer may be typical for another customer. (For example, a call placed from Brooklyn is not unusual for a subscriber who lives there, but might be very strange for a Boston subscriber.)
Granularity (overfitting?)
    At the level of the individual call, the variation in calling behavior is large, even for a particular user.
16
+In Summary: Learning the Problem
1. Which phone call features are important?
2. How should profiles be created?
3. When should alarms be raised?
17
+Proposed Detector Constructor Framework (DC-1)
18
+DC-1 Processing Account-Day Example
19
+DC-1 Fraud Detection Stages
Stage 1: Rule Learning
Stage 2: Profile Monitoring
Stage 3: Combining Evidence
20
+Rule Learning – the 1st stage
Rule Generation
    Rules are generated locally based on differences between fraudulent and normal behavior for each account
Rule Selection
    Then they are combined in a rule selection step
21
+Rule Generation
DC-1 uses the RL program to generate rules with certainty factors above a user-defined threshold
For each account, RL generates a "local" set of rules describing the fraud on that account.
Example:
    (Time-of-Day = Night) AND (Location = Bronx) ==> FRAUD
    Certainty Factor = 0.89
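To make the shape of such a rule concrete, here is a small Python sketch that represents a rule as a set of attribute conditions and scores it on labeled calls. The score used here, the fraction of matching calls that are labeled fraudulent, is an assumption chosen for illustration; the slides do not say how RL computes its certainty factor. The attribute names and call records are also hypothetical.

```python
def rule_matches(call, conditions):
    """True if the call satisfies every attribute = value condition of the rule."""
    return all(call.get(attr) == value for attr, value in conditions.items())


def rule_confidence(calls, conditions):
    """Fraction of matching calls labeled fraudulent (a stand-in for a certainty factor)."""
    matched = [c for c in calls if rule_matches(c, conditions)]
    if not matched:
        return 0.0
    return sum(c["fraud"] for c in matched) / len(matched)


# Hypothetical rule and labeled calls for one account.
rule = {"time_of_day": "NIGHT", "location": "BRONX"}
calls = [
    {"time_of_day": "NIGHT", "location": "BRONX", "fraud": True},
    {"time_of_day": "NIGHT", "location": "BRONX", "fraud": True},
    {"time_of_day": "NIGHT", "location": "BRONX", "fraud": True},
    {"time_of_day": "NIGHT", "location": "BRONX", "fraud": False},
    {"time_of_day": "MORNING", "location": "BRONX", "fraud": False},
]

print(rule_confidence(calls, rule))  # 0.75: three of the four matching calls are fraud
```

A rule would be kept only if its score is above the user-defined threshold.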
22
+Rule Selection
Rule generation step typically yields tens of thousands of rules
If a rule is found in (or covers) many accounts then it is probably worth using
Selection algorithm identifies a small set of general rules that cover the accounts
Resulting set of rules is used to construct specific monitors
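The idea of choosing a small set of general rules that cover the accounts can be sketched as a greedy set cover. This is a simplified stand-in and not necessarily the exact selection procedure used in DC-1; the rule names and account ids below are hypothetical.

```python
def select_rules(rule_accounts):
    """Greedy covering: repeatedly pick the rule that covers the most uncovered accounts.

    rule_accounts maps each rule to the set of accounts in which it was generated.
    """
    uncovered = set().union(*rule_accounts.values())
    selected = []
    while uncovered:
        # Ties are broken by dictionary order, since max() keeps the first best item.
        best = max(rule_accounts, key=lambda r: len(rule_accounts[r] & uncovered))
        gained = rule_accounts[best] & uncovered
        if not gained:
            break
        selected.append(best)
        uncovered -= gained
    return selected


# Hypothetical rules and the accounts each one covers.
rule_accounts = {
    "night_bronx": {1, 2, 3},
    "evening": {3, 4},
    "to_payphone": {4},
    "rare_rule": {2},
}

print(select_rules(rule_accounts))  # ['night_bronx', 'evening']
```

Rules that cover only accounts already covered by more general rules are never selected, which is how tens of thousands of local rules can shrink to a small general set.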
23
+Rule Selection and Covering Algorithm
24
+Profiling Monitors – the 2nd stage
Monitors have 2 distinct steps:
Profiling step:
    Monitor is applied to an account's normal usage to measure the account's normal activity.
    Statistics are saved with the account.
Use step:
    A monitor processes a single account-day,
    references the normalcy measure from profiling,
    and generates a numeric value describing how abnormal the current account-day is.
26
+Most Common Monitor Templates
Threshold
Standard Deviation
27
+Threshold Monitors
28
+Standard Deviation Monitors
29
+Comparing the same standard deviation monitor on two accounts
30
+Example for Standard Deviation
Rule
    (TIME OF DAY = NIGHT) AND (LOCATION = BRONX) ==> FRAUD
Profiling step
    The subscriber called from the Bronx an average of 5 minutes per night with a standard deviation of 2 minutes. At the end of the profiling step, the monitor would store the values (5, 2) with that account.
Use step
    If the monitor processed a day containing 3 minutes of airtime from the Bronx at night, the monitor would emit a zero; if the monitor saw 15 minutes, it would emit (15 - 5)/2 = 5. This value denotes that the account is five standard deviations above its average (profiled) usage level.
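The profiling and use steps of this example can be written as a short Python sketch. The class name, the method names, and the profiling data are assumptions for illustration; the use of the population standard deviation is also an assumption, since the slides do not say which form is used.

```python
from statistics import mean, pstdev


class StdDevMonitor:
    """Standard deviation monitor for one rule on one account."""

    def profile(self, normal_daily_minutes):
        # Profiling step: summarize fraud-free usage and store (mean, std) with the account.
        self.mean = mean(normal_daily_minutes)
        # Population standard deviation is assumed here.
        self.std = pstdev(normal_daily_minutes)
        return self.mean, self.std

    def use(self, minutes_today):
        # Use step: standard deviations above the profiled mean, or zero if not above it.
        if self.std == 0 or minutes_today <= self.mean:
            return 0.0
        return (minutes_today - self.mean) / self.std


monitor = StdDevMonitor()
monitor.profile([3, 7, 3, 7])  # hypothetical nightly Bronx minutes: mean 5, std 2
print(monitor.use(3))          # 0.0, usage is below the profiled mean
print(monitor.use(15))         # 5.0, i.e. (15 - 5) / 2
```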
31
+Combining Evidence from the Monitors – the 3rd stage
Weights the monitor outputs and learns a threshold on the sum to produce high-confidence alarms
DC-1 uses a Linear Threshold Unit (LTU)
    Simple and fast
    Enables good first-order judgment
A feature selection process is used to
    Choose a small set of useful monitors in the final detector
    Some rules don't perform well when used in monitors, some overlap
    Forward selection process chooses set of useful monitors
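A linear threshold unit reduces to a weighted sum compared against a threshold. The Python sketch below shows only the decision step; the weights and threshold are hypothetical values, whereas in DC-1 they would be learned from training account-days.

```python
def ltu_score(monitor_outputs, weights):
    """Weighted sum of the monitor outputs for one account-day."""
    return sum(w * x for w, x in zip(weights, monitor_outputs))


def ltu_alarm(monitor_outputs, weights, threshold):
    """Raise an alarm when the weighted evidence reaches the threshold."""
    return ltu_score(monitor_outputs, weights) >= threshold


# Hypothetical weights and threshold for three monitors.
weights = [0.6, 0.3, 0.5]
threshold = 2.0

print(ltu_alarm([5.0, 0.0, 1.0], weights, threshold))  # True: score is 3.5
print(ltu_alarm([0.0, 0.0, 1.0], weights, threshold))  # False: score is 0.5
```

Forward selection would then add monitors one at a time, keeping a monitor only if it improves the detector.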
32
+Final Output of DC-1
Detector that profiles each user’s behavior based on several indicators
An alarm is raised when there is sufficient evidence of fraudulent activity
33
+
Data used in the study
+Data Information
Four months of phone call records from the New York City area.
Each call is described by 31 original attributes
Some derived attributes are added
    Time-Of-Day (MORNING, AFTERNOON, TWILIGHT, EVENING, NIGHT)
    To-Payphone
Each call is given a class label of fraudulent or legitimate.
35
+Data Cleaning
Eliminated credited calls made to destinations/numbers that are not in the created block
    The destination number must be only called by the legitimate user.
Days with 1-4 minutes of fraudulent usage were discarded.
    May have been credited for other reasons, such as a wrong number
Call times were normalized to Greenwich Mean Time for chronological sorting
36
+Data Description
Once the monitors are created and accounts profiled, the system transforms raw call data into a series of account-days using the monitor outputs as features
Selected for profiling, training and testing:
    3600 accounts that have at least 30 fraud-free days of usage before any fraudulent usage.
    Initial 30 days of each account were used for profiling.
    Remaining days were used to generate 96,000 account-days.
    Distinct training and testing accounts: 10,000 account-days for training; 5000 for testing
    20% fraud days and 80% non-fraud days
37
+
Experiments and Evaluation
+Output of DC-1 Components
Rule learning: 3630 rules
    Each covering at least two accounts
Rule selection: 99 rules
2 monitor templates yielding 198 monitors
Final feature selection: 11 monitors
39
+The Importance of Error Cost
Classification accuracy is not sufficient to evaluate performance
Should take misclassification costs into account
Estimated error costs:
    False positive (false alarm): $5
    False negative (letting a fraudulent account-day go undetected): $0.40 per minute of fraudulent air-time
Factoring in error costs requires a second training pass by the LTU
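Using the two estimated error costs above, the total cost of a detector over a set of account-days can be computed as in this Python sketch. The tuple layout and function name are assumptions made for the example.

```python
FALSE_ALARM_COST = 5.00              # dollars per false positive
MISSED_FRAUD_COST_PER_MINUTE = 0.40  # dollars per undetected fraudulent minute


def detector_cost(account_days):
    """Total error cost in dollars.

    account_days is an iterable of (alarmed, is_fraud, fraud_minutes) tuples.
    Correct alarms and correct non-alarms cost nothing in this sketch.
    """
    cost = 0.0
    for alarmed, is_fraud, fraud_minutes in account_days:
        if alarmed and not is_fraud:
            cost += FALSE_ALARM_COST
        elif not alarmed and is_fraud:
            cost += MISSED_FRAUD_COST_PER_MINUTE * fraud_minutes
    return cost


# Hypothetical account-days: one false alarm, one missed day with 30 fraud minutes,
# one correctly alarmed fraud day, and one correctly ignored legitimate day.
days = [(True, False, 0), (False, True, 30), (True, True, 30), (False, False, 0)]
print(round(detector_cost(days), 2))
```

As a sanity check, alarming on all 5000 test account-days, of which 80% are non-fraud, gives 4000 false alarms at $5 each, or $20,000.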
40
+Alternative Detection Methods
Collisions + Velocities
    Errors almost entirely due to false negatives
High Usage: detect sudden large jump in account usage
Best Individual DC-1 Monitor
    (Time-of-day = Evening) ==> Fraud
SOTA - State Of The Art
    Incorporates 13 hand-crafted profiling methods
    Best detectors identified in a previous study
41
DC-1 vs. Alternatives
Detector                  Accuracy (%)  Cost ($)        Accuracy at Cost (%)
Alarm on all              20            20000           20
Alarm on none             80            18111 +/- 961   80
Collisions + Velocities   82 +/- 0.3    17578 +/- 749   82 +/- 0.4
High Usage                88 +/- 0.7    6938 +/- 470    85 +/- 1.7
Best DC-1 monitor         89 +/- 0.5    7940 +/- 313    85 +/- 0.8
State of the art (SOTA)   90 +/- 0.4    6557 +/- 541    88 +/- 0.9
DC-1 detector             92 +/- 0.5    5403 +/- 507    91 +/- 0.8
SOTA plus DC-1            92 +/- 0.4    5078 +/- 319    91 +/- 0.8
42
+Shifting Fraud Distributions
Fraud detection system should adapt to shifting fraud distributions
To illustrate the above point:
    One non-adaptive DC-1 detector was trained on a fixed distribution (80% non-fraud) and tested against a range of 75-99% non-fraud
    Another DC-1 detector was allowed to adapt (re-train its LTU threshold) for each fraud distribution
Second detector was more cost effective than the first
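One way to picture re-training the threshold for a new fraud distribution is to pick, on a sample drawn from that distribution, the alarm threshold with the lowest total error cost. The Python sketch below is a simplified stand-in for the LTU re-training step, not the procedure used in the paper; the scores, the sample data, and the default costs of $5 per false alarm and $0.40 per missed minute are taken as assumptions.

```python
def threshold_cost(scored_days, threshold, false_alarm_cost=5.0, miss_cost_per_min=0.40):
    """Cost of alarming on every account-day whose score is at least the threshold."""
    cost = 0.0
    for score, is_fraud, fraud_minutes in scored_days:
        alarmed = score >= threshold
        if alarmed and not is_fraud:
            cost += false_alarm_cost
        elif not alarmed and is_fraud:
            cost += miss_cost_per_min * fraud_minutes
    return cost


def best_threshold(scored_days):
    """Try each observed score (and 'never alarm') and keep the cheapest threshold."""
    candidates = sorted({score for score, _, _ in scored_days}) + [float("inf")]
    return min(candidates, key=lambda t: threshold_cost(scored_days, t))


# Hypothetical (score, is_fraud, fraud_minutes) account-days.
original = [(0.5, False, 0), (1.0, False, 0), (2.0, True, 30), (3.0, True, 40)]
# Same fraud, but far more legitimate days that score fairly high.
shifted = original + [(2.5, False, 0)] * 10

print(best_threshold(original))  # 2.0
print(best_threshold(shifted))   # 3.0
```

In this toy data the cheapest threshold moves from 2.0 to 3.0 once legitimate days dominate, which is the kind of adjustment a detector with a fixed threshold cannot make.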
43
44
Effects of Changing Fraud Distribution
[Figure: cost (y-axis, 0 to 1.4) versus percentage of non-fraud (x-axis, 75 to 100) for two detectors, Adaptive and 80/20]
+Conclusion
DC-1 uses a rule learning program to uncover indicators of fraudulent behavior from a large database of customer transactions.
Then the indicators are used to create a set of monitors, which profile legitimate customer behavior and indicate anomalies.
Finally, the outputs of the monitors are used as features in a system that learns to combine evidence to generate high-confidence alarms.
47
+Conclusion
Adaptability to dynamic patterns of fraud can be achieved by generating fraud detection systems automatically from data, using data mining techniques
DC-1 can adapt to the changing conditions typical of fraud detection environments
Experiments indicate that DC-1 performs better than other methods for detecting fraud
48
+
Exam Questions
49
+Question 1
• What are the two major fraud detection categories, how do they differ, and which one does DC-1 fall under?
    • Pre-call methods
        • Involve validating the phone or its user when a call is placed.
    • Post-call methods (DC-1 falls here)
        • Analyze call data on each account to determine whether cloning fraud has occurred.
50
+Question 2
• Why do fraud detection methods need to be adaptive?
    • Bandits change their behavior: patterns of fraud are dynamic
    • Levels of fraud vary month to month
    • Cost of missing fraud or handling false alarms changes between inter-carrier contracts
51
+Question 3
• What are the two steps of profiling monitors and what are the two main monitor templates?
    • Profiling step: measure an account's normal activity and save statistics
    • Use step: process usage for an account-day to produce a numerical output describing how abnormal activity was on that account-day
    • Threshold and Standard Deviation monitors.
52
+
The End. Questions?
53