Adaptive Fraud Detection

by Tom Fawcett and Foster Provost

Presented by: Lara Nargozian
Updated from the last 3 years’ presentations by Adam Boyer, Yunfei Zhao and Ahmen Abdeen Hamed

http://home.comcast.net/~tom.fawcett/public_html/index.html

http://pages.stern.nyu.edu/~fprovost/

Why?

Solve real-world problems that are very important to each and every one of us

Provide a framework that can be adapted to solve similar problems

Use Data Mining algorithms and techniques learned this semester
• Rule Learning
• Classification

Fun to learn about

Outline

Problem Description
• Cellular cloning fraud problem
• Why it is important
• Current strategies

Construction of Fraud Detector
• Framework
• Rule learning, Monitor construction, Evidence combination

Experiments and Evaluation
• Data used in this study
• Data preprocessing
• Comparative results

Conclusion

Exam Questions

The Problem

How to detect suspicious changes in user behavior to identify and prevent cellular fraud
• Non-legitimate users, aka bandits, gain illicit access to a legitimate user’s, or victim’s, account

Solution useful in other contexts
• Identifying and preventing credit card fraud, toll fraud, and computer intrusion

Cellular Fraud - Cloning

Cloning Fraud
• A kind of superimposition fraud (parasite).
• Fraudulent usage is superimposed upon (added to) the legitimate usage of an account.
• Causes inconvenience to customers and great expense to cellular service providers.

Cellular communications and Cloning Fraud

Mobile Identification Number (MIN) and Electronic Serial Number (ESN)
• Identify a specific account
• Periodically transmitted unencrypted whenever phone is on

Cloning occurs when a customer’s MIN and ESN are programmed into a cellular phone not belonging to the customer
• Bandit can make virtually unlimited, untraceable calls at someone else’s expense

Interest in reducing Cloning Fraud

Fraud is detrimental in several ways:
• Fraudulent usage congests cell sites
• Fraud incurs land-line usage charges
• Cellular carriers must pay costs to other carriers for usage outside the home territory
• Crediting process is costly to carrier and inconvenient to the customer

Strategies for dealing with cloning fraud

Pre-call Methods
• Identify and block fraudulent calls as they are made
• Validate the phone or its user when a call is placed

Post-call Methods
• Identify fraud that has already occurred on an account so that further fraudulent usage can be blocked
• Periodically analyze call data on each account to determine whether fraud has occurred.

Pre-call Methods

Personal Identification Number (PIN)
• PIN cracking is possible with more sophisticated equipment.

RF Fingerprinting
• Method of identifying phones by their unique transmission characteristics

Authentication
• Reliable and secure private key encryption method.
• Requires special hardware capability
• An estimated 30 million non-authenticatable phones are in use in the US alone (in 1997)

Post-call Methods

Collision Detection
• Analyze call data for temporally overlapping calls

Velocity Checking
• Analyze the locations and times of consecutive calls

Disadvantage of the above methods
• Usefulness depends upon a moderate level of legitimate activity
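The two checks above can be sketched roughly as follows. This is only an illustration, not the carriers' actual implementation: the `Call` record, the use of latitude/longitude for the call origin, and the `max_kmh` speed limit are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt


@dataclass
class Call:
    # Hypothetical call record; field names are assumptions for this sketch.
    start: datetime
    duration: timedelta
    lat: float  # latitude of the call origin, in degrees
    lon: float  # longitude of the call origin, in degrees


def has_collision(calls):
    """Collision detection: True if any two calls overlap in time."""
    latest_end = None
    for call in sorted(calls, key=lambda c: c.start):
        if latest_end is not None and call.start < latest_end:
            return True
        end = call.start + call.duration
        latest_end = end if latest_end is None else max(latest_end, end)
    return False


def distance_km(a, b):
    """Great-circle (haversine) distance between two call origins."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def has_velocity_violation(calls, max_kmh=900.0):
    """Velocity checking: True if consecutive calls imply an implausible speed.

    max_kmh is an assumed limit, not a value from the paper.
    """
    ordered = sorted(calls, key=lambda c: c.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        hours = (nxt.start - prev.start).total_seconds() / 3600.0
        if hours <= 0:
            continue  # simultaneous starts are left to the collision check
        if distance_km(prev, nxt) / hours > max_kmh:
            return True
    return False
```

Both functions need at least two calls close together in time to fire, which is one way to see why their usefulness depends on a moderate level of legitimate activity.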

Another Post-call Method (Main focus of this paper)

User Profiling
• Analyze calling behavior to detect usage anomalies suggestive of fraud
• Works well with low-usage customers
• Good complement to collision and velocity checking because it covers cases the others might miss

Sample Frauded Account

Date     Time      Day  Duration    Origin         Destination    Fraud
1/01/95  10:05:01  Mon  13 minutes  Brooklyn, NY   Stamford, CT
1/05/95  14:53:27  Fri  5 minutes   Brooklyn, NY   Greenwich, CT
1/08/95  09:42:01  Mon  3 minutes   Bronx, NY      Manhattan, NY
1/08/95  15:01:24  Mon  9 minutes   Brooklyn, NY   Brooklyn, NY
1/09/95  15:06:09  Tue  5 minutes   Manhattan, NY  Stamford, CT
1/09/95  16:28:50  Tue  53 seconds  Brooklyn, NY   Brooklyn, NY
1/10/95  01:45:36  Wed  35 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  01:46:29  Wed  34 seconds  Boston, MA     Yonkers, NY    Bandit
1/10/95  01:50:54  Wed  39 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  11:23:28  Wed  24 seconds  Brooklyn, NY   Congers, NY
1/11/95  22:00:28  Thu  37 seconds  Boston, MA     Boston, MA     Bandit
1/11/95  22:04:01  Thu  37 seconds  Boston, MA     Boston, MA     Bandit

The Need to be Adaptive

Patterns of fraud are dynamic – bandits constantly change their strategies in response to new detection techniques

Levels of fraud can change dramatically from month to month

Costs of missing fraud or dealing with false alarms change with inter-carrier contracts

Automatic Construction of Profiling Fraud Detectors

One Approach

Build a fraud detection system by classifying calls as being fraudulent or legitimate

However, there are two problems that make simple classification techniques infeasible.

Problems with simple classification

Context
• A call that would be unusual for one customer may be typical for another customer. (For example, a call placed from Brooklyn is not unusual for a subscriber who lives there, but might be very strange for a Boston subscriber.)

Granularity (over fitting?)
• At the level of the individual call, the variation in calling behavior is large, even for a particular user.

In Summary: Learning The Problem

1. Which phone call features are important?

2. How should profiles be created?

3. When should alarms be raised?

Proposed Detector Constructor Framework (DC-1)

DC-1 Processing Account-Day Example

DC-1 Fraud Detection Stages

Stage 1: Rule Learning

Stage 2: Profile Monitoring

Stage 3: Combining Evidence

Rule Learning – the 1st stage

Rule Generation
• Rules are generated locally based on differences between fraudulent and normal behavior for each account

Rule Selection
• Then they are combined in a rule selection step

Rule Generation

DC-1 uses the RL program to generate rules with certainty factors above a user-defined threshold

For each account, RL generates a “local” set of rules describing the fraud on that account.

Example:

(Time-of-Day = Night) AND (Location = Bronx) ==> FRAUD

Certainty Factor = 0.89
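A rule of this form can be read as a conjunction of attribute tests applied to each call. The sketch below is only illustrative: the slides do not say how RL computes its certainty factor, so here it is approximated, as an assumption, by the fraction of matching calls that are labeled fraudulent. The dictionary keys (`time_of_day`, `location`, `fraud`) are hypothetical names.

```python
def matches(rule, call):
    """True if the call satisfies every attribute test in the rule."""
    return all(call.get(attr) == value for attr, value in rule.items())


def certainty_factor(rule, calls):
    """Assumed estimate: share of calls covered by the rule that are fraudulent.

    This stands in for RL's certainty factor; it is not RL's actual formula.
    """
    covered = [c for c in calls if matches(rule, c)]
    if not covered:
        return 0.0
    return sum(1 for c in covered if c["fraud"]) / len(covered)


# The rule from the slide, written as attribute/value pairs.
night_bronx = {"time_of_day": "Night", "location": "Bronx"}
```

A rule would be kept for an account only if this value exceeds the user-defined threshold.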

Rule Selection

Rule generation step typically yields tens of thousands of rules

If a rule is found in (or covers) many accounts then it is probably worth using

Selection algorithm identifies a small set of general rules that cover the accounts

Resulting set of rules is used to construct specific monitors

Rule Selection and Covering Algorithm
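The algorithm figure from this slide is not reproduced in the text, so the following is only a sketch in the spirit of a covering step, under assumptions: rules that occur in many accounts are preferred, and rules are added until every account is covered by at least `min_rules_per_account` selected rules. The function name, the parameter, and the input layout are hypothetical.

```python
from collections import Counter


def select_rules(rules_by_account, min_rules_per_account=1):
    """Greedy covering sketch.

    rules_by_account maps an account id to the set of rule ids generated
    locally for that account. Returns a list of selected rule ids.
    """
    # How many accounts each rule was generated for.
    occurrences = Counter(r for rules in rules_by_account.values() for r in rules)
    cover = {account: 0 for account in rules_by_account}
    selected = []

    for account, rules in rules_by_account.items():
        # Try this account's rules from most to least widespread.
        for rule in sorted(rules, key=lambda r: (-occurrences[r], r)):
            if cover[account] >= min_rules_per_account:
                break
            if rule in selected:
                continue  # already counted toward this account's coverage
            selected.append(rule)
            for other, other_rules in rules_by_account.items():
                if rule in other_rules:
                    cover[other] += 1
    return selected
```

With three accounts that all share one rule, the shared rule alone covers everything, which is the intended effect: a small set of general rules.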

Profiling Monitors – the 2nd stage

Monitors have 2 distinct steps

Profiling step:
• Monitor is applied to an account’s normal usage to measure the account’s normal activity.
• Statistics are saved with the account.

Use step:
• A monitor processes a single account-day,
• References the normalcy measure from profiling
• Generates a numeric value describing how abnormal the current account-day is.

Most Common Monitor Templates

Threshold

Standard Deviation

Threshold Monitors

Standard Deviation Monitors

Comparing the same standard deviation monitor on two accounts

Example for Standard Deviation

Rule
• (TIME OF DAY = NIGHT) AND (LOCATION = BRONX) ==> FRAUD

Profiling Step
• The subscriber called from the Bronx an average of 5 minutes per night with a standard deviation of 2 minutes. At the end of the Profiling step, the monitor would store the values (5,2) with that account.

Use step
• If the monitor processed a day containing 3 minutes of airtime from the Bronx at night, the monitor would emit a zero; if the monitor saw 15 minutes, it would emit (15 - 5)/2 = 5. This value denotes that the account is five standard deviations above its average (profiled) usage level.
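The worked example can be written as a pair of small functions, one per step. This is a minimal sketch: the function names are made up here, the choice of population standard deviation (`pstdev`) is an assumption, and so is the handling of a zero standard deviation.

```python
from statistics import mean, pstdev


def profile(nightly_minutes):
    """Profiling step: summarize normal usage as (mean, standard deviation)."""
    return mean(nightly_minutes), pstdev(nightly_minutes)


def use(profile_stats, minutes_today):
    """Use step: standard deviations above the profiled mean, floored at zero."""
    avg, std = profile_stats
    if std == 0:
        # Assumed convention for a perfectly constant profile.
        return 0.0 if minutes_today <= avg else float("inf")
    return max(0.0, (minutes_today - avg) / std)


stored = (5, 2)            # values saved with the account in the example
quiet = use(stored, 3)     # below the mean, so the monitor emits zero
busy = use(stored, 15)     # (15 - 5) / 2
```

Here `quiet` is 0 and `busy` is 5, matching the numbers on the slide.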

Combining Evidence from the Monitors – the 3rd stage

Weights the monitor outputs and learns a threshold on the sum to produce high confidence alarms

DC-1 uses Linear Threshold Unit (LTU)
• Simple and fast
• Enables good first-order judgment

A feature selection process is used to choose a small set of useful monitors in the final detector
• Some rules don’t perform well when used in monitors, some overlap
• Forward selection process chooses set of useful monitors
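The two ideas on this slide can be sketched as below. How DC-1 trains the LTU weights is not described in the slides and is not shown; `evaluate` is a hypothetical callback that scores a candidate set of monitors (higher is better, for example negative cost on held-out account-days), and `max_features` is an assumed stopping parameter.

```python
def ltu_alarm(monitor_outputs, weights, threshold):
    """Linear threshold unit: alarm when the weighted sum reaches the threshold."""
    score = sum(w * x for w, x in zip(weights, monitor_outputs))
    return score >= threshold


def forward_select(candidates, evaluate, max_features):
    """Greedy forward selection over monitor names.

    Repeatedly adds the monitor that most improves evaluate(), and stops
    when no remaining monitor improves the score.
    """
    chosen = []
    best = evaluate(chosen)
    while len(chosen) < max_features:
        scored = [(evaluate(chosen + [m]), m) for m in candidates if m not in chosen]
        if not scored:
            break
        score, pick = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break
        chosen.append(pick)
        best = score
    return chosen
```

Because overlapping monitors add little to the score once one of them is chosen, a procedure of this kind naturally drops redundant ones.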

Final Output of DC-1

A detector that profiles each user’s behavior based on several indicators

An alarm is raised when there is sufficient evidence of fraudulent activity

Data used in the study

Data Information

Four months of phone call records from the New York City area.

Each call is described by 31 original attributes

Some derived attributes are added
• Time-Of-Day (MORNING, AFTERNOON, TWILIGHT, EVENING, NIGHT)
• To-Payphone

Each call is given a class label of fraudulent or legitimate.

Data Cleaning

Eliminated credited calls made to destinations/numbers that are not in the created block
• The destination number must be only called by the legitimate user.

Days with 1-4 minutes of fraudulent usage were discarded.
• May have been credited for other reasons, such as a wrong number

Call times were normalized to Greenwich Mean Time for chronological sorting

Data Description

Once the monitors are created and accounts profiled, the system transforms raw call data into a series of account-days using the monitor outputs as features

Selected for profiling, training and testing:
• 3600 accounts that have at least 30 fraud-free days of usage before any fraudulent usage.
• Initial 30 days of each account were used for profiling.
• Remaining days were used to generate 96,000 account-days.
• Distinct training and testing accounts: 10,000 account-days for training; 5000 for testing
• 20% fraud days and 80% non-fraud days

Experiments and Evaluation

Output of DC-1 components

Rule learning: 3630 rules
• Each covering at least two accounts

Rule selection: 99 rules

2 monitor templates yielding 198 monitors

Final feature selection: 11 monitors

The Importance Of Error Cost

Classification accuracy is not sufficient to evaluate performance

Should take misclassification costs into account

Estimated Error Costs:
• False positive (false alarm): $5
• False negative (letting a fraudulent account-day go undetected): $0.40 per minute of fraudulent air-time

Factoring in error costs requires a second training pass by the LTU
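The cost model above can be turned into a small scoring function. This is a sketch under one assumption beyond the slide: correct decisions (a true alarm, or no alarm on a clean day) are treated as costing nothing. The input layout, a list of `(alarmed, fraud_minutes)` pairs per account-day, is hypothetical.

```python
FALSE_ALARM_COST = 5.00              # dollars per false alarm (from the slide)
MISSED_FRAUD_COST_PER_MINUTE = 0.40  # dollars per undetected fraud minute


def total_cost(account_days):
    """Sum error costs over (alarmed, fraud_minutes) pairs.

    Assumes correct decisions cost nothing.
    """
    cost = 0.0
    for alarmed, fraud_minutes in account_days:
        if alarmed and fraud_minutes == 0:
            cost += FALSE_ALARM_COST
        elif not alarmed and fraud_minutes > 0:
            cost += MISSED_FRAUD_COST_PER_MINUTE * fraud_minutes
    return cost
```

As a sanity check on the model: alarming on every one of 5000 test account-days, 80% of which are non-fraud, gives 4000 false alarms at $5 each, or $20,000, which appears consistent with the "Alarm on all" row of the results table.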

Alternative Detection Methods

Collisions + Velocities
• Errors almost entirely due to false negatives

High Usage
• Detect sudden large jump in account usage

Best Individual DC-1 Monitor
• (Time-of-day = Evening) ==> Fraud

SOTA - State Of The Art
• Incorporates 13 hand-crafted profiling methods
• Best detectors identified in a previous study

DC-1 Vs. Alternatives

Detector                  Accuracy (%)  Cost ($)        Accuracy at Cost (%)
Alarm on all              20            20000           20
Alarm on none             80            18111 +/- 961   80
Collisions + Velocities   82 +/- 0.3    17578 +/- 749   82 +/- 0.4
High Usage                88 +/- 0.7    6938 +/- 470    85 +/- 1.7
Best DC-1 monitor         89 +/- 0.5    7940 +/- 313    85 +/- 0.8
State of the art (SOTA)   90 +/- 0.4    6557 +/- 541    88 +/- 0.9
DC-1 detector             92 +/- 0.5    5403 +/- 507    91 +/- 0.8
SOTA plus DC-1            92 +/- 0.4    5078 +/- 319    91 +/- 0.8

Shifting Fraud Distributions

Fraud detection system should adapt to shifting fraud distributions

To illustrate the above point:
• One non-adaptive DC-1 detector was trained on a fixed distribution (80% non-fraud) and tested against a range of 75-99% non-fraud
• Another DC-1 detector was allowed to adapt (re-train its LTU threshold) for each fraud distribution

Second detector was more cost effective than the first
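One simple way to picture "re-training the LTU threshold" is a sweep over candidate thresholds that keeps the cheapest one for the current mix of fraud and non-fraud days. The slides do not say how DC-1 actually re-fits the threshold, so this sweep is an assumption, as are the function name and the `(score, fraud_minutes)` input layout; the cost figures are the ones quoted earlier in the deck.

```python
def best_threshold(scored_days, false_alarm_cost=5.0, missed_cost_per_minute=0.40):
    """Pick the alarm threshold with the lowest total error cost.

    scored_days is a list of (score, fraud_minutes) pairs, where score is
    the weighted sum of monitor outputs for one account-day.
    """
    def cost_at(threshold):
        cost = 0.0
        for score, fraud_minutes in scored_days:
            alarmed = score >= threshold
            if alarmed and fraud_minutes == 0:
                cost += false_alarm_cost
            elif not alarmed and fraud_minutes > 0:
                cost += missed_cost_per_minute * fraud_minutes
        return cost

    # Every observed score is a candidate; infinity means "never alarm".
    candidates = sorted({score for score, _ in scored_days}) + [float("inf")]
    return min(candidates, key=cost_at)
```

Re-running this on data drawn from each new fraud distribution gives the adaptive behavior; a detector that keeps the threshold fitted at 80% non-fraud does not.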

Effects of Changing Fraud Distribution

[Figure: cost versus percentage of non-fraud (75 to 100) for the Adaptive detector and the fixed 80/20 detector]

Conclusion

DC-1 uses a rule learning program to uncover indicators of fraudulent behavior from a large database of customer transactions.

Then the indicators are used to create a set of monitors, which profile legitimate customer behavior and indicate anomalies.

Finally, the outputs of the monitors are used as features in a system that learns to combine evidence to generate high confidence alarms.

Conclusion

Adaptability to dynamic patterns of fraud can be achieved by generating fraud detection systems automatically from data, using data mining techniques

DC-1 can adapt to the changing conditions typical of fraud detection environments

Experiments indicate that DC-1 performs better than other methods for detecting fraud

Exam Questions

Question 1

• What are the two major fraud detection categories, how do they differ, and where does DC-1 fall?

• Pre-call Methods

• Involve validating the phone or its user when a call is placed.

• Post-call Methods – DC-1 falls here

• Analyze call data on each account to determine whether cloning fraud has occurred.

Question 2

• Why do fraud detection methods need to be adaptive?

• Bandits change their behavior: patterns of fraud are dynamic

• Levels of fraud vary from month to month

• Cost of missing fraud or handling false alarms changes with inter-carrier contracts

Question 3

• What are the two steps of profiling monitors and what are the two main monitor templates?

• Profiling Step: measure an account’s normal activity and save statistics

• Use Step: process usage for an account-day to produce a numerical output describing how abnormal activity was on that account-day

• Threshold and Standard Deviation monitors.

The End. Questions?