Communication Fraud Isaac


Telecommunications Fraud Detection using Bayesian Networks

by

Osunmakinde Isaac Olusegun African Institute for Mathematical Sciences 6, Melrose Road, 7945 Muizenberg, Cape Town South Africa. E-mail: [email protected], [email protected]

Supervisor : Dr. Anet E. G. Potgieter Computer Science Department University of Cape Town Cape Town, South Africa. E-mail: [email protected], [email protected] May 26, 2005

Contents
List of Figures
Abstract
Terminologies
1 Introduction
1.1 Problem Definition
1.2 Objectives of Fraud Detection
1.3 Scope and Limitations of this Essay
1.4 Principal Results and Conclusion
2 Overview of the Problem and Literature Review
2.1 Overview and the details of the problem
2.1.1 Fraud in Telecommunications
2.1.2 Fixed / Land Phones
2.1.3 Mobile Phones
2.1.4 The Unlawful Methods of Obtaining Security Numbers
2.1.5 The Accessibility of Analog, Digital and ESN / MIN
2.1.6 The Proposed Solution for South Africa
2.1.7 Discussion
2.2 Literature Review
2.2.1 Artificial Intelligence
2.2.2 Fundamentals of Probability Theory and Bayes Theorem
2.2.3 Bayesian Networks and its (In)dependencies
2.2.4 Further Characteristics of Bayesian Networks and Learning
2.2.5 Agents and Adaptive Systems
2.2.6 Existing Telecommunications Fraud Detection Techniques
3 Methodology
3.1 Data Collection
3.1.1 Call Data and Impurities
3.2 System Model
3.2.1 Classification of Users Accounts
3.2.2 Transformation of Call Data
3.3 Behavioural Bayesian Network (BBN) Model
3.3.1 Qualitative and Quantitative Learning
3.3.2 Call Patterns Recognition Technique
3.3.3 Fraud Detection Techniques with BBN
3.3.4 Techniques of Performance Evaluation for The BBN Model
3.4 Mathematical Modelling of Telephone Customer Population
4 Simulation and Evaluation
4.1 Diagnosing and Predicting Phone Calls
4.2 Pattern Recognition of Call Records
4.3 Performance Evaluation of BBN Model
4.3.1 The Global Precision and Occurrence Matrix
4.4 The Mathematical Model Simulation with Octave Program
5 Conclusion, Discussion and Results
5.1 Contribution to Knowledge
5.2 Recommendations
5.3 Future Work
Acknowledgements

List of Figures
2.1 Schematic Classification of Deceitful Fraud
2.2 Public Transmission Path for Land Lines
2.3 A Sample Bayesian Network Model
3.1 System Model for Telecommunication Fraud Detection
3.2 Behavioural Bayesian Network (BBN) Model for Land Line Telephone Fraud
3.3 Present Knowledge Associated to Call Time Node
3.4 Present Knowledge Associated to Duration Node
3.5 3 Compartmental Model for Telephone Customers Population
4.1 Prediction of Call Record 1 of Table 3.4
4.2 Prediction of Call Record 2 of Table 3.4
4.3 The Effect When Illegal Activities Stop After 5 Calls Trapped
4.4 The Effect When Illegal Activities Stop After 10 Calls Trapped

Abstract

Telecommunications networks (both fixed and mobile lines) are an attractive target for fraudsters in the business world. The fraud here can be broadly classified as superimposed and subscription fraud, both of which bring losses to the network carriers. This necessitates fraud detection, which primarily involves monitoring the behaviour of customers or populations of users to detect and raise alerts on undesirable call behaviour. This essay will probe ways in which illegitimate users can make unlimited calls to the detriment of the legal owner. As an improvement over previous techniques of fraud detection, this essay uses a Bayesian network model as the major tool to trap illegal calls. A major goal of this work is to ensure that customers no longer feel threatened: the activities of fraudsters will be better monitored and the source of suspicious charges will be identified.

Terminologies
AI: Artificial Intelligence
BaBe: Bayesian Behaviour Networks Agent Architecture
BBN: Behavioural Bayesian Network
BNA: Bayesian Network Administrator of BaBe
CDR: Call Detail Record
CPT: Conditional Probability Table
CUP: Current User Profile
DAG: Directed Acyclic Graph
EM: Expectation Maximisation
ESN: Equipment Serial Number
GSM: Global System for Mobile Communications
MIN: Mobile Identification Number
PIN: Personal Identification Number
SIM: Subscriber Identity Module
SOM: Self-Organising Map
TSP: Telephone Service Provider
UPH: User Profile History

Chapter 1

Introduction

The telecommunications industries, both fixed and mobile, lose approximately 4% of their daily revenue to fraud. Fraud in a telecommunication network is defined as the illegal use of network services by a fraudster. The loss is not only millions of rands of income but also wasted network resources. Examples of South African networks include Telkom, Cell C, Vodacom and MTN. These network carriers do not declare all fraud losses, in order to maintain subscriber satisfaction. Otherwise, subscribers feel threatened and either collaborate with fraudsters (impersonators) who provide cheap calls for commercial purposes, or change to a competing operator (carrier). This also discourages new customers from subscribing to defrauded networks. This essay focuses on fixed and mobile line networks for the early detection of fraudulent calls. Fraudulent calls are faster to trap on fixed lines than on mobile phones because fixed line calls originate from a single location. Telecommunication fraud can be classified into two categories: subscription and superimposed fraud [13]. Subscription fraud results from an illegal subscription to a service using a false identity, without any intention of paying. Superimposed fraud is committed by impersonators who use a network without permission and run up charges on a legitimate subscriber's bill. In this essay, a mathematical model is designed and simulated to monitor the approximate population distribution of subscribers. The research combats some types of fraud with Bayesian networks, which define probabilistic models. It centres on Bayes' rule for calculating the probability that a call is fraudulent given the subscriber's behaviour. That is, we want to deal with expressions of the form:

$$\Pr(\text{call} = \text{fraud} \mid \text{subscriber's behaviour}).$$

1.1

Problem Definition

In the first instance, it is observed that network operators lose customers to competing carriers when those customers feel threatened by fraudsters. This could quickly put a telecommunication company out of business. Currently, one of the prevailing methods for combating fraud is block crediting: a bill is sent to the customer, who either approves or disputes the billed amount [16]. This approach is expensive and may contain errors. Researchers focus more on mobile fraud, with little or no research on fixed line fraud. Fixed line fraud may not cost as much as mobile fraud, but it is still part of the carriers' losses; recall that carriers lose millions of rands in revenue. In addition, little or no work has been done on incomplete call records: the physical equipment of a Telephone Service Provider (TSP) may drop a particular field from a call record because of unexpected faults. Existing research has produced fraud detection techniques that are good but not yet sophisticated enough, and these shortcomings lead to high false alarm rates.

1.2

Objectives of Fraud Detection

This essay uses a Bayesian network model for real time fraud detection to reduce the losses in the network industries. Real time fraud detection is gaining ground in the development of fraud detection models [9]. We describe in detail the deceitful telecommunication fraud types and their interrelationships. We also use mathematical modelling to monitor changes in the population distribution of legal customers; this distribution assists in effective fraud detection. Multiple Bayesian inferences, which serve as fraud indicators, are used instead of block crediting, and Bayesian parameter estimation is used to handle incomplete data. In general, the objective of fraud detection is to maximise correct predictions while keeping incorrect predictions at an acceptable level [14]. We will show that the Bayesian model does both effectively.

1.3

Scope and Limitations of this Essay

The proposed network model is a Behavioural Bayesian Network (BBN), which is useful in trapping abnormal calls. This model uses the user profile history (UPH) and the current user profile (CUP) of calls to monitor how the call pattern changes. This essay primarily focuses on how user calls change on fixed lines, but the model can also monitor user behaviour on mobile phones if they are used in a single location. Furthermore, a closed mathematical model is assumed for the subscriber population; that is, we assume there is no immigration or emigration. Within these limits, this essay shows how probabilistic and mathematical fraud models can be used to simulate the occurrence of telecommunication fraud and the subscriber population, respectively. Further studies on the subject of this essay will require multi-agent systems to automate the monitoring of callers' behaviour. A multi-agent system is, roughly, a set of programs working together to reason well. In such a system, the velocity trap and overlap models would work together with the Behavioural Bayesian model to produce a high fraud catching rate. An overlap (or collision) model is designed to trap a phone number that is used to make calls from different places at about the same time. A velocity trap model flags a phone number that makes calls within a short time interval from locations that are impractically far apart. Thus, even though this essay is limited in scope to the Behavioural model, the direction for future work is clear.
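The overlap and velocity trap rules above can be stated compactly in code. The following Python sketch is illustrative only and is not part of the essay's BBN model: it assumes a hypothetical call record carrying the phone number, a start time in seconds and the latitude/longitude of the originating cell or exchange, and the thresholds (`max_speed_kmh=900`, `overlap_window_s=60`, `min_distance_km=1`) are assumed values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical call record: number, start time (s) and origin location."""
    number: str
    start: float  # call start time in seconds since an arbitrary epoch
    lat: float    # latitude of the originating cell/exchange, degrees
    lon: float    # longitude of the originating cell/exchange, degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def flag_suspicious(calls, max_speed_kmh=900.0, overlap_window_s=60.0, min_distance_km=1.0):
    """Compare consecutive calls on each number; return (rule, call_a, call_b) tuples."""
    by_number = {}
    for c in calls:
        by_number.setdefault(c.number, []).append(c)

    flags = []
    for seq in by_number.values():
        seq.sort(key=lambda c: c.start)
        for a, b in zip(seq, seq[1:]):
            dist = haversine_km(a.lat, a.lon, b.lat, b.lon)
            dt = b.start - a.start
            if dist < min_distance_km:
                continue  # effectively the same place: neither rule applies
            if dt <= overlap_window_s:
                flags.append(("overlap", a, b))   # two places at about the same time
            elif dist / (dt / 3600.0) > max_speed_kmh:
                flags.append(("velocity", a, b))  # implied travel speed is impractical
    return flags


calls = [
    Call("A", 0, -33.92, 18.42), Call("A", 1800, -26.20, 28.05),      # 30 min apart
    Call("B", 0, -33.92, 18.42), Call("B", 30, -26.20, 28.05),        # 30 s apart
    Call("C", 0, -33.92, 18.42), Call("C", 6 * 3600, -26.20, 28.05),  # 6 h apart
]
for rule, a, b in flag_suspicious(calls):
    print(rule, a.number)
```

In this example a number used in Cape Town and then in Johannesburg (roughly 1,260 km apart) 30 minutes later is caught by the velocity rule, a 30-second gap triggers the overlap rule, and a six-hour gap is not flagged. Checking only consecutive calls keeps the sketch simple; a deployed system would also need to handle missing location fields, which the essay notes can occur in incomplete call records.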

1.4

Principal Results and Conclusion

This essay will show that the model catches fraud effectively on the basis of callers' behaviour. The simulation of the mathematical model describes the rate at which legal customers are troubled by fraudsters, and the long term effects of fraud on customers if fraudsters are not stopped by the service providers. Furthermore, the effectiveness of the Bayesian network model can be observed from the distribution of the customer population. The results of this mathematical simulation also enable the network carrier to know the appropriate time schedule for combating illegal calls with the Bayesian network model.


Chapter 2

Overview of the Problem and Literature Review

2.1

Overview and the details of the problem

The following sections identify and classify different types of fraud and describe how they are committed.

2.1.1

Fraud in Telecommunications

Presently, telecommunication operators (TSPs) could protect themselves and their legal customers from deceitful fraud types by using intelligent models or systems. Building efficient probabilistic intelligent models requires studying the interrelationships of the existing fraud types, shown diagrammatically in Figure 2.1. This essay builds up the pictorial representation of fraud types discussed by previous researchers [2], [15], [11]. Note that this diagram includes only deceitful fraud types. A business proposal received over the telephone is regarded as telemarketing; if the caller is anonymous, it is likely to be a fraudulent venture. Intelligent models do not trap telemarketing fraud, because a telephone subscriber may choose to become a victim of it [15].

2.1.2

Fixed / Land Phones

Previous research shows that fraudsters concentrated on fixed lines before mobile phones became popular. This land line fraud occurred in both developed and developing countries. Figure 2.1 shows that the major fraud type is superimposed fraud, while subscription fraud rarely occurs: the dotted arcs show fraud types that do not occur regularly, while the continuous lines show fraud types that occur very often. Fraud is said to be superimposed when an impersonator illegally steals resources from legitimate users by gaining access to their phone account [13]. The impersonator is said to be an insider when the fraud is committed by an employee of the TSP. Superimposed fraud can also be external, when committed by unscrupulous members of the general public.


Figure 2.1: Schematic Classification of Deceitful Fraud

Fraud is said to be subscription fraud when a fraudster illegitimately obtains a legal account and uses services without intention to pay the bills.

The Secrets of Fraudsters on Land Phones

For insider superimposed fraud, the employee traces the phone line of wealthy customers (e.g. companies), taps the line, temporarily connects a receiver and makes calls for personal use. External superimposed fraud is committed similarly to the insider type, but in public places, possibly at the local loop, and for both commercial and private use. A local loop is the transmission path between a legal customer's phone and the closest local office of the Telephone Service Provider (TSP). All these calls are made to the detriment of the legal user, who must pay higher bills. Figure 2.2 shows where external superimposed fraud is mostly committed. In subscription fraud, a swindler steals the personal bank account data (e.g. account number) of someone who subscribes to the same or a different network (TSP) and obtains a land line service from a given telephone operator. The legal user may notice the subscription fraud at the billing period, if they are very observant. Fixed line swindlers stay only temporarily in a house or apartment.


Figure 2.2: Public Transmission Path for Land Lines

2.1.3

Mobile Phones

The subscription fraud explained above also applies to mobile phones, where it is common today for both commercial and personal use. Superimposed fraud is committed when impersonators gain access to the accounts of legitimate users by stealing their unique Mobile Identification Number (MIN) and Equipment Serial Number (ESN) to perform cloning or spoofing. Spoofing describes a phone user who pretends to be someone else. A phone is said to be cloned when a SIM card is reprogrammed to transmit the MIN / ESN belonging to another legitimate cell phone [15]. The cloning variations [12] are as follows. Subtype 1 duplicates only one MIN / ESN on a new SIM card. Subtype 2 is a roaming fraud in which more than one MIN / ESN is stored on a SIM card; the continuous changing from one MIN / ESN to another is called tumbling, and because each call is billed to one MIN / ESN at a time, it is difficult for an owner to notice the illegal calls. Subtype 3 is lifetime phone fraud, in which the fraudster manually programs any stolen MIN / ESN through the handset keypad. Subscribers are identified by a network using a smart card called a SIM (Subscriber Identity Module), which contains the MIN / ESN. The MIN and ESN take the following forms:


MIN = Continent code + Country code + TSP type + Phone number.
ESN = Manufacturer's number + Serial number of the phone.

Thus, if a fraudster impersonates a user with a stolen MIN / ESN, the legitimate user is billed. Research has shown that theft is another serious deceitful fraud in the mobile industry. When a mobile phone is stolen, calls are still superimposed on the legal user by spoofing the network. Here, spoofing means stealing a handset, but not the security numbers, and making calls as if the owner were using the network. The fraudster can keep using the stolen phone until it is reported to the TSP for blocking, which is possible only if the owner remembers his or her Personal Identification Number (PIN). Superimposed fraud can be internal or external [5]. As in the case of land lines, internal fraud is committed by employees, here those who manufacture the SIM cards. Superimposed fraud is said to be external when unscrupulous people clone a stolen MIN / ESN onto a new smart card or keep spoofing a stolen phone. When a stolen phone is reported, spoofing stops because the MIN / ESN is blocked, but the numbers could still be cloned.

The Secrets of Fraudsters on Mobile Phones

For mobile subscription fraud, a cellular or GSM phone is obtained on a contract agreement such that a periodic bill debits a legitimate bank account. Research shows that subscription fraud consists mostly of local calls, so that it is not noticed in time, and it takes time for the TSP to resolve. Internal fraud is caused by criminal employees of the SIM card company who sell MIN / ESN codes to unscrupulous people, or use them themselves, for commercial or personal use. For cloning, a computer program for sale on the Internet [3] uses a search and replacement algorithm that removes the old MIN / ESN and replaces it with a new combination on the smart card.
After the program is run, a device known as an EEPROM burner (programmer) electrically erases and rewrites the Electrically Erasable Programmable Read-Only Memory (EEPROM) on the microchip of the SIM card.

2.1.4

The Unlawful Methods of Obtaining Security Numbers

The Secrets of Obtaining MIN / ESN

The unlawful methods of trapping these security numbers are as follows.

Eavesdropping: fraudsters use an antenna or a software-based electronic mobile communication scanner to trap valid MIN / ESN numbers by illegally monitoring the radio transmissions from the cell phones of legitimate subscribers.

Cracking: a specialised computer is used to read the setup menu of a mobile phone to obtain the MIN / ESN. Handset technicians and motor vehicle mechanics are usually involved in this dubious business.


The Unlawful Techniques of Discovering Subscribers' PIN Numbers

The illegal methods [12] of obtaining the PIN are as follows.

Shoulder Surfing: spying on users as they enter their PINs in public places, using cameras or telescopes or simply looking over their shoulders.

Scanning / Cracking: using a computer and a sequential number dial-out program to obtain the PIN.

Social Engineering: a traditional criminal technique of tricking gullible people into revealing their security numbers in the belief that these are required for legitimate purposes.

2.1.5

The Accessibility of Analog, Digital and ESN / MIN

Analogue mobile phones (e.g. cellular phones) are cheap and do not encrypt the MIN / ESN during transmission, which makes it easier for fraudsters to trap the security codes. Digital phones (e.g. GSM) initially encrypted the MIN / ESN well during transmission, but the encryption was later weakened because of decoding disparities between national networks. When sophisticated encryption techniques were first used to protect the security numbers on GSM, access was restricted, and some countries found it difficult to decrypt because of the expensive equipment required. The resulting internetwork linking countries had more lax security and was easier to breach. Thus, little or no encryption is now used with digital phones, which puts legitimate users at risk of being defrauded [7].

2.1.6

The Proposed Solution for South Africa

South African telephone operators intend to blacklist defrauded SIM cards, which may disturb but will not stop lawbreakers. Blacklisting involves making both handsets and SIM cards useless; currently, there is no uniform rule for blacklisting phones [1]. Greylisting, which is already in practice, blocks SIM cards but not handsets. These are imperfect solutions, and a more intelligent model is required. Making appropriate laws for the telephone industry also remains a challenge for policy makers.

2.1.7

Discussion

This essay extracts fraud types and their relationships from the above references, together with the methods by which they are committed. The reasons behind fraud were also reviewed in order to understand the decision making on suspicious call records. Since lawbreakers do not respect governmental policies, probabilistic models (Bayesian networks) from the field of Artificial Intelligence (AI) can be applied by the telecommunication industry to trap offenders. The approach is to design a special model for each type of fraud; in this essay, a Behavioural Bayesian model is used.


2.2

Literature Review

This section gives a short technical background on Artificial Intelligence and probabilistic Bayesian network models. It further explains previous techniques and their applications.

2.2.1

Artificial Intelligence

Artificial Intelligence (AI) is a field of computer science concerned with building machines that solve problems intelligently. It is widely accepted that there are two types of AI: strong and weak [6]. Strong AI is the creation of computer-based artificial intelligence that can reason and solve problems with self-awareness. No standard example exists yet, but strong AI assumes that the human mind is software and the brain is purely hardware; thus, a computer system could roughly approximate a human being if it included all the properties of the human mind, such as consciousness. Weak AI is the creation of computer-based artificial intelligence that can reason and solve problems only in a limited domain, e.g. a probabilistic model applied to detect fraudulent records in the telecommunication domain. Other examples include pattern recognition, image processing, neural networks and robotics. Previous research [6] observes that moderate progress has been made in weak AI, which is viewed as the set of computer science problems without good solutions. I intend to make my own contribution by using Bayesian networks to detect fraudulent activities in telecommunication call records. The detection requires probabilistic reasoning with uncertain information. It relies on Bayesian inference, a form of statistical inference in which probabilities are interpreted as degrees of belief rather than as frequencies or proportions. This attaches a level of confidence to each conclusion. Thus, this essay begins its treatment of the Bayesian network model by reviewing the foundations of probability.

2.2.2

Fundamentals of Probability Theory and Bayes Theorem

A Bayesian network uses discrete random variables: if there are random variables $X_1, \ldots, X_n$ with corresponding values $x_1, \ldots, x_n$, their joint probability is written

$$\Pr(X_1 = x_1, \ldots, X_n = x_n).$$

The following are the important properties of a discrete probability distribution [8], stated for a single variable $X$ whose possible values are $v_1, \ldots, v_m$:

$$0 \le \Pr(X = v_k) \le 1, \quad k = 1, 2, \ldots, m,$$

$$\sum_{k=1}^{m} \Pr(X = v_k) = 1.$$


Figure 2.3: A Sample Bayesian Network Model (nodes n1 to n6)

These basic properties are used by the Bayesian network, a multivariate model that extends Bayes' theorem to solve complex problems such as identifying fraudulent patterns in callers' records. For instance, to evaluate an expression such as

$$\Pr(\text{a call from } 27720251590 = \text{true} \mid \text{user's behaviour}) = \,?$$

we need Bayes' theorem, which first requires knowledge of conditional probability. Conditional probabilities are the basic building blocks of Bayes' theorem. For any two random variables $X_i$ and $X_j$ with $\Pr(X_j) > 0$, the conditional probability is defined as

$$\Pr(X_i \mid X_j) = \frac{\Pr(X_i, X_j)}{\Pr(X_j)} \qquad (2.1)$$

From expression 2.1, using the chain rule [10] of conditional probabilities, we have:

$$\Pr(X_i, X_j) = \Pr(X_i \mid X_j)\,\Pr(X_j) \qquad (2.2)$$

The joint probability does not depend on the order of its arguments, $\Pr(X_j, X_i) = \Pr(X_i, X_j)$, so applying equation 2.2 with the roles of $X_i$ and $X_j$ swapped gives:

$$\Pr(X_j, X_i) = \Pr(X_j \mid X_i)\,\Pr(X_i) \qquad (2.3)$$

Now, equating the right hand sides of equations 2.2 and 2.3, we have:

$$\Pr(X_i \mid X_j)\,\Pr(X_j) = \Pr(X_j \mid X_i)\,\Pr(X_i)$$

Thus, dividing both sides by $\Pr(X_j)$, Bayes' theorem is given as:

$$\Pr(X_i \mid X_j) = \frac{\Pr(X_j \mid X_i)\,\Pr(X_i)}{\Pr(X_j)} \qquad (2.4)$$
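The derivation can be checked numerically. The Python sketch below uses a small, hypothetical joint distribution over two binary variables (the numbers are illustrative, not taken from call data) and exact fractions, so equations 2.1 to 2.4 hold exactly rather than up to rounding.

```python
from fractions import Fraction as F

# Hypothetical joint distribution Pr(Xi = a, Xj = b) over two binary variables.
JOINT = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(2, 10),
    (False, False): F(4, 10),
}


def pr_i(a):
    """Marginal Pr(Xi = a), summing the joint over Xj."""
    return sum(p for (x, _), p in JOINT.items() if x == a)


def pr_j(b):
    """Marginal Pr(Xj = b), summing the joint over Xi."""
    return sum(p for (_, y), p in JOINT.items() if y == b)


def pr_i_given_j(a, b):
    """Equation (2.1): Pr(Xi = a | Xj = b) = Pr(Xi = a, Xj = b) / Pr(Xj = b)."""
    return JOINT[(a, b)] / pr_j(b)


def pr_j_given_i(b, a):
    """The same definition with the roles of Xi and Xj swapped."""
    return JOINT[(a, b)] / pr_i(a)


for a in (True, False):
    for b in (True, False):
        # Equations (2.2) and (2.3): both factorisations recover the joint.
        assert pr_i_given_j(a, b) * pr_j(b) == JOINT[(a, b)]
        assert pr_j_given_i(b, a) * pr_i(a) == JOINT[(a, b)]
        # Equation (2.4): Bayes' theorem.
        assert pr_i_given_j(a, b) == pr_j_given_i(b, a) * pr_i(a) / pr_j(b)

print(pr_i_given_j(True, True))  # 3/5
```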

In equation 2.4, $\Pr(X_i \mid X_j)$ is called the posterior probability. The posterior probability of $X_i$ (the hypothesis) is obtained after making observations, such as studying the behaviour of a user; it is the updated degree of belief obtained by combining the likelihood and the prior. $\Pr(X_j \mid X_i)$ is the likelihood: the probability of seeing the observation $X_j$ (the evidence) given that the hypothesis $X_i$ is true, viewed as a function of the unknown hypothesis. The prior probability $\Pr(X_i)$ is the probability of $X_i$ before making any observation or inference. The marginal probability $\Pr(X_j)$ is usually treated as a normalising constant, and the ratio $\Pr(X_j \mid X_i) / \Pr(X_j)$ is called the scaling factor, because it measures the impact that the observation has on the belief in the hypothesis.
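To make the roles of prior, likelihood, normalising constant and scaling factor concrete, here is a minimal Python sketch for a binary hypothesis H ("the call is fraudulent") and evidence E ("an unusual call pattern is observed"). The probabilities are hypothetical and chosen only for illustration; the normalising constant Pr(E) is computed with the law of total probability.

```python
from fractions import Fraction


def bayes(prior, likelihood, likelihood_if_not):
    """Return (posterior, evidence, scaling) for a binary hypothesis H.

    prior             : Pr(H)
    likelihood        : Pr(E | H)
    likelihood_if_not : Pr(E | not H)
    """
    # Marginal Pr(E) by the law of total probability (the normalising constant).
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    # Scaling factor Pr(E | H) / Pr(E): how strongly E moves the belief in H.
    scaling = likelihood / evidence
    # Posterior Pr(H | E) = scaling factor * prior, i.e. equation (2.4).
    return scaling * prior, evidence, scaling


# Hypothetical figures: 1 call in 100 is fraudulent; an unusual pattern is
# seen in 90% of fraudulent calls and in 5% of legitimate ones.
posterior, evidence, scaling = bayes(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(posterior, evidence, scaling)  # 2/13 117/2000 200/13
```

Even with a fairly reliable indicator, the small prior keeps the posterior at 2/13 (about 15%) in this example, which illustrates why combining several fraud indicators, as this essay does, can be preferable to relying on a single one.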

2.2.3

Bayesian Networks and its (In)dependencies

A Bayesian belief network is a directed acyclic graph (DAG), as in Figure 2.3, that represents a joint probability distribution over discrete variables. The graphical representation of the Bayesian network model displays qualitative knowledge about a particular domain, and the probabilistic model in Figure 2.3 gives an efficient description of a multivariate probability distribution. Since most of the work in this essay involves learning the behaviour of users from callers' databases, the nodes $n_i$, $i = 1, 2, \ldots, 6$, can represent learning parameters. Some of these nodes could also represent fraud indicators, with the dependency relationships shown as arcs. The following concepts are used throughout the subsequent sections.

Conditionally Dependent: In Figure 2.3, $n_5$ is conditionally dependent on its parent $n_1$; we denote and compute this probability as $\Pr(n_5 \mid n_1)$. Similarly, $n_4$ is conditionally dependent on $n_1$ and $n_2$, giving $\Pr(n_4 \mid n_1, n_2)$; $n_3$ is conditionally dependent on $n_2$, giving $\Pr(n_3 \mid n_2)$; and $n_6$ is conditionally dependent on $n_4$ and $n_3$, giving $\Pr(n_6 \mid n_4, n_3)$.

Conditionally Independent: Evidence is a value that is observed with certainty in real life. A node is said to be non-empty if it contains a value (evidence). If both parents of $n_6$, namely $n_3$ and $n_4$, are non-empty, then $n_6$ is conditionally independent of $n_1$ and $n_2$. Likewise, the only path from $n_5$ to $n_4$ passes through $n_1$, so if $n_1$ is non-empty the path is blocked and we say that $n_5$ is d-separated from (conditionally independent of) $n_4$ given $n_1$, i.e. $\Pr(n_5 \mid n_1, n_4) = \Pr(n_5 \mid n_1)$. This is an important property that the belief network exploits to reduce the number of probabilities to be computed, which makes otherwise intractable problems feasible.

Unconditionally Independent: A node is said to be empty if it does not contain a value. Two nodes are unconditionally (marginally) independent if they are independent when no evidence at all is observed. In Figure 2.3, $n_1$ and $n_2$ are unconditionally independent, $\Pr(n_1 \mid n_2) = \Pr(n_1)$, because every path between them meets head-to-head at $n_4$ or $n_6$ and stays blocked while those nodes are empty. By contrast, if $n_2$ is empty, $n_3$ and $n_4$ are not unconditionally independent: they remain dependent through their common parent $n_2$.

Joint Probability: A query node is a node whose probability we want to infer. The probability of a query node given certain evidence is computed from the joint probability of the model, which, by the chain rule and the independencies encoded in the DAG, factorises as:

$$\Pr(n_1, n_2, n_3, n_4, n_5, n_6) = \Pr(n_1)\,\Pr(n_2)\,\Pr(n_3 \mid n_2)\,\Pr(n_4 \mid n_1, n_2)\,\Pr(n_5 \mid n_1)\,\Pr(n_6 \mid n_3, n_4)$$

2.2.4

Further Characteristics of Bayesian Networks and Learning

Apart from those mentioned in the previous two subsections, some important features of Bayesian networks are learning, probabilistic reasoning, level of confidence, and uncertainty.

Learning: A Bayesian network can learn in four scenarios: known structure (model) with complete data (records), known structure with incomplete data, unknown structure with complete data, and unknown structure with incomplete data. In most cases, learning with a known structure is described as supervised learning, while learning with an unknown structure is regarded as unsupervised learning. In this essay, we intend to mine the structure from data (unsupervised learning). In this unsupervised scenario, the Bayesian network learns from training data to represent quantitative knowledge in the form of conditional probability tables (CPTs). Various examples of CPTs will be given in chapter three of this essay.

Probabilistic reasoning: Causal (top-down), diagnostic (bottom-up) and explaining away are the three patterns of reasoning observed in Bayesian networks. The causal inference technique uses causes to infer effects. In other words, when a cause is observed, we can generate or predict its possible effects. For instance, from figure 2.3, it makes sense to calculate Pr(n5 = true | n1 = true). In a fraud detection model, we can work with expressions of the form Pr(a call from 27832775214 = true | the behaviour of 27832775214 [user]). The diagnostic inference technique uses effects (or symptoms) as evidence to infer causes, and it is commonly used in expert systems. In the same way, from figure 2.3, we can consider the probability Pr(n1 = true | n5 = true). In a fraud detection model, it makes sense to ask for Pr(a call is from 27832775214 | a call = true). Explaining away is a reasoning pattern in which belief in one possible explanation changes when an alternative explanation is actually observed [10]. For instance, we may want to ask: what are Pr(n4 | n1) and Pr(n4 | n6)? Comparing these two results helps in making better decisions. This reasoning type is also known as Berkson's paradox, and it embeds a causal reasoning step in a diagnostic one. In the same way, I make three inferences using fraud indicators in my model before deciding that a call is fraudulent. Thus, any of these reasoning techniques can be used to learn the behaviour patterns of a user or to detect abnormal patterns in callers' records. The choice depends on the hypotheses to be tested on the belief model being used.

Level of confidence: The Bayesian network model uses a degree of belief, equivalent to a probability, to express the confidence level of a result. For instance, a Bayesian belief model may output a probability distribution over an entity such as Pr(calling destination = true) = 0.92 and Pr(calling destination = false) = 0.08. This is very important if the result is to be used to make decisions: we can then say that we have 92 % confidence that this is a legitimate call.

Uncertainty: The Bayesian network model can reason with uncertain information. For instance, telecommunication call records may be incomplete or distorted because of physical problems with the equipment. The missing data can still be estimated with Bayesian models.
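To make the difference between causal and diagnostic reasoning concrete, the following Python sketch uses a hypothetical two-node network Fraud → UnusualCall. The prior and conditional probabilities are invented for illustration and are not taken from figure 2.3 or from any call data. Causal reasoning reads Pr(effect | cause) directly from the CPT, while diagnostic reasoning inverts it with Bayes' theorem.

```python
# Hypothetical two-node network: Fraud -> UnusualCall (illustrative numbers only).
P_FRAUD = 0.02                       # assumed prior Pr(Fraud = true)
P_UNUSUAL_GIVEN = {True: 0.90,       # assumed Pr(UnusualCall = true | Fraud = true)
                   False: 0.05}      # assumed Pr(UnusualCall = true | Fraud = false)


def causal(fraud: bool) -> float:
    """Causal (top-down): Pr(UnusualCall = true | Fraud = fraud), read from the CPT."""
    return P_UNUSUAL_GIVEN[fraud]


def diagnostic() -> float:
    """Diagnostic (bottom-up): Pr(Fraud = true | UnusualCall = true) via Bayes' theorem."""
    joint_fraud = P_UNUSUAL_GIVEN[True] * P_FRAUD
    joint_legal = P_UNUSUAL_GIVEN[False] * (1 - P_FRAUD)
    return joint_fraud / (joint_fraud + joint_legal)


if __name__ == "__main__":
    print(f"causal:     {causal(True):.3f}")   # 0.900
    print(f"diagnostic: {diagnostic():.3f}")   # 0.269
```

Even with a strong causal link (0.90), the diagnostic belief is only about 0.27 because the assumed prior is small, which illustrates why a single observation may not be enough to decide that a call is fraudulent.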

2.2.5

Agents and Adaptive Systems

This essay would be incomplete without clarifying the significance of this section: the Bayesian network model will not adapt well unless agents are also described. In this context, an intelligent agent is a simple software program that can perceive, within limits, aspects of its environment and affect that environment either directly or through cooperation with other agents. When two or more agents work together, we have a multi-agent system. An agent differs from an ordinary program because of the following characteristics [4]: agents are situated, interactional, autonomous or semi-autonomous, and structured. Multi-agent systems are required when each agent has incomplete information or capabilities for solving the problem (and thus a limited viewpoint), when data is decentralised, when computation is asynchronous, and when there is no global system control. One of the agent types required to complete this research work is the personal agent, which is used for user profiling. A system is said to be adaptive if it can adjust to its environment. The behaviours of legitimate and illegal callers change over time, and this requires the behavioural Bayesian network (BBN) model to adapt to changes. An agent is required to make good decisions by monitoring the calling patterns from the BBN (state transition analysis).

2.2.6

Existing Telecommunications Fraud Detection Techniques

At present, a number of methods are used for telecommunication fraud detection: data mining, rule-based approaches, statistical techniques, and supervised and unsupervised neural networks. Data mining is a technique that takes a massive amount of data as input, extracts usable information and produces output in the form of patterns for decision making. This traditional data mining method is popularly used to predict suspicious activities implied by a new data set [16]. Using traditional data mining alone for fraud detection requires a lot of time, which prohibits real-time fraud detection. Fraudulent activities need to be detected within a very short period of time to minimise the losses of the network carriers.

The rule-based technique works best with user profiles containing explicit information, where fraud criteria can be expressed as rules [16]. This technique is difficult to manage because it requires precise, laborious and time-consuming programming for every imaginable fraud, and it is difficult to incorporate new fraud types into the rules as they emerge. In fact, the performance of the rule-based approach drops drastically as the call data grows.

The statistical method of fraud detection uses thresholding, where summary statistics such as average call duration, longest call duration and average number of calls per day are compared with predetermined thresholds [5]. Since a threshold is fixed for a particular time period, the statistical threshold model finds it difficult to adapt to changes in a customer's call behaviour.

The popular supervised neural network approach trains its model on call detail records (CDRs) that were pre-classified as either fraudulent or legal calls. This technique can only detect a type of fraudulent call if it already exists in the training data; new types of illegal calls cannot be detected. Unsupervised neural network models such as the self-organising map (SOM), on the other hand, cluster homogeneous call records by summarising the behaviour of the customers. Detection is done by checking whether new groups of calls deviate from the clusters. Old and new calls are thus compared, but over time some old calls are removed from the history. However, a customer's old behaviour may be repeated, and the unsupervised neural network model does not describe the level of confidence of its fraud detection results.
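The statistical thresholding approach can be sketched in a few lines of Python. The statistics follow the examples in the text (average call duration, longest call duration, average number of calls per day), but the threshold values and the function name `threshold_alerts` are hypothetical choices for illustration, not values taken from [5].

```python
from statistics import mean

# Hypothetical fixed thresholds of the kind a statistical detector might use.
THRESHOLDS = {
    "avg_duration_min": 15.0,   # average call duration in minutes
    "max_duration_min": 90.0,   # longest single call in minutes
    "calls_per_day": 40.0,      # average number of calls per day
}


def threshold_alerts(durations_min: list[float], days: int) -> list[str]:
    """Return the names of the summary statistics that exceed their fixed thresholds."""
    stats = {
        "avg_duration_min": mean(durations_min),
        "max_duration_min": max(durations_min),
        "calls_per_day": len(durations_min) / days,
    }
    return [name for name, value in stats.items() if value > THRESHOLDS[name]]


if __name__ == "__main__":
    # One day of hypothetical call durations in minutes.
    print(threshold_alerts([2.0, 5.5, 120.0, 3.0], days=1))
    # ['avg_duration_min', 'max_duration_min']
```

Because `THRESHOLDS` is fixed, a customer whose legitimate behaviour shifts (for example, towards longer calls) keeps triggering alerts until the thresholds are re-tuned by hand, which is the adaptation problem noted above.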


Chapter 3

Methodology

The major approach proposed in this essay to combat telecommunication fraud is the use of probabilistic models, and the BBN is described in detail. As discussed earlier, this chapter also describes a mathematical model that the TSP can use as a control strategy to monitor the distribution of its subscriber population. Moreover, several components of the system model, such as agents, work together to increase the trapping capability of the BBN. Beyond fraud detection, this essay tries to describe the relationship between probabilistic and mathematical models and to show the interdependence between computer science and the telecommunications industry. Furthermore, this system uses a post-call method of fraud detection, since the pre-call method has been defeated by line tapping and cloning, as explained in chapter 2. The pre-call method identifies and prevents illegal calls as they are placed, before a communication line is opened. The telecommunication industry relies on post-call methods because they identify fraud that has already occurred on a user's account and ensure that the account is flagged (greylisted) against subsequent fraudulent use.

3.1

Data Collection

Due to security considerations, it is not presently feasible to use real-life telecommunication call records. Telecommunication companies are strict and implement confidentiality policies to secure their subscribers' personal information. So, a fabricated sample of call records with derived attributes (table 3.3), whose characteristics we can examine, is a useful substitute; real-life data contain over 20 call attributes. If this system succeeds on the chronological call records used in this essay, it can be applied to the real-life call records of network carriers with little or no additional work. Since the BBN used in this essay focuses on land phones, and possibly stationary mobile phones, table 3.1 contains the necessary variables of raw call records.


Table 3.1: A sample of a customer's raw call records from a network carrier

Caller Number  Date      Day  Time   Duration  Dest. Number  Dest. Location
021-7879320    01/02/05  Mon  01:06  10 mins   021-5298100   Muizenberg
021-7879320    10/03/05  Sat  17:00  50 secs   27832775214   Steenberg 2
021-7879320    ...       ...  ...    ...       ...           ...

3.1.1

Call Data and Impurities

Note that the collected call data are used to train the BBN model; if untrue data are included, the model is likely to have a weak fraud-trapping rate. For a new network service with new subscribers, it is likely that over 98 % of call records are error-free, especially during the first 30 days. These records can therefore serve as the training dataset for the BBN model to ensure a better fraud-catching rate. If an existing telecommunication network is to adopt this system, we need to mark and discard the impure data (fraudulent calls) using some of the agents discussed in the previous chapters. The existing block-crediting method discussed earlier uses educated guesses (heuristics), which sometimes wrongly credit calls and introduce impurities into the database. One of the major tasks of an agent is to automatically include calls that a legal user made long ago to destination numbers that are later called by fraudsters: the call records of the legal customer are included for training, but those of the fraudsters are not. Some of these records are mistakenly marked out by the block-crediting process but are still useful for monitoring the behaviour of a legal customer. Also, owing to a small time lag, some unscrupulous people perform spoofing on land lines, but the illegal calls can be detected over a longer time using call pattern recognition. Thus, fraud-free call records are required to train the BBN model so as to maximise the fraud-catching rate.

3.2

System Model

Figure 3.1 shows the system model: the primary part, called the probabilistic model, is on the left, and the auxiliary part, the mathematical model, is discussed in detail in section 3.4. Several modular components work in conjunction with the BBN, and all customers' call records are kept in the master database. A network service uses a common master database to validate every phone number; for instance, 021-7879320 is a valid South African Telkom phone number but 2348035719937 is not. An agent is required to read call records from the user profile history (UPH) and the current user profile (CUP), transform the data and present it to the BBN in an acceptable form. For instance, from table 3.1 the agent derives informative values such as weekend or weekday for call days, and morning, afternoon or off-peak for call times. All of these constitute a transformed database that serves as a training set for the BBN. The new call records in the associated database (CUP) are then predicted to be legal or illegal phone calls. The BBN model is the principal computational intelligence component that monitors the profiles, and an agent uses the results of these monitors to determine whether a call is legal. In principle, legal calls are continually added to the user behaviours (UPH) to retrain the model.

Figure 3.1: System Model for Telecommunication Fraud Detection (components shown: master database, CUP and UPH associated databases, data transformation, training set, BBN, mathematical model of population distribution; outputs: legal calls, illegal calls, removed)

3.2.1

Classification of Users' Accounts

In the context of this essay (real-time fraud detection), the UPH contains all calls made by a user from the day he or she joined the network up to the day before the CUP date, while the CUP contains only the current day's calls. This allocation of days may change depending on the model designer or the operation to be performed. Individual accounts need to be maintained because the calling behaviour of legitimate subscribers and fraudsters varies: some users often make international calls, while others make them only once in a while. Thus, the behaviour of a phone user can be derived from the information in the UPH and used to classify the calls in the CUP as fraudulent or legitimate. A sample UPH is shown in table 3.1, while table 3.2 shows a CUP file, in which all calls share the same date and day.

Table 3.2: A sample of raw CUP records from a network carrier

Caller Number  Date      Day  Time   Duration  Dest. Number  Dest. Location
021-7879320    04/04/05  Wed  04:14  4 mins    021-5298100   Muizenberg
021-7879320    04/04/05  Wed  19:30  1 hr      27832775214   Steenberg 3
021-7879320    ...       ...  ...    ...       ...           ...

Table 3.3: A derived training set for the BBN model of a network carrier

Duration  Call Time  Call Type  Call Day  Dest-Num  Dest-Loc
min10     afternoon  business   weekday   A         local1
secs      off-peak   personal   weekend   B         local2
...       ...        ...        ...       ...       ...
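As a rough illustration of the agent's role, the Python sketch below splits raw records such as those of tables 3.1 and 3.2 into a UPH and a CUP by date, and derives the call-time and call-day attributes using the rules stated in subsection 3.2.2 (morning 08:01 to 15:00, afternoon 15:01 to 19:59, off-peak otherwise; Monday to Thursday is a weekday, Friday to Sunday a weekend). The dd/mm/yy date format, the dictionary keys and the function names are assumptions made for this example, and the duration and destination attributes are left out.

```python
from datetime import date, datetime

WEEKDAYS = {"Mon", "Tue", "Wed", "Thu"}   # Friday to Sunday count as weekend


def call_time(hhmm: str) -> str:
    """Map a clock time 'HH:MM' to morning / afternoon / off-peak."""
    hours, minutes = map(int, hhmm.split(":"))
    m = hours * 60 + minutes
    if 8 * 60 + 1 <= m <= 15 * 60:           # 08:01-15:00
        return "morning"
    if 15 * 60 + 1 <= m <= 19 * 60 + 59:     # 15:01-19:59
        return "afternoon"
    return "off-peak"                        # 20:00-08:00


def call_day(day: str) -> str:
    return "weekday" if day in WEEKDAYS else "weekend"


def parse_date(text: str) -> date:
    # Assumed dd/mm/yy format.
    return datetime.strptime(text, "%d/%m/%y").date()


def split_profiles(records: list[dict], cup_date: date) -> tuple[list[dict], list[dict]]:
    """UPH = calls before cup_date; CUP = calls made on cup_date."""
    uph = [r for r in records if parse_date(r["date"]) < cup_date]
    cup = [r for r in records if parse_date(r["date"]) == cup_date]
    return uph, cup


def derive(record: dict) -> dict:
    """Keep only derived, informative attributes (the caller number is dropped)."""
    return {"call_time": call_time(record["time"]), "call_day": call_day(record["day"])}


if __name__ == "__main__":
    raw = [  # date, day and time fields copied from tables 3.1 and 3.2
        {"date": "01/02/05", "day": "Mon", "time": "01:06"},
        {"date": "10/03/05", "day": "Sat", "time": "17:00"},
        {"date": "04/04/05", "day": "Wed", "time": "19:30"},
    ]
    uph, cup = split_profiles(raw, parse_date("04/04/05"))
    print([derive(r) for r in uph])
    print([derive(r) for r in cup])
```

The call day is taken from the recorded Day field rather than recomputed from the date, so the sketch does not depend on how the dates are interpreted beyond their ordering.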

3.2.2

Transformation of Call Data

An agent examines the raw UPH data in table 3.1 and creates informative statistical data in a form that can be used to train Bayesian networks. For instance, the UPH can be transformed into table 3.3 as a training dataset. The caller numbers are the land-line phone numbers, but they are not part of the training data because they contribute no information to the model's knowledge; this is described further in subsection 3.3.1. The call times are grouped into time periods as follows: [1] morning if the call is made between 08:01 GMT and 15:00 GMT, [2] afternoon if the call is made between 15:01 GMT and 19:59 GMT, and [3] off-peak if the call is made between 20:00 GMT and 08:00 GMT. The call day aggregates days such that [1] Mondays to Thursdays are represented as weekday and [2] Fridays to Sundays are represented as weekend. Destination numbers (Dest-Num) are uniquely represented as A, B, C, ..., and New. The destination location (Dest-Loc) is local1 if the area called is within the province, local2 if the area called is outside the province but within the country, and inter if the area called is outside the country. Duration is represented as: hr if call >= 1 hour; mins10 if call > 10 minutes but call < 1 hr; min10 if call … = modal node state in Col1, then move to Step-6, else move to Step-16.

Step-6: Compare each node state in Col1 with the modal node state in Colj, find the best match and compute the maximum occurrences.
Step-7: Let m1 be the Col1 state with the maximum occurrences for the best match with the modal state of Colj, and let mj be the modal state of Colj.
Step-8: Compute the conditional probability

$$\Pr(Col_1 = m_1 \mid Col_j = m_j) = \frac{\Pr(Col_1 = m_1, Col_j = m_j)}{\Pr(Col_j = m_j)} = \frac{N(\text{instances of both } m_1 \text{ and } m_j \text{ in the training table})}{N(\text{instances of } m_j \text{ in } Col_j)}$$

Step-9: The numerical result of Step-8 gives the probabilistic relationship between the training parameters of Col1 and Colj.
Step-10: Repeat the process between Col1 and Colj+1.
Step-11: Compare and find the maximum probabilistic relation value between Col1 and the other columns.
Step-12: If the maximum probabilistic relation is between Col1 and Col5, then Col1 depends on Col5. For instance, in figure 3.2, Duration depends on Dest-Num.
Step-13: Draw a directed arc from Col5 to Col1.
Step-14: Repeat Step-2 through Step-13, fixing Col2 through Coln in turn.
Step-15: Move to Step-18.
Step-16: Swap the fixed Col and Colj.

3.3. BEHAVIOURAL BAYESIAN NETWORK (BBN) MODEL

Step-17: Move to Step-6.
Step-18: The qualitative Bayesian network is complete.

Thus, this unsupervised algorithm is used to build the BBN model in figure 3.2. For a complete model, the quantitative learning is described below. The quantitative knowledge is the conditional probability table (CPT) associated with each node of the BBN. The CPTs are computed from the instances of the records in table 3.3 using the following factorisation derived from the model in figure 3.2:

$$\Pr(T, CD, DN, DL, DU, CT) = \Pr(T)\,\Pr(CD)\,\Pr(DN \mid T, CD)\,\Pr(DL \mid DN)\,\Pr(DU \mid DN)\,\Pr(CT \mid DU, T) \tag{3.1}$$

where T = Call Time, CD = Call Day, DN = Destination Number, DL = Destination Location, DU = Duration, and CT = Call Type. The quantitative knowledge of these training parameters is computed using all the node states. For example, the knowledge of call time is calculated as the marginal probabilities:

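$$
\begin{aligned}
\Pr(T = \text{morning}) &= \frac{N(\text{instances of morning})}{\text{total number of records}} \\
\Pr(T = \text{afternoon}) &= \frac{N(\text{instances of afternoon})}{\text{total number of records}} \\
\Pr(T = \text{off-peak}) &= \frac{N(\text{instances of off-peak})}{\text{total number of records}}
\end{aligned}
\tag{3.2}
$$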

These results, written in the form of a conditional probability table (CPT), are called quantitative knowledge. Similarly, the conditional probability of duration given a destination number is computed as

$$\Pr(DU = du \mid DN = dn) = \frac{\Pr(DU = du, DN = dn)}{\Pr(DN = dn)} = \frac{N(\text{instances of } DU = du,\ DN = dn)}{N(\text{instances of } DN = dn)}$$

for all values of du and dn.
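A minimal sketch of this counting, assuming the derived records are held as Python dictionaries keyed by the node abbreviations used above. The first two records follow the rows shown in table 3.3; the remaining ones are hypothetical, added only so that the counts are non-trivial. The same ratio of counts is what Step-8 of the structure-learning algorithm computes.

```python
from collections import Counter

# Derived records in the layout of table 3.3. The first two rows follow the table;
# the rest are hypothetical.
RECORDS = [
    {"DU": "min10", "T": "afternoon", "DN": "A"},
    {"DU": "secs", "T": "off-peak", "DN": "B"},
    {"DU": "secs", "T": "morning", "DN": "A"},
    {"DU": "min10", "T": "morning", "DN": "A"},
    {"DU": "hr", "T": "morning", "DN": "New"},
]


def marginal(records, var):
    """Pr(var = v) = N(instances of v) / total number of records (equation 3.2)."""
    counts = Counter(r[var] for r in records)
    return {v: n / len(records) for v, n in counts.items()}


def conditional(records, child, parent):
    """Pr(child = c | parent = p) = N(child = c, parent = p) / N(parent = p)."""
    parent_counts = Counter(r[parent] for r in records)
    joint_counts = Counter((r[parent], r[child]) for r in records)
    return {(p, c): n / parent_counts[p] for (p, c), n in joint_counts.items()}


if __name__ == "__main__":
    print(marginal(RECORDS, "T"))             # morning 0.6, afternoon 0.2, off-peak 0.2
    print(conditional(RECORDS, "DU", "DN"))   # e.g. Pr(DU = min10 | DN = A) = 2/3
```

Combinations that never occur in the records get no entry, i.e. probability zero. In practice a smoothing prior is often added so that a new destination does not receive zero belief, but the essay does not specify one.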

The other probabilities in equation 3.1 are computed in the same way. The numerical probability values obtained lie between 0 and 1. Every probability value is multiplied by 100 % to express it as a degree of belief, in keeping with the view of a Bayesian network as a belief network; for instance, a probability value of 0.97 corresponds to a 97 % degree of belief (confidence). In this essay, the number of states in a node determines its threshold value, which is the average point of belief. For instance, the three states in the call time node give a threshold point of

$$\frac{100\,\%}{3} = 33.333\,\%.$$

Thus, after table 3.3 is used to train the BBN model, figures 3.3 and 3.4 show examples of the quantitative knowledge (CPTs) associated with the nodes. In other words, this is the knowledge acquired from the call records about the behaviour of a phone user. Since learning is a continuous process, the model is also retrained so as to be adaptive.
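The threshold rule, and the way a degree of belief is judged against it, can be sketched as follows. The rule 100 % / (number of states) is the essay's; the helper names are hypothetical. The belief 44.291 % is the value reported for the duration node in figure 3.4, whose four states give a 25 % threshold.

```python
def threshold(num_states: int) -> float:
    """Average point of belief for a node with num_states states, in percent."""
    return 100.0 / num_states


def above_threshold(belief_percent: float, num_states: int) -> bool:
    """True if a degree of belief exceeds the node's threshold."""
    return belief_percent > threshold(num_states)


if __name__ == "__main__":
    print(round(threshold(3), 3))        # call time node (morning, afternoon, off-peak): 33.333
    print(threshold(4))                  # duration node (hr, mins10, min10, secs): 25.0
    print(above_threshold(44.291, 4))    # belief reported from figure 3.4: True
```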

Figure 3.3: Present Knowledge Associated to Call Time Node

Figure 3.4: Present Knowledge Associated to Duration Node

Figure 3.3 shows that this phone user makes more calls in the morning than in other periods, since the belief in the morning state exceeds the call time threshold of about 33 %. A threshold value is also an average point for making decisions. From figure 3.4, the threshold of the duration node is 25 %. Therefore, we are 44.291 % sure that this phone user mostly spends seconds on calls to destination A, and we strongly believe (63.462 %) that he or she spends less than 10 minutes on a call to a new phone number. The other nodes of the model acquire similar knowledge.

Suppose some of a telecommunication company's equipment malfunctions physically. This results in incomplete call data, so parameter estimation algorithms are used to estimate the missing values. Firstly, if the missing data are discrete, we replace them with the modal value of their field (column) in the database. Secondly, if they are continuous, we replace them with the average value of the field. We can also use the power of inference with the current Bayesian network. Inference is the reasoning technique discussed earlier, and examples are shown in subsection 3.3.3 and chapter 4.
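The two simple estimation rules (modal value for discrete fields, average value for continuous fields) might look as follows in Python. Missing entries are represented here as `None`, and the field names and sample values are hypothetical; Bayesian inference, the third option mentioned above, is not shown.

```python
from statistics import mean, mode


def impute(records: list[dict], field: str, discrete: bool) -> list[dict]:
    """Return new records with missing (None) values of `field` filled by mode or mean."""
    present = [r[field] for r in records if r[field] is not None]
    fill = mode(present) if discrete else mean(present)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


if __name__ == "__main__":
    rows = [
        {"call_day": "weekday", "duration_s": 50.0},
        {"call_day": None, "duration_s": 600.0},
        {"call_day": "weekday", "duration_s": None},
        {"call_day": "weekend", "duration_s": 40.0},
    ]
    rows = impute(rows, "call_day", discrete=True)      # fills 'weekday' (the mode)
    rows = impute(rows, "duration_s", discrete=False)   # fills 230.0 (the mean)
    print(rows)
```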

3.3.2

Call Patterns Recognition Technique

We believe that no two users have exactly the same behaviour, and users' call patterns clearly change over time. Suppose a phone user has consistent call patterns. Then, by Bayesian inference, we expect that, given sufficient evidence, the posterior probabilities computed from two different sets of his or her call records will move closer together, even if the prior probabilities assigned to the two sets differ.


We recognise call patterns by training the BBN model on two consecutive CUPs and comparing the characteristics observed in the call patterns. The call patterns of CUPs can be compared on a daily, weekly or monthly basis, and subjective probabilities are assigned to phone calls based on the call behaviour. In this essay, we recognise call patterns on a weekly basis by profiling 7 days of calls in the CUP. With the BBN, a knowledge discovery method is used to recognise the hidden information about call patterns that characterises the call times (morning, afternoon or off-peak) or call days (weekday or weekend) of the CUPs. This user profile monitoring method is suitable for recognising when a user's call patterns change, and it is used to analyse calling behaviour in order to detect usage anomalies. The method follows the steps below:

1 Compute the marginal probabilities of all states of the target variables, as in equation 3.2, on the two sets of data (CUPs).
2 Enter an assumed piece of evidence, such as the call type, in each case and observe the degrees of belief of the target variables using inference expressions such as:

$$
\begin{aligned}
\Pr(CD = \text{weekday} \mid CT = \text{per}) &= \frac{\sum_{t,du,dl,dn} \Pr(T = t, DU = du, DL = dl, DN = dn, CD = \text{weekday}, CT = \text{per})}{\sum_{t,du,dl,dn,cd} \Pr(T = t, DU = du, DL = dl, DN = dn, CD = cd, CT = \text{per})} \\
\Pr(DU = \text{secs} \mid CT = \text{bus}) &= \frac{\sum_{t,dl,dn,cd} \Pr(T = t, DU = \text{secs}, DL = dl, DN = dn, CD = cd, CT = \text{bus})}{\sum_{t,du,dl,dn,cd} \Pr(T = t, DU = du, DL = dl, DN = dn, CD = cd, CT = \text{bus})}
\end{aligned}
\tag{3.3}
$$

for all values of t, du, dl, dn and cd, where per = personal and bus = business.
3 Carry out the inference calculations for the consecutive CUPs, note the threshold values of the target variables, and compare the results by observing the changes in the degrees of belief.

This method with the BBN also complements the velocity and collision fraud traps, which will be discussed in my future work. When the call times and call days of consecutive CUPs have similar or close degrees of belief, the user's call behaviour is consistent. If the degrees of belief differ but no fraud is detected, then the call behaviour must have changed, perhaps due to: (1) changes in the user's income (increase or decrease), (2) a change of phone ownership, or (3) technical or personal problems. Examples are shown with the state transition analysis in chapter 4. To minimise losses to subscription fraud, it is proposed that call billing periods should be shortened. If a subscription fraud is reported, the call patterns of the customers are compared, and suspicious patterns are used to distinguish the legal callers from the fraudulent callers. If the call patterns are the same, then it is most likely that the accounts are owned by a single customer.
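Steps 2 and 3 can be sketched on derived records directly. For brevity, this Python sketch estimates a belief such as Pr(CD = weekday | CT = personal) from the empirical relative frequencies of each CUP instead of propagating evidence through the trained BBN of equation 3.1, and it flags a change when the belief moves by more than a tolerance. The records and the 10-percentage-point tolerance are hypothetical choices, not values from the essay.

```python
def belief(records: list[dict], target: str, value: str, evidence: dict) -> float:
    """Empirical Pr(target = value | evidence), in percent."""
    matching = [r for r in records if all(r[k] == v for k, v in evidence.items())]
    if not matching:
        raise ValueError("no records match the evidence")
    hits = sum(1 for r in matching if r[target] == value)
    return 100.0 * hits / len(matching)


def pattern_changed(cup1, cup2, target, value, evidence, tolerance=10.0) -> bool:
    """Step 3: compare the degrees of belief obtained from two consecutive CUPs."""
    b1 = belief(cup1, target, value, evidence)
    b2 = belief(cup2, target, value, evidence)
    return abs(b1 - b2) > tolerance


if __name__ == "__main__":
    week1 = [{"CT": "personal", "CD": "weekday"}] * 3 + [{"CT": "personal", "CD": "weekend"}]
    week2 = [{"CT": "personal", "CD": "weekday"}] + [{"CT": "personal", "CD": "weekend"}] * 3
    evidence = {"CT": "personal"}
    print(belief(week1, "CD", "weekday", evidence))   # 75.0
    print(belief(week2, "CD", "weekday", evidence))   # 25.0
    print(pattern_changed(week1, week2, "CD", "weekday", evidence))   # True
```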


Table 3.4: A test data set (CUP) for the BBN model of a network carrier

Duration  Call Time  Call Type  Call Day  Dest-Num  Dest-Loc
mins10    morning    personal   weekday   D         inter
secs      afternoon  business   weekend   A         local1
min10     morning    personal   weekday   A         local1
hr        off-peak   business   weekend   B         local2
...       ...        ...        ...       ...       ...

3.3.3

Fraud Detection Techniques with BBN

Using the evidence that a call has been made, Bayesian inference is used to predict whether a call record in the CUP is fraudulent. Evidence is a finding or observation obtained from a new call in the CUP. For instance, suppose we have the derived call records in the CUP database of table 3.4. What is the degree of belief that a call was made, given the call behaviour of the user in the UPH? An example of evidence is that destination number B was observed in a record of table 3.4. To predict whether a call truly comes from the owner, we enter enough fraud indicators (nodes) as evidence on the BBN to obtain the degrees of belief (posterior probabilities). This is achieved using Bayesian inference. In principle, we use primary, secondary and tertiary indicators to infer a call. A primary indicator is ordinarily used to infer a call on the BBN. A secondary indicator provides useful information in isolation but is not sufficient by itself. A tertiary indicator provides supporting information in conjunction with other indicators to predict whether a call was actually made by the owner. Using equation 3.1, an inference expression used to predict a call, with the destination number as the primary indicator, usually takes the form of equation 3.4:

$$\Pr(CT = \text{per} \mid DN = B) = \frac{\sum_{t,du,dl,cd} \Pr(T = t, DU = du, DL = dl, CD = cd, CT = \text{per}, DN = B)}{\sum_{t,du,dl,cd,ct} \Pr(T = t, DU = du, DL = dl, CD = cd, CT = ct, DN = B)} \tag{3.4}$$

where per = personal, for all possible values (states) of t, du, dl, ct and cd. The inference expression in equation 3.5 shows the use of all three indicators, with the destination number as the primary, duration as the secondary and call time as the tertiary indicator:

$$\Pr(CT = \text{per} \mid DN = B, DU = \text{hr}, T = m) = \frac{\sum_{dl,cd} \Pr(DL = dl, CD = cd, CT = \text{per}, DN = B, DU = \text{hr}, T = m)}{\sum_{ct,dl,cd} \Pr(CT = ct, DL = dl, CD = cd, DN = B, DU = \text{hr}, T = m)} \tag{3.5}$$

where per = personal and m = morning, for all possible values of ct, dl and cd. This shows that we are reasoning forward (inferring) from the UPH. Simulation examples are shown in chapter 4 using the BNA (Bayesian Network Administrator) of BaBe (Bayesian Behaviour Networks Agent Architecture). BaBe is probabilistic software used to simulate Bayesian models, and the BNA is the component of BaBe used to make Bayesian inferences.
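The posteriors in equations 3.4 and 3.5 are ratios of sums over the joint distribution of equation 3.1. The Python sketch below evaluates such ratios by brute-force enumeration. It is not BaBe or the BNA: the CPTs are estimated by counting over a handful of hypothetical derived records, with add-one (Laplace) smoothing, which the essay does not specify, so that unseen combinations keep a small non-zero probability.

```python
from collections import Counter
from itertools import product

DOMAINS = {
    "T": ["morning", "afternoon", "off-peak"],
    "CD": ["weekday", "weekend"],
    "DN": ["A", "B", "New"],
    "DL": ["local1", "local2", "inter"],
    "DU": ["hr", "mins10", "min10", "secs"],
    "CT": ["personal", "business"],
}
# Parents of each node, following equation 3.1.
PARENTS = {"T": [], "CD": [], "DN": ["T", "CD"], "DL": ["DN"], "DU": ["DN"], "CT": ["DU", "T"]}

# Hypothetical training records in the layout of table 3.3.
RECORDS = [
    dict(T="afternoon", CD="weekday", DN="A", DL="local1", DU="min10", CT="business"),
    dict(T="off-peak", CD="weekend", DN="B", DL="local2", DU="secs", CT="personal"),
    dict(T="morning", CD="weekday", DN="A", DL="local1", DU="secs", CT="business"),
    dict(T="morning", CD="weekday", DN="B", DL="local2", DU="hr", CT="personal"),
    dict(T="off-peak", CD="weekend", DN="B", DL="local2", DU="min10", CT="personal"),
]


def cpt(node):
    """Laplace-smoothed Pr(node | parents), estimated by counting records."""
    parents = PARENTS[node]
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in RECORDS)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in RECORDS)
    k = len(DOMAINS[node])
    return lambda value, pa: (joint[(pa, value)] + 1) / (parent_counts[pa] + k)


CPTS = {node: cpt(node) for node in DOMAINS}


def joint_probability(assignment):
    """Equation 3.1: product of each node's probability given its parents."""
    prob = 1.0
    for node, table in CPTS.items():
        pa = tuple(assignment[p] for p in PARENTS[node])
        prob *= table(assignment[node], pa)
    return prob


def query(target, evidence):
    """Pr(target | evidence) by summing the joint over all unobserved variables."""
    hidden = [v for v in DOMAINS if v != target and v not in evidence]
    scores = {}
    for value in DOMAINS[target]:
        total = 0.0
        for combo in product(*(DOMAINS[h] for h in hidden)):
            assignment = {**evidence, target: value, **dict(zip(hidden, combo))}
            total += joint_probability(assignment)
        scores[value] = total
    norm = sum(scores.values())   # the denominator of equations 3.4 and 3.5
    return {value: s / norm for value, s in scores.items()}


if __name__ == "__main__":
    # Equation 3.4: Pr(CT | DN = B); equation 3.5 adds DU = hr and T = morning.
    print(query("CT", {"DN": "B"}))
    print(query("CT", {"DN": "B", "DU": "hr", "T": "morning"}))
```

Enumeration is exponential in the number of hidden variables, which is acceptable for six nodes; larger networks need algorithms such as variable elimination or junction-tree propagation.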


3.3.4

Techniques of Performance Evaluation for The BBN Model

The performance of a model can be evaluated using a test set (in the CUP) that is ideally different from the training set (in the UPH). The evaluation techniques used in this essay are the global precision of the Bayesian network and the confusion matrix (occurrence matrix).

GLOBAL PRECISION
This total precision, which directly evaluates a modelled network, is useful but can be too general in some cases. It is computed as

$$\text{Total precision} = \frac{\text{Number of correct predictions}}{\text{Total number of cases}} \times 100\,\%$$

THE CONFUSION MATRIX
The confusion matrix gives more precise feedback on the performance of a model. It is organised as a table in which each column totals the actual values of a parameter (e.g. call time) found in the file, while each row gives the number of predictions. The main confusion matrix we consider is the occurrence matrix, in which the predictions of a model are placed on the rows and the actual values found in the database are placed on the columns; each cell holds the number of cases with that combination of predicted and actual values of the target variable. These techniques are used in chapter 4 to evaluate the behavioural Bayesian network (BBN) model.
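Both measures can be computed from paired lists of predicted and actual states. The Python sketch below uses hypothetical call-time predictions for five test records and follows the layout described above, with predictions on the rows and actual values on the columns.

```python
from collections import Counter


def occurrence_matrix(predicted, actual, states):
    """Rows = predicted state, columns = actual state, cells = number of cases."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in states] for p in states]


def global_precision(predicted, actual):
    """Total precision = correct predictions / total cases * 100 %."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    states = ["morning", "afternoon", "off-peak"]
    actual = ["morning", "morning", "afternoon", "off-peak", "off-peak"]
    predicted = ["morning", "afternoon", "afternoon", "off-peak", "morning"]
    for state, row in zip(states, occurrence_matrix(predicted, actual, states)):
        print(state, row)
    print(global_precision(predicted, actual))   # 60.0
```

The diagonal of the occurrence matrix holds the correct predictions, so global precision equals the diagonal sum divided by the total number of cases; the off-diagonal cells show which states are confused, which is the extra detail the global figure hides.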

3.4

Mathematical Modelling of Telephone Customer Population

The mathematical model given below is used by a telephone service provider (TSP) to study the population distribution of legal customers (L), opportunistic customers (I) and customers removed from the network (R). These classes of people are defined below:

L = the legal phone users who make legitimate calls.
I = the opportunistic or illegitimate customers who make fraudulent calls at the expense of the legal customers' accounts. This class includes both active fraudsters and legal customers who collaborate with fraudsters for fraudulent services. For the purposes of this model, we treat these as equivalent and do not consider intent.
R = the illegal subscribers who stop making fraudulent calls because they are uncomfortable with the Bayesian network model used by the TSP to trap their fraudulent calls. We assume that no former legal user who has begun to collaborate with fraudulent operators will return to legal use of the network; thus, former legal users who have stopped using fraudulent services end up in this class.
N = the total number of subscribers using a network.

Mathematically, we have

$$L + I + R = N \tag{3.6}$$

These parameters form the mathematical model, which is described by the three compartments in figure 3.5. First, consider the assumptions and definitions of the model.

Assumptions of the Model
1 After a certain amount of fraud, most legal users start to collaborate with the illegal customers for fraudulent services.
2 Using fraudulent services is equivalent to defrauding the network.
3 Once a customer leaves the legal class to collaborate with fraudsters, he or she never comes back.
4 The classical model considered is a closed system. A model is said to be closed when the population is assumed fixed: no new customers, no immigration or emigration, and no births.

The following are also defined:
γ = the rate at which opportunistic customers stop fraudulent activities as their calls are regularly trapped by the Bayesian network model used by the TSP.
Removal (γI) = the number of opportunistic customers who stop fraudulent activities as their illegal calls are regularly trapped by the Bayesian network model of the TSP.
I/N = the probability that a legal customer becomes a victim of the fraudulent calls of a fraudster.

β = the number of contacts per unit time that are sufficient for a legal customer's account to be defrauded.
βI/N = the number of opportunistic customers per unit time that defraud a single legal customer's account.
Incidence ((βI/N)L) = the total number of legal customers' accounts defrauded by opportunistic customers per unit time. By assumption 1, it also equals the rate at which legal customers move into the I class.


The legal class in figure 3.5 consists of the legal customers found in the database of a network. If three fraudulent calls are identified in the callers' records, it is assumed that there are three illegal customers somewhere defrauding the accounts of the legal users. It is also assumed that opportunistic callers move to the removed class as their calls are trapped by the TSP.

Figure 3.5: 3 Compartmental Model for Telephone Customers Population (L → I by incidence; I → R by removal)

From equation 3.6, let the proportions of the population be

$$\frac{L}{N} = l, \qquad \frac{I}{N} = i, \qquad \frac{R}{N} = r.$$

Since a network user must belong to one of these three classes, the proportions are normalised as

$$l + i + r = 1. \tag{3.7}$$

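Therefore, the mathematical model below describes the changes in the customer proportions:

$$
\begin{aligned}
\frac{dl}{dt} &= -\beta i l && \text{(i)} \\
\frac{di}{dt} &= \beta i l - \gamma i && \text{(ii)} \\
\frac{dr}{dt} &= \gamma i && \text{(iii)}
\end{aligned}
$$

where β is the number of contacts per unit time sufficient for a legal customer's account to be defrauded and γ is the rate at which opportunistic customers stop fraudulent activities.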

In this essay, we are interested in the long-term behaviour of the model and the computational aspects of these parameters. An OCTAVE program using a numerical scheme for solving differential equations is used to simulate the model. Recall that the TSP uses this mathematical model as a control strategy to help monitor fraudulent activities in the network.
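The essay's simulation uses an OCTAVE program; as an illustration, the Python sketch below integrates equations (i) to (iii) with a classical fourth-order Runge-Kutta scheme. The values of β and γ, the initial proportions and the step size are hypothetical, chosen only to show the qualitative behaviour, and are not the values used in the essay.

```python
def derivatives(state, beta, gamma):
    """Right-hand sides of equations (i)-(iii) for the proportions (l, i, r)."""
    l, i, _ = state
    incidence = beta * i * l
    removal = gamma * i
    return (-incidence, incidence - removal, removal)


def rk4_step(state, dt, beta, gamma):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = derivatives(state, beta, gamma)
    k2 = derivatives(shifted(k1, dt / 2), beta, gamma)
    k3 = derivatives(shifted(k2, dt / 2), beta, gamma)
    k4 = derivatives(shifted(k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, beta, gamma, dt=0.1, steps=1000):
    """Advance (l, i, r) by `steps` RK4 steps of size dt."""
    for _ in range(steps):
        state = rk4_step(state, dt, beta, gamma)
    return state


if __name__ == "__main__":
    # Hypothetical parameters: beta = 0.5 contacts/day, gamma = 0.2 removals/day.
    l, i, r = simulate((0.99, 0.01, 0.0), beta=0.5, gamma=0.2)
    print(f"l={l:.3f}  i={i:.3f}  r={r:.3f}  sum={l + i + r:.3f}")
```

Because the three right-hand sides sum to zero, l + i + r stays at 1, matching equation 3.7; with these assumed parameters the legal proportion falls and the removed proportion grows until the opportunistic class dies out.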


Chapter 4

Simulation and Evaluation

Since the modelling phase was completed in chapter 3, this chapter is the validation phase, in which we put the BBN and the mathematical model into use. We also evaluate the performance of these models with the techniques discussed earlier.

4.1

Diagnosing and Predicting Phone Calls

In this experimental mode, we use the profile monitoring technique, which allows us to see the prior probabilities of the different states of the variables in the CUP (equation 3.2). As discussed earlier, we use Bayesian inference to predict phone calls. This allows us to assign values to variables with certainty (hard evidence) and also to assign likelihood degrees to the states (soft evidence). These observations (evidence) propagate probabilities through the nodes of the BBN so that the new information is taken into account. Predicting phone calls lets us check the consistency of our BBN by measuring the probabilistic interactions between the CUP parameters. This is similar to a what-if scenario, which predicts what happens when evidence is entered. From the phone records of table 3.4 for a given CUP, we infer records 1 and 2 from the UPH of table 3.3; their monitors (state transition analysis) are shown in figures 4.1 and 4.2 respectively. Observe that record 1 of table 3.4 contains a new destination phone number D, so we must make an inference from the user's call history (behaviour). What is the degree of belief that the legal phone user calls a new destination number D, given the three pieces of evidence observed as indicators in record 1? Statistically, we write:

$$\Pr(DN = \text{New} \mid DL = \text{inter}, T = \text{morning}, DU = \text{mins10}) = \frac{\sum_{ct,cd} \Pr(CD = cd, CT = ct, DN = \text{New}, DL = \text{inter}, T = \text{morning}, DU = \text{mins10})}{\sum_{ct,dn,cd} \Pr(CT = ct, DN = dn, CD = cd, DL = \text{inter}, T = \text{morning}, DU = \text{mins10})}$$

Figure 4.1 shows part of the results of this expression, simulated with the BNA of BaBe: compared with its threshold value, it is a suspicious call. We are 5.5 % sure that the call was made by the user. The monitors show that Dest-Num is the target variable, the primary indicator is Dest-Loc, the secondary indicator is Call Time and the tertiary indicator is Duration. These indicators are selected dynamically according to their degree of probabilistic relation with the target variable, as described in the EM algorithm.

Figure 4.1: Prediction of Call Record 1 of Table 3.4
Figure 4.2: Prediction of Call Record 2 of Table 3.4

Record 2 of table 3.4 contains an existing destination phone number A. From the user's call history (behaviour), we infer the degree of belief that the legal phone user calls the known destination number A, given the three pieces of evidence observed as indicators in record 2. Statistically, we write:

$$\Pr(DN = A \mid DL = \text{local1}, DU = \text{secs}, T = \text{afternoon}) = \frac{\sum_{ct,cd} \Pr(CD = cd, CT = ct, DN = A, DL = \text{local1}, DU = \text{secs}, T = \text{afternoon})}{\sum_{ct,dn,cd} \Pr(CT = ct, DN = dn, CD = cd, DL = \text{local1}, DU = \text{secs}, T = \text{afternoon})}$$


Table 4.1: Prior Probabilities of CUP1 and CUP2

| CUP | Call Type (%) | Call Time (%) | Duration (%) |
|---|---|---|---|
| CUP1 | bus (62.8), pers (37.1) | noon (31.1), morning (37.3), off (31.3) | hr (3.7), min10 (62.8), mins10 (6.6), secs (26.7) |
| CUP2 | bus (54.2), pers (45.7) | noon (34.3), morning (34.3), off (31.3) | hr (3.9), min10 (52.3), mins10 (16.1), secs (27.5) |

Table 4.2: Recognised Probability Patterns of CUP1 and CUP2

| CUP | Call Type (%) | Call Time (%) | Duration (%) |
|---|---|---|---|
| CUP1 | personal (100) | noon (8.4), morning (11.4), off (80.0) | hr (6.8), min10 (52.1), mins10 (15.1), secs (25.8) |
| CUP2 | personal (100) | noon (14.9), morning (18.4), off (66.6) | hr (5.9), min10 (37.5), mins10 (34.4), secs (22.0) |

Figure 4.2 also shows part of the results of this expression as simulated with the BNA of BaBe: compared with its threshold value, the call is legal. We are 67.5 % confident that the call was made by the user. The monitors show the state transition analysis of all variables.
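Both conditional probabilities above are obtained by summing the joint distribution over every variable that is neither the target nor observed. A minimal Python sketch of that enumeration follows. It is not the BNA implementation: for illustration it builds a full joint table from a handful of hypothetical call records (a real BBN factorises the joint into the CPTs learned from the CUP), it leaves out CD, and the helper `posterior` and the records are made-up names and data.

```python
from collections import Counter


def posterior(joint, variables, target, evidence):
    """Pr(target | evidence) by enumeration over a full joint table.

    `joint` maps a tuple of states (ordered as `variables`) to a probability.
    Every variable that is neither the target nor observed (here CT) is
    summed out, as in the numerators and denominators above.
    """
    pos = {name: k for k, name in enumerate(variables)}
    unnormalised = {}
    for states, p in joint.items():
        if all(states[pos[v]] == s for v, s in evidence.items()):
            t = states[pos[target]]
            unnormalised[t] = unnormalised.get(t, 0.0) + p
    z = sum(unnormalised.values())  # the denominator, Pr(evidence)
    if z == 0:
        raise ValueError("evidence has zero probability")
    return {t: p / z for t, p in unnormalised.items()}


# Hypothetical call records (DN, DL, T, DU, CT); illustrative only.
VARIABLES = ("DN", "DL", "T", "DU", "CT")
records = [
    ("A", "local1", "afternoon", "secs", "bus"),
    ("A", "local1", "afternoon", "secs", "pers"),
    ("B", "local1", "afternoon", "secs", "pers"),
    ("A", "local1", "morning", "min10", "bus"),
    ("C", "inter", "morning", "mins10", "pers"),
    ("A", "local1", "afternoon", "secs", "bus"),
]
counts = Counter(records)
joint = {states: n / len(records) for states, n in counts.items()}

belief = posterior(joint, VARIABLES, "DN",
                   {"DL": "local1", "DU": "secs", "T": "afternoon"})
print(belief)
```

With record 2's evidence this toy table gives destination A a belief of about 0.75. A raw count like this would give a never-seen destination such as D zero belief, whereas the BBN in figure 4.1 still assigns the new destination a small belief (5.5 %).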

4.2 Pattern Recognition of Call Records

Following the pattern recognition procedure described in chapter three, two sets of CUP data, each of size 32, are used to train the BBN model. Their marginal probabilities are calculated with BNA as described in equation 3.2, and the results are shown in table 4.1. The marginal (prior) probabilities of the target variable's states differ between these weekly records, but they are not used to detect changes in call patterns; Bayesian inference is used to recognise patterns instead. Having confirmed that the records are fraud-free, we refer to equation 3.3. The recognised call patterns are shown in table 4.2, whose rows 1 and 2 correspond to the prior probabilities in table 4.1. These results are extracted from monitors such as the one in figure 4.1.

Row 1 of table 4.2 shows that personal calls are made in off-peak periods (80.0 %) and last either seconds (25.8 %) or less than 10 minutes (52.1 %). Row 2 shows that personal calls are made in off-peak periods (66.6 %) and last longer than seconds but less than an hour, with degrees of belief of 37.5 % (min10) and 34.4 % (mins10). Both records are fraud-free, so the changes between them may be due to the reasons given in section 3.3.2.
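The two rows of table 4.2 can also be compared mechanically. The Python sketch below is a reading aid, not the essay's BNA procedure: using only the table's numbers, it picks the most probable state of each indicator and lists the per-state change in belief between CUP1 and CUP2. The helper names `dominant_states` and `belief_changes` are hypothetical.

```python
# Table 4.2: recognised beliefs (%) for CUP1 and CUP2, given Call Type = personal
recognised = {
    "CUP1": {"Call Time": {"noon": 8.4, "morning": 11.4, "off": 80.0},
             "Duration": {"hr": 6.8, "min10": 52.1, "mins10": 15.1, "secs": 25.8}},
    "CUP2": {"Call Time": {"noon": 14.9, "morning": 18.4, "off": 66.6},
             "Duration": {"hr": 5.9, "min10": 37.5, "mins10": 34.4, "secs": 22.0}},
}


def dominant_states(profile):
    """Most probable state of each indicator variable in one recognised pattern."""
    return {var: max(beliefs, key=beliefs.get) for var, beliefs in profile.items()}


def belief_changes(before, after):
    """Per-state change in belief (percentage points) between two patterns."""
    return {(var, s): round(after[var][s] - before[var][s], 1)
            for var in before for s in before[var]}


for cup, profile in recognised.items():
    print(cup, dominant_states(profile))

changes = belief_changes(recognised["CUP1"], recognised["CUP2"])
print(max(changes.items(), key=lambda kv: abs(kv[1])))
```

Both weeks share the same dominant states (off-peak calls lasting less than 10 minutes); the largest shift is in the mins10 duration state, which rises by 19.3 percentage points.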

4.3 Performance Evaluation of BBN Model

In this essay, we use a network-targeted evaluation method, which uses an associated text file (CUP) to evaluate how well our BBN predicts the target variable (call type). Using the evaluation computations described in chapter three, table 4.3 gives the global precision and the occurrence matrix of the BBN.


Table 4.3: Results of the Precision and the Occurrence Matrix for Table 3.4 (total precision = 80.0 %)

| Predicted / Found | business (4) | personal (6) |
|---|---|---|
| business (4) | **3** | 1 |
| personal (6) | 1 | **5** |

4.3.1 The Global Precision and Occurrence Matrix

The BBN model gives the call records in table 3.4 a total precision of 80.0 %. There are 10 records in all, of which 2 illegal records were included intentionally. The 80.0 % shows that the model traps these 2 records: the 2 records it does not predict as expected are the illegal ones. The prediction uses Bayesian inference like the previous calculations, for example finding Pr(Call Type = ct | Duration = du, Call Time = t) for all values of ct, du and t. Table 4.3 is an occurrence (confusion) matrix in which the 4 business and 6 personal records found in the test file appear in the columns, while the predicted values appear in the rows. The diagonal values (in bold), 3 business and 5 personal call records, are correct, while 1 record of each type is predicted wrongly. The occurrence matrix is useful because each row total gives the number of records predicted as that type and each column total gives the number of records of that type found in the test file. Other methods for evaluating Bayesian models will be used in our future research.
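The figures in table 4.3 follow directly from the occurrence matrix. A short Python check, with rows as predicted and columns as found call types (the variable names are illustrative), recomputes the 80.0 % and the row and column totals.

```python
# Table 4.3: rows = predicted call type, columns = call type found in the test file
LABELS = ("business", "personal")
occurrence = [
    [3, 1],  # predicted business
    [1, 5],  # predicted personal
]

correct = sum(occurrence[k][k] for k in range(len(LABELS)))  # the diagonal
total = sum(sum(row) for row in occurrence)
total_precision = correct / total

predicted_totals = {lab: sum(row) for lab, row in zip(LABELS, occurrence)}
found_totals = {lab: sum(col) for lab, col in zip(LABELS, zip(*occurrence))}

print(f"total precision = {total_precision:.1%}")  # prints "total precision = 80.0%"
print("predicted per row:", predicted_totals)
print("found per column:", found_totals)
```

What the essay calls total precision is the share of correctly predicted records, which is often called accuracy; per-class precision would instead divide each diagonal entry by its row total (3/4 for business, 5/6 for personal).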

4.4 The Mathematical Model Simulation with Octave Program

Suppose an agent reports 20 legal customers from the callers database of a network at a given time of year, the Bayesian model traps 3 illegal calls, and initially no legal customers collaborate with the active fraudsters. It follows from equation 3.6 that the initial conditions of the parameters are L(0) = 20, I(0) = 3, R(0) = 0 and N = 23, and from equation 3.7 that 20/23 + 3/23 + 0/23 ≈ 0.87 + 0.13 + 0.00 = 1. Another agent analyses the callers database, estimates the average number of illegal calls made before a customer notices that his or her account is being defrauded, and reports 3 attempts per 30 days (table 4.4). Recalling assumption one, we estimate from the database the rate at which legal customers collaborate with active fraudsters as

$$ \frac{1}{t} = \frac{1}{30} \approx 0.033. $$

We also need to estimate from the database the rate at which fraudsters leave the illegal class I. If the Bayesian model is 100 % effective, then all illegal calls get trapped, which implies that the rate of making illegal calls equals the rate of getting caught. Suppose that once an agent records 2 illegal calls per 7 days in the callers database, the phone line gets blocked, which disrupts fraudulent activities. Consider the two cases below.


Table 4.4: Average Duration of Fraud-Noticeable Periods

| Phone number | Illegal calls per 30 days |
|---|---|
| 27720251590 | 3 |
| 27820251589 | 4 |
| 27731226263 | 2 |
| Average | 3 |

1. Estimate from the database that fraudsters are caught making 5 illegal calls before they leave the illegal class. This implies that the rate of leaving I equals the rate of getting caught making 5 illegal calls. Mathematically,

$$ \text{rate} = \frac{2 \text{ illegal calls}}{7 \text{ days}} = \frac{1 \text{ call}}{3.5 \text{ days}}, $$

so

$$ \text{rate of 5 calls} = \frac{5 \text{ illegal calls}}{17.5 \text{ days}} \approx \frac{5 \text{ illegal calls}}{18 \text{ days}}. $$

Recall the definition of R above. The rate of leaving I is therefore estimated from the database as 1/18 ≈ 0.056. The simulation is shown in figure 4.3. The population of the illegal class decreases quickly because the Bayesian model traps illegal calls with 100 % efficiency, leaving no loophole for the fraudsters, and the illegal class vanishes at about 150 days. This stabilises the legal and removed classes long before 1 year (365 days).

2. Suppose the fraudsters still leave after being caught making 5 illegal calls, but now assume that the Bayesian model is only 50 % effective. Then

$$ \text{rate of getting caught making an illegal call} = \tfrac{1}{2} \times \text{rate of making an illegal call}, $$

$$ \text{rate of getting caught making 5 illegal calls} = \text{rate of making } 5 \times 2 = 10 \text{ illegal calls}. $$

Therefore fraudsters make 10 illegal calls, get caught making 5 of them (once every 2 calls) and then leave class I. Recall that the rate of making 1 call is 1 call per 3.5 days. Thus

$$ \text{rate of making 10 calls} = \frac{10 \text{ calls}}{35 \text{ days}}, $$

and the rate of leaving I is estimated as 1/35 ≈ 0.029. The simulation is shown in figure 4.4.
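The parameter arithmetic in this section (the initial fractions, the table 4.4 average and the two leaving rates) can be checked with a few lines of Python. This is a sketch: the helper `leaving_rate` and its `trap_efficiency` argument are hypothetical names that encode the reasoning of cases 1 and 2, and days are rounded up to whole days, as the text rounds 17.5 to 18.

```python
import math

# Table 4.4: illegal calls per 30 days before the customer notices the fraud
illegal_calls = {"27720251590": 3, "27820251589": 4, "27731226263": 2}
average_calls = sum(illegal_calls.values()) / len(illegal_calls)

# Initial conditions (equations 3.6 and 3.7)
L0, I0, R0 = 20, 3, 0
N = L0 + I0 + R0
l0, i0, r0 = L0 / N, I0 / N, R0 / N   # fractions of N, summing to 1

# Collaboration rate from the 30-day fraud-noticeable period
collaboration_rate = 1 / 30

# An agent records 2 illegal calls per 7 days: 3.5 days per illegal call
DAYS_PER_CALL = 7 / 2


def leaving_rate(catches_before_leaving, trap_efficiency):
    """Hypothetical helper: rate of leaving class I.

    Fraudsters quit after being caught `catches_before_leaving` times; a model
    that traps a fraction `trap_efficiency` of illegal calls means they make
    catches / efficiency calls first. Days are rounded up (17.5 -> 18).
    """
    calls_made = catches_before_leaving / trap_efficiency
    days = math.ceil(calls_made * DAYS_PER_CALL)
    return 1 / days


case1 = leaving_rate(5, 1.0)   # 100 % effective model
case2 = leaving_rate(5, 0.5)   # 50 % effective model
print(f"l0={l0:.2f} i0={i0:.2f} r0={r0:.2f}")   # prints l0=0.87 i0=0.13 r0=0.00
print(f"collaboration={collaboration_rate:.3f} case1={case1:.3f} case2={case2:.3f}")
```

The initial fractions come out as 20/23 ≈ 0.87 and 3/23 ≈ 0.13, and the leaving rates as 1/18 ≈ 0.056 and 1/35 ≈ 0.029.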


[Plot: time series of the distribution of telephone customers; fraction of subscribers in the legal l(t), illegal i(t) and removed r(t) classes against time (days), 0 to 400 days]

Figure 4.3: The Effect When Illegal Activities Stop After 5 Calls Are Trapped

[Plot: time series of the distribution of telephone customers; fraction of subscribers in the legal l(t), illegal i(t) and removed r(t) classes against time (days), 0 to 400 days]

Figure 4.4: The Effect When Illegal Activities Stop After 10 Calls Are Trapped

Figure 4.4 shows that the population of the illegal class decreases slowly because the trapping efficiency of the Bayesian model is only 50 %. Fraudsters can therefore make more illegal calls before being convinced to stop their illegal activities, so the illegal class does not vanish within a whole year (365 days). This reduces the number of legal customers and increases the population of the removed class by about half within a year. Taken together, these results show the long-term effects of fraudulent activities when the Bayesian network model is not used properly, and the graphs indicate when the use of the Bayesian model on the callers database should be increased.
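To reproduce the qualitative contrast between figures 4.3 and 4.4, the following Python sketch integrates the legal, illegal and removed fractions with forward Euler steps instead of the essay's Octave program. It assumes that equation 3.6 has the standard SIR form given in the docstring, with the collaboration rate 1/30 in the role of the transmission rate and 1/18 or 1/35 as the rate of leaving I; that form, the function name `simulate_lir` and the step size are assumptions rather than the essay's code.

```python
def simulate_lir(l0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of an assumed SIR-style legal/illegal/removed model.

    Assumed equations (standard SIR form, may differ from equation 3.6):
        dl/dt = -beta * l * i
        di/dt =  beta * l * i - gamma * i
        dr/dt =  gamma * i
    l, i, r are fractions of the N customers. Returns one (l, i, r) per day.
    """
    l, i, r = l0, i0, r0
    steps_per_day = round(1 / dt)
    history = [(l, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            recruited = beta * l * i   # legal customers drawn into fraud
            removed = gamma * i        # fraudsters leaving class I
            l -= dt * recruited
            i += dt * (recruited - removed)
            r += dt * removed
        history.append((l, i, r))
    return history


N = 20 + 3 + 0
start = (20 / N, 3 / N, 0 / N)   # l(0) ~ 0.87, i(0) ~ 0.13, r(0) = 0
beta = 1 / 30                    # collaboration rate

case1 = simulate_lir(*start, beta=beta, gamma=1 / 18, days=400)  # 100 % effective
case2 = simulate_lir(*start, beta=beta, gamma=1 / 35, days=400)  # 50 % effective

for day in (0, 150, 365):
    print(f"day {day:3d}: i(t) case 1 = {case1[day][1]:.4f}, case 2 = {case2[day][1]:.4f}")
```

Under these assumptions the illegal fraction in case 1 is already close to zero by day 150, while in case 2 it decays far more slowly, mirroring the difference between the two figures. A fixed step of 0.1 days keeps the Euler error small for rates of this size; an adaptive ODE solver would be the more robust choice.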


Chapter 5

Conclusion, Discussion and Results

Most of the numerical results of this essay are stated in chapter 4, where our BBN model has been shown to be efficient. As reported in subsection 4.3.1, 2 illegal records were intentionally put in the test data (CUP). The model traps the two illegal records (table 4.3), which makes the records in the text file 80 % reliable. The traditional data mining method requires voluminous records to detect fraud, whereas our BBN uses a daily detection technique. In our real-time fraud detection method, the threshold values discussed in chapter 3 were used as decision-theoretic points to classify calls as normal or suspicious. The BBN was modelled with usage monitors because fraud behaviour changes frequently, as sophisticated fraudsters can adapt to absolute detection methods. This makes the BBN adaptive and scalable in the fight against fraud. In addition, the results of the mathematical model in figures 4.3 and 4.4 show that efficient, intelligent Bayesian models are needed to trap fraudulent activities and so reduce the population of fraudsters.

5.1 Contribution to Knowledge

The main ideas this essay has dealt with are the following:

- It describes a technological countermeasure against most fraud types on fixed-line telephones, which many researchers have overlooked because the losses are smaller than for mobile phones.
- It spells out the details of superimposed and subscription fraud. A tentative solution to subscription fraud is to shorten the billing periods, especially when there is a dramatic change in call patterns.
- It presents a real-time fraud detection technique as an improvement over the traditional detection methods, which require voluminous data, and over the other techniques discussed in chapter 2.
- The BBN uses several fraud indicators, which give a level of confidence (belief) to any decision taken on a call record.


As a contribution to academic and industrial development, we designed probabilistic and mathematical models and showed how they are inter-related. Our system model, together with the BBN, demonstrates fraud detection system design and the computational capability of adaptive intelligence in pattern recognition. The dynamic monitoring of the UPH and the CUP was very helpful in building our immediate fraud detection system. We have also shown the inter-dependence of computing and telecommunication network establishments.

5.2 Recommendations

Since our model is an economical way of combating fraud, which will help to minimise revenue losses, we strongly recommend the use of BBNs by telecommunication network carriers. This model should be of interest not only to telecommunication companies but also to anyone interested in AI and its applications.

5.3 Future Work

We have simulated and demonstrated this model, but have not yet applied it to real-life data. Most fixed-line and related phone fraud models were addressed here; in our further research we will focus on mobile phone fraud detection. Furthermore, we can integrate velocity and overlap Bayesian fraud detection models with the BBN for a better fraud-catching rate. Eventually we would like to develop a fraud detection model for every class of fraud, with the models later working together as a network. A system model that adapts quickly to new fraud patterns will frustrate illegal users, to the benefit of the network carriers and legal phone customers. We will also integrate agents into the BaBe Agent Architecture to act upon the calling patterns from the BBN.


Acknowledgements

This is the most difficult aspect of this essay, because many people assisted me, but I must mention a few. I first give Almighty God the glory, for His mercies endureth for ever. I am grateful to the entire management (e.g. Prof. Neil Turok, Prof. Hahne Fritz), staff, lecturers and tutors of the African Institute for Mathematical Sciences (AIMS); all AIMS sponsors; the University of Stellenbosch, the University of Cape Town, the University of Cambridge and all other institutions for their concerted efforts in my capacity-building programme. I cannot but give kudos to Kate Marvel for her excellent contributions to this essay. More importantly, I express my profound appreciation to my supervisor, Dr Anet E. G. Potgieter of the Computer Science Department, University of Cape Town. She gave me constructive ideas and provided enough time for this work. I extend my appreciation to the entire management and staff of Complex Adaptive Systems Ltd. and the members of the AIM lab research group, UCT. I thank my lovely partner Cecilia Oluseyi Osunmakinde for her support, every member of my family, Prof. O. D. Makinde, distinguished friends, the Nigerians and my other AIMS colleagues for their indefatigable pieces of advice. I say thank you to the developers and maintenance staff of software programs such as LaTeX, gnuplot and BaBe. Thank you and God bless you all.

Isaac Olusegun Osunmakinde


Bibliography

[1] Sapa, Crime initiative ..... future, Cape Times, 14 April, p. 3, 2005.
[2] D. Adams, Thieves Answer Call for Mobile Phones, The Age, p. A3, 25 May 1996.
[3] D. Walters & W. Wilkinson, Wireless Fraud, Now and in the Future: A View of the Problems, Some Solutions, Mobile Phone News, 24 October, pp. 4-7, 1994.
[4] G. F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th edition, 2005.
[5] M. H. Cahill, D. Lambert, J. C. Pinheiro & D. Sun, Detecting Fraud in the Real World, Technical Report, Lucent Technologies, 2000.
[6] http://en.wikipedia.org/wiki/Artificial-intelligence.
[7] http://www.gsmworld.com/history/page15.htm & http://www.austel.gov.au/standards/brief.htm.
[8] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, 1988.
[9] K. Pequeno, Real Time Fraud Detection: Telecom's Next Big Step, Telecommunications (America Edition), Vol. 31, No. 5, pp. 59-60, 1997.
[10] N. Nilsson, Artificial Intelligence: A New Synthesis, first edition, Morgan Kaufmann Publishers, San Francisco, USA, 1998.
[11] O. Cellular, Cellular Fraud Facts, paper presented at the International Crime Stoppers Conference, Hawaii, September 1994. See also: http://www.wireless101.com/new/fpf/fraud.htm.
[12] P. Delaney, Investigating Telecommunications Fraud, Criminal and Civil Investigation Handbook, 2nd ed., McGraw-Hill Inc., New York, 1993.
[13] S. Rosset, U. Murad, E. Neumann, Y. Idan & G. Pinkas, Discovery of Fraud Rules for Telecommunications: Challenges and Solutions, in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 409-413, ACM Press, 1999.
[14] SAS Institute, Using Data Mining Techniques for Fraud Detection: A Best Practices Approach to Government Technology Solutions, White paper, http://www.sas.com, 1996.
[15] T. Brooks & M. Davis, Are Your Phone Bills Fraud Free?, Security Management, vol. 38, no. 4, pp. 67-8, 1994.
[16] T. Fawcett & F. Provost, Adaptive Fraud Detection, Data Mining and Knowledge Discovery 1,2, Kluwer Academic Publishers, Boston, Massachusetts, pp. 1-28, 1997.