
Computing & Information Sciences, Kansas State University

CIS 732 / 830: Machine Learning / Advanced Topics in AI

Lecture 26 of 42

Monday, 31 March 2008

William H. Hsu

Department of Computing and Information Sciences, KSU

KSOL course pages: http://snurl.com/1ydii / http://snipurl.com/1y5ih

Course web site: http://www.kddresearch.org/Courses/Spring-2008/CIS732

Instructor home page: http://www.cis.ksu.edu/~bhsu

Reading:

Today: Sections 7.9 – 7.11, 2.6, Han & Kamber 2e

Wednesday: 8.1 – 8.2, Han & Kamber 2e

Outlier Detection

Outlier Detection

Lian Duan

Management Sciences, UIOWA

What are outliers?

Hawkins outlier: an outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism.

A relative concept: it depends on the situation and on your angle. An example: suppose you are the US president. The common thing is to compare against history and against the majority.

Outlier Detection and Clustering

The two tasks are interwoven with each other: not all objects should belong to a certain cluster, and abnormal events might have temporal or spatial locality (e.g., body temperature).

Single-point outliers vs. cluster-based outliers.

Previous Work

DB(pct, dmin)-outlier [binary]: an object p is a DB(pct, dmin)-outlier if at least a fraction pct of the objects in D lie at distance greater than dmin from p.

Density-based local outlier [degree]: given LOFLB, the lowest acceptable bound on LOF, an object p in a dataset D is a density-based local outlier if LOF(p) > LOFLB.

Other statistical methods.
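To make the DB(pct, dmin)-outlier definition above concrete, here is a minimal sketch in plain NumPy (the function name and the toy data are illustrative, not from the original papers):

```python
import numpy as np

def is_db_outlier(p, D, pct, dmin):
    """DB(pct, dmin)-outlier test: True if at least a fraction `pct`
    of the objects in D lie at distance greater than `dmin` from p."""
    dists = np.linalg.norm(D - p, axis=1)   # distances from p to every object in D
    return np.mean(dists > dmin) >= pct

# Toy usage: a point far from a tight Gaussian blob is flagged as an outlier.
D = np.vstack([np.random.randn(100, 2), [[10.0, 10.0]]])
print(is_db_outlier(np.array([10.0, 10.0]), D, pct=0.95, dmin=3.0))   # True
```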

Local Outlier Factor

Local density: the inverse of the average distance from a point to its k nearest neighbors.

Local Outlier Factor (LOF): the ratio between the local density of p and the local densities of p's k-nearest neighbors. The LOF of each object depends on the density of the cluster relative to the object and on the distance between the object and the cluster.
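These simplified definitions translate almost directly into code. The sketch below follows the slide's simplified form (local density = inverse of the mean k-NN distance; LOF oriented so that isolated points get large values), not the full reachability-distance formulation of Breunig et al.; all names are mine:

```python
import numpy as np

def local_density_and_knn(X, k):
    """Local density of every point: inverse of the mean distance to its
    k nearest neighbors (dense pairwise distances, O(n^2) -- fine for a sketch)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbors
    avg_dist = np.take_along_axis(d, nn, axis=1).mean(axis=1)
    return 1.0 / avg_dist, nn

def lof(X, k):
    """LOF(p): mean local density of p's k nearest neighbors, divided by p's own."""
    lrd, nn = local_density_and_knn(X, k)
    return lrd[nn].mean(axis=1) / lrd

# Points deep inside the cluster get LOF close to 1; the isolated point gets LOF >> 1.
X = np.vstack([np.random.randn(50, 2), [[8.0, 8.0]]])
print(lof(X, k=5)[-1])
```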

Illustration of LOF

An example:

LOF-Outlier vs. DB(pct,dmin)-Outlier

LDBSCAN = DBSCAN + LOF

DBSCAN: retrieve all points which are density-reachable from a given core point, with parameters (MinPts, ε).

Problem: How many are many?
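For reference, a compressed sketch of the plain DBSCAN expansion the slide summarizes (illustrative, not the authors' code); the "how many are many?" question is exactly the choice of MinPts and ε below, which LDBSCAN replaces with LOF-based tests:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch: returns a cluster id per point, -1 for noise."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # pairwise distances
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1:
            continue
        seeds = list(np.where(d[i] <= eps)[0])
        if len(seeds) < min_pts:                # i is not a core point
            continue
        labels[i] = cluster
        while seeds:                            # expand everything density-reachable
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
                neighbors = np.where(d[j] <= eps)[0]
                if len(neighbors) >= min_pts:   # j is itself a core point
                    seeds.extend(neighbors)
        cluster += 1
    return labels
```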

LDBSCAN (continued)

A relative notion of core points and similarity. Core points: LOF < LOFUB. Similarity: p ∈ N_MinPts(q) and LRD(q)/(1+pct) < LRD(p) < LRD(q)·(1+pct).
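These two relative tests can be written down directly; a sketch assuming the LOF and LRD arrays and the MinPts-neighborhood lists `nn` have already been computed (for instance with the LOF sketch earlier), with helper names of my choosing:

```python
def is_core_point(i, lof, lof_ub):
    """LDBSCAN core-point test from the slide: LOF(i) < LOFUB."""
    return lof[i] < lof_ub

def directly_reachable(p, q, nn, lrd, pct):
    """p is directly reachable from core point q if p lies in q's
    MinPts-neighborhood and their local densities agree within a factor (1 + pct)."""
    return p in nn[q] and lrd[q] / (1.0 + pct) < lrd[p] < lrd[q] * (1.0 + pct)
```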

LDBSCAN (continued)

The same clustering idea as DBSCAN.

Parameters: LOFUB, pct.

LDBSCAN (continued)

Advantage

Density-based vs. partitioning clustering: density-based clustering handles small clusters, clusters of arbitrary shape, and noise.

Advantage (continued)

LDBSCAN vs. DBSCAN: it is easier to select proper parameters, and LDBSCAN handles local-density problems.

Advantage (continued)

LDBSCAN vs. OPTICS: comet-like clusters; hierarchical structure.

Performance

Experimental setup: Pentium IV 2.4 GHz, 512 MB memory, Red Hat 9.0, JDK 1.4.2.

Algorithm steps: search the k nearest neighbors, O(n²) (or O(n log n) with an index); calculate LRDs and LOFs, O(n); clustering, O(n).

Its computational complexity is therefore equal to that of LOF.

Experiment

Wisconsin Breast Cancer Data: after data preprocessing, the resultant dataset has 327 (57.8%) benign records and 239 (42.2%) malignant records with nine attributes.

LDBSCAN discovers two clusters and five single-point outliers. Cluster A contains 296 benign records and 6 malignant records; its average local density is 0.743. Cluster B contains 26 benign records and 233 malignant records; its average local density is 0.167. The five single-point outliers have LOFs that fall into the range from 3 to 5.

Experiment (continued)

Boston Housing Data: after data preprocessing, the resultant dataset has 506 records with 14 attributes.

Clusters (id, size, average local density): (1, 82, 0.556); (2, 345, 0.528); (3, 26, 0.477); (4, 34, 0.266); (5, 9, 0.228); (6, 6, 0.127); plus 4 single-point outliers.

Cluster 5 vs. cluster 6 (deviating from cluster 1): 24.514 (higher per-capita crime rate) vs. 20.005.

284th record (deviating from cluster 4): LRD = 0.155, LOF = 1.468; 2nd attribute: higher proportion of residential land zoned for lots; 3rd attribute: lower proportion of non-retail business acres per town.

Appendix: Cluster-based Outliers

Definition 1 (Upper bound of the cluster-based outlier, UBCBO): Let C1, ..., Ck be the clusters of the database D discovered by LDBSCAN, ordered so that |C1| ≥ |C2| ≥ … ≥ |Ck|. Given a parameter α, the number of objects in the cluster Ci is the UBCBO if (|C1| + |C2| + … + |Ci-1|) ≥ |D|·α and (|C1| + |C2| + … + |Ci-2|) < |D|·α.

Definition 2 (Cluster-based outlier): Let C1, ..., Ck be the clusters of the database D discovered by LDBSCAN. Cluster-based outliers are the clusters whose number of objects is no more than the UBCBO.

Definition 3 (Cluster-based outlier factor): Let C1 be a cluster-based outlier and C2 be the nearest non-outlier cluster of C1. The cluster-based outlier factor of C1 is defined as

$$\mathrm{CBOF}(C_1) = |C_1| \cdot \mathrm{dist}(C_1, C_2) \cdot \frac{\sum_{p_i \in C_2} \mathrm{lrd}(p_i)}{|C_2|}$$
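Definition 3 is a one-liner once the cluster sizes, the cluster-to-cluster distance, and the lrd values of the nearest non-outlier cluster are available; a hedged sketch (the names and the toy numbers are placeholders, not from the paper):

```python
import numpy as np

def cbof(size_c1, dist_c1_c2, lrd_c2):
    """Cluster-based outlier factor of C1: |C1| times the distance between C1 and
    its nearest non-outlier cluster C2, times the average lrd of the points in C2."""
    return size_c1 * dist_c1_c2 * np.mean(lrd_c2)

# E.g. a 6-point outlier cluster at distance 2.5 from its nearest large cluster,
# whose members have lrd values around 0.5:
print(cbof(6, 2.5, [0.48, 0.51, 0.50, 0.52]))
```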

Experiment (continued)

Abnormal network throughput detection: network throughput has characteristics consistent with self-similarity.

Monitoring 300 nodes every 5 minutes yields 3,600 measurements per hour.

Single-point vs. cluster-based outliers: 30 vs. 3 alerts per hour; occasional fluctuations vs. abnormal events lasting over a period of time.

Conclusion

Outlier detection and clustering improve each other's accuracy. Cluster-based outlier detection is more meaningful.

ADVERTISING: LDBSCAN is good at both outlier detection and clustering. It finds clusters with arbitrary shape and different local density, detects both single-point outliers and cluster-based outliers, and reports a degree of outlierness.

Outlier Detection for High Dimensional Data

Chilly (Ruohan) Wu

2/26/2006

Basic Information of the Paper

Charu C. Aggarwal & Philip S. Yu, IBM T. J. Watson Research Center. ACM SIGMOD 2001, May 21-24, Santa Barbara, California, USA.

What's an outlier?

Outlier: a data point which is very different from the rest of the data based on some measure.

Hawkins' definition (generally accepted, formal): an outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism.

Outliers often contain useful information about abnormal behavior of the system described by the data.

Applications: credit card fraud, network intrusion detection, financial applications, and marketing.

Existing methods of outlier detection

Distribution-based methods: data points are modeled using a stochastic distribution, and outliers are observations which deviate from the given distribution.

Distance-based methods: define outliers by using the full-dimensional distances of the points from one another.

Existing methods of outlier detection (cont)

Clustering-based methods: outliers are a side-product of clustering; outliers are points which do not lie in any cluster.

Density-based methods: based on the densities of local neighborhoods.

But these methods do not work quite as well when the dimensionality is high.

What is special in high dimensional space?

There are domains in which the data can have hundreds of dimensions. It is very difficult, and inaccurate, to estimate the multidimensional distribution of the data points.

The data is sparse in high dimensionality, so the concept of locality becomes difficult to define.

The actual distances between any pair of points become similar in high-dimensional space, so it is difficult to find outliers based on distance and meaningful clusters cannot be found: every point is an almost equally good outlier, and the notion of proximity fails to retain its meaningfulness.

Example: several 2-dimensional cross-sections of a high-dimensional data set

Desiderata for High Dimensional Outlier Detection Algorithms

Handle the sparsity problems of high dimensionality effectively.

Provide interpretability in terms of the reasoning which creates the abnormality.

Proper measures must be identified in order to account for the physical significance of the definition of an outlier in a k-dimensional subspace.

The algorithms should continue to be computationally efficient for very high dimensional problems.

The algorithms should provide importance to the local data behavior while determining whether a point is an outlier.

A distance-based threshold for an outlier in a k-dimensional subspace is not directly comparable to one in a (k + 1)-dimensional subspace.

Algorithms should be devised which avoid a combinatorial exploration of the search space.

Defining Outliers in Lower Dimensional Projections

The essential idea behind this technique: define outliers by examining those projections of the data which have abnormally low density.

Defining abnormal lower dimensional projections: an abnormal lower dimensional projection is one in which the density of the data is exceptionally lower than average.

Sparsity Coefficients

Each attribute of the data is divided into φ equi-depth ranges; thus, each range contains a fraction f = 1/φ of the records.

Assume there are N points in the database and, for the purpose of this calculation, that they are uniformly distributed. The probability of any given point falling in a particular k-dimensional cube is $f^k$, so the expected number and standard deviation of the points in a k-dimensional cube are $N \cdot f^k$ and $\sqrt{N \cdot f^k (1 - f^k)}$.

Let n(D) be the number of points in a k-dimensional cube D. The sparsity coefficient S(D) of cube D is

$$S(D) = \frac{n(D) - N \cdot f^k}{\sqrt{N \cdot f^k \cdot (1 - f^k)}}$$
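A minimal sketch of this computation (variable names are mine); a cube holding far fewer points than the N·f^k expected under uniformity gets a strongly negative coefficient:

```python
import math

def sparsity_coefficient(n_D, N, k, phi):
    """S(D) = (n(D) - N*f^k) / sqrt(N*f^k*(1 - f^k)), with f = 1/phi."""
    f_k = (1.0 / phi) ** k
    expected = N * f_k
    std = math.sqrt(N * f_k * (1.0 - f_k))
    return (n_D - expected) / std

# 10,000 points, 10 equi-depth ranges per attribute, a 2-d cube holding 40 points:
# the expected count is 100, so the cube is abnormally sparse.
print(sparsity_coefficient(n_D=40, N=10_000, k=2, phi=10))   # about -6.0
```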

Sparsity Coefficients

Only sparsity coefficients which are negative indicate cubes in which the presence of the points is significantly lower than expected.

In general, the uniform-distribution assumption is not true. However, the sparsity coefficient still provides an intuitive idea of the level of significance for a given projection.

$$S(D) = \frac{n(D) - N \cdot f^k}{\sqrt{N \cdot f^k \cdot (1 - f^k)}}$$

Brute-force Algorithm

d: the total dimensionality of the data.
k: the dimensionality of the projections used to determine outliers.
m: the number of projections to be determined.

The algorithm works by examining all possible sets of k-dimensional candidate projections (with corresponding grid ranges) and retaining the m projections which have the most negative sparsity coefficients.

Brute-force Algorithm

(Algorithm pseudocode figure; '<-' denotes concatenation.)
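A sketch of what such a brute-force search might look like, assuming equi-depth discretization into φ ranges per attribute as described earlier; the helper names and the bookkeeping are my own choices, and the cost is exponential in k, which is the point of the evolutionary alternative:

```python
import heapq
import itertools
import numpy as np

def equi_depth_bins(X, phi):
    """Map each attribute value to one of phi equi-depth ranges (0 .. phi-1)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    return (ranks * phi) // len(X)

def brute_force_projections(X, k, phi, m):
    """Return the m (sparsity coefficient, dimensions, ranges) triples with the
    most negative sparsity coefficients over all k-dimensional grid cubes."""
    N, d = X.shape
    B = equi_depth_bins(X, phi)
    f_k = (1.0 / phi) ** k
    denom = np.sqrt(N * f_k * (1 - f_k))
    scored = []
    for dims in itertools.combinations(range(d), k):
        for ranges in itertools.product(range(phi), repeat=k):
            n_D = np.sum(np.all(B[:, dims] == ranges, axis=1))
            scored.append(((n_D - N * f_k) / denom, dims, ranges))
    return heapq.nsmallest(m, scored, key=lambda t: t[0])

# Toy usage on 8-dimensional data, looking for the 5 sparsest 2-d cubes:
X = np.random.rand(2000, 8)
print(brute_force_projections(X, k=2, phi=5, m=5))
```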

An overview of Evolutionary Search

The fundamental underlying idea: in nature, resources are scarce, and this leads to competition among the species. Consequently, all the species undergo a selection mechanism in which only the fittest survive. The fitter individuals then tend to mate with each other more often, resulting in still better individuals. At the same time, nature occasionally throws in a variant through the process of mutation, so as to ensure a sufficient amount of diversity among the species, and hence also a greater scope for improvement.

An overview of Evolutionary Search

Each feasible solution to the problem is defined as an individual. The feasible solution takes the form of a string, which is the genetic representation of the individual.

The process of converting feasible solutions of the problem into string representations is called coding. For example, a possible coding for a feasible solution to the traveling salesman problem could be a string containing a sequence of numbers representing the order in which the salesman visits the cities.

An overview of Evolutionary Search

The genetic material at each locus on the string is referred to as a gene, and the possible values that the gene can take on are its alleles.

The fitness of an individual is evaluated by the fitness function, which takes as its argument the string representation of the individual and returns a value indicating its fitness: the better the objective function value, the better the fitness value.

An overview of Evolutionary Search

As the process of evolution progresses, the individuals in the population become more and more genetically similar to each other. This phenomenon is referred to as convergence.

De Jong defined convergence of a gene as the stage at which 95% of the population has the same value for that gene.

The population is said to have converged when all genes have converged.

The Evolutionary Outlier Detection Algorithm

The grid range for the i-th dimension can take any of the values 1 through φ, or it can take on the value * ("don't care"), for a total of φ + 1 possible values per position.

Example (4-d, φ = 10): a possible solution string is *3*9. The fitness of the corresponding solution may be computed using the sparsity coefficient discussed earlier.
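One plausible way to encode such a solution string and score it, reusing the sparsity-coefficient idea from the earlier slides (the STAR sentinel, the pre-binned matrix B, and the function name are my own conventions):

```python
import numpy as np

STAR = "*"   # the "don't care" value for a position

def fitness(solution, B, phi):
    """Sparsity coefficient of the cube selected by `solution`, a length-d list
    whose entries are either a grid range in 1..phi or STAR.  `B` holds each
    point's grid range (1..phi) for every attribute.  More negative = fitter."""
    N, _ = B.shape
    dims = [i for i, v in enumerate(solution) if v != STAR]
    k = len(dims)
    f_k = (1.0 / phi) ** k
    n_D = np.sum(np.all(B[:, dims] == [solution[i] for i in dims], axis=1))
    return (n_D - N * f_k) / np.sqrt(N * f_k * (1 - f_k))

# The slide's 4-dimensional example with phi = 10: *3*9 fixes dimensions 2 and 4.
B = np.random.randint(1, 11, size=(1000, 4))
print(fitness([STAR, 3, STAR, 9], B, phi=10))
```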

The Evolutionary Outlier Detection Algorithm

The algorithm started with a population of p random solutions and iteratively applied the processes of selection, crossover, and mutation in order to perform a combination of hill climbing, solution recombination, and random search over the space of possible projections.

The process was continued until the population converged; the De Jong convergence criterion was used to determine the termination condition.

At each stage of the algorithm, the m best projection solutions found so far (those with the most negative sparsity coefficients) were kept track of.

At the end of the algorithm, these solutions were reported as the best projections in the data.
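Schematically, the loop described on this slide might be wired up as below; `select`, `crossover`, `mutate`, `fitness`, and `converged` stand for the operators detailed on the following slides and are passed in as callables (this is a sketch of the control flow, not the authors' implementation):

```python
import heapq

def evolutionary_outlier_search(initial_population, fitness, select, crossover,
                                mutate, m, max_iters=1000, converged=None):
    """Evolve the population while remembering the m projection strings with
    the most negative sparsity coefficients seen so far."""
    population = list(initial_population)
    best = []                                           # (fitness, string) pairs
    for _ in range(max_iters):
        scored = [(fitness(s), tuple(s)) for s in population]
        best = heapq.nsmallest(m, set(best) | set(scored), key=lambda t: t[0])
        if converged is not None and converged(population):
            break
        parents = select(population, scored)            # rank-based roulette wheel
        children = crossover(parents)                   # k-preserving recombination
        population = [mutate(child) for child in children]
    return [s for _, s in best]
```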

The Outlier Detection Algorithm

(Algorithm figure; S denotes the population of solutions in any iteration.)

The Selection Criterion for the Genetic Algorithm

Roulette wheel mechanism: the probability of sampling a string from the population was proportional to p - r(i), where p is the total number of strings and r(i) is the rank of the i-th string.

The strings are ordered in such a way that the strings with the most negative sparsity coefficients occur first.

Thus the most abnormally sparse solutions are likely to have a greater number of copies.
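A sketch of that rank-based roulette wheel (NumPy; the function name is mine): strings are ranked so that the most negative sparsity coefficient gets rank 1, and rank r(i) is sampled with probability proportional to p - r(i):

```python
import numpy as np

def roulette_select(strings, fitnesses, rng=None):
    """Sample len(strings) parents with probability proportional to p - r(i)."""
    rng = rng or np.random.default_rng()
    p = len(strings)
    order = np.argsort(fitnesses)               # most negative fitness first
    ranks = np.empty(p, dtype=int)
    ranks[order] = np.arange(1, p + 1)          # r(i) in 1..p
    weights = p - ranks                         # the worst-ranked string gets weight 0
    probs = weights / weights.sum()
    picks = rng.choice(p, size=p, p=probs)
    return [strings[i] for i in picks]
```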

The Selection Criterion for the Genetic Algorithm

The Crossover Algorithm

Unbiased two-point crossover: determine a point in the string at random, called the crossover point, and exchange the segments to the right of this point.

For example, consider the strings 3*2*1 and 1*33*. If the crossover is performed after the third position, the two resulting strings are 3*23* and 1*3*1; if the crossover occurred after the fourth position, the two resulting children would be 3*231 and 1*3**.

In general, since the evolutionary algorithm only searches for projections of a given dimensionality in a run, this kind of crossover mechanism often creates infeasible solutions (the second example above yields children that no longer fix exactly k positions).
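The naive crossover just described, reproducing the slide's example; note that nothing constrains the children to keep exactly k fixed positions, which is the infeasibility problem the optimized crossover on the next slides addresses:

```python
import random

def naive_crossover(a, b, point=None):
    """Exchange the segments to the right of a (random) crossover point."""
    if point is None:
        point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

# The slide's example, crossing over after the third position:
print(naive_crossover("3*2*1", "1*33*", point=3))   # ('3*23*', '1*3*1')
```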

The Crossover Algorithm

It is desirable that the two children obtained after solution recombination also correspond to k-dimensional projections.

Classify the positions in the two parent strings:
Type I: both strings have a don't care.
Type II: neither string has a don't care. Assume there are k' <= k positions of this type.
Type III: exactly one string has a don't care. Each string has exactly k - k' such positions, and these positions are disjoint, so there are a total of 2(k - k') such positions.

Optimized Crossover

The aim is to create at least one child string from the two parent strings which is a fitter solution recombination than either parent, i.e., to find the best possible recombination from the two parents. There are a total of $2^{k'} \cdot \binom{2(k - k')}{k - k'}$ possibilities for the children.

Observe that k' is typically quite small. First, search the space of the $2^{k'}$ possibilities for the Type II positions for the best possible combination; then use a greedy algorithm to find a solution recombinant for the (k - k') Type III positions: always extend the string with the position which results in the most negative sparsity coefficient, and keep extending the string for an extra (k - k') positions until all k positions have been set (this gives the first child S).

The second child is created by always picking the positions from a different parent than the one from which the string S derives its positions.

The Crossover Algorithm

The Mutation Algorithm

Mutations are of two types.

Type 1 affects the positions which are *: let Q be the set of positions in the string which are *. Pick a position in the string which is not in Q and change it to *; at the same time, change a randomly picked position in Q to a number between 1 and φ.

Type 2 affects only the positions which are not *: the value of such a position is changed from one value between 1 and φ to another value between 1 and φ.

We perform mutations of Type 1 and Type 2 with probabilities P1 and P2 respectively; for the purposes of this implementation, P1 = P2.
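A sketch of the two mutation types under the same string encoding as before (STAR, φ, and the probabilities are as described above; the function name is mine):

```python
import random

STAR = "*"

def mutate(solution, phi, p1=0.1, p2=0.1):
    """Apply the slide's two mutation types to a list-encoded solution string.
    Type 1 (prob. p1): turn a fixed position into * and a * position into a
    value in 1..phi, so the projection stays k-dimensional.
    Type 2 (prob. p2): re-draw the value of a fixed position."""
    s = list(solution)
    if random.random() < p1:
        stars = [i for i, v in enumerate(s) if v == STAR]
        fixed = [i for i, v in enumerate(s) if v != STAR]
        if stars and fixed:
            s[random.choice(fixed)] = STAR
            s[random.choice(stars)] = random.randint(1, phi)
    if random.random() < p2:
        fixed = [i for i, v in enumerate(s) if v != STAR]
        if fixed:
            i = random.choice(fixed)
            s[i] = random.choice([v for v in range(1, phi + 1) if v != s[i]])
    return s

print(mutate([STAR, 3, STAR, 9], phi=10, p1=1.0, p2=1.0))
```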

The Mutation Algorithm

Choice of Projection Parameters

Choosing k and φ: each subcube represented by a k-dimensional projection contains an expected fraction 1/φ^k of the data. If k = 4, φ = 10, and N < 10,000, each cube will contain less than one point on average. φ should be picked high enough that there is a sufficient number of intervals on each dimension to correspond to a reasonable notion of locality.

We calculate the sparsity coefficient of a cube as above; an empty cube attains sparsity coefficient s when

$$k = \log_\phi\!\left(\frac{N}{s^2} + 1\right),$$

and a choice of sparsity coefficient s = -3 would result in a 99.9% level of significance that the given data cube contains fewer points than expected and is hence an abnormally sparse projection.

EMPIRICAL RESULTS

We tested the performance of the method using both the brute-force and the evolutionary technique

the brute-force technique required considerably more computational resources than the evolutionary search technique for high dimensional data sets.

In order to find k-dimensional projections of a d-dimensional problem, there are a total of $\binom{d}{k} \cdot \phi^k$ possibilities; d = 20, k = 4, φ = 10 results in 7×10^7 possibilities.

EMPIRICAL RESULTS