RESEARCH PROPOSAL

1. Title: Geographic Crime Linkage Analysis: A Spatio-Temporal Data Mining Approach

2. Research Topic: Topic 5. Geographic Crime Series Linkage Analysis

3. Principal Investigators:

PI: Shashi Shekhar, Professor, Email: [email protected]

Co-PI: Jaideep Srivastava, Professor, Email: [email protected]

Geographic Crime Linkage Analysis: A Spatio-Temporal Data Mining Approach

a. Abstract

Geographic crime linkage analysis focuses on identifying spatially grouped serial crimes and

criminals from a given set of crime reports and other related information provided by state and

local law enforcement agencies. Discovering and tracking spatial relationships from crime data is

an important problem in crime analysis e.g. identifying relationships that exist among crimes com-

mitted by the same offender (e.g. serial killer) or same group of offenders (e.g. organized crimes).

However, geographic crime linkage analysis is challenging for several reasons: (i) existence of

disparate sources of data in the form of incident reports, dispatch records and modus operandi

information, (ii) crime data might have spatially skewed distributions, (iii) size, volume and complexity

of data available to the law enforcement agencies is growing, (iv) high risk of generating spurious

patterns and (v) presence of large amounts of missing or imprecise data. Existing work in crime

analysis assumes a normal distribution of crime datasets and does not take micro-environmental

factors into account. Also, it focuses on manual discovery of spatially grouped crimes whose

results might be analyst-oriented and dependent on the underlying distribution of the data. Existing tools

in crime analysis, classical data mining and geographic profiling make use of a variety of techniques

which have these limitations. The focus of this proposal is to create and explore a novel spatio-

temporal data mining platform (STDMP) for geographic crime series linkage analysis (GCSLA),

which includes developing a spatio-temporal data mining framework that can address the spatio-

temporal nature of crime data, consider micro-environmental factors while generating hypotheses,

avoid assuming any specific distribution of the data and scale to large crime datasets. We would

validate our proposed approach with real datasets from law enforcement agencies and also deploy

our framework as components of existing crime analysis tools.

b. Table of Contents

a. Abstract

c. Research Plan or Main Body

1. Purpose, Goals and Objectives

2. Review of Relevant Literature

3. Research Design and Methods

4. Implications for Criminal Justice Policy and Practice

5. Management Plan

6. Dissemination Strategy

d. Appendices

c. Research Plan or Main Body

1. Purpose, Goals and Objectives

Geographic profiling is one of the most common techniques used by law enforcement and criminal

justice agencies to prioritize the search area for a serial offender’s home or other anchor

point. The first step in geographic profiling, often termed "Scenario Selection" or "Crime Series

Linkage Analysis", deals with the analysis of crime data for a connected series of crimes.

Formally, this problem is defined as follows: given crime data in the form of conviction records,

incident reports about specific crime instances, modus operandi information, as well as other data

common to state and local law enforcement agencies such as geography of different localities, spatial

footprints of building premises and locations of crime generators like bars, schools and

bus stops, geographic crime series linkage analysis identifies a set of spatially grouped crimes

or crimes with previously unknown spatial relationships that form a series, such that the result-

ing series are statistically significant. We propose to build algorithms and tools to identify sets of

spatially grouped crimes that are statistically linked.

Classical linkage analysis tools, also known as comparative case analysis tools, are useful for crime

analysts to identify and extract series of crimes. Their objective is to reduce manual labor in sifting

through numerous crime reports. However, growth in the variety and volume of observational data has

out-paced the ability of traditional crime analysts to identify, discover, and track spatially grouped

crimes and crime series. Incorporation of spatial features from the data in linkage analysis will

further eliminate false series and consequently improve the efficiency and effectiveness of law

enforcement in dealing with crimes.

Identifying and extracting spatially grouped crimes that are statistically linked is inherently hard

because (i) Disparate sources of information need to be combined to find the links between crimes.

(ii) Spatial as well as temporal characteristics of crime data need to be considered while identifying

the series. For instance crimes have a spatially skewed distribution and hence a series occurring

in a high crime neighborhood might be difficult to detect because so many other similar crimes

are occurring. (iii) Volume of data available to the law enforcement agencies in terms of dispatch

records, incident records, details about micro-environment and geography is growing manifold due

to increase in the number of crimes as well as improvement and standardization of techniques for

evidence gathering. (iv) Generation of spurious or unwanted patterns needs to be minimized. (v)

Data sets contain large amounts of unwanted or irrelevant data as well as missing data.

The state of the art in crime analysis, classical linkage analysis, classical data mining and spatial

statistics is limited in its ability to identify statistically meaningful and interesting geographic

crime series from large datasets. Specifically, there are three main limitations of the state of the

art, namely: a) traditional methods do not take micro-environmental factors into account while

performing linkage analysis, b) existing methods in crime analysis assume a normal distribution

of the data, whereas crime data can follow Poisson or Bernoulli distributions, and c) traditional

approaches such as scenario selection may not be scalable to large datasets.

The overall objective of this proposal is to discover a geographic crime series from spatio-

temporal crime datasets so as to aid in geographic profiling. The proposed approach to overcome

critical limitations of existing techniques relies on spatio-temporal data mining techniques. We

propose to incorporate the resulting algorithms and techniques of this research into software

toolkits to be used by crime analysts.

To address the various challenges in geographic crime series linkage analysis we propose a

spatio-temporal data mining platform (STDMP) for geographic crime series linkage analysis (GCSLA).

This approach seeks to build upon the existing CRISP-DM (Cross-Industry Standard Process for Data Mining)

for GCSLA. We propose to develop data mining algorithms that take into account the spatial and

temporal characteristics of crime data. The major goal of this research is to utilize spatial and tem-

poral characteristics as well as the micro-environmental factors such as locations of bars, schools,

spatial footprints of buildings and an offender’s awareness of space.

The proposed research has the potential to make a significant contribution in the area of geographic

crime linkage analysis by advancing methods for identifying, discovering, predicting, and tracking

spatial and spatio-temporal crime series. Even though the focus will be on analyzing crime, re-

sulting analytical tools and techniques are likely to benefit identifying, discovering, predicting, and

tracking many other kinds of spatio-temporal patterns in areas such as homeland security and epidemiology.

2. Review of Relevant Literature

Several disciplines have been involved in developing linkage analysis tools that could be applied

to the problem of identifying a geographic crime series. The state of the art related to this problem

can be divided into different approaches such as traditional crime analysis, strategic crime analysis

and tactical crime analysis. Crime analysts have mainly relied on spatial statistics and classical

data mining for selecting and refining their hypotheses for solving crime.

Traditional crime analysis is based on theories and concepts such as a) Routine activity theory

[12] that formalizes the three factors affecting crime namely a likely offender, a suitable target and

the absence of law enforcement, b) rational choice theory [13] which describes various constraints

evaluated by offenders to commit crime, c) crime pattern theory drawn from Environmental Criminology

[8], which integrates routine activity theory and rational choice theory within a geographic

framework and d) notions of crime attractors and crime generators such as restaurants, shopping

centres, parks, etc. [29, 7, 6].

Strategic crime analysis methods include hot spot analysis and extraction of space-time clusters.

These methods mainly focus on identifying a set of spatially grouped crimes, high crime areas or

hotspots to target crime reduction efforts.

Hot spot analysis methods [4, 14, 18, 24, 31, 34, 41, 28] discover high-density regions from point

datasets which show the actual locations of the crimes. They have traditionally been used by law enforcement

agencies to undertake policy measures such as optimal placement of resources and crime

reduction efforts. These methods focus on the discovery of the geometry (e.g. circle, ellipse, etc.)

of the high-density regions [14]. For example, the Spatial and Temporal Analysis of Crime (STAC)

module of CrimeStat [], nearest neighbor hierarchical clustering techniques, and K-means clustering

techniques are among the methods that use the ellipse method to identify hotspots [24]. Kernel

density estimation methods have been developed to identify isodensity hotspot surfaces because

hotspots may not have crisp ellipsoid boundaries. Local indicators of spatial association (LISA)

statistics were proposed to eliminate the limitations of ellipsoid-based and kernel-based estimation

techniques [4, 18]. The clumping method was proposed by Roach to discover clumped points (e.g.

hotspots) from a point dataset [34]. Roman [35] points out the need for extending hotspot analysis

techniques to aggregate crime data.
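
As an illustration of the kernel density approach mentioned above, the following minimal sketch estimates a crime density surface on a regular grid with a Gaussian kernel. The function name, the bandwidth value and the toy coordinates are assumptions made for illustration only; they are not taken from CrimeStat or from any dataset discussed in this proposal.

```python
import math

def kernel_density(points, grid_x, grid_y, bandwidth):
    """Return density[j][i] at (grid_x[i], grid_y[j]) using a Gaussian kernel."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    surface = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for (px, py) in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        surface.append(row)
    return surface

# Hypothetical incident coordinates (arbitrary units):
# a tight group near (1, 1) plus one isolated incident at (4, 4).
incidents = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (4.0, 4.0)]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
density = kernel_density(incidents, grid, grid, bandwidth=0.5)
```

The grid cell at (1, 1) receives the highest density because three incidents fall close to it. The surface has no crisp boundary, which is the property that motivates isodensity surfaces over ellipses.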

A major limitation of these hotspot analysis methods is that they can compute only statistically

significant groupings or clusters of crimes; they cannot discover unknown spatial

relationships between different types of crime events. For example, {Bars, Auto Theft} may be an

interesting spatial relationship. In the context of routine activity theory, Roman [35] highlights that

schools and bars are potential areas for crimes and hence they are locations which can have a high

activity of crime attractors and crime generators. The study of micro-environments and their effect

on crimes is thriving. Block et al. [7, 6] have presented a detailed analysis of spatial locations like

bars and transit stations. Given a large spatial-temporal dataset, hot spot analysis methods can only

identify high density areas of crime but cannot identify a strongly correlated set of crime types or a

strong correlation between a spatial feature such as a Bar and a crime type such as Auto theft. Ex-

isting hotspot analysis methods also do not incorporate temporal measures such as time of the day,

day of the week, etc. [35]. Hence, these traditional hotspot analysis techniques also cannot identify

a group of statistically linked crimes belonging to a series. Although SaTScan explicitly looks at

space and time hot spots, it does not incorporate other features of an area except population.

The Knox index [22, 23] and the near repeat calculator by Jerry Ratcliffe [30] explicitly look at space

and time, but do not identify specific hot spots.
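
To make the space-time interaction idea behind the Knox index concrete, the sketch below counts pairs of events that are close in both space and time. It is a simplified illustration, not the implementation used by any cited tool; the thresholds `max_dist` and `max_days` and the toy events are assumptions.

```python
import math
from itertools import combinations

def knox_statistic(events, max_dist, max_days):
    """Count event pairs that are close in both space and time.

    events: list of (x, y, day) tuples.
    """
    close_pairs = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        near_in_space = math.hypot(x1 - x2, y1 - y2) <= max_dist
        near_in_time = abs(t1 - t2) <= max_days
        if near_in_space and near_in_time:
            close_pairs += 1
    return close_pairs

# Hypothetical events: (x, y, day of occurrence).
events = [(0.0, 0.0, 1), (0.1, 0.0, 2), (0.0, 0.1, 30), (5.0, 5.0, 2), (5.1, 5.0, 3)]
knox = knox_statistic(events, max_dist=0.5, max_days=7)
```

Here two pairs are close in both dimensions. In practice the observed count is usually compared against counts obtained after randomly permuting the event times, which indicates whether the clustering is stronger than chance. Note that the statistic summarizes the whole dataset and does not say where the hot spots are.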

Tactical crime analysis focuses on building signatures of specific crime instances by making

use of information from crime incident reports, modus operandi information and a large number

of historical case reports. The state of the art in tactical analysis includes geographic profiling, MO

classification and manual linking of a large number of geographically distributed crimes by sifting

through a large number of open and solved cases.

A major hypothesis used by tactical analysis methods is that "a large proportion of crime is committed

by a small proportion of offenders". In the process of linking several crime instances and

modus operandi signatures, crime analysts make an exponentially large set of hypotheses called scenarios

and term this phase of identifying a group of linked crimes "scenario selection". The presence

of an exponentially large set of scenarios makes tactical analysis particularly challenging. Crime analysts make

use of linkage analysis techniques in order to link several crime instances.

Modus operandi information used in tactical crime analysis consists of spatio-temporal attributes

such as point and time of entry, time and point of escape or mode of escape, and other signatures such as

locations of discarded objects that might be associated with the crime such as food items, telephone

wires, documents, computers, etc. This information is normally associated with the offender’s

familiarity with or awareness of space and opportunities presented at the time of the crime. Hence, a

spatio-temporal pattern in a modus operandi characterises the offender’s signature in that particular

crime. A spatio-temporal pattern in a set of crime instances characterises their possible connection

as parts of a series of crimes committed by the same individual or a group of individuals

operating together.

Geographic Profiling [36, 37, ?] is a methodology for analyzing the geographic locations of a

linked series of crimes [36, 37]. Rossmo et al. [37] highlight that "Scenario Selection" is one of the

critical and time consuming steps in geographic profiling. Scenario selection is a process to identify

a series of statistically related crimes so as to obtain an optimal subset of crime sites that can be

profiled. In this context, one of the important questions asked by geographic profilers is "While it

is possible to prune the data to eliminate 'suspicious' sites, how can this be done in an unbiased

way?" Traditional geographic profiling tools such as Rigel [37] make use of an expert system

guided by a set of practical rules that have been developed using human knowledge. However,

these tools cannot handle outliers and noisy data effectively. Hence, they cannot prune out these

data without making use of human-guided rules. This may be due to the absence of a strong

statistical relationship to spatial statistical measures for identifying a series of connected crimes. Also,

this reflects the lack of a correct and complete algorithmic procedure to discover a set of statistically

linked crimes.

In addition to Geographic Profiling, techniques for Offender Profiling are being developed to

link crimes. Salfati et al. [38] showed that serial homicide offenders revealed consistency across

the three crimes for all offenders. Adderly [?] proposed techniques based on classical data mining

techniques such as Multilayer Perceptrons and Self Organizing Maps (SOM) to produce a list of

offences that could be attributed to an offender. However, these techniques are limited in their

ability to consider spatial characteristics and the skewed distribution of crime data.

Overall, tactical crime analysis methods are limited in their ability to a) identify a set of statistically

linked crimes that are a part of a potential series, b) statistically link the spatio-temporal

signatures of modus operandi from different crime reports, c) identify a common source between

geographically distributed crime data in a spatio-temporal context so as to categorize whether

a set of crimes was committed by the same individual, different individuals or individuals operating

in a group and d) scale to large datasets, that is, remain suitably efficient and

practical when a large input dataset is provided.

Some of the techniques or methods used by crime analysts and tools such as CrimeStat [24],

extensively rely on spatial statistical measures such as Ripley’s K-Function [33], whose value represents

a global measure of spatial auto-correlation, the Knox Index [22, 23], etc. Diggle et al. [20] have

proposed a space-time K function which does not require the a-priori specification of a threshold

distance and time, and finds a space-time correlation based on separation over space and time.

However, a key issue with spatial statistics based techniques is their scalability to large datasets,

as they cannot perform an early pruning of spurious patterns or an early identification of valid

patterns to avoid unnecessary computational overhead.
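
The scalability concern can be seen in a naive estimator of Ripley's K-function, sketched below without edge correction. This is one common textbook form, written here only for illustration; the function name, study area and toy points are assumptions.

```python
import math

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate: area * (ordered pairs within radius) / n^2.

    No edge correction is applied, and every pair of points is examined,
    so the cost grows quadratically with the number of points.
    """
    n = len(points)
    pair_count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                pair_count += 1
    return area * pair_count / (n * n)

# Hypothetical points in a 10 x 10 study area: three clustered, one isolated.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
k_value = ripley_k(points, radius=1.0, area=100.0)
csr_reference = math.pi * 1.0 ** 2  # expected K under complete spatial randomness
```

A value of K well above the reference value pi * r^2 suggests clustering at that distance. Because the double loop visits every pair, nothing is pruned early, which is the overhead the proposed algorithms aim to avoid.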

While combining data sets from multiple state and law enforcement agencies one of the seri-

ous issues faced by crime analysts and practitioners is the problem of identifying common sources

between various datasets. Most of these datasets are spatio-temporal in nature, which makes it challenging

to identify common sources, for example, to differentiate between serial criminals operating

in two different locations, with the same or different names or journey-to-crime patterns. Traditionally,

crime analysis has depended on text mining techniques to identify common sources. However,

the complexity of spatio-temporal data and intrinsic spatio-temporal relationships limits the usefulness

of conventional data mining and text mining techniques for extracting spatio-temporal patterns [40].

The limitations of the existing state of the art in the context of the problem defined in the previous

section can be listed as follows: a) existing techniques in crime analysis do not consider micro-

environmental factors, b) state of the art techniques such as geographic profiling require human

expertise to generate rule bases consisting of different scenarios and cannot address the risk of

generating spurious patterns, c) existing methods usually assume a normal distribution of datasets

or that the type of distribution is known a priori and d) existing methods cannot scale to very large

datasets. We propose to use spatio-temporal data mining techniques along with best industrial

practices of data mining to develop techniques and tools that can be used to link crimes based on

disparate sources of information including modus operandi characteristics as well as the underlying

environmental characteristics.

3. Research Design and Methods

The technical challenges in crime analysis are the existence of an exponential set of hypotheses

to solve open crime cases, extraction of non-trivial and/or previously unknown spatial relationships,

discovering statistically linked scenarios of connected crimes, common source identification, and

scalability to large spatio-temporal datasets. The limitations of the state of the art such as Hot Spot

Analysis, Geographic Profiling, classical data mining, text mining and spatial statistics motivate

the use of novel spatio-temporal data mining techniques.

Our proposed framework is illustrated by Figure 1 which describes the implications of using our

proposed spatio-temporal data mining techniques in crime analysis. It also shows the overall impact

our proposed approach can have on the existing state of crime analysis. As described in Figure 1,

the major bottleneck in crime analysis is the existence of a large number of open cases, exponential

set of plausible hypotheses and datasets from different state and law enforcement agencies.

Crime and intelligence analysts often ask questions such as ’Where?’, ’When?’, ’Who?’ and

’How?’ to formulate, refine, reduce and validate their hypotheses for solving crime. Using our

approach proposed in Figure 1, we would explore methods that answer these questions effectively.

Figure 1: Spatio Temporal Data Mining Approach and Implications

Cross-Industry Standard Process for Data Mining (CRISP-DM) Framework for Geographic Crime Series Linkage

We propose to follow a process similar to the Cross-Industry Standard Process for Data Mining (CRISP-DM) [9]

as shown in Figure 2. There are seven iterative stages in this process specifically tailored

for crime linkage.

Gather domain knowledge about crime, offender, environmental criminology

First and foremost, we need to develop a good understanding of the domain, which in our case

is the domain of crimes, criminals and other factors affecting crime. In the next stage we would

need to develop more understanding about the collection of data (e.g., information contained in the

incident records, dispatch records of the local law enforcement agencies) as well as examine data

for quality issues like missing data, irrelevant or redundant information. The next stage in the life

cycle is to mitigate the problems identified with the data in the previous stage as well as transform

data so as to make it easily consumable by the following stages. Once the data is transformed and

pre-processed, a model for the data is built which is then evaluated by well established validation

methodology. The evaluation results may influence the understanding of our domain knowledge as

well as provide valuable and timely information when deployed in the field to detect linked crimes.

Figure 2: Cross-Industry Standard Process for Data Mining (CRISP-DM) Framework for Geographic Crime Series Linkage

Literature in environmental criminology serves as an invaluable source of information to develop

a good understanding of the domain knowledge with respect to crimes, criminals and the methods

employed to commit crime. Studies in environmental criminology suggest that analysis of

crimes has four dimensions - victim, offender, geo-temporal and legal[8]. They also suggest that

urban crime has a well defined theoretical model. In crime, the universal 80-20 rule, that 20% of some

things are responsible for 80% of outcomes, often holds true: 80% of crimes involve 20% of

people (criminals or victims) or occur in 20% of places. Further, crimes are committed

by offenders who operate together in loosely formed co-offending groups [32], and in most

of the cases the offenders do not exhibit a well defined or standard modus operandi or crime signature.

However, offenders do favour particular types of buildings, use a finite variety of methods

to gain entry and have slight temporal preferences when committing crimes [1]. Routine Activity

Theory [12, 16, 11] suggests that there must be convergence in time and space of a likely offender,

a suitable target and the absence of a suitable guardian for a crime to occur. Crime Pattern Theory

provides valuable information about how people interact with their physical environment. Rational

Choice Perspective Theory [13] provides an analysis of the offender’s decision making processes

based on maximizing the gain from the crime while trying to minimize risk of being caught.

Table 1: Example Crime Types from Lincoln City Police dataset

Crime Types: Assault, Burglary, Larceny, Robbery, Vandalism

Based on the domain knowledge, we propose to classify crimes into various crime types and

define a process that consists of a sequence of steps that need to be performed for each individual

crime type. The data set from Lincoln City Police Department lists some of the crime types as

shown in Table 1. To define a process, consider the case of a burglary. A burglary involves the

following sequence of steps: identify target, gain entry, identify items to be stolen, steal them and

finally exit from the premise. Each step has a well defined, finite set of methods or techniques that

can be easily defined.
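
One possible way to represent such a process is as an ordered list of steps, each with a finite set of allowed method labels. The sketch below is only illustrative: the entry methods are the ones named in this proposal, while every other label, the step names and the `encode_incident` helper are hypothetical placeholders.

```python
# Ordered steps of a burglary, each with a finite set of allowed method labels.
# Entry methods follow the text; all other labels are hypothetical placeholders.
BURGLARY_STEPS = [
    ("identify_target", {"residence", "shop", "office"}),
    ("gain_entry", {"front door", "side window", "rear door", "fire exit"}),
    ("identify_items", {"cash", "electronics", "jewellery"}),
    ("steal_items", {"carried by hand", "loaded into vehicle"}),
    ("exit_premise", {"front door", "side window", "rear door", "fire exit"}),
]

def encode_incident(record):
    """Turn a dict of step -> method into an ordered tuple.

    Unknown or missing methods become None so that gaps stay visible.
    """
    signature = []
    for step, allowed in BURGLARY_STEPS:
        method = record.get(step)
        signature.append(method if method in allowed else None)
    return tuple(signature)

# Hypothetical incident with only three of the five steps recorded.
incident = {
    "identify_target": "residence",
    "gain_entry": "rear door",
    "exit_premise": "rear door",
}
signature = encode_incident(incident)
```

Encoding every incident of a crime type in the same step order makes the records directly comparable, which later stages can use when building offender signatures.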

Understand Crime Data

Once the crime types and the process for each crime type are specified using domain knowledge,

the next step would be to identify, collect and examine the data from incident records, dispatch

records as well as data common to state/local law enforcement agencies. At this stage special

emphasis would be placed on identifying spatial features in the data. For example, during a burglary the

method of gaining entry can be either from the front door, side window, rear door or a fire exit.

The spatial co-ordinate of the "method of entry" dimension in the data such as front, side, rear

is important to the development of a signature during the model building stage as this would enable

identification of particular preference of the offender in choosing the method of entry.

Data Preparation

The next step in the CRISP-DM process involves resolving issues with data collected. The data

collected might have some missing as well as irrelevant information. The issue of missing data is

usually handled by replacing the missing values with a placeholder. Irrelevant or redundant

information can be filtered out using feature selection algorithms. Further, data might not be in

a format suitable for data mining algorithms. For instance, the Lincoln City Police Department’s

incident record dataset contains comments recorded by police officers in the form of unstructured

text but the information contained in them is of high value to the data mining algorithms. Thus

a tool to convert this unstructured text into structured information would be valuable for further

analysis of crimes.
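
The two preparation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the placeholder value, the field names and the toy records are hypothetical, and dropping constant fields is only a crude stand-in for a real feature selection algorithm.

```python
MISSING = "UNKNOWN"  # hypothetical placeholder for missing values

def fill_missing(records, fields):
    """Replace absent, None or empty values with a placeholder."""
    cleaned = []
    for record in records:
        row = {}
        for field in fields:
            value = record.get(field)
            row[field] = value if value not in (None, "") else MISSING
        cleaned.append(row)
    return cleaned

def drop_constant_fields(records, fields):
    """Crude stand-in for feature selection: keep fields with more than one value."""
    return [f for f in fields if len({r[f] for r in records}) > 1]

# Hypothetical incident records; "LPD" is an illustrative agency label.
fields = ["crime_type", "entry", "agency"]
records = [
    {"crime_type": "Burglary", "entry": "rear door", "agency": "LPD"},
    {"crime_type": "Burglary", "entry": "", "agency": "LPD"},
    {"crime_type": "Larceny", "entry": None, "agency": "LPD"},
]
cleaned = fill_missing(records, fields)
kept = drop_constant_fields(cleaned, fields)
```

The `agency` field is dropped because it takes a single value in every record and so cannot help distinguish one crime from another.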

Model Building using Data Mining Algorithms

Once the data quality issues are taken care of, the next step in the CRISP-DM process involves building

a model for the analysis of crime data. Model building in data mining can be addressed broadly

in two different ways. First, the supervised method of model building refers to the use of labeled

information to build a model. Classification is one of the best examples for the supervised approach.

In classification, the model is built on a dataset that has been labeled or classified previously using

domain knowledge and the model is evaluated on held-out data called the test dataset. For crime

linkage analysis, many common classification techniques such as Naive Bayes, Bayesian Belief

Networks [25] and Multi Layer Perceptrons [1] have been used previously. Unsupervised techniques

are distinguished from supervised techniques by virtue of not using records with manual labels.

Clustering is one of the best examples for an unsupervised data mining technique. Clustering

algorithms like Self Organizing Maps (SOM) [1] have been applied previously for operational crime

fighting. There also exists a vast amount of literature on identification of hot spots based on crime

incidents in the geographical area. Clustering techniques need the specification of a similarity or a

distance function so that records in the same cluster are more similar to each other

than to records in a different cluster.
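
As a hedged illustration of what such a distance function might look like for crime records, the sketch below combines spatial distance, time gap and a modus operandi mismatch into a single dissimilarity and then groups records with a simple single-link threshold rule. The scaling constants, field names and toy records are assumptions, and the grouping rule is a stand-in rather than the clustering method proposed here.

```python
import math

def st_distance(a, b, space_scale=1.0, time_scale=7.0):
    """Combine spatial distance, time gap and MO mismatch into one dissimilarity."""
    d_space = math.hypot(a["x"] - b["x"], a["y"] - b["y"]) / space_scale
    d_time = abs(a["day"] - b["day"]) / time_scale
    d_mo = 0.0 if a["entry"] == b["entry"] else 1.0
    return d_space + d_time + d_mo

def group_by_threshold(records, threshold):
    """Single-link grouping: records within the threshold share a cluster label."""
    labels = list(range(len(records)))
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if st_distance(records[i], records[j]) <= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels

# Hypothetical burglary records.
records = [
    {"x": 0.0, "y": 0.0, "day": 1, "entry": "rear door"},
    {"x": 0.2, "y": 0.0, "day": 3, "entry": "rear door"},
    {"x": 5.0, "y": 5.0, "day": 2, "entry": "front door"},
]
labels = group_by_threshold(records, threshold=1.0)
```

The first two records end up in the same group because they are close in space and time and share an entry method. How the three components should be weighted is a modelling decision that the simple sum above does not settle.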

Many general purpose data mining tools, such as Clementine, See5/C5.0, and Enterprise Miner,

are designed to analyze large commercial databases. Although these tools have been used in an-

alyzing scientific and engineering data, astronomical data, multi-media data, genomic data, and

web data, they do not address spatio-temporal characteristics of crime data. For instance, specific

features of geographical data like a rich array of data types, implicit spatial relationships among the

variables, observations that are not independent and spatial autocorrelation among the features lead

to poor performance of generic data mining algorithms and the need for specialized data mining

algorithms for spatial data[40]. Existing approaches to crime linkage analysis use generic data min-

ing techniques that incorporate geography and time as features rather than use spatial properties like

spatial autocorrelation.
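
Spatial autocorrelation can be quantified with measures such as Moran's I. The sketch below computes it for crime counts in a row of adjacent areas using binary neighbor weights; it is an illustration only, and the counts and the neighbor structure are assumptions.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights.

    values: list of counts per area.
    neighbors: dict mapping an area index to the indices of adjacent areas.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0
    weight_sum = 0
    for i, adjacent in neighbors.items():
        for j in adjacent:
            cross += dev[i] * dev[j]
            weight_sum += 1
    return (n / weight_sum) * (cross / sum(d * d for d in dev))

# Hypothetical crime counts for four areas arranged in a line.
counts = [10, 9, 1, 0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
i_value = morans_i(counts, chain)
```

A positive value indicates that neighboring areas tend to have similar counts, so observations are not independent. A generic algorithm that treats location as just another feature does not exploit this dependence.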

We therefore propose to explore development of novel spatial data mining algorithms that can

identify crime links utilizing both temporal as well as spatial dimensions of crime data. Specifically,

we seek to build upon techniques such as Spatial Autoregressive Regression (SAR), Markov

Random Fields (MRF) [39] or other classification techniques that incorporate spatial dependence

or context into them. In the case of unsupervised techniques, we propose to explore similarity

measures that incorporate characteristics of spatial data. We will also explore the possibility of

applying techniques such as spatial co-location pattern mining[19] to discover previously unknown

and interesting spatial patterns. Given crime data containing crime types, crime instances, location

of special events, locations of other business entities and locations of criminals’ residences, the

co-location algorithm extracts previously unknown relationships among these entities.
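
For a pattern of two feature types such as {Bars, Auto Theft}, co-location mining typically relies on a participation index: the smaller of the two fractions of instances of each type that have a neighbor of the other type. The sketch below computes this for a size-two pattern only; the neighborhood radius and the toy coordinates are assumptions.

```python
import math

def participation_index(instances_a, instances_b, radius):
    """Participation index of the size-two pattern {A, B}."""
    def ratio(source, target):
        # Fraction of source instances with at least one target within radius.
        hits = sum(
            1 for (x, y) in source
            if any(math.hypot(x - u, y - v) <= radius for (u, v) in target)
        )
        return hits / len(source)
    return min(ratio(instances_a, instances_b), ratio(instances_b, instances_a))

# Hypothetical locations of bars and auto thefts.
bars = [(0.0, 0.0), (10.0, 10.0)]
auto_thefts = [(0.1, 0.0), (0.0, 0.2), (10.0, 10.1), (50.0, 50.0)]
pi_value = participation_index(bars, auto_thefts, radius=0.5)
```

Every bar has a nearby theft, but only three of the four thefts are near a bar, so the index is 0.75. Patterns whose index falls below a chosen threshold would be pruned.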

Further, most of the generic data mining algorithms assume a normal distribution of data while

it is common to have a Poisson or a Bernoulli distribution in crime data. We propose to develop

algorithms that consider other possible distributions in data as well as take into consideration the

micro-environmental characteristics to identify crime links.
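
A small worked comparison shows why the distributional assumption matters for rare events. The sketch below computes the probability of observing at least k crimes in an area under a Poisson model and under a normal model with the same mean and variance; the rate and count used are hypothetical values chosen for illustration.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

def normal_tail(k, mean, std):
    """P(X >= k) for X ~ Normal(mean, std), without continuity correction."""
    z = (k - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical area with an expected 0.5 burglaries per week; 3 are observed.
rate = 0.5
p_poisson = poisson_tail(3, rate)
p_normal = normal_tail(3, rate, math.sqrt(rate))
```

With these values the Poisson probability is roughly 0.014, while the normal model gives a far smaller probability. An analysis that assumes normality would therefore judge the same count to be much more unusual than the count model does.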

To make our approach practically feasible and overcome pitfalls, we would validate our

approach analytically and experimentally. Figure 3 illustrates our validation setup to overcome

pitfalls and ensure the consistency of patterns discovered from spatio-temporal crime datasets.

Evaluation

As shown in Figure 3, the experimental and analytical evaluation of our proposed algorithms and

interest measures would involve testing them with both real and synthetic datasets. To minimize

potential pitfalls in our algorithms and interest measures, we would validate them extensively based

on different criteria. Specifically, we will answer questions such as: What are the high interest

zones (the parameter values for which a specific algorithm produces a large number of patterns

with high interest measure values)? What are the dominance zones (the parameter values for which

a specific algorithm is the fastest) among the different pruning strategies for large datasets? What

is the effect of the number of event types on the runtime of the algorithm? What is the effect of the

values of different timing parameters which are provided as input to ST cascade algorithms on

their performance? What are the appropriate choices of different timing parameters for different

problem characteristics?

Figure 3: Validation Methodology to overcome pitfalls

Deployment

We also propose to develop the proposed data mining algorithms and incorporate them into

an automated data mining framework such as CRISP-DM described above, thus resulting in an

easily usable tool for crime analysts. Our emphasis would be on implementing the algorithms in a

modularized manner so that they can either be used as a stand-alone tool by crime analysts or be

integrated with existing software tools used by crime analysts such as CrimeStat.

The challenges in realizing our proposed approaches are the following: (a) the risk of generating spurious patterns, (b) exorbitant computational cost, (c) the presence of missing information or noisy data, and (d) integration with existing crime analysis tools like CrimeStat [24]. To address these challenges, we would explore the following: (a) we would propose composite multi-dimensional interest measures that are related to statistical measures proposed in spatial statistics; (b) we would propose scalable, computationally efficient, correct and complete algorithms to discover statistically meaningful patterns; (c) we would establish the correctness and completeness of these algorithms; (d) we would explore measures for the early discovery and removal of spurious patterns so as to prevent the propagation of errors, designing composite multi-dimensional interest measures to achieve this; and (e) we would incorporate our algorithms as user-friendly tools that can be added as .NET components to popular tools like CrimeStat.
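One common way to guard against spurious space-time patterns is a Monte Carlo significance test. The Python sketch below illustrates the idea with the Knox statistic, which counts pairs of events that are close in both space and time, and a permutation of the event times. It is an illustration under stated assumptions, not the composite interest measure proposed here: the function names, the `(x, y, t)` event layout, the thresholds and the toy events are all hypothetical.

```python
import random
from itertools import combinations
from math import hypot


def knox_statistic(events, s_max, t_max):
    """Count event pairs within s_max in space and t_max in time.

    `events` is a list of (x, y, t) tuples (assumed layout).
    """
    return sum(
        1
        for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2)
        if hypot(x1 - x2, y1 - y2) <= s_max and abs(t1 - t2) <= t_max
    )


def knox_p_value(events, s_max, t_max, n_perm=999, seed=0):
    """Estimate a one-sided p-value by shuffling times across locations."""
    rng = random.Random(seed)
    observed = knox_statistic(events, s_max, t_max)
    times = [t for _, _, t in events]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(times)  # break any real space-time association
        shuffled = [(x, y, t) for (x, y, _), t in zip(events, times)]
        if knox_statistic(shuffled, s_max, t_max) >= observed:
            at_least += 1
    # The +1 terms count the observed arrangement as one permutation.
    return (at_least + 1) / (n_perm + 1)


# Toy data: three events close in space and time, three scattered events.
toy_events = [
    (0, 0, 0), (0, 1, 1), (1, 0, 2),
    (50, 50, 100), (51, 50, 200), (90, 10, 300),
]
observed_pairs = knox_statistic(toy_events, s_max=2, t_max=3)
p_value = knox_p_value(toy_events, s_max=2, t_max=3, n_perm=199)
```

A candidate pattern whose p-value is large under such a test could be discarded early, before it propagates to later mining steps.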

One of the requirements of the proposed approach is the availability of real datasets for validating our proposed composite multi-dimensional interest measures and algorithms. We would obtain real datasets from the Lincoln city police department, Lincoln, NE.

Soundness of STDMP for GCSLA

As shown in Figure 4, data mining is a secondary, or exploratory, analysis technique that assumes little about the dataset, so hypothesis-specific data collection need not be performed. This greatly reduces the effort required of law enforcement agencies, which otherwise form several primary hypotheses, collect the data, and then further refine their hypotheses. Our proposed methods only require data that has been collected without any primary hypothesis in order to discover useful and interesting patterns. These patterns can be further analyzed by crime analysts to generate more refined hypotheses.

Figure 4: Data Mining as a secondary or exploratory data analysis

Another dimension of the soundness of data mining approaches is the statistical significance of the interest measures and discovered patterns. To demonstrate the soundness of our spatio-temporal data mining approach, we will evaluate the statistical significance of the proposed interest measures and the correctness and completeness of our algorithms. To evaluate our proposed interest measures, we would relate them to well-known statistical significance measures from spatial statistics, namely the cross K-function [33], the space-time K-function [20] and the Knox index [22, 23]. The major motivation behind proposing new interest measures is to achieve better computational performance and scalability to large datasets than spatial statistical methods provide. To relate our proposed interest measures to these statistics, we would prove that our interest measures are upper bounds on the corresponding spatial statistical measures. For example, the Participation Index (PI), an interest measure proposed by Huang et al. [19], is related to the cross K-function proposed by Ripley [33]. Figure 5 illustrates the relationship of the PI to the cross K-function: the PI is an upper bound on the cross K-function. This bound indicates that patterns with a high cross K-function value also have a high PI, so the PI discovers statistically significant patterns, while its monotonic nature contributes to computational efficiency.
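The upper-bound relationship can be checked numerically on a toy example. The Python sketch below computes the participation index of a two-type pattern and a cross K-function estimate divided by the study area, with no edge correction, which is the normalization under which the bound is easy to see. The function names, the neighbor distance and the point coordinates are assumptions for illustration only.

```python
from itertools import product
from math import hypot


def neighbor_pairs(a_pts, b_pts, d):
    """Return index pairs (i, j) whose Euclidean distance is at most d."""
    return [
        (i, j)
        for (i, (ax, ay)), (j, (bx, by)) in product(
            enumerate(a_pts), enumerate(b_pts)
        )
        if hypot(ax - bx, ay - by) <= d
    ]


def participation_index(a_pts, b_pts, d):
    """Minimum, over both event types, of the fraction of instances that
    have at least one neighbor of the other type within distance d."""
    pairs = neighbor_pairs(a_pts, b_pts, d)
    pr_a = len({i for i, _ in pairs}) / len(a_pts)
    pr_b = len({j for _, j in pairs}) / len(b_pts)
    return min(pr_a, pr_b)


def normalized_cross_k(a_pts, b_pts, d):
    """Cross K estimate divided by the study area (no edge correction):
    the number of neighboring pairs over the number of possible pairs."""
    return len(neighbor_pairs(a_pts, b_pts, d)) / (len(a_pts) * len(b_pts))


# Hypothetical locations of two event types, e.g. bars and assaults.
bars = [(0.0, 0.0), (10.0, 10.0)]
assaults = [(0.0, 1.0), (1.0, 0.0), (20.0, 20.0)]

pi_value = participation_index(bars, assaults, d=2.0)
k_value = normalized_cross_k(bars, assaults, d=2.0)
```

The bound holds in general for this normalization because the number of neighboring pairs cannot exceed the number of participating instances of one type multiplied by the total number of instances of the other type.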

Figure 5: Participation Index upper bound to Cross K-Function

We would also explore a conceptual model of the pattern families extracted using our spatio-temporal data mining approach. A conceptual model of a pattern family involves the creation of a taxonomy of the different types of patterns that are useful in different application domains, not only in crime analysis. An example of a conceptual model is the model of 'Events and Processes' from domains like time geography [21]. We would explore conceptual models along similar lines. Figure 6 illustrates the various phases involved in establishing a sound spatio-temporal data mining approach and shows the role of conceptual models of patterns in our proposed approach. These models help in identifying a taxonomy of the different types of patterns that would be useful in different application domains.

Figure 6: Steps to demonstrate the soundness of the proposed technical approach

The proposed project would be accomplished through the following tasks:

Task T1: Classify crimes and develop signatures for each crime type. We plan to provide a generalized classification of crime types. For each crime type, we further plan to specify the signature, that is, the sequence of steps usually performed to commit the crime, based on domain knowledge.

Task T2: Identify sources of data and transform data to be suitable for building data mining models. We plan to identify the data requirements for crime linkage analysis, the data sources, and the methods and techniques for transforming data so that it is suitable for building data mining models for geographic crime series linkage.

Task T3: Develop Spatio-Temporal Data Mining Algorithms for STDMP. We propose to develop novel, scalable algorithms that consider micro-environmental information as well as spatio-temporal characteristics of crime data while identifying links between crimes.

Task T4: Validate STDMP. We plan to validate the proposed STDMP using real-world data such as the crime dataset of the Lincoln city police department, Lincoln, NE, which contains incident data, dispatch records and other environmental factors such as the locations of bars. We plan to consult domain experts from criminal justice agencies such as the State of Minnesota's Bureau of Criminal Apprehension and various police departments, as well as domain experts in environmental criminology.

Task T5: Deploy STDMP in crime analysis tools. We plan to implement the proposed novel algorithms as modular components that are easy to use as standalone tools as well as easy to integrate into existing crime analysis tools like CrimeStat.


4. Implications for Criminal Justice Policy and Practices

Our proposed approach of spatio-temporal data mining to identify geographically linked crimes, previously unknown spatial relationships and geographic crime series aims to minimize the manual effort and intervention required, by automatically mining patterns from the data using novel methods. Whenever a crime is committed, law enforcement officers may have to go through the background information of a large number of past criminals to narrow down the number of suspects, which is a time-consuming task [17] that can be automated to save time. In Figure 7, the rectangle denotes the universal set of all hypotheses.

Figure 7: Different Hypotheses of Crime (Venn diagram with regions labeled: set of hypotheses solved by traditional link analysis; set of hypotheses solved by geo-link analysis approaches; set of hypotheses solved by both)

The set of hypotheses identified by traditional link analysis techniques is denoted by the circle on the left. Link analysis techniques that take spatial properties into consideration identify another set of hypotheses, shown in the circle on the right. Our aim is to identify the set of hypotheses in the intersection of the two circles, which reduces the size of the set of hypotheses and hence the manual effort. This will help practitioners to ensure timely action and policy makers to formulate relevant policies based on geographic areas.

Our team includes collaborators from the Minnesota Department of Public Safety, Bureau of Criminal Apprehension (BCA), CriMNet Group Program Office. CriMNet [27], a part of the BCA, is a state-level program that works with Minnesota state and local agencies to make accurate and comprehensive criminal justice information available to criminal justice professionals in law enforcement. Specifically, CriMNet has a Name Event Index Service (NEIS) and a Comprehensive Incident-Based Reporting System (CIBRS) [26] that collect, organize and link individuals, incidents and events across the multiple record systems used by multiple justice entities. Collaborators from the BCA will provide the necessary information and contacts in the field, specifically police departments in the state of Minnesota that might be potential users of the results of the proposed research. They are enthusiastic about incorporating the developed approach into their systems to aid crime analysts.

Policy makers can introduce relevant policy changes based on the new patterns discovered by our proposed spatio-temporal data mining approach to crime linkage analysis. For instance, the Brazilian city of Diadema passed legislation to close bars early, which led to a reduction in homicides by about half and a reduction in other crimes and events [10], as mentioned above in the research methods relating to the discovery of unknown spatial relationships in crime.

5. Management Plan and Organization

We will measure the success of this project in terms of (i) successful research resulting in the creation of new spatio-temporal data mining techniques, (ii) the building of new tools embodying the new results, and their use by crime analysis experts, and (iii) success in reducing the plausible set of hypotheses to solve crime.

The detailed project plan is given in Table 2.

Table 2: Project Task Schedule for Tasks T1-T5 described in Section 3

                                  Year 1                  Year 2
Quarters                    Q1    Q2    Q3    Q4    Q1    Q2    Q3    Q4
Scientific Approach
  STDMP for GCSLA           T1    T2    T3    T3    T4    T4    T5    Final Report
Management Approach
  Progress Monitoring       Quarterly review of goals                 Final Report
  NIJ Reports               Half-yearly progress report to NIJ        Final Report

The team, consisting of geographic information scientists and crime analysts, is capable of carrying out the proposed tasks. Its members not only have strong track records in geographic information science (GISc), data management, and human activity (e.g. crime) analysis, but have also worked collaboratively. The PI, Dr. Shashi Shekhar, is a leader in spatio-temporal data management and analysis. The Co-PI, Dr. Jaideep Srivastava, is a leader in the area of Web mining and database systems. Professor Richard Block, PhD, Emeritus Professor of Sociology and Criminal Justice at Loyola University Chicago, has been studying the relationship between crime and community for the last 30 years. The collaborators are from the Minnesota Department of Public Safety, Bureau of Criminal Apprehension, CriMNet Group Program Office. The CriMNet program office regularly involves subject matter experts from the law enforcement community in research and analysis projects and proposes to do so with this project. The researchers and collaborators make this team truly unique.

Table 3: Dissemination strategy

Deliverable                                     Target Audience
Scholarly publications in Crime Analysis        Research Community
  conferences and journals
.NET components of algorithms in tools          Crime analysts and practitioners
  such as CrimeStat
Resulting Patterns                              Policy Makers

6. Dissemination Strategy

The new algorithms, techniques and tools would be disseminated through academic conferences in data mining and spatio-temporal data analysis, and through specialized crime mapping conferences like MAPS (Mapping and Analysis for Public Safety), organized by the National Institute of Justice (NIJ). Further, several of the techniques developed may be incorporated as tools to be used by state agencies like the Minnesota Bureau of Criminal Apprehension, and also as part of spatial statistics applications like CrimeStat.

The dissemination strategy is summarized in Table 3.


Appendix I

References

[1] R. Adderley. The use of data-mining techniques in operational crime fighting. In M. M. Kantardzic and J. Zurada, editors, Next Generation of Data-Mining Applications. John Wiley and Sons Inc., Hoboken, NJ, USA, 2005.

[2] Pieter Adriaans and Dolf Zantinge. Data Mining. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1997.

[3] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, 1994.

[4] L. Anselin. Local indicators of spatial association - LISA. Geographical Analysis, 27(2):93–155, 1995.

[5] Mikhail Bilenko. Learnable Similarity Functions and Their Application to Record Linkage and Clustering. PhD thesis, Department of Computer Sciences, University of Texas at Austin, 2006.

[6] R. Block and C. R. Block. Place, space, and crime: A spatial analysis of liquor places. In J. Eck and D. Weisburd, editors, Crime and Place. Criminal Justice Press, 1996.

[7] R. Block and C. R. Block. Risky places: A comparison of the environs of rapid transit stations in Chicago and the Bronx. In J. Mollenkopf, editor, Analyzing Crime Patterns: Frontiers of Practice. Sage Publishing, 1999.

[8] Paul J. Brantingham and Patricia L. Brantingham. Environmental Criminology. Waveland Press, Long Grove, IL, USA, 1990.

[9] Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rudiger Wirth. CRISP-DM 1.0: Step-by-Step Data Mining Guide. CRISP-DM consortium: NCR Systems Engineering Copenhagen (USA and Denmark), DaimlerChrysler AG (Germany), SPSS Inc. (USA) and OHRA Verzekeringen en Bank Groep B.V (The Netherlands), 2000.

[10] Brazil city slashes crime by closing its bars early. San Francisco Chronicle. http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2006/05/10/MNGIOIOQ3M1.DTL, May 2006.

[11] R. V. Clarke and M. Felson. Introduction: Criminology, routine activity and rational choice. In R. V. Clarke and M. Felson, editors, Routine Activity and Rational Choice: Advances in Criminology Theory, volume 5. Transaction Publishers, Somerset, NJ, USA, 1993.

[12] L. E. Cohen and M. Felson. Social change and crime rate trends: A routine activity approach. American Sociological Review, 44:588–608, 1979.

[13] D. Cornish and R. V. Clarke. Introduction. In D. Cornish and R. V. Clarke, editors, The Reasoning Criminal. Springer-Verlag, 1985.

[14] John E. Eck et al. Mapping crime: Understanding hot spots. US National Institute of Justice (http://www.ncjrs.gov/pdffiles1/nij/209393.pdf), 2005.

[15] Ronald E. Wilson and Katie M. Filbert. Crime mapping and analysis. In Encyclopedia of GIS. Springer, 2008.

[16] M. Felson. Routine activities and crime prevention: Armchair concepts and practical action. Studies on Crime and Crime Prevention, 1:30–34, 1992.

[17] Bill McGarigle. Crime Profilers Gain New Weapons: Linkage analysis and geographic profiling systems get criminals where they live. http://www.vgin.virginia.gov/documents/articles/localgovt/Crime%20ProfilersGain New Weapons.htm, 1997.

[18] A. Getis and J. K. Ord. Local spatial statistics: An overview. In Spatial Analysis: Modelling in a GIS Environment, pages 261–277. GeoInformation International, Cambridge, England, 1996.

[19] Yan Huang, Shashi Shekhar, and Hui Xiong. Discovering co-location patterns from spatial datasets: A general approach. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(12):1472–1485, December 2004.

[20] Peter J. Diggle, A. G. Chetwynd, R. Häggkvist, and S. E. Morris. Second-order analysis of space-time clustering. Statistical Methods in Medical Research, 4(2):124–136, 1995.

[21] Harvey J. Miller. Time geography. In Encyclopedia of GIS. Springer, 2008.

[22] G. Knox. Detection of low density epidemicity. British Journal of Preventative and Social Medicine, 17(1):21–27, 1963.

[23] G. Knox. Epidemiology of childhood leukaemia in Northumberland and Durham. British Journal of Preventative and Social Medicine, 18:17–24, 1984.

[24] Ned Levine. CrimeStat 3.0: A Spatial Statistics Program for the Analysis of Crime Incident Locations. Ned Levine & Associates: Houston, TX / National Institute of Justice: Washington, DC, 2004.

[25] G. C. Oatley, J. Zeleznikow, and B. W. Ewart. Matching and predicting crimes. In A. Macintosh, R. Ellis, and T. Allen, editors, Applications and Innovations in Intelligent Systems XII (Proceedings of AI2004), pages 19–32, 2004.

[26] State of Minnesota Bureau of Criminal Apprehension. Comprehensive Incident Based Reporting System - CIBRS. http://www.bca.state.mn.us/cibrs/Documents/CIBRS%20Fact%20Sheet.pdf, 2007.

[27] State of Minnesota Bureau of Criminal Apprehension. CriMNet. http://www.crimnet.state.mn.us/Misc/AboutCrimnet.htm, 2007.

[28] Atsuyuki Okabe, Kei-ichi Okunuki, and Shino Shiode. The SANET toolbox: New methods for network spatial analysis. Transactions in GIS, 10(4):535–550, 2006.

[29] P. J. Brantingham and P. L. Brantingham. Environmental Criminology. Waveland, Prospect Heights, IL, 1991.

[30] J. Ratcliffe. Near repeat calculator. http://www.temple.edu/cj/misc/nr/access.asp?ac=emsub, 2007.

[31] Jerry H. Ratcliffe. The hotspot matrix: A framework for the spatio-temporal targeting of crime reduction. Police Practice and Research, 5(1):5–23, 2004.

[32] A. J. Reiss. Co-offending and criminal careers. In M. Tonry and N. Morris, editors, Crime and Justice: A Review of Research, volume 10. University of Chicago Press, 1988.

[33] B. D. Ripley. The second-order analysis of stationary point processes. Applied Probability, 13(2):55–66, 1976.

[34] S. A. Roach. The Theory of Random Clumping. Methuen, London, 1968.

[35] Caterina Gouvis Roman. Routine activities of youth and neighborhood violence: Spatial modeling of place, time and crime. In Fahui Wang, editor, Geographic Information Systems and Crime Analysis, chapter 17, pages 293–310. Idea Group, Hershey, PA, USA, 2005.

[36] D. K. Rossmo. Geographic Profiling. CRC Press, Boca Raton, FL, USA, 2000.

[37] Kim D. Rossmo, Ian Laverty, and Brad Moore. Geographic profiling for serial crime investigation. In Fahui Wang, editor, Geographic Information Systems and Crime Analysis, chapter 6, pages 102–117. Idea Group, Hershey, PA, USA, 2005.

[38] C. G. Salfati and A. L. Bateman. Serial homicide: An investigation of behavioural consistency. Journal of Investigative Psychology and Offender Profiling, 2:121–144, 2005.

[39] S. Shekhar, P. Schrater, R. Vatsavai, W. Wu, and S. Chawla. Spatial contextual classification and prediction models for mining geospatial data, 2002.

[40] Shashi Shekhar, Pusheng Zhang, Yan Huang, and Ranga Raju Vatsavai. Data Mining: Next Generation Challenges and Future Directions - Trends in Spatial Data Mining. AAAI Press, Menlo Park, CA, USA, 2004.

[41] S. Shiode and A. Okabe. Network variable clumping method for analyzing point patterns on a network. Unpublished paper presented at the Annual Meeting of the Association of American Geographers, Philadelphia, Pennsylvania, 2004.

[42] Xiaoning Yang, William M. Pottenger, and Stephen V. Zanias. Link Analysis Survey Status Update January 2006. Technical report, Lehigh University Computer Science and Engineering Department, 2007.
