RESEARCH PROPOSAL
1. Title: Geographic Crime Linkage Analysis: A Spatio-Temporal Data Mining Approach
2. Research Topic: Topic 5. Geographic Crime Series Linkage Analysis
3. Principal Investigators:
PI: Shashi Shekhar, Professor, Email: [email protected]
Co-PI: Jaideep Srivastava, Professor, Email: [email protected]
Geographic crime linkage analysis: A spatio-temporal data mining approach
a. Abstract
Geographic crime linkage analysis focuses on identifying spatially grouped serial crimes and
criminals from a given set of crime reports and other related information provided by state and
local law enforcement agencies. Discovering and tracking spatial relationships in crime data is
an important problem in crime analysis, e.g. identifying relationships among crimes committed
by the same offender (e.g. a serial killer) or the same group of offenders (e.g. organized crime).
However, geographic crime linkage analysis is challenging for several reasons: (i) the existence of
disparate sources of data in the form of incident reports, dispatch records and modus operandi
information, (ii) crime data might have spatially skewed distributions, (iii) the size, volume and
complexity of data available to law enforcement agencies is growing, (iv) a high risk of generating
spurious patterns and (v) the presence of large amounts of missing or imprecise data. Existing work
in crime analysis assumes a normal distribution of crime datasets and does not take micro-environmental
factors into account. It also focuses on manual discovery of spatially grouped crimes, whose
results might depend on the analyst and on the underlying distribution of the data. Existing tools
in crime analysis, classical data mining and geographic profiling use a variety of techniques
that share these limitations. The focus of this proposal is to create and explore a novel
spatio-temporal data mining platform (STDMP) for geographic crime series linkage analysis (GCSLA).
This includes developing a spatio-temporal data mining framework that can address the
spatio-temporal nature of crime data, considers micro-environmental factors while generating hypotheses,
does not assume any specific distribution of the data and can scale to large crime datasets. We would
validate our proposed approach with real datasets from law enforcement agencies and also deploy
our framework as components of existing crime analysis tools.
b. Table of Contents
a. Abstract
c. Research Plan or Main Body
1. Purpose, Goals and Objectives
2. Review of Relevant Literature
3. Research Design and Methods
4. Implications for Criminal Justice Policy and Practice
5. Management Plan
6. Dissemination Strategy
d. Appendices
c. Research Plan or Main Body
1. Purpose, Goals and Objectives
Geographic profiling is one of the most common techniques used by law enforcement and criminal
justice agencies to prioritize the search area for a serial offender’s home or other anchor
point. The first step in geographic profiling, often termed "Scenario Selection" or "Crime Series
Linkage Analysis", deals with the analysis of crime data for a connected series of crimes.
Formally, this problem is defined as follows: given crime data in the form of conviction records,
incident reports about specific crime instances, modus operandi information as well as other data
common to state and local law enforcement agencies, such as the geography of different localities, the
spatial footprints of building premises and the locations of crime generators like bars, schools and
bus stops, geographic crime series linkage analysis identifies sets of spatially grouped crimes,
or crimes with previously unknown spatial relationships, that form a series, such that the resulting
series are statistically significant. We propose to build algorithms and tools to identify sets of
spatially grouped crimes that are statistically linked.
Classical linkage analysis tools, also known as comparative case analysis tools, help crime
analysts identify and extract series of crimes. Their objective is to reduce the manual labor of sifting
through numerous crime reports. However, growth in the variety and volume of observational data has
outpaced the ability of traditional crime analysts to identify, discover, and track spatially grouped
crimes and crime series. Incorporating spatial features of the data into linkage analysis will
further eliminate false series and consequently improve the efficiency and effectiveness of law
enforcement in dealing with crimes.
Identifying and extracting spatially grouped crimes that are statistically linked is inherently hard
because (i) disparate sources of information need to be combined to find the links between crimes;
(ii) spatial as well as temporal characteristics of crime data need to be considered while identifying
the series. For instance, crimes have a spatially skewed distribution, and hence a series occurring
in a high-crime neighborhood might be difficult to detect because so many other similar crimes
are occurring there; (iii) the volume of data available to law enforcement agencies in terms of dispatch
records, incident records and details about micro-environment and geography is growing manifold due
to the increase in the number of crimes as well as the improvement and standardization of techniques for
evidence gathering; (iv) the generation of spurious or unwanted patterns needs to be minimized; and (v)
data sets contain large amounts of unwanted or irrelevant data as well as missing data.
The state of the art in crime analysis, classical linkage analysis, classical data mining and spatial
statistics is limited in its ability to identify statistically meaningful and interesting geographic
crime series from large datasets. Specifically, there are three main limitations of the state of the
art: a) traditional methods do not take micro-environmental factors into account while
performing linkage analysis, b) existing methods in crime analysis assume a normal distribution
of the data, whereas the data can follow Poisson or Bernoulli distributions, and c) traditional
approaches such as scenario selection may not scale to large datasets.
The overall objective of this proposal is to discover geographic crime series from spatio-temporal
crime datasets so as to aid geographic profiling. The proposed approach to overcoming the
critical limitations of existing techniques relies on spatio-temporal data mining techniques. We
propose to incorporate the resulting algorithms and techniques of this research into software toolkits
to be used by crime analysts.
To address the various challenges in geographic crime series linkage analysis, we propose a
spatio-temporal data mining platform (STDMP) for geographic crime series linkage analysis (GCSLA).
This approach seeks to build upon the existing CRISP-DM (Cross Industry Standard Process for Data Mining)
for GCSLA. We propose to develop data mining algorithms that take into account the spatial and
temporal characteristics of crime data. The major goal of this research is to utilize spatial and temporal
characteristics as well as micro-environmental factors such as the locations of bars and schools,
the spatial footprints of buildings and an offender’s awareness of space.
The proposed research has the potential to make a significant contribution in the area of geographic
crime linkage analysis by advancing methods for identifying, discovering, predicting, and tracking
spatial and spatio-temporal crime series. Even though the focus will be on analyzing crime, the resulting
analytical tools and techniques are likely to benefit identifying, discovering, predicting, and
tracking many other kinds of spatio-temporal patterns, for example in homeland security and epidemiology.
2. Review of Relevant Literature
Several disciplines have been involved in developing linkage analysis tools that could be applied
to the problem of identifying a geographic crime series. The state of the art related to this problem
can be divided into different approaches such as traditional crime analysis, strategic crime analysis
and tactical crime analysis. Crime analysts have mainly relied on spatial statistics and classical
data mining for selecting and refining their hypotheses for solving crime.
Traditional crime analysis is based on theories and concepts such as a) routine activity theory
[12], which formalizes the three factors affecting crime, namely a likely offender, a suitable target and
the absence of law enforcement, b) rational choice theory [13], which describes various constraints
evaluated by offenders to commit crime, c) crime pattern theory, drawn from environmental criminology
[8], which integrates routine activity theory and rational choice theory within a geographic
framework and d) notions of crime attractors and crime generators such as restaurants, shopping
centres, parks etc. [29, 7, 6].
Strategic crime analysis methods include hot spot analysis and the extraction of space-time clusters.
These methods mainly focus on identifying sets of spatially grouped crimes, high-crime areas or
hotspots at which to target crime reduction efforts.
Hot spot analysis methods [4, 14, 18, 24, 31, 34, 41, 28] discover high-density regions from point
datasets that show the actual locations of the crimes. This has traditionally been used by law
enforcement agencies to undertake policy measures such as optimal placement of resources and crime
reduction efforts. These methods focus on the discovery of the geometry (e.g. circle, ellipse, etc.)
of the high-density regions [14]. For example, the Spatial and Temporal Analysis of Crime (STAC)
module of CrimeStat[], nearest neighbor hierarchical clustering techniques, and K-means clustering
techniques are among the methods that use the ellipse method to identify hotspots [24]. Kernel
density estimation methods have been developed to identify isodensity hotspot surfaces because
hotspots may not have crisp ellipsoid boundaries. Local indicators of spatial association (LISA)
statistics were proposed to overcome the limitations of ellipsoid-based and kernel-based estimation
techniques [4, 18]. The clumping method was proposed by Roach to discover clumped points (e.g.
hotspots) from a point dataset [34]. Roman [35] points out the need for extending hotspot analysis
techniques to aggregate crime data.
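
To make the kernel density idea concrete, the following is a minimal sketch of a Gaussian kernel density surface evaluated on a regular grid. It is not the CrimeStat implementation; the function names, the toy coordinates and the bandwidth value are assumptions chosen only for illustration.

```python
import math

def kernel_density(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel density estimate on a regular grid.

    points    : list of (x, y) crime locations
    grid_x/y  : coordinates of the grid columns / rows
    bandwidth : kernel standard deviation (a tuning parameter, assumed here)
    Returns surface[row][col], the density at (grid_x[col], grid_y[row]).
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    surface = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * bandwidth ** 2))
                for px, py in points
            )
            row.append(norm * total)
        surface.append(row)
    return surface

def hottest_cell(surface, grid_x, grid_y):
    """Return the (x, y) grid coordinate with the highest estimated density."""
    best_density, best_xy = -1.0, None
    for j, gy in enumerate(grid_y):
        for i, gx in enumerate(grid_x):
            if surface[j][i] > best_density:
                best_density, best_xy = surface[j][i], (gx, gy)
    return best_xy

# Toy data (hypothetical): three incidents near (1, 1) and one isolated incident.
crimes = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (5.0, 5.0)]
grid = [0, 1, 2, 3, 4, 5]
surface = kernel_density(crimes, grid, grid, bandwidth=0.5)
peak = hottest_cell(surface, grid, grid)
```

Because the surface is continuous, a hotspot can be read off as any contour of equal density rather than as a fixed ellipse, which is the motivation given above for isodensity surfaces.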
A major limitation of these hotspot analysis methods is that they can compute only statistically
significant groupings or clusters of crimes; they cannot discover unknown spatial
relationships between different types of crime events. For example, {Bars, AutoTheft} may be an
interesting spatial relationship. In the context of routine activity theory, Roman [35] highlights that
schools and bars are potential areas for crimes and hence they are locations which can have a high
activity of crime attractors and crime generators. The study of micro-environments and their effect
on crimes is thriving. Block et al. [7, 6] have presented a detailed analysis of spatial locations like
bars and transit stations. Given a large spatio-temporal dataset, hot spot analysis methods can only
identify high-density areas of crime but cannot identify a strongly correlated set of crime types or a
strong correlation between a spatial feature such as a bar and a crime type such as auto theft.
Existing hotspot analysis methods also do not incorporate temporal measures such as time of the day,
day of the week etc. [35]. Hence, these traditional hotspot analysis techniques also cannot identify
a group of statistically linked crimes belonging to a series. Although SaTScan explicitly looks at
space and time hot spots, it does not incorporate features of an area other than population.
The Knox index [22, 23] and the near repeat calculator by Jerry Ratcliffe [30] explicitly look at space
and time, but do not identify specific hot spots.
Tactical crime analysis focuses on building signatures of specific crime instances by making
use of information from crime incident reports, modus operandi information and a large number
of historical case reports. The state of the art in tactical analysis includes geographic profiling, MO
classification and manual linking of a large number of geographically distributed crimes by sifting
through a large number of open and solved cases.
A major hypothesis used by tactical analysis methods is that "a large proportion of crime is
committed by a small proportion of offenders". In the process of linking several crime instances and
modus operandi signatures, crime analysts make an exponential set of hypotheses called scenarios
and term this phase of identifying a group of linked crimes "scenario selection". The presence
of an exponential set of scenarios makes tactical analysis very challenging. Crime analysts make
use of linkage analysis techniques in order to link several crime instances.
Modus operandi information used in tactical crime analysis consists of spatio-temporal attributes
such as point and time of entry, time and point of escape or mode of escape, and other signatures such as
the locations of discarded objects that might be associated with the crime, such as food items, telephone
wires, documents, computers etc. This information is normally associated with the offender’s
familiarity with or awareness of space and the opportunities presented at the time of the crime. Hence, a
spatio-temporal pattern in a modus operandi characterises the offender’s signature in that particular
crime. A spatio-temporal pattern in a set of crime instances characterises their possible connection
as parts of a series of crimes committed by the same individual or a group of individuals
operating together.
Geographic profiling [36, 37, ?] is a methodology for analyzing the geographic locations of a
linked series of crimes [36, 37]. Rossmo et al. [37] highlight that "Scenario Selection" is one of the
critical and time-consuming steps in geographic profiling. Scenario selection is a process to identify
a series of statistically related crimes so as to obtain an optimal subset of crime sites that can be
profiled. In this context, one of the important questions asked by geographic profilers is "While it
is possible to prune the data to eliminate 'suspicious' sites, how can this be done in an unbiased
way?" Traditional geographic profiling tools such as Rigel [37] make use of an expert system
guided by a set of practical rules that have been developed using human knowledge. However,
these tools cannot handle outliers and noisy data effectively. Hence, they cannot prune out these
data without making use of human-guided rules. This may be due to the absence of a strong
statistical relationship to spatial statistical measures for identifying a series of connected crimes. It
also reflects the lack of a correct and complete algorithmic procedure to discover a set of statistically
linked crimes.
In addition to geographic profiling, techniques for offender profiling are being developed to
link crimes. Salfati et al. [38] showed that serial homicide offenders revealed consistency across
the three crimes for all offenders. Adderly [?] proposed techniques based on classical data mining
methods such as multilayer perceptrons and self-organizing maps (SOM) to produce a list of
offences that could be attributed to an offender. However, these techniques are limited in their
ability to consider spatial characteristics and the skewed distribution of crime data.
Overall, tactical crime analysis methods are limited in their ability to a) identify a set of
statistically linked crimes that are part of a potential series, b) statistically link the spatio-temporal
signatures of modus operandi from different crime reports, c) identify a common source among
geographically distributed crime data in a spatio-temporal context, so as to categorize whether
a set of crimes was committed by the same individual, by different individuals or by individuals
operating in a group and d) scale to large datasets, that is, remain suitably efficient and
practical when a large input dataset is provided.
Some of the techniques or methods used by crime analysts and tools such as CrimeStat [24]
rely extensively on spatial statistical measures such as Ripley’s K-function [33], whose value
represents a global measure of spatial autocorrelation, and the Knox index [22, 23]. Diggle et al. [20] have
proposed a space-time K-function which does not require the a priori specification of a threshold
distance and time, and finds a space-time correlation based on separation over space and time.
However, the main issue with techniques based on spatial statistics is their scalability to large datasets,
as they cannot perform an early pruning of spurious patterns or an early identification of valid
patterns to avoid unnecessary computational overhead.
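
As an illustration of the threshold-based measures discussed above, the sketch below counts the event pairs that are close in both space and time, which is the core of the Knox statistic, and attaches a Monte Carlo p-value by shuffling event times. The thresholds, the permutation count and the function names are assumptions for illustration; this is not the CrimeStat routine.

```python
import math
import random
from itertools import combinations

def knox_statistic(events, space_threshold, time_threshold):
    """Count pairs of events that are close in BOTH space and time.

    events: list of (x, y, t) tuples.
    """
    close_pairs = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        near_in_space = math.hypot(x1 - x2, y1 - y2) <= space_threshold
        near_in_time = abs(t1 - t2) <= time_threshold
        if near_in_space and near_in_time:
            close_pairs += 1
    return close_pairs

def knox_p_value(events, space_threshold, time_threshold, permutations=999, seed=0):
    """Monte Carlo p-value: shuffle times across locations and recount."""
    rng = random.Random(seed)
    observed = knox_statistic(events, space_threshold, time_threshold)
    times = [t for _, _, t in events]
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(times)
        shuffled = [(x, y, t) for (x, y, _), t in zip(events, times)]
        if knox_statistic(shuffled, space_threshold, time_threshold) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (permutations + 1)

# Toy events (hypothetical): two incidents close in space and time, two only in space.
events = [(0.0, 0.0, 0), (0.1, 0.0, 1), (10.0, 10.0, 0), (10.0, 10.1, 50)]
observed = knox_statistic(events, space_threshold=1.0, time_threshold=2.0)
p_value = knox_p_value(events, space_threshold=1.0, time_threshold=2.0)
```

The pairwise loop is quadratic in the number of events and is repeated for every permutation, which gives a sense of why scalability is raised as a concern for such measures.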
When combining datasets from multiple state and local law enforcement agencies, one of the serious
issues faced by crime analysts and practitioners is the problem of identifying common sources
between the various datasets. Most of these datasets are spatio-temporal in nature, which makes it
challenging to identify common sources, for example, differentiating between serial criminals operating
in two different locations with the same or different names or journey-to-crime patterns. Traditionally,
crime analysis has depended on text mining techniques to identify common sources. However, the
complexity of spatio-temporal data and intrinsic spatio-temporal relationships limit the usefulness
of conventional data mining and text mining techniques for extracting spatio-temporal patterns [40].
The limitations of the existing state of the art in the context of the problem defined in the previous
section can be listed as follows: a) existing techniques in crime analysis do not consider
micro-environmental factors, b) state-of-the-art techniques such as geographic profiling require human
expertise to generate rule bases consisting of different scenarios and cannot address the risk of
generating spurious patterns, c) existing methods usually assume a normal distribution of datasets
or that the type of distribution is known a priori and d) existing methods cannot scale to very large
datasets. We propose to use spatio-temporal data mining techniques along with best industrial
practices of data mining to develop techniques and tools that can be used to link crimes based on
disparate sources of information including modus operandi characteristics as well as the underlying
environmental characteristics.
3. Research Design and Methods
The technical challenges in crime analysis are the existence of an exponential set of hypotheses
to solve open crime cases, extraction of non-trivial and/or previously unknown spatial relationships,
discovery of statistically linked scenarios of connected crimes, common source identification, and
scalability to large spatio-temporal datasets. The limitations of the state of the art, such as hot spot
analysis, geographic profiling, classical data mining, text mining and spatial statistics, motivate
the use of novel spatio-temporal data mining techniques.
Our proposed framework is illustrated in Figure 1, which describes the implications of using our
proposed spatio-temporal data mining techniques in crime analysis. It also shows the overall impact
our proposed approach can have on the existing state of crime analysis. As described in Figure 1,
the major bottlenecks in crime analysis are the existence of a large number of open cases, an exponential
set of plausible hypotheses and datasets from different state and local law enforcement agencies.
Crime and intelligence analysts often ask questions such as "Where?", "When?", "Who?" and
"How?" to formulate, refine, reduce and validate their hypotheses for solving crime. Using the
approach proposed in Figure 1, we would explore methods that answer these questions effectively.
Figure 1: Spatio Temporal Data Mining Approach and Implications
Cross Industry Standard Process for Data Mining (CRISP-DM) Framework for Geographic Crime Series Linkage
We propose to follow a process similar to the Cross Industry Standard Process for Data Mining
(CRISP-DM) [9], as shown in Figure 2. There are seven iterative stages in this process, specifically
tailored for crime linkage.
Gather domain knowledge about crime, offender, environmental criminology
First and foremost, we need to develop a good understanding of the domain, which in our case
is the domain of crimes, criminals and other factors affecting crime. In the next stage we would
need to develop a better understanding of the data collected (e.g. information contained in the
incident records and dispatch records of the local law enforcement agencies) as well as examine the data
for quality issues like missing data and irrelevant or redundant information. The next stage in the life
cycle is to mitigate the problems identified with the data in the previous stage as well as transform
the data so as to make it easily consumable by the following stages. Once the data is transformed and
pre-processed, a model for the data is built, which is then evaluated by a well-established validation
methodology. The evaluation results may influence our understanding of the domain as
well as provide valuable and timely information when deployed in the field to detect linked crimes.
Figure 2: Cross Industry Standard Process for Data Mining (CRISP-DM) Framework for Geographic Crime Series Linkage
Literature in environmental criminology serves as an invaluable source of information for developing
a good understanding of the domain with respect to crimes, criminals and the methods
employed to commit crime. Studies in environmental criminology suggest that the analysis of
crimes has four dimensions: victim, offender, geo-temporal and legal [8]. They also suggest that
urban crime has a well-defined theoretical model. In crime, the universal 80-20 rule (20% of some
things are responsible for 80% of outcomes) often holds true: 80% of crimes involve 20% of
people (criminals or victims) or occur in 20% of places. Further, crimes are committed
by offenders who operate together in loosely formed co-offending groups [32], and in most
cases the offenders do not exhibit a well-defined or standard modus operandi or crime signature.
However, offenders do favour particular types of buildings, use a finite variety of methods
to gain entry and have slight temporal preferences when committing crimes [1]. Routine activity
theory [12, 16, 11] suggests that there must be a convergence in time and space of a likely offender,
a suitable target and the absence of a suitable guardian for a crime to occur. Crime pattern theory
provides valuable information about how people interact with their physical environment. The rational
choice perspective [13] provides an analysis of the offender’s decision-making processes
based on maximizing the gain from the crime while trying to minimize the risk of being caught.
Table 1: Example Crime Types from Lincoln City Police dataset
Crime Types: Assault, Burglary, Larceny, Robbery, Vandalism
Based on the domain knowledge, we propose to classify crimes into various crime types and
define a process that consists of a sequence of steps that need to be performed for each individual
crime type. The dataset from the Lincoln City Police Department lists some of the crime types, as
shown in Table 1. To define a process, consider the case of a burglary. A burglary involves the
following sequence of steps: identify the target, gain entry, identify items to be stolen, steal them and
finally exit from the premises. Each step has a well-defined, finite set of methods or techniques that
can be easily defined.
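
One way to picture such a per-crime-type process is as an ordered list of steps, each with a finite set of admissible methods. The sketch below is an assumption about how this could be encoded, not the proposal's data model; only the entry methods (front door, side window, rear door, fire exit) come from the text, and every other method value is a hypothetical placeholder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessStep:
    name: str
    methods: frozenset  # finite set of methods allowed for this step

OPENINGS = frozenset({"front_door", "side_window", "rear_door", "fire_exit"})

# Burglary steps in the order given in the text.
BURGLARY_PROCESS = (
    ProcessStep("identify_target", frozenset({"residence", "business"})),        # hypothetical
    ProcessStep("gain_entry", OPENINGS),
    ProcessStep("identify_items", frozenset({"cash", "electronics", "other"})),  # hypothetical
    ProcessStep("steal_items", frozenset({"carried", "vehicle"})),               # hypothetical
    ProcessStep("exit_premises", OPENINGS),                                      # assumed same openings
)

def encode_incident(observed, process=BURGLARY_PROCESS):
    """Turn a dict of observed methods into an ordered signature tuple.

    Steps missing from the report are encoded as None; a method outside the
    step's finite set is rejected so that signatures stay comparable.
    """
    signature = []
    for step in process:
        method = observed.get(step.name)
        if method is not None and method not in step.methods:
            raise ValueError(f"unknown method {method!r} for step {step.name!r}")
        signature.append(method)
    return tuple(signature)

signature = encode_incident({"gain_entry": "rear_door", "exit_premises": "rear_door"})
```

Encoding every incident against the same ordered steps yields signatures of equal length, so two incidents can be compared step by step in later stages.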
Understand Crime Data
Once the crime types and the process for each crime type are specified using domain knowledge,
the next step would be to identify, collect and examine the data from incident records, dispatch
records as well as data common to state/local law enforcement agencies. At this stage, special
emphasis would be placed on identifying spatial features in the data. For example, during a burglary the
method of gaining entry can be through the front door, a side window, the rear door or a fire exit.
The spatial coordinate of the "method of entry" dimension in the data, such as front, side or rear,
is important to the development of a signature during the model building stage, as this would enable
identification of the offender’s particular preference in choosing the method of entry.
Data Preparation
The next step in the CRISP-DM process involves resolving issues with the data collected. The data
collected might have some missing as well as irrelevant information. Missing data is
usually handled by replacing the missing values with a placeholder. Irrelevant or redundant
information can be filtered out using feature selection algorithms. Further, the data might not be in
a format suitable for data mining algorithms. For instance, the Lincoln City Police Department’s
incident record dataset contains comments recorded by police officers in the form of unstructured
text, but the information contained in them is of high value to the data mining algorithms. Thus,
a tool to convert this unstructured text into structured information would be very useful for further
analysis of crimes.
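
The two cleaning steps described above can be sketched as follows. The placeholder token, the field names and the toy records are assumptions, and dropping constant fields is only a simple stand-in for a real feature selection algorithm.

```python
MISSING = "UNKNOWN"  # placeholder token; the specific value is an assumption

def fill_missing(records, fields):
    """Replace absent, None or empty values with the MISSING placeholder."""
    cleaned = []
    for record in records:
        row = {}
        for name in fields:
            value = record.get(name)
            row[name] = MISSING if value in (None, "") else value
        cleaned.append(row)
    return cleaned

def drop_constant_fields(records, fields):
    """Keep only fields that take more than one value across the records.

    A field with a single value cannot help distinguish crimes, so it is
    treated here as irrelevant.
    """
    return [name for name in fields if len({r[name] for r in records}) > 1]

# Hypothetical incident records.
raw = [
    {"crime_type": "Burglary", "entry": "rear", "state": "NE"},
    {"crime_type": "Burglary", "entry": None, "state": "NE"},
    {"crime_type": "Robbery", "entry": "", "state": "NE"},
]
fields = ["crime_type", "entry", "state"]
cleaned = fill_missing(raw, fields)
kept = drop_constant_fields(cleaned, fields)
```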
Model Building using Data Mining Algorithms
Once the data quality issues are taken care of, the next step in CRISP-DM involves building
a model for the analysis of crime data. Model building in data mining can be addressed broadly
in two different ways. First, the supervised method of model building refers to the use of labeled
information to build a model. Classification is one of the best examples of the supervised approach.
In classification, the model is built on a dataset that has been labeled or classified previously using
domain knowledge, and the model is evaluated on held-out data called the test dataset. For crime
linkage analysis, many common classification techniques such as Naive Bayes, Bayesian belief
networks [25] and multilayer perceptrons [1] have been used previously. Second, unsupervised techniques
are distinguished from supervised techniques by virtue of not using records with manual labels.
Clustering is one of the best examples of an unsupervised data mining technique. Clustering
algorithms like self-organizing maps (SOM) [1] have been applied previously for operational crime
fighting. There also exists a vast amount of literature on the identification of hot spots based on crime
incidents in a geographical area. Clustering techniques need the specification of a similarity or
distance function in order to group records so that records in the same cluster are more similar to
each other than to records in a different cluster.
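
As a hedged illustration of the role of the distance function, the sketch below combines spatial and temporal separation into one distance and groups events whose distance falls under a threshold (single-linkage grouping). The scaling parameters, the threshold and the function names are assumptions; this is not the SOM approach cited above.

```python
import math

def st_distance(a, b, space_scale=1.0, time_scale=1.0):
    """Distance between two (x, y, t) events after rescaling space and time."""
    ds = math.hypot(a[0] - b[0], a[1] - b[1]) / space_scale
    dt = abs(a[2] - b[2]) / time_scale
    return math.hypot(ds, dt)

def group_by_threshold(events, threshold, space_scale=1.0, time_scale=1.0):
    """Single-linkage grouping: events are linked when their distance is
    at most `threshold`; groups are the connected components of those links."""
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            if st_distance(events[i], events[j], space_scale, time_scale) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(events)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy events (hypothetical): two pairs that are close in both space and time.
events = [(0.0, 0.0, 0), (0.5, 0.0, 1), (10.0, 10.0, 0), (10.2, 10.0, 2)]
groups = group_by_threshold(events, threshold=3.0)
```

The choice of `space_scale` and `time_scale` decides how many units of distance count as much as one unit of time, so the resulting groups depend directly on how the distance function is specified.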
Many general-purpose data mining tools, such as Clementine, See5/C5.0, and Enterprise Miner,
are designed to analyze large commercial databases. Although these tools have been used in
analyzing scientific and engineering data, astronomical data, multimedia data, genomic data, and
web data, they do not address the spatio-temporal characteristics of crime data. For instance, specific
features of geographical data, such as a rich array of data types, implicit spatial relationships among the
variables, observations that are not independent and spatial autocorrelation among the features, lead
to poor performance of generic data mining algorithms and to the need for specialized data mining
algorithms for spatial data [40]. Existing approaches to crime linkage analysis use generic data
mining techniques that incorporate geography and time as features rather than using spatial properties like
spatial autocorrelation.
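
Spatial autocorrelation can be quantified in several ways. The sketch below uses Moran's I, a standard global statistic that the proposal does not name and that is swapped in here purely for illustration; the adjacency weights and the toy values are assumptions.

```python
def morans_i(values, weights):
    """Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (here 1 for neighbours, 0 otherwise, with a zero diagonal).
    Positive results indicate that similar values sit next to each other.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    squared = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / squared)

# Four areas in a row; each is a neighbour of the adjacent ones.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], line_weights)    # similar counts are adjacent
alternating = morans_i([1, 5, 1, 5], line_weights)  # dissimilar counts are adjacent
```

A value well away from zero signals that observations are not independent of their neighbours, which is the property that generic data mining algorithms ignore when location is treated as just another feature.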
We therefore propose to explore the development of novel spatial data mining algorithms that can
identify crime links utilizing both the temporal and the spatial dimensions of crime data. Specifically,
we seek to build upon techniques such as Spatial Autoregressive Regression (SAR), Markov
Random Fields (MRF) [39] or other classification techniques that incorporate spatial dependence
or context. In the case of unsupervised techniques, we propose to explore similarity
measures that incorporate characteristics of spatial data. We will also explore the possibility of
applying techniques such as spatial co-location pattern mining [19] to discover previously unknown
and interesting spatial patterns. Given crime data containing crime types, crime instances, locations
of special events, locations of other business entities and locations of criminals’ residences, the
co-location algorithm extracts previously unknown relationships among these entities.
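
To illustrate what a co-location relationship such as {Bars, AutoTheft} measures, the sketch below computes a participation index for a pair of feature types: for each type, the fraction of its instances that have an instance of the other type within a neighbourhood radius, taking the smaller of the two fractions. The proposal does not specify its interest measure, so the use of the participation index, the radius and the toy coordinates are assumptions.

```python
import math

def participation_index(instances, type_a, type_b, radius):
    """Participation index of the pair {type_a, type_b}.

    instances: list of (feature_type, x, y).
    Returns a value in [0, 1]; higher means the two types co-occur more often.
    """
    a_points = [(x, y) for kind, x, y in instances if kind == type_a]
    b_points = [(x, y) for kind, x, y in instances if kind == type_b]
    if not a_points or not b_points:
        return 0.0

    def participation_ratio(source, target):
        # Fraction of source instances with at least one target within radius.
        hits = sum(
            1
            for x, y in source
            if any(math.hypot(x - u, y - v) <= radius for u, v in target)
        )
        return hits / len(source)

    return min(
        participation_ratio(a_points, b_points),
        participation_ratio(b_points, a_points),
    )

# Toy data (hypothetical): both bars have a nearby theft, one theft is isolated.
instances = [
    ("Bar", 0.0, 0.0),
    ("Bar", 10.0, 0.0),
    ("AutoTheft", 0.5, 0.0),
    ("AutoTheft", 10.0, 0.5),
    ("AutoTheft", 50.0, 50.0),
]
pi_value = participation_index(instances, "Bar", "AutoTheft", radius=1.0)
```

A pair would be reported as a co-location pattern when this index exceeds a user-chosen threshold.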
Further, most generic data mining algorithms assume a normal distribution of the data, while
it is common for crime data to follow a Poisson or a Bernoulli distribution. We propose to develop
algorithms that consider other possible distributions in the data as well as take into consideration the
micro-environmental characteristics to identify crime links.
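
A quick, informal check on whether count data look Poisson-like is the index of dispersion, the ratio of the sample variance to the mean, which is close to 1 for Poisson counts. The sketch below is only a diagnostic under that assumption and is not one of the proposed algorithms; the per-area counts are hypothetical.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Variance-to-mean ratio of crime counts per area or per time window.

    Roughly 1 suggests Poisson-like counts, well above 1 suggests
    over-dispersion (clustering), well below 1 suggests regularity.
    """
    average = mean(counts)
    if average == 0:
        raise ValueError("all counts are zero; the index is undefined")
    return variance(counts) / average

even_counts = [2, 2, 2, 2]     # same count everywhere
skewed_counts = [0, 0, 0, 8]   # all crimes concentrated in one area
even_ratio = dispersion_index(even_counts)
skewed_ratio = dispersion_index(skewed_counts)
```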
To make our approach practically feasible and overcome pitfalls, we would validate our
approach analytically and experimentally. Figure 3 illustrates our validation setup to overcome
pitfalls and ensure the consistency of patterns discovered from spatio-temporal crime datasets.
Evaluation
As shown in Figure 3, the experimental and analytical evaluation of our proposed algorithms and
interest measures would involve testing them with both real and synthetic datasets. To minimize
potential pitfalls in our algorithms and interest measures, we would validate them extensively based
on different criteria. Specifically, we will answer questions such as: What are the high interest
zones (the parameter values for which a specific algorithm produces a large number of patterns
with high interest measure values)? What are the dominance zones (the parameter values for which
a specific algorithm is the fastest) among the different pruning strategies for large datasets? What
is the effect of the number of event types on the runtime of the algorithm? What is the effect of the
values of the different timing parameters provided as input to ST cascade algorithms on
their performance? What are the appropriate choices of timing parameters for different
problem characteristics?
Figure 3: Validation Methodology to overcome pitfalls
Deployment
We also propose to develop the proposed data mining algorithms and incorporate them into
an automated data mining framework such as the CRISP-DM process described above, thus resulting in an
easily usable tool for crime analysts. Our emphasis would be on implementing the algorithms in a
modular manner so that they can either be used as a standalone tool by crime analysts or be
integrated with existing software tools used by crime analysts, such as CrimeStat.
The challenges in realizing our proposed approaches are the following: a) the risk of generating
spurious patterns, b) exorbitant computational cost, c) the presence of missing information
or noisy data and d) integration with existing crime analysis tools like CrimeStat [24]. To address
these challenges we would explore the following: a) we would propose composite multi-dimensional
interest measures that are related to statistical measures proposed in spatial statistics, b) we would
propose scalable, computationally efficient, correct and complete algorithms to discover statistically
meaningful patterns, c) we would establish the correctness and completeness of the algorithms, d) we would
explore measures for the early discovery and removal of spurious patterns so as to prevent the
propagation of errors, and we would design composite multi-dimensional interest measures to achieve this,
and e) we would incorporate our algorithms as user-friendly tools which can be added as .NET
components to popular tools like CrimeStat.
One of the requirements of the proposed approach is the availability of real datasets for
validating our proposed composite multi-dimensional interest measures and algorithms. We would
obtain real datasets from the Lincoln city police department, Lincoln, NE.
Soundness of STDMP for GCSLA
As shown in Figure 4, data mining is a secondary or exploratory analysis technique which
assumes little about the dataset, so hypothesis-specific data collection need not be performed. This
greatly reduces the effort on the side of law enforcement agencies, which otherwise make several
primary hypotheses, collect the data and then further refine their hypotheses. Our proposed methods
only require data that has been collected without any type of primary hypotheses to discover useful
and interesting patterns. These patterns can be further analyzed by crime analysts for generating
more refined hypotheses.
Figure 4: Data Mining as a secondary or exploratory data analysis
Another dimension of soundness of data mining approaches is the statistical significance of the
interest measures and discovered patterns. To demonstrate the soundness of our spatio-temporal data
mining approach, we will evaluate the statistical significance of the proposed interest measures and
the correctness and completeness of our algorithms. To evaluate our proposed interest measures
we would relate them to well known statistical significance measures from spatial statistics, namely
the cross K-function [33], the space-time K-function [20] and the Knox index [22, 23]. The major
motivation behind proposing new interest measures is to achieve better computational performance
and scalability to large datasets than is provided by spatial statistical methods. To relate our
proposed interest measures to these statistics, we would prove that our interest measures are upper
bounds of the corresponding spatial statistical measures. For example, the Participation Index (PI)
is an interest measure proposed by Huang et al. [19], and it is related to the cross K-function
proposed by Ripley [33]. Figure 5 illustrates the relationship of the PI to the cross K-function: it
can be seen that the PI is an upper bound to the cross K-function. Patterns with high cross
K-function values therefore also have a high PI, so thresholding on the PI retains them, while the
monotonic nature of the PI contributes to computational efficiency.
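The upper-bound relationship can be checked numerically for a pair of event types. The following is a minimal sketch, not the proposal's algorithm. It assumes a Euclidean distance threshold `d` as the neighbor relation, uses the participation-ratio definition of the PI for a two-type pattern, and compares it with the cross K-function estimate without edge correction, divided by the study-area size (which reduces to the fraction of all (A, B) pairs that are neighbors). Function names and coordinates are hypothetical.

```python
import math


def neighbor_pairs(a_points, b_points, d):
    """Index pairs (i, j) with a_points[i] within distance d of b_points[j]."""
    return [(i, j)
            for i, a in enumerate(a_points)
            for j, b in enumerate(b_points)
            if math.dist(a, b) <= d]


def participation_index(a_points, b_points, d):
    """PI of the pattern {A, B}: the smaller of the two participation ratios,
    i.e. the fraction of A (resp. B) instances with at least one neighbor
    of the other type."""
    pairs = neighbor_pairs(a_points, b_points, d)
    ratio_a = len({i for i, _ in pairs}) / len(a_points)
    ratio_b = len({j for _, j in pairs}) / len(b_points)
    return min(ratio_a, ratio_b)


def normalized_cross_k(a_points, b_points, d):
    """Cross K-function estimate (no edge correction) divided by the area:
    neighboring (A, B) pairs over all possible (A, B) pairs."""
    pairs = neighbor_pairs(a_points, b_points, d)
    return len(pairs) / (len(a_points) * len(b_points))


# Hypothetical coordinates for two event types.
bars = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
assaults = [(0.5, 0.0), (0.0, 0.8), (5.2, 5.1), (20.0, 20.0)]

pi = participation_index(bars, assaults, d=1.0)
k = normalized_cross_k(bars, assaults, d=1.0)
```

The bound holds in this setting because each participating instance of A can be paired with at most all instances of B, so the number of neighboring pairs never exceeds the number of participating A instances times the number of B instances, and symmetrically for B.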
Figure 5: Participation Index upper bound to Cross K-Function
We would also explore a conceptual model of the pattern families extracted using our
spatio-temporal data mining approach. A conceptual model of a pattern family involves the creation of
a taxonomy of different types of patterns that are useful in different application domains and not
restricted to crime analysis. An example of a conceptual model is the model of ’Events and
Processes’ from domains like time geography [21]. We would explore conceptual models along similar
lines. Figure 6 illustrates the various phases involved in establishing a sound spatio-temporal data
mining approach and shows the role of conceptual models of patterns in our proposed approach.
These models help in identifying a taxonomy of different types of patterns that would be useful in
different application domains.
Figure 6: Steps to demonstrate the soundness of the proposed technical approach
The proposed project would be accomplished through the following tasks:
Task T1: Classify crimes and develop signatures for each crime type. We plan to provide a
generalized classification of crime types. For each crime type we further plan to specify the signature,
that is, the sequence of steps that are usually performed to commit the crime, based on domain
knowledge.
Task T2: Identify sources of data and transform data to be suitable for building data
mining models. We plan to identify data requirements for crime linkage analysis, data sources, and
methods and techniques to transform the data so that it is suitable for building data
mining models for geographic crime series linkage.
Task T3: Develop Spatio-Temporal Data Mining Algorithms for STDMP. We propose to develop
novel, scalable algorithms that consider micro-environmental information as well as spatio-temporal
characteristics of crime data while identifying links between crimes.
Task T4: Validate STDMP. We plan to validate the proposed approach using real-world data such as
the Lincoln city police department (Lincoln, NE) crime dataset, which contains incident data, dispatch
records and other environmental factors like the location of bars. We plan to consult domain experts
from criminal justice agencies such as the State of Minnesota’s Bureau of Criminal Apprehension,
various Police Departments and domain experts in Environmental Criminology.
Task T5: Deploy STDMP in crime analysis tools. We plan to implement the proposed novel
algorithms as modularized components that are easy to use as standalone tools as well as easy to
integrate into existing crime analysis tools like Crimestat.
4. Implications for Criminal Justice Policy and Practices
Our proposed approach of spatio-temporal data mining to identify geographically linked crimes,
previously unknown spatial relationships and geographic crime series aims to minimize the manual
effort and intervention required by automatically mining patterns from the data using novel
methods. Whenever a crime is committed, law enforcement officers may have to go through the
background information of a large number of past criminals to narrow down the number of suspects,
which is a time consuming task [17] that can be automated to save time. In Figure 7, the rectangle
denotes the universal set of all hypotheses.
Figure 7: Different Hypotheses of Crime (diagram regions: set of hypotheses solved by traditional
link analysis; set of hypotheses solved by geo-link analysis; set of hypotheses solved by both
approaches)
The set of hypotheses identified by traditional link analysis techniques is denoted by the
circle on the left. Link analysis techniques that take spatial properties into consideration identify
another set of hypotheses, shown in the circle on the right. Our aim is to identify the set of
hypotheses in the intersection of the two circles, which reduces the size of the set of hypotheses
and hence the manual effort. This will help practitioners to ensure timely action and policy
makers to formulate relevant policies based on geographic areas.
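The narrowing of hypotheses described above amounts to a set intersection. A minimal sketch follows, assuming each hypothesis is represented as a pair of incident identifiers that may be linked. The identifiers, set contents and function name are hypothetical.

```python
# Hypothetical candidate links, each a pair of incident identifiers.
traditional_links = {("c1", "c2"), ("c1", "c3"), ("c4", "c5"), ("c6", "c7")}
geo_links = {("c1", "c2"), ("c4", "c5"), ("c8", "c9")}


def shortlist(traditional, geographic):
    """Hypotheses supported by both traditional and geographic link analysis."""
    return traditional & geographic


both = shortlist(traditional_links, geo_links)

# Fraction of all candidate hypotheses that analysts no longer need to review.
reduction = 1 - len(both) / len(traditional_links | geo_links)
```

In this toy example two of the five distinct candidate links survive, so the set an analyst must review shrinks by 60 percent.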
Our team includes collaborators from the Minnesota Department of Public Safety, Bureau of
Criminal Apprehension (BCA), CriMNet Group Program Office. CriMNet [27], a part of BCA,
is a state-level program that works with Minnesota state and local agencies to make accurate and
comprehensive criminal justice information available to criminal justice professionals in law
enforcement. Specifically, CriMNet has a Name Event Index Service (NEIS) and a Comprehensive
Incident-Based Reporting System (CIBRS) [26] that collect, organize and link individuals,
incidents and events across multiple record systems used by multiple justice entities.
Collaborators from BCA will provide necessary information and contacts in the field, specifically
police departments in the state of Minnesota that might be potential users of the results of the
proposed research. They are enthusiastic to incorporate the developed approach into their systems
to aid crime analysts.
Policy makers can bring in relevant policy changes based on the discovery of new patterns
from our proposed spatio-temporal data mining approach to crime linkage analysis. For instance,
the Brazilian city of Diadema passed legislation to shut down bars early, leading to a reduction
in homicides by about half and a reduction in other crimes and events [10], as mentioned above in
the research methods relating to the discovery of unknown spatial relationships in crime.
5. Management Plan and Organization
We will measure the success of this project in terms of (i) successful research resulting in the
creation of new spatio-temporal data mining techniques, (ii) the building of new tools embodying
the new results, and their use by crime analysis experts, and (iii) the success in being able to
reduce the plausible set of hypotheses to solve crime.
The detailed project plan is given in Table 2.
Table 2: Project Task Schedule for Tasks T1-T5 described in Section 3

                                  Year 1                  Year 2
Quarters                          Q1    Q2    Q3    Q4    Q1    Q2    Q3    Q4
Scientific Approach
  STDMP for GCSLA                 T1    T2    T3    T3    T4    T4    T5    Final Report
Management Approach
  Progress Monitoring             Quarterly review of goals                 Final Report
  NIJ Reports                     Half Yearly progress report to NIJ        Final Report
The team, consisting of geographic information scientists and crime analysts, is capable of
carrying out the proposed tasks. They not only have strong track records in G.I.Sc., data management,
and human activity (e.g. crime) analysis but have also worked collaboratively. The PI, Dr.
Shashi Shekhar, is a leader in spatio-temporal data management and analysis. The Co-PI, Dr.
Jaideep Srivastava, is a leader in the area of Web Mining and Database Systems. Professor Richard
Block, PhD, Emeritus Professor of Sociology and Criminal Justice at Loyola University Chicago,
has been studying the relationship between crime and community for the last 30 years. The
collaborators are from the Minnesota Department of Public Safety, Bureau of Criminal Apprehension,
CriMNet Group Program Office. The CriMNet program office regularly involves subject matter
experts from the law enforcement community in research and analysis projects and proposes to do so
with this project. The researchers and collaborators make this team truly unique.

Table 3: Dissemination strategy

Deliverable                                                        Target Audience
Scholarly publications in Crime Analysis conferences and journals  Research Community
.NET components of algorithms in tools such as CrimeStat           Crime analysts and practitioners
Resulting Patterns                                                 Policy Makers
6. Dissemination Strategy
The new algorithms, techniques and tools would be disseminated at academic conferences in
data mining and spatio-temporal data analysis and at special crime mapping conferences like
MAPS (Mapping and Analysis for Public Safety) organized by the National Institute of Justice
(NIJ). Further, several of the techniques developed may be incorporated as tools to be used by state
agencies like the Minnesota Bureau of Criminal Apprehension and also as a part of spatial statistics
applications like Crimestat.
The dissemination strategy is shown in Table 3.
Appendix I
References
[1] R. Adderley. The use of data-mining techniques in operational crime fighting. In M. M. Kantardzic and J. Zurada, editors, Next Generation of Data-Mining Applications. John Wiley and Sons Inc., Hoboken, NJ, USA, 2005.
[2] Pieter Adriaans and Dolf Zantinge. Data mining. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1997.
[3] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, 1994.
[4] L. Anselin. Local indicators of spatial association-LISA. Geographical Analysis, 27(2):93–155, 1995.
[5] Mikhail Bilenko. Learnable Similarity Functions and Their Application to Record Linkage and Clustering. PhD thesis, Department of Computer Sciences, University of Texas at Austin, 2006.
[6] R. Block and C. R. Block. Place, space, and crime: A spatial analysis of liquor places. In J. Eck and D. Weisburd, editors, Crime and Place. Criminal Justice Press, 1996.
[7] R. Block and C. R. Block. Risky places: A comparison of the environs of rapid transit stations in Chicago and the Bronx. In J. Mollenkopf, editor, Analyzing Crime Patterns: Frontiers of Practice. Sage Publishing, 1999.
[8] Paul J. Brantingham and Patricia L. Brantingham. Environmental Criminology. Waveland Press, Long Grove, IL, USA, 1990.
[9] Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rudiger Wirth. CRISP-DM 1.0: Step-by-Step Data Mining Guide. CRISP-DM consortium: NCR Systems Engineering Copenhagen (USA and Denmark), DaimlerChrysler AG (Germany), SPSS Inc. (USA) and OHRA Verzekeringen en Bank Groep B.V (The Netherlands), 2000.
[10] Brazil city slashes crime by closing its bars early. San Francisco Chronicle. http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2006/05/10/MNGIOIOQ3M1.DTL, May 2006.
[11] R. V. Clarke and M. Felson. Introduction: Criminology, routine activity and rational choice. In R. V. Clarke and M. Felson, editors, Routine Activity and Rational Choice: Advances in Criminology Theory, volume 5. Transaction Publishers, Somerset, NJ, USA, 1993.
[12] L. E. Cohen and M. Felson. Social change and crime rate trends: A routine activity approach. American Sociological Review, 44:588–608, 1979.
[13] D. Cornish and R. V. Clarke. Introduction. In D. Cornish and R. V. Clarke, editors, The Reasoning Criminal. Springer-Verlag, 1985.
[14] John E. Eck et al. Mapping crime: Understanding hotspots. US National Institute of Justice (http://www.ncjrs.gov/pdffiles1/nij/209393.pdf), 2005.
[15] Ronald E. Wilson and Katie M. Filbert. Crime mapping and analysis. In Encyclopedia of GIS. Springer, 2008.
[16] M. Felson. Routine activities and crime prevention: Armchair concepts and practical action. Studies on Crime and Crime Prevention, 1:30–34, 1992.
[17] Bill McGarigle. Crime Profilers Gain New Weapons: Linkage analysis and geographic profiling systems get criminals where they live. http://www.vgin.virginia.gov/documents/articles/localgovt/Crime%20ProfilersGain New Weapons.htm, 1997.
[18] A. Getis and J. K. Ord. Local spatial statistics: An overview. In Spatial Analysis: Modelling in a GIS Environment, pages 261–277. GeoInformation International, Cambridge, England, 1996.
[19] Yan Huang, Shashi Shekhar, and Hui Xiong. Discovering co-location patterns from spatial datasets: A general approach. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(12):1472–1485, December 2004.
[20] Peter J. Diggle, A. G. Chetwynd, R. Häggkvist, and S. E. Morris. Second-order analysis of space-time clustering. Statistical Methods in Medical Research, 4(2):124–136, 1995.
[21] Harvey J. Miller. Time geography. In Encyclopedia of GIS. Springer, 2008.
[22] G. Knox. Detection of low density epidemicity. British Journal of Preventive and Social Medicine, 17(1):21–27, 1963.
[23] G. Knox. Epidemiology of childhood leukaemia in Northumberland and Durham. British Journal of Preventive and Social Medicine, 18:17–24, 1964.
[24] Ned Levine. CrimeStat 3.0: A Spatial Statistics Program for the Analysis of Crime Incident Locations. Ned Levine & Associates: Houston, TX / National Institute of Justice: Washington, DC, 2004.
[25] G. C. Oatley, J. Zeleznikow, and B. W. Ewart. Matching and predicting crimes. In A. Macintosh, R. Ellis, and T. Allen, editors, Applications and Innovations in Intelligent Systems XII (Proceedings of AI2004), pages 19–32, 2004.
[26] State of Minnesota Bureau of Criminal Apprehension. Comprehensive Incident Based Reporting System - CIBRS. http://www.bca.state.mn.us/cibrs/Documents/CIBRS%20Fact%20Sheet.pdf, 2007.
[27] State of Minnesota Bureau of Criminal Apprehension. CriMNet. http://www.crimnet.state.mn.us/Misc/AboutCrimnet.htm, 2007.
[28] Atsuyuki Okabe, KeiIchi Okunuki, and Shino Shiode. The SANET toolbox: New methods for network spatial analysis. Transactions in GIS, 10(4):535–550, 2006.
[29] P. J. Brantingham and P. L. Brantingham. Environmental criminology. Prospect Heights, IL: Waveland, 1991.
[30] J. Ratcliffe. Near repeat calculator. http://www.temple.edu/cj/misc/nr/access.asp?ac=emsub, 2007.
[31] Jerry H. Ratcliffe. The hotspot matrix: A framework for the spatio-temporal targeting of crime reduction. Police Practice and Research, 5(1):5–23, 2004.
[32] A. J. Reiss. Co-offending and criminal careers. In M. Tonry and N. Morris, editors, Crime and Justice: A Review of Research, volume 10. University of Chicago Press, 1988.
[33] B. D. Ripley. The second-order analysis of stationary point processes. Applied Probability, 13(2):55–66, 1976.
[34] S. A. Roach. The Theory of Random Clumping. Methuen, London, 1968.
[35] Caterina Gouvis Roman. Routine activities of youth and neighborhood violence: Spatial modeling of place, time and crime. In Fahui Wang, editor, Geographic Information Systems and Crime Analysis, chapter 17, pages 293–310. Idea Group, Hershey, PA, USA, 2005.
[36] D. K. Rossmo. Geographic Profiling. CRC Press, Boca Raton, FL, USA, 2000.
[37] Kim D. Rossmo, Ian Laverty, and Brad Moore. Geographic profiling for serial crime investigation. In Fahui Wang, editor, Geographic Information Systems and Crime Analysis, chapter 6, pages 102–117. Idea Group, Hershey, PA, USA, 2005.
[38] C. G. Salfati and A. L. Bateman. Serial homicide: an investigation of behavioural consistency. Journal of Investigative Psychology and Offender Profiling, 2:121–144, 2005.
[39] S. Shekhar, P. Schrater, R. Vatsavai, W. Wu, and S. Chawla. Spatial contextual classification and prediction models for mining geospatial data, 2002.
[40] Shashi Shekhar, Pusheng Zhang, Yan Huang, and Ranga Raju Vatsavai. Data Mining: Next Generation Challenges and Future Directions - Trends in Spatial Data Mining. AAAI Press, Menlo Park, CA, USA, 2004.
[41] S. Shiode and A. Okabe. Network variable clumping method for analyzing point patterns on a network. Unpublished paper presented at the Annual Meeting of the Association of American Geographers, Philadelphia, Pennsylvania, 2004.
[42] Xiaoning Yang, William M. Pottenger, and Stephen V. Zanias. Link Analysis Survey Status Update January 2006. Technical report, Lehigh University Computer Science and Engineering Department, 2007.