Crime Analysis using Data Analysis

CRIME ANALYSIS AND PREDICTION USING DATA MINING CHETAN HIREHOLI, M.TECH, SOFTWARE ENGINEERING

Transcript of Crime Analysis using Data Analysis

Page 1: Crime Analysis using Data Analysis

CRIME ANALYSIS AND PREDICTION USING DATA MINING

CHETAN HIREHOLI, M.TECH, SOFTWARE ENGINEERING

Page 2: Crime Analysis using Data Analysis

Data Mining, what is it?

Data mining is about finding new information in a lot of data.

• Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.

• Data mining software is one of a number of analytical tools for analyzing data.

Page 3: Crime Analysis using Data Analysis

Timeline

John W. Tukey, “The Future of Data Analysis”, 1962

Gregory Piatetsky-Shapiro organizes and chairs the first Knowledge Discovery in Databases (KDD) workshop, 1989

BusinessWeek publishes a cover story on “Database Marketing”, 1994

The term “data science” appears for the first time in the title of a conference, the IFCS meeting “Data science, classification, and related methods”, 1996

The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades… - Hal Varian, Google’s Chief Economist, 2009

Page 4: Crime Analysis using Data Analysis

Application and Trends…

• Financial Data Analysis

• Retail Industry

• Telecommunication Industry

• Biological Data Analysis

• Other Scientific Applications

• Intrusion Detection

Page 5: Crime Analysis using Data Analysis

Feel Good, Do Good!

“Crime Analysis and Prediction Using Data Mining”

Shiju Sathyadevan, Devan M.S and Surya Gangadharan. S, 2014 IEEE

Page 6: Crime Analysis using Data Analysis

Abstract

What is crime analysis? Crime analysis is a law enforcement function that involves the systematic identification and analysis of patterns and trends in crime and disorder.

The proposed system combines computer science and criminal justice to develop a data mining procedure that can help solve crimes faster.

Page 7: Crime Analysis using Data Analysis

Introduction

Only within the last few decades has technology made spatial data mining a practical, affordable, and available solution for a wide audience of law enforcement officials.

Huge volumes of data have to be collected from web sites, news sites, blogs, social media, RSS feeds, etc.

So the main challenge is to develop a better, more efficient tool for detecting crime patterns.

Page 8: Crime Analysis using Data Analysis

Doing analysis is a hard job!

Reasons for choosing clustering:

• Only known data is available to us.

• A classification technique will not predict well.

• The nature of crimes also changes over time.

So, to be able to detect newer and unknown patterns in the future, clustering techniques work better.

Page 9: Crime Analysis using Data Analysis

Steps in doing Crime Analysis

Data Collection

Classification

Pattern Identification

Prediction

Visualization

Page 10: Crime Analysis using Data Analysis

Related Work

Using Series Finder will get me more Films!

Series Finder was developed to find patterns in burglaries. It uses the modus operandi of the offender to extract the crime patterns that the offender followed. The algorithm constructs the modus operandi of the offender.

In your dreams… You can’t catch me! I’m KRISHH!

Page 11: Crime Analysis using Data Analysis

Methodology

Data Collection

• Data is collected from various sources such as news sites, blogs, social media, RSS feeds, etc.

• But the data we get is ‘VERY UNSTRUCTURED’! So how do we store it?

• The advantage of a NoSQL database over an SQL database is that it allows insertion of data without a predefined schema.

• It fits well with object-oriented programming, so it is easy to use and flexible.

• Unlike an SQL database, it does not need to know in advance what we are storing, its size, etc.
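A minimal sketch of what schema-free insertion means in practice, using plain Python dictionaries and JSON in place of a real NoSQL store. The field names, the sample records, and the helper names `insert` and `find` are invented for illustration and are not taken from the paper.

```python
import json

# Hypothetical in-memory "collection": a list of JSON-like documents.
# A real system would use a NoSQL document store instead.
collection = []

def insert(document):
    """Store a document as-is; no schema is declared beforehand."""
    # Round-trip through JSON to mimic how a document store serializes data.
    collection.append(json.loads(json.dumps(document)))

def find(field, value):
    """Return every document that has `field` set to `value`."""
    return [doc for doc in collection if field in doc and doc[field] == value]

# Two documents with different fields can live side by side.
insert({"source": "news", "title": "Chain snatching in Kharghar",
        "body": "A pillion rider snatched a gold chain."})
insert({"source": "rss", "title": "Burglary reported",
        "tags": ["burglary", "night"]})

print(len(collection))
print(find("source", "rss")[0]["tags"])
```

The second document carries a `tags` field that the first one lacks, and nothing had to be declared up front for that to work.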

Okay! Enough humor. Come, let's get serious and look into how it actually works!

Page 12: Crime Analysis using Data Analysis

Methodology

Classification

Naïve Bayes: a supervised learning method as well as a statistical method.

The algorithm classifies a news article into the crime type that it fits best, e.g. "What is the probability that a crime document D belongs to a given class C?"
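That question is usually answered with Bayes' rule plus the "naive" assumption that the words of a document are independent given the class. As a sketch, writing $w_1, \dots, w_n$ for the words of document $D$ and $\hat{C}$ for the chosen class (notation assumed here, not taken from the slides):

```latex
P(C \mid D) = \frac{P(C)\,P(D \mid C)}{P(D)}
\;\propto\; P(C) \prod_{i=1}^{n} P(w_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(w_i \mid C)
```

The denominator $P(D)$ is the same for every class, so it can be dropped when only the best-fitting class is needed.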

Thomas Bayes

Page 13: Crime Analysis using Data Analysis

Methodology

Classification

Naïve Bayes has its advantages:

• It is simple and converges more quickly than logistic regression.

• Compared to SVM (Support Vector Machine), it is easy to implement and performs well. With SVM, execution also slows down as the size of the training set increases.

• It needs only a small amount of training data to estimate the classification parameters.

• The zero-frequency problem is easy to fix, for example with add-one (Laplace) smoothing.

Page 14: Crime Analysis using Data Analysis

Methodology

Classification

Using the Naive Bayes algorithm, we create a model by training on crime data related to vandalism, murder, robbery, burglary, sex abuse, gang rape, arson, armed robbery, highway robbery, snatching, etc.

Test results show that Naive Bayes achieves more than 90% accuracy.

Page 15: Crime Analysis using Data Analysis

Pseudo code for Naïve Bayes
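As an illustration, a minimal multinomial Naive Bayes text classifier might look like the following. This is a sketch under assumptions, not the authors' pseudo code: the tiny training set, the whitespace tokenizer, and the function names `train` and `classify` are all invented here, and add-one (Laplace) smoothing is used so that an unseen word-class pair never yields a zero probability.

```python
import math
from collections import Counter, defaultdict

def train(documents):
    """documents: list of (text, label). Returns the counts the classifier needs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log posterior for `text`."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps the probability above zero.
            p = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, two crime types only.
training_data = [
    ("thief snatched gold chain from pedestrian", "snatching"),
    ("bike rider snatched purse", "snatching"),
    ("house broken into at night jewellery stolen", "burglary"),
    ("burglars broke lock and stole cash from house", "burglary"),
]
model = train(training_data)
print(classify("rider snatched a gold chain", model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.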

Page 16: Crime Analysis using Data Analysis

Methodology

Classification

Named Entity Recognition (NER), also known as entity extraction, finds and classifies elements in text into predefined categories such as person names, organizations, locations, dates, times, etc.

Sample NER
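To make the idea concrete, here is a deliberately simple rule-based extractor. It is only a toy stand-in for NER: production systems use trained statistical models, whereas this sketch relies on hand-written regular expressions and two hypothetical one-entry name lists. The sample sentence is adapted from the news item used in this deck.

```python
import re

# Hypothetical patterns and name lists, chosen only to fit the sample sentence.
PATTERNS = [
    ("TIME", re.compile(r"\b\d{1,2}[.:]\d{2}\s?(?:am|pm)\b", re.IGNORECASE)),
    ("MONEY", re.compile(r"\bRs\.?\s?\d+(?:,\d+)*")),
    ("DATE", re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")),
    ("LOCATION", re.compile(r"\bKharghar\b")),
    ("PERSON", re.compile(r"\bShakuntala Mande\b")),
]

def extract_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()  # order by position in the text
    return [(entity, label) for _, entity, label in found]

sample = ("The victim, Shakuntala Mande, was walking in Kharghar on Friday "
          "around 9.40am when her mangalsutra worth Rs 85,000 was snatched.")

for entity, label in extract_entities(sample):
    print(label, entity)
```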

Page 17: Crime Analysis using Data Analysis

Methodology

Classification

Coreference Resolution: find the referenced entities in a text.

Input, e.g.: A pillion bike rider snatched away a gold mangalsutra worth Rs 85,000 of a 60-year-old woman pedestrian in sector 19, Kharghar on Friday. The victim, Shakuntala Mande, was walking towards a vegetable outlet around 9.40am, when a bike came close to her and the pillion rider snatched her mangalsutra. A robbery case has been registered at Kharghar police station.

Page 18: Crime Analysis using Data Analysis

Methodology

Pattern Identification

• Apriori algorithm: used to determine association rules that highlight general trends.

• The result of this phase is the crime pattern for a particular place.

• Once we have a general crime pattern for a place, a new case that follows the same pattern indicates that the area has a chance of crime occurrence.

• Information about patterns helps police officials allocate resources effectively.
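The Apriori idea can be sketched in a few lines: grow frequent itemsets level by level, then turn them into association rules. The crime records, thresholds, and function names below are hypothetical and exist only to show the mechanics, not the attributes or settings used in the paper.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    result = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    result.append((antecedent, itemset - antecedent, confidence))
    return result

# Hypothetical records: place, crime type, time of day.
records = [
    {"Kharghar", "snatching", "morning"},
    {"Kharghar", "snatching", "morning"},
    {"Kharghar", "snatching", "evening"},
    {"Vashi", "burglary", "night"},
    {"Vashi", "burglary", "night"},
    {"Kharghar", "burglary", "night"},
]
frequent = apriori(records, min_support=0.3)
found_rules = rules(frequent, min_confidence=0.8)
for antecedent, consequent, confidence in found_rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

A rule such as "snatching -> Kharghar" with high confidence is the kind of place-specific pattern this phase is meant to surface.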

Page 19: Crime Analysis using Data Analysis

Methodology

Prediction

Decision tree: it is simple to understand and interpret! It is also robust and works well with large data sets.

Root node

Leaf node

Splitting ?
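One common way to decide where to split is information gain, as in ID3-style trees. The slides do not say which criterion the authors used, so treat the sketch below as an assumption; the rows and attribute names are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_split(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical historical records: did a crime occur at this place and time?
rows = [
    {"place": "ATM", "time": "night", "crime": "yes"},
    {"place": "ATM", "time": "day", "crime": "yes"},
    {"place": "temple", "time": "night", "crime": "no"},
    {"place": "temple", "time": "day", "crime": "no"},
    {"place": "bank", "time": "night", "crime": "yes"},
    {"place": "bank", "time": "day", "crime": "no"},
]
print(best_split(rows, ["place", "time"], "crime"))
```

The attribute chosen this way becomes the root node; the same procedure is then repeated on each branch until the subsets are pure enough to become leaf nodes.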

Page 20: Crime Analysis using Data Analysis

Methodology

Visualization

A heat map indicates the level of activity, usually with darker colors for low activity and brighter colors for high activity.
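The data behind a heat map is just a count of incidents per grid cell. A minimal sketch, assuming incidents are given as hypothetical (x, y) coordinates; the text rendering below uses denser characters for busier cells as a stand-in for colors.

```python
from collections import Counter

def heat_grid(points, cell_size):
    """Count incidents per grid cell; points are (x, y) coordinates."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

def render(counts, width, height):
    """Draw the grid as text, top row first, from ' ' (quiet) to '#' (busiest)."""
    shades = " .:*#"
    peak = max(counts.values(), default=1)
    lines = []
    for row in range(height - 1, -1, -1):
        line = ""
        for col in range(width):
            level = counts.get((col, row), 0) * (len(shades) - 1) // peak
            line += shades[level]
        lines.append(line)
    return "\n".join(lines)

# Hypothetical incident coordinates.
incidents = [(0.5, 0.5), (0.7, 0.2), (0.9, 0.9), (2.5, 1.5), (1.2, 0.3)]
grid = heat_grid(incidents, cell_size=1.0)
print(render(grid, width=3, height=2))
```

A plotting library would map the same per-cell counts to a color scale instead of characters.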

Page 21: Crime Analysis using Data Analysis

Methodology

Visualization

The main locations in India are plotted on the x-axis and the crime rate on the y-axis.

The graph shows the regions that have the maximum crime rate.

The data plotted here is based on historical records.

Page 22: Crime Analysis using Data Analysis

Methodology

Visualization

Shows the rate/percentage of crime occurrence in places such as airports, temples, bus stations, railway stations, banks, casinos, jewelry shops, bars, ATMs, highways, etc.

The main spots (temple, bank, bus station, railway station, ATM, etc.) are plotted on the x-axis and the crime rate on the y-axis.

Page 23: Crime Analysis using Data Analysis

Future Work

Criminal Profiling

Helps crime investigators record the characteristics of criminals. The main goals of criminal profiling are:

• To provide crime investigators with a social and psychological assessment of the offender

• To evaluate belongings found in the possession of the offender

To do this, the maximum details of each criminal are collected from criminal records and the modus operandi is determined.

Page 24: Crime Analysis using Data Analysis

Future Work

Criminal Profiling

Sifting through each crime record after a particular crime occurs is a tedious task.

So instead we can use visualization mechanisms to represent the criminal details in a human-understandable form.

Page 25: Crime Analysis using Data Analysis

Future Work

Criminal Profiling

Page 26: Crime Analysis using Data Analysis

Conclusion

Data Collection

• Web sites, news channels, blogs, etc.

Classification

• Using Naïve Bayes, a predictor is created

Pattern Identification

• Apriori Algorithm

Prediction

• Decision Tree

Visualization

• Neo4j

• GraphDB