Crime Analysis using Data Analysis
CRIME ANALYSIS AND PREDICTION USING DATA MINING
CHETAN HIREHOLI, M.TECH, SOFTWARE ENGINEERING
Data Mining, what is it?
Data mining is about finding new information in a lot of data.
• Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.
• Data mining software is one of a number of analytical tools for analyzing data.
Timeline
John W. Tukey, “The Future of Data Analysis”, 1962
Gregory Piatetsky-Shapiro organizes and chairs the first Knowledge Discovery in Databases (KDD) workshop, 1989
BusinessWeek publishes a cover story on “Database Marketing”, 1994
The term “data science” is included in the title of a conference for the first time (“Data science, classification, and related methods”, IFCS), 1996
The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades… - Hal Varian, Google’s Chief Economist, 2009
Application and Trends…
• Financial Data Analysis
• Retail Industry
• Telecommunication Industry
• Biological Data Analysis
• Other Scientific Applications
• Intrusion Detection
Feel Good, Do Good!
“Crime Analysis and Prediction Using Data Mining”
Shiju Sathyadevan, Devan M.S and Surya Gangadharan. S, 2014 IEEE
Abstract
What is crime analysis? Crime analysis is a law enforcement function that involves systematically identifying and analyzing patterns and trends in crime and disorder.
The proposed system takes an approach that bridges computer science and criminal justice to develop a data mining procedure that can help solve crimes faster.
Introduction
It is only within the last few decades that technology has made spatial data mining an affordable, available and practical solution for a wide audience of law enforcement officials.
Huge volumes of data have to be collected: web sites, news sites, blogs, social media, RSS feeds etc.
So the main challenge in front of us is developing a better, more efficient tool to detect crime patterns effectively.
Doing analysis is a hard job!
The reason for choosing clustering:
• We only have known (existing) data, so a classification technique will not predict new patterns well
• Also, the nature of crimes changes over time
So, in order to detect newer and unknown patterns in the future, clustering techniques work better.
Steps in doing Crime Analysis
Data Collection
Classification
Pattern Identification
Prediction
Visualization
Related Work
Using Series Finder will get me more Films!
Series Finder is used for finding patterns in burglaries. To achieve this, its authors used the modus operandi of the offender: the algorithm constructs the offender's modus operandi and extracts the crime patterns that the offender followed.
In your dreams… You can’t catch me! I’m KRISHH!
Methodology
Data Collection
• Collecting data from various sources like news sites, blogs, social media, RSS feeds etc.
• But the data we got is ‘VERY UNSTRUCTURED’! So how do we store it?
• The advantage of a NoSQL database over an SQL database is that it allows insertion of data without a predefined schema.
• It fits object-oriented programming, hence it is easy to use and flexible.
• Unlike an SQL database, it does not need to know in advance what we are storing, its size etc.
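The schema-free idea can be illustrated without a database server. The following is a minimal sketch that uses plain Python dictionaries serialized as JSON as a stand-in for a document store; the records and field names are made up for illustration and are not from the paper.

```python
import json

# Hypothetical scraped records: every source yields a different set of fields.
documents = [
    {"source": "news", "title": "Chain snatching in Kharghar"},
    {"source": "blog", "text": "Burglary reported near the bus station", "tags": ["burglary"]},
    {"source": "rss", "title": "ATM robbery", "location": {"city": "Mumbai"}},
]

# Each record is stored as-is; no shared column list or field size is declared.
stored = [json.dumps(doc) for doc in documents]

# Reading back: filter on a field that only some records carry.
loaded = [json.loads(line) for line in stored]
with_location = [doc for doc in loaded if "location" in doc]
print(len(loaded), "records,", len(with_location), "with a location")
```

In a relational table the same three records would need either a wide schema with many empty columns or a schema change whenever a new field appears.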
Okay! Enough of humor. Come, let’s get serious and look into how it actually works!
Methodology
Classification
• Naïve Bayes: a supervised learning method as well as a statistical method
• The algorithm classifies a news article into the crime type it fits best, e.g. “What is the probability that a crime document D belongs to a given class C?”
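That question is answered with Bayes' rule. As a sketch, assuming the document D is treated as a sequence of words w_1, …, w_n that are conditionally independent given the class (the notation is an assumption for illustration, not taken from the slides):

```latex
P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)}
\;\propto\; P(C)\prod_{i=1}^{n} P(w_i \mid C),
\qquad
\hat{C} = \arg\max_{C}\; P(C)\prod_{i=1}^{n} P(w_i \mid C)
```

Since P(D) is the same for every class, it can be dropped when choosing the best-fitting crime type.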
Thomas Bayes
Methodology
Classification
Naïve Bayes has its advantages:
• Simple, and converges quicker than logistic regression.
• Compared to SVM (Support Vector Machine), it is easy to implement and performs well. Also, with SVM the speed of execution decreases as the size of the training set increases.
• Needs only a small amount of training data to estimate the classification parameters.
• The zero-frequency problem is also easy to fix, for example with Laplace (add-one) smoothing!
Methodology
Classification
Using the Naive Bayes algorithm, we create a model by training on crime data related to vandalism, murder, robbery, burglary, sex abuse, gang rape, arson, armed robbery, highway robbery, snatching etc.
Test results show that Naive Bayes achieves more than 90% accuracy!
Pseudo code for Naïve Bayes
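As a hedged illustration of the idea, here is a minimal Naive Bayes text classifier in Python. It is a sketch, not the paper's implementation: the function names, the whitespace tokenization, the add-one smoothing and the toy training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log-posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing: an unseen word never yields probability 0.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set, not the data used in the paper.
training = [
    ("masked men robbed the bank at gunpoint", "robbery"),
    ("thief snatched a gold chain and robbed a pedestrian", "robbery"),
    ("burglars broke into a locked house at night", "burglary"),
    ("house burgled while the family was away", "burglary"),
    ("miscreants set a parked car on fire", "arson"),
]
model = train(training)
print(classify("two men robbed a jewelry shop", model))
print(classify("locked house broken into at night", model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.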
Methodology
Classification
Named Entity Recognition (NER), also known as entity extraction, finds and classifies elements in text into predefined categories such as person names, organizations, locations, dates, times etc.
Sample NER
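To make the idea concrete, here is a toy sketch of entity tagging in Python. It assumes a hand-written gazetteer and a few regular expressions, which is far simpler than a trained NER model; the lists, patterns and the `tag_entities` name are hypothetical.

```python
import re

# Hypothetical gazetteer; a real NER system learns these from annotated text.
GAZETTEER = {
    "PERSON": ["Shakuntala Mande"],
    "LOCATION": ["Kharghar"],
}
# Simple patterns for categories with a regular surface form.
PATTERNS = {
    "TIME": r"\b\d{1,2}\.\d{2}\s?(?:am|pm)\b",
    "MONEY": r"\bRs\s?[\d,]+\b",
    "DATE": r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b",
}

def tag_entities(text):
    """Return (entity, label) pairs in order of appearance in the text."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), match.group(), label))
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.group(), label))
    return [(entity, label) for _, entity, label in sorted(found)]

sentence = ("A pillion bike rider snatched away a gold mangalsutra worth Rs 85,000 "
            "of a 60-year-old woman pedestrian in sector 19, Kharghar on Friday.")
for entity, label in tag_entities(sentence):
    print(label, "->", entity)
```

A production system would use a trained statistical or neural tagger instead of fixed lists, but the output has the same shape: spans of text paired with a category.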
Methodology
Classification
Coreference Resolution: finds the entities that different expressions in a text refer to.
Input, e.g.: A pillion bike rider snatched away a gold mangalsutra worth Rs 85,000 of a 60-year-old woman pedestrian in sector 19, Kharghar on Friday. The victim, Shakuntala Mande, was walking towards a vegetable outlet around 9.40am, when a bike came close to her and the pillion rider snatched her mangalsutra. A robbery case has been registered at Kharghar police station.
Methodology
Pattern Identification
• Apriori algorithm: used to determine association rules which highlight general trends
• The result of this phase is the crime pattern for a particular place.
• After getting a general crime pattern for a place, when a new case arrives and follows the same crime pattern, we can say that the area has a chance of crime occurrence.
• Information regarding patterns helps police officials to deploy resources in an effective manner.
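The frequent-pattern step can be sketched as follows. This is a minimal Apriori implementation in Python under stated assumptions: each crime record is reduced to a set of attributes, and the records, attribute names and `min_support` value are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical crime records, each reduced to a set of attributes.
records = [
    {"bus station", "evening", "snatching"},
    {"bus station", "evening", "snatching"},
    {"bus station", "morning", "pickpocketing"},
    {"atm", "night", "robbery"},
    {"bus station", "evening", "robbery"},
]
frequent = apriori(records, min_support=0.4)

# Confidence of the rule {bus station, evening} -> {snatching}.
pair = frozenset({"bus station", "evening"})
triple = pair | {"snatching"}
confidence = frequent[triple] / frequent[pair]
print(sorted(triple), "support", frequent[triple], "confidence", round(confidence, 2))
```

A rule with high support and confidence for a place is the kind of "general crime pattern" a new case can then be compared against.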
Methodology
Prediction
Decision tree: it is simple to understand and interpret! It is robust, and it also works well with large data sets.
Root node
Leaf node
Splitting ?
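One common way to decide where to split is information gain, as in ID3-style trees. The slides do not say which criterion the paper uses, so the sketch below is an assumption, and the rows and attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction achieved by splitting rows on the given attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_split(rows, attributes, target):
    """Pick the attribute with the highest information gain for the root node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical historical records: did a crime occur at this place and time?
rows = [
    {"place": "atm", "time": "night", "crime": "yes"},
    {"place": "atm", "time": "day", "crime": "yes"},
    {"place": "temple", "time": "night", "crime": "no"},
    {"place": "temple", "time": "day", "crime": "no"},
    {"place": "bank", "time": "night", "crime": "yes"},
    {"place": "bank", "time": "day", "crime": "no"},
]
root = best_split(rows, ["place", "time"], "crime")
print("split the root node on:", root)
```

The chosen attribute becomes the root node; the same procedure is repeated on each branch until the subsets are pure, and those pure subsets become the leaf nodes.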
Methodology
Visualization
A heat map indicates the level of activity, usually with darker colors for low activity and brighter colors for high activity.
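The data behind such a heat map can be sketched as a grid of normalized counts. The following Python example assumes incidents are given as (x, y) map coordinates; the coordinates, the cell size and the `heat_grid` name are hypothetical, and a plotting library would normally render the colors.

```python
from collections import Counter

def heat_grid(points, cell_size, width, height):
    """Bin (x, y) incident locations into cells and scale counts to 0.0-1.0."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    cols, rows = width // cell_size, height // cell_size
    peak = max(counts.values(), default=1)
    return [[counts[(c, r)] / peak for c in range(cols)] for r in range(rows)]

# Hypothetical incident coordinates on a 4x4 map divided into 2x2 cells.
incidents = [(0.5, 0.5), (1.2, 0.7), (1.9, 1.1), (3.1, 0.4), (2.5, 3.5)]
grid = heat_grid(incidents, cell_size=2, width=4, height=4)

# Text rendering: ' ' marks the lowest activity and '#' the highest.
shades = " .:#"
for row in grid:
    print("".join(shades[round(value * (len(shades) - 1))] for value in row))
```

Each cell value is the incident count divided by the busiest cell's count, so it maps directly onto a color scale.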
Methodology
Visualization
• The main locations in India are plotted on the x-axis and the crime rate on the y-axis.
• The graph shows the regions which have the maximum crime rate.
• The data plotted here is based on historical records.
Methodology
Visualization
• Shows the rate/percentage of crime occurrence in places like airports, temples, bus stations, railway stations, banks, casinos, jewelry shops, bars, ATMs, highways etc.
• The main spots like temple, bank, bus station, railway station, ATM etc. are plotted on the x-axis and the rate of crime on the y-axis.
Future Work
Criminal Profiling
Helps crime investigators to record the characteristics of criminals. The main goals of criminal profiling are:
• To provide crime investigators with a social and psychological assessment of the offender
• To evaluate belongings found in the possession of the offender
For this, as many details as possible about each criminal are collected from criminal records, and the modus operandi is determined.
Future Work
Criminal Profiling
Sifting through each crime record after a particular crime occurrence is a tedious task. So instead we can use visualization mechanisms to represent the criminal details in a human-understandable form.
Future Work
Criminal Profiling
Conclusion
Data Collection
• Web sites, news channels, blogs, etc.
Classification
• Using the Naïve Bayes classifier, a predictor is created
Pattern Identification
• Apriori Algorithm
Prediction
• Decision Tree
Visualization
• Neo4j
• GraphDB