San Francisco Crime Classification
Uploaded by sai-praneeth-reddy (Category: Data & Analytics)
Transcript of San Francisco Crime Classification
![Page 1: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/1.jpg)
SAN FRANCISCO CRIME CLASSIFICATION
Sai Praneeth
![Page 2: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/2.jpg)
Project Outline
1. Problem Identification
2. Data Understanding & Cleansing
3. Data Visualization
4. Prediction Methodologies
5. Validation & Scoring
![Page 3: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/3.jpg)
Problem Identification
Current State
• The current crime index of S.F. is 3 (safer than 3% of the cities in the US).
• 67.67 annual crimes per 1,000 residents.
• There is no model to predict crimes based on location and time.
Future State
• A model that predicts crime based on date, time, and location.
• Help the corrections department take appropriate corrective measures based on our model.
• What are the different metrics that influence response?
• Is the data enough to give us a clear picture of crime committed?
• What kind of model best fits the data?
![Page 4: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/4.jpg)
Problem Statement
• Given time and location, you must predict the category of
crime that occurred.
• This competition's dataset provides nearly 12 years of crime
reports from across all of San Francisco's neighborhoods.
• It also encourages us to explore the dataset visually.
![Page 5: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/5.jpg)
Data Overview
• Timestamp
• Category (different crimes)
• Description
• Resolution
• Day of Week
• PdDistrict
• Address
• Longitude & Latitude
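To make the fields above concrete, here is a minimal sketch that reads one record into a typed structure. It assumes the column names used in the Kaggle `train.csv` file (`Dates`, `Category`, `Descript`, `DayOfWeek`, `PdDistrict`, `Resolution`, `Address`, `X`, `Y`); the sample row is illustrative, and `CrimeRecord` and `parse_records` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class CrimeRecord:
    timestamp: str      # "Dates" column (assumed name)
    category: str       # crime category: the value the model must predict
    description: str    # "Descript" column (assumed name)
    day_of_week: str
    pd_district: str
    resolution: str
    address: str
    longitude: float    # "X" column (assumed name)
    latitude: float     # "Y" column (assumed name)


def parse_records(text: str) -> List[CrimeRecord]:
    """Parse CSV text with the assumed header into CrimeRecord objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        CrimeRecord(
            timestamp=row["Dates"],
            category=row["Category"],
            description=row["Descript"],
            day_of_week=row["DayOfWeek"],
            pd_district=row["PdDistrict"],
            resolution=row["Resolution"],
            address=row["Address"],
            longitude=float(row["X"]),
            latitude=float(row["Y"]),
        )
        for row in reader
    ]


# Illustrative sample in the assumed layout (quoted field contains a comma).
SAMPLE = """Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y
2015-05-13 23:53:00,WARRANTS,WARRANT ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.4259,37.7746
"""
```

Note that `Description` and `Resolution` are recorded after the fact, so a model meant to predict the category from date, time, and location would normally leave them out of the inputs.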
![Page 6: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/6.jpg)
Data Cleansing and Manipulation
Cleaning the Data
• Check for missing values
• Check for entry errors
• Check for duplicates
• Check for outliers
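The four cleaning checks can be sketched in plain Python as below. This is a minimal illustration, not the author's procedure. It assumes rows are dicts keyed by the assumed Kaggle column names and treats a non-numeric coordinate as the example of an entry error. It also uses a rough, hypothetical San Francisco bounding box to flag outliers.

```python
# Rough bounding box for San Francisco: approximate values assumed for illustration only.
SF_LON_RANGE = (-122.52, -122.35)
SF_LAT_RANGE = (37.70, 37.84)


def clean(rows):
    """Drop bad rows and count why each was dropped.

    rows: list of dicts with string values, including "X" (longitude) and "Y" (latitude).
    Returns (kept_rows, report).
    """
    seen = set()
    kept = []
    report = {"missing": 0, "entry_error": 0, "duplicate": 0, "outlier": 0}
    for row in rows:
        # 1. Missing values: any empty or None field.
        if any(v is None or str(v).strip() == "" for v in row.values()):
            report["missing"] += 1
            continue
        # 2. Entry errors: here, coordinates that do not parse as numbers.
        try:
            lon, lat = float(row["X"]), float(row["Y"])
        except ValueError:
            report["entry_error"] += 1
            continue
        # 3. Duplicates: a record identical to one already seen.
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicate"] += 1
            continue
        seen.add(key)
        # 4. Outliers: coordinates outside the assumed bounding box.
        in_lon = SF_LON_RANGE[0] <= lon <= SF_LON_RANGE[1]
        in_lat = SF_LAT_RANGE[0] <= lat <= SF_LAT_RANGE[1]
        if not (in_lon and in_lat):
            report["outlier"] += 1
            continue
        kept.append(row)
    return kept, report
```

The order of the checks matters: a row is counted under the first check it fails, so the report shows one reason per dropped row.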
Manipulating the Data
• Timestamp
• Address
• Longitude & Latitude
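One way to manipulate the timestamp and address is to derive simple features from them, sketched below. It assumes timestamps look like `2015-05-13 23:53:00`. It also assumes addresses are written either as an intersection (`OAK ST / LAGUNA ST`) or as a block (`800 Block of BRYANT ST`). Both formats are assumptions about the raw data, and the function names are hypothetical.

```python
from datetime import datetime

# Explicit weekday names avoid locale-dependent strftime("%A") output.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def timestamp_features(ts: str) -> dict:
    """Split a timestamp in the assumed 'YYYY-MM-DD HH:MM:SS' format into model inputs."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        # Can be cross-checked against the Day of Week column.
        "weekday": WEEKDAYS[dt.weekday()],
    }


def address_features(address: str) -> dict:
    """Flag the two assumed address styles: street intersections and street blocks."""
    return {
        "is_intersection": "/" in address,
        "is_block": "Block of" in address,
    }
```

Splitting the timestamp lets a tree branch on, for example, late-night hours directly instead of on a raw string.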
![Page 7: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/7.jpg)
Data Visualization
![Page 8: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/8.jpg)
Data Visualization
![Page 9: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/9.jpg)
Data Visualization
![Page 10: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/10.jpg)
Data Visualization
![Page 11: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/11.jpg)
Data Visualization
![Page 12: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/12.jpg)
Variables Selection & Data Partition
• Data partition: 60:40 (training:validation)
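A 60:40 partition can be sketched as follows. The slides do not say whether the split was stratified. This sketch assumes a stratified split, which keeps each crime category's proportion roughly equal in both partitions and tends to matter when some categories are rare. `stratified_partition` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_partition(rows, label_of, train_fraction=0.6, seed=42):
    """Split rows into (train, validation), applying train_fraction within each label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, valid = [], []
    for label in sorted(by_label):  # sorted so the shuffle order does not depend on dict order
        group = by_label[label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid
```

Because each category is rounded separately, the overall split can drift slightly from exactly 60:40 when category counts are small.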
![Page 13: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/13.jpg)
Project Diagram
![Page 14: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/14.jpg)
1. Decision Tree (Two-way split)
• This decision tree uses the typical two-way split.
• In the properties panel, the method was changed to Assessment and the assessment measure was changed to Decision, since we are classifying a categorical target variable.
![Page 15: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/15.jpg)
1. Decision Tree (Two-way split)
• Most important variable for split -> Zip code
• No. of leaves in the pruned tree -> 6
• Validation misclassification -> 0.273474
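Validation misclassification is the fraction of validation records whose predicted category differs from the actual one. A value of 0.273474 therefore means roughly 27% of validation records were assigned the wrong category. A minimal sketch of the calculation, with a hypothetical function name:

```python
def misclassification_rate(actual, predicted):
    """Fraction of positions where the predicted label differs from the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)
```

For example, `misclassification_rate(["A", "B", "C", "A"], ["A", "C", "C", "B"])` returns `0.5`, since two of the four predictions are wrong.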
![Page 16: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/16.jpg)
1. Decision Tree (Two-way split)
![Page 17: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/17.jpg)
2. Decision Tree (Three-way splits)
• This decision tree allows three-way splits.
• In the properties panel, we changed the maximum branch to 3 and kept the same assessment criteria.
• This greatly increased model accuracy: validation misclassification dropped from 0.273474 (two-way) to 0.134316.
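Why a larger maximum branch can help is easiest to see on a categorical split variable such as zip code. The toy sketch below is an illustration only. It uses placeholder zip codes and invented counts, and `branch_misclassification` is a hypothetical helper. Each branch predicts its most frequent category, and three distinct areas with three distinct dominant crimes cannot be separated by a single two-way split.

```python
from collections import Counter


def branch_misclassification(rows, branches):
    """rows: (zip_code, category) pairs; branches: list of sets of zip codes, one per branch.

    Each branch acts as a leaf that predicts its most frequent category.
    Returns the fraction of rows that leaf predictions get wrong.
    """
    errors = 0
    for branch in branches:
        counts = Counter(cat for z, cat in rows if z in branch)
        if counts:
            errors += sum(counts.values()) - counts.most_common(1)[0][1]
    return errors / len(rows)


# Toy data: placeholder zip codes, each dominated by one category.
ROWS = ([("Z1", "THEFT")] * 4 + [("Z2", "ASSAULT")] * 4 + [("Z3", "VANDALISM")] * 4)

two_way = branch_misclassification(ROWS, [{"Z1"}, {"Z2", "Z3"}])
three_way = branch_misclassification(ROWS, [{"Z1"}, {"Z2"}, {"Z3"}])
```

Here `two_way` is 1/3 and `three_way` is 0. A binary tree could reach the same partition with a second split further down, so in a real model the gain also depends on depth limits and pruning.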
![Page 18: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/18.jpg)
2. Decision Tree (Three-way splits)
• Most important variable for split -> Zip codes
• No. of leaves in the pruned tree -> 7
• Validation misclassification -> 0.134316
![Page 19: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/19.jpg)
2. Decision Tree (Three-way splits)
![Page 20: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/20.jpg)
3. Gradient Boosting
• “Gradient boosting is a boosting approach that resamples the data set several times to generate results that form a weighted average of the resampled data set. Tree boosting creates a series of decision trees which together form a single predictive model”
• Here the assessment measure is misclassification.
• The train proportion is 60%.
• Most important variable for split -> PdDistrict
• Validation misclassification -> 0.34221
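The "series of decision trees" in the quote can be illustrated with the textbook form of gradient boosting. Each new small tree is fitted to the residuals of the ensemble so far, and its output is added with a learning rate. The sketch below is a simplification, not the tool's implementation. It uses a numeric target, squared-error loss, and one-split trees (stumps) on a single input. Classifying crime categories would instead use a multiclass loss. All names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree on x that minimizes squared error against the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds between distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    if best is None:  # all x identical: no split possible, predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Sequentially add stumps, each fitted to the current residuals (squared-error loss)."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict
```

With `xs = [1, 2, 3, 4, 5, 6]` and `ys = [1, 1, 1, 5, 5, 5]`, 100 rounds at a learning rate of 0.1 bring predictions within about 0.0001 of the targets. The small learning rate trades more rounds for a smoother, less overfit ensemble.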
![Page 21: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/21.jpg)
4. Ensemble model
• Combination of all four models.
• Validation misclassification -> 0.141683
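The slides do not state how the models were combined. One common rule, assumed here, is to average each model's posterior probability for every category and predict the category with the highest average. The probabilities and `average_ensemble` below are illustrative only.

```python
def average_ensemble(prob_tables):
    """prob_tables: one dict per model mapping category -> posterior probability for one record.

    Returns (predicted_category, averaged_probabilities).
    """
    categories = sorted(set().union(*prob_tables))  # sorted so ties break deterministically
    avg = {
        c: sum(table.get(c, 0.0) for table in prob_tables) / len(prob_tables)
        for c in categories
    }
    best = max(categories, key=lambda c: avg[c])
    return best, avg


# Illustrative posteriors for a single record from three models.
label, avg = average_ensemble([
    {"THEFT": 0.6, "ASSAULT": 0.4},
    {"THEFT": 0.3, "ASSAULT": 0.7},
    {"THEFT": 0.5, "ASSAULT": 0.5},
])
```

Averaging tends to smooth out individual models' errors. It can also pull a strong model toward weaker ones, which would be consistent with the ensemble (0.141683) scoring slightly worse than the best single tree.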
![Page 22: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/22.jpg)
Model Comparison
• The best model is the three-way decision tree, with a misclassification rate of 0.135668.
• The model improved drastically after latitude and longitude were converted to zip codes.
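The slides do not say how coordinates were converted to zip codes. An accurate conversion would test each point against ZIP boundary polygons or use a reverse-geocoding service. The sketch below shows a simpler approximation: assign each point to the nearest zip-code centroid by great-circle (haversine) distance. The centroid names and coordinates are hypothetical placeholders, not real ZIP data.

```python
import math

# Hypothetical centroids: placeholder names and coordinates, NOT real ZIP code data.
ZIP_CENTROIDS = {
    "ZIP_A": (37.78, -122.42),
    "ZIP_B": (37.76, -122.44),
    "ZIP_C": (37.73, -122.39),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_zip(lat, lon, centroids=ZIP_CENTROIDS):
    """Return the key of the centroid closest to (lat, lon)."""
    return min(centroids, key=lambda z: haversine_km(lat, lon, *centroids[z]))
```

The result is a categorical variable, so a tree can branch on whole areas at once rather than searching for thresholds on two correlated continuous coordinates.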
![Page 23: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/23.jpg)
Betterment of Model
• Demographics Data Inclusion
• Time Series Analysis
![Page 24: San Francisco Crime Classification](https://reader031.fdocuments.us/reader031/viewer/2022030217/5887125c1a28abf2228b6553/html5/thumbnails/24.jpg)
Questions
THANK YOU