Georgetown Data Analytics Project (Team DC)
Uploaded by: noah-turner
Category: Data & Analytics
TEAM DC: Georgetown Data Analytics: Exploring DC crime with neighborhood cluster analysis, interactive mapping tool, and regression analysis.
Problem
DC property owners and metro area residents are at greater risk of becoming victims of crime if they are unaware of the frequency and types of crime in their neighborhoods of interest.
Imperfect information - A person visiting or buying a home in an unfamiliar neighborhood faces imperfect information regarding crime and its impact on safety and property values.
Began with:
• DC Crime Data, 2011-2014
• DC Neighborhood Data (US Census Data), 2000-2012
Given this data, we used the most recent property values and population numbers (2012) from the DC Neighborhood Data, since they best represent the current state.
Data Munging/Exploration: Problems
DC Crime Data:
1. Did not include neighborhood names; geographic location was given only as cluster, block number, and latitude/longitude coordinates.
2. Through visualization in Tableau, we discovered that the latitude/longitude coordinates were out of date and had to be updated before our mapping tool could read them.
3. Provided only a single long date field (MM/DD/YYYY HH:MM:SS) rather than separate month, day, and time-of-day fields.
Data Munging/Exploration: Solutions to problems
Long Date problem:
• MonthName function (Excel or Access) to populate the month field
• DayName function (Excel or Access) to populate the day field
• Right function (Excel or Access) to parse out the time of day
• Updated the Year field by individual year conversion (SQL)
SQL Example to update year field from long date:
UPDATE [CrimeData_2011-2014]
SET [CrimeData_2011-2014].REPORT_YEAR = "2014"
WHERE [CrimeData_2011-2014].REPORT_DAT LIKE "*2014*";
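For comparison, the same month, day, time, and year fields can be derived in one pass by parsing the long date. This is a minimal Python sketch, not the team's method: it assumes the long date uses a 24-hour `MM/DD/YYYY HH:MM:SS` layout, and the output field names `MONTH`, `DAY`, and `TIME` are hypothetical (only `REPORT_DAT` and `REPORT_YEAR` appear in the source).

```python
from datetime import datetime

# Assumed layout of the long date, e.g. "03/15/2014 14:30:00" (24-hour clock).
LONG_DATE_FORMAT = "%m/%d/%Y %H:%M:%S"


def split_long_date(report_dat: str) -> dict:
    """Derive year, month name, day name, and time of day from one long date."""
    parsed = datetime.strptime(report_dat, LONG_DATE_FORMAT)
    return {
        "REPORT_YEAR": str(parsed.year),     # stored as text, like the SQL update
        "MONTH": parsed.strftime("%B"),      # hypothetical field: full month name
        "DAY": parsed.strftime("%A"),        # hypothetical field: full weekday name
        "TIME": parsed.strftime("%H:%M:%S"),  # hypothetical field: time of day
    }


fields = split_long_date("03/15/2014 14:30:00")
print(fields)
```

Parsing the date avoids one UPDATE statement per year, and it cannot match a "2014" that happens to appear elsewhere in the string.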
Neighborhood Names problem:
• Neighborhood Data contained cluster number and neighborhood name, while Crime data only included cluster. An inner join on the cluster fields brought in Neighborhood Names.
SQL Code for INNER JOIN:
UPDATE [CrimeData_2011-2014]
INNER JOIN [CensusbyNeighborhood-CSV1]
ON [CrimeData_2011-2014].NEIGHBORHOOD_CLUSTER = [CensusbyNeighborhood-CSV1].NeighClst
SET [CrimeData_2011-2014].NEIGHBORHOOD = [CensusbyNeighborhood-CSV1].[Neighborhood];
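The same lookup can be sketched in Python as a dictionary keyed on cluster. The rows below are made-up placeholders, and the `OFFENSE` column name is an assumption; only `NEIGHBORHOOD_CLUSTER`, `NEIGHBORHOOD`, `NeighClst`, and `Neighborhood` come from the SQL above.

```python
# Hypothetical sample rows; the real tables are not reproduced in the source.
neighborhoods = [
    {"NeighClst": "Cluster 1", "Neighborhood": "Example Neighborhood A"},
    {"NeighClst": "Cluster 2", "Neighborhood": "Example Neighborhood B"},
]
crimes = [
    {"OFFENSE": "ROBBERY", "NEIGHBORHOOD_CLUSTER": "Cluster 2"},
    {"OFFENSE": "BURGLARY", "NEIGHBORHOOD_CLUSTER": "Cluster 1"},
    {"OFFENSE": "ARSON", "NEIGHBORHOOD_CLUSTER": "Cluster 99"},  # no matching cluster
]


def add_neighborhood_names(crime_rows, neighborhood_rows):
    """Fill NEIGHBORHOOD on every crime row whose cluster has a match."""
    name_by_cluster = {
        row["NeighClst"]: row["Neighborhood"] for row in neighborhood_rows
    }
    for crime in crime_rows:
        cluster = crime["NEIGHBORHOOD_CLUSTER"]
        # Like the inner join, rows without a matching cluster are left unchanged.
        if cluster in name_by_cluster:
            crime["NEIGHBORHOOD"] = name_by_cluster[cluster]
    return crime_rows


for row in add_neighborhood_names(crimes, neighborhoods):
    print(row)
```

Rows that end up without a `NEIGHBORHOOD` value are worth counting afterwards, since they point to cluster codes missing from the neighborhood table.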
Latitude/Longitude problem:
• The latitude/longitude coordinates were outdated, and Tableau couldn’t read them for mapping.
• We used Python (the pyproj and PROJ.4 libraries) to generate up-to-date latitude/longitude coordinates.
The code can be viewed on GitHub.
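The team's actual script is on GitHub and is not reproduced here. As a rough sketch of what a pyproj conversion can look like, the example below assumes the original coordinates were projected Maryland State Plane metres (EPSG:26985). The source does not say which coordinate system was involved, so that code and the function name `to_lat_lon` are assumptions.

```python
# Assumption: source coordinates are Maryland State Plane metres (EPSG:26985).
# The source does not state the original coordinate system.
SOURCE_CRS = "EPSG:26985"
TARGET_CRS = "EPSG:4326"  # WGS 84 latitude/longitude


def to_lat_lon(x: float, y: float) -> tuple:
    """Convert one projected (x, y) point to (latitude, longitude)."""
    # Imported here so the module can be loaded even where pyproj is not installed.
    from pyproj import Transformer

    # always_xy=True fixes the axis order to (x, y) in and (lon, lat) out.
    transformer = Transformer.from_crs(SOURCE_CRS, TARGET_CRS, always_xy=True)
    lon, lat = transformer.transform(x, y)
    return lat, lon
```

For a full crime table, it would be cheaper to build the transformer once and reuse it for every row.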
Data Analysis & Exploration Methods/Tools
• Using Tableau, R, and Excel, we explored the data visually and with regression.
• Tableau mapping and charts: getting an understanding of the data, the neighborhood clusters, and crime incidence.
• Linear/multiple regressions on the current 39-cluster data (2012 median property value, total per crime category).
• Exploring time series of median home price against violent crime per 1000 residents, 2000-2011, in R.
• Exploring regression coefficients between crime areas to determine the impact of violent crime events on property value and whether there is a marginal negative return to violent crime, as hypothesized.
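The "violent crime per 1000 residents" measure used above is a simple normalization of a raw count by population. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def rate_per_1000(crime_count: int, population: int) -> float:
    """Return crimes per 1000 residents for one neighborhood cluster."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * crime_count / population


# Made-up numbers for illustration only, not values from the DC data.
print(rate_per_1000(crime_count=50, population=10_000))
```

Normalizing this way lets clusters of very different sizes be compared on the same scale.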
Data exploration of raw counts of crime by neighborhood cluster shows heavy crime incidence in commercial areas (dominated by theft). This needs further exploration.
Map of Clusters: Tableau (hyperlink)
Comparing Neighborhoods - Crime Types
• Adams Morgan/Kalorama: property value $1.1M, crimes 2863
• Foxhall/Palisades: property value $1.0M, crimes 602
• Fairlawn: property value $243K, crimes 2763
• Union Station/Eastern Market: property value $549K, crimes 6178
Comparing Neighborhoods - Day of Week
[Charts by day of week for Adams Morgan/Kalorama, Foxhall/Palisades, Union Station/Eastern Market, and Fairlawn]
Products for Residents/Business Owners
http://danieljwood.github.io/DC-crime/
https://www.google.com/maps/d/edit?mid=zUgxKwpr9tdQ.kygX4j2r7scg
Total crime counts show little relationship to property value, mainly because non-violent crime events, which are slightly positively correlated with property value, make up most of the total.
Correlations: Negative correlations for violent crime indicate impact on property values – flat/positive correlations for non-violent theft
Crime category                Correlation with property value
ARSON                         -0.34
ASSAULT W/DANGEROUS WEAPON    -0.5981
BURGLARY                      -0.41
HOMICIDE                      -0.494
MOTOR VEHICLE THEFT           -0.615527
ROBBERY                       -0.3761
SEX ABUSE                     -0.3893
THEFT F/AUTO                  0.11122
THEFT/OTHER                   0.191
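The source does not say how these correlations were computed. If they are Pearson correlations between per-cluster crime counts and property values (an assumption), the calculation can be sketched as below. The sample numbers are invented for illustration and are not taken from the DC data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Made-up per-cluster values for illustration only.
robberies = [10, 20, 30, 40]
median_value_thousands = [900, 700, 650, 400]
print(round(pearson_r(robberies, median_value_thousands), 3))
```

A value near -1 means property value falls as the crime count rises, a value near +1 means they rise together, and values near 0 indicate little linear relationship.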
Analysis of Correlation and Map Clustering
• Maps are excellent for quickly looking through categories in geographic proximity, but they don’t provide much insight without neighborhood specifics such as commercial activity (JP will elaborate). Of particular interest are population density and number of liquor licenses (as a proxy for commercial activity).
• Correlations show that the only two crime groups positively, though weakly, correlated with property value are Theft and Theft from Auto (not Motor Vehicle Theft). This conforms to our expectations about violent crime’s impact on property value. The correlation itself is fairly weak, but its sign is interesting.
• Other inferences from the map: trees commit very few crimes ☺
Model1 = lm(formula = Median2012 ~ VCrime2010, data = dccrimereg)
Coefficients:
(Intercept)   VCrime2010
     940372       -31296
lm(formula = Prop09 ~ VC07 * VC08, data = dccrimereg2)
Coefficients:
(Intercept)     VC07     VC08   VC07:VC08
    1065111   -22563   -38566        1290
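To read these coefficients, it can help to plug in values. The sketch below evaluates both fitted equations using the coefficients reported above. The units of `VCrime2010`, `VC07`, and `VC08` are not stated in the source (violent crimes per 1000 residents is a guess), and the function names and inputs are made up.

```python
def predict_model1(vcrime2010: float) -> float:
    """Median2012 predicted by Model1: intercept plus slope times VCrime2010."""
    return 940372 - 31296 * vcrime2010


def predict_model2(vc07: float, vc08: float) -> float:
    """Prop09 predicted by the interaction model Prop09 ~ VC07 * VC08."""
    return 1065111 - 22563 * vc07 - 38566 * vc08 + 1290 * vc07 * vc08


def marginal_effect_vc08(vc07: float) -> float:
    """Change in predicted Prop09 per extra unit of VC08, at a given VC07."""
    return -38566 + 1290 * vc07


# Made-up input values for illustration only.
print(predict_model1(10))        # 627412
print(predict_model2(5, 5))      # 791716
print(marginal_effect_vc08(10))  # -25666
```

Because the interaction coefficient is positive, the predicted drop per extra unit of `VC08` is smaller where `VC07` was already high. That is one way to read the hypothesized marginal negative return, although these are point estimates with no reported standard errors.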
Average % violent crime per 1000 plotted against % change in housing asset value over time: not what I expected! Or is it? Perhaps what we’re observing is that, as overall violent crime declines, the higher-crime (asset-price-depressed) neighborhood clusters are showing a massive rebound. Gentrification is occurring, but housing prices in those high-crime neighborhoods also seem to be playing catch-up.
Regression Discussion
• Non-violent crime is neutral to weakly associated with increasing property values (unsurprising).
• Asset value accumulation is an important part of breaking the cycle of poverty, so violent crime has a detrimental effect on the ability to accumulate wealth. HOWEVER, high violent crime areas are experiencing strong growth in asset values, which suggests that falling crime rates in otherwise desirable areas are an instrumental factor. The decay rate seems fast. The perception of falling crime rates may be a key factor.
• Further thoughts on where the data might go:
– Drilling down to one number for the impact of violent crime on property value.
– Decay rate of violent crime’s negative impact on property value growth rate.
– Quantifying the negative wealth accumulation effect of violent crime over time.
Conclusion
Original Problem: DC property owners and metro area residents are at greater risk of becoming victims of crime if they are unaware of the frequency and types of crime in their neighborhoods of interest.
Solution: By implementing effective data analytic techniques, we were able to expose important trends and relationships regarding crime activity in relation to the different neighborhoods and associated property values throughout the DC metro area. This information can provide important insight for business owners, potential homebuyers, as well as the DC Metro Police Department.
Additional Development: Create a web-based application which can graphically represent trends, patterns, and crime frequencies in near-real time.
Data Analysis
• Neighborhoods which have more retail businesses, restaurants and bars are more affected by certain types of crime (e.g. theft) than residential neighborhoods.
• Residential neighborhoods with higher average incomes and property values generally have a lower occurrence of violent crimes than residential neighborhoods with lower average incomes and property values.
• Day of the week appears to affect certain types of crime (robbery/theft) in high-traffic neighborhoods, but the relationship does not hold in every case.