Georgetown Data Analytics Project (Team DC)

TEAM DC: Georgetown Data Analytics: Exploring DC crime with neighborhood cluster analysis, an interactive mapping tool, and regression analysis.

Problem

DC property owners and metro area residents are at greater risk of becoming victims of crime if they are unaware of the frequency and types of crime in their neighborhoods of interest.

Imperfect information: a person visiting or buying a home in an unfamiliar neighborhood lacks complete information about crime and its impact on safety and property values.

Data Science Pipeline

Began with:
• DC Crime Data (2011-2014)
• DC Neighborhood Data (US Census data, 2000-2012)

Based on the available data, we used the most up-to-date property values and population figures (2012) from the DC Neighborhood data, as these best represent current conditions.

Data Munging/Exploration

DC Crime Data:

1. Did not include neighborhood names; geographic location was given only as neighborhood cluster, block number, and latitude/longitude coordinates.

2. Visualizing the data in Tableau revealed that the latitude/longitude coordinates were out of date and had to be updated before our mapping tool could read them.

3. Provided only a long date (MM/DD/YYYY HH:MM:SS), with no separate month, day, or time-of-day fields.

Data Munging/Exploration: Solutions to problems

Long Date problem:

• MonthName function (Excel or Access) to populate the month field
• WeekdayName function (Excel or Access) to populate the day field
• Right function (Excel or Access) to parse out the time of day
• Updated the Year field with a separate SQL UPDATE for each year

SQL example to update the year field from the long date (repeated for each year):

UPDATE [CrimeData_2011-2014]
SET [CrimeData_2011-2014].REPORT_YEAR = "2014"
WHERE [CrimeData_2011-2014].REPORT_DAT Like "*2014*";
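The same field splitting can be sketched in Python with the standard datetime module. This is a minimal sketch, assuming REPORT_DAT strings follow the MM/DD/YYYY HH:MM:SS pattern described above with a 24-hour clock; the helper name split_long_date and the sample value are hypothetical, and the real export may use a different time format.

```python
from datetime import datetime


def split_long_date(report_dat: str) -> dict:
    """Split a long date (MM/DD/YYYY HH:MM:SS) into separate report fields.

    Assumes a 24-hour clock; an AM/PM export would need a different
    strptime pattern. Month and weekday names are English under
    Python's default C locale.
    """
    dt = datetime.strptime(report_dat, "%m/%d/%Y %H:%M:%S")
    return {
        "REPORT_YEAR": str(dt.year),  # same value the per-year SQL UPDATE sets
        "MONTH": dt.strftime("%B"),   # full month name, e.g. "July"
        "DAY": dt.strftime("%A"),     # full weekday name, e.g. "Thursday"
        "TIME": report_dat[-8:],      # last 8 characters, like Right(REPORT_DAT, 8)
    }


if __name__ == "__main__":
    # Hypothetical sample value in the documented format.
    print(split_long_date("07/04/2013 21:30:00"))
```

Parsing the string once into a datetime avoids one UPDATE per year and also validates the format, since a malformed date raises ValueError instead of silently failing to match a Like pattern.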

Neighborhood Names problem:

• The Neighborhood data contained both the cluster number and the neighborhood name, while the Crime data included only the cluster. An inner join on the cluster fields brought the neighborhood names into the Crime data.

SQL code for the INNER JOIN:

UPDATE [CrimeData_2011-2014]
INNER JOIN [CensusbyNeighborhood-CSV1]
ON [CrimeData_2011-2014].NEIGHBORHOOD_CLUSTER = [CensusbyNeighborhood-CSV1].NeighClst
SET [CrimeData_2011-2014].NEIGHBORHOOD = [CensusbyNeighborhood-CSV1].[Neighborhood];
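For readers without Access, the effect of the UPDATE ... INNER JOIN query above can be sketched in plain Python: build a lookup from cluster to neighborhood name, then fill the NEIGHBORHOOD field only on crime rows whose cluster matches. This is a minimal sketch; the function name attach_neighborhood_names and the sample rows are hypothetical, not records from the project data.

```python
def attach_neighborhood_names(crimes, census):
    """Mimic UPDATE ... INNER JOIN ... SET NEIGHBORHOOD = Neighborhood.

    Only rows whose NEIGHBORHOOD_CLUSTER matches a census NeighClst are
    updated; unmatched rows keep whatever value they already had.
    """
    # Lookup table built from the census side of the join.
    names_by_cluster = {row["NeighClst"]: row["Neighborhood"] for row in census}
    for crime in crimes:
        name = names_by_cluster.get(crime["NEIGHBORHOOD_CLUSTER"])
        if name is not None:  # inner-join semantics: matched rows only
            crime["NEIGHBORHOOD"] = name
    return crimes


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    census = [{"NeighClst": "1", "Neighborhood": "Neighborhood A"}]
    crimes = [
        {"NEIGHBORHOOD_CLUSTER": "1", "NEIGHBORHOOD": None},
        {"NEIGHBORHOOD_CLUSTER": "99", "NEIGHBORHOOD": None},
    ]
    print(attach_neighborhood_names(crimes, census))
```

The second sample row shows why the join keys must be formatted identically on both sides: a cluster value with no match is simply left without a name rather than raising an error.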

Latitude/Longitude problem:

• Latitude/Longitude coordinates were outdated, and Tableau couldn’t read them for mapping.

• We used Python (pyproj, the Python interface to the PROJ library, formerly proj.4) to generate up-to-date latitude/longitude coordinates.

The code is available on GitHub.

Data Analysis & Exploration: Methods/Tools

• Using Tableau, R, and Excel, we explored the data visually and with regression.
• Tableau mapping and charts: getting an understanding of the data, the neighborhood clusters, and crime incidence.
• Linear/multiple regressions on current data for the 39 neighborhood clusters (2012 median property value, total crimes per category).
• Exploring time series of median home prices against violent crime per 1,000 residents, 2000-2011, in R.
• Exploring regression coefficients between crime areas to determine the impact of violent crime events on property value, and whether there is a marginal negative return to violent crime, as hypothesized.

Exploring raw crime counts by neighborhood cluster shows heavy crime incidence in commercial areas, dominated by theft. This needs further exploration.
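Raw counts per cluster, and the per-1,000-resident rates used elsewhere in this analysis, can be computed as sketched below with the standard library. This is a minimal sketch, assuming each crime record carries a cluster and an offense type and that 2012 population per cluster comes from the Census neighborhood data; the function names and sample values are hypothetical. Normalizing by population is one way to compare clusters whose resident counts differ widely.

```python
from collections import Counter


def crime_counts(records):
    """Count crimes per (cluster, offense) pair."""
    return Counter((r["cluster"], r["offense"]) for r in records)


def cluster_total(counts, cluster):
    """Total crimes in one cluster, summed over all offense types."""
    return sum(n for (c, _), n in counts.items() if c == cluster)


def rate_per_1000(count, population):
    """Convert a raw count to crimes per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1000 / population


if __name__ == "__main__":
    # Hypothetical records and population, for illustration only.
    records = [
        {"cluster": "Cluster 1", "offense": "THEFT/OTHER"},
        {"cluster": "Cluster 1", "offense": "THEFT/OTHER"},
        {"cluster": "Cluster 1", "offense": "ROBBERY"},
    ]
    counts = crime_counts(records)
    total = cluster_total(counts, "Cluster 1")
    print(counts[("Cluster 1", "THEFT/OTHER")], total, rate_per_1000(total, 1500))
```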

Total Crime per Capita (per 1,000 residents)

Comparing Neighborhoods - Crime Types

• Adams Morgan/Kalorama: property value $1.1M; crimes 2,863
• Foxhall/Palisades: property value $1.0M; crimes 602
• Fairlawn: property value $243K; crimes 2,763
• Union Station/Eastern Market: property value $549K; crimes 6,178

Comparing Neighborhoods - Day of Week

[Charts: Adams Morgan/Kalorama, Foxhall/Palisades, Union Station/Eastern Market, Fairlawn]

Products for Residents/Business Owners

http://danieljwood.github.io/DC-crime/

https://www.google.com/maps/d/edit?mid=zUgxKwpr9tdQ.kygX4j2r7scg

Property value plotted against violent crime rate, by neighborhood cluster, 2000-2011

Total crime counts show little relationship to property value, mainly because of the slightly positive correlation (and large aggregate number) of non-violent crime events.

Correlations: negative correlations for violent crime suggest an impact on property values; correlations for non-violent theft are flat to slightly positive.

Correlation with property value, by crime type:

• ARSON: -0.34
• ASSAULT W/DANGEROUS WEAPON: -0.5981
• BURGLARY: -0.41
• HOMICIDE: -0.494
• MOTOR VEHICLE THEFT: -0.615527
• ROBBERY: -0.3761
• SEX ABUSE: -0.3893
• THEFT F/AUTO: 0.11122
• THEFT/OTHER: 0.191
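The slides do not name the correlation method. Assuming these are Pearson coefficients between each crime category's total and property value across neighborhood clusters, the calculation can be sketched as follows; the function name pearson_r is hypothetical and the test inputs are toy numbers, not project data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances without the common 1/(n-1) factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

For example, pearson_r([1, 2, 3, 4], [400, 300, 200, 100]) returns -1.0, a perfect negative relationship. The sign is what the discussion below relies on; a magnitude around 0.1 to 0.2, as for the two theft categories, is weak.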

Analysis of Correlation and Map Clustering

• Maps are excellent for quickly scanning crime categories by geographic proximity, but they provide little insight without neighborhood specifics such as commercial activity (JP will elaborate). Of particular interest are population density and the number of liquor licenses (as a proxy for commercial activity).

• Correlations show that the only two crime groups weakly positively correlated with property value are Theft/Other and Theft from Auto (not Motor Vehicle Theft). This conforms to our expectations about the impact of violent crime on property value. The correlations themselves are weak, but their sign is interesting.

• Other inferences from the map: trees commit very few crimes ☺

Model1 <- lm(formula = Median2012 ~ VCrime2010, data = dccrimereg)

Coefficients:
(Intercept)   VCrime2010
     940372       -31296
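Model1 is a simple linear regression with one predictor, so the intercept and slope that R's lm reports can be reproduced with the closed-form least-squares formulas. Read literally, the fitted line predicts a 2012 median property value of about $940,372 at zero violent crime, falling by about $31,296 per one-unit increase in VCrime2010 (the slides do not state its units). This is a minimal sketch for illustration; the function names are hypothetical and the test data are invented, not the dccrimereg data.

```python
def simple_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


if __name__ == "__main__":
    # Model1's printed coefficients at a hypothetical VCrime2010 of 10.
    print(predict(940372, -31296, 10))
```

For that hypothetical cluster the line gives 940372 - 10 × 31296 = 627,412; values far outside the observed range of VCrime2010 should not be read this way.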

lm(formula = Prop09 ~ VC07 * VC08, data = dccrimereg2)

Coefficients:
(Intercept)         VC07         VC08    VC07:VC08
    1065111       -22563       -38566         1290
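Because this model includes the VC07:VC08 interaction, the effect of one year's violent crime on Prop09 depends on the other year's level. Differentiating the fitted equation makes this explicit. We assume, from the variable names only, that VC07 and VC08 are violent crime rates for 2007 and 2008 and Prop09 is 2009 property value; the slides do not define them.

```latex
\widehat{\mathrm{Prop09}} = 1065111 - 22563\,\mathrm{VC07} - 38566\,\mathrm{VC08} + 1290\,\mathrm{VC07}\,\mathrm{VC08}

\frac{\partial \widehat{\mathrm{Prop09}}}{\partial \mathrm{VC08}} = -38566 + 1290\,\mathrm{VC07},
\qquad
\frac{\partial \widehat{\mathrm{Prop09}}}{\partial \mathrm{VC07}} = -22563 + 1290\,\mathrm{VC08}
```

With these coefficients, the marginal effect of VC08 stays negative while VC07 is below 38566/1290 ≈ 29.9, and that of VC07 stays negative while VC08 is below 22563/1290 ≈ 17.5. The positive interaction term means each additional unit of violent crime is associated with a smaller drop in value where the other year's rate is already high, which is one possible reading of the hypothesized marginal negative return, not a tested result.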

Average % violent crime per 1000 residents plotted against % change in housing asset value over time: not what I expected!

Or is it? Perhaps what we are observing is that the overall decline in violent crime is producing a massive rebound in the higher-crime (asset-price-depressed) neighborhood clusters. Gentrification is occurring, but housing prices also seem to be playing catch-up in those high-crime neighborhoods.
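The % change in housing asset value plotted here is a relative change between a start year and an end year for each cluster. A minimal sketch, assuming median property values for both years are available per cluster; the function name percent_change and the sample values are hypothetical.

```python
def percent_change(start_value, end_value):
    """Percent change from start_value to end_value (e.g. median home value)."""
    if start_value == 0:
        raise ValueError("start_value must be non-zero")
    return (end_value - start_value) / start_value * 100
```

For a hypothetical cluster whose median value rises from $200,000 to $300,000, percent_change(200000, 300000) returns 50.0; a fall from 400 to 300 returns -25.0.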

Regression Discussion

• Non-violent crime is neutral or weakly positively associated with property values (unsurprising).

• Asset value accumulation is an important part of breaking the cycle of poverty, so violent crime, by depressing asset values, has a detrimental effect on the ability to accumulate wealth. HOWEVER, high-violent-crime areas are experiencing strong growth in asset values, which suggests that falling crime rates in otherwise desirable areas are an instrumental factor. The negative effect of violent crime appears to decay quickly, and the perception of falling crime rates may be a key factor.

• Further thoughts on where the data might go:
  – Drilling down to a single number for the impact of violent crime on property value
  – The decay rate of violent crime's negative impact on property value growth
  – Quantifying the negative wealth-accumulation effect of violent crime over time

Conclusion

Original Problem: DC property owners and metro area residents are at greater risk of becoming victims of crime if they are unaware of the frequency and types of crime in their neighborhoods of interest.

Solution: Using data analytics techniques, we exposed important trends and relationships between crime activity, neighborhoods, and property values across the DC metro area. This information can provide useful insight for business owners, potential homebuyers, and the DC Metropolitan Police Department.

Additional Development: Create a web-based application that graphically represents trends, patterns, and crime frequencies in near-real time.

Appendix

Crime Type Percentage per Capita

Data Analysis

• Neighborhoods that have more retail businesses, restaurants, and bars are more affected by certain types of crime (e.g., theft) than residential neighborhoods.

• Residential neighborhoods with higher average incomes and property values generally have fewer violent crimes than residential neighborhoods with lower average incomes and property values.

• Day of the week appears to affect certain types of crime (robbery/theft) in high-traffic neighborhoods, but the relationship does not hold in every case.
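A day-of-week breakdown like the one in the neighborhood comparison charts can be sketched by counting one offense type per weekday within one neighborhood. This minimal sketch assumes the DAY field already holds English weekday names (as produced when the long date is split) and that NEIGHBORHOOD has been filled by the cluster join; the function name weekday_shares is hypothetical.

```python
from collections import Counter

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_shares(records, neighborhood, offense):
    """Share of one offense type falling on each weekday in one neighborhood."""
    counts = Counter(
        r["DAY"] for r in records
        if r["NEIGHBORHOOD"] == neighborhood and r["OFFENSE"] == offense
    )
    total = sum(counts.values())
    # Weekdays in calendar order; days with no incidents get 0.0.
    return {day: (counts[day] / total if total else 0.0) for day in WEEKDAYS}
```

Comparing shares rather than raw counts lets a low-crime neighborhood and a high-traffic one be read on the same scale; a flat profile (about 1/7 per day) would indicate no day-of-week effect.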