The Seasonal Cycles in the Distribution of Precipitation ...
A Stochastic Model for Seasonal Cycles in Crime - UCLAbertozzi/WORKFORCE/REU 2013/Crime Fighters...A...
Transcript of A Stochastic Model for Seasonal Cycles in Crime - UCLAbertozzi/WORKFORCE/REU 2013/Crime Fighters...A...
-
A Stochastic Model for Seasonal Cycles in Crime
Yunbai Cao Kun Dong Beatrice Siercke Matt Wilber
August 9, 2013
(UCLA, Harvey Mudd) August 9, 2013 1 / 43
-
Introduction
Introduction
(UCLA, Harvey Mudd) August 9, 2013 2 / 43
-
Introduction
(UCLA, Harvey Mudd) August 9, 2013 3 / 43
Objective
We would like to create a mathematical model for theseasonal variation of burglary in Los Angeles, CA andHouston, TX, in order to forecast future crime trends.While seasonality of crime data is widely agreed upon,it is poorly understood, and we seek to determine if amodel can reproduce seasonality for a wide variety ofdata sets without relying on environmental factors.
-
Introduction
Criminology Background
(UCLA, Harvey Mudd) August 9, 2013 4 / 43
Inconsistent Seasonal Folklore
Disagreement over seasonality of crimes
Variation with crime type
Burglary peak in summer or winter
Geographical variation of behavior
Inconsistent hypotheses of what drives seasonality
-
Introduction
Data Sets
(UCLA, Harvey Mudd) August 9, 2013 5 / 43
Los Angeles Houston
Data sets contain property crime rates
LA data 2005-2013, Houston 2009-2013
Would like to divide into long-term trend, seasonal,and noise components Xt = Tt + St + Nt
-
Methods
Methods
(UCLA, Harvey Mudd) August 9, 2013 6 / 43
-
Methods
Singular Spectrum Analysis
(UCLA, Harvey Mudd) August 9, 2013 7 / 43
Decomposition of timeseries X = (x1, x2, . . . , xN)into elementaryreconstructed series
Window lengthL = N K + 1Low-rank approximation oforiginal time series
Allows us to extractlong-term trend,seasonality, and noise
M =
x1 x2 x3 xKx2 x3 x4 xK+1x3 x4 x5 xK+2...
......
. . ....
xL xL+1 xL+2 xN
-
Methods
Singular Spectrum Analysis
(UCLA, Harvey Mudd) August 9, 2013 8 / 43
Interested in the first six or so singular values
-
Methods
Singular Spectrum Analysis
(UCLA, Harvey Mudd) August 9, 2013 9 / 43
First modecorresponds tolong-term trend
Modes 2 and 5display yearlyperiods
-
Methods
Trend Extraction
Los Angeles Houston
(UCLA, Harvey Mudd) August 9, 2013 10 / 43
-
Methods
Seasonality Extraction
Los Angeles Houston
(UCLA, Harvey Mudd) August 9, 2013 11 / 43
-
Methods
Seasonality Extraction
Los Angeles Houston
(UCLA, Harvey Mudd) August 9, 2013 12 / 43
-
Methods
Crime Type Comparisons
(UCLA, Harvey Mudd) August 9, 2013 13 / 43
Aggravated Assault,Murder, and Theft have aclear downward trend
Auto Theft and Robberydip then rise
Burglary and Rape brieflyincrease after two years,but otherwise decrease
-
Methods
Crime Type Comparisons
(UCLA, Harvey Mudd) August 9, 2013 14 / 43
Aggravated Assault, AutoTheft, Burglary, Rape, andTheft all have clear yearlyseasonal components
-
Methods
Crime Type Comparisons
(UCLA, Harvey Mudd) August 9, 2013 15 / 43
Murder has no seasonal component, only noise
Robbery has a semi-annual seasonal component
-
Methods
Crime Type Comparisons
In summary,
Extracted a clear trend for each crime type
Most types have a yearly seasonal component
Murder and Robbery do not have yearly components
(UCLA, Harvey Mudd) August 9, 2013 16 / 43
-
Methods
Stochastic Differential Equation (SDE)
(UCLA, Harvey Mudd) August 9, 2013 17 / 43
We will consider two-dimensional SDEs of the form
dXt = f (Xt ,Yt) dt + g(Xt ,Yt) dWt
dYt = f (Xt ,Yt) dt + g(Xt ,Yt) dVt
White-noise terms dWt and dVt represent Brownianmotion
-
Methods
Stochastic Differential Equation (SDE)
(UCLA, Harvey Mudd) August 9, 2013 18 / 43
Ito Calculus uses a different chain rule. If we consideru(t, x , y), then
du =u
tdt + u, dXt+
1
2Hess(u)dG , dG
where
dXt =[dXtdYt
]and dG =
[g(Xt ,Yt) dWtg(Xt ,Yt) dVt
].
-
Methods
Lotka-Volterra
Follows the equations
x = x( y), x(0) = x0y = y( x), y(0) = y0
(UCLA, Harvey Mudd) August 9, 2013 19 / 43
Predator-PreyModel
Growth/decay rateparameters ,
Interaction effectparameters ,
Used incriminologyliterature
Natural periodicsolutions
-
Methods
Lotka-Volterra
Used a stochastic version of Lotka-Volterra,
dXt = Xt( Yt) dt + 1Xt dWt , X (0) = X0dYt = Yt( Xt) dt + 2Yt dVt , Y (0) = Y0.
White noise, IID terms dWt and dVt
Noise is proportional to size of population at time t
Xt = Available targets per criminal
Yt = Crime rate
(UCLA, Harvey Mudd) August 9, 2013 20 / 43
-
Methods
Energy of the Model
(UCLA, Harvey Mudd) August 9, 2013 21 / 43
Well-known energy of the Lotka-Volterra Model:
E = log y + log x y x
Using Itos Lemma, we find
Et = E0 +
t0
1( Xt) dWt + 2( Yt) dVt
1
2(1
2 + 22)t
-
Methods
Energy of the Model
(UCLA, Harvey Mudd) August 9, 2013 22 / 43
Expected Energy
Taking expectation,
E(Et) = E(E0)1
2(1
2 + 22)t.
Energy decreases over time
Orbit tends to leave equilibrium
Built-in period variation
Can be used to validate weak order of scheme
-
Methods
Numerics For Stochastic Lotka-Volterra
(UCLA, Harvey Mudd) August 9, 2013 23 / 43
Used semi-implicit scheme, derived by setting
Xt+1 Xt + (Xt t Xt+1Yt t + 1Xt Wt)
Yt+1 Yt + (XtYt t Yt+1 t + 2Yt Vt)(1)
where t is the timestep and Wt ,Vt
t N(0, 1).
-
Methods
Numerics For Stochastic Lotka-Volterra
Solving for Xt+1, Yt+1 results in thescheme
Xt+1 =1 + t + 1Wt
1 + YttXt
Yt+1 =1 + Xtt + 2Vt
1 + tYt .
Weak first order, but quick to compute
Maintains positivity with very highprobability
(UCLA, Harvey Mudd) August 9, 2013 24 / 43
-
Results
Results
(UCLA, Harvey Mudd) August 9, 2013 25 / 43
-
Results
Simulations
(UCLA, Harvey Mudd) August 9, 2013 26 / 43
Plot1 = 0, 2 = 0.02
Least squareparameterestimation
Minimum leastsquares differencefrom data
Missing period
-
Results
Simulations
(UCLA, Harvey Mudd) August 9, 2013 27 / 43
Los Angeles Houston
Reproduced missing period
-
Results
Simulations
(UCLA, Harvey Mudd) August 9, 2013 28 / 43
Los Angeles Aggravated Assault Data in red,simulation in blue
Better fit withoutmissed period orwidely varyingpeak heights
Shows that ourmodel worksacross a variety ofdata
-
Results
Simulations
(UCLA, Harvey Mudd) August 9, 2013 29 / 43
Houston Aggravated Assault
Data in red,simulation in blue
Criminologicalimplications
-
Results
LA Burglary Video
(UCLA, Harvey Mudd) August 9, 2013 30 / 43
laclip.mpgMedia File (video/mpeg)
-
Results
Houston Burglary Video
(UCLA, Harvey Mudd) August 9, 2013 31 / 43
HTclip.mpgMedia File (video/mpeg)
-
Results
Missing Period Statistics
Want to identifypeaks in time series
Use peakfinder.m,by Nathanael Yoder 1
Yoder, Nathanael (2009). PeakFinder(http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder),MATLAB Central File Exchange. Retrieved July 2013.
(UCLA, Harvey Mudd) August 9, 2013 32 / 43
-
Results
Missing Period Statistics
(UCLA, Harvey Mudd) August 9, 2013 33 / 43
Peakfinder.m
Used for finding local peaks or valleys ... in anoisy vector.
Input: A real vector
Output: [pks, locs], two vectors with peak heightsand peak locations, respectively.
Options
sel: The amount above surrounding data for apeak to be identifiedthresh: A threshold above which maxima must beto be considered peaks
-
Results
Method A
(UCLA, Harvey Mudd) August 9, 2013 34 / 43
Peaks must be 10units above nearbydata
Peaks must be atleast 180 units high
-
Results
Method A
(UCLA, Harvey Mudd) August 9, 2013 35 / 43
If any pair of peaksare at leastminpeakdistance
days apart, a periodhas been missed
-
Results
Method
(UCLA, Harvey Mudd) August 9, 2013 36 / 43
Peaks must be 7 unitsabove nearby data
Peaks must be atleast 179.6 units high
Peaks must be atleast 44 days apart
Peaks within 103days apart each otherare grouped into thesame peak cluster
-
Results
Method
(UCLA, Harvey Mudd) August 9, 2013 37 / 43
If any pair of peaks are atleast minpeakdistance daysapart, a period has beenmissed
If the first peak appears afterminpeakdistance days, aperiod has been missed
If number of peak clusters isless than 8, a period has beenmissed
-
Results
Noise Parameter Analysis: Method A
Percentage relatively sensitive to noise
Varies little with 1
Decreases with 2
(UCLA, Harvey Mudd) August 9, 2013 38 / 43
Percent Missed Periods vs. Noise Parameters
1/2 0.010 0.015 0.020 0.025 0.030
0.000 56.9 36.3 21.3 11.0 5.50.01 56.8 37.6 20.2 11.2 5.00.02 59.8 35.7 19.4 10.6 5.00.03 55.0 34.6 18.7 8.2 4.0
-
Results
Noise Parameter Analysis: Method
Varies little with 1Decreases with 2
(UCLA, Harvey Mudd) August 9, 2013 39 / 43
Percent Missed Periods vs. Noise Parameters
1/2 0.010 0.015 0.020 0.025 0.030
0.000 32.4 27.9 19.5 12.1 6.90.01 38.9 30.6 18.0 13.9 6.70.02 39.5 27.2 17.4 13.4 8.10.03 40.5 28.0 19.3 12.6 8.1
-
Results
Minpeakdistance Analysis: Method A
Both percentages decrease as minpeakdistance increases
Percentages relatively insensitive to minpeakdistance parameter
1 = 0, 2 = 0.02
(UCLA, Harvey Mudd) August 9, 2013 40 / 43
Percent Missed Periods vs. Minpeakdistance
minpeakdistance 450 475 500 525 550
Percent Missed Peaks 22.5 21.7 19.2 16.0 17.0
-
Results
Minpeakdistance Analysis: Method
Both percentages decrease as minpeakdistance increases
Percentages relatively sensitive to minpeakdistance parameter
1 = 0, 2 = 0.02
(UCLA, Harvey Mudd) August 9, 2013 41 / 43
Percent Missed Periods vs. Minpeakdistance
minpeakdistance 450 460 470 480 490 500
Percent Missed Peaks 27.5 22.6 20.5 18.5 14.1 12.8
-
Results
Conclusions
Decomposed crime data into long-term trend, seasonal, and noisecomponents
Produced an SDE model to simulate burglary rates over several yearspans
Affirmed models ability to reproduce missed periods as in LA data
(UCLA, Harvey Mudd) August 9, 2013 42 / 43
-
Results
Acknowledgements
Jeffrey Brantingham, for the Los Angeles crime data
Houston Police Department, for the Houston crime data
Scott McCalla and James von Brecht for lots of advice and mentoring
(UCLA, Harvey Mudd) August 9, 2013 43 / 43