SSC2009_KCollins


SSC Annual Meeting, June 2009
Proceedings of the Survey Methods Section

SPATIAL MODELLING OF GEOCODED CRIME DATA

Krista Collins¹

ABSTRACT

How are different types of crime distributed across cities? What factors are associated with high neighbourhood crime rates? These are but a few of the questions Statistics Canada is addressing in the analysis of crime rates from city neighbourhoods, as part of the project on geocoding crime data. To ensure the analysis is accurate, the effects of spatial autocorrelation must be accounted for in the model. This paper gives an overview of the spatial autoregressive models used to analyse the crime rate in several Canadian cities and the potential influence these models have for crime reduction strategies in Canada.

KEY WORDS: Simultaneous autoregressive model, Spatial autocorrelation, Spatial error, Spatial lag model

RÉSUMÉ

Comment les différents types d'affaires criminelles se répartissent-elles dans les villes? Quels facteurs sont liés avec des taux élevés de criminalité des quartiers? Voici quelques questions que Statistique Canada aborde dans le cadre d'analyses des taux de crimes dans les quartiers des villes, dans le cadre du projet sur le géocodage des données de la criminalité. Afin d'assurer une estimation fidèle, les effets de l'autocorrélation spatiale doivent être pris en compte dans le modèle. Ce papier va donner une vue d'ensemble des modèles autorégressifs spatiaux utilisés pour l'analyse des taux de crime dans plusieurs villes canadiennes et de l'influence possible que ces modèles peuvent avoir sur les stratégies de réduction de la criminalité.

MOTS CLÉS : Autocorrélation spatiale; modèle à autocorrélation spatiale des erreurs; modèle à variable endogène décalée; modèle autorégressif simultané

¹ Krista Collins, Statistics Canada, 100 Tunney's Pasture Driveway, Ottawa, ON, Canada, K1A 0T6, [email protected]

1. INTRODUCTION

There are often many reasons to expect that data which are located close to each other in space will be more related than data that are further apart; this is a basic law of geography (Cressie, 1993). By analysing the spatial structure of the data, more information can be learnt about the nature of the spatial process. For example, the distribution of crime within a city is known to be affected by the properties of the different city regions and by the social interactions of the people in those areas (Puech, 2004). By mapping the distribution of crime, regions with a high concentration of crime, known as hotspots, can be detected. The crime data can then be related to social and demographic characteristics to discover the factors associated with high and low crime neighbourhoods. The distribution of neighbourhood crime can also be compared across major Canadian cities. This information can then be used by police services and community planners to identify potentially high-risk areas and to make informed decisions on resource allocation and crime reduction strategies. This is precisely the objective of the geocoding crime analysis project at Statistics Canada (Savoie et al., 2006; Savoie, 2008; Statistics Canada, 2008).

Geocoding is the process of assigning an exact latitude and longitude coordinate to a street address (Savoie et al., 2006). In this case the street address corresponds to the location where a criminal incident took place. Geocoded crime data can then be integrated into a geographic information system to relate the incidents to the characteristics of the neighbourhoods in which they reside. The relationship between crime and neighbourhood characteristics can then be modelled. If the spatial structure in the data is not completely accounted for in the regression model, then the residuals will be spatially autocorrelated. Spatial autocorrelation, simply defined as the nonzero covariance between observations that are related in space, violates the fundamental assumption of independence of the observations in a standard regression model. Spatial autocorrelation can cause inefficient estimation of the parameters in a standard regression model and can lead to inaccurate estimation of the sample variance and significance tests.

This paper will give an overview of the geocoding crime analysis project at Statistics Canada. The focus will be on the spatial models used to relate the neighbourhood crime rate to various socioeconomic and demographic variables from the neighbourhood. Interested persons are referred to the original analysis papers for more information about the other aspects of the analysis covered in the geocoding project (Savoie et al., 2006; Savoie, 2008; Statistics Canada, 2008). The paper will start with an overview of the spatial models used in the analysis of neighbourhood crime. In section 3, the importance of modelling the spatial structure of the data is illustrated with an analysis of violent crime in Halifax and property crime in Montréal. The paper concludes with some comments about the use of spatial models.

2. SPATIAL AUTOREGRESSIVE MODELS

Spatial data can be classified into one of three types: regional, geostatistical and point patterns (Cressie, 1993). The geocoded incidents of crime are classified as point patterns, which can be mapped to illustrate the distribution of crime in a given area (Figures 3B and 4B). However, to create a model relating crime to the characteristics of the neighbourhood in which it occurs, the crime data is aggregated to regions rather than modelling the exact location of all the criminal incidents. This is classified as regional data. By definition the regions span the entire study area and the focus of the analysis is on modelling the regional data as it is defined. There is hence no possibility of interpolation between two adjacent regions. The observed data points are assumed to be representative of the entire region in which they reside.

Although there are two main realms for the spatial analysis of regional data, only simultaneous autoregressive (SAR) models, which are commonly used for the analysis of crime in the literature, were used for the geocoding project. SAR models are hence the only form of spatial model considered in this paper.

2.1 Taxonomy of Simultaneous Autoregressive Models

Prior to writing a formula for the SAR models, a definition of how the regions relate to one another in the model is needed. In the simplest form of the model, regions influence one another only if they are considered neighbours. Neighbouring locations can be defined by a contiguity structure, which is based on the geographical arrangement of the neighbourhoods; by a distance band, in which all regions within a certain specified distance are classified as neighbours; or by k-nearest neighbours, in which the k closest regions are considered neighbours (Dubin, 1998). These different possibilities are illustrated in Figure 1.

Figure 1: Different methods for defining neighbouring regions for the selected region in yellow. A) Rook contiguity includes locations with a common border, B) Queen contiguity includes regions with a common border or vertex, and C) Distance includes all regions whose centroid is within the specified distance band.

The desired neighbourhood structure is represented in the model by a spatial weights matrix, $W$. The spatial weights matrix is a binary $N \times N$ matrix, where the off-diagonal elements, $W_{ij}$ for $i \neq j$, equal one if location $j$ is a neighbour of location $i$ and zero otherwise. Thus the spatial weights matrix allows for the potential interaction of neighbouring locations in the model and rules out spatial dependence for non-neighbouring regions. By convention, zeros are placed on the diagonal elements of the matrix, $W_{ii}$, indicating that a location cannot be a neighbour of itself. The elements of the matrix are then row-standardised so that the sum of each row is equal to one.
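The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regions are taken to be the cells of a regular grid (real census tracts are irregular polygons), and the function name `contiguity_weights` and its arguments are hypothetical, not part of the original analysis.

```python
def contiguity_weights(n_rows, n_cols, queen=False):
    """Row-standardised contiguity weights for the cells of a regular grid.

    Cells are numbered row by row. Rook contiguity links cells that share
    an edge; queen contiguity also links cells that share only a corner.
    """
    n = n_rows * n_cols
    W = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a region is never its own neighbour
                    if not queen and dr != 0 and dc != 0:
                        continue  # rook: skip cells touching only at a vertex
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i][rr * n_cols + cc] = 1.0
    # Row-standardise so that every row sums to one.
    for i in range(n):
        total = sum(W[i])
        if total > 0:
            W[i] = [w / total for w in W[i]]
    return W


W_rook = contiguity_weights(3, 3)
W_queen = contiguity_weights(3, 3, queen=True)
centre = 4  # middle cell of the 3 x 3 grid
print(sum(1 for w in W_rook[centre] if w > 0))   # number of rook neighbours
print(sum(1 for w in W_queen[centre] if w > 0))  # number of queen neighbours
```

On this toy grid the centre cell has four rook neighbours and eight queen neighbours, which mirrors panels A and B of Figure 1.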

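The generic form of the SAR model can then be written as:

$$
\begin{aligned}
y &= \rho W_1 y + X\beta + \varepsilon, \\
\varepsilon &= \lambda W_2 \varepsilon + u,
\end{aligned}
\tag{1}
$$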


where $y$ is the $N \times 1$ vector of the dependent variable, $X$ is the $N \times k$ matrix of independent variables, $\beta$ is the $k \times 1$ vector of regression coefficients, $\rho$ and $\lambda$ are scalar spatial autoregressive parameters, $W_1$ and $W_2$ are spatial weights matrices and $\varepsilon$ and $u$ are the $N \times 1$ vectors of error terms (Anselin, 1988). For this paper the residuals, $u$, are assumed to be normal with a homoskedastic variance, although it is possible to introduce heterogeneity in the model by relaxing this assumption.

Equation 1 shows that there are two different ways of adding spatial effects into a standard regression model: either by adding a spatial autoregressive term, $W_1 y$, or by modelling spatial dependence in the error terms, $W_2 \varepsilon$. The use of the subscripts 1 and 2 on the spatial weights matrices is simply to denote the possibility of modelling a different spatial structure for the different spatial terms. Since the weights matrices are row-standardised, the terms $W_1 y$ and $W_2 \varepsilon$ represent the average value of the dependent variable and the residuals from neighbouring locations, respectively.
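A small numerical illustration may help show why a row-standardised weights matrix turns the lag term into a neighbourhood average. The three-region layout, the values of `y` and the helper name `spatial_lag` are all made up for this sketch.

```python
# Three hypothetical regions arranged in a row: 0 - 1 - 2.
W = [
    [0.0, 1.0, 0.0],  # region 0 has one neighbour (region 1)
    [0.5, 0.0, 0.5],  # region 1 has two neighbours (regions 0 and 2)
    [0.0, 1.0, 0.0],  # region 2 has one neighbour (region 1)
]
y = [2.0, 4.0, 10.0]  # made-up values of the dependent variable


def spatial_lag(W, v):
    """Return W v: for each region, the weighted sum of its neighbours' values."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


Wy = spatial_lag(W, y)
print(Wy)
```

For region 1 the lag is (2 + 10) / 2 = 6, the mean of its two neighbours; for the end regions it is simply the value of their single neighbour.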

By setting either or both of the spatial parameters, $\rho$ and $\lambda$, in equation 1 equal to zero, different forms of SAR models can be obtained. If both $\rho$ and $\lambda$ are set to zero, then the SAR model reduces to a standard linear regression model ($y = X\beta + \varepsilon$). By removing the spatial dependence in the error terms ($\lambda = 0$) the SAR model reduces to the spatial lag model:

$$
y = \rho W_1 y + X\beta + \varepsilon
\tag{2}
$$

where the term $W_1 y$ is called the spatial lag (Anselin, 1988). If the spatial dependence is only in the error terms, obtained by setting $\rho = 0$, then the resulting SAR model is called the spatial error model:

$$
y = X\beta + \varepsilon, \qquad \varepsilon = \lambda W_2 \varepsilon + u.
\tag{3}
$$

The full SAR model, equation 1, contains both the spatial lag and spatial error terms. This model is rarely used in practice and is not described in this paper. The spatial lag model (2) and the spatial error model (3) are described in detail in the next two subsections and they are illustrated in the analysis of property crime in Montréal in sections 3.3 and 3.4.

2.2 Spatial Lag Model

In the spatial lag model (2), the inclusion of the spatial lag as a covariate in the model implies there is a theoretical assumption that the dependent value at one location has a direct influence on the value at neighbouring locations. In the case of crime data this direct effect of neighbouring regions could be caused by spatial diffusion, as crime spreads from one location to another, or by other social interactions. The magnitude of the spatial coefficient, $\rho$, could then be interpreted as a measure of the effect of neighbouring locations given all the other variables in the model; however, this is not necessarily the case.

The spatial lag coefficient only represents the effect of neighbouring locations if the spatial units that are used to aggregate the data match the underlying spatial structure in the data. In other words, the variable of interest needs to be homogeneously distributed in each neighbourhood, as shown in Figure 2A. Otherwise there is a mismatch between the spatial scale used to measure the phenomenon and the spatial scale at which it occurs (Figure 2B). The spatial lag coefficient then indicates the strength of the mismatch between the scales. This mismatch between the spatial units is common when predefined administrative units are used for the analysis. In this case the spatial lag term has no real meaning in the interpretation of the final model. In practice, for most social phenomena the spatial lag coefficient represents in part the direct effect of neighbouring locations and in part the effect of the mismatched scale. The spatial coefficient is still an important part of the model as it accounts for the spatial structure and allows the other variables in the model to be interpreted in the same way as coefficients in a standard regression model.

Figure 2: Interpretation of the spatial lag coefficient: A) illustrates the spatial units matching and B) illustrates a different spatial scale of measurement and occurrence.

The spatial lag model can also be written as:

$$
y = (I - \rho W_1)^{-1} X\beta + (I - \rho W_1)^{-1} \varepsilon
\tag{4}
$$

In this form it is apparent that the value of $y$ at a given location depends not only on the independent variables from that location but also on the independent variables from all other locations, by means of the spatial multiplier, $(I - \rho W_1)^{-1}$ (Anselin, 2002). The dependent variable is also correlated with the error term at all locations. Because the spatial lag $W_1 y$ is therefore correlated with the error term, ordinary least squares gives biased and inconsistent estimates of the model parameters. The parameters must instead be estimated by maximum likelihood, a two-stage least squares approach or by instrumental variables. In the maximum likelihood approach, an estimate of $\rho$ is obtained first by the nonlinear optimization of the following log-likelihood equation:

$$
L = -\frac{N}{2} \ln\left[ \frac{(e_O - \rho e_L)'(e_O - \rho e_L)}{N} \right] + \sum_i \ln(1 - \rho w_i),
\tag{5}
$$

where the variables $w_i$ represent the eigenvalues of the spatial weights matrix, $e_O$ represents the residuals from a regression of $y$ on $X$, and $e_L$ represents the residuals from the regression of $W_1 y$ on $X$ (Anselin and Bera, 1998). Maximum likelihood can then be used to estimate the other coefficients and the standard deviation by using the estimated value of $\rho$.
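The concentrated log-likelihood in equation 5 can be evaluated with a short sketch such as the one below. Several simplifications are assumptions of the sketch and not of the paper: $X$ contains only an intercept (so both auxiliary regressions reduce to subtracting a mean), the term $\sum_i \ln(1 - \rho w_i)$ is computed as the equivalent log-determinant $\ln|I - \rho W_1|$ by Gaussian elimination rather than from eigenvalues, a coarse grid search stands in for a proper nonlinear optimiser, and the six-region ring and the values of `y` are invented.

```python
import math


def matvec(W, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def demean(v):
    """Residuals from regressing v on an intercept only."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def log_det_i_minus_rho_w(W, rho):
    """ln|I - rho W| by Gaussian elimination with partial pivoting.

    This equals the sum over eigenvalues w_i of ln(1 - rho * w_i).
    """
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    log_det = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        log_det += math.log(abs(A[k][k]))
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return log_det


def concentrated_loglik(rho, y, W):
    """Equation 5 for an intercept-only X."""
    n = len(y)
    e_o = demean(y)              # residuals of y on X
    e_l = demean(matvec(W, y))   # residuals of W y on X
    sse = sum((a - rho * b) ** 2 for a, b in zip(e_o, e_l))
    return -(n / 2) * math.log(sse / n) + log_det_i_minus_rho_w(W, rho)


# Hypothetical ring of six regions, each with two neighbours.
n = 6
W = [[0.5 if j in ((i - 1) % n, (i + 1) % n) else 0.0 for j in range(n)]
     for i in range(n)]
y = [3.1, 2.9, 3.4, 4.0, 4.2, 3.6]  # made-up log crime rates

rho_grid = [i / 100 for i in range(-95, 96)]
rho_hat = max(rho_grid, key=lambda r: concentrated_loglik(r, y, W))
print(rho_hat)
```

With a realistic number of regions and several covariates, a dedicated spatial econometrics package would normally be used instead; the sketch is only meant to make the two terms of equation 5 concrete.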

The model estimates have the asymptotic properties of consistency, normality and efficiency (Anselin and Bera, 1998). The asymptotic variance, used for hypothesis testing, can be estimated as:

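$$
\operatorname{AsyVar}[\rho, \beta, \sigma^2] =
\begin{bmatrix}
\operatorname{tr}(W_A)^2 + \operatorname{tr}(W_A' W_A) + \dfrac{(W_A X\beta)'(W_A X\beta)}{\sigma^2} & \dfrac{(X' W_A X\beta)'}{\sigma^2} & \dfrac{\operatorname{tr}(W_A)}{\sigma^2} \\
\dfrac{X' W_A X\beta}{\sigma^2} & \dfrac{X'X}{\sigma^2} & 0 \\
\dfrac{\operatorname{tr}(W_A)}{\sigma^2} & 0' & \dfrac{N}{2\sigma^4}
\end{bmatrix}^{-1}
\tag{6}
$$

where $W_A = W_1 (I - \rho W_1)^{-1}$. As shown in the matrix, the covariance between the error term and the regression coefficient is zero, but there is a nonzero covariance between the spatial coefficient and the error and between the spatial lag and regression coefficients.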

Contrary to a standard regression model, there are two types of observed error terms for the spatial lag model, as the prediction errors and the residuals are not equal (Anselin, 2005). As usual, the prediction errors are defined as the difference between the observed and fitted values, $e_p = y - \hat{y}$, where $\hat{y} = (I - \rho W_1)^{-1} X\beta$. By definition the prediction errors are spatially autocorrelated. The residuals are defined as $e = (I - \rho W_1) y - X\beta$, and they will be uncorrelated if the model adequately accounts for the spatial autocorrelation in the data. Therefore all tests for the model assumptions and for any remaining unexplained spatial autocorrelation in the data are done with the residuals and not the prediction errors.
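One statistic commonly used to check a set of residuals for spatial autocorrelation is Moran's I, which is also the statistic applied to the standard regression residuals in section 3. The sketch below computes only the statistic, $I = (N / S_0)\,(z' W z) / (z' z)$ with $S_0$ the sum of all weights and $z$ the deviations from the mean; it does not compute its significance. The four-region chain and the two residual patterns are invented for illustration, and `morans_i` is a hypothetical helper name.

```python
def morans_i(values, W):
    """Moran's I for `values` under the spatial weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


# Four hypothetical regions in a row, row-standardised rook weights.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
clustered = [1.0, 1.0, -1.0, -1.0]    # similar values sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours always differ

print(morans_i(clustered, W))    # positive: like values cluster
print(morans_i(alternating, W))  # negative: like values repel
```

Residuals from a model that has absorbed the spatial structure should give a value close to its expectation under no autocorrelation, whereas a clearly positive value points to remaining clustering.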

2.3 Spatial Error Model

The spatial error model (3) can also be written as:

$$
y = X\beta + (I - \lambda W_2)^{-1} u
\tag{7}
$$

which illustrates that the spatial disturbance is completely modelled in the residuals. The spatial error model is used when spatial autocorrelation is found in the residuals from a standard regression model and the dependence needs to be accounted for in the model to obtain unbiased and efficient estimates of the regression parameters. This spatial dependence could be the effect of unmeasured variables or measurement errors which have a systematic spatial pattern. There is thus no direct interpretation of the spatial error coefficient, $\lambda$, so it is treated as a nuisance parameter.

The spatial error model can be estimated using generalised least squares conditional on the value of $\lambda$, since the spatial dependence is completely contained in the model error terms. The value of $\lambda$ is estimated first by the nonlinear optimization of the following log-likelihood equation:

$$
L = -\frac{N}{2} \ln\left[ \frac{u'u}{N} \right] + \sum_i \ln(1 - \lambda w_i)
\tag{8}
$$

where $u'u = y_L' y_L - y_L' X_L [X_L' X_L]^{-1} X_L' y_L$, and $y_L$ and $X_L$ are spatially filtered variables: $y_L = y - \lambda W_2 y$ and $X_L = X - \lambda W_2 X$, and the variables $w_i$ represent the eigenvalues from the spatial weights matrix. The resulting estimate of $\lambda$ is then used to estimate the other regression coefficients and the variance by generalised least squares. The asymptotic variance matrix of the parameter estimates is block diagonal between $\beta$ and the other two parameters, $\sigma^2$ and $\lambda$. It is estimated by

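$$
\operatorname{AsyVar}[\beta] = \sigma^2 [X_L' X_L]^{-1}
\tag{9}
$$

and

$$
\operatorname{AsyVar}[\sigma^2, \lambda] =
\begin{bmatrix}
\dfrac{N}{2\sigma^4} & \dfrac{\operatorname{tr}(W_B)}{\sigma^2} \\
\dfrac{\operatorname{tr}(W_B)}{\sigma^2} & \operatorname{tr}(W_B)^2 + \operatorname{tr}(W_B' W_B)
\end{bmatrix}^{-1}
\tag{10}
$$

where $W_B = W_2 (I - \lambda W_2)^{-1}$ (Anselin and Bera, 1998).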

Different structures for the spatial dependence in the error terms can also be used to create alternate forms of the spatial error model instead of the autoregressive process that is illustrated here. For example, a spatial moving average process, in which the error term is dependent on a random uncorrelated error term and the average neighbouring values of the uncorrelated error, could also be used with this model structure (Anselin and Bera, 1998).
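For the autoregressive error process, the generalised least squares step for a given value of $\lambda$ amounts to ordinary least squares on the spatially filtered variables $y_L = y - \lambda W_2 y$ and $X_L = X - \lambda W_2 X$. The sketch below illustrates this for a single covariate plus an intercept. The value of $\lambda$ is simply supplied (in practice it would come from maximising the log-likelihood), and the four-region chain, the data and the function names are hypothetical.

```python
def matvec(W, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def spatial_filter(v, W, lam):
    """Return v - lam * W v, the spatially filtered version of v."""
    return [a - lam * b for a, b in zip(v, matvec(W, v))]


def gls_given_lambda(y, x, W, lam):
    """Least squares of filtered y on the filtered intercept and filtered x."""
    y_l = spatial_filter(y, W, lam)
    c_l = spatial_filter([1.0] * len(y), W, lam)  # filtered intercept column
    x_l = spatial_filter(x, W, lam)
    # Solve the 2 x 2 normal equations (X_L' X_L) b = X_L' y_L directly.
    a11 = sum(c * c for c in c_l)
    a12 = sum(c * v for c, v in zip(c_l, x_l))
    a22 = sum(v * v for v in x_l)
    b1 = sum(c * t for c, t in zip(c_l, y_l))
    b2 = sum(v * t for v, t in zip(x_l, y_l))
    det = a11 * a22 - a12 * a12
    intercept = (a22 * b1 - a12 * b2) / det
    slope = (a11 * b2 - a12 * b1) / det
    return intercept, slope


# Four hypothetical regions in a row, row-standardised weights.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0 + 3.0 * v for v in x]  # noise-free data, so the fit is exact

intercept, slope = gls_given_lambda(y, x, W, lam=0.4)
print(intercept, slope)
```

Because the made-up response is an exact linear function of `x`, the filtered regression recovers the intercept and slope used to generate it (up to rounding) whatever value of `lam` is supplied; with noisy data the choice of $\lambda$ would affect the estimates and their standard errors.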

3. ANALYSIS OF NEIGHBOURHOOD CRIME

3.1 Description of Variables

3.1.1 Crime Data

The crime data used in all of the spatial models is obtained from police services through the Uniform Crime Reporting Survey (UCR). The UCR, administered by the Canadian Centre for Justice Statistics (CCJS) at Statistics Canada, collects administrative data from all of the police services across Canada. Thus any analysis of UCR data is based only on the crimes known to the police, which give a specific picture of the nature and the extent of crime in Canada (Savoie et al., 2006).

For most of the analysis done at CCJS the UCR data is grouped into three broad categories of crime: violent, property and other offences. Violent offences are defined as crimes against the person and include homicide, attempted murder, sexual assault, assault and robbery. Property offences are defined as crimes against property and include arson, breaking and entering, theft, possession of stolen goods and mischief. The category for other offences includes all the offences that are not classified as violent or property, such as bail violations, drug possession or trafficking, and counterfeiting currency. Many of these other crimes cannot be modelled spatially as there is no physical location where the offence takes place. For example, violations against the administration of justice, such as bail violations and failure to appear in court, are usually given a default location of the courthouse or police station. For this reason only violent and property crimes are analysed with the spatial models.

Since the volume of crime in any given region is directly related to the size of the population in the region, crime statistics are usually displayed as a rate of the number of incidents per population. This provides a good representation of the crime in large geographic areas, such as cities or provinces, where the population on a daily basis is relatively constant. However, a crime rate based solely on the residential population is not sufficient for representing the rate within a city neighbourhood, since people move between neighbourhoods during the day and have the potential to be a victim of crime in any of the neighbourhoods that they visit. Instead, for the purpose of this analysis, the crime rate is calculated as the number of criminal incidents per population at risk, where the population at risk is defined as the combined population of residents and people who work in the area. Using the population at risk helps to stabilise the crime rate, which would otherwise be inflated for downtown and commercial areas where there is a small residential population but a high concentration of people working in the area or engaging in other activities (Savoie et al., 2006). In all of the models the crime rate is log-transformed to obtain an approximately normal distribution for the dependent variable.
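The effect of using the population at risk can be seen in a small calculation. The sketch below is illustrative only: the scale of the rate (per 1,000 people), the use of the natural logarithm and all of the counts are assumptions, and `log_crime_rate` is a hypothetical helper.

```python
import math


def log_crime_rate(incidents, residents, workers, per=1000):
    """Log of the crime rate per `per` people at risk (residents plus workers).

    Assumes at least one incident; a zero count would need separate handling
    before taking the logarithm.
    """
    population_at_risk = residents + workers
    rate = per * incidents / population_at_risk
    return math.log(rate)


# Hypothetical downtown tract: few residents, many daytime workers.
incidents, residents, workers = 200, 500, 9500

rate_residents_only = 1000 * incidents / residents
rate_at_risk = math.exp(log_crime_rate(incidents, residents, workers))
print(rate_residents_only)  # rate inflated by the small residential base
print(rate_at_risk)         # rate per 1,000 people at risk
```

In this made-up tract the residents-only rate is 400 incidents per 1,000 people, while the rate based on the population at risk is about 20 per 1,000, which is the kind of stabilisation described above.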

3.1.2 Explanatory Variables

The crime data was related to various neighbourhood characteristics that describe the demographics and socioeconomic status of the resident population, the condition of the dwellings in the neighbourhood and how the area is zoned for the different types of land use. The demographic and dwelling characteristics were obtained from the Census of Population conducted on May 15, 2001. The demographic characteristics include the percentage of males aged 15 to 24 in the resident population, the proportion of the population who are aged 15 and over and have never been married, the percentage of the population who are members of a visible minority, the proportion of single-mother families among economic families living in private households, and the proportion of residents who live alone. The socioeconomic characteristics included variables about the median household income, the proportion of residents over 20 with a bachelor's degree and the proportion of the population living in a private household with a low income in 2000. Variables describing the conditions of the dwellings in the neighbourhood, such as the proportion of dwellings in need of major repair and the proportion of households that are occupied by the owner, were also included in the preliminary models.

The land-use variables describe the proportion of the neighbourhood that was zoned for commercial land use and for single- and multiple-family dwellings. The land-use data was obtained directly from each of the cities analysed in this project. Additional variables describing other factors thought pertinent to the city were also added to the analysis. For example, this includes a variable representing the density of bars and other drinking establishments over the geographic area in Montréal, and the distance from the downtown core in Thunder Bay.

Prior to the analysis the explanatory variables were standardised to have a mean of zero and a variance of one. For further information on the variables used in this analysis and information on their potential influence on the crime rate, refer to Savoie et al. (2006) and Savoie (2008).
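Assuming the transformation is the usual z-score (subtract the mean, divide by the standard deviation), it can be sketched as follows. Whether the original analysis divided by the population or the sample standard deviation is not stated, so the use of the population version here is an assumption, as are the example values.

```python
import statistics


def standardise(values):
    """Rescale values to mean zero and (population) variance one."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mean) / sd for v in values]


# Made-up percentages of residents living alone in four neighbourhoods.
pct_living_alone = [10.0, 20.0, 30.0, 40.0]
z = standardise(pct_living_alone)
print(z)
```

Putting every explanatory variable on this common scale makes the magnitudes of the fitted coefficients directly comparable across variables measured in different units.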

3.2 Analysis of Crime in Halifax

The study area of Halifax included the portion of the Halifax Regional Municipality that is primarily serviced by the Halifax Regional Police. The Halifax Regional Municipality consists of four separate municipalities: Halifax, Dartmouth, Bedford and Halifax County, which were amalgamated in 1996 to be governed by a single city council. The study area includes the region southwest of the harbour, the former municipality of Halifax, and the region northeast of the harbour, the former municipality of Dartmouth (Figure 3A). The remainder of the Halifax Regional Municipality, including the region north of the harbour that connects Halifax and Dartmouth, known as Bedford, is serviced by the Halifax County Rural RCMP and was not included in the analysis. The study area is thus divided into two separate regions by the Halifax harbour. This is the first indication that there is a location effect in the data that needs to be considered in the analysis.

Figure 3: A) The Halifax study area with neighbourhoods defined by census tracts. B) The distribution of violent crime.

The study area spans 160 square kilometres and had a resident population of 191,514 people in 2001. The neighbourhood boundaries used in this analysis are defined by the 51 census tracts in the area. Census tracts are small, relatively stable geographic areas with a population of 2,500 to 8,000 residents, defined by Statistics Canada in conjunction with local specialists. For the purpose of the spatial analysis, the census tracts are considered neighbours of each other based on a queen contiguity structure, which includes regions that have either a border or a vertex in common.


The distribution of violent crime in Halifax is concentrated in the downtown area, which is located on either side of the Halifax harbour (Figure 3B). The log-transformed violent crime rate was modelled by the various neighbourhood characteristics using a standard linear regression model in SAS. Since the study area is divided into two separate regions by the harbour, a binary location variable indicating whether the neighbourhood was located southwest of the harbour (in Halifax) or northeast of the harbour (in Dartmouth) was included as a variable in the analysis. A stepwise procedure was then used to determine the optimal set of variables to include in the model, and variance inflation factors (VIFs) were used to test for multicollinearity amongst the explanatory variables.

The location variable was marginally significant in the model of the entire study area (Table 1), so separate models were also fit to the regions northeast and southwest of the harbour. A comparison of the regional models indicates that different variables are more closely related to violent crime in the different regions of the city, which is not apparent from a model of the entire study area with the binary location variable. For example, neighbourhoods where a higher percentage of the resident population had a bachelor's degree tended to have less violent crime in the northeast regions, all else being equal. This was not true in the regions southwest of the harbour, which is where the three local universities are located. This difference may be because residents with a bachelor's degree are more evenly spread out in the southwest regions around the three universities, which are all in neighbourhoods with a varying level of violent crime. Regardless of the reason for the difference, accounting for the spatial nature of the city allows for a better fit of the model to the data. The adjusted R-squared value for the model of the entire study area was 0.48, while the value for the model northeast of the harbour was 0.81 and the value for the regions in the southwest was 0.60.

Table 1: The standard linear regression models for violent crime in the entire Halifax study area and the neighbourhoods northeast and southwest of the Halifax harbour.

Variable    Coeff    p-value    Coeff    p-value    Coeff    p-value
Intercept   3.15


was no multicollinearity among the predictor variables. The residuals of the standard regression model were then tested for the presence of spatial autocorrelation using a Moran's I statistic. The value of the statistic was 0.23 for property crime (p


To ensure the spatial autocorrelation in the data was completely accounted for by the final model, the residuals were tested for remaining spatial autocorrelation using another LM test, in which the residuals from the spatial lag model are tested for spatial error dependence (Anselin et al., 1996). The LM test statistic was 1.72 (p = 0.190), indicating that the spatial structure of the data was adequately explained by the spatial lag model.
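The reported p-value can be checked by hand, assuming the LM statistic is referred to a chi-squared distribution with one degree of freedom, as is standard for the single-parameter LM tests of Anselin et al. (1996). The helper name `chi2_sf_1df` is hypothetical; it relies on the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


p_value = chi2_sf_1df(1.72)
print(round(p_value, 3))
```

The result agrees, to three decimal places, with the p-value of 0.190 quoted for the LM statistic of 1.72, so the null hypothesis of no remaining spatial error dependence is not rejected at conventional levels.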

Table 2: The standard linear regression and spatial lag models fit to the rate of property crime on the island of Montréal.

Variable    Coeff    p-value    Coeff    p-value
Intercept   3.47


4. CONCLUSION

Whenever the dataset being analysed is representative of a geographic location it is important to consider the spatial structure of the data in the analysis. As shown with the analysis of violent crime in Halifax, a different model was obtained for the two separate areas in the study. In both regions a standard linear regression model did account for the spatial structure of the data, but by allowing a different model in the two regions of the city, the models provided a better fit to the data and provided more information about the different factors influencing crime in the area.

In the analysis of property crime on the island of Montréal, a spatial model was necessary to allow for accurate inference of the other model parameters. If a standard linear model had been used for this data it may have erroneously resulted in the recommendation of a policy to repair housing in at-risk neighbourhoods as a possible means of reducing crime. Although that may be advantageous to the people living in those regions, the spatial model did not show that the proportion of houses in need of major repair was one of the most important variables in predicting the rate of property crime. The spatial lag model instead illustrated that neighbourhoods with a high property crime rate were associated with neighbourhoods with a high percentage of low-income private households, people living alone, commercial zoning and number of bars per square kilometre, and a low percentage of residents who are members of a visible minority.

Finally, the difference between the spatial lag and spatial error models for the Montréal property crime data illustrated the importance of ensuring the form of the spatial structure in the data is accurately represented in the model. This can be easily detected using LM tests applied to the residuals of a standard linear model.

REFERENCES

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer, Dordrecht.

Anselin, L. (2002). Under the hood: Issues in the specification and interpretation of spatial regression analysis. Agricultural Economics. 27: 247-267.

Anselin, L. (2005). Exploring spatial data with GeoDa: a workbook. Spatial Analysis Laboratory (SAL). Department of Agricultural and Consumer Economics, University of Illinois, Urbana-Champaign, IL.

Anselin, L., A. Bera, R. Florax, M. Yoon. (1996). Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics. 26: 77-104.

Anselin, L. and A. Bera. (1998). Spatial dependence in linear regression models with an introduction to spatial econometrics. In: A. Ullah and D. Giles (eds.) Handbook of Applied Economic Statistics. New York: Marcel Dekker.

Cressie, N. (1993). Statistics for Spatial Data (Revised Edition). New York: John Wiley.

Dubin, R.A. (1998). Spatial autocorrelation: a primer. Journal of Housing Economics. 7: 304-327.

Legendre, P. (1993). Spatial autocorrelation: trouble or new paradigm? Ecology. 74(6): 1659-1673.

Puech, F. (2004). How do criminals locate? Crime and spatial dependence in Minas Gerais. Urban/Regional 0509018, Economics Working Paper Archive EconWPA.

Savoie, J., F. Bédard and K. Collins. (2006). Neighbourhood characteristics and the distribution of crime on the Island of Montréal. Crime and Justice Research Paper Series. Statistics Canada Catalogue no. 85-561-XIE, no. 7.

Savoie, J. (2008). Analysis of the spatial distribution of crime in Canada: Summary of major trends 1999, 2001, 2003 and 2006. Crime and Justice Research Paper Series. Statistics Canada Catalogue no. 85-561-XIE, no. 15.

Statistics Canada. (2008). Neighbourhood characteristics and the distribution of crime: Edmonton, Halifax and Thunder Bay. Josée Savoie (ed.). Crime and Justice Research Paper Series. Statistics Canada Catalogue no. 85-561-XIE, no. 10.