GWR Presentation
![Page 1: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/1.jpg)
Geographically Weighted Regression
CSDE Statistics Workshop
Christopher S. Fowler, PhD
February 1st, 2011
Significant portions of this workshop were culled from presentations prepared by Fotheringham, Charlton and Brunsdon and presented at the 2010 Advanced Workshop on Spatial Analysis at the University of California, Santa Barbara.
Center for Studies in Demography and Ecology, University of Washington
![Page 2: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/2.jpg)
Outline for the Session
The motivation for GWR
◦ Examples from YOUR discipline
Mapping OLS Residuals
◦ A good baseline for why we need GWR
GWR
◦ Definitions, basic concepts
Running GWR
◦ A straightforward implementation in ArcGIS
GWR and some extensions
![Page 3: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/3.jpg)
Basics of OLS
$y = X\beta + \varepsilon$
Assumes a stationary process
Same stimulus provokes the same
response anywhere in the study area
![Page 4: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/4.jpg)
Why might relationships vary spatially?
Sampling variation
Relationships intrinsically different across
space (attitudes, preferences, contextual
effects)
Model misspecification
![Page 5: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/5.jpg)
Applications: Ecology
“GWR works on trees…”
Could have been “differentiated
sampling pattern creates predictable
and changing levels of interaction
among observations”
![Page 6: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/6.jpg)
Applications: Public Health
The relationship between
mortality and occupational
segregation and between
mortality and unemployment
varies across Tokyo
Relationships vary
systematically
![Page 7: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/7.jpg)
Applications: Sociology/Public Policy
The link between multifamily
housing and residential burglaries
varies widely even when
controlling for numerous
socioeconomic and neighborhood
factors
Missing variables (and
they may very well be
unknowable)
![Page 8: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/8.jpg)
Back up…How do we know if we
have nonstationarity in our model?
Map residuals and test them for spatial
autocorrelation
…if our model errs systematically with a
spatial pattern then we may be on to
something.
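A minimal sketch of the residual check described above, assuming a simple binary neighbor matrix and made-up residuals (the data and the `chain_weights` helper are hypothetical; the lab itself uses ArcGIS). It computes global Moran's I, which is positive when similar residuals sit next to each other and negative when neighbors tend to differ.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def chain_weights(n):
    """Hypothetical neighbor structure: locations on a line, adjacent ones are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(5)
clustered = [2.0, 1.5, 0.0, -1.5, -2.0]     # residuals that drift from high to low
alternating = [2.0, -2.0, 2.0, -2.0, 2.0]   # residuals that flip sign between neighbors

print(morans_i(clustered, w))    # positive: spatially patterned errors
print(morans_i(alternating, w))  # negative: neighbors are dissimilar
```

A clearly positive value on real residuals is the kind of systematic spatial error the slide refers to; significance would still need a permutation or analytical test.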
![Page 9: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/9.jpg)
Poverty in the Southern U.S.
![Page 10: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/10.jpg)
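Our example model
$$\text{Poverty} = \beta_0 + \beta_1\,\text{FemaleHeadedHousehold} + \beta_2\,\text{Unemployed} + \beta_3\,\text{Black} + \beta_4\,\text{65andOlder} + \beta_5\,\text{Metro} + \beta_6\,\text{AtLeastHighSchoolEducation} + \varepsilon$$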
Based on the work of Paul Voss and Katherine Curtis
These are all understood to be good predictors of poverty
What kinds of spatial structures influence this data set?
![Page 11: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/11.jpg)
Lab Part 1
Run our OLS model in ArcGIS
Examine model output
Map residuals
Calculate Moran’s I and Local Moran’s I
![Page 12: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/12.jpg)
![Page 13: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/13.jpg)
Our best aspatial model
![Page 14: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/14.jpg)
So what now?
◦ Add more missing variables and try again
Repeat the steps from the lab
◦ Accept that there is something about certain places
that makes them different (spatial heterogeneity)
Try GWR
◦ Test variables meant to explore interactions taking
place at short distances (spatial dependence)
Try Spatial Regression (Likely a spatial lag model)
◦ Assume that the correlation is a “nuisance” and
control for it in the error term
Try Spatial Regression (Likely a spatial error model)
![Page 15: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/15.jpg)
Outline for Part II
What is GWR
Weighting in GWR
![Page 16: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/16.jpg)
Geographically Weighted Regression
Local statistical technique to analyze
spatial variations in relationships
We are not content with global averages
of spatial data (climate for example)
Why should we be satisfied with global
averages in a statistical analysis?
![Page 17: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/17.jpg)
Put another way… Simpson’s Paradox
If we think of these points as our data, grouped into colors by region, we can see that the global and local models differ significantly
Source: Rücker and Schumacher BMC Medical Research Methodology 2008
8:34 doi:10.1186/1471-2288-8-34
![Page 18: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/18.jpg)
Basic definitions
Spatial nonstationarity exists when the same stimulus provokes a different response in different parts of the study region
Global models are statements about processes that are assumed to be stationary and, as such, are location independent
Local models are spatial disaggregations of global models, the results of which are location specific
Spatial heterogeneity refers to spatial patterns resulting from broad similarities usually over time
Spatial dependence refers to spatial patterns that result from interactions among observations
GWR in greater detail
![Page 19: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/19.jpg)
Spatial Heterogeneity and Spatial Dependence
![Page 20: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/20.jpg)
GWR and Spatial Processes
GWR is excellent at picking up broad
scale regional differences
◦ spatial heterogeneity
Not as effective at dealing with small scale
interaction processes
◦ Too much bias in each local model
◦ That doesn’t mean it won’t try (and give you
misleading results)
![Page 21: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/21.jpg)
GWR in a nutshell
Global model
$y = X\beta + \varepsilon$
becomes
$y_i = X_i\beta_i + \varepsilon_i$
where $i$ indicates that there is a set of coefficients estimated for every observation in our data set
![Page 22: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/22.jpg)
The Key Difference
We estimate a set of regression
coefficients for each observation
To do so we weight near observations
more heavily than more distant ones.
We may also estimate coefficients based
on some local subset of observations
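The idea of one weighted regression per observation can be sketched as follows. This is an illustration under stated assumptions, not the ArcGIS implementation: a single predictor, a Gaussian distance weight, and synthetic data in which the true slope rises from west to east. The function names and the data are hypothetical.

```python
import math


def gaussian_weight(d, h):
    """Weight that decays smoothly with distance d for bandwidth h."""
    return math.exp(-0.5 * (d / h) ** 2)


def local_fit(i, coords, x, y, h):
    """Weighted least squares (intercept, slope) centred on observation i."""
    w = [gaussian_weight(math.dist(coords[i], c), h) for c in coords]
    sw = sum(w)
    xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Synthetic data: 20 locations on a line, true slope grows with position.
coords = [(float(k), 0.0) for k in range(20)]
x = [(k * 7) % 5 + 1 for k in range(20)]
y = [(0.1 + 0.04 * k) * xk for k, xk in enumerate(x)]

slopes = [local_fit(i, coords, x, y, h=3.0)[1] for i in range(len(x))]
global_slope = local_fit(0, coords, x, y, h=1e9)[1]  # huge bandwidth ~ equal weights ~ OLS

print(round(global_slope, 2))                  # one number for the whole study area
print([round(s, 2) for s in slopes[::5]])      # local slopes at every fifth location
```

With a very large bandwidth every observation gets nearly the same weight and the local fit collapses to the global model, which is one way to see GWR as a generalization of OLS.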
![Page 23: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/23.jpg)
Some advantages of GWR
Excellent tool for testing model
specification
◦ Where does model fit look good, where are
you missing something?
Residuals generally lower and not spatially
autocorrelated
![Page 24: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/24.jpg)
Real values for β
.9 .8 .8 .7 .5
.8 .7 .6 .5 .4
.7 .6 .5 .4 .4
.6 .5 .4 .3 .2
.5 .4 .3 .2 .1
![Page 25: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/25.jpg)
Estimated Values of β in global
model
.5 .5 .5 .5 .5
.5 .5 .5 .5 .5
.5 .5 .5 .5 .5
.5 .5 .5 .5 .5
.5 .5 .5 .5 .5
![Page 26: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/26.jpg)
Residuals from global model
+ + + + 0
+ + + 0 -
+ + 0 - -
+ 0 - - -
0 - - - -
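The three grids on these slides can be reproduced with a few lines. This is a simplified sketch: it treats the sign of (real β minus the global estimate of .5) as a stand-in for the sign of the residual, which assumes positive predictor values. The variable names are illustrative only.

```python
real_beta = [
    [0.9, 0.8, 0.8, 0.7, 0.5],
    [0.8, 0.7, 0.6, 0.5, 0.4],
    [0.7, 0.6, 0.5, 0.4, 0.4],
    [0.6, 0.5, 0.4, 0.3, 0.2],
    [0.5, 0.4, 0.3, 0.2, 0.1],
]
global_beta = 0.5  # the single coefficient the global model reports everywhere


def sign(v):
    """Return '+', '-' or '0' for a number."""
    return "+" if v > 0 else "-" if v < 0 else "0"


# Rounding guards against floating-point noise such as 0.6 - 0.5 != 0.1 exactly.
pattern = [" ".join(sign(round(b - global_beta, 10)) for b in row) for row in real_beta]
mean_beta = sum(sum(row) for row in real_beta) / 25

for line in pattern:
    print(line)
print(round(mean_beta, 3))
```

The printed pattern matches the residual grid above: the global model under-predicts in the upper left, over-predicts in the lower right, and the errors are spatially clustered rather than random.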
![Page 27: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/27.jpg)
Reasons to use GWR
Identify model misspecification
Identify nonstationarity in relationships
Improved model fit (R², AIC, etc.)
Reduced spatial autocorrelation
Represent “context”
◦ Address spatial heterogeneity when precise
variables may not exist
![Page 28: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/28.jpg)
You’ve convinced me, what next?
Run your aspatial model (as we did in 1st
lab)
◦ We will want the results and diagnostics to
compare with what comes next.
Decide how you are going to weight your
nearby locations
◦ Fixed bandwidth
◦ Variable bandwidth
◦ User-defined bandwidth
![Page 29: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/29.jpg)
It all comes down to how you weight the observations…
We can use a fixed bandwidth “h”
Number of observations will vary, but the area they represent will remain constant
$w_{ij} = \exp\left[-\tfrac{1}{2}\left(d_{ij}/h\right)^2\right]$
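As a rough illustration of the fixed-bandwidth Gaussian weighting on this slide, the sketch below evaluates the weight at a few distances. The bandwidth of 10 and the distance units are arbitrary assumptions.

```python
import math


def gaussian_weight(d, h):
    """Fixed-bandwidth Gaussian kernel: exp(-((d / h) ** 2) / 2)."""
    return math.exp(-0.5 * (d / h) ** 2)


h = 10.0  # hypothetical bandwidth, in the same units as the distances
weights = [gaussian_weight(d, h) for d in (0, 5, 10, 20, 40)]
print([round(w, 4) for w in weights])
```

The weight is 1 at the regression point and falls toward zero with distance but never reaches it, so every observation contributes something to every local model.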
![Page 30: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/30.jpg)
Weighting option 2
Or we can employ an adaptive bandwidth
Number of observations will remain fixed, but area will not be the same
$w_{ij} = \left[1 - d_{ij}^2/h^2\right]^2$ if j is one of i’s N nearest neighbors, and $w_{ij} = 0$ otherwise
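A small sketch of the adaptive option, assuming (as is common, though the slide does not say so) that h is set to the distance of the N-th nearest observation, so the kernel stretches in sparse areas and shrinks in dense ones. The function name and the example distances are hypothetical.

```python
def bisquare_weights(distances, n_neighbors):
    """Adaptive bisquare weights for one regression point.

    Assumption: the bandwidth h is the distance to the n-th nearest observation
    in `distances`; anything at or beyond h gets weight 0.
    """
    h = sorted(distances)[n_neighbors - 1]
    return [(1 - (d / h) ** 2) ** 2 if d < h else 0.0 for d in distances]


# Distances from one regression point to five observations (itself included at 0).
example = bisquare_weights([0.0, 1.0, 2.0, 4.0, 8.0], n_neighbors=4)
print(example)
```

Unlike the Gaussian kernel, the bisquare cuts off completely, so each local model uses a fixed number of observations regardless of how far away they are.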
![Page 31: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/31.jpg)
Kernels and Weights
So how do we know what bandwidth to use?
•Bandwidth specifies
shape of weights curve
•Kernel type tells us
whether we will define
our bandwidth based on
distance (fixed) or
number of neighbors
(adaptive)
![Page 32: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/32.jpg)
Judging the appropriate bandwidth
A tradeoff between
◦ Bias: we include observations that are not part of the same spatial “group”
and
◦ Variance: we don’t have enough points in our model to say anything with conviction
AICc or CV measure model fit
Optimize fit to obtain best bandwidth.
[Figure: AIC plotted against bandwidth. Variance dominates below the optimum bandwidth, bias above it.]
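The optimization step can be sketched as a grid search over candidate bandwidths using leave-one-out cross-validation (CV). This is an illustration under assumptions: one predictor, a Gaussian kernel, synthetic data with a drifting slope plus noise, and a hand-picked candidate list. Real software typically searches continuously and often uses AICc instead.

```python
import math
import random


def gaussian_weight(d, h):
    return math.exp(-0.5 * (d / h) ** 2)


def loo_prediction(i, pos, x, y, h):
    """Predict y[i] from a local weighted fit that excludes observation i."""
    w = [0.0 if j == i else gaussian_weight(abs(pos[i] - pos[j]), h)
         for j in range(len(pos))]
    sw = sum(w)
    xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
    slope = sxy / sxx
    return ybar + slope * (x[i] - xbar)


def cv_score(pos, x, y, h):
    """Sum of squared leave-one-out prediction errors for bandwidth h."""
    return sum((y[i] - loo_prediction(i, pos, x, y, h)) ** 2 for i in range(len(y)))


# Synthetic data: 40 locations on a line, slope drifts with position, plus noise.
rng = random.Random(0)
pos = list(range(40))
x = [(k * 7) % 5 + 1 for k in pos]
y = [(0.1 + 0.02 * k) * xk + rng.gauss(0, 0.2) for k, xk in zip(pos, x)]

candidates = [1.5, 2, 3, 5, 8, 50]  # hypothetical bandwidths to compare
scores = {h: cv_score(pos, x, y, h) for h in candidates}
best_h = min(scores, key=scores.get)
print(best_h, round(scores[best_h], 3))
```

Leaving observation i out of its own local fit matters: without it, the smallest bandwidth would always win because each point would simply predict itself.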
![Page 33: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/33.jpg)
To sum
Weighting assumptions are very important to outcomes in GWR
Fixed distance kernel is more appropriate when the distribution of your observations is relatively stable across space (e.g. size, number of neighbors).
Adaptive kernel is appropriate when distribution varies across space (e.g. events are clustered or polygons are heterogeneous)
Once a kernel type is selected, optimization takes some of the guesswork out of choosing the bandwidth, but robustness checks are still needed
![Page 34: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/34.jpg)
Residuals from the OLS model from
last lesson
Looks
reasonably good
Moran’s I is still
.22 and highly
significant
![Page 35: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/35.jpg)
Lab
Run GWR model
Check Residuals
Check variation in coefficients
![Page 36: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/36.jpg)
![Page 37: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/37.jpg)
Further topics/issues in GWR
Where to go for next steps
General troubleshooting
Significance testing
Outlier problems
Poisson and Logistic model
implementations
Mixed form models
![Page 38: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/38.jpg)
Other software implementations of
GWR
GWR 3.x (4.0 should be out soon)
R (spgwr package)
Stata
Matlab
Perhaps others I haven’t heard of…
![Page 39: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/39.jpg)
General Troubleshooting
Regional dummies –BAD
◦ Eliminate them from model—we are trying to
show regional variation, not control for it
Binary and low probability count variables
◦ Use caution, lack of variation may cause
model to crash or have trouble finding a
workable bandwidth
![Page 40: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/40.jpg)
Significance Testing
How do I know if the variation I see in my
coefficients is meaningful?
Could do t-test, but you will run into
problems with multiple (1,387) tests
◦ Results in lots of false positives
◦ Standard correction (Bonferroni) will make
any significance finding nearly impossible
![Page 41: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/41.jpg)
Best Method: Monte Carlo simulation
Randomly reassign all observation values (dependent and independent variables travel together) to different observation locations
◦ Each county’s data gets assigned randomly to a different county
Re-run GWR and record coefficients
Repeat lots of times (at least 100)
Define a distribution for coefficient values and compare your coefficients to this distribution
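The steps above can be sketched as a permutation test. Several choices here are assumptions rather than anything stated on the slide: the test statistic is the standard deviation of the local slopes, the local model has one predictor with a Gaussian kernel, and the data are synthetic. Only 99 permutations are run to keep the example fast.

```python
import math
import random
import statistics


def local_slopes(pos, x, y, h):
    """One weighted-least-squares slope per location (Gaussian kernel, bandwidth h)."""
    slopes = []
    for p in pos:
        w = [math.exp(-0.5 * ((p - q) / h) ** 2) for q in pos]
        sw = sum(w)
        xbar = sum(a * b for a, b in zip(w, x)) / sw
        ybar = sum(a * b for a, b in zip(w, y)) / sw
        sxx = sum(a * (b - xbar) ** 2 for a, b in zip(w, x))
        sxy = sum(a * (b - xbar) * (c - ybar) for a, b, c in zip(w, x, y))
        slopes.append(sxy / sxx)
    return slopes


def monte_carlo_p(pos, x, y, h, n_sims=99, seed=0):
    """Share of runs (observed included) with at least as much slope variation."""
    rng = random.Random(seed)
    observed = statistics.pstdev(local_slopes(pos, x, y, h))
    idx = list(range(len(pos)))
    at_least = 0
    for _ in range(n_sims):
        rng.shuffle(idx)              # x and y travel together to new locations
        xs = [x[k] for k in idx]
        ys = [y[k] for k in idx]
        if statistics.pstdev(local_slopes(pos, xs, ys, h)) >= observed:
            at_least += 1
    return (at_least + 1) / (n_sims + 1)


pos = list(range(30))
x = [(k * 7) % 5 + 1 for k in pos]
y = [(0.1 + 0.03 * k) * xk for k, xk in zip(pos, x)]
p_value = monte_carlo_p(pos, x, y, h=3.0)
print(p_value)
```

Shuffling whole records keeps each observation's own x-y relationship intact and destroys only its link to location, which is exactly the null hypothesis of no spatial variation in the coefficient.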
![Page 42: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/42.jpg)
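Other method: Fotheringham Significance Test
$$\alpha_{\text{Fotheringham}} = \frac{\alpha}{1 + p_e - \frac{p_e}{np}}$$
$p_e$ is the effective number of parameters
$p$ is the number of parameters
$n$ is the number of observations and $\alpha$ is the desired overall significance level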
![Page 43: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/43.jpg)
Fotheringham Significance Test
$$\alpha_{\text{Fotheringham}} = \frac{\alpha}{1 + p_e - \frac{p_e}{np}}$$
$$\alpha_{\text{Fotheringham}} = \frac{.05}{1 + 37.97 - \frac{37.97}{1387 \times 8}} = .001283$$
In Excel we can find the significant T-statistic using:
TINV(.001283,1379)
In R we use:
qt(1-(.001283/2),1379)
Either way we get a value of ~3.23
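The same arithmetic can be checked in Python using only the standard library. This sketch plugs in the numbers from the worked example on this slide; the critical value uses a normal approximation (an assumption made to avoid extra dependencies), so it comes out slightly below the t-based ~3.23.

```python
from statistics import NormalDist


def adjusted_alpha(alpha, p_e, n, p):
    """Adjusted significance level: alpha / (1 + p_e - p_e / (n * p))."""
    return alpha / (1 + p_e - p_e / (n * p))


alpha_adj = adjusted_alpha(alpha=0.05, p_e=37.97, n=1387, p=8)

# Two-tailed critical value under a normal approximation to t with 1379 df.
z_crit = NormalDist().inv_cdf(1 - alpha_adj / 2)

print(round(alpha_adj, 6))
print(round(z_crit, 2))
```

Because the effective number of parameters (about 38) is far smaller than the 1,387 local tests, this correction is much less severe than dividing .05 by 1,387 as Bonferroni would.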
![Page 44: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/44.jpg)
Results: Significant Nonstationarity
for Percent Hispanic
![Page 45: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/45.jpg)
Outlier problems
Outliers cause problems for everybody, but their impact is greater for local regressions, particularly when the bandwidth keeps the number of observations low.
In standard OLS
◦ Run model and identify observations with high or low residuals (~ +/- 4)
◦ Weight these observations less than 1
◦ Re-run until none of the observations have extreme residuals
◦ Now do your GWR with weights assigned
![Page 46: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/46.jpg)
Poisson and Logistic model forms
Implementations exist in both R and GWR
3.x software
Both require much greater care with
respect to collinearity and lack of variation
![Page 47: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/47.jpg)
Mixed-form models
What if some of your variables are stationary and others have variation?
Mixed-form models allow you to hold some coefficients constant while allowing others to vary
Not yet implemented in any statistical package, but not that difficult from a technical standpoint
![Page 48: GWR Presentation](https://reader030.fdocuments.us/reader030/viewer/2022013118/55cf9926550346d0339bdc44/html5/thumbnails/48.jpg)
Concluding comments
What comes next?
◦ Spatial regression
◦ Multilevel models