This article was downloaded by: [North Carolina State University]
On: 09 September 2013, At: 01:37
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Journal of Environmental Planning and Management
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjep20
Constructing Interpretable Environments from Multidimensional Data: GIS Suitability Overlays and Principal Component Analysis
Jon Bryan Burley
Published online: 02 Aug 2010.
To cite this article: Jon Bryan Burley (1995) Constructing Interpretable Environments from Multidimensional Data: GIS Suitability Overlays and Principal Component Analysis, Journal of Environmental Planning and Management, 38:4, 537-550, DOI: 10.1080/09640569512805
To link to this article: http://dx.doi.org/10.1080/09640569512805
Journal of Environmental Planning and Management, Vol. 38, No. 4, 1995
Constructing Interpretable Environments from
Multidimensional Data: GIS Suitability Overlays and
Principal Component Analysis
JON BRYAN BURLEY* & TERRY J. BROWN†
*Department of Geography, Michigan State University, East Lansing, MI 48824, USA;
†School of Natural Resources and Environment, University of Michigan, Ann Arbor, MI 48109, USA
(Received January 1994; revised February and May 1995)
ABSTRACT In landscape planning applications, practitioners and governmental agencies often face a broad array of clientele and constituents with particular land use requirements and needs, ranging from biological conservation to urban development, which generate complex multidimensional regional planning goals and objectives. In this complex situation, investigators are searching for methods to intelligently simplify complicated spatial environments and render them into interpretable and practical settings. While numerous investigators have studied the generation of a single suitability map, we were interested in addressing the problem of coping with a set of many suitability maps. We applied a data reduction method, principal component analysis, across 15 suitability overlays representing diverse landscape requirements to search for simplified explanations indicating the latent structure of the landscape. The study area was located in a moraine landscape of southern Michigan. We discovered that the 15 suitability overlays could be reduced to seven dimensions, accounting for 65% of the variance in the original overlays, and that the seven dimensions reflect a structure in which a variety of land uses each have their own optimal spatial locations, indicating low to moderate competition between potentially conflicting land uses and rendering a more easily understood environment. This approach did not produce a simple, elegant solution, but it did reduce the complexity associated with combining many suitability maps.
Introduction
Geographical Information Systems (GIS) and statistical analysis are two computer-intensive numerical applications that are currently being explored in unison. GIS databases contain spatial information about the attributes of site-specific locations. Small raster GIS databases may contain a quarter of a million grid cell cases (n = 250 000) and 10 to 20 variables (v = 10 to 20). Yet even these small GIS data sets are substantial in size for many statistical applications, such as multiple regression analysis, multivariate analysis and spatial autocorrelation techniques. GIS databases present investigators with observation sets that are substantially larger than those typically examined and reported by investigators employing traditional field recording techniques.
0964-0568/95/040537-14 © 1995 University of Newcastle upon Tyne
Presently, relatively few studies have explored the potential of integrating GIS with statistical analysis. Investigators using GIS databases often employ heuristic models, as illustrated by the work of Johnson & Burley (1990), or previously developed empirical models, as illustrated by Burley et al. (1990), to assess the contents of the landscape. Essentially, the purpose of these models is to identify an analytic or computational feature of the landscape, such as the suitability of a particular landscape feature to support a specific land use. The resulting suitability map is generated by combining spatial inventory properties such as depth to the water table or percentage slope. McHarg (1969), Steinitz et al. (1976), Hopkins (1977), Westman (1985) and Steiner (1991) explain the ideas associated with suitability analysis in greater detail. However, as McHarg (1969, pp. 31-41) notes in his highway study, there is relatively little formal guidance available for combining suitability maps to form grand composite analysis maps. The scholarship on combining suitability maps has advanced little beyond the 1969 McHarg study. We suggest that multivariate statistical methods may offer one approach for the general advancement of GIS methodology. Without statistical examination of the data set, interaction relationships and covariance properties among the variables may be lost, meaning that the investigator may miss some important properties of the study area. This paper investigates the descriptive statistical power of the GIS database to reveal latent landscape structure and character, in search of a numerically derived approach to generating composite maps.
One reason why statistical methods may be infrequently applied to GIS databases is that inferential statistical algorithms are not embedded in many GIS software programs, meaning that the data must be exported to another software program for statistical analysis. Today, converting files generated by GIS software into text files offers a way to export the GIS files and import them into a statistical analysis software program.
We were especially interested in exporting GIS files to a statistical software package and examining the dimensionality of a study site with multivariate statistical analysis techniques. We wanted to take a series of suitability models, with their associated suitability maps, across a wide array of program types ranging from housing and commercial land uses to wildlife habitat and watershed protection, and study the associative qualities of these individual suitability maps. Could 10 or more suitability maps derived from suitability models be reduced to several simple dimensions (three or four overlays) containing most of the information in the original suitability maps? Were there natural groupings of suitability overlays? We were interested in this reductionist approach to see whether there was a less complex method of presenting a landscape's spatial properties.
A set of 10 or more suitability overlays may make the multivariate interpretation of the landscape difficult to conduct. Traditional GIS methods enable a single continuous variable to be presented spatially with relatively easy interpretation. Even two variables can be combined into one overlay for interpretation. Colour maps are helpful in rendering an interpretation, and simplified categories for each variable also aid cross-category interpretation. This process can be applied to three variables; however, a large number of categories for each variable generates a large legend. For example, suppose there are three variables, each with the potential for 10 values, resulting in 10 × 10 × 10 = 1000 different variable combinations for a given map. Often three or four ordinal categories are selected for each variable, generating nine to 16 potential categories for two variables, or 27 to 64 for three. However,
with 10 variables, each containing three categories, the potential number of categories is 3^10, or 59 049, although in many cases the actual combinations on a composite map may be only 10% of the theoretical combinations because not every combination exists within the study area. Nevertheless, a legend for a map with only about 6000 combinations may still be much greater in size than the composite map itself.
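The legend sizes above follow from a simple power rule: v overlays, each classified into k ordinal categories, can produce at most k^v distinct combinations. A minimal Python sketch of that arithmetic (the function name `legend_size` is our own, for illustration only):

```python
def legend_size(n_variables: int, n_categories: int) -> int:
    """Theoretical number of distinct category combinations (legend entries)
    when each of n_variables overlays is classified into n_categories classes."""
    return n_categories ** n_variables


for v, k in [(2, 3), (2, 4), (3, 3), (3, 4), (3, 10), (10, 3)]:
    print(f"{v} variables x {k} categories -> {legend_size(v, k):,} combinations")

# If only about 10% of the theoretical combinations occur in the study area:
print(round(0.10 * legend_size(10, 3)))  # 5905
```

The count grows exponentially with the number of overlays, which is why hand-built composite legends are usually limited to two or three variables.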
One approach to summarizing a large number of variables in a map is to sum or multiply the variables (Hopkins, 1977). Multiplying can generate a greater spread (variance) in the data set. However, without examining each variable, the contribution of each variable at a particular spatial position is unknown. Thus a non-statistically derived composite map whose variable contributions are identified in its legend is often constrained to a small set of variables.
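The loss of information in a summed or multiplied composite can be seen by enumerating every combination of two three-level overlays (the 1 to 3 coding used later in this study). The Python sketch below is illustrative only; the overlay count and the function name `composite_groups` are our own assumptions.

```python
import math
from collections import defaultdict
from itertools import product
from statistics import pvariance

LEVELS = (1, 2, 3)  # poor, moderate, high suitability (three-level coding)


def composite_groups(n_overlays, combine):
    """Map each composite score to the overlay-value combinations that produce it."""
    groups = defaultdict(list)
    for combo in product(LEVELS, repeat=n_overlays):
        groups[combine(combo)].append(combo)
    return groups


sum_groups = composite_groups(2, sum)
prod_groups = composite_groups(2, math.prod)

print(sum_groups[4])   # [(1, 3), (2, 2), (3, 1)]: contributions are hidden
print(prod_groups[3])  # [(1, 3), (3, 1)]

# Spread of the composite over all nine possible combinations
all_combos = list(product(LEVELS, repeat=2))
print(pvariance([sum(c) for c in all_combos]))        # variance of sums
print(pvariance([math.prod(c) for c in all_combos]))  # variance of products (larger)
```

Several different input combinations collapse onto the same composite value, so the map reader cannot tell which overlay drove a given cell's score without the individual layers.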
Plant ecologists have faced this reductionist problem. They wished to interpret many variables and to reduce the number of dimensions associated with those variables by applying multivariate statistical techniques such as principal component analysis (PCA) (see Johnson & Wichern, 1988). In their case, the variables were the vegetation types found in a stand or plot. For example, Curtis (1959) derived the importance value (the sum of the % frequency, % dominance and % density) for each vegetation type in a tree stand, meaning that he could have 30 or more variables (vegetation types) and hundreds of cases (stands). He then analysed the variables statistically and discovered that he could reduce the number of dimensions to two or three without losing much of the data's character, by combining the variables in linear combinations indicated by the statistical analysis. Pielou (1984) explains this statistical approach in detail. Kendall (1939) illustrated the importance of this data reduction technique, but the numerical computing power necessary to conduct the multivariate analysis was not available until the development of the computer. One study conducted by Eastman & Fulk (1993) demonstrates the utility of PCA applied to remotely sensed data for studying changes across time. In their work the PCA dimensions represented different electromagnetic properties of their study area, the continent of Africa. Fung & LeDrew (1987) also applied PCA to study spatial changes using electromagnetic data. Thus plant ecologists and remote sensing specialists have been able to explore their data in a new way and report important findings. We wondered whether similar work could be accomplished with GIS databases.
We therefore selected a small study area with suitability maps, the results of 15 GIS suitability models, to examine the latent structure of the landscape through PCA. If the latent structure were determined to comprise one, two or three dimensions, the overlays could theoretically be collapsed into a few simple maps. If it were determined to comprise five to eight dimensions, the overlays could be partially reduced. If the latent structure contained 10 to 15 dimensions, the overlays would be considered non-collapsible, indicating that the structure is highly complex and difficult to reduce.
Study Area and Methods
The study area for this investigation is Scio Township, in Washtenaw County, Michigan (Figure 1). The township is located on the western border of Ann Arbor, Michigan, a community west of the Detroit Metropolitan Region. The township currently consists of agricultural and suburban land uses situated on glacial moraines and till plains. The town of Dexter, Michigan, is located in the northwestern corner of the township. The Huron River and Interstate 94 traverse the township in a general east-west direction.

Figure 1. Map locating the study area.
The database for the study area comprised 31 inventory overlays (Table 1). The data consisted of 1 ha cells (100 m × 100 m). Each overlay contained 9506 data cells (9506 cases). The database resided in MAPII (Pazner et al., 1989) on a Mac II platform running the System 7 operating system.
The 31 overlays were available for use by students enrolled in a graduate-level landscape planning course during the winter of 1993 in the Landscape Architecture programme, School of Natural Resources and Environment at the University of Michigan. Each student picked a specific land use or fragile land type (Table 2) and built a spatial model to identify suitable sites for development or for conservation, producing a final suitability map for that land use or fragile land (n = 15). Each final overlay contained three levels of land use suitability or landscape fragility, depending on the student's study type: highly suitable or highly fragile (numerical value = 3); moderately suitable or moderately fragile (numerical value = 2); and poorly suited or of low fragility (numerical value = 1).
For this specific study, we were not concerned with whether each student's suitability model was necessarily the best state-of-the-art model; rather, we were interested in the relationships among the suitability models generated. In other words, we wanted to analyse the analysis. Therefore, as long as each student made a serious attempt at producing a suitability/fragility map, we were willing to accept the map. This study differs from some past non-GIS PCA studies, in which investigators employed inventory data to generate classifications and ordinations of the information. We were concerned about the situation in which so many analysis studies are conducted that the analysis itself requires analysis. This study is not about how to generate a suitability map; instead, it addresses the problem of how to cope with a set of developed suitability maps.

Table 1. List of study area overlays
Variable name | Overlay description
Elevation | Centroid elevation in metres
Aspect | Slope orientation
Slope | % slope
Soil name | Soil conservation soil series classification
A horizon | Unified soil series classification of A horizon
Lowest horizon | Unified soil series classification of lowest horizon
A horizon soil permeability | Permeability of A horizon
Lowest horizon soil permeability | Permeability of lowest horizon
Floodplain | Area within 100 year floodplain
Watertable | Depth to seasonal high watertable
Open water | Type of open water (Lake, River …)
Wetland | Type of wetland
Watershed | Watershed catchment boundaries
Wells | Potential production from well
Surficial geology | Geological type at surface
Forest type | Classification of woodlands
Agriculture | Existing agricultural land uses
Residence type | Existing residential land uses
Dwellings per cell | Number of residential dwellings per cell
Community facilities | Existing community facilities
Commerce and industry | Existing commercial and industrial land uses
Transportation | Existing transportation land uses
Recreation | Existing recreation land use type
Recreation activity | Existing recreation activity/facility
Lots per cell | Number of land parcels per cell
Ownership | Predominant land ownership
Utilities 1 | Major gas and electric utilities
Utilities 2 | Major sewer and water lines
Zoning | Zoning map of 1992 plan
General development plan | Township development plan 1980-2000
Windbreaks | Location of woodland shelterbelts and wind rows
To conduct the investigation, each suitability map was exported as a text file of numerical values, starting in the upper left corner of the overlay and ending with the last value in the lower right corner. Cell values were delimited by spaces. Each overlay generated a text file with 9506 numbers. These numbers were imported into SYSTAT for the Macintosh (SYSTAT, 1992). The values for each overlay were standardized to a mean of 0 and a variance of 1, creating a data set with 9506 cases and 15 variables. Each variable represented a suitability layer.
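The export-and-standardize step can be sketched in Python with NumPy. This is an illustrative stand-in for the SYSTAT procedure, not the authors' code; the function names are hypothetical, and we assume the sample standard deviation (`ddof=1`), which may differ slightly from SYSTAT's convention.

```python
from typing import Sequence

import numpy as np


def read_overlay(text: str) -> np.ndarray:
    """Parse one exported overlay: space-delimited cell values written in row
    order from the upper-left cell to the lower-right cell."""
    return np.array(text.split(), dtype=float)


def build_standardized_matrix(overlay_texts: Sequence[str]) -> np.ndarray:
    """Stack overlays as columns (cases x variables) and rescale each column to
    mean 0 and variance 1. A constant overlay would divide by zero."""
    x = np.column_stack([read_overlay(t) for t in overlay_texts])
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


# Toy example: two overlays of four cells each (the study had 15 overlays x 9506 cells)
z = build_standardized_matrix(["3 2 1 1", "1 1 2 3"])
print(z.shape)  # (4, 2)
```

Because every overlay is read in the same upper-left to lower-right order, row i of the matrix refers to the same grid cell in every column, which is what allows the results to be mapped back later.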
We chose a statistical analysis package and PCA to take advantage of the advanced multivariate mathematical capabilities associated with these tools. In many respects GIS has always suffered from the lack of a full range of numerical computing techniques (Steinitz et al., 1976). At the turn of the century, hand-drawn analysis techniques hampered numerical procedures. During the formative years of computer-related GIS analysis techniques, investigators were limited to computing with integers, but the strength of GIS procedures was their ability to truly explore and present spatial information (Berry, 1993), something that database, engineering computational and statistical software are relatively weak at. Nevertheless, formative GIS software lacked complex trigonometric functions, integrals and matrix algebra computational power. Recently, the connectivity of GIS software with database software has increased the computational potential of spatial analysis. However, connectivity to a full range of high-powered statistical options is only beginning to become available. Our method of exporting text files for import into statistical software will eventually become obsolete and the process may become more transparent, but for the interim this approach is appropriate for many GIS software packages.

Table 2. Typical list of spatial model suitability variables
Land uses | Definitions | Student
** Estate residential | 1 dwelling unit per hectare, on-site septic tank, domestic wells for water | Dwayne Yee
** Rural residential | 2.5 dwelling units per hectare, on-site septic tank, domestic wells for water | David Purdy
** Suburban residential | 5 to 10 dwelling units per hectare, on-site septic tank or fully serviced public sanitary sewer, on-site wells or fully serviced public water | Peter Ter Louw
** Mobile home | 17.5 units per hectare maximum, sewered or package system, detached dwelling units, minimum 6 hectares of development | Richard Hitz
** General commercial | 2 to 4 hectares in size, public sewer and water, within 6 minutes driving time from existing housing | Raymond Slawski
** Active recreation | soccer/football field complex, softball/baseball field complex, intensive development and managed for mass public use, potential septic tank and well water | Mike De Vries
Passive recreation | trails, camping, interpretive centre, archery, rifle, trap, skeet ranges, fishing, boating, canoeing, extensively developed for outdoor recreation | -
** Conservation | critical land set aside for protection of resources, limited development, public land, often combined to form Greenways | Jeffrey Helms
Office park | single firm or multi-clients, modern architecture and site development, corporate headquarters, professional office park, package or public utilities services | -
Landfill/recycling centre | service study area or larger vicinity, determine disposal of by-products, landfill needs | -

Fragile lands | Definitions | Student
** Soil erosion | identify susceptible areas | Paul Strauch
** Wildlife habitat | select wildlife type for study and identify critical/potential habitat areas | Nancy Larson
** Vegetation | select vegetation type for study and identify critical/potential restoration/preservation areas | Chamaine Kettler
** Ground water quality/supply | protect ground water resources from contamination and/or degradation due to development or land use activities | Debra Gelber
** Surface water/creekshed quality | identify areas most susceptible to surface water pollution or areas with flooding hazards | Chris Kunkle
Visual quality | identify significant visual quality areas suitable for protection | -
** Rural character | identify significant rural character suitable for protection | Janis Eathrone
** Agricultural land | identify important agricultural lands suitable for protection | Bill Schneider
** Noise quality | identify sensitive areas to noise pollution or identify areas with adverse noise pollution that require attenuation | Kuang-Ta Hsu
Note: ** Denotes land uses or fragile lands selected for study.
Once the numbers were imported into the statistical analysis software, the standardized scores were examined with PCA techniques to determine the number of dimensions necessary to explain the variance in the suitability maps. For each dimension, the analysis generates an eigenvalue, which indicates the general strength of the dimension. With standardized scores, the sum of the eigenvalues is equal to the number of variables employed in the procedure, so dividing an eigenvalue by the number of variables gives the proportion of total variance explained by its dimension. The largest eigenvalue is associated with the first principal component, representing the greatest proportion of explained variance in the data set. The first principal component is an axis in multidimensional space oriented to capture the greatest proportion of variance. The second principal component is orthogonal to the first axis and explains the second largest amount of variance in the data set. Orthogonality is an important property of PCA because the perpendicularity of the axes means that the dimensions are uncorrelated. The last principal component has the smallest eigenvalue. This number can be 0.0, indicating that all of the variance has already been explained by the previous dimensions.
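The properties described above (eigenvalues ordered by size, summing to the number of standardized variables, and uncorrelated component axes) can be checked with a short NumPy sketch. It eigen-decomposes the correlation matrix of synthetic data; it is a stand-in for the SYSTAT PCA used in the study, and the data and the function name `pca_correlation` are our own assumptions. Note that eigenvector signs are arbitrary, so a software package may report an axis with all signs flipped.

```python
import numpy as np


def pca_correlation(z: np.ndarray):
    """PCA of standardized data via the correlation matrix.
    Returns eigenvalues in descending order and unit-length eigenvectors as columns."""
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Synthetic data (not the study's): 500 cases, 5 variables, two of them correlated
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 5))
x[:, 1] += x[:, 0]
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

eigvals, eigvecs = pca_correlation(z)
print(eigvals.sum())           # equals the number of variables (5) up to rounding
print(eigvals / len(eigvals))  # proportion of variance explained by each dimension

# Component scores are mutually uncorrelated (orthogonal axes)
scores = z @ eigvecs
print(np.round(np.corrcoef(scores, rowvar=False), 6))  # close to the identity matrix
```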
The numerical size of the eigenvalues leads to an assessment and interpretation of the PCA results. Eigenvalues greater than or approximately equal to 1.0 were considered significant dimensions (Guttman, 1954); a dimension with an eigenvalue well below 1.0 explains less variance than a single standardized variable. Another approach for determining the number of dimensions is to create a scree plot, in which each eigenvalue is plotted against its rank order. Eigenvalues that sit above the roughly straight line formed by the smaller eigenvalues, where the plotted curve is steeper than that line, are considered significant. Jackson (1993) recently evaluated a variety of techniques for estimating the number of interpretable dimensions.
Each eigenvalue has an associated eigenvector whose coefficients indicate the relative association of each variable with the dimension. With standardized variables, the coefficients range in value from -1.0 to 1.0. The coefficients form an equation, a linear combination that can be employed to compute a numerical value (a component score) for each observation case. By applying a PCA linear combination equation to the corresponding GIS data set, one can generate a map illustrating where the landscape is strongly associated with the selected dimension. Each dimension can be characterized by examining the corresponding eigenvector coefficients. For each eigenvalue, eigenvector coefficients greater than or equal to 0.400 or less than or equal to -0.400 are often identified as being strongly associated with that particular dimension. An investigator can use the strongly associated coefficients to name or characterize the particular axis, or plot the data points along different PCA dimensions to search for latent structure. For example, Curtis (1959) discovered that one of his dimensions sorted his vegetation stands from dry areas to wet landscapes, and so he termed that dimension a moisture gradient. In our study, suppose commercial and residential variables are significantly associated with a particular dimension. This dimension could be labelled an urban land dimension. The labelling of a dimension is a purely subjective process. These labels represent a synopsis and interpretation of the latent structure within the data set and constitute a simplifying, reductionist investigation approach.
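The two uses of eigenvector coefficients described above, computing a score for every cell and screening coefficients against the 0.400 cut-off, can be sketched in Python. All names are hypothetical; the toy grid shape and the placeholder identity eigenvectors are illustrative assumptions, and for the study's 9506 cells the grid shape and any mask of cells outside the township would have to come from the GIS.

```python
import numpy as np


def component_scores(z: np.ndarray, eigvecs: np.ndarray, k: int) -> np.ndarray:
    """Apply the first k eigenvectors as linear combinations to every cell:
    scores[i, j] = sum over variables v of z[i, v] * eigvecs[v, j]."""
    return z @ eigvecs[:, :k]


def scores_to_grid(scores_1d: np.ndarray, n_rows: int, n_cols: int) -> np.ndarray:
    """Put one dimension's scores back in raster order (upper-left to lower-right).
    Assumes a complete rectangular grid with no masked or missing cells."""
    return scores_1d.reshape(n_rows, n_cols)


def strong_loadings(eigvec, names, threshold=0.400):
    """Names of variables whose coefficient magnitude is at least the threshold."""
    return [name for name, coef in zip(names, eigvec) if abs(coef) >= threshold]


# Toy example: 6 cells on a hypothetical 2 x 3 grid, 3 standardized variables
z = np.array([[1.0, 0.5, -1.0],
              [0.2, -0.3, 0.4],
              [-1.2, 0.0, 0.6],
              [0.4, 1.1, -0.2],
              [0.0, -0.9, 0.1],
              [0.6, -0.4, 0.1]])
eigvecs = np.eye(3)  # placeholder eigenvectors, for illustration only
grid = scores_to_grid(component_scores(z, eigvecs, k=1)[:, 0], 2, 3)
print(grid)
print(strong_loadings([0.823, -0.504, 0.180], ["var_a", "var_b", "var_c"]))  # ['var_a', 'var_b']
```

The resulting grid can be written back to the GIS as a new overlay, so each retained dimension becomes one map.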
In this study, three PCA sets were examined. The first set contained all 15 variables, the second contained land use variables only (v = 7) and the third contained fragile landscape variables only (v = 8). We were most interested in the 15-variable set but also examined the two subsets, to see whether they would corroborate the results obtained with the 15-variable set. If there were strong differences, we would consider the 15-variable set to be highly sensitive to changes in the mixture of variables subjected to PCA.

Table 3. Eigenvalues for 15 variable PCA study
Principal component | Eigenvalue | Cumulative % of variance explained
1 | 2.376 | 15.84
2 | 1.689 | 27.10
3 | 1.367 | 36.21
4 | 1.187 | 44.13
5 | 1.066 | 51.23
6 | 1.046 | 58.21
7 | 0.998 | 64.86
8 | 0.964 | 71.29
9 | 0.941 | 77.56
10 | 0.843 | 83.18
11 | 0.729 | 88.04
12 | 0.723 | 92.86
13 | 0.627 | 97.04
14 | 0.443 | 100.00
15 | 0.000 | 100.00
Whether an investigator is using an applied empirical GIS wildlife habitat
model or a non-point pollution GIS model, or a heuristic industrial facility
suitability analysis model, there is no guarantee that the results will reveal any
important information. In addition, a complex GIS modelling approach may yield only obvious information, in which case the complex analysis was unnecessary. This fallibility also applies to PCA-related GIS studies.
However, we believe that in a complex landscape with an array of various
program requirements, PCA is a reductionist tool that may merit application in
GIS modelling and we were willing to investigate the topic.
Results
Table 3 presents the eigenvalue results. Six eigenvalues were greater than 1.000, and the seventh eigenvalue was within 1/500th of 1.000. The first six eigenvalues explain 58.21% of the variance in the data set and the first seven explain 64.86%. A scree plot of the eigenvalues suggests that approximately four or five dimensions are interpretable. Thus somewhere between four and seven dimensions are considered interpretable.
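As a check on these figures, the cumulative percentages in Table 3 can be recomputed as the running sum of the eigenvalues divided by the number of variables (15). The Python sketch below takes the seventh eigenvalue as 0.998, the value consistent with both the 64.86% cumulative figure and the "within 1/500th of 1.000" statement; the 0.005 tolerance used for "approximately equal to 1.0" is our own assumption.

```python
import numpy as np

# Eigenvalues for the 15-variable PCA (Table 3); seventh value taken as 0.998
eigenvalues = np.array([2.376, 1.689, 1.367, 1.187, 1.066, 1.046, 0.998,
                        0.964, 0.941, 0.843, 0.729, 0.723, 0.627, 0.443, 0.000])
n_variables = 15  # standardized variables, so the eigenvalues sum to about 15

cumulative_pct = 100 * np.cumsum(eigenvalues) / n_variables
print(np.round(cumulative_pct, 2))

# Guttman-style rule: keep dimensions with eigenvalues above (or close to) 1.0
strictly_above = int(np.sum(eigenvalues > 1.0))
approx_at_least = int(np.sum(eigenvalues >= 1.0 - 0.005))  # tolerance is an assumption
print(strictly_above, approx_at_least)  # 6 7
```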
Table 4 presents the coefficients for the first seven eigenvectors. These coefficients indicate the variables associated with specific dimensions. Commercial development and medium density housing were the variables with the largest positive coefficient values in the first eigenvector. Agricultural land and greenways had strong positive coefficients in the first two eigenvectors. Estate development had a strong positive coefficient in the second eigenvector, while commercial development and medium density housing had strong negative values. Land suitable for oak woodlands and susceptible wetlands were the variables with the strongest positive coefficients in the third eigenvector, with noise mitigation negatively associated with that eigenvector. Land requiring erosion control measures had a strong negative coefficient in the fourth eigenvector. Areas suitable for softball recreation were positively associated with the fifth eigenvector, while aquifer recharge areas and suburban residential areas were negatively associated. The sixth eigenvector had a strong positive coefficient for rural character and negative coefficients for suburban residential and residential areas. The variable wildlife habitat in wet woodlands had a positive coefficient in the seventh eigenvector. Since the remaining eigenvalues were considered non-significant, no other coefficients were examined further.
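The 0.400 screening rule can be applied directly to the Table 4 coefficients to reproduce the variable lists above. In this Python sketch the coefficient values are transcribed from Table 4; the dictionary and function names are our own.

```python
# Eigenvector coefficients transcribed from Table 4 (dimensions 1-7)
TABLE_4 = {
    "Aquifer recharge":       [0.179, -0.058, -0.380, -0.397, -0.427, 0.015, 0.123],
    "Commercial":             [0.823, -0.504, 0.180, -0.041, -0.061, -0.071, 0.075],
    "Estate":                 [0.278, 0.452, -0.103, -0.280, 0.126, -0.220, -0.332],
    "Wet woodlands":          [-0.043, 0.101, 0.262, -0.353, 0.379, 0.117, 0.637],
    "Suburban residential":   [0.061, 0.159, 0.065, 0.154, -0.427, -0.491, 0.050],
    "Oak woodlands":          [-0.110, 0.282, 0.741, 0.127, -0.062, 0.016, 0.040],
    "Greenways":              [0.486, 0.460, -0.197, 0.043, 0.128, -0.195, 0.169],
    "Agriculture":            [0.582, 0.553, 0.006, 0.016, 0.065, 0.238, -0.056],
    "Medium density housing": [0.823, -0.504, 0.180, -0.041, -0.061, -0.071, 0.075],
    "Erosion control":        [0.086, -0.043, 0.051, -0.674, 0.293, 0.007, -0.399],
    "Rural character":        [0.391, 0.235, 0.144, 0.215, -0.242, 0.592, -0.285],
    "Noise attenuation":      [-0.057, -0.348, -0.455, -0.289, 0.291, 0.044, -0.167],
    "Residential":            [0.095, 0.067, 0.276, 0.151, 0.200, -0.525, -0.259],
    "Softball recreation":    [0.292, -0.076, -0.053, 0.397, 0.492, 0.034, 0.026],
    "Herbaceous wetlands":    [-0.282, -0.392, 0.421, -0.128, -0.010, 0.093, -0.290],
}


def strong_variables(table, dimension, threshold=0.400):
    """(name, coefficient) pairs with |coefficient| >= threshold on a 1-based
    dimension, sorted from the largest positive to the most negative coefficient."""
    pairs = [(name, coefs[dimension - 1]) for name, coefs in table.items()
             if abs(coefs[dimension - 1]) >= threshold]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for dim in range(1, 8):
    print(dim, strong_variables(TABLE_4, dim))
```

Near-misses such as erosion control on the seventh dimension (-0.399) show how sensitive a fixed cut-off is, which is one reason the labelling of dimensions remains a judgement call.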
In comparison, the split analyses, in which PCA was conducted with land use variables only and then with fragile land variables only, showed relatively little difference in the significant dimensions and associated coefficients. Therefore, the grouped analysis was considered sufficient for this study.
Discussion
The results indicate that there are a fair number of dimensions, possibly up to six or seven, representing the variance in the study area's suitability maps. Unlike the PCA studies conducted by Burley (1991), in which data from seven to 15 agronomic crop variables could be summarized in one or two dimensions, the Scio Township study area was much more complex. The results indicate that the data set could be reduced, but only moderately.
Dimension Interpretations
Each dimension can be interpreted, or given a name representing the underlying factors or character associated with an eigenvalue. For example, Curtis (1959) named the three major dimensions associated with his work in Wisconsin, determining that the three axes were associated with light, temperature and moisture. There are no 'hard and fast' rules for assigning characterizing names, and the names are rather subjective. Nevertheless, characterizing each axis can be useful in developing an interpretation of the results. We labelled the first axis a density development axis, where large values along the axis represent areas suitable for density development and low values represent areas with severe restrictions. We suggest that the second axis is a rural development axis. Both agriculture and greenways are important variables in the first two dimensions, suggesting that lands in Scio Township suitable for density development and for rural development are also suitable for agriculture and greenways, implying a potential conflict between development, agriculture and greenways. In some respects such a conflict is not surprising, as expanding urban development often consumes agricultural land and potential greenway corridors. In other respects, the covarying development lands and greenway corridors indicate that there are opportunities for the two landscape types to be woven together, supplying amenities to expanding development. We believe this covariation is a promising prospect for the study area.
The third axis was interpreted to be a natural area sanctuary versus urban
stress dimension , where high values along the axis represent locations
Dow
nloa
ded
by [
Nor
th C
arol
ina
Stat
e U
nive
rsity
] at
01:
37 0
9 Se
ptem
ber
2013
546 J. B. Burley & T. J. Brown
Table 4. Eigenvector coefficients for the first seven eigenvalues

Variable                  Dim. 1  Dim. 2  Dim. 3  Dim. 4  Dim. 5  Dim. 6  Dim. 7
Aquifer recharge           0.179  -0.058  -0.380  -0.397  -0.427   0.015   0.123
Commercial                 0.823  -0.504   0.180  -0.041  -0.061  -0.071   0.075
Estate                     0.278   0.452  -0.103  -0.280   0.126  -0.220  -0.332
Wet woodlands             -0.043   0.101   0.262  -0.353   0.379   0.117   0.637
Suburban residential       0.061   0.159   0.065   0.154  -0.427  -0.491   0.050
Oak woodlands             -0.110   0.282   0.741   0.127  -0.062   0.016   0.040
Greenways                  0.486   0.460  -0.197   0.043   0.128  -0.195   0.169
Agriculture                0.582   0.553   0.006   0.016   0.065   0.238  -0.056
Medium density housing     0.823  -0.504   0.180  -0.041  -0.061  -0.071   0.075
Erosion control            0.086  -0.043   0.051  -0.674   0.293   0.007  -0.399
Rural character            0.391   0.235   0.144   0.215  -0.242   0.592  -0.285
Noise attenuation         -0.057  -0.348  -0.455  -0.289   0.291   0.044  -0.167
Residential                0.095   0.067   0.276   0.151   0.200  -0.525  -0.259
Softball recreation        0.292  -0.076  -0.053   0.397   0.492   0.034   0.026
Herbaceous wetlands       -0.282  -0.392   0.421  -0.128  -0.010   0.093  -0.290
Constructing Interpretable Environments from Multidimensional Data 547
where oak woodlands and wetlands would be suitable for development, and low
values along the axis represent locations requiring mitigation of urban noise.
This axis is interesting because it distinguishes areas of sanctuary from areas
of environmental stress. We labelled it the refuge/stress axis.
The fourth axis was an erosion control axis, where low values represented
areas requiring erosion control measures and high values represented areas with
low soil erosion concerns. The fifth axis was interpreted to be a recreation axis,
where high values represented areas suitable for active recreation and low
values represented important aquifer contamination zones and also areas
intended for suburban residential use. This axis suggests a conflict between
suburban residential use and aquifer protection. While suburban residential land
could be developed on these fragile aquifer-related parcels, such development
should use design methods that minimize disturbance of the fragile
aquifer-related lands.
The sixth axis is a rural character/residential development axis, where high
values indicate land with important rural character and low values indicate land
with residential development character. The seventh axis is a wet woodland
dimension, where high values indicate areas appropriate for conserving swamp
habitat for wildlife that prefers wet woodlands. The results indicate that the
wet woodlands are not strongly associated with other landscape suitabilities,
implying minimal conflict.
Dimensions Expressed as GIS Maps
The eigenvector coefficients for each dimension can be used to construct a
GIS map of the spatial properties associated with that dimension. For example,
equation (1) is the linear combination for the sixth principal component. It
computes the rural character and residential development suitability for each
raster cell in the GIS database. Values near 1.0 indicate locations suitable for
rural character preservation; values near −1.0 indicate locations suitable for
residential housing development. Figure 2 is a map of the study area produced
with equation (1). The same technique can be applied to each dimension, creating
seven overlay maps that reduce the suitability overlays from 15 to seven and
reveal a latent landscape structure:
$$
\begin{aligned}
\text{Dimension 6} ={} & 0.015\,\mathrm{STD}_{\text{aquifer recharge}} - 0.071\,\mathrm{STD}_{\text{commercial}} \\
& - 0.220\,\mathrm{STD}_{\text{estate}} + 0.117\,\mathrm{STD}_{\text{wet woodlands}} \\
& - 0.491\,\mathrm{STD}_{\text{suburban residential}} + 0.016\,\mathrm{STD}_{\text{oak woodland}} \\
& - 0.195\,\mathrm{STD}_{\text{greenways}} + 0.238\,\mathrm{STD}_{\text{agriculture}} \\
& - 0.071\,\mathrm{STD}_{\text{medium density housing}} + 0.007\,\mathrm{STD}_{\text{erosion control}} \\
& + 0.592\,\mathrm{STD}_{\text{rural character}} + 0.044\,\mathrm{STD}_{\text{noise attenuation}} \\
& - 0.525\,\mathrm{STD}_{\text{residential}} + 0.034\,\mathrm{STD}_{\text{softball recreation}} \\
& + 0.093\,\mathrm{STD}_{\text{herbaceous wetland}}
\end{aligned}
\tag{1}
$$

where: Dimension 6 = rural character and residential development score for a raster cell;
STD = standardized variable, with a mean of 0 and unit variance.
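Equation (1) is a weighted sum evaluated cell by cell. The NumPy sketch below makes several assumptions. The 15 overlays are taken to be co-registered rasters stacked in the same order as the rows of Table 4. The array layout and the function name dimension_score_map are hypothetical. Each overlay is standardized with its population standard deviation, and the dimension 6 coefficients are then applied. The guard for a constant layer is an addition of this sketch and is not part of the original method.

```python
import numpy as np

# Dimension 6 coefficients from equation (1), in Table 4 row order.
DIM6 = np.array([0.015, -0.071, -0.220, 0.117, -0.491, 0.016, -0.195, 0.238,
                 -0.071, 0.007, 0.592, 0.044, -0.525, 0.034, 0.093])


def dimension_score_map(overlays: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Score every raster cell as sum_k coef[k] * STD(overlay_k).

    overlays: array of shape (n_layers, rows, cols) (hypothetical layout).
    Returns an array of shape (rows, cols).
    """
    n_layers = overlays.shape[0]
    flat = overlays.reshape(n_layers, -1).astype(float)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)   # population SD, so each layer gets unit variance
    std[std == 0] = 1.0                     # guard: a constant layer then contributes 0
    z = (flat - mean) / std                 # the STD terms of equation (1)
    return (coef @ z).reshape(overlays.shape[1:])


# Hypothetical example: 15 ordinal suitability overlays (values 0-3) on a 50 x 40 grid.
rng = np.random.default_rng(1)
overlays = rng.integers(0, 4, size=(15, 50, 40))
score = dimension_score_map(overlays, DIM6)
print(score.shape, round(float(score.min()), 2), round(float(score.max()), 2))
```

Calling the function once per column of Table 4 would yield the seven dimension overlays.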
A PCA examination of the two subsets revealed similar dimension sets and
reinforced the axes reported in this study. The investigation determined that
there were seven major components associated with the 15 variables studied,
Figure 2. Map illustrating results from dimension six and the resulting equation.
consisting of a density development axis, a rural development axis, a refuge/
stress axis, an erosion control axis, a recreation axis, a rural character/residential
development axis and a wet woodland axis. These seven dimensions form a set
of spatial overlays summarizing the 15 suitability overlays. In theory, this
smaller set of seven overlays could then be used in further landscape
planning studies to assign specific locations to specific land uses and to derive
a township zoning plan, development plan and natural resource protection plan.
A discussion of how these plans would be generated is beyond the scope and
intentions of this study. However, our study illustrates a procedure for sorting
information in a reductionist manner that may assist in the synthesis of a
landscape plan.
Cautions and Limitations
While this investigation generated results indicating moderate complexity for
the study area, this does not mean that all study areas are similarly structured.
Instead, this investigation indicates that Scio Township, as studied by the
investigatory team, is relatively complex. Because of the nature of this study,
there is no statistical evidence that another investigatory team composed of a
different year’s students would derive similar results. Therefore, the results
presented in this study may not be externally valid. In addition, acquiring and
developing more overlays for the database, or changing the grid cell area, could
produce different results. However, we believe that this study presents an
approach to examining the physical suitability of the landscape for a variety of
land uses and environmental features.
There are several statistical limitations of this study. First, to employ PCA, one
should have multivariate normal data. However, overlays often contain non-
normal distributions. One can correct for non-normality by transforming each
data set toward a normal distribution; nevertheless, this does not make the data
set multivariate normal. In addition, the data values are often not continuous
but rather ordinal with narrow ranges, meaning that the data set does not
necessarily conform to the requirements of many parametric statistical tests,
including PCA (Johnson & Wichern, 1988). Finally, map data are often spatially
autocorrelated: the value in a raster cell is not independent of the values in
adjoining cells. Again, parametric tests assume that observations, here the
raster cells, are independent; however, the raster cells are most likely spatially
autocorrelated and not independent. Knowing these limitations, we decided to
proceed and explore the data set.
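The spatial autocorrelation of a single overlay can be quantified directly. The function below computes global Moran's I, a standard autocorrelation statistic that this study did not use. It works on a raster with binary rook (four-neighbour) weights. Positive values indicate that adjoining cells tend to hold similar values, and negative values indicate alternation between neighbours. Under no spatial autocorrelation, the expected value is −1/(N − 1). The weighting scheme and function name are illustrative assumptions.

```python
import numpy as np


def morans_i_rook(grid: np.ndarray) -> float:
    """Global Moran's I for a 2-D raster with binary rook (4-neighbour) weights.

    I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z are deviations
    from the mean and W is the sum of all weights.
    """
    z = grid.astype(float) - grid.mean()
    horiz = z[:, :-1] * z[:, 1:]    # products for left-right neighbour pairs
    vert = z[:-1, :] * z[1:, :]     # products for up-down neighbour pairs
    # Each unordered pair appears once above; symmetric weights count it twice.
    cross = 2.0 * (horiz.sum() + vert.sum())
    w_total = 2.0 * (horiz.size + vert.size)
    return float((z.size / w_total) * cross / (z ** 2).sum())


checkerboard = np.indices((4, 4)).sum(axis=0) % 2   # alternating 0/1 cells
halves = np.repeat([[0, 0, 1, 1]], 4, axis=0)        # left half 0, right half 1
print(morans_i_rook(checkerboard), morans_i_rook(halves))
```

The checkerboard returns −1, and the half-and-half grid returns about 0.67. On real suitability overlays, clearly positive values would confirm the dependence between adjoining cells described above.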
Conclusion
We believe that the complexity of the study area may be due to the geophysical
nature of Scio Township. This township is a glacial moraine landscape with
numerous topographical and hydrological features. A comparative investigation
of a glacial lake plain landscape would offer insight into the complexity
findings for the study area.
For our study area and database, we concluded that a simple reductionist
model was not necessarily possible and that the minimum number of dimensions
required to describe the study area is six or seven. This translates into 3^7, or
2187, possible landscape suitability combinations. We suggest that this
complexity poses a problem for landscape planners, because one may have to
assess and reassess development plans along six or seven dimensions; it also
offers insight into why clear, neatly packaged environmental development plans
may be difficult to generate. We also suggest that this complexity may be a
blessing, because the results indicate that the spatial properties of suitability
maps do not necessarily covary, affording different locations for varying land
uses and landscape operations.
We anticipate that GIS investigators will begin to apply PCA to their study
areas as an exploratory technique to identify latent structures and relationships
among suitability maps and landscape features. In addition, we would expect
that researchers would begin to report results concerning the variability of
these dimensions. If somewhat firm latent structures are discovered, other
multivariate techniques such as discriminant functions (Johnson & Wichern,
1988) may be applicable.
References
Berry, J.K. (1993) Beyond Mapping: Concepts, Algorithms, and Issues in GIS (Fort Collins, CO, GIS World).
Burley, J.B. (1991) Vegetation productivity equation for reclaiming surface mines in Clay County, Minnesota, International Journal of Surface Mining and Reclamation, 5, pp. 1–6.
Burley, J.B., Kopperl, J., Paliga, J. & Carter, W. (1990) Land-use/watershed modeling for Lake Itasca, Minnesota: a land-use planning, GIS project at Colorado State University, Landscape and Land Use Planning, 17, pp. 19–25 (Washington, DC, American Society of Landscape Architects Open Committee on Landscape and Land Use Planning).
Curtis, J.T. (1959) Vegetation of Wisconsin: An Ordination of Plant Communities (Madison, WI, University of Wisconsin Press).
Eastman, J.R. & Fulk, M. (1993) Long sequence of time series evaluation using standardized principal components, Photogrammetric Engineering & Remote Sensing, 59(6), pp. 991–996.
Fung, T. & LeDrew, E. (1987) Application of principal components analysis to change detection, Photogrammetric Engineering & Remote Sensing, 53(12), pp. 1649–1658.
Guttman, L. (1954) Some necessary conditions for common factor analysis, Psychometrika, 30, pp. 179–185.
Hopkins, L.D. (1977) Methods for generating land suitability maps: a comparative evaluation, Journal of the American Institute of Planners, 43(4), pp. 386–400.
Jackson, D.A. (1993) Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches, Ecology, 74(8), pp. 2204–2214.
Johnson, R.A. & Wichern, D.W. (1988) Applied Multivariate Statistical Analysis (Englewood Cliffs, NJ, Prentice Hall).
Johnson, R. & Burley, J.B. (1990) Snowy Range ski resort: an illustration of GIS planning principles, Landscape Architectural Review, 9(1), pp. 15–18.
Kendall, M. (1939) The geographical distribution of crop productivity in England, Journal of the Royal Statistical Society, 102, pp. 21–48.
McHarg, I.L. (1969) Design with Nature (Garden City, NY, Doubleday/Natural History Press).
Pazner, M., Kirby, K.C. & Thies, N. (1989) MAP II: Reference Manual. A Geographic Information System for the Macintosh (New York, J. Wiley).
Pielou, E.C. (1984) The Interpretation of Ecological Data (New York, J. Wiley).
SYSTAT (1992) Version 5.2 Edition (Evanston, IL, SYSTAT).
Steiner, F. (1991) The Living Landscape: an Ecological Approach to Landscape Planning (New York, McGraw-Hill).
Steinitz, C., Parker, P. & Jordan, L. (1976) Hand drawn overlays: their history and prospective uses, Landscape Architecture, 66, pp. 444–455.
Westman, W.E. (1985) Ecology, Impact Assessment, and Environmental Planning (New York, J. Wiley).