This article was downloaded by: [North Carolina State University]
On: 09 September 2013, At: 01:37
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Environmental Planning and Management
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjep20

Constructing Interpretable Environments from Multidimensional Data: GIS Suitability Overlays and Principal Component Analysis
Jon Bryan Burley
Published online: 02 Aug 2010.

To cite this article: Jon Bryan Burley (1995) Constructing Interpretable Environments from Multidimensional Data: GIS Suitability Overlays and Principal Component Analysis, Journal of Environmental Planning and Management, 38:4, 537-550, DOI: 10.1080/09640569512805

To link to this article: http://dx.doi.org/10.1080/09640569512805

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Journal of Environmental Planning and Management, Vol. 38, No. 4, 1995

Constructing Interpretable Environments from Multidimensional Data: GIS Suitability Overlays and Principal Component Analysis

JON BRYAN BURLEY* & TERRY J. BROWN²

*Department of Geography, Michigan State University, East Lansing, MI 48824, USA;

² School of Natural Resources and Environment, University of Michigan, Ann Arbor, MI 48109, USA

(Received January 1994; revised February and May 1995)

ABSTRACT In landscape planning applications, practitioners and governmental agencies often face a broad array of clientele and constituents with particular land use requirements and needs, ranging from biological conservation to urban development, which generate complex multidimensional regional planning goals and objectives. In this often complex situation, investigators are searching for methods to intelligently simplify complicated spatial environments and render them into interpretable and practical settings. While numerous investigators have studied the generation of a single suitability map, we were interested in addressing the problem of coping with a set of many suitability maps. We applied a data reduction method, principal component analysis, across 15 suitability overlays representing diverse landscape requirements to search for simplified explanations indicating the latent structure of the landscape. The study area was located in a moraine landscape of southern Michigan. We discovered that the 15 suitability overlays could be reduced to seven dimensions, explaining 65% of the variance in the original data, and that the seven dimensions reflect a structure where a variety of land uses each have their own optimal spatial locations, indicating low to moderate competition between potentially conflicting land uses and rendering a more easily understood environment. This approach did not produce a simple, elegant solution, but it did reduce the complexity associated with combining many suitability maps.

Introduction

Geographical Information Systems (GIS) and statistical analysis are two computer-intensive numerical applications that are currently being explored in unison. GIS databases contain spatial information about the attributes of site-specific locations. Small raster GIS databases may contain a quarter of a million grid cell cases (n = 250 000) and 10 to 20 variables (v = 10 to 20). Yet even these small GIS data sets are substantial in size for many statistical applications, such as multiple regression analysis, multivariate analysis and spatial autocorrelation techniques. GIS databases present investigators with observation sets that are substantially larger than those typically examined and reported by investigators employing traditional field recording techniques.


0964-0568/95/040537-14 © 1995 University of Newcastle upon Tyne


Presently, relatively few studies have explored the potential of integrating GIS with statistical analysis. Investigators using GIS databases often employ heuristic models, illustrated by the work of Johnson & Burley (1990), or previously developed empirical models, illustrated by Burley et al. (1990), to assess the contents of the landscape. Essentially, the purpose of these models is to identify an analytic or computational feature of the landscape, such as the suitability of a particular landscape feature to support a specific land use. The resulting suitability map is generated by combining spatial inventory properties such as depth to the water table or percentage slope. McHarg (1969), Steinitz et al. (1976), Hopkins (1977), Westman (1985) and Steiner (1991) explain the ideas associated with suitability analysis in greater detail. However, as McHarg (1969, pp. 31-41) notes in his highway study, there is relatively little formal guidance available when one is combining suitability maps to form grand composite analysis maps. Currently, the scholarship associated with combining suitability maps has advanced little beyond the 1969 McHarg study. We suggest that one approach for the general advancement of GIS methodology may be found in multivariate statistical methods. Without statistical examination of the data set, interaction relationships and covariance properties between the variables may be lost, meaning that the investigator may miss some important properties of the study area. This paper investigates the descriptive statistical power of a GIS database to reveal latent landscape structure and character, in search of a numerically derived approach to generating composite maps.

One reason why statistical methods may be infrequently applied to GIS databases is that inferential statistical algorithms are not embedded in many GIS software programs, meaning that the data must be exported to another software program for statistical analysis. Today, converting files generated by GIS software into text files offers a way to export the GIS files and import them into a statistical analysis software program.

We are especially interested in exporting GIS files to a statistical software package and examining the dimensionality of a study site with multivariate statistical analysis techniques. We wanted to take a series of suitability models, with associated suitability maps, across a wide array of program types ranging from housing and commercial land uses to wildlife habitat and watershed protection, and study the associative qualities of these individual suitability maps. Could 10 or more suitability maps derived from suitability models be reduced to several simple dimensions (three or four overlays) containing most of the information in the original suitability maps? Were there natural groupings of suitability overlays? We were interested in this reductionist approach to see whether there was a less complex method of presenting a landscape’s spatial properties.

A set of 10 or more suitability overlays may make the multivariate interpretation of the landscape difficult to conduct. Traditional GIS methods enable a single continuous variable to be presented spatially and interpreted relatively easily. Even two variables can be combined into one overlay for interpretation. Colour maps are helpful in rendering an interpretation. Simplified categories for each variable also aid in cross-category interpretation. This process can be applied to three variables; however, a large number of categories for each variable generates a large legend. For example, suppose there are three variables, each with the potential for 10 values, resulting in 1 000 (10 × 10 × 10) different variable combinations for a given map. Often three or four ordinal categories are selected for each variable, generating 27 to 64 potential categories (nine to 16 with only two variables). However,


with 10 variables, each containing three categories, the potential number of categories is 59 049 (3^10), although in many cases the actual combinations on a composite map may be only 10% of the theoretical combinations because not every combination exists within the study area. Nevertheless, a legend for a map with only about 5 900 combinations may still be much greater in size than the composite map itself.
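The legend arithmetic can be checked with a few lines of Python. This is a minimal sketch: `legend_size` is a hypothetical helper, and it assumes every variable can take any of its categories independently of the others, so it gives the theoretical upper bound rather than the number of combinations actually present on a map.

```python
def legend_size(n_variables: int, n_categories: int) -> int:
    """Theoretical number of category combinations on a composite map.

    Each of n_variables overlays can take one of n_categories values,
    so the possibilities multiply: n_categories ** n_variables.
    """
    return n_categories ** n_variables


print(legend_size(3, 10))                    # 1000
print(legend_size(3, 3), legend_size(3, 4))  # 27 64
print(legend_size(2, 3), legend_size(2, 4))  # 9 16
print(legend_size(10, 3))                    # 59049
print(legend_size(10, 3) // 10)              # 5904 (if only ~10% occur)
```

The exponential growth is the point: adding one more three-category overlay triples the theoretical legend.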

One approach to summarizing a large number of variables in a map is to sum or multiply the variables (Hopkins, 1977). Multiplying can generate a greater spread (variance) in the data set. However, without examining each variable, the contribution of each variable at a particular spatial position is unknown. Thus a non-statistically derived composite map whose variable contributions are identified in its legend is often constrained to a small set of variables.
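The trade-off between summing and multiplying overlays can be illustrated with a small sketch, assuming three hypothetical overlays coded 1 (low), 2 (moderate) and 3 (high) suitability, with every combination of codes occurring exactly once. Multiplication spreads the composite values more widely, but both composites map many different combinations onto the same value, which is why the contribution of each variable cannot be read back from the composite alone.

```python
from itertools import product
from statistics import pvariance

LEVELS = (1, 2, 3)  # hypothetical suitability codes: low, moderate, high

# Every combination of codes across three overlays (3 ** 3 = 27 cells).
combos = list(product(LEVELS, repeat=3))

sums = [a + b + c for a, b, c in combos]   # additive composite
prods = [a * b * c for a, b, c in combos]  # multiplicative composite

print(len(combos), len(set(sums)), len(set(prods)))  # 27 7 10
print(pvariance(sums), pvariance(prods))             # product has the larger spread

# Different inputs, identical composite values: the legend cannot separate them.
print(sum((1, 3, 2)) == sum((3, 2, 1)), 1 * 3 * 2 == 3 * 2 * 1)  # True True
```

Here the additive composite has a variance of 2 and the multiplicative one about 37.6, yet the 27 distinct combinations collapse into only 7 and 10 composite values respectively.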

Plant ecologists have faced this reductionist problem. They wished to interpret many variables and to reduce the number of dimensions associated with those variables by applying multivariate statistical techniques such as principal component analysis (PCA) (see Johnson & Wichern, 1988). In their case, the variables were vegetation types found in a stand or plot. For example, Curtis (1959) derived the importance value (the sum of % frequency, % dominance and % density) for each vegetation type in a tree stand, meaning that he could have 30 or more variables (vegetation types) and hundreds of cases (stands). He then statistically analysed the variables and discovered that he could reduce the number of dimensions to two or three without losing much of the data’s character, by combining the variables in a linear combination as indicated by the statistical analysis. Pielou (1984) explains this statistical approach in detail. Kendall (1939) illustrated the importance of this data reduction technique, but the computing power necessary to conduct the multivariate analysis was not available until the development of the computer. Eastman & Fulk (1993) demonstrated the utility of PCA applied to remotely sensed data for studying changes across time. In their work the PCA dimensions represented different electromagnetic properties of their study area, the continent of Africa. Fung & LeDrew (1987) also applied PCA to study spatial changes using electromagnetic data. Thus plant ecologists and remote sensing specialists have been able to explore their data in a new way and report important findings. We wondered whether similar work could be accomplished with GIS databases.

We therefore selected a small study area containing suitability maps, the results of 15 GIS suitability models, to examine the latent structure of the landscape through PCA. If the latent structure were found to comprise one, two or three dimensions, the overlays could theoretically be collapsed into a few simple maps. If it were found to comprise five to eight dimensions, the overlays could be partially reduced. If it contained 10 to 15 dimensions, the overlays would be considered non-collapsible, indicating that the structure is highly complex and difficult to reduce.

Study Area and Methods

The study area for this investigation is Scio Township, in Washtenaw County, Michigan (Figure 1). The township is located on the western border of Ann Arbor, Michigan, a community west of the Detroit Metropolitan Region. The township currently consists of agricultural and suburban land uses on glacial moraines and till plains. The town of Dexter, Michigan, is located in the northwestern corner of the township. The Huron River and Interstate 94 traverse the township in a general east-west direction.

Figure 1. Map locating the study area.

The database for the study area contained 31 inventory overlays (Table 1). The data were stored as 1 ha cells (100 m × 100 m). Each overlay contained 9506 data cells (9506 cases). The database resided in MAPII (Pazner et al., 1989) on a Mac II platform running the System 7 operating system.

The 31 overlays were available for use by students enrolled in a graduate-level landscape planning course during the winter of 1993 in the Landscape Architecture programme, School of Natural Resources and Environment at the University of Michigan. Each student picked a specific land use or fragile land type (Table 2) and built a spatial suitability model to identify suitable sites for development or for conservation. Each student generated a final suitability map for their land use or fragile land (n = 15). Each map was to be an overlay containing three levels of land use suitability or landscape fragility, depending on the landscape study type: highly suitable or highly fragile (numerical value = 3); moderately suitable or fragile (numerical value = 2); and poorly suited or fragile (numerical value = 1).

Table 1. List of study area overlays

Variable name                     Overlay description
Elevation                         Centroid elevation in metres
Aspect                            Slope orientation
Slope                             % slope
Soil name                         Soil conservation soil series classification
A horizon                         Unified soil series classification of A horizon
Lowest horizon                    Unified soil series classification of lowest horizon
A horizon soil permeability       Permeability of A horizon
Lowest horizon soil permeability  Permeability of lowest horizon
Floodplain                        Area within 100 year floodplain
Watertable                        Depth to seasonal high watertable
Open water                        Type of open water (Lake, River …)
Wetland                           Type of wetland
Watershed                         Watershed catchment boundaries
Wells                             Potential production from well
Surficial geology                 Geological type at surface
Forest type                       Classification of woodlands
Agriculture                       Existing agricultural land uses
Residence type                    Existing residential land uses
Dwellings per cell                Number of residential dwellings cell⁻¹
Community facilities              Existing community facilities
Commerce and industry             Existing commercial and industrial land uses
Transportation                    Existing transportation land uses
Recreation                        Existing recreation land use type
Recreation activity               Existing recreation activity/facility
Lots per cell                     Number of land parcels cell⁻¹
Ownership                         Predominant land ownership
Utilities 1                       Major gas and electric utilities
Utilities 2                       Major sewer and water lines
Zoning                            Zoning map of 1992 plan
General development plan          Township development plan 1980-2000
Windbreaks                        Location of woodland shelterbelts and wind rows

For this specific study, we were not concerned with whether each student’s suitability model was necessarily the best state-of-the-art model; rather, we were interested in the relationships between the suitability models generated. In other words, we wished to analyse the analysis. Therefore, as long as each student made a serious attempt at producing a suitability/fragility map, we were willing to accept the map. This study differs from some past non-GIS PCA studies in which investigators employed inventory data to generate classifications and ordinations of the information. We were concerned about the situation in which so many analysis studies are conducted that the analysis itself requires analysis. This study is not about how to generate a suitability map; instead, it addresses the problem of how to cope with a set of existing suitability maps.

To conduct the investigation, each suitability map was exported as a text file of numerical values, starting in the upper left corner of the overlay and ending with the last value in the lower right corner. Cell values were delimited by spaces. Each overlay generated a text file with 9506 numbers. These numbers were imported into SYSTAT for the Macintosh (SYSTAT, 1992). The values for each overlay were standardized to a mean of 0 and a variance of 1, creating a data set with 9506 cases and 15 variables. Each variable represented a suitability layer.
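The export-and-standardize step can be sketched in Python with NumPy. This is a hypothetical reconstruction, not the authors' SYSTAT procedure: it assumes each overlay's text file holds whitespace-separated cell values written row by row from the upper-left corner, and it standardizes with the sample standard deviation (`ddof=1`), which may differ slightly from the package's own convention. The helper names `read_overlay` and `build_standardized_matrix` are illustrative.

```python
import numpy as np


def read_overlay(text):
    """Parse one exported overlay: whitespace-separated cell values written
    row by row from the upper-left to the lower-right corner."""
    return np.array(text.split(), dtype=float)


def build_standardized_matrix(overlay_texts):
    """Stack overlays as columns (cases = cells, variables = overlays) and
    standardize each column to mean 0 and variance 1."""
    columns = [read_overlay(t) for t in overlay_texts]
    if len({len(c) for c in columns}) != 1:
        raise ValueError("all overlays must contain the same number of cells")
    data = np.column_stack(columns)        # shape: (n_cells, n_overlays)
    sd = data.std(axis=0, ddof=1)          # sample standard deviation (assumed)
    if np.any(sd == 0):
        raise ValueError("a constant overlay cannot be standardized")
    return (data - data.mean(axis=0)) / sd


# Three tiny hypothetical 3 x 4 overlays coded 1-3 (the study used 9506 cells each).
toy_overlays = [
    "3 2 1 1\n2 2 1 1\n3 3 2 1",
    "1 1 2 3\n1 2 2 3\n1 1 2 2",
    "2 3 3 2\n1 2 3 3\n1 1 1 2",
]
Z = build_standardized_matrix(toy_overlays)
print(Z.shape)  # (12, 3)
```

With the study's files, the same call would yield a 9506 × 15 matrix, one column per suitability layer.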

Table 2. Typical list of spatial model suitability variables

Land uses: definitions (student)

** Estate residential: 1 dwelling unit per hectare, on-site septic tank, domestic wells for water (Dwayne Yee)
** Rural residential: 2.5 dwelling units per hectare, on-site septic tank, domestic wells for water (David Purdy)
** Suburban residential: 5 to 10 dwelling units per hectare, on-site septic tank or fully serviced public sanitary sewer, on-site wells or fully serviced public water (Peter Ter Louw)
** Mobile home: 17.5 units per hectare maximum, sewered or package system, detached dwelling units, minimum 6 hectares of development (Richard Hitz)
** General commercial: 2 to 4 hectares in size, public sewer and water, within 6 minutes driving time from existing housing (Raymond Slawski)
** Active recreation: soccer/football field complex, softball/baseball field complex, intensive development and managed for mass public use, potential septic tank and well water (Mike De Vries)
Passive recreation: trails, camping, interpretive centre, archery, rifle, trap, skeet ranges, fishing, boating, canoeing, extensively developed for outdoor recreation
** Conservation: critical land set aside for protection of resources, limited development, public land, often combined to form Greenways (Jeffrey Helms)
Office park: single firm or multi-clients, modern architecture and site development, corporate headquarters, professional office park, package or public utilities services
Landfill/recycling centre: service study area or larger vicinity, determine disposal of by-products, landfill needs

Fragile lands: definitions (student)

** Soil erosion: identify susceptible areas (Paul Strauch)
** Wildlife habitat: select wildlife type for study and identify critical/potential habitat areas (Nancy Larson)
** Vegetation: select vegetation type for study and identify critical/potential restoration/preservation areas (Chamaine Kettler)
** Ground water quality/supply: protect ground water resources from contamination and/or degradation due to development or land use activities (Debra Gelber)
** Surface water/creekshed quality: identify areas most susceptible to surface water pollution or areas with flooding hazards (Chris Kunkle)
Visual quality: identify significant visual quality areas suitable for protection
** Rural character: identify significant rural character suitable for protection (Janis Eathrone)
** Agricultural land: identify important agricultural lands suitable for protection (Bill Schneider)
** Noise quality: identify sensitive areas to noise pollution or identify areas with adverse noise pollution that require attenuation (Kuang-Ta Hsu)

Note: ** denotes land uses or fragile lands selected for study.

We chose a statistical analysis package and PCA in order to apply some of the advanced multivariate mathematical features associated with these tools. In many respects GIS has always suffered from the lack of a full range of numerical computing techniques (Steinitz et al., 1976). In the early twentieth century, hand-drawn analysis techniques hampered numerical procedures. During the formative years of computer-based GIS analysis, investigators were limited to computing with integers; but the strength of GIS procedures lay in their ability to truly explore and present spatial information (Berry, 1993), something that databases, engineering computational software and statistical software are relatively weak at. Nevertheless, formative GIS software lacked complex trigonometric functions, integrals and matrix algebra computational power. Recently, the connectivity of GIS software with database software has increased the computational potential of spatial analysis. However, connectivity to a full range of high-powered statistical options is only beginning to be


available. Our method of exporting text files for importing into statistical software will eventually become obsolete and the process may become more transparent, but for the interim this approach is appropriate for many GIS software packages.

Once the numbers were imported into the statistical analysis software, the standardized scores were examined with PCA to determine the number of dimensions necessary to explain the variance in the suitability maps. The analysis generates an eigenvalue for each dimension, which indicates the general strength of that dimension. With standardized scores, the sum of the eigenvalues equals the number of variables employed in the procedure. The largest eigenvalue is associated with the first principal component, representing the greatest proportion of explained variance in the data set. The first principal component is an axis in multidimensional space specifically selected to capture the greatest proportion of variance. The second principal component is orthogonal to the first axis and explains the second largest amount of variance in the data set. Orthogonality is an important property of PCA because the perpendicularity of the axes means that the dimensions are uncorrelated. The last principal component has the smallest eigenvalue. This value can be 0.0, indicating that all of the variance has already been explained by the previous dimensions.
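These properties can be checked numerically. The sketch below, in Python with NumPy, uses one standard way of computing PCA on standardized data (an eigendecomposition of the correlation matrix); it is not necessarily how SYSTAT implements the procedure. The data are random and hypothetical, with one variable built as the exact sum of two others so that the last eigenvalue is (numerically) 0.0.

```python
import numpy as np


def pca_from_standardized(Z):
    """Eigendecomposition of the correlation matrix of Z (cases x variables).

    Returns eigenvalues sorted from largest to smallest and the matching
    eigenvectors as columns.
    """
    R = np.corrcoef(Z, rowvar=False)      # v x v correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
X = np.column_stack([X, X[:, 0] + X[:, 1]])  # fifth variable adds no new variance
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, eigvecs = pca_from_standardized(Z)
scores = Z @ eigvecs                          # component scores for each case

print(eigvals.round(3))
print(round(eigvals.sum(), 6))                # equals the number of variables, 5
# Orthogonal axes: scores on different components are uncorrelated, and each
# component's score variance equals its eigenvalue.
print(np.allclose(np.cov(scores, rowvar=False), np.diag(eigvals)))
```

The redundant fifth variable is what drives the smallest eigenvalue to zero, mirroring the 0.000 entry that can appear at the bottom of an eigenvalue table.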

The numerical size of the eigenvalues leads to an assessment and interpretation of the PCA results. Eigenvalues greater than or approximately equal to 1.0 were considered significant dimensions (Guttman, 1954). Another approach for determining the number of dimensions is to create a scree plot, in which each eigenvalue is plotted against its rank order. Eigenvalues whose slopes are steeper than a straight line fitted through the smaller eigenvalues are considered significant. Jackson (1993) recently evaluated a variety of techniques for estimating the number of interpretable dimensions.
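The scree criterion is usually applied by eye. As a hedged illustration only, the sketch below turns the verbal rule into code: it fits a straight line to the `n_tail` smallest eigenvalues (how many trailing eigenvalues define the line is an assumption the analyst must choose) and counts how many leading eigenvalues lie above that line's extension. Neither the helper `scree_count` nor any particular choice of `n_tail` comes from the study.

```python
import numpy as np


def scree_count(eigvals, n_tail, tol=1e-9):
    """Count leading eigenvalues lying above a line fitted to the smallest ones.

    eigvals: eigenvalues in any order (they are sorted largest first).
    n_tail:  how many of the smallest eigenvalues define the scree line.
    """
    ev = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    if not 2 <= n_tail < len(ev):
        raise ValueError("n_tail must be at least 2 and less than the number of eigenvalues")
    ranks = np.arange(1, len(ev) + 1)
    slope, intercept = np.polyfit(ranks[-n_tail:], ev[-n_tail:], deg=1)
    fitted = slope * ranks + intercept    # straight line extended to all ranks
    count = 0
    for value, line_value in zip(ev, fitted):
        if value > line_value + tol:      # above the scree line: significant
            count += 1
        else:                             # first eigenvalue on or below the line
            break
    return count


# Hypothetical eigenvalues with a clear elbow after the third component.
example = [5.0, 3.0, 1.5, 0.9, 0.8, 0.7, 0.6]
print(scree_count(example, n_tail=4))  # 3
```

Because the answer depends on `n_tail`, this kind of rule is best read alongside the eigenvalue-greater-than-1.0 rule rather than as a replacement for inspecting the plot.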

Each eigenvalue has a corresponding set of eigenvector coefficients, indicating the relative association of each variable with the dimension. With standardized variables, the coefficients range in value from 1.0 to −1.0. The coefficients form an equation, a linear combination that can be employed to compute a numerical value for each observation case. By applying a PCA linear combination equation to the corresponding GIS data set, one can generate a map illustrating where the landscape is strongly associated with the selected dimension. Each dimension can be characterized by examining the corresponding eigenvector coefficients. For each eigenvalue, eigenvector coefficients greater than or equal to 0.400 or less than or equal to −0.400 are often identified as being strongly associated with that particular dimension. An investigator can use the strongly associated coefficients to name or characterize the particular axis, or plot the data points along different PCA dimensions to search for latent structure. For example, Curtis (1959) discovered that one of his dimensions sorted his vegetation stands from dry areas to wet landscapes, and so he termed that dimension a moisture gradient. In our study, suppose that commercial and residential variables are significantly associated with a particular dimension. This dimension could be labelled as an urban land dimension. The labelling of a dimension is a purely subjective process. These labels represent a synopsis and interpretation of the latent structure within the data set and constitute a simplifying, reductionist investigative approach.
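Both uses of the eigenvector coefficients, mapping component scores back onto the grid and flagging strongly associated variables, can be sketched as follows. This is a minimal illustration that assumes the cells were exported row by row (so a plain reshape restores the grid) and uses the 0.400 cutoff from the text; the helper names, variable names and coefficient values are hypothetical and are not the study's results.

```python
import numpy as np


def component_score_map(Z, eigvec, n_rows, n_cols):
    """Apply one eigenvector's coefficients as a linear combination to every
    standardized cell (each row of Z) and reshape the scores into the grid."""
    scores = Z @ eigvec                    # one score per cell
    return scores.reshape(n_rows, n_cols)  # row-major order, upper-left first


def strongly_associated(names, eigvec, cutoff=0.400):
    """Variables whose coefficient is >= cutoff or <= -cutoff."""
    return [(name, round(float(c), 3)) for name, c in zip(names, eigvec)
            if abs(c) >= cutoff]


# Toy data (not standardized, for readability): 6 cells in a 2 x 3 grid, 2 variables.
Z = np.arange(12, dtype=float).reshape(6, 2)
print(component_score_map(Z, np.array([1.0, 0.0]), 2, 3))

# Hypothetical coefficients for one component.
names = ["var_a", "var_b", "var_c", "var_d"]
coefs = np.array([0.62, -0.45, 0.12, 0.40])
print(strongly_associated(names, coefs))
# [('var_a', 0.62), ('var_b', -0.45), ('var_d', 0.4)]
```

Writing the score grid back into the GIS as a new overlay is what turns a PCA dimension into a map that can be labelled and interpreted.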

In this study, three PCA sets were examined. The first set contained all 15 variables. The second set contained land use variables only (v = 7) and the third set contained fragile landscape variables only (v = 8). We were most interested in the 15 variable set, but also wished to examine the two subsets to see whether they would corroborate the results obtained with the 15 variable set. If there were strong differences, we would consider the 15 variable set to be highly sensitive to changes in the mixture of variables subjected to PCA scrutiny.

Table 3. Eigenvalues for 15 variable PCA study

Principal component   Eigenvalue   Cumulative % of variance explained
 1                    2.376         15.84
 2                    1.689         27.10
 3                    1.367         36.21
 4                    1.187         44.13
 5                    1.066         51.23
 6                    1.046         58.21
 7                    0.998         64.86
 8                    0.964         71.29
 9                    0.941         77.56
10                    0.843         83.18
11                    0.729         88.04
12                    0.723         92.86
13                    0.627         97.04
14                    0.443        100.00
15                    0.000        100.00

Whether an investigator is using an applied empirical GIS wildlife habitat model, a non-point pollution GIS model or a heuristic industrial facility suitability analysis model, there is no guarantee that the results will reveal any important information. In addition, the results generated by a complex GIS modelling approach may reveal only the obvious, negating the need for complex analysis. This fallibility also applies to PCA-related GIS studies. However, we believe that in a complex landscape with an array of program requirements, PCA is a reductionist tool that may merit application in GIS modelling, and we were willing to investigate the topic.

Results

Table 3 presents the eigenvalue results. Six eigenvalues were greater than 1.000, and the seventh eigenvalue was within 1/500th of 1.000. The first six eigenvalues explain 58.21% of the variance in the data set and the first seven explain 64.86%. A plot of the eigenvalues suggests that approximately four or five dimensions are interpretable. Thus somewhere between four and seven dimensions are considered interpretable.
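The counts and percentages quoted above can be recomputed from the eigenvalues in Table 3. This short sketch divides each running eigenvalue total by the number of variables (15), since standardized eigenvalues sum to the number of variables. It takes the seventh eigenvalue as 0.998, the value consistent with the statement that it lies within 1/500th of 1.000 and with the table's cumulative percentages; small differences in the final rows reflect rounding of the published eigenvalues.

```python
# Eigenvalues from Table 3, largest first (seventh taken as 0.998, as noted above).
EIGENVALUES = [2.376, 1.689, 1.367, 1.187, 1.066, 1.046, 0.998,
               0.964, 0.941, 0.843, 0.729, 0.723, 0.627, 0.443, 0.000]
N_VARIABLES = 15  # standardized eigenvalues sum to the number of variables


def cumulative_percent(eigvals, total):
    """Cumulative percentage of variance explained by the first k components."""
    running, out = 0.0, []
    for ev in eigvals:
        running += ev
        out.append(100.0 * running / total)
    return out


cum = cumulative_percent(EIGENVALUES, N_VARIABLES)
above_one = sum(1 for ev in EIGENVALUES if ev > 1.0)                    # eigenvalue > 1 rule
near_one = sum(1 for ev in EIGENVALUES if ev >= 1.0 - 1 / 500 - 1e-12)  # within 1/500 of 1

print(above_one, near_one, round(cum[5], 2), round(cum[6], 2))  # 6 7 58.21 64.86
```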

Table 4 presents the coefficients for the first seven eigenvectors. These coefficients indicate the variables associated with specific dimensions. Commercial development and medium density housing were the variables with the largest positive coefficient values in the first eigenvector. Agricultural land and greenways contained strong positive coefficients in the first two eigenvectors. Estate development had a strong positive coefficient for the second eigenvector, while commercial development and medium density housing had strong negative

Dow

nloa

ded

by [

Nor

th C

arol

ina

Stat

e U

nive

rsity

] at

01:

37 0

9 Se

ptem

ber

2013

Page 11: Constructing Interpretable Environments from Multidimensional Data: GIS Suitability Overlays and Principal Component Analysis

Constructing Interpretable Environments from Multidimensional Data 545

values. Land suitable for oak woodlands and susceptible wetlands were the

variables with the strongest positive coef® cients in the third eigenvector, with

noise mitigation negatively associated with the third eigenvector. Land requiring

erosion control measures contained a strong negative coef® cient in the fourth

eigenvector. Areas suitable for softball recreation were positively associated with

the ® fth eigenvalue, while aquifer recharge areas and suburban residential areas

were negatively associated. The sixth eigenvector contained a strong positive

coef® cient for rural character and negative coef® cient for suburban residential

and residential areas. The variable, wildlife habitat in wet woodlands, contained

a positive coef® cient with the seventh eigenvector. Since the remaining eigen-

values were considered non-signi® cant, no other coef® cients were examined

further.
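The associations described above can be read off Table 4 mechanically. The Python sketch below transcribes Table 4, lists for each dimension the variables whose absolute coefficient is at least 0.4 (an arbitrary cut-off chosen here for illustration, not one used in the study) and sums the squared coefficients in each column. For dimensions 6 and 7 those sums come to about 1.046 and 0.998, matching the eigenvalues in Table 3. This suggests the tabulated coefficients are scaled as component loadings (eigenvector elements multiplied by the square root of the eigenvalue) rather than unit-length eigenvectors; that reading is our inference, not a statement in the source.

```python
# Table 4 coefficients, dimensions 1-7 in order.
TABLE4 = {
    "Aquifer recharge":       (0.179, -0.058, -0.380, -0.397, -0.427, 0.015, 0.123),
    "Commercial":             (0.823, -0.504, 0.180, -0.041, -0.061, -0.071, 0.075),
    "Estate":                 (0.278, 0.452, -0.103, -0.280, 0.126, -0.220, -0.332),
    "Wet woodlands":          (-0.043, 0.101, 0.262, -0.353, 0.379, 0.117, 0.637),
    "Suburban residential":   (0.061, 0.159, 0.065, 0.154, -0.427, -0.491, 0.050),
    "Oak woodlands":          (-0.110, 0.282, 0.741, 0.127, -0.062, 0.016, 0.040),
    "Greenways":              (0.486, 0.460, -0.197, 0.043, 0.128, -0.195, 0.169),
    "Agriculture":            (0.582, 0.553, 0.006, 0.016, 0.065, 0.238, -0.056),
    "Medium density housing": (0.823, -0.504, 0.180, -0.041, -0.061, -0.071, 0.075),
    "Erosion control":        (0.086, -0.043, 0.051, -0.674, 0.293, 0.007, -0.399),
    "Rural character":        (0.391, 0.235, 0.144, 0.215, -0.242, 0.592, -0.285),
    "Noise attenuation":      (-0.057, -0.348, -0.455, -0.289, 0.291, 0.044, -0.167),
    "Residential":            (0.095, 0.067, 0.276, 0.151, 0.200, -0.525, -0.259),
    "Softball recreation":    (0.292, -0.076, -0.053, 0.397, 0.492, 0.034, 0.026),
    "Herbaceous wetlands":    (-0.282, -0.392, 0.421, -0.128, -0.010, 0.093, -0.290),
}


def strongest(dim, threshold=0.4):
    """Variables with |coefficient| >= threshold on dimension `dim` (1-based), largest first."""
    pairs = [(name, row[dim - 1]) for name, row in TABLE4.items()]
    hits = [p for p in pairs if abs(p[1]) >= threshold]
    return sorted(hits, key=lambda p: -abs(p[1]))


def sum_of_squares(dim):
    """Sum of squared coefficients in one column of Table 4."""
    return sum(row[dim - 1] ** 2 for row in TABLE4.values())


for d in range(1, 8):
    print(d, round(sum_of_squares(d), 3), strongest(d))
```

With the 0.4 cut-off, the variables listed for each dimension are the ones named in the paragraph above.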

In comparison with the split analyses, in which PCA was conducted on the land use variables only and then on the fragile land variables only, there was relatively little difference in the significant dimensions and associated coefficients. Therefore, the grouped analysis was considered sufficient for this study.
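The comparison above is reported qualitatively. One common way to quantify it, offered here as an assumption rather than the study's procedure, is Tucker's congruence coefficient, phi = sum(xy) / sqrt(sum(x^2) * sum(y^2)), computed between a subset loading vector and the full-set loadings restricted to the same variables. Because the sign of a principal component is arbitrary, components are matched on the absolute value of phi, and values close to +1 or -1 indicate nearly identical loading patterns. The function names and the loading matrices in the usage lines are made up for illustration.

```python
import numpy as np


def congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def match_components(full, subset):
    """Match each subset component to the full-set component with the largest |phi|.

    full, subset: loading matrices (variables x components) whose rows refer to the
    same variables in the same order, i.e. the full-set loadings restricted to the
    variables included in the subset analysis.
    Returns a list of (subset_component, full_component, phi), 0-based.
    """
    matches = []
    for j in range(subset.shape[1]):
        phis = [congruence(full[:, k], subset[:, j]) for k in range(full.shape[1])]
        best = int(np.argmax(np.abs(phis)))
        matches.append((j, best, phis[best]))
    return matches


# Made-up loadings for three shared variables; the subset's second component is sign-flipped.
full = np.array([[0.8, 0.1], [0.7, -0.2], [0.1, 0.9]])
subset = full * np.array([1.0, -1.0])
print(match_components(full, subset))  # second match has phi of about -1 (same pattern, opposite sign)
```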

Discussion

The results indicate that a fair number of dimensions, possibly six or seven, represent the variance in the study area's suitability maps. Unlike the PCA studies conducted by Burley (1991), where data from seven to 15 agronomic crop variables could be summarized in one or two dimensions, the Scio Township study area was much more complex. The results indicate that the data set could be reduced, but only moderately: from 15 overlays to seven dimensions that together explain 64.86% of the variance.

Dimension Interpretations

Each dimension can be interpreted, or given a name representing the underlying factors or character associated with an eigenvalue. For example, Curtis (1959) named the three major dimensions identified in his work in Wisconsin, determining that the three axes were associated with light, temperature and moisture. There are no 'hard and fast' rules for assigning characterizing names, and the names are rather subjective. Nevertheless, characterizing each axis can be useful in developing an interpretation of the results.

We labelled the first axis a density development axis, where large values along the axis represent areas suitable for density development and low values represent areas with severe restrictions. We suggest that the second axis is a rural development axis. Both agriculture and greenways are important variables in the first two dimensions, suggesting that lands in Scio Township suitable for density development and for rural development are also suitable for agriculture and greenways, implying a potential conflict between development, agriculture and greenways. In some respects such a conflict is not surprising, as expanding urban development often consumes agricultural land and potential greenway corridors. In other respects, the covarying development lands and greenway corridors indicate opportunities for the two landscape types to be woven together, supplying amenities to expanding development. We believe this covariation is a promising prospect for the study area.

The third axis was interpreted to be a natural area sanctuary versus urban stress dimension, where high values along the axis represent locations where oak woodlands and wetlands would be suitable for development and low values represent locations requiring mitigation from urban noise. This is an interesting axis because it distinguishes areas of sanctuary from areas of environmental stress. We labelled it the refuge/stress axis.

Table 4. Eigenvector coefficients for the first seven eigenvalues

Variable                   Dim 1   Dim 2   Dim 3   Dim 4   Dim 5   Dim 6   Dim 7
Aquifer recharge           0.179  -0.058  -0.380  -0.397  -0.427   0.015   0.123
Commercial                 0.823  -0.504   0.180  -0.041  -0.061  -0.071   0.075
Estate                     0.278   0.452  -0.103  -0.280   0.126  -0.220  -0.332
Wet woodlands             -0.043   0.101   0.262  -0.353   0.379   0.117   0.637
Suburban residential       0.061   0.159   0.065   0.154  -0.427  -0.491   0.050
Oak woodlands             -0.110   0.282   0.741   0.127  -0.062   0.016   0.040
Greenways                  0.486   0.460  -0.197   0.043   0.128  -0.195   0.169
Agriculture                0.582   0.553   0.006   0.016   0.065   0.238  -0.056
Medium density housing     0.823  -0.504   0.180  -0.041  -0.061  -0.071   0.075
Erosion control            0.086  -0.043   0.051  -0.674   0.293   0.007  -0.399
Rural character            0.391   0.235   0.144   0.215  -0.242   0.592  -0.285
Noise attenuation         -0.057  -0.348  -0.455  -0.289   0.291   0.044  -0.167
Residential                0.095   0.067   0.276   0.151   0.200  -0.525  -0.259
Softball recreation        0.292  -0.076  -0.053   0.397   0.492   0.034   0.026
Herbaceous wetlands       -0.282  -0.392   0.421  -0.128  -0.010   0.093  -0.290

The fourth axis was an erosion control axis, where low values represented areas requiring erosion control measures and high values represented areas with low soil erosion concerns. The fifth axis was interpreted to be a recreation axis, where high values represented areas suitable for active recreation and low values represented important aquifer contamination zones and areas intended for suburban residential use. This axis suggests a conflict between suburban residential use and aquifer protection. While suburban residential land could be developed on these fragile aquifer-related parcels, such development should employ design methods that minimize disturbance of the aquifer-related lands.

The sixth axis is a rural character/residential development axis, where high values indicate land with important rural character and low values indicate land with residential development character. The seventh axis is a wet woodland dimension, where high values indicate areas appropriate for conserving swamp habitat for wildlife that prefer wet woodlands. The results indicate that wet woodlands are not strongly associated with other types of landscape suitability, implying minimal conflict.

Dimensions Expressed as GIS Maps

The eigenvector coefficients for each dimension can be employed to construct a GIS map of the spatial properties associated with that dimension. For example, equation (1) represents a linear combination for the sixth principal component: each standardized suitability overlay is multiplied by its dimension 6 coefficient from Table 4, and the products are summed. This equation computes a rural character and residential development suitability score for each raster cell in the GIS database. Values near or above 1.0 indicate locations suitable for rural character preservation; values near or below -1.0 indicate locations suitable for residential housing development. Figure 2 is a map of the study area where equation (1) has been employed. This technique can be employed for each dimension, creating seven overlay maps and reducing the suitability overlays from 15 to seven, revealing a latent landscape structure:

$$
\begin{aligned}
\text{Dimension 6} = {} & 0.015\,\mathrm{STD}_{\text{aquifer recharge}} - 0.071\,\mathrm{STD}_{\text{commercial}} - 0.220\,\mathrm{STD}_{\text{estate}} \\
& + 0.117\,\mathrm{STD}_{\text{wet woodlands}} - 0.491\,\mathrm{STD}_{\text{suburban residential}} + 0.016\,\mathrm{STD}_{\text{oak woodland}} \\
& - 0.195\,\mathrm{STD}_{\text{greenways}} + 0.238\,\mathrm{STD}_{\text{agriculture}} - 0.071\,\mathrm{STD}_{\text{medium density housing}} \\
& + 0.007\,\mathrm{STD}_{\text{erosion control}} + 0.592\,\mathrm{STD}_{\text{rural character}} + 0.044\,\mathrm{STD}_{\text{noise attenuation}} \\
& - 0.525\,\mathrm{STD}_{\text{residential}} + 0.034\,\mathrm{STD}_{\text{softball recreation}} + 0.093\,\mathrm{STD}_{\text{herbaceous wetland}}
\end{aligned}
\tag{1}
$$

where Dimension 6 = rural character and residential development score for a raster cell, and STD = standard normalized variable, with mean of 0 and unit variance.
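Equation (1) is a cell-by-cell weighted sum of standardized rasters, so it maps directly onto array arithmetic. The Python/NumPy sketch below standardizes each overlay to mean 0 and unit variance over all cells (using the population standard deviation, an assumption) and applies the dimension 6 coefficients from equation (1). The layer dictionary, the function names `standardize` and `dimension_map`, and the random ratings are hypothetical; in practice the arrays would be the co-registered suitability overlays from the GIS database. Repeating the call with each Table 4 column would yield the seven dimension maps.

```python
import numpy as np

# Dimension 6 coefficients, copied from equation (1).
DIM6 = {
    "aquifer recharge": 0.015, "commercial": -0.071, "estate": -0.220,
    "wet woodlands": 0.117, "suburban residential": -0.491, "oak woodland": 0.016,
    "greenways": -0.195, "agriculture": 0.238, "medium density housing": -0.071,
    "erosion control": 0.007, "rural character": 0.592, "noise attenuation": 0.044,
    "residential": -0.525, "softball recreation": 0.034, "herbaceous wetland": 0.093,
}


def standardize(layer):
    """Z-score a raster over all its cells (the STD terms); fails on a constant layer."""
    return (layer - layer.mean()) / layer.std()


def dimension_map(layers, coefs):
    """Weighted sum of standardized layers, cell by cell, as in equation (1)."""
    shape = next(iter(layers.values())).shape
    score = np.zeros(shape)
    for name, weight in coefs.items():
        score += weight * standardize(layers[name].astype(float))
    return score


# Hypothetical overlays: ordinal ratings 0-3 on a 40 x 60 grid, one per variable.
rng = np.random.default_rng(0)
layers = {name: rng.integers(0, 4, size=(40, 60)) for name in DIM6}
dim6 = dimension_map(layers, DIM6)
print(dim6.shape, float(dim6.min()), float(dim6.max()))
```

Since every standardized layer has mean 0, the resulting dimension map also has mean 0, so positive cells lean towards rural character and negative cells towards residential development.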

A PCA examination of the two subsets revealed similar dimension sets and reinforced the axes reported in this study. The investigation determined that there were seven major components for the 15 variables studied, consisting of a density development axis, a rural development axis, a refuge/stress axis, an erosion control axis, a recreation axis, a rural character/residential development axis and a wet woodland axis. These seven dimensions form a set of spatial overlays summarizing the set of 15 suitability overlays. In theory, this smaller set of seven overlays could then be employed in further landscape planning studies to assign specific locations to specific land uses and to derive a township zoning plan, development plan and natural resource protection plan. A discussion of how these plans would be generated is beyond the scope and intentions of this study. However, our study illustrates a procedure for sorting information in a reductionist manner that may assist in the synthesis of a landscape plan.

Figure 2. Map illustrating results from dimension six and the resulting equation.


Cautions and Limitations

While this investigation generated results indicating moderate complexity for the study area, this does not mean that all study areas are similarly structured. Instead, this investigation indicates that Scio Township, as studied by the investigatory team, is relatively complex. Due to the nature of this study, there is no statistical evidence that another investigatory team, composed of another year's students, would derive similar results. Therefore, the results presented in this study may not be externally valid. In addition, acquiring and developing more overlays for the database, or changing the grid cell area, may produce different results. However, we believe that this study presents an approach to examining the physical suitability of the landscape for a variety of land uses and environmental features.

There are several statistical limitations of this study. First, to employ PCA, one should have multivariate normal data. However, overlays often contain non-normal distributions. One can correct for normality by transforming each data set to a normal distribution; nevertheless, this does not mean the data set is multivariate normal. In addition, the data values are often not continuous but rather ordinal values with low ranges, meaning that the data set does not necessarily conform to the requirements of many parametric statistical tests, including PCA (Johnson & Wichern, 1988). Finally, map data are often spatially autocorrelated: the value of a raster cell is not independent of the values of adjoining cells. Parametric tests assume independence of observations, yet the raster cells are most likely spatially autocorrelated and not independent. Knowing these limitations, we decided to proceed and explore the data set.
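The spatial autocorrelation concern can be checked for any single overlay. A standard diagnostic, not one used in this study, is global Moran's I, I = (N / W) * sum_i sum_j w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where N is the number of cells and W the sum of all weights. The NumPy sketch below assumes binary rook (four-neighbour) weights on a regular grid; values near +1 indicate strong positive autocorrelation, values near 0 little spatial structure, and -1 a perfect checkerboard. No significance test is included, and the function name `morans_i` is illustrative.

```python
import numpy as np


def morans_i(grid):
    """Global Moran's I of a raster using binary rook (4-neighbour) weights.

    A constant raster has zero variance and would divide by zero.
    """
    x = np.asarray(grid, dtype=float)
    d = x - x.mean()                                   # deviations from the mean
    horiz = (d[:, :-1] * d[:, 1:]).sum()               # left-right neighbour products
    vert = (d[:-1, :] * d[1:, :]).sum()                # up-down neighbour products
    numerator = 2.0 * (horiz + vert)                   # w_ij = w_ji = 1, so each pair counts twice
    total_weight = 2.0 * (d[:, :-1].size + d[:-1, :].size)  # W = sum of all w_ij
    return (x.size / total_weight) * numerator / (d ** 2).sum()


checker = np.indices((10, 10)).sum(axis=0) % 2 * 2 - 1   # alternating -1/+1 cells
gradient = np.repeat(np.arange(10)[:, None], 10, axis=1)  # smooth north-south trend
print(morans_i(checker), morans_i(gradient))
```

A suitability overlay built from contiguous soil or hydrology polygons would be expected to score well above 0, which is the non-independence described above.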

Conclusion

We believe that the complexity of the study area may be due to the geophysical nature of Scio Township, a glacial moraine landscape with numerous topographical and hydrological features. A comparative investigation of a glacial lake plain landscape would offer insight into the complexity findings for the study area.

For our study area and database, we concluded that a simple reductionist model was not necessarily possible and that the minimum number of dimensions required to describe the study area is six or seven. This translates into 3^7, or 2187, possible landscape suitability combinations (three classes on each of seven dimensions). We suggest that this complexity poses a problem for landscape planners, because one may have to assess and reassess development plans along six or seven dimensions; it also offers insight into why clear, neatly packaged environmental development plans may be difficult to generate. We also suggest that this complexity may be a blessing, because the results indicate that the spatial properties of suitability maps do not necessarily covary, affording different locations for varying land uses and landscape operations.
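The figure of 2187 follows from treating each of the seven dimensions as having three classes, so 3^7 = 2187. The source does not say how the classes would be defined; the Python/NumPy sketch below assumes, purely for illustration, that each dimension's cell scores are split at their own terciles into low, middle and high classes, and it encodes each cell's seven classes as a single base-3 combination code between 0 and 2186. The function name `combination_codes` and the random scores are hypothetical.

```python
import numpy as np

N_DIMS, N_CLASSES = 7, 3
print(N_CLASSES ** N_DIMS)  # number of possible combinations


def combination_codes(scores):
    """Encode each cell's class memberships as one base-3 code (0 to 3**7 - 1).

    scores: array of shape (n_cells, n_dims) holding component scores per raster cell.
    Each dimension is split at its own terciles (a hypothetical choice), so
    np.digitize assigns class 0 (low), 1 (middle) or 2 (high).
    """
    classes = np.empty(scores.shape, dtype=int)
    for k in range(scores.shape[1]):
        cuts = np.quantile(scores[:, k], [1 / 3, 2 / 3])
        classes[:, k] = np.digitize(scores[:, k], cuts)
    place_values = N_CLASSES ** np.arange(scores.shape[1])  # 1, 3, 9, ..., 729
    return classes @ place_values


rng = np.random.default_rng(1)
codes = combination_codes(rng.normal(size=(5000, N_DIMS)))  # hypothetical scores
print(len(np.unique(codes)), "of", N_CLASSES ** N_DIMS, "combinations occur")
```

Counting how many of the 2187 codes actually occur on a map is one way to see how much of the theoretical combination space a given landscape uses.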

We anticipate that GIS investigators will begin to apply PCA to their study areas as an exploratory technique to identify latent structures and relationships among suitability maps and landscape features. In addition, we expect that researchers will begin to report results concerning the variability of these dimensions. If reasonably firm latent structures are discovered, other multivariate techniques such as discriminant functions (Johnson & Wichern, 1988) may be applicable.

References

Berry, J.K. (1993) Beyond Mapping: Concepts, Algorithms, and Issues in GIS (Fort Collins, CO, GIS World).

Burley, J.B. (1991) Vegetation productivity equation for reclaiming surface mines in Clay County, Minnesota, International Journal of Surface Mining and Reclamation, 5, pp. 1–6.

Burley, J.B., Kopperl, J., Paliga, J. & Carter, W. (1990) Land-use/watershed modeling for Lake Itasca, Minnesota: a land-use planning, GIS project at Colorado State University, Landscape and Land Use Planning, 17, pp. 19–25 (Washington, DC, American Society of Landscape Architects Open Committee on Landscape and Land Use Planning).

Curtis, J.T. (1959) Vegetation of Wisconsin: An Ordination of Plant Communities (Madison, WI, University of Wisconsin Press).

Eastman, J.R. & Fulk, M. (1993) Long sequence of time series evaluation using standardized principal components, Photogrammetric Engineering & Remote Sensing, 59(6), pp. 991–996.

Fung, T. & LeDrew, E. (1987) Application of principal components analysis to change detection, Photogrammetric Engineering & Remote Sensing, 53(12), pp. 1649–1658.

Guttman, L. (1954) Some necessary conditions for common factor analysis, Psychometrika, 30, pp. 179–185.

Hopkins, L.D. (1977) Methods for generating land suitability maps: a comparative evaluation, Journal of the American Institute of Planners, 43(4), pp. 386–400.

Jackson, D.A. (1993) Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches, Ecology, 74(8), pp. 2204–2214.

Johnson, R.A. & Wichern, D.W. (1988) Applied Multivariate Statistical Analysis (Englewood Cliffs, NJ, Prentice Hall).

Johnson, R. & Burley, J.B. (1990) Snowy Range ski resort: an illustration of GIS planning principles, Landscape Architectural Review, 9(1), pp. 15–18.

Kendall, M. (1939) The geographical distribution of crop productivity in England, Journal of the Royal Statistical Society, 102, pp. 21–48.

McHarg, I.L. (1969) Design with Nature (Garden City, NY, Doubleday/Natural History Press).

Pazner, M., Kirby, K.C. & Thies, N. (1989) MAP II: Reference Manual. A Geographic Information System for the Macintosh (New York, J. Wiley).

Pielou, E.C. (1984) The Interpretation of Ecological Data (New York, J. Wiley).

SYSTAT (1992) Version 5.2 Edition (Evanston, IL, SYSTAT).

Steiner, F. (1991) The Living Landscape: an Ecological Approach to Landscape Planning (New York, McGraw-Hill).

Steinitz, C., Parker, P. & Jordan, L. (1976) Hand drawn overlays: their history and prospective uses, Landscape Architecture, 66, pp. 444–455.

Westman, W.E. (1985) Ecology, Impact Assessment, and Environmental Planning (New York, J. Wiley).
