Intro to Spatial Data Analysis · Geometry Types For this course, we will (almost) always work with...

Intro to Spatial Data Analysis GIS 5222 Jake K. Carr Week 4 Intro to Spatial Data Analysis Jake K. Carr


Intro to Spatial Data Analysis

GIS 5222

Jake K. Carr

Week 4

Intro to Spatial Data Analysis Jake K. Carr


Spatial Data Analysis

Spatial Data Analysis (SDA) is the process of identifying geographic patterns in data and analyzing how the relationships between features are affected by their relative locations.

What makes spatial data special is that each observation is geographically referenced to a particular area on the map.

By exploiting the information contained in that geographic reference we will be able to draw stronger conclusions than if we simply ignored it.


Spatial Data Types

There are two types of spatial data:

• Continuous
  • Data from a surface of values - like current temperature.

• Discrete
  • Data associated with individual geographic objects - like population by county in OH.


Continuous Phenomena

Continuous phenomena such as precipitation or temperature can be found or measured anywhere.

These phenomena have no gaps.

You can measure a value at any location.


Temperature Readings


Discrete Features

For discrete features the ‘location’ of each feature is mutually exclusive of the ‘location’ of other features.

If a feature is located here, it cannot also be located elsewhere.

• Ex: Counties in OH.

The variables that are associated with discrete features are often what we are most interested in:

• Ex: Population counts for each county in OH.


County Population

Here, the (geographic) features are counties in OH.

The attribute (variable) of interest is Population.


Spatial Data Representation

There are two main ways to represent spatial data:

• Raster
  • A grid of cells in which the value of an attribute is assigned to the grid cell corresponding to the ‘same’ location.

• Vector
  • Locational shapes constructed of individual points (vectors) in which an attribute value is assigned to the point(s) corresponding to that location.

Raster is faster, but Vector is ‘corrector’ !
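
The contrast between the two representations can be sketched in a few lines of Python. This is only an illustration: the grid values, the grid origin, the cell size, and the `cell_value` helper are all invented here and do not come from the course materials.

```python
# Raster: a grid of cells; each cell holds one attribute value.
# All numbers below are invented for illustration.
raster = [
    [21.0, 21.5, 22.0],   # row 0 (northernmost)
    [20.5, 21.0, 21.5],
    [20.0, 20.5, 21.0],   # row 2 (southernmost)
]
X_MIN, Y_MAX = 0.0, 30.0   # hypothetical upper-left corner of the grid
CELL_SIZE = 10.0           # hypothetical cell width and height


def cell_value(x, y):
    """Return the raster value of the cell that contains location (x, y)."""
    col = int((x - X_MIN) // CELL_SIZE)
    row = int((Y_MAX - y) // CELL_SIZE)
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise ValueError("location falls outside the grid")
    return raster[row][col]


# Vector: explicit coordinates, with the attribute attached to the shape.
vector_feature = {
    "geometry": (12.3, 17.8),   # a single point (x, y)
    "temperature": 21.2,        # attribute value stored with the feature
}

print(cell_value(12.3, 17.8))          # value of the whole cell covering the point
print(vector_feature["temperature"])   # value stored at the exact point
```

The raster lookup is simple arithmetic on the coordinates, which is one reason raster operations tend to be fast; the vector feature keeps the exact location, which is the sense in which it is ‘corrector’.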


Spatial Data Representation

Vector Raster


Raster Example


Vector Example


Geometry Types

For this course, we will (almost) always work with vector data.

The vector format is useful for ‘accurate’ representation of spatial data features from the standard geometry types:

• Points: a pair of double-precision coordinates in the order (X,Y).

• Lines: an ordered set of points (vertices), connected in sequence.

• Polygons: one or more rings. A ring is a connected sequence of three or more points (vertices) that form a closed, non-self-intersecting loop.
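
As a rough sketch, the three geometry types can be written as plain Python tuples and lists. The coordinates are arbitrary, and `is_closed_ring` is a hypothetical helper; it checks closure and vertex count only and does not test for self-intersection.

```python
# Point: an (x, y) pair of floats (Python floats are double precision).
point = (1.0, 2.0)

# Line: an ordered list of points (vertices), connected in sequence.
line = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

# Polygon: one or more rings. Here the closing vertex is stored explicitly
# (first == last), which is one common convention.
polygon = [
    [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)],  # outer ring
    [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 1.0)],              # inner ring
]


def is_closed_ring(ring):
    """True if the ring ends where it starts and has at least three distinct vertices."""
    return len(ring) >= 4 and ring[0] == ring[-1] and len(set(ring[:-1])) >= 3


print(all(is_closed_ring(ring) for ring in polygon))
print(is_closed_ring(line))
```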


All Three Geometry Types


Areal Support/Block

Areal data¹ involves aggregated quantities for each areal unit within some relevant spatial partition of a given region (such as the counties within a state).

On p. 1 of the text (p. 24 in .pdf) the author mentions the concept of an ‘areal support or block.’ Support is just a statistics word for unit of observation.

The areal support of a set of county population values is the county.

Areal data is always represented by POLYGON geometries - areal and polygon are interchangeable.

¹ The text specifically focuses on analysis of areal data.


Geometry Types

Geometry types build from the ‘ground up’:

Points are the basic building block of all vector geometries (more on that in a minute).

Lines (polylines) are then built up as a series of connected points.

Polygons are closed polylines - the first and last point in the series is the same.
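
The ‘ground up’ idea can be sketched as two small functions: one keeps points in order to form a polyline, the other closes a polyline into a polygon ring. Both function names and the sample points are made up for this illustration.

```python
def make_polyline(points):
    """A polyline is just the points kept in order."""
    return list(points)


def make_polygon_ring(points):
    """Close a polyline by repeating its first point at the end, if needed."""
    ring = list(points)
    if len(ring) < 3:
        raise ValueError("a ring needs at least three points")
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring


corners = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # made-up points
polyline = make_polyline(corners)
ring = make_polygon_ring(corners)

print(polyline[0] == polyline[-1])  # False: an open polyline
print(ring[0] == ring[-1])          # True: a closed ring
```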


Vector Geometry

Why is vector data called vector data?


Vector Geometry


Where to Find Geometries


Fun with vector geometries!

See:

• polygonVertices.py

• pointVertices.py

• pointVertices shapefile.py


Shapefiles

Every shapefile data set includes a minimum of three files.

The first of these files digitally stores the geometry of the features as sets of vector coordinates (.shp).

A second required file holds a positional index of the feature geometry, which lets software locate each feature's record in the .shp file quickly (.shx).

The third required file stores the attribute data in dBASE format, one row per feature in the same order as the geometry records (.dbf).


Shapefiles

At a minimum, we need the following three files to use counties in a map document:

• counties.shp: the main shape file containing vector coordinate data

• counties.shx: the index file

• counties.dbf: the dBASE table
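
A quick way to check that the three required files travel together is to look for them side by side. The sketch below uses only the Python standard library; `missing_shapefile_parts` is a hypothetical helper name, and the check is case-sensitive on file systems that are.

```python
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]


# 'counties.shp' is the file name used on the slide; it is looked up
# relative to the current working directory.
print(missing_shapefile_parts("counties.shp"))
```

An empty list means all three required files were found.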


Shapefiles

There are a few additional files associated with shapefile construction, but these are optional.

One of the most important optional files is the projection file (.prj). This file includes the coordinate system definition.

– Do you remember how to determine the coordinate system for a given shapefile with ArcPy?

Another optional file stores the metadata for the file (.xml). Metadata is additional descriptive information about the shapefile - like when it was produced, and what time period the attribute data was collected, etc.
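
Because the .prj file is plain text (a well-known-text description of the coordinate system), it can also be inspected without ArcPy. The sketch below is one possible approach, not the course's method: `read_prj` and `coordinate_system_name` are hypothetical helpers, and the name extraction simply takes the first quoted string in the text.

```python
from pathlib import Path


def read_prj(shp_path):
    """Return the text of the .prj file next to shp_path, or None if absent."""
    prj_path = Path(shp_path).with_suffix(".prj")
    if not prj_path.exists():
        return None   # the .prj file is optional, so it may be missing
    return prj_path.read_text().strip()


def coordinate_system_name(wkt):
    """Pull the first quoted name out of a WKT string, e.g. GEOGCS["NAME", ...]."""
    start = wkt.find('"')
    end = wkt.find('"', start + 1)
    if start == -1 or end == -1:
        return None
    return wkt[start + 1:end]


# With ArcPy available, the equivalent lookup would be along the lines of:
#   arcpy.Describe("counties.shp").spatialReference.name
wkt = read_prj("counties.shp")
print(coordinate_system_name(wkt) if wkt else "no .prj file found")
```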


Spatial Analysis

In some applications the purpose of analysis is to describe the spatial arrangement of geographic features.

In others, the focus may be on describing the spatial variation in attribute values associated with those geographic features.

These descriptions might involve identifying interesting aspects of the data, such as detecting clusters or concentrations of high (or low) values.

The next step might be to try to understand why certain areas of the map have a concentration of high (or low) values.


Explaining Variation

There are two aspects to variation in a spatial data set.

The first is the basic variation in the data values (disregarding the information provided by the locational index).

The second is spatial variation, or the variation in the data values across the map.

Describing these two aspects of variation involves different strategies.


Variation in Data Values

Simply plot a histogram of the data:

See matplotlib’s pyplot submodule example:

variation.py
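
The contents of variation.py are not reproduced in this transcript, so the following is only a guess at the general shape of such a script. The attribute values are invented, `bin_counts` is a hypothetical helper that tallies equal-width bins by hand, and `plot_histogram` assumes matplotlib is installed (it is imported only when the function is called).

```python
def bin_counts(values, n_bins):
    """Count how many values fall in each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return counts


def plot_histogram(values, n_bins=5):
    """Draw a histogram of the values with matplotlib's pyplot."""
    import matplotlib.pyplot as plt
    plt.hist(values, bins=n_bins)
    plt.xlabel("Attribute value")
    plt.ylabel("Number of features")
    plt.show()


# Made-up attribute values, one per feature; location is ignored here.
values = [12, 15, 15, 18, 22, 22, 23, 30, 41, 58]
print(bin_counts(values, 5))
```

Calling `plot_histogram(values)` draws the picture; the counts alone already show where the values pile up, without any reference to where the features sit on the map.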


Spatial Variation in Data Values

Spatial variation in AREA:


Explaining Variation

To explain variation we will try to find a model that will account for the variation in some attribute.

It is possible that this model could also provide a good explanation of the spatial variation in our data.

It is also possible that a model that apparently does well in describing attribute variation leaves important aspects of its spatial variation unexplained.

– For example all the cases that are very poorly fitted by the model might be in one part of the map.
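
A toy example may help make this concrete. The data below are invented, the ‘west’/‘east’ labels stand in for parts of the map, and a simple least-squares line stands in for whatever model is actually used. The fitted slope describes the attribute relationship, yet the residuals split cleanly by region.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Invented data: one entry per areal unit, tagged with its part of the map.
region = ["west", "west", "west", "east", "east", "east"]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0, 5.0, 7.0, 9.0]   # the east units sit 3 units higher

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Average residual by region: values far from zero on one side of the map
# suggest spatial variation that the model has not explained.
mean_residual = {}
for name in sorted(set(region)):
    group = [r for r, g in zip(residuals, region) if g == name]
    mean_residual[name] = sum(group) / len(group)
print(mean_residual)
```

Here every western unit is over-predicted and every eastern unit is under-predicted by the same amount, which is exactly the kind of pattern a map of residuals would reveal.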


Elements of Spatial Analysis

• Cartographic Modelling: Each data set is represented as a map and map-based operations (e.g. Buffer Analysis) generate new maps.

• Mathematical Modelling: Model outcomes are dependent on the form of the spatial interaction between objects in the model. This occurs either through the spatial relationships or the geographical positioning of objects within the model.

• Statistical Modelling: Techniques for the proper analysis of spatial data which make use of the spatial referencing in the data.


Spatial Data Matrix

Spatial data consists of a set of (k) attributes

Z = {Z1,Z2, ...,Zk}

measured at (or associated with) a set of (n) spatial locations

S = {S(1), S(2), ..., S(n)}

In other words, there are n locations with k variables measured at each location.


Spatial Data Matrix

Variables are represented by capital letters, like Z and S. Observations from those variables are indicated by lower case letters, such as z and s.

The Spatial Data Matrix consisting of k attributes for each of n geographic features has the standard form:

\[
\begin{bmatrix}
z_1(1) & z_2(1) & \cdots & z_k(1) & s(1) \\
z_1(2) & z_2(2) & \cdots & z_k(2) & s(2) \\
\vdots & \vdots &        & \vdots & \vdots \\
z_1(n) & z_2(n) & \cdots & z_k(n) & s(n)
\end{bmatrix}
\]
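
In code, the same layout is simply a table with one row per location. The sketch below uses placeholder values with k = 2 attributes and n = 3 locations; the helper names `observation` and `location` are invented to mirror the z and s notation.

```python
# Each row is one location i: attribute values z_1(i)..z_k(i), then s(i).
# All values are placeholders.
spatial_data_matrix = [
    # z1,  z2,  s (here an (x, y) location)
    [10.0, 0.5, (0.0, 0.0)],
    [12.0, 0.7, (1.0, 0.0)],
    [9.0, 0.2, (0.0, 1.0)],
]

n = len(spatial_data_matrix)          # number of locations
k = len(spatial_data_matrix[0]) - 1   # number of attributes (last column is s)


def observation(j, i):
    """Return z_j(i), using the 1-based indices of the slide notation."""
    return spatial_data_matrix[i - 1][j - 1]


def location(i):
    """Return s(i), the location stored in the i-th row."""
    return spatial_data_matrix[i - 1][-1]


print(n, k)                 # 3 2
print(observation(2, 3))    # 0.2
print(location(2))          # (1.0, 0.0)
```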


Spatial Data Matrix: ArcMap Edition

In ArcMap, the order of these variables changes to the form:

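\[
\begin{bmatrix}
z_1(1) & s(1) & z_2(1) & \cdots & z_k(1) \\
z_1(2) & s(2) & z_2(2) & \cdots & z_k(2) \\
\vdots & \vdots & \vdots &      & \vdots \\
z_1(n) & s(n) & z_2(n) & \cdots & z_k(n)
\end{bmatrix}
\]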

Z1 is typically called FID, and S() is the Shape* variable.
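
The reordering amounts to moving the location column from last place to second place in every row. A minimal sketch, with a hypothetical function name and a placeholder row in which the string "POLYGON" stands in for the Shape value:

```python
def to_arcmap_order(row):
    """Move s(i) from the last column to the second: z1, s, z2, ..., zk."""
    *attributes, shape = row
    return [attributes[0], shape, *attributes[1:]]


# Placeholder row in the standard form: z1 (playing the role of FID), z2, z3, s.
standard_row = [0, 11.5, 4.2, "POLYGON"]
print(to_arcmap_order(standard_row))   # [0, 'POLYGON', 11.5, 4.2]
```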


We’ve already seen this!
