Coursefinal


Chapter 4: GIS Data Model

Raster Data Model


A. THE DATA MODEL (Raster Data Model)

• geographical variation in the real world is infinitely complex
• the closer you look, the more detail you see, almost without limit
• it would take an infinitely large database to capture the real world precisely
• data must somehow be reduced to a finite and manageable quantity by a process of generalization or abstraction
• geographical variation must be represented in terms of discrete elements or objects
• the rules used to convert real geographical variation into discrete objects are the data model
• Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database... (consisting) of named logical units of data and the relationships between them."
• current GISs differ according to the way in which they organize reality through the data model
• each model tends to fit certain types of data and applications better than others
• the data model chosen for a particular project or application is also influenced by:

- the software available
- the training of the key individuals
- historical precedent


• there are two major choices of data model: raster and vector

• the raster model divides the entire study area into a regular grid of cells in a specific sequence
- the conventional sequence is row by row from the top left corner
- each cell contains a single value
- the model is space-filling, since every location in the study area corresponds to a cell in the raster
- one set of cells and associated values is a layer
- there may be many layers in a database, e.g. soil type, elevation, land use, land cover

• the vector model uses discrete line segments or points to identify locations
- discrete objects (boundaries, streams, cities) are formed by connecting line segments
- vector objects do not necessarily fill space; not all locations in space need to be referenced in the model

• a raster model tells what occurs everywhere, at each place in the area
• a vector model tells where everything occurs, giving a location to every object
• conceptually, the raster models are the simplest of the available data models


B. CREATING A RASTER

consider laying a grid over a geologic map

create a raster by coding each cell with a value that represents the rock type which appears in the majority of that cell's area

when finished, every cell will have a coded value

in most cases the values that are to be assigned to each cell in the raster are written into a file, often coded in ASCII

this file can be created manually by using a word processor, database or spreadsheet program or it can be created automatically

then it is normally imported into the GIS so that the program can reformat the data for its specific processing needs

there are several methods for creating raster databases


Cell by cell entry

direct entry of each layer cell by cell is simplest
entry may be done within the GIS or into an ASCII file for importing
each program will have specific requirements
the process is normally tedious and time-consuming
a layer can contain millions of cells

an average Landsat image is around 7.4 x 10^6 pixels; an average TM scene is about 34.9 x 10^6 pixels


run length encoding can be more efficient
values often occur in runs across several cells
this is a form of spatial autocorrelation: the tendency for nearby things to be more similar than distant things
data are entered as pairs: first the run length, then the value

e.g. the array
0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1
would be entered as
3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1

this is 16 items to enter instead of 20
in this case the saving is 20%, but much higher savings occur in practice

imagine a database of 10,000,000 cells and a layer which records the county containing each pixel
suppose there are only two counties in the area covered by the database
each cell can have one of only two values, so the runs will be very long

only some GISs have the capability to use run length encoded files
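The pairing of run length and value can be sketched in a few lines of Python. This is only an illustration of the idea, not the input format of any particular GIS; the function names `run_length_encode` and `run_length_decode` are made up for this example.

```python
from itertools import groupby

def run_length_encode(cells):
    """Return a flat list of pairs: run length first, then value."""
    encoded = []
    for value, run in groupby(cells):
        encoded.extend([len(list(run)), value])
    return encoded

def run_length_decode(encoded):
    """Expand a flat (run length, value) list back into individual cells."""
    cells = []
    for i in range(0, len(encoded), 2):
        length, value = encoded[i], encoded[i + 1]
        cells.extend([value] * length)
    return cells

# The 20-cell array used in the text
row = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
encoded = run_length_encode(row)
```

Here `encoded` holds the 16 items listed in the text, and `run_length_decode(encoded)` restores the original 20 cells.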

Digital data

much raster data is already in digital form, as images, etc.
however, resampling will likely be needed in order that pixels coincide in each layer

because remote sensing generates images, it is easier to interface with a raster GIS than any other type

elevation data is commonly available in digital raster form from agencies such as the US Geological Survey


C. CELL VALUES

Types of values

the type of values contained in cells in a raster depends upon both the reality being coded and the GIS
different systems allow different classes of values, including:
- whole numbers (integers)
- real (decimal) values
- alphabetic values
many systems only allow integers; others which allow different types restrict each separate raster layer to a single kind of value
if systems allow several types of values, e.g. some layers numeric, some non-numeric, they should warn the user against doing unreasonable operations
e.g. it is unreasonable to try to multiply the values in a numeric layer with the values in a non-numeric layer
integer values often act as code numbers, which "point" to names in an associated table or legend
e.g. the first example might have the following legend identifying the name of each soil class:
0 = "no class"
1 = "fine sandy loam"
2 = "coarse sand"
3 = "gravel"


One value per cell

each pixel or cell is assumed to have only one value
this is often inaccurate: the boundary of two soil types may run across the middle of a pixel
in such cases the pixel is given the value of the largest fraction of the cell, or the value of the middle point in the cell
note, however, a few systems allow a pixel to have multiple values

the NARIS system developed at the University of Illinois in the 1970s allowed each pixel to have any number of values and associated percentages

e.g. 30% a, 30% b, 40% c

D. MAP LAYERS

the data for an area can be visualized as a set of maps of layers

a map layer is a set of data describing a single characteristic for each location within a bounded geographic area

only one item of information is available for each location within a single layer - multiple items of information require multiple layers

on the other hand, a topographic map can show multiple items of information for each location, within limits

e.g. elevation (contours), counties (boundaries), roads, railroads, urbanized areas

these would be 5 layers in a raster GIS

typical raster databases contain up to a hundred layers

each layer (matrix, lattice, raster, array) typically contains hundreds or thousands of cells

important characteristics of a layer are its resolution, orientation and zone


Resolution in general, resolution can be defined as the minimum linear dimension of the smallest unit of geographic space for which data are recorded

in the raster model the smallest units are generally rectangular (occasionally systems have used hexagons or triangles)
these smallest units are known as cells or pixels
note: high resolution refers to rasters with small cell dimensions
high resolution means lots of detail, lots of cells, large rasters, small cells

Orientation

the angle between true north and the direction defined by the columns of the raster

Zones

each zone of a map layer is a set of contiguous locations that exhibit the same value
these might be:
- ownership parcels
- political units such as counties or nations
- lakes or islands
- individual patches of the same soil or vegetation type
there is considerable confusion over terms here
other terms commonly used for this concept are patch, region, polygon

each of these terms, however, has different meanings to individual users and different definitions in specific GIS packages

in addition, there is a need for a second term which refers to all individual zones that have the same characteristics
class is often used for this concept

note that not all map layers will have zones; cell contents may vary continuously over the region, making every cell's value unique
e.g. satellite sensors record a separate value for reflection from each cell
major components of a zone are its value and location(s)

Value

the item of information stored in a layer for each pixel or cell
cells in the same zone have the same value

Location

generally location is identified by an ordered pair of coordinates (row and column numbers) that unambiguously identify the location of each unit of geographic space in the raster (cell, pixel, grid cell)

usually the true geographic location of one or more of the corners of the raster is also known

E. EXAMPLE ANALYSIS USING A RASTER GIS

Objective

identify areas suitable for logging
an area is suitable if it satisfies the following criteria:
- is Jackpine (Black Spruce are not valuable)
- is well drained (poorly drained and waterlogged terrain cannot support equipment, and logging there causes unacceptable environmental damage)
- is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality)


Procedure

recode layer 2 as follows, creating layer 4
- y if value 2 (Jackpine)
- n if other value

recode layer 3 as follows, creating layer 5
- y if value 2 (good)
- n if other value

spread the lake on layer 1 by one cell (500 m), creating layer 6

recode the spread lake on layer 6 as follows, creating layer 7
- n if in spread lake
- y if not

overlay layers 4 and 5 to obtain layer 8, coding as follows
- y if both 4 and 5 are y
- n otherwise

overlay layers 7 and 8 to obtain layer 9, coding as follows
- y if both 7 and 8 are y
- n otherwise

Operations used

- recode
- overlay
- spread
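The recode, spread and overlay steps can be sketched with plain Python lists. The three input layers below are invented for illustration (the original layers appear only as figures), True/False stand in for y/n, and the spread is assumed to reach only the four edge-sharing neighbors of each lake cell.

```python
def recode(layer, wanted):
    """Recode a layer to True where the cell equals `wanted`, else False."""
    return [[cell == wanted for cell in row] for row in layer]

def spread(layer, target):
    """Mark cells that hold `target` or share an edge with such a cell."""
    rows, cols = len(layer), len(layer[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if layer[r][c] != target:
                continue
            for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    out[rr][cc] = True
    return out

def invert(layer):
    """Swap y and n."""
    return [[not cell for cell in row] for row in layer]

def overlay_and(a, b):
    """y only where both input layers are y."""
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical 4 x 4 layers; each cell is assumed to be 500 m on a side
layer1 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]  # 1 = lake
layer2 = [[2, 2, 1, 1], [2, 2, 2, 1], [2, 2, 2, 2], [1, 2, 2, 2]]  # 2 = Jackpine
layer3 = [[2, 2, 2, 1], [2, 2, 2, 2], [1, 2, 2, 2], [1, 1, 2, 2]]  # 2 = good drainage

layer4 = recode(layer2, 2)             # Jackpine?
layer5 = recode(layer3, 2)             # well drained?
layer6 = spread(layer1, 1)             # lake spread by one cell
layer7 = invert(layer6)                # y if not in spread lake
layer8 = overlay_and(layer4, layer5)
layer9 = overlay_and(layer7, layer8)   # suitable for logging
```

Each step produces a new layer and leaves its inputs untouched, which mirrors the numbering of layers 4 to 9 in the procedure.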


Raster GIS capability


Slopes and aspects

if the values in a layer are elevations, we can compute the steepness of slopes by looking at the difference between a pixel's value and those of its adjacent neighbors

the direction of steepest slope, or the direction in which the surface is locally "facing", is called its aspect

aspect can be measured in degrees from North or by compass points - N, NE, E

slope and aspect are useful in analyzing vegetation patterns, computing energy balances and modeling erosion or runoff

aspect determines the direction of runoff this can be used to sketch drainage paths for runoff
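One simple way to act on this description is to compare a cell with its eight neighbors and keep the steepest downhill drop. The sketch below does that; it is one of several possible slope estimators, and it assumes that row 0 is the northern edge of the raster and that `cell_size` is in the same units as the elevations. The function name and the sample elevations are made up for this example.

```python
import math

# (row offset, column offset, compass point); row 0 is assumed to be the northern edge
NEIGHBOURS = [
    (-1, 0, "N"), (-1, 1, "NE"), (0, 1, "E"), (1, 1, "SE"),
    (1, 0, "S"), (1, -1, "SW"), (0, -1, "W"), (-1, -1, "NW"),
]

def slope_and_aspect(elev, r, c, cell_size):
    """Return (steepest downhill gradient, compass point) for one cell.

    The gradient is rise over run; (0.0, None) means no neighbor is lower.
    """
    best_slope, best_dir = 0.0, None
    for dr, dc, name in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        if not (0 <= rr < len(elev) and 0 <= cc < len(elev[0])):
            continue
        distance = cell_size * math.hypot(dr, dc)  # diagonals are farther away
        slope = (elev[r][c] - elev[rr][cc]) / distance
        if slope > best_slope:
            best_slope, best_dir = slope, name
    return best_slope, best_dir

# Hypothetical surface that falls steadily toward the south
elevation = [[30, 30, 30], [20, 20, 20], [10, 10, 10]]
centre = slope_and_aspect(elevation, 1, 1, 10)
```

For the centre cell the steepest drop is 10 units over 10 units of distance toward the southern neighbor, so the surface faces south there.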

E. OPERATIONS ON EXTENDED NEIGHBORHOODS

Distance

calculate the distance of each cell from a cell or the nearest of several cells
each pixel's value in the new layer is its distance from the given cell(s)

Buffer zones

buffers around objects and features are very useful GIS capabilities
e.g. build a logging buffer 500 m wide around all lakes and watercourses
buffer operations can be visualized as spreading the object spatially by a given distance
the result could be a layer with values:
- 1 if in original selected object
- 2 if in buffer
- 0 if outside object and buffer
applications include noise buffers around roads, safety buffers around hazardous facilities


in many programs the buffer operation requires the user to first do a distance operation, then a reclassification of the distance layer

the rate of spreading may be modified by another layer representing "friction"
e.g. the friction layer could represent varying cost of travel
this will affect the width of the buffer: narrow in areas of high friction, etc.
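The two-step approach (distance first, then reclassification) might look like the sketch below. It measures straight-line distance between cell centres by brute force and ignores friction; the lake layer, the 500 m cell size and the function names are assumptions for illustration, and at least one target cell must be present.

```python
import math

def distance_layer(layer, target, cell_size):
    """Distance from each cell centre to the nearest cell holding `target`."""
    sources = [(r, c) for r, row in enumerate(layer)
               for c, value in enumerate(row) if value == target]
    return [[min(cell_size * math.hypot(r - sr, c - sc) for sr, sc in sources)
             for c in range(len(layer[0]))]
            for r in range(len(layer))]

def buffer_layer(distances, width):
    """Reclassify distances: 1 = in object, 2 = in buffer, 0 = outside both."""
    return [[1 if d == 0 else 2 if d <= width else 0 for d in row]
            for row in distances]

# Hypothetical layer: 1 marks a lake cell, cells are 500 m on a side
lakes = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
dist = distance_layer(lakes, 1, 500)
buf = buffer_layer(dist, 500)
```

With a 500 m buffer only the four edge-sharing neighbors of the lake cell fall inside; the diagonal neighbors are about 707 m away and stay outside.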

Visible area or "viewshed"

given a layer of elevations, and one or more viewpoints, compute the area visible from at least one viewpoint
e.g. value = 1 if visible, 0 if not

useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such as fire towers, or transmission facilities

F. OPERATIONS ON ZONES (GROUPS OF PIXELS)

Identifying zones

by comparing adjacent pixels, identify all patches or zones having the same value
give each such patch or zone a unique number
set each pixel's value to the number of its patch or zone

Areas of zones

measure the area of each zone and assign this value to each pixel instead of the zone's number

alternatively, output may be in the form of a summary table sent to the printer or a file
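Zone identification is essentially a flood fill. The sketch below numbers zones in scan order and then builds the summary table of areas; it assumes that "adjacent" means sharing an edge (diagonal contact does not join two cells), and the soil layer is invented for illustration.

```python
from collections import deque

def label_zones(layer):
    """Give every patch of edge-connected equal-valued cells a unique number."""
    rows, cols = len(layer), len(layer[0])
    zones = [[0] * cols for _ in range(rows)]  # 0 = not yet labelled
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if zones[r][c]:
                continue
            next_id += 1
            zones[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not zones[nr][nc]
                            and layer[nr][nc] == layer[r][c]):
                        zones[nr][nc] = next_id
                        queue.append((nr, nc))
    return zones

def zone_areas(zones, cell_area=1):
    """Summary table: zone number -> area."""
    areas = {}
    for row in zones:
        for zone in row:
            areas[zone] = areas.get(zone, 0) + cell_area
    return areas

# Hypothetical soil layer; the two patches of "A" are separate zones
soil = [["A", "A", "B"],
        ["A", "B", "B"],
        ["C", "C", "A"]]
zones = label_zones(soil)
areas = zone_areas(zones)
```

Replacing each zone number in `zones` with `areas[zone]` gives the "area assigned to each pixel" form of output described in the text.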


Perimeter of zones

measure the perimeter of each zone and assign this value to each pixel instead of the zone's number

alternatively, output may be in the form of a summary table sent to the printer or a file

length of perimeter is determined by summing the number of exterior cell edges in each zone

note: the values calculated in both area and perimeter are highly dependent upon the orientation of objects (zones) with respect to the orientation of the grid
however, if boundaries in the study area do not have a dominant orientation, such errors may cancel out

Distance from zone boundary

measure the distance from each pixel to the nearest part of its zone boundary, and assign this value to the pixel

the boundary is defined as the pixels which are adjacent to pixels of different values

Shape of zone

measure the shape of the zone and assign this to each pixel in the zone
one of the most common ways to measure shape is by comparing the perimeter length of a zone to the square root of its area
by dividing this ratio by 3.54 we get a measure which ranges from 1 for a circle (the most compact shape possible) to 1.13 for a square to large numbers for long, thin, wiggly zones
commands like this are important in landscape ecology
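The constant 3.54 is 2 times the square root of pi (about 3.545), which is the perimeter of a circle divided by the square root of its area. A sketch of the perimeter count and the shape measure follows; the zone layer and function names are made up for this example.

```python
import math

def zone_perimeter(zones, zone_id):
    """Count exterior cell edges: edges facing another zone or the raster edge."""
    rows, cols = len(zones), len(zones[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if zones[r][c] != zone_id:
                continue
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < rows and 0 <= nc < cols
                if not inside or zones[nr][nc] != zone_id:
                    edges += 1
    return edges

def zone_area(zones, zone_id):
    """Number of cells in the zone."""
    return sum(row.count(zone_id) for row in zones)

def shape_index(perimeter, area):
    """Perimeter over square root of area, scaled so that a circle scores 1."""
    return perimeter / (2 * math.sqrt(math.pi) * math.sqrt(area))

# Hypothetical zone layer: zone 1 is a 2 x 2 square, zone 2 a 1 x 4 strip
zones = [[1, 1, 2, 2, 2, 2],
         [1, 1, 3, 3, 3, 3]]
square = shape_index(zone_perimeter(zones, 1), zone_area(zones, 1))
strip = shape_index(zone_perimeter(zones, 2), zone_area(zones, 2))
```

The square zone scores about 1.13, as stated in the text, and the strip scores higher because it is less compact.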

such commands are helpful in studying the effects of geometry and spatial arrangement of habitat

e.g. size and shape of woodlots on the animal species they can sustain

e.g. value of linear park corridors across urban areas in allowing migration of animal species

G. COMMANDS TO DESCRIBE CONTENTS OF LAYERS

it is important to have ways of describing a layer's contents
- particularly new layers created by GIS operations
- particularly in generating results of analysis

One layer

generate statistics on a layer
e.g. mean, median, most common value, other statistics

More than one layer

compare two maps statistically
e.g. is pattern on one map related to pattern on the other?
e.g. chi-square test, regression, analysis of variance

Zones on one layer

generate statistics for the zones on a layer
e.g. largest, smallest, number, mean area


H. ESSENTIAL HOUSEKEEPING

- list available layers
- input, copy, rename layers
- import and export layers to and from other systems:
  - other raster GIS
  - input of images from remote sensing system
  - other types of GIS
- identify resolution, orientation
- "resample": changing cell size, orientation, portion of raster to analyze
- change colors
- provide help to the user
- exit from the GIS (the most important command of all!)


INTRODUCTION

Why use raster?

• data are acquired in that form: remote sensing, photogrammetry or scanning
• raster is a common way of structuring digital elevation data
• raster assumes no prior knowledge of the phenomenon; sampling is done uniformly
- knowledge of variability would allow us to sample more heavily in areas of high variability (rugged terrain) and less heavily in smooth terrain
• data are often converted to raster as a common format for data interchange
- for merging with remote sensing images or DEMs
• raster algorithms are often simpler and faster
- e.g. buffer zone generation is simpler in raster
• raster may be appropriate if the solution requires uniform resolution, e.g. in finding optimum routes for linear features such as power lines, or in inferring the locations of stream networks from DEMs

Objectives

there are many options for storing raster data (many data structures)
• some are more economical than others in use of storage
• some are more efficient in access and processing speed


B. STORAGE OPTIONS FOR RASTER DATA

by convention, raster data is normally stored row by row from the top left
this is the European/North American reading order and is also the order of scan of a TV image

example: the image

A A A A
A B B B
A A B B
A A A B

would be stored in 16 memory positions, one for each pixel, in the sequence: A A A A A B B B A A B B A A A B
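Row-by-row storage means a pixel's memory position follows directly from its row and column. The sketch below counts rows, columns and positions from 0 starting at the top left, which is an assumption made for the example rather than something the text specifies.

```python
# The 4 x 4 example image, top row first
image = [list("AAAA"), list("ABBB"), list("AABB"), list("AAAB")]

def to_row_order(grid):
    """Flatten a raster row by row from the top left."""
    return [cell for row in grid for cell in row]

def position(row, col, n_cols):
    """Memory position of a pixel (all counts start at 0)."""
    return row * n_cols + col

def row_col(pos, n_cols):
    """Inverse of position(): recover (row, col) from a memory position."""
    return divmod(pos, n_cols)

stored = to_row_order(image)
value = stored[position(1, 1, 4)]  # pixel in the second row, second column
```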

What if there is more than one layer?

two options:
1. store the layers separately
- this is the normal practice
2. store all information for each pixel together
- this requires extra space to be allocated initially within each pixel's storage location for layers which might be created later during analysis
- this is usually difficult to anticipate

What do raster systems store in each pixel?

some allow only an integer, in a fixed range, e.g. -127 to +127 (1 byte per pixel) or -32767 to +32767 (2 bytes per pixel)
some allow integers, real (decimal) numbers and mixed alphabetic letters and numbers in each pixel

in this case it helps if the system keeps track of what type of data is stored in each layer and stops the user doing wrong types of analysis on the data


Example:
vegetation data is recorded as a class (A through G) in each pixel
elevation data is recorded as a decimal number (e.g. 100.3 m)
the system should not allow the user to add the pixel values from the two layers (A + 100.3) or perform any other kind of arithmetic operation on the vegetation data

Raster/Vector combinations

• many raster-based systems allow vector input

Example:
• a polygon, defined by its vertices, is input
• convert this to a raster
- e.g. assign 1 to all pixels inside the polygon, 0 to all outside

• some forms of data are really hybrids of raster and vector:
- Freeman chain code has finite resolution based on pixels (raster-like) but defines lines and the boundaries of objects (vector-like)
• a raster can be used to define objects at fixed resolution if every pixel is given an object number instead of a value
• the object numbers are pointers to an attribute table:

Raster           Object   Attributes
23 23 23 24      23       A   100.0
23 23 24 24      24       B   101.1
23 23 24 24
23 23 23 24

• this gives us an object with its attributes, plus a list of pixels associated with the object instead of the object's coordinates
• in this sense, a raster is a finite resolution geometry rather than an alternative way of structuring spatial data


C. RUN ENCODING geographical data tends to be "spatially auto-correlated", meaning that objects which are close to each other tend to have similar attributes

Tobler expressed it this way: "All things are related, but nearby things are more related than distant things"

because of this principle, we expect neighboring pixels to have similar values
so instead of repeating pixel values, we can code the raster as pairs of numbers: (run length, value)

e.g. instead of 16 pixel values in the original raster matrix, we have: 4A 1A 3B 2A 2B 3A 1B
this produces 7 integer/value pairs to be stored

if a run is not required to break at the end of each line we can compress this further: 5A 3B 2A 2B 3A 1B = 6 pairs
however, it helps to limit the possible size of the run so that we can use less space to store the run length, as the amount of space allocated must be sufficient for the maximum run length

Problems

layers now have different lengths depending on the amount of compression (lengths of runs)
storing all layers together for each pixel now makes no sense
run encoding would be of little use for DEM data or any other type of data where neighboring pixels almost always have different values
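The two pair counts for the 4 x 4 example can be checked with a short sketch. The helper name `runs` is made up for this example; it returns (run length, value) pairs.

```python
from itertools import groupby

def runs(sequence):
    """Collapse a sequence into (run length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(sequence)]

# The 4 x 4 example raster, top row first
image = ["AAAA", "ABBB", "AABB", "AAAB"]

# Runs forced to break at the end of each row
by_row = [pair for row in image for pair in runs(row)]

# Runs allowed to continue from one row into the next
continuous = runs("".join(image))
```

`by_row` has 7 pairs and `continuous` has 6, matching the counts in the text.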


D. SCAN ORDER

1. Row order

described already
are there better ways of ordering the raster than row by row from the top left?
other orders may produce greater compression

2. Row prime order (Boustrophedon)

suppose we reverse every other row
this has the charming name boustrophedon, from the Greek for "how an ox plows a field"
avoids a long jump at the end of each row, so perhaps the raster would produce fewer runs and thus greater compression
this order is used in the Public Land Survey System: the sections in each township are numbered in this way
on the original raster it results in: 4A 3B 3A 3B 3A = 5 runs

3. Morton order

Morton order is the basis of many efforts to reduce database volume
named for Guy Morton, who devised it as a way of ordering data in the Canada Geographic Information System
however, this way of ordering or scanning a raster was well known long before Morton
it is associated with the names of several mathematicians and geometers: Hilbert, Peano, and Koch
coincidentally, Morton is the name of the lower left corner county in Kansas

the strategy is to exhaust each area of the map in sequence, whereas row by row order scans from one side to the other

this minimizes the number of large jumps

this is one of several hierarchical ordering systems

it is built up level by level, repeating the same pattern at each level, as follows:

2x2:
 2  3
 0  1

4x4:
10 11 14 15
 8  9 12 13
 2  3  6  7
 0  1  4  5

8x8:
42 43 46 47 58 59 62 63
40 41 44 45 56 57 60 61
34 35 38 39 50 51 54 55
32 33 36 37 48 49 52 53
10 11 14 15 26 27 30 31
 8  9 12 13 24 25 28 29
 2  3  6  7 18 19 22 23
 0  1  4  5 16 17 20 21

it is only valid for square arrays where the numbers of rows and columns are powers of 2
e.g. 2x2, 4x4, 8x8, 16x16, 32x32, 64x64, etc.

how does it do on our 4x4 array?
5A 3B 1A 1B 2A 2B 2A = 7 runs
which is as long as row by row compression

4. Peano scan (also Pi-order or Hilbert)

the Peano scan or Pi-order is like boustrophedon in always moving to a neighboring pixel
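The run counts quoted for the boustrophedon and Morton orders can be reproduced with the sketch below. It follows the numbering grids above, so Morton row 0 is the bottom row of the image while the image itself is listed top row first; the function names are made up for this example.

```python
from itertools import groupby

def runs(sequence):
    """Collapse a sequence into (run length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(sequence)]

# The 4 x 4 example raster, top row first
image = ["AAAA", "ABBB", "AABB", "AAAB"]

def boustrophedon(grid):
    """Row order with every other row reversed."""
    out = []
    for i, row in enumerate(grid):
        out.extend(row if i % 2 == 0 else reversed(row))
    return out

def morton_position(row, col, bits):
    """Interleave row and column bits; the row bit is the more significant of each pair."""
    pos = 0
    for b in range(bits):
        pos |= ((col >> b) & 1) << (2 * b)
        pos |= ((row >> b) & 1) << (2 * b + 1)
    return pos

def morton(grid):
    """Reorder a square raster (side a power of 2) into Morton order."""
    n = len(grid)
    bits = n.bit_length() - 1
    out = [None] * (n * n)
    for from_top, row in enumerate(grid):
        r = n - 1 - from_top  # Morton rows are numbered upward from the bottom
        for c, value in enumerate(row):
            out[morton_position(r, c, bits)] = value
    return out

boustrophedon_runs = runs(boustrophedon(image))
morton_runs = runs(morton(image))
```

On this image boustrophedon order gives 5 runs and Morton order gives 7, as stated in the text.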


E. DECODING SCAN ORDERS

since Morton and Peano orders are useful but complex, two types of questions arise when they are used:
1. What are the row and column numbers for a given pixel?
2. What is the position in the scan order for a given row and column number?

Method

start by numbering the rows and columns from 0 up:

row 3 | 10 11 14 15
row 2 |  8  9 12 13
row 1 |  2  3  6  7
row 0 |  0  1  4  5
col   |  0  1  2  3

e.g. row 2, column 3 is position 13 in the Morton sequence

1. How to go from row 2, column 3 to the Morton sequence?
a. convert row and column numbers to binary representations:
   row 2 = 1 0
   column 3 = 1 1
b. interleave the bits, alternating row and column bits (called bit interleaving):
   1 1 0 1 (row, col, row, col)
c. evaluate this sequence of bits as a binary number: 8 + 4 + 1 = 13
so to get the Morton position, interleave the bits of the row and column numbers

2. How to find the row and column number from Morton position 9?
a. convert the position number to a binary number: 1 0 0 1 (8 + 1 = 9)
b. separate the bits, which alternate row, col, row, col:
   row bits 1 0, so row = 2
   col bits 0 1, so col = 1

Generalization

can express the row and column numbers in any base, not just base 2 (binary), and including mixtures of bases
example: row 6, column 15, using base 4 instead of base 2
   row 6 = 1 2 (1x4 + 2x1)
   col 15 = 3 3 (3x4 + 3x1)
   interleaving: 1 3 2 3 = 1x64 + 3x16 + 2x4 + 3x1 = 123
answer: row 6, column 15 is position 123
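Both conversions can be written as digit interleaving in an arbitrary base. The sketch below is one possible formulation; the function names `interleave` and `deinterleave` are made up for this example, and mixtures of bases are not handled.

```python
def interleave(row, col, base=2):
    """Interleave the digits of row and col; the row digit leads each pair."""
    position, weight = 0, 1
    while row or col:
        position += (col % base) * weight   # column digit: lower place of the pair
        weight *= base
        position += (row % base) * weight   # row digit: higher place of the pair
        weight *= base
        row //= base
        col //= base
    return position

def deinterleave(position, base=2):
    """Split a position back into (row, col)."""
    row = col = 0
    weight = 1
    while position:
        col += (position % base) * weight
        position //= base
        row += (position % base) * weight
        position //= base
        weight *= base
    return row, col

morton_13 = interleave(2, 3)             # row 2, column 3
row_col_9 = deinterleave(9)              # Morton position 9
base4_123 = interleave(6, 15, base=4)    # the base 4 example
```

The three calls reproduce the worked examples: position 13, then row 2 and column 1, then position 123.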


HIERARCHICAL DATA STRUCTURES

A. INTRODUCTION

different scan orders produce only small differences in compression

the major reason for interest in Morton and other hierarchical scan orders is for faster data access

the amount of information shown on a map varies enormously from area to area, depending on the local variability

it would make sense then to use rasters of different sizes depending on the density of information

large cells in smooth or unvarying areas, small cells in rugged or rapidly varying areas

unfortunately unequal-sized squares won't fit together ("tile the plane") except under unusual circumstances
one such circumstance is when small squares nest within large ones

there are, however, some methods for compressing raster data that do allow for varying information densities

B. INDEXING PIXELS

consider the 16 by 16 array in which just one cell is different
notation: row and column numbering starts at 0
thus the odd cell is at row 4, column 7


Procedure

begin by dividing the array into four 8x8 quadrants, and numbering them 0, 1, 2 and 3 as in the Morton order
quads 1, 2 and 3 are homogeneous (all A)
quad 0 is not homogeneous, so we divide only it into four 4x4 quads
these are numbered 00, 01, 02 and 03 because they are partitions of the 8x8 quad 0
of these, 00, 01 and 02 are homogeneous, but 03 is divided again into 030, 031, 032 and 033
now only 031 is not homogeneous, so it is divided again into 0310, 0311, 0312 and 0313
what we have done is to recursively subdivide using a rule of 4 until either:
- a square is homogeneous, or
- we reach the highest level of resolution (the pixel size)
this allows for discretely adaptable resolution where each resolution step is fixed

this concept is related to the use of Morton order for run encoding
if we had coded the raster using Morton order, each homogeneous square would have been a run
8x8 squares are runs of 64 in Morton order, 4x4 are runs of 16, etc.
the run encoded Morton order would have been: 16A 16A 16A 4A 1A 1B 1A 1A 4A 4A 64A 64A 64A
if we allow runs to continue between blocks we could reduce this to: 53A 1B 202A
i.e. a homogeneous block of 2^m by 2^m pixels is equivalent to a Morton run of 2^(2m) pixels
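The recursive rule of 4 can be sketched as a function that returns the homogeneous blocks with their base 4 codes. The quadrant digit is taken as 2 times the row half plus the column half, which matches the Morton numbering used in the text; the function name and the tuple layout are choices made for this example.

```python
def build_leaves(grid, row0=0, col0=0, size=None, code=""):
    """Return (code, value, block side) for each homogeneous block, in Morton order."""
    if size is None:
        size = len(grid)
    first = grid[row0][col0]
    if all(grid[r][c] == first
           for r in range(row0, row0 + size)
           for c in range(col0, col0 + size)):
        return [(code, first, size)]
    half = size // 2
    leaves = []
    for digit in range(4):
        d_row, d_col = divmod(digit, 2)  # digit = 2 * row half + column half
        leaves += build_leaves(grid, row0 + d_row * half, col0 + d_col * half,
                               half, code + str(digit))
    return leaves

# The 16 x 16 array of the text: all A except row 4, column 7
grid = [["A"] * 16 for _ in range(16)]
grid[4][7] = "B"
leaves = build_leaves(grid)
run_lengths = [side * side for _, _, side in leaves]
```

Listing the blocks in this order also gives the Morton run lengths: each block of side s contributes a run of s times s pixels.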


Decoding locations

the conversion to row and column is the same as for decoding Morton numbers except that in this case the code is in base 4
in the example the lone B pixel is assigned code 0311

1. convert the code to base 2
hint: every base 4 digit converts to a pair of base 2 digits
thus 0311 becomes 00110101

2. separate the bits to get:
row 0100 = 4
column 0111 = 7
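Because each base 4 digit holds one row bit and one column bit, the decoding can be done digit by digit. The sketch below shows both directions; the function names are made up for this example.

```python
def code_to_row_col(code):
    """Decode a base 4 block code: each digit is (row bit, column bit)."""
    row = col = 0
    for digit in code:
        d = int(digit)
        row = (row << 1) | (d >> 1)  # high bit of the digit belongs to the row
        col = (col << 1) | (d & 1)   # low bit of the digit belongs to the column
    return row, col

def row_col_to_code(row, col, levels):
    """Encode (row, col) as a base 4 code with one digit per level."""
    digits = []
    for level in reversed(range(levels)):
        digits.append(str(2 * ((row >> level) & 1) + ((col >> level) & 1)))
    return "".join(digits)

lone_b = code_to_row_col("0311")
code = row_col_to_code(4, 7, 4)
```

For a code shorter than the full depth, `code_to_row_col` returns the row and column of the block counted in blocks of that size rather than in pixels.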

so the numbering system is just the Morton numbering of blocks, expressed in base 4
however, sequence and data compression are not the most useful aspects of this concept

C. THE QUADTREE

can express this sequencing as a tree
the top is the entire array
at each level there is a four-way branching
each branch terminates at a homogeneous block
the term quadtree is used because it is based on a rule of 4
each of the terminal branches in the tree (the ones having values) is known as a leaf
in this case there are 13 leafs or homogeneous square blocks


Coding quadtrees

to store this tree in memory, need to decide what to store in each memory location
there are many ways of storing quadtrees, but they all share the same basic ideas
one way is to store in each memory location EITHER:
1. the value of the block (e.g. A or B), or
2. a pointer to the first of the four "daughter" blocks at the next level down
all four daughter blocks of any parent always occur together

thus, the quadtree might be stored in memory as:

Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Contents:  2  6  A  A  A  A  A  A 10  A 14  A  A  A  B  A  A
(level:)   0  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4

the content of position 1 is a pointer indicating that the map is subdivided into four blocks whose contents can be found starting at position 2
position 2 indicates that the four parts of the 0 block can be found beginning at position 6
positions 3, 4 and 5 indicate that the other three level 1 blocks are all A and are not further subdivided

Accessing data through a quadtree

consider two ways in which this quadtree may be accessed:
1. find all parts of the map with a given value
2. determine the contents of a given pixel
notation:
- if the array has 2^n by 2^n pixels there are n possible levels in the tree, or n+1 if we count the top level (level 0)
- use m for the number of leafs


1. to find the parts of a map with a given value we must examine every leaf to see if its value matches the one required
this requires m steps as there are m leafs

2. to find the contents of a given pixel, start at the top of the tree
if the entire map is homogeneous, stop, as the contents of the pixel are known already
if not, follow the branch containing the pixel
to know which branch to follow: take the row and column numbers, write them in binary, interleave the bits, and convert to base 4
e.g. row 4, column 7 converts to 0311
at each level, use the appropriate digit to determine which branch to follow
e.g. for 0311, at level 0 follow branch 0, at level 1 follow branch 3, etc.
in the worst case, may have to go to level n to find the contents of the pixel, so the number of steps will be n
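Both kinds of access can be tried on the 17-position memory layout given earlier for the 16 by 16 example. In this sketch pointers are stored as integers and block values as strings so the two can be told apart, and index 0 is left unused so that positions match the 1-based numbering; these representation choices are assumptions made for the example.

```python
# Index 0 is unused so that list indexes match the 1-based positions in the text
MEMORY = [None, 2, 6, "A", "A", "A", "A", "A", "A", 10,
          "A", 14, "A", "A", "A", "B", "A", "A"]

def pixel_value(memory, row, col, levels):
    """Follow pointers from the top of the tree; return (value, branches followed)."""
    position, steps = 1, 0
    for level in reversed(range(levels)):
        entry = memory[position]
        if not isinstance(entry, int):     # a value, not a pointer: leaf reached
            return entry, steps
        digit = 2 * ((row >> level) & 1) + ((col >> level) & 1)
        position = entry + digit           # the four daughters are stored together
        steps += 1
    return memory[position], steps

def leaf_values(memory):
    """Every leaf must be examined to find all blocks with a given value."""
    return [entry for entry in memory[1:] if not isinstance(entry, int)]

odd_pixel = pixel_value(MEMORY, 4, 7, 4)       # the lone B pixel
far_pixel = pixel_value(MEMORY, 12, 3, 4)      # a pixel inside a large A block
m = len(leaf_values(MEMORY))
```

The lone B pixel needs the worst case of n = 4 branches, while a pixel in one of the 8x8 blocks is resolved after a single branch.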


QUADTREE ALGORITHMS AND SPATIAL INDEXES

A. INTRODUCTION

this unit examines how quadtrees are used in several simple processes, including:

- measurement of area
- overlay
- finding adjacent leafs
- measuring the area of contiguous patches

in addition, this unit will look at how quadtrees can be used to provide indexes for faster access to vector-coded objects

finally, alternative forms of spatial indexing will be reviewed

Definition

to traverse a quadtree:
- begin by moving down the leftmost branch to the first leaf
- after processing each leaf in this branch, move back up to the previous branching point, and turn right
- this will either lead down to another leaf, or back to a previous branching point

several of the following examples use this simple raster and its associated quadtree


B. AREA ALGORITHM

Procedure

to measure the area of A on the map:
traverse the tree and add those leafs coded A, weighted by the area at the level of the leaf

Example

in the example quadtree, elements at level 0 have area 16, at level 1 area 4, at level 2 area 1
thus, area of A is: 1 (leaf 00) + 1 (leaf 02) + 1 (leaf 03) + 4 (leaf 2) + 1 (leaf 32) = 8 units
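The weighted sum can be written as a short recursive traversal. The example tree below is reconstructed from the leaf codes mentioned in the text; the values of leafs 01 and 1 are not stated, so they are assumed here to be B. A leaf is represented as a string and a branching node as a list of four children in the order 0, 1, 2, 3.

```python
# Example quadtree: leaf = value string, node = list of four children
tree = [
    ["A", "B", "A", "A"],   # node 0: leafs 00, 01, 02, 03 (01 assumed to be B)
    "B",                    # leaf 1 (assumed to be B)
    "A",                    # leaf 2
    ["B", "B", "A", "B"],   # node 3: leafs 30, 31, 32, 33
]

def area_of(node, value, node_area):
    """Sum the areas of all leafs holding `value` under this node."""
    if isinstance(node, str):
        return node_area if node == value else 0
    # each child covers a quarter of its parent
    return sum(area_of(child, value, node_area // 4) for child in node)

area_a = area_of(tree, "A", 16)
```

The result for A does not depend on the assumed values, since only leafs coded A contribute to the sum.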

C. OVERLAY ALGORITHM

Procedure

to overlay the two maps:
- traverse the trees simultaneously, following all branches which exist in either tree
- where one tree lacks branches (has a leaf where the other tree has branches), assign the value of the associated leaf to each of the branches
e.g. node 3 is branched on map 1, not on map 2
the leafs derived from this node (30, 31, 32 and 33) have values B, B, A and B on map 1, all 2 on map 2
the new tree has the attributes of both of the maps, e.g. A1, B2
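A simultaneous traversal might be sketched as follows. Trees use the same representation as before (string leaf, list of four children), and combined attributes are formed by joining the two values. Map 1 repeats the earlier assumptions for leafs 01 and 1; map 2 is hypothetical apart from node 3 being an unbranched leaf with value 2.

```python
def overlay(a, b):
    """Overlay two quadtrees, branching wherever either input branches."""
    if isinstance(a, str) and isinstance(b, str):
        return a + b                      # combined attribute, e.g. "A1"
    # a leaf facing a branch passes its value down to all four daughters
    a_children = a if isinstance(a, list) else [a] * 4
    b_children = b if isinstance(b, list) else [b] * 4
    return [overlay(x, y) for x, y in zip(a_children, b_children)]

map1 = [["A", "B", "A", "A"], "B", "A", ["B", "B", "A", "B"]]
map2 = ["1", "1", ["1", "2", "1", "2"], "2"]   # hypothetical except node 3

combined = overlay(map1, map2)
```

This sketch never merges four identical daughters back into a single leaf; a fuller implementation would probably do so to keep the result compact.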


D. ADJACENCY ALGORITHM

Problem

find if two leafs (e.g. 03 and 2) are adjacent
Corollary: find the leafs adjacent to a given leaf (e.g. 03)
note that in arc based systems adjacencies are coded in the data structure (R and L polygons), so this operation is simpler with vector based systems

Definition

here adjacent means sharing a common edge, not just a common point

Two cases

leaf codes are:
1. same length (same size blocks, e.g. 01 and 02), or
2. one is longer than the other (different size blocks, e.g. 03 and 2)

solving this problem requires the use of:
1. conversion from base 4 to binary and back (base 4 because of the "rule of 4" used in constructing quadtrees)
2. bit interleaving
3. a new concept called Tesseral Arithmetic

Tesseral Arithmetic

tesseral arithmetic is an alternate arithmetic useful for working with the peculiarities of quadtree addressing
to add binary numbers normally, a "carry" works to the position to the left
e.g. adding 1 to 0001 gives 0010
this is the same as decimal arithmetic except that carries occur when the total reaches 2 instead of 10


in tesseral arithmetic, a "carry" works two positions to the left e.g. adding 1 to 0001 gives 0100

the reverse happens on subtraction: 1000 less 10 is 0010, not 0110, as the subtraction affects only the alternate bits

in other words, if we number the bits from the left starting at 1:
- adding or subtracting 1 affects only the even-numbered bits
- adding or subtracting 2 (binary 10) affects only the odd-numbered bits

Determining adjacency

1. same size blocks: two leafs are adjacent if their binary representations differ by binary 1 or 10 (decimal 1 or 2) in tesseral arithmetic

example: 01 and 03 are adjacent because 0001 and 0011 differ by binary 10, or decimal 2
example: 033 and 211 are adjacent because in tesseral arithmetic 001111 + 10 = 100101, or 100101 - 10 = 001111

2. different size blocks: taking the longer of the two codes:
- convert it from base 4 to binary
- tesseral-add and -subtract 01 and 10 to create four new codes
- reject any cases where subtracting was not possible (a "negative" code would have resulted, or a "carry" would have been necessary to the left of the leftmost digit)
- discard the excess rightmost digits in the resulting transformed longer codes
- convert back to base 4 to get the leaf

Page 39: Coursefinal

39

the two blocks are adjacent if any of the transformed and truncated codes are equal to the shorter code

example: Are 02 and 2 adjacent?
convert 02 to binary = 0010
0010 + 1 = 0011
0010 + 10 = 1000
0010 - 1 (impossible)
0010 - 10 = 0000

truncating gives 00 and 10; these are equal to 0 and 2 in base 4

therefore, 02 and 2 are adjacent (also 02 and 0 are adjacent)

example: Are 033 and 2 adjacent?
convert 033 to binary = 001111
001111 + 1 = 011010
001111 + 10 = 100101
001111 - 1 = 001110
001111 - 10 = 001101

truncating to two digits gives 01, 10 and 00; these are equal to 1, 2 and 0 in base 4

therefore, 033 and 2 are adjacent

example: Find leafs adjacent to 03 in the first map above
method: find the codes of adjacent blocks of the same size, then work down the tree to find the appropriate leaf

(note: can only find equal or shorter codes - equal or bigger leaf blocks)
0011 + 1 = 0110 = 12 : leaf 1
0011 + 10 = 1001 = 21 : leaf 2
0011 - 1 = 0010 = 02 : leaf 02
0011 - 10 = 0001 = 01 : leaf 01
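The adjacency test for both cases can be sketched as follows. This is an illustrative Python version, not a definitive implementation: the function names are hypothetical, and it follows the truncate-and-compare rule literally, so a block is also reported as "adjacent" to a block that contains it (as with 02 and 0 in the example).

```python
def to_bits(code):
    """Base-4 code -> binary string, two bits per digit."""
    return "".join(format(int(d), "02b") for d in code)


def to_code(bits):
    """Binary string -> base-4 code."""
    return "".join(str(int(bits[i:i + 2], 2)) for i in range(0, len(bits), 2))


def tesseral_neighbours(code):
    """Codes of the same-size blocks sharing an edge with `code` (at most 4)."""
    bits, width = to_bits(code), len(code)
    result = []
    for start in (1, 0):        # 1: even-numbered bits (+/- 1), 0: odd-numbered bits (+/- 10)
        for delta in (+1, -1):
            value = int(bits[start::2], 2) + delta
            if 0 <= value < 2 ** width:     # reject negatives and carries off the left
                out = list(bits)
                out[start::2] = format(value, f"0{width}b")
                result.append(to_code("".join(out)))
    return result


def adjacent(code_a, code_b):
    """True if any truncated neighbour of the longer code equals the shorter code."""
    longer, shorter = sorted((code_a, code_b), key=len, reverse=True)
    return any(n[:len(shorter)] == shorter for n in tesseral_neighbours(longer))


print(tesseral_neighbours("03"))   # ['12', '02', '21', '01']
print(adjacent("033", "2"))        # True
print(adjacent("00", "2"))         # False
```

Finding the actual leaf for a neighbour code such as 12 still requires working down the tree, since the block may have been merged into a larger leaf (here leaf 1).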

Page 40: Coursefinal

40

Length of common boundary
the length of common boundary between the two blocks is determined by the level of the longer code

can use this to construct an algorithm to determine the perimeter of a patch, e.g. the length of the A/B boundary in the first example map

(diagram)
E. AREA OF A CONTIGUOUS PATCH ALGORITHM
Problem
• find the area of a contiguous patch of the same value, e.g. all A
• Corollary: How many separate patches of A are there?
• note: this is a general method which can be used in both quadtree and vector data structures

i.e. find contiguous sets of quadtree blocks or irregularly shaped polygons, given that adjacencies are known or can be determined

the following example uses the original raster map
note that there are only two contiguous patches; the areas of A and B form only one patch each

Method
Area of a contiguous patch
• create a list of leafs, with their associated codes, by traversing the tree
• allow space for a "pointer" for each leaf, and give it an initial value of 0

Page 41: Coursefinal

41

Algorithm
for each leaf i:
• find all adjacent leafs j with equal or shorter length codes (4 maximum)
• if the adjacent leaf j has the same value, determine which of i and j has the higher (larger value) position in the list, and set its pointer to the lower position (note: if a pointer has already been changed, it may be changed again or left, the result is the same)

this produces the final pointer list

Results
1. the number of contiguous patches will be equal to the number of zeros

in the example, two pointers are zero, indicating two contiguous patches

2. the value of each patch can be obtained by looking up the values of leafs with 0 pointers

in the example, leafs 00 and 01 have 0 pointers these have the values A and B respectively

3. to find the area of each patch, select one of the zeros and sum its area plus the areas of any leafs which point to it directly or indirectly

the component leafs of each patch can be found by starting with a leaf at the end (or beginning) of the list and following the pointers until a 0 is found
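The pointer method can be sketched in Python as below. Two deliberate substitutions are made and should be read as assumptions: adjacency is tested geometrically from the block positions rather than with tesseral arithmetic, and pointers are merged at the roots (a small union-find) instead of being overwritten in place, so a root points to itself rather than holding 0. The names `block`, `share_edge` and `patches` are hypothetical.

```python
def block(code, depth=2):
    """Leaf code -> (x0, y0, size) in unit cells; low bit of a digit is x, high bit is y."""
    x = y = 0
    for d in map(int, code):
        x, y = 2 * x + (d & 1), 2 * y + (d >> 1)
    size = 2 ** (depth - len(code))
    return x * size, y * size, size


def share_edge(a, b):
    """True if two square blocks share an edge of non-zero length."""
    ax, ay, asz = a
    bx, by, bsz = b
    touch_x = ax + asz == bx or bx + bsz == ax
    touch_y = ay + asz == by or by + bsz == ay
    overlap_x = min(ax + asz, bx + bsz) > max(ax, bx)
    overlap_y = min(ay + asz, by + bsz) > max(ay, by)
    return (touch_x and overlap_y) or (touch_y and overlap_x)


def patches(leafs, depth=2):
    """leafs: list of (code, value) in traversal order -> {code of root leaf: area}."""
    pointer = list(range(len(leafs)))       # every leaf starts as its own root

    def root(i):
        while pointer[i] != i:
            i = pointer[i]
        return i

    blocks = [block(code, depth) for code, _ in leafs]
    for i in range(len(leafs)):
        for j in range(i):
            if leafs[i][1] == leafs[j][1] and share_edge(blocks[i], blocks[j]):
                ri, rj = root(i), root(j)
                pointer[max(ri, rj)] = min(ri, rj)   # higher position points to lower

    areas = {}
    for i, (code, _) in enumerate(leafs):
        key = leafs[root(i)][0]
        areas[key] = areas.get(key, 0) + blocks[i][2] ** 2
    return areas


leafs = [("00", "A"), ("01", "B"), ("02", "A"), ("03", "A"), ("1", "B"),
         ("2", "A"), ("30", "B"), ("31", "B"), ("32", "A"), ("33", "B")]
print(patches(leafs))   # {'00': 8, '01': 8}
```

The number of keys in the result is the number of contiguous patches, and each key is the code of the leaf that would hold a 0 pointer in the method described above.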

Page 42: Coursefinal

42

e.g. leaf at position 10 (code 33) points to 8, which points to 7, which points to 5, which points to 2, which has a zero pointer
therefore, leaf position 10 (code 33) is part of the same patch as leaf 2 (code 01) and has the value B

the areas can be found by summing the leaf areas; for the example:

A leafs: 00 02 03 2 32
A positions: 1 3 4 6 9
Area of A: 1 + 1 + 1 + 4 + 1 = 8

B leafs: 01 1 30 31 33
B positions: 2 5 7 8 10
Area of B: 1 + 4 + 1 + 1 + 1 = 8

F. QUADTREE INDEXES
Indexing using quadtrees
indexes are used in vector systems to get fast access to the objects in a particular area of a map

they are very useful in searching for potentially overlapping or intersecting objects; therefore, they are an essential part of a polygon overlay operation

we have already looked at the usefulness of a simple sort of objects on one axis (e.g. x) in the moving band operation for intersection calculations

we will now look at methods which can be thought of as sorting on both axes simultaneously

these use 2D coding systems and a simple one-dimensional sort

Setting up the index
steps are:
1. for each object (point, line, area) in the database, find the smallest quadtree leaf which encloses the object

Page 43: Coursefinal

43

• some large objects will have to be classified as NULL, as they span more than one of the four leafs in the first branching (0, 1, 2 and 3)
• other smaller objects may be enclosed within a small leaf, e.g. 031

2. sort or index the objects by the enclosing quadtree leafs

Using the index
to find all objects which might intersect an area, line or point of interest:
• find the quadtree leaf enclosing the object of interest
• starting at this point, follow up the quadtree through all branching points that contain the original cell, and down the quadtree to all branching points and leafs below the cell

example: the area of interest is enclosed in leaf 31 of the original example quadtree
• the objects which may intersect the area of interest are those in leaf 31 and all leafs above it
• thus, these are 3 and the null leaf
• objects in other (remote) leafs cannot intersect the area of interest, so need not be checked

example: the area of interest is enclosed in leaf 0
• the objects which may intersect the area are in leaf 0, the null leaf and all leafs below 0 - 00, 01, 02, 03
• there may be other leafs below these as well, such as 010, 011, 012, 013, etc.
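Both steps (setting up the index and using it) can be sketched in Python. The sketch assumes a square map with its origin at the top left and y increasing downward, digits numbered 0, 1 across the top and 2, 3 across the bottom, and the empty string standing for the NULL leaf. The function names, the `max_depth` limit and the object names in the sample index are all hypothetical.

```python
def enclosing_leaf(xmin, ymin, xmax, ymax, extent=1.0, max_depth=8):
    """Code of the smallest quadtree block that encloses a bounding box.

    Returns '' (the NULL leaf) when the box spans the first branching.
    """
    code, x0, y0, size = "", 0.0, 0.0, extent
    for _ in range(max_depth):
        half = size / 2
        col = 0 if xmax <= x0 + half else 1 if xmin >= x0 + half else None
        row = 0 if ymax <= y0 + half else 1 if ymin >= y0 + half else None
        if col is None or row is None:
            break                       # the box straddles a dividing line
        code += str(2 * row + col)
        x0, y0, size = x0 + col * half, y0 + row * half, half
    return code


def candidates(index, query_code):
    """Objects in the query block, in all blocks above it and in all blocks below it."""
    return sorted(
        name
        for code, names in index.items()
        if query_code.startswith(code) or code.startswith(query_code)
        for name in names
    )


# hypothetical index: enclosing leaf code -> objects stored under it
index = {"": ["highway"], "0": ["park"], "02": ["school"],
         "1": ["mall"], "3": ["lake"], "31": ["well"]}

print(enclosing_leaf(0.8, 0.55, 0.9, 0.7))   # 31
print(candidates(index, "31"))               # ['highway', 'lake', 'well']
print(candidates(index, "0"))                # ['highway', 'park', 'school']
```

A prefix test is enough here because a block lies above another in the tree exactly when its code is a prefix of the other's code; the NULL leaf, as the empty prefix, is always returned.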

Page 44: Coursefinal

44

Generalizations
• quadtree indexing is most effective for small objects, particularly points
• large objects tend to require large enclosing leafs even though they may not fill much of the space (e.g. highway corridors)
• these objects will always need to be checked for intersection
• it may pay to subdivide objects so that the pieces fall entirely within smaller leafs
• indexing in this way is intuitively more efficient than indexing by x or y alone since the quadtree index is effectively two-dimensional
• the divisions at each branching need not be equal in size
• it may pay to have some blocks of smaller area and some of larger area, rather than four equal squares at each branching
• however, for general efficiency the blocks should be rectangular

G. R-TREE INDEXES
• R-tree indexes are a response to the problem of indexing large areas
• R stands for "range", a concept similar to MER (minimum enclosing rectangle)

Method
find two, possibly overlapping, rectangles (aligned with x and y axes) such that:
• as many objects as possible are wholly within one or the other rectangle
• there are roughly equal numbers of objects wholly enclosed in each rectangle
• the overlap between the rectangles is minimum

Page 45: Coursefinal

45

indexing is determined by the rectangle in which the object is contained objects which are wholly within a rectangle are associated with that rectangle

objects which are not wholly within either of the two rectangles are associated with the undivided map

apply the procedure recursively, finding two new smaller rectangles within each existing rectangle
• this creates a tree structure similar to the quadtree
• every object is associated with some node in the tree

to find the objects which might intersect a given area of interest:
• find the smallest rectangle used in the indexing procedure which wholly encloses the area of interest
• the objects are those associated with this rectangle and all nodes above and below it in the tree

Problem
• although benchmark tests have shown that R-trees are generally more efficient than quadtrees and simple 1-D sorts, they are computationally intensive to construct

Page 46: Coursefinal

46

GIS Application
Network Analysis

Page 47: Coursefinal

47

Network Analysis
• Much of the economic and social activity of the world is organised into networks.

• The form, capacity and efficiency of these networks have a substantial impact on our standard of living and affect perception of the world around us.

• Networks also exist in the physical world, e.g. networks of streams and rivers.

What Is a Network?
• rail network (KCR, MTR)

• road and highway network (KMB)
• electricity network (CLP)
• telephone network (CWHKT)

• air transportation network (Cathay Pacific)

• street network (Emergency Services, Police Department, etc.)

Page 48: Coursefinal

48

Questions That Require Use of a Network

• What is the best route from a location to a given destination?

• Where should I locate a service centre?

• Which centre serves a particular location?

• How accessible is a location to other locations?

• How many trips will be generated between origins and destinations?

• Given street addresses, how can I map the occurrence of given events on a street map?

Network Data Structure
• A network can be represented digitally by nodes and links.
• Nodes represent intersections, interchanges and confluence points.
• Links represent transportation facility segments between nodes.
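A node-and-link representation can be sketched in Python as a dictionary that maps each node to its outgoing links. The example also answers the "best route" question with Dijkstra's shortest-path algorithm, which the slides do not name and which is used here only as one common choice. The node labels, link costs and function names are hypothetical.

```python
import heapq


def build_network(links):
    """links: iterable of (from_node, to_node, cost); each link is treated as two-way."""
    network = {}
    for a, b, cost in links:
        network.setdefault(a, []).append((b, cost))
        network.setdefault(b, []).append((a, cost))
    return network


def best_route(network, origin, destination):
    """Dijkstra's algorithm: returns (total cost, list of nodes), or None if unreachable."""
    queue = [(0, origin, [origin])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in network.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None


# hypothetical street network: nodes are intersections, costs are travel times
links = [("A", "B", 4), ("A", "C", 2), ("C", "B", 1), ("B", "D", 5)]
network = build_network(links)
print(best_route(network, "A", "D"))   # (8, ['A', 'C', 'B', 'D'])
```

One-way streets or turn restrictions would need a directed version of `build_network`; the two-way assumption is made only to keep the sketch short.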
