MEASURING ERROR IN MANUALLY DIGITIZED MAPS · 2005-02-02 · MEASURING ERROR IN MANUALLY DIGITIZED...

MEASURING ERROR IN MANUALLY DIGITIZED MAPS

A Thesis Submitted to the Faculty of Graduate Studies and Research

In Partial Fulfillment of the Requirements for the Degree of Master of Science

in Geography

University of Regina

Michael Wayne Frith Redlands, California

May, 1997

© 1997 Michael W. Frith

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

ABSTRACT

Many spatial databases have been created from the conversion of analog maps to

their digital representation. Manual conversion introduces error into the database, but the

amount of error is often left unmeasured. Estimates of this error can be made using Perkal

epsilon bands. Three aspects of error associated with the manual conversion of analog to

digital maps are estimated using an epsilon band: registration error, digitizing error and

inherent map error. Since these estimates are independent, they can be added for a total

error estimate.

To determine the feasibility of establishing a total error estimate, a sample line

was constructed and independently digitized by eight operators in both point and stream

digitizing modes. This approach yielded a sample data set in which the perpendicular

distances between the two lines generated in point and stream modes were used as the

digitizing error. The registration error was returned from the digitizing software. The

inherent map error was based on the Canadian federal government's Energy, Mines and

Resources Mapping Branch horizontal accuracy standards. Those standards stipulate that

any point on a map must be within 0.5 mm of the location of the true point at map scale.

Results indicated that the maximum digitizing error is not usually representative

of a manually digitized data set. Using an epsilon based on the median of the digitizing

error is also not a particularly valid approach since 50% of the line may not be present in

the epsilon band. An epsilon band based on the mean deviation and its accompanying

statistics appears to provide the most reasonable measure of digitizing accuracy.

I would like to express my sincere gratitude to Dr. David Gauthier for his

encouragement and guidance through this extended journey. I would also like to thank the

Department of Geography and the Faculty of Graduate Studies and Research for the

funding received to aid in this endeavour. Partial funding to assist this project was also

received through a Social Sciences and Humanities Research Grant of Dr. Gauthier.

Thanks are also extended to Environmental Systems Research Institute for

allowing me the flexibility and resources to complete my degree. In particular, I wish to

thank Bill Moreland.

Thanks also to my friend and confidant Becky who put up with me the last 2

years.

TABLE OF CONTENTS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

CHAPTER 1. NATURE OF STUDY

CHAPTER 2. LITERATURE REVIEW

CHAPTER 3. METHODOLOGY

    3.1 DATA
    3.2 DATA COLLECTION
        3.2.1 Registration
        3.2.2 Digitizing
    3.3 CALCULATION OF EPSILON
        3.3.1 Registration
        3.3.2 Inherent Map Error
        3.3.3 Digitizing Error
        3.3.4 Total Epsilon

CHAPTER 4. RESULTS AND DISCUSSION

    4.1 OPERATOR BACKGROUNDS
    4.2 DIGITIZING RESULTS
        4.2.1 Registration
        4.2.2 Digitizing Error
        4.2.3 Test of Independence
    4.3 TOTAL EPSILON CALCULATION
    4.4 DISCUSSION
        4.4.1 Assessment of the Three Measures
        4.4.2 Improvements to the Methodology
        4.4.3 Areas of Further Research

APPENDIX A

LITERATURE CITED

LIST OF FIGURES

Figure 2.1. Digitized line with epsilon band and the "control" line

Figure 2.2. Point mode digitizing is a sampling method that creates a new line using the points selected by the operator

Figure 2.3. Perpendicular error distance is from a vertex on the digitized line to the control line

Figure 2.4. Two lines representing the same feature but digitized at different scales

Figure 2.5. The RMS error is calculated from the differences in the tic table coordinates and the transformed coordinates

Figure 2.6. "Control" line with concave epsilon band

Figure 2.7. Possible representations of the "true" line from pairs of random points generated within specified distance

LIST OF TABLES

Table 4.1. RMS error in table units and map units

Table 4.2. Statistics based on perpendicular error distance between the point mode digitized line and the stream mode digitized line

Table 4.3. Calculation of total epsilon distances (metres)

Table 4.4. Examination of standard deviation digitizing error for individual operators and the operators as a group

CHAPTER 1. NATURE OF STUDY

"...it is not the map that is in 'error', it merely contains a considerable level of uncertainty ..."

Fisher, 1987, p. 315

1.1 Purpose

The purpose of this thesis is to construct a method for estimating the amount of

uncertainty introduced when analog maps are converted to digital data in a vector based

geographic information system (GIS). There is a substantial need to obtain reliable

estimates of error associated with digital data to better inform data analysis, interpretation

and decision-making. The main priority of research into issues of accuracy and

uncertainty associated with GIS is to "develop an adequate means of representing and

modeling the uncertainty and error characteristics of spatial data and to develop GIS

related methods and techniques that can explicitly take error into account during their

operations with spatial data" (Openshaw, 1989, p.265).

1.2 Background

Maps have been the main source of data for geographic analysis for many years.

"The primacy of maps is an unquestioned premise of the field." (Chrisman, 1982a, p.3).

However, maps serve not only geographers. The shift from analog to digital maps has

expanded the utility of maps for many disciplines. The digital map allows for the

manipulation of data in ways that are not possible with paper maps. Analog maps have

traditionally been difficult to use for overlay analysis and are frequently, but incorrectly,

treated as being 100% accurate (Aronoff, 1989). "For the first time we are

emancipated from the tyranny of paper map sheet with finite size and depth, although

[...]

seamless and scale variable cartographic databases." (Muller, 1992, p. 2).

The increased speed and power of computers allows spatial information to be

processed faster than ever before. The transition from analog to digital map has benefited

many areas of map analysis but has raised substantive issues regarding spatial data

handling. Chrisman (1984, p.81) noted that "numbers in a database create an illusion of

accuracy and the computer opens new ways of potential abuse." Goodchild (1996) argues

that users expect GIS databases to be developed with the "principles of scientific

measurement". Much of the necessary research into spatial processes and spatial statistics

and resulting conflicts is just beginning.

One of the major issues in the use and analysis of digital data is that of accuracy

and the lack of knowledge regarding spatial error. Accurate data is important for GIS but

is often overlooked or dismissed in applications and is very rarely specified on output

products. The map user generally has no idea of map accuracy and output is often

assumed to represent a higher level of accuracy than it actually contains (Chrisman, 1984;

Keefer et al., 1988).

A GIS allows users to produce maps and models by combining various sets of

spatial data. Two GIS capabilities that excite enthusiasm among potential users are the

ability to change map scales and the ability to efficiently overlay maps in any order

desired by the user. It is this ability to manipulate mapped information that makes a GIS

so valuable. However, researchers and decision-makers can be misled due to

misunderstandings of the imprecision inherent in cartographic forms of representation and

the compounding of errors when map scales are changed and when maps are merged

[...]

products has arisen for several reasons: (1) they are a requirement of many spatial data

transfer standards; (2) public agencies require accurate estimates of error to support

decisions based on spatial data; (3) accurate error estimates help to preserve public

confidence; and (4) estimates of error assist in the resolution of litigation disputes

(Goodchild, 1993).

Digital spatial data should not be subject to the same limited methods of

determining accuracy that are associated with paper maps. Many spatial databases have

been converted from paper maps without considering the uses of the resulting data or the

intended use of the paper map. The paper map is a communication device transmitting the

cartographer's view of reality to the map reader. Initial cartographic research focused on

this communication model and gave scant regard to accuracy. "That model diverted

attention away from the data gatherer and the map maker toward the transmission

process; it thereby down played problems of data accuracy and precision and those of

representation." (Woodward, 1992, p.52). "Cartographers feel little need to communicate

information on accuracy, except indirectly through map quality statements or in detailed

legends." (Goodchild, 1991, p.2). Openshaw (1989, p.263) states "there is a remarkable

lack of information about the level of errors in maps and remotely sensed data and, there

are seemingly no available tools for measuring error in the outputs, and no methodology

for assessing their significance."

One of the main features of a GIS is the ability to produce "new" information. The

users of maps and other GIS products want to combine information from many sources to

aid in decision making and they want the information to contain as little uncertainty as

possible. [...]

least be aware of the limits. If there is no quantified estimate of data quality then users

may rightly be cautious in the use of that data. For example, knowledge of error is

important to researchers that use data analysis to refine research directions or decision-

makers who may be held liable for damages incurred as a result of poor decisions. If a

user understands limits associated with the data, the risk of damages as a result of poor

decisions is reduced.

The determination of an accuracy level should be based on the intended use of the

information. The acceptable level of accuracy is "that level where the costs of making the

wrong decision are equal to the costs of acquiring more accurate information" (Aronoff,

1989, p.55). Inaccuracies can lead to false perceptions about the data (Bailey, 1988)

which can lead to faulty decisions (Mead, 1982; Chrisman, 1984; Hudson, 1988).

Little has been written in articles dealing with applications of GIS in regard to

accuracy determination and levels of confidence in the output data. The most plausible

explanations for that lack are that users either do not know how much error is in the data,

or they have no way of quantifying the error, or they have some reason to believe it is not

significant. The first two explanations are no longer valid for certain types of errors.

Many managers use the "best" data available or data that is "good enough" for their

application, though they may have no quantified estimate of its accuracy. There are no

commercial GIS or other tools that can determine the amount of error in a data layer or

incorporate the error during overlay procedures (Openshaw, 1989).

Chapter 2 provides a literature review of relevant concepts and approaches to

error estimation and analysis. Chapter 3 describes the methodology used in this thesis to

measure and depict error sources. Chapter 4 presents the results and discusses

conclusions and possible approaches and methods that can be used to assist users of

digital mapped data to account for and manage the error and uncertainty in spatial

databases.

CHAPTER 2. LITERATURE REVIEW

"Digitizing is usually the most expensive part of a GIS, yet the error introduced into maps by digitizing is often overlooked or assumed to be negligible"

Keefer et al., 1991, p.957

2.1 Introduction

There are two basic types of error in spatial databases: positional and attribute

(Amrhein and Griffith, 1991). Either the coordinates of the feature are wrong and/or the

content or description of the feature is wrong. Error in these spatial databases occurs from

two main sources, the encoding process and the source documents (Veregin, 1993). The

first source of error is distinguished by discrepancies between the source document and

the digital data derived from that source. The second source of error is associated with

error in the source document.

There are two approaches to accuracy determination: testing and simulation.

Testing is the comparison of the collected data to data of a known and higher accuracy

(Vonderohe and Chrisman, 1985; Chrisman, 1989). Simulation seeks to develop a

procedure with stochastic modeling that can produce a "random" data set. This "random"

data set is based on the original or generated data set but is perturbed according to some

mathematical or probability function. The "random" data set will share many

characteristics of the original data set but is only a possible version of what the real world

may be like. Usually, multiple "random" data sets are created and combined to assess

final output results. With simulation, predictions about accuracy are made from

assumptions about the true data (Chrisman, 1989). Testing is a more accurate method, but

[...] 1984).
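
The simulation approach described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each vertex of a line is perturbed independently by zero-mean Gaussian noise; the function names, the noise model and the standard deviation are hypothetical and are not taken from the studies cited. Several "random" versions of a line are generated and one output quantity (line length) is summarized across them.

```python
import math
import random

def perturb_line(vertices, sigma, rng):
    """Return one "random" version of a line: every vertex is shifted by
    independent zero-mean Gaussian noise with standard deviation sigma
    (an assumed error model, not one taken from the thesis)."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in vertices]

def line_length(vertices):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def simulate_lengths(vertices, sigma, runs, seed=0):
    """Generate `runs` perturbed lines and return the length of each one."""
    rng = random.Random(seed)
    return [line_length(perturb_line(vertices, sigma, rng))
            for _ in range(runs)]

# Hypothetical line and noise level, in arbitrary map units.
original = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
lengths = simulate_lengths(original, sigma=0.5, runs=100)
print(min(lengths), sum(lengths) / len(lengths), max(lengths))
```

The spread of the simulated lengths gives a rough indication of how sensitive a derived measure is to the assumed positional error. A more realistic model would have to account for the correlation between successive vertices.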

Three indices of database accuracy are layer-based, feature-based and domain

specific indices. Layer based indices provide a quick summary of data quality over an

entire layer. A "layer" is a representation of a single type of entity, e.g. soils, hydrology,

land use. The disadvantage of this approach is that data quality varies significantly across

space (Burroughs, 1986; Veregin, 1991). Feature based indices provide information about

variations across space but at a high cost in terms of storage and management. A

"feature" is a single entity in a layer, e.g. a lake in a hydrology layer. Intermediate to these

two approaches are "domain specific" indices in which the spatial or thematic domain is

subdivided into discrete classes (Veregin, 1991). Spatial domains define areas on the map

that have different values for the same attribute, e.g. areas of the map that have more

recent information or have been surveyed at different times. Thematic domains refer to

similar features in a layer, e.g. urban areas in a land use layer. Ideally, maps should

include not only a total map uncertainty index but also domain specific errors.

There are numerous methods to reduce positional error in a spatial database. If

new databases are being created, then experienced digitizers and operational feedback

have been found to reduce input error (Jenks, 1981; Otawa, 1987). The use of survey

control points allows for the "rubber-sheeting" or stretching of the map to a more

accurate position (Star and Estes, 1990). The use of high level surveying, e.g. global

positioning systems, can reduce positional error during raw data collection.

Most spatial databases are created from the conversion of analog maps (Marble,

1996). The conversion process requires the registration of the map or "map separates"

[...]

depict a single layer, e.g. hydrology, roads. A skilled operator then digitizes the rnap or it

can be electronically scanned using an automated scanner.

It is possible to measure the error and uncertainty introduced by the components

of the map conversion process: digitizing, registration and map compilation. The true

location of any digitized point should lie within a radius equal to the composite error value. This

distance is called the epsilon distance and when applied to linear features forms an error

band called an epsilon band (Figure 2.1). While the epsilon band model does not provide

a determination of the location of the true line, it does provide an estimate of the

deviation of a digitized line from the true line.
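
As a minimal sketch of how a composite epsilon distance might be assembled, the following assumes, as stated in the abstract, that the three component estimates are independent and can be added, and that the inherent map error is 0.5 mm at map scale. The function names, the map scale and the registration and digitizing values are hypothetical.

```python
def inherent_map_error(scale_denominator, tolerance_mm=0.5):
    """Convert a map-scale tolerance in millimetres to ground metres.
    The 0.5 mm default follows the accuracy standard quoted in the abstract."""
    return tolerance_mm * scale_denominator / 1000.0

def total_epsilon(registration_m, digitizing_m, inherent_m):
    """Add the three component estimates (all in metres) into one epsilon."""
    return registration_m + digitizing_m + inherent_m

# Hypothetical values for a 1:50 000 map sheet.
inherent = inherent_map_error(50_000)
epsilon = total_epsilon(registration_m=4.0, digitizing_m=11.0,
                        inherent_m=inherent)
print(inherent, epsilon)  # 25.0 40.0
```

With these assumed inputs the epsilon band would extend 40 m on either side of the digitized line, most of it contributed by the inherent map error.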

2.2 Digitizing

Digitizing is a strenuous task that requires intense concentration and places a

premium on an operator's psychological and physiological ability to discern the centre of

a line and follow the centre of the line with the cursor. There are two methods of manual

digitizing: point mode and stream mode. Both involve the operator moving the cursor or

"puck" along the features to be collected. The difference in the two modes lies in the

procedure of collecting those features. In point mode, the operator moves the cursor to

any point that the operator considers to be important in defining that feature, e.g. a bend

in a river or road. The operator then enters the location of that point into the database.

Point mode digitizing requires intelligent interpretation of each feature by the operator. It

is a tedious process and does not always give a true representation of the feature being

digitized (Douglas and Peucker, 1973; Burroughs, 1986) (Figure 2.2). It is a sampling

[Figure labels: Epsilon band; True line; Epsilon; Digitized line]

Figure 2.1. Digitized line with epsilon band and the "true" line.

[Figure labels: Original line; Digitized line (adapted from Burroughs, 1986)]

Figure 2.2. Point mode digitizing is a sampling method that creates a new line using the points selected by the operator.

[...] accuracy of the resulting data (Blakemore, 1984; Klinkenberg and Xiao, 1990).

Alternatively, in stream mode digitizing, the operator moves the cursor along the

feature, e.g. polygon boundary, and points are automatically entered without judgement

by the operator using algorithms that are based on the distance traveled along the line or

the time elapsed. Although stream mode digitizing is faster and easier than point mode,

operators tend to undercut or overshoot corners and have to make corrections as they

digitize (Jenks, 1981).
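
The distance-based variant of stream mode can be sketched as follows. This is an illustrative reconstruction only: it assumes the cursor track is available as a list of positions and that a point is recorded whenever the cursor has travelled a fixed interval since the last recorded point. The function name and the interval are hypothetical and do not describe any particular digitizing package.

```python
import math

def stream_sample(track, interval):
    """Record the first cursor position, then a new point each time the
    cursor has travelled at least `interval` along the track since the
    last recorded point. No operator judgement is involved."""
    if not track:
        return []
    recorded = [track[0]]
    travelled = 0.0
    for prev, curr in zip(track, track[1:]):
        travelled += math.dist(prev, curr)
        if travelled >= interval:
            recorded.append(curr)
            travelled = 0.0
    return recorded

# Hypothetical cursor track: positions 0..10 along a straight line.
track = [(float(x), 0.0) for x in range(11)]
print(stream_sample(track, interval=3.0))
# [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (9.0, 0.0)]
```

Because points are taken wherever the cursor happens to be, any overshoot or undercut at a corner is recorded along with the rest of the track.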

Traylor (1979) had 15 subjects digitize a 6" x 4.5" generalized representation of

Australia in stream mode. Using perpendicular error distances from the digitized line to

the original line (Figure 2.3), Traylor found that stream mode digitizing error does not

occur randomly, but is correlated to the direction of cursor travel and an inability of the

operators to correct their mistakes even though they know they are not following the line.

Traylor suggested that a digitizing signature could be created for each operator and data

sets created by them could be modified according to that signature. Jenks (1981) also

found stream mode digitizing dominated by "latitudinal" errors, i.e. operators would

'overshoot' or 'undercut' corners and realizing that they had strayed from the line would

slowly make their way back to the line rather than make an abrupt correction.
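
The perpendicular error distance used in these studies can be approximated with the sketch below. It assumes both lines are stored as lists of vertices and measures, for each vertex of the digitized line, the shortest distance to the original line. Where the perpendicular from a vertex falls beyond the end of a segment, the distance to the nearest segment end is used instead; that choice, like the function names and sample coordinates, is an assumption made for illustration.

```python
import math
import statistics

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:            # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Position of the perpendicular foot along the segment, clamped to 0..1.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def error_distances(digitized, original):
    """Error distance from every digitized vertex to the original line."""
    segments = list(zip(original, original[1:]))
    return [min(point_segment_distance(v, a, b) for a, b in segments)
            for v in digitized]

# Hypothetical original line and digitized vertices, in map units.
original_line = [(0.0, 0.0), (10.0, 0.0)]
digitized_line = [(1.0, 0.5), (4.0, -1.0), (8.0, 0.25)]
distances = error_distances(digitized_line, original_line)
print(statistics.mean(distances), statistics.median(distances), max(distances))
```

These distances are unsigned. Detecting a directional bias of the kind Traylor reported would require keeping the sign, that is, the side of the line on which each vertex falls.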

Honeycutt (1985) examined the effect of cartographic generalization on positional

uncertainty from four map scales: 1:24 000, 1:62 500, 1:100 000, and 1:250 000. Eight

stream channels were digitized in point mode at each of the four scales, with the largest

scale version acting as the base line. The generalized versions of the line were digitized

by one operator and overlain with the base line resulting in polygons that represented the

[Figure labels: Original line; Digitized line; Error distance]

Figure 2.3. Perpendicular error distance is from a vertex on the digitized line to the true line.

[...]

polygon with area weighted average and variance. It was found that generalization caused

a bimodal distribution of location error about the cartographic line. The operator was not

able to follow the centre of the line but veered to the right and left of the line. It was

reasoned that overshooting and undercutting, based on Traylor's findings, were

responsible for this distribution. Generalization reduces the number of points required to

represent a feature causing the shape of the feature to change. It is implausible that

digitizing error causes the bimodal distribution. The transected polygons represent the

amount of deviation between the base line and the smaller scale representation (see

Figure 2.4), not the inability of the operator to follow the line. The comparison of

digitized lines at different scales does not provide a suitable measure of digitizing error

and, therefore, the comparison was invalid.

Otawa (1987) studied the variability of digitizing error of 14 people. Using the

same map and hardware, various sized polygons with varying complexities were

digitized. The subjects had little or no prior experience with digitizing. Analysis consisted

of comparing polygon area from operator to operator. It was concluded that manual

digitizing created more error than expected and that the larger the polygon to be digitized,

the larger the error.

Keefer et al. (1988) used a method similar to Traylor (1979) to examine digitizing

error. Map-like features were digitized in point mode and used as the "control" line. The

features were then digitized again in stream mode and the perpendicular error distance

from the sample line to the control line was calculated (see Figure 2.3). The data was

found to be non-random with a high correlation or serial dependence from point to point.

[Figure labels: Line digitized at small scale; Line digitized at large scale (source: Honeycutt, 1985)]

Figure 2.4. Two lines representing the same feature but digitized at two different scales

[...] average (ARMA) model to simulate stream mode digitizing error. Line length and

polygon area output from the program were compared with the original data. The authors

concluded that "time series analysis is a very effective method of studying the effect of

digitizing upon map accuracy" (p.482), although they did not mention the size of the

digitizing error.

A study by Maffini et al. (1989) examined the distribution of error from digitizing discrete and continuous features. All features were digitized in point mode at three different scales under three time constraints: comfortable, hurried, and very hurried. Not surprisingly, the largest scale and slowest digitizing speed provided the most accurate data.

2.3 Registration Error

Registration is the process of defining points to correlate features in one coordinate system to another coordinate system. This process is done in anticipation of transforming the features from one coordinate system to the other. In the case of digitizing, the transformation is from table coordinates to map coordinates. The points used during registration are called "tics". Registration fitness or acceptability is determined by measuring the error between the output tic coordinates and the transformed coordinates using root mean square analysis (see Figure 2.5). The distance deviation between the original coordinates and the transformed coordinates determines the "root mean square" (RMS) error.


Figure 2.5. The RMS error is calculated from the differences in the tic table coordinates and the transformed coordinates. Figure labels: output coordinate (tic entered by digitizing operator); transformed input coordinate (original "true" geographic registration tic); used for RMS calculations.


$$\text{RMS Error} = \sqrt{\frac{\sum \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right]}{n}}$$

where $x_i, y_i$ are the tic table coordinates, $x_j, y_j$ are the transformed input tic coordinates, and $n$ is the number of tics.
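The RMS calculation described above can be sketched in a few lines of Python. This is a minimal illustration of the conventional definition (the square root of the mean squared distance between paired tics), not the ARC/INFO implementation; the function name `rms_error` and the sample coordinates are hypothetical.

```python
import math

def rms_error(tic_table, transformed):
    """Root mean square of the distances between paired tic coordinates."""
    if len(tic_table) != len(transformed) or not tic_table:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    squared = [
        (x1 - x2) ** 2 + (y1 - y2) ** 2
        for (x1, y1), (x2, y2) in zip(tic_table, transformed)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical tics: four corners, each entered 0.003 units off in x.
tic_table = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
entered = [(x + 0.003, y) for x, y in tic_table]
print(round(rms_error(tic_table, entered), 6))
```

Because every tic in this made-up example is displaced by the same distance, the RMS error equals that displacement.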

Bolstad et al. (1990) simulated the registration process by having four operators digitize a series of points. The mean deviation around the points was 0.068 mm or 1.7 m at a map scale of 1:24 000. Rogowski (1995) used the affine transformation in ESRI Corporation's ARC/INFO GIS software, resulting in an RMS error of 1.0 metre. It is the ARC/INFO method that is used in this research.

2.4 Inherent Map Error

There are several sources of inherent map error. Primary data capture (surveying, geodesy and photogrammetry) introduces human, instrumental and environmental errors. Human error results from observers not reading instruments correctly or not positioning equipment correctly. Instrumental errors occur from poorly constructed equipment or lack of proper calibration. Environmental errors are caused by humidity, temperature, pressure, magnetic variations, obstruction of signals, wind, and illumination. Observations from primary data capture are subjected to rigorous statistical and mathematical modeling to remove most of the error.

Additional sources of error include those caused by plotting control points for map production, drawing of the features, generalization of the features, error in colour registration of map separates, feature exaggeration for communication reasons, definition


and Bossler, 1992).

2.5 Epsilon Band Model

The epsilon model of positional uncertainty for cartographic lines was adapted by Chrisman (1982a) based on work done by Perkal (1956). Perkal used a circle of diameter epsilon to determine an approximate length of a line. Chrisman (1982) modified this approach, concluding that somewhere within this epsilon distance the true line exists (see Figure 2. f ). "The epsilon model provides a conservative, generalized model directed at unifying all sources of error." (Chrisman, 1982a, p.61).

The epsilon model can be used in either a probabilistic or deterministic method (Goodchild, 1988). In a deterministic method, the probability of the true line lying within the epsilon band is 1.0. Theoretically, this means that there can be no error outside of the epsilon distance. The probabilistic approach assumes that the error around a line is represented by a normal distribution (Maffini et al., 1989). With the probabilistic method, epsilon can take on any probabilistic measure, such as standard deviation, which implies, for example, that there is a 68% chance that the true line is within the epsilon band. Mark and Csillag (1989) used a probabilistic epsilon band to define a probability surface between two polygons to determine the probability of a sample point belonging to one or the other polygon.
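The 68% figure follows from the normal distribution assumed by the probabilistic approach. As a minimal sketch under that assumption, the coverage of an epsilon band set to k standard deviations can be computed from the error function; the helper name `coverage` is hypothetical.

```python
import math

def coverage(k):
    """Probability that a normally distributed error lies within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

for k in (1.0, 2.0, 3.0):
    print(f"epsilon = {k:.0f} standard deviation(s): {coverage(k):.1%} expected inside the band")
```

Widening epsilon from one to two standard deviations raises the expected coverage from about 68% to about 95%, at the cost of a wider band of uncertainty.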

Chrisman (1982b) used epsilon bands to study systematic errors associated with the United States Geological Survey's (USGS) Geographic Information Retrieval and Analysis System (GIRAS) digital land use/land cover series. His determination of epsilon was based on inherent map or scale error, digitizing error and round-off error and resulted


interpretation and registration errors that were not included. Because of these missing values, the 20 m epsilon was considered to be quite conservative. The results showed that 7 percent of a 100 000 hectare database fell in this epsilon band. The area in the epsilon band represented a possibility that the area was a different land use/land cover class.

Blakemore (1984) studied the number of industrial establishments that existed in Employment Office Areas (EOA) in northwest England. To determine within which EOA an establishment was located, the establishments were "geocoded" to the EOA base map. The base map had a 1 km grid square resolution, yielding an epsilon value of 0.7071 km. Point-in-polygon overlay was conducted to determine which establishments fell within which EOA. The results were not encouraging, as approximately 40 percent of the sample points that were tested fell within the epsilon band and could not be definitely assigned to a polygon.
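The 0.7071 km value is consistent with half the diagonal of a 1 km grid square, which is the farthest any point in the cell can lie from the cell centre. The source does not state the derivation, so this reading is an assumption; the function name `grid_epsilon` is hypothetical.

```python
import math

def grid_epsilon(cell_size):
    """Half the diagonal of a square cell: the largest distance from the cell centre."""
    return math.hypot(cell_size, cell_size) / 2.0

print(round(grid_epsilon(1.0), 4))  # 1 km grid square
```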

Dunn et al. (1990) used epsilon bands for a study on the amount of error in digital databases associated with the Monitoring Landscape Change project in England and Wales. Several lines in the database were digitized twice, providing administrators with an opportunity to examine digitizing error. The values for epsilon were based on the maximum range between the two lines and the interquartile range (IQR), the range between the 25th and 75th percentiles. The difference between these values was quite large: 20 m for the range and 3.1 m for the IQR. The area of uncertainty (the area in the epsilon band) for the polygons in the study varied from 10.0% to 15.8% for the range epsilon and 1.6% to 2.5% for the IQR epsilon.
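The contrast between a range-based and an IQR-based epsilon can be illustrated with a short sketch. The separation distances below are invented for illustration, and treating the "maximum range" as the largest separation between the two digitized versions is an assumption about the method, not a detail given above. The quartiles use the default method of Python's `statistics.quantiles`.

```python
import statistics

def epsilon_candidates(separations):
    """Return (largest separation, interquartile range) for paired-line distances."""
    q1, _, q3 = statistics.quantiles(separations, n=4)
    return max(separations), q3 - q1

# Hypothetical separations (metres) between two digitized versions of a line.
separations_m = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 20.0]
largest, iqr = epsilon_candidates(separations_m)
print(f"largest = {largest} m, IQR = {iqr} m")
```

A single outlying separation dominates the range-based value but barely moves the IQR, which is the same pattern as the 20 m versus 3.1 m result reported above.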


with the band being closer to the "true" line at the center of the line segment than at the end (Figure 2.6). Random pairs of locations were generated about the endpoints of a line within a specified distance of the endpoint (Figure 2.7). These pairs of locations simulated possible representations of the "true" line. Dutton found that the standard deviation of these representations varied along the length of the line, with dispersion being greatest at the endpoints and least at the midpoint. The problem with Dutton's proposal is that the digitized points must be independent of each other, but stream mode digitized points are not independent.
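The simulation described above can be approximated with a small Monte Carlo sketch. This is not Dutton's code: the uniform sampling inside a circle, the horizontal "true" line, the parameter values and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics

def random_point_near(centre, radius, rng):
    """Return a uniformly distributed random point inside a circle."""
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return centre[0] + r * math.cos(theta), centre[1] + r * math.sin(theta)

def spread_along_line(length, radius, trials=5000, steps=10, seed=1):
    """Standard deviation of the simulated lines' y offset at steps + 1
    evenly spaced fractions along a horizontal line from (0, 0) to (length, 0)."""
    rng = random.Random(seed)
    offsets = [[] for _ in range(steps + 1)]
    for _ in range(trials):
        # Each trial draws one independent random location near each endpoint.
        _, y_start = random_point_near((0.0, 0.0), radius, rng)
        _, y_end = random_point_near((length, 0.0), radius, rng)
        for i in range(steps + 1):
            t = i / steps
            offsets[i].append((1.0 - t) * y_start + t * y_end)
    return [statistics.pstdev(column) for column in offsets]

spread = spread_along_line(length=100.0, radius=5.0)
for i, sd in enumerate(spread):
    print(f"{i / (len(spread) - 1):.1f} of the way along: sd = {sd:.2f}")
```

With independent endpoint errors, the dispersion is smallest at the midpoint, which produces the concave band. The sketch relies on the same independence assumption that the paragraph above identifies as the weakness for stream mode data.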

Goodchild and Dubuc (1987) discussed problems with the use of the epsilon band. They suggested that:

1. there should be no upper limit for error;

2. the epsilon band does not provide the distribution of error within the epsilon band; and

3. although epsilon provides a model of deviation for the line, it does not model the line itself.

These difficulties are mediated, however, by the choice of epsilon distance. Careful measurement of the components involved in the digitizing process will provide an appropriate epsilon distance. Additionally, the epsilon band model is relatively easily implemented by most users and provides at least a coarse measure of the components of error that is understandable to most users.

The objective of this research is to develop a quick and reasonably accurate method to estimate how much uncertainty exists in newly created spatial databases. The assignment of an uncertainty measurement that assesses the percentage of the true line within an epsilon distance from the observed line is, therefore, a realistic goal.


Figure 2.6. "True" line with concave epsilon band.


Figure 2.7. Possible representations of the "true" line from pairs of random points generated within specified distance.


Error introduced during digitizing comes from three sources: registration error, digitizing error and inherent map error. These error sources are assumed to be independent and can, therefore, be summed, resulting in a total measure of error called "epsilon". To assess characteristics of these sources of error, eight individuals were chosen to digitize standardized line information under controlled conditions. The results were assessed and compared relative to the three sources of error and epsilon.

3.1 Data

The data consisted of registration points and a sinuous line (Appendix A). The data were plotted with a thickness of 1 mm on mylar to limit the amount of stretch and distortion. Mylar is a more stable medium than paper, which can change in size as humidity and temperature vary.

3.2 Data Collection

The entry of information from the constructed data set required two steps from each operator: registration and digitizing.

3.2.1 Registration

An empty data file was created for each operator, each possessing the same "tic

table" (see below). The empty data files were identical in terms of the geographic space

they represented. The geographic space of each data set was referenced according to four

known coordinates corresponding to the minimum x and y values, the minimum x and

maximum y values, the maximum x and minimum y values, and the maximum x and y

values. These four known geographic registration points are known as "tics". The


the tics. The coordinates of the tics are stored in the computer in a data file known as a "tic table". Each time an operator begins a digitizing session, the tics (normally represented on a map with large cross-hairs) must be re-entered from the map using an electronic puck or cursor. The coordinates the operator enters are compared with those already in the computer to determine the root mean square (RMS) error.

The operator enters the identification number (ID) of the tic, places the cursor over the crosshair as accurately as possible, and enters the tic location. When all the tics have been entered, the computer determines the RMS error from the deviation of the tics entered by the operator compared to those in the "tic table". RMS values greater than 0.003 are usually not acceptable and the map must be registered again.

3.2.2 Digitizing

The digitizing was conducted over three days. The mylar was not removed from the digitizing table, ensuring that no distortion resulted from re-taping the map to the table. The temperature and humidity were controlled by the local environment system. Once the map was registered to the table, each operator digitized the sample line in a dense point mode; this was the control line. Each operator then digitized the sample line in stream mode.

3.3 Calculation of Epsilon

An epsilon distance was calculated for each procedure. Each procedure distorts the data set before it is passed to the next procedure. Because of the independence of each procedure, it was valid to sum the epsilon distances to obtain a total epsilon distance.

3.3.1 Registration


during the registration process. That value was a measure of how well the map was registered to the table. The lower the RMS error, the closer the entered tics align with the map tics.

3.3.2 Inherent Map Error

The epsilon distance for inherent map error was based on standards from the Canadian federal Department of Energy, Mines and Resources, producers of the National Topographic Series. The accepted standard states that 90 percent of well-defined features measured from the map fall within 0.5 mm of their true position (Energy, Mines and Resources, 1976). In other words, a feature represented on the map will be found on the earth's surface within a radius equivalent to 0.5 mm at the map scale. The epsilon distance was calculated by multiplying the radius by the map scale.

$$\varepsilon_m = \text{radius} \times \text{map scale}$$

where $\varepsilon_m$ is the epsilon distance for inherent map error.

The map scale used for this project was 1:50 000, producing an inherent map error epsilon distance of 25 m (0.5 mm multiplied by 50 000). This value is the maximum map error value.
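The conversion from a map radius to a ground distance is a single multiplication followed by a unit change. A minimal sketch, with a hypothetical function name, reproduces the 25 m value:

```python
def inherent_map_epsilon(radius_mm, scale_denominator):
    """Ground distance in metres for a radius measured in millimetres on the map."""
    return radius_mm * scale_denominator / 1000.0

# 0.5 mm on a 1:50 000 map.
print(inherent_map_epsilon(0.5, 50_000))
```

The same function shows how strongly the inherent map error depends on scale: at 1:250 000 the same 0.5 mm radius corresponds to 125 m on the ground.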

3.3.3 Digitizing Error

An approach similar to that taken by Keefer et al. (1988) was used. They determined digitizing error by having operators digitize the feature in a dense point mode. The operators then digitized the feature in stream mode and the perpendicular error distance between the two lines was considered the digitizing error. The approach used here was as follows:


1. Each operator digitized the sample line in a dense point mode. The point mode digitized line was taken as the "control" line.

2. Each operator then re-digitized their point mode sample line in stream mode.

3. A computer program was written to calculate the perpendicular distance from each vertex entered in stream mode to the point mode "control" line. Each perpendicular distance represented the deviation by the operator; the distance between the two lines is the digitizing error.

4. Three measures (maximum deviation, mean deviation, and median deviation) were calculated using the Statistica computer package (StatSoft, 1994). The maximum epsilon represents the maximum error that occurs. The median and standard deviation epsilons represent probabilistic values. The standard deviation represents 68% of the digitizing error while the median represents the 50th percentile of the digitizing error. With the median, the chance of a point being in the epsilon band is the same as being outside of it.
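The steps above can be sketched in Python. This is not the program written for the thesis: it assumes the deviation of each stream mode vertex is its shortest distance to any segment of the control line (clamped at segment ends), and the coordinates and function names are hypothetical.

```python
import math
import statistics

def point_to_segment(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Fraction along the segment of the perpendicular foot, clamped to the ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def deviations(stream_vertices, control_line):
    """Distance from each stream mode vertex to the nearest control segment."""
    segments = list(zip(control_line, control_line[1:]))
    return [min(point_to_segment(v, a, b) for a, b in segments)
            for v in stream_vertices]

def summarize(values):
    """Maximum, mean and median of the absolute deviations."""
    return {"maximum": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values)}

# Hypothetical control line (point mode) and stream mode vertices.
control = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
stream = [(2.0, 1.0), (5.0, -2.0), (9.0, 0.5), (11.0, 5.0), (10.0, 9.0)]
print(summarize(deviations(stream, control)))
```

Because distances are unsigned, the summary corresponds to working with absolute deviations rather than signed offsets to either side of the control line.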

3.3.4 Total Epsilon

Since each source of error (inherent map error, registration error, digitizing error)

is independent of the other, the epsilon values for each procedure were summed to obtain

the total epsilon value. The total epsilon is:

$$\varepsilon_t = \varepsilon_r + \varepsilon_m + \varepsilon_d$$

where: $\varepsilon_t$ is the total epsilon

$\varepsilon_r$ is the epsilon from registration

$\varepsilon_m$ is the epsilon from inherent map error

$\varepsilon_d$ is the epsilon from digitizing


epsilon distance. To differentiate between the three total epsilons for each operator, each

total epsilon was named according to the type of digitizing epsilon used. For example, the

maximum epsilon is comprised of the maximum digitizing deviation epsilon distance, the

inherent map error epsilon and the registration epsilon.
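As a worked example of the summation, the following sketch uses the Operator 1 values reported in Table 4.3 (all in metres). The function and variable names are hypothetical.

```python
def total_epsilon(registration, map_error, digitizing):
    """Total epsilon as the simple sum of the three component epsilons."""
    return registration + map_error + digitizing

# Operator 1 values (metres) from Table 4.3.
registration, map_error = 1.978, 25.0
digitizing = {"maximum": 49.305, "mean": 13.230, "median": 11.362}

totals = {name: round(total_epsilon(registration, map_error, value), 3)
          for name, value in digitizing.items()}
print(totals)
```

Only the digitizing component changes between the three totals; the registration and inherent map error epsilons are the same in each.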


CHAPTER 4. RESULTS AND DISCUSSION

4.1 Operator Backgrounds

The operators had varied backgrounds: two were computer programmers who had never digitized before, three had considerable digitizing experience, and three had moderate experience. The operators were somewhat nervous, with one operator reporting hands shaking more than normal. The operators were told this digitizing was for a thesis, which may have contributed to their nervousness. The only constraint placed on the operators was that they produce a low RMS error during map registration. The accepted norm for map registration is 0.003 or lower.

4.2 Digitizing Results

4.2.1 Registration

The ARC/INFO registration process uses an affine transformation, that is, a linear transformation of the table coordinates to map space. This process yields the root mean square error of the table coordinates as well as the map space coordinates. The output units are those defined by the values in the tic file prior to registration; in this project the output units are metres. Table 4.1 shows the RMS error values for each operator. Similar values for the Table RMS error produce different Map RMS errors due to rounding by the ARC/INFO transformation software. All operators were able to achieve RMS errors less than or equal to 0.003 except for Operator 3 who, after several attempts, could achieve an RMS of only 0.004.


Table 4.1. RMS error in table units and map units.

Operator    RMS Error, Table Units (inches)    RMS Error, Map Units (metres)
1           0.002                              1.978
2           0.003                              3.271
3           0.004                              4.562
4           0.002                              2.163
5           0.002                              2.321
6           0.001                              1.276
7           0.002                              2.072
8           0.002                              2.246


The perpendicular error distances (Figure 2.2) between the point mode version of the sample line and the stream mode version of the sample line for each operator were used in the calculation of the mean, maximum, and median deviations. Table 4.2 shows the mean, maximum, and median deviations for each operator.

Using the raw (signed) values should produce a mean close to zero, since error on one side of the "control" line should equal error on the other side. A mean distant from zero would suggest: (1) that the operator spent more time on one side of the line; (2) that the operator may have had trouble digitizing the feature, especially curves; and/or (3) that the operator had a tendency to undercut or overshoot curves in a particular direction. Absolute values of the deviations were used in the calculations. Table 4.2 shows the digitizing error for each operator. There may be some error introduced through calculations by Statistica, but the amount, if any, is unknown.

4.2.3 Test of Independence

Spearman's Rank Order Correlation was applied to each operator's registration error and each measure of digitizing error. Using the mean digitizing error, Spearman's r value was -.357 with a p level of .385. The maximum digitizing error had a Spearman's r value of -.595 with a p level of .120. The median digitizing error had a Spearman's r value of -.524 with a p level of .183. Because none of these correlations is statistically significant at the .05 level, it is concluded that the digitizing error and registration error are independent.
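The correlation coefficients can be reproduced from the values in Tables 4.1 and 4.3. The sketch below uses the rank-difference form of Spearman's coefficient, which assumes no tied values (true for this data); it does not compute the p levels, and the function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation from squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Map-unit registration RMS error per operator (Table 4.1, metres).
registration = [1.978, 3.271, 4.562, 2.163, 2.321, 1.276, 2.072, 2.246]
# Digitizing error per operator (Table 4.3, metres).
digitizing = {
    "mean": [13.230, 7.876, 6.278, 9.206, 6.196, 9.176, 5.571, 9.444],
    "maximum": [49.305, 31.441, 20.042, 33.124, 20.323, 31.693, 26.566, 39.018],
    "median": [11.362, 6.468, 5.317, 7.621, 5.756, 7.529, 4.797, 6.511],
}
for name, values in digitizing.items():
    print(name, round(spearman_rho(registration, values), 3))
```

With only eight operators, even the moderate negative coefficients obtained here fall short of conventional significance thresholds.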

4.3 Total Epsilon Calculation

The total epsilon distance is the sum of the digitizing epsilon, registration epsilon and the inherent map error epsilon. Table 4.3 shows the calculation of the total epsilon


Table 4.2. Statistics based on perpendicular error distance between the point mode digitized line and the stream mode digitized line.

Operator    # of Points    Mean Deviation    Median (m)    Maximum (m)


Table 4.3. Calculation of total epsilon distances (metres).

Operator                        Maximum      Mean         Median
                                Deviation    Deviation    Deviation
                                Epsilon      Epsilon      Epsilon

1         Registration Error     1.978        1.978        1.978
          Map Error             25.000       25.000       25.000
          Digitizing Error      49.305       13.230       11.362
          Total Epsilon         76.283       40.208       38.340

2         Registration Error     3.271        3.271        3.271
          Map Error             25.000       25.000       25.000
          Digitizing Error      31.441        7.876        6.468
          Total Epsilon         59.712       36.147       34.739

3         Registration Error     4.562        4.562        4.562
          Map Error             25.000       25.000       25.000
          Digitizing Error      20.042        6.278        5.317
          Total Epsilon         49.604       35.840       34.879

4         Registration Error     2.163        2.163        2.163
          Map Error             25.000       25.000       25.000
          Digitizing Error      33.124        9.206        7.621
          Total Epsilon         60.287       36.369       34.784

5         Registration Error     2.321        2.321        2.321
          Map Error             25.000       25.000       25.000
          Digitizing Error      20.323        6.196        5.756
          Total Epsilon         47.644       33.517       33.077

6         Registration Error     1.276        1.276        1.276
          Map Error             25.000       25.000       25.000
          Digitizing Error      31.693        9.176        7.529
          Total Epsilon         57.969       35.452       33.805

7         Registration Error     2.072        2.072        2.072
          Map Error             25.000       25.000       25.000
          Digitizing Error      26.566        5.571        4.797
          Total Epsilon         53.638       32.643       31.869

8         Registration Error     2.246        2.246        2.246
          Map Error             25.000       25.000       25.000
          Digitizing Error      39.018        9.444        6.511
          Total Epsilon         66.264       36.690       33.757


table shows that experienced operators introduce less error than beginners. Registration error is not affected by experience; it is simply the ability of the operator to place the crosshair of the cursor on a point. Stream mode digitizing is a psychomotor event; operators must have good eye-hand coordination. Table 4.4 provides a closer examination of the standard deviation, listing the standard error and confidence limits for each operator and for all the operators as a group.

4.4 Discussion

4.4.1 Assessment of the Three Measures

The maximum deviation represents the largest error that would occur. An epsilon band based on this value will contain the largest percentage of the "control" line. However, an epsilon based on this value will be quite large. The maximum deviation is that small part of the digitized line that may have occurred when the operator lost concentration and had trouble following the line, or from the jerking motion that may occur as the cursor sticks moving over the table. The median deviation is a crude index of central tendency that excludes the extremes at either end of the scale. The median epsilon will tend to show the smallest epsilon band width but contain a lower percentage of the line. The mean deviation is a better measure of central tendency in which all values are taken into account.


Table 4.4. Examination of standard deviation digitizing error for individual operators and the operators as a group.

Operator    # of Points    Mean Deviation    Standard Deviation    Standard Error    Confidence Limits (95%)


Therefore, users will have to choose among the three measures depending upon their objectives and the strengths and weaknesses of each measure. If a user requires greater certainty that the largest amount of the line is contained within the epsilon band width, then the maximum deviation epsilon would be chosen, although there will be greater uncertainty about the true position of the line with that measure. If the user wishes to reduce the uncertainty regarding the position of the line, then the median epsilon is best, although a smaller percentage of the line will be contained within the epsilon band.

Clearly the maximum deviation allows the most variability of the digitized line relative to the "control" line, giving the widest area of uncertainty. The line must occur somewhere within that range. If the median deviation is used, a mistake may occur since 50% of the range is ignored. Users will not normally use this measure for that reason. The mean and standard deviation are statistical measures that provide the best estimate of the accuracy of the digitized line. Researchers or organizations that produce digital data and wish to provide a statistical measurement of the positional accuracy of that data can use this approach.

This thesis used a scale of 1:50 000 as this is a common scale used for digitizing. The relative importance of scale must be mentioned. At global scales, error from digitizing and other sources is not usually a concern. At more detailed scales, error becomes an important issue.

4.4.2 Improvements to the Methodology

Although the test was conducted on a group of 8 subjects, providing a sample of approximately 2,000 points, a larger sample is likely necessary to provide a better profile of digitizing error, particularly if stratified according to the experience of the operators.


experience. Furthermore, each operator in this test was not fatigued, i.e. they had not been digitizing before they did the test. A better approach would be to have the operators digitize the sample line, spend some time digitizing a map and then digitize the sample line again. This would provide more realistic data as it would better represent the digitizing process.

4.4.3 Areas of Further Research

The objective, as defined in Chapter 1, "to construct a method for estimating the amount of uncertainty introduced when analog maps are converted to digital data in a vector based GIS" has been achieved. The methodology was developed and applied to a set of data to establish the feasibility of the application. There are, however, important issues that have arisen in considering the application of this research.

1. Although these measures can be readily applied with today's GIS software, the software does not readily allow for any measures of accuracy to be stored with the data set. There would have to be a written record of this information or a text file that could accompany the data, recording the original information and subsequent modifications to the data set.

2. One readily apparent problem with the interpretation of spatial polygonal data is that lines are often seen as a hard edge, such as the lines representing boundaries on a soil map. However, in most cases, there is a buffer or transition zone between polygons that is not well interpreted by an infinitely thin boundary line. The epsilon band can


incorporating this transition zone in decision-making.

3. While the epsilon band gives a uniform band about a line, error introduced from digitizing is not uniform. Future research should consider the direction of digitizing as well as the sinuosity of the feature being digitized. For example, straight-line segments should have less error than curved segments. Furthermore, a variable width epsilon would provide a better estimate of error. Operator characteristics could be measured based on the direction of digitizing and the quantity of error that is introduced. Once the digitizing characteristics of operators have been quantified, the data set can be altered based on those values. For example, if the operator has a tendency to undercut right to left curves by a certain average distance, a transformation can be developed and applied that will alter all right to left curves by the specified amount.

4. If epsilon error information is stored with a map data set, concerns arise as to the disposition and interpretation of that information with each subsequent overlay of that map with other maps. What happens to the epsilon bands during map overlay? If one data set has epsilon bands and the other does not, are the bands removed or are they applied to the output and in what ways are they modified? If both data sets have epsilon bands, which bands take precedence when lines from each data set represent the same feature? For example, if a soils data set was overlain with a hydrology data set, shore and river boundaries would be present in each data set, but which shore and river boundaries are to be used in the output?


Appendix A. Sample line used for digitizing based on hand drawn line at 1:50 000.


LITERATURE CITED

Abler, R.F. 1987. "The NSF NCGIA". International Journal of Geographic Information Systems, 1(4):303-326.

Amrhein, C.G. and Griffith, D.A. 1991. "A Model for Statistical Quality Control of Spatial Data in a GIS", in Proceedings, GIS 91, Canadian Conference, pp.91-103.

Aronoff, S. 1989. Geographic Information Systems: A Management Perspective. WDL Publications, Ottawa, Canada.

Bailey, R.G. 1988. "Problems with Using Overlay Mapping for Planning and Their Implications for Geographic Information Systems". Environmental Management, 12(1):11-17.

Blakemore, M. 1984. "Generalisation and Error in Spatial Data Bases". Cartographica, 21(2+3):131-139.

Bolstad, P.V., Gessler, P. and Lillesand, T.M. 1990. "Positional uncertainty in manually digitized map data". International Journal of Geographic Information Systems, 4(4):399-412.

Burroughs, P.A. 1986. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon Press, Oxford.

Chrisman, N.R. 1982a. "Methods of Spatial Analysis Based on Error in Categorical Maps", unpublished Ph.D. dissertation, University of Bristol.

Chrisman, N.R. 1982b. "A Theory of Cartographic Error and Its Measurement in Digital Data Bases". Proceedings Auto-Carto 5, Environmental Assessment and Resource Management, Foreman, J. (ed), American Society of Photogrammetry and American Congress of Surveying and Mapping, pp.159-168.

Chrisman, N.R. 1984. "The Role of Quality Information in the Long-Term Functioning of a Geographic Information System". Cartographica, pp.521-529.

Chrisman, N.R. 1989. "Error in Categorical Maps: Testing versus Simulation". Auto-Carto 9, pp.521-529.

Douglas, D.H. and Peucker, T.K. 1973. "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature". Canadian Cartographer, 10:112-122.

Page 46: MEASURING ERROR IN MANUALLY DIGITIZED MAPS · 2005-02-02 · MEASURING ERROR IN MANUALLY DIGITIZED MAPS A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial

error in digital databases of land use: an empirical study". International Journal of Geographic Information Systems, 4(4):385-398.

Dutton, G. 1992. "Handling Positional Uncertainty in Spatial Databases". Proceedings, 5'h International Symposium on Spatial Data Handling, pp.460-469.

Energy, Mines and Resources 1976. "A Guide to the Accuracy of Maps". Technical Report Series, Ottawa.

Fisher, P.F. 1987. "The Nature of Soi1 Data in GIS --Error or Uncertainty". IGIS Symposium: The Research Agenda, Vol. 3, pp.307-3 18, Arlington, VA. NASA.

Goodchild, M.F. 1988. "The Issue of Accuracy in Global Databases", in Building Databases for Global Science, Mounsey, H. and Tomlinson, R.F. eds. Taylor and Francis.

Goodchild, M.F. 1991. "Keynote address". Proceedings, Symposium on Spatial Database Accuracy. Department of Surveying and Land Information, University of Melbourne, pp. 1 - 16.

Goodchild, M.F. 1993. "Data Models and Data Quality: Problems and Prospects", in Environmental Modeling with GIS, Goodchild, M.F., Parks and Seyaert Eds. Oxford University Press.

Goodchild, M.F. 1996. "Generaiization, Uncertainty, and Error Modeling". GlSILIS '96, pp. 765-774.

Goodchild, M.F., and Dubuc, 0. 1987. "A Mode1 of Error for Choropleth Maps with Applications to GIS". Auto-Carto 8, pp. 165- 174.

Honeycutt, D.M. 1985. "Epsilon, Generalization, and Probability in Spatial Data Bases", Research Paper, Dept. of Geography, UCSB

Hudson, D. 1988, "Some Comments on Data Quality in a GIS". Technical Papers, ACSM-ASPRS Annual Convention, Volume 2.

Jenks, G.F. 198 1. "Lines, Cornputers, and Human Frailties". AAAG, 7 l(1): 1 - 10-

Keefer, B.J., Smith, J.L. and Gregoire, T. G. 1988. "Simulating Manual Digitizing Error with Statistical Models". GISLIS '88, pp.475-483.

Keefer, B.J., Smith, J.L. and Gregoire, T. G. 1991. "Modeling and Evaluating the Effects of Stream Mode Digitizing Errors on Map Variables". Photogramrnetric Engineering and Remote Sensing, 57(7):957-963.

Page 47: MEASURING ERROR IN MANUALLY DIGITIZED MAPS · 2005-02-02 · MEASURING ERROR IN MANUALLY DIGITIZED MAPS A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial

Klinkenberg, B. and Xiao, Y., 1990. "Some Conceptuai Definitions in Error Analysis i n GIS". GIS '90, Canadian Symposium, pp. 1 124- 1 130, CISM.

Maffini, G., Arno, M. and Bitterlich. W. 1989. "Observations and cornments on the generation and treatment of error in digital GIS data". Accuracy of Spatial Databases, Goodchild, M.F. and Gopal, S. Eds. Taylor and Francis.

Marble, D. 1996. Persona1 Communication.

Mark, D.M. and Csillag, F. 1989. "The Nature of Boundaries on 'Area-Class' Maps". Cartographica, 26(1):65-78.

Mead, D. A. 1982. "Assessing data quality in geographic information systems" in Remote sensing for resource management, Johannsen, C.J. and Sander, J.L. (eds.), Ankeny, Iowa.

Muller, J.C. 1 992. "Towards an Integrated Cartographie Research Mode1 : Suggestions and Priorities". Computers, Environment and Urban Systems, 16:249-259.

Openshaw, S. 1989. "Learning to live with errors in spatial databases" in Accuracy of Spatial Databases, pp.263-276.

Otawa, T. 1987. "Accuracy of Digitizing: Overlooked Factor in GIS Operations" in GIS '87, pp.295-299.

Perkal, J. 1956. "On the epsilon length". Bulletin de 1'Academie Polonaises des Sciences, 4(7): 399-403.

Rogowski, A.S. 1995. "Quantifying soi1 variability in GIS applications 1. Estimates of position". International Journal of Geographic Information Systems, 9(1):81-94.

StatSoft. 1994. CSS (Complete Statistical System): Statistica. StatSoft Incorporated, Tulsa, Oklahoma.

Star, J. and Estes, J. 1990. Geographic Information Systems: An Introduction. Prentice Hall, Englewood Cliffs, New Jersey, 1990.

Thapa, K. and Bossler, J. 1992. "Accuracy of Spatial Data Used in Geographic Information Systems". Photogrammetric Engineering and Remote Sensing, 58(6):835-84 1.

Traylor, C.T. 1979. "The evaluation of a methodology to measure manual digitizing error in cartographie data bases" unpublished Ph.D. dissertation, University of Kansas

Page 48: MEASURING ERROR IN MANUALLY DIGITIZED MAPS · 2005-02-02 · MEASURING ERROR IN MANUALLY DIGITIZED MAPS A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial

- . 89- 12, Santa Barbara, california.

Veregin, H. 199 1. "GIS Data Quality Evaluation For Coverage Documentation Systems". Report for the Environmental Protection Agency, Las Vegas, Nevada.

Veregin, H. 1993. "Quality assurance for GIS databases". Research in Contemporary & Applied Geography: a Discussion Series - State University of New York at Bingharnton, v17 n2, 18 pp.

Veregin, H. 1994. GIS Quaiity Assurance Research. Lockheed Engineering and Sciences Company/ Environmental Monitoring Systems Laboratory, US Environmental Protection Agency.

Vonderohe, A.P. and Chrisman, N.R. 1985. "Tests to Estabhh the Quality of Digital Cartographie Data: Some Examples From the Dane County Land Records Project" in Proceedings Auto-Carto 7, pp.552-559.

Woodward, D. 1992. "The Representation of the World" in Geography's Inner Worlds: Pervasive Themes in Contemporary American Geography, Abler, R.F., Marcus, M.G. and Olson, J.M., Rutgers University Press, New Brunswick, NJ, 50-73.
