CONTEXT SENSITIVE TRANSFORMATION OF GEOGRAPHIC …190847/FULLTEXT01.pdf · This research is...

I

CONTEXT SENSITIVE TRANSFORMATIONOF GEOGRAPHIC INFORMATION

Ola Ahlqvist

II

Context sensitive transformation of geographic information

Ola AhlqvistDepartment of Physical GeographyStockholm University

ISSN 1104-7208ISBN 91-7265-213-6© Ola Ahlqvist 2000

Cover figure: Aspects of semantic uncertainty in a multiuser context

Mailing address:Department of Physical GeographyStockholm UniversityS-10691 StockholmSweden

Visiting address:Svante Arrhenius väg 8

Telephone:+46-8-162000Facsimile:+46-8-164818

III

ABSTRACT

This research is concerned with theoretical and methodological aspects of geographic informationtransformation between different user contexts. In this dissertation I present theories and methodologicalapproaches that enable a context sensititve use and reuse of geographic data in geographic informationsystems.

A primary motive for the reported research is that the patrons interested in answering environmentalquestions have increased in number and been diversified during the last 10-15 years. The interest frominternational, national and regional authorities together with multinational and national corporations embracea range of spatial and temporal scales from global to local, and from many-year/-decade perspectives to realtime applications. These differences in spatial and temporal detail will be expressed as rather differentquestions towards existing data. It is expected that geographic information systems will be able to integrate alarge number of diverse data to answer current and future geographic questions and support spatial decisionprocesses. However, there are still important deficiencies in contemporary theories and methods forgeographic information integration

Literature studies and preliminary experiments suggested that any transformation between different users’contexts would change either the thematic, spatial or temporal detail, and the result would include someamount of semantic uncertainty. Consequently, the reported experiments are separated into studies of changein either spatial or thematic detail. The scope concerned with thematic detatil searched for approaches torepresent indiscernibility between categories, and the scope concerned with spatial detail studied semanticeffects caused by changing spatial granularity.

The findings make several contributions to the current knowledge about transforming geographicinformation between users’ contexts. When changing the categorical resolution of a geographic dataset, it ispossible to represent cases of indiscernibility using novel methods of rough classification described in thethesis. The use of rough classification methods together with manual landscape interpretations made itpossible to evaluate semantic uncertainty in geographic data. Such evaluations of spatially aggregatedgeographic data sets show both predictable and non-predictable effects. and these effects may vary fordifferent environmental variables.

Development of methods that integrate crisp, fuzzy and rough data enables spatial decision supportsystems to consider various aspects of semantic uncertainty. By explicitly representing crisp, fuzzy and roughrelations between datasets, a deeper semantic meaning is given to geographic databasses. The explicitrepresentation of semantic relations is called a Geographic Concept Topology and is held as a viabletool for context transformation and full integration of geographic datasets.

Key words:Geographic information, geographic context, semantic models, conceptual models, interoperability,uncertainty, scale, classification , rough sets, fuzzy sets, decision support, uncertainty

V

ACKNOWLEDGEMENTS

Apart from the support granted through the Department of Physical Geography at Stockholm University, thisresearch was made possible by funding from the Swedish Council of Building Research (BFR), the SwedishBoard for Co-ordination of Research (FRN), the Swedish Board for Engineering Sciences (TFR), theSwedish Board for Technological and Industrial Development (NUTEK), the Swedish Board forCommunications Research (KFB) and the Swedish Defense Research Establishment (FOA) through theCentre for Geoinformatics, Stockholm. Additional funding from Knut and Alice Wallenberg's foundationthrough the Sweden-America foundation and from Lillemor and Hans W:son Ahlmanns fund for Geographicresearch made possible my fellowship at NCGIA, University of California Santa Barbara, during the finalyear of this project.

Personally, I would like to thank my supervisor, ass. Prof. Wolter Arnberg. His never ending enthusiasm,his sharp intellect and him being at least ten years ahead of mainstream geographic information science hasmeant a lot to my work on this thesis.

Rolf Ruben has also been a companion and discussion partner throughout this work and I owe him a lotfor his concern for important details and all the interesting data he has provided for this treatise. I also wantto thank Ivonne Anders, Jonas Lind, and Sabine Aflerbach for helping me with some of the figures and maps.

To all staff, faculty and students who courteously hosted my 11-month visit at the National Center forGeographic Information & Analysis, Santa Barbara I direct my deepest thanks. I especially want to thankProf. Mike Goodchild, LaNell Lucius, Abby Caschetta, Christian Brown, and Sum Huynh for yourinvaluable help.

To my loving parents Sture and Vera, I owe more then can be expressed by words, thank you foreverything. Finally to my family, my beautiful wife Åsa, and our wonderful sons, Mathias, Jakob and Isak.You are the center of my universe.

VII

Contents

CONTEXT SENSITIVE TRANSFORMATION OF GEOGRAPHIC INFORMATION .........................I

ABSTRACT................................................................................................................................................... III

ACKNOWLEDGEMENTS ........................................................................................................................... V

1. FRAMEWORK AND OBJECTIVES........................................................................................................ 1

Introduction.................................................................................................................................................... 1Framework..................................................................................................................................................... 1Scope ............................................................................................................................................................. 2Objectives ...................................................................................................................................................... 2Approach and thesis structure ........................................................................................................................ 2Expected scientific contribution..................................................................................................................... 3

2. ENVIRONMENTAL MANAGEMENT AND INFORMATION SOURCES ........................................5An applied perspective................................................................................................................................... 5Models of the real world ................................................................................................................................ 6Real world models and Geography................................................................................................................ 8Real world models and geographic information systems............................................................................. 11

3. DETAIL IN GEOGRAPHIC INFORMATION MODELS ................................................................... 17

Quality – Detail, Granularity, Accuracy, Fitness-for-use and Uncertainty .................................................. 17Spatial granularity........................................................................................................................................ 20Temporal granularity ................................................................................................................................... 23Categorical granularity................................................................................................................................. 24Knowledge representation and uncertainty .................................................................................................. 27A geographical concept topology ................................................................................................................ 29

4. DISCRETE METHODS FOR INTERPRETATION OF LANDSCAPE INFORMATION............... 33

Introduction.................................................................................................................................................. 33Description of the study area ....................................................................................................................... 35The National Forest Inventory ..................................................................................................................... 37Material and methods – the R-method ......................................................................................................... 38Results – R-data ........................................................................................................................................... 50Material and methods – Structural data; classification of economic map sheets according to landscape type..................................................................................................................................................................... 53Results – Structural data .............................................................................................................................. 55Discussion and conclusions ......................................................................................................................... 55

5. PILOT STUDIES....................................................................................................................................... 63

Introduction.................................................................................................................................................. 63Pilot study 1 ................................................................................................................................................. 63Pilot study 2 ................................................................................................................................................. 71Overall discussion and conclusions ............................................................................................................. 73Acknowledgements...................................................................................................................................... 74

6. ROUGH CLASSIFICATION AND ACCURACY ASSESSMENT ...................................................... 75

Introduction.................................................................................................................................................. 75

7. ESTIMATING SEMANTIC UNCERTAINTY IN LAND COVER CLASSIFICATIONS ................ 99

Introduction.................................................................................................................................................. 99Experimental design: Conceptual discussion............................................................................................... 99Methods ..................................................................................................................................................... 100Results........................................................................................................................................................ 111Discussion and conclusions ....................................................................................................................... 117

VIII

8. CRISP, FUZZY AND ROUGH DECISION SUPPORT IN GIS......................................................... 121

Introduction ............................................................................................................................................... 121Method development ................................................................................................................................. 122Experiment ................................................................................................................................................ 123Discussion and conclusions ....................................................................................................................... 130

9. CONCLUSIONS...................................................................................................................................... 135

Summary of findings and objective fulfillment ......................................................................................... 135Conclusions ............................................................................................................................................... 136

REFERENCES ............................................................................................................................................ 137

Framework and objectives • 1

FRAMEWORK AND OBJECTIVES

Our truth is the intersection of independent lies.RICHARD LEVINS

Introduction

Current decision making with an expandingamount of information to take into considerationcalls for an effective information service. Thedevelopment of computer technology has impliedconsiderable changes of work routines as well asan improved efficiency in a number of sectors.Computerized systems have the ability to handlelarge sets of information, which could assist themental, human parts in completing the decisionprocess. Future systems for decision support areexpected to give quick overviews and extractnecessary information based on questions,available facts and other considerations given.This would give the opportunity to concentratethe human resources on the overall visions anddecisions.

A computerized treatment of geographicdatasets is today made possible throughcommercial Geographic Information System-packages. The use of these software is howeverhampered by lack of information on data quality,the functions and processes included in the dataand relations between the data. Variations intemporal and spatial scale are another majorbottleneck when trying to integrate different datain a Geographic Information System.

One of the reasons why all these obstaclesemerge when using a Geographic InformationSystem is that the potential sources of informationare so diverse. Given a certain location we mayhave to deal with material from a detailed level uptowards highly generalized levels of information,each developed for a specific purpose andassembled in different ways. The use ofGeographic Information System as a tool tohandle all this data has been suggested for sometime now.

The theoretical base for how to treat highlydiverse data properly in an integrated fashion hasnot been developed as quickly as the technicaltools available. In geography there is no suchthing as a single representation of the world thatincorporates every possible viewpoint. This is of

fundamental importance and must be consideredwhen we organize spatial data for integrated usein a Geographic Information System

From efforts to integrate geographic datasetsin analyses from local to global scales, in whichgeneralization constitutes one important process,we may conclude that we still lack a firmtheoretical and methodological basis for thisprocess (Wilkinson, 1998; Devogele et al., 1998;Thomlinson et al., 1998; Van Beurden andDouven, 1999). Increasing amounts of availabledata at increasingly better levels of detail give ustheoretically an almost infinite possibility tochoose at what spatial, temporal and thematicresolution we perform geographic analysis. Thisis a fairly recent turn into a data rich situationwhere each implementation raises some importantquestions.

The problem outlined above, indicates asubstantial gap between geography andcontemporary use of geographic informationsystems. This is used as an outset for this thesis.

Framework

This thesis project was initiated through theSwedish Centre for Geoinformatics as one ofseven research foci carried out as PhD-studentprojects. The original title of this project was"Knowledge based digitisation of thematic maps”.This title wanted to emphasise that map reading isan intellectual process and as such would requirea context-sensitive digitisation for further use asinformation in a Geographic Information System.The main goal of the project was formulated:“…to find a model how to digitise map symbolstogether with the mapping model so that thecontext can be exploited in a GIS.” An alternativeinterpretation of this goal is to look for an inverseto the mapping process in order to achieve moreeffective reuse of data in different situations.

As a PhD-student I was given relatively freehands to interpret this project focus into aresearch plan according to my own understandingof the problem. Coming from six years ofprofessional practice within local and regional

Chapter

1

2 • Framework and objectives

environmental planning and management, Inaturally projected the research question ontothese experiences. Together with my supervisor,ass. Prof. Wolter Arnberg, I also developedseveral contacts with other researchers and one ofthe more imortant ones has been the involvementin the “Sustainable Landscapes” project.

The Swedish research program RemoteSensing for the Environment (RESE) hashighlighted the landscape perspective in theproject “Sustainable Landscapes”. In landscapestudies a major concern is to integrate variablesthat depict structure and composition as well asoperative processes within the landscape. As suchthe landscape as it is treated by the “SustainableLandscapes” project seemed to provide a suitabletestbed for the development of a conceptualmodel for geographic information handling. Thisdissertation may not show any concrete evidencefrom this collaboration but many of thediscussions and work by other members withinthis group have certainly influenced my work.

Scope

This research is concerned with theoretical andmethodological aspects of geographic informationtransformation between different contexts. Myown academic and professional experience hasaffected this scope in two important ways:1. Examples and discussions are restricted to

certain parts of biology and earth sciences,mainly within the realm of ecologicalgeography and landscape ecology.

2. The research questions are formulated froman application oriented view, emanatingfrom my own experience of current practicein regional and local environmentalmanagement.

Thus, worked examples mainly concerninformation with relevance to managerial issuesof nature conservation, such as the localimplementation of global conventions on forexample biodiversity (UNEP, 1992) andsustainable development (WCED, 1987). Testsuses information from vegetation maps, scalar,ordinal and categorical variables interpreted frommaps and aerial images, and continuous data fromdigital elevation models. Findings are expected tobe applicable to situations where any sort ofcategorization is applied to geographic data.

Objectives

The main goal of this study is to enable a contextsensitive use and reuse of geographic data. In

other words to make it possible to organizegeographical information of different origin insuch a way that this information can be used atother levels of scale and detail and in othercontexts than those used to assemble theinformation. To reach this general goal it has beenbroken down into a handful of objectives towardwhich focused efforts have been directed:• To review both theoretical and

methodological aspects of integratinggeographic data.

• To identify important deficiencies or gaps incontemporary theories and/or methods forgeographic data integration.

• To identify approaches that considergeographic context information.

• To suggest a feasible solution to support acontext sensitive use of existing geographicdata.

• To demonstrate an application of a contextsensitive integration of geographic data.

These objectives are to be interpreted within aframework of computerized geographic analysis.An important outset is the current ambition ofgeographic information science that tries tointegrate geography, philosophy, physics andmathematics with the realms of cognitive andsociocultural sciences (Couclelis, 1999). Locatedwithin an admittedly complex intersection ofseparate sciences this work does not try todevelop the general theory of spatiotemporalphenomena. I do however detail some importantmeans of improving methods for transfer ofgeographic information between different usercontexts.

Approach and thesis structure

The dissertation is divided into several chapterseach addressing one or several of the researchobjectives. The chapter organization intends tolead the reader through a logical order ofargumentation and findings.

This first chapter provides an overallintroduction to the problem as well as thebackground for the study. It also intends to definethe limits of the presented research and to give ageneral overview of the thesis.

To address the main goal of this dissertation Irecognized early in my preliminary studies theproblems that multiple world views will imposeon any change of spatial or categorical detail orany effort to translate information from onecontext into another. These preliminaries are

Framework and objectives • 3

mainly articulated in chapter 2 and 3, whichdescribe some of the problems associated with acomputer-assisted analysis of geographicinformation. Chapter 4 and 5 were also part of thepreliminary studies. The suite of chapters from 2through 5 has been revised later on, and especiallychapters 2 and 3 have been continuously updatedduring the entirety of this project.

Early in my work with this dissertation I alsohad to formulate an experimental design that wassuitable for investigating transformation betweendifferent user’s contexts. From the preliminarystudies it seemed reasonable to assume that anysuch transformation would either change thethematic, spatial or temporal detail. Iconsequently decided to perform experiments ondata that could isolate effects caused by eitherchange in spatial or thematic detail. The limitedamount of previous research on temporal aspectsas well as the limitations in time for a dissertationled to a very restricted treatment of this dimensionin my studies.

The continued preliminary work included dataassemblage and two case studies presented inchapters 4 and 5. The findings from these studiesboth confirmed that the general experimentaldesign gave interesting results and they alsocalled for a methodology to handle categoricaluncertainty. The continued studies thereforefollowed two parallel trails. One concerned withcategorical detatil that searches for approaches torepresent indiscernibility between categories,reported in chapter 6, and one concerned withfurther studies of effects caused by changingspatial detail, reported in chapter 7. Finally, inchapter 8, I pull together the initial discussionsfrom chapters 2 and 3 with some of myexperimental results to demonstrate a combinationof map algebra with different extensions of settheories to define semantically certain, graded andindiscernible relations between geographicconcepts.

Expected scientific contribution

I hold the most important contribution of thisdissertation to be the Geographic ConceptTopology construction. This is theoreticallyestablished in chapter 3 and demonstrated inchapter 8. Although still unverified in a widersetting, I claim that this structure enables anexplicit representation of semantic relations forgeographic concepts. In addition I propose that aGeographic Concept Topology can be used as a

primary tool for a context sensitive transformationof geographic information. The GeographicConcept Topology acknowledges that differentspatial representations may be used in concert andit is capable of handling important aspects ofsemantic uncertainty simultaneously. Still, thefeasibility of the Geographic Concept Topologyframework remains to be tested in a widerpractical situation with large amounts of diverse,real data.

Furthermore, the Geographic ConceptTopology serves as a first suggestion to formalizethe due process and boundary object ideas firstproposed by Star (1989) and introduced to thewider geographic decision and planningcommunity by Harvey and Chrisman (1999).These notions are fully explained in chapters 2and 3 but the actual achievement is the connectionwith the Geographic Concept Topology construct,as a concrete example of the ideas of “dueprocess” and “boundary objects”.

Geographic information science has onlyrecently directed its interest towards the full suiteof uncertainty aspects possible in geographicinformation. Among the least researched parts areuncertainty related to poorly defined objects orconcepts, yet these are very common in thegeographic discipline (Fisher, 1999). I view theresearch reported in chapters 6 and 8 as importanttheoretical foundations for further development ofgeneral considerations of imprecision ingeographic information.

These findings and the experimental design inchapter 7 enabled the investigation of variousaspects of semantic accuracy in a geographic dataset. These aspects have only recently beenacknowledged and so far a very limited amount ofresearch on appropriate methods for measuringsemantic quality aspects of geographic data hasbeen conducted. In chapter 7 I describe a testdesign that uses manual interpretations atdifferent resolutions. This design makes itpossible to detect generalization effects other thanpurely statistical, and this has recently beenacknowledged to be a neglected and problematicpart of a quality report for geographic data(Weibel and Dutton, 1999).

4 • Framework and objectives

Environmental management and information sources• 5

ENVIRONMENTAL MANAGEMENT ANDINFORMATION SOURCES

Uncertainty sends the brave on the trail of discoveryand the coward on the route of the herd.

DARTWILL AQUILA, http://www.bentarz.se/me/dartwill.htm

An applied perspective

During the last 10 years or so the concept ofspatial decision support systems (SDSS) hasevolved to improve the performance of decisionmakers and managers when they confront semistructured spatial decision problems (Malczewski,1999, p.277). Still the application of computerizedgeographical analysis is to many people anoverwhelming task. Given a local authority, thedepartment responsible for natural conservationmay wish to use the information produced bysome other department. Or it may even want tocompare a new survey with an older one toidentify changes in the vegetation. Besidesproblems of getting data to match into thegeographic information system currently used bythe department some profound questions will bearticulated sooner or later. At what scale can weuse this information? Do we need to performsome kind of generalization on these data? And ifso, what generalization method should we use?And how accurate is the result? Some of thesequestions have been tackled to some extent butsufficient knowledge is still lacking to be able torecommend a standardized set of methods.Notably the issue of reliability or quality hasreceived some well-deserved attention lately. Onerequirement is of course to minimize the error inthe final output, but from an informationalviewpoint we also need to make sure that theinformation is carried through the analysisprocess without being distorted in terms of thesemantic content, the meaning of the data.

In the beginning of the work I came from anapplied environment. Following myundergraduate education I was employed by theÅland Landskapsstyrelse, office for regionalplanning, to develop an environmental databasewith geographic references. The database wasdeveloped using the PC-network based Paradox

software with loose coupling to a custom graphicssoftware. The main issues were database designand integration of information from separateoffices within the administration. This work gaveme a thorough introduction into the problems ofinformation sharing, database development,geographic data types and programming of userinterfaces. Also the problems related tohomogenization and integration of data fromdifferent users became evident to me.

Following this employment I held a substituteposition as ecologist within the municipaladministration of Järfälla, located just northwestof Stockholm. My main duties were themanagement of natural areas owned by themunicipality and the management of two largernatural reserves. This also included managementof the forest resources within the natural areas. Ialso had responsibility for nature conservationissues within the local planning process. In this Iparticipated in the development of newmanagement plans for the nature reserves as wellas a municipality-wide water management plan.Much work was performed using traditionalcartographic techniques and it includeddevelopment of new cartographic products as partof geographic analyses and presentations.

The projects I have been involved in oftenrequired the production of a map of some kinddepicting a situation of interest. One reportwanted to define and delineate ecologicallysensitive areas; another report included suggestedareas for water conservation and protectionpurposes. Each report reflected a specific purposeand a specific question.

The title of this PhD project was in thebeginning “knowledge based digitization ofgeographical information”. The idea was to see ifmaps and other existing data could be digitizedinto a computer using the knowledge of skilledexperts to enrich the database with some ‘extra’

Chapter

2

6 • Environmental management and information sources

information that made it possible to access thesedata for other purposes, to answer new questionsthan those used at the time of collecting thisinformation.

I started out to try to compare my own exam-work from my undergraduate studies, a vegetationmap (Ahlqvist and Wiborn, 1992), with anothervegetation map collected over the same area 15years earlier (Edberg, 1971). I found not onlygeometrical deviations, but also a large differencein classification systems, which specificallycaught my attention. Could these two maps everbe used to answer the question if the vegetationhad changed during this period? This questionturned out to be a very researchable one, and thework with this dissertation finally landed in astudy of translating the classification system ofthe new map into the old map producing twosemantically similar maps (chapter 8). Howsimilar they are is still a question, but the generalidea to convert information from one contextbackground into another has wide applications.

This and the next chapter will outline somegeneral factors influencing the process oftransforming geographic phenomena, or things“out there”, into computer representations. Firstof all I will treat the issue of how models of thereal world may be constructed as a way todescribe and understand the world that surroundsus. This review of previous research will almostimmediately acknowledge the second factor,which is the importance of the user context or thepurpose behind the construction of a certainmodel. The third factor, the mode of observationacts as a kind of mediator between the other two.The mode of observation articulates theobservational detail or granularity, which isdirectly related to the detail of the knowledge, andit also includes aspects of uncertainty in theobservation. This outline parallels work byCouclelis (1996) who sketched a similar divisionbetween factors that are in part responsible forhow we choose to create a computerrepresentation of a geographic entity. Toward theend of chapter 3 the discussion has bothsummarized previous research as well as thefindings in this dissertation. At that point therequirements for a context sensitivetransformation of imprecise geographical datahave been articulated together with somesuggested solutions to these requirements. This isthen finally brought into the proposedGeographical Concept Topology framework

capable of supporting a context sensitive use andreuse of existing geographic data

Models of the real world

How to understand the real world has been anissue ever since the development of life on earthbut from a shorter history-of-science perspectiveit seems to be a matter of “faith”. When scientiststry to make models of the real world they havedifferent ‘models’ or perceptions of the realworld, different ‘world views’. At a very broadlevel Johnston (1999) list three types of sciencemodels, or “faiths” – empirical, hermeneutic andcritical – that may be separated into a multitude ofseparate approaches with their own detailedexemplars, paradigm instances of how scienceshould be done.

The simplified illustration in Figure 1 is anattempt to illustrate the relation between the realworld, our perception of this as perceived reality,and the specification that ultimately leads to adata representation of the real world. From thegeographic literature it could be assumed that‘perceived reality’ or ‘abstract view of theuniverse’ (Salgé, 1995, David et al. 1996, Markand Frank, 1996) is formed through some kind ofunderstanding or modeling within the humanmind. Also, a separation can be distinguishedbetween a) a perceived reality which is inherentlyvirtual, represented by human knowledgestructures and b) a conceptual and logicalspecification which can be used to collect datainto a database and which is somehow a subset ofthe perceived reality. The perceived reality mightalso be termed ‘world view’. Since certain partsof every individual worldview are shared - bothcognitive and bodily - with other individuals,some authors suggest that this overlap might besynthesized into a ‘shared world’ (Gould, 1994).This notion corresponds with the idea ofexperiential realism discussed by Mark and Frank(1996) that is based on a real world (sharedworld) that people share mental experienceswithin.

To start with the question of “what exists?”there is a problematic and old philosophicalcontroversy between “plenum” and “atomic”ontologies (Couclelis, 1992; Couclelis, 1999). Isthe world made up by discrete objects (atomic) oris it a continuum of named attributes (plenum)?Ontology is the branch of metaphysics that dealswith the nature of being. The term has during thelast five years or so been used in the geographic

Environmental management and information sources • 7

information science literature where its meaningranges from the metaphysical science of being, tothe more computer scientific view that ontology isa formalization of how to represent objects andconcepts and their interrelations within an area ofinterest. These different interpretations led Smith(1998) to make a terminological distinctionbetween R-ontology (referent) and E-ontology(epistemological). R-ontology refers to a theoryabout how a given referent-domain is structured,what sorts of entities it contains, relations and soon. This relates mostly to the short introductionabove. E-ontology on the other hand is a theoryabout how a given individual or group orlanguage or science conceptualizes a givendomain. It follows from that definition that thereare as many proper E-ontologies as there areconceptualizations, and it is this type of ontologythat will be dealt with in this dissertation.

So, how does ontology take us any further?The experiential or cognitive perspectiveadvocated by Mark and Frank (1996) suggeststhat humans deal with categories in a way thatdepart in a few fundamental ways from thetraditional set-theoretic view that until recentlyhas been the dominating idea for a formalizedtreatment of geographic information. I have nointention to go deeper into the philosophical orpsychological sciences and theories of knowledge(epistemology). I will instead follow one of

Pawlak’s (1991) propositions and hold knowledgeas being deep seated in the abilities of humanbeings and other species to classify anything;(apparently) real things, states, processes,moments of time and all other more or lessabstract concepts we can think of. By thisdefinition, knowledge is necessarily connectedwith the classification patterns related to specificparts of the real or abstract world and seen fromthe opposite direction classification is one of thefundamental tools of science (Mark, 1993).Knowledge thus consists of a family ofclassification patterns (conceptualizations or E-ontologies) of a domain of interest, which provideexplicit facts about reality – together with areasoning capacity able to deliver implicit factsderivable from explicit knowledge (Sowa, 1999).By a classification or conceptualization I meanany subdivision or partition of a real or abstractworld using concepts and it is assumed from hereon that classification is used to create categorieswhich are also assumed to be basic “buildingblocks” of knowledge.

The terms ‘category’, ‘class’ and ‘concept’ areheld synonymous although the common use of‘class’ within computer implementations makethis term ambiguous for this discussion and is thereason for me to prefer ‘category’ or ‘concept’ inthis treatise. The term ‘entity’ refers to instancesof concepts in the real world and as aconsequence of Figure 1 that will be instances ofconcepts in the perceived reality. The related term‘object’ refer to the digital representation of theentity and is therefore relevant to the specificationand data in Figure 1. Entities and objects may alsobe termed grains and the term granularity is thusrelated to the resolution of the information.Unfortunately the term resolution is alreadyassociated with specific meanings for both spatialand temporal measurements (Veregin, 1999). Iprefer here to use granularity as a more genericterm in the sense that information contain grainssuch as classes, pixels and time units, that arelimited in their spatial, temporal and categoricalextent. Thus the granularity imposes restrictionson the possibilities to discern betweenentity/object elements within a grain.

The full process of creating a model of realityfrom the real world through human perception toa computer representation will be readdressed atthe end of this chapter. For now it suffice toconclude that real world perceptions areinherently complex but seem to be possible to

’Real world’

Perceived reality

Specification

Data

Figure 1 The abstraction process from perception ofreal world phenomena as entities in the perceivedreality through a specification to an objectrepresentation in a database.


divide into building blocks that we call categories.Through E-ontology these categories may bedefined and given significance and hopefullyfurther organized in the framework of ageographical decision support system. Dealingwith the where, when, and why of the real world,geography has developed some workable theoriesand methods to be able to conduct study andanalysis of real world phenomena. It is impossiblehere to provide a full review of current methodsor theories. In the following section I will simplyelaborate on the notion of real world models ingeography and the traditional use of acartographic language to express geographicalknowledge.

Real world models and Geography

Models of the real world have within geography atradition of being space-time centered wheredescriptions of space seem to have dominateduntil work by Newton in the seventeenth centurymade it possible to treat time in a similar manner(Couclelis, 1999). The ‘object’ or ‘plenum’ viewslead either to a world view focused on objects oron fields (Couclelis, 1992) which in turn maysuggest a scale dependency of geographic spaceinto for example small scale and large scale space(Mark and Frank, 1996). It is also commonlynoticed that a separation can be made betweentrue objects and humanly constructed objects, forexample fiat vs. bona-fide objects (Smith 1995),non-geographic vs. geographic entities (Nunes,1991). As a contrast one can also argue for apsychological definition of space where scale isdefined not by the actual or apparent absolute sizebut on the basis of the projective size of the spacein relation to the human body (Montello 1993). Inthis case a room in a house and the surface of theearth as seen from an airplane would belong to thesame psychological space domain as they canboth be apprehended from the same position. Thelack of consensus on this issue indicates that weprobably have to deal with some combination ofthese notions (Peuquet, 1988). The plenum andatomic (Couclelis, 1982) ‘space paradigms’ areprobably at work in parallel in our way to use ourown ‘external models’ of reality. The traditionalmap actually supports some of these ideas as ituses small-scale space to represent a large-scalespace, extending the well known Euclideangeometry of everyday objects into a geographic

space of realms and regions (Montello 1993) andgeographic information systems havetheoretically the ability to incorporate bothplenum and atomic views represented as rasters(fields) and vectors (objects) respectively.

Geographical information includes indiscreetvalues, inaccurate attribute definitions as well asvariations in temporal and spatial scale.Traditionally geography has been communicatedthrough maps but also through texts and images.The latter becomes evident whenever visiting ageography library where books constitute asignificant part of the information volume. Duringthe last few decades increased use and availabilityof remotely sensed data has added a variety ofnew information sources for geographic analyses,for example aerial photographs, satellite imagesand radar data. Despite the fact that remotesensing devices provide an increasing amount ofgeographic information, I still regard the map asone of the most important sources of documentedspatial information. It is also a well-refined modelof communicating the atomic view of the realworld. In addition, the fact that maps in manycases are the only available historical spatialrecord, the set of existing maps is an invaluablesource for environmental information.Considerable amounts of geographical datacollected in textual form with some sort ofgeocoding inherent, together with numerousinventories that have been carried out during thepast few decades also form an extensive source,however mostly textual, for information on theenvironment (Frank and Mark, 1991).

Communication through a ‘map-interface’,which usually consists of a set of symbols, colors,text, is adapted to and designed for human-to-human communication. This communicationprocess includes at least two steps where humaninterpretation is involved: first the real world isinterpreted by the cartographer who produces amap using sets of well-known semantics andabstractions, then the user reads the map and triesto extract the necessary realism from theabstractions in the map. The map metaphor hasbeen described and also further developed byseveral authors, among which Christopher Boardand Arthur Robinson have made substantialcontributions (MacEachren, 1995).


The form a representation of geographicphenomena takes on a map or other displaycannot be divorced from its purpose and therequirements of the society in which the visuallanguage gains currency (Gombricht, 1977). Thisis essentially an expression of underlying faith,the hermeneutic science metaphor (Johnston,1999) or the socio-cultural perspective on timeand space (Couclelis, 1999). Still, we cannot

ignore the fact that each spatial entity has beenidentified for a specific purpose and that the waythis entity is visually represented on the map canbe different according to the cultural preferencesof the cartographer or the intended audience.

The examples in Figure 2 show the samegeographic region as three different thematicmaps compiled at1:50 000 scale portray it, andwhere some common features are shown.

Figure 2 Three different thematic maps covering the same area compiled in 1:50 000 scale. The legends cover somecommon features among which the boulder concept is discussed in the text. (From Lind 1997)


Although the maps have been compiled in thesame cultural setting and with mapmakers fromthe natural science disciplines, the representationof ‘boulder’ in Figure 2 within the mapped areadiffers from 0 to 7 symbol instances. The symbolsshould be interpreted as an indication of actualplace for the feature and to some extent the arealcoverage or frequency.

Some of the differences in Figure 2 might beheld as interpretation inaccuracy during mapcompilation, but when we are given informationon the purpose of each map the differencesbecome understandable: For vegetation mappingthe presence of boulders can be a significantcharacter of the vegetation type and control thevariation of the vegetation within one given classunit. The signs are to be taken as a secondarylabel indicating the presence of boulders withinthe area and the location should not be expectedto be of high spatial accuracy. To ageomorphology map, boulders are of vitalimportance to the interpretation of the landformsand their genesis. In this map we can thereforeexpect a higher amount of boulder signs andrelatively high spatial accuracy in their location.The geology map finally does use boulder signs,but we would expect them to appear only whenthe boulders are used as an indicator of actualbedrock.

So, we see that maps can serve a multitude ofpurposes. Important for this work is that maps canbe considered as spatial representations of realworld features which can in turn stimulate otherspatial representations and all suchrepresentations are acts of knowledge-construction (MacEachren 1995). No matter howfar this process is driven, the geometricrepresentation of a feature on a map will alwaysbe a generalized abstraction of its current formand status (Livingstone and Raper 1994). Themap as a representational model to communicatesomething of the nature of the real world is onlyable to deliver a fraction of the total amount ofinformation present in the real world. So, we areeither forced, or we deliberately choose to usedifferent levels of detail in our representation ofthe features of interest. The example in Figure 3taken from Board (1967) illustrates howrepresentations of spatial features can be seen asorganized along a gradient from an infinite realityto an ultimate ideal abstraction. It also indicatesthat relative abstraction levels can be identified asa function of two important components:

dimensional scale and degree of complexity.Along this abstraction gradient we trade faithfulcomplexity with distorted understanding (Board1967). It is apparent that by chosing a certainlevel of abstraction a certain amount of detail getslost. Still manual map reading may gain some ofthe lost detail through inference.

Given some knowledge of the purpose of aninformation collection, a knowledge basedreasoning on the information value of each mapelement reveals more than can be read only becoupling the map legend to a concept definition.This is a kind of ‘back-tracking’ of the map-making processes by using some knowledgeabout the context in which the map features wereassembled. The meaning of ‘context’ may varyamong people but I intend to embrace a widemeaning of the term and define geographiccontext as the historical, social, physical, anddisciplinary domain where geographicabstractions are formed. The geographic entitieswe try to describe such as those mentioned in theexamples above own three special characteristicsresponsible for the shortcomings of currentrepresentational techniques according to Peuquetet al. (1995).• The data volume needed to adequately

represent geographic entities can be verylarge.

• Spatial relationships between geographicentities tend to be imprecise and application-specific, and the number of possible spatialinterrelationships very large.

• The definitions of geographic objectstend to be inexact and context-dependent.

As if the volumes of data and complex relationswere not enough, two of these three statementsinclude formulations such as ‘application-specific’ and ‘context-dependent’, whichillustrates the complex nature of geographicrepresentations. In fact, that there can never be asingle uniform representation of the geographicworld is well known to geographers. The twolatter statements also talk about impreciserelationships and inexact definitions, which willbe subject for further elaboration in followingsections about accuracy and knowledgerepresentation.

To summarize, communication of geographythrough maps is traditionally a manual task that isnow turning increasingly automated andinformation intensive. Nonetheless, anyqualitative or quantitative spatial analysis need to


consider that every representation of geographicalfeatures, be it on a map or in a digital database, isan abstraction of the reality, and as such they havebeen generalized for some specific purpose,therefore depending on the geographic context.

Real world models and geographicinformation systems

Apart from highlighting the context dependentnature of all geographic information, the mapexample in the previous section also shows howthe cartographic language has been used to exploitthe human ability to understand a situation bysimultaneously overview a large area and pick updetails. In Figure 2 a general pattern is given ascolored areas and important details are given assymbols. Boulders would not really be visible inthe given scale, but by using symbols one canindicate the presence and approximate location ofthese phenomena. It also makes sense to the map-reader, as it is possible to extrapolate cognitivelythe ‘boulders’ from the given location of therepresentation. This possibility of using acombination of detail and generalizations is notreadily implemented in current geographicinformation systems but it might be possible todo, given that we can develop enhanced

possibilities to express geographical meaning forentities in a geographical database.

In a geographic information system thevisualization and the storage of data are separate.Possibilities to change scale by zooming in andout, reclassify data, create great opportunities forgeographical analysis. In a geographicinformation system we are theoretically notconstrained anymore by a paper map sheet withfinite size and depth. ‘The map’ may instead beused as an abstract algebra paradigm (Tomlin,1990) where the map elements are handled in aGIS toolbox to perform spatial analysis. The mapitself will become a process as the on-screen mapwill play a key role as an interface to data infuture GIS use (Kraak, 1995) GIS map-use istherefore two-way oriented in a way that include alarge amount of user interaction with the data. Inthe light of the previous discussion on mapcommunication and multiple world-views, theuser-producer dialogue in a GIS makes it evenmore important that communication can becarried out within the ‘shared world’ of the userand the producer. On the downside, there is atleast one serious concern that must be dealt with.There is no guarantee that the displayedinformation will appear in the same way as in the

( Board C. 1967 )

Figure 3 The gradient between reality and abstraction indicating examples of types of maps at their appropriate levelof abstraction (after Board 1967)


source material. The previous map exampleillustrated that many map features have asymbolic meaning, and these features have beendesigned for use with other map features in thescale and extent set by the paper map. Thissuggests that a straightforward transfer, that isdigitization, of map features into digital format isa difficult task. A further aggravation is that theGIS user interface tends to mask the differentorigins of data, thereby leaving the user unawareof inherent limitations in the information. Even ifthe user should be aware of this problem it is notalways possible to trace the origin and thelimitations of the data stored in a GIS database.Nevertheless, digitized map data is widely used ingeographic information systems.

Any attempts to use GIS to integrate data fromenvironmental databases and to use models oranalytical tools upon data need a fullunderstanding of the origin and context of eachdata set used. Thus taking data using severaldifferent conceptualizations from differentcontexts the GIS integration process relies upon atransformation of this information into the desiredconceptualization and current context. The issueof finding automated methods for that kind of GISintegration has been the focus of much work.Recently the above described integration processhas been put into a comprehensive framework ofinteroperability of geographic informationsystems (see for example collections edited byVckovski, 1998 and Goodchild et al. 1999).Interoperability has earlier been understood as acapability to transfer data from one computersystem to another. It is only recently, and at amore general level, that the term has found itsway to the wider geographic information sciencecommunity. At the general levels of geographicinformation systems and applications,interoperability is concerned with theestablishment of a smooth interface betweenmultiple information sources (Harvey, 1999). Atthe GIS level, problems of interoperability can becreated by different geometric syntacticrepresentations, difference in class hierarchies,and different semantics (Bishr, 1998). Sincedifferent applications have different worldviewsand semantics, interoperability at the applicationlevel is essentially a semantic problem (Bishr etal., 1999)

The discussion so far has elaborated on thefact that geographical data is subjected to majorinfluences from various individual

conceptualizations of the same reality.Furthermore I have argued that a geographicinformation system theoretically has the ability todo spatial analysis of integrated geographicaldata. Still there are apparently some fundamentalaspects of context and semantics that need to beresolved. I now intend to resume the initialdiscussion on models of the real world in thecontext of modeling a computer representation ofthe real word and how the geographic context canbe represented in this model.

Existing models in useThe individual worldview and the shared worldconcepts correspond to the external andconceptual models in the ANSI-SPARCdefinition, which has been used as a generalframework for designing geographic informationsystems (Laurini and Thompson 1992, p.357). Anoverview of this model framework is given inFigure 4. The external models are defined by thepotential user and their purposes and needs, theconceptual (or semantic) level is concerned with asynthesis of all external models, the logical levelis a high-level description which ismathematically based and computing oriented andthe internal level is concerned with the byte-leveldata structure of the database (Laurini andThompson 1992, p.357f.)

Somewhere along this chain of model levelsthe supposedly chaotic real world is somehowsystematized and made discrete for the purpose ofdigital handling. Apparently this step will have tobe taken at a high level. The bridge between aconcept/semantic model and a logical model iseasier to create if a more formal mechanism isused at the conceptual level rather than narrativestatements. However, the conceptual level willstill need to hold both the deeper semantic notionsof external models as well as synthesizedconcepts, which easily translate into logical levelmodels. Clearly, the separation of models into afew levels does not solve this problem, but a shortdescription from this more data modeling orientedviewpoint seems appropriate. Also, by explicitlyidentifying a semantic level in the data modelstresses the importance of the actual meaning ofdata. I will return to this issue from many aspectssince it is central for this thesis.

Proposed semantic level modelsSeen from a GIS integration viewpoint the focusup until around 1995 was mainly on systems, dataand to some extent information (Sheth, 1999).


Work on semantic or conceptual models forgeographic information focusing on informationand knowledge has received a significant amountof attention only in the past five or six years(Livingstone and Raper, 1994; Peuquet, 1994;Ruas and Lagrange, 1995; David et al., 1996;Usery, 1996). David et al. (1996) reported onearly work to develop conceptual models forgeometry within the European Committee onstandards (CEN/TC287) and suggested that themain bottleneck in geographic informationhandling is the understanding of the semanticlevel and the way entity meaning affects themodeling of entity interactions over varyingspatial and temporal scales.

Ruas and Lagrange (1995) outlined onepossible logical model connecting the semanticmodels with the physical models. From theirperspective of generalization, this should be seenas a process allowing us to perform a change inthe perception level of geographic data. They alsostated that the first generalization stage is thetransition from one initial data schema to onecorresponding at another level of perception.According to this the actual generalizationdecisions are made at the semantic level andfurther operations need ‘only’ be carried out on arule base at the logical and physical levels. Theissue of generalization is readdressed in thesections about spatial and categorical granularityin chapter 3.

Another relatively early idea proposed byPeuquet (1988; 1994) incorporates concepts fromperceptual psychology and advocate a “triad”representation of spatiotemporal data in the laterpublication. It builds on the idea that the three“views”, time based, location based and objectbased, all provide different aspects of the data andthus each facilitates a specific kind of query. Theintegration of these three views would enable forexample objects to be explained by the spatialview and conversely spatial patterns to bematched against object based knowledge. The“triad”- view (Peuquet, 1994) is based on a dualframework of object- location integration(Peuquet, 1988) and the incorporation of time intothis framework is still under investigation(Peuquet, 1999). This kind of simultaneousrepresentation of multiple views of the same factseem as a theoretically sound concept, and thefollowing few paragraphs will show that muchresearch verify the difficulty to find one commonlevel of understanding. Instead it seems as if a

description of common and diverging points ofreference are the most feasible way to givegeographic entities more meaningfulrepresentations. In chapter 8 I argue that theintegration of fuzzy, rough and crisprepresentations is a feasible implementation of thedual framework proposed by Peuquet (1988).

Usery (1996a, b) developed a feature basedconceptual model along the same line of thoughtas Peuquet. Using an entity based view ofgeographic phenomena this model explicitlyrepresents spatial, temporal and thematicattributes which can be directly accessed. A fuzzyset implementation of geographic features isproposed as a solution to capture some of theambiguity inherent in features based on humanperception and cognition (Usery, 1996b). The useof fuzzy set representation is a notable exceptionfrom the other proposed frameworks presented inthis short review. Although this model seems verypromising it remains to be tested. Also, to someextent noted by Usery, the mechanism forcomparing multiple views of the same geographicfeature has not been identified.

Livingstone and Raper (1994) argued for asemantic model where the entities should define

Exernalmodel 1

Externalmodel 2

Externalmodel 3

’real world’

Conceptualmodel

Logicalmodel

Internalmodel

Figure 4 The four information modelling levels;external, conceptual, logical and internal, according tothe ANSI-SPARC design methodology (After Lauriniand Thompson 1992)


the space they occupy and also guide theappropriate spatiotemporal representation. Thisview follow that of Nunes (1991) who claimedthat the debate on concepts of space, shortlyreferred to earlier, show that no furtherspecification of geographic space is possibleunless the geographic objects can be defined. Asemantic model theory developed form thisviewpoint needs to be defined at a higher level ofabstraction than the spatial and temporal modelsused to represent the phenomenon (Livingstoneand Raper 1994). It has been argued that thiswould provide the necessary link between a GISapplication, external process models and usedspatial databases, and that an object-orientedapproach using environmental metaclassesprovide the means to perform co-ordinationbetween different “world views” of the same “realworld” entity (Livingstone and Raper, 1994;Raper and Livingstone, 1995). A metaclass is inobject oriented wording an assemblage of dataclasses or model classes, it works independentlyfrom the subordinate data or model classes, at thelogical model level, and it is concerned with thebehavior and relationships of the class categoriesand available morphisms between the classes.One main problem with this construct seems to bethat since a metaclass determine what objects itwill be possible to represent one need to explicitlydefine at a metaclass level the attributes andmethods of all current and future objects.

Recent work by Bishr (1998) and Bishr et al.(1999) have provided a useful formalized,approach for semantic modeling. Bishr in histhesis (1997) proposed a general framework forsemantic translators capable of mapping betweenspatial database schemas while preserving theirsemantics. The main tool to connect semanticallysimilar objects is in his framework based oncommon ontologies, essentially a standardizedvocabulary for various domains of interest.Gahegan (1999) basically propose the same ideaand both authors hold the use of interchangeformat (the term proxy context in Bishr’s work)as a mediator to transform data from oneinformation context to another. Gahegan alsoconclude that such a framework includingcategorization and transformation can achievecommunication of meaning. However, Kuhn(1999), although involved in the work by Bishr etal. (1999), points out that existing approaches tosemantic modeling such as semantic networks andfirst order logic are too limited for a rich and deep

description of semantic meaning. That motivatedhim to suggest a connection between semanticnets and algebra that combines the best of thesetwo worlds, a direction proposed as early as 1984by Andrew Frank (Kuhn, 1999). One of the mainachievements by this approach would be thepossibility to provide links between two differentsemantic networks.

Semantic model integrationThe original issue of modeling the real world hasnow turned into an even more challenging one ofintegrating different worldviews. In the abovediscussion several ideas based upon definition ofcommon ontologies (Bishr et al., 1999; Gahegan,1999) or metaclasses (Raper and Livingstone,1995) were put forward. Such work willultimately become a matter of getting groups ofpeople together to negotiate their disagreementsand consequently the issue of real worldintegration turns into what has been formalized aspart of the sociology of science theory as Groupor Organizational Decision Support Systems(King and Star, 1990). Bishr et al. (1999) uses theterm “geospatial information community” tomean a group of spatial data producers and userswho share an ontology of real-world phenomena.However, King and Star (1990) takes a broaderstance, uses a social metaphor rather than apsychological one, as in e.g. Smith and Mark(1998), and address the entire decision makingprocess in which “due process” and theconstruction of "boundary objects" is of particularimportance (Star, 1989). Due process can beexplained as groups and organizations constantstruggle to recognize, gather and weigh evidencefrom heterogeneous conflicting sources (King andStar, 1990). Boundary objects is a structure forcoordinating distributed work that not onlyinvolves heterogeneous actors, elements, andgoals but also incorporates different researchmethods, values, and languages. A boundaryobject both supplies common points of referenceas well as differences to enhance participantunderstanding of what world views otherparticipants hold, and why they hold them. Thistheory has recently been brought into thegeographic information science by Harvey (1997)and further discussed by Harvey and Chrisman(1998), Chrisman (1999) and Harvey (1999). Itseems from their examples of wetlands mappingsin the United States and the ATKIS standarddatabase model in Germany that a definition ofcommon ontologies and schema integration can at


best reach some kind of associations and partialmatching. Again this can hardly be represented byapproaches based on binary relations but it can beconstructively moved further if viewed from theideas of due process and boundary objects.Vague, inconsistent, ambiguous and illogicalinformation open the domain for conceptnegotiation, and there is enough proof that thesesituations are successfully handled within forexample organizational decision processes (Kingand Star, 1990). Several types of boundary objectshave been identified and King and Star (1990) listfour such types; repositories, ideal types,coincident boundaries, and standardized forms.Repositories are “piles” of objects that areindexed in a standardized form such as a libraryor a museum. Ideal type or platonic object may befairly vague but a good enough abstraction fromall included domains of participants such as anatlas or a diagram. Coincident boundaries areterrain objects that have the same boundaries butdifferent internal contents such as the delineationof the counties within Sweden. The last type ofboundary objects, standardized forms or labels,are methods of common communication such asthe standardized form used by the national forestinventory described in chapter 4. It is argued thatboundary objects may serve as a mediator innegotiations around which similarities anddifferences can be articulated (King and Star,1990; Harvey and Chrisman, 1998). If it turns outpossible to formalize the idea of boundary objectsinto something that explicitly can representcommonalities as well as differences this wouldhopefully serve as a better means to representgeographical meaning in a geographicinformation system. A similar line of thoughtalthough never formalized in this way wasproposed by Nyerges (1991a) for geographic dataintegration based on concept meaning and the fullimplication of these ideas will be more clear bythe end of the next chapter.

To summarize; a geographic informationmodel need to capture the vital components ofgeographic information. A host of authorsconform in the outline of which the basiccharacteristics are that makes up geographicinformation. (Sinton, 1975; Peuquet 1994,Albrecht 1996, Gahegan, 1999) For exampleAlbrecht (1996) state that in order to fullycharacterize geographic information it isnecessary to simultaneously capture the basicspatiotemporal, thematic and topological aspects

of the geographic entities and phenomena. Time,space (3D), theme, and their inter-/intraconnections thus can be viewed as a basicset of rather abstract properties that need to bedescribed. How these characteristics should bemodeled have been the focus for much researchand development and it is only lately withincreased demand for interoperability and dataintegration that the issue of meaning of the entityitself has gained focus.

Thus, in the next chapter I will start byexamining the space, theme and time componentsseparately. First of all though, I address in generalthe quality question, which in any use of data isan important concern. Quality issues includeaspects of detail and accuracy and these have alsobecome central in my thesis.

Detail in geographic information models • 17

DETAIL IN GEOGRAPHIC INFORMATIONMODELS

Human knowledge is a process of approximation. In the focus of experience,there is comparative clarity. But the discrimination of this clarity leads into

the penumbral background. There are always questions left over.The problem is to discriminate exactly what we know vaguely.

ALFRED NORTH WHITEHEAD, Essays in Science and Philosophy

In the previous chapter I concluded that aspects oftime, space (3D), theme, and their inter-/intraconnections can be viewed as a basic set ofrather abstract properties that need to be describedto fully characterize geographic information. As afirst step in the experimental design I decided toinvestigate these aspects separately in order toisolate and identify important deficiencies or gapsin theory and/or methods for geographic dataintegration. This chapter will treat the issue ofdetail and changing detail in space and theme.

As a preliminary I will go through somedefinitions pertaining to quality assessment anddiscuss their relevance to this dissertation. This isfollowed by an examination of spatial, temporaland thematic properties of geographic objects,reviewing other research efforts in the context ofthe work presented in this dissertation. By the endof this chapter I pull together most of thediscussion and the findings reported further on inthis dissertation in a discussion on a proposedsolution to provide context information withgeographic data. The proposed semantic modelframework is labeled Geographic ContextTopology, GeCoTope, and the work in chapter 8demonstrates a partial implementation of thisframework.

Quality – Detail, Granularity, Accuracy,Fitness-for-use and Uncertainty

During the last 20-30 years some well neededresearch efforts have been made to understand

aspects of error and quality control in geographicdata. Also, work to systematically define andstandardize aspects of geographic data quality hasbeen published (Guptill and Morrison, 1995).

Some attempts have been directed towardscreating a common typology of quality. The twoexamples below both suggest a general typologyfor data quality or ‘goodness’ measures. Althoughthey represent quite different fields of researchthey agree in much, and both works outline aseparation between measurable and non-measurable aspects of quality.

Veregin (1999) in a recent treatment of thequality issue implicitly outlines a two dimensionalmatrix of geographic data quality components.Like any geographical phenomenon description,quality aspects may be divided into spatial,temporal and thematic components (Sinton, 1978;Veregin, 1999). Each one of these dimensionsincludes aspects of accuracy, resolution,consistency and completeness. So, we have forexample an aspect of spatial resolution in adataset, an aspect of thematic consistency etcTable 1. Interestingly, another and somewhatsimilar typology from the field of Modeling andSimulation (M&S) can be found in Meyer’s(1998) definition of ‘goodness’ measures. In hisdefinitions (level of) Detail is a measure of thecompleteness/complexity of a model with respectto the observable characteristics and behaviors ofphenomena that the model represents. (level of)Accuracy is a measure of the exactness of a

Table 1 Data quality components

Accuracy Granularity Consistency CompletnessSpace Spatial

accuracySpatialgranularity

Spatialconsistency

Spatialcompleteness

Time Temporalaccuracy

Temporalgraunlarity

Temporalconsistency

Temporalcompleteness

Theme Categoricalaccuracy

Categoricalgranularity

Categoricalconsistency

Categoricalcompleteness

Chapter

3

18 • detail in geographic information sources

model's details with respect to the observablecharacteristics and behaviors of phenomena thatthe model represents. (level of) Fidelity is ameasure of the agreement of a simulation withrespect to perceived (i.e. within a specific context)reality. (level of) Resolution is a measure of theminimum degree to which accuracy and/or detailmust coincide with the fidelity of the simulation.Accuracy and detail are relevant primarily inrelation to models. Fidelity and resolution areappropriate to use in a simulation context.Meyer’s terms fidelity and resolution are notdirectly related to any of the four aspects listed byVeregin, instead they are embraced by a broaderquality term ‘fitness for use’. These qualityaspects have received very little attention from ageographic information science perspective. Thismay be due to the fact that they are hard foranyone but the data consumer to evaluate.Notably, both authors agree that fidelity,resolution or fitness-for-use are extremely hard toquantify, as they have almost no context-freemeaning. This is probably already about tochange with continued development of forexample applied environmental models(Goodchild et al., 1993; 1996a; 1996b) but also asa result of designing an infrastructure that enablessemantic interoperability (Harvey, 1999). In anycase, following the argumentation of Meyer(1998), a necessary foundation for any suchevaluation is that the aspects of detail andaccuracy first can be properly measured.

So, if we turn to the quality measures thatseem possible to quantify; detail/resolution andaccuracy, we may note that Veregin’s definitionof resolution correspond with Meyer’s definitionof detail, whereas both authors use the termaccuracy to mean the same thing. As for the useof the terms detail, resolution or granularity it isstill a matter of discussion (Duckham et. al, 2000)and I noted earlier that I prefer to use granularityin this text to avoid confusion with detail andresolution. Occasionally I will also use the termresolution/granularity, mainly when reference ismade to some specific work using the termresolution in the meaning of granularity.

Veregin (1999) and Meyer (1998) as well asseveral others (Salgé, 1995; Goodchild, 1995)treat accuracy as a relative measure since it isdependent on the intended form and content of thedatabase. In addition to accuracy and granularityVeregin (1999) also include quality measures ofconsistency and completeness, Table 1. If we

consider the measurements in Table 1 as aminimum requirement to document we now needto suggest some viable ways to measure eachproperty. It turns out that the matrix works fairlywell for well-defined features (Goodchild, 1995;Veregin, 1999). We need to keep in mind thoughthat measurement is always made against a logicalspecification of the conceptual model that wasused to collect the data (Veregin, 1999).Goodchild (1995) noted that for poorly definedfeatures it is not always possible to separate forexample attribute accuracy from spatial accuracy.For example in the case of vegetation maps it issubjected to discussion whether the location of aboundary between two vegetation types isuncertain due to the problem of measuring theexact location of the vegetation types or if it isdue to the problem of discerning between the twovegetation types at the correct location(Goodchild, 1995; Painho, 1995). Salgé (1995)provided the first treatment of qualitymeasurements from this perspective in hisseminal text on semantic accuracy. Semanticaccuracy refers to the quality with whichgeographical objects are described in accordancewith the selected model (Salgé, 1995). Figure 5show a modified version of Figure 1. It illustratesthe concepts of model quality or ‘ability ofabstraction’ as a measure of how well a real worldfeature can be defined in the perceived reality. Italso shows the meaning of an evaluation ofdataset quality as how well geographical objectsin a database correspond with the perceivedreality. Veregin (1999) actually uses these notionsbut deviates slightly from the proper definition of

’Real world’

Perceived reality 1

Specification

Data

Quality of the dataset

Quality of the model “Ability of abstraction”

Figure 5 Two main aspects of semantic accuracy


completeness to produce a measurable situation inthe case of completeness measures. Brassel(1995) in a similar way as Salgé (1995) definedcompleteness as the difference between theobjects in a database and ‘the abstract universe’ ofall such objects. This lead Veregin (1999) toconsider both data completeness and modelcompleteness but this is really a general problemfor all four quality measures, and it is directlyrelated to the nature of the object underconsideration.

It is important not to restrict the discussion ofquality to either well defined objects or poorlydefined objects and an alternative way to treat thesubject is to approach the quality issue from anuncertainty viewpoint. Fisher (1999) discussesuncertainty in general and separates uncertaintyinto two main categories depending on thedifficulties to define object classes and instancesof these. For well-defined objects the type ofuncertainty can be characterized as error of somesort and this can be treated with probability-basedmethods, which typically produce true or falseresults. For this type of uncertainty there are todaya collection of measures such as mean error, rootmean square error, percent correctly classified,Kappa, sampling interval, sample resolution(pixel size or time collection interval for eachmeasurement) (Veregin, 1999). For poorlydefined or unresolved objects two types ofuncertainty have been recognized, vagueness andambiguity (Fisher, 1999). These latter types ofuncertainty may be treated using concepts offuzzy and rough sets. No general framework, suchas the one described for well defined features(Veregin, 1999) Table 1; have been articulated forquality measures of poorly defined features. Salgé(1995) suggest that similar measures as for welldefined objects should be used, for examplecommission and omission error. I would suggestthat the framework of Table 1 could be used alsoin the case of poorly defined features with onereservation. The reason for this reservationoriginates from the experiment in chapter 7 wherethe degree of categorical completeness propagatesinto the analysis and is embedded in the followingaccuracy. We must bear in mind that forassessment of poorly defined objects as definedby Fisher (1999) the conceptual model is ofprimary importance. Thus, until furtherinvestigated quality assessment for poorly definedfeatures need to commence from the lower rightpart of the matrix in Table 1 in order to

acknowledge that the conceptualization of the realworld governs the measurement of time andspace. This also conform with Nunes (1991)referred to in chapter 2.

As already noted, which analytical approachto use for the uncertainty assessment is generallyguided by the nature of the objects under study.With well-defined object classes and individualsthe uncertainty is probabilistic in nature whereaspoorly defined objects or classes are betterhandled by fuzzy set approaches (Fisher 1999).Rough sets is used where uncertainty comes in theform of indiscernibility, it is therefore suitedwhenever the granularity of the information is toolimited to discern between sought alternatives.These later cases have been thoroughly exploredin chapters 6 through 8. Although this generaldivision might be argued it clarifies thecomplementary nature of these three approaches,and that a proper use of each method is decidedby the data at hand. All of these uncertaintyaspects will be further put into the context ofrepresenting geographical meaning in the lastsection of this chapter.

A central concern of this dissertation is use ofdata from different contexts and it is essential toadmit that several instances of the same realworld feature may be possible with equalrelevance, since perceived reality often only existsas an intellectual construct. In Figure 6 I haveincluded the additional aspect of semanticheterogeneity (Bishr, 1997) to give a morecomplete illustration of the semantic uncertaintyinvolved in a translation between contexts.

If we recapitulate the earlier description ofcategories, conceptualization or E-ontologies, wesee that semantic accuracy evaluation must beacknowledged as a fundamental part in therepresentation of knowledge through categoricaldata. The experiments and methods dealt with inthis dissertation all pertain to the problem oftransforming information from one context toanother and it has been recognized that this ismainly a matter of semantics. So, a major scope inchapter 7 is measurements of semantic accuracy.There I demonstrate approaches that evaluate boththe aspect of ‘ability of abstraction’ as well as anevaluation of the semantic accuracy of twodifferent spatial generalization operators. By acareful measurement of attributes in a controlledenvironment, errors have been reduced to aminimum and remaining errors are controlled toenable a constructive interpretation of remaining


deviations. A problem though is that any attemptto quantify semantic accuracy against perceivedreality becomes extremely susceptible tosubjectivity, but so will any assessment oftranslation results between user contexts be. Asingle measurement of semantic accuracy cannotaccommodate for all different worldviews thatmight be at play and the use of these kind of‘closed’ tests may very well be contested. In anenvironment of open systems operated byindividuals who often use unresolved informationwe need a type of quality tests adequate for theseconditions. Star (1987) argued in such a contextfor a different form of evaluation based on realtime design, acceptance, use and modification of asystem by a community and suggested the ideasof due process and boundary objects introducedearlier.

It will be important to keep the uncertaintydiscussion in mind through the followingtreatment of spatial, temporal and thematicgranularity, as I will return to the space-time-theme dependence by the end of these followingsections.

Spatial granularity

As I already noted, a well developed way toformalize and communicate space is thetraditional map, and map scale has been one wayto articulate a sort of spatial granularity.According to ICA (1973) the scale of a map is theratio of distance measured upon it to the actual

distances that it represents on the ground. Dent(1993, p.77) expands the term further by statingthat scale relates to the size of precision andgeneralization applied in the study. He also setsout that the nature of the inquiry sets the scale,and the scale in turn determines the degree ofgeneralization. This clearly relates to otherdimensions than only spatial granularity but to alarger domain of generalization. This is probablydue to the fact that scaling of geographical datahas traditionally been made manually includingall sorts of spatiotemporal and categoricalconsiderations. With increased demand forautomated tools for generalization, updates andrevision of databases made at different scalelevels the issue of scaling has gained a renewedinterest.

The lack of an adequate definition ofgeneralization in the context of digital processingenvironments motivated McMaster and Shea(1992) to suggested the following definition:‘Digital generalization can be defined as theprocess of deriving, from a data source, asymbolically or digitally encoded cartographicdataset through the application of spatial andattribute transformations’. In their definition theyalso stated that the objectives of a generalizationprocess are to produce data that is consistent withchosen map purpose and intended audience; andto maintain clarity of presentation at the targetscale. Since then the increased use of digitalsystems for storage and analysis of geographic

’Real world’

Perceived reality 1

Specification 1

Data 1

Quality of the dataset

Perceived reality 2

Quality of the model “Ability of abstraction”

Specification 2

Data 2

Schematic heterogeneity

Semantic heterogeneity

Figure 6 Aspects of semantic uncertainty in a multi-user context


data has widened the traditional meaning ofgeneralization to encompass not only map output,but almost any transition between representationalmodels of the real world (Weibel and Dutton,1999). In order to be a generalization though, thistransition is confined to one that decrease thelevel of detail but at the same time try tomaximize the information content with respect tosome application.

The ‘value-added’ aspect of digitalgeneralization included in these definitions wasrather lately added to the GIS research agenda(Müller, 1989; McMaster and Shea, 1992; Mülleret al., 1995). The last decade has provided amultitude of fruitful discussions andinvestigations on generalization issues in a GIScontext (Buttenfield and McMaster 1991; Mülleret al 1995).

Even if a lot of the work reported so far onautomated generalization has a cartographicflavor to it, a subset of the research labeled objectand model generalization (Weibel and Dutton,1999) is of direct interest to this discussion.Object and model generalization is the process ofdata abstraction dealing with the identities ofgeographic phenomena and their semanticrelationships. This is essentially the same problemthat has been outlined previously as a problem ofGIS integration, interoperability and semanticmodeling. An automated generalization processneeds knowledge on object geometry, spatial andsemantic relations (Ruas, and Lagrange, 1995).Thus the model generalization research needsknowledge on semantic relations in very much thesame way as any other knowledge basedtechnique. Today scale, granularity andgeneralization are receiving interest fromresearchers as generic issues. For exampleGoodchild and Quattrochi (1997) suggest a fullscience of scale that would include the ability tochange scale in ways that are compatible with ourown understanding of Earth system processes.

Apparently, generalization is concerned withchanges of both thematic and spatial detail and arestricted focus on pure spatial granularity, as inthis section, lead rather naturally to considermodern granularity limited data such as remotesensing images. With remote sensing images Iinclude both aerial photographs and satelliteimages. In maps the concept of scale is setexplicitly but the remote sensing image has nopre-defined scale as such. It does however have aconcept of spatial granularity but this must not be

confused with scale since there is no direct linkbetween the two.

Resolution/granularity is a measure ofdetectable separation between objects in a visualfield (Dent, 1993, p.260). This has only recentlybeen noticed as a theoretical problem as theefforts to integrate and compare remote sensingdata at different scales has shown problematic andtherefore received some much needed attention(Wilkinson 1996). Studies of the effects of spatialdata aggregation on grid data have often beenperformed using remote sensing images,simulated remote sensing images, and classifiedversions of the two. Those studies have provideda lot of useful insights into the effects imposed bya changed spatial granularity on grid data (cfQuattrochi and Goodchild, 1997).

Following the work by Woodcock andStrahler (1987), many studies have reported onthe effects of spatial data aggregation. Most workemploy methods and frameworks that provideglobal measures of the aggregation effects ofchanged spatial granularity and these effects canbe evaluated from several aspects. One suchaspect is to see how image statistics is changedover resolutions/granularity (Bian and Butler,1999; Van Beurden and Douven, 1999; Milne andCohen, 1999; Moody and Woodcock, 1994). Oneother evaluating certain landscape components(rural, forest) and their inherent responses toaggregation processes (Woodcock and Strahler,1987, Turner et al, 1989). All these efforts mayhowever be characterized as evaluations of truthin the model, where the model is a speciallyconstructed set-theoretic reality surrogate whoserelation to reality itself is left unspecified (Smith,1995). The issue of evaluating how aggregationaffects the semantic accuracy (as defined inprevious sections) of for example land cover hasnot been worked upon as much.

This apparent gap in the literature aboutspatial granularity motivated me to look more intothese aspects and the work presented in chapter 5are the first efforts in this direction. I argue thatfrom an application perspective the preservationof image statistics is not necessarily the goal. Onthe contrary, for example visualization often usesaggregation to reduce noise and enhance certaininformation, similar to the way we all ‘low-pass-filter’ a visual experience when we squint oureyes, which of course change the image statistics.It follows from the definitions above that theultimate goal of any kind of generalization is that


the information we finally produce is appropriateto the task. But how do we make sure that this isthe case?

This has connections to the issue ofappropriate spatial granularity for study, whichhave been debated by geographers for some timenow. As a result of these discussions there istoday an agreement that a changes of analysisscale also changes the importance and relevanceof specific variables (Meentemeyer, 1989). Therecent development of landscape ecology(Forman and Godron, 1986) has not only broughtan increased insight into the spatial domain ofecology but also some renewed attention to this asa geographical issue. Purely ecological studieshave traditionally been biased toward particularspatial and temporal scales (Johnson 1996). In thedevelopment of landscape ecology as a field ofresearch some general ideas of scaling upecological processes (King 1990), to some extentbased on a hierarchical concept of ecosystems(O’Neill et al. 1986) were put forward. Furtherwork speculated that patterns and theirrelationships might be ordered into scale domainsand that transitions between such domains mightbe relatively abrupt, much like phase transitionsin physics (Wiens 1992). One example of asuggested framework uses a scale dependenthierarchy of classification systems where bothabiotic and biotic factors are coupled to bothspace and time (Klijn and Udo de Haes 1994).

Figure 7 gives an overview of this idea in thespatial domain. The concept indicates the

possibility of assigning certain granularityintervals at which controlling factors can beconsidered relevant for the spatial pattern ofecosystems. By identifying controlling factors fora specific variable it would be possible to delimitthe scale interval within which this variable willexhibit an identifiable pattern. This can be seen asan effect of the ‘constraint envelope’ conceptsuggested by O’Neill et al. (1986). These ideasmay be hard to verify (Schneider, 1994) but theyhave gained support and are now considered byseveral authors as one important key to makingprogress within the domain of geographicinformation science (Lam and Quattrochi 1992;Buttenfield 1995; Painho 1995; Jelinski and Wu1996; Johnson 1996).

One of the problems to handle changes inlevels of detail may be coupled to the way wemeasure this as an absolute space or asrepresentative fraction as in maps (Goodchild,1999). A possible candidate for a scale invariantmeasure of granularity is the “scope” (Schneider,1994) or LOS, “large over small” (Goodchild,1999) measure. Schneider (1994) define this asthe ration of the range (extent) to theresolution/granularity and suggests that thisrelative measure is useful in comparingphenomena over spatial, temporal as well asthematic scales. The information value of thismeasure seem to be high since it in some senseincorporates two of the three most prominentscale concepts related to geographic information‘extent of a study’ and ‘operational scale’ (Lam

Parent material

Geomorphology

Groundwater

Surface water

Soil

Vegetation

Fauna

Ecozone

Ecoprovince

Ecoregion

Ecodistrict

Ecosection

Ecoseries

Ecotope

Eco-Element

Climate 62.500 km

2.500 km

100 km

625 ha

25 ha

1,5 ha

0,25 ha

2

2

2

Indicativemapping

scale

Basic mapping

unit

Controlling factor

1:5,000

1:50,000,000

1:10,000,000

1:2,000,0001:500,000

1:100,0001:25,000

Ecosystemclassification

Figure 7 The relation between ecosystem components and spatial scales at which they are ecologically relevant. Linesconnecting controlling factors and ecosystem classification indicate which controlling factors may determine theobserved spatial pattern of ecosystems at each scale. Adapted from Klijn and Udo de Haes (1994).


and Quattrochi, 1992). And if the third of thesescale concepts ‘cartographic scale’ is tooproblematic for the digital domain (Goodchild,1999) it seems as if the scope measure is apotentially powerful concept. It remains still to betested in what way it might be applicable to thegeneral issue of changed granularity ofgeographical data.

MacEachren (1995) also emphasize a need toinvestigate the possible psychological factors thatmay interact with such scale dependent real-worldpatterns. The latter has been discussed as part ofthe broader issue of scale and detail in thecognition of geographical information (Montelloand Golledge, 1999).

Apparently a model of geographic reality thatcan be varied in scale must account for thevariations in objects, attributes and relations thatmay be encountered as an effect of changingspatial granularity. In chapter 7 I follow this lineof thought by demonstrating location specificmethods for the estimation of generalization andspatial grain effects on categorical data sets. Thestudy can be seen as an extension of the initialcase studies reported in chapter 5. The design inthese two studies still did not allow for anassessment of the influence of physicallycontrolling factors such as hydrology, topography,climate et c. From the literature this still seem tobe an important source for the conceptualizationof a domain of interest (cf. Klijn and Udo deHaes, 1994; Johnson, 1996). Accordingly itshould be possible to include these ideas in aframework for transformation of informationbetween contexts. The demonstration in chapter 8include wetness index as a determining factor forthe target classification. I take this as affirmativeproof that the proposed framework is capable ofincorporating the idea of controllingenvironmental variables although this specificissue is not very much elaborated on in this thesis.Although chapter 7 failed to address the issue ofcontrolling factors, it has attacked the problem ofevaluating semantic accuracy due to changedspatial granularity. The hypothesis tested in bothchapter 5 and 7 is whether the outcome of achanged spatial granularity of a dataset is thesame as if data were instead collected at thisdesired level of granularity. The joint results fromchapters 5 and 7 do not give any clear-cut answerswhich I take as evidence for that we have to dealwith a multifaceted problem and that we miss anappropriate information theoretic model to handle

aggregation operations. The case studies inchapter 5 showed a sensitivity of certainenvironmental variables to changed spatialgranularity whereas the experiment in chapter 7did not provide clear evidence for this.

Temporal granularity

Peuquet (1999) suggest that the goal of atemporal representation is to record change overtime and the basic questions asked against atemporal database would be about states andchanges. From an ecological perspective Huston(1994, p232) holds disturbance and successionalchanges as the primary landscape processes thatare observed by humans and that regularlyinteract with human activities, and the intensityand frequency of disturbance are of majorimportance to community and ecosystemproperties. Also from a more general perspectiveduration and frequency seem to be importantconcepts to describe temporal patterns (Peuquet,1999). A disturbance regime often forms adynamic equilibrium characteristic to a particularlandscape (Huston, 1994). A landscape can in thesame way be defined as the tangible andcharacteristic result of the interactions between aspecific society, its cultural preferences andpotential, the given physio-geographicalconditions and biotic as well as abiotic processes(Sporrong 1993). Thus, a landscape is describedby geographical boundaries and concepts rangingfrom physical objects such as houses, trees andrivers via processes like climatic variability andecosystem succession to highly immaterial partsof the landscape such as land ownership patternand human ideological structures. Once again wemay note the problem to separate geographicalinformation into its ‘primary’ components.

The time aspect of geographical information,although not as much investigated, seems tobehave in a similar way as the spatial aspect. BothLangran (1992) and Klijn and Udo de Haes(1994) point at support for the hypothesis that thenatural rate of change, that is frequency, for anatural process generally follows the spatialhierarchy outlined in Figure 7. In that figure thetemporal scales are reflected by the most rapidlyresponding processes being located relatively lowin the hierarchy.

Recent attempts on the formation of coastalgeomorphology theories as well as otherlandscape ecological studies emphasize the use ofspace and time scales as an organizing principle


(Raper and Livingstone, 1995; Turner et al.,1993). Both these works argue for a geographicmodel in which time and space are treatedtogether in relation to the nature and behavior ofthe described entities. Peuquet (1999) reviewsome approaches to the representation ofspatiotemporal data in digital databases anddivide these in groups of location-based, entity-based and time-based. She concludes that sectionby arguing for a combined representation such asthe ‘triad representational framework’ treated inthe previous section (Peuquet, 1994)

The scope concept was introduced in theprevious section and I will use it here to speculateon how the spatiotemporal characteristics of ageographic phenomena might be expressed usingthe ideas reviewed so far. Given a piece ofhemiboreal Scots Pine forest the spatial scale willhave a lower limit of some 50-100m in order tocall a piece of landscape a “forest” that alsoexhibit “foresty” properties such as influence onlocal climate and flora. The upper spatial limitmight be set using the controlling factors soil andground/surface water giving a spatial granularityinterval of approx. 50-1000m. The temporal scopeof forests depend on management, but for a ScotsPine forest the natural “disturbance regime”, thatis the turnaround interval, can be set somewherebetween 50 and 100 years (Angelstam 1997). Ifthese figures shall be as absolute values or on arelative scale as scope is still a matter of questionas discussed in the previous sections.

The figures indicate appropriate ranges andthe grains of the information, which govern theusage of this set of data. This information shouldfor example be used to set limits for when thecurrent data set is violated due to for example azooming operation in a GIS.

What this short elaboration demonstrates isthat regardless of the space-time strategy used,categories will always be guiding the space-timeconceptualization and the granularity of the

spatiotemporal representation. So, I therefore turnto the treatment of thematic detail and how thiscould be handled before the final two sectionswhere I propose an integrated framework of aGeographic Concept Topology.

Categorical granularity

The theme of a geographical dataset is to beunderstood here as the domain of a study. It wasdefined earlier that knowledge was made explicitby classification patterns where categories arebuilding blocks of knowledge. The treatment ingeographic research of categorical granularity hasnot received as much direct interest as the issue ofspatial granularity. For example Mark et al.(1999) list only four studies of geographicalcategories that have involved human subjectstesting. But in the last few years a number ofpublications indicate an interest in categorytheory and ontology.

Categorical granularity or degrees ofcomplexity in the terminology of Board (1969) isnormally organized as concepts in a hierarchicalstructure. In such structures general and abstractcategories can be found at the top and lessabstract and more specific categories are found atthe bottom of hierarchical organizations, such asthe two hierarchies outlined in Figure 8. Increasedcomplexity is found by climbing down to lowerlevels in the hierarchy. Such a hierarchicalstructure can be either constructed as a conceptualhierarchy (left) or a hierarchical tree (right)(Freksa and Barkowsky, 1996). There are somefundamental differences in the two hierarchytypes of Figure 8 that will be examined below.

In the case of hierarchical trees Mark et al.(1999) separate between ‘partonomies’ and‘taxonomies’ depending on the type of relationbetween categories at one level up or down thetree. Taxonomies relate to ‘kind-of’ relationships,such as an oak forest is kind of a deciduous forest.A taxonomy enable inferences about properties

3.1.1 3 .1.2 3.2.1 3.2.2

3.1 3.2

dry droughty mesic very dry

moist wet

dry mesic wet

dry wet

very wet

Figure 8 A conceptual hierarchy structure (left) adapted from Freksa and Barkowsky (1996) compared with ahierarchical tree structure (right) compatible with e.g. several vegetation classification systems (Påhlsson 1995)


and class inclusion, which is fruitfully exploitedin for example object oriented knowledgerepresentation. A partonomy on the other hand isbased on ‘part-of’ relations such as ‘a tree is apart of a forest’. Although geographical conceptsassociate naturally with both partonomies andtaxonomies, partonomies does not supportproperty inferences (Mark et al., 1999). The lackof inferential power makes the application ofpartonomies as knowledge representationaltechnique a question suitable for furtherinvestigation. A deeper investigation of thedifferences between partonomies and taxonomieswill not be treated here.

In a concept hierarchy, refinement ofconcepts, that is increased complexity, is notmerely done by a subdivision of individualconcepts as in the hierarchical tree case. In aconcept hierarchy we see that additionalcategorical detail is given by creating categoriesfor borderline cases. Thus, each subdivision isalso revised according to the horizontalneighborhood structure. Consequently eachconcept is defined at a specific place in thehierarchy and given a meaning by its position inboth vertical and horizontal direction (Freksa andBarkowsky, 1996). We can observe this in Figure8 where the immediate neighborhood of theconcept mesic is dry/wet in horizontal direction.In vertical direction the concept neighborhoodconsists of dry/wet at the more general level anddroughty/mesic/moist at a more detailed level. Allof these neighborhoods are related to the meaningof the source concept moist.

Concept relations can also be defined betweenseparate hierarchies in order to relate a categoryin one context with a corresponding category inanother context. Building further on the idea of ameaningful neighborhood structure I willillustrate such a relation between two differentcontexts. This problem domain has been termed“the metadata folding problem” within computerscience (Aslan and McLeod, 1999) where it isunderstood as the problem of partial integration ofremote and local databases in the presence ofsemantic conflicts. And it seems to have set off avariety of efforts to resolve this problem althoughAslan and McLeod (1999) list some seriousshortcomings for most of these that are related tolimited dynamics and considerable knowledgedemands. The Cyc-project (Lenat, 1995) for

example uses a global schema approach that todaydemonstrate some workable implementations,such as search engines for the internet, some 10years after its start.

From a geographical viewpoint, anycategorical mapping that fails to incorporatespatiotemporal aspects will fail to give a fullgeographic description. Thus approaches basedsolely on categorical matching will not qualify asa viable candidate in a geographical situation. Inthe following real example I use two differentvegetation classification systems, and theseactually represent instances of the two hierarchytypes illustrated in Figure 8.

The upper part of Figure 9 shows thehierarchical tree of the forest vegetation types in aNordic vegetation classification system (Påhlsson,1995). It is separated into four levels of detailwhere the lowest level exhibits the highest degreeof complexity. Enlarged squares give a fewexamples of the most specific categories, in thiscase vegetation types. The highlighted vegetationtypes have also been mapped onto a diagramdecided by soil moisture and richness in the lowerpart of Figure 9. We may look upon this diagramas one level in a concept hierarchy that has beenextended in two dimensions, guided by bothnutrition status and moisture. In an object basedview these are the attributes of a certainvegetation object but in a location-based viewthey are the physical conditions of a continuousspace. The nutrition-moisture gradient forms thebasis for several other classification systemscommonly used within the Nordic countries (cf.Påhlsson 1972), and as such the linkages forms amapping between these two classificationsystems.

By using the idea of a concept hierarchy suchas the left one in Figure 8, each concept is nowclosely associated to its neighboring concepts interms of both nutrition status and soil moisture,whereas in the original hierarchy (upper part,Figure 9) the horizontal neighborhood is notnecessarily indicative of any specific property inthe adjacent classes. Typically a geographicvegetation object should be regarded to havesome internal heterogeneity, thus the mappingonto these continuous variables comes in the formof a more or less vague range, the elliptical shapesin the diagram of Figure 9.


The example has large resemblance with thework by Mark (1993) on the conceptualboundaries between similar categories of waterbodies in English (lake, pond, lagoon) and French(lac, étang, lagune). In that experiment Markshowed that many categories for water entitiesmay be discriminated using size, spatial relationto the ocean, salinity, presence of marshes at theedges, and whether it is man made or not. Animportant observation in these examples is that amapping between similar concepts, but fromdifferent contexts, often cannot be achievedthrough a crisp relation. In the chosen examplesthere are components of uncertainty related tograded concepts and indiscernibility betweencompeting concepts that cannot be fully describedby a binary relation. These problems areapproached in the following section and tackled in

more detail in chapters 6 through 8 where I alsopropose and demonstrate a few solutions tohandle them.

Although the examples so far deal with thehorizontal neighborhood structure it is alsoconceivable to think of additional verticalneighborhood mappings as in Figure 8 left. Theexample in Figure 9 uses two dimensions at oneconcept hierarchy level, moisture and nutrition.The example by Mark (1993) explicitly illustratedthe use of three dimensions, or controlling factors,size, edge marshiness, and man-madeness.Theoretically the dimensionality could beincreased infinitely creating a multidimensionalnetwork of concept meanings provided bymappings between similar concepts and betweenconcepts and controlling factors. Nyerges (1991a)outlines a general interpretation of this idea that

Nutritionstatus

Dry

Mesic

WetPoor Rich

Oak forestPoor in herbs

Spruce forestOf wet dwarfShrub type

Pine forest oflichen type

Pine forest oncalcareous ground

Birch forest ofherb type

Ash forest

Spruce forestof fern type

Mixed forest ofwet herb type

Alder shoreforest

Forest vegetation

Coniferous forest Deciduous forest Mixed forest

Spruce forestPine forest Birch/Aspen forestBeech/

Hornbeamforest

Alderforest

Other Grown Juv.Oak/Elm/Ash/Lime

forest

Pine forest oncalcareous

ground

Pine forest oncalcareous

ground

Spruce forest ofwet dwarf-shrub

type

Spruce forest ofwet dwarf-shrub

type



Oak forest poorin herbs

Oak forest poorin herbs Ash forestAsh forest Mixed forest of

wet herb typeMixed forest ofwet herb type

Alder shoreforest

Alder shoreforest

Spruce forest offern type

Spruce forest offern type



Figure 9 The taxonomy of forest vegetation classes (from Påhlsson 1995) is mapped onto a diagram using water andnutrition availability. Some of the vegetation-types are highlighted as enlarged boxes to exemplify the mapping.


he called a heterarchy of concepts, based onmultiple, and interconnected conceptualhierarchies, forming a multidimensionalknowledge framework of concept meanings.Thus, a heterarchy would be something similar towhat linguists would call connected word maps.Such a heterarchy of concepts could be exploredby e.g. generalization operations (Nyerges,1991a) or be used to provide input to predictivegeographical models (Nyerges, 1991b). I will nowfollow this general design and furthermore arguefor some additions that I see as a prerequisite for asemantically rich representation of geographicconcepts.

I have demonstrated in this section thatcategories from different contexts are often notrelated through crisp, binary relations. In suchcases there is some amount of uncertaintyinvolved in the relation. Now, uncertain conceptrelations, such as those illustrated in Figure 9 donot only extend the original notion of concepthierarchies (Freksa and Barkowsky, 1996) intoconsidering a horizontal neighborhood in twodimensions. More important, it illustrates thegeneral idea behind my proposed formalization oftransformation between similar concepts from twodifferent contexts. This idea acknowledges thatgeographical categories may use location-based,time-based as well as object-based views for theirrepresentation. Figure 9 illustrate how expertknowledge is used to produce an approximatemapping of categories that is accompanied by adefinition of how target concepts are related tocertain important spatial properties. I now claimthat the arguments and discussions broughtforward so far implies that the categorical aspectof geographic information is the proper startingpoint for any transformation of geographicinformation that involves a changed spatial,temporal or categorical granularity or anycombination of these.

In the next two sections I will go into detail onavailable options to describe semantic uncertaintyin (expert) knowledge representations. Here I willfocus on crisp, fuzzy and rough sets as candidateframeworks for the representation of conceptinterrelations and the degree of uncertainty in themapping. Thus, the remainder of this chapteroutlines a core idea of knowledge representationunder semantic uncertainty.

Knowledge representation and uncertainty

Integration, understanding and communication areessential concepts when we talk about knowledge.The previous section proposed that multipleinterconnected conceptual hierarchies could beused to express a deeper geographical meaning ofused concepts. Most efforts to implementsemantic level models to date have proposed settheoretic approaches for its implementation(Nyerges, 1991b; Livingstone and Raper, 1994;Bishr, 1999; Gahegan, 1999). Related frameworkshave also been proposed, such as ‘mereotopology’(Smith, 1996) that uses mereology, referred to asthe theory of part and whole, as an alternative toset theory to describe topological relationsbetween parts and wholes of things. One reason tosearch for alternatives to set theory has been itslimited capability to express semantic ambiguityof categories (Mark and Frank, 1996). Althoughuncertainty may have various sources,randomness and imprecision are two major typesthat are of importance in spatial knowledgerepresentation and inference (Leung, 1997).Probability as a mathematical and statistical ideais well studied, understood and also welldocumented (Fisher, 1999) and itsimplementation within the field of geography isfairly well developed. I see no reason today toabandon the set-theoretic realm since the fuzzy(Zadeh, 1965) and rough (Pawlak, 1982)extensions of traditional set theory seem to beviable techniques capable of handling the types ofcategory uncertainty or imprecision thatpreviously was problematic from arepresentational viewpoint. As an example, thework by Usery (1996) has shown the possibility(sic) to represent features as fuzzy sets. I willexplore this issue in more depth in this section,which finally by the end of this chapter blend intoa general description of geographic knowledgerepresentation under semantic uncertainty and aproposed solution.

There are a variety of views on the theory ofknowledge but in order to build a spatial decisionsupport system we need data that explicitly defineat least parts of our current knowledge. Recentadvances in data intensive methods, such asknowledge discovery through data mining(Fayyad et al., 1996), will probably gain insignificance with the ever-increased amount andvolumes of data. Although some pointers to theliterature are provided by Pawlak (1991) anddeeper treatments can be found in Fayyad et al.


(1996) and Leung (1997) I will neither go intodetail into these, nor is my intention to cover alltechniques to represent knowledge in spatialdecision support systems. The followingdescription tries instead to link the different viewson knowledge representation with the previousdiscussion of different types of uncertainty.

There are today a few major approaches torepresent knowledge and to make inferences fromthis in a formalistic manner (Leung, 1997). Thefirst and maybe most known and used alternativeis to use declarative knowledge representedthrough some kind of formalism such as first-order predicate logic, production systems built upby ordered sets of IF-THEN statements, semanticnetworks, frame-based hierarchies, object-oriented approaches or combinations thereof.These representations are adequate wheneverknowledge can be expressed as clear-cut true orfalse statements such as ‘this area is a swamp’.This also leads to clear-cut answers, ‘yes’ or ‘no’.In this case uncertainty is essentially a stochasticevent connected to well-defined objects. Onefrequently cited implementation uses aprobabilistic approach to uncertaintyrepresentation (Goodchild, Guoqing and Shiren,1992). From an error matrix we may produceprobability vectors for each pixel given the actualestimate. Using such probability estimates it ispossible to produce multiple realizations of theoriginal interpretation and make the actualcomparison of result upon these realizations.

However, human knowledge is often inexactand we therefore need some knowledgerepresentation technique that is capable ofhandling uncertainty or imperfection related topoorly defined objects. Consider the threefollowing expressions ‘this area might beswampy’, ‘this area is rather swampy’ and ‘thisarea is either swampy or flooded’. They are allpropositions that include aspects of uncertaintythat are quite different in nature and thereforerequire different methods for representation.

The first proposition ‘this area might beswampy’ articulates a type of uncertainty that iscaused by some degree of randomness and thevalidity of an inference under randomness can beexpressed as the chance, or probability of theevent. These measures are often produced throughstatistical testing and evaluation of frequencies ofoccurrence. To make inferences and to updateprevious knowledge with new evidence we mayuse Bayesian methods of updating. A geographic

implementation of this can be found in Aspinall(1992). We may also use an extension ofBayesian probability theory called Dempster-Shafer theory, cf. Eastman (1997) and Leung(1997) for an introduction to the two approachesin a spatial information setting.

In the second proposition ‘ this area is ratherswampy’ we have expressed that the area mightpossibly be regarded as swampy. This type ofinexact human knowledge led to the developmentof fuzzy sets and fuzzy logic (Zadeh, 1965).Fuzzy sets are a suitable representational tool toaccommodate graded and subjective statementssuch as ‘rather’ to express partial belonging to aspecific concept. As such fuzzy logic is suited torepresent and infer with this type of imprecisehuman knowledge. In implementations of fuzzysystems we may translate the grade ofbelongingness of an object (this area) to theconcept (swampy) into membership values, whichcan be used in propositions and inference. Gopaland Woodcock (1994) and Woodcock and Gopal(2000) present geographic implementations of thismethod where they use fuzzy sets for accuracyassessment and area estimation of thematic maps.Their approach enables a translation fromlinguistic terms of interpretation judgments intofuzzy memberships for further use in accuracyassessment. It follows that the validity of answersfrom a fuzzy system is depending on the meaningassociated with the concepts used in the fuzzypropositions. It is important to note thatmembership values as such reflect an orderingthat is not based on probability but on admittedpossibility (Burrough and McDonnell, 1998).

The last one of the propositions ‘this areamight be either swampy or flooded’ mayintuitively be translated into a 50/50 % chance ofeither alternative. But in many situations thisstatement is based on the total lack of support foreither of the two outcomes (a natural tag to thestatement would be ‘… I don’t know.’). To giveboth of the alternatives the same chance is toviolate the fact that no such information isavailable. The recent development of rough settheory (Pawlak, 1982; Pawlak, 1991) hasprovided a viable tool to address uncertainty thatarise from inexact, noisy or incompleteinformation. Also, rough sets and the concept ofrough classification have demonstrated promisingapplications for geographic information handling(Schneider, 1995; Worboys, 1998a; Ahlqvist,


Keukelaar and Oukbir, 2000a - chapter 6 thisthesis).

In the previous section about categoricalgranularity I illustrated some examples that triedto associate categories in one context with similarcategories in another context. This was known asmapping semantic similarity and it wasacknowledged that this mapping needed to be ableto represent graded concepts and indiscernibilitybetween competing concepts. The concepts ofprobabilistic, fuzzy, and rough uncertainty caneasily be confused but in light of the discussionabove it should be apparent that these threeconcepts represents quite different facets ofuncertainty. In the following chapters, especiallychapters 6 through 8, this knowledgerepresentation technique will be addressed inmore depth.

It is also conceivable to think of combinationsof these types of uncertain knowledge, forexample ‘this area might be rather swampy’stating that there is some probability of this areato be regarded as swampy, even though it mightnot be a full member of the ‘swampy’ concept. Itwould only be natural then to try to merge thedifferent representational techniques into jointmeasures of uncertainty capable of expressing allcombinations of uncertain knowledge. Formaldescriptions of the relations between thesedifferent models of uncertainty have also beenpublished (Dubois and Prade, 1992). This iswhere we reach the current forefront of theresearch on uncertainty in decision-makingproblems (cf. Pal and Skowron, 1999). In chapter8 I present one effort to further develop thesetheories applied to the field of geographical dataintegration. Chapter 8 also goes further to includethe wider problem of knowledge representationunder semantic uncertainty. These last issues arethe focus of the following section.

A geographical concept topology

Approaching the remaining problem ofcombining the different types of uncertaintyrepresented by probabilities and possibilities thereseem to exist workable approaches. The idea ofusing multiple sources to explain a specificconcept has close similarity with multi-criteriadecision analysis (Malczewski, 1999; Eastman,1997). The multi criteria decision analysisframework enables a combination of separatelines of uncertain evidence such as deterministic,probabilistic and possibilistic (fuzzy) rules, into

an answer or several scenarios. I propose here thatthese theoretical constructs can be used toperform a transformation from one context intoanother, using multiple sources of knowledge andinterconnected concepts. I further propose thateach concept involved in this transformation canbe given a deeper meaning by explicitly definingmappings such as the examples related to Figure9. These mappings can use either certain one-to-one, many-to-one relations or uncertainprobabilities (not treated in this work) orpossibilities. Above given examples have shownthat semantic relationships very often is of anunderdetermined character that is either graded orindiscernible of a type that is not probabilistic.Therefore implementations of context mediationbased on first order logic are too restricted, and Ipropose instead to construct mappings usingformalisms capable of expressing various formsof imprecision. In chapter 8 I present a method tointegrate fuzzy and rough data into a result thatcan be generalized into a map. Thisimplementation is a demonstration of the idea touse crisp, fuzzy or rough sets to define a completeor incomplete translation from one concept toanother, and create a transformation betweencontexts from this. Since no information sourceprovide a one to one mapping the idea is based onthe integration of the multiple lines of evidence. Ituses the concept of multi criteria decision supportand the idea of fuzzy aggregation operations toextended the fuzzy approach to incorporate crisp,fuzzy and rough sets through conversions intobifuzzy sets.

I denote such mappings between concepts asemantic topology of available geographicdatabases. The notion of topology is withingeographic information science normallyunderstood as the spatial interrelations thatdescribe how real world phenomena are linkedtogether (cf. Burrough and McDonnell, 1998).Topology may also be generally understood as thestudy of interconnections (FOLDOC) andtherefore include any type of connection such asspatial, temporal or categorical. I will thus in thisthesis use a wide definition of topology andpropose the use of a Geographic ConceptTopology, a GeCoTope, to establish a formalizedrepresentation of semantic interrelations. AGeCoTope forms part of the metadata for eachdataset and it also forms a higher-level metadatastructure capable of connecting different datasetsboth semantically and functionally. The


GeCoTope is primarily intended to make explicitthe semantic similarity between categories fromseparate contexts and I hypothesize thatcategories of controlling factors such astopography, hydrology, soils, are among the moreimportant contexts to establish mappings to.

Figure 10 illustrate the implementation of theGeCoTope as a context transformation toolbetween (1, 3) and within (2) three hypotheticalorganizations. In the experiment in chapter 8 Ishow how the concept of multi-criteria evaluationcan be used to transform local vegetation data intoa regional vegetation classification (1) ‘weighing’additional information about wetness into thedecision (2). In chapter 8 I also suggest that otherlocal organizations could produce similarmappings and that a network of such mappingsgive a possibility to transform data within andbetween organizations (4) realized throughchained transformations. It may be discussedwhether it is better to use ‘controlling factors’such as the topography form an elevation model(2) as a bridge or ‘proxy context’ for these

mappings rather than arbitrary classificationsystems. In any case, the framework of theGeCoTope allows for solutions based on‘arbitrary’ common ontologies as well ascontrolling physical factors.

This issue connects to the use of federateddata bases that in earlier work has been seen as aviable solution to establish bridges (proxycontexts) between localized contexts (Bishr,1997) whereas others have argued for localizedapproaches (Aslan and McLeod, 1999). Here theproposed framework seems open for bothfederated and localized solutions. A combinationof these may also be envisioned where a largenumber of transformation alternatives can beaccomplished through chained transformationssuch as those outlined in chapter 8. The fact thatthese operations will often includetransformations between measurementframeworks require careful consideration of theinherent meaning of each data set astransformation rules are formulated (Chrisman,1999).

Recreational areas

Local Inventory of biodiversity

Regional vegetation map

Topographic map

Elevation

Soil map Local In ventory of vegetation

Local Organization 1

Local organization 2

Regional organization

Water management map

2

3 4

1

Figure 10 Organizational perspective on exemplified (1-3) and possible (4) transformations using the suggestedapproach.


From a general geographical point of view,and following the argumentation in chapter 8, theGeCoTope framework enable the consideration ofboth location-based (wetness data) and object-based views (vegetation units) in a concepttransformation process. This has other interestingconnections to the ideas of Boundary object andDue process (Star, 1989) mentioned earlier inchapter 1. In chapter 8 I argue that theconstruction of fuzzy membership functions andrough classification rules in that chapter, can beinterpreted as a “due process” where differentgroups or organizations constantly try torecognize, gather and weigh evidence fromheterogeneous conflicting sources.

I also suggest in chapter 8 that many currenttechniques for spatial analysis, including theproposed framework, is a spatial implementationof the boundary object idea. Consequently, Isuggest that multiple concept mappings, heretermed a GeCoTope, can be used to negotiatesemantic similarities and differences using aformalized framework of bifuzzy classifications.

Since the due process is supposed to be adynamic and ongoing activity I hold it likely thatit will require a fairly simple implementationstructure. I anticipate that the GeCoTopeframework, where each link is maintained by alimited number of participants, is a simple yetpowerful implementation of the idea of a dueprocess.

With this preview of the findings in my work Ihave given a summary of the entire thesis. I hopethat this makes the argumentation in the followingchapters easier to follow. I also hope that itenables a “high level browsing” of the followingtexts for those not interested in the details andparticulars of each chapter.

Discrete methods for interpretation of landscape information • 33

DISCRETE METHODS FORINTERPRETATION OF LANDSCAPEINFORMATION

Introduction

In this work it has been of great importance toreduce the possible sources of error and to controlthose that are inevitable. It has also been a goal tobe able to study changes in spatial and categoricalgranularity separately. This chapter is primarilydevoted to a description of the data that wasselected for analysis of changes of spatialgranularity. For the studies of categoricalgranularity another set of previously describeddata has been used (Ahlqvist and Wiborn, 1986)

Most, if not all, studies on effects of changedspatial granularity in raster based data use datathat in some way are a product of automated orsemi automated data collection (Woodcock andStrahler, 1987, Turner et al, 1989; Moody andWoodcock, 1994; Bian and Butler, 1999; VanBeurden and Douven, 1999; Milne and Cohen,1999). This has a positive effect in that a widerange of resolutions and aggregation levels maybe covered by the study since the most detailedlevel of data can be collected in huge quantities,10-100 thousands of pixels. The major drawbackof the automated approach is that a translation hasto be made between the information amachine/sensor captures and the information thathumans preferably use. By focusing interesttowards manually interpreted information it ispossible to close up on scaling effects that pertainto the human mind and the concepts we use tounderstand the real world.

Interpretation of land cover information can bemade with extreme variation using different datasources (maps, aerial images, satellite images),different spatial representations (point, area), anddifferent measurement scales (continuous,ordinal, nominal). Early in the digital map agethere were no standard method to collectinformation for digital spatial databases. Thus,textbooks on the subject from this time tend tohave rather detailed descriptions of the proceduresto capture data both manually and by automatedmethods, cf McDougall, 1976, p.57-60 for

description of collecting rasterized data. Theprogress of technology has significantly reducedthe amount of textbook space down to a few linesor paragraphs for consideration of manualmethods to produce raster based, source datasets(Cromley, 1992, p.131; Jones, 1998, p.96).Manual production of geographic datasets istoday mostly confined to the object/vector spatialmodel. This might be an issue of renewed debatesince analyses of spatially regionalized data havebeen shown to produce difficult problems, such asthe modifiable areal unit problem (Openshaw,1983). Moreover, as most methods for spatialmodeling are done in a raster model of space itcan be argued that source data sets more oftenshould employ a regular grid structure in thecollection phase.

Against this background of a diminishinginterest for manually collected grid data it maynot be surprising that the data described in thischapter was collected using methods originallydeveloped in the 1970’s. The described methodsare based on manual inventory and interpretationof maps and areal images. Both methods also usea regular square grid as the basic areal unit. Thecontent of these datasets therefore resemblecontemporary datasets that today is assembledthrough satellite or airborne sensors followed bydigital image analysis.

In 1996 a co-operation was initiated betweenMSc Rolf Ruben and BSc. Ola Ahlqvist. Rubenearly had supplementary training additional to hisuniversity degree in forestry with two years ofstudies at the Royal Institute of Technology, dept.for Surveying and Mapping Engineering. Duringhis 38 years within the silvicultural organizationhe worked with development of more effectivesite-specific methods for forest primaryproduction. Parts of this work is documented e.g.the Reforestation assessment (Skogsstyrelsen,1966). Other work is only available as internalreports, field protocols and other working papers.Thus, the main content of this chapter is acompilation and documentation of Mr. Ruben’s

Chapter

4

34 • Discrete methods for interpretation of landscape information

previously unpublished method development andresults.

Consequently, this chapter has severalobjectives. Primarily it gives a thoroughdescription of the datasets that have been used forstudies of changed spatial granularity in thisdissertation. Second, Rolf Ruben and myself wantto present these two previously unpublisheddatasets. We also hope that a thorough descriptionof the methods behind these data will initiatedeeper discussions about methodological andstatistic aspects. In this way this chapter may beof both public and scientific interest both in termsof actual information and the methods employed.

Structure of the method and data descriptionFirst I give a short background to the context inwhich the described methods were developed anddata was collected. Information from the NationalForest Inventory has been used as a basis fordefinitions and quality control. Therefore adescription of that inventory is also provided as abackground.

Both of these methods, the R-method and theclassification of economic map sheets intolandscape types are described in separate sections.Each of these sections consists of a Method sub-section and a Result sub-section.

The first dataset to be described, R-data of thePSU model, cover the Stockholm county andconsist of landscape descriptions from 34,464square shaped areas, each covering 25 hectares.The second dataset, classification of economicmap sheets into landscape types, cover entireSweden and contain information on landscapetypes within 18,000 square shaped areas, eachcovering 25km2.

Planning support for the usage of theforested landscape and the R-methodIn 1980 the Swedish National Board of Forestrystarted a long-term project called PlanningSupport for the Usage of the Forested Landscape,PSU. The aim was to develop methods to linkstrategic data from the National Forest Inventorywith operative data often collected in the form offorest management plans. Another purpose for theproject was to provide an information source forforest and land-use politics aiming at a betterbalance between central and local concerns.Stockholm County was chosen as a pilot example.

The project should be viewed from the forestindustries’ continuous need for both general anddetailed planning and decision support in their

management of landscape resources. Also, by thetime of the initiation of the project the technicaldevelopments had enabled computerizedtreatment of areal information from remotesensing sources.

The method to provide a reliable source ofinformation on a general level within reasonabletime and financial limits was the same then as it isnow. A statistically representative sample ismeasured with greatest possible accuracy with aconsistent and thoroughly controlled method. TheNational Forest Inventory has in this wayprovided reliable information on the status of theforest resources in Sweden. The National ForestInventory data can be spatially separated down toa spatial resolution corresponding to the countyadministrative level. This spatial resolution thusenables an illustration of the spatial distribution ofcertain forest variables. Regional authorities mayalso use this disintegrated information as anindication of the relation of the local region to theentire country. Because of statistical reasons it isnot possible to get a higher spatial resolutiondirectly from the National Forest Inventory.

The technique to collect similar information atthe local level, e g municipalities or a singleproperty has up until recently been characterizedby a manual survey, estimation, and mapping ofthe entire area. In this way, a good documentationof the variation within the surveyed area has beencollected. This documentation has then been usedin forestry operations. The quality of these plansin terms of absolute values is however not as goodas those achieved through systematic sampling.These local data sources are therefore not suitedfor integration over larger areas because of thevariable and unknown bias included in eachseparate dataset. The comprehensive forestryinventory that was conducted during 1980through 1993 aimed at producing county-wisecollections of detailed forestry data. For reasonsexplained above some kind of correction of theindividual bias would be required forcomprehensive forestry inventory data. Thiscorrection would have been possible using themethods and data described in this report. Thecomprehensive forestry inventory was howevernever completed because of a change in theSwedish forestry statute in 1994 (SFS 1993:553)

In view of that the county and municipalauthorities has been short of a reliable informationsource for planning purposes since theseorganizational levels require a higher spatial


resolution than what the National ForestInventory can produce. It is in this context that thedescribed R-method was developed.

Classification of economic map sheets intolandscape typesIn the late 1980s The Swedish National Board ofForestry, Environmental Protection Agency, theNational Heritage Board and the Nordic Councilof Ministers co-operated on a project toinvestigate the relation between forestry and otherinterests in the Swedish and Finnish archipelagoregions (Kihlbom, 1991). To ensure scientificallysound comparisons in cases like these it isimportant to produce a stratification of thelandscape into regions, and several researchershave during the last century divided Sweden intoregions based on a multitude of geographical ortopic specific criteria (Hall and Arnberg,submitted). As one part of this effort, Rolf Ruben,made a delineation of the Swedish archipelagoregion in areal units of one Swedish economicmap sheet. Furthermore, as a working material,the background information for this delineationwas extended to cover entire Sweden. As a resultthis working material came to include estimationsof constituting landscape types in 18,000, 5 by 5km quadratic squares covering entire Sweden.Still, this material has never been documented,nor analyzed in any larger extent.

Description of the study area

SwedenSweden is located in northwestern Europebetween Lat 55° 20′ N through 69° 4′ N and Long10° 58′ E through 24° 10′ E. The extension fromnorth to south is at roughly the same latitude asAlaska or—in the Southern Hemisphere—thestretch of ocean between Cape Horn in SouthAmerica and the Antarctic continent. In terms ofarea it is similar to Spain, Thailand or California.In population, it is in the same league as Belgium,Ecuador or New Jersey. Its long coastlines, largeforests and numerous lakes characterize Sweden.

Geologically, Sweden is located on thePrecambrian Baltic shield and the geologichistory extends as far back as the Archeanorogeny some 2.8 billion years ago. Thelandscape still bears traces of many of thegeological transformations that the land hassubsequently undergone. Most of the interior ofnorthern and central Sweden is dominated by‘Norrland terrain’. This terrain consists ofscattered bedrock hills from 50 to 400 m high.

This topography is mainly the result of millions ofyears of weathering and erosion of a previouslyflat rock surface. The western border betweenSweden and Norway mainly follows theScandinavian mountain range. It was folded upduring the Silurian and Devonian periods in theCaledonian orogeny, later eroded and was thenraised again during the Tertiary period and theneroded again. Now its peaks rise 1,000–2,000meters above sea level. Drainage from themountains flows in a southeasterly directioneventually forming the river valleys that feeds theGulf of Bothnia. Much of southern Sweden ischaracterized by a sub-Cambrian peneplain,which in many places has been reshaped bytectonic movements and erosion. This peneplainhas formed for instance the characteristicStockholm region fissure-valley landscape, whichextends into the Baltic Sea as an archipelago.

The two main natural soils, podzol andcambisol, are shallow, usually within the range20-50 cm deep. These soils are formed in depositsmainly originating from the latest Weichselianglaciation. These glacial deposits are often foundas till covering the bedrock surface. In lower partsof the terrain, especially below the highestshoreline, glacial or postglacial clay or silt hasbeen deposited on top of the till cover. Peat is alsoabundant in the northern parts of Sweden.

Sweden's climate is determined by its northernposition in the border zone between Arctic andwarmer air masses as well as its location on thewestern rim of the Eurasian continent close to theAtlantic Ocean with its warm Gulf Stream.Annual mean temperatures range from +8°C inthe south to -3°C in the north. Annualprecipitation typically varies between 500 and800 mm but in the mountain range annuals mayreach as much as 2000 mm. Mean annualprecipitation is on average more than twice asmuch as evapotranspiration. This, together withthe influence of glacial deposits, gives Swedennumerous lakes of varying sizes.

The vegetation of Sweden is divided into fourmain zones. The Alpine zone is located along themountains in the northwest. The boreal zone innorthern Sweden is part of a vast coniferous forestbelt, the taiga, which covers the northern PolarRegions. The boreonemoral zone south of thebiological Norrland boundary, LimesNorrlandicus, is a belt of mixed forests consistingof coniferous trees and a significant amount ofbirch and aspen on till soil and groves of nemoral


trees in the cultivated landscapes. In the far southand on the west coast is the nemoral zone withelm, ash, beech, maple, lime and oak trees, withbeech being the most characteristic.

The most densely populated areas lie in thetriangle formed by the three largest cities—Stockholm, Göteborg and Malmö—and along theBaltic coastline north of the capital. The interiorof Norrland is very sparsely populated. The mostdistinctive agricultural districts appear as

scattered ‘islands’ in a sea of forests.

StockholmThe County of Stockholm is part of the relativelyhomogeneous central Swedish fissure-valleylandscape, located within the boreonemoral zone.This region was entirely covered by ice during theWeichselian glaciation. During the deglaciationphase the region has changed from beingcompletely submerged under the ice-sea lake

Figure 11 Map of study areas. Structure data were collected for entire Sweden (upper inset) and R-data collected forStockholm County (main map). Dot symbols show the location of NFI tracts and plots within these tracts (lower inset) thatwere used for accuracy control.


surface into a region of scattered smaller andlarger islands and lakes at the Baltic Sea coast.Governed by the fissure valley bedrock structurethe glaciation and deglaciation has produced anareal pattern of lower and higher land, in whichvalleys are filled with sedimentary clays andhigher ground is covered by till or consists of barerock.

The southern shore of Lake Mälaren is mostlyconsisted of rocky cliffs, its northern shore ratherflat. The city of Stockholm was founded in the13th century on a small island at the entrance ofLake Mälaren. The southern region was sparselypopulated and in the north the town met withfarming settlements and villages. During the lastfew hundred years, Stockholm has undergoneperiods of rapid urbanization. In the second halfof the 20th century the built up areas started toexpand radially along the main roads and railwayswithin a radius of 25-30 km. Between thesesuburban settlements there are still significantareas of forestland as well as some agriculturethat reach fairly deep into the city center.

The National Forest Inventory

The following brief description is meant to give abackground on the purpose of the National ForestInventory, its content and the spatial and thematicresolution of the data.

BackgroundThe goal of the National Forest Inventory is toprovide information on the forest natural resourcefor planning and management on a national andregional level. It also aims at providing data forforestry research. Based on the National ForestInventory it should be possible to presentinformation on a regional level with adequateaccuracy. It is on the other hand not possible togive estimates of smaller geographic units, forinstance municipalities.

The first National Forest Inventory wasconducted 1923–29 and the second 1938-52. Boththese inventories were carried out one county at atime and in the form of a traverse sampling.Starting 1953 and until present entire Sweden issurveyed with a sparse sampling frame. At thesame time the actual sampling was changed to aninventory of quadratic so called tracts withcircular sample plots located along the tract edgesFigure 11. Further refinement of the sampledensity has resulted in that from 1973 andonwards it is possible to make reliable estimates

for several important forestry parameters based onthe integration of 5 years of data.

The density of the sample has been chosen toprovide a maximum standard error of 12% forestimates of forestland area and a maximumstandard error of 5% for estimates of growingstock.

MethodThe National Forest Inventory is an inventorybased on sampling. The sample consist today ofcircular plots with a 10m radius (314m2). Theplots are distributed along the sides of quadraticso-called assessment tracts. One such tract canunder normal conditions be surveyed in the fieldin one workday. The tracts are systematicallydistributed over the entire country in a sparsenetwork. To be able to use different samplingdensity Sweden has been divided into 5 regions.The distance between tracts and the number ofsample plots may vary between these 5 regions.Stockholm County is located within a regionwhere the side of the quadratic tract is 1200m, theaverage distance between the tracts in a 5-yearsample is 7km, and the number of plots in eachtract is 20. The field inventory is carried outduring the summer from May trough August bysome 20 inventory teams each consisting of 5persons. In addition to this approximately 10% ofthe surveyed area is subject to control inventory.This is made to enable detection and correction ofany errors in the data collection and to estimatethe accuracy of the registrations.

There are two types of errors in theregistrations, random errors and systematic errors.Random errors usually arise as a consequence ofthe sampling. These errors can be statisticallyestimated and they are presented with the data asa standard error for each variable. Systematicerrors might occur due to deficiencies in themeasurements, used factors in calculations ofvolume or individual variation in theinterpretation of inventory methods and fieldinterpretation.

For area calculations of some characteristicthe following information is used; the total area ofthe county, the total number of field plots in thecounty and the number of field plots that have thewanted characteristic. First the area factor iscalculated, that is the land area that each field plotin the sample represents. The area factor = totalland area / number of filed plots. The area factorfor Stockholm County for the period 1973-77 is254ha. To calculate the area of for example


swamp land supposing that we know from thefield data that 100 field plots were found to beswamp. The total swamp area in the county wouldbe estimated to be 254 * 100 = 25 400 ha.

VariablesThe National Forest Inventory measures andcalculates a host of variables out of which thisintroduction only considers variables alsoincluded in the R-method.

Land-use classesLand areas are in the National Forest Inventorydivided into the following land-use classes.• Forestland• Swamp• Rock surface• Power lines• Various land• Sub alpine woodland• High mountains• Roads and railways• Agricultural land• National parks, nature reserves, certain

military areas etc. (NRS)• Water• Outside County

Site qualitySite productivity is a measure of the site specific,wood productive capacity. In the National ForestInventory, Jonson’s (1914) site productivityestimation method is still used. This is expressedas 8 ordinal classes divided by the ideal timbervolume production for a 100-year growth period.Field estimation of site productivity is difficultand includes several subjective elements. This is

why data such as local forestry plans andcomprehensive forest inventories can be expectedto include rather large errors in terms of absolutevalues. Results from the control inventoryperformed during 1973-77 show that site qualitymeasurements are fairly consistent in the NationalForest Inventory data (Svensson, 1980).

Cutting classesThe concept of cutting class is related to forestryapplication and it is a classification of thematurity within a field plot relative to the finalfelling age. It also indicates the next silviculturalmeasure to be conducted. The classification isbased on information on crown closure, averagetree diameter, age and height. Table 2 illustrates aslightly generalized version of this division.

Forest typeThe forest type variable is derived by firstspecifying the constituent tree species fraction ofbasal area expressed in tenths. The classificationrules in Table 3 are then applied on the derivedspecies mixture, dividing the forest type variableinto 5 classes.

Crown closureThe crown closure variable expresses to whatdegree an existing stand uses the site productivityas a consequence of the density of the stand.Crown closure is given on a relative scale from 0to 1 divided into 10 classes. 1 denotes fully closedcrown foliage that makes full use of the standarea. If a stand is so dense that it impedes thedevelopment it is classified as over-closed and isgiven a value of 1+. Crown closure below 0.3 isheld as a regeneration area and a value between0.3 and 0.5 is held as sparsely distributed forest.

Growing stockGrowing stock or standing volume is measured inm3sk, and is a measure of total stem volume overbark above stump height including top. Thegrowing stock variable is only measured on treeshigher than breast height, that is > 1.3m.

Material and methods – the R-method

Background

Table 3 The National Forest Inventory Site quality classes (Jonson, 1914)

Site quality class I II III IV V VI VII VIIIIdeal yield,m3sk/ha and year 10,5 8,0 6,0 4,5 3,4 2,5 1,8 1,2

Table 2 The National forest inventory classification ofcutting classes (Svensson 1979)

Cutting class Meaning A Unstocked forest B1 Thicket stage B2, B3 Young forest C Young thinning forest D1, D4 Old thinning forest D2, D3 Final felling forest E Residual or creamed forest


The basic idea behind the R-method (R=ruta,square in eng.) is a systematic summary oflandscape information within quadratic, 25-hectare squares. One important feature in thedesign of the R-method is the ability tocompensate for systematic bias. This isaccomplished by comparison of the results withthe corresponding estimates from the NationalForest Inventory, which have statisticallydetermined standard errors. The systematic bias inthe R-data can hereby be quantified and correctedto the level of the National Forest Inventory. Oneof the main requirements has also been that the R-method shall produce objective data within areasonable time and at a higher resolution than theNational Forest Inventory. The described methodis able to produce data for an area correspondingto Stockholm County within one man-year.

The definition of the R-data variables givenbelow, allow for two variables to be directlyrelated to the county estimates from the NationalForest Inventory. These are ‘area of productiveforestland’ and ‘growing stock’ per hectareproductive forestland.

MethodThe R-method involves the systematic summaryof landscape information derived from aerialphotographs. The collected R-data include all500x500m squares delimited and specified by co-ordinates on the Swedish economic map and thatalso have at least 50% of the square area locatedwithin Stockholm county. This comprehensivesummary of landscape information within arelatively large area is derived from large amountsof detailed information. This detailed informationis derived through ocular inspection, which isinterpreted and translated into 8 variables with 3to 10 classes for each variable. These variablesand classes are described in detail below.

The R-method has mainly used aerialphotographs as information source for thedatabase that has been created for StockholmCounty. Regardless if the areal unit is 25 ha or 1

ha, aerial photographs, satellite images and alsofield studies are able to provide large quantities oflandscape data of interest for various applications.The final choice of measurement method and thesize of the areal unit is guided both by the purposeof the study and by available resources. There ishowever a fundamental difference betweenremote sensing and field based measuringmethods. Both airplane and satellite carriedsensors have the ability to give an overlook anddetailed information at the same time. Fieldmeasurements on the other hand often demandtime-consuming transportation in order to getclose to the same amount of summarizedinformation.

During the summer 1975 all of StockholmCounty was surveyed by aerial photography at alow elevation, 9200m, using panchromatic film.The R-data described here used these imagescopied onto photographic transparencies at a 1:50000 scale. For the interpretation the photographictransparencies were mounted on a light-table andinterpreted through a mirror-stereoscope (Wild).An enlargement factor of 3 times was used duringthe interpretation. A constant source for additionalinformation was also the Swedish topographicmap, scale 1:50,000 and the Swedish economicmap, scale 1:10,000. In addition a countywidemap of forest landscape parameters, scale1:50,000, was used. This map was compiled withthe aid of silvicultural advisers from the localdistricts during a preparatory phase of the actualR-data inventory.

During the digital registration of the R-dataeach 25-hectare square unit was specified by itsco-ordinates in the national grid. The boundary ofeach economic map sheet was transferred onto theaerial photographs using a mm-graded ruler andwith the guidance of the land ownership pattern.The interpreted values for the 8 variables werewritten onto a transparent film with a 1cm2 squaregrid, which in a 1:50 000 scale corresponds to theareal unit of the inventory, 25 hectare, Figure 12.

Table 4 The National forest inventory classification of forest types based on measured tree species mixture (Svensson1979) Forest type Fraction of basal area or of number of

stems/plants Pine forest Pine at least 7/10 Spruce forest Spruce at least7/10 Mixed coniferous forest Pine and spruce altogether at least 7/10 Mixed coniferous and broad leaved forest Broad leaved trees 3/10-6/10 Broad leaved forest Broad leaved trees at least 7/10


Thus, the inventory is conducted on a square-by-square basis. The entire area within the squareand nothing on the outside is visually interpretedand classified according to the instructions below.Regardless of content, each square was alwaysgiven an estimate of the area of productiveforestland within the square. This estimate isgiven as tenths, 2.5 hectares, of the total squarearea. Thus the code 1 means 10,0-19.9% or 2,50-4,99 hectares. In cases of doubt, a directmeasurement was performed on the economicmap sheet.

The pilot study that was conducted as part ofthe PSU-project covers entire Stockholm countyand it describes forest landscape variables for34,464 quadratic, 25-hectare squares. The averageinterpretation performance using the R-methodwas estimated to about 4000 hectares/day.

Instructions for the inventoryThe following section describes how theinterpreted variables are to be classified andwritten down on the transparent inventory sheet.

Figure 12 Example inventory transparency


Area productive forestlandCodes 0-9 are given for each tenth of the totalsquare area that is covered by productiveforestland. Numbers are written in the top left partof the square on the inventory transparency.

Land-use classIf water, planned buildings, agricultural area orother land-cover types except forestland cover50% or more of the square area, a code is given inthe middle of the square on the inventorytransparency. Codes are given according to Table5.

For all squares coded as V, P, J or A the arealcoverage of that land-use class is specified intenths in the lower right part of the square on theinventory transparency.

Forest wastelandAreas not classified, as V-, P-, J- or A-squaresaccording to the specification above, areforestland squares. In these squares the total areaof productive forestland, rock surface wastelandand swamp wasteland is always more than 12.5hectares, that is more than 50% of the total squarearea. If the area of wasteland dominates, thesquare is classified as either rock surfacewasteland or swamp wasteland according to Table6. The code is written in the middle of the squareon the inventory transparency.

For both types of wasteland the areal coverageof that type of wasteland is specified in tenths in

the lower right part of the square on the inventorytransparency. In addition, the crown closure andthe bearing capacity of the wasteland is specifiedby a number next to the wasteland class codefollowing the directions for these variables below.As usual, the remaining productive forestland isspecified in tenths in the upper left corner of thesquare on the inventory transparency.

Visible roads, power lines, landfills etc. is notincluded in the forest area. Larger trails, ditches,telephone lines etc. should however never beexcluded. This would require images of higherquality and a substantially higher level ofambition.

All other squares are evaluated on all 8variables, which are specified with a six-digitcode in the square on the inventory transparency.The first sign in this code is, as previouslyexplained, a number that represent the amount ofproductive forestland within the square. All sixsigns describe forestry conditions in theproductive forestland area. At the interpretationthe signs are written down within the square areaon the inventory transparency in two rows withthree signs in each row in a systematic fashion,Figure 12.

Site qualityThe second sign, written in the upper middle partof the square on the inventory transparency,represents the site specific, wood productivecapacity. This is encoded according to Table 7.

Table 5 Land-use classes and corresponding codes in R-data Code Land-use class

V Water area also including smaller islands < 1ha P Area planned for dense settlements J Agricultural area with accompanying grounds. 10 classes are separated as follows: J0 More than 50% of the area is settlement, farm center et c. J1 ≥ 50% of the area is field and accompanying buildings J2 ≥ 50% of the area is field of good quality J3 " normal quality J4 " poor quality J5 " varying quality J6 ≥ 50% of the J-area is grazed J7 " and field on solid ground J8 " and field on swampy ground J9 Other varying agricultural land A Other land-use classes than forestland cover more than 50% of the area

Table 6 Forest wasteland classes and corresponding codes in R-data Code Waste land type

H Rock surface wasteland; at least 50% of the forestland is rock surface and otherwaste land

M Swamp waste land; at least 50% of the forestland is swamp and other waste land


Table 10 Growing stock classes and correspondingcodes in R-data

Code Growing stock class (m3sk/ha productive forestland)

0 0-24 1 25-49 2 50-74 3 75-99 4 100-124 5 125-149 6 150-174 7 175-199 8 200-2249 225-

Tree speciesThe third sign, written in the upper right part ofthe square on the inventory transparency,represents what tree species that is estimated asdominant on the productive forestland.Classification and encoding follows Table 8.

Cutting classThe fourth sign, written in the lower left part ofthe square on the inventory transparency,represents cutting class and is a classification of

the maturity within a forest stand relative to thefinal felling age. Classes and codes used are givenin Table 9.

Growing stockThe fifth sign, written in the lower middle part ofthe square on the inventory transparency, gives anestimation of the average growing stock. Valuesare given as one of ten classes according to Table10.

Stand density and ground bearing capacityThe sixth and last sign, written in the lower rightpart of the square on the inventory transparency,represents the relative stand density measured ascrown closure and the ground bearing capacity onthe productive forestland. Classification andencoding follows Table 11.

Commentary on the instructions for theinventory

Land-use classThe first consideration at the interpretation ofevery individual square area is to determinewhether or not other land-use classes thanforestland, including forest wasteland, cover more

Table 7 Site quality classes and corresponding codes in R-data Code Site quality class

+ Good productive capacity; site quality ≥ 6,0 m3sk/ha*year is estimated to cover ≥50% of the productive forestland area

• Normal productive capacity; site quality 4,0 – 5,9 m3sk/ha*year is estimated to cover≥ 50% of the productive forestland area

- Poor productive capacity; site quality < 4,0 m3sk/ha*year is estimated to cover ≥ 50%of the productive forestland area

, Other productive forestland with varying site quality

Table 8 Tree species classes and corresponding codes in R-data Code Tree species class

T Pine forest; pine constitute at least 70% of the growing stock G Spruce forest; Norwegian spruce constitute at least 70% of the growing stock B Coniferous forest; neither pine nor spruce dominates, coniferous trees constitute at

least 70% of the growing stock L Deciduous forest and mixed forests where deciduous trees constitute at least 30%

of the growing stock∅ Regeneration area or area with unknown species composition

Table 9 Cutting classes and corresponding codes in R-data Code Cutting class

X Final felled area with or without planting and nurse trees as well as other areasaccording to the old Swedish forestry statute SVL§ 5:1, 2, 3, is estimated to cover ≥50% of the productive forestland area.

U Thicket stage forest is estimated to cover ≥ 50% of the productive forestland area. C Thinning stage forest is estimated to cover ≥ 50% of the productive forestland area. S Final felling forest is estimated to cover ≥ 50% of the productive forestland area.E Other combinations of cutting classes


than half of the square area. If so, this square isassigned one of the four following land-use classlabels:

· Water (V)· Planned settlement area (P)· Agricultural area (J)· Other land-use (A)If this is not the case, the square is regarded to

be forestland. This procedure will generallyreduce the frequency of land-use classes thatalready occur in lower frequencies or in smallerareal patches than the average. This is caused bythe fact that such land-use classes less oftencovers 50% or more of the interpreted 25-hectarearea.

No direct comparisons can be made betweenthe estimates of land-use classes from R-data andthe National Forest Inventory, unless thedifferences in classification can be eliminated,Figure 13. See also section Data analysis below.

V-squares - Larger water bodies such as themajor bays of the Baltic archipelago, the lakesMälaren and Erken, are not included in theinventory. On the other hand, pure V-squares in-between islands and the coastline are included forpurposes of completeness. For the squaresclassified as water, the code 0V9 (0 forestland and9 water) normally mean 100% water. The totalnumber of such squares is rather large withinStockholm County. Theoretically, following thedefinition, the code 0V9 implies that 0-9% of thesquare area might be other than water, forexample shoreline tree vegetation. However, these

areas ore not considered holding any productiveforestland.

V-squares encoded 0V5 through 8 indicate acertain proportion of other land-use classes thanwater of which some might be forestland. Thesescattered pieces of forestland close to water is notincluded in the total estimate of productiveforestland in the county. It might be of interest toknow that the total area of such scatteredforestland with proximity to water within thecounty is probably around 3000 hectares.

Shoreline forestland has sometimes during theinterpretation been considered as wasteland andtherefore not been counted as productiveforestland.

P-squares - In cases where half of the squarearea consists of planned settlements the economicmap has been consulted in detail. From this thefollowing areas have been included in the class:all plots whether built on or not, smaller adjacentparcels not suitable for forestry or agriculture, andareas such as sport fields, maintained beaches etc.

Forested areas an non-maintained agriculturalland bigger than 0.5 hectares has been regarded asproductive forestland even if these areas mighthave substantial other values such as recreationalarea for local residents. If more than half of theforest production theoretically could be harvestedthese areas have been classified as forestland evenif it is situated with densely populated areas.Consequently parts of the Djurgården and

Table 11 Crown closure and ground bearing capacity classes and corresponding compound codes in R-data Code Crown closure and ground bearing capacity.

0 Forests with markedly varying crown closure as well as ground bearing capacity 1 Well closed forest stands and high ground bearing capacity comprise > 50% of the

productive forestland area 2 " and normal ground bearing capacity comprise > 50% of the

productive forestland area 3 " and clay or otherwise wet ground with low ground bearing

capacity comprise > 50% of the productive forestland area 4 Forest stands with normal

closure and high ground bearing capacity comprise > 50% of theproductive forestland area

5 " and normal ground bearing capacity comprise > 50% of theproductive forestland area

6 " and clay or otherwise wet ground with low ground bearingcapacity comprise > 50% of the productive forestland area

7 Forest stands with sparselydistributed trees

and high ground bearing capacity comprise > 50% of theproductive forestland area

8 " and normal ground bearing capacity comprise > 50% of theproductive forestland area

9 " and clay or otherwise wet ground with low ground bearingcapacity comprise > 50% of the productive forestland area


Humlegården parks have been classified asforestland.

Residential plots in the forestland that are notyet built upon have been counted as P-land if it isanticipated that they may be built upon in a nearfuture. Otherwise they have been considered asforestland. Old, non-developed, one-hectare sizedparcels, frequently found in the Värmdö area,have been counted as forestland.

Squares encoded 0P9 (0 forestland, 9 plannedsettlement area) are pure built-up areas. P-squaresencoded 0P5 through 8 have a certain proportionof other land-use classes. Smaller groves andparks are often present, probably covering some800 hectares in the entire county, but they havenot been counted as forestland.

There are 1,701 squares in the county encoded1 through 4 P 8 through 5. These hold somethinglike 9,000 hectares of productive forestland oralmost 3% of the entire forestland in the county. Itshould be noted that this forestland has an averagedistance to dense settlements of 100m.

J-squares; The agricultural areas are inStockholm county mainly concentrated ascontinuous areas such as those aroundSkeppstuna, Hölö and many others. In the

archipelago and the forest districts the agriculturalareas are markedly scattered. It was previouslymentioned that this pattern might give someskewed results due to the fairly large basic arealunits used in the inventory. This is especially truefor the J-areas and to partly compensate for thatfact the J class is subdivided into 10 subclassesaccording to Table 5.

“J0 - More than 50% of the area is settlement,farm center etc.” are not very common but stillviewed to be of particular cultural and historicalinterest. As soon as this situation has beenconceived as possible, this code has been used.

“J1; ³ 50% of the area is field andaccompanying buildings” has often been used,maybe too often.

“J2 - 50% of the area is field of good quality”is probably the most frequently used J-sub-class.The yield from the agricultural fields maysometimes be overestimated due to favorable fieldstructures, proper maintenance or some indicationof homogeneous and lush crops at the imageinterpretation.

“J3 - 50% of the area is field of normalquality” is not as common as one might suspect.The agricultural output of this class might be

Land use classes in R-data Land use classes in theNational Forest Inventory

Forestland

Swamp

Forestland Rock surface

Sub alpine woodland

High mountains

Water Fresh water

Agricultural land Agricultural land

Planned developments Roads and railways

Other Power lines

Various land

National parks, naturereserves, certain militaryareas etc. (NRS)

Outside County

Figure 13 Corresponding land-use classes in R-data and the National Forest Inventory


slightly overestimated in the archipelago regionand underestimated in other regions.

“J4 and J5 - 50% of the area is field of poor orvarying quality” probably should have been usedmore often, maybe the courage was missing! Thesupport from the image for this delimitation isoften rather weak.

Codes “J6, J7 and J8 - 50% of the area isgrazed and field on solid or swampy ground”have maybe been used too liberally and theseareas are very likely to have been interpreted asforestland in the National Forest Inventory.

“J9 - Other varying agricultural land” usuallyrefers mainly to presently non-cultivatedagricultural land with uncertainty about futureland-use.

Similar with the V- and P-squares, noproductive forestland has been recorded for the1,381 squares encoded 0 J0-9 5 through 9. Thisunregistered area is estimated to be at the most1,000 hectares for the entire county. These areasare mostly smaller forested plots adjacent to farmbuildings, fields or small forest groves withinenclosed pastures. They are probably as aconsequence of their location especially valuablebiotopes for the preservation of biodiversitywithin the cultural landscape.

A-squares comprise both concentratedoccurrences of for instance golf courses, timberstorage areas, bathing places and mixed V-, P-and J-areas that altogether cover more than 12.5hectares. The definition of the A-class is thereforea bit more complicated than the previous ones.

As a consequence, one A-square can have atotally different mix of land-use classes thananother. We do know however that neither the V-,P-, J- nor the forest area alone exceeds 12.5hectares. The use of an A-class is to get 100%coverage of the entire county area.

By the technique of excluding the wastelandarea in the A-square estimates, some informationis gained. The code 3 A 5 for instance, give us thefollowing information; at least 30% is productiveforestland, at least 50% is A-class area and atleast 10% is forest wasteland. This design enablesa good estimation of the relation betweenproductive forestland and forest wasteland.

Forest wastelandH-squares - If forest wasteland is mostly made upof rocky ground or high hills the square is codedH. Stereo-interpretation of these areas requiresgood sense of observation and a good knowledgeof the regional conditions. Big, high hills with

steep slopes into the forests are easy to detect bothon the economic map sheets and in the stereoimages. This type of landscape is common onSödertörn, Värmdö and in the southern parts ofthe archipelago. In all other areas it is a lot harderto delimit areas of rock surface wasteland onlywith maps and stereo imagery.

The following method was employed to solvethis problem: Two versions, one older and onenewer, of the economic map was compared. If theolder version on its orthophoto had a bright spotwhere the newer version had a height-curve, andthe aerial imagery did not show any other sign ofproductive forestland, then this area wasconsidered as rock surface wasteland. Smalleroutcrops covered by mosses and shadowed byforest with north facing slopes are almostimpossible to detect in the stereo imagery. Andthis may be the most common type of rocksurface impediment in the county.

Rocky hill areas larger than 5 hectares havebeen mapped onto the countywide forestryinventory of 1972 and this information was ofcourse of great help at the interpretation.

M-squares - Larger mires is relatively easy toidentify in stereo images and they are alsospecifically marked on the new economic mapsheet. However, on Scots pine bogs it maysometimes be hard to separate wasteland fromforestland with poor productive capacity. It islikely that some of the M-areas have beenclassified by the national forest inventory asswamp forest of poor site quality. On the otherhand, in the comprehensive forestry inventory thesame areas would probably have been classifiedas wasteland.

These two wasteland categories occur as bothdensely and sparsely forested variants. This iswhy the code for crown closure and groundbearing capacity, explained below, has been used.The code 0 means mixed conditions and this codehas been fairly frequently used.

Area productive forestlandFor every inventoried 25 hectare square unit theshare of productive forestland has been recordedas codes 0 through 9. Thus the code records tenthsof the total square area and for example the code1 means that there is at least 10.0% and at most19.9% productive forest land within the squareunit. The figure is a combined estimate using theeconomic map sheet and the stereo image.

Water, house lots, agricultural fields,buildings, roads, power lines were most easily


located on the economic map sheet. The mapsheet does not however identify for examplegrazing areas and forest clear-cuts. These caseswere settled by using the stereo image but eventhis may not result in a clear answer. In such casesthe area was located on the old economic mapsheet, which has a fairly good bottom imprint ofthe aerial image from the end of the 1940s. If thesame land use pattern did not appear on this mapit is probably a felling site area. If however thesame land use pattern appeared on the older mapit may be a grazed area. This might still be arecently abandoned grazing area and to resolvethis possibility the areal image was consultedonce again with specific attention to details. Ifbrushes could be seen to encroach from the edgesor along ditches and especially if signs ofscarification or performed ground clearing couldbe found the area was classified as forest land. Onthe other hand, if grazing animals, areas ofheavily trampled ground was detected in theimage; the area was classified as pasture.

In remaining uncertain situations the decisionbetween forest land or grazed agricultural landwas based on considerations of the spatialconfiguration of the area under consideration andsurrounding land parcels. Properties such as size,shape and especially relation to water and farmcenter were used as decision variables if it couldbe feasible to continue grazing or forestry on thearea.

From this detailed description it is importantto note two things. First, the areal estimate ismade with great care in order to produce a reliableestimate for each areal unit. Second, there are amultitude of considerations of very differentcharacter and either one of these may determinethe outcome of the classification.

As previously explained the areal distributionof land cover types is of importance for thisclassification system. First of all it is decided ifthe combined area of productive forestland andwasteland covers more than half the areal unit.After this the wasteland area is estimated andsubtracted. The remainder is classified asproductive forestland and that area is recorded as

10% classes. This procedure also enables notexplicitly recorded land use areas to be calculatedas the difference between given area and the totalarea of the square unit.

The following variables are solely determinedbased on the properties of the productiveforestland area within the square area.

Site qualityTable 7 makes clear that the site quality variabledoes not refer to the average site quality withinthe square. Instead a mode-like measure isemployed where a specific site quality class ischosen only if it covers 50% or more of theproductive forest area. Site qualities may,especially within the County of Stockholm bevery variable from one stand to another, and eachstand is typically relatively small, only c.3hectare. It is therefore argued that it is notmeaningful to produce an average. The chosenmeasure is considered more expressive to peoplethat are non-forestry experts.

The estimates are not based on measurements.A countywide map with site quality estimatesfrom the local districts silvicultural adviserstogether with elevation information from theeconomic map sheet has been importantinformation sources. With this information inmind it is a fairly easy task for a skilled forestryexpert to use the aerial image to finally choosewhich of the four classes that best correspondwith the picture.

It is anticipated that site quality information atthis resolution could be very valuableinformation, especially for municipal planningpurposes.

The approximate correspondence between thesite quality classes used in the R-method andclasses according to Jonson (1914) are illustratedin Table 12. The mixed class of the R-method hasno equivalence in the Jonson system.

Tree speciesAll trained aerial image interpreters know that itis precarious to make estimates of tree speciesproportions in black-and-white images. Especiallydicey are cases with younger and intermediate age

Table 12 Site quality class correspondence between the R-method and Jonson (1914)

Site quality class R-data

Good (+) Normal (•) Poor (-)

Site quality class Jonson (1914)

I II III IV V VI VII VIII

Ideal yield, m3sk/ha and year

10,5 8,0 6,0 4,5 3,4 2,5 1,8 1,2


coniferous forest stands. So, for that reason theinformation from the local silvicultural advisershave been thoroughly studied before the aerialphoto interpretation. Surprisingly often thebackground photograph on the old economic mapgave additional information on the actual foreststand.

Squares with more than 3/10 deciduous forests(L-squares) appeared easiest to identify. Afterthis, older, not too well closed spruce forestsseemed easiest to identify. The dark shadow fromthese scrubby spruces was a contributing factor.On rocky substrates where normally pine treesprevail the code T was recorded, at least when thelighter gray shade of pine forests could bedistinguished. Where the local silviculturaladvisers had recorded pine trees this was taken asevidence to use the T class label even if the areawas of young or intermediate aged forest. Spruceforests on good site quality locations weredetermined similarly.

In most cases, the images of the forestlandscape were studied in a way that severalsquares could be compared in detail. When thejoint information suggested the code L, G or T,this was recorded.

Cutting classThe managed forests in the Stockholm regionhave an average growth period of some 100 years.Because of this it is anticipated that informationabout the forest maturity or cutting class is oflarge interest for all use of the forested landscape.For forestry purposes information on the locationof large volumes of forests ready for final fellingis of course of interest. Furthermore, thisinformation is of interest to all parties involved inthe physical planning process, especiallymunicipal and non-governmental organizationsinterested in for example the preservation ofrecreational values in the forested landscape. Thedesign of the cutting classes in the R-method, X,U, C, S, and E is for that reason defined in order

to enable all parties to participate in a constructivedialogue about the spatial issues of forestlandscape information. The correspondencebetween these classes and the National ForestInventory is illustrated in Table 13

During the air photo interpretation it can oftenbe hard to tell between the S and C classes. If indoubt the photographic backdrop from the 1950son the old economic map was compared with thestereo images. If there was a significant differencebetween the two the square was coded C,otherwise it was coded S. If the difference wassubstantial for some stands and not for otherstands the square was classified as a combination,E. The X and U cutting classes stand out fairlywell from the other classes in the imageinterpretation. It may, however, be hard to tell Xand U apart. In those cases the records from thelocal silvicultural advisers have been an excellentinformation source.

The E cutting class has always been usedwhen none of the classes X, U, C, or S alone havebeen estimated to cover at least half of theproductive forest land area within the square unit.Due to the landscape structure in the Stockholmregion, most if not all squares would have beenclassified as E-squares if the forests had beenintensely managed for a longer period of time.This is not the case at the time of this inventorybut it might very well be the case for many areastoday except maybe for larger areas managed bylarge forest companies in the northern andwestern parts of the county.

Growing stockThe mean growing stock per hectare productiveforestland was estimated for each square unit.These estimates were based on experience frommeasurements through the National ForestInventory and especially estimates from forestmanagement plans covering some 40,000hectares, which were compared with the stereoimage. In addition, information from other

Table 13 Cutting class correspondence between the R-method and the Swedish National Forest Inventory

R-method cutting class National forest inventory cutting class X Felling site A Unstocked forest B1 Thicket stage

U Thicket stage B2, B3 Young forest C Thinning stage C Young thinning forest D1, D4 Old thinning forest

S Final felling stage D2, D3 Final felling forest E Other combinations

of cutting classes E Residual or creamed forest


planning activities and related field samplingcovering some 100,000 hectares of the countyassisted the growing stock estimates.

The growing stock varies from stand to standbetween 0 and 400m3sk/hectare. Especially thecutting class value influences the amount ofgrowing stock. The forest stand average forfelling sites is somewhere between 0 and 10m3sk/hectare, not counting nurse trees, whichhave some 20-80 m3sk/hectare. The thicket stagehas a large variation in the range 10-120m3sk/hectare and in some cases even more at thetransition to thinning stage forest. The values forthinning stage forests is between 60 and 250m3sk/hectare and forests ready for final fellingusually lie in the range 80-400 m3sk/hectare. Insome extreme cases these stands may hold morethan 400 m3sk/hectare.

The second most important factor is the sitequality. Poor site quality usually gives values atthe lower end whereas good site quality result invalues at the higher end of the intervalsmentioned above.

Another contributing factor is how densely theforest grows and in the stereo images the crownclosure is a measure of this. A well-closed standwith large crowns does not have as much growingstock as a similarly well-closed stand with smallcrowns that indicate a higher number of stems.

Also the tree species affects the volume of thegrowing stock. Deciduous stands have generallyless volume than Norwegian spruce stands withsimilar conditions. Alder, Ash and Aspen might inthinning stage stands reach the same levels ascomparable coniferous stands, whereas Birch andespecially Oak stands normally have significantlylower volumes of growing stock.

An estimate of the growing stock volumeusing the described method is of course verysubjective. Each person has different experiences,which become articulated through the series ofconsiderations that are included in the estimationprocess. Systematic errors in the estimations maylead to an over- or underestimation of the totalgrowing stock volume, maybe as much as +/-15%. The quality control that has been performedas part of this report shows however that there is avery good agreement between the R-dataestimates and the National Forest Inventory forthe entire county.

Stand density and ground bearing capacityStand density is measured as crown closure sincethis is possible to detect in the aerial images. In

the National Forest Inventory the stand densitiesis given as volume-density and is calculated fromfield measurements of basal area and meanheight. The relation between the NFI measuresand the R-data measure is not developed anyfurther in this text.

The ground bearing capacity variable wasrecorded in the comprehensive forest inventorybut not in the NFI. There are several reasons toinclude this variable in the R-data inventory. Forforestry purposes this variable is of interest toroad construction and logging planning. Theindirect consequences for choice of plant speciesand silvicultural planning is however far moreimportant. Also, this variable may be of interest toother interest groups such as for recreationpurposes.

In the original R-data these two variables arefound in the same position. This is the primarycause of the following mixed treatment of thesevariables.

Codes 0-9 have been used for all square unitsclassified as productive forest and also for thosesquares where forest wasteland is dominant (Hand M squares). Note however the specialmeaning of the code 0 in connection with theforest wasteland class.

For squares classified as productive forests thecode 0 is used with a special meaning for fellingsites that show signs of regeneration measures.Felling sites with no signs of soil cultivation orscarification or other regular point pattern, whichmay indicate seedlings, have been given the code0. Felling sites with nurse trees and signs ofground clearing have often been coded 4, 5, or 7although no seedlings have been identified.

The codes 1, 2, and 3 means that the foreststands are generally well closed with high normalor low ground bearing capacity. The interpretationmay however have favored the normal case.

The codes 4, 5, 6, and 7 have also beenapplied with no significant deviations from theinstruction. Though the code 5 “Forest standswith normal closure and normal ground bearingcapacity” have been used in situations of a lack ofindications for any of the other classes.

Squares given the codes 8 or 9 “Forest standswith sparsely distributed trees and {normal, low}ground bearing capacity“ are usually areas thathave been neglected or otherwise show signs ofunsatisfactory forest conditions. A large amountof these areas may nevertheless be abandonedagricultural land or enclosed pastures that hold


other specific values than those favored by thisclassification system. It may thus turn out usefulfrom a nature conservation perspective toinvestigate these areas in more detail.

Data processingDuring the period from 1981 until 1987 the Boardfor regional planning and industry within theStockholm county council was the authority thatspecifically co-operated with the PSU-projectleaders at the Swedish National Board ofForestry. This co-operation has largely consistedof registration and processing of original R-dataperformed by Gerd Lundström under thesupervision of ME Björn Lindfeldt.

Registration control and correctionsCopies of inventory originals were continuouslyprovided for computer registration. Printouts ofthe recorded data were similarly brought back tothe interpreter (Rolf Ruben) for control andcorrection. This procedure was repeated until noerrors were found.

By February 1986 the entire county was R-inventoried and computer registered. Finally thedatabase was added the national grid coordinatesfor each registered R-data square. At the sametime the codes from the inventory sheets weretranslated into number format.

To be able to evaluate a possible drift in theestimates, several square units were interpretedtwice and even three or four times after sometime. Also after a longer break in the inventorywork some 1000 hectares were re-interpreted.These were all measures to evaluate anysystematic errors from one period of inventory toanother. Codes were given to these square units tobe able to identify the most recently inventoriedsquare.

Quality assessmentAs soon as the following prerequisites can befulfilled, the R-data can be compared withcorresponding variables from the National ForestInventory.• R-data should be collected according to the

above instruction for the inventory and for ageographical area that correspond tostatistically verified data from the NationalForest Inventory.

Furthermore data from the National ForestInventory need to be processed so that thefollowing information is available:• The total area an the land use class

proportions

• The productive forestland area, mean sitequality and the areal coverage for each sitequality class.

• The total volume of the growing stock, anoverall mean value per hectare, arealdistribution of growing stock classes and themean growing stock per hectare for the threesite quality classes <4m3sk/ha, 4-6 m3sk/haand ≥ 6 m3sk/ha as well as for the cuttingclasses (A + B1); (C + D1) and (D2, 3, 4 +E).

• Areal distribution of cutting classes and thedistribution for the site quality classesabove, at least for the cutting classes (A +B1); (C + D1) and (D2, 3, 4 + E).

• The areal distribution of tree species and thegrowing stock distribution for tree speciesand for the three site quality classes thatbelong to the cutting class D2, 3, 4 + E)together with total figures.

If possible the standard error should be given.The National Forest Inventory has not

surveyed the entire county. The municipalities ofDanderyd, Järfälla, Lidingö, Nacka, Sollentuna,Solna, Stockholm, Sundbyberg, Täby and parts ofVaxholm municipality, has not been surveyed indetail. The area of these municipalities is in theNational Forest Inventory assigned to the land useclass “Agricultural land”. Since this is a ratherlarge area it would be hard to make any absolutecomparisons between the two datasets if not somesort of adjustments were performed. To make acorrect comparison between the two materials, theR-data not within the survey area of the NationalForest Inventory needs to be removed.

The remaining difference between theNational Forest Inventory figures and the R-dataestimates may be related to that the distinctionbetween productive forestland and forestwasteland has been different. There is also apossibility that deviation may occur due to thedifferences in spatial resolution of the twomethods. The National Forest Inventory fieldplots are 314m2 while the areal unit in R-data is250,000m2. This means that the two methods havedifferent abilities to record properties of arealunits, which size lie between these limits. In casesof systematic deviation in R-data from theNational Forest Inventory data it is anticipatedthat using a correction factor can compensate forthis deviation.

The area productive forestland is the onlyvariable that is consistently given for all square


units no matter what land use class that dominatesthe square. Concerning the areal estimates ofother land use classes, the purpose of the R-method is only to give a rough estimate of theland use structure by, for example, calculating thefrequency of square units with a specific land useclass or by examining the map image, Figure 16.For such areas the area of the specific land useclass is given in tenths together with the estimateof productive forestland area. The remaining area,if any, is an indirect measure of the amount ofother land use types.

Squares not covered by 50% or moreproductive forest land has not been specifiedaccording to the forest variables, site quality, treespecies, cutting class, growing stock, standdensity, and ground bearing capacity. Theestimate of these variables is accordingly basedon square units that have a 50% or more arealcoverage of productive forestland. The truth ofthe overall estimates relies upon the assumptionthat the forests do not have entirely differentproperties for these different square units. Thisassumption may prove incorrect but isnevertheless the best estimate available.

For the quantitative summaries the classmiddle have been used as a mean value for eachclass. This means for example that for the variable“area productive forest land”, the class 0, whichby definition includes an interval 0-10% of the

square area, is represented with the 5% value ofthe square area, 1.25 hectare.

Condensed-county methodThe quality assessment was performed byconcentrating the evaluation to those R-datasquares that spatially intersected one or more ofthe National Forest Inventory field plots, Figure14. In this way the spatial correspondencebetween the two data sets increase significantly.All National Forest Inventory field plots from theperiod 1973-77 thus determine the “condensedcounty”. The sampling based National ForestInventory is designed to give reliable estimates ofcounty totals. It is therefore argued that the"condensed county" method gives an even bettercorrespondence between the actual situation andthe sample, due to the increased spatialcorrespondence.

Results – R-data

Quality assessmentThe quality assessment is here restricted to totalproductive forest area and total growing stockestimates for the entire county. Other qualityassessments are also possible but will requireadditional information from the National ForestInventory according to the specification above. Ifpossible though approximate figures from theNational Forest Inventory have been given to givethe reader an idea of the differences that may existbetween the two data sets.

The total area of the “condensed county” is46,750 hectare. It consists of 1,870 R-data squareunits and 3,091 corresponding National ForestInventory field plots. The total area productiveforestland for the “condensed county” isaccording to R-data 21,012 hectare. To producean estimate of the total area within the wholecounty, the same technique is used as within theNFI. In 46,750 hectare of surveyed area we have21,012 hectares of productive forestland. Theratio 21,012/46,750 is used against the knownland area of the entire county, 649,000 hectares.This gives 21,012 / 46,750 * 649,000 = 291,696ha. The National Forest Inventory estimates thearea productive forestland to be 287,000, whichmeans that the R-data estimate is 1.6% higher.We should expect some degree of overestimationwhen the areal unit for investigation is increasedfrom 314m2 of the NFI field plot to 250,000m2 ofR-data. Such circumstances generally suppressuncommon classes, in this case forest wasteland,and favor more common classes. It is however

Figure 14 Conceptual layout of the condensed countysquare units. The square grid illustrates the grid of the25 ha R-data inventory units. The small dots representfield plots in the National forest inventory, whichusually come in square shaped tracts. The shaded 25haunits indicate those squares that intersect with one ormore National Forest Inventory plots.


hard to predict the magnitude of this effect since itis dependent on the spatial properties of land useclasses, such as the size and distribution ofhomogeneous areas.

The growing stock estimate is in R-data givenas mean growing stock volume per hectareproductive forestland within each 25-hectare unit.The mean growing stock in the “condensedcounty” is estimated to 120.17 m3sk/ha, whichcan be compared with the National ForestInventory estimate of 121 m3sk/ha. The differencebetween the two estimates is thus <1%. Thecounty total is in R-data estimated to be35,003,520 m3sk/ha, which deviates by 2.6%from the NFI estimate of 34,100,000 m3sk/ha. Ofcourse, the R-data estimate here includes theprevious overestimation of the productiveforestland area. A summary of the qualityassessment is given in Table 14.

Clearly there is a very good correspondencebetween the estimates of area productiveforestland and growing stock produced by theNational Forest Inventory and the R-methodrespectively. The estimates of total growing stockwill also be discussed further in one of the resultsections below.

Land use classFigure 16 illustrates the geographic distribution ofrecorded land use classes. A total of 337,481hectares of productive forestland has beenrecorded in R-data for the Stockholm County.250,643 hectares have been described regardingtheir site quality, tree species, cutting class,growing stock, stand density, and ground bearingcapacity. This area is 78% of the total productiveforestland area in the county. The remaining 22%is consequently not described in detail for thesevariables because it is located in square units thatpredominantly consist of other land use classes.

The National Forest Inventory states that thearea productive forestland is 287,000 hectare. Thereason for this large deviation is due to thepreviously mentioned fact that the National ForestInventory does not survey significant parts of thecounty. The previous quality assessment showedthe reliability of the R-data estimates forproductive forestland. The estimates for the otherland use categories given in Table 15 are onlygiven as a rough estimate of the areal proportionof land cover categories within the county. Forreasons explained previously these figures, exceptthe figure for forestland, can not be used as anestimate of the absolute values of these land usetypes.

Site qualityFigure 17 illustrates the geographic distribution ofthe site quality classes. Out of the forest area thathas been described in detail, 30.2, 28.8, 19.7 and21.3% have been classified as good, normal, poorand other varying site quality in that order.

Studying the areal distribution of the sitequality variable in Table 16 we see that 46% ofthe felling sites are on good site qualities. Thisrelation is however not the case for the thicketstage forests. This result is not unexpected, butmaybe a bit more pronounced than anticipated.That clear cuts during the 1960s and the 1970swere mainly done on the good sites have beenknown through information from localsilvicultural management plans. The reason forthis is partly the more aggressive logging policyduring these years, partly due to the autumn stormin 1969 that fell large areas of high-grown forestson clay or otherwise wet ground, that is on goodsite qualities. During the 1950s the logging policywas different and many forest plantations wereperformed during this period. Markedly often itwas forests on eskers and other higher hills thatwere logged during this period.

Table 14 Summary of the quality assessment of R-data against the National Forest Inventory for Stockholm countyestimates of area productive forestland and growing stock for the period 1973-77.

NFI estimate R-data CondensedCounty

R-data Totalestimate

Area productive forestland (ha) 287,000 291,696 337,481Average growing stock (m3sk/ha) 121 120 126Total growing stock (m3sk) 34,100,000 35,003,520 42,522,606

Table 15 Areal coverage within Stockholm County of the 5 land use classes that was identified by R-data.Forest Water Planned

buildingsAgricultural

areaOther

Area (ha) 337,481 178,457 51,500 93,542 39,166


For the other cutting classes there are onlyminor deviations from the overall distribution forthe entire county. It seems only logical that thefelling stage forests have a slightly lower averagesite quality than the thinning stage forests.

Cutting classThe geographic distribution of the differentcutting classes is shown in Figure 18. The arealdistribution summarized in Table 17 seem tocorrespond well with the estimates of the NationalForest Inventory acquired in a totally differentmanner.

Tree speciesThe comparisons of results for tree species aresummarized in Table 18. These numbers indicatethat the Norwegian spruce forest area have beenlargely underestimated in R-data. The spruceforests have probably been classified asconiferous forest due to the well-known difficultyto separate spruce from pine during the imageinterpretation. Stereo image interpretation of colorinfrared images would certainly lead to a more

accurate result. The underestimation may also bea result of the size of the areal unit formeasurement. If spruce forest stands generallyoccur as relatively small and dispersed units, thiswould result in a lower frequency in the R-datathan in the reality. The geographic distribution oftree species according to R-data is illustrated inFigure 19.

Growing stockThe good correspondence between the R-data andNFI estimates of growing stock was previouslyestablished. This evaluation was based on a so-called “condensed county” assessment. Onereason to perform this type of evaluation was thatthe National Forest Inventory does not cover theentire county with forest parameter evaluations,but R-data give an exhaustive evaluation of theentire county area. The average growing stockvolume per hectare is estimated to 126m3sk/hectare in R-data, Table 14. This figure ishigher than the county average for the areascovered by the NFI discussed earlier. From thiswe may draw the conclusion that the forests in the

Table 16 Areal distribution of site quality classes over cutting classes. Numbers are given in hectares based on R-datafrom Stockholm county 1975.

Cutting classSite qualityclass

Felling site Thicketstage

Thinningstage

Final fellingstage

Othercombination

Total

Good 13,201 4,809 23,852 24,401 8,830 75,094Normal 8,722 6,337 22,673 25,232 8,855 71,821Poor 3,410 5,944 16,757 20,408 3,297 49,817Other/varying 3,526 3,509 15,654 20,219 10,998 53,909Total 28,860 20,601 78,938 90,260 31,982 250,643

Table 17 Comparison between R-data and NFI estimates of the areal distribution of recorded cutting classes. Numbersare given as percent coverage of the total forest area in the county 1975.

Cutting class R-data (%) NFI (%)Felling site (A + B1) 11.5 14.2Thicket stage (B2 + B3) 8.2 10.8Thinning stage (C + D1 + D4) 31.5 29.8Final felling stage (D2 + D3) 36.0 44.3Other combinations, residual (E) 12.7 0.9

Table 18 Comparison between R-data and NFI estimates of the areal distribution of recorded tree species. Numbers aregiven as percent coverage of the total forest area in the county 1975. The NFI estimates pertain to the entire Svealandregion and not only Stockholm County.

Tree species R-data (%) NFI (%)Pine 31.8 36.6Norwegian spruce 10.4 28.8Conifer 44.0 22.8Deciduous or mixed 6.7 11.8Unknown 7.1 0


municipalities of Danderyd, Järfälla, Lidingö,Nacka, Sollentuna, Solna, Stockholm,Sundbyberg, Täby and parts of Vaxholmmunicipality has a significantly higher averagegrowing stock than other areas in the county.

Stand densityThe general impression from Table 19 is thatmost of the thinning and final felling stage forestshave a well-closed structure and only some 2% ofthe thinning stage forests hold sparsely distributedstands. No map of this parameter has beenproduced.

Ground bearing capacityThe areal distribution of the ground bearingcapacity is summarized in Table 20 for each sitequality class. This shows for instance that highground bearing capacity occurs mainly on areas ofpoor site quality, these are often on rockysubstrates. Low bearing capacity occurs largely inlocations with good site quality, often in clayfilled depressions and valleys.

Material and methods – Structural data;classification of economic map sheetsaccording to landscape type

BackgroundAs previously mentioned the classification ofeconomic map sheets into landscape types wasperformed as a background material for thedelineation of the archipelago region in Sweden.The extent of this working material came to coverentire Sweden in which four different landscapetypes have been identified. The idea behind thisextended background data was that it could beused to differentiate for example forestrypractices between different landscape types. Thegrid structure of the material was also anticipatedto be possible to compare with other data such asnational and regional summaries from theNational Forest Inventory and data such as the R-data described previously in this chapter.

MethodIn the working material four different landscapetypes have been identified, coastal district,urban/suburban district, agricultural district andforest district. As a result this working materialcame to include estimations of constituting

Table 19 Areal distribution of stand density (crown closure) classes over cutting classes. Numbers are given in hectaresbased on R-data from Stockholm county 1975.

Cutting class

Crownclosure

Fellingsite

Thicketstage

Thinningstage

Finalfellingstage

Othercombi-nation

Totalhectare

Total%

Well closed 892 6,690 52,728 46,505 8,909 115,723 46Normal 1,428 6,976 19,826 37,046 11,539 76,815 31Sparsely 2,471 1,327 1,835 3,799 3,357 12,788 5N/A 24,070 5,608 4,549 2,912 8,178 45,317 18Total ha 28,860 20,601 78,938 90,261 31,983 250,643% 12 8 31 36 13 100

Table 20 Areal distribution of ground bearing capacity classes over site quality classes. Numbers are given in hectaresbased on R-data from Stockholm county 1975.

Bearingstrength

Site quality

Good Normal Poor Other Total (ha) Total (%)High 765 18,992 39,161 8,673 115,723 46Normal 47,035 38,878 2,370 31,963 76,815 31Low 11,780 1,980 1,612 2,118 12,788 5N/A 15,515 11,971 6,675 11,155 45,317 18Total (ha) 75,095 71,821 49,818 53,909 250,643Total (%) 15 14 10 11 100


landscape types in 18000, 5 by 5 km quadraticsquares covering entire Sweden.

The term district above will be usedthroughout this description an is meant to besynonymous to the Swedish word ‘bygd’ whichessentially mean a geographic region often in thecountryside and populated to some extent. Thespatial extent of this concept is not clear but fromthe translation above it follows that a ‘bygd’ ordistrict in the context of this report embraces atleast one or a few kilometers.

As source material the Swedish topographicmap sheet series at 1:50 000 scale was used. Thevarying availability of the latest editions of thesemap sheets resulted in a time span from 1960 to1980 in the used maps. Each map sheet is dividedinto 5 by 5 km squares corresponding toindividual map sheets in the Swedish economicmap sheet series in 1:10 000 scale. The arealcoverage of certain land cover / land use types isestimated for each 5x5km square unit. On thebasis of this estimation the square is classified asone of four landscape types. Furthermore theestimated areal proportion of all landscape typespresent in the square unit is also recorded. Theestimates are given as tenths of the total squareunit area. Thus, the smallest area registered is 2.5km2, which may in fact be a summary of manysmaller areas summing to an area large enough tobe registered.

Instructions for the inventoryAreal units corresponding to one entire economicmap sheet is divided into four categoriesaccording to the instruction below. It is alsopossible to use other sizes of the areal unitfollowing the occasional modifications detailedbelow.

The landscape types identified are:1. Coastal district, code K, Archipelago covers

at least 60% of the areal unit. Archipelagodenotes the land area within 500 m from theseashore or from the shore of any of the fourlarger lakes in Sweden, Vänern, Vättern,Mälaren, and Hjälmaren. Included are alsowater bodies of these lakes. Code K is usedwithin 500m from sea etc. no matter what thesize of the areal unit is.

2. Urban/suburban district, code T, is usedwhen at least 1/8 of the areal unit is coveredby urban/suburban land, that is, built up landincluding house lots, streets, parks, industrialand commercial areas. Also golf courses,airfields, power lines, freeways and sports

facilities (excluding slalom slopes) is countedas urban/suburban land.

3. Agricultural district, code J, is used whenfields, meadows and urban/suburban landcover at least ¼ of the areal unit. By fields andmeadows mean agricultural fields, grazedareas including tree patches within these areas.Also roads and buildings within or connectedto these areas are counted. In addition to thaturban/suburban land as defined above isincluded in this landscape type, thoughobserving the sequence of work given below.

4. Forest district, code S, is used for all otherareas, thus encompassing forested areas andother land cover types that do not belong tothe other three categories

Sequence of workIn the classification process, first an evaluation ismade whether the square should be classified asCoastal district, code K, or not. If this is not thecase it is tested whether the requirements forUrban/suburban district, code T, is fulfilled.Again, if this is not the case it is tested if therequirements for Agricultural district, code J, isfulfilled. If the requirements for neither K, T, norJ is fulfilled the area is classified as Forestdistrict, code S.

The areal estimates are always given asfigures rounded to whole tenths of the areal unit.Occurrences less than 1/20 is recorded with a dot,• = incidence.

During the inventory, information is recordedas letter and number codes in geocoded squareson a separate interpretation worksheet. The classcode is given in the middle of the square on theworksheet. In addition the areal share of eachlandscape type within the square unit is recordedas integer numbers in the four corners of theinterpretation worksheet, each cornercorresponding to one landscape type.

In the upper left corner, the coastal areal sharewithin the square unit, rounded to whole tenths, isrecorded. The lower left corner is in a similarmanner used to record the areal coverage ofurban/suburban area. The lower right cornerrecords the agricultural area and the upper rightcorner is used to record the forest area. Inlandlake area other than that included in the coastalarea definition is not explicitly recorded but canbe calculated as the remaining area between thoserecorded for the four landscapes type and the fullarea of the areal unit.


Results – Structural data

Most of the material is still only available asunprocessed interpretation worksheets. Due to thefocus of the original project on coastal areas thesehave, however, been transferred into digitalformat. For the purpose of this work a portioncorresponding to the Stockholm County has alsobeen transferred into digital format. Figure 15show the areal share estimates for each recordedlandscape type for Stockholm County. Figure 21illustrates the areal distribution of the finalclassifications into landscape types for StockholmCounty.

Discussion and conclusions

This chapter has given a detailed description oftwo previously unpublished data sets. The

methods that was used to record these data setsand the final structure of these data resembles to avarying degree the structure used in contemporarytechniques for digital analysis of landscapeanalysis.

A primary goal has been to highlight thesedata as a possible source for further scientificstudies. Another important goal has been toillustrate some interesting techniques to recordand assess the quality of information about thephysical landscape.

For the purpose of this dissertation the chapterhas given a thorough description of the data usedin chapters 5 and 7.


A – Urban/suburban B - Forest

C - Agricultural D – CoastalFigure 15 Images illustrating the estimates of areal share of the landscape types, Urban/suburban (A), Forest (B),Agricultural (C), and Coastal district (D) form the Structural data inventory 1960-80. Each pixel represents an areacorresponding to one Swedish economic map sheet, 5 x 5 km. Low areal share is given in dark tones and higher shares aregiven as increasingly whiter tones.


Figure 16 Map of land use categories in Stockholm County according to R-data from 1975


Figure 17 Map of site quality categories in Stockholm County according to R-data from 1975


Figure 18 Map of cutting class categories in Stockholm County according to R-data from 1975


Figure 19 Map of tree species categories in Stockholm County according to R-data from 1975


Figure 20 Map of growing stock categories in Stockholm County according to R-data from 1975


Figure 21 Landscape types in Stockholm County according to the Structural data inventory 1960-80

Pilot studies • 63

PILOT STUDIES

Introduction

The following chapter presents two pilot studiesthat were performed during the first phase of thisresearch. Parallel to the problem formulation itwas essential to gain some insight into theproblems associated with data transformations. AsI have already mentioned I tried to separate thetransformation problem into separate geographicdimensions. This chapter deals with the problemof transforming geographical data through achange of spatial scale.

Scaling of geographical data has traditionallybeen made manually, and the issue of aggregationand generalization has a long tradition withingeography. There is today an increased demandfor automated tools for generalization, updatesand revision of databases made at differing levelsof generalization. Several disciplines apart fromgeography such as ecology, land surveying,computer science and cognitive psychology arecurrently involved in research on this issue. Areview of research within this field has beenpresented in the spatial granularity section ofchapter 3.

The need for a general framework to deal withscale and complexity changes of geographicaldata was also indicated in chapter 3. The spatialpattern of various environmental variables andtheir relationships might be ordered into scaledomains and it is also speculated that transitionsbetween such domains might be relatively abruptmuch like phase transitions in physics (Wiens,1992). King (1990) denotes such domains as the‘maximum extent’ in the context of ecologicalmodels. The fairly wide support for ideas of scaledependent controlling factors, on for exampleecosystems, encouraged me to try to designexperiments that investigated these theories as atool for context transformation processes.

Both studies approach the scaling problemtrying to isolate the change in spatial granularity,fixing the thematic and temporal granularity.Thus, they also make a first attempt to detect anyevidence for a scale dependent perception of land

cover categories. This research direction is furtherdeveloped in chapter 7.

The first study considers both continuous andcategorical data and uses both global measures aswell as location specific analysis to estimate thespatial aggregation effects. For the categoricaldata this first study uses two different spatialaggregation strategies, one standard procedureusing a majority decision and an alternativeaggregation method based on confusion matrices.The second pilot study wanted to test the idea thatphysical controlling factors such as climate,hydrology and soils impose general and scaledependent constraints on other environmentalvariables. The second pilot study also considersboth qualitative and quantitative data but focusmore on the problem of how mixture classes andmixed pixels behave in the aggregation process.

Data for the pilot studies consisted of the R-data material presented in the previous chapterand additional data collected specifically for thepurpose of the pilot studies, but in line with thesame methodology. Of these two pilot studies, thesecond has been presented at the 8th annual GAPanalysis meeting, 1998, Santa Barbara, USA(Ahlqvist, 1998).

Pilot study 1

ObjectiveThe objective of this study is to evaluate theresults of a straightforward aggregation of aninitial dataset. This kind of assessment has beendone before but in this study the initial dataset, aswell as the reference dataset for testing theaggregation performance, is derived throughmanual classification at both levels of resolution.The basic hypothesis in this study is that thehuman perception of landscape features varieswith field of view. If true this would imply thatinformation gathered for a certain area wouldshow differences depending on the spatial unitused for registering the information.

MethodTo investigate whether or not there is a differencein interpretation depending on scale, data were

Chapter

5

64 • Pilot studies

needed from two separate interpretations of thesame area using different spatial resolutions.Other studies of aggregation effects on grid datahave mainly evaluated direct changes in spatialmetrics such as local variance and pattern indices.This study uses an alternative approach based onmanual interpretation at two resolutions using thesame classification scheme. It is anticipated thatthe effect of different automatic data aggregationmethods can be assessed in the right context ifusing manually classified data at differentresolutions as reference data.

It may be speculated that the same piece oflandscape might be differently evaluated as aresult of the area of focus for interpretation andclassification. The experimental design also hasthe profound difference with normal remotesensing image based tests that some of theinformation in this material has a sub-pixelresolution. Variables such as area productiveforest actually gives the areal percentage of theclass productive forest land within each pixel.Also the data actually contain gridded objectinformation not spectral ‘object-like’ informationand this is very important for the purpose for theinvestigation especially in the case of comparingaggregation results with a reference image.Aggregation of spectral signatures averages allsub-pixel object signatures and gives theintegrated signature as an input to theclassification algorithm to decide what classshould be assigned to the pixel. In theinterpretation process the interpreter actuallysummarizes the individual features on which theclassification is based, and thus provides an objectspecific estimate for the pixel value. The use ofmanually interpreted data at two different levelsof resolution is uncommon and as a result thereare no guidelines on what methods might beappropriate for this kind of analysis. The methodsused in the analysis are therefore a blend oftechniques from digital cartographicgeneralization, remote sensing and statisticalliterature.

Study area and data usedThe study area is located within the StockholmCounty, Figure 11. There is a problem of doingmultiscale studies with manually interpretedinformation and that is to collect a reasonablysmall but still representative sample. Three studysites were chosen so that both the physicalproperties of the landscape as well as differentland ownership patterns should be represented

within these samples. The total number ofinterpreted pixels was limited to n=675 at thehigher resolution and n=27 at the lowerresolution.

The low-resolution data set was extractedfrom the R-data described previously having aspatial resolution of 500x500m. To recapitulatequickly, this method is based on air photointerpretation and classification of 25 ha quadraticareas in a regular grid with full areal coverage.Interpretation is made using areal panchromaticphotographic images at 1:50 000 scale.

The additional high-resolution data set wascollected using the same method over the threestudy sites, Angarn, Hejsta and Strömma,indicated in Figure 11. The only thing thatseparates the two is that the second dataset wascompiled using a spatial resolution of 100x100m.Data was thus assembled for 1 ha pixels and 25 hapixels separately, giving two sets of data ofdifferent resolution distributed over the threestudy sites.

It may seem odd to use this kind of data in astudy on modern automated generalization toolsbut it is anticipated that this data have someinherent properties that make them most suited forthis kind of analyses.

As earlier mentioned, the classification systemfor R-data has been adapted from the SwedishNational Forest Inventory classifications. Thisinventory is an annual survey covering the wholeof Sweden. Part of the overall R-data approach isa ground truth calibration method. Using datafrom the nation wide National Forest Inventory R-data interpretations can be evaluated and adjustedfor systematic bias in the interpretation. Theaccuracy evaluation is an important step to ensurethat relevant conclusions can be made from theresults. Also, calibration to reduce systematic biasis necessary whenever absolute accuracy isneeded e.g. comparison with other datasets. Dataused as ground truth in this study is a compilationof existing data from the Swedish National ForestInventory.

Quantitative data – area productive forest landThe parameter ‘Area productive forest land’ ismade up of 11 interval classes from 0=0-5%, 1=5-15% up to 9=95-100%. This parameter wasdigitized from manual interpretation protocols toproduce the raster images shown in Figure 22columns a) and b).

Accuracy assessment of absolute values wasmade by comparing averages of manually


interpreted data with ground truth data from theSwedish National Forest Inventory (NFI). Fourtotal averages of productive forest land, two formanually interpreted data at 1 and 25 haresolution and two averages for the correspondingNFI sites are given in Table 21. These averagesare based only on those interpreted pixels and NFIfield sites that overlap spatially.

Automatic generalization of 1 ha pixels weredone by averaging 5x5 pixels into rounded integerpixel values producing images shown in Figure 22c). This operation produces 25 ha pixel imageswith average values of area productive forest veryclose to the original 1 ha pixel images.

Comparison of summary statistics from themanual and automatic data at 25 ha resolution ispresented in Table 22.

During the analysis it was speculated that theinterpretation accuracy could be affected by thespatial configuration of the landscape features.Therefore a range of spatial pattern indices wasproduced from the high-resolution data (1hapixels) using the pattern module in IDRISI. Thederived index images were then compared withthe images showing differences between theautomatic and manual interpretation, Figure 22 d),using simple cross tabulation.

a) 1 ha data manual interpretation

b) 25 ha data manual interpretation

c) 25 ha data automatic generalisation

d) Difference (auto – man)

Angarn -1 0 0

0 +1 +2

+1 +1 +2

Hejsta 0 +1 +2

+1 0 +1

+1 +1 +2

Strömma 0 +1 +1

0 +2 +3

+4 +2 +2

Figure 22 Results from manual interpretation with a) 1 ha pixel resolution, b) 25 ha pixel resolution, c) automaticgeneralisation of a) 1 ha pixels, and d) showing the difference between interpreted data and automatically generaliseddata at 25 ha pixel resolution. Increased tree cover is shown as increasingly darker tones.

Table 21 Accuracy evaluation results showing the accuracy of the manual interpreted data for the variable areaproductive forest land. Data from the Swedish National Forest Inventory serve as ground truth.

Total erroranalysis area

NFI referenceplot area

Average of man.Interpretation (%)

Average ofNFI data (%)

Deviation Error

1 ha pixeldata 225 ha 21,4 ha 48.2 46.3 0.019 4.1%

25 ha pixeldata 17125 ha 35,0 ha 45.1 43.6 0.015 3.4%


Qualitative data – tree speciesData on tree species for the three test sites werecollected in the same way as for quantitative dataabove. The NFI reference data contain 9 treespecies classes that were aggregated into 5 classesto give corresponding classes with the manuallyinterpreted data, Table 23. After this, the onlydiscrepancy between the two tree speciesclassifications is the undetermined class in the R-data classification. Pixels classified asundetermined in the R-data set have beenexcluded from the analysis to avoid erroneousresults. Those NFI data that spatially overlap suchundetermined pixels have also been excludedfrom the analysis.

The 1 ha data were aggregated into 25 ha databy categorical generalization using two differentaggregation operators. The first strategy uses amajority method assigning the aggregated pixel avalue of the most frequent class in the source 25pixels. In this study it was possible to give twoalternative answers in the case of ties with equalcounts.

The other aggregation operator used confusionmatrices for each individual tree-species class.NFI reference plots have been used to determine

the mix of tree species within interpreted pixels.In this way it was possible to determine forexample that pixels classified as Scots pine inaverage contain 39% Scots pine, 21% Norwegianspruce, 29% mixed conifers and 11% deciduous.For pixels classified as Spruce, Conifer, andDeciduous mix, other distributions were derivedfrom the reference NFI data. In this wayconfusion matrices were produced for all treespecies classes and at both interpretation levels, 1ha and 25 ha resolution, Figure 24

We may look upon the diagrams in Figure 24as class signatures. Consequently, this method isvery similar to a supervised remote sensing imageclassification. So why is not the usual Imageclassification algorithm used from here?

The number of samples from the three testsites did not prove to be large enough to give afull estimation of the usefulness of the latterapproach. The aggregation of 25 pixels into oneproved to reduce the number of pixels in thelower resolution data too much. Nevertheless,these first results are included to give a generalidea of the approach. From each set of 25 pixelsto be aggregated, a species class signature wasproduced. This is illustrated as the nine

Table 22 Summary statistics from images produced by manual interpretation and automatic aggregation.

Site Data Average Max. class Min class Stand.dev.Angarn 1 ha man. 7 9 0 2.8

25 ha man. 7.8 9 3 1.925 ha auto. 6.9 9 4 1.5

Hejsta 1 ha man. 6.6 9 0 2.225 ha man. 7.6 9 7 0.725 ha auto. 6.6 8 5 0.9

Strömma 1 ha man. 0.6 4 0 1.325 ha man. 2.2 5 0 1.725 ha auto. 0.6 1 0 0.6

Table 23 Classification correspondence between the NFI system and classification system used in manually interpreteddata.

NFI classes R-data classesScots pine Scots pineSpruce SpruceConifer ConiferForeign conifersMixed conifer/deciduous deciduous share 35-44%Mixed conifer/deciduous deciduous share 45-64%Birch Deciduous mixed forestBeechBroad leafed deciduous species

Undetermined species composition


histograms in Figure 23 a-i. Each histogramcorresponds to the species distribution within one5x5 pixel window

To get sufficiently large samples for astatistical testing of the scaling hypothesis statedin the beginning, more data was needed. One wayto achieve this was to use low-resolution datafrom the entire Stockholm County. Compilinglow-resolution R-data together with spatiallyoverlapping NFI-reference data, gave anadditional set of samples. Here only those R-datapixels that contain at least one NFI plot have beenselected together with these NFI plots. This made

it possible to examine the correspondencebetween samples of interpreted data at twodifferent resolutions against a ground truth sampletaken from the national forest inventory. Eachpair of samples selected to overlap spatially asmuch as possible and at the same time to producea sample large enough for significance test withKolmogorov-Smirnov two-sample test. This testcan be used to determine whether twoindependent samples may have been drawn fromthe same population or from populations with thesame distribution. (Siegel, 1956)

Pixelsize

Scots pine Spruce Conifer mix Deciduous mix

1ha

n=24 n=14 n=34 n=20

25ha

n=163 n=48 n=183 n=24

0

10

20

30

40

50 %

0

10

20

30

40

50 %

0

10

20

30

40

50 %

0

10

20

30

40

50 %

0

10

20

30

40

50 %

0

10

20

30

40

50 %

0

10

20

30

40

50 %

0

10

20

30

40

50 %

Scots pineSpruce

Conif er mixDeciduous mix

Figure 24 Confusion matrices showing the class signatures in data produced by manual interpretation at two levels ofresolution 1 ha and 25 ha. Coloured bars show the relative distribution of actual tree species within each classaccording to NFI field control data.

0

0.25

0.5

a)0

0.25

0.5

b)0

0.25

0.5

c)

0

0.25

0.5

d)0

0.25

0.5

e)0

0.25

0.5

f)

0

0.25

0.5

g)0

0.25

0.5

h)0

0.25

0.5

i)

a)

d)

g)

b)

e)

h)

c)

f)

f)

Scots pine

Spruce

Conif er mix

Deciduous mix

Species unknown

Figure 23 Image showing original 1 ha pixels in Hejsta study site. Histograms show class distribution within each 5x5pixel set.


Results

Quantitative data – area productive forest landThe results form the accuracy evaluation for thevariable “area productive forest land” is given inTable 21. It shows the correspondence ofaverages from manually interpreted data with theaverages from field control data. We see from theleft columns of Table 21 that the interpreted datafor tree cover show an error of 4.1% and 3.4%respectively. We know from quality assessmentsof the NFI data that the standard error inStockholm County may be as high as 4.3%. Asnone of the interpretations deviate more than thestandard error from the NFI-data no calibrationfor systematic bias was made.

Generalized results were produced andcompared with the manually interpreted data at25ha resolution. These data are illustratedtogether with source 1ha data in Figure 22,columns a, b, and c. The difference imagesshowing deviations between the manualinterpretation and the automated aggregation aredisplayed in Figure 22, column d. Summarystatistics from the manual interpretation and theautomatic aggregation are presented in Table 22.

From the speculation that the spatialconfiguration of landscape features couldinfluence the interpretation accuracy a certaincorrelation was found with some of the testedindexes. Using simple cross tabulations thestrongest direct correlation was found with theindex of diversity. The correlation figures aregiven below both as a cross tabulation table inTable 24 and as histograms in Figure 25. Here wesee that the correctly classified pixels (deviation0) occur over areas with a flat distribution of

diversity values with some weight on the lowerdiversity values. Pixels classified with a +1 to +3class deviation from expected value show adistribution with more frequent high diversityvalues. For pixels classified as –1 and +4 thesample is only n=1 and these will therefore notbe considered in further discussions.

Qualitative data – tree speciesData on tree species for the three test sites areshown in Figure 26 together with results fromautomated categorical generalization using themajority aggregation operator. Apart from the twoaggregated data being quite different; we also seeexamples of pixels assigned to two classes by theautomatic aggregation. These undetermined pixelsare illustrated by splitting the square pixeldiagonally using colors of the tied classes for thetwo halves.

Dev Distribution of diversity index classes

-1

0%

10%

20%

30%

40%

1 2 3 4 5 6 7 8 9 10

n=1

0

0%

10%

20%

30%

40%

1 2 3 4 5 6 7 8 9 10

n=7

1

0%

10%

20%

30%

40%

1 2 3 4 5 6 7 8 9 10

n=10

2

0%

10%

20%

30%

40%

1 2 3 4 5 6 7 8 9 10

n=5

3

0%

10%

20%

30%

40%

1 2 3 4 5 6 7 8 9 10

n=3

4

0%

10%

20%

30%

40%

1 2 3 4 5 6 7 8 9 10

n=1

Figure 25 Histograms over 25 ha classifications ofproductive forest land. Each diagram shows subsetsummary of those interpreted pixels deviating (Dev.) –1, 0, +1, +2, +3 and +4 from pixels derived throughaggregation of 1ha interpreted pixels. N= Sample size.

Table 24 Cross-tabulation summary of all three sites withdegree of misclassification (columns) against diversityindex classes (rows)

-1 0 1 2 3 41 7 38 21 19 3 92 0 25 12 9 6 23 1 16 8 0 2 24 0 14 24 14 1 05 7 17 22 15 4 46 3 18 33 11 7 47 4 19 60 15 12 48 2 23 53 29 14 09 1 5 12 11 1310 0 0 5 2 13


Figure 27 exemplify the results from thecategorical generalization using the secondaggregation operator based on class signatures.For comparison purposes the previous result formthe majority based aggregation operator and themanually interpreted low-resolution data are alsodisplayed in Figure 27 c and d respectively.Aggregation based on class signatures derivedfrom confusion matrices is problematic. Due tothe small samples it is hard to reject thepossibility that these distributions could be drawnfrom the same distribution as the referencedistributions for the four species classes in aformal Kolmogorov-Smirnov test. In the examplemost pixels can be classified to belong to eitherScots pine or Spruce classes at a 0.15 confidencelevel. Only the lower left pixel can be predictedthe same way at a 0.05 confidence level. No pixelcan be given only one class assignment based inthis analysis.

The overall analysis of species distribution atthe two levels of resolutions uses only globalmeasures and are presented as histograms inFigure 28. The first two columns are the best

comparison that could be produced having onlyfully overlapping R-data and NFI-data. The rightcolumn illustrates a similar result using allavailable information for Stockholm county,disregarding whether there is spatial overlap ornot. The lower row of diagrams d-f all give thebest ground truth estimate available. Figure 28 d)shows the species distribution for the three testsites that have been interpreted at 1 ha resolution.Figure 28 a) above are directly comparable as thisshow tree species distribution from those 1 hadata that are located over the same area. Movingto the next pair of diagrams Figure 28 b) and e)we can see that the distributions are verydifferent. It is clear that the study regions arelocated at sites with more Scots pine and lessSpruce compared with the total county average.The most apparent result of these diagrams is thatclassifications made at 25 ha resolution produce afar more erroneous result compared with NFI datathan the 1 ha classification.

A formal estimation of the difference betweeninterpreted data and ground truth NFI data wasmade using a Kolmogorov-Smirnov two-sample

a) 1 ha data b) 25 ha data manualinterpretation

c) 25 ha data automatic aggregation

Angarn

Hejsta

Strömma

Scots pineSpruce

Conif er mix

Deciduous mix

Species unknown

Figure 26 Results from manual interpretation of tree species with a) 1 ha pixel resolution, b) 25 ha pixel resolution, c)automatic generalisation of a) 1 ha pixels using most frequent class as classification criterion for aggregated pixels.Pixel count resulting in ties are illustrated by splitting the output pixel into multiple colors representing the tied classes.


test. The test was able to reject the assumptionthat the distributions illustrated in Figure 28 b)and e) could be drawn from the same distributionat a 0.001 significance level. The same applied tothe sample distributions in Figure 28 c) and f).The testing of 1 ha resolution data, Figure 28 a)and d) was however not able to reject theassumption that the samples are drawn from samedistribution. Thus data at 25 ha resolution haddeviated significantly from the initial distributionat 1 ha resolution.

DiscussionThe results in Table 21 indicated that the relativedifference between the manual and automatic dataat 25ha resolution was small. Nevertheless allthree test sites illustrated in Figure 22 show a

relatively high degree of overestimation ofproductive forestland. A majority of theautomatically generalized pixels show none orminor deviations from the expected result. In 9cases areas show a difference in interpretationwith as much as 20-50% productive forestland(classes +2 through +4). Considering that 1 haand 25 ha data pixels were shown to be accuratewith only 3-4% error these large deviationsrequire some explanation. One reasonablesuggestion would be that the spatial configurationof the areas influences the interpretation (Turner,1989). The results from the cross tabulationindicate some degree of correlation between largedeviations and index of diversity, Figure 25. Itwould be interesting to see if this result could be

a)

a)

d)

g)

b)

e)

h)

c)

f)

i)b)

c) d)

Figure 27 Images showing categorical generalisation over the Hejsta study site. Original 1 ha pixels (a), are aggregatedusing Kolmogorov-Smirnov two-sample test (b). Pixel aggregation resulting in ties are illustrated by splitting the outputpixel into multiple colors representing the tied classes. For comparison, image (c) show the aggregation result using amajority operator and image (d) show the result from manual interpretation.


further validated into a general pattern. If so itcould be possible to model this kind ofinterpretation accuracy by measuring an index ofdiversity in the interpreted area. Furtherdevelopment of such neutral predictions ofinterpreter’s reliability might also be used toconstruct a spatially differentiated uncertaintysurface over a survey area.

Increasing grain size in raster data can bemade by simple aggregation of groups of nadjacent pixels into one large pixel, which isassigned a value of the most frequent class. Theeffects of this has been shown earlier by Turnerand others (1989). In essence their results showthat increased grain size will decrease number ofclasses by eliminating less frequent classes.Furthermore the aggregation will produce anincrease in indices such as dominance andcontagion (spatial correlation). The test resultsfollow these general rules. Even if no classes aretotally eliminated Figure 28 show a decrease inless frequent classes for lower resolution data andoriginally more frequent classes get an increase.

The fact that the ground truth data from theNFI only cover a sub area of the classified areadoes of course have significant effect. The

number of samples and the spatial distribution ofthe tree species is therefore of vital importance forthe interpretation of the classification results.

Although the output image in Figure 27 bproduces a result that is hard to interpret it has theadvantage of keeping the actual speciesdistribution in the original pixels as relativeproportions of tree species classes. It may be anappropriate method to preserve some informationthrough steps of data transformations.

Pilot study 2

ObjectiveThe objective of this study is to investigate if theuse of mixture classes and mixed pixels indicate ascale dependence of geographic concepts and if ascale dependency can be attributed to controllingenvironmental factors.

MethodThe second pilot study has a lot in common withthe first one. It uses the same input data from thesame study areas. In this study however, focus hasbeen shifted from analysis of the informationchange with respect to the National ForestInventory data to go deeper into studying the site-

a)

n=92

b)

n=440

c)

n=36269

Scots pine

Spruce

Conif er mix

Deciduous mix

All 1 ha interpreted pixels(above) having spatiallycorresponding field sites fromNFI (below)

All 25 ha interpreted pixels(above) having spatiallycorresponding field sites fromNFI (below)

Total interpreted data coveringStockholm county. 25 hapixels above, NFI data below

d)

n=50 n=1357

f)

n=2789

0

10

20

30

40

50%

0

10

20

30

40

50%

0

10

20

30

40

50%

0

10

20

30

40

50%

0

10

20

30

40

50%

0

10

20

30

40

50%

Figure 28 Tree species distribution histograms for spatially corresponding regions given for both interpretationresolutions. The total interpreted sample covering entire Stockholm county at 25 ha pixel resolution is given in column 3together with the total NFI field sites covering same region.


specific correspondence between manually andautomatically aggregated datasets.

Three quantitative and qualitative variables,site quality, age, and tree species were chosen forthis study. All three variables have mixtureclasses in their classification definitions. Thespatial aggregation is performed using a majoritydecision. Any class that reaches 50% or moreareal coverage within the aggregated area is takenas class label for the aggregated pixel. If no classreaches the 50% limit the mixture class is chosenfor the output. To investigate the effect that theapplication of mixture classes could have atdifferent spatial resolutions, data were spatiallygeneralized using two slightly differentaggregation operators than those used in the firstpilot study.

Crude aggregation simply sums up thedifferent class areas and applies the classificationrules on these sums. Refined aggregation uses thenon-mixture classified pixels to provideinformation on the detailed distribution of e.g.tree species. This type of class detail enhancementis somewhat ‘cartographic’ enhancing someaspect of the information.

Accordingly, the Crude aggregation operatormakes a strict evaluation of constituent classes.The Refined aggregation operator ignores themixture class in the input thus assuming that theseareas have the same class proportions as the restof the area under consideration.

Both aggregation operators were applied onthe data. This resulted in two images per variablemaking a total of six images of automaticallyaggregated data plus six manually interpretedimages, three at 25 ha resolution and three at 1haresolution.

The resulting images were compared bothwith the original images at 1 ha resolution andwith the manually interpreted images at 25 ha

resolution. The agreement analysis was performedusing the grid-based geographical informationsystem IDRISI. Crosstab analyses were made onall images to produce overall Kappa estimates forthe agreement between each assessed image andtwo reference images. Here both 1ha and 25hainterpreted data were used as reference images.

ResultsAutomatic aggregation of the original 1haresolution data resulted in two images per variablemaking six total images of automaticallyaggregated data. The overall kappa estimatesusing both resolutions of the interpreted data asreference are presented in Table 25, the samenumbers are illustrated in the bar diagrams ofFigure 29. The test of all variables against thehigh resolution 1ha data shows an overallperformance of the manual 25 hainterpretation/aggregation as Kappa values in therange 0.22-0.25, and for the automaticaggregation: 0.25-0.31.

At the same time, the automaticallyaggregated data tested against manuallyinterpreted 25ha data show much betteragreement than against the 1ha-interpreted data.

Different aggregation strategies (crude andrefined) perform differently for the three variablesanalyzed. Using 25 ha data as reference, the‘crude’ method is undoubtedly more similar to theway the interpreter assesses the site quality thanthe refined method. The other two variables showthe same tendency but not as strongly.

DiscussionThe first impression is that all aggregations at thecoarse 25 ha resolution are erroneous. However,using manually interpreted 25-hectare resolveddata as a reference, the overall agreementincreases, indicating that aggregations are

Figure 29 Diagrams showing overall kappa estimates for the automatically aggregated data using 1 ha data (leftdiagram) and 25 ha data (right diagram) as reference images. Values taken from Table 25.


producing a result similar to the humaninterpreter’s classification of the landscape.

The fact that the two methods of aggregationperform differently for the three variables in thisstudy is of major interest for the development ofscale dependent generalization methods. Theindication that some variables seem to producebetter aggregation results than others is also animportant point that needs to be furtherinvestigated.

Hierarchical land cover classification schemessuch as the US Federal Geographic DataCommittee, National Vegetation ClassificationStandard (FGDC-STD-005), and EU´s CORINEland cover classification are intended to beapplicable at a multitude of scales. This intentioncan only be fulfilled if the defined classhierarchies account for how accuratelyenvironmental factors/variables can beaggregated.

The taken approach made it possible to showthe presence of scale dependent perception of apiece of landscape using classification schemesfor three variables. It was also possible to assessscale related differences between manually andautomatically aggregated geographicalinformation.

Further analyses are needed to evaluate thestrength of the indications pointed out here. Theinfluence of spatial pattern metrics such as spatialauto-correlation of variables might for instancedirect the choice of aggregation strategy. If theseresults can be firmly verified this would supportthe idea that constraints imposed byenvironmental factors act as controlling variablesat different scales. This direction may be furtherinvestigated to find methods that can be used toevaluate the spatial domain of a given variable.

Enhanced knowledge of a variable’s spatial size isneeded to promote the development of neutralmeasurements of abstraction levels.

Further research using the presented andsimilar methods will most certainly add to theknowledge base on which to build theories for therepresentation, analysis and communication ofgeographical data across multiple scales.

Overall discussion and conclusions

To summarize these studies the tests indicate asensitivity of certain environmental variables tochanged spatial granularity. The tests also indicatethat these effects are not merely a result ofaggregation. There may also be an effect of thehuman interpretation of landscape features atdifferent resolutions. This is a multifacetedproblem and computer models need to be able toadapt to a multitude of properties of theenvironmental variables registered.

As a general interpretation the results illustratethe problem with geographical data with specificattention to the scale effect. The differencebetween the automated and manual results couldbe held as an interpretation error due to thedifficulty of correctly estimating an area of highheterogeneity. However, the result can also betaken to support a view that the automatic resultneeds to be adjusted to the perception of themanual interpreter in order to communicate theoriginal information correctly. Although actualdata values arrive at one result, this is not theresult that corresponds to the current user’sperception of this piece of landscape. The usermay translate the entirety into something morethan just the sum of the pieces. This again stressesthe importance of a truthful linkage between theuser and the automated system that takes into

Table 25 Comparison of overall Kappa values from the evaluation of aggregation results using manually interpreted25ha resolution data (first column) and 1ha resolution data (second column) as reference images.Overall Kappa estimation results

Ref.img. resolutionVariable Aggregation method 25ha 1haSite quality Autom.crude 0.51 0.31

Autom.refined 0.29 0.30Manual aggr./interp. - 0.22

Age Autom.crude 0.38 0.25Autom.refined 0.33 0.25Manual aggr./interp. - 0.24

Tree species Autom.crude 0.44 0.25Autom.refined 0.41 0.30Manual aggr./interp. - 0.25


consideration a change of context, in this casescale.

The use of mixture classes in the classificationsystem made it difficult to get full insight into theaggregation process since a mixture class reducesthe thematic granularity in those instances ofsource data. Ideally, in cases of mixed conditionsan explicit representation of this mixture would bepreferable over the mixture class label. Themixture class problem was tackled to some extentby using different aggregation methods, such asthe crude and refined procedure described in thetext.

For the continued investigations of the spatialgranularity effect, reported in chapter 7, I madetwo principal refinements of the experimentaldesign. First, I decided to use the Structural dataset from chapter 4, which does not make use ofmixture classes in the same way as R-data.Second, after testing the interpretationconsistency, it was possible to use methods ofrough classification to represent the ambiguitythat occurs in the spatial aggregation process dueto the detected interpretation inconsistensy.

The explicit representation of the landscapetype mixture for each spatial unit seemed moresuited to the task of detecting differences ininterpretation at different spatial granularity.

One main advantage of the R-data is that itwas possible to do thorough quality tests. Theresult of the quality tests in chapter 4 made itpossible to treat the R-data estimates as havingthe same accuracy as National Forest Inventoryestimates.

Acknowledgements

The work reported in this chapter was co-sponsored by AM/FM-GIS, Nordic Region andAxel Lagrelius fund for geographic research.

Rough classification and accuracy assessment • 75

ROUGH CLASSIFICATION AND ACCURACYASSESSMENT

Introduction

This chapter investigates the thematic dimension.In the previous chapter, aggregation effects led touncertainty due to indiscernibility. This chapterinvestigates the possibility to representindiscernibility that occurs due to limitedcategorical granularity. The described theory andmethods will then be used in the analysis ofchapter 7. It is also an important method for thedevelopnment of the proposed GeographicConcept Topology.

The chapter consists of a previously publishedpaper, which here will be reproduced in itsentirety thanks to the kind permission ofTaylor&Francis.

My contribution to the following paper hasprimarily been the initial idea of using rough setsin a reclassification process. In writing, mycontributions are mainly articulated in theIntroduction, Categorization, and Discussionsections, and to some lesser extent also in theExperiment and Conclusion sections

Chapter

6

76 • Rough classification and accuracy assessment


int. j. geographical information science, 2000, vol. 14, no. 5, 475±496

Research Article

Rough classi�cation and accuracy assessment

OLA AHLQVIST1 , JOHANNES KEUKELAAR2 andKARIM OUKBIR21Centre for GeoInformatics, Department of Physical Geography,Stockholm University, 106 91 Stockholm, SwedenEmail: [email protected] for GeoInformatics, Department of Numerical Analysis and ComputerScience, Royal Institute of Technology, 100 44 Stockholm, SwedenEmail: {johannes, karo}@nada.kth.se

(Received 27 May 1999; accepted 13 November 1999 )

Abstract. In search for methods to handle imprecision in geographical informa-tion this paper explores the use of rough classi;cation to represent uncertainty.Rough classi;cation is based on rough set theory, where an uncertain set isspeci;ed by giving an upper and a lower approximation. Novel measures arepresented to assess a single rough classi;cation, to compare a rough classi;cationto a crisp one and to compare two rough classi;cations. An extension to theerror matrix paradigm is also presented, both for the rough-crisp and the rough-rough cases. An experiment on vegetation and soil data demonstrates the viabilityof rough classi;cation, comparing two incompatible vegetation classi;cationscovering the same area. The potential uses of rough sets and rough classi;cationare discussed and it is suggested that this approach should be further investigatedas it can be used in a range of applications within geographic information sciencefrom data acquisition and analysis to metadata organization.

1. IntroductionGeneralization of information into groups is a common step in traditional as

well as computerized geographical analysis. Classi;cation into prede;ned categoriesis in many cases an important step to perform geographical analysis in order tomeasure or describe a phenomenon of interest. No matter what strategy we employfor the generalization procedure, this will lead to a loss of detail in one or moredimensions. This imprecision has to be treated in a controlled manner, and the issueof the representation of uncertainty in spatial data has become more and more of aconcern (Goodchild et al. 1992, Burrough and Frank 1996 ). One of the reasons forthis is that a current goal is to increase geographical information system interoperabi-lity. This need may partly be met by eŒorts to arrive at various standards such asthe work by the Open GIS Consortium, CEN/TC287 and ISO/TC211 or researchfocused on the development of federated database systems (Devogele et al. 1998 ).Still there are a number of issues that remain to be solved before any of theseapproaches will become feasible implementations.

In this paper we focus on one of these underlying problems, that is, semantic

International Journal of Geographical Information ScienceISSN 1365-8816 print/ISSN 1362-3087 online © 2000 Taylor & Francis Ltd

http://www.tandf.co.uk/journals


O. Ahlqvist et al.476

imprecision as a result of categorization, exempli;ed by spatial vegetation and soilinformation. Building on existing theories of rough sets (Pawlak 1982 ), in §2 wediscuss the causes of semantic heterogeneity in geographical data, fuzzy sets as onemethod to handle this heterogeneity and the idea of using rough sets and roughclasses as an alternative. In §3 we develop a number of quality measures as well asmethods for uncertainty assessment useful when comparing layers of roughly classi-;ed geographical information. We also discuss their relation to commonly usedmethods for accuracy assessment in remote sensing and geographical informationsystems, such as overall classi;cation accuracy measures and confusion matrices.Finally in §5 these ideas are taken together in an experiment on geographic data todemonstrate our ;ndings in a reclassi;cation task followed by an accuracy assessmenton the reclassi;ed data.

2. CategorizationTo make geographical analyses and presentations using computerized systems

we need to assemble information about geographic phenomena in a quanti;ablemanner. Geographical data as geographic entities are quanti;ed in terms of temporal,spatial and thematic dimensions (Lanter and Veregin 1992 ) and this complexitymakes it practically impossible to measure all aspects on continuous scales. So, theassemblage of geographical data normally implies some kind of generalization of thebasic dimensions. This generalization is often done by ;xing one dimension, control-ling another and measuring the third (Sinton 1978 ). The two main geographicaldata modeling paradigms used by contemporary geographical information systemscould be looked upon as two cases of this general assumption. Both paradigms ;xtime. The spatial dimension is either controlled by a regular grid or measured as theposition of 1-, 2- or 3-dimensional objects. Finally, the thematic dimension is eithermeasured as ;eld values or controlled by grouping objects into predetermined classes.

A geographical information category can be termed an ‘abstraction’ meaning asimultaneous focus on important content, structure and process while temporarilyignoring certain details, rather than eliminating details (Nyerges 1991a). From thisit follows that abstractions, that apparently form in our minds, can not be regardedas crisply de;ned and delimited objects. We will not go any further into the increas-ingly large body of work on the theoretical basis for how categories are formed. Forthe purposes of this work we conclude that apparently there is a complex backgroundfor the formation of entities which are represented in geographic databases. Toincrease interoperability in terms of a better conceptual match between diŒerentdatabases we need tools to handle the conceptual and literal heterogeneity thatoccurs at the higher levels of geographic information modeling (Raper andLivingstone 1995, Bishr 1998 ). If we separate this heterogeneity into aspects ofsemantic, schematic and syntactic heterogeneity this article deals with methods tohandle schematic heterogeneity, i.e. where classi;cation and hierarchical organiz-ations of real world categories vary across disciplines or contexts (Bishr 1998 ).

Turning now to the issue of how to represent and logically handle semanticheterogeneity, object oriented methods have been proposed as a viable alternative.For example Raper and Livingstone (1995 ) demonstrated how an object orientedgeomorphologic spatial model enabled the representation of entities identi;ed as aresult of a categorization procedure, as well as providing the means to link processmodels to data models. Openshaw (1996 ) recently noted the evident linkage betweengeographical sciences with a multitude of linguistic knowledge expressions and the


Rough classi&cation and accuracy assessment 477

theory of fuzzy sets. Other authors argue in the opposite direction employing discretemodels but using other organizing elements such as concept neighborhoods (Freksaand Barkowsky 1996, Bishr 1998 ). In the next section we will brieKy go throughfuzzy set theory, its current applications and the reasons we see for developing analternative method to deal with imprecision and semantic heterogeneity.

2.1. Fuzzy setsWhen considering uncertainty representation of geographic data in a digital

environment we often ;nd implementations in a ;eld (raster) rather than an object(vector) environment. Fuzzy set theory (Zadeh 1965 ) is an extension of classical settheory. In fuzzy sets each data point has an associated membership value, whichexpresses the degree of membership of the data point in a particular set. The mappingof data points to degrees of membership is called the membership function. Thetheory is well known and contemporary geographic information systems usuallyinclude methods to handle data layers with attribute vagueness using fuzzy settheory. The fuzzy representation provides the means to express partial membership,not in the sense of a probabilistic attribute but in the form of an admission ofpossibility (Burrough and McDonnell 1998 ). Thus, it can be used to representuncertainty about class membership.

So far, most implementations of fuzzy sets in geography have been focused oncharacterizing attribute ambiguity in data. However, during the last few years theissue of spatial vagueness has also been approached (Wang and Hall 1996, Burroughand Frank 1996, Brown 1998 ). Molenaar (1996 ) discusses fuzzy spatial objects usinga semantic formalism but he also expresses some doubt as to whether this can bereadily handled by existing tools. One major obstacle to the diŒusion of fuzzy setbased uncertainty handling is that the necessary membership function can be veryhard to determine. In those cases, we anticipate that an alternative approach basedon rough sets is more appropriate, since there will be no need to determine amembership function, or even resort to an arbitrary one. We might for instance endup with the following reclassi;cation situation: given a categorical map with a mappolygon labelled A, translate this into another classi;cation system, given the alter-native of assigning a label 4 or 6 according to the reclassi;cation rule. Data in realsituations are often of this discrete nature and membership values may be hard todetermine.

2.2. Rough setsPawlak (1982 ) initially introduced the idea of rough sets but links between this

theory and spatial applications have not until recently been elaborated. Rough sets,like fuzzy sets, are an extension of standard mathematical sets. In this extension anuncertain set is represented by its upper and lower approximation. If the data pointis in the lower approximation, we are sure that it is in the set. If it is not in theupper approximation, we are sure that it is not in the set. The spatial representationof the rough set can be in the form of pixels or entire polygons which are given oneof three possible values: not a member of the set (neither in lower nor upperapproximation), maybe a member of the set (in the upper but not in the lowerapproximation) and absolutely a member of the set (in the lower approximation).Thus rough sets may also be used to represent uncertainty about class membership.

The use of rough set theory has not to any large extent been treated in the GISliterature but recent work by Schneider (1997 ) and Worboys (1998a, 1998b) has



shown clear advantages of using a rough set approach in dealing with imprecisegeospatial data. Schneider (1997 ) discusses rough sets in ROSE (Guting et al. 1995 ),in much the same way as we have implemented them below, but does not discussclassi;cations and related topics, focusing much more on the formal modeling aspects.Stell and Worboys (1998 ) reported on a formal approach to multi-resolution inspatial data handling using an approach similar to rough and fuzzy set theories.Their most recent ;ndings are reported in (Worboys 1998a, 1998b) where rough setsare used to handle imprecision due to ;nite spatial or semantic resolution. Also thework of Cohn and Gotts (1996 ) on spatial relations between regions with indetermin-ate boundaries has much in common with rough sets.

3. Rough classi�cationThis section builds on rough sets to introduce the idea of rough classi;cation.

3.1. Rough setsA rough set is a pair (X , X ) of standard sets, the lower approximation and the

upper approximation . In the representation we have chosen, X k X. The meaning ofthese two sets is that if a data point lies in X , we are sure that the point is in therough set, if a data point lies in XÕ X , we are unsure whether or not the point isin the rough set, and if a data point is outside X, we are sure that the point is notin the rough set. These sets can contain either individual points, or continuous areas;we will use the term ‘area’ below. We will often call XÕ X the area of uncertaintyof a rough set. As opposed to rough sets, standard sets are often called crisp, a termthat also applies to a rough set where X 5 X, which implies an empty area ofuncertainty. Conversely, a rough set with an empty lower approximation and a non-empty area of uncertainty can be called completely rough.

According to Duntsch (1997 ), the basic set operations, union, intersection andnegation, can easily be extended to rough sets. Union and intersection as follows:(X , X )< (Y , Y )5 (X <Y , X<Y ), (X , X )> (Y , Y )5 (X >Y , X>Y ). Negation can beextended in two ways: (X , X )*5 (Õ X, Õ X ) and (X , X )+ 5 (Õ X , Õ X ).

3.2. Rough classi&cationA standard, or crisp classi;cation C consists of a number of classes X

i, i ×I,

each of which is a crisp set. I will be called the index set of a classi;cation. For acrisp classi;cation, YiYjÞ i :X

i>X

j5 ù : the classes are pairwise disjoint. We will

de;ne the cardinality of C as |C |5 |<Xi|, which, in this case, is equal to S |X

i|.

Likewise, a rough classi®cation R consists of a number of rough classes Xi5 (X

i,

Xi), i ×I, each of which is a rough set. In a rough classi;cation, however, we would

like to be able to express our uncertainty about which class, if any, a certain areabelongs to. Therefore, instead of pairwise disjointness, we impose only the followingrestriction: YiYj Þ i :X

i>X

j5 h. Of course, since X k X, this implies that

YiYj Þ i :Xi>X

j5 ù. We do not impose such a restriction on pairs of upper approxi-

mations, though, so there may be areas where two or more upper approximationsoverlap, but none where any lower approximation overlaps with anything else thanthe corresponding upper approximation. I will be called the index set of the classi;ca-tion. We will de;ne |R |5 |<X

i|, which, in this case, is not equal to S |X

i|. See ;gure 1

for an example of a rough classi;cation.In this way, we have expressed two fundamental types of uncertainty:



Figure 1. A rough classi;cation.

E Uncertainty of spatial location: If, for a rough class (X ,X ), X 5 X, uncertaintyabout the spatial location of (part of ) that class has been expressed.

E Uncertainty of attribute value: If a certain area is assigned to the area ofuncertainty of more than one class, it is no longer certain to which class thatarea belongs. Thus, uncertainty of attribute value has been expressed.

Given a subset C of I, the index set of a rough classi;cation R, we de;ne itsrough component as follows: R

C5 >

i×C(X

iÕ X

i)Õ <

i1CX

i. A rough component is the

area that is in all the areas of uncertainty of the classes whose index is in C, and innone of the areas of uncertainty of any of the classes of the rough classi;cationwhose indices are not in C. See ;gure 1 for an example rough component. The roughclasses (X

i,X

i) where i ×C we will call the founders of the rough component R

C.

Rough components are de;ned such that they are always pairwise disjoint;furthermore, the union of all rough components of a classi;cation and the lowerapproximations of all its rough classes is exactly the area covered by the classi;cation.

4. Quality measures of classi�cationsIn this section we will discuss the various ways that we can measure the uncer-

tainty in rough and crisp classi;cations. Uncertainty in geospatial data is, as westated above, divided into three major dimensions and since these dimensions areoften dependent on each other, it may not be useful to explore thematic and spatialuncertainty independently (Lanter and Veregin 1992 ). The measures we will examinein the following discussion deal both with the thematic and the spatial aspect ofuncertainty. This section is split up into four subsections, depending on what webase our measures. We can base our measures on a single rough classi;cation, onthe comparison of two crisp classi;cations, on the comparison of a rough and acrisp classi;cation, or on the comparison of two rough classi;cations.

When comparing two classi;cations, we will sometimes assume that one of theclassi;cations is the reference classi;cation, containing our baseline data, whereasthe other one is the assessed classi;cation, the one whose quality we are trying tomeasure.



4.1. Single rough classi&cationsWe de;ne two measures that apply to single rough classi;cations: the overlap

measure and the overall crispness measure. In each of these de;nitions, we will betalking about a rough classi;cation R, consisting of the rough classes (X

i,X

i), i ×I.

4.1.1. T he overlap measureThe overlap measure can be computed as follows: M

o5 (S |X

i|Õ |R |)/|R |. Since

all the area covered by the rough classi;cation is included at least once in S |Xi|,

Mo> 0. If M

o5 0, there is no intersection between any of the rough classes. If M

o>0,

Zi Zj Þ i : (Xi>X

j)Þ ù (since the lower approximations are pairwise disjoint, this

area of intersection must be in the area of uncertainty of said rough classes). Mocan

grow if either the total area of intersection grows or (parts of ) this area are sharedby more rough classes. The upper bound for M

ois the number of classes in the

rough classi;cation minus one, as the maximum overlap occurs when all classescontain the whole area of the classi;cation. For the example in ;gure 1, M

o5

((141 91 10)1 (141 91 4)Õ (141 91 101 141 4))/(141 91 101 141 4)#0.17.Thus, the overlap measure measures the amount of overlap in the rough classi-

;cation. This can be said to measure uncertainty in attribute value.

4.1.2. T he overall crispness measureThe overall crispness measure can be computed as follows: M

c5 S |X

i|/|R |.

Obviously, Mc> 0, and, since the lower approximations are pairwise disjoint, M

c< 1.

If Mc5 0, all the lower approximations of the classes of the rough classi;cation are

empty, making the classi;cation completely rough. If, on the other hand, Mc5 1,

the areas of uncertainty of the classes of the rough classi;cation are empty, andthe classi;cation is in fact crisp. For the example in ;gure 1, M

c5 (101 4)/

(141 91 101 141 4)#0.27. This can easily be extended to a class based measure(one measure for each rough class), instead of an overall measure.

The crispness measure measures how much of the total area of the classi;cationis assigned to lower approximations. This can be said to measure certainty in spatiallocation.

4.2. Comparing two crisp classi&cationsWhen comparing two crisp classi;cations, the standard ;rst step is to compose

an error matrix (Congalton 1991 ), so we start oŒ with a description of this paradigm.Congalton (1991 ) also describes the various accuracy measures that can be computedfrom such a matrix, which we will brieKy review.

4.2.1. Error matrixWe will consider the case where two crisp classi;cations A and B are being

compared. A consists of the classes Xi, i ×I, while B consists of the classes Y

j, j ×J.

Much of the following will only be valid if |I |5 |J | ( let us de;ne N5 |I |), and willprobably only make sense if, in fact, I5 J.

The error matrix is now de;ned as an N by N matrix with elements xi,j having

values xi,j 5 |X

i>Y

j|. We will also assume that <

iX

i5 <

jY

j, i.e. that the two classi-

;cations cover exactly the same area. Because of that and the fact that all the Xi

are pairwise disjoint, and all the Yjare also pairwise disjoint, the error matrix has

the following three properties:



E The row-sum property: Sjxi,j 5 |X

i|, i.e. the sum of the elements in a single row

of the matrix is the area of the corresponding class of A.E The column-sum property: S

ix

i,j 5 |Yj|, i.e. the sum of the elements in a single

column of the matrix is the area of the corresponding class of B.E The total-sum property: S

i,j xi,j 5 |A |, i.e. the sum of all the elements of the

matrix is the area covered by either of the classi;cations.

4.2.2. Some commonly used measuresOverall accuracy is de;ned as the total match between the two classi;cations

divided by the total area of the classi;cations, i.e.:

Ao

5 �i

xi,i/|A | (1)

This can be split up according to the classes in the classi;cations, but in thatcase one is left with the choice of dividing by the column total or by the row total.Assuming that the reference classi;cation is associated with the columns of thematrix, dividing by the column total gives the omission error, also called producer ’saccuracy. Dividing by the row total gives the commission error, also called user ’saccuracy. In other ‘words’:

Oj5 x

j,j/�i

xi,j , and C

i5 x

i,i/�j

xi,j (2)

A more complicated statistical measure, which makes various assumptions aboutthe input data, is KÃ (Congalton 1991 ):

KÃ 5

|A |�i

xi,i

Õ �iA�

jx

i,j ·�j

xj,iB

|A |2Õ �iA�

j

xi,j·�

j

xj,iB

(3)

Please note that, apart from the naming of omission and commission errors,these three measures are completely symmetrical with respect to the columns androws of the matrix, i.e. transposing the matrix, by switching A and B, will have noeŒect on the computed value.

4.3. Comparing rough and crisp classi&cationsWe ;rst discuss how to extend error matrices to comparing rough and crisp

classi;cations, and then we discuss the various measures we can compute from sucha matrix. We also introduce a method by which we may apply any measure thatcan be applied to the comparison of two crisp classi;cations.

4.3.1. Error matrix extensionWhen comparing a rough classi;cation, R, with a crisp one, C, such as in ;gure 2,

we can start with constructing an extended error matrix, see table 1 for the example.For typographic reasons, we choose to associate the crisp classi;cation with therows of the matrix, and the rough classi;cation with the columns of the matrix,independently of which one is the reference classi;cation.

R consists of the rough classes (Xj,X

j), j ×J and C consists of the crisp classes

Yi, i ×I. Again, we will assume |I |5 |J | (and hope for I5 J ) and de;ne N5 |I |. We

will also assume again that <Xj5 <Y

i. The matrix will be of size 2NÖN, having



Figure 2. Comparing a rough and a crisp classi;cation.

Table 1. An extended error matrix.

A AÕ A B BÕB

A 10 15 0 8B 0 8 4 15

elements xi,k . The de;nition of x

i,k depends on k. If k is odd, xi,2jÕ1 5 x

i,j+ 5 |Xj>Y

i|.

Otherwise, k is even, and xi,2j 5 x

i,j? 5 |(XjÕ X

j)>Y

i|. When we talk about the

diagonal of this matrix, we will mean the elements xi,i+

and the elements xi,i?.

The column-sum property still holds: The sum of all the elements in a column isexactly the area of the corresponding part ( lower approximation or area of uncer-tainty, as the case may be) of the corresponding rough class. However, the row-sumand total-sum properties do not hold any longer. If we compute the sum of a rowof the matrix, S

kx

i,k, we do not, as we would like, get |<

jX

j>Y

i|, but we get

Sj|X

j>Y

i|, thus counting all the overlapped areas multiple times, and similarly for

the total sum of all the matrix elements.

4.3.2. T he relative crispness measureAssuming that the crisp classi;cation is the reference one, the relative crispness

measure compares the crispness of a rough classi;cation in the areas where itcorresponds to the crisp classi;cation (where it is ‘right’) to its crispness in areaswhere it does not correspond to the crisp classi;cation (where it is ‘wrong’).

If, on the other hand, the crisp classi;cation is the one that is being assessed, therelative crispness measure measures whether the crisp classi;cation is more correctin areas where the rough classi;cation is certain than in areas where it is uncertain.



Given the following two de;nitions:

D5G �xi,i+

�(xi,i+ 1 x

i,i? )If�(x

i,i+ 1 xi,i? ) Þ 0.

0 Otherwise.

(4)

O5G �iÞjx

i,j+�iÞj

(xi,j+ 1 x

i,j? )If�iÞj

(xi,j+ 1 x

i,j? )Þ 0.

0 Otherwise.

(5)

D measures the crispness in the diagonal elements of the matrix, whereas O measuresthe crispness in the oŒ-diagonal elements. The crispness is not measured in a waycompatible with M

c, since overlapping areas of uncertainty are counted twice.

The relative crispness measure can be computed as follows: Mr5 DÕ O. Since

0< D< 1 and 0< O< 1, Õ 1< Mr< 1. If M

r5 Õ1, that means that there is no crispness

on the diagonal of the matrix, whereas there is perfect crispness oŒ the diagonal ofthe matrix. Values between Õ 1 and 0 mean that there is more crispness oŒ thediagonal than there is on the diagonal. M

r5 0 means that there is no diŒerence in

crispness on or oŒ the diagonal. Values between 0 and 1 mean that there is morecrispness on the diagonal than there is oŒ the diagonal. If M

r5 1, there is no crispness

oŒ the diagonal, while there is perfect crispness on the diagonal. Higher values ofM

rare ‘better’ than lower values. For the example error matrix in table 1, M

r#0.31,

which tells us that the diagonal elements are signi;cantly more crisp than theoŒ-diagonal elements.

4.3.3. Overall accuracyAs stated above for the case of two crisp classi;cations, overall accuracy is the

total correct area divided by the total area. This can be extended to the case of onecrisp and one rough classi;cation. However, we will not get one single answer, but,;ttingly, an upper and a lower bound.

The upper and lower bounds for overall accuracy are as follows: S xi,i+

/

|R |< Ao< S (x

i,i+ 1 xi,i?)/|R |. We can be sure that we are not counting anything

twice, since we only use elements from the diagonal of the matrix. The areascorresponding to these diagonal elements are all subsets of diŒerent classes of C,and those are disjoint. For the example extended error matrix in table 1, 0.27<A

o< 0.86.The way to arrive at the bounds given above, is to think of the ways the rough

classi;cation can be converted to a crisp one. If the assumption is made that it ispermissible not to assign (parts of ) areas of uncertainty to any crisp class at all, thebounds given above are valid and tight. If this assumption is not valid, we will haveto resort to the procedure outlined below.

4.3.4. Error matrix parameterizationSince the row-sum and total-sum properties do not hold for extended error

matrices, it is hard to directly apply traditional measures that involve elementsbeyond those on the diagonal. For this kind of global measures, we will convert therough classi;cation into a crisp one. However, doing so in a consistent fashion is



not straightforward, since there may be overlap in the areas of uncertainty, and thisinformation is missing from the matrix. So, we will have to go back to the roughclassi;cation, R, and convert this rough classi;cation to a virtual crisp classi;cation,V , on which (together with the crisp classi;cation, C ) we will base our crisp errormatrix. V will have the same index set as R, namely J. So there will be a one to onecorrespondence between the classes (X

j,X

j) of R and the classes Z

jof V .

Since there is no unique way to convert a rough classi;cation to a crisp one, andthe value of the measure that we want to compute depends on exactly how we dothis, we will have to parameterize our matrix depending on which classes of V , ifany, we assign certain component areas. The component areas will be Y

i>R

C, the

intersections of the classes of C with the rough components of R ; in the situationin ;gure 2 these are the white areas that have numbers in them. We do not needto parameterize on the lower approximations of the classes of R, since we haveno choice where to assign them; they will become zero-th order terms in ourparameterized matrix.

Given the de;nition Cj5 {R

C| j ×C}, the N by N parameterized error matrix has

elements pi,j given by:

pi,j 5 |Y

i>X

j|1 �

K×Cj

(xi,j,K · |Yi

>K |) (6)

with the following two restrictions imposed upon the parameters:

0< xi,j,K < 1 and�

jx

i,j,K < 1 (7)

This last inequality assumes that it is permissible not to assign some of the areacovered by the areas of uncertainty to any class of V . If this is not permissible, theinequality becomes an equality: S

jx

i,j,K 5 1.If we apply this to the example in ;gure 2, we get the parameterized error matrix

in table 2, although we have simpli;ed the indices somewhat. The additional restric-tions that we should impose are x2 1 x3 < 1 and x6 1 x7 < 1.

Now we can apply any accuracy measure to this new matrix, and we will geta formula that gives us the value of the accuracy measure depending on exactlyhow we treat the areas of uncertainty in the rough classi;cation. Theoretically, wecould then maximize and minimize this equation within the bounds imposed onthe parameters, and obtain the maximum and minimum values for the accuracymeasure applied.

Let us look at an example. Omission and commission errors are the diagonalelements of table 2 divided by, respectively, the sum of the column and the sum ofthe row. In any parameterized error matrix, this comes down to a quotient of twolinear expressions, combined with the appropriate restrictions mentioned above.

For another example, let us look at KÃ . Each of the xi,j is a linear expression in

an unknown number of parameters in our case, the same goes for Sjx

i,j and Sjx

j,i .

Table 2. A parameterized error matrix.

A B

A 10111x11 4x2 4x31 4x4B 5x61 3x8 4110x51 5x7



|A |, meanwhile, is perhaps best translated as |C |, which is constant. Thus, the wholecomes down to the quotient of two quadratic expressions which should be optimizedover a hypercube limited by some hyperplanes.

For both these examples, at least local minima and maxima can be found withsoftware such as Matlab. Since ;nding the extremes of a quadratic expression overan N-dimensional box is known to be NP-complete (Garey and Johnson 1979 ),optimizing the expression for KÃ is at least that hard. We have been unable to ;ndany mention about the complexity of ;nding the extremes of a quotient of two linearexpressions, so have no such information about omission and commission errors.

4.4. Comparing two rough classi®cationsIn the rest of this subsection, we will assume that we are comparing the rough

classi;cation R, consisting of (Xi,X

i), i ×I, with S, consisting of (Y

j, Y

j), j ×J. We

will make the usual assumptions |I |5 |J | and <Xi5 <Y

jand the de;nition N5 |I |.

We ;rst discuss the error matrix extension for this case and its properties, thenexplore measures for it, and ;nally parameterize it to apply conventional measures.

4.4.1. Error matrix extensionWhen comparing two rough classi;cations with each other, like in ;gure 3, we

will use the two-dimensionally extended error matrix, or 2Deem. The 2Deem for theexample from ;gure 3 is given in table 3.

The matrix will be of size 2NÖ2N, having elements xk,l. The de;nition of x

k,ldepends on both k and l. If both are odd, x2iÕ1,2jÕ1 5 x

i+ ,j+ 5 |Xi>Y

j|. If k is odd,

but l is even, x2iÕ1,2j 5 xi+ ,j? 5 |X

i>(Y

jÕ Y

j)|. Symmetrically, if k is even, but l is

odd, x2i,2jÕ1 5 xi?,j+ 5 |(X

iÕ X

i)>Y

j|. Finally, if both are even, x2i,2j 5 x

i?,j? 5

| (XiÕ X

i)>(Y

jÕ Y

j) |. When we talk about the diagonal of this matrix, we will mean

the four sets of elements xi+ ,i+ , x

i+ ,i? , xi?,i+ and x

i?,i? .For a 2Deem, none of the row-sum, column-sum and total-sum properties hold

Figure 3. Comparing two rough classi;cations.



Table 3. A two-dimensionally extended error matrix.

R , S � A AÕ A B BÕ B

A 7 3 0 0

AÕ A 3 12 0 5B 0 0 2 2

BÕ B 0 8 2 17

any longer, since neither the parts of classes associated with the individual columnsnor the parts of classes associated with the individual rows are pairwise disjointany more.

4.4.2. Overall accuracyFor this case, as well, an upper and a lower bound for overall accuracy can

easily be computed.

4.4.3. Error matrix parameterizationTo apply conventional crisp measures to this case, we can use an approach

similar to the one we used for the rough-crisp case in §4.3.4.; constructing a para-meterized error matrix. This time, we will parameterize on the intersections of acomponent of R with a component of S. We will repeat the de;nition C

i5 {R

C|i ×C}

and add Dj5 {S

C| j ×C}. The elements of the NÖN parameterized error matrix are

now given by:

pi,j 5 X

i>Y

j1 �

K×Ci

(xi,j,K,Yj

· |K>Yj|)1 �

L×Dj

(xi,j,Xi ,L

· |Xi>L |)1

�K ×Ci

�L×Dj

(xi,j,K,L · |K>L |) (8)

Of course, still

0< xi,j,K,L < 1 and YK, L :�

i,jx

i,j,K,L < 1 (9)

(or make that ...5 1 if you do not believe in not assigning parts of areas of uncertaintyto any class). Like in §4.3.4., this parameterized error matrix is linear in its parameters,so the conclusions we have drawn there apply to this case as well.

The only diŒerence is in the number of parameters. The maximum possiblenumber of parameters is much larger for this case (on the order of 22N rather than2N , a quadratic diŒerence). In practice, however, many, if not most of these parameters(in either case) will only ever occur in the matrix multiplied by zero, and can thusbe dropped from calculations. It is harder to estimate this practical number ofparameters, but it seems likely that it will be larger in this case than in the casein §4.3.4.

5. ExperimentThe goal of our experiment was to use the idea of rough classi;cations to compare

two vegetation maps, covering the same area of 1.8 by 2.3 km in Stockholm County,just north of Stockholm. The two maps had been produced for nature preservation



tasks and diŒerent classi;cation schemes were used to delineate vegetation categorieson a categorical map sheet in 1:15 000 scale. The quality in terms of spatial andattribute accuracy of these maps are not known but for the purpose of this experimentthat is of less importance, since the main idea is to demonstrate the technique ofusing rough classi;cations.

5.1. Experiment descriptionIn our experiment we considered the two layers as two diŒerent representations

of the same area. We started out with two vegetation maps which provided twocrisp vegetation classi;cation layers called veg9 and veg35. Building on vegetationconcepts that were introduced by PaÃhlsson (1972 ), veg9 was classi;ed using moistureand nutrient status as the classi;cation basis giving nine diŒerent vegetation classesfor the experiment area. Veg35 used a Nordic classi;cation system described byPaÃhlsson (1995 ), giving 35 diŒerent vegetation classes for the experiment area. Theclass descriptions are given in tables 4 and 7. To compare the two, we needed toreclassify one of them; for obvious reasons, we picked veg35. This can be seen as ageneralization operation where we reduce the number of classes from 35 to 9 andsimultaneously change the classi;cation system.

Reclassi;cation of veg35 into the classi;cation scheme used for veg9 would be astraightforward task if a one to one or many to one correspondence between thetwo classi;cation systems existed. Since this is not the case, we used the idea ofrough classi;cation to represent the uncertainty in the reclassi;cation operation.Rules to reclassify from the classi;cation system given by PaÃhlsson (1995 ) to theone given by PaÃhlsson (1972 ) were constructed using both the guidelines given in(PaÃhlsson 1995 ) and our own knowledge about the association between the diŒerentvegetation classes. The re-classi;cation rules are given in table 7. The reclassi;cationrules can also be constructed using decision tables obtained through training datasets as described in (Skowron and Grzymala-Busse 1993 ).

Since the reclassi;cation of veg35 into rough classes introduces a certain degreeof uncertainty it will be of interest to see if this uncertainty can be resolved by usingadditional information about the area. Given that the veg9 classi;cation system usesnutrition and moisture properties we could argue that more information on theseproperties could give evidence to resolve some of the areas of uncertainty. We decidedto use soil information as proxy for the moisture component and provided thisevidence in the form of a digitized soil map covering the experiment area. Thisrequired a reclassi;cation from the soil map classes (table 5) into rough veg9 classes.

Table 4. Classes in veg9.

Id Description

1 bare rock2 dry heather3 mesic heather4 wet heather5 swamp6 dry meadow7 mesic meadow8 wet meadow9 steppe alike



Table 5. Soil classes.

Id Description

1 washed moraine2 outcrop3 moraine4 sandy sediments5 postglacial clay6 clay sediments7 muddy clay8 mud peat

The rules for rough classi;cation of soil classes in the soil map was entirely done byour own knowledge about the association between soil classes and vegetation classesin the veg9 classi;cation scheme. These classi;cation rules are given in table 6.

We proceeded as shown with non-dashed lines in ;gure 4, where single boxes arecrisp layers and double boxes are rough layers. We took veg35 and the soils classi;ca-tions and reclassi;ed them into two separate rough classi;cations using the nineclasses of veg9. Having this data, we produced some statistics about them and abouttheir match with veg9, which are presented in §5.2.

5.1.1. Merging two rough classi&cationsThe problem under consideration in this section is, given two rough classi;cations

of the same area, with the same classi;cation scheme, how do we combine them?For example, if, with respect to a speci;c piece of land, the one classi;cation claimsthat it is either forest or city, while the other classi;cation claims that it is eitherlake or forest?

Let us call the input classi;cations A and B, and let us say that we are lookingat these classi;cations of point x. We will call the set of classes of A of which x is amember K; for B, we will call that set L . The output classi;cation we will call C,and the set of classes of which x is a member in C we will call M.

Note that in this representation we can not diŒerentiate between if x is in thearea of uncertainty of exactly one class or in the lower approximation of exactly oneclass. This is acceptable, because we are looking at complete classi;cations, and inthat case x being in the area of uncertainty of exactly one class does not really makesense. This is true for M, the set of output classes, too. If M contains more than one

Table 6. Rough reclassi;cation from soils to veg9.

Soil class veg9 rough classes

1 2, 62 2, 63 3, 74 3, 75 3, 76 4, 87 4, 88 4, 8



Table 7. Veg35 classes and rough reclassi;cation to veg9.

Id Description Veg9 rough classes

1 pine forest type 1 22 pine forest type 2 23 spruce forest type 1 2, 34 spruce forest type 2 4, 85 spruce forest type 3 3, 76 spruce forest type 4 87 spruce forest type 5 48 mixed conifer forest type 1 29 mixed conifer forest type 2 2, 310 mixed conifer forest type 3 3, 711 mixed conifer forest type 4 none12 broad leafed deciduous forest type 1 713 broad leafed deciduous forest type 2 814 broad leafed deciduous forest type 3 715 alder forest type 1 816 alder forest type 2 817 birch/aspen forest type 1 2, 318 birch/aspen forest type 2 719 birch/aspen forest type 3 2, 320 brushwood 721 mixed forest type 1 2, 322 mixed forest type 2 723 mixed forest type 3 824 early successional forest none25 clear cuts/non determined none26 alder scrub 827 geoliteral shore vegetation none28 subliteral shore vegetation none29 dry meadow type 1 630 dry meadow type 2 6, 731 meadow type 1 6, 732 meadow type 2 6, 733 meadow type 3 834 meadow type 4 3, 735 wet meadow type 8

class, x will be in the areas of uncertainty of all those classes. If M contains onlyone class, x will be in the lower approximation of that class.

There are two cases that have to be considered separately. Either the twoclassi;cations contradict each other (i.e. K>L 5 ù ) or they do not (i.e. K>L Þ ù ). Itis quite likely that any two classi;cations will contradict each other in some points,if only in sliver areas along the not-quite-equal borders of classes.

Let us consider the non-contradictory case ;rst. In this case, we will apply therule that M5 K>L . For a class c, if c 1 K or ck L , that means that that classi;cationsays that it is certain that x is not in c. So only if c ×K and c ×L can we concludethat c ×M.

But what about the contradictory case? Here, we will adopt the reasoning that,even if they can not both be right, one of them probably is. Since we do not knowwhich one, we will just say that M5 K<L . So if classi;cation A says that our point



Figure 4. Process graph.

X is forest, while B says that it is really water, we will conclude that it must beeither forest or water.

5.1.2. Implementation detailsThis work was carried out using a custom-built system, ROVer (Rough Object

Visualizer), shown in the screenshot in ;gure 5. It is based on ROSE, the RObustSystem Extension, a library of spatial operators based on exact arithmetic, speciallydeveloped for integration into a spatial database system (Guting et al. 1995 ), withan extension to ROSE that deals with rough sets. The user interface was writtenwith gtk1 , a small, e�cient and Kexible GUI library for X11. Perl has been embeddedas a programming language. ROVer can display any number of overlaid rough andcrisp classes and classi;cations, even transparently, using a highly dynamic userinterface, and is driven by Perl scripts, which call C routines in ROSE to do theactual computations. What is not apparent from the screen-shot is that as the usermoves the mouse over the geometry shown, not only do the coordinates update, butalso the little icons in the legend change to indicate which classes the mouse pointeris in (showing a ‘1 ’ if the mouse is in the lower approximation, a ‘?’ if it is in thearea of uncertainty, and a ‘Õ ’ if it is outside the rough class).

5.2. ResultsThe reclassi;ed veg35 layer has an overlap measure of 1.058 and a crispness

measure of 0.308. That means that about 30% of the area is certain, and the remaining70% is uncertain. On that uncertain area, there is overlap (overlap measure is greaterthan zero). In fact, the overlap covers a little bit more than the whole classi;cation;there must, in other words, be areas which are in at least three areas of uncertainty.When comparing the reclassi;ed veg35 layer with the veg9 layer, we get an overallaccuracy in the range of [0.039, 0.461], not very good.

Themerged classi;cation has an overlap measure of 1.981 and a crispness measureof 0.168. That means that the certain area has almost been halved as compared to



Figure 5. ROVer screenshot.

veg35, from about 30% to about 17%. About 83% of the area is uncertain, and onthose 83% there is an overlap that covers almost twice the whole classi;cation. Since1.981>0.83Ö2, there must be areas where there is a triple overlap; in other words,there must be areas which are in the area of uncertainty of at least four classes. Theoverall accuracy when comparing this classi;cation to veg9 is in the range of[0.031, 0.697], still not very good. It does mean, however, that if we get moreinformation about the area under consideration, we could improve the overallaccuracy at most up to 0.697.

6. DiscussionOur experiment shows the simplicity of rough classi;cation in situations where

we have di�culties to associate a particular area with one speci;c category basedon available information. The experiment shows the applicability of rough classi;ca-tion both to reclassi;cation of existing datasets and to performing spatial analyseson these datasets. We also see a potential to work with rough classi;cation inprimary classi;cation work such as surveying. To exemplify this we would like tohighlight an example from the ;eld of research in glacial geomorphology where wethink that a rough classi;cation would be well suited.

In a recent paper Hattestrand (1998 ) treated the formation and distributionalcharacteristics of glacial geomorphological features, particularly ribbed moraines. As



a basis for his study a survey of glacial landforms was done covering almost two-thirds of Sweden. One type of moraine that was mapped was the Veiki moraine, atype of hummocky moraine characterized by plateaus with rim ridges separated bydepressions. In his survey Hattestrand used a narrower de;nition of this morainetype than previous work by other authors, and what came out was a more restricteddistribution of this speci;c landform. Especially the spatial distribution of the ridgescoincided with the distribution of other ice-marginal features and suggested anassociation between them. This ;nding together with other observations made areinterpretation of the early phases in the latest glaciation of Fennoscandia possible.We suggest that surveying of geomorphological landform elements using roughclassi;cation would enable a rough assignment of nearly matching features to theupper approximation of the set. Subsequent analysis of for example the distributionpattern may then exploit this preservation of uncertainty and make ;ndings as theone mentioned possible.

From a geographical information system users point of view the visual interfaceis a primary tool for interpretation, analyses and visualizations of geographical data.Most current geographic information system software provides ready to use functionsto zoom in and out, regroup, aggregate and generalize the data. Still most people inthe geographic information community now realize that the digital generalizationprocess is still problematic and that research needs to develop methods for dataabstraction and data reduction that keep track of data quality (Muller et al. 1995 ).The advantage of the rough set based approach is that one doesn’t have to quantifyuncertainty.

There is of course a risk of being too vague if every question is answered with arough ‘maybe’. A way to compensate for this vagueness is to provide additionalinformation on the context of the concept or class. Freksa and Barkowsky (1996 )discuss fuzziness in geographical objects and argue that a discrete model of spatialconcepts that preserves the conceptual neighborhood has clear advantages in practice.A conceptual neighborhood means a situation where we either have compatibleconcepts on diŒerent levels of granularity or we have competing concepts on thesame level of granularity. Some examples demonstrate this idea of conceptual neigh-borhood in the context of concepts for spatial relations (Freksa and Barkowsky1996, Cohn and Gotts 1996 ). In our experiment we show that the same ideas canbe applied to concepts such as vegetation classes. The rules we used to reclassify ourinitial data into rough layers apply the idea of conceptual neighborhoods eitherstated explicitly (PaÃhlsson 1995 ), or by using an expert decision approach. Of course,the neighborhood structure is not preserved as such in the data but in all cases ofuncertainty each item is assigned rough set values according to the concept neighbor-hood. Thus, we see rough classi;cation as a candidate approach for the representationof conceptual neighborhoods in a wider perspective.

The explicit de;nition of multiple conceptual hierarchies as a heterachy of con-cepts as part of the meta data may help to preserve the geographical meaning indatabases (Nyerges 1991b). Such an idea to employ a discrete model on the seman-tic level is closely related to the mechanisms developed by Bishr (1998 ) to captureand handle concepts at a level of application semantics. Bishr (1998 ) presents athorough background to the reasons why and how semantic similarity betweenconcepts should be used to develop mediator concepts to resolve, for example,schematic heterogeneity.

The combination of two rough layers performed in our experiment in order to



resolve some of the uncertainty can be compared to the use of inference networkoperations. Examples of commonly used methods include: fuzzy logic AND and ORoperations, Bayesian updating of prior probabilities to posterior probabilities, andthe related evidence theory or Dempster-Shafer theory. The latter divides a probabil-ity space into two parts, an inner measure given by a belief function and an outermeasure given by a plausibility function.

There is an apparent similarity between Dempster-Shafer belief and plausibility,and the upper and lower approximations of rough sets. Let us, as an example, lookat evidence of the inKuence of civilization. At the locations where one sees, say, grass,one can deduce that some external inKuence, be it a herded Kock of sheep or alawnmower, must keep out the trees from that location. This can be seen as positiveevidence for the inKuence of civilization. At other locations, such as in a forest, suchevidence may not be present, but this is not the same as evidence of the absence ofcivilization. Evidence of absence could be given by, say, extensive growth of reindeermoss, which is very fragile and grows very slowly. Dempster-Shafer belief andplausibility diŒerentiate between these options, and so do rough sets.

This has motivated work on the relation between rough set theory and evidencetheory, which has been reported in (Skowron and Grzymala-Busse 1993 ). It followsfrom their work that the overall crispness measure described in §4.1.2. can beinterpreted as a belief value in the sense of Dempster-Shafer logic. However, muchof their ;ndings remain to be applied in a spatial context such as the one reportedhere. Combination of datasets in overlay operations is an important step to per-forming multi criteria evaluation where an attempt is made to combine a set ofcriteria for a decision according to a speci;ed question. In such a set of criteria wemay want to use boolean, fuzzy and rough set layers.

7. ConclusionWe have discussed rough set based classi;cation, and have argued that it is a

viable alternative to fuzzy set based classi;cation when a classi;cation with explicitrepresentation of uncertainty is desired. We have introduced various useful conceptsand measures related to rough classi;cation, and have shown how to compare arough classi;cation to both a crisp and a rough classi;cation. In our experiment,we have shown the practical use of this theory, by converting data from oneclassi;cation system to another, so they can be compared with other data in thelatter classi;cation system, or be used for further processing that requires data inthis classi;cation system; the uncertainty in our data is explicitly represented at eachstep in the analysis. This experiment would not have been possible without thetheory developed here.

It would be possible to perform uncertain reclassi;cation based on fuzzy settheory. However, as stated previously, membership functions are either di�cult todetermine or rather arbitrary. When discussing rough sets, (Yao, 1998 ) states: ‘Roughmembership functions may be interpreted as a special type of fuzzy membershipfunctions...’. One could argue then that the decision whether a data point ‘is’, ‘is not’or ‘is maybe’ in a given class, for instance, is also rather arbitrary. However, in ourcase, given the reclassi;cation rules, it is straightforward to determine the roughmemberships for each data point of the classi;cation.

7.1. Future workTwo alternate reclassi;cation approaches, indicated with dashed lines in the

graph in ;gure 4, which remain to be investigated, are:



E The ‘classical’ direct conversion: Converting the crisp classi;cation veg35directly to the nine classes of veg9, and comparing them. This would seem tobe an inferior approach, since we cannot indicate our uncertainty when con-verting from one classi;cation system to another: one input class must alwaysmap onto exactly one output class. It would, however, give us a baseline withwhich to compare the other approaches.

E The rough direct conversion: Converting the two crisp classi;cations into onerough classi;cation in one single step, without any intermediate classi;cations.This way we can be sure that we do not lose any information in intermediatesteps, which does happen in the indirect approach that we took in the experi-ment above, see ;gure 4. A closer matching between the conversion and thereference classi;cation can probably be achieved this way, because we havethe maximum amount of information available on which to base our conclu-sions, but it is more knowledge-intensive; making this mapping is much moredi�cult than making the two consecutive mappings.

We intend to look into more appropriate measures that can be derived directlyfrom the rough and/or crisp classi;cations, without the need for converting therough classi;cation to a virtual crisp classi;cation. We should probably also explorethese alternate approaches to our experiment. It would be interesting to look atwhether and how the reclassi;cation approach and the measures developed in thisarticle can be extended to fuzzy data.

Another direction one might take is to develop more high-level interfaces torough set based systems. There would then, as we mentioned in the discussion, be adesire to use crisp and fuzzy data as well as rough data in such high-level systems.This would require some way of combining these three kinds of data in multi criteriaanalysis, for example. Combining crisp and fuzzy layers is rather straightforward,using crisp data as a mask on fuzzy layers. The question of how to combine roughdata in such overlay operations is, as far as we see today, nontrivial.

A ;eld that we have not covered here is the one concerned with spatial aggregationof raster data. The increased use and availability of diŒerent air-borne sensor imageryproduces datasets on the environment from local to global scales. In order to exploitthese data there is a current need for proper aggregation and generalization methods.In this paper we have used rough set classi;cation to explore the possibilities toperform generalization using a thematic approach. An interesting question thatremains to be studied is how to incorporate this with a spatial aggregation of forexample satellite imagery or other grid datasets.

Altogether, our ;ndings suggest that rough set theory and rough classi;cationopens an interesting ;eld for further studies as a complement to existing theories onthe representation of uncertainty in geographical information.

AcknowledgmentsWe are grateful to the anonymous reviewers; their comments made this a better

paper. Gunnar Andersson helped out with the optimization problem. Johan Hstadpointed out the NP-completeness of the quadratic optimization problem. PerSvensson has helped us with many useful remarks about this paper. WithoutProfessor Hans Hauska and the CGI interdisciplinary project course, none of thiswould ever have happened.

The work reported here was sponsored by the Swedish Council of Building



Research (BFR), the Swedish Board for Co-ordination of Research (FRN), theSwedish Board for Engineering Sciences (TFR), the Swedish Board for Technologicaland Industrial Development (NUTEK), the Swedish Board for CommunicationsResearch (KFB) and the Swedish Defence Research Establishment (FOA) throughthe Centre for Geoinformatics, Stockholm.

References

Bishr, Y, 1998, Overcoming the semantic and other barriers to GIS interoperability.International Journal of Geographical Information Science, 12, 299–314.

Brown, D. G., 1998, Classi;cation and boundary vagueness in mapping presettlement foresttypes. International Journal of Geographical Information Science, 12, 105–129.

Burrough, P. A., and Frank, A. U. (editors), 1996, Geographic Objects with IndeterminateBoundaries (London: Taylor & Francis).

Burrough, P. A., and McDonnell, R. A., 1998, Principles of Geographical Information Systems(New York: Oxford University Press).

Cohn, A. G., and Gotts, N. M., 1996, The ‘Egg-Yolk’ Representation of Regions withIndeterminateBoundaries. In Geographic Objects with Indeterminate Boundaries, editedby P. A. Burrough and A. U. Frank (London: Taylor & Francis), pp. 171–187.

Congalton, R. G., 1991, A review of assessing the accuracy of classi;cations of remotelysensed data. Remote Sensing Environment, 37, 35–46.

Devogele, T., Parent, C., and Spaccapietra, S., 1998, On spatial database integration.International Journal of Geographical Information Science, 12, 335–352.

Duntsch, I., 1997, A Logic for Rough Sets. In T heoretical Computer Science 179 (ElsevierScience), pp. 427–436.

Freksa, C., and Barkowsky, T., 1996, On the Relations between Spatial Concepts andGeographic Objects. In Geographical Objects with Indeterminate Boundaries, edited byP. A. Burrough and A. U. Frank, 1996 (London: Taylor & Francis), pp. 109–122.

Garey, M. R., and Johnson, D. S., 1979, Computers and Intractability—A Guide to the T heoryof NP-Completeness (New York: W. H. Freeman and Company).

Goodchild, M. F., Guoqing, S., and Shiren, Y., 1992, Development and test of an errormodel for categorical data. International Journal of Geographical Information Systems,6, 87–104.

Guting, R. H., de Ridder, T., and Schneider, M., 1995, Implementation of the rose algebra:E�cient algorithms for realm-based spatial data types. In Advances in SpatialDatabases: 4th International Symposium, SSD’95, Lecture Notes in Computer Science,no. 951 (Springer-Verlag), pp. 000–000.

Hattestrand, C., 1998, The Glacial Geomorphology of Central and Northern Sweden Ser. Ca 85(Stockholm: Sveriges Geologiska Undersokning).

Lanter, D. P., and Veregin, H., 1992, A research paradigm for propagating error in layer-based GIS. Photogrammetric Engineering & Remote Sensing, 58 (6), 825–833.

Molenaar, M., 1996, A syntactic approach for handling the semantics of fuzzy spatial objects.In Geographical Objects with Indeterminate Boundaries, edited by P. A. Burrough andA. U. Frank, 1996 (London: Taylor & Francis), pp. 207–224.

Muller, J-C., Wiebel, R., Lagrange, J-P., and Salge, F., 1995, Generalization: State ofthe Art and Issues. In GIS and Generalization: Methodology and Practice, edited byJ-C. Muller, J-P. Lagrange and R. Weibel (London: Taylor & Francis), pp 3–17.

Nyerges, T. L., 1991a, Geographic information abstractions: Conceptual clarity for geographicmodeling. Environment and Planning A, 23, 1483–1499.

Nyerges, T. L., 1991b, Representing Geographical Meaning. In Map Generalization: MakingRules for Knowledge Representation, edited by B. P. Butten;eld and R. B. McMaster(Essex, England: Longman Scienti;c & Technical), pp. 59–85.

Openshaw, S., 1996, Commentary: Fuzzy logic as a new scienti;c paradigm for doing geo-graphy. Environment and Planning A, 28, 761–768.

PaÃhlsson, L., 1972, Oversiktlig Vegetationsinventering (Stockholm: Statens NaturvaÃrdsverk).PaÃhlsson, L., 1995, Vegetationstyper i Norden, Tema Nord, no. 1994:665 (Copenhagen: Nordic

Council of Ministers).


Rough classi&cation and accuracy assessment496

Pawlak, Z., 1982, Rough sets. International Journal of Computer and Information Sciences,341–356.

Raper, J., and Livingstone, D., 1995, Development of a geomorphological spatial modelusing object-oriented design. International Journal of Computer and InformationSciences, 9, 359–383.

Schneider, M., 1997, Spatial Data T ypes for Database Systems, Lecture Notes in ComputerScience, vol. 1288 (Berlin: Springer).

Sinton, D. F., 1978, The inherent structure of information as a constraint to analysis: mappedthematic data as a case study. In Harvard Papers on GIS, edited by G. Dutton, vol. 7(Reading, MA: Addison-Wesley), pp. 1–17.

Skowron, A., and Grzymala-Busse, J., 1993, From rough set theory to evidence theory. InAdvances in the Dempster-Shafer T heory of Evidence, edited by R. Yager, M. Fedrizziand J. Kasprzykpages (New York: Wiley), pp. 193–236.

Stell, J., and Worboys, M., 1998, Strati;ed map spaces: A formal basis for multi-resolutionspatial databases. In Proceedings of the 8th International Symposium on Spatial DataHandling (Burnaby, Canada: International Geographical Union), pp. 180–189.

Wang, F., and Hall, G. B., 1996, Fuzzy representation of geographical boundaries in GIS.International Journal of Geographical Information Systems, 10, 573–590.

Worboys, M. F., 1998a, Computationwith imprecise geospatial data. Computers, Environmentand Urban Systems, 22, 85–106.

Worboys, M. F., 1998b, Imprecision in ;nite resolution spatial data. Geoinformatica, 2,257–280.

Yao, Y. Y., 1998, A comparative study of fuzzy sets and rough sets. Information Sciences,109, 227–242.

Zadeh, L. A., 1965, Fuzzy sets. Information and Control, 338–353.

Estimating sematic uncertainty in land cover classifications • 99

ESTIMATING SEMANTIC UNCERTAINTY INLAND COVER CLASSIFICATIONS

Introduction

After the pilot tests in chapter 5 it was concludedthat the use of mixture classes in R-data made itdifficult to fully investigate the effect of spatialaggregation. The aggregation experiments alsocalled for a way to handle alternative or tiedoutcomes in the result. Further studies ofaggregation effects using the chosen approachtherfore required some solutions to theseproblems.

First, I chose to use the structural datadescribed in chapter 4 instead of the R-data. Thisdata set was more consistent in the interpretationof the different attributes and it consequently gavean areal estimate of each interpreted variable.

Second, the problem of mixed aggregationoutcomes or ties was interpreted as a problem ofindiscernibility due to a limited granularity in theinformation. Chapter 6 fruitfully developed themethod of rough classification and roughaccuracy assessment. These findings are nowanticipated to be applicable to the previouslyproblematic aggregation effect.

Thus, these two modifications are the majordifference between this more elaborate study ofspatial aggregation effects.

Scale change by spatial aggregation istechnically a straightforward process. Harderthough is to produce information from theaggregation that is consistent with the analysispurpose at the target scale. This chapter illustratesmethods to search for semantic differences in ascale change from fine resolution, pixel basedlandscape data generalized into a coarseresolution data set.

Despite the large interest in data aggregationand generalization, very few, if any, studies haveinvestigated digital generalization effects with aquantitative, location specific approach. Thiswork anticipates that the use of a location specificassessment of aggregation results will provide anadditional and sometimes more informative aspectof aggregation effects. The main purpose of thischapter is to demonstrate location specific

methods for estimation of scaling andgeneralization effects on the semantic accuracy ofcategorical data sets.

Experimental design: Conceptualdiscussion

To define the general scope of this study, consideran initial measurement rL at a spatial granularityof r spatial units and another identicalmeasurement rL10 made with a granularity of

r∗10 . We would like to find the generalizationfunction ()g that satisfies

Eq. 1 rr LLg 10)( =

In other words, can we define a generalizationprocedure for a given variable so that the outcomeof the generalization is the same as ifmeasurement were performed at the desired levelof generalization? The assumption of this work isthat any deviation from a strict calculation oflandscape content indicates scale dependence inthe interpretation of used landscape concepts.Thus, if we can falsify Eq. 1 it strengthens thealternate hypothesis, that there is a scaledependent component in the use of the dataconcepts. In the context of this experiment, thedefinition of the digital generalization method is adigital implementation of a manual interpretationand classification instruction.

One large difference between this and otherstudies of aggregation is that it uses data acquiredfrom manual interpretation of printed maps. Thereference dataset is thus derived using exactly thesame technique and exactly the same inputinformation as the one used in the collection oflower resolution data. The major reason for thisdesign is to evaluate whether there is a semanticeffect involved in aggregation of geographicinformation. Semantic differences will here beestimated as differences in semantic accuracy.

The notion of semantic accuracy wasintroduced in chapter 2 and exemplified in Figure5 and Figure 6. Semantic accuracy includes an

Chapter

7

100 • Estimating semantic uncertainty in land cover classifications

evaluation of “ability of abstraction” as a measureof how well a real world feature can be defined inthe perceived reality. It also includes anevaluation of how well geographical objects in adatabase correspond with the perceived reality.The specification in these figures serves as aframework for the hypothesis testing in thisexperiment.

The following experiment will try to evaluateboth the above-mentioned aspects of semanticaccuracy; “ability of abstraction” and “accuracyof the dataset”. First the ability of abstraction ismeasured using differences between multipleinterpretations as a proxy for an actual “ability ofabstraction”-measure. The logic behind this is thata multiple interpretation of the same real worldentity uses the same specification and wouldultimately end up with the same data. Anydifferences at the data level would be possible tointerpret as a difference at the level of perceivedreality since the path from perceived reality todata is exactly the same and can be held as aconstant factor.

The analysis of generalized versions againstthe manual reference to answer the mainhypothesis (Eq. 1) uses the same logic in that ituses two versions from the same real worldfeature. Once again data is used as a proxy for thecomparison between perceived reality and the realworld. The degree of correspondence between thetwo can then be interpreted as the semanticaccuracy of the dataset, that is the semanticaccuracy of the translation from the originaldataset to the generalized level.

Also, the experiment includes an estimation ofthe differences between generalization methodsthat use different levels of detail in source data.Automated spatial generalization of categoricaldata often uses the same categories at both inputand output levels. Manual generalization on theother hand may use several levels of detail butthere are few if any reported examples of how thiscan be made automatically. Hodgson (1998)propose a conceptual model of manual imageinterpretation. In this he speculates that a manualinterpreter identifies intermediate levelabstractions before the final classification. Ofprimary interest in order to make automatedgeneralization possible is of course to determinethe required spatial and thematic resolution of theinput data in order to produce the desired output.

Since this work uses a site specific approachto the evaluation of aggregation effects the

analyses will follow some of the standardprocedures for performing site specific accuracyassessment, mainly those of Congalton and Green(1999). Due to the inevitable interpretationinconsistency, this study also implementsmethods to include classification uncertaintyusing rough classification described in theprevious chapter.

Methods

Source and reference dataIn chapter 4 it was concluded that the existence ofmixture classes in R-data caused undesirableeffects for the analysis of aggregation effects.Now, for the continued investigations of thespatial granularity effect, I made two principalrefinements of the experimental design.

First, I decided to use the “structural data” setfrom chapter 4, which does not make use ofmixture classes in the same way as R-data. Theexplicit representation of the landscape typemixture for each spatial unit is more suited to thetask of detecting differences in interpretation atdifferent spatial granularity. One main advantageof the R-data is that it was possible to do thoroughquality tests. The result of the quality tests inchapter 4 made it possible to treat the R-dataestimates as having the same accuracy as NationalForest Inventory estimates. However, for theseclosed experiments it is of secondary interest tohave absolute calibration of the interpretationvalues. More important is to assure an internalconsistency in the interpretations, and this waspossible to evaluate for the Structural data.

The second modification of the experimentaldesign of chapter 4 is the use of rough setrepresentations explained in chapter 6 (Ahlqvist,Keukelaar and Oukbir, 2000a). After testing forinterpretation consistency, it was possible to userough classification to represent the ambiguitythat occurs in the spatial aggregation process dueto the detected interpretation inconsistency.

Thus, for the purpose of this chapter, a portionof the original Structural data was selected. Fromthe entire dataset, data covering two topographicmap sheets (Swedish land survey, 1961, 1977)within the Stockholm County was selected for adetailed study, Figure 30. These study areas wereinterpreted once again using the same method asdescribed in chapter 4, only this time using asmaller grid size, 500x500m. This gave two setsof data for the same area, both datasets using thesame source information, same areal extent, same


interpreter (Rolf Ruben), and same classificationscheme. The only thing that differs in the twopieces of information is the areal unit ofinvestigation.

The interpretation of landscape types usestopographic maps to determine the proportion ofland cover types within each areal unit. Usingthese estimates a classification of the areal unitinto one of four different landscape types is made.The areal units consist of a regular quadratic gridwhere each map sheet is digitized into an array Lof square grid cells. As already noted this studyuses two different cell sizes, 500x500m called

500L data and 5000x5000m called 5000L data.

One problem of this kind of approach is toachieve enough data in order to provide astatistically valid result, as each pixel at the lowresolution corresponds to 100 pixels at the highresolution. According to general suggestions inCongalton and Green (1999) we need c. 200pixels for the assessment. This would in turnrequire interpretation of another 20000 pixels at

the 500L resolution. Clearly some kind of trade off

between the statistical requirements of theanalysis and the practical limitations for this studyhad to be made. Since sampling is not used for theassessments of correspondence it may be arguedthat a smaller number of target pixels can besufficient. I have decided to include an areacorresponding to two topographic map sheets inthis study, giving N=5000 for the fine resolution

500L data and N=50 for the coarse resolution

5000L data. The study areas, and the layout of the

interpretation grid are illustrated in Figure 30. Thestudy area is comprised of two separate areas eachcorresponding to one topographic map sheet5x5km and in a 1:10000 scale. For purposes ofanalysis and visualization the two map sheetshave been tiled side by side to produce onevirtually contiguous study area. All furtheraggregation processes and analyses are madewithout any involvement of the border zonebetween the two maps thus making sure that noartificial effects arise. Figure 31 shows images

500L

5000L

Stockholm county

Figure 30 Study area interpretation and tiling. Topographic maps (right) from NE and SW parts of Stockholm County(left) were interpreted at two different grid resolutions L500 and L5000. The two separate study areas were tiled andanalyzed together.


that portray the interpretation result at bothresolutions.

Congalton and Green (1999) state somenecessary steps to ensure proper collection of datafor accuracy assessment of a map:• Accuracy assessment sample sites must be

located on both the reference data and on themap

• Sample units must be exactly the same areaon both the reference data and the map

• Reference and map data must be collectedfor each sample unit to create reference andmap labels based on the map classificationscheme.

Since these recommendations pertain toapplied accuracy assessment the reference datamay be collected from a variety of sources, andmay be captured either through observation ormeasurements (Congalton and Green, 1999).

These first three requirements are fulfilled by thedata collection procedure explained above.

Classification schemeA detailed description of the classification schemewas described in chapter 4. The description givenbelow is in a translated and more formalisticform. As can be seen from the images in Figure31 the collected data includes both nominal andcontinuous variables. The following section willmore thoroughly explain the basis forclassification and its formal implementation. Thesame scheme was used in all datasets.

For each cell location l in L an estimation ismade of the areal coverage of four land covercategories by visually examining the papertopographic map sheet. Only the area within thespatial limits of the cell is considered at any onetime. Categories and their definitions are given inTable 26.

Landscape type landscape type

Urban/suburban Urban/suburban

Agriculture Agriculture

Forest ForestFigure 31 Interpreted source data at the two resolutions L500, left column, and L5000, right column. The upper twoshowing landscape type classifications (Blue= Coastal landscape, beige= Agricultural landscape, green= Forestlandscape, purple= Urban/suburban landscape). The lower six showing areal coverage of assessed land cover types,black=0-5% coverage, and increased coverage shown as increasingly whiter shades.


Associated with each cell location l in L isan array of variables { }PUPAPFPCLT ,,,,assigned values in the following manner. The firstvariable in the cell array { }LTl is assigned aclass label from the total set of landscape typeclasses LT =[Coastal, Urban/suburban,Agricultural, Forest] according to theclassification scheme in Table 27. Theclassification procedure starts by assessing theareal coverage of land cover type archipelago asstep 1. If the classification rule is fulfilled thearray variable is assigned { }LTl =Coastallandscape, if not the classification proceeds withsteps 2 through 4 until { }LTl has been assignedone of the four possible landscape type values.

After the assignment of landscape type class,the cell array is also given the values from theinitial estimation of land cover proportions for thefour categories in Table 1 to cell array elementsarchipelago { }PCl , forest { }PFl , agriculture

{ }PAl , and urban/suburban { }PUl . The arrayelements are assigned to one of 11 possible arealcoverage classes [ ]10,9...1,0 . These values

represent an interval of 10%, where { } 10*Pxl isthe middle of the interval in percent. Thus a value{ } 4=Pxl represents an interval from 35%-45%

areal coverage. Note that this does not apply forclasses 0 and 10 where values represent 0%-5%and 95%-100% areal coverage respectively.

The values for archipelago { }PCl areaccording to the interpretation instruction andclassification system (Chapter 4) not consistentlygiven as the areal coverage of this land covertype. This circumstance made that particularvariable unsuited for the study and it wastherefore excluded from the analysis. Necessaryinformation on the areal coverage of thearchipelago cover type was derived from the{ }LTl variable instead. The implication of this

circumstance will be discussed in detail in thenext section.

The study thus focuses on the distribution andareal proportion of the estimates ofurban/suburban, agriculture, and forestland covertypes. These three variables are used to analyzethe effects of aggregation on quantitative pixelvalues.

The process described above provide two setsof data [ ]PUPAPFLTL ,,,500 , N=5000 and

[ ]PUPAPFLTL ,,,5000 , N=50 interpreted

from the same source. Thus in the sense of Eq. 1we now have the two measurements

500LLr = and 500010 LL r = .

Generalization strategiesThe data collection has this far provided twomeasurements of the same area at two differentspatial granularity levels 500L and 5000L . To

answer the question if Eq. 1 holds, a definition isneeded of the generalization function ( )g thatwill be tested. The following section gives a brief

Table 26 Land cover categories assessed for areal coverage in the studyLand cover category DefinitionArchipelago Actual sea area and land area within 500m from the seashore

Urban/suburban Built up land including residential ground, streets, parks, industrial andcommercial areas

Agriculture Agricultural fields, grazed areas and tree patches. Also roads and buildingsconnected to these areas.

Forest Forested areas and other land cover types that do not belong to the other threecategories

Table 27 Classification scheme followed in the studyStep Landscape type Classification rule1 Coastal district At least 60% of the areal unit (pixel) is covered by archipelago2 Urban/suburban district At least 1/8 of the areal unit (pixel) is covered by urban/suburban3 Agricultural district At least ¼ of the areal unit (pixel) is covered by Inägor and

urban/suburban4 Forest district All other combinations


background on the various methods for rastergeneralization described in the literature followedby a detailed description of the aggregationprocedures that were used for our analysis.

Doing aggregation of data is not a trivial taskand a large literature is concerned withaggregation and the wider field of generalization.Following the definitions in McMaster and Shea(1992) raster-mode generalization can be dividedinto four fundamental categories.

Structural generalization refers to a spatialrearrangement of the raster matrix and normallyproduces a cell size that represents an increasedarea.

Numerical generalization is the type ofgeneralization often referred to as spatial filteringor convolution.

Numerical categorization is a process referredto as image classification in the remote sensingliterature.

Categorical generalization is confined togeneralization of categorical data.

In the following analysis of raster data I willuse a combination of numerical generalizationand numerical categorization. Two differentgeneralization strategies are used to evaluatedifference in performance. The differencebetween these two strategies is that they use inputdata at different levels of detail.

The first step of both digital generalizationstrategies is to define the spatial tessellation forthe output to be exactly the same as the data wealready have interpreted at the lower resolution

5000L . Thus a 5000x5000m quadratic grid of non-overlapping cells is produced, where each cellembraces 10x10 = 100 original 500L data cells.Now a numerical generalization is made

( )500Lg . It takes 500L data as input andcalculates numerical averages for the four landcover type variables { }PUPAPFPC ,,, ,which are output to the coarse resolution grid. Thenext section will describe exactly how thenumerical generalization step uses two differentstrategies to produce these averages. One calledProportion based generalization Eq. 2 and theother called Category based generalization Eq. 3.

Eq. 2 ( ) [ ]UAFCP PGPGPGPGLg ,, ,500

Eq. 3 ( ) [ ]UAFCC CGCGCGCGLg ,,,500

The output from the averaging procedure isassigned to vector variables identified by the

subscript. PG and CG denotes the twogeneralization strategies, Proportion based (PG)and Category based (CG). The Proportion basedgeneralization is using all of the available primaryinformation to produce images on areal coveragefor the variables at the coarser resolution. TheCategory based generalization uses only the classcategory information from each cell in theoriginal data to produce images on areal coveragefor the variables at the coarser resolution. It wassuggested earlier that a manual interpretationmight use intermediate level abstractions in theclassification process. The purpose of the secondapproach in this study is to simulate one possibleset of such intermediate level abstractions and touse them as input to the following generalizationprocess.

The implementations of the two strategies thusmake use of different inputs from the original cellvector variables [ ]PUPAPFLTL ,,,500 and

the following examples will illustrate in detailhow they work.

Figure 32 a illustrates the Proportion based(p) numerical generalization procedure from Eq.2. It uses all the available information in thevector variables so that for each variable

{ },,,, PUPAPFPCx = a mean value iscalculated for each 10x10 cell frame, which ispassed on to the corresponding vector variable inthe target resolution xPG .

Figure 32 b, illustrate how the category basednumerical generalization procedure from Eq. 3only uses the nominal cell classification

{ }LTL500 and calculates the frequency of each

class within each 10x10 cell frame as a measureof spatial coverage for each land cover type. Thefrequency for each class is output to thecorresponding vector variable in the targetresolution xCG .

After the averaging process a numericalcategorization is performed on the result. Thiscan be expressed as a function using the arealproportions in the coverage variables at eachlocation to produce a classification given asnominal values to the generalized landscape typevariable LTPG .

Eq. 4 uses notation for the proportionalgeneralization strategy but it also applies to thecategorical generalization strategy. Thisnumerical categorization function is theautomated implementation of the classification


into landscape types used for the source datacompilation described in chapter 4. It uses exactlythe same rules and since the discriminationbetween the classes is not exclusive we need tofollow the same flow in the classification processas in Table 27.

Data quality assessmentSo far several measures have been taken in thedesign of the data collection to ensure the qualityof the final data for analysis. These are:• Use of spatially exhaustive measurements

instead of sampling to eliminate samplinguncertainty

• Βalancing the number of measurements withpractical considerations to reach reliableconclusions

• Retrieving reference data using exactly thesame method as for the source data

• Use of only one interpreter for both datasetsthus eliminating need to calibrateinterpretations

• Use of a clearly defined interpretationprocedure to enable a consistent interpretationprocess

Some additional steps are still needed to furthercontrol the quality of the empirical data. Thesereceive extended treatment in the next section.

Interpretation consistencyIn general, consistency refers to the absence ofapparent contradictions in a database (Veregin,1999). An important part of any study usingmanually classified information is the sensitivityof the classification scheme to observervariability. This can also be referred to as datacollection consistency.

The use of only one interpreter during the datacollection and a clearly defined interpretationprocedure serve the purpose of reducing theinevitable variance in any estimation process. Buteven if these steps manage to reduce variabilitywe will still be facing some variation in the datathat is hard to control. Manual imageinterpretation variability has been reported (Ihse,1978; McGuire, 1992), and it would be reasonableto assume that a certain amount of interpretationvariability would be present in map interpretationalso. Apart from pure mistakes one source for thisvariation can be due to a lower ability ofabstraction in the used concepts. Althoughinterpretation variability is difficult to control it isoften possible to measure it and incorporateinformation on the variation into the data. Indoing so the results of further analyses are likelyto be more reliable since the uncertainty of eachestimation can be explicitly stated. According to

3

A

Landscapetype

Agricult. Agricult.

Forest. Forest.7

1

F

9

9

A

1

3

A

7

40% 75%

60% 25%

500PA

500LT

500PF

APG

FPG

ACG

FCG

a) b)

Figure 32 Illustration of the proportion based (a) and category based (b) numerical generalization procedures over ahypothetical 2x2 source pixel window. The proportion method calculates the average of the source proportions for thewindow and the category method takes the proportion of each class in source data for the window. Values are output tothe corresponding cover type variables.

Eq. 4 ( )

otherwisePGPG

PGPG

ForestalAgricultur

suburbanUrbanCoastal

PGPGPGPGfPGUA

U

C

UAFCLT4

1

81

6/

,,,≥+

≥≥

�

�

�

==


Congalton and Green (1999) there are two optionsto measure interpretation variability in the contextof reference data collection for accuracyassessment of remotely sensed images. One is tomeasure each variable at reference sites to providean exact reference for the complete interpretation.The other option is the use of multiple interpretersof reference sites. Both alternatives seem viablealternatives to measure the ability of abstraction.In this case, the first option would be feasiblesince we are using paper sheet maps and do notneed to perform extensive field measurements.Using map originals and measuring the total areasof interpreted map elements would probablyprovide a fairly exact measurement of groundtruth. Although the first option was possible thesecond option was chosen because of itssimplicity and since data already were availableas multiple interpretations distributed over thestudy area.

Thus, an estimate of interpretation consistencyas proxy for ability of abstraction was performed.By doing multiple interpretations of the same areaat different times. A total of 73 areal units (cells)were interpreted twice at 500L resolution and the

multiple samples were assessed using errormatrices for the variables PC, PF, PA, and PU.Table 28 shows one of the error matrices, variable

500PF , that was generated from the multiple

interpretations.

The overall accuracy of the example errormatrix is as low as 72% meaning that on averageevery 4th pixel is differently interpreted from timeto time. Table 29 provides a summary of thestatistics from the four error matrices that weregenerated for the four variables. From the overallaccuracy and KHAT statistics in the first twocolumns of Table 4 we can see that the absolutefit of the analysis is moderate. There is a 73% -93% agreement between the two interpretations interms of overall accuracy measures. The KHATstatistics that compensate for chance agreement isas low as 0.63-0.65 for the proportion variables.But it is much better, 0.89, for the finalclassification into landscape types, 500LT . The

general picture from all error matrices is that thefinal classification is most consistent between thetwo interpretations. This can be explained by thelarger tolerance of this variable to minordeviations in areal coverage interpretation.

The derived error matrices now give us someuseful information on how severe theinterpretation uncertainty is, and what measuresmay be taken to incorporate the uncertainty intoour analyses.

According to the error matrix in Table 28 theuncertainty is restricted almost entirely within +/-one areal coverage class. Since the variables arescalar intervals there is a transition around theerror matrix diagonal from totally consistentinterpretations, almost consistent interpretationsthrough non-consistent interpretations. Congalton

Table 28 Example error matrix from the assessment of interpretation consistency for the forest variable PF500.Rows=interpretation #1, columns=interpretation#2. Class 0=0-5%; class 1=5-15%… class 10=95-100% forested area.Major diagonal shaded, extended diagonal bold outlined.Forest 0 1 2 3 4 5 6 7 8 9 10 Total Class accuracy

0 16 2 18 0.89

1 1 2 1 4 0.5

2 2 2 4 0.5

3 1 1 2 0.5

4 2 1 3 0.67

5 1 1 2 0.5

6 1 1 0

7 1 1 2 0.5

8 2 2 4 0.5

9 1 4 1 6 0.67

10 1 2 24 27 0.89

Total 17 6 3 1 4 3 0 3 3 8 25 73

Table 29 Assessment of interpretation consistencyClassification Overall accuracy KHAT Extended overall accuracy

PF 0.73 0.65 0.99PA 0.85 0.63 1.00PU 0.90 0.64 0.99LT 0.93 0.89 N/A


and Green (1999) describe a simple but usefulvariation of the error matrix that is appropriate forthis situation. Given that we are dealing withclassifications on a continuous scale it can bejustified to expand the major diagonal in a normalerror matrix. Such an extended diagonal isoutlined in the Table 28 error matrix with boldlines. This fairly simple extension allow for someconsideration of the idea that class boundaries arenot totally crisp since it accepts plus or minus oneclass as a correct classification. In Table 29,extended overall accuracy measures, based on theextended major diagonal, show a substantialincrease of the overall accuracy. In other words, ifwe are willing to accept a variation in theinterpretation of areal coverage within + or – onecover class the interpretations will be totallyconsistent, 99%-100% overall accuracy, fromtime to time.

What we also see in the error matrix, Table 3,is that the error is not evenly distributed amongthe classes. Class 0 and 10 that accommodatevalues of no or full areal coverage are moreaccurate (89%) than the other classes. Theaverage for classes 1-9 is 48%. This type ofskewed class accuracy can be analyzed throughaccuracy assessment using of fuzzy logic (Gopaland Woodcock, 1994; Woodcock and Gopal,1999). This technique will not be used here sincethe data does not require this method at the stageof interpretation consistency evaluation. The issueof fuzzy logic will however be returned to in theaggregation procedures and in the analysis of theresults. To some extent the assessment in thissection provides a proxy for the “ability ofabstraction” measure discussed earlier.

Spatial autocorrelationKnowledge of the degree of spatialautocorrelation, or the correlation of a variablewith itself through space, is important for anyanalysis of scale change. Also, if the analysis ismade using any kind of sampled reference data,we need to consider the spatial autocorrelation tochoose an appropriate sampling scheme thatfulfills the criterion of independent samples. Sincethe analyses in this work is made on spatiallyexhaustive data, the risk of uncertainty introducedby sampling is eliminated, and the spatialautocorrelation is not of primary interest for theproduction of the reference data. More importantthough is the effects that spatial autocorrelationhas on the outcome of any spatial aggregationperformed. It is therefore necessary to know thedegree of spatial autocorrelation for the analyzedvariables in order to do a correct interpretation ofthe results.

The concept of a spatial correlogram (Cliffand Ord, 1981) was used to analyze the presenceand range of spatial autocorrelation. The spatialautocorrelation was estimated as King's caseMoran I index using the AUTOCORR module inIDRISI for Windows 2.0. In the analysis theoriginal images was analyzed separately (image#1 and #2) to avoid artificial effects due to theconcatenation of non-contiguous landscapes. Alsoa binary mask was produced for the coastallandscape type. The rationale behind this was toeliminate bias caused by the low and constantlevels of the forest, agriculture, andurban/suburban cover types in these areas.

In Table 30 and Figure 33 we see that theautocorrelation quickly tapers off from 0.23 –

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

500 1000 1500 2000 2500 5000

Urban #1Agricul. #1Forest #1Urban #2Agricul. #2Forest #2

Figure 33 Autocorrelation for both analyzed images, #1 and #2, and for the three variables Urban, Agricultural, andForest landscape types. 1st lag at 500m through 10th lag at 5000m.


0.47 at 500m lag distance to –0.20 – 0.02 at5000m lag distance. This means that we canexpect that cells at the lower resolution of 5000 mwill be a spatial mix of land cover types from thehigher resolution, 500m. The forest andagriculture variables exhibit a small increase inthe autocorrelation at a lag distance of 1-3 km.This is probably due to the fact that the landscapein this region is dissected by fairly long fissurevalleys. There are a few more dominant suchfissure valleys in the study area. These extend in aSW-NE direction and are about 1500m across.The spatial pattern of forest, agriculture and insome sense the urban areas are of course guidedby these major fissures thus providing the bumpsin the autocorrelation plot.

The analysis does not suggest that thedifferent land cover variables would behavedifferently in the generalization due to effectscaused by differences in autocorrelation.

Interpretation uncertainty propagationAfter the previous definition of the digitalgeneralization methods we now turn back to theearlier assessment of interpretation uncertainty.For the following analysis of scale dependentinterpretation it is important to incorporate theuncertainty in interpretation in order to drawcorrect conclusions. Figure 35 builds further onFigure 31 to illustrate the two differentgeneralization strategies over an example 2x2pixel matrix of source data.

The initial proportions of land cover variablesin this example are very close to the classificationthreshold for landscape type “Agriculturaldistrict”. The areal estimates of the land covertypes after the first numerical generalizationdiffers by 50% units. Despite this large differencethe final classification of landscape type is thesame. We see by this example that theoverestimating effect of the classification rules byusing the category based generalization is evenmore pronounced than in Figure 32. Furthermore,the final classification step is clearly very tolerantto misinterpretations in the landscape typevariables at certain levels. Despite the large

differences in land cover proportions,“Agricultural district” is still chosen as thelandscape type in both generalization strategies.Simply because the classification threshold is notwithin the range of the differences in thisparticular situation. Now, in Figure 34 when thetolerance of +/- one areal coverage class isapplied to the same generalization situation wesee that both strategies fail to decide upon theclass label.

Figure 34a shows how the initial data isaveraged into an interval of possible arealcoverage for the land cover type at hand. Sincethe interval crosses the classification threshold forthe agricultural and forest landscape types theoutput classification has to accept that either oneof these class labels can be correct. Figure 34bshows the same situation for the category-basedstrategy. Remember that this generalizationstrategy try to simulate a construction ofintermediate level abstractions used in the finalclassification. The input information is in theform of classified landscape types. Application ofthe same logic as the illustration in Figure 34 willproduce borderline cases where a +/- one-classinterpretation consistency may yield differentfinal landscape type classifications. Theinformation from the land cover type variablescan be used in this way to determine if thelandscape type classification of the source data iscertain or if several classes are possible. In Figure34b the input data has undecided classifications in3 of 4 cells due to the information we have fromthe land cover type variables Figure 35b.

Uncertainty handlingIn the previous sections the concept of anextended diagonal in the error matrix wasintroduced. The following analysis of the resultsfrom the numerical generalization will use thesame method to handle the detected inconsistencyin the interpretation of the source data. Summaryresults will therefore include overall accuracymeasures for both the major diagonal and theextended major diagonal of the error matrices. Itwas also concluded previously that the use of an

Table 30 Moran’s I autocorrelation index of the original 500L interpreted data. I is given for 1st and 10th lagautocorrelation corresponding to 500m and 5000m lag distance respectively.

1st lag 10th lagImage #1 Image #2 Image #1 Image #2

Urban/suburban 0.27 0.45 -0.08 0.02Agriculture 0.22 0.35 -0.11 -0.07Forest 0.30 0.40 -0.10 -0.20


extended major diagonal is not suited to nominal,qualitative data such as the landscape typevariable in the final classification. To be able toanalyze ambiguous generalization outcomes suchas those illustrated in Figure 34 some logic isneeded that can handle the classificationambiguity in a location specific accuracyassessment. We face here the same situation as inthe experiments reported in chapter 5, howeverthis time it is my ambition to confront thisparticular problem.

The review of methods for uncertaintyrepresentation in chapter 3 suggested fuzzy sets asa suitable method for data with poorly definedobjects or individuals. The definitions above maynot seem as poorly defined. Still, the objective ofthis study is to reveal differences in theinterpretation of a concept with respect to anobject, i.e. the areal unit. This fuzziness is whatFreksa and Barkowsky (1996) describe as a lackof detail that result in fuzziness. Fuzziness cannotbe viewed as a property of a single object: ratherit is the property of a relation between a conceptand an object. Using the terminology established

previously, this could be interpreted as theaccuracy of the model or “ability of abstraction”.As a consequence of this follows that the conceptof ability of abstraction can be approached usingfuzzy set reasoning. As already noted there havebeen examples of evaluating crisp classificationsagainst a fuzzy reference (Gopal and Woodcock,1994; Woodcock and Gopal, 1999). Theirmethods are capable of answering questions of theaccuracy of a crisp map using a fuzzy reference,but the inverse situation is not easily inferred. Asit turns out in this chapter, we have a fuzzyclassified map, which we want to evaluate againsta fuzzy reference.

It has been pointed out previously that onedifficulty with the use of fuzzy sets is the need todetermine a proper membership function. Thefuzzy set approach also requires that there is somekind of gradual transition from non-membershipto full membership. This is the situation in forexample an ordered set of linguistic concepts:[‘short’, ‘medium’, ‘long’]. In the case of non-ordered linguistic concepts such as [‘forest’,‘urban’] we do not even have the necessary

3

A

Landscapetype

Landscapetype

Agricult.

Forest.7

1

F

9

3

A

7

3

A

7

25% 75%

75% 25%

AA

APG

LTPGLTCG

ACG

FCGFPG

500PA

500LT

500PFLandscape

type

a) b)Figure 35 Illustration of the numerical generalization followed by numerical categorization into landscape type classes.The proportion based generalization strategy (a) uses averages from the land cover type variables Eq. 2 and the categorybased generalization strategy (b) uses landscape type frequencies Eq. 3 as input to the final classification step Eq. 4

∗APG

∗FPG

∗LTPG

∗AC G

∗FC G

∗LTCG

500PA∗

500LT

500PF

3

F/A

Landscapetype

Landscapetype

Landscapetype

Agricult.

Forest. 7

1

F

9

3

F/A

7

3

F/A

7

15-35% 0-75%

65-85% 25-100%

F/AF/A

a) b)Figure 34 Illustration of the numerical generalization followed by numerical categorization into landscape type classesaccounting for +/- 10% classification accuracy in the input data. The proportion based generalization strategy (a) usesaverages from the land cover type variables and the category based generalization strategy (b) uses landscape typefrequencies as input to the final classification step.


information to build a membership function. Infact, it would be inappropriate to build one sincethere is no evidence for a gradual transition fromnone to full membership. The only thing known isthat the information is not detailed enough toprovide a unanimous decision. This non-quantified uncertainty has until recently been hardto handle. However, the recent advances on roughset theory (Pawlak, 1982) and the previousexamination of the use of rough sets to representuncertainty in geographic datasets will now comeuseful. The following analysis will use the novelmethods of rough classification and calculation ofoverall accuracy measures using extended errormatrices described in the previous chapter.

The uncertainty propagation in the numericalgeneralization and categorization was outlined inFigure 34. This uncertainty can now berepresented using the rough set based logic tocreate rough classes. Consider for example ageneralization that results unambiguously in thelandscape type class ‘Coastal’. This result will beassigned a label of C meaning that it is in thelower, certain approximation of the rough set( )CC, . If however the result was ambiguous,that is the numerical categorization found thatboth landscape type class ‘Coastal’ and ‘Forest’was possible, the cell would be assigned two

labels; CC − and FF − . This means that thecell is in the area of uncertainty for both thecoastal rough class and the forest rough class.Thus, the numerical categorization into landscapetype classes was performed once again, now usingthe logic of rough classifications. The twooutcomes from the proportion and category basedrough numerical categorizations ∗

LTPG and∗LTCG respectively, were compared against both

the crisp 5000L and a rough version ∗5000L of the

interpreted data in extended error matrices. Thelogic behind the construction of a rough versionof the reference dataset was the same as for thegeneralized versions.

The rough classification is then used in anextended error matrix for a site-specific accuracyassessment Table 31. In an extended error matrixwe have a rough representation of either targetdata or reference data. In the example below, thegeneralization has produced a rough classification(rows), which is compared against a crispreference (columns). The error matrix extensionwill mean that some of the properties of a

‘normal’ error matrix do not hold since we willhave overlapping areas in the upperapproximations. This property will make it harderto calculate for example user’s and producer’saccuracy (Congalton, 1991) but it can be doneusing a parameterized error matrix as explained inchapter 6. However, one measure, the overallaccuracy, will still be easy to calculate but it willbe given as an interval with an upper and a lowerbound, Eq. 5.

Eq. 5 ( )+≤≤ ++ RxxARx iiiiOii // ?,,,

In other words the lower bound of the overallaccuracy is calculated dividing the sum of thecertain, lower approximation areas in the diagonal

+iix , by the total area R . Dividing the sum of

the certain and uncertain, upper approximationareas, ( )?,, iiii xx ++ by the total area, give the

upper bound.As mentioned, we may have either target data

or reference data in the form of a roughclassification. Furthermore, we may have bothtarget and source data as rough classifications.We only need to extend the error matrix in bothdirections producing a two-dimensionallyextended error matrix, or 2Deem (chapter 6) asshown in Table 32. From the 2Deem it is still hardto calculate producer’s and user’s accuracy. Theoverall accuracy is still possible to calculate,although a bit more elaborate than in Eq. 5.

AnalysisUp to this point the methods for data collectionhave been firmly established, the generalizationstrategies used to produce two target datasets havebeen defined, and methods to handle the

Table 31 Extended error matrix for proportion basedgeneralization (a) and two dimensionally extendederror matrix, 2Deem for proportion-basedgeneralization (b). Major diagonal shaded.

Proport.

Coast. Forest Agri. Urb.

C 14CC −

F 1 3 2 FF − 3 5 8 4 A 3 5 AA − 3 5 8 4 U 2 UU − 1


interpretation accuracy have been described. Theanalyses in the following result section usesseveral, well described, methods as well as somenovel methods based on rough classification.

The first result section tries to answer if theoriginal hypothesis, ‘Eq. 1 holds’, can be justified.Many techniques may be used to assess theoutcome of a spatial aggregation of data. A simplebut useful method that was used here is toproduce histograms of the distribution of classvalues in the original and aggregated images.Error matrices were also produced to provideoverall accuracy measures such as the overallaccuracy and K (KHAT-estimate) (Congalton,1991). The Kappa analysis method is widely usedto statistically determine if one error matrix issignificantly different from another. Also theKHAT statistic provides a measure of actualagreement between reference data and estimateddata. This measure of agreement is based onactual agreement in the error matrix compensatedfor the chance agreement indicated in row andcolumn totals. This analysis technique is thereforesimilar to Chi square analysis. Chi square analysismay raise problems if cell values in the matrix

equal zero (Congalton, 1983). An analysis ofdifference images was seen as a powerful,location specific method for evaluation of theoutcomes from the different generalizationstrategies. The joint outcome of these methodswas used to answer the original hypothesis.

The implementation of rough classificationand extended error matrices section goes deeperinto the results. The concept of roughclassification and extended error matrices fromchapter 6 makes it possible to do a site-specificevaluation of the final landscape typeclassifications. The use of 1- and 2-dimensionallyextended error matrices is here furtherdemonstrated in a thorough analysis of possibleexplanations for the results.

Results

Semantic correspondence analysisThe two source data sets interpreted at 500x500mand 5000x5000m pixel resolutions together withtwo digitally generalized sets of data at5000x5000m pixel resolution were used in thefollowing analyses. Examples of the sourceinterpretations and results of the two

Table 32 Two dimensionally extended error matrix, 2Deem for proportion-based generalization. Major diagonal shaded.

Proport. C CC − F FF − A AA − U UU −C 10 4 4 3 2

CC −F 1 6 6 1

FF − 2 1 17 17 2 8A 3 2 3 3 2

AA − 2 1 17 17 2 8U 2

UU − 1 1 1 1

a) 500m Agriculture c) prop.gen. Agriculture e) 5000m Agriculture

b) 500m Landscape type d) cat.gen. Agriculture

Figure 36 Example source 500PA data (a), 500LT data (b), results from the proportion based APG (c), and category

based ACG (d) digital generalization for land cover type variable Agriculture, and reference 5000PA data (e).


generalization processes are illustrated by theimages in Figure 36. The upper left image (a) issource data for the Agriculture land cover type.The lower left image (b) shows source data for thelandscape type variable. The middle columnimages show the generalized output for land covertype agriculture, using the proportion-basedmethod (image c), and the category based method(image d). Image (e) shows the manualinterpretation of the agriculture land cover typevariable at the low spatial resolution.

Quantitative variablesHistograms of the distribution of the classes ininterpreted data and the generalized output isshown in Figure 37. In general the two digitallygeneralized outcomes in rows 2 and 3 show largesimilarities with the interpreted reference data inrow 4. In the 500Px interpreted data, both the

Agriculture variable and the Urban/suburbanvariable show a unimodal distribution with lowmean values, whereas the forest variable show amore flat almost bimodal distribution. We knowfrom the autocorrelation analysis, Figure 33 and

0

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 10

0

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 10

0

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 10

0

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 100

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6 7 8 9 10

5.27.1

500 ==

σµ

PA6.39.4

500 ==

σµ

PF7.16.0

500 ==

σµ

PU

1.17.1

==

σµ

APG0.29.4

==

σµ

FPG9.06.0

==

σµ

UPG

5.13.2

==

σµ

ACG1.29.3

==

σµ

FCG4.11.1

==

σµ

UCG

2.15.1

5000 ==

σµ

PA9.06.0

5000 ==

σµ

PU5.27.4

5000 ==

σµ

PF

Fre

quen

cy

Fre

quen

cy

Fre

quen

cy

Freq

uenc

y

Freq

uenc

y

Freq

uenc

y

Freq

uenc

y

Freq

uenc

y

Freq

uenc

y

Fre

quen

cy

Fre

quen

cy

Fre

quen

cy

Agriculture Forest Urban/suburbanFigure 37 Histograms of the scalar variables Agriculture (a), Forest (f), and Urban/suburban (u). 1st row show sourcedata at 500x500m pixel resolution, 2nd row show generalized data at 5000x5000m pixel resolution using proportionbased averaging, and 3rd row show generalized data at 5000x5000m pixel resolution using category based averaging.The 4th row shows the reference interpretation at 5000x5000m pixel resolution.


Table 30, that the aggregation window is greaterthan the autocorrelation range. As we then wouldexpect (Bian and Butler, 1999), the generalizationchanges the distribution of the classes and tend toeliminate low frequency values far away from themean. A reduced variance is also apparentthroughout all generalization outcomes. Theproportion based generalization in 2nd row, retainsthe mean but reduces the variance. Theoreticallythis should be the case for this type ofgeneralization and the result only confirms this.

The category-based generalization though,changes both the mean and the variance values.The mean values here suggest a tendency to overestimate the Agriculture and Urban/suburbanareas while the forest area is underestimated if weuse 500Px interpreted data as a reference. Since

this study is more interested in the differencesbetween the manual interpretation and the digitalgeneralization at the same resolution I will mainlyuse the 5000Px source data as a reference. As we

can see in the 4th row, the 5000Px data produce

mean and variance values that differ slightly fromthe high-resolution data. The Agriculture andForest variable is interpreted with a lower meanvalue than the high resolution data whereas theUrban/suburban variable retains the mean.

The category based generalization apparentlymakes an underestimation of the forestland covertype but makes an overestimation of theAgriculture and Urban/suburban land cover type.The statistical summaries also suggest that thecategory based generalization method retain moreof the variation in the data. Variance estimatesdoes not drop as much as in the proportion basedgeneralization. Although not apparent, thevariance estimates suggest a pattern where theproportion based generalization method followthe low-resolution manual interpretation betterthan the category-based generalization.

So far the generalization outcome has onlybeen analyzed using global measures. Figure 36

gave examples of the spatial outcome of theinterpretations and the generalizations. To make alocation specific analysis, error matrices wereproduced, assessing the outcomes of thegeneralization against the manually interpretationas reference data. Example of one error matrix hasalready been given in Table 28. Overall accuracy,K , and extended overall accuracy statistics werecalculated from all error matrices. Table 33displays these statistics from the proportion basedand category based generalization strategyrespectively. The overall accuracy shows arelatively bad correspondence between the digitalgeneralizations and the manual estimate. The Kvalue only confirms this having values in theregion considered as “poor agreement” by Landisand Koch (1977).

Since both overall accuracy and K statisticswill be influenced by the +- 10% interpretationaccuracy found in data they will not be used forfurther discussion. They are only provided herefor comparison purposes. The lower row withrough overall accuracy measures show accuraciescompensated for interpretation inconsistency.These were produced using a 2-dimensionallyexpanded error matrix with rough versions of bothsource and reference data. These tables togetherwith the previous histograms give us moreinformation on how differently the two methodsof aggregation agree with the manuallyinterpreted reference. The general picture fromthe histograms is repeated here. The proportionalgeneralization works best, capable of producingestimates of the different land cover types with a100% correspondence with the reference images.The category-based strategy manages to produce90-100% correspondence with the manuallyinterpreted reference images. It should be notedfor the category strategy that the best result of100% for the urban land cover category is aneffect of the narrow range of this variable incombination with the interpretation consistency of

Table 33 Summary accuracy statistics for correspondence assessment of proportion based (PG) and category based (CG)aggregation of land cover type data,

5000PxCGx ⇔ .

Archipelago Forest Agriculture UrbanPGC CGC PGF CGF PGA CGA PGU CGU

Overall accuracy N/A 0.54 0.50 0.24 0.56 0.32 0.88 0.54

Overall kappa K N/A 0.41 0.43 0.14 0.41 0.16 0.78 0.29

Rough Overall Acc. N/A 0.90 1.00 0.94 1.00 0.96 1.00 1.00


+/- one cover class. This has the effect ofembracing most of the range for this land covervariable. The accuracy estimates for the threeother land cover variables, 90-96% are morelikely to give a correct estimate of the accuracyfor this generalization method. The samereasoning applies for the proportion-basedestimates although accuracies for the other landcover variables also produce 100%correspondence with the reference. Thus acorrespondence of up to 100% is likely to berepresentative for the proportion basedgeneralization method. These highest figures ofcourse only apply if all the interpretationinconsistencies turn out in a favorable direction.

To illustrate how these differences ininterpretation and generalization are distributed inspace, a spatial residual analysis was made. InFigure 38 5000Px data have been subtracted from

the generalized outcomes xPG and xCG to

produce difference images showing thegeneralized deviation from the reference image,

5000PxPGx − . Gray shades are set to show

deviations from the reference with brighter shadesrepresenting higher deviations. The shades havebeen set to indicate only significant deviationsthat exceed the uncertainty interval of +/- 1 class.Signs show if deviations are positive or negative.

The images in Figure 38 give a spatial versionof the histograms in Figure 37 and accuracymeasures in Table 33. For example the forestvariable in the reference interpretation has a meanof µ=4.7. The category based generalization meanis lower, µ=3.9. Figure 38 d illustrate the spatial

distribution of the significant deviations from thereference image as the gray pixels. The proportionbased generalization mean is µ=4.9 which isslightly higher than the reference image mean. InFigure 38 c however we see no gray pixels. Thismeans that the deviation from the reference imageis not anywhere of a magnitude greater than thebounds of the +/- 10% interpretation accuracy.This goes for all variables in the proportion basedgeneralization outcome, Figure 38 a, c, and e.Figure 38 f, show that the category basedgeneralization produce only one significantdeviation from the reference image and this ispositive.

Qualitative variablesFigure 39 illustrate different versions of thelandscape type classifications. First two imagesFigure 39 a and b, show the landscape type, high-resolution 500LT interpretation and low-

resolution 5000LT interpretation respectively.

Figure 39 c and d, show the two generalizedclassifications, LTPG and LTCG that wereproduced using generalized land cover imagesand the classification scheme defined earlier. Thevisualizations of these images do not account forthe +/-10% interpretation accuracy that of coursealso affect the final landscape type classificationdiscussed earlier. A visual inspection of theimages in Figure 39 c and d is neverthelessinstructive in that it confirms the pattern that theprevious images and statistics so far have onlysuggested.

xPG

xCG

a) prop. Ag riculture c) prop. Forest e) prop. Urban/suburban

++++

++ + +

+

--- -

-

- +

b) cat. Agriculture d) cat. Forest f) cat. Urban/suburban

Figure 38 Difference images for cover type variables Agriculture (a,b), Forest (c, d), and Urban/suburban (e, f). Theimages were produced by subtracting the interpreted reference image from the digitally generalized image. First row areresults from the proportion-based generalization 5000PxPGx − , second row category based generalization 5000PxCGx − .Grays are set to show significant deviations from the reference image in increasingly brighter shade. Signs show ifdeviations are positive or negative.


Both generalization outcomes, Figure 39 c, d,are visually different from the reference imageFigure 39 b. The proportion based strategyproduce fewer areas of the urban landscape classand more areas of the forest landscape classcompared with the reference image. The category-based generalization apparently overestimates theagricultural landscape type and the urbanlandscape type class whereas the forest landscapeclass is underestimated. A location specificaccuracy assessment was performed on theseimages also, using error matrices and calculatingoverall accuracies. It was earlier concluded thatthe method of using an expanded major diagonalis not appropriate for these categorical data.Instead rough classifications have been used torepresent the uncertainty and accordingly theoverall accuracy measure will be an intervalwithin which the classification accuracy isdefinitely located.

As we can see from the standard overallaccuracy, none of the generalization strategiesproduce a good correspondence with thereference, 82% and 64% respectively. This resultonly confirms the visual inspection of the imagesin Figure 39. Since there is a certain amount ofuncertainty hidden in the crisp classification, an

analysis of the rough set based classificationprovide valuable additional information. Therough overall accuracy row in Table 34 tell us thatthe overall accuracy for both generalizationmethods may reach as much as 100%correspondence with the manually interpretedreference. As before, these highest figures onlyapply if all the interpretation inconsistencies pullin a favorable direction. We may actually end upwith an overall accuracy as low as 28% and 26%for the two generalization strategies respectivelyif the uncertainty pulls in a completelyunfavorable direction.

Implementation of rough classification andextended error matricesThe above results have considered two cases ofuncertainty. One where source and reference dataare both regarded as totally crisp and the otherwhere both source and reference data incorporatesinterpretation uncertainty represented as roughclassifications. After a brief illustration of theimplementation of rough classification thissection will further illustrate the possibilities touse 1- or 2-dimensionally extended error matrices(referred to below as 1Deem and 2Deem) withrough classified data.

The actual implementation of rough

a) 500x500m data b) 5000x5000m data

c) prop. generalization d) categ. generalization

Figure 39 Landscape type classifications. Manual high-resolution interpretation 500LT (a), manual interpretation

5000LT (b), outcome of proportion based generalization LTPG (c), and category based generalization LTCG (d)

Table 34 Summary statistics from a traditional crisp error matrix (overall accuracy and overall Kappa) and a2Dimensionally extended error matrix produced from the comparison of generalized landscape type classifications.

ProportionalStrategy

Categorystrategy

Overall accuracy 0.82 0.64

Overall kappa K 0.75 0.52

Rough Overall Acc. 0.28-1.00 0.26-1.00


classifications has been made using the IDRISIfor Windows 2.0 GIS and Microsoft Excel 2000spreadsheet software. The representation of roughclasses in IDRISI uses separate images for eachvariable in each rough classification. Forexample, the landscape type classification LTPGconsists of four images PGLTCOA, PGLTFOR,PGLTAGR, and PGLTURB. Each image uses

values of 0, 1 and 2 to represent X¬ , XX − ,

and X respectively.This representation is very similar to the way

fuzzy measures may be accommodated by rasterGIS. In the example images shown in Figure 40we see the outcome of the category-basedgeneralization into rough classifications. Lowerapproximations X , that are the certain areas,show as white pixels, areas ofuncertainty XX − show as gray pixels and areas

that certainly not belong to the class X¬ show asblack areas. Figure 40 illustrate that the roughclassification allow overlapping pixels in the areaof uncertainty of the rough classes, but lowerapproximations never overlap with anything elsethan their corresponding upper approximations.

To further demonstrate the use of 1- and 2-dimensionally extended error matrices Table 35show the previous analysis results under twodifferent assumptions. If one of the data sets iscrisp and the other is rough we can use a 1Deemin the accuracy assessment. This is illustrated bythe upper and lower left 1Deems. These matricesuse rough versions of the generalized outcomesfrom the two different strategies and a crisp

version of the reference data, that is assuming thatreference data is totally confident in itsclassifications. For the 1Deem case we may useEq. 5 to calculate an overall accuracy interval. Forthe shown data these measures would be 0.50 ≤AO ≤ 0.90 for the Proportion based generalization(upper left table), and 0.50 ≤ AO ≤ 0.88 for thecategory based generalization (lower left table). Inthe upper and lower right tables the same twogeneralization strategies are compared against arough version of the reference data in 2Deems.The overall accuracies for these two tables, 0.28 ≤AO ≤ 1.00 for the Proportion based generalization(upper right table), and 0.26 ≤ AO ≤ 1.00 for thecategory based generalization (lower right table),have already been presented in Table 34. Theincrease of the accuracy interval is an effect of theuncertainty added by the rough version of thereference data set.

In addition to the overall accuracy, anexamination of the error matrices revealsadditional information on the nature of theuncertainty in this specific case. One importantaspect of a rough classification is the precision ofthe lower approximations since these estimatesare regarded as certain under the giveninformation. Both P- and C- classifications are100% correct for the coast class, but as we can seein the 2Deem the reference data is not totallyconfident in its classifications. 4 of the cells areassigned to the upper approximations of one ormore other classes. Three of the cells attributed tothe lower approximation of class ‘Forestlandscape’ in the P-classification was not correctaccording to the 1Deem, but from the 2Deem it is

a) Coastal rough set b) Forest rough set

c) Agricultural rough set d) Urban rough set

( )FF ,

( )UU ,

( )CC,

( )AA,

CCC

C−

¬

AAA

A−

¬

FFF

F−

¬

UUU

U−

¬

Figure 40 Rough classification CGLT implemented as IDRISI for Windows 2.0 images.


clear that the reference data is very uncertain inthis class. Only one cell in the reference isassigned to the lower approximation of Forestedlandscape and this specific cell was actually given

a partly correct label of FF − and AA − inthe proportion based generalization. This lastspecific fact cannot be read directly from the errormatrix. Both the proportion based and thecategory-based generalization corresponds in theirlower approximations with the lowerapproximations of the reference interpretation.This means that when each generalization methodis confident about its classification, the referencedata does not reject a correspondence. There isconfusion between the forest and agriculturallandscape types, which is evident if we look at theupper approximations of these sets for bothgeneralization strategies. Cells assigned to thearea of uncertainty for the forest landscape class,are also assigned to the area of uncertainty for theagricultural landscape class. This confusion is aneffect of the classification scheme that forcesareas assigned to the upper approximation ofagricultural landscape to be in the upperapproximation of the forest landscape as well.

This produces an artificial symmetry in the lowerapproximations rows of Agricultural and Forestlandscape types. In general terms, the detail andaccuracy of the information is not enough toproduce confident answers in many of thesecases.


The main focus of this study was to investigate ifa changed spatial resolution also changes theinterpretation of a certain landscape concept. Theoriginal hypothesis stated that the outcome of thegeneralization is equal to a manual interpretationat the target level. The large accuracy intervalsmay seem hard to draw any conclusions from.Indeed this is a problem and in this study one ofthe reasons can be found in the rather lowcrispness of the interpretations. The uncertainty ofthe interpreted images are caused by therecognition of a limited ability of abstractionrepresented as a +/- one areal cover class accuracyin the interpretation. This uncertainty intervalforces the overall accuracy interval to be quitebig. In fact, it is never possible to reach a betteroverall accuracy interval than 0.36 ≤ AO ≤ 1.00.Following terminology in Chapter 6 (Ahlqvist,

Table 35 Extended error matrices for both generalization strategies. a): Proportion based (P) classification against crispreference, b): Category based (C) classification against crisp reference, c): P-classification against rough reference, d):C-classification against rough reference.

C14

3 13 9 8 2

32 9 8 2

2 51 4 2

P

2

P

C

C CC − F FF − A AA − U UU −CF

FF −

A

AA −

UU −

U

CF

FF −

A

AA−

UU −

U

CF

FF −

AAA −

UU −

U

CF

FF −

AAA −

UU −

U

C CC − F FF − A AA − U UU −

C F A U

C F A U

141 51 7 10 5

32 7 10 5

31 3

10 4 4 4 21 1 5 51 4 20 20 12

1 2 11 4 20 20 12

2 2 1 24 4 4

10 4 4 4 25 5 1

1 5 16 16 71 2 1

1 5 16 16 76 66 61 6

1


Keukelaar and Oukbir, 2000a) this measure canbe translated as the overall crispness measure, MC

= 0.36, meaning that only 36% of the referencedata is unambiguously classified into lowerapproximations. Similarly, uncertainty in theinput to the automated generalization causes theoutput to have overall crispness measures of MC =0.50 for the PGLT-generalization and MC = 0.58for the CGLT-generalization. In spite of the lowcrispness and high uncertainty, the results doallow for some general conclusions to be drawn.

For the classification into landscape types, theoverall accuracy of up to 100% indicate that Eq. 1may hold for any one of the two suggestedgeneralization strategies. Thus we have noevidence to falsify the original hypothesis. Thiscan also be translated as a high semantic accuracyof the dataset. But if we consider the outcomefrom the analysis of land cover proportions inTable 33 the picture is a bit different. Theproportion-based classification did produce a100% correspondence with the reference for allvariables. The category based method though, didnot produce a full agreement with the referenceimages. The estimates of areal coverage departedsignificantly from the reference in up to 10% ofthe cases. The characteristic of the category basedgeneralization method is the use of a secondaryclassification of the original information as inputdata. Figure 37 and Figure 38 showed that thissecondary classification exaggerates the amountof urban and agricultural areas. Clearly thisskewed information has a negative impact on thecorrespondence with the reference interpretation.In this case study the skewed areal land coverinformation has in spite of this no proven effecton the final classification into landscape types.This is mainly due to the rather broad limits set bythe classification system. Other classificationsituations may be more susceptible to this kind ofbias though.

It was speculated earlier that an interpretationat a coarser spatial resolution might useintermediate level abstractions before the finalclassification is made. The type of intermediatelevel abstraction used here in the category-basedgeneralization does not support this idea but itstill does not reject such a possibility. The pilotstudies in chapter 5 indicated that the automatedaggregation of high-resolution data and the low-resolution interpretation does not correspond verywell. Some of the errors in those studies may beattributed to interpretation inconsistency but the

general picture from all analyses carried out isthat some of the analyzed parameters may well beaggregated to satisfy Eq. 1 whereas otherparameters may not.

Increased granularity in remote sensinginformationIn the light of this it is appropriate to discuss theidea of using one single high-resolution databaseas a primary source for information derivatives.This can be held as a feasible and highly desirablesolution in that source data will be possible totranslate into different concepts according to thecurrent needs (Müller 1989; Albrecht 1996; Gray1997). The approach would be ideal for data thatcan be collected through quantitativemeasurements of field like parameters such astemperature, topography but also perhapsindividual plant species occurrences. Remotesensing information is well suited to this kind oftreatment but there are still some fundamentalproblems that need to be resolved before this kindof framework can be made operative forenvironmental monitoring and management(Wilkinson 1996).

Remote sensing through satellites iscontinually producing environmental informationof more and more accurate and fine-grainedresolution both in terms of spatial and spectralresolution. Over the next ten years we can expectthe complexity of remotely sensed datasets togrow significantly through the use of multi-sensor, hyper spectral, multi-view angels andmulti-temporal time series. Given that futureremote sensing systems will provide even moredetailed data, each call for automatedinterpretation or classification of these fine-grained data into larger spatial and thematicgroupings will need a thorough understanding ofthe generalization steps needed for suchoperations. The problem, for example with mixedpixels, will not disappear with increasedresolution, instead the high resolution pixels willbecome mixtures of different things than we areused to today and accordingly call for otherconceptualizations. Key questions here will be toidentify what actual variables that should beconsidered as defining a more abstract concept.

For complex category definitions such asvegetation types Gray (1997) propose a method ofkeeping information as disaggregated fields inorder to be able to use different class concepts inother contexts. The fine grained dataset given byremote sensing devices will of course give us


unique opportunities to study the meaning ofvarying concepts in relation to one another. Suchstudies would also have the potential to providean increased understanding of our own mentalmodels of the real world entities (Gray 1997).Given such an understanding of ourconceptualizations methods, to translate field-based information into objects has been describedby Mackay et al. (1994).

Still there is also a need to approach theproblem with an integrated view of the landscapewhere we will deliberately expose ourselves to thethematic inaccuracies represented in e.g. historicalmaps and cultural heritage descriptions. Lookingat the landscape as a whole, errors and loss ofinformation will not necessarily be the case. Theuse of summary concepts such as “agriculturaldistrict” gives an instantaneous idea of thelandscape properties. It may be full of errors andcannot provide estimates of the actual field areaand the spatial distribution of individual fields,but on the other hand the concept “agriculturaldistrict” will most likely give the reader moreimmaterial notions of ownership structure andland use traditions which are complicated tocommunicate in more formal terms. Of course theterm “agricultural district” might mean somethingrather different in a north-American than in aEuropean context for example. This is exactly thereason for the current efforts to develop richermodels on a semantic level that have been pointedout in chapter 2.

This field of research has been of interest togeographers for a long time now and with thecurrent developments within the field of cognitivepsychology and computer science there areopenings for further findings. Being able tohandle multiple perceptions of the environment isnot only important for visualization of data. It isalso of great importance when dealing withenvironmental models e.g. for wildlife habitatmanagement.

Implications for image classificationIt is also interesting to discuss these results in thecontext of current techniques to produce landcover maps from remotely sensed data. Mostmultispectral classification procedures still uselocation specific classification based on thespectral information from one image pixellocation. At the same time, it is oftenrecommended that training data for a supervisedclassification be given as larger regions of severalhundred pixels in order to capture the total range

of the spectral signature of a specific class. Thisstandard procedure requires that there is a 100%consistency in the use of a specific class conceptat the two levels of resolution used in a normalsupervised image classification.

The great variety of currently used categoriesat different levels of resolution has never beentested to confirm or reject this fundamentalassumption. I therefore argue that more extensivetests using similar methods and data illustrated inthis work need to be made as part of any normaldata quality report. It has been argued that thisapproach is neither rigorous nor always possible(Weibel and Dutton, 1999). The problem is thatthe other alternative would be to ignore theproblem. There is today no documentedknowledge if there is a semantic variation in theuse of a specific concept at different scales.

So, why insist on comparing results with otherinterpreted results when a digital reference mightbe used for accuracy assessment? The majorpurpose of this chapter is not to show how wellthe human interpreter manages to summarizepieces of data compared with the 'right' answerthat the computer might give us. The purpose is tofind out whether the human mind makes differentassessments of an area if we apply different scalesfor our investigation. Of course a baseline ofdigital numbers could be of interest to see if weover- or under estimate at various resolutions.This might also give some clues of what causesthe differences. The aspect covered by thecorrespondence assessment translates into to thesemantic accuracy of the dataset. I can onlysuggest that the use of manual interpretations atdifferent resolutions is a viable method to makethe necessary evaluation of generalization effectsother than purely statistical.

Rough classification methodThe analysis of the final classification results andthe demonstration of data in 1Deeem and 2Deemsshow the power of rough classifications torepresent and illustrate a type of uncertainty oftenfound in geographic datasets. The three levels ofmemberships that can be represented by the lowerapproximation, the area of uncertainty and thenegation of the upper approximation, translatesquite intuitively into the statements ‘absolutelysure’, ‘maybe’, ‘absolutely not’. In cases wheremore levels of gradual certainty is needed,methods using fuzzy sets have proven to be afeasible alternative (Gopal and Woodcock, 1994;Woodcock and Gopal, 2000). The rough set


approach to uncertainty thus enriches the existingsuite of methods to represent uncertainty ingeographic datasets. The inherent type ofinformation uncertainty should guide the use ofprobability or fuzzy/rough set basedrepresentations of uncertainty.

One drawback of the rough classificationmethod is of course the problems of generatingmeasures of per-class accuracy or a KHAT-likestatistic. Although both a crisp-rough and arough-rough comparison can produce an overallaccuracy interval, applied questions will oftentake advantage of some more summary measures.

ConclusionsThis chapter has successfully demonstrated theevaluation of semantic uncertainty in manuallyinterpreted land cover classifications. Theanalysis illustrated how categorical uncertaintycould be translated into rough classifications. Thismade it possible to do assessments of thecorrespondence between two datasets usingsemantically uncertain data in one or both of theassessed data sets.

The evaluation of scale dependency in the useof certain landscape concepts provided noevidence for a scale dependent use of theseconcepts. Still, the results from chapter 5 indicatethe opposite possibility. It is therefore too early todraw any general conclusions other than a broadrecommendation of caution. Consequently, it wasargued that more extensive testing is required of apossible scale dependency of commonly usedland cover mapping concepts. Used methods werehere suggested as a way to estimate semanticaccuracy in such tests.

Using methods of rough classification anderror matrix extensions developed earlier, thepotential for these methods to performgeneralization and accuracy assessments wasfurther illustrated.

The findings also raise important questions forfuture quality assessments of digitally aggregateddata. Issues of semantic accuracy assessments aswell as contemporary techniques to producedigital land cover classifications were discussed.

Crisp, fuzzy, and rough decision support in GIS • 121

CRISP, FUZZY AND ROUGH DECISIONSUPPORT IN GIS

Introduction

In chapter 2 it was established that separate viewson the real world result in two generaldistinctions. One view where the world isconsidered to be made up of discrete objects andanother view that consider the world as beingmade up of a continuum of named attributes.From a semiotic perspective Sowa (2000) holdssymbolic and image like reasoning as twonecessary components of a complete system ofreasoning. Geographic information systems havetheoretically the ability to incorporate bothplenum/image like/field andatomic/symbolic/object views represented byrasters and vectors respectively.

It was further recognized that this theoreticalbasis is reflected in several proposed conceptualmodeling frameworks for geographical databases(Peuquet, 1988; Nyerges, 1991; Peuquet, 1994;Livingstone and Raper, 1994; Mark and Frank,1996; Usery, 1996; Bishr, 1998; Mennis et al.,2000). Most of these works suggest that it isnecessary to simultaneously represent field-based,object-based and time-based views to be able toprovide a full description of a geographicphenomenon.

Although some of the proposed frameworksnever have been extended towards the actualdatabase representation, those that were haveproposed set theoretic approaches for itsimplementation (Livingstone and Raper, 1994;Usery, 1996; Bishr, 1998; Mennis et al., 2000).

In this chapter I will suggest one possibleimplementation structure that not only integratesobject and plenum spatial views, but also iscapable of considering vagueness and ambiguityin these views. I also discuss how this furtheropens up possibilities for improvedinteroperability of geographic informationsystems.

Current and emerging techniques forgeographic data integrationWe can presume that future geographic questionswill need information not explicitly represented inavailable databases. One large challenge istherefore to derive implicit facts from explicitgeographic knowledge, and to produce astatement about the uncertainty in the presentedinformation.

Multi criteria evaluation (Carver, 1991;Eastman, 1997) or multi criteria decision analysis(Malczewski, 1999) is a group of methods thatcan be used to face this challenge. Such analysesare often implemented in geographic informationsystems by using two or more pieces (layers) ofattributes or constraints for a certain objective.The multi criteria decision analysis frameworkenables a combination of separate lines of certainor uncertain factors such as Boolean, probabilisticand possibilistic (fuzzy) information combinedthrough a set of rules into an answer or severalscenarios. Standard multi criteria decisionanalysis can for example make use of fuzzy setmembership factors and Boolean constraints thatmay be summed together to produce an outputresult.

Current multi criteria decision analysisimplementations still lack the indiscernibilityaspect of uncertainty. In brief, indiscernibilityrefers to the granularity of the knowledge whereasvagueness (fuzziness) arises due to gradualnotions of categories (cf. Fisher, 1999; Duckhamet al., 2000). Indiscernibility may be explained inthe context of a reclassification situation whenone or several of the source classes do nottranslate directly into the target classes. Forexample, elements of source class A could beelements of both classes 2 and 3 in the targetclassification.

It has been shown in chapter 6 that rough settheory and rough classification is capable ofrepresenting indiscernibility. It was alsoillustrated how proxy information could be usedas additional evidence for the target classification

Chapter

8

122 • Crisp, fuzzy, and rough decision support in GIS

and resolve some of the unresolved areas ofuncertainty. That is, information on soil moisturewas in chapter 6 taken from a soil map to eithersupport or reject ambiguities in the first roughreclassification. I will now readdress thisproblem, which in essence can be formulated as adecision function: given the old class and someadditional evidence, calculate the new class.

A suggested approachThe main problem addressed in this chapter is thecombination of nominal and continuous data inthe decision rule. In standard multicriteriaevaluation this is often solved by a conversion ofcontinuous variables into grades of membershipin a specified set.

This work uses a similar approach, and usesthe concept of bifuzzy sets to represent twodifferent facets of imprecision and these ideas areillustrated in an experiment. In this it isdemonstrated how to use crisp, fuzzy or roughsets to define a complete or incomplete translationfrom one concept to another, and create atransformation between contexts. Since noinformation in the experiment provides a one toone mapping between the source data and thetarget classification the idea is based on theintegration of multiple lines of evidence. It usesthe concepts of multi criteria decision analysisand fuzzy aggregation operations, extendingcurrent fuzzy approaches to incorporate crisp,fuzzy and rough sets through conversions intobifuzzy sets.

Merging together information about a certainconcept using object-based and location-basedviews, the experiment not only illustrates the useof this method to make transformations ofinformation from one context to another, it alsosuggests this framework as a solution to the widerproblem of integrating different representations ofgeographic space (cf Peuquet, 1988; Peuquet,1994; Couclelis, 1996; Couclelis, 1999; Mennis etal. 2000).

Method development

The final experiment will use continuous (z)wetness data converted into fuzzy membershipvalues MFj(z) corresponding to the targetclassification (j). We also convert indiscerniblenominal classes (i) through reclassification intorough classes corresponding to the targetclassification (j). The theoretical basis forcombining rough and fuzzy (z+i) classificationswill be referred to below and it includes the

conversion of data into bifuzzy classificationsbefore the actual integration is performed throughan intersection overlay operation. The experimentillustrates all this in an application that integratesbifuzzy classifications into a final image thatcombines both vague and indiscernibleinformation. A more detailed description of theformal development of the bifuzzy representationhas been presented at GIScience2000 (Ahlqvist,Keukelaar, and Oukbir, 2000b).

Transforming fuzzy and rough into bifuzzydataIt is of interest to see how the two facets ofimprecision that rough and fuzzy data representcan be integrated. It is tempting to convert roughsets to fuzzy ones by assigning membership value0 to elements not in the rough set, membershipvalue 1 to elements in the lower approximation ofthe rough set, and perhaps membership value 0.5to elements in the area of uncertainty of the roughset. We could even use some conditionalprobability instead of the value 0.5. However,Dubois and Prade (1992) argue that these attemptscan be only partially in agreement with fuzzy settheory. It seems, therefore, that a differentapproach is called for.

Two methodologies have been proposed tounify rough sets and fuzzy sets (Dubois andPrade, 1992). Both result in a mathematical object

which is a pair of fuzzy sets ),( FFM = , each

determined by a membership function Fµ and

Fµ . This will hereafter be called a bifuzzy set or

BF-set, and this mathematical object is able toexpress both uncertainty due to indiscernibility aswell as uncertainty due to vagueness (Ahlqvist,Keukelaar, and Oukbir, 2000b). In the same wayas with rough sets and rough classifications(chapter 6) is possible to define a bifuzzyclassification as a set of bifuzzy sets and todevelop useful quality measures that can beapplied on a single bifuzzy classification(Ahlqvist, Keukelaar, and Oukbir, 2000b).

Bifuzzy representations and multi-criteriaevaluationHaving defined a formal representation forvagueness and indiscernibility, let us now movedirectly further toward the question where wehave several classifications that we want toevaluate together. In the context of this work thiscan be formulated as a problem of combiningdifferent types of uncertainty such as vagueness


and ambiguity, and to this there seem to beexisting and workable approaches. The idea ofusing multiple sources that partly explain aspecific concept has close similarity with multi-criteria decision analysis (Eastman, 1997;Malczewski, 1999). In the following discussion Iwill in the last few paragraphs of this section firstsummarize some of the work in Ahlqvist,Keukelaar, and Oukbir (2000b). It illustrates theconceptual idea of using the multi criteria analysisframework to integrate fuzzy and rough datathrough the use of bifuzzy classifications. Afterthat I will exemplify this approach in anexperiment with geographical data.

Imagine a situation with three classificationsthat we want to use in a multi-criteria evaluation,one crisp, fully determined classification, asecond fuzzy classification with vagueinformation and a third rough classification withambiguous information. All three classificationsare defined on the same universe U , and on thesame index set I . The first classification is acrisp classification, C , consisting of a number ofcrisp classes C

iX . The second classification is a

rough classification R , consisting of a number of

rough classes ( )iiRi XXX ,= . Finally we have a

fuzzy classification F , consisting of a number offuzzy classes F

iX , each of which is a fuzzy set,

determined by the membership function FiXµ .

To use these three classifications together in amulti-criteria evaluation, we convert all of themto bifuzzy classifications. The conversion of thethree existing classifications proceeds in the moststraightforward way: A crisp class C

iX is

converted to a bifuzzy class ),( Ci

Ci XX

MiX µµ= ,

with 1)( =zCiXµ if C

iXz ∈ , 0 otherwise. A

fuzzy class FiX is converted to a bifuzzy class

),( Fi

Fi XX

MiX µµ= . A rough class R

iX is converted

to a bifuzzy class ),(ii XX

MiX µµ= , with

1)( =ziXµ if iXz ∈ , 0 otherwise, and,

similarly, 1)( =ziXµ if iXz ∈ , 0 otherwise.

To be able to proceed with a multi-criteriaevaluation, it is necessary to define some logic forthis, and for the purpose of this work only the

intersection and union operations on bifuzzy sets

will be defined. The intersection of two bifuzzysets is defined as

)),min(),,(min( BABABA µµµµ=∩ . Theunion of two BF-sets is defined similarly as

)),max(),,(max( BABABA µµµµ=∪ .Similarly, other operations such as product,

bounded sum, bounded difference and convexcombination defined for fuzzy sets (Burrough andMcDonnell, 1998) seem possible to extend tobifuzzy sets.

It is also possible to define something like anoverall accuracy measure (Congalton, 1991) forbifuzzy classifications. Let us assume that thereference classification, R , is compared with aclassification M . We could then compute WAM ,the weighted accuracy measure, Eq. 6

Here, the function ),,,( dcbaW defineswhat it means for a point to be correctlyclassified. If we desire exact equality of theuncertainty intervals, we could use

2/)2(),,,( dbcadcbaWexact −+−−= .Any deviation in the evaluated data from thereference will here result in a decrease of theglobal accuracy value. It does not make anydifference if the discrepancy is in the necessarymembership value or the possible membershipvalue. Other functions may also be defined for

),,,( dcbaW (Ahlqvist, Keukelaar and Oukbir,2000b) but is not necessary for the purpose of thischapter.

Experiment

The map data used in this illustration was takenfrom two separate investigations covering thesame area (Edberg et al. 1971; Ahlqvist andWiborn, 1992). These two vegetation maps hadbeen produced for nature preservation tasks.However, they used different vegetationclassification systems to produce the finalcategorical map sheet. These two maps weredigitized by scanning and segmentation into twoGIS raster images with pixel values representingvegetation class labels, Figure 41. Data entry andall subsequent GIS operations were performedusing the IDRISI for Windows v.2.0 software.The left map from 1971 uses originally 9 classesbut has been thematically aggregated to 3 classes;wet, mesic and dry vegetation. This map is calledVEG3. The right map of Figure 41 is from 1986,it uses 35 classes and will be called VEG35.

Eq. 6 �∈∈

=Ii U

RIi U

MMRRWA dxxdxxxxxWMiiiii

)())(),(),(),(( µµµµµ


The following experiment demonstrates thecombination of two different types of knowledgein a geographic analysis. Using the concept ofmulti criteria analysis, we take two differentsources of information and combine them into thefinal image. The two data sources are converted torough and fuzzy classifications to express thedifferent kinds of semantic uncertainty these datahave in the context of the target classificationsystem. A digital elevation model is used toproduce an image with gradual transitions fromnon-membership to full membership in the targetclasses given as fuzzy membership values. Theother line of information comes from the detailedvegetation map, VEG35, which is reclassified intoa rough classification. In the rough classification,membership, non-membership, and possiblemembership in the target classes are representedby the upper and lower approximation of roughclasses. The combination of the two is a bifuzzyrepresentation that accounts for both gradationand indiscernibility. The bifuzzy representation isfinally visualized by thresholding membershipvalues, which produce the final image of thetransformed information.

From fuzzy and rough to bifuzzyrepresentationsThe following sections will explain how theoriginal data was translated into rough and fuzzyrepresentations. It also shows how these two, thefuzzy and the rough classifications, weretransformed into bifuzzy classifications. Thewhole operation can be seen as the generalprocedure of getting two pieces of informationcompatible for assessment of for example changesin vegetation between two times.

Rough information – reclassification of crisp dataThe rules for rough reclassification wereestablished deductively using domain knowledge.Each original VEG35-class was evaluated againstthe target VEG3 classification system andtranslated into rough classification rules, Table36. Rules such as these could also be set up byinduction using a collection of ground truthsamples from each original class.

Table 36 shows that a majority of the classesin the original crisp classification cannot bereclassified directly into lower approximations ofthe target classification system. Following thereclassification rules set up in Table 36 the roughclassification R is produced. This is built up by 3

DryMesicWet

N N

Pine f orest ty pe 1Pine f orest ty pe 2Spruce f orest ty pe1Spruce f orest ty pe2Spruce f orest ty pe3Spruce f orest ty pe4Spruce f orest ty pe5Mixed conif er f orest ty pe 1Mixed conif er f orest ty pe 2Mixed conif er f orest ty pe 3Mixed conif er f orest ty pe 4Broad leaf ed decid. ty pe 1Broad leaf ed decid. ty pe 2Broad leaf ed decid. ty pe 3Alder f orest ty pe 1Alder f orest ty pe 2Birch/aspen f orest ty pe 1Birch/aspen f orest ty pe 2Birch/aspen f orest ty pe 3BrushwoodMixed f orest ty pe 1Mixed f orest ty pe 2Mixed f orest ty pe 3Early succession for. ty pe 1Clear cuts/non determinedAlder scrubGeoliteral shore v egetationSubliteral shore v egetationDry meadow ty pe 1Dry meadow ty pe 2Meadow ty pe 1Meadow ty pe 2Meadow ty pe 3Meadow ty pe 4Wet meadow

Figure 41 Original maps used in this study. Left, the VEG3 map produced by Edberg et al. (1971) thematicallyaggregated into dry, mesic and wet vegetation types, and right, the VEG35 map produced by Ahlqvist and Wiborn (1986).


rough classes( ) { }wetmesicdryiXXX ii

Ri ,,,, ∈= , each

being a rough set and represented in thisexperiment by 6 separate images.

Figure 42 illustrates the resulting roughclassification. In this, lower approximations arewhite and areas of uncertainty are shaded for eachof the 3 classes in the rough classification R. InTable 36 we noticed earlier that a majority of theoriginal classes in the crisp classification I cannotbe assigned to a lower approximation in the roughclassification R. Figure 42 shows the spatialoutcome of this, from which it becomes obviousthat the spatial overlap is fairly high.

The overlap and overall crispness measuresdescribed in chapter 6 are here calculated to beMo= 0.64 and Mc=0.41. In this case a totally crispclassification would yield Mo=0, Mc=1 and atotally rough classification Mo=2.0, Mc=0.Obviously the categorical granularity of the crispclassification I has not enough detail to discernbetween the different lower approximations in R.But still, the rough classification is not short ofinformation. It is often able to exclude one of theclasses as a possible alternative, leaving only twoalternatives to choose from. This information willbe used later on together with location-basedinformation about the possibility that a certainlocation belongs to one of the target classes. Thissecond piece of information is introduced next.

Fuzzy information – wetness indexThe VEG3 classification system is based uponvegetation moisture. Thus, additional information

about soil water content may resolve some of theindiscernibility that was a result of the roughreclassification in the previous section. Onecommonly used source of information on soil

Table 36 Original VEG35 classes and correspondingrough VEG3 classes

Original classificationI (VEG35)

Roughclassification

R (VEG3) 1 : Pine forest type 1 {Dry} 2 : Pine forest type 2 {Dry, mesic} 3 : Spruce forest type1 {Dry, mesic} 4 : Spruce forest type2 {Mesic, wet} 5 : Spruce forest type3 {Mesic} 6 : Spruce forest type4 {Wet} 7 : Spruce forest type5 N/A 8 : Mixed conifer for. type 1 {Dry} 9 : Mixed conifer for. type 2 {Dry, mesic} 10 : Mixed conifer for. type 3 {Mesic} 11 : Mixed conifer for. type 4 {Dry, mesic, wet} 12 : Broad leafed dec. type 1 {Mesic} 13 : Broad leafed dec. type 2 N/A 14 : Broad leafed dec. type 3 {Mesic} 15 : Alder forest type 1 {Wet} 16 : Alder forest type 2 {Wet} 17 : Birch/aspen forest type 1 {Dry, mesic} 18 : Birch/aspen forest type 2 {Mesic, wet} 19 : Birch/aspen forest type 3 {Dry, mesic} 20 : Brushwood {Mesic, wet} 21 : Mixed forest type 1 {Dry, mesic} 22 : Mixed forest type 2 {Mesic} 23 : Mixed forest type 3 {Wet} 24 : Early succession for. type 1 {Dry, mesic, wet} 25 : clear cuts/non determined {Dry, mesic, wet} 26 : alder scrub {Wet} 27 : geoliteral shore vegetation {Wet} 28 : subliteral shore vegetation N/A 29 : Dry meadow type 1 N/A 30 : Dry meadow type 2 {Dry, mesic} 31 : Meadow type 1 {Dry, mesic} 32 : Meadow type 2 {Dry, mesic} 33 : Meadow type 3 {Mesic} 34 : Meadow type 4 N/A 35 : Wet meadow {Mesic, wet}

i = dry i = mesic i = wet

i X

ii X X −

i X −

Figure 42 Images showing the outcome of the rough reclassification using the reclassification rules set up in Table 36.Lower approximations are colored white and areas of uncertainty are shaded for each of the 3 classes in the roughclassification R.


water content is the topographically basedwetness index, w, Eq. 7 (Moore et al. 1991). Thisindex is calculated for each cell in an elevationimage using the following formula:

Eq. 7�

��

�=

βtanln sAw

Here As is specific catchment area defined as theupslope area draining across a unit width ofcontour, and β is the slope of the cell. So, awetness index image was calculated from anoriginal digital elevation model from GeographicSwedish data (GSD) provided by the SwedishNational Land Survey, using the TAPES-Gsoftware (Gallant and Wilson, 1996). The useddigital elevation model had a spatial resolution of50x50m. The calculated wetness index valueswere interpolated over a 5x5m resolution image.The purpose for this was to transform the original50x50m resolution into the map data at 5x5mresolution. The accuracy of the output images isof course questionable, but the intent of thisexperiment is only to demonstrate the principlesbehind a rough-fuzzy data integration, not toproduce a fully accurate representation. The DEM

together with the derived wetness image aredisplayed in Figure 43.

The fuzzy membership functions used totransform the wetness image into fuzzyinformation were established using the semanticimport approach (cf. Burrough and McDonnell,1998). This method is suitable in cases wherethere is a fairly good qualitative knowledge abouthow to group data. The two major issues in thesemantic import approach are the choice betweenlinear, sinusoidal or some other function definingthe class membership, and the definition of thetransitions zone limits and widths. To make thingssimple I have chosen linear transition functionsand to support the establishment of transition zonelimits and widths, information from the oldervegetation map was used as a guiding reference.

The established membership functionsdisplayed in Figure 44 were used to produce thefuzzy classification F. This consists of 3 fuzzyclasses { }wetmesicdryiX F

i ,,, ∈ , each being

a fuzzy set and represented here by the threeimages displayed in Figure 44. These 3 imagesshow the membership values at every pixellocation in each of the 3 fuzzy classes, dry, mesic,

0 m

10 m

20 m

30 m

40 m

0.0

2.0

4.0

6.0

8.0

Elevation a.s.l (m) Wetness index (w)

Figure 43 Original digital elevation model over the study area together with the wetness index image produced using Eq.1 (Medgivande Lantmäteriverket 2000. Ur GSD - Höjddata, ärende nr L2000/646)


and wet. In a sense they convey a location-basedview of the target classification.

Transformation into bifuzzy dataAfter the conversion of source data into one roughclassification and one fuzzy classification it istime for the last step before the actual integrationof the two pieces of evidence to produce a finaloutput. Here I only follow the definition in one ofthe previous sections on how to produce bifuzzyclassifications from the fuzzy and roughclassifications. This results in two bifuzzyclassifications FM and RM originating from

the fuzzy and rough classifications respectively.This transformation does not change the actualinformation, it only translates it into the notationof bifuzzy classifications. Therefore the bifuzzyclassification FM is now represented by a set of6 images but 3 of those are identical to one of theother 3 images. Consequently the bifuzzyclassification version of the original fuzzyclassification still looks exactly as in Figure 44,and similarly the bifuzzy classification of theoriginal rough classification could be illustratedas Figure 42.

0 0.5 2.0 5.00.0

0.5

1.0

)(wMFi

)(wMFdry )(wMFmesic )(wMFwet

Wetness (w)

a) FdryX b) F

mesicX c) FwetX

MFi(w)0.0

0.5

1.0

Figure 44 Membership functions for the fuzzy reclassification (top) and the three images showing the three fuzzy classes

{ }F

wetmesicdryX ,, . In the images black areas represent no membership and higher degree of membership is increasingly

brighter reaching white at full membership.


Bifuzzy data integrationThe subsequent integration of bifuzzyclassifications used the concept of multicriteriaevaluation discussed earlier through anintersection operation

( ) ( )( )Ri

Fi

Ri

Fi MMMM

RF MM µµµµ ,min,,min=∩ .

The result can be represented as a set of 6 images,Figure 45. The grade of membership in theseimages is to be interpreted as the degree ofpossibility and necessity for the target classes. Atthis stage the previously defined quality measureswere calculated for the integrated bifuzzyclassification. The roughness of the integratedinformation was calculated to be 67.0=rM ,

which obviously gives an overall crispness33.0=cM . The bifuzzy overlap measure was

calculated to be 18.0=oM .

If these measures are compared with thebifuzzy classifications from the initial rough andfuzzy data, Table 37, we see that the integrationof the two data sources reduces the overallcrispness as well as the bifuzzy overlap. Giventhat the original fuzzy classification introducesdegrees of necessary membership in the lowerapproximation areas of the rough classification(Figure 45), this of course reduces the overallcrispness. At the same time the bifuzzyintersection operation reduces the amount ofoverlap since it uses a minimum operation on the



0.0 iMµ

0.5

1.0

0.0 iMµ

0.5

1.0

Figure 45 Images of the bifuzzy classification showing the degree of possible (upper row) membership and necessary(lower row) membership in each of the target classes; dry, mesic and wet vegetation respectively.


membership values. What is achieved is a resultwith less spatial confusion about the targetclasses. We also get an explicit representation ofthe spatial variation within each vegetation unitwith respect to the belongingness to a specificclass, Figure 45.

It is also possible that we for some reasonwish to de-fuzzify the result into a crisp result. Toillustrate how this can be done the six imageshave been combined to produce a final crispclassification C ′ . The logic followed in thiscombination is to let all bifuzzy lowerapproximations ( ) 0>z

iMµ at location z result

in crisp classes CiXz ′∈ in C ′ . This is possible

since all lower approximations of bifuzzy sets aredisjunct and will not result in locations beingassigned to multiple classes. The remaining areasthat may include overlapping areas are assigned tocrisp classes in C ′ based on the maximum valueof the upper approximations at each pixellocation. If for example a location has the

following values

{ }{ }5.0,4.0,1.0)(

,,=z

wetmesicdryMµ this will

produce the output classification CwetXz ′∈ . The

resulting image is given in Figure 46A and can becompared by the original VEG3 map from Edberget al. (1971) to the right, Figure 46B. If thestandard overall accuracy is calculated using thesecrisp maps we will find that it is 0.62.

A) Result B) VEG3

DryMesicWet

(i)

(ii)

N

Figure 46 A) Final map produced by intersection of bifuzzy classifications using rough reclassification of VEG35 fromAhlqvist and Wiborn (1986) and fuzzy classification from wetness index image. B) Original VEG3 map from Edberg et al.(1971). Arrows show examples of i) vegetation units in the final map that was previously boundary areas, and ii) newboundaries within original vegetation units as a result of moisture heterogeneity within these units.

Table 37 Quality measures for the bifuzzy

classifications RM , FM , and RFM ∪ that wereproduced respectively from the rough and fuzzyoriginal information, and through bifuzzy dataintegration.

Classification BF-Crispness

BF-Overlap

Rough RM 0.41 0.64

Fuzzy FM 1.00 0.32

Bifuzzy RFM ∪ 0.33 0.18



The relation between this approach and existingtheories and methods has been indicatedthroughout the text. The illustrated exampleserves as a backdrop against which the proposedideas have been presented and explained. InAhlqvist, Keukelaar and Oukbir, (2000b) some ofthe ideas of the method development and detailsof the experiment results were discussed. In thiswork I prefer to elaborate on the wider applicationof the described methodology in the context ofthis thesis.

Geographic information systemsinteroperabilityThe example demonstrated that the bifuzzyclassification is able to integrate data withdifferent aspects of uncertainty, such as vague andambiguous information. This has widerapplications as it can be seen as a generaltransformation mechanism between different usercontexts. In Figure 47 the experiment has beenput into an organizational context. Let us assumethe VEG35 map to be an existing inventoryproduced within a local authority for managementpurposes. Now suppose a regional organizationwants to produce a map with lower categoricalresolution covering the entire region usingexisting local information. The transformation of

local information can be made using methods ofrough, Figure 47 (1), fuzzy (2) and bifuzzy multi-criteria evaluation (3) outlined in the previousexperiment.

If we picture several local organizations, theirlocal information could be made accessiblethrough similar mappings either directly, Figure47 (4) or through some mediation with otherauxiliary data (5). It is also possible to transforminformation from the elevation data through theontology of VEG3 using the fuzzy membershipfunctions and then further reclassify it into theontology of VEG35, Figure 47 (6). In this processfuzzy wetness information, (2), is transformedinto a bifuzzy representation using a reversedversion of the rough reclassification rules (1) inTable 36. The result of this is essentially 35bifuzzy classes where the lower approximationsof these are empty and the upper approximationscould be illustrated with Figure 44.

Using the class numbers from Table 36, thefuzzy class dry, F

dryX , would be reclassified into

the upper approximation of bifuzzy classes {1-3,8-9,11,17,19,21,24-25,30-32} and would looklike Figure 44a. Similarly, F

mesicX , would be

reclassified into the upper approximation ofbifuzzy classes {2-5,9-12,14,17-22,24-25,30-33},Figure 44b, and F

wetX would be reclassified into

Local Inventory of vegetation, VEG35

Regional vegetation

Elevation Local Organization 1

Regional Organization

2

4Local Information, XXX

Auxillary data

5

7

Local Organization …n

3 1

6

Figure 47 Organizational perspective on exemplified (1-3) and possible (4-7) transformations using the suggestedapproach.


the upper approximation of bifuzzy classes{4,6,11,15-16,18,20,23-27,35}, Figure 44c. Sincethis is a rough classification there are overlaps,that is two or more contributing membershipfunctions that need to be resolved.

The most reasonable solution seems to be toapply the union operation on the contributingfuzzy membership values. Accordingly, forexample the derived VEG35 bifuzzy class 2would be assigned ( )F

mesicFdry XXM µµ ∪= ,02

and using the max-operator for fuzzy union theresult of this would look like Figure 48. The sameidea could be applied to other transformationsthrough already defined relationships. In Figure47 a transformation between the two localorganizations (7) therefore could be realizedthrough already established links (1-5).

The suggested chained transformations inFigure 47 require a careful consideration of theoriginal formulation, the meaning of each piece ofinformation, as the transformation rules areconstructed. A similar framework has beenproposed by Bishr (1998) and Bishr et al. (1999)as a general framework for semantic translatorscapable of mapping between spatial databaseschemas while preserving their semantics. Themain tool to connect semantically similar objectsin Bishr’s work is based on common ontologies,essentially a standardized vocabulary for variousdomains of interest. Very much the same idea isreflected by Gahegan (1999) and both suggest theuse of an interchange format (the term proxycontext in Bishr’s work) as a mediator totransform data from one information context toanother. As pointed out in the introduction, theuse of traditional set theory in these and otherworks imposes severe limitations on the ability torepresent semantically meaningful relationships(Kuhn, 1999; Mark and Frank, 1996). The use offuzzy and rough extensions to traditional settheory is primarily what makes the illustratedapproach substantially different from previouswork. It is anticipated that this improvementenables the representation of semantically richerrelations between concepts

Bifuzzy data integration and generalizationIn the beginning of this thesis it was argued thatthe traditional meaning of generalization has beenwidened in the digital world to include not onlymap output, but also almost any transitionbetween representational models of the realworld. Thus, digital generalization can be seen as

a process that changes the context of geographicdata.

The experiment is an example of one suchchange of context where the initial data was acategorically detailed vegetation description. Thissource data set was transformed into a contextthat emphasized vegetation wetness informationat a low categorical granularity. Thistransformation of the VEG35 source data into theVEG3 classification system included use ofadditional wetness information into the bifuzzyrepresentation.

These steps all relate to the process of modelgeneralization introduced earlier in chapter 3,which is a data reduction process that preserves ageometrically and semantically correct objectmodel. The normal understanding of the modelgeneralization process is that it that can bemodeled completely formally (Weibel, 1995).However, this view fails to address the importantissue of semantic accuracy (Salgé, 1995), whichwill be an issue as soon as the model-generalizedoutput is to be translated into any other model ofreality that can be understood by humans.Semantic accuracy has been thoroughly discussedin previous chapters. Hence, it suffices to repeatthat measurements always are made against alogical specification of the conceptual model thatwas used to collect the data (Veregin, 1999).Against this background it is instructive to look at

xM 2

µ

0.5

1.0

0.0

Figure 48 Example illustration of the upperapproximation of bifuzzy class 2 in the VEG35classification system after reclassification from fuzzywetness classes.


some of the accuracy values, remembering thatthe absolute values should not be taken tooseriously.

The original VEG3 map evaluated to be42.0=esxactWA compared with the derived

bifuzzy classification. Let us for one momentimagine that the derived bifuzzy classification

RFM ∪ was the full set of information that wasavailable to the surveyor/cartographer whoproduced the VEG3 map. That is, knowledgeabout the vegetation was at a fairly high thematicgranularity and there was also information on thewetness for the entire area. Under theseassumptions the accuracy value of 0.42 can beseen as a measure of the amount of generalizationthat has been applied on the original data when itwas transferred into a vegetation map with crisplydelineated polygons. Also, the fuzzy informationin the wetness image can be said to represent theheterogeneity within vegetation units from theaspect of wetness, that is within the context of thetarget classification.

This experiment exemplifies the discussion inchapter 3 on accuracy measurements for poorlydefined objects. It becomes obvious in this case ofpoorly defined features that the attribute accuracyis tightly connected with the spatial accuracy. Inthe final step it was shown how the bifuzzyclassification could be hardened into a map-likeimage, Figure 46A, that uses the samecartographic manner as the original VEG3 mapshown in Figure 46B. This produces a change ofgeometry in the result, as new boundaries arecreated within original map units Figure 46A (i),and new vegetation units are created at originalboundary locations Figure 46A (ii). In chapter 3 Ireferred to the discussion whether the location ofa boundary between two vegetation types isuncertain due to the problem of measuring theexact location of the vegetation types or if it isdue to the problem of discerning between the twovegetation types at the correct location(Goodchild, 1995; Painho, 1995). In this examplethe uncertainty of the location of the boundarybetween two vegetation types has been treatedprimarily as a problem of discerning between thetwo vegetation types.

The examples illustrate transformationsbetween different measurement frameworks(Chrisman, 1999) and also that differentmeasurement frameworks imply decisions aboutinformation granularity and accuracy.Accommodating for semantic accuracy means to

produce a representation that is correct both withrespect to the original model as well as theconceptualization of the target model. To addressmodel generalization from this perspective issomewhat conflicting with current understandingof the concept (Weibel and Dutton, 1999) andmaybe the division into model and cartographicgeneralization needs a revision. Anyhow,semantic considerations ultimately ask thegeneralization process to look for methods thatconvey geographic meaning and understanding ofspatial processes, a direction that has beenproposed by several authors (Müller, 1989;Ormsby and Mackaness, 1999; Van Beurden andDouven, 1999).

Controlling variablesThe demonstrated example shows how use ofprocess knowledge from the wetness image isused to discern between the different wetnessclasses in the VEG3 classification system. Theidea of scale dependent controlling factors wasintroduced in chapter 3 and Figure 7. In thisexperiment the use of an elevation imageconverted to wetness information is a way toutilize a ‘controlling variable’ to guide thedelineation in the target context. I thereforeanticipate that the proposed methodology can beused to define links between concepts andcontrolling processes.

However, this procedure is not asstraightforward as it might seem, and a commenton this is appropriate. The spatial granularity ofthe VEG35 map was 5m whereas the sourceelevation model used a 50m spatial granularity ofthe original measurement. In order to fit theelevation data to the grid of the map, it wasinterpolated over a 5m grid and in this process asmoothing effect was achieved. The selection ofthis data set, the following spatial transformationand wetness index calculation are all explicitdecisions about how the original measurementframework can be transformed without putting thecredibility of the final result at risk.

These decisions will be articulated in the finalprocess of hardening the bifuzzy maps into a crispfinal result. Many of the boundaries in the final,crisp map are drawn as a result of these decisionsand might have been located differently using forexample other sequences of these operations (VanBeurden and Douven, 1999). To be able to useprocess knowledge in this manner requires carefulconsideration of these issues. It also requiresexplicit representation of these considerations as


part of the metadata that explains in detail how adata set may be transformed. Reclassificationtables and fuzzy membership functions as thoseused in this experiment is one way to make suchconsiderations explicit and open for revision.

Bifuzzy sets and conceptual modelingIn chapter 2 it was established that a conceptuallevel geographic information model need tocapture the vital components of geographicinformation. I have also referred to a host ofauthors that all outline, time, space (3D), theme,and their inter-/intraconnections as basiccharacteristics that makes up geographicinformation. (Sinton, 1975; Peuquet, 1994;Albrecht, 1996; Usery, 1996; Gahegan, 1999)

The “triad”- view proposed by Peuquet (1994)is based on a dual framework of object- locationintegration (Peuquet, 1988) with the incorporationof time. This kind of simultaneous representationof multiple views of the same fact raises thequestion of how to find one common level ofunderstanding. The example has provided aninteresting illustration of this question. Somethingthat in the object based view was held as a wetvegetation type was a matter of gradedmembership in the field view. The integration ofthese two views through a bifuzzy set intersectioncreated the joint field/object based viewillustrated in Figure 45. The most important stepto achieve the bifuzzy representation is that boththe field based and the object-based view ismapped onto a common ontology. The choice ofcommon ontologies is still open for discussion. Itmay either be domain specific and standardizedontologies as suggested by Bishr (1998) or sets ofscale dependent ontologies of physicallycontrolling variables as suggested earlier. Themapping from the original data onto the chosencommon ontology may with the suggestedapproach be defined crisply, vaguely orindiscernibly through crisp, fuzzy or rough setsrespectively.

We see that the fairly dry mathematical idea ofa bifuzzy set has turned out to be able to negotiatecommon and diverging points of reference. Thisturn my discussion into the even more challengingissue of integrating different worldviews and whathas been formalized as ‘Group or OrganizationalDecision Support Systems’ (King and Star, 1990)introduced earlier in chapter 1. There the ideas of“due process” and “boundary objects” wereintroduced. The identification of importantconcepts such as different vegetation classes and

wetness information make it possible to thinkabout the spatial units as “boundary objects”(Star, 1987) for the negotiation of differentaspects of uncertainty.

It was described in chapter 1 that boundaryobjects come in different types. It appears clearernow that a standard multi criteria evaluationperformed as an overlay operation in a geographicinformation system makes use of the boundaryobject “terrain with coincident boundary”. Whensuch an evaluation is conducted it evaluates thejoint outcome of the overlay operation for eachspatial unit, be it a polygon or a pixel. The spatialunit essentially acts as a boundary object withinwhich similarities and differences are articulatedand negotiated by the overlay operator. Also thecase with a multi spectral image classificationprocess uses the boundary object idea, althoughnot usually acknowledged that way. Each locationpixel acts as a boundary object for the evaluationof the information from the different spectralbands in the image data.

It seem reasonable to think that thedemonstrated technique may be used in the kindof negotiation that can be expected from divergentviewpoints held by different people inorganizations. Such negotiations may often reachagreements around vague or imprecise terms assuggested by (King and Star, 1990). In the case ofthis experiment, space served as a boundaryobject around which the different aspects ofuncertainty where integrated. The regionalorganization, and the two local organizationsmay, according to King and Star (1990), exchangetheir information only through the process ofcontinuous identification, gathering and weighingof heterogeneous information into something hereformulated through bifuzzy classifications. This isan articulation of a “due process” (Star, 1987;King and Star, 1990), which has been described ina similar geographic setting by Harvey (1999).Within the due process, boundary objects sit inthe middle of a group of participants trying tonegotiate their divergent viewpoints. I imaginethat the use of space as a boundary object willmake it possible to apply multicriteria evaluationto perform concept mediation and thus perform atransformation between contexts within andbetween organizations. This remains to be fullytested, as also whether other types of boundaryobjects such as ‘repositories’, ‘ideal types’ or‘standardized forms’ prove to be suitable forgeographic applications.


Returning to the situation of a polygonoverlay, this is actually a two-step process wherethe first step consists of identifying the lines thatdefine the spatial limits of the boundary objects,and the second step is the actual negotiationprocess. That view of the polygon overlay processalso conforms with the transformational view ofGIS operations described by Chrisman (1997).

To conclude, spatial boundary objects aredefined and used in current digital geographicinformation analysis. I therefore propose that theboundary object idea can be used for coordinatinga specific objective, for example in a datatransformation process from one context toanother. Yet, it remains to prove whether the ideabased on bifuzzy data integration presented herewill contribute to a successful implementation ofthis proposition. It also remains to be verified in awider setting if the proposed framework iscapable of negotiating the different spatialconceptual frameworks, field and object basedviews.

Conclusions • 135

CONCLUSIONS

Summary of findings and objectivefulfillment

Spatial decision support using geographicinformation systems has been severely limited bythe problems of integrating different types ofgeographic data. Efforts directed towards formalspecifications of data structures, increased accessthrough data warehouses and content standardshas made important contributions. Still, theproblem of conveying the actual meaning ofgeographic phenomena across different databaseshas not until recently been tackled. This study hastried to investigate the matter of using oldinformation to answer new geographic questions.First of all I made clear that many questions needto be answered using existing data and in mostcases these will first need to be adjusted to fit thequestion. For geographic data an adjustment ofexisting data may involve changes in fivedimensions, three spatial, time and theme.

Review of both theoretical andmethodological aspects of integratinggeographic dataIn chapters 2 and 3 it was argued that spatialdecision support using geographic informationsystems requires integration of geographic data.Presented examples and cited literature madeclear that the process of geographic dataintegration still lack a firm theoretical basis and afull suite of tools based on such theories.

A translation between geographic abstractionsof real world features was determined to, at leastpotentially, include changes of spatial, temporaland categorical resolution of constituent data sets.The apparent complexity of such translationprocesses motivated a research approach whereeach dimension of a geographic dataset wasstudied separately. Appropriate data for theexperiments were compiled using already existingdata described in chapter 4 and supplemental datacollection.

Identification of important deficiencies orgaps in theory and/or methods forgeographic data integrationImportant deficiencies in current methods forgeographic data integration were found. The issueof semantic uncertainty as an important qualityaspect has received relatively little attention sofar. Thus, the theoretical background on this issuewas examined in detail in chapters 2 and 3. Also,the use of a traditional set theoretic approach tothe semantic level modeling of geographicinformation was reported as problematic in thesechapters. As a consequence, these questionsinitiated the research reported in chapters 5 and 6.

The classification and reclassificationambiguity found in chapters 5 and 6 can be seenas a kind of semantic uncertainty.

Chapter 5 demonstrated how changing thespatial resolution of a geographic data set causeboth predictable and non-predictable effects andhow these effects may vary for differentenvironmental variables. The mixed pixelsituation causes effects in the thematic dimension.

In the introductory parts of chapter 6 it wasconcluded that current representational techniquesusing crisp and fuzzy sets fail to addresssituations of classification uncertainty due toindiscernibility.

Identification of approaches that considergeographic context information.Solutions and recommendations that promote acontext sensitive use of existing geographic datawere proposed in chapters 5, 6, 7, and 8.

In chapter 5 the studies of changing spatialresolution made clear that different environmentalconcepts vary in their sensitivity to changes ofspatial resolution. This issue was further studiedwith a refined experimental design in chapter 7.From these studies no general solution to thespatial aggregation problem could be deduced. Itremains clear however that any change of thespatial resolution of a dataset must evaluate thesensitivity to semantic effects for this change.

Chapter 6 demonstrated the possibility torepresent (re)classification ambiguity due to

Chapter

9

136 • Conclusions

indiscernibility using rough set theory. Thismethod finds applications whenever operationsinvolve changes in the categorical resolution of ageographic dataset and when the categories aretranslated by one-to-many relations. This chapteralso showed how rough spatial data may becompared with other crisp or rough spatial datausing extensions to the normal error matrixparadigm.

Chapter 7 presented a testing technique toevaluate semantic uncertainty often found ingeographic data. The used method uses conceptsof rough classification together with manualinterpretations of landscape variables. The testdesign was suggested as a way to estimatesemantic accuracy of commonly used land coveror land use categories at different scale levels.

Chapter 8 proposed the concept of bifuzzyclassifications to enable data integration withdifferent types of semantic uncertainty involved.This technique was proposed to be used togetherwith existing methods of multi criteria evaluationto create a joint representation of crisp, vague andindiscernable uncertainties.

Suggested solution to support a contextsensitive use of existing geographic data.The theoretical background given in chapter 2 and3 led to the construction of the GeographicConcept Topology. The arguments for thisconstruct were given throughout these twochapters. The Geographic Concept Topology is aformalized representation of semanticinterrelations. I suggest a joint use of fuzzy andrough extensions to traditional set theory asdemonstrated in chapter 8. I further argue that thisenable explicit representation of semanticallyricher relations between geographic concepts. Thecollected set of such explicit links consequentlyprovides all interconnected data sets with a deepergeographical meaning.

From a general geographical point of view Ipropose that the GeCoTope framework enable theintegration of location-based and object-basedviews. It was shown in chapter 8 that it is possibleto extend several crisp, fuzzy, and roughtransformations into a general transformationmechanism between different user contexts andhence between different groups or organizations. Ialso argue that the GeCoTope framework mayserve as a mediator around which similarities anddifferences between different worldviews can benegotiated.

Demonstration of an application of a contextsensitive integration of geographic dataIn chapter 8 I demonstrate that the explicitrepresentation of crisp, fuzzy and rough relationsbetween independent datasets is a feasiblealternative to translation and full integration ofgeographic datasets. I also argue that such atranslation using predefined crisp, fuzzy andrough relations illustrate a kind of modelgeneralization. The proposed framework enablesthe use of common ontologies and/or physicallycontrolling factors at the desired level ofabstraction.

Conclusions

• Spatial decision support using geographicinformation systems requires integration ofgeographic data and this process still lack afirm theoretical basis and a full suite of toolsbased on such theories.

• Geographic data integration requirestranslation between abstractions of real worldfeatures, which often include changes ofspatial, temporal and categorical resolution ofconstituent data sets.

• Changing the spatial resolution of ageographic data set cause both predictable andnon-predictable effects and these effects mayvary for different environmental variables

• When changing the categorical resolution of ageographic dataset, it is possible to represent(re)classification ambiguity due toindiscernibility using rough set logic.

• Rough spatial data may be compared withother crisp or rough spatial data usingextensions to the normal error matrixparadigm

• The use of rough classification methodstogether with manual interpretations makes itpossible to evaluate the semantic uncertaintyoften found in geographic data.

• Integration of crisp, fuzzy and rough dataenable spatial decision support systems toconsider various aspects of categoricaluncertainty.

• Explicit representation of crisp, fuzzy andrough relations between datasets is a viablealternative to translation and full integration ofgeographic datasets.

• Translation using predefined crisp, fuzzy andrough relations enable model generalizationusing controlling factors at the desired level ofgeneralization.

References • 137

REFERENCES

Ahlqvist, O., 1998a: Increased spatial resolution inprimary data - The scale and aggregation problemrevisited. Poster presentation at the 8th annual GAPanalysis meeting, 1998, Santa Barbara, California,USA.

Ahlqvist and Wiborn, 1990, Tranviks naturreservat –Naturinventering med förslag till skötselplan (inSwedish), Norrtälje, 59p.

Ahlqvist, O and Arnberg, W., 1997, A theory ofgeographic context – The fifth dimension ingeographical information, in: Hauska, H. (Ed),Proceedings, ScanGIS’97, Stockholm Sweden.

Ahlqvist, O., Keukelaar, J., and Oukbir, K., 2000a,Rough classification and accuracy assessment,International journal of geographic informationscience,

Ahlqvist, O., Keukelaar, J. and Oukbir, K., 2000b:Fuzzy and Rough Geographic Data Integration,paper presented at GIScience 2000, FirstInternational Conference on GeographicInformation Science, Savannah, Georgia, USA,October 28-31, 2000.

Albrecht, J., 1996, Geographic objects and how toavoid them, in: P.A. Burrough and A.U. Frank(Eds), Geographic, Objects, With IndeterminateBoundaries, Taylor & Francis, London, pp.325-337.

Angelstam, P. 1997, Landscape analysis as a tool forthe scientific management of biodiversity, Ecol.Bull. 46, pp. 140-170.

Aslan, G. and McLeod, D., 1999, Semanticheterogeneity resolution in federated databases bymetadata implantation and stepwise evolution, TheVLDB Journal, 8:120-132.

Aspinall, R., 1992, An inductive modelling procedurebased on Bayes' theorem for analysis of pattern inspatial data, International Journal of GeographicalInformation Systems, vol.6, (no.2), pp.105-121.

Beurden, A.U.C.J. van, and Douven, W.J.A.M., 1999,Aggregation issues of spatial information inenvironmental research, International journal ofgeographic information science, 13 (5), pp.513-527

Bian, L., and Butler, R., 1999, Comparing effects ofaggregation methods on statistical and spatialproperties of simulated data, Photogrammetricengineering and remote sensing, 65(1):73-84.

Bishr, Y., 1997, Semantic aspects of interoperable GIS,PhD Thesis, ITC Publications Series No. 56,Enschede, The Netherlands, 154p.

Bishr, Y., 1998, Overcoming the semantic and otherbarriers to GIS interoperability, InternationalJournal of Geographic Information Science, vol.12,no.4, pp. 299-314.

Bishr, Y.A., Pundt, H., Kuhn, W., and Radwan, M.,1999, Probing the concept of informationcommunities – a first step toward semanticinteroperability, in: Goodchild, M., Egenhofer, M.,Fegeas, R., Kottman, C., Interoperating geographicinformation systems, Kluwer academic publishers,Boston/Dordrecht/London, 509 p., pp. 55-69.

Board C. 1967, Maps as models, in: Chorley, R.J., andHagget, P. (Eds), Models in geography, pp. 671-725, London.

Burrough, P.A. and McDonnell, R.A., 1998, Principlesof Geographical Information Systems, OxfordUniversity Press, 333p.

Buttenfield B.P. 1995, Object-oriented mapgeneralization: modelling and cartographicconsiderations, in: J-C. Müller, J-P. Lagrange, andR. Weibel (Eds), GIS and generalisation;Methodology and practice, Taylor & Francis,London, pp. 91-105.

Buttenfield B.P. and McMaster R.B. (eds.) 1991, Mapgeneralization; Making rules for knowledgerepresentation, Longman Scientific & Technical,Essex, England, 245p.

Carver, S.J., 1991, Integrating multi-criteria evaluationwith geographical information systems.International Journal of Geographical InformationSystems, vol.5, no.3, pp.321-339

Chrisman, N., 1999, A transformational approach toGIS operations, International Journal ofGeographical Information Science, vol.13, no.7,pp.617-637.

Chrisman, N.S., 1999, Trading Zones versus BoundaryObjects: incomplete translations of technicalexpertise, Presentation at Society for the SocialStudies of Science, San Diego CA October 1999,URL:http://faculty.washington.edu/chrisman/Present/4S99.pdf

Cliff, A.D., and Ord, J.K., 1981, Spatial processes:Models and applications, Pion, London, 265p.

Congalton, R.G., 1983, A quantitative method to testfor consistency and correctness inphotointerpretation, Photogrammetric engineeringand remote sensing, 49(1):69-74.

Congalton, R.G., 1991, A review of assessing theaccuracy of classifications of remotely sensed data,Remote sensing of Environment 37:35-46.

Congalton, R.G., and Green, K., 1999, Assessing theaccuracy of remotely sensed data: Principles andpractices, CRC Press, 137p.

Couclelis H. 1992, People manipulate objects (butcultivate fields): Beyond the raster-vector debate inGIS, in: Frank, A.U., Campari, I. and Formentini,U. (Eds), Theories and methods of spatio-temporalreasoning in geographic space, Lecture notes incomputer science 639, Springer, Berlin, pp. 65-77.

Couclelis, H., 1996, Towards an operational typologyof geographic entities with ill-defined boundaries,in: P.A. Burrough and A.U. Frank (Eds),Geographic, Objects, With IndeterminateBoundaries, Taylor & Francis, London, pp.45-55.

Couclelis, H., 1999, Space, time, geography, , in:Longley, P.A. et al. (Eds), Geographicalinformation systems – principles and technicalissues, second edition, John Wiley & sons Inc. pp.29-38.

Cousins, S.A.O., 1997, Mapping small linear elementsin rural landscapes at different scales andresolutions. Proceedings, 18th ICA/ACIInternational Cartographic Conference, pp. 2060-2067, Stockholm, Sweden.

Cromley, R.G., 1992, Digital cartography, PrenticeHall, Englewood Cliffs, New Jersey, 317p.

David B., Van Den Herrewegen M., and Salgé F. 1996,Conceptual models for geometry and quality ofgeographic information, in: P.A. Burrough and

138 • References

A.U. Frank (Eds), Geographic, Objects, WithIndeterminate Boundaries, Taylor & Francis,London, pp.193-206.

Dent B.D. 1993, Cartography - Thematic map design,W.M.C. Brown publishers, 427p.

Devogele, T., Parent, C., and Spaccapietra, S., 1998,On spatial database integration, Internationaljournal of geographic information science, 12 (4),pp.335-352 .

Dubois, D. and Prade, H., 1992, Putting rough sets andfuzzy sets together, In: Słowiński, R. (Ed.) 1992,Intelligent decision support – Handbook ofapplications and advances of the rough sets theory,Kluwer Academic Publishers, Dordrecht, 471p.,pp.203-232.

Duckham, M., Mason, K., Stell, J.G. and Worboys,M.F., 2000, A formal ontological approach toimperfection in geographic information, paperpresentation at GISRUK 2000, York, April 2000.

Eastman, J.R., 1997, Idrisi for Windows, User’s guide,V.2.0, Clark University, Worcester MA, USA.

Eastman, J.R., 1999, Multi-criteria evaluation and GIS,, in: Longley, P.A. et al. (Eds), Geographicalinformation systems – principles and technicalissues, second edition, John Wiley & sons Inc. pp.493-502.

Edberg, E., Lindberg, T., and Pettersson, M., 1971,Ängsö nationalpark och Tranvik – Vegetation ochfågelfauna (in Swedish), Rotobeckman, Stockholm,93p.

Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., andUthurusamy, R.(Eds), 1996, Advances inKnowledge Discovery and Data Mining, AAAIPress, Menlo Park, CA, 625p.

Fisher, P. F., 1999, Models of uncertainty in spatialdata, in: Longley, P.A. et al. (eds), Geographicalinformation systems – principles and technicalissues, second edition, John Wiley & sons Inc. pp.191-205.

FOLDOC, Free on-line dictionary of computing, URL:http://foldoc.doc.ic.ac.uk/foldoc/ index.html

Forman, R.T.T. and Godron, M., 1986, Landscapeecology, Wiley, New York, 619 p.

Frank A.U., and Mark D.M. 1991, Language issues forGIS, in: Maguire, D.J., Goodchild, M.F., andRhind, D.W. (Eds), Geographical InformationSystems, Vol. 1: Principles, Longman Scientific andTechnical, London, pp. 147-163.

Freksa C. and Barkowsky T. 1996, On the relationsbetween spatial concepts and geographic objects,in: P.A. Burrough and A.U. Frank (Eds),Geographic, Objects, With IndeterminateBoundaries, pp. 109-121, Taylor & Francis,London, 345 p.

Gahegan, M.N., 1999, Characterizing the semanticcontent of geographic data, models, and systems,in: Goodchild, M., Egenhofer, M., Fegeas, R.,Kottman, C., Interoperating geographicinformation systems, Kluwer academic publishers,Boston/Dordrecht/London, 509 p., pp. 71-83.

Gallant, J.C. and Wilson, J.P., 1996. TAPES-G: A grid-based terrain analysis program for theenvironmental sciences. Computers andGeosciences vol. 22 no. 7 pp 713-722.

Gombricht, E.H., 1977, Art and Illusion- a study in thepsychology of pictorial representation, LondonPhaidon, 386p.

Goodchild, M.F., 1995, Attribute accuracy, in: Guptill,S.C., and Morrison,J.L. (Eds), Elements of spatialdata quality, 1st ed., Elsevier Science Ltd, Oxford,pp. 59-79.

Goodchild, M.F., 1999, “Scale, resolution, process”, in:Montello, D.R., and Golledge, R., 1999, Scale anddetatil in the cognition of geographic information,Report of a specialist meeting held under theauspices of the Varenius project, Santa Barbara,Califiornia, May, 14-16, 1998, NCGIA, pp. 17-19.

Goodchild, M.F., and Longley, P.A., 1999, The futureof GIS and spatial analysis, , in: Longley, P.A. et al.(Eds), Geographical information systems –principles and technical issues, second edition,John Wiley & sons Inc. pp. 567-580.

Goodchild, M.F. and Quattrochi, D.A. (Eds.), 1997,Scale in Remote Sensing and GIS, LewisPublishers, Boca Raton, 406p.

Goodchild, M. F., Guoqing, S., and Shiren, Y., 1992,Development and test of an error model forcategorical data, International journal ofgeographic information systems, 6(2), pp. 87-104.

Goodchild, M.F., Bradley O. Parks, and Louis T.Steyaert (eds.) 1993. Environmental Modeling withGIS. Compiled papers and proceedings of the 1stInternational Conference/Workshop on IntegratingGIS and Environmental Modeling. OxfordUniversity Press, New York.

Goodchild, Michael F., Louis T. Steyaert, and BradleyO. Parks (eds) 1996a. Environmental Modelingwith GIS. Compiled papers and proceedings of the2nd International Conference/Workshop onIntegrating GIS and Environmental Modeling. GISWorld Books, Fort Collins, Colorado.

Goodchild, Michael F., Bradley O. Parks, Louis T.Steyaert, et al. (eds.) 1996b. The 3rd InternationalConference/Workshop on integrating GIS andEnvironmental Modeling. Santa Fe, NM. January21-26, 1996. National Center for GeographicInformation and Analysis. University of California,Santa Barbara. CD. Also see corresponding website at:http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/main.html

Goodchild, M.F., Egenhofer, M., Fegeas, R., Kottman,C., 1999, Interoperation geographic informationsystems, Kluwer academic publishers,Boston/Dordrecht/London, 509 p.

Gopal, S. and Woodcock, C., 1994, Theory andmethods for accuracy assessment of thematic mapsusing fuzzy sets, Photogrammetric Engineeringand Remote Sensing, vol.60, (no.2), pp.181-8.

Gould, M.D. 1994, GIS design: A hermeneutic view,Photogrammetric engineering & Remote Sensing,Vol.60, No.9, pp 1105-1115.

Gray, M.V., 1997, Classification as an impediment tothe reliable and valid use of spatial information: adisaggregate approach, in: Hirtle, S.C., and Frank,A.U. (Eds.), Spatial information theory: Atheoretical basis for GIS, Springer Verlag, Berlin,pp. 137-149.

Guptill, S.C., and Morrison,J.L. (Eds), Elements ofspatial data quality, 1st ed., Elsevier Science Ltd,Oxford, 202p.

Hall, O. and Arnberg, W., A Method For LandscapeRegionalization Based On Signatures And Fuzzy

References • 139

Membership, submitted to Computers, Environmentand Urban Systems.

Harvey, F., 1997, Improving multi-purpose GIS design:Participative design, in: Hirtle, S.C., and Frank,A.U. (Eds.), Spatial information theory: Atheoretical basis for GIS, Springer Verlag, Berlin,pp. 313-328.

Harvey, F., 1999, Designing for interoperability:Overcoming semantic differences, in: Goodchild,M.F., Egenhofer, M., Fegeas, R., Kottman, C.,Interoperating geographic information systems,Kluwer academic publishers,Boston/Dordrecht/London, 509 p., pp. 85-97.

Harvey, F. and Chrisman, N., 1998, Boundary objectsand the social construction of GIS technology,Environment & Planning A, vol.39, no.9 , pp.1683-1694

Hodgson, M.E., 1998, What size window for imageclassification? A cognitive perspective,Photogrammetric engineering and remote sensing,64(8):797-807.

Ihse, M., 1978, Aerial photo interpretation ofvegetation in south and central Sweden amethodological study of medium scale mapping,Swedish Environmental Protection Agency, PM1083, Solna, 165p.

Jelinski D.E. and Wu J. 1996, The modifiable areal unitproblem and implications for landscape ecology,Landscape Ecology, vol. 11 no. 3 pp 129-140.

Johnson A.R. 1996, Spatiotemporal hierarchies onecological modelling, in: Goodchild, M.F. et al.(Eds), Environmental Modeling with GIS. Compiledpapers and proceedings of the 2nd InternationalConference/Workshop on Integrating GIS andEnvironmental Modeling. GIS World Books, FortCollins, Colorado, pp 451-456.

Johnston, R.J., 1999, Geography and GIS, , in:Longley, P.A. et al. (Eds), Geographicalinformation systems – principles and technicalissues, second edition, John Wiley & sons Inc. pp.39-47.

Jones, C., 1998, Geographical information systems andcomputer cartography, Addison Wesley LongmanLimited, Essex, England, 319p.

Jonson, T., 1914, Om bonitering av skogsmark (Inswedish), SST, årg 12, pp. 369-392.

Kihlbom, D., 1991, Skärgårdsskog - ekologi och skötselJönköping. 163 pp (in Swedish).

King, J.L. and Star, S.L., 1990, Conceptual foundationsfor the development of organizational decisionsupport systems, Proceedings of the Twenty-ThirdAnnual Hawaii International Conference on SystemSciences, Kailua-Kona, HI, USA, 2-5 Jan. 1990,IEEE Comput. Soc. Press, Los Alamitos, CA, USA1990, pp.143-51

Klijn F., and Udo de Haes H.A. 1994, A hierarchicalapproach to ecosystems and its implications forecological land classification, Landscape Ecology,vol. 9 no. 2 pp 89-104.

Kuhn, W., 1999, An algebraic interpretation ofsemantic networks, in: Freksa, C. and Mark, D.M.(Eds.), Spatial information theory. Cognitive andcomputational foundations of geographicinformation science, Proceedings, Internationalconference COSIT’99, Stade, Germany, August 25-29, 1999, 476p., pp.331-347.

Laurini R., and Thompson D. 1992, Fundamentals ofspatial information systems, Academic presslimited, London, 680 p.

Lam, N.S-N. and Quattrochi, D.A., 1992, On the issuesof scale, resolution and fractal analysis in themapping sciences, Professional geographer, 44(1),pp.88-98.

Landis, J.R.; Koch, G.G., 1977, An application ofhierarchical kappa-type statistics in the assessmentof majority agreement among multiple observers,Biometrics, 33(2):363-74.

Lanter D.P. and Veregin, H., 1992, A researchparadigm for propagating error in layer-based GIS.Photogrammetric Engineering & Remote Sensing,Vol.58, No.6, pp.825-833.

Lenat, D.B., 1995, CYC: a large-scale investment inknowledge infrastructure., Communications of theACM, vol.38, (no.11), pp.32-38.

Leung, Y., 1997, Intelligent spatial decision supportsystems, Springer-Verlag, Berlin, 470p.

Livingstone D., and Raper J. 1994, Modellingenvironmental systems with GIS: theoreticalbarriers to progress, in: Worboys, M.F. (Ed),Innovations in GIS 1, Taylor & Francis, London,pp. 229-240.

Malczewski, J., 1999, GIS and multicriteria analysis,John Wiley & sons Inc., New Yourk, 392p.

Mark, D.M., 1993, Toward a theoretical framework forgeographic entity types, in: Frank, A.U. andCampari, I. (Eds), Spatial information theory : atheoretical basis for GIS, proceedings Europeanconference, COSIT'93, Marciana Marina, ElbaIsland, Italy, September 19-22, Berlin, Springer-Verlag, 477 p., pp. 270-283.

Mark, D.M., 1999, Spatial representation: a cognitiveview, in: Longley, P.A. et al. (Eds), Geographicalinformation systems – principles and technicalissues, second edition, John Wiley & sons Inc. pp.81-89.

Mark D.M. and Frank A.U. 1996, Experiential andformal models of geographic space, Environmentand planning B: Planning and design, volume 23,pp 3-24.

Mark, D.M., Smith, B., Tversky, B., 1999, Ontologyand geographic objects: An empirical study ofcognitive categorization, in: Freksa, C. and Mark,D.M. (Eds.), Spatial information theory. Cognitiveand computational foundations of geographicinformation science, Proceedings, Internationalconference COSIT’99, Stade, Germany, August 25-29, 1999, 476p., pp.283-298.

MacEachren A.M. 1995, How maps work;Representation, Visualization and Design, TheGuilford press, New York, USA, 513 p.

McDougall, E.B., 1976, Computer programming forspatial problems, John Wiley & Sons, New York,158p.

McGuire, K. 1992, Analyst variability in labellingunsupervised classifications, Photogrammetricengineering and remote sensing, 58(12):1705-1709.

McMaster, R.B., and Shea, K.S., 1992, Generalizationin digital cartography, Association of AmericanGeographers, Washington, 134p.

Meentemeyer V. 1989, Geographical perspectives ofspace, time and scale, Landscape Ecology, vol. 3nos. 3/4 pp 163-173.

140 • References

Mennis, J.L., Peuquet, D.J., and Qian, L., 2000, Aconceptual framework for incorporating cognitiveprinciples into geographical databaserepresentation, International Journal ofGeographical Information Science, vol.14, no.6,pp.501-520.

Meyer, R.J. 1998, ”You Say Resolution, I Say Fidelity,You Say Accuracy, I Say Detail, ...Let's NOT Callthe Whole Thing Off!”, paper presented at the 1998Spring Simulation Interoperability Workshop,URL:Http://siso.sc.ist.ucf.edu/siw/98spring/docs/papers/98s-siw-245.pdf

Milne, B.T. and Cohen, W.B., 1999, Multiscaleassessment of binary and continuous landcovervariables for MODIS validation, mapping andmodeling applications, Remote Sensing ofEnvironment, 70:82-98

Montello, D.R., and Golledge, R., 1999, Scale anddetatil in the cognition of geographic information,Report of a specialist meeting held under theauspices of the Varenius project, Santa Barbara,Califiornia, May, 14-16, 1998, NCGIA.

Moody, A. and Woodcock, C.E., 1994,Scale/dependent errors in the estimation of land-cover proportions: Implications for global land-cover datasets, Photogrammetric engineering andremote sensing, 60(5):585-594.

Moore, I.D., Grayson, R.B., and Ladson, A.R., 1991,Digital terrain modelling: a review of hydrological,geomorphological and biological applications,Hydrologic Processes, vol 5, no 1, pp.3-30.

Morrison, J.L., 1995, Spatial data quality, in: Guptill,S.C., and Morrison,J.L. (Eds), Elements of spatialdata quality, 1st ed., Elsevier Science Ltd, Oxford,202p.

Müller, J-C, 1989, Theoretical considerations forautomated map generalization, ITC journal, No 3/4,pp.200-204

Müller, J-C, Weibel, R., Lagrange, J.P., Salgé, F., 1995,Generalization: state of the art and issues, in:Müller, J-C, Lagrange, J.P., Weibel, R. (eds) GISand generalisation: methodological and practicalissues. Landon, Taylor and Francis, pp. 3-17

Nunes J. 1991, Geographic space as a set of concretegeographical entities, in: D.M. Mark and A.U.Frank (Eds), Cognitive and linguistic aspects ofgeographic space, Kluwer academic publishers, pp.9-33.

Nyerges T.L. 1991a, Representing geographicalmeaning, in: B.P. Buttenfield and R.B. McMaster(eds.), Map generalization; Making rules forknowledge representation, Longman Scientific &Technical, Essex, England, pp. 59-85.

Nyerges, T.L., 1991b, Geographic informationabstractions: conceptual clarity for geographicalmodeling, Environment and Planning A, vol 23, pp.1483-1499.

O'Neill, R.V., Johnson, A.R. and King, A.W., 1989, Ahierarchical framework for the analysis of scale,Landscape Ecology, vol. 3 nos. 3/4 pp 193-205,SPB Academic Publishing bv, The Hague

Openshaw, S., 1983, The modifiable areal unitproblem, Geo Books, Norwich, England, 40p.

Ormsby, D. and Mackaness, W., 1999, Thedevelopment of phenomenological generalizationwithin an object-oriented paradigm, Cartography

and Geographic Information Science, Vol.26, No.1,pp.70-80.

Painho M. 1995, The effects of generalisation onattribute accuracy on natural resource maps, in: J-C.Müller, J-P. Lagrange, and R. Weibel (Eds), GISand generalisation; Methodology and practice,Taylor & Francis, London, pp.194-206.

Pal, S.K. and Skowron, A. (Eds), 1999, Rough fuzzyhybridization : a new trend in decision-making,Singapore ; New York : Springer, 454p.

Pawlak, Z., 1982, Rough sets, International Journal ofComputer and Information Sciences, 11(5), pp.341-356.

Pawlak, Z., 1991, Rough sets : theoretical aspects ofreasoning about data, Kluwer Academic PublishersDordrecht/Boston, 229 p.

Peuquet D.J. 1988, Representations of geographicspace, toward a conceptual synthesis, Annals of theAssociation of American Geographers, 78(3), pp.374-394.

Peuquet D.J. 1994, It’s about time: A conceptualframework for the representation of temporaldynamics in geographic information systems,Annals of the Association of AmericanGeographers, 84(3), pp. 441-461.

Peuquet, D.J., 1999, Time in GIS and geographicaldatasets, in: Longley, P.A. et al. (Eds),Geographical information systems – principles andtechnical issues, second edition, John Wiley & sonsInc. pp. 91-103.

Påhlsson, L., 1972, Översiktlig vegetationsinventering(in Swedish), Statens naturvårdsverk, Stockholm,xx p.

Påhlsson L. (Ed) 1995, Vegetationstyper i Norden (inSwedish), Nordic Council of Ministers, Tema nord1994:665, Copenhagen.

Raper, J.F., 1999, Spatial representation: the scientist’sperspective, , in: Longley, P.A. et al. (Eds),Geographical information systems – principles andtechnical issues, second edition, John Wiley & sonsInc. pp. 61-70.

Rixskogstaxeringen, 1997, URL:http://www.-umea.slu.se/eng/dept/geom/prod/projekt/rikstax/

Rosenfield, G.H., and Fitzpatric-Lins, K., 1986, "ACoefficient of Agreement as a Measure of ThematicClassification Accuracy," PhotogrammetricEngineering and Remote Sensing, 52, 2, pp.223-227.

Salgé, F., 1995, Semantic accuracy, in: Guptill, S.C.,and Morrison,J.L. (Eds), Elements of spatial dataquality, 1st ed., Elsevier Science Ltd, Oxford,pp.139-151

Schneider, D.C., 1994, Quantitative ecology – Spatialand temporal scaling, Academic Press, San Diego,USA, 395p.

Schneider, M., 1995, Spatial data types for databasesystems, PhD Thesis, Fernuniversität, Hagen,Germany.

Sheth, A.P., 1999, Changing focus on interoperabilityin information systems: from system, syntax,structure to semantics, in: Goodchild, M.F.,Egenhofer, M., Fegeas, R., Kottman, C., 1999,Interoperation geographic information systems,Kluwer academic publishers,Boston/Dordrecht/London, 509 p., pp. 6-29.

Sinton, D.F., 1978, The inherent structure ofinformation as a constraint to analysis: Mapped

References • 141

thematic data as a case study, in: Dutton, G. (Ed.),Harvard papers on GIS, Volume 7, Addison-Wesley, Reading, MA, pp. 1-17

Skogsstyrelsen, 1966, Skogsstyrelsens återväxttaxering: inventering av återväxter (plantbestånd) somanlagts i Sveriges enskilda skogar i huvudsak under1950-talet (in Swedish), in: Sverigesskogsvårdsförbunds tidskrift , Skogsstyrelsen,Stockholm, pp. 211-328.

Smith, B., 1995, On drawing lines on a map, in: Frank,A.U. and Kuhn, W., (Eds), Spatial informationtheory. A theoretical basis for GIS,Berlin/Heidelberg, Springer verlag, pp. 475-84.

Smith, B., 1996, Mereotopology: A Theory of Parts andBoundaries, Data and Knowledge Engineering, 20,pp. 287–303.

Smith, B. and Mark, D.M., 1998, Ontology andGeographic Kinds, in: Poiker, T. K. and Chrisman,N. (eds.), Proceedings. 8th InternationalSymposium on Spatial Data Handling (SDH’98),12-15 July 1998, Vancouver, InternationalGeographical Union, 1998, pp. 308–320.

Sondheim, M., Gardels, K., and Buehler, K., 1999, GISinteroperability, in: Longley, P.A. et al. (Eds),Geographical information systems – principles andtechnical issues, second edition, John Wiley & sonsInc. pp. 347-358.

Sowa, J.F., 2000, Knowledge representation: logicalphilosophical and computational foundations,Pacific Grove, CA, USA, 593p.

Star, S. L., 1989. The structure of ill-structuredsolutions: Boundary objects and heterogeneousdistributed problem solving, in: Gasser, L. andHuhns, M. N. (Eds.), Distributed artificialintelligence, Research notes in artificialintelligence, Pitman, London, England, 390p., pp.37-54.

Star, S. L., & Griesemer, J. R. (1989). Institutionalecology, "translations" and boundary objects:Amateurs and professionals in Berkeley's Museumof Vertebrate Zoology, 1907-39. Social Studies ofScience, 19, 387-420.

Steele, J.H., 1989, Discussion: Scale and Coupling inecological systems, in: Roughgarden, J., May,R.M.and Levin, S.A. (Eds), Perspectives in EcologicalTheory, pp.177-180, Princeton university press,Princeton, New Jersey, 394p.

Svensson, S.A. 1980, The Swedish National forestsurvey 1973-1977, State of forest, growth andannual cut (In Swedish, summary in English),Swedish university of agricultural sciences, Report30, Umeå, 167p.

Svensson, S.A., 1983, Standard error of estimatesbased on the National Forest Survey 1973-82 (InSwedish, summary in English), Swedish universityof agricultural sciences, Report 34, Umeå, 98p.

Thomlinson, J.R., Bolstad, P.V., and Cohen, W.B,1998, Coordinating methodologies for scalinglandcover classifications from site-specific toglobal: Steps toward validating global mapproducts, Remote Sensing of Environment 70, pp.16-28

Tomlin, C.D., 1990, Geographic information systemsand cartographic modeling, Engelwood Cliffs, NJ,Prentice Hall, 249p.

Turner, M.G., O’Neill, R.V., Gardner, R.H., and Milne,B.T., 1989, Effects of changing spatial scale on the

analysis of landscape pattern, Landscape ecology,3(3/4) pp.153-162

UNEP (United Nations Environment Programme),1992, Convention on biological diversity, UNEPEnvironmental Law and Institutions ProgrammeActivity Centre, Nairobi, Kenya, 52 p.

Usery, E.L.,1996, A conceptual framework and fuzzyset implementation for geographic features, in: in:P.A. Burrough and A.U. Frank (Eds), Geographic,Objects, With Indeterminate Boundaries, Taylor &Francis, London, 345 p., pp. 71-85.

Van Beurden, A.U.C.J. and Douven, W.J.A.M., 1999,Aggregation issues of spatial information inenvironmental research, International journal ofgeographic information science, 13(5), pp. 513-527

WCED (World Comminssion on Environment andDevelopment), 1987, Our common future,University press, Oxford, 383p.

Vckovski, A., 1998, guest editorial: Special issue,Interoperability in GIS, International Journal ofGeographic Information Science, vol.12, no.4, pp.297-298.

Weibel, R., 1995, Three essential building blocks forautomated generalization. In: Müller, J-C,Lagrange, J.P., Weibel, R. (eds) GIS andgeneralisation: methodological and practicalissues. Landon, Taylor and Francis, pp. 56-69.

Weibel, R., and Dutton, G., 1999, Generalising spatialdata and dealing with multiple representations, in:Longley, P.A. et al. (eds), Geographicalinformation systems – principles and technicalissues, second edition, John Wiley & sons Inc. pp.125-155.

Veregin, H., 1999, Data quality parameters, in:Longley, P.A. et al. (eds), Geographicalinformation systems – principles and technicalissues, second edition, John Wiley & sons Inc. pp.177-189.

Wiens, J.A. 1992, Ecology 2000: An Essay on Futuredirections in Ecology, Ecological Society ofAmerica Bulletin 73(3), pp.165-170, EcologySociety of America.

Wilkinson, G.G., 1996, A review of current issues inthe integration of GIS and remote sensing data,International journal of geographic informationsystems, 10 (1), pp.85-101 .

Woodcock, C.E. and Strahler, A.H., 1987, The factor ofscale in remote sensing, Remote Sensing ofEnvironment, 21, pp.311-332.

Woodcock, C.E. and Gopal, S., 2000, Fuzzy set theoryand thematic maps: accuracy assessment and areaestimation, International Journal of GeographicalInformation Science, vol.14, no.2, pp.153-157.

Worboys, M.F., 1998a, Imprecision in finite resolutionspatial data, Geoinformatica, 2(3), pp. 257-280.

Worboys, M.F., 1998b, Computation with imprecisegeospatial data. Computers, Environment andUrban Systems, vol.22, (no.2), pp.85-106.

Zadeh, L.A., 1965, Fuzzy sets, Information andcontrol, 8:338-353.

THE DEPARTMENT OF PHYSICAL GEOGRAPHY, STOCKHOLM UNIVERSITY,Dissertation Series

PETER JANSSON, 1994. Studies of Short-term Variations in Ice Dynamics, Storglaciären, Sweden.Dissertation No. 1. (Fakultetsopponent: Prof. Hans Röthlisberger.)

ELISABETH ISAKSSON, 1994. Climate Records From Shallow Firn Cores, Dronning Maud Land, Antarctica.Dissertation No. 2. (Fakultetsopponent: Prof. Claus Hammer.)

KARIN HOLMGREN, 1995. Late Pleistocene Climatic and Environmental Changes in Central Southern Africa.Dissertation No. 3. (Fakultetsopponent: Prof. Steve Trudgill.)

PIUS ZEBHE YANDA, 1995. Temporal and Spatial Variations of Soil Degradation in Mwisanga Catchment,Kondoa, Tanzania. Dissertation No. 4. (Fakultetsopponent: Prof. Jonas Åkerman.)

ANDERS MOBERG, 1996. Temperature Variations in Sweden Since the 18th Century. Dissertation No. 5.(Fakultetsopponent: Prof. Philip. D. Jones.)

ARJEN P. STROEVEN, 1996. Late Tertiary Glaciations and Climate Dynamics in Antarctica: Evidence fromthe Sirius Group, Mount Fleming, Dry Valleys. Dissertation No. 6. (Fakultetsopponent: Prof. DavidSugden.)

ANNIKA DAHLBERG, 1996. Interpretations of Environmental Change and Diversity: A Study from North EastDistrict, Botswana. Dissertation No. 7. (Fakultetsopponent: Prof. Andrew Warren.)

HELLE SKÅNES, 1996. Landscape Change and Grassland Dynamics. Retrospective Studies Based on AerialPhotographs and Old Cadastral Maps During 200 Years in South Sweden. Dissertation No. 8.(Fakultetsopponent: Prof. Hubert Gulinck.)

CLAS HÄTTESTRAND, 1997. Ribbed Moraines and Fennoscandian Palaeoglaciology. Dissertation No. 9.(Fakultetsopponent: Prof. Chris Clark.)

GUÐRÚN GÍSLADÓTTIR, 1998. Environmental Characterisation and Change in South-western Iceland.Dissertation No. 10. (Fakultetsopponent: Prof. Jesper Brandt)

JENS-OVE NÄSLUND, 1998. Ice Sheet, Climate, and Landscape Interactions in Dronning Maud Land,Antarctica. Dissertation No. 11. (Fakultetsopponent: Prof. David Sugden)

MATS G. ERIKSSON, 1998. Landscape and Soil erosion History in Central tanzania. A Study Based onLacustrine, Colluvial and Alluvial Deposits. Dissertation No. 12. (Fakultetsopponent: Prof. MichaelThomas)

NKOBI M. MOLEELE, 1999: Bush encroachment and the role of browse in cattle production. An ecologicalperspective from a bush encrached grazing system, Olifants Drift, Kgatleng District, SoutheastBotswana. Dissertation No. 13. (Fakultetsopponent: Prof. Norman Owen-Smith)

LARS-OVE WESTERBERG, 1999: Mass movement in east African highlands: Processes, effects and scarrecovery. Dissertation No. 14. (Fakultetsopponent: Prof. Ana L. Coelho Netto)

MALIN M. STENBERG, 2000: Spatial Variability and Temporal Changes in Snow Chemistry, Dronning MaudLand, Antarctica. Dissertation No. 15. (Fakultetsopponent: Prof. Jon-Ove Hagen)

OLA AHLQVIST, 2000, Context Sensitive Transformation of Geographic Information, Dissertation No. 16.(Fakultetsopponent: Prof. Peter Fisher)

DEPARTMENT OF PHYSICAL GEOGRAPHY, STOCKHOLM UNIVERSITY,Doctor of Philosophy Dissertations Before 1994

SIGURDUR THORARINSSON, 1944. Tefrokronologiska studier på Island. Πjórsárdálur och dess förödelse.*CARL MANNERFELT, 1945. Några glacialmorfologiska formelement.*O. HARALD JOHNSSON, 1946. Termisk-hydrologiska studier i sjön Klämmingen.*CARL C. WALLÉN, 1949. Glacial meteorological investigations on the Kårsa Glacier in Swedish Lappland

1942–1948.*CARL-GUSTAV HOLDAR, 1957. Deglaciationsförloppet i Torneträsk-området.VALTER SCHYTT, 1958. A. Snow studies at Maudheim. B Snow studies inland. C. The inner structure of the

ice shelf at Maudheim as shown by core drilling.GUNNAR ØSTREM, 1965. Studies of ice-cored moraines.P. J. WILLIAMS, 1969. Properties and behaviour of freezing soils.LEIF WASTENSON, 1970. Flygbildstolkning av berghällar och blockmark. Metodstudier av

tolkningsmöjligheten i olika flygbildsmaterial.LARS-ERIK ÅSE, 1970. Shore-displacement in Eastern Svealand and Åland during the last 4,000 years.STIG JONSSON, 1970. Strukturstudier av subpolär glaciäris från Kebnekaiseområdet.BO STRÖMBERG, 1971. Isrecessionen i området kring Ålands hav.ERIK BERGSTRÖM, 1973. Den prerecenta lokalglaciationens utbredningshistoria inom Skanderna.KURT VIKING ABRAHAMSSON, 1974. Äkäslompol-områdets glacialmorfologi och deglaciation.DIETRICH SOYEZ, 1974. Geomorfologiska inventeringar och deglaciationsstudier i Dalarna och Västerbotten.WESTON BLAKE JR., 1975. Studies of glacial history in the Queen Elizabeth Islands, Canadian arctic

archipelago.WIBJÖRN KARLÉN, 1976. Holocene climatic fluctuations indicated by glacier and tree-limit variations in

northern Sweden.BENGT LUNDÉN, 1977. Jordartskartering med flygbildsteknik. En metodundersökning i olika bildmaterial.MARGARETA IHSE, 1979. Flygbildstolkning av vegetation. Metodstudier för översiktlig kartering.LILL LUNDGREN, 1980. Soil erosion in Tanzanian mountain areas.KURT-ÅKE ZAKRISSON, 1980. Vårflödesprognoser med utgångspunkt från snötaxeringar i

Malmagenområdet.OLLE MELANDER, 1980. Inlandsisens avsmältning i norvästra Lappland.ANN-CATHRINE ULFSTEDT, 1980. Fjällen i Västerbotten och södra Norrbotten. Anmärkningar om

geomorfologi och isrecession.WOLTER ARNBERG, 1981. Multispectral reflectance measurements and signature analysis.CARL CHRISTIANSSON, 1981. Soil erosion and sedimentation in Semi-arid Tanzania.JOHAN KLEMAN, 1985. The spectral reflectance of coniferous tree stands and of barley influenced by stress.

An analysis of field measured spectral data.LENNART VILBORG, 1985. Cirque forms in northern and central Sweden. Studies 1959–1985.PER HOLMLUND, 1988. Studies of the drainage and the response to climatic change of Mikkaglaciären and

Storglaciären.HENRIK ÖSTERHOLM, 1989. The glacial history of Prins Oscars Land Nordaustlandet, Svalbard.LAINE BORESJÖ, 1989. Multitemporal analysis of satellite data for vegetation mapping in Sweden.INGMAR BORGSTRÖM, 1989. Terrängformerna och den glaciala utvecklingen i södra fjällen.CHRISTIAN BRONGE, 1989. Climatic aspects of hydrology and lake sediments with examples from northern

Scandinavia and Antarctica.KJELL WESTER, 1992. Spectral signature measurements and image processing for geological remote sensing.GUNHILD ROSQVIST, 1992. Late Quaternary equatorial glacier fluctuations.MAJ-LIZ NORDBERG, 1993. Integration of remote sensing and ancillary data for geographical analysis and

risk assessment of forest regeneration failures.

* Dissertations in Physical Geography from the Department of Geography, Stockholm University College

CONTEXT SENSITIVE TRANSFORMATION OF GEOGRAPHIC …190847/FULLTEXT01.pdf · This research is...

Documents

Transcript of CONTEXT SENSITIVE TRANSFORMATION OF GEOGRAPHIC …190847/FULLTEXT01.pdf · This research is...