Geographic Information Retrieval (GIR) Definitions: Geographic information retrieval (GIR) is...


Geographic Information Retrieval (GIR)

•Definitions: Geographic information retrieval (GIR) is concerned with spatial approaches to the retrieval of geographically referenced, or georeferenced, information objects (GIOs):

•Information objects that are about specific regions or features on or near the surface of the Earth.

•Geospatial data are a special type of georeferenced information that encodes a specific geographic feature or set of features along with associated attributes

–maps, air photos, satellite imagery, digital geographic data, etc.

Georeferencing and GIR

•Within a GIR system, e.g., a geographic digital library, information objects can be georeferenced by place names or by geographic coordinates (i.e. longitude & latitude)

GIR is not GIS

•GIS is concerned with spatial representations, relationships, and analysis at the level of the individual spatial object or field.

•GIR is concerned with the retrieval of geographic information resources (and geographic information objects at the set level) that may be relevant to a geographic query region.

Spatial Approaches to GIR

•A spatial approach to geographic information retrieval is one based on the integrated use of spatial representations and spatial relationships.

•A spatial approach to GIR can be qualitative or quantitative

–Quantitative: based on the geometric spatial properties of a geographic information object

–Qualitative: based on the non-geometric spatial properties.

•Based on the coordinate encoding of spatial representations

•The coordinate representation is used to geometrically determine:

•topological spatial relationships,

•metric spatial characteristics

Topological Relationships

•Relative spatial relations without concern for measurable distance or absolute direction.

•Possible topological relationships between two overlapping regions:

Spatial Representation - Considerations

•What spatial characteristics should be represented and how?

•How much coordinate detail is needed to represent geographic features or regions?

•How can imprecise geographic regions be represented?

•How do the representation choices impact the types of queries that can be asked and the spatial operations that can be performed within the system?

•How do the representation choices impact storage and system performance?

•How do the representation choices impact those people who will need to catalog (i.e. encode) these representations and those who will use these representations?

Geometric Approximations

•The decomposition of spatial objects into approximate representations is a common approach to simplifying complex and often multi-part coordinate representations

•Types of Geometric Approximations

•Conservative: superset (the approximation encloses the whole object)

•Progressive: subset (the approximation lies inside the object)

•Generalizing: could be either

•Concave or Convex

–Geometric operations are faster on convex polygons

Goals of a Geometric Approximation

•Quality: The quality criterion concerns maximizing the degree to which the approximation models the complex spatial object from which it was derived.

•A simple measure of quality is area: the greater the difference in area between an approximation and a spatial object, the poorer the quality. Given this definition, a point representation is the poorest quality approximation of a geographic region or feature, since all of these have area.

•Simplicity: The simplicity criterion concerns maximizing the computational efficiencies of storage, disk access, and computational geometry operations that utilize the approximation.

–Factors that contribute to simplicity include: fewer points needed to encode the representation; fewer parts, simple polygons without holes, and convexity rather than concavity.

–In the context of GIR, simplicity also has to involve a consideration of how easy it is to create (catalog) the approximation and for the end-user to understand the approximation.
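The area-based quality measure described above can be sketched in a few lines. This is a minimal illustration, assuming planar coordinates and a simple polygon without holes; the function names are hypothetical, and normalising the false area by the object's own area is an assumption, since the text only says that a larger area difference means poorer quality.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Area of a simple polygon (no holes) using the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def mbr_area(ring: List[Point]) -> float:
    """Area of the axis-aligned minimum bounding rectangle of the polygon."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def false_area_ratio(ring: List[Point]) -> float:
    """Area the MBR adds beyond the object, relative to the object's area.

    Assumption: normalised by the object's area; 0.0 means a perfect fit.
    Undefined (division by zero) for degenerate polygons with no area.
    """
    obj = polygon_area(ring)
    return (mbr_area(ring) - obj) / obj


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print(false_area_ratio(triangle))  # 1.0: the MBR is twice the triangle's area
```

A point footprint has no area at all, which is why the text ranks it as the poorest approximation under this measure.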

Geographic Information Retrieval (GIR): Searching Where and What

Ray R. Larson and Patricia Frontiera: School of Information Management & Systems and College of Environmental Design, University of California, Berkeley -- [email protected]

Acknowledgements

This research was sponsored at U.C. Berkeley by the National Science Foundation and the Joint Information Systems Committee (UK) under the International Digital Libraries Program award #IIS-99755164. Additional support was provided by the Institute for Museum and Library Services as part of the “Going Places in the Catalog” project.

Geographic data are an extremely important resource for a wide range of scientists, planners, policy makers, and analysts who study natural and planned environments. Notably, the landscape of geographic analysis has been changing rapidly from data and computation poor to data and computation rich. Developments in digital electronic technologies, such as satellites, integrated GPS units, digital cameras, and miniature sensors, are dramatically increasing the types and amounts of digitally available raw geographic data and derived information products. At the same time, advances in computer hardware, software and network technologies continue to improve our ability to store and analyze these large, complex data sets.

These factors are contributing to a growing political, social, scientific and economic awareness of the value of geographic information and driving new applications for its use. In response to this, geographic digital libraries that specialize in providing access to these data are growing in number, collection size, and sophistication. Moreover, mainstream digital libraries, i.e. those that deal with primarily text materials, are increasingly considering geographic access methods for information resources that, while not specifically about geographic features, have important geographic characteristics. Simply stated, most of the objects in digital libraries are, to a greater or lesser extent, about or related to particular places on or near the surface of the Earth.

Place name georeferencing is extremely effective because names are the primary means by which people refer to geographic locations. However, place names have well-documented lexical and geographical problems. Lexical problems include lack of uniqueness, alternate names or spellings, and name changes. Geographical problems include boundaries that change over time, places with ambiguous boundaries, and geographic features or areas of interest without known place names. Unlike place names, geographic coordinate representations provide an unambiguous and persistent method for locating geographic areas or features. However, the use of coordinates presents many challenges in terms of storage, indexing, processing and user interface design that only recently have begun to be investigated in the context of geographic information retrieval (GIR).

[Figure: georeferencing example. Place name: San Francisco Bay Area; coordinates: -122.418, 37.775]

The Geographic Footprint

•In GIR applications, the geographic footprint is typically the only quantitative spatial characteristic that is encoded and utilized.

•The footprint is a geometric representation of the extent of the geographic content of the information object being described. It is usually expressed in geographic coordinates (i.e. latitude and longitude).

•Points: maintain a general sense of location but not extent or shape

•Polygons: identify location, extent, and shape with varying degrees of precision

–The minimum aligned bounding rectangle (MBR) is the most commonly used polygonal spatial representation in GIR systems.
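As a rough illustration of the footprint types listed above, the sketch below derives an MBR from a set of (longitude, latitude) pairs and reduces it to a point. The `MBR` class and its field names are hypothetical, the sample outline is made up for the example, and regions that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MBR:
    """Axis-aligned bounding rectangle stored as four parameters."""
    west: float
    south: float
    east: float
    north: float

    @classmethod
    def from_points(cls, points: Iterable[Tuple[float, float]]) -> "MBR":
        """Build the MBR of (longitude, latitude) pairs."""
        pts = list(points)
        lons = [lon for lon, _ in pts]
        lats = [lat for _, lat in pts]
        return cls(min(lons), min(lats), max(lons), max(lats))

    def center(self) -> Tuple[float, float]:
        """Point footprint: keeps a sense of location, drops extent and shape."""
        return ((self.west + self.east) / 2, (self.south + self.north) / 2)


# Hypothetical outline of a region, as (longitude, latitude) pairs.
outline = [(-123.0, 37.0), (-122.0, 38.5), (-121.5, 37.5)]
box = MBR.from_points(outline)
print(box)           # MBR(west=-123.0, south=37.0, east=-121.5, north=38.5)
print(box.center())  # (-122.25, 37.75)
```

Storing only four numbers per footprint is what makes the MBR cheap to catalog and index, at the cost of including area that the object does not actually cover.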

Spatial query formation

•A spatial approach to GIR requires a geographic interface to support spatial thinking and query formation.

Spatial Queries – key issues

•Communicating with the user: if the user selects a place name from a list, what type of geometric approximation is used to represent the query region (a point, a simple bounding box polygon, a complex polygon)?

•The level of detail in a graphic interface needs to be sufficient to support geographic queries.

•How can queries for more complex spatial characteristics be supported?

–Density, dispersion, pattern

Spatial Query Example: 1st and 2nd generation interfaces from the FGDC/NSDI Efforts:

Presentation of Results and Spatial evaluation

•The user interface for a GIR system requires a geographic interface to support the formation of spatial queries. This component is also needed to present results to the user and to assist the user in evaluating them. This type of display also helps users understand how the system has interpreted the geospatial aspect of their query and matched it against the available information objects.

Example: Results display from CheshireGeo:

Spatial Ranking methods from the Literature:

Spatial Similarity Measures Matching and Spatial Ranking

•Spatial similarity can be considered as an indicator of relevance: documents whose spatial content is more similar to the spatial content of the query will be considered more relevant to the information need represented by the query.

•Need to consider both: qualitative, non-geometric spatial attributes and quantitative, geometric spatial attributes

•Three basic approaches to spatial similarity measures and ranking

[Figure panels: Geodata.gov; NSDI Clearinghouse]

The Geodata.gov site provides better support for a query on wetlands near Petaluma because of the increased cartographic detail that appears as the user zooms in. (You can’t even find Petaluma on the NSDI site, and you can get even more lost if you zoom in further.)

Method 1: Simple Overlap

•Candidate geographic information objects (GIOs) that have any overlap with the query region are retrieved.

•Included in the result set are any GIOs that are contained within, overlap, or contain the query region.

•The spatial score for all GIOs is either relevant (1) or not relevant (0).

•The result set cannot be ranked

–topological relationship only, no metric refinement

Method 2: Topological Overlap

•Spatial searches are constrained to only those candidate GIOs that either:

•are completely contained within the query region,

•overlap with the query region,

•or, contain the query region.

•Each category is exclusive and all retrieved items are considered relevant.

•The result set cannot be ranked

•categorized topological relationship only,

•no metric refinement
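Methods 1 and 2 can be sketched for MBR footprints as follows. This is an illustration under stated assumptions, not the implementation of any particular system: `Box`, `classify`, and `simple_overlap_score` are hypothetical names, boxes that only touch along an edge are treated as not overlapping, and identical boxes are reported as "within".

```python
from typing import NamedTuple


class Box(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def intersects(a: Box, b: Box) -> bool:
    """True when the two boxes share some interior area."""
    return (a.west < b.east and b.west < a.east
            and a.south < b.north and b.south < a.north)


def contains(outer: Box, inner: Box) -> bool:
    """True when inner lies completely inside outer (shared edges allowed)."""
    return (outer.west <= inner.west and outer.south <= inner.south
            and outer.east >= inner.east and outer.north >= inner.north)


def classify(query: Box, gio: Box) -> str:
    """Method 2: place a candidate GIO in exactly one topological category."""
    if not intersects(query, gio):
        return "disjoint"
    if contains(query, gio):
        return "within"      # GIO completely contained within the query region
    if contains(gio, query):
        return "contains"    # GIO contains the query region
    return "overlaps"        # partial overlap only


def simple_overlap_score(query: Box, gio: Box) -> int:
    """Method 1: binary score, 1 for any overlap and 0 otherwise."""
    return 0 if classify(query, gio) == "disjoint" else 1


query = Box(0, 0, 10, 10)
print(classify(query, Box(2, 2, 4, 4)))                  # within
print(classify(query, Box(5, 5, 15, 15)))                # overlaps
print(simple_overlap_score(query, Box(20, 20, 30, 30)))  # 0
```

Neither function produces a graded score, which matches the point above that these two methods cannot rank the result set.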

Method 3: Degree of Overlap

•Candidate geographic information objects (GIOs) that have any overlap with the query region are retrieved.

•A spatial similarity score is determined based on the degree to which the candidate GIO overlaps with the query region.

•The greater the overlap with respect to the query region, the higher the spatial similarity score.

•This method provides a score by which the result set can be ranked

•topological relationship: overlap

•metric refinement: area of overlap
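A ranking by degree of overlap might look like the sketch below. It assumes MBR footprints given as (west, south, east, north) tuples, a query region with non-zero area, and planar areas computed directly from the coordinates, which is a simplification for geographic data. The function names and the sample candidates are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north)


def area(box: Box) -> float:
    west, south, east, north = box
    return max(0.0, east - west) * max(0.0, north - south)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the rectangle shared by two boxes, or 0.0 if they are disjoint."""
    west, south = max(a[0], b[0]), max(a[1], b[1])
    east, north = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, east - west) * max(0.0, north - south)


def rank_by_overlap(query: Box, candidates: Dict[str, Box]) -> List[Tuple[str, float]]:
    """Score each overlapping GIO by overlap area / query area, best first."""
    scored = []
    for name, box in candidates.items():
        shared = overlap_area(query, box)
        if shared > 0:  # retrieve only candidates with some overlap
            scored.append((name, shared / area(query)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


query = (0.0, 0.0, 10.0, 10.0)
candidates = {
    "corner": (8.0, 8.0, 20.0, 20.0),      # covers 4% of the query region
    "west half": (0.0, 0.0, 5.0, 10.0),    # covers 50% of the query region
    "far away": (50.0, 50.0, 60.0, 60.0),  # no overlap, not retrieved
}
print([name for name, _ in rank_by_overlap(query, candidates)])
# ['west half', 'corner']
```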

Our approach is a probabilistic estimate of the probability of relevance, based on logistic regression from a sample of data with relevance judgements.

•Test Data

•2554 metadata records indexed by 322 unique geographic regions (represented as MBRs) and associated place names.

–2072 records (81%) indexed by 141 unique CA place names

•881 records indexed by 42 unique counties (out of a total of 46 unique counties indexed in the CEIC collection)

•427 records indexed by 76 cities (of 120)

•179 records by 8 bioregions (of 9)

•3 records by 2 national parks (of 5)

•309 records by 11 national forests (of 11)

•3 records by 1 regional water quality control board region (of 1)

•270 records by 1 state (CA)

–482 records (19%) indexed by 179 unique user defined areas (approx 240) for regions within or overlapping CA

•12% represent onshore regions (within the CA mainland)

•88% (158 of 179) offshore or coastal regions

•Geographic Approximations for CA Counties, UDAs, and training sample:

Logistic Regression Model #1

•X1 = area of overlap(query region, candidate GIO) / area of query region

•X2 = area of overlap(query region, candidate GIO) / area of candidate GIO

•Where: Range for all variables is 0 (not similar) to 1 (same)
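The two variables can be computed as in the sketch below, assuming both regions are MBRs with non-zero area given as (west, south, east, north) tuples and that planar areas are acceptable. `overlap_features` is a hypothetical name. The example shows why both ratios are useful: a small GIO inside a large query region and a large GIO around a small query region each score 1 on one variable and low on the other.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def overlap_features(query: Box, gio: Box) -> Tuple[float, float]:
    """Return (X1, X2): overlap as a share of the query and of the GIO."""
    shared = overlap_area(query, gio)
    return shared / area(query), shared / area(gio)


query = (0.0, 0.0, 10.0, 10.0)
small_gio = (2.0, 2.0, 4.0, 4.0)       # lies entirely inside the query region
large_gio = (-10.0, -10.0, 20.0, 20.0)  # entirely contains the query region

print(overlap_features(query, small_gio))  # X1 is small, X2 is 1.0
print(overlap_features(query, large_gio))  # X1 is 1.0, X2 is small
```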

•Results in Mean Average Query Precision (the mean of the precision values obtained after each new relevant document is observed in a ranked list).

•For metadata indexed by CA named place regions, and for all metadata in the test collection:
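The evaluation measure named above can be sketched as follows. The function names and document identifiers are hypothetical, and dividing by the total number of relevant documents (so that relevant documents never retrieved count as zero) is a common convention assumed here rather than something the text specifies.

```python
from typing import Iterable, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at each relevant document's rank."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    # Assumed convention: normalise by all relevant documents, retrieved or not.
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]]) -> float:
    """Average the per-query average precision over all queries."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# One query: relevant documents appear at ranks 1 and 3.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(round(ap, 4))  # 0.8333, i.e. (1/1 + 2/3) / 2
```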

One factor missing from these results is how much of the areas are offshore and how much onshore. So we add a ShoreFactor variable…

Logistic Regression Model #2

•X1 = area of overlap(query region, candidate GIO) / area of query region

•X2 = area of overlap(query region, candidate GIO) / area of candidate GIO

•X3 = 1 – abs(fraction of overlap region that is onshore – fraction of candidate GIO that is onshore)

Results for both models over all test data…

These results suggest:

•Addition of the Shorefactor variable improves the model (LR 2), especially for MBRs

•Improvement is not so dramatic for convex hull approximations, because the problem that Shorefactor addresses is not that significant when areas are represented by convex hulls.

Conservative Approximations

1) Minimum Bounding Circle (3)

2) MBR: Minimum aligned Bounding rectangle (4)

3) Minimum Bounding Ellipse (5)

4) Rotated minimum bounding rectangle (5)

5) 4-corner convex polygon (8)

6) Convex hull (varies)

Presented in order of increasing quality. The number in parentheses denotes the number of parameters needed to store the representation.

After Brinkhoff et al, 1993b

$$P(R \mid Q, D) = c_0 + \sum_{i=1}^{m} c_i X_i$$

•X1 = area of overlap(query region, candidate GIO) / area of query region

•X2 = area of overlap(query region, candidate GIO) / area of candidate GIO

•X3 = 1 – abs(fraction of overlap region that is onshore – fraction of candidate GIO that is onshore)

•Where: Range for all variables is 0 (not similar) to 1 (same)
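To show how such a model could turn the variables into a ranking score, the sketch below follows the standard logistic regression formulation, in which the linear combination is treated as log odds and mapped to a probability with the logistic function. Whether the equation above abbreviates that step is an assumption made here. The coefficient values are hypothetical placeholders chosen for the example, not the fitted values from this study.

```python
import math
from typing import Sequence


def probability_of_relevance(c0: float,
                             coeffs: Sequence[float],
                             features: Sequence[float]) -> float:
    """Logistic model: log odds = c0 + sum(c_i * X_i), then logistic transform."""
    if len(coeffs) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    log_odds = c0 + sum(c * x for c, x in zip(coeffs, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for (X1, X2, X3); NOT the values fitted in the study.
C0 = -3.0
COEFFS = [2.0, 2.0, 1.0]

strong = probability_of_relevance(C0, COEFFS, [0.9, 0.8, 0.95])
weak = probability_of_relevance(C0, COEFFS, [0.1, 0.05, 0.4])
print(strong > weak)  # True: more overlap and similar shore profile score higher
```

Candidates would then be sorted by this probability in descending order to produce the ranked result list.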

[Figure labels: Counties, Cities, National Parks, National Forests, Water QCB Regions, Bioregions; MBRs; Convex Hulls]

Ave. False Area of Approximation: MBRs: 94.61%; Convex Hulls: 26.73%

42 of 58 counties referenced in the test collection metadata

• 10 counties randomly selected as query regions to train LR model

• 32 counties used as query regions to test model

These results suggest:

•Convex Hulls perform better than MBRs

•Expected result given that the CH is a higher quality approximation

•A probabilistic ranking based on MBRs can perform as well as, if not better than, a non-probabilistic ranking method based on Convex Hulls

•Since any approximation other than the MBR requires great expense, this suggests that the exploration of new ranking methods based on the MBR is a good way to go.

Computing Shorefactor:

Shorefactor = 1 – abs(fraction of query region approximation that is onshore – fraction of candidate GIO approximation that is onshore)

Query Region MBR

Q) Santa Clara County: fraction onshore = .95

Candidate GIO MBRs

A) GLORIA Quad 13: fraction onshore = .55

B) WATER Project Area: fraction onshore = .74

Q – A Shorefactor: 1 – abs(.95 - .55) = .60

Q – B Shorefactor: 1 – abs(.95 - .74) = .79

[Figure: map showing MBRs A, B, and Q; legend: Onshore Areas]
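The Shorefactor arithmetic above can be reproduced with a short function. This sketch takes the onshore fractions as given inputs; how those fractions are measured (for example, by intersecting each MBR with a coastline polygon) is outside its scope, and the function name is hypothetical.

```python
def shorefactor(query_onshore: float, gio_onshore: float) -> float:
    """1 minus the absolute difference between the two onshore fractions."""
    for value in (query_onshore, gio_onshore):
        if not 0.0 <= value <= 1.0:
            raise ValueError("onshore fractions must lie between 0 and 1")
    return 1.0 - abs(query_onshore - gio_onshore)


# Values from the worked example: Q = Santa Clara County (.95),
# A = GLORIA Quad 13 (.55), B = WATER Project Area (.74).
print(round(shorefactor(0.95, 0.55), 2))  # 0.6
print(round(shorefactor(0.95, 0.74), 2))  # 0.79
```

A value near 1 means the query region and the candidate have similar onshore shares, so candidate B is the closer match to Q on this variable.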