Edinburgh Euro GeoInf 2007 1
Geographical Web Search Engines and
Geographical Information Retrieval (GIR)
Christopher Jones
Cardiff University
Where is Geo-information?
Personal knowledge (in our heads)
– of landscape, of where things, people and services are located, where things happened…
Documents (various media)
– Lists of where facilities, resources, structures are located
– Textual descriptions of geographic phenomena
– Images and videos of geographic space
Maps
GIS and the Web
A GIS is typically:
– Isolated
– Supports an individual organisation
– Accessed privately
– Small range of topics
– Structured data / geo-coded locations
– Finds answers
– Complicated to use
The World Wide Web is:
– Globally networked
– Supports everyone on the Internet
– Accessed publicly
– Vast range of topics
– Unstructured free text / images
– Finds documents
– Easy to use
WWW as a source of geo-information
• Geographic context embedded in natural language descriptions
• Web queries depend on exact match of text terms
• No intelligent interpretation of spatial relationships (“near”, “west” etc)
• Place names ambiguous and confused with names of organisations, people, buildings and streets
• No geo-relevance ranking
Current motivation of GIR: find geo-specific resources on the Web (mostly documents and images)
Find web resources about: Something related_to Somewhere
related_to = in, near, within X km, north_of, etc.
• Resolve ambiguity of names (many places have same name)
• Interpret the query's spatial relationships to produce a query footprint
• Find documents geographically associated with region of query footprint
• Relevance rank geographically by place and subject
[Figure labels: near, north]
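The "Something related_to Somewhere" form above can be sketched as a small parser. This is only an illustration: the `GeoQuery` type, the `parse_query` function and the short relation vocabulary are assumptions made for the example, not part of any system described in the talk.

```python
import re
from typing import NamedTuple, Optional


class GeoQuery(NamedTuple):
    something: str                       # the subject, e.g. "castles"
    relation: str                        # e.g. "in", "near", "within", "north_of"
    somewhere: str                       # the place name, still unresolved
    distance_km: Optional[float] = None  # only set for "within X km of"


# Hypothetical, deliberately small vocabulary of spatial relations.
_PATTERN = re.compile(
    r"^(?P<something>.+?)\s+"
    r"(?:(?P<within>within)\s+(?P<km>\d+(?:\.\d+)?)\s*km\s+of"
    r"|(?P<rel>in|near|outside|north of|south of|east of|west of))\s+"
    r"(?P<somewhere>.+)$",
    re.IGNORECASE,
)


def parse_query(text: str) -> Optional[GeoQuery]:
    """Split a query into (something, relation, somewhere), or return None."""
    m = _PATTERN.match(text.strip())
    if m is None:
        return None
    if m.group("within"):
        return GeoQuery(m.group("something"), "within",
                        m.group("somewhere"), float(m.group("km")))
    relation = m.group("rel").lower().replace(" ", "_")
    return GeoQuery(m.group("something"), relation, m.group("somewhere"))
```

For example, `parse_query("hotels within 30 km of Glasgow")` yields a `GeoQuery` whose relation is `"within"` and whose `distance_km` is `30.0`. The place name is left as text, since resolving it to a footprint is a separate step.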
Geographical Search Engines
• Google etc. have “local” versions.
– Based on business (yellow pages) directories.
Geographical Search Engines
SPIRIT research prototype: general geo-web search
Structured user interface:
Dropdown menu of spatial relationships
Geographical search engines
SPIRIT
Results listed as URLs plus symbols on map
User Interface screen shots from Ross Purves et al University of Zurich
Anatomy of a Geographical Search Engine
[Diagram: User Interface, Broker, Query disambiguation, Place Ontology, Search Engine, Indexes (Textual, Spatial), Relevance Ranking, Geo-tagging, Text Indexing, Web Resources and Document Footprints; labelled flows: Search Request + Query footprint, Unranked Results, Ranked Results]
Geo-Tagging = Geo-parsing + Geo-coding
Geo-parsing
– Recognising genuine geographic references (place names, addresses, post codes, phone codes), ignoring non-geographic uses.
Geo-coding
– Attaching a unique quantitative location (footprint) to each geographic reference
Geo-Parsing : true & false references
Some types of false geographic reference
• Personal names: Smedes York
• Business names: Dorchester Hotel, York Properties…
• Street names: Oxford Street, London Road…
• Common words that are also places: urban, institute, land, battle, derby, over, well…
Geo-Parsing : distinguishing between false and true geo-references
Look for patterns and context
Personal names (Jack London, Mr York): <First_name> <Location>; <Title> <Location>
Business names (Paris Hotel): <Business_type> <Location> (or vice versa)
Street names (Oxford Street): <Location> <Road_type>
Detect spatial prepositions (in, near, south of, outside, etc.): “he lived in Over”
Genuine occurrences can be used to train machine-learning models
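The context patterns above can be sketched as a rule-based classifier over neighbouring tokens. The word lists and the `classify_mention` function are hypothetical and far too small for real use; a practical geo-parser would rely on much larger lexicons or a trained model.

```python
# Hypothetical, tiny vocabularies for illustration only.
FIRST_NAMES = {"jack", "smedes"}
TITLES = {"mr", "mrs", "ms", "dr"}
BUSINESS_TYPES = {"hotel", "properties"}
ROAD_TYPES = {"street", "road", "avenue", "lane"}
SPATIAL_PREPOSITIONS = {"in", "near", "outside"}  # single-word prepositions only


def classify_mention(tokens, i):
    """Classify tokens[i], assumed to match a gazetteer name, by its context.

    Returns "person", "street", "business", "place" or "unknown".
    """
    prev_tok = tokens[i - 1].lower().strip(".,") if i > 0 else ""
    next_tok = tokens[i + 1].lower().strip(".,") if i + 1 < len(tokens) else ""

    if prev_tok in FIRST_NAMES or prev_tok in TITLES:
        return "person"      # <First_name> <Location>; <Title> <Location>
    if next_tok in ROAD_TYPES:
        return "street"      # <Location> <Road_type>
    if prev_tok in BUSINESS_TYPES or next_tok in BUSINESS_TYPES:
        return "business"    # <Business_type> <Location> or vice versa
    if prev_tok in SPATIAL_PREPOSITIONS:
        return "place"       # preceded by a spatial preposition
    return "unknown"         # no rule fired; leave to another method
```

The street rule is checked before the preposition rule so that "in London Road" is treated as a street rather than as the place London.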
Geo-coding (grounding) the genuine geo-references
Many different places with the same name (referent ambiguity): Newport, Cambridge, Springfield…
Use context to decide (references to parent or nearby places)
Or choose the most important one (by population or place-type hierarchy)
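The two strategies above (context first, importance as a fallback) can be sketched as follows. The gazetteer is a toy: the coordinates are approximate, and the `importance` values are illustrative ranks invented for the example, not real population figures. The names `Candidate`, `GAZETTEER` and `ground` are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str        # containing region, used as disambiguating context
    lat: float         # approximate point footprint
    lon: float
    importance: int    # illustrative rank only (higher = more important)


# Toy gazetteer with three places called Newport.
GAZETTEER = [
    Candidate("Newport", "Wales", 51.6, -3.0, 3),
    Candidate("Newport", "Rhode Island", 41.5, -71.3, 2),
    Candidate("Newport", "Isle of Wight", 50.7, -1.3, 1),
]


def ground(name: str, context_text: str) -> Optional[Candidate]:
    """Pick one referent for `name`, or None if it is not in the gazetteer."""
    candidates = [c for c in GAZETTEER if c.name == name]
    if not candidates:
        return None
    context = context_text.lower()
    # Prefer candidates whose parent place is mentioned in the context.
    supported = [c for c in candidates if c.parent.lower() in context]
    pool = supported or candidates
    # Fall back to (or break ties by) the importance rank.
    return max(pool, key=lambda c: c.importance)
```

Matching parent names by substring is a simplification; a fuller version would also consider nearby places, as the slide suggests.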
Indexing Web Resources
The standard text index is an inverted file
Query: Restaurants in Cardiff
Find documents that contain all terms
Works literally for “in” but won’t find contained places.
Doesn’t work in general for “near”, “Xkms from”, “north_of” etc
Text term     List of resources containing term
apple         Doc79, Doc89, Doc822, …
Cardiff       Doc2, Doc19, Doc37, …
door          Doc16, Doc49, Doc112, …
hotel         Doc1, Doc2, Doc23, …
in            Doc4, Doc7, Doc19, …
London        Doc20, Doc35, Doc150, …
pub           Doc9, Doc11, Doc100, …
restaurant    Doc19, Doc22, Doc37, …
…             …
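A minimal sketch of "find documents that contain all terms", using only the postings visible on the slide (the real lists are elided, so the results here are illustrative). The `search_all_terms` function is a hypothetical name, and the sketch assumes the query has already been reduced to index terms (for example "Restaurants" to "restaurant").

```python
# Postings copied from the visible part of the slide's inverted file.
INDEX = {
    "apple": {"Doc79", "Doc89", "Doc822"},
    "cardiff": {"Doc2", "Doc19", "Doc37"},
    "door": {"Doc16", "Doc49", "Doc112"},
    "hotel": {"Doc1", "Doc2", "Doc23"},
    "in": {"Doc4", "Doc7", "Doc19"},
    "london": {"Doc20", "Doc35", "Doc150"},
    "pub": {"Doc9", "Doc11", "Doc100"},
    "restaurant": {"Doc19", "Doc22", "Doc37"},
}


def search_all_terms(index, terms):
    """Return the documents whose postings include every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)
```

With these postings, `search_all_terms(INDEX, ["restaurant", "in", "Cardiff"])` keeps only documents that literally contain the word "in". A document about a restaurant in a district of Cardiff that never names Cardiff is missed, and a term such as "near" is matched as a word rather than interpreted as proximity.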
Why Spatial Indexing?
Query: “Hotels outside and within 30 km of Glasgow”
Need to find documents referring to hotels that are in places other than Glasgow
Query: “Castles in Wales”
Need to find documents that refer to names of places in Wales (perhaps without mentioning “Wales”)
• In both cases, conventional text indexing would require the query to contain the names of all places in Wales, or of all places outside Glasgow but within 30 km of it
Spatial indexing of resources
• Use dominant geographic references of documents to create document footprints (point, polygon, bounding rectangle…)
• Use footprints to index documents
• Convert query to a query footprint
• Match query footprint to document footprints
[Figure labels: Spatial Query, Result]
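The matching step can be sketched with bounding rectangles as footprints. This is a minimal illustration with made-up coordinates: `BBox`, `intersects` and `spatial_search` are hypothetical names, and the linear scan stands in for a real spatial index structure such as a grid or an R-tree.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two rectangles overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def spatial_search(doc_footprints: Dict[str, BBox], query: BBox) -> List[str]:
    """Return ids of documents whose footprint intersects the query footprint."""
    return sorted(doc_id for doc_id, footprint in doc_footprints.items()
                  if intersects(footprint, query))


# Made-up footprints; a point is stored as a zero-area rectangle.
DOC_FOOTPRINTS = {
    "DocA": BBox(0.0, 0.0, 2.0, 2.0),
    "DocB": BBox(5.0, 5.0, 6.0, 6.0),
    "DocC": BBox(1.0, 1.0, 1.0, 1.0),
}
```

Because the match is made on footprints rather than on words, a document can be retrieved for a region even if it never mentions the region's name.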
Geographical Relevance Ranking
• Determine “distance” between query footprint and document footprint
• Depends on query spatial operator (in, outside, X Kms from, north_of etc)
Spatial score
Example: “airports near Leicester”: the further the document footprint (D) is from the query footprint (Q), the lower the spatial score
Figure from Marc van Kreveld, University of Utrecht
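For the "near" operator, one way to make the score fall with distance is an exponential decay. The slide only states that the score decreases with distance, so the decay function, the `scale_km` parameter and the use of planar point footprints are all assumptions of this sketch.

```python
import math


def near_score(query_xy, doc_xy, scale_km=10.0):
    """Spatial score in (0, 1] for a 'near' query.

    query_xy, doc_xy: (x, y) positions in kilometres on a flat plane,
    standing in for the query and document footprints.
    scale_km: hypothetical tuning parameter; the score drops to 1/e at this distance.
    """
    distance = math.dist(query_xy, doc_xy)
    return math.exp(-distance / scale_km)
```

Other operators would need their own scoring functions: "in" might score containment or overlap, and "north_of" would depend on direction as well as distance.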
Combining textual and spatial scores
• Textual scores: BM25
• Spatial scores: by spatial footprint analysis
[Figure: axes are normalized BM25 score and spatial score, each from 0 to 1; labelled elements: query / ideal footprint, footprints of documents]
Figure from Marc van Kreveld University of Utrecht
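The slide does not spell out how the two scores are merged, so the following shows two common options as assumptions: a weighted sum, and closeness to an ideal point where both normalized scores equal 1. All function names and the default weight are hypothetical.

```python
import math


def normalise_by_max(scores):
    """Scale raw (e.g. BM25) scores into [0, 1] by dividing by the maximum."""
    top = max(scores, default=0.0)
    if top <= 0.0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def weighted_sum(text_score, spatial_score, w_text=0.5):
    """Linear combination of two scores that are both already in [0, 1]."""
    return w_text * text_score + (1.0 - w_text) * spatial_score


def closeness_to_ideal(text_score, spatial_score):
    """1 at the ideal point (1, 1), falling to 0 at (0, 0)."""
    gap = math.hypot(1.0 - text_score, 1.0 - spatial_score)
    return 1.0 - gap / math.sqrt(2.0)
```

With equal weights, a document scoring (0.6, 0.7) ranks above one scoring (0.9, 0.2) under both options, since a strong textual match does not compensate for a footprint far from the query.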
Place Ontology
Encodes knowledge of terminology and structure of geographic space
• alternative names, languages
• place types (political, topographic, social…)
• footprint (point, MBR, polygon)
• spatial relationships and attributes: containment, adjacency, overlap
• imprecise (vernacular) places (“Midlands”, “south of France”, “Scottish borders”, “Pennines”, “Highlands”…)
Derive from gazetteers, thesauri, maps & the web
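A place ontology entry with the properties listed above might be modelled as below. This is a simplified sketch: the `Place` fields, the `contained_places` helper and the three-entry `ONTOLOGY` are assumptions for illustration, footprints are reduced to approximate points, and adjacency and overlap are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    place_id: str
    names: Dict[str, str]                            # language code -> name
    place_type: str                                  # e.g. political, topographic
    footprint: Optional[Tuple[float, float]] = None  # approximate (lat, lon) point
    parent: Optional[str] = None                     # containment: id of the container
    vernacular: bool = False                         # imprecise place such as "Midlands"


# Toy ontology with a single containment chain.
ONTOLOGY = {
    "wales": Place("wales", {"en": "Wales", "cy": "Cymru"}, "political"),
    "cardiff": Place("cardiff", {"en": "Cardiff", "cy": "Caerdydd"}, "political",
                     footprint=(51.5, -3.2), parent="wales"),
    "roath": Place("roath", {"en": "Roath"}, "political", parent="cardiff"),
}


def contained_places(ontology: Dict[str, Place], ancestor_id: str) -> List[str]:
    """Ids of all places directly or indirectly contained in ancestor_id."""
    found = []
    for place in ontology.values():
        current = place.parent
        while current is not None:          # walk up the containment chain
            if current == ancestor_id:
                found.append(place.place_id)
                break
            current = ontology[current].parent
    return sorted(found)
```

Following containment in this way is what lets a query such as “Castles in Wales” reach documents that name only a place inside Wales.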
Roles of Place Ontology
[Diagram labels: ontology; User Interface; Query Disambiguation; Geo-Tagging; Metadata Extraction; Web collection; document footprints; Spatial Index; Search Component; Query Expansion (query footprint); Relevance Ranking]
Mining text on the web for vernacular place name knowledge
• Objective: estimate spatial extent of vague place
• Documents that refer to vague places may also refer to more precise places inside them.
• Places that occur frequently in association with a target named place may have higher chance of being inside
• Analyse frequency of occurrence of co-located places
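The frequency analysis above can be sketched as a co-occurrence count. The function name, the naive substring matching and the sample snippets are assumptions made for illustration; the snippets are invented text, not retrieved web documents.

```python
from collections import Counter


def cooccurring_places(documents, target, known_places):
    """Count, per known place, the documents that mention it together with target."""
    counts = Counter()
    target_lc = target.lower()
    for text in documents:
        lowered = text.lower()
        if target_lc not in lowered:
            continue  # only documents that refer to the vague place
        for place in known_places:
            if place.lower() != target_lc and place.lower() in lowered:
                counts[place] += 1
    return counts


# Invented snippets for illustration only.
DOCS = [
    "Walking in the Cotswolds near Cirencester",
    "Cotswolds cottages in Cirencester and Burford",
    "Trains from London to Cirencester",
]
PLACES = ["Cirencester", "Burford", "London"]
```

Places with high counts are candidates for lying inside the vague region, and their known footprints could then be used to estimate its extent.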
Places mentioned in documents retrieved by queries on the “Cotswolds”
Figure from Ross Purves et al University of Zurich
GIR and GIS
• GIR currently dominated by web search
– Unstructured results in multiple documents
• Sometimes single focused result wanted
• Hotels within 1 kilometre of the British Museum in London
• Where are pre-sixteenth century dwellings in USA?
• Which areas of East Anglia would be flooded if sea level rose by 1 metre?
Bringing GIR and GIS together
[Diagram labels: Geo-knowledge; GIS; The Web; GIR; World Knowledge]
GeoInformation Services
Encode Geo-information in Web Services (Geo-services)
• Parse natural language queries
• Interpret geo-terminology of queries
• Identify the relevant geo-services to match geo and non-geo concepts
• Compose appropriate chain of services
EU - TRIPOD Project
• Improve accessibility of images on the web
• Focus on geographical context
• Enhance captions / metadata for archival images
• Automatically generate captions for images from location- and orientation-aware cameras
• Web harvesting to enrich metadata
• Interpret (vague) spatial natural language
• Toponym ontology of places and landmarks (including vernacular places)
• Use 3D landscape models to determine what is in camera view
• Prototype image search engine
http://tripod.shef.ac.uk/index.html
Future of GIR?
• Improve “conventional GIR” components:
– Geo-tagging, spatio-textual indexing and geo-relevance ranking
• Place ontologies with world-wide coverage
• Understanding of spatial natural language
• Integrate time & space (temporal language)
• Open GeoInformation Web services
• Adapt GIR to personal needs & location