Geographical Web Search Engines and Geographical Information Retrieval (GIR)


Christopher Jones, Cardiff University

Edinburgh Euro GeoInf 2007

Where is Geo-information?

• Personal knowledge (in our heads)
– of landscape, of where things, people and services are located, where things happened…
• Documents (various media)
– Lists of where facilities, resources, structures are located
– Textual descriptions of geographic phenomena
– Images and videos of geographic space
• Maps

GIS and the Web

A GIS is typically:
– Isolated
– Supports an individual organisation
– Accessed privately
– Small range of topics
– Structured data / geo-coded locations
– Finds answers
– Complicated to use

The World Wide Web is:
– Globally networked
– Supports everyone on the Internet
– Accessed publicly
– Vast range of topics
– Unstructured free text / images
– Finds documents
– Easy to use

WWW as a source of geo-information

• Geographic context embedded in natural language descriptions
• Web queries depend on exact match of text terms
• No intelligent interpretation of spatial relationships (“near”, “west”, etc.)
• Place names ambiguous and confused with names of organisations, people, buildings and streets
• No geo-relevance ranking

Current motivation of GIR: find geo-specific resources on the Web (mostly documents and images)

Find web resources about <Something> <related_to> <Somewhere>

related_to = in, near, within X km, north_of, etc.

• Resolve ambiguity of names (many places have the same name)
• Interpret the query's spatial relationship to derive a query footprint
• Find documents geographically associated with the region of the query footprint
• Relevance-rank geographically by place and subject

[Figure: example query footprints labelled “near” and “north”]
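The <Something> <related_to> <Somewhere> pattern can be illustrated with a small parser. This is only a sketch: the relation vocabulary, the `GeoQuery` type and the `parse_geo_query` function are hypothetical names chosen for illustration, and a real system would need a far richer grammar.

```python
import re
from typing import NamedTuple, Optional


class GeoQuery(NamedTuple):
    """Hypothetical container for a parsed geographical query."""
    something: str
    relation: str
    somewhere: str
    distance_km: Optional[float] = None


# Assumed relation vocabulary, taken from the examples on the slide.
_PATTERN = re.compile(
    r"^(?P<something>.+?)\s+"
    r"(?:(?P<within>within)\s+(?P<km>\d+(?:\.\d+)?)\s*km\s+of"
    r"|(?P<rel>north of|south of|east of|west of|outside|near|in))\s+"
    r"(?P<somewhere>.+)$",
    re.IGNORECASE,
)


def parse_geo_query(text: str) -> Optional[GeoQuery]:
    """Split a query into (something, relation, somewhere), or return None."""
    m = _PATTERN.match(text.strip())
    if m is None:
        return None  # no recognised spatial relation in the query
    if m.group("within"):
        return GeoQuery(m.group("something"), "within",
                        m.group("somewhere"), float(m.group("km")))
    return GeoQuery(m.group("something"), m.group("rel").lower(),
                    m.group("somewhere"))


print(parse_geo_query("airports near Leicester"))
print(parse_geo_query("Hotels within 30 km of Glasgow"))
```

The parsed relation and place name would then be passed on for name disambiguation and footprint generation.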

GIR, GIS and The Web

[Diagram; labels: Geo-knowledge, GIS, The Web, GIR, World Knowledge]

Geographical Search Engines

• Google etc. have “local” versions
– Based on business (yellow pages) directories

Geographical Search Engines

SPIRIT research prototype: general geo-web search

Structured user interface: dropdown menu of spatial relationships

Geographical Search Engines

SPIRIT: results listed as URLs plus symbols on a map

[User interface screenshots from Ross Purves et al., University of Zurich]

Anatomy of a Geographical Search Engine

[Architecture diagram; labels: User Interface, Broker, Query disambiguation, Place Ontology, Search Request + Query footprint, Search Engine, Indexes (Spatial, Textual), Unranked Results, Relevance Ranking (Textual, Spatial), Ranked Results, Geo-tagging, Text Indexing, Document Footprints, Web Resources]

Geo-Tagging = Geo-parsing + Geo-coding

Geo-parsing
– Recognising genuine geographic references (place names, addresses, post codes, phone codes), ignoring non-geographic uses

Geo-coding
– Attaching a unique quantitative location (footprint) to each geographic reference

Geo-Parsing: true & false references

Some types of false geographic reference

• Personal names: Smedes York
• Business names: Dorchester Hotel, York Properties…
• Street names: Oxford Street, London Road…
• Common words that are also places: urban, institute, land, battle, derby, over, well…

Geo-Parsing: distinguishing between false and true geo-references

Look for patterns and context

• Personal names (Jack London, Mr York): <First_name> <Location>; <Title> <Location>
• Business names (Paris Hotel): <Business_type> <Location> (or vice versa)
• Street names (Oxford Street): <Location> <Road_type>
• Detect spatial prepositions (in, near, south of, outside, etc.): “he lived in Over”

Genuine occurrences can be used as training data for machine learning
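The context patterns above can be sketched as a simple rule-based filter. The word lists below are tiny and purely illustrative, and `classify_candidates` is a hypothetical helper, not something taken from the SPIRIT system; a real geo-parser would use large lexicons and, as the slide notes, could learn from labelled examples.

```python
import re

# Tiny illustrative lexicons (assumptions for this sketch only).
GAZETTEER = {"York", "London", "Paris", "Oxford", "Over"}
FIRST_NAMES = {"Jack", "Smedes"}
TITLES = {"Mr", "Mrs", "Ms", "Dr"}
BUSINESS_TYPES = {"Hotel", "Properties"}
ROAD_TYPES = {"Street", "Road", "Avenue"}
SPATIAL_PREPOSITIONS = {"in", "near", "outside"}


def classify_candidates(text):
    """Label each gazetteer match using the token before and after it."""
    tokens = re.findall(r"[A-Za-z]+", text)
    results = []
    for i, tok in enumerate(tokens):
        if tok not in GAZETTEER:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if prev in FIRST_NAMES or prev in TITLES:
            label = "personal name"    # <First_name>/<Title> <Location>
        elif nxt in BUSINESS_TYPES or prev in BUSINESS_TYPES:
            label = "business name"    # <Location> <Business_type> or reverse
        elif nxt in ROAD_TYPES:
            label = "street name"      # <Location> <Road_type>
        elif prev.lower() in SPATIAL_PREPOSITIONS:
            label = "likely place"     # preceded by a spatial preposition
        else:
            label = "unresolved"
        results.append((tok, label))
    return results


sentence = "Jack London stayed at the Paris Hotel on Oxford Street; he lived in Over"
for name, label in classify_candidates(sentence):
    print(name, "->", label)
```

Only the candidates labelled as likely places would go forward to geo-coding; the unresolved ones are where a trained classifier would be most useful.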

Geo-coding (grounding) the genuine geo-references

• Many different places have the same name (referent ambiguity): Newport, Cambridge, Springfield…
• Use context to decide (references to parent or nearby places)
• Or choose the most important one (by population or place type hierarchy)
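A minimal sketch of the two grounding strategies, context first and importance as a fallback. The `Candidate` type, the `ground` function and the `importance` values are assumptions made for illustration; the importance numbers are arbitrary stand-ins for population or place-type rank, not real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str       # containing region, as a gazetteer might record it
    importance: int   # arbitrary illustrative rank, NOT real population data


CANDIDATES = [
    Candidate("Newport", "Wales", 3),
    Candidate("Newport", "Isle of Wight", 1),
    Candidate("Newport", "Rhode Island", 2),
]


def ground(name, context_places, candidates=CANDIDATES):
    """Pick one referent for `name`, preferring candidates whose parent
    region is also mentioned in the document (`context_places`)."""
    matches = [c for c in candidates if c.name == name]
    if not matches:
        return None
    in_context = [c for c in matches if c.parent in context_places]
    pool = in_context or matches          # fall back to all candidates
    return max(pool, key=lambda c: c.importance)


print(ground("Newport", {"Rhode Island", "Boston"}))  # context decides
print(ground("Newport", set()))                       # importance decides
```

Nearby places could be handled the same way by checking distance between candidate footprints and other grounded references, rather than parent names.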


Indexing Web Resources

The standard text index is an inverted file:

Text term    List of resources containing term
apple        Doc79, Doc89, Doc822…
Cardiff      Doc2, Doc19, Doc37…
door         Doc16, Doc49, Doc112…
hotel        Doc1, Doc2, Doc23…
in           Doc4, Doc7, Doc19…
London       Doc20, Doc35, Doc150…
pub          Doc9, Doc11, Doc100…
restaurant   Doc19, Doc22, Doc37…
…            …

Query: Restaurants in Cardiff

• Find documents that contain all terms
• Works literally for “in” but won’t find contained places
• Doesn’t work in general for “near”, “X km from”, “north_of”, etc.
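The inverted-file lookup for a query such as "Restaurants in Cardiff" can be sketched as a set intersection over posting lists. The postings below contain only the document IDs visible in the slide's table (the elided entries are left out), and `search_all_terms` is a hypothetical helper. The sketch assumes the query has already been normalised so that "Restaurants" matches the index term "restaurant".

```python
# Posting lists limited to the document IDs shown in the example table.
INDEX = {
    "apple": {"Doc79", "Doc89", "Doc822"},
    "cardiff": {"Doc2", "Doc19", "Doc37"},
    "door": {"Doc16", "Doc49", "Doc112"},
    "hotel": {"Doc1", "Doc2", "Doc23"},
    "in": {"Doc4", "Doc7", "Doc19"},
    "london": {"Doc20", "Doc35", "Doc150"},
    "pub": {"Doc9", "Doc11", "Doc100"},
    "restaurant": {"Doc19", "Doc22", "Doc37"},
}


def search_all_terms(query_terms, index=INDEX):
    """Return the documents that contain every query term (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


print(search_all_terms(["restaurant", "in", "Cardiff"]))  # {'Doc19'}
```

The match is purely textual: a document about a restaurant in a Cardiff suburb that never uses the word "Cardiff" would not be found, which is the limitation the slide points out.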

Why Spatial Indexing?

Query: “Hotels outside and within 30 km of Glasgow”
– Need to find documents referring to hotels that are in places other than Glasgow

Query: “Castles in Wales”
– Need to find documents that refer to names of places in Wales (perhaps without mentioning “Wales”)

• In both cases, conventional text indexing would require the query to contain the names of all places in Wales, or of all places outside Glasgow but within 30 km of it

Spatial indexing of resources

• Use dominant geographic references of documents to create document footprints (point, polygon, bounding rectangle…)
• Use footprints to index documents
• Convert query to a query footprint
• Match query footprint to document footprints

[Figure: spatial query result]
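Footprint matching can be sketched with bounding rectangles. Everything here is an assumption for illustration: the `Box` type, the coordinates (on an arbitrary grid, not real geography) and the linear scan, which a production system would replace with a proper spatial index structure.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding rectangle used as a footprint."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def contains(outer: Box, inner: Box) -> bool:
    return (outer.min_x <= inner.min_x and inner.max_x <= outer.max_x and
            outer.min_y <= inner.min_y and inner.max_y <= outer.max_y)


# Made-up document footprints on an arbitrary grid.
DOC_FOOTPRINTS = {
    "DocA": Box(2, 2, 3, 3),
    "DocB": Box(8, 8, 9, 9),
    "DocC": Box(4, 1, 6, 3),
}


def spatial_search(query: Box, relation="in", footprints=DOC_FOOTPRINTS):
    """Return IDs of documents whose footprint satisfies the relation."""
    test = contains if relation == "in" else intersects
    return sorted(doc for doc, fp in footprints.items() if test(query, fp))


query_footprint = Box(0, 0, 5, 5)
print(spatial_search(query_footprint, "in"))        # ['DocA']
print(spatial_search(query_footprint, "overlaps"))  # ['DocA', 'DocC']
```

Because the match is on geometry, a document is found even if its text never mentions the name of the query region.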


Geographical Relevance Ranking

• Determine the “distance” between query footprint and document footprint
• Depends on the query spatial operator (in, outside, X km from, north_of, etc.)
• The result is a spatial score

Example: airports near Leicester. The further away, the lower the spatial score.

[Figure: query footprint Q and document footprint D. From Marc van Kreveld, University of Utrecht]
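One possible way to turn distance into a spatial score is sketched below. The slide only states that the score falls as distance grows; the exponential decay, the `scale_km` parameter and both function names are assumptions made for this example.

```python
import math


def near_score(distance_km: float, scale_km: float = 50.0) -> float:
    """Score for a "near" query: 1.0 at distance 0, decaying towards 0.

    The exponential form and the default scale are illustrative choices.
    """
    return math.exp(-distance_km / scale_km)


def within_score(distance_km: float, limit_km: float) -> float:
    """Score for a "within X km" query: a hard cut-off at the limit."""
    return 1.0 if distance_km <= limit_km else 0.0


for d in (0, 10, 50, 200):
    print(d, round(near_score(d), 3))
```

Directional operators such as north_of would need a different function, for example one based on the bearing from the query footprint to the document footprint.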

Combining textual and spatial scores

• Textual scores: BM25
• Spatial scores: by spatial footprint analysis

[Figure: plot with axes “normalized BM25 score” and “spatial score”, each running from 0 to 1; labels “query / ideal footprint” and “footprints of documents”. From Marc van Kreveld, University of Utrecht]
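One plausible reading of the figure is that each document is a point in a two-dimensional score space and is ranked by how close it lies to the ideal point (1, 1). The sketch below implements that reading with Euclidean distance; this is an assumption, not a confirmed description of the method in the figure, and the scores are made up.

```python
import math


def distance_to_ideal(text_score: float, spatial_score: float) -> float:
    """Euclidean distance from (text, spatial) to the ideal point (1, 1).
    Both scores are assumed to be normalised to the range 0..1."""
    return math.hypot(1.0 - text_score, 1.0 - spatial_score)


def rank(doc_scores):
    """Order document IDs from best to worst combined relevance."""
    return sorted(doc_scores, key=lambda d: distance_to_ideal(*doc_scores[d]))


# Made-up (normalised BM25, spatial) score pairs.
scores = {"A": (0.9, 0.2), "B": (0.6, 0.7), "C": (0.3, 0.3)}
print(rank(scores))  # ['B', 'A', 'C']
```

Here document A has the best textual score but is ranked below B, because B is reasonably good on both dimensions.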


Place Ontology

Encodes knowledge of terminology and structure of geographic space

• alternative names, languages
• place types (political, topographic, social…)
• footprint (point, MBR, polygon)
• spatial relationships and attributes: containment, adjacency, overlap
• imprecise (vernacular) places (“Midlands”, “south of France”, “Scottish borders”, “Pennines”, “Highlands”…)

Derive from gazetteers, thesauri, maps & the web
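The kinds of knowledge listed above could be held in a structure like the following. This is a minimal sketch: the `Place` and `PlaceOntology` classes are hypothetical, footprints are left empty rather than invented, and places are keyed by name, so same-name places are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    name: str
    place_type: str                      # e.g. political, topographic
    alt_names: List[str] = field(default_factory=list)
    parent: Optional[str] = None         # containment relationship
    footprint: Optional[Tuple[float, float, float, float]] = None  # MBR
    vague: bool = False                  # vernacular / imprecise place


class PlaceOntology:
    def __init__(self) -> None:
        self._places: Dict[str, Place] = {}

    def add(self, place: Place) -> None:
        self._places[place.name] = place

    def lookup(self, name: str) -> List[Place]:
        """Find places by primary or alternative name."""
        return [p for p in self._places.values()
                if name == p.name or name in p.alt_names]

    def contained_places(self, name: str) -> List[str]:
        """Names of all places inside `name`, following containment."""
        result, frontier = [], [name]
        while frontier:
            current = frontier.pop()
            for p in self._places.values():
                if p.parent == current:
                    result.append(p.name)
                    frontier.append(p.name)
        return sorted(result)


ontology = PlaceOntology()
ontology.add(Place("Wales", "political", alt_names=["Cymru"]))
ontology.add(Place("Cardiff", "political", alt_names=["Caerdydd"], parent="Wales"))
ontology.add(Place("Newport", "political", parent="Wales"))
ontology.add(Place("Roath", "political", parent="Cardiff"))
print(ontology.contained_places("Wales"))  # ['Cardiff', 'Newport', 'Roath']
```

A containment lookup of this kind is what lets a query such as "Castles in Wales" reach documents that name only places inside Wales.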

Roles of Place Ontology

[Diagram; labels: ontology, User Interface, Query Disambiguation, Geo-Tagging, Metadata Extraction (web collection, document footprints), Spatial Index (document footprints), Search Component: Query Expansion (query footprint), Relevance Ranking]

Mining text on the web for vernacular place name knowledge

• Objective: estimate the spatial extent of a vague place
• Documents that refer to vague places may also refer to more precise places inside them
• Places that occur frequently in association with a target named place may have a higher chance of being inside it
• Analyse frequency of occurrence of co-located places
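The co-occurrence analysis can be sketched as a simple count over documents that mention the vague place. The documents below are invented strings, the function name is hypothetical, and plain substring matching stands in for real geo-parsing.

```python
from collections import Counter


def cooccurring_places(documents, target, known_places):
    """Count, for each known place, how many documents mention it
    together with the target vague place."""
    counts = Counter()
    for text in documents:
        if target not in text:
            continue                      # only documents about the target
        for place in known_places:
            if place != target and place in text:
                counts[place] += 1
    return counts


# Invented example documents, for illustration only.
docs = [
    "A walking tour of the Cotswolds starting in Cirencester and Burford",
    "Cotswolds cottages near Cirencester",
    "Driving from Oxford to the Cotswolds via Burford",
    "Oxford college open days",
    "Cirencester market in the heart of the Cotswolds",
]
counts = cooccurring_places(docs, "Cotswolds", ["Cirencester", "Burford", "Oxford"])
print(counts.most_common())
```

Places with high counts are candidates for lying inside the vague region; their footprints could then be combined to estimate its extent.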

[Figure: places mentioned in documents retrieved by queries on the “Cotswolds”. From Ross Purves et al., University of Zurich]

GIR and GIS

• GIR is currently dominated by web search
– Unstructured results in multiple documents
• Sometimes a single focused result is wanted:
– Hotels within 1 kilometre of the British Museum in London
– Where are pre-sixteenth century dwellings in the USA?
– Which areas of East Anglia would be flooded if sea level rose by 1 metre?

Bringing GIR and GIS together

[Diagram, shown in two states; labels: Geo-knowledge, GIS, The Web, GIR, World Knowledge]

GeoInformation Services

Encode geo-information in Web services (geo-services)

• Parse natural language queries
• Interpret geo-terminology of queries
• Identify the relevant geo-services to match geo and non-geo concepts
• Compose appropriate chain of services

EU TRIPOD Project

• Improve accessibility of images on the web
• Focus on geographical context
• Enhance captions / metadata for archival images
• Automatically generate captions for images from location- and orientation-aware cameras
• Web harvesting to enrich metadata
• Interpret (vague) spatial natural language
• Toponym ontology of places and landmarks (including vernacular places)
• Use 3D landscape models to determine what is in camera view
• Prototype image search engine: http://tripod.shef.ac.uk/index.html

Future of GIR?

• Improve “conventional GIR” components:
– Geo-tagging, spatio-textual indexing and geo-relevance ranking
• Place ontologies with world-wide coverage
• Understanding of spatial natural language
• Integrate time & space (temporal language)
• Open GeoInformation Web services
• Adapt GIR to personal needs & location

More Information

• SPIRIT: see www.geo-spirit.org for information on the SPIRIT project and downloads of articles and project deliverables. [N.B. The prototype search engine (linked from the SPIRIT web site) is no longer functional.]
• TRIPOD: www.ProjectTripod.org