James Reid Project Manager EDINA



James Reid, Project Manager, EDINA. The geoXwalk project: funded under the JISC IE Development Programme; builds on the Phase I scoping study; aims to develop a demonstrator gazetteer service suitable for extension to full service. Time-frame: start 1 June 2002, for 1 year.


Page 1: James Reid Project Manager EDINA

James Reid, Project Manager

EDINA

Page 2: James Reid Project Manager EDINA

The geoXwalk project

• funded under JISC IE Development Programme
  – builds on Phase I scoping study
  – aims to develop a demonstrator gazetteer service suitable for extension to full service
• time-frame: start 1 June 2002 for 1 year
• project partners: EDINA and UK Data Archive
• aim: to develop a ‘proof of concept’ demonstrator

Page 3: James Reid Project Manager EDINA

JISC Information Environment - geoXwalk as ‘shared service’

[Diagram: JISC Information Environment architecture. Labels: content providers, broker/aggregator, portals, end-user; provision layer, fusion layer, presentation layer; shared services (authentication, authorisation, collection description, service description, resolver, institution profile, geoXwalk).]

Page 4: James Reid Project Manager EDINA

Geo-referencing: that’s what’s special about the spatial

• subject content most often referenced by topic …
  … but much (80%?) can be referenced to specific geographic places
• broad disciplinary base for more powerful geographic searching
  – across the social, life & physical sciences as well as the humanities
  – also from libraries, archives and museums
  – now from digital libraries, service providers & data providers
• geo-referencing thus a way of viewing information content:
  – subject, people, place and time
• geographic co-ordinates are persistent regardless of name, political boundary or other changes

Page 5: James Reid Project Manager EDINA

Why this is difficult...

How to search ‘geographically’, given that e.g. a postcode, a placename and an administrative area are all valid geographies, and yet no information system can know about all the possible variations of what constitutes a ‘geography’?

Problem compounded by inconsistency of use even in the ‘standards’ e.g. placenames evolve, have alternative names

Long history in UK of boundary changes and changes in the geographies used to record things e.g. electoral ward boundary changes …

Page 6: James Reid Project Manager EDINA

There is underlying complexity, such as Multiple Geographies …

Page 7: James Reid Project Manager EDINA

The vision

Make variations in definitions of ‘geography’ transparent

Provide a means to ‘crosswalk’ geographies, i.e. translate one geography into another - hence the name

‘Geographic agnosticism’

How?

A digital gazetteer that stores the different geographies and can implicitly resolve the relationships between them

Provision as a service to service other services
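
The idea of resolving relationships between geographies implicitly, through their co-ordinates, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the geoXwalk implementation: the `Feature` class, the `crosswalk` function and the geography labels are hypothetical, the postcode footprint is invented, and footprints are simplified to bounding boxes. The Knowsley and Dundee footprints are the ones quoted on Pages 20 and 19.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    geography: str   # hypothetical label, e.g. "postcode" or "admin_area"
    bbox: tuple      # footprint as (min_x, min_y, max_x, max_y)


def overlaps(a, b):
    """True if two bounding boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def crosswalk(gazetteer, name, source_geography, target_geography):
    """Translate a feature of one geography into the overlapping features of another."""
    sources = [f for f in gazetteer
               if f.name == name and f.geography == source_geography]
    return sorted({t.name
                   for s in sources
                   for t in gazetteer
                   if t.geography == target_geography and overlaps(s.bbox, t.bbox)})


GAZETTEER = [
    # Invented footprint for the postcode; for illustration only.
    Feature("L34 0HS", "postcode", (344000, 393000, 344100, 393100)),
    # Footprints as quoted on Pages 20 and 19 of the slides.
    Feature("Knowsley", "admin_area", (340900, 392300, 347217, 397660)),
    Feature("Dundee", "admin_area", (334995, 729203, 350609, 734710)),
]

print(crosswalk(GAZETTEER, "L34 0HS", "postcode", "admin_area"))
```

No explicit postcode-to-area lookup table is stored; the relationship falls out of the overlap test, which is the sense in which the relationships are resolved implicitly.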

Page 8: James Reid Project Manager EDINA

Gazetteer - A list of geographic features together with their associated spatial location

Digital Gazetteer - An electronic list of geographic features together with their associated spatial location

(An authority database of places (and features?))

Digital Gazetteer Service - A network-addressable middle-ware server supporting geographic referencing and searching. A shared ‘terminology’ service.

Page 9: James Reid Project Manager EDINA

Why not just use hierarchical thesauri? (part of the ‘Document Tradition’)

Comment: one type of simple relationship between entries is exploited
• entries ordered from very general to very specific (BT, NT)
• can efficiently determine what a given area contains
• normally structured to handle alternative names (SY)

X rigid structure, one view only, typically geo-political
  entities can belong in many hierarchies and new relationships evolve
X names may not be unique
X cannot deal with spatial proximity / contiguity
X no way to relate to other geographies, e.g. postcodes
X lack of simple hierarchies in UK (and other ‘old’) geographies

United Kingdom ………… (nation)
  England ………… (country)
    Devon ………… (county)
      Barton …………

Page 10: James Reid Project Manager EDINA
Page 11: James Reid Project Manager EDINA

Uses of geoXwalk Digital Gazetteer Service

1. As ‘shared service’, enabling other information services to support full range of spatial searching (query constraints)
   1. no need to hold all data (at service) to resolve spatial query
   2. uses co-ordinates and (implicit) spatial relationships to ‘cross-walk’ between geographies
   3. machine-to-machine (m2m) interaction to ‘shared service’
2. As reference facility for researchers, libraries & museums
   1. including means to resolve variant names etc.
3. As online facility to assist metadata creators and means to semi-automatically geo-reference existing resources

Page 12: James Reid Project Manager EDINA

geoXwalk Use Cases

[Diagram: the geoXwalk server, with labelled uses: reference use; searching (1 - use cases) and searching (2), each via an information server; geo-parsing & indexing.]

e.g.
• Where is Aberdour?
• On what river is Dundee situated?
• By what alternative names has York been known?
• List me all places ending with ‘kirk’
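
Two of the example questions (alternative names, and names ending with ‘kirk’) amount to simple name look-ups. The sketch below shows one way they might be answered against a small in-memory table. The table layout and the function names are assumptions made for illustration; they are not the geoXwalk schema or API, and the entries are a hand-picked sample.

```python
# Hypothetical in-memory gazetteer: preferred name -> feature type and variant names.
GAZETTEER = {
    "York": {"type": "city", "variants": ["Eboracum", "Eoforwic", "Jorvik"]},
    "Falkirk": {"type": "town", "variants": []},
    "Selkirk": {"type": "town", "variants": []},
    "Ormskirk": {"type": "town", "variants": []},
    "Aberdour": {"type": "village", "variants": []},
    "Dundee": {"type": "city", "variants": []},
}


def alternative_names(gazetteer, name):
    """Variant names recorded for a place; empty list if none or if the place is unknown."""
    return list(gazetteer.get(name, {}).get("variants", []))


def names_ending_with(gazetteer, suffix):
    """Preferred names that end with the given suffix, compared case-insensitively."""
    suffix = suffix.lower()
    return sorted(name for name in gazetteer if name.lower().endswith(suffix))


print(alternative_names(GAZETTEER, "York"))
print(names_ending_with(GAZETTEER, "kirk"))
```

Questions such as ‘On what river is Dundee situated?’ need a spatial relationship between footprints rather than a name match, so they are not covered by this sketch.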

Page 13: James Reid Project Manager EDINA

XML query fragments

Query for a placename

<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service xmlns="http://www.alexandria.ucsb.edu/gazetteer" version="1.1">
  <query-request>
    <gazetteer-query>
      <name-query operator="equals" text="Fife"/>
    </gazetteer-query>
    <report-format>standard</report-format>
  </query-request>
</gazetteer-service>

Query by feature type and bounding box

<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service xmlns="http://www.alexandria.ucsb.edu/gazetteer" xmlns:gml="http://www.opengis.net/gml" version="1.1">
  <query-request>
    <gazetteer-query>
      <and>
        <class-query thesaurus="Edina FT Thesaurus" term="towns"/>
        <footprint-query operator="within">
          <gml:Box>
            <gml:coordinates>-0.02988,51.45753 1.30798,52.07042</gml:coordinates>
          </gml:Box>
        </footprint-query>
      </and>
    </gazetteer-query>
    <report-format>standard</report-format>
  </query-request>
</gazetteer-service>
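
For machine-to-machine use, a client would normally generate such requests rather than write them by hand. The following sketch builds both query fragments with Python's standard `xml.etree.ElementTree` module. The element and attribute names are copied from the fragments above; the helper names (`name_query`, `class_bbox_query`) are hypothetical, and the co-ordinate string assumes the usual GML 2 convention of a comma within a tuple and a space between tuples.

```python
import xml.etree.ElementTree as ET

ADL_NS = "http://www.alexandria.ucsb.edu/gazetteer"
GML_NS = "http://www.opengis.net/gml"

# Serialise the gazetteer namespace as the default one and GML with the gml: prefix.
ET.register_namespace("", ADL_NS)
ET.register_namespace("gml", GML_NS)


def _adl(tag):
    """Qualified tag name in the gazetteer namespace."""
    return f"{{{ADL_NS}}}{tag}"


def _envelope():
    """Build the shared request wrapper; return (root, gazetteer-query element)."""
    root = ET.Element(_adl("gazetteer-service"), {"version": "1.1"})
    request = ET.SubElement(root, _adl("query-request"))
    query = ET.SubElement(request, _adl("gazetteer-query"))
    ET.SubElement(request, _adl("report-format")).text = "standard"
    return root, query


def name_query(text, operator="equals"):
    """Request for places whose name matches the given text."""
    root, query = _envelope()
    ET.SubElement(query, _adl("name-query"), {"operator": operator, "text": text})
    return ET.tostring(root, encoding="unicode")


def class_bbox_query(thesaurus, term, bbox):
    """Request for features of one type lying within a bounding box."""
    min_x, min_y, max_x, max_y = bbox
    root, query = _envelope()
    both = ET.SubElement(query, _adl("and"))
    ET.SubElement(both, _adl("class-query"), {"thesaurus": thesaurus, "term": term})
    footprint = ET.SubElement(both, _adl("footprint-query"), {"operator": "within"})
    box = ET.SubElement(footprint, f"{{{GML_NS}}}Box")
    coords = ET.SubElement(box, f"{{{GML_NS}}}coordinates")
    coords.text = f"{min_x},{min_y} {max_x},{max_y}"
    return ET.tostring(root, encoding="unicode")


print(name_query("Fife"))
print(class_bbox_query("Edina FT Thesaurus", "towns",
                       (-0.02988, 51.45753, 1.30798, 52.07042)))
```

Building the document through an XML library also avoids the mismatched-quote problem that hand-typed attribute values are prone to.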

Page 14: James Reid Project Manager EDINA

Developments to Date

1. Creation & population of GB gazetteer database with:
   1. Enhanced OS 1:50,000 Placename Gazetteer
   2. Digital boundary data (UKBORDERS)
   3. Additional Place Name Variants (partial for Scotland and Wales)
   4. Derived multi-source data e.g. named woodlands and lakes based on hybrid 1:50K gazetteer and OS products
2. Development of spatial extensions to database to support enhanced geographic search capabilities
3. Development of middleware to support m2m and interactive searching
4. Support for and testing of alternative query protocols - ADL / Z39.50(?)
5. Development of a geoparser to support semi-automatic indexing

Page 15: James Reid Project Manager EDINA

Ongoing Work and Issues

• Merging geo-data from different scales & from different sources
  – how to accommodate historical data
  – positional accuracy & expression of confidence?
  – how to minimise effort in de-duplication of place(s)?
    • places have multiple names, types, and footprints
    • need to be able to identify duplicate entries for the same place
• Presenting geo-names on different occasions?
  – many variant ‘proper’ names, what is preferred?
    • what is the ‘name authority body’? - none in Scotland or the UK
    • preferred name varies with location and use and culture
  – there are language and character code set issues
  – ‘standard’ codes for postal addresses and other geographies
• IPR issues in metadata; and hence terms & conditions of use
• Service performance issues and appropriate protocols
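
The de-duplication issue raised above (spotting entries from different sources that describe the same place) might be approached by combining a normalised name match with a distance threshold. The sketch below is one possible heuristic under stated assumptions, not the method used in the project: the record layout, the identifiers, the co-ordinates and the 500 m threshold are all invented for illustration.

```python
import math
import unicodedata


def normalise(name):
    """Case-fold and strip accents, spaces and punctuation so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = (c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_accents if c.isalnum()).casefold()


def duplicate_candidates(entries, max_distance_m=500.0):
    """Pairs of entry ids whose normalised names match and whose points lie close together."""
    pairs = []
    for i, a in enumerate(entries):
        for b in entries[i + 1:]:
            same_name = normalise(a["name"]) == normalise(b["name"])
            close = math.dist(a["xy"], b["xy"]) <= max_distance_m
            if same_name and close:
                pairs.append((a["id"], b["id"]))
    return pairs


# Invented records from two hypothetical sources; co-ordinates are illustrative only.
ENTRIES = [
    {"id": "src1-1", "name": "Berwick-upon-Tweed", "xy": (399800, 652900)},
    {"id": "src2-7", "name": "Berwick upon Tweed", "xy": (399900, 653000)},
    {"id": "src1-2", "name": "Barton", "xy": (290000, 60000)},
    {"id": "src2-9", "name": "Barton", "xy": (450000, 210000)},
]

print(duplicate_candidates(ENTRIES))
```

The two ‘Barton’ records share a name but are far apart, so they are kept as separate places. Flagged pairs are only candidates; deciding whether they really are the same place would still need review.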

Page 16: James Reid Project Manager EDINA

Contact details

[email protected], Data Library, University of Edinburgh
telephone +44 (0)131 650 3302

• For information on geoXwalk project: www.geoXwalk.ac.uk

Page 17: James Reid Project Manager EDINA

Task: Find resource about ‘Liverpool docks’

Search using a ‘traditional’ gazetteer might yield:

Place          County/UA
Liverpool      Liverpool

Using spatial proximity in an active gazetteer, the search can be widened:

Place          County/UA
Bebington      Wirral
Birkenhead     Wirral
Bootle         Sefton
New Brighton   Wirral
Seacombe       Wirral
Seaforth       Sefton
Waterloo       Sefton

co-ordinates allow (near) co-located places to be co-identified.

… that means more & better hits …. !!!
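
Widening a search by spatial proximity reduces to a distance test on co-ordinates. A minimal sketch, assuming each place is held as a single point: the `places_near` function is hypothetical and the co-ordinates are rough, illustrative values, not taken from any gazetteer.

```python
import math

# name: (easting, northing, county/UA); co-ordinates are illustrative only.
PLACES = {
    "Liverpool": (334000, 390000, "Liverpool"),
    "Birkenhead": (332000, 389000, "Wirral"),
    "Bootle": (334000, 395000, "Sefton"),
    "Chester": (340000, 366000, "Cheshire"),
}


def places_near(places, centre, radius_m):
    """Names of all places within radius_m of the named centre (centre included)."""
    cx, cy, _ = places[centre]
    return sorted(name
                  for name, (x, y, _) in places.items()
                  if math.hypot(x - cx, y - cy) <= radius_m)


print(places_near(PLACES, "Liverpool", 6000))
```

A search for ‘Liverpool docks’ could then be issued against every name returned, regardless of which county or unitary authority each place belongs to.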

Page 18: James Reid Project Manager EDINA

Supporting service searching:
“Photographs of towns along the River Tweed”

Place name: River Tweed
Feature type: River
Relation: ‘near’
Distance: 1/2 km
Target type: towns

Places...
Peebles
Innerleithen
Melrose
Kelso
Coldstream
Berwick upon Tweed

Image finder server
(Images indexed on place names)
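
The ‘near’ relation in this query can be read as a distance test between each candidate town and the river's line. The sketch below shows that test for a river simplified to a polyline and towns simplified to points. It is an assumption about how such a query could be evaluated, not a description of the geoXwalk server; the function names and the toy co-ordinates (in metres) are invented, with neutral feature names to avoid implying real positions.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def features_near_line(features, line, target_type, max_distance):
    """Names of features of target_type lying within max_distance of a polyline."""
    hits = []
    for name, (feature_type, xy) in features.items():
        if feature_type != target_type:
            continue
        nearest = min(point_segment_distance(xy, a, b)
                      for a, b in zip(line, line[1:]))
        if nearest <= max_distance:
            hits.append(name)
    return sorted(hits)


RIVER = [(0, 0), (1000, 0), (2000, 1000)]   # toy polyline
FEATURES = {
    "Town A": ("towns", (500, 300)),
    "Town B": ("towns", (1500, 2000)),
    "Mill": ("buildings", (500, 100)),
}

print(features_near_line(FEATURES, RIVER, "towns", 500))
```

The resulting place names would then be passed to the image finder server, which is indexed on place names only and needs no spatial capability of its own.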

Page 19: James Reid Project Manager EDINA

Supporting cross searching:
geoXwalk in the Common Information Environment

Coordinate footprints - Dundee
(334995, 729203, 350609, 734710)

Places:
Barnhill
Broughty Ferry
Craigie
Douglas And Angus
Fintry
Lochee
Monifieth
West Ferry

Page 20: James Reid Project Manager EDINA

Supporting cross searching different services

Portal service query: Post code: L34 0HS?
‘Find resources for this postcode’ (NB postcode often used to geo-reference survey data files)

[Diagram: the portal service passes the postcode to the geoXwalk server, which relates it to the geographies used by Content Providers A, B and C: coordinate footprints, parish names and place names. Values shown: Knowsley; 340900,392300 - 347217,397660; BX003.]

Page 21: James Reid Project Manager EDINA

As online facility to assist metadata creation

• Most of the extant resources in the JISC IE have some form of spatial reference e.g. placename, county name, postcode

• A ‘geoparser’ has been developed which will assist in the semi-automatic indexing of these resources by using the gazetteer as reference.

• The results of the geoparsing can be used to update the document’s metadata, making it directly geographically searchable.
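
In its simplest form, geoparsing means scanning a document's text for names held in the gazetteer and recording the matches in its metadata. The sketch below shows that idea only; it is not the geoparser developed by the project, and the function names, the metadata field `places` and the sample record are assumptions. A real geoparser would also have to handle ambiguous names and words that merely look like placenames, which is why the slides call the indexing semi-automatic.

```python
import re


def geoparse(text, gazetteer_names):
    """Gazetteer names found in the text, matching longer names before shorter ones."""
    found = []
    remaining = text
    for name in sorted(gazetteer_names, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        if pattern.search(remaining):
            found.append(name)
            # Blank out the match so a shorter name inside it is not counted again.
            remaining = pattern.sub(" ", remaining)
    return sorted(found)


def index_document(record, gazetteer_names):
    """Return a copy of a metadata record with candidate place names added."""
    updated = dict(record)
    updated["places"] = geoparse(record["description"], gazetteer_names)
    return updated


NAMES = ["Kelso", "Berwick upon Tweed", "Tweed", "Dundee"]
RECORD = {
    "title": "Border views",
    "description": "Photographs of Kelso Abbey and the bridge at Berwick upon Tweed",
}

print(index_document(RECORD, NAMES))
```

Matching the longest names first means ‘Berwick upon Tweed’ is recorded once as a town, rather than also producing a spurious hit for ‘Tweed’.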

Page 22: James Reid Project Manager EDINA

Need screen shot of parser here
