Discovery of parameterised data access services through hierarchical classification schemes.


The Problems…

• Specific term (e.g. a species name) and services that support queries
  – OBIS has all fauna: how do you find it if looking for “whales”?
  – Can't classify with all 50K names
• General term “fauna” and many services
  – E.g. 100 map layers with whale species distribution maps


Problem restated

• Types of service supported
  – Records
  – Specimens
  – Distribution models
  – Management zones
  – Survey effort

• Sparsely populated (in general)


User wants…

• Samples of services against type
  – Link to more
• Type = named data access query template (DAQT)?
  – Think so for phase 1: propose this as a single solution (minimise metadata, maximise consistency)
  – Try stuffing the data in!
  – Review in phase 2
• Implication: don't know type in advance!
  – Advanced search: browse DAQT by topic


Granularity of DAQT

• WMS = map

• Or
  – Distribution map
  – Sampling map
  – Tracking map
  – Corridors
  – Management zones

• Orthogonal to topic?


Requirements

• Finding services that may contain the data content being searched for

• Avoiding false positives
  – Services that cannot possibly contain the data
  – Allow “no records” where meaningful

• Easily find key general services in masses of specific services

• Clear, predictable rules for managing metadata


Multi-dimensional classifications

• Classify services with three sets of associations/metadata slots:
  – parameterType
  – contentDescriptor
  – contentClassifier

• Search strategies:
  – Specific, then general

• Harvest terms, don’t search hierarchy at runtime
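The three slots, and the advice to harvest terms rather than walk the hierarchy at query time, can be sketched as a record type plus a term index built once at harvest. This is a minimal illustration, not the project's actual schema: the class name `ServiceRecord`, the function `build_term_index`, and the parameterType given to the Whale migration DB are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """One registered service with the three classification slots (hypothetical layout)."""
    name: str
    parameter_type: str                          # type of term the interface accepts
    content_descriptor: frozenset = frozenset()  # terms giving "semantic coverage"
    content_classifier: frozenset = frozenset()  # broader terms it may be found by


def build_term_index(records):
    """Harvest every classification term once, so a search is a dictionary lookup."""
    index = defaultdict(set)
    for record in records:
        for term in record.content_descriptor | record.content_classifier:
            index[term].add(record.name)
    return index


REGISTRY = [
    ServiceRecord("OBIS", "speciesTaxonomy:species", frozenset({"fauna"})),
    # parameter_type for this service is assumed; the slides only give its descriptor.
    ServiceRecord(
        "Whale migration DB",
        "speciesTaxonomy:species",
        frozenset({"speciesTaxonomy:order:cetacea"}),
    ),
]
INDEX = build_term_index(REGISTRY)
```

Looking up `INDEX["fauna"]` then returns the set of service names without touching the taxonomy at runtime.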


parameterType

• This identifies the “type” of term from the classification ontology that can be used to parameterise the service interface.
  – E.g. OBIS has parameterType=speciesTaxonomy:species
• Does this get used in discovery?
  – Yes, because it tells you that for a given discovery target this service may be used if it contains the right content


contentDescriptor

• Contains a set of taxonomy terms for which the service provides “semantic coverage”

• E.g. Whale migration DB has contentDescriptor=speciesTaxonomy:order:cetacea

• Declares what class of content it may be searched for, not how to search

• Can be multivalued: if not all sub-terms are represented, classify by those that are, e.g. Mammalia, Reptilia


Search Strategy 1

| Action | Example | Result |
|---|---|---|
| Enter phrase | “blue whale” | targetTermType=Species; targetTermValue=Balaenoptera musculus; Classifier=class:mammalia; Classifier=family:balaenopteridae |
| Search: parameterType=targetTermType AND contentDescriptor in [targetTermValue, Classifiers] | parameterType=Species AND contentDescriptor in (class:mammalia, family:balaenopteridae, etc.) | OBIS (pt=Species, cD=fauna); Whale Distributions DB (pt=Species, cD=cetacea); Blue Whale Sanctuary (pt=Species, cD=species:XX) |
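Strategy 1 amounts to a filter on two slots: the service must accept the target's term type, and one of its contentDescriptor terms must equal either the target term or one of its classifiers. The sketch below assumes a name service has already resolved the phrase. The `rank:name` term spelling, the species identifier standing in for the slide's `species:XX`, the use of `kingdom:animalia` for "fauna", and the "Reptile Atlas" service are all hypothetical values chosen so the example runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    parameter_type: str
    content_descriptor: frozenset


@dataclass(frozen=True)
class ResolvedTerm:
    """What a name service is assumed to return for a search phrase."""
    term_type: str
    value: str
    classifiers: tuple


def search_strategy_1(services, target):
    """parameterType == targetTermType AND contentDescriptor in [value, classifiers]."""
    candidates = {target.value, *target.classifiers}
    return [
        s.name
        for s in services
        if s.parameter_type == target.term_type
        and s.content_descriptor & candidates
    ]


SERVICES = [
    Service("OBIS", "Species", frozenset({"kingdom:animalia"})),
    Service("Whale Distributions DB", "Species", frozenset({"order:cetacea"})),
    Service("Blue Whale Sanctuary", "Species",
            frozenset({"species:Balaenoptera musculus"})),
    Service("Reptile Atlas", "Species", frozenset({"class:reptilia"})),  # hypothetical
]

BLUE_WHALE = ResolvedTerm(
    term_type="Species",
    value="species:Balaenoptera musculus",
    classifiers=("kingdom:animalia", "class:mammalia",
                 "order:cetacea", "family:balaenopteridae"),
)

HITS = search_strategy_1(SERVICES, BLUE_WHALE)
```

The reptile service is excluded because none of its descriptors lies on the blue whale's lineage, which is the false-positive avoidance the requirements ask for.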


Good?

• Can discover specific services and more general ones from a search phrase

• Can auto-populate classifier search terms from name service

• Can't find specific services from a more general term…


contentClassifier

• Describes a specific data set against the more general terms it may be discovered by
  – E.g. Blue Whale Sanctuary can be discovered when looking for “whales”
  – Actually searching Order=cetacea

• Fully populated hierarchy
  – Automated generation
  – Easier search
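Automated generation of a fully populated contentClassifier could work by walking each descriptor term up the taxonomy and collecting every ancestor. The sketch below assumes the hierarchy is available as a child-to-parent map. The `PARENT` table is a simplified, hypothetical excerpt (intermediate ranks are omitted), and the function names are illustrative.

```python
# Simplified child -> parent excerpt of a species taxonomy (assumed data).
PARENT = {
    "species:Balaenoptera musculus": "family:balaenopteridae",
    "family:balaenopteridae": "order:cetacea",
    "order:cetacea": "class:mammalia",
    "class:mammalia": "kingdom:animalia",
}


def ancestors(term, parent=PARENT):
    """Return every broader term above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in parent:
        term = parent[term]
        if term in seen:  # guard against a cyclic hierarchy
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def generate_content_classifier(descriptors, parent=PARENT):
    """Union of all ancestors of the service's contentDescriptor terms."""
    terms = set()
    for descriptor in descriptors:
        terms.update(ancestors(descriptor, parent))
    return terms


SANCTUARY_CLASSIFIER = generate_content_classifier(
    ["species:Balaenoptera musculus"]
)
```

Because the ancestors are stored with the service record, a search for `order:cetacea` becomes a plain membership test rather than a hierarchy traversal.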


Search Strategy 2

| Action | Example | Result |
|---|---|---|
| Enter phrase | “blue whale” | targetTermType=Species; targetTermValue=Balaenoptera musculus; Classifier=class:mammalia; Classifier=family:balaenopteridae |
| Search: contentClassifier in [targetTermValue, Classifiers] | contentClassifier in (class:mammalia, family:balaenopteridae, etc.) | OBIS (pt=Species, cD=fauna); Whale Distributions DB (pt=Species, cD=cetacea); Blue Whale Sanctuary (pt=Species, cD=species:XX) |
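Strategy 2 drops the parameterType condition and matches only on contentClassifier, which is what lets a general term reach a specific service. In the sketch below each service maps to its set of classifier terms. It assumes the classifier holds only strict ancestors of the service's content, and the term spellings and the "Reptile Atlas" entry are hypothetical.

```python
def search_strategy_2(services, target_value, classifiers=()):
    """contentClassifier in [targetTermValue, Classifiers]."""
    wanted = {target_value, *classifiers}
    return [name for name, terms in services.items() if terms & wanted]


# Service name -> contentClassifier terms (assumed, pre-generated at harvest time).
SERVICES = {
    "Blue Whale Sanctuary": {
        "family:balaenopteridae", "order:cetacea",
        "class:mammalia", "kingdom:animalia",
    },
    "Reptile Atlas": {"kingdom:animalia"},  # hypothetical service
}

# Searching for "whales", which the name service is assumed to map to order:cetacea.
WHALE_HITS = search_strategy_2(SERVICES, "order:cetacea")
```

A search on `kingdom:animalia` would return both services, which illustrates how general terms can produce large result sets.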


Issues

• Potentially large number of hits against general term

• But how do you know until you try?
  – MaxResults strategy good enough…
  – But don't want to lose more specific results

• The closer the parameterType is to the search term (in hierarchy) the more relevant?

• Parameterised service reflects governance?


Search workflow

• Search strategies 1 and 2 are quite different
  – Can they be reconciled?

• Search workflow:
  – Search (type 1)
  – Wider search if results not found
  – Search type 2
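One reading of this workflow is a fallback chain: run the type 1 search, and only widen to type 2 when it finds nothing. The sketch below takes the two searches as plain callables so it stays independent of how each is implemented. The function name and the returned label are illustrative choices, and the slide may intend a separate widening step between the two.

```python
def search_workflow(target, search_1, search_2):
    """Try the specific search first; widen to search type 2 only if it is empty."""
    hits = search_1(target)
    if hits:
        return "type 1", hits
    return "type 2", search_2(target)


# Stub searches standing in for real registry queries (assumed behaviour).
def stub_search_1(target):
    return ["Whale Distributions DB"] if target == "blue whale" else []


def stub_search_2(target):
    return ["Blue Whale Sanctuary"] if target == "whales" else []


SPECIFIC = search_workflow("blue whale", stub_search_1, stub_search_2)
GENERAL = search_workflow("whales", stub_search_1, stub_search_2)
```

Returning the label alongside the hits lets a user interface say which strategy produced the results.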


“Brute force” alternative

• Search 1 ∪ Search 2
  – 1 or 2 calls?

• Max results

• Ordering: Search 1 > Search 2
  – Or max for each?

• Related services: may be the key?
  – Two different models for species distribution
  – Archive vs current boundaries
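The union, ordering, and max-results options can be compared in a small merge function. This sketch assumes each search has already returned an ordered list of service names. The parameter names `max_results` and `max_per_search` are illustrative, the latter modelling the "max for each" variant.

```python
def brute_force_search(hits_1, hits_2, max_results=10, max_per_search=None):
    """Union of both result lists, Search 1 hits first, duplicates removed."""
    if max_per_search is not None:
        # "Max for each": cap each search separately before merging.
        hits_1 = hits_1[:max_per_search]
        hits_2 = hits_2[:max_per_search]
    merged = []
    seen = set()
    for name in [*hits_1, *hits_2]:
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged[:max_results]


SEARCH_1 = ["OBIS", "Whale Distributions DB", "Blue Whale Sanctuary"]
SEARCH_2 = ["Blue Whale Sanctuary", "Reptile Atlas"]  # second name is hypothetical

MERGED = brute_force_search(SEARCH_1, SEARCH_2, max_results=3)
CAPPED = brute_force_search(SEARCH_1, SEARCH_2, max_per_search=1)
```

With a single overall cap, a long Search 1 list can crowd out every Search 2 hit; the per-search cap guarantees each strategy contributes, at the cost of possibly dropping relevant Search 1 results.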