Conclusions. LSIDs suck (sadly) “suck” is a technical term.

Conclusions

LSIDs suck (sadly)

“suck” is a technical term

DOIs suck ($ € £)

Handles suck less

Metadata matters

RDF rocks

XML schemas suck

What we need:

Unique identifiers

Resolvable

Have metadata

Taxonomic names aren’t enough

Names have too much information

Cherie Booth = Cherie Blair

Names can change when circumstances change

Jonathan Roughgarden = Joan Roughgarden

Names carry meaning

Semantically opaque

identifier has no meaning

trouble with meaning

Zbtb7

POK erythroid myeloid ontogenic factor

Pokemon gene

Pokemon causes cancer

Funny!

Not funny

Zbtb7

LSID parts

Opaque is a myth

Credit card

LSIDs are nice

Explicit metadata and data access
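
The "LSID parts" slide can be sketched in code. The following is a minimal illustration, assuming the standard LSID layout urn:lsid:authority:namespace:object with an optional revision. The function name parse_lsid and the identifier urn:lsid:example.org:names:12345:1 are hypothetical and used only to show the structure.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str            # usually a domain name
    namespace: str
    object_id: str
    revision: Optional[str]   # optional version component


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    is_lsid = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
    )
    if not is_lsid:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, not a real record.
example = parse_lsid("urn:lsid:example.org:names:12345:1")
print(example.authority, example.namespace, example.object_id, example.revision)
```

The authority part is typically a domain name, which is why resolving an LSID depends on DNS.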

LSIDs suck

Have to fuss with DNS

Reliant on Internet address

What about DOIs?

Resolve this:

doi:10.1080/10635150490264996

What do you get?

Have subscription?

What, no subscription?

Metadata?

Human-readable documents

Can’t predict what you get

Handles might be useful

hdl:2254/20971

Handle to HTML

HTML is XML

GUID resolving to metadata
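
Both identifiers shown on these slides can be turned into web addresses through the public proxies https://doi.org/ and https://hdl.handle.net/. A minimal sketch follows; the helper name proxy_url is hypothetical, and the code only builds the URL. It makes no claim about what comes back, which is the point of the slides above.

```python
def proxy_url(identifier: str) -> str:
    """Map a doi: or hdl: identifier to its public HTTP proxy URL."""
    scheme, sep, value = identifier.partition(":")
    if not sep or not value:
        raise ValueError(f"expected scheme:value, got {identifier!r}")
    scheme = scheme.lower()
    if scheme == "doi":
        return "https://doi.org/" + value
    if scheme == "hdl":
        return "https://hdl.handle.net/" + value
    raise ValueError(f"unsupported identifier scheme: {scheme!r}")


# The two identifiers from the slides.
print(proxy_url("doi:10.1080/10635150490264996"))
print(proxy_url("hdl:2254/20971"))
```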

Metadata matters

RDF

Resource Description Framework

Simple format (e.g., XML)

Everything is a resource…

…or a literal

supports inference

underpins Semantic Web

subject, property, object: a “triple”

subject: http://www.w3.org

property: dc:publisher

object: World Wide Web Consortium
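
The triple on this slide can be written down directly. The sketch below reads the diagram as subject http://www.w3.org, property dc:publisher, and literal object World Wide Web Consortium (an interpretation of the slide), and serialises it as one N-Triples line. The Triple class and to_ntriples helper are hypothetical names; the Dublin Core namespace is the standard one.

```python
from typing import NamedTuple

# Dublin Core elements namespace, so dc:publisher expands to a full URI.
DC = "http://purl.org/dc/elements/1.1/"


class Triple(NamedTuple):
    subject: str   # a resource (URI)
    prop: str      # a property (URI)
    obj: str       # here, a literal value


def to_ntriples(t: Triple) -> str:
    """Serialise a triple with a literal object as one N-Triples line."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.prop}> "{escaped}" .'


w3c = Triple("http://www.w3.org", DC + "publisher", "World Wide Web Consortium")
print(to_ntriples(w3c))
```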

RDF is everywhere

Existing vocabularies

Basic metadata (Dublin Core)

Geography (WGS 84)

Publications (PRISM)

People (FOAF)

Rights (Creative Commons)

Requires you to have URIs for objects

URIs include:

URL

URN

DOI

LSID

RDF documents can be independent

Can be as small as one triple

Aggregate triples from different sources

Store in a triple store
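
Aggregating independent RDF documents amounts to taking the union of their triples. The sketch below is a toy in-memory store, not a real triple store product; the class name TripleStore, the ex: identifiers, and the two sources are all hypothetical.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy store: a set of (subject, property, object) tuples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add_all(self, triples: Iterable[Triple]) -> None:
        # A set removes duplicates when sources overlap.
        self._triples.update(triples)

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self._triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t

    def __len__(self) -> int:
        return len(self._triples)


# Two hypothetical sources; the second is a single triple.
source_a = [("ex:specimen1", "ex:collector", "ex:person1"),
            ("ex:person1", "ex:name", "A. Collector")]
source_b = [("ex:specimen1", "ex:locality", "ex:place1")]

store = TripleStore()
store.add_all(source_a)
store.add_all(source_b)
print(len(store), sorted(store.match(s="ex:specimen1")))
```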

Make new inferences

There are known knowns, things we know that we know

There are known unknowns, things we now know we don’t know

But there are also unknown unknowns, things we do not know we don’t know

unknown knowns

things we don’t know we know

latent knowledge

International Plant Names Index (IPNI)

Poissonia heterantha: basionym Tephrosia heterantha

Coursetia heterantha: basionym Tephrosia heterantha

Poissonia heterantha and Coursetia heterantha

IPNI “knows” these names are synonyms

But doesn’t know it knows it
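
The latent knowledge here can be made explicit with a very small rule: two names that share a basionym are synonyms. A minimal sketch, assuming the diagram records Tephrosia heterantha as the basionym of both other names; the "synonym" property and the function name infer_synonyms are hypothetical, and this is not how IPNI itself is implemented.

```python
from collections import defaultdict
from itertools import combinations

# Assumed reading of the slide: (name, "basionym", basionym name).
triples = [
    ("Poissonia heterantha", "basionym", "Tephrosia heterantha"),
    ("Coursetia heterantha", "basionym", "Tephrosia heterantha"),
]


def infer_synonyms(triples):
    """Return (name, 'synonym', name) for every pair sharing a basionym."""
    by_basionym = defaultdict(set)
    for name, prop, target in triples:
        if prop == "basionym":
            by_basionym[target].add(name)
    inferred = set()
    for names in by_basionym.values():
        for a, b in combinations(sorted(names), 2):
            inferred.add((a, "synonym", b))
    return inferred


print(infer_synonyms(triples))
```

Neither input triple states the synonymy; it only appears once the two statements sit in the same store.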

Melissotarsus insularis

Melissotarsus insularis: no hit

CASENT0107663-D01, DQ176312

DQ176312: Melissotarsus sp. BLF m1

CASENT0107663-D01: Melissotarsus insularis

Melissotarsus insularis = Melissotarsus sp. BLF m1

DQ176312 source CASENT0107663-D01

DQ176312 subject TaxId:342313

TaxId:342313 label Melissotarsus sp. BLF m1

CASENT0107663-D01 subject HNS30687

HNS30687 label Melissotarsus insularis
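
The ant example can be followed step by step in code. The sketch below assumes one reading of the diagram: the sequence DQ176312 has the specimen CASENT0107663-D01 as its source, each record has a taxon as its subject, and each taxon has a label. The direction of each link is an assumption, and the function name names_for is hypothetical.

```python
# Assumed reading of the slide's graph as (subject, property, object) triples.
TRIPLES = {
    ("DQ176312", "source", "CASENT0107663-D01"),
    ("DQ176312", "subject", "TaxId:342313"),
    ("TaxId:342313", "label", "Melissotarsus sp. BLF m1"),
    ("CASENT0107663-D01", "subject", "HNS30687"),
    ("HNS30687", "label", "Melissotarsus insularis"),
}


def objects(triples, subject, prop):
    """All objects of triples with the given subject and property."""
    return {o for (s, p, o) in triples if s == subject and p == prop}


def names_for(triples, record):
    """Labels of the record's taxa, plus those of its source specimens."""
    names = set()
    for taxon in objects(triples, record, "subject"):
        names |= objects(triples, taxon, "label")
    for specimen in objects(triples, record, "source"):
        names |= names_for(triples, specimen)
    return names


# Searching by sequence now reaches both names through the shared specimen.
print(sorted(names_for(TRIPLES, "DQ176312")))
```

A search for Melissotarsus insularis that found no hit in the sequence data alone would, under these assumptions, succeed once the specimen's triples are aggregated with the sequence's.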

GUIDs are trivial

It’s about metadata

It’s about inference

Lessons from DOIs

Crossref

“reference backbone”

add value to electronic publications

$$$$$$

Crossref assigns DOI prefix

Crossref stores metadata

Crossref can be searched

Sound familiar?

“Oh, the vision thing” George Bush (Snr), 1987

GBIF assigns handles

Data sources provide identifiers

Data sources provide metadata and data

GBIF has metadata standards

Data sources supply metadata to GBIF

Low barrier to entry

Data sources install Java Handle System (or whatever)

If data sources use RDF, then GBIF can support inference

We learn something we didn’t know

Are we ready to do this?

Quality of service

24/7/365

downtime

Memorandum

ASMX

404

GUIDs are easy

Metadata is what counts…

…and persistence