
Integrating Live Plant Images with Other Types of Biodiversity Records

Steve Baskauf, Vanderbilt Dept. of Biological Sciences

http://bioimages.vanderbilt.edu/
August 3, 2010

I. Challenges in Biodiversity Informatics

• Common interest in databasing metadata.
• Metadata describe resources and their properties.
• Resource: anything that can be assigned an identifier (e.g. a tree, a specimen, an image, a taxon, a name, etc.)
• Property: either a string literal that describes the resource, or a relationship between the subject resource and some other resource.

Example: the Vanderbilt Arboretum, with 5935 identified and geolocated trees

Example

Diagram: a subject resource (a tree) has a literal property, establishmentMeans, whose value is the text string “native”. It also has a relationship property, depiction, that links it to an object resource (an image).

Relationship “graph”

the tree (7-314) --establishmentMeans--> “native”
the tree (7-314) --depiction--> image (79657)

Traditional database (typical for specimens):

Tree ID   Establishment Means   Image ID
7-314     native                79657
7-340     native                79674
4-145     cultivated            79684
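The flat table above can also be written as individual (subject, property, object) statements. The following Python sketch is illustrative only. It assumes tree URIs of the form http://bioimages.vanderbilt.edu/vanderbilt/<tree ID> and image URIs of the form http://bioimages.vanderbilt.edu/baskauf/<image ID>; only 7-314 and 79657 appear as full URIs in this talk. The helper name rows_to_triples is hypothetical.

```python
# Hypothetical URI patterns (assumptions; only 7-314 and 79657 are confirmed).
TREE_BASE = "http://bioimages.vanderbilt.edu/vanderbilt/"
IMAGE_BASE = "http://bioimages.vanderbilt.edu/baskauf/"

# The rows of the flat table: (tree ID, establishment means, image ID).
ROWS = [
    ("7-314", "native", "79657"),
    ("7-340", "native", "79674"),
    ("4-145", "cultivated", "79684"),
]


def rows_to_triples(rows):
    """Turn each table row into two (subject, property, object) triples."""
    triples = []
    for tree_id, means, image_id in rows:
        tree = TREE_BASE + tree_id
        # Literal property: the object is a plain text string.
        triples.append((tree, "dwc:establishmentMeans", means))
        # Relationship property: the object is another resource (an image URI).
        triples.append((tree, "foaf:depiction", IMAGE_BASE + image_id))
    return triples


triples = rows_to_triples(ROWS)
for t in triples:
    print(t)
```

In the table, a tree has exactly one Image ID cell. In the list of triples, a second image of tree 7-314 is simply one more foaf:depiction triple. No new column is needed.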

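Non-“flat” relationships in live-plant imaging

Diagram: a live tree is linked to several images: a whole tree image, a leaf image and a bark image. These are standardized views (Baskauf and Kirchoff (2008) Vulpina 7:16-30). The tree is also linked to a determination, which is linked to a taxon.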

Duplicate herbarium specimens

Diagram elements:
• a live tree and a specimen image
• a herbarium specimen at institution A and a duplicate herbarium specimen at institution B, both from the same individual live tree
• determination A → taxon A
• determination B → taxon B

Complex relationships: individual-based organization system

Diagram: the live tree (individual organism) is the hub. It links:
• the whole tree, leaf and bark images
• a herbarium specimen and its specimen image
• determinations A and B, which name taxon A and taxon B

Baskauf (2010) Biodiversity Informatics 7:17-44
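Because every path runs through the individual, a program can start at the individual and follow links outward to collect everything attached to it. This Python sketch does that with a breadth-first search. The node names follow the diagram. The property labels (hasImage, hasSpecimen, hasDetermination, identifiedAs) and the exact wiring are hypothetical, chosen for illustration rather than taken from Baskauf (2010).

```python
from collections import deque

# Hypothetical edges (subject, property, object). Property labels and wiring
# are illustrative; they are not terms from Darwin Core or any other standard.
EDGES = [
    ("live tree", "hasImage", "whole tree image"),
    ("live tree", "hasImage", "leaf image"),
    ("live tree", "hasImage", "bark image"),
    ("live tree", "hasDetermination", "determination A"),
    ("live tree", "hasDetermination", "determination B"),
    ("live tree", "hasSpecimen", "herbarium specimen"),
    ("determination A", "identifiedAs", "taxon A"),
    ("determination B", "identifiedAs", "taxon B"),
    ("herbarium specimen", "hasImage", "specimen image"),
]


def reachable(start, edges):
    """Breadth-first search: return every edge reachable from start, in visit order."""
    adjacency = {}
    for s, p, o in edges:
        adjacency.setdefault(s, []).append((p, o))
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for p, o in adjacency.get(node, []):
            found.append((node, p, o))
            if o not in seen:  # visit each node only once
                seen.add(o)
                queue.append(o)
    return found


links = reachable("live tree", EDGES)
images = [o for _, p, o in links if p == "hasImage"]
taxa = [o for _, p, o in links if p == "identifiedAs"]
print(images)  # ['whole tree image', 'leaf image', 'bark image', 'specimen image']
print(taxa)    # ['taxon A', 'taxon B']
```

The specimen image is found even though it hangs off the specimen rather than the tree. A flat, one-row-per-record table cannot easily express that kind of indirect link.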

II. Building blocks of a Web-based metadata system

1. We need to be able to unambiguously identify the resources (globally unique identifiers = GUIDs)

2. We need standardized property definitions (e.g. Darwin Core terms)

3. We need a technological solution for communicating properties and relationships to a user anywhere (RDF/XML representation sent to user via the Internet)

Design principles: http://bioimages.vanderbilt.edu/guid

Building block #1: GUIDs

A globally unique identifier (GUID) should be:
1. globally unique
2. actionable
3. persistent

Anyone on the planet should be able to use the GUID to find out about the particular thing that it identifies, forever.

That is a pretty tall order (but you can do it)!!!

1. How do you make an identifier globally unique?

• Create a locally unique identifier:
  – identifier (catalog number) unique within a collection, e.g. GIS tree ID number: 7-314
  – namespace (collection code) unique within the institution, e.g. vanderbilt
  – combined: vanderbilt/7-314
• Make it globally unique by prefixing it with a domain name that you control, e.g. bioimages.vanderbilt.edu

Complete HTTP URI GUID

• Combine “http://” with the other pieces: http://bioimages.vanderbilt.edu/vanderbilt/7-314

• This identifier looks like a URL!

An HTTP URI is a uniform resource identifier (URI) as well as a uniform resource locator (web address = URL).
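The pieces described above can be assembled mechanically. This Python sketch uses a hypothetical helper, make_guid, that only builds the string. It does not make the URI resolvable. That still depends on the web server that answers for the domain.

```python
from urllib.parse import quote, urlsplit


def make_guid(domain, namespace, identifier):
    """Assemble an HTTP URI GUID: http:// + domain + namespace + local identifier.

    quote() percent-encodes characters that are not allowed in a URI path,
    such as spaces; "vanderbilt" and "7-314" pass through unchanged.
    """
    return f"http://{domain}/{quote(namespace)}/{quote(identifier)}"


guid = make_guid("bioimages.vanderbilt.edu", "vanderbilt", "7-314")
print(guid)  # http://bioimages.vanderbilt.edu/vanderbilt/7-314

# Splitting the URI shows the globally unique part (domain)
# and the locally unique part (path).
parts = urlsplit(guid)
print(parts.netloc)  # bioimages.vanderbilt.edu
print(parts.path)    # /vanderbilt/7-314
```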

2. What does actionable mean?

• Something happens when you put an actionable GUID into a Web browser (the GUID is “resolved”).

• HTTP URIs:
  – unlike LSIDs and DOIs, work in any web browser
  – are resolved using existing Internet infrastructure
  – are the consensus GUID of the Linked Data (Semantic Web) community
  – example: http://bioimages.vanderbilt.edu/vanderbilt/7-314

3. Persistent URIs always work

• URIs “break” when filenames change:

JavaScript-based URI: http://bioimages.vanderbilt.edu/metadata.htm?baskauf/66921/metadata/img/3456/2304

Method-independent URI: http://bioimages.vanderbilt.edu/baskauf/66921.htm

Both URIs eventually lead to the same page, but the second URI is simpler and won’t change.

• URIs “break” when domain names disappear: bioblitznashville.org vs. vanderbilt.edu

• Planning for URI permanence is important.

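One common way to keep a public URI stable is a layer of indirection on the server. The persistent path never changes, and a lookup table records where it currently points. This Python sketch assumes a redirect-based setup. The table name, the handle function and the choice of a 302 redirect are illustrative assumptions. They are not a description of how bioimages.vanderbilt.edu is actually configured.

```python
from http import HTTPStatus

BASE = "http://bioimages.vanderbilt.edu"

# Hypothetical lookup table: persistent public path -> current implementation.
# If the site is re-engineered, only the right-hand side changes; the
# left-hand side (the URI people cite) stays the same.
CURRENT_LOCATION = {
    "/baskauf/66921.htm": "/metadata.htm?baskauf/66921/metadata/img/3456/2304",
}


def handle(path):
    """Return (status, Location header) for a request to a persistent path."""
    target = CURRENT_LOCATION.get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND, None
    return HTTPStatus.FOUND, BASE + target


status, location = handle("/baskauf/66921.htm")
print(int(status), location)
```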

How long is “persistent”?

• Forever is a pretty long time.
• The Internet is only 40 years old and the Web only 20.
• Plan for your institution and domain name to last at least 10 years.
• Don’t change the URI of anything that you are trying to identify!

Building block #2: Standardized property definitions

Recent consensus on metadata terms:
• Dublin Core Metadata Initiative (DCMI) = describes generic resources
• Friend-Of-A-Friend (FOAF) = describes people and their affiliations
• Darwin Core (DwC) = describes biodiversity resources
• Media Resources Task Group (MRTG) = describes media (e.g. images) in a biodiversity context

A property described by a metadata term:

• is an HTTP URI, e.g. http://rs.tdwg.org/dwc/terms/establishmentMeans

• has a definition that can be accessed via the Internet

• has an abbreviated form that usually makes sense to humans: if dwc: stands for http://rs.tdwg.org/dwc/terms/, the abbreviated URI for the term is dwc:establishmentMeans
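Expanding an abbreviated term back into its full HTTP URI is a simple prefix lookup. In this Python sketch the dwc: mapping comes from the slide. The foaf: (http://xmlns.com/foaf/0.1/) and dcterms: (http://purl.org/dc/terms/) mappings are the published namespaces for FOAF and Dublin Core terms. The function names expand and abbreviate are hypothetical.

```python
# Prefix -> namespace URI. dwc: is from the slide; foaf: and dcterms: are the
# published FOAF and Dublin Core term namespaces.
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(curie):
    """Turn 'prefix:localName' into a full HTTP URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def abbreviate(uri):
    """Turn a full term URI back into 'prefix:localName' when a prefix matches."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known prefix: leave the URI as-is


print(expand("dwc:establishmentMeans"))  # http://rs.tdwg.org/dwc/terms/establishmentMeans
print(abbreviate("http://xmlns.com/foaf/0.1/depiction"))  # foaf:depiction
```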

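Resource Description Framework (RDF) graph

Diagram: the subject resource (the tree) has a literal property, establishmentMeans, with the value “native”. It also has a relationship property, depiction, that links it to the object resource (an image). Written with HTTP URIs and standard terms:

http://bioimages.vanderbilt.edu/vanderbilt/7-314 --dwc:establishmentMeans--> “native”
http://bioimages.vanderbilt.edu/vanderbilt/7-314 --foaf:depiction--> http://bioimages.vanderbilt.edu/baskauf/79657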

Building block #3: Communicating relationships


How do you translate relationships into language a computer can understand?

RDF in XML format (a tiny snippet):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
        <dwc:establishmentMeans>native</dwc:establishmentMeans>
        <foaf:depiction rdf:resource="http://bioimages.vanderbilt.edu/baskauf/79657"/>
      </rdf:Description>
    </rdf:RDF>
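RDF/XML like this does not have to be typed by hand; it can be generated. This Python sketch uses the standard library's xml.etree.ElementTree. The function name describe is hypothetical. A dedicated RDF library would normally be used for anything beyond a single description.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Register prefixes so the output uses rdf:, dwc:, foaf: instead of ns0:, ns1:, ...
for prefix, uri in (("rdf", RDF), ("dwc", DWC), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)


def describe(subject, establishment_means, image):
    """Build an rdf:RDF document containing one rdf:Description of subject."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    # Literal property: the value is a text string.
    ET.SubElement(desc, f"{{{DWC}}}establishmentMeans").text = establishment_means
    # Relationship property: the value is another resource, given by its URI.
    ET.SubElement(desc, f"{{{FOAF}}}depiction", {f"{{{RDF}}}resource": image})
    return ET.tostring(root, encoding="unicode")


xml_text = describe(
    "http://bioimages.vanderbilt.edu/vanderbilt/7-314",
    "native",
    "http://bioimages.vanderbilt.edu/baskauf/79657",
)
print(xml_text)
```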

III. Why use a new way to describe metadata?

• People are good at figuring out what web pages mean.

• Computers (like a GoogleBot) have to guess what the information on a web page means.

• The Semantic Web (sometimes called Web 3.0) provides a way to give information to computers explicitly.

Content Negotiation, part 1

Human (using a web browser): “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”

Request to the web server: GET http://bioimages.vanderbilt.edu/vanderbilt/7-314 (MIME type: text/html)

Web server: “I cannot send this guy a tree!” Instead, it sends the web page http://bioimages.vanderbilt.edu/vanderbilt/7-314.htm

Content Negotiation, part 2

Computer: “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”

Request to the web server: GET http://bioimages.vanderbilt.edu/vanderbilt/7-314 (MIME type: application/rdf+xml)

Web server: “10011000101!” It sends the XML file http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf
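On the server side, content negotiation comes down to reading the request's Accept header and choosing which representation to return. The Python sketch below is deliberately simplified and hypothetical:
• It knows only the two MIME types from the slides.
• It honors q-values.
• It ignores wildcards such as */*.
• It falls back to the web page when nothing else matches.

One widely used Linked Data pattern goes one step further. When the request is for a thing that cannot be sent (like a tree), the server answers with a 303 See Other redirect to the chosen .htm or .rdf document.

```python
# Representations this hypothetical server can offer, keyed by MIME type.
OFFERS = {"text/html": ".htm", "application/rdf+xml": ".rdf"}


def negotiate(accept_header):
    """Return the file suffix of the best offered representation (simplified)."""
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type in OFFERS and q > best_q:
            best, best_q = OFFERS[media_type], q
    return best or ".htm"  # default: send humans (and unknown clients) the web page


def representation_for(guid, accept_header):
    """Map a GUID plus an Accept header to the document that will be served."""
    return guid + negotiate(accept_header)


tree = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
print(representation_for(tree, "text/html"))            # http://bioimages.vanderbilt.edu/vanderbilt/7-314.htm
print(representation_for(tree, "application/rdf+xml"))  # http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf
```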

What’s so great about this?

• A computer can crawl the Web and discover metadata about resources that are identified by HTTP URI GUIDs.

• RDF metadata from many sources can be assembled into a database (RDF “triple store”).

• The database can be searched or used to generate web content.

• Source data does not need to be “sent” to the database; any “semantic web client” can retrieve it at will.

• The format is standard, so no special communication protocols are required.
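A triple store can be pictured as a set of (subject, property, object) statements. It accepts data from any source and answers pattern queries. This Python class is a toy, hypothetical stand-in for a real triple store, which would add persistence, indexing and a query language such as SPARQL. The two "sources" are invented for illustration.

```python
class TripleStore:
    """Toy in-memory triple store: a set of (subject, property, object) tuples."""

    def __init__(self):
        self.triples = set()

    def load(self, triples):
        """Merge triples harvested from any source; exact duplicates collapse."""
        self.triples.update(triples)

    def query(self, s=None, p=None, o=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


TREE = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
IMAGE = "http://bioimages.vanderbilt.edu/baskauf/79657"

# Two hypothetical sources, e.g. a tree database and an image database.
tree_source = [(TREE, "dwc:establishmentMeans", "native")]
image_source = [(TREE, "foaf:depiction", IMAGE)]

store = TripleStore()
store.load(tree_source)
store.load(image_source)
store.load(tree_source)  # re-harvesting the same data adds nothing new

for triple in store.query(s=TREE):
    print(triple)
```

Because both sources use the same HTTP URI for the tree, their statements join automatically with no mapping step.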

Why would this benefit me now?

• RDF/XML metadata files for numerous resources can be transformed directly into web pages using a single program file (e.g. a single web page using XSLT and/or AJAX).
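The slide names XSLT and/or AJAX. The same idea, one program turning many RDF/XML records into a page, can also be sketched in Python with the standard library. This is a swapped-in technique, not the author's XSLT. The function name rdf_to_html is hypothetical. The sample input is also an assumption: the URI form for tree 4-145 is guessed, and only its "cultivated" value comes from the table earlier in the talk.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical input: two RDF/XML descriptions in one document.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dwc="{DWC}">
  <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
    <dwc:establishmentMeans>native</dwc:establishmentMeans>
  </rdf:Description>
  <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/4-145">
    <dwc:establishmentMeans>cultivated</dwc:establishmentMeans>
  </rdf:Description>
</rdf:RDF>"""


def rdf_to_html(rdf_xml):
    """Render every rdf:Description as one row of an HTML table."""
    root = ET.fromstring(rdf_xml)
    rows = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        uri = escape(desc.get(f"{{{RDF}}}about", ""))
        means = escape(desc.findtext(f"{{{DWC}}}establishmentMeans", default=""))
        rows.append(f'<tr><td><a href="{uri}">{uri}</a></td><td>{means}</td></tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"


html_page = rdf_to_html(SAMPLE)
print(html_page)
```

Adding a third tree means adding a third rdf:Description to the input. The program itself does not change.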

Benefits (cont.)

• Branding in the URI: http://bioimages.vanderbilt.edu/vanderbilt/7-314


• HTTP URI GUIDs provide direct access to metadata about a resource to anyone with Internet access:
  – clickable attribution link on a website
  – reference link in a publication PDF
  – physical QR codes for smartphone access

QR code on a museum display

http://bioimages.vanderbilt.edu/