Integrating Live Plant Images with Other Types of Biodiversity Records
Steve Baskauf
Vanderbilt Dept. of Biological Sciences
http://bioimages.vanderbilt.edu/
August 3, 2010
I. Challenges in Biodiversity Informatics
• Common interest in databasing metadata.
• Metadata describe resources and their properties.
• Resource: anything that can be assigned an identifier (e.g. a tree, a specimen, an image, a taxon, a name, etc.)
• Property: a string literal that describes the resource, or a relationship between the subject resource and some other resource.
Example: Vanderbilt Arboretum, 5935 identified and geolocated trees
Example
[Diagram: a subject resource (a tree) has a literal property, establishmentMeans, whose value is the text string “native”, and a relationship property, depiction, that points to an object resource (an image).]
Relationship “graph”
[Diagram: the tree (7-314) --establishmentMeans--> “native”; the tree (7-314) --depiction--> image (79657)]
Traditional database (typical for specimens)
Tree ID    Establishment Means    Image ID
7-314      native                 79657
7-340      native                 79674
4-145      cultivated             79684
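The same rows can also be written as subject/property/object statements, which is the form the relationship graph uses. The following is only a minimal sketch: the function name rows_to_triples and the bare property names are illustrative assumptions, not part of the original slides.

```python
# Rows copied from the table above: (tree ID, establishment means, image ID).
ROWS = [
    ("7-314", "native", "79657"),
    ("7-340", "native", "79674"),
    ("4-145", "cultivated", "79684"),
]


def rows_to_triples(rows):
    """Turn each flat row into two (subject, property, object) statements."""
    triples = []
    for tree_id, establishment_means, image_id in rows:
        # Literal property: the object is a plain text string.
        triples.append((tree_id, "establishmentMeans", establishment_means))
        # Relationship property: the object is another resource (an image).
        triples.append((tree_id, "depiction", image_id))
    return triples


triples = rows_to_triples(ROWS)
for triple in triples:
    print(triple)
```

In this form, a second image of the same tree is just one more statement; the flat table would need another column or another row.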
Non-“flat” relationships in live-plant imaging
[Diagram: a live tree connected to a whole tree image, a leaf image, a bark image, and a determination that points to a taxon.]
Standardized views: Baskauf and Kirchoff (2008) Vulpia 7:16-30
Duplicate herbarium specimens
[Diagram: a herbarium specimen at institution A and a duplicate herbarium specimen at institution B come from the same individual live tree. Also shown: a specimen image, and two separate determinations, determination A (taxon A) and determination B (taxon B).]
Complex relationships: individual-based organization system
[Diagram: the live tree (individual organism) shown together with a whole tree image, a leaf image, a bark image, a herbarium specimen and its specimen image, determination A (taxon A), and determination B (taxon B).]
Baskauf (2010) Biodiversity Informatics 7:17-44
II. Building blocks of a Web-based metadata system
1. We need to be able to unambiguously identify the resources (globally unique identifiers = GUIDs)
2. We need standardized property definitions (e.g. Darwin Core terms)
3. We need a technological solution for communicating properties and relationships to a user anywhere (RDF/XML representation sent to user via the Internet)
Design principles: http://bioimages.vanderbilt.edu/guid
Building block #1: GUIDs
A globally unique identifier (GUID) should be:
1. globally unique
2. actionable
3. persistent
Anyone on the planet should be able to use the GUID to find out about the particular thing that it identifies, forever.
That is a pretty tall order (but you can do it)!!!
1. How do you make an identifier globally unique?
• Create a locally unique identifier:
– identifier (catalog number) unique within a collection, e.g. GIS tree ID number: 7-314
– namespace (collection code) unique within the institution, e.g. vanderbilt
vanderbilt/7-314
• Make it globally unique by prefixing it with a domain name that you control, e.g. bioimages.vanderbilt.edu
Complete HTTP URI GUID
• combine “http://” with other pieces: http://bioimages.vanderbilt.edu/vanderbilt/7-314
• This identifier looks like a URL!
An HTTP URI is a uniform resource identifier as well as a resource locator (web address=URL).
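The recipe for assembling the GUID can be sketched in a few lines. This is a minimal illustration only: the function name make_guid and its validation rules are assumptions made for the example, not the Bioimages implementation.

```python
def make_guid(domain, namespace, local_id):
    """Combine "http://", a domain name, a namespace, and a local identifier."""
    parts = (("domain", domain), ("namespace", namespace), ("local_id", local_id))
    for name, part in parts:
        # Hypothetical sanity check: each piece must be a single, clean path segment.
        if not part or "/" in part or any(ch.isspace() for ch in part):
            raise ValueError(f"{name} must be non-empty with no slashes or spaces: {part!r}")
    return f"http://{domain}/{namespace}/{local_id}"


print(make_guid("bioimages.vanderbilt.edu", "vanderbilt", "7-314"))
# prints http://bioimages.vanderbilt.edu/vanderbilt/7-314
```

The identifier is only as unique as its pieces: the catalog number must be unique within the namespace, and the namespace unique within the domain.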
2. What does actionable mean?
• Something happens when you put an actionable GUID in a Web browser (GUID is “resolved”).
• HTTP URIs
– unlike LSIDs and DOIs, they work in any web browser
– resolved using existing Internet infrastructure
– consensus GUID of the Linked Data (Semantic Web) community
– http://bioimages.vanderbilt.edu/vanderbilt/7-314
3. Persistent URIs always work
• URIs “break” when filenames change:
JavaScript-based URI: http://bioimages.vanderbilt.edu/metadata.htm?baskauf/66921/metadata/img/3456/2304
Independent of method: http://bioimages.vanderbilt.edu/baskauf/66921.htm
Both URIs eventually lead to the same page, but the second URI is simpler and won’t change.
• URIs “break” when domain names disappear: bioblitznashville.org vs. vanderbilt.edu
• Planning for URI permanence is important.
How long is “persistent”?
• Forever is a pretty long time.
• The Internet is only 40 years old and the Web only 20.
• Plan for your institution and domain name to last at least 10 years.
• Don’t change the URI of anything that you are trying to identify!
Building block #2: Standardized property definitions
Recent consensus on metadata terms:
• Dublin Core Metadata Initiative (DCMI) = describes generic resources
• Friend-Of-A-Friend (FOAF) = describes people and their affiliations
• Darwin Core (DwC) = describes biodiversity resources
• Media Resources Task Group (MRTG) = describes media (e.g. images) in a biodiversity context
A property described by a metadata term:
• is an HTTP URI, e.g. http://rs.tdwg.org/dwc/terms/establishmentMeans
• has a definition that can be accessed via the Internet
• has an abbreviated form that usually makes sense to humans
dwc: = http://rs.tdwg.org/dwc/terms/
so the abbreviated URI for the term is dwc:establishmentMeans
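Expanding an abbreviated term back into its full HTTP URI is a simple string substitution. A minimal sketch follows, assuming a small prefix table: the dwc: namespace comes from the slide, the foaf: namespace is the standard FOAF one, and the helper names expand and abbreviate are made up for the example.

```python
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",  # Darwin Core, as given on the slide
    "foaf": "http://xmlns.com/foaf/0.1/",    # standard FOAF namespace
}


def expand(abbreviated):
    """Turn an abbreviated term such as 'dwc:establishmentMeans' into a full URI."""
    prefix, separator, local_name = abbreviated.partition(":")
    if not separator or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {abbreviated!r}")
    return PREFIXES[prefix] + local_name


def abbreviate(uri):
    """Turn a full term URI into its abbreviated form when the namespace is known."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: leave the URI unchanged


print(expand("dwc:establishmentMeans"))
print(abbreviate("http://xmlns.com/foaf/0.1/depiction"))
```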
Resource Description Framework (RDF) graph
[Diagram: the subject resource (tree) http://bioimages.vanderbilt.edu/vanderbilt/7-314 has the property dwc:establishmentMeans with the literal value “native” and the property foaf:depiction pointing to the object resource (image) http://bioimages.vanderbilt.edu/baskauf/79657]
Building block #3: Communicating relationships
How do you translate relationships into language a computer can understand?
RDF in XML format (a tiny snippet)
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
  <dwc:establishmentMeans>native</dwc:establishmentMeans>
  <foaf:depiction rdf:resource="http://bioimages.vanderbilt.edu/baskauf/79657"/>
</rdf:Description>
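One way to produce such a snippet programmatically is with an XML library. The sketch below uses Python's standard xml.etree.ElementTree; the function name describe_tree and the rdf:RDF wrapper element (which carries the namespace declarations a complete RDF/XML file needs) are additions for illustration, not taken from the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to write the familiar prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("dwc", DWC)
ET.register_namespace("foaf", FOAF)


def describe_tree(tree_uri, establishment_means, image_uri):
    """Serialize one literal property and one relationship property as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": tree_uri}
    )
    # Literal property: the value is element text.
    literal = ET.SubElement(description, f"{{{DWC}}}establishmentMeans")
    literal.text = establishment_means
    # Relationship property: the value is a reference to another resource.
    ET.SubElement(
        description, f"{{{FOAF}}}depiction", {f"{{{RDF}}}resource": image_uri}
    )
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe_tree(
    "http://bioimages.vanderbilt.edu/vanderbilt/7-314",
    "native",
    "http://bioimages.vanderbilt.edu/baskauf/79657",
)
print(rdf_xml)
```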
III. Why use a new way to describe metadata?
• People are good at figuring out what web pages mean.
• Computers (like a GoogleBot) have to guess what the information on a web page means.
• The Semantic Web (sometimes called Web 3.0) provides a means to provide information to computers explicitly.
Content Negotiation, part 1
[Diagram: a browser tells the web server “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”
GET http://bioimages.vanderbilt.edu/vanderbilt/7-314
MIME type: text/html
The web server thinks “I cannot send this guy a tree!” and returns a web page instead:
http://bioimages.vanderbilt.edu/vanderbilt/7-314.htm]
Content Negotiation, part 2
[Diagram: a program tells the web server “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”
GET http://bioimages.vanderbilt.edu/vanderbilt/7-314
MIME type: application/rdf+xml
The web server answers “10011000101!” and returns an XML file:
http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf]
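The server-side decision in the two content negotiation slides can be sketched as a small function. In HTTP the client states the MIME types it wants in the Accept request header. Everything named here (REPRESENTATIONS, parse_accept, negotiate, and the choice of HTML as the fallback) is a hypothetical illustration, not a description of how the Bioimages server is actually configured.

```python
# Hypothetical mapping from requested MIME type to the file suffix served.
REPRESENTATIONS = {
    "text/html": ".htm",            # web page for humans
    "application/rdf+xml": ".rdf",  # RDF/XML for computers
}


def parse_accept(accept_header):
    """Return the MIME types in an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        mime_type = fields[0].lower()
        if not mime_type:
            continue
        quality = 1.0  # default when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            ranked.append((-quality, position, mime_type))
    return [mime_type for _, _, mime_type in sorted(ranked)]


def negotiate(guid, accept_header, default="text/html"):
    """Pick the document URI to serve for a GUID, given the Accept header."""
    for mime_type in parse_accept(accept_header):
        if mime_type in REPRESENTATIONS:
            return guid + REPRESENTATIONS[mime_type]
    return guid + REPRESENTATIONS[default]  # assumed fallback: the web page


guid = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
print(negotiate(guid, "text/html,application/xhtml+xml;q=0.9"))
print(negotiate(guid, "application/rdf+xml"))
```

The GUID itself never changes; only the document returned for it depends on who is asking.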
What’s so great about this?
• A computer can crawl the Web and discover metadata about resources that are identified by HTTP URI GUIDs.
• RDF metadata from many sources can be assembled into a database (RDF “triple store”).
• The database can be searched or used to generate web content.
• Source data does not need to be “sent” to the database; any “semantic web client” can retrieve it at will.
• The format is standard; no special communication protocols are required.
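The idea of assembling statements from many sources into one searchable triple store can be illustrated with a toy in-memory version. This is only a sketch: the class TripleStore, the two source lists, and the URIs for trees 7-340 and 4-145 (built here by following the same pattern as 7-314) are assumptions for the example. A real triple store would be dedicated database software.

```python
BASE = "http://bioimages.vanderbilt.edu/vanderbilt/"

# Statements as they might be harvested from two separate RDF documents.
source_a = [
    (BASE + "7-314", "dwc:establishmentMeans", "native"),
    (BASE + "7-314", "foaf:depiction", "http://bioimages.vanderbilt.edu/baskauf/79657"),
]
source_b = [
    (BASE + "7-340", "dwc:establishmentMeans", "native"),
    (BASE + "4-145", "dwc:establishmentMeans", "cultivated"),
]


class TripleStore:
    """A toy triple store: a set of (subject, property, object) statements."""

    def __init__(self):
        self._triples = set()

    def add_all(self, triples):
        # Using a set means a statement harvested twice is stored once.
        self._triples.update(triples)

    def match(self, subject=None, prop=None, obj=None):
        """Return statements matching the pattern; None acts as a wildcard."""
        return sorted(
            triple
            for triple in self._triples
            if (subject is None or triple[0] == subject)
            and (prop is None or triple[1] == prop)
            and (obj is None or triple[2] == obj)
        )


store = TripleStore()
store.add_all(source_a)
store.add_all(source_b)

# Which trees are native?
native_trees = [s for s, _, _ in store.match(prop="dwc:establishmentMeans", obj="native")]
print(native_trees)
```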
Why would this benefit me now?
• RDF/XML metadata files for numerous resources can be transformed directly into web pages using a single program file.
single web page using XSLT and/or AJAX
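The slides name XSLT and/or AJAX for this transformation. As a stand-in, the sketch below shows the same idea in Python: one function that turns any RDF/XML description of this shape into a simple HTML page. The function name rdf_to_html, the page layout, and the rdf:RDF wrapper with namespace declarations in the sample are assumptions for illustration.

```python
import html
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
    <dwc:establishmentMeans>native</dwc:establishmentMeans>
    <foaf:depiction rdf:resource="http://bioimages.vanderbilt.edu/baskauf/79657"/>
  </rdf:Description>
</rdf:RDF>"""


def rdf_to_html(rdf_xml):
    """Render each described resource as a heading plus a list of its properties."""
    root = ET.fromstring(rdf_xml)
    lines = ["<html><body>"]
    for description in root.iter(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about", "")
        lines.append(f"<h1>{html.escape(subject)}</h1>")
        lines.append("<ul>")
        for prop in description:
            # ElementTree tag names look like "{namespace}localName".
            local_name = prop.tag.rsplit("}", 1)[-1]
            target = prop.get(f"{{{RDF}}}resource")
            if target is not None:
                # Relationship property: link to the other resource.
                value = f'<a href="{html.escape(target)}">{html.escape(target)}</a>'
            else:
                # Literal property: show the text string.
                value = html.escape(prop.text or "")
            lines.append(f"<li>{html.escape(local_name)}: {value}</li>")
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)


page = rdf_to_html(SAMPLE)
print(page)
```

Because the function knows nothing about any particular tree, the same code serves every resource that has an RDF/XML file.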
Benefits (cont.)
• Branding in the URI.
http://bioimages.vanderbilt.edu/vanderbilt/7-314
Benefits (cont.)
• HTTP URI GUIDs provide direct access to metadata about a resource to anyone with Internet access.
– Clickable attribution link in website
– Reference link in publication PDF
– Physical QR codes for Smart Phone access
QR code on a museum display
http://bioimages.vanderbilt.edu/