Persistent identifiers for digitized specimens
GBIF European Regional Nodes Meeting, 6 to 8 March, 2013, Joensuu, Finland
Globally unique identifiers for digitized specimens
Comparison of alternatives
Dag Endresen
GBIF Norway, NHM-UiO
Natural History Museum, University of Oslo (NHM-UiO)
Global Biodiversity Information Facility (GBIF)
6 March 2013
Topics
• Darwin Core (DwC) & Identifiers
• Persistent Identifiers
• UUIDs
• PID and the digitization workflow
Darwin Core – a vocabulary of terms
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
Term name: occurrenceID
Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID
Class: http://rs.tdwg.org/dwc/terms/Occurrence
Definition: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]".
Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732".
For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
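The fallback pattern in the comment above can be sketched in a few lines of Python. This is only an illustration: the function name `build_occurrence_id` and the rule of rejecting empty parts or parts containing a colon are assumptions made here, not part of the Darwin Core definition.

```python
def build_occurrence_id(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Build a fallback occurrenceID of the form
    urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber].

    Assumption (not from Darwin Core): empty parts and parts containing ':'
    are rejected, because they would make the identifier ambiguous.
    """
    parts = [p.strip() for p in (institution_code, collection_code, catalog_number)]
    if any(not p or ":" in p for p in parts):
        raise ValueError("each part must be non-empty and must not contain ':'")
    return "urn:catalog:" + ":".join(parts)


# Reproduces the second example given in the term definition.
print(build_occurrence_id("FMNH", "Mammal", "145732"))
```

An identifier built this way is only as unique as the combination of codes it is built from, which is why the definition treats it as a substitute for a bona fide global unique identifier rather than an equivalent.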
Record-level Terms: dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties
Occurrence: occurrenceID | catalogNumber | occurrenceRemarks | recordNumber | recordedBy | individualID | individualCount | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | otherCatalogNumbers | previousIdentifications | associatedMedia | associatedReferences | associatedOccurrences | associatedSequences | associatedTaxa
Event: eventID | samplingProtocol | samplingEffort | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | fieldNumber | fieldNotes | eventRemarks
dcterms:Location: locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks
GeologicalContext: geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed
Identification: identificationID | identifiedBy | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks | identificationQualifier | typeStatus
Taxon: taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks
ResourceRelationship (Auxiliary Terms): resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks
MeasurementOrFact (Auxiliary Terms): measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks
Semantic MediaWiki: a forum for discussion and development of terminology.
http://terms.gbif.org/
• Persistent Identifier (PID)
• Globally Unique Identifier (GUID)
• Uniform Resource Identifier (URI)
• Persistent Uniform Resource Locator (PURL)
• Life Science Identifier (LSID)
• Digital Object Identifier (DOI)
• Handle system (Handle)
• Archival Resource Key (ARK)
• Universally Unique Identifier (UUID)
• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Generation of IDs
  – Central generation, PID issuer
  – Distributed generation at source
• A UUID is a 16-octet (128-bit) number.
• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
• For randomly generated (version 4) UUIDs, the probability of at least one duplicate would reach about 50% only if every person on earth owned on the order of 600 million UUIDs.
• Allows for easy generation at source in a distributed network.
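As a minimal sketch of generation at source, Python's standard `uuid` module can mint a random (version 4) UUID locally, with no central issuer involved. The helper `uuids_for_collision_probability` is a hypothetical name introduced here. It uses the standard birthday approximation and assumes 122 random bits per version 4 UUID, so it gives an order-of-magnitude estimate rather than an exact figure.

```python
import math
import uuid

# Mint a new random (version 4) UUID locally, e.g. at the digitization station.
specimen_uuid = uuid.uuid4()
print(specimen_uuid)       # canonical 36-character form
print(specimen_uuid.urn)   # same value as a URN, prefixed with "urn:uuid:"

# Parse the example from the slide: 16 octets, version 4.
example = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")
print(len(example.bytes), example.version)

RANDOM_BITS = 122  # assumption: a version 4 UUID has 6 fixed bits out of 128


def uuids_for_collision_probability(p: float) -> float:
    """Approximate number of random UUIDs needed before the chance of at
    least one duplicate reaches p (birthday approximation)."""
    return math.sqrt(2 * (2 ** RANDOM_BITS) * math.log(1 / (1 - p)))


print(f"{uuids_for_collision_probability(0.5):.3e}")
```

Under these assumptions the 50% threshold is on the order of 10^18 identifiers in total, which is far beyond the number of specimens any collection network would label.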
• Quick Response Code (QR code).
• A type of matrix barcode (or two-dimensional code).
• Popular due to its fast readability and large storage capacity.
• The use of QR Codes is free of any license.
• The QR Code is clearly defined and published as an ISO standard.
• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
QR codes on all museum objects at NHM-UiO would provide:
• Labels that are machine-readable with an ordinary smartphone (or a barcode reader).
• New and efficient workflows for collection management.
• Stable identifiers suitable for databasing.
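One way such a workflow might look is sketched below, using only the Python standard library. Each object gets a UUID-based payload string that would be printed as a QR code, and a scan is resolved back to the catalog record. The catalog numbers, the function names and the in-memory dictionary standing in for the collection database are all hypothetical. Rendering the actual QR image would need a separate barcode library and is not shown.

```python
import uuid


def mint_label_payloads(catalog_numbers):
    """Assign each catalog number a new 'urn:uuid:' payload for its QR label.

    Returns a dict mapping payload -> catalog number (a stand-in for a
    database table; an assumption made for this sketch).
    """
    return {uuid.uuid4().urn: number for number in catalog_numbers}


def resolve_scan(payload, registry):
    """Look up the catalog number for a scanned payload, or None if unknown."""
    return registry.get(payload)


# Hypothetical catalog numbers, for illustration only.
registry = mint_label_payloads(["EX-0001", "EX-0002", "EX-0003"])

for payload, number in registry.items():
    print(number, "->", payload)

# Simulate scanning the first label.
scanned = next(iter(registry))
print(resolve_scan(scanned, registry))
```

Because the payload is minted once and never derived from the catalog number, it can stay stable even if the object is later recatalogued.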
dwc:datasetID DOI?
Furthermore, I think that we need persistent identifiers!
Cato the Elder ended all his speeches in the senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").