The GBIF IPT in the landscape of biodiversity data publishing
-
Biodiversity Data Publishing: Overview Fhatani Ranwashe
-
INDEX
• Background
• Data publishing landscape
• Biodiversity data publishing
o Data types
o Data standards
o Data quality
o Data publishing methods
• Data publishing in South Africa
-
Background
“Free and open access to primary biodiversity data is
essential for informed decision-making to achieve
conservation of biodiversity and sustainable development.
However, primary biodiversity data are neither easily
accessible nor discoverable.” (Chavan and Penev, 2011)
Biodiversity data publishing refers to making biodiversity
data available to the public in a standard form via the
internet.
-
Data publishing landscape
Publishing and discovery of biodiversity data: the constraints and challenges
• the lack of sustainable practices for data publishing;
• the lack of easy-to-use tools and related guidelines for authoring metadata
documents;
• the difficulty of dealing with heterogeneity and diversity of standards;
• the cost of creation and maintenance of infrastructure by small- and medium-scale
data publishers; and
• the lack of professional reward structures or incentives.
Chavan and Penev, BMC Bioinformatics 2011, 12(Suppl 15):S2
-
Data publishing landscape
“Currently the GBIF facilitates discovery of over 10,000 data resources,
providing access to over 267 million primary biodiversity data records.” (Chavan
and Penev, 2011)
GBIF works in partnership with other international organisations such as the
Catalogue of Life, Biodiversity Information Standards (TDWG), the Consortium for
the Barcode of Life (CBOL), the Encyclopedia of Life (EOL), and the Integrated
Taxonomic Information System (ITIS).
-
Data publishing landscape
The data publishing area is in continuous evolution and expansion.
Timeline, 2008 to 2014 (milestones in the order shown on the slide):
• Idea for a simple, compressed text-based file for publishing introduced at TDWG
• DiGIR/TAPIR in high use to publish biodiversity data
• GBIF introduces IPT 1.0
• Darwin Core standards established by TDWG
• GBIF redevelops IPT
• GBIF introduces IPT 2.0
• Data publishing taught at Nodes training
• Nodes and aggregators begin to install and use IPTs
• Occurrence and checklist type datasets show continued growth
• GBIF introduces IPT 3.0 with Digital Object Identifier (DOI)
-
Data Types
• Occurrences (observations, specimens, etc.)
o 'collection event'
o an observation in the field, vouchered (labeled) specimen in a museum or
herbarium, or other evidence.
-
Data Types
• Checklists (names)
o lists of scientific names of organisms grouped into taxonomic hierarchies,
o provide taxonomic 'backbones' around which species information can be
organized.
-
Data Types
• Metadata (data about data)
o structured descriptions of datasets
o help to give context to datasets and enable users to assess whether data are
fit for use in a particular research project or application
-
Data Types
• Sampling-event (quantitative information)
o records from thousands of different kinds of environmental, ecological, and
natural resource monitoring and assessment investigations
Source: http://www.gbif.org/publishing-data/summary#datatypes
-
Data Standards
• ABCD: Access to Biological Collection Data (2005)
• DwC: Darwin Core (2009)
• AC: Audubon Core Multimedia Resources Metadata Schema (2013)
• NCD: Natural Collection Descriptions (Draft)
http://www.tdwg.org
-
Darwin Core
-
Mapping cores
• Taxon Core – Species information
The category of information pertaining to taxonomic names, taxon
name usages, or taxon concepts. 43 terms.
• Occurrence Core – Collection event information
The category of information pertaining to evidence of an
occurrence in nature, in a collection, or in a dataset (specimen,
observation, etc.). 169 terms.
• Event Core – Sampling information
The category of information pertaining to a sampling event. Issued
29 May 2015. 95 terms.
-
Extensions
• Darwin Core does not provide terms for every possible type of data;
extensions add the terms it lacks.
– 22 extensions registered
– 25 extensions under development
• Examples
– Audubon Media Description (aka Audubon Core)
– Darwin Core Identification History
– Darwin Core Measurement or Facts
-
Darwin Core Archive
• A Darwin Core Archive (DwCA) is the text
representation of data formatted to Darwin
Core.
• A DwCA is a compressed file containing a
minimum of three files.
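As a rough illustration of the three-file layout, the sketch below builds a small archive in memory with Python's standard library. The file names occurrence.txt, meta.xml and eml.xml follow the common convention; the helper name build_dwca, the sample record and the skeletal EML content are assumptions made for illustration, and the EML placeholder is not a complete, schema-valid metadata document.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"


def build_dwca(rows):
    """Return the bytes of a zip holding a data file, meta.xml and eml.xml."""
    columns = ["occurrenceID", "basisOfRecord", "scientificName"]
    lines = ["\t".join(columns)]
    lines += ["\t".join(row[name] for name in columns) for row in rows]
    data = "\n".join(lines) + "\n"

    # Column 0 is the record id; the remaining columns map to Darwin Core terms.
    fields = "".join(
        f'    <field index="{i}" term="{DWC}{name}"/>\n'
        for i, name in enumerate(columns)
        if i > 0
    )
    # meta.xml describes the data file: delimiter, header line, column meanings.
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">\n'
        f'  <core encoding="UTF-8" rowType="{DWC}Occurrence"\n'
        '        fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="1">\n'
        '    <files><location>occurrence.txt</location></files>\n'
        '    <id index="0"/>\n'
        f'{fields}'
        '  </core>\n'
        '</archive>\n'
    )
    # Skeletal placeholder only (assumption): real EML needs creator, contact, etc.
    eml = (
        '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">'
        '<dataset><title>Example dataset</title></dataset>'
        '</eml:eml>\n'
    )

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", data)
        archive.writestr("meta.xml", meta)
        archive.writestr("eml.xml", eml)
    return buffer.getvalue()


# Hypothetical sample record, for illustration only.
sample = [{
    "occurrenceID": "demo-1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Panthera leo",
}]
dwca_bytes = build_dwca(sample)
```

In practice a tool such as the IPT generates these files; the sketch only shows what ends up inside the compressed file.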
-
Star schema
[Figure: star schema of a checklist DwC Archive. A Taxon Core sits at the centre,
linked to Literature, Description, Occurrences, Vernacular, Distribution and
Types; these data files + meta.xml + EML.xml = DwC Archive.]
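One way to picture the star layout is a core table whose rows are referenced by rows in the surrounding tables through a shared identifier. The sketch below groups such rows under their core record. The function name attach_extensions, the use of taxonID as the linking field and the sample rows are all assumptions for illustration, not part of the original slides.

```python
from collections import defaultdict


def attach_extensions(core_rows, extensions, id_field="taxonID"):
    """Group extension rows under the core record they point back to."""
    linked = {
        row[id_field]: {"core": row, "extensions": defaultdict(list)}
        for row in core_rows
    }
    for name, rows in extensions.items():
        for row in rows:
            record = linked.get(row[id_field])
            if record is not None:  # ignore rows that reference no core record
                record["extensions"][name].append(row)
    return linked


# Hypothetical sample data: one taxon with two kinds of extension rows.
taxa = [{"taxonID": "t1", "scientificName": "Panthera leo"}]
extras = {
    "vernacular": [
        {"taxonID": "t1", "vernacularName": "Lion", "language": "en"},
        {"taxonID": "t9", "vernacularName": "Orphan row", "language": "en"},
    ],
    "distribution": [{"taxonID": "t1", "locality": "South Africa"}],
}
star = attach_extensions(taxa, extras)
```

Each core record can carry any number of rows per surrounding table, which is what gives the layout its star shape.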
-
Simple Darwin Core
• Simple Darwin Core (SIMPLEDWC): a flat file
structure showing how to use taxon and
occurrence Darwin Core terms.
• Use it if someone asks you to "format your
data according to the Darwin Core".
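A flat file of this kind can be sketched as one row per record and one column per Darwin Core term. The example below writes such a file as CSV text. The column names are Darwin Core terms; the choice of these particular columns, the function name to_simple_dwc and the sample record are assumptions for illustration.

```python
import csv
import io

# A small, illustrative selection of Darwin Core terms used as column headers.
SIMPLE_DWC_COLUMNS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "countryCode",
]


def to_simple_dwc(records):
    """Serialise records to CSV text with one column per Darwin Core term."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=SIMPLE_DWC_COLUMNS,
        extrasaction="ignore",  # drop keys that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing terms are written as empty cells
    return out.getvalue()


# Hypothetical record, for illustration only.
flat_text = to_simple_dwc([{
    "occurrenceID": "demo-1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Panthera leo",
    "eventDate": "2015-05-29",
    "decimalLatitude": "-25.75",
    "decimalLongitude": "28.19",
    "countryCode": "ZA",
    "fieldNotes": "local column that is not exported",
}])
```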
-
Data quality
“Data quality and error in data are often neglected issues with environmental
databases, modelling systems, GIS, decision support systems, etc. Too often, data are
used uncritically without consideration of the error contained within, and this can lead
to erroneous results, misleading information, unwise environmental decisions
and increased costs.” (Chapman, 2005)
Loss of data quality can occur at any of these stages:
• Data capture and recording at the time of gathering,
• Data manipulation prior to digitisation (label preparation, copying of data to a
ledger, etc.),
• Identification of the collection (specimen, observation) and its recording,
• Digitisation of the data,
• Documentation of the data (capturing and recording the metadata),
• Data storage and archiving,
• Data presentation and dissemination (paper and electronic publications, web-
enabled databases, etc.),
• Using the data (analysis and manipulation).
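Some errors introduced at the digitisation stage can be caught with simple automated checks before data are published. The sketch below assumes records are dictionaries keyed by Darwin Core terms. The specific checks (required fields, coordinate ranges) and the name find_issues are illustrative assumptions, not a procedure taken from Chapman (2005).

```python
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName")
COORDINATE_LIMITS = (("decimalLatitude", 90.0), ("decimalLongitude", 180.0))


def find_issues(record):
    """Return a list of human-readable problems found in one record."""
    issues = []
    for term in REQUIRED_TERMS:
        if not str(record.get(term, "")).strip():
            issues.append(f"missing {term}")
    for term, limit in COORDINATE_LIMITS:
        raw = record.get(term)
        if raw in (None, ""):
            continue  # coordinates are optional in this sketch
        try:
            value = float(raw)
        except (TypeError, ValueError):
            issues.append(f"{term} is not a number")
            continue
        if abs(value) > limit:
            issues.append(f"{term} out of range")
    return issues


# Hypothetical records, for illustration only.
clean = {
    "occurrenceID": "demo-1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Aloe ferox",
    "decimalLatitude": "-33.9",
    "decimalLongitude": "18.4",
}
faulty = {"occurrenceID": "demo-2", "decimalLatitude": "95"}
clean_issues = find_issues(clean)
faulty_issues = find_issues(faulty)
```

Checks like these cover only a narrow class of errors; identification or labelling mistakes made earlier in the chain still need expert review.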
-
Data publishing methods
-
Data publishing in South Africa
South Africa is a country node for GBIF
• SANBI IPT
o http://197.189.235.147:8080/iptsanbi/
o 2,737,451 records published
• ADU IPT
o http://aduipt.uct.ac.za:8080/ipt-2.3.2/
o 288,822 records published
• SAIAB IPT
o http://ipt.saiab.ac.za
o 138,140 records published
• ICLEI IPT
o http://197.189.235.147:8080/ipticlei/
• KwaZulu-Natal Museum IPT
• Endangered Wildlife Trust IPT
-
SANBI and Data Partners by numbers
[Figure: SANBI-IPT (Trusted Data Hosting Centre) with data partner counts;
partner labels are not recoverable. Values shown: 2.1m, 649k, 105k, 78k, 8.5m,
5k, 51k (Nematodes), 18k (Algae), 17k, 91k.]
-
Thank You!