Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

9 SEPT 2015 EnviroInfo & ICT4S 2015 Kyle Copas

Transcript of Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

Page 1: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

9 SEPT 2015

EnviroInfo & ICT4S 2015

Kyle Copas

Page 2: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

WHAT IS GBIF?

http://www.gbif.org

Page 3: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

WHAT IS GBIF?

http://www.gbif.org

• Open-access research infrastructure for biodiversity information, funded by the world’s governments

• Established in 2001 on the recommendation of the OECD Global Science Forum in 1999

• 92 national and organizational participants: ‘member state’ approach

• Secretariat hosted by the University of Copenhagen (KU) and attached to the Zoological Museum in Copenhagen

Page 4: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

WHAT DOES GBIF DO?

• Provides and maintains open access and web services to global species data through GBIF.org

• Promotes common standards and free tools for biodiversity data management and exchange

• Offers guidance on national mobilization of biodiversity information

• Supports collaborative network at global and regional levels

www.gbif.org

Page 5: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

GBIF BY THE NUMBERS

570,330,659 species occurrence records

1,611,321 species

15,109 datasets

763 data-publishing institutions

http://www.gbif.org | 9 SEP 2015

Page 6: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

CITATIONS IN PEER-REVIEWED RESEARCH

research use

Annual number of peer-reviewed publications using GBIF-mediated data (2 SEP 2015)

Year              Publications
2008              52
2009              89
2010              148
2011              169
2012              229
2013              249
2014              357
2015 (Jan-Aug)    261

Page 7: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

OCCURRENCE DATA SHARED THROUGH GBIF

Observations from field surveys, inventories and citizen scientists

Records extracted from literature

Specimens from museum and herbarium collections

Page 8: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 9: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 10: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

DARWIN CORE (DwC) STANDARD

http://rs.tdwg.org/dwc/index.htm

Page 11: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

DARWIN CORE ARCHIVE (DwC-A)

http://rs.tdwg.org/dwc/terms/guides/text/index.htm

Page 12: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

DARWIN CORE ARCHIVE (DwC-A)

http://rs.tdwg.org/dwc/terms/guides/text/index.htm

Preferred format for publishing data to GBIF

• Simple format: text files

• Efficient harvesting: single file

• Efficient storage: compressed

• Easy access: no special software required

• Extensible: related files in one archive
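
The "text files, single file, no special software" points can be illustrated with a minimal sketch that uses only the Python standard library. It builds a tiny archive in memory and reads it back by following meta.xml. The file name occurrence.txt, the two sample records and the helper names build_sample_archive and read_core are assumptions made for illustration, not GBIF tooling; real archives normally also carry an EML metadata file and may include extension files.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by Darwin Core Archive descriptor files (meta.xml).
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Illustrative descriptor: one core file, tab-delimited, one header line.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/countryCode"/>
  </core>
</archive>
"""

# Made-up sample records.
OCCURRENCES = (
    "occurrenceID\tscientificName\tcountryCode\n"
    "occ-1\tPuma concolor\tUS\n"
    "occ-2\tParus major\tSE\n"
)


def build_sample_archive() -> bytes:
    """Zip the descriptor and the data file into one compressed archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt", OCCURRENCES)
    return buf.getvalue()


def read_core(archive_bytes: bytes) -> list[dict]:
    """Read the core data file, using meta.xml to locate and label columns."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", NS)
        location = core.find("dwc:files/dwc:location", NS).text
        # meta.xml stores the delimiter as an escape sequence such as \t.
        raw_delimiter = core.get("fieldsTerminatedBy", ",")
        delimiter = raw_delimiter.encode().decode("unicode_escape")
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> short term name (last segment of the term URI).
        columns = {int(core.find("dwc:id", NS).get("index")): "id"}
        for field in core.findall("dwc:field", NS):
            term = field.get("term").rsplit("/", 1)[-1]
            columns[int(field.get("index"))] = term
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


records = read_core(build_sample_archive())
for record in records:
    print(record)
```

The sketch ignores extensions and quoting options; it only shows that the core of an archive can be read with a zip reader, an XML parser and a delimited-text reader.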

Page 13: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

http://www.gbif.org/newsroom/news/sample-based-data

• Darwin Core extension for sample-based data ratified by TDWG

• IPT v2.3 (released today?) supports publication of sample-based datasets

• Training manual on publishing sample-based datasets coming in Oct 2015

• GBIF.org to add simple search and discovery of sample-based datasets in 2016

data mobilization

SAMPLE-BASED DATA

2015 objective: Enable mobilization and discovery of biodiversity monitoring data using defined sampling protocols, including measures of species abundance

Photo CC BY-ND 2014 Florida Fish and Wildlife, Karen Parker https://flic.kr/p/pJVe2R

Page 14: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

DwC-A WITH EVENT CORE

http://rs.tdwg.org/dwc/terms/guides/text/index.htm

Event core

Occurrence

Measurement-or-fact

meta.xml

EML.xml…

+DwC Archive
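
The layout on this slide can be read as a join: rows in an extension file point back to a row in the Event core through a shared identifier. The sketch below assumes the Darwin Core terms eventID, eventDate, samplingProtocol, scientificName and individualCount; the records and the helper abundance_by_event are made up for illustration.

```python
from collections import defaultdict

# Event core: one row per sampling event (made-up data).
events = [
    {"eventID": "transect-1", "eventDate": "2015-06-01",
     "samplingProtocol": "line transect"},
    {"eventID": "transect-2", "eventDate": "2015-06-02",
     "samplingProtocol": "line transect"},
]

# Occurrence extension: each row refers to its event via eventID.
occurrences = [
    {"eventID": "transect-1", "scientificName": "Parus major",
     "individualCount": 4},
    {"eventID": "transect-1", "scientificName": "Turdus merula",
     "individualCount": 2},
    {"eventID": "transect-2", "scientificName": "Parus major",
     "individualCount": 1},
]


def abundance_by_event(core, extension):
    """Return {eventID: {scientificName: total individualCount}}."""
    totals = {event["eventID"]: defaultdict(int) for event in core}
    for row in extension:
        event_id = row["eventID"]
        if event_id in totals:  # skip rows that reference no known event
            totals[event_id][row["scientificName"]] += row["individualCount"]
    return {event_id: dict(counts) for event_id, counts in totals.items()}


result = abundance_by_event(events, occurrences)
for event_id, counts in result.items():
    print(event_id, counts)
```

Because every event keeps its own entry, an event with no matching occurrences still appears with an empty result, which is the kind of information a sampling protocol makes meaningful.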

Page 15: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 16: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

GBIF API v1

RESTful JSON-based API provides discovery, access and information services to users (including GBIF) for

• Registry: administrative details

• Checklists: taxonomies and spp. info

• Occurrences: search and download

• Maps

• Metrics and utilities

Active, responsive mailing list for API users

Fully documented on GBIF.org

http://www.gbif.org/developer/summary
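
As a rough illustration of the occurrence search service, here is a minimal sketch using only the Python standard library. It assumes the v1 endpoint /occurrence/search with the query parameters scientificName, country and limit, and a paged JSON response containing count and results. The helper names, the https scheme and the canned page are assumptions for illustration, not part of the talk.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.gbif.org/v1"


def occurrence_search_url(**params) -> str:
    """Build an occurrence search URL; parameters are sorted for stability."""
    return f"{API_ROOT}/occurrence/search?{urlencode(sorted(params.items()))}"


def summarize(page: dict) -> dict:
    """Reduce one page of search results to a total and a list of names."""
    names = [record.get("scientificName") for record in page["results"]]
    return {"count": page["count"], "names": names}


def search(**params) -> dict:
    """Fetch one page of results (needs network access; not called here)."""
    with urlopen(occurrence_search_url(**params), timeout=30) as response:
        return json.load(response)


url = occurrence_search_url(scientificName="Parus major", country="SE", limit=5)
print(url)

# Canned page shaped like a search response, used instead of a live call.
sample_page = {
    "offset": 0,
    "limit": 5,
    "endOfRecords": False,
    "count": 2,
    "results": [
        {"key": 1, "scientificName": "Parus major Linnaeus, 1758"},
        {"key": 2, "scientificName": "Parus major Linnaeus, 1758"},
    ],
}
print(summarize(sample_page))
```

Calling search(scientificName="Parus major", country="SE", limit=5) would issue the live request; results and counts will differ from the canned page.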

Page 17: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

REPORTING ON DATA TRENDS

http://www.gbif.org/analytics/global

global

Page 18: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

REPORTING ON DATA TRENDS

http://www.gbif.org/analytics/country/SE/published

sweden

Page 19: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

http://www.gbif.org/country

• Currently connecting data and other APIs to automate national reporting

• Intended to show important details on national use and impact of GBIF

• Links metrics, trend analysis and recent content

• Board review of prototype reports next month; roll-out expected for all countries in early 2016

what’s next

COUNTRY REPORTS

2015 objective: Provide regular updates to highlight availability, use and impact of data mobilized by the GBIF network at national level

Page 20: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 21: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 22: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

GBIF API: JSON RESPONSE

http://api.gbif.org/v1/occurrence/985492326/fragment
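
The URL above addresses a single occurrence by its numeric key, with /fragment asking for the record in the raw form in which it was supplied. A minimal sketch of building and fetching such URLs follows; the helper names occurrence_url and fetch_text are assumptions, and the fragment is treated as plain text since it may not be JSON for every record.

```python
from urllib.request import urlopen

API_ROOT = "http://api.gbif.org/v1"


def occurrence_url(key: int, fragment: bool = False) -> str:
    """URL of one occurrence record, or of its raw fragment."""
    url = f"{API_ROOT}/occurrence/{key}"
    return f"{url}/fragment" if fragment else url


def fetch_text(url: str) -> str:
    """Download a response body as text (needs network; not called here)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


interpreted = occurrence_url(985492326)
raw = occurrence_url(985492326, fragment=True)
print(interpreted)
print(raw)
```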

Page 23: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

DATA SOURCES

Citizen science networks

• eBird: 150 million high-quality observations

• Recorder networks in UK, Sweden, Norway, Finland et al.

Bioblitzes

• Sometimes hosted by national and organizational participants

• Important tool for public outreach

Crowdsourced digitization

• Virtual public transcription projects: ongoing or timebound

citizen science

Page 24: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

CITIZEN SCIENCE DATA: 2014

TOP NATIONAL PUBLISHERS EXCLUDING eBIRD

GBIF Secretariat analysis, Nov 2014

Country      # Observations
USA          161,894,332
Sweden       38,604,747
UK           22,304,967
Finland      13,519,114
Australia    7,448,478
Germany      4,951,803
Denmark      4,609,679
Ireland      1,582,524
Norway       370,911
Estonia      169,086
Belgium      370,911
Canada       121,916

Country      # Observations
Sweden       38,604,747
UK           22,304,967
Finland      13,519,114
USA          8,362,169
Australia    7,448,478
Germany      4,951,803
Denmark      4,609,679
Ireland      1,582,524
Norway       370,911
Estonia      169,086
Belgium      370,911
Canada       121,916

citizen science

Page 25: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

BASIS OF RECORD

• Greater proportion of observations

• Contributions to specimen data are underreported and underappreciated

GBIF Secretariat analysis, Nov 2014

Basis of record        Citizen science    Institutional
Human observation      204,330,760        67,405,200
Observation            45,803,179         55,760,093
Unknown                3,782,583          36,026,607
Preserved specimen     1,984,634          97,090,284
Literature             51,204             404,875
No information         81                 3
Fossil specimen        47                 3,850,597
Living specimen        0                  822,136
Machine observation    0                  689,739
Material sample        0                  2,293

Ratio of specimens     0.8%               39%
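
The "Ratio of specimens" row can be reproduced from the counts in the table. The sketch below assumes that "specimens" means the preserved, fossil and living specimen rows divided by the column total; the slide does not state the definition, so treat that grouping as an assumption.

```python
# Counts copied from the table (GBIF Secretariat analysis, Nov 2014).
citizen_science = {
    "Human observation": 204_330_760,
    "Observation": 45_803_179,
    "Unknown": 3_782_583,
    "Preserved specimen": 1_984_634,
    "Literature": 51_204,
    "No information": 81,
    "Fossil specimen": 47,
    "Living specimen": 0,
    "Machine observation": 0,
    "Material sample": 0,
}
institutional = {
    "Human observation": 67_405_200,
    "Observation": 55_760_093,
    "Unknown": 36_026_607,
    "Preserved specimen": 97_090_284,
    "Literature": 404_875,
    "No information": 3,
    "Fossil specimen": 3_850_597,
    "Living specimen": 822_136,
    "Machine observation": 689_739,
    "Material sample": 2_293,
}

# Assumption: these three bases of record are what the slide calls specimens.
SPECIMEN_BASES = {"Preserved specimen", "Fossil specimen", "Living specimen"}


def specimen_ratio(counts: dict) -> float:
    """Share of records whose basis of record is a specimen type."""
    specimens = sum(n for basis, n in counts.items() if basis in SPECIMEN_BASES)
    return specimens / sum(counts.values())


print(f"Citizen science: {specimen_ratio(citizen_science):.1%}")  # 0.8%
print(f"Institutional:   {specimen_ratio(institutional):.0%}")    # 39%
```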

citizen science

Page 26: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

CITIZEN SCIENCE NETWORKS

iNaturalist

• Naturalists + social media + photo sharing = citizen science

• Open source software leveraging existing APIs

• Community curation produces ‘research grade’ observations

http://www.inaturalist.org

citizen science

Page 27: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 28: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 29: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data
Page 30: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

SUGGESTIONS / OBSERVATIONS

• Continue to build on existing standards
  Ensure that domain-specific infrastructures are cross-compatible, at a minimum

• Keep it simple
  Complexity, even if it’s under wraps, could be at odds with the needs of citizen scientists

• Focus on the user rather than the observer
  Citizens want to make a difference, so ensure that their data helps users first and foremost

www.gbif.org

Page 31: Intro to GBIF: Infrastructures and Platforms for Environmental Crowd Sensing and Big Data

Kyle Copas

gbif.org
Twitter @gbif
Facebook gbifnews
LinkedIn linkedin.com/grp/home?gid=55171
Github github.com/gbif

GBIF Ebbe Nielsen Challenge gbif.devpost.com

gbif2.devpost.com