iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF-1115210). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. All images used with permission or are free from copyright.
Insights from Advancing the Digitization of Biodiversity Collections (ADBC)
Deborah Paul, Greg Riccardi, Gil Nelson
iDigBio, Florida State University
ICEDIG, 5-6 March 2018
@idbdeb @griccardi @iDigGilNelson @iDigBio
Topics
• ADBC Model Integrated Collections Network
• Community Building
• Resources developed
• Lessons learned
• Key components of such a program
How to get digitisation going (c. 2009)
• Step 1: Make a plan and get funding
• Step 2: Create a central coordination program
• Step 3: Fund digitization projects
• Step 4: Digitize and organize
• Step 5: Publish and use data
• ∞: figure out how to keep going
ADBC: Advancing Digitization of Biodiversity Collections
• A call from the community > NIBA
• US National Science Foundation
– Budget is $100 million over 10 years, we are in year 7.
• The goal is to digitize and aggregate
– 100s of millions of biological and paleontological records over the 10-year life of the project.
• iDigBio project is the hub of ADBC
– U of Florida, Florida State U
• Digitization projects
– Funded by NSF peer review
– 20 Thematic Collections Networks
• We are encouraged by our funders to collaborate
– Issues are global, effort needs to be global
ADBC is iDigBio and the Thematic Collection Networks (TCNs)
Credit: Malcolm Burrows
NIBA Strategic Plan 2010
• Vision Statement: NIBA will
– Develop an inclusive, vibrant partnership of U.S. biological collections
– Document the nation’s biodiversity resources
– Create a dynamic electronic resource
– Serve the country’s needs in answering critical questions about the environment, human health, biosecurity, commerce, and the biological sciences
What does iDigBio do?
• Enable digitization of biodiversity collections data
– Develop efficient & effective standards & workflows
– Workforce education & training
• Provide portal access to biodiversity data in a cloud computing environment
– Respond to cyberinfrastructure needs
– Enable access & discoverability
• Facilitate use of biodiversity data to address key environmental and economic challenges
– Researchers, educators, general public, policy-makers, …
• Plan for long-term sustainability of the national digitization network & effort
– Expand participation: partners, data sources, public, …
– Proliferate and broaden uses of biodiversity data
[Diagram labels: Cyberinfrastructure; Digitization; Education & Outreach; Serving the Research Community]
iDigBio Mission to Coordinate:
• Engaging the collections community
• Facilitating digitization & mobilization of data
• Providing portal and API access to data
• Facilitating research and outreach
108,000,000+
Advancing Digitization of Biodiversity Collections (ADBC)
National Digitization Network
675 participating collections in 336 institutions (20 TCNs + 23 PENs)
Vertebrates, invertebrates, plants, fossils, fungi, tissues, sounds, videos, 2D, 3D, …
iDigBio Portal has 1,537 recordsets containing 105M records for ≈318M specimens with 23M associated media records
Thematic Collections Networks (2 of 20)…
SCAN TCN
a Data Portal Built to Visualize, Manipulate, and Export Species Occurrences
• Southwest Collection of Arthropods TCN evolves
– into Symbiota Collections of Arthropods Network
– From one project to many projects
– Supported by a common platform
– Customized based on community input
• 3 TCNs: SCAN, LepNet, and InvertEBase
• Each museum or project is a separate collection in the database
– but all collections searchable together
More Thematic Collection Networks Highlights
• Thiers – MaCC to MiCC
– infrastructure, community
• Experimenting with light-field photography
– InvertEBase
• Linking data – ePANDDA
• Still need to go 5x faster (Cobb)
– georeferenced and id,
– gaps for Diptera, predator – parasitoid
county=“devon” 2,124
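The line above reads like a portal search filter (county = "devon") with its hit count. As a minimal sketch of how such a filter might be sent through the API access mentioned earlier, the Python below builds a query URL. The endpoint `https://search.idigbio.org/v2/search/records`, the `rq` and `limit` parameters, and the `itemCount` response field are assumptions about the public iDigBio search API, not details taken from these slides; the helper names `build_search_url` and `count_records` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the iDigBio record search endpoint; check the current API docs.
SEARCH_URL = "https://search.idigbio.org/v2/search/records"


def build_search_url(filters, limit=1):
    """Return a search URL whose `rq` parameter carries the filters as JSON."""
    query = urlencode({"rq": json.dumps(filters, sort_keys=True), "limit": limit})
    return f"{SEARCH_URL}?{query}"


def count_records(filters, timeout=30):
    """Fetch the number of matching records (requires network access)."""
    with urlopen(build_search_url(filters), timeout=timeout) as response:
        payload = json.load(response)
    # `itemCount` is the assumed name of the total-hits field in the JSON reply.
    return payload["itemCount"]


if __name__ == "__main__":
    # Only builds and prints the URL; no request is sent.
    print(build_search_url({"county": "devon"}))
    # Uncomment to query the live service. The count changes as data are added,
    # so it need not match the 2,124 shown on the slide.
    # print(count_records({"county": "devon"}))
```

Keeping URL construction separate from the network call makes the filter logic easy to check offline before sending any request.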
Community Building Activities
• Education Outreach: Citizen Science, K-12 materials, Undergraduate, Fossil Clubs, Mentor teachers
• Digitization: Workflows, Protocols, Task Clusters, Dissemination
• Research Use: Tool collaboration, Portal development, ENM workshop, Research Spotlight, Data quality
• Training: Biodiversity data skills, Data literacy, Collections software, Imaging, Project Management
iDigBio Success: Workshop Principles
• Community-driven process
• Each workshop
– Created in response to need
– Organized by interested parties
– Attended by diverse group
• Geographical distribution of sites
• Demographic distribution
• Repository of materials for all
Workshops reveal pattern of skills needs and knowledge gaps
• What skills are needed to mobilize and use the data?
Workshops reveal pattern of skills needs and knowledge gaps
• Digitisation workflow workshops
– Flat Sheets and Packets, Pinned Specimens in Trays and Drawers, Things in Spirits, 3D objects in Trays, Imaging, …
• Capacity building needs revealed
– software
– standards
– data cleaning and management
– spreadsheets, text files
– data visualization and synthesis
– recognizing automatable tasks
– limited number of people in the community with the necessary skills
Actions
• Partner in developing and implementing Data Carpentry, now
• Biodiversity Informatics Workshop Series at iDigBio
– Data Carpentry
– Managing NHC Data
– Demystifying Data Standards and the IPT
– Field to Database
• Partner in Biodiversity Informatics 101 at SPNHC
• Partner in Darwin Core Hour
Developing a Collections Digitization and Data Use Community
[Chart: iDigBio’s Evolving Focus, by project year from 2011 (Year 1) to 2023 (Year 13); series: Discovery and Development, Digitization Best Practices, Research Use of NHC Data]
Cool research uses
• Predicting Extinction
– Sinervo, B. et al. Erosion of lizard diversity by climate change and altered thermal niches. Science 328, 894-899 (2010)
• Using convolutional neural networks to automate tropical pollen counts and identification
– Derek Haselhorst, Program in Ecology, Evolution and Conservation Biology, University of Illinois, iDigBio Research Spotlight: September 2017
• Collecting trends: how wars and human history influence biological collections
– Vaughn Shirey, The Academy of Natural Sciences of Drexel University, now here at Luomus, in iDigBio Spotlight: March 2018
More research published
• Workflows
• Digitisation methods
• Imaging, CT, recordings, CNN
• Phenology
• Public participation
• Georeferencing
• Small collection import
– gap analysis strategic planning, research
– awareness
Exemplary initiatives
• Entomological Collections Network
– creates a cohesive entomological collections family
• SPNHC CC Network and EPG
– University affiliated
– creates momentum, addresses limited expertise and money, while capitalizing on opportunities for students and early career professionals to drive change
• We Dig Bio
– Public Participation
– Visibility and engagement in local and worldwide events
– creates worldwide relevance for your small collection and community
• NANSH.org
– offers a complete model from how to organize and where to share results
• The Carpentries
– foundational biodiversity informatics skills and literacy for reproducible research
My wish list for DiSSCo
• set up plan for data flow before beginning
– data back to providers
– strong data standards
– prescriptive and proscriptive examples
• explicit identifier recommendations / requirements
• implement annotations collaboratively
• set up a clear citation / attribution strategy
– for the project, for data, for collections
– branding, social norms, automated
– visualize research done
• managing media is also challenging
– “send me a hard drive” still exists
– data provider infrastructure access issues
– address archival storage
• (require?) robust collections metadata
• encourage use of extensions, or other methods for getting richer recordsets
• support need for (improving / offering)
– taxon, locality, people service and authority files
• duplicate / related-object finding
• networked – so users can tell which aggregators have which recordsets
• support metrics needs
• support media analysis (ML, CNN, …)
• hardware / software general recommendations?
– public cloud or open source software stacks where possible
• capacity + community building
Where does iDigBio go from here?
• Limited time program (10 years)
• How to sustain activities?
– Digitization projects?
– Support for digitization improvements?
– Data mobilization and use skills?
• How to sustain data infrastructure?
– Data persistence?
– Data quality?
– Data portal?
• How to sustain commitment?
– Governmental?
– Community?
iDigBio Successes: Using Data
• Take a look at the monthly Research Spotlight and Research on our website.
• Watch the presentations and read discussions from the iDigBio workshop Using Biodiversity Specimen-Based Data to Study Global Change.
• Be Ignited by speakers at the Ecological Society of America 2015 session Enhancing Ecological Research with iDigBio Biological Specimen Data.
• Find out more about Big Data and Bugs: How Massively Collected Biodiversity Data Are Changing the Way We Do Insect Science at the Entomological Society of America 2017.
• Listen to Gil Nelson’s talk highlighting Research Outcomes of the ADBC Community’s Efforts to Digitize Data for Biodiversity Research at iDigBio's Summit VII 2017.
• Discuss open research project ideas on GitHub with iDigBio and collaborators.
• Check out GUODA and Effechecka and Fresh Data
oVert
Effechecka
www.idigbio.org
facebook.com/iDigBio
twitter.com/iDigBio
vimeo.com/idigbio
idigbio.org/rss-feed.xml
webcal://www.idigbio.org/events-calendar/export.ics
Kiitos paljon ICEDIG, Anna palaa
Thanks a lot ICEDIG, Go for it!
Thematic Collections Networks (TCNs) and Partners to Existing Networks (PENs)
TCN: network of institutions strategically digitizing information for a particular research theme, such as impacts of climate change or biota of a region.
The Mid-Atlantic Megalopolis
Cretaceous World
SoRo
oVert