Mass digitization & crowdsourcing
Maarten Heerlien
NBN Crowdsourcing Summit, Manchester 09-25-2015
Naturalis Biodiversity Center
• Merger of Naturalis, National Herbarium, Zoological Museum
Amsterdam & ETI BioInformatics (2010)
• Staff: 300 (100 scientists)
• 200 peer-reviewed publications per year
• Collection: 37 million (5th in the world)
• 9 exhibition spaces
• 300.000 visitors per year (50.000 school children)
Collection digitization
• 2010-2015: FCD program (FES Collection Digitization)
• Goals:
• 7 million objects digitized in detail
• 30 million objects digitized on a meta-level
• Permanent collection digitization infrastructure
• Budget: € 13 million (€ 1,87 per object)
• People: 80 temporary employees
• Funding: Economic Structure Enhancement Fund (FES)
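The per-object figure above can be checked with a short calculation. This is only a sketch: it assumes the € 1,87 figure is the budget divided by the 7 million objects to be digitized in detail, which the slide does not state, and the variable names are illustrative.

```python
# Figures taken from the slide. The divisor is an assumption:
# the 7 million objects that were to be digitized in detail.
budget_eur = 13_000_000
detailed_objects = 7_000_000
stated_cost_per_object = 1.87  # "€ 1,87 per object" on the slide

# Cost per object if the whole budget is spread over the detailed objects.
cost_per_object = budget_eur / detailed_objects

# Number of objects implied by the stated per-object cost.
implied_objects = budget_eur / stated_cost_per_object

print(f"Budget / 7 million objects: EUR {cost_per_object:.2f} per object")
print(f"Objects implied by EUR 1.87 per object: {implied_objects:,.0f}")
```

Under this assumption the result lands within about a cent of the stated figure, which would be consistent with the budget being rounded to the nearest million.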
From tailor made to production lines
• Priority driven (policy, research, preservation)
• Digitization processes based on collection types
• Divide complicated and labor-intensive processes into a series of
shorter tasks
• Standardize the processes for data entry and photography
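The production-line idea can be illustrated with a minimal sketch: one line per collection type, each built from small, standardized steps that run in a fixed order. Everything here is hypothetical (the `Digistreet` class, the step functions, the barcode and file-name formats); it is not a description of the actual Naturalis software.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Digistreet:
    """One production line: an ordered list of small, standardized steps."""
    name: str
    steps: List[Callable[[Record], Record]] = field(default_factory=list)

    def process(self, record: Record) -> Record:
        # Run every step in order; each step returns an extended copy.
        for step in self.steps:
            record = step(record)
        return record


# Hypothetical steps, named for illustration only.
def assign_barcode(record: Record) -> Record:
    return {**record, "barcode": f"HYPOTHETICAL.{record['id']}"}


def photograph(record: Record) -> Record:
    return {**record, "image": f"{record['barcode']}.tif"}


def enter_label_data(record: Record) -> Record:
    return {**record, "label_transcribed": "yes"}


herbarium = Digistreet(
    "Herbarium sheets", [assign_barcode, photograph, enter_label_data]
)
result = herbarium.process({"id": "0001"})
print(result)
```

Keeping each step small and uniform is what lets the same step be reused across lines for different collection types.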
Digitization process
Digistreets
• Herbarium sheets
• Molluscs
• Wet collections
• Entomology
• Wood
• Dry (in)vertebrates
• Library
• Geology
• Microscopic slides
Digitization and public engagement
Crowdsourcing: Glashelder!
• Transcription by online volunteers
• 100.000 microscopic glass slides
• Mites, Springtails and Aphids
• Goals:
• Full transcription and validation by the crowd
• 6 months
• Costs comparable to in-house digitization
• Existing platform: VeleHanden.nl (Many Hands)
• Dedicated platform for transcription of
handwritten heritage
• Benefit of pre-existing crowd
Transcription
Validation
Results Glashelder!
• 9 months
• 200.000 transcriptions
• Record: 1913 transcriptions in one day
• 100.000 validations
• 497 participants
• Participants with more than 100 transcriptions: 73
• Participants with more than 1000 transcriptions: 26
• Participants with more than 10.000 transcriptions: 5
• 18 validators
• Costs comparable to in-house digitization
• Lots of media attention
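A few ratios follow from the figures above. This is a hedged sketch: the 100.000 slides come from the project scope, the reading "two transcriptions and one validation per slide" is an inference from the totals rather than something the slides state, and a month is assumed to be about 30 days.

```python
# Figures from the Glashelder! slides.
slides = 100_000
transcriptions = 200_000
validations = 100_000
participants = 497
record_day = 1913
project_days = 9 * 30  # assumption: roughly 30 days per month

# Average number of transcriptions and validations per slide.
transcriptions_per_slide = transcriptions / slides
validations_per_slide = validations / slides

# Average daily output, for comparison with the one-day record.
average_per_day = transcriptions / project_days

# Lower bound on the share contributed by the five most active
# participants, each of whom made more than 10.000 transcriptions.
top_contributors = 5
min_top_share = top_contributors * 10_000 / transcriptions

print(f"Transcriptions per slide: {transcriptions_per_slide:.1f}")
print(f"Validations per slide: {validations_per_slide:.1f}")
print(f"Average per day: {average_per_day:.0f} (record: {record_day})")
print(f"Share from top {top_contributors} of {participants}: "
      f"more than {min_top_share:.0%}")
```

The lower bound holds whether the participant tiers are read as cumulative or as exclusive, since it only uses the five participants in the highest tier.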
Results FCD
Digistreet           Start      End        Objects digitized   1 object =
Molluscs             2011 Q1    2013 Q1    650.000             1 sample
Entomology           2011 Q2    2015 Q2    850.000             1 insect
Wood                 2011 Q3    2013 Q2    125.000             1 wood sample
Library              2011 Q3    2015 Q2    820.000             1 page
Alcohol specimens    2012 Q1    2015 Q2    100.000             1 sample
Herbarium            2012 Q2    2015 Q2    4.400.000           1 herbarium sheet
Dry (in)vertebrates  2012 Q2    2015 Q2    275.000             1 specimen (part)
Glass slides         2012 Q3    2015 Q2    800.000             1 microscopic slide
Geology              2013 Q2    2015 Q2    200.000             1 sample
Total                                      8.220.000
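The table total can be verified by summing the per-digistreet counts. The sketch below copies the figures from the table; the dictionary and variable names are illustrative.

```python
# Objects digitized per digistreet, copied from the results table.
digitized = {
    "Molluscs": 650_000,
    "Entomology": 850_000,
    "Wood": 125_000,
    "Library": 820_000,
    "Alcohol specimens": 100_000,
    "Herbarium": 4_400_000,
    "Dry (in)vertebrates": 275_000,
    "Glass slides": 800_000,
    "Geology": 200_000,
}
stated_total = 8_220_000

# Sum the rows and compare with the total printed on the slide.
total = sum(digitized.values())

# Largest single digistreet and its share of all digitized objects.
largest = max(digitized, key=digitized.get)
largest_share = digitized[largest] / total

print(f"Sum of rows: {total:,} (stated: {stated_total:,})")
print(f"Largest digistreet: {largest} ({largest_share:.1%} of the total)")
```

The rows add up to the stated total, and the herbarium line alone accounts for slightly more than half of all digitized objects.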
Access to digitized collections
• 8.220.000 specimens digitized in detail
• Published as open content
• Digitized data and scans of objects: CC0
• Other content: CC-BY
• Bioportal.naturalis.nl
• Netherlands Biodiversity API
• Data and content aggregators
Thank you
• [email protected] | +31-71-751-9387
• https://science.naturalis.nl/en/collection/digitization
• http://bioportal.naturalis.nl (digital collections portal)
• http://docs.biodiversitydata.nl (GitHub API documentation)
• https://youtu.be/ODtuWKoujFw (introduction to Naturalis digitization program)
• https://youtu.be/TywNYCigY0k (digitizing entomology collections)
• https://youtu.be/hmG4twyHXkE (digitizing herbarium sheets)
• https://en.wikipedia.org/wiki/Wikipedia:GLAM/Naturalis (Naturalis content donation page)
• Heerlien et al., 2015. The natural history production line: An industrial approach to the
digitization of scientific collections. ACM Journal on Computing and Cultural Heritage 8, 1,
Article 3 (February 2015). http://dx.doi.org/10.1145/2644822