Thomas Garnett
EOL Fellows, March 2010
The Biodiversity Heritage Library: Liberating the World's Biodiversity Literature
The cited half-life of publications in taxonomy is longer than in any other scientific discipline.
(Macro-economic case for open access, Tom Moritz)
Current taxonomic literature often relies on texts and specimens more than 100 years old.
Levinus Vincent, Elenchus tabularum, pinacothecarum, 1719
BHL: Why?
The Taxonomic Impediment
"The taxonomic impediment is a term that describes the gaps of knowledge in our taxonomic system." (Darwin Declaration, 1998)
Georges Louis Leclerc, comte de Buffon, Histoire naturelle, générale et particulière (Oiseaux), 1799-1808
BHL Members: US/UK
Academy of Natural Sciences (Philadelphia, PA)
American Museum of Natural History (New York, NY)
California Academy of Sciences (San Francisco, CA)
The Field Museum (Chicago, IL)
Harvard University Botany Libraries (Cambridge, MA)
Harvard University, Ernst Mayr Library of the Museum of Comparative Zoology (Cambridge, MA)
Marine Biological Laboratory / Woods Hole Oceanographic Institution (Woods Hole, MA)
Missouri Botanical Garden (St. Louis, MO)
Natural History Museum (London, UK)
The New York Botanical Garden (New York, NY)
Royal Botanic Gardens, Kew (Richmond, UK)
Smithsonian Institution Libraries (Washington, DC)
BHL Members: BHL-Europe
Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin
Natural History Museum, UK
Národní muzeum (NMP), CZ
Angewandte Informationstechnik Forschungsgesellschaft mbH
Freie Universität Berlin (FUB-BGBM)
Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts
Naturhistorisches Museum Wien
Hungarian Natural History Museum
Museum and Institute of Zoology, Polish Academy of Sciences
University of Copenhagen
Stichting Nationaal Natuurhistorisch Museum Naturalis
National Botanic Garden of Belgium
Royal Museum for Central Africa
Royal Belgian Institute of Natural Sciences
Bibliothèque nationale de France
Muséum national d'histoire naturelle
Consejo Superior de Investigaciones Científicas
Università degli Studi di Firenze
Royal Botanic Garden, Edinburgh
Species 2000
John Wiley & Sons Limited
Helsingin yliopisto (UH-Viikki)
BHL Members: BHL-China
Chinese Academy of Sciences, Institute of Botany
Chinese Academy of Sciences, Institute of Zoology
Chinese Academy of Sciences, Institute of Microbiology
Chinese Academy of Sciences, Institute of Oceanography
BHL is a Focused Program
Though BHL is composed of libraries, it has been a domain-specific program, not just a digital library project. It arose from, and is responsive to, the biodiversity community: the disciplines of taxonomy, systematics, evolutionary biology, ecology, conservation, and wildlife management. These disciplines are its primary audience.
Topical terms derived from LCSH:
Agricultural meteorology; Physical anthropology; Melioration; Crops and climate; Ethnology; Socio-cultural anthropology; Prehistoric archaeology; Biochemistry; Fluid dynamics; Genetics; Cytology; Biophysics; Plant lore; Mineralogy; Bioacoustics; Bioelectronics; Radioecology; Biomagnetism; Environmental management; Physical geography; Toponymy; Environmental policy; Biomechanics; Geomorphology; Geophysics; Stratigraphy; Geochemistry; Sedimentation; Geomicrobiology; Microscopy; Orogeny; Petrology; Taxidermy; Wild animal trade; Vivariums, terrariums, aquariums; Zoos; Agricultural ecology; Bioclimatology; Biogeomorphology; Ecophysiology; Restoration ecology; Forestry; Plant culture; Medical botany / zoology; Soil science; Economic botany; Geobiology; Coral islands, reefs & atolls; Seismology; Continental drift; Plate tectonics; Hydrology; Oceanography; Atlases & gazetteers; History of discoveries, exploration & travel; Bioluminescence; Phenology; Specimen catalogs; Collection & preservation; Natural history directories; Scientific drawing & illustration; History of natural sciences; Immunology; Microbial ecology; Virology; Natural history terminology, abbreviations; Cyanobacteria; Paleontology; Natural history biographies; Natural history dictionaries & encyclopedias; Animal biochemistry; Animal culture; Aquaculture; Wildlife conservation
Core Literature
Botany: Plant conservation; Phytogeography; Plant anatomy; Plant physiology; Plant ecology; Spermatophyta, Phanerogams; Cryptogams
Biological diversity; Evolution; Phylogenetic relationships; Evolutionary genetics; Scientific voyages and expeditions; Pre-Linnaean works; Linnaean works; Biodiversity conservation; Conservation biology; Ecosystem management; Endangered species & ecosystems; Extinction; Classification, Nomenclature; Biogeography; Zoology/Botany--Morphology; Zoology/Botany--Anatomy; Zoology/Botany--Embryology; Zoology/Botany--Reproduction; Zoology/Botany--Geographical distribution; Classification, systematics and taxonomy
Zoology: Invertebrates; Chordates; Vertebrates; Animal Behavior
Stats: Now Online
70,630 volumes
26.4 million pages
Oldest book: Schöffer's Herbarius, 1484.
What is the plan?
Digitize the core literature of biodiversity: full works, not bits & pieces.
Open Access: all content can be repurposed, reused, reformatted.
Congruent: must fit into a dynamic knowledge ecology.
Scan public domain biodiversity literature.
Negotiate rights to digitize copyrighted materials.
Ingest content digitized by others.
Provide interfaces & APIs for the repository:
GUIs
Services for data mining & citation resolution
BHL Digital Preservation
Committed to long-term storage, curation, and preservation of digital text assets for the worldwide biodiversity community.
BHL is a steward for this literature.
Keeping this content available and open for the future requires careful organizational planning.
Preservation is both a technical and a political/social process.
BHL Relationship with Non-Profit Journal Publishers
Opt-in Copyright Model: BHL works with professional societies and associations to integrate their publications into BHL in a way that serves the societies' missions and goals.
BHL indexes the articles using Taxonomic Intelligence, greatly increasing their usability.
Publishers' content is embedded in the emerging knowledge ecology that is sweeping biology in this century.
73 permission agreements to date; more under negotiation.
Integration with gray literature in later phases of the project.
Scanning = human work
Scan & Store: Internet Archive
Scanning on Scribes
Storage in Petaboxes
Referrers: Jan 1, 2008 to Jan 31, 2010 [chart]
Name Finding via TaxonFinder
Name Finding in action with Taxonomic Intelligence
Image from scanner
Converted to text via OCR
Name finding via TaxonFinder
Extract names
Submit to NameBank
SOAP response
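The steps above can be sketched in a few lines. This is a toy dictionary-lookup finder, not TaxonFinder's actual algorithm: the genus list, epithet list, and `NAMEBANK_IDS` table (with invented IDs) are hypothetical stand-ins for real name dictionaries and the NameBank SOAP service. It pairs a known genus with a following known epithet, then looks each candidate up in the ID table.

```python
import re

# Hypothetical stand-ins for the name dictionaries a real finder would use.
GENERA = {"Puma", "Homo", "Quercus"}
EPITHETS = {"concolor", "sapiens", "alba"}
# Hypothetical stand-in for NameBank: invented IDs, not real NameBankIDs.
NAMEBANK_IDS = {"Puma concolor": 101, "Homo sapiens": 202}

# Alphabetic tokens only, so punctuation around names is dropped.
TOKEN = re.compile(r"[A-Za-z]+")


def find_names(page_text: str) -> list:
    """Return genus + epithet pairs found in OCR text, in reading order."""
    tokens = TOKEN.findall(page_text)
    found = []
    for first, second in zip(tokens, tokens[1:]):
        if first in GENERA and second in EPITHETS:
            found.append(f"{first} {second}")
    return found


def verify(names: list) -> list:
    """Pair each found name with its ID, or None if the lookup table lacks it."""
    return [(name, NAMEBANK_IDS.get(name)) for name in names]


if __name__ == "__main__":
    text = "The cougar, Puma concolor, and the white oak, Quercus alba."
    print(verify(find_names(text)))
```

Names with no entry come back paired with `None`. Abbreviated genera ("P. concolor"), authorities, and OCR damage are not handled in this sketch.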
OCR error rate for names only
Of the 3,003 names, 1,056 were incorrectly transcribed by OCR.
Top OCR errors
1. Insert space
2. Omit space
3. e->c
4. u->I
5. u->n
6. i->l
7. c->e
8. n->v
9. l->i
10. r->i
11. u->ii
12. h->l
13. h->ii
14. e->o
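Those figures put the name-level OCR error rate at 1,056 / 3,003, roughly 35%. One way the ranked confusions could be used is to undo them before dictionary lookup. This is a minimal sketch, assuming `x->y` means the true character `x` was read as `y`. It uses only the character substitutions (the space insertions/omissions and the ambiguous `u->I` entry are left out), and the lexicon is a hypothetical stand-in for a names dictionary.

```python
# Character confusions from the table, read as (true text, OCR output).
# Assumption: "x->y" means x was transcribed as y. Space edits and "u->I" omitted.
CONFUSIONS = [
    ("e", "c"), ("u", "n"), ("i", "l"), ("c", "e"), ("n", "v"), ("l", "i"),
    ("r", "i"), ("u", "ii"), ("h", "l"), ("h", "ii"), ("e", "o"),
]


def single_edit_candidates(ocr_text: str) -> set:
    """All strings produced by undoing exactly one confusion at one position."""
    candidates = set()
    for true, seen in CONFUSIONS:
        start = ocr_text.find(seen)
        while start != -1:
            candidates.add(ocr_text[:start] + true + ocr_text[start + len(seen):])
            start = ocr_text.find(seen, start + 1)
    return candidates


def correct_name(ocr_text: str, lexicon: set):
    """Return ocr_text if known, else the first sorted one-edit repair found in lexicon, else None."""
    if ocr_text in lexicon:
        return ocr_text
    matches = sorted(c for c in single_edit_candidates(ocr_text) if c in lexicon)
    return matches[0] if matches else None


if __name__ == "__main__":
    lexicon = {"Homo sapiens", "Puma concolor"}  # hypothetical names dictionary
    print(correct_name("Homo sapicns", lexicon))  # undoes one "e->c" error
    print(correct_name("Pnma concolor", lexicon))  # undoes one "u->n" error
```

Only one repair is tried per string; a name with several errors would need repeated application, and the candidate set grows quickly with each round.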
Considerations
Improving OCR software is out of scope.
Google's Tesseract is the only viable open source option.
Flurry of activity in 2006-2007, quiet since.
Rekeying is expensive given the size of the corpus.
Will not scale.
Name finding statistics
27.7 million pages scanned
70.4 million name strings found
56.2 million names verified with a NameBankID
1.4 million unique names with a NameBankID
3.3 million unique names *without* a NameBankID
This is where the interesting data live!!!
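The statistics above imply a few derived figures. The sketch below computes them only from the numbers on the slide (all in millions): about 2.5 name strings per page, roughly 80% of strings verified, and about 70% of unique names lacking a NameBankID.

```python
# Figures from the slide, in millions.
pages_scanned = 27.7
name_strings = 70.4
verified_strings = 56.2
unique_with_id = 1.4
unique_without_id = 3.3

# Ratios derived from those figures.
names_per_page = name_strings / pages_scanned
verified_share = verified_strings / name_strings
unverified_strings = name_strings - verified_strings
unique_without_share = unique_without_id / (unique_with_id + unique_without_id)

if __name__ == "__main__":
    print(f"name strings per page:        {names_per_page:.2f}")
    print(f"share of strings verified:    {verified_share:.1%}")
    print(f"unverified strings (million): {unverified_strings:.1f}")
    print(f"unique names without an ID:   {unique_without_share:.1%}")
```

The 3.3 million unmatched unique names outnumber the matched ones by more than two to one (3.3 / 1.4 ≈ 2.4).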
PDF Generation Stats
Mandate for new development:
display / manage articles
meet community demands for bibliography / citation management
build from more open source tools
Development goals re: citations
Create a repository for community-vetted taxonomic bibliographies.
Ingest, display, download, and index articles so that BHL can operate as an article repository.
Build from the existing community of work around Drupal / Biblio, already in use by collaborators.
Services
OpenURL: facilitate links to citations (protologues, articles, references)
Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
Names Service: return all occurrences of a name throughout the BHL digitized corpus
Documentation: http://bit.ly/2e6sg9
Access to 51 million name strings using TaxonFinder; 1.4 million unique names
Working out a strategy for obscure species
Algorithm improvements to detect nomenclatural & taxonomic acts
New API
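Returning every occurrence of a name across the corpus maps naturally onto an inverted index from name to page. This is a minimal in-memory sketch; the input shape, page IDs, and names are hypothetical, and it says nothing about how BHL actually stores its index.

```python
from collections import defaultdict


def build_name_index(pages: dict) -> dict:
    """Map each name to the set of page IDs on which name finding reported it.

    pages: page ID -> list of names found on that page (hypothetical input shape).
    """
    index = defaultdict(set)
    for page_id, names in pages.items():
        for name in names:
            index[name].add(page_id)
    return index


def occurrences(index: dict, name: str) -> list:
    """Sorted page IDs for a name; an empty list if the name never occurs."""
    return sorted(index.get(name, ()))


if __name__ == "__main__":
    # Hypothetical page IDs and name-finding output.
    pages = {
        1001: ["Puma concolor", "Quercus alba"],
        1002: ["Quercus alba"],
        1003: ["Puma concolor"],
    }
    index = build_name_index(pages)
    print(occurrences(index, "Puma concolor"))
    print(occurrences(index, "Homo sapiens"))
```

Building the index once makes each lookup a single dictionary access, rather than a scan of every page's text per query.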
Services: OpenURL Disambiguation
Looking for:
Services: OpenURL Results
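An OpenURL client sends citation metadata to the resolver as key/value pairs. The sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article using the standard journal-format keys. The resolver address `RESOLVER` is an assumption (consult the BHL OpenURL documentation for the current endpoint and any BHL-specific parameters), and the example citation values are hypothetical. It only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed resolver address; check the BHL OpenURL help page before use.
RESOLVER = "https://www.biodiversitylibrary.org/openurl"


def openurl_for_article(journal: str, volume: str, start_page: str, year: str) -> str:
    """Build an OpenURL 1.0 KEV query for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": journal,
        "rft.volume": volume,
        "rft.spage": start_page,
        "rft.date": year,
    }
    # urlencode percent-encodes values and joins the pairs with "&".
    return f"{RESOLVER}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical citation values, for illustration only.
    print(openurl_for_article("Example Journal of Natural History", "12", "345", "1890"))
```

Supplying more fields, such as volume, start page, and year, gives the resolver more to disambiguate with when a title alone matches several items.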
Taxonomic name finding enhancements
Nomenclatural acts in web services
Other algorithms / verification
WoRMS data
Improvement
Ranking results
Visualization
LifeDesks
Bibliography sharing
Resolve to articles
EOL Interfaces
Thank You
We welcome your input and advice.
Tom Garnett
Biodiversity Heritage Library Program Director
garnettt@si.edu
202-633-2238