More than just a pretty picture: improving the discoverability of illustrations in the Biodiversity Heritage Library
by Gilbert Borrego, Grace Costantino, Bianca Crowley, Kyle Jaebker, Trish Rose-Sandler
Hidden within BHL literature are millions of rich illustrations
The Biodiversity Heritage Library (BHL) is:
• An open access digital library for historic biodiversity literature
• An open data repository of taxonomic names and bibliographic information
Image access via Flickr
BHL staff manually identify and push BHL images to a Flickr stream (www.flickr.com/photos/biodivlibrary), but the process does not scale to the millions of images available.
Art of Life project
The Art of Life project, enabled by a grant from the National Endowment for the Humanities (NEH), aims to automate the process of identifying and tagging images via algorithms.
Users can add tags to images in Flickr so that they are searchable. They are also encouraged to add species names via machine tags so BHL can automatically share these images with the Encyclopedia of Life (http://eol.org/collections/53002).
The project defined a metadata schema for natural history illustrations that will help crowdsource more detailed descriptions via image portals such as Wikimedia Commons (http://tinyurl.com/9hm7nsb).
www.biodiversitylibrary.org
Uploading Images to Flickr
The Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org) provides access to thousands of scientific illustrations through the social media site Flickr. To expedite uploading these images to Flickr, a workflow was developed within BHL’s backend database. When paginating a book (enhancing its page metadata), staff can click a single button to upload all illustrations within that book to Flickr. Bibliographic information and a link to the image in BHL are also embedded during the process.
This workflow was internally documented in the form of a tutorial to ensure that all BHL partners can contribute to this effort and be part of the program’s expanding outreach efforts.
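The one-button upload embeds bibliographic information and a BHL link in each Flickr image. The poster does not show how BHL's backend builds that metadata, so the sketch below is only an illustration of the idea: for each illustrated page it composes the `title`, `description` and `tags` fields that Flickr's upload API accepts. The `PageRecord` fields, the `https://www.biodiversitylibrary.org/page/{id}` link pattern and the `bhl:page=` machine tag are assumptions made for this example, not BHL's actual schema.

```python
from dataclasses import dataclass


# Hypothetical shape of the bibliographic data attached to a paginated page;
# these field names are assumptions, not BHL's real database columns.
@dataclass
class PageRecord:
    page_id: int
    book_title: str
    author: str
    year: str
    page_label: str  # e.g. "Plate 3", as recorded during pagination


def flickr_upload_fields(page: PageRecord) -> dict:
    """Compose Flickr title/description/tags fields for one page (sketch)."""
    # Assumed link pattern pointing back to the page in BHL.
    bhl_url = f"https://www.biodiversitylibrary.org/page/{page.page_id}"
    description = (
        f"{page.book_title}. {page.author}, {page.year}. "
        f"{page.page_label}. Source: {bhl_url}"
    )
    return {
        "title": f"{page.book_title} ({page.page_label})",
        "description": description,
        # Flickr takes tags as one space-separated string; this
        # bhl:page machine tag is a hypothetical example.
        "tags": f"bhl:page={page.page_id}",
    }


def batch_fields(pages: list[PageRecord]) -> list[dict]:
    """One set of upload fields per illustrated page in a book."""
    return [flickr_upload_fields(p) for p in pages]
```

For example, `flickr_upload_fields(PageRecord(12345, "Example Book", "A. Author", "1850", "Plate 3"))["title"]` gives `"Example Book (Plate 3)"`. Building the fields separately from the network call keeps the metadata logic easy to check before anything is sent to Flickr.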
The use of Flickr as an outreach platform exposes our rich image collection to search engines and new users. It also allows us to provide images of species for the Encyclopedia of Life’s taxon pages. While the original intention of BHL’s Flickr account was to provide easy access to scientific figures, plates and illustrations, the site has taken on a life of its own and is being repurposed by users around the world in the most imaginative ways.
From BHL’s backend dashboard, staff select the pages to upload to Flickr.
Final view in Flickr.
Once images are uploaded, staff can create sets, add additional bibliographic information, and assign sets to collections.
Visit the BHL Flickr today! http://www.flickr.com/photos/biodivlibrary
Learn how you can help add species names to BHL images: http://www.flickr.com/groups/encyclopedia_of_life/discuss/72157629515768640/
The Flickr Tagging Process: Crowdsourcing Species Identification and Image Tagging
The Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org), an open access digital library consortium for biodiversity literature, uses Flickr to provide access to thousands of images extracted from its digital collections. To improve the discoverability and usability of these images, BHL crowdsources the task of adding species-name machine tags to images in Flickr.
Tags are searchable keywords that users can apply to images in Flickr. Machine tags are specially formatted, using the syntax namespace:predicate=value, so that computers can read them. A species name machine tag takes the form taxonomy:binomial="Genus species".
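To make the machine tag format concrete, here is a small sketch that formats a species name as a `taxonomy:binomial` tag and parses any `namespace:predicate=value` tag back into its parts. The helper names and the simplified regular expression (letters, digits and underscores in the namespace and predicate, an optionally double-quoted value) are assumptions for illustration; Flickr's own parsing rules may differ in edge cases.

```python
import re

# Simplified, assumed pattern for namespace:predicate=value machine tags.
MACHINE_TAG = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


def binomial_tag(genus: str, species: str) -> str:
    """Format a species name as a taxonomy:binomial machine tag."""
    # Capitalize the genus, lowercase the epithet, and quote the value
    # because a binomial contains a space.
    return f'taxonomy:binomial="{genus.capitalize()} {species.lower()}"'


def parse_machine_tag(tag: str):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    m = MACHINE_TAG.match(tag.strip())
    if m is None:
        return None
    namespace, predicate, value = m.groups()
    # Remove surrounding double quotes from the value, if present.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return namespace, predicate, value
```

With these helpers, `binomial_tag("ursus", "Arctos")` produces `taxonomy:binomial="Ursus arctos"`, and parsing that string returns `("taxonomy", "binomial", "Ursus arctos")`. A plain keyword such as `birds` returns `None`, which is what lets software tell machine tags apart from free-text tags.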
BHL encourages its users to identify the species depicted in an image using the book’s image descriptions and add that species name to the image as a machine tag. By adding these tags to BHL images, users can search within Flickr for images of specific species and BHL can automatically share these images with the Encyclopedia of Life (EOL, www.eol.org).
EOL is an open access project dedicated to providing a webpage for every species. EOL harvests machine-tagged images from the BHL Flickr, uploads them to a BHL Image Collection in EOL, and automatically associates the images with the matching species page. To date, thousands of machine-tagged images have been added to EOL.
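EOL's harvester itself is not described here, so the following is only a sketch of the matching step under stated assumptions: photo IDs and their tag lists are assumed to have already been fetched from the BHL Flickr stream, and each `taxonomy:binomial` value is treated as the key of a species page. Names like `group_by_species` are hypothetical.

```python
from collections import defaultdict

PREFIX = "taxonomy:binomial="


def binomials(tags: list[str]) -> list[str]:
    """Extract species names from taxonomy:binomial machine tags."""
    names = []
    for tag in tags:
        if tag.startswith(PREFIX):
            # Drop the prefix and any surrounding double quotes.
            names.append(tag[len(PREFIX):].strip('"'))
    return names


def group_by_species(photos: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each tagged binomial to the IDs of the photos that carry it."""
    pages = defaultdict(list)
    for photo_id, tags in photos.items():
        for name in binomials(tags):
            pages[name].append(photo_id)
    return dict(pages)
```

Given `{"p1": ['taxonomy:binomial="Ursus arctos"', "bear"], "p2": ["plate"], "p3": ['taxonomy:binomial="Ursus arctos"']}`, the result is `{"Ursus arctos": ["p1", "p3"]}`. Untagged images such as `p2` are simply skipped, which is why crowdsourced tags matter: only tagged images reach a species page.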
Find an image in Flickr
Add a species name machine tag
The image is automatically ingested into the BHL Image Collection in EOL
And automatically associated with the corresponding species page in EOL
Users clamor for the Art of Life
The Art of Life project evolved out of a need to improve access to the rich corpus of natural history illustrations hidden within the digitized pages of books and journals in the Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org). Currently, these illustrations have no searchable descriptive metadata, such as title, creator or subject matter. The only way to uncover these gems is to open a BHL book or volume and scroll through it page by page.
One solution has been for BHL staff to manually identify pages that contain illustrations and push those pages into a BHL Flickr stream, which allows discovery through themed collections and, in some cases, species names. While this approach has improved access to some of BHL’s illustrations, it requires significant staff time and does not scale well to the millions of images present within BHL’s pages.
Example of an illustration described using the Art of Life schema
Illustration schema elements.
Read more about the Art of Life project: http://biodivlib.wikispaces.com/Art+of+Life
Elements chosen were a mix of VRA Core 4.0 and Darwin Core.
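The poster does not list the schema's elements, so the record below is a hypothetical illustration of what mixing the two standards can look like, not the Art of Life schema itself. The work-level fields are named after VRA Core 4.0 elements (`title`, `agent`, `date`, `technique`, `worktype`), and the organism-level fields reuse Darwin Core term names (`scientificName`, `vernacularName`), which is why those two break Python's usual snake_case convention.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class IllustrationRecord:
    """Hypothetical illustration record mixing VRA Core 4.0 and Darwin Core.

    The element selection is illustrative only, not the project's schema.
    """

    # VRA Core 4.0-style elements describing the illustration as a work.
    title: str
    agent: list[str] = field(default_factory=list)  # e.g. artist, engraver
    date: str = ""
    technique: str = ""  # e.g. "lithograph"
    worktype: str = ""
    # Darwin Core term names, kept verbatim, describing the organism depicted.
    scientificName: str = ""
    vernacularName: str = ""

    def is_identified(self) -> bool:
        """True once a depicted species has been recorded."""
        return bool(self.scientificName.strip())


record = IllustrationRecord(title="Plate 12", technique="lithograph")
# asdict() gives a plain dict, e.g. for export to a tagging application.
exported = asdict(record)
```

A record like this starts with `is_identified()` returning `False`. It flips to `True` once a contributor supplies a name such as `"Genus species"`, which is one way crowdsourced descriptions could be merged back into a structured record.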
Workflow diagram that outlines how each illustration will move through the Art of Life processes.
Thus, the Art of Life project was designed to automate image identification and to crowdsource descriptions of the images. The project is a partnership between the Missouri Botanical Garden and the Indianapolis Museum of Art, supported by the National Endowment for the Humanities, and runs from May 2012 to April 2014. The Art of Life has five primary objectives: 1) define a metadata schema appropriate for natural history illustrations; 2) build algorithms to automatically identify BHL pages with illustrations; 3) sort and classify the illustrations; 4) crowdsource descriptions through tagging applications; and 5) integrate descriptive metadata back into BHL and share images and descriptions with audiences outside of BHL. These illustrations will be of interest to a diverse range of audiences, including artists, biologists, humanities scholars, librarians, educators, and citizen scientists.
Automating the Heavy Lifting: Using Algorithms to Identify Images in BHL
In the Art of Life project, the Indianapolis Museum of Art (IMA) and the Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org) have been developing algorithms to identify images on the pages of books and journals digitized by BHL. Multiple algorithms are being developed, including ones based on ABBYY OCR, contrast, color, and compression. These algorithms are being tested to determine the most efficient and accurate means of identifying images.
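The poster names contrast, color and compression as signals but does not say how the IMA computes them. The sketch below shows one plausible version of each feature for a page already decoded into RGB pixels. The formulas, the `min_spread` threshold and the function names are assumptions, and the ABBYY OCR-based approach is not reproduced. These are raw scores only: turning them into an "illustration / no illustration" decision would require thresholds tuned against labeled pages.

```python
import statistics
import zlib

Pixel = tuple[int, int, int]  # (r, g, b), each 0-255


def to_gray(pixels: list[Pixel]) -> bytes:
    """Convert RGB pixels to 8-bit gray using ITU-R BT.601 luma weights."""
    return bytes(min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in pixels)


def contrast(pixels: list[Pixel]) -> float:
    """Population standard deviation of gray intensity; a flat page scores 0."""
    return statistics.pstdev(to_gray(pixels))


def color_fraction(pixels: list[Pixel], min_spread: int = 40) -> float:
    """Fraction of pixels whose channels differ enough to look colored.

    min_spread is an assumed threshold, not a value from the project.
    """
    colored = sum(1 for p in pixels if max(p) - min(p) >= min_spread)
    return colored / len(pixels)


def compression_ratio(pixels: list[Pixel]) -> float:
    """Raw gray size divided by zlib-compressed size.

    Pages with dense tonal detail tend to compress less well (lower ratio)
    than mostly blank pages.
    """
    raw = to_gray(pixels)
    return len(raw) / len(zlib.compress(raw, 9))
```

A blank white page has a contrast of `0.0`, a color fraction of `0.0` and a very high compression ratio. A page of random gray noise compresses to roughly its own size, giving a ratio near 1. Real scans fall in between, which is why several signals were being compared rather than relying on any one of them.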
The IMA developed a set of software tools for running the algorithms and analyzing their results. This software imports publications and journals selected as good test samples for the algorithms. These samples, termed the “Gold Standard,” are being used to evaluate how useful each algorithm is in determining whether a scan contains a sketch or drawing. A custom-built interface for reviewing the results shows both accurate results and false positives. In addition to this visual review, analysis across the entire “Gold Standard” is ongoing to determine the best combination of algorithms.
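How the IMA scores algorithms against the Gold Standard is not described. The sketch below illustrates one common approach under stated assumptions. It counts true positives, false positives and false negatives per page, summarizes them as precision, recall and F1, and then tries every combination of algorithms. The rule that a combination flags a page if any member flags it (a logical OR) and the use of F1 as the single score are choices made for this example only.

```python
from itertools import combinations


def evaluate(predicted: dict[str, bool], gold: dict[str, bool]) -> dict[str, float]:
    """Compare page-level predictions with Gold Standard labels."""
    tp = sum(1 for p, has_img in gold.items() if has_img and predicted.get(p, False))
    fp = sum(1 for p, has_img in gold.items() if not has_img and predicted.get(p, False))
    fn = sum(1 for p, has_img in gold.items() if has_img and not predicted.get(p, False))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}


def best_combination(outputs: dict[str, dict[str, bool]], gold: dict[str, bool]):
    """Return (algorithm names, F1) for the best OR-combination of algorithms."""
    best = None
    names = sorted(outputs)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # A page is flagged if any algorithm in the combination flags it.
            predicted = {p: any(outputs[n].get(p, False) for n in combo) for p in gold}
            score = evaluate(predicted, gold)["f1"]
            if best is None or score > best[1]:
                best = (combo, score)
    return best
```

Take four gold pages where `a` and `b` contain illustrations. Suppose a "color" detector flags only `a`, and a "compression" detector flags `b` plus the non-illustrated page `c`. Each detector alone misses one illustration, and their union finds both at the cost of one false positive. Under F1 the union wins with a score of 0.8. Trying all subsets grows exponentially with the number of algorithms, which is acceptable for the handful of detectors named here.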
Once completed, the algorithms will be deployed on a computing cluster to process the entire BHL collection. After processing is complete, the resulting metadata will be used to add further descriptive information and finding aids. This will allow users to discover and work with illustrations in books and journals that were previously very hard to find.
Algorithm Results Viewer
Compression Ratio Algorithm Analysis
Close-up Algorithm Result