Gregoire Taillefer poster ESC final



Amélie Grégoire Taillefer and Terry A. Wheeler

Dept. of Natural Resource Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada

Databasing the Lyman Entomological Museum: challenges and opportunities


Background and history

The Lyman Entomological Museum began as the private collection of Henry H. Lyman, bequeathed to McGill University in 1914. The largest university insect collection in Canada, it holds specimens of all orders, with worldwide terrestrial coverage, collected from 1860 to the present. Since the mid-1990s, collection development has focused mainly on Diptera, although ongoing research projects, donations and exchanges continue to add material in all orders, particularly Coleoptera. Seventy percent of the Diptera specimens were collected in Canada.

Digitization (recording specimen collection labels in digital form) is a time-consuming and laborious process. Retrospective digitization of large collections is a costly undertaking, but the benefits in terms of data sharing and accessibility far outweigh the costs. Canadensys (canadensys.net), the Canadian biodiversity open database, compiles taxonomic, geographic, temporal, numerical, and historical information about three megadiverse groups (plants, insects and fungi) from collections at 18 institutions across Canada, which collectively house several million specimens. About 1.3 million specimen records are currently available on Canadensys; the Lyman Entomological Museum contributes 20% of that total.

Steps in digitization

1. Preparation
• Identify specimens to lowest taxonomic level possible
• Verify status of taxonomic name
• Add unique identifier to each specimen

Databases create opportunities

A digitized collection is a rich source of primary biodiversity data for a range of applications in taxonomy, inventories, catalogs, and ecology. Data can be searched via maps (as above) or in list format. Shared, open, accessible data creates opportunities for building large datasets for analysis of large-scale patterns. Extracting data from Canadensys for a particular taxon, locality or set of samples is easy and rapid.

Collection databases have traditionally been used for curation, loan management or taxonomic research. Digitization facilitates all these functions. However, because of the extensive spatial, temporal and ecological data associated with specimen records, these databases are also valuable resources for ecological and conservation research. The dataset can easily be managed for loans, systematic research, and assessing taxonomic coverage within an area for systematic, ecological or conservation purposes. The databases provide baseline data, as well as evidence of change over time, for regions or biotas that may have experienced habitat change.

Challenges

1. Implementing an efficient, standard data entry procedure

2. Old labels with minimal information

3. Georeferencing old specimen localities

4. Errors in coordinates or localities on labels

5. Misidentified specimens

6. Data cleaning, validation and correction

7. Training volunteers and staff for data search and new entries

Acknowledgments

Major funding for Canadensys was provided by the Canada Foundation for Innovation. Canadensys coordinates ongoing open access to our database. We thank David Shorthouse and Carole Sinou for all their help and advice in data cleaning and formatting for publication on Canadensys.

Future work

No database is ever completed. Data checking and verification are an ongoing process as taxonomic experts verify identifications or provide finer taxonomic resolution. New specimens added to the collection require ongoing commitment by collection staff, students or volunteers to data entry and publication. For example, more than 150,000 arctic Diptera and new accessions from other regions currently await digitization in the Lyman Museum.
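The ongoing data checking mentioned here, and the coordinate errors listed among the challenges, could be partly automated. The following is only a minimal sketch in Python, not the procedure used at the Lyman Museum: the function name, the messages, and the rough bounding box for Canada are all assumptions made for illustration.

```python
from typing import List, Optional

# Rough, assumed bounding box for Canada in decimal degrees (illustrative only).
CANADA_LAT = (41.0, 84.0)
CANADA_LON = (-142.0, -52.0)


def coordinate_problems(lat: Optional[float], lon: Optional[float],
                        country: str = "") -> List[str]:
    """Return a list of problems found in one record's coordinates."""
    if lat is None or lon is None:
        return ["missing coordinates (needs georeferencing)"]

    problems = []
    if not -90.0 <= lat <= 90.0:
        problems.append("latitude outside -90..90")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude outside -180..180")

    # Only compare against the country box when the values are valid at all.
    if not problems and country == "Canada":
        inside = (CANADA_LAT[0] <= lat <= CANADA_LAT[1]
                  and CANADA_LON[0] <= lon <= CANADA_LON[1])
        if not inside:
            problems.append("coordinates fall outside Canada")
    return problems


# A dropped minus sign on the longitude is flagged by the country check.
print(coordinate_problems(45.4, -73.9, "Canada"))
print(coordinate_problems(45.4, 73.9, "Canada"))
```

A check like this only flags records for a person to review; it cannot decide whether the label, the transcription, or the georeference is at fault.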

Lyman Museum LEM-0013538

Progress to date

Order                                   Geographic scope   Specimens databased
Diptera                                 Worldwide          240,000 +
Neuroptera                              Canada             2,700 +
Coleoptera (Buprestidae, Dermestidae)   Canada             2,600 +
Hymenoptera (Vespidae, Eumenidae)       Canada             2,900 +
Araneae                                 Canada             4,500 +

Source: Lyman Entomological Museum georeferenced records (253,061), Canadensys, Google Earth. (accessed on 2013-10-11)

LEM0249541, from McGill University http://dataset.canadensys.net/lemq-specimens (accessed on 2013-10-11)

Biota 2-The Biodiversity Database Manager, R.K. Colwell, University of Connecticut, http://viceroy.eeb.uconn.edu/Biota/biota, specimen and collection record tables.

About 10% (253,000 specimens) of the Lyman collection has been databased with Canadensys support. Our database is freely hosted by Canadensys and shared internationally via the Global Biodiversity Information Facility (www.gbif.org).
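The figures quoted on this poster can be cross-checked with a little arithmetic. The short Python sketch below uses only numbers given on the poster; the variable names are illustrative, the "+" entries in the progress table are treated as lower bounds, and it is an assumption that the 253,061 georeferenced records and the roughly 253,000 databased specimens refer to the same set.

```python
# Lower bounds from the "Progress to date" table (each is followed by "+").
databased = {
    "Diptera": 240_000,
    "Neuroptera": 2_700,
    "Coleoptera": 2_600,
    "Hymenoptera": 2_900,
    "Araneae": 4_500,
}
table_total = sum(databased.values())  # 252,700

lyman_records = 253_061        # georeferenced records cited for the map
canadensys_records = 1_300_000  # "about 1.3 million" on Canadensys

# Share of Canadensys records contributed by the Lyman Museum.
share_of_canadensys = lyman_records / canadensys_records

# If 253,000 specimens are about 10% of the collection, the implied size is:
implied_collection_size = round(253_000 / 0.10)

print(f"Table total (lower bound): {table_total:,}")
print(f"Share of Canadensys: {share_of_canadensys:.1%}")
print(f"Implied collection size: {implied_collection_size:,}")
```

Under these assumptions the table total sits just below the cited record count, the share comes out close to the stated 20%, and the implied collection size is on the order of 2.5 million specimens. That last figure is an inference from the rounded 10%, not a number stated on the poster.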

2. Databasing
• BIOTA 2 program used at Lyman
• Data entry requires frequent data verification
• Georeference records

3. Data publication
• Export data as text file
• Add columns and formulas for accepted data format
• Convert database information into Darwin Core (internationally accepted biodiversity information standard)
• Add collection metadata
• Serve data via Canadensys and GBIF
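The data publication step (export as text, add columns, convert to Darwin Core) could be sketched roughly as follows in Python. This is not the workflow actually used at the Lyman Museum. The export column names, the sample row, and the constant values are hypothetical; only the output column names are real Darwin Core terms.

```python
import csv
import io

# Hypothetical export column -> Darwin Core term.
COLUMN_MAP = {
    "SpecimenCode": "catalogNumber",
    "Latitude": "decimalLatitude",
    "Longitude": "decimalLongitude",
    "DateCollected": "eventDate",
    "Collector": "recordedBy",
    "Country": "country",
}

# Columns added with the same value on every row (values are assumptions).
CONSTANTS = {"institutionCode": "LEMQ", "basisOfRecord": "PreservedSpecimen"}


def to_darwin_core(export_text: str) -> str:
    """Convert a tab-delimited export into tab-delimited Darwin Core columns."""
    reader = csv.DictReader(io.StringIO(export_text), delimiter="\t")
    fieldnames = list(CONSTANTS) + list(COLUMN_MAP.values()) + ["scientificName"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        record = dict(CONSTANTS)
        for source, term in COLUMN_MAP.items():
            record[term] = (row.get(source) or "").strip()
        # Build the scientific name from separate genus and species columns.
        parts = [(row.get("Genus") or "").strip(), (row.get("Species") or "").strip()]
        record["scientificName"] = " ".join(p for p in parts if p)
        writer.writerow(record)
    return out.getvalue()


# Invented sample row, for illustration only.
SAMPLE = (
    "SpecimenCode\tGenus\tSpecies\tLatitude\tLongitude\tDateCollected\tCollector\tCountry\n"
    "LEM0000001\tMusca\tdomestica\t45.40\t-73.94\t2010-07-15\tA. Collector\tCanada\n"
)
print(to_darwin_core(SAMPLE))
```

Keeping the column mapping in one dictionary makes it easy to review against the Darwin Core term list before collection metadata is added and the file is served via Canadensys and GBIF.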