Presentation by Randall Schuh, American Museum of Natural History
NSF ADBC Digitization TCN-TTD
Plants, Herbivores, and Parasitoids
A Model System for the study of Tri-Trophic Associations
Ten months later…
Presentation by
Randall Schuh, American Museum of Natural History
Rob Naczi, New York Botanical Garden
Christiane Weirauch, University of California Riverside
Katja Seltmann, American Museum of Natural History
http://tcn.amnh.org
The Tri-Trophic Approach
Capturing Data for the Nearctic Biota
•85% of 11,000 Hemiptera from the Nearctic are herbivorous with high host specificity
•Bias in plant groups attacked, e.g., Pinaceae, Poaceae, Asteraceae, Chenopodiaceae, Rosaceae
•Some serious agricultural pests (armored scales, mealy bugs, potato leafhoppers, Lygus bugs)
•Vectors of viral and bacterial diseases (green peach aphid is a vector of over 100 plant viruses)
•Parasitic Hymenoptera are beneficial as biological control agents
[Figure: project partners, grouped by category]
Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS
Botanical Data Providers: SEINET, CCH, CPNH
Entomological Collections: AMNH, CDFA, UCRC, CAS, BPBM, MEM, CMNH, INHS, CUIC, CSUC, TAMU, OSAC, NCSU, SEMC, UDCC, EMEC, UMEC, UKIC
Project management
• Steering Committee of 10 PIs + Project Manager
▫ Decision-making on overall project goals, directions, and progress
• Full-time Project Manager at AMNH (Katja Seltmann)
▫ Day-to-day project management, technical capability, data analysis, training of entomology partners, vetting and upload of authority files, centralized georeferencing
• Full-time Project Coordinator at NYBG (Kim Watson)
▫ Training of botany partners, barcoding of NYBG specimens, and label-data capture for all partner institutions
Entomological Databasing
Streamlined Interface for Rapid Data Entry
Taxon names
Locality data
Collection Events
Specimen Data
Host names
Database Attributes
• Web enabled
• Open-source software
• Centralized data storage, backup, and management
Database Benefits
• Single-product management
• Simplified user training
• Centralized authority-file management
• Centralized georeferencing
• Data aggregation shifted to HUB and DiscoverLife.org
Authority Files
Botanical
• Tropicos database used across entire project
Entomological
• Published catalogs and unpublished lists from specialists
Objectives
• Present uniform up-to-date taxonomy
• Reduce decision making by data-entry personnel
• Limit entry of new names by data-entry personnel
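One way to picture the "limit entry of new names" objective is a lookup against the authority file before a name is accepted. The sketch below is only an illustration in Python: the function names and the three-name authority set are hypothetical, and the project's actual database logic is not described in these slides.

```python
# Hypothetical authority list; a real one would be loaded from Tropicos
# or from the specialist catalogs mentioned above.
AUTHORITY_NAMES = {"Lygus lineolaris", "Myzus persicae", "Empoasca fabae"}


def normalize(name: str) -> str:
    """Collapse whitespace and standardize capitalization of a binomial."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def check_name(name: str, authority: set[str] = AUTHORITY_NAMES) -> tuple[str, bool]:
    """Return the normalized name and whether the authority file already has it."""
    cleaned = normalize(name)
    return cleaned, cleaned in authority
```

A data-entry form could accept names where the second value is True and route the rest to a curator for review, which keeps taxonomic decisions away from data-entry personnel.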
Data Aggregation and Dissemination
leveraging DiscoverLife.org
Approaches to Outreach
AMNH Short Course in Collection Databasing Fundamentals
• Train graduate students through participant-support funding
• Involve students from multiple graduate programs
• Provide fundamentals, including database options, data structures, unique specimen identification, specimen handling, georeferencing, research tools, data dissemination
Undergraduate Research Projects
• REU projects joining project data to student research involvement
Community Outreach
• http://research.amnh.org/pbi/heteropteraspeciespage/
Rob Naczi
New York Botanical Garden
Botanical Specimen Imaging
Insect Specimen Imaging
• Image representative specimens for each species
•Use existing imaging stations at partner institutions
•About 30% of Hemiptera are already imaged
•Expect to produce about 20,000 new images
Use of OCR for Populating Botanical Records
Workflow
• jpgs of specimen sheets batch-cropped to labels
• labels saved as new set of jpgs, then exported to ABBYY FineReader 11 Corporate Edition
• overnight, labels batch-processed through ABBYY
• each OCR output file saved as individual text file tied to barcode no.
• individual text files merged into Excel spreadsheet, in which data can be searched, grouped, and parsed
• parsed fields pushed to database
Challenges
• increasing accuracy of parsing
• hand-written labels (now experimenting with out-sourcing)
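The merge step in the workflow (one OCR text file per barcode, combined into a single spreadsheet) might look roughly like the Python sketch below. It assumes each file is named after its barcode, e.g. a hypothetical NY0001.txt, and it writes CSV rather than a native Excel file. The function name and column names are illustrative, not the project's own.

```python
import csv
from pathlib import Path


def merge_ocr_text(ocr_dir: str, out_csv: str) -> int:
    """Merge per-label OCR text files into one CSV; return the number of rows."""
    rows = []
    for path in sorted(Path(ocr_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Assumption: the file name (minus extension) is the specimen barcode.
        # Line breaks are collapsed so each label fits in one spreadsheet cell.
        rows.append({"barcode": path.stem, "label_text": " ".join(text.split())})
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["barcode", "label_text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The resulting file opens directly in Excel, where the label text can be searched, grouped, and parsed into fields before being pushed to the database.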
Data Storage Issues
Botany
• botanical images are valuable products of our digitization efforts, but also challenges, due to storage demands
• our concern is with long-term storage (archiving) of uncompressed, original images
• have encouraged home institutions of our partners to step up, but some unable/unwilling
• our solution for now is storage on portable drives, but this is a tenuous fix and not reliable enough for truly archival storage
Entomology
• no major issues
Christiane Weirauch
University of California Riverside
Subcontract Management
Setup
• 7 collaborating institutions, 27 subawards
• Benefit: long-term data capture across >30 institutions
Issues
1) Delays: administrative and accounting issues
2) Database selection: which one to use?
3) Training: onsite versus remote training?
4) Tracking productivity of subawards not using PBI database
Solutions/suggestions
1) Streamlined administrative and accounting procedures
2) Encourage use of a default database; more discussion
3) Combination of onsite and remote training and monitoring
4) Regular contact with subawards
Unique Specimen Identifiers (USIs)
AMNH Matrix-code labels
• Setup: Matrix codes (barcode scanner) and string of prefix and 8-digit number (human eye) encode the same unique identifier
• Benefit: Tracking of specimens; connect images to records
• Format: Prefix (8 characters): acronym and identifier: e.g., UCRC_ENT XXXXXXXX
•Non-standard USIs: accepted in the database
• Exceptions: collections that were previously databased without USIs (e.g., Aphidoidea, certain mirid taxa)
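The USI format described above (an 8-character prefix followed by an 8-digit number) can be checked with a short pattern. The Python sketch below assumes the prefix uses only uppercase letters and underscores and is separated from the number by a single space, as in the UCRC_ENT example. The pattern and function name are illustrative. Because non-standard USIs are also accepted in the database, the function returns None for them and leaves the decision to the caller.

```python
import re
from typing import Optional

# Assumed standard form: 8-character prefix, one space, 8 digits.
USI_PATTERN = re.compile(r"^(?P<prefix>[A-Z_]{8}) (?P<number>\d{8})$")


def parse_usi(label: str) -> Optional[tuple[str, int]]:
    """Split a standard USI into (prefix, number); return None if non-standard."""
    match = USI_PATTERN.match(label.strip())
    if match is None:
        return None
    return match.group("prefix"), int(match.group("number"))
```

Splitting the identifier this way makes it straightforward to group records by collection prefix or to link image file names back to specimen records.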
Collection Staging
Organizing, sorting, and identifying specimens in preparation for databasing
• Importance: highest identification level and accuracy will yield most useful data for future applications
• Priority: well-curated and well-identified collections
• TTD: limited budget for staging by experts; very successful for, e.g., Miridae and Membracidae
• Issue: routine staging more time-consuming than anticipated
• Possible solution: budget for graduate students or post docs to help with staging (and training/supervision of databasing crew)
Tri-trophic concept: Hemiptera, plants, parasitoids
Capture of host data
•New TTD records: 26% with host records (compared to 24% previously databased); added >800 new hosts
Challenges of integrating parasitoid data
•Level of identification of parasitoids (undescribed species; accurate identification requires skilled personnel)
•Level of host identification (e.g., “white fly”)
•Incorporation of host information from secondary sources (e.g., taxonomic literature)?
On the right track; prioritize specimens with quality host records & integrate secondary host information
Katja Seltmann
The American Museum of Natural History
Efficiency of Data Capture: Insects
• Total as of October 17, 2012 = 198,409
▫ Includes Illinois, Texas, and Kansas
▫ All 20 subcontracts are digitizing now
▫ 53 contributors for ttd-tcn project
Numbers from NHCR database (central database at AMNH – 11 subcontracts)
• $20,000 in equipment costs
• Time per specimen: average 3-3.5 min (range 1.2-6)
• Cost per specimen: $0.93 (includes equipment)
• Peak in July (more hours digitizing)
• 65 collecting events on Christmas Day
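Reading the timing figure as minutes per specimen, the hourly throughput follows from a simple conversion. The Python sketch below is only a worked illustration of that arithmetic, with a hypothetical function name.

```python
def specimens_per_hour(minutes_per_specimen: float) -> float:
    """Convert a per-specimen entry time in minutes to specimens per hour."""
    return 60.0 / minutes_per_specimen


# Timing values quoted on the slide: the range endpoints and the average band.
throughput = {m: round(specimens_per_hour(m), 1) for m in (1.2, 3.0, 3.5, 6.0)}
```

On this reading, the quoted average of 3 to 3.5 minutes corresponds to roughly 17 to 20 specimens per person-hour.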
Efficiency of Data Capture: Plants
All but three institutions up and running
• As of October 9, 2012 have 102,651 images
▫ 3 of 15 institutions not yet begun
• 4 plant collections report:
▫ $30,482.51 equipment costs
▫ $0.73 per specimen image
▫ The unmentioned curator volunteerism: 4-8 hrs/week depending on institution/taxon, ~19 hours a week total
Training Methods: Insects (NHCR Database)
• Curators also training (sexing specimens, database)
• Online training via Skype
▫ Digitizers clubhouse (building community)
▫ Online manuals
▫ Online videos
▫ Remote training
• Using central db can assess quality of data
▫ Flag when new name is entered
▫ Flag when more than 10 specimens entered in one min by one person
▫ Flag when exact duplicate collecting events or localities (check training)
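The entry-rate and duplicate flags listed above could be computed from the central database along the following lines. This Python sketch assumes hypothetical record fields ("user", "entered_at") and counts entries per calendar minute rather than over a sliding window. It is not the NHCR implementation.

```python
from collections import Counter
from datetime import datetime


def flag_fast_entry(records, max_per_minute=10):
    """Return (user, minute) pairs where one person exceeded max_per_minute entries."""
    counts = Counter(
        (rec["user"], rec["entered_at"].replace(second=0, microsecond=0))
        for rec in records
    )
    return sorted(key for key, n in counts.items() if n > max_per_minute)


def flag_duplicate_events(events):
    """Return collecting-event tuples that appear more than once."""
    counts = Counter(events)
    return sorted(event for event, n in counts.items() if n > 1)
```

Here each collecting event is assumed to be a tuple such as (locality, date, collector). Flagged items would be reviewed with the digitizer as a training check, not rejected automatically.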
Training Methods: Plants
• Site visits to subcontract institutions
▫ Kim Watson, Melissa Tulig
▫ Install imaging equipment
▫ Personal involvement
Quality Assessment of Transformed Records (NHCR)
Determination
Completeness
Note Language
(A,B,B) ; (A,A,A) ; (A,C,B)
Present total:  1487  9134
Canada            14    96
USA             1441  8564
Mexico            32   474
Georeferencing: NHCR database
130,000 specimen records
Georeferencing: NHCR database
• GEOLocate (North America)
• Discover Life validation
• Centralized and controlled georeferencing (NYBG, AMNH)
• Volunteer georeferencing
Difficult data Issues: specimen relationships
Difficult data Issues: means for curation?
Summary and Predictions:
• over 50,000 locality records from NHCR
• will reach 1 million new specimen records for insects (harder to predict for plants at the moment)
• less than $1 a specimen (inclusive)
• Arthropod (NHCR) data concerns will become more central as other groups come online
Thanks to
National Science Foundation
co-PIs and collaborators
http://tcn.amnh.org