Amy Driskell - Information management and data Quality
![Page 1: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/1.jpg)
Managing Data Flow Through the Barcoding Pipeline
Amy Driskell
Laboratories of Analytical Biology (LAB)
National Museum of Natural History
Smithsonian Institution
![Page 2: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/2.jpg)
What is the “pipeline”?
[Diagram: Specimen, LIMS, Data QC, Data Deposition]
![Page 3: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/3.jpg)
Outline
1. BEFORE the LIMS
2. LIMS
– Data recorded
– Exploring laboratory success/failure
– Tracking project completion
3. Data QC
– Criteria and data requirements
– Checking for contamination and validity
![Page 4: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/4.jpg)
Critical data management BEFORE specimen enters the laboratory pipeline
• Data elements (“metadata”) necessary for laboratory processing:
– Taxonomy, collection information, etc.
IMPORTANT!
• Assess laboratory successes/failures in light of this information
• Tailor/change lab protocols
![Page 5: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/5.jpg)
Careful Metadata Collection at Specimen Collection or Harvest
• Metadata can be formatted at the beginning of a project (e.g. at specimen collection) to guarantee a smooth information transfer into the LIMS
• Multiple sources for metadata:
– Spreadsheets
– Field Information Management Systems (FIMS)
– Museum databases
– Fusion tables
![Page 6: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/6.jpg)
Rockin’ It “Old School” -- Spreadsheets
• Modified BOLD specimen spreadsheet for use in field/museum
• Additional fields desired by PIs
• Modified easily to interface with multiple kinds of databases
• 96-well format – 2D barcoded tubes, extraction plates
• NOT directly connected to other databases, including LIMS
![Page 7: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/7.jpg)
An Elegant Solution: Biocode Moorea FIMS
http://biocode.berkeley.edu/
Actively connected to their LIMS
![Page 8: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/8.jpg)
bioValidator – cleaning up the collection of metadata
• Many aspects of metadata require specific formats: digital lat/long, meters, names
• bioValidator enforces adherence to formatting and other rules
• Photo matcher
http://biovalidator.sourceforge.net/
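The kind of format check described on this slide can be sketched in a few lines of Python. This is not bioValidator's code or rule set: the field names (`decimal_latitude`, `decimal_longitude`, `elevation_m`, `scientific_name`) and the numeric ranges are assumptions chosen only for illustration.

```python
def validate_record(record):
    """Return a list of human-readable problems found in one specimen record."""
    problems = []

    def check_number(field, low, high):
        # Values usually arrive as text from a spreadsheet, so parse first.
        raw = record.get(field, "")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            problems.append(f"{field}: '{raw}' is not a decimal number")
            return
        if not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}..{high}")

    check_number("decimal_latitude", -90.0, 90.0)
    check_number("decimal_longitude", -180.0, 180.0)
    check_number("elevation_m", -500.0, 9000.0)  # hypothetical plausible range, in meters

    # A name must be present; checking it against a taxonomy is out of scope here.
    if not str(record.get("scientific_name", "")).strip():
        problems.append("scientific_name: missing")
    return problems
```

A record that passes returns an empty list, so a whole spreadsheet could be screened before it is loaded into the LIMS by collecting the non-empty results.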
![Page 9: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/9.jpg)
Museum Collection Databases
• Sampling directly from existing collections?
• Some museum databases cannot link directly to lab-based information systems (LIMS)
• Requires output from collection database, input into lab database – no automatic updates
![Page 10: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/10.jpg)
Why?
1. Downstream insertion of data into other databases simplified
2. Because metadata has important uses in the lab
• Determine possible causes of failure: taxonomy, collection event, specimen age
• Adjust extraction or amplification protocols
• Design new primers – e.g. smaller fragments
![Page 11: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/11.jpg)
Specimens enter the lab
Metadata enters the LIMS
[Diagram: Specimen & Metadata, LIMS, Data QC, Data Deposition]
![Page 12: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/12.jpg)
What is a LIMS?
• An electronic lab “notebook” (aka database) to replace our traditional paper lab notebooks.
• Tracks a specimen through lab processes from extraction through to barcode sequence completion (data QC may use external software).
• Records every lab procedure.
• Provides information to guide further lab efforts – success rates, “redo” lists
• Records the physical location of extracts, etc.
![Page 13: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/13.jpg)
My requirements for a LIMS
• I want a system that records every piece of information about each specimen/extract for which I produce a barcode sequence.
• I want my procedures and protocols to be transparent enough so that anyone can reproduce my results.
• This includes my QC procedures.
• Currently no good place to publish these data.
![Page 14: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/14.jpg)
Data to be recorded
• Extraction: protocol, digestion time, etc.
• PCR: recipes, DNA concentration, cycling parameters, clean-up method (PCR machine, brand of enzyme, lot #)
• Gel photos
• Sequencing: recipe, clean-up, machine, etc.
• Bonus: success or failure can be mapped back to any of these recorded values. Maybe the Taq was bad? Or the PCR machine needs repair?
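Mapping success or failure back to a recorded value amounts to grouping reactions by that value and comparing success rates. The sketch below assumes each reaction is a dictionary with a boolean `success` key; the field names and the four example records are invented for illustration and are not real lab data.

```python
from collections import defaultdict

def success_rate_by(reactions, field):
    """Group reaction records by one recorded field; return {value: fraction successful}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for rxn in reactions:
        key = rxn[field]
        totals[key] += 1
        if rxn["success"]:
            passed[key] += 1
    return {key: passed[key] / totals[key] for key in totals}

# Invented example records: two Taq lots run on two PCR machines.
reactions = [
    {"taq_lot": "A1", "pcr_machine": "PCR-1", "success": True},
    {"taq_lot": "A1", "pcr_machine": "PCR-2", "success": True},
    {"taq_lot": "B7", "pcr_machine": "PCR-1", "success": False},
    {"taq_lot": "B7", "pcr_machine": "PCR-2", "success": False},
]

print(success_rate_by(reactions, "taq_lot"))      # rates differ by lot
print(success_rate_by(reactions, "pcr_machine"))  # rates are equal by machine
```

In this made-up case the failures track the enzyme lot rather than the machine, which is the kind of question the recorded values make answerable.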
![Page 15: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/15.jpg)
• A LIMS can be homegrown (like LAB’s barcoding LIMS, or SI’s plant barcoding LIMS) – relatively simple relational databases
• Sophisticated, commercially produced – Geneious plug-in Moorea Biocode LIMS (plug-in is free)
http://software.mooreabiocode.org
• Software updated and maintained
• Plugs into the Geneious data analysis software
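To make "relatively simple relational database" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables and columns are assumptions for illustration; they do not reproduce the schema of LAB's LIMS, SI's plant barcoding LIMS, or the Biocode LIMS.

```python
import sqlite3

# Hypothetical three-table schema: specimen -> extraction -> PCR reaction.
SCHEMA = """
CREATE TABLE specimen (
    specimen_id      TEXT PRIMARY KEY,
    taxonomy         TEXT,
    collection_event TEXT
);
CREATE TABLE extraction (
    extract_id       TEXT PRIMARY KEY,
    specimen_id      TEXT NOT NULL REFERENCES specimen(specimen_id),
    protocol         TEXT,
    plate            TEXT,
    well             TEXT,
    freezer_location TEXT
);
CREATE TABLE pcr (
    pcr_id     INTEGER PRIMARY KEY,
    extract_id TEXT NOT NULL REFERENCES extraction(extract_id),
    primers    TEXT,
    machine    TEXT,
    enzyme_lot TEXT,
    success    INTEGER  -- 1 = amplified, 0 = failed
);
"""

def create_lims(path=":memory:"):
    """Create the tables and return an open connection (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Because each PCR row points to an extract and each extract to a specimen, a single join connects a failed reaction to the specimen's taxonomy and to the freezer location of its extract.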
![Page 16: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/16.jpg)
Workflow
![Page 17: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/17.jpg)
Mapping workflow elements to success
![Page 18: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/18.jpg)
Tracking project progress & identifying next steps
• Which specimens have completed barcodes?
• Which specimens need additional labwork?
• Which specimens should be abandoned?
• Where are the original DNA extracts or tissue samples?
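The first three questions can be answered from the attempt history a LIMS stores. The sketch below assumes a hypothetical rule, not one stated in the talk: a specimen is abandoned after `max_attempts` failed attempts and is otherwise put on the "redo" list until a barcode is obtained.

```python
def classify_specimen(attempts, max_attempts=3):
    """Classify one specimen from its list of attempt outcomes (True = barcode obtained)."""
    if any(attempts):
        return "complete"
    if len(attempts) >= max_attempts:
        return "abandon"  # assumed cutoff; a real lab would set its own rule
    return "redo"

def progress_report(history, max_attempts=3):
    """Turn {specimen_id: [outcomes]} into lists of specimen IDs per status."""
    report = {"complete": [], "redo": [], "abandon": []}
    for specimen_id, attempts in sorted(history.items()):
        report[classify_specimen(attempts, max_attempts)].append(specimen_id)
    return report
```

Specimens with no attempts yet fall into "redo", which doubles as the list of work still to be started.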
![Page 19: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/19.jpg)
Project Progress
![Page 20: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/20.jpg)
Raw data enters the QC process
[Diagram: Specimen, LIMS, Data QC, Data Deposition]
![Page 21: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/21.jpg)
Data QC
• OUTSIDE of LIMS database
• “Clean up” raw data – trim, examine quality
• Assemble passed traces (“contig”) for a specimen
• Examine/edit contigs
• Check validity of resulting sequences
![Page 22: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/22.jpg)
My data QC ethos
• All criteria for each step of data analysis are recorded
• For raw trace processing: trimming criteria, length and quality requirements, binning criteria
• For assembly: assembly parameters, product length, etc.
• Hand editing is minimized*
• It would be possible for anyone to recreate the barcode sequence
![Page 23: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/23.jpg)
Any DNA sequence analysis software can be used for data QC
• Sequencher (Gene Codes) & Geneious (Biomatters)
– Trim ends of raw sequences with adjustable criteria, explore effects of trim criteria
– Discard short or poor sequences
– Assemble trimmed reads with stringent, but adjustable criteria
– Output completed sequences
• Geneious LIMS is plugged into the data analysis software
– direct communication
– binning*
• Sequencher data must be exported and imported into LIMS
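End trimming with adjustable criteria can be illustrated with a deliberately simple rule: drop bases from each end until one meets a quality threshold, then discard the read if too little remains. This is not the algorithm used by Sequencher or Geneious; the thresholds `min_qual` and `min_length` are illustrative assumptions, and the quality scores are taken to be Phred-style integers.

```python
def trim_read(bases, quals, min_qual=20, min_length=100):
    """Trim low-quality ends; return (bases, quals), or None if the read is too short."""
    start, end = 0, len(bases)
    # Advance from the 5' end until a base meets the threshold.
    while start < end and quals[start] < min_qual:
        start += 1
    # Retreat from the 3' end in the same way.
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    if end - start < min_length:
        return None  # discarded as short or poor
    return bases[start:end], quals[start:end]
```

Recording `min_qual` and `min_length` alongside the result is what lets someone else rerun the same trim and obtain the same sequence.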
![Page 24: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/24.jpg)
Data analysis
Here are the traces. You can see some FIMS data in the document fields (e.g. identified by, tissue id). You will also notice a binning column (see the following slide).
![Page 25: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/25.jpg)
Binning
Automatic categorization of reads and assemblies
• Change binning parameters, examine effects
• Trimming and assembly dialog boxes similar
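Binning can be pictured as a small rule set applied to every read. The sketch below assigns a read to "high", "medium", or "low" from its length and its percentage of high-quality bases; the parameter names and default thresholds are assumptions for illustration, not the defaults of any particular program.

```python
def bin_read(quals, hq_threshold=20, min_length=200, high_pct=80.0, medium_pct=50.0):
    """Assign a read to a bin from its per-base quality scores."""
    if len(quals) < min_length:
        return "low"  # too short to be useful, whatever its quality
    # Percentage of bases at or above the high-quality threshold.
    hq_pct = 100.0 * sum(q >= hq_threshold for q in quals) / len(quals)
    if hq_pct >= high_pct:
        return "high"
    if hq_pct >= medium_pct:
        return "medium"
    return "low"
```

Changing a threshold and re-binning the same reads shows directly how many of them move between bins, which is the "change parameters, examine effects" loop.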
![Page 26: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/26.jpg)
Final Steps: Is it a contaminant? Is it identified correctly?
• A number of procedures for identifying contamination or incorrect identification
– BLASTing database of known contaminants; GenBank; BOLD
– Quick and dirty assembly tests
– NJ trees
– Geneious taxonomy verification tool
![Page 27: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/27.jpg)
Verify Taxonomy
• BLASTs your sequences
• Gets the NCBI taxonomy for the best hit(s)
• Compares to the taxonomy from the FIMS
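The comparison step can be sketched as follows, leaving out the BLAST search itself. The code is not the Geneious tool: it assumes both lineages are already available as lists ordered from broadest to most specific rank and aligned rank for rank, which real NCBI lineages often are not, and the `min_depth` cutoff is a hypothetical choice.

```python
def shared_depth(fims_lineage, hit_lineage):
    """Count how many leading ranks two lineages have in common."""
    depth = 0
    for expected, observed in zip(fims_lineage, hit_lineage):
        if expected.strip().lower() != observed.strip().lower():
            break
        depth += 1
    return depth

def flag_suspect(fims_lineage, hit_lineage, min_depth=3):
    """True if the best hit diverges from the FIMS taxonomy before min_depth ranks agree."""
    return shared_depth(fims_lineage, hit_lineage) < min_depth
```

A gastropod specimen whose best hit is a mammal would share only the kingdom and be flagged as a likely contaminant, while a hit in a sister family would agree to class level and pass.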
![Page 28: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/28.jpg)
Good, clean, barcode sequences
• Feed back into LIMS*
– Monitor progress
– Connect sequences and traces to specimen data
• Prepare for output to databases: GenBank or BOLD upload packages
[Diagram: Specimen, LIMS & Data QC, Data Deposition]
![Page 29: Amy Driskell - Information management and data Quality](https://reader033.fdocuments.us/reader033/viewer/2022061300/54d27b944a79596d078b4632/html5/thumbnails/29.jpg)
Positive Information Flow from field or museum to final data deposition
1. Collect metadata to flow easily into LIMS and other databases
2. Record all aspects of all laboratory procedures (LIMS)
3. Use LIMS system for reporting and protocol investigation, monitoring of project progress
4. Input information and data from QC procedures into LIMS*
5. LIMS output upload packages for public databases