Designing a metabolomics experiment
RTI International
RTI International is a trade name of Research Triangle Institute. www.rti.org
Designing a metabolomics experiment
Grier P. Page, Ph.D., Senior Statistical Geneticist
RTI International
Atlanta Office
770-407-4907
Types of Metabolomics
Designing a good study
Understand the strengths and weaknesses of each step of the experiments.
Take these strengths and weaknesses into account in your design.
Primary consideration of good experimental design
From Drug Discov Today. 2005 Sep 1;10(17):1175-82.
State the Question and Articulate the Goals
The Myth That Data Mining has No Hypothesis
There always needs to be a biological question in the experiment. If there is no question, don't bother.
The question could be nebulous: What happens to the gene expression of this tissue when I apply Drug A?
The purpose of the question is to drive the experimental design.
Make sure the samples answer the question: Cause vs. effect.
Experimental Design
Biological replication is essential.
Two types of replication
– Biological replication: samples from different individuals are analyzed.
– Technical replication: the same sample is measured repeatedly.
Technical replicates allow only the effects of measurement variability to be estimated and reduced, whereas biological replicates allow this to be done for both measurement variability and biological differences between cases.
Almost all experiments that use statistical inference require biological replication.
Statistical analyses
Supervised analyses (linear models, etc.)
– Using fold change alone as a differential expression test is not valid.
– 'Shrinkage' and/or use of Bayes can be a good thing.
False-discovery rate is a good alternative to conventional multiple-testing approaches.
Data is not missing at random.
Pathway testing is desirable.
Classification
Supervised classification
– Supervised-classification procedures require independent cross-validation.
– See MAQC-II recommendations: Nat Biotechnol. 2010 August; 28(8): 827–838. doi:10.1038/nbt.1665.
– Wholly separate model building and validation stages. Can be 3 stage with multiple models tested.
Unsupervised classification
– Unsupervised classification should be validated using resampling-based procedures.
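The idea of wholly separate model-building and validation stages can be sketched as a three-way split of the samples. This is only an illustration: the function name, the split fractions, and the stage names (build, select, validate) are assumptions, not part of the MAQC-II recommendations.

```python
import random


def three_stage_split(sample_ids, fractions=(0.5, 0.25, 0.25), seed=0):
    """Split samples into disjoint build / select / validate sets.

    build:    used to fit candidate models
    select:   used to choose among the candidate models
    validate: touched once, to assess the single chosen model
    The fractions are hypothetical defaults.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_build = int(len(ids) * fractions[0])
    n_select = int(len(ids) * fractions[1])
    build = ids[:n_build]
    select = ids[n_build:n_build + n_select]
    validate = ids[n_build + n_select:]  # everything that is left
    return build, select, validate


build, select, validate = three_stage_split(range(20))
```

If an individual contributes several technical replicates, the split should be made on individuals rather than on measurements, so that no individual appears in more than one stage.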
Sample size estimation for metabolomics studies
There is strength in numbers: power and sample size.
Unsupervised analyses
– Principal components, clustering, heat maps, and variants.
– These are actually data transformations or data display rather than hypothesis testing, thus it is unclear if sample size estimation is appropriate or even possible.
– Stability of clustering may be appropriate to think about. Garge et al. 2005 suggested 50+ samples for any stability.
Sample size in supervised experiments
Supervised analyses
– Linear models and variants.
– Methods are still evolving, but we suggest the approach we developed for microarrays may be appropriate for metabolomics (being evaluated).
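For orientation only, the following sketch computes a per-group sample size for a two-group comparison of one metabolite using the textbook normal approximation. It is not the microarray-derived approach mentioned above, which this transcript does not describe. The function name, the default values, and the Bonferroni-style division of alpha by the number of tests are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Per-group n for a two-sided, two-sample comparison (normal approximation).

    effect_size: standardized mean difference (difference / standard deviation).
    n_tests:     number of metabolites tested; alpha is divided by it as a
                 conservative stand-in for a multiple-testing adjustment.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (alpha / n_tests) / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


n_single = samples_per_group(1.0)              # one metabolite, large effect
n_many = samples_per_group(1.0, n_tests=500)   # 500 metabolites tested
```

The required sample size grows quickly as the effect size shrinks or the number of tested metabolites rises, which is the practical meaning of "strength in numbers".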
Experimental Conduct
All experiments are subject to non-biological variability that can confound any study.
[Figure: UMSA analysis; samples labeled Insulin Resistant and Insulin Sensitive, Day 1 and Day 2]
Design Issues
Known sources of non-biological error (not exhaustive) that must be addressed
– Technician / post-doc
– Reagent lot
– Temperature
– Protocol
– Date
– Location
– Cage / field positions
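One simple check before running samples is to cross-tabulate the biological groups against each non-biological factor in the run sheet. The sketch below assumes a hypothetical run sheet stored as a list of dictionaries; the field names `group` and `day` and the example rows are invented for illustration.

```python
from collections import defaultdict


def cross_tabulate(runs, biological, technical):
    """Count samples for each (biological level, technical level) pair."""
    table = defaultdict(int)
    for run in runs:
        table[(run[biological], run[technical])] += 1
    return dict(table)


def is_fully_confounded(runs, biological, technical):
    """True when every technical level contains only one biological level."""
    levels = defaultdict(set)
    for run in runs:
        levels[run[technical]].add(run[biological])
    return all(len(groups) == 1 for groups in levels.values())


# Hypothetical run sheet: each group was processed on its own day.
confounded = [
    {"group": "resistant", "day": 1}, {"group": "resistant", "day": 1},
    {"group": "sensitive", "day": 2}, {"group": "sensitive", "day": 2},
]
# Hypothetical run sheet: both groups appear on both days.
balanced = [
    {"group": "resistant", "day": 1}, {"group": "sensitive", "day": 1},
    {"group": "resistant", "day": 2}, {"group": "sensitive", "day": 2},
]
```

In the first run sheet a difference between groups cannot be separated from a difference between days, so the design should be changed before any data are collected.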
Control Everything!
Know what you are doing.
Practice! Practice!
Metabolite quality
Still an evolving field: there are few good metrics, comparable to the RIN score or A260/A280 ratios used for RNA, to assess contamination and quality of extraction.
Example from RNA
• Confirmation of RNA integrity, based on a 28S:18S ratio greater than 1.5 as quantified by Agilent BioAnalyzer and formaldehyde gel electrophoresis.
• However, the Drosophila RNA has a split peak for the 28S ribosomal RNA on the Bioanalyzer.
[Figure: Bioanalyzer traces of intact RNA and degraded RNA. Images from Agilent.]
Be aware of what your specific species should look like
• The Drosophila RNA has a split peak for the 28S ribosomal RNA on the Bioanalyzer.
• And no 18S peak.
What if you can't control or make all things uniform?
Randomize. Orthogonalize.
What are Orthogonalization and Randomization?
• Orthogonalization: spreading the biological sources of error evenly across the non-biological sources of error.
– Maximally powerful for known sources of error.
• Randomization: spreading the biological sources of error at random across the non-biological sources of error.
– Useful for controlling for unknown sources of error.
Examples of Orthogonalization and Randomization

The experiment
Sample #   Treatment   Variety
1          1           1
2          1           2
3          1           1
4          1           2
5          2           1
6          2           2
7          2           1
8          2           2

Orthogonalize
Order   Sample
1       1
2       2
3       5
4       6
5       8
6       7
7       4
8       3

Randomize
Order   Sample
1       7
2       6
3       4
4       1
5       2
6       8
7       5
8       3
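Run orders like the two above can be generated programmatically. The following is a minimal sketch, assuming the eight samples from the example and two processing blocks (for instance two days); the function names, the block count, and the seed are illustrative assumptions.

```python
import random
from collections import defaultdict

# The eight samples from the example: (sample, treatment, variety).
SAMPLES = [
    (1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 1, 2),
    (5, 2, 1), (6, 2, 2), (7, 2, 1), (8, 2, 2),
]


def randomized_order(samples, seed=0):
    """Randomization: process the samples in a random order."""
    order = [sample for sample, _, _ in samples]
    random.Random(seed).shuffle(order)
    return order


def orthogonalized_blocks(samples, n_blocks=2, seed=0):
    """Orthogonalization: deal each treatment-by-variety combination
    evenly across the blocks, then shuffle the order within each block."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample, treatment, variety in samples:
        groups[(treatment, variety)].append(sample)
    blocks = [[] for _ in range(n_blocks)]
    for members in groups.values():
        rng.shuffle(members)
        for i, sample in enumerate(members):
            blocks[i % n_blocks].append(sample)
    for block in blocks:
        rng.shuffle(block)
    return blocks


blocks = orthogonalized_blocks(SAMPLES)
run_order = randomized_order(SAMPLES)
```

With two samples per treatment-by-variety combination and two blocks, every block receives exactly one sample of each combination, so a block effect cannot masquerade as a treatment or variety effect.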
Know your data: what should it look like?
[Figure: example data panels labeled "These are OK" and "These are not OK"]
One bad sample can contaminate an experiment
[Figure: histogram of p-values with a potentially bad chip included, and histogram of p-values with the bad chip removed]
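A p-value histogram of this kind is easy to tabulate, so the analysis can be rerun with and without a suspect sample and the two histograms compared. The sketch below only does the binning; the function name and the choice of ten bins are assumptions, and the p-values shown are placeholders rather than real results.

```python
def p_value_histogram(p_values, n_bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # A p-value of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Placeholder p-values (not real data).
counts = p_value_histogram([0.01, 0.05, 0.5, 1.0, 0.99])
```

When no metabolite differs between groups the p-values are expected to be roughly uniform, so a histogram with an unusual shape is a prompt to inspect individual samples.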
Quality of Database, Bioinformatics and Interpretative tools
Just because a database says something does not mean it is right. Read the evidence.
Databases are biased.
Databases are incomplete.
Databases have lots of data.
Understand data before you use it.
Databases are useful!
Understand what databases include, what they don't include, and the assumptions they make.
Issues in the Annotation of Genes
Gene Symbol | p-value | fc 50/21 | Gene Ontology Biological Process | Gene Ontology Cellular Component | Pathway
Aco2 | 0.746656 | 0.955755 | --- | --- | Krebs-TCA_Cycle // GenMAPP
Pdk2 | 0.967577 | 1.005459 | 6086 // acetyl-CoA biosynthesis from pyruvate | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Pdk2 | 0.823635 | 1.02781 | 6086 // acetyl-CoA biosynthesis from pyruvate | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Pdha2 | 0.368075 | 1.403263 | 6096 // glycolysis | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Idh1 | 0.710704 | 0.994378 | 6099 // tricarboxylic acid cycle | 5829 // cytosol | ---
Acly | 0.367315 | 0.982691 | 6099 // tricarboxylic acid cycle | 5622 // intracellular | Fatty_Acid_Synthesis // GenMAPP
Aco2 | 1.22E-06 | 0.561041 | --- | --- | Krebs-TCA_Cycle // GenMAPP
Fh1 | 6.76E-06 | 0.690515 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Atp5g3 | 1.53E-06 | 0.754735 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | ---
Suclg1 | 8.87E-07 | 0.694384 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Mdh1 | 5.92E-09 | 0.519311 | 6099 // tricarboxylic acid cycle // | --- | Krebs-TCA_Cycle // GenMAPP
Mor1 | 4.24E-07 | 0.617645 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Idh1 | 2.36E-06 | 0.677013 | 6099 // tricarboxylic acid cycle // | 5829 // cytosol // | ---
Idh3g | 2.19E-06 | 0.709971 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Dlst | 2.49E-07 | 0.688339 | --- | --- | ---
Sdhd | 5.13E-07 | 0.583485 | 6121 // mitochondrial electron transport, succinate to ubiquinone | 5749 // respiratory chain complex II (sensu Eukaryota) | Krebs-TCA_Cycle // GenMAPP
Sdhc | 1.82E-06 | 0.64108 | --- | --- | ---
RGD:735073 | 2.13E-07 | 0.570307 | --- | 9352 // dihydrolipoyl dehydrogenase complex | ---
Cs | 1.56E-07 | 0.560436 | --- | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
RGD:621624 | 1E-06 | 0.486736 | 6099 // tricarboxylic acid cycle // | 5829 // cytosol | ---
Idh3B | 2.57E-07 | 0.694389 | --- | --- | Krebs-TCA_Cycle // GenMAPP
Mdh1 | 1.08E-05 | 0.496911 | 6099 // tricarboxylic acid cycle // | --- | Krebs-TCA_Cycle // GenMAPP
Pc | 1.91E-05 | 0.468765 | 6094 // gluconeogenesis // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
RGD:708561 | 0.004002 | 0.76777 | --- | 5913 // cell-cell adherens junction | Krebs-TCA_Cycle // GenMAPP
RGD:708561 | 0.03978 | 0.686511 | --- | 5913 // cell-cell adherens junction | Krebs-TCA_Cycle // GenMAPP
Dlat | 4.76E-06 | 0.435534 | 6086 // acetyl-CoA biosynthesis from pyruvate // inferred from electronic annotation /// 6096 // glycolysis // inferred from electronic annotation /// 8152 // metabolism // inferred from electronic annotation | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Sdhd | 1.3E-06 | 0.64335 | 6121 // mitochondrial electron transport, succinate to ubiquinone // inferred from sequence or structural similarity | 5749 // respiratory chain complex II (sensu Eukaryota) // inferred from sequence or structural similarity | Krebs-TCA_Cycle // GenMAPP
Sdha | 7.85E-06 | 0.730667 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Idh3a | 0.000449 | 0.690147 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Pdk4 | 0.044616 | 1.700116 | 6086 // acetyl-CoA biosynthesis from pyruvate | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Cs | 1.36E-06 | 0.592128 | --- | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Acly | 0.000227 | 0.554459 | 6085 // acetyl-CoA biosynthesis | 5622 // intracellular // | Fatty_Acid_Synthesis // GenMAPP
Annotation is inconsistent across sources
Issues with pathway data
[Figure panels: TCA cycle from Ingenuity; TCA cycle from GenMAPP; TCA cycle from Ingenuity]
Summary
Design your experiment well.
Conduct your experiment well.
Control for non-biological sources of error.
Know what is good and bad quality data at each stage, including metabolite, image, data, and annotation.
If you are aware of these issues and control for them, highly powerful and reproducible metabolite experimentation is possible.
Else you get garbage.
Practice compendium research – to allow others to replicate your work
Many high profile omic studies are not even technically reproducible
Overshare your data and show your work
References
The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray based predictive models. Nat Biotechnol. 2010 Aug;28(8):827–838.
Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65.
Reproducible clusters from microarray research: whither? BMC Bioinformatics. 2005 Jul 15;6 Suppl 2:S10.
Baggerly K. Disclose all data in publications. Nature. 2010 Sep 23;467(7314):401. PMID: 20864982.
Repeatability of published microarray gene expression analyses. Nat Genet. 2009 Feb;41(2):149-55.
A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition. 2003 Nov-Dec;19(11-12):997-1000.
If time allows
RTI Regional Comprehensive Metabolomics Resource Core
(RTI RCMRC)
Susan Sumner, PhD
Director, RTI RCMRC
Discovery Sciences
Proteomics and Metabolomics Programs
RTI International
Contact Information for the RTI RCMRC
Susan C.J. Sumner, PhD
Director RTI RCMRC
Senior Scientist nanoSafety
RTI International
Discovery Sciences
3040 Cornwallis Drive
Research Triangle Park
North Carolina 27709
919-541-7479 (office)
919-622-4456 (cell)
Jason P. Burgess, PhD
Program Coordinator, RTI RCMRC
Associate Director, Discovery Sciences
RTI International
3040 Cornwallis Drive
Research Triangle Park
North Carolina 27709
919-541-6700 (office)
MS and NMR Instruments at RTI and DHMRI
Instrument                 RTI   DHMRI
Mass Spectrometers (38)
  LC-MS                    13    6
  GC-MS                    4     3
  GC x GC-TOF-MS           1     1
  ICP-MS                   6     1
  MALDI ToF/ToF            2     1
NMR (6)                    2     4
Some RTI Metabolomics Applications and Pilots
Experience with adolescent and adult human subject research, animal model and cell based research, e.g.,
– Apoptosis: cells
– Drug induced liver injury: animal models
– In utero exposure to chemicals and fetal imprinting: animal models
– Dietary exposure and imprinting: animal models
– NAFLD: pediatric obesity; microbiome
– Weight loss: pediatric obesity
– Preterm delivery: human subjects
– Response to vaccine: human subjects
– Nicotine withdrawal: human subjects
– Colon cancer: human subjects
Pilot and Feasibility Studies
The aim of the pilot and feasibility program is to foster collaborations and promote the use of metabolomics.
Studies will be selected through an application process.
– Application involves abstract, description of samples available (matrix type, volume, type and duration of storage, sample processing, freeze thaws, etc.), description of phenotypes, and plan for subsequent grant/contract submissions for metabolomics analysis beyond initial pilot study.
Applications may also include technology development.
Applicants must agree to deposit data in DRCC, coauthor publications, and submit joint grant/contract proposals.
Deadlines being defined