Designing a high quality metabolomics experiment
RTI International
RTI International is a trade name of Research Triangle Institute. www.rti.org
Designing a high quality metabolomics experiment
Grier P. Page, Ph.D.
Senior Statistical Geneticist
RTI International
Atlanta Office
770-407-4907
Errors Errors Everywhere
Primary consideration of good experimental design
Understand the strengths and weaknesses of each step of the experiments.
Take these strengths and weaknesses into account in your design.
The Myth That Metabolomics does not need a Hypothesis
There always needs to be a biological question in the experiment. If there is not even a question, don’t bother.
The question could be nebulous: What happens to the metabolome of this tissue when I apply Drug A?
The purpose of the question is to drive the experimental design.
Make sure the samples answer the question: Cause vs. effect.
Design Issues
Known sources of non-biological error (not exhaustive) that must be addressed
– Technician / post-doc
– Reagent lot
– Temperature
– Protocol
– Date
– Location
– Cage / field positions
Biological replication is essential.
Two types of replication
– Biological replication – samples from different individuals are analyzed
– Technical replication – same sample measured repeatedly
Technical replicates allow only the effects of measurement variability to be estimated and reduced, whereas biological replicates allow this to be done for both measurement variability and biological differences between cases.
Almost all experiments that use statistical inference require biological replication.
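The difference between the two kinds of replication can be made concrete with the standard variance formula for a balanced nested design. The sketch below is illustrative only: the function name and the variance values are assumptions chosen for the example, not figures from the talk.

```python
def variance_of_group_mean(sigma2_bio, sigma2_tech, n_bio, n_tech):
    """Variance of a group mean in a balanced nested design.

    sigma2_bio  -- between-individual (biological) variance
    sigma2_tech -- measurement (technical) variance
    n_bio       -- number of individuals per group
    n_tech      -- repeated measurements per individual
    """
    # The biological term shrinks only when more individuals are sampled;
    # extra technical replicates reduce the measurement term alone.
    return sigma2_bio / n_bio + sigma2_tech / (n_bio * n_tech)


# Hypothetical variances, used only to compare two ways of spending 10 runs.
one_individual_ten_runs = variance_of_group_mean(1.0, 0.25, n_bio=1, n_tech=10)
ten_individuals_one_run = variance_of_group_mean(1.0, 0.25, n_bio=10, n_tech=1)

print(one_individual_ten_runs)
print(ten_individuals_one_run)
```

With the same number of instrument runs, the design with ten individuals gives a much smaller variance for the group mean, because repeated measurement of one individual never reduces the biological term.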
How many replicates?
Controlled experiments – cell lines, mice, rats: 8-12 per group.
Human studies – discovery: 20+ per group.
For predictive models – 100+ per group; need model building and validation sets.
The more the better, always.
Experimental Conduct
All experiments are subject to non-biological variability that can confound any study.
What are Orthogonalization and Randomization?
Orthogonalization – spreading the biological sources of error evenly across the non-biological sources of error.
– Maximally powerful for known sources of error.
Randomization – spreading the biological sources of error at random across the non-biological sources of error.
– Useful for controlling for unknown sources of error.
Examples of Orthogonalization and Randomization

The experiment
Sample # | Treatment | Variety
1 | 1 | 1
2 | 1 | 2
3 | 1 | 1
4 | 1 | 2
5 | 2 | 1
6 | 2 | 2
7 | 2 | 1
8 | 2 | 2

Orthogonalize
Order | Sample
1 | 1
2 | 2
3 | 5
4 | 6
5 | 8
6 | 7
7 | 4
8 | 3

Randomize
Order | Sample
1 | 7
2 | 6
3 | 4
4 | 1
5 | 2
6 | 8
7 | 5
8 | 3
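The two run orders in the example can be generated programmatically. The following is a minimal sketch, assuming a two-factor design like the one in the table. The function names, the blocking scheme (every Treatment x Variety combination appears once per block of runs), and the seeds are illustrative assumptions, not the procedure used to build the slide.

```python
import random
from itertools import product


def orthogonalized_order(treatments, varieties, n_blocks, seed=0):
    """Run order in which each block holds every combination exactly once."""
    rng = random.Random(seed)
    combos = list(product(treatments, varieties))
    order = []
    for _ in range(n_blocks):
        block = combos[:]      # one copy of every combination
        rng.shuffle(block)     # randomize position within the block
        order.extend(block)
    return order


def randomized_order(treatments, varieties, n_blocks, seed=0):
    """Run order that is a plain random permutation of all runs."""
    rng = random.Random(seed)
    runs = list(product(treatments, varieties)) * n_blocks
    rng.shuffle(runs)
    return runs


orth = orthogonalized_order([1, 2], [1, 2], n_blocks=2)
rand = randomized_order([1, 2], [1, 2], n_blocks=2)
print(orth)
print(rand)
```

In the orthogonalized order, a known nuisance factor such as run position is balanced by construction. In the randomized order, balance holds only on average, which is what protects against sources of error that were not anticipated.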
Statistical analyses have assumptions too
Statistical analyses
Supervised analyses – linear models, etc.
– Assume IID (independent and identically distributed)
– Normality
– Sometimes can rely on the central limit theorem
– ‘Weird’ variances
– Using fold change alone as a statistic is not valid.
– ‘Shrinkage’ and/or use of Bayes can be a good thing.
False-discovery rate is a good alternative to conventional multiple-testing approaches.
Pathway testing is desirable.
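As one concrete instance of false-discovery-rate control, the sketch below implements the Benjamini-Hochberg adjustment in plain Python. The slide does not name a specific FDR procedure, so the choice of Benjamini-Hochberg is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four metabolites.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```

Metabolites whose adjusted value falls below the chosen FDR level (for example 0.05) are reported as discoveries. A Bonferroni correction on the same data would multiply every p-value by four, which is more conservative.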
Classification
Supervised classification
– Supervised-classification procedures require independent cross-validation.
– See MAQC-II recommendations: Nat Biotechnol. 2010 August; 28(8): 827–838. doi:10.1038/nbt.1665.
Wholly separate model building and validation stages. Can be 3 stage with multiple models tested.
Unsupervised classification
– Unsupervised classification should be validated using resampling-based procedures.
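The wholly separate model-building and validation stages for supervised classification can be sketched as a one-time partition of the samples, made before any modelling begins. The function name, the stage names, and the 50/25/25 fractions below are assumptions for illustration; neither the slide nor this sketch prescribes particular proportions.

```python
import random


def three_stage_split(sample_ids, fractions=(0.5, 0.25, 0.25), seed=0):
    """Partition samples into build, model-selection, and validation sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_build = int(len(ids) * fractions[0])
    n_select = int(len(ids) * fractions[1])
    build = ids[:n_build]                        # fit candidate models here
    select = ids[n_build:n_build + n_select]     # choose among the models here
    validate = ids[n_build + n_select:]          # touch once, for the final model
    return build, select, validate


build, select, validate = three_stage_split(range(100))
print(len(build), len(select), len(validate))
```

The key design point is that the validation samples play no part in fitting or choosing the model, so the performance estimate obtained from them is not biased by the model search.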
Unsupervised classification - continued
Unsupervised analysis methods
– Cluster analysis
– Principal components
– Separability analysis
All have assumptions and input parameters, and changing them results in very different answers.
There is strength in numbers: power and sample size.
Unsupervised analyses
– Principal components, clustering, heat maps and variants
– These are actually data transformations or data display rather than hypothesis testing, so it is unclear whether sample size estimation is appropriate or even possible.
– Stability of clustering may be appropriate to think about. Garge et al. 2005 suggested 50+ samples for any stability.
Sample size in supervised experiments
Supervised analyses
– Linear models and variants
– Methods are still evolving, but we suggest the approach we developed for microarrays may be appropriate for metabolomics (being evaluated).
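For orientation only, the sketch below shows a classical normal-approximation sample size calculation for comparing two group means on a single metabolite. This is not the microarray-derived approach referred to above, which the slides do not describe. The function name and the example effect size are assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison.

    effect_size -- difference in means divided by the common standard deviation
    alpha       -- two-sided significance level for this one test
    power       -- desired probability of detecting the effect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical effect of one standard deviation.
n_single_test = samples_per_group(1.0)
# A stricter per-test alpha, as when many metabolites are tested at once.
n_strict = samples_per_group(1.0, alpha=0.001)
print(n_single_test, n_strict)
```

The required group size grows as the per-test significance level is tightened, which is one reason studies that test many metabolites need more samples than a single-endpoint calculation suggests.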
Metabolomics does not reveal everything and different technologies show different things
Metabolite quality
Still evolving field.
RTI is one of the Metabolomics Reference Standards Synthesis Centers.
Understand what databases include, what they don’t include, and their assumptions
Just because a database says something does not mean it is right. Read the evidence.
Databases are biased.
Databases are incomplete.
Databases have lots of data.
Understand data before you use it.
Databases are useful!
Issues in the Annotation of Genes, proteins, metabolites
Gene Symbol | p-value | fc 50/21 | Gene Ontology Biological Process | Gene Ontology Cellular Component | Pathway
Aco2 | 0.746656 | 0.955755 | --- | --- | Krebs-TCA_Cycle // GenMAPP
Pdk2 | 0.967577 | 1.005459 | 6086 // acetyl-CoA biosynthesis from pyruvate | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Pdk2 | 0.823635 | 1.02781 | 6086 // acetyl-CoA biosynthesis from pyruvate | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Pdha2 | 0.368075 | 1.403263 | 6096 // glycolysis | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Idh1 | 0.710704 | 0.994378 | 6099 // tricarboxylic acid cycle | 5829 // cytosol | ---
Acly | 0.367315 | 0.982691 | 6099 // tricarboxylic acid cycle | 5622 // intracellular | Fatty_Acid_Synthesis // GenMAPP
Aco2 | 1.22E-06 | 0.561041 | --- | --- | Krebs-TCA_Cycle // GenMAPP
Fh1 | 6.76E-06 | 0.690515 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Atp5g3 | 1.53E-06 | 0.754735 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | ---
Suclg1 | 8.87E-07 | 0.694384 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Mdh1 | 5.92E-09 | 0.519311 | 6099 // tricarboxylic acid cycle // | --- | Krebs-TCA_Cycle // GenMAPP
Mor1 | 4.24E-07 | 0.617645 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Idh1 | 2.36E-06 | 0.677013 | 6099 // tricarboxylic acid cycle // | 5829 // cytosol // | ---
Idh3g | 2.19E-06 | 0.709971 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
Dlst | 2.49E-07 | 0.688339 | --- | --- | ---
Sdhd | 5.13E-07 | 0.583485 | 6121 // mitochondrial electron transport, succinate to ubiquinone | 5749 // respiratory chain complex II (sensu Eukaryota) | Krebs-TCA_Cycle // GenMAPP
Sdhc | 1.82E-06 | 0.64108 | --- | --- | ---
RGD:735073 | 2.13E-07 | 0.570307 | --- | 9352 // dihydrolipoyl dehydrogenase complex | ---
Cs | 1.56E-07 | 0.560436 | --- | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
RGD:621624 | 1E-06 | 0.486736 | 6099 // tricarboxylic acid cycle // | 5829 // cytosol | ---
Idh3B | 2.57E-07 | 0.694389 | --- | --- | Krebs-TCA_Cycle // GenMAPP
Mdh1 | 1.08E-05 | 0.496911 | 6099 // tricarboxylic acid cycle // | --- | Krebs-TCA_Cycle // GenMAPP
Pc | 1.91E-05 | 0.468765 | 6094 // gluconeogenesis // | 5739 // mitochondrion | Krebs-TCA_Cycle // GenMAPP
RGD:708561 | 0.004002 | 0.76777 | --- | 5913 // cell-cell adherens junction | Krebs-TCA_Cycle // GenMAPP
RGD:708561 | 0.03978 | 0.686511 | --- | 5913 // cell-cell adherens junction | Krebs-TCA_Cycle // GenMAPP
Dlat | 4.76E-06 | 0.435534 | 6086 // acetyl-CoA biosynthesis from pyruvate // inferred from electronic annotation /// 6096 // glycolysis // inferred from electronic annotation /// 8152 // metabolism // inferred from electronic annotation | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Sdhd | 1.3E-06 | 0.64335 | 6121 // mitochondrial electron transport, succinate to ubiquinone // inferred from sequence or structural similarity | 5749 // respiratory chain complex II (sensu Eukaryota) // inferred from sequence or structural similarity | Krebs-TCA_Cycle // GenMAPP
Sdha | 7.85E-06 | 0.730667 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Idh3a | 0.000449 | 0.690147 | 6099 // tricarboxylic acid cycle // | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Pdk4 | 0.044616 | 1.700116 | 6086 // acetyl-CoA biosynthesis from pyruvate | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Cs | 1.36E-06 | 0.592128 | --- | 5739 // mitochondrion // | Krebs-TCA_Cycle // GenMAPP
Acly | 0.000227 | 0.554459 | 6085 // acetyl-CoA biosynthesis | 5622 // intracellular // | Fatty_Acid_Synthesis // GenMAPP
Annotation is inconsistent across sources
Issues with pathway data
Share Your Data
Use shared data!
Practice compendium research – to allow others to replicate your work
Many high-profile omic studies are not even technically reproducible.
Overshare your data and show your work.
Use metabolomics databases
Limited in the literature so far. Some work on tissue and species metabolomes.
Summary
Design your experiment well.
Conduct your experiment well.
Control for non-biological sources of error.
Know what is good and bad quality data at each stage, including metabolite, image, data, and annotation.
If you are aware of these issues and control for them, highly powerful and reproducible metabolite experimentation is possible.
Else you get garbage.
Share your data and use shared data.
References
The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray based predictive models. Nat Biotechnol. 2010 Aug;28(8):827–838.
Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65.
Baggerly K. "Disclose all data in publications." Nature. 2010 Sep 23;467(7314):401. PMID: 20864982.
Repeatability of published microarray gene expression analyses. Nat Genet. 2009 Feb;41(2):149-55.
A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition. 2003 Nov-Dec;19(11-12):997-1000.
39 Steps. From Drug Discov Today. 2005 Sep 1;10(17):1175-82.
RTI Regional Comprehensive Metabolomics Resource Core
(RTI RCMRC)
Susan Sumner, PhD
Director RTI RCMRC
Discovery Sciences
Proteomics and Metabolomics Programs
Contact Information for the RTI RCMRC
Susan C.J. Sumner, PhD
Director RTI RCMRC
Senior Scientist nanoSafety
RTI International
Discovery Sciences
3040 Cornwallis Drive
Research Triangle Park
North Carolina 27709
919-541-7479 (office)
919-622-4456 (cell)
Jason P. Burgess, PhD
Program Coordinator, RTI RCMRC
Associate Director, Discovery Sciences
RTI International
3040 Cornwallis Drive
Research Triangle Park
North Carolina 27709
919-541-6700 (office)
MS and NMR Instruments at RTI and DHMRI
Instrument | RTI | DHMRI
Mass Spectrometers (38) | |
LC-MS | 13 | 6
GC-MS | 4 | 3
GC x GC-TOF-MS | 1 | 1
ICP-MS | 6 | 1
MALDI ToF/ToF | 2 | 1
NMR (6) | 2 | 4
Some RTI Metabolomics Applications and Pilots
Experience with adolescent and adult human subject research, animal model and cell based research, e.g.,
– Apoptosis - cells
– Drug induced liver injury - animal models
– In utero exposure to chemicals and fetal imprinting - animal models
– Dietary exposure and imprinting - animal models
– NAFLD - pediatric obesity; microbiome
– Weight loss - pediatric obesity
– Preterm delivery - human subjects
– Response to vaccine - human subjects
– Nicotine withdrawal - human subjects
– Colon cancer - human subjects
Pilot and Feasibility Studies
The aim of the pilot and feasibility program is to foster collaborations and promote the use of metabolomics.
Studies will be selected through an application process.
– Application involves abstract, description of samples available (matrix type, volume, type and duration of storage, sample processing, freeze thaws, etc.), description of phenotypes, and plan for subsequent grant/contract submissions for metabolomics analysis beyond initial pilot study.
Applications may also include technology development.
Applicants must agree to deposit data in DRCC, coauthor publications, and submit joint grant/contract proposals.
Deadlines being defined