Thomas Hankemeier, Amy Harms
Netherlands Metabolomics Centre, Biomedical Metabolomics Facility Leiden
Leiden Academic Centre for Drug Research, Leiden University, The Netherlands
Data Stewardship and integration of Biomedical OMICS data
Metabolomics workflow (figure): biological question → experimental design → sampling (protocol) → sample preparation → data acquisition → data pre-processing → data analysis → biological interpretation. Intermediate products: samples → raw data → list of peaks/biomolecules (identification) → relevant biomolecules/connectivities & models.
Biomedical Metabolomics Facility Leiden
• Robust & validated protocols; quality system & trained personnel
• > 15,000 samples/year
• Various types of samples: blood, urine, biopsies, cells, etc.
• Large number of clinical/preclinical studies with academia, clinics and industry (cardiovascular and metabolic diseases, diabetes, infectious diseases, CNS diseases and nutritional studies)
• Access for academic & clinical researchers & industry (international pharma & nutrition)
Figure: the metabolomics workflow, from biological question to biological interpretation, with the Metabolomics Facility providing advice along the way. www.bmfl.nl
Validated metabolomics platforms
Biology-driven global profiling covering oxidative stress, metabolic stress and inflammatory stress
More details: www.bmfl.nl
(platform: number of metabolites, technique)
Medium polar (central carbon/energy metabolism): > 200, GC-MS
Apolar metabolites: > 400, LC-MS
Apolar lipids: > 800, LC-MS
Polar lipids: > 150, LC-MS
Biogenic amines: > 90, LC-MS/MS
Endocannabinoids: > 40, LC-MS/MS
Oxylipins (pro/anti-inflammatory lipid mediators): > 120, LC-MS/MS
Oxidative/nitrosative stress: > 60, LC-MS/MS
In total: > 2500 metabolites, > 1000 identified, > 500 quantitative; variation < 10%!
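The "variation < 10%" criterion is usually expressed as the relative standard deviation (RSD, or coefficient of variation) of repeated measurements of the same QC sample. The sketch below assumes QC peak areas per metabolite are already available as lists; the metabolite names and numbers are hypothetical, and only the 10% threshold comes from the slide.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD divided by mean), in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, RSD is undefined")
    return 100.0 * stdev(values) / m


def qc_rsd_check(qc_areas, max_rsd=10.0):
    """Map each metabolite to True if the RSD of its QC peak areas is below max_rsd percent."""
    return {name: rsd_percent(areas) < max_rsd for name, areas in qc_areas.items()}


# Hypothetical QC peak areas (four QC injections per metabolite)
qc = {
    "metabolite_A": [100.0, 102.0, 98.0, 101.0],  # RSD about 1.7%
    "metabolite_B": [100.0, 130.0, 80.0, 115.0],  # RSD about 20%
}
print(qc_rsd_check(qc))  # → {'metabolite_A': True, 'metabolite_B': False}
```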
Global profiling of lipids using RP-UPLC-TOF MS
Figure: human plasma lipid profiles (low-energy trace, +ve and -ve ESI) acquired on a Waters Synapt qTOF-MS and an Agilent qTOF MS, with annotated regions for FA; lyso-GPCho, lyso-GPEtn; PG, PI, PSer, PE; GPCho, SM, GPGro, GPEtn; DG, TG & ChoE. Lipid classes labelled: bile acids, FFA, LPC, LPE, PI, PE, PG, PC, SM, DG, TG, CE, CER.
Castro-Perez et al., J. Proteome Res., 2010
Data processing: combining targeted & untargeted
• ‘Pseudo-targeted’, using a target list
  • Identified
  • Quantitative (if reference compounds are available)
• ‘Known unknowns’
• ‘Unknowns’
• MZextract (Van der Kloet/NMC, new!)
Example: lipid profile
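The pseudo-targeted step can be pictured as matching detected peaks against a target list. A minimal sketch, assuming each target has an expected m/z and retention time and that a peak matches when it lies within a ppm tolerance and a retention-time window. The 10 ppm and 0.1 min tolerances, the class names and the example values are hypothetical, and this is not the MZextract algorithm. Peaks that match no target are left for the ‘known unknowns’/‘unknowns’ route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    mz: float  # expected m/z
    rt: float  # expected retention time, minutes


@dataclass(frozen=True)
class Peak:
    mz: float
    rt: float
    area: float


def ppm_error(observed, expected):
    """Mass error of an observed m/z relative to the expected m/z, in ppm."""
    return (observed - expected) / expected * 1e6


def match_targets(peaks, targets, ppm_tol=10.0, rt_tol=0.1):
    """Assign each target the most intense peak inside its m/z and RT windows.

    Returns (matched, unmatched): matched maps target name -> Peak, and
    unmatched lists the peaks not assigned to any target.
    """
    matched = {}
    assigned = set()
    for target in targets:
        candidates = [
            (i, p) for i, p in enumerate(peaks)
            if abs(ppm_error(p.mz, target.mz)) <= ppm_tol and abs(p.rt - target.rt) <= rt_tol
        ]
        if candidates:
            i, best = max(candidates, key=lambda c: c[1].area)
            matched[target.name] = best
            assigned.add(i)
    unmatched = [p for i, p in enumerate(peaks) if i not in assigned]
    return matched, unmatched


# Hypothetical target list and detected peaks; the second peak is 40 ppm off
targets = [Target("compound_X", mz=500.000, rt=5.00)]
peaks = [Peak(500.003, 5.05, 1000.0), Peak(500.020, 5.00, 2000.0), Peak(300.000, 2.00, 500.0)]
matched, unmatched = match_targets(peaks, targets)
print(matched["compound_X"].area, len(unmatched))  # → 1000.0 2
```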
Figure: TG(52:1) response in QC samples and regular samples with the conventional untargeted approach.
Comparison of quantification: targeted vs MZextract
Feature set: several m/z values of one analyte
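A feature set groups several m/z signals (for example different adducts or isotopes) that belong to one analyte. A minimal sketch, assuming the grouping of features into analytes is already known: the areas are summed per sample and the result is compared with targeted quantification through a Pearson correlation. Feature IDs, analyte names and numbers are hypothetical, and summing is only one possible way to combine a feature set; MZextract may do this differently.

```python
from math import sqrt


def aggregate_feature_sets(feature_areas, feature_to_analyte):
    """Sum, per sample, the areas of all m/z features assigned to the same analyte.

    feature_areas: {feature_id: [area in sample 1, area in sample 2, ...]}
    feature_to_analyte: {feature_id: analyte name}
    """
    totals = {}
    for fid, areas in feature_areas.items():
        analyte = feature_to_analyte[fid]
        if analyte in totals:
            totals[analyte] = [a + b for a, b in zip(totals[analyte], areas)]
        else:
            totals[analyte] = list(areas)
    return totals


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical: two m/z features of analyte_A measured in three samples
features = {"f1": [10.0, 20.0, 30.0], "f2": [1.0, 2.0, 3.0]}
totals = aggregate_feature_sets(features, {"f1": "analyte_A", "f2": "analyte_A"})
targeted = [1.0, 2.0, 3.0]  # hypothetical targeted concentrations, same samples
print(totals["analyte_A"], pearson_r(totals["analyte_A"], targeted))  # → [11.0, 22.0, 33.0] 1.0
```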
Good Practices
Sample Randomization:
• It is important to randomize case/control and treated/untreated samples to avoid artifacts introduced by instrumental drift
• The experimental design dictates the randomization strategy
• Within-batch variation is lower than between-batch variation, so the batch design should block related samples to minimize variation
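One way to combine randomization with blocking is sketched below, assuming related samples share a key (here a hypothetical `subject` field, e.g. several time points of one person). Whole blocks are shuffled and packed into batches so that related samples stay in the same batch, and the injection order within each batch is then randomized against drift. The batch size, field names and fixed seed are assumptions; a real design might also stratify case/control across batches.

```python
import random
from collections import defaultdict


def blocked_run_order(samples, block_key, batch_size, seed=None):
    """Return batches (lists of sample IDs) in randomized injection order.

    samples: dicts with an 'id' and a block_key field. Samples sharing a
    block_key value stay together in one batch; block order and the order
    within each batch are randomized.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in samples:
        blocks[s[block_key]].append(s["id"])
    block_list = list(blocks.values())
    rng.shuffle(block_list)  # randomize which blocks end up in which batch

    batches, current = [], []
    for block in block_list:
        if len(block) > batch_size:
            raise ValueError("a block is larger than the batch size")
        if len(current) + len(block) > batch_size:
            batches.append(current)
            current = []
        current.extend(block)
    if current:
        batches.append(current)
    for batch in batches:
        rng.shuffle(batch)  # randomize injection order within the batch
    return batches


# Hypothetical study: 4 subjects, each sampled before and after treatment
samples = [{"id": f"S{subj}_{tp}", "subject": subj} for subj in range(4) for tp in ("pre", "post")]
for n, batch in enumerate(blocked_run_order(samples, "subject", batch_size=4, seed=7), start=1):
    print(f"batch {n}: {batch}")
```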
Blinded analysis:
• For important clinical studies, the person running the samples should be blinded to the sample identity
• The lab is unblinded for data analysis only after the data have been deposited in a database or with a collaborator
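Blinding can be supported in software by replacing sample identities with random codes before the run list reaches the analyst, with the key held by someone else. A minimal sketch; the "B" plus eight hex digits code format and the `deposited` flag, which stands in for "data deposited in a database or with a collaborator", are hypothetical.

```python
import secrets


def blind_samples(sample_ids):
    """Replace sample IDs by random codes.

    Returns (codes, key): codes follows the order of sample_ids, and key maps
    code -> original ID. The key should be kept away from the analyst.
    """
    key = {}
    codes = []
    for sid in sample_ids:
        code = "B" + secrets.token_hex(4).upper()
        while code in key:  # regenerate in the unlikely case of a collision
            code = "B" + secrets.token_hex(4).upper()
        key[code] = sid
        codes.append(code)
    return codes, key


def unblind(results, key, deposited):
    """Map coded results back to sample IDs, but only after data deposition."""
    if not deposited:
        raise PermissionError("deposit the data before unblinding")
    return {key[code]: value for code, value in results.items()}


codes, key = blind_samples(["case_01", "control_01"])
results = {code: 1.0 for code in codes}  # hypothetical measured values
print(unblind(results, key, deposited=True))  # → {'case_01': 1.0, 'control_01': 1.0}
```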
Quality Control Tools
During routine analysis, calibration lines, blanks and QCs are prepared together with the samples. A statistical tool has been developed to apply corrections to the data and to output quality parameters.
For data analysis, all peak areas are first corrected for the internal standard response and then QC-corrected. This tool corrects for instrumental and experimental drift within and between batches. QC samples (pooled study samples) bracket groups of ~15 study samples within a batch.
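The two-step correction can be sketched as follows, assuming peak areas are in injection order and QC injections are flagged. Each area is first divided by the internal-standard area of the same injection; within a batch, the QC trend is then estimated by linear interpolation between the bracketing QCs and each value is rescaled to a reference QC level. Passing the same reference (e.g. the median of all QCs) to every batch also aligns the batches with each other. Linear interpolation is an assumption made for illustration; the facility's statistical tool is not described in detail here, and other implementations use smoothers such as LOESS.

```python
from bisect import bisect_left
from statistics import median


def is_correct(areas, is_areas):
    """Divide each analyte peak area by the internal-standard area of the same injection."""
    return [a / s for a, s in zip(areas, is_areas)]


def qc_drift_correct(values, is_qc, ref=None):
    """Correct drift within one batch by linear interpolation between QC injections.

    values: IS-corrected responses in injection order.
    is_qc: booleans marking QC (pooled study sample) injections.
    ref: target QC level; defaults to the median QC value of this batch.
         Use one shared ref for all batches to also correct between batches.
    """
    qc_pos = [i for i, q in enumerate(is_qc) if q]
    if len(qc_pos) < 2:
        raise ValueError("need at least two QC injections per batch")
    qc_val = [values[i] for i in qc_pos]
    if ref is None:
        ref = median(qc_val)
    corrected = []
    for i, v in enumerate(values):
        k = bisect_left(qc_pos, i)
        if k == 0:  # at or before the first QC
            trend = qc_val[0]
        elif k == len(qc_pos):  # after the last QC
            trend = qc_val[-1]
        elif qc_pos[k] == i:  # this injection is itself a QC
            trend = qc_val[k]
        else:  # between QC k-1 and QC k
            p0, p1 = qc_pos[k - 1], qc_pos[k]
            trend = qc_val[k - 1] + (qc_val[k] - qc_val[k - 1]) * (i - p0) / (p1 - p0)
        corrected.append(v * ref / trend)
    return corrected


# Hypothetical batch: QC, two study samples, QC, with the signal drifting downward
ratios = is_correct([200.0, 110.0, 120.0, 160.0], [2.0, 2.0, 2.0, 2.0])
print(qc_drift_correct(ratios, [True, False, False, True]))  # both QCs become 90.0
```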
Assuring Traceability
Make sure that the results that we deliver can be proven and explained not only now but also 5 years from now.
Proper data management should make it possible to build new research on existing research.
An easy-to-use exchange format that uses controlled vocabularies/ontologies gives certainty about what was measured and how it was measured.
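Part of such an exchange format can be checked automatically: does every reported metabolite carry a well-formed external identifier from an accepted scheme, and is its unit taken from a controlled list? A minimal sketch; the record layout, the choice of ChEBI and InChIKey as accepted schemes and the unit list are assumptions, and the identifiers in the example are format-only placeholders, not real compounds.

```python
import re

# Assumed accepted identifier schemes and their syntax
ID_PATTERNS = {
    "chebi": re.compile(r"^CHEBI:\d+$"),
    "inchikey": re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$"),
}
# Assumed controlled vocabulary for reported units
ALLOWED_UNITS = {"umol/L", "nmol/L", "area ratio"}


def validate_record(record):
    """Return a list of problems for one reported metabolite (empty list if none)."""
    name = record.get("name", "?")
    problems = []
    identifiers = record.get("identifiers", {})
    if not identifiers:
        problems.append(f"{name}: no external identifier")
    for scheme, value in identifiers.items():
        pattern = ID_PATTERNS.get(scheme)
        if pattern is None:
            problems.append(f"{name}: unknown identifier scheme '{scheme}'")
        elif not pattern.match(value):
            problems.append(f"{name}: malformed {scheme} identifier '{value}'")
    if record.get("unit") not in ALLOWED_UNITS:
        problems.append(f"{name}: unit '{record.get('unit')}' not in controlled vocabulary")
    return problems


placeholder_key = "A" * 14 + "-" + "B" * 10 + "-N"  # syntactically valid placeholder
good = {
    "name": "metabolite_A",
    "identifiers": {"chebi": "CHEBI:0", "inchikey": placeholder_key},
    "unit": "umol/L",
}
bad = {"name": "metabolite_B", "identifiers": {"chebi": "12345"}, "unit": "uM"}
print(validate_record(good))  # → []
print(validate_record(bad))
```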
Researchers need to share the information required to reproduce the results (https://biosharing.org/pages/about/). This means sharing:
• SOPs
• Scripts/software used to (pre-)process the data
• Decisions made, for example why data were discarded
• Etc.
Our efforts to assure traceability
Experimental design: Ensure reproducible data that can be shared and linked to other resources (proteomics, transcriptomics, and genomics)
Traceable data: Starting with an electronic lab notebook (ELN) coupled to our in-house developed LIMS
Interpretable data: Use external identifiers and controlled vocabularies to present/report data
Data analysis: Freeze the data and scripts/algorithms together with their output; rerunning the data analysis pipeline on the same data should produce the same output. Scripts and software should be open (accessible) so that others can understand what happens to the data.
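Freezing data, scripts and output can be made checkable with cryptographic checksums: record a SHA-256 digest for every file when the analysis is frozen, and after a rerun confirm that every file still has the same digest. A minimal sketch using only the Python standard library; the JSON manifest layout and the file names in the usage comment are assumptions. A byte-identical rerun also requires deterministic output (for example fixed random seeds and stable ordering).

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def freeze(paths, manifest_path):
    """Record a SHA-256 checksum for every input, script and output file."""
    manifest = {str(p): sha256_of(p) for p in paths}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify(manifest_path):
    """Return the files that are missing or whose content differs from the frozen manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [p for p, digest in manifest.items()
            if not Path(p).exists() or sha256_of(p) != digest]


# Usage (hypothetical file names):
#   freeze(["data.csv", "analysis.py", "results.csv"], "manifest.json")
#   ... rerun analysis.py, which rewrites results.csv ...
#   verify("manifest.json")  # an empty list means the rerun reproduced the output
```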
We are working hard to deposit our studies in MetaboLights (1 live, 3 under curation and 4 more in preparation).
Leiden leads MetabolomeXchange, an international data aggregation and notification service for metabolomics.
What we tried in order to make data available: the data support platform (NMC)
• Easily access and analyze experimental metabolomics data with the data support platform (DSP)
• A metabolomics data warehouse
• A data processing infrastructure
MetabolomeXchange
International data aggregation and notification service for metabolomics, set up by Leiden
Easy to search for and subscribe to publicly available data sets
PhenoMeNal Consortium
• H2020 Societal Challenge in Health, Demographic Change and Well-being
• 3 years
• 13 partners
• 8 million euros
• 830 person-months
E-infrastructure for the processing, analysis and information mining of the massive amounts of medical molecular phenotyping and genotyping data that will be generated by metabolomics applications now entering research and the clinic.
The Aim
Figure: workflows for biomedical data & metadata, covering data collection, QC, data pre-processing and statistical analysis.
DTL / FAIR DATA
Findable, Accessible, Interoperable, and Re-usable. Leiden University, a partner of DTL (Dutch Techcentre for Life Sciences), supports the idea and the development of the international FAIR Data principles.
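Whether a data set's metadata is complete enough to share can be checked before deposition. A minimal sketch, assuming a flat metadata record; the field names and their grouping under the four FAIR principles are illustrative choices, not part of the FAIR principles themselves or of any repository's schema.

```python
# Hypothetical checklist: metadata fields required before sharing, per FAIR principle
FAIR_CHECKLIST = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["file_format", "vocabularies"],
    "Reusable": ["license", "protocols"],
}


def missing_metadata(record):
    """Return {principle: [missing fields]} for fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_CHECKLIST.items():
        missing = [field for field in fields if not record.get(field)]
        if missing:
            gaps[principle] = missing
    return gaps


# Hypothetical data set record that lacks license and protocol information
record = {
    "identifier": "dataset-0001",
    "title": "Plasma lipid profiles",
    "keywords": ["lipidomics", "plasma"],
    "access_url": "https://example.org/dataset-0001",
    "file_format": "mzML",
    "vocabularies": ["ChEBI"],
}
print(missing_metadata(record))  # → {'Reusable': ['license', 'protocols']}
```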
Linking with vendors?
• Discussions are ongoing between Leiden and vendors to see how the experience and expertise gained through the NMC can be used to enhance their workflows.
• Leiden has been participating in EU-funded grants to improve the infrastructure for metabolomics communication.
• Vendors and the community are both developing software tools to integrate metabolomics tools into a more systems-biology approach, and we should work together.
Summary & discussion points
• Workflow and data management are crucial, and differ for each facility and field
• Share good practices
• For Metabolomics:
• Absolute concentrations are key!?
• Benefits for validation and replication!!
• Some main facilities for high throughput!?
• Benefits can be achieved in omics integration; NL/DTL to lead by example?
• Sharing metadata is often a bottleneck
www.bmfl.nl and www.metabolomicscentre.nl
Ruud Berger: Biochemical interpretation
Amy Harms: Leader, Metabolomics Facility
Jan van der Greef: Systems approaches & SDPPM
Ronan Fleming: Metabolic modelling (guest)
Paul Vulto: Organ-on-a-chip & microfluidics (guest)
Slavik Koval: Data analysis
Peter Lindenburg: Metabolomics technology
Rawi Ramautar: Metabolomics technology
Acknowledgement
PhD students: Amar Oedit, Vasu Kantae, Bas Trietsch, Junzeng Fu, Robert-Jan Raterink, Can Gulersonmez, Min He, Nelus Schoeman, Mengmeng Sun, Vincent van Duinen, Rosilene Rossetto-Burgos, Abidemi Junaid, Wei Zhang, Renate Buijink
Post docs: Oskar Gonzalez, Michel van Weeghel, Anne-Charlotte Dubbelman, Petri Kylli, Estefania Moreno-Gordaliza, Marek Noga
Technicians: Gerwin Spijksma, Faisa Guled, Anthanasis Giannitsis (clean room), Sabine Bos, Lieke Lamont-de Vries, Hyung Elfrink, Belèn Gonzàlez Amoros, Marian Martinez Zapata, Sandra Pous-Torres, Monique Nieman
Scientific Programmer: Michael van Vliet
Mechanical Workshop: Raphael Zwier