Alexander Fillbrunn, Julianus Pfeuffer, Jeanette Prinz
The Center for Integrative Bioinformatics (CIBI) and KNIME
Analysis of Mass Spectrometry and Sequence Data with KNIME
Schedule
• About us
• Generic KNIME Nodes
• Mass-spectrometry data analysis in KNIME with OpenMS
– Introduction and theory of label-free quantification
– Demo of a label-free quantification workflow
• Analysis of high-throughput sequencing data with KNIME and SeqAn
– Introduction and theory of variant calling
– Demo of a variant calling workflow
German Network for Bioinformatics Infrastructure
de.NBI Mission Statement
• The 'German Network for Bioinformatics Infrastructure' provides comprehensive first-class bioinformatics services to users in life sciences research, industry and medicine. The de.NBI program coordinates bioinformatics training and education and the cooperation of the German bioinformatics community with international bioinformatics network structures.
Center for Integrative Bioinformatics (CIBI)
• … provides cutting-edge and integrative tools for proteomics, metabolomics, NGS and image data analysis as well as a workflow engine to integrate tools into coherent solutions for reproducible analysis of large-scale biological data.
Generic KNIME Nodes
• Wrapping of command line tools (OpenMS, SeqAn,…) in KNIME via GenericKNIMENodes (GKN)
• Every OpenMS and SeqAn tool writes its Common Tool Description (CTD) via its command line parser
• GKN generates Java source code (static) or an XML representation (dynamic) for nodes to show up in KNIME
• Wraps C++ (and other) executables and provides additional file handling nodes
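The wrapping idea can be illustrated with a small sketch: a machine-readable description of a tool's parameters is enough to assemble a command line without knowing the tool itself. The XML layout below is a deliberately simplified, hypothetical stand-in and not the real CTD schema; the tool name `ExampleTool`, its parameters, and the `build_command` helper are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified tool description (NOT the real CTD schema).
TOOL_XML = """
<tool name="ExampleTool">
  <param name="in" type="input-file" value="" />
  <param name="out" type="output-file" value="" />
  <param name="threshold" type="float" value="0.5" />
</tool>
"""


def build_command(tool_xml, settings):
    """Combine a tool description with user settings into an argument list."""
    tool = ET.fromstring(tool_xml)
    command = [tool.get("name")]
    for param in tool.findall("param"):
        name = param.get("name")
        # Fall back to the default value declared in the description.
        value = settings.get(name, param.get("value"))
        if value == "":
            raise ValueError(f"missing value for parameter '{name}'")
        command += [f"-{name}", str(value)]
    return command


command = build_command(TOOL_XML, {"in": "run1.mzML", "out": "run1.featureXML"})
print(command)
```

A workflow node generated from such a description only has to collect the settings from its dialog and ports; the executable itself stays untouched.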
Julianus Pfeuffer, Alexander Fillbrunn
Mass-spectrometry data analysis in KNIME
OpenMS
• OpenMS – an open-source C++ framework for computational mass spectrometry
• Jointly developed at ETH Zürich, FU Berlin, University of Tübingen
• Open source: BSD 3-clause license
• Portable: available on Windows, OSX, Linux
• Vendor-independent: supports all standard formats, and vendor formats through ProteoWizard
• OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools
– Building blocks: One application for each analysis step
– All applications share identical user interfaces
– Uses PSI standard formats
• Can be integrated in various workflow systems
– Galaxy
– WS-PGRADE/gUSE
– KNIME
Kohlbacher et al., Bioinformatics (2007), 23:e191
Installation of the OpenMS plugin
• Community-contributions update site (stable & trunk)
– Bioinformatics & NGS
• Provides > 180 OpenMS TOPP tools as Community nodes
– SILAC, iTRAQ, TMT, label-free, SWATH, SIP, …
– Search engines: OMSSA, MASCOT, X!TANDEM, MSGFplus, …
– Protein inference: FIDO
A Mass Spectrum
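Data Flow in Shotgun Proteomics
[Figure: pipeline from Sample via HPLC/MS to Raw Data, via Signal Processing to Peak Data, via Data Reduction to Maps, via Identification to Annotated Maps, and via Differential Quantification to Differentially Expressed Proteins. Data volume shrinks along the way: 100 GB, 1 GB, 50 MB, 50 MB, 50 kB.]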
Quantification Strategies
Quantitative Proteomics
• Relative Quantification
– Labeled, in vivo: 14N/15N, SILAC
– Labeled, in vitro: iTRAQ, TMT, 16O/18O
– Label-Free: Spectral Counting, MRM, Feature-Based
• Absolute Quantification
– AQUA, SISCAPA
After: Lau et al., Proteomics, 2007, 7, 2787
Quantitative Data – LC-MS Maps
• Spectra are acquired at rates of up to dozens per second
• Stacking the spectra yields maps
• Resolution:
– Up to millions of points per spectrum
– Tens of thousands of spectra per LC run
• Huge 2D datasets of up to hundreds of GB per sample
• MS intensity follows the chromatographic concentration
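As a minimal sketch of how stacked spectra form a map, the following toy example flattens a few spectra into (rt, m/z, intensity) points and extracts the intensity trace of one m/z value over retention time. The data, the tolerance, and the function names are made up for illustration; real maps are far larger and are not handled as Python lists.

```python
# Toy data: each spectrum is (retention time in s, list of (m/z, intensity) peaks).
spectra = [
    (10.0, [(500.25, 10.0), (620.10, 5.0)]),
    (11.0, [(500.26, 80.0), (620.11, 6.0)]),
    (12.0, [(500.25, 150.0)]),
    (13.0, [(500.24, 70.0), (620.10, 4.0)]),
    (14.0, [(500.25, 12.0)]),
]


def stack_spectra(spectra):
    """Flatten the spectra into a 2D map of (rt, mz, intensity) points."""
    return [(rt, mz, inten) for rt, peaks in spectra for mz, inten in peaks]


def extract_chromatogram(lcms_map, target_mz, tol=0.02):
    """Sum the intensity per retention time within target_mz +/- tol."""
    trace = {}
    for rt, mz, inten in lcms_map:
        if abs(mz - target_mz) <= tol:
            trace[rt] = trace.get(rt, 0.0) + inten
    return sorted(trace.items())


lcms_map = stack_spectra(spectra)
xic = extract_chromatogram(lcms_map, 500.25)
apex_rt = max(xic, key=lambda point: point[1])[0]
print(xic, apex_rt)
```

The extracted trace rises and falls with the elution of the analyte, which is the property that quantification relies on.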
LC-MS Data (Map)
Quantification (15 nmol/µl, 3x over-expressed, …)
Label-Free Quantification (LFQ)
• Label-free quantification is probably the most natural way of quantifying
– No labeling required, removing further sources of error, no restriction on sample generation, cheap
– Data on different samples acquired in different measurements – higher reproducibility needed
– Manual analysis difficult
– Scales very well with the number of samples, basically no limit, no difference in the analysis between 2 or 100 samples
LFQ – Analysis Strategy
1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features (example: GDAFFGMSCK)
5. Quantify (example: 1.0 : 1.2 : 0.5)
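The final quantification step can be sketched as a ratio of linked feature intensities relative to one map. The intensities below are invented so that they reproduce the ratio shown on the slide, and `abundance_ratios` is a hypothetical helper, not an OpenMS function.

```python
def abundance_ratios(intensities, reference=0):
    """Express each map's feature intensity relative to the reference map."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return [round(value / ref, 2) for value in intensities]


# Invented intensities of one linked feature (e.g. one peptide) in three maps.
feature_intensities = [2.0e6, 2.4e6, 1.0e6]
ratios = abundance_ratios(feature_intensities)
print(" : ".join(f"{r:.1f}" for r in ratios))
```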
Feature Finding
• Identify all peaks belonging to one peptide
• Key idea:
– Identify suspicious regions (e.g. highest peaks)
– Fit a model to that region and identify peaks explained by it
Feature Finding
• Extension: collect all data points close to the seed
• Refinement: remove peaks that are not consistent with the model
• Fit an optimal model for the reduced set of peaks
• Iterate this until no further improvement can be achieved
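The seed, extend, refine, and refit loop can be sketched in one dimension (intensity over retention time). This is a simplified illustration under stated assumptions, not the feature finder implemented in OpenMS: the model is a Gaussian estimated from intensity-weighted moments, refinement drops the single worst-fitting point per round, and the window and error threshold are arbitrary.

```python
import math


def fit_gaussian(points):
    """Estimate centre, width and height from intensity-weighted moments."""
    total = sum(i for _, i in points)
    centre = sum(rt * i for rt, i in points) / total
    variance = sum(i * (rt - centre) ** 2 for rt, i in points) / total
    sigma = max(math.sqrt(variance), 1e-6)
    height = max(i for _, i in points)
    return centre, sigma, height


def predict(rt, centre, sigma, height):
    """Model intensity at a given retention time."""
    return height * math.exp(-0.5 * ((rt - centre) / sigma) ** 2)


def find_feature(trace, window=5.0, max_rel_error=0.3, max_rounds=20):
    # Seeding: start at the most intense point.
    seed_rt, _ = max(trace, key=lambda p: p[1])
    # Extension: collect all points close to the seed.
    region = [p for p in trace if abs(p[0] - seed_rt) <= window]
    params = fit_gaussian(region)
    for _ in range(max_rounds):
        # Refinement: find the point that the model explains worst.
        worst = max(region, key=lambda p: abs(p[1] - predict(p[0], *params)))
        error = abs(worst[1] - predict(worst[0], *params))
        if error <= max_rel_error * params[2] or len(region) <= 3:
            break  # no further improvement possible
        region = [p for p in region if p != worst]
        params = fit_gaussian(region)  # refit on the reduced set
    return {"rt": params[0], "sigma": params[1], "points": region}


# Synthetic elution profile centred at rt = 20 with one interfering peak at rt = 24.
trace = [(float(rt), 100.0 * math.exp(-0.5 * ((rt - 20) / 1.5) ** 2))
         for rt in range(15, 26)]
trace[9] = (24.0, 60.0)
feature = find_feature(trace)
print(round(feature["rt"], 2), len(feature["points"]))
```

In this toy run the interfering point is discarded in the first round, after which the refitted model explains all remaining points and the loop stops.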
Feature-Based Alignment
• LC-MS maps can contain millions of peaks
• Retention time of peptides (or metabolites) can shift between experiments
• In label-free quantification, maps thus need to be aligned in order to identify corresponding features
• Alignment can be done on the raw maps (where it is usually called ‘dewarping’) or on already identified features
• The latter is simpler, as it does not require the alignment of millions of peaks, but just of tens of thousands of features
• Disadvantage: it relies on accurate feature finding
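A feature-based alignment of two maps can be sketched as follows: pair features with compatible m/z, then fit a retention time transformation to the pairs. The linear model, the tolerances, and the toy features are assumptions chosen for brevity; actual alignment tools may use different matching strategies and non-linear transformations.

```python
def match_features(ref, other, mz_tol=0.01, rt_tol=60.0):
    """Pair each feature in `other` with the closest-rt reference feature of compatible m/z."""
    pairs = []
    for rt_o, mz_o in other:
        candidates = [
            (abs(rt_r - rt_o), rt_r)
            for rt_r, mz_r in ref
            if abs(mz_r - mz_o) <= mz_tol and abs(rt_r - rt_o) <= rt_tol
        ]
        if candidates:
            pairs.append((rt_o, min(candidates)[1]))
    return pairs


def fit_linear(pairs):
    """Least-squares fit of rt_ref = slope * rt_other + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy features as (rt in s, m/z); the second map elutes slightly earlier.
reference_map = [(107.0, 400.20), (209.0, 512.30), (311.0, 633.10), (413.0, 745.45)]
other_map = [(100.0, 400.20), (200.0, 512.30), (300.0, 633.10), (400.0, 745.45)]

pairs = match_features(reference_map, other_map)
slope, intercept = fit_linear(pairs)
aligned = [(slope * rt + intercept, mz) for rt, mz in other_map]
print(slope, intercept)
```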
Feature-Based Alignment
~350,000 peaks
~ 700 features
Multiple Alignment
• Dewarp k maps onto a comparable coordinate system
• Choose one map (usually the one with the largest number of features) as reference map (here: map 2 -> T2 = 1)
[Figure: maps 1, 2, …, k (axes: rt, m/z) are transformed by T1, T2, …, Tk and combined into a consensus map]
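The reference choice and the subsequent linking into a consensus map can be sketched as below. The sketch assumes the maps have already been transformed onto the reference coordinate system, and it uses a simple greedy grouping with arbitrary tolerances; the function names and data are hypothetical.

```python
def choose_reference(maps):
    """Return the index of the map with the largest number of features."""
    return max(range(len(maps)), key=lambda k: len(maps[k]))


def link_features(aligned_maps, rt_tol=5.0, mz_tol=0.01):
    """Greedily group features whose rt and m/z agree across maps."""
    consensus = []  # each entry: {"rt", "mz", "members": {map index: intensity}}
    for k, features in enumerate(aligned_maps):
        for rt, mz, intensity in features:
            for group in consensus:
                if (k not in group["members"]
                        and abs(group["rt"] - rt) <= rt_tol
                        and abs(group["mz"] - mz) <= mz_tol):
                    group["members"][k] = intensity
                    break
            else:
                # No compatible group found: start a new consensus feature.
                consensus.append({"rt": rt, "mz": mz, "members": {k: intensity}})
    return consensus


# Toy aligned maps with features as (rt in s, m/z, intensity).
maps = [
    [(100.0, 400.20, 2.0e6), (200.0, 512.30, 1.0e6)],
    [(101.0, 400.20, 2.4e6), (201.5, 512.30, 1.1e6), (300.0, 633.10, 5.0e5)],
    [(99.0, 400.205, 1.0e6)],
]

reference_index = choose_reference(maps)
consensus = link_features(maps)
print(reference_index, len(consensus))
```

Each consensus entry then carries one intensity per map, which is the input for relative quantification.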
Peptide Identification
Sven Nahnsen
[Figure: an LC-MS/MS experiment yields experimental spectra (m/z over RT; intensity [%] over m/z) and thus fragment m/z values. Suitable peptides from a sequence database (example entry: Q9NSC5|HOME3_HUMAN Homer protein homolog 3 - Homo sapiens (Human)) yield theoretical spectra with theoretical fragment m/z values. The two are compared and the hits are scored. Example m/z values shown: 569.24, 572.33, 580.30, 581.46, 582.63, 606.32, 610.24, 616.14.]
Scored hits (rank, peptide, score):
1 QRESTATDILQK 18.77
2 EIEEDSLEGLKK 14.78
3 GIEDDLMDLIKK 12.63
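The compare-and-score idea can be sketched with singly charged b- and y-ion m/z values computed from monoisotopic residue masses, scored by a plain shared-peak count. This count is an assumption chosen for simplicity and is not the scoring function of any of the search engines named earlier; the "experimental" spectrum is simulated from one of the candidate peptides.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values of a peptide, sorted ascending."""
    masses = [MONO[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return sorted(b_ions + y_ions)


def shared_peak_count(experimental, theoretical, tol=0.02):
    """Number of theoretical fragments found in the experimental spectrum."""
    return sum(any(abs(e - t) <= tol for e in experimental) for t in theoretical)


def rank_candidates(experimental, candidates, tol=0.02):
    """Score every candidate peptide and sort the hits, best first."""
    scored = [(shared_peak_count(experimental, fragment_mz(p), tol), p)
              for p in candidates]
    return sorted(scored, reverse=True)


candidates = ["QRESTATDILQK", "EIEEDSLEGLKK", "GIEDDLMDLIKK"]
# Simulated measurement: every second fragment of the first candidate was observed.
experimental = fragment_mz("QRESTATDILQK")[::2]
ranking = rank_candidates(experimental, candidates)
print(ranking[0])
```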
© 2019 KNIME AG. All rights reserved.
Variant Calling/Annotation with SeqAn and KNIME
Jeanette Prinz, Julianus Pfeuffer, Alexander Fillbrunn, René Rahn
Next Generation Sequencing (NGS)
• DNA sequencing: process of determining the nucleic acid sequence – the order of nucleotides (A,T,G,C) in DNA
• NGS platforms perform sequencing of millions of small fragments (reads) of DNA in parallel => fast, cheap
• Bioinformatics used to map the individual reads to reference genome
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3841808/
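Mapping reads to a reference can be sketched as a toy seed-and-verify procedure: look up the first k-mer of a read in an index of the reference, then count mismatches at each candidate position. This is an illustration only and not how the SeqAn-based mappers are implemented; the sequences, `k`, and the mismatch limit are made up.

```python
def build_kmer_index(reference, k):
    """Map every k-mer of the reference to its start positions."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify; return (position, mismatches) or None."""
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


reference = "ACGTTGCATGACCGTAGGCTAACGTTAGC"
k = 4
index = build_kmer_index(reference, k)

exact_hit = map_read("CATGACCG", reference, index, k)
mismatch_hit = map_read("GACCGTTGG", reference, index, k)
no_hit = map_read("TTTTTTTT", reference, index, k)
print(exact_hit, mismatch_hit, no_hit)
```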
NGS Application Areas
Taken from:
http://www.nfcr.org/sites/default/files/images/GenomicProfiling2.jpg
http://ecx.images-amazon.com/images/I/51tztcMqIRL._SS500_.jpg
Cancer
Hereditary Diseases
Metagenomics
Agriculture
SeqAn - a Bioinformatics Resource
SeqAn Library Features
• Efficient index data structures
• Efficient algorithms
– Fully parallelized and vectorized pairwise alignment algorithms
– Search schemes for index searches
• Fast I/O
– SAM/BAM, FastA/FastQ, VCF, …
SeqAn: NGS Tools
Installation: KNIME Community Contributions -> Bioinformatics & NGS -> SeqAn NGS ToolBox
Variant Calling
https://www.ebi.ac.uk/training/online/course/human-genetic-variation-i-introduction/variant-identification-and-analysis/what-variant
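As a minimal sketch of the idea behind variant calling, mapped reads can be piled up per reference position, and a position is reported when a non-reference base dominates at sufficient depth. The simple majority rule and the thresholds are assumptions for illustration; production variant callers use statistical models, base qualities, and handle insertions and deletions.

```python
from collections import Counter


def call_variants(reference, alignments, min_depth=3, min_fraction=0.8):
    """Return (position, ref base, alt base, depth) for positions with a dominant non-reference base.

    `alignments` is a list of (0-based start position, read sequence) without gaps.
    """
    pileup = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence at this position
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants


reference = "ACGTTGCATGACCGTAGGCTAACGTTAGC"
# Toy alignments; three reads carry a T instead of the reference A at position 15.
alignments = [
    (6, "CATGACCG"),
    (9, "GACCGTTGG"),
    (11, "CCGTTGGC"),
    (13, "GTTGGCTA"),
]
variants = call_variants(reference, alignments)
print(variants)
```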
Variant Annotation
Variant Consequences
http://www.ensembl.info/2012/08/06/variation-consequences/
Variant Calling/Annotation with KNIME
Demo
Summary
• SeqAn KNIME nodes for mapping to reference genome and variant calling
• KNIME nodes for variant annotation and visualization of results
• Workflow can be easily extended and adjusted to your own needs, e.g. to include quality control
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States.
KNIME® is also registered in Germany.