![Page 1: Tackling Analytical challenges in Cancer proteogenomics using …galaxyp.org/wp-content/uploads/2019/01/CancerProteogenomics_IIT… · Heydarian et al J Proteomics Bioinform. (2014)](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/1.jpg)
Tackling Analytical Challenges in Cancer Proteogenomics Using the Galaxy Framework
December 11, 2018
Pratik Jagtap, Galaxy-P Team
University of Minnesota
Slides for the Talk: z.umn.edu/mumbaislides
![Page 2](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/2.jpg)
WORKSHOP STRUCTURE
• Introduction to proteogenomics and multi-omic studies
• RNASeq data processing: data analysis using the Galaxy platform
• Proteomics data analysis using Galaxy
• Identification of novel proteoforms and visualization
Raw RNA-seq data → RNASeq data processing and generation of a protein sequence database
Raw MS/MS proteomics data → sequence database searching and peptide / protein identification
→ Results visualization and interpretation
![Page 3](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/3.jpg)
MULTI-OMICS
![Page 4](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/4.jpg)
MULTI-OMICS TECHNOLOGIES
Ruggles et al. Mol Cell Proteomics 2017;16:959-981 © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
• Next-Gen Sequencing
• RNASeq
• Mass Spectrometry
• Proteogenomics
• Proteo-transcriptomics
• Metaproteomics
• Meta-transcriptomics
• Metabolomics
![Page 5](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/5.jpg)
LOOKING BEYOND THE KNOWN PROTEOME
Mass spectrum → Reference Protein Database from genomic annotation, customized OR combined with:
• Cancer / disease-related databases such as COSMIC, IARC p53, OMIM…
• Deep genome sequencing data from ICGC, TCGA and CPTAC
• RNASeq data
• 6-frame DNA sequences
• 3-frame cDNA sequences
→ Identification of peptides corresponding to novel proteoforms.
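The 6-frame (DNA) and 3-frame (cDNA) translations listed above can be sketched in a few lines of Python. This is a minimal illustration assuming the standard genetic code and plain A/C/G/T input; the function names are hypothetical, and real database-generation tools also filter by minimum open-reading-frame length and write FASTA headers.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate DNA codon by codon; codons containing non-ACGT bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_frame(seq: str) -> list:
    """Frames +1, +2, +3 of the given strand (e.g. an assembled cDNA)."""
    return [translate(seq[offset:]) for offset in range(3)]


def six_frame(seq: str) -> list:
    """Frames +1..+3, then -1..-3 read from the reverse complement."""
    return three_frame(seq) + three_frame(reverse_complement(seq))
```

For example, `six_frame("ATGGCCTAA")` returns `["MA*", "WP", "GL", "LGH", "*A", "RP"]`. A 6-frame database translates every frame of both genomic strands, whereas a 3-frame cDNA database keeps only the forward frames of the transcript, which is why it is much smaller.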
![Page 6](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/6.jpg)
Multiomics / trans-omics
https://doi.org/10.1007/978-1-4939-7717-8_7
![Page 7](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/7.jpg)
GALAXY
Galaxy Instance for proteogenomics workshop: z.umn.edu/galaxypinmumbai
Users need to register and log in to the site with a password. Step-by-step instructions for the workshop are provided in the document below (registration instructions start on page 5).
Documentation for Galaxy instance usage: z.umn.edu/mumbaidocs
![Page 8](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/8.jpg)
GALAXY
Galaxy Instance for proteogenomics workshop: z.umn.edu/proteogenomicsgateway
Users need to register and log in to the site with a password. Step-by-step instructions for the workshop are provided in the document below (registration instructions start on page 5).
Documentation for Galaxy instance usage: z.umn.edu/pginnov18
![Page 9](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/9.jpg)
REGISTER
![Page 10](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/10.jpg)
IMPORT HISTORY
![Page 11](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/11.jpg)
IMPORT HISTORY
![Page 12](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/12.jpg)
INPUT DATA
![Page 13](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/13.jpg)
DATASET FOR MULTI-OMICS ANALYSIS
Heydarian et al. J Proteomics Bioinform. (2014) 17:7. pii: 1000302.
• Mouse cell culture.
• RNA-seq analysis: RNA-seq libraries were sequenced on a HiSeq 2000 (Illumina SY-401-1001) to a read depth of ~90,000,000 single-end 97 bp reads per sample.
• iTRAQ labeling and mass spectrometry: labeled peptides were separated by reversed-phase liquid chromatography on an Easy-nLC system (Thermo Scientific) and analyzed on an LTQ-Orbitrap Elite mass spectrometer (Thermo Scientific).
![Page 14](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/14.jpg)
Select History 1
Import history
Start using this history
Select Workflow 1
Import workflow
Using the workflow
Run Workflow 1
INPUT
WORKFLOW
GALAXY
OUTPUT
![Page 15](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/15.jpg)
GALAXY INTERFACE
Left (Tool) Pane
Main Viewing Pane
History Pane
![Page 16](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/16.jpg)
WORKSHOP WORKFLOWS
Workflow #1
RNA-Seq to Variant
FASTA database
Workflow #2
Database Searching
Using MS/MS data
Workflow #3
Identifying Novel Variants
And Visualization
![Page 17](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/17.jpg)
OBJECTIVE OF WORKFLOW 1
Create a custom variant database, together with its genomic coordinate information
![Page 18](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/18.jpg)
WORKSHOP WORKFLOWS
Workflow #1 outputs carried into the later workflows: FASTA Sequences; Genome Mapping Information
![Page 19](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/19.jpg)
INPUT DATA
![Page 20](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/20.jpg)
INPUT DATA
• RNA-Seq FASTQ file: reads in FASTQ format
• GTF file: Gene Transfer Format, a tabular file describing genes and related features
• Known protein and contaminant protein sequence FASTA file
• Mass spectrometry peak-list file in MGF (Mascot Generic Format)
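GTF is a nine-column, tab-separated format (sequence name, source, feature, start, end, score, strand, frame, attributes). A minimal Python parser for one line might look like this; it is a sketch for illustration, not the parser used by the Galaxy tools, and the `GtfRecord` and `parse_gtf_line` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GtfRecord:
    seqname: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: dict = field(default_factory=dict)


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one tab-separated, 9-column GTF line.

    Simplification: attribute values containing ';' are not handled, and a
    repeated key (e.g. several 'tag' entries) keeps only its last value.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attributes = {}
    for item in cols[8].split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip().strip('"')
    return GtfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[6], attributes)
```

For a made-up line such as `'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'`, the record has `start == 100`, `strand == "+"` and `attributes == {"gene_id": "g1", "transcript_id": "t1"}`.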
![Page 21](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/21.jpg)
Select ‘MousePG_Input_History’
Import history
Start using this history
Select ‘MousePG_Workflow1_RNAseq_Dbcreation’
Import workflow
Using the workflow
Run Workflow 1
INPUT
WORKFLOW
GALAXY
OUTPUT
![Page 22](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/22.jpg)
IMPORT WORKFLOW
![Page 23](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/23.jpg)
IMPORT WORKFLOW
![Page 24](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/24.jpg)
RUNNING A WORKFLOW
![Page 25](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/25.jpg)
SELECTING INPUT FILES TO RUN A WORKFLOW
![Page 26](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/26.jpg)
JOB STATUS (HISTORY PANE)
Job in queue · Job running · Job successful · Job failed
![Page 27](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/27.jpg)
WORKSHOP WORKFLOWS
![Page 28](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/28.jpg)
WORKFLOW #1: RNA-SEQ TO VARIANT PROTEIN
SAV (single amino acid variant) / In-Del Variants
Assembly Workflow
![Page 29](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/29.jpg)
POTENTIAL NOVEL PEPTIDE IDENTIFICATIONS
[Figure: a gene model (5' to 3', exons 1 to 8 with UTR and CDS segments, start and stop codons) marking where potential novel peptides can arise: known peptides; expressed 5' UTR; alternate start; alternate frame (+1, +2, +3); novel exon; novel spliceform; exon extension; expressed 3' UTR / alternate stop; intergenic / novel gene; single amino acid variant; InDels]
![Page 30](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/30.jpg)
RNA-SEQ TO FASTA DATABASE CREATION
Inputs: RNA-Seq FASTQ, Genome, GTF
• HISAT: alignment tool (mapping reads to the genome)
• SAV / In-Del Variants:
  • FREEBAYES: variant calling
  • CustomProDB: variant annotation and genome mapping
• Assembly Workflow:
  • STRINGTIE: RNA-Seq to transcripts
  • GFF COMPARE: evaluates the assembly with annotated transcripts
  • Translate transcripts
Outputs: Sequence FASTA and Genome Mapping Files
![Page 31](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/31.jpg)
RNA-SEQ TO FASTA DATABASE CREATION (SAV / In-Del Variants)
![Page 32](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/32.jpg)
ALIGNMENT
RNASeq reads → mapping to gene/genome (reference gene/genome)
HISAT2: outputs BAM file (Dataset #9)
Kim D., Langmead B. and Salzberg S.L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods (2015)
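HISAT2 is a spliced aligner: in its SAM/BAM output, a read that spans an intron carries an `N` (skipped region) operation in its CIGAR string. The Python sketch below uses a hypothetical helper name and the CIGAR rules of the SAM specification to turn a read's leftmost position and CIGAR into the genomic blocks it covers; in practice a library such as pysam would read the BAM file directly.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos: int, cigar: str) -> list:
    """Reference intervals (1-based, inclusive) covered by one alignment.

    M, =, X and D consume reference bases and extend the current block;
    N (a skipped region, i.e. an intron in a spliced alignment) closes it;
    I, S, H and P do not move along the reference.
    """
    blocks = []
    start = ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=XD":
            ref += n
        elif op == "N":
            if ref > start:
                blocks.append((start, ref - 1))
            ref += n
            start = ref
    if ref > start:
        blocks.append((start, ref - 1))
    return blocks
```

For example, `aligned_blocks(100, "10M200N15M")` gives `[(100, 109), (310, 324)]`: two exonic blocks separated by a 200-base intron. Soft clips and insertions do not shift the reference position, so `aligned_blocks(1, "3S5M2I5M")` gives `[(1, 10)]`.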
![Page 33](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/33.jpg)
VARIANT CALLING
RNASeq reads → mapping to gene/genome (reference gene/genome)
FreeBayes: outputs VCF file (Dataset #14)
Garrison E., Marth G. Haplotype-based variant detection from short-read sequencing. (arXiv:1207.3907)
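Downstream, each FreeBayes call in a coding region can become a single amino acid variant (SAV) or an in-del in the protein database. A simplified Python sketch of sorting VCF alleles by type follows; the helper is hypothetical, only the first five standard VCF columns are read, and FreeBayes's complex alleles of unequal length are simply lumped in with indels here.

```python
def classify_vcf_line(line: str) -> list:
    """Classify each ALT allele in one VCF data line.

    Returns (chrom, pos, ref, alt, kind) tuples, where kind is "SNV"
    (same length, one base), "MNV" (same length, several bases) or "INDEL".
    Missing ('.'), spanning-deletion ('*') and symbolic ('<...>') alleles are skipped.
    """
    chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
    calls = []
    for alt in alts.split(","):
        if alt in (".", "*") or alt.startswith("<"):
            continue
        if len(ref) == len(alt) == 1:
            kind = "SNV"
        elif len(ref) == len(alt):
            kind = "MNV"
        else:
            kind = "INDEL"
        calls.append((chrom, int(pos), ref, alt, kind))
    return calls
```

On a made-up multi-allelic line, `classify_vcf_line("chr1\t1000\t.\tA\tG,AT\t50\t.\t.")` returns one SNV (`A` to `G`) and one insertion (`A` to `AT`).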
![Page 34](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/34.jpg)
VIEWING SNP VARIANT IN IGV
![Page 35](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/35.jpg)
RNA-SEQ TO FASTA DATABASE CREATION (SAV / In-Del Variants)
![Page 36](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/36.jpg)
CustomProDB
Wang X., Zhang B. customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search. Bioinformatics (2013)
Reference gene/genome → translate → original protein (FASTA sequence)
Reference gene/genome with variant → translate → variant protein (variant FASTA sequence)
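The translate-twice idea on this slide can be illustrated with a small Python sketch: apply a variant to a coding sequence and translate both versions. This is not customProDB's R implementation; it assumes the CDS is already in transcript orientation (for minus-strand genes the VCF alleles would first need reverse-complementing) and uses a hypothetical `variant_protein` helper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is dropped."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def variant_protein(cds: str, offset: int, ref: str, alt: str) -> tuple:
    """Apply REF -> ALT at a 0-based CDS offset; return (original, variant) proteins."""
    if cds[offset:offset + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS at this offset")
    mutated = cds[:offset] + alt + cds[offset + len(ref):]
    return translate(cds), translate(mutated)
```

For example, `variant_protein("ATGGCCAAATAA", 4, "C", "A")` returns `("MAK*", "MDK*")`, i.e. the single-base change produces the SAV A2D. Because the whole mutated CDS is retranslated, a frame-shifting in-del is handled by the same code path.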
![Page 37](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/37.jpg)
RNA-SEQ TO FASTA DATABASE CREATION (Assembly Workflow)
![Page 38](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/38.jpg)
ALIGNMENT
Mapping to gene/genome
Reference gene/genome
![Page 39](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/39.jpg)
TRANSCRIPT ASSEMBLY
Mapping to gene/genome
Reference gene/genome
Splicing → assembled transcript → 3-frame translation → FASTA sequence
![Page 40](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/40.jpg)
RNA-SEQ TO FASTA DATABASE CREATION (SAV / In-Del Variants; Assembly Workflow, which translates novel transcripts)
![Page 41](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/41.jpg)
OUTPUTS
FASTA Sequence File
>generic|ENSMUSP00000107433|Erp29|ER protein 29
MAAAAGVSGAASLSPLLSVLLGLLLLFAPHGGSGLHTKGALPLDTVTFYKSRLLLGP
>generic|ENSMUSP00000120715|Rps2|ribosomal protein S2
MADDAGAAGGPGGPGGPGLGGRGGFRGGFGSGLRGRGRGRGRGRGRGRGARGGKAEDKEWIPVTKLGRLVKDMKIKSLEEIY
LFSLPIKESEIIDFFLGASLKDEVLKIMPVQKQTRAGQR
Genomic Mapping File
ENSMUSP00000107433 chr5 121452190 121452340 - 0 150
ENSMUSP00000107433 chr5 121449139 121449163 - 150 174
ENSMUSP00000120715 chr17 24720275 24720452 + 0 177
ENSMUSP00000120715 chr17 24720533 24720731 + 177 375
ENSMUSP00000120715 chr17 24720968 24721302 + 375 709
ENSMUSP00000120715 chr17 24721622 24721727 + 709 814
ENSMUSP00000120715 chr17 24721802 24721897 + 814 909
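The genomic mapping file is what lets a peptide identified later be placed back on the genome. The sketch below reads the rows above as protein accession, chromosome, genomic start, genomic end, strand, and CDS start and end offsets. That reading is an interpretation: in every row, end minus start equals the CDS span, which suggests 0-based, half-open coordinates. The helper names are hypothetical.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int      # genomic start (assumed 0-based, half-open)
    end: int        # genomic end
    strand: str
    cds_start: int  # offset of this exon within the CDS, in nucleotides
    cds_end: int


# Three rows copied from the genomic mapping file on this slide.
MAPPING = """\
ENSMUSP00000107433 chr5 121452190 121452340 - 0 150
ENSMUSP00000107433 chr5 121449139 121449163 - 150 174
ENSMUSP00000120715 chr17 24720275 24720452 + 0 177"""


def parse_mapping(text: str) -> dict:
    """Group whitespace-separated mapping rows by protein accession."""
    exons: dict = {}
    for line in text.strip().splitlines():
        acc, chrom, start, end, strand, cstart, cend = line.split()
        exons.setdefault(acc, []).append(
            Exon(chrom, int(start), int(end), strand, int(cstart), int(cend)))
    return exons


def peptide_to_genome(exons: list, aa_start: int, aa_len: int) -> list:
    """Genomic intervals (chrom, start, end, strand) that encode a peptide.

    aa_start is the 0-based index of the peptide's first residue in the protein.
    """
    lo, hi = aa_start * 3, (aa_start + aa_len) * 3  # CDS nucleotide range
    intervals = []
    for ex in exons:
        a, b = max(lo, ex.cds_start), min(hi, ex.cds_end)
        if a >= b:
            continue  # the peptide does not touch this exon
        if ex.strand == "+":
            intervals.append((ex.chrom, ex.start + a - ex.cds_start,
                              ex.start + b - ex.cds_start, "+"))
        else:  # minus strand: CDS offsets count down from the exon's high end
            intervals.append((ex.chrom, ex.end - (b - ex.cds_start),
                              ex.end - (a - ex.cds_start), "-"))
    return intervals
```

For the Erp29 entry, a four-residue peptide starting at residue index 48 straddles the exon boundary and maps to two intervals on chr5, `(121452190, 121452196)` and `(121449157, 121449163)`, six bases each. The minus-strand branch counts from each exon's high end because translation proceeds from higher to lower coordinates on that strand.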
![Page 42](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/42.jpg)
WORKSHOP WORKFLOWS
FASTA Sequences; Genome Mapping Information
![Page 43](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/43.jpg)
SNAPSHOT OF WHAT HISTORY LOOKS LIKE AT THIS STAGE
![Page 44](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/44.jpg)
PROTEOMICS DATA ANALYSIS USING GALAXY
Inputs: protein FASTA (reference proteins + potential variants) and a peak list of MS/MS data
→ Multiple algorithms for matching MS/MS spectra to peptides
→ Organization and scoring of peptide-spectrum matches (PSMs)
→ Generation of an SQLite database for downstream data visualization and filtering
→ Putative variant peptide sequences for further verification and analysis
Proteomics. 11:996-9
Nat Biotechnol. 33:22-4
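Before spectra can be matched to peptides, the protein FASTA has to be turned into candidate peptides. The search engines do this internally with configurable enzymes; the Python sketch below only illustrates the classic trypsin rule (cleave after K or R unless the next residue is P), using a hypothetical `tryptic_peptides` helper and illustrative default length limits.

```python
import re


def tryptic_peptides(protein: str, missed_cleavages: int = 0,
                     min_len: int = 7, max_len: int = 30) -> set:
    """In-silico trypsin digest: cleave after K or R unless followed by P."""
    # Zero-width split after every K/R not followed by P (needs Python 3.7+).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` following fragments onto fragment i.
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides
```

With length limits relaxed, `tryptic_peptides("AKPLRGGKR", 0, 1, 50)` yields `{"AKPLR", "GGK", "R"}`: the K followed by P is not cleaved. Allowing one missed cleavage adds `"AKPLRGGK"` and `"GGKR"`, which is why search settings that permit missed cleavages enlarge the search space.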
![Page 45](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/45.jpg)
Mass Spectrometry and Proteomics
![Page 46](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/46.jpg)
![Page 47](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/47.jpg)
Vaudel et al. Nature Biotechnol. 2015, 33:22–24.
Vaudel et al. J Proteome Res. 2018, doi: 10.1021/acs.jproteome.8b00175.
• Bundles multiple freely available algorithms for matching MS/MS spectra to peptide sequences
• Infers proteins from peptide-spectrum matches
• Assigns confidence scores to peptide-spectrum matches and inferred proteins
• Provides outputs in standard formats (e.g. mzIdentML) for further processing
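Confidence scoring of this kind is commonly built on the target-decoy approach: spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and decoy hits estimate the false discovery rate (FDR). The Python sketch below shows only that generic idea; the software on this slide uses its own, more elaborate scoring, and the `score_cutoff` helper is hypothetical.

```python
from itertools import groupby
from typing import Optional


def score_cutoff(psms: list, max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose target-decoy FDR estimate is <= max_fdr.

    psms holds (score, is_decoy) pairs; higher scores are better matches.
    At a cutoff, FDR is estimated as decoys / targets among PSMs scoring at
    or above it. PSMs with tied scores are added together. Returns None if
    no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, tied in groupby(ranked, key=lambda psm: psm[0]):
        for _, is_decoy in tied:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

With `psms = [(10, False), (9, False), (8, True), (7, False), (6, True), (5, True)]`, `score_cutoff(psms, 0.5)` returns 7: at that cutoff one decoy against three targets gives an estimated FDR of about 0.33. Taking the lowest qualifying cutoff, rather than stopping at the first failure, keeps every PSM whose q-value is within the threshold.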
![Page 48](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/48.jpg)
![Page 49](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/49.jpg)
WORKSHOP WORKFLOWS
![Page 50](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/50.jpg)
YOUR CURRENT HISTORY
![Page 51](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/51.jpg)
If not, import the input for this part of the workshop: click “Shared Data” → “Histories” → “MousePG_History2”, then click “Import History”.
![Page 52](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/52.jpg)
Select ‘MousePG_Workflow3_Novel_peptide_analysis’
Import workflow
Start using this workflow
Run Workflow
ACTIVE HISTORY FROM EARLIER WORKFLOW
WORKFLOW
![Page 53](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/53.jpg)
WORKFLOW FOR THIS SECTION
[Figure: workflow diagram whose inputs are labeled Workflow 2, Workflow 2 and Workflow 1]
![Page 54](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/54.jpg)
WORKFLOW FOR THIS SECTION
Workshop Documentation: z.umn.edu/galaxypinmumbai
5.2 BlastP analysis (page 32)
5.3 Novel proteoform analysis (page 33)
5.4 Using Multi-omics Visualization Platform for visualizing novel proteoforms (page 35)
![Page 55](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/55.jpg)
BLASTP ANALYSIS
SELECT DISTINCT PSM.* FROM PSM JOIN BLAST ON PSM.SEQUENCE = BLAST.QSEQID
WHERE BLAST.PIDENT < 100 OR BLAST.GAPOPEN >= 1 OR BLAST.LENGTH < BLAST.QLEN
ORDER BY PSM.SEQUENCE, PSM.ID
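The query keeps peptide-spectrum matches whose BLASTP alignment against the reference proteins is imperfect: below 100% identity, at least one gap opening, or an alignment shorter than the query peptide. The self-contained Python sketch below runs it with the standard-library `sqlite3` module on toy tables. The table layout and rows are invented for illustration, and the sketch assumes one BLAST hit per peptide; with several hits, a single imperfect hit would be enough to keep a peptide even if another hit matched exactly.

```python
import sqlite3

# Toy tables; column names follow the query, values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PSM (ID INTEGER, SEQUENCE TEXT);
CREATE TABLE BLAST (QSEQID TEXT, PIDENT REAL, GAPOPEN INTEGER, LENGTH INTEGER, QLEN INTEGER);
INSERT INTO PSM VALUES (1, 'PEPTIDEK'), (2, 'PEPTLDEK'), (3, 'ANOTHERPEPK');
INSERT INTO BLAST VALUES
    ('PEPTIDEK', 100.0, 0, 8, 8),     -- exact, full-length hit: known peptide
    ('PEPTLDEK', 87.5, 0, 8, 8),      -- one mismatch: candidate variant
    ('ANOTHERPEPK', 100.0, 0, 9, 11); -- partial-length hit: candidate novel peptide
""")

NOVEL_PSMS = """
SELECT DISTINCT PSM.* FROM PSM JOIN BLAST ON PSM.SEQUENCE = BLAST.QSEQID
WHERE BLAST.PIDENT < 100 OR BLAST.GAPOPEN >= 1 OR BLAST.LENGTH < BLAST.QLEN
ORDER BY PSM.SEQUENCE, PSM.ID
"""
rows = conn.execute(NOVEL_PSMS).fetchall()
conn.close()
```

Here `rows` holds PSMs 3 and 2 (sorted by sequence); the exact full-length match, PSM 1, is filtered out as a known peptide.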
![Page 56](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/56.jpg)
MULTI-OMICS VISUALIZATION PLATFORM FOR VISUALIZING NOVEL PROTEOFORMS
![Page 57](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/57.jpg)
MULTI-OMICS VISUALIZATION PLATFORM FOR VISUALIZING NOVEL PROTEOFORMS
SPECTRAL QUALITY VISUALIZATION (Lorikeet Viewer)
GENOMIC LOCALIZATION (Integrative Genomics Viewer)
![Page 58](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/58.jpg)
ESSREALVEPTSESPRPALAR
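The Lorikeet view annotates a spectrum with the fragment ions expected for the peptide. As a rough Python sketch (hypothetical helper; unmodified monoisotopic residue masses, singly charged ions, no neutral losses), the b- and y-ion m/z values for ESSREALVEPTSESPRPALAR can be computed as follows. The dataset is iTRAQ-labeled, so a real search would also add the label mass at the peptide N-terminus, shifting every b ion; these numbers are illustrative only.

```python
# Monoisotopic residue masses (Da) for the residues in this peptide.
RESIDUE_MASS = {
    "A": 71.03711, "E": 129.04259, "L": 113.08406, "P": 97.05276,
    "R": 156.10111, "S": 87.03203, "T": 101.04768, "V": 99.06841,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide: str) -> tuple:
    """Singly charged b and y ion m/z values (no modifications, no neutral losses)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal fragment b_i
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment y_i
    return b_ions, y_ions


b, y = fragment_ions("ESSREALVEPTSESPRPALAR")
```

A 21-residue peptide gives 20 ions in each series; for example, y1 (the C-terminal R) is at about m/z 175.119 and b1 (the N-terminal E) at about m/z 130.050.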
![Page 59](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/59.jpg)
GENOMIC LOCALIZATION (INTEGRATIVE GENOMICS VIEWER)
![Page 60](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/60.jpg)
NOVEL PROTEOFORM ANALYSIS
![Page 61](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/61.jpg)
UCSC GENOME BROWSER
![Page 62](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/62.jpg)
CDART BLAST SEARCH
![Page 63](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/63.jpg)
PROJECT OVERVIEW
![Page 64](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/64.jpg)
![Page 65](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/65.jpg)
GO AND TRY IT OUT!
GALAXY INSTANCE ONE
· Galaxy Instance for proteogenomics workshop: z.umn.edu/galaxypinmumbai
Users need to register and log in to the site with a password. Step-by-step instructions for the workshop are provided in the document below (registration instructions start on page 5).
· Documentation for Galaxy instance usage: z.umn.edu/mumbaidocs
GALAXY INSTANCE TWO (backup if GALAXY INSTANCE ONE gets busy)
· Proteogenomics Gateway: z.umn.edu/proteogenomicsgateway
Users need to register and log in to the site with a password. Step-by-step instructions for the workshop are provided in the document below (registration instructions start on page 5).
· Documentation for Galaxy instance usage: z.umn.edu/pginnov18
![Page 66](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/66.jpg)
WORKSHOP INSTRUCTORS AND ACKNOWLEDGEMENTS
• Instructors
  • Pratik Jagtap
• Support
  • Praveen Kumar
  • Prof. Timothy Griffin, Galaxy-P team (University of Minnesota)
  • Subina Mehta
  • James Johnson and Thomas McGowan (University of Minnesota)
  • Matthew Chambers
  • Jetstream Cloud at Indiana University
• Funding
![Page 67](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/67.jpg)
ACKNOWLEDGMENTS
Minnesota Supercomputing Institute: James Johnson, Thomas McGowan, Lee Parsons, Michael Milligan
Ira Cooke (Melbourne, Australia)
University of Minnesota: Timothy Griffin, Pratik Jagtap, Praveen Kumar, Candace Guerrero, Subina Mehta, Adrian Hegeman (Co-I), Art Eschenlauer, Shane Hubler, Ray Sajulga, Caleb Easterly, Andrew Rajczewski
Biologists / collaborators: Laurie Parker, Joel Rudney, Maneesh Bhargava, Amy Skubitz, Chris Wendt, Brian Crooker, Steven Friedenberg, Kevin Viken, Kristin Boylan, Marnie Peterson, Somiah Afiuni, Brian Sandri, Alexa Pragman, Wanda Weber, Amy Treeful
Harald Barsnes, Marc Vaudel (University of Bergen, Norway)
University of Freiburg, Freiburg, Germany
VIB, UGent, Belgium
Judson Hervey (Naval Research Institute, Washington, D.C.)
Matt Chambers (Nashville, TN)
Alessandro Tanca (Porto Conte Ricerche, Italy)
Carolin Kolmeder (University of Helsinki, Finland)
Thilo Muth, Bernhard Renard (Robert Koch Institut)
Thomas Doak, Jeremy Fisher (Indiana University)
Josh Elias (Stanford University)
Brook Nunn (U of Washington)
Lennart Martens (Co-I), Bart Mesuere, Robbert G Singh
Bjoern Gruening, Bérénice Batut
Lloyd Smith (Co-I), Michael Shortreed (UW-Madison)
Karen Reddy, Mo Heydarian (Johns Hopkins University)
Anamika Krishanpal, Priyabrata Panigrahi (Persistent Systems Limited)
Stephan Kang (Intero Life Sciences)
galaxyp.org
Funding
![Page 68](https://reader034.fdocuments.us/reader034/viewer/2022042916/5f55c55870be6140054852d5/html5/thumbnails/68.jpg)
QUESTIONS?
Follow us on twitter.com/usegalaxyp
Workshop Documentation: z.umn.edu/galaxypinmumbai
Slides for the Talk: z.umn.edu/mumbaislides
Visit: http://galaxyp.org
Feedback: https://z.umn.edu/fbindia