August 20th 2015 Pratik Jagtap cbs.umn.edu/sites/cbs.umn.edu/files/public/downloads/08...Pratik...
Center for Mass Spectrometry and Proteomics | Phone | (612)625-2280 | (612)625-2279
© 2015 Regents of the University of Minnesota. All rights reserved.
PROTEOINFORMATICS OVERVIEW
Center for Mass Spectrometry and Proteomics
August 20th 2015 Pratik Jagtap
http://www.cbs.umn.edu/msp
Outline
• PROTEOMICS WORKFLOW
• PEAKLIST PROCESSING
• Search Databases Overview
• Protein Identification
• Protein Validation and Quantification
• Publication Guidelines
Terminology
• RAW file
• Peaklist
• Peaklist processing
• Peptide-Spectral Match (PSM)
• Genome Assembly and annotation
• Variety of search databases
PROTEOMICS WORKFLOW
Eng et al 2011 Mol Cell Proteomics. 10(11): R111.009522.
PROTEOMICS WORKFLOW
Mass Spectrometer → Mass spectral data (.RAW) → Processing → Search databases → Protein Identification → Statistical validation of Protein Identification → Protein Quantitation.
MASS SPECTRAL DATA
Cappadona et al 2012 Amino Acids. Sep 2012; 43(3): 1087–1108
Eng et al 2011 Mol Cell Proteomics. 10(11): R111.009522.
MASS SPECTRAL DATA
Peaklist Processing
RAW DATA CONVERSION TOOLS
.RAW files are read through the XRawfile library from ThermoFinnigan Xcalibur software.
• ReAdW → mzXML (http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW)
• msconvert (ProteoWizard) → mzML (http://proteowizard.sourceforge.net/)
• Others: Raw2MSM, extract_msn, DeconMSn, DTASuperCharge
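The msconvert route could be scripted roughly as follows. This is a minimal sketch, assuming ProteoWizard's msconvert is installed and on the PATH (reading Thermo .RAW files generally requires the vendor libraries, so this is usually run on Windows). The helper name `build_msconvert_command` and the file names are hypothetical; `--mzML`, `--mzXML`, `-o` and `--filter "peakPicking true 1-"` are standard msconvert options.

```python
import subprocess


def build_msconvert_command(raw_file, out_dir, fmt="mzML", centroid=True):
    """Build an msconvert command line for one RAW file (hypothetical helper)."""
    if fmt not in ("mzML", "mzXML"):
        raise ValueError("fmt must be 'mzML' or 'mzXML'")
    cmd = ["msconvert", str(raw_file), f"--{fmt}", "-o", str(out_dir)]
    if centroid:
        # Ask msconvert to centroid (peak-pick) all MS levels.
        cmd += ["--filter", "peakPicking true 1-"]
    return cmd


def convert(raw_file, out_dir, fmt="mzML"):
    """Run the conversion; raises CalledProcessError if msconvert fails."""
    subprocess.run(build_msconvert_command(raw_file, out_dir, fmt), check=True)
```

Calling `convert("sample.RAW", "converted")` would write `sample.mzML` into the `converted` folder, provided msconvert can read the file on the machine in question.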
ORBITRAP: PROCESSING AND EFFECTS
The average mass error (ppm) and its standard deviation improve when MaxQuant-processed files are used.
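For reference, the ppm statistic can be computed as below. This is a sketch under the usual definition, ppm error = (observed − theoretical) / theoretical × 10^6; the function names and any input values are illustrative assumptions, not taken from the slides.

```python
from statistics import mean, stdev


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_ppm(pairs):
    """Return (mean, standard deviation) of ppm errors.

    `pairs` is an iterable of (observed_mz, theoretical_mz) tuples;
    at least two pairs are needed for the standard deviation.
    """
    errors = [ppm_error(obs, theo) for obs, theo in pairs]
    return mean(errors), stdev(errors)
```

Running `summarize_ppm` on the same set of identified peptides before and after peaklist processing gives the two numbers being compared on this slide.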
Search Databases Overview
DATABASE SEARCH
Mass spectrum → Search against database.
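The first step of a database search can be illustrated as follows: proteins are digested in silico and peptides whose theoretical mass matches the observed precursor mass become candidates. This is a simplified sketch, not how any particular search engine is implemented. It assumes trypsin rules (cleave after K or R, not before P), no missed cleavages and no modifications; real search engines go on to score fragment ions against each candidate. All function names and the toy database are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_peptides(observed_mass, database, tol_ppm=10.0):
    """Return (protein_id, peptide) pairs within tol_ppm of observed_mass."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_peptides(sequence):
            theoretical = peptide_mass(peptide)
            if abs(observed_mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((protein_id, peptide))
    return hits
```

The size and content of `database` directly determine which candidates are even possible, which is why the choice of search database matters.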
Salzberg Genome Biology 2007 8:102 doi:10.1186
DNA → GENOME → PROTEOMIC DATABASE.
GENOMIC AND PROTEOMIC DATABASES
Finished and Published Genomes
• 3551 Bacterial Genomes
• 211 Archaeal Genomes
• 58 Eukaryal Genomes
• 3363 Viral Genomes
http://www.genomesonline.org/index
PROTEOMIC DATABASES
CUSTOMIZED DATABASES
PROTEOMIC DATABASES
Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high-quality annotated and non-redundant protein sequence database, which brings together experimental results, computed features and scientific conclusions. http://en.wikipedia.org/wiki/Swiss-Prot
TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation. The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in TrEMBL. http://en.wikipedia.org/wiki/TrEMBL
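In a UniProt FASTA file the two sections can be told apart from the header prefix: `sp` marks Swiss-Prot (reviewed) entries and `tr` marks TrEMBL (unreviewed) entries. A minimal parsing sketch, with a hypothetical function name:

```python
def parse_uniprot_header(header):
    """Split a UniProt FASTA header of the form '>db|ACCESSION|ENTRY_NAME description'."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name, _, description = rest.partition(" ")
    section = {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(db, "unknown")
    return {
        "section": section,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }
```

For example, the header `>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2` would be reported as a Swiss-Prot entry with accession P69905.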
UNIPROT DATABASE
PROTEOMIC DATABASES
The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq sequences form a foundation for medical, functional, and diversity studies. They provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis (especially RefSeqGene records), expression studies, and comparative analyses. http://www.ncbi.nlm.nih.gov/refseq/
CUSTOMIZED PROTEOMIC DATABASES
Routes to proteomic databases:
• Customized database repositories (CPTAC / UniMesh)
• Genomic DNA sequences → Six-frame translation
• Expressed sequence tags / cDNA sequences → Three-frame translation
• Metagenomic databases → Translation
• RNASeq data → Translation and database reduction workflows
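Six-frame translation of a genomic DNA sequence can be sketched as below: three reading frames on the forward strand and three on the reverse complement. This is a minimal illustration assuming the standard genetic code; the function names and frame labels are hypothetical. Stop codons are written as `*` and codons with ambiguous bases as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

A three-frame translation, as used for expressed sequence tags or cDNA, keeps only the `+1` to `+3` frames. In practice the translated frames are split at stop codons and short fragments discarded before the result is used as a search database.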
Protein Identification