The Critical Role of Mass Spectrometry in a Proteomics Core … · 2009. 11. 18. · Sample 1....

Post on 16-Sep-2020

The Critical Role of Mass Spectrometry in a Proteomics Core Facility:

“Can you validate my western blot?”

J. Will Thompson

Sr. Laboratory Administrator

Duke Proteomics Core Facility

Duke Institute for Genome Sciences & Policy

Duke School of Medicine

“Can you Validate My Western Blot?”

One of the simplest, yet most important and impactful, roles of a mass spectrometry proteomics core in a Biochemistry/School of Medicine setting is to provide verification of data acquired with classical techniques.

IL-28a?

IL-28A (Interferon λ-2)

MKLDMTGDCTPVLVLMAAVLTVTGAVPVARLHGALPDARGCHIAQFKSLSPQELQAFKRAKDALEESLLLKDCRCHSRLFPRTWDLRQLQVRERPMALEAELALTLKVLEATADTDPALVDVLDQPLHTLHHILSQFRACIQPQPTAGPRTRGRLHHWLYRLQEAPKKESPGCLEASVTFNLFRLLTRDLNCVASGDLCV

Red – peptides identified by Mascot

Blue – residues unique to IL-28A (versus IL-28B)

ELISA Standard (100 ng)

MW Markers

Carrier Proteins (BSA, etc.)

Duke Proteomics Core Facility

• Established Summer 2007 by Duke School of Medicine and Duke Institute for Genome Sciences & Policy (Arthur Moseley, Director)

• Now 5 full-time staff (3 Ph.D., 2 bachelor's)

• Major Hardware

– 5 Nanoacquity UPLCs, 1 with 2D technology

– 3 QToFs (Global Ultima, Premier, Synapt HDMS)

– 1 Xevo TQ

– 1 LTQ Orbitrap XL (HHMI)

– Mesoscale Discovery 2400 Imager

• Informatics Infrastructure

– 28 Terabytes of NetApp storage

– 10-blade IBM Mascot server

– Dell R900 Server for Rosetta Elucidator

– Desktop workstations: PLGS 2.4, Mascot Daemon, Mascot Distiller, Scaffold, VerifyE

Staff (photo): Erik, Arthur, Will, Laura, Meredith

Challenges and Opportunities for Mass-Spectrometry Based Proteomics

• Clinical Proteomics / Biomarker Discovery

– “Large” clinical-based studies where QC metrics must be tightly controlled, deliverables are well-defined and expected, and data must be of high quality

– “Discovery and Validation of a Serum Proteomic Signature of Response to Interferon Therapy in Chronic HCV Infection”

• Translational Research

– Medium to large-scale studies requiring cutting-edge but robust technology, with longer timelines and more flexible end deliverables

– “Spatial Proteomic Tissue Profiling of the Photoreceptor Rod Cell using Cryosectioning and Mass Spectrometry”

• Basic Research

– Highly collaborative small to medium-scale studies where new technologies can be tested and ultimately deployed, with loose timelines and where hypothesis generation is often a key goal

– “A Proteomics Approach to Dissect Lipid Droplet – Chlamydia Interactions”

Quantitative Serum Protein Mass Spectrometry in the Duke Proteomics Core Facility

Sample preparation: Serum/Plasma Sample → HAP Depletion → Depleted Proteome → Digest

Data acquisition: nanoscale UPLC → Q-ToF Mass Spectrometry (MSE or MS/MS); High Resolution Accurate Mass Measurements of Precursor Ions and Product Ions

Data processing (Quantitative Pipeline and Qualitative Pipeline):

• Automated data transfer to NetApp enterprise data storage

• Image Conversion, Image Alignment and Quantitative Analysis (Rosetta Elucidator)

• Automated translation to DB searchable format (.xml, .mgf)

• Database search of product ion spectra (Matrix Science Mascot or Waters’ IdentityE)

• Peptide ID Quality Scoring & Translating Peptides to Proteins (Rosetta Elucidator or Proteome Software’s Scaffold)

• Integration of quantitative and qualitative data (Rosetta Elucidator)

Result: Peptide and Protein Quantitation

Data Acquisition for Biomarker Discovery

Run order: Column Condition, QC, Sample 1 to Sample 10, QC 2, Sample 11, Sample 12, Sample 13, … QC X-1, Sample X-5 to Sample X, QC X

Day 1 (+): Instrument Performance Checks, Column Conditioning, Preliminary database searches

Day 2: Data Collection

Day 3: Data Collection

Day X: Data Collection

• Strategy is to maximize biological powering by analyzing as many samples as possible

• Robust LC-MS platform allows singlicate analysis of each sample

• Data QC is performed by daily injections of a “standard” of the same biofluid (Bioreclamation, Inc)

• Need a higher throughput platform with same/better analytical metrics
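The injection scheme on this slide (column conditioning, then QC injections bracketing blocks of study samples) can be sketched as a small queue builder. This is only an illustration: the function name `build_run_queue` and the block size of 10 samples between QC injections are assumptions read off the run-order figure, not a documented facility protocol.

```python
def build_run_queue(sample_ids, samples_per_block=10):
    """Return an injection order with a QC standard before each block of samples.

    Assumption: one QC injection opens every block and a final QC closes the run,
    mirroring the "QC ... Sample 1-10 ... QC 2 ..." pattern on the slide.
    """
    queue = ["Column Condition"]
    qc_count = 0
    for start in range(0, len(sample_ids), samples_per_block):
        qc_count += 1
        queue.append(f"QC {qc_count}")
        queue.extend(sample_ids[start:start + samples_per_block])
    # Closing QC so that the last block is also bracketed by standards.
    qc_count += 1
    queue.append(f"QC {qc_count}")
    return queue


samples = [f"Sample {i}" for i in range(1, 14)]  # 13 hypothetical samples
queue = build_run_queue(samples)
for position, injection in enumerate(queue, start=1):
    print(position, injection)
```

Bracketing every block with the same biofluid standard is what lets drift be checked per day rather than only at the end of the study.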

Association of LC/MS “Features (Isotopes)”

Treated / Control

Raw Data

Ratio Data

Ratio Builder

Combined Data

Combined Data Builder

Aligned Data

PeakTeller

PeptideTeller Results (Keller et al, ISB)

Peptide Annotation with Multiple Search Engines

Mascot Searches (semitryptic)

PLGS 2.4 Searches (tryptic)

A common score is assigned based on decoy database validation, allowing annotation with multiple search engines simultaneously with controlled FDR
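As a rough illustration of decoy-based FDR control, the sketch below picks a score cutoff from a list of peptide-spectrum matches flagged as target or decoy. It is a generic target-decoy estimate (decoys divided by targets above the cutoff), not the PeptideTeller algorithm itself; the function name, the tuple layout and the example scores are all hypothetical.

```python
def score_threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR at a cutoff is estimated as (#decoys >= cutoff) / (#targets >= cutoff).
    Returns None if no cutoff satisfies the limit. Tied scores are not
    handled specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical scores already mapped onto a common scale across search engines.
example_psms = [
    (90, False), (85, False), (80, False), (70, True),
    (65, False), (60, False), (50, True), (40, False),
]
cutoff_25 = score_threshold_for_fdr(example_psms, max_fdr=0.25)
cutoff_0 = score_threshold_for_fdr(example_psms, max_fdr=0.0)
print(cutoff_25, cutoff_0)
```

Because the cutoff is defined on the common score, hits from both search engines can be filtered with a single threshold.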

ProteinTeller Statistics

Merged Protein Annotation with PLGS 2.4 / Mascot v 2.2 searches

“Typical” Protein Annotation Metrics, Plasma Proteomics Study (~30+ patients)

• 3944 Peptides to 302 Proteins, single dimension of LC-MS analysis (2 hrs/sample)

– 3768 Peptides to 104 proteins (with 2+ peptides)

APOB_HUMAN, P04114

CO4B_HUMAN, P0C0L5

CERU_HUMAN, P00450

FINC_HUMAN, P02751

CFAH_HUMAN, P08603

CFAB_HUMAN, P00751

PLMN_HUMAN, P00747

ITIH2_HUMAN, P19823

CO5_HUMAN, P01031

APOA4_HUMAN, P06727

HEMO_HUMAN, P02790

VTDB_HUMAN, P02774

ITIH4_HUMAN, Q14624

AACT_HUMAN, P01011

AFAM_HUMAN, P43652

ANT3_HUMAN, P01008

A1BG_HUMAN, P04217

KNG1_HUMAN, P01042

ITIH1_HUMAN, P19827

THRB_HUMAN, P00734

Peptide Distribution

Metrics for File delivered for stats analysis:

• 33,862 total Isotope Groups

• 5065 annotated Isotope Groups

• 3944 unique peptide sequences

• 302 unique proteins

Reproducibility of Plasma Datasets

Analytical Variability Analytical + Biological Variability

25% CV
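The reproducibility panels are summarized with a coefficient of variation (CV). A minimal sketch of how a per-feature %CV could be computed from repeated injections is shown below; the feature names and intensities are made up, and treating 25% as a pass/fail cutoff is an assumption, since the slide only shows the 25% CV label.

```python
from statistics import mean, stdev


def percent_cv(intensities):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(intensities) / m


def fraction_within_cv(feature_table, cutoff=25.0):
    """Fraction of features whose %CV across injections is at or below cutoff."""
    cvs = [percent_cv(values) for values in feature_table.values()]
    return sum(cv <= cutoff for cv in cvs) / len(cvs)


# Hypothetical isotope-group intensities from three QC injections.
qc_features = {
    "isotope_group_1": [100.0, 110.0, 90.0],
    "isotope_group_2": [100.0, 200.0, 50.0],
}
cv_first = percent_cv(qc_features["isotope_group_1"])
fraction_ok = fraction_within_cv(qc_features)
print(round(cv_first, 1), fraction_ok)
```

Running the same calculation on QC injections and on patient samples separates analytical variability from analytical plus biological variability.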

Hepatitis C Infection

75% Have Chronic infection

Eligible for Treatment (SOC = IFN/Ribavirin)

Responders / Non-responders (>50%)

Dyslipidemia, Chronic Insulin Resistance, Steatosis, Hepatic Fibrosis, Liver Cancer

Discovery Proteomics Focus: MURDOCK Horizon 1

“Start with an unmet clinical need”

Hepatitis C Virion

Clinical Biomarker / Clinical Diagnostic

3169 CHC patients, Duke Hepatology Database & Biorepository

The Classical Biomarker Discovery Paradigm: Application to Hepatitis C

Figure (Number of Analytes versus Number of Samples at each stage):

• Biomarker Discovery: 10,000s of analytes, 10s of samples; Open Platform LC/MS

• Biomarker Verification: 10-100 analytes, 100-1,000 samples; LC/MS/MS (MRM), Antibody-based Assays

• Biomarker Validation: 10 analytes, 1,000s of samples; Antibody-based Assays, LC/MS/MS (MRM)

      G1   G2   G3
R     10    5    5
NR    10    -    -

Discovery cohort (n=30)

Discovery Cohort 2 (n=30)

Verification cohort 2 (n=250)

Verification cohort 3 (n=177, Industry Collaborator Clinical Trial)

Verification cohort 1 (n=41)

(March 2008)

(July 2008)

(August 2009)

Hypothesis Testing in Initial HCV 55-Patient Dataset

• Traditional t-Test/ANOVA, no statistically significant individual species

– Non-parametric test

– Minimum p-value: 7.5 × 10⁻⁵ (not passing Bonferroni)

– Binary regression with best prediction

Model Fitting with Best Single Isotope Group Leave-One-Out Cross Validation

Traditional Hypothesis-Test is Not Powerful Enough to Extract Signal from Noise
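A quick worked check shows why a minimum p-value of 7.5 × 10⁻⁵ does not survive Bonferroni correction. The sketch assumes a family-wise alpha of 0.05 and roughly 35,000 tests (the approximate isotope-group count quoted elsewhere in this deck); neither number is stated on this slide as the one actually used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05                 # assumed family-wise error rate
n_isotope_groups = 35_000    # assumed number of tests (approximate)
min_p_value = 7.5e-5         # minimum p-value reported on the slide

threshold = bonferroni_threshold(alpha, n_isotope_groups)
passes = min_p_value <= threshold
print(f"threshold = {threshold:.2e}, passes = {passes}")
```

Under these assumptions the per-test threshold is about 1.4 × 10⁻⁶, roughly fifty times smaller than the best observed p-value.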

Sparse Latent Factor Regression (Bayesian Factor Regression Modeling, BFRM)

35,000 Isotope Groups

Predictive Factor: “Metaproteins”

Factor Score: “Expression Value”

Statistical Analysis: Joe Lucas, PhD, Duke Institute for Genome Sciences and Policy

• Regression - Leads directly to prediction

• Sparsity – Many isotopes are irrelevant

• Latent Factors – let data determine important relationships

• Resulting model for prediction:

• 3 Metaproteins, 650 Isotope Groups

Latent Factors which contain Biological Information

Gender Differences ????? Differences

Transplant patient

Latent Factors which contain Biological Information

Ethnicity Drinking History

Cross-Validation Results, Predicting HCV Treatment Response (n=55)

Demographics: Race, Gender, Genotype and Viral Load

Metaprotein Factors

Metaprotein Factors + Demographics

AUROCs

Demographic Factors: 0.69

Metaprotein Factors: 0.84

Both Factors: 0.89
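For readers unfamiliar with the AUROC figures above, the sketch below computes the area under the ROC curve as the probability that a randomly chosen responder scores higher than a randomly chosen non-responder (ties count one half). The scores and labels are invented for illustration and are unrelated to the study data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the positive class (e.g. responder), 0 for the negative class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical cross-validated prediction scores.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auroc(example_scores, example_labels)
print(round(example_auc, 3))
```

An AUROC of 0.5 corresponds to chance and 1.0 to perfect separation, which is the scale on which 0.69, 0.84 and 0.89 should be read.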

Independent Verification with 41 New HCV Patients

Meta-protein, training and verification cohort 1

Data Fit for Metaprotein Predictors

Alignment Challenges

• Model is based on 9160 isotope groups

• We must match these to new data

– Restrict to identical peptides with identical charge state

– 1997 matches

• Estimate factor scores

– Project the loadings of just these 1997
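The matching and projection steps can be sketched as follows. Features are matched on (peptide sequence, charge state), and factor scores for a new sample are estimated by ordinary least squares using only the matched rows of the loadings matrix. Least squares is a stand-in chosen for illustration; the slides do not state which estimator was used, and all names and numbers here are hypothetical.

```python
def match_features(model_keys, new_keys):
    """Indices of model features also present in the new data set."""
    available = set(new_keys)
    return [i for i, key in enumerate(model_keys) if key in available]


def solve_linear_system(a, b):
    """Solve a x = b for a small square system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matched loadings are rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def project_factor_scores(loadings, values_by_index, matched):
    """Least-squares factor scores from the matched subset of loadings.

    loadings: one row per model feature, one column per factor.
    values_by_index: new-sample abundance keyed by model feature index.
    """
    rows = [loadings[i] for i in matched]
    vals = [values_by_index[i] for i in matched]
    k = len(rows[0])
    btb = [[sum(r[a] * r[c] for r in rows) for c in range(k)] for a in range(k)]
    btx = [sum(r[a] * v for r, v in zip(rows, vals)) for a in range(k)]
    return solve_linear_system(btb, btx)


model_keys = [("PEPTIDEA", 2), ("PEPTIDEB", 2), ("PEPTIDEB", 3), ("PEPTIDEC", 2)]
new_keys = [("PEPTIDEA", 2), ("PEPTIDEB", 3), ("PEPTIDEC", 2)]
loadings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
new_values = {0: 2.0, 2: 5.0, 3: 6.0}  # generated from factor scores (2, 3)

matched = match_features(model_keys, new_keys)
scores = project_factor_scores(loadings, new_values, matched)
print(matched, scores)
```

With only a fraction of the original features matched, the recovered scores become noisier, which is the motivation for checking accuracy on the training samples after projection.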

Accuracy after Projection

• Original model

• Use factors from projection onto 1997

• Same (training) samples

Discovery Cohort Predictions: Entire Model

Discovery Cohort Predictions: Only Peptides Available in Verification Data

Adapting to Projection

• Model averaging

– Stochastic search

– Models that work with the projections

– Throws out poorly performing models

• Use this limited set to predict new 41

Blinded Prediction of Treatment Response

• Sensitivity: .78

• Specificity: .8

• PPV: .89

• NPV: .67

Difficult to set cutoff due to “batch effects”

• Sensitivity: .92

• Specificity: .8

• PPV: .89

• NPV: .88
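The four figures reported for the blinded prediction all derive from one confusion matrix. The sketch below shows the definitions; the counts are hypothetical and were not taken from the 41-patient verification cohort.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # responders correctly predicted
        "specificity": tn / (tn + fp),  # non-responders correctly predicted
        "ppv": tp / (tp + fp),          # predicted responders who respond
        "npv": tn / (tn + fn),          # predicted non-responders who do not
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=18, fp=2, tn=8, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Moving the decision cutoff trades sensitivity against specificity, which is why batch effects that shift the score distribution make the cutoff hard to set.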

Moving Past Biomarker Discovery (HCV)

Secondary Questions (and Strategy)

• Metaprotein predictors have been used to verify a predictive signature for HCV Tx response in an independent cohort

• Is this signature real and predictive?

– More samples; verification/validation cohorts

– Large cohort for validation and large number of peptides (650)

– Immunoassay (we know PTMs are important)

– MRM: scientifically the best way forward, but 650 peptides is an MRM challenge

• What are the Predictive Proteins/Peptides?

– Improve Peptide Annotation in Dataset

• pI Fractionation / Multidimensional LC

• Improvement in DB search algorithms

• Improvements in data alignment algorithms

Spatial Proteomic Tissue Profiling of the Photoreceptor Rod Cell using Cryosectioning and Mass Spectrometry

Tissue Sectioning, Lysis and Digestion

LC-MS data collection and processing (Rod Cell “reassembled” in-silico)

Western Blot Confirmation of Protein Trends

Boris Reidel, Nikolai Skiba, Vadim Arshavsky

Using Rosetta Elucidator to Find Matching Trends at Protein Level

(approximately 750 Proteins quantified, with over 3500 peptides)

Cellular Machinery of the Photoreceptor Cell (Proteins with specific Subcellular Localization)

Cellular Machinery of the Photoreceptor Cell (Protein Translocation)

Light Adjusted Retina / Dark Adjusted Retina

A Proteomics Approach to Dissect Lipid Droplet-Chlamydia Interactions

Hector A. Saka, Raphael Valdivia

Figure labels: LD, RB, EB, Inclusion, Nucleus, Cytoplasm

LD: Lipid droplet

RB: Reticulate body (non-infectious, metabolically active)

EB: Elementary body (infectious, metabolically inactive)

Hypothesis:

- Chlamydia bacteria utilize lipid droplets to subvert the host immune response

Key Question:

- What are the proteomic changes in the lipid droplet as a function of infection?

Approach:

- Isolate LDs from infected/uninfected cells with density gradient centrifugation

- Analyze proteome

Peptide Level Expression Data

Vimentin shown independently to have N-terminal domain processed by bacterial protease CPAF

Kumar Y, Valdivia RH. Cell Host Microbe. 2008 4(2):159-69.

Key Point: Only by mining the data at the peptide level can one understand the underlying biology of this infection

- The N-terminal peptides are changing in expression

Recruitment and Processing of Host Proteins during Chlamydia trachomatis Infection revealed ONLY at Peptide Level

• Chlamydia-induced subversion of host cell protein function leads to qualitative/quantitative changes in the lipid droplet proteome

decrease in expression of N-terminal tryptic peptides

increase in expression of N-terminal semi-tryptic peptides

• Chlamydia ‘co-opts’ the function of structural proteins via protease processing

– stabilizes the inclusion body; minimizes the exposure of the inclusion body contents to host immune-surveillance proteins

– Kumar and Valdivia, Cell Host Microbe. 2008 Aug 14;4(2):159-69.

Using Absolute Quantification to Characterize Protein Abundance in Lipid Droplets

(Expression Levels of Top 50 most abundant Proteins shown)*

*Method used to calculate absolute abundance adapted from Silva et al, Mol Cell Proteomics. 2006 Jan;5(1):144-56.
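The cited Silva et al. approach is commonly described as a "top three" method: the mean intensity of a protein's three most intense peptides is compared with that of a spiked standard of known amount. The sketch below illustrates that idea only; how the method was adapted for the lipid droplet data is not given in the slides, and the protein names, intensities and standard amount are hypothetical.

```python
def top3_mean(intensities):
    """Mean of the three most intense peptide signals for one protein."""
    top = sorted(intensities, reverse=True)[:3]
    if len(top) < 3:
        raise ValueError("need at least three peptide intensities")
    return sum(top) / 3.0


def absolute_amounts(peptide_intensities, standard_name, standard_fmol):
    """Estimate on-column amount (fmol) per protein from top-3 intensities.

    Proteins with fewer than three quantified peptides are skipped.
    """
    response_per_fmol = top3_mean(peptide_intensities[standard_name]) / standard_fmol
    return {
        protein: top3_mean(values) / response_per_fmol
        for protein, values in peptide_intensities.items()
        if len(values) >= 3
    }


# Hypothetical peptide intensities; "STD" is a spiked standard at 50 fmol.
intensities = {
    "STD": [300.0, 200.0, 100.0, 50.0],
    "PROT_A": [900.0, 600.0, 300.0, 10.0],
    "PROT_B": [100.0, 60.0],
}
amounts = absolute_amounts(intensities, standard_name="STD", standard_fmol=50.0)
print(amounts)
```

Ranking the resulting amounts is one way a "top 50 most abundant proteins" view could be produced.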

Novel Hypothesis Generation Using MSE and Absolute Quantitation

Figure: Sypro Orange stained gel (marker labels 75 and 25)

Discussion Points

• Software is generally undervalued with respect to how critical it is for success

• Robust analytical workflows and well-planned experiments (and/or well-curated clinical cohorts) are certainly a winning combination

• Vendor collaboration helps to decrease the time in which new developments can have impact

Key Colleagues and Funding Sources

• Duke Proteomics Core Facility

– Arthur Moseley, Director

– Laura Dubois

– Erik Soderblom

– Meredith Turner

• HCV Project Team

– John McHutchison, PI

– Jeanette McCarthy, co-PI

– Joe Lucas

– Keyur Patel

• Duke Eye Institute

– Vadim Arshavsky, PI

– Nikolai P. Skiba

– Boris Reidel

• Duke Department of Molecular Genetics & Microbiology

– Raphael Valdivia, PI

– Alex Saka

• Industry Colleagues

– Waters Corporation

• Scott Geromanos

• Martha Stapels

• Keith Fadgen

• Jim Langridge

– Rosetta Biosoftware

• Andrey Bondenrenko

• Cindy Chepanoske

• Jon Karakowski

• Andy Keller

• Funding

– Duke School of Medicine

• Sally Kornbluth, Vice-Dean of Research

– Duke Translational Research Institute

• Victoria Christian, COO DTRI

– Duke Comprehensive Cancer Center

– MURDOCK Study (DHMRI)

– NCRR Grant Number 1UL1 RR024128-01 (CTSA)