Statistical Consideration for Identification and Quantification in Top-Down Proteomics Richard LeDuc...

Statistical Consideration for Identification and Quantification in Top-Down Proteomics

Richard LeDuc, National Center for Genome Analysis Support

Discovery Omics with Top-Down Proteomics

Acknowledgements

• Leonid Zamdborg
• Shannee Babai
• Bryon Early
• Ian Spauling
• Kevin Glowacz
• Eric Bluhm
• Vinayak Viswanathan
• Yong-Bin Kim
• Ryan Fellers
• Tom Januszyk
• Brian Cis
• Chris Strouse
• Seyoung Sohn
• Greg Taylor
• Joe Sola
• Lee Bynum
• Andrew Birck

All the other members of the Kelleher Research Group (KRG) who have contributed insights over the years.

Drs. Neil Kelleher, Paul Thomas, and Andy Forbes, and the ProSight Development Team (past and present)

Yury Bukhman, James McCurdy, Adam Halstead, Irene Ong (Area 3), Mary Lipton (PNNL), Kathryn Richmond (Enabling Technologies) and others

Proteomics Core: Reid Townsend, Petra Gilmore, Cheryl Lichti, James Malone, Alan Davis

Michael Gross (NCRR Mass Spec), Henry Rohrs (NCRR Mass Spec), Ron Bose (Oncology), Mike Boyne (FDA), Jeffry Hiken (Genetics)

Le-Shin Wu, Carrie Ganote, Tom Doak, Bill Barnett, and a cast of thousands

Limbrick Laboratory: David Limbrick, Diego Morales

Holtzman Laboratory: David Holtzman, Rick Perrin, Jacqueline Payton, Chengjie Xiong (Biostatistics)

National Center for Genome Analysis Support

Washington Univ. School of Medicine

The Kelleher Research Group

Differential Omics Studies

1. RNA-seq, bottom-up proteomics, metabolomics

2. Looking for a list of discovered entities that have different expression levels between treatments

3. Very popular for target discovery

4. Frequently done on organisms whose genome has not yet been completed

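“P-score”:

$$P_{f,n} = \frac{(xf)^n \, e^{-xf}}{n!}$$

where f is the number of input fragment ions,

n is the number of matches,

x is the probability that a fragment ion matches by chance, which is set by Ma, the mass accuracy.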


F. Meng, B. Cargile, L. Miller, J. Johnson, and N. Kelleher, Nat. Biotechnol., 2001, 19, 952-957.
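The P-score is easy to compute directly. The sketch below is a minimal illustration, assuming the Poisson form in which the number of chance fragment matches has mean xf, so that P = (xf)^n e^(-xf) / n!. The function name `p_score` and the example value x = 0.01 are hypothetical, and the step that derives x from the mass accuracy is deliberately left out.

```python
import math


def p_score(f: int, n: int, x: float) -> float:
    """Poisson probability of exactly n chance matches among f fragment ions.

    x is an assumed per-fragment probability of a random match (in practice
    it depends on the mass accuracy); lam = x * f is the expected number of
    chance matches.
    """
    lam = x * f
    return lam ** n * math.exp(-lam) / math.factorial(n)


# Hypothetical query: 20 fragment ions, random-match probability 0.01.
for matches in (1, 4, 8):
    print(matches, p_score(20, matches, 0.01))
```

With only 0.2 matches expected by chance, each additional observed match shrinks the probability sharply (8 matches gives about 5.2e-11), which is why a smaller P-score indicates a more confident identification.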

“Kelleher P-Score” Example

Modeling the Scrambled P-Scores

Motivation Goodness of Fit

9,839 MS/MS Queries (MS1 and MS2 data)

Better is better, but the easy ones are easy

Computers Ask the Darndest Questions

Top-Down Proteomics!

• Three pillars of proteomics:
  • Identification
  • Characterization
  • Quantification

• Top down proteomic studies are underway.

• These are large and complex studies (at several institutions, a typical production bottom-up study would have 200+ LC runs).

Top-Down Proteomics Biometrics: Sources of Variation

1. Intensity calculation

2. LC alignment

3. Mass Spec Physics

4. Separation (different fractions, etc.)

5. Protein isolation (ChIP, RBC ghosts, etc.)

6. Tissue variation

7. Individual variation

8. Population variation

9. Random and systematic errors

Experimental Design

• Ronald A. Fisher (1926): “The Arrangement of Field Experiments”

• All measurements have errors.

• All biological systems have individual variation.

• The goal of experimental design is to structure the experiment so that the total variation can be partitioned into its sources.

• Typically, the variation between groups is tested against the variation within groups.

Figure: example design with a Healthy Group and a Diseased Group, each containing two subjects (Sub 1, Sub 2) and replicate runs R1 to R8.
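Testing between-group variation against within-group variation is what the one-way ANOVA F statistic does. The sketch below is a minimal plain-Python version; the function name and the intensities are hypothetical. It treats every run as an independent observation, whereas the design pictured here nests runs within subjects, so a full analysis would model the subject level explicitly.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical log intensities for one protein.
healthy = [10.1, 9.8, 10.3, 10.0]
diseased = [11.2, 11.5, 10.9, 11.4]
print(one_way_anova_f([healthy, diseased]))
```

A large F means the groups differ by more than the run-to-run scatter would explain; it is compared against an F distribution with (df_between, df_within) degrees of freedom.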

Typical Results: Human RBC Ghosts

Figure: Coomassie-stained gel of control and PNH samples (lanes 1 to 13; molecular-weight markers at 250, 150, 100, 75, 50, and 37 kDa), with the catalase and peroxiredoxin bands labeled.

Populations of Experiments

• Instead of doing one experiment, you are doing an unknown number of experiments.

• The number of experiments is determined by how many unique entities are observed consistently over the entire set of observations.
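Because each consistently observed entity is its own test, the number of tests m is only known once the data are in, and the significance threshold has to adapt to it. One common way to do this, offered here as an assumption rather than the method used in the talk, is the Benjamini-Hochberg step-up procedure, which controls the false discovery rate at level q. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Find the largest rank r whose sorted p-value satisfies p_(r) <= r * q / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis up to and including that rank.
    return sorted(order[:k_max])


# Hypothetical per-entity p-values from a differential comparison.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```

With these ten p-values and q = 0.05, the procedure keeps the first two, while a Bonferroni cutoff of 0.05 / 10 = 0.005 would keep only the first. Adding more observed entities raises m and tightens both thresholds.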


Typical Results: Breast Cancer Model

Sources of Variation: The Model

$$I_{ijkl} = \mu + a_i + d_{j(i)} + r_{k(ij)} + e_{ijkl}$$

where $I_{ijkl}$ is the observed intensity and $\mu$ is the overall mean,

$i$ = 1 or 2 and represents the two preparations,

$j$ = 1 to 3 for each digestion within a given preparation,

$k$ = 1 to 3 for each injection (or run) within each digestion,

$l$ = 1 to the number of peptides for the given protein.

Under this model, let

$a_i$ be the effect for the $i$th random preparation,

$d_{j(i)}$ be the effect for the $j$th random digestion from the $i$th preparation,

$r_{k(ij)}$ be the effect for the $k$th random run from the $j$th digestion of the $i$th preparation,

$e_{ijkl}$ be the residual.

Variance Component Estimates
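Variance components for the nested model I = μ + a_i + d_j(i) + r_k(ij) + e_ijkl can be estimated with the classical ANOVA (method-of-moments) approach: compute the mean square at each level, equate it to its expected value, and solve for the four components. The sketch below assumes a balanced design (the same number of digestions per preparation, runs per digestion, and observations per run); the function name and data layout are hypothetical, and unbalanced data are usually handled with REML instead.

```python
from statistics import mean


def nested_variance_components(y):
    """ANOVA estimates for a balanced three-stage nested random-effects model.

    y[i][j][k][p] is observation p in run k of digestion j of preparation i.
    """
    a, b, c, n = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    run_means = [[[mean(run) for run in dig] for dig in prep] for prep in y]
    dig_means = [[mean(runs) for runs in prep] for prep in run_means]
    prep_means = [mean(digs) for digs in dig_means]
    grand = mean(prep_means)  # equals the overall mean because the design is balanced

    ss_prep = b * c * n * sum((m - grand) ** 2 for m in prep_means)
    ss_dig = c * n * sum((dig_means[i][j] - prep_means[i]) ** 2
                         for i in range(a) for j in range(b))
    ss_run = n * sum((run_means[i][j][k] - dig_means[i][j]) ** 2
                     for i in range(a) for j in range(b) for k in range(c))
    ss_res = sum((y[i][j][k][p] - run_means[i][j][k]) ** 2
                 for i in range(a) for j in range(b)
                 for k in range(c) for p in range(n))

    ms_prep = ss_prep / (a - 1)
    ms_dig = ss_dig / (a * (b - 1))
    ms_run = ss_run / (a * b * (c - 1))
    ms_res = ss_res / (a * b * c * (n - 1))

    # Expected mean squares:
    #   E[ms_res]  = s2_e
    #   E[ms_run]  = s2_e + n*s2_r
    #   E[ms_dig]  = s2_e + n*s2_r + c*n*s2_d
    #   E[ms_prep] = s2_e + n*s2_r + c*n*s2_d + b*c*n*s2_a
    # Solving gives the estimates below; negative values are truncated at 0.
    return {
        "preparation": max(0.0, (ms_prep - ms_dig) / (b * c * n)),
        "digestion": max(0.0, (ms_dig - ms_run) / (c * n)),
        "run": max(0.0, (ms_run - ms_res) / n),
        "residual": ms_res,
    }


# Synthetic data: only the two preparations differ, plus a fixed +/-0.1 scatter.
y = [[[[v - 0.1, v + 0.1] for _ in range(2)] for _ in range(2)] for v in (0.0, 1.0)]
print(nested_variance_components(y))
```

In the synthetic example all of the between-run and between-digestion spread is zero, so the estimates attribute the variation to preparations and to the residual only.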

Power Calculations

Figure: Power Curves for High Subject, Low Residual Variation (power versus effect size in standard deviations, 0 to 2.5)

Inbred Mice
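Power curves of this kind can be approximated with a normal approximation to the two-sided, two-sample test, with the effect size expressed in standard deviations. The sketch assumes equal group sizes and uses hypothetical function names. The helper `effective_sd` shows why subject-to-subject variation matters: averaging replicate runs shrinks only the residual part of the variance, so a population with high subject variation gains little from extra runs, whereas inbred animals, with low subject variation, gain more.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_sd * sqrt(n_per_group / 2)
    # Probability the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def effective_sd(var_subject, var_residual, runs_per_subject):
    """SD of a subject's mean over replicate runs; subject variance is not averaged away."""
    return sqrt(var_subject + var_residual / runs_per_subject)


# Hypothetical: a 1-unit difference, 8 subjects per group, 1 vs. 4 runs each.
for runs in (1, 4):
    sd = effective_sd(var_subject=1.0, var_residual=0.25, runs_per_subject=runs)
    print(runs, two_sample_power(1.0 / sd, 8))
```

At an effect size of zero the function returns alpha, and power rises toward 1 as either the effect size or the number of subjects per group grows.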

Systems Analysis

• What to do with the laundry lists of significant genes?
  • Gene Ontology Analysis
  • Gene Set Enrichment Analysis

• Often paired with RNA or metabolomic data.

• Creates a third level of analysis
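Gene Ontology analysis of a significant-entity list is often run as an over-representation test. A minimal sketch, assuming the one-sided hypergeometric test (equivalent to a one-sided Fisher exact test) rather than any particular tool: given N measured entities, K of them annotated to a term, and n significant entities of which k carry the annotation, the p-value is P(X >= k). The function name and counts are hypothetical. Gene Set Enrichment Analysis works differently, scoring the full ranked list instead of a cutoff list.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N total, K annotated, n drawn)."""
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical: 2,000 proteins observed, 40 annotated to a term,
# 100 significant, 8 of which carry the annotation (about 2 expected by chance).
print(enrichment_p(8, 100, 40, 2000))
```

In practice one such p-value is computed per term, so the resulting list is itself a multiple-testing problem.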

To Review

• Everything is in place for top-down proteomic studies.

• In any discovery omic study, extreme care must be taken: do substantial pilot work to understand the behavior of your analytic system.

• Technology and mathematical formalism do not trump biology: bad experimental design results in bad experiments.

• Funded by the National Science Foundation

1. Large-memory clusters for assembly

2. Bioinformatics consulting for biologists

3. Optimized software for better efficiency

• Partner institutions:
  • Extreme Science and Engineering Discovery Environment (XSEDE)
  • Texas Advanced Computing Center (TACC) at the University of Texas at Austin
  • San Diego Supercomputer Center (SDSC) at the University of California, San Diego
  • Pittsburgh Supercomputing Center (PSC)

• Open for business at: http://ncgas.org

Questions?
