Proteomic Characterization of Alternative Splicing and Coding Polymorphism

Page 1: Proteomic Characterization of Alternative Splicing and Coding Polymorphism

Proteomic Characterization of Alternative Splicing and Coding Polymorphism

Nathan Edwards

Center for Bioinformatics and Computational Biology

University of Maryland, College Park

Page 2

Why don’t we see more novel peptides?

Tandem mass spectrometry doesn’t discriminate against novel peptides...

...but protein sequence databases do!

Searching traditional protein sequence databases biases the results towards well-understood protein isoforms!
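The bias described above can be illustrated with a toy database search. This is a minimal sketch, not the method used by any particular search engine: the protein sequences, the function names (`peptide_mass`, `tryptic_peptides`, `candidates`) and the 0.02 Da tolerance are all hypothetical, and the digestion ignores missed cleavages and the proline rule. A spectrum from a variant peptide finds no candidate unless the variant sequence is in the database.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(protein):
    """Cleave after K or R (simplified: no missed cleavages, no proline rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def candidates(precursor_mass, database, tol=0.02):
    """Database peptides whose mass matches the precursor within tol (Da)."""
    return [
        pep
        for protein in database
        for pep in tryptic_peptides(protein)
        if abs(peptide_mass(pep) - precursor_mass) <= tol
    ]


reference_db = ["MGAVLSKTEAPDGR"]   # hypothetical reference isoform
variant_protein = "MGAVLSKTEPPDGR"  # hypothetical A->P coding variant
observed = peptide_mass("TEPPDGR")  # precursor mass of the variant peptide

print(candidates(observed, reference_db))                      # []
print(candidates(observed, reference_db + [variant_protein]))  # ['TEPPDGR']
```

The spectrum itself is equally good in both searches; only the content of the database decides whether the variant peptide can be reported.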

Page 3

What goes missing?

Known coding SNPs

Novel coding mutations

Alternative splicing isoforms

Alternative translation start-sites

Microexons

Alternative translation frames

Page 4

Why should we care?

Alternative splicing is the norm!
• Only 20-25K human genes
• Each gene makes many proteins

Proteins have clinical implications
• Biomarker discovery

Evidence for SNPs and alternative splicing stops with transcription
• Genomic assays, ESTs, mRNA sequence
• Little hard evidence for translation start site

Page 5

Novel Splice Isoform

Human Jurkat leukemia cell-line
• Lipid-raft extraction protocol, targeting T cells
• von Haller, et al. MCP 2003.

LIME1 gene:
• LCK interacting transmembrane adaptor 1

LCK gene:
• Leukocyte-specific protein tyrosine kinase
• Proto-oncogene
• Chromosomal aberration involving LCK in leukemias.

Multiple significant peptide identifications

Page 7

Novel Splice Isoform

Page 8

Novel Mutation

HUPO Plasma Proteome Project
• Pooled samples from 10 male & 10 female healthy Chinese subjects
• Plasma/EDTA sample protocol
• Li, et al. Proteomics 2005. (Lab 29)

TTR gene
• Transthyretin (pre-albumin)
• Defects in TTR are a cause of amyloidosis.
• Familial amyloidotic polyneuropathy
  • late-onset, dominant inheritance

Page 9

Novel Mutation

Ala2→Pro associated with familial amyloid polyneuropathy
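A short worked example of why such a substitution is invisible to a reference-only search: replacing alanine with proline changes the peptide mass by a fixed amount, so the precursor no longer matches the reference peptide. The sketch below assumes standard monoisotopic residue masses and an otherwise unmodified peptide; the variable names are illustrative only.

```python
# Monoisotopic residue masses (Da); standard values, not taken from the slides.
ALA = 71.03711
PRO = 97.05276

# Mass shift of the whole peptide caused by a single Ala -> Pro substitution.
delta = PRO - ALA

# The observed m/z shift shrinks with the precursor charge state.
for charge in (1, 2, 3):
    print(f"charge {charge}+: m/z shift {delta / charge:.3f}")
```

A shift of about 26 Da is far outside any realistic precursor mass tolerance, so the variant peptide is only identified if its sequence is present in the searched database.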

Page 10

Novel Mutation

Page 11

Searching Expressed Sequence Tags (ESTs)

Pros

No introns!

Primary splicing evidence for annotation pipelines

Evidence for dbSNP

Often derived from clinical cancer samples

Cons

No frame

Large (8 Gb)

“Untrusted” by annotation pipelines

Highly redundant

Nucleotide error rate ~ 1%
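To put the ~1% nucleotide error rate in perspective, the sketch below estimates how often a stretch of EST sequence is read without any error. It assumes errors are independent and uniformly distributed, which is a simplification, and uses a window of 30 amino acids (90 nucleotides) purely for illustration; `error_free_probability` is a hypothetical helper.

```python
def error_free_probability(n_nucleotides, error_rate=0.01):
    """Probability that n consecutive nucleotides contain no sequencing error,
    assuming independent errors at a fixed per-base rate."""
    return (1.0 - error_rate) ** n_nucleotides


# A 30 amino-acid window spans 90 nucleotides.
p = error_free_probability(90)
print(f"{p:.2f}")  # 0.40
```

Under these assumptions well over half of the 90-nucleotide windows from a single EST carry at least one error, which is one reason a sequence seen in only one EST deserves less trust than one confirmed by several.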

Page 12

Compressed EST Peptide Sequence Database

For all ESTs mapped to a UniGene gene:
• Six-frame translation
• Eliminate ORFs < 30 amino-acids
• Eliminate amino-acid 30-mers observed once
• Compress to C2 FASTA database
  • Complete, Correct for amino-acid 30-mers

Gene-centric peptide sequence database:
• Size: 223 Mb vs 8 Gb, 20774 FASTA entries
• Running time: 15 mins vs 22 hours
• E-values: 50-fold reduction

Download:
• http://www.umiacs.umd.edu/~nedwards
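The first three steps of the construction above can be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline: an "ORF" is taken to be any stop-to-stop stretch of a translated frame, a 30-mer counts as "observed" once per EST that contains it, and the function names are hypothetical. The final compression to a C2 FASTA database is not shown.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nuc):
    """Translate one reading frame; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(nuc[i:i + 3], "X")
                   for i in range(0, len(nuc) - 2, 3))


def six_frame_orfs(est, min_len=30):
    """Yield stop-to-stop stretches of at least min_len residues, all six frames."""
    est = est.upper()
    reverse = est.translate(COMPLEMENT)[::-1]
    for strand in (est, reverse):
        for frame in range(3):
            for orf in translate(strand[frame:]).split("*"):
                if len(orf) >= min_len:
                    yield orf


def repeated_kmers(ests, k=30, min_count=2):
    """Amino-acid k-mers supported by at least min_count ESTs."""
    counts = Counter()
    for est in ests:
        seen = set()
        for orf in six_frame_orfs(est, min_len=k):
            seen.update(orf[i:i + k] for i in range(len(orf) - k + 1))
        counts.update(seen)  # each EST contributes at most once per k-mer
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Toy usage with a hypothetical 93-nucleotide EST seen twice.
est = "GCT" * 31
print(len(repeated_kmers([est])))       # 0: nothing is confirmed by a second EST
print(len(repeated_kmers([est, est])))  # 5 distinct 30-mers across the six frames
```

Requiring a second observation trades a little sensitivity for a large reduction in sequencing-error artifacts, which is what makes the resulting database small enough for routine searching.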

Page 13

Back to the lab...

Current LC/MS/MS workflows identify a few peptides per protein
• ...not sufficient for protein isoforms

Need to raise the sequence coverage to (say) 80%
• ...protein separation prior to LC/MS/MS analysis
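Sequence coverage here means the fraction of a protein's residues that fall inside at least one identified peptide. A minimal sketch of that calculation, using a hypothetical protein and peptide list and a hypothetical `sequence_coverage` helper:

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in protein covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


protein = "MGAVLSKTEAPDGRLLNEK"  # hypothetical 19-residue protein
one_peptide = sequence_coverage(protein, ["TEAPDGR"])
two_peptides = sequence_coverage(protein, ["MGAVLSK", "TEAPDGR"])
print(f"{one_peptide:.0%} {two_peptides:.0%}")
```

With only one or two peptides per protein, most residues are never observed, so a splice junction or substitution elsewhere in the sequence cannot be confirmed or ruled out.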

Page 14

Future informatics directions...

Combine results from multiple searches from multiple engines

Fast, automated triage of “significant false-positive” peptide identifications

Compressed EST peptide sequence database for other species
• Mouse, Rat, Zebrafish, Chicken, Cow, A. thaliana, ??

Relational database and web-application infrastructure
• Interactive browser data-grid, flexible web-services export
• Java Applet MS/MS viewers, GFF for Genome Browser

Page 15

Conclusions

Peptides identify more than just proteins
• Untapped source of disease biomarkers
• Functional vs silencing variants

Compressed peptide sequence databases make routine EST searching feasible

Statistically significant peptide identification is only the first step

Page 16

Acknowledgements

Catherine Fenselau, Steve Swatkoski
• UMCP Biochemistry

Chau-Wen Tseng, Xue Wu
• UMCP Computer Science

Cheng Lee
• Calibrant Biosystems

PeptideAtlas, HUPO PPP, X!Tandem

Funding: NCI