Posted on 22-May-2015
Identifying cancer selective proteins
Martin McIntosh
Computational Biology Program
Fred Hutchinson Cancer Research Center
Background
• A variety of alterations in cancer may result in cells encoding proteins or
polypeptides not observed in normal somatic tissues.
• They may be derived from cancer-related changes in genomes, splicing, post-translational modifications, etc.
• These unique disease-related products may be useful for a variety of translational goals, including:
– Therapy: specific targeting of disease tissues.
– Diagnosis: circulating markers or targets for nanotechnology-based imaging.
• I am going to talk about how we are trying to find these products, and
implore people (NCI? Others?) to help out.
How we are looking for neoantigen candidates: start with RNA-seq.
Central dogma
• Un-annotated does not mean interesting: 15% of the splicing events we see in somatic tissues are un-annotated.
• Annotated does not mean unimportant: cancer tissues make up a large share of the EST databases, so an annotated transcript may itself be cancer-derived.
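A figure like "15% un-annotated" comes from comparing observed splice junctions against an annotation set. A minimal sketch of that comparison, assuming junctions are represented as exact (chromosome, donor, acceptor, strand) tuples; the coordinates below are made up and are not the data behind the 15% figure.

```python
# Hypothetical junction representation: (chromosome, donor, acceptor, strand).
Junction = tuple[str, int, int, str]


def unannotated_fraction(observed: set[Junction], annotated: set[Junction]) -> float:
    """Fraction of observed splice junctions that are absent from the annotation set."""
    if not observed:
        return 0.0
    novel = observed - annotated  # junctions seen in the data but not annotated
    return len(novel) / len(observed)


# Made-up example: one of four observed junctions uses an un-annotated acceptor.
annotated = {("chr1", 1000, 2000, "+"), ("chr1", 2100, 3000, "+"), ("chr2", 500, 900, "-")}
observed = {("chr1", 1000, 2000, "+"), ("chr1", 2100, 3000, "+"),
            ("chr1", 2100, 3004, "+"), ("chr2", 500, 900, "-")}
print(unannotated_fraction(observed, annotated))  # 0.25
```

Exact tuple matching is the simplest choice; a real pipeline may need to tolerate small coordinate differences or strand ambiguity.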
What do we know about the human transcript repertoire?
[Chart legend: cancer, normal; groupings: few samples, more samples]
Tissue      Normal    Cancer
brain       666467     37798
testis      165655      1059
placenta    153235         4
eye          82100         0
spleen       75504         0
uterus       70546     35040
blood        69245     24036
kidney       63980     30706
lung         63495     32601
thymus       62142         0
pancreas     59037     25447
muscle       55891      9730
heart        53531         0
liver        52532     36124
prostate     43049     11959
ovary         8413     26755
UCSC EST Libraries (those that map to human tissues): Characterized by organ/tissue and development stage.
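One way to read the table is as a per-tissue cancer share, cancer / (normal + cancer). A small sketch using only the numbers in the table, assuming both columns count ESTs; the dictionary name and helper are illustrative.

```python
# Normal and cancer counts per tissue, copied from the table above
# (assumed here to be EST counts).
EST_COUNTS = {
    "brain": (666467, 37798),
    "testis": (165655, 1059),
    "placenta": (153235, 4),
    "eye": (82100, 0),
    "spleen": (75504, 0),
    "uterus": (70546, 35040),
    "blood": (69245, 24036),
    "kidney": (63980, 30706),
    "lung": (63495, 32601),
    "thymus": (62142, 0),
    "pancreas": (59037, 25447),
    "muscle": (55891, 9730),
    "heart": (53531, 0),
    "liver": (52532, 36124),
    "prostate": (43049, 11959),
    "ovary": (8413, 26755),
}


def cancer_share(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Fraction of each tissue's counts that come from cancer libraries."""
    return {tissue: cancer / (normal + cancer)
            for tissue, (normal, cancer) in counts.items()}


shares = cancer_share(EST_COUNTS)
# Print tissues from most to least cancer-dominated.
for tissue, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tissue:<10}{share:6.3f}")
```

Ovary comes out on top (about 0.76), the only tissue where cancer counts exceed normal counts, while eye, spleen, thymus, and heart have no cancer counts at all.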
Example of a putative “novel” protein
Left: a four-nucleotide extension and an alternate exon in SF1, which together cause
a frame shift that still preserves the stop codon in the terminal exon. Right: confirmation
of the spectra by comparing the tumor spectrum (red) to a synthetic spectrum (blue). Confirmed by
sequencing.
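The frame arithmetic behind such an event is simple: each insertion or deletion of n coding nucleotides moves the reading frame by n mod 3. A minimal sketch, assuming the two events act as successive length changes; the second length (8 nt) is hypothetical and is not the actual SF1 exon length.

```python
from itertools import accumulate


def frame_offsets(length_changes: list[int]) -> list[int]:
    """Reading-frame offset (0, 1 or 2) relative to the reference frame after
    each successive insertion (+n) or deletion (-n) of coding nucleotides.
    An offset of 0 means downstream codons, including the original stop
    codon, are read in the reference frame again."""
    return [total % 3 for total in accumulate(length_changes)]


# Hypothetical example: a 4-nt extension (+4) followed by a change adding 8 nt.
print(frame_offsets([4, 8]))  # [1, 0]: shifted in between, back in frame afterwards
print(frame_offsets([4]))     # [1]: the extension alone leaves the frame shifted
```

Under this assumption the peptide between the two events is read in a new frame (a candidate novel sequence), while the protein still terminates at the annotated stop codon.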
Why not use MS proteomics?
• MS/MS is a matching technology: spectra can only be matched to sequences we already know to look for.
• Low sensitivity compared to RNA-seq.
• Low coverage per protein identified.
• Biology gets in the way: exon-exon boundaries are frequently cut by trypsin.
Cancer-selective splicing events across disease sites
Figure 2: (Left) Clustering of prevalent and abundant cancer-selective transcripts observed in ovarian cancer tissues together with known cancer/testis (CT) antigens; a subset of the 112 known tumor-selective transcripts
identified. (Right) A tandem 3’ splice site with a NAGNAG motif in BRCA1, observed in ovarian
(top) and prostate (bottom) cancer and in normal testis, but in no other normal or control RNA-Seq data or
normal ESTs.
The right panel shows the splicing viewer my group (Damon May) developed within IGV (Broad Institute).
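In a NAGNAG motif the spliceosome can use either AG as the 3’ splice site, so the two alternative exon starts differ by exactly 3 nt (one codon) and the reading frame is preserved. A minimal sequence-scanning sketch, assuming a plain genomic string around the intron-exon boundary; real detection would rely on junctions supported by RNA-seq reads rather than motif scanning alone.

```python
import re


def nagnag_acceptor_pairs(seq: str) -> list[tuple[int, int]]:
    """Find NAGNAG motifs and return the two candidate exon-start positions
    (0-based index of the first exonic base after each AG)."""
    pairs = []
    # Zero-width lookahead so overlapping motifs are all reported.
    for m in re.finditer(r"(?=[ACGT]AG[ACGT]AG)", seq.upper()):
        s = m.start()
        pairs.append((s + 3, s + 6))  # exon may start after the first or second AG
    return pairs


seq = "TTTCAGCAGGTC"  # made-up intron end followed by exon start
for upstream, downstream in nagnag_acceptor_pairs(seq):
    # Using the upstream AG adds one extra codon (here CAG) to the exon.
    print(upstream, downstream, seq[upstream:downstream], (downstream - upstream) % 3 == 0)
```

For the made-up sequence this reports the pair (6, 9): choosing the upstream AG includes the codon CAG, choosing the downstream AG skips it.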
Many changes do not result in new coding sequence
How we are trying to improve the pipeline.
Specificity for tumor cells:
• Many putative coding sequences may be un-annotated transcripts belonging to infiltrating (non-tumor) cells.
• We are creating single-cell suspensions, separating tumor cells from other cells, and sequencing each component.
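Once each component is sequenced, one simple filter keeps transcripts expressed in the tumor-cell fraction but essentially absent from the non-tumor fraction. A minimal sketch; the expression units (e.g. TPM), the thresholds, and the transcript IDs are all hypothetical.

```python
def tumor_selective(expr_tumor: dict[str, float], expr_other: dict[str, float],
                    min_tumor: float = 5.0, max_other: float = 0.5) -> list[str]:
    """Transcript IDs expressed at >= min_tumor in the tumor-cell fraction and
    at <= max_other in the non-tumor (infiltrating) fraction.
    Thresholds are illustrative assumptions, not validated cut-offs."""
    return sorted(tx for tx, level in expr_tumor.items()
                  if level >= min_tumor and expr_other.get(tx, 0.0) <= max_other)


# Hypothetical expression values for three transcripts in the two fractions.
tumor = {"tx1": 12.0, "tx2": 8.0, "tx3": 1.0}
other = {"tx1": 0.1, "tx2": 6.0}
print(tumor_selective(tumor, other))  # ['tx1']
```

Here tx2 is dropped because the infiltrating cells also express it, and tx3 because it is too weakly expressed in the tumor fraction.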
[Figure, panels A to C: polysome profile derived from the Ovcar3 cell line. Separation following sucrose ultracentrifugation; number of ribosomes bound, as measured by optical readout. Peaks labeled 40S, 60S, 80S, 2 to 9 ribosomes per transcript, and 120S.]
How we are trying to improve the pipeline.
Specificity for coding sequences
• Enrich for mRNAs undergoing active translation.
• Capture polysome-bound transcripts.
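Given reads assigned to fractions by number of bound ribosomes, two simple per-transcript summaries are the share of reads in polysome fractions and a read-weighted mean ribosome load. A minimal sketch, assuming read counts per fraction are comparable across fractions (no normalization is shown); the counts are hypothetical.

```python
def polysome_share(counts: dict[int, float], min_ribosomes: int = 2) -> float:
    """Share of a transcript's reads in fractions with >= min_ribosomes ribosomes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for n, c in counts.items() if n >= min_ribosomes) / total


def mean_ribosome_load(counts: dict[int, float]) -> float:
    """Read-weighted average number of ribosomes per transcript."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n * c for n, c in counts.items()) / total


# Hypothetical read counts for one transcript, keyed by ribosomes bound
# (0 = free RNA / subunits, 1 = 80S monosome, 2+ = polysomes).
counts = {0: 10, 1: 30, 2: 20, 3: 20, 4: 20}
print(polysome_share(counts))      # 0.6
print(mean_ribosome_load(counts))  # 2.1
```

A transcript with a high polysome share is a better candidate for active translation than one that sits mostly in free or monosome fractions.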
Result from one mouse pool (mouse heart): beta-actin, including an annotated exon known to be selected for nonsense-mediated decay (NMD). This raises an epistemological issue for proteomics researchers:
What exactly do we mean by a protein-coding gene?
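A commonly cited heuristic for NMD is the "50 to 55 nt rule": a stop codon lying more than about 50 nt upstream of the last exon-exon junction tends to trigger decay. A minimal sketch of that rule, assuming transcript coordinates and exon lengths are known; the threshold and example exon structure are illustrative, and the rule has known exceptions.

```python
def predicts_nmd(exon_lengths: list[int], stop_codon_start: int,
                 threshold: int = 50) -> bool:
    """Apply the simplified 50-nt rule.
    exon_lengths: exon lengths in transcript order.
    stop_codon_start: 0-based transcript coordinate of the stop codon's first base."""
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule does not apply
    last_junction = sum(exon_lengths[:-1])  # where the last exon begins
    stop_end = stop_codon_start + 3
    return last_junction - stop_end > threshold


exons = [200, 150, 120, 300]  # hypothetical transcript; last junction at 470
print(predicts_nmd(exons, 100))  # True: stop far upstream of the last junction
print(predicts_nmd(exons, 440))  # False: within 50 nt of the junction
print(predicts_nmd(exons, 500))  # False: stop codon in the last exon
```

An exon that introduces such a premature stop can be "annotated" and still rarely yield protein, which is part of why ribosome association alone is an ambiguous signal.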
Non-coding RNA (Malat1) found in mouse heart, prominent in the fractions with 2 or 3 ribosomes. We are interested in looking at ribosome footprinting.
Is it really sufficient that we see ribosomes?
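One signal often used to separate translation from mere ribosome association is 3-nt periodicity: footprints from translating ribosomes pile up in one frame of the ORF. A minimal sketch, assuming P-site positions have already been derived from read 5’ ends with a read-length-specific offset (not shown); the positions and ORF start are hypothetical.

```python
from collections import Counter


def frame_fractions(p_site_positions: list[int], orf_start: int) -> list[float]:
    """Fraction of footprint P-sites in frames 0, 1 and 2 relative to the
    start codon of a candidate ORF (positions upstream of the ORF are ignored)."""
    frames = Counter((p - orf_start) % 3 for p in p_site_positions if p >= orf_start)
    total = sum(frames.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [frames[f] / total for f in range(3)]


# Hypothetical footprints: mostly in frame 0 suggests active translation...
print(frame_fractions([100, 103, 106, 109, 110, 112], 100))
# ...while an even spread across frames gives no evidence of translation.
print(frame_fractions([100, 101, 102, 103, 104, 105], 100))
```

A strong frame-0 bias over a candidate ORF is more convincing than ribosome occupancy alone, which is the question the slide raises for Malat1.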
Summary
• Who cares about millions of genomes?
• Genomes look to me like an engineering problem and not really a research problem.
• Relying on changes in proteins derived solely from changes in cancer genomes (e.g., mutations) may not provide a large number of putative candidates.
• MS proteomics does not work well enough; RNA-seq works too well.
• We need someone to begin to better characterize the nucleotides contained in somatic tissues.
Credit
• People who did the work:
– Matt Fitzgibbon (Computational lead).
– Nigel Clegg (visual curation and EST database).
– Damon May (IGV Visual curation).
– Lindsay Bergen (all laboratory work).
• Funding:
– NCI in-Silico Center of Excellence (No. HHSN261200800001E).
– Canary Foundation.
– Illumina
• Thanks:
– Vivian MacKay (UW Biochem), polysome fractionation.
– Nicole Urban, Chuck Drescher, FHCRC Ovarian SPORE.