Dr. Martin McIntosh: Identifying Cancer Selective Proteins Using RNA-Sequencing and Bioinformatics...

Identifying cancer selective proteins

Martin McIntosh

Computational Biology Program

Fred Hutchinson Cancer Research Center

Background

• A variety of alterations in cancer may result in cells encoding proteins or polypeptides not observed in normal somatic tissues.

• They may be derived from cancer-related changes in genomes, splicing, post-translational modifications, etc.

• These unique disease-related products may be useful for a variety of translational goals, including:

– Therapy: specific targeting of disease tissues.

– Diagnosis: circulating markers or targets for nanotechnology-based imaging.

• I am going to talk about how we are trying to find these products, and implore people (NCI? Others?) to help out.

How we are looking for neoantigen candidates: start with RNA-seq.

Central dogma

• Un-annotated does not mean it is interesting: 15% of splicing events we see in somatic tissues are un-annotated.

• Annotated != unimportant: the EST databases are heavily biased toward cancer tissues.
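The two bullets above can be read as a pair of set comparisons: one against a reference annotation, one against normal-tissue data. The sketch below is only an illustration of that reading. The junction tuples are hypothetical placeholders rather than real coordinates, and the function names are made up for this example.

```python
from typing import Iterable, Set, Tuple

# A splice junction as (chromosome, donor position, acceptor position, strand).
Junction = Tuple[str, int, int, str]


def unannotated_fraction(observed: Iterable[Junction],
                         annotated: Set[Junction]) -> float:
    """Share of distinct observed junctions that are missing from the annotation."""
    distinct = set(observed)
    if not distinct:
        return 0.0
    return len(distinct - annotated) / len(distinct)


def absent_from_normal(tumor: Iterable[Junction],
                       normal: Iterable[Junction]) -> Set[Junction]:
    """Junctions seen in tumor data but never in normal-tissue data,
    regardless of whether they are annotated."""
    return set(tumor) - set(normal)


# Hypothetical junctions, for illustration only.
annotated = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")}
normal = [("chr1", 100, 200, "+"), ("chr1", 500, 600, "+")]
tumor = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 700, 800, "+")]

print(unannotated_fraction(normal, annotated))
print(sorted(absent_from_normal(tumor, normal)))
```

In this toy data, one junction absent from normal tissue is annotated and one is not, which is why annotation status alone is a weak filter.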

What do we know about the human transcript repertoire

Figure panels: "Few samples" and "More samples", each comparing cancer and normal.

tissue      normal   cancer
brain       666467    37798
testis      165655     1059
placenta    153235        4
eye          82100        0
spleen       75504        0
uterus       70546    35040
blood        69245    24036
kidney       63980    30706
lung         63495    32601
thymus       62142        0
pancreas     59037    25447
muscle       55891     9730
heart        53531        0
liver        52532    36124
prostate     43049    11959
ovary         8413    26755

UCSC EST Libraries (those that map to human tissues): Characterized by organ/tissue and development stage.
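The cancer/normal balance in the table above can be made explicit by computing, per tissue, the share of ESTs that come from cancer libraries. This is a minimal sketch using the counts exactly as listed. The names `EST_COUNTS`, `cancer_share`, and `rank_by_cancer_share` are illustrative and not part of the original analysis.

```python
# EST counts per tissue, copied from the table above: (normal, cancer).
EST_COUNTS = {
    "brain": (666467, 37798),
    "testis": (165655, 1059),
    "placenta": (153235, 4),
    "eye": (82100, 0),
    "spleen": (75504, 0),
    "uterus": (70546, 35040),
    "blood": (69245, 24036),
    "kidney": (63980, 30706),
    "lung": (63495, 32601),
    "thymus": (62142, 0),
    "pancreas": (59037, 25447),
    "muscle": (55891, 9730),
    "heart": (53531, 0),
    "liver": (52532, 36124),
    "prostate": (43049, 11959),
    "ovary": (8413, 26755),
}


def cancer_share(normal: int, cancer: int) -> float:
    """Fraction of a tissue's ESTs that come from cancer libraries."""
    total = normal + cancer
    return cancer / total if total else 0.0


def rank_by_cancer_share(counts):
    """Tissue names sorted from most to least cancer-dominated."""
    return sorted(counts, key=lambda t: cancer_share(*counts[t]), reverse=True)


for tissue in rank_by_cancer_share(EST_COUNTS):
    normal, cancer = EST_COUNTS[tissue]
    print(f"{tissue:10s} {cancer_share(normal, cancer):6.1%}")
```

With these counts, ovary is the only tissue with more cancer-derived than normal-derived ESTs, while eye, spleen, thymus, and heart have no cancer ESTs at all.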

Example of putative “Novel” protein

Left: A four-nucleotide extension and an alternate exon for SF1, which together cause a frame shift that maintains the stop codon in the terminal exon. Right: Confirmation of spectra by comparing tumor (red) to synthetic spectra (blue). Confirmed by sequencing.

Why not use MS proteomics?

• MS/MS = matching technology.

• Low sensitivity compared to RNA-seq.

• Low coverage per protein identified.

• Biology gets in the way: exon-exon boundaries are frequently cut by trypsin.
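To illustrate why a trypsin cut at an exon-exon boundary hides the junction from MS, here is a rough in-silico digest that checks whether any peptide spans a junction. It assumes the common simplified rule (cleave after K or R, but not before P), no missed cleavages, a junction that falls between two residues, and a hypothetical minimum observable peptide length of 7. The sequences are made up for illustration.

```python
import re


def tryptic_peptides(protein: str):
    """Yield (start, end, peptide) under the simplified trypsin rule:
    cleave after K or R unless the next residue is P. No missed cleavages."""
    start = 0
    for match in re.finditer(r"[KR](?!P)", protein):
        end = match.end()
        yield start, end, protein[start:end]
        start = end
    if start < len(protein):
        yield start, len(protein), protein[start:]


def junction_peptides(exon_a: str, exon_b: str, min_len: int = 7):
    """Peptides that contain residues from both exons and are at least min_len long."""
    junction = len(exon_a)  # index of the first residue encoded by exon_b
    protein = exon_a + exon_b
    return [
        pep
        for start, end, pep in tryptic_peptides(protein)
        if start < junction < end and len(pep) >= min_len
    ]


# Exon A ends in K: trypsin cuts exactly at the boundary, so nothing spans it.
print(junction_peptides("MSTAVLEK", "GDLPNVSR"))
# Exon A ends in A: a single peptide covers the junction.
print(junction_peptides("MSTAVLEA", "GDLPNVSR"))
```

When the upstream exon ends in K or R, no peptide carries evidence of which two exons were joined, even if the protein itself is detected.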

Cancer selective splicing events across disease sites

Figure 2: (Left): Clustering of prevalent and abundant cancer selective transcripts to known CT antigens observed in ovarian cancer tissues, a subset of 112 known tumor selective transcripts identified. (Right): A tandem 3’ splice site, with a NAGNAG motif, in BRCA1, is observed in ovarian (top) and prostate (bottom) cancer, in normal testis, but no other normal or control RNA-Seq data or normal ESTs. Figure shows splice viewer our group developed.

Right panel shows the splicing viewer that my group (Damon May) built into IGV (Broad).
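A tandem 3’ splice site of the kind described in the caption can be located by scanning a sequence for the NAGNAG motif, where two candidate acceptor AG dinucleotides sit three bases apart. The sketch below is an illustration only, not the group's viewer or pipeline. It uses an overlapping regular-expression search on made-up sequences, and the function name is hypothetical.

```python
import re

# Lookahead so that overlapping motifs (e.g. CAGCAGCAG) are all reported.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(seq: str):
    """Return (0-based start, motif) for every NAGNAG occurrence, overlaps included."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in NAGNAG.finditer(seq)]


# Made-up sequences, for illustration only.
print(find_nagnag("ttttcagcagGT"))
print(find_nagnag("CAGCAGCAG"))
```

Because the two acceptors are three nucleotides apart, choosing one or the other changes the transcript by a single codon's worth of sequence without shifting the reading frame.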

Lots of changes do not result in code

How we are trying to improve the pipeline.

Specificity to tumor cells:

• Many putative coding sequences may be un-annotated species belonging to infiltrating cells.

• We are creating single-cell suspensions and separating tumor cells from other cells, and sequencing each component.

Figure: Polysome profile derived from the Ovcar3 cell line. Separation following sucrose ultracentrifugation. Number of ribosomes bound (40S, 60S, 80S, then 2 to 9), as measured by optical readout. Panel label: Ribosomes + Transcript/ribosome.

How we are trying to improve the pipeline

Specificity for coding sequences

• Enrich for mRNAs undergoing active translation.

• Capture polysome-bound transcripts.
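One simple way to summarize polysome capture is the share of a transcript's reads that fall in fractions with two or more ribosomes bound. The sketch below is a hedged illustration, not the analysis used in the talk. The fraction labels follow the profile axis (40S, 60S, 80S, then 2 to 9), the read counts are hypothetical, and a real analysis would need to normalize for sequencing depth per fraction.

```python
def ribosomes_in_fraction(label: str) -> int:
    """Ribosomes bound per transcript implied by a fraction label.
    Subunit-only fractions (40S, 60S) count as 0; 80S is one ribosome."""
    if label in ("40S", "60S"):
        return 0
    if label == "80S":
        return 1
    return int(label)


def polysome_share(counts: dict, min_ribosomes: int = 2) -> float:
    """Fraction of a transcript's reads found in fractions with at least
    min_ribosomes ribosomes bound."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    bound = sum(
        n for label, n in counts.items()
        if ribosomes_in_fraction(label) >= min_ribosomes
    )
    return bound / total


# Hypothetical read counts per fraction for one transcript.
example = {"40S": 10, "80S": 10, "2": 20, "3": 60}
print(polysome_share(example))
```

The `min_ribosomes` threshold is an assumption. Raising it makes the measure stricter about what counts as active translation.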

Result from one mouse pool (mouse heart). Actin beta, including an annotated exon known to be selected for NSMD. This brings up an epistemological issue for proteomics people.

What exactly do we mean by a protein coding gene?

Non-coding RNA (Malat1) found in mouse heart. Pronounced with 2 or 3 ribosomes. Interested in looking at ribosome footprinting.

Is it really sufficient that we see ribosomes?

Summary

• Who cares about millions of genomes?

• Genomes look to me like an engineering problem and not really a research problem.

• Relying on changes in proteins derived solely from changes in cancer genomes (e.g., mutations) may not provide a large number of putative candidates.

• MS proteomics does not work well enough; RNA-seq works too well.

• We need someone to begin to better characterize the nucleotides contained in somatic tissues.

Credit

• People who did the work:

– Matt Fitzgibbon (Computational lead).

– Nigel Clegg (visual curation and EST database).

– Damon May (IGV visual curation).

– Lindsay Bergen (all laboratory work).

• Funding:

– No. HHSN261200800001E: NCI in-Silico Center of Excellence

– Canary Foundation.

– Illumina

• Thanks:

– Vivian MacKay (UW Biochem), polysome fractionation.

– Nicole Urban, Chuck Drescher, FHCRC Ovarian SPORE.
