RNA-seq analysis
![Page 1: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/1.jpg)
© FIMM - Institute for Molecular Medicine Finland www.fimm.fi
![Page 2: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/2.jpg)
RNA-seq analysis
Dr. Tech. Daniel Nicorici
FIMM – Institute for Molecular Medicine Finland
CSC - June 2, 2010
![Page 3: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/3.jpg)
Outline
› RNA sequencing overview
› Finding fusion genes
› Alternative splicing
› Conclusions
![Page 4: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/4.jpg)
RNA-seq
› a high-throughput technology for sequencing RNAs (in practice cDNAs, which carry the RNAs' content)
› an invaluable tool for studying diseases such as cancer
› allows researchers to obtain information such as gene/transcript/exon expressions, alternative splicing, gene fusions, post-transcriptional mutations, single nucleotide variations, …
![Page 5: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/5.jpg)
RNA-seq - cont’d
› It greatly reduces the variability between experiments compared with established measurement technologies such as microarrays and exon arrays
› Due to the short length of the reads (the cDNA is fragmented before sequencing), the bioinformatics analysis is challenging, e.g. de novo assembly, alignment of the sequenced reads, and computation of gene/transcript/exon expressions
![Page 6: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/6.jpg)
Reads in RNA-seq
Fig. 1 – Adaptor and reads in RNA-seq (panel: the cDNA fragment between the two adaptors, 5’ end to 3’ end, is what is sequenced as short reads)
![Page 7: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/7.jpg)
Reads in RNA-seq – cont’d
[Panels: Exons A and B, and Exons C and D, shown at transcript level and at chromosome level]
Fig. 2 – Reads’ mappings at chromosome and transcript level
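Fig. 2 illustrates why a read that spans an exon-exon junction aligns contiguously to the transcript but is split when placed on the chromosome. A minimal sketch, assuming a plus-strand transcript whose exons are given as hypothetical 0-based, half-open genomic intervals, maps a read's transcript coordinates back to genomic blocks:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open genomic interval [start, end)


def transcript_to_genome(exons: List[Interval], t_start: int, t_end: int) -> List[Interval]:
    """Map a read aligned at transcript positions [t_start, t_end) onto the genome.

    Assumes a plus-strand transcript with exons listed in genomic order.
    Returns one genomic block per exon the read overlaps; more than one block
    means the read spans an exon-exon junction (a split read on the chromosome).
    """
    blocks = []
    offset = 0  # transcript coordinate at which the current exon starts
    for g_start, g_end in exons:
        length = g_end - g_start
        # overlap between the read and this exon, in transcript coordinates
        lo = max(t_start, offset)
        hi = min(t_end, offset + length)
        if lo < hi:
            blocks.append((g_start + lo - offset, g_start + hi - offset))
        offset += length
    return blocks


# Hypothetical exons A and B separated by a 900-base intron.
exons = [(1000, 1100), (2000, 2150)]
print(transcript_to_genome(exons, 80, 130))  # [(1080, 1100), (2000, 2030)]
```

The read occupies 50 contiguous bases on the transcript but two blocks on the chromosome, which is why an aligner working only against the genome can miss it.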
![Page 8: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/8.jpg)
Why RNA-seq?
Fig. 3 – RNA-seq vs array technologies: a single RNA-seq experiment (~1000€/sample) provides exon/transcript expressions, gene expressions, alternative splicing events, SNPs, fusion genes, etc., whereas the array approach uses a separate platform for each (cDNA array, SNPs array, exon array for alternative splicing, exon array for fusion genes), each at roughly 400 to 700€/sample
![Page 9: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/9.jpg)
General steps of RNA-seq analysis
1. Filtering of short reads
2. Aligning the reads against a reference
3. Computational analysis of the read alignments:
   1. compute the gene/transcript/exon expressions
   2. find new/known alternative splicing events
   3. find new/known fusion genes
   4. find new/known SNPs
4. Visualization
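Step 1 (filtering) can take many forms. A minimal sketch, assuming FASTQ qualities in the Illumina 1.5+ Phred+64 encoding, where a trailing run of 'B' marks the unreliable end of a read, trims those B's and then discards reads that are too short or of too low mean quality; the thresholds are hypothetical:

```python
from typing import Optional, Tuple


def trim_trailing_b(seq: str, qual: str) -> Tuple[str, str]:
    """Remove the trailing run of 'B' quality characters (the Illumina 1.5+ read-end marker)."""
    end = len(qual)
    while end > 0 and qual[end - 1] == "B":
        end -= 1
    return seq[:end], qual[:end]


def mean_phred(qual: str, offset: int = 64) -> float:
    """Mean Phred quality, assuming Phred+64 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_read(seq: str, qual: str, min_length: int = 25,
                min_mean_q: float = 20.0) -> Optional[Tuple[str, str]]:
    """Return the trimmed read, or None if it fails the (hypothetical) thresholds."""
    seq, qual = trim_trailing_b(seq, qual)
    # the length check comes first, so mean_phred never sees an empty string
    if len(seq) < min_length or mean_phred(qual) < min_mean_q:
        return None
    return seq, qual


print(filter_read("ACGT" * 10, "h" * 30 + "B" * 10))
```

Reads that survive would then go on to the alignment in step 2.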
![Page 10: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/10.jpg)
Examples of RNA-seq visualization
Fig. 4 – Visualization using MapView
![Page 11: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/11.jpg)
Examples of RNA-seq visualization – cont’d
Fig. 5 – Coverage plot
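A coverage plot is built from per-base read depth. A minimal sketch, assuming reads are given as hypothetical 0-based, half-open alignment intervals within one region, computes the depth with a difference array; scaling to reads per million mapped reads is an assumed normalization, not necessarily the one used in these figures:

```python
from itertools import accumulate
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], region_length: int) -> List[int]:
    """Per-base depth over [0, region_length) from 0-based, half-open read intervals."""
    diff = [0] * (region_length + 1)
    for start, end in reads:
        # clip each read to the region
        start, end = max(start, 0), min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # running sum of the difference array gives the depth at each base
    return list(accumulate(diff[:region_length]))


def normalized_coverage(depth: List[int], total_mapped_reads: int) -> List[float]:
    """Scale depth to reads per million mapped reads (assumed normalization)."""
    scale = 1e6 / total_mapped_reads
    return [d * scale for d in depth]


print(coverage([(0, 3), (2, 5)], 6))  # [1, 1, 2, 1, 1, 0]
```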
![Page 12: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/12.jpg)
Examples of RNA-seq visualization – cont’d
Fig. 6 – Coverage plots visualization (panels: coverage plot for gene ERBB2 in breast cancer and coverage plot for gene ERBB2 in normal breast; y-axis: normalized coverage)
![Page 13: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/13.jpg)
Examples of RNA-seq visualization – cont’d
Fig. 7 – Visualization of reads’ mappings using the UCSC browser
![Page 14: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/14.jpg)
Examples of RNA-seq visualization – cont’d
Fig. 8 – Visualization of coverages using the UCSC browser
![Page 15: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/15.jpg)
Examples of RNA-seq visualization – cont’d
Fig. 9 – “Gel-like” visualization of coverages using the UCSC browser
![Page 16: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/16.jpg)
Examples of RNA-seq visualization – cont’d
Fig. 10 – Histogram of distances between the paired-end reads
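A histogram like this one is built from the distances between the two mates of each pair. A minimal sketch, assuming both mates align to the same reference, their leftmost 0-based positions are known, and all reads share a hypothetical fixed length; the distance used here is the outer fragment length and the bin width is arbitrary:

```python
from collections import Counter
from typing import Iterable, Tuple


def pair_distance(pos1: int, pos2: int, read_length: int) -> int:
    """Outer distance between two mates, given their leftmost 0-based positions."""
    left, right = sorted((pos1, pos2))
    return right + read_length - left


def distance_histogram(pairs: Iterable[Tuple[int, int]], read_length: int,
                       bin_width: int = 50) -> Counter:
    """Count pair distances in bins of bin_width bases (key = lower edge of the bin)."""
    hist: Counter = Counter()
    for pos1, pos2 in pairs:
        d = pair_distance(pos1, pos2, read_length)
        hist[(d // bin_width) * bin_width] += 1
    return hist


print(sorted(distance_histogram([(100, 250), (1000, 1180), (500, 300)], 50).items()))
```

Pairs whose distance falls far outside the main peak are the ones worth a second look, e.g. as possible fusion or splicing evidence.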
![Page 17: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/17.jpg)
Examples of RNA-seq visualization – cont’d
Fig. 11 – Visualization of candidate fusion genes
![Page 18: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/18.jpg)
Finding fusion genes
Steps:
1. Reads filtering (quality, B’s, etc.)
2. Align all reads to the genome
3. Align to the transcriptome all reads that map uniquely to the genome or do not map to the genome at all
4. Find the candidate fusion genes by looking for paired-end reads whose mates map simultaneously to two transcripts of two different genes
5. Find the fusion junction (e.g. by generating exon-exon combinations and finding which of them the reads align to)
6. Filtering of candidate fusion-genes
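Steps 4 and 5 can be sketched as follows. The sketch assumes each read pair has already been reduced to the sets of genes its two mates hit in the transcriptome alignment of step 3; the input layout, the `min_pairs` threshold, the `flank` length and the gene names are hypothetical. Junction candidates join the end of each exon of the 5’ partner gene to the start of each exon of the 3’ partner gene:

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple


def candidate_fusions(pairs: Iterable[Tuple[Set[str], Set[str]]],
                      min_pairs: int = 2) -> Dict[Tuple[str, str], int]:
    """Step 4: count read pairs whose two mates hit transcripts of two different genes.

    Each element of `pairs` holds the genes hit by mate 1 and by mate 2.
    Mates that hit several genes are skipped (a hypothetical design choice).
    """
    support: Counter = Counter()
    for genes1, genes2 in pairs:
        if len(genes1) != 1 or len(genes2) != 1 or genes1 == genes2:
            continue
        (g1,), (g2,) = genes1, genes2
        support[tuple(sorted((g1, g2)))] += 1  # unordered gene pair
    return {pair: n for pair, n in support.items() if n >= min_pairs}


def junction_sequences(exons5: List[str], exons3: List[str],
                       flank: int = 20) -> Dict[Tuple[int, int], str]:
    """Step 5: join the last `flank` bases of each exon of the 5' partner gene
    with the first `flank` bases of each exon of the 3' partner gene."""
    return {
        (i, j): e5[-flank:] + e3[:flank]
        for (i, e5), (j, e3) in product(enumerate(exons5), enumerate(exons3))
    }


pairs = [({"GENEA"}, {"GENEB"}), ({"GENEB"}, {"GENEA"}), ({"GENEA"}, {"GENEA"})]
print(candidate_fusions(pairs))  # {('GENEA', 'GENEB'): 2}
```

Reads could then be aligned against the junction sequences; those crossing the midpoint of a sequence support that particular exon-exon junction.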
![Page 19: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/19.jpg)
![Page 20: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/20.jpg)
Finding fusion genes – cont’d
› RNA-seq data for the leukemia K562 cell line [1]:
  › the Philadelphia chromosome carries the known BCR-ABL fusion gene
  › ~15 000 candidate fusion genes are found
  › ~85% of the candidate fusion genes are known paralogs or have no protein product!
  › 15 candidate fusion genes remain after additional filtering, with the known BCR-ABL as the top candidate
› Filtering of candidate fusion genes is essential to reduce their large number (from tens of thousands to tens)!
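The filtering above can be sketched as a chain of simple rules. The sketch assumes hypothetical inputs: candidate gene pairs with their numbers of supporting read pairs, a set of known paralogous gene pairs, a set of genes with no protein product, and a minimum-support threshold. Real filters are usually more extensive:

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[str, str]


def filter_candidates(candidates: Dict[Pair, int],
                      paralogs: Set[FrozenSet[str]],
                      non_coding: Set[str],
                      min_support: int = 3) -> Dict[Pair, int]:
    """Drop paralog pairs, pairs involving a gene with no protein product,
    and pairs with too few supporting read pairs."""
    kept = {}
    for (g1, g2), support in candidates.items():
        if frozenset((g1, g2)) in paralogs:
            continue  # reads from one paralog can mis-map to the other
        if g1 in non_coding or g2 in non_coding:
            continue
        if support < min_support:
            continue
        kept[(g1, g2)] = support
    return kept


def rank_candidates(candidates: Dict[Pair, int]) -> List[Tuple[Pair, int]]:
    """Order candidates by number of supporting read pairs (hypothetical ranking)."""
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)


candidates = {("A", "B"): 10, ("C", "D"): 8, ("E", "F"): 5, ("G", "H"): 1}
print(rank_candidates(filter_candidates(candidates, {frozenset(("C", "D"))}, {"E"})))
```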
![Page 21: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/21.jpg)
Alternative splicing
› the process by which a gene’s exons are joined in different combinations during RNA splicing, producing different mRNAs
› a large body of evidence links alternative splicing to diseases such as cancer
› Shannon’s entropy from information theory has previously been used to find imbalances in transcript expression [2,3]
› the Jensen-Shannon divergence has been used to quantify relative changes in transcript expression [4]
› MDL (minimum description length) [5] can also be used to measure relative changes in transcript expression
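To illustrate the first two measures, a minimal sketch turns the transcript expression values of one gene into relative frequencies and computes the Shannon entropy (in bits) and the Jensen-Shannon divergence between two samples. It uses the textbook definitions; the exact variants used in [2,3,4] may differ:

```python
import math
from typing import List, Sequence


def proportions(expr: Sequence[float]) -> List[float]:
    """Relative expression of each transcript of the gene."""
    total = sum(expr)
    return [x / total for x in expr]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; 0 * log(0) is taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2, with M = (P + Q) / 2; lies in [0, 1] bits."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


p_a = proportions([3, 5, 4, 4, 2])  # sample "A" of Table 1
p_b = proportions([1, 7, 2, 6, 3])  # sample "B" of Table 1
print(entropy(p_a), entropy(p_b), jensen_shannon(p_a, p_b))
```

A low entropy means expression is concentrated on few transcripts; a JSD near 0 means the two samples use the transcripts in similar proportions. Neither measure comes with a built-in threshold for calling a change significant.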
![Page 22: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/22.jpg)
Alternative splicing – cont’d
Steps:
1. Reads filtering (quality, B’s, etc.)
2. Align all reads to the genome
3. Align to the transcriptome all reads that map uniquely to the genome or do not map to the genome at all
4. Compute (normalized) transcript expressions (e.g. RPKM)
5. Repeat steps 1-4 for all samples
6. Find relative changes/imbalances in the expressions of the transcripts of the same gene across the group of samples
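Step 4 mentions RPKM (reads per kilobase of transcript per million mapped reads). With the standard definition, RPKM = 10^9 · C / (N · L), where C is the number of reads assigned to the transcript, N the total number of mapped reads and L the transcript length in bases. A minimal sketch with hypothetical numbers; how reads shared by overlapping transcripts are assigned is left out:

```python
def rpkm(reads_on_transcript: int, total_mapped_reads: int, transcript_length_bp: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return reads_on_transcript * 1e9 / (total_mapped_reads * transcript_length_bp)


# Hypothetical: 500 reads on a 2 kb transcript in a library of 10 million mapped reads.
print(rpkm(500, 10_000_000, 2_000))  # 25.0
```

Dividing by both length and library size makes values comparable across transcripts of different lengths and across samples of different sequencing depth, which is what step 6 needs.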
![Page 23: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/23.jpg)
Alternative splicing – cont’d
| Transcript of gene “G” | Sample “A” | Sample “B” |
|---|---|---|
| Transcript 1 | 3 | 1 |
| Transcript 2 | 5 | 7 |
| Transcript 3 | 4 | 2 |
| Transcript 4 | 4 | 6 |
| Transcript 5 | 2 | 3 |
Table 1 – Example of a gene with its five transcripts
![Page 24: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/24.jpg)
Alternative splicing – cont’d
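› Computing the imbalance of transcript expression for the example in Table 1 using the MDL method [5]:

$$L(A) = 3\log_2\frac{18}{3} + 5\log_2\frac{18}{5} + 4\log_2\frac{18}{4} + 4\log_2\frac{18}{4} + 2\log_2\frac{18}{2} + \log_2 C_{5,18} \quad \text{(bits)}$$

$$L(B) = 1\log_2\frac{19}{1} + 7\log_2\frac{19}{7} + 2\log_2\frac{19}{2} + 6\log_2\frac{19}{6} + 3\log_2\frac{19}{3} + \log_2 C_{5,19} \quad \text{(bits)}$$

$$L(A,B) = 4\log_2\frac{37}{4} + 12\log_2\frac{37}{12} + 6\log_2\frac{37}{6} + 10\log_2\frac{37}{10} + 5\log_2\frac{37}{5} + \log_2 C_{5,37} \quad \text{(bits)}$$

where

$$C_{5,n} = \sum_{\substack{i_1+i_2+i_3+i_4+i_5=n \\ i_1,\dots,i_5 \ge 0}} \frac{n!}{i_1!\,i_2!\,i_3!\,i_4!\,i_5!} \left(\frac{i_1}{n}\right)^{i_1} \left(\frac{i_2}{n}\right)^{i_2} \left(\frac{i_3}{n}\right)^{i_3} \left(\frac{i_4}{n}\right)^{i_4} \left(\frac{i_5}{n}\right)^{i_5}$$

If $L(A,B) > L(A) + L(B)$, then there is imbalance.

› MDL’s advantage: the criterion for deciding between balanced and imbalanced is built in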
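The MDL comparison can be reproduced numerically. A minimal sketch, assuming (as in the equations) that the code length of a sample is $\sum_i c_i \log_2(n/c_i) + \log_2 C_{K,n}$ for its K transcript counts $c_i$ summing to n, with $C_{K,n}$ computed by brute-force enumeration exactly as in its definition; this is practical only for small counts such as those in Table 1:

```python
import math
from typing import Iterator, List, Sequence, Tuple


def compositions(n: int, k: int) -> Iterator[Tuple[int, ...]]:
    """All k-tuples of non-negative integers that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def parametric_complexity(k: int, n: int) -> float:
    """C_{k,n}: sum over compositions of n!/(i_1!...i_k!) * prod (i_j/n)^{i_j}."""
    if n == 0:
        return 1.0
    fact = [math.factorial(i) for i in range(n + 1)]
    total = 0.0
    for comp in compositions(n, k):
        coef = fact[n]
        for i in comp:
            coef //= fact[i]  # exact integer multinomial coefficient
        prob = 1.0
        for i in comp:
            prob *= (i / n) ** i  # (0/n)**0 == 1.0 in Python
        total += coef * prob
    return total


def code_length(counts: Sequence[int]) -> float:
    """Code length in bits: sum c*log2(n/c) + log2 C_{k,n}."""
    n = sum(counts)
    data_bits = sum(c * math.log2(n / c) for c in counts if c > 0)
    return data_bits + math.log2(parametric_complexity(len(counts), n))


def is_imbalanced(a: Sequence[int], b: Sequence[int]) -> bool:
    """Imbalance if coding A and B jointly costs more than coding them separately."""
    joint: List[int] = [x + y for x, y in zip(a, b)]
    return code_length(joint) > code_length(a) + code_length(b)


sample_a = [3, 5, 4, 4, 2]  # Table 1, sample "A"
sample_b = [1, 7, 2, 6, 3]  # Table 1, sample "B"
print(is_imbalanced(sample_a, sample_b))  # False
```

For Table 1 the joint description is shorter, so gene “G” is classified as balanced; a sample pair whose expression sits on completely different transcripts (e.g. `[10, 0, 0, 0, 0]` vs `[0, 0, 0, 0, 10]`) is classified as imbalanced.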
![Page 25: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/25.jpg)
Alternative splicing – cont’d
› only validated transcripts (e.g. transcripts with reads that map only to them [3]) are used for finding the imbalances
› for example, ~3500 alternatively spliced genes are found when comparing a prostate cancer control sample with a treated sample
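A minimal sketch of the validation filter, assuming each read has already been resolved to the set of transcripts of the gene it is compatible with (a hypothetical input format) and that a transcript counts as validated when at least `min_unique_reads` reads are compatible with it alone:

```python
from typing import Dict, Iterable, Set


def validated_transcripts(read_hits: Iterable[Set[str]], min_unique_reads: int = 1) -> Set[str]:
    """Transcripts supported by at least min_unique_reads reads that map only to them."""
    unique_counts: Dict[str, int] = {}
    for hits in read_hits:
        if len(hits) == 1:  # the read is compatible with a single transcript
            (t,) = hits
            unique_counts[t] = unique_counts.get(t, 0) + 1
    return {t for t, n in unique_counts.items() if n >= min_unique_reads}


print(sorted(validated_transcripts([{"T1"}, {"T1", "T2"}, {"T3"}, {"T2", "T3"}])))  # ['T1', 'T3']
```

Transcript T2 is only ever seen together with another transcript, so its expression estimate rests on shared reads and it is left out of the imbalance test.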
![Page 26: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/26.jpg)
Conclusions
› RNA-seq data analysis:
  › is computationally intensive (compared with, for example, microarray analysis)
  › needs very good filtering criteria, based on biology and mathematics, to improve the quality of the results (i.e. a low number of false positives)
  › has no single established workflow
  › relies on many tools, e.g. aligners, samtools, etc., that are still works in progress
› Visualization:
  › has multiple facets, i.e. read coverage, fusion genes, etc.
  › depends on the user profile:
    1. biologist/medical doctor
    2. bioinformatician
![Page 27: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/27.jpg)
References
1. Berger M. et al., Integrative analysis of the melanoma transcriptome, Genome Research, Feb. 2010.
2. Ritchie W. et al., Entropy measures quantify global splicing disorders in cancer, PLOS Computational Biology, vol. 4, March 2008.
3. Gan Q. et al., Dynamic regulation of alternative splicing and chromatin structure in Drosophila gonads revealed by RNA-seq, Cell Research, May 2010.
4. Trapnell C. et al., Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nature Biotechnology, vol. 28, May 2010.
5. Grünwald P., “Minimum description length principle tutorial”, in Advances in Minimum Description Length: Theory and Applications, P. Grünwald, I.J. Myung, and M. Pitt, Eds., pp. 22-79. MIT Press, Cambridge, 2005.
![Page 28: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/28.jpg)
Acknowledgements
› Olli Kallioniemi
› Janna Saarela
› Henrik Edgren
› Astrid Murumägi
› Sara Kangaspeska
› Pekka Ellonen
![Page 29: RNA-seq analysis](https://reader030.fdocuments.us/reader030/viewer/2022033019/56814dd6550346895dbb3c26/html5/thumbnails/29.jpg)
› Thank you!