Analysis of RNA-seq Data - Bioinformatics-core-shared...
![Page 1: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/1.jpg)
Analysis of
RNA-seq Data
Bernard Pereira
![Page 2: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/2.jpg)
The many faces of RNA-seq
![Page 3: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/3.jpg)
Applications
Discovery
• Find new transcripts
• Find transcript boundaries
• Find splice junctions
Comparison
Given samples from different experimental conditions, find effects of the treatment on
• Gene expression strengths
• Isoform abundance ratios, splice patterns, transcript boundaries
![Page 4: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/4.jpg)
Applications
![Page 5: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/5.jpg)
Differential Expression
Mortazavi, A. et al (2008) Nature Methods
• Comparing feature abundance under different conditions
• Assumes linearity of signal over a range of expression levels
• When feature = gene, well-established pre- and post-analysis strategies exist
![Page 6: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/6.jpg)
Range of detection
Wang et al. (2014) Nature Biotech.; Guo et al. (2013) PLoS One
![Page 7: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/7.jpg)
Library Prep i
Malone, J.H. & Oliver, B.
A B
![Page 8: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/8.jpg)
Library Prep ii
Biological Technical
![Page 9: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/9.jpg)
Library Prep iii A B
![Page 10: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/10.jpg)
Library Prep iii
A B
![Page 11: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/11.jpg)
Library Prep iv
Hansen, K.D. et al. (2010) Nuc. Acids Res.
![Page 12: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/12.jpg)
Library Prep v
• Duplicates (optical & PCR)
• Sequence errors
• Indels
• Repetitive/problematic sequence
![Page 13: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/13.jpg)
Hot off the sequencer…
Auer and Doerge (2010) Genetics
![Page 14: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/14.jpg)
FASTQC
![Page 15: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/15.jpg)
Trimming
• Quality-based trimming
• Adapter ‘contamination’
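The two trimming steps above can be sketched in a few lines. This is a minimal illustration only, not how any particular trimming tool works: the function names `quality_trim` and `adapter_trim`, the quality threshold of 20, and the exact-match adapter search are all assumptions made for the example (real tools tolerate mismatches and use more careful scoring).

```python
def quality_trim(seq, quals, min_q=20):
    """Trim the 3' end of a read back to the last base with quality >= min_q.

    seq   : read sequence (str)
    quals : per-base Phred scores (list of int), same length as seq
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def adapter_trim(seq, adapter, min_overlap=3):
    """Remove an adapter from the 3' end of a read (exact matches only)."""
    # Case 1: the full adapter occurs inside the read.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Case 2: only the start of the adapter hangs off the end of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


if __name__ == "__main__":
    # Example adapter string chosen for illustration.
    adapter = "AGATCGGAAGAGC"
    print(quality_trim("ACGTACGT", [30, 30, 30, 30, 30, 10, 5, 2]))
    print(adapter_trim("ACGTACGTAGATCGG", adapter))
```

Quality trimming would normally run before or together with adapter removal, and reads that become very short afterwards are usually discarded.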
![Page 16: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/16.jpg)
Sequence to sense
Haas, B.J. & Zody, M.C. (2010) Nature Biotech.
![Page 17: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/17.jpg)
De novo assembly
Haas, B.J. et al. (2013) Nature Protocols
• e.g. Trinity
![Page 18: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/18.jpg)
Reference-based assembly
Genome mapping
• Can identify novel features
• Splice aware?
• Can be difficult to reconstruct isoform and gene structures
Transcriptome mapping
• No repetitive reference
• Overcomes issues of complex structures
• Novel features?
• How reliable is the transcriptome?
Trapnell & Salzberg (2009) Nature Biotech
![Page 19: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/19.jpg)
A smart suit(e)
Trapnell, C. et al (2012) Nature Protocols
![Page 20: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/20.jpg)
Tophat/Bowtie
Kim, D. et al (2012) Genome Biology
![Page 21: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/21.jpg)
Tophat/Bowtie
Kim, D. et al (2012) Genome Biology
![Page 22: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/22.jpg)
Cufflinks
Trapnell, C. et al. (2010) Nature Biotech.
![Page 23: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/23.jpg)
How do we look?
![Page 24: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/24.jpg)
Duplicates & RNA-seq
• Single-end vs paired-end
• Variant calling vs DE analysis
• Intrinsically lower complexity
• Highly expressed genes
• Platform/pipeline
• Model as part of counting process
![Page 25: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/25.jpg)
Counting
Genome-based features
• Exon or gene boundaries?
• Isoform structures?
• Gene multireads?
Transcript-based features
• Transcript assembly?
• Novel structures?
• Isoform multireads?
Oshlack, A. et al. (2010) Genome Biology
![Page 26: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/26.jpg)
Counting
• e.g. HTSeq
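As a rough picture of what gene-level counting involves, the sketch below assigns each aligned read to a gene only when it overlaps exactly one gene, and tallies the rest as ambiguous or unassigned. It is a simplified illustration under stated assumptions: reads and genes are plain half-open `(start, end)` intervals on a single chromosome and strand, the function name `count_reads` is hypothetical, and the code does not use or reproduce the HTSeq API.

```python
from collections import Counter


def count_reads(reads, genes):
    """Count reads per gene.

    reads : list of (start, end) half-open intervals
    genes : dict mapping gene name -> (start, end) half-open interval
    """
    counts = Counter()
    for r_start, r_end in reads:
        # Collect every gene whose interval overlaps the read.
        hits = [name for name, (g_start, g_end) in genes.items()
                if r_start < g_end and g_start < r_end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1   # read overlaps more than one gene
        else:
            counts["__no_feature"] += 1  # read overlaps no annotated gene
    return counts


if __name__ == "__main__":
    genes = {"geneA": (100, 200), "geneB": (150, 300)}
    reads = [(110, 140), (160, 190), (250, 280), (400, 430)]
    print(count_reads(reads, genes))
```

How ambiguous and multi-mapped reads are treated is exactly the kind of choice raised by the questions on gene and isoform multireads.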
![Page 27: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/27.jpg)
Counting
Mortazavi, A. et al (2008) Nature Methods
![Page 28: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/28.jpg)
Counting & normalisation
• An estimate of the relative counts for each gene is obtained
• This estimate is assumed to be representative of the original population
Library size
• Sequencing depth varies between samples
Gene properties
• GC content, length, sequence
Library composition
• Highly expressed genes are overrepresented at the cost of lowly expressed genes
![Page 29: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/29.jpg)
Normalisation i
Total count
• Normalise each sample by the total number of reads sequenced
• Can also use another statistic similar to total count, e.g. median or upper quartile
Robinson, M.D. & Oshlack, A. (2010) Genome Biology
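A minimal sketch of total-count scaling and of an upper-quartile alternative follows. The function names are illustrative, and taking the upper quartile over non-zero counts only is an assumption made here (a common convention, but not something the slide specifies).

```python
import statistics


def counts_per_million(sample_counts):
    """Scale one sample's gene counts by its total read count (per million)."""
    total = sum(sample_counts)
    return [c * 1e6 / total for c in sample_counts]


def upper_quartile(sample_counts):
    """75th percentile of the non-zero counts in one sample.

    Could be used in place of the total count as the scaling statistic.
    Needs at least two non-zero counts.
    """
    nonzero = sorted(c for c in sample_counts if c > 0)
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


if __name__ == "__main__":
    sample = [0, 10, 20, 30, 40, 100]
    print(counts_per_million(sample))
    print(upper_quartile(sample))
```

The upper quartile is less sensitive than the total to a handful of very highly expressed genes, which is the motivation for using it.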
![Page 30: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/30.jpg)
Normalisation ii
RPKM
• Reads per kilobase per million:
$$\mathrm{RPKM}_A = \frac{\text{reads for gene } A}{(\text{length of gene } A \text{ in kb}) \times (\text{total number of reads in millions})}$$
Oshlack, A. & Wakefield, M.J. (2009) Biology Direct
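A short worked example of the RPKM calculation, assuming the gene length is given in base pairs and the total is the raw number of reads (so the kilobase and per-million scalings combine into a factor of 10^9). The function name `rpkm` is chosen for illustration.

```python
def rpkm(gene_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million reads sequenced."""
    length_kb = gene_length_bp / 1e3
    reads_millions = total_reads / 1e6
    return gene_reads / (length_kb * reads_millions)


if __name__ == "__main__":
    # 500 reads on a 2 kb gene in a library of 10 million reads.
    print(rpkm(500, 2000, 10_000_000))
```

With these numbers the gene has 500 / (2 × 10) = 25 reads per kilobase per million.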
![Page 31: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/31.jpg)
Normalisation iii
Geometric scaling factor
• Implemented in DESeq
• Assumes that most genes are not differentially expressed
• Scaling factor for sample $j$ = median, over genes 1 to N, of the ratio RC / GM:
$$s_j = \operatorname{median}_{g = 1, \dots, N} \frac{\mathrm{RC}_{gj}}{\mathrm{GM}_g}$$
RC = read counts (per sample); GM = geometric mean (all samples)
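The median-of-ratios idea can be sketched as below. This is an illustration of the calculation described on the slide, not DESeq's own code: the function name `size_factors` and the list-of-lists input are assumptions, and genes with a zero count in any sample are simply skipped because their geometric mean is zero.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios scaling factors.

    counts : list of genes, each a list of read counts (one per sample)
    returns: list with one scaling factor per sample
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue  # geometric mean would be zero; skip this gene
        # Log of the geometric mean of this gene across all samples.
        log_gm = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_gm)
    # Median ratio per sample, converted back from the log scale.
    return [math.exp(statistics.median(r)) for r in log_ratios]


if __name__ == "__main__":
    # Sample 2 is sequenced twice as deeply; the last gene is truly different.
    counts = [[10, 20], [100, 200], [50, 100], [5, 500]]
    print(size_factors(counts))
```

Because the median is used, the one strongly changed gene in the example does not move the factors away from the 1:2 depth ratio.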
![Page 32: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/32.jpg)
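Normalisation iv
Trimmed mean of M
• Implemented in edgeR
• Assumes most genes are not differentially expressed
• For each gene g, compute the log ratio between samples k and k′ (Y = read count, N = total reads in the sample):
$$M_g = \log_2 \frac{Y_{gk} / N_k}{Y_{gk'} / N_{k'}}$$
• Trim the genes with the most extreme values, then weight each remaining gene by the inverse of its variance
• Scaling factor = weighted mean of $M_g$ over the remaining genes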
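The following is a simplified sketch of a trimmed mean of M-values between one sample and a reference sample. It is not edgeR's implementation: the function name `tmm_factor`, the trimming fractions (30% on M, 5% on average abundance A), and the delta-method variance used for the weights are stated here as assumptions for illustration.

```python
import math


def tmm_factor(sample, ref, m_trim=0.3, a_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref`.

    sample, ref : lists of read counts for the same genes, in the same order
    """
    n_s, n_r = sum(sample), sum(ref)
    rows = []
    for y_s, y_r in zip(sample, ref):
        if y_s == 0 or y_r == 0:
            continue  # log ratio undefined for zero counts
        m = math.log2((y_s / n_s) / (y_r / n_r))        # log fold change
        a = 0.5 * math.log2((y_s / n_s) * (y_r / n_r))  # mean log abundance
        var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        rows.append((m, a, 1.0 / var))                  # weight = 1 / variance
    n = len(rows)

    def kept(values, trim):
        # Indices that survive trimming `trim` of the genes from each tail.
        order = sorted(range(n), key=lambda i: values[i])
        lo = int(n * trim)
        return set(order[lo:n - lo])

    keep = kept([r[0] for r in rows], m_trim) & kept([r[1] for r in rows], a_trim)
    num = sum(rows[i][2] * rows[i][0] for i in keep)
    den = sum(rows[i][2] for i in keep)
    return 2 ** (num / den)


if __name__ == "__main__":
    ref = [100] * 10
    sample = [100] * 9 + [1000]   # one gene is strongly up-regulated
    print(tmm_factor(sample, ref))
```

In the example the single up-regulated gene inflates the sample's library size, so the other nine genes look depleted; trimming removes the outlier and the factor reflects that shared shift (10/19) rather than the outlier itself.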
![Page 33: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/33.jpg)
Differential expression
• Simple
[Figure: Gene X and other genes compared between Cond A and Cond B]
All we need
• Know what the data looks like
• Some measure of difference
![Page 34: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/34.jpg)
Modelling – old trends
• What the data looks like: normal distribution
• Some measure of difference: t-test etc
• Technical replicates introduce some variance
![Page 35: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/35.jpg)
Modelling – in fashion
• Use the Poisson distribution for count data from technical replicates
• Just one parameter required – the mean
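To make the "one parameter" point concrete, the sketch below evaluates the Poisson probability mass function and checks numerically that the variance equals the mean. The function names and the truncation at `k_max` are choices made for this illustration.

```python
import math


def poisson_pmf(k, mean):
    """P(K = k) for a Poisson distribution, computed on the log scale."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def poisson_moments(mean, k_max=200):
    """Numerically computed mean and variance (sum truncated at k_max)."""
    probs = [poisson_pmf(k, mean) for k in range(k_max)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return m, v


if __name__ == "__main__":
    print(poisson_moments(10.0))
```

Since the variance is tied to the mean, a Poisson model has no room for extra variability between replicates.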
![Page 36: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/36.jpg)
Modelling – in fashion
• Biology is never that simple…
• The negative binomial distribution represents an overdispersed Poisson distribution, and has parameters for both the mean and the overdispersion.
Anders, S. & Huber, W. (2010) Genome Biology
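A small numerical check of what overdispersion means, assuming the common parameterisation in which the variance is mean + dispersion × mean². The names `nb_pmf` and `nb_moments` are illustrative; setting the dispersion close to zero recovers Poisson-like behaviour.

```python
import math


def nb_pmf(k, mean, dispersion):
    """Negative binomial P(K = k), parameterised by mean and dispersion."""
    r = 1.0 / dispersion   # "size" parameter
    p = r / (r + mean)     # success probability
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_p)


def nb_moments(mean, dispersion, k_max=5000):
    """Numerically computed mean and variance (sum truncated at k_max)."""
    probs = [nb_pmf(k, mean, dispersion) for k in range(k_max)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return m, v


if __name__ == "__main__":
    mean, dispersion = 10.0, 0.2
    print(nb_moments(mean, dispersion))
    print(mean + dispersion * mean ** 2)   # expected variance
```

With a mean of 10 and a dispersion of 0.2 the variance is 30, three times what a Poisson model with the same mean would allow.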
![Page 37: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/37.jpg)
Modelling – in fashion
Robinson, M.D. & Smyth, G. (2008) Biostatistics
• Estimating the dispersion parameter can be difficult with a small number of samples
• ‘Share’ information from all genes to obtain a global estimate
![Page 38: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/38.jpg)
Shrinkage
Robinson, M.D. & Smyth, G. (2007) Bioinformatics
• Genes do not share a common dispersion parameter
• ‘Moderated’ estimate – assign a per-gene weight to the combined estimate
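The idea of a moderated estimate can be illustrated with a deliberately simple stand-in: a method-of-moments dispersion per gene, pulled towards a shared value by a weight between 0 and 1. This is only a sketch of the concept. The published methods work with weighted likelihoods rather than the linear blend used here, and the function names, the weight, and the assumption that counts are already normalised are all introduced for the example.

```python
import statistics


def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene: (variance - mean) / mean^2.

    Assumes the counts are already normalised for library size.
    Negative estimates are truncated at zero.
    """
    m = statistics.mean(counts)
    v = statistics.variance(counts)
    return max((v - m) / m ** 2, 0.0)


def shrink(gene_disp, common_disp, weight):
    """Blend a per-gene estimate with the shared estimate.

    weight = 0 keeps the gene's own estimate; weight = 1 uses the shared one.
    """
    return (1.0 - weight) * gene_disp + weight * common_disp


if __name__ == "__main__":
    genes = {"g1": [5, 15, 10, 10], "g2": [8, 12, 10, 10], "g3": [2, 30, 9, 11]}
    per_gene = {g: mom_dispersion(c) for g, c in genes.items()}
    common = statistics.mean(per_gene.values())
    for g, d in per_gene.items():
        print(g, round(d, 4), round(shrink(d, common, weight=0.5), 4))
```

With only a few replicates the per-gene values are very noisy, which is why borrowing strength from the other genes helps.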
![Page 39: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/39.jpg)
DESeq
• DESeq fits a mean/dispersion relationship model
• Shifts individual estimates towards the regression line
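A rough sketch of fitting a mean/dispersion trend, assuming a curve of the form dispersion = a0 + a1 / mean and using ordinary least squares for simplicity. Both the functional form as written here and the least-squares fit are assumptions for illustration; they are not taken from the slide, and DESeq's own fitting procedure differs.

```python
def fit_trend(means, dispersions):
    """Least-squares fit of dispersion = a0 + a1 / mean. Returns (a0, a1)."""
    xs = [1.0 / m for m in means]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(dispersions) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, dispersions))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def trend(mean, a0, a1):
    """Fitted dispersion at a given mean expression level."""
    return a0 + a1 / mean


if __name__ == "__main__":
    means = [10.0, 100.0, 1000.0]
    dispersions = [0.25, 0.07, 0.052]
    a0, a1 = fit_trend(means, dispersions)
    print(a0, a1, trend(50.0, a0, a1))
```

Each gene's own dispersion estimate can then be compared with, or moved towards, the fitted value at its mean.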
![Page 40: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/40.jpg)
The mean-variance relationship
• Variance = Technical (variable) + Biological (constant)
• Panels A to E: from technical replicates (A) to (very) biologically different replicates (E)
Law et al. (2014) Genome Biology
![Page 41: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/41.jpg)
Filtering
• Independent filtering = remove genes that have little chance of showing DE
• Can use e.g. total count
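A minimal sketch of filtering on total count. The threshold of 10 reads and the function name `filter_low_counts` are arbitrary choices for illustration. The total is taken across all samples without looking at which condition each sample belongs to, which is what keeps the filter independent of the later test.

```python
def filter_low_counts(count_table, min_total=10):
    """Keep genes whose total count across all samples is at least min_total.

    count_table : dict mapping gene name -> list of counts (one per sample)
    """
    return {gene: counts for gene, counts in count_table.items()
            if sum(counts) >= min_total}


if __name__ == "__main__":
    table = {"g1": [0, 1, 2], "g2": [50, 60, 40], "g3": [5, 5, 0]}
    print(filter_low_counts(table))
```

Removing such genes before testing reduces the number of hypotheses and so eases the multiple-testing correction.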
![Page 42: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/42.jpg)
Liu et al. (2014) Bioinformatics
On replicates…
![Page 43: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/43.jpg)
HIGH MEDIUM LOW
Liu et al. (2014) Bioinformatics
On replicates…
![Page 44: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/44.jpg)
On replicates…
Liu et al. (2014) Bioinformatics
![Page 45: Analysis of RNA-seq Data - Bioinformatics-core-shared ...bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/... · Differential Expression Mortazavi, A. et al (2008)](https://reader031.fdocuments.us/reader031/viewer/2022021622/5b78e4da7f8b9a331e8c8d29/html5/thumbnails/45.jpg)
Summary
Oshlack, A. et al (2010) Genome Biology