Cell/Tissue differences are reflected in...


Source: sandberg.cmb.ki.se/media/data/courses/bioinfocell/RNAseq_2012.pdf

Rickard Sandberg

Transcriptomics with RNA-Seq

Assistant Professor
Ludwig Institute for Cancer Research
Department of Cell and Molecular Biology
Karolinska Institutet

Feb 2012

Thursday, May 24, 12

hair cells hippocampal neuron

kidney cells

Mammals: 100s of cell types, tissues, organs, systems

muscle cells

Cell/Tissue differences are reflected in gene expression patterns


zygote blastocyst


All the information needed to encode an organism is captured in the genome of the zygote, together with the proteins that act on the genome


Transcriptome analyses

- rRNAs (dominating, ~95%)

- mRNAs (~5%)

- long non-coding RNAs (e.g. lincRNAs) (~0.05%)

- snoRNAs, snRNAs

- microRNAs, piRNAs

Different protocols identify different parts of the transcriptome

- rRNAs

- mRNAs

- long non-coding RNAs (e.g. lincRNAs)

- snoRNAs, snRNAs

- microRNAs, piRNAs

PolyA selection

RiboMinus (removal of ribosomal RNAs)

Not-so-random hexamers or DSN


small RNA protocol


Methods for sequence library generation


Isolate polyA+ RNA

mRNA-seq protocol

Wang et al. 2009 Nat Rev Gen

- polyA+ RNAs
- rRNA- RNAs
- short RNAs (e.g. miRNAs)
- Ribosome footprint sequencing
- GRO-Seq (Global Run-On sequencing)
- CLIP-Seq (RNA-protein interactions)
- non-RNA applications: ChIP-Seq, DNase hypersensitive sites, ...

Strand-specific RNA-Seq protocols

RNA-Seq generates quantitative expression estimates

[Figure: read coverage, log10(reads), in testes, liver, skeletal muscle and heart for transcripts AK074759, BC011574 and AK092689 (exons 3A and 3B); <10M reads.]

[Figure: brain/UHR expression ratio from RNA-Seq reads plotted against brain/UHR expression by Taqman, both axes from 10^-4 to 10^4; R = 0.953, slope = 0.933.]

[Figure: read density (RPKM) in exons (12.3), introns (0.13) and intergenic regions (0.10); annotated "150x".]

Mortazavi et al. Nat Methods 2008
Ramskold et al. PLoS Comp Biol 2009
Wang*, Sandberg* et al. Nature 2008


How gene expression levels are estimated

gene A (2 kb transcript)
gene B (600 bp transcript)

Fragmentation: the number of fragments is proportional to the abundance and length of the transcript.

Sequencing: ACGCG... TCGAG... AGGTA... CCGTG... CTGCG...

Normalize for different transcript lengths and different sequencing depths in different samples.

RPKM (reads per kilobase and million mappable reads). Given 10 million mappable reads:

$$\text{RPKM}_{\text{gene A}} = 500\ \text{reads} \times \frac{1000}{2000} \times \frac{10^{6}}{10^{7}} = \frac{500}{2 \times 10} = 25$$

RPKM roughly corresponds to transcripts per cell (Mortazavi et al. 2008), assuming a standard cell with ~300,000 transcripts.

Fragments PKM (FPKM)
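The normalization above can be written as a small function. This is a minimal sketch in Python; the function and argument names are illustrative and not taken from the slides. Only the gene A numbers (500 reads, 2 kb, 10 million mappable reads) come from the worked example.

```python
def rpkm(read_count, transcript_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mappable reads."""
    kilobases = transcript_length_nt / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Gene A from the worked example: 500 reads, 2 kb transcript, 10 M mappable reads
gene_a_rpkm = rpkm(500, 2_000, 10_000_000)
print(gene_a_rpkm)  # 25.0
```

For paired-end data the same arithmetic applied to fragments instead of reads gives FPKM.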


Gene quantification and mRNA copy numbers in cells

$$\frac{C}{N} = \frac{X L}{T}$$

$$X = \frac{C\,T}{N\,L} = \frac{R\,T}{10^{6} \cdot 10^{3}} = \frac{R\,T}{10^{9}}$$

C, number of reads mapping to transcript
N, total number of sequenced reads
X, copies per cell of transcript
T, total length of transcriptome
L, transcript length
R, RPKM (reads per kilobase and million mappable reads)

T can be estimated from
1. starting amount of mRNA
2. spiked-in controls
3. estimated transcriptome length: 300,000 transcripts of around 1,500 nt each -> 4.5 x 10^8 nt

- 1 RPKM ~ 0.5 transcripts per cell
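The relation X = R T / 10^9 can be checked numerically. The sketch below is illustrative: the function name is an assumption, and the transcriptome length uses the slide's rough estimate of 300,000 transcripts of about 1,500 nt each.

```python
def copies_per_cell(rpkm_value, transcriptome_length_nt):
    """Copies per cell X = R * T / 10^9, with R in RPKM and T in nucleotides."""
    return rpkm_value * transcriptome_length_nt / 1e9


# Rough transcriptome length: 300,000 transcripts of ~1,500 nt each
transcriptome_length = 300_000 * 1_500   # 4.5e8 nt
print(copies_per_cell(1.0, transcriptome_length))  # 0.45
```

Under these assumptions 1 RPKM comes out at 0.45 copies per cell, which is the "~0.5 transcripts per cell" quoted above.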


Use molecular barcodes

Kivioja et al. Nature Methods 9, 72–74 (2012)


RNA sequencing of blastocyst-derived cell lines

Read counts for selected genes

Gene     ES     TS     XEN    EpiSC
Nanog    6525   20     1      263
Cdx2     124    6256   1      1
Sox17    11     5      9814   99
Sox3     151    1234   6      796
Shh      0      0      0      1
Ihh      4      12     107    17
Dhh      10     212    575    80


Significance of expression level

Assumptions:
- background of ~0.05 RPKM
- detection level of 0.3 RPKM
- an average 1,500 nt transcript
- 20 M uniquely mapping reads

Background model: 0.05 x 1.5 x 20 = 1.5 reads

Expressed at 0.3 RPKM: 0.3 x 1.5 x 20 = 9 reads
A binomial test for 9 reads out of 20 M mapping to the transcript, given a background probability of 1.5 / (20 x 10^6), gives a p-value of 2.8e-5

Expressed at 1 RPKM: 1 x 1.5 x 20 = 30 reads

[Figure: expected read counts at 0.05 RPKM (background) and at 1 RPKM.]
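The binomial test can be reproduced with the standard library alone. The sketch below assumes a one-sided test (probability of seeing 9 or more reads from background alone) and a per-read background probability of 1.5 expected reads out of 20 million; the function name is illustrative.

```python
import math


def binomial_upper_tail(k_observed, n_trials, p_success):
    """P(X >= k_observed) for X ~ Binomial(n_trials, p_success)."""
    # P(X = 0), computed via log1p to stay accurate for a tiny p_success
    term = math.exp(n_trials * math.log1p(-p_success))
    lower = 0.0
    for k in range(k_observed):
        lower += term
        # Recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        term *= (n_trials - k) / (k + 1) * p_success / (1.0 - p_success)
    return 1.0 - lower


total_reads = 20_000_000
background_reads = 0.05 * 1.5 * 20            # expected reads from background
p_background = background_reads / total_reads
p_value = binomial_upper_tail(9, total_reads, p_background)
print(f"{p_value:.1e}")
```

With these inputs the result should land close to the 2.8e-5 quoted on the slide. With 20 million trials and such a small probability, a Poisson tail with mean 1.5 gives practically the same answer.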

Depth needed for accurate expression level estimation

[Figure: percentage of genes within ±20% of final expression versus million mapped reads (1 to 45), by expression bin: 1-9 RPKM (n=4338), 10-29 RPKM (n=3048), 30-99 RPKM (n=2817), 100-999 RPKM (n=1469), 1000-6705 RPKM (n=56).]

[Figure: percentage of genes within a given fold-change of final expression versus million mapped reads (1 to 45), for 2-fold, 1.5-fold, 1.2-fold, 1.1-fold and 1.05-fold.]

Mortazavi et al. 2008
Ramskold/Kavak et al. 2011 (book chapter)

Our default view of gene expression?

mRNA isoform regulation

- Alternative promoters
- Alternative splice sites (core / extension)
- Mutually exclusive exons (MXE1, MXE2)
- Skipped exons (5' exon, SE, 3' exon)
- Alternative polyadenylation (pA, pA)

Genome-wide detection of mRNA isoforms

• Expressed Sequence Tags
• Traditional 3' UTR-focused microarrays
• Exon and tiling arrays
• Deep sequencing using Illumina/Solexa, SOLiD, (454)

RNA-Sequencing: Challenges and opportunities

- Aligning RNA-Seq data
- Differential expression
- De novo transcriptome reconstruction
- Alternative RNA isoforms
- Gene fusion events
- SNPs and mutations
- RNA-Seq analysis pipelines
- Single-cell RNA-Sequencing


Mapping of millions of short reads

Task: Map millions of short sequences (25-100 nt) onto a genome (3,000 Mbp) or transcriptome

Mismatches (sequencing errors and SNPs)

Unique / Repetitive matches

Indels (Normal variation, CNVs)

Large rearrangements (translocations)

BLAST and BLAT were not designed for these tasks

Strategies for mapping splice junctions

- Compile known and putative splice junctions, then map reads against both the genome and the junctions
- Map reads against the genome to find exons, then search the unmapped reads against all combinations of exon-exon borders from the exons found (given some maximal distance)
- Split each read into smaller parts, map the parts separately, and find the junctions
- Map against the transcriptome, convert coordinates to the genome, and remove redundancies

Compilation of splice junctions

1. Compile sets of junctions: Exon n | GTAAGT-----------AG | Exon n+1
2. Map reads towards genome + junction compilation (genome chromosome FASTA files + known and putative splice junctions FASTA file)
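A toy version of the junction-compilation strategy is sketched below. It is not the pipeline used in the lecture: the exon sequences, the function names and the exact-substring matching are all assumptions made for illustration, and a real mapper would use an index and allow mismatches.

```python
def build_junction_library(exons, read_length):
    """Join the end of each exon to the start of every downstream exon.

    Keeping read_length - 1 bases on each side guarantees that any read
    matching a junction sequence crosses the exon-exon border.
    Assumes read_length >= 2.
    """
    flank = read_length - 1
    library = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            library[(i, j)] = exons[i][-flank:] + exons[j][:flank]
    return library


def map_to_junctions(read, library):
    """Return the exon pairs whose junction sequence contains the read exactly."""
    return [pair for pair, seq in library.items() if read in seq]


# Hypothetical exon sequences, for illustration only
exons = ["ACGTACGTAA", "GGCCTTGGCA", "TTTAGCATCG"]
library = build_junction_library(exons, read_length=6)

print(map_to_junctions("TAAGGC", library))  # read joining exon 0 to exon 1
print(map_to_junctions("GTAATT", library))  # read that skips exon 1
```

Reads that fail to map to the genome are the ones worth testing against such a library, since a read spanning two exons has no contiguous genomic match.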

TopHat method: identifying the transcriptome

- Identify candidate exons (A, B, C) via genomic mapping
- Generate possible pairings of exons
- Align "unmappable" reads to possible junctions

Longer reads

GATGTTCTCAGTGTCC GATGTAATCAGTGTCC AACCCTCTCAGTGTCC

>HWI-EAS229_75_30DY0AAXX:7:1:0:949

Very long (100 kb+) intron

By segmenting the long reads and mapping the segments independently, we can look harder for junctions we might have missed with shorter reads.

Running time is independent of intron size.
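The segment-and-map idea can be illustrated with a toy example. This sketch is not how TopHat is implemented: it uses exact string search on a made-up genome, ignores segments that match in several places, and all names and sequences are hypothetical.

```python
def map_segments(read, genome, segment_length):
    """Split a read into fixed-size segments and map each one by exact match.

    Returns (offset_in_read, genome_position) pairs; the position is -1 when
    a segment has no exact match, for example when it spans a junction.
    """
    hits = []
    for offset in range(0, len(read) - segment_length + 1, segment_length):
        segment = read[offset:offset + segment_length]
        hits.append((offset, genome.find(segment)))
    return hits


def infer_gap(hits):
    """Genomic distance between first and last mapped segment, minus their
    distance within the read. 0 means contiguous; a positive value is the
    total intron length spanned. Returns None with fewer than two hits."""
    mapped = [(off, pos) for off, pos in hits if pos >= 0]
    if len(mapped) < 2:
        return None
    (first_off, first_pos), (last_off, last_pos) = mapped[0], mapped[-1]
    return (last_pos - first_pos) - (last_off - first_off)


# Hypothetical gene: two exons separated by an 18 nt intron
exon1, intron, exon2 = "ACGTTGCAAGCT", "GTAAGTCCCCCCCCCCAG", "TTGACCGATAGC"
genome = exon1 + intron + exon2
read = exon1[-8:] + exon2[:8]      # a read spanning the exon-exon junction

hits = map_segments(read, genome, segment_length=4)
print(hits)
print(infer_gap(hits))             # recovers the intron length
```

Because each segment is looked up on its own, the cost of the lookup does not depend on how far apart the segments land, which is the point made above about intron size.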

Mapping to transcriptome

[Figure: a gene on genomic DNA (W and C strands) with exons, introns, 5' UTR and 3' UTR; transcription gives the pre-mRNA; RNA processing (splicing, polyadenylation) gives the polyadenylated mRNA.]

Microexons and junction coverage

[Figure: a gene on genomic DNA (W and C strands) with exons, introns, 5' UTR and 3' UTR; reads with 2 or more splice junctions within the same read; in-house mapping compared with TopHat mapping.]

Different read lengths will have different problems!

Mismapping rates for splice junctions

discovery of novel junctions


Mapping of RNA-Seq reads

Garber et al. 2011 Nat Methods


Paired reads mapping can be more accurate

Picking the right alignment: 2 mismatches vs. exact match

Bowtie reports the "best" alignment it comes across, but this isn't always the right one. To do a better job, we want paired-end reads.

Profile the mapped data to figure out how well the library prep worked

- Check the fraction of reads that mapped
- Check the fraction of splice junctions mapped
- (optional) Check how reads are positioned along transcripts
- and visualize the data!
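The first two checks can be approximated directly from a SAM file. The sketch below relies only on the SAM format itself (FLAG bit 0x4 marks an unmapped read, bits 0x100 and 0x800 mark secondary and supplementary records, and the CIGAR operator N marks a skipped region such as an intron). Note that it reports the fraction of mapped reads that span a junction, which is a proxy and not the same as the fraction of known junctions covered. The function name and the example records are made up.

```python
def profile_sam(lines):
    """Count mapped and spliced primary alignments in an iterable of SAM lines."""
    total = mapped = spliced = 0
    for line in lines:
        if line.startswith("@"):            # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x900:                    # skip secondary/supplementary records
            continue
        total += 1
        if flag & 0x4:                      # read is unmapped
            continue
        mapped += 1
        if "N" in cigar:                    # alignment skips reference (intron)
            spliced += 1
    return {
        "reads": total,
        "mapped_fraction": mapped / total if total else 0.0,
        "spliced_fraction": spliced / mapped if mapped else 0.0,
    }


# Made-up SAM records: two unspliced reads, one spliced read, one unmapped read
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t255\t20M300N30M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t16\tchr2\t50\t255\t50M\t*\t0\t0\t*\t*",
]
print(profile_sam(example))
```

On a real file the same function can be fed an open file handle, since it only iterates over lines.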


Visualization

Integrative Genomics Viewer, IGV (Broad Inst.)

Custom tracks at UCSC Genome Browser


Visualization of aligned reads (in IGV)

Integrative Genomics Viewer (IGV)

Imports many mentioned formats (SAM, BAM, BED etc)

Excellent for visualization of RNA-Sequencing or ChIP-sequencing data

Can also download/visualize data from public or private servers


Time for more quality controls: look at replicates and check that samples group by origin/type

Hierarchical clustering

PCA / SVD

[Figure: samples plotted on SVD component 1 versus SVD component 2 (axes from -100 to 150); PC3 (n=4), T24 (n=4) and LNCaP (n=4).]

Differential Expression

Either based on reads or RPKM values

$$\text{RPKM}_{\text{gene A}} = 500\ \text{reads} \times \frac{1000}{2000} \times \frac{10^{6}}{10^{7}} = \frac{500}{2 \times 10} = 25$$

Most tools developed for microarrays can be applied to RPKM values, whereas tools developed for RNA-Seq aim to use read counts

Reads
• have more statistical power
• have unresolved biases
• need fewer replicates?

RPKMs
• better understood statistics, but lack of power

Statistical models of differential expression

Non-coding RNAs in prostate cancer: expression and differential expression

Transcript length effects in differential expression tests

Oshlack and Wakefield Biology Direct 2009

p-values should not be the basis for sorting

RNA-Sequencing: Challenges and opportunities

- Library generation
- Aligning RNA-Seq data
- Gene expression calculations and differential expression
- De novo transcriptome reconstruction
- Alternative RNA isoforms
- Gene fusion events
- SNPs and mutations
- RNA-Seq analysis pipelines
- Single-cell RNA-Sequencing

Finding novel non-annotated genes or transcript variants

Two principal approaches for transcriptome reconstruction

Genome-guided transcriptome reconstruction

Scripture, Cufflinks

Transcript reconstruction

Nature Biotechnology, April 2010

Increased depth improves reconstruction

Nature Biotechnology, April 2010

Discovery of cell-type specific alternative isoforms

Nature Biotechnology, April 2010

Genome-independent transcriptome reconstruction

Grabherr et al. Nature Biotechnology, July 2011

Default k = 25
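The role of k can be illustrated with a toy k-mer graph. This is only a sketch of the general de Bruijn idea that genome-independent assemblers build on, not the method of the cited paper: it uses k = 4 on three made-up reads so the graph stays readable, and it ignores sequencing errors, coverage and branching.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unbranched(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:          # stop instead of looping on a cycle
            break
        contig += next_node[-1]
        seen.add(next_node)
        node = next_node
    return contig


# Three overlapping, made-up reads from the same hypothetical transcript
reads = ["ACGTTGCA", "GTTGCATC", "TGCATCGG"]
graph = build_de_bruijn(reads, k=4)
print(walk_unbranched(graph, "ACG"))
```

A larger k makes overlaps more specific but requires deeper coverage, because every k-mer of the transcript has to be observed at least once.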

Genome-independent transcriptome reconstruction: accuracy and coverage

Grabherr et al. Nature Biotechnology, July 2011


Alternative Splicing as a Switch and as a Tuner

Switching on the Fas receptor (Cascino et al. 1995)
- Exons 5-7 (exon 6 skipped): soluble, inhibition of apoptosis
- Exons 5-6-7: membrane-bound, apoptosis

Tuning the inner ear: splicing of calcium-activated potassium channels in hair cells (Ramanathan et al. 1999)
- Exons 2-v-3: low frequencies
- Exons 2-3: high frequencies

Detecting alternatively spliced exons

A theoretical example: a skipped exon (SE) event

Reads supporting:

          inclusion      exclusion
Brain     4 (1+2+1)      0
Liver     1              2

p ~ 0.14

Also counting reads on the flanking constitutive exons (5' CE, 3' CE):

          inclusion      exclusion
Brain     4 (1+2+1)      3 (2+1)
Liver     1              10 (4+2+4)

p < 0.05
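The slides do not name the test behind these p-values. A one-sided Fisher's exact test on the 2x2 table of read counts reproduces both of them, so the sketch below assumes that test; the function name is illustrative.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) with all margins fixed, for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(total - col1, row1 - x)
               for x in range(a, upper + 1))
    return tail / comb(total, row1)


# Rows: brain, liver. Columns: inclusion reads, exclusion reads.
p_junctions_only = fisher_one_sided(4, 0, 1, 2)
p_with_flanking = fisher_one_sided(4, 3, 1, 10)
print(round(p_junctions_only, 3), round(p_with_flanking, 3))
```

The first table gives 5/35, about 0.14, and the second gives about 0.047, which is how adding reads from the flanking constitutive exons pushes the same event below 0.05.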


Tissue-regulated mRNA isoforms

Wang*, Sandberg* et al. 2008 Nature

Coverage needed: alternative mRNA isoform events

Assuming: comparing two isoforms with 25% vs 75% inclusion levels

The power of paired-end reads (informative length)

Katz et al. Nature Methods 2010

From expression to regulation: "RNA maps"

Licatalosi D., et al. Nature 2008

Binding of NOVA splicing factor

[Figure 4, Wang*, Sandberg* et al. 2008 Nature. Recoverable panel content: exon inclusion level in heart versus a second tissue for skipped exons (SE) and mutually exclusive exons (MXE), with labelled examples TPM1 exon 2 and SLC25A3 exon 3 and the annotation "Heart: 92%, Brain: <1%"; cumulative frequency of switch score for SE and MXE (pKS = 3.7e-5); fraction frame-preserving by switch score bin (0-0.25, 0.25-0.5, 0.5-1; p = 6e-10, p = 8e-15, p = 0.01); mean phastCons score by position relative to splice junction (-50 to +50) for skipped and constitutive exons, grouped by SE switch score (low 0-0.25, medium 0.25-0.5, high 0.5-1.0); UGCAUG (Fox1/2) motif, -log10(p value), for tissue-biased inclusion and exclusion in breast, adipose, brain, cerebellum, colon, heart, liver, lymph node, skeletal muscle and testes.]

Deconvolution of mRNA isoform expression

• Model this as a signal separation problem (signal and image processing field)
• Improve with more even read densities over exons

Unique regions for different isoforms

[Figure: read coverage, log10(reads), in testes, liver, skeletal muscle and heart for transcripts AK074759, BC011574 and AK092689 (exons 3A and 3B).]

Fusion events, e.g. translocations in cancer

Ozsolak and Milos, Nature Rev Genet 2011


Single nucleotide polymorphism and mutations

- Estimate SNPs/mutations from RNA-Seq data using techniques similar to those for genomic variation, but
  - uneven coverage dictated by mRNA copy numbers
  - limited to coding sequence and untranslated regions

Quality controls on variations found


Allelic imbalance

Skelly et al. Genome Res 2011


Mixed species/strains experiments

- Mixed-species experiments allow mapping of host and pathogen interactions
- Parasite-host interactions
- Tumor-stroma interactions

Cross-strain experiments

Threshold for allele-specific expression?
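The slide leaves the threshold as an open question. One common starting point, shown here purely as an assumption and not as the lecture's answer, is an exact two-sided binomial test of the reads carrying each allele against a 50:50 expectation. The function name and the read counts are made up, and the sketch ignores reference-mapping bias and overdispersion between replicates.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Two-sided exact binomial p-value against a 50:50 allelic ratio."""
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    # Under p = 0.5 the distribution is symmetric, so double the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(allelic_imbalance_p(5, 5))    # balanced counts
print(allelic_imbalance_p(18, 2))   # strongly skewed counts
```

With few reads over a SNP even a strong skew is hard to call, which ties allele-specific analysis back to the coverage considerations discussed earlier.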


Conclusions

• RNA-seq enables genome-wide transcriptome quantification with more accurate and absolute expression estimates
• Low background enables quantification of lowly expressed transcripts (~1 copy per cell)
• Investigate alternative promoters, splicing and polyadenylation, and non-coding RNAs
• Allows for de novo transcriptome reconstruction and gene expression analyses in organisms without a reference genome