RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential...

13
RNAseq: Normalization and differential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology. 2010 Hardcastle, Kelly. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010

Transcript of RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential...

Page 1: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

RNAseq:Normalization and differential expression I

Jens Gietzelt

22.05.2012

Robinson, Oshlack. A scaling normalization method for differentialexpression analysis of RNA-seq data. Genome Biology. 2010

Hardcastle, Kelly. baySeq: Empirical Bayesian methods foridentifying differential expression in sequence count data. BMCBioinformatics. 2010

Page 2: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

Outline of the presentation

1 Introduction

2 Pairwise calibration (EdgeR)

3 Differential expression

Jens Gietzelt

Page 3: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

Introduction

normalization:

comparison of expression levels between genes within a sample(same scale)

however technical effects introduce a bias in the comparison betweensamples

⇒ normalization is crucial before performing differential expression

calibration method EdgeR takes advantage of within-samplecomparability

differential expression:

appropriate distribution for count data

incorporate calibration parameters

Jens Gietzelt

Page 4: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

Framework

Yg ,k ... observed count for gene g in library k

Nk =G∑

g=1Yg ,k ... total number of reads for library k

ηg ,k ... number of transcripts of gene g in library kLg ... length of gene g

Sk =G∑

g=1ηg ,kLg ... total RNA output of sample k

E (Yg ,k) =ηg ,kLg

SkNk

counts are a linear function of the number of transcripts

library size calibration (Yg ,k/Nk) is appropriate for the comparisonof replicates

comparison of biologically different samples may be biased byvarying RNA composition

Jens Gietzelt

Page 5: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

kidney vs. liver dataset

Page 6: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

Trimmed mean of log-foldchange

RNA production Sk of one sample cannot be determined directly

estimation of relative differences of RNA production fk = Sk/Sr of apair of samples (k , r)

assumption: most genes are not differentially expressed

⇒ compute robust mean over log-foldchanges:

double filtering over both mean and difference of log-valuescalculate a weighted mean over the log-foldchanges⇒ resacle factors fk = TMM(k,r), where r is reference sample

log2

(TMM(k,r)

)=

∑g∈G∗

wg ,(k,r) (log2 (Yg ,k/Nk)− log2 (Yg ,r/Nr ))∑g∈G∗

wg ,(k,r)

wg ,(k,r) =

(1

Yg ,k− 1

Nk+

1

Yg ,r− 1

Nr

)−1

Jens Gietzelt

Page 7: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

kidney vs. liver dataset

Page 8: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

Simulation: pair of samples

simulated data sampled from poisson distribution

Page 9: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

Simulation: replicates

Cloonan: log-transformation and quantile normalization

Page 10: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

Differential expression

methods in use:

DegSeq (normal distr.)

EdgeR (negative binomial)

DEseq (negative binomial, multiple groups)

baySeq (negative binomial, multiple groups)

Myrna (permutation based)

Jens Gietzelt

Page 11: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

EdgeR

technical replicates: poisson distr.biologically different samples: negative binomial distr.

Y ∼ NB (p, m)

Y ... number of successes in a sequence of Bernoulli trials withprobability p before r failures occur

alternative parametrization:qg ,e ... proportion of sequenced RNA of gene g for experimental group e

Yg ,k,e ∼ NB (qg ,eNk fk , φg )

E (Yg ,k,e) = µg ,k,e = qg ,eNk fk , Var (Yg ,k,e) = µg ,k,e + µ2g ,k,eφg

test if qg ,1 is significantly different from qg ,2

dispersons φg are moderated towards a common disperson

Jens Gietzelt

Page 12: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

baySeq I

empirical Bayes approach to detect differential expression

Dg = {Yg ,k , Nk , fk}k=1,...,KM ... user specified modelθM ... vector of parameters of model M

P (M|Dg ) =P (Dg |M) P (M)

P (Dg )

calculate marginal likelihood:

P (Dg |M) =

∫P (Dg |θM ,M) P (θM |M) dθM

Jens Gietzelt

Page 13: RNAseq: Normalization and differential expression I · RNAseq: Normalization and di erential expression I Jens Gietzelt 22.05.2012 Robinson, Oshlack. A scaling normalization method

IntroductionPairwise calibration (EdgeR)

Differential expression

baySeq II

P (Dg |M) =

∫P (Dg |θM ,M) P (θM |M) dθM

e.g. Poisson-Gamma conjugacy, however no such conjugacy withnegative binomial data

⇒ define an empirical distribution on θM and estimate the marginallikelihood numerically

prior P (M) is estimated by iteration:

P (M) = pg , p∗g = P (M|Dg )

baySeq:

applicable to complex experimental designs

computationally intensive

Jens Gietzelt