Differential Expressions: Multiple Treatments ANOVA Kruskal Wallis Factorial Set-up.


Differential Expressions: Multiple Treatments

ANOVA

Kruskal Wallis

Factorial Set-up


Some Examples:

• 1.   Is there a difference in the mean expression for three different conditions?

• 2.   Is there a difference in the mean sugar content of five different brands of cereal?

• 3.   Is there a difference in the mean Pb content of the three main lakes in Eastern WA (Coeur d’Alene, Liberty Lake and Newman Lake)?

• 4.   Is there a difference in the cholesterol ratio among the 4 groups: young males, young females, older males and older females?


Models

• In each case we are interested in comparing multiple means to each other.

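• MODEL: $Y = X\beta + \epsilon$

• X is now categorical, and the matrix takes only the values 1 or 0.

• ONE-WAY ANOVA: $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$

• TWO-WAY ANOVA: $Y_{ijk} = \mu + \beta_j + \tau_i + \epsilon_{ijk}$

• TWO-WAY ANOVA with INTERACTIONS: $Y_{ijk} = \mu + \beta_j + \tau_i + (\tau\beta)_{ij} + \epsilon_{ijk}$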
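As a minimal sketch of how a categorical X becomes a matrix of 0s and 1s, the Python snippet below builds the indicator (cell-means) design matrix for a one-way layout and fits it by least squares. The labels and expression values are made up for illustration, and the helper name `indicator_design` is hypothetical.

```python
import numpy as np

def indicator_design(labels):
    """Return (levels, X) where X[n, i] = 1 if observation n has level i, else 0."""
    levels = sorted(set(labels))
    X = np.array([[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels])
    return levels, X

# Hypothetical data: three conditions, two arrays each.
labels = ["A", "A", "B", "B", "C", "C"]
y = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 10.0])

levels, X = indicator_design(labels)
# Least-squares fit of Y = X b + e; with indicator columns, b holds the group means.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(levels, b)
```

With this coding each coefficient is a treatment mean; the $\mu + \tau_i$ form would instead use an intercept column plus a constraint on the $\tau_i$.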


One-Way ANOVA

• Let us deal with ONE-WAY ANOVA for simplicity.

• So the hypothesis of interest is:

• Ho: $\mu_1 = \mu_2 = \dots = \mu_k$

• Ha: at least one is unequal.

• So we are interested in finding whether or not at least one of the treatments is different from another.

• We are also interested in identifying WHICH ones are different.


Example

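• The logic behind ANOVA:

• The idea is that we decide whether the means are the same or not based on their variability: the variation between the treatment means is compared with the variation among replicates within each treatment.

• Assume that we wish to compare three mean expression levels based on five replicate arrays.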
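To illustrate why variability decides the question, the sketch below keeps the three sample means fixed and changes only the spread of the five replicates. The numbers are invented for illustration; `scipy.stats.f_oneway` is used for the one-way ANOVA F test.

```python
import numpy as np
from scipy import stats

means = [5.0, 6.0, 7.0]                            # hypothetical condition means
pattern = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])    # five replicates per condition

def make_groups(spread):
    """Three groups with the same sample means but a chosen within-group spread."""
    return [m + spread * pattern for m in means]

f_tight, p_tight = stats.f_oneway(*make_groups(0.1))
f_noisy, p_noisy = stats.f_oneway(*make_groups(1.0))
print(f"tight replicates: F = {f_tight:.1f}, p = {p_tight:.2g}")
print(f"noisy replicates: F = {f_noisy:.1f}, p = {p_noisy:.2g}")
```

The same differences between means look convincing when replicates agree closely and unconvincing when they do not.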


Model Based Approach To ANOVA

• Like multiple regression and simple linear regression, ANOVA can also be written in terms of a linear model:

• Cell-means model: $Y_{ij} = \mu_i + \epsilon_{ij}$

• OR: $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$

• Where:

• $Y_{ij}$ is our observed data

• $\mu$: our overall mean (grand mean)

• $\tau_i$: our effect from treatment i

• $\epsilon_{ij}$: our error terms.

• Our assumption is that the $\epsilon_{ij}$ are independent and follow a Normal distribution with mean 0 and variance $\sigma^2$.


Hypotheses

• So the hypotheses we are testing are:

• Ho: $\mu_1 = \mu_2 = \dots = \mu_k$

• Ha: at least one inequality

• or, equivalently, in terms of the treatment effects:

• Ho: $\tau_1 = \tau_2 = \dots = \tau_k = 0$

• Ha: at least one inequality


Partitioning the SS

• Here too, we divide the SSTotal into:

• SSTotal = SSModel + SSError

• df: N-1 = (r-1) + (N-r), where r is the number of treatments and N is the total number of observations

• $E(MSE) = \sigma^2$

• $E(MSModel) = \sigma^2 + \sum_i n_i \tau_i^2 / (r-1)$

• Hence under the null, the second term on the right drops out and E(MSModel)/E(MSE) = 1. Also, Cochran’s theorem indicates that the error chi-square and the model chi-square are independent under the null, hence MSModel/MSE follows an F distribution with r-1 and N-r degrees of freedom, and we can test the null using F critical points.
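The partition can be checked numerically. The following is a sketch on invented expression values (three treatments, five replicates each); it computes the sums of squares by hand and takes the p-value from the upper tail of the F distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical expression values: r = 3 treatments, 5 replicate arrays each.
groups = [
    np.array([4.1, 5.0, 4.6, 5.3, 4.8]),
    np.array([5.9, 6.4, 6.1, 5.6, 6.5]),
    np.array([7.2, 6.8, 7.5, 7.9, 7.1]),
]
y = np.concatenate(groups)
N, r = y.size, len(groups)
grand = y.mean()

ss_total = np.sum((y - grand) ** 2)
ss_model = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_model = ss_model / (r - 1)
ms_error = ss_error / (N - r)
F = ms_model / ms_error
p_value = stats.f.sf(F, r - 1, N - r)   # upper tail of F(r-1, N-r)
print(f"SSTotal={ss_total:.3f}  SSModel={ss_model:.3f}  SSError={ss_error:.3f}")
print(f"F={F:.2f}  p={p_value:.3g}")
```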


Follow-up Analysis

• Once we declare that there is an overall difference we want to see where the differences lie. We could be interested in the following:

• 1.   Comparing all pairs of treatments to each other

• 2.   Comparing some pre-chosen specific treatments to each other

• 3.   Comparing the treatments to a STANDARD treatment

• 4.   Comparing treatments to the BEST treatment

• When we are comparing all pairs of t treatments, we have t(t-1)/2 comparisons in total. Hence, if we perform each comparison at Type I error level alpha (say .05), our overall Type I error becomes VERY large.

• So there are different methods for controlling the Type I error.
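A rough feel for how fast the overall error grows comes from the sketch below. It assumes the comparisons are independent, which pairwise comparisons sharing the same data are not, so the numbers are only indicative. The function names are hypothetical.

```python
def n_pairs(t):
    """Number of pairwise comparisons among t treatments."""
    return t * (t - 1) // 2

def fwer_if_independent(m, alpha=0.05):
    """Chance of at least one false positive in m independent level-alpha tests."""
    return 1.0 - (1.0 - alpha) ** m

for t in (3, 5, 10):
    m = n_pairs(t)
    print(f"t={t:2d}  comparisons={m:2d}  "
          f"FWER~{fwer_if_independent(m):.2f}  Bonferroni level={0.05 / m:.4f}")
```

The last column is the per-comparison level that the Bonferroni method would use to keep the overall rate at or below 0.05.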


Multiple Comparison (MC) Methods

• 1.   Fisher’s LSD (controls the per-comparison error rate)

• a.    Essentially this is doing t(t-1)/2 pooled t tests (or confidence intervals) using the overall pooled variance, each at level alpha.

• b.   Easier to find significances (liberal)

• c.    Extremely high overall error rate for a large number of treatments

• 2.   Tukey’s HSD (controls the family-wise error rate)

• a.    Essentially does t(t-1)/2 pooled t tests (or confidence intervals), each at a level lower than alpha, so that the overall error rate is alpha.

• b.   Harder to find significances (conservative)

• c.    This is the exact method and would be recommended by statisticians if your sample sizes for each treatment are equal.

• For unequal sample sizes, other methods in use are the Bonferroni method, the Tukey-Kramer method, the Scheffe method, etc.

• There are MANY methods for multiple comparisons, and this is a very active research area in statistics.
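The contrast between the two approaches can be sketched on made-up data. The snippet assumes SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available, and computes the Fisher LSD p-values by hand as unadjusted pooled-variance t-tests.

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Hypothetical data: three treatments, five replicates each (equal sample sizes).
groups = [
    np.array([4.1, 5.0, 4.6, 5.3, 4.8]),
    np.array([5.9, 6.4, 6.1, 5.6, 6.5]),
    np.array([6.2, 6.8, 6.5, 6.9, 6.1]),
]
N, r = sum(g.size for g in groups), len(groups)
# Pooled error variance (MSE) from the one-way ANOVA, with N - r degrees of freedom.
mse = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (N - r)

tukey = stats.tukey_hsd(*groups)   # family-wise adjusted p-values, shape (r, r)

lsd_p = {}
for i, j in combinations(range(r), 2):
    diff = groups[i].mean() - groups[j].mean()
    se = np.sqrt(mse * (1 / groups[i].size + 1 / groups[j].size))
    # Fisher's LSD: an unadjusted two-sided t-test using the pooled variance.
    lsd_p[(i, j)] = 2 * stats.t.sf(abs(diff / se), N - r)
    print(f"{i} vs {j}: LSD p = {lsd_p[(i, j)]:.4f}, Tukey p = {tukey.pvalue[i, j]:.4f}")
```

For every pair the Tukey p-value is at least as large as the LSD p-value, which is the "harder to find significances" behaviour described above.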


Non-parametric Alternative

• Here we assume the same data structure as a one-way layout.

• The model: $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$

• Here we do not assume underlying normality any more, but still assume equal variance and independence.

• The hypotheses:

• Ho: $\tau_1 = \tau_2 = \dots = \tau_k = 0$

• Ha: at least one inequality


Procedure

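• Rank all observations jointly from smallest to largest. Let $r_{ij}$ be the rank of $Y_{ij}$.

• For i = 1…k, define $R_i = \sum_j r_{ij}$, $\bar{R}_{i\cdot} = R_i / n_i$, $\bar{R}_{\cdot\cdot} = (N+1)/2$

• Compute $H = \frac{12}{N(N+1)} \sum_i n_i (\bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot})^2$

• Reject H0 if $H > h(\alpha, k, n_1, \dots, n_k)$, the exact critical value

• Or if $H > \chi^2_{k-1,\alpha}$ (large sample approximation)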
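The procedure can be followed step by step in a few lines. This is a sketch on invented samples with no tied values (ties would need a correction factor that is not shown here); the p-value uses the large-sample chi-square approximation.

```python
import numpy as np
from scipy import stats

# Hypothetical samples with no tied values (ties would need a correction factor).
groups = [
    np.array([4.1, 5.0, 4.6, 5.3]),
    np.array([5.9, 6.4, 6.1, 5.6, 6.5]),
    np.array([7.2, 6.8, 7.5]),
]
y = np.concatenate(groups)
N, k = y.size, len(groups)

ranks = stats.rankdata(y)                                   # joint ranks r_ij
splits = np.split(ranks, np.cumsum([g.size for g in groups])[:-1])
mean_ranks = np.array([r.mean() for r in splits])           # R_i. = R_i / n_i
n = np.array([g.size for g in groups])
overall = (N + 1) / 2                                       # R..

H = 12.0 / (N * (N + 1)) * np.sum(n * (mean_ranks - overall) ** 2)
p_approx = stats.chi2.sf(H, k - 1)                          # large-sample approximation
print(f"H = {H:.3f}, approximate p = {p_approx:.4f}")
```

On untied data this agrees with `scipy.stats.kruskal(*groups)`, which can be used directly in practice.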


Multiple Treatments in Micro-arrays

• For microarrays, most of the ideas from differential expression (DE) for 2 conditions extend fairly easily to multiple conditions.

• However, here we have multiplicity from two different aspects: the multiple conditions and the multiple genes.

• There does not appear to be any consensus on HOW to do this.

• Most of what appears to be proposed is to use empirical Bayes (EB) methods.

• However, one can perform ANOVA F tests or Kruskal-Wallis tests for one gene at a time and rank the genes by the attribute of interest.
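The gene-at-a-time approach might look like the following sketch. The expression matrix is simulated (200 genes, three conditions, five arrays each, with one gene deliberately made to differ), and the helper `rowwise_anova_f` is a hypothetical name, not a library function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_rep = 200, 5
cond = np.repeat([0, 1, 2], n_rep)            # condition label for each of 15 arrays

# Simulated log-expression matrix (genes x arrays); gene 0 is made to differ strongly.
expr = rng.normal(0.0, 1.0, size=(n_genes, cond.size))
expr[0] += np.array([0.0, 5.0, 10.0])[cond]

def rowwise_anova_f(expr, cond):
    """One-way ANOVA F statistic and p-value for every row (gene) at once."""
    levels = np.unique(cond)
    N, r = expr.shape[1], levels.size
    grand = expr.mean(axis=1, keepdims=True)
    ss_model = np.zeros(expr.shape[0])
    ss_error = np.zeros(expr.shape[0])
    for lev in levels:
        block = expr[:, cond == lev]
        m = block.mean(axis=1, keepdims=True)
        ss_model += block.shape[1] * (m - grand)[:, 0] ** 2
        ss_error += np.sum((block - m) ** 2, axis=1)
    F = (ss_model / (r - 1)) / (ss_error / (N - r))
    return F, stats.f.sf(F, r - 1, N - r)

F, p = rowwise_anova_f(expr, cond)
ranking = np.argsort(-F)                      # genes ordered by evidence, strongest first
print("top five genes:", ranking[:5])
```

Ranking by F (or by p-value) sidesteps the question of where to draw a significance cutoff across thousands of genes.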


Linear Models Approach

• We will do a brief discussion on the linear model approach.

• Analyze all arrays together, combining information in an optimal way

• Here we use combined estimation of precision

• Extensible to arbitrarily complicated experiments

• Design matrix: specifies RNA targets used on arrays

• Contrast matrix: specifies which comparisons are of interest
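The roles of the two matrices can be sketched with plain NumPy. The array layout, the expression values and the chosen comparisons (B versus A, C versus A) are all invented for illustration, and this does not reproduce any particular package's interface.

```python
import numpy as np

# Hypothetical layout: 6 single-channel arrays, RNA targets A, A, B, B, C, C.
targets = ["A", "A", "B", "B", "C", "C"]
levels = ["A", "B", "C"]
design = np.array([[float(t == lev) for lev in levels] for t in targets])   # 6 x 3

# Contrast matrix: one column per comparison of interest (B - A and C - A).
contrasts = np.array([[-1.0, -1.0],
                      [ 1.0,  0.0],
                      [ 0.0,  1.0]])

# Made-up log-expression values, one row per gene, one column per array.
expr = np.array([[1.0, 1.2, 3.0, 3.2, 1.0, 0.8],
                 [5.0, 5.4, 5.1, 5.3, 7.0, 7.4]])

# Fit every gene's linear model in one call: expr.T = design @ coef.T + error.
coef = np.linalg.lstsq(design, expr.T, rcond=None)[0].T    # genes x 3 (group means)
estimates = coef @ contrasts                                # genes x 2 (log fold changes)
print(estimates)
```

Changing the experiment means changing only `design`; changing the question means changing only `contrasts`.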


Parallel inference for genes

• 10,000-40,000 linear models

• Curse of dimensionality: Need to adjust for multiple testing, e.g., control family-wise error rate (FWE) or false discovery rate (FDR).

• Boon of parallelism: Can borrow information from one gene to another.
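Two common adjustments can be sketched as follows: Bonferroni for the family-wise error rate and the Benjamini-Hochberg step-up procedure for the false discovery rate. The choice of these two methods and the p-values are assumptions made for illustration; Benjamini-Hochberg is usually justified under independence or positive dependence of the tests.

```python
import numpy as np

def bonferroni(p):
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.005]      # made-up per-gene p-values
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```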


Estimating hyper-parameters

• Closed form estimators with good properties are available:

• for $c_0$ in terms of quantiles of the $|\tilde{t}_g|$

• for $s_0$ and $d_0$ in terms of the first two moments of $\log s^2$.


Within-array replicate spots

• Replicate spots of each gene on the same array; assume duplicates at regular spacing

• Assume the spatial component of correlation between duplicates is the same for each gene

• Estimate the spatial correlation from a consensus estimator across genes

• Greatly improves estimation of precision


How many genes are differentially expressed?

• Log-ratios don’t appear to be normally distributed, and this is hard to check

• Log-ratios for different genes are correlated in unknown way

• High level of multiple testing means that very small p-values are required – distributional assumptions must hold in extreme tail

• Little opportunity for usual CLT results to apply


Ranking easier than testing

• If there was only one gene, a t-test would give a reliable p-value for judging whether the true log-ratio was zero

• With many genes, computed p-values cannot be trusted (unless we have > 16 arrays)

• It is more realistic to rank the genes in order of evidence for differential expression.


LIMMA Package for R

• Linear models for microarray data. A software package for the R programming environment.

• Focus is differential expression including

• - moderated t-statistics

• - methods for duplicate spots

• - classifying F-tests

• - stemmed heat diagrams

• Available from www.bioconductor.org
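The idea behind a moderated t-statistic can be sketched in Python, assuming the usual form of variance shrinkage in which each gene's sample variance is pulled towards a prior value $s_0^2$ with prior degrees of freedom $d_0$. Here $s_0^2$ and $d_0$ are supplied by hand rather than estimated from the data, the numbers are invented, and the function is an illustration, not the package's own code.

```python
import numpy as np

def moderated_t(beta, s2, d, unscaled_var, s0_sq, d0):
    """Moderated t-statistics from gene-wise estimates and assumed prior values.

    beta, s2     : per-gene effect estimates and residual variances
    d            : residual degrees of freedom
    unscaled_var : variance multiplier of beta (e.g. 1/n1 + 1/n2)
    s0_sq, d0    : prior variance and prior degrees of freedom (assumed known here)
    """
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)      # shrink each variance towards s0_sq
    t_mod = beta / np.sqrt(s2_post * unscaled_var)
    return t_mod, s2_post, d0 + d                    # statistic, variance, total df

beta = np.array([1.0, 1.0, 1.0])                     # same effect size for three genes
s2 = np.array([0.01, 0.25, 4.0])                     # very different sample variances
t_ord = beta / np.sqrt(s2 * 0.5)                     # ordinary t-statistics
t_mod, s2_post, df = moderated_t(beta, s2, d=4, unscaled_var=0.5, s0_sq=0.25, d0=4)
print("ordinary :", np.round(t_ord, 2))
print("moderated:", np.round(t_mod, 2))
```

A gene with an accidentally tiny sample variance no longer produces an enormous statistic, which is the practical benefit of borrowing information across genes.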


Microarray time course experiments: types/features

• Typically short series: k = 4-10 time points for shorter, and 11-20 time points for longer series;

• Often irregularly spaced, with no or few (< 5) replications.

• Can be periodic, OR

• May have no particular pattern, as in developmental time courses.


Time Series Issues

• May be longitudinal, where mRNA samples at different times are extracted from the same unit (cell line, tissue or individual), but more commonly cross-sectional, where mRNA samples are from different units.

• Gene expression values at different time points may be correlated, especially in a longitudinal study, or when a common reference design is used for a cross-sectional study.

• At other times, the experimental design induces correlations in cross-sectional studies.


Issues… contd

• Two general types of hypotheses of interest: the one-sample (or one-class) problem: which genes are changing in time?

• and the 2 or >2 sample (or class) problem: which genes are changing differently in time across the samples (or classes)?

• Two broad types of mRNA samples: from cells or cell lines which give reasonably repeatable responses within classes, and whole organism (mice, humans), where there is a lot of response variability within classes.


Analyzing time series data

• Generally, time is used as a FACTOR in microarray (MA) experiments with time series, and the contrasts of interest are defined on its levels.

• These are then tested using traditional ANOVA or empirical Bayes (EB) methods.
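Treating time as a factor could look like the sketch below for a single gene. The time points, replicate values and the choice of contrasts (each later time point versus time 0) are invented for illustration, and the tests are ordinary pooled-variance contrast t-tests rather than EB versions.

```python
import numpy as np
from scipy import stats

# Hypothetical single-gene time course: 4 irregular time points, 3 replicates each.
times = np.repeat([0, 2, 6, 24], 3)                 # hours, used only as factor labels
y = np.array([1.0, 1.2, 0.8,  1.1, 0.9, 1.3,  2.0, 2.3, 2.0,  3.1, 2.9, 3.0])

levels = np.unique(times)
means = np.array([y[times == t].mean() for t in levels])
n = np.array([(times == t).sum() for t in levels])
df_error = y.size - levels.size
mse = sum(np.sum((y[times == t] - y[times == t].mean()) ** 2) for t in levels) / df_error

# Contrasts of interest: each later time point versus the baseline (time 0).
for k in range(1, levels.size):
    c = np.zeros(levels.size)
    c[0], c[k] = -1.0, 1.0
    est = c @ means
    se = np.sqrt(mse * np.sum(c ** 2 / n))
    p = 2 * stats.t.sf(abs(est / se), df_error)
    print(f"{int(levels[k]):>2}h vs 0h: estimate = {est:.2f}, t = {est / se:.2f}, p = {p:.4f}")
```

Because time enters only as a label, irregular spacing causes no difficulty, but the ordering of the time points is ignored.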