![Page 1: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/1.jpg)
Fishing expeditions in gloomy waters: Detecting differential expression in
microarray data
Matthias E. Futschik
Institute for Theoretical Biology
Humboldt-University, Berlin, Germany
Hvar Summer School, 2004
![Page 2: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/2.jpg)
Overview
• Starting points: Where are we?
  • Gene expression matrix
• Data pre-processing
  • Background subtraction
  • Data transformation
• Normalisation
  • Hybridisation model
  • Within-slide normalisation
  • Local regression
• Detection of differential expression
  • Hypothesis testing
  • Statistical tests
![Page 3: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/3.jpg)
Roadmap: Where are we?
Good news: We are almost ready for 'higher' data analysis!
![Page 4: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/4.jpg)
Data-Preprocessing
Background subtraction:
• May reduce spatial artefacts
• May increase variance, as both foreground and background intensities are estimates ("arrow-like" MA-plots)
Thresholding: exclusion of low-intensity spots or spots that show saturation
Transformation: A common transformation is the log-transformation, which stabilises the variance across the intensity scale and helps to detect dye-related bias.
[Figure: Log-transformation]
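The three pre-processing steps on this slide can be sketched for a single spot as follows. This is only an illustration: the function name, the intensity floor of 50 and the saturation value of 65535 (a 16-bit scanner) are assumptions, not values from the slides.

```python
import math


def preprocess_spot(fg_red, bg_red, fg_green, bg_green,
                    floor=50.0, saturation=65535.0):
    """Return (M, A) for one spot, or None if the spot is excluded.

    `floor` and `saturation` are hypothetical thresholds chosen for
    illustration; suitable values depend on the scanner and the data.
    """
    # Thresholding, part 1: drop spots saturated in either channel.
    if fg_red >= saturation or fg_green >= saturation:
        return None
    # Background subtraction (both terms are estimates, so the variance grows).
    red = fg_red - bg_red
    green = fg_green - bg_green
    # Thresholding, part 2: drop low-intensity spots.
    if red < floor or green < floor:
        return None
    # Log-transformation: log-ratio M and mean log-intensity A.
    m = math.log2(red / green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a
```

For example, `preprocess_spot(1124, 100, 356, 100)` yields a four-fold ratio, i.e. M = 2, at A = 9.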
![Page 5: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/5.jpg)
The problem:
Are all low-intensity genes down-regulated? Are all genes spotted on the left side up-regulated?
![Page 6: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/6.jpg)
Hybridisation model
• Microarrays do not assess gene activities directly, but indirectly by measuring the fluorescence intensities of labelled target cDNA hybridised to probes on the array. So how do we get what we are interested in? Answer: Find the relation between fluorescence spot intensities and mRNA abundance!
• Explicitly modelling the relation between signal intensities and changes in gene expression can separate the measured error into systematic and random errors.
• Systematic errors are errors which are reproducible and might be corrected in the normalisation procedure, whereas random errors cannot be corrected, but have to be assessed by replicate experiments.
![Page 7: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/7.jpg)
Hybridisation model for two-colour arrays
A first attempt: For two-colour microarrays, the fundamental variables are the fluorescence intensities of spots in the red (Ir) and the green channel (Ig). These intensities are functions of the abundance of labelled transcripts Ar/g. Under ideal circumstances, this relation of I and A is linear up to an additional experimental error ε:
I = N(θ) A + ε
N: normalisation factor determined by experimental parameters θ, such as the laser power and the amplification of the scanned signal.
Frequently, however, this simple relation does not hold for microarrays due to effects such as background intensity and saturation.
![Page 8: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/8.jpg)
Hybridisation model for two-colour arrays
Let's try a more flexible approach based on the ratio R (pairing of intensities reduces variability due to spot morphology):
R = Ir / Ig = kr(θ) Ar / (kg(θ) Ag)
After some calculus (homework! I will check it tomorrow) we get
M - κ(θ) = D + ε
with
M = log2(Ir/Ig)
D = log2(Ar/Ag)
κ: non-linear normalisation factors (functions) dependent on experimental parameters.
How do we get κ(θ)?
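One possible route through the "homework" is sketched below. It rests on two assumptions that the slide does not state: the error acts multiplicatively on the raw ratio (so it becomes the additive term ε on the log scale), and κ(θ) is identified with the log-ratio of the two channel factors.

```latex
\begin{align}
R &= \frac{I_r}{I_g} = \frac{k_r(\theta)\,A_r}{k_g(\theta)\,A_g} \\
\log_2 R &= \log_2\frac{k_r(\theta)}{k_g(\theta)} + \log_2\frac{A_r}{A_g} \\
M &= \kappa(\theta) + D + \varepsilon,
  \qquad \kappa(\theta) := \log_2\frac{k_r(\theta)}{k_g(\theta)} \\
M - \kappa(\theta) &= D + \varepsilon
\end{align}
```

Read this way, normalisation amounts to estimating κ(θ) from the data and subtracting it from the observed log-ratio M.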
![Page 9: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/9.jpg)
Normalization – bending data to make it look nicer...
Normalization describes a variety of data transformations aiming to correct for experimental variation
![Page 10: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/10.jpg)
Within-array normalization
• Normalization based on 'housekeeping genes' assumed to be equally expressed in the different samples of interest
• Normalization using 'spiked-in' genes: adjustment of intensities so that control spots show equal intensities across channels and arrays
• Global linear normalisation assumes that overall expression in the samples is constant. Thus, the overall intensity of both channels is linearly scaled to have the same value.
• Non-linear normalisation assumes symmetry of differential expression across the intensity scale and the spatial dimensions of the array
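As a small illustration of global linear normalisation: scaling one channel by a constant factor is the same as shifting all log-ratios M by a constant. The sketch below uses the median of M as that constant; the choice of the median (rather than, say, the mean) and the function name are assumptions.

```python
import statistics


def global_median_normalise(m_values):
    """Shift all log-ratios so that their median becomes zero.

    On the log scale, a constant shift corresponds to a linear
    rescaling of one channel relative to the other.
    """
    shift = statistics.median(m_values)
    return [m - shift for m in m_values]
```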
![Page 11: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/11.jpg)
Normalization by local regression
Regression of local intensity >> residuals are 'normalized' log-fold changes
Common presentation: MA-plots
A = 0.5 * log2(Cy3*Cy5)
M = log2(Cy5/Cy3)
>> Detection of intensity-dependent bias!
Similarly, MXY-plots for detection of spatial bias. M, and thus κ, is a function of A, X and Y.
Normalized expression changes show symmetry across the intensity scale and slide dimensions.
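The idea of taking residuals of a local regression as normalised log-fold changes can be sketched in plain Python. This is a deliberately simple local linear smoother with tricube weights, not the lowess or locfit implementations referred to in these slides; the function names and the default span are assumptions, and the quadratic running time makes it suitable for illustration only.

```python
def tricube(u):
    """Tricube kernel: smooth weight that falls to 0 at |u| = 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_fit(x, y, x0, span):
    """Weighted straight-line fit around x0 using the nearest span*n points."""
    n = len(x)
    k = max(2, min(n, round(span * n)))
    # Bandwidth: distance to the k-th nearest point, slightly inflated so
    # that this point still receives a small positive weight.
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * 1.0001 + 1e-12
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    # Fall back to the weighted mean if all weighted points share one x.
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def normalise_ma(a, m, span=0.4):
    """Return residuals M - Mreg, where Mreg is the local fit of M on A."""
    return [mi - local_linear_fit(a, m, ai, span) for ai, mi in zip(a, m)]
```

If M depends on A through a purely linear dye bias, the residuals returned by `normalise_ma` are zero up to rounding, which is a convenient sanity check.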
![Page 12: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/12.jpg)
Normalisation by local regression and problem of model selection
Example: Correction of intensity-dependent bias in data by loess (MA-regression: A = 0.5*(log2(Cy5) + log2(Cy3)); M = log2(Cy5/Cy3))
[Figure panels: Raw data; Local regression; Corrected data]
Correction: M - Mreg
However, local regression, and thus the correction, depends on the choice of parameters.
Different choices of parameters lead to different normalisations.
![Page 13: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/13.jpg)
Optimising by cross-validation and iteration
Iterative local regression by locfit (C. Loader):
1) GCV of MA-regression
2) Optimised MA-regression
3) GCV of MXY-regression
4) Optimised MXY-regression
2 iterations are generally sufficient.
[Figure: GCV of MA]
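To make the model-selection step more concrete, here is a sketch of choosing the smoothing span by generalised cross-validation (GCV). It uses a common definition, GCV = n · RSS / (n - tr(H))², where H is the smoother ("hat") matrix, together with a simple local linear smoother with tricube weights. It does not call locfit; the function names and the candidate spans are assumptions made for illustration.

```python
def smoother_weights(x, x0, span):
    """Weights l_j such that the local linear fit at x0 equals sum(l_j * y_j)."""
    n = len(x)
    k = max(3, min(n, round(span * n)))
    # Bandwidth: distance to the k-th nearest point, slightly inflated.
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * 1.0001 + 1e-12
    # Tricube weights, zero outside the bandwidth.
    w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        return [wi / sw for wi in w]  # degenerate case: weighted mean
    return [wi / sw + (x0 - mx) * wi * (xi - mx) / sxx for wi, xi in zip(w, x)]


def gcv_score(x, y, span):
    """GCV = n * RSS / (n - trace(H))^2 for the local linear smoother."""
    n = len(x)
    rss = 0.0
    trace = 0.0
    for i in range(n):
        l = smoother_weights(x, x[i], span)
        fit = sum(lj * yj for lj, yj in zip(l, y))
        rss += (y[i] - fit) ** 2
        trace += l[i]  # diagonal element of the hat matrix
    return n * rss / (n - trace) ** 2


def best_span(x, y, candidates=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Return the candidate span with the smallest GCV score."""
    return min(candidates, key=lambda s: gcv_score(x, y, s))
```

In the iterative scheme above, a selection like `best_span` would be run once for the MA-regression and once for the MXY-regression (the latter needs a two-dimensional smoother, which this sketch does not cover).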
![Page 14: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/14.jpg)
Optimised local scaling
Iterative regression of M and spatially dependent scaling of M:
1) GCV of MA-regression
2) Optimised MA-regression
3) GCV of MXY-regression
4) Optimised MXY-regression
5) GCV of abs(M)XY-regression
6) Scaling of abs(M)
![Page 15: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/15.jpg)
Comparison of normalisation procedures
MA-plots:
1) Raw data
2) Global lowess (Dudoit et al.)
3) Print-tip lowess (Dudoit et al.)
4) Scaled print-tip lowess (Dudoit et al.)
5) Optimised MA/MXY regression by locfit
6) Optimised MA/MXY regression with scaling
=> Optimised regression leads to a reduction of variance (bias)
![Page 16: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/16.jpg)
Comparison II: Spatial distribution
[Figure: MXY-plots]
MXY-plots can indicate spatial bias
=> Not optimally normalised data show spatial bias
![Page 17: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/17.jpg)
Averaging by sliding window reveals un-corrected bias
Distribution of median M within a window of 5x5 spots:
=> Spatial regression requires optimal adjustment to data
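The sliding-window averaging can be sketched as follows, assuming the log-ratios are stored as a list of rows matching the spot layout on the slide. Truncating the window at the array border is an assumption; the slides do not say how edges were handled.

```python
import statistics


def window_medians(m_grid, size=5):
    """Median of M in a size x size window centred on every spot.

    Windows are truncated at the array border (an assumption).
    """
    rows, cols = len(m_grid), len(m_grid[0])
    half = size // 2
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [m_grid[i][j]
                    for i in range(max(0, r - half), min(rows, r + half + 1))
                    for j in range(max(0, c - half), min(cols, c + half + 1))]
            row.append(statistics.median(vals))
        out.append(row)
    return out
```

Regions where the window medians stay consistently above or below zero point to spatial bias that the normalisation has not removed.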
![Page 18: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/18.jpg)
Statistical significance testing by permutation test
What is the probability of observing a given median M within a window by chance?
[Figure: original distribution of M; randomised distributions Mr1, Mr2, Mr3]
Comparison with the empirical distribution => Calculation of probability (p-value) using Fisher's method
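A permutation test of this kind can be sketched by comparing the observed window median with medians of randomly drawn spots. Note the swap: the sketch computes a one-sided p-value by simple empirical counting, which is not necessarily the calculation used on the slide. The function name, the default number of permutations and the seed are assumptions.

```python
import random
import statistics


def permutation_p_value(m_values, window_indices, n_perm=1000, seed=0):
    """One-sided empirical p-value for a large positive window median.

    Drawing len(window_indices) spots at random, without replacement,
    mimics the content of a window after shuffling spot positions.
    """
    rng = random.Random(seed)
    observed = statistics.median(m_values[i] for i in window_indices)
    k = len(window_indices)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(m_values, k)
        if statistics.median(sample) >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

For negative M the same function can be applied to the negated values, which gives the two sets of p-values shown separately for positive and negative M.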
![Page 19: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/19.jpg)
Statistical significance testing by permutation test
Histogram of p-values for a window size of 5x5
Number of permutations: 10^6
[Figure panels: p-values for negative M; p-values for positive M]
![Page 20: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/20.jpg)
Statistical significance testing by permutation test
MXY of p-values for a window size of 5x5
Number of permutations: 10^6
Red: significant positive M
Green: significant negative M
M. Futschik and T. Crompton, Genome Biology, to appear
![Page 21: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/21.jpg)
Normalization makes results of different microarrays comparable
Between-array normalization:
• Scaling of arrays linearly or e.g. by quantile-quantile normalization
• Usage of linear models, e.g. ANOVA or mixed models:
y_ijg = µ + A_i + D_j + AD_ig + G_g + VG_kg + DG_jg + AG_ig + ε_ijg
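Quantile normalisation, one of the between-array options mentioned here, can be sketched in a few lines: every array is given the same distribution, namely the mean of the sorted values across arrays. The function name and the tie handling (ties keep their original order) are assumptions.

```python
def quantile_normalise(arrays):
    """Give every array the same empirical distribution.

    `arrays` is a list of equally long lists, one per microarray.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the i-th smallest values across arrays.
    mean_quantiles = [sum(s[i] for s in sorted_arrays) / len(arrays)
                      for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalised = [0.0] * n
        for rank, idx in enumerate(order):
            normalised[idx] = mean_quantiles[rank]
        result.append(normalised)
    return result
```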
![Page 22: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/22.jpg)
Going fishing: What is differentially expressed?
Classical hypothesis testing:
1) Setting up the null hypothesis H0 (e.g. gene X is not differentially expressed) and the alternative hypothesis Ha (e.g. gene X is differentially expressed)
2) Using a test statistic to compare observed values with the values predicted under H0
3) Defining a region of the test statistic for which H0 is rejected in favour of Ha
![Page 23: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/23.jpg)
Significance of differential gene expression
Typical test statistics:
1) Parametric tests (e.g. t-test, F-test) assume a certain type of underlying distribution
2) Non-parametric tests (e.g. sign test, Wilcoxon rank test) have less stringent assumptions
P-value: probability of obtaining a result at least as extreme as the observed one by chance, i.e. if H0 is true
Two kinds of errors in hypothesis testing:
1) Type I error: detection of a false positive
2) Type II error: detection of a false negative
Level of significance: α = P(Type I error)
Power of test: 1 - P(Type II error) = 1 - β
![Page 24: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/24.jpg)
Detection of differential expression
• What makes differential expression differential expression? What is noise?
• Fold changes are commonly used to quantify differential expression but can be misleading (intensity-dependent).
• Basic challenge: Large number of (dependent/correlated) variables compared to small number of replicates (if any).
Can you spot the interesting spots?
![Page 25: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/25.jpg)
Criteria for gene selection
Accuracy: how close the results are to the true values
Precision: how variable the results are compared to the true value
Sensitivity: how many true positives are detected
Specificity: how many of the selected genes are true positives
![Page 26: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/26.jpg)
Multiple testing poses challenges
>> Multiple testing required with large number of tests but small number of replicates.
>> Adjustment of significance of tests necessary
Example: Probability of rejecting at least one true H0 at α = 0.01 in 100 independent tests: P = 1 - (1-α)^100 ≈ 0.63
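The arithmetic behind the example on this slide is easy to reproduce. The sketch assumes independent tests, as the slide does; the function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false rejection in n independent tests."""
    return 1 - (1 - alpha) ** n_tests
```

With α = 0.01 and 100 tests this gives about 0.63. Testing each hypothesis at α/100 instead, as in a Bonferroni correction, brings the familywise rate back to just under 0.01.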
![Page 27: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/27.jpg)
Compound error measures:
Per comparison error rate: PCER = E[V]/N
Familywise error rate: FWER = P(V≥1)
False discovery rate: FDR = E[V/R]
N: total number of tests
V: number of rejected true H0 (FP)
R: number of rejected H0 (TP+FP)
Aim to control the error rate:
1) by p-value adjustment (single-step and step-down procedures: Bonferroni, Holm, Westfall-Young, ...)
2) by direct comparison with a background distribution (commonly generated by random permutation)
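Three common p-value adjustments are sketched below: Bonferroni (single-step, controls the FWER), Holm (step-down, controls the FWER) and Benjamini-Hochberg (controls the FDR). The Benjamini-Hochberg procedure is not named on the slide; it is included here as one widely used way of controlling the FDR. The function names are assumptions.

```python
def bonferroni(p):
    """Single-step adjustment: multiply every p-value by the number of tests."""
    n = len(p)
    return [min(1.0, pi * n) for pi in p]


def holm(p):
    """Step-down adjustment: smaller multipliers for larger p-values."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])  # ascending p-values
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (n - rank) * p[idx]))
        adjusted[idx] = running_max  # enforce monotone adjusted values
    return adjusted


def benjamini_hochberg(p):
    """FDR adjustment (step-up), processed from the largest p-value down."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for step, idx in enumerate(order):
        rank = n - step  # rank of p[idx] among the ascending p-values
        running_min = min(running_min, p[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene is then called significant if its adjusted p-value falls below the chosen error level.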
![Page 28: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/28.jpg)
Alternative approach: Treat spots as replicates
For direct comparison: Gene X is significantly differentially expressed if the corresponding fold change falls in the chosen rejection region. The parameters of the underlying distribution are derived from all or a subset of genes.
Since gene expression is usually heteroscedastic with respect to abundance, the variance has to be stabilised by local variance estimation. Alternatively, local estimates of the z-score can be derived.
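A local z-score can be sketched by standardising each M against spots of similar intensity A. The window of neighbouring spots (here 50 by default), the use of the plain mean and standard deviation, and the function name are assumptions chosen for illustration.

```python
import statistics


def local_z_scores(a_values, m_values, window=50):
    """Z-score of each M relative to spots with similar intensity A."""
    n = len(a_values)
    order = sorted(range(n), key=lambda i: a_values[i])  # spots sorted by A
    z = [0.0] * n
    half = window // 2
    for pos, idx in enumerate(order):
        lo = max(0, pos - half)
        hi = min(n, pos + half + 1)
        neighbours = [m_values[order[j]] for j in range(lo, hi)]
        mu = statistics.mean(neighbours)
        sd = statistics.stdev(neighbours)
        z[idx] = (m_values[idx] - mu) / sd if sd > 0 else 0.0
    return z
```

Because the standard deviation is estimated locally, a fold change that is unremarkable among noisy low-intensity spots can still stand out among high-intensity spots, and vice versa.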
![Page 29: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/29.jpg)
Consistency of replications
Case study: SW480/620 cell line comparison
SW480: derived from primary tumour
SW620: derived from lymph node metastasis of the same patient
Model for cancer progression
Experimental design:
• 4 independent hybridisations
• 4000 genes
• cDNA of SW620 Cy5-labelled
• cDNA of SW480 Cy3-labelled
This design poses a problem! Can you spot it?
![Page 30: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/30.jpg)
Usage of paired t-test
t = d̄ / s_d
d̄: average difference of paired intensities
s_d: standard deviation of d̄
[Figure panels: p-value < 0.01; Bonferroni adjusted p-value < 0.01]
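The paired t statistic can be computed as below, assuming the usual textbook form in which the denominator is the standard error of the mean difference (sample standard deviation of the differences divided by √n). For two-colour arrays the paired differences of log-intensities per gene are simply the log-ratios M from the replicate hybridisations. The function name is hypothetical, and the conversion to a p-value (via the t distribution with n - 1 degrees of freedom) is left out.

```python
import math
import statistics


def paired_t_statistic(differences):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d."""
    n = len(differences)
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)  # sample standard deviation
    return d_bar / (s_d / math.sqrt(n))
```

With only four replicates per gene, the variance estimate in the denominator is itself very noisy, so genes with small but accidentally consistent differences can reach large t values.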
![Page 31: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/31.jpg)
Robust t-test
Adjust estimation of variance:
Compound error model: σ²_tot,gene = σ²_gene + σ²_exp
σ_gene: gene-specific error
σ_exp: experiment-specific error
This model avoids the selection of control spots
M. Futschik et al, Genome Letters, 2002
![Page 32: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/32.jpg)
Another look at the results
Significant genes as red spots: 3σ error bars do not overlap with the M=0 axis. That's good!
![Page 33: Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649ea95503460f94bad5a3/html5/thumbnails/33.jpg)
Take-home messages
• Don't download and analyse array data blindly
• Visualise distributions: the eye is astonishingly good at finding interesting spots
• Use different statistics and try to understand the differences
• Remember: Statistical significance is not necessarily biological significance!
Ready to go fishing in Hvar ... ?