Extending the Loop Design for Microarray Experiments
Naomi S. Altman, Pennsylvania State University, [email protected]
Interface Meetings, May 04
Experimental Design and Microarrays
Microarrays are expensive and noisy:
a perfect situation for optimal design.
Outline
Reference Design
Loop Designs
Replication
Optimal Design/Analysis
Incorporating Multiple Factors and Blocks
Arrow Notation
Introduced by Kerr and Churchill (2001).
Each array is represented by an arrow.
[Figure: an arrow whose two ends are labeled Red and Green]
Reference Design
[Figure: a Reference sample linked by an arrow to each of A, B, C and D]
4 arrays, 1 sample/treatment, 4 reference samples
Loop Design (Kerr and Churchill 2001)
[Figure: a loop of arrows through A, B, C and D]
4 arrays, 2 samples/treatment
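The arrow notation translates directly into a list of (red, green) sample pairs, one per array. The sketch below is a minimal illustration under assumed conventions: treatments are numbered 0 to T-1, the reference is index T and always sits in the green channel, and all function names are hypothetical. It reproduces the counts on the two design slides.

```python
from collections import Counter

# Each array (one arrow) is a (red_sample, green_sample) pair.
# Assumed conventions: treatments are 0..T-1, the reference is index T.

def reference_design(T, k=1):
    """k replicates of every treatment, each hybridized against the reference (green)."""
    return [(t, T) for _ in range(k) for t in range(T)]

def loop_design(T, k=1):
    """k copies of the loop 0 -> 1 -> ... -> T-1 -> 0."""
    return [(t, (t + 1) % T) for _ in range(k) for t in range(T)]

def samples_per_treatment(arrays):
    """Number of channels occupied by each sample type across the design."""
    return Counter(s for array in arrays for s in array)

def dye_balanced(arrays):
    """True if every sample appears equally often in red and in green."""
    red = Counter(r for r, _ in arrays)
    green = Counter(g for _, g in arrays)
    return red == green

if __name__ == "__main__":
    ref, loop = reference_design(4), loop_design(4)
    print(len(ref), samples_per_treatment(ref))    # 4 arrays; reference used 4 times
    print(len(loop), samples_per_treatment(loop))  # 4 arrays; 2 samples per treatment
    print(dye_balanced(loop), dye_balanced(ref))   # loop is dye-balanced, reference is not
```

With k copies, `loop_design(T, k)` uses Tk arrays and gives 2k samples per treatment, while `reference_design(T, k)` gives k per treatment on the same Tk arrays.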
Replication
Often there is confusion among:
Biological replicates
Technical replicates: repeated samples, split sample and relabel, spot replication
In this presentation we consider only one spot/gene/array. Any technical replicates are averaged, and each sample is an independent biological replicate.
Linear Mixed Model for Microarray Data
$Y_{ijk} = \mu + \tau_i + \delta_j + \alpha_k + \varepsilon_{ijk}$
$Y_{ijk}$ is the response of the gene in one channel
$\mu$ is the mean response of the gene over all treatments, channels and arrays
$\tau_i$ is the effect of treatment i
$\delta_j$ is the effect of dye j
$\alpha_k$ is the effect of array k (or of the spot on the array)
$\varepsilon_{ijk}$ is the random deviation from the other effects; it includes biological variation, technical variation and random error
The 2 channels on a single spot are correlated, so array should be treated as a random effect: $\alpha_k$ has variance $\sigma_a^2$, $\varepsilon_{ijk}$ has variance $\sigma^2$, and all terms are independent. Two channels on the same array then have covariance $\sigma_a^2$.
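A short simulation shows what the random array effect implies. Two channels on the same array share $\alpha_k$, so their correlation is $\sigma_a^2/(\sigma_a^2 + \sigma^2)$, where $\sigma_a^2$ is the array variance and $\sigma^2$ the error variance, while their difference no longer contains $\alpha_k$. The sketch assumes normal effects, sets $\mu$, treatment and dye effects to zero, and uses illustrative values $\sigma^2 = 1$, $\sigma_a^2 = 3$; none of these come from the talk.

```python
import numpy as np

def simulate_spot(n_arrays, sigma2, sigma2_a, seed=0):
    """Simulate both channels of one gene on n_arrays arrays under
    Y = mu + tau + delta + alpha_k + eps, with mu = tau = delta = 0 (assumption)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, np.sqrt(sigma2_a), size=n_arrays)   # random array effects
    eps = rng.normal(0.0, np.sqrt(sigma2), size=(n_arrays, 2))  # independent channel errors
    return alpha[:, None] + eps                                  # column 0 = red, 1 = green

if __name__ == "__main__":
    sigma2, sigma2_a = 1.0, 3.0                                  # illustrative values
    y = simulate_spot(200_000, sigma2, sigma2_a)
    corr = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
    var_diff = np.var(y[:, 0] - y[:, 1], ddof=1)
    print(f"channel correlation {corr:.3f} (theory {sigma2_a / (sigma2_a + sigma2):.3f})")
    print(f"Var(red - green)    {var_diff:.3f} (theory {2 * sigma2:.3f})")
```

With this many simulated arrays both estimates should land close to the theoretical 0.75 and 2.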
Differencing Channels on an Array
Often the difference between the two samples on a single array is the unit of analysis:
$M_k = Y_{t(k)Rk} - Y_{r(k)Gk}$
where t(k) and r(k) are the samples in the red (R) and green (G) channels of array k.
Normalization is almost always done on this quantity.
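Substituting the mixed model into $M_k$ shows both the appeal and the cost of differencing: the array effect $\alpha_k$ cancels, but so does any information carried by the array-level sum of the two channels. Assuming independent channel errors with variance $\sigma^2$:

```latex
\begin{aligned}
M_k &= Y_{t(k)Rk} - Y_{r(k)Gk} \\
    &= (\mu + \tau_{t(k)} + \delta_R + \alpha_k + \varepsilon_{t(k)Rk})
     - (\mu + \tau_{r(k)} + \delta_G + \alpha_k + \varepsilon_{r(k)Gk}) \\
    &= \tau_{t(k)} - \tau_{r(k)} + (\delta_R - \delta_G)
     + \varepsilon_{t(k)Rk} - \varepsilon_{r(k)Gk}, \\
\operatorname{Var}(M_k) &= 2\sigma^2 .
\end{aligned}
```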
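In a reference design, with treatment A on array k and treatment B on array l, each hybridized against the reference, the difference between A and B can be estimated from the 2 arrays by
$\hat{\tau}_A - \hat{\tau}_B = M_k - M_l$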
But there can be a large loss of information:
Var( ) = 0.126, Var(M) = 0.453
(Drosophila arrays courtesy of Bryce MacIver, PSU)
Reference Design
The reference sample is the same biological material on every array.
T treatments, k replicates, kT arrays.
If there are technical dye-swaps, these are averaged to form 1 replicate.
If all comparisons are between treatments, there is no need to dye-swap. If there are dye-swaps, they should be balanced across treatments.
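Reference Design – Usual Analysis
Usually the analysis is done on $M_k = Y_{t(k)Rk} - Y_{rGk}$, the treatment channel minus the reference channel on array k. E.g., with A on array k and B on array l,
$\hat{\tau}_A - \hat{\tau}_B = M_k - M_l$
Using the linear mixed model (array variance $\sigma_a^2$, error variance $\sigma^2$), the variance of one pair is $4\sigma^2$, and with k replicates the variance of the estimated difference is $4\sigma^2/k$.
Reference Design – Optimal Weights
Consider using
$M_{w,k} = Y_{t(k)Rk} - w\,Y_{rGk}$
Then
$\hat{\tau}_A - \hat{\tau}_B = M_{w,k} - M_{w,l}$
and the variance of one pair is
$2(1-w)^2\sigma_a^2 + 2(1+w^2)\sigma^2$,
which reduces to $4\sigma^2$ at w = 1.
The optimal w is
$w_{opt} = \sigma_a^2/(\sigma_a^2 + \sigma^2)$
The resulting variance for a single replicate is
$\min_w \mathrm{Var} = 4\sigma^2 - 2\sigma^4/(\sigma^2 + \sigma_a^2) = 2\sigma^2 + 2\sigma^2\sigma_a^2/(\sigma^2 + \sigma_a^2)$
and with k replicates, the variance of the estimated difference is
$4\sigma^2/k - 2\sigma^4/(k(\sigma^2 + \sigma_a^2))$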
We do not know the optimal weights, but if we use mixed-model ANOVA, such as that available in SAS, S-PLUS or R, the weights are estimated from the data, leading to more efficient estimates.
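A numerical check of the optimal weight, as a sketch: the function below codes the variance of one pair in a reference design, $2(1-w)^2\sigma_a^2 + 2(1+w^2)\sigma^2$ (from substituting the mixed model into $M_{w,k} - M_{w,l}$ with independent channel errors), and compares a grid search over w with the closed form $w_{opt} = \sigma_a^2/(\sigma_a^2 + \sigma^2)$. In practice the variances are unknown and the mixed-model fit estimates them; the values used here are illustrative only.

```python
import numpy as np

def pair_variance(w, sigma2, sigma2_a):
    """Var(M_{w,k} - M_{w,l}) in a reference design, with M_w = Y_treatment - w * Y_reference."""
    return 2 * (1 - w) ** 2 * sigma2_a + 2 * (1 + w ** 2) * sigma2

def optimal_weight(sigma2, sigma2_a):
    """Closed-form minimiser of pair_variance over w."""
    return sigma2_a / (sigma2_a + sigma2)

def min_variance(sigma2, sigma2_a, k=1):
    """Minimum variance with k replicates: (4 s^2 - 2 s^4 / (s^2 + s_a^2)) / k."""
    return (4 * sigma2 - 2 * sigma2 ** 2 / (sigma2 + sigma2_a)) / k

if __name__ == "__main__":
    sigma2, sigma2_a = 1.0, 3.0                       # illustrative values, not estimates
    grid = np.linspace(0.0, 1.0, 100_001)
    w_grid = grid[np.argmin(pair_variance(grid, sigma2, sigma2_a))]
    print("grid minimiser:", w_grid, "closed form:", optimal_weight(sigma2, sigma2_a))
    print("usual analysis (w = 1):", pair_variance(1.0, sigma2, sigma2_a))
    print("optimal weighting:     ", min_variance(sigma2, sigma2_a))
```

The gain over w = 1 is largest when the array variance is small relative to the error variance, and vanishes as $\sigma_a^2$ grows, where $w_{opt} \to 1$.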
Loop Designs
[Figure: a loop of arrows through A, B, C and D]
A loop is balanced for dye effects and has two replicates at each node.
T treatments, 2k replicates, Tk arrays.
Recall: for a reference design we get only k replicates on Tk arrays.
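Loop Designs T=4, 4 arrays
[Figure: a loop of arrows through A, B, C and D]
Using optimal weighting:
Var(A-B) = Var(A-D) = $\sigma^2 + \sigma^2\sigma_a^2/(2(\sigma^2 + \sigma_a^2))$
Var(A-C) = $\sigma^2 + \sigma^2\sigma_a^2/(\sigma^2 + \sigma_a^2)$
Both are smaller than the variance of the reference design with 4 arrays, $2\sigma^2 + 2\sigma^2\sigma_a^2/(\sigma^2 + \sigma_a^2)$.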
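The 4-array variances can be reproduced by generalized least squares on the individual channels with $\sigma^2$ and $\sigma_a^2$ treated as known, which is what optimal weighting amounts to. The sketch below assumes the channel-level model with a red-dye indicator, a covariance matrix with one 2×2 block per array, and the loop order A → B → C → D → A (an assumption about the figure, consistent with Var(A-B) = Var(A-D)). Because the parameterization is over-complete it uses a Moore-Penrose inverse; the variance of an estimable contrast such as $\tau_A - \tau_B$ is the same for any generalized inverse.

```python
import numpy as np

def contrast_variance(arrays, n_samples, sigma2, sigma2_a, a, b):
    """GLS variance of tau_a - tau_b for a two-colour design (sketch).

    arrays: list of (red_sample, green_sample) index pairs, one per array.
    Channel model: Y = mu + tau_sample + delta * [red] + alpha_array + eps.
    """
    rows = []
    for red, green in arrays:
        for sample, is_red in ((red, 1.0), (green, 0.0)):
            x = np.zeros(2 + n_samples)
            x[0] = 1.0            # mu
            x[1] = is_red         # dye effect
            x[2 + sample] = 1.0   # sample (treatment or reference) effect
            rows.append(x)
    X = np.array(rows)
    # Channels on one array share alpha_k, so V has a 2x2 block per array.
    block = np.array([[sigma2_a + sigma2, sigma2_a],
                      [sigma2_a, sigma2_a + sigma2]])
    V_inv = np.kron(np.eye(len(arrays)), np.linalg.inv(block))
    info = X.T @ V_inv @ X                      # singular: parameters over-complete
    # Moore-Penrose inverse from the eigendecomposition, dropping null directions.
    vals, vecs = np.linalg.eigh(info)
    keep = vals > 1e-9 * vals.max()
    info_ginv = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
    c = np.zeros(2 + n_samples)
    c[2 + a], c[2 + b] = 1.0, -1.0              # contrast tau_a - tau_b
    return float(c @ info_ginv @ c)

if __name__ == "__main__":
    A, B, C, D, REF = range(5)
    loop = [(A, B), (B, C), (C, D), (D, A)]     # assumed loop order
    reference = [(A, REF), (B, REF), (C, REF), (D, REF)]
    s2, s2a = 1.0, 3.0                          # illustrative values
    print(f"loop Var(A-B):      {contrast_variance(loop, 4, s2, s2a, A, B):.3f}")  # 1.375
    print(f"loop Var(A-C):      {contrast_variance(loop, 4, s2, s2a, A, C):.3f}")  # 1.750
    print(f"reference Var(A-B): {contrast_variance(reference, 5, s2, s2a, A, B):.3f}")  # 3.500
```

Passing other (red, green) lists, for example a loop with one array deleted, gives a quick way to see how missing arrays reduce efficiency without making the design unusable.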
Loop Designs T=4
[Figure: three different loops through A, B, C and D, labeled Design L4C, Design L4B and Design L4D]
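T=4, 12 arrays
Loop Design – 3 loops = 6 replicates/treatment
3 × L4C: Var(A-B) = $\sigma^2/3 + \sigma^2\sigma_a^2/(6(\sigma^2 + \sigma_a^2))$
Var(A-C) = $\sigma^2/3 + \sigma^2\sigma_a^2/(3(\sigma^2 + \sigma_a^2))$
L4B + L4C + L4D: Var(difference) = $\sigma^2/3 + 2\sigma^2\sigma_a^2/(3(3\sigma^2 + 4\sigma_a^2))$
Reference Design – 3 replicates/treatment
Var(difference) = $2\sigma^2/3 + 2\sigma^2\sigma_a^2/(3(\sigma^2 + \sigma_a^2))$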
Loop Design – 3 loops = 6 replicates/treatment
3 × L4C: Var(A-B) = 0.46, Var(A-C) = 0.58
L4B + L4C + L4D: Var(difference) = 0.47
Reference Design – 3 replicates/treatment
Var(difference) = 0.83
T=4, 12 arrays, assuming $\sigma^2 = 1$ and $\sigma_a^2 = 3\sigma^2$
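Evaluating the 12-array variance formulas numerically, as a sketch: the function transcribes the closed-form variances for 3 × L4C, L4B + L4C + L4D and the reference design with 3 replicates per treatment. At $\sigma^2 = 1$, $\sigma_a^2 = 3$ the three loop values round to 0.46, 0.58 and 0.47, matching the slide; the reference-design expression gives $7/6 \approx 1.17$ at these values, and $5/6 \approx 0.83$ when $\sigma_a^2 = \sigma^2/3$ instead.

```python
def twelve_array_variances(sigma2, sigma2_a):
    """Closed-form variances of a treatment difference for T = 4 on 12 arrays."""
    s2, s2a = sigma2, sigma2_a
    return {
        # three copies of loop L4C (6 replicates per treatment)
        "3 x L4C, A-B": s2 / 3 + s2 * s2a / (6 * (s2 + s2a)),
        "3 x L4C, A-C": s2 / 3 + s2 * s2a / (3 * (s2 + s2a)),
        # the three different 4-loops together: every pair on 2 arrays
        "L4B + L4C + L4D": s2 / 3 + 2 * s2 * s2a / (3 * (3 * s2 + 4 * s2a)),
        # reference design, 3 replicates per treatment, optimal weighting
        "reference, 3 reps": 2 * s2 / 3 + 2 * s2 * s2a / (3 * (s2 + s2a)),
    }

if __name__ == "__main__":
    for name, v in twelve_array_variances(1.0, 3.0).items():   # assumed sigma^2 = 1, sigma_a^2 = 3
        print(f"{name:18s} {v:.2f}")
```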
Incorporating a 2x2 Factorial in a Loop
The design is 2 genotypes (G, g) and 2 tissues (T, t). Only within-genotype and within-tissue comparisons are of interest.
[Figure: a loop through the four samples GT, gt, gT and Gt]
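The stated goal fixes the loop order: going round GT, Gt, gt, gT puts every within-genotype and within-tissue pair on an array, and leaves only the two pairs that differ in both factors (GT vs gt, Gt vs gT) without a direct comparison. A small check, with the sample labels as hypothetical two-letter codes (first letter genotype, second letter tissue):

```python
from itertools import combinations

# Hypothetical encoding: first letter = genotype (G/g), second = tissue (T/t).
loop = ["GT", "Gt", "gt", "gT"]                     # loop order, closing back to GT
arrays = [(loop[i], loop[(i + 1) % 4]) for i in range(4)]

def shares_one_factor(x, y):
    """True when two samples agree on genotype or on tissue, but not on both."""
    same_genotype = x[0] == y[0]
    same_tissue = x[1] == y[1]
    return same_genotype != same_tissue

if __name__ == "__main__":
    for x, y in arrays:
        print(x, "vs", y, "->", shares_one_factor(x, y))   # True for every array
    compared = {frozenset(a) for a in arrays}
    # Pairs never hybridized together: both factors differ.
    print([p for p in combinations(loop, 2) if frozenset(p) not in compared])
    # [('GT', 'gt'), ('Gt', 'gT')]
```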
An 8 Treatment Example
[Figure: a design on treatments A to H]
2 Complete Blocks
Replication:
Yellow loop?
Red “loop”?
And now for the rest of the story
Missing arrays: not fatal, but they reduce efficiency
Added treatments
[Figure: a loop through A, B, C and D]
[Figure: the same loop with an added treatment E]
The Moral of the Story
Loop designs are very efficient.
They can incorporate factorial arrangements.
They can incorporate blocks.
They can be replicated in various ways to improve efficiency.
Optimal design can help determine which (generalized) loop design to use.
ANOVA-type analyses on the individual channels, not on differences, should be used for analysis.
[Figure: a design on A1, B1, C1, A2, B2 and C2]