Bias, Variance, and Fit for Three Measures of Expression: AvDiff, Li & Wong’s, and AvLog(PM-BG)
Rafael A. Irizarry
Department of Biostatistics, JHU
(joint work with Bridget Hobbs and Terry Speed,
Walter & Eliza Hall Institute of Medical Research)
Summary
• Summarize the expression level of a probe set by Average Log2 (PM-BG)
• PMs need to be normalized
• Background makes no use of probe-specific MM
• Evaluate against AvDiff and the Li & Wong algorithm on bias, variance and model fit
• Use Gene Logic spike-in and dilution study
• All three expression measures performed well
• AvLog(PM-BG) is arguably the best of the three
SD vs. Avg of Defective Probes
Normalization at Probe Level
Expression after Normalization
Background Distribution
Average Log2(PM-BG)
• Normalize probe level data
• Compute BG = background mean by estimating the mode of the MM distribution
• Subtract BG from each PM
• If PM-BG < 0 use minimum of positives divided by 2
• Take average
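The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the slides do not say how the mode of the MM distribution is estimated, so the histogram-based `estimate_mode` is an assumption, as are the function names, the choice to take the "minimum of positives" within the probe set, and treating PM-BG = 0 like a negative value (its log is undefined). Normalization is assumed to have been done already.

```python
import numpy as np

def estimate_mode(values, bins=50):
    """Crude mode estimate: midpoint of the fullest histogram bin (assumed method)."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

def avlog_pm_bg(pm, mm_all, bins=50):
    """Average log2(PM - BG) for one probe set on one chip.

    pm     : normalized PM intensities of the probe set
    mm_all : normalized MM intensities used to estimate the global background
    """
    pm = np.asarray(pm, dtype=float)
    bg = estimate_mode(mm_all, bins=bins)      # BG = mode of the MM distribution
    diff = pm - bg                             # subtract BG from each PM
    positive = diff[diff > 0]
    if positive.size == 0:
        raise ValueError("no PM exceeds the background estimate")
    # Non-positive differences are replaced by half the smallest positive one.
    diff = np.where(diff > 0, diff, positive.min() / 2.0)
    return float(np.log2(diff).mean())         # average on the log2 scale
```

Because BG is a single number per chip, the same value is subtracted from every PM, which is what distinguishes this measure from ones that subtract a probe-specific MM.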
Spike-In Experiments
• Add 11 foreign-species cRNAs, at concentrations from 0.5 pM to 100 pM, to the hybridization mixture
• Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips.
• Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips. The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates)
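For readers unfamiliar with the term, a cyclic Latin square can be built by shifting one list of levels by one position per row, so that every level appears exactly once in each row and each column. The sketch below is illustrative only: the function name is hypothetical, the 12 levels are the distinct values that appear in the Spike-In B table, and the actual ordering of levels and their assignment to chips and transcripts in the study are not given in the slides.

```python
def cyclic_latin_square(levels):
    """Return an n x n square in which row r is `levels` shifted left by r."""
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]

# Assumed ordering of the 12 concentration levels (pM), for illustration only.
levels_pm = [0.5, 1.0, 1.5, 2.0, 3.0, 5.0, 12.5, 25.0, 35.7, 50.0, 75.0, 100.0]
design = cyclic_latin_square(levels_pm)

for row in design[:3]:
    print(row)
```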
Why Remove Background?
Probe Level Data (12 chips)
What Did We Learn?
• Don’t subtract or divide by MM
• Probe effect is additive on log scale
• Take logs
Expression Level
Spike-In B
Gene      Conc 1 (pM)   Conc 2 (pM)   Rank
BioB-5    100           0.5           1
BioB-3    0.5           25.0          2
BioC-5    2.0           75.0          3
BioB-M    1.0           35.7          4
BioDn-3   1.5           50.0          5
DapX-3    35.7          3.0           6
CreX-3    50.0          5.0           7
CreX-5    12.5          2.0           8
BioC-3    25.0          100           9
DapX-5    5.0           1.5           10
DapX-M    3.0           1.0           11
Later we consider 24 different combinations of concentrations
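The Rank column in the Spike-In B table is consistent with ordering the genes by the size of their true fold change between the two concentrations, largest first. The slides do not state this definition, so the sketch below is an inference from the numbers, and the function name is hypothetical.

```python
import math

# (gene, concentration 1, concentration 2), copied from the Spike-In B table.
SPIKE_IN_B = [
    ("BioB-5", 100.0, 0.5), ("BioB-3", 0.5, 25.0), ("BioC-5", 2.0, 75.0),
    ("BioB-M", 1.0, 35.7), ("BioDn-3", 1.5, 50.0), ("DapX-3", 35.7, 3.0),
    ("CreX-3", 50.0, 5.0), ("CreX-5", 12.5, 2.0), ("BioC-3", 25.0, 100.0),
    ("DapX-5", 5.0, 1.5), ("DapX-M", 3.0, 1.0),
]

def rank_by_fold_change(rows):
    """Rank genes by |log2(conc2 / conc1)|, with rank 1 for the largest change."""
    scored = [(gene, abs(math.log2(c2 / c1))) for gene, c1, c2 in rows]
    ordered = sorted(scored, key=lambda item: item[1], reverse=True)
    return {gene: i + 1 for i, (gene, _) in enumerate(ordered)}

ranks = rank_by_fold_change(SPIKE_IN_B)
```

Taking the absolute log ratio treats a 200-fold decrease (BioB-5) and a 200-fold increase as equally large changes.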
Differential Expression
Observed vs True Ratio
Dilution Experiment
• cRNA hybridized to human chip (HGU_95) in a range of proportions and dilutions
• Dilution series begins at 1.25 µg cRNA per GeneChip array, and rises through 2.5, 5.0, 7.5, 10.0, to 20.0 µg per array. 5 replicate chips were used at each dilution
• Normalize just within each set of 5 replicates
• For each probe set compute expression, average and SD over replicates, and fit a line to log expression vs. log concentration
• Regression line should have slope 1 and high R²
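The per-probe-set regression can be sketched as below. This is an assumed, minimal version using ordinary least squares from NumPy. The function name is hypothetical, the input is taken to be the replicate-averaged expression already on the log2 scale, and base 2 is an assumption (the slides only say "log").

```python
import numpy as np

# cRNA amounts per array (µg), from the dilution series described above.
AMOUNTS_UG = np.array([1.25, 2.5, 5.0, 7.5, 10.0, 20.0])

def dilution_fit(amounts, log2_expression):
    """Fit log2 expression against log2 amount; return (slope, intercept, R^2)."""
    x = np.log2(np.asarray(amounts, dtype=float))
    y = np.asarray(log2_expression, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)
```

An expression measure that is exactly proportional to the amount of cRNA would give slope 1 and R² of 1; departures from those values indicate bias and noise respectively.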
Dilution Experiment Data
Expression and SD
Slope Estimates and R²
Model check
• Compute observed SD of 5 replicate expression estimates
• Compute RMS of 5 nominal SDs
• Compare by taking the log ratio
• Closeness of observed and nominal SD taken as a measure of goodness of fit of the model
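The comparison above can be sketched for a single probe set at one dilution as follows. The details are assumptions, since the slides do not specify them: the observed SD uses the sample formula (n - 1 denominator), the log is base 2, the ratio is observed over nominal, and the function name is hypothetical.

```python
import numpy as np

def sd_log_ratio(estimates, nominal_sds):
    """log2 of (observed SD of replicate estimates) / (RMS of nominal SDs)."""
    estimates = np.asarray(estimates, dtype=float)
    nominal_sds = np.asarray(nominal_sds, dtype=float)
    observed = np.std(estimates, ddof=1)                 # SD across the replicates
    nominal = np.sqrt(np.mean(np.square(nominal_sds)))   # root mean square
    return float(np.log2(observed / nominal))
```

A value near 0 means the model's own standard errors match the replicate-to-replicate variation; positive values mean the model understates the variability.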
Observed vs. Model SE
Conclusion
• Take logs
• PMs need to be normalized
• Using global background improves on use of probe-specific MM
• Gene Logic spike-in and dilution study show all three expression measures performed very well
• AvLog(PM-BG) is arguably the best in terms of bias, variance and model fit
• Future: better BG; robust/resistant summaries
Acknowledgements
• Gene Brown’s group at Wyeth/Genetics Institute, and Uwe Scherf’s Genomics Research & Development Group at Gene Logic, for generating the spike-in and dilution data
• Gene Logic for permission to use these data
• Francois Collin (Gene Logic)
• Ben Bolstad (UC Berkeley)
• Magnus Åstrand (Astra Zeneca Mölndal)