![Page 1: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/1.jpg)
First analysis steps
o quality control and optimization
o calibration and error modeling
o data transformations
Wolfgang Huber
Dep. of Molecular Genome Analysis (A. Poustka)
DKFZ Heidelberg
![Page 2: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/2.jpg)
Acknowledgements
Anja von Heydebreck, Günther Sawitzki
Holger Sültmann, Andreas Buness, Markus Ruschhaupt, Klaus Steiner, Jörg Schneider, Katharina Finis, Stephanie Süß, Anke Schroth, Friederike Wilmer, Judith Boer, Martin Vingron, Annemarie Poustka
Sandrine Dudoit, Robert Gentleman, Rafael Irizarry and Yee Hwa Yang: Bioconductor short course, summer 2002
and many others
![Page 3: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/3.jpg)
A microarray slide
Slide: 25 x 75 mm
4 x 4 or 8 x 4 sectors
17...38 rows and columns per sector
ca. 4600...46000 probes/array
sector: corresponds to one print-tip
Spot-to-spot distance: ca. 150-350 µm
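As a rough check of the layout numbers on this slide, the probe count is the number of sectors times the spots per sector. A minimal sketch; the function name is illustrative and square sectors are an assumption:

```python
def probes_per_array(sector_grid, spots_per_side):
    """Number of probes on an array made of square sectors (assumed square)."""
    grid_rows, grid_cols = sector_grid
    return grid_rows * grid_cols * spots_per_side ** 2

smallest = probes_per_array((4, 4), 17)  # 4 x 4 sectors, 17 x 17 spots each
largest = probes_per_array((8, 4), 38)   # 8 x 4 sectors, 38 x 38 spots each
print(smallest, largest)                 # 4624 46208
```

Both ends agree with the quoted range of ca. 4600 to 46000 probes per array.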
![Page 4: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/4.jpg)
Terminology
sample: RNA (cDNA) hybridized to the array, aka target, mobile substrate.
probe: DNA spotted on the array, aka spot, immobile substrate.
sector: rectangular matrix of spots printed using the same print-tip (or pin), aka print-tip-group
plate: set of 384 (768) spots printed with DNA from the same microtitre plate of clones
slide, array
channel: data from one color (Cy3 = cyanine 3 = green, Cy5 = cyanine 5 = red).
batch: collection of microarrays with the same probe layout.
![Page 5: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/5.jpg)
Raw data
scanner signal
resolution: 5 or 10 µm spatial, 16 bit (65536) dynamical per channel
ca. 30-50 pixels per probe (60 µm spot size)
40 MB per array
Image analysis
spot intensities
2 numbers per probe (~100-300 kB)
… auxiliaries: background, area, std dev, …
![Page 6: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/6.jpg)
Image analysis
1. Addressing. Estimate location of spot centers.
2. Segmentation. Classify pixels as foreground (signal) or background.
3. Information extraction. For each spot on the array and each dye:
• foreground intensities;
• background intensities;
• quality measures.
R and G for each spot on the array.
![Page 7: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/7.jpg)
Segmentation
adaptive segmentation
seeded region growing
fixed circle segmentation
Spots may vary in size and shape.
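The simplest of the listed methods, fixed circle segmentation, followed by information extraction, can be sketched as follows. This is an illustration only: the function names are hypothetical, and summarizing foreground and background by the median is an assumption (image analysis packages differ on this).

```python
from statistics import median

def fixed_circle_segmentation(image, center, radius):
    """Classify pixels of an image patch as foreground or background.

    image  : list of rows of pixel intensities
    center : (row, col) of the spot center, as found by the addressing step
    radius : fixed spot radius in pixels
    """
    cr, cc = center
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background

def spot_summary(image, center, radius):
    """Foreground and background intensity plus spot area for one channel."""
    fg, bg = fixed_circle_segmentation(image, center, radius)
    return {"foreground": median(fg), "background": median(bg), "area": len(fg)}

patch = [
    [10, 10, 10, 10, 10],
    [10, 90, 95, 90, 10],
    [10, 95, 99, 95, 10],
    [10, 90, 95, 90, 10],
    [10, 10, 10, 10, 10],
]
summary = spot_summary(patch, center=(2, 2), radius=1)
```

Because the circle is fixed, bright pixels that fall outside it (the corners of the spot in this patch) are counted as background, which is why adaptive methods are preferred when spots vary in size and shape.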
![Page 8: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/8.jpg)
spot intensity data
two-color spotted arrays
one-color arrays (Affymetrix, nylon)
[Figure: data matrix with probes (genes) as rows and n conditions (samples) as columns]
![Page 9: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/9.jpg)
Which genes are differentially transcribed?
[Figure panels: same-same, tumor-normal; axis: log-ratio]
![Page 10: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/10.jpg)
ratios and fold changes
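Fold changes are useful to describe continuous changes in expression
[Figure: expression levels A = 1000, B = 1500, C = 3000; fold changes x1.5 and x3]
But what if the gene is “off” (below detection limit) in one condition?
[Figure: expression levels A = 0, B = 200, C = 3000; fold changes "?" and "?"]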
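The two examples on this slide can be replayed numerically. A minimal sketch (the helper name is hypothetical) showing that the ratio stops being informative once the reference level is zero:

```python
import math

def fold_change(reference, value):
    """Ratio value/reference; infinite or undefined when the reference is 0."""
    if reference == 0:
        return math.nan if value == 0 else math.inf
    return value / reference

# First example: gene is "on" in all conditions.
on = {"A": 1000, "B": 1500, "C": 3000}
print(fold_change(on["A"], on["B"]), fold_change(on["A"], on["C"]))      # 1.5 3.0

# Second example: gene is "off" in condition A.
off = {"A": 0, "B": 200, "C": 3000}
print(fold_change(off["A"], off["B"]), fold_change(off["A"], off["C"]))  # inf inf
```

With A = 0, the changes to 200 and to 3000 both map to the same infinite ratio, so the fold change no longer distinguishes them.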
![Page 11: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/11.jpg)
ratios and fold changes
Many interesting genes will be off in some of the conditions of interest
1. If you want the expression measure (“net normalized spot intensity”) to be an unbiased estimator of abundance:
many values are ≤ 0, so you need something more than the (log-)ratio
2. If you let the expression measure be biased:
you can keep ratios, but how do you choose the bias?
![Page 12: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/12.jpg)
Raw data are not mRNA concentrations
o tissue contamination
o clone identification and mapping
o image segmentation
o RNA degradation
o PCR yield, contamination
o signal quantification
o amplification efficiency
o spotting efficiency
o ‘background’ correction
o reverse transcription efficiency
o DNA-support binding
o hybridization efficiency and specificity
o other array manufacturing-related issues
The problem is less that these steps are ‘not perfect’; it is that they may vary from gene to gene, array to array, experiment to experiment.
![Page 13: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/13.jpg)
Sources of variation
amount of RNA in the biopsy
efficiencies of
- RNA extraction
- reverse transcription
- labeling
- photodetection
PCR yield
DNA quality
spotting efficiency, spot size
cross-/unspecific hybridization
stray signal

Systematic (handled by calibration)
o similar effect on many measurements
o corrections can be estimated from data
Stochastic (handled by the error model)
o too random to be explicitly accounted for
o “noise”
![Page 14: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/14.jpg)
modeling ansatz
measured intensity = offset + gain × true abundance
$$y_{ik} = a_{ik} + b_{ik} x_k$$
$$a_{ik} = a_i + \varepsilon_{ik}$$
$a_i$: per-sample offset
$\varepsilon_{ik} \sim N(0, b_i^2 s_1^2)$: “additive noise”
$$b_{ik} = b_i b_k \exp(\eta_{ik})$$
$b_i$: per-sample normalization factor
$b_k$: sequence-wise probe efficiency
$\eta_{ik} \sim N(0, s_2^2)$: “multiplicative noise”
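To get a feeling for this ansatz, one can simulate it. The sketch below draws one intensity from "offset + gain × true abundance" with a normally distributed additive term (standard deviation b_i·s1) and a log-normal multiplicative term (log-scale standard deviation s2). The function name and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_intensity(x_k, a_i, b_i, b_k, s1, s2, rng):
    """One draw of the measured intensity for true abundance x_k."""
    eps = rng.gauss(0.0, b_i * s1)        # additive noise
    eta = rng.gauss(0.0, s2)              # multiplicative noise (log scale)
    a_ik = a_i + eps                      # offset
    b_ik = b_i * b_k * math.exp(eta)      # gain
    return a_ik + b_ik * x_k

rng = random.Random(1)
low = [simulate_intensity(0.0, 50.0, 1.0, 1.0, 20.0, 0.1, rng) for _ in range(1000)]
high = [simulate_intensity(10000.0, 50.0, 1.0, 1.0, 20.0, 0.1, rng) for _ in range(1000)]
sd_low, sd_high = statistics.stdev(low), statistics.stdev(high)
```

At zero abundance only the additive term contributes, so the spread is close to b_i·s1; at high abundance the multiplicative term dominates and the spread grows roughly in proportion to the signal.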
![Page 15: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/15.jpg)
The two-component model
[Figure panels: raw scale, log scale; labels: “additive” noise, “multiplicative” noise]
B. Durbin, D. Rocke, JCB 2001
![Page 16: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/16.jpg)
Calibration ("normalization")
Correct for systematic variations.
To do: fit appropriate "correction parameters" a_i, b_i (and possibly more…) and apply to the data.
"Heteroskedasticity" (unequal variances): use weighted regression or a variance stabilizing transformation
Outliers: use a robust method
![Page 17: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/17.jpg)
the variance-mean dependence
data (cDNA slide):
relation between mean u = E(Y_ik) and variance v = Var(Y_ik):
$$v(u) = c^2 (u + u_0)^2 + s^2$$
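Reading the variance-mean relation on this slide as a quadratic in the mean, v(u) = c²(u + u0)² + s², its two regimes can be checked numerically. The parameter values below are arbitrary illustrations, not estimates from the data.

```python
import math

def variance(u, c, u0, s):
    """Quadratic variance-mean relation v(u) = c^2 (u + u0)^2 + s^2."""
    return c ** 2 * (u + u0) ** 2 + s ** 2

def coefficient_of_variation(u, c, u0, s):
    """Standard deviation divided by the mean."""
    return math.sqrt(variance(u, c, u0, s)) / u

c, u0, s = 0.2, 50.0, 30.0
floor = variance(-u0, c, u0, s)                       # equals s^2: the additive floor
cv_high = coefficient_of_variation(1e9, c, u0, s)     # approaches c for large means
```

For small intensities the variance is roughly constant (s²), while for large intensities the coefficient of variation is roughly constant (c).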
![Page 18: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/18.jpg)
variance stabilization
$X_u$: a family of random variables with $E X_u = u$, $\operatorname{Var} X_u = v(u)$.
Define
$$f(x) = \int^x \frac{du}{\sqrt{v(u)}}$$
Then $\operatorname{Var} f(X_u)$ is approximately independent of u.
derivation: linear approximation
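The linear approximation can be spelled out as a first-order Taylor expansion of f around u (the delta method). This is the standard argument, written out here as a sketch rather than taken from the slide:

```latex
f(X_u) \approx f(u) + f'(u)\,(X_u - u)
\quad\Longrightarrow\quad
\operatorname{Var} f(X_u) \approx f'(u)^2 \, v(u).

\text{Choosing } f'(u) = \frac{1}{\sqrt{v(u)}} \text{ gives }
\operatorname{Var} f(X_u) \approx \frac{v(u)}{v(u)} = 1 ,
\text{ which does not depend on } u .
```

The approximation is good when the spread of X_u is small relative to the scale on which f' changes.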
![Page 19: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/19.jpg)
variance stabilization
[Figure: the transformation f(x); x axis: raw scale (0 to 60000), y axis: transformed scale]
![Page 20: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/20.jpg)
variance stabilizing transformations
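$$f(x) = \int^x \frac{du}{\sqrt{v(u)}}$$
1.) constant variance: $v(u) = \text{const} \;\Rightarrow\; f \propto u$
2.) const. coeff. of variation: $v(u) \propto u^2 \;\Rightarrow\; f \propto \log u$
3.) offset: $v(u) \propto (u + u_0)^2 \;\Rightarrow\; f \propto \log(u + u_0)$
4.) microarray: $v(u) \propto (u + u_0)^2 + s^2 \;\Rightarrow\; f \propto \operatorname{arsinh}\!\left(\frac{u + u_0}{s}\right)$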
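Case 4 can be checked numerically: integrating 1/sqrt(v(u)) for v(u) = (u + u0)² + s² should reproduce the arsinh form up to an additive constant. The sketch below uses Simpson's rule; the parameter values and function names are illustrative assumptions.

```python
import math

def v(u, u0, s):
    """Microarray variance function, up to a constant factor."""
    return (u + u0) ** 2 + s ** 2

def f_numeric(x, u0, s, lower=0.0, steps=10_000):
    """Simpson's rule for the integral of 1/sqrt(v(u)) from lower to x.

    steps must be even.
    """
    h = (x - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lower + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight / math.sqrt(v(u, u0, s))
    return total * h / 3

def f_closed(x, u0, s, lower=0.0):
    """Closed form of the same definite integral."""
    return math.asinh((x + u0) / s) - math.asinh((lower + u0) / s)

numeric = f_numeric(1000.0, u0=50.0, s=100.0)
closed = f_closed(1000.0, u0=50.0, s=100.0)
```

Setting s = 0 in the same integrand recovers case 3, and additionally u0 = 0 recovers case 2, the ordinary logarithm.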
![Page 21: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/21.jpg)
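the arsinh transformation
$$\operatorname{arsinh}(x) = \log\left(x + \sqrt{x^2 + 1}\right)$$
$$\lim_{x \to \infty} \left(\operatorname{arsinh} x - \log x - \log 2\right) = 0$$
[Figure: log u (dashed line) and arsinh((u+u0)/c) (solid line) plotted against intensity, from -200 to 1000]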
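Both properties of arsinh used on this slide, the logarithmic form and the convergence to log x + log 2 for large x, are easy to verify. A minimal sketch using only the standard library:

```python
import math

def arsinh(x):
    """arsinh written out as log(x + sqrt(x^2 + 1))."""
    return math.log(x + math.sqrt(x * x + 1.0))

# Unlike log, arsinh is defined for zero and negative arguments.
for x in (-2.0, 0.0, 2.0, 1000.0):
    print(x, arsinh(x), math.asinh(x))

# For large x the gap to log(x) + log(2) shrinks towards 0.
gap = math.asinh(1e4) - (math.log(1e4) + math.log(2.0))
```

This is why the two curves in the figure coincide at high intensities and differ only near and below zero.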
![Page 22: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/22.jpg)
parameter estimation
$$\operatorname{arsinh}\!\left(\frac{Y_{ki} - a_i}{b_i}\right) = \mu_k + \varepsilon_{ki}, \qquad \varepsilon_{ki} \sim N(0, c^2)$$
o maximum likelihood estimator: straightforward – but sensitive to deviations from normality
o model holds for genes that are unchanged; differentially transcribed genes act as outliers.
o robust variant of ML estimator, à la Least Trimmed Sum of Squares regression.
o works as long as <50% of genes are differentially transcribed
measured intensity = offset + gain * true abundance
$$y_{ik} = a_{ik} + b_{ik} x_{ik}$$
$$a_{ik} = a_i + L_{ik} + \varepsilon_{ik}$$
$a_i$: per-sample offset
$L_{ik}$: local background provided by image analysis
$\varepsilon_{ik} \sim N(0, b_i^2 s_1^2)$: “additive noise”
$$b_{ik} = b_i b_k \exp(\eta_{ik})$$
$b_i$: per-sample normalization factor
$b_k$: sequence-wise labeling efficiency
$\eta_{ik} \sim N(0, s_2^2)$: “multiplicative noise”
![Page 23: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/23.jpg)
Least trimmed sum of squares regression
minimize
$$\sum_{i=1}^{n/2} \left( y_{(i)} - f(x_{(i)}) \right)^2$$
[Figure: scatterplot of y against x (both 0 to 8) with two fitted lines: least sum of squares and least trimmed sum of squares]
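In the trimmed criterion only the n/2 smallest squared residuals enter the sum, so up to half of the points can be outliers without affecting the fit. The sketch below fits a straight line this way on a toy data set. It is not the estimator used in vsn: the search strategy (start from every pair of points, then repeatedly refit on the best half) is a simple assumption chosen for readability, and it is only practical for small n.

```python
from itertools import combinations

def ols(points):
    """Ordinary least squares line; returns (intercept, slope) or None."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - slope * mx, slope

def squared_residual(point, fit):
    intercept, slope = fit
    return (point[1] - intercept - slope * point[0]) ** 2

def lts_line(points, refits=10):
    """Least trimmed squares line using the n/2 smallest squared residuals."""
    h = len(points) // 2
    best = None
    for pair in combinations(points, 2):
        fit = ols(list(pair))
        if fit is None:
            continue
        for _ in range(refits):
            # keep the h best-fitting points and refit on them
            keep = sorted(points, key=lambda p: squared_residual(p, fit))[:h]
            refit = ols(keep)
            if refit is None:
                break
            fit = refit
        objective = sum(sorted(squared_residual(p, fit) for p in points)[:h])
        if best is None or objective < best[0]:
            best = (objective, fit)
    return best[1]

inliers = [(float(x), 1.0 + 2.0 * x) for x in range(12)]          # y = 1 + 2x
outliers = list(zip([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                    [25.0, 3.0, 30.0, -10.0, 18.0, 40.0, -5.0, 22.0]))
data = inliers + outliers
lts_intercept, lts_slope = lts_line(data)
ols_intercept, ols_slope = ols(data)
```

On this toy data the trimmed fit recovers the line through the 12 inliers, while the ordinary least squares fit is pulled away by the 8 outliers.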
![Page 24: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/24.jpg)
evaluation: effects of different data transformations
[Figure: difference red-green plotted against rank(average), for different data transformations]
![Page 25: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/25.jpg)
[Figure: two panels, each labeled "Coefficient of variation"]
cDNA slide: H. Sueltmann
![Page 26: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/26.jpg)
evaluation: a benchmark for Affymetrix genechip expression measures
o Data:
Spike-in series: from Affymetrix, 59 x HGU95A, 16 genes, 14 concentrations, complex background
Dilution series: from GeneLogic, 60 x HGU95Av2, liver & CNS cRNA in different proportions and amounts
o Benchmark: 15 quality measures regarding
- reproducibility
- sensitivity
- specificity
Put together by Rafael Irizarry (Johns Hopkins) http://affycomp.biostat.jhsph.edu
![Page 27: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/27.jpg)
ROC curves
![Page 28: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/28.jpg)
affycomp results (28 Sep 2003)
[Figure: results table with a scale from good to bad]
![Page 29: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/29.jpg)
Summary
log-ratio:
$$\log \frac{x_i}{x_j}$$
'glog' (generalized log-ratio):
$$\log \frac{x_i + \sqrt{x_i^2 + c_i^2}}{x_j + \sqrt{x_j^2 + c_j^2}}$$
- interpretation as "fold change"
+ interpretation even in cases where genes are off in some conditions
+ visualization
+ can use standard statistical methods (hypothesis testing, ANOVA, clustering, classification…) without the worries about low-level variability that are often warranted on the log-scale
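Reading the generalized log-ratio as the difference of log(x + sqrt(x² + c²)) between two conditions, it can be compared with the ordinary log-ratio directly. A minimal sketch; the function names and the value c = 100 are illustrative assumptions:

```python
import math

def glog(x, c):
    """Generalized logarithm log(x + sqrt(x^2 + c^2)); defined for x <= 0 too."""
    return math.log(x + math.sqrt(x * x + c * c))

def glog_ratio(x_i, x_j, c_i, c_j):
    """Generalized log-ratio between two conditions."""
    return glog(x_i, c_i) - glog(x_j, c_j)

c = 100.0
high = glog_ratio(3e6, 1e6, c, c)      # close to log(3) when both values are far above c
off = glog_ratio(0.0, 3000.0, c, c)    # finite although the gene is "off" in one condition
negative = glog(-50.0, c)              # still defined for a negative net intensity
```

For intensities well above c the glog-ratio agrees with the log-ratio, so the fold-change reading survives there. Near zero it stays finite where the log-ratio is undefined.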
![Page 30: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/30.jpg)
Availability
o implementation in R
o open source package vsn on www.bioconductor.org
o Bioconductor is an international collaboration on open source software for bioinformatics and statistical omics
![Page 31: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/31.jpg)
Quality control:
diagnostic plots and artifacts
![Page 32: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/32.jpg)
PCR plates
Scatterplot, colored by PCR-plate
Two RZPD Unigene II filters (cDNA nylon membranes)
![Page 33: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/33.jpg)
PCR plates
![Page 34: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/34.jpg)
PCR plates: boxplots
![Page 35: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/35.jpg)
array batches
![Page 36: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/36.jpg)
print-tip effects
[Figure: empirical cumulative distribution functions F̂(q) of the log-ratio q = log(fg.green/fg.red), one curve per spotting pin (1:1 to 4:4); array 41 (a42-u07639vene.txt)]
![Page 37: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/37.jpg)
spotting pin quality decline
after delivery of 3x10^5 spots
after delivery of 5x10^5 spots
H. Sueltmann DKFZ/MGA
![Page 38: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/38.jpg)
spatial effects
[Figure panels: R, Rb, R-Rb; color scale by rank]
spotted cDNA arrays, Stanford-type
[Figure: another array, print-tip; panels with color scale ~ log(G) and color scale ~ rank(G)]
![Page 39: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/39.jpg)
Batches: array to array differences d_ij = mad_k(h_ik - h_jk)
arrays i = 1…63; roughly sorted by time
[Figure: matrix of d_ij for all pairs of arrays (axes: 1:nrhyb); color scale from 0.6 to 1.8]
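The array-to-array distance on this slide, the median absolute deviation over probes k of the difference between two arrays, can be computed as follows. A minimal sketch: the scaling constant 1.4826 (the default of R's mad) is an assumption about what was used, and the data are made up.

```python
from statistics import median

def mad(values, constant=1.4826):
    """Median absolute deviation, scaled to match the sd under normality."""
    center = median(values)
    return constant * median(abs(v - center) for v in values)

def array_distances(h):
    """d[i][j] = mad over probes k of h[i][k] - h[j][k]; h has one row per array."""
    n = len(h)
    return [[mad([a - b for a, b in zip(h[i], h[j])]) for j in range(n)]
            for i in range(n)]

h = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],   # array 0 shifted by a constant
    [1.0, 4.0, 2.0, 8.0, 3.0],        # a genuinely different array
]
d = array_distances(h)
```

Because the median is subtracted first, a constant shift between two arrays gives distance 0, so the measure picks up differences in spread rather than in overall level.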
![Page 40: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/40.jpg)
Density representation of the scatterplot (76,000 clones, RZPD Unigene-II filters)
![Page 41: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/41.jpg)
Oligonucleotide chips
![Page 42: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/42.jpg)
Affymetrix files
Main software from Affymetrix: MAS - MicroArray Suite.
DAT file: Image file, ~10^7 pixels, ~50 MB.
CEL file: probe intensities, ~400000 numbers
CDF file: Chip Description File. Describes which probes go in which probe sets (genes, gene fragments, ESTs).
![Page 43: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/43.jpg)
Image analysis
DAT image files → CEL files
Each probe cell: 10x10 pixels.
Gridding: estimate location of probe cell centers.
Signal:
– Remove outer 36 pixels → 8x8 pixels.
– The probe cell signal, PM or MM, is the 75th percentile of the 8x8 pixel values.
Background: Average of the lowest 2% probe cells is taken as the background value and subtracted.
Compute also quality values.
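The signal and background rules described on this slide can be sketched as follows. The percentile definition (nearest rank) and the function names are assumptions made for illustration; the exact conventions of the Affymetrix software may differ.

```python
import math

def percentile_nearest_rank(values, p):
    """p-th percentile using the nearest-rank definition (an assumption)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def probe_cell_signal(cell):
    """75th percentile of the inner 8x8 pixels of a 10x10 probe cell."""
    inner = [value for row in cell[1:-1] for value in row[1:-1]]
    return percentile_nearest_rank(inner, 75)

def background(cell_signals, fraction=0.02):
    """Average of the lowest 2% of the probe cell signals."""
    ordered = sorted(cell_signals)
    n_low = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n_low]) / n_low

# Toy cell: a bright border (discarded) around inner pixel values 0..63.
cell = [[9999] * 10 for _ in range(10)]
for k in range(64):
    cell[1 + k // 8][1 + k % 8] = k
signal = probe_cell_signal(cell)
bg = background(range(1, 101))
```

Dropping the outer ring of 36 pixels makes the signal insensitive to small gridding errors at the cell border.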
![Page 44: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/44.jpg)
Data and notation
PM_ijg, MM_ijg = intensities for perfect match and mismatch probe j for gene g in chip i
i = 1,…, n one to hundreds of chips
j = 1,…, J usually 11 or 16 probe pairs
g = 1,…, G 6…30,000 probe sets.
Tasks:
calibrate (normalize) the measurements from different chips (samples)
summarize for each probe set the probe level data, i.e., 16 PM and MM pairs, into a single expression measure.
compare between chips (samples) for detecting differential expression.
![Page 45: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/45.jpg)
expression measures: MAS 4.0
Affymetrix GeneChip MAS 4.0 software uses AvDiff, a trimmed mean:
o sort d_j = PM_j - MM_j
o exclude highest and lowest value
o J := those pairs within 3 standard deviations of the average
$$\mathrm{AvDiff} = \frac{1}{\#J} \sum_{j \in J} (PM_j - MM_j)$$
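The three steps of AvDiff can be written down directly. This sketch is one reading of the slide, not the MAS 4.0 code: it assumes that the average and standard deviation are computed after the highest and lowest differences have been removed, and that J is drawn from that trimmed set.

```python
from statistics import mean, stdev

def avdiff(pm, mm):
    """Trimmed mean of PM - MM differences for one probe set (needs >= 4 pairs)."""
    d = sorted(p - m for p, m in zip(pm, mm))
    trimmed = d[1:-1]                           # exclude highest and lowest value
    center, spread = mean(trimmed), stdev(trimmed)
    kept = [x for x in trimmed if abs(x - center) <= 3 * spread]
    return mean(kept)

pm = [101, 102, 103, 104, 105, 200]
mm = [100, 100, 100, 100, 100, 100]
value = avdiff(pm, mm)   # differences 1,2,3,4,5,100; the 1 and the 100 are dropped
```

Since MM can exceed PM, individual differences and even AvDiff itself can be negative, which is one reason a plain log transformation is problematic for this measure.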
![Page 46: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/46.jpg)
Expression measures MAS 5.0
Instead of MM, use "repaired" version CT
CT = MM if MM < PM
CT = PM / "typical log-ratio" if MM >= PM
"Signal" = Tukey.Biweight(log(PM - CT))
(… median)
Tukey Biweight: B(x) = (1 – (x/c)^2)^2 if |x| < c, 0 otherwise
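A one-step Tukey biweight average uses the weight function B on deviations from the median, scaled by the median absolute deviation. The sketch below is an illustration under stated assumptions: the tuning constant c = 5 and the small epsilon that guards against division by zero are choices made here, not values given on the slide.

```python
from statistics import median

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight location estimate."""
    m = median(values)
    s = median(abs(v - m) for v in values)      # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * s + epsilon)         # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

clean = tukey_biweight([1.0, 2.0, 3.0, 4.0, 5.0])
robust = tukey_biweight([1.0, 2.0, 3.0, 4.0, 100.0])
```

Values far from the median get weight 0, so the single value 100 in the second call does not influence the result, whereas it would move the ordinary mean to 22.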
![Page 47: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/47.jpg)
Expression measures: Li & Wong
dChip fits a model for each gene
$$PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$
where
– $\theta_i$: expression index for the gene in chip i
– $\phi_j$: probe sensitivity
The maximum likelihood estimate of the MBEI (model-based expression index) is used as expression measure of the gene in chip i.
Need at least 10 or 20 chips.
Current version works with PMs only.
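With normal errors, maximum likelihood for the multiplicative model PM − MM = θ_i·φ_j + ε amounts to least squares, which can be solved by alternating between the two sets of parameters. The sketch below is not the dChip implementation. It assumes the identifiability constraint that the squared probe sensitivities sum to the number of probes, and it omits any handling of outlying probes or chips.

```python
def fit_li_wong(d, iterations=20):
    """Alternating least squares for d[i][j] ~ theta[i] * phi[j].

    d has one row per chip and one column per probe pair.
    Returns (theta, phi) with sum(phi_j^2) equal to the number of probes.
    """
    n_chips, n_probes = len(d), len(d[0])
    phi = [1.0] * n_probes

    def update_theta(phi):
        phi_sq = sum(p * p for p in phi)
        return [sum(d[i][j] * phi[j] for j in range(n_probes)) / phi_sq
                for i in range(n_chips)]

    for _ in range(iterations):
        theta = update_theta(phi)
        theta_sq = sum(t * t for t in theta)
        phi = [sum(d[i][j] * theta[i] for i in range(n_chips)) / theta_sq
               for j in range(n_probes)]
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]          # enforce the constraint
    return update_theta(phi), phi

# Noise-free toy data: 3 chips, 4 probe pairs.
raw = [1.0, 3.0, 2.0, 2.0]
norm = (len(raw) / sum(p * p for p in raw)) ** 0.5
true_phi = [p * norm for p in raw]
true_theta = [100.0, 200.0, 400.0]
d = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_li_wong(d)
```

The probe sensitivities are shared across chips, which is why the method needs a reasonable number of chips before they can be estimated reliably.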
![Page 48: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/48.jpg)
Affymetrix: I_PM = I_MM + I_specific ?
[Figure: distribution of log(PM/MM), with 0 marked]
From: R. Irizarry et al., Biostatistics 2002
![Page 49: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/49.jpg)
Chemistry
$$\log Y = \log x + \sum_{i=1}^{25} w_i(s_i)$$
[Figure: w_i plotted against position i]
position- and sequence-specific effects w_i(s): Naef et al., Phys Rev E 68 (2003)
![Page 50: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/50.jpg)
Expression measures RMA: Irizarry et al. (2002)
o Estimate one global background value b=mode(MM). No probe-specific background!
o Assume: PM = s_true + b
Estimate s0 from PM and b as a conditional expectation E[s_true | PM, b].
o Use log2(s).
o Nonparametric nonlinear calibration ('quantile normalization') across a set of chips.
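Quantile normalization forces every chip to have the same empirical distribution: each value is replaced by the mean, across chips, of the values that share its rank. A minimal sketch of that idea, assuming equal-length chips and ignoring ties:

```python
def quantile_normalize(chips):
    """Quantile normalization; chips is a list of equal-length intensity lists."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # reference distribution: mean across chips at each rank
    reference = [sum(s[rank] for s in sorted_chips) / len(chips) for rank in range(n)]
    result = []
    for chip in chips:
        order = sorted(range(n), key=lambda idx: chip[idx])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result

normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 9.0]])
print(normalized)   # [[7.0, 1.5, 3.5], [3.5, 1.5, 7.0]]
```

The method is nonparametric and nonlinear because it uses only the ranks within each chip and makes no assumption about the shape of the intensity-dependent distortion.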
![Page 51: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/51.jpg)
Robust expression measures RMA: Irizarry et al. (2002)
AvDiff-like:
$$\mathrm{RMA} = \frac{1}{|A|} \sum_{j \in A} \log_2(PM_j - BG_j)$$
with A a set of “suitable” pairs.
Li-Wong-like: additive model
$$\log_2(PM_{ij} - BG) = a_i + b_j + \varepsilon_{ij}$$
Estimate RMA = a_i for chip i using robust method median polish (successively remove row and column medians, accumulate terms, until convergence). Works with d>=2
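Median polish, as described in parentheses on this slide, alternately sweeps row and column medians out of a table of log intensities and accumulates them as effects. The sketch below follows Tukey's procedure in general form. It assumes rows are chips and columns are probes, and the stopping rule and names are choices made here.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Fit table[i][j] ~ overall + row[i] + col[j] by sweeping out medians."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [v - col_med[j] for j, v in enumerate(resid[i])]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(map(abs, row_med)) + max(map(abs, col_med)) < tol:
            break
    return overall, row_eff, col_eff, resid

# Toy table that is exactly additive: chip effects 10, 12, 20; probe effects 0, 1, 5.
table = [[a + b for b in (0.0, 1.0, 5.0)] for a in (10.0, 12.0, 20.0)]
overall, row_eff, col_eff, resid = median_polish(table)
chip_values = [overall + r for r in row_eff]   # per-chip expression estimates
```

In this reading the expression value for chip i is the overall term plus the chip (row) effect. Using medians rather than means keeps single aberrant probes from distorting it.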
![Page 52: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/52.jpg)
Software for pre-processing of Affymetrix data
• Bioconductor R package affy.
• Background estimation.
• Probe-level normalization.
• Expression measures
• Two main functions: ReadAffy, expresso.
• Can use vsn as a normalization method for expresso.
![Page 53: First analysis steps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649d225503460f949f8923/html5/thumbnails/53.jpg)
References
Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. YH Yang, S Dudoit, P Luu, DM Lin, V Peng, J Ngai and TP Speed. Nucl. Acids Res. 30(4):e15, 2002.
Variance Stabilization Applied to Microarray Data Calibration and to the Quantification of Differential Expression. W. Huber, A. v. Heydebreck, H. Sültmann, A. Poustka, M. Vingron. Bioinformatics, Vol. 18, Suppl. 1, S96-S104, 2002.
A Variance-Stabilizing Transformation for Gene Expression Microarray Data. Durbin BP, Hardin JS, Hawkins DM, Rocke DM. Bioinformatics, Vol. 18, Suppl. 1, S105-110.
Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP (2002). Accepted for publication in Biostatistics. http://biosun01.biostat.jhsph.edu/~ririzarr/papers/index.html
A more complete list of references is in:
Elementary analysis of microarray gene expression data. W. Huber, A. von Heydebreck, M. Vingron, manuscript. http://www.dkfz-heidelberg.de/abt0840/whuber/