04/21/23 1
Microarray Data Pre-Processing
Copyright notice
• Many of the images in this PowerPoint presentation come from other people. The copyright belongs to the original authors. Thanks!
Microarray data analysis: preprocessing
The main goal of data preprocessing is to remove the systematic bias in the data as completely as possible, while preserving the variation in gene expression that occurs because of biologically relevant changes in transcription.
Microarray data analysis: preprocessing
Observed differences in gene expression could be due to transcriptional changes, or they could be caused by artifacts such as:
• different labeling efficiencies of Cy3 and Cy5
• uneven spotting of DNA onto an array surface
• variations in RNA purity or quantity
• variations in washing efficiency
• variations in scanning efficiency
Microarray data analysis: preprocessing
• Image analysis
• Background correction
• Normalization
• Summarization
Image analysis
• The raw data from a cDNA microarray experiment consist of pairs of image files, 16-bit TIFFs, one for each of the dyes.
• Image analysis is required to extract measures of the red and green fluorescence intensities for each spot on the array.
Steps in Image Processing
1. Addressing: locate the spot centers.
2. Segmentation: classify pixels as either signal or background (e.g., using seeded region growing).
3. Information extraction: for each spot of the array, calculate signal intensity pairs, background, and quality measures.
Addressing
This is the process of assigning coordinates to each of the spots.
Automating this part of the procedure permits high throughput analysis.
4 by 4 grids, 19 by 21 spots per grid
Addressing
[Figure: registration]
Problems in automatic addressing
Misregistration of the red and green channels
Rotation of the array in the image
Skew in the array
[Figure: rotation]
Segmentation methods
• Fixed circle
• Adaptive circle
• Adaptive shape
– Edge detection.
– Seeded region growing (R. Adams and L. Bischof, 1994): regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region.
• Histogram methods
– Adaptive threshold.
Examples of algorithms and software implementation
| Method | Software / algorithms |
|---|---|
| Fixed circle | ScanAlyze, GenePix, QuantArray |
| Adaptive circle | GenePix |
| Adaptive shape | Edge detection and region growing |
| Histogram method | QuantArray and adaptive thresholding |
Limitation of fixed circle method
[Figure: SRG vs. fixed circle segmentation]
Limitation of circular segmentation
– Small spot
– Not circular
Results from SRG
Information Extraction
• Spot intensities
– mean (pixel intensities).
– median (pixel intensities).
– pixel variation (IQR of log(pixel intensities)).
• Background values
– Local
– Morphological opening
– Constant (global)
– None
• Quality information
[Figure: signal and background]
Background Correction
• Recall that the spot signal, or simply signal, is the fluorescence intensity due to target molecules hybridized to probe sequences contained in a spot (what we would like to measure) plus background fluorescence (what we would rather not measure).
• Background is fluorescence that may contribute to spot pixel intensities but is not due to fluorescence from target molecules hybridized to spot probe sequences.
• The idea is to remove background fluorescence from the spot signal fluorescence because the spot signal is believed to be a sum of fluorescence due to background and fluorescence due to hybridized target cDNA.
Local background
• Focusing on small regions surrounding the spot mask.
• Median of pixel values in this region
• Most software packages implement such an approach (ScanAlyze, ImaGene, Spot, GenePix).
• By not considering the pixels immediately surrounding the spots, the background estimate is less sensitive to the performance of the segmentation procedure.
Global background
• The global method subtracts a constant background for all spots.
• Some findings suggest that the binding of fluorescent dyes to ‘negative control spots’ is lower than the binding to the glass slide.
– It is more meaningful to estimate background based on a set of negative control spots.
– If there are no negative control spots: approximate the average background by the third percentile of all the spot foreground values.
Background Correction Strategies (applied prior to logging signal intensity)
1. Subtract local background, e.g., signal mean – background mean or signal mean – background median.
This can increase variation in measurements, especially for low-expressing genes. Some believe that local background will overestimate the background contribution to spot fluorescence. Background fluorescence where cDNA has been spotted may be different from background where no cDNA has been spotted.
2. For each spot, find the local background of the spot as well as the local backgrounds of all neighboring spots. Compute the median or mean of these local backgrounds. Subtract that summary of local backgrounds from the spot’s signal.
This is similar to option 1 but can reduce some variation in background estimation.
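Option 2 can be sketched as follows. This is a minimal illustration, assuming that "neighboring spots" means the up to eight adjacent spots in the grid and that the median is used as the summary; the function name `smoothed_background` and all numbers are hypothetical.

```python
from statistics import median


def smoothed_background(local_bg, row, col):
    """Median of the local backgrounds of a spot and its adjacent spots.

    `local_bg` is a list of rows of local background estimates.
    Assumption: neighbours are the (up to eight) adjacent grid positions.
    """
    n_rows, n_cols = len(local_bg), len(local_bg[0])
    values = [
        local_bg[r][c]
        for r in range(max(0, row - 1), min(n_rows, row + 2))
        for c in range(max(0, col - 1), min(n_cols, col + 2))
    ]
    return median(values)


# Hypothetical local background estimates for a 3 x 3 block of spots.
local_bg = [
    [100, 104, 98],
    [102, 250, 101],  # the centre spot has an unusually high local background
    [99, 103, 97],
]
signal = 900  # hypothetical raw signal of the centre spot
corrected = signal - smoothed_background(local_bg, 1, 1)
```

The single high local background at the centre spot has little influence on the summary, which is the variance reduction the slide refers to.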
3. Find the median or mean of local backgrounds in a sector. Subtract the sector summary of local backgrounds from each signal in the sector.
4. Subtract the median or mean of blank spot signals or negative control signals in a sector from all other signals in a sector.
5. Estimate the background for each spot by fitting a row and column model to the local background values in a sector. (See next slide.)
Modeling local backgrounds within each sector (Kafadar and Phang (2003). CSDA 44, 313-338)
$$b_{ij} = m + r_i + c_j + e_{ij}$$
where
• $b_{ij}$ is the background for the spot in the $i$th row and $j$th column of the sector,
• $m$ is the baseline background for the sector,
• $r_i$ is the row effect for the sector,
• $c_j$ is the column effect for the sector,
• $e_{ij}$ is the residual.
An estimated background $\hat{b}_{ij}$ for each spot is obtained via median polish.
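The row and column model can be fitted with a generic median polish, sketched below. This is a textbook-style version (alternately sweeping out row and column medians), not necessarily the exact procedure of Kafadar and Phang; the function name, the fixed iteration count, and the example table are assumptions made for illustration.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Fit table[i][j] ~ overall + row_eff[i] + col_eff[j] by median polish."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians out of the residuals.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    fitted = [[overall + row_eff[i] + col_eff[j] for j in range(n_cols)]
              for i in range(n_rows)]
    return overall, row_eff, col_eff, fitted


# Hypothetical local backgrounds for a 3 x 3 sector: an additive pattern
# (100 + row effect + column effect) with one contaminated spot at [1][1].
table = [
    [100, 101, 103],
    [102, 153, 105],
    [105, 106, 108],
]
overall, row_eff, col_eff, fitted = median_polish(table)
```

In this example the contaminated value is left in the residual, so `fitted[1][1]` is the estimated background $\hat{b}_{ij}$ implied by the surrounding rows and columns.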
Comments on Background Correction
• Subtracting background may result in negative or zero adjusted signal values. Such values cannot be logged. One simple approach is to replace all negative values by zero, add one to all values (whether zero or not), and log the resulting values.
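A minimal sketch of this simple approach is shown below. The slide does not state the base of the logarithm; base 2 is assumed here, and the function name is hypothetical.

```python
import math


def log_adjusted_signal(signal, background):
    """Background-correct one spot and log it (base 2 is an assumption)."""
    adjusted = max(signal - background, 0.0)  # negative values become zero
    return math.log2(adjusted + 1.0)          # add one so that zero can be logged


low = log_adjusted_signal(50, 80)     # signal below background
high = log_adjusted_signal(1103, 80)  # 1023 after correction, 1024 after adding one
```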
Data Normalization
• Large sets of experiments involve dozens to hundreds of arrays
• To make the arrays comparable, the data need to be normalized
• Because equal amounts of mRNA are used in all arrays, the spot intensities of an array should sum to a fixed number
What is Normalization?
• Normalization describes the process of removing (or minimizing) non-biological variation in the measured gene expression levels of hybridized mRNA so that biological differences can be more easily detected.
• Typically normalization is attempting to remove global effects, i.e., effects that can be seen by examining plots that show all the data for a slide or slides.
• Normalization does not necessarily have anything to do with the normal distribution that plays a prominent role in statistics.
Sources of Non-Biological Variation
• Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation
• Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment
– Channel is used to refer to a combination of a dye and a slide.
• Variation across replicate slides
• Variation across hybridization conditions
• Variation in scanning conditions
• Variation among technicians doing the lab work
• etc.
Normalization Methods for Two-Color Microarray Data
Side-by-side boxplots show examples of variation across channels.
[Figure: side-by-side boxplots for Slide 1 Cy3, Slide 1 Cy5, Slide 2 Cy3, and Slide 2 Cy5, annotated with minimum, Q1 = 25th percentile, median, Q3 = 75th percentile, and maximum]
Interquartile range (IQR) is Q3 − Q1. Points more than 1.5*IQR above Q3 or more than 1.5*IQR below Q1 are displayed individually.
[Figure: boxplot annotated with minimum, Q1 = 25th percentile, median, Q3 = 75th percentile, and maximum]
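The 1.5*IQR rule can be sketched as below. Quartile conventions differ between software packages; this example assumes the "inclusive" method of Python's `statistics.quantiles`, and the data values are hypothetical.

```python
from statistics import quantiles


def boxplot_fences(values):
    """Return (lower, upper) limits beyond which points are drawn individually."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


values = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical log signals
lower, upper = boxplot_fences(values)
outliers = [v for v in values if v < lower or v > upper]
```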
One of the simplest normalization strategies is to align the log signals so that all channels have the same median.
• The value of the common median is not important for subsequent analyses.
• A convenient choice is zero so that positive or negative values reflect signals above or below the median for a particular channel.
• If negative normalized signal values seem confusing, any positive constant may be added to all values after normalization to zero medians.
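Median centering of one channel can be sketched as follows. The function name, the `target` parameter (used to add a positive constant after centering), and the example values are assumptions for illustration.

```python
from statistics import median


def median_center(log_signals, target=0.0):
    """Shift one channel's log signals so that their median equals `target`."""
    shift = target - median(log_signals)
    return [x + shift for x in log_signals]


channel_a = [8.0, 7.0, 3.0, 1.0, 9.0]    # hypothetical log signals, channel A
channel_b = [15.0, 2.0, 6.0, 5.0, 13.0]  # hypothetical log signals, channel B

centered_a = median_center(channel_a)                # median is now 0
centered_b = median_center(channel_b)                # median is now 0
shifted_a = median_center(channel_a, target=10.0)    # same data, all values positive
```

Because every value in a channel is shifted by the same amount, differences between genes within a channel are unchanged, whichever common median is chosen.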
[Figure: boxplots by channel; y-axis: Log Mean Signal Centered at 0]
Note that medians match but variation seems to differ greatly across channels.
[Figure: boxplots by channel; y-axis: Log Mean Signal Centered at 0]
Scale normalization (Yang et al. 2002. Nucleic Acids Research, 30, 4 e15)
Consider a matrix $X$ with $i=1,\dots,I$ rows and $j=1,\dots,J$ columns.
Let $x_{ij}$ denote the entry in row $i$ and column $j$.
We will apply scale normalization to the matrix of log signal mean values that have already been median centered (each row corresponds to a gene and each column corresponds to a channel).
For each column $j$, let $m_j = \mathrm{median}(x_{1j}, x_{2j}, \dots, x_{Ij})$.
For each column $j$, let $\mathrm{MAD}_j = \mathrm{median}(|x_{1j}-m_j|, |x_{2j}-m_j|, \dots, |x_{Ij}-m_j|)$.
MAD: median absolute deviation
To scale normalize the columns of $X$ to a constant value $C$, multiply all the entries in the $j$th column by $C/\mathrm{MAD}_j$ for all $j=1,\dots,J$.
A common choice for $C$ is the geometric mean of $\mathrm{MAD}_1,\dots,\mathrm{MAD}_J$:
$$C = \left(\prod_{j=1}^{J} \mathrm{MAD}_j\right)^{1/J}$$
The choice of $C$ will not affect subsequent tests or p-values but will affect fold change calculations.
*Yang et al. recommended scale normalization for log R/G values.
[Figure: Data after Median Centering and Scale Normalizing; y-axis: Log Mean Signal (centered and scaled)]
A Simple Example
| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 2 | 7 | 15 |
| 3 | 3 | 6 | 5 | 8 |
| 4 | 1 | 5 | 2 | 9 |
| 5 | 9 | 13 | 6 | 11 |
Determine Channel Medians
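| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 2 | 7 | 15 |
| 3 | 3 | 6 | 5 | 8 |
| 4 | 1 | 5 | 2 | 9 |
| 5 | 9 | 13 | 6 | 11 |
| medians | 7 | 6 | 6 | 11 |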
Subtract Channel Medians
| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 1 | 9 | 3 | 2 |
| 2 | 0 | -4 | 1 | 4 |
| 3 | -4 | 0 | -1 | -3 |
| 4 | -6 | -1 | -4 | -2 |
| 5 | 2 | 7 | 0 | 0 |
This is the data after median centering.
Find Median Absolute Deviations
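| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 1 | 9 | 3 | 2 |
| 2 | 0 | -4 | 1 | 4 |
| 3 | -4 | 0 | -1 | -3 |
| 4 | -6 | -1 | -4 | -2 |
| 5 | 2 | 7 | 0 | 0 |
| MAD | 2 | 4 | 1 | 2 |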
Find Scaling Constant
| | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| MAD | 2 | 4 | 1 | 2 |

$$C = (2 \cdot 4 \cdot 1 \cdot 2)^{1/4} = 2$$
Find Scaling Factors
| | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| Scaling factor ($C/\mathrm{MAD}_j$) | 2/2 | 2/4 | 2/1 | 2/2 |
Scale Normalize the Median Centered Data

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 1 | 4.5 | 6 | 2 |
| 2 | 0 | -2.0 | 2 | 4 |
| 3 | -4 | 0.0 | -2 | -3 |
| 4 | -6 | -0.5 | -8 | -2 |
| 5 | 2 | 3.5 | 0 | 0 |

This is the data after median centering and scale normalizing.
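The worked example can be checked with a short script. This is a sketch that uses the five-gene table from the example; the variable and function names are chosen here for illustration.

```python
import math
from statistics import median

# Values from the simple example (one list per channel, genes 1 to 5).
channels = {
    "Slide1Cy3": [8, 7, 3, 1, 9],
    "Slide1Cy5": [15, 2, 6, 5, 13],
    "Slide2Cy3": [9, 7, 5, 2, 6],
    "Slide2Cy5": [13, 15, 8, 9, 11],
}


def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median(abs(v - m) for v in values)


# Step 1: subtract each channel's median.
centered = {name: [v - median(vals) for v in vals]
            for name, vals in channels.items()}

# Step 2: MAD of each channel and their geometric mean C.
mads = {name: mad(vals) for name, vals in centered.items()}
C = math.prod(mads.values()) ** (1.0 / len(mads))

# Step 3: multiply each centered channel by C / MAD_j.
scaled = {name: [v * C / mads[name] for v in vals]
          for name, vals in centered.items()}
```

After scaling, every channel has median 0 and MAD equal to `C`, which is what makes the channels comparable in spread.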
[Figure: Slide 1 Log Signal Means after Median Centering and Scaling All Channels; x-axis: Log Green, y-axis: Log Red]
Evidence of intensity-dependent dye bias
[Figure: M vs. A Plot of the Logged, Centered, and Scaled Slide 1 Data; x-axis: A = (Log Green + Log Red) / 2, y-axis: M = Log Red − Log Green]
To handle intensity-dependent dye bias, Yang et al. (2002. Nucleic Acids Research, 30, 4 e15) recommend “lowess” normalization prior to median centering and scale normalizing.
“lowess” stands for LOcally WEighted polynomial regreSSion.
The original reference for lowess is Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. JASA 74, 829-836.
LOESS
• At each point in the data set a low-degree polynomial is fit to a subset of the data, with explanatory variable values near the point whose response is being estimated.
• The polynomial is fit using weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away.
• The value of the regression function for the point is then obtained by evaluating the local polynomial using the explanatory variable values for that data point.
• The LOESS fit is complete after regression function values have been computed for each of the n data points.
From Wikipedia, the free encyclopedia
[Figure: Slide 1 Log Signal Means; x-axis: Log Green, y-axis: Log Red]
[Figure: M vs. A Plot for Slide 1 Log Signal Means; x-axis: A = (Log Green + Log Red) / 2, y-axis: M = Log Red − Log Green]
[Figure: M vs. A Plot for Slide 1 Log Signal Means with lowess fit (f=0.40); same axes]
[Figure: Adjust M Values; same axes]
[Figure: M vs. A Plot after Adjustment; x-axis: A = (Adjusted Log Green + Adjusted Log Red) / 2, y-axis: M = Adjusted Log Red − Adjusted Log Green]
[Figure: M vs. A Plot for Slide 1 Log Signal Means; x-axis: Adjusted Log Green, y-axis: Adjusted Log Red]
adjusted log red = log red − adj/2
adjusted log green = log green + adj/2
where adj = lowess fitted value
[Figure: M vs. A Plot for Slide 1 Log Signal Means with lowess fit (f=0.40); x-axis: A = (Log Green + Log Red) / 2, y-axis: M = Log Red − Log Green; the fitted value 0.883 is marked at A=7]
For spots with A=7, the lowess fitted value is 0.883. Thus the value of adj discussed on the previous slide is 0.883 for spots with A=7.
The M value for such spots would be moved down by 0.883. The log red value would be decreased by 0.883/2 and the log green value would be increased by 0.883/2 to obtain adjusted log red and adjusted log green values, respectively.
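The adjustment can be sketched for a single spot as follows. The fitted value 0.883 at A=7 is taken from the slide; the spot's log red and log green values and the function name are hypothetical.

```python
def ma_adjust(log_red, log_green, adj):
    """Split the lowess fitted value `adj` equally between the two channels."""
    adjusted_red = log_red - adj / 2
    adjusted_green = log_green + adj / 2
    return adjusted_red, adjusted_green


# Hypothetical spot with A = 7 and M = 1 before adjustment.
log_red, log_green = 7.5, 6.5
A_before = (log_green + log_red) / 2
M_before = log_red - log_green

adj = 0.883  # lowess fitted value at A = 7 (from the slide)
adj_red, adj_green = ma_adjust(log_red, log_green, adj)
A_after = (adj_green + adj_red) / 2
M_after = adj_red - adj_green
```

Splitting the correction equally leaves A unchanged and lowers M by exactly `adj`, so the point moves straight down in the M vs. A plot.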
How is the lowess curve determined? Weight function
Suppose we have data points $(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)$.
Let $0 < f \le 1$ denote a fraction that will determine the smoothness of the curve.
Let $r = n f$ rounded to the nearest integer.
For $i=1,\dots,n$, let $h_i$ be the $r$th smallest number among $|x_i-x_1|, |x_i-x_2|, \dots, |x_i-x_n|$.
Consider the tricube weight function defined as
$$T(t) = \begin{cases} \left(1 - |t|^3\right)^3 & \text{for } |t| < 1 \\ 0 & \text{for } |t| \ge 1. \end{cases}$$
For $k=1,2,\dots,n$, let $w_k(x_i) = T\big((x_k - x_i)/h_i\big)$.
[Figure: Tricube Weight Function; x-axis: t, y-axis: T(t)]
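An Example

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $x_i$ | 1 | 2 | 5 | 7 | 12 | 13 | 15 | 25 | 27 | 30 |
| $y_i$ | 1 | 8 | 4 | 5 | 3 | 9 | 16 | 15 | 23 | 29 |

Suppose a lowess curve will be fit to this data with f=0.4.
[Figure: scatterplot of y versus x]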
Table Containing |xi-xj| Values
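| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|---|---|---|---|---|---|---|---|---|---|---|
| x1 | 0 | 1 | 4 | 6 | 11 | 12 | 14 | 24 | 26 | 29 |
| x2 | 1 | 0 | 3 | 5 | 10 | 11 | 13 | 23 | 25 | 28 |
| x3 | 4 | 3 | 0 | 2 | 7 | 8 | 10 | 20 | 22 | 25 |
| x4 | 6 | 5 | 2 | 0 | 5 | 6 | 8 | 18 | 20 | 23 |
| x5 | 11 | 10 | 7 | 5 | 0 | 1 | 3 | 13 | 15 | 18 |
| x6 | 12 | 11 | 8 | 6 | 1 | 0 | 2 | 12 | 14 | 17 |
| x7 | 14 | 13 | 10 | 8 | 3 | 2 | 0 | 10 | 12 | 15 |
| x8 | 24 | 23 | 20 | 18 | 13 | 12 | 10 | 0 | 2 | 5 |
| x9 | 26 | 25 | 22 | 20 | 15 | 14 | 12 | 2 | 0 | 3 |
| x10 | 29 | 28 | 25 | 23 | 18 | 17 | 15 | 5 | 3 | 0 |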
Calculation of hi from |xi-xj| Values
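| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | $h_i$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| x1 | 0 | 1 | 4 | 6 | 11 | 12 | 14 | 24 | 26 | 29 | h1 = 6 |
| x2 | 1 | 0 | 3 | 5 | 10 | 11 | 13 | 23 | 25 | 28 | h2 = 5 |
| x3 | 4 | 3 | 0 | 2 | 7 | 8 | 10 | 20 | 22 | 25 | h3 = 4 |
| x4 | 6 | 5 | 2 | 0 | 5 | 6 | 8 | 18 | 20 | 23 | h4 = 5 |
| x5 | 11 | 10 | 7 | 5 | 0 | 1 | 3 | 13 | 15 | 18 | h5 = 5 |
| x6 | 12 | 11 | 8 | 6 | 1 | 0 | 2 | 12 | 14 | 17 | h6 = 6 |
| x7 | 14 | 13 | 10 | 8 | 3 | 2 | 0 | 10 | 12 | 15 | h7 = 8 |
| x8 | 24 | 23 | 20 | 18 | 13 | 12 | 10 | 0 | 2 | 5 | h8 = 10 |
| x9 | 26 | 25 | 22 | 20 | 15 | 14 | 12 | 2 | 0 | 3 | h9 = 12 |
| x10 | 29 | 28 | 25 | 23 | 18 | 17 | 15 | 5 | 3 | 0 | h10 = 15 |

n=10, f=0.4, r=4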
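The $h_i$ values in this table can be reproduced with a few lines of code. This is a sketch using the x values of the example; the function name `bandwidths` is hypothetical, and rounding half up is assumed for r.

```python
def bandwidths(x, f):
    """For each x_i, return the r-th smallest distance |x_i - x_k|, r = round(n * f)."""
    r = int(len(x) * f + 0.5)  # assumption: round half up
    return [sorted(abs(xi - xk) for xk in x)[r - 1] for xi in x]


x = [1, 2, 5, 7, 12, 13, 15, 25, 27, 30]  # x values from the example
h = bandwidths(x, f=0.4)
```

Note that the distance of each point to itself (zero) counts as one of the r smallest distances, as it does in the table.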
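Weights wk(xi) Rounded to Nearest 0.001 (rows indexed by i, columns by k)

| i \ k | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.000 | 0.986 | 0.348 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2 | 0.976 | 1.000 | 0.482 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 3 | 0.000 | 0.193 | 1.000 | 0.670 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 4 | 0.000 | 0.000 | 0.820 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 5 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.976 | 0.482 | 0.000 | 0.000 | 0.000 |
| 6 | 0.000 | 0.000 | 0.000 | 0.000 | 0.986 | 1.000 | 0.893 | 0.000 | 0.000 | 0.000 |
| 7 | 0.000 | 0.000 | 0.000 | 0.000 | 0.850 | 0.954 | 1.000 | 0.000 | 0.000 | 0.000 |
| 8 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.976 | 0.670 |
| 9 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.986 | 1.000 | 0.954 |
| 10 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.893 | 0.976 | 1.000 |

$$w_6(x_5) = \left(1 - \left(\frac{|x_6 - x_5|}{h_5}\right)^3\right)^3 = \left(1 - \left(\frac{|13 - 12|}{5}\right)^3\right)^3 = \left(1 - \frac{1}{125}\right)^3 \approx 0.976$$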
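The bandwidths hi and the weights in the table above can be reproduced with a short sketch. It assumes the ten x values are 1, 2, 5, 7, 12, 13, 15, 25, 27, 30 (inferred from the |xi-xj| table and the x5 = 12, x6 = 13 used in the worked example) and that r = fn; the function names are illustrative only.

```python
import numpy as np

# x values inferred from the distance table (an assumption, not stated on the slide)
x = np.array([1, 2, 5, 7, 12, 13, 15, 25, 27, 30], dtype=float)

def lowess_bandwidths(x, f):
    """h_i = r-th smallest |x_i - x_j| over j, with r = f * n rounded."""
    r = int(round(f * len(x)))
    dist = np.abs(x[:, None] - x[None, :])
    return np.sort(dist, axis=1)[:, r - 1]

def tricube_weights(x, h):
    """Entry [i, k] holds w_k(x_i) = (1 - (|x_k - x_i| / h_i)^3)^3, or 0 beyond h_i."""
    u = np.clip(np.abs(x[None, :] - x[:, None]) / h[:, None], 0.0, 1.0)
    return (1.0 - u**3) ** 3

h = lowess_bandwidths(x, f=0.4)
w = tricube_weights(x, h)
print(h)             # bandwidths h_1 ... h_10
print(w.round(3))    # weight table, rows i, columns k
```

Row i of `w` corresponds to row i of the weight table, so `w[4, 5]` is w6(x5).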
![Page 59: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/59.jpg)
04/21/23 59
How is the lowess curve determined? Regression

For each i = 1, 2, ..., n; let $\hat\beta_0^*(x_i)$ and $\hat\beta_1^*(x_i)$ denote the values of $\beta_0$ and $\beta_1$ that minimize

$$\sum_{k=1}^{n} w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2 .$$

For i = 1, 2, ..., n; let $\hat y_i^* = \hat\beta_0^*(x_i) + \hat\beta_1^*(x_i)\,x_i$ and $e_i = y_i - \hat y_i^*$.

Consider the bisquare weight function defined as

$$B(t) = \begin{cases} (1 - t^2)^2 & \text{for } |t| < 1 \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: Bisquare Weight Function, B(t) plotted against t]

For k = 1, 2, ..., n; let $\delta_k = B\big(e_k / (6s)\big)$, where s is the median of |e1|, |e2|, ..., |en|.
![Page 60: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/60.jpg)
04/21/23 60
How is the lowess curve determined?

For each i = 1, 2, ..., n; let $\hat\beta_0(x_i)$ and $\hat\beta_1(x_i)$ denote the values of $\beta_0$ and $\beta_1$ that minimize

$$\sum_{k=1}^{n} \delta_k\, w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2 .$$

For i = 1, 2, ..., n; let $\hat y_i = \hat\beta_0(x_i) + \hat\beta_1(x_i)\,x_i$.

Now use the new fitted values $\hat y_i$ to compute new $\delta_k$ as on the previous slide. Substitute the new $\delta_k$ for the old $\delta_k$ in the expression above and repeat the minimization described above to obtain new $\hat y_i$ values. These resulting $\hat y_i$ values are the lowess fitted values. Plot these values versus x1, x2, ..., xn and connect with straight lines to obtain the lowess curve.
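The whole procedure (tricube-weighted local lines, then two rounds of bisquare reweighting of the residuals) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are made up here, r is taken as fn rounded, the x values are assumed distinct enough that every hi is positive, and a small guard is added for the case where the median absolute residual s is numerically zero (the slides do not cover that case).

```python
import numpy as np

def tricube_weights(x, f):
    """Entry [i, k] is w_k(x_i); h_i is the r-th smallest |x_i - x_j|, r = f * n."""
    r = int(round(f * len(x)))
    dist = np.abs(x[:, None] - x[None, :])
    h = np.sort(dist, axis=1)[:, r - 1]
    u = np.clip(dist / h[:, None], 0.0, 1.0)
    return (1.0 - u**3) ** 3

def bisquare(t):
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < 1, (1.0 - t**2) ** 2, 0.0)

def local_fits(x, y, w, delta):
    """For each i, minimize sum_k delta_k w_k(x_i) (y_k - b0 - b1 x_k)^2."""
    design = np.column_stack([np.ones_like(x), x])
    fitted = np.empty(len(x))
    for i in range(len(x)):
        sw = np.sqrt(delta * w[i])          # weighted least squares via row scaling
        b0, b1 = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
        fitted[i] = b0 + b1 * x[i]
    return fitted

def lowess(x, y, f=0.4, robust_iters=2):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = tricube_weights(x, f)
    fitted = local_fits(x, y, w, np.ones(len(x)))   # first pass, delta_k = 1
    for _ in range(robust_iters):
        e = y - fitted
        s = np.median(np.abs(e))
        if s <= 1e-12 * (1.0 + np.max(np.abs(y))):  # guard added for this sketch
            break
        delta = bisquare(e / (6.0 * s))
        fitted = local_fits(x, y, w, delta)
    return fitted

x_demo = np.array([1, 2, 5, 7, 12, 13, 15, 25, 27, 30], dtype=float)
print(lowess(x_demo, np.sqrt(x_demo)))
```

Plotting the returned values against x and joining them with straight lines gives the lowess curve described above.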
![Page 61: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/61.jpg)
04/21/23 61
![Page 62: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/62.jpg)
04/21/23 62
![Page 63: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/63.jpg)
04/21/23 63
![Page 64: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/64.jpg)
04/21/23 64
![Page 65: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/65.jpg)
04/21/23 65
![Page 66: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/66.jpg)
04/21/23 66
![Page 67: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/67.jpg)
04/21/23 67
![Page 68: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/68.jpg)
04/21/23 68
![Page 69: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/69.jpg)
04/21/23 69
![Page 70: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/70.jpg)
04/21/23 70
![Page 71: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/71.jpg)
04/21/23 71
Plot Showing All 10 Lines and Predicted Values after One More Iteration
![Page 72: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/72.jpg)
04/21/23 72
The Lowess Curve
![Page 73: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/73.jpg)
04/21/23 73
After a separate lowess normalization for each slide, the adjusted values can be median centered and scale normalized across all channels using the lowess-normalized data for each channel.
A sector represents the set of points spotted by a single pin on a single slide. The entire normalization process described above can be carried out separately for each sector on each channel.
It may be necessary to normalize by sector/channel combinations if spatial variability is apparent.
![Page 74: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/74.jpg)
04/21/23 74
Boxplots of Mean Signal after Logging, Lowess Normalization, Median Centering, and Scaling
[Figure: boxplots; vertical axis: Normalized Signal]
![Page 75: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/75.jpg)
04/21/23 75
Bolstad, et al. (2003, Bioinformatics 19(2):185-193) propose quantile normalization for microarray data
• Quantile normalization is most commonly used in normalization of Affymetrix data.
• It can be used for two-color data as well.
• Quantile normalization can force each channel to have the same quantiles.
• xq (for q between 0 and 1) is the q quantile of a data set if the fraction of the data points less than or equal to xq is at least q, and the fraction of the data points greater than or equal to xq is at least 1-q.
• median = x0.5, Q1 = x0.25, Q3 = x0.75
![Page 76: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/76.jpg)
04/21/23 76
Boxplots of Log Signal Means after Quantile Normalization
![Page 77: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/77.jpg)
04/21/23 77
Original Slide 1 Log Signal Means
[Figure: Log Red plotted against Log Green]
![Page 78: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/78.jpg)
04/21/23 78
Comparison of Slide 1 Log Signal Means after Quantile Normalization
[Figure: Log Red plotted against Log Green]
![Page 79: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/79.jpg)
04/21/23 79
Details of Quantile Normalization
1. Find the smallest log signal on each channel.
2. Average the values from step 1.
3. Replace each value in step 1 with the average computed in step 2.
4. Repeat steps 1 through 3 for the second smallest values, third smallest values,..., largest values.
![Page 80: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/80.jpg)
04/21/23 80
A Simple Example
| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 2 | 7 | 15 |
| 3 | 3 | 6 | 5 | 8 |
| 4 | 1 | 5 | 2 | 9 |
| 5 | 9 | 13 | 6 | 11 |
![Page 81: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/81.jpg)
04/21/23 81
Find the Smallest Value for Each Channel

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 2 | 7 | 15 |
| 3 | 3 | 6 | 5 | 8 |
| 4 | 1 | 5 | 2 | 9 |
| 5 | 9 | 13 | 6 | 11 |
![Page 82: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/82.jpg)
04/21/23 82
Average These Values
(1+2+2+8)/4=3.25

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 2 | 7 | 15 |
| 3 | 3 | 6 | 5 | 8 |
| 4 | 1 | 5 | 2 | 9 |
| 5 | 9 | 13 | 6 | 11 |
![Page 83: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/83.jpg)
04/21/23 83
Replace Each Value by the Average
(1+2+2+8)/4=3.25

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 3.25 | 7 | 15 |
| 3 | 3 | 6 | 5 | 3.25 |
| 4 | 3.25 | 5 | 3.25 | 9 |
| 5 | 9 | 13 | 6 | 11 |
![Page 84: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/84.jpg)
04/21/23 84
Find the Next Smallest Values

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 3.25 | 7 | 15 |
| 3 | 3 | 6 | 5 | 3.25 |
| 4 | 3.25 | 5 | 3.25 | 9 |
| 5 | 9 | 13 | 6 | 11 |
![Page 85: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/85.jpg)
04/21/23 85
Average These Values

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 3.25 | 7 | 15 |
| 3 | 3 | 6 | 5 | 3.25 |
| 4 | 3.25 | 5 | 3.25 | 9 |
| 5 | 9 | 13 | 6 | 11 |

(3+5+5+9)/4=5.5
![Page 86: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/86.jpg)
04/21/23 86
Replace Each Value by the Average

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 3.25 | 7 | 15 |
| 3 | 5.50 | 6 | 5.50 | 3.25 |
| 4 | 3.25 | 5.50 | 3.25 | 5.50 |
| 5 | 9 | 13 | 6 | 11 |
![Page 87: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/87.jpg)
04/21/23 87
Find the Average of the Next Smallest Values

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7 | 3.25 | 7 | 15 |
| 3 | 5.50 | 6 | 5.50 | 3.25 |
| 4 | 3.25 | 5.50 | 3.25 | 5.50 |
| 5 | 9 | 13 | 6 | 11 |

(7+6+6+11)/4=7.5
![Page 88: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/88.jpg)
04/21/23 88
Replace Each Value by the Average

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7.50 | 3.25 | 7 | 15 |
| 3 | 5.50 | 7.50 | 5.50 | 3.25 |
| 4 | 3.25 | 5.50 | 3.25 | 5.50 |
| 5 | 9 | 13 | 7.50 | 7.50 |
![Page 89: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/89.jpg)
04/21/23 89
Find the Average of the Next Smallest Values

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 8 | 15 | 9 | 13 |
| 2 | 7.50 | 3.25 | 7 | 15 |
| 3 | 5.50 | 7.50 | 5.50 | 3.25 |
| 4 | 3.25 | 5.50 | 3.25 | 5.50 |
| 5 | 9 | 13 | 7.50 | 7.50 |

(8+13+7+13)/4=10.25
![Page 90: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/90.jpg)
04/21/23 90
Replace Each Value by the Average

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 10.25 | 15 | 9 | 10.25 |
| 2 | 7.50 | 3.25 | 10.25 | 15 |
| 3 | 5.50 | 7.50 | 5.50 | 3.25 |
| 4 | 3.25 | 5.50 | 3.25 | 5.50 |
| 5 | 9 | 10.25 | 7.50 | 7.50 |
![Page 91: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/91.jpg)
04/21/23 91
Find the Average of the Next Smallest Values

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 10.25 | 15 | 9 | 10.25 |
| 2 | 7.50 | 3.25 | 10.25 | 15 |
| 3 | 5.50 | 7.50 | 5.50 | 3.25 |
| 4 | 3.25 | 5.50 | 3.25 | 5.50 |
| 5 | 9 | 10.25 | 7.50 | 7.50 |

(9+15+9+15)/4=12.00
![Page 92: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/92.jpg)
04/21/23 92
Replace Each Value by the Average

| Gene | Slide1Cy3 | Slide1Cy5 | Slide2Cy3 | Slide2Cy5 |
|---|---|---|---|---|
| 1 | 10.25 | 12.00 | 12.00 | 10.25 |
| 2 | 7.50 | 3.25 | 10.25 | 12.00 |
| 3 | 5.50 | 7.50 | 5.50 | 3.25 |
| 4 | 3.25 | 5.50 | 3.25 | 5.50 |
| 5 | 12.00 | 10.25 | 7.50 | 7.50 |

This is the data matrix after quantile normalization.
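The rank-by-rank averaging walked through above can be done in one pass by sorting each channel. The sketch below applies it to the five-gene example; `quantile_normalize` is an illustrative name, and ties within a channel are broken by position, which is a simplification the slides do not discuss.

```python
import numpy as np

def quantile_normalize(data):
    """Replace the k-th smallest value of every column by the mean of the
    k-th smallest values across columns (rows = genes, columns = channels)."""
    data = np.asarray(data, dtype=float)
    order = np.argsort(data, axis=0, kind="stable")   # row positions in sorted order
    rank_means = np.sort(data, axis=0).mean(axis=1)   # one average per rank
    out = np.empty_like(data)
    for j in range(data.shape[1]):
        out[order[:, j], j] = rank_means
    return out

# Columns: Slide1Cy3, Slide1Cy5, Slide2Cy3, Slide2Cy5 (genes 1 to 5 in rows)
example = np.array([
    [8, 15, 9, 13],
    [7,  2, 7, 15],
    [3,  6, 5,  8],
    [1,  5, 2,  9],
    [9, 13, 6, 11],
])
normalized = quantile_normalize(example)
print(normalized)
```

After normalization every column contains the same set of values (3.25, 5.5, 7.5, 10.25, 12), only in different gene order.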
![Page 93: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/93.jpg)
04/21/23 93
Background Correction and Normalization of Affymetrix GeneChip Data
![Page 94: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/94.jpg)
04/21/23 94
Affymetrix .CEL Files
• A .CEL file contains one number representing signal intensity for each probe cell on a single GeneChip.
• .CEL files can be read with Affymetrix software or in R using the Bioconductor package affy.
• We will discuss two methods for normalizing and obtaining expression measures using data from Affymetrix .CEL files.
![Page 95: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/95.jpg)
04/21/23 95
Methods
1. Microarray Analysis Suite (MAS) 5.0 Signal proposed by Affymetrix. Statistical Algorithms Description Document (2002) Affymetrix Inc.
2. Robust Multi-array Average (RMA) proposed by Irizarry et al. (2003) Biostatistics 4, 249-264.
These are perhaps the two most popular of many methods for normalizing and computing expression measures using Affymetrix data. Currently over 50 methods are described and compared at http://affycomp.biostat.jhsph.edu/.
![Page 96: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/96.jpg)
04/21/23 96
MAS 5.0 Signal: Background Adjustment
• Each chip is divided into 16 rectangular zones.
• The lowest 2% of intensities in each zone are averaged to form a zone-specific background value denoted bZk for zones k=1, 2, ..., 16.
• The standard deviation of the lowest 2% of intensities in each zone is calculated and denoted nZk for zones k=1, 2, ..., 16.
• Let dk(x,y) denote the distance from the center of zone k to a probe cell located at coordinates (x,y) on the chip.
![Page 97: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/97.jpg)
04/21/23 97
GeneChip Divided into 16 Zones
[Figure: the chip drawn as a 4 × 4 grid of zones numbered 1 to 16 row by row, with a probe cell marked at coordinates (x,y); axes x and y]
![Page 98: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/98.jpg)
04/21/23 98
16 Distances to Zone Centers for Each Probe Cell
[Figure: distances d1(x,y), d4(x,y), ..., d16(x,y) drawn from the probe cell to the zone centers]
![Page 99: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/99.jpg)
04/21/23 99
MAS 5.0 Signal: Background Adjustment (continued)
• Let $w_k(x,y) = 1 / \big(d_k^2(x,y) + 100\big)$.
• Denote the background for the cell located at coordinates (x,y) by

$$b(x,y) = \frac{\sum_{k=1}^{16} w_k(x,y)\, bZ_k}{\sum_{k=1}^{16} w_k(x,y)}.$$

• Denote the “noise” for the cell located at coordinates (x,y) by

$$n(x,y) = \frac{\sum_{k=1}^{16} w_k(x,y)\, nZ_k}{\sum_{k=1}^{16} w_k(x,y)}.$$
![Page 100: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/100.jpg)
04/21/23 100
MAS 5.0 Signal: Background Adjustment (continued)
• Let I(x,y) denote the original intensity of the cell located at coordinates (x,y) on the chip. (75th percentile of 36 pixel intensities in the center of the cell.)
• Let I’(x,y)=max ( I(x,y) , 0.5 ).
• Define the background-adjusted intensity for the cell at coordinates (x,y) by
A(x,y)=max { I’(x,y)-b(x,y) , 0.5n(x,y) }.
• Henceforth these background-adjusted intensities will be referred to as either PM or MM for perfect match or mismatch cells, respectively.
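As a rough illustration of the background adjustment for a single probe cell, the sketch below combines the distance weights, the weighted zone background and noise, and the two max() floors. The zone centers, the 640 × 640 chip size, and the bZ and nZ values are made-up inputs for the example, and the function name is hypothetical; it assumes the weight uses the squared distance plus 100.

```python
import numpy as np

def mas5_background_adjust(intensity, x, y, zone_centers, bZ, nZ, smooth=100.0):
    """Background-adjusted intensity A(x, y) for one probe cell."""
    centers = np.asarray(zone_centers, dtype=float)
    d2 = (centers[:, 0] - x) ** 2 + (centers[:, 1] - y) ** 2   # squared distances d_k^2
    w = 1.0 / (d2 + smooth)                                    # w_k(x, y)
    b = np.sum(w * bZ) / np.sum(w)                             # weighted background b(x, y)
    n = np.sum(w * nZ) / np.sum(w)                             # weighted noise n(x, y)
    i_prime = max(intensity, 0.5)                              # I'(x, y)
    return max(i_prime - b, 0.5 * n)

# Hypothetical 640 x 640 chip split into a 4 x 4 grid of zones
centers = [(80 + 160 * col, 80 + 160 * row) for row in range(4) for col in range(4)]
bZ = np.full(16, 30.0)   # made-up zone backgrounds
nZ = np.full(16, 4.0)    # made-up zone noise values

print(mas5_background_adjust(100.0, 10, 20, centers, bZ, nZ))  # well above background
print(mas5_background_adjust(10.0, 10, 20, centers, bZ, nZ))   # floored at 0.5 * noise
```

The floor at 0.5 n(x,y) is what keeps the adjusted intensity positive when a cell is dimmer than its estimated background.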
![Page 101: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/101.jpg)
04/21/23 101
MAS 5.0 Signal: Ideal Mismatch Computation
• MM values are supposed to provide measures of cross-hybridization and stray signal intensity that inflate the value of PM.
• In the simplest case, a PM value would be corrected simply by subtracting its corresponding MM value.
• However, some MM values are bigger than their corresponding PM values so that PM-MM would become negative.
• Because negative values do not make sense and would pose problems with subsequent steps in analysis, Affymetrix determines an Ideal Mismatch (IM) value for each probe pair that is guaranteed to be less than PM.
![Page 102: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/102.jpg)
04/21/23 102
MAS 5.0 Signal: Ideal Mismatch Computation (continued)
For a given probe set containing n probe pairs, let PMj and MMj denote the perfect match and mismatch values of the jth probe pair. The IM value from the jth probe pair (IMj) is determined as follows:
• If PMj > MMj, then IMj = MMj and no further computation is needed.
• If PMj ≤ MMj, compute
M = TBW { log2(PM1/MM1),...,log2(PMn/MMn) }
where TBW denotes a one-step Tukey BiWeight (a special weighted average described later).
![Page 103: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/103.jpg)
04/21/23 103
MAS 5.0 Signal: Ideal Mismatch Computation (continued)
• If M > 0.03, then $IM_j = PM_j / 2^{M}$.
• If M ≤ 0.03, then compute $P = \dfrac{0.03}{1 + (0.03 - M)/10}$ and let $IM_j = PM_j / 2^{P}$.
• Note that at M = 0.03, IMj = PMj / 1.021012 so that PMj will be slightly larger than IMj.
• As M gets larger, IMj decreases. As M decreases to 0, IMj increases to PMj / 1.020949; for negative M, P shrinks further and IMj moves closer to PMj while remaining below it.
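The three cases of the Ideal Mismatch rule fit in a small function. This is a sketch that takes the probe-set value M as an input (it would come from the one-step Tukey biweight of the log2(PM/MM) ratios); the function and parameter names are illustrative, with `tau` = 0.03 and `scale` = 10 taken from the formulas above.

```python
def ideal_mismatch(pm, mm, M, tau=0.03, scale=10.0):
    """Ideal Mismatch for one probe pair, given the probe-set value M."""
    if pm > mm:
        return mm                      # MM is usable as it is
    if M > tau:
        return pm / 2.0 ** M           # probe set as a whole has PM above MM
    P = tau / (1.0 + (tau - M) / scale)
    return pm / 2.0 ** P               # keeps IM just below PM

print(ideal_mismatch(100.0, 80.0, M=0.5))    # MM returned unchanged
print(ideal_mismatch(100.0, 120.0, M=0.5))   # PM / 2^M
print(ideal_mismatch(100.0, 120.0, M=0.0))   # PM / 2^P
```

In every branch the result is strictly below PM, so PM minus IM stays positive.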
![Page 104: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/104.jpg)
04/21/23 104
MAS 5.0 Signal: Signal Log Value Computation
• Let $V_j = \max\left( PM_j - IM_j ,\ 2^{-20} \right)$.
• Define the probe value for the jth probe pair by PVj = log2(Vj).
• The signal log value for a given probe set is defined by
SLV = TBW ( PV1 , PV2 , ... , PVn )
where TBW denotes a one-step Tukey BiWeight
(a special weighted average to be discussed later).
![Page 105: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/105.jpg)
04/21/23 105
MAS 5.0 Signal: Scaling and Signal Calculation
• Let SLVi denote the signal log value for the ith probe set on a single chip.
• Let I denote the number of probe sets on the chip.
• Let $SF = 500 / \mathrm{TrimMean}\left(2^{SLV_1}, 2^{SLV_2}, \ldots, 2^{SLV_I};\ 0.02, 0.98\right)$, where TrimMean is the average of the values in parentheses that are strictly between the 0.02 and 0.98 quantiles of the values in parentheses.
• MAS 5.0 Signal for the ith probe set is $\mathrm{Signal}_i = SF \cdot 2^{SLV_i}$.
• All computations are done separately for each chip to obtain a Signal value for each chip and probe set.
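The scaling step can be sketched as follows. The function name is illustrative, and `np.quantile` with its default interpolation is an assumption of this sketch: the quantile convention used by the Affymetrix software may differ slightly.

```python
import numpy as np

def mas5_signal(slv, target=500.0, lo=0.02, hi=0.98):
    """Scale 2^SLV so that its trimmed mean equals the target value."""
    values = 2.0 ** np.asarray(slv, dtype=float)
    q_lo, q_hi = np.quantile(values, [lo, hi])
    kept = values[(values > q_lo) & (values < q_hi)]   # strictly between the two quantiles
    scale_factor = target / kept.mean()
    return scale_factor * values

# Made-up signal log values for a chip with 100 probe sets
slv_demo = np.log2(np.arange(1, 101))
signal_demo = mas5_signal(slv_demo)
print(signal_demo[:5])
```

Because each chip is scaled to the same trimmed mean, Signal values from different chips end up on a roughly comparable scale.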
![Page 106: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/106.jpg)
04/21/23 106
The One-Step Tukey BiWeight Estimator Used by Affymetrix
• Let x1, x2, ..., xn denote observations.
• Let m = median ( x1, x2, ..., xn ).
• Let MAD = median ( |x1 – m|, |x2 – m|, ..., |xn – m| ).
• For each i = 1, 2, ..., n; let $t_i = \dfrac{x_i - m}{5 \cdot MAD + 0.0001}$, where 0.0001 is the factor Affymetrix uses to avoid division by 0.
![Page 107: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/107.jpg)
04/21/23 107
The One-Step Tukey BiWeight Estimator Used by Affymetrix (ctd.)
Recall the bisquare weight function defined as

$$B(t) = \begin{cases} (1 - t^2)^2 & \text{for } |t| < 1 \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: Bisquare Weight Function, B(t) plotted against t]

$$TBW(x_1, x_2, \ldots, x_n) = \frac{\sum_{i=1}^{n} B(t_i)\, x_i}{\sum_{i=1}^{n} B(t_i)}$$
![Page 108: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/108.jpg)
04/21/23 108
An Example
Compute TBW ( 1, 7, 13, 15, 28, 1075 ).
m = ( 13 + 15 ) / 2 = 14.
MAD = median ( |1-14|,|7-14|,|13-14|,|15-14|,|28-14|,|1075-14| )
= median ( 13, 7, 1, 1, 14, 1061 )
= median ( 1, 1, 7, 13, 14, 1061 )
= ( 7 + 13 ) / 2 = 10.
t1 = -13 / 50 t2 = -7 / 50 t3 = -1 / 50
t4 = 1 / 50 t5 = 14 / 50 t6 = 1061 / 50
(Ignore the 0.0001 factor to make calculations easier.)
![Page 109: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/109.jpg)
04/21/23 109
An Example (continued)
t1 = -13 / 50 t2 = -7 / 50 t3 = -1 / 50
t4 = 1 / 50 t5 = 14 / 50 t6 = 1061 / 50
B(t1) = B(-0.26) = (1 - 0.26²)² = 0.8693698
B(t2) = B(-0.14) = (1 - 0.14²)² = 0.9611842
B(t3) = B(-0.02) = (1 - 0.02²)² = 0.9992002
B(t4) = B(0.02) = (1 - 0.02²)² = 0.9992002
B(t5) = B(0.28) = (1 - 0.28²)² = 0.8493466
B(t6) = 0

$$TBW = \frac{0.8693698 \cdot 1 + 0.9611842 \cdot 7 + 0.9992002 \cdot 13 + 0.9992002 \cdot 15 + 0.8493466 \cdot 28 + 0 \cdot 1075}{0.8693698 + 0.9611842 + 0.9992002 + 0.9992002 + 0.8493466 + 0} = 12.68772$$
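The worked example can be checked numerically with a short sketch of the one-step estimator. The function names are illustrative; `c = 5` and `eps = 0.0001` are the constants from the definition of ti, and setting `eps=0` reproduces the hand calculation exactly.

```python
import numpy as np

def bisquare(t):
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < 1, (1.0 - t**2) ** 2, 0.0)

def tukey_biweight(x, c=5.0, eps=0.0001):
    """One-step Tukey biweight: a weighted mean with bisquare weights
    centered at the median and scaled by c * MAD + eps."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    mad = np.median(np.abs(x - m))
    t = (x - m) / (c * mad + eps)
    weights = bisquare(t)
    return np.sum(weights * x) / np.sum(weights)

tbw_demo = tukey_biweight([1, 7, 13, 15, 28, 1075])
print(round(float(tbw_demo), 4))
```

The outlier 1075 receives weight 0, so the estimate stays near the bulk of the data, whereas the ordinary mean of these six values is about 190.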
![Page 110: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/110.jpg)
04/21/23 110
Obtaining MAS5.0 Signal Values from Affymetrix .CEL Files
• MAS5.0 Signal values can be obtained from Affymetrix software.
• Approximate MAS5.0 Signal values can be computed with the mas5 function that is part of the Bioconductor package affy.
![Page 111: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/111.jpg)
04/21/23 111
Robust Multi-array Average (RMA)
1. Background adjust PM values from .CEL files.
2. Take the base-2 log of each background-adjusted PM intensity.
3. Quantile normalize values from step 2 across all GeneChips.
4. Perform median polish separately for each probe set with rows indexed by GeneChip and columns indexed by probe.
5. For each row, find the average of the fitted values from step 4 to use as probe-set-specific expression measures for each GeneChip.
![Page 112: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/112.jpg)
04/21/23 112
RMA: Background Adjustment
Assume PM = S + B where signal S ~ Exp(λ) independent of background B ~ N+(μ,σ²).
N+(μ,σ²) denotes N(μ,σ²) truncated on the left at 0.
![Page 113: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/113.jpg)
04/21/23 113
The Probability Density Function of the Exponential Distribution with Mean 1/λ = 10000
[Figure: density $\lambda e^{-\lambda s}$ plotted against s]
![Page 114: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/114.jpg)
04/21/23 114
The Probability Density Function of the Normal Distribution with Mean μ = 1000 and Variance σ² = 300²
[Figure: density $e^{-(b-\mu)^2/(2\sigma^2)} / (2\pi\sigma^2)^{0.5}$ plotted against b]
![Page 115: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/115.jpg)
04/21/23 115
The Probability Density Function of s + b where s ~ Exp(λ = 1/10000) and b ~ N+(μ = 1000, σ² = 300²)
[Figure: density of s+b plotted against s+b]
![Page 116: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/116.jpg)
04/21/23 116
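RMA: Background Adjustment (continued)

Under the model PM = S + B, the expected signal given the observed PM value is

$$E(S \mid PM) = a + b\,\frac{\phi(a/b) - \phi\big((PM - a)/b\big)}{\Phi(a/b) + \Phi\big((PM - a)/b\big) - 1}, \qquad a = PM - \mu - \sigma^2\lambda, \quad b = \sigma,$$

where φ is the N(0,1) density function and Φ is the N(0,1) distribution function.

Separately for each chip, estimate μ, σ, and λ from the observed PM distribution. Plug those estimates into the formula above to obtain an estimate of E(S|PM) for each PM value. These serve as background-adjusted PM values.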
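A sketch of the adjustment itself, assuming μ, σ, and λ have already been estimated: under the model, S given PM is a normal variable with mean a = PM − μ − σ²λ and standard deviation σ, truncated to the interval (0, PM), and E(S|PM) is the mean of that truncated distribution. The function names are illustrative, and the parameter values in the example are the ones used in the density plots (μ = 1000, σ = 300, 1/λ = 10000), not estimates from real data.

```python
import math

def norm_pdf(z):
    """N(0,1) density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """N(0,1) distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rma_background_adjust(pm, mu, sigma, lam):
    """E(S | PM) when S ~ Exp(lam) and B ~ N(mu, sigma^2) truncated at 0."""
    a = pm - mu - sigma ** 2 * lam
    b = sigma
    numerator = norm_pdf(a / b) - norm_pdf((pm - a) / b)
    denominator = norm_cdf(a / b) + norm_cdf((pm - a) / b) - 1.0
    return a + b * numerator / denominator

for pm in (500.0, 2000.0, 10000.0):
    print(pm, rma_background_adjust(pm, mu=1000.0, sigma=300.0, lam=1.0 / 10000.0))
```

For PM values far above the background the result is close to PM − μ − σ²λ, while for PM values near or below μ it stays positive instead of going negative as a simple subtraction would.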
![Page 117: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/117.jpg)
04/21/23 117
RMA: Background Adjustment (continued)
Obtaining Estimates of μ, σ, and λ
(unpublished description of the procedure)
• Estimate the mode of the PM distribution using a kernel density estimate of the PM density.
• Estimate the density of the PM values less than the mode. The mode of this distribution serves as an estimate of μ.
• Assume the data to the left of the estimate of μ are the background observations that fell below their mean. Use those observations to estimate σ.
• Subtract the estimate of μ from all observations larger than the estimate. The mode of this distribution estimates 1/λ.
![Page 118: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/118.jpg)
04/21/23 118
PM Density Estimate Based on Simulated Data
[Figure: density estimate of the PM values; vertical axis: Density]
Data below the estimated mode is used to estimate background parameters μ and σ.
![Page 119: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/119.jpg)
04/21/23 119
Density Estimate of PM Data below the Estimated Mode of the PM Distribution
[Figure: density estimate; vertical axis: Density; the mode is marked as the estimate of μ = 1612]
This data is used to estimate σ as 642.3.
![Page 120: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/120.jpg)
04/21/23 120
Estimate of σ
According to the RMA R code, σ is estimated as follows:
The purpose of the factor of 2 in the numerator is not clear.
![Page 121: 12/5/20151 Microarray Data Pre-Processing. 12/5/20152 Copyright notice Many of the images in this power point presentation of other people. The Copyright.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f575503460f94c7c3ae/html5/thumbnails/121.jpg)
04/21/23 121
Density Estimate of PM – μ̂ Values Greater than Zero
[Figure: density estimate; vertical axis: Density; the mode is marked as the estimate of 1/λ = 2019]
The mean of these values would be a much better estimate of 1/λ in this case. (Mean is 9848 and 1/λ = 10000.)
RMA: Quantile Normalization
1. After background adjustment, find the smallest log2(PM) on each chip.
2. Average the values from step 1.
3. Replace each value in step 1 with the average computed in step 2.
4. Repeat steps 1 through 3 for the second smallest values, third smallest values, ..., largest values.
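The four steps above can be sketched in plain Python as follows. The function name `quantile_normalize` and the list-of-lists layout (one list of log2(PM) values per chip) are illustrative assumptions, and ties are simply broken by sort order, which is a simplification.

```python
def quantile_normalize(chips):
    """Quantile-normalize equal-length lists of values, one list per chip."""
    n_chips = len(chips)
    n_probes = len(chips[0])
    # For each chip, the probe positions ordered from smallest to largest value.
    orders = [
        sorted(range(n_probes), key=lambda k, chip=chip: chip[k])
        for chip in chips
    ]
    normalized = [[0.0] * n_probes for _ in chips]
    for rank in range(n_probes):
        # Steps 1-2: average the rank-th smallest value across all chips.
        mean_at_rank = sum(chips[c][orders[c][rank]] for c in range(n_chips)) / n_chips
        # Step 3: put that average back in each chip at the same position.
        for c in range(n_chips):
            normalized[c][orders[c][rank]] = mean_at_rank
    return normalized


# Two hypothetical chips with three probes each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# prints [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every chip holds the same set of values; only the positions of those values differ from chip to chip.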
RMA: Median Polish
• For a given probe set with J probe pairs, let $y_{ij}$ denote the background-adjusted, base-2-logged, and quantile-normalized value for GeneChip i and probe j.
• Assume $y_{ij} = \mu_i + \alpha_j + e_{ij}$, where $\alpha_1 + \alpha_2 + \cdots + \alpha_J = 0$. Here $\mu_i$ is the gene expression of the probe set on GeneChip i, $\alpha_j$ is the probe affinity effect for the jth probe in the probe set, and $e_{ij}$ is the residual for the jth probe on the ith GeneChip.
• Perform Tukey’s Median Polish on the matrix of $y_{ij}$ values with $y_{ij}$ in the ith row and jth column.
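A minimal sketch of Tukey's median polish, alternating row and column sweeps, might look like this in Python. The function name `median_polish`, the `max_iter` cap, and the exact-zero stopping test are assumptions made for illustration; production implementations typically stop on a tolerance instead.

```python
from statistics import median


def median_polish(matrix, max_iter=20):
    """Return (residuals, fitted) after alternating row and column median sweeps."""
    resid = [[float(v) for v in row] for row in matrix]
    for _ in range(max_iter):
        # Row sweep: subtract each row's median from that row.
        row_meds = [median(row) for row in resid]
        resid = [[v - m for v in row] for row, m in zip(resid, row_meds)]
        # Column sweep: subtract each column's median from that column.
        col_meds = [median(col) for col in zip(*resid)]
        resid = [[v - m for v, m in zip(row, col_meds)] for row in resid]
        # Stop once a full sweep changes nothing (all medians are zero).
        if all(m == 0 for m in row_meds) and all(m == 0 for m in col_meds):
            break
    # Fitted values are the original values minus the residuals.
    fitted = [
        [y - r for y, r in zip(y_row, r_row)]
        for y_row, r_row in zip(matrix, resid)
    ]
    return resid, fitted


# A purely additive hypothetical matrix: the residuals all come out zero.
resid, fitted = median_polish([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(resid)
print(fitted)
```

Rows correspond to GeneChips and columns to probes, matching the layout in the last bullet.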
RMA: Median Polish (continued)
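• Let $\hat{y}_{ij}$ denote the fitted value for $y_{ij}$ that results from the median polish procedure.
• Let $\hat{\alpha}_j = \hat{y}_{\cdot j} - \hat{y}_{\cdot\cdot}$, where $\hat{y}_{\cdot j} = \sum_{i=1}^{I} \hat{y}_{ij} / I$ and $\hat{y}_{\cdot\cdot} = \sum_{i=1}^{I}\sum_{j=1}^{J} \hat{y}_{ij} / (IJ)$, and I denotes the number of GeneChips.
• Let $\hat{\mu}_i = \hat{y}_{i\cdot} = \sum_{j=1}^{J} \hat{y}_{ij} / J$.
• $\hat{\mu}_i$ is the probe-set-specific measure of expression for GeneChip i.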
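As a short consistency check on these definitions (an added derivation, assuming the median polish fit is additive, $\hat{y}_{ij} = m + a_i + b_j$, where $m$, $a_i$, and $b_j$ are illustrative symbols not used in the slides, with $\bar{a} = \sum_{i=1}^{I} a_i / I$ and $\bar{b} = \sum_{j=1}^{J} b_j / J$), the two estimates recombine to give the fitted values:

$$
\begin{aligned}
\hat{\mu}_i &= \frac{1}{J}\sum_{j=1}^{J}\hat{y}_{ij} = m + a_i + \bar{b}, \\
\hat{\alpha}_j &= \frac{1}{I}\sum_{i=1}^{I}\hat{y}_{ij} - \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\hat{y}_{ij}
  = (m + \bar{a} + b_j) - (m + \bar{a} + \bar{b}) = b_j - \bar{b}, \\
\hat{\mu}_i + \hat{\alpha}_j &= m + a_i + b_j = \hat{y}_{ij}, \qquad \sum_{j=1}^{J}\hat{\alpha}_j = 0 .
\end{aligned}
$$

So the estimated probe effects satisfy the sum-to-zero constraint of the model, and the row mean of the fitted values carries the whole chip-level expression signal.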
An Example
Suppose the following are background-adjusted, log2-transformed, quantile-normalized PM intensities for a single probe set. Determine the final RMA expression measures for this probe set.

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 4 | 3 | 6 | 4 | 7 |
| 2 | 8 | 1 | 10 | 5 | 11 |
| 3 | 6 | 2 | 7 | 8 | 8 |
| 4 | 9 | 4 | 12 | 9 | 12 |
| 5 | 7 | 5 | 9 | 6 | 10 |
An Example (continued)
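Original matrix with row medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 | Row median |
|---|---|---|---|---|---|---|
| 1 | 4 | 3 | 6 | 4 | 7 | 4 |
| 2 | 8 | 1 | 10 | 5 | 11 | 8 |
| 3 | 6 | 2 | 7 | 8 | 8 | 7 |
| 4 | 9 | 4 | 12 | 9 | 12 | 9 |
| 5 | 7 | 5 | 9 | 6 | 10 | 7 |

Matrix after removing row medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | -1 | 2 | 0 | 3 |
| 2 | 0 | -7 | 2 | -3 | 3 |
| 3 | -1 | -5 | 0 | 1 | 1 |
| 4 | 0 | -5 | 3 | 0 | 3 |
| 5 | 0 | -2 | 2 | -1 | 3 |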
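An Example (continued)
Matrix after removing row medians, with column medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | -1 | 2 | 0 | 3 |
| 2 | 0 | -7 | 2 | -3 | 3 |
| 3 | -1 | -5 | 0 | 1 | 1 |
| 4 | 0 | -5 | 3 | 0 | 3 |
| 5 | 0 | -2 | 2 | -1 | 3 |
| Column median | 0 | -5 | 2 | 0 | 3 |

Matrix after subtracting column medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | 4 | 0 | 0 | 0 |
| 2 | 0 | -2 | 0 | -3 | 0 |
| 3 | -1 | 0 | -2 | 1 | -2 |
| 4 | 0 | 0 | 1 | 0 | 0 |
| 5 | 0 | 3 | 0 | -1 | 0 |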
An Example (continued)
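Matrix after subtracting column medians, with row medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 | Row median |
|---|---|---|---|---|---|---|
| 1 | 0 | 4 | 0 | 0 | 0 | 0 |
| 2 | 0 | -2 | 0 | -3 | 0 | 0 |
| 3 | -1 | 0 | -2 | 1 | -2 | -1 |
| 4 | 0 | 0 | 1 | 0 | 0 | 0 |
| 5 | 0 | 3 | 0 | -1 | 0 | 0 |

Matrix after removing row medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | 4 | 0 | 0 | 0 |
| 2 | 0 | -2 | 0 | -3 | 0 |
| 3 | 0 | 1 | -1 | 2 | -1 |
| 4 | 0 | 0 | 1 | 0 | 0 |
| 5 | 0 | 3 | 0 | -1 | 0 |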
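An Example (continued)
Matrix after removing row medians, with column medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | 4 | 0 | 0 | 0 |
| 2 | 0 | -2 | 0 | -3 | 0 |
| 3 | 0 | 1 | -1 | 2 | -1 |
| 4 | 0 | 0 | 1 | 0 | 0 |
| 5 | 0 | 3 | 0 | -1 | 0 |
| Column median | 0 | 1 | 0 | 0 | 0 |

Matrix after subtracting column medians:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | 3 | 0 | 0 | 0 |
| 2 | 0 | -3 | 0 | -3 | 0 |
| 3 | 0 | 0 | -1 | 2 | -1 |
| 4 | 0 | -1 | 1 | 0 | 0 |
| 5 | 0 | 2 | 0 | -1 | 0 |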
An Example (continued)
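| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | 3 | 0 | 0 | 0 |
| 2 | 0 | -3 | 0 | -3 | 0 |
| 3 | 0 | 0 | -1 | 2 | -1 |
| 4 | 0 | -1 | 1 | 0 | 0 |
| 5 | 0 | 2 | 0 | -1 | 0 |

All row medians and column medians are 0. Thus the median polish procedure has converged. The above is the residual matrix that we will subtract from the original matrix to obtain the fitted values.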
An Example (continued)
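Original matrix:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 4 | 3 | 6 | 4 | 7 |
| 2 | 8 | 1 | 10 | 5 | 11 |
| 3 | 6 | 2 | 7 | 8 | 8 |
| 4 | 9 | 4 | 12 | 9 | 12 |
| 5 | 7 | 5 | 9 | 6 | 10 |

Residuals from median polish:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| 1 | 0 | 3 | 0 | 0 | 0 |
| 2 | 0 | -3 | 0 | -3 | 0 |
| 3 | 0 | 0 | -1 | 2 | -1 |
| 4 | 0 | -1 | 1 | 0 | 0 |
| 5 | 0 | 2 | 0 | -1 | 0 |

Matrix of fitted values (original matrix minus residuals), with row means:

| GeneChip | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 | Row mean |
|---|---|---|---|---|---|---|
| 1 | 4 | 0 | 6 | 4 | 7 | 4.2 = $\hat{\mu}_1$ |
| 2 | 8 | 4 | 10 | 8 | 11 | 8.2 = $\hat{\mu}_2$ |
| 3 | 6 | 2 | 8 | 6 | 9 | 6.2 = $\hat{\mu}_3$ |
| 4 | 9 | 5 | 11 | 9 | 12 | 9.2 = $\hat{\mu}_4$ |
| 5 | 7 | 3 | 9 | 7 | 10 | 7.2 = $\hat{\mu}_5$ |

The row means are the RMA expression measures for the 5 GeneChips.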
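The arithmetic of this last step can be checked with a few lines of Python. The matrices are copied from the example; the variable names (`original`, `residuals`, `mu_hat`, `alpha_hat`) are chosen only for this sketch.

```python
original = [
    [4, 3, 6, 4, 7],
    [8, 1, 10, 5, 11],
    [6, 2, 7, 8, 8],
    [9, 4, 12, 9, 12],
    [7, 5, 9, 6, 10],
]
residuals = [
    [0, 3, 0, 0, 0],
    [0, -3, 0, -3, 0],
    [0, 0, -1, 2, -1],
    [0, -1, 1, 0, 0],
    [0, 2, 0, -1, 0],
]

# Fitted values: original matrix minus the median polish residuals.
fitted = [
    [y - r for y, r in zip(y_row, r_row)]
    for y_row, r_row in zip(original, residuals)
]

# Row means of the fitted values: one expression measure per GeneChip.
mu_hat = [sum(row) / len(row) for row in fitted]
print(mu_hat)  # prints [4.2, 8.2, 6.2, 9.2, 7.2]

# Probe effects: column means of the fitted values minus the grand mean.
grand_mean = sum(sum(row) for row in fitted) / (len(fitted) * len(fitted[0]))
alpha_hat = [sum(col) / len(col) - grand_mean for col in zip(*fitted)]

# The fit is additive, so mu_hat[i] + alpha_hat[j] reproduces fitted[i][j]
# up to floating-point rounding.
max_gap = max(
    abs(fitted[i][j] - (mu_hat[i] + alpha_hat[j]))
    for i in range(len(fitted))
    for j in range(len(fitted[0]))
)
```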
Miscellaneous Comments on Normalization
• We have only scratched the surface in terms of normalization methods. There are many variations on the techniques that were described previously as well as other approaches that we won’t discuss at this point in the course.
• Normalization affects the final results, but it is often not clear what normalization strategy is best.
• It would be good to integrate normalization and statistical analysis, but it is difficult to do so. The most common approach is to normalize data and then perform statistical analysis of the normalized data as a separate step in the microarray analysis process.