Research Article A case study on choosing normalization...

Comparative and Functional GenomicsComp Funct Genom 2004; 5: 432–444.Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.416

Research Article

A case study on choosing normalizationmethods and test statistics for two-channelmicroarray data

Yang Xie1, Kyeong S. Jeong2, Wei Pan1, Arkady Khodursky2 and Bradley P. Carlin1*1Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA2Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA

*Correspondence to:Bradley P. Carlin, Division ofBiostatistics, MMC 303, Schoolof Public Health, University ofMinnesota, Minneapolis, MN55455-0392, USA.E-mail: [email protected]

Received: 4 December 2003Revised: 28 May 2004Accepted: 18 June 2004

AbstractDNA microarray analysis is a biological technology which permits the whole genometo be monitored simultaneously on a single slide. Microarray technology not onlyopens an exciting research area for biologists, but also provides significant newchallenges to statisticians. Two very common questions in the analysis of microarraydata are, first, should we normalize arrays to remove potential systematic biases,and if so, what normalization method should we use? Second, how should we thenimplement tests of statistical significance? Straightforward and uniform answersto these questions remain elusive. In this paper, we use a real data example toillustrate a practical approach to addressing these questions. Our data is taken from aDNA–protein binding microarray experiment aimed at furthering our understandingof transcription regulation mechanisms, one of the most important issues in biology.For the purpose of preprocessing data, we suggest looking at descriptive plots firstto decide whether we need preliminary normalization and, if so, how this shouldbe accomplished. For subsequent comparative inference, we recommend use ofan empirical Bayes method (the B statistic), since it performs much better thantraditional methods, such as the sample mean (M statistic) and Student’s t statistic,and it is also relatively easy to compute and explain compared to the others. The falsediscovery rate (FDR) is used to evaluate the different methods, and our comparativeresults lend support to our above suggestions. Copyright 2004 John Wiley & Sons,Ltd.

Keywords: empirical Bayes methods; normalization; false discovery rate; signifi-cance testing; spatial effects; background correction

Introduction

Biological backgroundGene expression is a process of ‘the full use ofthe information in a gene via transcription andtranslation, leading to production of a protein andhence the appearance of the phenotype determinedby that gene’ (Lackie and Dow, 1999). The geneexpression process determines the intracellular con-centration of proteins, which play an important rolein many biological systems. On the other hand,the gene expression procedure is controlled by cer-tain proteins (regulators) in an organized way. As a

result, the knowledge of gene expression and tran-scription regulation are two key questions in biol-ogy. Answers to these questions will facilitate basicbiology and medical research, leading to applica-tions in clinical diagnosis, disease classification andfinding new treatments for diseases.

DNA microarray experimentation is a biologicaltechnology which permits the whole genome to bemonitored on a single slide, so that a better pictureof the interaction among thousands of genes canbe observed simultaneously (Brazma et al., 2000).As a result of microarray applications, the research

Copyright 2004 John Wiley & Sons, Ltd.

Choosing normalization methods and test statistics 433

focus of biologists has shifted from individualgenes to multiple genes, and their interaction andcooperation in a complicated way to maintainlife. One important microarray application is tocompare the expression levels of genes in samplesdrawn from different tissues or conditions througha transcriptional profiling experiment (Tani et al.,2002; Spellman et al., 1998). Another ingeniousapplication of microarray technology is to find theDNA-binding sites for proteins across the entiregenome through a chromatin immunoprecipitation(CHIP) experiment (Ren et al., 2000; Iyer et al.,2001), where the interest is to detect the targetDNA sequences that are bound by specific proteins.

Statistical issues

Microarray technology not only opens an excitingresearch area for biologists, it also provides sig-nificant challenges to statisticians. A common anddifficult question before starting any data analy-sis is whether we need to preprocess the data, andif so how. Generally, there are some systematicbiases and variations in microarray experiments,such as label or dye effects and slide or spa-tial effects, which may affect the measurementsof gene expression levels, and thus the conclu-sion of an experiment (Yang et al., 2002; Quack-enbush, 2002). In order to remove these biasesand make the multiple arrays comparable, variousnormalization methods have been proposed. For achromatin immunoprecipitation experiment usinggenomic DNA as control samples, intensities of thecontrol genomic DNA samples should be around aconstant, while for a transcriptional profiling exper-iment, the intensities in the reference channel willhave gene-specific means. This different propertymay require different normalization methods. Thispaper briefly describes several normalization meth-ods and aims to provide a practical approach todecision making in this regard. We will illustrateour approach for both chromatin immunoprecipita-tion and transcriptional profiling experiments.

Another challenging question is how to properlycarry out statistical data analysis. There are sev-eral overriding statistical issues here. One has to dowith the ‘large p, small n paradigm’ (West, 2000).Thousands of genes will be tested at the sametime, but generally we only have a few replica-tions of each. Empirical Bayes (EB) methods pro-vide a natural approach to addressing this problem

because they can effectively borrow informationacross genes (Efron and Morris, 1973; Carlin andLouis, 2000), and so EB methods for microar-ray have recently been implemented by a varietyof researchers (Newton et al., 2001, 2003, 2004;Lonnstedt and Speed, 2002; Kendziorski et al.,2003; Lin et al., 2003). A relevant statistical issueis how to determine the statistical significance fortesting a null hypothesis. In many situations, it isnot straightforward to obtain the null distributionof a chosen test statistic, so it is not easy to deter-mine the cut-off for rejection region or p valuefor the null hypothesis. However, because microar-ray experiments are largely exploratory in nature,investigators generally care little about precise pvalues, and are willing to accept some false pos-itives among the identified genes. Thus, the bestmethod is one that can lower both type I and typeII errors over a range of cut-offs (Lonnstedt andSpeed, 2002). This paper considers four traditionalor new test statistics used for microarray data anal-ysis, especially focusing on one EB method, andcompares the methods based on plots and their falsediscovery rates.

This paper is organized as follows. ‘Data andMethods’ contains a brief description of the datasetused in the paper, as well as the various normaliza-tion methods, test statistics, evaluation methods andtheir implementation. The ‘Results’ section thenillustrates in the context of our microarray experi-ment; in particular, firstly, we show the effects ofnormalization methods on the location and scale ofthe intensity measurements. In the following sub-section plots are used to illustrate the utility of anempirical Bayes method and how it differs fromother methods. After that we compare the resultsof using different normalization methods and statis-tics by looking at their overlap rates of identifiedgenes and their false discovery rates. In the lastsubsection, we apply our proposed approach to atranscriptional profiling experiment. Finally, thereis a discussion of our findings and open questionsfor future investigation.

Data and methods

Data

Data from a study of transcription regulation mech-anisms is used in this project. The overall goal of

Copyright 2004 John Wiley & Sons, Ltd. Comp Funct Genom 2004; 5: 432–444.

434 Y. Xie et al.

the study is to find the DNA binding sites in vivo ofa broad transcription regulator, leucine responsiveregulatory protein (Lrp) (Tani et al., 2002), throughanalysing the genome-wide distribution of Lrp.Detailed descriptions of the method of genome-wide location and function of DNA binding pro-teins can be found elsewhere (Ren et al., 2000; Iyeret al., 2001).

Briefly, to identify the target binding sites of Lrp,the combination of chromatin immunoprecipitationand microarray hybridization was used. DNA sam-ples from wild-type Escherichia coli were labelledwith red (Cy5) fluorophore following cross-linking,immunoprecipitation and amplification; we callthese samples ‘test samples’. Genomic DNA sam-ples were also prepared and labelled with green(Cy3) fluorophore; we call these samples as ‘con-trol samples’. We identify the genomic target lociby comparative hybridization of test and controlsamples to a DNA microarray. The ratio of Cy5 toCy3 fluorescence intensities measured at each spotin the microarray provided a measure of the extentof binding of Lrp to the corresponding genomiclocus. If there is no binding of Lrp, this ratio shouldbe a constant across all of the genes. So the specificpurpose of this microarray analysis is to identify thespots or genes with intensity ratios between the testsamples and the control samples that are differentfrom this constant. Each DNA microarray includes4221 ORFs of E. coli. After dealing with missingdata and background correction, 4011 genes withfive replicated arrays are used to do the analysis.In this paper, we define Mij as the log ratio of thebackground-corrected intensity levels in test andcontrol samples for gene i on array j :

Mij = log2Rij

Gij

= log2

(intensity level for test sample)ij(intensity level for control sample)ij

for i = 1, . . . , N and j = 1, . . . , n , where N =4011 and n = 5 in this experiment. Suppose thatµi and Mi are the population and sample meansof the Mij , respectively. Our goal is to test thenull hypothesis H0:µi = c for each gene i , wherec is a constant across all of the genes. We canestimate this constant by the median of the Mijfor each array, which we call cj . If we transformeach Mij by subtracting cj , our null hypothesis

becomes H0:µi = 0 for each gene i . For thisdata set, we apply this transformation before anyfurther analysis, and do not regard this as anormalization method.

We also define Aij as the log of geometricmean of the two channel intensities; that is, Aij =(log2 Rij + log2 Gij )/2. We denote the sample meanof Aij by Ai and will use generic notation M andA for Mi and Ai .

Normalization methods

In microarray experiments, the purpose of nor-malization is to remove the systematic varia-tion, such as the differences in labelling effi-ciency between two fluorescent dyes used. Thereare various sources of biases, including exper-imental variability in the processing proceduresand the scanner settings at the data collectionstep. Some of these factors lead to biases thatdepend on the spot’s intensity or its location onthe array, often referred to as spatial or ‘print-tip’effects. Locally weighted smoother (lowess) nor-malization and print-tip group lowess normaliza-tion methods were proposed to correct these kindsof biases (Yang et al., 2002). The lowess normal-ization method is a within-slide location normaliza-tion method. It assumes that the (dye) bias dependson the spot intensity, so it adjusts the log-ratiosMij by an intensity-dependent mean curve c(A),the lowess fit in a scatterplot of the log-ratio Mvs. overall spot intensity A. The print-tip grouplowess normalization method assumes that thereare systematic differences between the print-tips,so it adjusts the log-ratios by both the intensityand print-tip effects ck (A), the lowess fit in an Mvs. A plot for the k th grid only. Both of these meth-ods are used after a summary measurement of geneintensity level [typically log(R/G)] is obtained fortwo-channel arrays. Normalization can be appliedfor the purpose of constructing an expression valueusing physical and biological properties, as wellas for standardizing expression value for within-and between-sample variability. The purpose of ournormalization is the latter. Irizarry et al. (2003)address some of the normalization issues in acoherent way.

The preceding discussion notwithstanding, thequestions of whether and how to normalize the datado arise in practice. We address these questionsby looking at some descriptive plots. First, a



scatterplot of M vs. A is checked for each slide tosee whether there is any systematic pattern betweenthe log-ratio intensity and the overall intensity; ifthere is, we may need to do normalization. Thenwithin print-tip lowess curves are fitted for eachprint-tip group and the spatial plots are checked tosee whether there are spatial effects. If the lowesscurves are different for each print-tip group and theintensities are disproportionately distributed amongthe print-tip groups, then we may need to considerthe print-tip group lowess normalization.

Test statistics

In this project, we use four different statisticalmethods to analyse the data. The first one is theM -statistic, Mi , which corresponds to the earlypractice of simply using twofold change as asignificance indicator. M does not take accountof possibly different variations of Mij for differentgenes, and effectively it treats a highly variablegene in the same way as a stable one. A secondpossibility is the Student’s t statistic, ti = Mi/SEi ,where SEi is the estimated standard error of Mi .The ti statistic can be regarded as a standardizedversion of Mi .

Because we have thousands of genes but only asmall number of replicates for each gene, it is quitepossible that for some genes, just by chance, theirSE estimates (based on sample variances) can bevery small, leading to a huge t statistic. In orderto address this problem, another statistic, S , wasproposed by Tusher et al. (2001). This statistic isa modified t statistic that adds a constant a0 intothe denominator, i.e. Si = Mi/(SEi + a0). As sug-gested by Tusher et al. (2001), we use the medianof standard errors of all the genes as a0. The moti-vation of the S -statistic is intuitive, but it does nothave a rigorous justification (although a connec-tion exists between the S -statistic and a Bayesian-regularized t statistic; Baldi and Long, 2001).

The last method we consider is the B statistic(Lonnstedt and Speed, 2002), which is an empiricalBayes estimate of the log posterior odds of µi �= 0.We assume that genes are independent and themeasurement Mij is a random variable from anormal distribution with mean µi and variance σ 2

i :

Mij |µi , σi ∼ N (µi , σ2i )

Most genes have the same mean intensity levelbetween the two samples, corresponding to µi = 0.

Only a small proportion (say, p) of genes havedifferent mean intensities, leading to µi �= 0. Anindicator function �i is defined as 0 if µi = 0 andas 1 if µi �= 0. By definition and Bayes rule, we cancalculate the log posterior odds for gene i having�i = 1:

Bi = logPr(�i = 1|{Mij })Pr(�i = 0|{Mij })

= log

pn∏

i=1

Pr(Mij |�i = 1)

(1 − p)

n∏i=1

Pr(Mij |�i = 0)

We use conjugate prior distributions for meanµi and variance σ 2

i . For n arrays, a degrees offreedom parameter v , and scale parameters a > 0and c > 0, we set τi = na/2σ 2

i and assume thatτi ∼ Gamma(v , 1), and:

µi |τi

{ ∼ N (0, cna/2τi ) if �i = 1= 0 if �i = 0

(1)

Because this is a conjugate prior, we can easily cal-culate the joint distribution of {Mij , j = 1, . . . , n},µi , and τi , and then integrate to get the marginaldistributions of Pr(Mij |�i = 1) and Pr(Mij |�i =0). The final expression for B is then:

Bi = logp

1 − p

1√(1 + nc)

×

a + s2

i + M 2i

a + s2i + M 2

i

1 + nc

v+ n2

where s2i is the sum of squared errors over n arrays

for gene i . From this formula, we can find that theonly gene-specific part lies in the last ratio, whichis always greater or equal to 1, since 1 + nc ≥ 1.Thus, we can deduce a monotonically increasingrelationship between Bi and M 2

i or relative geneintensity levels, and the relationship is stronger ifthe variance for the gene is smaller. (A similarrelationship exists between ti and Mi .)

B has four hyperparameters: p, v , a and c.Since there are no consistent estimates for themand appropriate hyperpriors are not clear, we use


436 Y. Xie et al.

an EB approach to estimate them. First, we fix pand estimate v , a and c. The methods of momentsis used to get a and v , and the least squaresmethod to get c. There are no satisfactory estimatesfor p, but in most of cases that will not affectthe shape of the B vs. M plot (Lonnstedt andSpeed, 2002).

Evaluation methods

Three plots with numbers indicating whether thegenes are identified as being bound to Lrp bydifferent statistics are used to compare the falsepositive and false negative rates among the M , tand B statistics: average M vs. variance of M , tvs. average intensity A, and B vs. M . The over-lap rates of top genes identified by different nor-malization methods and statistics are calculatedto indicate the agreement between two differentmethods. The higher the overlap rate between twomethods, the better agreement between them. Wealso use the false discovery rate (Efron et al.,2001; Benjamini and Hochberg, 1995) to com-pare the three normalization methods and thefour statistics. FDR is an alternative to control-ling the false positive rate (type I error), and isdefined as the expected proportion of false posi-tive genes (FP) among total positive genes (TP);the observed FP : TP ratio is often used to esti-mate FDR. When we use FDR to compare var-ious statistical methods, we prefer the methodthat gives the lowest FDR while giving the samenumber of top (i.e. positive) genes as that of allother methods.

Implementation

We implemented the methods in the R softwarepackage (www.r-project.org). In particular, weused the SMA (Statistics for Microarray Analysis)package developed by Dudoit et al. (2002) (stat-www.berkeley.edu/users/sandrine/software.html). We used SMA to do lowess normalizationand print-tip group normalization, creating the Mvs. A plots, boxplots and spatial plots, and calcu-lating B statistics.

To calculate FDR, we used a permutation methodto estimate the false positive number FP (Tusheret al., 2001). Under the null hypothesis, H0:µi = 0,we can generate a permuted dataset as follows:multiplying each Mij by either 1 or −1 randomly.

For example, suppose that the original parametersMij for gene i are: 0.2, 0.4, −0.3, −0.5, 0.1. Wegenerated five random numbers, say, −1, 1, −1, 1,−1. Then our permuted data will be: −0.2, 0.4, 0.3,−0.5, −0.1. We do this permutation 50 times foreach genes. The false positive number from eachpermutation is the number of genes that counted assignificant genes from the permuted data. The aver-age of the false positive numbers over 50 permuta-tions is calculated as FP. The number of genes thatcounted as significant genes from the original datais regarded as TP, and we estimate FDR = FP/TP(Pan, 2003). Note that a more elaborate estimatorof FDR, namely FDR = π0FP/TP , with π0 as theprior probability of null hypothesis being true, hasbeen proposed (Storey and Tibshirani, 2003). Sinceπ0 is a constant, independent of whatever normal-ization or test statistic is used, using this estimatorwill not influence our final results.

Results

Effects of normalization

M vs. A plots, spatial plots, and boxplots of themeasurements in the first slide are displayed tocompare the within-slide normalization methods;the corresponding plots for the other four slides aresimilar. Figure 1 shows that, before normalization,the intensity ratio increases as the average intensityincreases, possibly indicating a systematic pattern.After the lowess normalization, this pattern disap-pears. Figure 2 displays 16 within-print-tip lowesslines, one for each print-tip group. This plot mayindicate the existence of spatial effects, since sixlowess curves seem to stand out from each other.

Figure 3 shows a spatial plot. There are dispro-portionately large numbers of extreme log-ratios inthe upper four grids if we do not use any normal-ization method, possibly indicating spatial effectsfor the experiment and the need for within-print-tip group lowess location normalization. Finally,the within-print-tip boxplots in Figure 4 also indi-cate that both the mean and variances are differentamong the 16 print-tip groups. After within-print-tip group lowess normalization, the mean of log-ratio of each print-tip group is adjusted to zero. Sowe may conclude from these plots that there seemto be spatial effects and we need to use the print-tipgroup normalization.


www.r-project.org

stat-www.berkeley.edu/users/sandrine/software.html

stat-www.berkeley.edu/users/sandrine/software.html


6 8 10 12 14

−10

−50

510

A

M

Non–normalized

6 8 10 12 14

−10

−50

510

A

Nor

mal

ized

M

Lowess

Figure 1. M vs. A plots under (a) no normalization and (b) lowess normalization for slide 1 with lowess curvessuperimposed

6 8 10 12 14

−10

−50

510

A

M

Figure 2. M vs. A plot with 16 individual lowess smoothing lines superimposed, one for each print-tip group in slide 1


438 Y. Xie et al.

Non–normalization

Figure 3. Spatial plots of slide 1, highlighting the spots with 5% extreme log-ratio intensities prior to normalization. Eachrectangular represents the log-ratio of a spot on the array. The dark rectangular represents positive log-ratios and the lightrectangular represents negative log-ratios

1 2 3 4 5 6 7 8 9 11 13 15

05

1015

boxplot for within slide lowess normalization

1 2 3 4 5 6 7 8 9 11 13 15

−50

510

15

boxplot for within slide print–tip normalization

Figure 4. Boxplots displaying (a) the log-ratio distribution after lowess normalization, and (b) within-print-tip group lowessnormalization for each of the 16 print-tip groups. The array was printed using a 4 × 4 set of print-tips



Effects of using various statistics

In this subsection, we illustrate the utility of Band how it differs from M and t , based on plotsfollowing the idea of Lonnstedt and Speed (2002).Corresponding illustration of the S statistic canbe found elsewhere (Tusher et al., 2001; Efronet al., 2000).

All statistics were calculated after applying theprint-tip group normalization method to the data.The various plots relating the statistics M , t andB are shown in Figure 5. Genes that are identifiedas ‘extreme’ by at least one of these three statisticsare plotted not as dots but as numbers in this figure;Table 1 provides the key to identifying to which ofthe 23 − 1 = 7 possible sets these extreme genesbelong. We selected the cut-off points so that eachstatistic will identify its top 150 genes as beingextreme. Specifically, the genes are selected asextreme for M if |Mi | > 1.43, for t if |ti | > 4.62,

Table 1. Number of genes falling in the corresponding sets,1–7. In columns 2–4, a ‘1’ indicates that the genes in thisset are ‘extreme’ for the given statistic

Extreme for

Set M t BNumber ofgenes in set

1 0 0 1 482 0 1 0 913 0 1 1 414 1 0 0 895 1 0 1 436 1 1 0 07 1 1 1 17

and for B if Bi > −2.11. After setting these cut-offpoints, all of the seven possible sets in Table 1 arenon-empty except Set 6.

From the M mean vs. log variance plot inFigure 5, we see that at the left end of the plot, the

−8 −6 −4 −2 0 2 4

−2−1

01

23

45

Average M vs Variance

log Var(M)

M

55

71

2

2

4

12

2

43

2

2

44

2 2

44

12

5

2

2

7

2 3

3

45

2

1

4

3

4

3

2

4

21

4

244

3

4

1455

5

15

1

7

2 22

4

3222 2

44

1

2

3

5

4

22

5

315

4

2

2

23

4

1

7

5

4

1

4

3

2

4

22

4

3

4

2

4

2

2

2

4 44

12

33

22

1

2

2

22

3

7 5

12

2

4

1

21

2 222

13

1

2

4

2

4

2

41

4

3

7

5

5

7 4

2

211

22

4

3

3 15

2

2

7

1

2

5

15

2

444

2

74

4

22

2

5

4

5

2

2

1

22

53 1

2

4

2

4

1

4

34

5

4

4

2

4

44

5 447

2

1

43

2

57

7

5 44

2

53 3

3

23

3 1

47

4

3

3

33

22

1 4

1

3

4

21

7

1

4

4 44

4

4

3

4

3

44

4

5

4

45 4

3

54

1

155 4

4

11

2

441

7

5

5 453

5

23

4

2

144

7

145 4

1545

2

4

2

2

5

3322 32

5

17

1

21

2

15

2

4

1

15

43

5

10 11 12 13 14 15

−20

010

2030

40

t vs average A

A

t

5 5

71

2

2

4

1

2

2

432

2

4 4

22

44

1 2

5 2

2 72 3

34

5

21

4

3

4

3

2

4

2 1

42

44

3

41 455 515

1

7

2 22

4

3 22

2

2

4 4 12 3

54

22

5

31 54

2

2

234

1

7

5

4 14

3

2

4

2 2

4

3

4

2

4

2

2

2

44 4

12 3

3

22

12

2

22

3

75

12

24

1

212

22

2

13

1

2

4

2

42

4 14

3

7

55 74

2

2

112 2

4

3

315 2

2

7

1

2 5

1 52

4442

7

4422

25

45

2

2 1

22

53

1

2

42

41

4

34

54 4

24

445

4472 1

43

2 5 7

7

544

2

53 3

3

23

314

7

4

3

3

33

22

141

3

4

2

1

7144 444

4

3

4

3

4445

44 54

35

41

1 55 4 4112

441 7

5

5 4 53

5

23

42144

7 14 541 545

2

4

2

2 5 33 2

232

5

1 7

1

2

1

2

1 5

2

41

1 54

3

5

−2 −1 0 1 2 3 4 5

−5−4

−3−2

−10

1

B vs average M

M.

B

55

7

1

2

2

4

1

22 4

3

2

2 4

4

22

4 4

1

2

5

2

2

7

2

3 3

4

5

2

1

4

3

4

3

2

42

1

42

4

4

3

4

1

4

55

5

1 51

7

2

2

2

4

32

2

2

2

4

4

1

2

35

42

25

3

154

22

2

3

4

1

7

5

4

1

4

3

2 4

2

2

4

3

424

22

24

44

1

2

33

2

2

1

2

2

2

2

3

7

51

2

2

4

1

2

1222

2

13

1

2

42

42

4

1

4

3

7

55

7

4

2

211

2

2

4

3

3 1

5

2

2

7

1

2

5

15

2

4

44

2

7

4 42

2

2

5

4

5

2

2

1

2

2

5

3

1

2

4

2

4

1

4

3

4

5

4

4

2

4

4

4

5

44

7

2

1

4

3

2

5

7

7

5

4 4

2

5 33

3

2

3

31

4

7

4

33

3

3

2

2

1

4

13

4

2

1

7

1

4

4

4

4

4

4

3

4

3

4

4

4

5

4

45

4

3

5

4

1 1

5

5

4

411

2

44

1

7

55

4

5 3 5

2

3

4

2

1

4

4

7

1

4

5

4

1 5

4

5

2

4

2

2

53

3

22

32

51

7

1

2

1

2

1

5

24

11

5

4

35

Figure 5. Lrp data for different statistics. The three plots are sample mean of M vs. its sample variance, t vs. averageintensity A, and B vs. M. When the plotting character is a number from 1 to 7, this indicates whether the gene is identifiedas extreme by M, t, B, or some combination thereof (see Table 1 for key)


440 Y. Xie et al.

genes have very small variance and their meansare not large. Almost all of these genes fall intoSet 2; i.e. the genes are identified only by t butnot by M and B statistics, which is consistent withour previous description of t : it can be inflated bya small variance. It is likely that these genes arefalse positives from using the t statistic.

It is reassuring to see that these genes are notidentified by the B statistic. When we look at theright end of the M vs. log variance plot, somegenes have a large mean but their variances arealso large. All of these genes fall into Set 4; i.e. thegenes are identified only by M but not by t and B .This phenomenon is consistent with our previousdescription of M : a large M does not take accountof its possibly large variation. We may considerthese genes as false positives from using M .

The t vs. A plot indicates that most of the geneswith extreme t values fall into Set 2 and are notidentified by M and B , while the B vs. M plotshows that the genes with extreme B values fallinto Set 7; that is, these genes are identified by allof M , t and B . Most of the genes falling in Set 1that are identified only by B have only moderatelyhigh B values. Hence it appears that B is morestable and reliable than t . Set 7 includes the geneswith high values for all the three statistics, and thisshows up clearly in the plots. Set 3 can be detectedby t and B , but not by M . Set 1 and Set 5 canbe detected by B , but not by t . Set 6 is the set ofgenes that can be detected by M and t but not byB . We note that the number of genes falling intoSets 1, 3, and 5 are between 40 and 50, but thereare no genes falling into Set 6, which confirms thatthe genes high in both M and t are also high in B .

Evaluation of both normalization methods andtest statisticsTable 2 shows the overlap rates of the top 150genes identified by different statistics to compare

the agreement among these statistics. The resultsare consistent across normalization method: theoverlap rates among M , t and S statistics are below50%, but there is more than an 80% overlap ratebetween S and B . This appears related to the factthat both B and S can be justified from a Bayesianpoint of view.

Table 3 compares estimated FDRs for the genesidentified as extreme by the four statistics beforeand after applying two normalization methods.The FDRs for non-normalized data are in therange 70–80% for the various statistics, while thisrange for the lowess normalization is 30–56%, andjust 17–43% for the print-tip group normalizationmethod. When we compare the FDRs for each teststatistic using the same normalization method, wefind that M always has the highest FDR, and theFDR for t is also rather high. By contrast, the Bstatistic has the lowest FDR, with the S statisticclose behind. Since B with print-tip normalizationoffers the lowest FDR (17%), we conclude that forthis dataset, preprocessing to account for spatialeffects and subsequent use of the B statistic is thebest approach.

Application to transcription profiling data

To assess the performance and versatility of ourapproach, we also applied it to a transcription

Table 3. False discovery rates for the identified genes bystatistic and normalization method

Normalization method

None Lowess Print-tip

M 0.86 0.56 0.43t 0.71 0.43 0.35S 0.72 0.33 0.22B 0.69 0.30 0.17

Table 2. Overlap rates of the genes identified as extreme by the various statistics based on three differentnormalization methods


M t S B M t S B M t S B

M 1.00 0.15 0.43 0.46 1.00 0.07 0.31 0.33 1.00 0.11 0.39 0.40t 1.00 0.41 0.38 1.00 0.35 0.39 1.00 0.42 0.39S 1.00 0.85 1.00 0.82 1.00 0.85B 1.00 1.00 1.00



profiling dataset. Transcriptional profiling was car-ried out to comprehensively define a family ofgenes whose transcription depends on the activ-ity of leucine-responsive regulatory protein, Lrp.Specifically, researchers set out to identify genesdifferentially expressed in Lrp+ and Lrp− strains.Two-colour hybridization of the cDNA microar-ray, as described in Tani et al. (2002), was usedin the experiment. Here we use six arrays of4281 genes each to illustrate our proposed pro-cedure of selecting normalization methods andtest statistics.

Figure 6 displays 16 within-print-tip lowess linesfor the Lrp expression data. From this plot, wecan see that there is no obvious pattern between

Table 4. For Lrp expression data: false discovery rates forthe identified genes by statistic and normalization method

Normalization method


M 0.171 0.162 0.169t 0.039 0.032 0.039S 0.031 0.032 0.031B 0.019 0.010 0.019

intensity ratio and average intensity, and the 16lowess curves seem to stay close to each other,indicating that no spatial effects exist and henceany normalization (especially the within print-tipnormalization) may not help much. Table 4 com-pares estimated FDRs for the top 150 genes identi-fied by the four statistics before and after applyinglowess and within print-tip normalization methods.As expected from Figure 6, the normalization doesnot have much effect on the FDR, and lowessnormalization seems to be slightly better than nonormalization and within-print-tip normalization.When we compare the four statistics, we find thatM always has the highest FDR and the other threestatistics, t , S and B , have much lower FDR, withB having consistently the lowest FDR among thesefour. So, based on the plot and FDR, we concludethat for this dataset, normalization does not havemuch effect on results, and the B statistic seems tooffer the best approach.

Discussion

In this paper, we have compared three normaliza-tion methods and four test statistics to identify tar-get genes bound by a protein called Lrp. Different

8 10 12 14

-4-2

02

4

A

M

Figure 6. For Lrp expression data: M vs. A plot with 16 individual lowess smoothing lines superimposed, one for eachprint-tip group in slide 1


442 Y. Xie et al.

normalization methods were developed to reducethe systematic biases and variations across microar-ray experiments. Before doing any data analysis,we must decide whether we need to normalize thedata, and if so which normalization method to use.Here we have illustrated the preliminary use of Mvs. A plots, spatial plots and boxplots to decidewhether there are any patterns between M and Aand whether there are spatial effects. If so, normal-ization is needed, with the print-tip group normal-ization method being most appropriate if there isevidence of a within-tip spatial effect. This shouldalso result in a reduction of the false discovery ratein subsequent analysis. For our data, both method-ologies (plot inspection and FDR calculation) pointto the use of print-tip group normalization beingpreferable. Hence, we suggest preliminary explo-ration of the various descriptive plots discussedto guide selection of the appropriate normaliza-tion method.

Although this suggestion is practical and easy touse, some concerns about normalization still linger.Because normalization methods assume that themean log ratio intensity for each slide is close tozero, there should be a small proportion of geneswith different intensities. But it can be difficult tocheck whether a dataset satisfies these assumptions,and whether the normalization methods will stillwork if the data violate the assumptions. If not,can more robust methods be found? Tseng et al.(2001) suggested a rank invariant method andReilly et al. (2003) proposed probability models fitusing the Gibbs sampler (see e.g. Carlin and Louis,2000) to select non-differentially expressed genesto do normalization. ANOVA methods proposedby Kerr et al. (2000) are also widely used innormalization and testing for microarray data. Onemight also compare ANOVA methods with theseother methods to see how they perform.

The B statistic uses information from all thegenes to estimate the posterior odds of rejectingthe null hypothesis. Although the normality andindependence assumptions were used to derivethe formula for B , we did not use a formaltest based on these assumptions. Furthermore, theconclusion of ‘extremeness’ for a gene is oftenbased solely on the rank of its test statistic.Because we cannot estimate the prior probabilityof extremeness satisfactorily and the scale of Bdepends on this probability, we cannot use apredetermined cutoff value (such as B = 0) for

gene selection. Fortunately, the ranks of the variousB statistics do not depend on p, so we can selectthe top genes with the most extreme B valuesbased on their ranks. Based on the analysis resultsfor the Lrp data, we conclude that the B statisticperforms much better than the M and t statistics,since B yields much smaller false positive ratesthan the other two. The performance of B , interms of the overlap rate and false discovery rate,is quite similar to that of S , but the former ismore explicitly model-based. Also, because B hasa closed form, it is easy to compute and convenientto use (e.g. it is directly available within the SMApackage). Hence, we conclude that, although theM and t statistics are the most commonly usedstatistics for microarray data analysis, based on ourresults, we instead recommend the B statistic. Itwill be interesting to see whether the B statistic canbe incorporated into other statistical packages andapplications in the detection of differential geneexpression (Pan, 2002; Kendziorski et al., 2003;Newton and Kendziorski, 2003).

In this paper, we used the FDR results from thetop 150 identified genes to compare the differentmethods. In order to check whether the results aresensitive to the number of genes so selected, werepeated our analysis using different cut-off points;specifically, using the top 20, 50 and 200 identi-fied genes. The results were consistent with ourprevious findings: the FDR was highest for non-normalization and lowest for print-tip normaliza-tion, and the B and S statistics have lower FDRsthan M and t . Thus, our results appear robust withrespect to the choice of extremeness cut-off point.

Our data analysis followed the common approachof using background subtracted measurement inten-sity. But Kooperberg et al. (2002) pointed out thatthis method may lead to a much larger variancethan needed when the expression levels are low.Qin and Kerr (2003) also showed that backgroundsubtraction can increase the variability of geneexpression and worsens one’s ability to detect theexpressed genes. To investigate this issue in oursetting, we redid our analyses without backgroundsubtraction. Our results were again consistent withour previous conclusions: print-tip normalizationand the B statistic enjoy the best FDR perfor-mance. We also found that the FDR is consistentlylower when using data without background sub-traction across different normalization methods andtest statistics. Thus, based on our results, avoiding



background subtraction seems preferable, but fur-ther investigation is needed.

Although there exists some distinction betweenchromatin immunoprecipitation experiment withgenomic DNA as controls and the usual cDNAtranscriptional profiling experiment; the copy num-ber of genomic DNA is fixed across the microar-ray, which is not true for a transcriptional profilingexperiment. In our chromatin immunoprecipitationdata, we assume that most of genes are not thebinding targets of Lrp and, after the transformation(subtracting the median of intensity ratio), the nullhypothesis is that the intensity ratio from two chan-nels should be zero. For our transcriptional profil-ing experiment, we also assume that there is only asmall number of genes expressing differently, andthe null hypothesis is again that the intensity ratiofrom two channel should be zero. These commonproperties between two different types of experi-ments make us think that similar analysis methods,including normalization and testing methods, mayapply to both of these experiments. Our resultsfor both the chromatin immunoprecipitation exper-iment and the transcriptional profiling experimentare consistent; inspecting the descriptive plots ishelpful in choosing the normalization method, andthe B statistic is consistently better than the M andt statistics in the sense of lowering the FDR.

Acknowledgements

We thank Dr Rowena Matthews from the Universityof Michigan in Ann Arbor for providing us with Lrpantibodies. The work is supported by NIH Grants HL65462(WP&YX), AI41966 (BPC&YX), GM066098 (AK&KSJ)and NSF/EPA Grant SES99-78238 (BPC&YX).

References

Lackie JM, Dow JAT. 1999. The Dictionary of Cell and MolecularBiology, 3rd edn. Academic Press: London.

Brazma A, Robinson A, Cameron G, Ashburner M. 2000. One-stop shop for microarray data. Nature 403: 699–700.

Tani T, Khodursky A, Blumenthal R, Brown P, Matthews R.2002. Adaptation to famine: a family of stationary-phase genesrevealed by microarray analysis. Proc Natl Acad Sci USA 99:13 471–13 476.

Spellman P, Sherlock G, et al. 1998. Comprehensive identificationof cell cycle-regulated genes of the yeast Saccharomycescerevisiae by microarray hybridization. Mol Cell Biol 9:3273–3297.

Ren B, Robert F, et al. 2000. Genome-wide location and functionof DNA binding proteins. Science 290: 2306–2309.

Iyer V, Horak C, Scafe C, et al. 2001. Genomic binding sites ofthe yeast cell-cycle transcription factors SBF and MBF. Nature409: 533–538.

Yang YH, Dudoit S, Luu P, Speed T. 2002. Normalization forcDNA microarray data. Nucleic Acids Res 30: e15.

Quackenbush J. 2002. Microarray data normalization andtransformation. Nature Genet Suppl 32: 496–501.

West M. 2000. Bayesian regression analysis in the large p, smalln paradigm. Working Paper 00-22. Institute of Statistics andDecision Sciences: Duke University, USA.

Efron B, Morris C. 1973. Combining possibly related estimationproblems. J R Statist Soc B 35: 379–421.

Carlin BP, Louis TA. 2000. Bayes and Empirical Bayes Methodsfor Data Analysis, 2nd edn. Chapman and Hall/CRC Press: BocaRaton, FL.

Newton M, Kendziorski C, Richmond C, Blattner F, Tsui K.2001. On differential variability of expression ratios: improvingstatistical inference about gene expression changes frommicroarray data. J Comput Biol 8: 37–52.

Lonnstedt I, Speed T. 2002. Replicated microarray data. StatistSinica 12: 31–46.

Kendziorski C, Newton M, Lan H, Gould M. 2003. On parametricempirical Bayes methods for comparing multiple groups usingreplicated gene expression profiles. Statist Med 22: 3899–3914.

Newton M, Kendziorski C. 2003. Parametric empirical Bayesmethods for microarrays. In The Analysis of Gene Expres-sion Data: Methods and Software, Parmigiani G, Garrett E,Irizarry R, Zeger S (eds). Springer-Verlag: New York.

Lin Y, Nadler S, Lan H, Attie A, Yandell B. 2003. Adaptivegene picking with microarray data: detecting important lowabundance signals. The Analysis of Gene Expression Data:Methods and Software, Parmigiani G, Garrett E, Irizarry R,Zeger S (eds). Springer-Verlag: New York.

Newton M, Noueiry A, Sarkar D, Ahlquist P. 2004. Detectingdifferential gene expression with a semiparametric hierarchicalmixture method. Biostatistics 5: 155–176.

Irizarry R, Hobbs B, Collin F, et al. 2003. Exploration, normaliza-tion, and summaries of high density oligonucleotide array probelevel data. Biostatistics 4: 249–264.

Tusher V, Tibshirani R, Chu G. 2001. Significance analysis ofmicroarrays applied to the ionizing radiation response. Proc NatlAcad Sci USA 98: 5116–5121.

Baldi P, Long AD. 2001. A Bayesian framework for the analysisof microarray expression data: regularized t-test and statisticalinferences of gene changes. Bioinformatics 17: 509–519.

Efron B, Tibshirani R, Storey JD, Tusher V. 2001. EmpiricalBayes analysis of a microarray experiment. J Am Statist Assoc96: 1151–1160.

Benjamini Y, Hochberg Y. 1995. Controlling the false discoveryrate: a practical and powerful approach to multiple testing. J RStatist Soc B 57: 289–300.

Dudoit S, Yang YH, Speed T. 2002. The SMA package. Avail-able online at: www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html.

Pan W. 2003. On the use of permutation in and the performanceof a class of nonparametric methods to detect differential geneexpression. Bioinformatics 19: 1333–1340.

Storey J, Tibshirani R. 2003. Statistical significance for genome-wide studies. Proc Natl Acad Sci USA 100: 9440–9445.


www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html

www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html

444 Y. Xie et al.

Efron B, Tibshirani R, Goss V, Chu G. 2000. Microarrays andtheir use in a comparative experiment. Technical report,Department of Statistics, Stanford University, USA.

Tseng G, Oh M, Rohlin L, Liao J, Wong W. 2001. Issues in cDNAmicroarray analysis: quality filtering, channel normalization,models of variations and assessment of gene effects. NucleicAcids Res 29: 3549–2557.

Reilly C, Wang C, Rutherford M. 2003. A method for normalizingmicroarrays using the genes that are not differentially expressed.J Am Statist Assoc 98: 868–878.

Kerr MK, Martin M, Churchill GA. 2000. Analysis of variance forgene expression microarrays. J Comput Biol 7: 819–837.

Pan W. 2002. A comparative review of statistical methodsfor discovering differentially expressed genes in replicatedmicroarray experiments. Bioinformatics 12: 546–554.

Kooperberg C, Fazzio T, Delrow J, Tsukiyama T. 2002. Improvedbackground correction for spotted DNA microarrays. J ComputBiol 9: 55–66.

Qin L, Kerr K. 2003. Empirical evaluation of methodolo-gies for microarray data analysis. Available online at:www.ima.umn.edu/talks/workshops/9-29-10-3.2003/kerrIMA.pdf.


www.ima.umn.edu/talks/workshops/9-29-10-3.2003/kerrIMA.pdf

Submit your manuscripts athttp://www.hindawi.com

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Anatomy Research International

PeptidesInternational Journal of


Hindawi Publishing Corporation http://www.hindawi.com

International Journal of

Volume 2014

Zoology


Molecular Biology International

GenomicsInternational Journal of


The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014


BioinformaticsAdvances in

Marine BiologyJournal of



Signal TransductionJournal of


BioMed Research International

Evolutionary BiologyInternational Journal of



Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Genetics Research International


Advances in

Virolog y

Hindawi Publishing Corporationhttp://www.hindawi.com

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational



Enzyme Research


International Journal of

Microbiology

Research Article A case study on choosing normalization...

Documents

Transcript of Research Article A case study on choosing normalization...