Reduction of selection bias in genomewide studies by resampling



Lei Sun 1,2,* and Shelley B. Bull 1,3

1 Department of Public Health Sciences, University of Toronto, Toronto, Canada
2 Programs in Genetics and Genomics, Hospital for Sick Children, Toronto, Canada
3 Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada

The accuracy of gene localization, the reliability of locus-specific effect estimates, and the ability to replicate initial claims of linkage and/or association have emerged as major methodological concerns in genomewide studies of complex diseases and quantitative traits. To address the issue of multiple comparisons inherent in genomewide studies, the use of stringent criteria for assessing statistical significance has been generally acknowledged as a strategy to control type I error. However, the application of genomewide significance criteria does not take account of the selection bias introduced into parameter estimates, e.g., estimates of locus-specific effect size of disease/trait loci. Some have argued that reliable locus-specific parameter estimates can only be obtained in an independent sample. In this report, we examine statistical resampling techniques, including cross-validation and the bootstrap, applied to the initial sample to improve the estimation of locus-specific effects. We compare them with the naïve method in which all data are used for both hypothesis testing and parameter estimation, as well as with the split-sample approach in which part of the data are reserved for estimation. Upward bias of the naïve estimator and inadequacy of the split-sample approach are derived analytically under a simple quantitative trait model. Simulation studies of the resampling methods are performed for both the simple model and a more realistic genomewide linkage analysis. Our results suggest that cross-validation and bootstrap methods can substantially reduce the estimation bias, especially when the effect size is small or there is no genetic effect. Genet. Epidemiol. 28:352–367, 2005. © 2005 Wiley-Liss, Inc.

Key words: biased estimation; cross-validation; bootstrap; linkage analysis; estimation of genetic effect

Contract grant sponsor: Canadian Institutes of Health Research (CIHR); Contract grant sponsor: Natural Science and Engineering Research Council of Canada (NSERC).
*Correspondence to: Lei Sun, Department of Public Health Sciences, 12 Queen’s Park Crescent West, Toronto, ON M5S 1A8, Canada. E-mail: [email protected]
Received 4 October 2004; Accepted 22 December 2004
Published online 10 March 2005 in Wiley InterScience (www.interscience.wiley.com)
DOI: 10.1002/gepi.20068

INTRODUCTION

It has been appreciated in many scientific settings that attempts to reproduce results found in an initial sample often lead to disappointing results in a replication sample. These discrepancies can be explained by random variability, by differences in composition between the initial and the replication samples, and by the effects of variable selection and over-fitting in the initial sample. In the context of human genetics, and more particularly in genomewide searches to identify susceptibility genes for complex diseases/traits, the consequences of examining the whole genome by means of multiple hypothesis tests at genetic markers and/or candidate gene loci have also been well recognized. In an influential study, Lander and Kruglyak [1995] advocate the use of stringent levels of statistical significance in order to control the overall type I error rate. Suarez et al. [1994] show that in the presence of multiple disease gene loci, the number of families needed to attain the same level of statistical significance at a locus detected in an initial sample is substantially larger than the initial sample size.

Once a genetic marker or candidate gene has been identified as providing significant evidence of a disease/trait locus in the vicinity, it is frequently of interest to estimate the magnitude of the genetic effect of that locus on the disease/trait under study. Additional studies may be designed to replicate the findings in an independent sample from the same or a different population, to refine the location of the gene by fine-mapping, to examine a set of likely candidate genes in the region, or to assess possible functional variants. In the face of three out of four reports of failure to replicate a strong association between Alzheimer disease and a gene encoding α-2 macroglobulin, the editors of Nature Genetics [1999] recommend that studies should include replication in an independent sample, as well as demonstrating consistency between family- and population-based designs, and should report an estimate of the genetic effect size. Effect estimates provide important information on which to base power and sample size calculations for subsequent studies. If, however, the estimates are not accurate, expectations for successful replication will be unrealistic.

In a recent paper, Goring et al. [2001] draw attention to the fact that estimates of locus-specific effect size at genomewide linkage score peaks tend to be inflated. They demonstrate that when the test statistic is maximized over many pointwise tests in the genome, the parameter estimates characterizing the locus-specific effects are effectively biased upwards. Identification of a marker locus from a genomewide set of markers by statistical significance criteria is thus analogous to using multiple univariate tests to select the best regression variable from a large set of explanatory variables. Although Goring et al. [2001] illustrate the phenomenon of effect inflation in a variance components analysis of quantitative-trait-loci (QTL), they emphasize that this is a general feature of genomewide studies regardless of the nature of the phenotype and the analytic methods. Furthermore, they recommend that one dataset be used for locus mapping and another for parameter estimation. Unfortunately, investigators may attempt to address the bias problem by taking an available dataset and splitting it into two smaller datasets for mapping and estimation, sacrificing power and efficiency for both objectives.

Allison et al. [2002] and Siegmund [2002] propose statistical methods to reduce bias in estimates obtained in a single dataset. Allison et al. [2002] use a method-of-moments procedure for QTL effects in a genome scan that specifies how the distribution of the effect estimate is truncated by selection of loci through linkage test significance thresholds. The procedure involves simulation under a specific underlying genetic model (involving mode of inheritance, penetrances, and allele frequencies) that is generally unknown, and, unfortunately, assumption of an incorrect model reduces the effectiveness of the bias correction, limiting its practical utility. To obtain more realistic sample size requirements for replication of results obtained in a single sample, Siegmund [2002] proposes lower confidence limits for the genetic effect parameter that account for the genomewide multiple comparisons. Their application to effect estimation, however, does involve specification of a model for the relation of genotype to phenotype that must at least be approximately correct. Given that most genome scans are conducted using approaches that do not require specification of an underlying genetic model, a task that is difficult for complex diseases/traits, the sensitivity to model assumptions of these proposed solutions may be problematic.

Statistical resampling techniques such as cross-validation and the bootstrap have been successfully employed to address over-fitting and variable selection bias in diagnostic and prognostic prediction models in clinical settings and in microarray data analysis of gene expression. We hypothesize that, in the context of genomewide genetic studies, these statistical techniques will also eliminate or reduce bias in locus-specific genetic effect estimates obtained in the same sample in which the locus was detected. In the following, we first describe each method in a general framework. We then use a simple illustrative example to derive analytically the upward bias of the naïve estimator and the inadequacy of the split-sample approach, and we perform simulation studies to investigate and compare the accuracy of the resampling methods. Furthermore, in a more realistic example of a genomewide linkage analysis with an Affected Sib Pair (ASP) design, our simulation studies using allele-sharing methods based on the Non-Parametric Linkage (NPL) score [Kruglyak et al. 1996] yield comparable results. We close with a discussion of the findings and implications for further work.

METHODS FOR EFFECT-SIZE ESTIMATION AND BIAS REDUCTION

In the context of genomewide scans for complex diseases/traits, one can assume that a particular mapping study is conducted using a high-density map of $m$ markers, which are distributed across the genome. Among the $m$ markers, a subset of $p$ markers would correspond (i.e., are linked) to the $p$ true disease/trait loci. In this report, we focus on the single-locus model ($p = 1$) in which there is one disease/trait susceptibility gene locus, and later we discuss possible extensions to the multi-locus model. Suppose that $n$ families or individuals have been ascertained and genotyped at each of the $m$ marker loci. Depending on the study design, we assume that an appropriate genetic mapping method has been selected to test for linkage/association at each of the marker loci, e.g., allele sharing models [Kruglyak et al. 1996; Kong and Cox, 1997] for binary disease status, variance components models [Amos, 1994; Almasy and Blangero, 1998], or Haseman-Elston regression [Elston et al. 2000] for quantitative traits, with $T_1,\ldots,T_m$ being the test statistics at the $m$ loci. In addition, we assume that criteria for selection of the most "interesting" or "promising" marker region for subsequent fine-mapping, replication, or candidate gene studies have also been specified. For example, to control the false-positive rate, the selected marker might be the marker with the maximum test statistic that also meets a certain criterion for genomewide significance.

Once a marker (or a set of markers) has been identified as linked to the putative disease/trait locus, the next key objective of interest is to estimate the true gene-effect size $\mu$ via linkage/association parameters, with $\gamma_1,\ldots,\gamma_m$ being the estimates obtained at the $m$ loci. For example, in the context of linkage analysis based on the NPL score, $\mu$ is the expected excess IBD sharing at the disease locus, and $\gamma_i$ is the observed excess IBD sharing at marker $i$. Ideally, the parameter estimate should be obtained from an independent dataset, but it is often the case that a replication sample is not readily available. Currently, the most commonly used estimate is a naïve one, calculated in the same sample that has been used for testing. However, this estimator yields estimates with upward bias due to maximization across the genome within each dataset, as well as selection of positive test results. Note that estimation is of interest only if there is evidence for linkage/association. A simple split-sample method, in which part of the sample is used for locus detection and the remaining part of the sample is used for parameter estimation, would allow us to construct an unbiased estimator. However, the power for detecting linkage/association is reduced because the size of the sample used for detection is reduced, and the estimate obtained in the remaining sample is more variable. This suggests that a repeated sample-split approach, such as cross-validation or bootstrap resampling techniques, may be exploited to obtain more accurate parameter estimates. In the following, we detail each of the methods.

NAIVE METHOD

In this method, all $n$ families are used for both gene localization and gene-effect-size estimation. For a given dataset, suppose marker $i^*$ has been selected as the putative susceptibility gene locus (e.g., $T_{\max} = T_{i^*}$ and $T_{\max} > z^*$, $i^* \in \{1,\ldots,m\}$, where $z^*$ is an appropriate threshold for a given type I error rate $\alpha$). A naïve estimate of gene-effect size $\mu$ would then be $\mu_N = \gamma_{i^*}$. The estimate is biased upwards, because it is calculated based on the same data used for marker selection in which only the marker with the maximum statistic that also meets the genomewide significance is selected. In the context of genomewide QTL analysis, Goring et al. [2001] demonstrate via simulation studies that the naïve QTL-heritability estimates are grossly biased upward, especially under the null model when no QTL exists at all [table 1 of Goring et al. 2001]. In the next section, we consider a simple illustrative example for which some analytical results can be derived, and we provide mathematical expressions for the bias and variance of the naïve estimator under both null and alternative models.
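The selection rule of the naïve method can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `naive_estimate` and its arguments (`T` for the test statistics, `gamma` for the per-marker estimates, `z_star` for the threshold) are assumptions made for the example.

```python
def naive_estimate(T, gamma, z_star):
    """Pick the marker with the largest test statistic and, if it is
    genomewide significant, report that marker's own estimate.

    T      -- test statistics T_1..T_m (hypothetical input)
    gamma  -- effect estimates gamma_1..gamma_m from the same sample
    z_star -- genomewide significance threshold
    Returns (i_star, mu_N), or None when no marker exceeds the threshold.
    """
    i_star = max(range(len(T)), key=T.__getitem__)  # index of T_max
    if T[i_star] <= z_star:
        return None  # estimation is of interest only after a positive test
    return i_star, gamma[i_star]
```

Because `gamma[i_star]` is reported only when `T[i_star]` is both the maximum and above `z_star`, the returned value comes from the upper tail of the estimates, which is where the upward bias originates.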

INDEPENDENT ESTIMATOR

In some situations, an independent replication sample may be collected after the initial study. If so, for the marker locus $i^*$ selected in the original dataset, a new independent estimator $\mu_I = \gamma^I_{i^*}$ can then be obtained in the second dataset at that marker. If the selected marker is a true positive (TP), then $\mu_I$ is unbiased with respect to the true gene-effect size. Likewise, if the selected marker is a false positive (FP), then $\mu_I$ is unbiased with respect to zero effect size, because there is no genetic effect at that marker in the independent sample.

SPLIT-SAMPLE METHOD

In this method, the families are randomly split into two groups. One group is designated as the detection sample (equivalent to the original sample above with sample size $n_D$), and the other is designated as the estimation sample (equivalent to the independent replication sample above with sample size $n_E = n - n_D$). Testing for linkage/association is first performed in the detection sample based on test statistics $\{T^D_1,\ldots,T^D_m\}$. For the marker $i^*$ selected in the detection sample, the parameter estimate would then be calculated based on the estimation sample only, i.e., $\mu_E = \gamma^E_{i^*}$. Note that an estimate could also be obtained in the detection sample, i.e., $\mu_D = \gamma^D_{i^*}$, but it would be biased upward as argued in the case of the naïve estimator.
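A split-sample analysis can be sketched as follows. The sketch assumes a toy setting in which the sample mean at each marker serves as both test statistic and estimate, and in which the threshold is the one for the maximum of `m` independent normal statistics; the data layout `X[i][j]` (marker `i`, family `j`) and all function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_star(m, n, alpha=0.05):
    """Threshold for the max of m independent N(0, 1/n) statistics (assumed model)."""
    return NormalDist().inv_cdf((1 - alpha) ** (1 / m)) / n ** 0.5

def select(X, cols, alpha=0.05):
    """Index of the marker with the largest mean over families `cols`,
    or None if that mean does not exceed the threshold."""
    means = [fmean(row[j] for j in cols) for row in X]
    i_star = max(range(len(X)), key=means.__getitem__)
    return i_star if means[i_star] > z_star(len(X), len(cols), alpha) else None

def split_sample_estimate(X, frac_detect=0.5, rng=None):
    """Returns (i_star, mu_D, mu_E), or None if nothing is significant."""
    rng = rng or random.Random()
    cols = list(range(len(X[0])))
    rng.shuffle(cols)                              # random split of the families
    n_d = round(frac_detect * len(cols))
    detect, estimate = cols[:n_d], cols[n_d:]
    i_star = select(X, detect)                     # detection sample only
    if i_star is None:
        return None
    mu_D = fmean(X[i_star][j] for j in detect)     # biased upward, like mu_N
    mu_E = fmean(X[i_star][j] for j in estimate)   # not used for selection
    return i_star, mu_D, mu_E
```

Passing a seeded `random.Random` instance as `rng` makes the split reproducible.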

CROSS-VALIDATION (CV) METHOD

To obtain a 10-fold CV estimator, the families are randomly split into 10 subsets or groups. For each group $k = 1,\ldots,10$ in turn, families in group $k$ are designated as the estimation sample, and families in the remaining nine groups are designated as the detection sample. For a given $k$, marker selection and parameter estimation can be performed exactly as in a 90-10 split-sample case (with sample sizes $n_D = .9n$ and $n_E = .1n$). Thus, we use the same notation as above, with superscript $k$ indicating which group is used as the estimation sample. Suppose marker $i^*_k$ is selected as the putative gene locus at step $k$, and let $\mu^k_D = \gamma^D_{i^*_k}$ and $\mu^k_E = \gamma^E_{i^*_k}$ be the gene-effect-size estimates obtained, respectively, in the detection and estimation samples at that marker. This would yield a vector of additive $(\mu^k_D - \mu^k_E)$ corrections. A bias-reduction factor, $\overline{(\mu^k_D - \mu^k_E)}$, can then be obtained by averaging over all the steps. Therefore, one may use $\mu_N - \overline{(\mu^k_D - \mu^k_E)}$ (called the shrinkage estimator) as a bias-reduced estimator of the true gene-effect size, or alternatively use the average $\overline{\mu^k_E}$ (called the out-of-sample estimator). Note that (1) the above method is proposed for estimation purposes only. The overall testing for linkage/association is performed using all available data (which gives maximal power). (2) At some step, it is possible that no marker is significant. If so, that step will not be included in the bias-reduction factor, $\overline{(\mu^k_D - \mu^k_E)}$. The rationale is that $\mu^k_D$ is viewed as an estimate of $\mu_N$. Thus, at each step, we want to mimic the analyses performed by the naïve method as closely as possible. (3) The selected marker $i^*_k$ may be different from step to step and/or different from the marker $i^*$ selected by the naïve method based on all available data, but the bias-reduction factor, $\overline{(\mu^k_D - \mu^k_E)}$, is calculated regardless of the chosen marker at each step. The argument is similar to that for (2) in that the analyses within each resampling step should reflect those in the original sample so that $\mu^k_D$ (a naïve estimate itself) approximates $\mu_N$. In fact, equations (1) and (3) in the following section show that the expected value of upward bias of a naïve estimator does not depend on the specific locus selected, as long as its test statistic is the maximum. This is also similar to the classical problem of model validation in which different predictive models may be built at different cross-validation steps.

To increase the stability of the 10-fold CV estimator, one may wish to repeat the 10-fold resampling, independently, say 10 or 20 or more times to obtain a 10 × 10-fold or 20 × 10-fold CV estimator. The bias-reduction factor is then calculated by averaging over all relevant steps.
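The cross-validation steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sample mean is used as both test statistic and estimate, the threshold is the one for independent normal markers recomputed at the detection-sample size, and the names `cv_estimates`, `select`, and `z_star` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_star(m, n, alpha=0.05):
    """Threshold for the max of m independent N(0, 1/n) statistics (assumed model)."""
    return NormalDist().inv_cdf((1 - alpha) ** (1 / m)) / n ** 0.5

def select(X, cols, alpha=0.05):
    """Significant max-mean marker over families `cols`, else None."""
    means = [fmean(row[j] for j in cols) for row in X]
    i_star = max(range(len(X)), key=means.__getitem__)
    return i_star if means[i_star] > z_star(len(X), len(cols), alpha) else None

def cv_estimates(X, folds=10, alpha=0.05, rng=None):
    """X[i][j]: value for marker i, family j. Returns a dict of estimates,
    or None if the full-sample test (or every CV step) is non-significant."""
    rng = rng or random.Random()
    n = len(X[0])
    i_naive = select(X, range(n), alpha)         # overall test uses all data
    if i_naive is None:
        return None
    mu_N = fmean(X[i_naive])
    cols = list(range(n))
    rng.shuffle(cols)
    diffs, outs = [], []
    for k in range(folds):
        est = cols[k::folds]                     # group k: estimation sample
        held = set(est)
        det = [j for j in cols if j not in held] # remaining groups: detection
        i_k = select(X, det, alpha)
        if i_k is None:
            continue                             # step excluded, as in note (2)
        mu_D = fmean(X[i_k][j] for j in det)
        mu_E = fmean(X[i_k][j] for j in est)
        diffs.append(mu_D - mu_E)
        outs.append(mu_E)
    if not diffs:
        return None
    return {"naive": mu_N,
            "shrinkage": mu_N - fmean(diffs),    # mu_N minus mean correction
            "out_of_sample": fmean(outs)}
```

Repeating the call with different seeds and averaging the per-step corrections across repetitions would give the 10 × 10-fold variant.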

BOOTSTRAP METHOD

With this method, repeated samples of $n$ families are randomly chosen from the original dataset. For each of $b = 1,\ldots,B$ (e.g., $B$ = 50, 100, or 200) bootstrap samples, the sampling is performed with replacement, so that in any particular sample some families can be chosen more than once, and some may not be chosen at all. Following Efron and Tibshirani [1997] and Ambroise and MacLachlan [2002], the families chosen for sample $b$ constitute the detection sample, and the families that do not appear in sample $b$ (out-of-sample families) comprise the estimation sample, thus providing independence within each of the $B$ resampling steps. This approach can be viewed as a direct extension of the CV method, but with $n_D = n$ and $n_E \approx .368n$ (on average, 36.8% of the families do not appear in a particular bootstrap detection sample). The shrinkage estimator, $\mu_N - \overline{(\mu^k_D - \mu^k_E)}$, and the out-of-sample estimator, $\overline{\mu^k_E}$, can be calculated correspondingly.

In the context of classification, the .632 estimator [Efron, 1983] has been found to have good properties in estimation of prediction error. In analogy to that, we consider the weighted average of $\mu_N$ and $\overline{\mu^k_E}$ as a new estimator, $(1 - w)\,\mu_N + w\,\overline{\mu^k_E}$ (called the weighted estimator). For example, $w = .632$ for the weighted bootstrap estimator and similarly $w = .9$ for the weighted CV estimator.
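A bootstrap counterpart can be sketched in the same spirit. As before, this assumes a toy setting (sample mean as test statistic and estimate, independent-marker threshold), and the function names are hypothetical; it is meant only to show how the detection and out-of-sample families are formed and how the three estimators are combined.

```python
import random
from statistics import NormalDist, fmean

def z_star(m, n, alpha=0.05):
    """Threshold for the max of m independent N(0, 1/n) statistics (assumed model)."""
    return NormalDist().inv_cdf((1 - alpha) ** (1 / m)) / n ** 0.5

def select(X, cols, alpha=0.05):
    """Significant max-mean marker over families `cols`, else None."""
    means = [fmean(row[j] for j in cols) for row in X]
    i_star = max(range(len(X)), key=means.__getitem__)
    return i_star if means[i_star] > z_star(len(X), len(cols), alpha) else None

def bootstrap_estimates(X, B=100, w=0.632, alpha=0.05, rng=None):
    """Shrinkage, out-of-sample and weighted estimates from B bootstrap samples."""
    rng = rng or random.Random()
    n = len(X[0])
    i_naive = select(X, range(n), alpha)
    if i_naive is None:
        return None
    mu_N = fmean(X[i_naive])
    diffs, outs = [], []
    for _ in range(B):
        det = [rng.randrange(n) for _ in range(n)]      # drawn with replacement
        drawn = set(det)
        est = [j for j in range(n) if j not in drawn]   # out-of-sample families
        if not est:
            continue
        i_b = select(X, det, alpha)
        if i_b is None:
            continue                                    # non-significant step skipped
        mu_D = fmean(X[i_b][j] for j in det)
        mu_E = fmean(X[i_b][j] for j in est)
        diffs.append(mu_D - mu_E)
        outs.append(mu_E)
    if not diffs:
        return None
    out = fmean(outs)
    return {"naive": mu_N,
            "shrinkage": mu_N - fmean(diffs),
            "out_of_sample": out,
            "weighted": (1 - w) * mu_N + w * out}
```

Setting `w=0.9` and replacing the resampling loop with the 10-fold split would give the weighted CV estimator mentioned in the text.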

PROOF OF PRINCIPLE: A SIMPLE QUANTITATIVE TRAIT MODEL

We now consider a simple example to demonstrate analytically the upward bias of the naïve estimator, the properties of the independent estimator, as well as the inadequacy of the split-sample method. We note that although the marker independence assumed in the example is not generally realistic, it is not critical to the conclusions drawn here. The simplicity of the model allows us to obtain mathematical expressions for the bias of the naïve estimator and to gain a better understanding of the nature of the problem. We provide detailed derivations in the Appendix, and we present here only the key results.

Suppose that a set of $m = 22$ markers, one on each of the 22 pairs of autosomal chromosomes, have been typed, and $n$ independent families or individuals have been ascertained and genotyped at each of the marker loci. Let $X_{ij}$ be the random variable representing the data for the relationship between the trait value(s) and marker $i$ for family/individual $j$, $i = 1,\ldots,m$ and $j = 1,\ldots,n$. For simplicity, we assume that marker $i = 1$ is the true trait locus. Furthermore, we assume that $X_{ij}$ is normally distributed, in particular, $X_{1j} \sim N(\mu, 1)$ ($\mu > 0$) and $X_{ij} \sim N(0, 1)$ for $i = 2,\ldots,m$, $j = 1,\ldots,n$. Thus, $\mu$ represents the gene-effect size, the parameter of interest. For a given dataset, the two primary objectives, (i) a genomewide search for the location of the trait locus, and (ii) locus-specific estimation of the gene-effect size of the trait locus, can be formulated as:

(i) testing $H_o: \mu = 0$; and (ii) estimating $\mu$.

A test statistic for objective (i) is $T_{\max} = T_{i^*} = \max\{T_1,\ldots,T_m\}$, where $T_i = \sum_j X_{ij}/n$, the sample mean at marker $i$. Because of the independence assumption, $T_1 \sim N(\mu, 1/n)$ and $T_i \sim N(0, 1/n)$ for $i = 2,\ldots,m$. Under the null hypothesis that $\mu = 0$, the pdf of $T_{\max}$ can be shown to be $f_{T_{\max}} = m \cdot f_o \cdot (F_o)^{m-1}$, where $f_o$ and $F_o$ are, respectively, the pdf and cdf of $N(0, 1/n)$. Suppose the nominal type I error rate is chosen to be $\alpha = .05$; the appropriate threshold, $z^*$, can be derived by noting that $P_{H_o}(T_{\max} < z^*) = \{F_o(z^*)\}^m = .95$. Thus, $z^* \approx 2.8298/\sqrt{n}$. Note that the test is one-sided, since the alternative of interest is $\mu > 0$.
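The threshold can be checked numerically by inverting the relation between the standard normal cdf and the genomewide error rate. The short sketch below uses only the Python standard library; the function name `genomewide_threshold` is an assumption made for the example.

```python
from statistics import NormalDist

def genomewide_threshold(m=22, n=1, alpha=0.05):
    """Solve {F_o(z*)}^m = 1 - alpha for z*, where F_o is the cdf of N(0, 1/n).

    Since F_o(z) = Phi(z * sqrt(n)), z* = Phi^{-1}((1 - alpha)^(1/m)) / sqrt(n).
    """
    c = NormalDist().inv_cdf((1 - alpha) ** (1 / m))  # threshold on the sqrt(n)*T scale
    return c / n ** 0.5

# Standardized constant (n = 1) and the thresholds for the two sample sizes used later.
c = genomewide_threshold(m=22, n=1)
z150 = genomewide_threshold(m=22, n=150)
z300 = genomewide_threshold(m=22, n=300)
```

With `m=22` and `alpha=0.05` the standardized constant should agree with the value 2.8298 quoted in the text to about three decimal places.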

NAIVE ESTIMATOR

To estimate the unknown parameter $\mu$, the naïve estimator would then be calculated based on the same sample of size $n_N = n$, i.e., $\mu_N = \gamma_{i^*} = \sum_j X_{i^*j}/n = T_{i^*}$. Note that in this simple case, the sample mean is used as both the test statistic and parameter estimate. Under the null hypothesis, $H_o: \mu = 0$, the bias and the standard error (SE) of the naïve estimator can be shown to have the following expressions:

$$\mathrm{Bias}^{H_o}_N = E_{H_o}[\mu_N] - 0 = \frac{\int_{z^*}^{\infty} t\, f_{T_{\max}}(t)\,dt}{\alpha} \qquad (1)$$

$$\mathrm{SE}^{H_o}_N = \left\{ \frac{\int_{z^*}^{\infty} \left(t - E_{H_o}[\mu_N]\right)^2 f_{T_{\max}}(t)\,dt}{\alpha} \right\}^{1/2} \qquad (2)$$
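Equations (1) and (2) can be evaluated by straightforward numerical integration. The sketch below is one possible way to do so, assuming the simple model with independent markers; it works on the standardized scale $u = t\sqrt{n}$ and uses a midpoint rule, and the function name, the integration range, and the number of steps are all choices made for this example.

```python
from statistics import NormalDist

def naive_null_bias_se(m=22, n=150, alpha=0.05, steps=20000):
    """Bias and SE of the naive estimator under H_o, per equations (1) and (2).

    On the scale u = t * sqrt(n), T_max has density m * phi(u) * Phi(u)^(m-1).
    """
    std = NormalDist()
    c = std.inv_cdf((1 - alpha) ** (1 / m))   # z* * sqrt(n)
    upper = c + 10.0                          # tail mass beyond this is negligible
    h = (upper - c) / steps
    m1 = m2 = 0.0
    for k in range(steps):                    # midpoint rule on [c, upper]
        u = c + (k + 0.5) * h
        d = m * std.pdf(u) * std.cdf(u) ** (m - 1) * h
        m1 += u * d
        m2 += u * u * d
    mean = m1 / alpha                         # E[u | u > c]; the tail mass is alpha
    var = m2 / alpha - mean ** 2
    return mean / n ** 0.5, var ** 0.5 / n ** 0.5

bias150, se150 = naive_null_bias_se(n=150)
bias300, se300 = naive_null_bias_se(n=300)
```

Under these assumptions the computed biases should come out close to the values of .255 ($n = 150$) and .181 ($n = 300$) reported for the null model.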

Note that under the null model when there is no genetic effect, the bias is proportional to the stringency of the $\alpha$ level. Under a particular alternative model, $H_a: \mu\ (> 0)$, the bias and SE of the naïve estimator can be calculated as follows:

$$\mathrm{Bias}^{H_a}_N = E_{H_a}[\mu_N] - \mu \approx \frac{\int_{z^*}^{\infty} t\, f_{T_1}(t)\,dt + \int_{z^*}^{\infty} y\, F_{T_1}(y)\, f_Y(y)\,dy}{TP + FP} - \mu \qquad (3)$$

$$\mathrm{SE}^{H_a}_N \approx \left\{ \frac{\int_{z^*}^{\infty} \left(t - E_{H_a}[\mu_N]\right)^2 f_{T_1}(t)\,dt + \int_{z^*}^{\infty} \left(y - E_{H_a}[\mu_N]\right)^2 F_{T_1}(y)\, f_Y(y)\,dy}{TP + FP} \right\}^{1/2} \qquad (4)$$

where $f_{T_1}$ and $F_{T_1}$ are, respectively, the pdf and cdf of $N(\mu, 1/n)$, $Y = \max\{T_2,\ldots,T_m\}$, and $f_Y$ and $F_Y$ are, respectively, the pdf and cdf of $Y$, $f_Y = (m - 1) \cdot f_o \cdot (F_o)^{m-2}$. TP is the true positive rate or power, and FP is the false-positive rate:

$$TP\ (\text{Power}) = P(T_{\max} > z^*,\ T_{\max} = T_1) = \left(1 - F_{T_1}(z^*)\right) F_Y(z^*) + \int_{z^*}^{\infty} \left(1 - F_{T_1}(y)\right) f_Y(y)\,dy \qquad (5)$$

$$FP = P(T_{\max} > z^*,\ T_{\max} = Y) = \int_{z^*}^{\infty} F_{T_1}(y)\, f_Y(y)\,dy \qquad (6)$$

We note that $\int_{z^*}^{\infty} (1 - F_{T_1}(y))\, f_Y(y)\,dy < \int_{z^*}^{\infty} f_Y(y)\,dy = P(Y > z^*)$, a value that is rather small when $m$, the number of markers, is not too large (e.g., $P(Y > z^*) = 1 - F_Y(z^*) = 1 - .952 = .048$ for $m = 22$). Thus, power is mainly determined by $1 - F_{T_1}(z^*) = P(Z \ge \sqrt{n}\,z^* - \mu\sqrt{n}) = P(Z \ge 2.8298 - \mu\sqrt{n})$, where $Z$ is a standard normal variable. Similarly, one can argue that the false-positive rate is generally small, because $\int_{z^*}^{\infty} F_{T_1}(y)\, f_Y(y)\,dy < \int_{z^*}^{\infty} f_Y(y)\,dy$. In addition, $\int_{z^*}^{\infty} y\, F_{T_1}(y)\, f_Y(y)\,dy < \int_{z^*}^{\infty} y\, f_Y(y)\,dy$, a value that is also small for small $m$ (e.g., $\int_{z^*}^{\infty} y\, f_Y(y)\,dy = .012$ for $m = 22$). As a result, the bias of the naïve estimator under a particular alternative model is mainly determined by $\left(\int_{z^*}^{\infty} t\, f_{T_1}(t)\,dt\right)/TP$, which is proportional to the power of testing for linkage.

Although it is not possible, in practice, to distinguish a true positive from a false positive, it is of interest to know the behavior of an estimator given these two outcomes. Ideally, in the event of a TP, the estimate should be close to the true gene-effect size, while in the event of an FP, the estimate should be close to zero gene-effect size. (Note that bias given FP is measured relative to zero gene-effect size.) However, this is not the case for the naïve estimator. In fact, the conditional biases of the naïve estimator stratified by TP and FP are, respectively,

$$\mathrm{Bias}^{H_a}_{N|TP} = E_{H_a}[\mu_N \mid TP] - \mu \approx \frac{\int_{z^*}^{\infty} t\, f_{T_1}(t)\,dt}{TP} - \mu \qquad (7)$$

$$\mathrm{Bias}^{H_a}_{N|FP} = E_{H_a}[\mu_N \mid FP] - 0 = \frac{\int_{z^*}^{\infty} y\, F_{T_1}(y)\, f_Y(y)\,dy}{FP} \qquad (8)$$
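The true- and false-positive rates in equations (5) and (6) can be evaluated numerically in the same way as the null-model bias. The following is a sketch under the simple model's assumptions (independent markers, standardized scale $u = t\sqrt{n}$, midpoint rule); the function name `tp_fp` and the integration settings are choices made for this example.

```python
from statistics import NormalDist

def tp_fp(mu, n, m=22, alpha=0.05, steps=20000):
    """True- and false-positive rates from equations (5) and (6).

    On the scale u = t * sqrt(n): T_1 ~ N(mu * sqrt(n), 1) and
    Y = max(T_2..T_m) has density (m-1) * phi(u) * Phi(u)^(m-2).
    """
    std = NormalDist()
    c = std.inv_cdf((1 - alpha) ** (1 / m))      # z* * sqrt(n)
    shift = mu * n ** 0.5
    F_T1 = lambda u: std.cdf(u - shift)
    f_Y = lambda u: (m - 1) * std.pdf(u) * std.cdf(u) ** (m - 2)
    F_Y_c = std.cdf(c) ** (m - 1)                # F_Y(z*)
    upper = c + 10.0
    h = (upper - c) / steps
    tail_tp = tail_fp = 0.0
    for k in range(steps):                       # midpoint rule on [c, upper]
        u = c + (k + 0.5) * h
        w = f_Y(u) * h
        tail_tp += (1 - F_T1(u)) * w             # T_1 beats Y, both above z*
        tail_fp += F_T1(u) * w                   # Y is the maximum and above z*
    tp = (1 - F_T1(c)) * F_Y_c + tail_tp
    return tp, tail_fp

tp300, fp300 = tp_fp(0.2, 300)
tp150, fp150 = tp_fp(0.2, 150)
tp0, fp0 = tp_fp(0.0, 300)
```

For $\mu = .2$ the computed power should land near the values .731 ($n = 300$) and .347 ($n = 150$) quoted in the text, and under the null model the two rates should sum to $\alpha$.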

We observe that the conditional bias of the naïve estimator stratified by TP decreases as the true gene-effect size increases because of increased power, while the bias stratified by FP stays high because of decreased FP. Certainly, as the true gene-effect size increases, the event of an FP becomes so rare that the corresponding conditional bias may not be meaningful. Nevertheless, the naïve estimate stratified by FP is substantially higher than zero gene-effect size.

To demonstrate some results numerically, we assume that the gene-effect size $\mu$ ranges from 0 to .3 on a grid of .05, the sample size $n$ is 150 or 300, and the nominal genomewide type I error $\alpha$ is .05. We use Maple 8 to perform the calculations; results are illustrated in Figure 1. Figure 1a shows the bias, calculated based on equations (1) ($\mu = 0$) and (3) ($\mu > 0$), and the SEs, calculated based on equations (2) ($\mu = 0$) and (4) ($\mu > 0$), of the naïve estimator. We first note that the upward bias is particularly a problem under the null hypothesis when there is in fact no genetic effect. This is because the estimation is performed conditional on a positive test result. In that case, because $T_{\max}$, the test statistic, has already exceeded the threshold of $z^*$, the estimate of the effect size based on the same statistic is then at least $z^*$. For example, in the case of $z^* = 2.8298/\sqrt{n}$, the bias would be at least .231 for $n = 150$ and .163 for $n = 300$. Indeed, the biases for the two cases are, respectively, .255 and .181. The SE decreases as the sample size increases, but it increases as the true gene-effect size increases. The smaller variance for smaller gene-effect size $\mu$ is due to the fact that the estimates are generated from the right tail of the original distribution $f_{T_{\max}}$, that is, a truncated distribution conditional on a positive test result (i.e., $T_{\max} > z^*$). In contrast, when both $\mu$ and $n$ are large (i.e., power is high), the estimates are mostly generated from the distribution $f_{T_1}$ with SE being close to $\sqrt{1/n}$. Indeed, when $\mu = .3$ and $n = 300$ ($\sqrt{1/n} = .058$), $\mathrm{SE}_N = .056$.

Figure 1b shows the conditional biases of the naïve estimator stratified by TP and FP status, calculated by equations (7) and (8), as well as the proportions of TP (power) and FP, calculated by equations (5) and (6), for each of the alternatives considered. It is clear that, as the gene-effect size and sample size increase, bias stratified by TP decreases because it is proportional to the rate of TP, which is increasing, while bias stratified by FP increases slightly because it is proportional to the rate of FP, which is decreasing.

INDEPENDENT ESTIMATOR

To investigate the properties of the independent estimator $\mu_I$, we consider the situation in which an independent sample of the same size as the original sample ($n_I = n$) is collected, $X_{ij}$, $i = 1,\ldots,m$ and $j = n+1,\ldots,2n$, after completion of the initial study has shown a genomewide significant result. For the marker $i^*$ selected by the naïve method in the original sample, the independent estimator, $\mu_I$, of the gene-effect size is $\gamma^I_{i^*} = \sum_{j=n+1}^{2n} X_{i^*j}/n$. Under the null hypothesis that $\mu = 0$, $\mu_I$ is obviously unbiased, with $\mathrm{SE}_I = 1/\sqrt{n}$. However, under a particular alternative hypothesis, $\mu\ (> 0)$, $\mu_I$ would be conservative, because in reality one could not distinguish an FP from a TP in the original sample. In fact, we show that

$$\mathrm{Bias}^{H_a}_I = E_{H_a}[\mu_I] - \mu = -\mu \cdot \frac{FP}{TP + FP} \qquad (9)$$

$$\mathrm{SE}^{H_a}_I = \left\{ \frac{1}{n} + \mu^2 \cdot \frac{TP \cdot FP}{(TP + FP)^2} \right\}^{1/2} \qquad (10)$$
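Equations (9) and (10) are simple enough to evaluate directly once TP and FP are known. One way to read them is as the mean and standard deviation of a two-component mixture: with probability TP/(TP + FP) the selected marker is the true locus and the independent estimate is centered at $\mu$, and otherwise it is centered at zero. The helper below is a small sketch of that calculation; the function name is hypothetical, and the TP and FP values passed in the example are the rounded rates quoted in this paper for $\mu = .2$ and $n = 300$.

```python
def independent_bias_se(mu, n, tp, fp):
    """Bias and SE of the independent estimator, per equations (9) and (10).

    tp, fp -- true- and false-positive rates of the test in the original sample
    """
    p_fp = fp / (tp + fp)                     # share of positives that are false
    bias = -mu * p_fp                         # equation (9)
    se = (1 / n + mu ** 2 * tp * fp / (tp + fp) ** 2) ** 0.5   # equation (10)
    return bias, se

# Example with the rounded rates quoted in the text for mu = .2, n = 300.
bias_I, se_I = independent_bias_se(0.2, 300, tp=0.731, fp=0.017)
rmse_I = (bias_I ** 2 + se_I ** 2) ** 0.5     # square root of the MSE
```

The bias is negative whenever FP is positive, which is the conservativeness noted above, and the SE reduces to $1/\sqrt{n}$ when $\mu = 0$.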

Effect Estimation in Genomewide Scans 357

Page 7: Reduction of selection bias in genomewide studies by resampling

where TP and FP are true and false positive rates of the hypothesis test performed in the original sample. Table I gives the bias and SE calculated based on equations (9) and (10), as well as the square root of the Mean Squared Error ($\sqrt{MSE}$) of the independent estimator $\mu_I$ using the models considered above. If one assumes the ability to distinguish a TP from a FP, the independent estimator obviously would be unbiased in both cases. (Note that the bias stratified by FP is measured relative to zero gene-effect size.)

[Figure 1, plots not reproduced. Panel (a): overall bias and SE of the naïve estimator against gene-effect size, for n = 150 and n = 300. Panel (b): bias of the naïve estimator stratified by TP and FP against gene-effect size, for n = 150 and n = 300, annotated with the TP and FP rates below.]

| Gene-effect size $\mu$ | .05 | .1 | .15 | .2 | .25 | .3 |
|---|---|---|---|---|---|---|
| TP = Power (n = 150) | .013 | .053 | .157 | .347 | .586 | .796 |
| TP = Power (n = 300) | .024 | .133 | .403 | .731 | .930 | .990 |
| FP (n = 150) | .047 | .046 | .042 | .035 | .025 | .014 |
| FP (n = 300) | .047 | .043 | .033 | .017 | .005 | .001 |

Fig. 1. Numerical results for the simple example. a: Overall bias and standard error (SE) of the naïve estimator. Circles represent the biases, and the vertical lines represent the SEs. b: Conditional bias of the naïve estimator stratified by True Positive (TP) and False Positive (FP) status; separate plotting symbols mark the biases stratified by TP and by FP.
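Equations (9) and (10) are simple enough to evaluate directly. The Python sketch below does so for $\mu = .1$ and $n = 150$, taking the rounded rates TP = .053 and FP = .046 from the Figure 1b annotations (the pairing of those rates with $\mu = .1$ is our reading of the figure). The function names are hypothetical. With rounded inputs the output should agree with the corresponding Table I row only approximately.

```python
import math


def independent_bias(mu, tp, fp):
    """Equation (9): bias of the independent estimator under the alternative."""
    return -mu * fp / (tp + fp)


def independent_se(mu, n, tp, fp):
    """Equation (10): standard error of the independent estimator."""
    return math.sqrt(1.0 / n + mu ** 2 * tp * fp / (tp + fp) ** 2)


def root_mse(bias, se):
    """Square root of the mean squared error, sqrt(bias^2 + SE^2)."""
    return math.sqrt(bias ** 2 + se ** 2)


if __name__ == "__main__":
    mu, n, tp, fp = 0.1, 150, 0.053, 0.046   # rounded rates read from Figure 1b
    bias = independent_bias(mu, tp, fp)
    se = independent_se(mu, n, tp, fp)
    print(f"bias={bias:.4f}, SE={se:.3f}, root MSE={root_mse(bias, se):.3f}")
```

The downward bias arises only because a fraction FP/(TP + FP) of the significant scans point to a marker with no effect, for which the replication estimate is centred at zero.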

SPLIT-SAMPLE ESTIMATOR

For the split-sample estimator, we consider 50–50, 60–40, 70–30, 80–20, and 90–10 sample splits (i.e., $n_D = .5n, .6n, .7n, .8n, .9n$ and $n_E = .5n, .4n, .3n, .2n, .1n$, respectively). Table II gives the results of the split-sample method for the case of $\mu = .2$ and $n = 300$, since results for other alternatives are similar, and under the null model, $\mu_E$ is unbiased with $SE_E = \sqrt{1/n_E}$. TP and FP are calculated based on equations (5) and (6) with corresponding sample size $n_D$, and bias and SE are calculated based on equations (9) and (10) with corresponding sample size $n_E$. The results in Table II indicate that simple split-sample methods are not adequate, either in terms of power of testing for linkage or the stability of estimating the true gene-effect size. For example, a 50–50 split method reduces the power from .731 (using the whole sample) to .347. A 90–10 split method preserves the power better, but the variance of the estimate is substantially increased, because it is calculated in only 10% of the original sample.
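The trade-off between power and stability can be made concrete with a short Python sketch. It applies the bias and SE expressions of the independent estimator with $n$ replaced by $n_E$, using the rounded TP and FP rates transcribed from Table II. The helper name and the data layout are hypothetical, and because the inputs are rounded the output matches the tabulated bias and SE only to within about .002.

```python
import math

# (fraction of the sample used for detection, TP, FP), transcribed from Table II
# for mu = .2 and n = 300.
TABLE_II_RATES = [
    (0.5, 0.347, 0.035),
    (0.6, 0.436, 0.031),
    (0.7, 0.521, 0.028),
    (0.8, 0.600, 0.024),
    (0.9, 0.670, 0.021),
]


def split_sample_summary(mu, n, detect_frac, tp, fp):
    """Bias, SE and root MSE of the estimate from the held-out estimation sample."""
    n_e = round(n * (1.0 - detect_frac))               # estimation-sample size n_E
    bias = -mu * fp / (tp + fp)
    se = math.sqrt(1.0 / n_e + mu ** 2 * tp * fp / (tp + fp) ** 2)
    return n_e, bias, se, math.sqrt(bias ** 2 + se ** 2)


if __name__ == "__main__":
    for frac, tp, fp in TABLE_II_RATES:
        n_e, bias, se, rmse = split_sample_summary(0.2, 300, frac, tp, fp)
        print(f"{int(frac * 100)}-{int(round((1 - frac) * 100))} split: "
              f"power={tp:.3f}, n_E={n_e}, bias={bias:.4f}, SE={se:.3f}, rootMSE={rmse:.3f}")
```

Moving from a 50–50 to a 90–10 split roughly doubles the power but also nearly doubles the SE, since the $1/n_E$ term dominates the variance.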

TABLE I. Numerical results of the bias, Standard Error (SE), and square root of the Mean Squared Error ($\sqrt{MSE}$) for the independent estimator, $\mu_I$, calculated in an independent replication sample of the same size as the original sample

| Effect size | Bias (n = 150) | SE (n = 150) | $\sqrt{MSE}$ (n = 150) | Bias (n = 300) | SE (n = 300) | $\sqrt{MSE}$ (n = 300) |
|---|---|---|---|---|---|---|
| $\mu = 0$ | 0 | .081 | .081 | 0 | .057 | .057 |
| $\mu = .05$ | −.0392 | .084 | .092 | −.0330 | .062 | .070 |
| $\mu = .1$ | −.0464 | .095 | .106 | −.0245 | .072 | .076 |
| $\mu = .15$ | −.0319 | .102 | .107 | −.0113 | .070 | .071 |
| $\mu = .2$ | −.0185 | .100 | .101 | −.0047 | .065 | .065 |
| $\mu = .25$ | −.0102 | .095 | .096 | −.0015 | .061 | .061 |
| $\mu = .3$ | −.0052 | .090 | .091 | −.0003 | .058 | .059 |

TABLE II. Numerical results for the split-sample methods, including true positive rate (TP) or power and false-positive rate (FP) of testing for linkage based on the detection sample, and the bias, Standard Error (SE) and square root of Mean Squared Error ($\sqrt{MSE}$) of the estimate calculated in the estimation sample, $\mu_E$ (parameter: $\mu = .2$, $n = 300$)

| Method | TP (power), detection sample | FP, detection sample | Bias of $\mu_E$ | SE of $\mu_E$ | $\sqrt{MSE}$ of $\mu_E$ |
|---|---|---|---|---|---|
| 50–50 split | .347 | .035 | −.0185 | .098 | .101 |
| 60–40 split | .436 | .031 | −.0135 | .103 | .104 |
| 70–30 split | .521 | .028 | −.0101 | .113 | .114 |
| 80–20 split | .600 | .024 | −.0077 | .134 | .135 |
| 90–10 split | .670 | .021 | −.0060 | .185 | .186 |

CV AND BOOTSTRAP ESTIMATORS

Within each resampling step, the CV method is essentially a split-sample method. However, the dependency among the steps makes it difficult to obtain analytical results even under our simple model. Instead, we perform simulation studies to assess empirically the performance of the CV and bootstrap estimators, as well as the naïve and independent estimators for comparison. Figure 2a ($n = 150$) and b ($n = 300$) illustrate the biases and SEs of the methods based on 1,000 simulated replicates. For the resampling-based approaches, we present results only for the weighted estimators, $(1 - w) \cdot \mu_N + w \cdot \mu^k_E$. We observe that the 10×10-fold CV and 100 bootstrap estimators have a similar performance, but as is well known, CV has a smaller bias and bootstrap has a smaller variance. Under the null hypothesis, or when the gene effect is small, both CV and bootstrap estimators are biased upward, but the biases are much reduced compared with the bias of the naïve estimator. This under-correction is due to a form of residual bias. That is, when the effect size is small, all sample points of a dataset that gives an overall significant linkage result tend to be sampled from the right tail of the true underlying distribution with larger values than those from a random sample. Thus, conditional on a significant result, the value obtained in the estimation sample, $\mu_E$, although independent of the corresponding detection sample, is, however, correlated with the overall sample, and therefore overestimates the true effect size (see Table III).

Figure 2c ($n = 150$) and d ($n = 300$) illustrate the conditional biases and SEs, stratified by TP and FP status, of the estimators considered. (Bias stratified by FP is relative to zero effect size.) Because false positives are rare when the gene-effect size is large and only FP occur when $\mu = 0$, we only show the results for $\mu = .1$ and $.2$ based on 1,000 simulated TP or FP replicates. As pointed out in the previous section, the conditional naïve estimates are biased upward regardless of TP and FP, while the conditional independent estimates are unbiased in both cases. The conditional CV and bootstrap estimates stratified by TP substantially reduce the bias. On the other hand, the estimates stratified by FP maintain a considerable degree of upward bias although the bias reduction in most cases is more than half.

[Figure 2, plots not reproduced. Panels (a) and (b): overall bias and SE against gene-effect size. Panels (c) and (d): bias and SE stratified by TP and FP against gene-effect size. Each panel shows the naïve, 100 bootstrap, 10×10-fold CV, and independent estimators.]

Fig. 2. Simulation results for the simple example. Overall bias and standard error (SE) of the naïve, 10×10-fold weighted CV, 100 weighted bootstrap and independent estimators, based on 1,000 simulated replicates. Plotting symbols mark the biases of the estimators, and the vertical lines represent the corresponding SEs. a: Sample size of 150. b: Sample size of 300. Conditional bias and SE stratified by True Positives (TP) and False Positives (FP), based on 1,000 simulated TP or FP replicates. c: Sample size of 150. d: Sample size of 300. See text for simulation details.

We also evaluate and compare the performance of the 10-fold, 10×10-fold, and 20×10-fold CV methods, and the 50, 100, and 200 bootstrap methods, as well as the shrinkage estimator, the out-of-sample estimator, and the weighted estimator in each case. We conclude that 10×10-fold CV increases the stability of the 10-fold CV, but has a similar performance to the 20×10-fold CV, and the 50, 100, and 200 bootstrap methods all give a similar performance (results not shown). However, for a more complex model as in the following simulation study, one may choose to use at least 20×10-fold CV or 200 bootstrap samples to guard against sampling variability. Table III compares the shrinkage estimator, $\mu_N - (\mu^k_D - \mu^k_E)$; the out-of-sample estimator, $\mu^k_E$; and the weighted estimator, $(1 - w) \cdot \mu_N + w \cdot \mu_E$, where $w = .632$ for the bootstrap methods and $w = .9$ for the CV methods.

We note that under the null hypothesis ($\mu = 0$), the shrinkage estimator has the smallest MSE for both the CV and bootstrap methods. When the gene-effect size is small (e.g., $\mu = .1$), the shrinkage and out-of-sample estimators give similar results. When the gene-effect size is moderate (e.g., $\mu = .2$), the out-of-sample and weighted estimators give similar results and perform better than the shrinkage estimator. When the gene-effect size is large (e.g., $\mu = .3$), the weighted estimator has the smallest MSE.
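The three resampling estimators can be sketched for repeated k-fold CV as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: within each fold the marker with the largest detection-sample mean is selected without re-applying a significance threshold, the detection and estimation values are averaged over all folds and repeats, and the data follow the simple model with unit-variance observations. All function names are hypothetical.

```python
import random
from statistics import fmean


def marker_means(data, subjects):
    """Mean of each marker's observations over the given subject indices."""
    return [fmean(row[j] for j in subjects) for row in data]


def cv_estimators(data, k=10, repeats=10, w=0.9, seed=0):
    """Naive, shrinkage, out-of-sample and weighted estimates from repeated k-fold CV.

    data[i][j] is the observation for marker i and subject j.
    """
    rng = random.Random(seed)
    subjects = list(range(len(data[0])))
    naive = max(marker_means(data, subjects))          # estimate at the overall peak marker
    detect_vals, estim_vals = [], []
    for _ in range(repeats):
        order = subjects[:]
        rng.shuffle(order)
        for f in range(k):
            fold = order[f::k]                         # estimation sample (held out)
            held_out = set(fold)
            detection = [j for j in order if j not in held_out]
            d_means = marker_means(data, detection)
            i_star = max(range(len(d_means)), key=d_means.__getitem__)
            detect_vals.append(d_means[i_star])        # selection-biased value
            estim_vals.append(fmean(data[i_star][j] for j in fold))
    mu_d, mu_e = fmean(detect_vals), fmean(estim_vals)
    return {
        "naive": naive,
        "shrinkage": naive - (mu_d - mu_e),
        "out_of_sample": mu_e,
        "weighted": (1.0 - w) * naive + w * mu_e,
    }


def simulate_data(mu, n, m, rng):
    """X_ij ~ N(mu_i, 1), with mu_i = mu for the first marker and 0 otherwise."""
    return [[rng.gauss(mu if i == 0 else 0.0, 1.0) for _ in range(n)] for i in range(m)]


if __name__ == "__main__":
    rng = random.Random(2005)
    data = simulate_data(mu=0.2, n=150, m=22, rng=rng)
    for name, value in cv_estimators(data, k=10, repeats=10).items():
        print(f"{name}: {value:.3f}")
```

In the setting of the text these estimators would be computed only for datasets whose overall scan is genomewide significant; the usage example applies them to a single arbitrary dataset purely to show the mechanics.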

TABLE III. Simulation results for the simple example, based on 1,000 simulated replicates, for the 20×10-fold CV and 200 bootstrap methods, including the shrinkage, out-of-sample, and weighted estimators (respectively, $\mu_N - (\mu^k_D - \mu^k_E)$, $\mu^k_E$, and $(1 - w) \cdot \mu_N + w \cdot \mu^k_E$)^a

| Parameter | Method | Shrinkage Bias | Shrinkage SE | Shrinkage $\sqrt{MSE}$ | Out-of-sample Bias | Out-of-sample SE | Out-of-sample $\sqrt{MSE}$ | Weighted Bias | Weighted SE | Weighted $\sqrt{MSE}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $\mu = 0$, $n = 150$ | CV | .0801 | .117 | .142 | .0970 | .108 | .145 | .1130 | .099 | .150 |
| | Bootstrap | .0917 | .057 | .108 | .1410 | .046 | .148 | .1834 | .037 | .187 |
| $\mu = 0$, $n = 300$ | CV | .0529 | .082 | .097 | .0652 | .075 | .100 | .0767 | .069 | .103 |
| | Bootstrap | .0619 | .040 | .074 | .0975 | .032 | .103 | .1280 | .026 | .131 |
| $\mu = .1$, $n = 150$ | CV | .0015 | .127 | .127 | .0168 | .117 | .118 | .0313 | .108 | .112 |
| | Bootstrap | .0027 | .070 | .070 | .0497 | .056 | .075 | .0910 | .046 | .102 |
| $\mu = .1$, $n = 300$ | CV | −.0102 | .095 | .095 | −.0007 | .088 | .088 | .0084 | .081 | .082 |
| | Bootstrap | −.0168 | .054 | .057 | .0146 | .044 | .046 | .0423 | .036 | .056 |
| $\mu = .2$, $n = 150$ | CV | −.0215 | .139 | .141 | −.0116 | .130 | .130 | −.0020 | .121 | .121 |
| | Bootstrap | −.0483 | .094 | .105 | −.0106 | .077 | .078 | .0243 | .064 | .069 |
| $\mu = .2$, $n = 300$ | CV | −.0152 | .094 | .096 | −.0114 | .089 | .090 | −.0078 | .084 | .084 |
| | Bootstrap | −.0451 | .081 | .093 | −.0263 | .068 | .073 | −.0075 | .058 | .059 |
| $\mu = .3$, $n = 150$ | CV | −.0197 | .131 | .132 | −.0153 | .124 | .125 | −.0110 | .117 | .118 |
| | Bootstrap | −.0624 | .118 | .133 | −.0386 | .101 | .108 | −.0142 | .086 | .087 |
| $\mu = .3$, $n = 300$ | CV | −.0024 | .066 | .066 | −.0020 | .065 | .065 | −.0017 | .064 | .064 |
| | Bootstrap | −.0198 | .080 | .083 | −.0152 | .074 | .075 | −.0091 | .067 | .068 |

^a See text for simulation details.

GENOMEWIDE LINKAGE STUDY OF AFFECTED SIB PAIRS

SIMULATION STUDY: DESIGN AND METHODS

We now consider a more realistic model in the context of genomewide linkage analysis with an ASP design. We assume that a linkage analysis is conducted via allele sharing methods with $n$ ASPs, using a map of $m = 352$ markers. Markers are fully informative and evenly spaced at $d = 10$ centi-Morgan (cM) apart across the genome with a total genome size of 3,300 cM. We use $X_{ij}$ to denote the observed number of alleles shared IBD at the $i$th marker for the $j$th ASP, and let $\pi = (\pi_0, \pi_1, \pi_2)$ be the distribution of IBD sharing. The null distribution is $\pi = (.25, .5, .25)$, thus the expected number of alleles shared IBD for an ASP is $\mu_o = 1 \times .5 + 2 \times .25 = 1$ with variance $\sigma^2_o = 1/2$ under the null hypothesis of no linkage. The IBD sharing distribution at a disease susceptibility gene locus, $\pi^g = (\pi^g_0, \pi^g_1, \pi^g_2)$, would deviate from the null distribution toward excess sharing. Let $\mu^g_a = 1 \times \pi^g_1 + 2 \times \pi^g_2$ be the expected IBD sharing at the gene locus; the excess sharing, $\mu = \mu^g_a - \mu_o$, can then be viewed as a measure of gene-effect size, the parameter of interest. Note that $\mu$ in large part determines the power of testing for linkage. We place the disease locus in the middle of chromosome 7, with marker $i = 153$ being the gene locus. We consider two sample sizes, $n = 150$ and $300$, and four alternatives (see Table V). (Note that for each of $\mu = .14$ and $.18$, the first $\pi^g = (\pi^g_0, \pi^g_1, \pi^g_2)$ distribution is consistent with a dominant disease model, and the second with a recessive disease model.)

For a given marker $i$, the test statistic is the NPL score:

$$T_i = \frac{\sqrt{n}}{\sigma_o} \left( \sum_{j=1}^{n} X_{ij}/n - \mu_o \right) = \frac{\sqrt{n}}{\sqrt{1/2}} \left( \sum_{j=1}^{n} X_{ij}/n - 1 \right).$$

Significance can be assessed using a normal approximation, $T_i \sim N(0, 1)$, and excess sharing is an indication of linkage of marker $i$ to the putative gene locus. In the context of genomewide linkage analysis, such a test would be performed for each of the $m$ markers, and the peak of the scores, $T_{\max} = T_{i^*} = \max\{T_1, \ldots, T_m\}$, used to indicate the centre of the region targeted for follow-up studies. To allow for the fact that a linkage study typically identifies a chromosomal region rather than a single point location for follow-up studies, we allow the selected marker $i^*$ to be the gene locus (i.e., $i^* = 153$), the left adjacent marker (i.e., $i^* = 152$), or the right adjacent marker (i.e., $i^* = 154$). That is, a successful linkage detection is achieved when the selected marker is at most 10 cM away from the gene locus. Requiring genomewide type I error $\alpha = .05$, we use a threshold of $z^* = 3.58$ (corresponding to LOD ≈ 2.78). To estimate $\mu = \mu^g_a - \mu_o$, $\gamma_{i^*} = \sum_{j=1}^{n} X_{i^*j}/n - 1$, the observed excess IBD sharing at marker $i^*$ averaged over all $n$ ASPs, is a natural choice.

For each case of $n$ and $\pi^g$, we simulate marker IBD sharing data, independently, for the $n$ ASPs. For each ASP, IBD sharing of markers on the same chromosome follows a first-order stationary Markov process under the Haldane no-interference model, with corresponding transition probabilities given in Table IV. For chromosomes other than 7, we first simulate IBD sharing for the first marker based on the null distribution; we then consecutively simulate data for the rest of the markers based on values in Table IV. For chromosome 7, we start with the gene locus instead, based on one of the four alternative distributions. We also consider the case where IBD sharing at the gene locus follows the null distribution (i.e., no genetic effect) to study the corresponding bias of the estimators.
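The simulation design can be sketched for a single chromosome in Python. The sketch below rests on several labeled assumptions: the chromosome length of 19 markers and the position of the gene locus are hypothetical values chosen for illustration, the recombination fraction is obtained from the Haldane map function $\theta = (1 - e^{-2d})/2$ with $d$ in Morgans, and sharing is propagated outward from the starting marker in both directions with the same transition matrix (the Table IV entries written as $\psi$ and $\phi$, with each row summing to one). The function names are hypothetical.

```python
import math
import random


def haldane_theta(d_morgans):
    """Recombination fraction for map distance d (in Morgans) under Haldane's model."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def transition_matrix(theta):
    """Sib-pair IBD transition probabilities between adjacent markers (rows: current state)."""
    psi = theta ** 2 + (1.0 - theta) ** 2
    phi = 1.0 - psi
    return [
        [psi ** 2, 2 * psi * phi, phi ** 2],
        [psi * phi, psi ** 2 + phi ** 2, psi * phi],
        [phi ** 2, 2 * psi * phi, psi ** 2],
    ]


def simulate_chromosome(n_markers, start_index, start_dist, P, rng):
    """IBD sharing (0, 1 or 2) at each marker for one sib pair."""
    states = [0, 1, 2]
    ibd = [0] * n_markers
    ibd[start_index] = rng.choices(states, weights=start_dist)[0]
    for i in range(start_index + 1, n_markers):        # walk to the right
        ibd[i] = rng.choices(states, weights=P[ibd[i - 1]])[0]
    for i in range(start_index - 1, -1, -1):           # walk to the left
        ibd[i] = rng.choices(states, weights=P[ibd[i + 1]])[0]
    return ibd


def npl_scores(ibd_by_pair):
    """NPL score T_i = sqrt(n) / sqrt(1/2) * (mean IBD sharing at marker i - 1)."""
    n = len(ibd_by_pair)
    m = len(ibd_by_pair[0])
    scale = math.sqrt(n) / math.sqrt(0.5)
    return [scale * (sum(pair[i] for pair in ibd_by_pair) / n - 1.0) for i in range(m)]


if __name__ == "__main__":
    rng = random.Random(7)
    P = transition_matrix(haldane_theta(0.10))          # 10 cM spacing
    pairs = [simulate_chromosome(19, 9, (0.16, 0.5, 0.34), P, rng) for _ in range(150)]
    scores = npl_scores(pairs)
    peak = max(range(len(scores)), key=scores.__getitem__)
    excess = scores[peak] * math.sqrt(0.5) / math.sqrt(150)   # naive excess-sharing estimate
    print(f"peak marker index={peak}, T_max={scores[peak]:.2f}, excess sharing={excess:.3f}")
```

A full genome scan would repeat this for every chromosome, starting the null chromosomes at their first marker with the null distribution (.25, .5, .25).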

TABLE IV. Transition probability matrix of IBD sharing for a sib pair^a

| Sharing at current marker | Sharing at next marker: 0 | 1 | 2 |
|---|---|---|---|
| 0 | $\psi^2$ | $2\psi\phi$ | $\phi^2$ |
| 1 | $\psi\phi$ | $\psi^2 + \phi^2$ | $\psi\phi$ |
| 2 | $\phi^2$ | $2\psi\phi$ | $\psi^2$ |

^a $\psi = \theta^2 + (1 - \theta)^2$ and $\phi = 1 - \psi = 2\theta(1 - \theta)$, and $\theta$ is the recombination fraction between the two markers.

RESULTS

Table V provides the TP (power) and FP rates of linkage detection for each of the alternative disease models considered. We first note that power of testing for linkage may change drastically with a small change in the gene-effect size. Thus, accurate estimation of the effect size is critical to a realistic assessment of power. We also note that power mainly depends on the effect size rather than on the precise IBD distribution at the gene locus. Thus, in the rest of the section, we focus on the results for the two alternatives of $\pi^g = (.18, .5, .32)$ and $(.16, .5, .34)$.

Figure 3a ($n = 150$) and b ($n = 300$) illustrate the biases and SEs of the naïve, 20×10-fold weighted CV, 200 weighted bootstrap, and independent estimators, under the null model and the two alternative models above. Results are based on 1,000 simulated replicates. We observe that results of the linkage example are extremely similar to those of the simple example. For example, the naïve estimator grossly over-estimates the effect size, especially under the null hypothesis when there is in fact no genetic effect; e.g., the sample averages of the naïve estimates are .224 for $n = 150$ and .156 for $n = 300$ when $\mu = 0$. In general, the CV method tends to have smaller bias while the bootstrap method has smaller variance. When the effect size is moderate, both methods tend to be conservative, i.e., biased downward. We also find similar results for the stratified estimates, and the shrinkage, out-of-sample, and weighted estimators compared to those of the simple case above (not shown).

TABLE V. Simulation results for the linkage example under the null hypothesis and the four alternatives considered, based on 1,000 TP or FP simulated replicates^a

| Parameter | Sample size | $\pi^g$ | Power (TP) | FP or Type I error |
|---|---|---|---|---|
| $\mu = 0.0$ | $n = 150$ | (.25, .5, .25) | | .040 |
| | $n = 300$ | (.25, .5, .25) | | .057 |
| $\mu = .14$ | $n = 150$ | (.18, .5, .32) | .119 | .046 |
| | | (.2, .46, .34) | .124 | .045 |
| | $n = 300$ | (.18, .5, .32) | .459 | .047 |
| | | (.2, .46, .34) | .462 | .047 |
| $\mu = .18$ | $n = 150$ | (.16, .5, .34) | .308 | .041 |
| | | (.2, .42, .38) | .317 | .038 |
| | $n = 300$ | (.16, .5, .34) | .812 | .023 |
| | | (.2, .42, .38) | .789 | .024 |

^a Results include empirical type I error rate under the null, and power or true positive rate (TP) and false-positive rate (FP) under the alternatives.

[Figure 3, plots not reproduced. Panels (a) and (b): bias and SE against gene-effect size (0.0, 0.14, 0.18) for the naïve, 200 bootstrap, 20×10-fold CV, and independent estimators.]

Fig. 3. Simulation results for the linkage example. Overall bias and standard error (SE) of the naïve, 20×10-fold weighted CV, 200 weighted bootstrap and independent estimators, based on 1,000 simulated replicates. Plotting symbols mark the biases of the estimators, and the vertical lines represent the corresponding SEs. a: Sample size of 150. b: Sample size of 300. See text for simulation details.

DISCUSSION

We have examined two cases in detail. The first represents a quantitative trait assumed to be normally distributed, with one marker per chromosome. The second case is more complex, based on IBD allele sharing in an ASP design and involving a 10-cM density genome scan. These studies show that resampling-based methods such as cross-validation and the bootstrap can yield nearly unbiased effect estimates in relatively modest sample sizes with weak effects without the necessity of an independent replicate sample. We also note that, in practice, it may be difficult to obtain an independent replication sample. For example, sources of families in a particular population and geographic area may have been exhausted, and the cost of collecting a whole new dataset may be prohibitive.

Our findings are consistent with those of Goring et al. [2001] concerning the severity of effect-estimate bias of the naïve method in which the sample used to detect a genetic locus is also used to estimate the genetic effect size at that locus. We agree that replication of linkage findings has a critical role in genomewide studies of complex diseases/traits, and that underpowered studies remain a significant issue. We are concerned, however, that the statement of Goring et al. [2001, page 1357] that "attempts at bias correction give unsatisfactory results, and that pointwise estimation on an independent data set may be the only way of obtaining reliable estimates of locus-specific effect" may lead investigators to adopt sample splitting strategies that are unnecessary and misleading. Our studies quantify quite clearly the degree to which sample-splitting reduces power to test for linkage and increases the uncertainty in effect-size estimates.

Overall, in the cases we considered, the bootstrap estimator generally has lower variability, but cross-validation is less biased. Both estimators tend to be conservative and over-correct the bias. Among the shrinkage, out-of-sample, and weighted estimators ($\mu_N - (\mu^k_D - \mu^k_E)$, $\mu^k_E$, and $(1 - w) \cdot \mu_N + w \cdot \mu^k_E$, respectively), the shrinkage MSE is better than the weighted MSE under the null while the latter performs better for other cases. Considering the fact that power to detect linkage in genetic studies of complex diseases/traits is generally low due to small effect size, the weighted estimator seems to be a good compromise. We also evaluated more sophisticated resampling techniques such as the .632+ bootstrap method [Efron and Tibshirani, 1997]. This estimator has the form of $(1 - w) \cdot \mu_N + w \cdot \mu^k_E$, where $w = .632/(1 - .368R)$, $R = (\mu^k_E - \mu_N)/(a - \mu_N)$ is the relative over-fitting rate, and $a$ is the no-information error rate. We approximated $a$ by the sample average of the effect size across all the markers, which is similar to the approach of Steyerberg et al. [2001]. We find that this estimator is not substantially better than the others. However, we note that its performance might be improved by precisely evaluating $a$.

In the two cases considered here, we assume a single gene model and homogeneity of the genetic model within the sampled families. Under locus heterogeneity, some families are linked to a specific marker and some are not, and linkage of a genetic marker may not be so well defined. For example, when measures of excess allele sharing are examined, the presence of unlinked families in the sample will reduce the magnitude of the test statistic and the corresponding effect size. Furthermore, the variability in these quantities associated with variability in the composition of the subsamples including a larger or smaller proportion of linked families will increase. In the absence of covariate or other information that could be used to stratify families, we anticipate that bias reduction of effect-size estimates might be less precise. In that case, one may wish to examine in the analysis the variability of marker localization performed in each of the resampling steps. We are currently investigating the behavior of our methods in the presence of locus heterogeneity.

Our resampling-based methods are designed for samples of individuals or small families. In the case that only one single family or a few families of uneven pedigree sizes represent most of the data, the proposed cross-validation and bootstrap methods are not suitable and alternative approaches such as those based on specification of a genetic model may be more appropriate. The extension of resampling methods to multilocus models also presents some interesting issues for the detection of multiple genetic loci and simultaneous estimation of multiple effect sizes. Depending on the criteria used, the loci selected in the original sample might be based on all markers with test statistics that meet genomewide significance, or the top ranked markers (say the top five) chosen according to the observed test statistics. Implementation of methods for these designs will require a variable selection step to be incorporated into the testing phase. This is the subject of ongoing research.

The problem of over-estimating locus-specific effect size is inherent in genomewide studies regardless of the specific methods used. The proposed resampling based estimation methods can be readily applied to any given study as long as the initial methods for hypothesis testing for linkage/association and parameter estimation of the effect size are defined. For example, in the linkage example above, suppose one uses the exponential model of Kong and Cox [1997] in the initial study; then the test statistic is the log-likelihood ratio, and the parameter of interest is $\delta$ [Kong and Cox, 1997], which indirectly measures the amount of excess sharing ($\delta = 0$ under the null and $\delta > 0$ for the alternative). To obtain the CV and bootstrap estimates, within each resampling replicate, one would need to perform the corresponding likelihood ratio test in the detection sample, and obtain corresponding estimates of $\delta$ in both the detection and estimation samples to construct the shrinkage, out-of-sample or weighted estimates. In conclusion, our method has a very general framework, not limited to any particular study design, and initial results show promise and indicate that resampling methods in general can accurately and efficiently navigate the effect-size estimation phase of a genome scan for complex diseases and quantitative traits.
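The .632+ weight mentioned in the Discussion, $w = .632/(1 - .368R)$ with $R = (\mu^k_E - \mu_N)/(a - \mu_N)$, can be illustrated with a few lines of Python. This is a sketch only: the function and parameter names are hypothetical, the value supplied for the no-information rate $a$ is left to the caller, and restricting $R$ to the interval [0, 1] is an assumption added here so that $w$ stays between .632 and 1.

```python
def weight_632_plus(mu_naive, mu_out, no_info):
    """Weight w = .632 / (1 - .368 R), with R = (mu_E - mu_N) / (a - mu_N)."""
    denom = no_info - mu_naive
    r = (mu_out - mu_naive) / denom if denom != 0 else 0.0
    r = min(max(r, 0.0), 1.0)   # assumption: keep the over-fitting rate within [0, 1]
    return 0.632 / (1.0 - 0.368 * r)


def weighted_estimate(mu_naive, mu_out, w):
    """Weighted estimator (1 - w) * mu_N + w * mu_E."""
    return (1.0 - w) * mu_naive + w * mu_out


if __name__ == "__main__":
    mu_n, mu_e, a = 0.25, 0.10, 0.0          # illustrative values only
    w = weight_632_plus(mu_n, mu_e, a)
    print(f"w={w:.3f}, weighted estimate={weighted_estimate(mu_n, mu_e, w):.3f}")
```

When the out-of-sample value equals the naïve value ($R = 0$) the weight reduces to the ordinary .632, and when the out-of-sample value falls all the way to the no-information level ($R = 1$) the weight is 1, so the estimate relies entirely on the out-of-sample value.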

ACKNOWLEDGMENTS

This work was supported by a grant from the Canadian Institutes of Health Research (CIHR) to S.B.B. and L.S., by grants from the Natural Science and Engineering Research Council of Canada (NSERC) to L.S. and to S.B.B. S.B.B. holds a Senior Investigator Award from the Canadian Institutes of Health Research. The authors thank Professor David Andrews, Professor Radu Craiu, and Longyang Wu for their useful suggestions and comments. The authors also thank two anonymous reviewers for their careful review of the manuscript and various helpful comments.

REFERENCES

Allison D, Fernandez JR, Heo M, Zhu S, Etzel C, Beasley TM, Amos CI. 2002. Bias in estimates of quantitative-trait-locus effect in genome scans: demonstration of the phenomenon and a method-of-moments procedure for reducing bias. Am J Hum Genet 70:575–585.

Almasy L, Blangero J. 1998. Multipoint quantitative-trait linkage analyses in general pedigrees. Am J Hum Genet 62:1198–1211.

Ambroise C, McLachlan GJ. 2002. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci 99:6562–6566.

Amos CI. 1994. Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet 54:535–543.

Editorial. 1999. Freely associating. Nature Genetics 22:1–2.

Efron B. 1983. Estimating the error rate of a prediction rule: some improvements on cross-validation. J Am Stat Assoc 78:316–331.

Efron B, Tibshirani R. 1997. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 92:438–548.

Elston RC, Buxbaum S, Jacobs KB, Olson JM. 2000. Haseman and Elston revisited. Genet Epidemiol 19:1–17.

Goring H, Terwilliger JD, Blangero J. 2001. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet 69:1357–1369.

Kong A, Cox NJ. 1997. Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet 61:1179–1188.

Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES. 1996. Parametric and non-parametric linkage analysis: a unified approach. Am J Hum Genet 58:1347–1363.

Lander E, Kruglyak L. 1995. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genet 11:241–247.

Siegmund D. 2002. Upward bias in estimation of genetic effects. Am J Hum Genet 71:1183–1188.

Steyerberg EW, Eijkemans MJC, Harrell FE Jr, Habbema JDF. 2000. Prognostic modeling with logistic regression analysis: a comparison of selection and estimation methods in small datasets. Stat Med 19:1059–1079.

Suarez B, Hampe CL, Eerdewegh PV. 1994. Problems of replicating linkage claims in psychiatry. In: Gershon ES, Cloninger CR, editors. Genetic approaches to mental disorders. Washington, DC: American Psychiatric Press. p 23–46.

APPENDIX

In this Appendix, we derive expressions for the biases and variances of the naïve and independent estimators of gene-effect size under the simple model described in the text. We also derive the expressions for the power or true positive rate (TP) and false-positive rate (FP) of testing for linkage. Given the model, $T_1 \sim N(\mu, 1/n)$, and $T_i \sim N(0, 1/n)$ for $i = 2, \ldots, m$. Under the null hypothesis that $\mu = 0$, the pdf of $T_{\max} = \max\{T_1, \ldots, T_m\}$ is $f_{T_{\max}} = m \cdot f_o \cdot (F_o)^{m-1}$, where $f_o$ and $F_o$ are, respectively, the pdf and cdf of $N(0, 1/n)$. Let $z^*$ be the appropriate threshold for nominal type I error rate of $\alpha$.

NAIVE ESTIMATOR

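Given a significant test result, i.e., $T_{\max} = T_{i^*} > z^*$, the naïve estimator of parameter $\mu$, the gene-effect size, would be $\mu_N = \gamma_{i^*} = T_{\max}$. Under the null hypothesis, the bias of the naïve estimator given in equation (1) can be calculated as follows:

$$\mathrm{Bias}^{H_o}_N = E_{H_o}[\mu_N] - 0 = E[T_{\max} \mid T_{\max} > z^*] = \frac{\int_{z^*}^{\infty} t\, f_{T_{\max}}(t)\, dt}{P(T_{\max} > z^*)} = \frac{\int_{z^*}^{\infty} t\, f_{T_{\max}}(t)\, dt}{\alpha}.$$

The variance of the naïve estimator given as the SE in equation (2) can be calculated as

$$V^{H_o}_N = \mathrm{Var}_{H_o}(\mu_N) = E\left[(T_{\max} - E_{H_o}[\mu_N])^2 \mid T_{\max} > z^*\right] = \frac{\int_{z^*}^{\infty} (t - E_{H_o}[\mu_N])^2 f_{T_{\max}}(t)\, dt}{\alpha}.$$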

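Under a particular alternative hypothesis, $\mu$ ($> 0$), the distribution of $T_{\max}$ is rather complex. We consider $T_1$ and $Y = \max\{T_2, \ldots, T_m\}$ separately. The expectation of the naïve estimator can be calculated as

$$E_{H_a}[\mu_N] = E[T_{\max} \mid T_{\max} > z^*] = \frac{\int_{z^*}^{\infty} t\, f_{T_{\max}}(t)\, dt}{P(T_{\max} > z^*)}$$

$$= \frac{\int_{z^*}^{\infty} t \left( \int_{-\infty}^{t} f_Y(y)\, dy \right) f_{T_1}(t)\, dt + \int_{z^*}^{\infty} y \left( \int_{-\infty}^{y} f_{T_1}(t)\, dt \right) f_Y(y)\, dy}{P(T_{\max} > z^*, T_{\max} = T_1) + P(T_{\max} > z^*, T_{\max} = Y)}$$

$$= \frac{\int_{z^*}^{\infty} t\, F_Y(t) f_{T_1}(t)\, dt + \int_{z^*}^{\infty} y\, F_{T_1}(y) f_Y(y)\, dy}{TP + FP},$$

where $f_{T_1}$ and $F_{T_1}$ are, respectively, the pdf and cdf of $N(\mu, 1/n)$, and $f_Y$ and $F_Y$ are, respectively, the pdf and cdf of $Y$, $f_Y = (m - 1) \cdot f_o \cdot (F_o)^{m-2}$. TP and FP are the true and false-positive rates of testing for linkage given in equations (5) and (6), derived as follows:

$$TP\ (\mathrm{Power}) = P(T_{\max} > z^*, T_{\max} = T_1) = P(T_1 > z^*, T_1 > Y)$$

$$= \int_{-\infty}^{z^*} P(T_1 > z^*, T_1 > y) f_Y(y)\, dy + \int_{z^*}^{\infty} P(T_1 > z^*, T_1 > y) f_Y(y)\, dy$$

$$= \int_{-\infty}^{z^*} \left( \int_{z^*}^{\infty} f_{T_1}(t)\, dt \right) f_Y(y)\, dy + \int_{z^*}^{\infty} \left( \int_{y}^{\infty} f_{T_1}(t)\, dt \right) f_Y(y)\, dy$$

$$= (1 - F_{T_1}(z^*)) F_Y(z^*) + \int_{z^*}^{\infty} (1 - F_{T_1}(y)) f_Y(y)\, dy,$$

$$FP = P(T_{\max} > z^*, T_{\max} = Y) = P(Y > z^*, Y > T_1) = \int_{z^*}^{\infty} \left( \int_{-\infty}^{y} f_{T_1}(t)\, dt \right) f_Y(y)\, dy = \int_{z^*}^{\infty} F_{T_1}(y) f_Y(y)\, dy.$$

We note that Maple 8 cannot perform the integral $\int_{z^*}^{\infty} t\, F_Y(t) f_{T_1}(t)\, dt$. As an alternative, we approximate $P(T_1 > z^*, T_1 > Y)$ by $P(T_1 > z^*)$, which is equivalent to $F_Y(t) \approx 1$ for $t > z^*$. This approximation is reasonable because the event that $Y > T_1$ conditional on $T_1 > z^*$ is unlikely under the alternative model. In that case, $\int_{z^*}^{\infty} t\, F_Y(t) f_{T_1}(t)\, dt$ simplifies to $\int_{z^*}^{\infty} t\, f_{T_1}(t)\, dt$, which can be handled by Maple 8. This yields the working expression for the bias of the naïve estimator given in equation (3). The variance of the naïve estimator under the alternative model can be calculated in a similar fashion, yielding equation (4).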
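The TP and FP expressions above can also be evaluated by elementary numerical quadrature. The following Python sketch uses a composite trapezoid rule and the standard-library normal distribution. It assumes $m = 22$ markers in the usage example (an assumption, not a value stated in this Appendix), truncates the upper limit of integration at ten standard deviations beyond the threshold, and uses hypothetical function names.

```python
import math
from statistics import NormalDist


def tp_fp(mu, n, m, alpha=0.05, steps=4000):
    """True- and false-positive rates of the genomewide test under the simple model."""
    sd = 1.0 / math.sqrt(n)
    null = NormalDist(0.0, sd)                      # distribution of each null statistic
    linked = NormalDist(mu, sd)                     # distribution of T_1
    z_star = null.inv_cdf((1.0 - alpha) ** (1.0 / m))

    def f_y(y):                                     # pdf of Y, the max of the m - 1 null statistics
        return (m - 1) * null.pdf(y) * null.cdf(y) ** (m - 2)

    def big_f_y(y):                                 # cdf of Y
        return null.cdf(y) ** (m - 1)

    upper = max(z_star, mu) + 10.0 * sd             # truncation point for the infinite integrals
    h = (upper - z_star) / steps

    def integrate(g):                               # composite trapezoid rule on [z*, upper]
        total = 0.5 * (g(z_star) + g(upper))
        for s in range(1, steps):
            total += g(z_star + s * h)
        return total * h

    tp = (1.0 - linked.cdf(z_star)) * big_f_y(z_star) + integrate(
        lambda y: (1.0 - linked.cdf(y)) * f_y(y))
    fp = integrate(lambda y: linked.cdf(y) * f_y(y))
    return z_star, tp, fp


if __name__ == "__main__":
    z_star, tp, fp = tp_fp(mu=0.2, n=150, m=22)
    print(f"z*={z_star:.3f}, TP={tp:.3f}, FP={fp:.3f}")
```

A useful internal check is that TP + FP must equal $P(T_{\max} > z^*) = 1 - F_{T_1}(z^*) F_Y(z^*)$, which does not involve any integration.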


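The conditional naïve estimators stratified by TP and FP can be calculated as

$$E_{H_a}[\mu_N \mid TP] = E[T_{\max} \mid T_{\max} > z^*, T_{\max} = T_1] = \frac{\int_{z^*}^{\infty} t \left( \int_{-\infty}^{t} f_Y(y)\, dy \right) f_{T_1}(t)\, dt}{P(T_{\max} > z^*, T_{\max} = T_1)} = \frac{\int_{z^*}^{\infty} t\, F_Y(t) f_{T_1}(t)\, dt}{TP},$$

$$E_{H_a}[\mu_N \mid FP] = E[T_{\max} \mid T_{\max} > z^*, T_{\max} = Y] = \frac{\int_{z^*}^{\infty} y \left( \int_{-\infty}^{y} f_{T_1}(t)\, dt \right) f_Y(y)\, dy}{P(T_{\max} > z^*, T_{\max} = Y)} = \frac{\int_{z^*}^{\infty} y\, F_{T_1}(y) f_Y(y)\, dy}{FP}.$$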

INDEPENDENT ESTIMATOR

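We assume that an independent replication sample of the same size as the original sample is available, $X_{ij}$, $i = 1, \ldots, m$ and $j = n+1, \ldots, 2n$. For the marker $i^*$ selected by the naïve method in the original sample, the independent estimator, $\mu_I$, of the gene-effect size is $\gamma^I_{i^*} = \sum_{j=n+1}^{2n} X_{i^*j}/n = T^I_{i^*}$. Under the null model that $\mu = 0$, $T^I_{i^*}$ is obviously unbiased, $\mathrm{Bias}^{H_o}_I = E_{H_o}[\mu_I] - 0 = 0$, and $V^{H_o}_I = \mathrm{Var}_{H_o}(\mu_I) = 1/n$.

Under the alternative model that $\mu > 0$, the bias of the independent estimator given in equation (9) can be calculated as

$$\mathrm{Bias}^{H_a}_I = E_{H_a}[\mu_I] - \mu$$

$$= E[T^I_{i^*} \mid i^* = 1] \cdot P(T_{\max} = T_1 \mid T_{\max} > z^*) + E[T^I_{i^*} \mid i^* \neq 1] \cdot P(T_{\max} = Y \mid T_{\max} > z^*) - \mu$$

$$= \mu \cdot P(T_{\max} = T_1 \mid T_{\max} > z^*) + 0 \cdot P(T_{\max} = Y \mid T_{\max} > z^*) - \mu$$

$$= \mu \cdot \frac{P(T_{\max} > z^*, T_{\max} = T_1)}{P(T_{\max} > z^*, T_{\max} = T_1) + P(T_{\max} > z^*, T_{\max} = Y)} - \mu$$

$$= \mu \cdot \frac{TP}{TP + FP} - \mu = -\mu \cdot \frac{FP}{TP + FP}.$$

The variance of the independent estimator given as the SE in equation (10) can be calculated as

$$V^{H_a}_I = \mathrm{Var}_{H_a}(\mu_I)$$

$$= E\left[(T^I_{i^*} - E_{H_a}[\mu_I])^2 \mid i^* = 1\right] \cdot P(T_{\max} = T_1 \mid T_{\max} > z^*) + E\left[(T^I_{i^*} - E_{H_a}[\mu_I])^2 \mid i^* \neq 1\right] \cdot P(T_{\max} = Y \mid T_{\max} > z^*)$$

$$= E\left[(T^I_{i^*} - \mu + \mu - E_{H_a}[\mu_I])^2 \mid i^* = 1\right] \cdot \frac{TP}{TP + FP} + E\left[(T^I_{i^*} - 0 + 0 - E_{H_a}[\mu_I])^2 \mid i^* \neq 1\right] \cdot \frac{FP}{TP + FP}$$

$$= \left\{ \mathrm{Var}(T^I_{i^*}) + (E_{H_a}[\mu_I] - \mu)^2 \right\} \cdot \frac{TP}{TP + FP} + \left\{ \mathrm{Var}(T^I_{i^*}) + (E_{H_a}[\mu_I] - 0)^2 \right\} \cdot \frac{FP}{TP + FP}$$

$$= \frac{1}{n} + (E_{H_a}[\mu_I] - \mu)^2 \cdot \frac{TP}{TP + FP} + (E_{H_a}[\mu_I])^2 \cdot \frac{FP}{TP + FP}$$

$$= \frac{1}{n} + \mu^2 \cdot \frac{TP \cdot FP}{(TP + FP)^2}.$$

The conditional independent estimators stratified by TP and FP are obviously unbiased estimators, respectively, of the true gene-effect size and the zero gene-effect size, $E_{H_a}[\mu_I \mid TP] = \mu$, and $E_{H_a}[\mu_I \mid FP] = 0$.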
