Bootstrap, Jackknife and cross-validation
October 8, 2013

Transcript of "Bootstrap, Jackknife and cross-validation" (34 pages)

Page 1: Bootstrap, Jackknife and cross-validation (title slide)

Page 2: Bootstrap, jackknife and cross-validation

- Computer-intensive resampling methods of statistical inference (point estimation, standard error, confidence interval, model selection, hypothesis testing)

- Jackknife: for eliminating the bias of an estimate

- Bootstrap: for estimating the standard error of an estimate and for computing confidence intervals for population parameters

- Cross-validation: for model selection

Page 3: Jackknife

- Quenouille (1949) introduced the method in order to eliminate bias in an estimate

- Tukey (1958) suggested that Quenouille's method could be used as a non-parametric method of estimating the mean and variance of an estimate → hence the term jackknife, to signify an all-purpose statistical tool

Page 4: Jackknife algorithm for bias reduction

1. Estimate θ according to some appropriate algorithm (for example ML or LS) to obtain an estimate $\hat{\theta}$.

2. Delete an observation $i$ from the data and recalculate the estimate of θ from the remaining $n - 1$ observations. The estimate is denoted by $\hat{\theta}_{-i}$.

3. Calculate the "pseudovalue"

$$S_i = n\hat{\theta} - (n - 1)\hat{\theta}_{-i}$$

4. Repeat steps 2 and 3 until you have gone through all observations.

5. The jackknife estimate for θ is the mean of the pseudovalues (a code sketch follows), namely

$$\tilde{\theta} = \frac{1}{n}\sum_{i=1}^{n} S_i = n\hat{\theta} - \frac{n-1}{n}\sum_{i=1}^{n}\hat{\theta}_{-i}.$$
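The five steps above translate directly into code. Below is a minimal sketch in Python/NumPy, not from the original slides; the helper name `jackknife` and the choice of the plug-in variance as the example estimator are illustrative assumptions.

```python
import numpy as np

def jackknife(data, estimator):
    """Delete-one jackknife: bias-reduced estimate of a parameter."""
    data = np.asarray(data)
    n = len(data)
    theta_hat = estimator(data)  # step 1: estimate from the full sample
    # Steps 2-4: leave-one-out estimates theta_{-i} and the pseudovalues S_i
    theta_minus = np.array([estimator(np.delete(data, i)) for i in range(n)])
    pseudovalues = n * theta_hat - (n - 1) * theta_minus
    # Step 5: the jackknife estimate is the mean of the pseudovalues
    return pseudovalues.mean()

# For the biased plug-in variance (divisor n), the jackknife reproduces
# the unbiased sample variance (divisor n - 1) exactly.
x = np.random.default_rng(1).normal(size=20)
print(jackknife(x, np.var), np.var(x, ddof=1))  # the two values agree
```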

Page 5: Conclusions

- Computationally easy

- Introduced for bias elimination but often also used for estimating standard errors

- Here we introduced the so-called "delete-one jackknife". By deleting more than one observation at a time, higher-order jackknife estimators can be constructed.

Page 6: Bootstrap

- Resampling with replacement, introduced as a general method by Efron (1979)

- The origin of the name bootstrap: someone pulling himself out of the mud by his bootstraps

- Mostly used to find confidence intervals for population parameters

Page 7: Idea

- In the absence of any other knowledge about a population, the distribution of values found in a random sample of size n from the population is the best guide to the distribution in the population

- To approximate what would happen if the population were resampled, we resample the data

- The sampling is done with replacement

Page 8: Bootstrap in point estimation

1. We have a set of n observations from which we estimate a parameter θ; the estimate is denoted by $\hat{\theta}$.

2. Generate a bootstrap sample: randomly sample with replacement n observations from the original data set and calculate the estimate of θ, denoted by $\hat{\theta}^*_i$.

3. Repeat step 2 many times, say B, and obtain the estimates $\hat{\theta}^*_1, \hat{\theta}^*_2, \dots, \hat{\theta}^*_B$ (see the sketch below).
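As a sketch of steps 1-3 (an assumption, not code from the slides): the resampling loop in Python/NumPy, with the estimator passed in as a function and `bootstrap_replicates` a hypothetical helper name.

```python
import numpy as np

def bootstrap_replicates(data, estimator, B=1000, seed=None):
    """Return the B bootstrap estimates theta*_1, ..., theta*_B."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Step 2, repeated B times: draw n observations with replacement
    # from the original data and recompute the estimate.
    idx = rng.integers(0, n, size=(B, n))
    return np.array([estimator(data[rows]) for rows in idx])

# Example: 1000 bootstrap replicates of the sample mean
x = np.random.default_rng(2).exponential(size=50)
theta_star = bootstrap_replicates(x, np.mean, seed=3)
```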

Page 9: Point estimation (continued)

- The simplest bootstrap estimate for θ would be the mean of the bootstrap estimates,

$$\bar{\theta}^* = \frac{1}{B}\sum_{i=1}^{B} \hat{\theta}^*_i$$

- Bias-adjusted bootstrap estimate: If $\hat{\theta}$ is biased for θ, then $\bar{\theta}^*$ will itself be biased because it estimates $\hat{\theta}$ rather than θ. The bias is defined as

$$\mathrm{Bias}(\hat{\theta}) = \mathrm{E}(\hat{\theta}) - \theta$$

and it can be estimated by

$$\widehat{\mathrm{Bias}}(\hat{\theta}) = \bar{\theta}^* - \hat{\theta}.$$

By combining the two formulae above we get the bias-adjusted bootstrap estimate (sketched in code below)

$$\hat{\theta}_{BA} = \hat{\theta} - \widehat{\mathrm{Bias}}(\hat{\theta}) = \hat{\theta} - \bar{\theta}^* + \hat{\theta} = 2\hat{\theta} - \bar{\theta}^*.$$
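A minimal self-contained sketch of the bias adjustment; the toy data and the sample mean as the estimator are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(size=50)               # toy sample, for illustration only
theta_hat = x.mean()                       # the original estimate
# B = 1000 bootstrap replicates of the estimator
theta_star = np.array([rng.choice(x, size=x.size).mean() for _ in range(1000)])
theta_bar_star = theta_star.mean()         # simplest bootstrap estimate
bias_hat = theta_bar_star - theta_hat      # estimated bias of theta_hat
theta_ba = 2 * theta_hat - theta_bar_star  # bias-adjusted estimate
```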

Page 10: Estimating standard error

- To be able to compute the standard error of an estimate analytically,
  - the sampling distribution of the statistic has to be known
  - the exact formula for the standard error has to be known

- The standard error of the estimate $\hat{\theta}$ can always be estimated by the sample standard deviation of the B bootstrap replications (see the sketch below),

$$\mathrm{se}_B = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\hat{\theta}^*_i - \bar{\theta}^*\right)^2},$$

where $\bar{\theta}^*$ is the mean of the bootstrap estimates (or by using the bias-adjusted estimates).

- Remark: To obtain a good estimate for the standard error of the estimate, 100 bootstrap replicates may be enough, but in order to get good confidence intervals, a much larger number of replicates (say 1000) is needed.
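In code, se_B is just the sample standard deviation of the replicates; with NumPy, `ddof=1` gives the B − 1 divisor used above. A sketch, assuming the replicates come from a resampling loop such as the earlier one:

```python
import numpy as np

def bootstrap_se(theta_star):
    """Bootstrap standard error from an array of B bootstrap replicates."""
    return np.std(theta_star, ddof=1)  # sqrt(sum((theta*_i - mean)^2) / (B - 1))
```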

Page 11: Example: Study of snow geese

- Aim: To study the feeding habits of baby snow geese

- Data collection: 33 goslings were kept without food until their guts were empty and then they were allowed to feed for 6 hours on a diet of plants. The change in the weight of each gosling after 2.5 hours was recorded as a percentage of initial weight. Digestion efficiency (measured as a percentage) was also recorded.

- Question: Are the weight change and digestion efficiency correlated?

Page 12: Example: Data

[Figure: scatter plot "Baby snow geese study" of digestion efficiency (%) against weight change (%).]

Page 13: Example: correlation coefficient

- The computed correlation coefficient is r = 0.309. How accurate is this estimate?

- Compute the standard error (s.e.) of the correlation coefficient. If it is small, the estimate is accurate.

- There is no formula for the s.e. of the correlation coefficient, and therefore it is hard to assess the accuracy of the estimate → bootstrap (or jackknife)

Page 14: Example: Bootstrap standard error

We draw a bootstrap sample, a random sample of size n (here 33), with replacement from the observed pairs of values, and calculate the correlation coefficient from this sample, say $r^*_1$. This is repeated a number of times, here B = 1000. The bootstrap estimate of the standard error of the correlation coefficient is then

$$\mathrm{se}(r) = \sqrt{\frac{1}{999}\sum_{i=1}^{1000}\left(r^*_i - \bar{r}^*\right)^2} = 0.191,$$

where $\bar{r}^*$ is the mean of the bootstrap estimates of the correlation coefficient.
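A sketch of this calculation; the arrays `weight_change` and `digestion` stand in for the 33 observed pairs, which are not reproduced on the slides, so the placeholder values below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)
# Placeholders for the 33 observed pairs (the real data are not shown here)
weight_change = rng.normal(0.0, 3.0, size=33)
digestion = rng.normal(25.0, 12.0, size=33)

B, n = 1000, 33
r_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # resample the *pairs* with replacement
    r_star[b] = np.corrcoef(weight_change[idx], digestion[idx])[0, 1]
se_r = r_star.std(ddof=1)             # bootstrap standard error of r
```

Note that whole pairs are resampled, which preserves the pairing between the two variables that the correlation coefficient measures.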

Page 15: Confidence intervals

- Several methods exist to construct bootstrap confidence intervals

- Different methods do not necessarily give the same interval, particularly if the distribution is highly skewed

- It is difficult to know beforehand which method to use, and one may have to perform a simulation study in order to choose a suitable method

Page 16: Method 1: Standard bootstrap interval

If we can assume that the estimator is (approximately) normally distributed, i.e. $\hat{\theta} \sim N(\theta, \sigma)$, a $100(1-\alpha)\%$ confidence interval for θ is the "usual" one, i.e.

$$\hat{\theta} - z_{\alpha/2}\,\sigma < \theta < \hat{\theta} + z_{\alpha/2}\,\sigma,$$

where $z_{\alpha/2}$ is the point for which $\mathrm{Prob}(Z \ge z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0, 1)$.

Page 17: Method 1: Standard bootstrap interval (continued)

With the standard bootstrap confidence interval, σ is estimated by the standard deviation of the estimates of the parameter θ that are found by bootstrap resampling of the values in the original sample of data. The interval is then

$$\text{Estimate} \pm z_{\alpha/2} \cdot (\text{Bootstrap standard deviation}).$$

For example, using $z_{\alpha/2} = 1.96$ gives the standard 95% bootstrap interval (see the sketch below).
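A sketch of the standard interval; the function name is illustrative, and SciPy is assumed for the normal quantile.

```python
import numpy as np
from scipy.stats import norm

def standard_bootstrap_ci(theta_hat, theta_star, alpha=0.05):
    """Standard (normal-theory) bootstrap interval: estimate +/- z * se_B."""
    z = norm.ppf(1 - alpha / 2)        # z_{alpha/2}, e.g. 1.96 for alpha = 0.05
    se_b = np.std(theta_star, ddof=1)  # bootstrap standard deviation
    return theta_hat - z * se_b, theta_hat + z * se_b
```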

Page 18: Method 1: Requirements

It is appropriate to calculate the standard bootstrap interval if

(a) $\hat{\theta}$ is approximately normally distributed

(b) $\hat{\theta}$ is unbiased (its mean value for repeated samples from the population of interest is θ)

(c) bootstrap resampling gives a good approximation to σ

Under the conditions above the method is potentially useful whenever an alternative method for approximating σ is not easily available. In practice it may be possible to avoid requirement (b) by estimating the bias in $\hat{\theta}$ as part of the bootstrap procedure.

Page 19: Example (continued)

A 95% bootstrap confidence interval for the correlation coefficient is

$$\hat{\theta} \pm z_{\alpha/2}\,\mathrm{se}_B = 0.309 \pm 1.96 \cdot 0.191 = 0.309 \pm 0.374,$$

giving us the interval $[-0.065, 0.683]$.

According to this result there is no significant correlation between the weight change and digestion efficiency of snow geese (the confidence interval contains 0).

Page 20: Example: Bootstrap distribution of r

[Figure: histogram of the B = 1000 bootstrap replicates of the correlation coefficient; x-axis: correlation coefficient (−0.2 to 0.8), y-axis: frequency (0 to 200).]

Page 21: Method 2: Percentile bootstrap interval

- The distribution of bootstrap replicates is a description of the distribution of the parameter of interest.

- Rather than using the normality of the estimator, we can use the distribution itself to assign the lower and upper confidence limits.

- The bootstrap replicates are ranked from lowest to highest (assuming no ties) and the lower and upper limits are chosen as follows (see the sketch below):
  - Lower limit: choose the bootstrap replicate below which $100\,\alpha/2\,\%$ of the replicates lie
  - Upper limit: choose the bootstrap replicate above which $100\,\alpha/2\,\%$ of the replicates lie

- If the bootstrap distribution is symmetric then the limits will be symmetric about the estimate, otherwise asymmetric.
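A sketch of the percentile interval using NumPy's empirical percentiles; the function name is illustrative.

```python
import numpy as np

def percentile_bootstrap_ci(theta_star, alpha=0.05):
    """Percentile bootstrap interval from the ranked bootstrap replicates."""
    lower = np.percentile(theta_star, 100 * alpha / 2)        # 100*alpha/2 % below
    upper = np.percentile(theta_star, 100 * (1 - alpha / 2))  # 100*alpha/2 % above
    return lower, upper
```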

Page 22: Example: Percentile intervals

The new confidence interval for the correlation coefficient is $[-0.042, 0.589]$.

[Figure: histogram of the bootstrap replicates of the correlation coefficient, with the estimate and the standard and percentile interval limits marked.]

Page 23: Other bootstrap confidence intervals

- Standard bootstrap confidence intervals have good theoretical coverage probabilities but tend to be erratic in actual practice

- Percentile intervals are less erratic, but have less satisfactory coverage properties

- Other possibilities:
  - Bias-corrected percentile method (corrects for the initial sample being a biased sample of the distribution)
  - Accelerated and bias-corrected percentile method (as the bias-corrected method, but allows the standard error of $\hat{\theta}$ to vary)

Page 24: Conclusions

- Bootstrap is mostly used for estimating the standard error of an estimate or for finding confidence intervals for parameters

- It can also be used in hypothesis testing

- The jackknife can be thought of as an approximation of the bootstrap, but it is less effective than the bootstrap (only n samples)

- There is also a parametric version of the bootstrap (fit a parametric model to the data and use this model to produce bootstrap samples)

Page 25: Cross-validation

- Originally suggested by Kurtz (1948)

- Extended to double cross-validation by Mosier (1951)

- The original objective was to verify the replicability of results, i.e. to find out whether a result is replicable or just obtained by chance

Page 26: Prediction error

- Measures how well a model predicts the response value of a future observation

- Used for model selection

- In regression models, the prediction error is defined as the expected squared difference between a future response and the prediction of the model, i.e.

$$\mathrm{E}(y - \hat{y})^2$$

Page 27: Linear regression model

- We have a set of points $\{(x_i, y_i)\}$, $i = 1, \dots, n$

- We are interested in the unknown relation between $y_i$ and $x_i$, e.g.

$$y_i = \sum_{j=0}^{p} \beta_j x_i^j + \varepsilon_i$$

- The error terms $\varepsilon_i$ are often assumed to be independent and $N(0, \sigma^2)$ distributed (a fitting sketch follows)
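As a sketch, a polynomial model of this form can be fitted by least squares with NumPy; the toy data below are an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 10)
y = 1.0 + 0.5 * x + 0.2 * x**2 + rng.normal(0, 2, size=x.size)  # toy data

beta_lin = np.polyfit(x, y, deg=1)   # first-order (straight-line) fit
beta_quad = np.polyfit(x, y, deg=2)  # quadratic fit
y_hat = np.polyval(beta_quad, x)     # fitted values from the quadratic model
```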

Page 28: Model selection problem

We fit a first-order linear model (left) and a quadratic model (right) to the same data (n = 10).

[Figure: scatter plots of y against x (x from 0 to 10, y from 0 to 35) with the fitted straight line shown on the left and the fitted quadratic curve on the right.]

Page 29: Which model is the best?

- I.e. how well are we going to predict future data drawn from the same distribution?

- As a prediction error, we can compute the average

$$\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

where $\hat{y}_i$ is the prediction by the model (either first-order or quadratic)

- Here, $y_i$ is taken from the sample in hand, and is not a future response.

- What is the prediction error for a new sample? How can we get a new sample to compute this prediction error?

Page 30: Cross-validation

- To obtain a more realistic estimate of the prediction error, we need a test sample that is different from the training sample.

- One can, for example, split the observations $\{(x_i, y_i)\}$, $i = 1, \dots, n$, into two sets, one for training and one for testing → cross-validation

Page 31: Cross-validation (continued)

- Simple cross-validation: Divide the data into two groups, one for training and one for testing. The parameters β are estimated from the training set. The cross-validation estimate is the prediction error computed using the test set (see the sketch below).

- Double cross-validation: Models are estimated for both sub-samples, and then both equations are used to generate cross-validation errors.
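A minimal sketch of simple cross-validation for the polynomial regression setting above; the helper name and the 50/50 split are assumptions.

```python
import numpy as np

def simple_cv_error(x, y, deg, train_frac=0.5, seed=0):
    """Fit on a random training half, return squared prediction error on the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    cut = int(train_frac * len(x))
    train, test = idx[:cut], idx[cut:]
    beta = np.polyfit(x[train], y[train], deg)   # estimate beta on the training set
    resid = y[test] - np.polyval(beta, x[test])  # predict the held-out test set
    return np.mean(resid**2)                     # cross-validation prediction error
```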

Page 32: K-fold cross-validation

- The data are divided into K subsets (how many depends on n)

- For each subset i (repeat K times):
  - Estimate the model based on the remaining K − 1 subsets (without subset i)
  - Compute the prediction error for subset i

- Compute the average of the K prediction errors as the overall estimate of the prediction error (see the sketch below)

- K large: the prediction error can be estimated accurately, but the variance of the estimator may be large.

- For large data sets, even 3-fold cross-validation is accurate. For small data sets, we may have to use leave-one-out cross-validation, where K = n.
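A sketch of K-fold cross-validation for the same polynomial regression setting; the helper name is an assumption.

```python
import numpy as np

def kfold_cv_error(x, y, deg, K=5, seed=0):
    """K-fold cross-validation estimate of the prediction error E(y - y_hat)^2."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), K)  # K roughly equal subsets
    errors = []
    for i in range(K):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(K) if j != i])
        beta = np.polyfit(x[train], y[train], deg)   # fit without subset i
        resid = y[test] - np.polyval(beta, x[test])  # prediction error on subset i
        errors.append(np.mean(resid**2))
    return np.mean(errors)                           # average over the K folds

# Leave-one-out cross-validation is the special case K = n:
#   kfold_cv_error(x, y, deg=1, K=len(x))
```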

Page 33: Conclusions

- Used for model selection in regression

- Can also be used e.g. in classification ($y_i$ a label indicating the class, and the prediction error is defined as the misclassification rate)

Page 34: References

- Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics 7, 1-26.

- Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall.

- Kurtz, A.K. (1948). A research test of the Rorschach Test. Personnel Psychology: A Journal of Applied Research 1, 41-51.

- Mosier, C.I. (1951). Problems and designs of cross-validation. Educational and Psychological Measurement 11, 5-11.

- Quenouille, M. (1949). Approximate tests of correlation in time series. Journal of the Royal Statistical Society B 11, 18-44.

- Tukey, J.W. (1958). Bias and confidence in not quite large samples. Annals of Mathematical Statistics 29, 614.
