Limits to Statistical Theory Bootstrap analysis

ESM 206

11 April 2006

Assumption of t-test

• Sample mean is a t-distributed random variable

– Guaranteed if observations are normally distributed random variables or sample size is very large

– In practice, OK if observations are not too skewed and sample size is reasonably large

• This assumption also applies when using standard formula for 95% CI of mean

Resampling for a confidence interval of the mean

IN AN IDEAL WORLD

• Take sample

• Calculate sample mean

• Take new sample

• Calculate new mean

• Repeat many times

• Look at the distribution of sample means

• 95% CI ranges from the 2.5th percentile to the 97.5th percentile

IN THE REAL WORLD

• Find some way to simulate taking a sample

• Calculate the sample mean

• Repeat many times

• Look at the distribution of sample means

• 95% CI ranges from the 2.5th percentile to the 97.5th percentile

Bootstrap resampling

PARAMETRIC BOOTSTRAP

• Assume data are random variables from a particular distribution

– E.g., log-normal

• Use data to estimate parameters of the distribution

– E.g., mean, variance

• Use random number generator to create sample

– Same size as original

– Calculate sample mean

• Allows us to ask: What if data were a random sample from specified distribution with specified parameters?

NONPARAMETRIC BOOTSTRAP

• Assume underlying distribution from which data come is unknown

– Best estimate of this distribution is the data themselves: the empirical distribution function

• Create a new dataset by sampling with replacement from the data

– Same size as original

– Calculate sample mean
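The nonparametric procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the course results: the function name `bootstrap_ci_mean`, the seed, and the example data are all made up for the sketch, and the interval is the simple percentile interval described earlier in the slides.

```python
import random
import statistics


def bootstrap_ci_mean(data, n_boot=999, alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: draw n values with replacement from the data, take the mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # With 999 replicates and alpha = 0.05 these ranks are 25 and 975.
    lo_rank = round((n_boot + 1) * alpha / 2)
    hi_rank = round((n_boot + 1) * (1 - alpha / 2))
    return means[lo_rank - 1], means[hi_rank - 1]


# Made-up, right-skewed illustration data (NOT the TcCB measurements).
sample = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.1, 1.5, 2.4, 9.7]
low, high = bootstrap_ci_mean(sample, seed=1)
print(f"95% bootstrap CI for the mean: [{low:.3f}, {high:.3f}]")
```

Because every resample is built only from observed values, the interval can never extend below the smallest or above the largest observation.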

WHICH IS BETTER?

• If underlying distribution is correctly chosen, parametric has more precision

• If underlying distribution incorrectly chosen, parametric has more bias

TcCB in the cleanup site

• Parametric bootstrap

– If Y is log-normal, it is specified in terms of mean and standard deviation of X = log(Y)

– Mean = -0.547

– SD = 1.360

– Use “Monte Carlo Simulation” to generate 999 replicate simulated datasets from log-normal distribution

– Calculate mean of each replicate and sort means

– 25th value is lower end of 95% CI

– 975th value is upper end of 95% CI
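A rough Python sketch of these steps follows. The log-scale mean and SD are the values on the slide; the sample size of 77 is an assumption (the slide does not state it; it is inferred from the ratio of Std Dev to Std Err Mean in the JMP output), and the seed is arbitrary. Results will differ slightly from the slide's because the random draws differ.

```python
import random
import statistics

MU_LOG = -0.547  # mean of X = log(Y), from the slide
SD_LOG = 1.360   # standard deviation of X = log(Y), from the slide
N_OBS = 77       # ASSUMPTION: sample size, inferred rather than stated
N_REPS = 999     # number of simulated datasets, from the slide

rng = random.Random(206)  # arbitrary seed, only for repeatability

# Simulate 999 log-normal datasets, compute each mean, and sort the means.
rep_means = sorted(
    statistics.fmean([rng.lognormvariate(MU_LOG, SD_LOG) for _ in range(N_OBS)])
    for _ in range(N_REPS)
)

ci_lower = rep_means[25 - 1]   # 25th sorted value
ci_upper = rep_means[975 - 1]  # 975th sorted value
print(f"Parametric bootstrap 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
```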

[JMP Distributions output: Cleanup]

Quantiles

– 100.0% maximum: 168.64

Moments

– Mean: 3.9151948

– Std Dev: 20.0156

– Std Err Mean: 2.2809894

– upper 95% Mean: 8.4581788

– lower 95% Mean: -0.627789

95% CI: [-0.628, 8.458]
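As a check, the standard t-based interval can be recomputed from the Cleanup moments in the JMP output (mean 3.9151948, Std Dev 20.0156, Std Err Mean 2.2809894; JMP reports limits of -0.627789 and 8.4581788). In this sketch the sample size is inferred from the moments rather than stated in the source, and the critical value 1.9917 is a hard-coded approximation to the 97.5th percentile of a t distribution with 76 degrees of freedom.

```python
import math

mean = 3.9151948     # Mean, from the JMP Moments table
std_dev = 20.0156    # Std Dev
std_err = 2.2809894  # Std Err Mean

# Std Err Mean = Std Dev / sqrt(n), so n can be inferred from the two values.
n = round((std_dev / std_err) ** 2)

# ASSUMPTION: approximate t critical value for 97.5% with n - 1 = 76 df.
T_CRIT = 1.9917

ci_lower = mean - T_CRIT * std_err
ci_upper = mean + T_CRIT * std_err

print(f"n = {n}, SE check = {std_dev / math.sqrt(n):.4f}")
print(f"t-based 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

The lower limit is negative even though a concentration cannot be, which is one symptom of applying the t-based formula to strongly skewed data.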

Parametric bootstrap: results

• 95% CI: [0.917, 2.293]

[Figure: Distribution of sample means; x-axis: Bin (label shows upper limit)]

[JMP Distributions output: log(cleanup)]

Moments

– Mean: -0.547426

– Std Dev: 1.3604488

– Std Err Mean: 0.1550375

– upper 95% Mean: -0.238642

– lower 95% Mean: -0.85621

Normal QQ Plot

• Sort data

• Index the values (i = 1,2,…,n)

• Calculate q = i / (n + 1)

– This is the quantile

• Plot quantiles against data values

– This is the empirical cumulative distribution function (CDF)

• Construct CDF of standard normal using same quantiles

• Compare the distributions at the same quantiles
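The steps above can be sketched in Python as follows. The function name `normal_qq_points` and the example values are made up for illustration; `statistics.NormalDist().inv_cdf` supplies the standard normal value at each quantile. Plotting the sorted data against the normal values gives the QQ plot; a roughly straight line suggests the data are close to normal.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (q, normal value, data value) triples for a normal QQ plot."""
    ordered = sorted(data)            # sort data
    n = len(ordered)
    std_normal = NormalDist()         # standard normal: mean 0, SD 1
    points = []
    for i, value in enumerate(ordered, start=1):  # index i = 1, 2, ..., n
        q = i / (n + 1)                           # quantile of the i-th value
        z = std_normal.inv_cdf(q)                 # standard normal at the same quantile
        points.append((q, z, value))
    return points


# Made-up example values, for illustration only.
for q, z, value in normal_qq_points([3.0, 1.0, 2.0]):
    print(f"q = {q:.2f}  normal = {z:+.3f}  data = {value}")
```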

[Figure: JMP Distributions output for log(cleanup) with Normal Quantile Plot]

Nonparametric bootstrap: results

• 95% CI: [0.851, 9.248]

[Figure: distribution of bootstrap means; x-axis: Bootstrap mean]

Bootstrap and hypothesis tests

• One sample t-test

– Calculate bootstrap CI of mean

– Does it include the test value?

• Paired t-test

– Calculate differences: Di = xi - yi

– Find bootstrap CI of mean difference

– Does it include zero?

• Two-sample t-test

– Want to create simulated data where H0 is true (same mean) but allow variance and shape of distribution to differ between populations

– Easiest with nonparametric:

• Subtract mean from each sample. Now both samples have mean zero

• Resample these residuals, creating simulated group A from residuals of group A and simulated group B from residuals of group B

– Generate distribution of t values

– P is fraction of simulated t’s that exceed t calculated from data
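A minimal Python sketch of the two-sample procedure is given below. Several choices here are assumptions rather than anything stated on the slide: the unequal-variance (Welch) form of the t statistic, the one-sided comparison (simulated t greater than observed t), the function names, the seed, and the made-up example data.

```python
import math
import random
import statistics


def welch_t(a, b):
    """t statistic without assuming equal variances (ASSUMED form of t).

    Requires at least one of the two samples to have nonzero variance.
    """
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def bootstrap_t_test(a, b, n_boot=999, seed=None):
    """Return (observed t, fraction of simulated t values exceeding it)."""
    rng = random.Random(seed)
    t_obs = welch_t(a, b)
    # Subtract each group's mean so both sets of residuals have mean zero (H0 true).
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    resid_a = [x - mean_a for x in a]
    resid_b = [x - mean_b for x in b]
    exceed = 0
    for _ in range(n_boot):
        # Resample within each group, keeping each group's own spread and shape.
        sim_a = rng.choices(resid_a, k=len(a))
        sim_b = rng.choices(resid_b, k=len(b))
        if welch_t(sim_a, sim_b) > t_obs:
            exceed += 1
    return t_obs, exceed / n_boot


# Made-up illustration data (NOT the TcCB cleanup and reference samples).
group_a = [0.3, 0.5, 0.6, 0.9, 1.2, 1.8, 2.5, 4.1, 7.9, 15.2]
group_b = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.6]
t_obs, p_value = bootstrap_t_test(group_a, group_b, seed=1)
print(f"t = {t_obs:.2f}, bootstrap P = {p_value:.3f}")
```

Resampling residuals within each group, rather than pooling the two groups, is what lets the simulated populations share a mean while keeping different variances and shapes.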

TcCB: H0: cleanup mean = reference mean

• t = 1.45

• Bootstrapped ‘t’ values do not follow a t distribution!

• P = 0.02

[Figure: distribution of bootstrapped t values; x-axis: Bin (label shows upper limit)]
