Uncertainty in fall time surrogate Prediction variance vs. data sensitivity – Non-uniform noise...

Uncertainty in fall time surrogate
• Prediction variance vs. data sensitivity
  – Non-uniform noise
  – Example 3.2.1
• Uncertainty in fall time data
• Bootstrapping
  – Estimating accuracy of statistics




Linear Regression
Functional form
For linear approximation

Define the matrix of basis-function values at the data points, then solve for the regression coefficients.

Altogether
Differentiate with respect to the ith component of y.
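The equations on this slide are not present in the transcript. As a hedged sketch, the standard least-squares relations that match the slide text are given below. The notation is assumed rather than taken from the slides: \xi_j are the basis functions, X is the matrix of their values at the data points, b is the coefficient vector, and y is the data vector.

```latex
% Functional form, and its special case for a linear approximation in one variable
\hat{y}(\mathbf{x}) = \sum_{j=1}^{n_b} b_j \, \xi_j(\mathbf{x}),
\qquad
\hat{y}(x) = b_1 + b_2 x

% Definition of the regression matrix, then the regression coefficients
X_{ij} = \xi_j(\mathbf{x}_i),
\qquad
\mathbf{b} = (X^T X)^{-1} X^T \mathbf{y}

% Altogether, and the sensitivity to the i-th data value
\hat{y}(\mathbf{x}) = \boldsymbol{\xi}(\mathbf{x})^T (X^T X)^{-1} X^T \mathbf{y},
\qquad
\frac{\partial \hat{y}(\mathbf{x})}{\partial y_i}
  = \left[ X (X^T X)^{-1} \boldsymbol{\xi}(\mathbf{x}) \right]_i
```

The prediction is linear in the data, so each sensitivity depends only on where the data were taken, not on the measured values.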

Example 3.2.1
Given data

X    -2     -1     0     1      2
Y    -1.5   -1.5   0     1.25   1.75

Linear fit

Prediction variance with variable noise
Prediction variance based on assumptions on noise
Variance of surrogate prediction

Allows different variances.
Note that with different variances it is better to use weighted least squares.
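The variance expressions themselves are not in the transcript. A sketch under stated assumptions: the noise in each data value y_i is independent with variance \sigma_i^2, the fit is ordinary least squares, X is the matrix of basis-function values at the data points, and \xi(x) is the vector of basis-function values at the prediction point. These symbols are assumed notation, not the slides' own.

```latex
% Prediction is linear in the data, so independent noise variances add
% with the squared sensitivities as weights
\operatorname{Var}\!\left[ \hat{y}(\mathbf{x}) \right]
  = \sum_{i=1}^{n} \left( \frac{\partial \hat{y}(\mathbf{x})}{\partial y_i} \right)^{2} \sigma_i^{2},
\qquad
\frac{\partial \hat{y}(\mathbf{x})}{\partial y_i}
  = \left[ X (X^T X)^{-1} \boldsymbol{\xi}(\mathbf{x}) \right]_i

% With equal variances \sigma_i = \sigma this reduces to the usual formula
\operatorname{Var}\!\left[ \hat{y}(\mathbf{x}) \right]
  = \sigma^{2} \, \boldsymbol{\xi}(\mathbf{x})^T (X^T X)^{-1} \boldsymbol{\xi}(\mathbf{x})
```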

Comparison for example at x=3
Prediction variance (surprisingly small, why?)

Variance of prediction

If all data variances are the same, check that you get the same result. If not, the variance of y5 is the most important.
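The check suggested above can be sketched numerically. The block below is an illustration in Python (standard library only), not the slides' own computation. The data are the values read from the Example 3.2.1 table, the unit noise variance is an assumption made only for the check, and names such as `sens` and `prediction_variance` are hypothetical.

```python
# Data as read from the Example 3.2.1 table.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.5, -1.5, 0.0, 1.25, 1.75]
n = len(x)

# Least-squares line y_hat = b1 + b2*x (closed form for a single regressor).
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
b2 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
b1 = y_mean - b2 * x_mean

x0 = 3.0  # prediction point used in the comparison
y_hat = b1 + b2 * x0

# Sensitivity of the prediction to each data value, d y_hat(x0) / d y_i.
sens = [1.0 / n + (x0 - x_mean) * (xi - x_mean) / sxx for xi in x]


def prediction_variance(noise_var):
    """Variance of y_hat(x0) for independent noise with the given variances."""
    return sum(s * s * v for s, v in zip(sens, noise_var))


# Assumed unit noise variance at every point (illustrative only).
var_from_sensitivities = prediction_variance([1.0] * n)
# Same quantity from the equal-variance formula for a straight-line fit.
var_from_formula = 1.0 / n + (x0 - x_mean) ** 2 / sxx

print("fit:", b1, b2, "prediction at x0:", y_hat)
print("sensitivities:", sens)
print("variance:", var_from_sensitivities, var_from_formula)
```

The largest sensitivity belongs to the last data point, which is why the variance of y5 matters most when the variances differ.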

Bootstrapping
When we calculate statistics from random data, bootstrapping can provide error estimates.
If we had multiple samples, we could use them to estimate the error in the computation.
With bootstrapping we perform the amazing feat of getting the error from a single sample.
This is done by resampling the same data with replacement.
We draw a sample from the original data without removing the drawn values, so that the new sample may have repetitions.
We repeat for many bootstrap samples to get a distribution of the statistic of interest.
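The resampling procedure described above can be sketched in a few lines. This is a minimal Python illustration using only the standard library. The function name `bootstrap`, its argument order, and the seed are hypothetical choices, and the indices are 0-based.

```python
import random
import statistics


def bootstrap(nboot, stat, data, rng):
    """Return (stats, index_samples) for nboot resamples drawn with replacement."""
    n = len(data)
    stats, index_samples = [], []
    for _ in range(nboot):
        # Draw n indices with replacement, so indices may repeat.
        idx = [rng.randrange(n) for _ in range(n)]
        index_samples.append(idx)
        stats.append(stat([data[i] for i in idx]))
    return stats, index_samples


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(0.0, 1.0) for _ in range(10)]

boot_means, boot_idx = bootstrap(1000, statistics.mean, data, rng)
print("sample mean:", statistics.mean(data))
print("bootstrap estimate of its standard error:", statistics.stdev(boot_means))
```

The spread of `boot_means` is the bootstrap estimate of how much the sample mean would vary from one sample to another.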

Example with sample mean
x=randn(1,10)
x =
    0.5377  1.8339  -2.2588  0.8622  0.3188  -1.3077  -0.4336  0.3426  3.5784  2.7694
[bootstat,bootsam]=bootstrp(1000,@mean,x);
bootsam(:,1:5)
ans =
     1     2     5     2     5
     1     8     1    10    10
     6     4     3     1     2
     8     6    10     8     3
    10     2     2     9     2
     2     7     9     9     2
     6     3     6     1     9
     5     7    10     4     6
     1     7     1     3     6
     4     8     5     9     2
Each column contains the indices of one bootstrap sample. For example, the last column indicates that we drew x(2)=1.8339 four times, x(6) twice, along with x(3), x(5), x(9), and x(10).

What is the probability of getting no repetitions?

Matlab bootstrp routine
bootstat = bootstrp(nboot,bootfun,d1,...) draws nboot bootstrap data samples, computes statistics on each sample using bootfun, and returns the results in the matrix bootstat. bootfun is a function handle specified with @. Each row of bootstat contains the results of applying bootfun to one bootstrap sample.
[bootstat,bootsam] = bootstrp(...) returns an n-by-nboot matrix of bootstrap indices, bootsam. Each column in bootsam contains indices of the values that were drawn from the original data sets to constitute the corresponding bootstrap sample.
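On the question of repetitions: a bootstrap sample of size n has no repeated index only if it is a permutation of the original indices, so under uniform resampling the probability is n!/n^n. A short Python check for the n = 10 case used in the example (the variable names are illustrative):

```python
import math

n = 10  # sample size in the example
# n! orderings without repetition out of n**n equally likely index sequences.
p_no_repeat = math.factorial(n) / n ** n
print(p_no_repeat)  # about 3.6e-4
```

Almost every bootstrap sample therefore contains repeated values, as the columns of bootsam show.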

Statistics for sample mean
mean(x) = 0.6243
mean(bootstat) = 0.6091

std(x) = 1.7699
std(bootstat) = 0.5191
In this case we know that the standard deviation of the mean is the standard deviation of the data divided by the square root of the sample size, 1.7699/sqrt(10), or about 0.56.
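The comparison between the bootstrap estimate and the formula can be reproduced approximately. The sketch below is in Python with the standard library and uses the ten values of x printed on the slide. It does not call MATLAB's bootstrp, and its random stream differs, so the bootstrap number will not match 0.5191 exactly. The seed and variable names are arbitrary.

```python
import math
import random
import statistics

# Sample printed on the slide.
x = [0.5377, 1.8339, -2.2588, 0.8622, 0.3188,
     -1.3077, -0.4336, 0.3426, 3.5784, 2.7694]
n = len(x)

mean_x = statistics.mean(x)
std_x = statistics.stdev(x)            # n-1 normalization, as in MATLAB's std
sem_formula = std_x / math.sqrt(n)     # standard deviation of the mean from the formula

rng = random.Random(1)  # arbitrary seed
# Each bootstrap sample draws n values from x with replacement.
boot_means = [statistics.mean(rng.choices(x, k=n)) for _ in range(1000)]
sem_boot = statistics.stdev(boot_means)

print("mean(x):", mean_x, "std(x):", std_x)
print("formula:", sem_formula, "bootstrap:", sem_boot)
```

The bootstrap value tends to come out slightly below the formula value, because resampling from the data effectively uses the spread of the sample with normalization by n rather than n-1.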

In other cases we may not have a formula. We may then use bootstrapping, for example to estimate the accuracy of a probability estimate.

Sample standard deviation
[bootstat,bootsam]=bootstrp(10000,@std,x);
mean(bootstat) = 1.6387
std(bootstat) = 0.3415

Check ratio

a=randn(10,10000);
s=std(a);
mean(s) = 0.9728
std(s) = 0.2302

The bootstrap ratio std(bootstat)/mean(bootstat) is 0.3415/1.6387 = 0.208; the actual ratio std(s)/mean(s) is 0.2302/0.9728 = 0.237.
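The "actual ratio" check can be repeated outside MATLAB. The following Python sketch (standard library only, arbitrary seed, hypothetical variable names) draws 10,000 independent samples of 10 standard normal values and looks at the scatter of their sample standard deviations. The results should land close to, but not exactly on, the slide's 0.9728 and 0.2302.

```python
import random
import statistics

rng = random.Random(2)  # arbitrary seed
nrep, n = 10000, 10

# Sample standard deviation (n-1 normalization) of each independent sample.
s = [statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
     for _ in range(nrep)]

mean_s = statistics.mean(s)
std_s = statistics.stdev(s)
ratio = std_s / mean_s  # relative scatter of the standard deviation estimate

print("mean(s):", mean_s, "std(s):", std_s, "ratio:", ratio)
```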

Exercise
The variables x and y are normally distributed with N(0,1) marginal distributions and a correlation coefficient of 0.7.
1. Generate a sample of 10 pairs and use bootstrap to estimate the accuracy of the correlation coefficient you obtain from the sample.
2. Compare to the accuracy you can get from a formula or by repeating step 1 many times.
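One possible starting point for step 1 is sketched below in Python (standard library only). It is not a reference solution. The construction y = rho*x + sqrt(1 - rho^2)*z with independent standard normal x and z is one common way to obtain N(0,1) marginals with correlation rho. The helper `pearson`, the seed, and the number of resamples are illustrative choices. Pairs are resampled together so that the dependence between x and y is preserved.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Sample correlation coefficient of paired data."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)  # arbitrary seed
rho, n = 0.7, 10

# Correlated standard normal pairs: y = rho*x + sqrt(1 - rho^2)*z.
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for a in xs]
r_sample = pearson(xs, ys)

boot_r = []
while len(boot_r) < 1000:
    idx = [rng.randrange(n) for _ in range(n)]  # resample pairs by index
    if len(set(idx)) < 2:
        continue  # a single repeated pair has no defined correlation
    boot_r.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))

print("sample correlation:", r_sample)
print("bootstrap standard deviation:", statistics.stdev(boot_r))
```

For step 2, the same generation code can be rerun many times with fresh pairs to obtain the scatter of the correlation coefficient directly.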