Posted on 15-Jan-2016
Uncertainty in fall time surrogate
Prediction variance vs. data sensitivity
Non-uniform noise
Example 3.2.1
Uncertainty in fall time data
Bootstrapping
Estimating accuracy of statistics
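Linear Regression
Functional form: $\hat{y}(x) = \sum_{i=1}^{n_\beta} b_i \xi_i(x)$
For linear approximation: $\xi_1(x) = 1$, $\xi_2(x) = x$, so $\hat{y}(x) = b_1 + b_2 x$
Define $X_{ij} = \xi_j(x_i)$ for the data points $x_1, \dots, x_{n_y}$; then the regression coefficients are $b = (X^T X)^{-1} X^T y$
Altogether: $\hat{y}(x) = \xi^T(x) (X^T X)^{-1} X^T y$, where $\xi(x) = [\xi_1(x), \dots, \xi_{n_\beta}(x)]^T$
Differentiate with respect to the $i$th component of $y$: $\frac{\partial \hat{y}(x)}{\partial y_i} = \left[ X (X^T X)^{-1} \xi(x) \right]_i$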
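Example 3.2.1
Given data:
  x     -2     -1     0     1      2
  y   -1.5   -1.5     0  1.25   1.75
Linear fit: $\hat{y} = 0.925x$ (least squares; the intercept is zero because both $x$ and $y$ have zero mean)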
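As a quick numerical check of the linear fit, here is a minimal Python sketch (standard library only; the function name `linear_fit` is ours, not from the original) that solves the 2×2 normal equations $(X^T X) b = X^T y$ for the Example 3.2.1 data.

```python
# Least-squares fit y_hat = b1 + b2*x for the Example 3.2.1 data,
# solving the 2x2 normal equations (X^T X) b = X^T y in closed form.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.5, -1.5, 0.0, 1.25, 1.75]

def linear_fit(xs, ys):
    """Return (b1, b2) minimizing sum((b1 + b2*x - y)^2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: [n sx; sx sxx] [b1; b2] = [sy; sxy]
    det = n * sxx - sx * sx
    b1 = (sxx * sy - sx * sxy) / det
    b2 = (n * sxy - sx * sy) / det
    return b1, b2

b1, b2 = linear_fit(xs, ys)
print(f"y_hat = {b1:.4f} + {b2:.4f} x")  # y_hat = 0.0000 + 0.9250 x
```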
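Prediction variance with variable noise
Prediction variance based on assumptions on the noise (independent, with the same variance $\sigma^2$ for every datum): $\mathrm{Var}[\hat{y}(x)] = \sigma^2 \xi^T(x) (X^T X)^{-1} \xi(x)$, with $\xi(x)$ the vector of basis functions and $X$ the design matrix
Variance of the surrogate prediction from the data sensitivities, for independent noise with variance $\sigma_i^2$ in $y_i$: $\mathrm{Var}[\hat{y}(x)] = \sum_i \left( \frac{\partial \hat{y}(x)}{\partial y_i} \right)^2 \sigma_i^2$
This form allows different variances. Note that with different variances it is better to use weighted least squares.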
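A minimal Python sketch of the two ideas on this slide, under stated assumptions: `prediction_variance` evaluates the sensitivity-based variance of the ordinary least-squares prediction for independent noise with per-datum variances, and `wls_fit` is a weighted least-squares fit of a straight line with weights $1/\sigma_i^2$. The function names and the variances (y5 with four times the variance of the others) are hypothetical, chosen only for illustration.

```python
# Prediction variance with non-uniform, independent noise, and a weighted
# least-squares (WLS) straight-line fit with weights w_i = 1/sigma_i^2.

def sensitivities(xs, x):
    """d y_hat(x) / d y_i for the ordinary least-squares fit y_hat = b1 + b2*x."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    # y_hat(x) = ybar + (x - xbar) * sum((xi - xbar) * yi) / sxx is linear in each yi
    return [1.0 / n + (x - xbar) * (xi - xbar) / sxx for xi in xs]

def prediction_variance(xs, variances, x):
    """Var[y_hat(x)] = sum_i (d y_hat / d y_i)^2 * sigma_i^2 (independent noise)."""
    return sum(s * s * v for s, v in zip(sensitivities(xs, x), variances))

def wls_fit(xs, ys, variances):
    """Weighted least squares for y_hat = b1 + b2*x with weights 1/sigma_i^2."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, xs)) / sw   # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, ys)) / sw   # weighted mean of y
    b2 = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, xs, ys))
          / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, xs)))
    b1 = yw - b2 * xw
    return b1, b2

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.5, -1.5, 0.0, 1.25, 1.75]
variances = [0.01, 0.01, 0.01, 0.01, 0.04]   # hypothetical: y5 is noisier
print(prediction_variance(xs, variances, 3.0))  # about 0.0302
print(wls_fit(xs, ys, variances))
```

With equal variances `wls_fit` reduces to the ordinary fit; with unequal ones it gives less weight to the noisier points.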
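Comparison for the example at x = 3
Prediction variance (surprisingly small, why?)
Variance of prediction: with $X^T X = \mathrm{diag}(5, 10)$, the sensitivities of $\hat{y}(3)$ to the data are $1/5 + 3x_i/10 = -0.4, -0.1, 0.2, 0.5, 0.8$, so $\mathrm{Var}[\hat{y}(3)] = 0.16\sigma_1^2 + 0.01\sigma_2^2 + 0.04\sigma_3^2 + 0.25\sigma_4^2 + 0.64\sigma_5^2$
If all data variances are the same, check that you get the same value as the prediction variance. If not, the variance of $y_5$ is the most important, since $\hat{y}(3)$ is most sensitive to $y_5$.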
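The equal-variance check can be sketched in Python: compute the sensitivities of $\hat{y}(3)$ for the Example 3.2.1 data and compare the sum of their squares with $1/n + (x - \bar{x})^2/S_{xx}$, which is $\xi^T (X^T X)^{-1} \xi$ for a straight-line fit. Variable names are illustrative.

```python
# Example 3.2.1 at x = 3: sensitivities of y_hat(3) to each datum and the
# prediction variance in units of sigma^2 when all data variances are equal.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
x = 3.0
n = len(xs)
xbar = sum(xs) / n
sxx = sum((xi - xbar) ** 2 for xi in xs)

sens = [1.0 / n + (x - xbar) * (xi - xbar) / sxx for xi in xs]
print([round(s, 3) for s in sens])  # [-0.4, -0.1, 0.2, 0.5, 0.8]

var_from_sens = sum(s * s for s in sens)        # sum of squared sensitivities
var_formula = 1.0 / n + (x - xbar) ** 2 / sxx   # xi^T (X^T X)^{-1} xi for a line
print(round(var_from_sens, 4), round(var_formula, 4))  # 1.1 1.1
```

The largest squared sensitivity (0.64) belongs to $y_5$, the data point closest to the extrapolation point.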
Bootstrapping
When we calculate statistics from random data, bootstrapping can provide error estimates.
If we had multiple samples, we could use them to estimate the error in the computation.
With bootstrapping we perform the amazing feat of getting the error from a single sample.
This is done by resampling the same data with replacement: we draw samples from the original data without removing the drawn values, so a new sample may contain repetitions.
We repeat this for many bootstrap samples to get a distribution of the statistic of interest.
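The resampling loop can be sketched generically in Python. `bootstrap` is a hypothetical helper, not a MATLAB or library API; its return value plays the role of `bootstat` from MATLAB's `bootstrp`. The usage line applies it to the median of ten values, chosen only to show that any statistic can be passed in.

```python
import random
import statistics

def bootstrap(data, statistic, n_boot=1000, seed=None):
    """Return statistic(resample) for n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_boot):
        # Same size as the original data; because draws are made with
        # replacement, a value may appear several times and another not at all.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        results.append(statistic(resample))
    return results

data = [0.5377, 1.8339, -2.2588, 0.8622, 0.3188, -1.3077, -0.4336, 0.3426, 3.5784, 2.7694]
reps = bootstrap(data, statistics.median, n_boot=2000, seed=0)
print(statistics.stdev(reps))  # bootstrap estimate of the standard error of the median
```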
Example with sample mean
>> x = randn(1,10)
x =
    0.5377    1.8339   -2.2588    0.8622    0.3188   -1.3077   -0.4336    0.3426    3.5784    2.7694
>> [bootstat,bootsam] = bootstrp(1000,@mean,x);
>> bootsam(:,1:5)
ans =
     1     2     5     2     5
     1     8     1    10    10
     6     4     3     1     2
     8     6    10     8     3
    10     2     2     9     2
     2     7     9     9     2
     6     3     6     1     9
     5     7    10     4     6
     1     7     1     3     6
     4     8     5     9     2
Each column contains the indices of one bootstrap sample. For example, the last column indicates that we drew x(2) = 1.8339 four times, x(6) twice, and x(3), x(5), x(9), and x(10) once each.
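Reading a bootstrap sample amounts to counting how often each index appears. This Python sketch does that for the last column of `bootsam` and recomputes the mean of that resample from the four-decimal values of `x`; it is an illustration, not MATLAB output.

```python
from collections import Counter

x = [0.5377, 1.8339, -2.2588, 0.8622, 0.3188, -1.3077, -0.4336, 0.3426, 3.5784, 2.7694]
# Last column of bootsam(:,1:5); MATLAB indices are 1-based.
last_column = [5, 10, 2, 3, 2, 2, 9, 6, 6, 2]

counts = Counter(last_column)
print(sorted(counts.items()))  # [(2, 4), (3, 1), (5, 1), (6, 2), (9, 1), (10, 1)]
print(10 - len(counts), "of the 10 original points were not drawn")  # 4 of the 10 original points were not drawn

# The statistic for this bootstrap sample is the mean of the drawn values.
resample_mean = sum(x[i - 1] for i in last_column) / len(last_column)
print(round(resample_mean, 4))  # 0.9128
```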
What is the probability of getting no repetitions?
MATLAB bootstrp routine
bootstat = bootstrp(nboot,bootfun,d1,...) draws nboot bootstrap data samples, computes statistics on each sample using bootfun, and returns the results in the matrix bootstat. bootfun is a function handle specified with @. Each row of bootstat contains the results of applying bootfun to one bootstrap sample.
[bootstat,bootsam] = bootstrp(...) returns an n-by-nboot matrix of bootstrap indices, bootsam. Each column in bootsam contains indices of the values that were drawn from the original data sets to constitute the corresponding bootstrap sample.
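On the question of no repetitions: the first draw can be anything, the second must avoid one value (probability $(n-1)/n$), the third must avoid two, and so on, so the probability is $n!/n^n$. For $n = 10$ this is $3628800/10^{10} \approx 3.6 \times 10^{-4}$, so almost every bootstrap sample contains repeats. A short Python check (the function name is ours):

```python
import math

def prob_no_repeats(n):
    """P(all n draws with replacement from n items are distinct) = n!/n^n."""
    return math.factorial(n) / n ** n

print(prob_no_repeats(10))  # 0.00036288
print(round(1 / prob_no_repeats(10)))  # 2756, i.e. about one bootstrap sample in 2756
```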
Statistics for sample mean
mean(x) = 0.6243      mean(bootstat) = 0.6091
std(x) = 1.7699       std(bootstat) = 0.5191
In this case we know that the standard deviation of the mean is the standard deviation of the data divided by the square root of the sample size, $1.7699/\sqrt{10} \approx 0.56$.
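A rough Python analogue of the `bootstrp(1000,@mean,x)` run, assuming Python's standard-library generator in place of MATLAB's (so the numbers will differ slightly from the slide). It also illustrates one reason the bootstrap value 0.5191 sits a little below $s/\sqrt{n} \approx 0.56$: resampling reproduces the spread of the data computed with divisor $n$ rather than $n-1$, so even with unlimited resamples the bootstrap standard error tends to $\sqrt{(n-1)/n}\, s/\sqrt{n} \approx 0.53$. The remaining gap is most likely Monte Carlo noise from using 1000 resamples.

```python
import math
import random
import statistics

x = [0.5377, 1.8339, -2.2588, 0.8622, 0.3188, -1.3077, -0.4336, 0.3426, 3.5784, 2.7694]
n = len(x)
rng = random.Random(0)  # arbitrary seed; this stream differs from MATLAB's

# 20000 bootstrap replicates of the sample mean
boot_means = [sum(rng.choices(x, k=n)) / n for _ in range(20000)]

se_boot = statistics.stdev(boot_means)
se_formula = statistics.stdev(x) / math.sqrt(n)  # s / sqrt(n), about 0.56
# Limit of the bootstrap standard error as the number of resamples grows
se_boot_limit = se_formula * math.sqrt((n - 1) / n)  # about 0.53
print(round(se_formula, 3), round(se_boot_limit, 3), round(se_boot, 3))
```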
In other cases we may not have a formula, but we can still use bootstrapping to estimate the accuracy of a statistic, for example of an estimated probability.
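As an illustration of bootstrapping a probability, the Python sketch below estimates $P(X > 1)$ from the ten values of `x` and bootstraps its standard error. The threshold 1 is a hypothetical choice. This particular case does have a formula, the binomial $\sqrt{\hat{p}(1-\hat{p})/n}$, which the bootstrap should reproduce, so it serves as a check before using the same code where no formula exists.

```python
import math
import random
import statistics

x = [0.5377, 1.8339, -2.2588, 0.8622, 0.3188, -1.3077, -0.4336, 0.3426, 3.5784, 2.7694]
threshold = 1.0  # hypothetical event {X > 1}, chosen only for illustration
n = len(x)

def prob_exceed(sample):
    """Fraction of the sample above the threshold: an estimate of P(X > threshold)."""
    return sum(v > threshold for v in sample) / len(sample)

p_hat = prob_exceed(x)  # 3 of the 10 values exceed 1
rng = random.Random(1)
boot_p = [prob_exceed(rng.choices(x, k=n)) for _ in range(20000)]

se_boot = statistics.stdev(boot_p)
se_binomial = math.sqrt(p_hat * (1 - p_hat) / n)  # about 0.145
print(p_hat, round(se_boot, 3), round(se_binomial, 3))
```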
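Sample standard deviation
>> [bootstat,bootsam] = bootstrp(10000,@std,x);
mean(bootstat) = 1.6387      std(bootstat) = 0.3415
Check ratio
>> a = randn(10,10000); s = std(a);
mean(s) = 0.9728      std(s) = 0.2302
Each column of a is an independent sample of size 10, so s holds 10000 sample standard deviations.
The bootstrap ratio std/mean is 0.3415/1.6387 = 0.208; the actual ratio is 0.2302/0.9728 = 0.237.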
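The repeated-sampling check can be sketched in Python as well, with `random.gauss` standing in for `randn`. For normal data the exact values are known: $E[s] = c_4 \sigma$ with $c_4 = \sqrt{2/(n-1)}\, \Gamma(n/2)/\Gamma((n-1)/2) \approx 0.9727$ for $n = 10$ (compare mean(s) = 0.9728), and $\mathrm{std}(s) = \sigma \sqrt{1 - c_4^2}$, giving an exact ratio of about 0.239, consistent with the simulated 0.237.

```python
import math
import random
import statistics

rng = random.Random(2)
n, n_samples = 10, 10000

# Analogue of a = randn(10,10000); s = std(a): 10000 sample stds of 10 N(0,1) values
s_values = [statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_samples)]
ratio_sim = statistics.stdev(s_values) / statistics.mean(s_values)

# Exact for normal data: E[s] = c4*sigma and std(s) = sigma*sqrt(1 - c4^2)
c4 = math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)
ratio_exact = math.sqrt(1.0 - c4 ** 2) / c4
print(round(c4, 4), round(ratio_exact, 3), round(ratio_sim, 3))
```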
Exercise
The variables x and y are normally distributed with N(0,1) marginal distributions and a correlation coefficient of 0.7.
1. Generate a sample of 10 pairs and use bootstrapping to estimate the accuracy of the correlation coefficient you obtain from the sample.
2. Compare to the accuracy you can get from a formula or by repeating step 1 many times.
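One possible sketch of the exercise in Python (standard library only; the helper names and seed are ours). Pairs are resampled as units so that the correlation between x and y is preserved. For the formula comparison it uses the large-sample approximation $\mathrm{std}(r) \approx (1 - \rho^2)/\sqrt{n}$ for bivariate normal data, which is only rough at $n = 10$.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Sample correlation coefficient; None if either variable is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(n, rho, rng):
    """n pairs (x, y) with N(0,1) marginals and correlation rho."""
    pairs = []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((u, rho * u + math.sqrt(1.0 - rho ** 2) * v))
    return pairs

rng = random.Random(3)
rho, n = 0.7, 10

# Step 1: one sample of 10 pairs and its correlation coefficient
data = correlated_pairs(n, rho, rng)
r = pearson(*zip(*data))

# Bootstrap: resample whole pairs with replacement
boot_r = []
for _ in range(5000):
    r_b = pearson(*zip(*rng.choices(data, k=n)))
    if r_b is not None:  # skip the (very unlikely) resample with a constant variable
        boot_r.append(r_b)
se_boot = statistics.stdev(boot_r)

# Step 2: repeat step 1 many times with fresh data
r_repeat = [pearson(*zip(*correlated_pairs(n, rho, rng))) for _ in range(5000)]
se_repeat = statistics.stdev(r_repeat)

# Large-sample approximation, rough for n = 10
se_approx = (1.0 - rho ** 2) / math.sqrt(n)
print(round(r, 3), round(se_boot, 3), round(se_repeat, 3), round(se_approx, 3))
```

The bootstrap value depends strongly on the single sample drawn in step 1, so it can differ noticeably from the repeated-sampling value.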