Parameter Estimation



Transcript of Parameter Estimation

Page 1: Parameter Estimation

Parameter Estimation

Estimation of the Mean

Page 2: Parameter Estimation

Suppose y1, …, yn are independent and identically distributed. The method of moments estimator (and least squares estimator) of the population mean μ is given by the sample mean

$$\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Also

$$\mathrm{S.E.}(\hat{\mu}) = \frac{\sigma}{\sqrt{n}}$$

where σ is the population standard deviation.

Page 3: Parameter Estimation

It can be shown that, approximately,

$$\frac{\hat{\mu} - \mu}{\mathrm{S.E.}(\hat{\mu})} \sim N(0, 1)$$

The relation comes from the Central Limit Theorem and usually holds good in practice for all but the smallest values of n.

Confidence intervals for the population mean can then be calculated, often as the sample mean ± 2 standard errors.

Page 5: Parameter Estimation

However, σ usually needs to be estimated by the sample standard deviation, and this introduces an additional degree of uncertainty which should lead to wider confidence intervals.

When the population distribution is approximately normal we can make an appropriate correction by replacing the normal distribution with the t distribution with n-1 degrees of freedom.

Otherwise a greater correction is ideally required.

Page 6: Parameter Estimation

Example: Failures Data

The numbers of operating hours between successive failures of the air conditioning equipment aboard an aircraft were as follows:

413 14 58 37 100 65 9 169 447 184 36 201 118 34 31 18 18 67 57 62 7 22 34

The data are also available as the R vector failures. We have n = 23 observations.

Page 7: Parameter Estimation

The data are clearly very positively skewed so an exponential Q-Q plot is carried out.
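An exponential Q-Q plot of this kind can be sketched in R as follows. This is a minimal illustration rather than the code used for the slides: the plotting positions from ppoints, the axis labels and the variable names theo and obs are assumptions. The ordered data are plotted against standard exponential quantiles, so under an exponential model the points should lie near a straight line through the origin whose gradient estimates the mean.

```r
# Failures data: operating hours between successive failures (n = 23)
failures <- c(413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201,
              118, 34, 31, 18, 18, 67, 57, 62, 7, 22, 34)

n    <- length(failures)
theo <- qexp(ppoints(n))   # theoretical quantiles of Exp(1); ppoints is an assumed choice
obs  <- sort(failures)     # ordered observations

# Under an exponential model the points fall near a line through the origin
plot(theo, obs,
     xlab = "Exp(1) quantiles", ylab = "Ordered failure times (hours)",
     main = "Exponential Q-Q plot of failures")
```

The gradient of a line fitted by eye through the bulk of the points gives a resistant estimate of the exponential mean.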

Page 8: Parameter Estimation
Page 9: Parameter Estimation

The graph suggests that the data might reasonably be modelled by an Exp(μ⁻¹) distribution (exponential with mean μ), corresponding to a memoryless property in the failure times.

From the plot, a resistant estimate of μ would appear to be about 80, but it is difficult to make any (graphical) assessment of uncertainty.

Page 10: Parameter Estimation

[Figure: exponential Q-Q plot of the failures data, with a fitted line of gradient 80]

Page 11: Parameter Estimation

We now wish to find an estimate of the population mean, μ.

Let μ̂ be the sample mean; for the failures data this is about 95.70.

Page 12: Parameter Estimation

We can also work out the standard error.

The S.E. is σ/√n. Estimating σ by the sample standard deviation, 119.2897, gives 119.2897/√23.

This calculates as 24.87.
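The arithmetic above can be checked in R. This is a small sketch; the variable names s and se are illustrative, and the data vector is typed in by hand so that the block stands alone.

```r
# Failures data: operating hours between successive failures (n = 23)
failures <- c(413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201,
              118, 34, 31, 18, 18, 67, 57, 62, 7, 22, 34)

n  <- length(failures)
s  <- sd(failures)      # sample standard deviation, used as the estimate of sigma
se <- s / sqrt(n)       # estimated standard error of the sample mean

round(s, 4)    # 119.2897
round(se, 2)   # 24.87
```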

Page 13: Parameter Estimation

A 95% confidence interval can be calculated by the usual methods or obtained in R. Since the population standard deviation has been estimated from the sample and the sample size is reasonably small, the t distribution is appropriate.

Page 14: Parameter Estimation

So the 95% confidence interval is [44.11,147.28].
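One way to reproduce this interval in R is sketched below, using the t distribution with n − 1 = 22 degrees of freedom. The names mu_hat, se, tcrit and ci are illustrative and do not come from the slides.

```r
# Failures data: operating hours between successive failures (n = 23)
failures <- c(413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201,
              118, 34, 31, 18, 18, 67, 57, 62, 7, 22, 34)

n      <- length(failures)
mu_hat <- mean(failures)             # sample mean
se     <- sd(failures) / sqrt(n)     # estimated standard error
tcrit  <- qt(0.975, df = n - 1)      # upper 2.5% point of t with 22 d.f.

ci <- mu_hat + c(-1, 1) * tcrit * se # two-sided 95% interval
round(ci, 2)                         # 44.11 147.28
```

The built-in call t.test(failures)$conf.int should give the same interval.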

This should really be widened a little bit to allow for non-normality of the population distribution.

Page 15: Parameter Estimation

Estimation of the Median

Sometimes it can be more useful to look at the population median rather than the mean. A possible estimator of this is given by the sample median, m̂. Here, at least when n is moderately large, approximately

$$\hat{m} \sim N\left(m, \frac{1}{4 n f(m)^2}\right)$$

where f(m) is the density of the underlying distribution at the median m.

Page 16: Parameter Estimation

For a normally distributed N(μ, σ) population, the sample median has standard error 1.253σ/√n, and so is a less efficient estimator of μ than the sample mean.
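The factor 1.253 can be checked with a short derivation. It assumes the standard large-sample result that the sample median has standard error approximately 1/(2√n f(m)), and uses the fact that for a normal population the median equals μ, where the density takes the value 1/(σ√(2π)).

```latex
\[
f(m) = \frac{1}{\sigma\sqrt{2\pi}}
\quad\Longrightarrow\quad
\mathrm{S.E.}(\hat{m}) \approx \frac{1}{2\sqrt{n}\,f(m)}
= \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}}
\approx 1.253\,\frac{\sigma}{\sqrt{n}}
\]
```

Compared with σ/√n for the sample mean, the variance is inflated by a factor of π/2 ≈ 1.57.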

However, for longer-tailed distributions, the sample median can be a more efficient estimator of location than the sample mean. This is closely related to the fact that the sample median is a resistant estimator.

We will use the median in the “failures” example.

Page 17: Parameter Estimation

Example: Estimation of Median for Failures Data

We can estimate the population median from the sample median which has a value of 57. We need to ask, though, how accurate is this estimate and can we use it to construct a confidence interval for m?

Page 18: Parameter Estimation

We could use the formula for the standard error quoted earlier to calculate confidence intervals but the sample size is not very large.

We instead use bootstrap estimation to answer these questions.

Page 19: Parameter Estimation

Bootstrap estimation is a fairly general technique for making assessments of uncertainty about estimators. It typically requires the use of simulation.

What we would like is the sampling distribution of m̂ − m, giving the variation of the sample median about the population median.

Page 20: Parameter Estimation

However, this requires knowledge of the (unknown) underlying population distribution.

We therefore substitute for the population distribution by using instead the empirical distribution of the data (the bootstrap).

Page 21: Parameter Estimation

Suppose this empirical distribution has median m*. Let the random variable m̂* denote the sample median of a random sample (independent, identically distributed observations) of size 23 from this empirical distribution.

Then we would expect that the sampling distribution of m̂* − m* should be very close to that of m̂ − m.

Page 22: Parameter Estimation

Now let us study the distribution of m̂* − m*. Since we know the value of m* (57), it is just a case of looking at m̂*. We will use simulation: set up an R vector called ms of size 1000 and use it to store the results of 1000 simulations of m̂*.

First consider the command sample.
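As an illustration of sample, the sketch below draws one bootstrap resample of the failures data. The seed and the name resample are assumptions made for the example. With replace = TRUE and no size given, sample draws as many values as there are in the vector, so some observations appear more than once and others not at all.

```r
# Failures data: operating hours between successive failures (n = 23)
failures <- c(413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201,
              118, 34, 31, 18, 18, 67, 57, 62, 7, 22, 34)

set.seed(1)                                   # assumed seed, only for reproducibility
resample <- sample(failures, replace = TRUE)  # 23 draws with replacement

length(resample)   # 23
median(resample)   # one simulated value of the bootstrap sample median
```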

Page 23: Parameter Estimation
Page 24: Parameter Estimation

Now use a for loop to do a simulation
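A minimal sketch of such a loop is given below. The seed and the loop index i are assumptions, and the code on the original slides may differ in detail. Each pass draws a resample of size 23 with replacement and stores its median in ms.

```r
# Failures data: operating hours between successive failures (n = 23)
failures <- c(413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201,
              118, 34, 31, 18, 18, 67, 57, 62, 7, 22, 34)

set.seed(1)            # assumed seed, only for reproducibility
ms <- numeric(1000)    # storage for 1000 simulated sample medians

for (i in 1:1000) {
  # resample the data with replacement and record the median
  ms[i] <- median(sample(failures, replace = TRUE))
}

summary(ms)
```

Because the resample size is odd, every entry of ms is one of the original observations.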

Page 25: Parameter Estimation

Recall that each component of ms is the median of a random sample of size 23, obtained by sampling with replacement from failures. Hence the variability in ms is much less than the variability in failures itself.

Typing qqnorm(ms) produces the normal Q-Q plot for the distribution of ms.

Page 26: Parameter Estimation
Page 27: Parameter Estimation

This distribution is not particularly normal, so the earlier theory for the sampling distribution of the median would not have been very good here.

A reasonable 95% confidence interval, more formally a 95% percentile interval, for the original population median m is given by [Qe(0.025), Qe(0.975)], where Qe is the empirical quantile function of the bootstrap simulations ms of m̂
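The percentile interval can be read off with quantile, as in the sketch below. The seed is an assumption, and the default quantile definition in R is used, so the endpoints will vary a little from run to run and need not match the values quoted in the slides exactly.

```r
# Failures data: operating hours between successive failures (n = 23)
failures <- c(413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201,
              118, 34, 31, 18, 18, 67, 57, 62, 7, 22, 34)

set.seed(1)            # assumed seed, only for reproducibility
ms <- numeric(1000)
for (i in 1:1000) {
  ms[i] <- median(sample(failures, replace = TRUE))
}

# Empirical 2.5% and 97.5% quantiles of the bootstrap medians
ci <- quantile(ms, c(0.025, 0.975))
ci
```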

Page 28: Parameter Estimation

Hence (34, 67) is a reasonable confidence interval for m.

Again, this confidence interval should be widened a little to allow for the approximation involved in using the empirical distribution of the data.

Page 30: Parameter Estimation

Failures data - further discussion.

If we assume that the population distribution is Exp(μ⁻¹), then for the population median, m, we have m = μ ln 2.

It follows that we can also obtain an estimate of the population mean, μ, from an estimate of m. In particular the 95% confidence interval for m of (34, 67) obtained above translates into a 95% confidence interval for μ of (49.1, 96.7).
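The conversion is a one-line calculation, sketched here in R. The names ci_median and ci_mean are illustrative. Since m = μ ln 2 under the exponential assumption, dividing each endpoint of the interval for m by ln 2 gives the corresponding endpoint for μ.

```r
ci_median <- c(34, 67)            # 95% percentile interval for the median m
ci_mean   <- ci_median / log(2)   # mu = m / ln 2 under the exponential model

round(ci_mean, 2)                 # 49.05 96.66
```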

Page 31: Parameter Estimation

This should be compared with that obtained earlier by estimation based on the sample mean (44,147).

However, no allowance is made here for the uncertainty involved in the exponential assumption.