Session 7: Standard errors, Estimation and Confidence Intervals
Learning Objectives
By the end of this session, you will be able to:
explain what is meant by an estimate of a population parameter, and its standard error
explain the meaning of a confidence interval
calculate a confidence interval for the population mean using sample data, and state the assumptions underlying this calculation
Reminder: What is inference?
Inference is about drawing conclusions concerning population characteristics using information gathered from the sample.
It is assumed that the sample is representative of the population.
A further assumption is that the sample has been drawn as a simple random sample from an infinite population.
Estimation

| Characteristic | Population | Sample |
| --- | --- | --- |
| Mean | $\mu$ | $\bar{x}$ |
| Variance | $\sigma^2$ | $s^2$ |
| Std. deviation | $\sigma$ | $s$ |

Population characteristics (parameters) are denoted by Greek letters, sample values by Latin letters.
Sample characteristics are measurable and form estimates of the population values.
Example of statistical inference
What is the mean number of persons per household in Mukono district?
Data from 80 households surveyed in this district gave a mean household size of 5.6 with a standard deviation of 3.30.
Our best estimate of the mean household size in Mukono district is therefore 5.6.
What results are likely if we sampled again with a different set of households?
Example using Stata
Open the Stata file UNHS_hh&poverty.dta
The numeric code for Mukono district is 109
Use the summarize dialogue
Type db summarize or use the menu:
Statistics > Summaries, tables > Summaries > Summary Statistics
Variable: hhsize
Then use the by/if/in tab
Condition: dist == 109
Results
Summaries for Mukono only
Summaries for whole sample
The distribution of means
Suppose 10 university students were given a standard meal and the time taken to consume the meal was recorded for each.
Suppose the 10 values gave: mean = 11.24, with std. dev. = 0.864
Let’s assume this exercise was repeated 50 times with different samples of students.
A histogram of the resulting 500 observations appears below, followed by a histogram of the 50 means, one from each sample.
Histogram of raw data
The data appear to follow a normal distribution.
Histogram of 50 sample means
The distribution of the sample means is called the sampling distribution of the mean.
Notice that the variability of this distribution is smaller than the variability of the raw data.
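The meal-time exercise can be imitated with a short simulation. This is only a sketch: it assumes the times come from a normal population whose mean and standard deviation equal the sample values quoted earlier (11.24 and 0.864), and the seed is arbitrary.

```python
import random
import statistics

random.seed(7)  # arbitrary seed so the sketch is reproducible

# Assumed population values, borrowed from the single sample on the slide.
POP_MEAN, POP_SD = 11.24, 0.864
N_SAMPLES, SAMPLE_SIZE = 50, 10

# 50 samples of 10 students each.
samples = [
    [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    for _ in range(N_SAMPLES)
]

raw_values = [x for sample in samples for x in sample]           # 500 observations
sample_means = [statistics.mean(sample) for sample in samples]   # 50 means

sd_raw = statistics.stdev(raw_values)
sd_means = statistics.stdev(sample_means)

print(f"SD of the {len(raw_values)} raw observations: {sd_raw:.3f}")
print(f"SD of the {len(sample_means)} sample means: {sd_means:.3f}")
print(f"POP_SD / sqrt(n) for comparison: {POP_SD / SAMPLE_SIZE ** 0.5:.3f}")
```

The spread of the 50 means should come out well below the spread of the raw data, which is the pattern the two histograms show.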
Back to estimation…
The estimate of the mean household size in Mukono district was 5.6.
Is this sufficient for reporting purposes, given that this answer is based on one particular sample?
What we have is an estimate based on a sample of size 80. But how good is this estimate?
We need a measure of the precision, i.e. variability, of this estimate…
Sampling Variability
The accuracy of the sample mean $\bar{x}$ as an estimate of $\mu$ depends on:
(i) the sample size ($n$), since the more data we collect, the more we know about the population, and
(ii) the inherent variability in the data, $\sigma^2$.
These two quantities must enter the measure of precision of any estimate of a population parameter. We aim for high precision, i.e. low standard error!
Standard error of the mean
The precision of $\bar{x}$ as an estimate of $\mu$ is given by the standard error of the mean:
$$\text{s.e.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
Also written as s.e.m., or sometimes s.e.
It is estimated using the sample data: $s/\sqrt{n}$
For the example on household size,
s.e. $= 3.298/\sqrt{80} = 3.298/8.944 = 0.369$
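The household calculation can be checked in a few lines. The function name `standard_error` is a hypothetical helper introduced for illustration; the inputs are the slide's values (s = 3.298, n = 80).

```python
import math


def standard_error(sd, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least two observations")
    return sd / math.sqrt(n)


se_household = standard_error(3.298, 80)
print(f"s.e. for n = 80:  {se_household:.3f}")

# Quadrupling the sample size halves the standard error.
se_larger_sample = standard_error(3.298, 320)
print(f"s.e. for n = 320: {se_larger_sample:.3f}")
```

Because n enters through its square root, halving the standard error takes four times as many households.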
Confidence Interval for $\mu$
Instead of using a point estimate, it is usually more informative to summarise using an interval which is likely (i.e. with 95% confidence) to contain $\mu$.
This is called an interval estimate or a Confidence Interval (C.I.).
For example, we could report that the mean household size in Mukono district is 5.6 with 95% confidence interval (4.87, 6.33), i.e. there is a 95% chance that the interval (4.87, 6.33) includes the true value $\mu$.
Analysis using Stata
Type db ci or use menu
Use the by/if/in tab as before
Results
For whole sample
Just for Mukono
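Finding the Confidence Interval
The 95% confidence limits for $\mu$ (lower and upper) are calculated as:
$\bar{x} - t_{n-1}\,(s/\sqrt{n})$ and $\bar{x} + t_{n-1}\,(s/\sqrt{n})$
where $t_{n-1}$ is the 5% level for the t-distribution with $(n-1)$ degrees of freedom, i.e. 2½% in each tail.
Statistical tables and statistical software give t-values.
[Figure: t-distribution centred at 0, with 2½% of the area in each tail beyond $-t$ and $t$]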
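As a sketch, the two limits can be computed for the Mukono household figures quoted earlier (mean 5.6, s = 3.298, n = 80). The function name is hypothetical, and the t-value of 1.99 for 79 degrees of freedom is an assumption taken from standard tables rather than from these slides.

```python
import math


def mean_confidence_interval(mean, sd, n, t_value):
    """Return (lower, upper) limits: mean -/+ t * s / sqrt(n)."""
    margin = t_value * sd / math.sqrt(n)
    return mean - margin, mean + margin


T_79 = 1.99  # assumed two-sided 5% point of t with 79 d.f.

lower, upper = mean_confidence_interval(5.6, 3.298, 80, T_79)
print(f"95% C.I. for mean household size: ({lower:.2f}, {upper:.2f})")
```

The t-value is passed in as an argument so that the same function works with a value read from tables or from software.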
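t-values for finding 95% C.I.
[Figure: t-distribution centred at 0, with 2½% of the area in each tail beyond $-t$ and $t$]

| Degrees of freedom | P = 10% | P = 5% | P = 2% |
| --- | --- | --- | --- |
| 1 | 6.31 | 12.7 | 31.8 |
| 2 | 2.92 | 4.30 | 6.96 |
| 3 | 2.35 | 3.18 | 4.54 |
| 4 | 2.13 | 2.78 | 3.75 |
| 5 | 2.02 | 2.57 | 3.36 |
| 6 | 1.94 | 2.45 | 3.14 |
| 7 | 1.89 | 2.36 | 3.00 |
| 8 | 1.86 | 2.31 | 2.90 |
| 9 | 1.83 | 2.26 | 2.82 |
| 10 | 1.81 | 2.23 | 2.76 |
| 20 | 1.72 | 2.09 | 2.53 |
| 30 | 1.70 | 2.04 | 2.46 |
| 40 | 1.68 | 2.02 | 2.42 |
| 60 | 1.67 | 2.00 | 2.39 |
| $\infty$ | 1.64 | 1.96 | 2.33 |

$\bar{x} \pm t_{n-1}\,(s/\sqrt{n})$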
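When the exact degrees of freedom are not tabulated, one simple convention is to read off the nearest row. The sketch below encodes the 5% column of the table on this slide; `t_from_table` and the nearest-row rule are illustrative assumptions, not part of the original material.

```python
# Two-sided 5% t-values from the table on this slide, keyed by degrees of freedom.
# The infinite-d.f. row (1.96) is left out, so d.f. above 60 fall back to the 60 row.
T_5_PERCENT = {
    1: 12.7, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
    6: 2.45, 7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23,
    20: 2.09, 30: 2.04, 40: 2.02, 60: 2.00,
}


def t_from_table(df):
    """Hypothetical helper: t-value from the tabulated row nearest to df."""
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    # On a tie, min() keeps the smaller d.f., which gives the wider interval.
    nearest_df = min(T_5_PERCENT, key=lambda row: abs(row - df))
    return T_5_PERCENT[nearest_df]


for df in (9, 39, 79):
    print(f"d.f. = {df}: t = {t_from_table(df):.2f}")
```

Statistical software gives the exact value for any degrees of freedom, so a lookup like this is only needed when working from printed tables.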
Correct interpretation of C.I.s
[Figure: plot for 50 repeated samples; horizontal axis 0 to 50, vertical axis 10 to 13]
If we sampled repeatedly and found a 95% C.I. each time, only about 95% of them would include the true $\mu$, i.e. there is a 95% chance that a single interval would include $\mu$.
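The repeated-sampling interpretation can be checked by simulation. This sketch assumes a normal population with the meal-time values used earlier (mean 11.24, s.d. 0.864), samples of size 10, and the tabulated t-value 2.26 for 9 degrees of freedom; the seed and the number of repeats are arbitrary.

```python
import math
import random
import statistics

random.seed(20)  # arbitrary seed so the sketch is reproducible

TRUE_MEAN, TRUE_SD = 11.24, 0.864  # assumed population values
N = 10                             # sample size
T_9 = 2.26                         # two-sided 5% t-value for 9 d.f. (from the table)
N_REPEATS = 2000

hits = 0
for _ in range(N_REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = statistics.mean(sample)
    margin = T_9 * statistics.stdev(sample) / math.sqrt(N)
    # Count the interval if it contains the true mean.
    if xbar - margin <= TRUE_MEAN <= xbar + margin:
        hits += 1

coverage = hits / N_REPEATS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95; the remaining intervals miss the true mean purely through sampling variation.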
An example (persons per HH)
For rural households (n = 40) in Mukono, we find mean = 6.43, std. dev. = 3.54 for the number of persons per household.
Hence a 95% confidence interval for the true mean number of persons per household is:
$6.43 \pm t_{39}\,(s/\sqrt{n}) = 6.43 \pm 2.02\,(3.54/\sqrt{40})$
$= 6.43 \pm 1.13$
$= (5.30, 7.56)$
Can you interpret this interval? Write down your answer. We will then discuss.
Analysis in Stata
Press Page Up to retrieve the last command
Then add “& rurban == 0” to the condition
Or use the menus and change the dialogue
Underlying assumptions
The above computation of a confidence interval assumes that the data have a normal distribution.
More exactly, it requires the sampling distribution of the mean to be normal.
What happens if the data are not normal?
This is not a serious problem if the sample size is large, because of the Central Limit Theorem: for large sample sizes, the sampling distribution of the mean is approximately normal.
Assumptions - continued
So even when data are not normal, the formula for a 95% confidence interval will give an interval whose “confidence” is still high - approximately 95%.
It is better to attach some measure of uncertainty than worry about the exact confidence level.
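A rough check of this claim is to apply the same formula to clearly non-normal data. The sketch below assumes exponentially distributed data (strongly skewed, true mean 1), which is an illustrative choice and not taken from the survey. It compares a small sample (n = 5) with a larger one (n = 60), using the tabulated t-values for 4 and 60 degrees of freedom.

```python
import math
import random
import statistics

random.seed(24)  # arbitrary seed so the sketch is reproducible

TRUE_MEAN = 1.0  # mean of an exponential distribution with rate 1


def coverage(n, t_value, repeats=2000):
    """Share of t-based 95% intervals that contain the true mean."""
    hits = 0
    for _ in range(repeats):
        sample = [random.expovariate(1.0) for _ in range(n)]  # skewed data
        xbar = statistics.mean(sample)
        margin = t_value * statistics.stdev(sample) / math.sqrt(n)
        if xbar - margin <= TRUE_MEAN <= xbar + margin:
            hits += 1
    return hits / repeats


coverage_n5 = coverage(5, 2.78)    # t-value for 4 d.f. from the table
coverage_n60 = coverage(60, 2.00)  # 60 d.f. row used as a stand-in for 59 d.f.

print(f"Coverage with n = 5:  {coverage_n5:.3f}")
print(f"Coverage with n = 60: {coverage_n60:.3f}")
```

With skewed data the small-sample coverage typically falls somewhat short of 95%, while the larger sample should come much closer, in line with the Central Limit Theorem argument.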