FOUNDATIONS OF STATISTICAL INFERENCE


DEFINITIONS

Statistical inference is the process of reaching conclusions about characteristics of an entire population using data from a subset, or sample, of that population.

Simple random sampling is a sampling method which ensures that every combination of n members of the population has an equal chance of being selected.
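As a minimal sketch of simple random sampling, Python's standard-library `random.sample` draws n distinct members uniformly at random, so every combination of n members is equally likely to be selected. The population of 100 IDs and the helper name `simple_random_sample` are hypothetical, for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw n members without replacement."""
    rng = random.Random(seed)
    # random.sample picks n distinct members uniformly at random, so each
    # combination of n members has the same chance of being selected.
    return rng.sample(population, n)

population = list(range(1, 101))   # hypothetical population of 100 IDs
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```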


Statistical Inference

The process of making guesses about an unobservable population parameter (the truth) from an observed sample statistic.

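[Figure: Truth (not observable): the population and its parameters. Sample (observation): used to make guesses about the whole population.]

Population parameters:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

Sample statistics:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$$

*The hat notation ^ is often used to indicate an "estimate."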
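The two sets of formulas differ mainly in the divisor: N for the population variance, n − 1 for the sample estimate. A small sketch using Python's `statistics` module, where `pvariance` divides by N and `variance` divides by n − 1; the data values are made up for illustration.

```python
import statistics

# Hypothetical population of N = 8 values (illustrative only).
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = statistics.fmean(population)          # population mean: sum / N
sigma2 = statistics.pvariance(population)  # population variance: divides by N

# One hypothetical sample of n = 4 drawn from that population.
sample = [4.0, 5.0, 7.0, 9.0]
x_bar = statistics.fmean(sample)           # sample mean, estimates mu
s2 = statistics.variance(sample)           # sample variance (sigma-hat squared): divides by n - 1

print("population:", mu, sigma2)
print("sample:", x_bar, s2)
```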


Sampling Distributions

A sampling distribution is the distribution of sample statistics computed on the set of all possible random samples of size n that could be drawn from a population.

Most experiments are one-shot deals. So how do we know whether an observed effect from a single experiment is real or just an artifact of sampling variability (chance variation)?

Probability distributions are important here because they form the basis for describing the distribution of a sample statistic.
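To make the definition concrete, here is a sketch that enumerates every possible sample of size n from a tiny, hypothetical population (sampling without replacement, via `itertools.combinations`) and tabulates the resulting sample means. The population values and n are assumptions chosen only so that the full enumeration stays small.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

population = [1, 2, 3, 4, 5]   # hypothetical tiny population, N = 5
n = 2

# The mean of every possible sample of size n (without replacement).
sample_means = [fmean(s) for s in combinations(population, n)]

# The sampling distribution: how often each value of the sample mean occurs.
distribution = Counter(sample_means)
for mean_value, count in sorted(distribution.items()):
    print(f"{mean_value:.1f}: {count}/{len(sample_means)}")

# Averaged over all possible samples, the sample means equal the population mean.
print(fmean(sample_means), fmean(population))
```

With N = 5 and n = 2 there are only 10 possible samples, and the mean of their means equals the population mean.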


Statistical Inference is based on Sampling Variability

• Sample statistic: we summarize a sample into one number; e.g., it could be a mean, a difference in means or proportions, an odds ratio, or a correlation or regression coefficient.
  – E.g.: average support for gun control among women and men.
  – E.g.: proportion of women and men who supported the war in Iraq.

• Sampling variability: if we could repeat an experiment many, many times on different samples with the same number of subjects, the resulting sample statistic would not always be the same (because of chance!).

• Standard error: a measure of the sampling variability. It is the standard deviation of the sampling distribution.


• For large enough sample sizes, the shape of the sampling distribution will be approximately normal.

• The sampling distribution is centered on μ, the mean of the population.

• The standard deviation of the sampling distribution can be computed as the population standard deviation divided by the square root of the sample size.


Examples of Sample Statistics:

• Single population mean μ (known population standard deviation σ)
• Single population mean μ (unknown population standard deviation σ)
• Single population proportion p
• Difference in means μ1 − μ2 (t-test)
• Difference in proportions p1 − p2 (Z-test)
• Odds ratio/risk ratio
• Correlation coefficient
• Regression coefficient
• …


The Central Limit Theorem: If all possible random samples, each of size n, are taken from any population with a mean μ and a standard deviation σ, the sampling distribution of the sample means (averages) will:

1. Have mean:

$$\mu_{\bar{x}} = \mu$$

2. Have standard deviation (also called the standard error of the sampling distribution), strictly for sampling with replacement or from a population much larger than n:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

3. Be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n).
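A quick simulation can illustrate the three claims. This sketch assumes an exponential parent population with rate 1 (so μ = σ = 1), which is strongly skewed; n = 30, the number of repetitions, and the seed are arbitrary choices. Because it draws a large but finite number of random samples rather than all possible samples, the results are only approximate.

```python
import math
import random
import statistics

rng = random.Random(1)
lam = 1.0                     # assumed exponential population: mu = sigma = 1 / lam
mu, sigma = 1 / lam, 1 / lam
n, reps = 30, 20_000

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(rng.expovariate(lam) for _ in range(n)) for _ in range(reps)]

se = sigma / math.sqrt(n)     # CLT standard error
print("mean of sample means:", statistics.fmean(means), "vs mu:", mu)
print("SD of sample means:", statistics.stdev(means), "vs sigma/sqrt(n):", se)

# Rough normality check: about 95% of means should fall within 1.96 SE of mu.
share = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print("share within 1.96 SE:", share)
```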


Symbol Check

$\mu_{\bar{x}}$: the mean of the sample means.

$\sigma_{\bar{x}}$: the standard deviation of the sample means, also called "the standard error of the mean."


INTUITIVE TREATMENT OF SAMPLING DISTRIBUTION

Suppose we have a population of size 100. We then draw a sample of 100 people from the population of 100 and compute the mean. How confident could we be about the computed sample statistic? How much sampling error would there be?

Suppose we have a population of size 100. We then draw every sample of size 99 from this population and compute the mean of each. How many different samples could we draw? $\binom{100}{99} = 100$. How much sampling error would there be in the computed means?

Suppose we have a population of size 100. We then draw every sample of 50 people from the population of 100 and compute the mean of each sample. How many different samples could we draw? $\binom{100}{50} \approx 1.009 \times 10^{29}$. How much sampling error would there be in the computed means?

The principle is that the larger the sample size, relative to the population we are drawing from, the lower the sampling error. The smaller the sample size, relative to the population we are drawing from, the larger the sampling error.
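The principle can be checked exactly on a small population by enumerating every sample of each size. This sketch assumes a hypothetical population of the integers 1 to 10; it computes the true standard error (the standard deviation of the means over all possible samples) and compares it with the finite-population-corrected formula σ/√n · √((N − n)/(N − 1)), a standard result for sampling without replacement that the slides do not state.

```python
import math
from itertools import combinations
from statistics import fmean, pstdev

population = list(range(1, 11))   # hypothetical population, N = 10
N = len(population)
sigma = pstdev(population)        # population SD (divides by N)

results = {}
for n in (2, 5, 8, 9, 10):
    # Mean of every possible sample of size n, drawn without replacement.
    means = [fmean(s) for s in combinations(population, n)]
    exact_se = pstdev(means)      # SD over ALL possible samples = true standard error
    # Finite population correction for sampling without replacement.
    fpc_se = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
    results[n] = (len(means), exact_se, fpc_se)
    print(f"n={n:2d}  samples={len(means):3d}  SE={exact_se:.4f}  formula={fpc_se:.4f}")
```

The standard error shrinks as n approaches N and is exactly zero when the "sample" is the whole population.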


[Figure: sampling distribution divided into a null region and two alternative regions]


HYPOTHESIS TESTING USING THE NORMAL (Z) DISTRIBUTION

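Calculate the estimated statistic (for example, the sample mean $\bar{x}$) from the sample.

Record the sample standard deviation s and N. Then calculate the standard error of the sampling distribution from these:

$$\sigma_e = \frac{s}{\sqrt{N}}$$

Then calculate Z, the distance between the estimate and the null value $\mu_0$ in standard-error units:

$$Z = \frac{\bar{x} - \mu_0}{\sigma_e}$$

Compare the calculated value of Z to the table of Z statistics.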
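The four steps above can be sketched as a small Python function. Assumptions: the statistic is a sample mean, `statistics.NormalDist` stands in for the printed Z table, and the p-value is two-tailed; the helper name `z_test` and the data values are hypothetical. With a sample this small, the t-distribution discussed later would be the more appropriate reference.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_test(sample, null_value):
    """Hypothetical helper: the slide's steps for a sample mean."""
    n = len(sample)
    estimate = fmean(sample)               # step 1: the estimated statistic
    se = stdev(sample) / math.sqrt(n)      # step 2: standard error = s / sqrt(N)
    z = (estimate - null_value) / se       # step 3: distance from the null in SE units
    p = 2 * NormalDist().cdf(-abs(z))      # step 4: two-tailed p-value (replaces the Z table)
    return estimate, se, z, p

data = [11, 13, 9, 14, 12, 15, 10, 13]     # hypothetical sample values
estimate, se, z, p = z_test(data, null_value=10)
print(f"mean={estimate:.3f}  SE={se:.3f}  Z={z:.2f}  p={p:.4f}")
```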


EXAMPLE: Suppose we draw a sample with mean 12.5, variance 36, and N = 50. How confident could we be that the population mean was not actually 10 (the null hypothesis)? We might ask how many standard deviations of the sampling distribution (Z units) 12.5 is away from 10:

$$\sigma_e = \sqrt{\frac{36}{50}} \approx 0.849, \qquad Z = \frac{12.5 - 10}{0.849} \approx 2.95$$

We can then calculate a p-value from the Z-statistic. Using the preceding table, there is only a .0016 chance that, with a sample of size 50 and variance 36, we could have drawn a sample with a mean of 12.5 or larger when the actual population mean was 10.
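The arithmetic of this example can be reproduced with the standard library. `NormalDist().cdf(-z)` gives the upper-tail (one-tailed) probability that corresponds to the .0016 figure, and doubling it gives the two-tailed value. This is a sketch, not the slides' own calculation, which used a printed table.

```python
import math
from statistics import NormalDist

mean, variance, n, null_value = 12.5, 36, 50, 10

se = math.sqrt(variance / n)          # standard error, about 0.849
z = (mean - null_value) / se          # about 2.95 standard errors
p_one_tailed = NormalDist().cdf(-z)   # P(Z >= z) under the null, about 0.0016
p_two_tailed = 2 * p_one_tailed

print(f"SE={se:.3f}  Z={z:.2f}  one-tailed p={p_one_tailed:.4f}  two-tailed p={p_two_tailed:.4f}")
```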


EXAMPLE: With the NES92, we draw a sample of 1500 respondents. On the variable liking for Clinton, we find a mean of 4.1 with a variance of 1.6. How likely is it that we would observe a sample mean of 4.1 if the real liking for Clinton in the population were only 3?

$$\sigma_e = \sqrt{\frac{1.6}{1500}} \approx 0.0327, \qquad Z = \frac{4.1 - 3}{0.0327} \approx 33.7$$

Using the earlier table, if the real liking for Clinton were 3.0, the probability of drawing a sample mean this far above 3.0 is less than 0.001.

What factors determine this probability?

1) The magnitude of the hypothesized difference (the numerator)

2) The variance of the sample (1.6)

3) The N of the sample (1500)

Note that together these three quantities determine the distance, in standard-deviation units on the sampling distribution, between the sample mean and the hypothesized value. See slide 13 again.
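To see how each factor moves the result, this sketch recomputes Z and the one-tailed p-value for the NES92 numbers and then changes one factor at a time. The alternative values (a difference of 0.05, a variance of 400, N = 10) are hypothetical, picked only for illustration, and the helper `z_and_p` is likewise an illustrative name.

```python
import math
from statistics import NormalDist

def z_and_p(difference, variance, n):
    """Hypothetical helper: Z = difference / sqrt(variance / n); one-tailed p = P(Z >= z)."""
    z = difference / math.sqrt(variance / n)
    return z, NormalDist().cdf(-z)

# NES92 example: sample mean 4.1 vs. hypothesized 3.0, variance 1.6, N = 1500.
base = z_and_p(4.1 - 3.0, 1.6, 1500)

# Change one factor at a time (hypothetical values):
smaller_difference = z_and_p(0.05, 1.6, 1500)   # 1) smaller hypothesized difference
larger_variance = z_and_p(1.1, 400, 1500)       # 2) much larger variance
smaller_n = z_and_p(1.1, 1.6, 10)               # 3) much smaller N

for label, (z, p) in [("NES92", base), ("smaller difference", smaller_difference),
                      ("larger variance", larger_variance), ("smaller N", smaller_n)]:
    print(f"{label:>18}: Z={z:6.2f}  p={p:.3g}")
```

Shrinking the difference, inflating the variance, or shrinking N all pull Z toward zero and push the p-value up.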


THE CONFIDENCE INTERVAL APPROACH

Let UCL and LCL refer respectively to the upper and lower confidence limits. Let μ be the estimated parameter, let Z be the Z-statistic associated with the desired confidence level (p-value), and let σe be the standard error. Then calculate the confidence limits as follows:

$$UCL = \mu + Z\sigma_e, \qquad LCL = \mu - Z\sigma_e$$


EXAMPLE: Construct a 99 percent confidence interval around the point estimate 12.5 from the preceding example with the given information.

$$\sigma_e = \sqrt{36/50} \approx 0.849, \qquad 12.5 \pm 2.576 \times 0.849 \approx 12.5 \pm 2.19, \text{ i.e., } (10.31,\ 14.69)$$

The interval does not contain zero. Therefore, we can be at least 99 percent confident that the true mean is not zero. It also does not contain 10, so we can be at least 99 percent confident that the true mean is not 10.
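A sketch of the confidence-interval formula applied to this example. It uses `NormalDist().inv_cdf` as a substitute for looking up Z in a table (for 99 percent, Z ≈ 2.576); the helper name `confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(estimate, se, level=0.99):
    """Hypothetical helper: LCL/UCL = estimate -/+ Z * se, Z the two-sided critical value."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 2.576 for 99%
    return estimate - z * se, estimate + z * se

se = math.sqrt(36 / 50)                        # variance 36, N = 50
lcl, ucl = confidence_interval(12.5, se, 0.99)
print(f"99% CI: ({lcl:.2f}, {ucl:.2f})")
print("contains 10?", lcl <= 10 <= ucl)
```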


USING THE T-DISTRIBUTION

In practice, we seldom know the population variance or standard deviation. Under these circumstances we use the t distribution, rather than the Z (normal) distribution, for our tests of significance. Unlike the Z distribution, of which there is only one, there are many t distributions: one for each possible number of degrees of freedom for the test. (Degrees of freedom refer to N minus the number of parameters estimated.) Note, however, that as N becomes large, say 100, the t distribution becomes very close to the Z distribution.

The t-distribution is used in precisely the same way as the Z in conducting the preceding tests. Simply substitute the critical values of the t-distribution (for the appropriate degrees of freedom) where you had the values of the Z distribution.

The t-distribution takes into account that we do not have full information about the population variability. With small N, the t-distribution is somewhat more conservative than the Z. It gives practically the same answer if N is larger than about 1,000, and it is also quite close when N is larger than about 100.

See the next table.
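Why the t-distribution is more conservative with small N can be illustrated by simulation. This sketch assumes normally distributed data with a true mean of 0, computes the t statistic using the sample standard deviation, and counts how often |t| exceeds the Z critical value 1.96. If Z were the right reference, the share would be about 5 percent; with small n it is noticeably larger, and it approaches 5 percent as n grows. The sample sizes, repetition count, seed, and the helper name `share_beyond_196` are arbitrary choices.

```python
import math
import random

def share_beyond_196(n, reps=10_000, seed=0):
    """Share of simulated samples with |t| > 1.96 when the null hypothesis is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true mean is 0
        mean = sum(sample) / n
        # Sample standard deviation (divides by n - 1), since sigma is treated as unknown.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = mean / (s / math.sqrt(n))
        hits += abs(t) > 1.96
    return hits / reps

for n in (5, 30, 100):
    print(n, share_beyond_196(n))
```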


THE P-VALUE

The p-value is the probability that we would have observed our sample statistic (or one even more extreme) just by chance if the null hypothesis (null value) were true.

For example, we might estimate a mean of 12.5, as above, but posit a null value of 10.

Small p-values mean that our data would be unlikely if the null value were true, which is evidence against the null value.
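The definition can also be read as a simulation: pretend the null hypothesis is true, run the study many times, and count how often the result is at least as extreme as the one observed. This sketch assumes, for illustration, a normal population with mean 10 and standard deviation 6 (variance 36), samples of 50, and an observed mean of 12.5; it estimates the one-tailed p-value, which should land near .0016 up to simulation noise.

```python
import random
import statistics

# Assumption for illustration: under the null, the population is normal
# with mean 10 and SD 6; each simulated study draws n = 50.
rng = random.Random(7)
null_mean, sd, n, observed = 10, 6, 50, 12.5
reps = 20_000

as_extreme = 0
for _ in range(reps):
    sample_mean = statistics.fmean(rng.gauss(null_mean, sd) for _ in range(n))
    as_extreme += sample_mean >= observed   # one-tailed: 12.5 or larger

p_simulated = as_extreme / reps
print("simulated one-tailed p-value:", p_simulated)
```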