Inferences On Two Samples. Overview We continue with confidence intervals and hypothesis testing for...

Inferences On Two Samples
  • date post

    21-Dec-2015
  • Category

    Documents


Inferences On Two Samples

Overview

• We continue with confidence intervals and hypothesis testing for more advanced models

• Models comparing two means
  – When the two means are dependent
  – When the two means are independent

• Models comparing two proportions

Inference about Two Means: Dependent/Paired Samples

Learning Objectives

• Distinguish between independent and dependent sampling

• Test hypotheses made regarding matched-pairs data

• Construct and interpret confidence intervals about the population mean difference of matched-pairs data

Two populations

• So far, we have covered a variety of models dealing with one population
  – The mean parameter for one population
  – The proportion parameter for one population

• However, there are many real-world applications that need techniques to compare two populations

Examples

• Examples of situations with two populations
  – We want to test whether a certain treatment helps or not … the measurements are the “before” measurement and the “after” measurement
  – We want to test the effectiveness of Drug A versus Drug B … we give 40 patients Drug A and 40 patients Drug B … the measurements are the Drug A and Drug B responses

Dependent Sample

• In certain cases, the two samples are very closely tied to each other

• A dependent sample is one in which each individual in the first sample is directly matched to one individual in the second

• Examples
  – Before and after measurements (a specific person’s before and the same person’s after)
  – Experiments on identical twins (twins matched with each other)

Independent Sample

• On the other extreme, the two samples can be completely independent of each other

• An independent sample is one in which the individuals selected for one sample have no relationship to the individuals selected for the other

• Examples
  – Fifty samples from one factory compared to fifty samples from another
  – Two hundred patients divided at random into two groups of one hundred

Paired Samples

• The dependent samples are often called matched-pairs

• Matched-pairs is an appropriate term because each observation in sample 1 is matched to exactly one in sample 2
  – The person before and the person after
  – One twin and the other twin
  – An experiment done on a person’s left eye and the same experiment done on that person’s right eye

Test hypotheses made regarding matched-pairs sample

Analysis of Paired Samples

• The method to analyze matched-pairs is to combine the pair into one measurement
  – “Before” and “After” measurements – subtract the before from the after to get a single “change” measurement
  – “Twin 1” and “Twin 2” measurements – subtract the 1 from the 2 to get a single “difference between twins” measurement
  – “Left eye” and “Right eye” measurements – subtract the left from the right to get a single “difference between eyes” measurement

Compute Difference d

• Specifically, for the before and after example,
  – d1 = person 1’s after – person 1’s before
  – d2 = person 2’s after – person 2’s before
  – d3 = person 3’s after – person 3’s before

• This creates a new random variable d

• We would like to reformulate our problem into a problem involving d (just one variable)

Test for the True Difference μd

• How do our hypotheses translate?
  – The two means are equal -> the mean difference is zero -> μd = 0
  – The two means are unequal -> the mean difference is non-zero -> μd ≠ 0

• Thus our hypothesis test is– H0: μd = 0

– H1: μd ≠ 0

– The standard deviation σd is unknown

• We know how to do this!

Test for the True Difference

• To solve
  – H0: μd = 0
  – H1: μd ≠ 0
  – The standard deviation σd is unknown

• This is exactly the test of one population mean with the standard deviation being unknown

• This is exactly the subject covered in Unit 8

Assumptions

• In order for this test statistic to be used, the data must meet certain conditions
  – The sample is obtained using simple random sampling
  – The sample data are matched pairs
  – The differences are normally distributed, or the sample size (the number of pairs, n) is at least 30

• These are the usual conditions we need to make our Student’s t calculations

Example

• An example: we want to test whether our treatment helps, where “helps” means a higher measurement

• The “Before” and “After” results

Before   After   Difference
7.2      8.6      1.4
6.6      7.7      1.1
6.5      6.2     -0.3
5.5      5.9      0.4
5.9      7.7      1.8
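The differences in the table, together with their mean and standard deviation, can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and do not come from the slides.

```python
import statistics

# Before/After measurements from the example table.
before = [7.2, 6.6, 6.5, 5.5, 5.9]
after = [8.6, 7.7, 6.2, 5.9, 7.7]

# One difference per matched pair: after minus before.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)                   # number of pairs
d_bar = statistics.mean(differences)   # sample mean of the differences
s_d = statistics.stdev(differences)    # sample standard deviation (n - 1 divisor)

print(n, round(d_bar, 2), round(s_d, 2))
```

Rounded to two decimals, the mean and standard deviation agree with the values 0.88 and 0.83 used in the example.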

Example (continued)

• Hypotheses
  – H0: μd = 0 … no difference
  – H1: μd > 0 … helps
  – (We’re only interested in whether our treatment makes things better or not)
  – α = 0.01

• Calculations
  – n = 5 (i.e. 5 pairs)
  – $\bar{d}$ = 0.88 (mean of the paired differences)
  – sd = 0.83

Example (continued)

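• Calculations
  – n = 5
  – $\bar{d}$ = 0.88
  – sd = 0.83

• The test statistic is

$$ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = \frac{0.88 - 0}{0.83 / \sqrt{5}} \approx 2.36 $$

(2.36 is the value obtained with the unrounded standard deviation, about 0.835)

• This has a Student’s t-distribution with 4 degrees of freedom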

Example (continued)

• Use the Student’s t-distribution with 4 degrees of freedom

• The right-tailed α = 0.01 critical value is 3.75 (i.e. t0.01;4 d.f. = 3.75)

• 2.36 is less than 3.75 (the classical method)

• Thus we do not reject the null hypothesis

• There is insufficient evidence to conclude that our method significantly improves the situation

• We could also have used the P-value method. The P-value is 0.039, which is greater than α = 0.01 (note: tcdf(2.36, E99, 4) = 0.039)
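The whole right-tailed matched-pairs test can be checked numerically. The sketch below uses only the Python standard library; because the standard library has no t-distribution, the tail probability is approximated by integrating the Student's t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are illustrative, and the critical value 3.75 is simply taken from the slide.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


before = [7.2, 6.6, 6.5, 5.5, 5.9]
after = [8.6, 7.7, 6.2, 5.9, 7.7]
d = [a - b for a, b in zip(after, before)]

n = len(d)
t_stat = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)   # right-tailed test, H1: mu_d > 0

critical = 3.75                         # alpha = 0.01, 4 d.f., from the slide
reject = t_stat > critical

print(round(t_stat, 2), round(p_value, 3), reject)
```

Both approaches lead to the same decision: the statistic is below the critical value and the P-value is above α = 0.01, so the null hypothesis is not rejected.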

Example (continued)

• Matched-pairs tests come in the same versions as other hypothesis tests
  – Two-tailed tests
  – Left-tailed tests (the alternative hypothesis that the first mean is less than the second)
  – Right-tailed tests (the alternative hypothesis that the first mean is greater than the second)

• Each can be solved using the Student’s t

Classical and P-value Approaches

• Each of the types of tests can be solved using either the classical or the P-value approach

Summary of the Method

• A summary of the method
  – For each matched pair, subtract the first observation from the second
  – This results in one data item per subject with the data items independent of each other
  – Test that the mean of these differences is equal to 0

• Conclusions

– Do not reject that μd = 0

– Reject that μd = 0 ... Reject that the two populations have the same mean

Construct and interpret confidence intervals about the population mean difference of matched-pairs data

Confidence Interval for the Paired Difference

• We’ve turned the matched-pairs problem into one for a single variable’s mean / unknown standard deviation
  – We just did hypothesis tests
  – We can use the techniques taught in Unit 7 (again, single variable’s mean / unknown standard deviation) to construct confidence intervals

• The idea – the processes (but maybe not the specific calculations) are very similar for all the different models

Confidence Interval for the Paired Difference

• Confidence intervals are of the form

Point estimate ± margin of error

• This is precisely an application of our results for a population mean / unknown standard deviation
  – The point estimate is $\bar{d}$
  – The margin of error, for a two-tailed test, is

$$ t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}} $$

Confidence Interval for the Paired Difference

• Thus a (1 – α) • 100% confidence interval for the difference of two means, in the matched-pair case, is

$$ \bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}} $$

where tα/2 is the critical value of the Student’s t-distribution with n – 1 degrees of freedom

Example

Salt-free diets are often prescribed for people with high blood pressure. The following data was obtained from an experiment designed to estimate the reduction in diastolic blood pressure as a result of following a salt-free diet for two weeks. Assume diastolic readings to be normally distributed.

Find a 99% confidence interval for the mean reduction.

Before       93  106   87   92  102   95   88  110
After        92  102   89   92  101   96   88  105
Difference    1    4   -2    0    1   -1    0    5

Example (continued)

1. Population Parameter of Interest
The mean reduction (difference) in diastolic blood pressure

2. The Confidence Interval Criteria

a. Assumptions: Both sample populations are assumed normal

b. Test statistic: t with df = 8 – 1 = 7

c. Confidence level: 1 – α = 0.99

3. Sample evidence

Sample information: n = 8, $\bar{d}$ = 1.0, and sd = 2.39

Example

4. The Confidence Interval

a. Confidence coefficients: Two-tailed situation, α/2 = 0.005
t(df, α/2) = t(7, 0.005) = 3.50

b. Maximum error:

$$ E = t(\mathrm{df}, \alpha/2) \cdot \frac{s_d}{\sqrt{n}} = (3.50) \cdot \frac{2.39}{\sqrt{8}} = 2.957 $$

c. Confidence limits:

$$ \bar{d} - E \ \text{to} \ \bar{d} + E $$

1.0 – 2.957 to 1.0 + 2.957
-1.957 to 3.957

5. The Results
-1.957 to 3.957 is the 99% confidence interval estimate for the amount of reduction of diastolic blood pressure, μd.
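The interval for the salt-free diet data can be recomputed from the raw readings. This is a minimal sketch with the Python standard library; the critical value 3.50 is taken from the slide (a t-table lookup) rather than computed, and the variable names are illustrative.

```python
import math
import statistics

before = [93, 106, 87, 92, 102, 95, 88, 110]
after = [92, 102, 89, 92, 101, 96, 88, 105]

# Reduction = before minus after (positive values mean the pressure went down).
d = [b - a for b, a in zip(before, after)]

n = len(d)
d_bar = statistics.mean(d)      # point estimate of the mean reduction
s_d = statistics.stdev(d)       # sample standard deviation of the differences

t_crit = 3.50                   # t(7, 0.005), from the slide / a t-table
margin = t_crit * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(round(margin, 3), round(lower, 3), round(upper, 3))
```

Because the code keeps the unrounded standard deviation instead of 2.39, the margin of error can differ from the hand calculation in the third decimal place. The interval contains zero, so at the 99% level the data are consistent with no mean reduction.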

Summary

• Two sets of data are dependent, or matched-pairs, when each observation in one is matched directly with one observation in the other

• In this case, the differences of observation values should be used

• The hypothesis test and confidence interval for the difference are both a “mean with unknown standard deviation” problem, one which we already know how to solve

Inference about Two Means: Independent Samples

Learning Objectives

• Test hypotheses regarding the difference of two independent means

• Construct and interpret confidence intervals regarding the difference of two independent means

Independent Samples

• Two samples are independent if the values in one have no relation to the values in the other

• Examples of samples that are not independent
  – Data from male students versus data from business majors (an overlap in populations)
  – The mean amount of rain, per day, reported in two weather stations in neighboring towns (likely to rain in both places)

Independent Samples

• A typical example of an independent samples test is to test whether a new drug, Drug N, lowers cholesterol levels more than the current drug, Drug C

• A group of 100 patients could be chosen
  – The group could be divided into two groups of 50 using a random method
  – If we use a random method (such as a simple random sample of 50 out of the 100 patients), then the two groups would be independent

Test of Two Independent Samples

• The test of two independent samples is very similar, in process, to the test of a single population mean

• The only major difference is that a different test statistic is used

• We will discuss the new test statistic through an analogy with the hypothesis test of one mean

Test hypotheses regarding the difference of two independent means

Test Statistic for a Single Mean

• For the test of one mean, we have the variables
  – The hypothesized mean (μ)
  – The sample size (n)
  – The sample mean ($\bar{x}$)
  – The sample standard deviation (s)

• We expect that $\bar{x}$ would be close to μ

Test statistic for the Difference of Two Means

• In the test of two means, we have two values for each variable – one for each of the two samples

– The two hypothesized means μ1 and μ2

– The two sample sizes n1 and n2

– The two sample means $\bar{x}_1$ and $\bar{x}_2$

– The two sample standard deviations s1 and s2

• We expect that $\bar{x}_1 - \bar{x}_2$ would be close to μ1 – μ2

Standard Error of the Test Statistic for a Single Mean

• For the test of one mean, to measure the deviation from the null hypothesis, it is logical to take

$$ \bar{x} - \mu $$

which has a standard deviation/standard error of approximately

$$ \sqrt{\frac{s^2}{n}} $$

Standard Error of the Test Statistic for the Difference of Two Means

• For the test of two means, to measure the deviation from the null hypothesis, it is logical to take

$$ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2) $$

which has a standard deviation/standard error of approximately

$$ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

t-Test Statistic for a Single Mean

• For the test of one mean, under certain appropriate conditions, the difference

$$ \bar{x} - \mu $$

is Student’s t with mean 0, and the test statistic

$$ t = \frac{\bar{x} - \mu}{\sqrt{s^2 / n}} $$

has Student’s t-distribution with n – 1 degrees of freedom

t-Test Statistic for the Difference of Two Means

• Thus for the test of two means, under certain appropriate conditions, the difference

$$ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2) $$

is approximately Student’s t with mean 0, and the test statistic

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

has an approximate Student’s t-distribution

Distribution of the t-statistic

• This is Welch’s approximation, that

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

has approximately a Student’s t-distribution

• The degrees of freedom is the smaller of n1 – 1 and n2 – 1

Note: Some computers and calculators compute the degrees of freedom for this t test statistic with a somewhat complicated formula. We will use the smaller of n1 – 1 and n2 – 1 as the degrees of freedom.
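To make the two choices of degrees of freedom concrete, the sketch below compares the rule used in these slides with the Welch–Satterthwaite approximation. That the "somewhat complicated formula" is Welch–Satterthwaite is an assumption, since the slides do not name it; the function names and the input numbers are illustrative.

```python
def conservative_df(n1, n2):
    """Degrees of freedom used in these slides: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_satterthwaite_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation (assumed to be the software formula)."""
    v1 = s1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Illustrative inputs: s1 = 3.3 with n1 = 40, s2 = 2.6 with n2 = 50.
df_simple = conservative_df(40, 50)
df_welch = welch_satterthwaite_df(3.3, 40, 2.6, 50)

print(df_simple, round(df_welch, 1))
```

The Welch–Satterthwaite value is never smaller than the smaller of n1 – 1 and n2 – 1, so the simple rule gives a slightly wider critical region boundary and is the more cautious of the two.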

A Special Case

• For the particular case where we believe that the two population means are equal, or μ1 = μ2, and the two sample sizes are equal, or n1 = n2, the test statistic becomes

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2}{n}}} $$

with n – 1 degrees of freedom, where n = n1 = n2

General Test Procedure

• Now for the overall structure of the test
  – Set up the hypotheses
  – Select the level of significance α
  – Compute the test statistic
  – Compare the test statistic with the appropriate critical values
  – Reach a do not reject or reject the null hypothesis conclusion

Assumptions

• In order for this method to be used, the data must meet certain conditions
  – Both samples are obtained using simple random sampling
  – The samples are independent
  – The populations are normally distributed, or the sample sizes are large (both n1 and n2 are at least 30)

• These are the usual conditions we need to make our Student’s t calculations

State Hypotheses & level of significance

• State our two-tailed, left-tailed, or right-tailed hypotheses

• State our level of significance α, often 0.10, 0.05, or 0.01

Compute the Test Statistic

• Compute the test statistic

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

and the degrees of freedom, the smaller of n1 – 1 and n2 – 1

• Compute the critical values (for the two-tailed, left-tailed, or right-tailed test)

Make a Statistical Decision

• Each of the types of tests can be solved using either the classical or the P-value approach

• Based on either of these methods, do not reject or reject the null hypothesis

Example

• We have two independent samples
  – The first sample of n = 40 items has a sample mean of 7.8 and a sample standard deviation of 3.3
  – The second sample of n = 50 items has a sample mean of 12.9 and a sample standard deviation of 2.6
  – We believe that the mean of the second population is exactly 4.0 larger than the mean of the first population
  – We use a level of significance α = .05

• We test H0: μ1 – μ2 = -4 versus H1: μ1 – μ2 ≠ -4

Example (continued)

• The test statistic is

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(7.8 - 12.9) - (-4.0)}{\sqrt{\frac{3.3^2}{40} + \frac{2.6^2}{50}}} = -1.72 $$

• This has a Student’s t-distribution with 39 degrees of freedom

• The two-tailed critical values are ±2.02, and -1.72 lies between them, so we do not reject the null hypothesis (notice: invT(.025,39) = -2.02 or use a t-table)

• Or, compute the P-value, which is 0.093, greater than the 0.05 level of significance (notice that 2*tcdf(-E99,-1.72,39) = 0.093)

• We do not have sufficient evidence to state that the deviation from 4.0 is significant
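The two-sample calculation can be checked with the Python standard library. This sketch takes the second sample mean as 12.9, the value consistent with the reported statistic of -1.72, and approximates the t tail probability by Simpson's rule because the standard library has no t-distribution. The helper names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


n1, x1_bar, s1 = 40, 7.8, 3.3
n2, x2_bar, s2 = 50, 12.9, 2.6   # 12.9 is the mean consistent with t = -1.72
hypothesized_diff = -4.0         # H0: mu1 - mu2 = -4

std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t_stat = ((x1_bar - x2_bar) - hypothesized_diff) / std_error
df = min(n1 - 1, n2 - 1)                      # rule used in these slides
p_value = 2 * t_upper_tail(abs(t_stat), df)   # two-tailed test
reject = p_value < 0.05

print(round(t_stat, 2), df, round(p_value, 3), reject)
```

The P-value is above 0.05, so the sketch reaches the same decision as the classical comparison with the critical values ±2.02.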

Construct and interpret confidence intervals regarding the difference of two independent means

Confidence Interval of μ1 – μ2

• Confidence intervals are of the form

Point estimate ± margin of error

• We can compare our confidence interval with the test statistic from our hypothesis test
  – The point estimate is $\bar{x}_1 - \bar{x}_2$
  – We use the denominator of the test statistic as the standard error
  – We use critical values from the Student’s t

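Confidence Interval of μ1 – μ2

• Thus a (1 – α) • 100% confidence interval for μ1 – μ2 is

Point estimate ± margin of error

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

Here $\bar{x}_1 - \bar{x}_2$ is the point estimate, the square root is the standard error, and t has the degrees of freedom that is the smaller of n1 – 1 and n2 – 1.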

Example

A recent study reported the longest average workweeks for non-supervisory employees in private industry to be those of chefs and construction workers.

Industry       n    Average Hours/Week   Standard Deviation
Chef           18   48.2                 6.7
Construction   12   44.1                 2.3

Find a 95% confidence interval for the difference in mean length of workweek between chefs and construction workers. Assume normality for the sampled populations and that the samples were selected randomly.

Example

1. Parameter of interest
The difference between the mean hours/week for chefs and the mean hours/week for construction workers, μ1 – μ2

2. The Confidence Interval Criteria

a. Assumptions: Both populations are assumed normal and the samples were random and independently selected

b. Test statistic: t with df = 11; the smaller of n1 – 1 = 18 – 1 = 17 or n2 – 1 = 12 – 1 = 11

c. Confidence level: 1 – α = 0.95

3. The Sample Evidence

Sample information given in the table
Point estimate for μ1 – μ2:

$$ \bar{x}_1 - \bar{x}_2 = 48.2 - 44.1 = 4.1 $$

Example

4. The Confidence Interval

a. Confidence coefficients: t0.025, 11 d.f. = 2.20

b. Margin of error:

$$ E = t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (2.20) \sqrt{\frac{6.7^2}{18} + \frac{2.3^2}{12}} = 3.77 $$

c. Confidence limits:

4.1 – 3.77 = 0.33 to 4.1 + 3.77 = 7.87

5. The Results

0.33 to 7.87 is a 95% confidence interval for the difference in mean hours/week for chefs and construction workers. (It also means that there is a significant difference between the mean hours/week for chefs and the mean hours/week for construction workers at the 0.05 level of significance, since the interval does not contain zero.)
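The workweek interval can be recomputed directly from the summary statistics in the table. This is a minimal sketch with the Python standard library; the critical value 2.20 is taken from the slide (a t-table lookup), and the variable names are illustrative.

```python
import math

n1, x1_bar, s1 = 18, 48.2, 6.7   # chefs
n2, x2_bar, s2 = 12, 44.1, 2.3   # construction workers

t_crit = 2.20                    # t(0.025, 11 d.f.), from the slide / a t-table

point_estimate = x1_bar - x2_bar
std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
margin = t_crit * std_error
lower, upper = point_estimate - margin, point_estimate + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```

Rounded to two decimals, the margin of error and both limits match the hand calculation: 3.77, 0.33, and 7.87.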

Summary

• Two sets of data are independent when observations in one have no effect on observations in the other

• In this case, the difference of the two sample means should be used in a Student’s t-test

• The overall process, other than the formula for the standard error, is the general hypothesis test and confidence interval process

Inference about Two Population Proportions

Learning Objectives

• Test hypotheses regarding two population proportions

• Construct and interpret confidence intervals for the difference between two population proportions

Test hypotheses regarding two population proportions

Inference about Two Proportions

• This progression should not be a surprise

• One mean and one proportion
  – Unit 7 – confidence intervals
  – Unit 8 – hypothesis tests

• Two means
  – Unit 9 – hypothesis tests and confidence intervals

• Now for two proportions …

Examples

• We now compare two proportions, testing whether they are the same or not

• Examples
  – The proportion of women (population one) who have a certain trait versus the proportion of men (population two) who have that same trait
  – The proportion of white sheep (population one) that have a certain characteristic versus the proportion of black sheep (population two) that have that same characteristic

Two Population Proportions

• The test of two population proportions is very similar, in process, to the test of one population proportion and the test of two population means

• The only major difference is that a different test statistic is used

• We will discuss the new test statistic through an analogy with the hypothesis test of one proportion

Test of One Proportion

• For the test of one proportion, we had the variables of
  – The hypothesized population proportion (p0)
  – The sample size (n)
  – The number with the certain characteristic (x)
  – The sample proportion ($\hat{p} = x/n$)

• We expect that $\hat{p}$ should be close to p0

Test of Two Proportions

• In the test of two proportions, we have two values for each variable – one for each of the two samples
  – The two hypothesized proportions (p1 and p2)
  – The two sample sizes (n1 and n2)
  – The two numbers with the certain characteristic (x1 and x2)
  – The two sample proportions ($\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$)

• We expect that $\hat{p}_1 - \hat{p}_2$ should be close to p1 – p2

Test Statistic of One Proportion

• For the test of one proportion, to measure the deviation from the null hypothesis, we took

$$ \hat{p} - p_0 $$

which has a standard deviation of

$$ \sqrt{\frac{p_0 (1 - p_0)}{n}} $$

Test Statistic of Two Proportions

• For the test of two proportions, to measure the deviation from the null hypothesis, it is logical to take

$$ (\hat{p}_1 - \hat{p}_2) - (p_1 - p_2) $$

which has a standard deviation of

$$ \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}} $$

Test Statistic for One Proportion

• For the test of one proportion, under certain appropriate conditions, the difference

$$ \hat{p} - p_0 $$

is approximately normal with mean 0, and the test statistic

$$ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}} $$

has an approximate standard normal distribution

Test Statistic for Two Proportions

• Thus for the test of two proportions, under certain appropriate conditions, the difference

$$ (\hat{p}_1 - \hat{p}_2) - (p_1 - p_2) $$

is approximately normal with mean 0, and the test statistic

$$ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}} $$

has an approximate standard normal distribution

Page 71

Test Statistic for Equal Proportions

• Consider the particular case where we believe that the two population proportions are equal, $p_1 = p_2$ (i.e. $p_1 - p_2 = 0$). Then

$$(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2) = \hat{p}_1 - \hat{p}_2$$

and

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Here, since the two population proportions are the same under the null hypothesis, we use $\hat{p}_c$, an estimated common proportion for both $p_1$ and $p_2$, which is computed by combining the two samples into one. That is,

$$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}$$
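
The pooled test statistic for equal proportions can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pooled_two_proportion_z` and the counts in the usage line are hypothetical.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (p_c, z) for testing H0: p1 = p2 with a pooled estimate.

    x1, x2 are the counts with the characteristic; n1, n2 the sample sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled (common) estimate: combine both samples into one
    p_c = (x1 + x2) / (n1 + n2)
    # Standard error under H0, using p_c in place of both p1 and p2
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    # Under H0 the hypothesized difference p1 - p2 is 0
    z = (p1_hat - p2_hat) / se
    return p_c, z


# Hypothetical counts, for illustration only
p_c, z = pooled_two_proportion_z(40, 100, 30, 100)
print(f"pooled estimate = {p_c:.3f}, z = {z:.3f}")
```

Swapping the two samples changes only the sign of z, which is a quick sanity check on any implementation.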

Page 72

General Test Procedure

• Now for the overall structure of the test

– Set up the hypotheses

– Select the level of significance α

– Compute the test statistic

– Compare the test statistic with the appropriate critical values

– Reach a do not reject or reject the null hypothesis conclusion

Page 73

Assumptions

• In order for this method to be used, the data must meet certain conditions

– Both samples are obtained independently using simple random sampling

– Each sample size is large

• These are the usual conditions we need to make our test of proportions calculations

Page 74

Hypotheses and Level of Significance

• State our two-tailed, left-tailed, or right-tailed hypotheses

• State our level of significance α, often 0.10, 0.05, or 0.01

Page 75

Test Statistic and Critical Values

• Compute the test statistic

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

which has an approximate standard normal distribution

• Compute the critical values (for the two-tailed, left-tailed, or right-tailed test)
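
Critical values for the three kinds of test come from the standard normal distribution. A minimal sketch using Python's standard library follows; the function name `critical_values` and the tail labels are illustrative choices, not taken from the slides.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Standard normal critical value(s) for significance level alpha.

    tail is "two", "left", or "right" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Split alpha equally between the two tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    raise ValueError("tail must be 'two', 'left', or 'right'")


# The three commonly used levels of significance
for alpha in (0.10, 0.05, 0.01):
    print(alpha, [round(c, 3) for c in critical_values(alpha, "two")])
```

In the classical approach, the null hypothesis is rejected when the test statistic falls beyond the critical value in the direction of the alternative.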

Page 76

Make Statistical Decision

• Each of the types of tests can be solved using either the classical or the P-value approach

• Based on either of these two methods, reject or do not reject the null hypothesis
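
The P-value approach can be sketched as follows. This is a hedged illustration: the names `p_value` and `decide` are hypothetical, the z value in the usage line is made up, and the rule "reject when the P-value is less than α" is the convention assumed here.

```python
from statistics import NormalDist


def p_value(z, tail):
    """P-value of a standard normal test statistic z.

    tail is "two", "left", or "right" (labels chosen for this sketch).
    """
    std_normal = NormalDist()
    if tail == "two":
        # Area in both tails beyond |z|
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'left', or 'right'")


def decide(z, alpha, tail):
    """Assumed convention: reject H0 when the P-value is below alpha."""
    return "reject H0" if p_value(z, tail) < alpha else "do not reject H0"


# Hypothetical test statistic, for illustration only
print(round(p_value(1.20, "two"), 4), decide(1.20, 0.05, "two"))
```

The classical and P-value approaches lead to the same decision for a given α.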

Page 77

Example

• We have two independent samples

– 55 out of a random sample of 100 students at one university are commuters

– 80 out of a random sample of 200 students at another university are commuters

– We wish to know if these two proportions are equal

– We use a level of significance α = .05

• Both sample sizes are large, so our method can be used

Page 78

Example (continued)

• The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.55 - 0.40}{\sqrt{0.45(1 - 0.45)\left(\dfrac{1}{100} + \dfrac{1}{200}\right)}} = 2.46$$

Notice that

$$\hat{p}_c = \frac{55 + 80}{100 + 200} = 0.45$$

• The critical values for a two-tailed test using the normal distribution are ±1.96. Since 2.46 > 1.96, we reject the null hypothesis

• Or, we calculate the P-value, which is 0.014, less than the 0.05 level of significance. (Notice: 2*normalcdf(2.46,E99) = 0.014)

• We conclude that the two proportions are significantly different
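
The arithmetic in the commuter example can be checked with a short Python script. This is a minimal sketch using only the standard library; the variable names are illustrative, and `statistics.NormalDist` stands in for the calculator's `normalcdf`.

```python
import math
from statistics import NormalDist

# Data from the example: commuters out of students sampled
x1, n1 = 55, 100
x2, n2 = 80, 200
alpha = 0.05

p1_hat = x1 / n1                 # sample proportion, first university
p2_hat = x2 / n2                 # sample proportion, second university
p_c = (x1 + x2) / (n1 + n2)      # pooled estimate under H0: p1 = p2

# Pooled standard error and test statistic (p1 - p2 = 0 under H0)
se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_val = 2 * (1 - std_normal.cdf(abs(z)))     # two-tailed P-value

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, P-value = {p_val:.3f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```

Both the classical comparison and the P-value lead to rejecting the null hypothesis, in agreement with the slide.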

Page 79

Confidence Interval of p1 – p2

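• Thus confidence intervals are

Point estimate ± margin of error

$$(\hat{p}_1 - \hat{p}_2) \pm z(\alpha/2)\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

where $\hat{p}_1 - \hat{p}_2$ is the point estimate and the square root is the standard error.

Here, for calculating the standard error, we use separate estimates of the population proportions, $\hat{p}_1$ and $\hat{p}_2$, instead of the common estimate $\hat{p}_c$.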

Page 80

Example

A consumer group compared the reliability of two similar microcomputers from two different manufacturers. The proportion requiring service within the first year after purchase was determined for samples from each of two manufacturers.

Find a 98% confidence interval for $p_1 - p_2$, the difference in proportions needing service

| Manufacturer | Sample Size | Proportion Needing Service |
|---|---|---|
| 1 | 200 | 0.15 |
| 2 | 250 | 0.09 |

Page 81

Example (continued)

1. Population Parameter of Interest: The difference between the proportion of microcomputers needing service for manufacturer 1 and the proportion of microcomputers needing service for manufacturer 2, that is, $p_1 - p_2$

2. Point estimate: $\hat{p}_1 - \hat{p}_2 = 0.15 - 0.09 = 0.06$

3. Confidence coefficient: $z(\alpha/2) = z(0.01) = 2.33$

[Figure: standard normal curve with central area 0.98 and area 0.01 in each tail, marking z(0.01) = 2.33]

Page 82

Example (continued)

• Margin of error:

$$E = 2.33\sqrt{\frac{(0.15)(0.85)}{200} + \frac{(0.09)(0.91)}{250}} = 0.0724$$

• Confidence limits: 0.06 – 0.0724 = -0.0124 to 0.06 + 0.0724 = 0.1324

Results: -0.0124 to 0.1324 is a 98% confidence interval for the difference in proportions
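
The interval for the microcomputer example can be reproduced with a short Python sketch. The function name `two_proportion_ci` is hypothetical. The code uses the unrounded normal quantile (about 2.326) rather than the table value 2.33, so its margin of error comes out slightly smaller than the 0.0724 computed by hand.

```python
import math
from statistics import NormalDist


def two_proportion_ci(p1_hat, n1, p2_hat, n2, confidence):
    """Return (lower, upper, margin) for a confidence interval on p1 - p2.

    Uses separate (unpooled) estimates in the standard error.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z(alpha/2)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z * se
    diff = p1_hat - p2_hat                    # point estimate
    return diff - margin, diff + margin, margin


# Values from the example: manufacturers 1 and 2
lower, upper, margin = two_proportion_ci(0.15, 200, 0.09, 250, 0.98)
print(f"margin = {margin:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Because the interval contains 0, these data do not rule out equal service rates for the two manufacturers at the 98% confidence level.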

Page 83

Summary

• We can compare proportions from two independent samples

• We use a formula with the combined sample sizes and proportions for the standard error

• The overall process, other than the formula for the standard error, is the general hypothesis test and confidence interval process

Page 84

Inferences on Two Samples

Summary

Page 85

Summary

• The process of hypothesis testing is very similar across the testing of different parameters

• The major steps in hypothesis testing are

– Formulate the appropriate null and alternative hypotheses

– Calculate the test statistic

– Determine the appropriate critical value or values

– Reach the reject / do not reject conclusions

Page 86

Tests for Means and Proportions

• Similarities in hypothesis test processes

| Parameter | Mean (one population) | Two Means (Independent) | Two Means (Dependent) | Two Proportions |
|---|---|---|---|---|
| H0: | μ = μ0 | μ1 = μ2 | μ1 = μ2 | p1 = p2 |
| (2-tailed) H1: | μ ≠ μ0 | μ1 ≠ μ2 | μ1 ≠ μ2 | p1 ≠ p2 |
| (L-tailed) H1: | μ < μ0 | μ1 < μ2 | μ1 < μ2 | p1 < p2 |
| (R-tailed) H1: | μ > μ0 | μ1 > μ2 | μ1 > μ2 | p1 > p2 |
| Test statistic | Difference | Difference | Difference | Difference |
| Critical value | Normal | Normal | Student t | Normal |

Page 87

Summary

• We can test whether sample data from two different samples supports a hypothesis claim about a population mean or proportion

• For two population means, there are two cases

– Dependent (or matched-pair) samples

– Independent samples

• All of these tests follow very similar processes, differing only in their test statistics and the distributions for their critical values