CHEE824 - Winter 2006J. McLellan1 Background Slides for CHEE824 Hypothesis tests –For comparison...


CHEE824 - Winter 2006

J. McLellan 1

Background Slides for CHEE824

• Hypothesis tests

– For comparison of means

– Comparison of variances

– Discussion of power of a hypothesis test - type I and type II errors

• Joint confidence regions (for the linear case)


Hypothesis Tests

… are an alternative to confidence limits for taking uncertainty into account in decision-making

Approach
– make a hypothesis statement

– use an appropriate test statistic for the statement

– consider the range of values of the test statistic that would be likely to occur if the hypothesis were true

– compare the value of the test statistic computed from the data to this range - if it falls outside the range (a significant result), the hypothesis is rejected; otherwise the hypothesis is not rejected ("accepted")


Example

Naphtha reformer in a refinery
» under old catalyst, octane number was 90
» under new catalyst, an average octane number of 92 has been estimated using a sample of 4 data points
» standard deviation of octane number in the unit is known to be 1.5
» has the octane number improved significantly?

» We could use confidence limits to answer this question
• for the mean, with known variance
• form the interval, and see whether the old value (90) is contained in the interval for the new mean

» alternatively, consider a direct test … a hypothesis test
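A minimal sketch of the confidence-limit route, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper name `mean_ci_known_sigma` is hypothetical. It forms a two-sided 95% interval for the new mean from the slide's numbers (average 92, σ = 1.5, n = 4):

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}, about 1.96 for 95%
    half_width = z * sigma / sqrt(n)                    # z_{alpha/2} * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = mean_ci_known_sigma(xbar=92, sigma=1.5, n=4)
print(f"95% CI for new mean: ({low:.2f}, {high:.2f}); contains 90: {low < 90 < high}")
```

The interval comes out at roughly 90.53 to 93.47, so the old value of 90 is not contained in it.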


Example

Hypothesis test -

Null hypothesis » $H_0: \mu = 90$ ("status quo")

Alternate hypothesis » $H_a: \mu > 90$

– approach
» mean is estimated using the sample average
» if the observed average is within reasonable variation limits of the old mean, conclude that no significant change has occurred
» reference distribution - Standard Normal


Example

» to compare with the Standard Normal, we must standardize
» if the mean under the new catalyst was actually the old mean, then

$$ \frac{\bar{X} - 90}{\sigma/\sqrt{4}} $$

would be distributed as a Standard Normal random variable
• observed values would vary accordingly

» now choose a fence - a limit below which 95% of the values of the Standard Normal lie

» if the observed value exceeds the fence, then it is unlikely that the mean under the new catalyst is equal to the old mean
• there is only a small chance of obtaining an observed average outside this range

» if the value exceeds the fence, reject the null hypothesis


Example

» Compute the test statistic value using the observed average of 92:

$$ \frac{\bar{x} - 90}{\sigma/\sqrt{4}} = \frac{92 - 90}{1.5/\sqrt{4}} = 2.67 $$

» now determine the fence - test at the 5% significance level (95% confidence level) - upper tail area is 0.05
• z = 1.65

» compare: 2.67 > 1.65 - conclude that the mean must be significantly higher, since the likelihood of obtaining an average of 92 when the true mean is 90 is very small

(Fence: upper tail area is 0.05.) We only use the upper tail here, because we are interested in testing whether the new mean is greater than the old mean.
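The same calculation as a short Python sketch (standard library only; the function name `upper_tail_z_test` is our own). It computes the fence exactly rather than using the rounded table value 1.65:

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 vs Ha: mu > mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))      # standardized test statistic
    fence = NormalDist().inv_cdf(1 - alpha)   # fence with upper-tail area alpha
    return z, fence, z > fence                # reject H0 if z exceeds the fence


z, fence, reject = upper_tail_z_test(xbar=92, mu0=90, sigma=1.5, n=4)
print(f"z = {z:.2f}, fence = {fence:.3f}, reject H0: {reject}")
```

This prints `z = 2.67, fence = 1.645, reject H0: True`, matching the conclusion above.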


Example

– there is a small chance (0.05) that we could obtain an observed average lying outside the fence even though the mean had not changed

» in this case, we would erroneously reject the null hypothesis, and conclude that the catalyst had caused a significant increase

» referred to as a “Type I error” - false rejection
• this would happen 5% of the time when the null hypothesis is true
• to reduce it, move the fence further toward the extreme of the distribution - reduce the upper tail area

α = 0.05 is the “significance level”
• (1 - α) is sometimes referred to as the “confidence level”

α is a tuning parameter for the hypothesis test
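To see the 5% false-rejection rate directly, the following sketch simulates many experiments in which the mean really is still 90 and counts how often the one-sided test rejects anyway. The simulation settings (20,000 trials, seed 1) and the function name are arbitrary choices for illustration:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_false_rejection_rate(mu0=90.0, sigma=1.5, n=4, alpha=0.05,
                                   trials=20_000, seed=1):
    """Fraction of simulated experiments, generated with the mean unchanged
    (H0 true), in which the upper-tail z-test nevertheless rejects H0."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    fence = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > fence:
            rejections += 1                    # a Type I error
    return rejections / trials


rate = simulated_false_rejection_rate()
print(f"simulated Type I error rate: {rate:.3f} (alpha = 0.05)")
```

The printed rate should land close to 0.05; lowering `alpha` moves the fence out and lowers the rate.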


Hypothesis Tests

Review sequence
1) formulate hypothesis: $H_0: \mu = 90$, $H_a: \mu > 90$

2) form test statistic: $\dfrac{\bar{X} - 90}{\sigma/\sqrt{4}}$

3) compare to “fence” value z = 1.65

4) in this case, reject null hypothesis


Types of Hypothesis Tests

One-sided tests
– null hypothesis - parameter equal to old value
– alternate hypothesis - parameter >, < old value
» e.g., $H_0: \mu = 90$, $H_a: \mu > 90$

Two-sided tests
– null hypothesis - parameter equal to old value
– alternate hypothesis - parameter not equal to old value (could be greater than or less than)
» e.g., $H_0: \mu = 90$, $H_a: \mu \neq 90$

In two-sided tests, two fences are used (upper, lower), and the significance area α is split evenly between the lower and upper tails (α/2 in each).


Hypothesis Tests for Means

… with known variance

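Two-Sided Test - at the α significance level

Hypotheses: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$

Test Statistic: $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Fences: $-z_{\alpha/2}$, $z_{\alpha/2}$

Reject H0 if $\left|\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| > z_{\alpha/2}$

(rejection region: area α/2 in each tail, outside the fences)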


Hypothesis Tests for Means

… with known variance

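One-Sided Test - at the α significance level

Hypotheses: $H_0: \mu = \mu_0$, $H_a: \mu > \mu_0$

Test Statistic: $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Fence: $z_{\alpha}$

Reject H0 if $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha}$

(rejection region: upper tail, area α)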


Hypothesis Tests for Means

… with known variance

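One-Sided Test - at the α significance level

Hypotheses: $H_0: \mu = \mu_0$, $H_a: \mu < \mu_0$

Test Statistic: $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Fence: $z_{1-\alpha} = -z_{\alpha}$

Reject H0 if $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < -z_{\alpha}$

(rejection region: lower tail, area α)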
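The three known-variance tests differ only in where the fences sit. A compact sketch covering all three, assuming the standard-library `statistics.NormalDist` for the z quantiles (the function name `z_test` and its `alternative` argument are hypothetical):

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """z-test for a mean with known sigma; returns (statistic, reject_H0)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":   # Ha: mu != mu0, fences at -z_{alpha/2}, z_{alpha/2}
        return z, abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":     # Ha: mu > mu0, fence at z_alpha
        return z, z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":        # Ha: mu < mu0, fence at z_{1-alpha} = -z_alpha
        return z, z < std_normal.inv_cdf(alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


z, reject = z_test(92, 90, 1.5, 4, alternative="greater")
print(f"z = {z:.2f}, reject H0: {reject}")
```

For the octane example the one-sided ("greater") test rejects H0, and the two-sided version also rejects because 2.67 > 1.96.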


Hypothesis Tests for Means

When the variance is unknown, we estimate it using the sample variance.

Test statistic
– use “standardization” with the sample standard deviation:

$$ \frac{\bar{X} - \mu_0}{s/\sqrt{n}} $$

Reference distribution
– becomes the Student’s t distribution
– degrees of freedom are those of the sample variance
» n-1


Hypothesis Tests for Means

… with unknown variance

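Two-Sided Test - at the α significance level

Hypotheses: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$

Test Statistic: $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Fences: $t_{n-1,1-\alpha/2} = -t_{n-1,\alpha/2}$, $t_{n-1,\alpha/2}$

Reject H0 if $\left|\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}\right| > t_{n-1,\alpha/2}$

(rejection region: area α/2 in each tail, outside the fences)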


Hypothesis Tests for Means

… with unknown variance

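One-Sided Test - at the α significance level

Hypotheses: $H_0: \mu = \mu_0$, $H_a: \mu > \mu_0$

Test Statistic: $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Fence: $t_{n-1,\alpha}$

Reject H0 if $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha}$

(rejection region: upper tail, area α)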


Hypothesis Tests for Means

… with unknown variance

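One-Sided Test - at the α significance level

Hypotheses: $H_0: \mu = \mu_0$, $H_a: \mu < \mu_0$

Test Statistic: $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Fence: $t_{n-1,1-\alpha} = -t_{n-1,\alpha}$

Reject H0 if $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} < -t_{n-1,\alpha}$

(rejection region: lower tail, area α)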
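A sketch of the three t-tests for a mean. Python's standard library has no t-distribution quantile function, so this assumes the fence is looked up in a t table (or computed elsewhere, for example with SciPy's `scipy.stats.t.ppf`) and passed in; `t_test` is a hypothetical helper name:

```python
from math import sqrt
from statistics import mean, stdev


def t_test(data, mu0, fence, alternative="two-sided"):
    """One-sample t-test of H0: mu = mu0 with the variance estimated from the data.

    fence is read from a t table with n - 1 degrees of freedom:
    t_{n-1, alpha/2} for "two-sided", t_{n-1, alpha} for "greater" or "less".
    Returns (t statistic, degrees of freedom, reject_H0).
    """
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))  # standardized with s, not sigma
    if alternative == "two-sided":
        reject = abs(t) > fence
    elif alternative == "greater":
        reject = t > fence
    elif alternative == "less":
        reject = t < -fence                           # t_{n-1, 1-alpha} = -t_{n-1, alpha}
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return t, n - 1, reject
```

For example, with 5 observations and a two-sided test at the 5% level, the fence would be $t_{4,0.025} = 2.776$.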


Hypothesis Tests for Variances

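• Hypotheses
» e.g., $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 \neq \sigma_0^2$

• Test Statistic
» since

$$ \frac{s^2}{\sigma^2} \sim \frac{\chi^2_{n-1}}{n-1} $$

then, under the null hypothesis ($\sigma^2 = \sigma_0^2$),

$$ \frac{(n-1)\,s^2}{\sigma_0^2} \sim \chi^2_{n-1} $$

which serves as the Test Statistic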


Hypothesis Tests for Variances

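Two-Sided Test - at the α significance level

Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 \neq \sigma_0^2$

Test Statistic: $\dfrac{(n-1)s^2}{\sigma_0^2}$

Fences: $\chi^2_{n-1,1-\alpha/2}$, $\chi^2_{n-1,\alpha/2}$

Reject H0 if $\dfrac{(n-1)s^2}{\sigma_0^2} < \chi^2_{n-1,1-\alpha/2}$ or $\dfrac{(n-1)s^2}{\sigma_0^2} > \chi^2_{n-1,\alpha/2}$

(Rejection region: area α/2 in each tail)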


Hypothesis Tests for Variances

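One-Sided Test - at the α significance level

Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 > \sigma_0^2$

Test Statistic: $\dfrac{(n-1)s^2}{\sigma_0^2}$

Fence: $\chi^2_{n-1,\alpha}$

Reject H0 if $\dfrac{(n-1)s^2}{\sigma_0^2} > \chi^2_{n-1,\alpha}$

(Rejection region: upper tail, area α)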


Hypothesis Tests for Variances

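One-Sided Test - at the α significance level

Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 < \sigma_0^2$

Test Statistic: $\dfrac{(n-1)s^2}{\sigma_0^2}$

Fence: $\chi^2_{n-1,1-\alpha}$

Reject H0 if $\dfrac{(n-1)s^2}{\sigma_0^2} < \chi^2_{n-1,1-\alpha}$

(Rejection region: lower tail, area α)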
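A sketch of the χ² test for a variance, again taking the fences from a χ² table because the Python standard library has no χ² quantile function; `chi2_variance_test` and its keyword names are hypothetical:

```python
from statistics import variance


def chi2_variance_test(data, sigma0_sq, lower_fence=None, upper_fence=None):
    """Test H0: sigma^2 = sigma0^2 using (n - 1) s^2 / sigma0^2 ~ chi-squared(n - 1).

    Fences come from a chi-squared table with n - 1 degrees of freedom:
      two-sided:              lower_fence = chi2_{n-1, 1-alpha/2}, upper_fence = chi2_{n-1, alpha/2}
      Ha: sigma^2 > sigma0^2: upper_fence = chi2_{n-1, alpha} only
      Ha: sigma^2 < sigma0^2: lower_fence = chi2_{n-1, 1-alpha} only
    Returns (test statistic, degrees of freedom, reject_H0).
    """
    n = len(data)
    stat = (n - 1) * variance(data) / sigma0_sq   # variance() uses the n - 1 divisor
    too_low = lower_fence is not None and stat < lower_fence
    too_high = upper_fence is not None and stat > upper_fence
    return stat, n - 1, too_low or too_high
```

Passing only `upper_fence` gives the test against $H_a: \sigma^2 > \sigma_0^2$, passing only `lower_fence` gives the test against $H_a: \sigma^2 < \sigma_0^2$, and passing both gives the two-sided test.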


Outline

• random samples
• notion of a statistic
• estimating the mean - sample average
• assessing the impact of variation on estimates - sampling distribution
• estimating variance - sample variance and standard deviation
• making decisions - comparisons of means, variances using confidence intervals, hypothesis tests
• comparisons between samples


Comparisons Between Two Samples

So far, we have tested means and variances against known values

» can we compare estimates of means (or variances) between two samples?

» Issue - uncertainty is present in both quantities, and must be considered

Common Question
» do both samples come from the same underlying parent population?
» e.g., compare populations before and after a specific treatment


Preparing to Compare Samples

Experimental issues
» ensure that data are collected in a randomized order for each sample
• ensure that there are no systematic effects - e.g., catalyst deactivation, changes in ambient conditions, cooling water heating up gradually

» blocking - subject the experimental runs to the same conditions - ensure that quantities other than those of interest aren’t changing


Comparison of Variances

… is typically conducted prior to comparing means
» recall that the standardization required for a hypothesis test (or confidence interval) for the mean uses the standard deviation, so we should compare variances first before choosing the appropriate mean comparison

Approach
» focus on the ratio of variances, $\sigma_1^2/\sigma_2^2$
• is this ratio = 1?
• will be assessed using sample variances

» what should we use for a reference distribution?


Comparison of Variances

Test Statistic
– for use in both hypothesis tests and confidence intervals

The quantity

$$ \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1} \quad \text{(F-distribution)} $$

» $n_1$ and $n_2$ are the number of points in the samples used to compute $s_1^2$ and $s_2^2$ respectively


The F Distribution

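… arises from the ratio of two Chi-squared random variables, each divided by its degrees of freedom

» the sample variance is a sum of squared deviations of Normal random variables from their average

» dividing by the population variance standardizes them, and the expression becomes a sum of squared standard Normal r.v.’s, i.e., Chi-squared (here divided by its degrees of freedom):

$$ \frac{s_1^2}{\sigma_1^2} \sim \frac{\chi^2_{n_1-1}}{n_1-1} $$

» taking the ratio for the two samples gives

$$ \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1} $$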


Confidence Interval Approach

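Form probability statement for this test statistic:

$$ P\left( F_{n_1-1,n_2-1,1-\alpha/2} < \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} < F_{n_1-1,n_2-1,\alpha/2} \right) = 1-\alpha $$

and rearrange:

$$ P\left( \frac{s_1^2}{s_2^2\,F_{n_1-1,n_2-1,\alpha/2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2\,F_{n_1-1,n_2-1,1-\alpha/2}} \right) = 1-\alpha $$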


Confidence Interval Approach

100(1-α)% Confidence Interval

$$ \frac{s_1^2}{s_2^2\,F_{n_1-1,n_2-1,\alpha/2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2\,F_{n_1-1,n_2-1,1-\alpha/2}} $$

Approach:
» compute confidence interval
» determine whether “1” lies in the interval
• if so - identical variances is a reasonable conjecture
• if not - different variances


Hypothesis Test Approach

Typical approach
– use a 1-sided test, with the test direction dictated by which variance is larger

Test Statistic

$$ \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2}{s_2^2} $$

Under the null hypothesis, we are assuming that

$$ \frac{\sigma_1^2}{\sigma_2^2} = 1 $$


Hypothesis Tests for Variances

One-Sided Test - at the α significance level

For $s_1^2 > s_2^2$

Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_a: \sigma_1^2 > \sigma_2^2$

Test Statistic: $\dfrac{s_1^2}{s_2^2}$

Fence: $F_{n_1-1,\,n_2-1,\,\alpha}$

Reject H0 if $\dfrac{s_1^2}{s_2^2} > F_{n_1-1,\,n_2-1,\,\alpha}$


Hypothesis Tests for Variances

One-Sided Test - at the α significance level

For $s_2^2 > s_1^2$

Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_a: \sigma_2^2 > \sigma_1^2$

Test Statistic: $\dfrac{s_2^2}{s_1^2}$

Fence: $F_{n_2-1,\,n_1-1,\,\alpha}$

Reject H0 if $\dfrac{s_2^2}{s_1^2} > F_{n_2-1,\,n_1-1,\,\alpha}$

Why the reversal?


Why the reversal?

• Property of F-distribution

• typically, we would compare $s_1^2/s_2^2$ against the lower-tail fence $F_{n_1-1,\,n_2-1,\,1-\alpha}$

• Problem
» tables for upper tail areas of 1-α are not always available

• Solution - use the following fact for F-distributions:

$$ F_{\nu_1,\nu_2,1-\alpha} = \frac{1}{F_{\nu_2,\nu_1,\alpha}} $$

• to use this, reverse the test ratio - previous slide


Example

Global warming problem from tutorial:
» s1 - standard deviation for March ‘99 is 3.2 °C
» s2 - standard deviation for March ‘98 is 2.3 °C
» has the variance of temperature readings increased in 1999?

» first, work with variances:
• 1999 -- 10.2 °C²
• 1998 -- 5.3 °C²

» since a) we are interested in whether the variance increased, and b) the 1999 variance (10.2) is greater than the 1998 variance (5.3), use the ratio $s_1^2/s_2^2$

Each variance is estimated using 31 data points.


Example

Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_a: \sigma_1^2 > \sigma_2^2$

» observed value of ratio = 1.94 (from the unrounded standard deviations, $3.2^2/2.3^2 = 10.24/5.29$)
» “fence value” - test at the 5% significance level:
• $F_{31-1,\,31-1,\,0.05} = 1.84$

» since the observed value of the test statistic exceeds the fence value, reject the null hypothesis
• variance has increased

Note
» if we had conducted the test at the 1% significance level ($F_{30,30,0.01} = 2.39$), we would not have rejected the null hypothesis
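The same F-test as a short Python sketch, using the example's standard deviations and the table fences quoted on this slide (1.84 at 5%, 2.39 at 1%); the function name is hypothetical:

```python
def variance_increase_f_test(s1, s2, fence):
    """Upper-tail F-test of H0: sigma1^2 = sigma2^2 vs Ha: sigma1^2 > sigma2^2.

    s1, s2: sample standard deviations, with s1 the larger one.
    fence:  F_{n1-1, n2-1, alpha} read from an F table.
    """
    ratio = s1 ** 2 / s2 ** 2      # test statistic s1^2 / s2^2
    return ratio, ratio > fence


# March 1999 vs March 1998 temperatures, 31 readings each (30 and 30 degrees of freedom)
ratio, reject_5pct = variance_increase_f_test(3.2, 2.3, fence=1.84)   # F_{30,30,0.05}
_, reject_1pct = variance_increase_f_test(3.2, 2.3, fence=2.39)       # F_{30,30,0.01}
print(f"ratio = {ratio:.2f}, reject at 5%: {reject_5pct}, reject at 1%: {reject_1pct}")
```

It prints a ratio of 1.94, rejects at the 5% level and does not reject at the 1% level, consistent with the slide.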


Example

Now use confidence intervals to compare variances:

» use a 95% confidence interval - outer tail area is 2.5% on each side

» this is a 2-tailed interval, so we need

$$ \frac{s_1^2}{s_2^2\,F_{n_1-1,n_2-1,\alpha/2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2\,F_{n_1-1,n_2-1,1-\alpha/2}} $$

with

$$ F_{n_1-1,n_2-1,\alpha/2} = F_{31-1,31-1,0.025} = 2.07 $$

$$ F_{n_1-1,n_2-1,1-\alpha/2} = F_{n_1-1,n_2-1,0.975} = \frac{1}{F_{n_2-1,n_1-1,0.025}} = \frac{1}{F_{31-1,31-1,0.025}} = \frac{1}{2.07} = 0.48 $$


Example

Confidence interval:

$$ \frac{10.2}{5.3\,(2.07)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{10.2}{5.3\,(0.48)} \;\Rightarrow\; 0.93 < \frac{\sigma_1^2}{\sigma_2^2} < 4.0 $$

Conclusion
» since 1 is contained in this interval, we conclude that the variances can be considered the same (we cannot conclude that they differ)
» why does the conclusion differ from the hypothesis test?
• 2-sided confidence interval vs. 1-sided hypothesis test
• in the confidence interval, 1 is close to the lower boundary
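The interval arithmetic on this slide, sketched in Python (the function name is hypothetical; the F values 2.07 and 1/2.07 are the table values used in the example):

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper, f_lower):
    """100(1-alpha)% confidence interval for sigma1^2 / sigma2^2.

    f_upper: F_{n1-1, n2-1, alpha/2}    (upper-tail area alpha/2)
    f_lower: F_{n1-1, n2-1, 1-alpha/2}  = 1 / F_{n2-1, n1-1, alpha/2}
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower


f_upper = 2.07                     # F_{30,30,0.025} from the example
low, high = variance_ratio_ci(10.2, 5.3, f_upper, 1 / f_upper)
print(f"{low:.2f} < sigma1^2/sigma2^2 < {high:.2f}; contains 1: {low < 1 < high}")
```

Using 1/2.07 unrounded gives an upper limit of about 3.98 rather than 4.0; either way 1 lies inside the interval.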


Comparing Means

The appropriate approach depends on:
» whether variances are known
» whether a test of sample variances indicates that variances can be considered to be equal
• measurements coming from the same population

Assumption: data are Normally distributed

The approach is similar; however, the form depends on the conditions above
» form test statistic
» use reference distribution
» re-arrange (confidence intervals) or compare to fence (hypothesis tests)


Comparing Means

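Known Variances
» if variances are known ($\sigma_1^2$, $\sigma_2^2$), then

$$ \bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) $$

» now we can standardize to obtain our test statistic

$$ \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim Z $$

Note - we are assuming that the samples used for the averages are independent.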


Comparing Means

Known Variances

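Confidence Interval
» form probability statement for test statistic as a Standard Normal random variable
» re-arrange interval
» procedure analogous to that for mean with known variance

$$ (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$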


Comparing Means

Known Variances

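Hypothesis Test - Two-Sided Test

Hypotheses: $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$

Test Statistic: $\dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

Fences: $-z_{\alpha/2}$, $z_{\alpha/2}$

Reject H0 if $\dfrac{|\bar{X}_1 - \bar{X}_2|}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} > z_{\alpha/2}$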
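A sketch of the known-variance comparison of means, assuming independent samples and Python's `statistics.NormalDist` for $z_{\alpha/2}$; `two_sample_z` is a hypothetical name:

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1bar, x2bar, var1, var2, n1, n2, alpha=0.05):
    """Known-variance comparison of means.

    Returns (z statistic, reject H0: mu1 = mu2 two-sided, CI for mu1 - mu2).
    """
    se = sqrt(var1 / n1 + var2 / n2)      # std. dev. of X1bar - X2bar (independent samples)
    z = (x1bar - x2bar) / se              # statistic under H0: mu1 = mu2
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (x1bar - x2bar - z_half * se, x1bar - x2bar + z_half * se)
    return z, abs(z) > z_half, ci
```

If 0 lies inside the returned interval, the two-sided test does not reject, and vice versa.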


Comparing Means

Unknown Variance
– appropriate choice depends on whether variances can be considered equal or are different
» test using comparison of variances
» if variances can be considered to be equal, assume that we are sampling from populations with the same variance
» pool the variance estimates to obtain an estimate with more degrees of freedom


Pooling Variance

– If variances can reasonably be considered to be the same, then we can assume that we are sampling from populations with the same variance

» convert sample variances back to sums of squares, add them together, and divide by the combined number of degrees of freedom

$$ s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1} \left(X_{1,i} - \bar{X}_1\right)^2 \;\Rightarrow\; (n_1 - 1)\,s_1^2 = \sum_{i=1}^{n_1} \left(X_{1,i} - \bar{X}_1\right)^2 $$

» can follow a similar procedure for $s_2^2$


Pooling Variance

– We have obtained the original sum of squares from each sample variance

– combine to form overall sum of squares

$$ SS_{overall} = (n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2 $$

– degrees of freedom

$$ \nu_{overall} = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2 $$

– pooled variance estimate

$$ s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} $$


Comparing Means

Unknown Variance - “Equal Variances”

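Confidence Intervals

» since variance is estimated, we use the t-distribution as a reference distribution

» degrees of freedom ν = (n1-1) + (n2-1)

$$ (\bar{X}_1 - \bar{X}_2) - t_{\nu,\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{\nu,\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

» recall that $t_{\nu,1-\alpha/2} = -t_{\nu,\alpha/2}$

» if 0 lies in this interval, we cannot conclude that the means are different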


Comparing Means

Unknown Variance - “Equal Variances”

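Hypothesis Test

Hypotheses: $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$

Test Statistic: $\dfrac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

Fences: $t_{\nu,1-\alpha/2} = -t_{\nu,\alpha/2}$, $t_{\nu,\alpha/2}$

Reject H0 if $\dfrac{|\bar{X}_1 - \bar{X}_2|}{s_p\sqrt{1/n_1 + 1/n_2}} > t_{\nu,\alpha/2}$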
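A sketch of the pooled (equal-variance) t-test, including the pooled variance estimate. It assumes the fence $t_{\nu,\alpha/2}$ with $\nu = n_1 + n_2 - 2$ is supplied from a t table, since the Python standard library has no t quantile function; `pooled_t_test` is a hypothetical helper:

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(sample1, sample2, t_fence):
    """Two-sided equal-variance t-test of H0: mu1 = mu2.

    t_fence: t_{nu, alpha/2} from a t table, with nu = n1 + n2 - 2.
    Returns (t statistic, nu, pooled variance, reject_H0).
    """
    n1, n2 = len(sample1), len(sample2)
    nu = n1 + n2 - 2
    # pooled variance: add the two sums of squares, divide by the combined degrees of freedom
    sp_sq = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / nu
    t = (mean(sample1) - mean(sample2)) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, nu, sp_sq, abs(t) > t_fence
```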


Comparing Means

Unknown Variance - “Unequal Variances”
– test becomes an approximation

• approach
» test statistic

$$ \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

» reference distribution - Student’s t distribution
» estimate an “equivalent” number of degrees of freedom


Comparing Means

Unknown Variance - “Unequal Variances”
– equivalent number of degrees of freedom

– degrees of freedom ν is the largest integer less than or equal to

$$ \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} $$
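A sketch of the unequal-variance calculations. It assumes the Welch-Satterthwaite form with $(n_1 - 1)$ and $(n_2 - 1)$ in the denominator (some texts use a variant with $(n_i + 1)$ and a final $-2$), and rounds down to an integer as the slide specifies; both function names are hypothetical:

```python
from math import floor, sqrt


def welch_dof(s1_sq, n1, s2_sq, n2):
    """Equivalent degrees of freedom for the unequal-variance comparison of means
    (assumed Welch-Satterthwaite form), rounded down to an integer."""
    a, b = s1_sq / n1, s2_sq / n2
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return floor(nu)


def welch_statistic(x1bar, x2bar, s1_sq, n1, s2_sq, n2):
    """(X1bar - X2bar) / sqrt(s1^2/n1 + s2^2/n2), to be compared with t_{nu, alpha/2}."""
    return (x1bar - x2bar) / sqrt(s1_sq / n1 + s2_sq / n2)
```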


Comparing Means

Unknown Variance - “Unequal Variances”

Confidence Intervals
» similar to the case of known variances, but using sample variances and the t-distribution

$$ (\bar{X}_1 - \bar{X}_2) - t_{\nu,\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{\nu,\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

» degrees of freedom ν is the effective number of degrees of freedom (from previous slide)

» recall that $t_{\nu,1-\alpha/2} = -t_{\nu,\alpha/2}$

» if 0 isn’t contained in the interval, conclude that the means differ


Comparing Means

Unknown Variance - “Unequal Variances”

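Hypothesis Test

Hypotheses: $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$

Test Statistic: $\dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$

Fences: $t_{\nu,1-\alpha/2} = -t_{\nu,\alpha/2}$, $t_{\nu,\alpha/2}$

Reject H0 if $\dfrac{|\bar{X}_1 - \bar{X}_2|}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} > t_{\nu,\alpha/2}$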


Paired Comparisons for Means

Previous approach
» 2 data sets obtained from 2 processes
» compute average, sample variance for EACH data set
» compare differences between sample averages

Issue -
» extraneous variation is present because we have conducted one experimental program for process 1, and one distinct experimental program for process 2

» additional variation reduces the sensitivity of tests
• location of fences depends in part on extent of variation

» can we conduct experiments in a paired manner so that they have as much variation in common as possible, and extraneous variation is eliminated?


Paired Comparisons of Means

Approach -
» set up pairs of experimental runs with as much in common as possible
» collect pairs of observations for each experimental run -- process 1, process 2
» compute differences
» conduct a confidence interval or hypothesis test on the mean of the differences, using the average of the differences in the test statistic
• variance estimated using the sample variance of the differences

» test to see if the mean of the differences is plausibly zero (no difference in population means)


Paired Comparison of Means

Example - oxide thickness on silicon wafers

» runs at two positions in a furnace

» run pairs of tests with a wafer in each location

Furnace Position
  A      B      difference (A - B)
  920    923     -3
  914    924    -10
  927    913     14
  891    881     10
  943    923     20
  902    884     18
  910    887     23
  856    858     -2
  937    916     21
  857    857      0

average of differences     9.1
variance of differences    141.6556
std. dev. of differences   11.90191


Paired Comparison of Means

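Confidence Interval

$$ \bar{D} - t_{n-1,\alpha/2}\,\frac{s_d}{\sqrt{n}} < \mu_1 - \mu_2 < \bar{D} + t_{n-1,\alpha/2}\,\frac{s_d}{\sqrt{n}} $$

» $\bar{D}$, $s_d$ are the average and standard deviation of the differences

» conclude that there is no significant difference between the means if zero is contained in the interval

» n is the number of data points in the paired samples (e.g., 10 pairs)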


Paired Comparison of Means

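Hypothesis Test

Hypotheses: $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$

Test Statistic: $\dfrac{\bar{D}}{s_d/\sqrt{n}}$

Fences: $t_{n-1,1-\alpha/2} = -t_{n-1,\alpha/2}$, $t_{n-1,\alpha/2}$

Reject H0 if $\dfrac{|\bar{D}|}{s_d/\sqrt{n}} > t_{n-1,\alpha/2}$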
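The paired analysis of the oxide-thickness data, sketched in Python. The fence $t_{9,0.025} = 2.262$ is a standard t-table value for 10 pairs at the 5% level (two-sided); it is not given on the slides, and `paired_comparison` is a hypothetical name:

```python
from math import sqrt
from statistics import mean, stdev

# Oxide thickness readings from the example table (furnace positions A and B, 10 wafer pairs)
position_a = [920, 914, 927, 891, 943, 902, 910, 856, 937, 857]
position_b = [923, 924, 913, 881, 923, 884, 887, 858, 916, 857]


def paired_comparison(a, b, t_fence):
    """Paired t analysis on the differences a - b.

    t_fence: t_{n-1, alpha/2} from a t table (n = number of pairs).
    Returns (average difference, std. dev. of differences, t statistic, confidence interval).
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)
    std_err = s_d / sqrt(n)
    ci = (d_bar - t_fence * std_err, d_bar + t_fence * std_err)
    return d_bar, s_d, d_bar / std_err, ci


d_bar, s_d, t, ci = paired_comparison(position_a, position_b, t_fence=2.262)  # t_{9,0.025}
print(f"D-bar = {d_bar:.1f}, s_d = {s_d:.3f}, t = {t:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With these readings the sketch reproduces the average difference of 9.1 and standard deviation of about 11.90, gives t ≈ 2.42 (larger than 2.262), and an interval of roughly 0.59 to 17.6 that excludes zero.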


“Tuning” Hypothesis Tests

What significance level should we use for a hypothesis test?

Rejection region has area α. If the null hypothesis were actually true, there is probability α that the observed value would fall outside the fences, and we would erroneously reject the null hypothesis - FALSE REJECTION - referred to as a Type I error


Adjusting the False Rejection Rate

… is achieved by moving the fences further out
» use a higher threshold as a basis to reject the null hypothesis
» i.e., make the outer tail area α SMALLER
» e.g., instead of testing at 5% significance (95% confidence level), test at 1% significance level (99% confidence level)


Failure to Detect

Suppose the mean has actually increased.

False rejection region with area = α

Failure to detect region - observed values of the test statistic falling in this region should in fact lead to rejection; however, they don't, because they fall within the acceptance region - FAILURE TO REJECT, referred to as a Type II error, which has probability β of occurring


Failure to Detect

The probability of a type II error depends on:
– the size of the shift to be detected
– the location of the fence -- significance level (Type I error probability)

– both influence the degree of overlap of the two distributions, and thus the overlap area

Area = β


Failure to Detect

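Schematic: Distribution for X-bar is standardized as:

$$ \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} $$

however, if the true mean has shifted, this is not a standard Normal random variable. If the new mean has shifted by Δμ, then we must use

$$ \frac{\bar{X} - \mu_0 - \Delta\mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} - \frac{\Delta\mu}{\sigma/\sqrt{n}} $$

as the standardized form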


Failure to Detect

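Computing β for a 1-sided hypothesis test:
» outer tail area on the high side is α
» fence value is $z_\alpha$
» type II error probability is $\beta = P\left( \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} < z_\alpha \right)$, where X̄ has mean μ0 + Δμ
» in order to compute the probability of a type II error, convert to standard normal:

$$\beta = P\left( Z < z_\alpha - \frac{\Delta\mu}{\sigma / \sqrt{n}} \right)$$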
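A minimal sketch of this calculation, assuming a one-sided (upper) z-test with known σ. The function name `type_ii_error` is hypothetical, and the true shift of 2 octane units is an assumed value used together with the naphtha example's σ = 1.5 and n = 4.

```python
from statistics import NormalDist

def type_ii_error(shift, sigma, n, alpha):
    """beta for a one-sided (upper) z-test with known sigma.

    beta = P(Z < z_alpha - shift / (sigma / sqrt(n))), Z standard normal.
    """
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha)          # fence value
    return std.cdf(z_alpha - shift / (sigma / n ** 0.5))

# Assumed case: true shift of 2 units, sigma = 1.5, n = 4, alpha = 0.05
beta = type_ii_error(shift=2.0, sigma=1.5, n=4, alpha=0.05)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

For these assumed numbers β comes out at roughly 0.15, i.e., a power of about 0.85. With a zero shift the function returns 1 − α, as it should.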


Failure to Detect

Introduce

$$\phi = \frac{\Delta\mu}{\sigma}$$

» the size of the shift as a multiple of the standard deviation of X (population)
» no analytical expression for β
» summarize in graphs referred to as Operating Characteristic Curves
» 1 − β is called the POWER of the hypothesis test


Operating Characteristic Curve

• Example shape of the curve

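[Figure: operating characteristic curves for a fixed value of α, plotting the probability of failing to reject (0 to 1) against the size of shift (0 to 4), with curves for n = 1, n = 5 and n = 50; the curves drop off more quickly with increasing sample size n.]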
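Curves of this shape can be tabulated with a short script. This sketch assumes a one-sided (upper) z-test with known variance at α = 0.05, for which β = Φ(z_α − φ√n); the function `oc_beta` and the grid of shifts are illustrative assumptions, and the slide's own curves may have been drawn for a different test.

```python
from statistics import NormalDist

def oc_beta(phi, n, alpha=0.05):
    """Probability of failing to reject (beta), one-sided z-test, known sigma.

    phi is the shift in population standard deviations (shift / sigma),
    so shift / (sigma / sqrt(n)) = phi * sqrt(n).
    """
    std = NormalDist()
    return std.cdf(std.inv_cdf(1.0 - alpha) - phi * n ** 0.5)

shifts = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
for n in (1, 5, 50):
    row = ", ".join(f"{oc_beta(phi, n):.3f}" for phi in shifts)
    print(f"n = {n:>2}: {row}")
```

Each row starts at 1 − α for a zero shift and falls toward zero; larger n makes the fall steeper, matching the trend in the figure.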


Operating Characteristic Curve

• Illustrates trade-off between false detection/failure to detect for fixed sample size

• Use - examples
» given desired false detection and failure to detect rates, determine the sample size required to detect a given shift
» given sample size and false detection rate, determine the failure to detect rate for a given size of shift

[Diagram relating sample size n, failure to detect rate and false detection rate]
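For the first use listed above, the one-sided, known-variance case has a simple sizing rule: requiring the failure to detect rate to be at most β means $z_\alpha - \phi\sqrt{n} \le -z_\beta$, so $n \ge ((z_\alpha + z_\beta)/\phi)^2$. The sketch below applies it; the function name and the example values (φ = 1, α = 0.05, β = 0.10) are assumptions for illustration, not values from the slides.

```python
import math
from statistics import NormalDist

def required_sample_size(phi, alpha, beta):
    """Smallest n with failure-to-detect rate <= beta for a shift of phi sigma.

    One-sided z-test, known sigma: need z_alpha - phi * sqrt(n) <= -z_beta,
    i.e. n >= ((z_alpha + z_beta) / phi) ** 2.
    """
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha)
    z_beta = std.inv_cdf(1.0 - beta)
    return math.ceil(((z_alpha + z_beta) / phi) ** 2)

n = required_sample_size(phi=1.0, alpha=0.05, beta=0.10)
print("required n:", n)
```

For these values the rule gives n = 9. For the unknown-variance tests the slides point to OC curves instead, since this closed form relies on σ being known.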


Operating Characteristic Curves

… are available for:
» 2-sided hypothesis test for mean
• variance known
• variance unknown
» 1-sided hypothesis test for mean
• variance known
• variance unknown
» tests for variance


Joint Confidence Region (JCR)

… answers the question

Where do the true values of the parameters lie?

Recall that for individual parameters, we gain an understanding of where the true value lies by:

» examining the variability pattern (distribution) of the parameter estimate
» identifying a range in which most values of the parameter estimate are likely to lie
» manipulating this range to determine an interval that is likely to contain the true value of the parameter


Joint Confidence Region

Confidence interval for individual parameter:

Step 1) The deviation of the estimate from the true parameter value, divided by the estimated standard deviation of the estimate, follows a Student's t-distribution with ν degrees of freedom, equal to the degrees of freedom of the noise variance estimate:

$$\frac{\hat{\beta}_i - \beta_i}{s_{\hat{\beta}_i}} \sim t_\nu$$

Step 2) Find the interval $[-t_{\nu,\alpha/2},\ t_{\nu,\alpha/2}]$ which contains 100(1−α)% of values, i.e., the probability of a t-value falling in this interval is (1−α)

Step 3) Rearrange this interval to obtain the interval

$$\hat{\beta}_i \pm t_{\nu,\alpha/2}\, s_{\hat{\beta}_i}$$

which contains the true value of the parameter 100(1−α)% of the time
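As a sketch of Steps 1 to 3, the code below computes marginal 95% intervals for the solder thickness example that appears later in these slides ($X^TX$, $\hat\beta$ and $s_\varepsilon^2 = 135.38$, with n = 10 and p = 2, so ν = 8). The value $t_{8,0.025} = 2.306$ is taken from tables rather than computed, and the estimated covariance $s_\varepsilon^2 (X^TX)^{-1}$ is the usual least squares result, assumed here rather than stated on this slide.

```python
import math

# Solder thickness example values (intercept, slope)
XtX = [[10.0, 2367.0], [2367.0, 563335.0]]
beta_hat = [458.10, -1.13]
s2 = 135.38          # noise variance estimate, nu = n - p = 8 degrees of freedom
t_crit = 2.306       # t_{8, 0.025} from tables (95% two-sided)

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Estimated covariance matrix of the parameter estimates: s^2 (X^T X)^{-1}
cov = [[s2 * v for v in row] for row in inverse_2x2(XtX)]
for i, name in enumerate(("intercept", "slope")):
    se = math.sqrt(cov[i][i])                 # s_{beta_hat_i}
    lo, hi = beta_hat[i] - t_crit * se, beta_hat[i] + t_crit * se
    print(f"{name}: {beta_hat[i]} +/- {t_crit * se:.3f} -> [{lo:.3f}, {hi:.3f}]")
```

The slope interval comes out at roughly −1.13 ± 0.48 and the intercept interval at roughly 458.10 ± 115.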


Joint Confidence Region

Comments on Individual Confidence Intervals:
» sometimes referred to as marginal confidence intervals - cf. marginal distributions vs. joint distributions from earlier
» marginal confidence intervals do NOT account for correlations between the parameter estimates
» examining only marginal confidence intervals can sometimes be misleading if there is strong correlation between several parameter estimates
• the value of one parameter estimate depends in part on another
• deletion of the other changes the value of the parameter estimate
• the decision to retain might be altered


Joint Confidence Region

Sequence:

Step 1) Identify a statistic which is a function of the parameter estimate statistics

Step 2) Identify a region in which values of this statistic lie a certain fraction of the time (a 100(1−α)% region)

Step 3) Use this information to determine a region which contains the true values of the parameters 100(1−α)% of the time


Joint Confidence Region

The quantity

$$\frac{(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})}{p\, s_\varepsilon^2} \sim F_{p,\,n-p}$$

is the ratio of two sums of squares, and is distributed as an F-distribution with p degrees of freedom in the numerator and n−p degrees of freedom in the denominator. Here $s_\varepsilon^2$ is the estimate of the inherent noise variance (if the MSE is used, its degrees of freedom are n−p).
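The F-distribution claim can be checked by simulation. The sketch below repeatedly generates data from a straight-line model, fits it by least squares, and counts how often the ratio falls at or below $F_{2,8,0.95} \approx 4.46$ (a table value). The design x = 1, ..., 10, the true parameters and σ are all hypothetical choices; if the claim holds, the fraction should be close to 0.95.

```python
import random

def simulate_jcr_coverage(reps=4000, seed=1):
    """Fraction of simulated data sets whose JCR ratio is <= F_{2,8,0.95}."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 11)]   # hypothetical design, n = 10
    n, p = len(x), 2
    beta0, beta1, sigma = 2.0, 0.5, 1.0    # hypothetical true parameters and noise s.d.
    f_crit = 4.46                          # F_{2,8,0.95} from tables
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    xbar = sx / n
    ssx = sxx - n * xbar * xbar            # corrected sum of squares of x
    hits = 0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        ybar = sum(y) / n
        b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
        b0 = ybar - b1 * xbar
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        s2 = sse / (n - p)                 # MSE, n - p degrees of freedom
        d0, d1 = beta0 - b0, beta1 - b1
        # (beta - beta_hat)^T X^T X (beta - beta_hat), with X^T X = [[n, sx], [sx, sxx]]
        q = n * d0 * d0 + 2 * sx * d0 * d1 + sxx * d1 * d1
        if q / (p * s2) <= f_crit:
            hits += 1
    return hits / reps

print(f"fraction inside: {simulate_jcr_coverage():.3f}")
```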


Joint Confidence Region

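We can define a region by thinking of those values of the ratio which are less than $F_{p,n-p,1-\alpha}$, i.e.,

$$\frac{(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})}{p\, s_\varepsilon^2} \le F_{p,\,n-p,\,1-\alpha}$$

Rearranging yields:

$$(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) \le p\, s_\varepsilon^2\, F_{p,\,n-p,\,1-\alpha}$$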


Joint Confidence Region - Definition

The 100(1−α)% joint confidence region for the parameters is defined as those parameter values β satisfying:

$$(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) \le p\, s_\varepsilon^2\, F_{p,\,n-p,\,1-\alpha}$$

Interpretation:

» the region defined by this inequality contains the true values of the parameters 100(1−α)% of the time

» if values of zero for one or more parameters lie in this region, those parameters are plausibly zero, and consideration should be given to dropping the corresponding terms from the model
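The definition can be turned into a membership test. This sketch uses the solder thickness numbers from the example that follows ($X^TX$, $\hat\beta$, $s_\varepsilon^2 = 135.38$, and $F_{2,8,0.95} = 4.46$ from tables). For the "plausibly zero" check it minimizes the quadratic form over the intercept with the slope fixed at zero, which is one way (assumed here, not taken from the slides) to ask whether any point with a zero slope lies in the region. The function names are hypothetical.

```python
# Solder thickness example values (intercept, slope)
XtX = [[10.0, 2367.0], [2367.0, 563335.0]]
beta_hat = [458.10, -1.13]
s2, p = 135.38, 2
f_crit = 4.46                       # F_{2, 8, 0.95} from tables
bound = p * s2 * f_crit             # right-hand side of the JCR inequality

def quad_form(beta):
    """(beta - beta_hat)^T X^T X (beta - beta_hat) for two parameters."""
    d0, d1 = beta[0] - beta_hat[0], beta[1] - beta_hat[1]
    return XtX[0][0] * d0 * d0 + 2 * XtX[0][1] * d0 * d1 + XtX[1][1] * d1 * d1

def in_jcr(beta):
    """True if the parameter vector lies inside (or on) the joint confidence region."""
    return quad_form(beta) <= bound

# Smallest value of the quadratic form anywhere on the line beta1 = 0:
# minimising over beta0 gives d1^2 * det(X^T X) / (X^T X)[0][0].
d1 = 0.0 - beta_hat[1]
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
min_q_zero_slope = d1 * d1 * det / XtX[0][0]
print(f"bound = {bound:.2f}, min on slope = 0 line = {min_q_zero_slope:.1f}")
print("zero slope plausible:", min_q_zero_slope <= bound)
```

Here the smallest value on the line β1 = 0 (about 3915) exceeds the bound of about 1207.59, so a zero slope is not plausible for any intercept.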


Joint Confidence Region - Example with 2 Parameters

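Let's reconsider the solder thickness example:

$$X^T X = \begin{bmatrix} 10 & 2367 \\ 2367 & 563335 \end{bmatrix}, \qquad \hat{\beta} = \begin{bmatrix} 458.10 \\ -1.13 \end{bmatrix}, \qquad s_\varepsilon^2 = 135.38$$

95% Joint Confidence Region (JCR) for slope & intercept:

$$(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) = \begin{bmatrix} \beta_0 - \hat{\beta}_0 & \beta_1 - \hat{\beta}_1 \end{bmatrix} X^T X \begin{bmatrix} \beta_0 - \hat{\beta}_0 \\ \beta_1 - \hat{\beta}_1 \end{bmatrix} \le p\, s_\varepsilon^2\, F_{p,\,n-p,\,1-\alpha} = 2\, s_\varepsilon^2\, F_{2,\,10-2,\,0.95}$$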


Joint Confidence Region - Example with 2 Parameters

95% Joint Confidence Region (JCR) for slope & intercept:

$$\begin{bmatrix} \beta_0 - 458.10 & \beta_1 + 1.13 \end{bmatrix} X^T X \begin{bmatrix} \beta_0 - 458.10 \\ \beta_1 + 1.13 \end{bmatrix} \le 2\,(135.38)\, F_{2,\,8,\,0.95} = 2\,(135.38)(4.46) = 1207.59$$

The boundary is an ellipse...
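To draw the ellipse, one option (an assumption here, not necessarily how the slide's figure was produced) is to map the unit circle through a Cholesky factor $X^TX = LL^T$: for $u = (\cos\theta, \sin\theta)$, the deviation $d = \sqrt{c}\, L^{-T} u$ satisfies $d^T X^TX\, d = c$, with c = 1207.59. The sketch below generates boundary points for the solder example in plain Python; `boundary_point` is a hypothetical name.

```python
import math

# Solder thickness example values (intercept, slope)
XtX = [[10.0, 2367.0], [2367.0, 563335.0]]
beta_hat = [458.10, -1.13]
c = 2 * 135.38 * 4.46          # p * s^2 * F_{2,8,0.95}, about 1207.59

# Hand-rolled 2 x 2 Cholesky factor: X^T X = L L^T, L lower triangular
l11 = math.sqrt(XtX[0][0])
l21 = XtX[1][0] / l11
l22 = math.sqrt(XtX[1][1] - l21 * l21)

def boundary_point(theta):
    """(intercept, slope) on the JCR boundary for angle theta on the unit circle."""
    u0, u1 = math.cos(theta), math.sin(theta)
    # Solve L^T e = u by back substitution; then e^T (X^T X) e = u^T u = 1
    e1 = u1 / l22
    e0 = (u0 - l21 * e1) / l11
    r = math.sqrt(c)
    return beta_hat[0] + r * e0, beta_hat[1] + r * e1

points = [boundary_point(2 * math.pi * k / 72) for k in range(72)]
print(min(b0 for b0, _ in points), max(b0 for b0, _ in points))
```

The resulting (intercept, slope) pairs can be passed to any plotting tool; the intercept extent is about ±149 around 458.10.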


Joint Confidence Region - Example with 2 Parameters

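[Figure: 95% joint confidence region, with intercept on the horizontal axis (marks at 320 and 600) and slope on the vertical axis (marks at −1.6 and −0.6).]

» centred at the least squares parameter estimates
» rotated, which implies correlation between the estimates of slope and intercept
» greater "shadow" along the horizontal axis, so the variance of the intercept estimate is greater than that of the slope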
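The rotation and the shadows can be quantified from the estimated covariance matrix $s_\varepsilon^2 (X^TX)^{-1}$. This sketch computes the variances and the correlation of the intercept and slope estimates for the solder example values; it is an illustration under the usual least squares assumptions, not part of the original slides.

```python
import math

# Solder thickness example values
XtX = [[10.0, 2367.0], [2367.0, 563335.0]]
s2 = 135.38

det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]
cov = [[s2 * v for v in row] for row in inv]        # estimated covariance of (b0, b1)
corr = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])  # correlation of the estimates
print(f"var(intercept) = {cov[0][0]:.1f}, var(slope) = {cov[1][1]:.4f}, corr = {corr:.4f}")
```

The correlation is close to −1, consistent with a long, narrow, rotated ellipse.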


Interpreting Joint Confidence Regions

1) Are the axes of the ellipse aligned with the coordinate axes?

» i.e., is the ellipse horizontal or vertical?

» if so, this indicates no correlation between the parameter estimates

2) Which axis has the greatest shadow?

» projection of the ellipse onto the axis

» indicates which parameter estimate has the greatest variance

3) The elliptical region is, by definition, centred at the least squares parameter estimates

4) Long, narrow, rotated ellipses indicate significant correlation between parameter estimates

5) If a value of zero for one or more parameters lies in the region, these parameters are plausibly zero - consider deleting from model


Joint Confidence Regions

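What is the motivation for the ratio

$$\frac{(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})}{p\, s_\varepsilon^2}$$

used to define the joint confidence region?

Consider the joint distribution for the parameter estimates:

$$f(\hat{\beta}) = \frac{1}{(2\pi)^{p/2} \det(\Sigma_{\hat{\beta}})^{1/2}} \exp\left\{ -\tfrac{1}{2} (\hat{\beta} - \beta)^T \Sigma_{\hat{\beta}}^{-1} (\hat{\beta} - \beta) \right\}$$

Substitute in the estimate of the parameter covariance matrix, $\hat{\Sigma}_{\hat{\beta}} = (X^T X)^{-1} s_\varepsilon^2$:

$$(\beta - \hat{\beta})^T \left( (X^T X)^{-1} s_\varepsilon^2 \right)^{-1} (\beta - \hat{\beta}) = \frac{(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})}{s_\varepsilon^2}$$

Contours of constant density are therefore ellipsoids defined by this quadratic form. With the true noise variance in place of $s_\varepsilon^2$, the quadratic form has a χ² distribution with p degrees of freedom; dividing by p and using $s_\varepsilon^2$ (with n−p degrees of freedom) gives the F-distributed ratio above.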


Confidence Intervals from Densities

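[Figure, two panels. Individual interval: density $f_{\hat{\beta}}(b)$, with the interval from lower to upper enclosing area = 1 − α. Joint region: joint density $f_{\hat{\beta}_0,\hat{\beta}_1}(b_0, b_1)$ over the $(b_0, b_1)$ plane, with the joint confidence region enclosing volume = 1 − α.]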


Relationship to Marginal Confidence Limits

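[Figure: joint confidence region with intercept on the horizontal axis (marks at 320 and 600) and slope on the vertical axis (marks at −1.6 and −0.6), centred at the least squares parameter estimates, with the marginal confidence interval for the intercept and the marginal confidence interval for the slope marked.]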


Relationship to Marginal Confidence Limits

[Figure: same axes as above, overlaying the 95% confidence region for the parameters considered jointly (the ellipse) with the 95% confidence region implied by considering the parameters individually (the rectangle formed by the marginal confidence intervals for intercept and slope).]


Relationship to Marginal Confidence Intervals

The marginal confidence intervals lie within the projections ("shadows") of the joint confidence region onto the parameter axes

» potential to miss portions of plausible parameter values at the tails of the ellipsoid

» using individual confidence intervals implies a rectangular region, which includes sets of parameter values that lie outside the joint confidence region

» both situations can lead to
• erroneous acceptance of terms in the model
• erroneous rejection of terms in the model
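Both failure modes can be seen numerically in the solder example. Working with deviations $d = \beta - \hat\beta$, the sketch below checks a corner of the rectangle formed by the marginal 95% intervals ($t_{8,0.025} = 2.306$ from tables) and a hypothetical point d = (140, −0.59) lying along the ellipse's long axis against the JCR bound ($F_{2,8,0.95} = 4.46$ from tables). The choice of test points is an illustrative assumption.

```python
import math

# Solder thickness example values (intercept, slope)
XtX = [[10.0, 2367.0], [2367.0, 563335.0]]
s2, t_crit, f_crit, p = 135.38, 2.306, 4.46, 2

det = XtX[0][0] * XtX[1][1] - XtX[0][1] ** 2
half = [t_crit * math.sqrt(s2 * XtX[1][1] / det),   # intercept marginal half-width
        t_crit * math.sqrt(s2 * XtX[0][0] / det)]   # slope marginal half-width
bound = p * s2 * f_crit                              # JCR right-hand side

def quad_form(d0, d1):
    """d^T X^T X d for a deviation d = beta - beta_hat."""
    return XtX[0][0] * d0 * d0 + 2 * XtX[0][1] * d0 * d1 + XtX[1][1] * d1 * d1

def in_rectangle(d0, d1):
    """Inside both marginal 95% intervals."""
    return abs(d0) <= half[0] and abs(d1) <= half[1]

corner = (half[0], half[1])   # corner of the marginal-interval rectangle
tail = (140.0, -0.59)         # hypothetical point along the ellipse's long axis
for name, (d0, d1) in (("corner", corner), ("tail", tail)):
    print(name, "rectangle:", in_rectangle(d0, d1), "JCR:", quad_form(d0, d1) <= bound)
```

The corner is inside the rectangle but far outside the JCR, while the tail point is inside the JCR but outside the marginal interval for the intercept.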