Hang Seng Management College STA102 Statistical Analysis for Business

Transcript of Hang Seng Management College STA102 Statistical Analysis ...

Page 1: Hang Seng Management College STA102 Statistical Analysis ...

Hang Seng Management College

STA102

Statistical Analysis for Business

Page 2: Hang Seng Management College STA102 Statistical Analysis ...

Lecturer: Mr Alex Yan Hon Wong

Office: Rm M252

Email: [email protected]

Page 3: Hang Seng Management College STA102 Statistical Analysis ...

STA102

Chapter 2: Hypothesis Testing

Page 4: Hang Seng Management College STA102 Statistical Analysis ...

1. Introduction

Example: Weight of packets of rice in supermarket.

Suppose the designated weight of each packet of rice is 5 kg.

Goal: We want to know whether the packets of rice in this supermarket are underweight.

Page 5: Hang Seng Management College STA102 Statistical Analysis ...

1. Introduction

A random sample of packets of rice is selected and weighed.

If the mean weight is slightly less than 5 kg, can we say that the packets of rice in this supermarket are underweight?

What if the mean weight is far less than 5 kg?

Page 6: Hang Seng Management College STA102 Statistical Analysis ...

1. Introduction

How do we define "slightly less" and "far less"?

How large (small) is large (small)?

We use hypothesis testing to answer the above questions!

Page 7: Hang Seng Management College STA102 Statistical Analysis ...

1. Introduction

Example: Weight of packets of rice is 5kg

(1) Confidence Interval: What is the range of the mean weight?

(2) Hypothesis Testing: Underweight or not?

In hypothesis testing, we suspect that the mean weight is smaller than it should be.

We want to test whether our claim is correct or not, based on the sample data.

Page 8: Hang Seng Management College STA102 Statistical Analysis ...

1. Introduction

Set up 2 opposing hypotheses:

– Underweight vs not underweight

– Test based on a sample to confirm

Page 9: Hang Seng Management College STA102 Statistical Analysis ...

2. Procedure

(1) Set up 2 hypotheses: Null hypothesis and Alternative hypothesis.

(2) Assume the null hypothesis is true throughout the procedure. It can be rejected only at the end of the analysis.

(3) Identify Type I and Type II errors to determine the significance level (will discuss more details later).

Page 10: Hang Seng Management College STA102 Statistical Analysis ...

2. Procedure

(4) Choose a test statistic based on the parameter to be tested (e.g. z-statistic, t-statistic).

(5) Identify the rejection region, or compute the p-value.

(6) Draw a conclusion.

Page 11: Hang Seng Management College STA102 Statistical Analysis ...

3. Null Hypothesis vs Alternative Hypothesis

3.1 Null Hypothesis (H0)

- Tests the population parameter
- Assumed to be true at the beginning
- Always with an "=" sign
- Represents the status quo

Page 12: Hang Seng Management College STA102 Statistical Analysis ...

3. Null Hypothesis vs Alternative Hypothesis

3.2 Alternative Hypothesis (H1)

- Considered as true only when there exists sufficient evidence to show that the null hypothesis is false.
- With a ">", "<" or "≠" sign.

Page 13: Hang Seng Management College STA102 Statistical Analysis ...

3. Null Hypothesis vs Alternative Hypothesis

The hypotheses must be mutually exclusive and collectively exhaustive.

Mutually exclusive: the null and alternative hypotheses cannot both be true.

Collectively exhaustive: one of the two (the null or the alternative hypothesis) must be true.

Page 14: Hang Seng Management College STA102 Statistical Analysis ...

3.3 Examples of Setting the 2 Hypotheses

Example 1: Criminal trial

H0: The defendant is innocent

H1: The defendant is guilty

We assume that H0 is true at the beginning; only after the verdict can we reject or not reject H0. If H0 is rejected, the conclusion is that the defendant is guilty.

Page 15: Hang Seng Management College STA102 Statistical Analysis ...

Example 2: To test if the mean age of a large group of students is more than 17.

H0: µ = 17    H1: µ > 17

Example 3: To test if packs of 5-kg rice of a particular brand are underweight on average.

H0: µ = 5    H1: µ < 5

Page 16: Hang Seng Management College STA102 Statistical Analysis ...

Example 4: To test if the proportion of households with a monthly income of more than $18,000 is greater than 0.8.

H0: p = 0.8    H1: p > 0.8

Example 5: To test if the mean waiting time for a particular bus route is less than 5 minutes.

H0: µ = 5    H1: µ < 5

Page 17: Hang Seng Management College STA102 Statistical Analysis ...

Example 6: To test if the mean mark of a test is equal to 75.

H0: µ = 75    H1: µ ≠ 75

Page 18: Hang Seng Management College STA102 Statistical Analysis ...

4. Type I and Type II Errors

4.1 Type I error

-- Occurs when we reject a true null hypothesis (reject H0 when H0 is true)

-- P(Type I error) = α, which is also called the significance level.

Page 19: Hang Seng Management College STA102 Statistical Analysis ...

4. Type I and Type II Errors

4.2 Type II error

-- Occurs when we do not reject a false null hypothesis (do not reject H0 when H0 is false)

-- P(Type II error) = β

Page 20: Hang Seng Management College STA102 Statistical Analysis ...

Table for Type I and Type II errors

                      H0 is true          H0 is false
Reject H0             Type I error        Correct decision
Do not reject H0      Correct decision    Type II error

Note: It is important to identify the Type I and Type II errors in each testing problem.

Think about the consequence of committing each error; reducing a particular error may lead to a more desirable outcome.
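The table above can be read as a small lookup: given the decision and the (usually unknown) truth about H0, which outcome occurred? The sketch below is only an illustration; the function name `classify_outcome` is a hypothetical helper, not part of the course material.

```python
def classify_outcome(reject_h0: bool, h0_is_true: bool) -> str:
    """Return the cell of the Type I / Type II error table."""
    if reject_h0 and h0_is_true:
        return "Type I error"      # rejected a true H0
    if not reject_h0 and not h0_is_true:
        return "Type II error"     # failed to reject a false H0
    return "Correct decision"      # the two remaining cells


# Print all four cells of the table.
for reject_h0 in (True, False):
    for h0_is_true in (True, False):
        decision = "Reject H0" if reject_h0 else "Do not reject H0"
        truth = "H0 is true" if h0_is_true else "H0 is false"
        print(f"{decision:18} | {truth:12} | {classify_outcome(reject_h0, h0_is_true)}")
```

In practice the truth of H0 is unknown, so the function only helps in reasoning about what could have gone wrong after a decision is made.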

Page 21: Hang Seng Management College STA102 Statistical Analysis ...

4.3 Reconsider the Criminal Trial Example

H0: The defendant is innocent

H1: The defendant is guilty

Assumption: H0 is true

Committing a Type I error (probability = α)

-- Reject H0 when H0 is true

Consequence:

-- Putting innocent people in jail

Page 22: Hang Seng Management College STA102 Statistical Analysis ...

4.3 Reconsider the Criminal Trial Example

H0: The defendant is innocent

H1: The defendant is guilty

Assumption: H0 is true

Committing a Type II error (probability = β)

-- Do not reject H0 when H0 is false

Consequence:

-- Setting the guilty one free

Page 23: Hang Seng Management College STA102 Statistical Analysis ...

The Type I error probability α and the Type II error probability β are inversely related (for a fixed sample size). Any attempt to reduce one will increase the other.

α ↓ implies β ↑, and α ↑ implies β ↓

Suppose α ↓ and β ↑:

α = Prob(Putting innocent people in jail) ↓

β = Prob(Setting the guilty one free) ↑

Prob(Putting guilty people in jail) ↓

Page 24: Hang Seng Management College STA102 Statistical Analysis ...

The Type I error probability α and the Type II error probability β are inversely related (for a fixed sample size). Any attempt to reduce one will increase the other.

α ↓ implies β ↑, and α ↑ implies β ↓

Suppose α ↑ and β ↓:

α = Prob(Putting innocent people in jail) ↑

β = Prob(Setting the guilty one free) ↓

Prob(Putting guilty people in jail) ↑
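The trade-off between α and β can be checked numerically. The sketch below assumes a right-tailed z-test with known σ and uses made-up illustrative numbers (µ0 = 100, a true mean of 103, σ = 15, n = 50); none of these values come from the slides, and `type_ii_error` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """Beta for H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)      # right-tailed critical value
    x_crit = mu0 + z_crit * standard_error        # reject H0 if x-bar exceeds this
    # Beta = P(do not reject H0) = P(x-bar <= x_crit) under the true mean.
    return NormalDist(mu_true, standard_error).cdf(x_crit)


# Illustrative (assumed) values, not taken from the lecture.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=100, mu_true=103, sigma=15, n=50)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.4f}")
```

With the sample size held fixed, each reduction in α pushes the critical value further out, so β grows.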

Page 25: Hang Seng Management College STA102 Statistical Analysis ...

Minimize the type of error that we consider more serious. Usually, α is set to be around 5%.

Note: Our justice system is set up so that the probability of a Type I error (α) is small.

In the Criminal Trial Example, a small Type I error probability implies that the probability of putting innocent people in jail is small.

Page 26: Hang Seng Management College STA102 Statistical Analysis ...

Example 7

Mr. Wong, the owner of a bread shop, estimated that he could sell at least 100 loaves of bread per day. He carried out a hypothesis test and concluded that the null hypothesis should be rejected.

(a) Which type of error (I or II) could he have made?

(b) Set up the null and alternative hypotheses.

(c) What is the consequence of committing such an error?

Page 27: Hang Seng Management College STA102 Statistical Analysis ...

Example 7(a)

Given information:
-- He carried out a hypothesis test and concluded that the null hypothesis should be rejected.

Solution:

Type I error: reject H0 when H0 is true. Since H0 was rejected, this is the only type of error he could have made.

Page 28: Hang Seng Management College STA102 Statistical Analysis ...

Example 7(b)

Given information:
-- Mr. Wong estimated that he could sell at least 100 loaves of bread per day (status quo)

Solution:

H0: demand ≥ 100 loaves of bread
H1: demand < 100 loaves of bread

Because H0 is always written with an "=" sign, this is stated as

H0: demand = 100 loaves of bread
H1: demand < 100 loaves of bread

Page 29: Hang Seng Management College STA102 Statistical Analysis ...

Example 7(c)

Solution:

Type I error: reject H0 when H0 is true.

Consequence of committing a Type I error:

(1) He concludes that the demand is less than 100 loaves of bread, so he bakes fewer than 100 loaves, but in fact the demand is at least 100 loaves of bread.

(2) Stock-out occurs: running out of inventory.

Page 30: Hang Seng Management College STA102 Statistical Analysis ...

Example 8

A health center wants to know whether Chinese adults drink, on average, less than 3 cups of tea a day.

Solution:

H0 : µ = 3 vs H1 : µ < 3

Type I error (reject H0 when H0 is true):

-- Conclude that Chinese adults drink, on average, less than 3 cups of tea a day when in fact the average is at least 3.

Type II error (do not reject H0 when H0 is false):

-- Conclude that Chinese adults drink, on average, at least 3 cups of tea a day when in fact the average is less than 3.

Page 31: Hang Seng Management College STA102 Statistical Analysis ...

Example 9

A labour department wants to know whether housewives work, on average, more than 40 hours per week in house-related activities.

Solution:

H0 : µ = 40 vs H1 : µ > 40

Type I error (reject H0 when H0 is true):

-- Conclude that the average working hours is greater than 40 when in fact the average is at most 40.

Type II error (do not reject H0 when H0 is false):

-- Conclude that the average working hours is at most 40 when in fact it is greater than 40.

Page 32: Hang Seng Management College STA102 Statistical Analysis ...

5. Drawing Conclusion of the Test

Example 10: Suppose we want to test if the mean IQ of a large group of students is 105. It is found that the mean IQ of a sample of 25 students from the group is 110. What would be your conclusion? What if the sample mean is 90?

H0 : µ = 105 vs H1 : µ ≠ 105

Page 33: Hang Seng Management College STA102 Statistical Analysis ...

If the sample mean (say IQ = 200) is far away from the hypothesized mean IQ = 105, it provides enough evidence to say that the mean IQ is not equal to 105 (reject H0).

If the sample mean (say IQ = 104) is close to the hypothesized mean IQ = 105, it does not provide enough evidence to say that the mean IQ is not equal to 105 (do not reject H0).

Is the sample mean IQ = 110 far enough away from the hypothesized mean IQ = 105 to allow us to confidently infer that the population mean IQ ≠ 105 (i.e. H0 is rejected)?

Page 34: Hang Seng Management College STA102 Statistical Analysis ...

How large should the test statistic (the standardized difference between the sample statistic and the hypothesized population parameter) be so that the null hypothesis (H0) is rejected?

To test for the population mean when σ is known, we use the z-statistic to judge the difference between the sample mean and the hypothesized population mean.

Test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Page 35: Hang Seng Management College STA102 Statistical Analysis ...

Two approaches:

(1) The Rejection Region Method (i.e., find the critical value(s) based on the significance level α to reject the null hypothesis)

(2) The p-Value Approach

Page 36: Hang Seng Management College STA102 Statistical Analysis ...

6. The Rejection Region and P-value Methods

6.1 Rejection Region

A range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.

There may be only one or two rejection region(s):

-- One rejection region ↔ One-Tailed test

-- Two rejection regions ↔ Two-Tailed test

Page 37: Hang Seng Management College STA102 Statistical Analysis ...

6.1.1 One-Tailed Test

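-- Used when the alternative hypothesis H1 has a ">" or "<" sign.

-- There is only one rejection region.

[Figure: two curves. Right-tailed test (H1: ">"): rejection region of area α to the right of $z_\alpha$, with area 1 - α to its left. Left-tailed test (H1: "<"): rejection region of area α to the left of $-z_\alpha$, with area 1 - α to its right.]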

Page 38: Hang Seng Management College STA102 Statistical Analysis ...

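6.1.2 Two-Tailed Test

-- Used when the alternative hypothesis H1 has a "≠" sign.

-- There are two rejection regions.

[Figure: two-tailed test (H1: "≠"). Rejection regions of area α/2 each, to the left of $-z_{\alpha/2}$ and to the right of $z_{\alpha/2}$, with area 1 - α between them.]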

Page 39: Hang Seng Management College STA102 Statistical Analysis ...

The Rejection Region Method

One-tailed test: for a right-tailed test (">" for H1)

If $z > z_\alpha$, reject H0.

[Figure: rejection region of area α to the right of $z_\alpha$, with area 1 - α to its left.]

Page 40: Hang Seng Management College STA102 Statistical Analysis ...

The Rejection Region Method

One-tailed test: for a left-tailed test ("<" for H1)

If $z < -z_\alpha$, reject H0.

[Figure: rejection region of area α to the left of $-z_\alpha$, with area 1 - α to its right.]

Page 41: Hang Seng Management College STA102 Statistical Analysis ...

The Rejection Region Method

Two-tailed test ("≠" for H1)

If $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$, reject H0.

[Figure: rejection regions of area α/2 each, to the left of $-z_{\alpha/2}$ and to the right of $z_{\alpha/2}$, with area 1 - α between them.]
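The three rejection rules above can be sketched in a few lines of Python using the standard library's `statistics.NormalDist`. The function names and the `tail` labels ("right", "left", "two") are hypothetical choices made for this illustration.

```python
from statistics import NormalDist


def z_rejection_region(alpha, tail):
    """Return (lower, upper) critical values; None means that side is not used."""
    standard_normal = NormalDist()
    if tail == "right":                      # H1 uses ">"
        return (None, standard_normal.inv_cdf(1 - alpha))
    if tail == "left":                       # H1 uses "<"
        return (standard_normal.inv_cdf(alpha), None)
    if tail == "two":                        # H1 uses "not equal"
        critical = standard_normal.inv_cdf(1 - alpha / 2)
        return (-critical, critical)
    raise ValueError("tail must be 'right', 'left' or 'two'")


def in_rejection_region(z_stat, alpha, tail):
    """True if the z-statistic falls into the rejection region."""
    lower, upper = z_rejection_region(alpha, tail)
    below = lower is not None and z_stat < lower
    above = upper is not None and z_stat > upper
    return below or above


print(z_rejection_region(0.05, "right"))   # upper value is about 1.645
print(z_rejection_region(0.05, "two"))     # about (-1.96, 1.96)
```

Computing the critical value with `inv_cdf` gives the same numbers as the standard normal table, up to the table's rounding.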

Page 42: Hang Seng Management College STA102 Statistical Analysis ...

Example 11

The general manager of a book chain store would like to know if the average December sales of the representatives in his store is more than $50,000. To verify his claim, he took a sample of 10 sales representatives in December and found that their mean sales was $45,000.

(a) Set up H0 and H1 for the general manager.

H0 : µ = 50,000
H1 : µ > 50,000

(b) Categorize his test as a right-tailed, a left-tailed or a two-tailed test.

Right-tailed

Page 43: Hang Seng Management College STA102 Statistical Analysis ...

6.2 The p-Value Approach

The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true.

Page 44: Hang Seng Management College STA102 Statistical Analysis ...

Consider the right-tailed test:

If the p-value < α, reject H0, as the test statistic is within the rejection region.

If the p-value > α, do not reject H0, as the test statistic is outside the rejection region.

[Figure: p-value = area of the shaded region to the right of the test statistic z.]
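As a sketch of how the shaded area is obtained for a z-statistic, the helper below returns the p-value for each type of test. It assumes the test statistic follows the standard normal distribution under H0; `z_p_value` and its `tail` labels are hypothetical names used only here.

```python
from statistics import NormalDist


def z_p_value(z_stat, tail):
    """p-value of a z-statistic for a right-, left- or two-tailed test."""
    standard_normal = NormalDist()
    if tail == "right":
        return 1 - standard_normal.cdf(z_stat)        # area to the right of z
    if tail == "left":
        return standard_normal.cdf(z_stat)            # area to the left of z
    if tail == "two":
        return 2 * (1 - standard_normal.cdf(abs(z_stat)))  # both tails
    raise ValueError("tail must be 'right', 'left' or 'two'")


alpha = 0.05
p = z_p_value(1.0, "right")
print(f"p-value = {p:.4f}:", "reject H0" if p < alpha else "do not reject H0")
```

The decision rule is the one stated above: reject H0 when the p-value is smaller than α.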

Page 45: Hang Seng Management College STA102 Statistical Analysis ...

7. Test for population mean(s)

7.1 Test for Mean of One Population when the Population Standard Deviation is Known

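Test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

z measures how many standard errors ($\sigma/\sqrt{n}$) the sample mean lies away from the hypothesized mean.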

Page 46: Hang Seng Management College STA102 Statistical Analysis ...

Example 12

The manager of a department store is thinking about establishing a new billing system for customers. She determines that the system will be cost-effective if the mean monthly account is more than $170. From a random sample of 400 monthly accounts, the mean is $178. The manager knows that the accounts are approximately normally distributed with a standard deviation of $65. Can the manager conclude from this that the new system will be cost-effective, at the 5% significance level?

Page 47: Hang Seng Management College STA102 Statistical Analysis ...

The Presentation: Rejection region method

(Step 1) Set up null and alternative hypotheses:

H0 : µ = 170    H1 : µ > 170    (one-tailed test)

Page 48: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2) Find useful information:

Given information: n = 400, x̄ = 178, σ = 65, α = 5%

(Step 3) Compute the test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{178 - 170}{65 / \sqrt{400}} = 2.4615$$

Page 49: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4) Identify the rejection region:

Rejection region: z > z0.05 = 1.645

i.e. the null hypothesis must be rejected if z > 1.645.

[Figure: right-tailed rejection region of area α = 5% to the right of z0.05 = 1.645, with area 1 - α = 95% to its left.]

Page 50: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5) Draw conclusion:

(1) Since z = 2.4615 > 1.645, z falls into the rejection region.

(2) At α = 5%, H0 is rejected.

(3) In other words, there is enough evidence to support that the mean account is more than $170. Therefore, the manager can conclude that the new system is cost-effective.

Page 51: Hang Seng Management College STA102 Statistical Analysis ...

The Presentation: P-value method

(Step 1) Set up null and alternative hypotheses:

H0 : µ = 170    H1 : µ > 170    (one-tailed test)

Page 52: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2) Find useful information:

Given information: n = 400, x̄ = 178, σ = 65, α = 5%

(Step 3) Compute the test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{178 - 170}{65 / \sqrt{400}} = 2.4615$$

Page 53: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4) Compute the p-value:

From (Step 3), we have computed z = 2.4615.

p-value = P(Z > 2.46) = 0.0069

[Figure: p-value = area of the shaded region to the right of z = 2.46.]

Page 54: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5) Draw conclusion:

(1) Since p-value = 0.0069 < 0.05 = α [note that z = 2.46 > z0.05 = 1.645],

(2) at α = 5%, H0 is rejected.

(3) In other words, there is enough evidence to support that the mean account is more than $170. (same result as the rejection region method)

[Figure: the p-value (area 0.0069 to the right of z = 2.46) lies inside the rejection region of area α = 0.05 to the right of z0.05 = 1.645.]
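Both presentations of Example 12 can be reproduced with a short script. This is only a sketch using Python's standard library; the variable names are chosen for illustration, and the numbers are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

# Given information from Example 12.
mu0, x_bar, sigma, n, alpha = 170, 178, 65, 400, 0.05

# Step 3: test statistic.
z_stat = (x_bar - mu0) / (sigma / sqrt(n))

# Step 4 (rejection region method): right-tailed critical value.
z_crit = NormalDist().inv_cdf(1 - alpha)

# Step 4 (p-value method): area to the right of the test statistic.
p_value = 1 - NormalDist().cdf(z_stat)

# Step 5: both methods must lead to the same decision.
reject_by_region = z_stat > z_crit
reject_by_p_value = p_value < alpha

print(f"z = {z_stat:.4f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_by_region and reject_by_p_value else "Do not reject H0")
```

The p-value here is computed from the unrounded z, so it may differ from the table-based value in the last decimal place.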

Page 55: Hang Seng Management College STA102 Statistical Analysis ...

Example 13

Suppose a statistics practitioner working for AT&T determines that the mean and standard deviation of monthly long-distance bills for all its residential customers are $17.09 and $3.87, respectively. He then takes a random sample of 100 customers and recalculates their last month's bill using the rates quoted by a leading competitor. He finds that the sample mean is $17.55. Assuming that the standard deviation of this population is the same as for AT&T, can we conclude at the 5% significance level that there is a difference between the average AT&T bill and that of its competitor?

Page 56: Hang Seng Management College STA102 Statistical Analysis ...

The Presentation: Rejection region method

(Step 1) Set up null and alternative hypotheses:

H0 : µ = 17.09    H1 : µ ≠ 17.09    (two-tailed test)

Page 57: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2) Find useful information:

Given information: n = 100, x̄ = 17.55, σ = 3.87, α = 5%

(Step 3) Compute the test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{17.55 - 17.09}{3.87 / \sqrt{100}} = 1.19$$

Page 58: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4) Identify the rejection region:

Rejection region: z < -z0.025 = -1.96 or z > z0.025 = 1.96

i.e. the null hypothesis must be rejected if z < -1.96 or z > 1.96.

[Figure: two rejection regions of area 2.5% each, to the left of -1.96 and to the right of 1.96, with area 95% between them.]

Page 59: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5) Draw conclusion:

(1) Since -1.96 < z = 1.19 < 1.96, z does not fall into the rejection region.

(2) At α = 5%, we do not reject H0.

(3) In other words, there is no significant difference between the average AT&T bill and that of its leading competitor.

Page 60: Hang Seng Management College STA102 Statistical Analysis ...

The Presentation: P-value method

(Step 1) Set up null and alternative hypotheses:

H0 : µ = 17.09    H1 : µ ≠ 17.09    (two-tailed test)

Page 61: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2) Find useful information:

Given information: n = 100, x̄ = 17.55, σ = 3.87, α = 5%

(Step 3) Compute the test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{17.55 - 17.09}{3.87 / \sqrt{100}} = 1.19$$

Page 62: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4) Compute the p-value:

From (Step 3), we have computed z = 1.19.

p-value = P(Z < -1.19) + P(Z > 1.19) = 0.1170 + 0.1170 = 0.2340

[Figure: area of the left-tailed region (Z < -1.19) = 0.1170 and area of the right-tailed region (Z > 1.19) = 0.1170.]

Page 63: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5) Draw conclusion:

(1) Since p-value = 0.2340 > 0.05 = α,

(2) at α = 5%, we do not reject H0.

(3) In other words, there is no significant difference between the average AT&T bill and that of its leading competitor.
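The two-tailed calculation in Example 13 can be checked the same way. The sketch below uses only the numbers given in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Given information from Example 13.
mu0, x_bar, sigma, n, alpha = 17.09, 17.55, 3.87, 100, 0.05

z_stat = (x_bar - mu0) / (sigma / sqrt(n))

# Two-tailed test: split alpha between the two tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_region = z_stat < -z_crit or z_stat > z_crit

# Two-tailed p-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_by_p_value = p_value < alpha

print(f"z = {z_stat:.2f}, critical values = +/-{z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_by_region else "Do not reject H0")
```

Because the script does not round z to 1.19 before looking up the area, its p-value can differ slightly from the table value of 0.2340.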

Page 64: Hang Seng Management College STA102 Statistical Analysis ...

7.2 Test for Mean of One Population when the Population Standard Deviation is Unknown

When the population standard deviation is unknown, we use the sample standard deviation s in its place, and the Student's t statistic is used.

Test statistic:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

with (n - 1) degrees of freedom

Page 65: Hang Seng Management College STA102 Statistical Analysis ...

Example 14

A courier service advertises that its average delivery time is less than 6 hours for local deliveries. Assume the delivery times are normally distributed. A random sample of times for 12 deliveries to an address across town was recorded. These data are shown below. Is there sufficient evidence to support the courier's advertisement, at the 5% level of significance?

Data

3.03 6.33 6.50 5.22 3.56 6.76

7.98 4.82 7.96 4.54 5.09 6.46

Page 66: Hang Seng Management College STA102 Statistical Analysis ...

The Presentation: Rejection region method

(Step 1) Set up null and alternative hypotheses:

H0 : µ = 6    H1 : µ < 6    (one-tailed test)

We are given only the 12 sample observations, so the sample standard deviation is computed from them. Since the population standard deviation is not known, we should use the t statistic.

Page 67: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2) Find useful information:

n = 12, α = 5%

From the data, we have

x̄ = 5.69, s = 1.58 (sample standard deviation)

(Step 3) Compute the test statistic:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{5.69 - 6}{1.58 / \sqrt{12}} = -0.69$$

Page 68: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4) Identify the rejection region:

n = 12, α = 5%, d.f. = 12 - 1 = 11

Rejection region: t < -t0.05, 11 = -1.796

i.e. the null hypothesis must be rejected if t < -1.796.

Page 69: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5) Draw conclusion:

(1) Since t = -0.69 > -1.796, the t-statistic does not fall into the rejection region.

(2) At α = 5%, we do not reject H0.

(3) In other words, there is not enough evidence to support the courier's advertisement.

[Figure: left-tailed rejection region to the left of -1.796; t = -0.69 lies outside it.]

Page 70: Hang Seng Management College STA102 Statistical Analysis ...

The Presentation: P-value method

(Step 1) H0 : µ = 6
         H1 : µ < 6

(Step 2) n = 12, α = 5%, s = 1.58, x̄ = 5.69

(Step 3)

$$t = \frac{5.69 - 6}{1.58 / \sqrt{12}} = -0.69$$

Page 71: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4)

p-value = P(t < -0.69) > 0.1 > 0.05 = α

[Figure: p-value = area to the left of t = -0.69.]

(Step 5)

(1) Since p-value > α, we do not reject H0 at α = 5%.

(2) In other words, there is not enough evidence to support the courier's advertisement.

Important: when the t-statistic is used, we cannot find the exact p-value from the t table; we can only bound it (here, p-value > 0.1).
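The summary statistics and the t-statistic in Example 14 can be recomputed from the raw data. The sketch below uses Python's standard library only; since the standard library has no t distribution, the critical value -1.796 is hard-coded from the t table used in the example rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Delivery times (hours) from Example 14.
times = [3.03, 6.33, 6.50, 5.22, 3.56, 6.76,
         7.98, 4.82, 7.96, 4.54, 5.09, 6.46]

mu0 = 6
t_crit = -1.796            # -t(0.05, 11), taken from the t table in the example

n = len(times)
x_bar = mean(times)        # sample mean
s = stdev(times)           # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu0) / (s / sqrt(n))

reject_h0 = t_stat < t_crit   # left-tailed test

print(f"n = {n}, x-bar = {x_bar:.2f}, s = {s:.2f}, t = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Working from the unrounded mean and standard deviation gives a t value close to the -0.69 shown above; small differences in the last digit come from rounding.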

Page 72: Hang Seng Management College STA102 Statistical Analysis ...

7.3 Test on difference between two population means

(1) Matched Observations

-- "before / after" data: the difference between "before" and "after" data

e.g.: Test the effectiveness of a diet program by studying 20 participants.

(2) Independent Samples

e.g.: Compare the mean marks of two "independent" groups of students taking the same course. (The number of observations in each group can be different.)

Page 73: Hang Seng Management College STA102 Statistical Analysis ...

7.3.1 Independent Samples

When Both Population Variances are Known

Test statistic for µ1 - µ2:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Page 74: Hang Seng Management College STA102 Statistical Analysis ...

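$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

$n_1$, $\bar{x}_1$ and $\sigma_1^2$ are the sample size, sample mean and population variance of the group 1 dataset.

$n_2$, $\bar{x}_2$ and $\sigma_2^2$ are the sample size, sample mean and population variance of the group 2 dataset.

Rarely used, as $\sigma_1^2$ and $\sigma_2^2$ are usually unknown.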

Page 75: Hang Seng Management College STA102 Statistical Analysis ...

When the Population Variances are Unknown but can be Assumed to be Equal ($\sigma_1^2 = \sigma_2^2$)

Pooled Variance t-test

Test statistic for µ1 - µ2:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$

where the pooled variance is

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

with d.f. $= n_1 + n_2 - 2$

Page 76: Hang Seng Management College STA102 Statistical Analysis ...

Example 15

A researcher recorded the distances (in yards) for a random sample of British and American courses. The observations are as follows:

British (population 1):  n1 = 28, x̄1 = 6345, s1 = 71.3

American (population 2): n2 = 33, x̄2 = 6358, s2 = 55.7

Can we infer that British courses are shorter than American courses, at the 5% level of significance? (Assume that the 2 population variances are the same, i.e. the equal-variance assumption.)

Page 77: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : µ1 - µ2 = 0
H1 : µ1 - µ2 < 0    (one-tailed test)

(Step 2)

n1 = 28, x̄1 = 6345, s1 = 71.3

n2 = 33, x̄2 = 6358, s2 = 55.7

α = 5%

d.f. = 28 + 33 - 2 = 59

Note that we compare two population means based on the sample means, and the two random samples are independent. Therefore, we should use the pooled variance t-test.

Page 78: Hang Seng Management College STA102 Statistical Analysis ...

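(Step 3a) Compute $S_p^2$:

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(28 - 1)(71.3)^2 + (33 - 1)(55.7)^2}{28 + 33 - 2} \approx 4010$$

(Step 3b) Compute the test statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(6345 - 6358) - 0}{\sqrt{4010 \left( \dfrac{1}{28} + \dfrac{1}{33} \right)}} = -0.80$$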

Page 79: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4)

Rejection region: t < -t0.05, 59 = -1.671

i.e. the null hypothesis must be rejected if t < -1.671.

[Figure: left-tailed rejection region of area 5% to the left of -t0.05, 59 = -1.671, with area 95% to its right.]

Page 80: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5)

(1) Since t = -0.80 > -1.671, the t-statistic does not fall into the rejection region.

(2) At α = 5%, we do not reject H0.

(3) In other words, there is not enough evidence to conclude that the British courses are shorter than the American courses.
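The pooled variance t-test of Example 15 can be sketched from the summary statistics alone. The helper name `pooled_t_statistic` is hypothetical, and the critical value -1.671 is hard-coded from the t table quoted in the example, since the standard library has no t distribution.

```python
from math import sqrt


def pooled_t_statistic(n1, mean1, sd1, n2, mean2, sd2, hypothesized_diff=0.0):
    """Return (pooled variance, t-statistic, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return pooled_var, t_stat, df


# Summary statistics from Example 15 (British = group 1, American = group 2).
sp2, t_stat, df = pooled_t_statistic(28, 6345, 71.3, 33, 6358, 55.7)

t_crit = -1.671                 # -t(0.05, 59), from the table value used in the example
reject_h0 = t_stat < t_crit     # left-tailed test

print(f"Sp^2 = {sp2:.1f}, t = {t_stat:.2f}, d.f. = {df}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The unrounded pooled variance is slightly below the rounded 4010 used in the hand calculation, which does not change the t-statistic to two decimal places.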

Page 81: Hang Seng Management College STA102 Statistical Analysis ...

7.3.2 Matched Observations

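-- Matched data: "before / after" data

Test statistic for $\mu_D$:

$$t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n}}$$

with d.f. (n - 1), where $\bar{x}_D$ and $s_D$ are the sample mean and sample standard deviation of the n paired differences.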

Page 82: Hang Seng Management College STA102 Statistical Analysis ...

Example 16

Many people use scanners to read documents and store them in a Word file. To help determine which brand of scanner to buy, a student conducts an experiment wherein 8 documents are scanned by each of the two scanners that he is interested in. He records the number of errors made by each. These data are listed here. Can he infer that Brand A is better than Brand B, at the 5% level of significance?

Document    1   2   3   4   5   6   7   8
Brand A    17  29  18  14  21  25  22  29
Brand B    21  38  15  19  22  30  31  37

Page 83: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : µD = 0 (Brand A is not better than Brand B)

H1 : µD < 0 (Brand A is better than Brand B)

(Note: One-tailed test)

Document     1   2   3   4   5   6   7   8
Brand A     17  29  18  14  21  25  22  29
Brand B     21  38  15  19  22  30  31  37
D = A - B   -4  -9   3  -5  -1  -5  -9  -8

Page 84: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2)

Given information: n = 8, d.f. = 8 - 1 = 7, α = 5%

The differences (data): -4 -9 3 -5 -1 -5 -9 -8

Computed from the data: x̄D = -4.75, sD = 4.1662

Page 85: Hang Seng Management College STA102 Statistical Analysis ...

(Step 3)

$$t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n}} = \frac{-4.75 - 0}{4.1662 / \sqrt{8}} = -3.22$$

(Step 4)

Rejection region: t < -t0.05, 7 = -1.895

i.e. the null hypothesis must be rejected if t < -1.895.

Page 86: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5)

(1) Since t = -3.22 < -1.895, the t-statistic falls into the rejection region.

(2) At α = 5%, H0 is rejected.

(3) In other words, there is sufficient evidence to show that the performance of Brand A is better than that of Brand B.
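The matched-observations test of Example 16 reduces to a one-sample t-test on the differences, which the following sketch makes explicit. It uses the scanner data from the example; the critical value -1.895 is hard-coded from the t table quoted above, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Number of errors per document, from Example 16.
brand_a = [17, 29, 18, 14, 21, 25, 22, 29]
brand_b = [21, 38, 15, 19, 22, 30, 31, 37]

# Paired differences D = A - B (fewer errors for A gives negative values).
differences = [a - b for a, b in zip(brand_a, brand_b)]

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)
t_stat = (d_bar - 0) / (s_d / sqrt(n))    # H0: mu_D = 0

t_crit = -1.895                 # -t(0.05, 7), from the t table in the example
reject_h0 = t_stat < t_crit     # left-tailed test

print(f"D = {differences}")
print(f"mean = {d_bar:.2f}, s_D = {s_d:.4f}, t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Pairing by document removes the document-to-document variation, which is why the test works on the differences rather than on the two samples separately.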

Page 87: Hang Seng Management College STA102 Statistical Analysis ...

8. Test on Population Variance(s)

-- Draw inferences about the spread or variability of a distribution.

-- Examples:
   • Measure the risk level of an investment
   • Deal with quality control (QC) problems

Page 88: Hang Seng Management College STA102 Statistical Analysis ...

8.1 Test on Variance of One Population

The test statistic for the population variance $\sigma^2$:

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$

with d.f. = n - 1

This statistic is called the Chi-square statistic; a Chi-square variable is defined as the sum of squared independent standard normal variates.

Page 89: Hang Seng Management College STA102 Statistical Analysis ...

Determining the Chi-square critical value:

-- $\chi^2_{1-\alpha,\,n-1}$ is the critical value in the Chi-square distribution with (n-1) d.f. such that the area to its left (left-tailed area) is equal to α.

-- $\chi^2_{\alpha,\,n-1}$ is the critical value in the Chi-square distribution with (n-1) d.f. such that the area to its right (right-tailed area) is equal to α.

Page 90: Hang Seng Management College STA102 Statistical Analysis ...

Case   H0        H1        Rejection Region
1      σ² = A    σ² > A    $\chi^2 > \chi^2_{\alpha,\,n-1}$
2      σ² = A    σ² < A    $\chi^2 < \chi^2_{1-\alpha,\,n-1}$
3      σ² = A    σ² ≠ A    $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$

Page 91: Hang Seng Management College STA102 Statistical Analysis ...

Example 17

The president of a company that developed a new type of machine boasts that this machine can fill 1-litre containers so consistently that the variance of the fills will be less than 1 cubic centimeter squared (cc²). A random sample of 25 1-litre fills was taken and the results were recorded. It is found that the sample variance is 0.6333. Do these data allow the president to make this claim at α = 5%?

Page 92: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : σ² = 1

H1 : σ² < 1 (one-tailed test)

(Step 2)

n = 25, d.f. = 24, s² = 0.6333, α = 5%

Page 93: Hang Seng Management College STA102 Statistical Analysis ...

(Step 3)

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(25 - 1)(0.6333)}{1} = 15.20$$

(Step 4)

Rejection region: $\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{0.95,\,24} = 13.8484$

i.e. the null hypothesis must be rejected if $\chi^2 < 13.8484$.

Page 94: Hang Seng Management College STA102 Statistical Analysis ...

Page 95: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5)

(1) Since χ² = 15.20 > 13.8484, we do not reject H0 at α = 5%.

(2) In other words, these data do not allow the president to make his claim at α = 5%.

[Figure: left-tailed rejection region of area 0.05 to the left of χ²0.95, 24 = 13.8484; χ² = 15.20 lies outside it.]
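The Chi-square calculation in Example 17 is short enough to verify directly. In this sketch the critical value 13.8484 is hard-coded from the Chi-square table quoted in the example (the standard library has no Chi-square distribution), and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(n, sample_variance, hypothesized_variance):
    """Chi-square statistic (n - 1) * s^2 / sigma^2 with n - 1 d.f."""
    return (n - 1) * sample_variance / hypothesized_variance


# Given information from Example 17.
n, s2, sigma2_0, alpha = 25, 0.6333, 1.0, 0.05

chi2_stat = chi_square_statistic(n, s2, sigma2_0)

chi2_crit = 13.8484                 # chi-square(0.95, 24), from the table in the example
reject_h0 = chi2_stat < chi2_crit   # left-tailed test, H1: sigma^2 < 1

print(f"chi-square = {chi2_stat:.2f}, d.f. = {n - 1}, critical value = {chi2_crit}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Even though the sample variance (0.6333) is below 1, the statistic does not fall below the critical value, so the claim is not supported at the 5% level.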

Page 96: Hang Seng Management College STA102 Statistical Analysis ...

8.2 Test on Difference between 2 Population Variances

If we want to test whether two population variances are equal, we use the so-called F-test, whose statistic is formed by the ratio of two independent Chi-square variables (each divided by its degrees of freedom).

Test statistic:

$$F = \frac{s_1^2}{s_2^2} \quad \text{or} \quad F = \frac{s_2^2}{s_1^2}$$

provided that the populations are normal

Page 97: Hang Seng Management College STA102 Statistical Analysis ...

For $F = \dfrac{s_1^2}{s_2^2}$:

with $n_1 - 1$ (d.f. for the numerator) and $n_2 - 1$ (d.f. for the denominator)

For $F = \dfrac{s_2^2}{s_1^2}$:

with $n_2 - 1$ (d.f. for the numerator) and $n_1 - 1$ (d.f. for the denominator)

Page 98: Hang Seng Management College STA102 Statistical Analysis ...

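H0                          H1                          Test Statistic                Rejection Region
$\sigma_1^2 = \sigma_2^2$   $\sigma_1^2 > \sigma_2^2$   $F = s_1^2 / s_2^2$           $F > F_{\alpha,\,n_1-1,\,n_2-1}$
$\sigma_1^2 = \sigma_2^2$   $\sigma_1^2 < \sigma_2^2$   $F = s_2^2 / s_1^2$           $F > F_{\alpha,\,n_2-1,\,n_1-1}$
$\sigma_1^2 = \sigma_2^2$   $\sigma_1^2 \ne \sigma_2^2$ The larger of the two ratios  $F > F_{\alpha/2,\,n_1-1,\,n_2-1}$ (if $F = s_1^2 / s_2^2$) or $F > F_{\alpha/2,\,n_2-1,\,n_1-1}$ (if $F = s_2^2 / s_1^2$)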

Page 99: Hang Seng Management College STA102 Statistical Analysis ...

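Graphically, the rejection region for a two-tailed test:

[Figure: two F-distribution curves. For $F = s_1^2 / s_2^2$ (when $s_1^2 > s_2^2$): rejection region to the right of $F_{\alpha/2,\,n_1-1,\,n_2-1}$, with the left-tailed critical value $1 / F_{\alpha/2,\,n_2-1,\,n_1-1}$ also marked. For $F = s_2^2 / s_1^2$ (when $s_2^2 > s_1^2$): rejection region to the right of $F_{\alpha/2,\,n_2-1,\,n_1-1}$, with the left-tailed critical value $1 / F_{\alpha/2,\,n_1-1,\,n_2-1}$ also marked.]

We do not use the left-tailed critical values.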

Page 100: Hang Seng Management College STA102 Statistical Analysis ...

Example 18

Given the following statistics, test to determine whether the variance of population 1 is larger than the variance of population 2 at α = 5%.

n1 = 25, s1² = 60
n2 = 20, s2² = 25

Page 101: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

$H_0: \sigma_1^2 = \sigma_2^2$

$H_1: \sigma_1^2 > \sigma_2^2$ (one-tailed test)

(Step 2)

n1 = 25, s1² = 60
n2 = 20, s2² = 25, α = 5%

(Step 3)

$$F = \frac{s_1^2}{s_2^2} = \frac{60}{25} = 2.40$$

Page 102: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4) Rejection region:

$F > F_{\alpha,\, n_1-1,\, n_2-1} = F_{0.05,\, 24,\, 19} = 2.11$

Page 103: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5)

[Figure: F density with the rejection region (area 5%) to the right of $F_{0.05,\, 24,\, 19} = 2.11$; the test statistic F = 2.40 falls inside it.]

(1) Since F = 2.40 > 2.11, we reject H0 at α = 5%.

(2) In other words, there is enough evidence to infer that the variance of population 1 is greater than the variance of population 2.
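The steps of Example 18 can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the critical value 2.11 is copied from the F table quoted in the example rather than computed.

```python
# Example 18: one-tailed F-test of H1: sigma1^2 > sigma2^2.
n1, s1_sq = 25, 60.0   # sample size and sample variance, population 1
n2, s2_sq = 20, 25.0   # sample size and sample variance, population 2

# Test statistic: the variance claimed to be larger goes in the numerator.
f_stat = s1_sq / s2_sq
df_num, df_den = n1 - 1, n2 - 1

# Critical value F_{0.05, 24, 19}, taken from the F table (assumed, not computed here).
f_crit = 2.11

reject_h0 = f_stat > f_crit
print(f"F = {f_stat:.2f}, d.f. = ({df_num}, {df_den}), reject H0: {reject_h0}")
```

The comparison `f_stat > f_crit` is the rejection-region rule of Step 4 written as code.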

Page 104: Hang Seng Management College STA102 Statistical Analysis ...

Recall: To test whether two population means are equal using the pooled variance t-test, we assume $\sigma_1^2 = \sigma_2^2$. Thus, the F-test should be performed to check the assumption $\sigma_1^2 = \sigma_2^2$ before applying the pooled variance t-test.

Page 105: Hang Seng Management College STA102 Statistical Analysis ...

Example 19

Given the following data, test the hypothesis at the 10% level of significance.

H0 : $\sigma_1^2 = \sigma_2^2$
H1 : $\sigma_1^2 \neq \sigma_2^2$

Sample 1: 7 4 9 12 8 6 9 14
Sample 2: 10 7 13 18 4 8 21 20 5 8

Page 106: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : $\sigma_1^2 = \sigma_2^2$
H1 : $\sigma_1^2 \neq \sigma_2^2$ (two-tailed test)

(Step 2)

$n_1 = 8$, $s_1^2 = 10.27$
$n_2 = 10$, $s_2^2 = 39.16$, α = 0.1

(Step 3)

$F = \dfrac{s_2^2}{s_1^2} = \dfrac{39.16}{10.27} = 3.8130$

Page 107: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4)

Rejection region:

$F > F_{\alpha/2,\, n_2-1,\, n_1-1} = F_{0.05,\, 9,\, 7}$

That is, $F > 3.68$.

Page 108: Hang Seng Management College STA102 Statistical Analysis ...

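(Step 5)

[Figure: F density with the rejection region (area 5%) to the right of $F_{0.05,\, 9,\, 7} = 3.68$; the test statistic F = 3.8130 falls inside it.]

(1) Since F = 3.8130 > 3.68, we reject H0 at α = 10%.

(2) In other words, there is enough evidence to infer that the population variances differ.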
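Example 19 starts from raw data, so the sample variances have to be computed first. The sketch below does that with Python's standard `statistics` module. The helper name `larger_variance_ratio` is hypothetical, and the critical value 3.68 is taken from the F table quoted in the example. Working with unrounded variances gives F of about 3.813, which agrees with the slide value up to rounding.

```python
from statistics import variance

sample_1 = [7, 4, 9, 12, 8, 6, 9, 14]
sample_2 = [10, 7, 13, 18, 4, 8, 21, 20, 5, 8]


def larger_variance_ratio(a, b):
    """Return (F, numerator d.f., denominator d.f.) with the larger
    sample variance in the numerator, as the two-tailed F-test requires."""
    var_a, var_b = variance(a), variance(b)  # sample variances (divisor n - 1)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


f_stat, df_num, df_den = larger_variance_ratio(sample_1, sample_2)

# Critical value F_{alpha/2, 9, 7} with alpha = 0.10, copied from the F table.
f_crit = 3.68

reject_h0 = f_stat > f_crit
print(f"F = {f_stat:.4f}, d.f. = ({df_num}, {df_den}), reject H0: {reject_h0}")
```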

Page 109: Hang Seng Management College STA102 Statistical Analysis ...

9. Test for Population Proportion

• Example:
– Proportion of AD students who live in the hostel in HSMC.

– Find the “Yes” proportion.

9.1 Test on Proportion of One Population

Page 110: Hang Seng Management College STA102 Statistical Analysis ...

The test statistic for the population proportion p:

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$

The random variable $\hat{p}$ is approximately normal when np and n(1-p) are both greater than 5.

See Chapter 1 Section 3.3 for more details.

Page 111: Hang Seng Management College STA102 Statistical Analysis ...

Example 20

Suppose that in a sample of 200, we observe 140 successes. Is there sufficient evidence at α = 1% to indicate that the population proportion of successes is greater than 65%?

Page 112: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : p = 0.65
H1 : p > 0.65 (one-tailed test)

(Step 2)

n = 200, x = 140, α = 1%

$\hat{p} = \dfrac{x}{n} = \dfrac{140}{200} = 0.7$

(Step 3)

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \dfrac{0.7 - 0.65}{\sqrt{\dfrac{0.65(0.35)}{200}}} = 1.4825$

Page 113: Hang Seng Management College STA102 Statistical Analysis ...

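(Step 4) Find rejection region

Rejection region: $z > z_{\alpha} = z_{0.01} = 2.33$

(Step 5)

[Figure: standard normal density with the rejection region (area 1%) to the right of $z_{0.01} = 2.33$; the test statistic z = 1.4825 lies outside it.]

(1) Since z = 1.4825 < 2.33, we do not reject H0 at α = 1%.

(2) In other words, there is not enough evidence to infer that the proportion of successes is greater than 0.65.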

Page 114: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4) Compute the p-value

p-value = P(Z > 1.48) = 0.0694

[Figure: standard normal density with the area 0.0694 to the right of z = 1.48 shaded.]

(Step 5)

(1) Since p-value = 0.0694 > α, we do not reject H0 at α = 1%.

(2) In other words, there is not enough evidence to infer that the proportion of successes is greater than 0.65.
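Both approaches in Example 20 (rejection region and p-value) can be reproduced with Python's standard library. The following is a minimal sketch; the function name `one_proportion_z` is hypothetical. The p-value is computed from the unrounded z, so it comes out near 0.069 rather than the table value 0.0694 obtained with z rounded to 1.48.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """z statistic for H0: p = p0, using the sample proportion x / n."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.01
z = one_proportion_z(x=140, n=200, p0=0.65)

standard_normal = NormalDist()               # mean 0, standard deviation 1
z_crit = standard_normal.inv_cdf(1 - alpha)  # upper-tail critical value z_alpha
p_value = 1 - standard_normal.cdf(z)         # P(Z > z) for the one-tailed test

print(f"z = {z:.4f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The two decision rules always agree: z exceeds the critical value exactly when the p-value is below α.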

Page 115: Hang Seng Management College STA102 Statistical Analysis ...

Example 21

In some states the law requires drivers to turn on their headlights when driving in the rain. A highway patrol officer does not believe that exactly one-quarter of all drivers follow the rule. As a test, he/she randomly samples 200 cars driving in the rain and counts the number whose headlights are turned on. He/she finds this number to be 41. Does the officer have enough evidence at the 5% level of significance to support his/her belief?

Page 116: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : p = 0.25
H1 : p ≠ 0.25 (two-tailed test)

(Step 2)

n = 200, x = 41, α = 5%

$\hat{p} = \dfrac{x}{n} = \dfrac{41}{200} = 0.205$

(Step 3)

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \dfrac{0.205 - 0.25}{\sqrt{\dfrac{0.25(0.75)}{200}}} = -1.4697$

Page 117: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4)

Rejection region: $z < -z_{0.025}$ or $z > z_{0.025}$

i.e. z < -1.96 or z > 1.96

(Step 5)

(1) Since -1.96 < z = -1.4697 < 1.96, we do not reject H0 at α = 5%.

(2) In other words, there is not enough evidence to support the officer’s belief.
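For a two-tailed test such as Example 21, the only changes are that α is split between the two tails and the p-value doubles the tail area. A short sketch under the same normal approximation follows; the variable names are illustrative and the two-sided p-value is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0, alpha = 200, 41, 0.25, 0.05

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

standard_normal = NormalDist()
z_crit = standard_normal.inv_cdf(1 - alpha / 2)   # z_{0.025}
p_value = 2 * (1 - standard_normal.cdf(abs(z)))   # both tails

reject_h0 = abs(z) > z_crit
print(f"z = {z:.4f}, reject if |z| > {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```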

Page 118: Hang Seng Management College STA102 Statistical Analysis ...

9.2 Test on Difference between 2 Population Proportions

• Example:
– Compare the proportion of students who spend more than 10 hours per week on studying in School A and School B.

We are going to compare the difference (subtraction) of two population proportions, $p_1 - p_2$.

Page 119: Hang Seng Management College STA102 Statistical Analysis ...

We are going to test whether two population proportions are equal. Therefore, the null hypothesis is

$H_0: p_1 - p_2 = 0$

Assumption:

The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed.

Page 120: Hang Seng Management College STA102 Statistical Analysis ...

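Test statistic for $p_1 - p_2$:

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

where

$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$

is the pooled proportion estimate.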

Page 121: Hang Seng Management College STA102 Statistical Analysis ...

Example 22

In order to monitor the opinions of the electorate, six months ago, a survey was undertaken to determine the degree of support for a national party leader. Of a sample of 1,100, 56% indicated that they would vote for this politician. This month, another survey of 800 voters revealed that 46% now support the leader. At the 5% significance level, can we infer that the national leader’s popularity has decreased?

Page 122: Hang Seng Management College STA102 Statistical Analysis ...

Let p1 and p2 be the population proportions of voters who support the leader six months ago and this month, respectively.

(Step 1)

H0 : p1 - p2 = 0
H1 : p1 - p2 > 0 (one-tailed test)

(Step 2)

$n_1 = 1100$, $\hat{p}_1 = 0.56$
$n_2 = 800$, $\hat{p}_2 = 0.46$, α = 5%

Page 123: Hang Seng Management College STA102 Statistical Analysis ...

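(Step 3a) Compute $\hat{p}$:

$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \dfrac{1100 \times 0.56 + 800 \times 0.46}{1100 + 800} = 0.5180$

(Step 3b) Compute the Test Statistic:

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(0.56 - 0.46) - 0}{\sqrt{0.518(1 - 0.518)\left(\dfrac{1}{1100} + \dfrac{1}{800}\right)}} = 4.31$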

Page 124: Hang Seng Management College STA102 Statistical Analysis ...

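(Step 4)

Rejection region: $z > z_{\alpha} = z_{0.05} = 1.645$

(Step 5)

[Figure: standard normal density with the rejection region (area 0.05) to the right of $z_{0.05} = 1.645$; the test statistic z = 4.31 falls inside it.]

(1) Since z = 4.31 > 1.645, we reject H0 at α = 5%.

(2) In other words, there is enough evidence to show that the leader’s popularity has decreased.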
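The pooled-proportion calculation of Example 22 can be sketched in Python as follows. The function name `two_proportion_z` is hypothetical, and the hypothesised difference is fixed at 0 because H0 states p1 - p2 = 0.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p1_hat, n1, p2_hat, n2):
    """Return (pooled proportion, z) for H0: p1 - p2 = 0."""
    pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, (p1_hat - p2_hat) / std_error


alpha = 0.05
pooled, z = two_proportion_z(p1_hat=0.56, n1=1100, p2_hat=0.46, n2=800)
z_crit = NormalDist().inv_cdf(1 - alpha)   # z_{0.05} for the one-tailed test

reject_h0 = z > z_crit
print(f"pooled p = {pooled:.4f}, z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```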

Page 125: Hang Seng Management College STA102 Statistical Analysis ...

10. Chi-Squared Test of a Contingency Table

• Test if 2 nominal variables are related

Nominal variables: contain two or more categories without a natural ordering of the categories.

e.g.:

Sex (Male & Female)

Job performance (Poor, Fair, Satisfactory)

Page 126: Hang Seng Management College STA102 Statistical Analysis ...

(1) Want to test whether the performance of a certain scanner and the brand are independent.

(2) Want to find out if the bank’s employees’ standard of dress is independent of their professional advancement.

Page 127: Hang Seng Management College STA102 Statistical Analysis ...

H0 : The 2 variables are independent
H1 : The 2 variables are not independent

Test statistic:

$\chi^2 = \sum_{i=1}^{k} \dfrac{(f_i - e_i)^2}{e_i}$ with d.f. $= (r-1)(c-1)$

k: Number of cells in the contingency table.
r: Number of rows in the contingency table.
c: Number of columns in the contingency table.
$f_i$: Observed values
$e_i$: Expected values

Page 128: Hang Seng Management College STA102 Statistical Analysis ...

Rule of Five

• In a contingency table where one or more cells have expected values of less than 5, we need to combine rows or columns to satisfy the rule of five.

Page 129: Hang Seng Management College STA102 Statistical Analysis ...

Example 23

The manager of a company that manufactures shirts wants to determine whether there are differences in the quality of workmanship among the three daily shifts. She randomly selects 600 recently made shirts and carefully inspects them. Each shirt is classified as either perfect or flawed, and the shift that produced the shirt is also recorded.

                     Shift
Shirt condition    A      B      C
Perfect          240    191    139
Flawed            10      9     11

Do these data provide sufficient evidence to infer that there are differences in production quality among the three shifts of workers at the 5% level of significance?

Page 130: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : The 2 variables (shirt condition & daily shifts) are independent.

H1 : The 2 variables (shirt condition & daily shifts) are not independent.

Page 131: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2) Compute expected value

Note that in each cell, the expected value is n × P(shift of worker) × P(shirt condition) when the null hypothesis is true.

For instance, for the cell in the 1st row & 1st column:

$e = n \times P(\text{shirt from Shift A}) \times P(\text{shirt is perfect}) = 600 \times \dfrac{250}{600} \times \dfrac{570}{600} = 237.5$

Page 132: Hang Seng Management College STA102 Statistical Analysis ...

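(Step 2) Compute expected value

In general, the expected frequency for a cell in row i and column j is

$e = \dfrac{\text{Row } i \text{ total} \times \text{Column } j \text{ total}}{\text{Total sample size}}$

So, the expected frequency for the cell in the 1st row and the 1st column is

$e = \dfrac{570 \times 250}{600} = 237.5$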

Page 133: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2) Compute expected value

                          Shift
Shirt Condition    A           B           C           Total
Perfect            f1 = 240    f2 = 191    f3 = 139    570
Flawed             f4 = 10     f5 = 9      f6 = 11     30
Total              250         200         150         600

$e_1 = \dfrac{570 \times 250}{600} = 237.5$, $e_2 = \dfrac{570 \times 200}{600} = 190$, $e_3 = \dfrac{570 \times 150}{600} = 142.5$

$e_4 = \dfrac{30 \times 250}{600} = 12.5$, $e_5 = \dfrac{30 \times 200}{600} = 10$, $e_6 = \dfrac{30 \times 150}{600} = 7.5$

Page 134: Hang Seng Management College STA102 Statistical Analysis ...

(Step 3)

The test statistic is

$\chi^2 = \sum_{i=1}^{k} \dfrac{(f_i - e_i)^2}{e_i}$

$= \dfrac{(240 - 237.5)^2}{237.5} + \dfrac{(191 - 190)^2}{190} + \dfrac{(139 - 142.5)^2}{142.5} + \dfrac{(10 - 12.5)^2}{12.5} + \dfrac{(9 - 10)^2}{10} + \dfrac{(11 - 7.5)^2}{7.5}$

$= 2.35$

Page 135: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4)

α = 5%

d.f.: (r-1)(c-1) = (2-1)(3-1) = 2

Rejection region: $\chi^2 > \chi^2_{\alpha,\, df}$

i.e. the null hypothesis must be rejected if

$\chi^2 > \chi^2_{0.05,\, 2} = 5.99147$

Page 136: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5)

[Figure: chi-square density with the rejection region to the right of $\chi^2_{0.05,\, 2} = 5.99147$; the test statistic $\chi^2 = 2.35$ lies outside it.]

(1) Since $\chi^2 = 2.35 < 5.99147$, we do not reject H0 at α = 5%.

(2) There is not enough evidence to infer that there are differences in production quality among the three shifts of workers.
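The expected frequencies and the chi-squared statistic of Example 23 can be recomputed from the observed table alone. This is a sketch with a hypothetical helper `chi_square_independence`; evaluating the sum directly gives about 2.35. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail area of the chi-square distribution is exp(-x/2); that shortcut does not apply to other degrees of freedom.

```python
from math import exp, log

# Observed counts: rows = shirt condition (Perfect, Flawed), columns = shifts A, B, C.
observed = [[240, 191, 139],
            [10, 9, 11]]


def chi_square_independence(table):
    """Return (chi-squared statistic, d.f., expected table) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / total sample size.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_sq = sum((f - e) ** 2 / e
                 for f_row, e_row in zip(table, expected)
                 for f, e in zip(f_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df, expected


alpha = 0.05
chi_sq, df, expected = chi_square_independence(observed)

# Rule of five: every expected count should be at least 5.
rule_of_five_ok = all(e >= 5 for row in expected for e in row)

# Valid only because df == 2: P(chi2 > x) = exp(-x / 2).
p_value = exp(-chi_sq / 2)
chi_crit = -2 * log(alpha)   # critical value for 2 d.f., about 5.99

print(f"chi-squared = {chi_sq:.4f}, d.f. = {df}, p-value = {p_value:.4f}")
print("reject H0" if chi_sq > chi_crit else "do not reject H0")
```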

Page 137: Hang Seng Management College STA102 Statistical Analysis ...

11. ANOVA (ANalysis Of VAriance)

• Compare means of more than 2 independent populations
– Example: test if there is any difference in the effectiveness of 3 training methods

$H_0: \mu_1 = \mu_2 = \mu_3$

$H_1$: not all means are equal

where $\mu_i$ = mean score obtained by the employees at the end of training method i (i = 1, 2, 3)

Page 138: Hang Seng Management College STA102 Statistical Analysis ...

11.1 Single-Factor or One-Way ANOVA

• Factor: criterion to classify a population.

• Treatments: factor levels

– Example: comparing 3 training methods
  • Factor: training method
  • with 3 levels (treatments)

Page 139: Hang Seng Management College STA102 Statistical Analysis ...

• Three assumptions must be satisfied:

– The samples are drawn independently.

– The populations have equal variances.

– Each population is normally distributed.

Page 140: Hang Seng Management College STA102 Statistical Analysis ...

One-Way ANOVA

ANOVA Table

Source of Variation   Degrees of freedom   Sums of Squares   Mean Squares                      F-statistic
Treatments            $k - 1$              SSTr              $MSTr = \dfrac{SSTr}{k - 1}$      $F = \dfrac{MSTr}{MSE}$
Error                 $n - k$              SSE               $MSE = \dfrac{SSE}{n - k}$
Total                 $n - 1$              SST

Test statistic: $F = \dfrac{MSTr}{MSE}$

Rejection region: $F > F_{\alpha,\, k-1,\, n-k}$

Page 141: Hang Seng Management College STA102 Statistical Analysis ...

One-Way ANOVA

Two types of differences between means

1. Differences between each sample mean and the grand mean of all observations.

2. Differences between observations in each level and each sample mean.

Page 142: Hang Seng Management College STA102 Statistical Analysis ...

Sum of Squares for Treatments (SSTr)

1. Differences between each sample mean and the grand mean of all observations are measured by SSTr (or between-treatment variation)

$SSTr = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$

Page 143: Hang Seng Management College STA102 Statistical Analysis ...

$SSTr = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$

The closer the sample means $\bar{x}_j$ are to one another, the closer they are to the grand mean $\bar{\bar{x}}$ and the smaller SSTr is.

If there are large differences between the sample means, some sample means differ considerably from the grand mean and SSTr will be larger.

Page 144: Hang Seng Management College STA102 Statistical Analysis ...

Sum of Squares for Error (SSE)

2. Differences between observations in each level and each sample mean are measured by SSE (or within-treatments variation)

$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$

Computational formula:

$SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2$
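The computational formula follows from the definition of the sample variance of each treatment group. A short derivation, using the same symbols as the two SSE formulas for treatment j with $n_j$ observations, sample mean $\bar{x}_j$ and sample variance $s_j^2$:

```latex
\begin{align*}
s_j^2 &= \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2
\quad\Longrightarrow\quad
\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 = (n_j - 1)\, s_j^2 \\
SSE &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2
     = \sum_{j=1}^{k} (n_j - 1)\, s_j^2
\end{align*}
```

So SSE can be obtained from the group sizes and sample variances alone, without returning to the individual observations.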

Page 145: Hang Seng Management College STA102 Statistical Analysis ...

• If the differences between observations in each level and each sample mean (i.e. SSE) are small, the differences between sample means are likely due to real differences among the population means.

• If SSTr is large but SSE is small, there is evidence to conclude that there is a significant difference between the population means.

Page 146: Hang Seng Management College STA102 Statistical Analysis ...

Example 24

Departments of Finance, Marketing and Management asked each of their MBA graduates to report the number of job offers. Can we conclude at the 5% significance level that there are differences in the number of job offers between the three MBA majors?

Fin   Mkt   Mgt
3     1     8
1     5     5
4     3     4
1     4     6

Page 147: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : $\mu_{Fin} = \mu_{Mkt} = \mu_{Mgt}$

H1 : not all means are the same (at least 2 means differ)

(Step 2)

        Fin     Mkt       Mgt
        3       1         8
        1       5         5
        4       3         4
        1       4         6
Mean:   2.25    3.25      5.75
Var:    2.25    2.91666   2.91666

Page 148: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2)

$n_{Fin} = 4$, $\bar{x}_{Fin} = 2.25$, $s_{Fin}^2 = 2.25$

$n_{Mkt} = 4$, $\bar{x}_{Mkt} = 3.25$, $s_{Mkt}^2 = 2.91666$

$n_{Mgt} = 4$, $\bar{x}_{Mgt} = 5.75$, $s_{Mgt}^2 = 2.91666$

n = 12, Grand Mean $\bar{\bar{x}} = 3.75$, α = 5%

Page 149: Hang Seng Management College STA102 Statistical Analysis ...

(Step 3)

Sum of Squares for Treatments:

$SSTr = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$

$SSTr = 4(2.25 - 3.75)^2 + 4(3.25 - 3.75)^2 + 4(5.75 - 3.75)^2 = 26$

Page 150: Hang Seng Management College STA102 Statistical Analysis ...

(Step 3)

Sum of Squares for Error:

$SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2$

$SSE = 3(2.25) + 3(2.91666) + 3(2.91666) = 24.24996$

Page 151: Hang Seng Management College STA102 Statistical Analysis ...

(Step 3)

ANOVA Table

Source       d.f.         SS         MS        F
Treatments   k-1 = 2      26         13        F = 4.8248
Error        n-k = 9      24.24996   2.69444
Total        n-1 = 11     50.24996

Page 152: Hang Seng Management College STA102 Statistical Analysis ...

(Step 4)

Rejection region: $F > F_{\alpha,\, k-1,\, n-k} = F_{0.05,\, 2,\, 9} = 4.26$

(Step 5)

[Figure: F density with the rejection region (area 5%) to the right of $F_{0.05,\, 2,\, 9} = 4.26$; the test statistic F = 4.8248 falls inside it.]

(1) Since F = 4.8248 > 4.26, we reject H0 at α = 5%.

(2) There is enough evidence to infer that there are differences in the no. of job offers for the three majors.
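The ANOVA table of Example 24 can be rebuilt from the raw job-offer counts. The sketch below uses a hypothetical helper `one_way_anova` built on Python's standard `statistics` module, and takes the critical value 4.26 from the F table quoted in the example. Small differences from the slide values in the last decimal place come from the rounded variances used there.

```python
from statistics import mean, variance

job_offers = {
    "Fin": [3, 1, 4, 1],
    "Mkt": [1, 5, 3, 4],
    "Mgt": [8, 5, 4, 6],
}


def one_way_anova(samples):
    """Return a dict with SSTr, SSE, MSTr, MSE and F for a list of samples."""
    k = len(samples)                                   # number of treatments
    n = sum(len(s) for s in samples)                   # total sample size
    grand_mean = sum(sum(s) for s in samples) / n
    # Between-treatment variation: sample means against the grand mean.
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatment variation via the computational formula.
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return {"SSTr": sstr, "SSE": sse, "MSTr": mstr, "MSE": mse,
            "F": mstr / mse, "df": (k - 1, n - k)}


result = one_way_anova(list(job_offers.values()))
f_crit = 4.26   # F_{0.05, 2, 9}, copied from the F table

print(f"SSTr = {result['SSTr']:.2f}, SSE = {result['SSE']:.2f}, F = {result['F']:.4f}")
print("reject H0" if result["F"] > f_crit else "do not reject H0")
```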

Page 153: Hang Seng Management College STA102 Statistical Analysis ...

11.2 Comparing the F-Test in the ANOVA and the Pooled Variance t-Test

Instead of applying one F-test for $\mu_1 = \mu_2 = \mu_3$, we can apply 3 pooled variance t-tests: $\mu_1 = \mu_2$, $\mu_1 = \mu_3$, $\mu_2 = \mu_3$.

We still prefer to use one F-test because

1. Fewer calculations are involved.

2. Performing more tests increases the chance of committing type I error.
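The second point can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that the three pairwise t-tests are independent and each uses α = 5%. In practice the tests share data and are not independent, so the figure is only an approximation of how the chance of at least one type I error grows.

```python
from math import comb

alpha = 0.05
k = 3                     # number of population means being compared
n_tests = comb(k, 2)      # number of pairwise t-tests

# P(at least one type I error) if the tests were independent (an assumption).
familywise_error = 1 - (1 - alpha) ** n_tests

print(f"{n_tests} pairwise tests, chance of at least one type I error: {familywise_error:.4f}")
```

Under that assumption the overall chance of a type I error is about 14%, well above the 5% of a single F-test.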

Page 154: Hang Seng Management College STA102 Statistical Analysis ...

11.3 Randomized Block (Two-Way) ANOVA

• Example: 3 training methods

– trainees are assigned to each of 3 methods randomly

– trainees may differ in their abilities

– SSE may include the variations due to their abilities

Page 155: Hang Seng Management College STA102 Statistical Analysis ...

• Alter the design:

– Compare the 3 training methods using trainees with similar ability

– Factor: each training method

– Block: each ability group

• Variation due to the ability differences can be reduced, making it easier to determine whether differences exist between the treatment means.

Page 156: Hang Seng Management College STA102 Statistical Analysis ...

Example 25

A pharmaceutical company has recently developed four drugs to reduce cholesterol levels of patients with very high levels (over 280) of cholesterol. To determine whether any differences exist in their benefits, an experiment was organized. The company selected 25 groups of four men. In each group, the men were matched according to age and weight. The drugs were administered over a 2-month period, and the reduction in cholesterol was recorded.

Page 157: Hang Seng Management College STA102 Statistical Analysis ...

Test if the drugs are equally good at reducing cholesterol levels at the 5% significance level. Each block (4 persons) is matched according to age & weight.

                   Treatments
Block    Drug 1    Drug 2    Drug 3    Drug 4
1        6.6       12.6      2.7       8.7
2        7.1       3.5       2.4       9.3
:        :         :         :         :
25       28.4      31.2      26.1      27.4

Page 158: Hang Seng Management College STA102 Statistical Analysis ...

(Step 1)

H0 : $\mu_1 = \mu_2 = \mu_3 = \mu_4$

H1 : at least 2 means differ

Use Excel: Data Analysis, ANOVA: Two-Factor Without Replication (in Chinese-language Excel: 雙因子變異數分析 : 無重複試驗)

Page 159: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2-4)

The Excel output:

ANOVA

Source                  SS          d.f.   MS          F          P-value    F crit
Rows (blocks)           3848.657    24     160.3607    10.10537   9.70E-15   1.669456 ($F_{0.05,\, 24,\, 72}$)
Columns (treatments)    195.9547    3      65.31823    4.116127   0.009418   2.731809 ($F_{0.05,\, 3,\, 72}$)
Error                   1142.558    72     15.86886
Total                   5187.169    99

Page 160: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5)

• Since p-value = 0.0094 < α, we reject H0 at α = 5%.

• In other words, we conclude that there is sufficient evidence to infer that at least two of the drugs differ in reducing cholesterol.
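The mean squares and F ratios in the Excel output can be cross-checked from the sums of squares and the design (25 blocks, 4 treatments). This is a sketch with illustrative names; it reproduces only the arithmetic of the table, not the p-values, which would need an F distribution function from a statistics library.

```python
# Sums of squares as reported in the Excel output.
ss = {"blocks": 3848.657, "treatments": 195.9547, "error": 1142.558}

n_blocks, n_treatments = 25, 4
df = {
    "blocks": n_blocks - 1,
    "treatments": n_treatments - 1,
    "error": (n_blocks - 1) * (n_treatments - 1),
}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

f_treatments = ms["treatments"] / ms["error"]   # tests the drug means
f_blocks = ms["blocks"] / ms["error"]           # tests the block means

# Critical values copied from the "F crit" column of the Excel output.
f_crit_treatments, f_crit_blocks = 2.731809, 1.669456

print(f"F (treatments) = {f_treatments:.4f}, reject H0: {f_treatments > f_crit_treatments}")
print(f"F (blocks)     = {f_blocks:.4f}, reject H0: {f_blocks > f_crit_blocks}")
```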

Page 161: Hang Seng Management College STA102 Statistical Analysis ...

Are there any significant differences between groups? (i.e., should the randomized block design be recommended?)

(Step 1)

H0 : $\mu_1 = \mu_2 = \dots = \mu_{25}$

H1 : at least 2 means differ

Page 162: Hang Seng Management College STA102 Statistical Analysis ...

(Step 2-4)

The Excel output:

ANOVA

Source                  SS          d.f.   MS          F          P-value    F crit
Rows (blocks)           3848.657    24     160.3607    10.10537   9.70E-15   1.669456 ($F_{0.05,\, 24,\, 72}$)
Columns (treatments)    195.9547    3      65.31823    4.116127   0.009418   2.731809 ($F_{0.05,\, 3,\, 72}$)
Error                   1142.558    72     15.86886
Total                   5187.169    99

Page 163: Hang Seng Management College STA102 Statistical Analysis ...

(Step 5)

• Since p-value = 9.70E-15 < α, we reject H0 at α = 5%.

• In other words, we conclude that there is sufficient evidence to infer that there are differences between the groups of men.

• The Randomized Block Design is an appropriate choice.

Page 164: Hang Seng Management College STA102 Statistical Analysis ...

End of Chapter 2