Analysis of Variance

One-way ANOVA and two-way ANOVA: a comparison with examples

Transcript of Analysis of Variance

Page 1: Analysis of Variance

Analysis of Variance

Page 2: Analysis of Variance

Introduction

• Analysis of variance compares two or more populations of interval data.

• Specifically, we are interested in determining whether differences exist between the population means.

• The procedure works by analyzing the sample variance.

Page 3: Analysis of Variance

• The analysis of variance is a procedure that tests whether differences exist between two or more population means.

• To do this, the technique analyzes the sample variances.

One Way Analysis of Variance

Page 4: Analysis of Variance

• Example
  – An apple juice manufacturer is planning to develop a new product, a liquid concentrate.
  – The marketing manager has to decide how to market the new product.
  – Three strategies are considered:
    • Emphasize convenience of using the product.
    • Emphasize the quality of the product.
    • Emphasize the product’s low price.

One Way Analysis of Variance

Page 5: Analysis of Variance

• Example continued
  – An experiment was conducted as follows:
    • In three cities an advertisement campaign was launched.
    • In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
    • The weekly sales were recorded for twenty weeks following the beginning of the campaigns.

One Way Analysis of Variance

Page 6: Analysis of Variance

One Way Analysis of Variance

Weekly sales (one column per city, 20 weeks each)

Convenience  Quality  Price
529          804      672
658          630      531
793          774      443
514          717      596
663          679      602
719          604      502
711          620      659
606          697      689
461          706      675
529          615      512
498          492      691
663          719      733
604          787      698
495          699      776
485          572      561
557          523      572
353          584      469
557          634      581
542          580      679
614          624      532

Page 7: Analysis of Variance

• Solution
  – The data are interval.
  – The problem objective is to compare sales in three cities.
  – We hypothesize that the three population means are equal.

One Way Analysis of Variance

Page 8: Analysis of Variance

Defining the Hypotheses

• Solution

H0: μ1 = μ2 = μ3

H1: At least two means differ

To build the statistic needed to test the hypotheses, use the following notation:

Page 9: Analysis of Variance

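Notation

Independent samples are drawn from k populations (treatments).

$$
\begin{array}{lcccc}
\text{Sample (treatment)} & 1 & 2 & \cdots & k \\
\hline
\text{Observations} & x_{11} & x_{12} & \cdots & x_{1k} \\
 & x_{21} & x_{22} & \cdots & x_{2k} \\
 & \vdots & \vdots & & \vdots \\
 & x_{n_1,1} & x_{n_2,2} & \cdots & x_{n_k,k} \\
\hline
\text{Sample size} & n_1 & n_2 & \cdots & n_k \\
\text{Sample mean} & \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_k
\end{array}
$$

Here $x_{11}$ is the first observation of the first sample and $x_{22}$ is the second observation of the second sample.

X is the “response variable”. The variable’s values are called “responses”.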

Page 10: Analysis of Variance

Terminology

• In the context of this problem:
  – Response variable: weekly sales.
  – Responses: actual sale values.
  – Experimental unit: the weeks in the three cities when we record sales figures.
  – Factor: the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
  – Factor levels: the population (treatment) names. In this problem the factor levels are the marketing strategies.

Page 11: Analysis of Variance

The rationale of the test statistic

Two types of variability are employed when testing for the equality of the population means.

Page 12: Analysis of Variance

Graphical demonstration: employing two types of variability

Page 13: Analysis of Variance

[Figure: two dot plots of the responses for Treatment 1, Treatment 2 and Treatment 3. In both panels the sample means are $\bar{x}_1 = 10$, $\bar{x}_2 = 15$ and $\bar{x}_3 = 20$; the panels differ only in how widely the observations scatter around those means.]

A small variability within the samples makes it easier to draw a conclusion about the population means.

The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
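The effect of within-sample variability can be checked numerically. The sketch below uses two small made-up data sets (hypothetical values, not the ones plotted on the slide) that share the sample means 10, 15 and 20 but differ in spread, and computes the ratio of between-sample to within-sample variability that the later slides develop into the F statistic. The helper name `f_ratio` is an illustrative choice.

```python
from statistics import mean

def f_ratio(samples):
    """Return (between-sample mean square) / (within-sample mean square)."""
    k = len(samples)                      # number of treatments
    n = sum(len(s) for s in samples)      # total number of observations
    grand = sum(sum(s) for s in samples) / n
    between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical data: both sets have sample means 10, 15 and 20.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
spread = [[1, 10, 19], [7, 15, 23], [10, 20, 30]]

print(f_ratio(tight))   # large ratio: the means are clearly separated
print(f_ratio(spread))  # small ratio: same means, but much noisier samples
```

With these made-up numbers the between-sample variation is identical in both sets, so the whole difference in the ratio comes from the within-sample spread.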

Page 14: Analysis of Variance

The rationale behind the test statistic – I

• If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).

• If the alternative hypothesis is true, at least some of the sample means would differ.

• Thus, we measure variability between sample means.

Page 15: Analysis of Variance

Variability between sample means

• The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean. This sum is called the Sum of Squares for Treatments (SST).

• In our example, treatments are represented by the different advertising strategies.

Page 16: Analysis of Variance

Sum of squares for treatments (SST)

$$SST = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{\bar{x}}\right)^2$$

where there are k treatments, $n_j$ is the size of sample j, $\bar{x}_j$ is the mean of sample j, and $\bar{\bar{x}}$ is the grand mean.

Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.

Page 17: Analysis of Variance

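Sum of squares for treatments (SST)

• Solution – continued: calculate SST

The sample means are

$$\bar{x}_1 = 577.55, \qquad \bar{x}_2 = 653.00, \qquad \bar{x}_3 = 608.65$$

The grand mean is calculated by

$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$$

which is 613.07 for these data. Then

$$SST = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{\bar{x}}\right)^2 = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$$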
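The SST calculation can be reproduced with a few lines of Python. This is a minimal sketch that takes the sample sizes and sample means quoted on the slide as given; the variable names are illustrative.

```python
# Sample sizes and sample means as reported on the slide.
n = [20, 20, 20]
xbar = [577.55, 653.00, 608.65]

# Grand mean: the sample means weighted by their sample sizes.
grand_mean = sum(nj * xj for nj, xj in zip(n, xbar)) / sum(n)

# SST: squared distance of each sample mean from the grand mean,
# weighted by the corresponding sample size.
sst = sum(nj * (xj - grand_mean) ** 2 for nj, xj in zip(n, xbar))

print(round(grand_mean, 2))
print(round(sst, 2))
```

Because all three samples have the same size here, the weighted grand mean coincides with the plain average of the three sample means.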

Page 18: Analysis of Variance

• Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.

• Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the “within samples variability”.

The rationale behind test statistic – II

Page 19: Analysis of Variance

Within samples variability

• The variability within samples is measured by adding all the squared distances between observations and their sample means. This sum is called the Sum of Squares for Error (SSE).

• In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all the three cities).

Page 20: Analysis of Variance

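Sum of squares for errors (SSE)

• Solution – continued: calculate SSE

The sample variances are

$$s_1^2 = 10{,}775.00, \qquad s_2^2 = 7{,}238.11, \qquad s_3^2 = 8{,}670.24$$

$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_j\right)^2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$

$$= (20 - 1)(10{,}775.00) + (20 - 1)(7{,}238.11) + (20 - 1)(8{,}670.24) \approx 506{,}983.50$$

(Multiplying the rounded variances gives 506,983.65; the value 506,983.50 is what the unrounded data yield.)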
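As a quick check of the SSE arithmetic, the sketch below pools the sample variances exactly as in the formula above. It assumes the variances reported in the Excel summary later in the slides (10,775.00, 7,238.11 and 8,670.24); because those are rounded to two decimals, the result differs slightly from the 506,983.50 quoted in the text.

```python
# Sample sizes and sample variances (rounded) from the slides' Excel summary.
n = [20, 20, 20]
s2 = [10775.00, 7238.11, 8670.24]

# SSE pools the within-sample squared deviations:
# the sum over samples of (n_j - 1) * s_j^2.
sse = sum((nj - 1) * vj for nj, vj in zip(n, s2))

print(round(sse, 2))
```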

Page 21: Analysis of Variance

The mean sum of squares

To perform the test we need to calculate the mean squares as follows:

Calculation of MST, the Mean Square for Treatments:

$$MST = \frac{SST}{k - 1} = \frac{57{,}512.23}{3 - 1} = 28{,}756.12$$

Calculation of MSE, the Mean Square for Error:

$$MSE = \frac{SSE}{n - k} = \frac{506{,}983.50}{60 - 3} = 8{,}894.45$$

Page 22: Analysis of Variance

Calculation of the test statistic

$$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$

with the following degrees of freedom: $\nu_1 = k - 1$ and $\nu_2 = n - k$.

Required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
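The mean squares and the F statistic follow directly from the two sums of squares. The sketch below takes SST and SSE from the preceding slides as given and applies the definitions above; the variable names are illustrative.

```python
# Sums of squares from the preceding slides.
sst = 57512.23    # sum of squares for treatments
sse = 506983.50   # sum of squares for error
k = 3             # number of treatments
n = 60            # total number of observations

df_treatments = k - 1   # numerator degrees of freedom
df_error = n - k        # denominator degrees of freedom

mst = sst / df_treatments   # mean square for treatments
mse = sse / df_error        # mean square for error
f_stat = mst / mse

print(df_treatments, df_error)
print(round(f_stat, 2))
```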

Page 23: Analysis of Variance

The F test rejection region

And finally the hypothesis test:

H0: μ1 = μ2 = … = μk

H1: At least two means differ

Test statistic: $F = \dfrac{MST}{MSE}$

Rejection region (R.R.): $F > F_{\alpha,\,k-1,\,n-k}$

Page 24: Analysis of Variance

The F test

H0: μ1 = μ2 = μ3

H1: At least two means differ

Test statistic:

$$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$

Rejection region (R.R.): $F > F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,3-1,\,60-3} \approx 3.15$

Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favor of H1 and to conclude that at least one of the mean sales differs from the others.
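The critical value and p-value for this test can be computed without statistical tables, because an F distribution with exactly 2 numerator degrees of freedom has a closed-form upper tail: P(F > f) = (1 + 2f/ν2)^(−ν2/2). The sketch below relies on that special case only (it does not work for other numerator degrees of freedom); the function names are illustrative. Note that with 57 denominator degrees of freedom the critical value comes out near 3.16, as in the Excel output on a later slide, while the slide's 3.15 is presumably a table value for 60 degrees of freedom.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Value whose upper-tail area is alpha (inverse of the function above)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

f_stat = 28756.12 / 8894.45   # MST / MSE from the slides
df2 = 60 - 3                  # n - k

p_value = f_upper_tail_df1_2(f_stat, df2)
f_crit = f_critical_df1_2(0.05, df2)

print(round(f_stat, 2), round(p_value, 4), round(f_crit, 2))
print("reject H0" if f_stat > f_crit else "do not reject H0")
```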

Page 25: Analysis of Variance

Single-factor ANOVA

SS(Total) = SST + SSE

Anova: Single Factor

SUMMARY
Groups        Count  Sum    Average  Variance
Convenience   20     11551  577.55   10775.00
Quality       20     13060  653.00   7238.11
Price         20     12173  608.65   8670.24

ANOVA
Source of Variation  SS      df  MS     F     P-value  F crit
Between Groups       57512   2   28756  3.23  0.0468   3.16
Within Groups        506984  57  8894
Total                564496  59
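The identity SS(Total) = SST + SSE can be verified directly from the raw observations. The sketch below uses the weekly sales transcribed from the data slide and recomputes all three sums of squares from their definitions; the dictionary layout and variable names are illustrative.

```python
from statistics import mean

# Weekly sales transcribed from the data slide (20 weeks per city).
sales = {
    "Convenience": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                    498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "Quality": [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "Price": [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
              691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}

all_obs = [x for group in sales.values() for x in group]
grand_mean = mean(all_obs)
group_mean = {name: mean(group) for name, group in sales.items()}

# Between-groups, within-groups and total sums of squares.
sst = sum(len(g) * (group_mean[name] - grand_mean) ** 2
          for name, g in sales.items())
sse = sum((x - group_mean[name]) ** 2
          for name, g in sales.items() for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

print(round(sst, 2), round(sse, 2), round(ss_total, 2))
print(abs(sst + sse - ss_total) < 1e-6)  # the partition holds
```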

Page 26: Analysis of Variance

Models of Fixed and Random Effects

• Fixed effects
  – If all possible levels of a factor are included in our analysis we have a fixed-effect ANOVA.
  – The conclusion of a fixed-effect ANOVA applies only to the levels studied.
• Random effects
  – If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA.
  – The conclusion of the random-effect ANOVA applies to all the levels (not only those studied).

Page 27: Analysis of Variance

Models of Fixed and Random Effects

• In some ANOVA models the test statistic of the fixed-effects case may differ from the test statistic of the random-effects case.

• Fixed and random effects: examples
  – Fixed effects: the advertisement example. All the levels of the marketing strategies were included.
  – Random effects: to determine if there is a difference in the production rate of 50 machines, four machines are randomly selected and their production recorded.

Page 28: Analysis of Variance

Two Way Analysis of Variance

Page 29: Analysis of Variance

[Figure: side-by-side comparison of one-way ANOVA (a single factor, with the response plotted for Treatments 1, 2 and 3, one per factor level) and two-way ANOVA (two factors, with the response plotted for each combination of a level of Factor A and a level of Factor B).]

Page 30: Analysis of Variance

Two-Factor Analysis of Variance

• Example
  – Suppose that in the example two factors are to be examined:
    • The effects of the marketing strategy on sales.
      – Emphasis on convenience
      – Emphasis on quality
      – Emphasis on price
    • The effects of the selected media on sales.
      – Advertise on TV
      – Advertise in newspapers

Page 31: Analysis of Variance

Attempting one-way ANOVA

• Solution
  – We may attempt to analyze combinations of levels, one from each factor, using one-way ANOVA.
  – The treatments will be:
    • Treatment 1: Emphasize convenience and advertise on TV
    • Treatment 2: Emphasize convenience and advertise in newspapers
    • …
    • Treatment 6: Emphasize price and advertise in newspapers

Page 32: Analysis of Variance

Attempting one-way ANOVA

• Solution
  – The hypotheses tested are:

H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6

H1: At least two means differ.

Page 33: Analysis of Variance

Attempting one-way ANOVA

• Solution
  – In each one of six cities sales are recorded for ten weeks.
  – In each city a different combination of marketing emphasis and media usage is employed.

City      City 1       City 2       City 3   City 4   City 5  City 6
Emphasis  Convenience  Convenience  Quality  Quality  Price   Price
Medium    TV           Paper        TV       Paper    TV      Paper

Page 34: Analysis of Variance

Attempting one-way ANOVA

• Solution

• The p-value = .0452.
• We conclude that there is evidence that differences exist in the mean weekly sales among the six cities.

City      City 1       City 2       City 3   City 4   City 5  City 6
Emphasis  Convenience  Convenience  Quality  Quality  Price   Price
Medium    TV           Paper        TV       Paper    TV      Paper

Page 35: Analysis of Variance

Interesting questions – no answers

• These results raise some questions:
  – Are the differences in sales caused by the different marketing strategies?
  – Are the differences in sales caused by the different media used for advertising?
  – Are there combinations of marketing strategy and media that interact to affect the weekly sales?

Page 36: Analysis of Variance

• The current experimental design cannot provide answers to these questions.

• A new experimental design is needed.

Two-way ANOVA (two factors)

Page 37: Analysis of Variance

Two-way ANOVA (two factors)

Are there differences in the mean sales caused by different marketing strategies?

Factor A: Marketing strategy (columns). Factor B: Advertising media (rows).

              Convenience    Quality        Price
TV            City 1 sales   City 3 sales   City 5 sales
Newspapers    City 2 sales   City 4 sales   City 6 sales

Page 38: Analysis of Variance

Two-way ANOVA (two factors)

Test whether mean sales of “Convenience”, “Quality”, and “Price” significantly differ from one another.

H0: μConv. = μQuality = μPrice

H1: At least two means differ

Calculations are based on the sum of squares for factor A, SS(A).

Page 39: Analysis of Variance

Two-way ANOVA (two factors)

Are there differences in the mean sales caused by different advertising media?

Factor A: Marketing strategy (columns). Factor B: Advertising media (rows).

              Convenience    Quality        Price
TV            City 1 sales   City 3 sales   City 5 sales
Newspapers    City 2 sales   City 4 sales   City 6 sales

Page 40: Analysis of Variance

Two-way ANOVA (two factors)

Test whether mean sales of “TV” and “Newspapers” significantly differ from one another.

H0: μTV = μNewspapers

H1: The means differ

Calculations are based on the sum of squares for factor B, SS(B).

Page 41: Analysis of Variance

Two-way ANOVA (two factors)

Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium?

Factor A: Marketing strategy (columns). Factor B: Advertising media (rows).

              Convenience    Quality        Price
TV            City 1 sales   City 3 sales   City 5 sales
Newspapers    City 2 sales   City 4 sales   City 6 sales

The slide shows one cell separately, City 3 sales (TV, Quality), apparently as a highlighted example of a single strategy-medium combination.

Page 42: Analysis of Variance

Two-way ANOVA (two factors)

Test whether mean sales of certain cells are different from the level expected if the two factors acted independently of each other.

Calculations are based on the sum of squares for interaction, SS(AB).

Page 43: Analysis of Variance

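Sums of squares

Here a and b are the numbers of levels of factors A and B, and r is the number of observations per cell (a = 3, b = 2, r = 10 in the example).

$$SS(A) = rb\sum_{i=1}^{a}\left(\bar{x}[A]_i - \bar{\bar{x}}\right)^2 = (10)(2)\left\{(\bar{x}_{conv.} - \bar{\bar{x}})^2 + (\bar{x}_{quality} - \bar{\bar{x}})^2 + (\bar{x}_{price} - \bar{\bar{x}})^2\right\}$$

$$SS(B) = ra\sum_{j=1}^{b}\left(\bar{x}[B]_j - \bar{\bar{x}}\right)^2 = (10)(3)\left\{(\bar{x}_{TV} - \bar{\bar{x}})^2 + (\bar{x}_{Newspaper} - \bar{\bar{x}})^2\right\}$$

$$SS(AB) = r\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar{x}[AB]_{ij} - \bar{x}[A]_i - \bar{x}[B]_j + \bar{\bar{x}}\right)^2$$

$$SSE = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}\left(x_{ijk} - \bar{x}[AB]_{ij}\right)^2$$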
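For a balanced design like this one (the same number r of observations in every cell), the four sums of squares add up to the total variation, and the degrees of freedom split the same way. This is the standard partition, added here for orientation rather than taken from the slides; the last line plugs in a = 3, b = 2, r = 10 from the example.

```latex
\begin{aligned}
SS(\text{Total}) &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}
    \left(x_{ijk}-\bar{\bar{x}}\right)^{2}
  = SS(A) + SS(B) + SS(AB) + SSE \\
n - 1 &= (a-1) + (b-1) + (a-1)(b-1) + (n-ab), \qquad n = abr \\
59 &= 2 + 1 + 2 + 54
\end{aligned}
```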

Page 44: Analysis of Variance

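F tests for the Two-way ANOVA

• Test for the difference between the levels of the main factors A and B

$$F = \frac{MS(A)}{MSE}, \qquad F = \frac{MS(B)}{MSE}$$

Rejection regions: $F > F_{\alpha,\,a-1,\,n-ab}$ for factor A and $F > F_{\alpha,\,b-1,\,n-ab}$ for factor B.

• Test for interaction between factors A and B

$$F = \frac{MS(AB)}{MSE}$$

Rejection region: $F > F_{\alpha,\,(a-1)(b-1),\,n-ab}$

The mean squares are

$$MS(A) = \frac{SS(A)}{a-1}, \qquad MS(B) = \frac{SS(B)}{b-1}, \qquad MS(AB) = \frac{SS(AB)}{(a-1)(b-1)}, \qquad MSE = \frac{SSE}{n-ab}$$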

Page 45: Analysis of Variance

Required conditions:

1. The response distribution is normal.
2. The treatment variances are equal.
3. The samples are independent.

Page 46: Analysis of Variance

F tests for the Two-way ANOVA

Medium     Convenience  Quality  Price
TV         491          677      575
TV         712          627      614
TV         558          590      706
TV         447          632      484
TV         479          683      478
TV         624          760      650
TV         546          690      583
TV         444          548      536
TV         582          579      579
TV         672          644      795
Newspaper  464          689      803
Newspaper  559          650      584
Newspaper  759          704      525
Newspaper  557          652      498
Newspaper  528          576      812
Newspaper  670          836      565
Newspaper  534          628      708
Newspaper  657          798      546
Newspaper  557          497      616
Newspaper  474          841      587
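The two-way sums of squares defined on the earlier slide can be computed from the table above. The following is a minimal sketch, assuming the data are keyed by (medium, strategy) with 10 replicates per cell; the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
from statistics import mean

# Weekly sales from the table above: 10 weeks (replicates) per cell.
data = {
    ("TV", "Convenience"): [491, 712, 558, 447, 479, 624, 546, 444, 582, 672],
    ("TV", "Quality"): [677, 627, 590, 632, 683, 760, 690, 548, 579, 644],
    ("TV", "Price"): [575, 614, 706, 484, 478, 650, 583, 536, 579, 795],
    ("Newspaper", "Convenience"): [464, 559, 759, 557, 528, 670, 534, 657, 557, 474],
    ("Newspaper", "Quality"): [689, 650, 704, 652, 576, 836, 628, 798, 497, 841],
    ("Newspaper", "Price"): [803, 584, 525, 498, 812, 565, 708, 546, 616, 587],
}
strategy = ["Convenience", "Quality", "Price"]   # factor A, a = 3 levels
media = ["TV", "Newspaper"]                      # factor B, b = 2 levels
a, b, r = len(strategy), len(media), 10

# Grand mean, factor-level means and cell means.
grand = mean([x for cell in data.values() for x in cell])
mean_a = {s: mean([x for m in media for x in data[(m, s)]]) for s in strategy}
mean_b = {m: mean([x for s in strategy for x in data[(m, s)]]) for m in media}
mean_ab = {key: mean(cell) for key, cell in data.items()}

ss_a = r * b * sum((mean_a[s] - grand) ** 2 for s in strategy)
ss_b = r * a * sum((mean_b[m] - grand) ** 2 for m in media)
ss_ab = r * sum((mean_ab[(m, s)] - mean_a[s] - mean_b[m] + grand) ** 2
                for m in media for s in strategy)
sse = sum((x - mean_ab[key]) ** 2 for key, cell in data.items() for x in cell)

print(round(ss_a, 1), round(ss_b, 1), round(ss_ab, 1), round(sse, 1))
```

In the Excel output that follows, factor A (marketing strategy) appears as "Columns", factor B (media) as "Sample", and SSE as "Within".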

Page 47: Analysis of Variance

F tests for the Two-way ANOVA

• Example – continued
  – Test of the difference in mean sales between the three marketing strategies

H0: μconv. = μquality = μprice

H1: At least two mean sales are different

Factor A: Marketing strategies (the “Columns” row of the output)

ANOVA
Source of Variation  SS        df  MS       F     P-value  F crit
Sample               13172.0   1   13172.0  1.42  0.2387   4.02
Columns              98838.6   2   49419.3  5.33  0.0077   3.17
Interaction          1609.6    2   804.8    0.09  0.9171   3.17
Within               501136.7  54  9280.3
Total                614757.0  59

Page 48: Analysis of Variance

F tests for the Two-way ANOVA

• Example – continued
  – Test of the difference in mean sales between the three marketing strategies

H0: μconv. = μquality = μprice

H1: At least two mean sales are different

F = MS(A)/MSE = MS(Marketing strategy)/MSE = 5.33

Fcritical = $F_{\alpha,\,a-1,\,n-ab} = F_{0.05,\,3-1,\,60-(3)(2)} = 3.17$ (p-value = .0077)

  – At the 5% significance level there is evidence to infer that differences in weekly sales exist among the marketing strategies.

Page 49: Analysis of Variance

F tests for the Two-way ANOVA

• Example – continued
  – Test of the difference in mean sales between the two advertising media

H0: μTV = μNewspaper

H1: The two mean sales differ

Factor B: Advertising media (the “Sample” row of the output)

ANOVA
Source of Variation  SS        df  MS       F     P-value  F crit
Sample               13172.0   1   13172.0  1.42  0.2387   4.02
Columns              98838.6   2   49419.3  5.33  0.0077   3.17
Interaction          1609.6    2   804.8    0.09  0.9171   3.17
Within               501136.7  54  9280.3
Total                614757.0  59

Page 50: Analysis of Variance

F tests for the Two-way ANOVA

• Example – continued
  – Test of the difference in mean sales between the two advertising media

H0: μTV = μNewspaper

H1: The two mean sales differ

F = MS(B)/MSE = MS(Media)/MSE = 1.42

Fcritical = $F_{\alpha,\,b-1,\,n-ab} = F_{0.05,\,2-1,\,60-(3)(2)} = 4.02$ (p-value = .2387)

  – At the 5% significance level there is insufficient evidence to infer that differences in weekly sales exist between the two advertising media.

Page 51: Analysis of Variance

F tests for the Two-way ANOVA

• Example – continued
  – Test for interaction between factors A and B

H0: μTV*conv. = μTV*quality = … = μnewsp.*price

H1: At least two means differ

Interaction AB = Marketing*Media (the “Interaction” row of the output)

ANOVA
Source of Variation  SS        df  MS       F     P-value  F crit
Sample               13172.0   1   13172.0  1.42  0.2387   4.02
Columns              98838.6   2   49419.3  5.33  0.0077   3.17
Interaction          1609.6    2   804.8    0.09  0.9171   3.17
Within               501136.7  54  9280.3
Total                614757.0  59

Page 52: Analysis of Variance

F tests for the Two-way ANOVA

• Example – continued
  – Test for interaction between factors A and B

H0: μTV*conv. = μTV*quality = … = μnewsp.*price

H1: At least two means differ

F = MS(AB)/MSE = MS(Marketing*Media)/MSE = .09

Fcritical = $F_{\alpha,\,(a-1)(b-1),\,n-ab} = F_{0.05,\,(3-1)(2-1),\,60-(3)(2)} = 3.17$ (p-value = .9171)

  – At the 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales.
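The three F tests of this example can be reproduced from the sums of squares in the ANOVA output. The sketch below takes the SS values and the 5% critical values from the slides as given (it does not compute critical values itself) and applies the mean-square and rejection-region definitions; the dictionary keys are illustrative labels.

```python
# Design sizes: strategy levels, media levels, total observations.
a, b, n = 3, 2, 60

# Sums of squares and degrees of freedom from the ANOVA output.
ss = {"A (strategy)": 98838.6, "B (media)": 13172.0, "AB (interaction)": 1609.6}
df = {"A (strategy)": a - 1, "B (media)": b - 1,
      "AB (interaction)": (a - 1) * (b - 1)}
sse, df_error = 501136.7, n - a * b

# Critical values at the 5% level, as quoted on the slides.
f_crit = {"A (strategy)": 3.17, "B (media)": 4.02, "AB (interaction)": 3.17}

mse = sse / df_error
f_stats = {name: (ss[name] / df[name]) / mse for name in ss}

for name, f in f_stats.items():
    verdict = "reject H0" if f > f_crit[name] else "do not reject H0"
    print(f"{name}: F = {f:.2f}, critical = {f_crit[name]}, {verdict}")
```

Only the marketing-strategy test exceeds its critical value, which matches the conclusions drawn on the three preceding slides.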

Page 53: Analysis of Variance

Jyothimon C, M.Tech Technology Management, University of Kerala

Send your feedback and queries to [email protected]