Agresti/Franklin Statistics, 1 of 82 Chapter 13 Comparing Groups: Analysis of Variance Methods Learn...


Chapter 13: Comparing Groups: Analysis of Variance Methods

Learn how to use statistical inference to compare several population means

Section 13.1

How Can We Compare Several Means?

One-Way ANOVA

Analysis of Variance

The analysis of variance method compares means of several groups

• Let g denote the number of groups

• Each group has a corresponding population of subjects

• The means of the response variable for the g populations are denoted by µ1, µ2, … µg

Hypotheses and Assumptions for the ANOVA Test

The analysis of variance is a significance test of the null hypothesis of equal population means:

• H0: µ1 = µ2 = …= µg

The alternative hypothesis is:

• Ha: At least two of the population means are unequal

The assumptions for the ANOVA test comparing population means are as follows:

1. The population distributions of the response variable for the g groups are normal with the same standard deviation for each group

2. Randomization:

• In a survey sample, independent random samples are selected from the g populations

• In an experiment, subjects are randomly assigned separately to the g groups

Example: How Long Will You Tolerate Being Put on Hold?

An airline has a toll-free telephone number for reservations

The airline hopes a caller remains on hold until the call is answered, so as not to lose a potential customer

The airline recently conducted a randomized experiment to analyze whether callers would remain on hold longer, on the average, if they heard:

• An advertisement about the airline and its current promotion

• Muzak (“elevator music”)

• Classical music

The company randomly selected one out of every 1000 calls in a week

For each call, they randomly selected one of the three recorded messages

They measured the number of minutes that the caller stayed on hold before hanging up (these calls were purposely not answered)

Denote the holding time means for the populations that these three random samples represent by:

• µ1 = mean for the advertisement

• µ2 = mean for the Muzak

• µ3 = mean for the classical music

The hypotheses for the ANOVA test are:

• H0: µ1=µ2=µ3

• Ha: at least two of the population means are different

Here is a display of the sample means:

As you can see from the graph on the previous page, the sample means are quite different

This alone is not sufficient evidence to enable us to reject H0

Variability between Groups and within Groups

The ANOVA method is used to compare population means

It is called analysis of variance because it uses evidence about two types of variability


Two examples of data sets with equal means but unequal variability:

Which case do you think gives stronger evidence against H0:µ1=µ2=µ3?

What is the difference between the data in these two cases?

In both cases the variability between pairs of means is the same

In ‘Case b’ the variability within each sample is much smaller than in ‘Case a.’

The fact that ‘Case b’ has less variability within each sample gives stronger evidence against H0
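The contrast between the two cases can be sketched numerically. The data below are hypothetical (they are not the data from the figure), and `one_way_f` is an illustrative helper name. Both data sets have the same three sample means, but the second has much less variability within each sample, so its F statistic is much larger.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df1, df2) for a list of samples, one list per group."""
    g = len(groups)
    n_total = sum(len(sample) for sample in groups)
    grand_mean = mean([y for sample in groups for y in sample])

    # Between-groups sum of squares: sample means versus the overall mean.
    between_ss = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups)
    # Within-groups sum of squares: observations versus their own sample mean.
    within_ss = sum((y - mean(s)) ** 2 for s in groups for y in s)

    ms_between = between_ss / (g - 1)
    ms_within = within_ss / (n_total - g)
    return ms_between / ms_within, g - 1, n_total - g


# Hypothetical data: both cases have sample means 5, 8 and 11.
case_a = [[1, 5, 9], [4, 8, 12], [7, 11, 15]]   # large spread within samples
case_b = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]   # small spread within samples

f_a, df1_a, df2_a = one_way_f(case_a)
f_b, df1_b, df2_b = one_way_f(case_b)
print(f"Case a: F = {f_a:.4f} with df1 = {df1_a}, df2 = {df2_a}")
print(f"Case b: F = {f_b:.4f} with df1 = {df1_b}, df2 = {df2_b}")
```

The numerator is identical in both cases; only the denominator changes, which is why the less variable data give the larger F.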

ANOVA F-Test Statistic

The analysis of variance (ANOVA) F-test statistic is:

$F = \dfrac{\text{Between-groups variability}}{\text{Within-groups variability}}$

The larger the variability between groups relative to the variability within groups, the larger the F test statistic tends to be


The test statistic for comparing means has the F-distribution

The larger the F-test statistic value, the stronger the evidence against H0

ANOVA F-test for Comparing Population Means of Several Groups

1. Assumptions: Independent random samples; normal population distributions with equal standard deviations

2. Hypotheses:

• H0: µ1 = µ2 = … = µg

• Ha: at least two of the population means are different

3. Test statistic:

$F = \dfrac{\text{Between-groups variability}}{\text{Within-groups variability}}$

The F sampling distribution has df1 = g – 1, df2 = N – g = (total sample size – no. of groups)

4. P-value: Right-tail probability above the observed F- value

5. Conclusion: If a decision is needed, reject H0 if P-value ≤ significance level (such as 0.05)

The Variance Estimates and the ANOVA Table

Let σ denote the standard deviation for each of the g population distributions

One assumption for the ANOVA F-test is that each population has the same standard deviation, σ

The F-test statistic is the ratio of two estimates of σ², the population variance for each group

The estimate of σ² in the denominator of the F-test statistic uses the variability within each group

The estimate of σ² in the numerator of the F-test statistic uses the variability between each sample mean and the overall mean for all the data

Computer software displays the two estimates of σ² in the ANOVA table

The MS column contains the two estimates, which are called mean squares

The ratio of the two mean squares is the F-test statistic

This F-statistic has a P-value
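For reference, the two mean squares are commonly written as below. The notation is an assumption added here, not taken from the slides: n_i, ȳ_i and s_i are the size, mean and standard deviation of sample i, ȳ is the overall mean of all the data, and N is the total sample size.

```latex
\[
\text{MS}_{\text{between}} = \frac{\sum_{i=1}^{g} n_i\,(\bar{y}_i - \bar{y})^2}{g - 1},
\qquad
\text{MS}_{\text{within}} = \frac{\sum_{i=1}^{g} (n_i - 1)\,s_i^2}{N - g},
\qquad
F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}
\]
```

The denominators g – 1 and N – g are the same df1 and df2 used for the F sampling distribution.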

Example: ANOVA for Customers’ Telephone Holding Times

This example is a continuation of a previous example in which an airline conducted a randomized experiment to analyze whether callers would remain on hold longer, on the average, if they heard:

• An advertisement about the airline and its current promotion

• Muzak (“elevator music”)

• Classical music

Denote the holding time means for the populations that these three random samples represent by:

• µ1 = mean for the advertisement

• µ2 = mean for the Muzak

• µ3 = mean for the classical music

The hypotheses for the ANOVA test are:

• H0: µ1 = µ2 = µ3

• Ha: at least two of the population means are different

Since P-value < 0.05, there is sufficient evidence to reject H0:µ1=µ2=µ3

We conclude that a difference exists among the three types of messages in the population mean amount of time that customers are willing to remain on hold

Assumptions for the ANOVA F-Test and the Effects of Violating Them

1. Population distributions are normal

• Moderate violations of the normality assumption are not serious

2. These distributions have the same standard deviation, σ

• Moderate violations are also not serious

You can construct box plots or dot plots for the sample data distributions to check for extreme violations of normality

Misleading results may occur with the F-test if the distributions are highly skewed and the sample size N is small

Misleading results may also occur with the F-test if there are relatively large differences among the standard deviations (the largest sample standard deviation being more than double the smallest one)
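The rule of thumb about standard deviations is easy to check before running the F-test. The sketch below uses hypothetical samples, and `sd_ratio` is an illustrative helper name; it compares the largest sample standard deviation with the smallest and flags the data when the ratio exceeds 2.

```python
from statistics import stdev


def sd_ratio(groups):
    """Return the largest sample standard deviation divided by the smallest."""
    sds = [stdev(sample) for sample in groups]
    return max(sds) / min(sds)


# Hypothetical samples with sample standard deviations 2, 4 and 1.
samples = [[2, 4, 6], [1, 5, 9], [3, 4, 5]]

ratio = sd_ratio(samples)
equal_sd_doubtful = ratio > 2  # rule of thumb: largest more than double the smallest

print(f"Largest/smallest standard deviation ratio: {ratio:.2f}")
print("Equal-standard-deviation assumption doubtful:", equal_sd_doubtful)
```

This is only a screening step; box plots or dot plots of each sample are still needed to judge normality.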

The 1998 General Social Survey asked subjects how many friends they have. Is this associated with the respondent’s astrological sign (the 12 symbols of the Zodiac)? The ANOVA table for the data reports F = 0.61

State the null hypothesis

a. The population means for all 12 Zodiac signs are the same.

b. At least two population means are different.

State the alternative hypothesis

a. The population means for all 12 Zodiac signs are the same.

b. At least two population means are different.

Based on what you know about the F distribution would you guess that the test value of 0.61 provides strong evidence against the null hypothesis?

a. No

b. Yes

The P-value associated with the F-statistic is 0.82. At a significance level of 0.05, what is the correct decision?

a. Reject H0

b. Fail to Reject H0

c. Reject Ha

d. Fail to Reject Ha

Section 13.2

How Should We Follow Up an ANOVA F-Test?

Follow Up to an ANOVA F-Test

When an analysis of variance F-test has a small P-value, the test does not specify which means are different or how different they are

We can estimate differences between population means with confidence intervals

Confidence Intervals Comparing Pairs of Means

For two groups i and j, with sample means ȳi and ȳj having sample sizes ni and nj, the 95% confidence interval for µi – µj is:

$(\bar{y}_i - \bar{y}_j) \pm t_{.025}\, s \sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}$

The t-score has df = N – g


Some software refers to this confidence method as the Fisher method

When the confidence interval does not contain 0, we can infer that the population means are different

The interval shows just how different the means may be
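A minimal sketch of this interval in code follows. The function name `fisher_ci` and all the input values are hypothetical; the t-score is passed in as an argument because it must be looked up (from software or a table) for df = N – g, and s is the estimate of σ taken from the ANOVA table.

```python
from math import sqrt


def fisher_ci(mean_i, mean_j, n_i, n_j, s, t_score):
    """Confidence interval for mu_i - mu_j using the pooled estimate s."""
    diff = mean_i - mean_j
    margin = t_score * s * sqrt(1 / n_i + 1 / n_j)
    return diff - margin, diff + margin


# Hypothetical inputs: three groups of 50 subjects, so df = 150 - 3 = 147.
# The t-score 1.976 is an approximate t_.025 value for that df, supplied by hand.
low, high = fisher_ci(mean_i=12.0, mean_j=9.0, n_i=50, n_j=50, s=5.0, t_score=1.976)

print(f"95% CI for the difference in means: ({low:.3f}, {high:.3f})")
print("Interval excludes 0:", not (low <= 0 <= high))
```

Because this hypothetical interval lies entirely above 0, it would suggest that the first population mean is larger than the second.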

Example: Number of Good Friends and Happiness

A recent GSS study asked: “About how many good friends do you have?”

The study also asked each respondent to indicate whether they were ‘very happy,’ ‘pretty happy,’ or ‘not too happy’

Let the response variable y = number of good friends

Let the categorical explanatory variable x = happiness level

Construct a 95% CI to compare the population mean number of good friends for the two categories: ‘very happy’ and ‘pretty happy.’

95% CI formula:

$(\bar{y}_i - \bar{y}_j) \pm t_{.025}\, s \sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}$

First, use the output to find s:

$s = \sqrt{\text{MS error}} = \sqrt{234.2} = 15.3$

Use software or a table to find the t-value of 1.963

Next calculate the interval:

$(10.4 - 7.4) \pm 1.963(15.3)\sqrt{\dfrac{1}{n_i} + \dfrac{1}{468}} = 3.0 \pm 2.3$, or $(0.7, 5.3)$

Since the CI contains only positive numbers, this suggests that, on average, people who are very happy have more good friends than people who are pretty happy

Controlling Overall Confidence with Many Confidence Intervals

The confidence interval method just discussed is mainly used when g is small or when only a few comparisons are of main interest

The confidence level of 0.95 applies to any particular confidence interval that we construct

How can we construct the intervals so that the 95% confidence extends to the entire set of intervals rather than to each single interval?

Methods that control the probability that all confidence intervals will contain the true differences in means are called multiple comparison methods

The method that we will use is called the Tukey method

It is designed to give overall confidence level very close to the desired value (such as 0.95)

This method is available in most software packages
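A rough calculation shows why separate 95% intervals do not give 95% confidence for the whole set. The sketch below counts the pairwise comparisons among g groups and, under the simplifying assumption that the intervals are independent (they are not, since they share data), multiplies the individual confidence levels. It illustrates the problem only; it is not the Tukey method, and the function names are hypothetical.

```python
from math import comb


def n_pairwise(g):
    """Number of pairwise comparisons among g group means."""
    return comb(g, 2)


def overall_confidence_if_independent(level, k):
    """Chance that all k intervals cover, ASSUMING independent intervals."""
    return level ** k


for g in (3, 5, 10):
    k = n_pairwise(g)
    overall = overall_confidence_if_independent(0.95, k)
    print(f"g = {g:2d}: {k:2d} comparisons, overall confidence about {overall:.3f}")
```

The overall figure drops quickly as g grows, which is the motivation for multiple comparison methods.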

Example: Multiple Comparison Intervals for Number of Good Friends

Section 13.3

What if There Are Two Factors?

Two-Way ANOVA

One-way ANOVA is a bivariate method:

• It has a quantitative response variable

• It has one categorical explanatory variable

Two-way ANOVA is a multivariate method:

• It has a quantitative response variable

• It has two categorical explanatory variables

Example: How Does Corn Yield Depend on Amounts of Fertilizer and Manure?

A recent study at Iowa State University:

• A field was portioned into 20 equal-size plots

• Each plot was planted with the same amount of corn seed

• The goal was to study how the yield of corn later harvested depended on the levels of use of nitrogen-based fertilizer and manure

• Each factor (fertilizer and manure) was measured in a binary manner

What are the four treatments you can compare with this experiment?
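The treatments are the combinations of the levels of the two factors. A short sketch that enumerates them, assuming each factor is labeled "low" or "high" (the labels used in the hypotheses later in this section):

```python
from itertools import product

levels = ("low", "high")  # assumed labels for the two levels of each factor

# Each treatment pairs one fertilizer level with one manure level.
treatments = list(product(levels, repeat=2))

for fertilizer, manure in treatments:
    print(f"fertilizer = {fertilizer:4s}  manure = {manure}")
```

Two factors with two levels each give 2 × 2 = 4 treatments.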

Inference about Effects in Two-Way ANOVA

In two-way ANOVA, a null hypothesis states that the population means are the same in each category of one factor, at each fixed level of the other factor

We could test:

H0: Mean corn yield is equal for plots at the low and high levels of fertilizer, for each fixed level of manure

We could also test:

H0: Mean corn yield is equal for plots at the low and high levels of manure, for each fixed level of fertilizer

The effects of individual factors tested with the two null hypotheses (the previous two pages) are called main effects

Assumptions for the Two-way ANOVA F-test

1. The population distribution for each group is normal

2. The population standard deviations are identical

3. The data result from a random sample or randomized experiment

F-test Statistics in Two-way ANOVA

For testing the main effect for a factor, the test statistic is the ratio of mean squares:

$F = \dfrac{\text{MS for the factor}}{\text{MS error}}$


When the null hypothesis of equal population means for the factor is true, the F-test statistic values tend to fluctuate around 1

When it is false, they tend to be larger

The P-value is the right-tail probability above the observed F-value

Example: Testing the Main Effects for Corn Yield

Data and sample statistics for each group:

Output from Two-way ANOVA:

First consider the hypothesis:

H0: Mean corn yield is equal for plots at the low and high levels of fertilizer, for each fixed level of manure

From the output, you can obtain the F-test statistic of 6.33 with its corresponding P-value of 0.022

The small P-value indicates strong evidence that the mean corn yield depends on fertilizer level

Next consider the hypothesis:

H0: Mean corn yield is equal for plots at the low and high levels of manure, for each fixed level of fertilizer

From the output, you can obtain the F-test statistic of 6.88 with its corresponding P-value of 0.018

The small P-value indicates strong evidence that the mean corn yield depends on manure level

Exploring Interaction between Factors in Two-Way ANOVA

No interaction between two factors means that the effect of either factor on the response variable is the same at each category of the other factor
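This definition can be checked directly on a table of cell means. The sketch below uses hypothetical cell means keyed by (fertilizer level, manure level); `fertilizer_effects` is an illustrative helper that computes the fertilizer effect (high minus low) at each manure level. Equal effects mean no interaction in the sample means; unequal effects suggest interaction.

```python
def fertilizer_effects(cell_means):
    """Map each manure level to (high-fertilizer mean minus low-fertilizer mean)."""
    manure_levels = sorted({manure for _, manure in cell_means})
    return {
        m: cell_means[("high", m)] - cell_means[("low", m)]
        for m in manure_levels
    }


# Hypothetical cell means, keyed by (fertilizer, manure).
no_interaction = {
    ("low", "low"): 10.0, ("high", "low"): 14.0,
    ("low", "high"): 13.0, ("high", "high"): 17.0,
}
with_interaction = {
    ("low", "low"): 10.0, ("high", "low"): 14.0,
    ("low", "high"): 13.0, ("high", "high"): 22.0,
}

effects_a = fertilizer_effects(no_interaction)
effects_b = fertilizer_effects(with_interaction)
print("No interaction:  ", effects_a)
print("With interaction:", effects_b)
```

In the first table the fertilizer effect is the same at both manure levels; in the second it is larger when manure is high, which is what a plot of non-parallel lines would show.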

A graph showing interaction:

Testing for Interaction

In conducting a two-way ANOVA, before testing the main effects, it is customary to test a third null hypothesis stating that there is no interaction between the factors in their effects on the response


The test statistic providing the sample evidence of interaction is:

$F = \dfrac{\text{MS for interaction}}{\text{MS error}}$

When H0 is false, the F-statistic tends to be large
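The mean squares behind these F statistics can be computed by hand for a balanced design (the same number of observations in every cell). The sketch below is a minimal illustration under that assumption, using hypothetical yields rather than the corn data; `two_way_f` is an illustrative name, and it fits the model that allows interaction, so all three F statistics share the same MS error.

```python
from statistics import mean


def two_way_f(cells):
    """F statistics for a balanced two-way layout; cells maps (a, b) -> observations."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))          # observations per cell (balanced)
    all_obs = [y for obs in cells.values() for y in obs]
    grand = mean(all_obs)

    cell_mean = {key: mean(obs) for key, obs in cells.items()}
    a_mean = {a: mean([y for b in b_levels for y in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([y for a in a_levels for y in cells[(a, b)]]) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_error = sum((y - cell_mean[key]) ** 2 for key, obs in cells.items() for y in obs)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_error = len(all_obs) - len(a_levels) * len(b_levels)
    ms_error = ss_error / df_error
    return {
        "factor A": (ss_a / df_a) / ms_error,
        "factor B": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / (df_a * df_b)) / ms_error,
    }


# Hypothetical yields, keyed by (fertilizer, manure), two plots per cell.
yields = {
    ("low", "low"): [10, 12], ("high", "low"): [14, 16],
    ("low", "high"): [13, 15], ("high", "high"): [20, 22],
}
f_stats = two_way_f(yields)
for effect, f_value in f_stats.items():
    print(f"{effect}: F = {f_value:.2f}")
```

Each F value would then be compared with the F distribution to get a right-tail P-value, starting with the interaction term.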

Example: Testing for Interaction with Corn Yield Data

ANOVA table for a model that allows interaction:


The test statistic for H0: no interaction is

• F = 1.10 with a corresponding P-value of 0.311

• There is not much evidence of interaction

• We would not reject H0 at the usual significance levels, such as 0.05

Check Interaction before Main Effects

In practice, in two-way ANOVA, you should first test the hypothesis of no interaction

If the evidence of interaction is not strong (that is, if the P-value is not small), then test the main effects hypotheses and/or construct confidence intervals for those effects

If important evidence of interaction exists, plot and compare the cell means for a factor separately at each category of the other factor

An experiment randomly assigns 100 subjects suffering from high cholesterol to one of four groups: low-dose Lipitor, high-dose Lipitor, low-dose Pravachol and high-dose Pravachol. After three months of treatment, the change in cholesterol level is measured.

What is the response variable?

a. Cholesterol level

b. Drug dosage

c. Drug type


What are the factors?

a. Cholesterol level and drug type

b. Drug dosage and cholesterol level

c. Drug type and drug dosage
