Experimental Design STAT E-150 Statistical Methods.


2

The design of an experiment typically takes place before the data is collected. It is the design of the experiment that determines the model.

Some basic vocabulary:

• The individuals used in the experiment are called experimental units. When the units are human beings, they are called subjects.

• A specific experimental condition applied to the units is called a treatment.

• The explanatory variables in the experiment are often called factors. A specific value of a factor is called a level of the factor.

• A response variable is a measure of the outcome of the experiment.

3

The design of an experiment first describes the response variable(s), the factor(s), and the layout of the treatments.

Here is an example: What are the effects of repeated exposure to an advertising message? In an experiment designed to investigate this question, undergraduate students viewed a 40-minute television program that included ads for a digital camera. Some students saw a 30-second commercial; others saw a 90-second version. The same commercial was shown either one, three, or five times during the program. After viewing, all of the students answered questions about their recall of the ad, their attitude toward the digital camera, and their intention to purchase it.

Who were the subjects?
The subjects were undergraduate students.

What are the factors of the experiment?
The factors are the length of the commercial and the number of repetitions.

What are the levels of each factor?
The length of the commercial has two levels: 30 seconds and 90 seconds. The number of repetitions has three levels: 1, 3, and 5 repetitions.

What is (are) the response variable(s)?
The response variables are the recall of the ad, the attitude toward the camera, and the intention to purchase the camera.

12

Here is a diagram of this experimental design:

                      Factor B: Number of Repetitions
Factor A: Length      1 time      3 times      5 times
30 sec.               1           2            3
90 sec.               4           5            6
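The six numbered cells in this layout are the treatments, one for each combination of a level of Factor A with a level of Factor B. A minimal sketch of that crossing follows; the variable names are chosen here for illustration and are not part of the original study.

```python
from itertools import product

# Factor levels taken from the advertising example.
lengths = ["30 sec.", "90 sec."]   # Factor A: length of the commercial
repetitions = [1, 3, 5]            # Factor B: number of repetitions

# Each treatment is one combination of a level of A with a level of B.
# product() varies the last factor fastest, which matches the row-by-row
# numbering of the cells in the diagram.
treatments = {
    number: combo
    for number, combo in enumerate(product(lengths, repetitions), start=1)
}

for number, (length, reps) in treatments.items():
    print(f"Treatment {number}: {length} commercial shown {reps} time(s)")
```

With two levels of one factor and three of the other, the design has 2 x 3 = 6 treatments.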

13

The second step of the design is to define how experimental units will be assigned to treatments. Comparison of the effects of treatments can only be valid if the treatments are applied to similar groups of experimental units.

This requires at least two groups; one is often a control group.

14

The most important principle of statistical design is randomization, which involves using chance to select the sample from the population and to assign subjects to treatments.

This assignment, then, does not depend on any characteristic of the subjects and does not rely on the judgment of the experimenter in any way. This protects against bias that favors certain outcomes, allows us to draw cause-and-effect conclusions, and provides a justification for using a probability model.

15

Does talking on a hands-free cell phone distract drivers? Undergraduate students "drove" in a driving simulator equipped with a hands-free cell phone. How quickly does the student respond when the car ahead brakes suddenly? Twenty students (the control group) simply drove. Another twenty students (the experimental group) talked on the phone while driving.
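Random allocation of the forty students to the two groups could be sketched as follows. This is only an illustration: the student labels, the function name randomly_allocate, and the seed are assumptions made here, not details reported for the study.

```python
import random

def randomly_allocate(subjects, group_sizes, seed=None):
    """Shuffle the subjects by chance alone, then split them into groups."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups, start = [], 0
    for size in group_sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups

# Hypothetical labels for the forty students in the simulator study.
students = [f"student_{i:02d}" for i in range(1, 41)]
control, experimental = randomly_allocate(students, [20, 20], seed=150)

print("Drive only:    ", sorted(control))
print("Drive and talk:", sorted(experimental))
```

Because the shuffle ignores every characteristic of the students, neither the subjects nor the experimenter can influence who ends up in which group.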

16

Here is a diagram of the design of this experiment:

                      +--> Group 1 (20 students) --> Treatment 1: Drive          --+
Random allocation ----+                                                            +--> Compare brake time
                      +--> Group 2 (20 students) --> Treatment 2: Drive and talk --+

Randomization produces groups of subjects that we expect to be similar in all respects before the treatment is applied.

17

Comparative design (comparing the effects of several treatments) helps ensure that influences other than the treatment (in this case, the cell phone) operate equally on all groups.

Any differences, then, must be due either to the treatment or to chance (i.e. the random assignment).

Recall that a small p-value is an indication that it is unlikely that the results we see are due to chance alone.

18

Randomization F-Test (One-Way ANOVA)

1. Observed F: Compute the value of F as usual (FObs).
2. Randomization distribution:
   a. Rerandomize: create a random reordering of the data.
   b. Compute the value of F for the rerandomized data (FRand).
   c. Repeat this many times and record the values of FRand.
3. Find the proportion of values of FRand that are greater than or equal to FObs.

This proportion is an estimate of the p-value.
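The randomization F-test steps can be sketched in a few lines of standard-library Python. The data below are made up purely for illustration, and the function names, the number of repetitions, and the seed are assumptions of this sketch.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def randomization_f_test(groups, reps=2000, seed=0):
    """Estimate the p-value as the share of FRand values >= FObs."""
    rng = random.Random(seed)
    f_obs = f_statistic(groups)                 # step 1: observed F
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(reps):                       # step 2c: repeat many times
        rng.shuffle(pooled)                     # step 2a: rerandomize
        regrouped, start = [], 0
        for size in sizes:
            regrouped.append(pooled[start:start + size])
            start += size
        if f_statistic(regrouped) >= f_obs:     # step 2b, then compare
            count += 1
    return f_obs, count / reps                  # step 3: proportion

# Hypothetical responses for three treatment groups.
groups = [[4.1, 5.0, 4.7, 5.3], [6.2, 6.8, 5.9, 7.1], [8.0, 7.4, 8.8, 7.9]]
f_obs, p_value = randomization_f_test(groups, reps=2000, seed=1)
print(f"FObs = {f_obs:.2f}, estimated p-value = {p_value:.4f}")
```

The estimate depends on the number of rerandomizations; more repetitions give a more stable proportion.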

Both the F-test and the Randomization F-test use a quantitative response variable and one categorical predictor, and both test whether the observed differences in group means are too large to be due to chance alone.

19

Repeated Measures Analysis of Variance

In this analysis, the groups are not independent. Instead, we have one sample of subjects and take multiple, or repeated, measurements on each subject.

20

Here’s an example:

Consider this data for four subjects over three treatments, where the response variable is number of trials to success on some task:

              Treatment
Subject       1       2       3       Mean
1             2       4       7       4.33
2             10      12      13      11.67
3             22      29      30      27.00
4             30      31      34      31.67
Mean          16      19      21      18.67

If you look at the treatment means, you see a very slight difference. But look also at the subject means: It is apparent that Subject 1 learns quickly under all conditions, and that Subjects 3 and 4 learn very slowly. These differences among the subjects are responsible for the differences within treatments, but they really have nothing to do with the treatment effect.
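To see how much the subject differences matter, the variability in the four-subject table can be partitioned by hand. The sketch below uses the standard repeated measures decomposition (total = treatments + subjects + error) and, for contrast, the one-way ANOVA that ignores subjects; the variable names are chosen here for illustration.

```python
from statistics import mean

# Rows are subjects, columns are treatments (trials to success).
data = [
    [2, 4, 7],
    [10, 12, 13],
    [22, 29, 30],
    [30, 31, 34],
]
n_subjects, n_treatments = len(data), len(data[0])

grand_mean = mean(x for row in data for x in row)
subject_means = [mean(row) for row in data]
treatment_means = [mean(col) for col in zip(*data)]

ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
ss_treatments = n_subjects * sum((m - grand_mean) ** 2 for m in treatment_means)
ss_subjects = n_treatments * sum((m - grand_mean) ** 2 for m in subject_means)
ss_error = ss_total - ss_treatments - ss_subjects   # what is left over

# Repeated measures F: subject variability is removed from the error term.
df_treatments = n_treatments - 1
df_error = (n_subjects - 1) * (n_treatments - 1)
f_repeated = (ss_treatments / df_treatments) / (ss_error / df_error)

# One-way F for comparison: subject variability stays in the error term.
ss_within = ss_total - ss_treatments
df_within = n_subjects * n_treatments - n_treatments
f_one_way = (ss_treatments / df_treatments) / (ss_within / df_within)

print(f"SS subjects = {ss_subjects:.3f}, SS error = {ss_error:.3f}")
print(f"F (repeated measures) = {f_repeated:.2f}")
print(f"F (one-way, ignoring subjects) = {f_one_way:.2f}")
```

For this table the repeated measures F is about 11.4, while the one-way F that leaves subject differences in the error term is about 0.15, which is why the subject effect is separated out.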

21

A repeated measure is a multivariate response in which the same variable is measured on each subject several times, possibly under different conditions. The hypotheses are the same as they are for a one-way ANOVA. The analysis is also the same: first a test for the equality of the means, and then appropriate tests to investigate any differences that are found.

22

The assumptions are:

1. Independent observations within each treatment
2. Normal populations within each treatment
3. Equal population variances within each treatment
4. Sphericity

Note that there is an additional assumption of sphericity, which assumes that the pairwise differences among treatment levels have equal variances. Mauchly's test for sphericity is commonly used to test this assumption.
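What sphericity refers to can be made concrete by computing the sample variance of the differences for each pair of treatment levels. The sketch below does this for the four-subject table shown earlier; it is descriptive only and is not Mauchly's test, and the variable names are assumptions of this illustration.

```python
from itertools import combinations
from statistics import variance

# Rows are subjects, columns are treatments (four-subject example table).
data = [
    [2, 4, 7],
    [10, 12, 13],
    [22, 29, 30],
    [30, 31, 34],
]
columns = list(zip(*data))   # one tuple of scores per treatment

# Sample variance of the within-subject differences for every pair of levels.
diff_variances = {}
for (i, a), (j, b) in combinations(enumerate(columns, start=1), 2):
    differences = [x - y for x, y in zip(a, b)]
    diff_variances[(i, j)] = variance(differences)

for (i, j), v in diff_variances.items():
    print(f"Var(treatment {i} - treatment {j}) = {v:.3f}")
```

Sphericity holds in the population when these variances are all equal; with only four subjects, the sample values here say little either way.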

23

Example 1: A study was conducted to examine whether the anxiety a person experiences affects performance on a learning task. Subjects with varying levels of anxiety performed a learning task across a series of four trials, and the number of errors made was recorded. Does the number of errors made by subjects change significantly across the four learning trials?

H0: μtrial1 = μtrial2 = μtrial3 = μtrial4

Ha: the means are not all equal

24

We will use statistical tests to assess whether the assumptions are met. First, we can use the Kolmogorov-Smirnov results in the Tests of Normality table to assess normality as we did earlier; since all p-values are greater than .05, we do not reject the null hypothesis of normality.

Tests of Normality

           Kolmogorov-Smirnov(a)          Shapiro-Wilk
           Statistic   df   Sig.          Statistic   df   Sig.
Trial 1    .238        12   .060          .890        12   .117
Trial 2    .169        12   .200*         .940        12   .495
Trial 3    .182        12   .200*         .947        12   .595
Trial 4    .201        12   .193          .885        12   .100

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

25

We will use Mauchly's test to test the hypothesis that the variances of the differences between conditions are equal. Since p = .112, we cannot reject the null hypothesis: there is no significant evidence that the variances of the differences are unequal, so we treat the condition of sphericity as met.

Since the conditions for the test have been met, we can continue.

Mauchly's Test of Sphericity(b)

Measure: MEASURE_1

                                                                Epsilon(a)
Within Subjects   Mauchly's   Approx.                           Greenhouse-   Huynh-   Lower-
Effect            W           Chi-Square   df     Sig.          Geisser       Feldt    bound
factor1           .398        8.957        5      .112          .622          .744     .333

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.

a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept
   Within Subjects Design: factor1

26

We will consider the results in the first row of this table, where sphericity is assumed.

The results (F = 127.561, p < .001; SPSS displays this as .000) indicate that we can reject the null hypothesis and conclude that the number of errors made by subjects did change significantly across the four learning trials.

Tests of Within-Subjects Effects

Measure: MEASURE_1

Source                                Type III Sum of Squares   df       Mean Square   F         Sig.
factor1          Sphericity Assumed   991.500                   3        330.500       127.561   .000
                 Greenhouse-Geisser   991.500                   1.865    531.709       127.561   .000
                 Huynh-Feldt          991.500                   2.231    444.504       127.561   .000
                 Lower-bound          991.500                   1.000    991.500       127.561   .000
Error(factor1)   Sphericity Assumed   85.500                    33       2.591
                 Greenhouse-Geisser   85.500                    20.512   4.168
                 Huynh-Feldt          85.500                    24.536   3.485
                 Lower-bound          85.500                    11.000   7.773
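The fractional degrees of freedom in the corrected rows come from multiplying the sphericity-assumed degrees of freedom by the corresponding epsilon. A minimal sketch of that adjustment for Example 1 (four trials, twelve subjects) follows; the function name is an assumption of this illustration, and because the epsilons are taken from the table rounded to three decimals, the results can differ from the SPSS output in the last digits.

```python
def adjusted_df(epsilon, n_levels, n_subjects):
    """Scale the within-subjects degrees of freedom by an epsilon correction."""
    df_effect = epsilon * (n_levels - 1)
    df_error = epsilon * (n_levels - 1) * (n_subjects - 1)
    return df_effect, df_error

# Epsilon values as printed (rounded) in Mauchly's table for Example 1.
epsilons = {
    "Sphericity Assumed": 1.0,
    "Greenhouse-Geisser": 0.622,
    "Huynh-Feldt": 0.744,
    "Lower-bound": 1 / 3,      # the lower bound is 1 / (number of levels - 1)
}

for name, eps in epsilons.items():
    df_effect, df_error = adjusted_df(eps, n_levels=4, n_subjects=12)
    print(f"{name:20s} df = {df_effect:.3f}, error df = {df_error:.3f}")
```

The F statistic is unchanged by the correction; only the degrees of freedom used to find the p-value shrink.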

27

Example 2: This data was collected in a study investigating factors associated with the risk of developing high blood pressure, or hypertension. The subjects were all college students. We would like to see if there is evidence that diastolic blood pressure changes significantly during three different conditions: resting, doing mental arithmetic, and immersing a hand in cold water. The null hypothesis is that blood pressure does not change under these stressors:

H0: μrest = μarith = μcold

Ha: the means are not all equal

28

First we can see what the descriptive statistics for this data suggest. It appears that the means are different for the three stressors, but we will need to analyze the data to see if the difference is significant.

Descriptive Statistics

                                  Mean       Std. Deviation   N
diastolic bp rest                 67.6457    7.45186          175
diastolic bp mental arithmetic    73.7771    9.42046          175
diastolic bp cold pressor         77.9886    14.63816         175

29

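Next we will check the conditions for this test. We can use the Kolmogorov-Smirnov results in the Tests of Normality table to assess normality as we did earlier; since p = .200 for each condition, we do not reject the null hypothesis of normality. (The Shapiro-Wilk result for the cold pressor condition, p = .017, is less reassuring.)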

Tests of Normality

                                  Kolmogorov-Smirnov(a)          Shapiro-Wilk
                                  Statistic   df    Sig.         Statistic   df    Sig.
diastolic bp rest                 .046        175   .200*        .992        175   .460
diastolic bp mental arithmetic    .036        175   .200*        .997        175   .985
diastolic bp cold pressor         .056        175   .200*        .981        175   .017

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

30

The results of Mauchly's test, however, are significant (p < .001), so we conclude that the condition of sphericity is not met. We can still continue with this analysis when the assumption of sphericity is violated, by using tests that adjust the degrees of freedom.

Mauchly's Test of Sphericity(b)

Measure: MEASURE_1

                                                                Epsilon(a)
Within Subjects   Mauchly's   Approx.                           Greenhouse-   Huynh-   Lower-
Effect            W           Chi-Square   df     Sig.          Geisser       Feldt    bound
stressor          .654        73.415       2      .000          .743          .748     .500

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.

a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept
   Within Subjects Design: stressor

31

The table below includes additional information that can be used in this situation:

You can use the results in the Sphericity Assumed rows if the sphericity assumption is met, as we did in the first example; however, this is not the case in this analysis. The remaining rows show results for other tests where adjustments to the degrees of freedom have been made because the sphericity assumption was violated.

Tests of Within-Subjects Effects

Measure: MEASURE_1

Source                                 Type III Sum of Squares   df        Mean Square   F        Sig.
stressor          Sphericity Assumed   9467.806                  2         4733.903      73.754   .000
                  Huynh-Feldt          9467.806                  1.496     6329.115      73.754   .000
                  Lower-bound          9467.806                  1.000     9467.806      73.754   .000
Error(stressor)   Sphericity Assumed   22336.396                 348       64.185
                  Greenhouse-Geisser   22336.396                 258.580   86.381
                  Huynh-Feldt          22336.396                 260.289   85.814
                  Lower-bound          22336.396                 174.000   128.370

32

The Greenhouse-Geisser test is commonly used in this situation. Those results are significant (F = 73.754, p < .001), so we reject the null hypothesis and conclude that diastolic blood pressure changes significantly across the mental and physical stressors investigated in this study.

33

But where are these differences? We can compare the means using the Bonferroni method:

These results indicate that every pair of stressors differs significantly. All p-values are reported as .000, that is, p < .001.

Pairwise Comparisons

Measure: MEASURE_1

                              Mean Difference                          95% Confidence Interval for Difference(a)
(I) stressor   (J) stressor   (I-J)             Std. Error   Sig.(a)   Lower Bound   Upper Bound
1              2              -6.131*           .551         .000      -7.463        -4.800
               3              -10.343*          .960         .000      -12.664       -8.022
2              1              6.131*            .551         .000      4.800         7.463
               3              -4.211*           .988         .000      -6.599        -1.824
3              1              10.343*           .960         .000      8.022         12.664
               2              4.211*            .988         .000      1.824         6.599

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Bonferroni.
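The Bonferroni adjustment itself is simple: with m comparisons, each unadjusted p-value is multiplied by m (and capped at 1) before being compared with .05. The sketch below lists the three pairwise comparisons among the stressors and applies the adjustment to made-up p-values; those p-values and the function name are hypothetical and are not taken from the SPSS output.

```python
from itertools import combinations

def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

stressors = ["rest", "mental arithmetic", "cold pressor"]
pairs = list(combinations(stressors, 2))   # 3 levels give 3 pairwise comparisons

# Hypothetical unadjusted p-values, one per pair, for illustration only.
raw_p_values = [0.010, 0.020, 0.500]
adjusted = bonferroni_adjust(raw_p_values)

for (a, b), raw, adj in zip(pairs, raw_p_values, adjusted):
    print(f"{a} vs {b}: unadjusted p = {raw:.3f}, Bonferroni p = {adj:.3f}")
```

The adjustment keeps the overall chance of a false positive across all the comparisons at or below the chosen level, at the cost of making each individual comparison more conservative.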

34

You may also choose to create a graph that illustrates and supports the analysis. This graph supports the conclusion that the means of the three groups are different:

35

Another way to investigate differences between pairs of groups is by performing appropriate paired-sample t-tests. For example, you can compare the subjects' diastolic blood pressure in two of the conditions: resting, and after immersing a hand in cold water. Note that in this test, we are testing the mean difference between the groups, not the difference of the means. Paired sample t-tests are used to compare two means when the samples are not independent, so that a value in one sample can be paired with a corresponding value in the second sample. For example, a subject's resting diastolic blood pressure can be paired with the same subject's diastolic blood pressure after submerging a hand in cold water.

36

Assumptions and Conditions

Paired data assumption: The data must be paired.

Independence assumption: The groups are not independent, but the differences are independent.

Randomization condition: Pairs are randomly chosen, or treatments are assigned randomly to the members of each pair.

Normal population assumption: The population of differences is nearly normal. (Check with a histogram and/or a Normal probability plot of the differences.)

37

We test the hypothesis H0: μd = Δ0, where the d’s are pairwise differences and Δ0 is almost always 0. The test statistic is t = (d̄ - Δ0) / (s_d / √n), where d̄ is the mean of the differences and s_d is their standard deviation.

When the conditions are met and the null hypothesis is true, we can model the sampling distribution of this statistic with a Student’s t-model with n - 1 degrees of freedom, where n is the number of pairs.

38

H0: μd = 0
Ha: μd ≠ 0

Here are the results:

We will reject the null hypothesis (t = -10.590, p < .001) and conclude that there is a difference between resting diastolic blood pressure and diastolic blood pressure after immersing a hand in cold water.

Paired Samples Test

Pair 1: diastolic bp rest - diastolic bp cold pressor

          Paired Differences
                                                         95% Confidence Interval
                                                         of the Difference
          Mean        Std. Deviation   Std. Error Mean   Lower       Upper       t         df    Sig. (2-tailed)
Pair 1    -10.20251   12.88920         .96338            -12.10364   -8.30139    -10.590   178   .000
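The standard error and t value in the Paired Samples Test output can be reproduced from the summary statistics alone, using the usual paired t statistic (mean difference divided by its standard error). The sketch below assumes 179 pairs, which is inferred from the reported 178 degrees of freedom; the function name is chosen here for illustration.

```python
from math import sqrt

def paired_t_from_summary(mean_diff, sd_diff, n_pairs, null_diff=0.0):
    """Return (standard error, t statistic, degrees of freedom)."""
    standard_error = sd_diff / sqrt(n_pairs)
    t = (mean_diff - null_diff) / standard_error
    return standard_error, t, n_pairs - 1

# Mean and standard deviation of the differences (rest minus cold pressor).
se, t, df = paired_t_from_summary(mean_diff=-10.20251,
                                  sd_diff=12.88920,
                                  n_pairs=179)

print(f"Std. Error Mean = {se:.5f}")
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The negative sign of t reflects only the order of subtraction: resting pressure is lower than pressure during the cold pressor condition.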

39

SPSS Instructions for Repeated Measures ANOVA

1. Choose > Analyze > Descriptive Statistics > Explore

In the Explore dialog box, choose the columns you wish to include in your analysis, and choose Plots as the display. Then click on Plots.

40

In the Explore Plots dialog box, select Normality plots with tests, deselect Stem-and-leaf and select None under Boxplots.

Click on Continue and then click on OK in the main dialog box.

41

2. Choose > Analyze > General Linear Model > Repeated Measures

Enter an appropriate Within-Subject Factor name; in this case it is stressor. Type in the number of levels, then click Add. You will see the results in the next box.

42


Now click on Define. You will see the Repeated Measures dialog box like the one shown on the left. Select the columns containing data for the factor levels in order by highlighting the column name and clicking on the arrow that will move the name to the list of variables. Then click on Plots.

43


In the Repeated Measures: Profile Plots dialog box, click on the factor and move it to the Horizontal Axis. Then click on Add to add this selection to the Plots: box. Click on Continue.


45

SPSS Instructions for Paired Sample t-tests

Choose > Analyze > Compare Means > Paired-Samples t-test

Indicate the columns you wish to compare, and then click on OK:
