Introduction/Design of Experiments Lecture Notes I Statistics 112, Fall 2002

Page 1: Introduction/Design of Experiments - Statistics Departmentdsmall/stat112-02/handouts/lectslides1.… · Introduction/Design of Experiments Lecture Notes I Statistics 112, ... –

Introduction/Design of Experiments

Lecture Notes I

Statistics 112, Fall 2002

Modes of Inquiry

• Descriptive. What is the world actually like? Examples:

– Economics: How are incomes distributed? How much worse off are minority groups?

– Psychology: How do recruits for an intelligence agency respond to psychologically trying job tests (Rosnow and Rosenthal)?

– Biology: Taxonomy of species.

• Causal. How did the world get to be the way it is? What would happen if some aspect of the world were changed? Examples:

– Psychology: Milgram’s experiments on obedience, motivated by a desire to understand the causes of the Holocaust; Do smaller class sizes increase students’ achievement?

– Medicine: Does taking Vitamin C prevent colds?

Descriptive Research

Uses:

• Generates scientific hypotheses. Examples: (i) What are the root causes of terrorism? Answering this requires careful documentation of the relationship between terrorist activity and other variables; (ii) Pellagra.

• Prediction. Establishing causation is not always the goal. Examples: prediction of job performance based on a job test, medical treatment, computer recognition of handwritten zip codes, racial and ethnic profiling.

The role of statistics:

• Graphical techniques and data summary techniques (e.g., linear regression) can provide an accurate description of the main features of a complex data set and accurate predictions.

• Methods of statistical inference (e.g., t-tests) provide a measure of confidence in the reality of the patterns that are found.

• Design of descriptive studies such as sample surveys. Idea of probability sampling to control bias (Section 3.3).

Studies of Treatment Effects

Much of scientific research seeks to answer the question, “What is the cause-and-effect relationship between a treatment, policy or intervention and an outcome of interest?” In other words, what would happen to the outcome if we changed a treatment or policy and did not change any other aspect of the world? Examples:

• Medicine: How effective is a new drug? What is the effect of smoking on one’s chance of developing cancer?

• Psychology: What change in an individual’s normal solitary performance and behavior occurs when other people are present? What change in an individual’s moral behavior occurs when the individual is commanded by authority?

• Economics: What is the effect of a change in taxes on labor supply and investment behavior? What is the effect of a change in the minimum wage on employment?

• Education: What is the effect of smaller class sizes on students’ achievement?

Studies of Treatment Effects Cont.

• Statistics plays a central role in both the design and the analysis of studies of treatment effects.

• Terminology: The individuals in the study are called units. When the units are human beings, they are called subjects. A specific condition that is either applied to the units or observed to hold for them is called a treatment. We are interested in the effect of the treatment on a response variable.

• The challenge of studying treatment effects: Association is not causation. Three reasons:

1. Chance fluctuations. One twin sister lives at the North Pole and the other lives on the Equator. The sister at the North Pole has four boys and the sister on the Equator has four girls. Does living in the cold make it more likely to have a boy? Heterogeneity of responses to treatment in a population.

2. Reverse causality or simultaneous causality. Which came first, the chicken or the egg? Examples: Wealth and stock ownership, Beta-carotene intake and morbidity, TV watching and reading.

3. Lurking variables.

Lurking variables

Lurking variable (Section 2.4): A variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied. Such variables are often called omitted variables or confounding variables. Examples:

• There is a close relationship between the salaries of Presbyterian ministers in Massachusetts and the price of rum in Havana. Are the ministers benefiting from the rum trade or supporting it?

• The death rate in the Navy during the Spanish-American War was nine per thousand. For civilians in New York City during the same period it was sixteen per thousand. Navy recruiters later used these figures to show that it was safer to be in the Navy than out of it.

• The Samaritans and suicide.

• Pellagra.

Statistical Principles for Designing Studies

• The two key statistical principles for designing studies of treatment effects are (1) design the study to control for the effects of lurking variables on the response; (2) replicate the study on many units to reduce chance variation in the results.

• Comparison as a method of control: Laboratory experiments in science and engineering often use a simple design in which a treatment is applied to all of the units. Example: Subject a beam (the unit) to a load (the treatment) and observe its response. In studies with humans, such simple designs often do not protect against lurking variables. Example: The placebo effect. The simplest form of control is to compare several treatments in an environment in which all other factors besides the treatment are kept the same, e.g., compare the effects of giving a drug with those of giving a placebo.

• The use of controls by itself does not ensure a valid study design.

• In a randomized comparative experiment, the researcher uses a chance mechanism (e.g., a coin flip) to assign subjects to treatments. Randomization prevents subtle biases in the assignment of subjects to treatments and tends to balance lurking variables between treatment groups.
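
As an illustration of assigning subjects by a chance mechanism, here is a minimal Python sketch. The function name `randomize`, the treatment labels, and the seed are hypothetical choices for this example. It shuffles the subjects and deals them out in turn rather than flipping a coin for each subject, which is one way (an assumption of this sketch, not something the notes prescribe) to keep the group sizes equal.

```python
import random

def randomize(subjects, treatments=("drug", "placebo"), seed=None):
    """Assign subjects to treatments completely at random, in near-equal groups."""
    rng = random.Random(seed)      # seeded only so the example is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)          # the chance mechanism: a random ordering
    groups = {t: [] for t in treatments}
    # Deal the shuffled subjects out in turn so group sizes differ by at most one.
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical roster of 20 subjects, identified by number.
groups = randomize(range(1, 21), seed=112)
for name, members in groups.items():
    print(name, sorted(members))
```

Because the researcher never chooses who gets which treatment, no preference of the researcher, conscious or not, can influence the makeup of the groups.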

Example: The Salk Vaccine Field Trial

• In the first half of the 20th century, polio was one of the most frightening diseases, striking hardest at young children and leaving many helpless cripples. It appeared in epidemic waves, leading to summer seasons in which some communities felt compelled to close swimming pools and restrict public gatherings.

• By the 1950s, Jonas Salk had developed a vaccine for polio that had proved promising in laboratory experiments, but it was necessary to try it in the real world before releasing it for general use.

• Need for replication: Suppose the vaccine was 50% effective. Assume that during the trial, the rate of occurrence of polio would be about 50 per 100,000. With 40,000 in the control group and 40,000 in the vaccinated group, we would expect to find about 20 control cases and 10 vaccinated cases, and a difference of this magnitude (10 cases) could fairly easily be attributed to chance (the standard error of the difference in case counts is about 5.5). With 100,000 in each group, the expected difference is 25 cases and the standard error is about 8.7, so the expected difference is nearly three standard errors from zero. (Review Chapter 8 for these calculations.)
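
The arithmetic behind these figures can be sketched in a few lines of Python. The function name `expected_cases` and its parameter names are hypothetical. The sketch treats each group's case count as approximately Poisson, so that its variance is about equal to its mean; this is an assumption made here to reproduce the quoted standard errors, since the notes defer the calculation to Chapter 8.

```python
from math import sqrt

def expected_cases(n_per_group, control_rate=50 / 100_000, effectiveness=0.5):
    """Expected polio cases per group and the standard error of their difference."""
    control = n_per_group * control_rate
    vaccinated = n_per_group * control_rate * (1 - effectiveness)
    difference = control - vaccinated
    # Rare-event (Poisson) approximation: the variance of each count is about
    # its mean, and the variances of independent counts add.
    se = sqrt(control + vaccinated)
    return control, vaccinated, difference, se

for n in (40_000, 100_000):
    control, vaccinated, difference, se = expected_cases(n)
    print(f"n = {n:,} per group: expect {control:.0f} vs {vaccinated:.0f} cases, "
          f"difference {difference:.0f}, SE {se:.1f}, ratio {difference / se:.1f}")
```

The ratio of the expected difference to its standard error grows with the square root of the group size, which is why a larger trial separates a real effect from chance more clearly.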

Designs for Salk Vaccine Field Trial

• The Historical Control Approach: Distribute the vaccine as widely as possible, through the schools, to see whether the rate of reported polio is appreciably lower than usual during the subsequent season.

• The Observed Control Approach: Offer vaccination to all children in the second grade of participating schools and follow the polio experience not only in these children but also in the first and third grade children.

• The Placebo Control Approach: Choose the control group from the same population as the treatment group, namely children whose parents consented to vaccination. Assign the treatment randomly. Give a placebo to the control group. Do not tell the children, their parents, or the doctors which group a child belongs to (double blinding).

Results of Salk Vaccine Field Trial

Placebo Control

Group        Size      Rate
Treatment    200,000   28
Control      200,000   71
No consent   350,000   46

Observed Control

Group                      Size      Rate
Grade 2 (vaccine)          225,000   25
Grades 1 and 3 (control)   725,000   54
Grade 2 (no consent)       125,000   44

Table 1: The results of the Salk vaccine trial of 1954. Size of groups and rate of polio cases per 100,000 in each group. The numbers are rounded.

Logic of Randomized Comparative Experiment

• Randomization produces groups that should be similar in all respects before the treatment is applied.

• Comparative design (i.e., use of the placebo and double blinding in this experiment) ensures that influences other than the treatment operate equally on the groups.

• Therefore, differences between the control and the treatment group must be due either to the treatment or to the play of chance in the random assignment of units to the groups.

• Statistical inference provides a method for describing how confident we can be that an observed difference between the treatment and control groups did not arise due to chance.

• Salk Vaccine Field Trial: Suppose that the probability of a randomly chosen child developing polio without the vaccination ($p_2$) was the same as with the vaccination ($p_1$). The approximate probability that we would observe a difference in polio cases between the vaccinated and unvaccinated groups as large as in Table 1 is about one in a billion (Review Chapter 8).

Inference for Comparing Two Proportions

• Let $p_1$ denote the probability of a randomly chosen child in the study developing polio with the vaccination. Let $p_2$ denote the probability of a randomly chosen child in the study developing polio without the vaccination. Let $X_1$ and $X_2$ denote the number of children in the vaccination group and the placebo control group, respectively, that develop polio, and let $n_1$ and $n_2$ denote the sample sizes for these two groups.

• Inference for comparing two proportions in a large sample is a special case of comparing two means (Chapter 7.2). To test the hypothesis $H_0: p_1 = p_2$, compute the $z$ statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{D_p}}, \qquad \hat{p}_1 = \frac{X_1}{n_1}, \quad \hat{p}_2 = \frac{X_2}{n_2},$$

where the pooled standard error is

$$SE_{D_p} = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}.$$

Under $H_0$, $z$ has approximately a standard normal distribution. An approximate level $C$ confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z^{*}\, SE_D, \qquad SE_D = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}},$$

where $z^{*}$ is the standard normal critical value for confidence level $C$.

Inference for Salk Vaccine Field Trial

• Test of $H_0: p_1 = p_2$, using the rounded figures for the placebo control design in Table 1.

$X_1 = 56$, $n_1 = 200{,}000$, $\hat{p}_1 = 0.00028$,

$X_2 = 142$, $n_2 = 200{,}000$, $\hat{p}_2 = 0.00071$.

$$\hat{p} = \frac{56 + 142}{200{,}000 + 200{,}000} = 0.000495$$

$$SE_{D_p} = \sqrt{0.000495\,(1 - 0.000495)\left(\frac{1}{200{,}000} + \frac{1}{200{,}000}\right)} = 0.00007034$$

$$z = \frac{0.00028 - 0.00071}{0.00007034} = -6.11$$

• $P$-value for two sided test of $H_0: p_1 = p_2$ is $2P(Z \ge |z|) = 2P(Z \ge 6.11) \approx 10^{-9}$.

• 95% confidence interval for $p_1 - p_2$:

$$SE_D = \sqrt{\frac{0.00028\,(1 - 0.00028)}{200{,}000} + \frac{0.00071\,(1 - 0.00071)}{200{,}000}} = 0.00007034$$

$$(0.00028 - 0.00071) \pm 1.96 \times 0.00007034 = (-0.000568,\ -0.000292)$$
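
The same test and interval can be checked numerically. The sketch below is a minimal Python version of the two-proportion formulas from the previous slide; the function name `two_proportion_test` is hypothetical, and the case counts 56 and 142 are not reported directly in the notes but are implied by the rounded rates in Table 1 (28 and 71 per 100,000 among 200,000 children each).

```python
from math import erfc, sqrt

def two_proportion_test(x1, n1, x2, n2, z_star=1.96):
    """Large-sample z test of H0: p1 = p2 and a confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    # Two-sided P-value: 2 * P(Z >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    # The confidence interval uses the unpooled standard error.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return z, p_value, (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

# Counts implied by the rounded Table 1 figures (an assumption of this sketch).
z, p_value, ci = two_proportion_test(56, 200_000, 142, 200_000)
print(f"z = {z:.2f}, P-value = {p_value:.1e}")
print(f"95% CI for p1 - p2: ({ci[0]:.6f}, {ci[1]:.6f})")
```

The interval lies entirely below zero, in agreement with the test: the polio rate was lower in the vaccinated group than in the placebo group.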

Internal and External Validity of Studies

• The design of a study is biased if it systematically favors certain outcomes. For example, a medical study without a placebo control group is biased towards finding a favorable treatment effect.

• Campbell and Stanley (1963) drew an important distinction between internal validity and external validity.

• A study is said to lack internal validity if it is biased.

• The major limitation of experiments is lack of realism. The subjects or treatments or setting of an experiment may not realistically duplicate the conditions we really want to study. Examples: psychology laboratory experiments, medical experiments on animals.

• External validity refers to whether the results of the study can be generalized to populations, settings and treatments not in the study. Examples: Milgram’s studies of obedience, Tennessee class size experiment.

• In order for statistical inferences to a population to be drawn, the subjects in a study must be a probability sample from the population.

Observational Studies

• For practical and ethical reasons, it is often difficult to carry out a randomized experiment.

• In an observational study, the researcher lacks control of which subjects are assigned to which treatments. Example: smoking and lung cancer.

• How can we control for lurking variables?

– For known lurking variables, we can use methods of standardization, matching and regression, which we will study, to control for their effects. Example: Compare cancer rates among smokers and non-smokers of the same age and sex.

– We cannot control for unknown lurking variables, and there is always the possibility that the real cause is an unknown lurking variable. A lurking variable is called a confounder if it is associated with both the assignment to treatment and the outcome. Unlike in a randomized experiment, more replication will not help. Examples: Fisher’s criticism of smoking-cancer link, pellagra, hormone replacement therapy.
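
A small simulation can make the last two points concrete. In the Python sketch below, every probability is invented for illustration and no real data are used: a lurking variable `u` raises both the chance of receiving the treatment and the chance of the outcome, while the treatment itself has no effect at all. The function names `simulate` and `rate_difference` are hypothetical.

```python
import random

def simulate(n, seed=0):
    """Simulate an observational study in which the treatment has NO effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.5                   # lurking variable (assumed 50/50)
        t = rng.random() < (0.8 if u else 0.2)   # u makes treatment more likely
        y = rng.random() < (0.3 if u else 0.1)   # u raises the outcome rate; t does not
        rows.append((u, t, y))
    return rows

def rate_difference(rows):
    """Outcome rate among the treated minus outcome rate among the untreated."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

for n in (2_000, 200_000):
    rows = simulate(n)
    crude = rate_difference(rows)
    # Compare treated and untreated subjects who share the same value of u.
    within = [rate_difference([r for r in rows if r[0] == u]) for u in (False, True)]
    print(n, round(crude, 3), [round(d, 3) for d in within])
```

Under these assumed probabilities the crude difference settles near 0.12 rather than 0 as the sample grows, so more replication only estimates the wrong quantity more precisely. Comparing within levels of `u` removes the spurious difference, but that is possible only because `u` was recorded; an unknown lurking variable cannot be handled this way.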

Natural Experiments

• Natural Experiment: Random assignment created by “nature,” e.g., by a sudden, unexpected event.

• Effect of minimum wages on employment. On April 1, 1992, New Jersey increased its minimum wage but Pennsylvania did not. Card and Krueger (1994) studied the change in employment at fast food restaurants (Burger King, Kentucky Fried Chicken, Wendy’s and Roy Rogers) in New Jersey before and after the increase in the minimum wage and compared it with the change in a control group consisting of fast food restaurants in adjacent, eastern Pennsylvania.

• Threats to validity. Internal validity: (i) lurking variable: political economy (why did New Jersey change its law?); (ii) interaction between the state and another variable that changed between time periods. External validity: this study covers only short-term changes.

Block Designs

• Example: Suppose that the incidence of polio was known to differ by grade levels. In the observed control approach, grade level would be a lurking variable. Randomization prevents systematic bias by tending to balance the number of children of each grade level receiving the placebo and the vaccine. However, randomization is not guaranteed to balance grade levels between the placebo and vaccine groups.

• A block design combines the ideas of randomization and controlling for a known lurking variable by matching.

• A block is a group of units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a block design, the random assignment of units to treatments is carried out separately within each block.

• Blocks are another form of control. They control the effects of some outside variables by bringing those variables into the experiment to form the blocks. The advantage of blocking is that it can provide more precise conclusions by minimizing the chance variation in the distribution of known confounders.
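
Carrying out the random assignment separately within each block might look like the following Python sketch. The function name `block_randomize`, the roster of children, and the treatment labels are hypothetical, and grade level is used as the blocking variable only because it is the example in the notes.

```python
import random
from collections import defaultdict

def block_randomize(units, block_of, treatments=("vaccine", "placebo"), seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)      # seeded only so the example is reproducible
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)   # group units by the blocking variable
    assignment = {}
    for block in sorted(blocks):
        members = blocks[block]
        rng.shuffle(members)                  # randomize within this block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical roster: (child id, grade), six children in each of grades 1 to 3.
children = [(i, grade) for grade in (1, 2, 3)
            for i in range(grade * 100, grade * 100 + 6)]
assignment = block_randomize(children, block_of=lambda child: child[1], seed=112)
for grade in (1, 2, 3):
    arms = [assignment[c] for c in children if c[1] == grade]
    print(grade, arms.count("vaccine"), arms.count("placebo"))
```

Every grade contributes the same number of children to each treatment, so grade level is balanced by construction instead of only on average.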

Review of Key Ideas

• Descriptive vs. Causal Research. Both are useful but one must pay attention to what the objective of the study is in both the design and the analysis.

• Association is not causation.

• Three key statistical ideas for designing studies to establish causation: (i) Control the effects of known influences on the outcome through comparative design (e.g., use contemporaneous control groups, a placebo, double blinding); (ii) Use randomization to balance the remaining lurking variables among the treatment groups; (iii) Replicate the study on as many units as possible.

• Statistical inference can be used to assess the influence of chance variation in the distribution of lurking variables in randomized experiments but cannot be used to assess the influence of unknown lurking variables in observational studies.

• Internal validity vs. external validity of a study.