Introduction/Design of Experiments
Lecture Notes I
Statistics 112, Fall 2002
Modes of Inquiry
• Descriptive. What is the world actually like? Examples:
– Economics: How are incomes distributed? How
much worse off are minority groups?
– Psychology: How do recruits for an intelligence
agency respond to psychologically trying job tests
(Rosnow and Rosenthal)?
– Biology: Taxonomy of species.
• Causal. How did the world get to be the way it is? What
would happen if some aspect of the world were changed?
Examples:
– Psychology: Milgram’s experiments on obedience,
motivated by a desire to understand the causes of
the Holocaust; do smaller class sizes increase
students’ achievement?
– Medicine: Does taking Vitamin C prevent colds?
Descriptive Research
Uses:
• Generates scientific hypotheses. Examples: (i) What are the
root causes of terrorism? Answering this requires careful documentation of the
relationship between terrorist activity and other variables; (ii)
Pellagra.
• Prediction. Establishing causation is not always the goal.
Examples: prediction of job performance based on a job test,
medical treatment, computer recognition of handwritten zip
codes, racial and ethnic profiling.
The role of statistics:
• Graphical techniques and data summary techniques (e.g.,
linear regression) can provide an accurate description of the
main features of a complex data set and accurate predictions.
• Methods of statistical inference (e.g., t-tests) provide a
measure of confidence in the reality of the patterns that are
found.
• Design of descriptive studies such as sample surveys. Idea of
probability sampling to control bias (Section 3.3).
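A minimal Python sketch of probability sampling, assuming a hypothetical sampling frame of 10,000 households (the frame, its sorted order, and the function name `simple_random_sample` are illustrative, not from these notes). It contrasts a simple random sample with a convenience sample taken from the top of a sorted list:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    Every subset of n units is equally likely to be chosen.
    """
    rng = random.Random(seed)  # seeded only so the example is reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame, sorted so that all 1s come first:
# 1 = household has the characteristic of interest, 0 = it does not.
population = [1] * 3_000 + [0] * 7_000  # true proportion 0.30

srs = simple_random_sample(population, 500, seed=1)
p_srs = sum(srs) / len(srs)
se_srs = (p_srs * (1 - p_srs) / len(srs)) ** 0.5

# A convenience sample (the first 500 entries of the frame) is badly biased here.
convenience = population[:500]
p_convenience = sum(convenience) / len(convenience)

print(f"SRS estimate {p_srs:.3f} (SE {se_srs:.3f}); convenience estimate {p_convenience:.3f}")
```

Because the frame is sorted, the convenience sample contains only 1s, while the random sample should land near the true 0.30: the chance mechanism, not the researcher, decides who is included.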
Studies of Treatment Effects
Much of scientific research seeks to answer the question,
“What is the cause-and-effect relationship between a
treatment, policy or intervention and an outcome of
interest?” In other words, what would happen to the
outcome if we changed a treatment or policy and did not
change any other aspect of the world? Examples:
• Medicine: How effective is a new drug? What is the
effect of smoking on one’s chance of developing cancer?
• Psychology: What change in an individual’s normal
solitary performance and behavior occurs when other people
are present? What change in an individual’s moral
behavior occurs when the individual is commanded by
an authority?
• Economics: What is the effect of a change in taxes on
labor supply and investment behavior? What is the effect
of a change in the minimum wage on employment?
• Education: What is the effect of smaller class sizes on
students’ achievement?
Studies of Treatment Effects Cont.
• Statistics plays a central role in both the design and the
analysis of studies of treatment effects.
• Terminology: The individuals in the study are called units.
When the units are human beings, they are called subjects. A
specific condition that is either applied or observed to hold for
the units is called a treatment. We are interested in the effect
of the treatment on a response variable.
• The challenge of studying treatment effects: Association is not
causation. Three reasons:
1. Chance fluctuations. One twin sister lives at the North
Pole and the other lives on the equator. The sister at the
North Pole has four boys and the sister on the equator has
four girls. Does living in the cold make it more likely to have
a boy?
– Heterogeneity of responses to treatment in a population.
2. Reverse causality or simultaneous causality. Which came
first, the chicken or the egg? Examples: wealth and stock
ownership, beta-carotene intake and morbidity, TV
watching and reading.
3. Lurking variables.
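A quick calculation shows how often chance alone produces the twins' pattern. It assumes, for illustration only, that each birth is independently a boy or a girl with probability 1/2 (the real proportion of boys is slightly above one half); the figure of 1,000 pairs of sisters is hypothetical:

```python
from fractions import Fraction

# Illustrative assumption: each birth is independently a boy with probability 1/2.
p_boy = Fraction(1, 2)
p_girl = 1 - p_boy

p_four_boys = p_boy ** 4    # 1/16
p_four_girls = p_girl ** 4  # 1/16

# North Pole sister has four boys AND equator sister has four girls.
p_observed = p_four_boys * p_four_girls
# Either sister has all boys while the other has all girls.
p_either_way = 2 * p_observed

# Expected number of such splits among 1,000 hypothetical pairs of sisters.
expected_patterns = 1000 * p_either_way

print(p_observed, p_either_way, float(expected_patterns))  # 1/256 1/128 7.8125
```

So among 1,000 such pairs, about 8 would show this all-boys/all-girls split with no effect of climate at all.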
Lurking variables
Lurking variable (Section 2.4): A variable that has an important
effect on the relationship among the variables in a study but is not
included among the variables studied. Such variables are often
called omitted variables or confounding variables. Examples:
• There is a close relationship between the salaries of
Presbyterian ministers in Massachusetts and the price of rum
in Havana. Are the ministers benefiting from the rum trade or
supporting it?
• The death rate in the Navy during the Spanish-American War
was nine per thousand. For civilians in New York City during
the same period it was sixteen per thousand. Navy recruiters
later used these figures to show that it was safer to be in the
Navy than out of it.
• The Samaritans and suicide.
• Pellagra.
Statistical Principles for Designing Studies
• The two key statistical principles for designing studies of
treatment effects are (1) design the study to control for the
effects of lurking variables on the response; (2) replicate the
study on many units to reduce chance variation in the results.
• Comparison as a method of control: Laboratory experiments in
science and engineering often use a simple design in which a
treatment is applied to all of the units. Example: Subject a
beam (the unit) to a load (treatment) and observe its response.
In studies with humans, such simple designs often do not
protect against lurking variables. Example: The placebo effect.
The simplest form of control is to compare several treatments
in an environment in which all factors other than the
treatment are kept the same, e.g., compare the effects of
giving a drug with the effects of giving a placebo.
• The use of controls by itself does not ensure a valid study design.
• In a randomized comparative experiment, the researcher uses
a chance mechanism (e.g., a coin flip) to assign subjects to
treatments. Randomization prevents subtle biases in the assignment of
subjects to treatments and tends to balance lurking variables
between treatment groups.
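The sketch below illustrates the claim that random assignment tends to balance lurking variables. It is a hypothetical Python example: the subjects, their ages, and the helpers `randomize` and `mean_age` are invented for illustration, with age standing in for a lurking variable the researcher never looks at when assigning treatments.

```python
import random

def randomize(subjects, seed=None):
    """Completely randomized design: shuffle, then split into two equal halves."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_age(group):
    return sum(s["age"] for s in group) / len(group)

# Hypothetical subjects; age plays the role of a lurking variable that is
# never consulted when treatments are assigned.
age_rng = random.Random(0)
subjects = [{"id": i, "age": age_rng.randint(20, 80)} for i in range(1000)]

treatment, control = randomize(subjects, seed=42)
print(f"mean age: treatment {mean_age(treatment):.1f}, control {mean_age(control):.1f}")
```

The two mean ages should come out close but not identical; the remaining gap is the chance variation that replication and statistical inference address.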
Example: The Salk Vaccine Field Trial
• In the first half of the 20th century, polio was one of the most
frightening diseases, striking hardest at young children and
leaving many helpless cripples. It appeared in epidemic
waves, leading to summer seasons in which some
communities felt compelled to close swimming pools and
restrict public gatherings.
• By the 1950s, Jonas Salk had developed a vaccine for polio that
had proved promising in laboratory experiments, but it was
necessary to try it in the real world before releasing it for
general use.
• Need for replication: Suppose the vaccine was 50% effective.
Assume that during the trial, the rate of occurrence of polio among
unvaccinated children would be about 50 per 100,000. With 40,000 in the control
group and 40,000 in the vaccinated group, we would expect to
find about 20 control cases and 10 vaccinated cases, and a
difference of this magnitude could fairly easily be attributed to
chance (the standard error of the difference in the number of cases is
about 5.5). With 100,000 in each group, the expected
difference is 25 cases and the standard error is 8.7 (review Chapter
8 for these calculations).
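The replication arithmetic can be checked with a few lines of Python. The sketch assumes, as one common shortcut, that each group's case count is roughly Poisson, so its variance equals its mean (for a disease this rare the binomial variance is almost the same); under that assumption it reproduces the standard errors of 5.5 and 8.7 quoted above. The function name is illustrative.

```python
import math

def expected_cases_and_se(n_per_group, base_rate, efficacy):
    """Expected case counts in each group and the SE of their difference.

    Assumes each count is approximately Poisson (variance close to the mean),
    which is reasonable because polio cases are rare.
    """
    control_cases = n_per_group * base_rate
    vaccine_cases = control_cases * (1 - efficacy)
    difference = control_cases - vaccine_cases
    se = math.sqrt(control_cases + vaccine_cases)  # variances of independent counts add
    return control_cases, vaccine_cases, difference, se

for n in (40_000, 100_000):
    c, v, d, se = expected_cases_and_se(n, 50 / 100_000, 0.5)
    print(f"n={n}: control {c:.0f}, vaccine {v:.0f}, diff {d:.0f}, SE {se:.1f}, diff/SE {d / se:.1f}")
```

The ratio of the expected difference to its standard error rises from about 1.8 to about 2.9 as the groups grow, which is why the larger trial could separate a real effect from chance.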
Designs for Salk Vaccine Field Trial
• The Historical Control Approach: Distribute the vaccine as
widely as possible, through the schools, to see whether the
rate of reported polio was appreciably less than usual during
the subsequent season.
• The Observed Control Approach: Offer vaccination to all
children in the second grade of participating schools and follow
the polio experience not only in these children but also in the
first-grade and third-grade children.
• The Placebo Control Approach: Choose the control group from
the same population as the treatment group: children whose
parents consented to vaccination. Assign the treatment
randomly. Give a placebo to the control group. Do not tell
the doctors which group each child belongs to (double blinding).
Results of Salk Vaccine Field Trial
Placebo control approach            Size      Rate
  Treatment                       200,000      28
  Control                         200,000      71
  No consent                      350,000      46

Observed control approach           Size      Rate
  Grade 2 (vaccine)               225,000      25
  Grades 1 and 3 (control)        725,000      54
  Grade 2 (no consent)            125,000      44
Table 1: The results of the Salk vaccine trial of 1954. Size of groups
and rate of polio cases per 100,000 in each group. The numbers are
rounded.
Logic of Randomized Comparative Experiment
• Randomization produces groups that should be similar in all
respects before the treatment is applied.
• Comparative design (i.e., use of the placebo and double
blinding in this experiment) ensures that influences other than
the treatment operate equally on the groups.
• Therefore, differences between the control and the treatment
group must be due either to the treatment or to the play of
chance in the random assignment of units to the groups.
• Statistical inference provides a method for describing how
confident we can be that an observed difference between the
treatment and control groups did not arise by chance.
• Salk Vaccine Field Trial: Suppose that the probability of a
randomly chosen child developing polio without the vaccination
($p_2$) was the same as with the vaccination ($p_1$). The
approximate probability that we would observe a difference in
polio cases between the vaccinated and unvaccinated groups
as large as in Table 1 is about one in a billion (Review Chapter
8).
Inference for Comparing Two Proportions
• Let $p_1$ denote the probability of a randomly chosen child in the
study developing polio with the vaccination. Let $p_2$ denote the
probability of a randomly chosen child in the study developing
polio without the vaccination. Let $X_1$ and $X_2$ denote the
number of children in the vaccination group and the placebo
control group, respectively, that develop polio, and let $n_1$ and $n_2$ denote the sample sizes for these two groups.
• Inference for comparing two proportions in a large sample is a
special case of comparing two means (Chapter 7.2). To test
the hypothesis $H_0: p_1 = p_2$, compute the $z$ statistic
$$z = \frac{\hat p_1 - \hat p_2}{SE_{Dp}}, \qquad \hat p_1 = \frac{X_1}{n_1}, \quad \hat p_2 = \frac{X_2}{n_2},$$
where the pooled standard error is
$$SE_{Dp} = \sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \hat p = \frac{X_1 + X_2}{n_1 + n_2}.$$
Under $H_0$, $z$ has a standard normal distribution. An
approximate level $C$ confidence interval for $p_1 - p_2$ is
$$(\hat p_1 - \hat p_2) \pm z^* \sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n_1} + \frac{\hat p_2 (1 - \hat p_2)}{n_2}},$$
where $z^*$ is the value with area $C$ between $-z^*$ and $z^*$ under the standard normal curve.
Inference for Salk Vaccine Field Trial
• Test of $H_0: p_1 = p_2$, using the rounded figures in Table 1.
• $X_1 = 56$, $n_1 = 200{,}000$, $\hat p_1 = 0.00028$; $X_2 = 142$, $n_2 = 200{,}000$, $\hat p_2 = 0.00071$.
$$\hat p = \frac{56 + 142}{400{,}000} = 0.000495, \qquad SE_{Dp} = \sqrt{0.000495\,(1 - 0.000495)\left(\frac{1}{200{,}000} + \frac{1}{200{,}000}\right)} \approx 0.0000703,$$
$$z = \frac{0.00028 - 0.00071}{0.0000703} \approx -6.11.$$
• $p$-value for two-sided test of $H_0: p_1 = p_2$ is $2P(Z \ge |z|) = 2P(Z \ge 6.11) \approx 10^{-9}$.
• 95% confidence interval for $p_1 - p_2$:
$$(0.00028 - 0.00071) \pm 1.96\sqrt{\frac{0.00028\,(1 - 0.00028)}{200{,}000} + \frac{0.00071\,(1 - 0.00071)}{200{,}000}} = -0.00043 \pm 0.00014 = (-0.00057,\ -0.00029).$$
Internal and External Validity of Studies
• The design of a study is biased if it systematically favors
certain outcomes. For example, a medical study without a
placebo control group is biased towards finding a favorable
treatment effect.
• Campbell and Stanley (1963) drew an important distinction
between internal validity and external validity.
• A study is said to lack internal validity if it is biased.
• The major limitation of experiments is lack of realism. The
subjects, treatments or setting of an experiment may not
realistically duplicate the conditions we really want to study.
Examples: psychology laboratory experiments, medical
experiments on animals.
• External validity refers to whether the results of the study can
be generalized to populations, settings and treatments not in
the study. Examples: Milgram’s studies of obedience,
Tennessee class size experiment.
• For statistical inferences about a population to be drawn,
the subjects in a study must be a probability sample from that
population.
Observational Studies
• For practical and ethical reasons, it is often difficult to
carry out a randomized experiment.
• In an observational study, the researcher lacks control over
which subjects are assigned to which treatments.
Example: smoking and lung cancer.
• How can we control for lurking variables?
– For known lurking variables, we can use the methods of
standardization, matching and regression that we will
study to control for their effects.
Example: Compare cancer rates among smokers
and non-smokers of the same age and sex.
– We cannot control for unknown lurking variables, and
there is always the possibility that the real cause is
an unknown lurking variable. A lurking variable is
called a confounder if it is associated with both the
assignment to treatment and the outcome. Unlike in
a randomized experiment, more replication will not
help. Examples: Fisher’s criticism of the smoking-cancer
link, pellagra, hormone replacement therapy.
Natural Experiments
• Natural Experiment: Random assignment created by “nature,”
e.g., by a sudden, unexpected event.
• Effect of minimum wages on employment. On April 1, 1992,
New Jersey increased its minimum wage but Pennsylvania did
not. Card and Krueger (1994) studied the change in employment
at New Jersey fast food restaurants (Burger King, Kentucky Fried
Chicken, Wendy’s and Roy Rogers) before and after the increase
in the minimum wage and compared it with the change over the same
period in a control group of fast food restaurants in adjacent,
eastern Pennsylvania.
• Threats to validity. Internal validity: (i) a lurking variable,
the political economy (why did New Jersey change its law?); (ii)
an interaction between the state and another variable that changed
between the time periods. External validity: this study only
covers short-term changes.
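The comparison in Card and Krueger's design is what is now usually called a difference-in-differences estimate: the change in New Jersey minus the change in Pennsylvania. A minimal Python sketch with made-up numbers (they are not the study's data; the function name is illustrative):

```python
def difference_in_differences(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group.

    The control group's change stands in for what would have happened to the
    treated group without the treatment; the estimate is only as good as that
    assumption.
    """
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change

# Hypothetical mean employment per restaurant (NOT Card and Krueger's figures).
nj_before, nj_after = 20.0, 21.0   # New Jersey: minimum wage raised
pa_before, pa_after = 23.0, 21.5   # Pennsylvania: no change

effect = difference_in_differences(nj_before, nj_after, pa_before, pa_after)
print(f"estimated effect of the increase: {effect:+.1f} employees per restaurant")  # +2.5
```

Threat (ii) above is precisely a case where Pennsylvania's change is not a good stand-in for what New Jersey would have experienced without the law.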
Block Designs
• Example: Suppose that the incidence of polio was known to
differ by grade level. In the observed control approach, grade
level would be a lurking variable. In a randomized design, randomization prevents
systematic bias by tending to balance the number of children
of each grade level receiving the placebo and the vaccine.
However, randomization is not guaranteed to balance grade
levels between the placebo and vaccine groups.
• A block design combines the ideas of randomization and
controlling for a known lurking variable by matching.
• A block is a group of units or subjects that are known before
the experiment to be similar in some way that is expected to
affect the response to the treatments. In a block design, the
random assignment of units to treatments is carried out
separately within each block.
• Blocks are another form of control. They control the effects of
some outside variables by bringing those variables into the
experiment to form the blocks. The advantage of blocking is
that it can provide more precise conclusions by minimizing the
chance variation in the distribution of known confounders.
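A block design with grade level as the block can be sketched in Python as follows. The children, the grade labels, and the helper `block_randomize` are hypothetical; the point is that randomization happens separately inside each grade, so every grade is split evenly between vaccine and placebo by construction rather than only on average.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, seed=None):
    """Randomly assign treatments separately within each block.

    Within every block, half the units (rounded down) get the vaccine and
    the rest get the placebo, so the blocking variable is balanced by design.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # the random step, done inside one block only
        half = len(members) // 2
        for s in members[:half]:
            assignment[s["id"]] = "vaccine"
        for s in members[half:]:
            assignment[s["id"]] = "placebo"
    return assignment

# Hypothetical children in grades 1-3 (100 per grade); grade is the block.
children = [{"id": i, "grade": 1 + i % 3} for i in range(300)]
assignment = block_randomize(children, block_of=lambda c: c["grade"], seed=7)
for g in (1, 2, 3):
    n_vac = sum(1 for c in children if c["grade"] == g and assignment[c["id"]] == "vaccine")
    print(f"grade {g}: {n_vac} of 100 vaccinated")
```

Every grade prints 50 of 100 vaccinated, whatever the seed; only which children within a grade get the vaccine is left to chance.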
Review of Key Ideas
• Descriptive vs. Causal Research. Both are useful, but one
must pay attention to the objective of the study in both
the design and the analysis.
• Association is not causation.
• Three key statistical ideas for designing studies to establish
causation: (i) Control the effects of known influences on the
outcome through comparative design (e.g., use
contemporaneous control groups, a placebo, double blinding);
(ii) Use randomization to balance the remaining lurking
variables among the treatment groups; (iii) Replicate the study
on as many units as possible.
• Statistical inference can be used to assess the influence of
chance variation in the distribution of lurking variables in
randomized experiments but cannot be used to assess the
influence of unknown lurking variables in observational studies.
• Internal validity vs. external validity of a study.