STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Assist. Prof. Dr. R. Serkan Albayrak, Department of Business Administration, Yaşar University



Page 2: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

COURSE OBJECTIVES

• To be able to plan an experiment in such a way that the statistical analysis results in valid and objective conclusions.

• To learn a variety of experimental designs and be able to choose an appropriate design for a specific experiment.

• To be able to translate an experimental description into a statistical model, including identifying model restrictions and assumptions.

GRADING

• 30% Homework (All homework will be graded by a student grader, who will also help grade the exams. He or she will not hold office hours.)

• 30% Midterm

• 40% Final Exam

Page 3: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Textbook

• Design and Analysis of Experiments, 6th edition, by D. C. Montgomery (Required). This book will be used to describe most of the technical details of the course material. After the first few chapters, we will skip around. By the end of the semester, we will have covered most sections.

• A First Course in Design and Analysis of Experiments by G. W. Oehlert (Recommended). This book is a nice counterpoint to Montgomery's book, providing a somewhat more readable discussion of more difficult topics as well as covering some material that the required text doesn't get to. It is, however, not as comprehensive as Montgomery's book, nor does it provide as many examples.

• Experimental Design Using ANOVA by B. G. Tabachnick and L. S. Fidell (Recommended). This book uses a regression approach to ANOVA alongside the traditional approach.

Page 4: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Software

• R-Project: www.r-project.org

• There are several tutorials freely available online. However, the student assistant will give you my preferred tutorials in both hard copy and soft copy.

• You must be able to code in R as soon as possible. All homework will be prepared using R. No teamwork is allowed.

Page 5: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Induction

• For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling.

• Example (Survey): Do you favor increasing the gas tax for public transportation?
– Specific cases: 200 people called for a telephone survey.
– Inferential goal: Get information on the opinion of the entire city.

• Example (Women's Health Initiative): Does hormone replacement improve health status in post-menopausal women?
– Specific cases: Health status monitored in 16,608 women over a 5-year period. Some took hormones, others did not.
– Inferential goal: Determine if hormones improve the health of women not in the study.

Page 6: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Model of a Variable Process

How do the inputs of a process affect an output?

Input variables consist of
• Controllable factors: measured and determined by the scientist.
• Uncontrollable factors: measured but not determined by the scientist.
• Noise factors: unmeasured, uncontrolled factors, often called experimental variability or "error".

Page 7: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Model of a Variable Process

For the process, there are inputs such that:
variability in input → variability in output

• If variability in x leads to variability in y, we say x is a source of variation.

• Good design and analysis of experiments can identify sources of variation.

Page 8: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Observational Studies and Experiments

Information on how inputs affect output can be gained from:

Observational Studies:

• Input and output variables are observed from a pre-existing population.

• It may be hard to say what is input and what is output.

Controlled Experiments:

• One or more input variables are controlled and manipulated by the experimenter to determine their effect on the output.

Page 9: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Women's Health Initiative (WHI)

Population: Healthy, post-menopausal women in the U.S.

Input Variables:
1. estrogen treatment, yes/no
2. demographic variables (age, race, diet, etc.)
3. unmeasured variables (?)

Output Variables:
1. coronary heart disease (e.g., MI)
2. invasive breast cancer
3. other health-related outcomes

Scientific Question: How does estrogen treatment affect health outcomes?

Page 10: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

WHI Observational Study

Observational Population:
• 93,676 women enlisted starting in 1991;
• tracked over eight years on average;
• data consist of
– x = input variables
– y = health outcomes,
gathered concurrently on existing populations.

Results: Good health and low rates of CHD* were generally associated with estrogen treatment.

Conclusion: Estrogen treatment is positively associated with good health outcomes, such as lower rates of CHD.

*CHD: Coronary Heart Disease

Page 11: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

WHI Randomized Controlled Trial

Experimental Population:
373,092 women determined to be eligible
18,845 provided consent to be in the experiment
16,608 included in the experiment

The 16,608 women were randomized to either treatment (x = 1) or control (x = 0). Women were of different ages and were treated at different clinics. Women were blocked together by age and clinic, then treatments were randomly assigned within each age × clinic block.
• This type of random allocation is called a randomized block design.

Page 12: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

WHI Randomization Scheme

$n_{i,j}$ = number of women in the study in clinic i and age group j = number of women in block (i, j)

Randomization scheme: For each block,
• 50% of the women are randomly assigned to treatment (x = 1);
• the remaining women are assigned to control (x = 0).

Question: Why did they randomize within a block?
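The randomization scheme above can be sketched in a few lines of code. This is a minimal illustration, not the WHI procedure itself: the course uses R, but the sketch is written in Python, and the function name `randomize_within_blocks`, the block labels, and the unit ids are all hypothetical.

```python
import random

def randomize_within_blocks(blocks, seed=None):
    """Assign half of the units in every block to treatment (1), the rest to control (0).

    blocks: dict mapping a block label to a list of unit ids (hypothetical structure).
    Returns a dict mapping each unit id to 1 or 0.
    """
    rng = random.Random(seed)
    assignment = {}
    for label, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)              # random order within the block only
        n_treat = len(shuffled) // 2       # 50% of the block, rounded down
        for k, unit in enumerate(shuffled):
            assignment[unit] = 1 if k < n_treat else 0
    return assignment

# Made-up blocks defined by (clinic, age group); the ids are placeholders.
blocks = {
    ("clinic 1", "50-59"): ["w01", "w02", "w03", "w04"],
    ("clinic 1", "60-69"): ["w05", "w06", "w07", "w08", "w09", "w10"],
    ("clinic 2", "50-59"): ["w11", "w12"],
}
x = randomize_within_blocks(blocks, seed=401)
for label, units in blocks.items():
    print(label, [x[u] for u in units])
```

Because the shuffle happens separately inside each block, every clinic and age group contributes equally to the treatment and control groups, so neither clinic nor age can be confounded with treatment.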

Page 13: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

WHI RCT Results

Women on treatment had lower incidence rates for
• colorectal cancer
• hip fracture

but higher incidence rates for
• CHD
• breast cancer
• stroke
• pulmonary embolism

Conclusion: Estrogen isn't a viable preventative measure for CHD in the general population.

That is, our inductive inference is
(specific) higher CHD rate in treatment population than control population
suggests
(general) if the whole population were treated, CHD incidence would increase.

Page 14: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Correlation, causation, confounding

QUESTION: Why the different conclusions between the two studies?

Consider the following possible explanation: Let
• x = estrogen treatment
• $\omega$ = "health consciousness" (not directly measured)
• y = health outcomes

Association between x and y may be due to an unmeasured variable $\omega$.

Page 15: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Randomized experiments versus observational studies

Randomization breaks the association between the unmeasured variable $\omega$ and x.

Observational studies can suggest good experiments to run, but can't definitively show causation.

Randomization can eliminate correlation between x and y due to a different cause $\omega$, also known as a confounder.

No causation without randomization.
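A small simulation may make the confounding argument concrete. It is a toy model built entirely on assumptions, not WHI data: treatment has no effect on the outcome, an unmeasured binary trait (health consciousness) raises the outcome by one unit, and in the observational setting health-conscious subjects take the treatment with probability 0.8 versus 0.2 for the others. The sketch is in Python (the course itself uses R) and all names are hypothetical.

```python
import random
from statistics import mean

def mean_difference(n, randomized, seed):
    """Return mean(y | treated) - mean(y | control) in a toy model
    where treatment has NO causal effect on y."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        conscious = rng.random() < 0.5                  # unmeasured confounder
        if randomized:
            x = rng.random() < 0.5                      # coin flip, independent of the confounder
        else:
            x = rng.random() < (0.8 if conscious else 0.2)  # self-selection into treatment
        y = (1.0 if conscious else 0.0) + rng.gauss(0, 1)   # outcome depends on the confounder only
        (treated if x else control).append(y)
    return mean(treated) - mean(control)

observational = mean_difference(20000, randomized=False, seed=1)
experiment = mean_difference(20000, randomized=True, seed=1)
print("observational study:", round(observational, 3))
print("randomized experiment:", round(experiment, 3))
```

Under these assumptions the observational difference is expected to be near 0.8 - 0.2 = 0.6 even though the treatment does nothing, while the randomized difference is expected to be near zero.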

Page 16: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Ingredients of an experimental design

1. Identify research hypotheses to be tested.
2. Choose a set of experimental units, which are the units to which treatments will be randomized.
3. Choose a response/output variable.
4. Determine potential sources of variation in response:
4.1 factors of interest
4.2 nuisance factors
5. Decide which variables to measure and control:
5.1 treatment variables
5.2 potential large sources of variation in the units (blocking variables)
6. Decide on the experimental procedure and how treatments are to be randomly assigned.

The order of these steps may vary due to constraints such as budgets, ethics, time, etc.

Page 18: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Three principles of Experimental Design

1. Replication: Repetition of an experiment.
Replicates are runs of an experiment or sets of experimental units that have the same values of the control variables.
More replication → more precise inference. Let
$y_{i,A}$ = response of the ith unit assigned to treatment A, i = 1,…,n
$y_{i,B}$ = response of the ith unit assigned to treatment B, i = 1,…,n
Then $\bar{y}_A - \bar{y}_B \neq 0$ provides evidence that treatment affects response, i.e. treatment is a source of variation, and the amount of evidence increases with n.

2. Randomization: Random assignment of treatments to experimental units.
Removes potential for systematic bias on the part of the researcher, and removes any pre-experimental source of bias. Makes confounding the effect of treatment with unobserved variables unlikely (but not impossible).

3. Blocking: Randomization within blocks of homogeneous units.
The goal is to evenly distribute treatments across large potential sources of variation.
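To see why replication sharpens inference, a short derivation may help. It assumes, beyond what the slide states, that responses are independent, share a common variance $\sigma^2$, and that each treatment receives n units:

```latex
\operatorname{Var}(\bar{y}_A - \bar{y}_B)
  = \frac{\sigma^2}{n} + \frac{\sigma^2}{n}
  = \frac{2\sigma^2}{n},
\qquad
\operatorname{SD}(\bar{y}_A - \bar{y}_B) = \sigma\sqrt{\frac{2}{n}}
```

Under these assumptions the noise in the difference of sample means shrinks at rate $1/\sqrt{n}$, so a fixed true difference stands out more clearly as n grows.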

Page 19: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Statistical Model

• Example: wheat yield
Question: Is one fertilizer better than another, in terms of yield?
Outcome variable: Wheat yield.
Factor of interest: Fertilizer type, A or B. One factor having two levels.
Experimental material: One plot of land, divided into 2 rows of 6 subplots each.

Page 20: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Design Questions

How should we assign treatments/factor levels to the plots?
We want to avoid confounding the treatment effect with another source of variation.
Potential sources of variation: fertilizer, soil, sun, water, etc.

Implementation of the Experiment
Assigning treatments randomly avoids any pre-experimental bias in results.
12 playing cards, 6 red and 6 black, were shuffled and dealt:

1st card black → 1st plot gets B
2nd card red → 2nd plot gets A
3rd card black → 3rd plot gets B
...

This is the first design we will study, a completely randomized design (CRD).
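The card-shuffling procedure above has a direct software analogue. The following is a minimal sketch in Python (the course itself uses R); the function name `completely_randomized_design` and the seed are hypothetical choices made for illustration.

```python
import random

def completely_randomized_design(n_plots=12, labels=("A", "B"), seed=None):
    """Return a random ordering of treatment labels with equal replication,
    mimicking a shuffled deck of 6 red and 6 black cards."""
    rng = random.Random(seed)
    per_label = n_plots // len(labels)
    deck = [label for label in labels for _ in range(per_label)]  # 6 A's and 6 B's
    rng.shuffle(deck)                                            # shuffle the deck
    return deck

design = completely_randomized_design(seed=2024)
for plot, fertilizer in enumerate(design, start=1):
    print(f"plot {plot:2d} gets {fertilizer}")
```

Shuffling a fixed deck, rather than flipping a coin per plot, guarantees that each fertilizer is used on exactly six plots.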

Page 21: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Results

How much evidence is there that fertilizer type is a source of yield variation?
Evidence about differences between two populations is generally measured by comparing summary statistics across the two sample populations.
(Recall, a statistic is any computable function of known, observed data.)

Page 22: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Summaries of Sample Distribution

Page 24: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Summaries of Sample Location

Page 25: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Summaries of Sample Scale

Median absolute deviation: Let Med() represent the median function. Then the median absolute deviation (MAD) is found by: $\mathrm{MAD} = \mathrm{Med}(|y_i - \mathrm{Med}(y)|)$
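As an illustration of location and scale summaries, here is a minimal Python sketch (the course itself uses R). It assumes the scale summary intended above is the unscaled median absolute deviation, and the yields are hypothetical numbers, not the data from the experiment.

```python
from statistics import mean, median, stdev

def mad(values):
    """Median absolute deviation: the median of |y_i - Med(y)| (no scaling constant)."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Hypothetical yields for one fertilizer group (illustration only).
y = [12.0, 15.0, 11.0, 14.0, 13.0, 19.0]

print("mean:  ", mean(y))     # location, sensitive to outliers
print("median:", median(y))   # location, robust to outliers
print("sd:    ", stdev(y))    # scale, sensitive to outliers
print("MAD:   ", mad(y))      # scale, robust to outliers
```

Note that R's built-in `mad` function multiplies by a constant (1.4826 by default), so its output differs from this unscaled version unless `constant = 1` is passed.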

Page 26: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Summaries in R

Page 27: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Induction and Generalization

• So there is a difference in yield for these wheat plots.

• Would you recommend B over A for future plantings?

• Do you think these results generalize to a larger population?

Page 28: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Hypotheses: Competing Explanations

Questions:
• Could the observed differences be due to fertilizer type?
• Could the observed differences be due to plot-to-plot variation?

Hypothesis tests:
H0 (null hypothesis): Fertilizer type does not affect yield.
H1 (alternative hypothesis): Fertilizer type does affect yield.

A statistical hypothesis test evaluates the compatibility of H0 with the data.

Page 29: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Test statistics and null distributions

Suppose we are interested in mean wheat yields. We can evaluate H0 by answering the following questions:

• Is a mean difference of 5.93 plausible/probable if H0 is true?
• Is a mean difference of 5.93 large compared to experimental noise?

To answer the above, we need to compare
$|\bar{y}_B - \bar{y}_A| = 5.93$, the observed difference in the experiment,
to
values of $|\bar{Y}_B - \bar{Y}_A|$ that could have been observed if H0 were true.

Hypothetical values of $|\bar{Y}_B - \bar{Y}_A|$ that could have been observed under H0 are referred to as samples from the null distribution.

Page 30: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

$g(Y_A, Y_B) = g(\{Y_{1,A},\dots,Y_{6,A}\},\{Y_{1,B},\dots,Y_{6,B}\}) = |\bar{Y}_B - \bar{Y}_A|$

This is a function of the outcome of the experiment. It is a statistic.
Since we will use it to perform a hypothesis test, we will call it a test statistic.

Observed test statistic:
$g(11.4, 23.7, \dots, 14.2, 24.3) = 5.93 = g_{obs}$

Hypothesis testing procedure:
Compare $g_{obs}$ to $g(Y_A, Y_B)$, where $Y_A$ and $Y_B$ are values that could have been observed if H0 were true.
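Computing the test statistic is a one-line calculation. The sketch below is in Python (the course itself uses R) and assumes that g is the absolute difference in sample means. Because the transcript shows only part of the data, the yields here are hypothetical, so the result is not the 5.93 reported above.

```python
from statistics import mean

def g(y_a, y_b):
    """Test statistic: absolute difference between the two sample means."""
    return abs(mean(y_b) - mean(y_a))

# Hypothetical yields, six plots per fertilizer (illustration only).
y_a = [12.0, 15.0, 11.0, 14.0, 13.0, 19.0]
y_b = [18.0, 17.0, 21.0, 16.0, 22.0, 20.0]

g_obs = g(y_a, y_b)
print("observed test statistic:", g_obs)
```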

Page 31: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Experimental procedure and observed outcome

Page 32: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Experimental procedure and potential outcome

Page 33: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

The null distribution

Page 35: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Null distribution, wheat example

Page 36: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Comparing data to the null distribution:

Page 37: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Approximating a randomization distribution:

The assistant will discuss bootstrapping in class sessions.
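One common way to approximate a randomization distribution is Monte Carlo sampling: re-shuffle the treatment labels many times and recompute the test statistic each time. The sketch below is in Python (the course itself uses R) and is an illustration under assumptions rather than the method shown on the slide. It assumes the test statistic is the absolute difference in sample means, and the yields, function names, and seed are hypothetical.

```python
import random
from statistics import mean

def g(y_a, y_b):
    """Test statistic: absolute difference between the two sample means."""
    return abs(mean(y_b) - mean(y_a))

def randomization_test(y_a, y_b, n_sim=5000, seed=None):
    """Approximate the null distribution of g by re-randomizing treatment labels.

    Under H0 the fertilizer label does not matter, so any split of the pooled
    yields into two groups of the original sizes was equally likely.
    Returns the observed statistic and the Monte Carlo p-value.
    """
    rng = random.Random(seed)
    g_obs = g(y_a, y_b)
    pooled = list(y_a) + list(y_b)
    n_a = len(y_a)
    extreme = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)                        # one hypothetical re-assignment
        if g(pooled[:n_a], pooled[n_a:]) >= g_obs:
            extreme += 1                           # at least as extreme as observed
    return g_obs, extreme / n_sim

# Hypothetical yields (illustration only).
y_a = [12.0, 15.0, 11.0, 14.0, 13.0, 19.0]
y_b = [18.0, 17.0, 21.0, 16.0, 22.0, 20.0]

g_obs, p_value = randomization_test(y_a, y_b, seed=401)
print("g_obs =", g_obs, " approximate p-value =", p_value)
```

With 6 plots per group there are only 924 possible assignments, so the exact null distribution could also be enumerated; sampling becomes the practical choice as the number of units grows.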

Page 38: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Essential nature of a hypothesis test

Page 39: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Questions

Page 40: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Choosing Test Statistic

Page 41: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

The t statistic

Page 42: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

The Kolmogorov-Smirnov statistic

Page 43: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Comparing the test statistics

Page 46: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Sensitivity to specific alternatives

Page 48: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Discussion

Page 49: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Basic decision theory

Page 50: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Decision procedure

Page 51: STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS

Interpretations of level-α tests