Anova single factor
Uploaded by dhruv-patel
ANOVA
One-Way Single Factor Models
KARAN DESAI - 11BIE001
DHRUV PATEL - 11BIE024
VISHAL DERASHRI - 11BIE030
HARDIK MEHTA - 11BIE037
MALAV BHATT - 11BIE056
DEFINITION
Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures (such as the "variation" among and between groups) used to analyze the differences between group means. It was developed by R. A. Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation.
Sir Ronald Aylmer Fisher FRS was an English statistician, evolutionary biologist, geneticist, and eugenicist.
Why ANOVA
• Compare the means of more than two populations?
• Compare populations each containing several subgroups or levels?
Problem with multiple t-tests
• One problem with this approach is that the number of tests grows as the number of groups increases.
• The probability of making a Type I error increases as the number of tests increases.
• If the probability of a Type I error for each test is set at 0.05 and 10 independent t-tests are done, the overall probability of a Type I error for the set of tests is $1 - (0.95)^{10} \approx 0.40$ instead of 0.05.
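The 0.40 figure can be reproduced directly. This minimal sketch assumes the tests are independent (pairwise t-tests that share groups are not, so treat it as an approximation) and counts every pairwise comparison among k groups; the function name `familywise_error` is illustrative, not from the slides.

```python
from math import comb


def familywise_error(alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across num_tests
    independent tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** num_tests


# Comparing every pair of k groups needs comb(k, 2) separate t-tests
for k in range(2, 7):
    m = comb(k, 2)
    print(f"k = {k} groups -> {m:2d} t-tests -> "
          f"P(at least one Type I error) = {familywise_error(0.05, m):.2f}")
```

With five groups there are 10 pairwise tests, which is where the slide's "10 t-tests" and 0.40 come from.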
In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. Because doing multiple two-sample t-tests increases the chance of committing a Type I error, ANOVA is useful for comparing (testing) three or more means (groups or variables) for statistical significance.
• Another way to describe the multiple comparisons problem is to think about the meaning of an alpha level of 0.05.
• Alpha = 0.05 implies that, when the null hypothesis is true, chance alone will produce a Type I error in about one of every 20 tests: 1/20 = 0.05.
• That is, by chance a true null hypothesis will be incorrectly rejected about once in every 20 tests.
• As the number of tests increases, the probability of finding a ‘significant’ result by chance increases.
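The "one in 20" reading of alpha can be checked by simulation. This sketch assumes each test statistic is continuous, so that under a true null hypothesis its p-value is uniform on (0, 1); it draws those p-values directly instead of simulating raw data. The function name `simulate_type_i` is hypothetical.

```python
import random


def simulate_type_i(num_tests: int, alpha: float = 0.05,
                    trials: int = 50_000, seed: int = 1) -> float:
    """Fraction of simulated studies with at least one false rejection
    when every null hypothesis in the study is true.

    Assumption: under H0 each p-value is Uniform(0, 1), so a test
    falsely rejects exactly when its p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


print(simulate_type_i(1))   # close to 0.05: about one false rejection per 20 tests
print(simulate_type_i(10))  # close to 0.40 when 10 independent tests are run
```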
Importance of ANOVA
• ANOVA is an important test because it lets us compare, for example, how effective two or more different types of treatment are and how durable their effects are.
• In effect, an ANOVA can tell us whether treatments differ in how well they work, how long their effects last, and how budget-friendly they are, provided each of these is measured as the response variable.
CLASSIFICATION OF ANOVA MODEL
1. Fixed-effects models: The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.
2. Random-effects models: Random-effects models are used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.
3. Mixed-effects models: A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.
Example: Teaching experiments could be performed by a university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.
ASSUMPTIONS
Normally distributed populations
Variances of dependent variable are equal in all populations
Random samples; independent scores
One-Way Single Factor ANOVA
ONE-WAY ANOVA
One factor (manipulated variable)
One response variable
Two or more groups to compare
USEFULNESS
Similar to t-test
More versatile than t-test
Compare one parameter (response variable) between two or more groups
Remember that…
Standard deviation (s):
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Degrees of freedom (df): df = number of observations (or groups) - 1; for s above, df = n - 1.
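The formula for s translates directly into code. This sketch uses made-up values (not from the slides) and compares the hand-written version with Python's `statistics.stdev`, which also divides by n - 1; `sample_sd` is an illustrative name.

```python
import math
import statistics


def sample_sd(data: list[float]) -> float:
    """Sample standard deviation, dividing by n - 1 degrees of freedom."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
print(sample_sd(data))           # squared deviations sum to 32; 32 / 7, then sqrt
print(statistics.stdev(data))    # standard library result for comparison
```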
ANOVA
ANOVA (ANalysis Of VAriance) is a natural extension of the two-sample t-test, used to compare the means of more than two populations.
Basic question: even if the true means of the k populations were equal (i.e. μ1 = μ2 = … = μk), we cannot expect the sample means (x̄1, x̄2, x̄3, x̄4) to be equal. So when we get different values for the x̄'s, how much of the difference is due to randomness, and how much is due to the fact that we are sampling from different populations with possibly different μj's?
ANOVA TERMINOLOGY
Response variable (y): what we are measuring
Experimental units: the individual units that we will measure
Factors: independent variables whose values can change to affect the outcome of the response variable, y
Levels of factors: values of the factors
Treatments: the combination of the levels of the factors applied to an experimental unit
Example: We want to know how combinations of different amounts of water (1 ac-ft, 3 ac-ft, 5 ac-ft) and different fertilizers (A, B, C) affect crop yields.
Response variable: crop yield (bushels/acre)
Experimental unit: each acre that receives a treatment
Factors (2): water and fertilizer
Levels (3 for water; 3 for fertilizer): Water: 1, 3, 5; Fertilizer: A, B, C
Treatments (9 = 3x3): 1A, 3A, 5A, 1B, 3B, 5B, 1C, 3C, 5C
Total Treatments
                      Fertilizer
Water         A              B              C
1 ac-ft       Treatment 1    Treatment 2    Treatment 3
3 ac-ft       Treatment 4    Treatment 5    Treatment 6
5 ac-ft       Treatment 7    Treatment 8    Treatment 9
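The number of treatments is the product of the numbers of levels of each factor. This short sketch (variable names are illustrative) enumerates the combinations with `itertools.product`; iterating over water first reproduces the row-by-row treatment numbering of the water/fertilizer table. With a single factor, the treatments would simply be that factor's levels.

```python
from itertools import product

water_levels = [1, 3, 5]              # acre-feet of water
fertilizer_levels = ["A", "B", "C"]

# Each treatment combines one water level with one fertilizer level
treatments = [f"{w}{f}" for w, f in product(water_levels, fertilizer_levels)]

for number, label in enumerate(treatments, start=1):
    print(f"Treatment {number}: {label}")
```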
Single Factor ANOVA: Basic Assumptions
If we focus on only one factor (e.g. fertilizer type in the previous example), this is called single factor ANOVA. In this case, levels and treatments are the same thing since
there are no combinations between factors.
Assumptions for Single Factor ANOVA
1. The distribution of each population in the comparison is normal.
2. The standard deviations of the populations (although unknown) are assumed to be equal (i.e. σ1 = σ2 = … = σk).
3. Sampling is random and independent.
Example: The university would like to know whether the delivery mode of the introductory statistics class affects performance in the class, as measured by scores on the final exam.
The class is given in four different formats: lecture, text reading, videotape, and Internet.
The final exam scores from random samples of students in each of the four teaching formats were recorded.
Samples
Summary
• There is a single factor under observation: teaching format.
• There are k = 4 different treatments (levels of teaching format).
• The numbers of observations (experimental units) are n1 = 7, n2 = 8, n3 = 6, n4 = 5; total number of observations, n = 26.
• Grand mean (of all 26 observations): x̄ = 72
• Treatment means: x̄1 = 76, x̄2 = 65, x̄3 = 75, x̄4 = 74
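The grand mean can be recovered from the summary alone, because it is the average of the treatment means weighted by sample size. A small sketch using only the n_j and x̄_j values listed above:

```python
sizes = [7, 8, 6, 5]      # n1..n4 for the four teaching formats
means = [76, 65, 75, 74]  # treatment means x-bar_1..x-bar_4

n = sum(sizes)
# Weighting each treatment mean by its sample size is equivalent to
# averaging all n individual exam scores.
grand_mean = sum(nj * mj for nj, mj in zip(sizes, means)) / n
print(n, grand_mean)  # 26 72.0
```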
Why aren’t all the x̄’s the same?
• There is variability due to the different treatments: Between Treatment Variability (Treatment).
• There is variability due to randomness within each treatment: Within Treatment Variability (Error).
BASIC CONCEPT
If the average Between Treatment Variability is “large” compared to the average Within Treatment Variability, we can reasonably conclude that there really are differences among the population means (i.e. at least one μj differs from the others).
Basic Questions
Given this basic concept, the natural questions are:
• What is “variability” due to treatment and due to error, and how is it measured?
• What is “average variability” due to treatment and due to error, and how is it measured?
• What is “large”? How much larger than the observed average variability due to error does the observed average variability due to treatment have to be before we are convinced that there are differences in the true population means (the µ’s)?
How Is “Total” Variability Measured?
Variability is defined as the sum of squared deviations (from the grand mean). So,
• SST (Total Sum of Squares): sum of squared deviations of all observations from the grand mean. (McClave uses SSTotal.)
• SSTr (Between Treatment Sum of Squares): sum of squared deviations due to different treatments. (McClave uses SST.)
• SSE (Within Treatment Sum of Squares): sum of squared deviations due to error.
$\text{SST} = \text{SSTr} + \text{SSE}$
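The identity SST = SSTr + SSE can be checked numerically. This sketch uses a tiny made-up data set (not the exam scores, which the transcript does not include) and follows the slides' naming, where SST is the total sum of squares; `sums_of_squares` is an illustrative name.

```python
def sums_of_squares(groups: list[list[float]]) -> tuple[float, float, float]:
    """Return (SST, SSTr, SSE) for a one-way layout given raw observations."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # Total: every observation against the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    # Between treatments: each group mean against the grand mean, weighted by group size
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within treatments (error): each observation against its own group mean
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return sst, sstr, sse


toy = [[1, 2, 3], [4, 5, 6]]  # illustrative data only
sst, sstr, sse = sums_of_squares(toy)
print(sst, sstr, sse)  # 17.5 13.5 4.0
```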
How is “Average” Variability Measured?
“Average” variability is measured by mean square values (MSTr and MSE), found by dividing SSTr and SSE by their respective degrees of freedom.
ANOVA TABLE
Variability                 SS      DF                           Mean Square (MS)
Between Tr. (Treatment)     SSTr    k - 1 (# treatments - 1)     MSTr = SSTr/DFTr
Within Tr. (Error)          SSE     n - k (DFT - DFTr)           MSE = SSE/DFE
TOTAL                       SST     n - 1 (# observations - 1)
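The table's degrees of freedom and mean squares follow mechanically from SSTr, SSE, k and n. A minimal sketch, applied to SSTr = 578 and SSE = 3816 from the teaching-format example; the function `anova_table` and its dictionary layout are illustrative choices, not a standard API.

```python
def anova_table(sstr: float, sse: float, k: int, n: int) -> dict[str, dict[str, float]]:
    """Fill in SS, DF and MS for a single-factor ANOVA table.

    k = number of treatments, n = total number of observations."""
    df_tr, df_e = k - 1, n - k  # the two DF add up to n - 1
    return {
        "Treatment": {"SS": sstr, "DF": df_tr, "MS": sstr / df_tr},
        "Error": {"SS": sse, "DF": df_e, "MS": sse / df_e},
        "Total": {"SS": sstr + sse, "DF": n - 1},
    }


table = anova_table(sstr=578, sse=3816, k=4, n=26)
for source, row in table.items():
    print(source, row)
```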
Formula for Calculating SST
Calculating SST
Just like the numerator of the variance, assuming all (26) entries come from one population:
$\text{SST} = \sum_{j}\sum_{i} (x_{ij} - \bar{x})^2 = (82 - 72)^2 + \dots + (81 - 72)^2 = 4394$
Formula for Calculating SSTr
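Calculating SSTr (Between Treatment Variability)
Replace all entries within each treatment by its mean; now all the variability is between (not within) treatments.
$\text{SSTr} = \sum_{j} n_j (\bar{x}_j - \bar{x})^2 = 7(76 - 72)^2 + 8(65 - 72)^2 + 6(75 - 72)^2 + 5(74 - 72)^2 = 578$
Scores after replacement:
Treatment 1 (n1 = 7): 76 76 76 76 76 76 76
Treatment 2 (n2 = 8): 65 65 65 65 65 65 65 65
Treatment 3 (n3 = 6): 75 75 75 75 75 75
Treatment 4 (n4 = 5): 74 74 74 74 74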
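Because every score is replaced by its treatment mean, SSTr needs only the summary statistics. A short sketch of the same arithmetic as the SSTr formula:

```python
sizes = [7, 8, 6, 5]      # n_j for the four teaching formats
means = [76, 65, 75, 74]  # treatment means x-bar_j
grand_mean = 72           # grand mean of all 26 scores

# Each treatment contributes n_j * (x-bar_j - grand mean)^2
contributions = [nj * (mj - grand_mean) ** 2 for nj, mj in zip(sizes, means)]
sstr = sum(contributions)
print(contributions, sstr)  # [112, 392, 54, 20] 578
```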
Formula for Calculating SSE
Calculating SSE (Within Treatment Variability)
The difference between SST and SSTr:
$\text{SSE} = \text{SST} - \text{SSTr} = 4394 - 578 = 3816$
Can we Conclude a Difference Among the 4 Teaching Formats?
We conclude that at least one population mean differs from the others if the average between treatment variability is large compared to the average within treatment variability, that is if MSTr/MSE is “large”.
When H0 is true, the ratio of the two measures of variability for these normally distributed random variables has an F distribution, and the F-statistic (= MSTr/MSE) is compared to a critical F-value from an F distribution with:
• Numerator degrees of freedom = DFTr
• Denominator degrees of freedom = DFE
If the ratio of MSTr to MSE (the F-statistic) exceeds the critical F-value, we can conclude that at least one population mean differs from the others.
Can We Conclude Different Teaching Formats Affect Final Exam Scores?
The F-test
H0: μ1 = μ2 = μ3 = μ4
HA: At least one μj differs from the others
Select α = .05.
Reject H0 (accept HA) if:
$F = \frac{\text{MSTr}}{\text{MSE}} > F_{\alpha,\,\text{DFTr},\,\text{DFE}} = F_{.05,\,3,\,22} = 3.05$
Hand Calculations for the F-test
$\text{MSE} = \frac{\text{SSE}}{\text{DFE}} = \frac{3816}{22} = 173.45$
$\text{MSTr} = \frac{\text{SSTr}}{\text{DFTr}} = \frac{578}{3} = 192.67$
$F = \frac{\text{MSTr}}{\text{MSE}} = \frac{192.67}{173.45} = 1.11 < 3.05$
Cannot conclude there is a difference among the μj’s.
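The hand calculation can be scripted in a few lines. This sketch takes SSTr, SSE and the sample sizes from the example and uses the table value 3.05 as the critical F, exactly as the slide does; it does not compute the critical value itself.

```python
sstr, sse = 578, 3816
k, n = 4, 26
df_tr, df_e = k - 1, n - k
f_critical = 3.05  # F(.05, 3, 22) read from an F table, as on the slide

mstr = sstr / df_tr
mse = sse / df_e
f_stat = mstr / mse
print(f"MSTr = {mstr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
if f_stat > f_critical:
    print("Reject H0: at least one teaching format mean differs")
else:
    print("Cannot conclude there is a difference among the means")
```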
Excel Approach
EXCEL OUTPUT
p-value = .365975 > .05: cannot conclude differences.
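Both Excel's p-value and the table value 3.05 come from the F distribution with 3 and 22 degrees of freedom. Without Excel or a statistics library, they can be approximated by numerically integrating the F density. This is a sketch only: it uses Simpson's rule after the substitution x = t², which is not how Excel computes these values, and the function names are hypothetical. In practice a library routine such as SciPy's `scipy.stats.f` (`sf`, `ppf`) is the usual choice.

```python
import math


def f_cdf(x: float, d1: int, d2: int, steps: int = 2000) -> float:
    """Approximate P(F <= x) for F ~ F(d1, d2) with Simpson's rule.

    Integrates the F density after substituting x = t**2, which keeps the
    integrand finite at 0 even when d1 = 1. `steps` must be even."""
    if x <= 0:
        return 0.0
    # Normalising constant of the F density, computed on the log scale
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    c = math.exp(log_c)

    def integrand(t: float) -> float:
        # density(t**2) * d(t**2)/dt, with the powers of t combined
        return 2 * c * t ** (d1 - 1) * (1 + d1 * t * t / d2) ** (-(d1 + d2) / 2)

    upper = math.sqrt(x)
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def f_p_value(f_stat: float, d1: int, d2: int) -> float:
    """Upper-tail probability P(F >= f_stat)."""
    return 1 - f_cdf(f_stat, d1, d2)


def f_critical(alpha: float, d1: int, d2: int) -> float:
    """Value with upper-tail probability alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1 - alpha:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


f_stat = (578 / 3) / (3816 / 22)
print(f"critical F(.05, 3, 22) = {f_critical(0.05, 3, 22):.2f}")  # F table gives 3.05
print(f"p-value = {f_p_value(f_stat, 3, 22):.4f}")                # Excel reports .365975
```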
REVIEW
• ANOVA situation and terminology: response variable, experimental units, factors, levels, treatments, error
• Basic concept: if the “average variability” between treatments is “a lot” greater than the “average variability” due to error, conclude that at least one mean differs from the others.
• Single factor analysis: by hand; by Excel