Probability & Statistical Inference Lecture 6
MSc in Computing (Data Analytics)
Lecture Outline
Quick Recap
Testing the difference between two sample means
Practical Hypothesis Testing
Analysis of Variance
General Steps in Hypothesis Testing
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, H0.
3. Specify an appropriate alternative hypothesis, H1.
4. Choose a significance level, α.
5. Determine an appropriate test statistic.
6. State the rejection region for the statistic.
7. Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value.
8. Decide whether or not H0 should be rejected and report that in the problem context.
Types of questions that can be answered with two-sample hypothesis tests
A manufacturing plant wants to compare the defective rate of items coming off two different process lines.
Whether the test results of patients who received a drug are better than the test results of those who received a placebo.
Whether there is a significant (or only random) difference in the average cycle time to deliver a pizza from Pizza Company A vs. Pizza Company B.
Difference in Means of Two Normal Distributions, Variances Known
Test Assumptions
Example
Example
Example
The P-value is the exact significance level of a statistical test; that is, the probability of obtaining a value of the test statistic that is at least as extreme as the one observed when the null hypothesis is true.
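As a rough illustration of how a P-value is obtained when both variances are known, the sketch below computes the two-sample z statistic and its two-sided P-value. The function name, the `delta0` parameter, and the summary figures are hypothetical and are not taken from the lecture's example slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Return (z, two-sided P-value) for H0: mu1 - mu2 = delta0, variances known."""
    # Standard error of the difference between the two sample means.
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (xbar1 - xbar2 - delta0) / se
    # Probability of a statistic at least as extreme as |z| under H0.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary data (assumed values, for illustration only).
z, p_value = two_sample_z_test(xbar1=121.0, xbar2=112.0,
                               sigma1=8.0, sigma2=8.0, n1=10, n2=10)
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
```

For a one-sided alternative the factor of 2 would be dropped and the tail chosen to match H1.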
Confidence Interval on a Difference in Means, Variances Known
Example
Example
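A confidence interval on the difference in means with known variances can be sketched in the same way. The helper name and the numbers below are assumptions made for illustration; the interval is the usual difference of sample means plus or minus a normal critical value times the standard error.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Two-sided confidence interval on mu1 - mu2 when both variances are known."""
    alpha = 1.0 - confidence
    # Upper alpha/2 critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_crit * sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Hypothetical summary data (assumed values, for illustration only).
lower, upper = z_confidence_interval(87.6, 74.5, 1.0, 1.5, 10, 12, confidence=0.90)
print(f"90% CI on mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

If the interval excludes zero, the corresponding two-sided test at the same significance level would reject H0: mu1 = mu2.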
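Difference in Means of Two Normal Distributions, Variances unknown
We wish to test:
$H_0: \mu_1 - \mu_2 = \Delta_0$ against $H_1: \mu_1 - \mu_2 \neq \Delta_0$
The pooled estimator of $\sigma^2$:
$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$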
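The pooled estimator and the resulting t statistic can be sketched as follows, assuming equal (but unknown) variances in the two populations. The function name and the `delta0` parameter are hypothetical; the two samples reuse the small Group A / Group B data set from the ANOVA part of this lecture. Only the statistic and its degrees of freedom are computed here; the P-value would come from a t table or a software package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2, delta0=0.0):
    """Return (pooled variance, t statistic, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    # Sample variances use the n - 1 divisor.
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t0 = (mean(sample1) - mean(sample2) - delta0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return sp_sq, t0, n1 + n2 - 2


group_a = [6.0, 4.0, 2.0]
group_b = [14.0, 12.0, 10.0]
sp_sq, t0, df = pooled_t_statistic(group_a, group_b)
print(f"pooled variance = {sp_sq}, t0 = {t0:.3f}, df = {df}")
```

For these two groups the squared t statistic equals 24, the same value as the F statistic in the lecture's ANOVA table, which is expected when only two groups are compared.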
Difference in Means of Two Normal Distributions, Variances unknown
Example
Example
Example
Confidence Interval on the Difference in Means, Variance Unknown
Example
Example
Example
Practical Hypothesis Testing
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, H0.
3. Specify an appropriate alternative hypothesis, H1.
4. Choose a significance level, α.
5. Calculate the P-value using a software package of choice.
6. Decide whether or not H0 should be rejected and report that in the problem context.
Reject H0 when the P-value is less than α. (Golden rule: reject H0 for small P-values.)
Some Research
Look up the correct formula for calculating the hypothesis test between two proportions.
What are the assumptions for the test?
Find an example of the research.
Analysis of Variance
Introduction
In the previous section we were concerned with the analysis of data where we compared two sample means.
Frequently data contain more than two samples; for example, they may compare several treatments.
In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called 'Analysis of Variance', or ANOVA for short.
Total Sum of Squares
Data set: 14, 12, 10, 6, 4, 2
Group A: 6, 4, 2
Group B: 14, 12, 10
Overall Mean: 8
Total Sum of Squares:
$SST = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$
Between Group Variation
Sum of Squares of the Model:
$SSM = n_a(\mu - \mu_a)^2 + n_b(\mu - \mu_b)^2 = 3(8-4)^2 + 3(8-12)^2 = 96$
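Within Group Variation
Sum of Squares of the Error:
$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
$= (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2 = 16$
where $k$ is the number of groups, $n_i$ is the number of observations in group $i$, and $\bar{x}_i$ is the mean of group $i$.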
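The three sums of squares for this small data set can be checked with a few lines of code. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from statistics import mean

# Data from the lecture example.
groups = {"A": [6, 4, 2], "B": [14, 12, 10]}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Total variation: every observation against the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)
# Between-group variation: each group mean against the overall mean.
ssm = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Within-group variation: every observation against its own group mean.
sse = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

print(sst, ssm, sse)
```

The total splits exactly into the two parts: 112 = 96 + 16.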
Structure of the Data
Group   Observations              Total   Mean
1       x_11  x_12  ...  x_1n     x_1.    x̄_1.
2       x_21  x_22  ...  x_2n     x_2.    x̄_2.
...     ...                       ...     ...
a       x_a1  x_a2  ...  x_an     x_a.    x̄_a.
Total                             x_..    x̄_..
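ANOVA Table
Source   Degrees of Freedom   Sum of Squares   Mean Square     F-Stat
Model    a - 1                SSM              SSM / (a - 1)   MSM / MSE
Error    n - a                SSE              SSE / (n - a)
Total    n - 1                SST              SST / (n - 1)
The sums of squares are:
$SST = \sum_{i=1}^{n} (x_i - \bar{x})^2$ (summing over all $n$ observations)
$SSM = \sum_{i=1}^{a} n_i (\bar{x}_i - \bar{x})^2$
$SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
Where: n is the total sample size, a is the number of groups, $n_i$ is the number of observations in group $i$, $\bar{x}_i$ is the mean of group $i$, and $\bar{x}$ is the overall mean.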
ANOVA Table – Original Example
Source   Degrees of Freedom   Sum of Squares   Mean Square   F-Stat
Model    2 - 1 = 1            96               96            24
Error    6 - 2 = 4            16               4
Total    6 - 1 = 5            112
Where: n is the sample size and a is the number of groups
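The table for the original example can be reproduced with a short function. This is a sketch only; the function name and the dictionary keys are hypothetical, and the P-value for the F statistic is left to an F table or a software package.

```python
from statistics import mean


def one_way_anova(groups):
    """Build the one-way ANOVA table entries for a list of groups."""
    a = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total sample size
    grand_mean = mean(x for g in groups for x in g)

    ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_model = ss_model / (a - 1)
    ms_error = ss_error / (n - a)
    return {
        "df_model": a - 1, "df_error": n - a, "df_total": n - 1,
        "ss_model": ss_model, "ss_error": ss_error,
        "ss_total": ss_model + ss_error,
        "ms_model": ms_model, "ms_error": ms_error,
        "F": ms_model / ms_error,
    }


table = one_way_anova([[6, 4, 2], [14, 12, 10]])
for name, value in table.items():
    print(f"{name:>9}: {value}")
```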
Model Assumptions
Independence of observations within and between samples
Normality of the sampling distribution
Equal variance (also called the homoscedasticity assumption)
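The ANOVA Equation
We can describe the observations in the above table using the following equation:
$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$
Where: $\mu$ is the overall mean, $\tau_i$ is the effect of group $i$, $\epsilon_{ij}$ is a random error term, a is the number of groups and n is the number of observations in each group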
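One way to get a feel for this model is to simulate from it: each observation is an overall mean, plus a group effect, plus random noise. The sketch below assumes normally distributed errors with a common standard deviation; the function name and all parameter values are hypothetical.

```python
import random


def simulate_one_way(mu, taus, n, sigma, seed=0):
    """Simulate n observations per group: overall mean + group effect + error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        [mu + tau + rng.gauss(0.0, sigma) for _ in range(n)]
        for tau in taus
    ]


# Assumed values: overall mean 8, group effects -4 and +4, error sd 2.
data = simulate_one_way(mu=8.0, taus=[-4.0, 4.0], n=3, sigma=2.0)
for i, group in enumerate(data, start=1):
    print(f"group {i}:", [round(y, 2) for y in group])
```

Setting `sigma=0.0` removes the error term, so every observation in a group collapses onto that group's mean.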
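ANOVA Hypotheses
We wish to test the hypotheses:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$ (all group means are equal)
$H_1: \mu_i \neq \mu_j$ for at least one pair $(i, j)$
The analysis of variance partitions the total variability into two parts, the variation between groups and the variation within groups: $SST = SSM + SSE$.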
Example
Graphical Display of Data
Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment
Example
We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are:
The ANOVA table is below:
Example
The p-value is less than 0.05; therefore H0 can be rejected and we can conclude that at least one of the hardwood concentrations affects the mean tensile strength of the paper.
Test Model Assumptions
Use Bartlett's test to check the homoscedasticity assumption.
Bartlett's test (Snedecor and Cochran, 1983) is used to test if k samples have equal variances.
Bartlett's test is sensitive to departures from normality. That is, if your samples come from non-normal distributions, then Bartlett's test may simply be testing for non-normality. The Levene test is an alternative to the Bartlett test that is less sensitive to departures from normality.
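Bartlett Test for Equal Variance
The hypotheses for the Bartlett test are as follows:
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$
$H_1: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$
where k is the number of samples.
The Bartlett test statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom when H0 is true.
Interpret the p-value like any other hypothesis test.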
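For reference, the Bartlett statistic itself can be computed by hand as sketched below. The function name is hypothetical, and the formula used is the commonly published form (pooled log-variance minus the weighted group log-variances, divided by a correction factor). The P-value is not computed here; it would come from a chi-squared table or a software package.

```python
from math import log
from statistics import variance


def bartlett_statistic(groups):
    """Return (Bartlett test statistic, degrees of freedom) for k groups.

    Every group needs at least two observations and a non-zero variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # n - 1 divisor

    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1.0 + (
        sum(1.0 / (n - 1) for n in sizes) - 1.0 / (n_total - k)
    ) / (3.0 * (k - 1))
    return numerator / correction, k - 1


# Lecture data: both groups have sample variance 4, so the statistic is zero.
stat, df = bartlett_statistic([[6.0, 4.0, 2.0], [14.0, 12.0, 10.0]])
# Hypothetical data with clearly different spreads.
stat_unequal, _ = bartlett_statistic([[1.0, 2.0, 3.0], [0.0, 5.0, 10.0]])
print(stat, df, stat_unequal)
```

A statistic near zero is consistent with equal variances; larger values count as evidence against H0.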
If the Assumption of Equal Variance is not met
If the assumption of equal variance is not met, use Welch's ANOVA.
Assignment for next week: Investigate the difference between the standard ANOVA and Welch's ANOVA.
Demo
Confidence Interval about the mean
For 20% hardwood, the resulting confidence interval on the mean is
Confidence Interval on the difference of two treatments
For the hardwood concentration example,
An Unbalanced Experiment
Multiple Comparisons Following the ANOVA
The least significant difference (LSD) is
If the sample sizes are different in each treatment:
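The LSD comparison can be sketched as below, using the general form for possibly unequal sample sizes: a t critical value times the square root of MSE(1/n_i + 1/n_j). The function name is hypothetical, and the critical value is passed in by hand (here 2.776, the tabulated upper 2.5% point of the t distribution with 4 degrees of freedom) rather than computed. The MSE and group means are those of the small two-group example from earlier in the lecture.

```python
from math import sqrt


def least_significant_difference(t_crit, mse, n_i, n_j):
    """LSD for comparing two treatment means with sample sizes n_i and n_j."""
    return t_crit * sqrt(mse * (1.0 / n_i + 1.0 / n_j))


# Assumption: critical value looked up in a t table (alpha = 0.05, error df = 4).
T_CRIT = 2.776

lsd = least_significant_difference(T_CRIT, mse=4.0, n_i=3, n_j=3)
# Two means differ significantly when their absolute difference exceeds the LSD.
significant = abs(4.0 - 12.0) > lsd
print(f"LSD = {lsd:.3f}, significant difference: {significant}")
```

With equal sample sizes n in every treatment the expression reduces to t times the square root of 2 MSE / n.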
Example: Multi-comparison Test
Example: Multi-comparison Test
Demo
Exercises