The DMAIC Lean Six Sigma Project and Team Tools Approach
Analyze Phase (Part 2)
Lean Six Sigma Black Belt Training! Analyze (Part 2) Agenda
• Review Analyze Part 1
• Inferential Statistics
• Hypothesis Testing
• P-values
• Discrete X / Continuous Y Statistical Tests
• Continuous X / Continuous Y Statistical Tests
• Discrete X / Discrete Y Statistical Tests
• Applications / Lessons Learned / Conclusions
• Next Steps
Six Sigma Analyze: Inferential Statistics
(Identifying What’s Different (Xs) Statistically)
Introduction to Hypothesis Testing
[Figure: two normal distribution curves. Sample 1: Normal, Mean=6.5, StDev=1. Sample 2: Normal, Mean=6.9, StDev=1.2.]
Are these samples from the same population?
Intro. to Confidence Intervals (pg. 157)
• Brutal Facts Regarding Samples
– The size of the sampling error depends primarily on the variation in the population and the size of the sample selected.
– Larger samples have a smaller margin of error, yet are more costly to obtain.
– In practice, usually only one sample is selected, and it is usually the minimum size required.
– Therefore, a method was needed to estimate a population parameter from a single sample. That method is the Confidence Interval.
Intro. to Confidence Intervals (pg. 157)
• A statistic plus or minus a margin of error is called a confidence interval.
• A confidence interval is a range of values, calculated from a data set, that gives an assigned probability that the true value falls within that range.
• The confidence level depends on the margin of error that is selected. Generally, the accepted margin of error is plus or minus 2 standard errors, which gives approximately a 95% confidence level.
• “We are 95% confident that the true average door-to-balloon time is between 60 and 100 minutes.”
Assume we have a population of size N that is not normally distributed.
We draw 100 random samples and plot the average of each sample.
The sample averages form an approximately normal distribution with a mean of 50 (n=100).
How confident are we of where the population mean lies?
Similar to standard deviation, we know that about 68% of the sample distribution lies within 1 standard error (SE) and about 95% within 2 standard errors.
[Figure: normal curve centered at 50, with the 68% band marked from -1 SE to +1 SE and the 95% band from -2 SE to +2 SE.]
Let’s assume we want to be 95% confident of where the true mean of the population lies.
We can be 95% confident that the true mean lies within +/- 2 SE, where
$$SE = \frac{\sigma}{\sqrt{n}}$$
In this case, let’s assume that SE=3, so 2 x SE = 6.
• The mean of our sample distribution is 50.
• We are 95% confident that the true mean of the population lies between 44 and 56.
• Our margin of error is +/- 6.
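The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the exponential population (mean 50) used in the simulation is an assumption chosen because it is clearly non-normal, not something taken from the course material.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def confidence_interval(mean, se, multiplier=2.0):
    """Return (lower, upper) as mean +/- multiplier * SE."""
    margin = multiplier * se
    return mean - margin, mean + margin


# Slide values: mean = 50 and SE = 3, so the margin of error is 6.
low, high = confidence_interval(50, 3)
print(low, high)  # 44.0 56.0

# Hypothetical non-normal population: exponential with mean 50 (and sigma 50).
random.seed(1)


def sample_mean(n):
    """Average of n draws from the assumed exponential population."""
    return statistics.fmean(random.expovariate(1 / 50) for _ in range(n))


# Draw many samples of size 100 and look at the spread of their averages.
means = [sample_mean(100) for _ in range(2000)]
observed_se = statistics.stdev(means)
theoretical_se = standard_error(50, 100)
print(f"observed SE = {observed_se:.2f}, sigma/sqrt(n) = {theoretical_se:.2f}")
```

The spread of the simulated sample averages should land close to sigma/sqrt(n), even though the individual values are far from normally distributed.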
Central Limit Theorem / Margin of Error / Confidence Intervals
• Why use it? Why is this important?
– Six Sigma practitioners use the sample data and apply normal theory to make inferences about population parameters, irrespective of the actual form of the parent population.
– Many statistical tests are founded on the principle that we do not need to know the original distribution. Sample means and proportions will be approximately “normal” if n is big enough.
– Practically, we use the central limit theorem to help us estimate the true average, and calculate the likelihood of observing certain events.
– Considering time and resources, we need to have a measure of confidence around our sample statistics.
– None of this is applicable if your data is unreliable or BIASED!
Data-Driven Problem Solving: Hypothesis Testing
Two fundamental questions must be answered adequately before hypothesis testing can be performed:
– What type of data is available (and reliable)?
– What question are you asking (what do you need to understand)?
Introduction to Hypothesis Testing (pg. 156)
• Hypothesis testing is basically the process of using statistical analysis to determine if the observed differences between two or more sets of data are due to random chance variation, or due to true differences in the underlying populations.
• Generally, Hypothesis Testing tells us whether or not sets of data are truly different with a certain level of
confidence.
The Six Sigma Approach
Six Sigma applies many tools, including statistical tools, to practical problems. The key is data-driven decision making.
1. Practical Problem – An unacceptable variation or gap in quality. Example: Lab specimens are mislabeled too often, which leads to incorrect diagnosis and treatment.
2. Statistical Problem – Defining the problem in statistical terms. Example: Specimens are mislabeled 8 out of 10,000 collected.
3. Statistical Solution – Using data and statistics to understand the cause of the problem. Example: ~85% of mislabeled specimens come from the ED.
4. Practical Solution – Addresses the verified root causes. Example: Redesign of the process of labeling and transporting specimens leads to dramatic reduction in errors.
Introduction to Hypothesis Testing
• Hypothesis Testing allows us to answer a practical question: Is there a true difference between ___ and ___ ?
• Practically, Hypothesis Testing uses relatively small sample sizes to answer questions about the population.
• There is always a chance that the samples we have collected are not truly representative of the population. Thus, we may obtain a wrong conclusion about the population(s) being studied.
Introduction to Hypothesis Testing: Testing Terms and Concepts
• Statistically, we “ask and answer questions” using stated hypotheses that are tested at some level of confidence.
• The null hypothesis (Ho) is a statement being tested to determine whether or not it is true (the assumption that there is no difference).
• The alternative hypothesis (Ha) is the statement taken to represent reality if there is enough evidence to reject the stated null (Ho), i.e. the null hypothesis is judged to be false.
Introduction to Hypothesis Testing: Example
Is the average Length of Stay for a total knee replacement different for Hospital A vs. Hospital B?
Common Language:
Ho: There is no difference in average length of stay between facilities.
Ha: There is a difference in average length of stay between facilities.
Statistical Language:
Ho: Alos = Blos
Ha: Alos ≠ Blos
18
Introduction to Hypothesis Testing: Type I and Type II Errors (Risk)
• As stated earlier, there is the risk of arriving at a wrong conclusion about the hypothesis we are testing. The two types of error that can occur with hypothesis testing are called Type I and Type II. The associated risks are called Alpha and Beta risks.
• A Type I (Alpha) error is concluding there is a difference when there really isn’t one: rejecting the null when you should not!
• A Type II (Beta) error is concluding there is not a difference when there really is one: failing to reject the null when you should!
Type I and Type II errors, Confidence, Power, and p-values
The True Statement vs. Conclusion Drawn:
                 Reject H0                 Do not reject H0
H0 is true       Type I Error (α risk)     Correct
H0 is false      Correct                   Type II Error (β risk)
• Type I Error: You conclude there IS a difference when there really isn’t.
• Type II Error: You conclude there is NO difference when there really is.
Type I and Type II errors in the Justice System
Result Matrix (Ho: No difference between the accused and an innocent person)
True State \ Verdict      Guilty                       Acquittal
Did not commit crime      Innocent person convicted    Innocent person acquitted
Committed crime           Guilty person convicted      Guilty person acquitted

Jury Trial vs. Hypothesis Testing
The Truth                            Verdict: Acquittal (Do not reject Ho)    Verdict: Guilty (Reject Ho)
Did not commit crime (Ho is true)    Correct                                  Type I error (α)
Committed crime (Ho is false)        Type II error (β)                        Correct
Introduction to p-value
• The p-value measures the probability of observing a difference at least as large as the one in the data if the null hypothesis is true.
• In comparing the average length of stay (ALOS) at Hospitals A and B, the p-value measures the likelihood of observing a difference in ALOS that large if the null hypothesis is true.
• If the p-value is large, then both averages probably came from the same population (i.e. there is no difference between ALOS at Hospital A and B).
• If the p-value is small, then it is unlikely both averages came from the same population (i.e. there is a difference between ALOS at Hospital A and B).
P-Value (pg. 160)
What’s the probability of getting a value of “40”?
[Figure: two distribution curves, each with a mean of 50 and the value 40 marked.]
Setting the Alpha threshold
• Alpha (α) is the level of risk you are willing to accept of making a Type I error (i.e. rejecting the null when the null is true).
• Traditionally, alpha (α) is set at 0.05, which means you are willing to accept a 5% chance of making a Type I error (i.e. rejecting the null when the null is true).
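One way to see what an alpha of 0.05 means is to simulate it. The sketch below is an illustration under stated assumptions, not part of the course material: both samples are drawn from the same normal population (so the null is true by construction), the population standard deviation is assumed known, and a two-sample z-test stands in for the t-test so that only the Python standard library is needed. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sample_z_p_value(a, b, sigma):
    """Two-sided p-value for a difference in means, assuming a known common sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, trials=2000, n=30, seed=7):
    """Fraction of trials that reject Ho even though Ho is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same population: any "difference" is chance.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if two_sample_z_p_value(a, b, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rejection rate should sit near the chosen alpha: roughly 1 test in 20 flags a difference that is not there.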
25
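P-Value
• Alpha is the critical value at which the null hypothesis is rejected: reject Ho when the p-value falls at or below it.
• “If p is low, Ho must go” (usually at or below 0.05)
[Figure: distribution centered on the mean, with a fail-to-reject region around the mean and a rejection region in each tail.]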
Hypothesis Testing – Basic Steps (see also pg 156-160)
1. State the practical problem
2. State the null hypothesis
3. State the alternate hypothesis
4. Test the assumptions of the data
5. Determine appropriate alpha (α) decision value
6. Calculate the appropriate test statistic and calculate p-value
7. If calculated p-value < α then reject Ho; if p-value > α then fail to reject Ho
8. Formulate the statistical conclusion into a practical solution
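Steps 5 through 7 can be sketched for the length-of-stay question. Everything numeric here is hypothetical (the means, standard deviations and sample sizes are invented for illustration), and the sketch uses a large-sample normal approximation rather than the 2-sample t-test a statistics package would run, so treat the p-value as approximate.

```python
import math
from statistics import NormalDist


def two_sided_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Approximate two-sided p-value for Ho: mean_a = mean_b (large samples)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Step 7: compare the p-value with alpha."""
    return "reject Ho" if p_value < alpha else "fail to reject Ho"


# Hypothetical summary data: length of stay in days for Hospitals A and B.
p = two_sided_p_value(mean_a=3.2, sd_a=1.1, n_a=120,
                      mean_b=3.6, sd_b=1.3, n_b=110)
decision = decide(p)
print(f"p-value = {p:.4f}: {decision}")
```

With these made-up numbers the p-value falls below 0.05, so the statistical conclusion would be that the average lengths of stay differ; step 8 is then a practical question of why.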
Hypothesis Testing Worksheet
• Identify data types
– Project Y / Project Y Data Type
– X Factor / X Data Type
• What hypothesis is being tested?
– Null hypothesis statement
– Alternate hypothesis statement
• Statistical test
• Assumptions
– Are the assumptions for this test met (if applicable)?
• Results
– P-value
– % Contribution of variation in X to variation in Y
– Accept alternate hypothesis / Reject alternate hypothesis
• Conclusions/Observations
Statistical Testing – Basic Steps
1. What theory or potential cause is presented or proposed?
2. Given the theory or potential cause in front of you, what is the question you are trying to answer?
3. Do you have data directly related to and describing the question you are asking? What type of data do you have?
4. If you do not have data, can you collect the appropriate data (reasonably and appropriately)? If no data exists relating to the theory being considered, or if it will be very costly to obtain, re-visit the magnitude and urgency of testing this particular theory. Proceed with data collection and sorting/grouping as needed.
5. State the question as a null hypothesis (There is no difference…)
6. State the alternate hypothesis
7. Test the assumptions of the data as needed (normality, quantity, variances, etc.)
8. Determine appropriate alpha (α) decision value (.05, etc.)
9. Choose and calculate the appropriate test statistic (determined by the data you have and the question you are asking) and the associated p-value
10. If calculated p-value < α then reject Ho; if p-value > α then fail to reject Ho
11. Formulate the statistical conclusion into a practical solution (answer to question)
Remember? - Data-Driven Problem Solving: Hypothesis Testing
Two fundamental questions must be answered adequately before hypothesis testing can be performed:
– What type of data is available (and reliable)?
– What question are you asking (what do you need to understand)?
What Type of Data to Analyze:
• Discrete X / Continuous Y
• Continuous X / Continuous Y
• Discrete X / Discrete Y
31
Reference Sheet: Statistical Test Selection and "p-values" interpretation (based on 95% Confidence)
Each entry lists Input (X) / Output (Y), the practical question being asked, the tool, the Minitab commands, and how to read the p-value.

• – / Continuous
– Question: Is my collected set of data normally distributed?
– Tool: Anderson Darling Normality Test
– Minitab: Stat > Basic Statistics > Display Descriptive Statistics
– P-Value < 0.05: You can be confident that your data is not Normally distributed.
– P-Value > 0.05: You can assume that your data is Normally distributed.

• Discrete / Continuous
– Question: Is the average of my sample the same as a given or known value?
– Tool: 1 Sample t-Test (against a known value)
– Minitab: Stat > Basic Statistics > 1 - Sample t
– P-Value < 0.05: You can be confident that your sample has a different average from the known test value.
– P-Value > 0.05: There is no difference between your sample average and the known test value (based on the data you have).

• Discrete / Continuous
– Question: Are the averages from 2 different sets of data the same?
– Tool: 2 Sample t-Test
– Minitab: Stat > Basic Statistics > 2 - Sample t
– P-Value < 0.05: You can be confident that the averages of the two samples are different.
– P-Value > 0.05: There is no difference between the averages of the two samples (based on the data you have).

• Discrete / Continuous
– Question: Are the averages from paired sets of data (e.g. before / after) the same?
– Tool: Paired t-Test
– Minitab: Stat > Basic Statistics > Paired t
– P-Value < 0.05: You can be confident that there is a consistent difference between the pairs of data.
– P-Value > 0.05: There is no consistent difference between the pairs of data (based on the data you have).

• Discrete / Continuous
– Question: Is there at least one average from several sets of data (>2) that is different?
– Tool: One Way ANOVA
– Minitab: Stat > ANOVA > One - Way
– P-Value < 0.05: You can be confident that at least one of the samples has a different average from the others.
– P-Value > 0.05: There is no difference in the averages of the samples (based on the data you have).

• Discrete / Continuous
– Question: Is there at least one median from several sets of data (>2) that is different?
– Tool: Kruskal Wallis & Mood's Median Test
– Minitab: Stat > Nonparametrics
– P-Value < 0.05: You can be confident that at least one of the samples has a different median from the others.
– P-Value > 0.05: There is no difference in the medians of the samples (based on the data you have).

• Discrete / Continuous
– Question: Is there at least one variance from several sets of data that is different?
– Tool: F-test, Levene's test, Bartlett's test
– Minitab: Stat > ANOVA > Test for equal variances
– P-Value < 0.05: You can be confident that at least one of your samples has a different standard deviation from the others.
– P-Value > 0.05: There is no difference between the standard deviations of the samples (based on the data you have).

• Discrete / Discrete
– Question: Is the proportion, or rate, from my sample the same as a given proportional value?
– Tool: 1 Proportion (against a known value)
– Minitab: Stat > Basic Statistics > 1 Proportion
– P-Value < 0.05: You can be confident that your sample has a different proportion from the known test value.
– P-Value > 0.05: There is no difference between your sample proportion and the known test value (based on the data you have).

• Discrete / Discrete
– Question: Are the proportions from 2 different sets of data the same?
– Tool: 2 Proportion
– Minitab: Stat > Basic Statistics > 2 Proportions
– P-Value < 0.05: You can be confident that the proportions from the two samples are different.
– P-Value > 0.05: There is no difference between the proportions from the two samples (based on the data you have).

• Discrete / Discrete
– Question: Is there at least one proportion from several sets of data that is different? Are observed frequencies the same as expected?
– Tool: Chi-Square
– Minitab: Stat > Tables > Cross Tabulation and Chi - Square
– P-Value < 0.05: You can be confident that at least one of the samples has a different proportion from the others.
– P-Value > 0.05: There is no difference in the proportions from the samples (based on the data you have).

• Continuous / Continuous
– Question: As one variable changes, can you predict the change in another (correlated) variable?
– Tool: Correlation (Pearson Coefficient)
– Minitab: Stat > Basic Statistics > Correlation
– P-Value < 0.05: You can be confident that there is a correlation (Pearson coefficient is not zero).
– P-Value > 0.05: There is no correlation (based on the data you have). (Pearson coefficient could be zero)

• Continuous / Continuous
– Question: Does one continuous factor (input) affect another continuous factor (output)?
– Tool: Regression
– Minitab: Stat > Regression > Regression
– P-Value < 0.05: You can be confident that the input factor (predictor) affects the process output.
– P-Value > 0.05: There is no correlation between the input factor (predictor) and the process output (based on the data you have).
Data-Driven Analysis: Discrete X / Continuous Y
• Descriptive Statistics: mean, median, variance, standard deviation
• Graphical display: box plots, error bars, run charts
• Potential Questions: Is there a difference in means, medians, variances?
Variance Testing
Distribution: Normal
• 1 sample: Chi2 Test. HO: σ1=σt; HA: σ1≠σt (t=target). Stat>Basic Stat>Display Desc>Graphical Summary (if target std dev falls within CI then fail to reject HO)
• 2 sample: F Test. HO: σ1=σ2; HA: σ1≠σ2. Stat>ANOVA>Test for Equal Variance
• >2 sample: Bartlett’s Test. HO: σ1=σ2=σ3…; HA: σi≠σj for i≠j (at least one is different). Stat>ANOVA>Test for Equal Variance
Distribution: Non-normal or unknown
• 1 sample: Chi2 Test. HO: σ1=σt; HA: σ1≠σt (t=target). Stat>Basic Stat>Display Desc>Graphical Summary (if target std dev falls within CI then fail to reject HO)
• 2 sample: Levene’s Test. HO: σ1=σ2=σ3…; HA: σi≠σj for i≠j (at least one is different). Stat>ANOVA>Test for Equal Variance
• >2 sample: Levene’s Test. HO: σ1=σ2=σ3…; HA: σi≠σj for i≠j (at least one is different). Stat>ANOVA>Test for Equal Variance
If variances are NOT equal, proceed with caution or use Welch’s Test, which is not available in Minitab.
Test for Equal Variances
Stat>Basic Statistics>2 Variances
Test for Equal Variances: Quality versus Region
95% Bonferroni confidence intervals for standard deviations
Region    N      Lower      StDev      Upper
1       116    2.13011    2.46845    2.92567
2        67    2.03534    2.46264    3.09934
3       100    2.58684    3.02983    3.64282
Bartlett's Test (Normal Distribution): Test statistic = 5.58, p-value = 0.061
Levene's Test (Any Continuous Distribution): Test statistic = 6.24, p-value = 0.002
Hypothesis Testing: Discrete X / Continuous Y
For μ: 1 Sample t-test (See page 162 in The Lean Six Sigma Pocket Toolbook)
Ho: μ is equal to a target or known value
Ha: μ is not equal to a target or known value
Statistical Test: One sample t-test
Test Statistic: T-value – based on the area under the curve of the t-distribution, which is used when the population standard deviation is unknown

Hypothesis Testing: Discrete X / Continuous Y
For μ: 2 Sample t-test (See page 182 in The Lean Six Sigma Pocket Toolbook)
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
Statistical Test: 2 Sample t-test
Test Statistic: T-value – based on the area under the curve of the t-distribution, which is used when the population standard deviation is unknown
Hypothesis Testing: Discrete X / Continuous Y
             Population is Normal    Population is Non-Normal or Unknown
1 group      1-Sample T Test         1-Sample Wilcoxon
2 groups     2-Sample T Test         Mann-Whitney Test
>2 groups    ANOVA                   Mood’s Median Test or Kruskal Wallis Test
Analyze Tools: Discrete X / Continuous Y
• Graphical display: Box plots
– The box shows the range of data values comprising the 2nd and 3rd quartiles of the data – the “middle” 50% of the data
[Figure: box plot with the 1st Quartile line, Median line, and 3rd Quartile line labeled.]
See page 110 in The Lean Six Sigma Pocket Toolbook
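Analyze Tools: Box Plots
There are 24 entries in this table:
1 1 1 2 2 3 4 4 4 4 4 4 5 5 5 5 6 6 7 7 8 8 10 13
• Each quartile (1st, 2nd, 3rd, 4th) holds 25% of the entries
• Median = 4.5
• The Inter Quartile Range (IQR) is the range encompassed by the 2nd Quartile and 3rd Quartile… 6-4=2
[Figure: box plot on a 0 to 14 axis, with the median, 2nd Quartile and 3rd Quartile labeled and one outlier marked *.]
• Upper whisker: extends to the largest value within Q3 + 1.5 x IQR (Q3 is the top edge of the box)
• Lower whisker: extends to the smallest value within Q1 - 1.5 x IQR (Q1 is the bottom edge of the box)
• Outlier (*): a value beyond the whiskers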
Data-Driven Analysis: Continuous X / Continuous Y
• Descriptive Statistics: correlation
• Graphical Display: scatter plot, run charts
• See pp. 165-175 in The Lean Six Sigma Toolbook
Analyze Tools: Continuous X / Continuous Y
• Correlation indicates whether there is a relationship between the values of two measurements
– Positive correlation: higher values in X are associated with higher values in Y
– Negative correlation: higher values in X are associated with lower values in Y
• Correlation does NOT imply cause-and-effect!
– Correlation could be coincidence
– Both variables could be influenced by some lurking variable
Hypothesis Testing: Correlation Statistics
• Regression analysis generates correlation coefficients to indicate the strength and nature of the relationship
– Pearson correlation coefficient (r): the strength and direction of the relationship
   • Between -1 and 1
– r2: percent of variation in Y that is attributable to X
   • Between 0 and 1
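As a rough illustration of how r and r2 are computed, here is a small standard-library sketch. The function name and the two tiny data sets are made up for the example; they simply reproduce the perfect positive and perfect negative cases.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6]
r_pos = pearson_r(x, [1, 2, 3, 4, 5, 6])  # Y rises with X
r_neg = pearson_r(x, [6, 5, 4, 3, 2, 1])  # Y falls as X rises

print(f"r = {r_pos:.2f}, r2 = {r_pos ** 2:.2f}")
print(f"r = {r_neg:.2f}, r2 = {r_neg ** 2:.2f}")
```

Note that r2 is the same for both cases: squaring r discards the direction of the relationship and keeps only its strength.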
46
Hypothesis Testing: Continuous X / Continuous Y
For: Regression and Correlation (pg. 168)
Ho: The slope of the line is equal to zero (slope = 0)
Ha: The slope of the line does not equal zero (slope ≠ 0)
Statistical Test: Regression
Test Statistic: F ratio – a measure of actual to expected variation in the sample
Correlation Example
Stat>Basic Statistics>Correlation
Correlations: Clarity, Quality
Pearson correlation of Clarity and Quality = 0.075
P-Value = 0.208
Pearson’s r Rules of Thumb
• Strength and direction of relationship between x and Y
• 0 to .20: no or negligible correlation.
• .20 to .40: low degree of correlation.
• .40 to .60: moderate degree of correlation.
• .60 to .80: marked degree of correlation.
• .80 to 1.00: high correlation.
Regression Example
Stat>Regression>Regression…
Regression Analysis: Quality versus Clarity
The regression equation is
Quality = 11.7 + 1.02 Clarity
Predictor       Coef    SE Coef        T        P
Constant     11.6524     0.7253    16.06    0.000
Clarity       1.0234     0.8118     1.26    0.208
S = 2.82408 R-Sq = 0.6% R-Sq(adj) = 0.2%
Analysis of Variance
Source             DF          SS        MS       F        P
Regression          1      12.676    12.676    1.59    0.208
Residual Error    281    2241.094     7.975
Total             282    2253.770
Regression Example 2
Stat>Regression>Fitted Line Plot…
Regression Analysis: Quality versus Clarity
The regression equation is
Quality = 11.65 + 1.023 Clarity
S = 2.82408 R-Sq = 0.6% R-Sq(adj) = 0.2%
Analysis of Variance
Source         DF         SS         MS       F        P
Regression      1      12.68    12.6757    1.59    0.208
Error         281    2241.09     7.9754
Total         282    2253.77
r2 Rules of Thumb
• The “coefficient of determination”
• What percent of the variation in Y is due to x?
• less than or equal to .4: not predictive
• .40 to .65: mildly predictive
• .65 to .86: moderately predictive
• .86 to 1: strongly predictive
Residuals
• Regression uses a method called “least squares” to choose the line that minimizes the sum of the squared vertical distances from the points to the line.
• The distances between the points and the regression line are called “residuals.” The residuals represent the portion of the variation in Y that is not explained by the regression equation.
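A minimal sketch of least squares and residuals, using only the Python standard library. The five data points are hypothetical and the function names are invented for the example; a package such as Minitab would report the same slope, intercept and R-Sq along with much more.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Vertical distance from each point to the fitted line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
res = residuals(x, y, b0, b1)

# R-Sq: share of the variation in Y explained by the line.
mean_y = sum(y) / len(y)
ss_res = sum(e ** 2 for e in res)
ss_tot = sum((v - mean_y) ** 2 for v in y)
r_squared = 1 - ss_res / ss_tot

print(f"y = {b0:.2f} + {b1:.2f} x, R-Sq = {r_squared:.1%}")
print("residuals:", [round(e, 2) for e in res])
```

The residuals of a least-squares line with an intercept always sum to zero, which is a quick sanity check on any fit.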
Residuals
• In Minitab, you can plot the residuals four ways. (also see 195-196 in The Lean Six Sigma Toolbook)

Residuals
• Regression has three assumptions about residual “errors.” Errors are:
1. Random and independent
2. Normally distributed
3. Have constant variance

Residuals
• Errors are random and independent
Residuals versus order
1. Displayed in order collected
2. If order is immaterial, do not use this
3. Are the residuals random? Do they exhibit any patterns?

Residuals
• Errors are normally distributed
Normal plot of residuals
1. Errors should follow a straight line on a normal probability plot
2. Use the “fat pencil” test. Would a fat pencil laid on the normal probability plot cover the data points?

Residuals
• Errors have constant variance over all values of x
Residuals versus fits
1. Should show a random scatter and have no pattern
2. Should have roughly the same number of points above 0 as below
Flavor versus Quality
Correlations: Quality, Flavor
Pearson correlation of Quality and Flavor = 0.870
P-Value = 0.000
Regression Analysis: Quality versus Flavor
The regression equation is
Quality = 2.913 + 1.997 Flavor
S = 1.39575 R-Sq = 75.7% R-Sq(adj) = 75.6%
Analysis of Variance
Source         DF         SS         MS         F        P
Regression      1    1706.35    1706.35    875.89    0.000
Error         281     547.42       1.95
Total         282    2253.77
Analyze Tools: Continuous X / Continuous Y
[Figure: three scatterplots (B1 vs A1, B2 vs A2, B3 vs A3) illustrating the cases below.]
• r = 1, r2 = 1: Perfect positive correlation
• r = -1, r2 = 1: Perfect negative correlation
• r = 0, r2 = 0: No correlation
Data-Driven Analysis: Discrete X / Discrete Y
• Descriptive Statistics: counts and proportions
• Graphical display: bar graph and Pareto chart
– A Pareto chart is a type of bar graph where the categories are arranged from largest to smallest with a line indicating the cumulative percent
Contingency Tables
• χ2: the statistic used to test hypotheses about the frequency of some event
– Goodness of Fit: is observed different from expected?
– Test for independence: are samples from the same distribution?
Goodness of Fit Test
• Compare actual and expected frequencies
• Calculate the χ2 statistic
• Compare to a χ2 critical value from table
• If χ2calc > χ2crit, there is a difference

Calculate the χ2 statistic
• χ2 = the sum, over all categories, of the squared differences between the actual and the expected frequencies, each divided by the expected frequency
$$\chi^2 = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e}$$
Coin-toss
• Will a fair coin tossed 100 times come up 66 times heads and 34 times tails?
         Observed (fo)    Expected (fe)    (fo-fe)^2 / fe
Heads    66               50               (66-50)^2/50 = 16^2/50 = 256/50 = 5.12
Tails    34               50               (34-50)^2/50 = (-16)^2/50 = 256/50 = 5.12
Sum                                        10.24
$$\chi^2_{calc} = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e} = 5.12 + 5.12 = 10.24$$
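The coin-toss calculation can be checked with a short sketch. The function names are made up for the example. The p-value helper relies on a standard identity that holds only for 1 degree of freedom (the upper-tail chi-square probability equals erfc(sqrt(χ2/2))), so it is not a general chi-square routine.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Coin toss: 66 heads and 34 tails observed, 50 / 50 expected for a fair coin.
chi2_calc = chi_square_statistic([66, 34], [50, 50])
chi2_crit = 3.84146  # table value for DF = 1, area = 0.05

print(f"chi2_calc = {chi2_calc:.2f}, chi2_crit = {chi2_crit}")
print("difference" if chi2_calc > chi2_crit else "no difference")
print(f"p-value = {p_value_df1(chi2_calc):.4f}")
```

Comparing χ2calc with the table value and comparing the p-value with 0.05 are two views of the same decision.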
67
Look up the χ2 critical value
• First we must determine the degrees of freedom
• “Degrees of freedom” represents the number of values in the final calculation of a statistic that are free to vary
• For a contingency table, DF=(rows in data-1)*(columns in data-1); for a goodness-of-fit test with k categories, DF=k-1
• In our example, the DF=1
Df/area    0.1        0.05       0.025      0.01       0.005
1          2.70554    3.84146    5.02389    6.6349     7.87944
2          4.60517    5.99146    7.37776    9.21034    10.59663
3          6.25139    7.81473    9.3484     11.34487   12.83816
4          7.77944    9.48773    11.14329   13.2767    14.86026
5          9.23636    11.0705    12.8325    15.08627   16.7496
Look up the χ2 critical value
• If χ2calc > χ2crit, there is a difference
• χ2calc = 10.24
• χ2crit = 3.84 (DF=1, area=0.05)
• There is a difference!
Chi-Square Test for Independence
• Goodness of Fit asked if frequencies were different than expected
• Test for Independence asks whether our samples come from the same population
• Example: Students in a Six Sigma Black Belt course are offered two different time slots for taking their final exam. Is there a difference in the passing and failing rates for each group?
• State the null and alternative hypotheses for this problem.
70
Chi-Square Test for Independence
• We use the same formula, but calculate the expected frequencies differently
$$\chi^2 = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e}$$
Test for Independence
• Arrange the data in a table, showing observed frequencies
• Calculate the expected frequencies for each cell
• Calculate the χ2 statistic in each cell
• Sum the χ2 statistic from each cell
• Compare to a χ2 critical value from table
• If χ2calc > χ2crit, there is a difference
Calculating fe
For each cell, fe = (f row * f column) / N, where f row and f column are the row and column totals and N is the grand total.
            Number passing                     Number failing                      Total
1st test    fo=20, fe=(70*60)/180=23.33        fo=50, fe=(120*70)/180=46.67        fo=70
2nd test    fo=40, fe=(110*60)/180=36.67       fo=70, fe=(120*110)/180=73.33       fo=110
Total       fo=60                              fo=120                              fo=180
Calculate the χ2 statistic for each cell
            Number passing                  Number failing                  Total
1st test    fo=20, fe=23.33, χ2=.476        fo=50, fe=46.67, χ2=.238        fo=70
2nd test    fo=40, fe=36.67, χ2=.303        fo=70, fe=73.33, χ2=.151        fo=110
Total       fo=60                           fo=120                          fo=180
$$\chi^2_{calc} = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e} = 1.169$$
Df/area    0.1        0.05       0.025      0.01       0.005
1          2.70554    3.84146    5.02389    6.6349     7.87944
2          4.60517    5.99146    7.37776    9.21034    10.59663
3          6.25139    7.81473    9.3484     11.34487   12.83816
4          7.77944    9.48773    11.14329   13.2767    14.86026
5          9.23636    11.0705    12.8325    15.08627   16.7496
Look up the χ2 critical value
• If χ2calc > χ2crit, there is a difference
• χ2calc = 1.169
• χ2crit = 3.84
• There is no difference! Therefore, we fail to reject the null hypothesis. Ho: pass and fail rates are independent of the time the test was administered.
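The test for independence on the exam data can be sketched as follows. The function names are hypothetical; the observed counts are the ones from the pass/fail table, and expected frequencies are computed as (row total * column total) / N.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Rows: 1st test, 2nd test. Columns: number passing, number failing.
observed = [[20, 50], [40, 70]]
chi2_calc, df, expected = chi_square_independence(observed)

chi2_crit = 3.84146  # table value for DF = 1, area = 0.05
print(f"chi2_calc = {chi2_calc:.3f}, DF = {df}")
print("reject Ho" if chi2_calc > chi2_crit else "fail to reject Ho")
```

Printing `expected` is a useful check before trusting the result, since the contingency-table approach assumes every expected cell count is at least 5.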
78
Cramer’s test
• Quantifies the strength of the association between x and y
$$\theta = \sqrt{\frac{\chi^2_{calc}}{n(q-1)}}$$
Where:
n=total number of observations
q=lesser of rows or columns
Describing the strength of association:
• .5 to 1: high association
• .3 to .5: moderate association
• .1 to .3: low association
• 0 to .1: little if any association
For the exam example (χ2calc = 1.169, n = 180, q = 2):
$$\theta = \sqrt{\frac{1.169}{180(1)}} = \sqrt{0.00649} = 0.0806$$
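The same calculation as a small sketch. The function names are illustrative, and the thresholds in `describe_association` are the rule-of-thumb bands from the slide, with the boundary values assigned to the higher band as an assumption.

```python
import math


def cramers_theta(chi2, n, q):
    """Strength of association: sqrt(chi2 / (n * (q - 1)))."""
    return math.sqrt(chi2 / (n * (q - 1)))


def describe_association(theta):
    """Map theta onto the rule-of-thumb bands."""
    if theta >= 0.5:
        return "high association"
    if theta >= 0.3:
        return "moderate association"
    if theta >= 0.1:
        return "low association"
    return "little if any association"


# Exam example: chi2 = 1.169, n = 180 observations, q = 2 (a 2 x 2 table).
theta = cramers_theta(1.169, 180, 2)
print(f"theta = {theta:.4f}: {describe_association(theta)}")
```

This agrees with the chi-square conclusion: the time slot shows little if any association with passing or failing.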
Hypothesis Testing: Discrete X / Discrete Y
For: Comparing one proportion to a given value
Ho: The proportion is equal to a given percentage
Ha: The proportion is not equal to a given percentage
Statistical Test: 1 Proportion
Test Statistic: Z score – based on the area under the curve of a normal distribution

Hypothesis Testing: Discrete X / Discrete Y
For: Comparing two proportions
Ho: The proportion of group A equals the proportion of group B (PA = PB)
Ha: The proportion of group A does not equal the proportion of group B (PA ≠ PB)
Statistical Test: Test of Proportions
Test Statistic: Z score – based on the area under the curve of a normal distribution
Hypothesis Testing: Discrete X / Discrete Y
• Considerations
– For contingency tables, the expected cell count should be at least 5
– For proportions tests, if you do not have enough successes or failures in your numerator, consider using Fisher’s Exact Test
– Generally, np > 5 and n(1-p) > 5 is a minimum standard
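The np > 5 check and a 1 Proportion z-test can be sketched together. The counts below (18 defects in 200 units, tested against a known rate of 5%) are hypothetical, the function names are made up, and the p-value uses the normal approximation, which is exactly what the np > 5 and n(1-p) > 5 rule is meant to protect.

```python
import math
from statistics import NormalDist


def sample_size_ok(n, p0, minimum=5):
    """Rule of thumb: n*p and n*(1-p) should both exceed the minimum."""
    return n * p0 > minimum and n * (1 - p0) > minimum


def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided p-value) for Ho: proportion = p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 18 defects in 200 units, compared with a known 5% rate.
ok = sample_size_ok(200, 0.05)
z, p_value = one_proportion_z_test(18, 200, 0.05)

print(f"normal approximation reasonable: {ok}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

If `sample_size_ok` returns False, the z-based p-value should not be trusted and an exact method is the safer choice.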
Six Sigma Analyze:
Remember, statistical analysis and testing within the context of practically applying Lean Six Sigma is about using data to identify the Key Xs to “fix” that will most likely result in a measurable improvement in the process Y (output), which in turn will improve customer satisfaction and efficiency.
87
Key Deliverables for Analyze
• Main elements of Define and Measure completed
• “Obvious Xs” identified and confirmed
• Potential Xs identified, data collected and analyzed
• Root causes investigated and supported with data – the Xs to improve
Author: Enter Name
Date: April 19, 2023
Project Name:
Problem Statement: Mislabeled example
Project Scope: Enter scope description
Champion: Name
Process Owner: Name
Black Belt: Name
Green Belts: Names
Customer(s):
CTQ(s):
Defect(s):
Beginning DPMO:
Target DPMO:
Estimated Benefits:
Actual Benefits:
Status legend: Not Complete / Complete / Not Applicable

Define (Start Date: Enter Date, End Date: Enter Date)
• Benchmark Analysis
• Project Charter
• Formal Champion Approval of Charter (signed)
• SIPOC - High Level Process Map
• Customer CTQs
• Initial Team meeting (kickoff)

Measure (Start Date: Enter Date, End Date: Enter Date)
• Identify Project Y(s)
• Identify Possible Xs (possible cause and effect relationships)
• Develop & Execute Data Collection Plan
• Measurement System Analysis
• Establish Baseline Performance

Analyze (Start Date: Enter Date, End Date: Enter Date)
• Identify Vital Few Root Causes of Variation Sources & Improvement Opportunities
• Define Performance Objective(s) for Key Xs
• Quantify potential $ Benefit

Improve (Start Date: Enter Date, End Date: Enter Date)
• Generate Solutions
• Prioritize Solutions
• Assess Risks
• Test Solutions
• Cost Benefit Analysis
• Develop & Implement Execution Plan
• Formal Champion Approval

Control (Start Date: Enter Date, End Date: Enter Date)
• Implement Sustainable Process Controls – Validate: Control System, Monitoring Plan, Response Plan
• System Integration Plan
• $ Benefits Validated
• Formal Champion Approval and Report Out

Directions:
• Replace all of the italicized, black text with your project’s information
• Change the blank box into a check mark by clicking on Format>Bullets and Numbering and changing the bullet.
Six Sigma Analyze:
Now, what specifically are we going to improve in the Improve Phase?
We should have evidence (data) to support what we are improving and why?