The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

Post on 28-Dec-2015


The DMAIC Lean Six Sigma Project and Team Tools Approach

Analyze Phase (Part 2)

Lean Six Sigma Black Belt Training! Analyze (Part 2) Agenda

• Review Analyze Part 1
• Inferential Statistics
• Hypothesis Testing
• P-values
• Discrete X / Continuous Y Statistical Tests
• Continuous X / Continuous Y Statistical Tests
• Discrete X / Discrete Y Statistical Tests
• Applications / Lessons Learned / Conclusions
• Next Steps

Six Sigma Analyze: Inferential Statistics

(Identifying What’s Different (Xs) Statistically)

Introduction to Hypothesis Testing

[Figure: two normal distribution curves. Sample 1: Normal, Mean=6.5, StDev=1. Sample 2: Normal, Mean=6.9, StDev=1.2.]

Are these samples from the same population?

Intro. to Confidence Intervals (pg. 157)

• Brutal Facts Regarding Samples
– We know that the size of the sampling error is primarily based on the variation in the population and the size of the sample selected.
– Larger samples have a smaller margin of error, yet are more costly to obtain.
– In practice, usually only one sample is selected, and it is usually the minimum size required.
– Therefore, a method was needed to estimate a population parameter from that one sample. This method resulted in the term Confidence Interval.

Intro. to Confidence Intervals (pg. 157)

• A statistic plus or minus a margin of error is called a confidence interval.
• A confidence interval is a range of values, calculated from a data set, that gives an assigned probability that the true value falls within that range.
• The confidence level depends on the margin of error that is selected. Generally, the accepted margin of error is plus or minus 2 standard errors (more precisely, 1.96), resulting in a 95% confidence level.
• “We are 95% confident that the true average door-to-balloon time is between 60 and 100 minutes.”

Assume we have a population of size N that is not normally distributed.

We draw 100 random samples and plot the average of each sample.

We get an approximately normal distribution of sample averages with a mean of 50 (n=100).

The mean of our sampled distribution is 50.

How confident are we of where the population mean lies?

Similar to standard deviation, we know that about 68% of the sample distribution lies within 1 standard error (SE) and about 95% within 2 standard errors.

[Figure: sampling distribution centered at 50, with 68% of the area within ±1 SE and 95% within ±2 SE.]

Let’s assume we want to be 95% confident of where the true mean of the population lies.

We can be 95% confident that the true mean lies within +/- 2 SE, where

$$SE = \frac{\sigma}{\sqrt{n}}$$

In this case, let’s assume that SE=3, so 2 x SE = 6.

• The mean of our sample distribution is 50.
• We are 95% confident that the true mean of the population lies between 44 and 56.
• Our margin of error is +/- 6.
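The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the population standard deviation of 30 is an assumed value chosen so that SE comes out to 3 with n = 100.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def confidence_interval(mean, se, k=2.0):
    """Interval of mean +/- k standard errors (k = 2 gives roughly 95%)."""
    margin = k * se
    return mean - margin, mean + margin

# Assumed values: sigma = 30 is hypothetical, picked so that SE = 3 as in the slide.
se = standard_error(sigma=30.0, n=100)
low, high = confidence_interval(mean=50.0, se=se)
print(f"SE = {se}, 95% CI = ({low}, {high})")
```

With these inputs the interval is 44 to 56, matching the margin of error of +/- 6 stated above.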

Central Limit Theorem / Margin of Error / Confidence Intervals

• Why Use it? Why is this important?
– Six Sigma practitioners use the sample data and apply normal theory for making inferences about population parameters irrespective of the actual form of the parent population.
– Many statistical tests are founded on the principle that we do not need to know the original distribution. Sample means and proportions will be approximately “normal” if n is big enough.
– Practically, we use the central limit theorem to help us estimate the true average, and calculate the likelihood of observing certain events.
– Considering time and resources, we need to have a measure of confidence around our sample statistics.
– None of this is applicable if your data is Unreliable or BIASED!!!!
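A small simulation can make the central limit theorem concrete. The sketch below is an assumption-laden illustration, not part of the course material: it uses an exponential population with mean 50 (clearly non-normal), draws 1,000 samples of size 100, and checks how the sample averages behave. The helper names and the seed are arbitrary.

```python
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n))
            for _ in range(num_samples)]

def draw_skewed(rng):
    """One value from a skewed (exponential) population with mean 50 and sigma 50."""
    return rng.expovariate(1 / 50.0)

means = sample_means(draw_skewed, n=100, num_samples=1000)

grand_mean = statistics.fmean(means)   # should be close to 50
se_observed = statistics.stdev(means)  # spread of the sample averages
se_theory = 50.0 / 100 ** 0.5          # sigma / sqrt(n) = 5

# Share of sample averages that fall within 2 standard errors of the true mean.
within_2se = sum(abs(m - 50.0) <= 2 * se_theory for m in means) / len(means)

print(grand_mean, se_observed, within_2se)
```

Even though individual values are heavily skewed, roughly 95% of the sample averages should land within 2 SE of the population mean.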

Data-Driven Problem Solving: Hypothesis Testing

Two fundamental questions must be answered before hypothesis testing can be performed adequately:

– What type of data is available (and reliable)?

– What question are you asking (what do you need to understand)?

Introduction to Hypothesis Testing (pg. 156)

• Hypothesis testing is the process of using statistical analysis to determine whether the observed differences between two or more sets of data are due to random chance variation or to true differences in the underlying populations.

• Generally, Hypothesis Testing tells us whether or not sets of data are truly different, with a certain level of confidence.

Introduction to Hypothesis Testing

[Figure: two normal distribution curves. Sample 1: Normal, Mean=6.5, StDev=1. Sample 2: Normal, Mean=6.9, StDev=1.2.]

Are these samples from the same population?

The Six Sigma Approach

Six Sigma applies many tools, including statistical tools, to practical problems. The key is data-driven decision making.

1. Practical Problem – An unacceptable variation or gap in quality.
   Example: Lab specimens are mislabeled too often; leads to incorrect diagnosis and treatment.
2. Statistical Problem – Defining the problem in statistical terms.
   Example: Specimens are mislabeled 8 out of 10,000 collected.
3. Statistical Solution – Using data and statistics to understand the cause of the problem.
   Example: ~85% of mislabeled specimens come from the ED.
4. Practical Solution – Addresses the verified root causes.
   Example: Redesign of the process of labeling and transporting specimens leads to dramatic reduction in errors.

Introduction to Hypothesis Testing

• Hypothesis Testing allows us to answer a practical question: Is there a true difference between ___ and ___ ?

• Practically, Hypothesis Testing uses relatively small sample sizes to answer questions about the population.

• There is always a chance that the samples we have collected are not truly representative of the population. Thus, we may reach a wrong conclusion about the population(s) being studied.

Introduction to Hypothesis Testing: Testing Terms and Concepts

• Statistically, we “ask and answer questions” using stated hypotheses that are tested at some level of confidence.

• The null hypothesis (Ho) is a statement being tested to determine whether or not it is true (the assumption that there is no difference).

• The alternative hypothesis (Ha) is a statement that represents reality if there is enough evidence to reject the stated null (Ho), i.e. the null hypothesis is false.

Introduction to Hypothesis Testing

Example: Is the average Length of Stay for a total knee replacement different for Hospital A vs. Hospital B?

Common Language:

Ho: There is no difference in average length of stay between facilities.

Ha: There is a difference in average length of stay between facilities.

Statistical Language:

Ho: $A_{LOS} = B_{LOS}$

Ha: $A_{LOS} \neq B_{LOS}$

Introduction to Hypothesis Testing: Type I and Type II Errors (Risk)

• As stated earlier, there is the risk of arriving at a wrong conclusion about the hypothesis we are testing. The two types of error that can occur with hypothesis testing are called Type I and Type II. The associated risks are called Alpha and Beta risks.

• A Type I (Alpha) error is concluding there is a difference when there really isn’t one: rejecting the null when you should not!

• A Type II (Beta) error is concluding there is not a difference when there really is one: not rejecting the null when you should!

Type I and Type II Errors, Confidence, Power, and p-values

| The True Statement | Conclusion Drawn: Reject H0 | Conclusion Drawn: Do not reject H0 |
|---|---|---|
| H0 is true | Type I Error (α risk): you conclude there IS a difference when there really isn’t | Correct |
| H0 is false | Correct | Type II Error (β risk): you conclude there is NO difference when there really is |

Type I and Type II errors in the Justice System

Result Matrix: Jury Trial vs. Hypothesis Testing

Ho: No difference between the accused and an innocent person

| The Truth (True State) | Verdict: Acquittal (Decision: Do not reject Ho) | Verdict: Guilty (Decision: Reject Ho) |
|---|---|---|
| Did not commit crime (Ho is true) | Correct: innocent person acquitted | Type I error (α): innocent person convicted |
| Committed crime (Ho is false) | Type II error (β): guilty person acquitted | Correct: guilty person convicted |

Introduction to p-value

• The p-value measures the probability of observing a difference at least as large as the one in the sample if the null hypothesis is true.

• In comparing the average length of stay (ALOS) at Hospitals A and B, p-value measures the likelihood of observing a difference in ALOS if the null hypothesis is true.

• If the p-value is large, then both averages probably came from the same population (i.e. there is no difference between ALOS at Hospital A and B).

• If the p-value is small, then it is unlikely both averages came from the same population (i.e. there is a difference between ALOS at Hospital A and B).

P-Value (pg. 160)

What’s the probability of getting a value of “40”?

[Figure: two distribution curves, each with a mean of 50 and the value 40 marked.]
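The question in the figure can be answered numerically once the spread of the distribution is known. The sketch below assumes a normal sampling distribution and a standard error of 5, which is a hypothetical value (the slide does not give one); the function name is also just for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(observed, null_mean, standard_error):
    """Probability of a result at least this far from null_mean, in either direction."""
    z = (observed - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Assumed: the standard error of 5 is hypothetical, so 40 sits 2 SE below the mean of 50.
p = two_sided_p_value(observed=40, null_mean=50, standard_error=5)
print(f"p-value = {p:.4f}")
```

With a narrower distribution (smaller SE) the same value of 40 would be far less likely, which is why the two curves in the figure can give different answers.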

Setting the Alpha threshold

• Alpha (α) is the level of risk you are willing to accept of making a Type I error (i.e. rejecting the null when the null is true).

• Traditionally, alpha (α) is set at 0.05, which means you are willing to accept a 5% chance of making a Type I error.
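One way to see what a 5% alpha risk means is to simulate many studies in which the null hypothesis really is true and count how often it gets rejected anyway. The following is a minimal sketch under assumed conditions: a normal population with mean 50 and known sigma 10, a z-test, and arbitrary sample size, trial count, and seed.

```python
import random
from statistics import NormalDist, fmean

def false_positive_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Fraction of trials that reject a TRUE null hypothesis (Type I errors)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The null is true by construction: every sample comes from mean 50, sigma 10.
        sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
        z = (fmean(sample) - 50.0) / (10.0 / n ** 0.5)
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to 0.05: alpha is the long-run share of false alarms you agree to tolerate.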

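P-Value

The null hypothesis is rejected when the p-value falls at or below the critical (alpha) value.

“If p is low, Ho must go” (usually at or below 0.05)

[Figure: distribution centered on the mean, with a “fail to reject” region on each side of the mean and a rejection region in each tail.]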

Hypothesis Testing – Basic Steps (see also pg 156-160)

1. State the practical problem
2. State the null hypothesis
3. State the alternate hypothesis
4. Test the assumptions of the data
5. Determine appropriate alpha (α) decision value
6. Calculate the appropriate test statistic and calculate p-value
7. If calculated p-value < α then reject Ho; if p-value > α then fail to reject Ho
8. Formulate the statistical conclusion into a practical solution

Analyze – Hypothesis Testing – Type I / II Errors

Hypothesis Testing Worksheet

• Identify data types: Project Y and Project Y Data Type; X Factor and X Data Type
• What hypothesis is being tested? Null hypothesis statement; Alternate hypothesis statement
• Statistical test
• Assumptions: Are the assumptions for this test met (if applicable)?
• Results: P-value; % Contribution of variation in X to variation in Y; Accept alternate hypothesis / Reject alternate hypothesis
• Conclusions/Observations

Statistical Testing – Basic Steps

1. What theory or potential cause is presented or proposed?
2. Given the theory or potential cause in front of you, what is the question you are trying to answer?
3. Do you have data directly related to and describing the question you are asking? What type of data do you have?
4. If you do not have data, can you collect the appropriate data (reasonably and appropriately)? If no data exists relating to the theory being considered, or if it will be very costly to obtain, re-visit the magnitude and urgency of testing this particular theory. Proceed with data collection and sorting/grouping as needed.
5. State the question as a null hypothesis (There is no difference…)
6. State the alternate hypothesis
7. Test the assumptions of the data as needed (normality, quantity, variances, etc.)
8. Determine appropriate alpha (α) decision value (.05, etc.)
9. Choose and calculate the appropriate test statistic (determined by the data you have and the question you are asking) and the associated p-value
10. If calculated p-value < α then reject Ho; if p-value > α then fail to reject Ho
11. Formulate the statistical conclusion into a practical solution (answer to question)

Remember? - Data-Driven Problem Solving: Hypothesis Testing

Two fundamental questions must be adequately answered in order to be able to adequately perform hypothesis testing:

– What type of data is available (and reliable)?

– What question are you asking (what do you need to understand)?

What Type of Data to Analyze:

• Discrete X / Continuous Y

• Continuous X / Continuous Y

• Discrete X / Discrete Y

Reference Sheet: Statistical Test Selection and "p-values" interpretation (based on 95% Confidence)

| Input (X) | Output (Y) | Practical / General question we are asking | The Tool | Minitab commands | P-Value < 0.05 | P-Value > 0.05 |
|---|---|---|---|---|---|---|
| | Continuous | Is my collected set of data normally distributed? | Anderson Darling Normality Test | Stat > Basic Statistics > Display Descriptive Statistics | You can be confident that your data is not Normally distributed. | You can assume that your data is Normally distributed. |
| Discrete | Continuous | Is the average of my sample the same as a given or known value? | 1 Sample t-Test (against a known value) | Stat > Basic Statistics > 1 - Sample t | You can be confident that your sample has a different average from the known test value. | There is no difference between your sample average and the known test value (based on the data you have). |
| Discrete | Continuous | Are the averages from 2 different sets of data the same? | 2 Sample t-Test | Stat > Basic Statistics > 2 - Sample t | You can be confident that the averages of the two samples are different. | There is no difference between the averages of the two samples (based on the data you have). |
| Discrete | Continuous | Are the averages from paired sets of data (e.g. before / after) the same? | Paired t-Test | Stat > Basic Statistics > Paired t | You can be confident that there is a consistent difference between the pairs of data. | There is no consistent difference between the pairs of data (based on the data you have). |
| Discrete | Continuous | Is there at least one average from several sets of data (>2) that is different? | One Way ANOVA | Stat > ANOVA > One - Way | You can be confident that at least one of the samples has a different average from the others. | There is no difference in the averages of the samples (based on the data you have). |
| Discrete | Continuous | Is there at least one median from several sets of data (>2) that is different? | Kruskal Wallis & Mood's Median Test | Stat > Nonparametrics | You can be confident that at least one of the samples has a different median from the others. | There is no difference in the medians of the samples (based on the data you have). |
| Discrete | Continuous | Is there at least one variance from several sets of data that is different? | F-test, Levene's test, Bartlett's test | Stat > ANOVA > Test for equal variances | You can be confident that at least one of your samples has a different standard deviation from the others. | There is no difference between the standard deviations of the samples (based on the data you have). |
| Discrete | Discrete | Is the proportion, or rate, from my sample the same as a given proportional value? | 1 Proportion (against a known value) | Stat > Basic Statistics > 1 Proportion | You can be confident that your sample has a different proportion from the known test value. | There is no difference between your sample proportion and the known test value (based on the data you have). |
| Discrete | Discrete | Are the proportions from 2 different sets of data the same? | 2 Proportion | Stat > Basic Statistics > 2 Proportions | You can be confident that the proportions from the two samples are different. | There is no difference between the proportions from the two samples (based on the data you have). |
| Discrete | Discrete | Is there at least one proportion from several sets of data that is different? Are observed frequencies the same as expected? | Chi-Square | Stat > Tables > Cross Tabulation and Chi - Square | You can be confident that at least one of the samples has a different proportion from the others. | There is no difference in the proportions from the samples (based on the data you have). |
| Continuous | Continuous | As one variable changes, can you predict the change in another (correlated) variable? | Correlation (Pearson Coefficient) | Stat > Basic Statistics > Correlation | You can be confident that there is a correlation (Pearson coefficient is not zero). | There is no correlation (based on the data you have). (Pearson coefficient could be zero) |
| Continuous | Continuous | Does one continuous factor (input) affect another continuous factor (output)? | Regression | Stat > Regression > Regression | You can be confident that the input factor (predictor) affects the process output. | There is no correlation between the input factor (predictor) and the process output (based on the data you have). |

Data-Driven Analysis: Discrete X / Continuous Y

• Descriptive Statistics: mean, median, variance, standard deviation

• Graphical display: box plots, error bars, run charts

• Potential Questions: Is there a difference in means, medians, variances

Variance Testing

| Sample | Distribution: Normal | Distribution: Non-normal or unknown |
|---|---|---|
| 1 sample | Chi2 Test. HO: σ1=σt; HA: σ1≠σt (t=target). Stat>Basic Stat>Display Desc>Graphical Summary (if target std dev falls within CI then fail to reject HO) | Chi2 Test. HO: σ1=σt; HA: σ1≠σt (t=target). Stat>Basic Stat>Display Desc>Graphical Summary (if target std dev falls within CI then fail to reject HO) |
| 2 sample | F Test. HO: σ1=σ2; HA: σ1≠σ2. Stat>ANOVA>Test for Equal Variance | Levene’s Test. HO: σ1=σ2=σ3...; HA: σi≠σj for i≠j (at least one is different). Stat>ANOVA>Test for Equal Variance |
| >2 sample | Bartlett’s Test. HO: σ1=σ2=σ3…; HA: σi≠σj for i≠j (at least one is different). Stat>ANOVA>Test for Equal Variance | Levene’s Test. HO: σ1=σ2=σ3...; HA: σi≠σj for i≠j (at least one is different). Stat>ANOVA>Test for Equal Variance |

If variances are NOT equal, proceed with caution or use Welch’s Test, which is not available in Minitab.

Test for Equal Variances
Stat>Basic Statistics>2 Variances

Test for Equal Variances: Quality versus Region

95% Bonferroni confidence intervals for standard deviations

| Region | N | Lower | StDev | Upper |
|---|---|---|---|---|
| 1 | 116 | 2.13011 | 2.46845 | 2.92567 |
| 2 | 67 | 2.03534 | 2.46264 | 3.09934 |
| 3 | 100 | 2.58684 | 3.02983 | 3.64282 |

Bartlett's Test (Normal Distribution): Test statistic = 5.58, p-value = 0.061

Levene's Test (Any Continuous Distribution): Test statistic = 6.24, p-value = 0.002

Hypothesis Testing: Discrete X / Continuous Y

For: 1 Sample t-test (See page 162 in The Lean Six Sigma Pocket Toolbook)

Ho: μ is equal to a target or known value

Ha: μ is not equal to a target or known value

Statistical Test: One sample t-test

Test Statistic: T-value – evaluated against the area under the curve of the t distribution, which is used when the population standard deviation is unknown and must be estimated from the sample

Hypothesis Testing: Discrete X / Continuous Y

For: 2 Sample t-test (See page 182 in The Lean Six Sigma Pocket Toolbook)

Ho: μ1 = μ2

Ha: μ1 ≠ μ2

Statistical Test: 2 Sample t-test

Test Statistic: T-value – evaluated against the area under the curve of the t distribution, which is used when the population standard deviation is unknown and must be estimated from the sample
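As a rough sketch of how a 2-sample T-value is formed, the code below applies the unequal-variance (Welch) form of the t statistic to the two samples from the earlier figure (Mean=6.5, StDev=1 and Mean=6.9, StDev=1.2). The sample sizes of 100 are hypothetical, since the slides do not state them, and the p-value uses a normal approximation rather than the exact t distribution, which is only reasonable because the assumed samples are large. Minitab would report the exact value.

```python
import math
from statistics import NormalDist

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and its approximate degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Means and standard deviations come from the figure; n = 100 per sample is assumed.
t, df = welch_t(6.5, 1.0, 100, 6.9, 1.2, 100)

# Normal approximation to the two-sided p-value (an approximation, not the exact t result).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.3f}, df = {df:.1f}, approximate p = {p_approx:.4f}")
```

Under these assumed sample sizes the p-value is below 0.05, so the two samples would be judged to come from populations with different means; with much smaller samples the same means and standard deviations might not be distinguishable.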

Hypothesis Testing: Discrete X / Continuous Y

| | Population is Normal | Population is Non-Normal or Unknown |
|---|---|---|
| 1 group | 1-Sample T Test | 1-Sample Wilcoxon |
| 2 groups | 2-Sample T Test | Mann-Whitney Test |
| >2 groups | ANOVA | Mood’s Median Test or Kruskal Wallis Test |

Analyze Tools: Discrete X / Continuous Y

• Graphical display: Box plots
– The box shows the range of data values comprising the 2nd and 3rd quartiles of the data – the “middle” 50% of the data

[Figure: box plot with the 1st Quartile line, Median line, and 3rd Quartile line labeled.]

See page 110 in The Lean Six Sigma Pocket Toolbook

Analyze Tools: Box Plots

There are 24 entries in this table:

1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 8, 10, 13

Each quartile (1st, 2nd, 3rd, 4th) contains 25% of the entries. Median = 4.5

The Inter Quartile Range (IQR) is the range encompassed by the 2nd Quartile and 3rd Quartile… 6-4=2

[Figure: box plot of the data on an axis from 0 to 14. The box spans the 2nd and 3rd Quartiles around the Median. The upper whisker extends to the largest value within 3Q+1.5 x IQR, the lower whisker extends to the smallest value within 2Q-1.5 x IQR, and an outlier is marked with *.]

Data-Driven Analysis: Continuous X / Continuous Y

• Descriptive Statistics: correlation

• Graphical Display: scatter plot, run charts

• See 165-175 in The Lean Six Sigma Toolbook

44

Analyze Tools: Continuous X / Continuous Y

• Correlation indicates whether there is a relationship between the values of two measurements
– Positive correlation: higher values in X are associated with higher values in Y
– Negative correlation: higher values in X are associated with lower values in Y

• Correlation does NOT imply cause-and-effect!
– Correlation could be coincidence
– Both variables could be influenced by some lurking variable

Hypothesis Testing: Correlation Statistics

• Regression analysis generates correlation coefficients to indicate the strength and nature of the relationship
– Pearson correlation coefficient (r): the strength and direction of the relationship
  • Between 1 and -1
– r²: percent of variation in Y that is attributable to X
  • Between 0 and 1
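To show how r and r² are obtained from raw data, here is a minimal from-scratch sketch. The three small data sets are invented for illustration (they are not the course data), and `pearson_r` is just a name chosen for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]                     # hypothetical X values
r_pos = pearson_r(xs, [1, 2, 3, 4, 5, 6])   # Y rises exactly with X
r_neg = pearson_r(xs, [6, 5, 4, 3, 2, 1])   # Y falls exactly as X rises
r_mid = pearson_r(xs, [2, 4, 3, 5, 4, 6])   # Y rises with X, with some scatter

for label, r in [("positive", r_pos), ("negative", r_neg), ("scattered", r_mid)]:
    print(f"{label}: r = {r:.3f}, r^2 = {r * r:.3f}")
```

The sign of r carries the direction of the relationship, while r² drops the sign and reports only how much of the variation in Y is accounted for by X.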

Hypothesis Testing: Continuous X / Continuous Y

For: Regression and Correlation (pg. 168)

Ho: The slope of the line is equal to zero (slope = 0)

Ha: The slope of the line does not equal zero (slope ≠ 0)

Statistical Test: Regression

Test Statistic: F ratio – a measure of actual to expected variation in the sample

Correlation Example
Stat>Basic Statistics>Correlation

Correlations: Clarity, Quality

Pearson correlation of Clarity and Quality = 0.075
P-Value = 0.208

Pearson’s r Rules of Thumb

• Strength and direction of relationship between X and Y

• 0 to .20: no or negligible correlation.
• .20 to .40: low degree of correlation.
• .40 to .60: moderate degree of correlation.
• .60 to .80: marked degree of correlation.
• .80 to 1.00: high correlation.

Regression Example
Stat>Regression>Regression…

Regression Analysis: Quality versus Clarity

The regression equation is
Quality = 11.7 + 1.02 Clarity

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 11.6524 | 0.7253 | 16.06 | 0.000 |
| Clarity | 1.0234 | 0.8118 | 1.26 | 0.208 |

S = 2.82408   R-Sq = 0.6%   R-Sq(adj) = 0.2%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 12.676 | 12.676 | 1.59 | 0.208 |
| Residual Error | 281 | 2241.094 | 7.975 | | |
| Total | 282 | 2253.770 | | | |
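The numbers in this output are tied together by simple arithmetic, which the following sketch checks using only the values printed in the Analysis of Variance table. The variable names are chosen for this example; nothing here reproduces Minitab's internals.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_regression = 12.676
ss_error = 2241.094
ss_total = 2253.770
df_regression, df_error = 1, 281

# R-Sq is the share of total variation explained by the regression.
r_sq = ss_regression / ss_total

# F is the regression mean square divided by the error mean square.
f_ratio = (ss_regression / df_regression) / (ss_error / df_error)

# With a single predictor, |r| is the square root of R-Sq.
r = math.sqrt(r_sq)

print(f"R-Sq = {100 * r_sq:.1f}%, F = {f_ratio:.2f}, r = {r:.3f}")
```

The results reproduce R-Sq = 0.6% and F = 1.59 from the regression output, and r = 0.075 from the separate Clarity and Quality correlation, showing that all three describe the same weak relationship.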

Regression Example 2
Stat>Regression>Fitted Line Plot…

Analyze - Continuous X / Continuous Y

Regression Analysis: Quality versus Clarity

The regression equation is
Quality = 11.65 + 1.023 Clarity

S = 2.82408   R-Sq = 0.6%   R-Sq(adj) = 0.2%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 12.68 | 12.6757 | 1.59 | 0.208 |
| Error | 281 | 2241.09 | 7.9754 | | |
| Total | 282 | 2253.77 | | | |

r² Rules of Thumb

• The “coefficient of determination”
• What percent of the variation in Y is due to X?
• less than or equal to .40 - not predictive
• .40 to .65 - mildly predictive
• .65 to .86 - moderately predictive
• .86 to 1 - strongly predictive

Residuals

• Regression uses a method called “least squares” to choose the line that minimizes the sum of the squared vertical distances from the points to the line.
• The distances between the points and the regression line are called “residuals.” The residuals represent the portion of Y that is not explained by the regression equation.
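A short sketch of least squares on a tiny data set may help here. The five (x, y) points are invented for illustration, and `fit_line` is a hypothetical helper, not a Minitab function.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]   # hypothetical X values
ys = [2, 4, 5, 4, 5]   # hypothetical Y values

intercept, slope = fit_line(xs, ys)

# Residual = observed Y minus the Y predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)   # the quantity least squares minimizes

print(f"Y = {intercept:.1f} + {slope:.1f} X")
print("residuals:", [round(e, 2) for e in residuals], "sum of squares:", round(sse, 2))
```

For a least-squares line with an intercept, the residuals always sum to zero, so it is their pattern and spread, not their total, that the residual plots examine.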

Residuals

• In Minitab, you can plot the residuals four ways. (also see 195-196 in The Lean Six Sigma Toolbook)

• Regression has three assumptions about residual “errors.” The errors:
1. Are random and independent
2. Are normally distributed
3. Have constant variance

Residuals: Errors are random and independent

Residuals versus order
1. Displayed in order collected
2. If order is immaterial, do not use this
3. Are the residuals random? Do they exhibit any patterns?

Residuals: Errors are normally distributed

Normal plot of residuals
1. Errors should follow a straight line on a normal probability plot
2. Use the “fat pencil” test. Would a fat pencil laid on the normal probability plot cover the data points?

Residuals: Errors have constant variance over all values of X

Residuals versus fits
1. Should show a random scatter and have no pattern
2. Should have roughly the same number of points above 0 as below

Flavor versus Quality

Correlations: Quality, Flavor

Pearson correlation of Quality and Flavor = 0.870
P-Value = 0.000

Regression Analysis: Quality versus Flavor

The regression equation is
Quality = 2.913 + 1.997 Flavor

S = 1.39575   R-Sq = 75.7%   R-Sq(adj) = 75.6%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 1706.35 | 1706.35 | 875.89 | 0.000 |
| Error | 281 | 547.42 | 1.95 | | |
| Total | 282 | 2253.77 | | | |

Analyze Tools: Continuous X / Continuous Y

[Figure: three scatterplots (B1 vs A1, B2 vs A2, B3 vs A3) illustrating perfect positive correlation (r = 1, r² = 1), perfect negative correlation (r = -1, r² = 1), and no correlation (r = 0, r² = 0).]

Data-Driven Analysis: Discrete X / Discrete Y

• Descriptive Statistics: counts and proportions

• Graphical display: bar graph and Pareto chart
– A Pareto chart is a type of bar graph where the categories are arranged from largest to smallest with a line indicating the cumulative percent

Contingency Tables

• χ²: the statistic used to test hypotheses about the frequency of some event
– Goodness of Fit: is observed different from expected?
– Test for independence: are samples from the same distribution?

Goodness of Fit Test

• Compare actual and expected frequencies
• Calculate the χ² statistic
• Compare to a χ² critical value from table
• If χ²calc > χ²crit, there is a difference

Calculate the χ² statistic

• χ² = the sum, over all g categories, of the squares of the differences between the actual (fo) and the expected (fe) frequencies, each divided by the expected frequency

$$\chi^2 = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e}$$

Coin-toss

• Will a fair coin tossed 100 times come up 66 times heads and 34 times tails?

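Coin-toss

| | Observed (fo) | Expected (fe) | (fo-fe)²/fe |
|---|---|---|---|
| Heads | 66 | 50 | (66-50)²/50 = 16²/50 = 256/50 = 5.12 |
| Tails | 34 | 50 | (34-50)²/50 = (-16)²/50 = 256/50 = 5.12 |
| Σ | | | 10.24 |

$$\chi^2_{calc} = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e} = 10.24$$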

Look up the χ² critical value

• First we must determine the degrees of freedom in the contingency table

• “Degrees of freedom” represents the number of values in the final calculation of a statistic that are free to vary

• DF=(rows in data-1)*(columns in data-1)

• In our example, the DF=1

Df/area 0.1 0.05 0.025 0.01 0.005

1 2.70554 3.84146 5.02389 6.6349 7.87944

2 4.60517 5.99146 7.37776 9.21034 10.59663

3 6.25139 7.81473 9.3484 11.34487 12.83816

4 7.77944 9.48773 11.14329 13.2767 14.86026

5 9.23636 11.0705 12.8325 15.08627 16.7496

Look up the χ² critical value

• If χ²calc > χ²crit, there is a difference
• χ²calc = 10.24
• χ²crit = 3.84
• There is a difference!
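The coin-toss goodness of fit calculation can be reproduced in a few lines. This sketch also computes a p-value, which is only valid for 1 degree of freedom because it relies on the identity between a χ² variable with DF=1 and a squared standard normal; the function names are illustrative.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value, valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi2 / 2))

# 100 tosses of a fair coin: 66 heads and 34 tails observed, 50 of each expected.
chi2_calc = chi_square_statistic(observed=[66, 34], expected=[50, 50])
chi2_crit = 3.84146   # from the table: DF = 1, area = 0.05
p = p_value_df1(chi2_calc)

print(f"chi-square = {chi2_calc:.2f}, critical = {chi2_crit}, p = {p:.5f}")
print("There is a difference" if chi2_calc > chi2_crit else "No difference detected")
```

Comparing the statistic to the critical value and comparing the p-value to 0.05 are two views of the same decision, so they always agree.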

69

Chi-Square Test for Independence

• Goodness of Fit asked if frequencies were different than expected

• Test for Independence asks whether our samples come from the same population

• Example: Students in a Six Sigma Black Belt course are offered two different time slots for taking their final exam. Is there a difference in the passing and failing rates for each group?

• State the null and alternative hypotheses for this problem.

Chi-Square Test for Independence

• We use the same formula, but calculate the expected frequencies differently

$$\chi^2 = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e}$$

Test for Independence

• Arrange the data in a table, showing observed frequencies
• Calculate the expected frequencies for each cell
• Calculate the χ² statistic in each cell
• Sum the χ² statistic from each cell
• Compare to a χ² critical value from table
• If χ²calc > χ²crit, there is a difference

Calculating fe

The expected frequency for each cell is its row total times its column total, divided by the grand total N:

$$f_e = \frac{f_{row} \times f_{column}}{N}$$

For example, for the 1st test: fe (passing) = (70*60)/180 = 23.33 and fe (failing) = (120*70)/180 = 46.67.

| | Number passing | Number failing | Total |
|---|---|---|---|
| 1st test | fo=20, fe=23.33 | fo=50, fe=46.67 | fo=70 |
| 2nd test | fo=40, fe=36.67 | fo=70, fe=73.33 | fo=110 |
| Total | fo=60 | fo=120 | fo=180 |

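Calculate the χ² statistic for each cell

| | Number passing | Number failing | Total |
|---|---|---|---|
| 1st test | fo=20, fe=23.33, (fo-fe)²/fe = .476 | fo=50, fe=46.67, (fo-fe)²/fe = .238 | fo=70 |
| 2nd test | fo=40, fe=36.67, (fo-fe)²/fe = .303 | fo=70, fe=73.33, (fo-fe)²/fe = .151 | fo=110 |
| Total | fo=60 | fo=120 | fo=180 |

$$\chi^2_{calc} = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e} = 1.169$$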

Df/area 0.1 0.05 0.025 0.01 0.005

1 2.70554 3.84146 5.02389 6.6349 7.87944

2 4.60517 5.99146 7.37776 9.21034 10.59663

3 6.25139 7.81473 9.3484 11.34487 12.83816

4 7.77944 9.48773 11.14329 13.2767 14.86026

5 9.23636 11.0705 12.8325 15.08627 16.7496

Look up the χ² critical value

• If χ²calc > χ²crit, there is a difference
• χ²calc = 1.169
• χ²crit = 3.84
• There is no difference! Therefore, we fail to reject the null hypothesis. Ho: pass and fail rates are independent of the time the test was administered.
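The whole test for independence on the exam data can be sketched as follows. The observed counts are the ones from the table; the function names are chosen for this example only.

```python
def expected_counts(table):
    """Expected frequency per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (fo - fe)^2 / fe over every cell of the table."""
    expected = expected_counts(table)
    return sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))

# Rows: 1st test, 2nd test. Columns: number passing, number failing.
observed = [[20, 50],
            [40, 70]]

expected = expected_counts(observed)
chi2_calc = chi_square(observed)
chi2_crit = 3.84146   # DF = (2-1)*(2-1) = 1, area = 0.05

print([[round(fe, 2) for fe in row] for row in expected])
print(f"chi-square = {chi2_calc:.3f}")
print("There is a difference" if chi2_calc > chi2_crit else "Fail to reject Ho")
```

Because the function works from row and column totals, the same sketch applies to larger tables, as long as the critical value is looked up for the matching degrees of freedom.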

Cramer’s test

• Quantifies the strength of the association between X and Y

$$\theta = \sqrt{\frac{\chi^2_{calc}}{n(q-1)}}$$

Where:

n = total number of observations

q = lesser of rows or columns

For the exam example:

$$\theta = \sqrt{\frac{1.169}{180(1)}} = \sqrt{0.00649} = 0.0806$$

Describing the strength of association:
• .5 to 1: high association
• .3 to .5: moderate association
• .1 to .3: low association
• 0 to .1: little if any association
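As a quick check of the Cramer's calculation, the sketch below plugs in the exam example values (χ²calc = 1.169, n = 180, a 2 x 2 table). The function name is an illustrative choice.

```python
import math

def cramers_theta(chi2, n, rows, cols):
    """Strength of association: sqrt(chi2 / (n * (q - 1))), q = lesser of rows or columns."""
    q = min(rows, cols)
    return math.sqrt(chi2 / (n * (q - 1)))

theta = cramers_theta(chi2=1.169, n=180, rows=2, cols=2)
print(f"theta = {theta:.4f}")
```

The result falls in the 0 to .1 band, little if any association, which agrees with the failure to reject the null hypothesis.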

Hypothesis Testing: Discrete X / Discrete Y

For: Comparing one proportion to a given value

Ho: The proportion is equal to a given percentage

Ha: The proportion is not equal to a given percentage

Statistical Test: 1 Proportion

Test Statistic: Z score – based on the area under the curve of a normal distribution

Hypothesis Testing: Discrete X / Discrete Y

For: Comparing two proportions

Ho: The proportion of group A equals the proportion of group B (PA = PB)

Ha: The proportion of group A does not equal the proportion of group B (PA ≠ PB)

Statistical Test: Test of Proportions

Test Statistic: Z Score – based on the area under the curve of a normal distribution
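A two-proportion Z score can be sketched with the pass rates from the exam example (20 of 70 passing the 1st test, 40 of 110 passing the 2nd). This uses the pooled-proportion form of the test, which is one common choice and an assumption here; Minitab offers more than one option, so its output may differ slightly.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z score and its two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Passing counts and group sizes from the exam example.
z, p_value = two_proportion_z(20, 70, 40, 110)
print(f"Z = {z:.3f}, Z squared = {z * z:.3f}, p-value = {p_value:.3f}")
```

For a 2 x 2 table the square of this Z score equals the χ² statistic from the test for independence (about 1.169), so both tests lead to the same conclusion: fail to reject Ho.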

Hypothesis Testing: Discrete X / Discrete Y

• Considerations
– For contingency tables, the expected cell count should be at least 5
– For proportions tests, if you do not have enough successes or failures in your numerator, consider using Fisher’s Exact Test
– Generally, np > 5 and n(1-p) > 5 is a minimum standard
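The np > 5 rule is easy to check before running a proportions test. The sketch below uses the mislabeling rate from earlier in the deck (8 out of 10,000) with two sample sizes; the sample size of 1,000 is a hypothetical contrast case, and the function name is illustrative.

```python
def normal_approximation_ok(n, p, minimum=5):
    """True when both np and n(1-p) exceed the minimum standard."""
    return n * p > minimum and n * (1 - p) > minimum

rate = 8 / 10_000   # mislabeled specimens per specimen collected

large_sample = normal_approximation_ok(10_000, rate)   # np = 8
small_sample = normal_approximation_ok(1_000, rate)    # np = 0.8, hypothetical smaller sample

print(large_sample, small_sample)
```

With rare defects, a sample that looks large can still fail the check, which is the situation where Fisher’s Exact Test is worth considering.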

Six Sigma Analyze:

Remember, statistical analysis and testing within the context of practically applying Lean Six Sigma is about using data to identify the Key Xs to “fix” that will most likely result in a measurable improvement in the process Y (output), which in turn will improve customer satisfaction and efficiency.

Key Deliverables for Analyze

• Main elements of Define and Measure completed
• “Obvious Xs” identified and confirmed
• Potential Xs identified, data collected and analyzed
• Root causes investigated and supported with data – the Xs to improve

Project Name:
Problem Statement: Mislabeled example
Project Scope: Enter scope description

Author: Enter Name
Date: April 19, 2023

Champion: Name
Process Owner: Name
Black Belt: Name
Green Belts: Names

Customer(s):
CTQ(s):
Defect(s):
Beginning DPMO:
Target DPMO:
Estimated Benefits:
Actual Benefits:

Status key: Not Complete / Complete / Not Applicable

Define (Start Date: Enter Date, End Date: Enter Date)
• Benchmark Analysis
• Project Charter
• Formal Champion Approval of Charter (signed)
• SIPOC - High Level Process Map
• Customer CTQs
• Initial Team meeting (kickoff)

Measure (Start Date: Enter Date, End Date: Enter Date)
• Identify Project Y(s)
• Identify Possible Xs (possible cause and effect relationships)
• Develop & Execute Data Collection Plan
• Measurement System Analysis
• Establish Baseline Performance

Analyze (Start Date: Enter Date, End Date: Enter Date)
• Identify Vital Few Root Causes of Variation Sources & Improvement Opportunities
• Define Performance Objective(s) for Key Xs
• Quantify potential $ Benefit

Improve (Start Date: Enter Date, End Date: Enter Date)
• Generate Solutions
• Prioritize Solutions
• Assess Risks
• Test Solutions
• Cost Benefit Analysis
• Develop & Implement Execution Plan
• Formal Champion Approval

Control (Start Date: Enter Date, End Date: Enter Date)
• Implement Sustainable Process Controls – Validate: Control System, Monitoring Plan, Response Plan, System Integration Plan
• $ Benefits Validated
• Formal Champion Approval and Report Out

Directions:
• Replace all of the italicized, black text with your project’s information
• Change the blank box into a check mark by clicking on Format>Bullets and Numbering and changing the bullet.

Six Sigma Analyze:

Now, what specifically are we going to improve in the Improve Phase?

We should have evidence (data) to support what we are improving and why.