DMAIC: Improve
Robert Setaputra
Objective
Develop, test, and implement solutions that improve the process by reducing the variation in the critical output variables caused by the vital few input variables.
Small note
In many cases, it is difficult to completely separate the activities in Measure, Analyze, and Improve.
Design of Experiment (DOE)
DOE is a collection of statistical methods for studying how independent variables (also called factors, input variables, or process variables) and their interactions affect a dependent variable (or CTQ).
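Design of Experiment (DOE)
[Figure: example experiment layout. Factors: Temperature and Rainfall. Levels: High and Low for each factor. Replications: repeated observations for the same factor-level combination, e.g., 23.5 and 24.6.]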
Design of Experiment (DOE)
Full factorial
• All possible combinations
• No prior knowledge about the subject
• $2^k$ = k factors, each with 2 levels
• $2^2$ = 2 factors, each with 2 levels
Fractional factorial
• Excluding some combinations
• Preferred when it is costly to do experiments
• $2^{k-1}$ = half fraction of the $2^k$ design: still k factors, each with 2 levels, but only half of the runs
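To make the run counts concrete, here is a minimal Python sketch that enumerates a full factorial design and one possible half fraction. The coding of levels as -1 (low) and +1 (high), the function names, and the choice of generator (last factor = product of the others) are assumptions for illustration and do not come from the slides.

```python
from itertools import product

def full_factorial(k):
    """Return all 2**k runs for k two-level factors, coded -1 (low) and +1 (high)."""
    return list(product((-1, 1), repeat=k))

def half_fraction(k):
    """Return a 2**(k-1) half fraction of the 2**k design.

    Assumption: the last factor is set to the product of the first k-1
    factors, which is one common generator choice among several.
    """
    runs = []
    for base in full_factorial(k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs

full_2 = full_factorial(2)   # 2 factors, e.g. Temperature and Rainfall
full_3 = full_factorial(3)   # 3 factors, all combinations
half_3 = half_fraction(3)    # 3 factors, half of the combinations

print(len(full_2), len(full_3), len(half_3))  # 4 8 4
for run in half_3:
    print(run)
```

The half fraction still studies three factors but needs only four runs; the price is that some effects can no longer be separated from each other.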
Design of Experiment (DOE)
ANOVA: One Factor
ANOVA: Two Factor
Remember Gage R&R with ANOVA?
Correlation Coefficient
The sample correlation coefficient ($r$) measures the degree of linearity in the relationship between $X$ and $Y$.
$-1 \le r \le +1$
$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
Correlation Analysis
Notes on Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other, and vice versa.
Notes on Correlation Coefficients
[Figure: scatter plot] Obviously, the plot above shows no correlation between X and Y.
[Figure: scatter plot] How about this one? Do you think there is no correlation between X and Y? Remember that $r_{xy}$ only measures linear correlation.
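The point about linear correlation can be checked numerically. The sketch below uses made-up data (an assumption, not the data behind the slide's plot) in which Y is completely determined by X through Y = X squared, yet the correlation coefficient is zero.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r_xy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect but nonlinear (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)  # 0.0
```

A value of r near zero therefore rules out a linear relationship only; a scatter plot is still needed to spot curved patterns.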
Example
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69
Example (cont’d)
             x        y       $(x_i - \bar{x})$    $(y_i - \bar{y})$    $(x_i - \bar{x})(y_i - \bar{y})$
             277.6    69      10.65                -1.0                 -10.65
             259.5    71      -7.45                1.0                  -7.45
             269.1    70      2.15                 0                    0
             267.0    70      0.05                 0                    0
             255.6    71      -11.35               1.0                  -11.35
             272.9    69      5.95                 -1.0                 -5.95
Average      267.0    70.0
Std. Dev.    8.2192   .8944
Total                                                                   -35.40
Example
Correlation Coefficient
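$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{-35.4}{\sqrt{(337.775)(4)}} = \frac{-35.4}{36.7573} = -0.9631$$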
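The golfer calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the six data pairs from the example; the function and variable names are illustrative.

```python
import math

# Data from the golfer example.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score

def correlation(xs, ys):
    """Sample correlation coefficient r_xy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(distance, score)
print(round(r, 4))  # -0.9631
```

The strong negative value says that, in this sample, longer average drives go together with lower (better) scores.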
Regression Analysis
Simple Regression Analysis: one predictor and one response.
Multiple Regression Analysis: two or more predictors and one response.
Simple Linear Regression
Analyzes the relationship between two variables
It specifies one dependent (response) variable and one independent (predictor) variable
Simple Linear Regression
Regression Model and Parameters
Unknown parameters are:
• $\beta_0$: intercept
• $\beta_1$: slope
The assumed model for a linear relationship is:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
for all observations ($i = 1, 2, \ldots, n$)
Estimations
The fitted model used to predict the expected value of $Y$ for a given value of $X$ is:
$$\hat{y}_i = b_0 + b_1 x_i$$
The fitted coefficients are:
• $b_0$: the estimated intercept
• $b_1$: the estimated slope
Formulas
$$\hat{y}_i = b_0 + b_1 x_i$$
where:
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
$$b_0 = \bar{y} - b_1 \bar{x}$$
Example
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
Example
Slope
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{20}{4} = 5$$
Intercept
$$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$$
Estimated regression equation
$$\hat{y} = 10 + 5x$$
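The slope and intercept for the Reed Auto data can be checked with a short Python sketch that applies the least-squares formulas directly. The function name `fit_line` is illustrative.

```python
# Data from the Reed Auto example.
ads = [1, 3, 2, 1, 3]         # number of TV ads
cars = [14, 24, 18, 17, 27]   # number of cars sold

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = fit_line(ads, cars)
print(b0, b1)        # 10.0 5.0
print(b0 + b1 * 2)   # predicted cars sold with 2 ads: 20.0
```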
Assessing the Fit Relationship Among SST, SSR, SSE
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
R2 or Coefficient of Determination
• $R^2$ is a measure of relative fit based on a comparison of SSR and SST.
• $0 \le R^2 \le 1$
• $R^2 = 1$ means that the regression fits perfectly (x explains 100% of the variation in y).
R2 or Coefficient of Determination
$R^2$ = SSR/SST
where:
SSR = sum of squares due to regression
SST = total sum of squares
Note that in a simple regression, $R^2 = (r)^2$.
Example
In the Reed Auto example, the coefficient of determination $R^2$ is
$R^2$ = SSR/SST = 100/114 = .8772
The regression relationship is very strong; 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
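The sums of squares behind this result can be computed directly from the data. The sketch below assumes the fitted line with intercept 10 and slope 5 from the Reed Auto example and checks that SST = SSR + SSE.

```python
# Data and fitted coefficients from the Reed Auto example.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

mean_y = sum(cars) / len(cars)
fitted = [b0 + b1 * x for x in ads]

sst = sum((y - mean_y) ** 2 for y in cars)              # total sum of squares
ssr = sum((f - mean_y) ** 2 for f in fitted)            # sum of squares due to regression
sse = sum((y - f) ** 2 for y, f in zip(cars, fitted))   # sum of squares due to error

r_squared = ssr / sst
print(sst, ssr, sse)        # 114.0 100.0 14.0
print(round(r_squared, 4))  # 0.8772
```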
Hypothesis Testing
We need to determine whether x is statistically significant to y.
To test for significance, we must conduct a hypothesis test to determine whether the slope (estimated by $b_1$) is different from zero or not.
Regression Using Excel (Reed Auto – previous TV ads example)
Regression Statistics
Multiple R           0.936585812
R Square             0.877192982
Adjusted R Square    0.83625731
Standard Error       2.160246899
Observations         5

ANOVA
              df    SS     MS             F              Significance F
Regression    1     100    100            21.42857143    0.018986231
Residual      3     14     4.666666667
Total         4     114

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept       10              2.366431913       4.225771274    0.024236012    2.468950436    17.53104956    2.468950436    17.53104956
X Variable 1    5               1.08012345        4.629100499    0.018986231    1.562561893    8.437438107    1.562561893    8.437438107
>> Tools >> Data Analysis >> Regression
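Several entries of the Excel output can be reproduced by hand from the Reed Auto sums of squares. The following Python sketch uses the standard simple-regression formulas for the standard errors; the variable names are illustrative, and the inputs (SSE = 14, SSR = 100, sum of squared x deviations = 4, mean of x = 2) are taken from the example.

```python
import math

# Quantities from the Reed Auto example.
n, k = 5, 1                 # observations, number of predictors
ssr, sse = 100.0, 14.0      # sums of squares due to regression and error
sxx, mean_x = 4.0, 2.0      # sum of (x - mean_x)^2, and mean of x
b0, b1 = 10.0, 5.0          # fitted intercept and slope

mse = sse / (n - k - 1)                            # residual mean square
std_error = math.sqrt(mse)                         # "Standard Error" in Regression Statistics
se_b1 = std_error / math.sqrt(sxx)                 # standard error of the slope
se_b0 = std_error * math.sqrt(1 / n + mean_x ** 2 / sxx)  # standard error of the intercept

t_b1 = b1 / se_b1
t_b0 = b0 / se_b0
f_stat = (ssr / k) / mse

print(round(std_error, 6), round(se_b0, 6), round(se_b1, 6))
print(round(t_b0, 6), round(t_b1, 6), round(f_stat, 6))
```

In a simple regression the F statistic equals the square of the slope's t statistic, which is why "Significance F" and the slope's P-value are identical in the output.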
Interpreting the result
The regression equation is:
$\hat{y} = 10 + 5x$
The above means that when x = 2, the model predicts y (that is, $\hat{y}$) to be 20.
$R^2$ = 0.8772 means that X explains 87.72% of the variation in Y.
Interpreting the result
Is the slope ($b_1$) statistically significant?
The above question can be rewritten as: Is the slope ($b_1$) statistically different from zero? We know that the slope is 5, but our interest is to check whether this value, 5, is statistically different from zero or not.
Hypothesis:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
The p-value for $b_1$ is 0.01899. Using $\alpha$ = 0.05, we reject $H_0$ (since $\alpha$ > p-value). Therefore we conclude that the slope is not equal to zero. It means that X has a statistically significant effect on Y.
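The p-value used in this decision can be reproduced without Excel. The sketch below relies on the closed-form CDF of Student's t distribution for exactly 3 degrees of freedom (n - 2 = 3 in the Reed Auto example); it is an illustration for this special case only and is not a general t-distribution routine. The function names are hypothetical.

```python
import math

def t_cdf_df3(t):
    """CDF of Student's t distribution with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi

def two_sided_p_value_df3(t_stat):
    """Two-sided p-value for a t statistic with 3 degrees of freedom."""
    return 2 * (1 - t_cdf_df3(abs(t_stat)))

t_stat = 4.629100499   # t Stat for the slope in the Excel output
alpha = 0.05

p_value = two_sided_p_value_df3(t_stat)
reject_h0 = p_value < alpha

print(round(p_value, 5))
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

For other sample sizes the degrees of freedom change, so a statistics package or a t table would be needed instead of this special-case formula.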
Reading ANOVA table
Note that in this case k = 1 (one predictor).
Multiple Regression
Multiple regression is simply an extension of bivariate regression.
Multiple regression includes more than one independent variable.
Same concepts as in Bivariate Analysis.
Multiple Regression
• Y is the response variable and is assumed to be related to the k predictors (X1, X2, … Xk)
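• Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$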
Example: (Y is Price)
Regression Statistics
Multiple R           0.975237836
R Square             0.951088837
Adjusted R Square    0.947465788
Standard Error       20.98830728
Observations         30

ANOVA
              df    SS             MS          F              Significance F
Regression    2     231276.5945    115638.3    262.5106096    2.02772E-18
Residual      27    11893.74415    440.509
Total         29    243170.3387

             Coefficients    Standard Error    t Stat       P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept    -23.20585344    30.51527447       -0.760467    0.453565781    -85.81798287    39.406276      -85.81798287    39.406276
SqFt         0.187072549     0.012524347       14.93671     1.42561E-14    0.161374728     0.21277037     0.161374728     0.21277037
LotSize      6.603089858     1.465180793       4.506672     0.00011462     3.596789208     9.609390509    3.596789208     9.609390509
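As a rough illustration of how the coefficient table is used, the Python sketch below recomputes each t statistic as coefficient divided by standard error and evaluates the estimated regression equation for one house. The input values (2000 for SqFt, 10 for LotSize) are hypothetical, and the units of Price and LotSize are not stated in the output, so the prediction is only a demonstration of the arithmetic.

```python
# (coefficient, standard error) pairs copied from the Excel output.
coefficients = {
    "Intercept": (-23.20585344, 30.51527447),
    "SqFt": (0.187072549, 0.012524347),
    "LotSize": (6.603089858, 1.465180793),
}

# t Stat = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

def predict_price(sqft, lot_size):
    """Estimated regression equation: Price-hat = b0 + b1 * SqFt + b2 * LotSize."""
    b0 = coefficients["Intercept"][0]
    b1 = coefficients["SqFt"][0]
    b2 = coefficients["LotSize"][0]
    return b0 + b1 * sqft + b2 * lot_size

for name, t in t_stats.items():
    print(name, round(t, 4))

# Hypothetical house: 2000 SqFt, LotSize of 10 (units as in the data set).
print(round(predict_price(2000, 10), 2))
```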
Example (cont’d)
Is SqFt significantly affecting Price?
Hypothesis:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
The p-value for $b_1$ is 1.42561E-14, that is $1.426 \times 10^{-14}$, or 0.0000 to four decimal places. Using $\alpha$ = 0.05, we reject $H_0$ (since $\alpha$ > p-value). Therefore we conclude that the slope is not equal to zero. It means that SqFt has a statistically significant effect on Price.
Example (cont’d)
Is LotSize significantly affecting Price?
Hypothesis:
$H_0: \beta_2 = 0$
$H_a: \beta_2 \ne 0$
The p-value for $b_2$ is 0.00011462. Using $\alpha$ = 0.05, we reject $H_0$ (since $\alpha$ > p-value). Therefore we conclude that the slope is not equal to zero. It means that LotSize has a statistically significant effect on Price.
Reading ANOVA table
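One way to read the ANOVA table of the Price example is to rebuild its derived entries from the two sums of squares. The sketch below assumes k = 2 predictors (SqFt and LotSize) and n = 30 observations, as shown in the Excel output; the variable names are illustrative.

```python
import math

n, k = 30, 2                            # observations, predictors
ssr, sse = 231276.5945, 11893.74415     # SS for Regression and Residual rows
sst = ssr + sse                         # SS for Total row

df_regression = k
df_residual = n - k - 1
df_total = n - 1

msr = ssr / df_regression               # MS for Regression
mse = sse / df_residual                 # MS for Residual
f_stat = msr / mse                      # F

r_squared = ssr / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
std_error = math.sqrt(mse)

print(df_regression, df_residual, df_total)   # 2 27 29
print(round(msr, 1), round(mse, 3), round(f_stat, 4))
print(round(r_squared, 6), round(adj_r_squared, 6), round(std_error, 5))
```

The degrees of freedom follow the pattern k, n - k - 1, and n - 1, which reduces to 1, n - 2, and n - 1 in the simple regression case where k = 1.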