DMAIC: Improve

Post on 03-Jan-2016

Robert Setaputra

Objective

Develop, test, and implement solutions that improve the process by reducing the variation in the critical output variables caused by the vital few input variables.

Small note

In many cases, it is difficult to completely separate the activities in Measure, Analyze, and Improve.

Design of Experiment (DOE)

DOE is a collection of statistical methods for studying how independent variables (also called factors, input variables, or process variables) and their interactions affect a dependent variable (the response, or CTQ).

Design of Experiment (DOE)

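[Figure: a two-factor experiment. Factors: Temperature and Rainfall. Levels: High and Low for each factor. Replications: repeated measurements within a cell, e.g. 23.5 and 24.6.]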

Design of Experiment (DOE)

Full factorial
• All possible combinations
• No prior knowledge about the subject
• $2^k$ = k factors each with 2 levels
• $2^2$ = 2 factors each with 2 levels

Fractional factorial
• Excluding some combinations
• Preferred when it is costly to do experiments
• $2^{k-1}$ = half fraction: k factors each with 2 levels, run with half as many combinations as the full $2^k$ design
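As a minimal sketch of the two designs, the runs can be enumerated in Python. The factor names Temperature and Rainfall come from the slide; the generic factors A, B, C, the function names, and the rule used to pick the half fraction (keep runs whose coded levels multiply to +1) are assumptions for illustration only.

```python
from itertools import product
from math import prod

def full_factorial(factors):
    """Return every combination of coded levels (-1 = Low, +1 = High)."""
    return [dict(zip(factors, levels))
            for levels in product((-1, +1), repeat=len(factors))]

def half_fraction(factors):
    """Keep only the runs whose coded levels multiply to +1.

    This is one common way to choose a half fraction; other choices exist.
    """
    return [run for run in full_factorial(factors)
            if prod(run.values()) == +1]

two_factor = full_factorial(["Temperature", "Rainfall"])  # 2^2 = 4 runs
three_factor = full_factorial(["A", "B", "C"])            # 2^3 = 8 runs
three_factor_half = half_fraction(["A", "B", "C"])        # 2^(3-1) = 4 runs

for run in two_factor:
    print(run)
```

The half fraction still studies all three factors but needs only four runs instead of eight, which is why it is attractive when each experiment is costly.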

Design of Experiment (DOE)

ANOVA: One Factor

ANOVA: Two Factor

Remember Gage R&R with ANOVA?

Correlation Coefficient

The sample correlation coefficient ($r$) measures the degree of linearity in the relationship between $X$ and $Y$.

$-1 \le r \le +1$

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}}$$

Correlation Analysis

Notes on Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other, and vice versa.

Notes on Correlation Coefficients

Obviously, the above shows no correlation between X and Y.

How about this one? Do you think there is no correlation between X and Y? Remember that $r_{xy}$ only measures linear correlation.
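To illustrate that point, here is a small sketch with made-up data (not from the slides): y is completely determined by x through a quadratic relationship, yet the sample correlation coefficient is zero because the relationship is not linear. The helper name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient r_xy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative data only: x is symmetric around zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_quadratic = [xi ** 2 for xi in x]    # depends on x, but not linearly
y_linear = [2 * xi + 1 for xi in x]    # exact straight line

r_quadratic = pearson_r(x, y_quadratic)
r_linear = pearson_r(x, y_linear)
print(r_quadratic, r_linear)
```

A value of r near zero therefore rules out a linear relationship only; a scatter plot is still needed to spot curved patterns.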

Example

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving      Average
Distance (yds.)      18-Hole Score
277.6                69
259.5                71
269.1                70
267.0                70
255.6                71
272.9                69

Example (cont’d)

            x        y        (x_i - x̄)   (y_i - ȳ)   (x_i - x̄)(y_i - ȳ)
            277.6    69        10.65       -1.0        -10.65
            259.5    71        -7.45        1.0         -7.45
            269.1    70         2.15        0            0
            267.0    70         0.05        0            0
            255.6    71       -11.35        1.0        -11.35
            272.9    69         5.95       -1.0         -5.95
Average     267.0    70.0                      Total   -35.40
Std. Dev.   8.2192   0.8944

Example

Correlation Coefficient

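$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}} = \frac{-35.4}{\sqrt{(337.775)(4)}} = \frac{-35.4}{36.7573} = -0.9631$$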
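The golf calculation can be checked step by step in Python. This is only a sketch that mirrors the columns of the worked table; the variable names are assumptions, while the data are the six observations given in the example.

```python
from math import sqrt

distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(score) / n

# Sum of cross-products and sums of squared deviations
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score))
s_xx = sum((x - x_bar) ** 2 for x in distance)
s_yy = sum((y - y_bar) ** 2 for y in score)

r_xy = s_xy / sqrt(s_xx * s_yy)
print(round(s_xy, 2), round(s_xx, 3), round(s_yy, 1), round(r_xy, 4))
```

The strongly negative value says that, in this sample, longer average drives go with lower (better) scores; it does not by itself show that driving distance causes the lower scores.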

Regression Analysis

Simple Regression Analysis: one predictor and one response.

Multiple Regression Analysis: two or more predictors and one response.

Simple Linear Regression

Analyzes the relationship between two variables

It specifies one dependent (response) variable and one independent (predictor) variable

Simple Linear Regression

Regression Model and Parameters

The assumed model for a linear relationship is:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

for all observations ($i = 1, 2, \ldots, n$).

Unknown parameters are:
$\beta_0$ Intercept
$\beta_1$ Slope

Estimations

The fitted model used to predict the expected value of $Y$ for a given value of $X$ is:

$$\hat{y}_i = b_0 + b_1 x_i$$

The fitted coefficients are:
$b_0$ the estimated intercept
$b_1$ the estimated slope

Formulas

$$\hat{y}_i = b_0 + b_1 x_i$$

where:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$

Example

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

Number of TV Ads     Number of Cars Sold
1                    14
3                    24
2                    18
1                    17
3                    27

Example

Slope

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{20}{4} = 5$$

Intercept

$$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$$

Estimated regression equation

$$\hat{y} = 10 + 5x$$
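The slope and intercept formulas can be sketched in a few lines of Python and applied to the Reed Auto data. The function name `fit_line` is an assumption for illustration; the data are the five sales from the example.

```python
ads = [1, 3, 2, 1, 3]         # number of TV ads
cars = [14, 24, 18, 17, 27]   # number of cars sold

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

b0, b1 = fit_line(ads, cars)
print(f"y_hat = {b0:g} + {b1:g} x")
```

Each additional TV ad is associated with about five more cars sold in this sample, and the intercept is the predicted sales with no ads, which lies outside the observed range of 1 to 3 ads.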

Assessing the Fit: Relationship Among SST, SSR, SSE

SST = SSR + SSE

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

$R^2$ or Coefficient of Determination

• $R^2$ is a measure of relative fit based on a comparison of SSR and SST.
• $0 \le R^2 \le 1$
• $R^2 = 1$ means that the regression fits perfectly (x explains 100% of the variation in y).

$R^2$ or Coefficient of Determination

$$R^2 = \frac{SSR}{SST}$$

where:
SSR = sum of squares due to regression
SST = total sum of squares

Note that in a simple regression, $R^2 = (r)^2$.

Example

In the Reed Auto example, the coefficient of determination, $R^2$, is

$$R^2 = SSR/SST = 100/114 = 0.8772$$

The regression relationship is very strong; 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
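As a sketch, the three sums of squares and $R^2$ for the Reed Auto example can be computed directly from the data and the fitted line $\hat{y} = 10 + 5x$. Variable names are assumptions for illustration.

```python
ads = [1, 3, 2, 1, 3]         # number of TV ads
cars = [14, 24, 18, 17, 27]   # number of cars sold
b0, b1 = 10, 5                # fitted intercept and slope from the example

y_bar = sum(cars) / len(cars)
fitted = [b0 + b1 * x for x in ads]

sst = sum((y - y_bar) ** 2 for y in cars)             # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)           # due to regression
sse = sum((y - f) ** 2 for y, f in zip(cars, fitted)) # due to error

r_squared = ssr / sst
print(sst, ssr, sse, round(r_squared, 4))
```

The identity SST = SSR + SSE holds here (114 = 100 + 14), so $R^2$ can equivalently be computed as 1 - SSE/SST.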

Hypothesis Testing

We need to determine whether $x$ is statistically significant to $y$.

To test for significance, we must conduct a hypothesis test to determine whether the value of $b_1$ is statistically different from zero or not.

Regression Using Excel (Reed Auto – previous TV ads example)

Regression Statistics
Multiple R           0.936585812
R Square             0.877192982
Adjusted R Square    0.83625731
Standard Error       2.160246899
Observations         5

ANOVA
             df    SS     MS            F             Significance F
Regression   1     100    100           21.42857143   0.018986231
Residual     3     14     4.666666667
Total        4     114

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      10             2.366431913      4.225771274   0.024236012   2.468950436   17.53104956   2.468950436   17.53104956
X Variable 1   5              1.08012345       4.629100499   0.018986231   1.562561893   8.437438107   1.562561893   8.437438107

>> Tools >> Data Analysis >> Regression
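For readers who want to see where several of the numbers in the Excel output come from, here is a sketch that recomputes them with the standard simple-regression formulas. It is not how Excel is implemented; the variable names are assumptions, and the p-values are left out because they require the t and F distributions.

```python
from math import sqrt

ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
n, k = len(ads), 1            # observations and number of predictors
b0, b1 = 10, 5                # fitted coefficients from the example

x_bar = sum(ads) / n
y_bar = sum(cars) / n
s_xx = sum((x - x_bar) ** 2 for x in ads)
sst = sum((y - y_bar) ** 2 for y in cars)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
ssr = sst - sse

msr = ssr / k                              # mean square, regression
mse = sse / (n - k - 1)                    # mean square, residual
f_stat = msr / mse                         # F in the ANOVA table
std_error = sqrt(mse)                      # "Standard Error" of the regression
r_squared = ssr / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

se_b1 = std_error / sqrt(s_xx)                        # standard error of the slope
se_b0 = std_error * sqrt(1 / n + x_bar ** 2 / s_xx)   # standard error of the intercept
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0

print(round(f_stat, 4), round(std_error, 4), round(adj_r_squared, 4))
print(round(se_b0, 4), round(t_b0, 4), round(se_b1, 4), round(t_b1, 4))
```

In a simple regression the F statistic equals the square of the slope's t statistic, which is why "Significance F" and the slope's P-value are identical in the output.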

Interpreting the result

The regression equation is:

y = 10 + 5x

The above means that when x = 2, the model predicts y (that is, $\hat{y}$) to be 20.

$R^2 = 0.8772$ means that X explains 87.72% of the variation in Y.

Interpreting the result

Is the slope ($b_1$) statistically significant?

The above question can be rewritten as: is the slope statistically different from zero? We know that the estimated slope is 5, but our interest is to check whether this value, 5, is statistically different from zero or not.

Hypothesis:
$H_o: \beta_1 = 0$
$H_a: \beta_1 \ne 0$

The p-value for $b_1$ is 0.01899. Using $\alpha = 0.05$, we reject $H_o$ (since $\alpha$ > p-value). Therefore we conclude that the slope is not equal to zero, which means that X has a statistically significant effect on Y.
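The decision rule can be sketched in Python. The two-sided p-value is recomputed from the slope's t statistic using the closed-form CDF of Student's t distribution, which in this simple form is valid only for exactly 3 degrees of freedom (n - 2 = 3 in the Reed Auto example). The function name is hypothetical, and for other sample sizes a statistics library would be needed instead.

```python
from math import atan, pi, sqrt

def t_cdf_3df(t):
    """CDF of Student's t distribution with exactly 3 degrees of freedom."""
    u = t / sqrt(3)
    return 0.5 + (u / (1 + u * u) + atan(u)) / pi

t_stat = 4.629100499   # t Stat for the slope, from the regression output
alpha = 0.05           # significance level

# Two-sided test of H_o: beta_1 = 0 against H_a: beta_1 != 0
p_value = 2 * (1 - t_cdf_3df(abs(t_stat)))
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```

Because the p-value is below $\alpha$, the sketch reaches the same conclusion as the slide: reject $H_o$.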

Reading ANOVA table

Note that in this case K = 1

Multiple Regression

Multiple regression is simply an extension of bivariate regression.

Multiple regression includes more than one independent variable.

Same concepts as in Bivariate Analysis.

Multiple Regression

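• Y is the response variable and is assumed to be related to the k predictors ($X_1, X_2, \ldots, X_k$)

• Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$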

Example: (Y is Price)

Regression Statistics
Multiple R           0.975237836
R Square             0.951088837
Adjusted R Square    0.947465788
Standard Error       20.98830728
Observations         30

ANOVA
             df    SS             MS          F             Significance F
Regression   2     231276.5945    115638.3    262.5106096   2.02772E-18
Residual     27    11893.74415    440.509
Total        29    243170.3387

            Coefficients    Standard Error   t Stat      P-value       Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept   -23.20585344    30.51527447      -0.760467   0.453565781   -85.81798287   39.406276     -85.81798287   39.406276
SqFt        0.187072549     0.012524347      14.93671    1.42561E-14   0.161374728    0.21277037    0.161374728    0.21277037
LotSize     6.603089858     1.465180793      4.506672    0.00011462    3.596789208    9.609390509   3.596789208    9.609390509
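As a sketch of how the estimated regression equation is used, the coefficients from the output can be plugged into a small prediction function. The input values (SqFt = 2000, LotSize = 10) are hypothetical, and the slides do not state the units of Price or LotSize, so the result is illustrative only. The same block also recomputes each t statistic as coefficient divided by standard error.

```python
coef = {"Intercept": -23.20585344, "SqFt": 0.187072549, "LotSize": 6.603089858}
std_err = {"Intercept": 30.51527447, "SqFt": 0.012524347, "LotSize": 1.465180793}

def predict_price(sqft, lot_size):
    """Estimated regression equation: Price_hat = b0 + b1 * SqFt + b2 * LotSize."""
    return coef["Intercept"] + coef["SqFt"] * sqft + coef["LotSize"] * lot_size

# t Stat column: each coefficient divided by its standard error
t_stats = {name: coef[name] / std_err[name] for name in coef}

example_price = predict_price(2000, 10)   # hypothetical house, for illustration
print(round(example_price, 2))
print({name: round(t, 4) for name, t in t_stats.items()})
```

Each slope is interpreted with the other predictor held fixed: for example, $b_1$ is the estimated change in Price per additional unit of SqFt for a given LotSize.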

Example (cont’d)

Is SqFt significantly affecting Price?

Hypothesis:
$H_o: \beta_1 = 0$
$H_a: \beta_1 \ne 0$

The p-value for $b_1$ is 1.42561E-14, that is $1.426 \times 10^{-14}$ (0.0000 to four decimal places). Using $\alpha = 0.05$, we reject $H_o$ (since $\alpha$ > p-value). Therefore we conclude that the slope is not equal to zero, which means that SqFt has a statistically significant effect on Price.

Example (cont’d)

Is LotSize significantly affecting Price?

Hypothesis:
$H_o: \beta_2 = 0$
$H_a: \beta_2 \ne 0$

The p-value for $b_2$ is 0.00011462. Using $\alpha = 0.05$, we reject $H_o$ (since $\alpha$ > p-value). Therefore we conclude that the slope is not equal to zero, which means that LotSize has a statistically significant effect on Price.
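The same decision rule applies to every coefficient in the output. A minimal sketch, using the p-values reported in the regression output and $\alpha = 0.05$ (the dictionary layout is an assumption for illustration):

```python
alpha = 0.05

# P-value column from the multiple regression output
p_values = {
    "Intercept": 0.453565781,
    "SqFt": 1.42561e-14,
    "LotSize": 0.00011462,
}

# Reject H_o (coefficient equals zero) whenever the p-value is below alpha
significant = {name: p < alpha for name, p in p_values.items()}

for name, is_significant in significant.items():
    verdict = "reject H_o" if is_significant else "fail to reject H_o"
    print(f"{name}: p = {p_values[name]:.4g} -> {verdict}")
```

Both predictors are significant at the 5% level, while the intercept is not statistically different from zero in this output.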

Reading ANOVA table