Addressing multicollinearity in semiconductor manufacturing
Case Study
Published online 14 January 2011 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/qre.1173
Addressing Multicollinearity in Semiconductor Manufacturing
Yu-Ching Chang and Christina Mastrangelo∗†
When building prediction models in the semiconductor environment, many variables, such as input/output variables, have causal relationships which may lead to multicollinearity. There are several approaches to address multicollinearity: variable elimination, orthogonal transformation, and adoption of biased estimates. This paper reviews these methods with respect to an application that has a structure more complex than simple pairwise correlations. We also present two algorithmic variable elimination approaches and compare their performance with that of the existing principal component regression and ridge regression approaches in terms of residual mean square and R². Copyright © 2011 John Wiley & Sons, Ltd.
Keywords: multicollinearity; variable elimination; principal components regression; variance inflation factor
Introduction
Semiconductor manufacturing is a complex system with a long processing time: several hundred steps and many recurrent processes that use the same tool groups. One characteristic of semiconductor manufacturing is recurrence. Recurrence occurs for several reasons: processing is done layer by layer; expensive equipment is used repeatedly; and precision requirements in alignment necessitate that some processes be performed by the same machine. This occurs especially in the photolithography area. Figure 1 shows the front-end fabrication process by functional area; a layer is completed after every loop. Owing to this recurrent characteristic, processes tend to interact, making this kind of system more difficult to analyze than sequential systems. Therefore, when multivariate statistical analysis is used to analyze semiconductor systems, multicollinearity among variables will likely exist most of the time.
If two variables are correlated, this is referred to as pairwise correlation. When that pair of variables is also highly correlated with one or more other pairs, as often happens in this situation, this will be referred to in this paper as 'crossover multicollinearity'. The data used in this paper to demonstrate how to address crossover multicollinearity are from a central processing unit (CPU) which has 32 end-of-line electronic measurements (regressors) and a chip operating speed measurement (the response) per observation. To disguise the data, they have been standardized. A subset of the correlation matrix is given in Table I. Each cell with a value denotes a 'high' pairwise correlation: an absolute value greater than 0.7, used as the 'cutoff' value for this paper. If there were only pairwise correlations present, there would be only one entry for each row or column. Table I also shows that the relationships between the variables are not transitive. For example, parameters 11 and 12 (P_11 and P_12) are highly correlated, as are P_12 and P_30. We would expect that P_11 and P_30 would also be highly correlated. However, the correlation between P_11 and P_30 is not high; it is actually −0.54.
Crossover multicollinearity occurs in semiconductor manufacturing because many process steps are highly dependent on the previous steps. Figure 2 shows an example of how this multicollinearity may occur in the etching process: etching step i is largely affected by pattern definition in lithography process i. Therefore, the measurements from the lithography and etch processes would be highly correlated. Moreover, results at the next lithography step would be highly correlated with the previous lithography process because they are processed by the same machine. As a result, lithography i is highly correlated with both etching i and lithography i+1, but etching i and lithography i+1 may or may not be highly correlated.
Industrial and Systems Engineering, University of Washington, Seattle, WA 98195-2650, U.S.A.
∗Correspondence to: Christina Mastrangelo, Industrial and Systems Engineering, University of Washington, Seattle, WA 98195-2650, U.S.A.
†E-mail: [email protected]
Copyright © 2011 John Wiley & Sons, Ltd. Qual. Reliab. Engng. Int. 2011, 27 843--854
Figure 1. Front-end process (Wafer Start, then a loop through Diffusion, Photo, Etch, Implant, and optional CMP; each loop completes a layer)
Multicollinearity review
Multicollinearity is a well-known condition that affects the estimates in regression models.1, 2 Two regressors, x1 and x2, are said to be exactly collinear if there is a linear equation with constants c0, c1, c2 such that c1x1 + c2x2 = c0. Exact multicollinearity is said to exist if there are more than two regressors that are exactly collinear. Multicollinearity, or more precisely approximate linearity, is defined as a set of regressors x1, x2, …, xp and constants c0, c1, …, cp that have a near-linear relationship

c1x1 + c2x2 + ··· + cpxp ≈ c0. (1)
The significance of collinearity can be demonstrated by a two-regressor model
y = β1x1 + β2x2 + ε,

where E(ε) = 0 and Var(ε) = σ². The least-squares normal equations are (X′X)b = X′y, and the estimates of the regression coefficients are

b̂ = (X′X)⁻¹X′y. (2)

The covariance matrix of b̂ is

cov(b̂) = σ²(X′X)⁻¹. (3)
Let r12 denote the correlation between x1 and x2, and rjy the correlation between xj and y, where j = 1, 2. The inverse of (X′X) may be written as

(X′X)⁻¹ = [ 1/(1−r12²)     −r12/(1−r12²)
            −r12/(1−r12²)   1/(1−r12²) ].
If x1 and x2 are highly correlated, r12² will be close to 1. As a result, by (2) the signs and values of the regression coefficients b̂ will be poorly estimated and unstable. Also, (3) shows that the variances and covariances are much larger than in the case when x1 and x2 are less correlated.
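The variance blow-up in (3) is easy to check numerically. The following is a minimal numpy sketch on synthetic standardized regressors (illustrative only, not the paper's data):

```python
import numpy as np

def coef_variance_multipliers(r12, n=500, seed=0):
    """Diagonal of (X'X)^-1, scaled by n, for two regressors with
    population correlation r12; these are the factors that inflate
    Var(b-hat) in Equation (3)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = r12 * x1 + np.sqrt(1.0 - r12**2) * rng.standard_normal(n)
    X = np.column_stack([x1, x2])
    return n * np.diag(np.linalg.inv(X.T @ X))

# As |r12| -> 1 the multipliers grow roughly like 1/(1 - r12^2),
# so the coefficient variances explode under near-collinearity.
```

Comparing `r12 = 0` against `r12 = 0.99` shows the variance multipliers growing by roughly a factor of fifty, exactly the instability described above.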
Multicollinearity diagnostics
There are three widely accepted indicators to detect multicollinearity: simple pairwise correlations, variance inflation factors, and eigenvalues. The first and simplest indicator is the correlation matrix with the following rule of thumb: if the absolute value of an off-diagonal element is larger than 0.8 or 0.9, the two involved regressors are usually considered highly correlated.3, 4 If the pairwise correlation is between 0.7 and 0.8, there may be mild collinearity, and this should also be taken into consideration.5
Another widely accepted measure of multicollinearity is the variance inflation factor (VIF).6 The VIF measures the combined effect of the dependence among the regressors on the variance of a given term. It is defined as

VIFj = (1 − Rj²)⁻¹,

where Rj² is the coefficient of determination when xj is regressed on the remaining regressors. In general, VIFs exceeding 10 indicate serious multicollinearity, and VIFs between 5 and 10 suggest there might be some mild multicollinearity. O'Brien7 recommends that the VIF threshold be 5 or as low as 4.
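The definition above can be computed directly with plain numpy; this is an illustrative sketch (the function name is ours, not from the paper):

```python
import numpy as np

def vifs(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of
    determination from regressing column j on the remaining columns
    (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

A regressor that nearly duplicates another will report a very large VIF, while an independent regressor stays near 1.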
Table I. Correlation matrix (only highly correlated pairs, |r| > 0.7, are shown)
P_01–P_03: 0.929
P_02–P_03: 0.727
P_02–P_04: 0.882
P_03–P_04: 0.782
P_11–P_12: 0.840
P_11–P_20: −0.734
P_11–P_21: 0.816
P_11–P_22: −0.830
P_11–P_28: −1.000
P_11–P_29: −0.840
P_12–P_21: 0.790
P_12–P_22: −0.790
P_12–P_28: −0.841
P_12–P_29: −1.000
P_12–P_30: −0.768
P_17–P_23: −0.869
P_18–P_24: −0.811
P_18–P_31: −0.839
P_19–P_21: 0.762
P_19–P_22: −0.760
P_20–P_28: 0.733
P_21–P_22: −0.978
P_21–P_28: −0.816
P_21–P_29: −0.791
P_22–P_28: 0.830
P_22–P_29: 0.790
P_24–P_31: 0.731
P_29–P_30: 0.767
# of pairs: P_01=1, P_02=2, P_03=3, P_04=2, P_11=6, P_12=6, P_17=1, P_18=2, P_19=2, P_20=2, P_21=6, P_22=6, P_23=1, P_24=2, P_28=6, P_29=6, P_30=2, P_31=2
Sum of pairs (|r|): P_01=0.929, P_02=1.609, P_03=2.438, P_04=1.664, P_11=5.060, P_12=5.029, P_17=0.869, P_18=1.650, P_19=1.522, P_20=1.467, P_21=4.952, P_22=4.979, P_23=0.869, P_24=1.542, P_28=5.062, P_29=5.030, P_30=1.535, P_31=1.569
Figure 2. An example of the causes of 'crossover multicollinearity' (Lithography i → Etching i → Deposition i, then the next layer: Lithography i+1 → Etching i+1 → Deposition i+1; lithography i is highly correlated with etching i and with lithography i+1, but etching i and lithography i+1 are not highly correlated)
The third method is eigenvalue analysis or, essentially, principal component analysis. A condition number is defined as

κ = λmax / λmin,

where λmax and λmin are the largest and smallest eigenvalues of the X′X matrix. A condition number larger than 1000 is an indicator of strong multicollinearity.8 If the condition number is between 100 and 1000, there might be moderate multicollinearity involved. Generally, multicollinearity occurs when there is a small λmin very close to zero. An example regarding how to use principal component analysis to detect the linear relationship among variables is given later in the paper.
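A sketch of this eigenvalue diagnostic in numpy (the tiny floor on λmin is our implementation choice, guarding against a numerically zero smallest eigenvalue):

```python
import numpy as np

def condition_number(X):
    """kappa = lambda_max / lambda_min of X'X. Values above ~1000
    signal strong multicollinearity; 100-1000 suggests moderate
    multicollinearity."""
    lam = np.linalg.eigvalsh(X.T @ X)   # eigenvalues in ascending order
    return lam[-1] / max(lam[0], 1e-12)
```

A data matrix with a nearly duplicated column produces a huge condition number, while independent columns stay well below the moderate-collinearity range.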
Techniques to remedy multicollinearity
To address multicollinearity and increase the accuracy of the estimates, four approaches are recommended:
(1) Obtain additional data.
(2) Eliminate variables.
(3) Transform orthogonally.
(4) Adopt biased estimates.
(1) Additional data. Note that regression coefficients are severely affected by the sample data if multicollinearity exists. Farrar and Glauber1 suggest that collecting additional data may relieve the problem of multicollinearity. Unfortunately, collecting additional data is not always possible due to the cost of sampling and the availability of data. Weisberg9 uses a two-regressor model with simulated data to demonstrate that it takes several times more data to obtain roughly the same variance/covariance values, i.e. the same accuracy.
(2) Variable elimination. Dropping one of the highly correlated variables is a widely used approach due to its simplicity. However, there are two drawbacks to this method. First, the information about the dropped variables is lost in the model. This is a serious problem if the dropped variables are significantly explanatory of the response. Second, determining which of the highly correlated variables should be removed from the model is another challenge; we propose a solution to this later in the paper.
(3) Orthogonal transformation. The purpose of orthogonal transformation is to introduce another set of variables which are independent of each other. The X matrix is transformed into an orthogonal matrix. Such techniques include Principal Components Analysis10 and Gram–Schmidt Orthogonalization.11 However, the new variables may be difficult to interpret if many variables are involved in the transformation.
(4) Adopting biased estimates. The idea of adopting biased estimates is to trade unbiased estimates for ones with a smaller mean-squared error (MSE). Such methods include Ridge Regression12 and Latent Root Regression.13 Though biased estimators provide considerable improvement in the MSE, the method is criticized for not using the unbiased estimators. Another problem is that it is difficult to assess how much improvement in MSE has been achieved or how much bias has been introduced.
Clearly, these approaches have their own advantages and disadvantages in addressing multicollinearity. In this paper, we propose two simple methods motivated by the above: one is a VIF approach and the other is a pairwise-elimination approach. These two algorithmic approaches suggest which variables should be dropped at each stage.
Table II. Intermediate results based on VIF algorithm
Original 1st run 2nd run 3rd run 4th run 5th run 6th run 7th run 8th run 9th run 10th run 11th run
P_01 71 411.20 50 682.13 4.46 4.44 4.41 4.26 4.16 4.01 3.78 3.73 3.66 3.63
P_02 65 875.00 2.88 2.88 2.88 2.88 2.87 2.76 2.74 2.68 2.68 2.65 2.60
P_03 87 307.80 61 993.16
P_04 93 522.50
P_05 2.20 2.20 2.19 2.19 2.18 2.18 2.04 2.04 2.01 1.98 1.92 1.80
P_06 1.60 1.64 1.64 1.63 1.63 1.63 1.60 1.58 1.58 1.57 1.52 1.51
P_07 1.60 1.59 1.58 1.58 1.58 1.58 1.51 1.50 1.49 1.49 1.45 1.45
P_08 2.40 2.36 2.36 2.36 2.36 2.34 2.33 2.33 2.33 2.30 2.17 2.16
P_09 10 276.10 7300.67 3.23 3.22 3.19 3.11 3.07 2.95 2.91 2.56 2.32 2.32
P_10 19 119.10 4.77 4.76 4.76 4.74 4.66 4.49 3.65 3.45 3.34 2.89 2.79
P_11 23 962.30 22 449.84 22 254.21
P_12 19 461.10 19 457.29 19 031.39 14 081.15
P_13 1.30 1.35 1.33 1.33 1.32 1.30 1.29 1.29 1.29 1.26 1.26 1.26
P_14 1.40 1.35 1.35 1.33 1.33 1.30 1.30 1.30 1.27 1.27 1.26 1.25
P_15 2.50 2.46 2.46 2.45 2.44 2.39 2.08 2.03 2.00 1.94 1.88 1.88
P_16 2.70 2.67 2.67 2.67 2.66 2.65 2.65 2.64 2.49 2.33 2.33 2.32
P_17 7.20 7.18 7.18 7.18 7.15 7.14 7.13 7.13 6.92
P_18 8.10 8.09 8.09 8.09 8.09 8.09 8.08
P_19 3.40 3.37 3.37 3.37 3.37 3.32 2.99 2.99 2.79 2.70 2.69 2.27
P_20 2.70 2.75 2.75 2.75 2.74 2.74 2.64 2.58 2.58 2.58 2.56 2.22
P_21 50.10 50.05 50.03 49.51 49.08
P_22 48.80 48.81 48.81 48.74 47.92 11.54
P_23 6.70 6.71 6.71 6.71 6.67 6.67 6.67 6.57 6.46 3.57 3.25 3.14
P_24 4.20 4.19 4.19 4.19 4.18 4.18 4.17 3.88 3.88 3.85 3.18 3.13
P_25 1.80 1.83 1.83 1.83 1.83 1.83 1.81 1.81 1.80 1.80 1.79 1.77
P_26 2.70 2.66 2.66 2.66 2.64 2.57 2.54 2.54 2.54 2.40 2.32 2.30
P_27 2.60 2.63 2.63 2.62 2.62 2.56 2.55 2.55 2.55 2.54 2.26 2.24
P_28 22 952.10 22 532.34 22 335.47 9.52 9.24 9.17 6.24 6.24 4.37 4.26 4.01
P_29 19 277.50 19 224.44 19 195.96 14 115.56 9.02 8.07 7.85 7.75
P_30 4.60 4.56 4.54 4.53 4.53 4.52 3.90 3.78 3.21 3.21 3.19 3.07
P_31 7.30 7.27 7.27 7.27 7.26 7.00 7.00 5.08 5.00 4.56
P_32 3.20 3.16 3.16 3.16 3.12 2.51 2.37 2.33 2.14 2.14 1.90 1.89
RMS 0.434 0.435 0.436 0.436 0.437 0.444 0.452 0.508 0.509 0.509 0.509 0.512
R2(%) 56.9 56.9 56.7 56.7 56.3 56.0 55.1 49.6 49.4 49.4 49.4 49.1
In the following section of this paper, the proposed VIF and pairwise variable elimination algorithms are given, and the results are presented. Two existing methods, principal components and ridge regression, are briefly introduced and their results summarized. We evaluate the performance of these four approaches using the residual mean square (RMS) and R². A larger R² is preferred since it represents the proportion of variation explained by the regressors, and a smaller RMS is also preferred since it leads to a narrower prediction interval. After the comparison, we address the differences between variable elimination and variable selection. An improvement on the pairwise dropping approach can be made by utilizing existing variable selection techniques. The advantages and disadvantages are discussed subsequently. Finally, conclusions regarding the four methods are given.
Variable elimination: VIF approach
Because larger VIFs indicate a higher possibility of multicollinearity, an intuitive variable elimination approach is to drop the variable with the largest VIF at each run until a threshold is met. Table II shows all intermediate results of dropping the variable with the highest VIF (marked in bold italics) until all VIFs are smaller than 4. In the first run, P_04 is dropped because it has the largest VIF. Based on the model without P_04, a new set of VIFs is generated, and the variable with the largest VIF is picked for dropping in the next run. At the 9th run, this simple method achieves all VIFs smaller than 5; at the 11th run, all VIFs are smaller than 4. This VIF variable elimination can 'break' the multicollinearity. However, the major disadvantage of this method is that interesting process variables may be dropped. A simple variant of this algorithm to overcome this disadvantage is to select two or more of the largest VIFs and drop the least 'interesting' variable, where 'interesting' is defined from the perspective of a user with prior process knowledge.
Variable elimination: pairwise dropping approach
Another commonly used method for combating multicollinearity is a pairwise elimination approach. This approach drops one variable from every highly correlated pair. A simple guide for this method is to drop the redundant variables first and keep the remaining ones. However, if crossover multicollinearity exists among the variables, it is more difficult to decide which variable should be dropped. For example, from Table I we can see that P_03 is highly correlated with P_01, P_02, and P_04, while P_01 is highly correlated with P_03 only. If we drop P_01, then P_03 must be preserved in the model because it is the only variable that can 'represent' P_01. However, P_03 is still highly correlated with P_02 and P_04, so we need to further remove these last two parameters. The problem of retaining the most variables while breaking all the existing highly correlated pairs becomes more difficult as the degree of crossover becomes higher. For example, P_11 and P_12 each have six highly correlated relationships. A simple idea to break ties as quickly as possible is to remove the variable that has the most relationships with the rest. For example, if we remove P_03, since it has three large correlations, we only need to remove either P_02 or P_04. We propose a simple algorithm as follows:
Algorithm:
(1) Select the pair having the largest absolute correlation among all the pairs.
(2) Remove the variable that has the larger number of highly correlated pairs.
(3) If there is a tie in step 2, remove the variable that is least 'interesting'.
(4) Repeat until all highly correlated pairs are removed.
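A minimal sketch of this algorithm (the `interest` scores and the 0.7 cutoff are user-supplied inputs, and the code is an illustration rather than the authors' implementation):

```python
import numpy as np

def pairwise_eliminate(corr, names, cutoff=0.7, interest=None):
    """Greedy pairwise elimination. `corr` is the full correlation
    matrix; `interest` optionally maps each name to a score
    (higher = more interesting, i.e. keep)."""
    corr = np.abs(np.asarray(corr, dtype=float))
    np.fill_diagonal(corr, 0.0)
    names = list(names)
    alive = set(range(len(names)))
    dropped = []
    while True:
        # Step 1: the surviving pair with the largest |correlation|.
        pairs = [(corr[i, j], i, j) for i in alive for j in alive
                 if i < j and corr[i, j] > cutoff]
        if not pairs:
            break                        # Step 4: no high pairs remain
        _, i, j = max(pairs)
        # Step 2: count highly correlated partners of each member.
        deg = lambda k: sum(corr[k, m] > cutoff for m in alive if m != k)
        di, dj = deg(i), deg(j)
        if di != dj:
            victim = i if di > dj else j
        else:
            # Step 3: tie, drop the less 'interesting' variable.
            score = interest or {}
            victim = i if score.get(names[i], 0) <= score.get(names[j], 0) else j
        dropped.append(names[victim])
        alive.discard(victim)
    return [names[k] for k in sorted(alive)], dropped
```

On the P_01 through P_04 sub-block of Table I (with illustrative low values filled in for the unlisted pairs), the first pass selects P_01 and P_03 and drops P_03, since it has the most highly correlated partners.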
Applying the algorithm to the data set shown in Table I, P_01 and P_03 are selected first because they have the highest correlation of 0.929. Since P_01 has '1 pair' and P_03 has '3 pairs' (bottom of the table), P_03 is removed. Continuing in this manner, P_12 and P_28 are selected at the 4th iteration. Because both P_12 and P_28 have '6 pairs', we drop P_28 by assuming that it is less interesting. In the next iteration, P_11 and P_29 are selected; because both have six pairs, we drop P_29 assuming it is less interesting. If we continue the algorithm, we will drop variables P_03, P_04, P_12, P_18, P_20, P_21, P_22, P_23, P_28, P_29, and P_31. Column 2 of Table III shows the VIFs of the original data set, which indicate that there is severe multicollinearity in the model. Column 3 of Table III shows the VIFs after these 11 variables are dropped; there is no longer obvious multicollinearity in that all the VIFs are smaller than 5. Comparing this method with the VIF elimination method above when 11 variables are dropped (Table II), we can see that the two methods have similar results: similar variables are dropped; the remaining VIFs are roughly the same; and the RMS (0.5133 and 0.512) and R² (49.1 and 49.1%) are close.
Principal components regression
Principal components regression (PCR) addresses multicollinearity by transforming the regressors into a new set of coordinate axes which make the transformed regressors orthogonal to each other. In other words, the correlation matrix estimated from the transformed data is a diagonal matrix, and the transformed regressors are independent of each other. Refer to Johnson and Wichern14 for more details. An advantage of PCR is the ability to reduce the data set in the new coordinate system.15 To illustrate the idea, say the original model is
y=Xb+e,
where the rank of X is p. Now, let T be a p × p orthogonal matrix whose columns are the eigenvectors of X′X, and let K = diag(λ1, λ2, …, λp), where λ1 ≥ λ2 ≥ ··· ≥ λp ≥ 0 are the eigenvalues of X′X. The transformed model may be written as
y=Za+e,
where Z = XT and a = T′b. Since T is an orthogonal matrix, Z = [Z1, Z2, …, Zp] becomes a new set of orthogonal regressors, referred to as the principal components. Because
Z′Z = T′X′XT = K, (4)

the eigenvalue λk is the variance due to the kth principal component. Thus, the first k principal components account for (λ1 + ··· + λk) / (λ1 + ··· + λp) of the total population variance. The implication for PCR is that if we are interested in a model that can explain at least 90% of the variability, and the first k principal components achieve this, we can use only the first k principal components without losing much information. In other words, the total number of regressors in the new coordinate system is reduced by p − k. The ordered eigenvalues of the 32 principal components are given in the first column of Table IV (the cumulative variance is in the second column). The rest of Table IV shows the eigenvectors of the last 7 principal components. Note that many entries of these eigenvectors are ≈ 0; these correspond to λj ≈ 0. The estimates based on the first k principal components are given in Table V. Note that since the last 4 eigenvalues are close to zero, we would expect that using the first 28 PCs would not make much difference, in terms of RMS and R², compared with using all of the PCs.
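Under these definitions, PCR can be sketched as follows (illustrative numpy, with X assumed centered/standardized as in the paper's data):

```python
import numpy as np

def pcr_coefficients(X, y, k):
    """Fit y on the first k principal components Z = X T, then map
    the estimates back to the original regressors via b = T a."""
    lam, T = np.linalg.eigh(X.T @ X)        # eigenvalues in ascending order
    T = T[:, np.argsort(lam)[::-1]][:, :k]  # eigenvectors of the k largest
    Z = X @ T                               # orthogonal scores (Z'Z = K)
    a, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return T @ a

def explained_variance(X, k):
    """Cumulative share (lam_1 + ... + lam_k) / (lam_1 + ... + lam_p)."""
    lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
    return lam[:k].sum() / lam.sum()
```

With k = p the transformation is a pure rotation, so the back-transformed coefficients coincide with ordinary least squares; dropping the trailing PCs with λj ≈ 0 is where the dimensionality reduction happens.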
The pairwise variable elimination algorithm achieves an RMS of 0.5133 and an R² of 49.1% by dropping 11 variables. However, comparing these results with those of PCR using the first 21 PCs (i.e. also dropping 11 PCs) would be inappropriate, because
Table III. VIF results
Variables, Original VIF, VIF after pairwise dropped, VIF by stepwise regression, VIF by BMA, Stepwise on dropped model, BMA on dropped model
P_01 71 411.2 3.3 21.2 2.6
P_02 65 875 2.6 46 676.1 40 128.3 1.6
P_03 87 307.8 23.9
P_04 93 522.5 66 263.6 56 963.6
P_05 2.2 1.9 2.2 2 1.9 1.8
P_06 1.6 1.5 1.5 1.3 1.5 1.3
P_07 1.6 1.4 1.5 1.3 1.2
P_08 2.4 2.2 2
P_09 10 276.1 2.5 1.9 1.7
P_10 19 119.1 2.8 13 557.3 11 668.4 2.2 1.9
P_11 23 962.3 3.7 23 837.8 3.3 3.2
P_12 19 461.1 19 120.9
P_13 1.3 1.3 1.2 1.1 1.2 1.2
P_14 1.4 1.3 1.3 1.2 1.2
P_15 2.5 2 2.5 2 1.7 1.6
P_16 2.7 2.4 2.7 2.2 2.2 2.2
P_17 7.2 3.2 7.1
P_18 8.1 8.1 7.1
P_19 3.4 2.7 3.4 2.9 2.4 2.3
P_20 2.7 2.7
P_21 50.1 49.7 36
P_22 48.8 48.2 39.7
P_23 6.7 6.7
P_24 4.2 3.1 3.7 3.6 2.6
P_25 1.8 1.8 1.8 1.7 1.7 1.6
P_26 2.7 2.5 2.7 2.4 1.3 1.2
P_27 2.6 2.3 2.6 2.3
P_28 22 952.1 22 918.5 212.8
P_29 19 277.5 19 272.2
P_30 4.6 3 4.5 2.9 3 3
P_31 7.3 6.9 6
P_32 3.2 1.9 3.2 1.9 1.8
RMS 0.4344 0.5133 0.4347 0.4385 0.5137 0.5161
R2(%) 56.9 49.1 56.9 56.5 49.1 48.8
even when 11 PCs are dropped, the first 21 PCs still correspond to the original 32 variables. Nevertheless, let us assume this comparison is meaningful. The PCR method achieves an RMS of 0.501 and an R² of 50.15% using the first 21 PCs. The performance of the two methods is similar.
Ridge regression
Ridge regression remedies the instability due to multicollinearity by introducing bias into the model. Instead of solving (2), the ridge estimate, b̂R, is obtained from b̂R = (X′X + kI)⁻¹X′y, where k ≥ 0 is a constant. By increasing k, the ill-conditioning problem is reduced and the estimates become more stable. See Figure 3 for the ridge trace, where each line represents the estimates for a regressor at various values of k. Hoerl et al.16, 17 suggest a procedure for choosing a proper k, given by

k = p σ̂² / (β̂′β̂),

where p is the number of regressor variables and σ̂² and β̂ come from the least-squares fit.
The Hoerl–Kennard–Baldwin (HKB) ridge constant used here is 0.0636. Table VI shows the coefficient estimates at various k. Note that in the process of 'stabilizing' the coefficients by increasing k, some of the coefficients remain relatively constant and have small values as k varies: P_07 and P_08, for example. A small coefficient implies that the variable has little prediction power. As such, if the variable is dropped from the regressors, the prediction will not be affected much. As a result, we can use ridge regression to drop some variables which have small coefficients as suggested by the HKB ridge constant. Here, it is arbitrarily assumed that a coefficient is 'small' if its absolute value is below 0.06. Column 6 of Table VI indicates that 8 coefficients (in bold italics) are small, and the corresponding variables may be dropped from the model. The last column of Table VI shows the coefficients after those variables are dropped. In this example, these coefficients are fairly similar to the case with all variables. Finally, comparing the elimination algorithms with the ridge regression results, it is not surprising that ridge regression has a smaller RMS (0.43 versus 0.52).
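A sketch of ridge estimation with the HKB constant (illustrative numpy, not the authors' code):

```python
import numpy as np

def ridge(X, y, k):
    # b_R = (X'X + kI)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def hkb_constant(X, y):
    """Hoerl-Kennard-Baldwin choice k = p * sigma_hat^2 / (b'b),
    with sigma_hat^2 and b taken from the ordinary least-squares fit."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - p)
    return p * sigma2 / (b @ b)
```

At k = 0 the ridge estimate reduces to ordinary least squares; any k > 0 shrinks the coefficient vector toward zero, which is the stabilizing effect visible in the ridge trace of Figure 3.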
Table IV. Eigenvalues and the last seven eigenvectors of T
Eigenvalues Cumulative PC 26 PC 27 PC 28 PC 29 PC 30 PC 31 PC 32
8.3548 0.2611 −0.045 −0.06 0.027 −0.069 −0.076 0.462 −0.445
4.9315 0.4152 0.011 −0.05 −0.006 −0.086 −0.117 0.392 0.44
3.7121 0.5312 0.013 −0.034 0.008 0.075 0.085 −0.511 0.492
2.217 0.6005 −0.091 0.024 0.002 0.102 0.139 −0.467 −0.525
1.6747 0.6528 −0.034 0.033 −0.018 0.001 0 0 0
1.5648 0.7017 −0.046 0.084 −0.006 0.001 0 0 0
1.3324 0.7434 −0.063 0.017 −0.002 0 0 0 0
1.1487 0.7793 0.009 0.05 −0.017 0 0 0 0
0.8306 0.8052 0.15 0.042 −0.038 −0.027 −0.029 0.175 −0.169
0.8185 0.8308 −0.253 0.154 0.016 −0.047 −0.063 0.211 0.237
0.7797 0.8552 0.23 −0.068 0.01 0.455 0.495 0.21 0.063
0.6468 0.8754 0.029 0.091 −0.051 0.524 −0.474 0.004 −0.027
0.6231 0.8948 0.028 0.018 −0.017 0 0 0 0
0.4918 0.9102 −0.025 0.045 0.013 −0.001 0 0 0
0.4556 0.9244 0.149 −0.082 −0.01 0.001 0 0 0
0.385 0.9365 0.097 0.035 −0.008 0.001 0 0 0
0.3388 0.9471 −0.481 −0.431 0.012 0.001 0 0 0
0.2928 0.9562 −0.283 0.528 0.002 0 0 0 0
0.2614 0.9644 0.021 −0.041 −0.006 0 0 0 0
0.2294 0.9715 −0.064 0.043 −0.012 0 0 0 0
0.2124 0.9782 −0.309 0.08 0.706 0.002 −0.004 −0.001 0
0.1702 0.9835 0.351 −0.022 0.694 0.005 −0.003 0 0
0.1564 0.9884 −0.301 −0.419 0.003 0.001 0 0 0
0.1133 0.9919 −0.08 −0.018 0.011 0 0 0 0
0.0909 0.9948 0.005 0.007 0.01 0 0 0 0
0.0838 0.9974 −0.114 −0.02 0.023 0.001 0 0 0
0.0728 0.9997 0.031 −0.07 −0.023 0 0 0 0
0.0108 0.9999 −0.231 0.069 −0.01 0.459 0.506 0.182 0.032
0.0001 1 −0.029 −0.09 0.041 0.526 −0.472 −0.029 0.002
0 1 −0.267 0.183 −0.032 0.001 0 0 0
0 1 −0.016 0.471 −0.055 −0.001 0 0 0
0 1 0.194 −0.096 −0.071 −0.002 0 0 0
However, the amount of bias introduced in the model is unknown. Another problem with ridge regression is that the 8 dropped variables appear to have no connection with the VIFs. For example, ridge regression suggests we can drop or ignore P_07 and P_08; however, their VIFs are quite acceptable. Therefore, they are not the source of multicollinearity. Nevertheless, P_07 and P_08 are dropped or ignored for the sake of more stable estimates of the coefficients.
Variable elimination and variable selection
Since dropping variables is, in a sense, similar to variable selection (one includes variables in the model while the other excludes variables from it), it is interesting to see how well the proposed variable elimination algorithms perform compared with existing variable selection algorithms such as Stepwise Regression and Bayesian Model Averaging (BMA).18 Our numerical results, columns 4 and 5 of Table III, show that these classical variable selection techniques cannot handle multicollinearity well: there are many VIFs greater than 5 in columns 4 and 5. Nevertheless, we must point out that variable elimination and variable selection view the problem from very different perspectives: variable selection aims to select a good set of regressors, while variable elimination removes variables that cause collinearity. Like checking basic model assumptions such as normality, reducing the effect of multicollinearity is something practitioners need to do before building a model. Thus, variable selection techniques can be applied to the model once multicollinearity has been removed, to further exclude insignificant variables. See columns 6 and 7 of Table III for the results of applying stepwise and BMA variable selection to the model after using the pairwise dropping approach. Some additional variable selection algorithms19, 20 are not discussed here, but they are reported to work well in the presence of multicollinearity.
Performance comparison
If we judge the performance of the above four approaches in terms of RMS and R², ridge regression is the leading choice. However, ridge regression does not produce unbiased estimates. The PCR method has performance similar to the other two methods when the same number of regressors is in the model. However, PCR is more difficult to use, and it can be challenging
Table V. Principal components regression
Models fitted with: all PCs, first 31, 30, 29, 28, 27, 26, 21, 16, 15, and 14 PCs
Variability explained (%): 100, 100, 100, 100, 99, 99, 99, 98, 94, 92, 91
RMS: 0.434, 0.434, 0.436, 0.436, 0.437, 0.448, 0.464, 0.501, 0.543, 0.543, 0.555
R2 (%): 56.94, 56.93, 56.73, 56.73, 56.68, 55.59, 53.95, 50.15, 45.96, 45.95, 44.70
(Per-variable coefficient estimates for P_01–P_32 are not reproduced here.)
Figure 3. Ridge trace of the variables
to interpret the meaning of the new variables. Moreover, the variable elimination methods can work with variable selection techniques such as BMA, but the PCR method cannot. If we apply BMA after the pairwise variable elimination method, we can achieve a similar RMS of 0.5161 and an R² of 48.8% by dropping six more variables. If the PCR method uses its first 15 PCs, it only achieves an RMS of 0.543 and an R² of 46.0%.
Issues related to multicollinearity techniques
Because the multicollinearity methods (VIFs, pairwise elimination, and eigenvalues) use different criteria for detecting collinearity, they do not always lead to the same conclusions. Two discrepancies are given as examples: first, even if a variable is highly correlated with other variables, its VIF is not necessarily very high; second, some variables might have high VIFs, but they do not belong to any highly correlated pairs.
To illustrate the first type of discrepancy, P_19 (VIF=3.4), P_20 (VIF=2.7), P_24 (VIF=4.2), and P_30 (VIF=4.6) have acceptable VIFs, but they are all in highly correlated pairs; see Tables III and I, respectively. For example, P_18 and P_24 have a correlation of −0.81. If the model had only these regressors, we would expect collinearity. However, if we regress P_24 on P_18, the R-square is 0.658, and the VIF is 2.92; thus, it does not indicate collinearity according to the suggested threshold of 5–10.
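The arithmetic behind this observation is simple: for a single pair of regressors, the R-square from regressing one on the other is just the squared correlation, so even a correlation of magnitude 0.81 yields a VIF well below the usual cutoffs. A minimal sketch, using the P_18/P_24 value quoted above:

```python
# VIF implied by a single pairwise correlation (r taken from the P_18/P_24 pair)
r = -0.81
r2 = r ** 2              # R-square from regressing one regressor on the other
vif = 1.0 / (1.0 - r2)   # variance inflation factor
print(round(r2, 3))      # 0.656
print(round(vif, 2))     # 2.91
```

A correlation magnitude of roughly 0.95 would be needed before a two-variable VIF reaches the threshold of 10.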
For the second type of discrepancy, P_09 and P_10 have very high VIFs (10 276 and 19 119, respectively), but they do not appear in any highly correlated pairs. A closer look at the entire correlation matrix (not included in this paper) shows that all pairwise correlations involving these two variables are moderate (mostly <0.6). The reason is that each entry of the correlation matrix involves only two variables; thus, it has limited ability to detect multicollinearity when the linear relationship includes three or more variables.
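This effect is easy to reproduce. In the hypothetical sketch below (synthetic data, not the paper's), x4 is built from three independent regressors, so every pairwise correlation stays moderate while the VIF of x4 is enormous:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1, x2, x3 = rng.standard_normal((3, n))
# x4 depends on all three regressors, so no single pair is strongly correlated
x4 = (x1 + x2 + x3) / np.sqrt(3) + 0.02 * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3, x4])

R = np.corrcoef(X, rowvar=False)
max_pairwise = np.abs(R - np.eye(4)).max()      # largest |pairwise correlation|

# VIF of x4 = 1 / (1 - R^2) from regressing x4 on x1, x2, x3
beta, *_ = np.linalg.lstsq(X[:, :3], x4, rcond=None)
r2 = 1 - (x4 - X[:, :3] @ beta).var() / x4.var()
vif = 1 / (1 - r2)
print(max_pairwise < 0.7, vif > 100)            # True True
```

No pairwise correlation flags a problem, yet the VIF of x4 is in the thousands, mirroring the P_09/P_10 situation.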
Why do these discrepancies occur? The correlation matrix cannot detect a relationship involving three or more variables, and the VIF method calculates a score but does not identify a relationship between variables. The two discrepancies may be partially explained by looking at the principal components. Using (4), suppose that an eigenvalue λp → 0, which implies that Zp → 0 and, hence, XTp → 0. Applying this to the last principal component in Table IV gives
0 = −0.445P_01 + 0.44P_02 + 0.492P_03 − 0.525P_04 − 0.169P_09 + 0.237P_10 + 0.063P_11 − 0.027P_12 + 0.032P_28 + 0.002P_29 (5)
The variables in (5) have the largest VIFs in Table III (column 2). Similarly, looking at PC30 in Table IV shows a related relationship that now includes P_21 and P_22; referring to Table III, these also have large VIFs. In other words, the VIFs are aligned with the principal components. If this is the case, dropping any variable in (5) should break the relationship.
However, this is not the case. Recall the definition of multicollinearity in (1). Assume that there are two unknown linear relationships in the system, given by
c1x1 + c2x2 + c3x3 ≈ c7,

c4x4 + c5x5 + c6x6 ≈ c8,
where ci are constants and the variables x1, x2, x3 are independent of x4, x5, x6 and vice versa. According to (1), we can rewrite the above equations as
(c1/c7)x1 + (c2/c7)x2 + (c3/c7)x3 ≈ (c4/c8)x4 + (c5/c8)x5 + (c6/c8)x6
Table VI. Coefficients of estimates at different ridge constants. The last column contains the coefficients of the variables after dropping (ridge constant = 0.0636).

[The table lists the estimated intercept and coefficients of P_01 through P_32 at ridge constants 0, 0.01, 0.02, 0.04, 0.0636, 0.08, 0.16, 0.32, and 0.64, followed by the after-dropping column. The summary rows are:

RMS: 0.4344, 0.4345, 0.4347, 0.4350, 0.4353, 0.4354, 0.4359, 0.4363, 0.4367, 0.4380
R2: 0.5694, 0.5693, 0.5692, 0.5687, 0.5686, 0.5684, 0.5680, 0.5675, 0.5672, 0.5659]
thus only one equation can be derived by principal component analysis, which looks like

cax1 + cbx2 + ccx3 − cdx4 − cex5 − cfx6 ≈ 0,

where ca through cf are linear combinations of c1 through c8. As a result, two or more linear relationships will be identified as one linear relationship. Thus, dropping one variable from the equation identified by principal component analysis does not guarantee that the linear relationship will be broken.
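This failure mode can be checked numerically. In the hypothetical sketch below (synthetic data), two separate near-dependencies are planted among six variables; dropping one variable from the first dependency still leaves a near-zero eigenvalue in the correlation matrix, because the second dependency survives:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1, x2, x4, x5 = rng.standard_normal((4, n))
x3 = x1 + x2 + 0.05 * rng.standard_normal(n)   # first near-dependency
x6 = x4 + x5 + 0.05 * rng.standard_normal(n)   # second, among different variables
X = np.column_stack([x1, x2, x3, x4, x5, x6])

def smallest_eigenvalue(M):
    """Smallest eigenvalue of the correlation matrix of the columns of M."""
    return np.linalg.eigvalsh(np.corrcoef(M, rowvar=False))[0]

lam_full = smallest_eigenvalue(X)                   # near zero: collinearity present
lam_drop = smallest_eigenvalue(np.delete(X, 0, 1))  # drop x1; still near zero
print(lam_full < 0.05, lam_drop < 0.05)             # True True
```

Breaking all of the collinearity requires dropping at least one variable from each underlying relationship, which is what the iterative elimination algorithms aim to do.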
In short, the three methods have their own issues. The correlation matrix is the simplest method, and it provides an easy option for dropping variables; however, relationships involving more than two variables cannot be detected. VIFs detect the existence of multicollinearity but provide no information regarding which variables should be dropped. Principal component analysis sheds some light on the linear relationships among variables; however, it alone is not enough to break all collinearity, because it usually detects only one complex relationship in the system.
Conclusions
In this paper, we address a special type of multicollinearity, referred to here as crossover multicollinearity, which is observed in semiconductor manufacturing. This condition may be remedied by using principal component regression or ridge regression. However, we propose the use of a pairwise dropping algorithm in combination with a VIF variable elimination method to reduce the effects of multicollinearity. In comparing the results of our algorithm to PCR, both provide similar RMS and R2 values. In addition, the proposed method can work with existing variable selection techniques while PCR cannot. Another advantage of our algorithm is that it is more understandable: the new set of regressors in PCR may have no physical meaning and be difficult to interpret.
Ridge regression does provide excellent results in terms of RMS and R2, and it makes the coefficient estimates more stable. However, ridge regression ignores the source of the problem, namely, which variables actually cause the multicollinearity.
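The stabilizing effect is easy to see on synthetic data. The sketch below (an illustration, not the paper's data set) computes ridge estimates b(k) = (X'X + kI)^(−1)X'y for a nearly collinear pair; the least-squares solution at k = 0 typically shows large offsetting coefficients, while the coefficient vector shrinks steadily as the ridge constant k grows, which is the behavior traced in Figure 3:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.standard_normal(n)

def ridge(X, y, k):
    """Ridge estimate (X'X + kI)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

for k in (0.0, 0.1, 1.0, 10.0):
    b = ridge(X, y, k)
    print(k, np.round(b, 3), round(float(np.linalg.norm(b)), 3))
```

The norm of b(k) is non-increasing in k, but, as noted above, the shrinkage treats all coefficients together and says nothing about which variables created the collinearity.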
References

1. Farrar DE, Glauber RR. Multicollinearity in regression analysis: The problem revisited. Review of Economics and Statistics 1967; 49:92--107.
2. Montgomery DC, Peck EA, Vining G. Introduction to Linear Regression Analysis (4th edn). Wiley: Hoboken, NJ, 2006.
3. Mason CH, Perreault WD. Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research 1991; 28:268--280.
4. Mason RL, Gunst RF, Webster JT. Regression analysis and problems of multicollinearity. Communications in Statistics 1975; 4:277--292.
5. Tabachnick BG, Fidell LS. Testing hypotheses in multiple regression. Using Multivariate Statistics. Allyn and Bacon: Boston, 2001.
6. Marquardt DW. Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics 1970; 12:591--612.
7. O'Brien RM. A caution regarding rules of thumb for variance inflation factors. Quality and Quantity 2007; 41:673--690.
8. Belsley DA, Kuh E, Welsch RE. Regression Diagnostics. Wiley: Hoboken, NJ, 1980.
9. Weisberg S. Applied Linear Regression. Wiley: Hoboken, NJ, 2005.
10. Massy WF. Principal components regression in exploratory statistical research. Journal of the American Statistical Association 1965; 60:234--256.
11. Farebrother RW. Gram–Schmidt regression. Applied Statistics 1974; 23:470--476.
12. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970; 12:55--67.
13. Webster JT, Gunst RF, Mason RL. Latent root regression analysis. Technometrics 1974; 16:513--522.
14. Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. Prentice-Hall: Upper Saddle River, NJ, 2002.
15. Hocking RR. The analysis and selection of variables in linear regression. Biometrics 1976; 32:1--49.
16. Hoerl AE, Kennard RW, Baldwin KF. Ridge regression: Some simulations. Communications in Statistics 1975; 4:105--123.
17. Hoerl AE, Kennard RW. Ridge regression: Iterative estimation of the biasing parameter. Communications in Statistics 1976; A5:77--88.
18. Raftery AE, Madigan D, Hoeting JA. Bayesian model averaging for linear regression models. Journal of the American Statistical Association 1997; 92:179--191.
19. Thall PF, Russell KE, Simon RM. Variable selection in regression via repeated data splitting. Journal of Computational and Graphical Statistics 1997; 6:416--434.
20. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 2005; 67:301--320.
Authors’ biographies
Dr Christina Mastrangelo is an Associate Professor of Industrial & Systems Engineering at the University of Washington. She holds BS, MS, and Ph.D. degrees in Industrial Engineering from Arizona State University. Dr Mastrangelo's research interests lie in the areas of operational modeling for semiconductor manufacturing, system-level modeling for infectious disease control, multivariate quality control, and statistical monitoring methods for continuous and batch processing. She is a member of ASA, ASEE, ASQ, INCOSE, INFORMS, and a senior member of IIE.
Yu-Ching Chang was a doctoral student in the Department of Industrial and Systems Engineering at the University of Washington while writing this paper and later earned his Ph.D. He had worked for several years at Taiwan Semiconductor Manufacturing Company. His research interests focus on production and operations management in the semiconductor manufacturing industry.