
  • Proceedings of the 5th NA International Conference on Industrial Engineering and Operations Management Detroit, Michigan, USA, August 10 - 14, 2020

    © IEOM Society International

    Using MS Excel to Design and Optimize Response Surface Methodology-Based Engineering Problems

    Omar Magdi Khalifa*1, Shafeeq Ahmed Syed Ali*2, Ahmed Syed Ali1, Hedia Fgaier3, and Ali Elkamel4

    1Department of Chemical Engineering, Khalifa University, P.O. Box 127788 Abu Dhabi, United Arab Emirates

    2Department of Chemical Engineering, Monash University, Jalan Lagoon Selatan, Bandar Sunway, 47500 Subang Jaya, Selangor, Malaysia

    3Full Sail University, 3300 University Blvd, Winter Park, FL 32792, United States & Valencia College, 1800 S Kirkman Rd, Orlando, FL 32811, United States

    4College of Engineering, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada

    [email protected], [email protected], [email protected], [email protected], [email protected]

    *Both authors contributed equally to this work

    Abstract

    Many engineering problems involve understanding the effects of different variables on a desired output or response. Experiment-based problems can be challenging to assess, especially with limited resources such as time and materials. When theoretical models become complicated and costly to produce, empirical or black-box models are highly sought after. Such models can be built using mathematical and statistical tools that correlate the input(s) and output(s) of a system. A proper design of experiment (DoE) is required to attain credible results and a model that predicts well, which in turn leads to proper optimization of the system. Response surface methodology (RSM) has also been employed for such systems; it provides visualization elements and a systematic approach to modelling an experimental system, combining DoE and optimization in one method. Many software packages are used to carry out DoE and then optimize systems using RSM. Access to such powerful packages can be challenging for many engineers and students; hence, this paper aims to design and optimize an RSM-based case study using MS Excel. The case study is designed to accommodate the main features of an RSM study and to optimize the results with readily available add-ins. This methodology can be employed in engineering courses and serve as a viable learning tool.

    Keywords

    MS Excel, Minitab, Optimization, Response Surface Methodology, Design of Experiment


    1. Introduction

    Designing and performing experiments with multiple variables can be tricky, and the results can be hard to analyze. There is always the option of changing one variable at a time while keeping the other factors constant. However, this might require too many experiments or even lead to a “pseudo-optimal” point. Hence, a proper design of experiment (DoE) should be carried out to attain the best results with the fewest experiments and the highest accuracy. Experimental systems can be modeled and optimized by treating them as a black box: a correlation between the variables (what is controlled) and the response (what is observed), without knowledge of the physical or chemical principles governing the process. A DoE can further be used to optimize such a black box using response surface methodology (RSM), which is a collection of mathematical and statistical tools (Bas 2007). The output of an RSM study can take the form of 3D plots and/or contour maps, which help visualize the response surface, hence the name (Myers et al. 2009).

    Various DoE types are available, and the appropriate one depends on the experimental system concerned. The two-level factorial is among the most widely used DoE methods; it entails varying each of the n variables between two levels, yielding $2^n$ experiments (Montgomery, 1997). Likewise, three-level factorials are used for more accuracy. In general, the more data points available, the more accurate the resulting model. There are also special DoE methods for RSM studies, namely the central composite design and the Box-Behnken design (Box and Draper, 2000). The choice of design depends on the nature of the experimental system and the availability of resources.

    Many software packages are available for designing and optimizing experiments. Here, MS Excel with add-ins is used to design and optimize a typical response surface methodology problem, mimicking the output of the same problem solved with the free trial version of Minitab 19™. MS Excel has proven to be a reliable tool for scientists and engineers, competing with data analysis software programs (Sinex, 2009). It is also a great tool for tackling complex problems that require numerical methods (Billo, 2007). Lastly, MS Excel spreadsheets serve as a viable tool for teaching statistics (Nash, 2008).

    2. Problem

    The problem is taken from the Minitab™ website, whose data and results are compared with the MS Excel solution developed in this paper. The problem statement is as follows:

    “A package engineer needs to ensure that the seals on plastic bags that hold a product are strong enough to prevent leaks, yet not so strong that the consumer cannot open the bags. The bags keep surgical instruments dry and sterile until someone opens the bags. The engineer wants to optimize the seal strength to between 20 and 32 lbs. (lower and upper bounds) with a target of 26 lbs. The engineer also wants to minimize the variability of seal strength so that it is 1 or less. The engineer determines that hot bar temperature, dwell time, and hot bar pressure are factors that affect the strength of the seal. The engineer also determines that hot bar temperature, dwell time, and material temperature are important factors that affect the variation. The engineer designs a central composite response surface experiment to examine the factors that impact the strength and variability of the seal. The engineer uses the natural log transformation to analyze the variability of the seal.

    The engineer collects data and analyzes the design to determine which factors impact seal strength.”

    3. Excel Procedures:

    The following approach is undertaken to design and optimize the seal strength problem. The steps can be duplicated to solve similar response surface methodology (RSM) problems.

    3.1. Determining Objective, Response variable and Factors

    Objective: To optimize seal strength to target and minimize the variability of seal strength.

    Response variables: Strength and Variability of Strength (VarStrength)

    Factors: Hot bar temperature (HotBarT | A), Hot bar pressure (HotBarP | B), Material Temperature (MatTemp | C) and Dwell time (DwelTime | D)


    3.2. Importing data to Excel Worksheet

    Copy and paste the data into an Excel worksheet.

    3.3. Determining the regression equations for Strength & VarStrength

    3.3.1. Determining the coded values for each factor level

    The coded value is determined based on the following equation (Dunn, 2010):

    $$\text{Coded Value} = \frac{\theta - \text{Average}}{\text{Range}/2}$$

    Where, $\theta$ = uncoded value

    $$\text{Average} = \frac{\text{Maximum value of factor} + \text{Minimum value of factor}}{2}$$

    $$\text{Range} = \frac{\text{Maximum value of factor} - \text{Minimum value of factor}}{2}$$

    Note: Instead of determining the maximum and minimum values of each factor manually, the Excel functions =MAX(data range) and =MIN(data range) are used.
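The coding step can also be sketched outside Excel. The Python snippet below is a minimal illustration that assumes the definitions of Average and Range given in this section; the function name `code_factor` and the five factor levels are hypothetical and are not the paper's data.

```python
def code_factor(values):
    """Convert uncoded factor settings to coded values.

    Uses the section's definitions:
    average = (max + min) / 2, range = (max - min) / 2,
    coded = (value - average) / (range / 2).
    """
    highest, lowest = max(values), min(values)
    average = (highest + lowest) / 2
    factor_range = (highest - lowest) / 2
    return [(v - average) / (factor_range / 2) for v in values]


# Hypothetical five-level factor (illustrative values only).
levels = [100, 125, 150, 175, 200]
coded = code_factor(levels)
print(coded)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

With these definitions the extreme settings map to -2 and +2, and the intermediate settings to -1, 0 and +1.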

    3.3.2. Determining the matrix of coded coefficients

    The matrix of coded coefficients is determined using

    $$b = (X'X)^{-1}X'Y$$

    Where, Y is the column matrix of responses (Strength or VarStrength) and X is a matrix built from the coded values of the factors and the factor – factor interactions. The first column of X represents the intercept, and all entries in that column are set to 1; the following columns hold the coded values of each factor and each factor – factor interaction. The number of rows of X equals the total number of trials in the experiment (n), and the number of columns is set by the number of factors and the level of factor – factor interactions being considered. For this problem, only interactions of up to two factors are included, and all higher-order interactions are considered negligible. If higher-order interactions need to be considered, more columns can be added to X. Once the coded values for factors A, B, C and D are obtained, the two-factor terms (AA, AB, AC, AD, BB, BC, BD, CC, CD, DD) are obtained as the product of the corresponding individual factors. With both X and Y available, the matrix of coded coefficients can be determined by performing the matrix operations one at a time, following the general guidelines below.

    Note: General guidelines for matrix operations in MS Excel (Chaamwe and Shumba, 2016)

    1. The size (m×n) of the resultant matrix has to be determined beforehand. (Matrix X is 31×15, therefore X’ will be 15×31.)

    2. Continuing the example of X’, once the size is determined, a drag selection covering exactly 15×31 cells is made in the area where the matrix is required.

    3. The respective matrix function is then typed, for example =TRANSPOSE(array), with the array of data to be transposed (matrix X) selected within the brackets.

    4. For all matrix operations, once the function is typed it must be entered by pressing ctrl+shift+enter.

    5. Other matrix functions used are =MMULT(array1, array2), =MINVERSE(array1), =MMULT(array, constant), etc.

    6. Matrix operations are highly sensitive to the order in which they are performed, and hence only a single operation is performed at a time.
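For readers who want to cross-check the spreadsheet, the same calculation b = (X'X)⁻¹X'Y can be sketched in plain Python. This is a minimal sketch, not the authors' workbook: the helper names (`transpose`, `matmul`, `inverse`, `fit_coefficients`) are hypothetical stand-ins for TRANSPOSE, MMULT and MINVERSE, and the small two-factor design and responses are made-up illustration data.

```python
def transpose(m):
    """Counterpart of =TRANSPOSE(array)."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Counterpart of =MMULT(array1, array2)."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Counterpart of =MINVERSE(array): Gauss-Jordan with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_coefficients(x, y):
    """Return b = (X'X)^-1 X'Y as a column matrix."""
    xt = transpose(x)
    return matmul(matmul(inverse(matmul(xt, x)), xt), y)


# Hypothetical coded design: columns are intercept, A, B, AB.
design = [[1, -1, -1, 1],
          [1, 1, -1, -1],
          [1, -1, 1, -1],
          [1, 1, 1, 1]]
# Responses generated from y = 10 + 2A - 3B + 0.5AB (illustration only).
response = [[11.5], [14.5], [4.5], [9.5]]

coefficients = fit_coefficients(design, response)
print([round(row[0], 6) for row in coefficients])
```

Because the responses were generated from a known equation, the recovered coefficients should match 10, 2, -3 and 0.5, which is a convenient way to check a spreadsheet layout before using real data.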

    3.3.3 Determining the matrix of un-coded coefficients

    The coded coefficient values are converted to uncoded coefficient values using the following equations:

    $$\text{Intercept}_{uncoded} = \text{Intercept}_{coded} - \sum_{i=1}^{n}\left(\frac{\beta_i\,\theta_{avg,i}}{\tfrac{1}{2}\theta_{range,i}}\right) + \sum_{i=1}^{n}\sum_{j=i}^{n}\left(\frac{\beta_{ij}\,\theta_{avg,i}\,\theta_{avg,j}}{\tfrac{1}{2}\theta_{range,i}\cdot\tfrac{1}{2}\theta_{range,j}}\right)$$

    $$\beta_{i,uncoded} = \frac{\beta_{i,coded}}{\tfrac{1}{2}\theta_{range,i}} - \frac{2\beta_{ii}\,\theta_{avg,i}}{\left(\tfrac{1}{2}\theta_{range,i}\right)^{2}} - \sum_{j \neq i}\left(\frac{\beta_{ij}\,\theta_{avg,j}}{\tfrac{1}{2}\theta_{range,i}\cdot\tfrac{1}{2}\theta_{range,j}}\right)$$

    $$\beta_{ij,uncoded} = \frac{\beta_{ij,coded}}{\tfrac{1}{2}\theta_{range,i}\cdot\tfrac{1}{2}\theta_{range,j}}$$

    Where, $\beta_{ij,uncoded}$ refers to coefficients for interaction terms. On the right-hand sides, $\beta_i$, $\beta_{ii}$ and $\beta_{ij}$ are the coded coefficients, $\theta_{avg,i}$ and $\theta_{range,i}$ are the Average and Range of factor i as defined in Section 3.3.1, n is here the number of factors, and $\beta_{ij}$ with j < i denotes the same coefficient as $\beta_{ji}$.

    Note that different equations are used for the intercept, the single – factor terms and the two – factor interaction terms.
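The conversion can be checked numerically: substituting the coded value of each factor into the coded model and expanding must give the same prediction as the uncoded model. The sketch below assumes a second-order model in which each coded value is (θ − θ_avg) divided by half of θ_range; the function names and all numbers are hypothetical.

```python
def uncode_coefficients(b0, linear, second, avg, half_range):
    """Convert coded coefficients to uncoded ones.

    linear[i]      -> beta_i
    second[(i, j)] -> beta_ij with i <= j (i == j is the squared term)
    avg[i]         -> theta_avg,i
    half_range[i]  -> (1/2) * theta_range,i
    """
    n = len(linear)

    def beta(i, j):
        return second.get((min(i, j), max(i, j)), 0.0)

    intercept = b0
    intercept -= sum(linear[i] * avg[i] / half_range[i] for i in range(n))
    intercept += sum(beta(i, j) * avg[i] * avg[j] / (half_range[i] * half_range[j])
                     for i in range(n) for j in range(i, n))

    new_linear = []
    for i in range(n):
        value = linear[i] / half_range[i]
        value -= 2 * beta(i, i) * avg[i] / half_range[i] ** 2
        value -= sum(beta(i, j) * avg[j] / (half_range[i] * half_range[j])
                     for j in range(n) if j != i)
        new_linear.append(value)

    new_second = {(i, j): b / (half_range[i] * half_range[j])
                  for (i, j), b in second.items()}
    return intercept, new_linear, new_second


def predict(b0, linear, second, x):
    """Evaluate a second-order model at the point x."""
    return (b0 + sum(b * xi for b, xi in zip(linear, x))
            + sum(b * x[i] * x[j] for (i, j), b in second.items()))


# Hypothetical two-factor model in coded units.
coded = (20.0, [3.0, -2.0], {(0, 0): 1.5, (0, 1): 0.5, (1, 1): -1.0})
avg, half = [150.0, 1.0], [25.0, 0.25]
uncoded = uncode_coefficients(*coded, avg, half)

theta = [160.0, 1.2]                                   # uncoded settings
x = [(t - a) / h for t, a, h in zip(theta, avg, half)]  # coded settings
print(predict(*coded, x), predict(*uncoded, theta))
```

The two printed predictions should agree to within rounding error; a mismatch in a spreadsheet implementation usually points to a sign or index slip in one of the three formulas.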

    An example of how the formula is entered in Excel to calculate the uncoded coefficient of B is shown:

    Figure 1: Excel Screenshot of determining the un-coded coefficient

    Note: For the VarStrength coefficients, the matrix Y of Strength in step 3.3.2 is replaced by the matrix Y of VarStrength.

    The un-coded coefficients multiplied by the corresponding factor or two – factor interaction terms give the regression equations. The regression equation thus obtained for Strength is shown below:

    $$\text{Strength} = -289.27 + 2.29A + 206.61B + 0.12C + 0.6D + 0.004AA - 0.93AB - 0.00007AC - 0.00027AD - 39.61BB + 0.044BC + 0.0474BD + 0.00053CC - 0.0001CD + 0.0029DD$$

    3.4. Predicted Response values & Residual Plots

    Once the regression equation is developed, the response value for each trial is determined by substituting the corresponding values of the four factors and the two – factor interaction terms. The values obtained in this way are referred to as predicted response values. A residual is then the difference between the actual response value (from the initial data) and the predicted response value (from the regression equation). With these, four plots can be generated to study the model: normal probability, residuals versus fits, residuals versus order, and histogram. These four plots are of great importance because they can reveal any bias or hidden variable in the system, and thus help assess the general goodness of the model.
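As a small illustration of predicted values and residuals, the sketch below evaluates a second-order equation for a few trials. It assumes only two factors to keep the example short; the coefficients and trial data are hypothetical and unrelated to the seal strength data set.

```python
def predict(coeffs, a, b):
    """Evaluate a two-factor second-order regression equation."""
    return (coeffs["const"] + coeffs["A"] * a + coeffs["B"] * b
            + coeffs["AA"] * a * a + coeffs["AB"] * a * b + coeffs["BB"] * b * b)


# Hypothetical coefficients and trials: (factor A, factor B, observed response).
coeffs = {"const": 5.0, "A": 2.0, "B": -1.0, "AA": 0.5, "AB": 0.25, "BB": 0.0}
trials = [(1.0, 2.0, 6.5), (2.0, 0.0, 11.5), (0.0, 4.0, 0.0)]

predicted = [predict(coeffs, a, b) for a, b, _ in trials]
# Residual = actual response - predicted response.
residuals = [y - y_hat for (_, _, y), y_hat in zip(trials, predicted)]

print(predicted)  # [6.0, 11.0, 1.0]
print(residuals)  # [0.5, 0.5, -1.0]
```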

    3.4.1. Normal Probability Plot

    Here the normal probability chart is generated using the median rank method; other methods are also available and can be selected based on the available data and requirements. The column of residual response values is sorted from smallest to largest.


    The median rank is then determined using the following equation:

    $$\text{median rank} = \frac{i - 0.3}{n + 0.4}$$

    Where, i refers to the rank of the data point and n is the total number of trials

    The sorted residual response values are then plotted against the median rank.
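The sorting and ranking steps can be sketched as follows. The function name `median_ranks` and the four residuals are hypothetical; the formula is the median rank expression given above.

```python
def median_ranks(residuals):
    """Pair each sorted residual with its median rank (i - 0.3) / (n + 0.4)."""
    ordered = sorted(residuals)
    n = len(ordered)
    return [(value, (i - 0.3) / (n + 0.4))
            for i, value in enumerate(ordered, start=1)]


# Hypothetical residuals (illustrative values only).
for residual, rank in median_ranks([0.5, -1.0, 0.2, 1.3]):
    print(f"{residual:6.2f}  {rank:.4f}")
```

Each pair gives one point of the chart: the sorted residual on one axis and its median rank on the other.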

    Note: The trendline generated by Excel for normal probability charts differs from the one generated by Minitab.

    3.4.2. Versus Fits Graph

    A versus fits graph can be plotted by inserting a recommended chart after selecting the predicted value and residual value columns.

    3.4.3. Versus Order Graph

    A versus order graph can be plotted by inserting a recommended chart after selecting the run order and residual value columns. (Run order refers to the order in which the trials were performed; to generate the graph, run order is a column with the numerical entries 1 to 31.)

    3.4.4. Histogram

    A histogram is plotted by first generating a pivot table, which groups the residual response values into ranges.

    Note: General instructions for creating a pivot table

    1. Create a pivot table with the values of the residual (VarStrength).

    2. Drag the VarStrength residual to the ROWS and VALUES areas.

    3. Adjust the VALUES area to show the count of values.

    4. Group the VarStrength residual in ROWS by adjusting the minimum, maximum, and step values.

    5. Create a pivot chart by selecting the above columns. This yields the histogram of the residuals.

    Note: The above steps can be repeated using the residual response values of VarStrength to obtain the residual plots for variation of strength.
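The grouping that the pivot table performs (a minimum, a step, and a count per range) can be mimicked with a short sketch. The function name `bin_counts`, the residuals and the bin settings below are hypothetical.

```python
import math
from collections import Counter


def bin_counts(values, minimum, step):
    """Count how many values fall in each [lower, lower + step) range."""
    counts = Counter(math.floor((v - minimum) / step) for v in values)
    return {(minimum + k * step, minimum + (k + 1) * step): c
            for k, c in sorted(counts.items())}


# Hypothetical residuals grouped from -1.0 in steps of 0.5.
residuals = [-0.9, -0.4, -0.1, 0.1, 0.2, 0.6]
for (lower, upper), count in bin_counts(residuals, -1.0, 0.5).items():
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

The counts per range are the bar heights of the histogram.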

    3.5. Sum of Squares (SS) and Predicted Error in Sum of Squares (PRESS)

    Using MS Excel, only SS Total and SS Error could be determined, not the sum of squares for each individual factor. This is due to the unbalanced data set for each factor level (the number of trials representing each factor level is not equal). To determine the individual SS values, more advanced software such as Minitab or R has to be used.

    3.5.1. Sum of Squares (SS) Total

    Sum of Squares (SS) Total is determined using the following equation:

    $$SS\ Total = \sum_{i}\sum_{j}\left(y_{ij} - \bar{y}_{..}\right)^{2} = \sum\left(y_{i} - \bar{y}\right)^{2}$$

    Where, $y_{ij}$ and $y_{i}$ are the response values, and $\bar{y}_{..}$ and $\bar{y}$ are the mean of the response values.

    3.5.2. Sum of Squares (SS) Error

    Sum of Squares (SS) Error is determined using the following equation:

    $$SS\ Error = \sum_{i}\sum_{j}\left(y_{ij} - \hat{y}_{i}\right)^{2} = \sum\left(y_{i} - \hat{y}_{i}\right)^{2}$$

    Where, $y_{ij}$ and $y_{i}$ are the response values, and $\hat{y}_{i}$ are the predicted response values. In other words, SS Error is the sum of squares of the residual values.
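Both sums of squares reduce to one-line calculations once the observed and predicted columns are available. The sketch below uses hypothetical function names and made-up values.

```python
def ss_total(observed):
    """Sum of squared deviations of the responses from their mean."""
    mean = sum(observed) / len(observed)
    return sum((y - mean) ** 2 for y in observed)


def ss_error(observed, predicted):
    """Sum of squared residuals (observed minus predicted)."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))


# Hypothetical observed and predicted responses.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(ss_total(observed), ss_error(observed, predicted))  # 20.0 1.0
```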


    3.5.3. Predicted Error in Sum of Squares (PRESS)

    PRESS is used to determine the R – sq (pred) in the model summary. It is calculated as follows:

    $$PRESS = \sum_{i=1}^{n}\left(\frac{e_{i}}{1 - h_{i}}\right)^{2}$$

    Where, n is the number of observations, $e_{i}$ is the ith residual, and $h_{i}$ is the ith diagonal element of $X(X'X)^{-1}X'$.

    The required matrix $X(X'X)^{-1}X'$ is constructed step by step following the general rules for matrix operations in Excel stated above.

    Note: To extract the diagonal entries of a square matrix, the Excel function =INDEX(A1:E1,,rows(array)) can be used. ‘A1:E1’ has to be replaced with the range representing the first row of the matrix from which the diagonal entries are to be extracted.
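To make the PRESS formula concrete, the sketch below works through the simplest possible case: a straight-line model with an intercept and one factor, so that X has two columns and (X'X)⁻¹ has a closed form. This simplification, the function names and the three data points are all assumptions made for illustration; the paper's model has 15 columns.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - slope * sx) / n, slope


def hat_diagonal(x):
    """Diagonal of X (X'X)^-1 X' when X has the columns [1, x]."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    return [inv[0][0] + 2 * inv[0][1] * v + inv[1][1] * v * v for v in x]


def press(residuals, leverages):
    """PRESS = sum of (e_i / (1 - h_i))^2."""
    return sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverages))


# Hypothetical data (illustrative values only).
x = [-1.0, 0.0, 1.0]
y = [1.0, 2.0, 4.0]
b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(press(residuals, hat_diagonal(x)))
```

For these three points the result is 2.25 up to rounding, which matches what is obtained by leaving out each point in turn, refitting the line to the other two, and summing the squared prediction errors.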

    3.6. Model Summary

    The model summary shows different values of R-sq. The default R-sq is the percentage of the variation in the response that is explained by the model. The higher the value, the better the model fits the data. The adjusted R-sq is a sort of “normalized” value which is mainly used to compare different models, especially when the number of factors differs between models. The R-sq (pred) is a measure of the predictability of the model. It is calculated by omitting one data point at a time and calculating the residual of that point using the new model to check its predictability. A value of 100% means that the model perfectly maps the response of the system and can be used to predict new points with confidence. A value of 0% (as is the case in this problem) means the model is over-fitted and will predict new data points poorly.

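    The following equations are used to calculate model summary values:

    $$DF\ Error = n - p$$

    $$Mean\ Square\ Error\ (MSE \mid s^{2}) = \frac{SS\ Error}{DF\ Error}$$

    $$S = \sqrt{MSE}$$

    $$R\text{-}sq\ (R^{2}) = 1 - \frac{SS\ Error}{SS\ Total} = 1 - \frac{\sum\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum\left(y_{i} - \bar{y}\right)^{2}}$$

    $$R\text{-}sq\ (adj) = 1 - \frac{(1 - R^{2}) \times (n - 1)}{(n - p)}$$

    $$R\text{-}sq\ (pred) = 1 - \frac{PRESS}{SS\ Total} = 1 - \frac{\sum_{i=1}^{n}\left(\frac{e_{i}}{1 - h_{i}}\right)^{2}}{\sum\left(y_{i} - \bar{y}\right)^{2}}$$

    Where, n is the number of trials and p is the number of coefficients in the model, including the intercept.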
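The model summary values follow directly from SS Error, SS Total and PRESS. The sketch below assumes that p counts all model coefficients including the intercept; the function name and the sums of squares are hypothetical, while n = 31 and p = 15 correspond to the 31×15 matrix X of this problem.

```python
import math


def model_summary(ss_err, ss_tot, press_value, n, p):
    """Compute the model summary statistics from the sums of squares."""
    df_error = n - p
    mse = ss_err / df_error
    r_sq = 1 - ss_err / ss_tot
    return {
        "DF Error": df_error,
        "MSE": mse,
        "S": math.sqrt(mse),
        "R-sq": r_sq,
        "R-sq (adj)": 1 - (1 - r_sq) * (n - 1) / (n - p),
        "R-sq (pred)": 1 - press_value / ss_tot,
    }


# Hypothetical sums of squares (illustrative values only).
summary = model_summary(ss_err=4.0, ss_tot=100.0, press_value=10.0, n=31, p=15)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```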

    3.7. Standard Error of the Coefficient (SE Coeff) & T – value

    SE coeff is determined using the following equation:

    $$SE\ Coeff = \sqrt{\text{Diagonal element of the matrix } (X'X)^{-1} \times MSE}$$

    Note: The required matrix is constructed step by step following the general rules for matrix operations in Excel. $(X'X)^{-1}$ was already determined as part of the regression equation calculations and is therefore taken from there. MSE was determined as part of the model summary.

    Diagonal elements can be extracted using =INDEX(A1:E1,,Rows(array)), as mentioned before.


    T – value is then determined from the following relation:

    $$T\text{-}value = \frac{\text{Coded value of the regression coefficient}}{SE\ Coeff}$$
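Given the diagonal of (X'X)⁻¹ and the MSE, the two quantities of this section can be sketched as below. The function names and all numbers are hypothetical.

```python
import math


def se_coefficients(xtx_inv_diagonal, mse):
    """SE Coeff = sqrt(diagonal element of (X'X)^-1 times MSE)."""
    return [math.sqrt(d * mse) for d in xtx_inv_diagonal]


def t_values(coded_coefficients, se):
    """T-value = coded coefficient divided by its SE Coeff."""
    return [b / s for b, s in zip(coded_coefficients, se)]


# Hypothetical inputs (illustrative values only).
se = se_coefficients([0.25, 0.04], mse=4.0)
print(se)                          # approximately [1.0, 0.4]
print(t_values([3.0, -1.0], se))   # approximately [3.0, -2.5]
```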

    3.8. Pareto Charts

    A Pareto chart shows the absolute values of all standardized effects in descending order and also shows a reference line to indicate which values are statistically significant.

    A Pareto chart can be generated through the following steps:

    1. Determine the absolute values of all the T – values. The =ABS(value) Excel function can be used.

    2. Sort the absolute T – values from smallest to largest.

    Note: The drag selection starts from the cell containing the first absolute T – value and extends over the entire range of absolute T – values and the column containing the coefficient names. This ensures that the coefficient names are sorted together with the numerical values, which makes plotting easier.

    3. Draw the Pareto chart by inserting a clustered bar graph for the sorted numerical data.

    4. The margin of error, which is also plotted in the Pareto chart, is determined using the Excel function =TINV(alpha, degrees of freedom for error).

    3.9. Contour Plots

    Excel is generally unable to process raw data to generate surface and contour plots. It usually requires the data to be in mesh format (x-axis data in columns, y-axis data in rows, and the z-axis values in the cells corresponding to the x and y axes). Add-ins like XYZ Mesh can be used to generate the mesh data from raw data in Excel.

    It is still possible to generate surface and contour plots if the data are manually arranged in a mesh format. However, this is time consuming, prone to errors and almost impossible for large data sets.

    There is also a possibility that experimental data are not available for all combinations of X and Y. Excel then treats the missing data points as 0, and the generated plots are distorted.

    1. One example was plotted manually to understand the limitations of Excel by varying HotBarT and DwelTime over the given factor levels, while HotBarP and MatTemp were kept at their minimum factor level values.

    2. The regression equation is entered in each cell to determine the response value corresponding to the HotBarT and DwelTime of that cell (note that HotBarP and MatTemp are kept constant at the lowest factor level in all cases).

    3. Once the mesh data table is completed, surface plots are inserted from the recommended charts.

    Note: Increasing the number of factor levels being considered results in a smoother curve.
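The mesh table described above (x-axis values across the columns, y-axis values down the rows, and the response in each cell) can be generated with a short sketch. The helper names and the response function are hypothetical stand-ins for the regression equation with the other two factors held constant.

```python
def levels(low, high, count):
    """Evenly spaced factor levels from low to high (count >= 2)."""
    step = (high - low) / (count - 1)
    return [low + k * step for k in range(count)]


def mesh_table(x_levels, y_levels, response):
    """One row per y level, one column per x level."""
    return [[response(x, y) for x in x_levels] for y in y_levels]


def toy_response(x, y):
    """Hypothetical response; replace with the fitted regression equation."""
    return x + 10 * y


table = mesh_table([1, 2, 3], [0, 1], toy_response)
for row in table:
    print(row)
# [1, 2, 3]
# [11, 12, 13]
```

Passing more levels to `levels` gives a finer mesh, which is what produces the smoother surface mentioned in the note.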

    3.10. Optimization

    Optimization for the given problem is done using the ‘Solver’ add-in in Excel. In some Excel installations the Solver add-in is not available by default and has to be set up before it can be used. The Solver add-in can be enabled in Excel using the following steps:

    File -> Options -> Add-ins -> Solver -> Go

    Select Solver Add-in -> press OK

    Once the solver add-in is available the optimization can be done as follows:

    1. Determining the optimization objective:

    The problem statement requires Strength = 26 & VarStrength < 1


    2. One set of factor values is copied and pasted, and the corresponding responses for Strength and VarStrength are calculated by constructing the regression equations.

    Figure 2: Screenshot showing the excel setup for optimization

    Note: The regression equation shown in the picture is not the complete one

    3. Select Solver under the ‘Data’ tab in Excel to start the optimization.

    4. Here the objective is set so that the cell containing Strength converges to a value of 26 by changing the cells containing the factor values. The constraint is currently set to keep the VarStrength value less than 1 and greater than 0.

    Figure 3: Screenshot showing the solver dialogue box and the selected objective and constraints

    Notes: If the factors have any limitations or ranges (upper and lower limits), these can be added as constraints as well.

    The objective can also be to minimize the value of VarStrength. In this case, the condition that Strength equals 26 becomes one of the constraints.


    Based on the data available and various requirements different solving methods can be used. The solution generated will be different for each case.

    5. The Optimized results will be shown in the same cells which were used to initialise the response equation.

    Figure 4: Screenshot showing the optimized result

    Figure 4 shows the levels at which the factors have to be set to meet the requirements on the Strength and VarStrength responses. Note: Depending on the constraints provided and the solving method selected, the answer might differ. This is acceptable as long as the answer is logical, obeys physical laws and falls within the required range of the constraints.
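The structure of the optimization (a target for Strength, a bound on VarStrength, and factor cells that are allowed to change) can be illustrated with a plain grid search. This is not the algorithm used by Solver; it is a deliberately simple stand-in. The two response functions, the factor levels and the function name `grid_search` are hypothetical.

```python
def strength(a, b):
    """Hypothetical stand-in for the Strength regression equation."""
    return 10 + 2 * a + 3 * b


def variability(a, b):
    """Hypothetical stand-in for the VarStrength regression equation."""
    return 0.2 + 0.1 * a * a


def grid_search(target, max_variability, levels):
    """Pick the feasible setting whose strength is closest to the target.

    Feasible means 0 < variability < max_variability; ties on the
    distance to the target are broken by the lower variability.
    """
    best = None
    for a in levels:
        for b in levels:
            var = variability(a, b)
            if not 0 < var < max_variability:
                continue
            score = (abs(strength(a, b) - target), var)
            if best is None or score < best[0]:
                best = (score, (a, b))
    return None if best is None else best[1]


levels = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
best_setting = grid_search(target=26, max_variability=1, levels=levels)
print(best_setting)  # (0.5, 5.0)
```

Several settings on this grid reach the target exactly, which mirrors the note above: different methods or tie-breaking rules can return different but equally acceptable answers.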

    4. Conclusion

    An RSM-based engineering problem was successfully designed and optimized using MS Excel. The results were in alignment with the Minitab™-based solution, and MS Excel was able to reproduce most of the results. In the course of the study, it was also observed that, owing to the advanced nature of some statistical analyses (such as the particular case of ANOVA in this problem, or the mesh table required for contour plots), MS Excel has limitations in reproducing some results. This is primarily attributed to the non-normality in some DoE designs and to unequal/unsymmetrical sets of data. With readily available add-ins, these limitations can be easily overcome. Still, MS Excel is a powerful software package for students undertaking statistics courses to understand the underlying mathematical analyses and apply advanced statistical tools to real-life engineering problems.

    5. Conflict of Interest

    None of the authors has conflicting interests.

    Acknowledgements

    The authors are grateful for the support and facilities provided by Khalifa University of Science and Technology in Abu Dhabi (UAE).

    References

    Bas, D., Modelling and optimization I : Usability of response surface methodology, 78 (2007) 836–845. doi:10.1016/j.jfoodeng.2005.11.024.

    Billo, E. J., Excel for scientists and engineers: numerical methods, John Wiley & Sons, 2007.

    Box, G. EP., and Draper, N. R., Response surfaces, mixtures, and ridge analyses. Vol. 649. John Wiley & Sons, 2007. https://onlinelibrary.wiley.com/doi/book/10.1002/0470072768


    Chaamwe, N., and Shumba, L., Spreadsheets: A Tool for e-Learning — A Case of Matrices in Microsoft Excel, 6 (2016). DOI: 10.7763/IJIET.2016.V6.753

    Dunn, G. K., Process Improvement Using Data, McMaster, 2010.

    Montgomery, D.C., Design and Analysis of Experiments, 4th Edition, John Wiley and Sons. New York: 1997.

    Myers, R.H., Montgomery, D.C., and Anderson-Cook, C.M., Response surface methodology: process and product optimization using designed experiments, Wiley, 2009.

    Nash, J. C., Teaching statistics with Excel 2007 and other spreadsheets, 52 (2008) 4602-4606. DOI: 10.1016/j.csda.2008.03.008

    Ryan, B. F., Ryan, T. A., Jr., and Joiner, B. L., Minitab: Data Analysis, Statistical & Process Improvement Tools, 2019.

    Sinex, S. A., Excel for Scientists and Engineers: Numerical Methods (E. Joseph Billo); Advanced Excel for Scientific Data Analysis, (Robert de Levie), (2009) 570.

    Biographies

    Omar Magdi Khalifa is a graduate student currently pursuing his Master of Science in Chemical Engineering at Khalifa University. He obtained his BSc in Chemical Engineering from The Petroleum Institute, Khalifa University in 2018. He joined the Center for Membranes and Advanced Water Technology (CMAT) in 2018, where he is working on a produced water treatment project. His interests include electrochemical processes, ceramic membranes technology, advanced oxidation processes (AOPs), oil-in-water emulsions, black-box optimization, process simulation, and system hybridization. He served as president of the AIChE student chapter during his bachelor's studies and is working as a Teaching Assistant in the Department of Chemical Engineering at Khalifa University.

    Shafeeq Ahmed Syed Ali is an undergraduate student currently pursuing his Bachelor of Chemical Engineering (hons) at Monash University Malaysia. He has completed a design project on novel bio-based acrylic acid production from raw sugar and is currently working on a research project titled ‘Glycerol Valorization through infinity loop’ which explores the potential of Eco – Industrial Parks in the sight of Biodiesel based applications. His research interests include: sustainable processes, industrial ecology, biochemistry, and statistical optimization. He is affiliated to Khalifa University of Science and Technology by virtue of a research internship.

    Ahammed Syed Ali is a Lab Engineer at Khalifa University. He earned a B.Tech. in Chemical Engineering from the University of Kerala, India, and an M.Tech. in Chemical Engineering from the Indian Institute of Technology, Madras. He was an Assistant Professor at TKM College of Engineering (University of Kerala) from Feb 2002 to August 2008. He worked at the Yanbu Industrial College, KSA, as a Process Instructor & Trainer at the Qatar Petroleum, Doha. He has also delivered several presentations in seminars & training programs.

    Hedia Fgaier is currently a Professor of Mathematics at Full Sail University. Prior to this she was a Lecturer at the University of Waterloo and an Assistant Professor of Applied Mathematics at Al-Ain University of Science & Technology. Dr. Fgaier holds a PhD and a Master's degree in applied mathematics from the University of Guelph, ON, Canada. Her research interests lie in the areas of dynamical systems, computer simulation, parameter estimation, and optimal control with applications to biology and medicine. Dr. Fgaier envisions her research to be a blend of theoretical investigations, development of computational methods, and the building and analysis of mathematical models of nonlinear systems. She has published in peer-reviewed journals such as Journal of Theoretical Biology and Computers & Chemical Engineering. She has participated in national and international conferences and workshops.

    Ali Elkamel is a Professor of Chemical Engineering. He holds a BSc in Chemical Engineering and a BSc in Mathematics from Colorado School of Mines, an MSc in Chemical Engineering from the University of Colorado-Boulder, and a PhD in Chemical Engineering from Purdue University – West Lafayette, Indiana. His specific research interests are in computer-aided modelling, optimization and simulation with applications to energy production planning, carbon management, sustainable operations and product design. Professor Elkamel is currently focusing on research projects related to energy systems, integration of renewable energy in process operations and energy production systems, and the utilization of data analytics (Digitalization), machine learning, and Artificial Intelligence (AI) to improve process and enterprise-wide efficiency and profitability. Prof. Elkamel has supervised over 90 graduate students and more than 30 post-doctoral fellows/research associates. Among his accomplishments are the Research Excellence Award, the Excellence in Graduate Supervision Award, the Outstanding Faculty Award, the Best Teacher Award, and the IEOM (Industrial Engineering and Operations Management) Outstanding Service and Distinguished Educator Award. He has more than 280 journal articles, 141 proceedings, and 33 book chapters. He is also a co-author of four books; two recent books, published by Wiley, are entitled Planning of Refinery and Petrochemical Operations and Environmentally Conscious Fossil Energy Production.


    Appendix

    Figure A1: Screenshot of determining the coded values, based on the given data and by applying the shown equation.

    Figure A2: Screenshot of Matrix X, which is constructed from the four factors and their two-factor interactions.
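    The steps captioned in Figures A1 and A2 can be sketched outside Excel as follows. This is a minimal Python illustration, not the worksheet itself: it assumes the usual coding transform (value minus the mid-level, divided by half the range), and it assumes that Matrix X holds an intercept column, the four main-effect columns, and the six two-factor interaction columns. The function names and the numeric levels are hypothetical and are not the paper's data.

```python
from itertools import combinations

import numpy as np


def code_levels(x, low, high):
    """Map natural factor values onto the coded scale (low -> -1, high -> +1)."""
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (np.asarray(x, dtype=float) - center) / half_range


def build_design_matrix(coded):
    """Columns: intercept, main effects, then every two-factor interaction."""
    coded = np.asarray(coded, dtype=float)
    n_runs, n_factors = coded.shape
    columns = [np.ones(n_runs)]
    columns += [coded[:, j] for j in range(n_factors)]
    columns += [coded[:, i] * coded[:, j]
                for i, j in combinations(range(n_factors), 2)]
    return np.column_stack(columns)


# Hypothetical natural levels for one factor (illustrative only).
coded_a = code_levels([100.0, 150.0, 200.0], low=100.0, high=200.0)

# Hypothetical three-run table with four coded factors A, B, C, D.
coded_runs = np.array([[-1, -1, -1, -1],
                       [ 1, -1,  1,  0],
                       [ 0,  1, -1,  1]], dtype=float)
X = build_design_matrix(coded_runs)  # 1 + 4 + 6 = 11 columns
```

    Squared terms, if the model includes them, could be appended in the same way as extra columns of `coded[:, j] ** 2`.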


    Figure A3: Screenshot of determining the un-coded coefficients for the regression equations.

    Figure A4: Screenshot of determining the predicted response values by using the generated regression equation.
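    As a cross-check on the worksheet steps in Figures A3 and A4, the regression coefficients, predicted responses, and residuals can be reproduced with an ordinary least-squares fit. The sketch below is an assumption about the calculation, not a copy of the spreadsheet: it uses NumPy's `lstsq` routine, and the design matrix and response are synthetic, noise-free values chosen so that the fit recovers the known coefficients.

```python
import numpy as np


def fit_coefficients(X, y):
    """Least-squares estimate of b in y = X b (minimizes the sum of squared residuals)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


# Synthetic two-factor example: 2^2 factorial points plus two center points.
x1 = np.array([-1.0, 1.0, -1.0, 1.0, 0.0, 0.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0, 0.0, 0.0])
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

true_b = np.array([20.0, 3.0, -2.0, 1.5])  # made-up coefficients
y = X @ true_b                             # noise-free synthetic response

b = fit_coefficients(X, y)   # estimated coefficients
y_hat = X @ b                # predicted response values
residuals = y - y_hat        # inputs to the residual plots
```

    The same function applies whether X is built from coded or un-coded factor values; only the interpretation of the coefficients changes.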


    Figure A5: Screenshot of generating the Normal Probability chart, indicating i and median rank
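    The rank index i and the median rank used for a normal probability chart can be computed as sketched below. The exact formula used in the worksheet is not visible in this caption, so the sketch assumes Benard's approximation, (i - 0.3) / (n + 0.4), which is a common choice for median ranks. The function names and residual values are hypothetical.

```python
from statistics import NormalDist


def median_ranks(n):
    """Benard's approximation to the median rank for i = 1..n (assumed formula)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def normal_plot_points(residuals):
    """Return (sorted residual, median rank, normal score) for each point."""
    ordered = sorted(residuals)
    ranks = median_ranks(len(ordered))
    scores = [NormalDist().inv_cdf(p) for p in ranks]
    return list(zip(ordered, ranks, scores))


# Hypothetical residuals (illustrative only).
points = normal_plot_points([1.2, -0.4, 0.3, -2.1, 0.9])
```

    Plotting the median rank (or its normal score) against the sorted residual gives the normal probability chart; points close to a straight line suggest approximately normal residuals.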

    Figure A6: Screenshot of MESH table for generating contour & surface plots. The selected cell shows how the particular response value corresponding to the given factors was determined.
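    A mesh table of this kind evaluates the fitted model over a grid of two factors while the remaining factors are held fixed. A minimal Python sketch is given below, assuming a second-order model in two factors. The coefficient values, factor ranges, and function name are hypothetical and do not come from the paper's regression results.

```python
import numpy as np


def response(x1, x2, coef):
    """Hypothetical second-order model in two factors."""
    b0, b1, b2, b12, b11, b22 = coef
    return (b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
            + b11 * x1 ** 2 + b22 * x2 ** 2)


# Illustrative grid: five levels per factor.
x1_levels = np.linspace(125.0, 225.0, 5)
x2_levels = np.linspace(0.25, 1.25, 5)
X1, X2 = np.meshgrid(x1_levels, x2_levels)

coef = (10.0, 0.05, 4.0, -0.01, -0.0001, -2.0)  # made-up coefficients
Z = response(X1, X2, coef)  # one response value per grid cell
```

    Each entry `Z[r, c]` plays the role of one cell in the mesh table: the response at `x1_levels[c]` and `x2_levels[r]`. The array can then be passed to any contour or surface plotting routine.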


    Solutions for Response – Strength

    Figure A7: Screenshot of obtained regression coefficients in coded & un-coded values

    Figure A8: Screenshot of residual plot – versus fit

    [Residual plot: Versus Fits - Strength]


    Figure A9: Screenshot of residual plot – versus order

    Figure A10: Screenshot of residual plots - Histogram

    Figure A11: Screenshot of residual plots – Normal probability

    [Residual plot: Versus Order - Strength]

    [Residual plot: Histogram - Strength]

    [Residual plot: Normal Probability Plot for Strength; x-axis: Residual, y-axis: Percentage]


    Figure A12: Screenshot of obtained Model summary

    Figure A13: Screenshot of Table detailing the SE coeff and T-value


    Figure A14: Screenshot of generated Pareto chart

    Figure A15: Screenshot of contour plot

    [Chart: Pareto Chart for Strength]

    [Plot: Contour Plots of Strength; contour bands: -40 to -20, -20 to 0, 0 to 20, 20 to 40]


    Figure A16: Screenshot of surface plots

    Figure A17: Screenshot of optimization results generated using SOLVER add-in

    [Plot: Surface Plots of Strength; surface bands: -40 to -20, -20 to 0, 0 to 20, 20 to 40]


    1. Introduction
    2. Problem
    3. Excel Procedures
    3.1. Determining Objective, Response variable and Factors
    3.2. Importing data to Excel Worksheet
    3.3. Determining the regression equations for Strength & VarStrength
    3.3.1. Determining the coded values for each factor level
    3.3.2. Determining the matrix of coded coefficients
    3.3.3. Determining the matrix of un-coded coefficients
    3.4. Predicted Response values & Residual Plots
    3.4.1. Normal Probability Plot
    3.4.2. Versus Fits Graph
    3.4.3. Versus Order Graph
    3.4.4. Histogram
    3.5. Sum of Squares (SS) and Predicted Error in Sum of Squares (PRESS)
    3.5.1. Sum of Squares (SS) Total
    3.5.2. Sum of Squares (SS) Error
    3.5.3. Predicted Error in Sum of Squares (PRESS)
    3.6. Model Summary
    3.7. Standardized Effect (SE) Coefficient & T-value
    3.8. Pareto Charts
    3.9. Contour Plots
    3.10. Optimization
    4. Conclusion
    5. Conflict of Interest
    Acknowledgements
    References
    Biographies
    Appendix