
Department of Naval Architecture, Ocean and Environmental Engineering
Faculty of Engineering
University of Trieste

SHIP DESIGN
A Rational Approach

Part I

Giorgio Trincas

2010


April 30, 2010


Table of contents

1 Design Science
  1.1 Contents of Design Science
  1.2 Design and Technical Knowledge
  1.3 Design and Science
    1.3.1 Knowledge and Science
    1.3.2 Engineering and Science
    1.3.3 Development of Design Knowledge
  1.4 Areas of Design Science
    1.4.1 Theory of Technical Systems
    1.4.2 Theory of Design Process
Bibliography

2 Standard Design Processes
  2.1 Design Models
    2.1.1 Descriptive Models
    2.1.2 Prescriptive Models
    2.1.3 Hybrid Models
  2.2 Design Approaches
  2.3 Systems Engineering
    2.3.1 Goals
    2.3.2 Process Description
    2.3.3 Synergy with Information Technology
    2.3.4 Critique of Systems Engineering
  2.4 Concurrent Engineering
    2.4.1 Basic Principles and Benefits
    2.4.2 Information Flow
    2.4.3 Concurrent Engineering Environment
  2.5 Quality Function Deployment
  2.6 Design Evaluation Methods
  2.7 Decision–Based Design
    2.7.1 Basic Principles
    2.7.2 Design Types and Structures
  2.8 Decision Support Systems
    2.8.1 Design Time Line
    2.8.2 Designing for an Original Product
    2.8.3 Metadesign
    2.8.4 Axioms of Decision–Based Design
Bibliography

3 Design As a Multicriterial Decision–Making Process
  3.1 Decision Making Process
    3.1.1 Decision Making in Technical Systems Design
  3.2 Basic Concepts of Multicriterial Decision Making
    3.2.1 Why Multicriterial Decision Making?
    3.2.2 Individual Decision Making
    3.2.3 Group Decision Making
    3.2.4 Elements of MCDM
  3.3 Multicriterial Decision–Making Theory
    3.3.1 MCDM Background
    3.3.2 Mathematical Definition
    3.3.3 MADM and MODM
    3.3.4 Properties of Attributes/Objectives
    3.3.5 Typology of MCDM Models
  3.4 Nondominance and Pareto Optimality
    3.4.1 Key Concepts and Notation
  3.5 Theory of the Displaced Ideal
    3.5.1 Measurement of Preferences
    3.5.2 Traditional Utility Approach
    3.5.3 Ideal Point
    3.5.4 Fuzziness and Precision
    3.5.5 Membership Functions
    3.5.6 Multiattribute Dependency
    3.5.7 Composite Membership Functions
  3.6 Multiattribute Selection
  3.7 Multiattribute Utility Theory
    3.7.1 Additive Utility Function
    3.7.2 Risk and Utility Function
  3.8 Multiattribute Concept Design
    3.8.1 Basic Concepts
    3.8.2 Concept Design Process Description
  3.9 Multiobjective Design
    3.9.1 Genetic Algorithm
    3.9.2 Goal Programming
    3.9.3 Compromise Programming
    3.9.4 Physical Programming
  3.10 Advanced Decision Support Systems
    3.10.1 Distributed Decision Support Systems
    3.10.2 Artificial Intelligence
Bibliography

4 Multiattribute Solution Methods
  4.1 Decision Matrix
  4.2 Measuring Attribute Importance
    4.2.1 Eigenvector Method
    4.2.2 Weighted Least Square Method
    4.2.3 Entropy Method
  4.3 Selection Models
    4.3.1 Compensatory and Non–Compensatory Models
    4.3.2 Overview of MADM Methods
  4.4 Methods with No Preference Information
    4.4.1 Dominance Method
    4.4.2 Maximin Method
    4.4.3 Maximax Method
  4.5 Selection Methods with Information on Attributes
  4.6 Methods with Threshold on Attributes
    4.6.1 Conjunctive Method
    4.6.2 Disjunctive Method
  4.7 Methods with Ordinal Information
    4.7.1 Lexicographic Method
    4.7.2 Elimination by Aspects
    4.7.3 Permutation Method
  4.8 Methods with Cardinal Information
    4.8.1 Analytical Hierarchy Process
    4.8.2 Simple Additive Weighting Method
    4.8.3 Hierarchical Additive Weighting Method
    4.8.4 Linear Assignment Method
    4.8.5 ELECTRE Method
    4.8.6 TOPSIS Method
  4.9 LINMAP
  4.10 MAUT Method of Group Decision
  4.11 Methods for Trade–offs
    4.11.1 Marginal Rate of Substitution
    4.11.2 Indifference Curves
    4.11.3 Indifference Curves in SAW and TOPSIS
    4.11.4 Hierarchical Trade–Offs
Bibliography


5 Optimization Methods
  5.1 Mathematical Modelling
  5.2 Historical Development
  5.3 Statement of an Optimization Problem
    5.3.1 Definitions
    5.3.2 Design Optimization
    5.3.3 Graphical Optimization
  5.4 Classical Optimization Techniques
    5.4.1 Single Variable Optimization
    5.4.2 Multivariable Optimization without Constraints
    5.4.3 Multivariable Optimization with Equality Constraints
    5.4.4 Multivariable Optimization with Inequality Constraints
  5.5 Classification of Optimization Techniques
  5.6 Linear Programming
    5.6.1 Graphical Representation
    5.6.2 Standard Form of a Linear Programming Problem
    5.6.3 Definitions and Theorems
    5.6.4 Solution of a System of Linear Simultaneous Equations
    5.6.5 Why the Simplex Method?
    5.6.6 Simplex Algorithm
    5.6.7 Phases of the Simplex Method
  5.7 NLP: One–Dimensional Minimization Methods
    5.7.1 Elimination Methods
  5.8 NLP: Unconstrained Optimization Methods
    5.8.1 Direct Search Methods
    5.8.2 Descent Methods
  5.9 NLP: Constrained Optimization Methods
    5.9.1 Characteristics of a Constrained Problem
    5.9.2 Direct Methods
    5.9.3 Indirect Methods
Bibliography

6 Design of Experiments
  6.1 General
    6.1.1 Basic Principles
    6.1.2 Guidelines for Designing Experiments
    6.1.3 Statistical Techniques in Designing Experiments
    6.1.4 Basic Concept of Probability
  6.2 Probabilistic Approach
    6.2.1 Basic Statistical Concepts
    6.2.2 Statistical Inference
    6.2.3 Probability Density Functions
  6.3 Inferences about Differences in Randomized Designs
    6.3.1 Hypothesis Testing
    6.3.2 Choice of Sample Size
    6.3.3 Confidence Intervals
  6.4 Experiments with a Single Factor
    6.4.1 Analysis of Variance
    6.4.2 Fixed Effects Model
    6.4.3 Model Adequacy Checking
    6.4.4 Random Effects Model
  6.5 Sampling Based Methods
  6.6 Response Surface Methodology
    6.6.1 Approximating Response Functions
    6.6.2 Phases of Response Surface Building
    6.6.3 Goals and Control of Quality
  6.7 Building Empirical Models
    6.7.1 Multiple Regression Analysis
    6.7.2 Multiple Linear Regression Models
    6.7.3 Parameters Estimation in Linear Regression Models
    6.7.4 Model Adequacy Checking
    6.7.5 Fitting a Second-Order Model
    6.7.6 Transformation of the Response Variable
Bibliography

7 Metamodelling Techniques
  7.1 Notion of Metamodel
    7.1.1 Nature of Metamodelling
    7.1.2 Metrics for Performance Measurements
  7.2 Metamodels in Engineering Design
    7.2.1 Modelling Design Uncertainties
    7.2.2 Importance of Accuracy Estimates
  7.3 Engineering Simulation Metamodels
  7.4 Space Filling Designs
    7.4.1 Response Surface Models
    7.4.2 Spline Metamodels
    7.4.3 Kernel Smoothing Metamodels
    7.4.4 Spatial Correlation Metamodels
    7.4.5 Frequency Domain Metamodels
    7.4.6 Artificial Neural Networks
    7.4.7 Neuro–Fuzzy Methods
    7.4.8 Kriging Method
    7.4.9 Radial Basis Functions
  7.5 Sequential Sampling Approaches
    7.5.1 Entropy Approach
    7.5.2 MSE Approach
    7.5.3 IMSE Approach
    7.5.4 Maximin Distance Approach
    7.5.5 Maximin Scaled Distance Approach
    7.5.6 Cross–Validation Approach
    7.5.7 Potential Usages of Sequential Approaches
  7.6 Metamodelling Process in Technical Systems Design
Bibliography

8 Fuzzy Sets and Fuzzy Logic
  8.1 Preview
    8.1.1 Types of Uncertainty
    8.1.2 Crisp Sets
    8.1.3 Fuzzy Control
  8.2 Basics of Fuzzy Logic
    8.2.1 Membership Function
    8.2.2 Formulations of Membership Functions
    8.2.3 Fuzzy Partitioning
    8.2.4 Properties of Fuzzy Sets
    8.2.5 Extension Principle
    8.2.6 Operations on Fuzzy Sets
    8.2.7 Elementhood and Subsethood
    8.2.8 Fuzzy Numbers
    8.2.9 Operations on Fuzzy Numbers
  8.3 Fuzzy SMART
    8.3.1 Screening Phase
    8.3.2 Categorization of a Range
    8.3.3 Assessing the Alternatives: Direct Rating
    8.3.4 Criterion Weights and Aggregation
    8.3.5 Sensitivity Analysis via Fuzzy SMART
  8.4 Additive and Multiplicative AHP
    8.4.1 Pairwise Comparisons
    8.4.2 Calculation of Impact Grades and Scores
    8.4.3 Criterion Weights and Aggregation
    8.4.4 Fuzzy Extension
    8.4.5 Original AHP
  8.5 ELECTRE Systems
    8.5.1 Discrimination Thresholds
  8.6 Fuzzy Multiobjective Optimization
    8.6.1 Ideal and Nadir Values
    8.6.2 Weighted Cebycev–Norm Distance Functions
    8.6.3 Weighted Degrees of Satisfaction
    8.6.4 Numerical Example
    8.6.5 Design of a Gearbox
Bibliography

9 Engineering Economics
  9.1 Engineering Economics and Ship Design
    9.1.1 Criteria for Optimizing Ship Design
    9.1.2 Operating Economics
  9.2 Time–Value of Money
  9.3 Cash Flows
    9.3.1 Cash Flow Profile
    9.3.2 Interest Relationships
  9.4 Financial Factors
    9.4.1 Taxes
    9.4.2 Leverage
    9.4.3 Practical Cash Flows
    9.4.4 Depreciation
    9.4.5 Inflation
    9.4.6 Escalation Rate
  9.5 Economic Criteria
    9.5.1 Set of Economic Criteria
    9.5.2 Definition of the Economic Criteria
    9.5.3 Choice of the Economic Criteria in the Marine Field
  9.6 Ship Costs
    9.6.1 Building Cost
    9.6.2 Operating Costs
    9.6.3 Other Decision Factors
  9.7 Ship Leg Economics
    9.7.1 Inventory Costs
    9.7.2 Economic Cost of Transport
    9.7.3 Effects on NPV
Bibliography


Foreword

Eliminate the specialist man

(Oscar Niemeyer)

The role of naval architects is both wide and focused. It obviously depends, to some extent, on the segment of the shipbuilding or shipping industry in which they choose to work. However, whatever it is, it is usually a leadership role. How well they are able to fulfill this leadership role is a reflection of the useful breadth, as well as the specialization, of the education and experience gained in the industry.

Naval architects are found in many positions in the marine industry. They can be found in the following categories: shipowners, design companies, shipbuilders, government (department of transportation, navy), research centers, classification societies, universities, independent research centers, and marine equipment manufacturers. Typical positions of naval architects are: shipowner's technical/design manager, design agent executive, shipyard executive, chief naval architect, project manager, technical project manager, technical manager, ship manager. It can be seen that the role of naval architects offers many interesting challenges and opportunities for a satisfying and rewarding career in the marine industry.

Naval architects need education and training in all the topics required in the design and construction of ships and other marine vehicles, to fully understand the consequences of ship design trade-off decisions. In addition, they must have a basic understanding of most of the engineering discipline topics as well as of engineering economics. The educational requirements for naval architects can be obtained by looking at the course curricula of the various universities that offer degrees in naval architecture. Although there are some differences, the traditional naval architecture topics include: theoretical naval architecture, hydrodynamics, marine structures, materials, mechanics, ship dynamics, ship design theory and practice, shipbuilding practice, planning and scheduling, engineering economics, statistics, probability and risk, product modelling practice, computer–based tools, marine environment, marine industry, ship acquisition, shipowner's requirements, regulatory and classification requirements, contracts and specifications, cost estimating, human factors, safety, composites, corrosion and preservation, marine engineering considerations.


The State-of-the-Art

Ship design was probably one of the first technological areas to benefit from a scientific approach as well as from the application of mathematics to modelling and problem solving. The history of the study of mechanics of materials, the theory of elasticity, hydrostatics and hydrodynamics, the study of engines, all form part of the rich heritage of humanity's quest for knowledge, which has been, and still is, applied to the design and manufacturing of ships.

In contrast with traditional ship design, which often assumes a closed system in isolation from external influences, this course aims to encourage students, as future naval architects, to identify and understand the impact their design decisions have, as a whole, on the technical and economic performance of their ships in the shipping market where they will operate.

Ship design is an activity in which a clear view of the functional requirements and an equally clear view of the constraints on feasible solutions are both essential. The primary requirements of payload, speed, and size of a ship are challenged by the problems induced by an often hostile operating environment and by the complex demands arising from the required economics of operation, safety, etc. These simultaneous requirements, frequently conflicting with each other to some degree, apply to the design of many ships and offshore vessels. Those not actively involved in design take all this for granted and expect increasingly sophisticated marine vehicles with longer life and greater reliability than their predecessors, at a reasonable cost. It would be difficult to achieve all this even in a vacuum, but designers have to operate in the real world of competition, which sets high standards and demands the shortest possible time from the conceptual idea to product development.

Modern strategies and tactics should aim not only to simplify the product and its production, but also to make all kinds of processes more effective and economical. The main objective, the optimization of the relationship between costs and benefits, should be reached through rational organization and the introduction of scientific knowledge into the design process, so increasing net efficiency.

Since the shipbuilding industry began exploiting modern computer-aided techniques (surface modelling, solid modelling, automatic mesh generation, direct interfaces between models and their analysis, numerically controlled production facilities), great benefits have accrued. If design practices are well founded and competently managed, better ships are delivered to the customers, more quickly than before. The synergy between new technology, advanced design tools and short lead–times is achievable provided a rational design strategy is exploited. It creates a reward which is considerably larger than the sum of the individual benefits of small changes in design practices or islands of automation in the design office.


The Way Ahead

There is much to be done: a deeper understanding of all aspects of the design process and how they interact is still needed. In many design departments the impressive potential of computer science, scientific knowledge and successful practices is still a long way from being applied as part of a comprehensive and rational design methodology. There are too many companies where drawing boards, computers, and mathematical models are still used in a time–consuming sequential mode. The requirements and wishes of the customer are frequently translated into specifications in departments separated from the design headquarters, and no central database is generally accessible. To improve this out–of–date way of designing the following are required: robust and user-friendly computer codes, proper understanding of all the processes involved in design, sound information management and retrieval, efficient linking of design procedures to a comprehensive database, and updated decision–making support systems to facilitate sound decision-making early in the design process.

Designing ships requires a broad spectrum of specializations and skills. Beyond naval architects, they include mechanical, electrical, nautical, aeronautical, and production engineers, computer-systems experts, mathematicians, physicists, and market analysts. The research topics extend from hydrodynamics to quality assurance, from efficient materials selection to optimization in the face of many conflicting requirements. The result has to be the long–term emergence of an overall theory of design. The more immediate outcomes will be a deeper understanding of the design process, specific decision–making procedures, faster and more reliable computer–based design tools, and improved interfaces with existing methods and techniques. As these are addressed, one can begin to perceive areas that are common to a variety of engineering applications, and thus the beginnings of a general approach to design.

The successful practice of shipbuilding is not only a design matter, but depends on the integration of marketing, design, procurement, production, and the business functions such as finance and human resources.

The Purpose of this Course

The overall objective of this course is to provide students with the opportunity to deepen their understanding of ship design by developing the concept and preliminary design of a modern vessel based on the paradigm of decision-based design. Students will be introduced to techniques for integrating science–based knowledge in structuring a rational design process. They will be provided with the tools needed to incorporate simultaneously specific technical issues and engineering economics into ship design. Their mathematical and statistical knowledge from traditional engineering curricula will be supplemented with links to the advanced techniques needed to develop technical and economic modelling of ship properties. By the end of the course, students should have internalized the meaning of the multicriterial approach from different perspectives, having explored how these various perspectives impact their own selection of the 'best possible design' as a system and/or the optimal subsystem. The course also enables students to identify, formulate, and negotiate robust solutions, and to gain a deeper understanding of issues typically associated with decision-based engineering. By developing and exercising students in structuring and solving problems, the course empowers future designers to become agents of innovation.

The tutor will use cognitive mapping to keep the course individually responsive to each student. The learning essay is an instrument to help develop and assess critical thinking skills. The course introduces students to strategy and tactics for developing solutions through multicriterial decision–making techniques. Because the ability to manage imprecision and uncertainty is critical in the design process, robustness techniques are provided so that design is always considered coupled with risk.

The course initially provides fundamentals of design theory and practice that are further explored with respect to ship design in the remainder of the course. In order to prepare students for the design project, the second part of the course focuses on concept ship design in the context of multicriterial decision-making. Decision–based design is the paradigm of naval architecture design used throughout the course, and is rooted in the belief that the principal role of a naval architect, when designing a ship or an offshore vessel, is to make decisions. The main emphasis is devoted to a structured, multiattribute approach to making robust decisions from the early design phases. This approach examines potential solutions in conceptual form in order to select a subset of feasible alternatives for development into non-dominated solutions, which are most likely to succeed. The selection of the 'best possible' design ties the issues of fundamentals in ship design, engineering economics, and decision-making explored in the first part of the course to specific design problems by incorporating shipowner requirements as decision criteria.

Having been introduced to the set of multiattribute decision-making (MADM) techniques, the students will submit a revised project outline after reflecting on their initially proposed outline and receiving feedback from the tutor.

Last but not least, the course integrates the tutor's experience in decision-based and robust design, industry–academic interfaces, professionalism, and alternative paradigms in ship design learning.

Finally, the purpose of this tutorial is threefold:

• to provide naval architecture students with the opportunity to gain a basic understanding of the fundamentals of rational ship design;

• to assist members of design teams in making better design decisions by providing basic knowledge in one relatively easily accessible source;

• to serve as a handbook when they enter the marine industry.


Theory of Rational Design

and

Multicriterial Decision Making


Chapter 1

Design Science

The subject of this chapter is design science, which is often deemed a new branch of science aiming to make the design process rational. Design has been formally acknowledged as a separate activity since the beginning of the industrial revolution, at first in production and manufacturing. Design is not a body of knowledge; rather, it is a highly manipulative activity in which the design team has to balance many factors that influence the achievement of the 'best possible' outcome.

The term 'design science' is to be understood as a systemic activity, which should contain and integrate the existing body of knowledge about and for design, i.e. the scientifically and logically related knowledge about engineering design. Design science must therefore explain the causal connections and laws related to the design of technical systems. The knowledge system must be fixed in the forms of its terminology, taxonomy, relationships (including inputs, throughputs and outputs), theories and hypotheses, so that it can serve as a basis for consciously guided design activity.

Both the terms design and designing can be used, even though the comprehensive term designing is thought to designate the entirety of all design activities. The advantage of the word designing lies in its general meaning; it is widely used worldwide, as it is understood in the defined context in Germanic, Romance and Slavic languages.

The purpose of design science is to develop coherent and comprehensive knowledge about engineering design. Engineering is a much misused word. It can be used to describe a profession, the process of developing a design into working instructions, and a type of manufacturing. One of the earliest definitions of engineering is that 'engineering is the art of directing the great sources of power in nature for the use and convenience of man'. Another definition, offered by Erichsen (1989), is that 'designers create and engineers analyze', so that design is seen as a part of engineering, where some engineers design and some analyze the design of others; in other terms, engineering develops and documents the design to enable its manufacture.


1.1 Contents of Design Science

The object of design science is the technical product to be designed and the design process itself. The reasons why design science is so necessary today can be found especially in the new market situation created by globalization:

• the need for it increases constantly with time, not only with respect to quality and quantity, but also because of demands for shortened times from ordering to delivery of technical products;

• the pressure of competition continues to grow dramatically in connection with political developments (e.g., reduction of trade barriers, internationalization);

• laws, regulations, standards, and codes with respect to quality assurance, safety, and environment are being reinforced and enhanced continuously.

However, shipbuilding is not a true free market, since many countries support their shipbuilding industry through direct and indirect (sometimes hidden) subsidies. At the building level, ship technology is easily transferable, and it does not require a high level of education or skills to achieve high productivity, provided the country's labor costs and/or exchange rate are low compared with other shipbuilding countries. What is fundamental and decisive is the quality of design and of its products.

Nowadays technical products must be able to perform the desired task with a demanded group of properties, denoted by the term affordability: performance, operability, life cycle, safety, reliability, maintainability, and so on. The time scale of planning, manufacturing, and delivery must be suitable. At the same time cash flow and financing must be considered. All these requirements force industrial companies, and especially the design departments, to utilize new methods and procedures in order to remain competitive.

The main question of rational design is to find which organization, skills and knowledge base are needed and adequate so that the product and/or process is suitable for the intended purpose. To this end, design science should explain the complete structure of the design process and related theories. Its contents should comprehend the engineering disciplines, making explicit the connections between the individual knowledge elements. Besides design science, terms may be found such as 'design philosophy', 'design theory', 'scientific design', 'design instructions' or 'design studies'. These names are not suitable to indicate the total knowledge of design, as they were chosen to designate only a part of the design knowledge.

Design science needs not only systematic descriptions (declarative knowledge, descriptive statements belonging to theory); it also needs methodology, instructions for the practical activity (procedural or prescriptive knowledge), and/or (deterministic and stochastic) algorithms and techniques.

Historically the way towards design science did not lead straight to a unified whole, but first to different separate activities. The state–of–the–art in these activities has turned out very differently, but in their final effect they have all aimed at improving engineering practice. Developments have always taken place simultaneously in a dual direction: from practice towards abstraction, and from theory towards application, without bringing these directions to a useful convergence.


Nevertheless, no comprehensive design science has emerged, even though three classes of activities may be envisaged by which improvement in design science can be reached:

• practice: rationalization in engineering practice;

• science: answering scientific questions via research projects;

• education: improvement of design teaching in the universities.

The term design science will be accepted with difficulty in too many companies where, at best, the design department is responsible only for the last phases of the design process, that is, the functional and detail phases. Many authors have tried to define the terms design, engineering design, and designing; among others:

• engineering design is the process of applying various techniques and scientific principles for the purpose of defining a device, a process, or a system in sufficient detail to permit its physical realization (Taylor and Acosta, 1959);

• engineering design is a purposeful activity directed toward the goal of fulfilling human needs, particularly those which can be met by the technology factors of our culture, and toward decision making in the face of uncertainty, with high penalty for error (Asimov, 1962);

• designing means to find a technically perfect, economically favorable and aesthetically satisfactory solution for a given task (Kesselring, 1964);

• design is a goal-directed problem-solving activity (Archer, 1964);

• design is a creative activity, bringing into being something new and useful that has not existed previously (Reswick, 1965);

• design is the imaginative jump from present facts to future possibilities (Page, 1966);

• design is the activity involved with actually constructing the system; i.e., given a specification of the system, it is mapped into its physical realization; the design task, however, extends throughout a system life cycle, from the initial commitment to build a new system to its final full scale production (Katz, 1985);

• design is the creation of a synthesized solution in the form of products that satisfy perceived needs through mapping between the functional requirements in the functional domain and the design parameters of the physical domain, through proper selection of the design parameters that satisfy the functional requirements (Suh, 1989);

• designing is the chain of events that begins with the sponsor's wish and moves through the actions of designers, manufacturers, distributors and consumers to the ultimate effects of a newly designed thing upon the world (Jones, 1992).

The nature of design is reflected in many other statements, which, at best, capture only part of the truth. Design is still considered a predominantly creative activity, founded on knowledge and experience, devoted to determining the functional and structural construction, and to creating documents and drawings that are ready for manufacturing. Therefore, to save design time and effort, it is mandatory to look for innovative approaches to design, eliminating iterative methodologies wherever possible.


One should recognize, as a premise, that the task of design science itself cannot be limited only to the delivery of technical-scientific tools (statistical databases, numerical codes, etc.) for improving the most important technical properties. Design science should also offer complete knowledge about the design process, i.e. information about models, procedures and methods with which all properties of the technical system can be established and realized in optimal quality, and also how they can be implemented to make the design process economical and quick.

The following partial goals of design science can be derived from the above:

• optimizing the technical properties of the industrial product;

• reducing the design lead–time and the costs of the design process by controlling the responsibilities and downstream commitments;

• increasing the attractiveness of design work for talented engineers, also by decreasing the proportion of routine work;

• decreasing the risks for the company and the design team;

• shortening the complete education (maturity time to competence) of designers;

• creating the knowledge base for computer application in design by collecting and storing all available information in relational databases;

• making the design process scientifically transparent.

Engineering designing is executed in two well characterized, but not strictly separated, phases of laying out and detailing. Despite the reference to those two phases, the largest advantages from design science are foreseen in the most important design stage, that is, in concept design, where innovations might and should be generated. It is during this design phase that the most important innovative, creative and holistic thoughts can bring the greatest benefits for the product to be designed.

It must also be clear that many psychological barriers are present, which work against the acceptance of rational design methods in engineering practice. Modification of the knowledge and provision of instructions and education is only one of the necessary steps towards acceptance.

1.2 Design and Technical Knowledge

Since designing can also be considered as a process of transformation of information, a huge amount of information in the form of technical knowledge is needed. This information consists of:

• basic knowledge, to a different extent for the various design phases, taken from the individual engineering sciences such as hydrodynamics, strength, materials, manufacturing technology, as well as mathematics, physics, and so forth;

• knowledge of mathematical models and modelling techniques;

• knowledge of the product with regard to functions, operation mode, maintenance, etc., and related guideline (heuristic) values and constraints;

• knowledge of typical components, semi-finished parts, and purchased subsystems from manufacturers and suppliers;

• knowledge of organization management and engineering economics;

• knowledge of the working tools which can support the design tasks, including libraries, handbooks, computer packages;

• knowledge of standards, regulations, rules, and patents.

There is no single source from which to obtain all this information in the current literature. The most inclusive answer available in any engineering handbook generally presents only a fraction of the required knowledge. Even recommended handbooks such as those published in the last decade by Watson (1998) and Schneekluth & Bertram (2000) in the ship design area practically omit discussion of the design process.

Technical knowledge in its highest form, that is, in the theory of design, is process knowledge. A survey of the literature shows how little of this knowledge is contained even in specialized publications about designing. Even successful designers are not conscious enough of why they develop designs the way they do. On the other hand, academics who teach ship design generally do not document the different approaches and do not even try to provide their students with the capability to understand which strategy is better at different design stages. In general, the experience internalized by individual designers is not available to the scientific community. In teaching and subsequent practice, only some of the enumerated elements are transmitted to future designers, without organizing them into a knowledge system. Only in recent years has the construction of such a knowledge system started.

Fortunately, there has been considerable research into design in all disciplines over the past few decades. Outcomes have been translated into computer codes and procedures, and validated in connection with design knowledge. For instance, hydrodynamic and structural numerical codes have been developed into reliable tools as part of the basic/preliminary design process. Graphical representations within the framework of technical drawings have also reached a high technical level, especially as wire-frame, surface and solid-based modelling in computer packages.

It is perhaps astonishing to see how much the historical evolution of designing has crystallized into a situation which still influences the common idea of designing. Some technical experts and scientists, even in the universities, have considered design simply a more or less sophisticated computation activity: this can include complex operations such as computational fluid dynamics (CFD), finite element methods (FEM), structural optimization and reliability, statistical processing and individual optimization applications. For others, the task of technical drawing characterizes designing.

The central core of design work, that is, the prior thinking out of the technical product and process, has been reduced in consideration to merely 'getting an idea' or an inspiration. In this respect, most engineers believe that these ideas, as well as other decisions, could be obtained just with a portion of 'intuition' and 'creativity', without fully understanding their implications. More often, design is still considered an 'art form' rather than a scientific activity. Talent certainly influences successful design in several aspects, especially with reference to efficiency and effectiveness, but a more important factor is the 'design approach'.


Only with the movement towards rationalizing design in the 1950s and 1960s was the activity of designers explained as cognitive and technical–scientific. This was rediscovered in the late 1980s in the USA, with practically no reference to previous European and Soviet research work. Design needed technical knowledge, and a new task was to find, explore and process this knowledge into a theory. The discipline known under the term 'methodical or systematic designing' has provided considerable preparatory work during the last forty years.

Nevertheless, practising engineers do not yet consider design a science. The impression is that engineers and workers can eliminate or compensate for all errors during the subsequent development and production of the product. In practice, designers do not find the time and motivation to locate research about design within their activity, to understand it and introduce it into their own work. It is, therefore, useful to discuss some classical concepts commonly related to the definition of design, such as the ideas of creativity, intuition, innovation, etc.

Design and Intuition

For many designers the term intuition is still a keyword. An appraisal of the results of intuitive thinking shows that one can often achieve brilliant ideas quickly. But, on the other hand, intuition shows itself to be extremely unreliable, both in timing and in the practical applicability of the idea. In any case, an intuitive idea must be examined (logically, systematically, analytically) for the possibility of its realization. Frequently new ideas must be rejected, since analyses of the definitive results of solutions which have emerged from intuition show that almost nothing has remained of the original idea after the many necessary improvements and corrections.

Nevertheless, engineers think intuitively and are generally inclined to behave so (Hubka & Eder, 1992). Design team leaders use intuition mainly in situations where the problem must be treated quickly, or the design has to progress on the basis of partial information, and when new solutions for organizational concepts, for marketing strategies or for technical innovations are to be found. Rich experience is a prerequisite for any intuition. Lack of experience, and thus a small basis for comparison, can only lead to a risky guessing game.

Design and Feeling

If one observes experienced designers at work, it is impressive to see how often they make appropriate quantitative decisions initially without any calculation. The subsequent computation, in orders of magnitude and with refined processes, often confirms the correctness of the estimate. This ability is called feeling for design, and it is often regarded as a talent. It is obvious that this feeling is based on experience and knowledge, rather than on something innate that the designer was born with.

The products of feeling for design are quantified statements, usually even depending on several quantities. The dependencies are not always quantitatively available, be it in the form of an exact or approximate derived formula, or of a quantified experience value (a heuristic 'rule of thumb').


Often only qualitative statements are possible: larger or smaller, sharper or blunter. However, designers can exploit only that knowledge which they have stored and internalized.

Rules, whether presented as formulae or guidelines, or gained intuitively through experience, begin subconsciously to influence the thinking processes after frequent, repeated use. In this way, the developed 'feeling for design' delivers quick decisions. These decisions are of limited validity, as the experience must remain within a conventional range and can break down if this range is exceeded. The danger exists that designers are not aware of exceeding the conventional range limits, and underestimate the need to check their 'design feeling'.

Design and Creativity

The combination of the two qualities of intuition and feeling can lead to the idea that designers should be creative. Creativity does not 'just occur'. Recent research in psychology has shown that creativity, generating novel ideas, occurs as a result of a natural tension between scientific and intuitive mental modes. The scientific mode (systematic, analytical) can recognize that a problem exists and can analyze its nature. The intuitive mode (erratic, non-calculable) can yield a sense of dissatisfaction, which motivates the interaction with the scientific mode to attempt to solve a problem. The oscillatory interplay between intuitive and systematic thinking modes can result in creative proposals for original solutions. Creativity occurs as a direct result of using a systematic approach to searching for solutions. Of course, science–based design methodology may offer the appropriate methods to support creativity.

Neither a rigid systematic approach to designing nor a fully intuitive mode of working is adequate by itself to make it likely that good products will result. A flexible combination of systematic work and freedom of action, adapted to the specific design problem, and performed by well–educated designers using efficient tools in a suitable technical–scientific environment, is likely to increase the chances of getting successful designs.

A design team that is effective and creative in problem solving must simultaneously show:

• knowledge of products and physical principles, including adequate knowledge derived from experience, heuristics and feelings;

• knowledge of processes, especially knowledge about design and problem–solving processes;

• an open-minded attitude, e.g. willingness to accept ideas and suggestions;

• the ability to communicate the generated proposals in useful and attractive forms.

Design, Innovation and Invention

Today the word innovation is a much abused term, used as a slogan by politicians and managers. It not only carries the implied positive objective (analogous to creativity) but also the danger of elevating the search for the new and original to the supreme goal of the design activity. The goal of design must always and only be to achieve the 'best possible solution' under the given conditions.

If a new idea, not deducible from the current state-of-the-art, is devised for improving an industrial product, this is called an invention. Inventions are normally patentable, i.e. they can be protected against copying by granting a patent to the inventors.

The combination of inventing with designing has a varied character. Designers should examine solutions with regard to the possibility of patent applications as inventions. It has to be emphasized, however, that invention should not be a primary target for designers. Higher on the scale of striving for competitiveness should be an optimal solution that contains an invention.

Design and Heuristics

As an adjective, the word heuristic refers to an instruction or guideline which is not necessarily based on science. In this sense, a heuristic is simply a 'rule of thumb', derived from experience, which can lead to an acceptable result with good probability.
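A concrete illustration, not taken from the text, is the kind of one-line estimate a designer might carry in his/her head; the deadweight coefficient used below is a hypothetical placeholder, not a validated figure.

    # Illustrative 'rule of thumb' (hypothetical coefficient): estimate full-load
    # displacement from the required deadweight via an assumed deadweight
    # coefficient C_DWT = deadweight / displacement, typical of a ship class.

    def estimate_displacement(deadweight_t: float, c_dwt: float = 0.70) -> float:
        """Heuristic estimate of displacement [t]; c_dwt is experience-based."""
        return deadweight_t / c_dwt

    print(f"Estimated displacement: {estimate_displacement(50_000):,.0f} t")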

Using this interpretation, Koen (1985) has proposed the hypothesis that in engineering all theories and instructions are to be regarded as heuristics, for two main reasons: (i) science serves in engineering only to formulate heuristics for the realization of technical systems; and (ii) the form in which the results of science are usually expressed in engineering is not directly suitable for the explorative ways of design. Koen has even stated that the application of these heuristics is the only approach useful to engineers. Koen's interpretation of heuristics is useful for enhancing the humility of engineers, but is useless as a prescription for how one can accomplish design more effectively and efficiently.

More strictly, heuristics was defined by Klaus (1965) as the 'science of the methods and rules of discovery and invention'. The heuristic method is a particular form of the trial-and-error method. It differs from the deductive method in that it works with conjectures, working hypotheses, etc. Heuristic methods can be simulated on computers by prediction models. The best-known proponent of modern heuristics is Polya (1980), who is regarded as the re-discoverer of heuristics.

Interaction between heuristics and design methodology provides systematic heuristics. Muller (1980) has discussed the following assumptions as a result of his observations on the design methods of experienced designers, systems analysis and heuristic methods:

• design is a problem–solving activity;

• each design activity can be divided into a finite number of subtasks, which can be solved simultaneously or sequentially;

• each subtask needs a different method, which must be tailored correspondingly;

• each design team has its own peculiar heuristic method.

Design and Education

If one wants to rationalize the engineering design process through an innovative approach, two axioms must be established:

• design is a rational activity, which can be decomposed into design phases;

• the design process can be conceived and structured in a very general form, even though it depends on the product to be designed.

It is, therefore, mandatory to reject completely some widespread and wrong opinions, namely:

• design is an art and only especially talented persons can execute it;

• design is not a generalizable activity, but is always bound to the particular product to be designed.

Design education is closely related to the first axiom: design is teachable, provided a theory (design science) and the right educational methods are made available. The engineering design community is still discussing whether design should be taught primarily by establishing a foundation of theory or by engaging students in loosely supervised practice. For the broader activity of product design and development, both approaches must be rejected when taken to their extremes. Theory without practice is ineffective because there are many nuances, exceptions and subtleties to be learned in practical settings, and because some necessary tasks simply lack any theoretical basis. Practice without theory fails to exploit the knowledge that successful product development professionals and researchers have accumulated over time. Although there are still strong defenders of both extremes, it is likely that over time the theoretical approach will prevail.

One reason that the theory of design has developed so slowly is that most engineers did not, and still do not, receive formal education in design. This is confirmed by the fact that design theory and practice are insufficient, or even absent, in current curricula. Therefore, it must be one of the goals of design science to propose suitable models and didactic instruments which consider all elements (teaching, theory, practice, etc.) and integrate them into a total system.

The main task remains, however, to elaborate 'design science' in general and in particular for students, and to develop the necessary didactic principles, tools, procedures and teaching materials. The task is so special that it can be solved only by cooperation between theorists and design specialists. This cooperative work requires specialized organizations combining engineering and educational elements. Regrettably, there is nothing similar in Italy to, for instance, the American Society for Engineering Education (ASEE) in the USA, the Internationale Gesellschaft fur Ingenieurpedagogik (IGIP) in Germany, and the Societe Europeenne pour la Formation des Ingenieurs (SEFI) in France.

Designing is also a skill, i.e. it needs both knowledge and abilities, attainable through work, exercises and training. If little or no relevant and organized knowledge is presented, the learning time will be long: hence, education and training are inseparable. Not every industrial company puts good instructors in charge of a new generation of designers, and the best designers are not always good instructors. The time to maturity and capability of design engineers is the problematic parameter. It is generally estimated at about ten post-graduate years; such a long time span is still needed, on average, to gain and internalize the knowledge lacking from one's own experience and study.

In any case, it is necessary to educate students in the application of commercial packages. In ship design, computer-based packages are available for design synthesis (ASSET, SDS), analysis (FEM, CFD), surface modelling (Rhinoceros), CAD (AutoCad, Fastship, MacSurf, AVEVA-TRIBON, NAPA, GHS, FORAN) and CAE (AVEVA-TRIBON, CALMA). However, although some people claim that computers eliminate the need for a design process, an efficient and rational design process remains mandatory.

In short, the education and learning goals of design science should be the acquisition of:

• knowledge: technical system theory, knowledge about specialized design activities, theory of design processes, theory of decision making;

• ability: methodical procedures, mathematical modelling, transfer of know-how into databases, application of statistical and numerical codes;

• skill: sketching, drawing by computer (CAGD), computing via CAD–CAM tools.

1.3 Design and Science

The introduction of design science into the design process means neither immediate successes nor double-digit percentage savings. It is indisputable that design science will bring a long-term improvement both in the procedures of the design process and in the quality of industrial products. But, as for any science or theory, one cannot expect that it can be immediately applied to the real problems of design engineers. On the one hand, improvements must be derived from science and adapted to practice; on the other hand, where improvements originate from practice, they must be inserted and absorbed into the science. Only afterwards may technical knowledge, presented in a new order and form, become a productive tool for designers.

Design science aims to deliver a framework discipline which allows inclusion, transfer or reference to relevant traditional design knowledge, selected and revised from the existing global knowledge and combined with the necessary 'completions'. The new integrated knowledge system must form a logical and organized entirety, in which the individual elements mutually fertilize each other. It will fulfill the objective of delivering relevant information for design in a suitable form.

1.3.1 Knowledge and Science

As in all areas of human activity, in science and technology too, knowledge is power. Therefore relevant knowledge and experience from the individual areas have to be collected and organized, and thereby abstracted into general rules, models and laws. That also happened when the misleading title 'ship theory', as the science dealing with the phenomena and forms of floating objects, appeared as a collective term around the year 1800 with Russell and other scientists. The overall structure of knowledge was then simplified and made clearer by collecting the existing areas of statics and structures.

In order to arrive at a rational and more effective design process, the scientific community needs to develop design research, that is, suitably structured knowledge obtained by gathering and interpreting existing data, developing decision-making support systems, etc. Three forms of design research can be envisaged (Eder, 1994):

• research into design, e.g. various kinds of protocols, decision-making techniques, etc.;
• research for design, e.g. computer-aided tools, databases, modelling techniques, design of experiments;
• research through design, e.g. abstraction from observations during design, synthesis from intermediate and final solutions.

Design science is not autonomous in its development, and has to adapt and include knowledge from other disciplines and branches of science. Important knowledge areas are technology, mathematics, computer science, and management. The search for general laws in the engineering sciences, but also in the engineering design field, has always considered mathematics as its most important tool. The increasing demands made on decision-making and modelling have led to ever more intensive exploitation of the classical as well as new methods of mathematics, such as statistics, probability theory, fuzzy sets, cluster analysis, response surface methods, decision-making theory, optimization, artificial intelligence, and others.

The design process must be organized and directed in its progress. Management methods are often discussed in relation to engineering design management and leadership. The methods of systems engineering (SE), concurrent engineering (CE), quality function deployment (QFD), total quality management (TQM), product development strategies (PDS), etc., are also counted among the boundary region between management and design. Team work and team building (Shuster, 1990) are important areas for current management techniques, including control of conflict, mutual support, interchanging and stimulating ideas, and so on.

1.3.2 Engineering and Science

The engineering sciences are committed to organizing technical knowledge as expediently as possible, and strive for a complete and suitable form. A simplistic and misleading statement that 'engineering is applied science' is frequently heard in this context. Although most engineers quote technology as the object of the engineering sciences, the latter comprehend three basic elements, i.e. products, processes, and materials, which can be explored independently of each other but form an inseparable unit.

The class of engineering sciences perhaps goes beyond the average understanding of this term, which normally limits itself to applied mathematics, mechanics, fluid dynamics, electronics and similar areas. It concerns knowledge that makes possible the treatment of technical products and processes in all of their life phases, so that design work is facilitated. The importance of the engineering sciences and the proportion of technical disciplines increase continuously with the rising number of scientific specialties. In the course of technical work, phenomena and properties of the technical product must be evaluated, modelled, simulated, investigated experimentally, etc.

A list of knowledge issues for research and development in engineering design consists of:

• case-based reasoning: adapting the lessons learned from a previous design problem into a computerized knowledge-base for establishing new guidelines for future projects;

• collaborative design: allowing several designers to interact with a design database to modify design issues that are their responsibility;

• decision-making support systems: mainly multicriterial decision-making systems to help in the design evaluation and selection process;

• information delivery systems for design: not just information storage and retrieval, but also interpreting and presenting it in design-suitable forms;

• knowledge-based design tools: capturing knowledge, especially unstructured experiences and information, for databases, artificial intelligence, etc.;

• modelling: response surface methodology, neural networks, kriging, etc.;

• optimization techniques;

• simulation: computational investigations of the behavior of technical systems, e.g. computational fluid dynamics, finite element methods, etc.;

• virtual reality: visually walking through an environment that has only been built as a computer model.

The general theory of technical systems (Hubka, 1984) should act as the fundamental discipline that determines the problems and their interrelated structures.

1.3.3 Development of Design Knowledge

The need for improvement efforts concerning the design process has arisen for several reasons: (i) growing pressure towards better performance and very demanding targets in high-tech industries; (ii) availability of fast and powerful computers together with the development of numerical codes; (iii) shortage of skilled people in some areas; and (iv) time pressure.

Up to the mid-sixties there were only isolated groups or individual experts who proposed certain solutions for the improvement of design work. The next period, especially from the seventies until today, can be labelled as the prime time for the development of design science. Increasing numbers of research contracts, the exchange of opinions through international conferences and the founding of 'Institutes for Design Technology' enlarged the capacity to attack design problems scientifically.

Factors mainly affecting the development of design knowledge are the degree of industrial development, the levels of education, the extent of research and the size of the individual countries. The corresponding picture is quite complicated because of many factors, namely:

• The level of industrialization plays a large role when a need for rationalizing the design process is recognized. It is surprising how slowly research projects on design knowledge got under way and how weak the interest in the existing knowledge was in many countries (especially in the USA). A possible explanation could be that where the economic power is strong, a persistent inertia exists which decreases research efforts towards design improvement.

• University graduate engineers are more open to design knowledge than graduates of the lower engineering schools or trained designers. It may be that the former are less represented in the embodiment design process, and that they mainly handle the more abstract analytical and numerical tasks. In comparison, trained designers are mainly devoted to more concrete tasks, where results from older expertise appear more important.

• The cultural traditions also play a decisive role: understanding about goals and means for design knowledge in the countries of the European continent is different from that in the UK or the USA.

• The size and financial power of a country appears not to play a relevant role. The results achieved towards design science in Germany or in the Scandinavian countries are incomparable in this respect with those in the USA, Russia and even Italy. In comparison, the output of results from USA computer science as a whole far outweighs any others.

To obtain a more precise picture of the situation, one would have to observe the state-of-the-art and/or the development in research, in engineering design practice, and in design education. In rough terms the present state of design knowledge can be described as follows:

• much knowledge has been accumulated, mainly as 'islands of knowledge'; relationships between single elements of design knowledge have been inadequately explored;

• some individual areas have not been and still are not explored sufficiently, so that the mutual relationships do not emerge in the knowledge system;

• more attention has recently been paid to design methodology.

As a result, some goals crystallized either into partial tasks or even into principles for design methodology; for example:

• task to obtain a rational formulation of design strategy and design procedures;
• task to obtain clear separation of individual activities, especially the ones requiring a high level of specialization (CFD, FEM, DoE, etc.);
• scope of generating as many feasible designs as possible and their best selection in the concept design phase.

1.4 Areas of Design Science

To establish the technical knowledge, the structure of 'design science' is organized into partial and specialized areas, namely the theory of technical systems and the theory of design processes. To enhance understanding of the technical system, it is advisable to explore the relationships among these areas and with regard to exogenous factors.

1.4.1 Theory of Technical Systems

A partial area of 'design science' is the theory of technical systems (TTS), which describes and explains the technical system to be designed from all viewpoints important for designing. The corresponding descriptive statements primarily affect the transformation process and the effects of technical product systems on the performance of an industrial product, on the different ways of modelling its characteristics, and on the structure of the building components. Table 1.1 summarizes the partial areas of the theory of technical systems.

Areas of TTS        Transformation             Technical Product        Properties of
                    Process Systems            Systems                  Technical Systems

Goals:              Task:                      - operation mode         Classes:
reply to specific   - modelling                - prototyping            - relationships among
question            - technology               - parts                    properties
                    - environment              - design                 - design characteristics
                    - organization                                      - evaluation of properties

Items of            - Physics                  - Cybernetics            - Mechanics
analysis            - Electrical Engineering   - System Technology      - Strength (of materials)
                    - Manufacturing            - Building Technology    - Industrial Design
                      Technology               - Instrumentation        - Ergonomics
                    - Process Knowledge          Engineering            - Aesthetics
                                               - Branch Knowledge       - Hydromechanics
                                                 about TS families      - Vibrations
                                                                        - ...

Table 1.1. Areas of the theory of technical systems

The theory of technical systems essentially considers the individual families of technical systems. The existing knowledge systems about products of design only deal with some specialties, and in any case they cover only part of the information needed by designers.

Blanchard and Fabrycky (1978) identify three basic elements in a technical system:

• components, which are the operating parts of a technical system; each system component may assume a variety of values as set by some evaluations subject to constraints;

• attributes, which are the properties characterizing the components of a technical system;
• relationships, which are the links between components and attributes.

Theory of Properties

One of the most important parts of the theory of technical systems is the theory of properties, i.e. of all those features which allow substantial evaluation of the technical and economic characteristics from the concept design phase onwards. The actual properties of a feasible design are measurable quantities (attributes). Designers must establish and evaluate these properties, subject to requirements and constraints, accurately and reliably from the very early design stages.

The total value of a design is composed of the values of multiple attributes, so as to enable an overall judgment such as utility value or required freight rate. The value scale forms a sequence of continuous or discrete total values. Discrete scales can be selected values from a continuous scale (ranking), or can merely express membership of a set (Pareto set).
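As an illustration only (the candidate designs and attribute names below are hypothetical), a Pareto set can be extracted from a list of candidate designs by discarding every design that is dominated by another one on all attributes; a minimal Python sketch, under the assumption that every attribute is to be maximized:

    from typing import Dict, List

    Design = Dict[str, float]

    def dominates(a: Design, b: Design, attrs: List[str]) -> bool:
        # a dominates b if it is at least as good on every attribute
        # and strictly better on at least one (all attributes maximized)
        return all(a[k] >= b[k] for k in attrs) and any(a[k] > b[k] for k in attrs)

    def pareto_set(designs: List[Design], attrs: List[str]) -> List[Design]:
        # keep only the non-dominated candidates
        return [d for d in designs
                if not any(dominates(o, d, attrs) for o in designs if o is not d)]

    candidates = [
        {"speed": 21.0, "payload": 12000, "economy": 0.62},
        {"speed": 20.0, "payload": 13500, "economy": 0.58},
        {"speed": 19.5, "payload": 11000, "economy": 0.55},   # dominated by the first
    ]
    print(pareto_set(candidates, ["speed", "payload", "economy"]))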

A complete and general list of all properties necessary to describe the requirements for a complex technical system might require some hundreds of attributes. The way out lies in defining a general set of classes of properties which can be concretized in individual domains and subclasses, down to individual properties defined for a particular subsystem. Four collective classes of properties can be distinguished to provide a complete coverage of the requirements:

1. design-related classes, the so-called internal properties;
2. classes that refer to economic properties, safety and ergonomic properties, aesthetic properties, law and rules compliance properties;
3. classes that directly cover the manufacturing properties;
4. classes related to the purpose of a technical system during its life-cycle time.

The boundaries between these classes are not well defined; each property can affect one or several classes. Typical relationships between one property and other properties represent the sum of knowledge needed to be able to design for that property.

Quality of Technical Systems

Every technical system has to be evaluated from the viewpoint of its quality, which can be heavily influenced from the early design phases onwards (quality assurance).

The German standard DIN 55350 understands quality as the entirety of a technical product's properties which refers to its ability to fulfill given requirements. This definition implies a total judgment (total value, value vector) about a set of attributes that have to describe how suitable the technical system is to fulfill all partial requirements. This total value is composed of the individual values of all relevant properties through a weighted combination of them.
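A minimal sketch of such a weighted combination (the property names, normalized scores and weights below are purely illustrative and not prescribed by the standard):

    def total_value(values: dict, weights: dict) -> float:
        # aggregate normalized property values (0..1) into a single total value;
        # the weights express the relative importance assigned by the evaluator
        assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
        return sum(weights[k] * values[k] for k in weights)

    scores     = {"structural": 0.80, "manufacturing": 0.65, "reliability": 0.90}
    importance = {"structural": 0.50, "manufacturing": 0.20, "reliability": 0.30}
    print(f"Total quality value: {total_value(scores, importance):.3f}")   # 0.800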

According to the groups of properties used to determine quality, different kinds of quality can be categorized, such as structural quality, manufacturing quality, and reliability quality. There are three partial areas within the development of an industrial product which govern its quality:

• Quality of design, which mostly affects the quality of technical systems. It can be assured through a rational, scientific and systematic design process, which uses theoretical evaluation, experimental tests and validation at suitable milestones during the design work and after its conclusion, based on the compiled documents ('design audit'): this quality absolutely demands an objective set of technical knowledge ('design science').

• Quality of manufacturing, which is measured on the produced components of the technical system. It is known as quality of conformity to the manufacturing specifications, i.e. to the detail drawings. After production of the components, measurement of this quality aims at rejecting the unfit ones with the help of statistical processes ('quality control').

• Quality of application, which can be evaluated only when the technical system is employed. This also includes secondary processes such as maintenance, repair, refitting, upgrading, etc.

Knowledge systems have been developed for each of these three partial areas. The relevant standards for the recognition and verification of quality assurance schemes are ISO 9000, ISO 9001, and ISO 9002.

Models of Technical Systems

In engineering practice, at the preliminary and functional design phases a set of drawings of the future technical system is executed, which must contain all the information necessary for its manufacturing. Design practice uses many kinds of representations (isometric or perspective projection, graphic representation) and models (communication models for data transmission, experimental models, mathematical models, etc.). Practitioners are often not conscious that they are really dealing with a model of the technical system.

For design, models serve very different purposes such as:

• prediction of the properties of a technical system;

• optimization of a subsystem;

• management of the manufacturing planning.

Cybernetics tries to accurately specify the term 'model' and distinguishes two fundamental classes of models, both needed in design:

• Behavior models, which model the properties of a technical system via numerical computation of dynamic responses, analysis of experiments, etc. Meaningful and simple models (metamodels) of technical systems should guarantee unambiguous interpretation. These models represent either a single attribute (e.g. lightship weight) or all the single properties belonging to a class (e.g. seakeeping behavior).

• Structure models, which model the structure of a technical system in detail and assembly drawings; they are mostly used in embodiment design.

An important set of models is the class of ideal models, which aims to create a meaningful original which can be seen as a utopia. The idealization results in an ideal design (zenith), in which all essential functional properties of the technical system are included at the highest value. It may serve as a reference scale for evaluating, ranking and selecting competitive alternatives.
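One possible way to use the ideal design as a reference scale, sketched here with invented attributes and values (the Euclidean distance and the 'worst acceptable' anchors are assumptions of this example, not prescriptions of the text): each alternative is scored by its normalized distance from the ideal, and the smallest distance ranks best.

    import math

    def distance_to_ideal(candidate, ideal, worst):
        # rescale every attribute so that the ideal maps to 1 and the worst
        # acceptable value to 0 (works whether higher or lower is better),
        # then take the Euclidean distance from the ideal corner
        total = 0.0
        for k, ideal_v in ideal.items():
            span = ideal_v - worst[k]
            rescaled = (candidate[k] - worst[k]) / span if span else 1.0
            total += (1.0 - rescaled) ** 2
        return math.sqrt(total)

    ideal = {"speed": 22.0, "deadweight": 55000, "rfr": 9.0}    # rfr: lower is better
    worst = {"speed": 18.0, "deadweight": 45000, "rfr": 12.0}
    alternatives = {
        "A": {"speed": 21.0, "deadweight": 52000, "rfr": 10.0},
        "B": {"speed": 19.5, "deadweight": 54000, "rfr": 9.5},
    }
    ranking = sorted(alternatives,
                     key=lambda n: distance_to_ideal(alternatives[n], ideal, worst))
    print("Ranking (best first):", ranking)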

1.4.2 Theory of Design Process

The relevance of the design process has to be emphasized everywhere, since it happens in many countries, Italy included, that top management does not yet recognize the broad importance and scope of design for the success of products and of the company. Incidentally, this is reflected in the relatively low salaries offered to designers.

The scope of the theory of design processes is to explore the design process as generally as possible, and to organize, store and reference the complete process knowledge for and about design. This means that, in contrast to the present condition of empiricism, the relationships among the individual knowledge elements should be explicitly recognized and explained.

The contents of the theory of design process can be outlined by the following themes:

• task and structure of the design process;
• computer-aided tools in the design process;
• procedures for solving the design tasks;
• factors affecting the efficiency and effectiveness of the design process;
• subjective decision-making preferences in the selection activity.

The design process has to be well documented so that every designer may understand the design rationale, thus reducing the risk of imprecise and uncertain decisions. A record of the main decisions taken has to be created for future reference and for educating young designers. Documented design processes have usually been developed over time by trial-and-error, searching for the so-called optimal solution by an evolutionary approach.

In many industrial activities such as shipbuilding, designers are under increasing pressure to produce 'right first time' designs in a very short time. Therefore, skill is required in the application of the correct decision-making operations (problem solving) at the right time during the product development process. In the concept design phase, the problem-solving activity consists of structuring the design problem, generating alternative solutions, evaluating and selecting the 'best possible' design among candidate solutions, and communicating the result to the preliminary design level by delivering top-level specifications.

Due to the complexity of technical systems, it is compulsory for designers to use decomposition to facilitate evaluation of individual properties, by subdividing the design process into a set of design activities that permit evaluation prior to a recombination of individual properties into an overall evaluation of the design. The basis for decomposition and the identification of design characteristics are obtained from the 'product design specification'.

It is important to note that companies and industries, at varying levels of maturity, will have very different views of what their design strategy is and therefore of which evaluation methods and approaches they will use to support that strategy. For example, a company may be prepared to adopt a high-risk strategy to introduce a novel product. However, when one considers the many millions of euros, and the many hundreds of people, required to design, develop, test and manufacture a new complex technical system, it is clear that the consequences of failure are severe. To reduce risk levels, several evaluation methods have been developed over the years to accommodate the needs of the different design strategies employed by companies and decision makers.

Characterization and Evaluation in the Design Process

In general, the design process is still recursive - problem solving calling problem solving for a sub-problem, or a design process calling a second design process for a less complex subsystem, going downwards in a hierarchy of complexity - and also iterative - exploring forward into more advanced design stages, and backwards for review, expansion, completion and correction.

The design process is to some extent idiosyncratic, depending on the experience background of the design team. It may concern a novel product, with little or nothing taken over from a previous product; then risk tends to be relatively large, especially if untried principles are used. The product may be more or less an adaptation of an existing one to new requirements, that is, a kind of redesign. It can be a renewal of a previous product, using modified principles, performance values, new subsystems, etc. Design can also result in an alteration or updating of existing products. Many design projects relate to variants, obtained by scaling up or down with appropriate adaptation of the size relationships.

A typical evolution of a technical product from a management viewpoint consists of identifying criteria, conceptualizing, evaluating with accuracy, embodiment, and detailing. Identifying the criteria leads to the design specifications. Concept design uses knowledge-based metamodels to evaluate the technical and economic properties of the product to be designed. Accurate evaluation in preliminary design and contractual design requires precise definition of the product geometry, usage of numerical codes, experimental analysis, etc. Embodiment develops the functional design up to the final improvement of the product. The final stage of detailing produces the complete manufacturing information: typically detail drawings, parts lists, and instructions for assembly.
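A minimal sketch of what such a concept-design metamodel can look like; the parent-ship data, the choice of the cubic number L·B·D as the single regressor, and the fitted linear form are all invented for illustration:

    import numpy as np

    # Hypothetical parent ships: cubic number L*B*D [m^3] vs lightship weight [t]
    cubic_number = np.array([40000.0, 55000.0, 70000.0, 90000.0, 120000.0])
    lightship_t  = np.array([ 6100.0,  8000.0,  9800.0, 12100.0,  15400.0])

    # Linear metamodel W = a*(L*B*D) + b fitted by least squares
    a, b = np.polyfit(cubic_number, lightship_t, deg=1)

    def estimate_lightship(L, B, D):
        # quick metamodel estimate of lightship weight [t] for new dimensions [m]
        return a * (L * B * D) + b

    print(f"Estimated lightship: {estimate_lightship(180.0, 30.0, 16.0):,.0f} t")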

The decision process may be applied for the evaluation, improvement and optimization of a candidate design, but its main purpose is selection. Decision-making theory has been developed for that purpose, to make decisions more rational, provided that the criteria and attribute functions can be formulated in mathematical terms. Evaluation needs the previous identification of criteria and of targets about acceptable performance related to the properties. Some evaluation criteria will be absolutely objective, containing numbers or mathematical relationships, but others will be heavily dependent on designers' subjective preferences. Selection includes comparison of the candidate alternatives with respect to the ideal design to establish their potential quality.

The design process must be managed and made more reliable and predictable in the timing and quality of its outcomes, trying to shorten the time to economic break-even and profit; that requires:

• generating and evaluating alternative solutions to select the 'preferred design' among feasible, non-dominated alternatives;

• improving the preferred solution before the basic design phase through sensitivity analyses, so that as many of the potential faults as possible can be avoided;

• a scientific knowledge-base about similar products that have been designed, about available principles of operation, but also about failures that have occurred, and their causes;

• the best possible knowledge about the design process;

• auditing the design work with respect to all properties and the accepted design specifications by validating the models, verifying I/O data and checking manufacturing information.

Furthermore, rationalizing the design process also implies that the product and its manufacturing process should be designed concurrently. For a novel product, concurrency usually needs to wait until at least the concept design phase is substantially completed. During embodiment and detailing of the product, concurrent engineering of the manufacturing process can proceed at its best. It is necessary, however, to 'freeze' the design development of the product before the manufacturing process is completed; otherwise, an expensive redesign situation can occur.

Goals of the                    Design Process      Design          Engineering
Design Process                  and Methodology     Information     Design Domain

Quality of the technical
system to be designed           decisive            decisive        moderate

Design time                     decisive            decisive        decisive

Design efficiency               decisive            decisive        moderate

Reduction of risk
for the designers               decisive            decisive        decisive

Ratio of skilled
to routine work                 decisive            moderate        low

Maturation time
for designers                   decisive            decisive        moderate

Design costs                    decisive            decisive        decisive

Team work                       decisive            moderate        decisive

Table 1.2. Relevance of areas on goals of the design process

A possible list of the partial areas of the design process and the corresponding effect on different goals is shown in Table 1.2. It is evident that the relationship between goals and partial areas of the design process is strongest as regards design time and reduction of risk for the designers. Of course, the evaluations in this judgement matrix may be modified according to individual experience.

Transformation Technologies in the Design Process

To realize a rational transformation in the design process, the technology of the design process should be ascertained. It should make possible not only the transformation of the information (from rough specifications to a representation and modelling of the industrial product ready for manufacturing), but at the same time also its optimization. To guarantee that the process progresses in an effective way, all the technical and organizational elements as well as their mutual relationships must be standardized.

An analysis of present design processes shows at least three typical classes of transformation technologies:

• traditional, which is still predominantly used in heuristic engineering design practice;

• methodical, if the process is guided by systematic and rational methods;

• mixed, if both intuitive and methodical classes of procedures are used simultaneously.

The tasks of the individual designers are obviously different within these types of design process and technology. Many parameters of the design process are influenced by the design technology; in particular, the quality of the design and product, the time to complete the design work, and especially the transparency of the design process. This latter factor can permit or hinder such procedures as team cooperation, design audits, and product liability litigation.

Among the available procedural models, three classes can be envisaged according to the type of guidelines:

• strictly algorithmic instructions, i.e. rigidly regulated procedures;

• heuristic instructions, i.e. relatively flexible procedures;

• relatively vague instructions, with no clear references, i.e. fairly free procedures, guided only by some relevant principles.

To make the area of methodology clearer, one should distinguish between design strategy, which determines the general direction of the design process, and design tactics, which treats the methods and working principles of the individual design steps. In the context of systematic instructions for procedure, the question has emerged whether the whole design process is algorithmically solvable. Whatever the answer, procedural models of design have emerged and continue to emerge, as 'flexible algorithms', not only in design strategy but also in design tactics.

Risk Analysis

A major development is the use of statistics and probability for risk assessment in all aspects of design and operation.

Although relatively new to shipbuilding, risk analysis and other risk techniques have been used in other industries for more than sixty years. They obtained their impetus from the start of new industries such as nuclear power generation and the USA space program. More recently they have been applied to the protection of the environment and to safety in the operation of all types of technical products. All these cases shared the same problem, in that there were no historical data on which to base design decisions, or to predict the performance of equipment relative to its safe operation.

The old way to design for uncertainty and to eliminate the risk of failure was to apply safety margins to the derived requirements. The problem is that safety margins are built on experience, and where there is no experience large safety margins have to be applied, which is a waste of resources and may be cost prohibitive. In order to find solutions to this situation, designers turned to the application of probability, which is the foundation of risk analysis.

Techniques such as fault tree analysis were developed to break the problem down into parts that could be analyzed and assigned individual probability levels, which would then be combined into an overall risk assessment. After its initial development, the application of risk analysis expanded into industries where the rate of new technology development was high, or where the risk of catastrophic or very serious outcomes was present. In some cases it was only brought into use after significant accidents occurred, such as the 'Exxon Valdez' cargo oil spill in Alaska.
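A minimal sketch of the fault-tree arithmetic just mentioned; the event names, gate structure and probability figures below are hypothetical and assume independent basic events:

    def or_gate(*probs):
        # P(at least one independent event occurs) = 1 - prod(1 - p_i)
        q = 1.0
        for p in probs:
            q *= (1.0 - p)
        return 1.0 - q

    def and_gate(*probs):
        # P(all independent events occur) = prod(p_i)
        result = 1.0
        for p in probs:
            result *= p
        return result

    # Top event: uncontained oil spill = (hull breach) AND (containment fails)
    hull_breach       = or_gate(1e-3, 5e-4)        # grounding OR collision (per year)
    containment_fails = and_gate(0.10, 0.50)       # valve failure AND late response
    print(f"Annual probability of top event: "
          f"{and_gate(hull_breach, containment_fails):.2e}")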

In the present context, risk can be seen as the risk of not achieving the contract speed on trials or the deadweight for a ship, the risk of structural failure, the risk of an oil spill, the risk of collision in a crowded sea lane, and so on. Probabilistic approaches in ship design now cover subdivision, damage stability, oil outflow, reliability-based structural design, machinery monitoring and control, maintenance and operation.

The global marine industry was introduced to risk analysis and management through the activities of IMO. The UK Marine Safety Agency developed a risk analysis and mitigation approach, the Formal Safety Assessment (FSA), which is a broad-brush approach to identifying major risk areas, analyzing them in turn, developing ways to mitigate the risk, performing a cost-benefit analysis for the proposed solutions, and then deciding on an approach.

Designers and Leadership

What are engineering designers? They are educated engineers, open to new ideas and able to think ahead, capable of transforming a concept of an industrial product into a technical system to be manufactured. In process terms, designers transform information from requirements and constraints into the description and representation of a product which is capable of fulfilling the given demands, including requirements from the life cycle. Designers include all team members who cooperate in and contribute to the transformation in the design process: not only development engineers and layout engineers but also decision makers and draftsmen.

The design process demands from the designers a large number of different activities which support design but cannot be allocated directly to the real design process. They range from design planning to supplying information, from interaction with production planning to experimental analyses. Besides the design team directly active in the transformation process, the complete design system also contains several further staff members who fulfill the general functions necessary for the progress of the design process: management, administrative processing, archiving, etc. The historical individuality of designers meant, and sometimes still means, that the predominant share of design was accomplished by the designers themselves. That is why the great relevance of design for an industrial company has to rest with the designers. This necessity is, however, in evident contrast with many situations where designers are generally held in low respect.

The leadership strategy in the design process will hardly be found in a management manual. Design leaders promoted to the top position must always be branch experts who understand the work and are ready to take part in discussion, for otherwise they can hardly gain the necessary recognition from their team members. All activities of the design leaders, such as determining tasks and assignments, planning, working methods, coordination work, organization and others, must be conducted and influenced by conscious leadership tactics. Continuing education of the staff members, in branch knowledge and in design process knowledge, should not be forgotten.

That is why the ‘design science’ has the task to support the design managers with related knowl-edge and to make available for design processes the particular theories of specialization, organi-zation and planning, in order to solve often conflicting problems; among the others:

• creative–innovative tasks which can be very complex;

• design work often carried out and released under time pressure;

• simultaneous processing of several tasks which are respectively in different states of design maturity;

• shortage of qualified members in some design areas;

• management of even capable staff members who are often individualists and not suited to group work;

• psychological barriers up to refusal of new methods and techniques/technologies;

• space for personality development, career prospects and chances for promotion (for example, advancement without a forced transfer into company management).

Bibliography

[1] Archer, M.: Systematic Method for Designers, Council for Industrial Design, London, 1964.

[2] Asimov, M.: Introduction to Design, Englewood Cliffs, Prentice–Hall, New York, 1962.

[3] Dieter, G.E.: Engineering Design, 2nd edition, McGraw–Hill, New York, 1991.

[4] Eder, W.E.: Developments in Education for Engineering Design - Some Results of 15 Years of WDK Activity in the Context of Design Research, Journal of Engineering Design, Carfax, 1994, Vol. 5, no. 2, pp. 135–144.

[5] Hubka, V.: Theory of Technical Systems, Springer–Verlag, Berlin–Heidelberg, 2nd edition, 1984.

[6] Hubka, V., Eder, W.E.: Introduction to Design Science, Springer–Verlag, Berlin–Heidelberg & New York, 1992.

[7] Jones, F.: Design Methods - Seeds of Human Futures, Reinhold van Nostrand, New York, 1992.

[8] Katz, R.H.: Information Management for Engineering Design Applications, Springer–Verlag, New York, 1985.

[9] Kesselring, F.: Technical–Economic Designing, VDI–Zeitschr., 1964, Vol. 106, no. 30, pp. 1530–1532.

[10] Klaus, G.: Dictionary of Cybernetics, Fischer, Frankfurt, 1969.

[11] Koen, B.V.: Definition of the Engineering Method, ASEE, Washington, 1985.

[12] Muller, J.: Working Methods of the Engineering Sciences - Systematics, Heuristics, Creativity, Springer–Verlag, Berlin–Heidelberg, 1990.

[13] National Science Foundation: Program Announcement Design Theory and Methodology, Nr. OMB 3145–0058, 1985.

[14] Page, J.K.: Contribution to Building for People, Conference Report, London: Ministry of Public Buildings and Works, 1966.

[15] Polya, J.K.: School of Thinking - about Solving Mathematical Problems, Francke, Bern, 1980.

[16] Reswick, J.B.: Prospectus for an Engineering Design Center, Cleveland, OH: Case Institute of Technology, 1965.

[17] Schneekluth, H., Bertram, V.: Ship Design for Efficiency and Economy, 2nd edition, Butterworth–Heinemann, Oxford, 1998.

[18] Shuster, H.D.: Teaming for Quality Improvement, Englewood Cliffs, Prentice–Hall, New York, 1990.

[19] Suh, N.P.: Principles of Design, University Press, London, 1989.

[20] Taylor, E.S., Acosta, A.J.: MIT Report, Cambridge, MIT Press, 1991.

[21] Watson, D.G.M.: Practical Ship Design, Elsevier Ocean Engineering Book Series, Vol. 1, Elsevier, Oxford, 1998.

Chapter 2

Standard Design Processes

Nowadays, there is still a general lack of understanding of the essence of the design process. Design is not a body of knowledge; it is the activity in which the design team integrates the existing bodies of knowledge, while continuously and simultaneously paying attention to and balancing the many factors that influence the design outcome.

The term design process, interchangeable with design methodology, refers to procedures developed to adequately perform design activities. These procedures are structured, that is, they are a step-by-step description and provide a template for the key information and decision-making. Documented design processes have been developed over time by trial-and-error following an evolutionary approach, with a few exceptions based on more rational design theories.

The whole design process is a decision-making process of analysis, synthesis, evaluation, feedback, modification and control. Life-cycle design starts from the identification of customer requirements, and extends to concept design, basic design, detailed design, production, deployment, and phase-out. The mission of system designers is to conclude a definition of the technical system in the concept design phase, then to proceed down to the subsystem level during the preliminary design phase, and to define every functional component item in the detailed design phase.

In general, because of the complexity of modern technical systems and the more stringent requirements, traditional design methods are no longer suited to yielding competitive products. That holds above all for ships, which are among the most complex products, as is evident by considering the number of individual parts required for different products (see Table 2.1).

In today's competitive market, there is an ever-increasing need to develop and produce technical systems that fulfill a number of customer requirements, are reliable and of high quality, and are cost effective. Even though it is well known that the early design phases are of utmost importance for the lifetime success of a technical product, until recently the focus was on engineering improvements, such as product planning, parts planning, process planning, production planning, automated manufacturing, reduction of labor costs, etc. (Elvekrok, 1997). Therefore, a rational paradigm for design is needed to increase both the efficiency and the effectiveness of the design process. Efficiency is intended as a measure of the quickness in generating reliable information which can be used by designers in the decision-making process. Effectiveness denotes a measure of the quality of design decisions (accuracy, comprehensiveness).

Product Type          Number of Parts

Aircraft carrier            2,500,000
Submarine                   1,000,000
VLCC                          250,000
Boeing 777                    100,000
Fighter aircraft               15,000
Automobile                      1,000

Table 2.1. Number of unique parts in technical products

The primary objective of the design effort, besides creating the information needed to build the technical system, is to satisfy the customer's requirements at minimum cost. The life-cycle cost of an industrial product includes the design, construction, and operating costs. For designs that incorporate new technologies, research and development costs must also be included.

The demand for innovation in product development requires innovation in the design process, achievable by consideration of some basic fundamentals:

• design is the primary driver of cost, quality, and time:

– concept design influences over 70% of the total life-cycle cost of a technical system;
– too much is spent too late;
– more focus is needed on the concept and preliminary design phases;

• recognize the need to leverage the power of design earlier, broader and deeper:

– design improvements are marginal if they only address single components and properties, and not the life-cycle processes;
– it is a mistake to focus only on reduction of labor and material costs;
– the major cost battleground is overhead, which must be attacked aggressively;

• multidisciplinary teams are the key to drive the overall design process:

– only multidisciplinary teams can manage the multiple, often conflicting properties of a complex technical system;

– customers have to participate in decision–making from the earliest design stages.

Since Wallace and Hales (1987) documented an actual industrial design process in explicit detail, nothing similar has been published in the open literature. Even though Andreasen (1987) highlighted some improvements and positive changes in designers' attitudes because they had started to rely on the basic methods of design science (for example, using the 'concurrent engineering' approach), analysis of designers' behavior along the design process shows widespread situations of poor training and cognitive limitations. In general, designers:

• develop the functional aspects of the design in stages throughout the design process rather than during an initial functional development phase;

• use functional relationships that are not always quantitative from the very beginning of the design process;

• make decisions based on heuristic reasoning instead of rational decision making;

• use their individual knowledge to influence the generation and evaluation of different solutions, thus avoiding the use of domain-independent procedures;

• select the final design in a purely deterministic way, without due consideration of imprecision and external noise.

2.1 Design Models

Nowadays two major streams of development in design theory can be envisaged, i.e. the pursuit of a rational theory of design and the development and application of computer-based tools. From a theoretical viewpoint, there are many theoretical models for the design process, developed competitively by practitioner designers and researchers, which can be grouped into two basic approaches and a third one derived from their merger:

• descriptive models, which build the design process from observation of the professional practice of designers;

• prescriptive models, which aim to structure a rational design process based on a knowledge-base;

• hybrid models, which combine descriptive and prescriptive models; they promise to be more flexible than the prescriptive models while fixing the sequence of the main design phases.

2.1.1 Descriptive Models

The basic activities of the descriptive models are:

• problem definition: the problem statement is developed to identify the design attributes and constraints according to customer expectations; a useful investigation is conducted to clarify the advantages and disadvantages of the various subsystem options to study during the conceptual design phase;

• concept design: the problem statement is used to develop a baseline system to meet customer demands under given constraints; decisions made during this design phase are strategic and influence the whole product life cycle as regards performance and cost;

• preliminary/contractual design: the previous conceptual baseline is used as the starting point for a detailed design description and drawings that define an integrated engineering system by means of high-tech tools and a knowledge-base;

• functional design: the final design is refined as to its functionality through optimization of the total engineering system performance and cost; a specification is provided with supporting drawings and procurement documents to purchase long lead-time equipment;

• detail design: all the details of the final design are specified, and the manufacturing drawings and documentation are prepared.

The descriptive models exemplify the way design is performed sequentially by a design team, but do not indicate how to arrive at the 'best possible' solution by applying decision-making techniques. They closely resemble the traditional approach to design, relying on decomposition, evaluation and iteration. These models are usually no more than guidelines because of the predominant influence of the designers' subjective preferences on the process. Their validation is difficult, as most engineers often do not have the required skills.

2.1.2 Prescriptive Models

In contrast to descriptive models, prescriptive models encompass a decision-based design approach, which is founded on a general design methodology containing four basic processes (Jones, 1963):

• problem definition: defining all the requirements for the technical system and reducing these to a complete set of performance specifications; to this end, technical systems are decomposed or partitioned into subsystems to determine their size, properties, etc.; another crucial objective is to develop the orderly design schedules and plans;

• analysis: finding possible solutions for each individual performance specification and integrating subsystems to arrive at alternative design solutions with feasible properties;

• synthesis: selecting the preferred solution among a number of alternative solutions with feasible properties;

• evaluation: assessing the degree to which the preferred solution fulfills the stated requirements.

Prescriptive models of the design process have been developed by Hubka (1982) and Pahl and Beitz (1984) in harmony with the 'systems engineering' approach. Commentaries on the development, implementation, and application of related guidelines have been made by Beitz (1987) and Cross (1989).

De Boer (1989) deems that a basic three-step pattern (diverging, systemizing and converging) can generally be recognized in each phase and subphase of a prescriptive model. The first step in design is usually divergent in nature: the design team generates a large number of candidate alternatives, which are then analyzed and synthesized into forms that may represent feasible solutions. Finally, the refined solutions are further analyzed, synthesized, and evaluated, leading in general to a set of satisficing designs. As the number of acceptable solutions decreases, the selection activity is characterized by convergence. This pattern is the one that corresponds most closely to the logic of concept design.

2.1.3 Hybrid Models

There is no doubt that the design of complex technical products will continue to require the individual expertise, perception, and judgement of the members of the design teams. However, the designers have to be organized as a network of experts based upon their technical speciality or particular role. Furthermore, information technology and computer-aided support to decision making should be disseminated across design subteams to facilitate their critical communication and negotiation about the design process. This strategy results in hybrid models for the design process, as a combination of prescriptive and descriptive models supported by cross-functional design subteams.

Although it is considered a means of achieving an efficient and effective concept design (Parsons et al., 1999), the hybrid model approach is also suitable for the preliminary and contractual design phases, when the naval architects, strength and vibration experts, and marine engineers in dedicated teams can be assigned specific tasks. It is the responsibility of each subteam job leader to decide when and how to carry out more accurate computations and evaluations. Nevertheless, conflicts arise when design subteams disagree on the importance of the single characteristics of their own functional parts with respect to the features of the entire technical system. As a result, trade–offs are often resolved in ways that do not optimize the overall system. Therefore, hybrid models also aim to overcome these problems by addressing three key problems in design management:

• planning: design tasks cannot be sequenced in rigid detail, as happens in the design spiral approach; project management has to be flexible and adaptable to specific customer requirements and the external environment; parallelism is a key concept in design planning;

• integration: decisions that a design subteam makes affect previous decisions made by another design team; this permanent need for re–evaluation as a result of changes in the design interfaces can lead to lengthy cycles of iteration and change; moreover, design is highly influenced by the integration of marketing, procurement, production, finance, and human resources;

• ranking: designers do not share a common language and preferences; hence mathematical tools are necessary for comparing the importance of different attributes and ranking overall scores.

This fosters a decision–based design approach, allowing the design to proceed concurrently and deferring detailed specifications until trade–offs are more fully understood.

2.2 Design Approaches

As the design process progresses and decisions accumulate, freedom to make changes is reduced; at the same time, the knowledge about the product under design increases. This increase in knowledge is characterized by a transformation of soft (vague) information into hard (more precise) information. Soft information refers to the heuristic and uncertain information that stems from approximate predictions of product properties, also based on the designer's subjective judgment and experience, whereas hard information is generally based on much more accurate and science–based evaluations. Given this nature of design information, what a rational design approach has to facilitate is knowing more about the design properties early on, that is, a faster increase in the ratio of hard to soft information. This relative improvement in the quality of information is expected to lead to equivalent or better designs that are completed in less time and at less cost than those developed using a traditional serial process. Therefore, it is worth discussing how the most relevant design approaches satisfy these expectations.

The design spiral has been and still is the preferred approach to describe the ship design process. It is focused on a series of activities that converge, as efficiently as possible, on a single design solution for a specific project. This approach often involves making decisions based on incomplete information and/or compromise. Thus, it requires significant rework (iterations) to reach an acceptable design, and there is no way of knowing whether it is a good solution other than experience. Figure 2.1 shows how the ship designers move through the design process in a series of serial steps, each dealing with a particular analysis or synthesis task.

After all the steps have been completed, the design is unlikely to be balanced, or even feasible. Thus a second cycle begins and all the steps are repeated in the same sequence. Typically, a number of cycles (design iterations) are required to arrive at a satisfactory solution. Anyone who has ever participated in a ship design knows that in this approach the steps often will not be performed in a prescribed, hierarchical order; instead the design team will jump from one spot to another on the spiral, as knowledge is gained and problems are encountered.

Figure 2.1. The design spiral

The design bounding approach is an alternative design process that uses the concept of design space. It considers a number of alternative solutions within a range of values for the independent variables and parameters, which bound the design space, thus avoiding the pressure for iteration. While it involves performing the design calculations for every design combination, the need for iteration is avoided. This approach exploits the fact that all design team members have access to a single 3D computer model of the ship by means of a network, which can only be updated with the approval of the design team leader.
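As a rough numerical illustration of sweeping such a bounded design space, the following Python sketch enumerates every combination of a few main dimensions once and keeps only the feasible ones, so no iteration loop over the whole design is needed; the variable ranges, the L/B limits and the displacement threshold are invented for the example and do not come from the text.

from itertools import product

# Hypothetical bounds on the independent design variables (illustrative values only)
lengths  = [170.0, 180.0, 190.0]   # length between perpendiculars L [m]
breadths = [28.0, 30.0, 32.0]      # moulded breadth B [m]
drafts   = [9.0, 10.0]             # design draft T [m]
cb_vals  = [0.70, 0.75, 0.80]      # block coefficient CB [-]

RHO_SW = 1.025                     # sea water density [t/m^3]

def displacement(L, B, T, CB):
    # Displacement in tonnes from the usual block-coefficient relation
    return RHO_SW * CB * L * B * T

feasible = []
for L, B, T, CB in product(lengths, breadths, drafts, cb_vals):
    disp = displacement(L, B, T, CB)
    # Keep only combinations satisfying simple (hypothetical) feasibility checks
    if 5.5 <= L / B <= 7.0 and disp >= 35000.0:
        feasible.append((L, B, T, CB, disp))

total = len(lengths) * len(breadths) * len(drafts) * len(cb_vals)
print(f"{len(feasible)} feasible combinations out of {total}")

Each candidate is evaluated exactly once; the bounded ranges of the variables play the role of the design space discussed above.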


It is because of the iterative nature of many design approaches that the approach of least commitment should be followed. That is, progressing from step to step in the process, no irreversible decision should be taken until it is necessary. This principle of least commitment provides maximum flexibility in each step and the assurance that more alternatives remain available as long as possible, before selecting the best alternative. The policy of least commitment has been shown to result in more efficient design, primarily due to the reduced requirement for iteration.

The decision–based design approach, accredited to Toyota, has been borne out by evidence as the best design methodology, since it provides better designs in shorter time. It is founded on the axiom that the principal role of designers is to make decisions. It encompasses the 'systems engineering' paradigm and embodies the ideas of the 'concurrent engineering' approach over the product lifetime.

The general application of decision–based design is particularly suitable in the early stages of the design process, although the tools developed and employed (multicriterial decision–making techniques) can be useful even in later design phases. The motivation to foster this paradigm in the early design phases is that it offers the greatest potential to affect the design process, product performance and total cost. The initial stages of design dramatically affect the final outcomes. Indeed, the possibilities for influencing total lifetime product cost are very high at the very beginning of the design process and decrease during the following design stages, process development and manufacturing. Hence, to solve the conflicting issues of economic viability, the technical and economic design properties of an industrial product are to be considered simultaneously from concept design onwards.

Independent of the paradigms or methods used to plan, establish goals and model technical systems, designers are and will continue to be involved in two primary activities, namely, performing simulations and making decisions, two activities that are central to increasing the efficiency and effectiveness of designers themselves and the processes they use.

Before discussing the multicriterial decision–making approach and summarizing some suitable supporting mathematical tools, the evolution of design paradigms is illustrated, which includes new processes, new management practices and new tools.

2.3 Systems Engineering

Design is a process of synthesis and integration covering many disciplines. Because of the extent of required knowledge, traditional design is accomplished by dividing the overall product into manageable subsystems and solving for them. Therefore, systems engineering¹ has been developed to ensure that the isolated specialist solutions are integrated. It focuses on the relationships among the different subsystems, on the disciplines involved in their design, and on the integration of them all.

The technical literature on the subject has expanded in recent decades, even though it traces its origin back to the end of the Second World War, while the earliest books go back to the 1960s.

¹ The term 'system' stems from the Greek word 'systema', which means 'organized whole'.


Systems engineering received its impetus from the defense industries in a number of countries. The US Navy defined systems engineering as 'the application of science and art to integrate the interdependent aspects of a ship design to form an optimum whole which meets operational requirements within technical and programmatic constraints'.

Systems engineering developed for two reasons (Hazzelrigg, 1996). The first is that engineers had become so specialized (Taylorism) that someone needed to take responsibility for the overall technical system. As to ships, the naval architects have always had this responsibility and still maintain it in many shipbuilding countries, even if in the United States they allowed this responsibility to be taken away from them decades ago. The second reason is that some industrial products (aircraft, ships, cars, and so forth) had become so complex that a rational way to manage the design had become essential.

Increased system complexity increased the emphasis on the definition of requirements for individual system elements as well as on the definition of the interfaces between subsystems. Increased system size and complexity forced expansion of the engineering workforce required to develop the system as well as increased specialization within the workforce. Collectively, these trends inevitably forced the managers of complex systems to expand and formalize their development procedures and processes under the systems engineering umbrella.

2.3.1 Goals

Systems engineering is a formal process for the design of complex systems to meet technical performance and achievable objectives within cost and schedule constraints. It concerns the engineering processes and techniques aimed at transforming a set of operational requirements into a defined system configuration through a top–down iterative process of requirements analysis, functional analysis, design synthesis, system documentation, manufacturing, trials and validation. Therefore, it overlaps and interfaces with both design and project management. Systems engineering does not comprise the design process itself; it is only the organization and management of that process.

In recent years some proponents of systems engineering have pushed for its use almost as if it were a design approach. Nevertheless, while overall design has always considered both the design of individual components and their integration, systems engineering does not encompass the design process. In other words, while design is a decision–making process characterized by selection of the design variables and parameters as well as of the preferred product, systems engineering is only the organization and management of the design process. Systems engineering cannot be considered intrinsically as an engineering discipline in the same way as civil engineering, ship engineering, mechanical engineering, and other design specialty areas.

The systems engineering process involves both technical and management aspects. Its principal objective is to achieve the optimum balance of all system elements so as to optimize overall system effectiveness, albeit at the expense of subsystem optimization. According to the 'International Council on Systems Engineering', this methodology focuses on defining customer needs and required functionality early in the development cycle, documenting requirements, and then proceeding with design and system validation. It integrates technical disciplines and ensures the compatibility of all physical, functional, and program interfaces. These disciplines include: reliability, maintainability, supportability, safety, manning, human factors, survivability, test engineering and production engineering. During the technical product development, the systems engineering process gives great weight to customer needs, characterizing and managing technical risk, technology transfer from the scientific community into application development, system test and evaluation, system production, and life–cycle support considerations.

The objectives of the systems engineering process are the following:

• ensure that the product design reflects requirements for all system elements: hardware, software, personnel, facilities, and procedural data;

• integrate the technical efforts of the design specialists to produce a balanced design;

• provide a framework for production engineering analysis and production/manufacturing documentation;

• ensure that life–cycle cost is fully considered in all phases of the design process.

2.3.2 Process Description

The systems engineering methodology is a collection of processes that surround and enhance the fundamental process by complementing or focusing on particular aspects of it. The fundamental process is iterative and increases detail in each phase of the system development. It is followed at the total system level by those with overall responsibility for system integration while, at the same time, it is being followed by the developers of individual subsystems and components. The principal steps in the process are shown in Figure 2.2 and briefly discussed in the following.

Figure 2.2. The systems engineering process


Initial Requirements. Initial requirements are contained in an 'initial draft system requirements document'. They consist of objectives, constraints, and relevant measures of effectiveness for the technical system. Generally they come from the customer.

Functional Analysis. Functional analysis is a method for analyzing the initial top–level requirements for the technical system and dividing them into discrete tasks, which define the essential functions that the technical system must perform. It consists of two activities: identification of system functions and allocation of system requirements. It is performed in parallel with the second step in the fundamental process, design synthesis, since there must be interactions between the two activities. Functional analysis starts with the identification of the top–level system functions and then progressively allocates the functions to lower levels in the system. Each function is described in terms of inputs, outputs, and interface requirements. Functional flow block diagrams are used to depict the serial relationship of all the functions to be performed at one level, that is, the time–phased sequence of the functional events. For some time–critical functions, time line analysis is used to support the functional analysis and design requirements development.

Requirements Allocation. Requirements allocation proceeds after the system functions have been identified in sufficient detail and candidate design concepts have been synthesized. It defines the performance requirements for each functional block and allocates the functional performance requirements to individual system elements. The performance requirements are stated in terms of design constraints and requirements for aspects such as reliability, safety, operability, and maintainability. Requirements allocation decomposes the system level requirements to the point where a specific hardware item, software routine, or trained crew member will fulfill the needed functional/performance requirements. The end result of requirements allocation is the technical system specification. Design constraints such as dimensions, weight, and electric power are defined and documented in the 'requirements allocation sheet', along with all functional and technical interface requirements. The personnel requirements for all tasks are defined. Some performance requirements or design constraints can be allocated to lower levels of the system.

Design Synthesis. Design synthesis provides the engineers' response to the requirements outputs of functional analysis. Its goal is the creation of a design concept that best meets the stated system requirements. Inputs from all engineering specialties that significantly affect the outcome are utilized. Several design solutions are typically synthesized and assessed. Two tools are used to document the resulting candidate design solutions, that is, the overall configuration, internal arrangement of system elements, and principal attributes: the 'schematic block diagrams' and the 'concept description sheet'. As the concepts that survive the screening process are developed further, 'schematic block diagrams' are developed in greater detail and are used to develop 'interface control documents'. For attractive design concepts, physical and numerical models are developed later in the synthesis process. The concept description sheet is the initial version of the 'concept design report', a technical report that documents the completed concept design. This report includes drawings and technical data such as weights, material element list, etc. The results of system analysis for the concept, described in the following, are also typically included in the report.


System Analysis. Once a design concept has been synthesized, its overall performance, costs and risks are analyzed. As the design development proceeds, the number of attributes and the level of detail of the analysis will increase. Early phase analysis typically consists of quick assessments using empirical data based on past designs and reflects many simplifying assumptions. In the later stages of the design process, much more sophisticated modelling and simulation is done, coupled with physical model tests in some cases. The aspects of performance with major effects on mission effectiveness are identified and analyzed individually. Development, production and operation costs are typically analyzed for each option being considered. Risk is assessed using standard procedures.

Evaluation and Decision. Trade–off studies are an essential part of the systems engineering process. Once several feasible design concepts have been generated, a selection process must be activated by means of a standard methodology in which seven steps can be envisaged, namely:

1. Define the goals and requirements to be met by the candidate designs (functional analysis).

2. Identify the design candidates and discard the unfeasible solutions (design synthesis).

3. Formulate selection criteria (attributes) and, if possible, define threshold and goal values for each one (minimum acceptable and desired values, respectively).

4. Weight the attributes. Assign numerical weights to each attribute according to its perceived contribution to overall performance. Mathematical techniques can be used to translate the subjective preferences into weights.

5. Formulate utility functions, which translate diverse attributes to a common scale, for example, comparing speed vs. endurance, or cargo capacity vs. on/off–load times for a ro–ro ship.

6. Evaluate the alternatives. Estimate overall performance and other required attributes such as risk (system analysis). Then score the overall technical capability with respect to cost. Calculate the cost/capability ratio for each alternative.

7. Perform sensitivity analysis. Assess the sensitivity of the resulting overall score to changes in attributes, weights, and utility functions. This enables a more informed judgment to be made as to whether one alternative is clearly preferred over the others (a minimal numerical sketch of steps 4 to 7 is given after this list).
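The following Python sketch makes steps 4 to 7 concrete for two hypothetical design alternatives, using weighted linear utility functions and a simple cost/capability ratio; every attribute, weight, threshold, goal and cost figure is invented for illustration and does not come from the text.

# Attribute weights, normalized to sum to 1 (step 4, hypothetical values)
weights = {"speed": 0.40, "endurance": 0.35, "cargo_capacity": 0.25}

# Threshold (minimum acceptable) and goal (desired) values per attribute (step 3)
bounds = {
    "speed":          (18.0, 24.0),      # knots
    "endurance":      (4000.0, 8000.0),  # nautical miles
    "cargo_capacity": (1500.0, 2500.0),  # lane metres
}

def utility(value, threshold, goal):
    # Linear utility mapping an attribute onto a common 0-1 scale (step 5)
    u = (value - threshold) / (goal - threshold)
    return max(0.0, min(1.0, u))

# Two candidate designs with estimated attributes and acquisition cost (step 6)
alternatives = {
    "design A": {"speed": 22.0, "endurance": 6000.0, "cargo_capacity": 2000.0, "cost": 95.0},
    "design B": {"speed": 20.0, "endurance": 7500.0, "cargo_capacity": 2400.0, "cost": 88.0},
}

for name, attrs in alternatives.items():
    capability = sum(weights[a] * utility(attrs[a], *bounds[a]) for a in weights)
    # Lower cost per unit of capability is better; step 7 would repeat this
    # calculation with perturbed weights and utility shapes (sensitivity analysis).
    print(f"{name}: capability = {capability:.3f}, cost/capability = {attrs['cost'] / capability:.1f}")

With these invented figures design B scores the higher capability and the lower cost/capability ratio, so a sensitivity analysis would check how robust that preference is to changes in the weights.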

System Documentation. The system design must be documented as it evolves. Traditionally, this has been done on paper by means of documents such as specifications, drawings, technical reports, and tables of data. Today, this is increasingly done utilizing integrated design systems and producing the desired documentation in electronic form. In the future, 'smart product models' will contain all necessary design documentation.

2.3.3 Synergy with Information Technology

In the 'information age', systems engineering and computer technology operate in an industrial world requiring synergistic activities. Systems engineering may allow designers to find some solutions to design problems relying on databases and decision–making support systems available on computers. When adapting the information technology (IT) of an industrial company to the principles of systems engineering, information may be provided in a synergistic way to designers almost instantly, in a quantity and quality not yet considered possible.

Designers are still involved primarily with the unstructured or partially structured parts of problems. Nevertheless, project managers as well as designers are discovering that the simple and dichotomous subdivision that separates, say, mechanical engineering, electrical engineering and naval architecture is more a historical tradition than a technical necessity. It may be a traditional and convenient means for structuring administrative entities and budgets, but it is dramatically inefficient for organizing design teams.

The combination of systems engineering and information technology makes it possible to plan and control the manufacturing process as a whole. A corollary effect of the advent of systems engineering is the blurring of the lines that have separated the traditional disciplines in both academic institutions and industry.

On the other hand, in the decades since computers became the universal tool of engineers and scientists, dramatic changes have been observed in the computers themselves and in the way of using them, which have paved the way to new related fields of research in science and technology. Designers are on the verge of being able to use computers not just as fast and accurate devices, but as partners in the design process.

2.3.4 Critique of Systems Engineering

An area of potential confusion in discussing systems engineering is its scope as a discipline in relation to activities such as design process, project management, systems management, design management, engineering design, etc. Many engineering design texts (Erichsen, 1989; Sage, 1992) present many confusing overlaps among design, design management and systems engineering, even though they emphasize the creative challenges and techniques of the design synthesis task at the initial design phases.

Differentiation between design and project management is ambiguous. Again there are significant overlaps, and many of the techniques of systems engineering are also claimed in the project management literature. Some authors state that systems engineering provides the creative heart of project management by defining the technical and work deliverables, i.e. the requirements, the design development and all the tasks necessary to build and test the industrial product. Under this view, project management becomes the set of activities associated with implementing and controlling the 'product and process blueprint' which systems engineering has provided. Thus contract management, scheduling techniques, cost control, quality assurance, etc., are activities which have no meaning without the foundation of systems engineering. An alternative, but related, view is not to see project management and systems engineering as separate disciplines, but to see project management simply as the larger canvas which must include systems engineering.

So far, the idea that systems engineering may cope with the overall design of complex products has been criticized by many experts. This paradigm is no more than a framework for thinking about the engineering process, which needs tailoring to be applicable to a particular product and project (Van Griethuysen, 2000). It is evident that many industrial products have always been designed and produced using some kind of systems engineering. It is also true that much of naval architecture and marine engineering concerned with design management is an example of systems engineering. Thus, it is not so much a question of whether systems engineering can cope with the unstructured design of industrial products, but more a question of whether in its current 'born–again' form it has anything to offer beyond the current understanding of engineering management.

Although existing systems engineering texts, often with a bias towards software/computer systems, appear not to offer anything new for the overall system design process of industrial products, there are techniques and insights to be learnt in the area of process/information systems. As engineering products become more influenced by software systems, these methods need to be added to the design management 'toolkit'.

However, the relevance of systems engineering should be advocated with due caution for the following reasons:

• The current language of systems engineering has to some extent been abused by engineering communities working in particular industrial sectors. What is presented as 'general' is in fact often 'partial', especially with respect to 'systems design'. It would undoubtedly be helpful to its wider acceptance if systems engineering publications and courses used more significant examples from a wider product base, and paid due attention to the physical aspects of the design of complex products.

• Systems engineering can be harmful if procedures are applied across products in an inappropriate or disproportionate manner; see, for example, the over–elaboration of requirements in computer databases under the banner of 'requirements engineering', without progressive design modelling to establish feasibility in terms of cost and operability.

• Whilst the concept of systems engineering, as a set of knowledge, methods and techniques which can be applied to different product areas, is a valuable one, the further step of defining systems engineering as an independent branch, or even an overall design paradigm, is highly questionable.

2.4 Concurrent Engineering

Today designers are compelled more than ever to develop strategies to allow for reduction of design time without loss of quality. According to Kusiak (1993), an appropriate paradigm for this comprehensive perspective is concurrent engineering (CE), a term which originated from the US Defense Department. At its outset, CE was the concurrent design of a technical system and its manufacturing processes. The main objective of concurrent engineering is to shorten the time from order to delivery of a new industrial product at lowest cost and highest quality. Figure 2.3 schematically shows the CE approach, where all main activities are carried out through parallelism and bi–directional integration.


Figure 2.3. Concurrent product development process

The concept first gained considerable attention during the 1980s when the United States automotive industry realized that it needed to shorten the time for developing and marketing new models in competition with the Japanese industry. Concurrent engineering has been widely accepted as an effective engineering practice for decreasing product development time, improving quality, and decreasing manufacturing costs. Since it aims to consider all elements of the product life–cycle from the outset, the CE approach increases the complexity of the design process.

In the past there has been widespread emphasis on work specialization, and the result often has been a stovepipe organizational structure with inadequate communication and transfer of information. To counter this trend, concurrent engineering aims to totally integrate the development of product and process design using cross–functional, empowered teams. The essential tenets of concurrent engineering are customer focus, life–cycle emphasis, and the acceptance of design commitment by all team members.

Concurrent engineering, like systems engineering, is more a matter of approach and philosophy than an engineering discipline. It represents a common sense approach to industrial product development in which all elements of the product's life cycle from conception to delivery are integrated in a single continuous feedback–driven design process. There are several other terms for concurrent engineering, among them 'simultaneous engineering', 'unified life–cycle engineering', 'producibility engineering', 'concurrent design', 'co–operative product development', etc.

A generally accepted definition of CE was prepared by Winner et al. (1988) and reads: Concurrent engineering is a systematic approach to the integrated design of industrial products and their related processes, including manufacturing and support. This approach is intended to cause developers, from the outset, to consider all elements of the product from conception through disposal, including quality, costs, schedule, and user requirements. This definition may be regarded as operationally oriented, implementing concurrent engineering at a low level.

Alternatively, Shina (1991) stated that '... concurrent engineering is the earliest possible integration of the overall company's knowledge, resources, and experiences in design, development, marketing, manufacturing and sales into creating successful new products, with high quality and low cost, while meeting customer expectations. The most important result of applying concurrent engineering is the shortening of the product design and development process from a serial to a parallel one'.


This definition applies more to an overall design strategy at company level. The different perspectives of the two definitions complement each other and state that concurrent engineering must be applied both to the company's overall product development and design strategy and at the operational level. The most practical definition of CE is probably: Concurrent engineering is systems engineering performed by cross–functional teams.

By considering the different definitions and semantics it is possible to designate some characteristics of concurrent engineering; the most significant may be: design, integration, parallelism, product, and process. Implicit in these characteristics, organization, communication, and requirements must be managed. Compared to traditional engineering design, in which analysis of the product plays the central role, the synthesis of the process is the dominant feature in concurrent (parallel and integrated) engineering.

Concurrent engineering is not new: as a concept it has now been around for over two decades, if one starts counting from the publication of Pennel and Winner's (1989) definition. Many of its techniques and tools have been around worldwide for a long time, but this approach packaged them into an integrated philosophy. Its implementation, therefore, goes to the very structure of an organization and its management philosophy. Implementation of concurrent engineering requires moving from:

• department focus to customer focus;

• individual interests to team interests;

• dictated decisions to consensus decisions.

Experience has shown that concurrent engineering cannot be implemented gradually and gracefully; an all–or–nothing approach is required. Such changes are clearly difficult to implement and require the expenditure of time and money, but they induce potential long–term benefits. Perhaps an even greater challenge is changing the culture of the organization. Managers and workers at all levels may be fearful of giving up some individual authority, but they must recognize that change is necessary in order to remain competitive in a global economy.

2.4.1 Basic principles and benefits

The basic principles of concurrent engineering require process orientation, a team approach, empowerment, open communication, and customer satisfaction. Concurrent engineering is characterized by a focus on the customer's requirements and priorities, a conviction that quality is the result of improving a process, and a philosophy that improvement of the processes of design, production and support is a never–ending responsibility of the entire company.

In concurrent engineering the design problem is approached by defining a multi–disciplinary design and focusing on such aspects as functional requirements, production, quality assurance and economic efficiency of the engineering product. Generally, the term concurrent engineering is connected to consideration of how the product will be manufactured, but it may also be used to describe the consideration of economy in its overall development.


As some analysts state, concurrent engineering has shifted companies from a manufacturing environment to a design environment. It has changed the way designers work, as they are compelled to interact with greater numbers of people and gain knowledge from other disciplines and organizations. Throughout all of these changes, the designers have to be the key actors in the concurrent engineering process and the design process has to drive the overall manufacturing process.

The primary benefit of concurrent engineering is improved design quality and production productivity. This can lead to increased market share achieved by:

• understanding the customer requirements and the cost implied;

• appraising one’s own products with respect to those of the competitors;

• minimizing the time and cost from concept design through production and delivery.

A design team that employs concurrent engineering principles has to include experts in requirements analysis, cost analysis (acquisition and operation), production engineering, 'ilities' (reliability, maintainability, availability), material procurement, tests and trials, and marketing.

A basic premise is that the design team has many customers. As to ships, these include the shipbuilder and shipowner staffs. Experts on crew training and logistics are also customers, particularly if the design includes new technologies. These different groups view the ship design from different perspectives, have different goals and objectives, and bring different experiences and expertise to the design team. Hence, early involvement of all these different customers will produce a better product. Expressions such as 'integrated product teams' and 'integrated product and process development' are now widely discussed. Coupling process and product is also worthy of note, since it recognizes that if an enterprise hopes to improve the product, it must first examine and improve the processes used to design and build it.

In general, the expected benefits of the concurrent engineering approach are (Winner et al., 1988):

• improving the quality of designs, which may result in dramatic reductions of engineering change orders (greater than 50%);

• reduction of product development time by 40–60% with respect to serial design processes;

• reduction of manufacturing costs by 30–40% when multidisciplinary design teams integrate product and process designs;

• reduction of scrap and rework costs by 75% through product and process design optimization.

Although concurrent engineering can be implemented in many ways, its basic elements are:

• reliance on multidisciplinary teams to integrate the design, manufacturing and support processes of a product;

• use of computer–aided design, engineering and manufacturing methods to support design integration through shared process models and databases;

• use of a variety of analytical, numerical, and experimental methods to optimize a product's design, manufacturing and support processes.


Of course, this is a simple strategy to state on paper; it is quite another matter to implement it in practice, especially when one recognizes the increasing complexity of modern products and the use of geographically distributed and multidisciplinary teams. This situation demands the use of information technology to assist the control of the concurrent processes and to ensure that a common database of information can be shared by all those involved in the product development process.

The application of concurrent engineering has several meanings to ship designers. In the past, ship designs were generally developed by a stove–piped design organization without the direct, early participation of the shipbuilder, shipowner, operators and maintainers. Nor were specialists in unique but important disciplines such as manning, cost, safety, reliability, and risk analyses involved from the outset. When these and other groups did get involved, after the design was largely complete, it was generally in a review and comment mode. By this time, changes would be difficult to incorporate without extra costs.

A customer's representative should be a design team member. The basic premise of concurrent engineering is that it is better to make design decisions (at all levels) based on real–time feedback from all who have an interest in designing, producing, marketing, operating, and servicing the final product.

2.4.2 Information Flow

In CE the early design stages are especially significant because major design decisions are made there with far–reaching effects on the engineering product being designed. Portions of a serial versus a concurrent engineering process of design are illustrated in Figure 2.4.

Figure 2.4. A comparison of serial and concurrent engineering


To achieve high–quality designs, the information flow in concurrent engineering between design engineering, manufacturing, marketing, and others has to be transferred early, and decisions are to be based on both upstream and downstream considerations (bi–directional). On the contrary, in a serial approach information flows in one direction only.

It has been recognized that, in an engineering design, most of the changes occurring in early design stages will lead to a high–quality design with significantly reduced cycle time (Sullivan, 1986). On the contrary, if most of the changes happen in late design stages, e.g. re–design, the cost of making changes will dramatically increase since design freedom is highly limited in these stages. Figure 2.5 shows the comparison of the traditional serial design approach and the concurrent engineering design approach with respect to a design time line (Kirby and Mavris, 2001). From this figure, one can see that the cost of design changes increases during engineering design and increases exponentially when the changes become necessary during manufacturing.

Figure 2.5. Serial approach vs. concurrent engineering approach

Therefore, as many changes as possible should be completed early in the design time line. To prevent costly re–designs, as much knowledge as possible should be made available at the early stage of a design and the requisite changes should be accomplished before the cost is locked in. This paradigm shift of bringing knowledge to the early design stages to increase design freedom and reduce cost is illustrated in Figure 2.6.

It is absolutely evident that, as the design process progresses and decisions are made, freedom to make changes is reduced and knowledge about the object of design increases. A product of, and a clear motivation for, concurrent engineering is to anticipate the knowledge curve, thereby increasing the ratio of hard to soft information that is available in the early design phases. This relative improvement in the quality of information should lead to products that are completed in less time and at less cost than those designed using a traditional serial process.


Figure 2.6. Cost–knowledge–freedom relations

Therefore, as briefly stated before, more and more attention is paid to the conceptual and preliminary design stages to increase the probability of choosing a design that will be successful. The decisions made during these design stages, including identifying the customer's requirements, determining the attributes of interest, and selecting analysis tools (design mathematical models), play a critical role in the design process. They are the guidance and basis that subsequent design decisions rely upon, and have an important impact on the final design solution. Therefore, these decisions in the early design stage need to be made rationally based upon decision–based design.

2.4.3 Concurrent Engineering Environment

To keep the different development processes in balance, a dynamic concurrent engineering environment must be introduced. It ensures that the different conditions for concurrent engineering are arranged, systematized and controlled.

Managing Sources of Change

The main reason to introduce a concurrent engineering environment is to manage change in organization and product development. Baker and Carter (1992) outline five sources of change, namely:

• Technology. Both the technology in a product and the technology to produce it become more complex. New technology is a source of change and it is important to have a plan for introducing new technology and managing changes allowed by the technology.


• Tools. The tools to design and produce a technical system change in time with technology. The sources of change in using advanced tools may be the degree of automation in production, the integration of product development processes, and the flexibility of the production process.

• Tasks. The variation and complexity of the tasks are sources of change. If the tasks are different from time to time, the task itself becomes a source of change.

• Talent. Each individual worker/technician/engineer may have a special talent for new ideas, which may be a source of change. In addition, the degree of change is also influenced by the individual ability to manage external or organizational changes.

• Time. The time to product delivery is important to stay in the market. Therefore, it is necessary to search for improvements which contribute to reduced overall development and manufacturing time.

The outlined sources of change are dependent on each other; for example, decisions about technologies impact the choice of design and development tools. Moreover, some aspects may be internally managed by the company while others are influenced by the company's external interaction.

To yield the desired changes, the sources of change are translated into resources by four interconnected activities:

• Organization. This includes both the organizational boundaries, such as disciplines and departments, and the physical location; typical for concurrent engineering is the design team. The organization may, therefore, be divided into managers and product design teams. The managers must establish the overall strategy and are responsible for providing a concurrent engineering environment. They must establish the product development teams by giving them the authority and responsibility to make decisions. In addition, they must provide team training and support the teams with professional and technical needs.

• Communication Network. The main purpose of a communication network is sharing information. It transfers to the members involved in the design process the overall information related to a product. In large projects with many people or several co–operating companies, establishing one development team may not be possible. In such projects internet communication technologies are required for effective information exchange and sharing. However, the most effective way of communication may still be person–to–person.

• Requirements. The customer requirements are the overall target of an industrial product. The most important items to specify in the concept design phase are the required properties and the constraints of the product. The quality function deployment method described in the following identifies the requirements and solutions in concurrent engineering.

• Integrated Product Development. This activity links all the tasks in the development process together, including considerations about support, operation and maintenance throughout the lifetime. All tasks are executed in parallel and across disciplines.


Targets

Concurrent engineering may benefit the customer in two ways. According to Blankenburg (1994), the advantages lie either in the process, which means that the customer gets the product faster and cheaper, or in the product, which means that the customer gets a better product. Therefore, process and product are the targets of the concurrent engineering process.

Process. The process includes procedures, methods, techniques, etc., to design and produce a product. Most of the literature and definitions of concurrent engineering focus on processes, and this indicates a belief that improved productivity and quality in the processes also result in improved quality of the products.

Product. The outcome of the product development process is the technical system. In addition, outcomes may also consist of other services which secure support over its lifetime (user instructions, refitting, etc.). The main goal of the product is to satisfy the quality and functionality required by the customers.

Mechanisms

Although a considerable number of studies have been devoted to design decomposition as a means to reduce the complexity of a large–scale design problem, only recently has due attention been given to the computational framework for dynamic and systematic design integration in a computer network–oriented design environment. On the basis of the integrated product development model, Blankenburg (1994) introduces three mechanisms of concurrent engineering: integration, prescience and parallelism. These mechanisms are necessary to accomplish the activities in the product development process according to the concurrent engineering concepts.

Integration. It is important to secure all available, relevant information and knowledge about a product during product design and manufacturing, and to ensure that they are taken into consideration. No single discipline or department alone has the information or knowledge necessary to consider all the elements influencing a product during its lifetime. Therefore, the knowledge and information from different disciplines and departments must be integrated. Regarding the integrated product development model, a vertical integration between the market, product and production ensures that the information and knowledge from the different departments are taken into consideration. To consider elements from the different phases of the product development process, a horizontal integration is necessary. The horizontal integration includes in the early design phases considerations from late phases, such as manufacturing and operation. This may be done by including people from late phases, for example manufacturing and operation, in the concept design phase. The advantage of integration is that all special information and knowledge that usually belongs to a particular discipline, department or phase is shared and taken into consideration before a decision is made. This ensures that decisions are made in co–operation and across disciplines and organizational boundaries.

Prescience. Prescience aims to search for and identify forthcoming activities of high uncertainty, as well as to execute parts of these activities in search of information to reduce the uncertainty in preceding activities. Prescience ensures short feedback times and short iterations instead of long iterations from late to early activities.

Parallelism. A way to shorten the time of the product development process is to execute activities in parallel, i.e. at the same time, independent of the function or the phase to which they belong. However, the extent and contents of the different activities influence the dependencies and impose some restrictions on parallelism. The restrictions may be divided into three groups (a minimal scheduling sketch follows the list):

• resource dependencies, which are restricted by the resources available, e.g. quantity and quality of people, hardware and software;

• precedence dependencies, which are caused by natural limitations;

• information dependencies, which occur when one activity depends on the information output from another activity that has still to be executed.
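As a rough sketch of how precedence and information dependencies limit parallelism, the following Python snippet computes an earliest–start schedule for a handful of hypothetical design activities with invented durations; an activity starts only when every activity it depends on has finished, so independent activities run in parallel. Neither the activity names nor the durations come from the text.

# Hypothetical design activities with durations (weeks) and the activities they depend on
activities = {
    "hull form":           {"duration": 3, "depends_on": []},
    "general arrangement": {"duration": 4, "depends_on": ["hull form"]},
    "structure":           {"duration": 5, "depends_on": ["hull form"]},
    "machinery":           {"duration": 4, "depends_on": ["general arrangement"]},
    "outfitting":          {"duration": 3, "depends_on": ["general arrangement", "structure"]},
}

finish = {}

def earliest_finish(name):
    # An activity starts when all its predecessors have finished (precedence and
    # information dependencies); activities without a mutual dependency overlap.
    if name not in finish:
        act = activities[name]
        start = max((earliest_finish(dep) for dep in act["depends_on"]), default=0)
        finish[name] = start + act["duration"]
    return finish[name]

for name in activities:
    earliest_finish(name)

serial_time = sum(act["duration"] for act in activities.values())
print(f"parallel lead time: {max(finish.values())} weeks; serial lead time: {serial_time} weeks")

With these invented figures the parallel schedule takes 11 weeks against 19 weeks for a purely serial sequence, which is the kind of lead–time reduction that parallelism aims at.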

Concurrent Engineering Matrix

The targets (product, process) and mechanisms (integration, prescience, parallelism) of concurrent engineering influence each other according to the arrangement of a two–dimensional matrix as shown in Figure 2.7. The matrix shows that combining integration and prescience increases the quality and minimizes the uncertainty of the product and of the manufacturing processes. Further, a combination of prescience and parallelism reduces the lead–time and controls uncertainty. These combinations distinguish concurrent engineering from the traditional, serial approach to product development and design.

Figure 2.7. The concurrent engineering matrix

The challenge of a concurrent engineering environment may be summarized as a combination of mechanisms (integration, prescience, parallelism) which concurrently advance the targets (product and processes), all supported and arranged by the five sources of change and the four main activities of concurrent engineering.


2.5 Quality Function Deployment

Engineering systems have become increasingly complex to design and build, while the demand for quality and effective development at lower cost and in shorter time continues. Today, many companies are facing rapid change, stimulated by technological innovation and new customer demands. They realize that if they can deliver new products earlier than their competitors, they have a good chance of obtaining a major advantage in the market. Thus, designers are attempting to shorten the duration of new product development through the use of concurrent engineering concepts and good time estimation techniques.

However, many new industrial products with short development time are not successful. This is mostly because the design teams did not focus on actual customer demands and expectations from the very initial design phases. To prevent unsuccessful products, quality should be addressed well before the functional and embodiment design phases to avoid developing products with low customer satisfaction. Quality is the measure of how well a product satisfies a customer at a reasonable cost during its lifetime (Priest and Sanchez, 2001). Accordingly, the major challenge that system designers face is to bring technical products to meet customers' expectations cost–effectively.

There are different groups of properties used to determine quality. Setting quality concerns three partial areas during the comprehensive development of a technical system:

• quality of design, which has the largest influence on the overall quality; it can be ensured through a methodical and transparent design process, beginning with the definition of needs and extending through requirement analysis, design synthesis, design evaluation, functional analysis and system validation; this quality absolutely demands an objective set of technical knowledge (design science) as well as special techniques to put quality under risk control (robustness analysis);

• quality of manufacturing, which is measured on the produced components of the technical system; it is known as quality of conformity to the manufacturing specifications, i.e. to the detail drawings;

• quality of application, which appears only when the technical system is employed; this also includes the secondary processes, such as maintenance, repair, refit, upgrading, etc.

Knowledge systems have been developed for each of these three partial areas. The relevant standards for recognition and verification of a quality assurance scheme are ISO 9000, ISO 9001, and ISO 9002.

Companies employ different design strategies to suit quality assurance. Well–established large companies are generally more likely to adopt low–risk strategies because of the large losses that could accrue from the failure of a product in the marketplace. At the same time, these companies are very aware of the need to ensure customer satisfaction via product quality and to compete in global markets with reducing time available for product development. The challenges are significant and have led to the development of complete methodologies aimed at ensuring that design teams produce customer–driven designs that are 'right first time' and delivered very quickly.


One such methodology is known as quality function deployment (QFD), which stands for:

• quality : meeting customer requirements;

• function: focusing attention on what must be done;

• deployment : indicating who will do what and when, perhaps even how.

Quality function deployment is a method employed to facilitate the translation of a prioritized set of customer demands into a set of project targets by means of applied statistics. In particular, it facilitates identification of a set of system–level requirements during conceptual design. It is also applicable to all project phases during product, parts, process and production planning.

The QFD method was originally developed and implemented at the Kobe Shipyards of Mitsubishi Heavy Industries in the late sixties to support the design process for large ships. During the 1970s, Toyota and its suppliers further developed QFD in order to address design problems associated with automobile manufacturing. Toyota was able to reduce start–up and pre–production costs by 60% from 1977 to 1984 through the use of QFD. During the 1980s, many US–based companies began employing QFD. It is believed that there are now over 150 major corporations using QFD in the United States, including Motorola, Compaq, Hewlett–Packard, Xerox, AT&T, NASA, Eastman Kodak, Goodyear, Ford, General Motors, and the housing industry.

The philosophy behind QFD is that product design must reflect customer requirements from the start of the project and of the product development. It needs multidisciplinary coordination. To involve all related actors, such as marketing experts, engineers and manufacturers, it is usual to establish a project team to carry out the QFD analysis. The aim of the project team is to integrate all necessary aspects regarding the product development. In addition, to expose interdisciplinary and functional relationships, the QFD method allows weighting of criticality and stimulates teamwork.

The QFD process is relatively simple to outline on paper but requires significant commitment to achieve in practice. It aims to identify and record customer requirements and then translate these into design requirements and product component characteristics. Basically, the translation involves restating the often vague customer requirements as specific design targets and engineering characteristics. As a consequence, it requires the identification of operating requirements and manufacturing procedures that will ensure that the customer viewpoint is maintained throughout the design, manufacturing and test process. If successfully applied, the result should be a deeper understanding of customer needs coupled with better organized and more efficient projects. Additionally, there should be a smoother introduction to production with fewer design changes and, of course, enhanced quality accordingly.

Quality function deployment has also been defined as a system for designing a product based on customer demands and involving all members of the producer or supplier organization. It is a planning and decision–making tool. The method is based on a matrix transformation (Fig. 2.8) which is shaped like a house and is, therefore, also referred to as the house of quality (HoQ). The process involves constructing one or more matrices or quality tables (Cohen, 1995).


House of Quality

Figure 2.8 shows the principles of the house of quality, which forms the map of the quality function deployment analysis. The left part of the HoQ contains the whats, that is, the attributes identified to best describe customer demands. The top part of the HoQ identifies the designer’s technical response relative to the characteristics (attributes) that must be incorporated into the design process to respond to the customer requirements. This constitutes the how, a set of design characteristics (technical measures of performance). The inter–relationships among the design characteristics are identified in the roof (correlation matrix). The center part of the HoQ conveys the strength or impact of the proposed technical response on the identified requirements (relationship matrix).

Figure 2.8. House of quality

The structure of a house of quality (HoQ) depends on the objective, development stage, and scope of the QFD project, and thus different HoQ charts have different components. However, there is a set of standard components that includes the following:

• marketing and technical benchmark of data from customer and technical competitive analysis;

• customer requirements (attributes) and their relative importance;

• design characteristics (product specifications);

• relationship matrix between customer requirements and design attributes;

• correlation matrix among design attributes;

• computed absolute/relative importance ratings of design attributes.

Mapping the house

There are no general rules to follow in deriving the house of quality, and the design team must customize its own house according to the needs and the purpose of the analysis. Some general guidelines are outlined in the following.

Step 1. The analysis starts by identifying the customer’s requirements, such as wants and needs, likes and dislikes, termed the whats, and bringing out the so–called customer attributes (CA). The customer is defined as any user of the design (shipowner, ship operator, shipbuilder, etc.). The needs and desires of these customers are identified, based on consensus, including a prioritizing of relative importance, which is a weighting of the benefits the customer expects to get by fulfilment of the CAs. A benchmark of the product’s competitiveness with regard to the competitors may be assessed.

Step 2. The next step makes a list and description of the hows, also called design characteristics (DC), which affect one or more CAs. The DCs must be measurable and possible to control by the project team, and should ensure that the CAs are satisfied. There must be at least one how for each what, and there may be more. Also, each how will typically influence more than one what. The relationships between the CAs and the DCs are marked in the relationship matrix. If a DC has no relationship to any CA, it satisfies no customer needs and may be discarded.

Step 3. The hows and whats are then correlated by means of the what–how relationship matrix, which is the core matrix of the QFD and correlates the CAs and DCs. Once the CAs and DCs are linked, objective measures should be added, which quantify customer satisfaction, preferably stated in units of measurement. Each DC may be marked with a positive ‘+’ or negative ‘-’ sign. A positive sign means that increasing the value of the attribute to be measured will benefit the user, a negative sign the opposite. The hows associated with each what are noted in the appropriate boxes of the matrix and the strength of each association is estimated. By this means, the relative benefits of each how can be expressed numerically, that is, the hows can be weighted.

Step 4. This step fills the roof matrix, which exposes how the hows are correlated with one another; the related relationships are also rated. This is done in the roof of the HoQ. Changes may lead to improved or impaired quality and affect different characteristics. The challenge is to balance the matrix so as to achieve the optimal benefit for the customer. By exposing these interactions, the roof matrix may contain the most important information of the QFD analysis.

Step 5. This step sets the priorities for improving components by assigning attribute weights to actual components. The components to consider may be selected according to the design team’s own priorities and may include technical items, attributes’ importance and estimated costs.

Step 6. The intention of this step is to help the design team to set the targets. These are entered at the bottom of the house. The targets are determined by looking at the relative importance as provided by the users.

Application of HoQ

The QFD team employs HoQ analysis to understand which design characteristics maximize satisfaction of the customer requirements and by how much these characteristics have to be improved to achieve preference over the competitors. To answer the first question, the relative importance of DCs is calculated taking into account the importance of the customer attributes (CA), as shown in Figure 2.9.

Figure 2.9. A house of quality chart

To answer the second question, traditional QFD finds a proper design strategy to improve customer perception using trial–and–error methods. In the example given in the figure, the matrix has four CAs and five DCs. It may be observed that CA4 has the maximum importance level and CA1 has the minimum one. So, the design team will at first try to improve CA4 in order to increase total customer satisfaction. CA4 is affected strongly by DC2, even though DC2 is not the most important design characteristic. The design team may nevertheless prefer to improve DC3 first, because this strategy can improve three attributes at the same time. The cost of improvement is another criterion for the selection of some DCs vis–à–vis their level of improvement. In the end, the customer decision depends on the total cost, and thus the product development team has to control this design variable. The design team has many possible choices for improvement, with various effects on customer satisfaction and on the total cost of the product. Therefore, many types of quantitative models, especially optimization models, have been developed to help QFD teams.
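The importance calculation itself is simple bookkeeping. The following Python sketch illustrates it under assumed data: the CA importance weights and the 4×5 relationship matrix (using the common 9–3–1 strength convention) are hypothetical illustration values, not the data behind Figure 2.9.

# Sketch of the HoQ importance computation: absolute importance of each
# design characteristic (DC) is the importance-weighted column sum of the
# relationship matrix; relative importance normalizes the result to 100 %.
# CA weights and matrix entries (9 strong, 3 medium, 1 weak, 0 none) are
# hypothetical.
import numpy as np

ca_importance = np.array([2.0, 3.0, 4.0, 5.0])   # CA1 ... CA4

relationship = np.array([   # rows: CA1..CA4, columns: DC1..DC5
    [9, 0, 3, 0, 1],
    [0, 3, 9, 1, 0],
    [1, 0, 3, 9, 0],
    [0, 9, 3, 0, 3],
])

absolute = ca_importance @ relationship
relative = 100.0 * absolute / absolute.sum()

for j, (a, r) in enumerate(zip(absolute, relative), start=1):
    print(f"DC{j}: absolute = {a:5.1f}, relative = {r:5.1f} %")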

Due to the inherent differences between market analysis and design strategy, it is assumed that the QFD process is performed in two phases:

1. setting target values related to each attribute according to customer requirements;

2. determination of the design attributes to maximize performance achievements and to minimize product cost.

In the second phase the goals set in the first phase are actually achieved. This approach is consistent with the inherent characteristics of QFD in determining a successful design strategy.

Linking the houses

The QFD analysis not only gives the customer’s requirements for the product, but is also suitable during later phases of the project, mainly concerned with detailed design. Figure 2.10 shows the QFD matrix chain.

To continue the HoQ into succeeding phases, the whats of the preceding house are transformed to hows in the next phase. In each successive matrix, correlations can be identified and the impacts of these correlations can be judged. By this multi–step process, the customers’ desires can be linked to system features and the relative importance of various system features can be assessed. This knowledge can be used to influence the allocation of design resources and the numerous trade–off decisions that must be made during design development.

Figure 2.10. The QFD matrix chain

2.6 Design Evaluation Methods

Decisions must be made at every stage of the design development process when selecting among the technical alternatives that are capable of meeting the functional requirements.

Traditionally, it has been assumed that the technical requirements are mutually compatible. In this case a few feasible alternatives can be developed, selection attributes (or an objective function) established, the criteria applied and a basis ship selected. In this serial process no real decision–making is involved.

On the contrary, when one wants to consider the real design situation, where the criteria governing a selection are in conflict, the decision–making process is as important as the quantitative outcomes upon which decisions are based. Multicriterial decision–making (MCDM) methods are designed to address this kind of problem (see Chapter 3). The underlying methodologies are still under development and not yet widespread in the shipbuilding community.

As a transition between the traditional design approach and a rational one based on MCDM methods, a number of design evaluation methods and tools have been developed to enhance individual and group evaluation activity in the design process. They are:

• Controlled Convergence Method

• Weighted Attributes/Objectives Method

• Systematic Method

• Probabilistic Method

Each of the above methods employs decomposition to enable design evaluation, and assumes the availability of soft and hard information about the design being evaluated. Therefore, each method fits a specific design phase, as reported in Table 2.2.

Method                            Design Phase

Controlled Convergence            Concept
Weighted Attributes/Objectives    Basic
Systematic                        Preliminary
Probabilistic Design              Concept/Basic

Table 2.2. Evaluation methods

The increasing complexity of technical systems requires the application of rigorous, team–based and objective evaluation, which demands a clear knowledge of the pros and cons of each support technique and of how and when it can be effectively applied within the design process. It is also worth noting that, with the exception of the probabilistic design option, all these methods assume deterministic evaluations. This means that there is an assumption of certainty about specific values of design properties. The more detailed approaches are best applied within subsystems, with increasing support from computer–aided packages.

Controlled Convergence Method

The controlled convergence method was developed by Pugh (1980) and reflects the fact that the attention of the design team may be initially divergent, generating lots of alternatives, and then convergent towards one design solution. This evaluation method is a non–numeric and iterative tool for concept selection, which has the joint goals of both narrowing and improving the choice of feasible concepts. It therefore seeks to identify specific strengths and weaknesses of alternative design concepts. The method encourages a cyclic process of expanding the number of concept designs and eliminating infeasible alternatives before converging towards the ‘preferred solution’.

With reference to Table 2.3, the controlled convergence method uses a simple matrix to compare feasible concept designs against a set of pre–defined attributes, which should be driven by a clear understanding of customer requirements and normative rules. The list of attributes forms the vertical axis of the matrix, whereas the concept designs form the horizontal axis.

To compare concept designs, one of them is assumed as the ‘datum concept’ (‘◦’). It does not matter which one, but it can help if it is a technical system that already exists. Each concept is compared with the datum for each attribute. If the concept is better than the datum in a property, the corresponding attribute is marked as ‘+’; if it is worse, it is marked as ‘-’; if it is similar to or the same as the datum, it is marked as ‘n’ (neutral). For each design, the total numbers of ‘+’, ‘-’, and ‘n’ scores are added up, taking the ‘-’ total away from the ‘+’ total. Each concept will now have a net score, and it is possible to rank the competing designs in preferential order.

Attributes          Design 1   Design 2   Design 3   Design 4   Design 5   Design 6

Stability               ◦          +          +          -          -          n
Power                   ◦          -          +          +          -          -
Endurance               ◦          +          +          -          +          +
Payload                 ◦          +          -          +          -          +
Seakeeping              ◦          n          +          -          n          -
Manoeuvring             ◦          -          n          n          +          +
Vibration               ◦          n          +          n          -          +
Acquisition Cost        ◦          -          -          +          -          +
Operating Cost          ◦          +          +          -          +          n

Net Score               ◦          1          4         -1         -2          3
Rank                    4          3          1          5          6          2

Table 2.3. Evaluation in controlled convergence method
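The scoring just described is simple enough to automate. The following Python sketch reproduces the net scores and ranking of Table 2.3, with Design 1 taken as the datum; it is an illustration of the bookkeeping only, not part of the original method description.

# Controlled-convergence (Pugh) scoring of Table 2.3.
# Design 1 is the datum; '+' better, '-' worse, 'n' neutral.
marks = {
    "Stability":        "++--n",
    "Power":            "-++--",
    "Endurance":        "++-++",
    "Payload":          "+-+-+",
    "Seakeeping":       "n+-n-",
    "Manoeuvring":      "-nn++",
    "Vibration":        "n+n-+",
    "Acquisition Cost": "--+-+",
    "Operating Cost":   "++-+n",
}

designs = [f"Design {i}" for i in range(2, 7)]
net = {d: 0 for d in designs}
for row in marks.values():
    for d, m in zip(designs, row):
        net[d] += (m == "+") - (m == "-")

# The datum scores 0 by definition; rank all six designs together.
scores = {"Design 1": 0, **net}
for rank, d in enumerate(sorted(scores, key=scores.get, reverse=True), start=1):
    print(f"{rank}. {d}: net score {scores[d]:+d}")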

The design team eventually might repeat the cycle, taking one of the stronger candidates as the new datum, while increasing the level of detail in the evaluation attributes and adding new attribute(s).

Weighted Attributes/Objectives Method

The weighted attributes method is a very straightforward evaluation method. It applies weight factors according to the relative importance given to each attribute/objective. This method has been applied mainly during the concept design phase, when an appropriate amount of information is available. It differs from the ‘controlled convergence method’ in that cardinal scales (numerical ratings) are used to evaluate the degree of match between the design outcomes and the specifications. Weight factors are applied to each attribute/objective to reflect the relative importance of each design characteristic to the overall quality of the design. When a rating is multiplied by the corresponding weight factor, a weighted score results for each attribute/objective. The sum of the weighted scores yields a total weighted score, providing the means of comparing the overall performance of each design. A scale of 1 through 10, or alternatively of 0 through 1, is generally used to rate each design against the design attributes.

It may turn out that even the best ranked design concept still possesses relative weaknesses in some important design characteristics. While keeping that design for further development and analyses, it is important to bear in mind that these weaknesses can and should be eliminated in further phases of the design process.
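As an illustration of the arithmetic, the following Python sketch applies the method to three hypothetical design concepts rated on a 1–10 scale; the attributes, weights and ratings are invented for the example and carry no relation to any specific ship.

# Weighted attributes method: rating x weight, summed per design.
weights = {"Resistance": 0.30, "Seakeeping": 0.25,
           "Payload": 0.25, "Building cost": 0.20}

ratings = {   # design -> attribute -> rating on a 1..10 scale (hypothetical)
    "Design A": {"Resistance": 7, "Seakeeping": 6, "Payload": 8, "Building cost": 5},
    "Design B": {"Resistance": 5, "Seakeeping": 8, "Payload": 6, "Building cost": 7},
    "Design C": {"Resistance": 8, "Seakeeping": 5, "Payload": 7, "Building cost": 6},
}

totals = {d: sum(weights[a] * r for a, r in attrs.items())
          for d, attrs in ratings.items()}

for d, score in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{d}: total weighted score = {score:.2f}")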

Systematic Method

The systematic method is a particular evaluation method developed by Pahl and Beitz (1984). It is very similar to the ‘weighted attributes method’, although it has generally been used at the preliminary design phase, when much more hard information is available. This is reflected in a growing number of more detailed design characteristics that may be used as a measure for the design outcomes of product subsystems. Since at the preliminary and contract design phases enough information is available in the design process, via direct computations and experimental analyses, to obtain more accurate values for most of the design properties, the systematic method is used for weighted optimization of specific subsystems. Once again, when the value of each objective is multiplied by its weight factor a weighted score results. The sum of these weighted scores provides the relative ranking of each subsystem variant.

Probabilistic Design Method

This method is attributable to Siddall (1983) and is important in that it reflects the uncertainty of evaluation at the very initial stages of the design process. It is flexible enough to deal with uncertainty, which is an all–pervading and dominant characteristic of engineering practice. A probabilistic design can be defined as a design in which the design team codifies uncertainty by probability distributions. Evaluating this uncertainty is a design decision.

Figure 2.11. Value and probability density curves for engine power

An important feature of the probabilistic design method is its use of the ‘value probability distribution’ (corporate utility) of each design as the decision criterion when the design characteristics are random variables. The probability density curves reflect the uncertainty in the minds of the designers. A simple graph with a utility scale on the y axis and a design characteristic value scale on the x axis is used to reflect the customer view of the importance of achieving particular values for a design characteristic. In the example shown in Figure 2.11, the value curve (utility curve) shows that there is a preference for a lower value of power; indeed, as the power increases beyond 15 MW the utility starts to drop significantly. It is worth noting that this technique is useful in determining target values for the product design specifications when the design is in the conceptual phase. Members of the customer staff may be asked to sketch these utility curves for a range of design properties of the potential product, giving the designers a clear indication of design targets they should try to achieve.

Once the utility curve (value curve) is available, the design team superimposes the evaluation of each design attribute value for each design option. In Figure 2.11 this is illustrated for two competing designs. In Design 1 there is a greater probability that the power will be 20 MW, but there is a very small probability that it could be as low as 10 MW or as high as 32 MW. A higher average and a wider spread are indicated in Design 2. In this case, Design 1 is clearly preferred, since it would result in higher utility for the customer and there is more confidence in the design achieving a more acceptable power level. As the design progresses and more specific information is available, it may become possible for the probability density curve to reduce to one deterministic value.
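The comparison can be made concrete by computing the expected utility of each design. The following Python sketch assumes a simple piecewise–linear value curve that falls off beyond 15 MW and normal power distributions loosely matching the description of Figure 2.11; all numbers are illustrative assumptions, not the data behind the figure.

# Probabilistic evaluation: each design's installed power is a random
# variable; the expected utility E[u(P)] is estimated by Monte Carlo
# against the customer's value curve. Curve and distributions are assumed.
import numpy as np

def utility(p_mw):
    """Piecewise-linear value curve: full value up to 15 MW, then falling."""
    return np.clip(1.0 - np.maximum(p_mw - 15.0, 0.0) / 20.0, 0.0, 1.0)

def expected_utility(mean, std, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    power = rng.normal(mean, std, n)      # Monte Carlo sample of power values
    return utility(power).mean()

# Design 1: power most likely around 20 MW, modest spread.
# Design 2: higher average power and a wider spread.
print(f"Design 1: E[u] = {expected_utility(20.0, 3.5):.3f}")
print(f"Design 2: E[u] = {expected_utility(25.0, 6.0):.3f}")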

Unfortunately, almost no technique has succeeded in combining the multicriterial decision–making approach with the probabilistic design method. Bandte (2000) has tried to overcome this deficiency by generating a multivariate probability distribution that serves, in conjunction with a criterion value range of interest, as an applicable objective function for multicriterial optimization and product selection.

2.7 Decision–Based Design

Any discussion about designing the technical systems of tomorrow, using approaches based on systems thinking and information technology, must include concurrent engineering design for the life cycle. Further, while the targets of concurrent engineering are clear, there is no generally accepted model for the design process able to combine concurrency and rational design for the lifetime of an industrial product. It is unlikely that one model will emerge as the ultimate model of design for all industrial products and processes. Therefore, only a paradigm, such as decision–based design, can be advocated, aiming to make rational, value–based decisions in a realistic design environment.

Decision–based design is a term coined to emphasize that the main role of designers is to make decisions (Mistree et al., 1990). Therefore, design methods are to be based on a paradigm that springs from the perspective of decisions made by designers, as opposed to design that is simply assisted by the use of optimization methods or specific analysis tools. In decision–based design, decisions serve as markers to identify the progression of a design from conception through delivery.

2.7.1 Basic Principles

Some basic principles from a decision–based design perspective are as follows:

• design involves a series of decisions which may be serial or concurrent;

• design implies hierarchical decision–making and interaction among decisions;

• design productivity can be increased by combining the usage of prescriptive models (analysis, synthesis, and evaluation) and more powerful and capable computers in processing numerical information;

• life–cycle considerations that affect design can be modelled in upstream design decisions;

• techniques that support the design team’s decision–making should be:

– process–based and discipline–independent;
– suitable for solving uncertain, imprecise, and ambiguous problems;
– suitable to facilitate self–learning.

Decision–based design is the decision–making paradigm that translates information into knowledge, provided design decisions are governed by the following main properties:

• decisions on design are ruled by multiple measures of merit and performance;

• decisions involve hard and/or soft information that comes from different sources and disciplines;

• none of the decisions yields a singular, unique (ideal) solution, since any decision taken is less than optimal.

2.7.2 Design Types and Structures

Three different types of design, namely original, adaptive and variant, may be distinguished according to the amount of originality included (Pahl and Beitz, 1984):

• Original Design: original solution principles are used to design an innovative product; for example, in shipbuilding an original design occurs only when the well–known and abused ‘basis ship’ design procedure cannot be employed.

• Adaptive Design: an existing design is adapted to different conditions or tasks; thus, the solution principles remain the same but the technical product will be sufficiently different to meet the new targets derived from the specifications; the ‘basis ship’ approach is an example of adaptive design.

• Variant Design: only the size and/or arrangement of subsystems of an existing technical product are varied, so that the solution principles are not changed.

The type of design has its influence on the type of tools and the amount of design interaction required. As shown in Figure 2.12, a variant design is an integral part of an adaptive design, which in itself can be viewed as a subset of an original design. Whether the design process is classified as original, adaptive or variant greatly depends on the perspective chosen. The application of steam power to ships, which occurred during the industrial revolution, generated original design principles for providing waterborne transportation. Clearly, this represented a discontinuity in the development of design solutions for naval and merchant ships. However, if the design procedure is classified based on the functional requirements of the entire product, simultaneous designing in all three categories is possible.

The capability to structure the design process using a set of decision entities is one of the main features of decision–based design. Modelling processes (e.g. design, manufacturing, maintenance) helps designers to identify the right problem at the right level and to structure each process so as to ensure the ‘best possible’ outcome. Without modelling the design process, it is impossible to provide suitable guidance for improving the efficiency and effectiveness of a design team. By focusing upon decisions, a means should be provided for creating models of decision–based processes, supported by a computer–based ‘decision support system’ (Bras et al., 1990).

Figure 2.12. Design types

In industrial engineering there is an increasing awareness that decisions made by designers could be the key element in the development of design methods that facilitate design for the life cycle and promote concurrency in the process (Suh, 1984; Whitney et al., 1988; Hills and Buxton, 1989; Mistree et al., 1991; Zanic et al., 1992).

The starting point for representing a designer’s perception of the real world and design environment is a heterarchical set of activities, arranged without discernible pattern and with no activity being dominant. Typically, the heterarchical set associated with a product lifetime includes market analysis, design, manufacturing, maintenance of the product and its subsequent scrapping. In decision–based design this heterarchical set embodies decisions or sets of decisions (decision entities) that characterize the designer’s judgment. A hierarchical set of activities, on the other hand, characterizes the sequence of decision entities involved and, hence, heavily influences the design product. Knowledge and information entities may link the decision entities in both heterarchical and hierarchical representations. In a heterarchical structure there are connections between nodes, but the structure is recursive, without a permanent uppermost node or a well–identified starting point (Fig. 2.13).

A design process starts when the first step is taken to extract a hierarchy from a heterarchy, that is, when the dominant node is chosen. In practice, transforming a decision heterarchy into a decision hierarchy requires identifying a correct starting point, one that leads to a plan of action that is both viable and cost–effective.
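To make the idea concrete, the following Python sketch shows a hypothetical heterarchical network of life–cycle activities and how choosing one of them as the dominant node yields a hierarchy (here simply a breadth–first tree rooted at that node). The activity network and the extraction rule are illustrative assumptions only, not a prescription from the text.

# Extracting a hierarchy from a heterarchy by choosing a dominant node
# and traversing the (rootless, recursive) network from it.
from collections import deque

heterarchy = {   # undirected links between decision entities (hypothetical)
    "market analysis": {"design", "maintenance"},
    "design":          {"market analysis", "manufacturing", "maintenance"},
    "manufacturing":   {"design", "scrapping"},
    "maintenance":     {"design", "market analysis", "scrapping"},
    "scrapping":       {"manufacturing", "maintenance"},
}

def extract_hierarchy(network, dominant):
    """Return parent links of a tree rooted at the chosen dominant node."""
    parent, queue, seen = {dominant: None}, deque([dominant]), {dominant}
    while queue:
        node = queue.popleft()
        for nxt in sorted(network[node]):
            if nxt not in seen:
                seen.add(nxt)
                parent[nxt] = node
                queue.append(nxt)
    return parent

print(extract_hierarchy(heterarchy, dominant="design"))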

Figure 2.13. Heterarchical and hierarchical sets

With knowledge brought forward along the design time line, designers are able to make more rational decisions. The Integrated Product Process Development (IPPD) approach, illustrated in Figure 2.14, encourages moving information forward in the design process. IPPD is concerned with upfront activities in the early design phases and allows the designers to decompose the product and process design trade iterations throughout a system’s life cycle (Marx et al., 1994). The implementation of IPPD reorders decision making and brings downstream and global issues to bear earlier, in concert with conceptual and detailed planning (DoD, 1996); so it can allow the design team to make better decisions in the early design stages.

Figure 2.14. Hierarchical process flow for technical system integration

2.8 Decision Support Systems

To provide support for selection in designing technical systems, computer–aided decision support systems (DSS) are very effective. They assist decision makers in considering the implications of various courses of action and can help reduce the risk of human errors.

The concept of DSS was introduced, from a theoretical viewpoint, in the late 1960s. In general, decision support systems can be defined as computer systems that provide information from databases and mathematical models, analyze it using decision–making techniques according to customer specifications, and finally yield the results in a format that users can readily understand and use. Thus, the basic target of DSS is to provide the necessary information to the decision makers in order to help them get a better understanding of the decision environment and select among feasible design alternatives.

A typical structure of a decision support system includes three main components:

• a mathematical design model rooted in knowledge–based systems;

• a multicriterial decision–making shell for implementing decision support tools;

• a set of user–friendly interfaces connecting evaluation modules and selection procedures.

A decision support system is intended to support any of the different types of design, namely original, adaptive and variant. It requires the implementation of two design phases, namely a metadesign phase and a computer–based design phase. Metadesign is accomplished by partitioning a design problem into its elemental entities and then devising a plan of action by establishing hierarchical sets. Multiple attributes and multiple objectives, quantified using insight–based soft information and analysis–based hard information, respectively, can be modelled, providing domain–specific mathematical models (metamodels) to reduce uncertainty in design decision–making.

Overall design and manufacturing processes may be modelled via DSS using entities such as phases, events, tasks and decisions. The formulation and solution of a decision support system provide a means for allowing different types of decisions:

• Heuristics: decisions made on the basis of a knowledge base and rules of thumb;

• Selection: decisions based on multiple attributes, weighted according to preferences, for the ‘best possible’ design among nondominated alternatives (Kuppuraju et al., 1985; Trincas et al., 1994);

• Robustness: managing the risk and uncertainty related to exogenous design parameters (Allen et al., 1989; Grubisic et al., 1997);

• Compromise: improvement of the ‘best possible’ design through further optimization of subsystems (Lyon and Mistree, 1985).

Applications of decision support systems include the design of ships, aircraft, mechanisms, thermal energy systems, etc. They have also been developed for hierarchical design, where selection–compromise, compromise–compromise and selection–selection decisions may be coupled (Bascaran et al., 1989).

2.8.1 Design Time Line

An industrial product life cycle has a beginning and an end, with certain specific events occurring at approximately predictable points during this lifetime. Time in its development processes may be modelled using event–based time rather than physical time. As noticed earlier, the principal target of the design process is to convert information that characterizes the needs and requirements for a technical system into knowledge about the product itself. From the standpoint of the information necessary for making decisions in each of the design phases, what is important is that:

• the types of decisions being made (e.g., selection, compromise, robustness analysis) are the same in the initial design phases of all technical systems;

• the quantity of hard information with respect to soft information increases as the knowledge about the product increases.

In decision support systems (see Figure 2.15, which provides an example incorporating designing for concept and designing for manufacturing), the ratio of hard–to–soft information available is a key factor in determining the nature of the support that a design team needs as the search for a solution proceeds. Hence, it is mandatory to define any of the design processes in terms of phases (e.g., designing for concept and designing for manufacturing) and identifiable milestones or events (e.g., economic viability, preliminary synthesis, detailed analysis).

Figure 2.15. A typical design time line

Using the hard–to–soft relationship makes it intuitively possible to categorize computer–based tools for design; for example, tools used to provide support for the decision–making activities form one category, while analytical, numerical and statistical codes that facilitate evaluation of the engineering product’s performance form another category.

The simplified time line for an original design (Fig. 2.15) shows how, in the designing–for–concept phase, a net as wide as practicable is cast in order to generate as many feasible solutions as possible and then to select the ‘compromise’ concept, which best satisfies the functional specifications. In designing for manufacturing, the goal is to ensure that the product can be manufactured cost–effectively. Even if it is not explicitly shown in Figure 2.15, in practice iteration between events and phases will occur.

Event: Conceptual Design

  Feasibility
      Generate a large number of feasible concepts
      (two/three decks, single/twin–screw, diesel/diesel–electric)

  Decision via Initial Selection
      Generate and select the ‘best possible design’ among
      non–dominated solutions, subject to multiple constraints

  Engineering
      Functional feasibility of the ‘preferred’ concepts
      given basic requirements

  Decision via Selection
      Select the ‘robust design’ for manufacturing development
      (establish the cost–effectiveness and manufacturability)
      (develop top–level specifications)

Event: Preliminary Design

  Decision via Compromise DSP
      Improve the functional effectiveness of the ‘robust design’ through modification
      (establish and accept a ‘satisficing’ design)

  Contract Assignment

Event: Contractual Design

  Engineering
      Based on information provided in preliminary design, check the functionality of the
      improved design, subject to a comprehensive set of functional requirements
      for subsystems, and develop detailed information on acquisition cost

  Decision via Refined Compromise DSP
      Improve, through modification, the functional and cost–effectiveness of the final design
      (refine the compromise DSP by including information on costs and manufacturability -
      establish and accept the improved design)

Event: Functional Design .....

Event: Detail Design .....

Table 2.4. Flow of designing for an original concept

2.8.2 Designing for an Original Product

One possible scenario to accomplish an original design, from concept through preliminary design, for a ro–ro vessel is shown in Table 2.4. Provided the economic viability of the project has been established, the first task is the generation of a large number of feasible designs. Techniques that foster an original product include brainstorming to identify and agree upon the selection of attributes and constraints. At this stage, technical and economic information on feasible alternatives should be sufficient to rank candidate designs and to arrive at the selection of the ‘robust solution’ via a multicriterial approach.

The key design phase, that is, the concept design, is a three–step process:

• in the first step, the available soft information on attributes is used to evaluate the feasible solutions; an initial selection is accomplished by solving for nondominated designs;

• in the second step, the amount of hard information is increased by windowing the design space, in order to reduce uncertainty about the attribute values of further nondominated designs;

• finally, the ‘robust design’ is selected for further development, which results in a robust product that fulfills the functional requirements, is cost–effective and can be manufactured.

In preliminary design the robust solution is improved through sub–optimization of various attributes. This is achieved by formulating and solving a compromise decision support problem.

In contractual design the final design is completely reviewed, subject to a comprehensive and stringent set of requirements (the final specifications), thus ensuring functional feasibility and cost–effectiveness.

Designing generally involves costly iterations. Ideally they should be avoided or at least accomplished as rapidly as possible in a decision–based design environment. Iteration costs can also be reduced by evaluating the need for iteration at clearly defined points (phases and events). The events are used to model the design process by means of a time line, thus arriving at a metadesign.

2.8.3 Metadesign

The specific activities performed by design teams change as the design process evolves. In the concept design phase, a mathematical design model of the technical system is needed to evaluate its required properties. The model is built using representations of subsystems, or clusters of subsystems, through tuned metamodels. Later on, in the preliminary design stage, within the bounds of the top–level specifications, the design teams can arrange and rearrange the essential functional components of the product, before the design is frozen and changes in it can be made only with great difficulty. Therefore, it is necessary to develop methods for dividing the technical system design into subsystems, solving them and then synthesizing the solutions into a metadesign for the entire technical system.

For metadesign to represent dynamic partitioning and planning, the connotation placed on the term ‘meta’ can have three meanings:

• after : meta–‘x’ occurs after ‘x’; thus ‘x’ is a prerequisite of meta–‘x’;

• change: meta–‘x’ indicates that ‘x’ changes and is a general name of that change;

• above: meta–‘x’ is superior to ‘x’ in the sense that it is more highly organized, of a higher quality, or viewed from an enlarged perspective.

This third meaning is the most appropriate for design purposes. This notion of ‘higher’ has also been employed in terms like metaknowledge, metadomain, metamodelling, etc.

In a metadesign, the design problem may be divided into subproblems either by decomposing or by partitioning and planning, which are not synonyms, in the early stages of the project. For further design phases, and particularly in the context of the decision support systems, the differences in the meanings of these terms are essential to distinguish between two modes of approach to designing, that is, conventional design and metadesign. In particular:

• Decomposing is the process of dividing the system into its smallest elements; it is especially appropriate when design synthesis is based on the principle of repeated analysis of components. In adaptive and variant design, decomposition is important and the reverse of the decomposition process, that is synthesis, is exploited. On the contrary, in designing original products, which initially are vaguely specified, using decomposition is precluded. Partitioning and planning are then required, since subsystems cannot be defined a priori.

• Partitioning is the process of dividing the functions, processes and structures that comprise the technical system into subsystems, sub–subsystems, etc. In partitioning, a design team is guided by knowledge of the technical system, by considerations of the requirements the system must fulfill and by the tasks that must be performed by the fully functional system. Partitioning a design problem yields a grouping of interrelated decisions and also provides knowledge and information that can be used for planning. In the DSS technique the product being designed is partitioned into its subsystems and the process of design is partitioned into decisions using generic, discipline–independent models (Miller, 1987).

• Planning allows information about organizational resources and time constraints to be added to the decisions identified in the partitioning phase. These decisions are organized into a decision plan, that is, a plan of action for implementing the decision–based design process.

Metadesign is, therefore, a metalevel process of designing industrial products that includes partitioning the product for function, dividing the design process into a set of decisions and planning the sequence in which these decisions will be made. Metadesign is particularly useful in the design of technical systems in which concurrency among disciplines is required or in which some degree of concurrency in analysis and synthesis is sought.

2.8.4 Axioms of Decision–Based Design

Metadesign is based on the primary axioms of decision–based design. They map the particular design tasks to characteristic decisions and provide a domain–independent framework for the representation and processing of domain–relevant design information (Kamal, 1990).

Axiom 1. Existence of Decisions in DSS

The application of the decision support systems results in the identification of the relevant decisions associated with the technical system and its relevant subsystems.

Axiom 2. Type of Decisions in DSS

All decisions identified in the decision support systems are categorized as selection, compromise, or a combination of these. Selection and compromise are referred to as primary decisions. All other decisions, which are represented as a combination of these, are identified as derived decisions. Primary and derived decisions are resolved using specialized tools.

Selection Decision

The selection decision is the process of making a choice between a number of feasible alternatives taking into account a number of measures of merit. These measures, called attributes, represent the functional requirements and may not all be of equal importance. Attributes may be quantified using precise and/or vague information. The emphasis in selection is on the acceptance of certain alternatives while others are discarded. The goal of selection in design is to reduce the alternatives to a realistic and manageable number.

Keywords and Descriptors

Table 2.5 summarizes the keywords and descriptors associated with the selection and compromise decision support problems.

DSP          Keywords     Descriptors

Selection    Given        Candidate Alternatives
             Identify     Attributes’ Relative Importance
             Rate         Alternatives vs. Attributes
             Rank         Order of Preference

Compromise   Given        Information
             Find         Attribute Values (MADM)
                          Deviation Variables (MODM)
             Satisfy      System Constraints
                          Targets (goals, attributes)
                          Bounds
             Minimize     Distance from Ideal
                          Deviation Function

Table 2.5. Keywords and Descriptors in Decision Support Problems

Compromise Decision

Similarly, the compromise decision requires finding the best combination of design variables in order to improve the ‘best possible solution’ with respect to multiple constraints and attributes. The emphasis in compromise is on modification and change, by making appropriate trade–offs based on criteria relevant to the feasibility and performance of the technical system.
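A compromise decision support problem can be cast in the goal–programming form suggested by the keywords of Table 2.5: find the design variables and the deviation variables, satisfy the system constraints, bounds and goal targets, and minimize the deviation function. The following Python sketch illustrates this with a two–variable toy example; the response models, targets, bounds and constraint are invented for illustration and do not come from the text.

# Compromise DSP sketch: minimize the weighted deviation from two goal
# targets, subject to a system constraint and variable bounds.
import numpy as np
from scipy.optimize import minimize

SPEED_TARGET, COST_TARGET = 21.0, 45.0          # kn, M$ (hypothetical goals)

def speed(L, P):  return 12.0 + 0.03 * L + 0.25 * P   # toy response models
def cost(L, P):   return 0.20 * L + 0.90 * P

def deviation_function(z, w=(0.5, 0.5)):
    # z = [L, P, d_speed, d_cost]; weighted, normalized deviation variables.
    _, _, d_speed, d_cost = z
    return w[0] * d_speed / SPEED_TARGET + w[1] * d_cost / COST_TARGET

constraints = [
    # goal 1: achieved speed plus underachievement deviation >= target
    {"type": "ineq", "fun": lambda z: speed(z[0], z[1]) + z[2] - SPEED_TARGET},
    # goal 2: cost minus overrun deviation <= target
    {"type": "ineq", "fun": lambda z: COST_TARGET - cost(z[0], z[1]) + z[3]},
    # system constraint: installed power limited by hull size, P <= 0.12 L
    {"type": "ineq", "fun": lambda z: 0.12 * z[0] - z[1]},
]
bounds = [(120.0, 200.0), (8.0, 30.0), (0.0, None), (0.0, None)]

res = minimize(deviation_function, x0=np.array([160.0, 15.0, 0.0, 0.0]),
               bounds=bounds, constraints=constraints, method="SLSQP")
L, P, d_speed, d_cost = res.x
print(f"L = {L:.1f} m, P = {P:.1f} MW, deviations = ({d_speed:.2f}, {d_cost:.2f})")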

Keywords are the tasks that classify domain–relevant information and identify the related relationships. They embody in themselves the domain–independent ‘procedural knowledge’ for decision support problems. Procedural knowledge is the knowledge about the process, i.e., knowledge about how to represent and process domain information for design synthesis. The keyword ‘given’ is a heading under which the background or known information is grouped.

Descriptors are objects organized under the relevant keywords within the decision support problem formulation. Again, they also help to transform the problem from its discipline–specific description to a discipline–independent representation. In other terms, they represent ‘declarative knowledge’ (Rich, 1983), which is the knowledge about the product, i.e., the representation of problem–relevant information and background knowledge about the domain.

Within the DSS the nature of decision support problems is qualified by means of two axioms:

Axiom 3. Domain–Independence of DSS Descriptors and Keywords

The descriptors and keywords used to model decision support problems need to be domain–independent with respect to processes (e.g., design, manufacturing, maintenance) and disciplines (e.g., hydrodynamics, structural mechanics, engineering management).

Axiom 4. Domain–Independence of DSS Techniques

The techniques used to actually provide decision support need to be domain–independent with respect to processes and disciplines. This axiom may seem self–evident, as many solution techniques (e.g., linear programming, nonlinear optimization and expert systems) are applicable to problems from different domains. However, this condition supplements the previous axiom by stating that decision support models using domain–independent techniques should be solved in a domain–independent manner.

Bibliography

[1] Allen, J.K., Simovic, G., Mistree, F.: Selection Under Uncertain Conditions: A Marine Application, Proceedings, Fourth International Symposium on ‘Practical Design of Ships and Mobile Units’, PRADS ’89, Bulgarian Ship Hydrodynamics Centre, Varna, 1989, Vol. 2, pp. 80.1–80.8.

[2] Andreasen, M.M.: Design Strategy, Proceedings, International Conference on Engineering Design, ICED ’87, The American Society of Mechanical Engineers, 1987, Vol. 1, pp. 171–178.

[3] Bandte, O.: A Probabilistic Multi–Criteria Decision Making Technique for Conceptual and Preliminary Aerospace Systems Design, Ph.D. Thesis, Georgia Institute of Technology, 2000.

[4] Bascaran, E., Bannerot, R.B., Mistree, F.: Hierarchical Selection Decision Support Problems in Conceptual Design, Engineering Optimization, 1989, Vol. 14, pp. 207–238.

[5] Bashir, H., Thomson, V.: Project Estimation from Feasibility Study and Completion. A Quantitative Methodology, Concurrent Engineering: Research and Application, 2001, Vol. 9, no. 4.

[6] Beitz, W.: General Approach of Systematic Design – Application of VDI–Guideline 2221, Proceedings, International Conference on Engineering Design, ICED ’87, The American Society of Mechanical Engineers, 1987, Vol. 1, pp. 15–20.

[7] Blanchard, B.S., Fabrycky, W.J.: Systems Engineering and Analysis, Prentice–Hall, New York, 1998.

[8] Bras, B., Smith, W.F., Mistree, F.: The Development of a Design Guidance System for the Early Stages of Design, Proceedings, CFD and CAD in Ship Design, Van Oortmerssen Ed., Elsevier Science Publishers B.V., Wageningen, 1990, pp. 221–231.

[9] Cohen, L.: Quality Function Deployment: How to Make QFD Work for You, Addison–Wesley Publishing Company, New York, 1995.

[10] Cross, N.: Engineering Design Methods, John Wiley & Sons, Chichester, 1989.

[11] De Boer, S.J.: Decision Methods and Techniques in Methodical Engineering Design, Academisch Boeken Centrum, De Lier, The Netherlands, 1989.

[12] DoD: DoD Guide to Integrated Product and Process Development, Systems Engineering, Office of the Under Secretary of Defense (Acquisition and Technology), Washington D.C., 1996.

[13] Elvekrok, D.R.: Concurrent Engineering in Ship Design, Journal of Ship Production, Vol. 13, no. 4, 1997, pp. 258–269.

[14] Evans, J.H.: Basic Design Concepts, ASNE Journal, 1959.

[15] Hazelrigg, G.A.: Systems Engineering: An Approach to Information–Based Design, Prentice Hall, 1996.

[16] Hills, W., Buxton, I.L.: Integrated Ship Design and Production During the Pre–Construction Phase, Transactions RINA, Vol. 131, 1989, pp. 189–210.

[17] Hubka, V.: Principles of Engineering Design, Eder Ed., Butterworth, London, 1982.

[18] Jones, J.C.: A Method of Systematic Design, in ‘Developments in Design Methodology’, John Wiley & Sons, Chichester, 1963.

[19] Kamal, S.Z., Karandikar, H.M., Mistree, F., Muster, D.: Knowledge Representation for Discipline–Independent Decision Making, Proceedings, Expert Systems in Computer–Aided Design, Gero Ed., Elsevier Science Publishers B.V., Amsterdam, 1987, pp. 289–321.

[20] Kirby, D., Mavris, D.N.: A Method for Technology Selection Based on Benefit, Available Schedule and Budget Resources, Proceedings, 2000 World Aviation Congress and Exposition, SAE/AIAA 2000–01–5563, 2000.

[21] Kuppuraju, N., Ittimakin, P., Mistree, F.: Design through Selection – A Method that Works, Design Studies, 1985, Vol. 6, no. 2, pp. 91–106.

[22] Kusiak, A.: Concurrent Engineering: Automation, Tools and Techniques, John Wiley & Sons, 1993.

[23] Lyon, T.D., Mistree, F.: A Computer–Based Method for the Preliminary Design of Ships, Journal of Ship Research, 1985, Vol. 29, no. 4, pp. 251–269.

[24] Marx, W.J., Mavris, D.N., Schrage, D.P.: Integrating Design and Manufacturing for the High Speed Civil Transport, Proceedings, 19th ICAS Congress / AIAA Aircraft Systems Conference, 94–10.8.4, International Council of the Aeronautical Sciences, 1994.

[25] Miller, J.G.: Living Systems, McGraw–Hill, New York, 1978.

[26] Mistree, F., Smith, W.F., Bras, B.A., Allen, J.K., Muster, D.: Decision–Based Design: A Contemporary Paradigm for Ship Design, Transactions SNAME, 1990, Vol. 98, pp. 565–597.

[27] Pahl, G., Beitz, W.: Engineering Design, Wallace Ed., Pomerans Transl., The Design Council, Springer–Verlag, London/Berlin, 1984.

[28] Pennel, J.P., Winner, R.I.: Concurrent Engineering Practices & Prospects, Proceedings, IEEE Global Telecommunications Conference Exhibition, Part I, 1989, pp. 647–655.

[29] Priest, J.W., Sanchez, J.M.: Product Development and Design for Manufacturing, Marcel Dekker Inc., New York, 2001, pp. 15–36.

[30] Rogan, J.E., Cralley, W.E.: Meta–Design – An Approach to the Development of Design Methodologies, IDA Paper no. P–2152, Institute for Defense Analyses, Alexandria, Virginia, 1990.

[31] Simon, H.A.: The Sciences of the Artificial, The MIT Press, Cambridge, Massachusetts, 1982.

[32] Shina, S.G.: Concurrent Engineering and Design for Manufacture of Electronics Products, Van Nostrand Reinhold, New York, 1991.

[33] Suh, N.P.: Development of the Science Base for the Manufacturing Field through the Axiomatic Approach, Robotics and Computer Integrated Manufacturing, 1984, Vol. 1, no. 3/4, pp. 397–415.

[34] Sullivan, L.P.: Quality Function Deployment, Quality Progress, 1986, Vol. 10.

[35] Trincas, G., Zanic, V., Grubisic, I.: Comprehensive Concept Design of Fast Ro–Ro Ships by Multiattribute Decision Making, Proceedings, 5th International Marine Design Conference, IMDC ’94, Delft, 1994, pp. 403–418.

[36] Wallace, K.M., Hales, C.: Detailed Analysis of an Engineering Design Project, Proceedings, International Conference on Engineering Design, ICED ’87, The American Society of Mechanical Engineers, 1987, Vol. 1, pp. 94–101.

[37] Whitney, D.E., Nevins, J.L., De Fazio, T.L., Gustavson, R.E., Metzinger, R.W., Rourke, J.M., Selzer, D.S.: The Strategic Approach to Product Design, Proceedings, Design and Analysis of Manufacturing Systems, National Academy Press, Washington D.C., 1988.

[38] Winner, R.I., Pennell, I.P., Bertrand, H.E., Slusarczuk, M.M.G.: The Role of Concurrent Engineering in Weapons System Acquisition, IDA Report R–338, Institute for Defense Analyses, Alexandria, Virginia, 1988.

Chapter 3

Design As a Multicriterial Decision–Making Process

Although design is a purposeful activity directed toward the goal of fulfilling market and/or human needs, particularly those which can be met by the technological factors of one culture (Asimov, 1962), and even though the identification of design variables, parameters and constraints, as well as the selection of the ‘best design solution’, represents a decision–making process (Hazelrigg, 1996), efforts to rationalize the design process remained taboo for a long time. In fact, design was regarded substantially as an intuitive and creative activity for which talent was necessary, rather than as rational and science–based work.

For more than three centuries, all over the world engineering design has been based on the Newtonian concepts of reductionism and mechanism, considering closed systems in equilibrium, isolated from their environments. Only during the Second World War did particular conditions and demands (i.e. shortage of materials) lead to designing being more closely observed, and certain insights were made useful for rationalization. After the Second World War the markets were hungry for all kinds of products, but their quality played a secondary role. Only in the late fifties, and more especially in the sixties, did a new situation emerge which brought increasing and broader demands for higher product quality. In addition, the opening of the world markets forced increasing international competition, which exploded in the nineties with globalization.

In the past sixty years, there has been a revolution in the way engineers view many of their problems and, even more recently, in the way design is being taught at some universities. The pressure on the quality of products has led to a search for new knowledge about designing. The fundamental reason for this change can be attributed to two separate events: a new emphasis on systems engineering and the pervasive diffusion of computer science. In their synergistic coupling, they have irreversibly changed the world view of engineering and engineering education, and provided the foundation for developing systematic methods for rational, science–based approaches to the design of large–scale, fuzzily–defined, trans–disciplinary technical systems open to external environments.

In the sixties, research efforts were mainly devoted to design methodology. If one analyzes the status of design knowledge as it existed then, practically no references can be found to the working methodology of the designers. The phases of the design process, with respect both to the designed product and to the design process itself, raised further questions which had to be answered to arrive at a definition of a rational design process.

3.1 Decision Making Process

Design concerns the use of available information to make intelligent decisions leading to the ‘best possible solutions’ which satisfy the customer’s requirements. Problem definition, for example, involves deciding what the customer requirements are, and how to define the constraints and targets. Other design activities, such as the generation of alternative solutions, technology infusion, and concept selection of the ‘preferred solution’, rely heavily on decision–making processes (Li et al., 2004). Therefore, one can state with confidence that design is a decision–making process subject to a set of rules or governing rational axioms.

“Decision making is characterized by its involvement with information, value assessment and optimization. Thus, whereas inventiveness seeks many possible answers and analysis seeks one actual answer, decision making seeks to choose the one best answer” (Dixon, 1966). But the ‘one best answer’ can be difficult to obtain, particularly when the decision is based on several objectives.

The theory of design makes it possible to help decision makers in identifying which design variables are needed to satisfy the functional requirements of an industrial product, in deciding why a design is better than the others, in understanding whether the ‘preferred solution’ is a robust design, and so forth. These and similar goals form a decision–making problem in systems engineering.

The close relation between design and decision making can be seen from the following statements: “A decision–making problem exists when and only when there is an objective to be reached, alternative methods of proceeding, and a variety of factors that are relevant to the evaluation of the alternatives or their probability of success” (Dixon, 1966), and: “Decision making is the study of identifying and choosing alternatives based on the values and preferences of the decision maker(s). Making a decision implies that there are alternative choices to be considered, and in such a case we want not only to identify as many of these alternatives as possible but to choose the one that best fits with our goals, objectives, desires, values, and so on” (Harris, 1980).

Decision making can be briefly defined as the cognitive process, based on explicit assumptions, which leads to the selection among feasible alternatives up to a final choice. Structured rational decision making is an important part of all science–based activities, where specialists apply their knowledge in a given area to make decisions.

In general, evaluation of the performance attributes of the design solution is needed to satisfy some functional requirements and constraints. For example, to design a large merchant ship, multiple requirements, such as targets on hydrodynamics, propulsion, structure and noise, need to be satisfied. Usually, the design that best satisfies one individual requirement does not have the best performance on other requirements. That is, typically there is no design that has the best performance on all the requirements. As a result, trade–offs need to be made when multiple criteria are simultaneously taken into account. This usually involves decision–making activities, such as determining the preference information of the customer, establishing the decision rules to evaluate the alternative designs, and selecting the ‘best solution’ among the alternatives. Sen and Yang (1998) point out that decision making in engineering design “can be helpfully visualized as a collection of activities that relate to choice in the context of competing technical or functional requirements”. Dieter (2000) also argued that “decision making is essentially part of the design process and the fundamental structure in engineering design”.

3.1.1 Decision Making in Technical Systems Design

According to Baker et al. (2001), decision making should start with the clear identification of the decision makers’ requirements, reducing the possible disagreement about problem definition, requirements, goals and criteria. Figure 3.1 shows a possible decision–making process at the concept design phase, which can be divided into a set of steps.

Figure 3.1. A decision–making cycle at concept design level

In the intelligence phase, the goal is to define the problem, collect the necessary information,identify criteria and establish goals. The decision makers (the design team) must translate theproblem in a clear, concise problem statement agreed by the customer(s). Even if it can be some-times a long iterative process to come to such an agreement, it is a crucial and necessary pointbefore processing ahead.

The next step is to determine the crisp criteria, which check the feasibility of design alternatives through successive levels of elimination. To establish goals, the decision makers should answer such questions as: what is more important? which are the attributes? maximizing product performance or minimizing its cost? what about minimizing risks? A clear understanding of the crucial goals in a decision situation must be gained before design evaluations are carried out.

When the design problem is clearly stated and pertinent criteria and goals are established, the next step, i.e. the design phase, is to generate the design alternatives after formulation of an adequate mathematical design model. Any alternative must meet the sets of criteria for design and selection. The infeasible solutions must be deleted from further consideration, thus obtaining an explicit list of feasible alternatives. Often a careful examination and analysis of outcomes can reveal design alternatives that were not obvious at the outset. Therefore, ‘modelling and solving’ form the heart of most textbooks on decision analysis. Although the idea of modelling is critical in decision making, design problems are generally decomposed to understand their structures as well as to measure the values of their properties and the related uncertainties. Indeed, decomposition may be seen as a milestone of decision analysis (Clemen, 1996). The first level of decomposition calls for structuring the problem into smaller and more manageable subproblems. Subsequent decomposition by the decision maker may entail careful consideration of elements of uncertainty in different parts of the problem.

Modelling may be performed in several ways. Influence diagrams or decision trees may be used to create a representation of the decision problem. Probability theory is used to build models of the uncertainty inherent in the design problem. Hierarchical and network models are used to understand the relationships among multiple attributes (objectives), and utility functions or metamodels are assessed in order to model the way in which decision makers evaluate different outcomes and trade off competing attributes (objectives).

Every correct method for decision making needs, as input data, an evaluation of the alternatives against the criteria. Depending on the criterion, the assessment may be objective, with respect to some commonly shared and understood scale of measurement, or subjective, reflecting the judgement of the decision maker(s). After the evaluations, the selected decision–making tool can be applied to rank the alternatives or to choose a subset of the most promising (nondominated) alternatives.

Decision analysis, i.e. the selection phase, is typically an iterative process. Once the ‘best alternative’ has been designed, sensitivity analysis is performed. If a ‘preferred solution’ or a ‘compromise solution’ has been selected, the next step is to improve it by developing the basic design.


The arrows in Figure 3.1 show that the decision maker(s) may return even to the intelligence phase. It may be necessary to refine the definition of the attributes (objectives) or to add attributes (objectives) that were not previously included in the mathematical design model. New alternatives may be identified, the design model structure may change, and the models of uncertainty and preferences may need to be refined. The term decision–making cycle best describes the overall process, which may go through several iterations before a satisfactory solution is found.

In this iterative process, the decision maker’s perception of the problem might change, beliefs about the likelihood of various uncertain eventualities might develop and change, and inter- and intra–attribute preferences not previously considered might mature as more time is spent in reflection. Decision making not only provides a structured way to think about decisions, but also, more fundamentally, provides a structure within which a decision maker can develop preferences, that is, subjective judgements that are critical for a good solution.

Figure 3.2 illustrates the main categories involved in decision–making activities. Most expressions and their usage will be explained at length throughout this textbook. Here, the complexity of the decision–making process is considered mainly with reference to multiutility and multicriterial concepts.

Figure 3.2. Activities and categories associated with decision making


3.2 Basic Concepts of Multicriterial Decision Making

Decision making permeates the whole design process and is at the core of all design activities. In the modern design of technical systems, more and more attention is paid to the concept and preliminary design phases so as to increase the odds of choosing a design that will ultimately be successful at the completion of the design process. Decisions made during these early design stages play a critical role in determining the success of a design.

3.2.1 Why Multicriterial Decision Making?

Worldwide experience indicates that successful innovative products presuppose significant innovation in design strategy. Many design techniques have been introduced over the course of decades to produce the best product possible. But, whereas inventiveness seeks many possible answers and analysis seeks one actual answer, decision making seeks to choose the ‘best possible solution’. Such a solution can be difficult to obtain, particularly when the decision is based on several criteria. Since decisions in design are multidimensional in nature, they rest on multiple criteria. There is no doubt that the branch of decision–making theory known as multicriterial decision making (MCDM) best respects the very character of a rational design process.

Since complex technical systems involve interacting disciplines and technologies, the decision makers dealing with design problems are involved in balancing multiple, potentially conflicting attributes/objectives, transforming a large amount of customer–supplied guidelines into a solidly defined set of requirement definitions. Thus, one could state with confidence that modern design is an MCDM process.

Multicriterial decision–making methods apply to problems where a decision maker is selecting or ranking a finite number of alternatives which are measured by often conflicting attributes. Multiple criteria pervade all that people do; they include such public policy tasks as determining a country’s policy, developing a national energy plan and planning national defense expenditures, in addition to such public/private company tasks as product development, pricing decisions, and research project selection. All have a common thread, i.e. multiple conflicting targets.

Almost every design problem in modern engineering inherently has multiple criteria which need to be satisfied. It is often the case that good values of some criteria inevitably go with poor values of others, so that the best design is always a compromise in some sense. In order to find the best compromise design solution, designers are required to take all the metrics of interest into account concurrently when making decisions. For example, when designing a merchant ship, designers will have to consider reducing cost, increasing performance and minimizing motions. As a result, a trade–off has to be made, and compromise becomes an essential part of the multicriterial decision–making process.


Typically, in order to solve an MCDM problem, some necessary factors need to be known beforehand:

• well defined, measurable criteria;
• preference information on the criteria;
• feasible design alternatives;
• a rational decision–making method.

The criteria can be thought of as the measures of performance for an alternative, such as the speed and payload of a ship concept, and can be checked against the customer’s requirements. The alternatives are the candidates among which the ‘best solution’ is selected. They may be existing designs, or designs that need to be generated in the design process. Since the criteria do not all have the same priority for the customer, the preference information on the criteria should be defined. Relative weights, which are assigned beforehand or calculated, are a transparent way to represent the preference information. The set of appropriate alternatives has a critical impact on the final solution because the final solution is one of the elements of this set.

Characteristics of MCDM

MCDM usually refers to the set of methods enabling a decision maker to make decisions in the presence of multiple, often conflicting, criteria. It is an excellent tool for multiattribute selection and multiobjective optimization of industrial products. MCDM as a discipline, and its application, has grown significantly with the development of computer science, as most of its methods involve complex combinations of higher mathematics.

Even if designs may be managed by means of MCDM techniques in widely different ways, they share the following common characteristics:

• Problem statement. Problem formulation is based on identifying the true needs of the customer and formulating them in a set of targets (attributes, objectives) for the design solution. The problem statement has to express as specifically as possible what is intended to be accomplished to achieve the established goals. Design specifications are a major component of the problem statement. A good problem statement plays an important role in determining the success of the final solution.

• Resolution of conflict among multiple criteria. The problem definition yields a set of attributes/objectives (criteria) on which the design team should base its design decisions. Criteria play the essential role in the decision–making process, where an alternative solution is deemed successful if the customer–desired levels are met. Multiple criteria usually conflict with each other. MCDM allows managing these conflicts since it is a conflict–resolution approach.

• Normalization of attribute values. Each attribute/objective has a different unit of measurement. In a technical system selection case, fuel consumption is expressed in tons per mile, comfort is measured by specialized indexes in a non-numerical way, cost is indicated by monetary units, etc. Hence, a normalization of the criteria values may be essential to obtain comparable scales (a short numerical sketch is given after this list).


• Selection/Optimization. Solutions to design problems either select the best solution among a previously defined finite number of alternatives or optimize the ‘best possible solution’. At first, the MCDM selection process involves searching for an alternative that is the ‘best possible solution’ or the ‘preferred solution’ over all criteria. Then the ‘preferred solution’ can be improved by means of an MCDM optimization process.
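
Because the criteria above are expressed on different scales, a simple and common way to make them comparable is min–max normalization. The following sketch is only an illustration under assumed data (the ship-like figures and the choice of normalization rule are not taken from the text): ‘larger is better’ criteria are mapped to [0, 1] and ‘smaller is better’ criteria are inverted.

```python
# Minimal sketch of min-max normalization of a criteria matrix.
# Rows = alternatives, columns = criteria; all values are hypothetical.
import numpy as np

raw = np.array([
    [21.0, 14000.0, 52.0],   # A1: speed [kn], payload [t], cost [M monetary units]
    [23.0, 12500.0, 58.0],   # A2
    [22.0, 13200.0, 55.0],   # A3
])
larger_is_better = np.array([True, True, False])   # cost should be minimized

lo, hi = raw.min(axis=0), raw.max(axis=0)
norm = (raw - lo) / (hi - lo)                      # each column mapped to [0, 1]
norm[:, ~larger_is_better] = 1.0 - norm[:, ~larger_is_better]   # invert 'smaller is better' columns

print(norm)   # dimensionless, comparable scores for all criteria
```

Other normalization rules (vector normalization, reference-point scaling, etc.) are equally possible; the essential point is that all criteria end up on commensurable scales before aggregation.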

Studies dealing with the identification of decision alternatives focus on the question of how the ‘complete solution’ of a decision problem with multiple attributes/objectives can be described and characterized. This ‘complete solution’ consists of the set of functionally–efficient decision alternatives and/or the set of efficient vectors of objective values. For linear decision problems, like multiattribute decision making, efforts have been made to identify functionally–efficient facets of the set of alternatives by means of an assigned preference set of weights given to the attributes. Extensions are concerned with the question of to what extent the computational techniques already applied to linear problem formulations are useful and/or must be modified for the determination of the set of efficient points in nonlinear problems.

Dichotomy of decision making

Decision making about a problem may be partitioned by means of the following double dichotomy:

1. Is it a problem under certainty or uncertainty? If it is in the uncertainty category, then one has to assume that to each decision there is a well–defined probability distribution over the possible resulting consequences.

2. Is it a single or a multiple attribute problem? That is, can the outcome be adequately described in terms of a single descriptor or a single aggregate measure like cost, or is more than one attribute needed? In the former case the decision can be made simply by determining the alternative with the best value of the single attribute or aggregate measure.

The most general case to consider is when a decision problem is both uncertain and multidimensional. It can be labelled as x̃, where the tilde represents uncertainty and boldface x represents a vector in contrast to a scalar. It can be distinguished from the other cases as exhibited in Figure 3.3.

Figure 3.3. Double dichotomy of decision problems


When the problem is both certain and unidimensional, the decision maker merely chooses the feasible alternative that maximizes the given single objective measure. In practice, if the alternatives are numerous and the constraints are given in terms of a set of mathematical bounds, the decision maker might be hard pressed to find the optimum and may need to employ the entire range of mathematical programming techniques.

There are many MCDM methods available. As each has its own characteristics, there are many ways to classify them. One classification is by the type of data they use, which can be deterministic, stochastic, or fuzzy. Another is by the number of decision makers involved in the decision–making process: either a single decision maker, or a group of decision makers.

Since its early development a few decades ago, multicriterial decision making has reached maturity, but not in all respects. Too large a part of research in this field still concentrates on algorithms rather than problems, even though more attention is being paid to the adaptation of tools to problems instead of the other way round. MCDM is coupled more and more with decision support tools, using results of research in the human sciences and organization theory, as far as they are concerned with the study of decisions by either individuals or groups.

3.2.2 Individual Decision Making

To generate and select solutions for multicriterial decision problems involving only one decision maker, one frequently assumes some decision rule which serves as the decision maker’s guiding principle. One can distinguish between multiattribute decision problems and multiobjective programming problems. The former are concerned with the task of ordering a finite number of decision alternatives, each of which is explicitly described in terms of different attributes, which have to be taken into account simultaneously. The crux of the problem lies in obtaining information on the decision maker’s preferences. This can be achieved in many different ways: the spectrum ranges from directly asking the decision maker for preference statements on the basis of strong orders over preference functions, to attempting to decompose a cardinal utility function with respect to its arguments in order to measure the effects of isolated changes of individual attributes. By contrast, multiobjective decision problems are usually characterized by the fact that several objective functions are to be optimized with respect to an infinite convex set (implicitly described by a set of constraints) of decision alternatives.

In a relatively large number of procedures, a linear or locally linear approximating utility function is assumed. An optimal solution is then detected gradually by asking the decision maker for certain values of the objectives, for weights given to the objectives or for marginal rates of substitution between pairs of objectives.

In recent years, a large part of research has been devoted to sensitivity analysis (robustness), that is, to ascertaining how sensitive a given problem solution is to unpredictable changes in some parameters. This question is important not only because of uncertainty with respect to the tools and their effectiveness, but also because of uncertainty about the ‘rightness’ of the statements on the decision makers’ preferences.


3.2.3 Group Decision Making

With the complexity of design problems increasing, decision making is almost an impossible task for the individual decision maker to manage. Group decision is usually understood as aggregating different individual preferences on a given set of alternatives into a single collective preference. It is assumed that the individuals participating in making a group decision face the problem and are all interested in finding a solution. A group decision situation involves multiple decision makers, each with different skills, experience and knowledge relating to different aspects (criteria) of the problem.

Decision making in groups is sometimes examined separately as process and outcome. This has led to a series of different methodical approaches. One method has tried to apply the concepts which have proven successful in dealing with multicriterial problems with one decision maker to problems involving a multiplicity of decision makers, using the same analytical tools. Problems on preference structures have been considered within the framework of multiattribute utility theory (MAUT). Among others, the following questions have been dealt with: which axioms allow the aggregation of the individual utility functions into a group preference function? what forms of group preference functions may be contemplated?

Another approach has chosen a completely different starting point. Partly based on game and bargaining theoretic approaches, the conditions are examined under which the former can be applied to multiobjective decision problems in groups. The game and bargaining theoretic approaches have their basis in utility theory as well as in other axiomatic viewpoints. Critics point out that the axiomatic foundation has a large influence on the determination of the optimal solution, which consequently entails the loss of flexibility required for practical applications. However, this critique is counterbalanced by the presence of a great number of such approaches which are able to deal adequately with real decision behavior as observed in groups and organizations.

Individual and group decision making are interrelated and can be approached from the same methodological viewpoint.
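
As a purely illustrative example of aggregating individual preferences into a single collective preference, the following sketch applies a Borda–type scoring rule; the rule and the rankings are assumptions introduced here for illustration and represent only one of the many aggregation schemes studied in the social–welfare literature.

```python
# Minimal sketch: aggregating individual rankings into a group ranking with a Borda-type count.
# Each decision maker ranks the same alternatives from best to worst (hypothetical data).
rankings = {
    "DM1": ["A2", "A1", "A3"],
    "DM2": ["A1", "A2", "A3"],
    "DM3": ["A2", "A3", "A1"],
}

alternatives = {a for r in rankings.values() for a in r}
scores = {a: 0 for a in alternatives}
n = len(alternatives)

for ranking in rankings.values():
    for position, alt in enumerate(ranking):
        scores[alt] += n - 1 - position      # best gets n-1 points, worst gets 0

group_order = sorted(scores, key=scores.get, reverse=True)
print(group_order, scores)                   # collective ranking, e.g. ['A2', 'A1', 'A3']
```

Such simple scoring rules ignore the strength of individual preferences; the MAUT and game–theoretic approaches mentioned above address exactly this limitation.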

3.2.4 Elements of MCDM

By MCDM one usually refers to a set of methods enabling a user to aggregate several evaluation criteria in order to select one or several ‘actions’ (projects, solutions, etc.). But the term also refers to the activity of supporting decisions for a well–defined decision maker (individual, group, company, ...).

Set of Methods

The main available methods in decision–making theory stem from very different horizons:

• Utility theory, born in the eighteenth century with the first works of Bernoulli, was concerned at first with modelling the preferences of an individual decision maker who must choose among alternatives with risky outcomes. Multiattribute utility theory (MAUT) is a development of utility theory.

• Theory of social welfare was also born in the eighteenth century, with the works of the Marquis de Condorcet, who was interested in the problem of aggregating individual preferences into a unique collective ranking. Some methods issued from this field of research use developments in linear programming; some others are at the origin of important concepts in multicriterial decision making, such as the outranking relation.

• Multiattribute decision making approaches are better suited to design problems, which often involve a conflict–resolution process; they focus on problems where the number of criteria and alternatives is finite, and the analytical and synthesis tools of concept design must allow for this.

• Operational research and mathematical programming involve the design of the ‘best alternative’ by considering the trade–offs for an infinite number of feasible alternatives within a set of interacting constraints. This field has always had to handle the difficult question of choosing a particular objective function while leaving some aspects of the preference in the set of constraints. Many important concepts and methods have been developed in this field; among others, the goal programming approach, methods to find the set of efficient solutions, and interactive methods to find a compromise solution.

• Data analysis and multidimensional scaling are more recently concerned with the analysis of qualitative and often ordinal data. Regression methods such as response surface methodology have been proposed in order to estimate the parameters of a model (additive value function) consistent with some holistic ranking of alternatives.

Modelling Decision–Making Activities

Roy and Vincke (1980) define decision making as "the activity of a person who relies on clearly explicit but more or less completely formalized models, in order to get answers to the questions posed in a decision–making process". This definition reflects a very broad conception of decision making compared with classical operational research, whose aim is to find the optimal solution: it implies analytical approaches or mathematical models. From a practical viewpoint, decision making leads to modelling activities at three levels:

1. Nature of the decision and choice of a problem formulation

While identifying a set of alternatives, the decision maker has to choose a problem formulation, which might be:

• choice of one and only one alternative;

• choice of all good alternatives;

• choice of some of the best alternatives.


2. Definition of a set of criteria

If the choice of a single criterion is too difficult or arbitrary to make, one has to use several, and often conflicting, criteria. The concept of a consistent family of criteria gives conditions to respect in the choice of a set of criteria.

3. Choice of an approach in aggregating the criteria

In order to aggregate the criteria, one can choose one among the following approaches:

• Aggregation of criteria into a single one called value/utility function

The term utility function is often given to a multicriterial utility function; this model consists of aggregating the n criteria into a function U(g1, g2, . . . , gn), which represents an overall criterion. In utility theory, a distinction is made between a value function, when no risky outcomes are taken into account, and a utility function, which allows the comparison of risky outcomes through the computation of an expected utility.

• Aggregation models in an outranking relation

These models aggregate the criteria into a partial binary relation (outranking relation) which is ‘more complete’ than the dominance relation.

Dominance relation and efficient set. They are interesting concepts when, but only when, the problem formulation is to select one and only one alternative.

Concepts for building outranking relations are:

– concordance: it generalizes the concept of majority rule (a small numerical sketch of the concordance index is given at the end of this list);
– nondiscordance: it is used to reject a situation of a over b whenever there exists a criterion for which b is ‘much better’ than a;
– cardinal outranking relations: they use the concept of trade–off ratio.

• Interactive and local aggregation of criteria to find a compromise solution

Even though this method was first proposed in the context of multiobjective linear programming, using the notion of the ideal point (Zeleny, 1982), it is better suited to MADM. Each coordinate of this point equals the maximum value which can be obtained on the corresponding attribute without considering the other criteria, i.e.

\[ g^* \;\text{is such that}\; g^*_i = \max_{a \in A} g_i(a) \qquad \forall\, i \]

The interaction process can rely on the following phases (Roy, 1975):

– Search for candidate designs for a compromise solution: considering the information available on the preferences of the decision maker, the model searches for one or more alternatives which could appear as possible compromise solution(s);

– Communication to the decision maker: these solutions are shown to the decision maker, together with all the information which seems useful to him/her, such as the values of these solutions on the different criteria;


– Reaction of the decision maker: some solutions can be judged satisfactory and then the procedure stops; otherwise, information on the decision maker’s preferences is obtained; the type of information differs from one method to the other (holistic judgement, aspiration levels, new constraints which modify the ideal point, etc.).
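
To make the concordance concept mentioned above concrete, the sketch below computes a simple concordance index c(a, b), i.e. the sum of the weights of the criteria on which alternative a is at least as good as b. The weights, the scores and the ‘larger is better’ convention are assumptions introduced for illustration; full outranking methods (e.g. those of the ELECTRE family) add discordance tests and thresholds on top of this basic idea.

```python
# Minimal sketch of a concordance index between two alternatives.
# Scores and weights are hypothetical; all criteria are assumed 'larger is better'.
weights = [0.40, 0.35, 0.25]     # normalized criterion weights (sum to 1)
a = [7.0, 5.0, 9.0]              # scores of alternative a on the three criteria
b = [6.0, 8.0, 9.0]              # scores of alternative b

def concordance(x, y, w):
    """Sum of the weights of the criteria on which x is at least as good as y."""
    return sum(wi for xi, yi, wi in zip(x, y, w) if xi >= yi)

print(concordance(a, b, weights))   # 0.65: weight of the coalition supporting 'a outranks b'
print(concordance(b, a, weights))   # 0.60: weight of the coalition supporting 'b outranks a'
```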

3.3 Multicriterial Decision–Making Theory

Almost every design problem in modern engineering design has multiple criteria to satisfy. It often happens that good values of some criteria inevitably go with poor values of others, so that the ‘best possible design’ is always a compromise. In order to find the best compromise design solution, designers are required to take all the metrics of interest into account concurrently when making decisions. For example, when designing a large merchant ship, designers will have to consider reducing cost, increasing performance and minimizing risks simultaneously. As a result, a trade–off will have to be made.

In a multicriterial decision–making process there is a decision maker, or a group of decision makers who make the decisions, a set of attributes (objectives) that are to be pursued, and a set of alternatives from which one is to be selected. In a decision situation the decision makers have to manage goals, criteria, objectives, attributes, constraints and targets, in addition to decision variables. Although goals, criteria, objectives, and targets have essentially similar dictionary meanings, it is useful to distinguish them in a decision–making context. For example, while criteria typically describe the standards of judgement or rules to evaluate feasibility, in MCDM they simply indicate attributes, objectives and constraints. These terms are individually defined in the Appendix.

In design, multicriterial considerations arise as soon as both economic and technical factors are present in the design evaluation and selection. In the framework of prescriptive design models (that is, models directed toward helping the decision maker to make better decisions), a set of multicriterial decision–making methodologies, i.e. sequential linear programming, weighted criteria methods, goal programming, fuzzy outranking, etc., was developed (Keeney and Raiffa, 1993; Hamalainen et al., 2000). They allow the design process to handle simultaneously a number of often conflicting criteria, both of technical and economic nature. MCDM techniques enable the design team either to generate and select the ‘best possible design’, or to evaluate the merit index of alternative designs, or to optimize some features (subsystems) of a robust design.

3.3.1 MCDM Background

A solution to be delivered in a multidimensional and complex environment should indicate an optimal decision under given circumstances within precisely defined temporal and spatial limits. In order to cope with such a structured problem, decision–making theory adopts complex mathematical models, methods and techniques, some using a qualitative, others a quantitative approach.


The quantitative approaches, in particular MCDM, have proven to be the most favourable instrument when:

• the decision problem is vague, new or complex,

• economic implications are significant,

• existing experience (objective and/or subjective) is insufficient,

• intuition is not an option.

The main components of an MCDM process are the resources (attributes, alternatives, criteria), the process of transformation (mappings) and the final state (decision); see Figure 3.4. The amount of existing knowledge related to each component determines whether the decision–making problem is well or badly defined.

Figure 3.4. Relations between components of the multicriteria decision–making process

3.3.2 Mathematical Definition

The mathematical formulation for solving an MCDM problem is briefly presented here. Further details are elaborated in Hwang and Masud (1979), Keeney and Raiffa (1993), and Belton and Stewart (2002). The decision matrix aggregates the complete problem–related information and is the basis for the problem solution

\[ \left[\, x_{ij} = f_j(A_i) \,\right]_{m \times n}, \qquad i = 1, \ldots, m, \; j = 1, \ldots, n \tag{3.1} \]

where m and n are the number of alternatives and criteria, respectively; x_ij = f_j(A_i) indicates the value of criterion X_j with respect to alternative A_i; and S = {f_1, f_2, . . . , f_n} is the set of criteria, defined as

\[ (\forall x \in X)\,(\exists f(x) \in S): \quad X \mapsto S = \{\, f(x) \mid x \in X \,\} \tag{3.2} \]


Here X = {x | g(x) ≤ 0} and g(x) ≤ 0 are the problem–related set of attributes/objectives and the corresponding vector of constraints, respectively, and A = {A_1, A_2, . . . , A_m} is the set of the identified feasible alternatives. A weighting factor w_j can be associated with each criterion, indicating its importance.

Then, the ‘best solution’ to an MCDM problem is defined as

\[ \max/\min \; U(f) = \sum_{i=1}^{n} w_i \, u_i\!\left(f_i(x)\right) \tag{3.3} \]

where U(f) is the overall utility function, whereas w_i and u_i are the weighting factor and the utility related to a particular criterion and the corresponding alternative, respectively.
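
A minimal numerical sketch of the additive aggregation in Eq. (3.3) follows. The criterion values, the weights and the single–criterion utility functions are hypothetical illustrations, not taken from the text, and the overall utility is simply maximized over a handful of alternatives.

```python
# Minimal sketch of the additive aggregation of Eq. (3.3): U = sum_i w_i * u_i(f_i(x)).
# Criterion values, weights and single-criterion utilities are hypothetical.
import numpy as np

def linear_utility(value, worst, best):
    """Simple linear single-criterion utility, clipped to [0, 1]."""
    return float(np.clip((value - worst) / (best - worst), 0.0, 1.0))

# Single-criterion utilities u_i, each mapping a criterion value to [0, 1]
utilities = [
    lambda v: linear_utility(v, worst=18.0, best=24.0),        # speed [kn], larger is better
    lambda v: linear_utility(v, worst=11000.0, best=15000.0),  # payload [t], larger is better
    lambda v: linear_utility(v, worst=60.0, best=50.0),        # cost [M units], smaller is better
]
weights = [0.3, 0.3, 0.4]                                      # relative importance w_i

# f_i(x) for three hypothetical design alternatives
alternatives = {"A1": [21.0, 14000.0, 52.0],
                "A2": [23.0, 12500.0, 58.0],
                "A3": [22.0, 13200.0, 55.0]}

overall = {name: sum(w * u(f) for w, u, f in zip(weights, utilities, values))
           for name, values in alternatives.items()}
best = max(overall, key=overall.get)
print(overall, "-> preferred alternative:", best)
```

The same structure accommodates nonlinear single–criterion utilities; only the functions in the list change, not the aggregation.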

3.3.3 MADM and MODM

The discipline of multicriterial decision making can be broadly grouped into two classes: the multiattribute decision–making (MADM) and the multiobjective decision–making (MODM) techniques:

• Multiattribute decision making includes methods that involve selection of the ‘best possible design’ from a discrete pool of alternatives described in terms of their prioritized attributes. (Attributes are generally defined as characteristics that describe in part the state of a product or system.) Assessment of the alternatives and selection of the ‘best possible design’ is done via straightforward evaluation. The increased speed of computers provides the opportunity to model a complex design problem as a multiple evaluation process by intentionally creating a large number of design variants.

Most of the techniques available for dealing with multiple attribute problems require information about (i) the decision maker’s preferences among values of a given attribute (intra–attribute preferences) and (ii) the decision maker’s preferences across attributes (inter–attribute preferences). The multiple attribute techniques either directly ask the decision maker for an assessment of the strengths of these preferences or they infer them from his/her past choices, while all attributes are evaluated simultaneously.

• Multiobjective decision making relates to techniques that synthesize a set of alternatives which optimize or ‘best satisfy’ the set of mathematically prescribed objectives (or goals) and constraint functions of the decision maker(s). MODM problems involve the design of the ‘best alternative’ by considering the trade–offs within a set of interacting design constraints. They assume continuous solution spaces, i.e. the number of alternatives is effectively infinite and the trade–offs among design objectives are typically described by continuous functions. Multicriterial optimization problems fall under the heading of MODM (Stadler, 1988). That is, optimization will be performed to maximize or minimize the associated objective(s), and the final selected solution is a design with the best values of the objective(s). Each optimization problem can be divided into two parts: the set of functions to be optimized (minimized or maximized), i.e. the objectives, and the set of functions to be satisfied in terms of their predetermined values, i.e. the constraints.


In general, the objectives are often conflicting, so the optimal solution is usually a compromise that can best satisfy the different objectives simultaneously. The inverse mapping implied in this design class is entangled with complex mathematical problems, which have led to different methods tailored to the characteristics of the objective and constraint functions of the problem at hand. Long experience with MODM has shown that during the design process a number of design alternatives should be investigated, each requiring the execution of nonlinear programming modules with sophisticated convergence checks, linearization techniques, etc. A general multiobjective optimization problem is to find the vector of design variables x = (x_1, x_2, . . . , x_n)^T which minimizes/maximizes a vector of objective functions f(x) = [f_1(x), f_2(x), . . . , f_k(x)]^T over the feasible design space, subject to a set of constraints g_r(x) ≤ 0 (a minimal scalarization sketch is given after this list).
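
One common way to turn such a vector problem into a sequence of scalar ones is weighted–sum scalarization. The sketch below is only an illustration under assumed objective and constraint functions (none of them come from the text); sweeping the weights traces out approximately Pareto–optimal trade–off points.

```python
# Minimal sketch of weighted-sum scalarization for a two-objective MODM problem:
# minimize f1(x) and f2(x) subject to g(x) <= 0. All functions are hypothetical.
import numpy as np
from scipy.optimize import minimize

def f1(x):            # e.g. a cost-like objective (assumed form)
    return (x[0] - 1.0) ** 2 + x[1] ** 2

def f2(x):            # e.g. a performance-related objective (assumed form)
    return x[0] ** 2 + (x[1] - 2.0) ** 2

def g(x):             # constraint g(x) <= 0, rewritten as -g(x) >= 0 for SLSQP
    return x[0] + x[1] - 3.0

def scalarized(x, w):
    # Weighted sum: one scalar objective built from the vector of objectives
    return w[0] * f1(x) + w[1] * f2(x)

trade_offs = []
for w1 in np.linspace(0.0, 1.0, 11):
    w = (w1, 1.0 - w1)
    res = minimize(scalarized, x0=np.zeros(2), args=(w,), method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda x: -g(x)}])
    trade_offs.append((round(f1(res.x), 3), round(f2(res.x), 3)))

print(trade_offs)   # each weight vector yields one (approximately) Pareto-optimal point
```

Weighted sums cannot reach nonconvex parts of the Pareto frontier; goal programming or constraint–based scalarizations are then used instead.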

In actual practice this classification is well fitted to the two facets of design solving: MADM is for design selection of the ‘best possible solution’ among a finite number of alternatives, whereas MODM is for design optimization of the ‘best possible solution’.

In this respect, MADM methods are best conceived for the concept design phase, whose final decisions (top–level specifications) become constraints for the subsequent design phases. By contrast, MODM methods, mostly based on goal programming, are more oriented to supporting decision making in basic design, since it presupposes some direct computations (Mistree et al., 1991; Sen, 1992; Ray and Sha, 1994; Lee and Kim, 1996). Although the goal programming technique provides a way of striving towards several objectives simultaneously, the sequential nature of its procedure implies that the various objectives have to be ranked in a strict hierarchy (Smith, 1992), thus losing the possibility of really considering all objectives simultaneously.

The main distinctions between MADM and MODM are enumerated in Table 3.1, according to Yoon and Hwang (1981).

Elements       MADM                MODM
--------       ----                ----
Criteria       Attributes          Objectives/goals
Objectives     Implicit            Explicit
Attributes     Explicit            Implicit
Alternatives   Finite number       Infinite number
Application    Design selection    Design optimization

Table 3.1. MADM vs. MODM

Nevertheless, MADM and MODM approaches should not be thought of as alternative methodologies but as complementary within a rational design strategy. When dealing with multiattribute and multiobjective decisions, a combination of methods is often more effective than a single technique.


3.3.4 Properties of Attributes/Objectives

Identifying attributes/objectives is an important activity in structuring the decision–making process. Fundamental attributes are organized into a hierarchy in which the lower levels of the hierarchy explain what is meant by the higher levels.

To provide the means to measure accomplishment of the fundamental attributes, the concept of attribute scales is introduced. Some attribute scales are easily defined; others are more difficult. For example, there is no obvious way to measure risks related to aesthetic aspects.

To adequately represent the properties of a technical system, it is important in any decision problem that the set of attributes have appropriate characteristics: to be complete, so that they cover all the important aspects of the problem; to be minimal, so that the number of attributes is kept as small as possible; to be decomposable, so that the evaluation process can be simplified by breaking it down into parts; to be workable, so that they can be meaningfully used in the analysis; and to be non-redundant, so that double counting of their impacts can be avoided. An encapsulation of the essential criteria follows:

Completeness. A set of attributes/objectives is complete if it includes all relevant aspects of a decision problem and is adequate in providing the decision maker with a clear picture of the degree to which the overall goal is met. This condition should be satisfied when the lowest–level properties in a hierarchy include all areas of concern in the problem at hand and when the individual attributes associated with each of the lowest–level properties in this hierarchy satisfy the comprehensiveness criterion. The fact that important attributes are missing can be indicated by reluctance of the decision maker to accept the results of an analysis or simply by the feeling that something is missing. If the results ‘just don’t feel right’, the decision maker has to ask himself/herself what is wrong with the alternatives that the analysis suggests should be preferred.

Minimum Size. At the same time, the set of attributes (objectives) should be as small as possible. Too many attributes can be cumbersome and hard to grasp. Furthermore, each attribute/objective should differentiate the available alternatives: if all the alternatives are almost equivalent with regard to a particular attribute, then that attribute will not be of any help in the decision–making process. In some problems, it is possible to combine attributes and thus reduce the dimensionality. The decision makers often want to fulfill conflicting attributes/objectives and, since this is an ideal which cannot be achieved, they must make use of multicriterial methods.

Non-redundancy. The final set of fundamental attributes/objectives should not be redundant. That is, the same attributes should not be repeated in the hierarchy, and the attributes should not be closely related. One way in which redundancies enter a set of attributes is when some attributes require variables that are inputs to a system while others require variables that are outputs. One example of such a problem is the evaluation of space vehicles: an input might be ‘weight’ and an output might be the ‘thrust’ required to break out of the earth’s gravitational field. Weight may only be important because of its implications on thrust.


Decomposability. As far as possible, the set of attributes should be decomposable. A formal decision analysis requires the possibility of quantifying both the decision makers’ preferences and their judgments about uncertain events. For a problem with n attributes, this means assessing an n-attribute utility function as well as joint probability distributions for the relevant uncertainties. Because of the complexity involved, these tasks will be extremely difficult for decision problems in which the dimensionality n is even modestly high, unless the set of attributes is made decomposable.

Workability. Attribute scales must be workable, that is, they should provide an easy way to measure the performance of the alternatives or the outcomes on the fundamental attributes. The attributes must be meaningful to the decision makers, so that they can understand the implications for the design alternatives. The decision makers must be aware of the many non-technical problems that may render a set of attributes non-workable.

3.3.5 Typology of MCDM Models

Quite naturally, different researchers have proposed different decision–making typologies, which reflect their own biases. Any typology thus reflects an individual interpretation of the world of MCDM models. The main dimensions of a possible typology are:

• the nature of the outcomes: deterministic versus stochastic;
• the nature of the alternative–generating mechanism, i.e. whether the constraints limiting the alternatives are explicit or implicit.

These dimensions are indicated in Table 3.2. The left–hand column includes the implicit constraint models. When the constraints are implicit, or explicit and non–mathematical, the alternatives must be explicit; one of a list of alternatives is then selected.

                         Implicit Constraints           Explicit Constraints
                         (Explicit Solutions)           (Implicit Solutions)

Deterministic Outcomes   Choosing among deterministic   Deterministic mathematical
                         discrete alternatives          programming

Stochastic Outcomes      Stochastic decision            Stochastic mathematical
                         analysis                       programming

Table 3.2. A multicriterial decision method typology

The decision analysis problem is included in the implicit constraint category. When the constraints are explicit and mathematical, the alternative solutions are implicit and may be infinite in number if the design space is continuous and consists of more than one solution. Problems in the explicit constraint category are generally regarded as mathematical programming problems involving multiple criteria.

More dimensions may be added to this typology. In addition to implicit versus explicit constraints and deterministic versus stochastic outcomes, other dimensions can be identified as well. The number of decision makers may be taken as a dimension: one decision maker versus two or more decision makers. One may also classify by the number of objectives, the nature of the utility function considered, as well as the number of solutions found.

3.4 Nondominance and Pareto Optimality

Since good values of some criteria inevitably go with poor values of others, the goal of MCDM is to find the ‘best compromise’ solution, which has the best overall performance in satisfying all the attributes. This ‘best compromise’ solution can be obtained from a set of design alternatives referred to as the efficiency frontier or Pareto–optimal set. All these solution sets consist of points having a simple and highly desirable property, i.e. nondominance.

A point in a set is nondominated if there is no other feasible point at which the same or better performance could be achieved with respect to all criteria, with at least one criterion being strictly better.

The nondominance solution concept, originating with Pareto (1906), has been one of the cornerstones of traditional economic theory. It is usually stated as the Pareto optimality principle: a solution B is dominated by solution A if, by moving from B to A, at least one attribute (objective function) is improved while the others are left unchanged. A design solution is nondominated if there is no other feasible solution which would improve at least one attribute (objective function) and not worsen any other.

The definition of Pareto optimality indicates that there is no other feasible solution in the design space which has the same or better performance than the Pareto–optimal solution considering all criteria; the Pareto–optimal solution does not necessarily have the best performance on each individual criterion (Zeleny, 1982). It is clear that the Pareto–optimal solution is a nondominated solution, which is achieved when no criterion can be improved without simultaneous detriment to at least one other criterion. The locus of the Pareto–optimal solutions is known as the Pareto frontier. A two–dimensional Pareto frontier is illustrated in Figure 3.5 for ‘smaller is better’ criteria.

The set of nondominated solutions is often referred to in the literature as the ‘efficient set’, the ‘admissible set’, the ‘noninferior set’, the ‘Pareto–optimal set’, etc. The term ‘nondominated’ should be preferred because of its clear, unambiguous meaning and because it best describes what such points really are: not dominated by other points.

It is useful to express nondominance in terms of a simple vector comparison. Let x and y be two vectors of n components, x_1, . . . , x_n and y_1, . . . , y_n, respectively. Thus

\[ x = \{x_1, \ldots, x_n\} \qquad \text{and} \qquad y = \{y_1, \ldots, y_n\} \]

One can say that x dominates y if x_i ≥ y_i (i = 1, . . . , n) and x_i > y_i for at least one i; more compactly, comparing x and y directly, x dominates y if x ≥ y and x ≠ y.
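
The vector comparison above translates directly into a small filtering routine. The sketch below, with hypothetical two–dimensional points and all components assumed of the ‘larger is better’ type, returns the nondominated subset of a finite set of alternatives.

```python
# Minimal sketch: extracting the nondominated set from a finite set of points,
# assuming every component is of the 'larger is better' type (hypothetical data).
def dominates(x, y):
    """True if x dominates y: x_i >= y_i for all i and x_i > y_i for at least one i."""
    return all(xi >= yi for xi, yi in zip(x, y)) and any(xi > yi for xi, yi in zip(x, y))

def nondominated(points):
    """Return the points of the set that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

points = [(2.0, 9.0), (4.0, 7.0), (3.0, 6.0), (6.0, 4.0), (5.0, 4.0), (7.0, 1.0), (1.0, 2.0)]
print(nondominated(points))   # -> [(2.0, 9.0), (4.0, 7.0), (6.0, 4.0), (7.0, 1.0)]
```

The quadratic pairwise comparison is perfectly adequate for the small discrete sets typical of concept design; for large populations, as in evolutionary multiobjective optimization, faster nondominated–sorting schemes are used instead.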


Figure 3.5. Two–dimensional Pareto frontier

Assume that x belongs to a set of feasible solutions or feasible design alternatives, designated X. Then x is nondominated in X if there exists no other point x′ in X such that x′ ≥ x and x′ ≠ x.

The set of all nondominated solutions in X is designated N. The main property of N is that, for every dominated solution (i.e., a feasible solution not in N), the decision maker can find a solution in N at which no vector components are smaller and at least one is larger. Figure 3.6 provides some graphic explanation of the above concepts. The feasible set X, the shaded area in the two–dimensional space of points x = {x_1, x_2}, consists of feasible combinations of x_1 and x_2.

Observe that the point x in X is dominated by all points in the shaded subregion of X, indicating that the levels of both components can be increased simultaneously. Only for points in N does this subregion of improvement extend beyond the boundaries of X into the infeasible region. Thus the points in N are the only points satisfying the given definitions, and they make up the heavy boundary of X. All other points of X are dominated.

Figure 3.6. Set of nondominated solutions


Finding N on X is one of the major tasks of multiattribute methods and multiobjective programming. At this point some comments should be made about the usefulness of nondominated solutions. Among the advantages of the dominance concept the following are relevant:

1. Multiple attributes/objectives are often incommensurate, both quantitatively and qualitatively, and carry different weights of importance. This leads to a complex problem of trade–off evaluation using the decision maker’s utility or preference function. Reliable construction of a utility function may, however, be too complex, unrealistic, or impractical. The set of nondominated solutions then provides a meaningful step forward under such conditions of relative ignorance.

2. If more is always preferable to less, then any solution which maximizes the utility function of a rational decision maker must be nondominated: if more is preferred to less, then only higher or equal utility may be derived from increased levels of the corresponding attributes or criteria of choice. Such a utility function is said to be nondecreasing in its arguments; that is

\[ U(x_1 + \Delta_1,\, x_2 + \Delta_2) \ge U(x_1, x_2) \qquad \text{for } \Delta_1, \Delta_2 \ge 0 \]

Thus, regardless of the specific mathematical form of U, one knows that its maximum will be reached at a nondominated point.

3. If N consists of only a relatively small number of solutions or alternatives of choice, there is no need to search for the decision maker’s utility function. Consequently it makes sense to explore X and characterize its N before engaging in the assessment of U. It is not wise to gather and process all the information needed for utility assessment without first finding the approximate size of N. It is even possible that a single conflict–free alternative will emerge, such as the one shown in Figure 3.7.

Figure 3.7. Conflict–free solution

Observe that in this case N consists of a single point only, that such a point will always be the choice under the assumption of nondecreasing utility functions, and that an assessment of U for this particular X would constitute an effort of considerable redundancy.


4. The set of nondominated alternatives can be useful in dealing with more complicated types of X; for example, discrete point sets or nonconvex sets of feasible alternatives. Figure 3.8a shows seven distinct alternatives. The nondominated ones are indicated by heavy dots. Observe that only points 3 and 6 are dominated by some other available points, while the nondominated set comprises points 1, 2, 4, 5, and 7. In Figure 3.8b observe that the nondominated boundary is not necessarily continuous, especially in the presence of gaps in X (nonconvex cases of X). Both these cases are more difficult to handle analytically.

Figure 3.8. Nondominance on a discrete point set (a) and on a nonconvex set (b)

A careful review of the figures displaying two–dimensional nondominated sets reveals that a nondominated solution is a feasible solution for which an increase in the value of any one criterion can be achieved only at the expense of a decrease in the value of at least one other criterion. This definition leads naturally to the concept of value trade–offs: how much achievement with respect to criterion 1 is the decision maker willing to sacrifice in order to gain a particular achievement with respect to criterion 2? The nondominated boundary is sometimes characterized as the ‘trade-off curve’.

3.4.1 Key Concepts and Notation

Recall that x = {x^1, x^2, . . . , x^m} denotes the set of feasible alternatives and that each alternative is characterized by n attributes. For example, the kth design alternative can be written as

\[ x^k = \left(x^k_1,\, x^k_2,\, \ldots,\, x^k_n\right), \qquad k = 1, 2, \ldots, m \]

where x^k_i designates the level of attribute i attained by alternative k, with i = 1, . . . , n and k = 1, . . . , m.

Thus, x^k is simply a vector of n numbers, assigned to each x^k and summarizing the available information about x^k in terms of incommensurable, quantitative and qualitative, objective and subjective, attributes and criteria. What is often called a ‘multiattribute alternative’ in decision theory has thus been established.


Look now at the ith attribute in isolation. The vector

\[ x_i = \left(x^1_i,\, x^2_i,\, \ldots,\, x^m_i\right) \]

represents the currently achievable scores or levels of the ith attribute. Their simplest interpretation occurs when it is assumed that more is always preferred to less (or vice versa). Because

\[ \min_k x^k_i = -\max_k \left(-x^k_i\right), \qquad k = 1, 2, \ldots, m \]

i.e., finding the minimum of the m numbers is identical to finding the maximum of these numbers taken with negative signs, one shall agree to treat both cases as maximization.

Among all achievable scores for any ith attribute (see vector x_i), there is at least one extreme or ‘ideal value’ that is preferred to all others. It can be called an ‘anchor value’, denoted x*_i, and written as

\[ x^*_i = \max_k x^k_i\,, \qquad i = 1, 2, \ldots, n \]

with the understanding that the above ‘max’ operation is only a simplification, since both maximum and ideal values are included in the concept of an anchor value.

The simple notion of anchor dependency is introduced to reflect the conditions of choice discussed above: a set of attributes is anchor dependent if the degrees of closeness computed within the set depend on the corresponding anchor values as well as on the degrees of closeness associated with other attributes in the set.

The set of all such ‘anchor values’ is called the ‘ideal alternative’ or the ‘ideal’, denoted as

\[ x^* = \left(x^*_1, \ldots, x^*_n\right) \]

The ‘ideal’ plays a prominent role in decision making. Suppose, for example, that there exists x^k in x such that x^k ≡ x*; then the ideal is attainable by the choice of x^k, and there is no decision to be made. Any conceivable (but rational) utility function defined over an n–tuple of numbers (x_1, . . . , x_n) would attain its maximum value at x* and consequently at x^k. The ideal is, however, not feasible in general, or, if feasible, it soon becomes infeasible as soon as the decision maker raises the aspiration for just one x_i.
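
As a small numerical illustration of the anchor values and of the role of the ideal in compromise seeking (anticipating the displaced–ideal idea of the next section), the sketch below computes x* attribute by attribute and then ranks hypothetical alternatives by a normalized distance from the ideal. The data and the choice of a Euclidean closeness measure are assumptions made only for illustration.

```python
# Minimal sketch: the ideal alternative x* and the closeness of feasible alternatives to it.
# Attribute levels are hypothetical and all attributes are treated as 'larger is better'.
import numpy as np

# Rows = alternatives x^k, columns = attribute levels x^k_i
X = np.array([
    [0.70, 0.90, 0.40],
    [0.85, 0.60, 0.55],
    [0.60, 0.75, 0.80],
])

x_star = X.max(axis=0)            # anchor values x*_i = max_k x^k_i, i.e. the 'ideal'
span = x_star - X.min(axis=0)     # attribute ranges, used to normalize the distances

# Normalized Euclidean distance of each alternative from the ideal (one possible closeness measure)
d = np.sqrt((((x_star - X) / span) ** 2).sum(axis=1))
order = np.argsort(d)             # smaller distance = closer to the ideal

print("ideal x*:", x_star)
print("compromise ranking (best first):", ["x^%d" % (k + 1) for k in order])
```

If one alternative happened to coincide with x*, its distance would be zero and, as noted above, no real decision would remain; as soon as the ideal is infeasible, ranking by closeness becomes a natural compromise–seeking device.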

3.5 Theory of the Displaced Ideal

The theory of the displaced ideal has evolved from ideas that were floating around the MCDM scientific community for some years. Its main concept, the ideal solution, has been disguised under many different labels, and the exposition of this concept has often been indirect, through a large variety of working papers, theses and articles. The idea seems to possess the exciting and elegant quality of a paradigm. The appearance of the concept of the ideal solution seems to be due to parallel searches in the early sixties for an approach to multiobjective conflict resolution. The idea was temporarily abandoned in favor of the nondominated–solutions concept.


The concept of the displaced ideal was briefly introduced by Geoffrion (1967) as the ‘perfect solution’. It was originally conceived as a technical artifact, a fixed point of reference, facilitating the choice of a compromise solution. The first fully operational use of the concept occurred in the linear multiprogramming methodology of Saska (1968). The ideal solution soon became known under the term ‘movable target’. Zeleny (1974) introduced the concept of the compromise set and developed the method of the displaced ideal. Sequential displacements of the ideal solution also form the basis for the evolutive target procedure introduced by Roy (1977).

The concept appears to be general enough to encompass problems involving multiple decision makers as well. Some initial thoughts on this possibility were advanced by Yu (1973), who used the term ‘utopia point’. It is a widespread opinion that the concept of the ideal solution and its displacement represent more than a convenient technical tool: it is a hypothesis about the rationale underlying human decision–making processes. As such it deserves a full axiomatic development, empirical testing, and interdisciplinary cross validation.

3.5.1 Measurement of Preferences

The first important prerequisite is to acquire some understanding of the measurement scales of human preferences, utilities, and subjective probabilities. All three notions are crucial for the so–called von Neumann–Morgenstern utility theory and its major normative dictum: maximize expected utility. This ‘golden rule’ stands rather firmly at the core of modern decision analysis and its most lively derivative, multiattribute utility theory (MAUT); see Section 3.7.

Two basic ways of measuring preferences can be expressed by the notions of ordinal and cardinal scales. Ordinal scales are purely relational; designs are rank–ordered, and no other meaningful numerical properties can be assigned to them. One can say only that design A is preferred to B, that A is equal to B, or that B is preferred to A, but one cannot say by how much; the strength of preference is not apparent from ordinal scales.

Ordinal scales can be expressed through numerical or verbal rankings, i.e., 1, 2, 3, 4, etc., or ‘bad’, ‘average’, ‘good’, ‘excellent’, etc. A special case of an ordinal scale would be a Boolean variable, i.e., assigning 1 to preference and 0 otherwise. Ordinal numbers are those for which the differences between them are meaningless. That is, if 7 − 5 ≠ 4 − 2, then all algebraic manipulations of such numbers are meaningless as well, and the numbers can be replaced by an ordinal ranking.

Cardinal scales do assign meaningful numerical values (numbers, intervals, ratios, etc.) to the designs in question. Differences between cardinal numbers are meaningful; for example, 7 − 5 = 4 − 2, and addition, subtraction and multiplication by a constant are allowable operations.

Cardinal scales can be further divided into interval and ratio scales. Interval scales are characterized by an arbitrary zero point, so that only addition, subtraction and multiplication by a constant are well defined; the Fahrenheit and Celsius scales for measuring temperature are typical examples. Ratio scales are characterized by a nonarbitrary zero point, as for example the Kelvin temperature scale. Here multiplication of the scale values is also allowed, i.e., the ratios of individual scale values have meaning.


Observe that cardinal ordering would be meaningless unless the interval 0 to 1 were specified. Without such reference points, or anchor points, the decision maker could not make any sense out of the intensities of preference.

Although anchor points can be chosen arbitrarily, there are some choices that are better than others, and often an anchor point is implied uniquely and unequivocally by a given physical situation. It will be argued later that, especially in design and the related assessment of preference, reference designs are not selected arbitrarily but are characterized by distinct desirable properties. Even ordinal scales can be anchored, i.e., furnished with convenient reference points.

3.5.2 Traditional Utility Approach

Some relevant concepts of utility theory are briefly anticipated here to better understand the advantages of the ideal point concept. Consider the preference space in Figure 3.9. Both axes x and y may represent a number of things: attribute scores, criteria levels, preferences of two different individuals, and so on. Maximum utility is achieved at M. Obviously, M is preferred to all points on lower indifference curves, that is, M > I_n > ... > I_2 > I_1; toward points on the same curve, like A and B, the decision maker is assumed to be indifferent, that is, A ≈ B, where the symbol ≈ indicates indifference. In the absence of any availability constraints, point M would always be the choice; no conflict is present and no decision making is needed.

Morgenstern (1972) criticizes the indifference–curve analysis as introduced above. If x and y denote respective amounts of goods in one’s possession, then one could move from B to M directly by disposing of excess amounts of x and y. One cannot similarly go from A to M. It is thus difficult to maintain indifference between A and B; actually B > A. Similarly, F > A, A > E, etc. It turns out that indifference curves seem to be valid only in the shaded subregion of Figure 3.9.

Most utility theory assumes that all alternatives are comparable in the sense that, given any two alternatives, either one is strictly preferred or the two are seen as being preferentially equivalent. If the decision maker is presumed not to be able to express the strength of his/her preference, as is assumed in the ordinal utility model, then also the notion of indifference, which is the extreme and most precise expression of preference intensity (i.e., the one of intensity zero), becomes difficult or impossible to assess explicitly.

If the decision maker does not strictly prefer one alternative to another, the absence of strict preference should not imply indifference. As Roy (1977) emphasizes, certain pairs of alternatives are noncomparable because the decision maker (i) does not know how to, (ii) does not want to, or (iii) is not able to compare them. To confound such noncomparability with indifference represents a considerable simplification of the decision–making process. In Figure 3.9 the decision maker is not expected to be able to state how much G is preferred to C (only that G > C), and yet he/she is assumed to be quite capable of stating how much C is preferred to E. The decision maker is assumed to be able to determine the indifference between C and E with absolute precision.


Figure 3.9. Unconstrained utility space

3.5.3 Ideal Point

Coombs (1958) assumes that there is an ideal level of attributes for candidate designs and that the decision maker’s utility decreases monotonically on both sides of this ideal point. He shows that probabilities of choice depend on whether compared alternatives lie on the same side of the ideal point, or whether some lie on one side of the ideal and some on the other.

In technologically constrained situations, as in Figure 3.10, attainment of M becomes an unrealistic goal. The set of available alternatives is severely limited by the production–possibility boundary P. A conflict between what is preferable (the ends) and what is possible (the means) is thus established, and a decision–making process may take place. Because M is not a clearly defined point or a crisply delineated region but rather a fuzzy cloud of preferred levels of attributes, the conflict is perceived by decision makers only as a fuzzy sense of conflict.

As the decision maker attempts to grasp the extent of the emerging conflict between means and ends, he/she explores the limits attainable with each important attribute. The highest achievable scores on all currently considered attributes form a composite, the ideal alternative x*. Figure 3.10 shows both M and x*. Whereas M is almost always too difficult to identify, x* is easier to conceptualize because all its characteristics can be directly experienced and derived from the existing alternative choices. These individual attribute maxima can be found, quantified, and made fully operational. Point x* serves as a good temporary approximation of M in decision making.

The general infeasibility or nonavailability of x* creates a predecision conflict and thus generates the impulse to move as closely as possible toward it. Because of the conflict experienced, the decision maker starts searching for new alternatives, preferably those which are the closest to the ideal one.


It should be noted that if such an ideal alternative is created, that is, if point x* becomes feasible, then there is no need for further continuation of the decision process. The conflict will have been dissolved, and x* will automatically be selected, since it is unquestionably the best of the currently available choices, provided that the set of alternatives is technologically closed.

Figure 3.10. Constrained utility space

Note that, in contrast, the ideal point x* can be and frequently is displaced. It is responsive to changes in the available set of alternatives, objectives, evaluations, measurements, and even errors. It responds to new technological advances, inventions, and discoveries of oversights. It becomes a moving target, a point of reference which provides an anchor for the dynamic adjustment of preferences.

At this point, the axiom of choice can be stated: alternatives that are closer to the ideal are preferred to those that are farther away. To be as close as possible to the perceived ideal is the rationale of human choice.

The fuzzy language employed in the axiom of choice - ‘as close as possible’, ‘closer’, ‘farther’, etc. - reflects the reality of the fuzziness of human perception and preferences. Before engaging in further elaboration of the axiom, it is proper to clarify a few minor points. It is quite obvious that ‘preference’ can be expressed as an ‘as far as possible’ concept as well, employing an anti–ideal as the point of reference. It can be shown that the two concepts are closely interrelated and complementary.


3.5.4 Fuzziness and Precision

It is straightforward to explore the case of a single attribute first, mainly to emphasize its inclusion as a special case of the displaced ideal theory. Given that the anchor value of a single attribute has been successfully located, the decision problem is trivial: choose the anchor value. Construction of a utility function seems superfluous. In order to express the intensities of preference for all alternatives (especially if a selection of multiple alternatives is intended) and to demonstrate the use of the axiom of choice in this special case, a cardinal analysis is essential.

Since the ideal point and the anchor value are now identical, the alternatives close to x_i^* are preferred to those farther away. Consider the following: three different alternatives are to be evaluated with respect to a single, simple attribute, say ‘euro return’. For example, a three–dimensional vector of returns might describe the alternatives: (5, 10, 100). Obviously the first two values are quite far from 100, with 10 being a little closer than 5. Observe that 100 is the anchor value and, in this case, the ideal. Assume that the lucrative third alternative has turned out to be infeasible and was replaced by a new alternative, thus generating a modified vector (5, 10, 11). This change in the anchor value has also caused 10 to be much closer to the ideal than 5. The difference between 5 and 10 has changed from negligible to substantial.

There are two important points made by this example: the levels of preference change with the situation, and they are expressed in fuzzy terms. It is suitable, therefore, to employ the linguistic approach developed by Zadeh (1973). The essence of the linguistic approach is best captured in Zadeh’s principle of incompatibility. He states that ‘... as the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance can no longer coexist’ (Zadeh, 1973).

The complexity of human preferences is unquestionable, and it is amplified further by the dominant role of judgment, perception, and emotions. Creating units of measurement for preferences may allow a precise mathematical treatment, but it diminishes understanding of human preferences. The key elements in human thinking are not numbers but labels of fuzzy sets, i.e., classes of objects in which the transition from membership to nonmembership is gradual rather than abrupt.

For example, to designate a color by a natural linguistic label, such as ‘red’, is often much less precise than to apply the numerical value of the appropriate wavelength. Yet it is far more significant and useful in human affairs. Similarly, people tend to assign a linguistic rather than a numerical value to the intensity of their choice preferences. In order to amplify the relationship between fuzziness and precision in human deliberations, the labelling of colors is elaborated in the following short digression. Fuzzy linguistic labels, rather than ‘precise’ numerical measurements, are often successfully used by large numbers of people with no apparent handicap or uncertainty. A typical example concerns our way of defining colors.

It would be most precise, a scientist might argue, to associate each color with its particular wavelength, as registered by the human retina, and to measure it in angstroms to as many decimal places as desired. Yet nobody has ever suggested such nonsense, even though the wavelengths of colors are in principle measurable.


The reason for using linguistic labels to designate colors is the need for a system which is acceptable and usable in science, sufficiently broad for art and industry, and sufficiently familiar to be understood by the public. Such diverse needs are well met by the Munsell color system. According to this system, each color can be described in terms of three basic attributes: hue, lightness, and saturation. Hue names can be used as both nouns and adjectives: ‘red’, ‘reddish orange’, ‘orange’, ‘orange yellow’, ‘yellow’, etc.; black, gray, and white are included as neutral terms. The terms ‘light’, ‘medium’, and ‘dark’ designate decreasing degrees of lightness. The adverb ‘very’ then extends the lightness scale from ‘very light’ to ‘very dark’. Finally, the increasing degrees of color saturation are labelled with the adjectives ‘grayish’, ‘moderate’, ‘strong’, and ‘vivid’. Additional adjectives cover combinations of lightness and saturation: ‘brilliant’ for light and strong, ‘pale’ for light and grayish, and ‘deep’ for dark and strong. Combining the agreed-upon linguistic labels, one can specify about 267 visually distinguishable colors, for example, vivid purple, brilliant purple, very light purple, very pale purple, very deep purple, very dark purple, but also dark grayish purple, very light purplish gray, and strong purplish pink. Most of these colors can be recognized, and their differences remembered, by scientists, artists, professionals, and the public alike.

The Munsell system also fixes the boundaries of each color name. These boundaries are then translated into numerical scales of hue, lightness, and saturation, and each color can thus be expressed as accurately as desired.

Definition. A fuzzy subset A of a set of objects U is characterized by a membership function f_A which associates with each element x of U a number f_A(x) in the interval [0, 1], representing the grade of membership of x in A.

This definition will be used to exemplify the meaning of ‘as close as possible’ in the axiom of choice. Consider the vector x_i of available scores of the ith attribute over m alternatives. The degree of closeness of x_i^k to x_i^* is defined as

d(x_i^k, x_i^*) = U_i^k

where U_i^k = 1 if x_i^k = x_i^*, and otherwise 0 ≤ U_i^k ≤ 1.

Degrees of closeness U_i^k are not of great value in the case of a single attribute. The transitivity of preferences is preserved along a single dimension, and the ordinal ranking of alternatives is not influenced by changes and adjustments in the degrees of closeness.


3.5.5 Membership Functions

Essentially, the ith attribute’s scores are now viewed as a fuzzy set, defined as the following set of pairs

{x_i^k, U_i^k}     i = 1, ..., n ;  k = 1, ..., m

where U_i^k is a membership function mapping the scores of the ith attribute into the interval [0, 1].

For example, the scores generated by the available alternatives might be labelled with respect to the ideal as ‘close’, ‘not close’, ‘very close’, ‘not very close’, ‘distant’, ‘not distant’, ‘not very distant’, ‘not close and not distant’, etc.

The membership function of a fuzzy set can be defined by a fuzzy recognition algorithm, a procedure suggested by Zadeh (1973). At this stage it is enough to simply introduce a few plausible functions yielding the degree of closeness to x_i^* for individual alternatives:

1. If x_i^* is a maximum, then

   U_i^k = x_i^k / x_i^*

2. If x_i^* is a minimum, then

   U_i^k = x_i^* / x_i^k

3. If x_i^* is a feasible attribute value, where x_i^* is preferred to all x_i^k smaller and larger than x_i^*, then

   U_i^k = [ ½ (x_i^k / x_i^* + x_i^* / x_i^k) ]^(−1)

4. If the most distant feasible score is labelled by zero regardless of its actual closeness to x_i^*, one can define

   x_{i*} = min_k x_i^k

   and write U_i^k as

   U_i^k = (x_i^k − x_{i*}) / (x_i^* − x_{i*})

The above four functions U_i^k indicate that x^j is preferred to x^k when U_i^k < U_i^j; a minimal numerical sketch of these functions is given below.
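The short Python sketch below (illustrative only; the function names and the sample scores are my own) implements the four fuzzy recognition algorithms just listed and evaluates two of them over a small vector of hypothetical attribute scores.

```python
def closeness_max(x, x_star):
    """Type 1: the ideal x* is the attribute maximum."""
    return x / x_star

def closeness_min(x, x_star):
    """Type 2: the ideal x* is the attribute minimum (e.g., a cost)."""
    return x_star / x

def closeness_interior(x, x_star):
    """Type 3: the ideal x* is a feasible interior value, preferred to smaller and larger scores."""
    return 1.0 / (0.5 * (x / x_star + x_star / x))

def closeness_normalized(x, x_star, x_anti):
    """Type 4: the anti-ideal x_* is mapped to zero and the ideal x* to one."""
    return (x - x_anti) / (x_star - x_anti)

scores = [5, 10, 100]                       # hypothetical scores of three alternatives
x_star, x_anti = max(scores), min(scores)   # ideal and anti-ideal of this attribute

for x in scores:
    print(closeness_max(x, x_star),                  # 0.05, 0.1, 1.0
          closeness_normalized(x, x_star, x_anti))   # 0.0, 0.0526..., 1.0

print(round(closeness_min(10, 5), 3))       # 0.5: a cost of 10 against an ideal (minimum) cost of 5
print(round(closeness_interior(10, 15), 3)) # 0.923: score 10 against an interior ideal of 15
```

The higher the degree of closeness, the nearer the alternative is to the ideal on that attribute.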

To gain a proper numerical grasp of the functions U_i^k introduced so far, evaluate a simple vector of ten numbers with respect to their distances from different anchor values x_i^* (and x_{i*}), as shown in Figure 3.11, where U_i^k is denoted d_i^k.


Figure 3.11. A simple fuzzy recognition algorithm

That implies that the preference ordering among available alternatives is transitive with respect to a single attribute.

3.5.6 Multiattribute Dependency

Focus on X, the set of all initially feasible alternatives. Each alternative k induces a particular vector x^k consisting of the scores attained with respect to all salient attributes. In this sense one can say that all attribute scores are fixed for a given alternative. That is, x_1^k comes only with x_2^k and not with any other value. The two scores x_1^k and x_2^k are not separable, and they both characterize a particular alternative in a vector sense.

Alternatives are usually characterized by multiple attributes, i.e., by vectors x^k = (x_1^k, x_2^k, ..., x_n^k), k = 1, ..., m. Independent attributes can be represented as in the table reported in Figure 3.12.

Figure 3.12. Matrix of alternatives and attributes

In each column the decision maker locates an anchor and then transforms the scores into the corresponding degrees of closeness, i.e., all x_i^k’s would be changed into U_i^k’s according to a particular membership function, for example, one of the four function types (fuzzy recognition algorithms) indicating the closeness of an attribute score to the ideal attribute. The question is now to determine how close the kth alternative is to the ‘anchor value’ along the ith attribute. There are n attributes for each alternative. If the decision maker were to assume independence among the individual columns of the given table, this approach would be quite straightforward. There is, however, usually some interdependence among the attributes, in the sense that the value of, say, U_1^k restricts or even determines the possible values of U_2^k, U_3^k, etc.

Assume that attributes are generally dependent on each other in a complex, dynamic, and highly subjective way. This subjective nature of attribute dependency makes an interaction between decision maker and model almost mandatory. To this end, some traditional notions of attribute dependency are now reviewed briefly, as they can be derived from the multiattribute utility literature.

Most theories of multiattribute utility first define strict independence conditions for a decision maker’s preferences over different levels of a given set of attributes while the levels of the remaining attributes are held fixed. It is often assumed that when the levels of the other attributes shift, the initially derived preferences stay unaffected. The two basic types of attribute dependency are value dependency and preferential dependency:

• Value dependency. A set of attributes is value–dependent (inter–attribute) if the measurement of numerical scores with respect to one attribute implies or restricts a particular attainment of scores by all other attributes of the set. Typical examples are water temperature and water density, cost and price, and size and weight.

• Preferential dependency. A set of attributes is preferentially–dependent (intra–attribute) on other attributes if preferences within the set depend on the levels at which the scores of the other attributes are fixed. For example, the preference for speed in a car depends on safety; the preference for size in a ship depends on harbors’ capacity; etc.

Note that value dependency and preferential dependency are themselves interdependent. That is, the scores of the attributes cannot be fixed at any particular level without simultaneously fixing all value–dependent attributes as well. Preferential changes are thus induced in response to different subsets of the value–dependent set. Similar interdependence exists across the alternatives. The problem lies in the proper specification of attributes and in the resulting increase in their number; such composite attributes are often difficult to quantify and even to conceptualize.

Traditionally, dependency has been treated as separable from a particular set of feasible alternatives. Thus, if the strength of preference for a given level of one attribute systematically changes with respect to all achievable levels along a second attribute, then all the conditional or parametric preference functions must be assessed a priori.

Instead of making an a priori assessment of attribute dependency, its impact is here implicitly incorporated into the dynamic process of partial decision making. As an alternative, say the kth, is removed from further consideration, the set of n attribute scores (x_1^k, ..., x_n^k) is removed as well. The initial evaluation is performed on a more or less complete set X, and the attribute interaction demonstrates itself only as the alternatives (and the corresponding attribute scores) are progressively removed.


The impact of removing an alternative k is essentially twofold:

• the variety and contrast of the currently achievable attribute scores are reduced;

• the ideal alternative can be displaced if the removed alternative contained at least one attribute anchor value.

Consequently, the removal of any alternative affects the ranking of the remaining alternatives in terms of their closeness to the ideal. It affects the relative importance of attributes as well. Finally, if the ideal is displaced, the actual distances of the remaining alternatives must also be recomputed.

Thus, all degrees of closeness shall be interactively adjusted each time an ideal value of an attribute is displaced. The question concerning the closeness of alternative k to the ith attribute anchor value, e.g. the ideal, can be viewed as a composite question. The answer to the composite question can be derived from the answers to its constituent questions. The multiattribute nature of this dependency, i.e., the manner in which the constituent questions are combined to form a composite question, is explored next.

3.5.7 Composite Membership Functions

The answer to the composite question represents the grade of membership of alternative k in the fuzzy set ‘as close as possible’, expressed numerically. Answering thus corresponds to assigning a value to the membership function. The answer set may be the unit interval [0, 1].

Assume that the set of feasible alternatives X has been mapped through the U_i^k’s into a ‘distance space’, where U_i^k represents the degree of closeness of x_i^k to x_i^*. Denote the space of all U_i^k’s generated by X as D.

Note also that the ideal alternative is now translated into a unitary vector U^* = (U_1^*, ..., U_n^*), because if

x_i^k = x_i^*   then   U_i^k = U_i^* = 1

To determine the degree of closeness of any x^k to x^* in terms of U^k and U^*, an appropriate family of distance membership functions is defined as follows

L_p(w, k) = [ Σ_{i=1}^{n} w_i^p (1 − U_i^k)^p ]^{1/p}

where w = (w_1, ..., w_n) is a vector of attribute preference levels w_i, and the power p represents the distance parameter, 1 ≤ p ≤ ∞. Thus L_p(w, k) evaluates the distance between the ideal alternative, with membership grade U^*, and the actual vector of degrees of closeness induced by an alternative with membership grade U^k.

Observe that for p = 1, and assuming Σ_i w_i = 1, L_p(w, k) can be written as

L_1(w, k) = 1 − Σ_{i=1}^{n} w_i U_i^k


Similarly, for p = 2, one obtains

L_2(w, k) = [ Σ_{i=1}^{n} w_i^2 (1 − U_i^k)^2 ]^{1/2}

and for p = ∞

L_∞(w, k) = max_i { w_i (1 − U_i^k) }

In order to appreciate the numerical differences between L_1(w, k), L_2(w, k), and L_∞(w, k), consider ten alternatives evaluated with respect to two attributes. Numerical values are given in the table reported in Figure 3.13.

Figure 3.13. Distance metrics (ten alternatives vs. two attributes)

Observe that x_1^* = x_1^{10} = 99 and x_2^* = x_2^8 = 15. Therefore, x^* = (99, 15) is the (infeasible) ideal point.

The following formulae

U_1^k = x_1^k / x_1^*     and     U_2^k = [ ½ (x_2^k / x_2^* + x_2^* / x_2^k) ]^(−1)

have been used for transforming attributes 1 and 2 into the distances from their respective anchor points, x_1^* = 99 and x_2^* = 15. For example,

U_1^6 = x_1^6 / x_1^* = 36/99 = 0.3636

and

U_2^6 = [ ½ (x_2^6 / x_2^* + x_2^* / x_2^6) ]^(−1) = [ ½ (10/15 + 15/10) ]^(−1) = 0.9231

Note that only x^8, x^9, and x^{10} are nondominated by any other alternative in the set of ten. Applying the three measures of distance, the decision maker derives the closeness of each x^k to x^*. Both attributes are assumed to be equally important, that is, w_1 = w_2 = 0.5.


For example,

L_1(w, 6) = 1 − (w_1 U_1^6 + w_2 U_2^6) = 1 − (0.5 × 0.363 + 0.5 × 0.923) = 0.357

L_2(w, 6) = [ w_1^2 (1 − U_1^6)^2 + w_2^2 (1 − U_2^6)^2 ]^{1/2} = [ 0.25 (1 − 0.363)^2 + 0.25 (1 − 0.923)^2 ]^{1/2} = 0.320

L_∞(w, 6) = max{ w_1 (1 − U_1^6); w_2 (1 − U_2^6) } = max{ 0.5 (1 − 0.363); 0.5 (1 − 0.923) } = 0.318

Observe that x^{10} is the closest to x^* with respect to L_1, while x^9 is the closest with respect to L_2 and L_∞. Both compromise solutions, x^9 and x^{10}, are encircled in the table of Figure 3.13.
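The brief Python sketch below (variable names are my own; only alternative no. 6 and the ideal point are taken from Figure 3.13) reproduces the three distance values computed above from the L_p family.

```python
import math

def lp_distance(weights, closeness, p):
    """Distance of an alternative from the ideal (where every degree of closeness equals 1)."""
    deviations = [w * (1.0 - u) for w, u in zip(weights, closeness)]
    if math.isinf(p):
        return max(deviations)                    # L-infinity: the largest weighted deviation
    return sum(d**p for d in deviations) ** (1.0 / p)

# Alternative no. 6: x^6 = (36, 10); ideal x* = (99, 15); equal weights w1 = w2 = 0.5
u1 = 36 / 99                                      # type-1 closeness on attribute 1
u2 = 1.0 / (0.5 * (10 / 15 + 15 / 10))            # type-3 closeness on attribute 2
w = [0.5, 0.5]

print(round(lp_distance(w, [u1, u2], 1), 3))          # 0.357
print(round(lp_distance(w, [u1, u2], 2), 3))          # 0.320
print(round(lp_distance(w, [u1, u2], math.inf), 3))   # 0.318
```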

Compromise Solution

Thus the closest alternatives to the ideal can be defined as those minimizing L_p(w, k) with respect to some p. If

min_k L_p(w, k)

is achieved at x^{k(p)}, then x^{k(p)} is called the compromise alternative with respect to p. Let C denote the set of all such compromise alternatives for p = 1, ..., ∞.

A number of interesting and useful properties are typical of compromise solutions:

• for 1 ≤ p < ∞, there is no x^k in X such that U_i^k ≥ U_i^{k(p)} for all i’s and U^k ≠ U^{k(p)}; that is, x^{k(p)} is nondominated. For p = ∞ it can be demonstrated that at least one x^{k(∞)} is nondominated.

• for 1 < p < ∞, x^{k(p)} (and U^{k(p)}) is the unique minimum of L_p(w, k) on X.

It can be shown that L_p(w, k) is a strictly increasing function of

L′_p(w, k) = Σ_{i=1}^{n} w_i^p (1 − U_i^k)^p

and thus x^{k(p)} minimizes L_p if and only if it minimizes L′_p. Note that L′_p(w, k) is a strictly convex function and thus yields a unique minimal point on X for 1 < p < ∞.

It is important to realize that the membership functions L_p and L′_p are not independent of a positive linear transformation of the individual degrees of closeness (Yu, 1973). For example, let d_i^k = α_i U_i^k, with α_i > 0, and d_i^* = α_i U_i^* = α_i. Then

L_p(w, k) = [ Σ_{i=1}^{n} w_i^p (d_i^* − d_i^k)^p ]^{1/p}

transforms into

L_p(w, k) = [ Σ_{i=1}^{n} w_i^p (α_i − α_i U_i^k)^p ]^{1/p} = [ Σ_{i=1}^{n} w_i^p α_i^p (1 − U_i^k)^p ]^{1/p}


Thus changing the scale of the degrees of closeness has the same effect as changing the preference levels w_i in L_p and L′_p.

The above observation suggests that the degrees of closeness are interrelated with the weights of importance. Their compounding effect must be clearly understood to avoid ‘double weighting’. Decision makers should concentrate on manipulating either the U_i^k or the w_i, and only exceptionally both. The assignment of a particular set (U_i^1, ..., U_i^m) already implicitly contains and reflects the importance of the ith attribute. It is necessary to understand how much the U_i^k reflect the underlying objective measurements and how much they are products of a subjective reinterpretation.

Before exploring the problem of weights in greater detail (see Chapter 4), it is advisable to gain some understanding of the distance parameter p. So far only p = 1, 2, ∞ have been used. Because the power 1/p may be disregarded, use L′_p and substitute ν_i = 1 − U_i^k:

L′_p(w, k) = Σ_{i=1}^{n} w_i^p ν_i^{p−1} (1 − U_i^k)

Observe that as p increases, more and more weight is given to the largest deviation (1 − U_i^k). Ultimately the largest deviation completely dominates when p = ∞, in L_∞ and L′_∞. It can be concluded that p weights the individual deviations according to their magnitudes and across the attributes, while w_i weights the deviations according to the attributes and irrespective of their magnitudes.

The compromise with respect to p then indicates a particular form of conflict resolution between the available alternatives and the infeasible ideal.

Figure 3.14. Typical compromise solutions and a compromise set

Observe that for p = 1 the minimization of L′_1(w, k) reflects the decision maker’s extreme disregard for the magnitudes of the individual deviations - it is only their total sum that matters. On the other hand, for p = ∞ one tries to minimize the maximum of the individual deviations. All attributes are thus considered to be of comparable importance, and the compromise deviations are equalized as much as possible.

Nondominated Solutions

It has become an accepted belief that nondominated solutions provide a good general starting point (or sometimes even the endpoint) of rational decision analysis. So far, the concept of nondominance has not been used explicitly; now its general usefulness will actually be disputed and its inferiority to the concept of compromise solutions will be discussed. If there is no j and x^j in X such that U_i^j ≥ U_i^k for all i’s and U^j ≠ U^k, then k represents a nondominated alternative x^k, which generates a nondominated outcome U^k in the above sense. That is, x^k is nondominated if, and only if, there is no other feasible alternative generating an outcome which dominates it. It may be concluded that a good decision must yield a nondominated solution, and many authors actually start their procedures by eliminating all dominated x^j from X.

Figure 3.15. The problem of the second best

At least two objections have been raised against such a conceptual framework:

• If more than one alternative is required for a solution, then the second and subsequent choices are not necessarily nondominated. The concept of nondominated solutions is fully viable if and only if a single solution is required.

• If a ranking of alternatives is desired, then the set of all nondominated solutions does not provide a good basis for the ranking. Even if only a single solution is the target, subsequent rankings of alternatives serve as an important intermediate orientation tool, helping the decision maker to form preferences.


These objections, however, do not dispose of the fact that the single or first selection should always be nondominated. It is only the tendency to work exclusively with nondominated solutions which is questionable.

In Figure 3.15 the shaded boundary of D, denoted N, represents the set of all nondominated solutions. Recall that all compromise solutions, denoted C, are nondominated by definition. Since C is always smaller than or equal to N, the selection of a single solution is thus greatly simplified. If the decision maker is concerned about the second best alternative (after the ideal point), with distance d^{k(2)} from the ideal, it can be assumed that the kth alternative is the next closest to the ideal. Observe that even though the solution with d^k is obviously dominated by the one with d^{k(2)}, its initial omission could significantly distort the final choice of the second best. Correct ranking of alternatives provides essential information for the intermediate as well as the final stages of a decision process.

Anti–Ideal

A concept similar to the ideal alternative, its mirror image, the anti–ideal, can be defined on any properly bounded set of feasible alternatives.

Among all achievable scores, for any ith attribute, there is at least one extreme value which is the least preferred in relation to all remaining values. Define

x_{i*} = min_k x_i^k ,   i = 1, ..., n

and the collection of all such minima, the anti–ideal alternative, as

x_* = (x_{1*}, ..., x_{n*})

The anti–ideal might be either infeasible or feasible; in either case it could serve as a point of reference during the decision–making process, when the decision maker strives to be as close as possible to the ideal and simultaneously as far away as possible from the anti–ideal.

Return to the simple example of three alternatives, evaluated along a single dimension, generating a vector of scores (5, 10, 11). The task is to choose between the first two alternatives, 5 and 10, using the third one, 11, as the ideal. To transform the scores into the corresponding degrees of closeness it will be assumed that the simple function x_i^k / x_i^* provides a good approximation. The ideal is then displaced farther and farther away from the two values in question, as in Table 3.3.

Observe that in the last two columns, the discriminatory power of the ideal diminishes as its value approaches large numbers. Under such conditions a decision maker might attempt to use the anti–ideal, since its discriminatory power would still be preserved.


no.   vector            x_i^k / x_i^*         x_i^* − x_i^k
1     (5, 10, 11)       (0.45, 0.90, 1)       (6, 1, 0)
2     (5, 10, 20)       (0.25, 0.50, 1)       (15, 10, 0)
3     (5, 10, 100)      (0.05, 0.10, 1)       (95, 90, 0)
4     (5, 10, 500)      (0.01, 0.02, 1)       (495, 490, 0)
5     (5, 10, 1000)     (0.005, 0.01, 1)      (995, 990, 0)
...   ...               ...                   ...
∞     (5, 10, ∞)        (0, 0, 1)             (∞, ∞, 0)

Table 3.3. Discriminatory power of the ideal
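A small Python check (illustrative only) reproduces the ratio column of Table 3.3 and shows how the gap between the two remaining alternatives shrinks as the ideal is pushed away.

```python
# Ratio-to-ideal closeness x/x* for the scores 5 and 10, with the ideal displaced farther away.
for ideal in (11, 20, 100, 500, 1000):
    u5, u10 = 5 / ideal, 10 / ideal
    print(ideal, round(u5, 3), round(u10, 3), round(u10 - u5, 3))
# The differences u10 - u5 are 0.455, 0.250, 0.050, 0.010, 0.005: as the ideal grows,
# it loses its power to discriminate between the two remaining alternatives.
```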

Naturally, the compromise set based on the ideal is not identical with the compromise set based on the anti–ideal. This fact can be used to further reduce the set of available solutions by considering the intersection of the two compromise sets. This possibility is illustrated in Figure 3.16.

Figure 3.16. Ideal and anti–ideal

3.6 Multiattribute Selection

Multiattribute decision–making methods are developed to handle selection problems at the concept design level. In this class of problems, the ‘best possible solution’ is determined from a finite, and usually small, set of alternatives. The selection is performed based on the evaluation of the attributes and their preference information.

Consider a multiattribute decision–making problem which has m attributes and n alternatives. Let C_1, ..., C_m and A_1, ..., A_n denote the attributes and the design alternatives, respectively. A standard feature of multiattribute decision–making methodology is the decision table shown in Figure 3.17. In the table each row belongs to an attribute and each column describes the performance of an alternative. The score a_ij describes the performance of alternative A_j against attribute C_i.

Figure 3.17. The decision table

As shown in the decision table, weights w_1, ..., w_m are assigned to the attributes. Weight w_i reflects the relative importance of attribute C_i to the decision and is assumed to be positive. The weights of the attributes are usually determined on a subjective basis. They represent the opinion of a single decision maker or synthesize the opinions of a group of experts.

The values x_1, ..., x_n are the final scores of the alternatives. Usually, a higher score means a better performance, so the alternative with the highest score is the best of the alternatives.

Multiattribute decision–making techniques partially or completely rank the alternatives: a single most preferred alternative can be identified, or a short list of a limited number of alternatives can be selected for subsequent detailed appraisal.

Besides some monetary–based and elementary methods, the two main families of multiattribute selection techniques are those based on multiattribute utility theory (MAUT) and the outranking methods.

The family of MAUT methods consists of aggregating the different attributes into a function which has to be maximized. This theory allows complete compensation between attributes, i.e., the gain on one attribute can compensate the loss on another (Keeney and Raiffa, 1976).

The concept of outranking was originally proposed by Roy (1968). The basic idea is as follows. Alternative A_i outranks A_j if on a great part of the attributes A_i performs at least as well as A_j (concordance condition), while its worse performance is still acceptable on the remaining attributes (non–discordance condition). After having determined for each pair of alternatives whether one alternative outranks the other, these pairwise outranking assessments can be combined into a partial or complete ranking. Contrary to the MAUT methods, where the alternative with the best value of the aggregated function can be obtained and considered as the best one, the partial ranking of an outranking method might not identify the best alternative directly. A subset of alternatives can be determined such that any alternative not in the subset is outranked by at least one member of the subset. The aim is to make this subset as small as possible. This subset of alternatives can be considered as a short list, within which a good compromise alternative should be found by further methods.
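A minimal Python sketch of the pairwise outranking test follows; the thresholds, weights, and scores are hypothetical and the test is a generic concordance/discordance check, not any specific ELECTRE variant.

```python
# A_i outranks A_j if the weighted share of attributes on which A_i is at least as good
# (concordance) is high enough, and its worst shortfall (discordance) stays acceptable.
def outranks(ai, aj, weights, concordance_threshold=0.7, veto=0.5):
    concordance = sum(w for w, x, y in zip(weights, ai, aj) if x >= y)
    discordance = max(y - x for x, y in zip(ai, aj))
    return concordance >= concordance_threshold and discordance <= veto

# Two hypothetical alternatives scored on three attributes (all scaled to [0, 1]); weights sum to 1.
a1, a2 = [0.8, 0.6, 0.9], [0.7, 0.7, 0.4]
w = [0.5, 0.3, 0.2]
print(outranks(a1, a2, w), outranks(a2, a1, w))   # True False
```

Applying such a test to every ordered pair of alternatives yields the outranking relation from which the short list is built.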


3.7 Multiattribute Utility Theory

Utility theory, which originated in economics with the first works of Bernoulli in the eighteenth century, was concerned at first with modelling the preferences of a decision maker who has to select among alternatives with risky outcomes. Utility is an abstract variable indicating goal attainment. It can also be considered as a ‘measure of satisfaction or value which the decision maker associates with each outcome’ (Dieter, 2000). Diminishing marginal utility, represented by a utility function (Fig. 3.18), indicates that the decision evaluation of a risky venture is not based on the expected return of that venture, but rather on the expected utility from that venture.

Figure 3.18. Marginal utility

The basic reason for using a utility function as a preference model in decision making is to capture the decision maker’s attitudes about achievable targets and risk. Accomplishing high performance and minimizing exposure to risk of an industrial product are two of the fundamental conflicting goals that decision makers face. There are many other trade–offs that designers make in their decisions; for example, cost versus safety of an industrial product. When purchasing industrial products, owners consider not only reliability and life span but also price, maintenance costs, operating expenses, and so on. Understanding trade–offs in detail, however, may be critical for a design team. It is suitable to model preference trade–offs between conflicting attributes using utility functions. A utility function represents a mapping of the decision maker’s preference onto a mathematical function, allowing the preference information to be expressed numerically. For a decision–making problem with multiple attributes, a utility function is assigned to each attribute to reflect the decision maker’s preference information. Usually, a more preferred performance value of the attribute obtains a higher utility value; for example, if cost is identified as an attribute, its associated utility function would have higher utility values for lower cost values.

A relatively straightforward way of dealing with conflicting attributes is to create an additive preference model; that is, to calculate a utility score for each attribute, weight the scores according to the relative importance assigned to each attribute, and hence obtain a function which expresses utility as a mathematical function of the decision–making criteria. Thus, the first task is identifying attributes, constructing their hierarchies, and creating useful attribute scales. With attribute scales specified, the matter of understanding trade–offs may be dealt with. But limitations come with the simple additive form, so that it is sometimes advisable to construct more complicated preference models that are less limiting.


To overcome the aforementioned limitations, multiattribute utility theory (MAUT) provides a formal basis for describing or prescribing choices between alternative solutions whose properties are characterized by a large number of attributes. It evaluates utility functions intended to accurately express a decision maker’s outcome preferences in terms of multiple attributes. MAUT grew out of unidimensional utility theory and its central dogma of ‘rational’ behavior: if an appropriate utility is assigned to each possible outcome and the expected utility of each alternative is calculated, then the best selection for any decision maker is the alternative with the highest expected utility. The main purpose of MAUT is to establish a superattribute, the overall utility to be maximized, as the criterion for selecting a project.

MAUT does not tend to replace unidimensional utility functions defined over single attributes. Rather, it reduces the complex problem of assessing a multiattribute utility function to one of assessing a series of unidimensional functions. Such individually estimated component functions are then glued together again; the glue is known as ‘value trade–offs’, which require the subjective judgment of the decision maker.

The family of MAUT methods consists of aggregating different utility functions into one function to be maximized. Utility functions can be applied to transform the performance values of the alternatives against diverse attributes to a common, dimensionless scale. In practice, the interval [0, 1] is used for this purpose. This allows a more preferred performance to obtain a higher utility value. A good example is an attribute reflecting the target of cost minimization: the associated utility function must yield higher utility values for lower cost values.

3.7.1 Additive Utility Function

The essential problem in multicriterial decision making is deciding how best to trade off increased value on one attribute against lower value on another. Making these trade–offs is a subjective matter and requires the decision maker’s judgment. If there is a large number of alternatives and still few attributes, it is preferable to attempt an explicit assessment of the overall utility function U(acquisition cost, horsepower, miles per ton, depreciation percent, maintenance costs, comfort) of the selected multiple attributes.

One of the most common assumptions is that the function U is additive. That means that it can be written as follows

U = w_1 u_1(cost) + w_2 u_2(hp) + w_3 u_3(nm/t) + w_4 u_4(depr.) + w_5 u_5(maint. costs) + w_6 u_6(comfort)

The preference model which can solve the aforementioned problems is called the additive utility function. The most comprehensive discussion, and the only one that covers swing weights, is that by von Winterfeldt and Edwards (1986). Swing weighting considers differences in attribute scales, where the input is admittedly an approximation. It can be used in virtually any weight assessment situation (Clemen, 1996). Keeney (1980) and Keeney and Raiffa (1993) have devoted much effort to this preference model. Edwards and Barron (1994) discuss some heuristic approaches to assessing weights, including the use of only rank–order information about the attributes.


The basic idea of creating an additive utility function which has to be maximized has been widely applied. Other decision-aiding techniques also use the additive utility function implicitly or explicitly, including the ‘Analytic Hierarchy Process’ (Saaty, 1980) and ‘Goal Programming’ with non-preemptive weights (Winston, 1987). For all of these alternative models, extreme care must be exercised in making the judgments on which the additive utility function is based.

To build an additive utility function properly, satisfactory answers must be found to two problems. The first problem has to do with comparing the attribute levels (numerical scores) of the available alternatives, thus requiring a quantitative model of preferences that reflects these comparisons. The second problem concerns how the attributes compare to each other in terms of importance. As with the scores, numerical weights must be assessed for each attribute.

Additivity and the determination of the individual u’s and w’s require independence of attributes. For example, a shipowner’s preferences regarding the price of a ship should not be affected by changes in its fuel mileage. In practice the two attributes, cost and mileage, are not independent, so that the use of an additive utility function would be difficult to substantiate over the given set of attributes. One then has to redefine the attributes, e.g., combine acquisition cost and fuel mileage into an overall ship cost over the lifetime period. A typical ship would then be characterized by the triplet of total cost, horsepower, and comfort. The decision maker must test, of course, whether total cost, horsepower, and comfort are mutually independent attributes. If they are not, one has to resort to additional combinations of attributes. Consequently, one of the most important tasks of MAUT is to verify the independence of attributes. After independent attributes suitable for analysis have been established, all the individual single–attribute utility functions must be constructed.

The additive utility function assumes that marginal utility functions are available, that is, U_1(x_1), U_2(x_2), ..., U_m(x_m) for m different attributes x_1 through x_m. The additive utility function is simply a weighted average of these marginal utility functions, where the decision maker must assign weighting factors which reflect the relative contribution of each attribute to overall value. For a design solution that has numerical scores x_1, ..., x_m on the m attributes, the utility of this alternative may be calculated by aggregating the weights and values obtained above according to the additive combination rule

U(x_1, ..., x_m) = w_1 U_1(x_1) + ... + w_m U_m(x_m) = Σ_{i=1}^{m} w_i U_i(x_i)        (3.4)

where the weights w_1, ..., w_m associated with the attributes reflect the relative importance of each attribute on a scale between 0 and 1, the lowest values being assigned to the least important attributes and the highest to the most important one.

When one plugs in the worst level x_i^− for each attribute, the marginal utility functions assign 0 to each attribute [U_i(x_i^−) = 0], and so the overall utility is also 0. If one plugs in the best possible value x_i^+ for each attribute, the marginal utility functions are equal to 1 [U_i(x_i^+) = 1], and so the overall utility becomes

U(x_1^+, ..., x_m^+) = w_1 U_1(x_1^+) + ... + w_m U_m(x_m^+) = w_1 + ... + w_m = 1        (3.5)
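A minimal Python sketch of the additive rule (3.4)-(3.5) follows; the attribute names, marginal utilities, and weights are hypothetical, with the weights normalized to sum to one as Eq. (3.5) implies.

```python
# Hypothetical marginal utilities already rescaled to [0, 1] (1 = best level, 0 = worst level).
marginal_utils = {"total cost": 0.60, "horsepower": 0.80, "comfort": 0.40}
weights        = {"total cost": 0.5,  "horsepower": 0.3,  "comfort": 0.2}   # sum to 1

# Eq. (3.4): weighted aggregation of the marginal utilities.
U = sum(weights[a] * marginal_utils[a] for a in weights)
print(U)                                            # 0.62

# Eq. (3.5): plugging in the best level of every attribute (U_i = 1) gives overall utility 1.
print(sum(weights[a] * 1.0 for a in weights))       # 1.0
```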


3.7.2 Risk and Utility Function

MAUT is primarily concerned with the independence of attributes, which allows the evaluation of multiattribute alternatives to be decomposed into unidimensional attribute evaluations. Although there are no purely unidimensional decision problems as such, there are many situations where one is searching for the alternative which maximizes or minimizes a single measure of merit: required freight rate, comfort response of a passenger ship, etc. It is around situations of this type that unidimensional utility theory has evolved. One notices that the examples of one–attribute problems come across as somewhat forced. It is quite rare, and therefore difficult to imagine, that a comparison of decision alternatives proceeds in so simple–minded a fashion. Nearly always, there are multiple criteria to be taken into account.

The ‘riskless decomposition’ is, however, only a first step in MAUT. If alternatives become risky, the decomposition over attributes is closely linked to the decomposition over uncertain events. Therefore, it is useful to distinguish between riskless and risky decisions. In the former case, the decision maker acts with perfect information, and thus is able to specify with complete certainty the properties which will result from any combination of independent variables. In the latter case, the decision maker has only partial information, and is assumed only to be able to assign subjective probabilities to each of the possible outcomes. It can be argued that no decision is truly riskless - one never acts with perfect information - but for many purposes the riskless choice assumption provides a reasonable approximation to the situation actually confronting the decision maker.

Figure 3.19. Utility functions

By fitting curves through the individually assessed points, obtained by assuming different values of the probability p which reflect the actual attitude of the decision maker, one can gain some idea about the shape and a possible functional form of the utility function. If such a curve lies above the straight line connecting the endpoints of a given interval of values, it is said to be concave (increasing but opening downward) over the interval (Fig. 3.19). Concave utility functions reflect a decision maker’s aversion to risk; straight lines, i.e., linear utility functions, define risk neutrality or indifference; while convex (opening upward) utility functions, lying everywhere below the straight line, define risk propensity, or risk seeking.


Most utility functions display all three basic attitudes toward risk over certain non-overlapping subregions of possible attribute levels. Although most decision makers are not risk–neutral, it is often reasonable for them to assume their utility curve is nearly linear for a particular decision, say, in the range of safety. Keep in mind that the utility function is only a model of a decision maker’s attitude toward risk.
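The following Python sketch illustrates the three risk attitudes with one possible family of curves (exponential utilities, my own choice of functional form, not taken from the text) over a normalized attribute in [0, 1], compared against a 50/50 lottery between the worst and best levels.

```python
import math

# Illustrative utility shapes: concave = risk averse, linear = risk neutral, convex = risk seeking.
def u_averse(x):  return (1 - math.exp(-3 * x)) / (1 - math.exp(-3))   # concave
def u_neutral(x): return x                                             # linear
def u_seeking(x): return (math.exp(3 * x) - 1) / (math.exp(3) - 1)     # convex

x = 0.5                                     # the sure mid-level of the attribute
for u in (u_averse, u_neutral, u_seeking):
    sure_thing = u(x)
    lottery = 0.5 * u(0.0) + 0.5 * u(1.0)   # expected utility of the 50/50 lottery = 0.5
    print(round(sure_thing, 3), round(lottery, 3))
# 0.818 0.5 -> risk averse: the sure mid-level is worth more than the lottery
# 0.5   0.5 -> risk neutral
# 0.182 0.5 -> risk seeking: the lottery is worth more than the sure mid-level
```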

3.8 Multiattribute Concept Design

A highly competitive market for high–tech technical systems pushes designers to improve design methods, especially at the concept design stage, where the main characteristics of the industrial product, which affect its performance and total cost over its life span, are determined. Certain controlling factors, such as main dimensions and geometric characteristics, technical performance, etc., are not expected to vary substantially in the subsequent design phases. The possibilities for influencing the total lifetime cost of a technical system are very high during concept design and decrease during the following design phases, process development, and manufacturing.

Therefore, the design process has to involve simultaneously a number of often conflicting goals, both of technical and economic nature. Classical single–objective optimization schemes, best illustrated by the design spiral, involve one criterion at a time, dealing with the others through purely heuristic preferences. On the other hand, some multicriterial optimization methods use hybrid solvers inadequate to capture the actual nature of design. In particular, Mistree et al. (1991), Sen (1992), and Ray and Sha (1994) utilize procedures that require predefined preference information on attributes, to be applied in ranking feasible designs only. To circumvent this limitation, the powerful concept of nondominated (Pareto–optimal) designs is introduced in concept design, fully illustrated by Zanic et al. (1992).

Figure 3.20. Framework of multiattribute concept design


Experience with different multicriterial design procedures has indicated that the multiattribute decision–making (MADM) method is the most suitable for practical application to concept ship design. It treats the design as a whole, requiring only a simple evaluation and selection procedure. It deliberately does not attempt to provide an optimal solution automatically, also because that would not decrease the risk associated with the uncertainties intrinsic to the design process. On the contrary, it is conceived so as to drive the selection of the ‘best possible solution’ also on the basis of aspiration levels established by the decision maker.

A framework of multiattribute concept design suitable for robust design simulation is illustrated in Figure 3.20, where communication between the design process and the external environment is emphasized.

In this method a large number of feasible designs is created by multiple executions of the design model with sets of random design variables generated by an adaptive Monte Carlo method. Constraints of min–max, crisp, or fuzzy type may be applied to any attribute value generated within the design model. A design is feasible if it meets the given set of requirements and all constraints are within acceptable limits. It is probable that some designs will be superseded by other designs in every respect: if there exists a design A that is better than design B in all relevant attributes, then design B is dominated and is therefore discarded from further consideration. Therefore, among all feasible designs only nondominated ones are retained, as shown in Figure 3.21, which presents an illustration of a two–attribute space.

Figure 3.21. Projection of two–dimensional attribute space

This approach makes it relatively easy to search the multidimensional, highly constrained subspace of feasible designs. The end product of the multicriterial concept design procedure is a hyper–surface of nondominated designs, where the selection of the ‘preferred design’ is performed only after a sufficient number of nondominated designs has been generated. The procedure is implemented in a concept design shell capable of searching the design space and monitoring the process.

The MADM design procedure is illustrated in Figure 3.22, where the two main tasks, i.e., design modelling and the design selection procedure, are shown in bold.


Figure 3.22. The multiattribute design procedure

Primary design attributes will be transformed into objectives in the process of decision making at the basic design level.

3.8.1 Basic Concepts

The identification of design problems implies the specification of design criteria in terms of design variables.

Design criteria, as measures of design effectiveness, can be divided into two broad groups:

• Criteria with a priori limiting levels form design constraints (hard constraints) on the design variables. They are used to distinguish between feasible and infeasible designs.

• Criteria that can be used as design performance measures are called design attributes (soft constraints). They can be transformed into design goals if the direction of quality improvement is specified (minimization or maximization of an objective function).

In the framework of the MADM approach, each design can be represented as a point in the design space X spanned by the NV design variables. It can also be considered as a point in the attribute space Y spanned by the NA design attributes. Constraints in the X space bound the subspace of feasible designs (Fig. 3.23).

The evaluation process is a mapping (f : X → Y) from the X space onto the Y space, i.e., the calculation of the attribute function values for given values of the design variables. The design process is the inverse mapping (f^{−1} : Y → X) from the Y space to the X space, i.e., the identification of the most appropriate values of the design variables for given aspiration levels of the attributes.

When preference among attributes is established (level, ordinal, cardinal, etc.), techniques of multiattribute decision making can be applied to evaluate the feasible solutions. The outcome of the concept design process may be an optimal, efficient, satisficing, or preferred solution (Trincas et al., 1987).

An optimal solution is rarely attainable in multicriterial problems. It is reached only if all attribute functions attain their extreme values simultaneously. The ideal solution (utopia, zenith), usually infeasible, is the point defined by the favorable extremes of the attribute values. The anti–ideal point (nadir) is the most unfavorable combination of attribute levels. Nondominated, efficient solutions are of primary importance, since they correspond to designs which are better than any other feasible design in at least one attribute. Satisficing solutions correspond to all designs that completely satisfy the aspiration levels of the decision maker. Among the satisficing solutions, the set of nondominated solutions is characterized by the fact that there exists no solution in which an increase in any criterion would not cause a decrease in at least one other criterion. Finally, the preferred solution is the one selected among the nondominated designs when the design team’s preferences are completely established.

Figure 3.23. Concept design mapping

The increased speed of workstations provides the opportunity to model the complex design problem as a multiple evaluation process, by intentionally creating a discrete number of feasible designs through multiple executions of a design model. The mathematical design model is driven by an adaptive Monte Carlo method which guides the random generation of a large set of design alternatives. Constraints of min–max, crisp, or fuzzy type may be applied to any attribute value generated within the mathematical model. A design is feasible if it complies with all crisp constraints. Feasible designs are tested for dominance, based on membership grade functions for intra–attribute preference and on the design team’s subjective preference across attributes (inter–attribute preference).


Among feasible designs only nondominated ones are retained. This approach makes it possibleto search the multidimensional highly constrained subspace of nondominated designs with littledifficulty. If sufficient density of nondominated points is generated one may obtain a ‘discrete’inversion of Y on X mapping for the most important zone of the design space. Therefore, it ispossible to replace optimization–oriented MODM approach with much simpler MADM, whichimplies only mathematically easier procedures for evaluation and selection. Details of the under-lined procedure are available in specialized literature (Zanic et al., 1992, 1997; Grubisic et al.,1997).

The present increased interest in simulation, random generation and Monte Carlo methods shows that simple approaches to complex decision-making problems are at hand in many engineering disciplines. Thanks to hardware development, increased computational speed makes MADM methods feasible for many otherwise mathematically cumbersome problems. This situation resembles the period of fast replacement of complex analytical methods in continuum mechanics (e.g. structural analysis) by the simple finite element method, as soon as the solution of large systems of linear equations became possible in reasonable computing time. In the sequel the basic steps of the proposed design process are given.

3.8.2 Concept Design Process Description

The concept design process is divided into two phases, namely the phase of design point generation in the design space and the phase of design selection in the attribute space, which generally requires the introduction of metrics. Both are included in a shell which drives the main activities, i.e. problem formulation, solution generation, dominance filtering, etc. The process of design selection is basically interactive, since the decision maker can change and refine his/her preferences (sensitivity study).

Information on the synthesis shell, on the graphic representation of the design and attribute spaces, as well as on the organization of the data structure has been extensively detailed by Trincas et al. (1994). The MADM shell consists of the following main functions:

• define min-max design subspace;

• generate sample designs via an adaptive Monte Carlo method;

• evaluate feasibility of generated designs subject to specified crisp constraints;

• define intra-attribute fuzzy functions and inter-attribute preference matrix;

• transform design attributes to membership grade via fuzzy functions;

• define dominance structure for filtering nondominated solutions by building metrics of values of design variables and attributes;

• refine designs around specific nondominated designs to establish robustness of the preferred solutions.


Main steps of design generation in the design space

The logical flow of the generation activity is as follows:

• Determination of the ranges of the design variables defined by min-max and linear constraints only. They are determined via a series of simple linear programming problems with maximization/minimization of each design variable. Errors in the definition of linear constraints are spotted here.

• Random generation of points in the X space as defined in the first step. A standard random number generator is used.

• Evaluation of design feasibility. Crisp constraint values and linear constraints are used to filter feasible designs.

• Evaluation of the attribute functions, i.e. the Y-space image of the design:

\[ y_i = y_i(x)\,, \qquad i = 1, \dots, N_A \]

• Transformation of the attribute evaluation functions yi(x) through the membership grade functions Ui(yi) into the range [0, 1]:

\[ y_i = U_i\,[\,y_i(x)\,]\,, \qquad i = 1, \dots, N_A \]

• The simplest 'more is better' dominance structure (Pareto dominance) can be built using the yi(x) values. Filtering of the nondominated designs among the feasible ones can now be performed with an efficient dominance algorithm.

• Control of the number of nondominated designs with respect to the total number of feasible designs, based on the given resolution in the X and Y spaces. If the prescribed density is not achieved, a more efficient method of design generation is needed to search for more nondominated designs. The formula for the number of nondominated points (ND) as a function of the number of feasible points (NF) and the number of attributes (NA) reads (Calpine and Golding, 1976)

\[ N_D = 1 + \ln N_F + \frac{\ln^2 N_F}{2!} + \dots + \frac{\ln^{N_A-3} N_F}{(N_A-3)!} + 0.5572\,\frac{\ln^{N_A-2} N_F}{(N_A-2)!} + \frac{\ln^{N_A-1} N_F}{(N_A-1)!} \]

• Random generation of new design points in mini-hypercubes with decreased range around the nondominated solutions, hence yielding the 'chain generation' of a great number of nondominated points or, alternatively, a discrete approximation of the nondominated hypersurface in the Y space. With proper convergence checks in the X and Y spaces, the number of random points in the 'minicube'-based design process is much smaller than the number of points yielded by crude generation in the primary screening of the design space (see Fig. 3.23).

• Random generation of designs around the extreme points for all attributes. In this manner the extremes of the nondominated subspace are obtained more accurately.

This procedure is developed in the design space, and no mathematical intricacies connected with normalization or with inter-attribute and intra-attribute relationships are needed. Moreover, the process can be automated and executed off-line. A minimal sketch of the generation and filtering steps is given below.
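The following sketch (Python, not part of the original procedure) illustrates these steps on an invented two-variable, two-attribute problem: crude Monte Carlo sampling of a min-max subspace, filtering against hypothetical crisp constraints, Pareto-dominance filtering of the feasible designs, and the Calpine-Golding estimate of the expected number of nondominated points. All bounds, constraints and attribute functions are illustrative assumptions only.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def generate_designs(bounds, n):
    """Crude Monte Carlo sampling of the min-max design subspace."""
    lo, hi = np.array(bounds).T
    return lo + rng.random((n, len(bounds))) * (hi - lo)

def is_feasible(x):
    """Hypothetical crisp constraints on a design x = (x1, x2)."""
    return x[0] + x[1] <= 12.0 and x[0] - 2.0 * x[1] <= 0.0

def attributes(x):
    """Hypothetical attribute functions y = y(x), both to be maximized."""
    return np.array([3.0 * x[0] + x[1], x[0] + 2.0 * x[1]])

def nondominated(Y):
    """Indices of Pareto-nondominated rows of Y under a 'more is better' structure."""
    keep = []
    for i, yi in enumerate(Y):
        dominated = any(np.all(yj >= yi) and np.any(yj > yi)
                        for j, yj in enumerate(Y) if j != i)
        if not dominated:
            keep.append(i)
    return keep

def expected_nondominated(NF, NA):
    """Calpine-Golding estimate of the number of nondominated points."""
    ln = math.log(NF)
    head = sum(ln ** k / math.factorial(k) for k in range(NA - 2))
    return head + 0.5572 * ln ** (NA - 2) / math.factorial(NA - 2) \
                + ln ** (NA - 1) / math.factorial(NA - 1)

X = generate_designs([(0.0, 10.0), (0.0, 10.0)], 500)   # candidate design points
X = np.array([x for x in X if is_feasible(x)])           # crisp feasibility filter
Y = np.array([attributes(x) for x in X])                 # decision matrix
idx = nondominated(Y)                                    # dominance filtering
print(len(X), "feasible designs,", len(idx), "nondominated,",
      "expected about", round(expected_nondominated(len(X), Y.shape[1]), 1))
```

In the actual procedure the random generation would of course be adaptive (mini-hypercubes around nondominated points) rather than this single crude pass.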


The result of this process is a set of ND nondominated designs defined by two matrices:

- design matrix X(NV, ND): X-space coordinates of the nondominated designs

- decision matrix Y(NA, ND): Y-space coordinates of the nondominated designs

The decision matrix plays the most important role in design selection in the Y space, while the design matrix gives the X-space counterpart of any selected design.

Main steps of design selection in the attribute space

The logical flow of the selection procedure is as follows:

• Interactive definition of the preference information regarding the relationships between attributes or, alternatively, preference information on the design alternatives.

• Extraction of weight factors from the subjective preference matrix (Saaty method, least squares method or entropy method).

• Selection of the attribute value type (direct or membership grade formulation).

• Normalization of the attribute values (vectorial or linear scale):

\[ y_i^j = \frac{y_i^j}{\sqrt{\displaystyle\sum_{k=1}^{N_A} \left(y_k^j\right)^2}}\,, \qquad i = 1, \dots, N_A\,; \;\; j = 1, \dots, N_D \]

\[ y_i^j = \frac{y_i^j - y^*_{i\,min}}{y^*_{i\,max} - y^*_{i\,min}} \quad \text{for maximization}\,, \qquad y_i^j = \frac{y^*_{i\,max} - y_i^j}{y^*_{i\,max} - y^*_{i\,min}} \quad \text{for minimization} \]

• Calculation of the L1, L2 (Euclidean) and L∞ (Chebyshev) norms with respect to the ideal point or a prescribed goal for all nondominated designs, with given attribute weights wi:

\[ L_p(w, y^j) = \left[ \sum_{i=1}^{N_A} w_i \left| y_i^j - y^*_i \right|^p \right]^{1/p}\,, \qquad j = 1, \dots, N_D\,; \;\; p = 1, 2, \infty \]

• Stratification of the set of nondominated solutions into layers according to the value function (i.e. the L1, L2 or L∞ norm, or other). The stratified X or Y space can be used for

  1. graphic presentation;

  2. experiments in interpolation (i.e., design variables as functions of design attribute values in a specified limited stratum).

• Extraction of the 'preferred solutions' according to the given preference structure. In this way designs of minimal distance from the ideal or from other prescribed goals are obtained and displayed.

• Random generation of designs around all 'preferred solutions'. In this manner a more accurate value of the best possible design is obtained in a few steps.
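A small numerical sketch of the selection phase follows (the decision matrix, weights and attributes are invented purely for illustration): linear-scale normalization and ranking of the nondominated designs by their weighted L1, L2 and L∞ distances from the ideal point.

```python
import numpy as np

# Decision matrix Y(ND x NA): three hypothetical nondominated designs, two attributes,
# both to be maximized.
Y = np.array([[17.0, 11.0],
              [16.0, 11.5],
              [15.0, 12.0]])
w = np.array([0.6, 0.4])                         # assumed inter-attribute weights

# Linear-scale normalization for maximization: (y - ymin) / (ymax - ymin)
ymin, ymax = Y.min(axis=0), Y.max(axis=0)
Yn = (Y - ymin) / (ymax - ymin)
ideal = Yn.max(axis=0)                           # ideal point in normalized attribute space

def lp_distance(y, p):
    """Weighted Lp distance of a normalized design y from the ideal point."""
    dev = np.abs(y - ideal)
    return float((w * dev).max()) if np.isinf(p) else float(((w * dev ** p).sum()) ** (1.0 / p))

for p in (1, 2, np.inf):
    d = np.array([lp_distance(y, p) for y in Yn])
    print(f"L{p}: distances {np.round(d, 3)} -> preferred design #{int(d.argmin())}")
```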


3.9 Multiobjective Design

Multiobjective decision making (MODM) methods are suited to handle the MCDM problems in basic design where the 'preferred design' is the final outcome of the concept design. That is, optimization is performed to maximize or minimize the objectives associated with the subsystems of the complex technical system. In general, these objectives are conflicting, so the optimal solution is usually a 'compromise design' that aims to satisfy the different objectives simultaneously as well as possible.

Figure 3.24 lists some MODM methods that are capable of dealing with this class of problems. These MODM methods are classified into different groups mainly 'based on the types of preference information and the timing for eliciting preference information' (Sen and Yang, 1998).

Figure 3.24. Classification of MODM methods

A decision tree for MODM technique selection was also developed by Sen and Yang (1998), as illustrated in Figure 3.25. By using this figure, the decision maker can construct a choice rule to select a method by examining the decision rule or the computational procedure of the methods.


Figure 3.25. Decision tree for MODM technique selection


3.9.1 Genetic Algorithm

The Genetic Algorithm (GA) is a type of evolutionary algorithm used to find approximate solutions to optimization and search problems. The basis of the GA is an adaptive heuristic global search algorithm originating from the evolutionary ideas of natural selection and genetics. The Genetic Algorithm technique consists of a structured random algorithm that reproduces Darwin's evolutionary process of survival of the fittest in natural systems. By mimicking this process, a genetic algorithm is able to evolve solutions to realistic problems by performing an intelligent exploitation of a random search within a defined design space. GAs have proved capable of efficiently approaching the global optimum of many MODM problems.

The GA has five major steps, that is, initialization, evaluation, selection, crossover and mutation. Figure 3.26 depicts the steps of the genetic algorithm.

Figure 3.26. Genetic algorithm

The GA starts from the creation of an initial population which is chosen randomly from the design space defined by the independent variables. The individuals of the population are usually encoded as binary strings of 0s and 1s. The individuals are the so-called chromosomes; they are then evaluated and a fitness value is assigned to each individual. Once evaluated, the 'parents' for each generation are stochastically selected based on their fitness; this step is known as 'selection'. After the new population is established, the genetic material of the parents is combined to create children by performing a crossover operation (structural exchange of characters). The crossover is accomplished by randomly selecting a splice point in the binary string and then swapping bits between the parents at the splice. Once the crossover is done, the mutation operation is applied.


In the process of mutation, the value of a bit is changed (0 changes to 1 and vice versa) with a specified mutation probability. Thus new offspring are established and their fitness is evaluated again. If the best individuals and fitness values are obtained the algorithm stops; otherwise, the process is repeated over several cycles until sufficiently fit solutions are selected.
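A minimal sketch of these five steps is given below, assuming binary-encoded individuals and a simple illustrative fitness function (maximizing the number of ones); population size, rates and the stopping rule are arbitrary choices, not prescriptions from the text.

```python
import random

random.seed(0)
N_BITS, POP_SIZE, GENERATIONS = 20, 30, 40
P_CROSSOVER, P_MUTATION = 0.9, 1.0 / N_BITS

def fitness(ind):
    """Illustrative fitness: number of ones in the chromosome (to be maximized)."""
    return sum(ind)

def select(pop):
    """Tournament selection: the fitter of two random individuals becomes a parent."""
    a, b = random.sample(pop, 2)
    return max(a, b, key=fitness)

def crossover(p1, p2):
    """Single-point crossover: swap bits after a random splice point."""
    if random.random() < P_CROSSOVER:
        cut = random.randrange(1, N_BITS)
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def mutate(ind):
    """Flip each bit with the mutation probability."""
    return [1 - b if random.random() < P_MUTATION else b for b in ind]

# Initialization
population = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    offspring = []
    while len(offspring) < POP_SIZE:          # selection, crossover, mutation
        c1, c2 = crossover(select(population), select(population))
        offspring += [mutate(c1), mutate(c2)]
    population = offspring[:POP_SIZE]         # evaluation happens inside select()

best = max(population, key=fitness)
print("best fitness:", fitness(best), "chromosome:", "".join(map(str, best)))
```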

3.9.2 Goal Programming

Goal programming (GP) grew out of the need to deal with some otherwise unsolvable linear programming problems. The term 'goal programming' was used by Ignizio (1983) to indicate the search for a mathematical model that is composed solely of goals. Such problems arise because, in real life, decisions often have to be taken in circumstances where the decision maker sets goals which are not necessarily attainable, but which nevertheless serve as a standard to aspire to or as a reminder of long-term aims. The objective then becomes the attainment of a solution which comes as close as possible to the stated goals. However, 'closeness' to goals is a vague concept. Goal programming gets around this problem by an ordinal ranking of goals, so that lower-priority goals are attended to only after the higher-priority goals have either been satisfied or the solution has reached the point beyond which no further improvements are possible or desirable.

The general form of a linear programming problem may be written as

\[
\begin{aligned}
\text{maximize} \quad & Z = f(x) = \sum_{j=1}^{n} c_j\,x_j \\[4pt]
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij}\,x_j \;\le\; b_i\,, && i = 1, 2, \dots, k \\
& \sum_{j=1}^{n} a_{ij}\,x_j \;\ge\; b_i\,, && i = k+1, k+2, \dots, l \\
& \sum_{j=1}^{n} a_{ij}\,x_j \;=\; b_i\,, && i = l+1, l+2, \dots, m \\
& x_j \ge 0\,, && j = 1, 2, \dots, n
\end{aligned}
\]

where xj are the independent variables defining the problem and

cj    profit associated with each unit of xj

aij   amount of resource i used (e.g. steel, foreign exchange) or contribution to goal i (e.g. cost savings) associated with each unit of xj

bi    total amount of resource i available or target value for goal i

It is quite simple to conceive of a set of constraints that are incompatible and hence do not define a feasible region. For example, the target value of savings may not be attainable with the amount of steel or foreign exchange available. Under such circumstances, the view may be taken that the objective of the decision maker is simply to satisfy a set of constraints.


The goal programming formulation would then be

\[
\begin{aligned}
\text{minimize} \quad & Z = \sum_{k=1}^{p} \sum_{i=1}^{m} P_k \left( w_{ki}^{-}\, d_i^{-} + w_{ki}^{+}\, d_i^{+} \right) \\[4pt]
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij}\,x_j + d_i^{-} - d_i^{+} = b_i\,, && i = 1, 2, \dots, m \\
& x_j \ge 0\,, && j = 1, 2, \dots, n \\
& d_i^{-},\; d_i^{+} \ge 0\,, && i = 1, 2, \dots, m \\
& P_k \gg P_{k+1}\,, && k = 1, 2, \dots, p-1
\end{aligned}
\]

where

p    number of priority levels

Pk   preemptive weights, or priority weights

w_ki^-, w_ki^+   weights for d_i^- and d_i^+, respectively

d_i^-, d_i^+   deviation variables representing the negative and positive deviations from bi

Therefore, the goal programming formulation is one in which the weighted sum of the deviations from the target values, or goals, bi is minimized, according to some externally imposed priority ranking of the goals. The units in which the goals bi are expressed would, in general, be incommensurate. The only requirement is that goals at the same priority level be in commensurate units.

It is quite conceivable that, when several different commensurate deviation (slack) variables are to be treated at the same priority level, the decision maker may wish to give somewhat greater importance to a few of these deviation variables. That is, even at the same priority level, some goals may be more important than others. This can be conveniently handled by multiplying the appropriate deviation variables by suitable weighting factors.

The priority ordering of goals reflects the attitude of a particular decision maker. However, one does not need to adhere to a rigid priority ordering. Preemptive weights determine the hierarchy of goals: goals at higher priority levels are satisfied first, and only then may the lower-priority goals be considered. Lower-priority goals cannot alter the goal attainment at higher priority levels.
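As a small numerical sketch of the formulation above (a single priority level; coefficients, targets and the hard resource constraint are hypothetical), the deviation variables d- and d+ can be added explicitly and the weighted sum of deviations minimized with an off-the-shelf LP solver:

```python
import numpy as np
from scipy.optimize import linprog

# Two decision variables x1, x2 and two goals (hypothetical coefficients and targets):
#   goal 1:  2*x1 + 1*x2  ->  target 40
#   goal 2:  1*x1 + 3*x2  ->  target 30
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([40.0, 30.0])
w_minus = np.array([1.0, 1.0])        # weights on underachievement d-
w_plus  = np.array([1.0, 2.0])        # weights on overachievement d+

# Variable vector: [x1, x2, d1-, d2-, d1+, d2+]; only the deviations enter the objective.
c = np.concatenate([np.zeros(2), w_minus, w_plus])

A_eq = np.hstack([A, np.eye(2), -np.eye(2)])    # goal constraints: A x + d- - d+ = b
A_ub = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0]]         # a hard resource constraint: x1 + x2 <= 15
res = linprog(c, A_ub=A_ub, b_ub=[15.0], A_eq=A_eq, b_eq=b,
              bounds=[(0, None)] * 6, method="highs")

x, d_minus, d_plus = res.x[:2], res.x[2:4], res.x[4:6]
print("x =", np.round(x, 2), " d- =", np.round(d_minus, 2), " d+ =", np.round(d_plus, 2))
```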

Depending on the nature of the goals and on whether the decision maker wishes to meet a goalexactly, or to overachieve or underachieve it, several alternate formulations can be envisaged.

Minimization of the sum of deviations (d_i^- + d_i^+)

This will ensure that goal i is met exactly, if possible. For example, the transport task performed by the replacement vessels must be exactly equal to the transport task performed previously by the scrapped vessels.


Hence the net transportation task performed by the scrapped and the replacement vessels should be zero, if the former is taken to be negative and the latter positive. To achieve this goal, therefore, the sum of the positive and negative deviations from the goal of zero net transportation should be minimized.

Minimization of d_i^-

This will ensure that d_i^- is driven to zero if possible. Since d_i^- represents a shortfall in meeting goal i, this minimization implies that the goal should be achieved or overachieved. The solution will try to ensure that

\[ \sum_{j=1}^{n} a_{ij}\,x_j \;\ge\; b_i \]

Lower target values for any achievement index like cost savings are goals of this type.

Minimization of d_i^+

This has the opposite effect to that of the above formulation. The solution will try to ensure that

\[ \sum_{j=1}^{n} a_{ij}\,x_j \;\le\; b_i \]

That is, a solution that underachieves goal i will be preferred. Limitations on the use of scarce and expensive resources like foreign exchange result in goals of this type.

Very large or very small bi values

Using a very large bi value and minimizing d_i^- is equivalent to ensuring a large value of

\[ \sum_{j=1}^{n} a_{ij}\,x_j \]

This, therefore, effectively means an attempt to achieve as high a value of this goal as possible. It will be obvious that using a very small bi value and minimizing d_i^+ will have the opposite effect.

3.9.3 Compromise Programming

Compromise programming handles constraints and bounds separately from the system goals, contrary to goal programming, where everything is converted into goals. It can be viewed as an effort to approach or emulate the ideal solution as closely as possible. Note that the ideal solution is the situation where every objective gets the largest possible outcome. In a technical sense, the decision maker measures the 'goodness' of any compromise by its closeness to the ideal or by its remoteness from the anti-ideal.


Thus, the notion of distance and its measurement cannot be avoided in decision making.

One of the best-known concepts of distance is the Pythagorean theorem for measuring the distance between two points whose coordinates are known. That is, given points $x^1 \equiv (x^1_1, x^1_2)$ and $x^2 \equiv (x^2_1, x^2_2)$ in a plane, the distance d between them is found to be

\[ d = \sqrt{(x^1_1 - x^2_1)^2 + (x^1_2 - x^2_2)^2} \]

But instead of measuring the distance between any two points, one may be interested in comparing the distances of various points from one reference point, the ideal point $x^*$. That is, various points $x^k$ are compared in terms of their distance from the point $x^*$. In the two-dimensional case

\[ d = \sqrt{(x^*_1 - x^k_1)^2 + (x^*_2 - x^k_2)^2} \]

The concept is readily generalizable to higher dimensions. If there are n objectives (measured along n coordinates) characterizing the points being compared, the distance formula becomes

\[ d = \left[ \sum_{i=1}^{n} (x^*_i - x^k_i)^2 \right]^{1/2} \tag{3.6} \]

In the above formula, the deviations $(x^*_i - x^k_i)$ are raised to the power p = 2; in general, they could be raised to any power p = 1, . . . , ∞ before being summed. Also, different deviations, corresponding to different objectives i, can be weighted by differential levels of their contribution (weights wi) to the total sum. A generalized family of distance measures, dependent on the power p, can be expressed as follows

\[ d_p = \left[ \sum_{i=1}^{n} w_i^p\,(x^*_i - x^k_i)^p \right]^{1/p} \tag{3.7} \]

where wi > 0 and p ranges from 1 to ∞.

For p = ∞, the above expression is reduced to

\[ d_\infty = \max_i \left\{ w_i\,(x^*_i - x^k_i) \right\} \tag{3.8} \]

Consider the points (8, 6) and (4, 3). Their distance can be compared numerically for different levels of p; wi is assumed to be equal to 1 for all i, that is, the weights can be ignored.

In Table 3.4 observe the effect of increasing p on the relative contribution of the individual deviations: the larger the p, the greater the emphasis given to the largest of the deviations in forming the total. Ultimately, for p = ∞, the largest of the deviations completely dominates the distance determination. One can also see why the values p = 1, 2 and ∞ are strategically important: p = 1 implies the 'longest' distance between the two points in a geometric sense. The measure d1 is therefore often referred to as a 'city block' measure of distance. The shortest distance between any two points is a straight line, and this is achieved for p = 2.


For p > 2 one has to consider distances based on even 'shorter' measures of distance than a straight line.

  p    (x^1_1 − x^2_1)^p    (x^1_2 − x^2_2)^p    total sum      dp
  1            4                    3                 7         7.000
  2           16                    9                25         5.000
  3           64                   27                91         4.498
  5         1024                  243              1267         4.174
  ...        ...                  ...               ...          ...
  ∞          4^∞                  3^∞                 ∞          4*

Table 3.4. Measurements of distance
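The dp values in Table 3.4 can be reproduced with a few lines of code (purely a check of the arithmetic above):

```python
import numpy as np

x1, x2 = np.array([8.0, 6.0]), np.array([4.0, 3.0])
dev = np.abs(x1 - x2)                     # deviations (4, 3), unit weights

for p in (1, 2, 3, 5):
    print(f"d_{p} = {(dev ** p).sum() ** (1.0 / p):.3f}")
print(f"d_inf = {dev.max():.3f}")         # p -> infinity: the largest deviation dominates
```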

Recall that distance is employed here as a proxy measure for human preference and not as a purely geometric concept. In multicriterial decision making, distance is used as a measure of resemblance, similarity or proximity with respect to individual coordinates, dimensions and attributes. Thus decision makers cannot narrow their attention to only one p, or even to the interval of geometrically intuitive measures 1 ≤ p ≤ 2.

The concept of the ideal solution and that of a set of compromise solutions were studied in Section 3.5. These concepts are fully applicable to problems of mathematical programming, where a feasible set X is described indirectly through a set of functions serving as constraints.

Given a set of decision variables x = (x1, . . . , xn) and a set of constraint functions gr(x) ≤ br (r = 1, . . . , m), the feasible set X is the set of decision variables x which satisfies the constraints, X = {x | gr(x) ≤ br}. Let f1, f2, . . . , fl denote the objective functions defined on X; that is, fi(x) is the value achieved at x from X with respect to the ith objective function (i = 1, . . . , l). Observe that f = (f1, . . . , fl) maps the n-dimensional set X into its l-dimensional image f(X) = Y. It is often useful to explain some concepts in terms of the objective (attribute) space Y rather than the decision space X.

It is straightforward to demonstrate the mapping from X to Y on a simple numerical geometric example of linear multiobjective programming.

\[
\begin{aligned}
\text{maximize} \quad & f_1(x) = 3x_1 + x_2 \\
& f_2(x) = x_1 + 2x_2 \\
\text{subject to} \quad & x_1 + x_2 \le 7 \\
& x_1 \le 5 \\
& x_2 \le 5 \\
& x_1,\, x_2 \ge 0
\end{aligned}
\]

In Figure 3.27 observe that f2 is maximized at point B = (2, 5), achieving the value f2(B) = 12; function f1 reaches its maximum at point C = (5, 2), with the value f1(C) = 17. The heavy boundary N of X denotes the set of nondominated solutions, including both corner points B and C. All other solutions in X are inferior to those in N.


Figure 3.27. Set of nondominated solutions N and the ideal x∗ in the decision space

The ideal solution, although infeasible, is x* = (4.4, 3.8); it provides the maxima of both objective functions. Check that f1(x*) = 17 and f2(x*) = 12. Next, the points 0, A, B, C, D and x* are translated into the corresponding value space Y = f(X), consisting of the points y = f(x) for all x from X.
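These values can be checked by solving each single-objective LP separately; a quick verification sketch follows (linprog minimizes, so the objectives are negated):

```python
import numpy as np
from scipy.optimize import linprog

A_ub = [[1, 1], [1, 0], [0, 1]]          # x1 + x2 <= 7, x1 <= 5, x2 <= 5
b_ub = [7, 5, 5]

maxima = {}
for name, c in {"f1": [3, 1], "f2": [1, 2]}.items():
    res = linprog(-np.array(c), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None), (0, None)], method="highs")
    maxima[name] = -res.fun
    print(f"{name}: maximizer {np.round(res.x, 1)}, maximum {-res.fun:.0f}")

print("ideal point y* =", (maxima["f1"], maxima["f2"]))   # should be (17.0, 12.0)
```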

In Figure 3.28, obtained by mapping Figure 3.27 into the objective space, y* = f(x*); that is, f(4.4, 3.8) = (17, 12). All other points of the polyhedron X are similarly translated. Observe that a point ȳ from Y is nondominated if there is no other y in Y such that y ≥ ȳ and y ≠ ȳ.

Figure 3.28. Set of nondominated solutions f(N) and the ideal y∗ in the objective space


Any point y from Y is a compromise solution if it minimizes

\[ d_p = \left[ \sum_{i=1}^{l} w_i^p\,(y^*_i - y_i)^p \right]^{1/p} \]

for some choice of weights wi > 0 with Σ wi = 1, and 1 ≤ p ≤ ∞. It can be shown that each compromise solution satisfying these conditions is nondominated. For 1 < p < ∞ these compromise solutions are also unique.
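For the small linear example above, the compromise solution can be located numerically by discretizing the nondominated segment between B and C and picking the point closest to the ideal y* = (17, 12); equal weights and p = 2 are assumed here purely for illustration.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 301)                 # parametrize the segment B=(2,5) -> C=(5,2)
x1, x2 = 2.0 + 3.0 * t, 5.0 - 3.0 * t
f1, f2 = 3.0 * x1 + x2, x1 + 2.0 * x2          # objective values along the nondominated set
ideal = np.array([17.0, 12.0])
w, p = np.array([0.5, 0.5]), 2

d = ((w ** p * (ideal - np.column_stack([f1, f2])) ** p).sum(axis=1)) ** (1.0 / p)
k = d.argmin()
print(f"compromise design: x = ({x1[k]:.2f}, {x2[k]:.2f}),"
      f" objectives = ({f1[k]:.2f}, {f2[k]:.2f}), d2 = {d[k]:.3f}")
```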

3.9.4 Physical Programming

Physical programming (PP) addresses the inherent multicriterial nature of design problems, since it provides a flexible approach to obtaining a satisficing solution that takes the designers' preferences into account deterministically. It captures the designers' physical preferences in forming the aggregate multiobjective function. The physical programming method (Messac, 1996) eliminates the need for weight setting in multicriterial optimization.

Figure 3.29. Class–function classification


In the physical programming method the design team specifies quantitative ranges of different degrees of desirability for each design objective, using four different classes. Each class comprises two cases, defined as soft (S) and hard (H, or crisp). As depicted in Figure 3.29, the preference soft classes with respect to each objective are referred to as follows:

• class 1-S: smaller-is-better,

• class 2-S: larger-is-better,

• class 3-S: value-is-better,

• class 4-S: range-is-better.

For each criterion, a class function ḡi is established that constitutes a component of the multiobjective preference function to be minimized. A lower value of the class function is considered better than a higher value; the ideal value of the class function is zero. The previous classification offers significantly more flexibility than the typical weighted-criterion approach, provided the proper shape of the soft curves can be determined.

Class Functions

Physical programming exploits information which is already available to the designers: they must know the desired features of the 'best solution'. Physical programming class functions are used as indicators of aspiration levels (Trincas, 2002).

Figure 3.30 depicts the qualitative meaning of each class. The value of the design metric under consideration, gi, is on the horizontal axis, and the function that will be minimized for that design metric, ḡi, hereafter called the class function, is on the vertical axis. Each class comprises two cases, hard and soft, referring to the sharpness of the preference. All soft class functions become constituent components of the multiobjective function.

Under conventional design optimization approaches (for example, the weighted-sum approach), a design metric for which class 1-S or 2-S applies would generally become part of the multiobjective function with a multiplicative weight, while all the hard classes would simply become constraints. Handling classes 3-S and 4-S is a more difficult matter. One approach would be to use a positive or negative weight, depending on whether the current value of the pertaining design metric lies to the right or to the left of the most desired value during optimization. Choosing the right associated weights would involve significant trial and error. Physical programming removes this trial and error entirely by using the 3-S and/or 4-S class functions, which essentially adapt to the current range in objective space during optimization. The shape of the class function depends on the stated preference of the decision maker.


Figure 3.30. Class–function ranges for the ith objective

Physical Programming Lexicon

As mentioned previously, physical programming allows designers to express preferences for each criterion with more specificity and flexibility than by simply using the terms minimize, maximize, greater than, less than, or equal to, as would be done in the conventional mathematical programming formalism. Physical programming circumvents the limitations of such a problem formulation framework by employing a new, expansive and flexible lexicon. The PP lexicon comprises terms that characterize the degree of desirability of six ranges for each generic criterion of classes 1-S and 2-S, and ten ranges for class 3-S.


To illustrate the physical programming lexicon, consider the case of class 1-S, shown in Figure 3.30. The ranges are defined as follows, in order of decreasing preference:

• highly desirable (gi ≤ gi1): a range within which every value of the objective is considered ideal;

• desirable (gi1 ≤ gi ≤ gi2): a range that is desirable;

• tolerable (gi2 ≤ gi ≤ gi3): a range that is tolerable;

• undesirable (gi3 ≤ gi ≤ gi4): a range that is undesirable;

• highly undesirable (gi4 ≤ gi ≤ gi5): a range that is highly undesirable;

• unacceptable (gi ≥ gi5): this range is treated as infeasible.

These terms form the backbone of physical programming and bring a new flexibility to the design process. The shape of the class functions depends on the numerical values of the range targets. The targets gi1 through gi5 are physically meaningful values provided by the designer to quantify the preference associated with the ith design objective. Further insight into these ranges can be gained by examining the generic shapes of the class functions (Fig. 3.30). Since the curve in the highly desirable range is nearly flat, any two points in that range are of nearly equivalent desirability.

The class functions map design objectives into nondimensional, strictly positive real numbers. This mapping, in effect, transforms design objectives with different units and physical meanings onto a dimensionless scale through a unimodal function. Figure 3.30 illustrates the mathematical nature of the class functions and shows how they allow designers to express the preference ranges. Consider the class function 1-S, where six ranges are defined. The designer specifies the parameters gi1 through gi5. When the value of the design objective, gi, is less than gi1 (highly desirable range), further minimization produces essentially no change in the class function, expressing indifference. When, on the other hand, the value of the metric gi lies between gi4 and gi5 (highly undesirable range), the value of the class function is large, calling for further, significant minimization of the class function. The behavior of the other class functions is indicated in Figure 3.30. Stated simply, the value of the class function for each design objective governs the optimization path in the objective space for that design metric.
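The sketch below illustrates this idea with a simplified piecewise-linear 1-S class function (in the spirit of linear physical programming); the target values and the class-function levels at the range boundaries are hypothetical and chosen only to show the flat-then-convex shape.

```python
import numpy as np

# Hypothetical range targets for a smaller-is-better (1-S) criterion, e.g. a cost metric:
targets = [100.0, 120.0, 150.0, 200.0, 300.0]       # g_i1 ... g_i5
levels  = [0.0, 1.0, 3.0, 9.0, 27.0]                # class-function values at the boundaries

def class_function_1S(g):
    """Piecewise-linear 1-S class function: flat (ideal) below g_i1, convex increase above."""
    if g <= targets[0]:
        return levels[0]                             # highly desirable: essentially indifferent
    if g >= targets[-1]:
        return float("inf")                          # unacceptable: treated as infeasible
    return float(np.interp(g, targets, levels))      # linear within each preference range

for g in (90, 110, 140, 180, 260, 320):
    print(f"g = {g:3d}  ->  class function = {class_function_1S(g)}")
```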

In the case of class 1-S, the class function will seek to minimize its criterion only until the target value gi1 is reached. The decision maker can easily preclude the possibility of obtaining dominated solutions by setting gi1 to a value outside of the feasible space, so as to exclude solutions in the ideal range. A similar discussion applies to the cases of class 2-S.

In cases where the designers only have to care about staying within some limits, the hard option applies. For a hard criterion, only two ranges are defined, namely feasible and infeasible. It should be noticed that all of the soft class functions become constituent components of the multiobjective function to be minimized, and that all of the hard class functions simply appear as crisp constraints, as will be seen in the linear physical programming (LPP) model. The quantity on the vertical axis, ḡi, is what will be minimized. LPP's effectiveness comes from its ability to shape the class function to suit the typically complex texture of the design performance. It should be emphasized that, while choosing weights is difficult and undesirable, choosing target values is instead a welcome option.


This is because weights are physically meaningless, while targets are physically meaningful.

Mathematical Model

Since the weakest link in the MODM process is the development of the multiobjective function, making this step an effective phase of the design optimization process is a critical objective of the physical programming method. Another objective of PP is to simplify the application of computational optimization. To this end, physical programming brings user-friendliness to optimization, since it removes the weighting process and frees the decision maker from the details and intricacies of numerical optimization. A short description of the mathematical procedure for applying physical programming follows.

The implementation of physical programming defines the path from the design variables to the actual multiobjective function to be minimized via a nonlinear programming code. As stated before, the basic design process starts by choosing the design variables, which are then mapped into the design objectives gi. Numerically, the goodness achieved by a design objective depends on the value achieved, on the class type assigned to the objective and on the aspiration values associated with it (gi1 to gi5). The sum of all the class functions, which represent mappings of the design objectives, equals the multiobjective function (Messac, 1996).

Intra–objective preference

The intra-objective preference function is aimed at expressing the desiderata of the design team with respect to each objective. Some useful relationships resulting from class-function properties are provided here (see Fig. 3.30).

• The value of the class function is always the same at each range boundary, for any class type. Only the location of the boundary changes from objective to objective. As a consequence, as one travels across a given region (say, desirable), the change in the class function will always be of the same magnitude. This behavior of the class-function values at the boundaries is the critical factor that makes each range type have the same numerical consequence for different criteria. This common behavior has a normalizing effect and results in favorable numerical conditioning properties.

The improvement that takes place as one travels across the kth region is denoted ḡk and is expressed by the relation

\[ \bar{g}_k = \bar{g}_i\left(g^{+}_{ik}\right) = \bar{g}_i\left(g^{-}_{ik}\right)\,, \qquad 2 \le k \le 5\,; \qquad \bar{g}_1 \approx 0 \tag{3.9} \]

where i and k denote a generic objective number and range intersection, while the superscripts '+' and '-' refer to Class 1-S and Class 2-S, respectively.


• The independence of the class function from both the considered objective and the region type leads to the following condition

\[ \bar{g}_k = \beta\,(n_{sc} - 1)\,\bar{g}_{k-1}\,, \qquad 2 \le k \le 5\,; \qquad \beta > 1 \tag{3.10} \]

where nsc denotes the number of soft criteria and β is a convexity parameter. To apply equation (3.10), a small positive number (say, 0.1) needs to be given for ḡ2.

• Keeping the convexity requirement yields

\[ \tilde{g}^{+}_{ik} = g^{+}_{ik} - g^{+}_{i(k-1)}\,; \qquad \tilde{g}^{-}_{ik} = g^{-}_{ik} - g^{-}_{i(k-1)}\,; \qquad 2 \le k \le 5 \tag{3.11} \]

\[ \bar{g}_1 > \bar{g}_i\,[\,g_{i1}\,] \tag{3.12} \]

where g̃±ik is the length of the kth range of the ith objective.

The magnitude of the slopes of the class function of the ith objective changes from range to range and takes the form

\[ s^{\pm}_{ik} = \left( \frac{\partial \bar{g}^{\pm}_i}{\partial g_i} \right)_{g_i = g_{ik}}\,; \qquad s^{\pm}_{i(k-1)} = \left( \frac{\partial \bar{g}^{\pm}_i}{\partial g_i} \right)_{g_i = g_{i(k-1)}}\,; \qquad 2 \le k \le 5 \tag{3.13} \]

More than to the goal programming method, physical programming is conceptually similar to fuzzy optimization, where membership grade functions play a role similar to that of the class functions in physical programming.

Formulation of generic preference

Following Messac (1996), the class function in the range k > 1 can be expressed by means of a very flexible spline as

\[ \bar{g}_i = T_0(\zeta_{ik})\,\bar{g}_{k-1} + T_1(\zeta_{ik})\,\bar{g}_{k} + \bar{T}_0(\zeta_{ik},\lambda_{ik})\,s_{i(k-1)} + \bar{T}_1(\zeta_{ik},\lambda_{ik})\,s_{ik}\,, \qquad k = 2, \dots, 5 \tag{3.14} \]

where

\[ T_0 = 0.5\,\zeta^4 - 0.5\,(\zeta - 1)^4 - 2\,\zeta + 1.5 \]

\[ T_1 = -0.5\,\zeta^4 + 0.5\,(\zeta - 1)^4 + 2\,\zeta - 0.5 \]

\[ \bar{T}_0 = \lambda\,\left[\,0.125\,\zeta^4 - 0.375\,(\zeta - 1)^4 - 0.5\,\zeta + 1.5\,\right] \]

\[ \bar{T}_1 = \lambda\,\left[\,0.375\,\zeta^4 - 0.125\,(\zeta - 1)^4 - 0.5\,\zeta + 0.125\,\right] \]

\[ \zeta_{ik} = \frac{g_i - g_{i(k-1)}}{g_{ik} - g_{i(k-1)}}\,, \qquad 0 \le \zeta_{ik} \le 1\,; \qquad \lambda_{ik} = g_{ik} - g_{i(k-1)} \]


For range 1, the class-function expression is given by an exponential function that reads

\[ \bar{g}_{i1} = \bar{g}_{1}\,\exp\left[ \frac{s_{i1}}{\bar{g}_{1}}\,\left(g_i - g_{i1}\right) \right] \tag{3.15} \]

The quantities sik and si(k−1) are exactly the weights in the linear programming model of the class functions. In effect, equation (3.14) states that so long as all these weights are positive, the class function will be convex. The important point is to observe that convexity can always be satisfied by increasing the magnitude of the convexity parameter β through an iterative procedure.

Multiobjective preference function

Once the decision maker decides to which class each objective belongs and defines the range-target values, the intra-objective preference is complete. As to inter-objective preference, the worst objective is always treated first.

Finally, the physical programming mathematical model associated with the objective class functions takes the form of a multiobjective preference function

\[
\min_{x}\; G(x) = \log\left\{ \frac{1}{n_{sc}} \sum_{i=1}^{n_{sc}} \bar{g}_i\,[\,g_i(x)\,] \right\}
\tag{3.16}
\]

subject to

\[
\begin{aligned}
& g_i(x) \le g_{i5}(x) && \text{(for class 1-S)} \\
& g_i(x) \ge g_{i5}(x) && \text{(for class 2-S)} \\
& g_{i5L}(x) \le g_i(x) \le g_{i5R}(x) && \text{(for class 3-S)} \\
& g_{i5L}(x) \le g_{iL}(x) \le g_{i5R}(x) && \text{(for class 4-S)}
\end{aligned}
\]

while the crisp classes are treated as

\[
\begin{aligned}
& g_i(x) \le g_{i\,max}(x) && \text{(for class 1-H)} \\
& g_i(x) \ge g_{i\,min}(x) && \text{(for class 2-H)} \\
& g_i(x) = g_{i\nu}(x) && \text{(for class 3-H)} \\
& x_{j\,min} < x_j < x_{j\,max} && \text{(for class 4-H)}
\end{aligned}
\]

where the subscript j denotes each generic crisp constraint, while xjmin, xjmax, gimin and gimax represent the minimum and maximum values of the independent variables and of the crisp constraints, respectively, and the giν help to define the equality constraints.

Using the logarithmic operator in forming G(x) has the effect of mapping a domain that spans several orders of magnitude onto one that typically involves only one order of magnitude. The multiobjective function in equation (3.16) is the actual function that nonlinear programming codes minimize, with possible minor reassignments.
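A compact sketch of the aggregation in equation (3.16) follows; the piecewise-linear class function, the range targets and the two soft criteria are all illustrative assumptions, not the method's prescribed spline.

```python
import numpy as np

def class_1S(g, targets, levels):
    """Illustrative piecewise-linear 1-S class function (targets g_i1..g_i5, boundary levels)."""
    return float(np.interp(g, targets, levels))

def G(class_values):
    """Aggregate preference function of eq. (3.16): log of the mean soft class-function value."""
    return float(np.log10(np.mean(class_values)))

targets = [100.0, 120.0, 150.0, 200.0, 300.0]   # hypothetical range targets (shared by both criteria)
levels  = [0.1, 1.0, 3.0, 9.0, 27.0]            # class-function values at the range boundaries

design_A = [class_1S(130.0, targets, levels), class_1S(210.0, targets, levels)]
design_B = [class_1S(115.0, targets, levels), class_1S(160.0, targets, levels)]
print("G(A) =", round(G(design_A), 3), "  G(B) =", round(G(design_B), 3), "  (lower is better)")
```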


3.10 Advanced Decision Support Systems

Decision Support Systems (DSS) were originally developed at the beginning of the 1970s to aid managers in decision making processes. Various DSSs have been proposed in the past decades, the systems mainly aiming at easing the decision makers' tasks in the decision making process. They cover a wide variety of systems, tools and technologies, and integrate them into a computer system to facilitate the decision making process.

Various definitions were given to this term by researchers in the early days after it emerged. Keen and Scott-Morton (1978) proposed the following classic definition: Decision Support Systems combine the intellectual abilities of humans with the abilities of computer systems in order to improve the quality of the decisions made. DSSs are computer-based systems that are used in order to support decision makers in ill-structured problems. This definition was extended by Sage (1991) and Adelman (1992) to the following formulation: Decision Support Systems are interactive computer-based systems (software), which use analytical methods such as decision analysis, optimization algorithms, etc., in order to develop appropriate models that will support decision makers in the formulation of alternative solutions, the resolution of the reactions amongst them, their representation, and finally in the choice of the most appropriate solution to be implemented.

Therefore, a DSS is a computer-based information system that uses data and multicriterial decision-making (MADM and MODM) models to organize information for decision situations and interacts with decision makers to expand their horizons. It greatly alleviates the decision maker's burden in dealing with problems which are semi-structured or ill-structured, and supports all the phases of a decision making process. In addition, such systems are able to store and process a large amount of knowledge at much higher speed than the human mind, and can therefore considerably improve the quality of decision making.

3.10.1 Distributed Decision Support Systems

In today's engineering design, decision makers seldom make decisions alone, since decision making problems are becoming more and more complicated. This complexity inspires the idea of decomposing the complex decision making problem into partial problems and handling each by a different group of experts. This motivation has led to the emergence of Distributed Decision Support Systems (DDSS), specific DSSs that handle Distributed Decision Making (DDM) situations. DDM is defined as a decision making process in which the participating people own different specialized knowledge, execute different specialized tasks, and communicate with each other through a computer environment which aims at supporting the entire process (Chi and Turban, 1995).

With the development of IT, the utilization of DDM has dramatically expanded. Information technologies have on-line and real-time information capabilities through which DDM can be fulfilled easily and efficiently, because they offer immediate response and easy information exchange. Most current information systems provide such capabilities and can be characterized as distributed on-line systems.


More recently, web-based DSSs have been viewed as clients linked to a server hosting the DSS application; they have great potential to inspire new distributed, cooperative or collaborative decision support strategies impacting the very core structures of DSSs.

3.10.2 Artificial Intelligence

After the first calculating device, the abacus, was developed in antiquity, the ability to mechanize the algebraic process intrigued humans; important progress came with the mechanical computing engines designed by Charles Babbage in the nineteenth century. Just after the Second World War the digital computer was rapidly employed in many areas and alleviated some of the onerous and tedious work that people engage in. At almost the same time, researchers made efforts to create machines with some sort of intelligence. In 1950 Alan Turing, regarded as the father of Artificial Intelligence (AI), presented the famous Turing test as a criterion for deciding whether a machine can be said to think like a human being. Artificial intelligence became an area of computer science that focuses on making intelligent machines, especially intelligent computer programs, that can engage in behaviors that humans consider intelligent. Today, with the rapid upgrading of computers and sixty years of research, AI has been utilized in various fields, such as decision making, game playing, computer vision, speech recognition, expert systems and so on.

Expert systems (ES) are viewed as the best-known application field of artificial intelligence. ESs are problem-solving programs that combine the knowledge of human experts and mimic the way human experts reason. The goal of an expert system is to emulate the problem-solving process of the expert whose knowledge was used in developing the system.

Figure 3.31. Typical structure of an expert system

Figure 3.31 presents the typical structure of an expert system, which consists of three modules, namely the user interface, the inference engine, and the knowledge base. The operation procedure starts from the user's task query through the user interface. After receiving the query from the user, the inference engine manipulates and uses information in the knowledge base to form a line of reasoning. Then the response is provided by the expert system via the user interface. Further input may be required from the user until the system reaches a desired solution.

The user interface allows the user to interact with the expert system to accomplish a certain task. It manages the interaction, which can take place through menus, natural language or any other type of data exchange, between the system and its users.


A user can be (i) an expert who maintains and develops the system, (ii) an engineer who employs the system to solve his/her specific problem, or (iii) a technician who is trained in the problem-solving procedure.

The inference engine is the control mechanism that applies the information present in the knowledge base to task-specific data to arrive at a decision. It organizes and controls the steps taken to solve the problem. The most widely used problem-solving method at this point is IF-THEN rules, and the expert systems that use such rules for reasoning are called rule-based systems. In rule-based systems, inference engines utilize the idea that if the condition holds then the conclusion holds to form a line of reasoning. There are a few techniques for drawing inferences from a knowledge base, such as forward chaining, backward chaining and tree search. Forward chaining starts from a set of conditions and moves towards a conclusion, while backward chaining starts from the conclusion and tries to find a path that leads to it. Tree search is applied when the knowledge base is represented by a tree, and the reasoning process is performed by checking the nodes around the initial node until a terminal node is found.
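A toy sketch of forward chaining over IF-THEN rules is given below; the rules and facts are invented solely for illustration.

```python
# Each rule: IF all conditions hold THEN the conclusion holds.
rules = [
    ({"high block coefficient", "low speed"}, "full hull form"),
    ({"full hull form"}, "favour single-screw propulsion"),
]
facts = {"high block coefficient", "low speed"}       # initial task-specific data

# Forward chaining: fire rules whose conditions are satisfied until nothing new is derived.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)
```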

The knowledge base is the core of the advisory system. Its main purpose is to provide the connections between ideas, concepts, and statistical probabilities that allow the inference engine to perform an accurate evaluation of a problem. The knowledge base stores facts and rules, which include both factual and heuristic knowledge, and supports the judgment and reasoning of the inference engine.


Appendix

Terminology for MCDM Problems

Many terms are commonly used in the MCDM literature, such as alternatives, criteria, attributes, objectives, goals, decision matrix and so on. There are no universal definitions of these terms, since some authors make distinctions in their usage, while many use them interchangeably.

Alternatives

Alternatives are the finite set of different solutions which are available to the decision maker.

Criteria

Criteria are a measure of effectiveness of performance. They are the basis by which the performance of a design is evaluated. Criteria may be hard (constraints) or soft (attributes), according to the requirements analysis and the actual problem setting.

Attributes

Attributes are generally referred to as designed-to criteria that describe the performance, properties, and the like, of a technical system (size, weight, range, speed, payload, reliability, cost, etc.). They provide a means of evaluating the levels of aspiration achieved on various targets. That is why they are often referred to as soft constraints. Each design alternative can be characterized by a number of attributes chosen by the decision maker. Although most criteria are structured on a single level, sometimes, if there is a large quantity of them, the structure is based on a hierarchical composition.

Decision Variables

A decision variable is one of the specific choices made by a decision maker. For example, the weight of an industrial product is a decision variable.

Constraints

Constraints are temporarily fixed requirements on attributes and decision variables which cannot be violated in a given problem formulation; that is, upper and lower bounds cannot be exceeded, and strict requirements must be satisfied precisely. Constraints divide all possible solutions (combinations of variables) into two groups: feasible and infeasible. They are crude yes-or-no requirements, which can be either satisfied or not satisfied.

Weights

Most decision-making methods assign weights of importance to the criteria. To represent the relative importance of the attributes, a weighting vector is to be given as w = (w1, w2, . . . , wn). The attribute weights are cardinal weights that represent the decision maker's absolute preference.

Objectives

Objectives are unbounded, directionally specified (maximization/minimization) requirements which are to be pursued to the greatest extent possible. It is very likely that objectives will conflict with each other, in that improved achievement of one objective can only be accomplished at the expense of another. They generally indicate the desired direction of change, i.e. the direction in which to strive to do better as perceived by the decision maker. No particular value of the objective is set a priori as a reference point. Only its maximum/minimum is sought within the limits of feasibility determined by constraints and goals.


Goals

Goals (synonymous with targets) are useful for clearly identifying a level of achievement to strive toward, to surpass or not to exceed. They are often referred to as hard constraints because they are fixed to limit and restrict the set of feasible solutions from the generated alternatives. In many cases, the terms objective and goal are used interchangeably. Goals are temporarily fixed requirements which are to be satisfied as closely as possible in a given problem formulation; that is, upper and lower bounds as well as fixed requirements are to be approached as closely as possible. Goals allow for fine tuning through their control over the degree of satisfaction.

Decision matrix

Common to many MCDM techniques is the concept of the decision matrix, also called comparison matrix, goal decision matrix, or project impact matrix. It concisely indicates both the set of alternatives and the set of attributes being considered in a given problem. A decision matrix D is an (m × n) matrix in which the element xij represents the 'score' or 'performance rating' value of alternative Ai, i = 1, 2, . . . , m with respect to a set of attributes xj, j = 1, 2, . . . , n. Hence an alternative is denoted by a row vector

xi = (xi1, xi2, . . . , xin)

whereas a column vector of attributes

xj = (x1j , x2j , . . . , xmj)T

shows the contrast of each alternative with respect to attribute xj .

When expressed in numerical terms, the element xij is commonly termed the jth attribute value for alternative i.
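For instance (values purely illustrative), a decision matrix with m = 3 alternatives and n = 2 attributes, together with its row vectors (alternatives) and column vectors (attribute profiles):

```python
import numpy as np

D = np.array([[17.0, 11.0],     # alternative A1: scores on attributes x1, x2
              [16.0, 11.5],     # alternative A2
              [15.0, 12.0]])    # alternative A3

x_2 = D[1, :]                   # row vector: alternative A2 across all attributes
col_1 = D[:, 0]                 # column vector: all alternatives on attribute x1
print("A2 =", x_2, " attribute x1 over alternatives =", col_1)
```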

Metadesign

A metalevel process of designing systems that includes partitioning the system for function, partitioning the design process into a set of decisions, and planning the sequence in which these decisions will be made.

Classification of MCDM Solutions

MCDM problems do not always have a unique solution. Depending on their nature, different names are given to different types of solution.

Optimal solution

An optimal solution to a MCDM problem is one which results in the maximum value of each of the attribute or objective functions simultaneously. That is, x* is an optimal solution to the problem iff x* ∈ X and f(x*) ≥ f(x) for all x ∈ X.

Since it is the nature of MCDM criteria to conflict with each other, usually there is no optimal solution to a MCDM problem.

Ideal solution

The concept of the ideal solution is essential to the multicriterial decision making approach. An ideal solution may also be indicated as the optimal solution, superior solution, or utopia.


In a MADM problem the ideal solution A* to the decision problem is a hypothetical alternative whose attribute vector combines the best achievements of all attributes given in the decision matrix. Formally,

\[ A^* = (x^*_1, x^*_2, \dots, x^*_j, \dots, x^*_n)\,, \qquad x^*_j = \max_i\, U_j(x_{ij})\,, \quad i = 1, 2, \dots, m \]

where Uj(·) indicates the value/utility function or the membership grade of the jth attribute.

In a MODM problem, $\max_{x \in X}\,[f_1(x), f_2(x), \dots, f_k(x)]$ with $X = \{x \mid g_i(x) \le 0,\; i = 1, 2, \dots, m\}$, the ideal solution is the one that optimizes each objective function simultaneously, i.e.

\[ \max_{x \in X} f_j(x)\,, \qquad j = 1, 2, \dots, k \]

An ideal solution can then be defined as

\[ A^* = (f^*_1, f^*_2, \dots, f^*_j, \dots, f^*_k) \]

where f*j is a feasible and optimal value of the jth attribute (objective) function. This solution is generally infeasible; if it were not, then there would be no conflict among the objectives.

Though an ideal solution does not actually exist, the concept of the ideal solution is essential in the development of MCDM methods. For example, compromise models are based on the idea of obtaining the 'best possible solution', that is, the one closest to the ideal solution.

It should be noted that in MADM problems the ideal solution is driven by the existing alternatives. On the other hand, in a MODM environment the objective ideal is the best value that any alternative could possibly obtain. Hence locating an ideal solution is one of the topics in MADM studies when a decision maker uses nonmonotonic value/utility functions or membership grade functions.

Nondominated solution

This solution is named differently by different disciplines: non-inferior solution and efficient solution in MCDM, a set of admissible alternatives in statistical decision theory, and Pareto-optimal solution in economics.

A feasible solution x* in MCDM is called a nondominated solution if and only if there exists no other feasible solution that will yield an improvement in one attribute without causing a degradation in at least another attribute. In other words, a nondominated solution is achieved when no attribute can be improved without simultaneous detriment to at least one other attribute.

The nondominated solution concept is well utilized for the second-level screening process in MADM. However, the generation of a large number of nondominated solutions significantly reduces its effectiveness in screening the feasible solutions of MODM problems; rather, the nondominance concept is used as a sufficient condition for the final solution.

Satisficing solution

A satisficing solution (Simon, 1996) belongs to a reduced subset of feasible solutions that are not the 'best' but 'good enough'. Satisficing solutions need not be nondominated. A satisficing solution may well be used as the final solution, though the concept is often utilized for screening out infeasible solutions. Whether a solution is satisficing depends on the level of knowledge and ability of the decision maker.

Preferred solution

The preferred solution, which is a nondominated solution, is the solution that is most satisficing for the decision maker. Under this view, MCDM methods can be regarded as decision aids for reaching the preferred solution, on condition that the subjective preferences of the decision maker are observed.


Bibliography

[1] Adelman, L.: Evaluating Decision Support and Expert Systems, John Wiley & Sons, New York, 1992.

[2] Asimov, M.: Introduction to Design, Prentice–Hall, Englewood Cliffs, 1962.

[3] Belton, V., Stewart, Th.J.: Multiple Criteria Decision Analysis: An Integrated Approach, Kluwer Academic Publishers, Boston, 2002.

[4] Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces, John Wiley & Sons, New York, 1987.

[5] Calpine, H.C., Golding, A.: Some Properties of Pareto–Optimal Choices in Decision Problems,OMEGA, 1976, Vol. 4, no. 1, pp. 141–147.

[6] Clemen, R.T.: Making Hard Decisions - An Introduction to Decision Analysis, 2nd Edition, Duxbury Press, Pacific Grove, 1995.

[7] Coombs, C.H.: On the Use of Inconsistency of Preferences in Psychological Measurement, Journal of Experimental Psychology, Vol. 55, 1958, pp. 1–7.

[8] Dieter, G.E.: Engineering Design, McGraw–Hill, Boston, 2000.

[9] Edwards, W., Barron, F.H.: SMARTS and SMARTER: Improved Simple Methods for Multiattribute Utility Measurement, Organizational Behavior and Human Decision Processes, 1994, Vol. 60, pp. 306–325.

[10] Geoffrion, A.M.: Solving Bicriterion Mathematical Programs, Operations Research, Vol. 15, no. 1,1967, pp. 39–54.

[11] Grubisic, I., Zanic, V., Trincas, G.: Sensitivity of Multiattribute Design to Economic Environment: Shortsea Ro-Ro Vessels, Proceedings, 6th International Marine Design Conference, IMDC'97, Newcastle-upon-Tyne, 1997, pp. 201–216.

[12] Hamalainen, R.P., Lindstedt, M.R.K., Sinkko, K.: Multiattribute Risk Analysis in Nuclear Emergency Management, Risk Analysis, Vol. 20, no. 4, 2000, pp. 455–467.

[13] Hazelrigg, G.A.: Systems Engineering: An Approach to Information-Based Design, Prentice–Hall, Upper Saddle River, 1996.

[14] Hey, J.D.: Uncertainty in Microeconomics, Martin Robertson, Oxford, 1979.

[15] Hwang, C.L., Yoon, K.: Multiple Attribute Decision Making: Methods and Applications - A State-of-the-Art Survey, Springer–Verlag, Berlin–Heidelberg, 1981.

[16] Ignizio, J.P.: Generalized Goal Programming: An Overview, Computers & Operations Research, Vol. 5, no. 3, 1983, pp. 179–197.

[17] Keen, P.G.W., Scott–Morton, M.S.: Decision Support Systems: An Organizational Perspective, MA:Addison–Wesley, 1978.

[18] Keeney, R.: Siting Energy Facilities, Academic Press, New York, 1980.

[19] Keeney, R., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs,Cambridge University Press, Cambridge, Massachusetts, 1993.


[20] Kesselring, F.: Technical–Economic Designing , VDI–Zeitschr., 1964, Vol. 106, no. 30, pp. 1530–1532.

[21] Lee, D., Kim, S.Y.: Techno-Economic Optimization of an LNG Carrier with Multicriteria in Preliminary Design Stage, Journal of Ship Production, 1996, Vol. 12, no. 3, pp. 141–152.

[22] Li, Y., Mavris, D.N., De Laurentis, D.A.: The Investigation of a Decision-Making Technique Using the Loss Function, Proceedings, 4th AIAA Aviation Technology, Integration, and Operations (ATIO) Forum, AIAA-2004-6205, 2004.

[23] Messac, A.: Physical Programming: Effective Optimization for Design, AIAA Journal, Vol. 34, no. 1,1996, pp. 149-158.

[24] Mistree, F., Smith, W.F., Kamal, S.Z., Bras, B.A.: Designing Decisions: Axioms, Models, Marine Applications, Proceedings, 4th International Marine Systems Design Conference, IMSDC'91, SNAJ, Kobe, 1991, pp. 1–24.

[25] Morgenstern, O.: Thirteen Critical Points in Contemporary Economic Theory - An Interpretation,Journal of Economic Literature, Vol. 10, 1972, pp. 1163–1189.

[26] Moskowitz, H., Wright, G.P.: Operation Research Techniques for Management , Prentice–Hall, 1979.

[27] Neumann, J. von, Morgenstern, O.: Theory of Games, Economic Behavior , Princeton UniversityPress, Princeton, 1944.

[28] Pareto, V.: Manuale di Economia Politica, con una Introduzione alla Scienza Sociale, SocietaEditrice Libraria, Milano, 1906.

[29] Ray, T., Sha, O.P.: Multicriteria Optimization Model for a Containership Design, Marine Technology,Vol. 29, no. 4, 1994, pp. 258–268.

[30] Roy, B.: Classement et choix en presence de points de vue multiple (la methode electre), RAIRO,Vol. 2, 1968, pp. 57–75.

[31] Roy, B.: Vers une methodologie generale d’aide a la decision, METRA, Vol. XIV, no. 3, 1975,pp. 459–497.

[32] Roy, B.: A Conceptual Framework for a Prescriptive Theory of ‘Decision Aid’ , in ‘Multiple CriteriaDecision Making, TIMS Studies in teh Management Sciences, Starr & Zeleny eds., North–HollandPublishing, Vol. 6, 1977, pp. 179–210.

[33] Roy, B.: Partial Preference Analysis, Decision–Aid: the Fuzzy Outranking Concept , in ‘ConflictingObjectives in Decision’, John Wiley & Sons, New York, 1994.

[34] Roy, B., Vincke, P.: Pseudo-merites et systemes relationnels de preferences - nouveaux conceptset nouveaux resultats en vue de l’aide a la decision, Universite de Paris–Dauphine, Cahiers duLAMSADE, no. 28, 1980.

[35] Saaty, T.L.: A Scaling Method for Priorities in Hierarchical Structures, Journal of MathematicalPsychology, 1977, Vol. 15, no. 3, pp. 234–281.

[36] Saaty, T.: The Analytic Hierarchy Process, New York: McGraw–Hill, 1980.

[37] Sage, A.P.: Decision Support Systems Engineering , John Wiley & Sons, New York, 1991.

[38] Sen, P.: Marine Design: The Multiple Approach, Transactions RINA, Vol. 122, 1992.

[39] Sen, P., Yang, J.B.: Multiple Criteria Decision Support in Engineering Design, Springer–Verlag,London, 1998.

[40] Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication, The University of IllinoisPress, Urbana, III, 1947.

[41] Simon, H.A.: The Science of the Artificial , The MIT Press, Cambridge, Massachusetts, 1996.

[42] Stadler, W.E.: Fundamentals of Multicriteria Optimization, Multicriteria Optimization in Engineer-ing, in the Science, Plenum Press, New York, 1988.


[43] Starr, M.K., Greenwood, L.H.: Normative Generation of Alternatives with Multiple Criteria Evaluation, in ‘Multiple Criteria Decision Making’, Starr & Zeleny eds., North–Holland, New York, 1977, pp. 111–128.

[44] Trincas, G., Zanic, V., Grubisic, I.: Optimization Procedure for Preliminary Design of Fishing Vessels, Proceedings, 2nd Symposium on ‘Technics & Technology in Fishing Vessels’, Ancona, 1989, pp. 22–31.

[45] Trincas, G.: Addressing Robust Concept Ship Design by Physical Programming (invited paper), Proceedings, Fourth International Conference on Marine Industry, Barudov & Bogdanov eds., Varna, Vol. II, 2002, pp. 29–38.

[46] Trincas, G., Zanic, V., Grubisic, I.: Comprehensive Concept Design of Fast Ro–Ro Ships by Multiattribute Decision-Making, Proceedings, 4th International Marine Design Conference, IMDC'94, Delft, 1994, pp. 403–418.

[47] Winterfeldt, D. von, Edwards, W.: Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge, 1986.

[48] Winston, W.: Operations Research: Applications and Algorithms, PWS–KENT, Boston.

[49] Yu, P.L.: A Class of Solutions for Group Decision Problems, Management Science, 1973, Vol. 19, no. 8, pp. 936–946.

[50] Yu, P.L., Zeleny, M.: The Set of All Nondominated Solutions in Linear Cases and a Multicriteria Simplex Method, Journal of Mathematical Analysis and Applications, 1975, Vol. 49, pp. 430–468.

[51] Zadeh, L.A.: Outline of a New Approach to the Analysis of Complex Systems and Decision Processes, in ‘Multiple Criteria Decision Making’, Cochrane & Zeleny eds., University of South Carolina Press, Columbia, 1973, pp. 686–725.

[52] Zeleny, M.: A Concept of Compromise Solutions and the Method of the Displaced Ideal, Computers & Operations Research, Vol. 1, no. 4, 1974, pp. 479–496.

[53] Zeleny, M.: Linear Multiobjective Programming, Springer–Verlag, Berlin/Heidelberg, 1974.

[54] Zeleny, M.: Multiple Criteria Decision Making, McGraw–Hill, New York, 1982.

[55] Zanic, V., Grubisic, I., Trincas, G.: Multiattribute Decision-Making System Based on Random Generation of Nondominated Solutions: An Application to Fishing Vessel Design, in ‘Practical Design of Ships and Mobile Units’, PRADS'92, Caldwell & Ward eds., Elsevier Applied Science, Vol. 2, 1992, pp. 1443–1460.

[56] Zanic, V., Grubisic, I., Trincas, G.: Mathematical Models for Ship Concept Design, Proceedings, Eighth Congress of the ‘International Maritime Association of Mediterranean’, IMAM'97, Istanbul, 1997, Vol. 1, pp. 5.1-7 – 5.1-16.


Chapter 4

Multiattribute Solution Methods

A complex technical system needs complex decisions. To assist in this difficult task, decision support methods have been developed. They try to supersede heuristic choices based on experience or intuition and to support decisions with scientifically grounded arguments. A decision support method tries to model the preference system in the mind of the decision maker.

Among decision support methods, multiattribute decision making (MADM) deals with the methodologies of selection from among a finite and generally small set of discrete and predetermined design alternatives associated with multiple attributes. Although only the last decades have seen the effort to introduce the concept of multiple criteria into the normative decision making process, studies on multiple criteria have a long tradition. MADM has found acceptance in many real–life decision situations: in the business sector, management science, economics, psychometrics, marketing, applied statistics, decision theory, and so on. Consequently, each area has developed methods for its own particular application, mostly to explain, rationalize, understand, or predict decision behavior, but not to guide the decision making process.

Multiattribute decision–making techniques can partially or completely rank the alternatives: a single most ‘preferred alternative’ can be identified, or a short list of a limited number of ‘best possible alternatives’ can be selected for subsequent detailed appraisal.

In discrete alternative multiattribute decision–making problems, an integrated procedure for decision making can be the following:

1. identification of the necessary attributes for the problem;

2. elicitation of attribute weights by each individual;

3. allocation of weights to attributes by group consensus;

4. ranking of the alternatives by each individual;

5. screening of the alternatives by the group for the final decision;

6. choosing the most preferred alternative.


Multiattribute decision making is defined in a narrow sense as a decision aid to help a decision maker to identify the ‘best possible alternative’ that maximizes his/her satisfaction with respect to more than one attribute.

Several methods have been developed to solve MADM problems. Some of the methods, particularly those from the psychology literature, are oriented toward describing the process by which such decisions are made, in order to better understand them and to predict actions and choices in future decision situations. Other approaches, particularly those from the management science literature, are directed toward providing decision makers with practical techniques which can be used to improve their decision making. Intermediate methods are more suitable in the engineering field, where models are structured in terms of the designers’ actual preferences, but are then used normatively. MADM methods apply to problems where a decision maker is choosing or ranking a finite number of alternatives which are measured by two or more relevant attributes.

Most multiattribute decision–making techniques for problem–solving on discrete alternatives focus on value evaluation, such as setting standards for evaluation attributes, assigning a weight to each attribute, grading each alternative under individual or group criteria, synthesizing outcomes and utilities, and ranking alternatives. These techniques usually assume that the set of attributes is predefined or that some kind of agreement exists before the MADM solving procedure starts.

Attributes and constraints serve as input to model the concept design process according to the decision maker’s representation. The MADM modelling for outranking and selection starts out at the stage where feasible alternatives have been built and the nondominated alternatives have been identified. The general concepts of dominance structures and nondominated solutions play an important role in describing the decision problems and the decision maker’s revealed preferences. Usually, there exist a number of Pareto optimal solutions, which are considered as candidates for the final solution. The nondominated designs together with their attribute levels form the decision matrix, which is the concise expression of a MADM problem.

4.1 Decision Matrix

A multiattribute decision–making problem can generally be characterized by a decision matrix which indicates both the set of alternatives and the set of attributes being considered in a design problem. The decision matrix summarizes the ‘raw’ data available to the decision maker at the start of the analysis. A decision matrix has a row corresponding to each alternative being considered and a column corresponding to each attribute being considered. A problem with a total of m alternatives characterized by n attributes is described by an m × n matrix A as shown in Figure 4.1. Each element of the matrix is the ‘score’ or ‘performance rating’ of that row’s alternative with respect to that column’s attribute, and can be stated either numerically or verbally. When expressed in numerical terms, the element aij is commonly termed the jth attribute value for the ith alternative.
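As an illustration only, the decision matrix can be held as a plain two–dimensional array; the alternative names, attribute labels and figures in the following sketch are hypothetical and serve merely to fix the notation aij.

```python
import numpy as np

# Illustrative decision matrix: m = 4 candidate designs (rows) rated on
# n = 3 attributes (columns). All names and figures are hypothetical.
alternatives = ["Design A", "Design B", "Design C", "Design D"]
attributes = ["deadweight [t]", "speed [kn]", "building cost [M$]"]

A = np.array([
    [12000.0, 18.5, 42.0],
    [11500.0, 19.0, 45.0],
    [12500.0, 17.5, 40.0],
    [11800.0, 18.0, 41.5],
])

# A[i, j] is the score of the i-th alternative with respect to the j-th attribute
print(alternatives[0], dict(zip(attributes, A[0])))
```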


If the decision matrix is square, matrix A is a ‘reciprocal matrix’ with all positive elements. Its elements aij compare the alternatives i and j of a decision problem. These elements are said to be consistent if they respect the transitivity rule

a_{ij} = a_{ik} \cdot a_{kj} \qquad (4.1)

together with the reciprocity rule

a_{ij} = \frac{1}{a_{ji}} \qquad (4.2)

where j > k > i are any alternatives of the comparison matrix¹.

Figure 4.1. Comparison matrix

The multiattribute decision–making approaches can be viewed as alternative methods for combining the information in a design decision matrix together with additional information from the decision maker in order to determine a final ranking, screening, or selection from the alternatives. Besides the information contained in the decision matrix, all but the simplest MADM techniques require additional information from the decision maker in order to arrive at a final ranking, screening, or selection. For example, the decision matrix provides no information about the relative importance of the different attributes to the decision maker, nor about any minimum or maximum acceptable values, or target values, for particular attributes.

It is important that the decision matrix include only those attributes which vary significantly among one or more alternatives and for which the decision maker considers this variation to be important. Some attributes may be important as threshold criteria (constraints), in that alternatives are excluded from further consideration if they do not meet the threshold requirement. But with respect to such constraints, variation among alternatives that all pass the screening requirement is irrelevant. In such cases these attributes should not be included in the decision matrix.

¹ A comparison matrix is reciprocal because its lower triangular part is the reciprocal of its upper triangular part and all the elements of the principal diagonal are 1. Therefore a transitivity test on one of the two parts of the matrix is sufficient; hence, for each element aij, a number j − (i + 1) of equations (4.1) have to be respected.


4.2 Measuring Attribute Importance

One crucial problem in MADM is to assess the relative importance of the different attributes, which should be consistent with the design strategy. Weights reflect the relative importance of attributes. If the decision problem entails several conflicting attributes, experience has shown that not all of them are of equal importance to the customer. The importance of preferences among attributes should not be underestimated, since a multiattribute decision making procedure can produce different designs depending on the way attribute information is processed. Weights may represent the opinion of a single decision maker or synthesize the opinions of a group of experts using a group decision technique.

Designers may determine the weighting coefficients on the basis of their perception; that implies that they are subject to incomplete information and subjective judgement.

In the case of n attributes, the vector of attribute weights reads

w = \{w_1, w_2, \ldots, w_j, \ldots, w_n\} \qquad \text{where} \qquad \sum_{j=1}^{n} w_j = 1

Typically, the method used for assigning weights to attributes is primarily based on the nature of the problem and the available information. Several approaches have been proposed to determine weights. They may be distinguished into two classes, i.e. subjective weights and objective weights.

The subjective weights are supposed to convey the strategic task as perceived by the decision maker. They form the inter–attribute preferences and correspond to situations where the data of the decision matrix, in terms of a set of alternative solutions and values of the associated decision attributes, is unknown.

The objective weights are derived from a computational analysis of the attribute space. Therefore they are intrinsic properties of the attribute space (intra-attribute preferences). When the decision matrix information is available, methods such as the ‘entropy method’ and the ‘linear programming technique’ can be used to assess the weights of attributes.

Apart from the oldest method, named mean of the normalized values², four methods are available for assessing the cardinal weights in the MADM environment:

• the eigenvector method;
• the entropy method;
• the weighted least-square method;
• the LINMAP method.

² This is the oldest method and is based on three steps to derive the priorities, as follows:
• sum of the elements of column j;
• normalization of column j;
• mean of row i.

The ‘mean of the normalized values’ calculates exact priorities only for consistent matrices. In the case of inconsistent matrices, this method cannot be mathematically justified; no theory is known for inconsistent matrices.


Among these four methods, the entropy and LINMAP methods both require the decision matrix to be a part of the input. However, at the design stage, the requirement is to use the weights to find the best alternative and not to choose the best one from an enumeration of a set of alternatives. Therefore, these methods cannot be used in conjunction with the MODM environment.

The weighted least–square and eigenvector methods are based on a so–called fundamental scale concept in the MADM environment. They can be used only if the information about the relative importance of each attribute over another is known. This information is represented in a square matrix, termed the pairwise comparison matrix.

4.2.1 Eigenvector Method

The decision maker is supposed to judge the relative importance of n attributes. Saaty (1977) introduced a method of ratio scale priorities using the principal eigenvalue of a positive pairwise comparison matrix, which has to respect the principle of consistency. Let the positive pairwise comparison matrix A be

A =
\begin{array}{c} A_1 \\ A_2 \\ \vdots \\ A_m \end{array}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
w_1/w_1 & w_1/w_2 & \cdots & w_1/w_n \\
w_2/w_1 & w_2/w_2 & \cdots & w_2/w_n \\
\vdots & \vdots & \ddots & \vdots \\
w_m/w_1 & w_m/w_2 & \cdots & w_m/w_n
\end{bmatrix}
\qquad (4.3)

where A1, A2, . . ., Am are the design alternatives among which the decision maker has to choose, x1, x2, . . ., xn are the attributes with which the performance of each alternative is measured, aij is the rating of design Ai with respect to attribute xj, and wj denotes the relative weight of attribute xj, where wj ≥ 0 (j = 1, 2, . . . , n) and Σ wj = 1. The number of independent pairwise evaluations is n(n − 1)/2.

If the decision maker wants to find the vector of weights w, given these ratios, he/she can take the matrix product of the matrix A with the vector w to obtain³

A \cdot w =
\begin{bmatrix}
w_1/w_1 & w_1/w_2 & \cdots & w_1/w_n \\
w_2/w_1 & w_2/w_2 & \cdots & w_2/w_n \\
\vdots & \vdots & & \vdots \\
w_m/w_1 & w_m/w_2 & \cdots & w_m/w_n
\end{bmatrix}
\cdot
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}
= \lambda
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}
= \lambda \cdot w

³ The matrix product is formed by multiplying, element by element, each row of the first factor, A, by the corresponding elements of the second factor, w, and adding. Thus the weighting vector would be λ·w.


or

A·w = λ·w (4.4)

where λ is the eigenvalue of the matrix A associated with the eigenvector w, which is the priority vector; w may be normalized and used as a vector of relative attribute weights.

Due to the consistency property (4.1), the system of homogeneous linear equations (4.4) has only trivial solutions unless some imprecision in these evaluations is allowed. Saaty justifies the eigenvalue approach for slightly inconsistent matrices with the perturbation theory, which says that slight variations in a consistent matrix imply slight variations of the eigenvector and the eigenvalue.

In general, as the precise value of wi/wj is difficult to assess, the decision maker’s evaluations cannot be so accurate as to satisfy the transitivity rule (4.1) completely. It is known that, in any matrix, small perturbations in the coefficients imply small perturbations in the eigenvalues. If A′ is defined as the near-consistent matrix slightly modified from the matrix A and w′ denotes the corresponding weight vector, then w′ is determined as the eigenvector corresponding to the maximum eigenvalue λmax of A′ according to

A' \cdot w' = \lambda_{\max}\, w' \qquad (4.5)

So the vector w′ can be obtained by solving the system of linear equations (4.5). This principal eigenvector of the pairwise comparison matrix is then normalized so that the elements in the final vector of weights w′ sum to 1.

In summary, the eigenvector method calculates a vector of cardinal weights which are derived from the principal eigenvector of the pairwise comparison matrix and which are normalized to sum to one. This method relies on the fact that slight perturbations of a consistent matrix induce slight perturbations of the eigenvalues and of the corresponding eigenvector.
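A minimal numerical sketch of the eigenvector method, assuming NumPy and a hypothetical 3 × 3 pairwise comparison matrix, is given below. Saaty's consistency index CI = (λmax − n)/(n − 1), which vanishes for a perfectly consistent matrix, is computed as a quick check of near-consistency.

```python
import numpy as np

# Hypothetical pairwise comparison matrix for three attributes
# (each entry a_ij estimates w_i / w_j; reciprocity a_ji = 1/a_ij holds by construction).
A = np.array([
    [1.0,  3.0, 5.0],
    [1/3., 1.0, 2.0],
    [1/5., 1/2., 1.0],
])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)            # principal (maximum) eigenvalue
lam_max = eigvals.real[k]
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()                        # normalize the weights so they sum to one

n = A.shape[0]
CI = (lam_max - n) / (n - 1)           # Saaty's consistency index (0 for a consistent matrix)

print("weights:", np.round(w, 3))
print("lambda_max:", round(lam_max, 3), " CI:", round(CI, 4))
```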

4.2.2 Weighted Least Square Method

Chu et al. (1979) propose the weighted least square method to obtain the weights. This method involves the solution of a set of simultaneous linear algebraic equations and is conceptually easier to understand than Saaty’s eigenvector method.

To determine the weights, suppose the decision maker gives his/her pairwise comparisons between the elements aij of Saaty’s matrix A in equation (4.3). If aij = wi/wj are the elements of a pairwise comparison matrix, the weights can be obtained by solving the constrained optimization problem (nonlinear programming model)

\text{Minimize} \quad z = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( w_j\, a_{ij} - w_i \right)^2

\text{subject to} \quad \sum_{i=1}^{n} w_i = 1 \; ; \quad w_i > 0
\qquad (4.6)


where aij denotes the relative weight of attribute Ai with respect to attribute Aj. Although an additional constraint for model (4.6) is that wi > 0, it is assumed that the above problem can be solved to obtain wi > 0 without this constraint.

Equation (4.6) is a nonlinear programming model. In order to minimize z, the Lagrangian function is formed

L = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( w_j\, a_{ij} - w_i \right)^2 + 2 \lambda \left( \sum_{i=1}^{n} w_i - 1 \right) \qquad (4.7)

where λ is the Lagrangian multiplier.

Differentiating equation (4.7) with respect to wm and λ, the following set of (n + 1) nonhomogeneous linear equations in (n + 1) unknowns is obtained

\sum_{i=1}^{n} \left( w_m\, a_{im} - w_i \right) a_{im} - \sum_{j=1}^{n} \left( w_j\, a_{mj} - w_m \right) + \lambda = 0 \; ; \qquad m = 1, 2, \ldots, n \qquad (4.8)

which provides the n weights wi and the Lagrangian multiplier λ.

For example, for n = 2, the equations are (recall that aii = 1, ∀ i)

(1 + a_{21}^2)\, w_1 - (a_{12} + a_{21})\, w_2 + \lambda = 0

-(a_{21} + a_{12})\, w_1 + (1 + a_{12}^2)\, w_2 + \lambda = 0

w_1 + w_2 = 1

Given the coefficients aij , the above equations can be solved for w1, w2, and λ.
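The sketch below, again assuming NumPy and a hypothetical comparison matrix, assembles the linear system implied by equation (4.8) together with the constraint Σ wi = 1 and solves it for the weights and the multiplier λ.

```python
import numpy as np

# Hypothetical pairwise comparison matrix (a_ij estimates w_i / w_j)
A = np.array([
    [1.0,  3.0, 5.0],
    [1/3., 1.0, 2.0],
    [1/5., 1/2., 1.0],
])
n = A.shape[0]

# Linear system in the unknowns (w_1, ..., w_n, lambda):
# rows 0..n-1 come from equation (4.8), the last row enforces sum(w) = 1.
M = np.zeros((n + 1, n + 1))
b = np.zeros(n + 1)
for m in range(n):
    for k in range(n):
        M[m, k] = (np.sum(A[:, m] ** 2) + n) * (m == k) - A[k, m] - A[m, k]
    M[m, n] = 1.0                      # coefficient of the Lagrangian multiplier
M[n, :n] = 1.0
b[n] = 1.0

x = np.linalg.solve(M, b)
w, lam = x[:n], x[n]
print("weights:", np.round(w, 3), " lambda:", round(lam, 4))
```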

The main disadvantage of the weighted least square method is probably the fact that the theory behind this method is based on the assumption that the weights are known exactly. It is important to remain aware of this potential problem, and to use the weighted least square method only when the weights can be estimated precisely relative to one another.

4.2.3 Entropy Method

The entropy method can be used for evaluating the weights when the data of the decision matrix are known with some uncertainty. Entropy analysis indicates the discriminating ability of a certain attribute in a given design space. If all competing alternatives have similar scores with respect to a certain performance attribute, then this attribute does not have any relevance in the comparative analysis and can be eliminated (Hwang and Yoon, 1981). On the other hand, if the attribute outcomes of the alternative solutions are very different, then the attribute has an important discriminating ability. In other terms, the more distinct and differentiated the scores are, i.e. the larger the contrast intensity of an attribute, the greater the amount of ‘decision information’ contained in and transmitted by the attribute. Weights derived from entropy analysis reflect these strengths. Therefore, the entropy idea is particularly useful to investigate contrasts between sets of data.


Entropy is the most fundamental concept in information theory as well as in statistical mechanics, since it has many properties that agree with the intuitive notion of what a measure of information should be. It measures the uncertainty associated with random phenomena, i.e. the expected information content (Shannon and Weaver, 1947). This uncertainty is represented by a discrete probability distribution, pj, in agreement with the idea that a broad distribution represents more uncertainty than does a sharply peaked one.

The measure of uncertainty E in a probability distribution (p1, p2, . . . , pn), associated with the n possible outcomes of a certain attribute, is given by Shannon (1948) as

E(p_1, p_2, \ldots, p_n) = -k \sum_{j=1}^{n} p_j \ln p_j

where k is a positive constant and ln denotes the natural logarithm. Since the terms ‘entropy’ and ‘uncertainty’ are considered synonymous in statistical mechanics, E is called the entropy of the probability distribution pj, since it depends only on the individual probabilities. Observe that the larger the entropy is, the less information is transmitted by the jth attribute. E(p1, . . . , pn) takes its maximum value when all scores have the same probability pj = 1/n.

The entries of a decision matrix with m alternatives and n decision attributes can be represented by a probability distribution pij, where i = 1, 2, . . . , m counts the alternatives and j = 1, 2, . . . , n the attributes. Each entry pij carries a certain information content, which can be measured by means of the entropy value. Therefore, if the decision matrix D of m alternatives and n attributes is

D =
\begin{array}{c} A_1 \\ A_2 \\ \vdots \\ A_m \end{array}
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{bmatrix}

(with rows corresponding to the alternatives A1, . . ., Am and columns to the attributes x1, . . ., xn)

then a probability value pij for each entry in the decision matrix can simply be determined by normalizing the attribute values over the m design alternatives; that is

p_{ij} = \frac{x_{ij}}{\displaystyle\sum_{i=1}^{m} x_{ij}} \; , \qquad \forall\, i, j \qquad (4.9)

Based on this, the pij matrix is formed as follows

p_{ij} =
\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{m1} & p_{m2} & \cdots & p_{mn}
\end{bmatrix}


Then the entropy Ej of a decision attribute j for the m design alternatives is determined as

E_j = -k \sum_{i=1}^{m} p_{ij} \ln p_{ij} \; , \qquad \forall\, j = 1, \ldots, n \qquad (4.10)

where k denotes a constant with value 1/ln m (the reciprocal of the maximum entropy), which guarantees that 0 ≤ Ej ≤ 1.

The weight related to the entropy Ej is then

w_j = 1 - \frac{E_j}{\displaystyle\sum_{i=1}^{n} E_i} \qquad (4.11)

Zeleny (1974) mentioned that a weight assigned to an attribute is directly related to the average intrinsic information generated by a given set of alternatives over that attribute, as well as to its subjective assessment. Based on this, the degree of diversification dj of the information provided by the outcomes of an attribute j can be defined as

d_j = 1 - E_j \; , \qquad \forall\, j \qquad (4.12)

According to Hwang and Yoon (1981), if the decision maker has no reason to prefer one attribute to another, the principle of insufficient reason (Starr and Greenwood, 1977) suggests that each attribute should be equally preferred. Then the best weight set w associated with the n decision attributes that the decision maker can expect, instead of equal weights, has elements

w_j = \frac{d_j}{\displaystyle\sum_{j=1}^{n} d_j} \; , \qquad \forall\, j \qquad (4.13)

If the decision maker has a prior, subjective weight λj, then the overall importance weight can be adapted using the set of calculated weights wj. The new weight w◦j can be formulated as follows

w^{\circ}_j = \frac{w_j\, \lambda_j}{\displaystyle\sum_{j=1}^{n} w_j\, \lambda_j} \; , \qquad \forall\, j \qquad (4.14)

It may be concluded that the most important attribute is always the one having both wj and λj at their highest possible levels.
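A compact sketch of the whole entropy procedure, equations (4.9) to (4.14), is given below for a hypothetical decision matrix; the figures and the prior subjective weights λj are purely illustrative.

```python
import numpy as np

# Hypothetical decision matrix: 4 design alternatives (rows) x 3 benefit attributes (columns)
X = np.array([
    [0.8, 120.0, 7.5],
    [0.6, 150.0, 6.0],
    [0.9, 100.0, 8.0],
    [0.7, 130.0, 7.0],
])
m, n = X.shape

P = X / X.sum(axis=0)                    # eq. (4.9): column-wise normalization
k = 1.0 / np.log(m)                      # constant guaranteeing 0 <= E_j <= 1
E = -k * np.sum(P * np.log(P), axis=0)   # eq. (4.10): entropy of each attribute
d = 1.0 - E                              # eq. (4.12): degree of diversification
w = d / d.sum()                          # eq. (4.13): entropy-based weights

lam = np.array([0.5, 0.3, 0.2])          # assumed prior subjective weights lambda_j
w_adj = (w * lam) / np.sum(w * lam)      # eq. (4.14): blended weights

print("entropy:          ", np.round(E, 3))
print("objective weights:", np.round(w, 3))
print("adjusted weights: ", np.round(w_adj, 3))
```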


4.3 Selection Models

The multiattribute decision–making methods can address three types of problems: screening alternatives, ranking alternatives, and selecting the final ‘best’ alternative. Note that if a method generates a cardinal ranking of the alternatives, then it can be used for screening and ranking as well as for selecting. In cases where the initial number of alternatives is large, ‘narrowing the field’ through the use of simple screening methods first will reduce the computational and information burdens of the subsequent ranking or selection analysis. An instance where the prior use of simple screening methods is a ‘must’ is when there exist stringent requirements with respect to one or more attributes.

Most of the effective techniques available for dealing with MADM require information about the relative importance of the different values of the same attribute (intra-attribute preference) and about the decision maker’s preference across attributes (inter-attribute preference).

4.3.1 Compensatory and Non–Compensatory Models

There are two major approaches to multiattribute information processing, that is, compensatory models and non–compensatory models. In many cases, the decision maker may deem that high performance relative to one attribute can at least partially compensate for low performance relative to another attribute, particularly if an initial screening analysis has eliminated the alternatives which fail to meet the minimum performance requirements.

Non–compensatory Models

The non–compensatory models do not permit trade–offs between attributes, since compensation is not allowed. A decrease or unfavorable value in one attribute cannot be offset by an advantage or favorable value in some other attribute. Hence, comparisons are made on an attribute-by-attribute basis. The MADM methods which belong to this class are credited for their simplicity, which matches the behavioral process of a decision maker whose knowledge is limited. They include methods such as dominance, maximin, maximax, the conjunctive constraint method, the disjunctive constraint method, the lexicographic method, lexicographic semiordering, and elimination by aspects.

A non–compensatory strategy is appropriate when the overall performance of a design is limitedby its lowest performing attribute.

Compensatory Models

The compensatory models incorporate trade–offs between high and low scores of attributes, assigning a number to each multidimensional representation. In many cases, the decision maker may be of the view that high performance relative to one attribute can at least partially compensate for low performance relative to another attribute. A compensatory strategy is used when higher preference in one variable may compensate for lower preference in another.


With compensatory models a single number is usually assigned to the multidimensional attribute set characterizing an alternative design. Based upon the method of evaluating this number, these models can further be divided into three subgroups (Yoon and Hwang, 1981), that is, scoring, compromising and concordance models.

Scoring Models. These models select the alternative which has the highest score (or the maximum utility), reducing the decision problem to assessing the appropriate multiattribute utility function for the relevant decision situation. Simple additive weighting, hierarchical additive weighting and interactive simple additive weighting belong to this category.

Compromising Models. These models identify the alternative which is closest to the ideal solution. TOPSIS, LINMAP and nonmetric MRS methods belong to this category. Especially when a decision maker uses a square utility function, identification of an ideal solution is assisted by LINMAP procedures.

Concordance Models. These models identify a set of preference rankings which best satisfies a given concordance measure. The permutation method, the linear assignment method, and the ELECTRE method are classifiable in this class.

Compensatory methods, in order to accommodate trade–offs of low versus high performance among attributes, generally either require that the attributes all be measured in commensurate units, or incorporate procedures for normalizing data which are not initially commensurate, in order to facilitate attribute trade–off analysis.

4.3.2 Overview of MADM methods

Methods for MADM are classified based upon the different forms of preference information available to a decision maker; they require different amounts and types of information about the attributes and alternatives, beyond the basic data included in the decision matrix. The degree of evaluation accuracy also varies, since the different preference information on attributes may be listed in ascending order of complexity, i.e. threshold value, ordinal, cardinal, and marginal rate of substitution.

A three-stage taxonomy of MADM methods, which is widely recognized by the scientific community, is shown in Table 4.1 according to:

• the kind of information (on attributes, on alternatives, or neither) required from the decision maker;

• the salient feature of the information needed;

• the major classes of decision methods in any combination from the previous stages.

The list is comprehensive, ranging from very simple screening methods to sophisticated ranking and selection algorithms requiring computer–aided computations. The set of methods includes only those which have been found to be practical for application to real world problems.


MADM methods require different amounts and types of information about the attributes and alternatives, above and beyond the basic data included in the decision matrix. These methods can address three types of problems: screening alternatives, ranking alternatives, or choosing a final ‘best alternative’. Note that if a method generates a cardinal ranking of the alternatives, then it can be used for both screening and choosing as well as for ranking. Methods such as AHP, ELECTRE and TOPSIS are of this ‘multipurpose’ variety.

Information from the          Salient Feature               Major Classes
Decision Maker                of Information                of Methods
--------------------------------------------------------------------------------------------
No Preference                 –                             Dominance
Information                                                 Maximin
                                                            Maximax
--------------------------------------------------------------------------------------------
Information                   Threshold                     Conjunctive Method
about Attributes                                            Disjunctive Method

                              Ordinal                       Lexicographic Method
                                                            Elimination by Aspects
                                                            Permutation Method

                              Cardinal                      Analytical Hierarchy Process (AHP)
                                                            Simple Additive Weighting Method (SAW)
                                                            Hierarchical Additive Weighting Method
                                                            Linear Assignment Method
                                                            ELECTRE Method
                                                            TOPSIS Method

                              Marginal Rate                 Hierarchical Trade–offs
                              of Substitution
--------------------------------------------------------------------------------------------
Information                   Pairwise Preference           LINMAP
about Alternatives                                          Interactive SAW Method

                              Pairwise Proximity            Multidimensional Scaling with Ideal Point
                                                            Marginal Rate of Substitution with Ideal Point
--------------------------------------------------------------------------------------------

Table 4.1. Classes of methods for multiattribute decision making

Some methods (the dominance, maximin, and maximax methods) require no additional information besides the basic decision matrix data. Other methods (additive weighting, TOPSIS, ELECTRE) require cardinal attribute importance ‘weights’ and cardinal performance ratings of the alternatives with respect to the attributes. Methods requiring this additional information place heavier demands on the decision maker (in terms of time and information searching required), but in turn they are able to combine, evaluate, and trade off the decision matrix data in more sophisticated ways than the simpler methods.

Before going into the actual review, some key concepts and notations will be defined, also with the purpose of establishing a unified notation for the most used terms. Some supporting techniques for MADM, such as transformation of attributes and assessment of attribute weights, are also discussed, illustrating the computational procedure of each method.


4.4 Methods with No Preference Information

There are some classical decision rules, such as dominance, maximin and maximax, which are still fit for MADM, since they do not require any preference information from the decision maker.

4.4.1 Dominance Method

The use of the dominance rule is quite common in procedures related to MADM. Assume that a number of feasible designs, satisfying the same set of constraints and for the same selection of attributes, are generated. It is probable that some of them will be superseded by other designs in every respect. It means that if there exists a feasible design x′ that in each relevant attribute is better than or equal to design x′′, that is

x'_i \geq x''_i \; , \qquad \forall\, i

then design x′ dominates x′′.

Therefore, x′′ is not a competitor in further selection, since a better design, x′, has been found. Vice versa, an alternative design x′′ is dominated if another design x′ outperforms it with respect to at least one attribute and performs at least equally with respect to the remaining attributes.

The subset of all designs from the set M of m feasible designs which are nondominated by any other vector of M is the Pareto set. Testing feasible designs for dominance and filtering only the nondominated solutions finally yields a set of nondominated designs. Finally, the number of alternatives can be reduced before the selection process by eliminating the dominated ones. This method does not require any assumption or any transformation of attributes.

Application of the dominance rule follows this procedure. Compare the first two alternatives and, if one is dominated by the other, discard the dominated one. Next compare the undiscarded alternative with the third alternative and discard the dominated alternative. Then introduce the fourth alternative, and so on. After (m − 1) stages the nondominated set is determined. It usually has multiple elements in it; hence the dominance method is mainly used for initial filtering. The concept of dominance exploits only the ordinal character, and not the cardinal character, of the attribute values. Also observe that dominance does not require comparison between different attributes of two competitive designs.

With the dominance method, alternatives are screened so that all dominated alternatives are discarded. The screening power of this method tends to decrease as the number of independent attributes becomes larger.
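A straightforward sketch of dominance screening is shown below; it assumes a decision matrix in which every attribute is a benefit (cost attributes would have to be negated or inverted first), and the data are illustrative.

```python
import numpy as np

def nondominated(X):
    """Return the indices of the Pareto-nondominated rows of the decision matrix X
    (rows = alternatives, columns = benefit attributes, larger is better)."""
    keep = []
    m = X.shape[0]
    for i in range(m):
        dominated = False
        for k in range(m):
            if k == i:
                continue
            # design k dominates design i if it is at least as good in every
            # attribute and strictly better in at least one of them
            if np.all(X[k] >= X[i]) and np.any(X[k] > X[i]):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical decision matrix: 5 alternatives x 3 benefit attributes
X = np.array([
    [0.70, 18.0, 3.2],
    [0.65, 18.0, 3.0],   # dominated by the first alternative
    [0.80, 15.0, 3.5],
    [0.60, 20.0, 2.8],
    [0.80, 15.0, 3.5],   # tie with the third alternative (kept)
])
print("nondominated alternatives:", nondominated(X))
```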

The following important characteristics in using nondominated designs are worth mentioning:

• subjectivity does not influence the selection of nondominated designs if the decision maker defines the direction of design improvement (i.e. the larger the attribute value the better, or vice versa);

• the process of design generation is additive; that is, it is possible to generate more designs at any time and test them against an already existing set of nondominated designs;


• selecting the final ‘preferred design’ is performed by a separate procedure and only after a sufficient number of nondominated designs has been generated.

Calpine and Golding (1976) derived the formula for the expected average number of nondominated solutions when m alternatives are compared with respect to n attributes. Consider first the very special case in which all the elements in the decision matrix are random numbers uniformly distributed over the range 0 to m. Attention is first focused on the final nth column. Arrange the rows so that the elements in the nth column are in decreasing order of magnitude. By the randomness of the elements, the probability of an arbitrarily selected row being the rth in the order is 1/m. Let p(m,n) be the probability that a row, arbitrarily chosen from m rows (alternatives), is nondominated with respect to n attributes. Consider the rth row. The ordering ensures that this row is not dominated by any row below it and also that it exceeds no row above it in the nth attribute. Hence a necessary and sufficient condition for the rth row to be nondominated is that it is nondominated among the first r candidates with respect to the first (n − 1) attributes. Thus the probability of a row being the rth and nondominated is p(r, n − 1)/m.

Figure 4.2. Expected number of nondominated alternatives

The probability of an arbitrarily selected row being nondominated is

p(m,n) = \sum_{r=1}^{m} \frac{p(r, n-1)}{m} = \frac{p(m, n-1) + (m-1)\, p(m-1, n)}{m} \qquad (4.15)

Then the expected average number of nondominated alternatives, a(m,n), is

a(m,n) = m \cdot p(m,n) = \frac{a(m, n-1)}{m} + a(m-1, n)


As a(m,1) = a(1,n) = 1, the number a(m,n) can be calculated recursively. A good approximation of a(m,n) is given by

a(m,n) \approx 1 + \ln m + \frac{(\ln m)^2}{2!} + \ldots + \frac{(\ln m)^{n-3}}{(n-3)!} + \gamma\,\frac{(\ln m)^{n-2}}{(n-2)!} + \frac{(\ln m)^{n-1}}{(n-1)!} \qquad (4.16)

where γ is Euler’s constant, equal to 0.5772.

Some typical results are shown in Figure 4.2, where the expected number of nondominated alternatives is given in terms of the number of attributes for different sets of feasible alternatives. It indicates that the number of nondominated alternatives, for a few attributes, i.e. n = 4, will be reduced to 8, 20, and 80 for m = 10, 100, and 1000, respectively. However, the number of nondominated alternatives for a large number of attributes, i.e. n = 8, will still be very large, namely 10, 90, and 900, respectively, for m = 10, 100, and 1000.
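The recursion for a(m,n), with boundary conditions a(m,1) = a(1,n) = 1, can be evaluated directly; the short sketch below tabulates it and can be used to reproduce curves of the kind plotted in Figure 4.2.

```python
import numpy as np

def expected_nondominated(m_max, n_max):
    """Table a[m, n] of the expected number of nondominated alternatives among m random
    alternatives compared on n independent attributes, from the recursion
    a(m, n) = a(m, n-1)/m + a(m-1, n) with a(m, 1) = a(1, n) = 1 (Calpine and Golding, 1976)."""
    a = np.ones((m_max + 1, n_max + 1))
    for m in range(2, m_max + 1):
        for n in range(2, n_max + 1):
            a[m, n] = a[m, n - 1] / m + a[m - 1, n]
    return a

a = expected_nondominated(1000, 8)
for n in (4, 8):
    print("n =", n, "->", [round(a[m, n], 1) for m in (10, 100, 1000)])
```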

4.4.2 Maximin Method

The principle underlying the maximin method is that ‘a chain is only as strong as its weakest link’: the method gives each alternative a score equal to the strength of its weakest attribute. Thus, it requires that performance with respect to all attributes be measured in commensurate units, or else be normalized prior to applying the method. Moreover, the maximin method can be used only when all attributes are comparable, so that they can be measured on a common scale. The alternative for which the score of its weakest attribute is the highest is preferred.

In situations where the overall performance of an alternative is determined by its lowest attribute score, the decision maker should examine the attribute values for each alternative, identify the lowest value for each one, and then select the alternative with the most acceptable value in its lowest attribute. This method belongs to the class of non–compensatory techniques. Alternatively, it can be thought of as minimizing the maximum loss among attributes (minimax method).

This method utilizes only a small part of the available information in making a final choice, i.e. only one attribute per alternative. Under this procedure only the single weakest attribute represents the related alternative design; all other (n − 1) attributes for a particular alternative are ignored. Thus, even if an alternative is clearly superior in all but one attribute which is below average, another alternative with only average scores on all attributes would be chosen over it. If these lowest attribute values come from different attributes, as they often do, the decision maker may be basing his/her final choice on single values of attributes that differ from alternative to alternative. Therefore, the maximin method can be used only when all attributes are measured on a common scale; this characteristic can be a limitation (Linkov et al., 2004).

The alternative, A+, is selected such that

A^{+} = \left\{ A_i \;\middle|\; \max_i \min_j x_{ij} \right\} \qquad i = 1, 2, \ldots, m \; ; \;\; j = 1, 2, \ldots, n \qquad (4.17)

where all the xij’s are on a common scale.


One way of making a common scale is using the degree of closeness to the ideal solution (Zeleny, 1974), defined as the ratio of an attribute value to the most preferable attribute value, x_j^{\max} = \max\{x_{1j}, x_{2j}, \ldots, x_{mj}\}; that is

r_{ij} = \frac{x_{ij}}{x_j^{\max}} \qquad (4.18)

provided that attribute j is a benefit criterion (i.e., the larger xj, the greater the preference).

A more complicated form of rij is

r_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}} \qquad (4.19)

where x_j^{\min} = \min_i x_{ij}, \; i = 1, 2, \ldots, m.

Then the maximin procedure selects the alternative design as

A^{+} = \max_i \min_j r_{ij} \qquad i = 1, 2, \ldots, m \; ; \;\; j = 1, 2, \ldots, n \qquad (4.20)

Note that in the case of a cost criterion, rij has to be computed as

r_{ij} = \frac{1/x_{ij}}{\max_i (1/x_{ij})} = \frac{\min_i x_{ij}}{x_{ij}} = \frac{x_j^{\min}}{x_{ij}} \qquad (4.21)

or

r_{ij} = \frac{x_j^{\max} - x_{ij}}{x_j^{\max} - x_j^{\min}} \qquad (4.22)

The maximin method has some shortcomings, so that its applicability is relatively limited. In general, the method can be applied whenever the decision maker has a pessimistic outlook on the decision making situation and the attributes are truly of equal importance. The maximin procedure and its reverse, the minimax procedure, are used in game theory (von Neumann and Morgenstern, 1944).
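A minimal sketch of the maximin rule, using the normalization of equation (4.18) on a hypothetical benefit-attribute decision matrix, reads as follows.

```python
import numpy as np

# Hypothetical decision matrix: 4 alternatives x 3 benefit attributes
X = np.array([
    [0.70, 18.0, 3.2],
    [0.65, 21.0, 3.0],
    [0.80, 15.0, 3.5],
    [0.60, 20.0, 2.8],
])

R = X / X.max(axis=0)           # eq. (4.18): closeness to the ideal value of each attribute
scores = R.min(axis=1)          # each alternative is represented by its weakest attribute
best = int(np.argmax(scores))   # eq. (4.20): maximin selection

print("maximin scores:", np.round(scores, 3), "-> selected alternative:", best)
```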

4.4.3 Maximax Method

In contrast to the maximin method, the maximax method selects the alternative that maximizes the maximum outcome among attributes for every alternative. Extending the ‘chain’ analogy used in describing the maximin method, maximax performs as if one were comparing alternative chains in search of the best single link. The score of each alternative is equal to the performance of its strongest attribute. Like the maximin method, maximax requires that all attributes be commensurate or pre–normalized.

In this case the highest attribute value for each alternative is identified; then these maximum values are compared in order to select the alternative with the largest such value. This method is also called an ‘optimistic decision criterion’.

Note that in this procedure only the single strongest attribute represents the whole alternative design; all other (n − 1) attributes for the particular alternative are ignored. Therefore, as with the maximin method, the maximax method can be used only when all attributes are measured on a common scale - see equations (4.18) through (4.22).

The alternative A+ is selected such that

A^{+} = \left\{ A_i \;\middle|\; \max_i \max_j r_{ij} \right\} \qquad j = 1, 2, \ldots, n \; ; \;\; i = 1, 2, \ldots, m \qquad (4.23)

The comparability assumptions and incompleteness properties of the maximax method do not make it a very useful technique for general decision making. However, just like the maximin method, the maximax method may be suitable in some specific decision–making situations.

Both the maximin procedure and the maximax procedure use what could be called a specialized degenerate weighting, which may be different for each alternative (Moskowitz and Wright, 1979): the maximin method assigns a weight of 1 to the worst attribute value and a weight of 0 to all others; the maximax method assigns a weight of 1 to the best attribute value and a weight of 0 to all others.

The Hurwicz procedure (Hey, 1979) is an amalgamation of the above two, in that it takes into account both the worst and the best attribute values, thus selecting A+ such that

A^{+} = \left\{ A_i \;\middle|\; \max_i \left[ \alpha \min_j r_{ij} + (1 - \alpha) \max_j r_{ij} \right] \right\} \qquad (4.24)

The weight α is referred to as the pessimism-optimism index; it is supposed to vary (over 0 ≤ α ≤ 1) among individual decision makers; the higher α, the more pessimistic the individual decision maker. As is apparent, the extreme case α = 1 gives the maximin, while α = 0 gives the maximax.
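The sketch below applies the Hurwicz score of equation (4.24) to an already normalized decision matrix (the numbers are illustrative); α = 1 reproduces the maximin choice and α = 0 the maximax choice.

```python
import numpy as np

# Normalized decision matrix r_ij (benefit direction, common scale); values are illustrative
R = np.array([
    [0.875, 0.857, 0.914],
    [0.812, 1.000, 0.857],
    [1.000, 0.714, 1.000],
    [0.750, 0.952, 0.800],
])

def hurwicz(R, alpha):
    """Hurwicz score per alternative: alpha * worst + (1 - alpha) * best attribute value."""
    return alpha * R.min(axis=1) + (1.0 - alpha) * R.max(axis=1)

for alpha in (1.0, 0.5, 0.0):
    s = hurwicz(R, alpha)
    print("alpha =", alpha, " scores:", np.round(s, 3), "-> choice:", int(np.argmax(s)))
```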

Although this procedure might seem useful in a single instance, it is clearly inadequate when considering the whole multiple attribute problem, since it would commit the whole design decision on the basis of a single attribute.

4.5 Selection Methods with Information on Attributes

There is a large variety of multicriterial techniques to drive selection in conditions of multiple attributes and alternatives. Usually the information on attributes is less demanding to assess than that on alternatives. The majority of MADM methods require preference information to process inter–attribute and intra–attribute comparisons.

The information can be expressed in various ways:

• threshold value of each attribute (conjunctive method; disjunctive method);

• relative importance of each attribute by ordinal preference (lexicographic ordering; elimination by aspects; permutation method);

• relative importance of each attribute by cardinal preference (linear assignment method; simple additive weighting method; hierarchical additive weighting method; ELECTRE method; TOPSIS method);


• marginal rate of substitution (MRS) between attributes (marginal rate of substitution; indifference curves; hierarchical trade–offs).

Threshold values or ordinal preference information is utilized in non–compensatory models, whereas cardinal preference or the marginal rate of substitution is used in compensatory models.

4.6 Methods with Threshold on Attributes

These methods require satisfactory rather than best performance in each attribute. To obtain a feasible solution, the decision maker sets up the minimal threshold values he/she will accept for each attribute. Any candidate design which has an attribute value less than the threshold value will be rejected. This procedure is called the conjunctive method (Dawes, 1964) or the satisficing method (Simon, 1955). On the other hand, if the evaluation of an alternative solution is based upon the greatest value of only one attribute, the procedure is called the disjunctive method.

Any alternative that does not meet the conjunctive or disjunctive rules is deleted from further consideration. These screening rules can be used to select a subset of alternatives for analysis by other, more complex decision making tools. Screening by conjunctive and disjunctive rules can also be applied in determining the requirements for the decision–making process.

4.6.1 Conjunctive Method

Consider, for example, the position of a CFD consultant in a ship design company. His/her effectiveness as an expert will be limited by the lesser of his/her abilities in hydrodynamics and numerical computation; he/she cannot compensate for an insufficient knowledge of hydrodynamics by an excellent knowledge of numerical methods, or vice versa. The company will reject the candidates who do not possess the required standard knowledge level in both fields.

The conjunctive method is purely a screening method. The requirement embodied by the conjunctive screening approach is that, in order to be acceptable, an alternative must exceed given performance thresholds (cut–off values) for all attributes. The cut–off values given by the decision maker play the key role in eliminating the unfeasible alternatives. Hence, by increasing the minimal threshold levels in an iterative way, the decision maker can sometimes narrow down the alternatives to a single choice. The attributes, and thus the thresholds, need not be measured in commensurable units.

An alternative Ai is classified as feasible only if

x_{ij} \geq x_j^{o} \; , \qquad \forall\, j = 1, 2, \ldots, n \qquad (4.25)

where x_j^{o} is the cut–off value of attribute xj.

The conjunctive method is not usually used for selecting among alternatives, but rather for dichotomizing them into feasible/unfeasible designs. Dawes (1964) developed a way to set up the standards if the decision maker wants to dichotomize the alternatives.


Consider a set of n equally weighted independent attributes. Let

r    be the proportion of alternatives which are rejected;
p_c  be the probability that a randomly chosen alternative yields outcomes above the conjunctive cut–off level.

Then

r = 1 - p_c^{\,n} \qquad (4.26)

since the probability of being rejected is equal to one minus the probability of passing on all attributes.

From equation (4.26) it can be derived that

p_c = (1 - r)^{1/n} \qquad (4.27)

which indicates that the decision maker must choose a cut–off level for each attribute such that a proportion p_c of the candidate designs places above this score.

4.6.2 Disjunctive Method

The disjunctive method is also a pure screening method. It is the complement of the conjunctive method, substituting ‘or’ in place of ‘and’. That is, to pass the disjunctive screening test, an alternative design must exceed the given performance threshold for at least one attribute. Like the conjunctive method, the disjunctive method does not require attributes to be measured on a common scale.

An alternative Ai is classified as feasible only if

x_{ij} \geq x_j^{o} \; , \qquad \text{for } j = 1 \text{ or } 2 \text{ or } \ldots \text{ or } n \qquad (4.28)

where x_j^{o} is a desirable level of xj.

For the disjunctive method, the probability of being rejected is equal to the probability of failing on all attributes:

r = (1 - p_d)^{n} \qquad (4.29)

where r is the proportion of alternatives which are rejected, and p_d is the probability that a randomly chosen alternative scores above the disjunctive cut–off level. From equation (4.29), one obtains

p_d = 1 - r^{1/n} \qquad (4.30)

Like the conjunctive method, the disjunctive method does not require that the attribute information be in numerical form, nor does it require information on the relative importance of the attributes.
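Both screening rules reduce to element-wise comparisons against a cut-off vector; the sketch below, on a hypothetical decision matrix with assumed cut-off values, also evaluates the pass probabilities pc and pd of equations (4.27) and (4.30) for a target rejection rate r.

```python
import numpy as np

# Hypothetical decision matrix: 5 alternatives x 3 attributes (benefit direction)
X = np.array([
    [0.70, 18.0, 3.2],
    [0.65, 21.0, 3.0],
    [0.80, 15.0, 3.5],
    [0.60, 20.0, 2.8],
    [0.75, 19.0, 3.1],
])
cutoff = np.array([0.65, 17.0, 3.0])        # assumed cut-off value for each attribute

conjunctive = np.all(X >= cutoff, axis=1)   # eq. (4.25): pass on every attribute
disjunctive = np.any(X >= cutoff, axis=1)   # eq. (4.28): pass on at least one attribute
print("conjunctive survivors:", np.where(conjunctive)[0])
print("disjunctive survivors:", np.where(disjunctive)[0])

# Setting the cut-off levels from a target rejection rate r (equal, independent attributes)
r, n = 0.8, X.shape[1]
p_c = (1 - r) ** (1 / n)    # eq. (4.27): required pass probability per attribute (conjunctive)
p_d = 1 - r ** (1 / n)      # eq. (4.30): required pass probability per attribute (disjunctive)
print("p_c =", round(p_c, 3), " p_d =", round(p_d, 3))
```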


4.7 Methods with Ordinal Information

The most important information needed for the lexicographic method, elimination by aspects, and the permutation method is ordinal inter–attribute preference information. The relative importance among attributes determined by ordinal preference is less demanding for the decision maker to assess than that by cardinal preference.

The permutation method was originally developed for given cardinal preferences of attributes, but it is better used when only ordinal preferences are given. The method identifies the best ordering of the alternative rankings.

4.7.1 Lexicographic Method

The lexicographic ordering is more widely adopted in practice than it deserves to be, because of its limited information requirements. This method is simple and easy to handle. The term ‘lexicography’ reflects the similarity between this method and the way words are ordered in a lexicon, since it ranks attributes according to importance. The values of each successive attribute are compared across alternatives.

In some decision problems a single attribute may be predominant. One way of treating these situations is to compare the alternatives on the most important attributes in the order of their importance. The alternative with the best performance score on the most important attribute is preferred and the decision process ends. However, if multiple alternatives have the highest value on the specified attribute, then the attribute ranked second in importance is compared across all alternatives. If alternatives are tied again, the performance on the next most important attribute is compared, and so on. The process continues sequentially until a single alternative is chosen or until all attributes have been considered.

The method requires the decision maker to rank the attributes in the order of their importance. Let the subscripts of the attributes indicate not only the components of the attribute vector, but also the priorities of the attributes, i.e. let x1 be the most important attribute to the decision maker, x2 the second most important one, and so on. Then the alternative(s) A1 are selected such that

A_1 = \{ A_i \;|\; \max_i x_{i1} \} \; , \qquad i = 1, 2, \ldots, m \qquad (4.31)

If this set {A1} has a single element, then this element is the most preferred alternative. If there are multiple alternatives with maximal scores, consider

A_2 = \{ A_i \;|\; \max_i x_{i2} \} \; , \qquad i \in \{A_1\} \qquad (4.32)

If this set {A2} has a single element, then stop and select this alternative. If not, consider

A_3 = \{ A_i \;|\; \max_i x_{i3} \} \; , \qquad i \in \{A_2\} \qquad (4.33)


Continue this process until either some set {Ak} with a single element is found, whose element is then the most preferred alternative, or all attributes have been considered.
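A short sketch of the lexicographic selection loop is given below; the attribute priority order and the data are assumptions made only for illustration.

```python
import numpy as np

def lexicographic(X, priority):
    """Lexicographic selection on decision matrix X (rows = alternatives, larger is better);
    `priority` lists the attribute indices from most to least important.
    Returns the indices of the surviving alternative(s)."""
    candidates = np.arange(X.shape[0])
    for j in priority:
        col = X[candidates, j]
        candidates = candidates[col == col.max()]   # keep only the best on attribute j
        if len(candidates) == 1:
            break
    return candidates

# Hypothetical example: attribute 1 is most important, then attribute 0, then attribute 2
X = np.array([
    [0.70, 21.0, 3.2],
    [0.65, 21.0, 3.5],
    [0.80, 15.0, 3.5],
])
print(lexicographic(X, priority=[1, 0, 2]))   # the tie on attribute 1 is broken by attribute 0
```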

When applied to general decision making, the lexicographic method requires information on the preference among attribute values and on the order in which attributes should be considered. In both cases, it needs only ordering or ranking information and not (necessarily) numerical values.

The lexicographic method, as is the case for the ‘maximin’ and ‘maximax’ methods, utilizes only a small part of the available information in making a final choice. But lexicography is somewhat more demanding of information than ‘maximin’ and ‘maximax’, because it requires a ranking of the importance of the attributes, whereas ‘maximin’ and ‘maximax’ do not. However, lexicography does not require comparability across attributes as the ‘maximin’ and ‘maximax’ methods did.

The lexicographic semiordering, described by Tversky (1969), is closely related to the lexicographic ordering. In most cases it makes sense to allow ranges of imperfect discrimination, so that one alternative is not judged better just because it has a slightly higher value on one attribute. In a lexicographic semiordering, a second attribute is considered not only in cases where the values of several alternatives on the most important attribute are equal, but also in cases where the differences between the values on the most important attribute are negligible, keeping more alternatives in the decision–making process. This same process may then be used for further attributes if more than one alternative still remains. Thus a consideration of whether differences are significant is imposed upon the lexicographic ordering.

4.7.2 Elimination by Aspects

Elimination by aspects (EBA) is a formalization of the well–known heuristic ‘process of elimina-tion’ followed by the decision makers during a process of sequential choice and which constitutesa good balance between the cost of a decision and its quality. At each stage of decision, thedecision makers eliminate all the alternatives not having an expected given attribute, until onlyone alternative remains.

It is a discrete model of probabilistic choice worked out by Tversky (1971), which supposes thatdecision makers follow a particular heuristic during a process of sequential selection. Like thelexicographic method, EBA examines one attribute at a time, starting with attributes deemedto be most important to make comparison among alternatives. However, it does differ slightlysince it eliminates alternatives which do not satisfy some minimum performance, and it proceedsuntil all alternatives except one have been eliminated, although adjustment of the performancethreshold may be required in some cases in order to achieve a unique solution. Another differenceis that the attributes are not ordered in terms of importance, but in terms of their discriminationpower in a probabilistic mode.

Tversky has formalized the decision process mathematically with the introduction of selection probability as a theoretical concept in the analysis of choice. The model proposed by Tversky postulates that the utility assigned to the candidate designs is deterministic, but that the decision rules applied by the decision makers are intrinsically probabilistic. Each alternative is viewed as a set of attributes which could represent values along some fixed quantitative or qualitative scales. Since the model describes selection as an elimination process governed by successive choices of attributes instead of cut-offs, it is called elimination by aspects.

After a set of attributes is selected, the EBA heuristic rule focuses first on the most important attribute and searches for a clear winner. If a winner emerges, the process stops. If a winner does not emerge, attention focuses on the second attribute, and so forth. The decision maker, as in the conjunctive method, is assumed to have minimum cut-offs for each attribute. When an attribute is selected, all design alternatives not passing the cut-off on that attribute are eliminated. The process stops when all but one alternative are eliminated.

In the EBA model, individual choices are described as the result of a stochastic process involving a successive elimination of the design alternatives:

• the attributes common to all alternatives are eliminated, because they do not permit discrimination between solutions during the selection process;

• all the alternatives that do not possess a randomly selected attribute are eliminated; the higher the utility of an attribute, the larger the probability that this attribute is selected;

• if the remaining solutions still have specific attributes, the decision maker goes back to the first step; on the contrary, if all solutions have the same attributes, the procedure ends; if only one alternative remains, it is selected by the decision maker; otherwise, all the remaining alternatives have the same probability of being selected.

The EBA is similar in some respects to the lexicographic method and the conjunctive method. It differs from them in that, due to its probabilistic nature, the criteria for elimination (i.e. the selected attributes) and the order in which they are applied vary from one design situation to another and are not determined in advance. In particular, it differs from the conjunctive method in that the number of criteria for elimination varies. If an attribute which belongs only to a single alternative is chosen at the first stage, then the EBA needs only one attribute. The elimination by aspects has some advantages: it is relatively easy to apply; it involves no numerical computations; and it is easy to explain and justify in terms of a priority ordering defined on the attributes. The major flaw in the logic of elimination by aspects lies in the non-compensatory nature of the selection process. Although each selected attribute is desirable, it might lead to the elimination of alternatives that are better than those which are retained.

In general, the strategy of EBA cannot be defended as a rational procedure of choice. On the other hand, there may be many decision situations in which it provides a good approximation to much more complicated compensatory models and could thus serve as a useful simplification procedure.
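A rough computational sketch of the elimination idea is given below. It is a deliberate simplification: the attribute examination order and the cut-offs are fixed in advance, whereas in Tversky's model the attribute to screen on is selected probabilistically; the data and names are hypothetical.

import numpy as np

def eliminate_by_aspects(scores, cutoffs, attribute_order):
    """Screening sketch in the spirit of EBA (deterministic variant).

    scores          : (m, n) array of attribute values (larger is better).
    cutoffs         : length-n array of minimum acceptable values per attribute.
    attribute_order : attribute indices in the order they are examined
                      (fixed here, probabilistic in Tversky's model).
    Returns the indices of the alternatives still in play.
    """
    remaining = np.arange(scores.shape[0])
    for j in attribute_order:
        passed = remaining[scores[remaining, j] >= cutoffs[j]]
        if passed.size == 0:       # cut-off too strict: keep previous set (relax threshold)
            break
        remaining = passed
        if remaining.size == 1:    # a single design survives
            break
    return remaining

scores  = np.array([[21.0, 4500, 6000],
                    [19.0, 5200, 5500],
                    [20.5, 6000, 7000],
                    [21.5, 5200, 5200]])
cutoffs = np.array([20.0, 5000, 5400])
print(eliminate_by_aspects(scores, cutoffs, attribute_order=[1, 0, 2]))   # -> [2]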

4.7.3 Permutation Method

The permutation method (Paelinck, 1976) aims to identify the dominating design, that is, the best ordering of the alternative rankings among candidate designs, by measuring the level of concordance and discordance of the complete preference order. It uses Jaquet-Lagreze's successive permutations of all possible rankings of the alternative designs against all others. The method was originally developed to treat given cardinal preferences of attributes (i.e., a set of weights), but it is also well suited to given ordinal preferences of attributes. With m alternatives, m! permutation rankings are available.

Suppose a number of alternatives (Ai, i = 1, 2, . . . , m) have to be evaluated according to n attributes (xj, j = 1, 2, . . . , n). The problem can be stated in a decision matrix D as

\[
D = \;
\begin{array}{c|cccc}
       & x_1    & x_2    & \ldots & x_n    \\ \hline
A_1    & x_{11} & x_{12} & \ldots & x_{1n} \\
A_2    & x_{21} & x_{22} & \ldots & x_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_m    & x_{m1} & x_{m2} & \ldots & x_{mn}
\end{array}
\]

Assume that a set of cardinal weights wj, j = 1, 2, . . . , n, with Σ wj = 1, is given for the corresponding attributes.

Suppose that the problem is to rank three alternatives A1, A2, and A3. Then six permutations of the ranking of the alternatives exist (m! = 3! = 6). They are:

P1 = (A1, A2, A3)    P4 = (A2, A3, A1)
P2 = (A1, A3, A2)    P5 = (A3, A1, A2)
P3 = (A2, A1, A3)    P6 = (A3, A2, A1)

Assume a testing order of the alternatives P5 = (A3, A1, A2). Then the set of concordant partial orders is {A3 ≥ A1, A3 ≥ A2, A1 ≥ A2} and the set of discordant ones is {A3 ≤ A1, A3 ≤ A2, A1 ≤ A2}. If the partial ranking Ak ≥ Al appears in the hypothesized ranking, the fact that xkj ≥ xlj will be rated wj, while xkh ≤ xlh will be rated −wh. The evaluation criterion of the chosen hypothesis (ranking of the alternatives) is the algebraic sum of the wj's corresponding to the element-by-element consistency. Consider the ith permutation

Pi = (. . . , Ak, . . . , Al, . . .) ,    i = 1, 2, . . . , m!

where Ak is ranked higher than Al.

Then the evaluation criterion of Pi, i.e. Ri, is given by

\[
R_i = \sum_{j \in C_{kl}} w_j \;-\; \sum_{j \in D_{kl}} w_j \tag{4.34}
\]

summed over all ordered pairs (Ak, Al) in which Ak is ranked higher than Al in Pi, where

\[
C_{kl} = \{\, j \mid x_{kj} \ge x_{lj} \,\}\,, \qquad k, l = 1, 2, \ldots, m\,, \; k \ne l
\]
\[
D_{kl} = \{\, j \mid x_{kj} \le x_{lj} \,\}\,, \qquad k, l = 1, 2, \ldots, m\,, \; k \ne l
\]


The concordance set Ckl is the subset of all criteria for which xkj ≥ xlj, and the discordance set Dkl is the subset of all criteria for which xkj ≤ xlj.

The permutation method is a useful method owing to its flexibility with regard to ordinal and cardinal rankings. A possible drawback of this method is the fact that, in the absence of a clearly dominant alternative, rather complicated conditions for the values of the weights may arise, particularly because numerical statements about ordinal weights are not easy to interpret. Also, as the number of alternatives increases, the number of permutations grows drastically.
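For small problems the method can be sketched by brute-force enumeration of the m! rankings, as below; the data and function names are hypothetical, and the factorial growth mentioned above makes this approach impractical for more than a handful of alternatives.

import numpy as np
from itertools import permutations

def permutation_method(scores, weights):
    """Sketch of the permutation method.

    scores  : (m, n) array, scores[i, j] = value of alternative i on attribute j.
    weights : length-n attribute weights summing to 1.
    Returns the ranking (tuple of alternative indices, best first) with the
    highest evaluation criterion R, together with that R value.
    """
    m = scores.shape[0]
    best_rank, best_R = None, -np.inf
    for perm in permutations(range(m)):            # m! candidate rankings
        R = 0.0
        for pos_k, k in enumerate(perm):
            for l in perm[pos_k + 1:]:             # Ak ranked above Al in this permutation
                R += weights[scores[k] >= scores[l]].sum()   # concordant criteria
                R -= weights[scores[k] <= scores[l]].sum()   # discordant criteria
        if R > best_R:
            best_rank, best_R = perm, R
    return best_rank, best_R

scores  = np.array([[7.0, 3.0, 5.0],
                    [6.0, 6.0, 4.0],
                    [5.0, 5.0, 6.0]])
weights = np.array([0.5, 0.3, 0.2])
print(permutation_method(scores, weights))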

4.8 Methods with Cardinal Information

Each multiattribute decision-making process with cardinal information is based on four steps:

• determine the target, the attributes and the alternatives;

• allocate weights to attributes representing their relative importance and deliver weights to alternatives representing their effects on these attributes;

• process the scores to determine a ranking of the solutions;

• analyze the sensitivity of the results by systematically varying the weights allocated to the different attributes, to assess the uncertainty of the derivation.

The methods in this class require the decision maker's cardinal preferences of attributes. This is the most common way of expressing inter-attribute preference information. All of them involve implicit trade-offs, but their evaluation principles are quite different:

• select an alternative in problems that have a hierarchical structure of attributes (AHP);

• select an alternative which has the largest utility (simple additive weighting, hierarchical additive weighting);

• arrange a set of overall preference rankings which best satisfies a given concordance measure (linear assignment method, ELECTRE);

• select an alternative which has the largest relative closeness to the ideal solution and the largest relative distance from the anti-ideal solution (TOPSIS).

A supplementary ramification can be introduced for the methods needing cardinal information: the methods with full aggregation and those with semi-aggregation. The methods with full aggregation consider an optimum, and the classification of alternatives is always possible. Methods with semi-aggregation accept the possibility of incomparability and thus of intransitivity. Only certain elements are brought out. If an optimum does not exist, one prefers to present a subset of alternatives instead of imposing one.


4.8.1 Analytical Hierarchy Process

The Analytical Hierarchy Process (AHP) method, originally developed by Saaty (1980), deals with the study of how to derive ratio-scale priorities or weights through pairwise relative comparisons. The basic idea of AHP is to convert subjective assessments of relative importance into a set of overall scores or weights. AHP not only supports the decision makers by enabling them to structure complexity and exercise judgement, but allows them to bring both subjective preferences and objective evaluation measures into the decision process. It provides a useful mechanism for checking the consistency of the evaluation measures and alternatives generated by the design team, thus reducing bias in decision making.

AHP is a compensatory decision methodology, because alternatives that are deficient with respect to one or more attributes can compensate by their performance with respect to other attributes. AHP is composed of several previously existing but unassociated concepts and techniques, such as hierarchical structuring of complexity, pairwise comparisons, redundant judgements, an eigenvector method for deriving weights, and consistency considerations. Although each of these concepts and techniques is useful in itself, Saaty's synergistic combination of the concepts and techniques (along with some new developments) produced a process whose power is indeed far more than the sum of its parts. This mathematically based method is intended to solve selection problems that have a hierarchical structure of attributes. Uncertainties and other influencing factors can also be included. Hence decision makers can make use of their level of expertise and apply judgement to the attributes deemed important to achieve goals.

By reducing complex decisions to a series of one-on-one (pairwise) comparisons, and then synthesizing the results, AHP helps decision makers arrive at the best possible decisions. Attributes in one level are compared in terms of relative importance with respect to an element in the immediately higher level, treating the pairwise comparison with the eigenvector method as outlined in Sen and Yang (1998).

The method is developed through two steps. The first step is for the decision maker to decompose the decision problem into its constituent parts, progressing from the general to the specific. Since the decision maker is assumed to be consistent in making evaluations about any one pair of attributes, and since all attributes will always rank equally when compared to themselves, one has aij = 1/aji and aii = 1. This means that it is necessary to make only m(m − 1)/2 comparisons to establish the full set of pairwise judgements for m attributes. The entries aij, (i, j = 1, . . . , m), can be arranged in a pairwise comparison matrix A of size m × m.

The second step is to estimate the set of weights that are most consistent with the judgements expressed in the comparison matrix. Note that while there is complete consistency in the (reciprocal) evaluations made about any one pair, consistency of evaluations between pairs, i.e. aij · ajk = aik for all i, j, k (the transitivity rule), is not guaranteed. Thus the task is to search for an m-vector of weights such that the m × m matrix W of entries wi/wj provides the best fit to the evaluations recorded in the pairwise comparison matrix.


To assess the scale ratio wi/wj, Saaty (1977) gives a nine-point verbal scale expressing the intensity of the preference for one attribute over another, as shown in Table 4.2. The verbal scale is essentially an ordinal scale. If attribute Ai is judged more important than attribute Aj, then the reciprocal of the relevant index value is assigned. Saaty's original method to compute the weights is based on matrix algebra and determines them as the elements in the eigenvector associated with the maximum eigenvalue of the matrix.

Intensity of Importance | Verbal Judgement of Preference                        | Explanation
1                       | Equal importance of both attributes                   | Two attributes contribute equally to the goal
3                       | Moderate preference of one attribute over another     | Experience and evaluation slightly favor one attribute over another
5                       | Strong preference of one attribute over another       | Experience and evaluation strongly favor one attribute over another
7                       | Very strong preference of one attribute over another  | An attribute is strongly favored and dominant
9                       | Extreme preference of one attribute over another      | An attribute is favored by at least an order of magnitude
2, 4, 6, 8              | Intermediate values between two adjacent judgements   | Used to compromise between two judgements

Table 4.2. Pairwise comparison scale of attributes in AHP

Similarly to the calculation of the weights for the attributes, AHP also uses the technique based on pairwise comparisons to determine the relative performance scores of the decision matrix for each of the alternatives on each subjective evaluation. The searched scores use the same set of nine index assessments as before, and the same techniques can be used as in computing the weights of the attributes.

In spite of its qualities, the AHP method has several disadvantages. First, it requires attributes to be independent with respect to their preferences, which is rarely the case in design selection processes. Second, all attributes and alternatives are compared with each other (at a given level), which may cause a logical conflict of the kind: A > B and B > C, but C > A. The likelihood of such conflicts occurring in the hierarchy trees increases dramatically with the number of alternatives and attributes. Moreover, a number of specialists have voiced concerns about the AHP, including the potential internal inconsistency and the questionable theoretical foundation of the rigid 1-9 scale, as well as the phenomenon of rank reversal possibly arising when a new alternative is introduced. At the same time, there have also been attempts to derive similar methods that retain the strength of AHP while avoiding some of the criticisms.
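A minimal sketch of the eigenvector weight derivation is shown below, assuming numpy is available; the comparison matrix is hypothetical, and the consistency check follows Saaty's usual consistency-ratio test with his published random indices.

import numpy as np

def ahp_weights(A):
    """Derive AHP priority weights from a reciprocal pairwise comparison matrix.

    A : (m, m) positive reciprocal matrix with A[i, j] = 1 / A[j, i] and A[i, i] = 1,
        built from Saaty's 1-9 scale (up to 6 attributes in this sketch).
    Returns the normalized principal eigenvector (the weights) and the consistency ratio.
    """
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                     # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                                 # normalize so the weights sum to 1
    m = A.shape[0]
    lam_max = eigvals[k].real
    ci = (lam_max - m) / (m - 1)                    # consistency index
    ri = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}[m]  # Saaty's random index
    cr = ci / ri if ri > 0 else 0.0                 # consistency ratio (< 0.1 acceptable)
    return w, cr

# Hypothetical comparison of three attributes (speed, payload, cost)
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])
weights, cr = ahp_weights(A)
print(weights.round(3), round(cr, 3))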


4.8.2 Simple Additive Weighting Method

The simple additive weighting method (Klee, 1971) is one of the best known and most widely used MADM methods.

To reflect their marginal worth assessments within attributes, the decision makers assign to each attribute the importance weights which become the coefficients of the elements in the decision matrix, thus making a numerical scaling of intra-attribute values. They can then obtain a total score for each alternative simply by multiplying the scale rating for each attribute value by the importance weight assigned to the attribute and then summing these products over all attributes. After the total scores are computed for each alternative, the solution with the highest score (the highest weighted average) is the one suggested to the decision makers. Although this technique is easy to apply, it runs the risk of ignoring interactions among the attributes.

Mathematically, the simple additive weighting method can be stated as follows. Suppose the decision maker assigns a set of importance weights w = (w1, w2, . . . , wn) to the attributes. Then the most preferred alternative, A∗, is selected such that

\[
A^{*} = \left\{ A_i \;\Big|\; \max_i \; \sum_{j=1}^{n} w_j\, x_{ij} \Big/ \sum_{j=1}^{n} w_j \right\} \tag{4.35}
\]

where xij is the outcome of the ith alternative for the jth attribute on a numerically comparable scale. Usually the weights are normalized so that Σ wj = 1.

The simple additive weighting method uses all n attribute values of an alternative and uses the regular arithmetical operations of multiplication and addition. Therefore, the attribute values must be both numerical and comparable. Moreover, it is also necessary to find a reasonable basis on which to form the weights reflecting the importance of each of the attributes.

When weights are assigned and attribute values are numerical and comparable, some arbitrary assumptions still remain. It can happen that a low outcome multiplied by a high weight yields about the same product as a high attribute value multiplied by a low weight. This identity then implies that two attributes just 'offset each other', that is, both make the same contribution to the weighted average. Thus there exist some difficulties in interpreting the output of the multiplication of attribute values by weights.

Attributes cannot often be considered separately and then added together; because of the complementarities between the various attributes, the approach of weighted averages may give misleading results. But when the attributes can in fact be considered separately (i.e., when there are essentially no important complementarities), the simple additive weighting method can be a very powerful tool in MADM. This method will lead to a unique choice, since a single number is arrived at for each alternative, and these numbers will usually be different.
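As a small numerical illustration of Eq. (4.35), a possible sketch is the following; the normalized scores, weights and function names are hypothetical.

import numpy as np

def simple_additive_weighting(scores, weights):
    """Sketch of the weighted-average computation in Eq. (4.35).

    scores  : (m, n) array of numerically comparable attribute values
              (benefit sense: larger is better).
    weights : length-n importance weights.
    Returns the weighted-average score of every alternative and the index of the preferred one.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize so that sum(w) = 1
    totals = scores @ w                  # weighted average per alternative
    return totals, int(np.argmax(totals))

# Hypothetical normalized scores of three candidate designs on three attributes
scores  = np.array([[0.8, 0.6, 0.9],
                    [0.7, 0.9, 0.6],
                    [0.9, 0.5, 0.7]])
totals, best = simple_additive_weighting(scores, weights=[0.5, 0.3, 0.2])
print(totals.round(3), best)   # -> [0.76 0.74 0.74] 0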

The utility function used under uncertainty can equally be used in the case of certainty: it is then called a value function V(x1, x2, . . . , xn). A value function satisfies the following property

\[
V(x_1, x_2, \ldots, x_n) \;\ge\; V(x'_1, x'_2, \ldots, x'_n)
\iff
(x_1, x_2, \ldots, x_n) \;\succeq\; (x'_1, x'_2, \ldots, x'_n)
\]


For independent attributes, a value function takes the form

\[
V(x_1, x_2, \ldots, x_n) \;=\; \sum_{j=1}^{n} w_j\, v_j(x_j) \;=\; \sum_{j=1}^{n} w_j\, r_j
\]

where vj(·) is the value function for the jth attribute and rj is the jth attribute transformed onto the comparable scale. A utility function can be a value function, but a value function is not necessarily a utility function; that is

\[
U(x_1, x_2, \ldots, x_n) \;\Longrightarrow\; V(x_1, x_2, \ldots, x_n)
\]

Hence a valid additive utility function can be substituted for the simple additive weighting function.

In the simple additive weighting method, it is assumed that the utility (score, value) of the multiple attributes can be separated into utilities for each of the individual attributes. When the attributes in question are complementary (that is, excellence with respect to one attribute enhances the utility of excellence with respect to another), or substitutes (that is, excellence with respect to one attribute reduces the utility gain associated with excellence with respect to other attributes), it is hard to expect the attributes to take the separable additive form. The overall score or performance can then be expressed in a quasi-additive or multilinear form (Keeney and Raiffa, 1976). But theory, simulation computations, and experience all suggest that the simple additive weighting method yields extremely close approximations to very much more complicated nonlinear forms, while remaining far easier to use.

4.8.3 Hierarchical Additive Weighting Method

In the simple additive weighting method, the weighted average (or priority value) for the alternative Ai is given by

\[
\sum_{j=1}^{n} w_j\, x_{ij} \Big/ \sum_{j=1}^{n} w_j
\]

where it is generally imposed that Σ wj = 1 and the attribute xij is on a ratio scale.

If one interprets the normalized value xij as the score of the ith alternative with regard to the jth attribute (Klee, 1971), then the vector xj = (x1j, x2j, . . . , xmj) may indicate the contribution or importance of the Ai's for the jth attribute, whereas the weight vector w still represents the importance of the considered attributes for the decision problem.

In fact, the more sophisticated hierarchical additive weighting recognizes that attributes may simply be means towards higher-level targets. Hence, the decision maker assigns preferences to the higher-level targets and then assesses the capability of each of the attributes in attaining these higher-level targets. In this way he/she infers the inter-attribute weighting from his/her direct assessment of the higher-level targets. Such an approach matches Saaty's hierarchical structures (Saaty, 1977).


Consider, for example, a ship decision problem which can be represented as a hierarchy with three levels. In Figure 4.3 the first hierarchy level has a single attribute, say, the life-time effectiveness of the ship. Its priority (weight) value is assumed to be equal to unity. The second hierarchy level has six attributes: maximum speed, range, maximum payload, acquisition cost, reliability, and maneuverability. Their weights are derived from the various weight-assessing methods with respect to the attribute of the first level. The third hierarchy level has the four candidate ships considered. In this level weights should be derived with respect to each attribute of the second level. The problem is to determine the priorities of the different ships on life-time effectiveness through the intermediate second level.

Figure 4.3. A hierarchy for priorities in ship concept design

To this end, it is essential to structure a formal hierarchy (Saaty, 1977) in terms of partially ordered sets reflecting the decision maker's intuitive understanding of the design concept. The hierarchy of priorities has various levels: the top level consists of a single element, and each element of a given level dominates (serves as a property of) some or all of the elements in the level immediately below. The pairwise comparison matrix approach may then be applied to compare elements in a single level with respect to a property from the adjacent higher level. The process is repeated so as to compose the resulting weights (obtained by either the eigenvector method or the least weighted method) in such a way as to obtain one overall weighting vector of the impact of the lowest elements on the top element of the hierarchy by successive weighting and composition.

4.8.4 Linear Assignment Method

Bernardo and Blin (1977) developed the linear assignment method, which gives an overall preference ranking of the alternatives based on a set of attribute-wise rankings and a set of attribute weights. It features a linear compensatory process for attribute interaction and combination. In the process only ordinal data, rather than cardinal data, are used as input.

The linear assignment method is a special type of linear programming problem. Besides being able to determine the best alternative, the method has certain unique advantages in application.


For data collection, all that is required is the attribute-wise rankings. Thus the tedious requirements of the existing compensatory models are eliminated; i.e., the rather lengthy procedures of trade-off analysis are not required. The procedure also eliminates the obvious difficulties encountered in constructing appropriate interval-scaled indices of attributes as required for regression analysis to be applicable. Even though a lengthy data-gathering effort is eliminated, the method does satisfy the compensatory hypothesis, whereas other procedures which rely on minimal data do not.

A compensatory model is devised from this simple approach. Define a product-attribute matrix π as a square (m × m) non-negative matrix whose element πik represents the frequency (or number of times) that Ai is ranked kth in the attribute-wise rankings. It is understood that πik measures the contribution of Ai to the overall ranking if Ai is assigned to the kth overall rank. The larger πik is, the more concordance there is in assigning Ai to the kth overall rank. Hence the problem is to find, for each k (k = 1, 2, . . . , m), the Ai which maximizes Σ πik. This is an m! comparison problem. A linear programming model is suggested for the case of large m.

Define a permutation matrix P as an (m × m) square matrix whose element Pik = 1 if Ai is assigned to overall rank k, and Pik = 0 otherwise. The linear assignment method can be written in the following linear programming form

\[
\begin{aligned}
\text{Maximize } \;& \Pi = \sum_{i=1}^{m} \sum_{k=1}^{m} \pi_{ik}\, P_{ik} \\
\text{subject to } \;& \sum_{k=1}^{m} P_{ik} = 1\,, \quad i = 1, 2, \ldots, m \\
& \sum_{i=1}^{m} P_{ik} = 1\,, \quad k = 1, 2, \ldots, m \\
& P_{ik} \ge 0 \quad \forall\, i, k
\end{aligned} \tag{4.36}
\]

Recall that Pik = 1 if alternative i is assigned rank k. Of course, alternative i can be assigned to only one rank; therefore, the first constraint equation in (4.36) holds. Likewise, a given rank k can only have one alternative assigned to it; therefore, the second constraint equation in (4.36) holds.

Let the optimal permutation matrix, the solution of the above LP problem, be P∗. Then the optimal ordering can be obtained by multiplying the attribute-wise preference matrix A by P∗.
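Since problem (4.36) is an assignment problem and always admits an integral optimal solution, it can be solved directly with a standard assignment routine. The sketch below assumes scipy is available and uses scipy.optimize.linear_sum_assignment; the rankings, weights and names are hypothetical.

import numpy as np
from scipy.optimize import linear_sum_assignment

def linear_assignment_ranking(rank_matrix, weights):
    """Sketch of the linear assignment method.

    rank_matrix : (m, n) array; rank_matrix[i, j] = rank of alternative i
                  under attribute j (0 = best).
    weights     : length-n attribute weights.
    Returns the overall ranking as a list of alternative indices, best first.
    """
    m, n = rank_matrix.shape
    pi = np.zeros((m, m))                       # pi[i, k]: weighted frequency that
    for j in range(n):                          # alternative i is ranked k-th attribute-wise
        for i in range(m):
            pi[i, rank_matrix[i, j]] += weights[j]
    rows, cols = linear_sum_assignment(pi, maximize=True)   # solve (4.36)
    order = np.empty(m, dtype=int)
    order[cols] = rows                          # alternative assigned to each overall rank
    return order.tolist()

# Hypothetical attribute-wise rankings of three designs under three attributes
ranks = np.array([[0, 2, 1],
                  [1, 0, 0],
                  [2, 1, 2]])
print(linear_assignment_ranking(ranks, weights=[0.5, 0.3, 0.2]))   # -> [0, 1, 2]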

4.8.5 ELECTRE Method

The ELECTRE (Elimination and Choice Translating Reality) method was originally proposed by Benayoun et al. (1966). Since then, Roy (1973) and Nijkamp and van Delft (1977) have developed it to its present state.


This method uses the concept of an outranking aggregate relationship, established by pairwise comparison of the alternatives for each attribute. The outranking relationship of two alternatives, denoted Ak → Al, states that even though the kth alternative does not dominate the lth alternative quantitatively, the decision maker may still accept the risk of regarding Ak as almost surely better than Al (Roy, 1973). Through sequential assessments of the outranking relationships of all alternatives, the dominated alternatives can be eliminated.

The ELECTRE method sets the criteria for the assessment of the outranking relationships by eliciting, for each pair of alternatives, a concordance index and a discordance index. The former represents the sum of all the weights for those attributes where the performance score of the alternative Ak is at least as high as that of Al; the latter is the counterpart of the concordance index. Finally, a binary outranking relation between the alternatives is yielded as the final result.

The ELECTRE method is applied in situations where the less favored alternatives should be eliminated and a leading set of alternatives should be produced; this holds particularly in cases of a large number of alternatives with only a few attributes involved. The outranking procedure takes the following steps:

Step 1. Normalize the decision matrix

During this step the attribute scales xij are transformed into comparable scales. Each value rij in the normalized decision matrix R

\[
R = \begin{bmatrix}
r_{11} & r_{12} & \ldots & r_{1n} \\
r_{21} & r_{22} & \ldots & r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
r_{m1} & r_{m2} & \ldots & r_{mn}
\end{bmatrix} \tag{4.37}
\]

can be calculated as the normalized preference measure of the ith alternative in terms of the jth attribute as

\[
r_{ij} = \frac{x_{ij}}{\sqrt{\displaystyle\sum_{i=1}^{m} x_{ij}^2}}
\]

so that all attributes have the same unit length of vector.

Step 2. Weighting the normalized decision matrix

The weighted normalized decision matrix V is calculated by multiplying each column of the matrix R by its associated weight wj, as determined off-line by the decision maker; so it is equal to


\[
V = R \cdot w =
\begin{bmatrix}
v_{11} & \ldots & v_{1j} & \ldots & v_{1n} \\
v_{21} & \ldots & v_{2j} & \ldots & v_{2n} \\
\vdots &        & \vdots &        & \vdots \\
v_{m1} & \ldots & v_{mj} & \ldots & v_{mn}
\end{bmatrix}
=
\begin{bmatrix}
w_1 r_{11} & \ldots & w_j r_{1j} & \ldots & w_n r_{1n} \\
w_1 r_{21} & \ldots & w_j r_{2j} & \ldots & w_n r_{2n} \\
\vdots     &        & \vdots     &        & \vdots     \\
w_1 r_{m1} & \ldots & w_j r_{mj} & \ldots & w_n r_{mn}
\end{bmatrix} \tag{4.38}
\]

where

\[
w = \begin{bmatrix}
w_1 &     &        & 0   \\
    & w_2 &        &     \\
    &     & \ddots &     \\
0   &     &        & w_n
\end{bmatrix}
\qquad \text{and} \qquad
\sum_{j=1}^{n} w_j = 1
\]

Step 3. Define the concordance and discordance set

For each pair of alternatives k and l (k, l = 1, 2, . . . , m and k ≠ l), the set of decision attributes J = {j | j = 1, 2, . . . , n} is divided into two distinct subsets. The concordance set Ckl of two alternatives Ak and Al is defined as the set of all attributes for which Ak is preferable to Al; that is

\[
C_{kl} = \{\, j \mid x_{kj} \ge x_{lj} \,\} \qquad \text{for } j = 1, 2, \ldots, n \tag{4.39}
\]

On the other hand, the complementary set is called the discordance set, which is

\[
D_{kl} = \{\, j \mid x_{kj} < x_{lj} \,\} = J - C_{kl} \tag{4.40}
\]

Step 4. Calculate the concordance and discordance matrices

The relative value of the elements in the concordance set is calculated by means of the concordance index, which is equal to the sum of all the weights associated with those attributes which are contained in the concordance set. Therefore, the concordance index ckl between Ak and Al is defined as

\[
c_{kl} = \frac{\displaystyle\sum_{j \in C_{kl}} w_j}{\displaystyle\sum_{j=1}^{n} w_j}
\]

which for the normalized weight set reduces to

\[
c_{kl} = \sum_{j \in C_{kl}} w_j \qquad \text{with } 0 \le c_{kl} \le 1 \tag{4.41}
\]

The concordance index reflects the relative importance of Ak with respect to Al: a higher value of ckl indicates that Ak is preferable to Al as far as the concordance criteria are concerned. The successive values of the concordance indices ckl (k, l = 1, 2, . . . , m and k ≠ l) form the concordance matrix C of (m × m) terms

\[
C = \begin{bmatrix}
-      & c_{12} & c_{13} & \ldots & c_{1(m-1)} & c_{1m} \\
c_{21} & -      & c_{23} & \ldots & c_{2(m-1)} & c_{2m} \\
\vdots & \vdots & \vdots &        & \vdots     & \vdots \\
c_{m1} & c_{m2} & c_{m3} & \ldots & c_{m(m-1)} & -
\end{bmatrix}
\]

which is generally not symmetric.

So far, no attention has been paid to the degree to which the properties of a certain alternative Ak are worse than the properties of a competing alternative Al.

Therefore a second index, called the discordance index, has to be defined as

\[
d_{kl} = \frac{\displaystyle\max_{j \in D_{kl}} \; | v_{kj} - v_{lj} |}{\displaystyle\max_{j \in J} \; | v_{kj} - v_{lj} |} \tag{4.42}
\]

where 0 ≤ dkl ≤ 1 and the terms v·j denote the weighted normalized values for the jth attribute. A higher value of dkl implies that, for the discordance criteria, Ak is less preferable than Al, and a lower value of dkl implies that Ak is more preferable than Al. The discordance indices form the discordance matrix Dx of (m × m) terms, which is generally an asymmetric matrix

\[
D_x = \begin{bmatrix}
-      & d_{12} & d_{13} & \ldots & d_{1(m-1)} & d_{1m} \\
d_{21} & -      & d_{23} & \ldots & d_{2(m-1)} & d_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots     & \vdots \\
d_{m1} & d_{m2} & d_{m3} & \ldots & d_{m(m-1)} & -
\end{bmatrix}
\]

It should be noticed that the information contained in the concordance matrix C is considerably different from that contained in the discordance matrix Dx, making the information contents of C and Dx complementary. In other terms, the concordance matrix describes differences among weights, whereas differences among attribute values are represented by means of the discordance matrix.

Step 5. Determine the concordance and discordance dominance matrices

These matrices can be calculated with the help of threshold values for the concordance index and the discordance index, respectively. That means that, for the concordance index, Ak will only have a chance of dominating Al if its corresponding concordance index ckl exceeds at least a certain threshold value c, i.e.

ckl ≥ c

This cut-off value can be determined, for example, as the average concordance index

\[
c = \sum_{k=1}^{m} \sum_{\substack{l=1 \\ l \ne k}}^{m} \frac{c_{kl}}{m\,(m-1)} \tag{4.43}
\]


On the basis of this threshold value, the Boolean concordance dominance matrix F can be constructed, whose elements are determined as

\[
\left.
\begin{aligned}
f_{kl} &= 1\,, \quad \text{if } c_{kl} \ge c \\
f_{kl} &= 0\,, \quad \text{if } c_{kl} < c
\end{aligned}
\right\} \tag{4.44}
\]

Each unit element of the matrix F then represents the dominance of one alternative with respect to another one.

The discordance dominance matrix G is constructed in a way analogous to the F matrix, on the basis of a threshold value d for the discordance indices, calculated as

\[
d = \sum_{k=1}^{m} \sum_{\substack{l=1 \\ l \ne k}}^{m} \frac{d_{kl}}{m\,(m-1)} \tag{4.45}
\]

The unit elements of the Boolean matrix G, defined as

\[
\left.
\begin{aligned}
g_{kl} &= 1\,, \quad \text{if } d_{kl} \le d \\
g_{kl} &= 0\,, \quad \text{if } d_{kl} > d
\end{aligned}
\right\} \tag{4.46}
\]

represent the dominance relationships between any two alternatives.

Step 6. Determine the aggregate dominance matrix

The next step is to calculate the intersection of the concordance dominance matrix F and the discordance dominance matrix G. The resulting matrix, called the aggregate dominance matrix E, is defined by means of its typical elements ekl as follows

\[
e_{kl} = f_{kl} \cdot g_{kl} \tag{4.47}
\]

Step 7. Eliminate the less favorable alternatives

The aggregate dominance matrix E gives the partial preference ordering of the alternatives. If ekl = 1, the alternative Ak is preferable to the alternative Al for both the concordance criteria and the discordance criteria, but Ak still has the chance of being dominated by the other alternatives. Hence, according to the ELECTRE procedure, the condition for Ak not to be dominated is

\[
\begin{aligned}
e_{kl} &= 1\,, \quad \text{for at least one } l\,, \;\; l = 1, 2, \ldots, m\,; \; k \ne l \\
e_{ik} &= 0\,, \quad \text{for all } i\,, \;\; i = 1, 2, \ldots, m\,; \; i \ne k\,, \; i \ne l
\end{aligned} \tag{4.48}
\]

This condition appears difficult to apply, but the dominated alternatives can be easily identified in the aggregate dominance matrix. If any column of the E matrix has at least one element equal to 1, then this column is 'ELECTREcally' dominated by the corresponding row(s). Hence any column which has an element equal to 1 is simply eliminated.

A weak point of the ELECTRE method is the use of the threshold values c and d. These values are rather arbitrary, although their impact on the final solution may be significant. For example, if the decision maker takes cut-off values of c = 1 and d = 0 for complete dominance, then it is rather difficult to eliminate any of the alternatives, while by relaxing the threshold values (c = 1 lowered; d = 0 increased) the number of nondominated solutions can be reduced.

Nijkamp and van Delft (1977) have introduced net dominance relationships for the complementary analysis of the ELECTRE method. First they define the net concordance dominance value ck, which measures the degree to which the total dominance of the alternative Ak exceeds the degree to which all competing alternatives dominate Ak, i.e.

\[
c_k = \sum_{\substack{l=1 \\ l \ne k}}^{m} c_{kl} \;-\; \sum_{\substack{l=1 \\ l \ne k}}^{m} c_{lk} \tag{4.49}
\]

Similarly, the net discordance dominance value dk is defined as

\[
d_k = \sum_{\substack{l=1 \\ l \ne k}}^{m} d_{kl} \;-\; \sum_{\substack{l=1 \\ l \ne k}}^{m} d_{lk} \tag{4.50}
\]

Obviously, Ak has a higher chance of being accepted the higher ck and the lower dk are. Hence the final selection should satisfy the condition that its net concordance dominance value is at a maximum and its net discordance dominance value at a minimum. If one of these conditions is not satisfied, a certain trade-off between the values of ck and dk has to be carried out. The procedure is to rank the alternatives according to their net concordance and discordance dominance values. The alternative that on average scores highest can be selected as the final solution.

The ELECTRE method should be considered one of the best ranking methods because of its simple logic and full utilization of the information contained in the decision matrix.
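A compact computational sketch of Steps 1-7 is given below; the decision matrix and weights are hypothetical, all attributes are taken in the benefit sense, and the average-value thresholds of Eqs. (4.43) and (4.45) are used.

import numpy as np

def electre(X, w):
    """Compact sketch of the ELECTRE outranking steps described above.

    X : (m, n) decision matrix (benefit attributes, larger is better).
    w : length-n weights summing to 1.
    Returns the aggregate dominance matrix E and the surviving alternatives.
    """
    m, n = X.shape
    R = X / np.sqrt((X ** 2).sum(axis=0))          # Step 1: normalize
    V = R * w                                      # Step 2: weight
    C = np.zeros((m, m)); D = np.zeros((m, m))
    for k in range(m):
        for l in range(m):
            if k == l:
                continue
            conc = X[k] >= X[l]                    # Step 3: concordance set
            C[k, l] = w[conc].sum()                # Step 4: concordance index
            diff = np.abs(V[k] - V[l])
            D[k, l] = diff[~conc].max() / diff.max() if diff.max() > 0 and (~conc).any() else 0.0
    off = ~np.eye(m, dtype=bool)
    c_bar, d_bar = C[off].mean(), D[off].mean()    # Step 5: average thresholds
    F = (C >= c_bar) & off
    G = (D <= d_bar) & off
    E = F & G                                      # Step 6: aggregate dominance
    dominated = E.any(axis=0)                      # Step 7: a 1 anywhere in a column
    return E.astype(int), np.where(~dominated)[0]

X = np.array([[21.0, 4500, 0.90],
              [19.0, 5200, 0.85],
              [20.5, 6000, 0.80]])
w = np.array([0.4, 0.4, 0.2])
E, kept = electre(X, w)
print(E); print(kept)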

4.8.6 TOPSIS Method

The Technique for Order Preference by Similarity to the Ideal Solution (TOPSIS), initially presented by Yoon and Hwang (1985), is an alternative to the ELECTRE method. It is one of the compromising methods among the compensatory techniques, and it utilizes preference information provided in the form of weights wj for each attribute. TOPSIS is attractive in that limited subjective input is needed from the decision makers. It originates from the concept of the displaced ideal (Zeleny, 1974), according to which the selected alternative should have the shortest distance from the ideal solution and the farthest from the anti-ideal solution. Commonly used metrics (L1, L2, and L∞) are considered to measure the distances from the zenith and nadir points, on whose relative closeness and remoteness, respectively, the preferred solution is adopted.

It is then easy to locate the ideal solution, which is composed of all the best attribute values attainable, and the anti-ideal solution, composed of all the worst attribute values attainable. Sometimes the chosen alternative, which has the minimum Euclidean distance from the ideal solution, is also closer to the anti-ideal solution than the other alternatives. Figure 4.4 shows an example where an alternative A1 has shorter distances (both to the ideal solution A∗ and to the anti-ideal solution A−) than A2. In this case it is very difficult to justify the selection of A1.


Figure 4.4. Euclidean distances to the ideal and anti–ideal points in 2D space

This method again requires a decision matrix, but it also needs relative weights to represent preference information. TOPSIS also assumes that each attribute is monotonically increasing or decreasing.

The Algorithm

The TOPSIS method operates on the following decision matrix, which refers to m alternatives Ai that are evaluated in terms of n attributes:

\[
D = \;
\begin{array}{c|cccccc}
       & X_1    & X_2    & \ldots & X_j    & \ldots & X_n    \\ \hline
A_1    & x_{11} & x_{12} & \ldots & x_{1j} & \ldots & x_{1n} \\
A_2    & x_{21} & x_{22} & \ldots & x_{2j} & \ldots & x_{2n} \\
\vdots & \vdots & \vdots &        & \vdots &        & \vdots \\
A_i    & x_{i1} & x_{i2} & \ldots & x_{ij} & \ldots & x_{in} \\
\vdots & \vdots & \vdots &        & \vdots &        & \vdots \\
A_m    & x_{m1} & x_{m2} & \ldots & x_{mj} & \ldots & x_{mn}
\end{array}
\]

where xij denotes the performance measure of the ith alternative in terms of the jth attribute.

TOPSIS takes cardinal preference information on the attributes; that is, a set of weights for the attributes is required. The solution depends upon the weighting scheme given by the decision maker. Reliable methods for weight assessment have appeared (Chu et al., 1979; Saaty, 1977; Zeleny, 1974) which enhance the usage of this method. TOPSIS assumes each attribute in the decision matrix to take either monotonically increasing or monotonically decreasing utility. In other words, the larger the attribute outcomes, the greater the preference for the 'benefit' attributes and the less the preference for the 'cost' attributes. Further, any attribute which is expressed in a non-numerical way should be quantified through an appropriate scaling technique.


The process of TOPSIS includes a series of six successive steps as follows.

Step 1. Construct the normalized decision matrix

This step standardizes the various dimensional attributes into non-dimensional attributes, which allows comparison across the attributes. One way is to divide the outcome of each attribute by the norm of the total outcome vector of the criterion at hand, also called the Euclidean length of a vector. An element rij of the normalized decision matrix R can be calculated as

\[
r_{ij} = \frac{x_{ij}}{\sqrt{\displaystyle\sum_{i=1}^{m} x_{ij}^2}}
\]

computed over all existing solutions; consequently, each attribute has the same unit length of vector.

Step 2. Form the weighted normalized decision matrix

A set of weights w = (w1, w2, . . . , wj, . . . , wn), with Σ wj = 1, is accommodated to the decision matrix in this step. This matrix can be calculated by multiplying each column of the matrix R by its associated weight wj. Therefore, the weighted normalized decision matrix V is generated as follows

\[
V = R \cdot w =
\begin{bmatrix}
v_{11} & v_{12} & \ldots & v_{1j} & \ldots & v_{1n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
v_{i1} & v_{i2} & \ldots & v_{ij} & \ldots & v_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
v_{m1} & v_{m2} & \ldots & v_{mj} & \ldots & v_{mn}
\end{bmatrix}
=
\begin{bmatrix}
w_1 r_{11} & w_2 r_{12} & \ldots & w_j r_{1j} & \ldots & w_n r_{1n} \\
\vdots     & \vdots     &        & \vdots     &        & \vdots     \\
w_1 r_{i1} & w_2 r_{i2} & \ldots & w_j r_{ij} & \ldots & w_n r_{in} \\
\vdots     & \vdots     &        & \vdots     &        & \vdots     \\
w_1 r_{m1} & w_2 r_{m2} & \ldots & w_j r_{mj} & \ldots & w_n r_{mn}
\end{bmatrix}
\]

Step 3. Identify the ideal and anti–ideal solutions

The ideal solution A∗ and the anti-ideal solution, denoted as A−, are the collections of the best and the worst values of the attributes, defined respectively as

\[
A^{*} = \{ (\max_i v_{ij} \mid j \in J),\; (\min_i v_{ij} \mid j \in J'),\; i = 1, 2, \ldots, m \}
      = \{ v^{*}_1, v^{*}_2, \ldots, v^{*}_j, \ldots, v^{*}_n \} \tag{4.51}
\]

\[
A^{-} = \{ (\min_i v_{ij} \mid j \in J),\; (\max_i v_{ij} \mid j \in J'),\; i = 1, 2, \ldots, m \}
      = \{ v^{-}_1, v^{-}_2, \ldots, v^{-}_j, \ldots, v^{-}_n \} \tag{4.52}
\]

where

J  = {j = 1, 2, . . . , n | j associated with benefit criteria}
J' = {j = 1, 2, . . . , n | j associated with cost criteria}


Then it is obvious that the previously created alternatives A∗ and A− represent the most preferable alternative, i.e. the ideal solution, and the least preferable alternative, i.e. the anti-ideal solution, respectively.

Step 4. Develop the separation measure over each attribute to both zenith and nadir

The separation distances of each alternative from the ideal solution and the anti-ideal solution are measured by the n-dimensional Euclidean metric. That means Si∗ is the distance (in a Euclidean sense) of each alternative from the ideal solution, defined as

\[
S_{i*} = \sqrt{\sum_{j=1}^{n} (v_{ij} - v^{*}_j)^2} \;, \qquad i = 1, 2, \ldots, m \tag{4.53}
\]

Similarly, the separation from the anti-ideal solution is given by

\[
S_{i-} = \sqrt{\sum_{j=1}^{n} (v_{ij} - v^{-}_j)^2} \;, \qquad i = 1, 2, \ldots, m \tag{4.54}
\]

Step 5. Determine the relative closeness to the ideal solution

The relative closeness of an alternative Ai to the ideal solution A∗ is then found for each design as

\[
C_{i*} = \frac{S_{i-}}{S_{i*} + S_{i-}} \;, \qquad 0 < C_{i*} < 1\,, \quad i = 1, 2, \ldots, m \tag{4.55}
\]

Apparently, an alternative Ai is closer to the ideal solution A∗ as Ci∗ approaches 1. Thus Ci∗ = 1 if Ai = A∗, and Ci∗ = 0 if Ai = A−.

Step 6. Rank the preference order among alternatives

Now a preference order can be established according to the descending order of Ci∗. Therefore, the best alternative is the one with the largest value of Ci∗, that is, with the shortest distance to the ideal solution and the largest distance from the anti-ideal solution.
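The six steps can be condensed into a short routine such as the sketch below; the data and names are hypothetical, and cost attributes are handled through a benefit/cost flag when building the ideal and anti-ideal solutions.

import numpy as np

def topsis(X, w, benefit):
    """Sketch of the six TOPSIS steps for a small decision matrix.

    X       : (m, n) decision matrix.
    w       : length-n weights summing to 1.
    benefit : length-n boolean array, True for benefit attributes, False for cost ones.
    Returns the relative closeness C* of each alternative and the ranking (best first).
    """
    R = X / np.sqrt((X ** 2).sum(axis=0))                       # Step 1: normalize
    V = R * w                                                   # Step 2: weight
    A_star  = np.where(benefit, V.max(axis=0), V.min(axis=0))   # Step 3: ideal solution
    A_minus = np.where(benefit, V.min(axis=0), V.max(axis=0))   #         anti-ideal solution
    S_star  = np.sqrt(((V - A_star) ** 2).sum(axis=1))          # Step 4: separations
    S_minus = np.sqrt(((V - A_minus) ** 2).sum(axis=1))
    C = S_minus / (S_star + S_minus)                            # Step 5: relative closeness
    return C, np.argsort(-C)                                    # Step 6: rank, best first

# Hypothetical candidate ships: speed (benefit), payload (benefit), cost (cost)
X = np.array([[21.0, 4500, 62.0],
              [19.0, 5200, 55.0],
              [20.5, 6000, 70.0]])
C, order = topsis(X, w=np.array([0.4, 0.4, 0.2]), benefit=np.array([True, True, False]))
print(C.round(3), order)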

Unfortunately, TOPSIS suffers from two weaknesses. Firstly, the separation between the alternatives and the ideal and anti-ideal points is defined via a Euclidean distance measure. This metric is highly sensitive to the subjective weights used to build the weighted normalized decision matrix, and the sensitivity increases further for higher-dimensional decision spaces. The second problem is that the distance definition automatically assumes that the attributes can be directly compensated by each other in a simple manner. This may lead the method to select designs with strange balances between attributes, possibly leading to extreme solutions.


SAW reviewed through TOPSIS

Probably the best known and most widely used MADM method is the Simple Additive Weighting (SAW) method. This method is so simple that some decision makers are reluctant to accept the solution. The SAW method is re-examined here through the concept of TOPSIS.

SAW chooses the alternative which has the maximum weighted-average outcome, that is, it selects A+ such that

\[
A^{+} = \left\{ A_i \;\Big|\; \max_i \; \sum_{j=1}^{n} w_j\, r_{ij} \Big/ \sum_{j=1}^{n} w_j \right\}
\]

where Σ wj = 1 and rij is the normalized outcome of Ai with respect to the jth benefit criterion (a cost criterion is converted to a benefit by taking the reciprocal before normalization).

The selected alternative A+ can be rewritten as

\[
A^{+} = \left\{ A_i \;\Big|\; \max_i \; \sum_{j=1}^{n} v_{ij} \right\}
\]

Let the separation measure in TOPSIS be defined by the city block distance (Dasarathy, 1976) instead of the Euclidean distance; then the separation between Ai and Ak can be written as

\[
S_{ik} = \sum_{j=1}^{n} | v_{ij} - v_{kj} | \;, \qquad i, k = 1, 2, \ldots, m\,; \; i \ne k
\]

This city block distance measure has the following useful relationship for the separation measures to both the ideal and anti-ideal solutions (Yoon, 1980)

\[
S_{i*} + S_{i-} = S_{*-} = K
\]

where K is a positive constant.
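A short justification of this identity, not spelled out above, runs as follows. By construction of A∗ and A−, every weighted normalized value vij lies between v−j and v∗j, so for each attribute the two absolute deviations add up to the fixed gap between the ideal and anti-ideal components:

\[
S_{i*} + S_{i-}
  = \sum_{j=1}^{n} \bigl( |v_{ij} - v^{*}_j| + |v_{ij} - v^{-}_j| \bigr)
  = \sum_{j=1}^{n} |v^{*}_j - v^{-}_j|
  = S_{*-} = K
\]

and the right-hand side does not depend on the alternative Ai.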

This relationship states that any alternative which has the shortest distance to the ideal solution is guaranteed to have the longest distance to the anti-ideal solution. This is not true for the Euclidean distance measure (see Fig. 4.4). Now the relative closeness to the ideal solution can be simplified as

\[
C_{i*} = \frac{S_{i-}}{S_{*-}} \;, \qquad i = 1, 2, \ldots, m
\]

Under the hypothesis that the chosen alternative A+ can be described as

\[
A^{+} = \{ A_i \mid \max C_{i*} \}
\]

it can be proved that

\[
A^{+} = \left\{ A_i \;\Big|\; \max_i \; \sum_{j=1}^{n} v_{ij} \right\} = \{ A_i \mid \max C_{i*} \}
\]

so that it can be concluded that the result of SAW is a special case of TOPSIS using the city block distance.


4.9 LINMAP

In cases where the ideal alternative and the weights of the attributes are not available to the decision maker, methods such as ELECTRE and TOPSIS are not applicable. The linear programming technique for multidimensional analysis of preferences (LINMAP), developed by Srinivasan and Shocker (1973), is very appropriate for this situation.

This method is based on pairwise comparisons between alternatives given by the decision maker; it generates a weight vector and produces the ideal alternative, both of which are unknown a priori. Finally, it identifies the 'best compromise' alternative as the solution that has the shortest distance to the ideal solution.

LINMAP utilizes only the cardinal properties of the preference data and performs well empirically relative to regression. It can deal only with MADM problems in crisp environments: all the decision data are known precisely or given as crisp values.

4.10 MAUT Method of Group Decision

There are several approaches to extend the basic multiattribute decision-making techniques to the case of group decisions; among them, the method developed by Csaki et al. (1995) is considered here.

Consider a decision problem with r group members (decision makers) D1, . . . , Dr, n design alternatives A1, . . . , An, and m attributes X1, . . . , Xm. In the case of a factual attribute the evaluation scores must be identical for any alternative and any decision maker, while subjective attributes can be evaluated differently by each decision maker. Denote the result of the evaluation of the decision maker Dk for alternative Aj on the attribute Xi by a^k_ij. Assume that the possible problem arising from the different dimensions of the attributes has already been settled, and that the a^k_ij values are the results of proper transformations.

The individual preferences among the attributes are expressed as weights. Let the weights of importance w^k_i ≥ 0 be assigned to attribute Xi by the decision maker Dk, i = 1, . . . , m; k = 1, . . . , r.

The different knowledge and priority of the group members are expressed by voting powers, both for weighting the attributes and for scoring the alternatives against the attributes. For factual attributes, only the preference weights given by the decision makers will be revised at each attribute by the voting powers for weighting. However, in the case of subjective attributes, not only the weights but also the a^k_ij values will be modified by the voting powers for scoring.

Let V(w)^k_i denote the voting power assigned to Dk for weighting on attribute Xi, and V(q)^k_i the voting power assigned to Dk for scoring on attribute Xi. The method of calculating the group utility (group ranking value) of alternative Aj is as follows:

• For each attribute Xi, the individual weights of importance of the attributes will be aggregated into the group weights Wi as

\[
W_i = \frac{\displaystyle\sum_{k=1}^{r} w^k_i \, V(w)^k_i}{\displaystyle\sum_{k=1}^{r} V(w)^k_i} \;, \qquad i = 1, \ldots, m \tag{4.56}
\]

• The group scoring Qij of alternative Aj against attribute Xi is

\[
Q_{ij} = \frac{\displaystyle\sum_{k=1}^{r} a^k_{ij} \, V(q)^k_i}{\displaystyle\sum_{k=1}^{r} V(q)^k_i} \;, \qquad i = 1, \ldots, m\,; \; j = 1, \ldots, n \tag{4.57}
\]

• The group utility Uj of Aj is determined as the weighted algebraic mean of the aggregated scoring values with the aggregated weights

\[
U_j = \frac{\displaystyle\sum_{i=1}^{m} W_i \, Q_{ij}}{\displaystyle\sum_{i=1}^{m} W_i} \;, \qquad j = 1, \ldots, n \tag{4.58}
\]

The best alternative of the group decision is the one associated with the highest group utility. A correct group utility function for cardinal ranking must satisfy the axioms given in Keeney (1976).
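A compact sketch of the aggregation in Eqs. (4.56)-(4.58) is reported below; the scores, weights, voting powers and array layout are hypothetical and purely illustrative.

import numpy as np

def group_utility(a, w, Vw, Vq):
    """Sketch of the group aggregation in Eqs. (4.56)-(4.58).

    a  : (r, m, n) array, a[k, i, j] = score given by decision maker k to
         alternative j on attribute i.
    w  : (r, m) individual attribute weights.
    Vw : (r, m) voting powers for weighting.
    Vq : (r, m) voting powers for scoring.
    Returns the group utilities U of the n alternatives.
    """
    W = (w * Vw).sum(axis=0) / Vw.sum(axis=0)                          # Eq. (4.56), shape (m,)
    Q = (a * Vq[:, :, None]).sum(axis=0) / Vq.sum(axis=0)[:, None]     # Eq. (4.57), shape (m, n)
    U = (W[:, None] * Q).sum(axis=0) / W.sum()                         # Eq. (4.58), shape (n,)
    return U

# Hypothetical example: 2 decision makers, 2 attributes, 3 alternatives
a  = np.array([[[0.7, 0.5, 0.9], [0.6, 0.8, 0.4]],
               [[0.8, 0.4, 0.9], [0.5, 0.9, 0.3]]])
w  = np.array([[0.6, 0.4], [0.5, 0.5]])
Vw = np.array([[1.0, 1.0], [2.0, 1.0]])
Vq = np.array([[1.0, 2.0], [1.0, 1.0]])
print(group_utility(a, w, Vw, Vq).round(3))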

4.11 Methods for Trade–offs

A shipowner sometimes trades in a second-hand ship plus some amount of money for a new ship, based upon his/her acceptance of the market offer. The reasoning behind this commercial transaction also applies in multiple attribute decision-making situations. If the shipowner can settle for a lower value on one attribute (i.e. reduce an amount of his/her own capital), how much can he/she expect to get in the improved value of another attribute (i.e. long-term profit)? Another specific example in choosing a ship is that, if the shipowner is willing to lower the range of a ship, how much passenger space can he/she get if the other properties remain the same?

Most MADM methods, except the noncompensatory models, deal with trade-offs implicitly or explicitly. A trade-off is the ratio of the change in one attribute that exactly offsets a change in another attribute.

Here two methods are discussed where trade-off information is explicitly utilized. The marginal rate of substitution (MRS) and indifference curves are the two basic terms describing the trade-off information.


4.11.1 Marginal Rate of Substitution

Suppose that in a ship selection problem, where two attributes x1 (range) and x2 (payload) are specified as desirable attributes while all other attributes remain equal, the decision maker is asked: if x2 is increased by ∆ units, how much does x1 have to decrease in order for the decision maker to remain indifferent? Clearly, in many cases, the answer will depend on the levels x̄1 of x1 and x̄2 of x2. If, at a point (x̄1, x̄2), the decision maker is willing to give up λ∆ units of x1 for ∆ units of x2, then it is said that the marginal rate of substitution (MRS) of x1 for x2 at (x̄1, x̄2) is λ. In other words, λ is the amount of x1 the decision maker is willing to pay for a unit of x2, given that he/she presently has x̄1 of x1 and x̄2 of x2 (Fig. 4.5). The marginal rate of substitution is the rate at which one attribute can be used to replace another.

Figure 4.5. The marginal rate of substitution as a function of x1 and x2

Making trade-offs among three attributes is usually more difficult than making trade-offs between two attributes. Hence only pairs of attributes are usually considered at a time.

It should be noted that when two attributes are independent of each other (noncompensatory), trade-offs between these attributes are not relevant. In this case it is not possible to get a higher value on one attribute even though the decision maker is willing to give up a great outcome of another attribute.

The marginal rate of substitution usually depends on the levels of x1 and x2, that is, on (x̄1, x̄2). For example, suppose the substitution rate at (x̄1, x̄2), the point b in Figure 4.5, is λb. If x1 is held fixed, one might find that the substitution rates increase with a decrease in x2 and decrease with an increase in x2, as shown at points a and c in Figure 4.5 for x1 (range) and x2 (payload). The changes in the substitution rates mean that the more of x2 the decision maker has, the less of x1 he/she would be willing to give up to gain additional amounts of x2; the sacrifice of x1 is less at point c than at point a. This implies that the MRS at which the decision maker would give up x1 for gaining x2 decreases as the level of x2 increases, i.e., the marginal rate of substitution diminishes.

4.11.2 Indifference Curves

Consider again the ship selection problem with x1 (fuel consumption) and x2 (payload expressed by the volume per passenger). Consider A1 as a reference point, a ship whose fuel consumption is 26 quintals per day and whose individual cabin space is 81 ft³. A1 can then be expressed as the point (26, 81) in Figure 4.6. The indifference curve would require a new alternative, say A2 (20, 95), that the decision maker would deem equivalent in preference to A1. By obtaining a number of such points, it would be possible to trace out a curve of indifference through A1. The indifference curve is, then, the locus of all attribute values indifferent to a reference point. The decision maker can draw any number of indifference curves with different reference points.

Figure 4.6. A set of indifference curves

The indifference curve can be thought of as the locus of a set of alternatives among which the decision maker is indifferent. It is particularly useful because it divides the set of all attribute values into (i) those indifferent to the reference point, (ii) those preferred to the reference point, and (iii) those to which the reference point is preferred. It is well known that any point on the preferred side of the indifference curve is preferred to any point on the curve or on the non-preferred side of the curve. Hence, if the decision maker is asked to compare A1 with any other alternative, he/she can immediately indicate a choice. However, if several points are given on the preferred side of the indifference curve, nobody can say which is the most preferred; it would be necessary to draw new indifference curves. See the preference relationships A1 ∼ A2 ≻ A3 ≻ A4 in Figure 4.6.


Three major properties are assumed for indifference curves (MacCrimmon and Toda, 1969). The first property is non-intersection: intersection would imply an intransitivity of preference, and its occurrence would generally indicate a rushed consideration of preferences. The second property relates to the desirability of the attributes considered: if the decision maker assumes both attributes are desirable, then in order to get more of one attribute he/she would be willing to give up some amount of the second attribute. This leads to a negatively sloped indifference curve. The third property is an empirical matter, in the sense that the indifference curves are assumed to be convex to the preference origin. This implies that the marginal rate of substitution diminishes.

Note that the slope of the indifference curves in Figure 4.6 gets steeper as the curves move down (from right to left). It can also be observed that the MRS at (x̄1, x̄2) is the negative reciprocal of the slope of the indifference curve at (x̄1, x̄2). Thus, if indifference curves are drawn, the decision maker can directly calculate the marginal rate of substitution.

MacCrimmon and Toda (1969) suggest some effective methods for obtaining indifference curves. One of their methods has three types of structured procedures: (i) generating points by fixing only one attribute; (ii) generating points by fixing both attributes, but fixing them one at a time; and (iii) generating points by fixing both attributes simultaneously.

4.11.3 Indifference curves in SAW and TOPSIS

The two different separation measures of TOPSIS (the city block distance and the Euclidean distance) are now contrasted with the concept of trade-offs. Mathematically, if an indifference curve passing through a point (v1, v2) is given by

\[
f(v_1, v_2) = c \tag{4.59}
\]

where f is a value function and c is a constant, then the marginal rate of substitution, λ, at (v1, v2) can be obtained as

\[
\lambda = \left( -\frac{d v_1}{d v_2} \right)_{(v_1, v_2)}
        = \left( \frac{\partial f / \partial v_2}{\partial f / \partial v_1} \right)_{(v_1, v_2)} \tag{4.60}
\]

The Simple Additive Weighting method, or the TOPSIS method with the city block distance measure, has the value function

\[
f(v_1, v_2) = v_1 + v_2 \tag{4.61}
\]

The MRS is then given by λ = 1 (actually λ = w2/w1 in the x1 and x2 space). This implies that the MRS in SAW is constant between attributes, and the indifference curves form straight lines with a slope of −1. A constant MRS is a special, rare case of MRS, which implies that the local MRS is also the global MRS.


TOPSIS with the Euclidean distance measure has the value function

\[
f(v_1, v_2) = \frac{S_{i-}}{S_{i-} + S_{i*}} = c
            = \frac{\sqrt{(v_1 - v^{-}_1)^2 + (v_2 - v^{-}_2)^2}}
                   {\sqrt{(v_1 - v^{-}_1)^2 + (v_2 - v^{-}_2)^2} + \sqrt{(v_1 - v^{*}_1)^2 + (v_2 - v^{*}_2)^2}} \tag{4.62}
\]

The MRS is now calculated as

\[
\lambda = \frac{S_{i-}^2\,(v^{*}_2 - v_2) + S_{i*}^2\,(v_2 - v^{-}_2)}
               {S_{i-}^2\,(v^{*}_1 - v_1) + S_{i*}^2\,(v_1 - v^{-}_1)} \tag{4.63}
\]

It is evident that the marginal rate of substitution depends on the levels of v1 and v2, except at the point where the distances to the ideal and anti-ideal solutions are equal, i.e., when Si∗ = Si−. In this case

\[
\lambda = \frac{v^{*}_2 - v^{-}_2}{v^{*}_1 - v^{-}_1} \tag{4.64}
\]

and it is not easy to illustrate the general shapes of the indifference curves.

If the value function is rewritten as

\[
f(v_1, v_2) = c\, S_{i*} - (1 - c)\, S_{i-} = 0 \tag{4.65}
\]

where 0 < c < 1, this expression describes a variant of a hyperbola for which the difference of the weighted distances from the ideal and anti-ideal points is zero.

Some typical indifference curves are shown in Figure 4.7. Any curve with c ≥ 0.5 is convex to the preference origin, which indicates the property of diminishing MRS observed in most indifference curves (MacCrimmon and Toda, 1969), whereas indifference curves with c ≤ 0.5 are concave to the preference origin.

Figure 4.7. Typical indifference curves observed in TOPSIS


This is an unusual case of indifference curves, but it may be interpreted as a risk-prone attitude resulting from a pessimistic situation: when decision makers recognize that their solution is closer to the anti-ideal than to the ideal one, they are inclined to take the alternative which has the best value on one attribute together with the worst value on the other. This approach can be viewed, therefore, as an amalgamation of optimistic and pessimistic decision methods, as represented by the Hurwicz rule (Hey, 1979).

4.11.4 Hierarchical Trade–Offs

When interdependency exists among attributes, the consideration of trade–offs allows the decision maker to make the alternatives much more comparable than they are initially. That is, he/she can make alternatives equivalent on all attributes except one by trade–offs, and then evaluate the alternatives by the values of the remaining attribute (MacCrimmon, 1969; MacCrimmon and Wehrung, 1977).

The simplest way to deal with trade–offs on n attributes is to ignore all but two attributes; attributes are then discarded one by one through trade–offs between natural combinations of two attributes. Indifference curves easily facilitate this equalization process. Suppose the alternatives are located on the indifference curves. One attribute can easily be driven to the same level for all alternatives, and the corresponding modified value of the other attribute is read off. The attribute which is driven to the same level (called the base level) is no longer necessary for further consideration. If this procedure is carried through for pairs of the remaining (n − 2) attributes, the decision maker will have a new set of n/2 attributes. Similarly, if these composite attributes also form pairs of natural combinations, he/she can consider the trade–offs among the pairs and use the indifference curves he/she obtains to scale a new, higher order composite attribute. The decision maker can continue this hierarchical combination until he/she obtains two high–order composite attributes for which he/she again forms the trade–off. In the end, all the attributes might be incorporated.

To select the preferred alternative with this approach, the decision maker must be able to locate it in the final composite space. This can be done by ensuring that each alternative lies on an indifference curve in the initial space; thus, the combination of values defining an alternative will be one of the scale values for the new attribute. By including these combinations on an indifference curve at each step of the way, the decision maker can ensure that the alternatives will be representable in the highest order space finally considered.

The use of this method requires that the attributes be independent among the initial classes. That is, while the trade–off between any initial pair can be nonconstant and highly interrelated, this trade–off cannot depend on the level of the other attributes. This restriction suggests that a useful way to form the initial pairs is by grouping attributes that seem relatively independent from the other ones.

A drawback of the hierarchical trade–off analysis with two attributes at a time may be its slowness in reducing attributes. MacCrimmon and Wehrung (1977) propose lexicographic trade–offs for eliminating this difficulty. If the most important class by the lexicographic method has more


than one attribute, the decision maker forms trade–offs among these attributes. The second most important class of attributes is considered only if there are several alternatives having equally preferred attribute values in the most important class. This extended lexicography overcomes the noncompensatory characteristic of the standard lexicography by considering trade–offs within a class.

Trade–off information is more useful when designing multiple attribute alternatives than when choosing among final versions of them.


Bibliography

[1] Benayoun, R., Roy, B. and Sussman, N.: Manual de Reference du Programme Electre, Note de Synthese et Formation, no. 25, Direction Scientifique SEMA, Paris, 1966.

[2] Bernardo, J.J. and Blin, J.M.: A Programming Model of Consumer Choice among Multi–Attribute Brands, Journal of Consumer Research, Vol. 4, no. 2, 1977, pp. 111–118.

[3] Calpine, H.C. and Golding, A.: Some Properties of Pareto–Optimal Choices in Decision Problems, OMEGA, Vol. 4, no. 1, 1976, pp. 141–147.

[4] Chu, A.T.W., Kalaba, R.E. and Spingarn, K.: A Comparison of Two Methods for Determining the Weights of Belonging to Fuzzy Sets, Journal of Optimization Theory and Applications, Vol. 27, no. 4, 1979, pp. 531–538.

[5] Dasarathy, E.V.: SMART: Similarity Measure Anchored Ranking Technique for the Analysis of Multidimensional Data Analysis, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC–6, no. 10, 1976, pp. 708–711.

[6] Dawes, R.M.: Social Selection Based on Multidimensional Criteria, Journal of Abnormal and Social Psychology, Vol. 68, no. 1, 1964, pp. 104–109.

[7] Hey, J.D.: Uncertainty in Microeconomics, Martin Robertson, Oxford, 1979.

[8] Hwang, C.L. and Yoon, K.: Multiple Attribute Decision Making: Methods and Applications – A State-of-the-Art Survey, Springer–Verlag, Berlin–Heidelberg, 1981.

[9] Keeney, R.L.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs, John Wiley, New York, 1976.

[10] Keeney, R.L. and Raiffa, H.: A Group Preference Axiomatization with Cardinal Utility, Management Science, Vol. 23, 1976, pp. 140–145.

[11] Klee, A.J.: The Role of Decision Models in the Evaluation of Competing Environmental Health Alternatives, Management Science, Vol. 18, no. 2, 1971, pp. B52–B67.

[12] Linkov, I., Varghese, A., Jamil, S., Seager, T.P. and Bridges, T.: Multicriteria Decision Analysis: A Framework for Structuring Remedial Decisions at Contaminated Sites, in ‘Comparative Risk Assessment and Environmental Decision Making’, Linkov and Ramadan eds., Springer, New York, 2004, pp. 15–54.

[13] MacCrimmon, K.R. and Toda, M.: The Experimental Determination of Indifference Curves, The Review of Economic Studies, Vol. 36, no. 4, 1969, pp. 433–450.

[14] MacCrimmon, K.R. and Wehrung, D.A.: Trade–off Analysis: The Indifference and Preferred Proportions Approaches, in ‘Conflicting Objectives in Decisions’, Bell et al. eds., John Wiley, New York, 1977.

[15] Moskowitz, H. and Wright, G.P.: Operations Research Techniques for Management, Prentice–Hall, 1979.

[16] von Neumann, J. and Morgenstern, O.: Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1947.

[17] Nijkamp, P. and van Delft, A.: Multi–Criteria Analysis and Regional Decision–Making, Martinus Nijhoff Social Sciences Division, Leiden, 1977.

[18] Roy, B.: A Conceptual Framework for a Prescriptive Theory of ‘Decision–Aid’, in Multiple Criteria Decision Making, Cochrane & Zeleny eds., 1973, pp. 179–201.

[19] Saaty, T.L.: A Scaling Method for Priorities in Hierarchical Structures, Journal of Mathematical Psychology, Vol. 15, no. 3, 1977, pp. 234–281.

[20] Sen, P. and Yang, J.B.: Multiple Criteria Decision Support in Engineering Design, Springer–Verlag, Berlin–Heidelberg, 1998.

[21] Shannon, C.E.: A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27, 1948, pp. 379–423.

[22] Shannon, C.E. and Weaver, W.: The Mathematical Theory of Communication, The University of Illinois Press, Urbana, Ill., 1947.

[23] Simon, H.A.: A Behavioral Model of Rational Choice, Quarterly Journal of Economics, Vol. 69, no. 1, 1955, pp. 99–114.

[24] Starr, M.K. and Greenwood, L.H.: Normative Generation of Alternatives with Multiple Criteria Evaluation, in Multiple Criteria Decision Making, Starr & Zeleny eds., North Holland, New York, 1977, pp. 111–128.

[25] Tversky, A.: Intransitivity of Preferences, Psychological Review, Vol. 76, no. 1, 1969, pp. 31–48.

[26] Tversky, A.: Elimination by Aspects: A Probabilistic Theory of Choice, Michigan Mathematical Psychology Program MMPM 71–12, The University of Michigan, Ann Arbor, 1971.

[27] Yoon, K.: Systems Selection by Multiple Attribute Decision Making, Ph.D. Dissertation, Kansas State University, 1980.

[28] Yoon, K. and Hwang, C.L.: Multiple Attribute Decision Making: An Introduction, Sage, Thousand Oaks, 1985.

[29] Zeleny, M.: Linear Multiobjective Programming, Springer–Verlag, Berlin–Heidelberg, 1974.


Chapter 5

Optimization Methods

A rational, computer–aided method employing optimization techniques can overcome many design problems, since it is capable of finding an optimal solution for a ship subsystem in a matter of minutes rather than days or even weeks.

The subject, which started as operations research during the Second World War, has grown theoretically and also in its applications to a variety of problems in different fields, such as engineering, economics and management. In its more comprehensive sense, which includes data collection, mathematical modelling, solution of mathematical problems and improvement through feedback of results, the subject has come to be known as systems analysis. The mathematical contents of systems analysis concerned with the optimization of objectives may be grouped under the heading optimization methods, which form the subject matter of these notes.

This chapter is an elementary mathematical introduction to classical optimization techniques, linear and nonlinear programming, and direct search methods. Most sections of the chapter can be studied independently of one another. A knowledge of algebra (including matrices), calculus and geometry is assumed.

In their application to real life, problems in systems analysis and operations research usually involve a large number of variables, parameters, equations and constraints. The problems generally involve too much numerical work to be handled by anything but a digital computer. For this reason the methods of solution are computer oriented. The criterion of suitability of a method is often the economy and efficiency with which it can be programmed on the computer.


5.1 Mathematical Modelling

Optimization is the act of achieving the best result under given conditions. In the design, construction, and maintenance of any technical system, engineers have to take many technological and managerial decisions at several stages. The ultimate goal of any such decision is either to minimize the effort required or to maximize the desired benefit. Since the effort required or the benefit desired in any practical situation can be expressed as a function of certain decision variables, optimization can be defined as the process of finding the conditions that give the maximum or minimum value of a function. It can be seen from Figure 5.1 that if a point x∗ corresponds to the minimum value of a function f(x), the same point also corresponds to the maximum value of the negative of the function, −f(x). Thus, without loss of generality, optimization can be taken to mean minimization, since the maximum of a function can be found by seeking the minimum of the negative of the same function.

Figure 5.1. Minimum of f(x) is the same as maximum of −f(x)

A vector of objective functions is denoted as f(x) with components fi(x), i = 1,...,n. The standard form of an optimization problem assumes that all the objective functions are to be minimized. If there is a problem where some function f(x) is to be maximized instead, it can be transformed to the standard form by minimizing −f(x).

Design problems are translated into mathematical optimization problems by identifying the following elements of the mathematical model for all technical systems. The first is to specify the design variables, which the design team can change in order to optimize its design. The second is to define the objective functions, which are figures of merit to be minimized or maximized. The third is to identify the constraint functions, which specify limits that must be satisfied by the design variables.

If an n–dimensional Cartesian space with each coordinate axis representing a design variable xi is considered, this space is called the design space. Each point in the design space is called a design point and represents a candidate solution to the design problem, which may or may not turn out to be feasible.

Solving a problem with a multiobjective function is much more complicated than solving a problem with a single objective function. The solution to an optimization problem with a single


objective is usually a single design point. On the contrary, when there are multiple objectives, the solution is usually a subspace of designs. This subspace is characterized by the condition (called Pareto optimality) that no objective function can be improved without some deterioration in another objective function. Because of the complexity associated with multiple objective functions, for the time being the focus will be put mostly on problems with a single objective.

The optimum searching methods are also known as mathematical programming techniques and are generally studied as a part of operations research. Operations research is a branch of mathematics which is concerned with the application of scientific methods and techniques to decision–making problems and with establishing the best or optimal solutions. There is no single method available for solving all kinds of optimization problems efficiently. Hence a number of optimization methods have been developed for solving different types of optimization problems. These methods can be broadly divided into three categories:

1. Mathematical programming techniques

• Calculus methods

• Calculus of variations

• Geometric programming

• Linear programming

• Integer programming

• Nonlinear programming

• Quadratic programming

• Stochastic programming

• Multiobjective programming

• Dynamic programming

• Theory of games

• Network methods

2. Stochastic process techniques

• Statistical decision theory

• Markov processes

• Queueing theory

• Simulation methods

• Reliability theory

3. Statistical methods

• Regression analysis

• Cluster analysis, pattern recognition

• Design of experiments

• Factor analysis


The mathematical programming techniques are useful in finding the minimum of a function of several variables, possibly under a prescribed set of constraints. The stochastic process techniques can be used to solve problems which are described by a set of random variables having known probability distributions. The statistical methods enable the decision maker to analyze experimental data and databases in order to build empirical models which should provide the most accurate representation of a physical phenomenon. This chapter essentially deals with the theory of mathematical programming techniques that are suitable for the solution of engineering problems.

5.2 Historical Development

The existence of optimization methods can be traced back to the days of Newton, Lagrange and Cauchy. The development of differential calculus methods for optimization was possible because of the contributions of Newton and Leibniz. The foundations of the calculus of variations were laid by Bernoulli, Euler, Lagrange and Weierstrass. Cauchy made the first application of the steepest descent method to solve unconstrained minimization problems. The method of optimization for constrained problems, which involves the addition of unknown multipliers, became known by the name of its inventor, Lagrange. In spite of these early contributions, very little progress was made until the middle of the twentieth century, when high–speed digital computers made the implementation of the optimization procedures possible and stimulated further research on new methods. Spectacular advances followed, producing a massive literature on optimization techniques and the emergence of several new areas in optimization theory.

It is interesting to note that the major developments in the area of numerical methods of unconstrained optimization were made in the United Kingdom only in the 1960s. The development of the simplex method by Dantzig in 1947 for linear programming problems and the statement of the principle of optimality by Bellman in 1957 for dynamic programming paved the way for the development of the methods of constrained optimization. The work by Kuhn and Tucker in 1951 on the necessary and sufficient conditions for the optimal solution of programming problems laid the foundations for a great deal of later research in nonlinear programming. Although no single technique has been refined to be universally applicable for nonlinear programming problems, the works by Carroll, and by Fiacco and McCormick, made many difficult problems feasible to solve by using the well–known techniques of unconstrained optimization. Geometric programming was developed in the 1960s by Duffin, Zener and Peterson. Gomory did pioneering work in integer programming, which is one of the most exciting areas of optimization, since many real–world applications fall under this category of problems. Dantzig, and Charnes and Cooper, developed stochastic programming techniques and solved some optimization problems by assuming the design parameters to be independent and normally distributed. The desire to optimize more than one objective or goal while satisfying the physical constraints led to the development of multiobjective programming methods. Goal programming is a well known technique for solving specific types of multiobjective optimization problems. It was originally proposed for linear


problems by Charnes and Cooper. Network analysis methods are essentially management control techniques and were developed during the 1950s. The foundations of the theory of games were laid by von Neumann in 1928, and since then it has been applied to solve several mathematical economics and military problems.

Except for industrial engineering problems, there are very few linear problems in engineering design. The bulk of research work has gone into developing nonlinear techniques, and literally dozens of numerical techniques have been developed. However, none has been as successful as the linear case, which guarantees a global optimum in a finite number of steps. Despite this, there are a number of good nonlinear methods that work successfully in most applications (Fletcher, 1970; Siddall, 1972).

5.3 Statement of an Optimization Problem

Optimization problems can be formulated using widely varying notation, which can inhibit effective communication about mathematical properties, algorithms and software. For this reason, there is a tendency to adopt a standard form of optimization formulation. Unfortunately, this tendency has not yet been fully realized, and the standard form described in this section is not universal, although it is quite common in engineering optimization textbooks.

The standard formulation of a mathematical programming problem for a single objective function can be stated as

Find x = {x1, x2, . . . , xn} which minimizes f(x)

subject to the constraints gj(x) ≤ 0 , j = 1,2,...,m

hk(x) = 0 , k = 1,2,...,p

with xL ≤ x ≤ xU (5.1)

where x is the n–dimensional vector of design variables belonging to a subset of the n–dimensional real space Rn, f(x) is called the objective function, and gj and hk are the sets of algebraic inequality and equality constraints, respectively. The vectors xL and xU denote the lower limit vector and the upper limit vector, respectively.

The number of variables n and the numbers of constraints m and/or p need not be related in any way. The problem stated in equation (5.1) is called a constrained optimization problem 1.

When an optimization problem does not involve any constraint, it is called an unconstrained optimization problem and can be stated simply as

Find x = {x1,x2, . . . ,xn} which minimizes f(x) (5.2)

1In mathematical programming problems, the equality constraints hk(x) = 0 , k = 1,2,...,p are often neglected, for simplicity, in the statement of a constrained optimization problem, although several methods are available for handling problems with equality constraints.
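As a concrete illustration of the standard form (5.1), the following sketch sets up a small two–variable problem and solves it numerically. The objective, constraints, bounds and the use of SciPy's SLSQP solver are all assumptions made for this example; they are not part of the original notes.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem in the standard form (5.1):
#   minimize  f(x) = (x1 - 2)^2 + (x2 - 1)^2
#   subject to g1(x) = x1^2 - x2 <= 0          (inequality constraint)
#              h1(x) = x1 + x2 - 2  = 0        (equality constraint)
#   with      0 <= x1, x2 <= 3                 (lower and upper limits)

f = lambda x: (x[0] - 2.0)**2 + (x[1] - 1.0)**2

constraints = [
    {"type": "ineq", "fun": lambda x: -(x[0]**2 - x[1])},   # SciPy expects g(x) >= 0
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 2.0},
]
bounds = [(0.0, 3.0), (0.0, 3.0)]

result = minimize(f, x0=np.array([0.5, 0.5]), method="SLSQP",
                  bounds=bounds, constraints=constraints)
print(result.x, result.fun)    # for this toy problem, x* is close to [1, 1] with f = 1
```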


5.3.1 Definitions

It is worthwhile to establish a common language, at least for the most important definitions in mathematical programming problems. They are: the design vector, distinguishing between variables and parameters; single and multiobjective functions; design constraints (functional and geometric, active and inactive); feasible solution and feasible domain; simple and composite constraint surfaces; free and bound design points.

Design vector

Any engineering system or component is described by a set of quantities, some of which are viewed as variables during the design process. Certain quantities are usually fixed from the outset and these are called preassigned parameters. The vector of design variables is denoted as x with components x1, x2, ..., xn, denoting the n design variables. In general, vectors are denoted by bold characters, whereas their components have the same character in regular font, with subscripts indicating the component number. Design variables can be real, integer or binary; they can be continuous or discrete.

When optimization problems are solved numerically, there is a substantial advantage in scaling all quantities to avoid ill–conditioned problems. It is customary to scale all design variables so that they are all of order 1. Besides improving the numerical conditioning of the problem, this practice also creates unit–independent design variables, which is often an advantage.
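A minimal sketch of this scaling practice is given below; the bounds are hypothetical values loosely reminiscent of ship main dimensions, chosen only to illustrate the mapping onto the interval [0, 1].

```python
import numpy as np

# Illustrative scaling of design variables to order 1 using assumed lower/upper limits.
x_lower = np.array([90.0, 10.0, 0.55])   # e.g. length [m], breadth [m], block coefficient
x_upper = np.array([140.0, 22.0, 0.75])

def scale(x):
    return (x - x_lower) / (x_upper - x_lower)    # all components mapped to [0, 1]

def unscale(x_scaled):
    return x_lower + x_scaled * (x_upper - x_lower)

x = np.array([120.0, 18.0, 0.68])
print(scale(x))            # values of order 1, independent of the physical units
print(unscale(scale(x)))   # recovers the original physical values
```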

Design constraints

In many practical problems, the design variables have to satisfy certain specified functional and other requirements. The restrictions that must be satisfied in order to produce an acceptable design are collectively called design constraints. The constraints which represent limitations on the behavior or performance of the technical system are termed behavioral or functional constraints. The constraints which represent physical limitations on the design variables are known as geometric constraints.

In engineering applications most constraints are inequality constraints; however, occasionally equality constraints are also used. As in the case of the design variables, it is worthwhile to transform the constraints into non-dimensional forms of similar magnitudes.

A constraint which is satisfied with a margin is called inactive. A constraint with a positive value (i.e., one that is not satisfied) is called violated. When gj(x) = 0, the constraint is active. A design point which satisfies all the constraints is called feasible, while a design point which violates even a single constraint is called infeasible. The collection of all feasible points is called the feasible domain, or occasionally the constraint set.
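The following small sketch illustrates this terminology; the constraint functions and the design point are hypothetical, and each inequality constraint gj(x) ≤ 0 is classified as inactive, active (within a numerical tolerance) or violated.

```python
import numpy as np

def classify_constraints(x, constraints, tol=1e-8):
    # Classify each g_j(x) <= 0 at the design point x.
    labels = []
    for g in constraints:
        value = g(x)
        if value > tol:
            labels.append("violated")
        elif value < -tol:
            labels.append("inactive")
        else:
            labels.append("active")
    return labels

constraints = [
    lambda x: x[0] + x[1] - 2.0,   # g1(x) = x1 + x2 - 2 <= 0  (assumed)
    lambda x: 1.0 - x[0],          # g2(x) = 1 - x1      <= 0  (assumed)
]

x = np.array([1.0, 1.0])
print(classify_constraints(x, constraints))   # ['active', 'active'] -> a bound feasible point
```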


Constraint surface

For illustration, consider an optimization problem with only inequality constraints gj(x) ≤ 0. The set of values of x that satisfy the equation gj(x) = 0 forms a hypersurface in the design space, which is called a constraint surface. The constraint surface divides the design space into two regions: one in which gj(x) < 0 and the other in which gj(x) > 0. Thus, the points lying on the hypersurface satisfy the constraint gj(x) critically, whereas the points lying in the region where gj(x) < 0 are feasible. The set of all the constraint surfaces gj(x) = 0 , j = 1,2, . . . ,m, which separates the feasible region, is called the composite constraint surface.

Figure 5.2 shows a hypothetical two–dimensional design space where the infeasible region is indicated by hatched lines. A design point which lies on one or more constraint surfaces is called a bound point, and the associated constraint is called an active constraint. The design points which do not lie on any constraint surface are known as free points. Depending upon whether a particular design point belongs to the feasible or infeasible region, it can be identified as one of the following four types:

• free feasible point,
• free infeasible point,
• bound feasible point,
• bound infeasible point.

Figure 5.2. Constraint surfaces in a 2D design space

Objective function

The concept design procedures aim to find the ‘best possible’ design, rather than one which merely satisfies the functional, geometric and other requirements of the problem. The criterion with respect to which a subsystem of the design is optimized, when expressed as a function of the design variables, is known as the criterion, merit or objective function of the mathematical model. The choice of the objective function is governed by the nature of the problem. For example, the objective function for minimization may generally be taken as the steel weight in ship, aircraft and aerospace structural


design problems. The maximization of mechanical efficiency is the obvious choice of an objective in mechanical engineering systems design.

However, there may be cases where optimization with respect to a single objective leads to results which are not satisfactory with respect to another criterion. For example, in propeller design the geometry established for maximizing efficiency might not correspond to the one that would minimize the induced pressure forces. Similarly, in statically indeterminate structures, the fully stressed design may not correspond to the minimum weight design; again, it may not be the cheapest one.

In many situations, it could be feasible or even necessary to identify more than one criterion to satisfy simultaneously. For example, a gear pair may have to be designed for minimum weight and maximum efficiency while transmitting a specified horsepower. With multiple objectives there arises the possibility of conflict, and one simple way to handle this issue is to assign some subjective preference weights and to take the actual objective function as a linear combination of the conflicting multiple objective functions. Thus, if f1(x) and f2(x) are the two objective functions, it is possible to formulate the objective function for optimization as

f(x) = w1 f1(x) + w2 f2(x) with w1 + w2 = 1 (5.3)
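A minimal sketch of the weighted–sum formulation (5.3) follows. The two single–variable objectives are invented for illustration; sweeping the weight w1 shows how the compromise solution moves between the optima of the two individual objectives.

```python
from scipy.optimize import minimize_scalar

# Hypothetical conflicting objectives with minima at x = 1 and x = 3.
f1 = lambda x: (x - 1.0)**2
f2 = lambda x: (x - 3.0)**2

for w1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    w2 = 1.0 - w1
    combined = lambda x: w1 * f1(x) + w2 * f2(x)     # eq. (5.3)
    res = minimize_scalar(combined)
    print(f"w1 = {w1:.2f}  x* = {res.x:.3f}  f1 = {f1(res.x):.3f}  f2 = {f2(res.x):.3f}")
```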

Objective function surfaces

The locus of all points satisfying f(x) = c = constant forms a hypersurface in the design space, and for each value of the constant c there corresponds a different member of a family of surfaces. These surfaces, called the objective function surfaces, are shown in a hypothetical two–dimensional design space in Figure 5.3.

Figure 5.3. Contours of the objective function

Once the objective function surfaces are drawn along with the constraint surfaces, the optimum point can be determined without much difficulty. But the main problem is that, as the number of design variables exceeds two or three, the constraint and objective function surfaces become too complex even to visualize, and the problem has to be solved purely as a mathematical problem.


5.3.2 Design Optimization

A design model at the preliminary design stage is called an optimization model, where the ‘best possible design’ selected is called the optimal design and the criterion used is called the objective of the model. Some optimization models will be studied later. What follows is just a discussion of the way design optimization models can be used in practice.

Optimal Design Concept

Today design is still the ultimate expression of the science of engineering. From the early days of engineering, the goal has been to improve the design so as to achieve the best way of satisfying the original need, within the available means.

The design process can be organized in many ways, but it is clear that there are certain elements in the process that any description must contain: a recognition of need, a phase of generation, and a selection of alternatives. Traditionally, the improvement of the ‘best’ alternative is the phase of design optimization. In a traditional description of the design phases, recognition of the original need is followed by a technical statement of the problem (problem definition), the generation of one or more physical configurations and the study of the candidates’ performance using engineering science (analysis), the selection of the ‘best possible alternative’ (synthesis), and its improvement (optimization). The process concludes with experimental validation of the prototype against the original need.

Such a sequential description, though perhaps useful for educational purposes, cannot describe reality adequately, since the question of how the ‘best possible design’ is improved within the available means is pervasive, influencing all phases where decisions are made.

So what is design optimization? One may recognize that a rigorous definition of ‘design optimization’ can be reached if the following questions are answered:

1. How to describe different designs?

2. What is the criterion for enhancing the ‘best possible design’?

3. What are the ‘available means’?

The first question is addressed by describing a design as a system defined by design variables and parameters. The second question requires decision–making models where the idea of the ‘best possible design’ is introduced and the criterion for an optimal design is called an objective.

Designers are left with the last question on the ‘available means’, by which decision makers signify a set of requirements that must be satisfied by any acceptable design. These design requirements may not be uniquely defined, but are under the same limitations as the choice of problem objective and variables. In addition, the choices of the design requirements that must be satisfied are intimately related to the choice of the objective function and design variables.

To summarize, informally, but rigorously, it can be said that design optimization involves:

1. the selection of a set of variables to describe the design alternatives;


2. the selection of an objective, expressed in terms of the design variables, to be minimized or maximized;

3. the determination of a set of constraints, expressed in terms of the design variables, which must be satisfied by any acceptable design;

4. the determination of a set of values for the design variables, which minimize (or maximize) the objective, while satisfying all the constraints.

Optimal Product Development

The motivation for using design optimization models is improvement of the ‘most preferred’ design selected at the concept design phase, which represents a compromise of many different requirements. Clearly, if this attempt is successful, substantial cost savings will be realized. Such optimization studies may provide the competitive edge in product design.

In the case of product development, a new original design may be represented by its model. Design alternatives can be generated by manipulating the values of the design variables. Also, changes in the design parameters can show the effect of environmental changes on a particular design. The objective criterion will help select the best of all preferred alternatives, thus developing a preliminary design. How good it is depends on the model used. Many details must be left out because of modelling difficulties. But with accumulated experience, reliable elaborate models can be built and design costs may be drastically reduced.

In the case of product enhancement, an existing design can be described by a model. At this design stage engineering designers should not be interested in drastic design changes that might result from a full–scale optimization study, but in relatively small design changes that might improve the performance of the product. In such circumstances, the model can be used to predict the effect of the changes. Design cost and cycle time will be reduced. Sometimes this type of model use is called a sensitivity study, to be distinguished from a complete optimization study.

5.3.3 Graphical Optimization

When the design problem can be formulated in terms of only two design variables, graphical methods can be profitably used to solve the problem and gain understanding of the nature of the design space. As visualizing the design space is a powerful tool for understanding the trade-offs associated with a design problem, graphical methods are often used even when the number of design variables is greater than two. In that case, one looks at special forms of the design problem with some of the design variables frozen, and two allowed to vary.

The design space could be some part of the earth’s surface, the elevation of which would represent the objective function. Mountain peaks would be maxima, and valley bottoms would be minima. An equality constraint would be a road one must stay on. An inequality constraint could be a barrier with a no trespassing sign. In fact, some optimization jargon comes from topography. Much can be gained by this visualization, and it can be used to describe features of the design space. One should keep


in mind, however, that certain unexpected complexities may arise in problems with dimensions higher than two, which may not be immediately evident from the three-dimensional image.

Interior and boundary optima

A problem such as

minimize f = f(x)

subject to g1(x) ≤ 0

g2(x) ≤ 0

(5.4)

can be represented by a two-dimensional picture, as in Figure 5.4.

Figure 5.4. One–dimensional representation

If the functions behave as shown in the figure, the problem is restated simply as

minimize f = f(x)

subject to xL ≤ x ≤ xU

(5.5)

The function f(x) has a unique minimum x∗, an interior minimum, lying well within the range [xL, xU]. The point x∗ may also be called an unconstrained minimum, in the sense that the constraints do not influence its location, that is, g1 and g2 are both inactive. It is possible, however, that problem (5.4) may result in any of the three situations shown in Figure 5.5. Therefore, if x∗ is the minimum of the unconstrained function f(x), the solution to problem (5.5) is generally given by selecting the middle element of the set (xL, x∗, xU) ranked according to increasing order of magnitude. In cases (b) and (c), where the solution falls at xL and xU, respectively, the optima are boundary optima because they occur at the boundary of the feasible region.
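The 'middle element' rule can be written down in a few lines; the sketch below is purely illustrative and clamps an assumed unconstrained minimizer into the interval [xL, xU].

```python
def bounded_minimizer(x_star, x_lower, x_upper):
    # Middle element of (xL, x*, xU): the constrained minimizer of a unimodal f on [xL, xU].
    return sorted([x_lower, x_star, x_upper])[1]

print(bounded_minimizer(2.0, 0.0, 5.0))    # interior optimum: 2.0
print(bounded_minimizer(-1.0, 0.0, 5.0))   # boundary optimum at xL: 0.0
print(bounded_minimizer(7.0, 0.0, 5.0))    # boundary optimum at xU: 5.0
```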


Figure 5.5. Possible bounding of minimum

In two-dimensional problems the situation becomes more complicated. A function f(x1, x2) is represented by a surface, so the feasible domain would be defined by the intersection of surfaces. It is obviously difficult to draw such pictures, so a representation using the orthogonal projection common in engineering design drawings may be more helpful. Figure 5.6 shows a map of the dependence of the vertical acceleration in a cabin of a ro–ro/pax vessel on two hull form geometrical variables. The regions where the acceleration values are higher are dark, while the regions where the acceleration values are lower are light. The curved lines separating bands of shading are called function contours. Along these lines the vertical acceleration is constant, so that they are akin to isotherms or isobars on a weather map.


Figure 5.6. Vertical accelerations for a family of ro–ro vessels (contour map of av(rms) at SS6, fitted with weighted squares, versus the hull form parameters L/V1/3 and L1/2)

5.4 Classical Optimization Techniques

The classical optimization methods are useful in finding the optimum of continuous and differentiable functions. These methods are analytical and make use of the techniques of differential calculus in locating the optimum points. Since some practical problems involve objective functions that are not continuous and/or differentiable, the classical optimization techniques have limited scope in practical applications. However, a study of the calculus methods of optimization forms a basis for developing most of the numerical techniques of optimization.

What follows presents the necessary and sufficient conditions for locating the optimum of a single variable function, of a multivariable function without constraints, and of a multivariable function with equality and inequality constraints. The application of differential calculus will be considered in the unconstrained optimization of single and multivariable functions. The methods of direct substitution, constrained variation and Lagrange multipliers will be discussed for the minimization of a function of several variables subject to equality constraints. The application of the Kuhn–Tucker necessary conditions for the solution of a general nonlinear optimization problem with inequality constraints is illustrated. The convex programming problem is also defined, for which the Kuhn–Tucker conditions are both necessary and sufficient.


5.4.1 Single Variable Optimization

It occurs rather rarely in practice that the optimum value of a function of just one variable is required. However, several numerical techniques for nonlinear programming use single variable optimization as part of their computation strategy.

A function of one variable f(x) is said to have a relative or local minimum at x = x∗ if f(x∗) ≤ f(x∗+h) for all sufficiently small positive and negative values of h. Similarly, a point x∗ is called a relative or local maximum if f(x∗) ≥ f(x∗+h) for all values of h sufficiently close to zero.

A function f(x) is said to have a global or absolute minimum at x∗ if f(x∗) ≤ f(x) for all x in the domain over which f(x) is defined. Similarly, a point x∗ will be a global maximum of f(x) if f(x∗) ≥ f(x) for all x in the domain. Figure 5.7 shows the difference between the relative and the global optimum points.

Figure 5.7. Relative and global minima

A single variable optimization problem is one in which the value x = x∗ is to be found in the interval [a,b] such that x∗ minimizes f(x). The following two theorems provide the necessary and sufficient conditions for the relative minimum of a function of a single variable.

Necessary Condition (theorem 1). If a function f(x) is defined in the interval a ≤ x ≤ b and has a relative minimum at x = x∗, where a < x∗ < b, and if the derivative df(x)/dx = f ′(x) exists as a finite number at x = x∗, then f ′(x∗) = 0.

Discussion

1. This theorem can be proved even if x∗ is a relative maximum.

2. The theorem does not say what happens if a minimum or maximum occurs at a point x∗ where the derivative fails to exist (Fig. 5.8). If f ′(x∗) does not exist, the above theorem is not applicable.

3. The theorem does not say what happens if a minimum or maximum occurs at an end point of the interval of definition of the function.


4. The theorem does not say that the function will necessarily have a minimum or maximum at every point where the derivative is zero; it may happen that such a point is neither a minimum nor a maximum. In general, a point x∗ at which f ′(x∗) = 0 is called a stationary point.

Figure 5.8. Derivative undefined at x∗

If the function f(x) possesses continuous derivatives of every order in the neighborhood of x = x∗, the following theorem provides the sufficient condition for the minimum or maximum value of the function.

Sufficient Condition (theorem 2). Let f ′(x∗) = f ′′(x∗) = ... = f (n−1)(x∗) = 0, but f (n)(x∗) ≠ 0. Then f(x∗) is (i) a minimum value of f(x) if f (n)(x∗) > 0 and n is even, (ii) a maximum value of f(x) if f (n)(x∗) < 0 and n is even, (iii) neither a maximum nor a minimum if n is odd.

In the latter case the point x∗ is called an inflection point .
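Theorem 2 can be applied mechanically with a computer algebra system. The sketch below is illustrative: the test functions are assumed examples, and SymPy is an assumed tool rather than part of the original notes. The order n of the first non–vanishing derivative at a stationary point, together with its sign, classifies the point.

```python
import sympy as sp

x = sp.symbols("x")

def classify_stationary_point(f, x0):
    # Find the first non-vanishing derivative at the stationary point x0 (theorem 2).
    n = 1
    deriv = sp.diff(f, x)
    while deriv.subs(x, x0) == 0:
        n += 1
        deriv = sp.diff(deriv, x)
    value = deriv.subs(x, x0)
    if n % 2 == 0:
        return "minimum" if value > 0 else "maximum"
    return "inflection point"

print(classify_stationary_point(x**4, 0))    # minimum (n = 4, positive derivative)
print(classify_stationary_point(-x**6, 0))   # maximum (n = 6, negative derivative)
print(classify_stationary_point(x**3, 0))    # inflection point (n = 3, odd)
```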

5.4.2 Multivariable Optimization without Constraints

To discuss the necessary and sufficient conditions for the minimum or maximum of a function of multiple variables without any constraints, it is necessary to formulate the Taylor series expansion of a multivariable function, which in turn requires the definition of the rth differential of that function.

Definition of the rth differential of a multivariable function

If all partial derivatives of the function f through order r ≥ 1 exist and are continuous at a point x∗, then the polynomial

d^r f(x∗) = Σ_{i=1..n} Σ_{j=1..n} · · · Σ_{k=1..n} hi hj · · · hk ∂^r f(x∗) / (∂xi ∂xj · · · ∂xk) (5.6)

is called the rth differential of f at x∗. Notice that there are r summations and one hi is associated with each summation in equation (5.6).


For example, when r = 2 and n = 3, one has

d^r f(x∗) = d²f(x∗1,x∗2,x∗3) = Σ_{i=1..3} Σ_{j=1..3} hi hj ∂²f(x∗)/(∂xi ∂xj)

= h1² ∂²f(x∗)/∂x1² + h2² ∂²f(x∗)/∂x2² + h3² ∂²f(x∗)/∂x3² + 2 h1h2 ∂²f(x∗)/(∂x1 ∂x2) + 2 h1h3 ∂²f(x∗)/(∂x1 ∂x3) + 2 h2h3 ∂²f(x∗)/(∂x2 ∂x3)

Taylor’s series expansion

The Taylor series expansion of a multivariable function f(x) about a point x∗ is a multiple series expansion given by

f(x) = f(x∗) + df(x∗) + (1/2!) d²f(x∗) + (1/3!) d³f(x∗) + . . . + (1/n!) d^n f(x∗) + Rn(x∗,h) (5.7)

where the last term is called the remainder, and is given by

Rn(x∗,h) = [1/(n + 1)!] d^(n+1) f(x∗ + θ h)

where 0 < θ < 1, and h = x − x∗.

Necessary Condition (theorem 3). If f(x) has an extreme point (maximum or minimum) at x = x∗, and if the first partial derivatives of f(x) exist at x∗, then

∂f/∂x1 (x∗) = ∂f/∂x2 (x∗) = . . . = ∂f/∂xn (x∗) = 0 (5.8)

Sufficient Condition (theorem 4). A sufficient condition for a stationary point x∗ to be an extreme point is that the matrix of second partial derivatives (Hessian matrix) of f(x) evaluated at x∗ is (i) positive definite when x∗ is a minimum point, and (ii) negative definite when x∗ is a maximum point.

Note: A matrix A is positive definite if all its eigenvalues are positive, i.e., all the values of λ which satisfy the determinant equation

|A − λ I| = 0

are positive.


Saddle Point

In the case of a function of two variables, f(x,y), the Hessian matrix may be neither positive nor negative definite at a point (x∗,y∗) at which

∂f/∂x = ∂f/∂y = 0

In such a case, the point (x∗,y∗) is called a saddle point. The characteristic of a saddle point is that it corresponds to a relative minimum or maximum of f(x,y) with respect to one variable, say, x (the other variable being fixed at y = y∗) and a relative maximum or minimum of f(x,y) with respect to the second variable y (the other variable being fixed at x = x∗).

As an example, consider the function f(x,y) = x2 − y2. For this function,

∂f/∂x = 2x and ∂f/∂y = −2y

These first derivatives are zero at x∗ = 0 and y∗ = 0. Since the Hessian matrix of f at (x∗,y∗) is neither positive definite nor negative definite, the point (x∗ = 0, y∗ = 0) is a saddle point. The function is shown in Figure 5.9. It can be seen that f(x,y∗) = f(x,0) has a relative minimum and f(x∗,y) = f(0,y) has a relative maximum at the saddle point (x∗,y∗).

Figure 5.9. Saddle point of the function f(x,y) = x2 − y2

Saddle points may exist for functions of more than two variables too. The characteristic of the saddle point stated above still holds, provided that x and y are interpreted as vectors in multidimensional cases.

The saddle point may be particularly tricky to rule out because it appears to be a minimum if one approaches it from only certain directions. Yet both ascending and descending directions lead away from it. All points at which the gradient is zero are collectively called stationary points, and the above necessary condition is often called the stationary condition.
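Theorem 4 and the saddle point test can be checked numerically from the eigenvalues of the Hessian. The sketch below uses the text's example x2 − y2 and the companion function x2 + y2; the tolerance and the quadratic test functions are assumptions of the example.

```python
import numpy as np

def classify(hessian, tol=1e-10):
    # Classify a stationary point from the eigenvalues of the (symmetric) Hessian.
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > tol):
        return "local minimum (positive definite)"
    if np.all(eig < -tol):
        return "local maximum (negative definite)"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point (indefinite)"
    return "inconclusive (semidefinite)"

# f(x,y) = x^2 + y^2 : Hessian diag(2, 2) at the origin
print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))
# f(x,y) = x^2 - y^2 : Hessian diag(2, -2) at the origin (the saddle point above)
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))
```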


Nature of Stationary Points

The singularity of the Hessian gives a quadratic form that is not strictly positive but only non-negative. This can make a big difference in assessing optimality. When the second–order terms are zero at a stationary point, the higher order terms will in general be needed for a conclusive study. The condition that the quadratic form ∂xT H ∂x be non-negative is no longer sufficient but only necessary. The associated matrix is called positive–semidefinite. Identifying semidefinite Hessians at stationary points of functions higher than quadratic should be a signal for extreme caution in reaching optimality conclusions.

The terminology and sufficiency conditions for determining the nature of a stationary point are summarized in Table 5.1.

Quadratic form    Hessian matrix           Nature of x
positive          positive–definite        local minimum
negative          negative–definite        local maximum
nonnegative       positive–semidefinite    probable valley
nonpositive       negative–semidefinite    probable ridge
any form          indefinite               saddle point

Table 5.1. Terminology for stationary points

5.4.3 Multivariable Optimization with Equality Constraints

The optimization of continuous functions subject to equality constraints can be stated as

minimize f = f(x)

subject to gj(x) = 0 , j = 1,2, . . . ,m

(5.9)

where m is less than or equal to the number of variables n; otherwise (if m > n), the problem becomes overdefined and, in general, there will be no solution. There are several methods available for the solution of this problem. The methods of direct substitution, constrained variation and Lagrange multipliers are discussed below.

Method of Direct Substitution

In the unconstrained problem, there are n independent variables and the objective function can be evaluated for any set of n numbers. However, in the constrained problem, at least one independent variable loses its arbitrariness with the addition of each equality constraint. Thus a problem with m constraints in n variables will have only (n − m) independent variables. If the values of any set of (n − m) variables are selected, the values of the remaining variables are determined by the m equality constraints.


Thus it is theoretically possible to solve the m equality constraints simultaneously and express any m variables in terms of the remaining (n − m) variables. When these expressions are substituted into the original objective function, there results a new objective function involving only (n − m) variables. The new objective function is not subject to any constraint, and hence its optimum can be found by using the unconstrained optimization techniques discussed above.

This method of direct substitution, although it appears simple in theory, is not convenient from a practical point of view. The reason is that the constraint equations will be nonlinear for most practical problems, and often it becomes impossible to solve them and express any m variables in terms of the remaining (n − m) variables. However, the method of direct substitution might prove to be very simple and direct for solving simple problems.
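A small symbolic sketch of direct substitution on an assumed illustrative problem (minimize x1² + x2² subject to x1 + x2 − 1 = 0) is given below; the problem and the use of SymPy are assumptions of this example.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**2 + x2**2            # assumed objective
g = x1 + x2 - 1              # assumed equality constraint, g = 0

x2_expr = sp.solve(g, x2)[0]              # solve the constraint: x2 = 1 - x1
f_reduced = f.subs(x2, x2_expr)           # unconstrained objective in x1 only

x1_star = sp.solve(sp.diff(f_reduced, x1), x1)[0]   # stationary point of the reduced function
x2_star = x2_expr.subs(x1, x1_star)
print(x1_star, x2_star, f_reduced.subs(x1, x1_star))   # 1/2, 1/2, 1/2
```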

Method of Constrained Variation

The basic idea used in the method of constrained variation is to find a closed form expression for the first order differential of the objective function f at all points at which the constraints gj(x) = 0 , j = 1,2, . . . ,m are satisfied. The desired optimum points are then obtained by setting the differential df equal to zero.

Simple Problem

Before presenting the general method, its salient features will be indicated through the following simple problem with n = 2 and m = 1.

Consider the problem

minimize f (x1,x2)

subject to g (x1,x2) = 0

(5.10)

Let the constraint equation g(x1,x2) = 0 be solved to obtain x2 as

x2 = h (x1) (5.11)

By substituting equation (5.11), the objective function becomes a function of only one variable, as f = f [x1,h(x1)]. A necessary condition for f to have a minimum at some point (x∗1,x∗2) is that the total derivative of f(x1,x2) with respect to x1 must be zero at (x∗1,x∗2). The total differential of f(x1,x2) may be written as

df(x1,x2) = ∂f/∂x1 dx1 + ∂f/∂x2 dx2

and the total derivative with respect to x1 as

df(x1,x2)/dx1 = ∂f(x1,x2)/∂x1 + ∂f(x1,x2)/∂x2 · dx2/dx1


When this is equated to zero, the following relation is obtained

df = ∂f/∂x1 dx1 + ∂f/∂x2 dx2 = 0 (5.12)

Since g(x∗1,x∗2) = 0 at the minimum point, any variations dx1 and dx2 taken about the point (x∗1,x∗2) are called admissible variations provided they satisfy the relation

g (x∗1 + dx1,x∗2 + dx2) = 0 (5.13)

The Taylor series expansion of the function in equation (5.13) about the point (x∗1,x∗2) gives

g(x∗1 + dx1, x∗2 + dx2) ≈ g(x∗1,x∗2) + ∂g(x∗1,x∗2)/∂x1 dx1 + ∂g(x∗1,x∗2)/∂x2 dx2 = 0 (5.14)

where dx1 and dx2 are assumed to be small.

Since g(x∗1,x∗2) = 0, equation (5.14) reduces to

dg = [ ∂g/∂x1 dx1 + ∂g/∂x2 dx2 ](x∗1,x∗2) = 0 (5.15)

Thus equation (5.15) has to be satisfied by all admissible variations, as is shown in Figure 5.10, where PQ denotes the constraint curve, at each point of which the second of equations (5.10) is satisfied. If A is the base point (x∗1,x∗2), the variations in x1 and x2 leading to the points B and C are called admissible variations. On the other hand, the variations in x1 and x2 representing the point D are not admissible, since the point D does not lie on the constraint curve.

Figure 5.10. Variations about the base point

Thus any set of variations (dx1,dx2) that does not satisfy equation (5.15) leads to points like D, which do not satisfy the constraint equation (5.10).

Assuming that ∂g/∂x2 ≠ 0, equation (5.15) can be rewritten as

dx2 = −[ (∂g/∂x1) / (∂g/∂x2) ](x∗1,x∗2) dx1 (5.16)


This relation indicates that once the variation dx1 in x1 is chosen arbitrarily, the variation dx2 in x2 is automatically decided in order to have dx1 and dx2 as a set of admissible variations. By substituting equation (5.16) in equation (5.12), one obtains

df = [ ∂f/∂x1 − (∂g/∂x1)/(∂g/∂x2) · ∂f/∂x2 ](x∗1,x∗2) dx1 = 0 (5.17)

Note that equation (5.17) has to be satisfied for all values of dx1. Since dx1 can be chosen arbitrarily, equation (5.17) leads to

( ∂f/∂x1 · ∂g/∂x2 − ∂f/∂x2 · ∂g/∂x1 )(x∗1,x∗2) = 0 (5.18)

Equation (5.18) gives a necessary condition in order to have (x∗1,x∗2) as an extreme point (minimum or maximum).

General problem: necessary conditions

The procedure indicated above can be generalized to the case of a problem in n variables with m constraints. In this case, each constraint equation gj(x) = 0, j = 1,2, . . . ,m, gives rise to a linear equation in the variations dxi, i = 1,2, . . . ,n. In all there will be m linear equations in n variations. Hence any m variations can be expressed in terms of the remaining (n − m) variations. These expressions can be used to express the differential of the objective function df in terms of the (n − m) independent variations. By letting the coefficients of the independent variations vanish in the equation df = 0, one obtains the necessary conditions for the constrained optimum of the given function. The equations involved in this procedure are given below in detail.

The differential of the objective function is given by

df = ∂f/∂x1 (x∗) dx1 + ∂f/∂x2 (x∗) dx2 + . . . + ∂f/∂xn (x∗) dxn (5.19)

where x∗ represents the extreme point and (dx1, dx2, . . . , dxn) indicates the set of admissible infinitesimal variations about the point x∗.

The following holds

gj (x∗) = 0 , j = 1,2, . . . ,m (5.20)

since the given constraints are satisfied at the extreme point x∗, and

gj(x∗ + dx) ≈ gj(x∗) + Σ_{i=1..n} ∂gj/∂xi (x∗) dxi = 0 , j = 1,2, . . . ,m (5.21)

since dx = {dx1, dx2, . . . , dxn} is a vector of admissible variations.


Equations (5.20) and (5.21) lead to

∂g1/∂x1 dx1 + ∂g1/∂x2 dx2 + . . . + ∂g1/∂xn dxn = 0

∂g2/∂x1 dx1 + ∂g2/∂x2 dx2 + . . . + ∂g2/∂xn dxn = 0

...

∂gm/∂x1 dx1 + ∂gm/∂x2 dx2 + . . . + ∂gm/∂xn dxn = 0
(5.22)

where all the partial derivatives are assumed to have been evaluated at the extreme point x∗. Any set of variations dxi not satisfying equations (5.22) will not be of interest here, since they do not satisfy the given constraints. Equations (5.22) can be solved to express any m variations, say the first m, in terms of the remaining variations, so that they are rewritten as

∂g1/∂x1 dx1 + ∂g1/∂x2 dx2 + . . . + ∂g1/∂xm dxm = −∂g1/∂xm+1 dxm+1 − ∂g1/∂xm+2 dxm+2 − . . . − ∂g1/∂xn dxn = h1

∂g2/∂x1 dx1 + ∂g2/∂x2 dx2 + . . . + ∂g2/∂xm dxm = −∂g2/∂xm+1 dxm+1 − ∂g2/∂xm+2 dxm+2 − . . . − ∂g2/∂xn dxn = h2

...

∂gm/∂x1 dx1 + ∂gm/∂x2 dx2 + . . . + ∂gm/∂xm dxm = −∂gm/∂xm+1 dxm+1 − ∂gm/∂xm+2 dxm+2 − . . . − ∂gm/∂xn dxn = hm
(5.23)

where the terms containing the independent variations dxm+1, dxm+2, . . . , dxn have been placed on the right-hand side. Thus, for any arbitrarily chosen values of dxm+1, dxm+2, . . . , dxn, the values of the dependent variations are given by equations (5.23), which can be solved using Cramer's rule.

General Problem: sufficient conditions

By eliminating the first m variables using the m equality constraints (which is at least theoretically possible), the objective function f can be made to depend only on the remaining variables xm+1, xm+2, . . . , xn. Then the Taylor series expansion of f, in terms of these variables, about the extreme point x∗ gives

f(x∗ + dx) ≈ f(x∗) + Σ_{i=m+1..n} (∂f/∂xi)g dxi + (1/2!) Σ_{i=m+1..n} Σ_{j=m+1..n} (∂²f/(∂xi ∂xj))g dxi dxj (5.24)

where (∂f/∂xi)g denotes the partial derivative of f with respect to xi (holding all the other variables xm+1, xm+2, . . . , xi−1, xi+1, . . . , xn constant) while x1, x2, . . . , xm are allowed to change


so that the constraints gj(x∗ + dx) = 0 , j = 1,2, . . . ,m, are satisfied; the second derivative (∂²f/∂xi ∂xj)g has a similar meaning.

Method of Lagrange Multipliers

In the method of direct substitution, m variables were eliminated from the objective function with the help of the m equality constraints. In the method of constrained variation, m variations were eliminated from the differential of the objective function. Thus both these methods were based on the principle of eliminating m variables by making use of the constraints and then solving the problem in terms of the remaining (n − m) decision variables.

In the Lagrange multiplier method, on the contrary, one additional variable is introduced into the problem for each constraint. Thus if the original problem has n variables and m equality constraints, m additional variables are added to the problem so that the final number of unknowns becomes (n + m). Of course, there are some simplifications afforded by the addition of the new variables. The basic features of the method will initially be given for a simple problem of two variables with one constraint. The extension of the method to a general problem of n variables with m constraints follows.

Simple Problem

Consider the optimization problem

minimize f(x1,x2)

subject to g(x1,x2) = 0

(5.25)

which was examined in discussing the method of constrained variation, where the necessary condition for the existence of an extreme point at x = x∗ was found (see equation (5.18)) to be

( ∂f/∂x1 − (∂f/∂x2)/(∂g/∂x2) · ∂g/∂x1 )(x∗1,x∗2) = 0 (5.26)

where all quantities are evaluated at (x∗1,x∗2).

By defining a quantity λ, called the Lagrange multiplier , as

λ = −( (∂f/∂x2) / (∂g/∂x2) )(x∗1,x∗2) (5.27)

equation (5.26) can be rewritten as

( ∂f/∂x1 + λ ∂g/∂x1 )(x∗1,x∗2) = 0 (5.28)

whereas equation (5.27) can be rewritten with some rearrangement as

( ∂f/∂x2 + λ ∂g/∂x2 )(x∗1,x∗2) = 0 (5.29)


In addition, the constraint equation has to be satisfied at the extreme point, i.e.

g(x1,x2)|(x∗1,x∗2) = 0 (5.30)

Thus equations (5.28) through (5.30) represent the necessary conditions for the point (x∗1,x∗2) to be an extreme point.

Notice that the partial derivative (∂g/∂x2)|(x∗1,x∗2) has to be nonzero in order to be able to define λ by equation (5.27). This is because the variation dx2 was expressed in terms of dx1 by equation (5.16) in the derivation of equation (5.26). On the other hand, if one chooses to express dx1 in terms of dx2, the requirement would be that (∂g/∂x1)|(x∗1,x∗2) be nonzero to define λ. Thus the derivation of the necessary conditions by the method of Lagrange multipliers requires only that at least one of the partial derivatives of g(x1,x2) be nonzero at an extreme point.

The necessary conditions given by equations (5.28) to (5.30) can also be generated by constructing a function L, known as the Lagrange function, as

L(x1,x2,λ) = f(x1,x2) + λ·g (x1,x2) (5.31)

If the partial derivatives of the Lagrange function L(x1,x2,λ) with respect to each of its arguments are set equal to zero, the necessary conditions given by equations (5.28) through (5.30) are obtained as

∂L/∂x1 (x1,x2,λ) = ∂f/∂x1 (x1,x2) + λ ∂g/∂x1 (x1,x2) = 0
∂L/∂x2 (x1,x2,λ) = ∂f/∂x2 (x1,x2) + λ ∂g/∂x2 (x1,x2) = 0
∂L/∂λ (x1,x2,λ) = g(x1,x2) = 0      (5.32)

which are to be satisfied at an extreme point (x1∗, x2∗). The sufficient conditions to be satisfied will be given later.
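For a concrete illustration of how conditions (5.32) are used, the following short sketch builds the Lagrange function symbolically and solves the stationarity equations. The objective and constraint are a hypothetical example (not taken from the text), and SymPy is assumed to be available.

    # Minimal sketch of conditions (5.32) for a hypothetical problem:
    #   minimize f = x1**2 + x2**2   subject to   g = x1 + x2 - 2 = 0
    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    f = x1**2 + x2**2
    g = x1 + x2 - 2
    L = f + lam*g                                   # Lagrange function, equation (5.31)

    # Necessary conditions (5.32): all partial derivatives of L vanish
    stationarity = [sp.diff(L, v) for v in (x1, x2, lam)]
    print(sp.solve(stationarity, (x1, x2, lam), dict=True))   # [{x1: 1, x2: 1, lam: -2}]

The stationary point (1, 1) is indeed the constrained minimum of this example, with multiplier λ = −2.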

General Problem

The equations derived above can be extended to the case of a general problem with n variables and m equality constraints. The result can be stated in the form of a theorem as follows.

Necessary condition (theorem 5). A necessary condition for a function f(x), subject to the constraints gj(x) = 0 (j = 1,2,...,m), to have a relative minimum at a point x∗ is that the first partial derivatives of the Lagrange function L = L(x1,x2,...,xn; λ1,λ2,...,λm), defined as L = f(x) + Σ_{j=1}^{m} λj gj(x), with respect to each of its arguments must be zero.

For f(x) to have a constrained relative minimum at x∗, the following theorem, which gives the sufficient condition, must also be satisfied.


Sufficient condition (theorem 6). A sufficient condition for f(x) to have a relative minimum at x∗ is that the quadratic form Q, defined by

Q = Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²L/∂xi ∂xj) dxi dxj      (5.33)

evaluated at x = x∗, be positive definite for all values of dxi for which the constraints are satisfied.

Discussion

1. If the quadratic form Q = Σ_{i=1}^{n} Σ_{j=1}^{n} [∂²L(x∗,λ∗)/(∂xi ∂xj)] dxi dxj at an extreme point of f(x) is negative for all choices of the admissible variations dxi, then x∗ will be a constrained maximum of f(x).

2. It has been shown by Hancock (1960) that a necessary condition for the quadratic form Q to be positive (negative) definite for all admissible variations dx is that each root z of the polynomial defined by the following determinant equation be positive (negative)

   | (L11 − z)   L12        L13   ...  L1n        g11  g21  ...  gm1 |
   | L21         (L22 − z)  L23   ...  L2n        g12  g22  ...  gm2 |
   |  ...                                                            |
   | Ln1         Ln2        Ln3   ...  (Lnn − z)  g1n  g2n  ...  gmn |
   | g11         g12        g13   ...  g1n        0    0    ...  0   |
   |  ...                                                            |
   | gm1         gm2        gm3   ...  gmn        0    0    ...  0   |  = 0      (5.34)

where

Lij = (∂²L/∂xi ∂xj) |(x∗, λ∗)      (5.35)

and

gij = (∂gi/∂xj) |(x∗)      (5.36)

3. Equation (5.34), on expansion, leads to an (n − m)th-order polynomial in z. If some of the roots of this polynomial are positive while the others are negative, the point x∗ is not an extreme point.
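In numerical work an equivalent, and often more convenient, way of carrying out this sufficiency test is to project the Hessian of the Lagrange function onto the subspace of admissible variations (the null space of the constraint Jacobian) and examine its eigenvalues. The sketch below uses hypothetical problem data and assumes NumPy.

    # Practical counterpart of the test (5.33)-(5.34): definiteness of the Hessian
    # of L restricted to the admissible variations dx with jac_g @ dx = 0.
    import numpy as np

    def constrained_curvature(hess_L, jac_g):
        """Eigenvalues of Z^T (d2L) Z, with the columns of Z spanning the null space of jac_g."""
        _, s, vt = np.linalg.svd(jac_g)
        rank = np.sum(s > 1e-12)
        Z = vt[rank:].T                       # basis of the admissible variations
        return np.linalg.eigvalsh(Z.T @ hess_L @ Z)

    # Hypothetical data: L = x1^2 + x2^2 + x3^2 + lam*(x1 + x2 + x3 - 1) at the optimum
    hess_L = 2.0*np.eye(3)                    # Hessian of L with respect to x
    jac_g  = np.array([[1.0, 1.0, 1.0]])      # gradient of the single constraint
    print(constrained_curvature(hess_L, jac_g))   # all positive -> constrained minimum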


5.4.4 Multivariable Optimization with Inequality Constraints

Consider now constrained optimization problems with inequality constraints

minimize f(x)
subject to gj(x) ≤ 0 , j = 1,2,...,m      (5.37)

where any upper and lower limits on the design variables are assumed to be included in the inequality constraints.

The latter can be transformed to equality constraints by adding non-negative slack variables yj as

gj(x) + yj² = 0 , j = 1,2,...,m      (5.38)

where the values of the slack variables are yet unknown.

The problem is now in a form suitable for the application of one of the methods discussed in the foregoing, that is

minimize f(x)
subject to Gj(x,y) = gj(x) + yj² = 0 , j = 1,2,...,m      (5.39)

where y = {y1,y2, . . . ,ym} is the vector of the slack variables.

This problem can be conveniently solved by the method of Lagrange multipliers. For this, theLagrange function L is constructed as

L(x,y,λ) = f(x) + Σ_{j=1}^{m} λj Gj(x,y)      (5.40)

where λ = {λ1,λ2,...,λm} is the vector of Lagrange multipliers.

The stationary points of the Lagrange function can be found by solving the following equations (necessary conditions)

∂L/∂xi (x,y,λ) = ∂f/∂xi (x) + Σ_{j=1}^{m} λj ∂gj/∂xi (x) = 0 , i = 1,2,...,n      (5.41)

∂L/∂λj (x,y,λ) = Gj(x,y) = gj(x) + yj² = 0 , j = 1,2,...,m      (5.42)

∂L/∂yj (x,y,λ) = 2 λj yj = 0 , j = 1,2,...,m      (5.43)

Equations (5.42) ensure that the constraints gj(x) ≤ 0 (j = 1,2,...,m) are satisfied, while equations (5.43) imply that either λj = 0 or yj = 0. If λj = 0, the constraint is inactive² and hence can be ignored. On the other hand, if yj = 0, the constraint is active (gj = 0) at the optimum point. Consider the division of the constraints into two subsets J1 and J2, where J1 ∪ J2 represents the total set of constraints. Let the set J1 indicate the indices of those constraints which are active at the optimum point and let J2 include the indices of all the inactive constraints.

²Those constraints which are satisfied with the equality sign, gj = 0, at the optimum point are called the active constraints, while those that are satisfied with strict inequality, gj < 0, are termed inactive constraints.

Thus for j ∈ J1, yj = 0 (active constraints), whereas for j ∈ J2, λj = 0 (inactive constraints), and equation (5.41) can be simplified as

∂f/∂xi + Σ_{j∈J1} λj ∂gj/∂xi = 0 , i = 1,2,...,n      (5.44)

Similarly, the constraint equations (5.42) can be written as

gj(x) = 0 , j ∈ J1      (5.45)

gj(x) + yj² = 0 , j ∈ J2      (5.46)

Equations (5.44) through (5.46) represent [n + p + (m − p)] = (n + m) equations in the (n + m) unknowns xi (i = 1,2,...,n), λj (j ∈ J1) and yj (j ∈ J2), where p denotes the number of active constraints.

Assuming that the first p constraints are active, equations (5.44) can be expressed as

−∂f/∂xi = λ1 ∂g1/∂xi + λ2 ∂g2/∂xi + ... + λp ∂gp/∂xi , i = 1,2,...,n      (5.47)

or written collectively as

−∇f = λ1 ·∇g1 + λ2 ·∇g2 + . . . + λp ·∇gp (5.48)

where ∇f and ∇gj are the gradients of the objective function and of the jth constraint, given respectively by

∇f = {∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn}  and  ∇gj = {∂gj/∂x1, ∂gj/∂x2, ..., ∂gj/∂xn}

Thus the negative of the gradient of the objective function can be expressed as a linear combination of the gradients of the active constraints at the optimum point. Further, it can be shown that, in the case of a minimization problem, the λj (j ∈ J1) must be positive.

For simplicity of illustration, suppose that only two constraints are active (p = 2) at the optimum point. Then equation (5.48) reduces to

−∇f = λ1 ·∇g1 + λ2 ·∇g2 (5.49)


Let S be a feasible direction 3 at the optimum point.

By pre-multiplying both sides of equation (5.49) by Sᵀ, the following equation is obtained

−Sᵀ·∇f = λ1 Sᵀ·∇g1 + λ2 Sᵀ·∇g2      (5.50)

Since S is a feasible direction, it should satisfy the relations

Sᵀ·∇g1 < 0
Sᵀ·∇g2 < 0      (5.51)

Thus, if λ1 > 0 and λ2 > 0, the quantity Sᵀ·∇f can be seen to be always positive. Since ∇f indicates the gradient direction, along which the value of the function increases at the maximum rate, Sᵀ·∇f represents the component of the increment of f along the direction S. If Sᵀ·∇f > 0, the function value increases as one moves along the direction S. Hence, if λ1 and λ2 are positive, one will not be able to find any direction in the feasible domain along which the function value can be further decreased. Since the point at which equation (5.51) holds is assumed to be the optimum, λ1 and λ2 have to be positive. This reasoning can be extended to cases where more than two constraints are active. By proceeding in a similar manner, one can show that the λj have to be negative for a maximization problem.

Figure 5.11. Feasible direction S

³A vector S is called a feasible direction from a point x if at least a small step can be taken along it without immediately leaving the feasible region. Thus, for problems with sufficiently smooth constraint surfaces, a vector S satisfying the relation

Sᵀ·∇gj < 0

can be called a feasible direction. On the other hand, if the constraint is either linear or concave, as shown in Figures 5.11(b) and 5.11(c), any vector satisfying the relation with ≤ (i.e. Sᵀ·∇gj ≤ 0) can be called a feasible direction. The geometric interpretation of a feasible direction is that the vector S makes an obtuse angle with all the constraint normals, except that, for the linear or outward-curving (concave) constraints, the angle may go to 90° at the optimum point.


Kuhn–Tucker Conditions

When the set of active constraints is known, the conditions to be satisfied at a constrained minimum point x∗ for the problem stated in equation (5.37) can be expressed as

∂f/∂xi + Σ_{j∈J1} λj ∂gj/∂xi = 0 , i = 1,2,...,n
λj > 0 , j ∈ J1      (5.52)

where ∂f/∂xi and ∂gj/∂xi denote the components of the gradients with respect to x. These conditions are called the Kuhn–Tucker conditions (1951), after the mathematicians who developed them.

They are the necessary conditions to be satisfied at a relative constrained minimum of f(x). These conditions are, in general, not sufficient to ensure a relative minimum. However, there is a class of problems, called convex programming problems, for which the Kuhn–Tucker conditions are necessary and sufficient for a global minimum.

If the set of active constraints is not known, the Kuhn–Tucker conditions can be stated as follows

∂f/∂xi + Σ_{j=1}^{m} λj ∂gj/∂xi = 0 , i = 1,2,...,n
λj gj = 0 , j = 1,2,...,m
gj ≤ 0 , j = 1,2,...,m
λj ≥ 0 , j = 1,2,...,m      (5.53)

Note that if the problem is one of maximization, or if the constraints are of the type gj ≥ 0, then the λj have to be non-positive in equations (5.53). On the other hand, if the problem is a maximization one with constraints in the form gj ≥ 0, then the λj have to be non-negative.
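A direct numerical verification of conditions (5.53) at a candidate point is often useful in practice. The sketch below is a minimal illustration with a hypothetical objective and a single inequality constraint; the multipliers of the active constraints are recovered by least squares from the stationarity condition, and NumPy is assumed.

    # Minimal numerical check of the Kuhn-Tucker conditions (5.53) at a candidate point.
    import numpy as np

    def kkt_check(grad_f, g_vals, grad_g, tol=1e-8):
        """grad_f: (n,), g_vals: (m,), grad_g: (m, n). Returns (satisfied, lambdas)."""
        active = np.abs(g_vals) <= tol                    # g_j = 0 -> active constraint
        lam = np.zeros(len(g_vals))
        if active.any():
            # Solve grad_f + sum_j lam_j * grad_g_j = 0 over the active set (least squares)
            lam_act, *_ = np.linalg.lstsq(grad_g[active].T, -grad_f, rcond=None)
            lam[active] = lam_act
        stationary = np.allclose(grad_f + lam @ grad_g, 0.0, atol=1e-6)
        feasible = np.all(g_vals <= tol)
        nonneg = np.all(lam >= -tol)                      # lambda_j >= 0 for a minimum
        return stationary and feasible and nonneg, lam

    # Hypothetical problem: minimize x1^2 + x2^2  subject to  g = 1 - x1 - x2 <= 0
    x = np.array([0.5, 0.5])                              # candidate optimum
    print(kkt_check(2.0*x,
                    np.array([1.0 - x[0] - x[1]]),
                    np.array([[-1.0, -1.0]])))            # (True, array([1.]))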

Convex Programming

Any optimization problem stated in the form of equation (5.37) is called a convex programming problem, provided the objective function f(x) and the constraint functions gj(x) are general (smooth) convex functions. The definitions and properties related to convex functions are given in Appendix A.

Suppose that f(x) and gj(x) (j = 1,2,...,m) are convex functions. The Lagrange function of equations (5.39) can be written as

L(x,y,λ) = f(x) + Σ_{j=1}^{m} λj [gj(x) + yj²]      (5.54)

If λj > 0, then λj gj(x) is convex, and since λj yj = 0 from equation (5.43), L(x,y,λ) will be a convex function.

225

Page 242: Department of Naval Architecture, - UniNa STiDuE

5 – Optimization Methods

It has been derived above that a necessary condition for f(x) to have a relative minimum at x∗ is that L(x,y,λ) have a stationary point at x∗. However, if L(x,y,λ) is a convex function, its derivative vanishes at only one point, and hence this point must be an absolute minimum of the function f(x). Thus the Kuhn–Tucker conditions are both necessary and sufficient for an absolute minimum of f(x) at x∗.

To conclude, the following remarks can be made:

• If the given optimization problem is known to be a convex programming problem, there will be no relative minima or saddle points, and hence the extreme point found by applying the Kuhn–Tucker conditions is guaranteed to be an absolute minimum of f(x). However, it is often very difficult to ascertain whether the objective and constraint functions involved in a practical engineering problem are convex.

• The Kuhn–Tucker conditions derived above are based on the development given for equality constraints. One of the requirements for these conditions is that at least one of the Jacobians composed of the m constraints and m of the (n + m) variables (x1,...,xn; y1,...,ym) be nonzero. This requirement is implied in the above derivation.


Appendix A

Convex and Concave Functions

1. Convex Function

A function f(x) is said to be convex if, for any pair of points

x1 = {x1⁽¹⁾, x2⁽¹⁾, ..., xn⁽¹⁾}  and  x2 = {x1⁽²⁾, x2⁽²⁾, ..., xn⁽²⁾}

it results

f[λ x2 + (1 − λ) x1] ≤ λ f(x2) + (1 − λ) f(x1)  for all λ , 0 ≤ λ ≤ 1      (5.55)

that is, if the line segment connecting any two points on the graph lies entirely above or on the graph of f(x).

Figures 5.12(a) and 5.13(a) illustrate a convex function in one and two dimensions, respectively. It can be seen that a convex function always bends upwards, and hence it is apparent that the local minimum of a convex function is also a global minimum.

Figure 5.12. Functions of one variable: (a) convex function; (b) concave function

2. Concave Function

A function f(x) is called a concave function if for any two points x1, x2, it results

f [λx2 + (1− λ)x1] ≥ λ f(x2) + (1− λ) f(x1) for all λ , 0 ≤ λ ≤ 1 (5.56)

that is, if the line segment joining any two points lies entirely below or on the graph of the function between the two points x1, x2.

Figures 5.12(b) and 5.13(b) illustrate a concave function in one and two dimensions, respectively. It can be seen that a concave function bends downwards, and hence the local maximum will also be a global maximum. It can also be seen that the negative of a convex function is a concave function and vice versa. Note, moreover, that the sum of convex functions is a convex function and the sum of concave functions is a concave function.

A function f(x) is strictly convex or strictly concave if strict inequality holds in equation (5.55) or (5.56), respectively, for any x1 ≠ x2. A linear function is both convex and concave, since it satisfies both inequalities (5.55) and (5.56). A function may be convex within a region and concave elsewhere.

Figure 5.13. Functions of two variables: (a) convex function; (b) concave function

It is important to note that the convexity or concavity of a function is defined only when its domain is a convex set. Convex sets are a special class of sets of points in the Euclidean space En, which play an important role in optimization theory.

Testing for convexity and concavity

In addition to the above definition, the following equivalent relations can be used to identify a convex function.

Theorem 7. A function f(x) is convex if, for any pair of points x1 and x2

f(x2) ≥ f(x1) +∇fT (x1)·(x2 − x1)

If f(x) is concave, the opposite type of inequality holds.

Theorem 8. A function f(x) is convex if the Hessian matrix H(x) = [∂²f(x)/∂xi ∂xj] is positive semidefinite.

If H(x) is positive definite, the function f(x) is strictly convex. It can also be proved that if f(x) is concave, the Hessian matrix is negative semidefinite.

Theorem 9. Any local minimum of a convex function f(x) is a global minimum.

This means that a convex function cannot possess a local minimum which is not also global; if the function is strictly convex, the minimum is unique.
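Theorem 8 lends itself to a simple numerical test: assemble the Hessian and inspect its eigenvalues. A minimal sketch, for a hypothetical quadratic function and assuming NumPy, is given below.

    # Hessian test of theorem 8 for the hypothetical function
    # f(x1, x2) = x1^2 + x1*x2 + 2*x2^2
    import numpy as np

    H = np.array([[2.0, 1.0],     # d2f/dx1dx1, d2f/dx1dx2
                  [1.0, 4.0]])    # d2f/dx2dx1, d2f/dx2dx2

    eigenvalues = np.linalg.eigvalsh(H)
    print(eigenvalues)            # both positive
    print("convex:", np.all(eigenvalues >= 0), "strictly convex:", np.all(eigenvalues > 0))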


5.5 Classification of Optimization Techniques

The various techniques available for the solution of optimization problems are grouped under the heading of mathematical programming techniques. They are generally studied as part of operations research, the branch of mathematics concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best, or optimal, solutions.

The classical methods of differential calculus can be used to find unconstrained maxima and minima of a function of several variables. These methods assume that the function is twice differentiable with respect to the design variables and that the derivatives are continuous. When the problem is one of minimization or maximization of an integral, the methods of the calculus of variations can be used. For problems with equality constraints, the Lagrange multiplier method is frequently used; but this method, in general, leads to a set of nonlinear simultaneous equations which may be difficult to solve.

Classification of optimization problems can be based on the nature of the expressions for the objective function and the constraints (see Section 5.1). This classification is extremely useful from the computational viewpoint, since many methods have been developed solely for the efficient solution of a particular class of problems. These are all numerical methods in which an approximate solution is sought by proceeding in an iterative manner, starting from a guess solution. Thus the first task of a designer is to identify the class of the problem encountered. This will, in many cases, dictate the type of solution procedure to adopt.

Geometric Programming

Geometric programming (GP) is an optimization technique applicable to programming problems involving functions of a special mathematical form called posynomials, an adaptation of the word polynomial to the case where all coefficients are positive. A function f(x) is called a posynomial if it can be expressed as the sum of positive terms, each of which is a power function

f(x) = c1 x1^(a11) x2^(a12) ··· xn^(a1n) + ... + cN x1^(aN1) x2^(aN2) ··· xn^(aNn)      (5.57)

where ci and aij are constants with ci > 0, xj > 0 and aij ∈ R. Note, however, that the power-function exponents, which must be non-negative integers for polynomials, can be any real numbers for posynomials.

A geometric programming problem is one in which the objective function and the constraints are expressed as posynomials in x. Thus the GP method is applicable to an optimization problem of the type (Duffin et al., 1967)


Find x which minimizes f(x) = Σ_{i=1}^{No} ci ∏_{j=1}^{n} xj^(pij) , ci > 0 , xj > 0

subject to gj(x) = Σ_{i=1}^{Nj} aij [∏_{k=1}^{n} xk^(qik)] ≤ 0 , aij > 0 , j = 1,2,...,m      (5.58)

where No and Nj denote the number of posynomial terms in the objective and in the jth constraint function, respectively.
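As a small illustration of the posynomial form (5.57), the sketch below evaluates a hypothetical two-term posynomial in two variables; the coefficients and exponents are invented for the example, and NumPy is assumed.

    # Evaluation of a posynomial f(x) = sum_i c_i * prod_j x_j^(a_ij), c_i > 0, x_j > 0
    import numpy as np

    def posynomial(x, c, a):
        """c: (N,) positive coefficients; a: (N, n) real exponents; x: (n,) positive point."""
        return np.sum(c * np.prod(x**a, axis=1))

    c = np.array([2.0, 3.0])                  # hypothetical coefficients
    a = np.array([[0.5, -1.0],                # f = 2*x1^0.5*x2^-1 + 3*x1^-2*x2^1.5
                  [-2.0, 1.5]])
    print(posynomial(np.array([4.0, 2.0]), c, a))   # approximately 2.53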

Linear Programming

If the objective function and all the constraints in equation (5.1) are linear functions of the design variables, the general problem of mathematical programming reduces to linear programming (LP). A linear programming problem is stated in the following standard form (Dantzig, 1963)

Find x which minimizes f(x) = Σ_{i=1}^{n} ci xi

subject to Σ_{k=1}^{n} ajk xk = bj , j = 1,2,...,m

xi ≥ 0 , i = 1,2,...,n      (5.59)

where ci, ajk and bj are constants.

Integer Programming

If some or all of the design variables x1, x2,...,xn of an optimization problem are restricted to take on only integer (or discrete) values, the problem is called an integer programming problem (Hu, 1969). On the other hand, if all the design variables are permitted to take any real values, the optimization problem is called a real-valued programming problem.

Strictly speaking, if in an LP problem one restricts the design vector x to non-negative integers, the problem becomes nonlinear. It is, however, more realistic to call it an integer linear programming problem, because the form of the constraints and of the objective function remains linear if the restrictions on x are ignored.

A systematic method for handling the integer programming problem consists of ignoring the restrictions on x, solving it as an ordinary LP problem, and then introducing additional constraints one by one to cut off the region near the solution point until an integer solution is obtained. Theoretically the method converges, but in practice the number of iterations may be very large. The method also increases the number of constraints, so that even an originally small-sized problem may become very large. When the answers are in the neighbourhood of large integers, the method gives a satisfactory result; but if the answer is in the neighbourhood of small integers, such rounding off may lead to a totally wrong answer.


Nonlinear Programming Problem

If any of the objective or constraint functions in equation (5.1) is nonlinear, the problem is said to be a nonlinear programming (NLP) problem. This is the most general programming problem, and all other problems can be considered as special cases of the nonlinear programming problem.

In general, nonlinear programming presents much greater mathematical difficulties than linearprogramming. Even the case when all the constraints are linear and only the objective functionis nonlinear is often quite complicated.

Quadratic Programming

The quadratic programming (QP) problem is the simplest case of nonlinear programming problem, in which the objective function is quadratic and the constraints are linear. It is usually formulated as follows

Find x which minimizes f(x) = c + Σ_{i=1}^{n} qi xi + Σ_{i=1}^{n} Σ_{j=1}^{n} rij xi xj

subject to Σ_{i=1}^{n} aij xi = bj , j = 1,2,...,m

xi ≥ 0 , i = 1,2,...,n      (5.60)

where c, qi, rij , aij and bj are constants.

Stochastic Programming

A stochastic programming problem is an optimization problem in which some of the design variables and/or preassigned parameters are described by probabilistic (nondeterministic or stochastic) distributions (Sengupta, 1972).

Multiobjective Programming

A multiobjective programming problem can be stated as follows

Find x which minimizes f1(x), f2(x), ..., fk(x)

subject to gj(x) ≤ 0 , j = 1,2,...,m      (5.61)

where f1, f2, ... , fk denote the objective functions to be minimized simultaneously.


Theory of Games

When two or more candidate designs are competing for the achievement of conflicting goals, a competitive situation exists. Generally, in such problems the losses of one candidate signify the gains of the others. Naturally, the objective function depends on a set of controlled as well as uncontrolled variables, where the uncontrolled variables depend on the strategy of the competitor.

Dynamic Programming

The method of dynamic programming (DP) was developed in the 1950s through the work of Bellman, long the doyen of research workers in this field. The essential feature of the method is that a multivariate optimization problem is decomposed into a series of stages, optimization being carried out at each stage with respect to one variable only. Bellman gave it the rather non-descriptive name dynamic programming; a more significant name would be recursive optimization.

Both discrete and continuous problems are amenable to this method, and deterministic as well as stochastic models can be handled. The complexity, however, increases tremendously with the number of constraints: a single-constraint problem is relatively simple, but even a problem with more than two constraints can be formidable.

Network Methods

Networks are familiar diagrams in electrical theory, and they are easily visualized in transportation systems such as roads, railways or pipelines. Networks present a large variety of intricate mathematical problems challenging mathematicians. Many problems, particularly those which involve sequential operations or different but related states or stages, are conveniently described as networks. Sometimes a problem with no such apparent structure assumes a mathematical form which is best understood and solved by interpreting it as a network.

A network, in its more generalized and abstract sense, is called a graph. In recent decades graph theory has found more and more applications in diverse areas. In the field of operations research, graph theory plays a particularly important role, as quite often the problem of finding an optimal solution can be looked upon as the problem of choosing the best sequence of operations out of a finite number of alternatives, which can be represented as a graph.

The critical path method (CPM) and the programme evaluation and review technique (PERT) are network methods useful in planning, scheduling and controlling a project. They belong to the network methods because, in both, the various operations necessary to complete the project and the order in which the operations are to be performed are shown in a graph called a network. CPM is useful for projects in which the durations of the various operations are known exactly, whereas PERT is designed to deal with projects in which there is uncertainty regarding the durations of the various operations.


5.6 Linear Programming

Linear programming is an optimization method applicable to the solution of problems having objective functions and constraints that are all linear functions of the decision variables. The constraint equations in a linear programming problem may be in the form of equalities and inequalities.

The linear programming type of optimization problem was first recognized in the 1930s by economists while developing methods for the optimal allocation of resources. During the Second World War the United States Air Force sought more effective procedures for allocating resources and turned to linear programming. Dantzig (1947), who was a member of the Air Force group, formulated the general linear programming problem and devised the simplex method of solution. This was a significant step in bringing linear programming into wider usage.

Afterwards, much progress was made in the theoretical development and in the practical applications of linear programming. Among all the works, the theoretical contributions made by Kuhn and Tucker had a major impact on the development of the duality theory in linear programming. The works of Charnes and Cooper were responsible for paving the way to the industrial applications of linear programming; their number has been so large that it is possible here to describe only some of them.

One of the early industrial applications of linear programming was made in petroleum refineries. In general, an oil refinery has a choice of buying crude oil from several different sources, with differing compositions and at differing prices, and it can manufacture different products, such as aviation fuel, diesel fuel and gasoline, in varying quantities. The constraints may be due to the restrictions on the quantity of crude oil available from a particular source, the capacity of the refinery to produce a particular product, and so on. A mix of the purchased crude oil and of the manufactured products is sought that gives the maximum profit.

In the food processing industry, linear programming has been used to determine the optimal shipping plan for the distribution of a particular product from the different manufacturing plants to the various warehouses. In the iron and steel industry, linear programming has been used to decide the types of products to be made in the rolling mills so as to maximize the profit. Metal-working industries use linear programming for shop loading and for determining the choice between producing and buying a part. The optimal routing of aircraft and ships, as well as the optimal fleet, can also be decided by using linear programming.

5.6.1 Graphical Representation

The concept of linear programming can be grasped in a preliminary way by observing a graphical solution when the number of variables is three or less. One can graph the set of feasible solutions together with the level sets of the objective function; then it is usually a trivial matter to write down the optimal solution.


To illustrate, consider the following problem:

maximize f = 2 x1 + x2

subject to g1 : x1 + 2 x2 ≤ 8      g2 : 2 x1 ≥ 1
           g3 : x1 − x2 ≤ 3/2      g4 : 2 x2 ≥ 1
           x1 ≥ 0 , x2 ≥ 0      (5.62)

Each constraint (including the nonnegativity constraints on the variables) defines a half-plane. These half-planes can be determined by first graphing the equation obtained by replacing the inequality with an equality. The geometric representation of the feasible space and the contours of f are shown in Figure 5.14. The optimization surface is a plane and, once any contour line for the objective function is drawn, it is clear that the optimum is the vertex of the feasible region through which the line representing the largest value of the objective function can pass; in this case the optimum solution is x1∗ = 3.67, x2∗ = 2.17.

Figure 5.14. The set of feasible solutions together with level sets of the objective function
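The same example can be reproduced numerically. The sketch below uses scipy.optimize.linprog (assumed available), converting the maximization into a minimization of −f and rewriting the 'greater than' constraints in 'less than' form; the solver should return the vertex found graphically.

    # Numerical solution of the example (5.62)
    from scipy.optimize import linprog

    c = [-2.0, -1.0]                       # minimize -(2*x1 + x2)
    A_ub = [[ 1.0,  2.0],                  # g1:  x1 + 2*x2 <= 8
            [-2.0,  0.0],                  # g2: -2*x1      <= -1
            [ 1.0, -1.0],                  # g3:  x1 -  x2  <= 3/2
            [ 0.0, -2.0]]                  # g4:      -2*x2 <= -1
    b_ub = [8.0, -1.0, 1.5, -1.0]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)                 # optimum vertex and maximum value of f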

Some immediate general remarks can be made based on this example. In a linear model, the objective and constraint functions are always monotonic. If equalities exist, one can assume, without loss of generality, that they have been eliminated, explicitly or implicitly, so that the resulting reduced problem will be monotonic. From the first monotonicity principle, there will always be at least one active constraint, identified possibly with the aid of dominance. Subsequent elimination of active constraints will always yield a new monotonic problem. The process will continue as long as activity can be proven, until no variables remain in the objective. The solution reached will usually be at a vertex of the feasible space, which is the intersection of as many active constraint surfaces as there are variables. The only other possibility is to have variables left in the constraints that do not appear in the objective function. In this case, the second monotonicity principle would indicate the existence of an infinite number of optima along the edge or face whose normal matches that of the objective function gradient. The limiting optimal values will be at the corners of the optimal face, which correspond to upper and lower bounds on the variable not appearing in the objective function.

5.6.2 Standard Form of a Linear Programming Problem

In linear programming the objective is always to maximize or to minimize some linear function of the decision variables. The general linear programming problem can be stated in the following standard form:

Scalar form

minimize f(x1,x2,...,xn) = c1 x1 + c2 x2 + ... + cn xn

subject to a11 x1 + a12 x2 + ... + a1n xn = b1
           a21 x1 + a22 x2 + ... + a2n xn = b2
           ...
           am1 x1 + am2 x2 + ... + amn xn = bm

           xi ≥ 0 , i = 1,2,...,n      (5.63)

where cj, bj and aij (i = 1,2,...,m; j = 1,2,...,n) are known constants, and xj are the decision variables.

The linear programming problem in scalar form may also be stated in a compact form by using the summation sign as

minimize f(x1,x2,...,xn) = Σ_{j=1}^{n} cj xj

subject to Σ_{j=1}^{n} aij xj = bi , i = 1,2,...,m

xj ≥ 0 , j = 1,2,...,n      (5.64)

Matrix form

minimize f(x) = cᵀx

subject to a x = b

x ≥ 0      (5.65)

where x = {x1, x2,...,xn} ; b = {b1, b2,...,bm} ; c = {c1, c2,...,cn} and

a = [ a11 a12 ... a1n
      a21 a22 ... a2n
      ...
      am1 am2 ... amn ]

and the superscript T is used to indicate the transpose.


The characteristics of the linear programming problem stated in the standard form are:

• the objective function is of the minimization type;
• all the constraints are of the equality type;
• all the decision variables are nonnegative.

It is shown below that any linear programming problem can be put in the standard form by the use of the following transformations:

1. In LP problems, too, the maximization of a function f(x1,x2,...,xn) is equivalent to the minimization of the negative of the same function. Consequently, the objective function can always be stated in the minimization form in any linear programming problem.

2. In most engineering optimization problems the decision variables represent some physical dimensions, and hence the variables xj have to be nonnegative. However, when a variable is unrestricted in sign (i.e., it can take a positive, negative or zero value), it can be written as the difference of two nonnegative variables. Thus, if xj is unrestricted in sign, it can be written as xj = xj′ − xj″, where

   xj′ ≥ 0 and xj″ ≥ 0

   It can be noticed that xj will be negative, zero or positive depending on whether xj″ is greater than, equal to or less than xj′.

3. If a constraint appears in the form of a ‘less than’ type of inequality as

ak1x1 + ak2x2 + . . . + aknxn ≤ bk

it can be converted into the equality form by adding a nonnegative slack variable xn+1 as follows

ak1x1 + ak2x2 + . . . + aknxn + xn+1 = bk

Similarly, if the constraint is in the form of a ‘greater than’ type of inequality as

ak1x1 + ak2x2 + . . . + aknxn ≥ bk

it can be converted into the equality form by subtracting a variable as

ak1x1 + ak2x2 + . . . + aknxn − xn+1 = bk

where xn+1 is a nonnegative variable known as the surplus variable.
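The third transformation is easy to mechanize. The sketch below, with hypothetical coefficient data and assuming NumPy, appends one slack or surplus column per inequality row to produce the equality constraints of the standard form.

    # Convert '<=' and '>=' rows to equalities by adding slack / surplus columns
    import numpy as np

    def to_standard_form(A, b, senses):
        """senses[i] is '<=', '>=' or '=' for row i; returns (A_std, b) with extra columns."""
        A = np.asarray(A, dtype=float)
        extra = []
        for i, s in enumerate(senses):
            if s == '<=':
                col = np.zeros(len(senses)); col[i] = 1.0    # slack variable
                extra.append(col)
            elif s == '>=':
                col = np.zeros(len(senses)); col[i] = -1.0   # surplus variable
                extra.append(col)
        if extra:
            A = np.hstack([A, np.column_stack(extra)])
        return A, np.asarray(b, dtype=float)

    A_std, b_std = to_standard_form([[1, 2], [3, 1]], [8, 6], ['<=', '>='])
    print(A_std)     # [[ 1.  2.  1.  0.]  [ 3.  1.  0. -1.]]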

A set of specific values {x1, x2,...,xn} for the decision variables is called a solution. A solution is called feasible if it satisfies all the constraints. One should assume that m < n, for if m > n there would be (m − n) redundant equations which could be eliminated. The case n = m is of no interest, for then there is either a unique solution x which satisfies the constraints in equations (5.63), in which case there can be no optimization, or no feasible solution, in which case at least one constraint is contradicted. The case m < n corresponds to an underdetermined set of linear equations which, if consistent, has an infinite number of solutions. The problem of linear programming is to find, among these, a solution satisfying equations (5.64) or (5.65) and yielding the minimum of f.


5.6.3 Definitions and Theorems

The geometrical characteristics of linear programming problems can be proved mathematically. Some of the more powerful methods for solving linear programming problems take advantage of these characteristics. The terminology used in linear programming and some of the most important related theorems are considered below.

1. Point in n–dimensional space

A point x in an n-dimensional space is characterized by an ordered set of n values or coordinates (x1, x2,...,xn). The coordinates of x are also called the components of x.

2. Line segment in n-dimensions

If the coordinates of two points A and B are given by xj⁽¹⁾ and xj⁽²⁾ (j = 1,2,...,n), the line segment L joining these points is the collection of points x(λ) whose coordinates are given by xj = λ xj⁽¹⁾ + (1 − λ) xj⁽²⁾, j = 1,2,...,n, with 0 ≤ λ ≤ 1. Thus

L = {x | x = λ x⁽¹⁾ + (1 − λ) x⁽²⁾}      (5.66)

In one dimension, for example, it is easy to see from Figure 5.15 that the definition is inaccordance with experience

x(λ) − x(1) = λ (x(2) − x(1)) , 0 ≤ λ ≤ 1 (5.67)

whence

x(λ) = (1− λ) x(1) + λx(2) (5.68)

Figure 5.15. A line segment

3. Hyperplane

In n–dimensional space, the set of points whose coordinates satisfy a linear equation

a1 x1 + a2 x2 + ... + an xn = aᵀx = b      (5.69)

is called a hyperplane.

Thus, the hyperplane H is given by

H(a,b) = {x | aTx = b} (5.70)

A hyperplane has (n − 1) dimensions in the n-dimensional space En. For example, in a three-dimensional space it is a plane, and in a two-dimensional space it is a line. The set of points whose coordinates satisfy a linear inequality like a1 x1 + ... + an xn ≤ b is called a closed half-space, closed due to the inclusion of the equality sign in the above inequality.


A hyperplane partitions En into two closed half–spaces so that

H+ = {x |aTx ≥ b} (5.71)

and

H− = {x |aTx ≤ b} (5.72)

This is illustrated in Figure 5.16 in the case of a two–dimensional space (E2).

Figure 5.16. Hyperplane in two dimensions

4. Convex set

A convex set is a set of points such that if x(1) and x(2) are any two points in the set, theline segment joining them is also in the set .

If S denotes the convex set, it can be defined mathematically as follows:

if x(1) ,x(2) ∈ S , then x ∈ S

where

x = αx(2) + (1− α)x(1) , 0 ≤ α ≤ 1

By convention, a set containing only one point is considered convex. Some examples of convex sets in two dimensions are shown shaded in Figure 5.17.

Figure 5.17. Convex sets

On the other hand, the sets depicted by the shaded regions in Figure 5.18 are not convex. The L-shaped zone, for example, is not a convex set because it is possible to find two points a and b in the set such that not all points on the line joining them belong to the set.


Figure 5.18. Nonconvex sets

5. Convex polyhedron

A convex polyhedron is a set of points common to one or more half–spaces.

In particular, a convex polygon is the intersection of one or more half planes. Thus, Figure5.19(a) shows a 2D convex polygon, while Figure 5.19(b) represents a 3D convex polyhedron.

6. Vertex (extreme point)

A vertex is a point in the convex set which does not lie on a line segment joining two otherpoints of the set .

Thus, for example, every point on the circumference of a circle and each corner point of apolygon can be called a vertex or extreme point.

Figure 5.19. Convex polyhedron

7. Feasible solution

In a linear programming problem, any solution which satisfies the constraints

ax = b for x ≥ 0 (5.73)

is called a feasible solution.

8. Basic solution

It is a solution in which (n−m) variables are set equal to zero.

The basic solution can be obtained by setting (n − m) variables to zero and solving theconstraint equations (5.73) simultaneously.


9. Basis

The set of variables not set equal to zero to obtain the basic solution is the basis.

10. Basic feasible solution

It is a basic solution which satisfies the nonnegativity conditions of equation (5.65)

11. Non–degenerate basic feasible solution

It is a basic feasible solution which has exactly m positive xi.

12. Optimal solution

An optimal solution is a feasible solution which optimizes the objective function.

13. Optimal basic solution

It is a basic feasible solution for which the objective function is optimal .

Basic theorems in linear programming.

Theorem 10. The intersection of any number of convex sets is also convex .

Physically, the theorem states that if there are a number of convex sets represented by R1, R2,. . ., then the set of points R common to all these sets will also be convex. Figure 5.20 illustratesthe meaning of this theorem for the case of two convex sets.

Figure 5.20. Intersection of two convex sets

Theorem 11. The feasible region of a linear programming problem is convex .

Theorem 12. Any local minimum solution is global for a linear programming problem.

Theorem 13. Every basic feasible solution is an extreme point of the convex set of feasiblesolutions.


Theorem 14. Let S be a closed, bounded convex polyhedron with xi, i = 1,2,...,p, as the set of its extreme points. Then any vector x ∈ S can be written as

x = Σ_{i=1}^{p} λi xi  with  λi ≥ 0 ,  Σ_{i=1}^{p} λi = 1

Theorem 15. Let S be a closed convex polyhedron. Then the minimum of a linear function over S is attained at an extreme point of S.

5.6.4 Solution of a System of Linear Simultaneous Equations

Before studying the most general method of solving a linear programming problem, it will be useful to review the methods of solving a system of linear equations. Hence some of the elementary concepts of linear equations are reviewed below.

Particular Case

Consider the following square system of n equations in n unknowns

a11 x1 + a12 x2 + ... + a1n xn = b1      (E1)
a21 x1 + a22 x2 + ... + a2n xn = b2      (E2)
a31 x1 + a32 x2 + ... + a3n xn = b3      (E3)
...
an1 x1 + an2 x2 + ... + ann xn = bn      (En)      (5.74)

or, in matrix form,

A X = B      (5.75)

It is assumed that the reader is familiar with the definition of the inverse of a square matrix; recall that the inverse of A, denoted A⁻¹, is defined only when the determinant of A, denoted |A|, is nonzero. When |A| = 0 the matrix A is said to be singular, and when |A| ≠ 0 it is called nonsingular.

Assuming that this set of equations possesses a unique solution, one way of solving the systemconsists of reducing the equations to a form known as canonical form.

It is well known from elementary algebra that the solution of equations (5.74) is not altered under the following elementary operations: (i) any equation Er is replaced by the equation k Er, where k is a nonzero constant, and (ii) any equation Er is replaced by the equation Er + k Es, where Es is any other equation of the system. By making use of these elementary operations, the system of equations (5.74) can be reduced to a convenient equivalent form as follows. Let us select some variable xi and try to eliminate it from all the equations except the jth one (for which aji is nonzero). This can be accomplished by dividing the jth equation by aji and subtracting aki times the result from each of the other equations, k = 1,2,...,j−1, j+1,...,n.


The resulting system of equations can be written as

a′11 x1 + a′12 x2 + ... + a′1,i−1 xi−1 + 0·xi + a′1,i+1 xi+1 + ... + a′1n xn = b′1
a′21 x1 + a′22 x2 + ... + a′2,i−1 xi−1 + 0·xi + a′2,i+1 xi+1 + ... + a′2n xn = b′2
...
a′j−1,1 x1 + ... + a′j−1,i−1 xi−1 + 0·xi + a′j−1,i+1 xi+1 + ... + a′j−1,n xn = b′j−1
a′j1 x1 + a′j2 x2 + ... + a′j,i−1 xi−1 + 1·xi + a′j,i+1 xi+1 + ... + a′jn xn = b′j
a′j+1,1 x1 + ... + a′j+1,i−1 xi−1 + 0·xi + a′j+1,i+1 xi+1 + ... + a′j+1,n xn = b′j+1
...
a′n1 x1 + a′n2 x2 + ... + a′n,i−1 xi−1 + 0·xi + a′n,i+1 xi+1 + ... + a′nn xn = b′n      (5.76)

where the primes indicate that the a′ij and b′j are changed from the original system. This procedure of eliminating a particular variable from all but one equation is called a pivot operation. The system of equations (5.76) produced by the pivot operation has exactly the same solution as the original set of equations (5.74); that is, the x which satisfies equations (5.74) satisfies equations (5.76), and vice versa.

In the next step, if one takes the system of equations (5.76) and performs a new pivot operation by eliminating xs, s ≠ i, in all the equations except the tth equation, t ≠ j, the zeroes and the 1 in the ith column will not be disturbed. These pivotal operations can be repeated, using a different variable and a different equation each time, until the system of equations (5.74) is reduced to the form

1·x1 + 0·x2 + 0·x3 + ... + 0·xn = b″1
0·x1 + 1·x2 + 0·x3 + ... + 0·xn = b″2
0·x1 + 0·x2 + 1·x3 + ... + 0·xn = b″3
...
0·x1 + 0·x2 + 0·x3 + ... + 1·xn = b″n      (5.77)

The system of equations (5.77) is said to be in canonical form and has been obtained after carrying out n pivot operations. From the canonical form, the solution vector can be directly obtained as

xi = b″i , i = 1,2,...,n      (5.78)

Since the set of equations (5.77) has been obtained from equations (5.74) only through elementary operations, the system of equations (5.77) is equivalent to the system of equations (5.74). Thus the solution given in equation (5.78) is the desired solution of equations (5.74).
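The sequence of pivot operations leading to the canonical form (5.77) is exactly a Gauss-Jordan reduction. A minimal sketch, with hypothetical data, assuming NumPy and with no safeguards against zero pivots, is given below.

    # Repeated pivot operations reducing A x = b to the canonical form (5.77)
    import numpy as np

    def gauss_jordan(A, b):
        T = np.hstack([np.asarray(A, float), np.asarray(b, float).reshape(-1, 1)])
        n = T.shape[0]
        for i in range(n):                       # pivot on element (i, i)
            T[i] = T[i] / T[i, i]                # divide the pivot row by the pivot
            for k in range(n):
                if k != i:
                    T[k] = T[k] - T[k, i]*T[i]   # eliminate x_i from every other row
        return T[:, -1]                          # the right-hand side b'' is the solution (5.78)

    print(gauss_jordan([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))   # [1. 3.]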


Pivotal Reduction of a General System of Equations

Instead of a square system, let us consider a system of m equations in n variables, with n > m

a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
...
am1 x1 + am2 x2 + ... + amn xn = bm      (5.79)

This system of equations is assumed to be consistent, so that it has at least one solution. The solution vectors x which satisfy the system are not evident from equations (5.79). However, it is possible to reduce this system to an equivalent canonical system from which at least one solution can be readily deduced. If pivotal operations with respect to any m variables, say x1, x2,...,xm, are carried out, the resulting set of equations can be written as follows

1·x1 + 0·x2 + ... + 0·xm + a″1,m+1 xm+1 + ... + a″1n xn = b″1
0·x1 + 1·x2 + ... + 0·xm + a″2,m+1 xm+1 + ... + a″2n xn = b″2
...
0·x1 + 0·x2 + ... + 1·xm + a″m,m+1 xm+1 + ... + a″mn xn = b″m      (5.80)

One special solution which can always be deduced from the system of equations (5.80) is

xi = b″i , i = 1,2,...,m
xi = 0 , i = m+1, m+2,...,n      (5.81)

This solution is called a basic solution, since the solution vector contains no more than m nonzero terms. The pivotal variables xi (i = 1,2,...,m) are called basic variables, whereas the remaining variables xi (i = m+1, m+2,...,n) are called non-pivotal, or independent, or nonbasic variables. Of course, the basic solution is not the only solution, but it is the one most readily deduced from equations (5.80). If all the b″i in the solution given by equations (5.81) are nonnegative, the solution satisfies all the constraints in equations (5.63), and hence it can be called a basic feasible solution.

It is possible to obtain the other basic solutions from the canonical system of equations (5.80). One can perform an additional pivotal operation on the system after it is in canonical form, using a″pq (which is nonzero) as the pivot term, with q > m, and using any row p (among 1,2,...,m). The new system will still be in canonical form, but with xq as the pivotal variable in place of xp. The variable xp, which was a basic variable in the original canonical form, will no longer be a basic variable in the new canonical form. This new canonical system yields a new basic solution (which may or may not be feasible) similar to that of equations (5.81). It is to be noted that the values of all the basic variables change, in general, as one goes from one basic solution to another, but only one zero variable (which is nonbasic in the original canonical form) becomes nonzero (and basic in the new canonical system) and vice versa.


5.6.5 Why the Simplex Method?

Given a system in canonical form corresponding to a basic solution, it has been shown how to move to a neighbouring basic solution by a pivot operation. Thus, one way to find the optimal solution of a linear programming problem is to generate all the basic solutions and pick the one which is feasible and corresponds to the optimal value of the objective function. This can be done because the optimal solution, if one exists, always occurs at an extreme point or vertex of the feasible domain. If there are m equality constraints in n variables, with n > m, a basic solution can be obtained by setting (n − m) of the variables equal to zero. The number of basic solutions to be inspected is thus equal to the number of ways in which m variables can be selected from a group of n variables, i.e.

n! / [(n − m)! m!]

For example, if n = 10 and m = 5 there are 252 basic solutions, and if n = 20 and m = 10 one gets 184,756 basic solutions. Usually one does not have to inspect all these basic solutions, since many of them will be infeasible. However, for large n and m this is still a very large number to inspect one by one, so that it is not practical to find the solution of the LP problem in this way. The difficulty arises because the analysis tools are not well suited to handling inequalities. Hence, what one really needs is a computational scheme that examines a sequence of basic feasible solutions, each of which corresponds to a lower value of the objective function f, until a minimum is reached. Numerical methods which compute the solution for given numerical values of the aij and bj, with a finite number of variables and constraints, have been developed. The most general and widely used of these methods is called the simplex method.
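These counts are simply binomial coefficients, as the short check below shows.

    from math import comb

    print(comb(10, 5))     # 252
    print(comb(20, 10))    # 184756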

The simplex method of Dantzig provides an algorithm⁴ for obtaining a basic feasible solution; if the solution is not optimal, the method provides for finding a neighbouring basic feasible solution which has a lower or equal value of f. The process is repeated until, in a finite number of steps, an optimum is found.

The first step involved in the simplex method is to construct an auxiliary problem by introducing certain variables, known as artificial variables, into the standard form of the linear programming problem. The primary aim of adding the artificial variables is to bring the resulting auxiliary problem into a canonical form from which a basic feasible solution can be immediately obtained. Starting from this canonical form, the optimal solution of the original linear programming problem is sought in two phases. The first phase is intended to find a basic feasible solution of the original linear programming problem. It consists of a sequence of pivot operations which produces a succession of different canonical forms from which the optimal solution of the auxiliary problem can be found; this also yields a basic feasible solution, if one exists, of the original linear programming problem. The second phase is intended to find the optimal solution of the original linear programming problem. It consists of a second sequence of pivot operations which enables one to move from one basic feasible solution of the original linear programming problem to the next. In this process, the optimal solution of the problem, if one exists, will be identified. The sequence of different canonical forms that is necessary in both phases of the simplex method is generated according to the simplex algorithm described below. That is, the simplex algorithm forms the kernel of the simplex method.

⁴An algorithm is a rule of procedure usually involving the repetitive application of an operation. The word is derived from the Arabic Al Khwarizmi (after the mathematician of the same name, about A.D. 825), which in Old French became algorismus and in Middle English algorism.

5.6.6 Simplex Algorithm

The starting point of the simplex algorithm is always a set of equations which includes the objective function along with the equality constraints of the problem in canonical form. The objective of the simplex algorithm is thus to find the vector x ≥ 0 which minimizes the function f(x) and satisfies the system of equations

1·x1 + 0·x2 + ... + 0·xm + a″1,m+1 xm+1 + ... + a″1n xn = b″1
0·x1 + 1·x2 + ... + 0·xm + a″2,m+1 xm+1 + ... + a″2n xn = b″2
...
0·x1 + 0·x2 + ... + 1·xm + a″m,m+1 xm+1 + ... + a″mn xn = b″m
0·x1 + 0·x2 + ... + 0·xm − f + c″m+1 xm+1 + ... + c″n xn = −f″o      (5.82)

where a″ij, c″j, b″j and f″o are constants. Notice that (−f) is treated as a basic variable in the canonical form of equations (5.82). The basic solution which can be readily deduced from equations (5.82) is

xi = b″i , i = 1,2,...,m
f = f″o
xi = 0 , i = m+1, m+2,...,n      (5.83)

If this basic solution is also feasible, the values of xi (i = 1,2,...,n) are non-negative and hence

b″i ≥ 0 , i = 1,2,...,m

In the first phase of the simplex method, the basic solution corresponding to the canonical form obtained after the introduction of the artificial variables must be feasible for the auxiliary problem. As stated earlier, the second phase of the simplex method starts with a basic feasible solution of the original linear programming problem. Hence the initial canonical form at the start of the simplex algorithm is always a basic feasible solution.

It is known from theorem 15 that the optimal solution of an LP problem lies at one of the basic feasible solutions. Since the simplex algorithm is intended to move from one basic feasible solution to another through pivotal operations, it must be verified that the present basic feasible solution is not the optimal solution before moving to the next one. By merely glancing at the numbers c″j (j = m+1, m+2,...,n), it is possible to ascertain whether the present basic feasible solution is optimal or not.


Identifying an optimal point (theorem 16)

A basic feasible solution is an optimal solution, with a minimum objective function value of f″o, if all the cost coefficients c″j (j = m+1, m+2,...,n) in equations (5.82) are non-negative.

A glance at the c″j can also show whether there are multiple optima. Let all c″j > 0 for j = m+1, m+2,...,n with j ≠ k, and let c″k = 0 for some nonbasic variable xk. Then, if the constraints allow that variable to be made positive (from its present value of zero), no change in f results, and there are multiple optima. It is possible, however, that the variable may not be allowed by the constraints to become positive; this may occur in the case of degenerate solutions. Thus, as a corollary to the above discussion, one can state that a basic feasible solution is the unique optimal feasible solution if c″j > 0 for all nonbasic variables xj (j = m+1, m+2,...,n).

If, after testing for optimality, the current basic feasible solution is found to be non-optimal, an improved basic solution is obtained from the present canonical form as follows.

Improving a non-optimal basic feasible solution

From the last row of equations (5.82), the objective function can be written as

f = f″o + Σ_{i=1}^{m} c″i xi + Σ_{j=m+1}^{n} c″j xj = f″o      (5.84)

for the solution given by equations (5.83).

If at least one c″j is negative, the value of f can be reduced by making the corresponding xj > 0. In other words, the nonbasic variable xj for which the cost coefficient c″j is negative is to be made a basic variable, in order to reduce the value of the objective function. At the same time, owing to the pivotal operation, one of the current basic variables becomes nonbasic, and hence the values of the new basic variables are to be adjusted in order to bring the value of f below f″o. If more than one c″j is negative, the index s of the nonbasic variable xs which is to be made basic is chosen such that

c″s = min { c″j : c″j < 0 }      (5.85)

Although this may not lead to the greatest possible decrease in f (since it may not be possible to increase xs very far), this is intuitively at least a good rule for choosing the variable to become basic. It is the one generally used in practice because it is simple and usually leads to fewer iterations than just choosing any c″j < 0. If there is a tie in applying (5.85), i.e. if more than one c″j has the same minimum value, one of them is selected as c″s arbitrarily.

Having decided on the variable xs to become basic, it is increased from zero, holding all other nonbasic variables at zero and observing the effect on the current basic variables. By equations (5.82), these are related as


x1 = b″1 − a″1s xs , b″1 ≥ 0
x2 = b″2 − a″2s xs , b″2 ≥ 0
...
xm = b″m − a″ms xs , b″m ≥ 0      (5.86)

f = f″o + c″s xs , c″s < 0      (5.87)

Since c″s < 0, equation (5.87) suggests that the value of xs should be made as large as possible in order to reduce the value of f as much as possible. However, in the process of increasing the value of xs, some of the variables xi (i = 1,2,...,m) in equations (5.86) may become negative. It can be seen that if all the coefficients a″is ≤ 0, then xs can be made infinitely large without making any xi < 0. In such a case the minimum value of f is minus infinity, and the linear programming problem is said to have an unbounded solution.

On the other hand, if at least one a″is is positive, the maximum value that xs can take without making any xi negative is b″i/a″is. If more than one a″is is positive, the largest value x∗s that xs can take is given by the minimum of the ratios b″i/a″is for which a″is > 0. Thus

x∗s = b″r/a″rs = min_{a″is > 0} ( b″i/a″is )      (5.88)

The choice of r in the case of a tie, assuming that all b″i > 0, is arbitrary. If any b″i for which a″is > 0 is zero in equations (5.86), then xs cannot be increased by any amount; such a solution is called a degenerate solution.

In the case of a non-degenerate basic feasible solution, a new basic feasible solution can be constructed with a lower value of the objective function as follows. By substituting the value of x∗s given by equation (5.88) into equations (5.86) and (5.87), one obtains

xs = x∗s

xi = b′′i − a

′′isx

∗s , i = 1,2, . . . ,m and i 6= r

xr = 0

xj = 0 , j = m + 1,m + 2. . . . ,n and j 6= s

(5.89)

f = f′′o + c

′′sx∗s ≤ f

′′o (5.90)

which can readily be seen to be a feasible solution different from the previous one. Since a′′rs > 0

in equations (5.88), a single pivot operation on the element a′′rs in the system of equations (5.82)

will lead to a new canonical form from which the basic feasible solution of equations (5.89) caneasily be deduced. Also, equation (5.90) shows that this basic feasible solution corresponds to alower objective function value compared to that of equations (5.83). This basic feasible solutioncan again be tested for optimality by seeing whether all c

′′i > 0 in the new canonical form. If

the solution is not optimal, the whole procedure of moving to another basic feasible solution


In the simplex algorithm, this procedure is repeated iteratively until the algorithm finds either (i) a class of feasible solutions for which f → −∞, or (ii) an optimal basic feasible solution with all c′′j > 0. Since there are only a finite number of ways to choose a set of m basic variables out of n variables, the iterative process of the simplex algorithm terminates in a finite number of cycles. The iterative process of the simplex algorithm is shown as a flowchart in Figure 5.21.

Figure 5.21. Searching the optimal solution by the simplex algorithm
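
The test-and-improve cycle of Figure 5.21 is easy to express in code. The following Python sketch (a hypothetical illustration using NumPy, not a program from this text) performs the optimality test, the choice of the entering variable (5.85), the ratio test (5.88) and the pivot on a′′rs for a problem already given in canonical form with non-negative right-hand sides; the arrays are assumed to be floating point and basis is a mutable list of the current basic column indices.

import numpy as np

def simplex_iterate(a, b, c, f0, basis):
    # a: (m, n) coefficients a''_ij of the canonical form (5.82)
    # b: (m,) right-hand sides b''_i >= 0
    # c: (n,) relative cost coefficients c''_j (zero in the basic columns)
    # f0: current objective value f''_o; basis: indices of the m basic variables
    m, n = a.shape
    while True:
        s = int(np.argmin(c))                       # entering variable, eq. (5.85)
        if c[s] >= 0:
            return a, b, c, f0, basis, 'optimal'    # all c''_j >= 0: solution is optimal
        if np.all(a[:, s] <= 0):
            return a, b, c, f0, basis, 'unbounded'  # x_s can grow indefinitely, f -> -inf
        ratios = np.full(m, np.inf)
        pos = a[:, s] > 0
        ratios[pos] = b[pos] / a[pos, s]
        r = int(np.argmin(ratios))                  # leaving row by the ratio test, eq. (5.88)
        piv, cs = a[r, s], c[s]
        a[r, :] /= piv                              # pivot on a''_rs ...
        b[r] /= piv
        for i in range(m):
            if i != r:
                factor = a[i, s]
                a[i, :] -= factor * a[r, :]
                b[i] -= factor * b[r]
        c -= cs * a[r, :]                           # ... and reduce the objective row
        f0 += cs * b[r]                             # f''_o + c''_s x*_s, eq. (5.90)
        basis[r] = s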

5.6.7 Phases of the Simplex Method

The problem is to find non–negative values for the variables x1,x2, . . . ,xn which satisfy the constraint equations

a11x1 + a12x2 + . . . + a1nxn = b1

a21x1 + a22x2 + . . . + a2nxn = b2

. . .
am1x1 + am2x2 + . . . + amnxn = bm

(5.91)

248

Page 265: Department of Naval Architecture, - UniNa STiDuE

5.6 – Linear Programming

and minimize the objective function given by

f = c1x1 + c2x2 + . . . + cnxn (5.92)

The two–phase simplex method can be used to solve this problem. The difficulties encountered in solving this problem are:

• an initial feasible canonical form may not be readily available; this is the case when the linear programming problem does not have slack variables for some of the equations, or when the slack variables have negative coefficients;

• the problem may have redundancies and/or inconsistencies, and may not be solvable in non–negative numbers.

The first phase of the simplex method uses the simplex algorithm itself to find whether the linear programming problem has a feasible solution. If a feasible solution exists, it provides a basic feasible solution in canonical form, ready to initiate the second phase of the method.

The second phase, in turn, uses the simplex algorithm to find whether the problem has a bounded optimum. If a bounded optimum exists, it finds the basic feasible solution which is optimal.

The simplex method is described in the following steps.

1. Arrange the original system of equations (5.91) so that all constant terms bi are positive or zero by changing, where necessary, the signs on both sides of any of the equations.

2. Introduce to this system a basic set of artificial variables y1,y2, . . . ,ym, where each yi ≥ 0, so that it becomes

a11x1 + a12x2 + . . . + a1nxn + y1 = b1

a21x1 + a22x2 + . . . + a2nxn + y2 = b2

. . .
am1x1 + am2x2 + . . . + amnxn + ym = bm

(5.93)

with bi ≥ 0.

The objective function of equation (5.92) can be written as

c1x1 + c2x2 + . . . + cnxn + (−f) = 0 (5.94)

3. First phase of the method. Define a quantity w as the sum of the artificial variables

w = y1 + y2 + . . . + ym (5.95)

and use the simplex algorithm to find xi ≥ 0 (i = 1,2, . . . ,n) and yi ≥ 0 (i = 1,2, . . . ,m) which minimize w and satisfy equations (5.93) and (5.94).

249

Page 266: Department of Naval Architecture, - UniNa STiDuE

5 – Optimization Methods

Consequently, consider the array

a11x1 + a12x2 + . . . + a1nxn + y1 = b1

a21x1 + a22x2 + . . . + a2nxn + y2 = b2

. . .
am1x1 + am2x2 + . . . + amnxn + ym = bm

c1x1 + c2x2 + . . . + cnxn + (−f) = 0

y1 + y2 + . . . + ym + (−w) = 0

(5.96)

This array is not in canonical form; however, it can be rewritten as a canonical system with basic variables y1,y2, . . . ,ym, −f and −w by subtracting the sum of the first m equations from the last one to obtain the new system

a11x1 + a12x2 + . . . + a1nxn + y1 = b1

a21x1 + a22x2 + . . . + a2nxn + y2 = b2

. . .
am1x1 + am2x2 + . . . + amnxn + ym = bm

c1x1 + c2x2 + . . . + cnxn + (−f) = 0

d1x1 + d2x2 + . . . + dnxn + (−w) = −wo

(5.97)

where

di = −(a1i + a2i + . . . + ami) for i = 1,2, . . . ,n (5.98)

and

−wo = −(b1 + b2 + . . . + bm) (5.99)

Equations (5.97) provide the initial basic feasible solution that is necessary for starting the first phase.

4. The quantity w is called the infeasibility form and has the property that if, as a result of the first phase, the minimum of w is greater than zero, then no feasible solution exists for the original linear programming problem stated in equations (5.93) and (5.94), and the procedure is terminated. On the other hand, if the minimum of w is zero, the resulting array will be in canonical form; the second phase is then initiated by eliminating the w equation from the array, together with the columns corresponding to each of the artificial variables y1,y2, . . . ,ym.

5. Second phase of the method. Apply the simplex algorithm to the adjusted canonical system obtained at the end of the first phase to find a solution, if a finite one exists, which optimizes the value of f.

The flowchart for the two-phase simplex method is given in Figures 5.22 and 5.23, which together form a single diagram and are to be read in sequence.
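
The bookkeeping of equations (5.96)-(5.99) is purely mechanical, so the first phase can be set up directly from the problem data. The following Python sketch (a hypothetical illustration with NumPy; names such as phase_one_setup are not from this text) builds the artificial columns of (5.93) and the w-row coefficients of (5.98)-(5.99); the resulting canonical system would then be handed to the pivoting routine of the simplex algorithm, and Phase II starts only if the minimum of w turns out to be zero.

import numpy as np

def phase_one_setup(A, b, c):
    # A: (m, n) coefficients a_ij, b: (m,) constants b_i, c: (n,) costs c_j of (5.91)-(5.92)
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    c = np.array(c, dtype=float)
    # step 1: make every right-hand side non-negative
    neg = b < 0
    A[neg, :] *= -1.0
    b[neg] *= -1.0
    m, n = A.shape
    # step 2: append the artificial variables y_1, ..., y_m of eq. (5.93)
    A_art = np.hstack([A, np.eye(m)])
    # step 3: w-row of the canonical system (5.97), eqs. (5.98)-(5.99)
    d = -A.sum(axis=0)          # coefficients d_i of x_1, ..., x_n in the w-row
    w0 = b.sum()                # constant w_o (the initial value of w)
    return A_art, b, c, d, w0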


Figure 5.22. Flowchart for the two phase simplex method (A)


Figure 5.23. Flowchart for the two phase simplex method (B)


5.7 NLP: One–Dimensional Minimization Methods

Section 5.4 has shown that if the expressions for the objective function and the constraints are fairly simple in terms of the design variables, the classical methods of optimization can be used to solve the problem. On the other hand, if the optimization problem involves objective functions and/or constraints which are nonlinear, which are not stated as explicit functions of the design variables, or which are too complicated to manipulate, it cannot be solved by the classical analytical methods. Many engineering design problems possess this characteristic, and it is then necessary to resort to numerical, nonlinear optimization methods.

The basic philosophy of most of the numerical methods of optimization is to produce a sequence of improved approximations to the optimum according to the following scheme:

1. start with an initial trial point xi;

2. find a suitable direction Si (i = 1 to start with) which points in the direction of the minimum;

3. find an appropriate step length λ∗i for movement along the feasible direction Si;

4. obtain the new approximation xi+1 as

xi+1 = xi + λ∗i Si      (5.100)

5. test whether xi+1 is optimum; if xi+1 is optimum, stop the procedure; otherwise, set i = i + 1 and repeat from step 2.

The iterative procedure indicated by equation (5.100) is valid for unconstrained as well as constrained optimization problems. The procedure is graphically represented for a hypothetical two–variable problem in Figure 5.24.

Figure 5.24. The iterative process of optimization

From equation (5.100) it can be seen that the efficiency of an optimization method depends on the efficiency with which the quantities λ∗i and Si are determined. The methods of finding the step length λ∗i are considered in this section, whereas the methods of finding Si are considered in the next two sections.


If f(x) is the objective function to be minimized, the problem of finding λ∗i comes down to finding the value λi = λ∗i which minimizes f(xi+1) = f(xi + λi Si) = f(λi) for fixed values of xi and Si. Since f becomes a function of the single variable λi, the methods of finding λ∗i in equation (5.100) are called one-dimensional minimization methods.

Section 5.4 demonstrated that the differential calculus method of optimization is an analytical approach and is applicable to continuous, twice–differentiable functions. In that method, the calculation of the numerical value of the objective function is virtually the last step of the process: the optimal value of the objective function is calculated after determining the optimal values of the decision variables. In the numerical methods of optimization, the opposite procedure is followed: the values of the objective function are first found at various combinations of the decision variables, and conclusions are then drawn regarding the optimal solution.

Several methods are available for solving the one–dimensional minimization problem. They can be classified as illustrated in Table 5.2.

Elimination Methods          Interpolation Methods
                             Requiring no derivatives     Requiring derivatives
Unrestricted search          - quadratic                  - cubic
Exhaustive search                                         - direct root
Dichotomous search
Fibonacci method
Golden section method

Table 5.2. One–dimensional numerical minimization methods

The elimination methods can be used for the minimization of even discontinuous functions. The quadratic and the cubic interpolation methods involve polynomial approximations to the given function. The direct root method interpolates the derivatives of the function linearly.

5.7.1 Elimination Methods

Fibonacci Method

The Fibonacci method can be used to find the minimum of a function of one variable even if the function is not continuous. The method, like many other elimination methods, has the following limitations:

• the initial interval of uncertainty, in which the optimum lies, has to be known;

• the function being optimized has to be unimodal, i.e. it must have only one peak (maximum or minimum) in the initial interval of uncertainty;

• the exact optimum cannot be located by this method; only an interval, known as the final interval of uncertainty, will be known; the final interval can be made as small as desired by making more computations;


• the number of function evaluations to be used in the search, or the resolution required, has to be specified beforehand.

This method makes use of the sequence of Fibonacci numbers, {Fn}, for placing the experiments. These numbers are defined as

F0 = F1 = 1

Fn = Fn−1 + Fn−2 , n = 2,3,4, . . .

(5.101)

yielding the sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, .....

Procedure

Let L0 be the initial interval of uncertainty, defined by a ≤ x ≤ b, and n the total number of experiments to be considered. Define

L∗2 = (Fn−2/Fn) L0      (5.102)

and place the first two experiments at the points x1 and x2, which are located at a distance of L∗2 from each end of L0 (see notes 6 and 7 below). This gives

x1 = a + L∗2
x2 = b − L∗2 = a + (Fn−1/Fn) L0      (5.103)

Discard some part of the interval by using the unimodality assumption. There then remains a smaller interval of uncertainty L2 (see note 8 below), given by

L2 = L0 − L∗2 = L0 (1 − Fn−2/Fn) = (Fn−1/Fn) L0      (5.104)

with one experiment left in it. This experiment will be at a distance of

L∗2 = (Fn−2/Fn) L0 = (Fn−2/Fn−1) L2      (5.105)

from one end, and

L2 − L∗2 = (Fn−3/Fn) L0 = (Fn−3/Fn−1) L2      (5.106)

from the other end.

Note 6: If one experiment is at a distance of (Fn−2/Fn) L0 from one end, it will be at a distance of (Fn−1/Fn) L0 from the other end. Thus L∗2 = (Fn−1/Fn) L0 will also yield the same result as L∗2 = (Fn−2/Fn) L0.

Note 7: It can be seen that L∗2 = (Fn−2/Fn) L0 ≤ L0/2 for n ≥ 2.

Note 8: The symbol Lj is used to denote the interval of uncertainty remaining after conducting j experiments, while the symbol L∗j is used to denote the position of the experiments.


Now place the third experiment in the interval L2 so that the current two experiments are located at a distance of

L∗3 = (Fn−3/Fn) L0 = (Fn−3/Fn−1) L2      (5.107)

from each end of the interval L2.

Again the unimodality property allows the interval of uncertainty to be reduced to L3, given by

L3 = L2 − L∗3 = L2 − (Fn−3/Fn−1) L2 = (Fn−2/Fn−1) L2 = (Fn−2/Fn) L0      (5.108)

This process of discarding a certain interval and placing a new experiment in the remaining interval can be continued, so that the location of the jth experiment and the interval of uncertainty at the end of j experiments are, respectively, given by

L∗j = (Fn−j /Fn−(j−2)) Lj−1      (5.109)

Lj = (Fn−(j−1)/Fn) L0      (5.110)

The ratio of the interval of uncertainty remaining after conducting j of the n predetermined experiments to the initial interval of uncertainty becomes

Lj /L0 = Fn−(j−1)/Fn      (5.111)

which for j = n reads

Ln/L0 = F1/Fn = 1/Fn      (5.112)

The ratio (Ln/L0) makes it possible to determine n, the number of experiments required to obtain any desired accuracy in locating the optimum point. Table 5.3 gives the reduction ratio in the interval of uncertainty obtainable for different numbers of experiments.

Position of the Final Experiment

In the Fibonacci method, the last experiment has to be placed with some care. From equation (5.109) the following holds

L∗n/Ln−1 = F0/F2 = 1/2   for all n      (5.113)

Thus, after conducting (n−1) experiments and discarding the appropriate interval in each step, the remaining interval will contain one experiment precisely at its centre. However, the final experiment, namely the nth experiment, is also to be placed at the centre of the present interval of uncertainty.


n     Fn     Ln/L0          n     Fn       Ln/L0
0     1      1.0            11    144      0.006944
1     1      1.0            12    233      0.004292
2     2      0.5            13    377      0.002653
3     3      0.3333         14    610      0.001639
4     5      0.2            15    987      0.001013
5     8      0.125          16    1597     0.0006262
6     13     0.07692        17    2584     0.0003870
7     21     0.04762        18    4181     0.0002392
8     34     0.02941        19    6765     0.0001479
9     55     0.01818        20    10946    0.0000914
10    89     0.01124        ...   ...      ...

Table 5.3. Fibonacci numbers and reduction ratios

That is, the position of the nth experiment is the same as that of the (n−1)th one, and this is true for whatever value is chosen for n. Since no new information can be gained by placing the nth experiment exactly there, it is instead placed very close to the remaining valid experiment, as in the dichotomous search method. This enables the final interval of uncertainty to be reduced to within (1/2) Ln−1. The flowchart for implementing the Fibonacci search method is given in Figure 5.25.
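
The procedure above translates almost line by line into code. The sketch below is a hypothetical Python implementation (function and parameter names are illustrative, not from this text) for a unimodal function on [a, b] with a prescribed number n of experiments; the final experiment is displaced by a small eps, as just discussed.

def fibonacci_search(f, a, b, n, eps=1e-6):
    # Fibonacci numbers F_0, F_1, ..., F_n of eq. (5.101); requires n >= 3
    F = [1, 1]
    while len(F) < n + 1:
        F.append(F[-1] + F[-2])
    # first two experiments, eqs. (5.102)-(5.103)
    x1 = a + F[n - 2] / F[n] * (b - a)
    x2 = a + F[n - 1] / F[n] * (b - a)
    f1, f2 = f(x1), f(x2)
    for k in range(n, 2, -1):              # experiments 3, 4, ..., n
        if f1 > f2:                        # by unimodality the minimum is not in [a, x1]
            a, x1, f1 = x1, x2, f2
            x2 = a + F[k - 2] / F[k - 1] * (b - a)
            if k == 3:                     # final experiment: displace it slightly, eq. (5.113)
                x2 += eps
            f2 = f(x2)
        else:                              # the minimum is not in [x2, b]
            b, x2, f2 = x2, x1, f1
            x1 = a + F[k - 3] / F[k - 1] * (b - a)
            if k == 3:
                x1 -= eps
            f1 = f(x1)
    return (x1, b) if f1 > f2 else (a, x2)   # final interval of uncertainty, length about L0/Fn

# example: locating the minimum of a unimodal function on [0, 3]
interval = fibonacci_search(lambda x: (x - 1.0) ** 2, 0.0, 3.0, 15)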


Figure 5.25. Implementation of the Fibonacci search method


5.8 NLP: Unconstrained Optimization Methods

This section deals with the various methods of solving an unconstrained minimization problem. An unconstrained minimization problem is one in which the value of the design vector x = {x1,x2, . . . ,xn} is sought that minimizes the objective function f(x). The unconstrained minimization problem can be considered as a particular case of the general constrained nonlinear programming problem. The special characteristic of this problem is that the solution vector x need not satisfy any constraint. Although a practical design problem is rarely unconstrained, the study of this class of problems is important because:

• there are some design problems that can be treated as unconstrained except very close to the final minimum point;

• some of the most powerful and convenient methods of solving constrained minimization problems involve the transformation of the problem into one of unconstrained minimization;

• the study of the unconstrained minimization techniques provides the basic understanding necessary for the study of the constrained optimization techniques;

• these methods have emerged as powerful solution techniques for certain engineering analysis problems.

For example, the displacement response (linear or nonlinear) of any structure under any specified load condition can be obtained by minimizing its potential energy. Similarly, the eigenvalues and eigenvectors of any discrete system can be found by minimizing the Rayleigh quotient.

As has already been demonstrated when discussing the classical optimization techniques, a point x∗ will be a relative minimum of f(x) if the necessary conditions

∂f(x∗)/∂xi = 0 ,   i = 1,2, . . . ,n      (5.114)

are satisfied.

The point x∗ is guaranteed to be a relative minimum if the Hessian matrix is positive definite, i.e.

Jx∗ = [ ∂²f(x∗)/∂xi ∂xj ] = positive definite      (5.115)

Equations (5.114) and (5.115) can be used to identify the optimum point during numerical computations. While these properties of a minimum are useful in many problems, there are several functions for which equations (5.114) and (5.115) cannot be applied to identify the optimum point. In all such cases, only the commonly understood notion of a minimum, namely f(x∗) ≤ f(x) for all x, can be used to identify a minimum point.

Several methods are available for solving an unconstrained minimization problem. These methods can be classified into the two broad categories of direct search methods and descent methods, as shown in Table 5.4.


Direct Search Methods                          Descent Methods
(do not require the derivatives                (require the derivatives
of the function)                               of the function)

Random search method                           Steepest descent method
Univariate method                              Conjugate gradient method (Fletcher–Reeves)
Pattern search method                          Newton method
- Powell method
- Hooke and Jeeves method
Rosenbrock method of rotating coordinates      Variable metric method (Davidon–Fletcher–Powell)

Table 5.4. Unconstrained minimization methods

All the unconstrained minimization methods are iterative in nature and hence start from an initial trial solution and proceed towards the minimum point in a sequential manner. The general iterative scheme is shown in Figure 5.26 as a flowchart.

Figure 5.26. General iterative scheme for optimization

It is important to note that all the unconstrained minimization methods require an initial point x1 to start the iterative procedure, and differ from one another only in the method of generating the new point xi+1 (from xi) and in testing the point xi+1 for optimality.


5.8.1 Direct Search Methods

The direct search methods require only objective function evaluations and do not use the partial derivatives of the function in finding the minimum. Hence they are often called the non-gradient methods, and are more suitable for simple problems involving a relatively small number of variables. These methods are, in general, less efficient than the descent methods.

Random Search Methods

The random search methods are based on the use of random numbers in finding the minimum point. Since most computer libraries have random number generators, these methods can be used quite conveniently. These methods have the following advantages:

• they work even if the objective function is discontinuous and non–differentiable at some of the points;

• they can be used to find the global minimum when the objective function possesses several relative minimum points;

• they are applicable when other methods fail due to local difficulties such as sharply varying functions and shallow regions;

• although they are not very efficient by themselves, they can be used in the early stages of optimization to detect the region where the global minimum is likely to be found; once this region is found, some of the more efficient techniques can be used to find the precise location of the global minimum point.

Random Jumping Method

Let the problem be to find the minimum of f(x) in the n–dimensional hypercube defined by

li ≤ xi ≤ ui , i = 1,2, . . . ,n (5.116)

where li and ui are the lower and the upper bounds on the variable xi. In the random jumping method, one generates sets of n random numbers, (r1,r2, . . . ,rn), that are uniformly distributed between 0 and 1. Each set of these numbers is used to find a point x inside the hypercube defined by inequalities (5.116) as

x = {x1, x2, . . . ,xn} = {l1 + r1 (u1 − l1), l2 + r2 (u2 − l2), . . . , ln + rn (un − ln)}      (5.117)

and the value of the function is evaluated at this point x. By generating a number of points and evaluating the objective function at each of them, one takes the least value of f(x) found as the desired minimum point.


Although the random jumping method is very simple, it is not practical for problems with many variables and is used only when efficiency is not a consideration.
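
A minimal Python sketch of the random jumping method (the function name and the use of NumPy's random number generator are assumptions made here for illustration) is given below; it simply evaluates f at points generated by equation (5.117) and keeps the best one.

import numpy as np

def random_jumping(f, lower, upper, n_points=10000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    best_x, best_f = None, np.inf
    for _ in range(n_points):
        r = rng.random(lower.size)           # n uniform random numbers in (0, 1)
        x = lower + r * (upper - lower)      # a point inside the hypercube, eq. (5.117)
        fx = f(x)
        if fx < best_f:                      # keep the smallest value found so far
            best_x, best_f = x, fx
    return best_x, best_f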

Random Walk Method

The random walk method is more efficient than the random jumping method. It is based on generating a sequence of improved approximations to the minimum, each derived from the preceding approximation. Thus, if xi is the approximation to the minimum obtained in the (i−1)th step, the new or improved approximation in the ith stage is found from the relation

xi+1 = xi + λui (5.118)

where λ is a prescribed scalar step length, and ui is a unit random vector generated in the ith stage.

Figure 5.27. Flowchart for the random walk method


The detailed procedure of this method is given by the following steps (see the flowchart in Figure 5.27):

1. Start with an initial point x1 and a scalar step length λ that is sufficiently large in relation to the final accuracy desired; find the function value f1 = f(x1).

2. Set the iteration number, i = 1.

3. Generate a set of n random numbers and formulate the unit random vector u.

4. Find the new value of the objective function as f = f(x1 + λu).

5. Compare the values of f and f1. If f < f1, set x1 = x1 + λu and f1 = f, and repeat steps 3 through 5. If f ≥ f1, just repeat steps 3 through 5.

6. If a sufficiently large number of trials, N, fails to produce a better point, reduce the scalar step length λ and go to step 3.

7. If an improved point cannot be generated even after reducing the value of λ below a small number ε, take the current point x1 as the desired optimum and stop the procedure.
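
The steps above can be sketched as follows (again a hypothetical Python illustration; NumPy's generator is used for the unit random vector, and max_fail plays the role of N):

import numpy as np

def random_walk(f, x1, lam=1.0, eps=1e-4, max_fail=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x1, dtype=float)
    fx = f(x)                                  # step 1
    while lam > eps:                           # step 7: stop when lambda is small enough
        fails = 0
        while fails < max_fail:                # step 6: N unsuccessful trials in a row
            u = rng.standard_normal(x.size)
            u /= np.linalg.norm(u)             # unit random vector (step 3)
            x_trial = x + lam * u              # step 4
            f_trial = f(x_trial)
            if f_trial < fx:                   # step 5: accept the improved point
                x, fx, fails = x_trial, f_trial, 0
            else:
                fails += 1
        lam *= 0.5                             # step 6: reduce the step length
    return x, fx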

Univariate Method

In this method, only one variable at a time is changed in order to produce a sequence of improved approximations to the minimum point. Starting from a base point xi in the ith iteration, one fixes the values of (n−1) variables and varies the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem, and any of the methods previously discussed can be used to produce a new base point xi+1. The search is then continued in a new direction, obtained by changing any one of the (n−1) variables that were fixed in the previous iteration. In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions have been searched sequentially, the first cycle is complete and the entire process of sequential minimization is repeated. This procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The choice of the direction and of the step length in the univariate method for an n–dimensional problem can be summarized in the following procedure (see the flowchart in Figure 5.28):

1. Choose a starting point x1 and set i = 1.

2. Find the search direction Si as

SiT = (1, 0, 0, . . . ,0)   for i = 1, n+1, 2n+1, . . .
      (0, 1, 0, . . . ,0)   for i = 2, n+2, 2n+2, . . .
      (0, 0, 1, . . . ,0)   for i = 3, n+3, 2n+3, . . .
      . . .
      (0, 0, 0, . . . ,1)   for i = n, 2n, 3n, . . .      (5.119)


Figure 5.28. Flowchart for the univariate method

3. Determine whether λi should be positive or negative. That is, for the current direction Si, find whether the function value decreases in the positive or in the negative direction. For this, one takes a small probe length ε and evaluates fi = f(xi), f+ = f(xi + εSi) and f− = f(xi − εSi). If f+ < fi, Si is the correct direction for decreasing the value of f; if f− < fi, −Si is the correct one. If both f+ and f− are greater than fi, the point xi is taken as the minimum along the direction Si.

4. Find the optimal step length λ∗i such that

f(xi ± λ∗i Si) = min over λi of f(xi ± λi Si)

where the + or − sign has to be used depending upon whether Si or −Si is the direction for decreasing the function value.


5. Set xi+1 = xi ± λ∗i Si, depending on the direction that decreases the function value, and fi+1 = f(xi+1).

6. Set the new value of i = i + 1 and go to step 2; continue this procedure until no significant change is achieved in the value of the objective function.

The univariate method is very simple and can be implemented very easily. However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress towards the optimum. Hence it is better to stop the computations at some point near the optimum rather than trying to find the precise optimum point.
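
A compact Python sketch of the univariate procedure is shown below; it is only an illustration (SciPy's one-dimensional minimizer is used for step 4, and the names are not from this text), with each cycle searching the n coordinate directions of equation (5.119) in turn.

import numpy as np
from scipy.optimize import minimize_scalar

def univariate_search(f, x0, probe=1e-4, tol=1e-10, max_cycles=200):
    x = np.asarray(x0, dtype=float)
    n = x.size
    fx = f(x)
    for _ in range(max_cycles):
        f_start = fx
        for i in range(n):                      # one cycle = all n coordinate directions
            e = np.zeros(n)
            e[i] = 1.0                          # search direction S_i, eq. (5.119)
            # step 3: probe to see whether f decreases along +S_i or -S_i
            if f(x + probe * e) >= fx and f(x - probe * e) >= fx:
                continue                        # x is already a minimum along S_i
            res = minimize_scalar(lambda lam: f(x + lam * e))   # step 4
            if res.fun < fx:                    # step 5
                x, fx = x + res.x * e, res.fun
        if abs(f_start - fx) <= tol:            # step 6: no significant change in a cycle
            break
    return x, fx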

Theoretically, this method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge. For example, consider the contours of a function of two variables with a valley as shown in Figure 5.29. If the univariate search starts at point P, the function value cannot be decreased either in the direction ±S1 or in the direction ±S2. Thus the search comes to a halt, and one may be misled into taking the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length ε needed for detecting the proper direction (±S1 or ±S2) falls below the precision, i.e. the number of working significant figures, of the computer.

Figure 5.29. Failure of the univariate method in a steep valley

Pattern Search Methods

In the univariate method, one searches for the minimum along directions parallel to the coordinate axes. It is worth noticing that this method may not converge in some cases and that, even when it converges, its convergence becomes very slow as the optimum point is approached. These problems can be avoided by changing the directions of search in some favorable manner instead of keeping them always parallel to the coordinate axes. To understand this idea, consider the contours of a function shown in Figure 5.30.


Figure 5.30. Lines defined by the alternate points lie in the general direction of the minimum

The points 1, 2, 3, . . . indicate the successive points found by the univariate method. It can be noticed that the lines joining the alternate points of the search (like 13, 24, 35, 46, . . . ) lie in the general direction of the minimum. It can be proved that if the function being minimized is quadratic in two variables, all such lines pass through the minimum. In other words, all lines like 13, 24, 35, 46 move toward the common center of the family of ellipses which are the contours of the quadratic objective function. Unfortunately, this characteristic does not carry through directly to higher dimensions, even for quadratic objective functions. However, this idea can still be used to achieve rapid convergence while finding the minimum of an n–variable function.

This is the basic idea used in several direct search methods, which are known collectively as the pattern search methods. Two of the well–known pattern search methods, namely the Hooke and Jeeves method and the Powell method, will be discussed below. In general, a pattern search method takes m univariate steps (m = n if there are n variables in the problem) and then searches for the minimum along the direction Si defined by

Si = xi − xi−m (5.120)

where xi is the point obtained at the end of the m univariate steps and xi−m is the starting point before taking the m univariate steps. The direction defined by equation (5.120) is called a pattern direction, and hence the methods that make use of the pattern direction are called pattern search methods. Actually, the directions used prior to taking a move along a pattern direction need not be univariate directions. The general pattern search method is shown in Figure 5.31.

One important point is to be noted while using equation (5.120). If the point xi is already a minimum point along the line Si, no improvement can be achieved by searching along the pattern direction Si. Hence, whenever the optimal step length λ∗i along the pattern direction Si is found to be zero, the corresponding starting point xi can be taken as the optimum point.

Of course, the other convergence requirements are also to be verified before actually terminating the procedure. In some cases, the direction Si may be a direction of ascent, and in such cases the optimum step length will be negative. This situation can be handled by providing an appropriate logic for determining the proper direction, Si or −Si, before proceeding to solve the one–dimensional minimization problem.

Figure 5.31. Flowchart for pattern search method

Hooke and Jeeves Method

The Hooke and Jeeves direct search method is a simple and very effective sequential technique, each step of which consists of two kinds of moves, one called the exploratory move and the other called the pattern move. The first kind of move explores the local behavior of the objective function, while the second kind of move takes advantage of the pattern direction. The general procedure can be described by the following steps:

1. Select an arbitrary starting point x = {x1,x2, . . . ,xn}, called the initial base point, and a small predetermined step length ∆xi in each of the coordinate directions ui, i = 1,2, . . . ,n. Set k = 1.

2. Compute fk = f(xk). Set i = 1 and yk,0 = xk (the point yk,j indicates the temporary base point obtained from the base point xk by perturbing the jth component of xk), and start the exploratory search as stated in step 3.


3. The variable xi is perturbed about the current temporary base point yk,i−1 to obtain the new temporary base point as

   yk,i = yk,i−1 + ∆xi ui   if f+ = f(yk,i−1 + ∆xi ui) < f(yk,i−1)
   yk,i = yk,i−1 − ∆xi ui   if f− = f(yk,i−1 − ∆xi ui) < f(yk,i−1) < f+
   yk,i = yk,i−1            if f(yk,i−1) < min (f+, f−)

   This process of finding the new temporary base point is continued for i = 1,2, . . . until xn is perturbed to find yk,n.

4. If the point yk,n remains the same as xk, reduce the step lengths ∆xi (say, by a factor of two), set i = 1 and go to step 3. If yk,n is different from xk, obtain the new base point as

xk+1 = yk,n

and go to step 5.

5. With the help of the base points xk and xk+1, establish a pattern direction S as

S = xk+1 − xk

and find a point yk+1,0 as

yk+1,0 = xk+1 + λS      (5.121)

where λ is the step length, which can be taken as 1 for simplicity. Alternatively, one can solve a one–dimensional minimization problem in the direction S and use the optimum step length λ∗ in place of λ in equation (5.121).

6. Set k = k + 1, fk = f(yk,0), i = 1, and repeat step 3. If, at the end of step 3, f(yk,n) < f(xk), take the new base point as xk+1 = yk,n and go to step 5. On the other hand, if f(yk,n) ≥ f(xk), set xk+1 = xk, reduce the step lengths ∆xi, and go to step 2.

7. The process is assumed to have converged whenever the step lengths fall below a predetermined small quantity ε. Thus the process is terminated if

maxi ∆xi < ε
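
A hedged Python sketch of these steps is given below (helper names and the treatment of ties are illustrative, not from this text); exploratory_move implements step 3, while the main loop carries out the pattern moves of steps 5 and 6 and the step-length reductions of steps 4, 6 and 7.

import numpy as np

def exploratory_move(f, base, fbase, dx):
    # step 3: perturb each coordinate of the temporary base point in turn
    y, fy = base.copy(), fbase
    for i in range(y.size):
        for candidate in (y[i] + dx[i], y[i] - dx[i]):
            trial = y.copy()
            trial[i] = candidate
            ftrial = f(trial)
            if ftrial < fy:
                y, fy = trial, ftrial
                break
    return y, fy

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, eps=1e-6, lam=1.0):
    x = np.asarray(x0, dtype=float)          # initial base point (step 1)
    dx = np.full(x.size, step)               # step lengths Delta x_i
    fx = f(x)                                # step 2
    while dx.max() >= eps:                   # step 7: stop when max_i Delta x_i < eps
        y, fy = exploratory_move(f, x, fx, dx)
        if fy >= fx:                         # step 4: exploration about the base point failed
            dx *= shrink
            continue
        x_prev, x, fx = x, y, fy             # new base point x_{k+1}
        while True:                          # steps 5 and 6: pattern moves
            y0 = x + lam * (x - x_prev)      # eq. (5.121) with S = x_{k+1} - x_k
            y, fy = exploratory_move(f, y0, f(y0), dx)
            if fy < fx:
                x_prev, x, fx = x, y, fy     # pattern move succeeded: keep going
            else:
                dx *= shrink                 # pattern move failed: reduce steps, back to step 2
                break
    return x, fx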

Powell Method

The Powell method is an extension of the basic pattern search method. It is the most widely accepted direct search method and can be shown to be a method of conjugate directions. As will be shown later, a conjugate directions method will minimize a quadratic function in a finite number of steps. Powell has suggested some modifications to facilitate the convergence of this method when applied to non-quadratic objective functions. The Powell method is a very powerful method and has been proved to be superior to some of the descent methods.

The basic idea of the Powell method can be understood with the help of Figure 5.32.


Figure 5.32. Progress of Powell method

Let the given two–variable function be minimized once along each of the coordinate directions, and then along the corresponding pattern direction. This gives point 4. For the next cycle of minimization, one of the coordinate directions (the x1–direction in the present case) is discarded in favor of the pattern direction. Thus minimization is carried out along u2 and S1, and point 6 is obtained. Then a new pattern direction S2 is generated. For the next cycle of minimization, one of the previously used coordinate directions is discarded (the u2–direction in this case) in favor of the newly generated pattern direction. Then, starting from point 7, minimization is performed along the directions S1 and S2, thereby obtaining points 8 and 9, respectively. For the next cycle of minimization, since there is no coordinate direction left to discard, the whole procedure is restarted by minimizing along directions parallel to the coordinate axes. This procedure is continued until the desired minimum point is found.

The flowchart for the simplest version of the Powell method is given in Figure 5.33. Note that the search is accordingly made sequentially in the directions Sn; S1, S2, S3, . . . , Sn−1, Sn, S(1)p; S2, S3, S4, . . . , Sn−1, Sn, S(1)p, S(2)p; S3, S4, . . . , Sn−1, Sn, S(1)p, S(2)p, S(3)p; . . . until the minimum point is found. Here Si indicates the coordinate direction ui, while S(j)p denotes the jth pattern direction. In Figure 5.33 the previous base point is stored as the vector z in block A, and the pattern direction is constructed by subtracting the previous base point from the current one in block B. The pattern direction is then used as a minimization direction in blocks C and D. For the next cycle, the first direction used in the previous cycle is discarded in favor of the newly generated pattern direction: this is achieved by updating the numbers of the search directions as shown in block E. Thus, the two points used in block B for the construction of the pattern direction are points that are minima along Sn in the first cycle, along the first pattern direction S(1)p in the second cycle, along the second pattern direction S(2)p in the third cycle, and so forth.

Figure 5.33. Flowchart for the Powell method
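
Robust implementations of Powell's conjugate direction method are available in standard libraries, so in practice one rarely codes the flowchart of Figure 5.33 directly. As a purely illustrative example (assuming SciPy is available), the method can be invoked as follows on the Rosenbrock test function:

import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.2, 1.0, 0.8])                       # arbitrary starting point
result = minimize(rosen, x0, method='Powell',
                  options={'xtol': 1e-8, 'ftol': 1e-8})
print(result.x, result.fun)                           # minimum of the Rosenbrock function is at (1, 1, 1)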

Definitions

Conjugate directions: Let A be an n × n symmetric matrix. A set of n vectors (or directions) Si are said to be conjugate (more accurately, A-conjugate) if

SiT A Sj = 0   for all i ≠ j, i = 1,2, . . . ,n and j = 1,2, . . . ,n      (5.122)

Quadratically convergent method: If a minimization method always locates the minimum of a general quadratic function in no more than a predetermined number of operations, and if that limiting number of operations is directly related to the number of variables n, then the method is said to be quadratically convergent.

Theorem 17. If a quadratic function

Q(x) = (1/2) xT A x + BT x + C      (5.123)

is minimized sequentially, once along each direction of a set of n linearly independent, A–conjugate directions, the global minimum of Q will be located at or before the nth step, regardless of the starting point.

Such a method is known as a quadratically convergent method. The order in which the directions are used is immaterial to this property.

5.8.2 Descent Methods

The descent techniques require, in addition to objective function evaluations, the evaluation of first and possibly higher order derivatives of the objective function. Since more information about the function being minimized is used through the derivatives, the descent methods are generally more efficient than the direct search techniques. The descent techniques are also known as gradient methods.

Gradient of a Function

The partial derivatives of a function f with respect to each of the n variables are collectively called the gradient of the function, which is denoted by

∇f = {∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn}T      (5.124)

The gradient is an n–component vector and has a very important property. If one moves along the gradient direction from any point in n–dimensional space, the function value increases at the fastest rate. Hence the gradient direction is called the direction of steepest ascent. Unfortunately, the direction of steepest ascent is a local property and not a global one. This is illustrated in Figure 5.34, where the gradient vectors ∇f evaluated at the points 1, 2, 3 and 4 lie along the directions 11', 22', 33' and 44', respectively. Thus the function value increases at the fastest rate in the direction 11' at point 1, but not at point 2. Similarly, the function value increases at the fastest rate in the direction 22' (33') at point 2 (3), but not at point 3 (4). In other words, the direction of steepest ascent generally varies from point to point, and if one makes infinitely small moves along the direction of steepest ascent, the path will be a curved line like the curve 1 2 3 4 in Figure 5.34.


Since the gradient vector represents the direction of steepest ascent, the negative of the gradient vector denotes the direction of steepest descent. Thus any method which makes use of the gradient vector can be expected to reach the minimum point faster than one which does not. All the descent methods make use of the gradient vector, either directly or indirectly, in finding the search directions. Before considering the descent methods of minimization, it is necessary to state that the gradient vector represents the direction of steepest ascent (Theorem 18).

Figure 5.34. Steepest ascent directions

Evaluation of the gradient

As stated earlier, all the descent methods are based on the use of the gradient in one form or another. Assuming that the function is differentiable, the gradient at any point xm can be evaluated as

∇f |xm = ∇fm = {∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn}T   evaluated at xm

However there are three situations where the evaluation of the gradient poses certain problems:

• the function is differentiable at all the points, but the calculation of the components of the gradient, ∂f/∂xi, is either impractical or impossible;

• the expressions for the partial derivatives ∂f/∂xi can be derived, but they require a large computational time for evaluation;

• the gradient ∇f is not defined at all the points.

In the first case, the forward finite difference formula can be used

∂f/∂xi |xm ≈ [ f(xm + ∆xi ui) − f(xm) ] / ∆xi ,   i = 1,2, . . . ,n      (5.125)


to approximate the partial derivative ∂f/∂xi at xm. If the function value at the base point xm is known, this formula requires one additional function evaluation to find (∂f/∂xi)|xm. Thus it requires n additional function evaluations to evaluate the approximate gradient ∇f |xm. For better results, one can use the central finite difference formula to find the approximate partial derivative (∂f/∂xi)|xm

(∂f/∂xi)|xm ≈ [ f(xm + ∆xi ui) − f(xm − ∆xi ui) ] / (2 ∆xi) ,   i = 1,2, . . . ,n      (5.126)

This formula requires two additional function evaluations for each of the partial derivatives. In equations (5.125) and (5.126), ∆xi is a small scalar quantity and ui is a vector of order n whose ith component has a value of one, all other components having a value of zero. In practical computations, the value of ∆xi has to be chosen with some care. If ∆xi is too small, the difference between the values of the function evaluated at (xm + ∆xi ui) and (xm − ∆xi ui) may be very small and numerical round–off error may predominate. On the other hand, if ∆xi is too large, the truncation error may predominate in the calculation of the gradient.
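
Formulas (5.125) and (5.126) can be coded in a few lines; the sketch below (an illustrative Python helper, not part of this text) returns the approximate gradient at a point xm and lets the caller choose between the forward and the central difference.

import numpy as np

def gradient_fd(f, xm, dx=1e-6, central=True):
    xm = np.asarray(xm, dtype=float)
    g = np.zeros_like(xm)
    f0 = None if central else f(xm)             # the forward formula reuses f(x_m)
    for i in range(xm.size):
        u = np.zeros_like(xm)
        u[i] = 1.0                              # unit vector u_i
        if central:
            g[i] = (f(xm + dx * u) - f(xm - dx * u)) / (2.0 * dx)   # eq. (5.126)
        else:
            g[i] = (f(xm + dx * u) - f0) / dx                       # eq. (5.125)
    return g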

In the second case also, the use of the finite difference formulae is preferred whenever the exact gradient evaluation requires more computational time than that involved in using equations (5.125) and (5.126).

In the third case, the finite difference formulae cannot be used since the gradient is not defined at all the points. For example, consider the function shown in Figure 5.35. If equation (5.126) is used to evaluate the derivative ∂f/∂x at xm, one obtains a value of α1 for a step size ∆x1 and a value of α2 for a step size ∆x2. Since, in reality, the derivative does not exist at the point xm, the use of the finite difference formulae might lead to a complete breakdown of the minimization process. In such cases, the minimization can only be done by one of the direct search techniques discussed earlier.

Figure 5.35. Gradient not defined at xm


Rate of change of a function along a direction

In most of the optimization techniques, one is interested in finding the rate of change of a function with respect to a parameter λ along some specified direction Si away from a given point xi. Any point in the specified direction away from the point xi can be expressed as x = xi + λSi. The interest is in finding the rate of change of the function along the direction Si (characterized by the parameter λ), that is

df/dλ = (∂f/∂x1)(∂x1/∂λ) + (∂f/∂x2)(∂x2/∂λ) + . . . + (∂f/∂xn)(∂xn/∂λ)      (5.127)

where xj is the jth component of x. But

∂xj/∂λ = ∂(xij + λ sij)/∂λ = sij      (5.128)

where xij and sij are the jth components of xi and Si, respectively. Hence

df/dλ = (∂f/∂x1) si1 + (∂f/∂x2) si2 + . . . + (∂f/∂xn) sin = ∇fT Si      (5.129)

If λ∗ minimizes f in the direction Si, one obtains

df/dλ |λ=λ∗ = (∇f |λ∗)T Si = 0      (5.130)

at the point xi + λ∗ Si.

Steepest Descent Method

The use of the negative of the gradient vector as a direction for minimization was first made by Cauchy (1847). In this method, one starts from an initial trial point x1 and moves iteratively towards the optimum point according to the rule

xi+1 = xi + λ∗i Si = xi − λ∗i ∇fi (5.131)

where λ∗i is the optimal step length along the search direction Si = −∇fi. The flowchart for this method is given in Figure 5.36. The method of steepest descent may appear to be the best unconstrained minimization technique, since each one–dimensional search starts in the 'best' direction. However, owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.

In two–dimensional problems, the application of the steepest descent method leads to a path made up of parallel and perpendicular segments, as shown in Figure 5.37. It can be seen that the path zig-zags in much the same way as in the univariate method. In higher dimensions the path may not be made up of parallel and perpendicular segments, and hence the method may have different characteristics from the univariate method. For functions with significant eccentricity, the method settles into a steady n–dimensional zig–zag and the process becomes hopelessly slow.


Figure 5.36. Flowchart for the steepest descent method

On the other hand, if the contours of the objective function are not very much distorted, the method may converge faster, as shown in Figure 5.37.

Figure 5.37. Convergence of the steepest descent method

The following criteria can be used to terminate the iterative process

| [ f(xi+1) − f(xi) ] / f(xi) | ≤ ε1      (5.132)

| ∂f/∂xi | ≤ ε2 ,   i = 1,2, . . . ,n      (5.133)

| xi+1 − xi | ≤ ε3      (5.134)
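
The following Python sketch (illustrative only; the gradient is supplied by the user and SciPy's one-dimensional minimizer is used for the optimal step length) implements the rule of equation (5.131) together with the termination tests (5.132) and (5.133):

import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x1, eps1=1e-10, eps2=1e-6, max_iter=1000):
    x = np.asarray(x1, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.max(np.abs(g)) <= eps2:                      # criterion (5.133)
            break
        S = -g                                             # steepest descent direction
        lam = minimize_scalar(lambda t: f(x + t * S)).x    # optimal step length lambda_i*
        x_new = x + lam * S                                # eq. (5.131)
        f_new = f(x_new)
        denom = abs(fx) if fx != 0.0 else 1.0
        converged = abs(f_new - fx) / denom <= eps1        # criterion (5.132)
        x, fx = x_new, f_new
        if converged:
            break
    return x, fx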


Conjugate Gradient Method

The convergence characteristics of the steepest descent method can be greatly improved by modifying it into a conjugate gradient method, known as the Fletcher–Reeves method. It has been shown that any minimization method that makes use of conjugate directions is quadratically convergent. This property of quadratic convergence is very useful because it ensures that the method will minimize a quadratic function in n steps or fewer. Since any general function can be approximated reasonably well by a quadratic near the optimum point, any quadratically convergent method is expected to find the optimum point in a finite number of iterations.

It has been shown that Powell's conjugate direction method requires n single-variable minimizations per iteration and sets up one new conjugate direction at the end of each iteration. Thus it requires, in general, n² single-variable minimizations to find the minimum of a quadratic function. On the other hand, if one can evaluate the gradients of the objective function, one can set up a new conjugate direction after every one–dimensional minimization and hence achieve faster convergence.

Development of the conjugate gradient method

The procedure used in the development of the conjugate gradient method is analogous to the Gram–Schmidt orthogonalization procedure. It sets up each new search direction as a linear combination of all the previous search directions and the newly determined gradient. The following theorem is important in developing the conjugate gradient method.

Theorem 19. Suppose that the point xi+1 is reached after i steps while minimizing a quadratic function f(x) = (1/2) xT A x + BT x + C. If the search directions S1, S2, . . . , Si used in the minimization process are mutually conjugate with respect to A, then

SkT ∇fi+1 = 0   for k = 1,2, . . . ,i      (5.135)

New algorithm

Consider the development of a new algorithm obtained by modifying the steepest descent method applied to a quadratic function f(x) = (1/2) xT A x + BT x + C, imposing the condition that the successive directions be mutually conjugate. Let x1 be the starting point for the minimization and let the first search direction be the steepest descent direction. Then

S1 = −∇f1 = −A x1 − B      (5.136)

and

x2 = x1 + λ∗1 S1      (5.137)

where λ∗1 is the minimizing step length in the direction S1 so that

S1T ∇f |x2 = 0      (5.138)


Equation (5.138) can be expanded as

S1T [ A (x1 + λ∗1 S1) + B ] = 0

or

S1T A x1 + λ∗1 S1T A S1 + S1T B = 0

from which the value of λ∗1 can be obtained as

λ∗1 = − S1T (A x1 + B) / (S1T A S1) = − S1T ∇f1 / (S1T A S1)      (5.139)

Now express the second search direction as a linear combination of S1 and -∇f2 as

S2 = −∇f2 + β2S1 (5.140)

where β2 is to be chosen so as to make S1 and S2 conjugate. This requires

S1T A S2 = 0      (5.141)

Substituting for S2 from equation (5.140), equation (5.141) becomes

S1T A (−∇f2 + β2 S1) = 0      (5.142)

Since equation (5.137) gives

S1 = (x2 − x1)/λ∗1      (5.143)

Equation (5.142) can be written as

S1T A S2 = [ (x2 − x1)T/λ∗1 ] A (−∇f2 + β2 S1) = 0      (5.144)

The difference of the gradients (∇f2 −∇f1) is given by

∇f2 −∇f1 = (Ax2 + B)− (Ax1 + B) = A (x2 − x1) (5.145)

With the help of equation (5.145), equation (5.144) can be written as

(∇f2 −∇f1)T (∇f2 − β2 S1) = 0 (5.146)

where the symmetry of the matrix A has been used. Equation (5.146) can be expanded to obtain

∇f2T ∇f2 − ∇f1T ∇f2 − β2 ∇f2T S1 + β2 ∇f1T S1 = 0

Since ∇f1T ∇f2 = −S1T ∇f2 = 0 from equation (5.135), this equation gives the value of β2 as

β2 = (∇f2T ∇f2) / (∇f1T ∇f1)      (5.147)


Next consider the third search direction as a linear combination of S1, S2 and −∇f3 as

S3 = −∇f3 + β3 S2 + δ3 S1 (5.148)

where the values of β3 and δ3 can be found by making S3 conjugate to S1 and S2.

First consider

S1T A S3 = − S1T A ∇f3 + β3 S1T A S2 + δ3 S1T A S1 = 0      (5.149)

If one assumes that S1 and S2 have already been made conjugate, S1T A S2 = 0, and equation (5.149) gives

δ3 = (S1T A ∇f3) / (S1T A S1)      (5.150)

From equation (5.143), δ3 can be expressed as

δ3 = [ (x2 − x1)T A ∇f3 ] / [ λ∗1 (S1T A S1) ]      (5.151)

By using equation (5.145), equation (5.151) can be rewritten as

δ3 = (1/λ∗1) (∇f2 − ∇f1)T ∇f3 / (S1T A S1)      (5.152)

Since S1 = −∇f1 from equation (5.136) and S2 − β2 S1 = −∇f2 from equation (5.140), one obtains

∇f2 − ∇f1 = −S2 + (1 + β2) S1      (5.153)

and equation (5.152) gives

δ3 = (1/λ∗1) [ −S2 + (1 + β2) S1 ]T ∇f3 / (S1T A S1)      (5.154)

which can be seen to be equal to zero in view of equation (5.135). Therefore equation (5.148) becomes

S3 = −∇f3 + β3 S2 (5.155)

The value of β3 can be found by making S3 conjugate to S2. However, instead of finding the value of a particular β, one can derive a general formula for βi, i = 2,3, . . .

By generalizing equation (5.155), one can express the search direction in the ith step, Si, as a linear combination of −∇fi and Si−1, that is

Si = −∇fi + βi Si−1 (5.156)

where the value of βi can be found by making Si conjugate to Si−1 as

βi = (∇fiT ∇fi) / (∇fi−1T ∇fi−1)      (5.157)


The search directions that have been considered so far, equation (5.156), are precisely the directions used in the Fletcher–Reeves method.

So far, Si has only been made conjugate to Si−1. It will now be shown that Si, given by equation (5.156), is automatically conjugate to all the previous search directions Sk, k = 1,2, . . . ,i−2, provided that S1, S2, . . . , Si−1 are conjugate. For this, consider

SkT A Si = SkT A (−∇fi + βi Si−1) = − SkT A ∇fi + βi SkT A Si−1 ,   k = 1,2, . . . ,i−2      (5.158)

Since SkT A Si−1 = 0 for k = 1,2, . . . ,i−2, one obtains

SkT A Si = − SkT A ∇fi ,   k = 1,2, . . . ,i−2      (5.159)

Equations similar to (5.143) and (5.145) can be obtained as

Sk = (xk+1 − xk)/λ∗k      (5.160)

and

∇fk+1 −∇fk = A (xk+1 − xk) (5.161)

and equation (5.159) can be written as

SkT A Si = − (1/λ∗k) (∇fk+1 − ∇fk)T ∇fi = 0 ,   k = 1,2, . . . ,i−2      (5.162)

in view of the relation

∇fkT ∇fi = 0 ,   k = 1,2, . . . ,i−1      (5.163)

The algorithm

The use of equations (5.156) and (5.157) for the minimization of general functions was first suggested by Fletcher and Reeves (1964). Their algorithm can be summarized as follows:

1. Start with an arbitrary initial point x1.

2. Set the first search direction S1 = −∇f(x1) = −∇f1.

3. Find the point x2 according to the relation

x2 = x1 + λ∗1 S1

where λ∗1 is the optimal step length in the direction S1. Set i = 2 and go to the next step.

4. Find ∇fi = ∇f(xi), and set

Si = −∇fi + ( |∇fi|² / |∇fi−1|² ) Si−1      (5.164)


Figure 5.38. Flowchart for the Fletcher–Reeves method

5. Compute the optimum step length λ∗i in the direction Si, and find the new point

xi+1 = xi + λ∗i Si (5.165)

6. Test for the optimality of the point xi+1. If xi+1 is optimum, stop the process. Otherwise, set i = i + 1 and repeat steps 4, 5 and 6 until convergence is achieved.

The flowchart for the Fletcher and Reeves method is shown in Fig. 5.38.
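
The six steps above are reproduced in the hedged Python sketch below (illustrative names only; the gradient is supplied by the caller and SciPy's one-dimensional minimizer provides the optimal step length); the periodic restart recommended in the next paragraph is included through the restart parameter, which defaults to n + 1.

import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad, x1, eps=1e-6, restart=None, max_iter=500):
    x = np.asarray(x1, dtype=float)
    restart = x.size + 1 if restart is None else restart
    g = grad(x)
    S = -g                                              # step 2: first direction = steepest descent
    for i in range(1, max_iter + 1):
        if np.max(np.abs(g)) <= eps:                    # optimality test (step 6)
            break
        lam = minimize_scalar(lambda t: f(x + t * S)).x # optimal step length lambda_i*
        x = x + lam * S                                 # steps 3 and 5, eq. (5.165)
        g_new = grad(x)
        if i % restart == 0:
            S = -g_new                                  # periodic restart along the steepest descent direction
        else:
            beta = g_new.dot(g_new) / g.dot(g)          # |grad f_i|^2 / |grad f_{i-1}|^2, eq. (5.157)
            S = -g_new + beta * S                       # eq. (5.156)/(5.164)
        g = g_new
    return x, f(x)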


This method was originally proposed as a technique for solving systems of linear equations derived from the stationarity conditions of a quadratic. Since the directions Si used in this method are A–conjugate, the process should converge in n cycles or fewer for a quadratic function. However, for ill–conditioned quadratics (whose contours are highly eccentric and distorted), the method may require many more than n cycles for convergence. The reason for this has been found to be the cumulative effect of rounding errors. Since Si is given by equation (5.164), any error resulting from the inaccuracies involved in the determination of λ∗i, and from the round–off error involved in accumulating the successive terms |∇fi|² Si−1/|∇fi−1|², is carried forward through the vector Si. Thus the search directions Si will be progressively contaminated by these errors. Hence it is necessary, in practice, to restart the method periodically, say after every m steps, by taking the new search direction as the steepest descent direction. That is, after every m steps, Sm+1 is set equal to −∇fm+1 instead of the usual form. Fletcher and Reeves have recommended the value m = n + 1, where n is the number of design variables.

In spite of this, the Fletcher and Reeves algorithm is vastly superior to the steepest descent method and to the pattern–search methods, but it turns out to be rather less efficient than the quasi–Newton and the variable metric methods, which are considered below. It should be borne in mind, however, that all these methods require the storage of an n × n matrix.

Variable Metric Method

Significant developments have taken place in the area of the descent techniques with the introduction of the variable metric method by Davidon (1959). This method was extended by Fletcher and Powell (1963), thus becoming the Davidon–Fletcher–Powell method. It is the best general-purpose unconstrained optimization technique making use of the derivatives. The iterative procedure of this method can be stated as follows:

1. Start with an initial point x1 and an n × n positive definite symmetric matrix H1. Usually H1 is taken as the identity matrix I. Set the iteration number i = 1.

2. Compute the gradient of the function, ∇fi, at the point xi, and set

Si = −Hi∇fi (5.166)

taking into account that, for the first iteration, the search direction S1 will be the same as the steepest descent direction, i.e. S1 = −∇f1, if H1 = I.

3. Find the optimal step length λ∗i in the direction Si and set

xi+1 = xi + λ∗i Si (5.167)

4. Test the new point xi+1 for optimality. If xi+1 is optimal, terminate the iterative process; otherwise, go to the next step.

5. Update the H matrix as

Hi+1 = Hi + Mi + Ni (5.168)


where

Mi = λ∗i (Si SiT) / (SiT Qi)      (5.169)

Ni = − (Hi Qi)(Hi Qi)T / (QiT Hi Qi)      (5.170)

Qi = ∇f(xi+1) − ∇f(xi) = ∇fi+1 − ∇fi      (5.171)

6. Set the new iteration number i = i + 1, and go to step 2.

This method is very powerful and converges quadratically, since it is a conjugate gradient method. It is very stable and continues to progress towards the minimum even while minimizing highly distorted and eccentric functions. The stability of this method can be attributed to the fact that it carries the information obtained in previous iterations through the matrix Hi. It can be shown that Hi will always remain positive definite and will be an approximation to the inverse of the matrix of second partial derivatives of the objective function.
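
A hedged Python sketch of steps 1 to 6 is shown below (illustrative only; the user supplies the gradient and SciPy's one-dimensional minimizer provides λ∗i). The two update matrices are formed exactly as in equations (5.169) and (5.170), noting that λ∗i Si SiT / (SiT Qi) equals the outer product of ∆x = λ∗i Si with itself divided by ∆xT Qi.

import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad, x1, eps=1e-6, max_iter=500):
    x = np.asarray(x1, dtype=float)
    H = np.eye(x.size)                                   # step 1: H_1 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.max(np.abs(g)) <= eps:                     # step 4: optimality test
            break
        S = -H.dot(g)                                    # step 2, eq. (5.166)
        lam = minimize_scalar(lambda t: f(x + t * S)).x  # step 3: optimal step length
        dx = lam * S
        x = x + dx                                       # eq. (5.167)
        g_new = grad(x)
        Q = g_new - g                                    # eq. (5.171)
        M = np.outer(dx, dx) / dx.dot(Q)                 # eq. (5.169)
        HQ = H.dot(Q)
        N = -np.outer(HQ, HQ) / Q.dot(HQ)                # eq. (5.170)
        H = H + M + N                                    # step 5, eq. (5.168)
        g = g_new
    return x, f(x)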


5.9 NLP: Constrained Optimization Methods

This section deals with the optimization techniques that are applicable to the solution of a constrained nonlinear optimization problem.

There are many techniques available for the solution of a constrained nonlinear programming problem; they can be classified into two broad categories, namely the direct methods and the indirect methods, as shown in Table 5.5.

Direct Methods                               Indirect Methods

(a) Heuristic search methods                 (a) Transformation of variables
    - complex method
(b) Constraint approximation methods         (b) Penalty function methods
    - cutting plane method                       - interior penalty function method
    - approximate programming method             - exterior penalty function method
(c) Methods of feasible directions
    - Zoutendijk method
    - Rosen method

Table 5.5. Constrained optimization techniques

Direct Methods

In the direct methods, the constraints are handled in an explicit manner, whereas in most of the indirect methods the constrained problem is solved as a sequence of unconstrained minimization problems.

Heuristic Search Methods

The heuristic search methods are mostly intuitive and do not have much theoretical support. The complex method, which can be considered to be similar to the simplex method, may be studied under this category.

Constraint Approximation Methods

In these methods, the nonlinear objective function and the constraints are linearized about some point, and the approximating linear programming problem is solved by using linear programming techniques.

The resulting optimum solution is then used to construct a new linear approximation, which is again solved by LP techniques. This procedure is continued until the specified convergence criteria are satisfied. Two methods, namely the cutting plane method and the approximate programming method, work on this principle.


Methods of Feasible Directions

The methods of feasible directions are those which produce an improving succession of feasible vectors xi by moving in a succession of usable feasible directions. A feasible direction is one along which at least a small step can be taken without leaving the feasible domain.

A usable feasible direction is a feasible direction along which the objective function value can be reduced at least by a small amount. In the methods of feasible directions, each iteration consists of two important steps. The first step finds a usable feasible direction at a specified point, and the second step determines a proper step length along the usable feasible direction found in the first step. The Zoutendijk method of feasible directions and the Rosen gradient projection method can be considered as particular cases of the general methods of feasible directions.

Indirect Methods

Two basic types of indirect optimization methods are dealt with.

Transformation of Variables

Some constrained optimization problems have their constraints expressed as simple and explicit functions of the decision variables. In such cases, it may be possible to make a change of variables such that the constraints are automatically satisfied. In some other cases, it may be possible to know, in advance, which constraints will be active at the optimum solution. In these cases, the particular constraint equation, gj(x) = 0, can be used to eliminate some of the variables from the problem. Both these approaches will be discussed under the heading transformation of variables.

Penalty Function Methods

There are two types of penalty function methods, namely, the interior penalty function method and the exterior penalty function method. In both types of methods, the constrained problem is transformed into a sequence of unconstrained minimization problems such that the constrained minimum can be obtained by solving the sequence of unconstrained minimization problems.

In the interior penalty function methods, the sequence of unconstrained minima lies in the feasible region, and thus it converges to the constrained minimum from the interior of the feasible region. In the exterior methods, the sequence of unconstrained minima lies in the infeasible region and converges to the desired solution from the exterior of the feasible region.

Before discussing the various types of constrained minimization techniques listed above, it is worth examining some of the important characteristics of a constrained problem.


5.9.1 Characteristics of a Constrained Problem

The presence of constraints in a nonlinear programming problem makes it more difficult to find the minimum. Several situations can be identified depending on the effect of the constraints on the objective function. The simplest situation is when the constraints do not have any influence on the minimum point. However, it is necessary to proceed with the general assumption that the constraints will have some influence on the optimum point.

A general case would appear as shown in Figure 5.39. Here the minimum value of f corresponds to the contour of the lowest value having at least one point in common with the constraint set. If the problem is an LP problem, the optimum point will always be an extreme point.

Figure 5.39. Constrained minimum occurring on a nonlinear constraint

It should be noted that ∇f is not equal to zero at the optimum point x∗, but at least one of the constraints, gj(x), will be zero at x∗. It can be seen from Figure 5.39 that the negative of the gradient must be expressed as

−∇f = λ∇gj ,   λ > 0

at an optimum point. This condition can easily be identified as a particular case of the Kuhn–Tucker necessary conditions to be satisfied at a constrained optimum point.

Another situation is one where the minimization problem has two or more local minima. If the objective function has two or more unconstrained local minima and at least one of them is contained in the feasible region, then the constrained problem would have at least two local minima.

In summary, the minimum of a nonlinear programming problem will not, in general, be an extreme point of the feasible region, and may not even be on the boundary. The problem may also have local minima even if the corresponding unconstrained problem has no local minima. Further, none of the local minima may correspond to the global minimum of the unconstrained problem. All these characteristics are direct consequences of the introduction of constraints.


5.9.2 Direct Methods

Methods of feasible directions

The methods of feasible directions are based on the same philosophy as the methods of unconstrained minimization, but are constructed to deal with inequality constraints. The basic idea is to choose a starting point satisfying all the constraints and to move to a better point according to the iterative scheme

xi+1 = xi + λSi (5.172)

where xi is the starting point for the ith iteration, Si is the direction of movement, λ is the distance of movement (step length), and xi+1 is the final point obtained at the end of the ith iteration. The value of λ is always chosen so that xi+1 lies in the feasible region. The search direction Si is found such that (i) a small move in that direction violates no constraint, and (ii) the value of the objective function can be reduced in that direction. The new point xi+1 is taken as the starting point for the next iteration, and the whole procedure is repeated until a point is obtained such that no direction satisfying both (i) and (ii) can be found. In general, such a point denotes a constrained local minimum of the problem. This local minimum need not be a global one unless the problem is a convex programming problem. A direction satisfying property (i) is called feasible, while a direction satisfying both properties (i) and (ii) is called a usable feasible direction. This is the reason why these methods are known as methods of feasible directions. There are many ways of choosing usable feasible directions, and hence there are many different methods of feasible directions.

The situations for feasible directions depend on the geometry of the constraint functions: in Figure 5.40(a) g1 and g2 are convex; in Figure 5.40(b) g1 is convex and g2 is linear; in Figure 5.40(c) g1 is convex and g2 is concave.

Figure 5.40. Feasible directions S


A vector S will be a usable feasible direction if it satisfies both the relations

(d/dλ) f(xi + λS)|λ=0 = ST·∇f(xi) ≤ 0     (5.173)

(d/dλ) gj(xi + λS)|λ=0 = ST·∇gj(xi) ≤ 0     (5.174)

where the equality sign holds true only if a constraint is linear or strictly concave, as shown in Figures 5.40(b) and 5.40(c).

The geometrical meaning of equation (5.174) is that the vector S must make an obtuse angle with all the constraint normals, except that, for linear or concave constraints, the angle may be as small as 90°. Any feasible direction satisfying the strict inequality sign of equation (5.174) lies at least partly in the feasible region. By moving along such a direction from xi, one will be able to find another point xi+1 which also lies in the feasible region.

It is possible to reduce the value of the objective function at least by a small amount by taking a step length λ > 0 along such a direction.
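A minimal sketch of the usability test implied by equations (5.173) and (5.174): given the gradients of the objective and of the active constraints at a point, a candidate direction is checked for feasibility and usability. The function name, call signature, and numerical example are assumptions introduced here for illustration only.

```python
import numpy as np

def is_usable_feasible(S, grad_f, active_constraint_grads, tol=1e-12):
    """Check conditions (5.173)-(5.174) for a candidate direction S at a point xi.

    S                        : candidate direction
    grad_f                   : gradient of the objective at xi
    active_constraint_grads  : gradients of the active constraints gj at xi
    """
    S = np.asarray(S, dtype=float)
    usable = S @ np.asarray(grad_f) <= tol             # (5.173): objective can decrease
    feasible = all(S @ np.asarray(gj) <= tol           # (5.174): no active constraint is entered
                   for gj in active_constraint_grads)
    return usable and feasible

# Hypothetical example: with grad f = (1, 1) and one active constraint normal (0, 1),
# the direction S = (-1, 0) is both usable and feasible.
print(is_usable_feasible([-1.0, 0.0], [1.0, 1.0], [[0.0, 1.0]]))   # True
```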

The iterative procedure of the methods of feasible directions is shown graphically in Figure 5.41. Let x1 be the starting feasible point and let the initial usable feasible direction chosen be S1 = −∇f(x1). A step length λ > 0 is taken along the direction S1 so as to minimize f along that direction without violating any of the constraints. This procedure gives x2 as the new point.

Figure 5.41. Iterative procedure of the methods of feasible directions

Proceeding in the direction of the negative gradient of the objective function at x2 would violate the constraints. Hence a usable feasible direction S2 is found at the point x2 such that it makes an angle greater than 90° with ∇g2 and an angle smaller than 90° with −∇f2. Several usable feasible directions can be generated at this point x2.


The locally best feasible direction may be selected as the one along which the value of f decreases most rapidly, that is, the one for which −S2T ∇f(x2) is maximized. This is the feasible direction which makes the smallest angle with −∇f(x2) = −∇f2. By moving along the direction S2 by the maximum possible distance, the point x3 is obtained. A new usable feasible direction S3 is found at x3, and the process moves along it as far as possible to obtain the point x4. At this point, the negative of the gradient of the objective function is −∇f4 and no usable feasible direction can be found. In other words, no feasible direction at x4 makes an angle of less than 90° with −∇f4. Thus the point x4 is taken as a local minimum. It can be seen that this local minimum coincides with the global minimum of f over the constraint set.

It may not always be possible to obtain the global minimum of f. For example, if the process starts with the point y1 shown in Figure 5.41, the iterative procedure leads to the local minimum y3, which is different from the global minimum x4. This problem of local minima is common to all methods, and one can be sure of avoiding them only in the case of convex programming problems.

5.9.3 Indirect Methods

Transformation Techniques

Change of Variables

If the constraints gj(x) are explicit functions of the variables xi and have certain simple forms, it may be possible to make a transformation of the independent variables such that the constraints are automatically satisfied. Thus it may be possible to convert a constrained optimization problem into an unconstrained one by making a change of variables. One of the frequently encountered constraints that can be satisfied in this way is that in which the variable is bounded below and above by certain constants

li ≤ xi ≤ ui (5.175)

where li and ui are, respectively, the lower and the upper limits on xi. These constraints can be satisfied by transforming the variable xi as

xi = li + (ui − li) sin2 yi (5.176)

where yi is the new variable which can take any value.

In the particular case when the variable xi is restricted to lie in the interval (0, 1), any of the following transformations can be used

xi = sin² yi ;   xi = cos² yi ;   xi = e^yi / (e^yi + e^−yi) ;   xi = yi² / (1 + yi²)

If the variable xi is constrained to take only positive values, the transformation can be chosen as

xi = |yi| ;   xi = yi² ;   xi = e^yi


On the other hand, if the variable is restricted to take values lying only between −1 and 1, the transformation is given by

xi = sin yi ;   xi = cos yi ;   xi = 2yi / (1 + yi²)

After applying these transformations, the unconstrained minimum of the objective function is sought with respect to the new variables yi.

The following points are to be noted in applying this transformation technique:

• the constraints gj(x) have to be very simple functions of xi;
• for certain constraints it may not be possible to find the necessary transformation;
• if it is not possible to eliminate all the constraints by making a change of variables, it may be better not to use the transformation at all; a partial transformation may sometimes produce a distorted objective function which might be more difficult to minimize than the original function.
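As an illustration of the change-of-variables technique, the following is a minimal Python sketch that removes the bound constraints li ≤ xi ≤ ui via the transformation (5.176) and then minimizes without constraints in the new variables yi. The use of scipy's Nelder–Mead minimizer, the bound values, and the example objective are assumptions introduced here, not part of the original text.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical bounds li <= xi <= ui
l = np.array([0.0, 1.0])
u = np.array([2.0, 5.0])

def x_of_y(y):
    """Transformation (5.176): xi = li + (ui - li) * sin^2(yi), so the bounds always hold."""
    return l + (u - l) * np.sin(y) ** 2

def f(x):
    """Hypothetical objective function."""
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2

# Unconstrained minimization in the new variables y
res = minimize(lambda y: f(x_of_y(y)), x0=np.zeros(2), method="Nelder-Mead")
print(x_of_y(res.x))   # expected close to x = (2, 2): the first bound is active
```

Note that the boundary x1 = 2 corresponds to sin² y1 = 1, so the optimizer can reach it without any explicit constraint handling.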

Elimination of Variables

If an optimization problem has m inequality constraints, not all of them may be active⁹ at the optimum point. If it is known, in advance, which constraints are going to be active at the optimum point, those constraint equations can be used to eliminate variables from the problem. Thus if r (< n) specific constraints are known to be active at the optimum point, any r variables can be eliminated from the problem, and a new problem is obtained involving only (n − r) variables and (m − r) constraints. This problem will, in general, be much easier to solve than the original problem.

The major drawback of this method is that it is very difficult to know, beforehand, which of the constraints are going to be active at the optimum point. Thus, in a general problem with m constraints, it is necessary to check (i) the minimum of f(x) with no constraints (assuming that no constraint is active at the optimum point), (ii) the minimum of f(x) by taking one constraint at a time with the equality sign (assuming that one constraint is active at the optimum point), (iii) the minimum of f(x) by taking all possible combinations of constraints two at a time (assuming that two constraints are active at the optimum point), and so on. If any of these solutions satisfies the Kuhn–Tucker necessary conditions, it is likely to be a local minimum of the original optimization problem. It can be seen that, in the absence of prior knowledge about which constraints are going to be active at the optimum point, the number of problems to be solved is given by

1 + m + m(m−1)/2! + m(m−1)(m−2)/3! + . . . + m!/[(m−n)! n!]  =  Σ (k = 0, . . . , n)  m!/[k! (m−k)!]

For example, if the original optimization problem has n = 5 design variables and m = 10 constraints, the number of problems to be solved will be 638, which is clearly very large.

⁹A constraint which is satisfied with the equality sign is called an active constraint.


However, in LP problems, it is known that exactly (n − m) variables will be zero at the optimum point. In such cases, it is necessary to solve only n!/[(n − m)! m!] problems to identify the optimum solution. For example, if m = 5 and n = 10, the number of problems to be solved will be 252, which is still a large number in terms of practical computations. Hence this approach is not practical even for solving LP problems. The simplex method is much more efficient than this technique because it moves from one basic feasible solution to an improved neighboring basic feasible solution in a systematic manner.
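A quick numerical check of the counts quoted above (a sketch; the helper name is hypothetical):

```python
from math import comb

def n_subproblems(m, n):
    """Number of candidate active-set subproblems: sum over k = 0..n of C(m, k)."""
    return sum(comb(m, k) for k in range(n + 1))

print(n_subproblems(10, 5))   # 638 subproblems for n = 5 variables, m = 10 constraints
print(comb(10, 5))            # 252 subproblems in the LP case with m = 5, n = 10
```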

Penalty Function Methods: Basic Approach

The penalty function methods transform the basic optimization problem into alternative formulations such that numerical solutions are sought by solving a sequence of unconstrained minimization problems. Let the basic optimization problem be of the form

Find x which minimizes f(x)
subject to gj(x) ≤ 0 ,   j = 1, 2, . . . , m      (5.177)

This problem is converted into an unconstrained minimization problem by constructing a function of the form

φk = φ(x, rk) = f(x) + rk Σ (j = 1, . . . , m) Gj[gj(x)]      (5.178)

where Gj is some function of the constraint gj, and rk is a positive constant known as the penalty parameter. The second term on the right side of equation (5.178) is called the penalty term, and its significance will be seen later. If the unconstrained minimization of the φ–function is repeated for a sequence of values of the penalty parameter rk (k = 1, 2, . . .), the solution can be brought to converge to that of the original problem stated in equation (5.177). This is why the penalty function methods are also known as sequential unconstrained minimization techniques (SUMT).

The penalty function formulations for inequality constrained problems can be divided into two categories, namely, the interior penalty function method and the exterior penalty function method. In the interior formulations, some of the popularly used forms of Gj are

Gj = − 1/gj(x)      (5.179)

or

Gj = − log [−gj(x)]      (5.180)

In the case of exterior penalty function formulations, some of the commonly used forms of the function Gj are

Gj = max [0, gj(x)] (5.181)


or

Gj = {max [0, gj(x)]}²      (5.182)
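As a sketch, the interior and exterior penalty terms (5.179)–(5.182) can be written as simple functions of a constraint value gj(x); note that the two interior (barrier) forms are only defined for feasible points, gj(x) < 0. The function names below are introduced here for illustration only.

```python
import math

# Interior (barrier) forms: defined only for g < 0, they blow up as g approaches 0 from below
def G_interior_inverse(g):        # (5.179)
    return -1.0 / g

def G_interior_log(g):            # (5.180)
    return -math.log(-g)

# Exterior forms: zero inside the feasible region, positive outside
def G_exterior_linear(g):         # (5.181)
    return max(0.0, g)

def G_exterior_quadratic(g):      # (5.182)
    return max(0.0, g) ** 2
```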

In the interior methods, all the unconstrained minima of φk lie in the feasible region and converge to the solution of equation (5.177) as rk is varied in a particular manner. In the exterior methods, all the unconstrained minima of φk lie in the infeasible region and converge to the desired solution from the outside as rk is changed in a specified manner. The convergence of the unconstrained minima of φk is illustrated in Figure 5.42 for the simple problem

Find x = {x1} which minimizes f(x) = α x1
subject to g1(x) = β − x1 ≤ 0      (5.183)

It can be seen from Figure 5.42(a) that the unconstrained minima of φ(x, rk) converge to the optimum point x∗ as the parameter rk is increased sequentially. On the other hand, the interior method shown in Figure 5.42(b) gives convergence as the parameter rk is decreased sequentially.

Figure 5.42. Penalty function methods: (a) exterior method; (b) interior method

There are several reasons for the appeal of the penalty function formulations. One main reason, which can be observed from Figure 5.42, is that the sequential nature of the method allows a gradual approach to criticality of the constraints. In addition, the sequential process permits a graded approximation to be used in the analysis of the system.


This means that if the evaluation of f and gj, and hence of φ(x, rk), for any specified design vector x is computationally very difficult, one can use coarse approximations during the early stages of optimization (when the unconstrained minima of φk are far from the optimum) and finer or more detailed approximations during the final stages. Another reason is that algorithms for the unconstrained minimization of rather arbitrary functions have been well studied and are generally quite reliable.

Interior Penalty Function Method

As indicated previously, in the interior penalty function method a new function (the φ-function) is built by adding a penalty term to the objective function. The penalty term is chosen such that its value is small at points away from the constraint boundaries and tends to infinity as the constraint boundaries are approached. Hence the value of the φ-function also 'blows up' as the constraint boundaries are approached. This behavior can also be seen from Figure 5.42(b). Thus, once the unconstrained minimization of φ(x, rk) is started from any feasible point x1, the subsequent points generated will always lie within the feasible domain, since the constraint boundaries act as barriers during the minimization process. This is the reason why the interior penalty function methods are also known as 'barrier methods'.

The φ–function originally defined by Carroll (1961) is

φ(x, rk) = f(x) − rk Σ (j = 1, . . . , m) 1/gj(x)      (5.184)

It can be seen that the value of the function φ will always be greater than f, since gj(x) is negative for all feasible points x. If any constraint gj(x) is satisfied critically (with the equality sign), the value of φ tends to infinity. It is to be noted that the penalty term in equation (5.184) is not defined if x is infeasible. This introduces a serious shortcoming when using equation (5.184). Since this formulation does not allow any constraint to be violated, it requires a feasible starting point for the search towards the optimum point. In many engineering problems it is not very difficult to find a point satisfying all the constraints, gj(x) ≤ 0, possibly at the expense of a large value of the objective function f(x). If there is any difficulty in finding a feasible starting point, the method described later in this section can be used to find one. Since the initial point as well as each of the subsequent points generated in this method lies inside the acceptable region of the design space, the method is classified as an interior penalty function formulation. The iteration procedure of this method is given below.

Iterative Process

1. Start with an initial feasible point x1 satisfying all the constraints with strict inequality sign, that is, gj(x1) < 0 for j = 1, 2, . . . , m, and an initial value of r1 > 0. Set k = 1.

2. Minimize φ(x, rk) by using any of the unconstrained minimization methods and obtain the solution x∗k.


3. Test whether x∗k is the optimum solution of the original problem. If x∗k is found to be optimum, terminate the process; otherwise, go to the next step.

4. Find the value of the next penalty parameter rk+1 as rk+1 = c·rk, where c < 1.

5. Set the new value of k = k + 1, take the new starting point as x1 = x∗k, and go to step 2.

These steps are shown in the form of a flowchart in Figure 5.43.

Figure 5.43. Flowchart for the interior penalty function method
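To complement the flowchart, the following is a minimal Python sketch of the interior penalty (barrier) iteration using the φ–function (5.184), the starting value of r1 suggested by (5.187), the update rk+1 = c·rk, and the relative-change stopping test (5.190). The unconstrained minimizer, the value c = 0.1, and the example problem are assumptions introduced for illustration, not part of the original text.

```python
import numpy as np
from scipy.optimize import minimize

def interior_penalty(f, gs, x1, c=0.1, eps1=1e-6, max_outer=30):
    """Interior penalty (barrier) method, equation (5.184), with r_{k+1} = c * r_k."""
    x = np.asarray(x1, dtype=float)
    # initial r1 from (5.187): r1 ~ 0.1 * f(x1) / (-sum 1/gj(x1))
    r = 0.1 * f(x) / (-sum(1.0 / g(x) for g in gs))

    def phi(z, r):
        gvals = [g(z) for g in gs]
        if any(gv >= 0 for gv in gvals):        # barrier is undefined outside the feasible region
            return np.inf
        return f(z) - r * sum(1.0 / gv for gv in gvals)

    f_prev = f(x)
    for _ in range(max_outer):
        x = minimize(lambda z: phi(z, r), x, method="Nelder-Mead").x
        if abs((f(x) - f_prev) / f(x)) <= eps1:  # convergence test (5.190)
            break
        f_prev, r = f(x), c * r
    return x

# Hypothetical example: minimize f(x) = x subject to 2 - x <= 0, i.e. problem (5.183) with alpha = 1, beta = 2
print(interior_penalty(lambda x: x[0], [lambda x: 2.0 - x[0]], x1=[5.0]))   # expected close to x = 2
```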

Although the algorithm is straightforward, there are a number of points to be considered in implementing the method. These are:

• the starting feasible point x1 may not be readily available in some cases;

• a suitable value of the initial penalty parameter r1 has to be found;

• a proper value has to be selected for the multiplication factor c;

• suitable convergence criteria have to be chosen to identify the optimum point;

• the constraints have to be normalized so that each one of them varies between −1 and 0 only.


Starting feasible point

In most engineering problems it is not very difficult to find an initial point x1 satisfying all the constraints, gj(x1) < 0; often a feasible starting point can be found even at the expense of a large value of the objective function. However, there may be situations in which feasible design points cannot be found so easily. In such cases, the required feasible starting point can be found by using the interior penalty function method itself, as follows:

1. Choose an arbitrary point x1 and evaluate the constraints gj(x) at the point x1. Since the point x1 is arbitrary, it may not satisfy all the constraints with strict inequality sign. If r out of a total of m constraints are violated, renumber the constraints such that the last r constraints become the violated ones; that is,

gj(x1) < 0 ,   j = 1, 2, . . . , m − r
gj(x1) ≥ 0 ,   j = m − r + 1, m − r + 2, . . . , m      (5.185)

2. Identify the constraint which is violated most at the point x1, that is, find the integer k such that

gk(x1) = max [gj(x1)]   for   j = m − r + 1, m − r + 2, . . . , m      (5.186)

3. Now formulate a new optimization problem as

Find x which minimizes gk(x)
subject to gj(x) ≤ 0 ,   j = 1, 2, . . . , m − r
gj(x) − gk(x1) ≤ 0 ,   j = m − r + 1, . . . , k, . . . , m

4. Solve the optimization problem formulated in step 3 by taking the point x1 as a feasible starting point and using the interior penalty function method. Note that this optimization can be terminated whenever the value of the objective function gk(x) drops below zero. Thus the solution obtained, xM, will satisfy at least one more constraint than the original point x1 did.

5. If all the constraints are not satisfied at the point xM, set the new starting point as x1 = xM, renumber the constraints such that the last r constraints are the unsatisfied ones (this value of r will be different from the previous value), and go to step 2.

This procedure is repeated until all the constraints are satisfied and a point x1 = xM is obtained for which gj(x1) < 0, j = 1, 2, . . . , m.

If the constraints are consistent, it should be possible, by applying the above procedure, to obtain a point x1 that satisfies all the constraints. However, there may exist situations in which the solution of the problem formulated in step 3 gives an unconstrained or constrained local minimum of gk(x) that is positive. In such cases one has to restart the procedure with a new point x1 from step 1 onwards.


Initial value of the penalty parameter

Since the unconstrained minimization of φ(x, rk) is to be carried out for a decreasing sequence of rk, it might appear that, by choosing a very small value of r1, one could avoid an excessive number of minimizations of the function φ. From a computational point of view, however, it is easier to minimize the unconstrained function φ(x, rk) when rk is large. This can be seen qualitatively from Figure 5.42(b): as the value of rk becomes smaller, the function φ changes more rapidly in the vicinity of its minimum φ∗k. Since it is easier to find the minimum of a function whose graph is smoother, the unconstrained minimization of φ will be easier if rk is large. However, the minimum of φk, x∗k, will be farther away from the desired minimum x∗ if rk is large. Thus an excessive number of unconstrained minimizations of φ(x, rk) (for several values of rk) will be required to reach the point x∗ if r1 is selected to be very large. Hence a 'moderate' value has to be chosen for the initial penalty parameter r1. In practice, a value of r1 that makes φ(x1, r1) approximately 1.1 to 2.0 times the value of f(x1) has been found to be quite satisfactory in achieving quick convergence. Thus, for any initial feasible starting point x1, the value of r1 can be taken as

r1 ≃ (0.1 ÷ 1.0) f(x1) / [ − Σ (j = 1, . . . , m) 1/gj(x1) ]      (5.187)

Subsequent values of the penalty parameter

Once the initial value of rk is chosen, the subsequent values of rk have to be chosen such that

rk+1 < rk (5.188)

For convenience, the values of rk are chosen according to the relation

rk+1 = c·rk where c < 1 (5.189)

The value of c can be taken as 0.1 or 0.2 or 0.5, etc.

Convergence criteria

Since the unconstrained minimization of φ(x, rk) has to be carried out for a decreasing sequence of values rk, it is necessary to use proper convergence criteria to identify the optimum point and to avoid an unnecessarily large number of unconstrained minimizations. The process can be terminated whenever the following conditions are satisfied:

1. The relative difference between the values of the objective function obtained at the end of any two consecutive unconstrained minimizations falls below a small number ε1, i.e.

| [f(x∗k) − f(x∗k−1)] / f(x∗k) | ≤ ε1      (5.190)


2. The difference between the optimum points x∗k and x∗k−1 becomes very small. This can be judged in several ways; some of them are given below:

|(∆x)i| ≤ ε2 (5.191)

where ∆x = x∗k − x∗k−1, and (∆x)i is the ith component of the vector ∆x; or

maxi |(∆x)i| ≤ ε3      (5.192)

|∆x| = [(∆x)1² + (∆x)2² + . . . + (∆x)n²]^1/2 ≤ ε4      (5.193)

Note that the values of ε1 to ε4 have to be chosen depending on the characteristics of the problem at hand.
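A small sketch of the stopping tests (5.190)–(5.193), written as a single helper; the tolerance values and the function name are placeholders introduced here for illustration.

```python
import numpy as np

def converged(f_k, f_km1, x_k, x_km1, eps1=1e-6, eps3=1e-6, eps4=1e-6):
    """Convergence tests (5.190), (5.192) and (5.193) for the SUMT outer loop."""
    dx = np.asarray(x_k) - np.asarray(x_km1)
    rel_f = abs((f_k - f_km1) / f_k)           # (5.190): relative change of the objective
    max_dx = np.max(np.abs(dx))                # (5.192): largest component of the step
    norm_dx = np.linalg.norm(dx)               # (5.193): Euclidean norm of the step
    return rel_f <= eps1 and max_dx <= eps3 and norm_dx <= eps4
```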

Normalization of constraints

A structural optimization problem, for example, might have constraints on the deflection, δ, and the stress, σ, such as

g1(x) = δ(x)− δmax ≤ 0 (5.194)

g2(x) = σ(x)− σmax ≤ 0 (5.195)

where the maximum allowable values are δmax = 0.5 cm and σmax = 3000 kg/cm². If a design vector x1 gives values of g1 and g2 of −0.2 and −2000, the contribution of g1 to the φ–function of equation (5.184) will be much larger than that of g2 (by a factor of the order of 10⁴). This will badly affect the convergence rate during the minimization of the φ–function. Thus it is advisable to normalize the constraints so that they vary between −1 and 0 as far as possible. For the constraints of equations (5.194) and (5.195), the normalization can be done as

g′1(x) = g1(x)/δmax = δ(x)/δmax − 1 ≤ 0      (5.196)

g′2(x) = g2(x)/σmax = σ(x)/σmax − 1 ≤ 0      (5.197)

If the constraints are not normalized as shown in equations (5.196) and (5.197), the problem can still be solved effectively by defining different penalty parameters for different constraints as

φ(x, rk) = f(x) − rk Σ (j = 1, . . . , m) Rj/gj(x)      (5.198)

where R1, R2, . . ., Rm are selected such that the contributions of the different gj(x) to the φ–function are approximately the same at the initial point x1. When the unconstrained minimization of φ(x, rk) is carried out for a decreasing sequence of values of rk, the values of R1, R2, . . ., Rm are not altered; however, they are expected to be effective in reducing the disparities between the contributions of the various constraints to the φ–function.
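A short sketch of the normalization (5.196)–(5.197) for the deflection and stress constraints quoted above; the deflection and stress functions and the sample values are hypothetical placeholders.

```python
delta_max = 0.5       # cm
sigma_max = 3000.0    # kg/cm^2

def g1_normalized(x, delta):
    """(5.196): delta(x)/delta_max - 1 <= 0, varies between -1 and 0 over the feasible range."""
    return delta(x) / delta_max - 1.0

def g2_normalized(x, sigma):
    """(5.197): sigma(x)/sigma_max - 1 <= 0."""
    return sigma(x) / sigma_max - 1.0

# With delta(x1) = 0.3 cm and sigma(x1) = 1000 kg/cm^2 (hypothetical values), the raw
# constraints are -0.2 and -2000, while the normalized ones are -0.4 and about -0.667,
# i.e. comparable in magnitude.
```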


Exterior Penalty Function Method

In the exterior penalty function method, the φ-function is generally taken as

φ(x, rk) = f(x) + rk Σ (j = 1, . . . , m) < gj(x) >^q      (5.199)

where rk is a positive penalty parameter, the exponent q is a nonnegative constant, and the bracket function < gj(x) > is defined as

< gj(x) > = max [gj(x), 0] =
    gj(x)   if gj(x) > 0   (constraint is violated)
    0        if gj(x) ≤ 0   (constraint is satisfied)      (5.200)

It can be seen from equation (5.199) that the effect of the second term on the right side is to increase φ(x, rk) in proportion to the qth power of the amount by which the constraints are violated. Thus there is a penalty for violating the constraints, and the amount of penalty increases faster than the amount of violation of a constraint (for q > 1). This is the reason why the formulation is called the penalty function method. Usually, the function φ(x, rk) possesses a minimum as a function of x in the infeasible region. The unconstrained minima x∗k converge to the optimal solution of the original problem as k → ∞ and rk → ∞. Thus the unconstrained minima approach the feasible domain gradually and, as k → ∞, x∗k eventually lies in the feasible region. Consider equation (5.199) for various values of q.

• q = 0

Here the φ-function is given by

φ(x, rk) = f(x) + rk Σ (j = 1, . . . , m) < gj(x) >^0 =
    f(x) + m rk   if all gj(x) > 0
    f(x)          if all gj(x) ≤ 0      (5.201)

This function is discontinuous on the boundary of the acceptable region, as shown in Figure 5.44, and hence it would be very difficult to minimize.

Figure 5.44. φ–function discontinuous for q = 0


• 0 < q < 1

Here the φ–function is continuous, but the penalty for violating a constraint may be too small. Also, the derivatives of the function are discontinuous along the boundary. Thus it will be difficult to minimize the φ–function. Typical contours of the φ–function are shown in Figure 5.45.

Figure 5.45. Derivatives of φ–function discontinuous for 0 < q < 1


• q = 1

In this case, under certain restrictions, it has been shown that there exists an r◦ large enough that the minimum of φ(x, rk) is exactly the constrained minimum of the original problem for all rk > r◦. However, the contours of the φ–function look similar to those shown in Figure 5.45 and possess discontinuous first derivatives along the boundary. Hence, in spite of the convenience of choosing a single rk that yields the constrained minimum in one unconstrained minimization, the method is not very attractive from a computational point of view.

• q > 1

In this case the φ–function has continuous first derivatives, as shown in Figure 5.46. These derivatives are given by

∂φ/∂xi = ∂f/∂xi + q rk Σ (j = 1, . . . , m) < gj(x) >^(q−1) ∂gj(x)/∂xi      (5.202)

Generally, the value of q is chosen as 2 in practical computations. A value of q > 1 will be assumed in the subsequent discussion of this method.


Figure 5.46. φ–function with continuous first derivatives for q > 1

Algorithm

The exterior penalty function method can be stated by the following steps:

1. Start from any design point x1 and a suitable value of r1. Set k = 1.

2. Find the vector x∗k that minimizes the function

φ(x, rk) = f(x) + rk Σ (j = 1, . . . , m) < gj(x) >^q

3. Test whether the point x∗k satisfies all the constraints. If x∗k is feasible, it is the desired optimum; hence terminate the procedure. Otherwise, go to step 4.

4. Choose the next value of the penalty parameter so that it satisfies the relation rk+1 > rk, set the new value of k to k + 1, and go to step 2.

This procedure is indicated as a flowchart in Figure 5.47, where rk+1 is chosen, for simplicity, according to the relation

rk+1/rk = c

where c is a constant greater than one.


Figure 5.47. Flowchart for exterior penalty function method
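The following is a minimal Python sketch of the exterior penalty algorithm above, using the quadratic bracket form (q = 2) of equations (5.199)–(5.200) and the update rk+1 = c·rk with c > 1. The unconstrained minimizer, the value c = 10, and the example problem are assumptions introduced for illustration, not part of the original text.

```python
import numpy as np
from scipy.optimize import minimize

def exterior_penalty(f, gs, x1, r1=1.0, c=10.0, q=2, feas_tol=1e-6, max_outer=20):
    """Exterior penalty method: phi = f + r * sum(max(0, gj)^q), with r increased each cycle."""
    x = np.asarray(x1, dtype=float)
    r = r1
    for _ in range(max_outer):
        phi = lambda z: f(z) + r * sum(max(0.0, g(z)) ** q for g in gs)
        x = minimize(phi, x, method="Nelder-Mead").x        # step 2
        if all(g(x) <= feas_tol for g in gs):               # step 3: feasibility test
            return x
        r *= c                                              # step 4: r_{k+1} = c * r_k
    return x

# Hypothetical example: minimize f(x) = x subject to 2 - x <= 0; the iterates approach
# the optimum x = 2 from the infeasible side as r grows.
print(exterior_penalty(lambda x: x[0], [lambda x: 2.0 - x[0]], x1=[0.0]))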


Bibliography

[1] Carroll, C.W.: The Created Response Surface Technique for Optimizing Nonlinear Restrained Systems, Operations Research, Vol. 9, 1961, pp. 169–184.

[2] Cauchy, A.L.: Méthode générale pour la résolution des systèmes d'équations simultanées, C.R. Acad. Sci., Vol. 25, 1847.

[3] Davidon, W.C.: Variable Metric Method of Minimization, Argonne National Laboratory Report no. ANL–5990, 1959.

[4] Duffin, R.J., Peterson, E.L., Zener, C.: Geometric Programming: Theory and Applications, Wiley,New York, 1967.

[5] Dantzig, G.B.: Linear Programming and Extensions, Princeton University Press, Princeton, 1963.

[6] Fletcher, R. and Powell, M.J.D.: A Rapidly Convergent Descent Method for Minimization, Computer Journal, Vol. 6, no. 2, 1963, pp. 163–168.

[7] Fletcher, R. and Reeves, C.M.: Function Minimization by Conjugate Gradients, Computer Journal, Vol. 7, no. 2, 1964, pp. 149–154.

[8] Hancock, H.: Theory of Maxima and Minima, Dover, New York, 1960.

[9] Hu, T.C.: Integer Programming and Network Flows, Addison–Wesley, Reading, Massachusetts, 1969.

[10] Kuhn, H.W. and Tucker, A.W.: Nonlinear Programming, Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951.

[11] Sengupta, J.K.: Stochastic Programming: Methods and Applications, North–Holland Publishing Co.,Amsterdam, 1972.


Chapter 6

Design of Experiments

Sir Ronald Fisher was the innovator in the use of statistical methods in the design of experiments (DoE). He developed the analysis of variance as the primary method of statistical analysis in experimental design. While Fisher was clearly the pioneer, there have been many other significant contributors to the techniques of experimental design, including Yates, Bose, Kempthorne, Cochran, and Box.

Many of the early applications of experimental design methods were in the agricultural and biological sciences, and as a result, much of the terminology of the field is derived from this heritage. The first industrial applications of experimental design began to appear in the 1930s, initially in the British textile and woollen industry. After the Second World War, experimental design methods were introduced to the chemical and process industries in the United States and Western Europe. The semiconductor and electronics industry has also used experimental design methods since the late 1960s with considerable success. In recent decades, there has been a revival of interest in experimental design in the United States, as many industries discovered that their Asian competitors had been using designed experiments for many years and that this was an important factor in their competitive success.

It is time for all engineers to receive formal training in experimental design as part of their undergraduate education. The successful integration of experimental design into the engineering profession is expected to become a key factor in the future competitiveness of European industry.

6.1 General

Experiments, i.e. tests, are performed by investigators in virtually all industrial sectors. A designed experiment is a test or series of tests in which purposeful changes are made to the input variables of an industrial process or technical system so that the reasons for changes in the output response may be identified.


Experimental design methods have found broad application in many disciplines. In engineering, experimentation may be viewed as part of the scientific process and as one of the ways the experimenter learns about how technical systems or manufacturing processes work. Generally, the experimenter learns through a series of activities in which he/she makes conjectures about a process or a physical phenomenon, performs (numerical) experiments to generate data from the process, and then uses the information from the experiment to establish new conjectures, which lead to new experiments, and so on.

A process can be represented by the model shown in Figure 6.1. One can usually visualize the process as a combination of machines, methods, engineers, computers, and other resources, which transforms some input into an output that has one or more observable responses. Some of the process variables x1, x2, . . . , xp are controllable, whereas other variables z1, z2, . . . , zp are uncontrollable.

Figure 6.1. General model of a process

Experimental design methods play an important role in design and process development to improve manufacturing performance. The objective in many cases may be to develop a robust design and/or a robust process, that is, a design or process affected minimally by external sources of variability (the noise parameters z).

The application of experimental design techniques in manufacturing process development can result in

• improved product and process outcomes,
• reduced variability and closer conformance to nominal or target requirements,
• reduced development time,
• reduced overall costs.

Experimental design methods also play a major role in engineering design activities, where new products are developed and existing ones improved. Some applications include:

• evaluation and comparison of conceptual and basic design configurations;
• selection of design parameters so that the technical product will work well under a wide variety of environmental conditions (robust product);
• determination of key design variables that impact product performance.


The use of experimental design in these areas can result in products that are easier to manufacture, products with enhanced technical performance and reliability together with lower product cost, and shorter times for product design and development.

Finally, the objectives of an experiment may include the following determinations:

• which variables are most influential on the response y;
• where to set the influential x's so that y is almost always near the desired nominal value;
• where to set the influential x's so that the variability in y is small;
• where to set the influential x's so that the effects of the uncontrollable variables z1, z2, . . . , zn are minimized.

6.1.1 Basic Principles

The statistical approach to experimental design is mandatory if one wishes to draw meaningful conclusions from the data collected from a physical experiment or from the numerical analysis of a physical phenomenon or process. In order to perform a designed experiment most efficiently, a scientific approach to planning the experiment must be employed. To reach valid and objective conclusions by means of the statistical design of experiments, the primary activity is collecting the data appropriately.

When the problem involves data that are subject to experimental errors, statistical methodology is the only objective approach to analysis. Thus, there are two aspects to any experimental problem: the design of the experiment and the statistical analysis of the data. These two subjects are closely related, since the method of analysis depends directly on the design method employed.

There are three basic principles of experimental design, namely, replication, randomization, and blocking.

Replication is intended as a repetition of a physical experiment and has two important properties. First, it allows the experimenter to obtain an estimate of the experimental error. This estimate of error becomes a basic unit of measurement for determining whether observed differences in the data are really statistically different. Second, if the sample mean, y, is used to estimate the effect of a factor (variable) in the experiment, then replication permits the experimenter to obtain a more precise estimate of this effect. For example, if σ² is the variance of the data, and there are n replicates, then the variance of the sample mean is

σ²y = σ²/n

The practical implication of this fact is that, if the experimenter has only n = 1 replicate, he/she would probably be unable to make satisfactory inferences about the effect of the variables, that is, the observed difference could be the result of experimental error. On the other hand, if n is reasonably large and the experimental error is sufficiently small, then the experimenter will be reasonably safe in his/her conclusions.
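A small numerical illustration of the variance reduction obtained by replication (a sketch; the data are simulated with hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0            # standard deviation of a single observation (hypothetical)
n = 10                 # number of replicates

# Simulate many replicated experiments and look at the spread of the sample means
means = rng.normal(loc=50.0, scale=sigma, size=(5000, n)).mean(axis=1)
print(np.var(means))        # close to sigma**2 / n = 0.4
print(sigma**2 / n)
```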


Randomization is the cornerstone underlying the use of statistical methods in experimental design. By randomization it is meant that both the allocation of the experimental material and the order in which the individual runs or trials of the experiment are performed are randomly determined. Randomization usually validates the assumption, required by statistical methods, that the observations (or errors) are independently distributed random variables. By properly randomizing the experiment, the investigator is also helped in 'averaging out' the effects of exogenous, imprecise, and ambiguous factors that may be present.

Blocking is a technique used to increase the precision of an experiment. A block is a portion of the experimental data that should be more homogeneous than the entire set of data. Blocking involves making comparisons among the conditions of interest in the experiment within each block.

6.1.2 Guidelines for Designing Experiments

To use the statistical approach in designing and analyzing an experiment, the investigator must have a clear idea in advance of exactly which phenomenon or process is to be modelled, how the data are to be collected, and at least a qualitative understanding of how these data are to be analyzed. The recommended procedure is outlined in the following steps:

1. Recognition and statement of the problem. This may seem a rather obvious point, but in practice it is often not so simple to develop a clear and generally accepted statement of the problem. Usually, it is important to press for cooperation from all concerned parties. A clear statement of the problem often contributes substantially to a better understanding of the phenomenon or process and to reaching a sound solution.

2. Choice of the variables and levels. The experimenter must choose the variables to be varied in the experiment, the ranges over which these factors will be varied, and the specific levels of the variables at which runs will be made. Thought must also be given to how these factors are to be controlled at the desired values and how they are to be measured. It is necessary to investigate all variables that may be of importance, particularly when the investigator is in the early stages of the designed experiment. When the objective is factor screening or process characterization, it is usually best to keep the number of variable levels low (most often two or three levels are used).

3. Selection of the response. In selecting the response variable, the experimenter should be certain that this variable really provides useful information about the phenomenon under study. Most often, the average or the standard deviation (or both) of the measured characteristic will be the response variable.


4. Choice of the experimental design. If the first three steps are done correctly, this step is relatively easy. Selection of the appropriate design method involves consideration of sample size, selection of a suitable run order for the experimental trials, and determination of whether or not blocking or other randomization restrictions are involved, while keeping the experimental objectives in mind. In many engineering experiments, the investigator already knows at the outset that some of the factor levels will result in different values of the response. Consequently, he/she has to identify which variables cause this difference and to estimate the magnitude of the response variation.

5. Performing the experimental design. Errors in experimental procedure at this stage will usually destroy experimental validity. It is easy to underestimate the planning aspects of running a designed experiment in a complex research and development environment.

6. Data analysis. Statistical methods should be used to analyze the data so that results and conclusions are objective in nature. If the experiment has been designed correctly and has been performed according to the design, the statistical methods required are not complicated. There are many excellent software packages designed to assist in data analysis, and simple graphical methods play an important role in data interpretation. Residual analysis and checking of model goodness are important analysis techniques. Notice that statistical methods cannot prove that a variable has a particular effect; they only provide guidelines as to the reliability and validity of results. Properly applied, statistical methods allow one to measure the likely error in a conclusion or to assign a level of confidence to a statement. The primary advantage of statistical methods is that they add objectivity to the decision–making process. Statistical techniques coupled with good engineering knowledge will usually lead to sound conclusions.

7. Conclusions and recommendations. Once the data have been analyzed, the experimenter must draw practical conclusions about the results. Graphical methods are often useful at this stage, particularly in presenting the results. Follow–up runs and confirmation testing should also be performed to validate the conclusions from the experiment. Throughout this entire process, it is important to keep in mind that experimentation is an important part of the learning process. This suggests that experimentation is iterative: it is usually a major mistake to design a single, large, comprehensive experiment at the start of a study. A successful experiment requires knowledge of the primary variables, the ranges over which they should be varied, and the appropriate number of levels to use. As an experimental program progresses, the experimenter often drops some input variables, adds others, changes the region of exploration for some factors, or adds new response variables.

6.1.3 Statistical Techniques in Designing Experiments

Much of the research in science and industry is empirical and makes extensive use of (numerical) experimentation. Statistical methods can greatly increase the efficiency of experiments and often strengthen the conclusions so obtained. The intelligent use of statistical techniques in experimental design requires that the investigator keep the following points in mind:


• Use non-statistical knowledge of the problem. In some fields there is a large body of physical theory on which to draw in explaining the relationships between variables and responses. This type of nonstatistical knowledge is invaluable in choosing variables, determining variable levels, interpreting the results of the analysis, and so forth. Using statistics is no substitute for thinking about the problem.

• Keep the design and analysis as simple as possible. Do not overdo the use of complex, sophisticated statistical techniques. Relatively simple design and analysis methods are almost always the best. If the investigator builds the experimental design carefully and correctly, the analysis will almost always be relatively straightforward.

• Recognize the difference between practical and statistical significance. Just because two experimental conditions produce mean responses that are statistically different, there is no assurance that this difference is large enough to have any practical value.

• Experiments are usually iterative. Remember that, in most situations, it is unwise to design too comprehensive an experiment at the start of a study. This argues in favor of the iterative or sequential approach. Of course, there are situations where comprehensive experiments are entirely appropriate, but as a general rule, most experiments should be iterative.

6.1.4 Basic Concept of Probability

Engineers must deal with uncertainty, which is an all–pervading and dominant characteristic of engineering practice. Engineering uncertainty may occur in three basic ways.

First, it occurs when the investigator measures something or makes predictions of dependent variables from measured or computed quantities. Engineers are inclined to assume that this kind of uncertainty does not exist at all. They tend to treat problems of this type as deterministic and to overdesign to compensate for uncertainty. This must be considered a rather crude treatment of this kind of uncertainty. It is necessary to tackle the problem in a more rational way, using the concepts of probability theory. For example, in the case of a towing resistance experiment, naval architects would expect the measurement to be repeatable with very consistent results. This is typical of engineering measurements, in which the discrimination of the instrument's scale corresponds to its inherent variation, and one gets the impression that the measurement is exact, or almost exact. However, substantial uncertainty about the true value of the towing resistance may really exist. It may be due to a bias in the measurement technique, or to variations in temperature and humidity conditions in the model basin. Furthermore, the equation predicting total resistance at full scale may be empirical, with considerable uncertainty about the 'correct' values of the scaling coefficients. Or the predictive law may be a poor model for the true behaviour, to an uncertain degree.

The second basic type of engineering uncertainty occurs when engineers are concerned with an event that may or may not occur, or whose time of occurrence may be uncertain. Typical examples are hydrodynamic loads on a ship structure, or the frequency with which a seakeeping phenomenon reaches a critical level.


Designers may also be uncertain about the validity of a hypothesis or theory used to predict the performance of a technical system or subsystem. This is similar to the case mentioned in the discussion of the first type of uncertainty, namely the question of a poor model for the design analysis. However, there the designers were concerned with whether or not a model, known to have good validity in certain circumstances, was applicable to a particular design for which the circumstances are uncertain. Here designers are concerned with the validity of one or perhaps several alternative models when the circumstances are well defined or well controlled.

To deal with uncertainty in a rational, ordered manner, engineers must be able, in a sense, to measure it. The term event will be used to designate all the kinds of issues engineers are uncertain about. The measure of the degree of uncertainty about the likelihood of an event occurring is probability. In attempting to measure probability, the experimenter begins by arbitrarily defining its range as 0 to 1, or 0 to 100%. If he/she is certain that an event will not occur, it is said to have a probability of 0; if he/she is certain that an event will occur, it has a probability of 1. In general, engineers are faced with the problem of measuring intermediate values of probability, where the measure is neither obvious from the real situation nor known a priori. Situations in which a priori probabilities may be assumed represent a rather special type of situation, quite unrepresentative of engineering problems.

Designers are concerned with random or unpredictable events, so they have to understand what is really meant by the likelihood of an event occurring. In this respect, there are in fact several meanings. A straightforward meaning uses the frequency concept, which is clear enough for repeatable events with a priori probabilities.

There are other, apparently different, meanings for probability. The rather simple meaning defining probability as the relative frequency of occurrence of an event should be examined a little more closely. Experience shows that the variation in frequency becomes less as the sample increases. Probability can be defined in this sense more precisely as the limit of the ratio of the number of occurrences to the number of trials, as the number of trials increases without limit.

Probability is also a measure of the risk that an event will or will not occur, a measure of uncertainty about the occurrence of an event, and a measure of the degree of a designer's belief that an event will occur. These are closely related concepts, and represent successive increases in the generality of the concept of probability.

6.2 Probabilistic Approach

In the probabilistic approach, uncertainties are characterized by the probabilities associated with events. The probability of an event can be interpreted in terms of the frequency of occurrence of that event. When a large number of samples or experiments is considered, the probability of an event is defined as the ratio of the number of times the event occurs to the total number of samples or experiments.


For example, the statement that the probability of the vertical acceleration av lying between a1 and a2 equals p means that, from a large number of independent measurements of the acceleration av under identical conditions, the number of times the value of av lies between a1 and a2 is roughly equal to the fraction p of the total number of samples.

Probabilistic analysis is the most widely used method for characterizing uncertainty in physical systems, especially when experiments are concerned with estimates of the probability distributions of uncertain parameters. This approach can describe uncertainty arising from stochastic disturbances, variability conditions, and risk considerations; the uncertainties associated with model inputs are described by probability distributions, and the objective is to estimate the output probability distributions. This process comprises two stages:

• Probability encoding of inputs. This process involves the determination of the probability distributions of the input variables and the incorporation of their random variations due to both exogenous variability (from, e.g., meteorology or finance) and errors. This is accomplished by using either statistical estimation techniques or expert judgement. Statistical estimation techniques involve estimating probability distributions from available data or by collection of a large number of representative samples. In cases where only limited data are available, expert judgement provides the information about the input probability distribution. The normal distribution is typically used to describe unbiased measurement errors.

Figure 6.2. Schematics of the propagation of uncertainties in a mathematical model

• Propagation of uncertainty through models. Figure 6.2 depicts schematically the concept of uncertainty propagation: each point of the response surface (i.e., each calculated output value) corresponding to changes in inputs 1 and 2 will be characterized by a probability density function that will depend on the pdfs of the inputs. A minimal numerical sketch of this two-stage process is given below.
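As a purely illustrative sketch of the two stages above, the following Python fragment encodes two hypothetical inputs as probability distributions and propagates them through an invented response function by Monte Carlo sampling; the model, parameter values, and sample size are assumptions chosen only for the example, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 100_000

# Stage 1 - probability encoding of the inputs (assumed distributions):
# input 1 ~ normal (e.g. an unbiased measurement error), input 2 ~ lognormal.
x1 = rng.normal(loc=10.0, scale=0.5, size=n_samples)
x2 = rng.lognormal(mean=0.0, sigma=0.25, size=n_samples)

# Stage 2 - propagation through a (hypothetical) response-surface model.
def response(x1, x2):
    return 3.0 * x1 + 2.0 * x1 * x2 ** 2

y = response(x1, x2)

# The output pdf is characterized empirically by its sample statistics.
print(f"mean = {y.mean():.2f}, std = {y.std(ddof=1):.2f}")
print(f"5th-95th percentile band: {np.percentile(y, [5, 95])}")
```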


6.2.1 Basic Statistical Concepts

Simple comparative experiments are often considered to compare two formulations (treatments).

As an example, consider an experiment which was performed to determine whether two different variable levels of a product give equivalent results. The tension bond strength of portland cement mortar is an important characteristic of the product. An engineer may be interested in comparing the strength of a modified formulation, in which polymer latex emulsions have been added during mixing, to the strength of the unmodified mortar. The experimenter has collected 10 observations of strength for the modified level of the variable and another 10 observations for the unmodified level. The data from this experiment are plotted in Figure 6.3. This display is called a dot diagram.

Figure 6.3. Dot diagram for the tension bond strength

Visual examination of these data gives the immediate impression that the response of the unmodified formulation is greater than that of the modified variable level. This impression is supported by comparing the average responses, ȳ1 = 16.76 kgf/cm² for the modified level and ȳ2 = 17.92 kgf/cm² for the unmodified level. The average responses in these two samples differ by what seems to be a nontrivial amount.

However, it is not obvious that this difference is large enough to imply that the two treatments are really different. Perhaps this observed difference in average strengths is the result of sampling fluctuation and the two treatments are really identical. Possibly two other samples would give opposite results, with the strength of the modified treatment exceeding that of the unmodified formulation.

A technique of statistical inference called hypothesis testing, or significance testing, can be used to assist the experimenter in comparing two treatments. Hypothesis testing allows the comparison of the two formulations to be made in objective terms, with a knowledge of the risks associated with reaching the wrong conclusion. To present procedures for hypothesis testing in simple comparative experiments, however, it is first necessary to develop and review some elementary statistical concepts, such as random variables, probability distributions, random samples, sampling distributions, and tests of hypotheses.

Random variables may be discrete or continuous. If the set of all possible values of the random variable is either finite or countably infinite, then the random variable is discrete, whereas if the set of all possible values of the random variable is an interval, then the random variable is continuous. Discrete random design variables are rather rare in the design of technical components, but are not uncommon in the design of technical systems. Environmental random discrete variables also occur.


Sample of Data

Because of the fundamental importance of the probability density function (pdf) in probabilistic design, it is extremely important to illustrate its concept. The pdf is the basic tool for codifying and communicating uncertainty about the value of a continuously varying variable.

There are several ways of developing the concept of a probability density function. Simple graphical methods are often used to assist in analyzing the data from an experiment. The dot diagram, illustrated in Figure 6.3, is a very useful device for displaying a small body of data (say, up to about 20 observations). The dot diagram enables the experimenter to see quickly the general location or central tendency of the observations and their spread.

If the data are a very large random sample, the dots in a dot diagram become difficult to distinguish, and a histogram may be preferable. If a sample of values of a random variable is analyzed to give frequencies in specified intervals, the results can be plotted as a histogram, as shown in Figure 6.4. The histogram shows the central tendency, spread, and general shape of the probabilistic distribution of the data.

Figure 6.4. Frequency histogram

The box plot (or box and whisker plot) is a very useful way to display data. A box plot displays the minimum, the maximum, the lower and upper quartiles (the 25th and the 75th percentiles, respectively), and the median (the 50th percentile) on a rectangular box aligned either horizontally or vertically. The box extends from the lower quartile to the upper quartile, and a line is drawn through the box at the median. Lines (or whiskers) extend from the ends of the box to the minimum and maximum values.

Figure 6.5 presents the box plots for the two samples of tension bond strength in the portland cement mortar experiment. This display clearly reveals the difference in mean strength between the two treatments. It also indicates that both variable levels produce reasonably symmetric distributions of strength with similar variability or spread.

Dot diagrams, histograms, and box plots are useful for summarizing the information in a sample of data. To describe the observations that might occur in a sample more completely, the concept of the probability distribution is used.


Figure 6.5. Box plots

Probability Distributions

The probability structure of a random variable, say y, is described by its probability distribution. If the random variable is discrete, the probability distribution of y, say p(y) = f(y), is often called the probability function of y and is represented directly by a probability mass function, as illustrated in Figure 6.6. The function f(y) now represents the actual probability of the value y occurring; it is its height that represents probability.

Figure 6.6. Discrete probability distribution

If y is continuous, the probability distribution of y, say f(y), is usually called the probability density function (pdf) of y. Figure 6.7 illustrates a hypothetical continuous probability distribution together with the cumulative distribution function F(y). It is the area under the curve f(y) associated with a given interval that represents probability.


Figure 6.7. Continuous probability distribution

The properties of probability distributions may be summarized quantitatively as follows:

if y is discrete:      0 ≤ p(y_j) ≤ 1   for all values of y_j

                       P(y = y_j) = p(y_j)   for all values of y_j

                       Σ_{j=1}^{n} p(y_j) = 1

if y is continuous:    f(y) ≥ 0

                       P(a ≤ y ≤ b) = ∫_a^b f(y) dy

                       ∫_{−∞}^{+∞} f(y) dy = 1

Characteristic Measures of a Random Variable

The random nature of a variable is commonly represented in a limited way by a central measure and a 'scatter' measure. The central measure may be the mean, median, or mode.

The mean value is a weighted average, in which the weighting factors are the probabilities associated with each value. It is a measure of the central tendency or location of a probability distribution. The mean is commonly designated µ, and is defined mathematically as

   µ = ∫_{−∞}^{+∞} y·f(y) dy               if y is continuous
                                                                              (6.1)
   µ = Σ_{j=1}^{n} y_j·p(y_j)              if y is discrete

where n is the set size, y_j is the jth discrete value, and p(y_j) is the probability function.


The mean may also be called the expected value of y, designated E(y), where E(·) is the expected value operator, or the long-run average value of the random variable y.

The median is that value of the random variable for which any other sampled value is equally likely to be above or below. It is defined by means of the cumulative distribution function as

   0.5 = ∫_{−∞}^{y_{0.5}} f(y) dy = F(y_{0.5})                               (6.2)

where y_{0.5} is the median.

This can be generalized to give fractiles or percentiles

   ξ = ∫_{−∞}^{y_ξ} f(y) dy = F(y_ξ)                                         (6.3)

Thus y_{0.25} is the value of y corresponding to a cumulative distribution function value of 0.25, or a 25% probability that a sampled y value will be less than y_{0.25}; it is called the 25th percentile.

The mode is the most likely value of the random variable, and corresponds to the maximum of the probability density function, or of the probability function for a discrete variable. A density function can be multimodal, usually as a result of combining two different populations.

The spread or dispersion of a probability distribution can be measured by the variance, also called the second central moment, defined as

   σ² = ∫_{−∞}^{+∞} (y − µ)²·f(y) dy        if y is continuous
                                                                              (6.4)
   σ² = Σ_{j=1}^{n} (y_j − µ)²·p(y_j)       if y is discrete

The square root of the variance is the standard deviation, σ, which is the commonly used characteristic measure of dispersion, or 'width', of the density function.

Notice that the variance can be expressed in terms of expectation, since

   σ² = E[(y − µ)²]                                                          (6.5)

Finally, the variance is used so extensively that it is convenient to define a variance operator V such that

   V(y) = E[(y − µ)²] = σ²                                                   (6.6)
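To make the definitions (6.1)-(6.4) concrete, the short Python sketch below evaluates the mean, median, mode, and variance of an assumed continuous pdf by numerical integration; the chosen distribution (a lognormal) and the grid resolution are illustrative assumptions only.

```python
import numpy as np
from scipy import stats

# An assumed continuous pdf (lognormal), tabulated on a fine grid.
y = np.linspace(1e-6, 20.0, 200_001)
dy = y[1] - y[0]
f = stats.lognorm.pdf(y, s=0.5, scale=np.exp(1.0))

mu = np.sum(y * f) * dy                      # mean, eq. (6.1)
var = np.sum((y - mu) ** 2 * f) * dy         # variance, eq. (6.4)
F = np.cumsum(f) * dy                        # cumulative distribution F(y)
median = y[np.searchsorted(F, 0.5)]          # eq. (6.2): F(median) = 0.5
mode = y[np.argmax(f)]                       # location of the pdf maximum

print(f"mean={mu:.3f}  median={median:.3f}  mode={mode:.3f}  std={np.sqrt(var):.3f}")
```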

The concepts of expected value and variance are used extensively in statistics, and it may be helpful to review several elementary results concerning these operators. If y is a random variable with mean µ and variance σ², and c is a constant, then

• E(c) = c

• E(y) = µ


• E(cy) = c E(y) = c µ

• V(c) = 0

• V(y) = σ²

• V(cy) = c² V(y) = c² σ²

• If there are two random variables, for example y1 with E(y1) = µ1 and V(y1) = σ1², and y2 with E(y2) = µ2 and V(y2) = σ2², then
  E(y1 + y2) = E(y1) + E(y2) = µ1 + µ2

• It is possible to show that
  V(y1 + y2) = V(y1) + V(y2) + 2 Cov(y1, y2)
  where Cov(y1, y2) = E[(y1 − µ1)·(y2 − µ2)] is the covariance of the random variables y1 and y2. The covariance is a measure of the linear association between y1 and y2. More specifically, it may be shown that if y1 and y2 are independent, then Cov(y1, y2) = 0.

• It may also be shown that
  V(y1 − y2) = V(y1) + V(y2) − 2 Cov(y1, y2)

• If y1 and y2 are independent, then
  V(y1 − y2) = V(y1) + V(y2) = σ1² + σ2²
  and
  E(y1·y2) = E(y1)·E(y2) = µ1·µ2

• However, note that, in general,
  E(y1/y2) ≠ E(y1)/E(y2)
  regardless of whether or not y1 and y2 are independent.

6.2.2 Statistical Inference

The objective of statistical inference is to draw conclusions about a population using a sample from that population.

Random Sampling, Sample Mean, and Sample Variance

Most statistical inference methods assume that random samples are used. That is, if the population contains N elements and a sample of n of them is to be selected, then if each of the N!/[(N − n)! n!] possible samples has an equal probability of being chosen, the procedure employed is called random sampling.

Statistical inference makes considerable use of quantities computed from the observations in the sample. For example, suppose that y1, y2, . . . , yn represents a sample.


Then the sample mean

   ȳ = (1/n) Σ_{i=1}^{n} y_i                                                 (6.7)

and the sample variance

   S² = Σ_{i=1}^{n} (y_i − ȳ)² / (n − 1)                                     (6.8)

are both statistics. These quantities are measures of the central tendency and dispersion of the sample, respectively. Sometimes S = √S², called the sample standard deviation, is used as a measure of dispersion. Engineers often prefer to use the standard deviation to measure dispersion because its units are the same as those of the variable of interest y.
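A minimal sketch of equations (6.7) and (6.8), applied to an invented sample (the numbers are placeholders, not data from the text):

```python
import numpy as np

# Hypothetical sample of n = 8 observations.
y = np.array([16.8, 17.1, 16.5, 17.4, 16.9, 17.2, 16.6, 17.0])
n = y.size

y_bar = y.sum() / n                        # sample mean, eq. (6.7)
s2 = ((y - y_bar) ** 2).sum() / (n - 1)    # sample variance, eq. (6.8)
s = np.sqrt(s2)                            # sample standard deviation

# The same results follow from numpy's built-ins (ddof=1 gives the n-1 divisor).
assert np.isclose(y_bar, y.mean()) and np.isclose(s2, y.var(ddof=1))
print(f"y_bar = {y_bar:.3f}, S^2 = {s2:.4f}, S = {s:.4f}")
```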

Properties of the Sample Mean and Sample Variance

The sample mean ȳ is a point estimator of the population mean µ, and the sample variance S² is a point estimator of the population variance σ². In general, an estimator of an unknown parameter is a statistic that corresponds to that parameter. Note that a point estimator is a random variable. It may be easily demonstrated that ȳ and S² are unbiased estimators of µ and σ², respectively.

For the sample mean, using the properties of expectation,

   E(ȳ) = E[ (1/n) Σ_{i=1}^{n} y_i ] = (1/n) Σ_{i=1}^{n} E(y_i) = (1/n) Σ_{i=1}^{n} µ = µ          (6.9)

Since the expected value of each observation y_i is µ, ȳ is an unbiased estimator of µ.

Now consider the sample variance S²; one has

   E(S²) = E[ (1/(n − 1)) Σ_{i=1}^{n} (y_i − ȳ)² ] = [1/(n − 1)] E(SS)

where SS = Σ (y_i − ȳ)² is the corrected sum of squares of the observations y_i. Since

   E(SS) = E[ Σ_{i=1}^{n} (y_i − ȳ)² ] = E[ Σ_{i=1}^{n} y_i² − n ȳ² ]
         = Σ_{i=1}^{n} (µ² + σ²) − n (µ² + σ²/n) = (n − 1) σ²                (6.10)

the expected sample variance reads

   E(S²) = E(SS)/(n − 1) = σ²                                                (6.11)

and it can be seen that S² is an unbiased estimator of σ².

The quantity (n − 1) in equation (6.10) is called the number of degrees of freedom of the sum of squares SS.
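As an illustrative check of the unbiasedness results (6.9) and (6.11), the sketch below repeatedly draws samples from an assumed normal population and averages ȳ and S² over many replications; the population parameters and sample size are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000   # assumed population and sample size

samples = rng.normal(mu, sigma, size=(reps, n))
y_bar = samples.mean(axis=1)                 # sample means
s2 = samples.var(axis=1, ddof=1)             # sample variances with n-1 divisor

# Long-run averages approach the population values, i.e. E(y_bar)=mu, E(S^2)=sigma^2.
print(f"average of y_bar : {y_bar.mean():.4f}  (mu      = {mu})")
print(f"average of S^2   : {s2.mean():.4f}  (sigma^2 = {sigma**2})")
```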


If y is a random variable with variance σ² and SS = Σ (y_i − ȳ)² has (n − 1) degrees of freedom, then

   E[ SS/(n − 1) ] = σ²                                                      (6.12)

The number of degrees of freedom of a sum of squares is equal to the number of independent elements in that sum of squares.

6.2.3 Probability Density Functions

Theoretical continuous distributions tend to be used because of their convenience, despite rather limited physical justification. There is little need for this in design applications, where a purely numerical representation of distributions can be quite adequate when numerical simulations are used. However, if there is evidence that a theoretical distribution can be fitted well to a random variable, this is useful information, and more confidence can be placed in the small-sample information used to define the distribution parameters.

It is often possible, therefore, to determine the probability distribution of a particular statistic if one knows the probability distribution of the population from which the sample was drawn. Several useful sampling distributions are briefly discussed below.

Normal Distribution

The normal or Gaussian function is historically the dominant theoretical probability function in the theory of statistics, and has a central role in the theory of statistical inference. Although it has been widely used in scientific work to represent populations arising from natural phenomena, and in error theory, its use in engineering work is much more limited. Its main disadvantages are that it must be symmetrical, and that its tails go to infinity at both ends (Fig. 6.8).

Figure 6.8. The normal distribution

Nevertheless, since sample runs that differ as a result of experimental errors are often well described by a normal distribution, the normal distribution plays a central role in the analysis of data from


designed experiments. Many important sampling distributions may also be defined in terms of normal random variables. The notation y ∼ N(µ, σ²) is often used to denote that y is distributed normally with mean µ and variance σ².

If y is a normal random variable, then the probability density function of y has the form

   f(y) = [1/(σ√(2π))] e^{−(y−µ)²/(2σ²)}          −∞ < y < ∞                 (6.13)

An important special case of the normal distribution is the standard normal distribution, where µ = 0 and σ² = 1. It is evident that if y ∼ N(µ, σ²), then the random variable

   z = (y − µ)/σ                                                             (6.14)

follows the standard normal distribution, denoted z ∼ N(0, 1). The operation demonstrated in equation (6.14) is often called standardizing the normal random variable y.

The Central Limit Theorem.

If y1, y2, . . . , yn is a sequence of n independent and identically distributed random variables with E(y_i) = µ and V(y_i) = σ², and x = y1 + y2 + . . . + yn is the sum of a large number of independent elements, each of which has a small effect on the sum, then the distribution of the variable

   z_n = (x − nµ)/√(nσ²)

tends to be normal. In other terms, the statistic z_n has an approximate N(0,1) distribution in the sense that, if F_n(z) is the distribution function of z_n and Φ(z) is the distribution function of the N(0,1) random variable, then

   lim_{n→∞} [F_n(z)/Φ(z)] = 1

This result states essentially that the sum of n independent and identically distributed random variables is approximately normally distributed. In many cases this approximation is good for very small n, say n < 10, whereas in other cases a large n is required, say n > 100.
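The following sketch illustrates the central limit theorem numerically: sums of independent, identically distributed non-normal variables (here an assumed uniform population) are standardized as above and compared with the standard normal distribution. The population and the values of n are arbitrary choices for the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps = 100_000

for n in (2, 10, 30):
    # Sums of n iid uniform(0,1) variables: mu = 0.5, sigma^2 = 1/12.
    x = rng.random(size=(reps, n)).sum(axis=1)
    z_n = (x - n * 0.5) / np.sqrt(n / 12.0)

    # Compare the empirical distribution of z_n with Phi via a Kolmogorov-Smirnov distance.
    d, _ = stats.kstest(z_n, "norm")
    print(f"n = {n:3d}:  KS distance from N(0,1) = {d:.4f}")
```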

Chi–square Distribution

Many statistical techniques assume that the random variable is normally distributed. An important sampling distribution that can be defined in terms of normal random variables is the chi-square or χ² distribution. If z1, z2, . . . , zk are normally and independently distributed random variables with µ = 0 and σ² = 1, denoted NID(0, 1), then the random variable

   χ²_k = z1² + z2² + . . . + zk²

follows the chi-square distribution with k degrees of freedom.


The probability density function of the chi-square distribution is

   f(χ²) = [1/(2^{k/2} Γ(k/2))] (χ²)^{(k/2)−1} e^{−χ²/2}                     (6.15)

where Γ is the standard gamma function.

Different chi-square distributions are shown in Figure 6.9; these distributions are asymmetric, or skewed, with mean and variance given respectively by µ = k and σ² = 2k.

Figure 6.9. Various chi–square distributions

As an example of a random variable that follows the chi-square distribution, suppose that y1, y2, . . . , yn is a random sample from an N(µ, σ²) distribution. Then

   SS/σ² = Σ_{i=1}^{n} (y_i − ȳ)²/σ² ∼ χ²_{n−1}                              (6.16)

that is, a sum of squares of normal random variables, when divided by σ², follows the chi-square distribution.

As many of the statistical techniques involve the computation and manipulation of sums of squares, the result given in equation (6.16) is extremely important and occurs repeatedly.

t–Distribution

If z and χ²_k are independent standard normal and chi-square random variables, respectively, then the random variable

   t_k = z/√(χ²_k/k)                                                         (6.17)

follows the t-distribution (Student's t) with k degrees of freedom, denoted t_k.


The density function of t is

   f(t) = { Γ[(k + 1)/2] / [√(kπ) Γ(k/2)] } · 1/[(t²/k) + 1]^{(k+1)/2}          −∞ < t < ∞          (6.18)

and the mean and variance of t are µ = 0 and σ² = k/(k − 2) for k > 2, respectively.

Various t distributions are shown in Figure 6.10.

Figure 6.10. Different t distributions

Note that if k = ∞, the t distribution becomes the standard normal distribution. If y1, y2, . . . , yn is a random sample from the N(µ, σ²) distribution, then the quantity

   t = (ȳ − µ)/(S/√n) ∼ t_{n−1}                                              (6.19)

is distributed as t with (n − 1) degrees of freedom.

F–Distribution

The final sampling distribution considered is the F-distribution (Fisher distribution). If χ²_u and χ²_v are two independent chi-square random variables with u and v degrees of freedom, respectively, then the ratio

   F_{u,v} = (χ²_u/u)/(χ²_v/v)                                               (6.20)

follows the F distribution with u numerator degrees of freedom and v denominator degrees of freedom. The probability density function of F is

   h(F) = { Γ[(u + v)/2] (u/v)^{u/2} F^{(u/2)−1} } / { Γ(u/2) Γ(v/2) [(u/v)F + 1]^{(u+v)/2} }          (6.21)


Several F distributions are shown in Figure 6.11. This distribution is very important in the statistical analysis of designed experiments.

Figure 6.11. Several F distributions

6.3 Inferences about Differences in Randomized Designs

In this section the discussion will be on how the data from a simple comparative experiment can be analyzed using hypothesis testing and confidence interval procedures for comparing two treatment means (responses). It will be assumed that a completely randomized experimental design is used. In such a design, the data are viewed as if they were a random sample from a normal distribution.

6.3.1 Hypothesis Testing

A statistical hypothesis is a statement about the parameters of a probability distribution. This may be stated formally as

   H◦ : µ1 = µ2
   H1 : µ1 ≠ µ2

where µ1 and µ2 are the mean values of the two treatments. The statement H◦ : µ1 = µ2 is called the null hypothesis, whereas H1 : µ1 ≠ µ2 is called the alternative hypothesis. The alternative hypothesis specified here is called a two-sided alternative hypothesis since it would be true either if µ1 < µ2 or if µ1 > µ2.

To test a hypothesis, a procedure is devised for taking a random sample, computing an appropriate test statistic, and then rejecting or failing to reject the null hypothesis H◦. Part of this procedure is specifying the set of values of the test statistic that leads to rejection of H◦. This set of values is called the critical region or rejection region for the test.

Two kinds of errors may be committed when testing hypotheses. If the null hypothesis is rejected when it is true, then a type I error has occurred. If the null hypothesis is not rejected when it is false, then a type II error has been made. The probabilities of these two errors are given special symbols, namely


   α = P(type I error) = P(reject H◦ | H◦ is true)

   β = P(type II error) = P(fail to reject H◦ | H◦ is false)

Sometimes it is more convenient to work with the power of the test, where

   Power = 1 − β = P(reject H◦ | H◦ is false)

The general procedure in hypothesis testing is to specify a value of the probability of type I error α, often called the significance level of the test, and then design the test procedure so that the probability of type II error β has a suitably small value.

Assume that the variances of the two treatments are identical. Then an appropriate test statistic for comparing two treatment means in the completely randomized design is

   t◦ = (ȳ1 − ȳ2) / [ S_p √(1/n1 + 1/n2) ]                                   (6.22)

where ȳ1 and ȳ2 are the sample means, n1 and n2 are the sample sizes, and S_p² is an estimate of the common variance σ1² = σ2² = σ², computed from

   S_p² = [ (n1 − 1) S1² + (n2 − 1) S2² ] / (n1 + n2 − 2)                    (6.23)

where S1² and S2² are the two individual sample variances.

To determine whether to reject the hypothesis H◦ : µ1 = µ2, the experimenter could compare t◦ to the t-distribution with (n1 + n2 − 2) degrees of freedom. If | t◦ | > t_{α/2, n1+n2−2}, where t_{α/2, n1+n2−2} is the upper α/2 percentage point of the t distribution, the experimenter would reject H◦ and conclude that the mean values of the two treatments of the experiment differ.

In some problems, the experimenter may wish to reject the hypothesis H◦ only if one mean is larger than the other. In that case he/she would specify a one-sided alternative hypothesis H1 : µ1 > µ2 and would reject H◦ only if t◦ > t_{α, n1+n2−2}. If the experimenter wants to reject H◦ only if µ1 is less than µ2, then the alternative hypothesis is H1 : µ1 < µ2, and he/she would reject H◦ if t◦ < −t_{α, n1+n2−2}.
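A minimal sketch of the pooled two-sample t-test of equations (6.22)-(6.23); the two samples below are invented placeholders (the individual mortar-strength observations are not reproduced in the text), so the numbers serve only to show the mechanics.

```python
import numpy as np
from scipy import stats

# Hypothetical samples for the two treatments (placeholder values).
y1 = np.array([16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57])
y2 = np.array([17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15])
n1, n2 = y1.size, y2.size
alpha = 0.05

# Pooled estimate of the common variance, eq. (6.23), and test statistic, eq. (6.22).
sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
t0 = (y1.mean() - y2.mean()) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n1 + n2 - 2)   # upper alpha/2 point
print(f"t0 = {t0:.3f}, critical value = +/-{t_crit:.3f}")
print("reject H0" if abs(t0) > t_crit else "fail to reject H0")

# scipy reproduces the same statistic (pooled test corresponds to equal_var=True).
print(stats.ttest_ind(y1, y2, equal_var=True))
```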

6.3.2 Choice of Sample Size

Selection of an appropriate sample size is one of the most important aspects of any experimental design problem. The choice of sample size and the probability of type II error β are closely connected. Suppose that the experimenter is testing the hypotheses

H◦ : µ1 = µ2

H1 : µ1 ≠ µ2


and that the means are not equal, so that δ = µ1 − µ2. Since the hypothesis H◦ : µ1 = µ2 is not true, one is concerned about wrongly failing to reject H◦. The probability of type II error depends on the true difference in means δ. A graph of β versus δ for a particular sample size is called the operating characteristic curve for the test. The β error is also a function of sample size. Generally, for a given value of δ, the β error decreases as the sample size increases. That is, a specified difference in means is easier to detect for larger sample sizes than for smaller ones.

A set of operating characteristic curves for the given hypotheses, for the case where the two population variances σ1² and σ2² are unknown but equal (σ1² = σ2² = σ²) and for a level of significance α = 0.05, is shown in Figure 6.12.

Figure 6.12. Operating characteristic curves for the two-sided t-test with α = 0.05

The curves also assume that the sample sizes from the two populations are equal; that is, n1 = n2 = n. The parameter on the horizontal axis in Figure 6.12 is

   d = |µ1 − µ2|/(2σ) = |δ|/(2σ)

Dividing |δ| by 2σ allows the experimenter to use the same set of curves regardless of the value of the variance (the difference in means is expressed in standard deviation units). Furthermore, the sample size used to construct the curves is actually n* = 2n − 1.

Examining these curves permits the following observations (a numerical sketch follows the list):

• the greater the difference in means, µ1 − µ2, the smaller the probability of type II error for a given sample size and significance level α; that is, for a specified sample size and α, the test will detect large differences more easily than small ones;

• as the sample size gets larger, the probability of type II error gets smaller for a given difference in means and α; that is, to detect a specified difference δ, the experimenter may make the test more powerful by increasing the sample size.
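The sketch below computes the power 1 − β of the two-sided pooled t-test as a function of the common sample size n, using the noncentral t distribution; the assumed difference δ, standard deviation σ, and significance level are illustrative values only.

```python
import numpy as np
from scipy import stats

delta, sigma, alpha = 0.5, 1.0, 0.05   # assumed true difference, std dev, significance level

for n in (5, 10, 20, 40):
    df = 2 * n - 2
    ncp = delta / (sigma * np.sqrt(2.0 / n))          # noncentrality parameter
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    # beta = P(-t_crit < T' < t_crit) where T' is noncentral t; power = 1 - beta.
    beta = stats.nct.cdf(t_crit, df, ncp) - stats.nct.cdf(-t_crit, df, ncp)
    print(f"n = {n:3d}:  power = {1.0 - beta:.3f}")
```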


Operating characteristic curves are often helpful in selecting a sample size to use in an experiment. They often play an important role in the choice of sample size in experimental design problems. For a discussion of the uses of operating characteristic curves for other similar simple comparative experiments, see Hines and Montgomery (1990).

6.3.3 Confidence Intervals

Although hypothesis testing is a useful procedure, it sometimes does not tell the entire story. It is often preferable to provide an interval within which the value of the parameter or parameters in question would be expected to lie. These interval statements are called confidence intervals. In many engineering and industrial experiments, the experimenter already knows that the means µ1 and µ2 differ; consequently, hypothesis testing on µ1 = µ2 is of little interest. The experimenter would usually be more interested in a confidence interval on the difference in means µ1 − µ2.

To define a confidence interval, suppose that θ is an unknown parameter. To obtain an interval estimate of θ, the experimenter needs to find two statistics L and U such that the following probability statement holds

P (L ≤ θ ≤ U) = 1− α (6.24)

where the interval

L ≤ θ ≤ U (6.25)

is called a 100(1 − α) percent confidence interval for the parameter θ. The interpretation of this interval is that if, in repeated random samplings, a large number of such intervals are constructed, 100(1 − α) percent of them will contain the true value of θ. The statistics L and U are called the lower and upper confidence limits, respectively, and (1 − α) is called the confidence coefficient. If α = 0.05, then equation (6.25) is called a 95 percent confidence interval for θ. Note that confidence intervals have a frequency interpretation; that is, the experimenter does not know whether the statement is true for this specific sample, but he/she does know that the method used to produce the confidence interval yields correct statements 100(1 − α) percent of the time.

Case where σ1² ≠ σ2²

If the experimenter is testing the hypotheses

   H◦ : µ1 = µ2
   H1 : µ1 ≠ µ2

and cannot reasonably assume that the variances σ1² and σ2² are equal, then the two-sample t-test must be modified slightly.


The test statistic becomes

   t◦ = (ȳ1 − ȳ2) / √(S1²/n1 + S2²/n2)                                       (6.26)

This statistic is not distributed exactly as t. However, the distribution of t◦ is well approximated by t if one uses

   ν = (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]      (6.27)

as the degrees of freedom.
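A short sketch of the unequal-variance (Welch) test of equations (6.26)-(6.27), again on invented placeholder samples:

```python
import numpy as np
from scipy import stats

# Hypothetical samples with visibly different spreads (placeholder values).
y1 = np.array([12.1, 12.8, 11.9, 12.4, 12.6, 12.2, 12.7, 12.0])
y2 = np.array([13.9, 11.0, 14.8, 10.5, 13.2, 15.1, 11.8, 12.9, 14.2, 10.9])
n1, n2 = y1.size, y2.size
v1, v2 = y1.var(ddof=1) / n1, y2.var(ddof=1) / n2

t0 = (y1.mean() - y2.mean()) / np.sqrt(v1 + v2)              # eq. (6.26)
nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # eq. (6.27)
p = 2.0 * stats.t.sf(abs(t0), df=nu)                         # two-sided p-value

print(f"t0 = {t0:.3f}, nu = {nu:.1f}, p = {p:.4f}")
# scipy's Welch test (equal_var=False) gives the same result.
print(stats.ttest_ind(y1, y2, equal_var=False))
```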

Case where σ1² and σ2² are known

If the variances of both populations are known, then the hypothesis

H◦ : µ1 = µ2

H1 : µ1 ≠ µ2

may be tested using the statistic

   Z◦ = (ȳ1 − ȳ2) / √(σ1²/n1 + σ2²/n2)                                       (6.28)

If both populations are normal, or if the sample sizes are large enough so that the central limit theorem applies, the distribution of Z◦ is N(0, 1) if the null hypothesis is true. Thus, the critical region would be found using the normal distribution rather than the t distribution. Specifically, the experimenter would reject H◦ if |Z◦| > Z_{α/2}, where Z_{α/2} is the upper α/2 percentage point of the standard normal distribution.

The 100(1 − α) percent confidence interval on µ1 − µ2, when the variances are known, is

   ȳ1 − ȳ2 − Z_{α/2} √(σ1²/n1 + σ2²/n2) ≤ µ1 − µ2 ≤ ȳ1 − ȳ2 + Z_{α/2} √(σ1²/n1 + σ2²/n2)          (6.29)

As noted previously, the confidence interval is often a useful supplement to the hypothesis testing procedure.


Comparing a Single Mean to a Specified Value

Some experiments involve comparing only one population mean µ to a specified value, say µ◦. The hypotheses are

   H◦ : µ = µ◦
   H1 : µ ≠ µ◦

If the population is normal with known variance, or if the population is nonnormal but the sample size is large enough so that the central limit theorem applies, then the hypothesis may be tested using a direct application of the normal distribution. The test statistic is

   Z◦ = (ȳ − µ◦)/(σ/√n)                                                      (6.30)

If the hypothesis H◦ : µ = µ◦ is true, then the distribution of Z◦ is N(0, 1). Therefore, the decision rule is to reject the null hypothesis if |Z◦| > Z_{α/2}. The value of the mean µ◦ specified in the null hypothesis is usually determined in one of three ways. It may result from past evidence, knowledge, or experimentation. It may be the result of some theory or model describing the situation under study. Finally, it may be the result of contractual specifications. The 100(1 − α) percent confidence interval on the true population mean is

   ȳ − Z_{α/2}·σ/√n ≤ µ ≤ ȳ + Z_{α/2}·σ/√n                                   (6.31)
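A compact sketch of the one-sample z-test (6.30) and the confidence interval (6.31), with an assumed known σ, specified value µ◦, and placeholder data:

```python
import numpy as np
from scipy import stats

y = np.array([50.2, 49.6, 50.8, 50.1, 49.9, 50.5, 50.3, 49.7, 50.6, 50.0])  # placeholder sample
mu0, sigma, alpha = 50.0, 0.4, 0.05   # assumed specified mean, known std dev, significance level

n = y.size
z0 = (y.mean() - mu0) / (sigma / np.sqrt(n))          # eq. (6.30)
z_crit = stats.norm.ppf(1.0 - alpha / 2.0)            # upper alpha/2 point of N(0,1)

half_width = z_crit * sigma / np.sqrt(n)
ci = (y.mean() - half_width, y.mean() + half_width)   # eq. (6.31)

print(f"Z0 = {z0:.3f}, reject H0: {abs(z0) > z_crit}")
print(f"{100*(1-alpha):.0f}% CI for mu: ({ci[0]:.3f}, {ci[1]:.3f})")
```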

6.4 Experiments with a Single Factor

In the previous section, methods have been discussed for comparing two conditions or treatments. Another way to describe an experiment is to consider it as a single-factor experiment with a levels of the factor (variable).

6.4.1 Analysis of Variance

Suppose there are a treatments, or different levels of a single factor, that the experimenter wishes to compare. The observed response from each of the a treatments is a random variable. Generally, each entry, y_{ij}, represents the jth observation taken under treatment i. There will be, in general, n observations under the ith treatment.

It may be useful to describe the observations by means of a linear statistical model

   y_{ij} = µ + τ_i + ε_{ij}          i = 1, 2, . . . , a;   j = 1, 2, . . . , n          (6.32)


where µ is a parameter common to all treatments called the overall mean, τ_i is a parameter unique to the ith treatment called the ith treatment effect, and ε_{ij} is a random error component. The goal is to test appropriate hypotheses about the treatment effects and to estimate them. For hypothesis testing, the model errors are assumed to be normally and independently distributed random variables with mean zero and variance σ², which is assumed to be constant for all levels of the factor.

This model is called the one-way or single-factor analysis of variance because only one variable is investigated. Furthermore, it is required that the experiment be performed in random order so that the environment in which the treatments are used (often called the experimental units) is as uniform as possible. Thus, the experimental design is a completely randomized design.

Actually, the statistical model of equation (6.32) describes two different situations with respect to the treatment effects. First, the a treatments could have been specifically chosen by the experimenter, who wishes to test hypotheses about the treatment means. In this situation the conclusions will apply only to the factor levels considered in the analysis and cannot be extended to similar treatments that were not explicitly considered. The experimenter might also wish to estimate the model parameters (µ, τ_i, σ²). This is called the fixed effects model.

Alternatively, the a treatments could be a random sample from a larger population of treatments. In this situation the experimenter would like to extend the conclusions (which are based on the sample of treatments) to all treatments in the population, whether or not they were explicitly considered in the analysis. Here the τ_i are random variables, and knowledge about the particular ones investigated is relatively useless. Instead, the experimenter tests hypotheses about the variability of the τ_i and tries to estimate this variability. This is called the random effects model or components of variance model.

6.4.2 Fixed Effects Model

In the fixed effects model the treatment effects τ_i are usually defined as deviations from the overall mean, so that

   Σ_{i=1}^{a} τ_i = 0                                                       (6.33)

Let y_{i.} represent the total of the observations under the ith treatment, and ȳ_{i.} the average of the observations under the ith treatment. Similarly, let y_{..} represent the grand total of all the observations, and ȳ_{..} the grand average of all the observations. Expressed symbolically,

   y_{i.} = Σ_{j=1}^{n} y_{ij}                    ȳ_{i.} = y_{i.}/n          i = 1, 2, . . . , a
                                                                              (6.34)
   y_{..} = Σ_{i=1}^{a} Σ_{j=1}^{n} y_{ij}        ȳ_{..} = y_{..}/N


where N = a·n = ν is the total number of observations, whereas the 'dot' subscript notation implies summation over the subscript that it replaces.

The mean of the ith treatment is E(y_{ij}) = µ_i = µ + τ_i (i = 1, 2, . . . , a). Thus, the mean of the ith treatment consists of the overall mean plus the ith treatment effect. The experimenter is interested in testing the equality of the a treatment means; that is,

H◦ : µ1 = µ2 = . . . = µa

H1 : µi ≠ µj   for at least one pair (i, j)

Note that if the hypothesis H◦ is true, all treatments have a common mean µ. An equivalent way to write the above hypotheses is in terms of the treatment effects τ_i, say

H◦ : τ1 = τ2 = . . . = τa = 0

H1 : τi ≠ 0   for at least one i

Thus, it is equivalent to speak of testing the equality of treatment means or testing that the treatment effects (the τ_i) are zero. The appropriate procedure for testing the equality of a treatment means is the analysis of variance.

Decomposition of the Total Sum of Squares

The term analysis of variance is derived from a partitioning of total variability into its component parts. The total corrected sum of squares

   SST = Σ_{i=1}^{a} Σ_{j=1}^{n} (y_{ij} − ȳ_{..})²

is used as a measure of overall variability in the data. Intuitively, this is reasonable since, if the experimenter were to divide SST by the appropriate number of degrees of freedom (in this case, a·n − 1 = ν − 1), he/she would obtain the sample variance of the y's, which is, of course, a standard measure of variability.

Notice that the total corrected sum of squares SST may be written as

   Σ_{i=1}^{a} Σ_{j=1}^{n} (y_{ij} − ȳ_{..})² = Σ_{i=1}^{a} Σ_{j=1}^{n} [ (ȳ_{i.} − ȳ_{..}) + (y_{ij} − ȳ_{i.}) ]²
      = n Σ_{i=1}^{a} (ȳ_{i.} − ȳ_{..})² + Σ_{i=1}^{a} Σ_{j=1}^{n} (y_{ij} − ȳ_{i.})² + 2 Σ_{i=1}^{a} Σ_{j=1}^{n} (ȳ_{i.} − ȳ_{..})·(y_{ij} − ȳ_{i.})

However, the cross-product term in this expansion is zero, since

   Σ_{j=1}^{n} (y_{ij} − ȳ_{i.}) = y_{i.} − n ȳ_{i.} = y_{i.} − n (y_{i.}/n) = 0


Therefore, the fundamental analysis of variance identity is obtained as

   Σ_{i=1}^{a} Σ_{j=1}^{n} (y_{ij} − ȳ_{..})² = n Σ_{i=1}^{a} (ȳ_{i.} − ȳ_{..})² + Σ_{i=1}^{a} Σ_{j=1}^{n} (y_{ij} − ȳ_{i.})²          (6.35)

Equation (6.35) states that the total variability in the data, as measured by the total corrected sum of squares, can be partitioned into a sum of squares of the differences between the treatment averages and the grand average, plus a sum of squares of the differences of observations within treatments from the treatment averages. Now, the difference between the observed treatment averages and the grand average is a measure of the differences between treatment means, whereas the differences of observations within a treatment from the treatment average can be due only to random error. Thus, equation (6.35) can be written symbolically as

SST = SSTreatments + SSE

where SSTreatments is called the sum of squares due to treatments (i.e., between treatments), and SSE is called the sum of squares due to error (i.e., within treatments). Since there are a·n = ν total observations, SST has (ν − 1) degrees of freedom. There are a levels of the factor (and a treatment means), so SSTreatments has (a − 1) degrees of freedom. Finally, within any treatment there are n replicates providing (n − 1) degrees of freedom with which to estimate the experimental error. Since there are a treatments, there are a(n − 1) = ν − a degrees of freedom for error.

It is helpful to examine explicitly the two terms on the right-hand side of the fundamental analysis of variance identity, i.e. equation (6.35). From the error sum of squares SSE, combining the sample variances within each of the a treatments gives an estimate of the common variance as follows

   MSE = SSE/(ν − a) = [ Σ_{i=1}^{a} Σ_{j=1}^{n} (y_{ij} − ȳ_{i.})² ] / [ Σ_{i=1}^{a} (n − 1) ]

Similarly, if there were no differences between the a treatment means, one could use the differences of the treatment averages from the grand average to estimate σ². Specifically,

   MSTreatments = SSTreatments/(a − 1) = n Σ_{i=1}^{a} (ȳ_{i.} − ȳ_{..})² / (a − 1)

is an estimate of σ² if the treatment means are equal.

It may be observed that the analysis of variance identity, given by equation (6.35), provides the experimenter with two estimates of σ²: one based on the inherent variability within treatments and one based on the variability between treatments. If there are no differences in the treatment means, these two estimates should be very similar; if they are not, the experimenter should suspect that the observed difference is caused by differences in the treatment means.


The quantities MSTreatments and MSE are called mean squares. When the linear statistical model of equation (6.32) is introduced into the previous expressions, their expected values are obtained respectively as

   E(MSTreatments) = σ² + n Σ_{i=1}^{a} τ_i² / (a − 1)

and

   E(MSE) = σ²

Thus MSE = SSE/(ν − a) estimates σ², and, if there are no differences in treatment means (which implies that τ_i = 0), MSTreatments = SSTreatments/(a − 1) also estimates σ². However, note that if treatment means do differ, the expected value of the treatment mean square is greater than σ².

It seems clear that a test of the hypothesis of no difference in treatment means can be performedby comparing MSTreatments and MSE . Therefore, it will be illustrated how this comparison maybe made.

Statistical Analysis

As a test of the hypothesis of no difference in treatment means can be performed by comparing MSTreatments and MSE, it is proper to investigate how a formal test of the hypothesis of no differences in treatment means (H◦ : µ1 = µ2 = . . . = µa, or equivalently, H◦ : τ1 = τ2 = . . . = τa = 0) can be performed. Since it has been assumed that the errors ε_{ij} are normally and independently distributed with mean zero and variance σ², the observations y_{ij} are normally and independently distributed with mean µ + τ_i and variance σ². Thus, SST is a sum of squares in normally distributed random variables; consequently, it can be shown that SST/σ² is chi-square distributed with (ν − 1) degrees of freedom. Furthermore, it can be shown that SSE/σ² is chi-square with (ν − a) degrees of freedom and that SSTreatments/σ² is chi-square with (a − 1) degrees of freedom if the null hypothesis H◦ : τ_i = 0 is true. However, the three sums of squares are not all independent, since SSTreatments and SSE add to SST.

However, since according to Cochran's theorem SSTreatments/σ² and SSE/σ² are independently distributed chi-square random variables, if the null hypothesis of no difference in treatment means is true, the statistic ratio

   F◦ = [SSTreatments/(a − 1)] / [SSE/(ν − a)] = MSTreatments/MSE            (6.36)

is distributed as F with (a − 1) and (ν − a) degrees of freedom. Equation (6.36) is the test statistic for the hypothesis of no differences in treatment means.

From the expected mean squares it may be noticed that, in general, MSE is an unbiased estimator of σ². Also MSTreatments is an unbiased estimator of σ² under the null hypothesis. However, if the


null hypothesis is false, then the expected value of MSTreatments is greater than σ². Therefore, under the alternative hypothesis, the expected value of the numerator of the test statistic in equation (6.36) is greater than the expected value of the denominator, and the experimenter would reject H◦ for values of the test statistic that are too large. This implies an upper-tail, one-tail critical region. The experimenter would reject H◦ if

   F◦ > F_{α, a−1, ν−a}

where F◦ is computed from equation (6.36).

Computing formulas for the sums of squares may be obtained by rewriting and simplifying the definitions of SSTreatments and SST in equation (6.35). This yields

   SST = Σ_{i=1}^{a} Σ_{j=1}^{n} y_{ij}² − y_{..}²/N                         (6.37)

and

   SSTreatments = Σ_{i=1}^{a} y_{i.}²/n − y_{..}²/N                          (6.38)

The error sum of squares is obtained by subtraction as

SSE = SST − SSTreatments (6.39)

The test procedure is summarized in Table 6.1, which is called the analysis of variance table.

   Source of Variation          Sum of Squares    Degrees of Freedom    Mean Square       F◦
   Between treatments           SSTreatments      a − 1                 MSTreatments      MSTreatments/MSE
   Error (within treatments)    SSE               ν − a                 MSE
   Total                        SST               ν − 1

   Table 6.1. Analysis of variance table for the single-factor, fixed effects model
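The following sketch carries out the fixed-effects, single-factor analysis of variance of equations (6.35)-(6.39) and Table 6.1 on an invented balanced data set with a = 3 treatments and n = 5 replicates; the numbers are placeholders chosen only to show the computation.

```python
import numpy as np
from scipy import stats

# Hypothetical balanced one-way layout: rows = treatments, columns = replicates.
y = np.array([
    [18.2, 18.9, 17.8, 18.4, 18.6],   # treatment 1
    [19.5, 20.1, 19.8, 19.2, 20.4],   # treatment 2
    [17.1, 16.8, 17.5, 17.3, 16.9],   # treatment 3
])
a, n = y.shape
N = a * n
alpha = 0.05

grand_mean = y.mean()
treat_means = y.mean(axis=1)

ss_t = ((y - grand_mean) ** 2).sum()                     # SST, total corrected SS
ss_treat = n * ((treat_means - grand_mean) ** 2).sum()   # SSTreatments
ss_e = ss_t - ss_treat                                   # SSE, eq. (6.39)

ms_treat = ss_treat / (a - 1)
ms_e = ss_e / (N - a)
f0 = ms_treat / ms_e                                     # eq. (6.36)
f_crit = stats.f.ppf(1.0 - alpha, a - 1, N - a)

print(f"F0 = {f0:.2f}, F_crit = {f_crit:.2f}, reject H0: {f0 > f_crit}")
# scipy's one-way ANOVA gives the same F statistic.
print(stats.f_oneway(*y))
```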

Information from Computer Packages

It may be observed that the sums of squares have been defined in terms of averages; that is, for example, from equation (6.35),

   SSTreatments = n Σ_{i=1}^{a} (ȳ_{i.} − ȳ_{..})²

but the computing formulas were developed using totals; for example, to compute SSTreatments the analyst would use equation (6.38),


   SSTreatments = Σ_{i=1}^{a} y_{i.}²/n − y_{..}²/N

The reason for this is numerical accuracy; the totals y_{i.} and y_{..} are not subject to rounding error, whereas the averages ȳ_{i.} and ȳ_{..} are. Generally, the experimenter need not be too concerned with computing, as there are many widely available computer programs for performing the calculations. These computer programs are also helpful in performing many other analyses associated with experimental design (such as residual analysis, etc.). In addition to the basic analysis of variance, the computer programs display some additional useful information such as R², adjusted R² (R²adj), standard deviation, t-test, p-level, etc. The computer programs also calculate and display the residuals.

The sum of squares corresponding to the 'model' is the usual SSTreatments for a single-factor design; in a balanced single-factor design the model sum of squares and the treatment sum of squares coincide.

R-square

The quantity R², also called the coefficient of determination, is calculated as

   R² = SSR/SST = 1 − SSE/SST

It is loosely interpreted as the proportion of the variability in a data set that is accounted for by a statistical model and 'explained' by the analysis of variance model. In this definition, the term 'variability' stands for variance or, equivalently, sum of squares. Clearly, 0 ≤ R² ≤ 1, with larger values being more desirable; values approaching 1 indicate a good fit.

R-square measures the relative predictive power of a model: if SSE is much smaller than SST, the model fits well. R-square can nevertheless be a poor measure of goodness-of-fit, especially when it is misused. By definition, R² is the fraction of the total squared error that is explained by the model; but some data contain irreducible error, which places an upper limit on the attainable value of R². Sadly, many practitioners pursue very high-order polynomial models in the mistaken but widely held belief that a model made to pass through every point (which becomes possible as the number of terms approaches the number of observations) is thereby a better model.

It must be noticed that R² does not tell whether:

• the independent variables are a true cause of the changes in the dependent variable,

• omitted-variable bias exists,

• the correct regression was used,

• the most appropriate set of independent variables has been chosen,

• collinearity is present in the data.

331

Page 349: Department of Naval Architecture, - UniNa STiDuE

6 – Design of Experiments

Adjusted R-square

The main drawback of R-square is that it always increases as the number of variables in the model increases. The alternative is to look at the adjusted R-square (R²adj) statistic. Adjusted R-square is a modification of R² that adjusts for the number of variables in a model. Unlike R², the adjusted R² increases only if a new term improves the model. R²adj can even be negative, and will always be less than or equal to R². It is defined as

   R²adj = 1 − [SSE/(n − p)] / [SST/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − p)

where p is the total number of parameters in the linear model (including the constant term) and n is the sample size. In general, the adjusted R² statistic will not always increase as variables are added to the model; in fact, if unnecessary terms are added, the value of R²adj will often decrease.
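A brief sketch computing R² and R²adj for an ordinary least-squares fit; the synthetic data, the quadratic model, and the noise level are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30                                                      # sample size

x = np.linspace(0.0, 10.0, n)
y = 2.0 + 1.5 * x - 0.1 * x**2 + rng.normal(0.0, 1.0, n)    # synthetic observations

# Least-squares fit of a quadratic model; p = 3 parameters (constant, x, x^2).
X = np.column_stack([np.ones(n), x, x**2])
p = X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

ss_e = np.sum((y - y_hat) ** 2)                # residual (error) sum of squares
ss_t = np.sum((y - y.mean()) ** 2)             # total corrected sum of squares

r2 = 1.0 - ss_e / ss_t
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```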

Standard Deviation

The standard deviation 'std dev' is the square root of the error mean square, and 'CV' is the coefficient of variation, defined as (√MSE/ȳ)·100. The coefficient of variation measures the unexplained or residual variability in the data as a percentage of the mean of the response variable.

Student t–test

A t-test is any statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. The t statistic was introduced by William Sealy Gosset for cheaply monitoring the quality of beer brews; 'Student' was his pen name. Today, it is more generally applied to the confidence that can be placed in judgements made from small samples.

Among the most frequently used t-tests are:

• a test of the null hypothesis that the means of two normally distributed populations are equal;

• a test of whether the mean of a normally distributed population has a value specified in anull hypothesis;

• a test of whether the slope of a regression line differs significantly from 0.

p–level

The probability of making an error is often called p in research reports. The α-level, set prior to analyzing the statistical outcomes, is the acceptable level of error that will be tolerated, since no statistical analysis is ever 100% error free; it means that the results will be deemed significant or valid if the amount of error is less than the amount accepted.

If the probability of error (p-level) is less than the accepted level of error (α), then it can be stated that the result is significant because p < α. For example, if one sets α = 0.05, the results are significant provided they yield p < 0.05.


6.4.3 Model Adequacy Checking

The decomposition of the variability in the observations through the fundamental analysis of variance identity, as given in equation (6.35), is a purely algebraic relationship. However, the use of the partitioning to test formally for no differences in treatment means requires that certain assumptions be satisfied.

Specifically, these assumptions are that the observations are adequately described by the model

yij = µ + τi + εij

and that the errors are normally and independently distributed with mean zero and constant but unknown variance σ². If these assumptions are valid, then the analysis of variance procedure is an exact test of the hypothesis of no difference in treatment means.

In practice, however, these assumptions will usually not hold exactly. Consequently, it is usually unwise to rely on the analysis of variance until the validity of these assumptions has been checked. Violations of the basic assumptions and model adequacy can be easily investigated by the examination of residuals. The residual for observation j in treatment i is defined as

   e_{ij} = y_{ij} − ŷ_{ij}                                                  (6.40)

where ŷ_{ij} is an estimate of the corresponding observation y_{ij}, obtained as

   ŷ_{ij} = µ̂ + τ̂_i = ȳ_{..} + (ȳ_{i.} − ȳ_{..}) = ȳ_{i.}                    (6.41)

That is, the residuals for the ith treatment are found by subtracting the treatment average from each observation in that treatment. Model adequacy checking usually consists of plotting the residuals as described below. Such diagnostic checking should be a routine part of every experimental design project. Equation (6.41) gives the intuitively appealing result that the estimate of any observation in the ith treatment is just the corresponding treatment average.

Examination of the residuals should be an automatic part of any analysis of variance. If the model is adequate, the residuals should be structureless; that is, they should contain no obvious patterns. Through a study of residuals, many types of model inadequacies and violations of the underlying assumptions can be discovered.

Normality Assumption

A check of the normality assumption may be made by plotting a histogram of the residuals. If the NID(0, σ²) assumption on the errors is satisfied, then this plot should look like a sample from a normal distribution centered at zero. Unfortunately, with small samples, considerable fluctuation often occurs, so the appearance of a moderate departure from normality does not necessarily imply a serious violation of the assumptions. Gross deviations from normality are potentially serious and require further analysis.

Another useful procedure is to construct a normal probability plot of the residuals. A normal probability plot is just a graph of the cumulative distribution of the residuals on normal


probability paper, that is, graph paper with the ordinate scaled so that the cumulative normal distribution plots as a straight line. To construct a normal probability plot, the residuals are arranged in increasing order and the kth of these ordered residuals is plotted versus the cumulative probability point P_k = (k − 1/2)/N on normal probability paper. If the underlying error distribution is normal, this plot will resemble a straight line. In visualizing the straight line, more emphasis should be placed on the central values of the plot than on the extremes.

A normal probability plot of the residuals from a sample is shown in Figure 6.13. The general impression from examining this display is that the error distribution may be slightly skewed, with the right tail being longer than the left. The tendency of the normal probability plot to bend down slightly on the left side implies that the left tail of the error distribution is somewhat thinner than would be anticipated in a normal distribution; that is, the negative residuals are not quite as large (in absolute value) as expected.

Figure 6.13. Normal probability plot of residuals

In general, moderate departures from normality are of little concern in the fixed effects analysis of variance. An error distribution that has considerably thicker or thinner tails than the normal is of more concern than a skewed distribution. Since the F-test is only slightly affected, it can be said that the analysis of variance (and related procedures such as multiple comparisons) is robust to the normality assumption. Departures from normality usually cause both the true significance level and the power to differ slightly from the foreseen values, with the power generally being lower.
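A short sketch of this residual check: it computes the residuals of equation (6.40) for a one-way layout and the normal-probability-plot coordinates P_k = (k − 1/2)/N described above; the data set is a placeholder layout and the use of scipy's norm.ppf to scale the ordinate is an implementation choice.

```python
import numpy as np
from scipy import stats

# Placeholder one-way layout: rows = treatments, columns = replicates.
y = np.array([
    [18.2, 18.9, 17.8, 18.4, 18.6],
    [19.5, 20.1, 19.8, 19.2, 20.4],
    [17.1, 16.8, 17.5, 17.3, 16.9],
])

# Residuals e_ij = y_ij - ybar_i. (eq. 6.40 with yhat_ij = ybar_i.).
residuals = (y - y.mean(axis=1, keepdims=True)).ravel()
N = residuals.size

# Normal probability plot coordinates: ordered residuals vs. P_k = (k - 1/2)/N.
e_sorted = np.sort(residuals)
p_k = (np.arange(1, N + 1) - 0.5) / N
z_k = stats.norm.ppf(p_k)          # theoretical N(0,1) quantiles for the ordinate scale

# If the errors are normal, the points (z_k, e_sorted) should fall close to a straight line.
slope, intercept = np.polyfit(z_k, e_sorted, 1)
print(f"probability-plot slope (~ residual std dev): {slope:.3f}")
print(f"sample std dev of residuals:                 {residuals.std(ddof=1):.3f}")
```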

Analysis of Residuals

If the model is correct and if the assumptions are satisfied, the residuals should be structureless; in particular, they should be unrelated to any other variable, including the response y_{ij}. A simple


check is to plot the residuals versus the fitted values ŷ_{ij}. For the single-factor model, remember that ŷ_{ij} = ȳ_{i.}, the ith treatment average. This plot should not reveal any obvious pattern. Figure 6.14 plots the residuals versus the fitted values for some experiment. No unusual structure is apparent.

Figure 6.14. Plot of residuals versus fitted values

A defect that occasionally shows up on this plot is nonconstant variance. Sometimes the variance of the observations increases as the magnitude of the observation increases. This would be the case if the error or background noise in the experiment were a constant percentage of the size of the observation. This commonly happens with many measuring instruments: the error is a percentage of the scale reading. If this were the case, the residuals would get larger as ŷ_{ij} gets larger, and the plot of residuals versus ŷ_{ij} would look like an outward-opening funnel or megaphone. Nonconstant variance also arises in cases where the data follow a nonnormal, skewed distribution, because in skewed distributions the variance tends to be a function of the mean.

6.4.4 Random Effects Model

Experimenters are frequently interested in factors that have a large number of possible levels. If they randomly select a of these levels from the population of variable levels, then it is said that the factor is random. Because the levels of the factor actually used in the experiment were chosen randomly, inferences are made about the entire population of factor levels. It is assumed that the population of factor levels is either of infinite size or is large enough to be considered infinite. Situations in which the population of factor levels is small enough to employ a finite population approach are not encountered frequently.


The linear statistical model, called the components of variance or random effects model, is

   y_{ij} = µ + τ_i + ε_{ij}          i = 1, 2, . . . , a;   j = 1, 2, . . . , n          (6.42)

where both τ_i and ε_{ij} are now random variables. If τ_i has variance σ²_τ and is independent of ε_{ij}, the variance of any observation is

   V(y_{ij}) = σ²_τ + σ²

where the variances σ²_τ and σ² are called variance components.

The sum of squares identity

SST = SSTreatments + SSE (6.43)

is still valid. That is, the total variability in the observations is partitioned into a component that measures the variation between treatments (SSTreatments) and a component that measures the variation within treatments (SSE).

Testing hypotheses about individual treatment effects is meaningless; so instead the experimenter tests the hypotheses

   H◦ : σ²_τ = 0
   H1 : σ²_τ > 0

If σ²_τ = 0, all treatments are identical; but if σ²_τ > 0, variability exists between treatments. As for the fixed effects model, SSE/σ² is distributed as chi-square with (ν − a) degrees of freedom and, under the null hypothesis, SSTreatments/σ² is distributed as chi-square with (a − 1) degrees of freedom. Both random variables are independent. Thus, under the null hypothesis σ²_τ = 0, the statistic ratio

   F◦ = [SSTreatments/(a − 1)] / [SSE/(ν − a)] = MSTreatments/MSE            (6.44)

is distributed as F with (a − 1) and (ν − a) degrees of freedom. However, the experimenter needs to examine the expected mean squares to fully describe the test procedure. It can be shown that

   E(MSTreatments) = σ² + n σ²_τ                                             (6.45)

   E(MSE) = σ²                                                               (6.46)

From the expected mean squares, the experimenter can see that under the hypothesis H◦ both the numerator and denominator of the test statistic, as given in equation (6.44), are unbiased estimators of σ², whereas under the hypothesis H1 the expected value of the numerator is greater than the expected value of the denominator. Therefore, the experimenter should reject H◦ for values of F◦ that are too large. This implies an upper-tail, one-tail critical region, so he/she will reject H◦ if F◦ > F_{α, a−1, ν−a}.


The computational procedure and the analysis of variance table for the random effects model are identical to those for the fixed effects model. The conclusions, however, are quite different because they apply to the entire population of treatments.

The experimenter is usually interested in estimating the variance components (σ² and σ²τ) in the model. The procedure used to estimate σ² and σ²τ is called the analysis of variance method because it makes use of the lines of the analysis of variance table. The procedure consists of equating the expected mean squares to their observed values in the analysis of variance table and solving for the variance components. Equating observed and expected mean squares in the single–factor random effects model, one obtains

MSTreatments = σ² + n σ²τ      and      MSE = σ²

Therefore, the estimators of the variance components are

σ̂² = MSE        (6.47)

σ̂²τ = (MSTreatments − MSE)/n        (6.48)

For unequal sample sizes, replace n in equation (6.48) by

n◦ = [1/(a − 1)] [ Σ_{i=1}^{a} ni − ( Σ_{i=1}^{a} ni² ) / ( Σ_{i=1}^{a} ni ) ]        (6.49)

The analysis of variance method of variance component estimation does not require the normality assumption. It does yield estimators of σ² and σ²τ that are best quadratic unbiased (i.e., of all unbiased quadratic functions of the observations, these estimators have minimum variance).
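A minimal sketch of the analysis of variance method of equations (6.47)–(6.48) in Python, assuming a balanced single-factor layout stored as an a × n array (the data and variable names are purely illustrative):

```python
import numpy as np

def variance_components(y):
    """Estimate sigma^2 and sigma^2_tau for a balanced single-factor
    random effects model; y is an (a, n) array of observations."""
    a, n = y.shape
    grand_mean = y.mean()
    treat_means = y.mean(axis=1)

    ss_treat = n * np.sum((treat_means - grand_mean) ** 2)
    ss_error = np.sum((y - treat_means[:, None]) ** 2)

    ms_treat = ss_treat / (a - 1)
    ms_error = ss_error / (a * (n - 1))

    sigma2 = ms_error                        # equation (6.47)
    sigma2_tau = (ms_treat - ms_error) / n   # equation (6.48)
    return sigma2, sigma2_tau

# illustrative use with synthetic data: a = 5 random treatments, n = 6 replicates
rng = np.random.default_rng(1)
y = 10.0 + 2.0 * rng.normal(size=(5, 1)) + 0.5 * rng.normal(size=(5, 6))
print(variance_components(y))
```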

6.5 Sampling Based Methods

Sampling based methods involve running a set of models at a set of sample points, and establishing a relationship between inputs and outputs using the model results at the same points. Some of the widely used sampling based methods, also useful for sensitivity and uncertainty analysis, are:

• Response Surface Methodology;

• Monte Carlo Methods;

• Latin Hypercube Sampling Methods;

• Fourier Amplitude Sensitivity Test (FAST);

• Reliability Based Methods (FORM and SORM).


Traditional sampling methods for sensitivity and uncertainty analysis require a substantial number of model runs to obtain a good approximation of the output pdfs, especially for cases involving several inputs. On the other hand, analytical methods require information about the mathematical equations of a model, and are often restricted in their applicability to cases where the uncertainties are small. Therefore there is a need for computationally efficient methods for uncertainty propagation that are robust and also applicable to a wide range of complex models.

Monte Carlo Methods

Monte Carlo methods are the most widely used means for uncertainty analysis, with applications ranging from aerospace engineering to economics. These methods involve random sampling from the distribution of inputs and successive model runs until a statistically significant distribution of outputs is obtained. They can be used to solve problems with physical probabilistic structures, such as uncertainty propagation in models or solution of stochastic equations, or can be used to solve non–probabilistic problems. Monte Carlo methods are also used in the solution of problems that can be modelled by the sequence of a set of random steps that eventually converge to a desired solution. Problems such as optimization are often addressed through Monte Carlo simulations.

Since these methods require a large number of samples (or model runs), their applicability is sometimes limited to simple models. In the case of computationally intensive models, the time and resources required by these methods could be prohibitively expensive. A degree of computational efficiency is accomplished by using Adaptive Monte Carlo methods, which sample from the input distribution in an efficient manner so that the number of necessary solutions is significantly reduced compared to the simple Monte Carlo methods.
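A minimal sketch of plain Monte Carlo uncertainty propagation in Python; the model function and the input distributions below are purely illustrative assumptions, not taken from the text:

```python
import numpy as np

def model(x1, x2):
    # illustrative black-box model: any deterministic function of the inputs
    return x1 ** 2 + 3.0 * x2

rng = np.random.default_rng(42)
n_samples = 100_000

# assumed input distributions (illustrative choices)
x1 = rng.normal(loc=5.0, scale=0.5, size=n_samples)
x2 = rng.uniform(low=1.0, high=2.0, size=n_samples)

y = model(x1, x2)

print("mean of output      :", y.mean())
print("std. dev. of output :", y.std(ddof=1))
print("5%-95% range        :", np.percentile(y, [5, 95]))
```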

Latin Hypercube Sampling Methods

The Latin Hypercube Sampling is a widely used variant of the standard Monte Carlo method. In this method, the range of probable values for each uncertain input parameter is divided into ordered segments of equal probability. Thus, the whole parameter space, consisting of all the uncertain parameters, is partitioned into cells having equal probability, and they are sampled in an 'efficient' manner such that each parameter is sampled once from each of its possible segments. The advantage of this approach is that the random samples are generated from all the ranges of possible values, thus giving insight into the extremes of the probability distribution of the outputs.
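A minimal sketch of Latin Hypercube Sampling for two parameters, using only NumPy; the parameter ranges at the end are illustrative assumptions:

```python
import numpy as np

def latin_hypercube(n_samples, n_params, rng):
    """Return an (n_samples, n_params) array of LHS points in [0, 1)."""
    # one stratum of equal probability per sample in each dimension
    edges = np.arange(n_samples) / n_samples
    sample = np.empty((n_samples, n_params))
    for j in range(n_params):
        # one random point inside each stratum, then shuffle the strata
        points = edges + rng.random(n_samples) / n_samples
        sample[:, j] = rng.permutation(points)
    return sample

rng = np.random.default_rng(0)
u = latin_hypercube(n_samples=10, n_params=2, rng=rng)

# map the unit-cube sample to illustrative parameter ranges
x1 = 2.0 + 3.0 * u[:, 0]      # x1 uniform on [2, 5]
x2 = 0.1 + 0.4 * u[:, 1]      # x2 uniform on [0.1, 0.5]
print(np.column_stack([x1, x2]))
```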


Fourier Amplitude Sensitivity Test

Fourier Amplitude Sensitivity Test (FAST) is a method based on the Fourier transformation of uncertain model parameters into a frequency domain, thus reducing a multidimensional model into a one-dimensional one. For a model with m parameters, k1, k2, . . . , km, and n outputs, u1, u2, . . . , un, such that

ui = ui (t; k1, k2, . . . , km) ; i = 1, 2, . . . , n

the FAST method involves the transformation of the parameters into a frequency domain spanned by a scalar s as follows:

kl = kl(sin ωl s) ,   l = 1, 2, . . . , m

The outputs are then approximated as

µi(t) = (1/2π) ∫_{−π}^{π} ui[t; k1(s), k2(s), . . . , km(s)] ds

σi² = (1/2π) ∫_{−π}^{π} ui²[t; k1(s), k2(s), . . . , km(s)] ds − µi²(t)        (6.50)

These integrals are evaluated by repeatedly sampling the parameter space of s, which corresponds to sampling in the multidimensional model parameter space. The details of the transformation of model parameters into the frequency domain and of the subsequent sampling are explained by Koda et al. (1979) and Fang et al. (2003).

Reliability Based Methods

First- and Second-Order Reliability Methods (FORM and SORM, respectively) are approximation methods that estimate the probability of an event under consideration (typically termed 'failure'). For example, these methods can provide the probability that the structural fatigue damage exceeds a target level at a location (i.e., the probability of failure). In addition, these methods provide the contribution to the probability of failure from each input random variable, at no additional computational effort. They are useful in uncertainty analysis of models with a single failure criterion.

For a model with random parameters

x = (x1, x2, . . . , xn)

and a failure condition

g(x1, x2, . . . , xn) < 0

the objective of the reliability–based approach is to estimate the probability of failure.


In case of structural strength exceedance, the failure condition can be defined as

g(x) = σa − σ(x) < 0

where σa is a pre–specified maximum permissible tension at a location of interest.

If the joint probability density function for the set x is given by fx, then the probability of failure is given by the n–fold integral

Pf = P{g(x) < 0} = P{σa − σ(x) < 0} = ∫_{g(x)<0} fx dx

where the integration is carried out over the failure domain. The evaluation of this integral becomes computationally demanding as the number of random variables (the dimension of the integration) increases; in fact, if m is the number of function calls of the integrand per dimension, and n is the dimension, the computation time grows as m^n. In addition, since the value of the integrand is small, the numerical inaccuracies can be considerably magnified when integrated over a multidimensional space.

FORM and SORM use analytical schemes to approximate the probability integral, through a series of the following simple steps:

• mapping the basic random variables x and the failure function g(x) into a vector of standardized and uncorrelated normal variates u, as X(u) and G(u) respectively;

• approximating the function G(u) by a tangent (FORM) or a paraboloid (SORM) at the failure point u∗ closest to the origin;

• calculating the probability of failure as a simple function of u∗.

These methods are reported to be computationally very efficient compared to Monte Carlo methods, especially for scenarios corresponding to low probabilities of failure. Further, SORM is more accurate than FORM, but computationally more intensive, since it involves a higher order approximation.

The main drawbacks of FORM and SORM are that the mapping of the failure function onto a standardized set, and the subsequent minimization of the function, involve significant computational effort for nonlinear black-box numerical models. In addition, simultaneous evaluation of probabilities corresponding to multiple failure criteria would involve significant additional effort. Furthermore, these methods impose some conditions on the joint distributions of the random parameters, thus limiting their applicability.
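A minimal FORM sketch in Python under strong simplifying assumptions (the random variables are taken as independent standard normal variates, so no probability transformation is needed, and the limit-state function is an illustrative linear one, not from the text):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def g(u):
    # illustrative limit state in standard normal space: failure when g(u) < 0
    return 6.0 - 2.0 * u[0] - 1.5 * u[1]

# FORM: find the design point u* = argmin ||u||^2 subject to g(u) = 0
res = minimize(lambda u: np.dot(u, u), x0=np.array([0.1, 0.1]),
               constraints={"type": "eq", "fun": g})

u_star = res.x
beta = np.linalg.norm(u_star)       # Hasofer-Lind reliability index
pf = norm.cdf(-beta)                # first-order estimate of the failure probability

print("design point :", u_star)
print("beta         :", beta)
print("P_f (FORM)   :", pf)
```

For this linear limit state the FORM estimate is exact; for nonlinear black-box models the constrained minimization is the expensive step mentioned above.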

6.6 Response Surface Methodology

Response Surface Methodology (RSM) is a collection of statistical and mathematical techniques useful for modelling, improving, and optimizing processes. It also has important applications in the design, development, and formulation of new technical products, as well as in the improvement of existing technical systems.


The origin of RSM is the seminal paper by Box and Wilson (1951), which had a profound impact on industrial applications of experimental design and was the motivation of much of the research in the field. The monograph by Myers (1976) was the first book devoted exclusively to RSM. There are also three other full–length books on the subject: Box and Draper (1987), Khuri and Cornell (1996), and Myers and Montgomery (2002). The paper by Myers (1999) on future directions in RSM offers a view of research needs in the field.

The most extensive applications of RSM are in the industrial world, particularly in situations where several input variables potentially influence some performance measure or quality characteristic of the product or process. This performance measure or quality characteristic is called the response, which is typically measured on a continuous scale, although attribute responses and ranks are not unusual. The input variables are also called independent variables, and they are subject to the control of the engineer or scientist, at least for purposes of an experimental test or an experiment.

Figure 6.15. Response surface (a) and contour plot (b) of a theoretical response surface

Figure 6.15 shows graphically the relationship between the response variable y in an industrial process and two process variables (or independent variables) ξ1 and ξ2. Note that for each value of ξ1 and ξ2 there is a corresponding value of y, and that one may view these values of the response as a surface lying above the ξ1–ξ2 plane, as in Figure 6.15(a). It is this graphical perspective of the problem environment that has led to the term response surface methodology. It is also convenient to view the response surface in the two–dimensional plane, as in Figure 6.15(b). In this presentation one looks down at the ξ1–ξ2 plane and connects all points that have the same response value to produce contour lines of constant response. This type of display is called a contour plot.

Clearly, if one could easily construct the graphical displays in Figure 6.15, optimization of this process would be very straightforward. Unfortunately, in most practical situations, the true response function in Figure 6.15 is unknown. The field of response surface methodology consists of the experimental strategy for exploring the space of the process or independent variables, empirical statistical modelling to develop an appropriate approximating relationship between the response and the process variables, and optimization methods for finding the levels or values of the process variables ξ1 and ξ2 that produce desirable values of the responses.

6.6.1 Approximating Response Functions

In general, suppose that the scientist or engineer or experimenter is concerned with a product, process, system, or physical phenomenon involving a response y that depends on the controllable input variables ξ1, ξ2, . . . , ξk. The relationship is

y = f(ξ1, ξ2, . . . , ξk) + ε        (6.51)

where the form of the true response function f is unknown and perhaps very complicated, and ε is a term that represents other sources of variability not accounted for in f. Thus ε includes effects such as measurement error on the response, other sources of variation that are inherent in the process or system (background noise), the effect of other variables, and so on. Generally, ε is treated as a statistical error, often assuming it to have a normal distribution with mean zero and variance σ². If the mean of ε is zero, then

E(y) ≡ η = E[f(ξ1, ξ2, . . . , ξk)] + E(ε) = f(ξ1, ξ2, . . . , ξk)        (6.52)

The variables ξ1, ξ2, . . . , ξk in equation (6.52) are usually called the natural variables, because they are expressed in the natural units of measurement, such as kilowatts, pascal, or g/kWh. In RSM work it is convenient to transform the natural variables to coded variables x1, x2, . . . , xk, which are usually defined to be dimensionless with mean zero and the same standard deviation. In terms of the coded variables, the true response function (6.52) is written as

η = f (x1, x2, . . . , xk) (6.53)
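As a small illustration of this coding step (the variable names and numerical ranges below are purely illustrative assumptions), each natural variable can be centred on the midpoint of its experimental range and scaled by the half-range:

```python
import numpy as np

def code(xi, low, high):
    """Map a natural variable xi in [low, high] to a coded variable in [-1, +1]."""
    center = 0.5 * (high + low)
    half_range = 0.5 * (high - low)
    return (xi - center) / half_range

# illustrative natural settings: power in kW on [80, 120], pressure in Pa on [1e5, 3e5]
print(code(np.array([80.0, 100.0, 120.0]), 80.0, 120.0))   # -> [-1.  0.  1.]
print(code(2.0e5, 1.0e5, 3.0e5))                           # -> 0.0
```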

Because the form of the true response function f is unknown, it is necessary to approximate it. In fact, successful use of RSM is critically dependent upon the experimenter's ability to develop a suitable approximation for f. Usually, a low–order polynomial in some relatively small region of the independent variable space is appropriate. In many cases, either a first–order or a second–order model is used.

For the case of two independent variables, the first–order model in terms of the coded variables is

η = β◦ + β1 x1 + β2 x2 (6.54)

Figure 6.16 shows the three–dimensional response surface and the two–dimensional contour plot for a specific first–order model. In three dimensions, the response surface is a plane lying above the (x1, x2) space. The contour plot shows that the first–order model can be represented as parallel straight lines of constant response in the (x1, x2) plane.

The first–order model is likely to be appropriate when the experimenter is interested in approximating the true response surface over a relatively small region of the independent variable space, in a location where there is little curvature in f.

Figure 6.16. Response surface (a) and contour plot (b) for a first–order model

The form of the first–order model in equation (6.54) is sometimes called a main effects model, because it includes only the main effects of the two variables x1 and x2. If there is an interaction between these variables, it can easily be added to the model to obtain the first–order model with interaction, as follows

η = β◦ + β1 x1 + β2 x2 + β12 x1 x2 (6.55)


Figure 6.17 shows the three–dimensional response surface and the contour plot for a specific case where the interaction term β12 x1 x2 introduces curvature into the response function.

Often the curvature in the true response surface is strong enough that the first–order model is inadequate even with the interaction term included. A second-order model will likely be required in these situations. For the case of two variables, the second–order model is

η = β◦ + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2        (6.56)

This model would likely be useful as an approximation to the true response surface in a relatively small region around the point B in Figure 6.15(b), where there is substantial curvature in the true response function f.

Figure 6.17. Response surface (a) and contour plot (b) for a first–order model with interaction


Figure 6.18 presents the response surface and contour plot for a special case of the second–order model. Such a response surface could arise in approximating a response near a maximum point on the surface.

Figure 6.18. Response surface and contour plot for a second–order model

The second–order model is widely used in response surface methodology mainly because

• it is very flexible, since it can take on a wide variety of functional forms, so it will often work well as an approximation to the true response surface;

• it is easy to estimate the coefficients (the β's) in the second–order model, say, by means of the least squares method;

• there is considerable practical experience indicating that second–order models work well in solving real response surface problems.

In general, the first–order model is

η = β◦ + β1 x1 + β2 x2 + . . . + βk xk        (6.57)

and the second–order model is

η = β◦ + Σ_{j=1}^{k} βj xj + Σ_{j=1}^{k} βjj xj² + ΣΣ_{i<j} βij xi xj        (6.58)


Figure 6.19 shows several different response surfaces and contour plots that can be generated by a second–order model. In some situations, approximating polynomials of order greater than two are used.

Figure 6.19. Examples of types of surfaces defined by the second-order model in two variables

The general motivation for a polynomial approximation of the true response function f is based on the Taylor series expansion around the point (x10, x20, . . . , xk0). For example, the first–order model is developed from the first–order Taylor series expansion

f = f(x10, x20, . . . , xk0) + (∂f/∂x1)|_{x=x◦} (x1 − x10) + (∂f/∂x2)|_{x=x◦} (x2 − x20) + . . . + (∂f/∂xk)|_{x=x◦} (xk − xk0)        (6.59)

where x refers to the vector of independent variables and x◦ is that vector evaluated at the specific point (x10, x20, . . . , xk0). In equation (6.59) only the first–order terms are included in the expansion, thus implying the first–order approximating model in equation (6.57). If one were to include second–order terms in equation (6.59), this would lead to the second–order approximating model in equation (6.58).

Finally, note that there is a close connection between RSM and linear regression analysis. For example, consider the model

y = β◦ + β1 x1 + β2 x2 + . . . + βk xk + ε

The β's are a set of unknown parameters. To estimate their values, the experimenter must collect data on the process or technical system under study. Regression analysis is a branch of statistical model building that uses these data to estimate the β's. Because, in general, polynomial models are linear functions of the unknown β's, one refers to the technique as linear regression analysis. It should also be noticed that it is very important to plan the data collection phase of a response surface study carefully. In fact, special types of experimental designs, called response surface designs, are valuable in this regard.

6.6.2 Phases of Response Surface Building

Most applications of RSM are sequential in nature. That is, at first some ideas are generated concerning which factors or independent variables are likely to be important in the response surface study. This usually leads to an experiment designed to investigate these factors with a view toward eliminating the unimportant ones. These types of experiments are usually called screening experiments. Often at the outset of a response surface study there is a rather long list of variables that could be important in explaining the response. The objective of factor screening is to reduce this list of candidate variables to a relatively few, so that subsequent experiments will be more efficient and require fewer runs or tests. The screening experiment is referred to as phase zero of a response surface study. The experimenter should never undertake a response surface analysis until a screening experiment has been performed to identify the important factors.

Once the important independent variables are identified, phase one of the response surface study begins. In this phase, the experimenter's objective is to determine if the current levels or settings of the independent variables result in a value of the response that is near the optimum, such as point B in Figure 6.15(b), or if the process is operating in some other region that is remote from the optimum, such as point A in Figure 6.15(b). If the current settings or levels of the independent variables are not consistent with optimum performance, then the experimenter must determine a set of adjustments to the design variables that will move the design toward the optimum. This phase of response surface methodology makes considerable use of the first–order model and an optimization technique called the method of steepest ascent.

Phase two of a response surface study begins when the process is near the optimum. At this point the experimenter usually wants a model that will accurately approximate the true response function within a relatively small region around the optimum. As the true response surface usually exhibits curvature near the optimum (see Figure 6.15), a second–order model (or perhaps some higher–order polynomial) will be used. Once an appropriate approximating model has been obtained, this model may be analyzed to determine the optimum conditions for the design.

This sequential experimental process is usually performed within some region of the independent variable space called the operability region. If one is operating at the levels shown as point A in Figure 6.20, it is unlikely that he/she would want to explore the entire region of operability with a single experiment. Instead, one usually defines a smaller region of interest or region of experimentation around the point A within the larger region of operability. Typically, this region of experimentation is either a cuboidal region, as shown around the point A in Figure 6.20, or a spherical region, as shown around point B.


Figure 6.20. Region of operability and region of experimentation

The sequential nature of response surface methodology allows the experimenter to learn about the process or system under design as the investigation proceeds. This ensures that over the course of the RSM application the experimenter will learn the answers to questions such as

• the location of the region of the optimum,

• the type of the most appropriate approximating function,

• the proper choice of experimental design,

• whether or not changes on the responses or any of the design variables are required.

The strategic objective of RSM is to lead the experimenter rapidly and efficiently to the general vicinity of the optimum. Often, when the experimenter is at a point on the response surface that is remote from the optimum, there is little curvature in the system and the first-order model will be appropriate. Once the region of the optimum has been found, a more elaborate model such as the second–order model may be employed, and an analysis may be performed to locate the optimum. From Figure 6.19, one can see that the analysis of a response surface can be thought of as 'climbing a hill' where the top of the hill represents the point of maximum response. If the true optimum is a point of minimum response, then one may think of 'descending into a valley'.

The eventual objective of RSM is to determine the optimum operating conditions for a technical system or to determine a region of the attribute space in which operating specifications are satisfied. RSM is not used primarily to gain understanding of the physical mechanism of a technical system, although RSM may assist in gaining such knowledge. Furthermore, note that 'optimum' in RSM is used in a special sense: the 'hill climbing' or 'valley descending' procedures of RSM guarantee convergence to a local optimum only.


6.6.3 Goals and Control of Quality

Response surface methodology is useful in the solution of many types of industrial problems. Generally, these problems fall into three categories:

1. Mapping a Response Surface over a Particular Region of Interest. If the true unknown response function has been approximated over a region around the current operating conditions with a suitable fitted response surface (say a second–order surface), then the process engineer and/or the designer can predict in advance the changes in the response that will result from any readjustments to the independent variables.

2. Selecting the Operating Conditions to Achieve Specifications or Customer Requirements. In most response surface problems there are several responses that must be simultaneously considered. In this case, one way to solve the problem is to obtain response surfaces for all the responses and then superimpose the contours for these responses.

3. Optimizing the Response. In the industrial world, a very important problem is determining the conditions that optimize a process or a subsystem of a technical product. An RSM study that has begun near point A in Figure 6.15(b) would eventually lead the experimenter to the region near point B. A second–order model could then be used to approximate the response in a narrow region around point B, and from examination of this approximating response surface the optimum levels for the independent variables could be chosen.

During the last thirty years, industrial organizations in the United States and Europe have become quite interested in quality improvement. Statistical methods, including statistical process control and design of experiments, play a key role in this activity. Quality improvement is most effective when it occurs early in the product and process development cycle. Industries such as semiconductors and electronics, aerospace, automotive, biotechnology and pharmaceuticals, medical devices, chemical, and process industries are all examples where experimental design methodology has resulted in products that are easier to manufacture, have higher reliability, have enhanced product performance, and meet or exceed customer requirements. The objectives of quality improvement, including reduction of variability and improved product and process performance, can often be accomplished directly using RSM.

RSM is an important branch of sampling based methods in this respect. It is often an important concurrent engineering tool, in that product designers, process developers, quality controllers, manufacturing engineers, and operations personnel often work together in a team environment to apply RSM.

It is well known that variation in key performance characteristics can result in poor product and process quality. During the 1980s, considerable attention was given to this problem, and methodology was developed for using DoE specifically for the following purposes:

• designing products or processes so that they are robust to environmental conditions;

• designing or developing products so that they are robust to component variation;

• minimizing variability in the output response of a product around a target value.


Robust means that the product or process performs consistently on target and is relatively insensitive to factors that are difficult to control. Taguchi (1981, 1983) used the term robust parameter design to describe his approach to this important class of industrial problems. Essentially, robust parameter design methodology seeks to reduce product or process variation by choosing levels of the controllable factors that make the system insensitive (or robust) to changes in a set of uncontrollable factors that represent most of the sources of variability. Taguchi referred to these uncontrollable factors as noise factors. These are environmental parameters such as stowage factor levels, changes in prices of materials, fuel cost variability, interest rate on debt, and so on. It is usually assumed that these noise factors are uncontrollable in actual operation, but that they can be controlled during product or process design and development by means of DoE.

Considerable attention has been focused on the methodology advocated by Taguchi. There are many useful concepts in his philosophy, and it is relatively easy to incorporate them within the framework of response surface methodology. Several attractive alternatives to his robustness studies were developed that are based on the principles and philosophy of Taguchi while avoiding the flaws and controversy that surround his techniques.

6.7 Building Empirical Models

There is a collection of statistical techniques useful for building the types of empirical models required in RSM. A summary of the regression methods useful in response surface work is provided below, focusing on the basic ideas of least squares model fitting, diagnostic checking, and inference for the regression model.

6.7.1 Multiple Regression Analysis

Multiple regression analysis is a formalized way to develop models or equations from historical data. It is a technique for curve fitting when the relationship between the dependent and independent variables is not known. It consists of the following steps:

• establish an equation form;

• transform to linear form if needed;

• perform least squares to fit to data;

• test for ‘goodness’ of fit.

The experimenter must check the behaviour of the fitted equation between data points, especially with higher order equations, and must not extrapolate or use the equation beyond the range of the available data. There are general rules to respect, namely:

• curves should be consistent with theory and engineering judgement;

• it should not be possible to remove one independent variable without a significant loss in accuracy;

• keep the number of terms in the equation reasonably small, say 10 or less;


• it should not be possible to improve the regression equation by adding one or two independent variables;

• try achieving R²adj = 0.95 or so;

• avoid problems with highly correlated variables;

• any variable with a large coefficient should be watched, especially on terms with coefficients of opposite signs.

A number of standard equation forms are illustrated in Table 6.2.

Linear            y = b◦ + b1x
Multiple linear   y = b◦ + b1x1 + b2x2 + . . .
Hyperbola         y = 1/(b◦ + b1x)        can use z = 1/y = b◦ + b1x (linear)
Polynomial        y = b◦ + b1x + b2x² + . . .
Exponential       y = b c^x               can use log y = log b + x log c (linear on semi–log)
Geometric         y = b x^c               can use log y = log b + c log x (linear on log–log)

Table 6.2. Standard equations in regression analysis

6.7.2 Multiple Linear Regression Models

In the practical application of response surface methodology it is necessary to develop an approximating model for the true response surface, which is typically driven by some physical mechanism. The approximating model is based on observed or computed data from the manufacturing process or technical system and is an empirical model.

As an example, suppose that the experimenter wishes to develop an empirical model relating the effective lift of an airfoil to the flow speed and the incidence angle. A first-order response surface model that might describe the relationship for an empirical model with two variables is

y = β0 + β1x1 + β2x2 + ε (6.60)

where y represents the lift, x1 represents the flow speed, and x2 denotes the incidence angle. This is a multiple linear regression model with two independent variables. The independent variables are often called predictor variables or regressors. The term 'linear' is used because equation (6.60) is a linear function of the unknown parameters β0, β1 and β2. The model describes a plane in the two-dimensional x1, x2 space. The parameter β0 fixes the intercept of the plane. The parameters β1 and β2 are sometimes called partial regression coefficients, because β1 measures the expected change in y per unit change in x1 when x2 is held constant, and β2 measures the expected change in y per unit change in x2 when x1 is held constant.


In general, the response variable y may be related to k regressor variables. The model

y = β0 + β1x1 + β2x2 + . . . + βkxk + ε (6.61)

is called a multiple linear regression model with k regressor variables, {xj}. The parameters βj (j = 0, 1, . . . , k) are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables {xj}. The parameter βj represents the expected change in the response y per unit change in xj when all the remaining independent variables xi (i ≠ j) are held constant.

Models that are more complex in appearance than equation (6.61) may often still be analyzed by multiple linear regression techniques. For example, adding an interaction term to the first-order model in two variables, one obtains

y = β0 + β1x1 + β2x2 + β12x1x2 + ε (6.62)

If one lets x3 = x1x2 and β3 = β12, then equation (6.62) can be written as

y = β0 + β1x1 + β2x2 + β3x3 + ε (6.63)

which is a standard multiple linear regression model with three regressors.

As another example, consider the second-order response surface model in two variables:

y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε        (6.64)

If one lets x3 = x1², x4 = x2², x5 = x1x2, β3 = β11, β4 = β22, and β5 = β12, then equation (6.64) becomes

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε (6.65)

which is a linear regression model.

In general, any regression model that is linear in the parameters (the β-values) is a linear regression model, regardless of the shape of the response surface that it generates.

Methods for estimating the parameters in multiple linear regression models are often called model fitting. They are illustrated below, together with methods for testing hypotheses and constructing confidence intervals for these models, and for checking the adequacy of the model fit. The focus is primarily on those aspects of regression analysis useful in RSM.

6.7.3 Parameters Estimation in Linear Regression Models

The method of least squares is typically used to estimate the regression coefficients in a multiple linear regression model. Suppose that n > k designs on the response variable are available, say y1, y2, . . . , yn. Along with each observed or computed response yi, the experimenter will have a value of each regressor variable. Let xij denote the ith level of variable xj. The data matrix will appear as in Table 6.3. The error term ε in the model is assumed to have E(ε) = 0 and V(ε) = σ², and the {ε} are assumed to be uncorrelated random variables.


In general, the model equation (6.61) may be written in terms of the regressors in Table 6.3 as

yi = β0 + β1 xi1 + β2 xi2 + . . . + βk xik + εi = β0 + Σ_{j=1}^{k} βj xij + εi ,   i = 1, 2, . . . , n        (6.66)

y      x1      x2      x3      . . .    xk
y1     x11     x12     x13     . . .    x1k
y2     x21     x22     x23     . . .    x2k
 .      .       .       .                .
 .      .       .       .                .
yn     xn1     xn2     xn3     . . .    xnk

Table 6.3. Data matrix for multiple linear regression

The method of least squares chooses the β's in equation (6.66) so that the sum of the squares of the errors, εi, is minimized. The least squares function is

L = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} [ yi − β0 − Σ_{j=1}^{k} βj xij ]²        (6.67)

The function L is to be minimized with respect to β0, β1, . . . , βk. The least squares estimators, say b0, b1, . . . , bk, must satisfy the system of equations

(∂L/∂β0)|_{b0, b1, . . . , bk} = −2 Σ_{i=1}^{n} [ yi − b0 − Σ_{j=1}^{k} bj xij ] = 0

(∂L/∂βj)|_{b0, b1, . . . , bk} = −2 Σ_{i=1}^{n} [ yi − b0 − Σ_{j=1}^{k} bj xij ] xij = 0 ,   j = 1, 2, . . . , k        (6.68)

Simplifying equations (6.68), one obtains

n b0 + b1 Σ_{i=1}^{n} xi1 + b2 Σ_{i=1}^{n} xi2 + . . . + bk Σ_{i=1}^{n} xik = Σ_{i=1}^{n} yi

b0 Σ_{i=1}^{n} xi1 + b1 Σ_{i=1}^{n} xi1² + b2 Σ_{i=1}^{n} xi1 xi2 + . . . + bk Σ_{i=1}^{n} xi1 xik = Σ_{i=1}^{n} xi1 yi

. . .

b0 Σ_{i=1}^{n} xik + b1 Σ_{i=1}^{n} xik xi1 + b2 Σ_{i=1}^{n} xik xi2 + . . . + bk Σ_{i=1}^{n} xik² = Σ_{i=1}^{n} xik yi        (6.69)

These equations are called the least squares normal equations. Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution of the normal equations yields the least squares estimators of the regression coefficients β0, β1, . . . , βk.


In scalar notation, the fitted model is

ŷi = b0 + Σ_{j=1}^{k} bj xij ,   i = 1, 2, . . . , n

The difference between the computed value yi and the fitted value ŷi is the residual of the ith design, say

ei = yi − ŷi
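A minimal sketch in NumPy that solves the normal equations (6.69) in matrix form, (x'x) b = x'y, and computes the fitted values and residuals; the data are purely illustrative, and in practice one would often call a library routine such as numpy.linalg.lstsq instead:

```python
import numpy as np

# illustrative data: n = 6 observations, k = 2 regressors
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.5, 0.1, 0.9, 0.3, 0.7, 0.2])
y  = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.7])

# model matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# solve the normal equations (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b          # fitted values
e = y - y_hat          # residuals

print("coefficients b0, b1, b2:", b)
print("residuals:", e)
```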

6.7.4 Model Adequacy Checking

It is always necessary to examine the fitted model to ensure that it provides an adequate approximation to the true physical phenomenon or process, and to verify that none of the least squares regression assumptions are violated. Proceeding with exploration and optimization of a fitted response surface will likely give poor or misleading results unless the model is an adequate fit. Several techniques for checking model adequacy are presented below.

Test for Significance of Regression

The test for significance of regression is a test to determine if there is a linear relationship between the response variable y and a subset of the regressor variables x1, x2, . . . , xk. The appropriate hypotheses are

H◦ : β1 = β2 = . . . = βk = 0
H1 : βj ≠ 0 for at least one j        (6.70)

Rejection of H◦ in equation (6.70) implies that at least one of the independent variables x1, x2, . . . , xk contributes significantly to the model. The test procedure involves partitioning the total sum of squares SST = Σ(yi − ȳ)² into a sum of squares due to the model (or to regression) and a sum of squares due to residual (or error), say

SST = SSR + SSE        (6.71)

Now, if the null hypothesis H◦ : β1 = β2 = . . . = βk = 0 is true, then SSR/σ² is distributed as χ²k, where the number of degrees of freedom of χ²k is equal to the number of independent variables in the model. Also, it can be demonstrated that SSE/σ² is distributed as χ²n−k−1 and that SSR and SSE are independent. The test procedure for H◦ : β1 = β2 = . . . = βk = 0 is to compute

F◦ = [SSR/k] / [SSE/(n − k − 1)] = MSR/MSE        (6.72)

and to reject H◦ if F◦ exceeds Fα,k,n−k−1.


Alternatively, one could use the p–value approach to hypothesis testing and, thus, reject H◦ if the p–value for the statistic F◦ is less than α. The test is usually summarized as in Table 6.4.

Source of Variation      Sum of Squares    Degrees of Freedom    Mean Square    F◦
Regression               SSR               k                     MSR            MSR/MSE
Error or residual        SSE               n − k − 1             MSE
Total                    SST               n − 1

Table 6.4. Analysis of the variance for significance of regression in multiple regression

This test procedure is called analysis of variance because it is based on the decomposition of the total variability in the response variable y.
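A minimal sketch of the F-test for significance of regression of equation (6.72); the helper function and the synthetic data are illustrative assumptions, and scipy is used only for the F distribution:

```python
import numpy as np
from scipy.stats import f as f_dist

def significance_of_regression(X, y):
    """ANOVA F-test for H0: beta_1 = ... = beta_k = 0 (X includes the intercept column)."""
    n, p = X.shape
    k = p - 1
    b = np.linalg.lstsq(X, y, rcond=None)[0]

    ss_total = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - X @ b) ** 2)
    ss_regression = ss_total - ss_error

    ms_regression = ss_regression / k
    ms_error = ss_error / (n - k - 1)

    F0 = ms_regression / ms_error                 # equation (6.72)
    p_value = f_dist.sf(F0, k, n - k - 1)
    return F0, p_value

# illustrative data with the structure of Table 6.3
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=(12, 2))
X = np.column_stack([np.ones(12), x])
y = 1.0 + 2.0 * x[:, 0] - 1.5 * x[:, 1] + rng.normal(0, 0.1, 12)
print(significance_of_regression(X, y))
```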

Residual Analysis

The decomposition of the variability in the observations through an analysis of variance identity - see equation (6.35) - is a purely algebraic relationship. However, the use of the partitioning to test formally for no differences in treatment means requires that certain assumptions be satisfied. Specifically, these assumptions are that the observations are adequately described by the model

yij = µ + τi + εij

and that the errors are normally and independently distributed with mean zero and constant but unknown variance σ². If these assumptions are valid, then the analysis of variance procedure is an exact test of the hypothesis of no difference in treatment means.

In practice, since these assumptions will usually not hold exactly, it is unwise to rely on the analysis of variance until their validity has been checked. Violations of the basic assumptions and model adequacy can be easily investigated by the examination of residuals. The residual from the least squares fit for the jth observation in the ith treatment, defined as

eij = yij − ŷij        (6.73)

plays an important role in judging model adequacy.

In equation (6.73) ŷij is an estimate of the corresponding experimental or computed response yij, obtained as follows

ŷij = µ̂ + τ̂i = ȳ.. + (ȳi. − ȳ..) = ȳi.        (6.74)

Equation (6.74) gives the intuitively appealing result that the estimate of any observation in the ith treatment is just the corresponding treatment mean.

Examination of the residuals should always be part of any analysis of variance. If the model is adequate, the residuals should be structureless; that is, they should contain no regular patterns. Through a study of residuals, many types of model inadequacies and violations of the underlying assumptions can be discovered.


A check of the normality assumption may be made by constructing a normal probability plot of the residuals, as in Figure 6.21. If the residuals plot approximately along a straight line, then the normality assumption is satisfied. Figure 6.21 reveals no apparent problem with normality. When this plot indicates problems with the normality assumption, the response variable is often transformed as a remedial measure.

Figure 6.21. Normal probability plot of residuals

Figure 6.22 presents a plot of residuals ei versus the predicted response ŷi. The general impression is that the residuals scatter randomly, suggesting that the variance of the responses is constant for all values of y. If the variance of the response depends on the mean level of y, then this plot will often exhibit a funnel–shaped pattern. This is also suggestive of the need for a transformation of the response variable y.

Figure 6.22. Plot of residuals versus predicted response yi
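A minimal sketch producing the two diagnostic plots discussed above with matplotlib and scipy; the residuals and fitted values are synthetic placeholders standing in for the output of a least squares fit:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# illustrative residuals and fitted values from some least squares fit
rng = np.random.default_rng(7)
y_hat = np.linspace(10.0, 50.0, 30)
e = rng.normal(0.0, 1.0, size=y_hat.size)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# normal probability plot of the residuals (cf. Figure 6.21)
stats.probplot(e, dist="norm", plot=ax1)
ax1.set_title("Normal probability plot of residuals")

# residuals versus predicted response (cf. Figure 6.22)
ax2.scatter(y_hat, e)
ax2.axhline(0.0, linestyle="--")
ax2.set_xlabel("predicted response")
ax2.set_ylabel("residual")
ax2.set_title("Residuals vs. predicted response")

plt.tight_layout()
plt.show()
```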


Scaling Residuals

Many response surface analysts prefer to work with scaled residuals, in contrast to the ordinary least squares residuals, since scaled residuals often convey more information than do the ordinary residuals.

Standardized Residuals

One type of scaled residual is the standardized residual

di = ei/σ̂ ,   i = 1, 2, . . . , n        (6.75)

where the residual is scaled by dividing it by the estimated standard deviation; σ̂ = √MSE is generally used in the computation.

These standardized residuals have mean zero and approximately unit variance; consequently, they are useful in looking for outliers. Most of the standardized residuals should lie in the interval −3 ≤ di ≤ 3, and any phenomenon with a standardized residual outside this interval is potentially unusual with respect to its response. The outliers should be carefully examined, because they may represent something as simple as a data recording error or something of more serious concern, such as a region of the regressor variable space where the fitted model is a poor approximation to the true response surface.

Studentized Residuals

In some data sets, residuals may have standard deviations that differ greatly, so that a different scaling (studentized residuals) should be taken into account.

The vector of fitted values ŷi corresponding to the computed values yi is

ŷ = xb = Hy        (6.76)

The n × n matrix H is usually called the hat matrix because it maps the vector of computed values into a vector of fitted values. The hat matrix and its properties play a central role in regression analysis.

The variance of the ith residual is

V (ei) = σ2 (1− hii) (6.77)

where hii is the ith diagonal element of H. Because 0 ≤ hii ≤ 1, using the residual mean square MSE to estimate the variance of the residuals actually overestimates V(ei). Furthermore, because hii is a measure of the location of the ith point in x-space, the variance of ei depends upon where the point xi lies. Generally, residuals near the center of the x-space have larger variance than do residuals at more remote locations. Violations of model assumptions are more likely at remote points, and these violations may be hard to detect from inspection of ei (or di) because their residuals will usually be smaller.


It is therefore recommended to take this inequality of variance into account when scaling the residuals. Instead of ei (or di) it is suggested to plot the studentized residuals

ri = ei / √[σ̂²(1 − hii)] ,   i = 1, 2, . . . , n        (6.78)

with σ̂² = MSE, instead of di or ei.

The studentized residuals have constant variance V(ri) = 1 regardless of the location of xi when the form of the model is correct. In many situations the variance of the residuals stabilizes, particularly for large data sets. In these cases there may be little difference between the standardized and studentized residuals. Thus standardized and studentized residuals often convey equivalent information. However, because any point with a large residual and a large hii is potentially highly influential on the least squares fit, examination of the studentized residuals is generally recommended.
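A minimal sketch computing the hat matrix, the standardized residuals of equation (6.75), and the studentized residuals of equation (6.78) for a least squares fit; the data are illustrative and X includes the intercept column:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 15
x = rng.uniform(0, 1, size=(n, 2))
X = np.column_stack([np.ones(n), x])
y = 2.0 + 1.5 * x[:, 0] - 0.8 * x[:, 1] + rng.normal(0, 0.2, n)

p = X.shape[1]
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix, equation (6.76)
h = np.diag(H)                            # leverages h_ii
mse = e @ e / (n - p)                     # estimate of sigma^2

d = e / np.sqrt(mse)                      # standardized residuals, equation (6.75)
r = e / np.sqrt(mse * (1.0 - h))          # studentized residuals, equation (6.78)

print("largest |standardized residual|:", np.abs(d).max())
print("largest |studentized residual| :", np.abs(r).max())
```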

PRESS Residuals

The prediction error sum of squares (PRESS) proposed by Allen (1971, 1974) provides another useful residual scaling. To calculate PRESS, select a design i. Fit the regression model to the remaining n − 1 computations and use this equation to predict the withheld response yi. Denoting this predicted value by ŷ(i), one may find the prediction error for point i as e(i) = yi − ŷ(i), which is often called the ith PRESS residual. This procedure is repeated for each design i = 1, 2, . . . , n, producing a set of n PRESS residuals e(1), e(2), . . . , e(n). Then the PRESS statistic is defined as the sum of squares of the n PRESS residuals

PRESS = Σ_{i=1}^{n} e(i)² = Σ_{i=1}^{n} [yi − ŷ(i)]²        (6.79)

Thus PRESS uses each possible subset of n − 1 responses as an estimation data set, and every computation in turn is used to form a prediction data set. It would initially seem that calculating PRESS requires fitting n different regressions. However, it is possible to calculate PRESS from the results of a single least squares fit to all n observations. It turns out that the ith PRESS residual is

e(i) = ei/(1 − hii)        (6.80)

Thus, because PRESS is just the sum of the squares of the PRESS residuals, a simple computing formula is

PRESS = Σ_{i=1}^{n} [ei/(1 − hii)]²        (6.81)

From equation (6.80) it is easy to see that the PRESS residual is just the ordinary residual weighted according to the diagonal elements of the hat matrix hii. Data points for which hii are large will have large PRESS residuals; these computations will generally be high influence points. Generally, a large difference between the ordinary residual and the PRESS residual indicates a point where the model fits the data well, but a model built without that point predicts poorly.

The variance of the ith PRESS residual is

V[e(i)] = V[ei/(1 − hii)] = σ²(1 − hii)/(1 − hii)² = σ²/(1 − hii)        (6.82)

so that the standardized PRESS residual is

e(i)/√V[e(i)] = [ei/(1 − hii)] / √[σ²/(1 − hii)] = ei / √[σ²(1 − hii)]

which, if MSE is used to estimate σ², is just the studentized residual.

Finally, one may note that PRESS can be used to compute an approximate R² for prediction, say

R²prediction = 1 − PRESS/SST        (6.83)

This statistic gives some indication of the predictive capability of the regression model: values of R²prediction close to unity indicate that the model can be expected to predict new responses satisfactorily.
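A minimal sketch of PRESS and R²prediction computed from a single least squares fit, via equations (6.81) and (6.83); the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 3.0 + 0.7 * x + rng.normal(0, 0.5, n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages h_ii

press = np.sum((e / (1.0 - h)) ** 2)            # equation (6.81)
ss_total = np.sum((y - y.mean()) ** 2)
r2_prediction = 1.0 - press / ss_total          # equation (6.83)

print(f"PRESS = {press:.3f}, R2_prediction = {r2_prediction:.3f}")
```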

R-Student

The studentized residual ri discussed above is often considered an outlier diagnostic. It is customary to use MSE as an estimate of σ² in computing ri. This is referred to as internal scaling of the residual, because MSE is an internally generated estimate of σ² obtained from fitting the model to all n designs. Another approach would be to use an estimate of σ² based on a data set with the ith design removed. Denote the estimate of σ² so obtained by S²(i). It can be shown that

S²(i) = [(n − p) MSE − ei²/(1 − hii)] / (n − p − 1)        (6.84)

The estimate of σ² in equation (6.84) is used instead of MSE to produce an externally studentized residual, usually called R-student, given by

ti = ei / √[S²(i)(1 − hii)] ,   i = 1, 2, . . . , n        (6.85)

In many situations, ti will differ little from the studentized residual ri. However, if the ith design is influential, then S²(i) can differ significantly from MSE, and thus R-student will be more sensitive to this point. Furthermore, under the standard assumptions, ti has a tn−p−1 distribution. Thus R-student offers a more formal procedure for outlier detection via hypothesis testing. However, it is generally accepted that a formal approach is usually not necessary and that only relatively crude cut–off values need be considered. In general, a diagnostic view as opposed to a strict statistical hypothesis-testing view is preferred. Furthermore, detection of outliers needs to be considered simultaneously with detection of influential observations.

Figure 6.23 is a normal probability plot of the studentized residuals. It conveys exactly the same information as the normal probability plot of the ordinary residuals ei in Figure 6.21. This is because most of the hii-values are similar and there are no unusually large residuals. In some applications, however, the hii can differ considerably, and in those cases plotting the studentized residuals is the best approach.

Figure 6.23. Normal probability plot of studentized residuals

Influence Diagnostics

One may occasionally find that a small subset of the data exerts a disproportionate influence on the fitted regression model. That is, parameter estimates or predictions may depend more on the influential subset than on the majority of the data. The experimenter would like to identify these influential points and assess their impact on the model. If these influential points are 'bad' values, then they should be eliminated. On the other hand, there may be nothing wrong with these points, but if they control key model properties, the experimenter would like to know it, because it could affect the use of the model. Several useful measures of influence are described below.

Leverage Points

The disposition of points in the design space is important in determining model properties. In particular, remote points potentially have disproportionate leverage on the parameter estimates, the predicted values, and the usual summary statistics. The hat matrix H is very useful in identifying influential design points. The elements hij of H may be interpreted as the amount of leverage exerted by yj on ŷi. Thus inspection of the elements of H can reveal points that are potentially influential by virtue of their location in x-space. Attention is usually focused on the diagonal elements hii. Because Σ hii = rank(H) = rank(x) = p, the average size of the diagonal elements of H is p/n. As a rough guideline, then, if a diagonal element hii is greater than 2p/n, design point i is a high–leverage point.

Cook’s Distance

The hat diagonals will identify points that are potentially influential due to their location in the design space. It is desirable to consider both the location of the point and the response variable in measuring influence. Cook (1977, 1979) has suggested using a measure, D, of the squared distance between the least squares estimate based on all n points, b, and the estimate obtained by deleting the ith point, say b(i). Cook's distance practically measures the effect of deleting a given case.

The statistic Di may be written as

Di = (ri²/p) · V[ŷ(xi)]/V(ei) = (ri²/p) · hii/(1 − hii) ,   i = 1, 2, . . . , n        (6.86)

It can be noted that, apart from the constant p, Di is the product of the square of the ith studentized residual and hii/(1 − hii). This ratio can be shown to be the distance from the vector xi to the centroid of the remaining data. Thus Di is made up of a component that reflects how well the model fits the ith computed value yi and a component that measures how far that point is from the rest of the data. Either component (or both) may contribute to a large value of Di.

Points with large values of Di have considerable influence on the vector of least squares estimates b. The magnitude of Di may be assessed by comparing it with Fα,p,n−p. If Di ≈ F0.5, p, n−p, then deleting point i would move b to the boundary of a 50% confidence region for β based on the complete data set. This is a large displacement and indicates that the least squares estimate is sensitive to the ith data point. Because F0.5, p, n−p ≈ 1, points for which Di > 1 are usually considered to be influential. Practical experience has shown that the cut-off value of 1 works well in identifying influential points.
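A minimal sketch of Cook's distance via equation (6.86), with an illustrative data set containing one deliberately unusual point:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i for each point of a least squares fit, equation (6.86)."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    mse = e @ e / (n - p)
    r = e / np.sqrt(mse * (1.0 - h))          # studentized residuals
    return (r ** 2 / p) * h / (1.0 - h)

# illustrative data with one point remote in x-space and poorly fitted
rng = np.random.default_rng(2)
x = np.append(rng.uniform(0, 1, 14), 3.0)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 15)
y[-1] += 2.0
X = np.column_stack([np.ones(15), x])

D = cooks_distance(X, y)
print("points with D_i > 1:", np.where(D > 1.0)[0])
```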

6.7.5 Fitting a Second-Order Model

Many applications of response surface methodology involve fitting and checking the adequacy of a second-order model. A complete example of this process is presented hereinafter.

Suppose that after a screening experiment involving several factors the two most important variables were selected. Because the experimenter thought that the process was operating in the vicinity of the optimum, he elected to fit a quadratic model relating the response to those two variables. Table 6.5 shows the levels in terms of the coded variables x1 and x2, while Figure 6.24 shows the experimental design graphically.


This design is called a central composite design, and it is widely used for fitting a second-order response surface. Notice that the design consists of four runs at the corners of a square, plus one run at the center of this square, plus four axial runs.

Computation    x1        x2        y
1              -1        -1        43
2               1        -1        78
3              -1         1        69
4               1         1        73
5              -1.414     0        48
6               1.414     0        78
7               0        -1.414    65
8               0         1.414    74
9               0         0        76

Table 6.5. A Central Composite Design

In terms of the coded variables, the corners of the square are (x1, x2) = (−1, −1), (1, −1), (−1, 1), (1, 1); the center point is at (x1, x2) = (0, 0); and the axial runs are at (x1, x2) = (−1.414, 0), (1.414, 0), (0, −1.414), (0, 1.414).

Figure 6.24. Example of central composite design

The second–order model will be fitted as

y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε

using the coded variables.


The matrix x and vector y for this model are

              x1       x2       x1²   x2²   x1x2
        | 1   −1       −1        1     1      1  |            | 43 |
        | 1    1       −1        1     1     −1  |            | 78 |
        | 1   −1        1        1     1     −1  |            | 69 |
        | 1    1        1        1     1      1  |            | 73 |
   x =  | 1   −1.414    0        2     0      0  | ,     y =  | 48 |
        | 1    1.414    0        2     0      0  |            | 78 |
        | 1    0       −1.414    0     2      0  |            | 65 |
        | 1    0        1.414    0     2      0  |            | 74 |
        | 1    0        0        0     0      0  |            | 76 |

Notice that the variables associated with each column have been shown above that column in the matrix x. The entries in the columns associated with x1² and x2² are found by squaring the entries in columns x1 and x2, respectively, and the entries in the x1x2 column are found by multiplying each entry from x1 by the corresponding entry from x2.

Figure 6.25 shows the response surface and contour plot of the predicted response.

Figure 6.25. Example of CCD design experiment
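A minimal sketch fitting the second-order model to the central composite design of Table 6.5 by least squares; the y values are those listed in the table above, and the grid evaluation at the end is an illustrative stand-in for the surface and contour plot of Figure 6.25:

```python
import numpy as np

# central composite design of Table 6.5 (coded variables) and observed responses
x1 = np.array([-1,  1, -1,  1, -1.414, 1.414,  0,      0,     0])
x2 = np.array([-1, -1,  1,  1,  0,     0,     -1.414,  1.414, 0])
y  = np.array([43, 78, 69, 73, 48,     78,     65,     74,    76], dtype=float)

# quadratic model matrix: 1, x1, x2, x1^2, x2^2, x1*x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1*x2])

# least squares estimates of beta_0 ... beta_12
b = np.linalg.lstsq(X, y, rcond=None)[0]
print("fitted coefficients:", np.round(b, 3))

# predicted response on a coarse grid of coded variables
g1, g2 = np.meshgrid(np.linspace(-1.5, 1.5, 5), np.linspace(-1.5, 1.5, 5))
y_hat = b[0] + b[1]*g1 + b[2]*g2 + b[3]*g1**2 + b[4]*g2**2 + b[5]*g1*g2
print("predicted response at the design centre:", round(float(b[0]), 2))
```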

6.7.6 Transformation of the Response Variable

It has been noted above that a data transformation can often be used when residual analysis indicates some problem with the underlying model assumptions, such as nonnormality or nonconstant variance in the response variable. Here the use of a data transformation is illustrated by considering a 3³ factorial experiment, taken from Box and Draper (1987), which supports a complete second–order polynomial. Its least squares fit is

ŷ = 550.7 + 660 x1 − 535.9 x2 − 310.8 x3 + 238.7 x1² + 275.7 x2² − 48.3 x3² − 456.5 x1x2 − 235.7 x1x3 + 143 x2x3

The R² value is 0.975. An analysis of variance is given in Table 6.6. The fit appears to be reasonable, and both the first- and second–order terms appear to be necessary.

Source of Variability    Sum of Squares (×10⁻³)    Degrees of Freedom    Mean Square (×10⁻³)    F◦
First-order terms        14,748.5                  3                     4,916.2                70.0
Second-order terms       4,224.3                   6                     704.1                  9.5
Residual                 1,256.6                   17                    73.9
Total                    20,229.4                  26

Table 6.6. Analysis of the variance for a quadratic model

Figure 6.26 is a plot of residuals versus the predicted values ŷ for this model. There is an indication of an outward–opening funnel in this plot, implying possible inequality of variance.

Figure 6.26. Plot of residuals vs. predicted values for a quadratic model

When a natural log transformation is used for y, the following model is obtained

ln ŷ = 6.33 + 0.82 x1 − 0.63 x2 − 0.38 x3   ⇒   ŷ = e^(6.33 + 0.82 x1 − 0.63 x2 − 0.38 x3)

This model has R^2 = 0.963, and has only three model terms (apart from the intercept). None of the second-order terms are significant. Here, as in most modelling exercises, simplicity is of vital importance. The elimination of the quadratic and interaction terms by the change in response metric not only allows a better fit than the second-order model in the natural metric, but also makes the effect of the design variables x1, x2 and x3 on the response clear.

Figure 6.27 is a plot of residuals versus the predicted response for the log model. There is still some indication of inequality of variance, but the log model, overall, is an improvement on the original quadratic fit.

Figure 6.27. Plot of residuals vs. predicted values for a log model

The transformation of the response variable can be used for stabilizing the variance of the response.

Generally, transformations are used for three purposes: stabilizing the response variance, making the distribution of the response variable closer to the normal distribution, and improving the fit of the model to the data. This last objective could include model simplification, say by eliminating interaction or higher-order polynomial terms. Sometimes a transformation will be reasonably effective in accomplishing more than one of these objectives simultaneously.

It is often found that the power family of transformations y* = y^\lambda is very useful, where \lambda is the parameter of the transformation to be determined (e.g., \lambda = 1/2 means use the square root of the original response). Box and Cox (1964) have shown how the transformation parameter \lambda may be estimated simultaneously with the other model parameters (overall mean and treatment effects). The theory underlying their method uses the method of maximum likelihood. The actual computational procedure consists of performing, for various values of \lambda, a standard analysis of variance on

y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda - 1}} & \lambda \neq 0 \\
\dot{y}\,\ln y & \lambda = 0
\end{cases}
\qquad (6.87)

where \dot{y} = \ln^{-1}\left[(1/n)\sum \ln y\right] is the geometric mean of the responses. The maximum likelihood estimate of \lambda is the value for which the error sum of squares, SS_E(\lambda), is a minimum. This value of \lambda is usually found by plotting a graph of SS_E(\lambda) versus \lambda and then reading the value of \lambda that minimizes SS_E(\lambda) from the graph. Usually between 10 and 20 values of \lambda are sufficient for estimating the optimum value. A second iteration using a finer mesh of values can be performed if a more accurate estimate of \lambda is necessary.
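
A minimal sketch of this grid-search procedure is given below, assuming numpy and taking SS_E(λ) as the residual sum of squares from a least-squares fit of the transformed response on a given model matrix X; the function name is illustrative.

import numpy as np

def boxcox_sse(y, X, lambdas):
    """For each lambda, transform y according to Eq. (6.87) and return the
    residual sum of squares SSE(lambda) of a least-squares fit on the model matrix X."""
    y = np.asarray(y, dtype=float)
    ydot = np.exp(np.mean(np.log(y)))          # geometric mean of the responses
    sse = []
    for lam in lambdas:
        if abs(lam) > 1e-12:
            yt = (y**lam - 1.0) / (lam * ydot**(lam - 1.0))
        else:
            yt = ydot * np.log(y)
        resid = yt - X @ np.linalg.lstsq(X, yt, rcond=None)[0]
        sse.append(np.sum(resid**2))
    return np.array(sse)

# usage sketch: evaluate 10-20 lambda values and read off the minimizer
# lambdas = np.linspace(-1.0, 1.5, 15)
# sse = boxcox_sse(y, X, lambdas)
# lam_hat = lambdas[np.argmin(sse)]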

Once a value of \lambda is selected, the experimenter can analyze the data using y^\lambda as the response, unless of course \lambda = 0, in which case \ln y is used. It is perfectly acceptable to use y^{(\lambda)} as the actual response, although the model parameter estimates will then have a scale difference and an origin shift in comparison with the results obtained using y^\lambda (or \ln y).

An approximate 100(1 - \alpha)% confidence interval for \lambda can be found by computing

SS^{*} = SS_E(\lambda)\left(1 + \frac{t^{2}_{\alpha/2,\,\nu}}{\nu}\right) \qquad (6.88)

where \nu is the number of degrees of freedom, and plotting a line parallel to the \lambda-axis at height SS^{*} on the graph of SS_E(\lambda) versus \lambda. Then, by locating the points on the \lambda-axis where SS^{*} cuts the curve SS_E(\lambda), the experimenter can read confidence limits on \lambda directly from the graph. If this confidence interval includes the value \lambda = 1, this implies that the data do not support the need for the transformation.

Bibliography

[1] Allen, D.M.: Mean Square Error of Prediction as a Criterion for Selecting Variables, Technometrics, Vol. 13, 1971, pp. 469–475.

[2] Allen, D.M.: The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction, Technometrics, Vol. 16, 1974, pp. 125–127.

[3] Box, G.E.P. and Cox, D.R.: An Analysis of Transformations, Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, pp. 211–243.

[4] Box, G.E.P. and Draper, N.R.: Empirical Model Building and Response Surfaces, John Wiley & Sons, New York, 1987.

[5] Box, G.E.P. and Wetz, J.M.: Criterion for Judging the Adequacy of Estimation by an Approximation Response Polynomial, Technical Report no. 9, Department of Statistics, University of Wisconsin, Madison, 1973.

[6] Box, G.E.P. and Wilson, K.B.: On the Experimental Attainment of Optimum Conditions, Journal of the Royal Statistical Society, Series B, Vol. 13, 1951, pp. 1–45.

[7] Cook, R.D.: Detection of Influential Observation in Linear Regression, Technometrics, Vol. 18, 1977, pp. 15–17.

[8] Cook, R.D.: Influential Observations in Linear Regression, Journal of the American Statistical Association, Vol. 74, 1979, pp. 169–174.

[9] Fang, S., Gertner, G.Z., Shinkareva, S., Wang, G. and Anderson, A.: Improved Generalized Fourier Amplitude Sensitivity Test (FAST) for Model Assessment, Journal of Statistics and Computing, Vol. 13, 2003, pp. 221–226.

[10] Gunst, R.F. and Mason, R.L.: Some Considerations in the Evaluation of Alternative Prediction Equations, Technometrics, Vol. 21, 1979, pp. 55–63.

[11] Hill, R.C., Judge, G.G. and Fomby, T.B.: Test the Adequacy of a Regression Model, Technometrics, Vol. 20, 1978, pp. 491–494.

[12] Hines, W.W. and Montgomery, D.C.: Probability and Statistics in Engineering and Management Science, 3rd edition, John Wiley & Sons, New York, 1990.

[13] Khuri, A.I. and Cornell, J.A.: Response Surfaces: Designs and Analyses, 2nd edition, Dekker, New York, 1987.

[14] Koda, M., McRae, G.J. and Seinfeld, J.H.: Automatic Sensitivity Analysis of Kinetic Mechanisms, International Journal of Chemical Kinetics, Vol. 11, 1979, pp. 427–444.

[15] Myers, R.H.: Response Surface Methodology, Allyn and Bacon, Boston, 1976.

[16] Myers, R.H.: Response Surface Methodology: Current Status and Future Directions, Journal of Quality Technology, Vol. 31, 1999, pp. 30–44.

[17] Myers, R.H. and Montgomery, D.C.: Response Surface Methodology: Process and Product Optimization Using Designed Experiments, John Wiley & Sons, New York, 2002.

[18] Suich, R. and Derringer, G.C.: Is the Regression Equation Adequate? One Criterion, Technometrics, Vol. 19, 1977, pp. 213–216.

Chapter 7

Metamodelling Techniques

Today's engineering design frequently involves complex and large technical systems. With the advances of Computer-Aided Design and Engineering (CAD/CAE) techniques, complex mathematical models and computation-intensive numerical analyses/simulations have been used to simulate and analyze the system behaviour more accurately from many aspects, to explore a small number of design alternatives and to guide design improvements at subsystem level. However, the high computational cost associated with these analyses and simulations prohibits them from being used as performance measurement tools in design selection and optimization. In spite of advances in computer speed and capacity, computer codes (e.g., CFD and FEM) always seem to grow in complexity at the same pace, so that their computational time and cost remain non-trivial. The design optimization process normally requires a large number of numerical iterations before the optimal solution is identified.

The multidisciplinary nature of design and the need for incorporating uncertainty in design selection and optimization have posed additional challenges. In order to reduce computational effort, a widely used strategy is to utilize approximation models, which are often referred to as metamodels as they provide a 'model of the model' (Kleijnen, 1987), replacing the expensive simulation models during the design process. Metamodels can be thought of as simpler approximations of the physical models, in which a set of independent input variables leads to a dependent output variable in a function-like manner.

7.1 Notion of Metamodel

This section concerns itself with the literature background of design methods, metamodelling notions and related techniques. As such, it is intended to serve a dual purpose. The first is to provide an overview of the scientific and engineering progress which has taken place to date in the area of metamodelling. The second is to give a survey of the state-of-the-art in several fields that are related to metamodelling and have helped guide the problem statements and the formulation of the research questions.

While it is not easily discernible where the term 'metamodelling' first originated and in what context it was first used, two aspects do become clear from a literature review:

1. within the field of computer science, there is a term referred to as 'metamodelling' which seems to be directed towards an information coding approach more than towards modelling, and as such is decidedly different from the term's use in engineering;

2. the use of the term 'metamodelling' in the context of essentially the combination of a function approximation task and a modelling task is found primarily in the engineering literature, although the nature of the problems it addresses spans several other scientific disciplines as well.

This knowledge is useful when attempting to understand how design efforts relate to the various areas of science. What sets engineering apart from other sciences is often the task of having to design real-world, complex and mostly functional systems. The complexity issues introduced in both the product and especially the manufacturing process are not as likely to be found in the pure sciences, but they commonly arise in other applied sciences and in the social sciences.

7.1.1 Nature of Metamodelling

To counter the confusion that the design of complex technical systems poses, and to provide first steps towards solving the problems mentioned, researchers in the field of mathematical statistics have introduced the notion of metamodels as a means to organize information. While models employed in other fields and disciplines are based directly on the physics and may only be valid for small ranges and very specific circumstances, engineering metamodels inherently need to span wide ranges and configurations and encompass a variety of circumstances. Metamodels are to the engineers what models are to the physicists: they help understand the design world, that is, the design space; they help organize, explain and predict results within this space.

This point, however, leads to a duality in the tasks one expects from a metamodel: the distinct difference between a predictive model and one that is primarily explanatory. It may seem that this distinction is artificial and actually does not even make sense, since every highly explanatory model in both science and engineering has always been judged and validated by its ability to predict data accurately. Thus, an explanatory model which does not predict well will not be deemed appropriate. Similarly, a model which predicts well, and thus captures at least partially the essence of what is to be modelled, will almost inevitably also reveal something about the object of study. However, in the case of metamodelling, this can be different. The apparent paradox dissolves when one considers the true nature of metamodelling: the effective combination of a regression task and a modelling task, rather than a pure modelling task.

It may seem that this alone need not lead to a contradiction and that the metamodel yielding the best explanatory insights will often also achieve a high rating on predictive accuracy; this is indeed the case for most first- and second-order approximations. However, as the boundaries to higher standards are expanded, a discrepancy between these aims will appear.

As a regression tool, a predictive metamodel is primarily concerned only with the accuracy and validity of the predictions it produces. However, while this purpose is the primary factor, it is not always the only one. In certain cases it may be desirable to sacrifice part of the prediction accuracy in order to obtain a model that, due to its transparency, yields insight into the nature of the problem itself. An example is the execution of screening tests: the actual fit, and thus the predictive validity, is of secondary importance in such a case.

One could argue that any model, given that it has a sufficient validity in the predictions, will also be capable of yielding insight into the effects and dynamics within the modelled systems, due to the prediction profiles which can be generated for each variable. While this is true and most certainly useful, a bit of doubt remains, since such information is often mostly graphic and intuitive, and analysis and further thought need to be invested before many significant conclusions can be drawn.

In the end, however, as useful as explanatory models may be, there will always also be a need for accurately predicted values, with those criteria ranking higher in the regression approach than the transparency requirement. Primary applications include optimization runs and sensitivity analyses. Here the emphasis is deliberately placed on the regression metamodel, where predictive validity is of primary concern, but it should be noted that this might not always be applicable. Thus, other processes and techniques which yield a higher transparency still have their place in design, and may be favored in certain circumstances.

7.1.2 Metrics for Performance Measurements

The requirement that any designer building a metamodel would strive to fulfill is that of maximum efficacy, where efficacy is the combination of effectiveness and efficiency. For an approximation metamodel to be effective means primarily to yield a good, accurate representation of the approximated response. Typically, such a model is based on certain data points, and these data points are then approximated to yield the representation. Maximum efficiency is reached when a minimum of data points is required to achieve an acceptable representation. These data points are each actual design configurations, often referred to as cases.

In order for the metamodel to be effective, it must exhibit a good representation of the underlying, unknown function. This representation should satisfy several criteria:

• it must be accurate;
• it must have good interpolation properties;
• it is desirable to have reasonable extrapolation properties.

The first requirement for a metamodel to be effective is related to accuracy: it must fit the given data points well. How well depends on how much accuracy is desired, and this is often captured in the form of an error criterion or other convergence criteria, or through the model adequacy checking that results from the regression. Clearly, the approximation needs to be sufficiently precise: a certain level of accuracy will be necessary for subsequent decision making to be successful and meaningful. However, it is not desirable to focus solely on perfect accuracy. Rather, the optimal degree of accuracy will be a function of the level of noise in the modelling process which the metamodel is approximating.

On a second note, to be effective the metamodel representation must also model potential data points, not just those that happen to have been chosen as a basis for the approximation. This is captured by the term interpolation, and often also associated with the term extrapolation. Partially, it is a matter of collecting a representative data set. This aspect is also closely related to the efficiency of the metamodel: the objective will be to choose those cases which minimize the total number of cases to be analyzed but still yield a reasonably representative data set. This is where Design of Experiments (DoE) (Box, 1987) has been found useful.

A third criterion making a metamodel more effective would be for it to remain a good approximation even if it were applied outside the bounds it was originally designed for. In practice, the need to extrapolate computational results arises surprisingly often. Therefore, although exploring extrapolation capabilities may seem inherently unnecessary, it remains a point worth considering whenever possible. Naturally, the extrapolation properties will largely depend on the problem itself, and a general treatment may well be practically impossible.

In accordance with having multiple metamodelling criteria, the performance of each metamodelling technique is measured from the following aspects:

• accuracy: the capability of predicting the system response over the design space of interest, by using more complex approximations that are capable of fitting both linear and nonlinear functions;

• robustness: the capability of achieving good accuracy for different sample sizes;

• efficiency: the computational effort required for sampling the design space (using fewer sample points while seeking better coverage of the design space) for constructing the metamodel and for predicting the response at a set of new points;

• transparency: the capability of illustrating explicit relationships between input variables and responses;

• conceptual simplicity: ease of implementation; simple methods should require minimum user input.

To provide a more complete picture of metamodel accuracy and to verify it, three different metrics are used: R-Square, Relative Average Absolute Error, and Relative Maximum Absolute Error.

Coefficient of Determination

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{MSE}{\sigma^2} \qquad (7.1)

where \hat{y}_i is the predicted value corresponding to the observed value y_i, and \bar{y} is the mean of the observed values. While the MSE (Mean Square Error) represents the departure of the metamodel from the real simulation model, the variance captures how irregular the problem is. The larger the value of R^2, the more accurate the metamodel.

Relative Average Absolute Error

RAAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n\,\sigma} \qquad (7.2)

where \sigma stands for the standard deviation. The smaller the value of RAAE, the more accurate the metamodel.

Relative Maximum Absolute Error

RMAE = \frac{\max\left(|y_1 - \hat{y}_1|,\, |y_2 - \hat{y}_2|,\, \ldots,\, |y_n - \hat{y}_n|\right)}{\sigma} \qquad (7.3)

While the RAAE is usually highly correlated with the MSE and thus with R^2, the RMAE is not necessarily so. A large RMAE indicates a large error in one region of the design space, even though the overall accuracy indicated by R^2 and RAAE can be very good. A small RMAE is therefore preferred; however, since this metric cannot show the overall performance in the design space, it is not as important as R^2 and RAAE.
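
A minimal sketch, assuming numpy and an illustrative function name, of how the three metrics of Eqs. (7.1)-(7.3) can be computed from observed and predicted responses at a set of validation points:

import numpy as np

def metamodel_metrics(y_obs, y_pred):
    """R^2, RAAE and RMAE as defined above; sigma is the standard deviation
    of the observed responses."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    err = y_obs - y_pred
    sigma = np.std(y_obs)
    r2 = 1.0 - np.sum(err**2) / np.sum((y_obs - y_obs.mean())**2)
    raae = np.sum(np.abs(err)) / (len(y_obs) * sigma)
    rmae = np.max(np.abs(err)) / sigma
    return r2, raae, rmae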

7.2 Metamodels in Engineering Design

In physics and other sciences, researchers use models to describe what they know about the world. These models do not describe the world as a whole, but rather one part of it. Such models are typically based on a set of data which has been carefully collected by scientists. The model is then formulated in a manner such that it agrees with existing theories, satisfies and explains the given data, and accurately predicts new data. Engineering, as a discipline, makes use of these advances in science. By learning about the explanatory and predictive models created by disciplines such as physics, engineers create analysis codes that employ these models to analyze the specific configurations and designs with which they work.

Metamodels, like other models, are also created based on data. In the case of metamodels, however, the data collected is produced from analysis codes which are run for multiple configurations. The word metamodel is derived from 'meta', a Greek prefix which means 'over' or 'above'. In essence, metamodels are actually similar to the mathematical models which are developed and used in science to describe physical phenomena. However, they are concerned with the information that can be expressed during the modelling process, and are thus also referred to as information models (Friedman, 1996). While a physics-based model is intended to describe and/or explain actual data that has been collected, metamodels are different because, as information models, they attempt to model the model (Sacks et al., 1989) rather than the actual data itself. They are necessary because the level at which design analysis takes place is so far removed from the physics-based models that the complexity of the design problem becomes too great to use physics-based models directly. This process of model and metamodel creation is illustrated in Figure 7.1.

Figure 7.1. Metamodels in Technical Systems Design

This figure is structured to illustrate the ways models in general, and metamodels in particular, are built on data collected at various points. The source of the data gives the associated models their validity. When physics-based analysis codes are employed to create the metamodels, their range of validity is thus determined by the range of validity of the underlying physics models on which the analysis tool was built. This typically results in a much greater range of validity than analysis models based on test data can offer.

With this background, it becomes clear that the design discipline must be closely related to information modelling and information technology. There is - or should be - a close tie between the design disciplines and areas of computer science and mathematics. Information is the designer's raw material and what he molds to ultimately determine the design that best suits the requirements. As such, metamodels find their place in design as information models, and as models of models.

While direct integration of the disciplines is not an option, metamodels offer an enabling mechanism to integrate physics-based disciplines in a mathematically and technically sound way while upholding the contextual integrity of the information. The creation of metamodels is therefore an enabling technology not only for the design process itself, but also for further research in probabilistic and robust design. This also includes the infusion of new technologies and the integration of new and higher-level disciplinary aspects.

7.2.1 Modelling Design Uncertainties

In the early stages of the design process, the design team has only a rough knowledge of the form of the design, also because precise information on the functional requirements may not be available. Further, the design parameters themselves may not be specified precisely, but only known to fall within a certain range. To select among design alternatives at an early stage, the team members are faced with uncertainties in both the design description (and consequently the product performance) and the functional requirements (and consequently the technical specifications).

A metamodel-based approach allows uncertainties to be examined across the many dimensions that the design team must consider: functional requirements, technical requirements, design specification, predicted performance, manufacturing costs, and so forth. In this respect, it is mandatory to describe how the design process involves an examination of the product's characterization across many domains, and how uncertainties complicate the picture, especially when the maps between domains are nonlinear.

The uncertainties further complicate the design process: in addition to considering the mapping of design parameters to other spaces, and of constraints or goals back to the design parameter space, there is a requirement to map the uncertainty characterizations between spaces. The inherent nonlinearity of the maps cannot be ignored when the uncertainty characterization extends over a non-infinitesimal region.

Defining the supply region and demand region and their overlap can be difficult when the mapping functions are nonlinear. Further, depending on the space in which the comparison is made, the required mapping function or software may not exist explicitly, but only through the inverse of a known function or subroutine. For example, the map from technical performance to design parameters may be known only implicitly through its inverse, the map (e.g., added resistance in waves) that provides performance as a function of the design parameters.

As metamodels generally allow rapid calculation, the nature of the regions and their overlap can be explored numerically. For example, the overlap can be measured by numerical integration of the intersection of the two maps, using any numerical integration method and any measure function.

7.2.2 Importance of Accuracy Estimates

One concern still remaining with this methodology is that it is crucially dependent on ensuring that the accuracy which is assumed to be met is indeed achieved.

The benefits of metamodelling - if accurate - to the designers are clear, and include:

• integration across different teams and organizations;
• reduction in design cycle effort and quick tradeoffs for evaluation;
• visibility and transparency;
• enabling of the day-to-day use of probability methods;
• allowing for parametric design definitions.

However, each of these points is dependent on a functional accuracy assessment, and the question becomes: how do I know whether the accuracy I am assuming is indeed being provided? Also, the accuracy measure itself may be based on assumptions.

Most of the commonly employed metamodelling techniques require validation cases to be run in order to ensure that this is the case, as the method itself does not usually provide a way of assessing the confidence in the prediction and in the data set provided.

It would therefore be beneficial to have an alternative method that can provide these estimates, based on certain assumptions. One could attempt to prove that Bayesian regression, implemented through the Gaussian Process prediction method, is capable of doing so.

7.3 Engineering Simulation Metamodels

As previously mentioned, the idea of metamodelling as used in engineering design today is derived mainly from a regression and function approximation approach. This was guided foremost by the desire to have fast, easily created and reliable models upon which to base more involved studies. The major application of, and need for, such models arose directly out of design tasks and higher-level approaches such as simulation and optimization.

A first, relatively early survey specifically on metamodels, addressing the nature of and intent behind creating them, the requirements imposed, and some examples of how they can be employed, was given by Friedman (1996). He introduced the simulation aspect of metamodels and how this gave rise to the emphasis on reliable predictions and demanded accuracy. Later reprises of the idea have often gone to the specific application of a certain technique, usually within an embedding larger application such as a particular optimization approach or a certain framework conceived to facilitate the synthesis aspect of design (Mavris et al., 1996).

Research publications on metamodelling often focus on specific techniques, sampling techniques and the Design of Experiments (DoE) (Sacks et al., 1989; Mavris et al., 1998; Simpson et al., 2001).

The use of approximation models based on statistical techniques to replace expensive computer analyses is the preferable strategy to avoid the computational barrier to the application of modern CAD/CAE tools in conceptual design selection and preliminary design optimization. Statistical techniques are widely used in multidisciplinary design to construct approximations of expensive computer analyses; they are much more efficient to run, easier to integrate, and yield insight into the functional relationship between design variables and performance responses. As they replace the expensive simulation models during the concept design and optimization of subsystems, such approximation models are often referred to as metamodels, since they provide a 'model of the model' (Kleijnen, 1987).

A variety of metamodelling techniques have been introduced in the last decades; among them the response surface methodology (RSM). The RSM stems from science disciplines in which physical experiments are performed to study the unknown relation between a set of variables and the system output or response. These relations are then modelled using a mathematical model, called a response surface. Optimization based on the response surface methodology is referred to as experimental optimization.

The creation of response surface metamodels to approximate detailed computer analysis codes is particularly appropriate in the conceptual design stage, when comprehensive trade-offs of multiple performance and economic objectives are critical. Through metamodelling, approximations of these codes are built that are orders of magnitude cheaper to run. These metamodels can then be linked to optimization routines for fast analysis, or they can serve as a bridge for integrating analysis codes across multiple disciplines. The simulation community has used metamodels to study the behavior of computer simulations for over twenty-five years. The most popular techniques have been based on parametric polynomial response surface approximations. With metamodels, designers are interested not only in predicting a response at a new design setting, but also in gaining insight into the relationship between a response and the input variables.

A simulation model input–output function is mathematically represented as

y = g(v) (7.4)

where y and v are vector valued, and will usually include random components. For example, the vector v for a seakeeping simulation might include the main geometrical characteristics, whereas the vector y might include acceleration values in different sea states.

Metamodels are typically developed separately for each component of y, that is, for each coordinate function of g. For the time being, attention will be restricted to models where: (i) y has one component, (ii) the random component, if present, is additive, and (iii) the list of variables is restricted to those that will be in the argument list of the metamodel

y = g(x) + ε (7.5)

The metamodelling task involves finding ways to model g and ways to model \varepsilon. The metamodel will be denoted as f and the predicted output response as f(x) or \hat{y}; that is,

g(x) \simeq f(x) = \hat{y} \qquad (7.6)

The major issues in metamodelling include: (i) the choice of a functional form for f; (ii) the design of experiments, i.e., the selection of a set of x points at which to evaluate y (run the full model) in order to adjust the fit of f to g, the assignment of random number streams, the length of runs, etc.; and (iii) the assessment of the adequacy of the fitted metamodel (confidence intervals, hypothesis tests, and other model diagnostics). The functional form will generally be described as a linear combination of basis functions from a parametric family. So there are choices for families (e.g. polynomials, sine functions, piecewise polynomials, etc.) and choices for the way to pick the 'best' representation from within a family (e.g. least squares, maximum likelihood, cross validation, etc.).

The issues of design of experiments and metamodel assessment are related, since the selection of an experiment design will be determined in part by its effect on assessment issues. The most promising metamodel and experiment design strategies should be chosen according to the simulated phenomenon.

The most popular techniques for constructing f have been based on parametric polynomial response surface approximations. Besides traditional response surface methodology, a number of alternative approximation techniques may be used as metamodels:

• response surface metamodels;

• spline metamodels;

• radial basis function metamodels;

• artificial neural networks;

• kernel smoothing methods;

• spatial correlation metamodels;

• frequency-domain basis functions;

• kriging method.

All the metamodels used herein are based on a finite number of predetermined analyses, analogous to the design of experiments (DoE) or Taguchi methods. Developments for both response surface models and non-traditional models provide increased efficiency and applicability for these methods. In particular, recent work in the areas of spatial correlation and radial basis functions has clarified the importance of experimental design for non-traditional models. While the fitting capability of these alternative methods is exciting, a more extensive computational comparison of the methods is needed, but this will have to wait for more generally available computer codes for the newer metamodelling methods.

An important issue related to metamodelling is how to achieve good accuracy of a metamodel with a reasonable number of sample points. While the accuracy of a metamodel is directly related to the metamodelling technique used and to the properties of the problem itself, the type of sampling approach also has a direct influence on the performance of a metamodel. It is generally believed that space-filling designs, e.g. Latin Hypercube design (McKay et al., 1979), Orthogonal Array (Owen, 1992), Minimax and Maximin designs (Johnson et al., 1990), Entropy design (Currin et al., 1991), etc., are preferable for metamodelling. In addition to the type of sampling approach, equally important are the sample size and how the sample points are generated.
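
A minimal sketch of a basic Latin Hypercube sample on the unit hypercube, assuming numpy (the function name is illustrative); the points would then be rescaled to the actual variable ranges:

import numpy as np

def latin_hypercube(n_samples, n_vars, rng=None):
    """Basic Latin Hypercube sample on [0, 1]^n_vars: each variable's range is
    split into n_samples equal strata, one point is drawn in each stratum, and
    the strata are randomly paired across variables."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=(n_samples, n_vars))                      # position inside each stratum
    perm = np.array([rng.permutation(n_samples) for _ in range(n_vars)]).T
    return (perm + u) / n_samples

# usage: 20 points in a 3-variable design space
# pts = latin_hypercube(20, 3)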

Typically, sample points are generated all at once, that is, in one stage. However, when only a limited sample size is affordable, sequentially selecting one (or several) sample point(s) at a time allows the sample size to be determined as data accumulate; in other words, it allows the sampling process to be stopped as soon as there is sufficient information (Sacks et al., 1989). Based on a sequential sampling strategy, a metamodel is updated sequentially with new sample points and the associated response evaluations. Sequential sampling strategies that can take advantage of the information gathered from the existing metamodel can be very helpful. Fully sequential design is, therefore, the most natural for computer experiments.

Sequential sampling can be used for both global metamodelling and metamodel-based design optimization. Sequential sampling for global metamodelling focuses on sequentially improving the accuracy of a metamodel over the entire design space of interest; sequential sampling for metamodel-based design optimization emphasizes finding the global optimum by balancing a search for the optimal point of the metamodel against a search of unexplored regions, to avoid missing promising areas due to the inaccuracy of the metamodel.

Sequential sampling is recommended for global metamodelling (Jin et al., 2002). The major issue in sequential sampling is how to choose new sample points based on the locations of the available sample points and on the information gathered from the created (updated) metamodel. To select new sample points, optimality criteria such as the maximum entropy criterion and the maximin distance criterion are often used. Regarding the utilization of information from an earlier created model, Sacks et al. (1989) classified sequential sampling approaches into sequential approaches with adaptation to the existing metamodel and those without adaptation. One immediate question is: how effective are the sequential sampling approaches compared to a one-stage approach? Clearly, any sequential approach without adaptation will generally be less efficient than a one-stage optimal sampling approach, since such approaches simply split the optimization of the one-stage approach into a sequence of small optimizations, and their results (sample points) are only sub-optimal with respect to the one-stage approach. On the other hand, if the information gathered from an existing metamodel is dependable and usable, sequential approaches with adaptation may be superior to one-stage approaches.

Even though most of the existing works provide good details about a particular type of sequential sampling strategy, the effectiveness of a sequential sampling strategy when used for a variety of applications is not clear. There is also a lack of comparison between the various sequential sampling approaches. Further, in the literature on computer experiments, sequential sampling approaches with adaptation are mainly developed for the Kriging method. They typically make use of the model structure information available from a Kriging model. These techniques are no longer applicable when metamodelling techniques other than Kriging are used. It is necessary, therefore, to investigate the general applicability of sequential sampling for creating global metamodels and to develop new sequential sampling approaches which are not limited to Kriging metamodels.

7.4 Space Filling Designs

7.4.1 Response Surface Models

This methodology was originally intended to aid in optimization problems by specifically assessing the region of interest within a larger space, sampling over this region via DoE and then approximating it with a linear or quadratic polynomial function to assess where to step next. RSM's efficient use of potentially expensive function calls lends itself to application in engineering design.

For the purpose of applying it as a metamodelling technique in engineering systems design, the method was slightly abstracted in order to be more of a predictive model than an optimization tool. This allowed the creation of the first metamodelling process of its kind, complete from model assumption through sampling to a least squares regression fit.

Response surface models based on computer experiments are called surrogates. A large number of successful design examples using surrogates are available in the literature. Applications of surrogates have also been widely studied in multidisciplinary design optimization. For instance, Renaud and Gabriele developed response surface approximations of multidisciplinary systems during concurrent subspace optimization; Chen et al. (1996) investigated the use of response surface approximations for robust concept exploration and concurrent design; Korngold and Gabriele addressed discrete multidisciplinary problems using RSM.

When employing surrogates in design optimization, the number of expensive computer experiments should be minimized to reduce computation cost, while ensuring improvements in the actual system through each step of the search. Surrogate-related methods can be roughly categorized into two groups. The first group focuses on the better choice of an experiment planning scheme and of a response surface model. The second group concentrates on the management of the approximation model in the optimization process, to ensure a design optimum for the actual problem.

Standard DoE methods, such as factorial design and central composite design, are often used in experiment planning. Other schemes include the method of Taguchi (1993), D-optimal design and Latin Hypercube designs. Sacks et al. (1989) proposed a stochastic model to treat the deterministic computer response as a realization of a random function with respect to the actual system response. Neural networks have also been applied to generate the response surfaces for system approximation.

The standard response surface methodology first employs an experimental strategy to generate design points in the design space; then it applies either a first-order model or a second-order model to approximate the unknown system by means of a polynomial regression.

Response surface methods have been used effectively for over thirty years as metamodels. These methods are the topic of entire texts (Box and Draper, 1987; Myers, 1976). Polynomial regression models were developed for the 'exploitation' of response surfaces, that is, for optimization. This approach fits first- or second-order polynomial models to y, the system response. The model has the form of equation (7.6), with y a scalar and \varepsilon a scalar, although these quantities are often viewed as vectors by considering multiple input data simultaneously.

Polynomial Regression

The response surface methodology based on polynomial regression has been widely applied in the engineering field because of its ease of implementation and interpretation (Chen et al., 1996; Simpson et al., 1997). To fit a reasonable metamodel by polynomial regression, the sample size should be at least two or three times the number of model coefficients. For a problem with n input variables, there are (n + 1)(n + 2)/2 coefficients for a quadratic polynomial model, (n + 1)(n + 2)(n + 3)/6 coefficients for a cubic polynomial model, etc. It should be noted, however, that as the number of input variables increases, cubic or higher-order polynomial models can easily become unaffordable.

The metamodels produced by response surface equations are continuous and have continuous derivatives. Once estimated, the only information required to use a response surface equation as a metamodel is the set of polynomial coefficients. The general form of a quadratic response surface equation is given in equation (7.7), where y is the dependent response, the x's represent the independent design variables, and the \beta's represent the polynomial coefficients.

y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \sum_{j \geq i}^{n} \beta_{ij}\, x_i x_j \qquad (7.7)

When creating polynomial regression models, it is possible to identify the significance of the different design factors directly from the coefficients of the normalized regression model. For problems of large dimension, it is important to use linear or second-order polynomial models to narrow the design variables down to the most critical ones. In optimization, the smoothing capability of polynomial regression allows quick convergence on noisy functions.

In spite of these advantages, there is always a drawback when applying polynomial regression to model highly nonlinear behaviors. Higher-order polynomials can be used; however, instabilities may arise (Burton, 1992), or it may be too difficult to take sufficient sample data to estimate all of the coefficients in the polynomial equation, particularly in large dimensions.
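
A minimal sketch, assuming numpy and an illustrative function name, of how the quadratic basis of equation (7.7) can be assembled for an arbitrary number of input variables before the least-squares fit:

import numpy as np
from itertools import combinations_with_replacement

def quadratic_basis(X):
    """Model matrix for the quadratic response surface of Eq. (7.7): a constant
    column, the n linear terms x_i, and all terms x_i*x_j with j >= i, giving
    (n + 1)(n + 2)/2 columns for n input variables."""
    X = np.atleast_2d(X)
    n = X.shape[1]
    cols = [np.ones(X.shape[0])]
    cols += [X[:, i] for i in range(n)]
    cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(n), 2)]
    return np.column_stack(cols)

# least-squares estimate of the beta coefficients (the sample size should be
# two to three times the number of columns, as noted above)
# beta, *_ = np.linalg.lstsq(quadratic_basis(X), y, rcond=None)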

Mathematical Form for Response Surface Models

Let y = (y_1, \ldots, y_n) represent a set of (univariate) outputs of the simulation model run under input variables x_1, \ldots, x_n, respectively. The \varepsilon_i for the multiple observations are assumed to be independent, identically distributed Gaussian quantities with variance \sigma^2. The basis functions are usually taken as products of the power functions 1, x_j, x_j^2, giving

y = f(x) = \sum_k \beta_k\, p_k(x) \qquad (7.8)

where p_k(x) is a product of univariate power functions, such as x_1, x_1^2, \ldots, x_2^2 x_3, etc. Alternatively, the basis may be orthogonal polynomials, \varphi_k(x), providing the same polynomial for f but a different representation:

y = f(x) = \sum_k \alpha_k\, \varphi_k(x) \qquad (7.9)

The coefficients \beta_k or \alpha_k are estimated from the observed data points (x_i, y_i), i = 1, \ldots, n, via least squares or maximum likelihood estimation, which are identical procedures for Gaussian errors. The resulting estimates can be thought of as random quantities that depend on the random observations. The advantage of equation (7.9) over equation (7.8) is that the coefficient estimates for the \alpha_k's will be uncorrelated and will be robust to small changes in the observed data.

7.4.2 Spline Metamodels

Any polynomial approximation represented by equation (7.8) can be constructed from linear combinations of the functions \prod_k x_{j_k}, where the product is over k, and the index j_k may take the same value more than once.

A high-order polynomial achieves a good fit by adjusting coefficients to achieve cancellation of large oscillations over most of the range. This reliance on cancellation makes high-order polynomial fits non-robust. If a quadratic approximation to the function is adequate, then global polynomial basis functions can be used to build the approximating metamodel. If a more accurate representation is needed, the simulation modeler should consider other basis functions from which to build the metamodel.

Spline models have been used widely for approximation of deterministic simulation responses. Myers (1996) describes the use of splines for linking submodels for system-level design.

Mathematical Form for Spline Models

The difficulties with polynomial basis functions are avoided if they are applied to a small region and only low-order polynomials are used. This is the motivation for metamodels based on piecewise polynomial basis functions. When continuity restrictions are applied to adjacent pieces, the piecewise polynomials are called splines. The (univariate) metamodel can be written as

y = f(x) = \sum_j c_j\, B_j(x) \qquad (7.10)

where the B_j are the quadratic or cubic piecewise polynomial basis functions. The basis functions can be described most simply for the univariate case. The domain is divided into intervals [t_1, t_2], [t_2, t_3], \ldots, [t_{n-1}, t_n], whose endpoints are called knots. Two sets of spline basis functions are commonly used, e.g. the truncated power function basis and the B-spline basis (deBoor, 1978).

Since most simulation model output functions will not be deterministic, interpolating splines will not be satisfactory. The motivation for smoothing splines is based on an explicit trade-off between the fit accuracy of the approximation at known points and the smoothness of the resulting metamodel. The fit term is represented as a sum of squared differences between the metamodel and the simulation model responses at each of the experimental runs.

The smoothing spline functions arise as solutions to the following optimization problem, where the relative importance of fit versus smoothness is controlled by the smoothing parameter \lambda:

\min_{f_\lambda \in C^{k-2}} \; \sum_i \left(y_i - f_\lambda(x_i)\right)^2 + \lambda \int \left(f_\lambda^{(k-2)}\right)^2 dx \qquad (7.11)

If the smoothing parameter is \lambda = 0, the solution provides interpolation with no constraint on smoothness. The function that minimizes the quantity (7.11) will be a spline of order k, which is in C^{k-2} (continuous derivatives up to the (k-2)th derivative) and is a piecewise polynomial with terms up to x^{k-1}. The knots will occur at points in x corresponding to the observed data, x_j.

An important issue is the selection of the value of the smoothing parameter \lambda. The value may be chosen by visual examination of the fit, by minimizing the cross validation score (a residual sum of squares), or by generalized cross validation (an adjusted residual sum of squares).
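
As an illustration of the fit-versus-smoothness trade-off (see the list of spline classes below), the sketch here uses scipy's UnivariateSpline; note that this routine controls smoothness through a residual-based factor s rather than directly through the \lambda of (7.11), so it is only a stand-in for the formulation above, and the data and parameter values are illustrative.

import numpy as np
from scipy.interpolate import UnivariateSpline

# noisy responses from a hypothetical one-variable simulation
x = np.linspace(0.0, 1.0, 25)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).normal(size=x.size)

# s = 0 forces interpolation; larger s trades fit accuracy for smoothness,
# playing the role of the smoothing parameter in the trade-off of Eq. (7.11)
interp = UnivariateSpline(x, y, k=3, s=0.0)
smooth = UnivariateSpline(x, y, k=3, s=0.2)

x_new = np.linspace(0.0, 1.0, 200)
y_smooth = smooth(x_new)   # metamodel predictions at unsampled points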

Three classes of spline metamodels can be described as solutions to special cases of this smoothness vs. fit trade-off (7.11):

• Smoothing splines: k is chosen by the user; knots are not pre-specified, but they will occur at the x_j values in the optimal solution (i.e., t_j = x_j); \lambda can be chosen based on the user's preference or by generalized cross validation.

• Spline interpolation: k is chosen by the user; knots are not pre-specified, but they will occur at the x_j values in the optimal solution; \lambda = 0.

• Regression splines: k is chosen by the user; knots are chosen by the user, preferably near local maxima/minima and inflection points; \lambda = 0.

Multivariate Splines

Tensor products of univariate splines can be used for multivariate metamodels (deBoor, 1978). Tensor product approximation requires a full factorial experiment design to estimate the parameters of the metamodel. Univariate splines are fit for each factor, for each level of every other factor. There is no requirement for equal numbers of levels across all design factors, nor for equal spacing within one factor. Because tensor product splines require many experimental runs on a complete rectangular grid, and because there are numerical difficulties in calculating the spline coefficients for metamodels with many input parameters, several alternative multivariate spline models have been proposed. The first interaction splines were presented by Wahba (1986). These models are linear combinations of products of at most two univariate splines.

The Multivariate Adaptive Regression Spline (MARS) model (Friedman, 1991) adaptively selects a set of basis functions for approximating the response function through a forward/backward iterative approach. These models use a stepwise procedure to recursively partition the simulation input variable space. The univariate product degree and the knot sequences are determined in a stepwise fashion based on the GCV score. The MARS model uses truncated power basis functions, which are not as numerically robust as B-splines.

A MARS model can be written as

y = f(x) = \sum_{m=1}^{M} a_m\, B_m(x) \qquad (7.12)

where a_m is the coefficient of the expansion and the B_m, the basis functions, can be represented as

B_m(x) = \prod_{k=1}^{K_m} \left[ s_{k,m} \left( x_{\nu_{k,m}} - t_{k,m} \right) \right]_{+}^{q} \qquad (7.13)

In equation (7.13), K_m is the number of factors (interaction order) in the m-th basis function, s_{k,m} = \pm 1, x_{\nu_{k,m}} is the \nu-th variable (1 \leq \nu_{k,m} \leq n), and t_{k,m} is a knot location on each of the corresponding variables. The subscript '+' means the function is a truncated power function

\left[ s_{k,m} \left( x_{\nu_{k,m}} - t_{k,m} \right) \right]_{+}^{q} =
\begin{cases}
\left[ s_{k,m} \left( x_{\nu_{k,m}} - t_{k,m} \right) \right]^{q} & \text{if } s_{k,m} \left( x_{\nu_{k,m}} - t_{k,m} \right) > 0 \\
0 & \text{otherwise}
\end{cases} \qquad (7.14)
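
A minimal sketch, assuming numpy and illustrative argument names, of a single MARS basis function built from the truncated power terms of equations (7.13)-(7.14):

import numpy as np

def mars_basis(x, knots, signs, variables, q=1):
    """One MARS basis function B_m(x): the product over k of the truncated
    power terms [s_k (x_{v_k} - t_k)]_+^q. `x` is a (n_points, n_vars) array;
    knots, signs and variables play the role of t_{k,m}, s_{k,m} and v_{k,m}."""
    x = np.atleast_2d(x)
    b = np.ones(x.shape[0])
    for t, s, v in zip(knots, signs, variables):
        h = s * (x[:, v] - t)
        b *= np.where(h > 0.0, h**q, 0.0)   # truncated power term, Eq. (7.14)
    return b

# example: a two-factor (K_m = 2) basis function with knots 0.3 and 0.7
# B = mars_basis(points, knots=[0.3, 0.7], signs=[+1, -1], variables=[0, 1])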

The Π model (Breiman, 1991) also uses a stepwise procedure for selecting a linear combination of products of univariate spline functions to be included in the metamodel. This method begins with a large number of knots for each variable, and uses a forward stepwise procedure based on the GCV score to select terms for the product, and to select the number of products. The backwards elimination step is also based on the GCV, and is used to delete knots (or, equivalently, univariate basis elements).

Optimal Knot Distribution for Regression Splines

Dyn and Yad-Shalom (1991) consider the optimal distribution of knots for tensor product spline approximations. They find distributions for each axis that are asymptotically optimal as the number of knots approaches infinity. For thin plate regression splines, the knots need not occur at data points. McMahon and Franke (1992) select knot locations to minimize the sum of squared distances from each data point to the nearest knot point.

This knot selection method can also be applied to the location of multiquadrics for radial basis function approximation.

7.4.3 Kernel Smoothing Metamodels

All of the estimation methods described above produce predicted values, f(x), that are linear functions of the observed y_i values, with coefficients determined by the basis functions and their coefficients. The kernel smoothing metamodel uses this representation explicitly, without developing an explicit representation for f in terms of basis functions. A value f(x) is computed directly as a weighted sum of the observed y_i values, where the weights are determined by a kernel function.

Mathematical Form for Kernel Models

There are many forms that this weighting or kernel function may take, and there are several ways to use the weighting function to calculate f(x). To simplify the discussion, kernel smoothing in the setting of a single design parameter, i.e. x = x, is discussed first. Only one way of using the weighting function to compute f(x) is presented, the Nadaraya-Watson formula, because it is popular and easy to understand, and it reduces the bias of the kernel metamodel near the borders of the region over which model outputs have been computed. Details on other kernels and kernel smoothing formulas are given in Eubank (1988) and Hardle (1990).

Given a set of completed simulation runs with data (x_i, y_i), the Nadaraya-Watson formula for the metamodel is

f_\lambda(x) = \frac{\sum_{i=1}^{n} w[(x - x_i)/\lambda]\; y_i}{\sum_{i=1}^{n} w[(x - x_i)/\lambda]} \qquad (7.15)

where w(\cdot) is the kernel function.

Common choices for the kernel include

uniform       w(u) = 1/2                   -1 \leq u \leq 1
triangular    w(u) = 1 - |u|               -1 \leq u \leq 1
quadratic     w(u) = (3/4)(1 - u^2)        -1 \leq u \leq 1
quartic       w(u) = (15/16)(1 - u^2)^2    -1 \leq u \leq 1

The approximation formula also depends on a smoothing parameter \lambda, which controls the size of the neighborhood over which the y values are averaged. When \lambda is small, few points will be included in the range of u, producing a nonsmooth metamodel f(x). When \lambda is large, many points are included in the weighted average, and f(x) will be a slowly varying function, with greater bias.
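
A minimal sketch of the Nadaraya-Watson estimator of equation (7.15) with the quartic kernel, assuming numpy (function and variable names are illustrative):

import numpy as np

def quartic_kernel(u):
    """Quartic kernel w(u) = (15/16)(1 - u^2)^2 on -1 <= u <= 1, zero outside."""
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2)**2, 0.0)

def nadaraya_watson(x_new, x_obs, y_obs, lam, kernel=quartic_kernel):
    """Kernel-weighted average of the observed responses, Eq. (7.15); the
    bandwidth lam controls the neighbourhood size (it should be large enough
    that every prediction point has at least one observation within range)."""
    x_new = np.atleast_1d(x_new)
    w = kernel((x_new[:, None] - np.asarray(x_obs)[None, :]) / lam)
    return (w @ np.asarray(y_obs)) / w.sum(axis=1)

# small lam: few points in each average (rough fit); large lam: smoother, more bias
# f_hat = nadaraya_watson(grid, x_runs, y_runs, lam=0.15)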

The natural extension of equation (7.15) to the multivariate case would replace (x - x_i)/\lambda with ||x - x_i||/\lambda (|| \cdot || is the Euclidean norm). This form is symmetric about x_i. As a consequence, asymmetric boundary modifications of the kernel are not possible. Furthermore, individual \lambda_j are not possible.

Instead, w[(x - x_i)/\lambda] is replaced by the product \prod_{j=1}^{d} w[(x_j - x_{ij})/\lambda_j], giving

f_\lambda(x) = \frac{\sum_{i=1}^{n} \left\{ \prod_{j=1}^{d} w[(x_j - x_{ij})/\lambda_j] \right\} y_i}{\sum_{i=1}^{n} \prod_{j=1}^{d} w[(x_j - x_{ij})/\lambda_j]} \qquad (7.16)

The value of the smoothing parameter \lambda affects both smoothness and bias, and so must be chosen to balance these properties of the fitted metamodel. The method of least squares might be applied to choose the value of \lambda. However, \lim_{\lambda \to 0} f_\lambda(x_i) = y_i, so that least squares will drive the choice of \lambda to zero. An alternative that eliminates this behavior is to leave (x_i, y_i) out of the metamodel when calculating the difference between the metamodel and y_i. The cross validation estimate of \lambda minimizes this quantity. Wahba (1990) proposed another technique for choosing \lambda, called generalized cross validation (GCV), which includes an adjustment to the sum of squared residuals.

7.4.4 Spatial Correlation Metamodels

Sacks et al. (1989) developed a spatial correlation parametric regression modelling approach that shares some common features with spline smoothing, since the expected smoothness of the function is captured in a spatial correlation function. Spatial correlation models, also called Kriging models, have recently become popular for deterministic simulation metamodels (Simpson et al., 1998).

Mathematical Form for Spatial Models

The model assumption is

y(x) = g(x) + Z(x) (7.17)

where Z is assumed to be a Gaussian stochastic process with spatial correlation function

Cov\left[Z(u), Z(v)\right] = R(u, v) = \exp\left[-\sum_j w_j (u_j - v_j)^p\right] \qquad (7.18)

The value of p is sometimes fixed at 2, and g(x) is usually approximated by a constant or by a linear function of x. The values w_j are estimated by maximum likelihood, and are used to calculate approximate expected values of (7.18) to provide the metamodel f(x). This metamodel family has been used to model deterministic simulation models, but Sacks et al. (1989) suggest the addition of a stochastic term for nondeterministic simulation metamodelling.
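
A minimal sketch of a constant-trend (ordinary) Kriging predictor built on the correlation function (7.18), assuming numpy and taking the weights w_j as given rather than estimated by maximum likelihood (function names are illustrative):

import numpy as np

def kriging_fit(X, y, w, p=2.0, nugget=1e-10):
    """Constant-trend Kriging with R(u, v) = exp(-sum_j w_j |u_j - v_j|^p);
    returns the generalized least-squares mean and the weight vector R^-1 (y - mu)."""
    d = np.abs(X[:, None, :] - X[None, :, :])              # pairwise coordinate distances
    R = np.exp(-np.sum(w * d**p, axis=2)) + nugget * np.eye(len(X))
    Rinv = np.linalg.inv(R)
    ones = np.ones(len(X))
    mu = (ones @ Rinv @ y) / (ones @ Rinv @ ones)
    return mu, Rinv @ (y - mu)

def kriging_predict(x_new, X, w, mu, alpha, p=2.0):
    """Predictor f(x) = mu + r(x)^T R^-1 (y - mu 1), which interpolates the runs."""
    r = np.exp(-np.sum(w * np.abs(x_new - X)**p, axis=1))  # correlations with the runs
    return mu + r @ alpha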

Design of Experiments for Spatial Correlation Metamodels

Currin et al. (1991) discussed the design of simulation experiments for estimating the p and w_j parameters in (7.18). Factorial designs are not appropriate for fitting these parameters. In the case of a factorial design on r factors, if there are fewer than r factors active in the model, the design will effectively be projected onto the remaining factors, giving duplicate points. For the spatial correlation model this leads to difficulties: the covariance matrix R will not be of full rank, and it will be impossible to maximize the likelihood function. Latin hypercube designs avoid this problem, but often provide poor coverage of the space. Sacks et al. (1989) consider initial Latin hypercube experiment designs followed by the sequential addition of points to minimize the mean squared error integrated over the region of interest.

The spatial correlation model provides a very good fit with relatively small designs. Orthogonalarrays of strength r (Owen, 1992) are an attractive class of sparse designs because they providebalanced (full factorial) designs for any projection onto r factors.

An alternative approach to improve on the coverage of Latin hypercube designs is proposed by Handcock (1992). It is a hierarchical design, in which the design space is first subdivided into regions to maintain balance, and sub-designs are constructed for a subset of the regions.

Morris and Mitchell (1993) expand the spatial correlation model to consider the case where function and derivative information is available. They considered an example with Latin hypercube, D-optimal, and two hybrid design procedures constructed to have the properties of both Latin hypercube designs and D-optimality. One of the hybrid methods provided the smallest prediction error.

7.4.5 Frequency Domain Metamodels

Viewing variations of g over its domain in terms of spatial correlation leads naturally to the idea of Fourier basis functions for representing an approximation to g in (7.4). While such an approach is possible, it is prone to difficulties because the Fourier decomposition is based on basis functions with global support. Close approximation of g by a metamodel using a Fourier basis depends heavily on cancellation to achieve the desired form, which may result in a lack of robustness. This is less of an issue when modelling dynamic phenomena. Schruben and Cogliano (1987) have used Fourier decomposition to determine steady state input/output structure by deliberately varying input parameters sinusoidally.

For static metamodels, wavelet basis functions provide a decomposition in both location and frequency, providing local rather than global basis functions. The wavelet basis elements have finite support, and are adjusted by dilatation factors to achieve a good fit (Daubechies, 1988).

7.4.6 Artificial Neural Networks

Neural networks can be thought of as flexible parallel computing devices for producing responses that are complex functions of multivariate input information. They can approximate arbitrary smooth functions and can be fitted using noisy response values. Neural networks are networks of numerical processors, whose inputs and outputs are linked according to specific topologies. For a brief overview of neural networks, see Wilson and Sharda (1992). Networks used for function approximation are typically multi-layer feed-forward networks. Feed-forward layered networks have the flexibility to approximate smooth functions arbitrarily well, provided sufficient nodes and layers. This follows from the work of Kolmogorov (1941), whose results imply that any continuous function f : R^n → R can be exactly reproduced over a compact subset by a three-layer feed-forward network. While there are some approximation schemes using three layers, most approximations use a two-layer network structure, with a single output node for models having a univariate dependent variable. The overall metamodel is then a linear combination of linear or nonlinear functions of the argument vector x.
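As a hedged illustration of such a two-layer (single hidden layer) feed-forward metamodel, the following sketch fits a small network by plain batch gradient descent on the residual sum of squares; the sigmoid transfer function, network size, learning rate and test function are assumptions made only for this example and are not prescribed by the text.

```python
import numpy as np

def init_net(n_in, n_hidden, rng):
    """Random weights for a single-hidden-layer feed-forward network."""
    return {"W1": rng.standard_normal((n_hidden, n_in)) * 0.5,
            "b1": np.zeros(n_hidden),
            "w2": rng.standard_normal(n_hidden) * 0.5,
            "b2": 0.0}

def predict(net, X):
    """Metamodel output: a linear combination of sigmoid hidden units."""
    h = 1.0 / (1.0 + np.exp(-(X @ net["W1"].T + net["b1"])))
    return h @ net["w2"] + net["b2"]

def train(net, X, y, lr=0.1, epochs=5000):
    """Plain batch gradient descent on the residual sum of squares."""
    for _ in range(epochs):
        h = 1.0 / (1.0 + np.exp(-(X @ net["W1"].T + net["b1"])))
        r = h @ net["w2"] + net["b2"] - y               # residuals
        grad_w2 = h.T @ r / len(y)
        grad_b2 = r.mean()
        dh = np.outer(r, net["w2"]) * h * (1 - h)       # back-propagated signal
        net["w2"] -= lr * grad_w2
        net["b2"] -= lr * grad_b2
        net["W1"] -= lr * dh.T @ X / len(y)
        net["b1"] -= lr * dh.mean(axis=0)
    return net

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (80, 2))
y = np.sin(np.pi * X[:, 0]) * X[:, 1]                   # response to approximate
net = train(init_net(2, 8, rng), X, y)
print("max abs training error:", np.abs(predict(net, X) - y).max())
```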

Strictly speaking, neural networks are assumed to use threshold functions. It is useful, however, to allow more general functions and to think of neural networks as a technique for computing metamodel coefficients and predicted values rather than as representing a particular class of modelling techniques. All of the metamodels discussed in this chapter can be implemented using a neural network structure. Network architecture can be designed using a genetic algorithm with a selection procedure and crossing/mutation operators that introduce some diversity to the algorithm.

The fitness function used in the genetic algorithm has two objectives: to minimize the residual sum of squares, and to reduce the number of unnecessary variables. The second term penalizes more complex models with a large number of independent variables as compared with less complex models offering the same generalization capacity. Some techniques, collectively termed 'pruning', reduce the network size by modifying not only the connection parameters (weights) but also the network structure during training, beginning with a network design with an excessive number of nodes and gradually eliminating the unnecessary nodes or connections.

In the past, considerable effort has been devoted to Artificial Intelligence (AI) strategies and their development. Among the most noticeable and high-profile ones are Artificial Neural Networks (ANNs), which exhibit a wide range of applications. Their use as classification and especially regression tools also makes them conceptually suitable as metamodelling techniques.

ANNs are inspired by the way the brain functions and basically consist of a large number of very simple processing units, called neurons, which are interconnected in a very specific way. The strengths of the connections between the neurons are referred to as weights, and these are what is typically adjusted when an NN learns. The reason they are computationally so powerful - and even provide a universal approximator - is that they combine logical sequences with parallel operations (Bishop, 1995).

Apart from the craze and hype that surrounded AI during the 90s and before, there have been a great number of very successful applications of NNs in a variety of fields. To understand where these applications fit in, it is necessary to discern two fundamentally different kinds of ANNs which perform different tasks: one is driven by the supervised learning scenario, the other by unsupervised learning. Suffice it to say at this point that the unsupervised learning scenario applies more to 'learning as you go', whereas in supervised learning there is a distinct training phase. During this training phase the network connection strengths, the 'weights', are adjusted via learning rules according to known data set values. This is followed by an application phase, where the ANN performs its originally intended purpose (Ripley, 1996).

There are several other distinctions of neural networks, and more than thirty different architectures have been suggested and applied. Different aspects include the dynamics within a network, the nature of the weights, learning rules and training methods, as well as different transfer functions within the neurons themselves.

The problems that are presented to NNs are manifold, and range from sensory and classification tasks over control applications to regression. While many control applications, which have also taken place successfully in aerospace engineering, refer to unsupervised learning scenarios, the classification and regression problems often call for supervised learning. Since this aspect influences the type of neural network and its application quite a bit, focus here is placed on supervised learning NNs only, as the interest in metamodelling essentially amounts to a regression task. When an appropriate NN is applied in a function approximation manner, this can actually be thought of as nonlinear regression, with the network weights as the adjustable parameters (Gibbs, 1997).

7.4.7 Neuro–Fuzzy Methods

Neural learning in fuzzy systems is most often implemented by learning techniques derived from neural networks. The term neuro-fuzzy system (also neuro-fuzzy methods or models) refers to combinations of neural networks and fuzzy systems. This combination does not usually mean that a neural network and a fuzzy system are used together in some way. A neuro-fuzzy method is rather a way to create a fuzzy system from data by some kind of (heuristic) learning method that is motivated by learning procedures used in neural networks. Equivalent terms for neuro-fuzzy that can be found in the literature are neural fuzzy or sometimes fuzzy networks. We distinguish these terms from fuzzy neural networks or fuzzy-neuro approaches, which denote fuzzified neural networks, i.e. neural networks that use fuzzy sets instead of real numbers as weights, activations, inputs and outputs (Buckley and Eslami, 1996; Feuring and Lippe, 1996). Fuzzy neural networks are black box function approximators that map fuzzy inputs to fuzzy outputs. They are obtained by applying Zadeh's extension principle to a normal neural network architecture.

Neuro-fuzzy methods are discussed here for data analysis only. We consider data analysis as a process that is exploratory to some extent, as in intelligent data analysis (Berthold and Hand, 1999) or data mining (Fayyad et al., 1996). If a fuzzy model is to be created in such a scenario, it is important to have learning algorithms available that support the exploratory nature of the data analysis process and can be used to create fuzzy systems from data. The algorithms are especially examined for their capability to produce interpretable fuzzy systems. It is important that during learning the main advantages of a fuzzy system - its simplicity and interpretability - do not get lost. The algorithms are presented in a way that they can readily be used for implementations. Intelligent data analysis and knowledge discovery in databases are areas where neuro-fuzzy systems can be applied with great benefit.

7.4.8 Kriging Method

Another widely applied method is that of Kriging. It was named after its inventor and originally developed within the field of geostatistics, and thus was traditionally limited to 2-D or 3-D problems, but it has since been expanded to more dimensions and applied with some success in a variety of fields.

In the literature, the method itself has been termed anything from an 'extended linear regression approach', over a 'method dominated by local behavior', to one that is essentially the same as the Gaussian Process based methods. While all these expressions seem particularly confusing at first glance, the secret to understanding may lie in the hybrid nature of the technique itself.

The Kriging model consists of two parts: a first part which captures the 'global' behavior, and a second which is associated with the local variation concerned only with the immediate neighborhood. It treats the deterministic output y(x) as a realization of a stochastic process

Y (x) = µ + Z(x) (7.19)

where µ is a constant and Z(x) is assumed to be a stochastic process with zero mean.

The Gaussian spatial correlation is given by

Cov[Z(u), Z(v)] = σ^2 ρ(u,v) = σ^2 \exp\left[ −d_s^2(u,v) \right]    (7.20)

where σ^2 is the process variance, ρ stands for the correlation function, and d_s(u,v) is the scaled distance between two points u and v

d_s(u,v) = \sqrt{ \sum_{h=1}^{k} w_h (u_h − v_h)^2 }    (7.21)

The weight coefficients w_h, also called correlation parameters, can be considered as the importance of variable h: the larger the value of w_h, the larger the influence u could have on v in the h direction. The coefficients w_h, together with µ and σ^2, determine the stochastic process Y(x).

The first, global, part is typically a linear regression model, often actually linear in the predictor variables, and sometimes utilizing interactions. The second part, the local contribution, is a statistical process that takes the covariance among sampled points within the immediate neighborhood into account. Using the combined information derived from these two contributions, a prediction is formed.

This makes it clear how multiple claims like the above can all hold: Kriging is in fact a hybrid of several methods, and as such naturally expandable in any direction. The difference is actually in the perspective. If one were to focus primarily on the first term, where the linear regression takes place, it could be argued that Kriging is an augmented linear regression technique. However, it is often precisely this 'augmentation' which contributes significantly to a good fit. Thus, one could say that it is actually the second ('local') term that makes the method work, and hence that Kriging is a method dominated by local behavior. Finally, the second term only takes into account those data which lie within a certain (predefined) neighborhood of the point in question. If one were to expand this neighborhood to include all the samples over the entire space, this would indeed be similar to a GP-based method, although without making complete use of the Bayesian step the way the GP method does.


When thought of in this way, Kriging as a hybrid method in essence spans the space between the linear regression methods and Bayesian regression considering the covariance among the data. Within this space of techniques, the method can be modified in either direction and in the limits resembles either of the pure methods.

Given a sample set X_d with n design points x_{d1}, x_{d2}, ..., x_{dn}, and their responses y_d = [y_{d1}, y_{d2}, ..., y_{dn}]^T, the best linear unbiased predictor, i.e. the Kriging metamodel, can be expressed as

y(x) = µ + r_d(x)^T R_d^{-1} (y_d − J_n µ)    (7.22)

where J_n is an n-vector of 1's; R_d is an n × n matrix whose elements R_{d,ij} = ρ(x_{di}, x_{dj}) (1 ≤ i, j ≤ n) are the correlations between pairs of sample points; the elements of the n-vector r_d(x) are the correlations r_{di}(x) = ρ(x, x_{di}) (1 ≤ i ≤ n) between a new point x and the sample points; and µ is replaced by its generalized least-squares estimate \hat{µ} = (J_n^T R_d^{-1} y_d)/(J_n^T R_d^{-1} J_n). The unknown parameters w = [w_1, w_2, ..., w_k]^T and σ^2 are obtained by maximizing the likelihood function.

The Kriging method provides an estimate of the prediction error at an uncomputed point, which is also called the mean squared error

MSE = s^2(x) = σ^2 \left[ 1 − r_d(x)^T R_d^{-1} r_d(x) + \frac{\left( 1 − J_n^T R_d^{-1} r_d(x) \right)^2}{J_n^T R_d^{-1} J_n} \right]    (7.23)

Since the global term is a linear regression approach, it requires a model to be pre-assumed, which is an inherent potential drawback. Fortunately Kriging, due to the fact that it is not solely reliant on the linear regression model, is not as sensitive in this respect as the pure linear regression techniques, which is the reason why a linear model is often sufficient. However, this may still entail a feedback loop to the initial assumption concerning the model, depending on how large the neighborhood for the second term was chosen to be.

The Kriging method is very flexible in capturing nonlinear behavior because the correlation functions can be statistically tuned by the sample data. Another good feature of Kriging is its ability to provide an estimate of the prediction error. This feature is the basis of several sequential approaches.
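The following is a minimal sketch of the ordinary Kriging predictor of equations (7.22) and (7.23), written in Python/NumPy. For simplicity the correlation parameters w_h are fixed by assumption rather than estimated by maximum likelihood as the text prescribes, a small jitter term is added to the correlation matrix for numerical conditioning, and the test data are illustrative.

```python
import numpy as np

def corr_matrix(A, B, w):
    """Gaussian spatial correlation exp(-sum_h w_h (u_h - v_h)^2), eq. (7.20)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2 * w).sum(axis=2)
    return np.exp(-d2)

def fit_kriging(Xd, yd, w):
    """Ordinary Kriging with fixed correlation parameters w (assumed, not ML-estimated)."""
    n = len(yd)
    R = corr_matrix(Xd, Xd, w) + 1e-10 * np.eye(n)    # jitter for conditioning
    Rinv = np.linalg.inv(R)
    ones = np.ones(n)
    mu = (ones @ Rinv @ yd) / (ones @ Rinv @ ones)     # generalized least-squares estimate of mu
    sigma2 = ((yd - mu) @ Rinv @ (yd - mu)) / n        # process variance estimate
    return {"Xd": Xd, "yd": yd, "w": w, "Rinv": Rinv, "mu": mu, "sigma2": sigma2}

def predict_kriging(model, x):
    """Predictor (7.22) and its mean squared error (7.23) at a new point x."""
    r = corr_matrix(x[None, :], model["Xd"], model["w"]).ravel()
    Rinv, yd, mu = model["Rinv"], model["yd"], model["mu"]
    ones = np.ones(len(yd))
    y_hat = mu + r @ Rinv @ (yd - mu * ones)
    mse = model["sigma2"] * (1 - r @ Rinv @ r
                             + (1 - ones @ Rinv @ r) ** 2 / (ones @ Rinv @ ones))
    return y_hat, max(mse, 0.0)

rng = np.random.default_rng(2)
Xd = rng.uniform(0, 1, (20, 2))
yd = np.sin(3 * Xd[:, 0]) + Xd[:, 1] ** 2
model = fit_kriging(Xd, yd, w=np.array([4.0, 4.0]))
y_hat, mse = predict_kriging(model, np.array([0.5, 0.5]))
print("prediction:", y_hat, " estimated MSE:", mse)
```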

7.4.9 Radial Basis Functions

Radial basis functions (RBF) were developed by Hardy (1971) and use linear combinations of a radially symmetric function based on the Euclidean distance or a similar metric to build approximation models.

RBF approximations have been shown to produce good fits to arbitrary contours of both deterministic and stochastic response functions (Powell, 1987). As commonly applied, the method is an interpolating approximation. They provide an alternative approach to multivariate metamodelling. In an empirical comparison, Franke (1982) found radial basis functions to be superior to cubic splines, B-splines, and several others.


Mathematical Form for RBF Models

This approach is suitable for highly nonlinear problems. Given n sample points x_{d1}, x_{d2}, ..., x_{dn}, and their responses y_d = [y_{d1}, y_{d2}, ..., y_{dn}]^T, the radial basis function interpolation is expressed as follows

y(x) = p(x) + \sum_{i=1}^{n} β_i φ[d(x, x_{di})]    (7.24)

where p(x) is a polynomial model; d is the Euclidean distance; φ is a basis function for which there are many different choices, such as linear, cubic, thin plate spline, multiquadric, Gaussian, etc.

Here the focus is on the RBF method with linear basis functions φ[d(u,v)] = d(u,v) and a constant value for the polynomial part, i.e.,

y(x) = f(x) = µ + \sum_{i=1}^{n} β_i ||x − x_{di}||    (7.25)

where the sum is taken over the observed set of system responses {(x_i, y_i)}; ||·|| represents the Euclidean norm, β_i are the coefficients of the expansion, and x_{di} are the observed inputs; and where

µ = \frac{1}{n} \sum_{i=1}^{n} y_{di}

The n unknown coefficients β_i (i = 1, ..., n) are found simply by replacing the left-hand side of (7.25) with g(x_i), and solving the resulting linear system of n equations.

The prediction equation is expressed as

y(x) = µ + b_d(x)^T B_d^{-1} (y_d − J_n µ)    (7.26)

where J_n is an n-vector of 1's, B_d is an n × n matrix, and b_d(x) is an n × 1 vector, with elements, respectively,

B_{d,ij} = ||x_{di} − x_{dj}||    1 ≤ i, j ≤ n

b_{di}(x) = ||x − x_{di}||    1 ≤ i ≤ n

It can be noticed that the functional form of the predictor in equation (7.22) for Kriging is very similar to that of the predictor in equation (7.26) for RBF. Kriging is an RBF interpolation with Gaussian basis functions, except that the scaled distance is used and statistical approaches are applied to tune the correlation parameters w_h. The RBF method, unlike Kriging, cannot provide an estimate of the prediction error. Therefore, some of the sequential sampling approaches are not applicable when RBF is used as the metamodelling technique.
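A corresponding sketch of the linear RBF interpolator of equations (7.25) and (7.26) follows; as above, the data are illustrative, and the coefficients are obtained by solving the n × n linear system directly, which is adequate only for modest sample sizes.

```python
import numpy as np

def rbf_fit(Xd, yd):
    """Linear radial basis function interpolation, eq. (7.25): mu + sum_i beta_i ||x - x_di||."""
    mu = yd.mean()                                                 # constant 'polynomial' part
    B = np.linalg.norm(Xd[:, None, :] - Xd[None, :, :], axis=2)    # Hardy matrix of distances
    beta = np.linalg.solve(B, yd - mu)                             # coefficients from n linear equations
    return {"Xd": Xd, "mu": mu, "beta": beta}

def rbf_predict(model, x):
    """Evaluate the interpolant at a new point x, as in eq. (7.26)."""
    b = np.linalg.norm(x - model["Xd"], axis=1)
    return model["mu"] + b @ model["beta"]

rng = np.random.default_rng(3)
Xd = rng.uniform(0, 1, (15, 2))
yd = np.cos(4 * Xd[:, 0]) + Xd[:, 1]
model = rbf_fit(Xd, yd)
# interpolation property: the metamodel reproduces the observed responses exactly
print("max error at the sample points:",
      max(abs(rbf_predict(model, x) - y) for x, y in zip(Xd, yd)))
```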


Design of Experiments for RBF Models

Unfortunately, the condition number of the linear system deteriorates rapidly with increasing dimension and increasing numbers of data values to be fitted (Hardy, 1971). Also, since this is an interpolation method, its direct application to simulation metamodelling is limited. Dyn et al. (1987) solve both of these problems by finding effective pre-conditioners for the linear system, and by executing only the first few iterative steps in solving the system of equations to provide a smooth fit to noisy data.

Balling and Clark (1992) provide upper and lower bounds on the L2 norm of the matrix of equation coefficients (Hardy matrix), and Sun (1993) gives necessary and sufficient conditions on the location of the design points for the Hardy matrix to be nonsingular.

The so-called thin plate splines have radial basis functions of the form

||x − x_{di}||^2 \log ||x − x_{di}||

Radial basis functions also arise for a class of spline functions. Like smoothing splines, the radial basis functions, as well as their coefficients in the 'best fit' metamodel, depend on the location of the observed values x_{di}.

7.5 Sequential Sampling Approaches

As mentioned before, sequential sampling approaches are typically based on optimality criteria for experimental design. Here six sequential sampling approaches are introduced: the Entropy approach, the MSE approach (a variant of the Entropy criterion), the IMSE approach, the Maximin Distance approach, the Maximin Scaled Distance approach, and the Cross-Validation approach. The Entropy approach, the IMSE approach, and the MSE approach are sequential approaches with adaptation to existing Kriging metamodels, and hence they are not suitable for other types of metamodels. The Maximin Distance approach is a sequential approach without adaptation and is not limited to Kriging.

7.5.1 Entropy Approach

Currin et al. (1991) introduced the entropy criterion for the design of computer experiments under the Bayesian framework. Given the existing sample set X_d (with all n previous sample points x_{d1}, x_{d2}, ..., x_{dn}) and the existing Kriging metamodel (created based on X_d), the Entropy approach is to select a new sample set X_c (with m sample points x_{c1}, x_{c2}, ..., x_{cm}) so as to maximize the amount of information obtainable from the new sample set. It has been shown (Koelher and Owen, 1996) that this criterion is equivalent to

\max_{X_c} \; |R_a| \times |J_{n+m}^T R_a^{-1} J_{n+m}|    (7.27)


where R_a is the correlation matrix of all the n + m sample points in X_a = X_c ∪ X_d, and J_{n+m} is an (n + m)-vector of 1's.

The correlation parameters θ_h (h = 1, ..., k) in the correlation matrix R_a are set to the same values as those in the existing Kriging metamodel, which makes the new sample points adapted to the existing metamodel.

7.5.2 MSE Approach

The mean squared error (MSE) approach is to select a new sample point x_c with the largest estimate of the prediction error (eq. (7.23)) in the existing Kriging metamodel (created based on the existing sample set X_d), i.e.,

\max_{x_c} \; s^2(x_c)    (7.28)

This approach is in fact a special case of the Entropy criterion if only one new sample point is selected at each stage.
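A minimal sketch of this selection rule is given below. It reuses the fit_kriging/predict_kriging helpers sketched in Section 7.4.8 and replaces a formal maximization of s^2(x) with a search over a random candidate grid in the unit hypercube; the expensive_simulation name in the commented usage is hypothetical.

```python
import numpy as np

def next_sample_by_mse(model, n_candidates=2000, rng=None):
    """Pick the candidate point whose Kriging MSE estimate, eq. (7.23), is largest.

    Relies on the fit_kriging/predict_kriging helpers sketched in Section 7.4.8;
    a random candidate grid stands in for a proper optimizer.
    """
    rng = rng or np.random.default_rng()
    k = model["Xd"].shape[1]
    candidates = rng.uniform(0, 1, (n_candidates, k))
    mse = np.array([predict_kriging(model, c)[1] for c in candidates])
    return candidates[mse.argmax()]

# typical loop: evaluate the expensive simulation at the new point, refit, repeat
# x_new = next_sample_by_mse(model)
# y_new = expensive_simulation(x_new)          # hypothetical analysis code
# model = fit_kriging(np.vstack([model["Xd"], x_new]),
#                     np.append(model["yd"], y_new), model["w"])
```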

7.5.3 IMSE Approach

The integrated mean squared error (IMSE) criterion was introduced by Sacks et al. (1989) for the design of computer experiments. Given the existing sample set X_d and the existing Kriging metamodel (created based on X_d), the IMSE approach is to select a new sample set X_c to minimize the integrated mean squared error, i.e.,

\min_{X_c} \int s^2(x) \, dx    (7.29)

where

s^2(x) = σ^2 \left[ 1 − r_a(x)^T R_a^{-1} r_a(x) + \frac{\left( 1 − J_{n+m}^T R_a^{-1} r_a(x) \right)^2}{J_{n+m}^T R_a^{-1} J_{n+m}} \right]    (7.30)

Refer to equations (7.23) and (7.27) for the notation. The correlation parameters used in R_a and r_a(x) are the same as those used in the existing metamodel.

The differences between the IMSE approach (when used to generate one sample point at a stage) and the MSE approach are:

• the IMSE approach uses an averaged MSE in the whole design space;

• in the IMSE approach, the mean squared error is based not only on the existing sample set X_d but also on the new sample set X_c, while in the MSE approach the mean squared error is based only on the previous sample points X_d.


7.5.4 Maximin Distance Approach

The Maximin Distance (MD) criterion was proposed by Johnson et al. (1990) for computer experiments. Given the existing sample set X_d, the Maximin Distance approach is to select a new sample set X_c to maximize the minimum distance between any two sample points in the sample set X_a = X_c ∪ X_d (with all n + m sample points), i.e.,

\max_{X_c} \left\{ \min_{x_{ci} \neq x_{aj},\; 1 \le i \le m,\; 1 \le j \le n+m} \left[ d(x_{ci}, x_{aj}) \right] \right\}    (7.31)

where xci ∈ Xc (i = 1, . . . , m) and xaj ∈ Xa = Xc ∪Xd (j = 1, . . . , n + m).

This approach can be used for both the Kriging and RBF methods. However, one potential limitation is that, unlike the other approaches, it cannot adapt new sample points to the information obtained from the existing metamodel (created based on X_d).
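The sketch below is a simplified, one-point-at-a-time version of the Maximin Distance criterion (7.31): instead of selecting a whole set X_c at once, it greedily picks, from a random candidate pool in the unit hypercube, the point whose distance to the nearest existing sample is largest. The candidate-pool search and the unit-hypercube bounds are assumptions of the example.

```python
import numpy as np

def maximin_next_point(Xd, n_candidates=2000, rng=None):
    """Greedy one-point-at-a-time version of the Maximin Distance criterion, eq. (7.31).

    Among random candidates, return the one whose distance to the closest
    existing sample point is largest (unit-hypercube bounds assumed).
    """
    rng = rng or np.random.default_rng()
    k = Xd.shape[1]
    candidates = rng.uniform(0, 1, (n_candidates, k))
    dmin = np.linalg.norm(candidates[:, None, :] - Xd[None, :, :], axis=2).min(axis=1)
    return candidates[dmin.argmax()]

rng = np.random.default_rng(4)
Xd = rng.uniform(0, 1, (10, 2))           # existing design
for _ in range(5):                        # add five points sequentially
    Xd = np.vstack([Xd, maximin_next_point(Xd, rng=rng)])
print("design size after augmentation:", Xd.shape)
```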

7.5.5 Maximin Scaled Distance Approach

The Maximin Scaled Distance approach is a modification of the Maximin Distance approach. The original MD approach cannot utilize the information obtained from the existing metamodel. To make new sample points adapted to the information from the existing metamodel, a scaled distance approach

\max_{X_c} \left\{ \min_{x_{ci} \neq x_{aj},\; 1 \le i \le m,\; 1 \le j \le n+m} \left[ d_s(x_{ci}, x_{aj}) \right] \right\}    (7.32)

was introduced by Jin et al. (2002), where

d_s(u,v) = \sqrt{ \sum_{h=1}^{k} w_h (u_h − v_h)^2 }    (7.33)

The weight parameters w_h reflect the importance of the different variables identified from the existing metamodel. If one variable is more important than the others, a larger weight should be assigned to that variable. This approach is expected to lead to a better uniformity of the projection of sample points onto the space made of those important variables and therefore to improve the quality of the information obtained. For example, consider an extreme case in which some input variables are totally irrelevant (w_h = 0); this approach should then select sample points that uniformly fill the space made of the remaining variables.

When Kriging is used for metamodelling, the correlation coefficients θ_h could be used as w_h. In fact, the correlation coefficients are to some extent indicators of the smoothness or predictability along the x_h coordinate (e.g., the smaller θ_h is, the smoother the response along the x_h coordinate; if θ_h is close to zero, the influence of x_h is close to linear). In this case, the scaled distance d_s is the same distance used in the correlation function of Kriging - see equation (7.21). For other metamodelling techniques, the relative importance of the variables can be obtained by global sensitivity analysis approaches.


7.5.6 Cross–Validation Approach

For those metamodelling techniques for which an estimate of the prediction error is not provided, such as the RBF method, Jin et al. (2002) proposed a cross-validation approach to estimate the prediction error. The basic idea of this method is to leave out one or several sample points at a time, to fit a metamodel based on the remaining sample points, to use this metamodel to predict the response at the left-out points, and then to calculate the difference between the prediction and the real response. With this approach, no new sample points are needed for assessing the accuracy of a metamodel. Instead of using cross-validation for accuracy assessment, this approach is used to estimate the prediction error for sequential sampling. The prediction error is estimated by averaging the cross-validation errors.

Based on the existing sample set X_d with n points x_{d1}, x_{d2}, ..., x_{dn}, the prediction error at a point x is estimated by

e(x) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left[ y_{-i}(x) − y(x) \right]^2 }    (7.34)

where y(x) denotes the prediction of the response at x from the metamodel built on all n existing sample points, and y_{-i}(x) denotes the prediction of the response at x from the metamodel built on the n − 1 sample points that remain when point i is left out (i = 1, 2, ..., n).

With the cross-validation approach, the point with the largest prediction error - equation (7.34) - is selected as the new sample point. The idea is similar to the MSE approach. It has been found that in some cases the point with the largest estimated prediction error tends to be close to one of the existing sample points. To avoid new sample points clustering around the existing sample points, the selection criterion is modified to take into consideration the distance between points

\max_{x_c} \left\{ e(x_c) \times \min_{1 \le i \le n} \left[ d(x_c, x_{di}) \right] \right\}    (7.35)
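A minimal sketch of the cross-validation error (7.34) and the distance-weighted selection rule (7.35) follows; it is written generically in terms of a fit/predict pair so that it can be used, for example, with the RBF helpers sketched in Section 7.4.9, and it again replaces a formal maximization with a random candidate grid in the unit hypercube (an assumption of the example).

```python
import numpy as np

def cv_error(Xd, yd, x, fit, predict):
    """Cross-validation prediction error e(x) of eq. (7.34) for any metamodel
    given by a pair of fit(Xd, yd) / predict(model, x) functions."""
    full = predict(fit(Xd, yd), x)
    diffs = []
    for i in range(len(yd)):
        keep = np.arange(len(yd)) != i
        diffs.append(predict(fit(Xd[keep], yd[keep]), x) - full)
    return np.sqrt(np.mean(np.square(diffs)))

def next_sample_by_cv(Xd, yd, fit, predict, n_candidates=500, rng=None):
    """Distance-weighted selection of eq. (7.35): maximize e(x) * min_i d(x, x_di)."""
    rng = rng or np.random.default_rng()
    cands = rng.uniform(0, 1, (n_candidates, Xd.shape[1]))
    scores = [cv_error(Xd, yd, c, fit, predict) * np.linalg.norm(c - Xd, axis=1).min()
              for c in cands]
    return cands[int(np.argmax(scores))]

# e.g. with the RBF helpers sketched in Section 7.4.9:
# x_new = next_sample_by_cv(Xd, yd, rbf_fit, rbf_predict)
```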

7.5.7 Potential Usages of Sequential Approaches

As mentioned earlier, the major advantage of sequential approaches is that they can be used to monitor the process of metamodelling and to help a user decide when to stop the sampling process. This is desirable for many engineering problems if the simulation or analysis codes are computationally expensive. To save computational costs, designers can stop the sampling process whenever they think the quality or fidelity of the metamodels is good enough or the performance of the metamodels cannot be further improved (in that case one may change the metamodelling approach). One issue, however, is how to assess the performance of metamodels without using additional confirmation samples. The cross-validation error can be used as an accuracy measure to overcome this difficulty.


Sequential sampling approaches could also be very helpful for identifying an interesting design region and improving the accuracy of a metamodel in a narrowed interesting design region. Consider a situation in which the design team is only interested in a given range of a response. If they simply use space-filling approaches for the entire design space, many sample points will be wasted. By using sequential sampling approaches, the sample points can be added gradually and used to refine the metamodel.

7.6 Metamodelling Process in Technical Systems Design

From the nature of the application in engineering design, the following metamodelling process can be distilled as the current state of the art. It is based on the linear regression technique of Response Surface Methodology (RSM), where a model is first predefined and then, based on a selected DoE, the corresponding cases are used to identify the contributors and select the parameters within the model.

The difference between this process and the technique itself is that the process actually demonstrates the complete application procedure, since it accounts for having to define the model, assess its validity and use it for the desired application in the end. This current process is illustrated in Figure 7.2, and can be described as follows.

Given a problem, i.e. to approximate a particular disciplinary response for a specific aircraft configuration, the first step is to select a physics-based analysis method of acceptable fidelity, and then assume a model which one hopes will be suited to accurately represent this desired response. Often, the model which is being assumed takes on the character of a quadratic polynomial; this way interactions among the variables can be taken into account, and simple nonlinear behavior finds its equivalent in the quadratic terms, in a first-approximation sense, along with the main effects.
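As a hedged illustration of this step, the sketch below builds the design matrix of a full quadratic polynomial (intercept, main effects, two-factor interactions and pure quadratic terms) and estimates its coefficients by least squares; the randomly generated cases stand in for a real DoE and the closed-form response stands in for the analysis code, both being assumptions of the example.

```python
import numpy as np

def quadratic_features(X):
    """Design matrix for a full quadratic polynomial in the columns of X:
    intercept, main effects, two-factor interactions and pure quadratic terms."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def fit_response_surface(X, y):
    """Least-squares coefficients of the assumed quadratic model."""
    F = quadratic_features(X)
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    return coef

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, (40, 3))                          # cases from some assumed DoE
y = 1 + 2 * X[:, 0] - X[:, 1] * X[:, 2] + 0.5 * X[:, 2] ** 2 \
    + 0.01 * rng.standard_normal(40)                     # stand-in for the analysis code
coef = fit_response_surface(X, y)
resid = y - quadratic_features(X) @ coef
print("residual standard deviation:", resid.std())
```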

Once a model has been selected, the next step is to decide upon a particular design of experiments. This decision depends on the complexity expected in the response to be modelled, as well as the run time associated with the analysis code(s) needing to be executed for each 'case' or function call. It may be, for instance, that there is a fairly good intuitive and conceptual understanding of which variables will have the more dominant impact on the response, and which can be expected to have strong interaction effects. In this case, it may be possible to select DoE configurations with a lower number of cases than would be required otherwise, since the confounding effects which would appear can be sorted out fairly confidently. It may also be the case that the analysis that needs to be executed is extremely extensive and computationally expensive, with scripts of perhaps several analysis codes feeding into each other and even iterations taking place. In such a case it is impractical to run a large number of cases, and thus the DoE selection is restricted.

Occasionally, one runs into the problem of having to run more cases than can be afforded, according to the assessment of the information needed from the DoE in terms of variable effects and a compilation of the computational expense incurred with the number of cases and their runtime.


In such instances what may be in order is a screening test, i.e. a two-level DoE that focuses on the main effects and identifies the most significant contributors, in a Pareto chart manner. This allows the designer to down-select the number of variables modelled to a more reasonable number.

Figure 7.2. Metamodels in Technical Systems Design

If a screening test is necessary, the next set of steps includes identifying the DoE, running the cases, and performing an Analysis of Variance (ANOVA). After this is achieved, an argument and discussion are in order as to whether these results make sense and whether one is confident enough to indeed make a selection of the variables based on these preliminary results. If one should find a strong discrepancy between the Pareto analysis of which variables should be considered and engineering intuition, then it may be that interaction effects among the variables are dominant and need to be modelled before an informed decision can take place.

Assuming that one has found a satisfying DoE either directly or via a screening test, the actual work of running the cases specified in the DoE table takes place. This procedure can usually be automated via shell scripts and even parallel processing, and is typically characterized by computer analysis without extensive work by the designer.

Once the cases have finished running, all the data are collected and an analysis of variance is conducted, such that the next step is parameter estimation to find suitable parameter settings of the model, namely the coefficients of the pre-selected function. At this stage, this is a purely statistical exercise, where one assesses the overall fit of the model, identifies possible outliers, and verifies the scatter plots accordingly.

If the results from the ANOVA in the previous step were satisfactory, one still needs to test the assumed model validity via some random (or quasi-random, if preferred) cases. This is necessary, since the DoE cases were selected assuming a particular model and do not account well for discrepancies between the model itself and the true underlying, unknown function. This step closely resembles the efforts in previous steps in terms of programming, employing shell scripts and running the actual codes.

Finally, the results need to be assessed, and based on the error observed on the validation data set a decision is made as to whether the initially assumed model can be considered appropriate. If it is, then the acquired parameter setting together with the assumed model forms the new metamodel, which can now be employed as intended. If, however, there is evidence that the model does not lie within the required margin of error, i.e. if the observed fit on the validation data set is insufficient, this indicates an irrecoverable problem with the initial set of assumptions regarding the model. In this case the only way to improve the metamodel is to go back, assume a new model, and go through the entire process again. It will not help to simply add more sampling cases; in fact, that may actually make the model fit worse, essentially indicating the inappropriateness of the assumed model with ever increasing clarity. Naturally, the information and experience gained from the first run through this process will still be beneficial. On the one hand, it will hopefully provide guidance on how to find a better suited model, either through the application of transformations or by going to a higher order polynomial. Also, depending on how the new model differs from the one originally assumed, the data already collected from the sampling can obviously be used further, thus reducing the computational effort significantly. However, most of the statistical effort is still required, and in the end there is no guarantee that the new model will yield the required accuracy either.

There is also the possibility that the analysis method which had been selected initially will turn out not to be appropriate, such that the cases may need to be rerun entirely to create a new set of data, based on higher fidelity methods.

In review of this process, two observations can be made:


1. A main problem can be identified to lie within the large loops, carrying the design analyst from what was thought to be the end all the way back to the 'drawing board', to the initial assumption, in particular the loop which concerns itself with the model assumption. This is not only time consuming and an annoyance, but can actually severely disrupt preset deadlines, where a specific time allocation was associated with such metamodel creation or analysis. This is particularly dangerous if knowledge about the response behavior is not easily accessible, and a great deal of uncertainty is associated with this initial assumption. In an engineering process it poses the additional problem of adjusting the magnitude of the loop to the uncertainty in the design process which it represents.

2. Considering the process rather than simply the technique associated with RSM, it becomes clear that this advertised 'black box' approach is not truly applicable as a real black box in the implementation. Often, more detailed knowledge of the underlying statistics is required to actually create the metamodel with the desired validity.

With this background, it becomes apparent that an alternative metamodel implementation process should on the one hand exhibit ease of execution and a certain robustness if it is to be successful in industrial design implementations. However, the main improvement desired is actually to restructure the long loop leading back to the first model assumption. The loop itself will have to remain in one form or another as long as assumptions are required, which is practically always the case. Nonlinear regression techniques such as feed-forward back-propagation neural networks improve the situation somewhat, since their flexible nature reduces the probability that the loop back needs to be executed. Bayesian regression methods make assumptions concerning the statistics of the data rather than the function behavior itself. Therefore, which model is appropriate actually depends on the problem at hand and the type of information available.


Bibliography

[1] Balling, R.J. and Clark, D.T.: A Flexible Approximation Model for Use with Optimization, Proceedings, 4th AIAA/USAF/NASA/OAI Symposium on Multidisciplinary Analysis and Optimization, Cleveland, AIAA, Vol. 2, 1992, pp. 886–894, AIAA-92-4801-CP.

[2] Box, G.E.P. and Draper, N.R.: Empirical Model Building and Response Surfaces, John Wiley & Sons, New York, 1987.

[3] Breiman, L.: The Π Method for Estimating Multivariate Functions from Noisy Data, Technometrics, Vol. 33, 1991, pp. 125–160.

[4] Chen, W.: A Robust Concept Exploration Method for Configuring Complex Systems, Ph.D. Thesis, Georgia Institute of Technology, School of Mechanical Engineering.

[5] Chen, W., Allen, J.K., Mavris, D. and Mistree, F.: A Concept Exploration Method for Determining Robust Top-Level Specifications, Engineering Optimization, Vol. 26, 1996, pp. 137–158.

[6] Currin, C., Mitchell, T., Morris, M. and Ylvisaker, D.: Bayesian Prediction of Deterministic Functions with Applications to the Design and Analysis of Computer Experiments, Journal of the American Statistical Association, Vol. 86, no. 416, 1991, pp. 953–963.

[7] Daubechies, I.: Orthonormal Bases of Compactly Supported Wavelets, Communications in Pure and Applied Mathematics, Vol. 41, 1988, pp. 909–996.

[8] deBoor, C.: A Practical Guide to Splines, Springer–Verlag, New York, 1978.

[9] Dyn, N., Levin, D. and Rippa, S.: Numerical Procedures for Surface Fitting of Scattered Data by Radial Basis Functions, SIAM Journal of Scientific and Statistical Computing, Vol. 7, no. 2, 1986, pp. 639–659.

[10] El-Beltagy, M.A.: A Natural Approach to Multilevel Optimization in Engineering Design, Ph.D. Thesis, University of Southampton, Department of Mechanical Engineering, 2000.

[11] Eubank, R.L.: Spline Smoothing and Nonparametric Regression, Marcel Dekker, New York, 1988.

[12] Friedman, J.H.: Multivariate Adaptive Regression Splines, The Annals of Statistics, Vol. 19, no. 1, 1991, pp. 1–67.

[13] Friedman, L.W.: The Simulation Metamodel, Kluwer Academic Publishers, 1996.

[14] Handcock, M.S.: On Cascading Latin Hypercube Designs and Additive Models for Experiments, Joint Statistical Meetings, Boston, 1992.

[15] Hardle, W.: Applied Nonparametric Regression, Cambridge University Press, Cambridge, 1990.

[16] Hardy, R.L.: Multiquadric Equations of Topography and Other Irregular Surfaces, Journal of Geophysical Research, Vol. 76, 1971, pp. 1905–1915.

[17] Jin, R., Chen, W. and Sudjianto, A.: On Sequential Sampling for Global Metamodeling in Engineering Design, Proceedings of DETC'02, ASME, Montreal, 2002, pp. 1–10.

[18] Johnson, M., Moore, L. and Ylvisaker, D.: Minimax and Maximin Distance Designs, Journal of Statistical Planning and Inference, Vol. 26, 1990, pp. 121–148.


[19] Kleijnen, J.P.C.: Statistical Tools for Simulation Practitioners, Marcel Dekker, New York, 1987.

[20] Koelher, J.R. and Owen, A.B.: Computer Experiments, in 'Handbook of Statistics', Ghosh & Rao eds., Elsevier Science, New York, 1996, Chapter 13, pp. 261–308.

[21] Kolmogorov, A.: Stationary Sequences in Hilbert Space, Bulletin of Mathematics, University of Moscow, Vol. 2, no. 6, 1941.

[22] MacKay, J.C.: Gaussian Processes - A Replacement for Supervised Neural Networks?, 2001, available from http://www.inference.phy.cam.ac.uk/mackay/.

[23] Mavris, D.N., Bandte, O. and Schrage, D.P.: Effect of Mission Requirements on the Economic Robustness of an HSCT Concept, Proceedings, 18th Annual Conference of the International Society of Parametric Analysis, Cannes, 1996.

[24] Mavris, D.N., DeLaurentis, D.A., Bandte, O. and Hale, M.A.: A Stochastic Approach to Multidisciplinary Aircraft Analysis and Design, AIAA 98-0912, 1998.

[25] McKay, M.D., Beckman, R.J. and Conover, W.J.: A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code, Technometrics, Vol. 21, no. 2, 1979, pp. 239–245.

[26] Morris, M.D. and Mitchell, T.J.: Exploratory Designs for Computer Experiments, ORNL/TM-12045, Oak Ridge National Laboratory, Oak Ridge, 1992.

[27] Myers, R.H.: Response Surface Methodology , Allyn & Bacon eds., Boston, 1976.

[28] Owen, A.B.: Orthogonal Arrays for Computer Experiments, Integration and Visualization, Statistica Sinica, Vol. 2, 1992, pp. 439–452.

[29] Powell, M.J.D.: Radial Basis Functions for Multivariable Interpolation: a Review, in 'Algorithms for Approximations', Mason and Cox eds., London, Oxford University, 1987.

[30] Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P.: Design and Analysis of Computer Experiments, Statistical Science, Vol. 4, no. 4, 1989, pp. 409–435.

[31] Schruben, L.W. and Cogliano, V.J.: An Experimental Procedure for Simulation Response Surface Model Identification, Communications of the ACM, Vol. 30, 1987, pp. 716–730.

[32] Simpson, T.W., Allen, J.K. and Mistree, F.: Spatial Correlation Metamodels for Global Approximation in Structural Design Optimization, Advances in Design Automation, Atlanta, ASME, Paper no. DETC98, 1998.

[33] Simpson, T.W., Mauery, T.M., Korte, J.J. and Mistree, F.: Comparison of Response Surface and Kriging Models for Multidisciplinary Design Optimization, Proceedings, 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis & Optimization, St. Louis, AIAA, Vol. 1, 1998, pp. 381–391, AIAA-98-4755.

[34] Sun, X.: Solvability of Multivariate Interpolation by Radial or Related Functions, Journal of Approximation Theory, Vol. 72, 1993, pp. 252–267.

[35] Taguchi, G.: Taguchi on Robust Technology Development: Bringing Quality Engineering Upstream, ASME Press, New York, 1993.

[36] Wahba, G.: Partial and Interaction Splines for the Semiparametric Estimation of Functions of Several Variables, in 'Computer Science and Statistics', Proceedings of the 18th Symposium on the Interface, Boardman ed., Washington D.C., American Statistical Association, 1986, pp. 75–80.

[37] Wahba, G.: Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia, SIAM, Vol. 59, 1990.


Chapter 8

Fuzzy Sets and Fuzzy Logic

Fuzzy set theory was formally introduced by Zadeh (1965), but the basic concepts have been known for a very long time. Kosko (1994) traces the history of fuzzy logic back to the ancient Chinese philosophers who introduced the concept of yin and yang. In this framework every element of a universe of discourse (i.e., analogous to the 'environmental' set in classical, crisp set theory) can belong to a fuzzy set with a membership grade µ, being any number in [0, 1].

In mathematical modelling of system properties and behavior the analyst can come across inconveniences. The first is caused by the excessive complexity of the situation being modelled. The second inconvenience consists of the indeterminacy caused by the analyst's inability to differentiate events in real situations exactly and, hence, to model problem domains in precise terms. This indeterminacy is not an obstacle when using natural language, since its main property is the vagueness of its semantics and, thus, its capability of working with vague notions. Classical mathematics, however, has not coped with such vagueness. A mathematical apparatus capable of describing vague notions could be very useful, since it could help us to overcome the above obstacles in modelling. Moreover, it appears to be necessary for the development of some new branches of science, such as artificial intelligence.

This chapter is devoted to such an apparatus. It is called fuzzy set theory and its fundamental notion is that of a fuzzy set. It can also be understood as a generalization of the classical set. Using it, it is possible to model vague notions and also imprecise events.

8.1 Preview

Fuzzy set theory is characterized by its capability of handling linguistic variables in a non-analytical environment; this makes it a paradigm very close to the way a human thinks. The ability to summarize information finds its most pronounced manifestation in the use of natural languages. Thus, each word x in a natural language L may be viewed as a summarized description of a fuzzy subset M(x) of a universe of discourse U, with M(x) representing the meaning of x. In this sense, the language as a whole may be regarded as a system for assigning atomic and composite labels (i.e., words, phrases, and sentences) to the fuzzy subsets of U. For example, if the meaning of the noun flower is a fuzzy subset M(flower), and the meaning of the adjective red is a fuzzy subset M(red), then the meaning of the noun phrase red flower is given by the intersection of M(red) and M(flower). If one regards the color of an object as a variable, then its values, 'red', 'blue', 'yellow', 'green', etc., may be interpreted as labels of fuzzy subsets of a universe of objects. In this sense, the attribute color is a fuzzy variable, that is, a variable whose values are labels of fuzzy sets. It is important to note that the characterization of a value of the variable by a natural label such as 'red' is much less precise than the numerical value of the wavelength of a particular color. More generally, the values may be sentences in a specified language, in which case it may be said that the variable is linguistic. The values of the fuzzy variable height might be expressible as 'tall', 'not tall', 'somewhat tall', 'very tall', 'not very tall', 'very very tall', 'tall but not very tall', 'quite tall', 'more or less tall'. Thus, the values in question are sentences formed from the label 'tall', the negative 'not', the connectives 'and' and 'but', and the hedges 'very', 'somewhat', 'quite' and 'more or less'. In this sense, the variable 'height' as defined above is a linguistic variable.

The main function of linguistic variables is to provide a systematic means for an approximate characterization of complex or ill-defined phenomena. In essence, by moving away from the use of quantified variables and toward the use of the type of linguistic descriptions employed by human beings, one acquires the capability to deal with systems which are too complex to be susceptible to analysis in conventional mathematical terms.

For the above reason fuzzy logic is frequently used in those problems where it is necessary to mimic the behavior of some human expert. This is one of the main reasons that make fuzzy logic a useful approach to decision problems. Moreover, fuzzy logic has been extensively and successfully applied to many engineering problems. Among those developments, fuzzy multiattribute decision making techniques present characteristics useful also in approaching decision problems emerging in the design process. These techniques are characterized by handling multiple imprecise (fuzzy) attributes.

Fuzzy-set theory provides a mathematical basis for representing and reasoning with knowledge in uncertain and imprecise problem domains. As compared to crisp requirements (constraints), the fuzzy approach softens the sharp transition from acceptable to unacceptable. The mathematical theory of fuzzy sets (Zadeh, 1965), alternatively referred to as fuzzy logic, is concerned with the degree of truth that an outcome belongs to a particular category, not with the degree of likelihood that the outcome will be observed. Fuzzy logic provides appropriate models for the ability of human beings to categorize things, not by verifying whether they satisfy some unambiguous definitions, but by comparing them with characteristic examples of the categories in question.

Fuzzy-set theory is an important branch of decision-making theory, providing tools to quantify imprecise verbal statements and to classify outcomes of decision-analytical experiments. Usually, when decisions are prepared, a considerable amount of imprecise information with a quantitative connotation is transmitted via natural language. Well-known examples are the frequency indicators like: almost never, rarely, sometimes, often, mostly, and almost always. They are meaningful albeit in a particular context only. Since decisions are invariably made within a given context, graded judgement should also be considered within a particular framework. Mutual understanding of what the context is seems to be possible by common experience and education of human beings. The qualifying terms like almost, rather, somewhat, the so-called hedges, enable us to express degrees of truth in situations where a black-or-white statement would be inadequate.

Although fuzzy-set theory has been criticized for being probability theory in disguise, it is easy to understand that the two theories are concerned with two distinct phenomena: with observations that can be classified into vaguely described (imprecise) categories only, and with experiments such that the outcomes can be classified into well-defined (crisp) categories. In essence, fuzzy-set theory is concerned with the ability of human beings to categorize things and to label the categories via natural language.

The almost ideological debate between the supporters of probability theory and fuzzy-set theory reveals that the conflict has deep roots. Indeed, the fact that fuzzy-set theory models degrees of truth leads to a confrontation with our scientific tradition. Fuzzy logic agrees that an element may with a positive degree of truth belong to a set and with another positive degree of truth to the complement of the set, so violating the law of non-contradiction, which states that a statement cannot be true and not-true at the same time. Fuzzy logic also violates the law of the excluded middle (a statement is either true or false, 'tertium non datur'). Indeed, the real world is not a world of black-and-white, but is full of gray shades. Note that probability theory never challenged the traditional bivalent logic. It has its roots in gambling, where the rules and the outcomes are unambiguous.

8.1.1 Types of Uncertainty

Probability theory is a well-established mathematical theory, designed to model precisely described, repetitive experiments with uncertain outcomes. In the last few decades other types of uncertainty have been identified, however, and new mathematical tools are accordingly under study, in attempts to deal with situations which are not or cannot be covered by the classical tools of probability theory. The key notion is that uncertainty is a matter of degree. Thus, events occur with a particular degree of likelihood, elements have properties with a particular degree of truth, and actions can be carried out with a particular degree of ease. Roughly speaking, the following types of uncertainty can be distinguished.

• Randomness occurs when a precisely described experiment such as casting a die on a flat table has several possible outcomes, each with known probability (a perfect die with a homogeneous mass distribution) or with unknown probability (an inhomogeneous die). The outcomes of the experiment (the faces 1, 2, ..., 6) can unambiguously be observed. The experiment of casting the die can arbitrarily be repeated. Further experimentation will reduce the uncertainty: it will reveal the probability distribution of the outcomes of the die. Probability theory is concerned with the uncertainty of whether the respective outcomes will occur or not, that is, with their degree of likelihood.


• Vagueness or imprecision arises as soon as the outcome of the experiment cannot properly be observed. A typical example is given by the situation arising after the experiment of casting a die with colored faces, under twilight where colors cannot properly be distinguished. There are several possible outcomes, each with a particular degree of truth. Further experimentation will not reduce the uncertainty. Color perception illustrates that vagueness or imprecision may be due to the manner in which our neural system operates.

• Ambiguity arises when a verbal statement has a number of distinct meanings, so that only the context may clarify what the speaker really wants to say.

Risk is not a particular type of uncertainty but rather a mixture, where outcomes cannot precisely be classified into a small number of categories, so that it is also difficult to specify their probabilities.

8.1.2 Crisp Sets

A set is any well defined collection of objects. An object contained by a set is called a member, or element. For instance, if one considers books, sets might be hard cover, soft cover, large, small, fiction, etc. A particular book could be a member of multiple sets. All members of a set are created as equal members of that set.

Below, capital letters denote sets, while members of a set are written in lowercase. To indicate the universe of discourse, often referred to as the universal set, the symbol U is used. All sets are subsets of the universal set. Additionally, a set with no elements is called a null, or empty, set and is denoted 0.

If x is an element of the set A, this is written as x ∈ A, while if x is not a member of A, it is written as x ∉ A. There are two methods used to describe the contents of a set: the list method and the rule method. The list method defines the members of a set by listing each object in the set

A = {a1,a2, . . . ,an}

The rule method defines the rules that each member must adhere to in order to be considered a member of the set

A = {a | a has properties P1,P2, . . . ,Pn}

When every element in the set A is also a member of set B, then A is a subset of B

A ⊆ B

If every element in A is also in B and every element in B is also in A, then A and B are equal

A = B


If at least one element in A is not in B, or at least one element in B is not in A, then A and B are not equal

A ≠ B

The set A is a proper subset of B if A is a subset of B but A and B are not equal, i.e. A ⊆ B and A ≠ B

A ⊂ B

To present the notion that an object is a member of a set either fully or not at all, the function µ is introduced. For every x ∈ U, µ_A(x) assigns a value that determines the grade of membership of each x in the set A ⊆ U

µ_A(x) = \begin{cases} 1 & \text{if and only if } x ∈ A \\ 0 & \text{if and only if } x ∉ A \end{cases}

Therefore, µ_A maps all elements of the universal set onto the two values 0 and 1

µA : U → {0,1}

The characteristic function has two possible values in order to model the idea that, for each element in U, the statement 'x belongs to A' is either true or false. Only one of the previous statements holds, that is, the element has either a 0 or a 1 membership grade in the given set.
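A small sketch of the crisp characteristic function, with an illustrative universe of discourse and subset (both assumed purely for the example), may help fix the notation before the fuzzy generalization is introduced.

```python
# A minimal sketch of the crisp characteristic (membership) function mu_A: U -> {0, 1}.
U = set(range(10))                      # universe of discourse (assumed example)
A = {2, 3, 5, 7}                        # a crisp subset of U

def mu(A):
    """Return the characteristic function of the crisp set A."""
    return lambda x: 1 if x in A else 0

mu_A = mu(A)
print([mu_A(x) for x in sorted(U)])     # [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]

# The four basic operations of Figure 8.1 expressed through set operations:
B = {1, 2, 3}
union, intersection = A | B, A & B
complement_A, difference = U - A, A - B
```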

Figure 8.1. Venn diagrams

Using the given notation, the four basic operations that can be used on sets are shown in Figure 8.1 using Venn diagrams, and also written in set-theoretic notation. The shaded region indicates the result of applying the given function.


The four operations shown in Figure 8.1 are routinely combined to produce more complex functions. Additionally, these examples use only two sets, but union and intersection can be defined for any number of sets. This is due to the properties of the basic operations shown in Table 8.1.

Property                  Description
Commutativity             A ∪ B = B ∪ A
                          A ∩ B = B ∩ A
Associativity             (A ∪ B) ∪ C = A ∪ (B ∪ C)
                          (A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributivity            A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
                          A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Idempotence               A ∩ A = A
                          A ∪ A = A
Law of contradiction      A ∩ Ā = 0
Law of excluded middle    A ∪ Ā = U

Table 8.1. Summary of crisp set properties

Preserving these properties is important, as fuzzy sets are a generalization of classical sets and must be able to reproduce their behavior exactly.

8.1.3 Fuzzy Control

Fuzzy-set theory initially met strong resistance from the probability community, but in electrical engineering it is now widely accepted as a suitable model for the verbal classification of observations and control commands. Fuzzy-set theory and fuzzy logic have been successfully applied to industrial control problems, delivering performance levels similar to those obtained by expert human operators.

Fuzzy logic, the name of which appears on Japanese cameras, washing machines, refrigerators, and other domestic appliances, has a certain future in the design of control mechanisms. The first really exciting application of fuzzy logic was realized in 1987, when the Sendai railway started its operations. On a single North-South route of 13.6 km and 16 stations, the train glides more smoothly than any other train because of its sophisticated control system. So, fuzzy logic did not come of age at universities (Kosko, 1994) but in industry and in the commercial market. The debate between fuzzy logic and probability theory will not be solved by theoretical arguments but by the successes in industrial design, development, production, and sales.

Control systems benefit greatly from fuzzy logic because they follow the example of the human controllers who categorize their observations (the speed is rather high, low, etc.), whereafter they issue vague commands to the system under control (slow down, or accelerate slightly, etc.). A fuzzy air conditioner, for instance, employs a number of rules of the form

410

Page 426: Department of Naval Architecture, - UniNa STiDuE

8.2 – Basics of Fuzzy Logic

if temperature is cold then motor speed must be fast,

if temperature is just right then motor speed must be medium,

etc. The system checks to which of the categories 'cold', 'cool', 'just right', 'warm', or 'hot' the temperature belongs, whereafter the motor speed is properly adjusted if it does not sufficiently belong to the required category 'stop', 'slow', 'medium', 'fast', or 'blast' (Kosko, 1994). The temperature in this example is alternatively referred to as a linguistic variable which can only assume a verbally defined value.
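
To make the rule-firing idea concrete, the following minimal Python sketch fuzzifies a temperature reading and picks the motor-speed category prescribed by the most strongly firing rule. The triangular category shapes, the numerical break points, and most of the rule consequents are illustrative assumptions (only the 'cold → fast' and 'just right → medium' rules appear above), so the sketch indicates the mechanism rather than the actual controller.

def tri(x, a, b, c):
    # Triangular membership function with support (a, c) and peak at b (assumed shape)
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed linguistic categories for the temperature (degrees Celsius)
temperature = {
    'cold':       lambda t: tri(t, -15.0, 5.0, 15.0),
    'cool':       lambda t: tri(t, 5.0, 15.0, 20.0),
    'just right': lambda t: tri(t, 15.0, 20.0, 25.0),
    'warm':       lambda t: tri(t, 20.0, 25.0, 35.0),
    'hot':        lambda t: tri(t, 25.0, 35.0, 50.0),
}

# Rule base: temperature category -> prescribed motor-speed category
rules = {'cold': 'fast', 'cool': 'medium', 'just right': 'medium',
         'warm': 'slow', 'hot': 'blast'}

def motor_speed(t):
    # Fuzzification: grade of t in every temperature category,
    # then fire the rule whose antecedent holds to the highest degree
    grades = {label: mu(t) for label, mu in temperature.items()}
    best = max(grades, key=grades.get)
    return rules[best], round(grades[best], 2)

print(motor_speed(12.0))   # ('medium', 0.7) with the assumed shapes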

In general, a control system is characterized by how it transforms input quantities into output quantities. An intelligent control system yields appropriate problem-solving responses when it is faced with information which is usually imprecise. Moreover, an intelligent system learns from past experience, it generalizes from a limited number of experiences which are mostly imprecise, and it creates new input-output relationships.

8.2 Basics of Fuzzy Logic

This section starts from the mathematical model of vagueness and imprecision originally proposed by Zadeh (1965), who suspected that an ever-increasing amount of precision in mathematical modelling would lead to almost insignificant models for control systems. The processing of imprecise information is typically the domain of fuzzy logic.

8.2.1 Membership Function

A fuzzy set A is defined as a set of ordered pairs (x, µA(x)). For each pair, x signifies an element in the fuzzy set, while µA(x) represents the grade of membership x has in A.

Since Zadeh (1965) introduced the notion of fuzzy sets, one of the main difficulties has been with the meaning and measurement of the membership function. Fuzzy sets are totally characterized by their membership functions. For a sound theory of fuzzy sets a rigorous semantics together with practical elicitation methods for membership functions are necessary.

It is suitable to start with the formal (i.e., mathematical) definition of a membership function. The so-called membership function µA of the fuzzy set A models the idea that the statement x belongs to A is not necessarily true or false only. On the contrary, it denotes the grade of an element x in the set A. The membership function (Fig. 8.2) maps all elements of the universal set U into the set A with values in the continuous normalized range 0 (non-membership) to 1 (full membership), i.e.

µA : U → [0,1]

with 0 and 1 representing the lowest and highest grades (values) of membership, respectively.


Figure 8.2. A membership function

When {µA(x)} contains only the two points 0 and 1, the set A is non-fuzzy (crisp). A fuzzy set is said to be normal if max_x µA(x) = 1. Subnormal fuzzy sets can be normalized by dividing each µA(x) by max_x µA(x).

In fuzzy sets the membership grade does not need to be either 0 or 1, but can be any real numberbetween them. Loosely speaking, this is like excluding the possibility of only black or white,but accommodating all the different shades of gray in between. In fuzzy set theory a set iscompletely identified by its membership function as well as in crisp set theory. Indeed, while incrisp set theory the membership function can be any function defined in the environment set andhaving values in {0, 1}, in fuzzy sets the membership function is still defined in the universe ofdiscourse but it assumes values in [0, 1]. Thus, for every element in the universe of discourse themembership function of a fuzzy set gives its membership grade, that is

0 ≤ µA(x) ≤ 1 for any x ∈ U (8.1)

where the truth value µA(x) represents the degree of truth, subjectively assigned by a decision maker, of the statement x belongs to A.

Figure 8.3. Characteristic function and membership function

Figure 8.3 illustrates the two concepts, the characteristic function and the membership function. The interval (18, 25) on the scale of temperature is crisp, but the interval of the room temperatures where one feels comfortable is fuzzy. There is a zone of imprecision on both sides. Below 18◦C the temperature is chilly, above 25◦C the temperature tends to be uncomfortably hot. The form of the membership function depends on the individual, subjective feelings of the decision maker, however.

Since it is not easily acceptable to define a concept on the basis of subjective feelings, attemptshave been made to introduce a more objective definition of the truth value. Therefore, the truthvalue µA(x) is sometimes interpreted as the fraction of a sufficiently large number of refereesagreeing with the statement x belongs to A. Thus, it must be assumed that the fuzzy set A,despite the imprecision of its boundaries, can be delineated by subjectively associating a gradeof membership (a number between 0 and 1) with each of its elements.

As for crisp sets, a fuzzy set may be defined formally in two ways, each introduced by Zadeh (1965). The list method for a fuzzy set lists the membership grade of each element of a discrete, countable universe of discourse U to the set in question

A = ∑ᵢ₌₁ⁿ µi/xi = {µ1/x1 + . . . + µn/xn}     (8.2)

where xi denotes the i-th member of U and µi denotes the strength of membership of element xi. The use of the plus symbol to separate individual elements is a departure from standard set theory notation, which uses the comma. The plus sign in (8.2) denotes the union rather than the arithmetic sum.

To describe a fuzzy set on a continuous universe one writes

A = ∫U µA(x)/x

where µA(x) is a function that represents the grade of membership of x in A for every element x in U.

For example, if one wishes to represent speeds 'close to 50' miles per hour using a fuzzy set with a continuous (non-countable) universal set, µA(x) can be defined as

µA(x) = 1 / [1 + (x − 50)²/50]

This function, shown graphically in Figure 8.4, maps every real number into the set of speeds 'close to 50' miles per hour (mph). If one were travelling 20 mph, that would be assigned a value of 0.05, while 40 mph gets 0.33 and 50 mph gets 1. Clearly, the closer the number to 50, the higher its membership in the set. The term 'very close to 50' would be

µA(x) = {1 / [1 + (x − 50)²/50]}²


Figure 8.4. Continuous definition for ‘close to 50’ miles/hour

Alternatively, if one is dealing with a countable universe, a similar function can be defined in accordance with equation (8.2) as

A = {0.03/10 + 0.05/20 + 0.11/30 + 0.33/40 + 1.0/50 + 0.33/60 + 0.11/70 + 0.05/80 + 0.03/90}

which consists solely of points when plotted, as shown in Figure 8.5.

Figure 8.5. Countable definition for ‘close to 50’ mph
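
As a quick numerical check of the grades quoted above, the short Python sketch below (all names are arbitrary, not from the text) evaluates the continuous membership function for 'close to 50' mph and reproduces the countable version on the grid 10, 20, ..., 90.

def mu_close_to_50(x):
    # Continuous membership function used above
    return 1.0 / (1.0 + (x - 50.0) ** 2 / 50.0)

def mu_very_close_to_50(x):
    # 'very close to 50' is obtained by squaring (concentrating) the grades
    return mu_close_to_50(x) ** 2

# Countable version corresponding to the list-method definition above
A = {x: round(mu_close_to_50(x), 2) for x in range(10, 100, 10)}
print(A)   # {10: 0.03, 20: 0.05, 30: 0.11, 40: 0.33, 50: 1.0, 60: 0.33, 70: 0.11, 80: 0.05, 90: 0.03}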

The concept of a fuzzy relation is closely related to the concept of a fuzzy set. Consider, for instance, the relation R describing rough equality between two elements x and y, and use the symbol µR(x,y) to express the truth value of the expression x ≈ y. For many practical purposes, it could be set

µR(x,y) = exp[−(x − y)²]

It will be obvious that the relation R is in fact a fuzzy subset of the (x, y)-space and that µR is the corresponding membership function.


8.2.2 Formulations of Membership Functions

Useful formulations of the membership grade functions are shown in Figure 8.6.

Figure 8.6. Main types of membership functions

The type of membership grade function reflects the designer's intention regarding the value of a specific attribute.

Experience shows that the 'Nehrling type' membership grade function is well suited to concept ship design. In this respect four types are possible: attracting, ascending, averting, and descending, as shown in Figure 8.7.


Two points on a membership grade curve are important and may be defined as

• y = y1 → the level of an attribute which is 100% satisfactory, i.e. the level that may optimistically be expected to be reached by the best design with respect to the specific attribute;

• y = y1/2 = y1 − d → the level that is only 50% satisfactory, i.e. the level that may be expected in the average design.

By assigning appropriate values to y1 and d, the designers may express the aspiration level for a specific attribute. For some attributes other modifiers to the formulation may be added, such as different exponent values, asymmetric curves, etc.

Figure 8.7. Generalized Nehrling-type functions


8.2.3 Fuzzy partitioning

A number of fuzzy sets Ai form a fuzzy partition when

∑ᵢ₌₁ⁿ µAi(x) = 1     ∀ x ∈ X

The fuzzy sets Ai must all be subsets of the same universe X and none of the sets may equal 0 or U. This property is very important in fuzzy inference systems, and most fuzzy sets in use form fuzzy partitions.

One method of working with fuzzy sets is to treat them as a collection of crisp sets. This is achieved by using the concept of an α–cut. The formal definition of an α–cut is the crisp set Aα containing all the elements of A with a membership grade ≥ α

Aα = {x ∈ U | µA(x) ≥ α}

The set of all α–cuts of a fuzzy set results in a family of nested crisp subsets of U . A fuzzy setcan be completely decomposed into a number of crisp sets by creating α–cuts for each distinctmembership value in the set, as shown in Figure 8.8, where each concentric ring shows themembers of the crisp set created by performing an α–cut with the value of α given. Note thatas α increases, each ring is fully contained by rings corresponding to a lower α. Thus, α–cut setscorrespond to discarding those elements of a fuzzy set that are ‘extreme’ in the sense of having‘low’ membership in the set.

Figure 8.8. Example of an α–cut set

Every property that is valid for a crisp set is also valid for an α–cut set. This means that one method of working with fuzzy sets can be to create the appropriate number of α–cuts and use standard crisp set operations on them (Klir & Folger, 1988). When this processing has finished, the resulting crisp sets can be recombined to create a fuzzy set.


The method used to recombine a decomposed fuzzy set, whether or not operations have been performed on the crisp sets, is to multiply each member of an α–cut set by α and take the union of each resulting set, i.e.

A = ⋃α α·Bα
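
The decomposition and recombination just described can be checked with a small Python sketch on a discrete fuzzy set; the set itself and the element names are illustrative assumptions, not taken from the text.

A = {'x1': 0.2, 'x2': 0.5, 'x3': 1.0, 'x4': 0.5}

def alpha_cut(fuzzy, alpha):
    # Crisp set of elements whose membership grade is at least alpha
    return {x for x, mu in fuzzy.items() if mu >= alpha}

# Decompose: one α-cut for each distinct membership value in the set
cuts = {alpha: alpha_cut(A, alpha) for alpha in sorted(set(A.values()))}

# Recombine: scale each cut by its α and take the union (pointwise maximum)
recombined = {}
for alpha, members in cuts.items():
    for x in members:
        recombined[x] = max(recombined.get(x, 0.0), alpha)

assert recombined == A     # the decomposition is exact for a discrete set
print(cuts)                # {0.2: all four elements, 0.5: {'x2','x3','x4'}, 1.0: {'x3'}}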

8.2.4 Properties of Fuzzy Sets

The properties shown in this subsection do not provide a complete coverage of this area. Rather, the selection is limited to those properties that are required later in this chapter.

The maximum membership grade attained by the elements of a fuzzy set is called the height of the fuzzy set

height(A) = max_(x∈X) µA(x)

where X ∈ U is the set from which the members of A are drawn.

The support of a fuzzy set A is a crisp set defined as

support(A) = {x ∈ X | µA(x) > 0}

which results in a crisp set containing all members of A with non-zero membership grade.

The core of a fuzzy set A, itself another crisp subset of X, is a subset of its support

core (A) = {x ∈ X | µA(x) = 1}

Fuzzy sets are generally normalized and convex. A normalized set is one in which at least one membership value reaches the maximum permitted value, i.e. height(A) = 1. A non-normalized set is one whose maximum value does not reach 1. An example of a normalized fuzzy set is shown in Figure 8.9.

Figure 8.9. Normalized fuzzy set


Table 8.2 lists some fuzzy operators (Klir & Folger, 1988).

Property             Description
Support(A)           The support of a fuzzy set A is the crisp set containing all non-zero members of A
Normalized Set       A set is considered normalized when at least one member attains the highest possible membership value
α–cut                The crisp set Aα containing all the elements of A with a membership grade ≥ α:
                     Aα = {x ∈ U | µA(x) ≥ α} for 0 ≤ α ≤ 1
Level Set            The set of all α such that a distinct α–cut set is produced:
                     L(A) = {α | µA(x) = α for some x ∈ U}
Convex               A fuzzy set is convex iff each α–cut is convex
Scalar Cardinality   The summation of all membership grades in the fuzzy set A:
                     |A| = ∑x∈U µA(x)
A ⊆ B                If the membership grade of all members of A is less than or equal to the membership grade of the same members in B, then A is a subset of B:
                     µA(x) ≤ µB(x) ∀ x ∈ U
A = B                A is equivalent to B:
                     µA(x) = µB(x) ∀ x ∈ U
A ≠ B                A is not equivalent to B:
                     µA(x) ≠ µB(x) for at least one x ∈ U
A ⊂ B                A is a proper subset of B:
                     A ⊆ B and A ≠ B

Table 8.2. Summary of fuzzy set properties

Now, a fuzzy number is a fuzzy set A in the one-dimensional universe of discourse U when the following hypotheses are satisfied:

- the α–level subsets are intervals shrinking monotonically as α ↑ 1,

- there is at least one x ∈ U such that µA(x) = 1.

By the first requirement, the inequality α1 < α2 implies

{x | µA(x) ≥ α1} ⊃ {x | µA(x) ≥ α2}

The second requirement, that the top value of the membership function of A must be 1, seems to be reasonable. If one considers the fuzzy number a, that is, the fuzzy set of numbers which are roughly equal to the crisp number a, obviously the crisp number a belongs to the fuzzy set a so that

µA(a) = 1


In general, a fuzzy number has a membership function which increases monotonically from 0 to 1 on the left-hand side; thereafter, there is a single top or a plateau at level 1; and finally, the membership function decreases monotonically to 0 on the right-hand side.

The concept of convex fuzzy sets is introduced to define fuzzy numbers. Convex fuzzy sets are characterized by convex α–cuts. More formally, a fuzzy set is considered convex if and only if (Dubois and Prade, 1980)

µA [λx1 + (1− λ)x2] ≥ min [µA(x1),µA(x2)] ∀ x1,x2 ∈ X , ∀λ ∈ [0,1]

8.2.5 Extension Principle

First introduced by Zadeh (1965), the extension principle is one of the most important elementsof fuzzy set theory. It provides the framework necessary to extend crisp mathematical conceptsinto the fuzzy realm. This is accomplished by extending a function f that maps points in thecrisp set Ac to the crisp set Bc such that it maps between fuzzy sets A and B. The main provisionis to allow a mapping from points in the universe X to the universe Y.

A = {µ1/x1 + . . . + µn/xn}

B = f(A) = f{µ1/x1 + . . . + µn/xn} = {µ1/f(x1) + . . . + µn/f(xn)}

where A and B are fuzzy sets in X and Y respectively.

Using a modified example of speeds 'close to 50' miles per hour, and a mapping f(x) = √x

A = {0.11/30 + 0.33/40 + 1.0/50 + 0.33/60 + 0.11/70}

B = f(A) = √{0.11/30 + 0.33/40 + 1.0/50 + 0.33/60 + 0.11/70}
  = {0.11/√30 + 0.33/√40 + 1.0/√50 + 0.33/√60 + 0.11/√70}
  = {0.11/5.5 + 0.33/6.3 + 1.0/7 + 0.33/7.7 + 0.11/8.4}

If multiple x ∈ A map to the same element y ∈ f(A), the maximum membership value of elements in A is selected for the membership value of y ∈ B. If no elements map to a particular y ∈ B, the membership grade for that element is zero.
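
The following minimal Python sketch (an illustration, not part of the original text) applies the extension principle to the discrete set above with f(x) = √x, keeping the maximum grade whenever two arguments would collapse onto the same image.

from math import sqrt

A = {30: 0.11, 40: 0.33, 50: 1.0, 60: 0.33, 70: 0.11}

def extend(f, fuzzy, ndigits=1):
    # Map a discrete fuzzy set through f; if several x collapse onto the
    # same image y, keep the maximum membership grade
    B = {}
    for x, mu in fuzzy.items():
        y = round(f(x), ndigits)
        B[y] = max(B.get(y, 0.0), mu)
    return B

print(extend(sqrt, A))
# {5.5: 0.11, 6.3: 0.33, 7.1: 1.0, 7.7: 0.33, 8.4: 0.11}
# (√50 ≈ 7.07, which the text rounds to 7)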

Additionally, the fuzzy set operations such as complement, union and intersection can all be written using the extension principle. For example, the union of two fuzzy sets will be shown to be representable using the function 'max'. The extension principle can be used to map from a two-variable input space to the output space using this function, thus implementing the union operation.


8.2.6 Operations on Fuzzy Sets

The three most important operations on any set, whether crisp or fuzzy, are complement , union,and intersection. These three operations are capable of producing more complex operations whenused in combination. In the classical set theory, these operations can be defined uniquely. Infuzzy set theory these operations are no longer uniquely defined, as membership values are nolonger restricted to {0, 1} and can be in the range [0, 1]. Any definition of these operations onfuzzy sets must include the limiting case of crisp sets.

Fuzzy Complement

The least complex of the three operations, the so-called fuzzy complement describes the difference between an object and its opposite. The membership function of the fuzzy complement or negation Ā of the set A is

µĀ(x) = 1 − µA(x)     (8.3)

In the case that 0 < µA(x) < 1 it follows easily that

0 < µA∩Ā(x) = min [µA(x), 1 − µA(x)] ≤ 1/2

which implies that the law of non-contradiction is violated. To a certain degree, a fuzzy statement can be true and not-true at the same time. In other words, the overlap of A and its complement can be non-empty. Similarly, if 0 < µA(x) < 1, one has

1/2 ≤ µA∪Ā(x) = max [µA(x), 1 − µA(x)] < 1

so that the law of the excluded middle is also violated. A fuzzy statement is true or not true or both to a certain extent only. In other words, the underlap of A and its complement is not necessarily the universe of discourse U. It is easy to verify that

µA∩Ā(x) + µA∪Ā(x) = 1

so that the violations just mentioned are equal, that is, the intersection's deviation from 0 equals the union's deviation from 1. In the crisp case, when the truth values are 0 or 1 only, the classical result holds

µA∩Ā(x) = 0 and µA∪Ā(x) = 1

An example to illustrate the above concepts is given by the membership function µA(x) of the fuzzy set A of the ages where human beings are referred to as young, at least in the personal opinion of an anonymous referee. He/she may decide, for instance, that µA(1) = µA(2) = . . . = µA(20) = 1 and that there is a monotonic decrease of µA until µA(40) = µA(41) = . . . = 0.

In addition, the referee may set µA(30) = 1/2 so that a 30-year-old person is as young as he/she is not-young. That person is young and not-young with the same degree of truth 1/2, and he/she is either young or not-young or both with the same degree of truth.


Of course, one could ask how fuzzy the statement x belongs to A actually is. When µA(x) is close to 0 or 1, the statement is almost crisp, and when µA(x) is close to 1/2 the statement is very fuzzy indeed, but it is possible to be more precise. Consider the distinction between a fuzzy set and its complement. The ratio

µA∩Ā(x) / µA∪Ā(x)

has the minimum value 0 if, and only if,

µA(x) = 0 or 1

and it has the maximum value 1 if, and only if,

µA(x) = µĀ(x) = 1/2

Hence, this so-called ratio of overlap and underlap seems to be an appropriate measure for the degree of fuzziness of the statement x belongs to A.

Fuzzy Union

The concept of union of fuzzy sets can be introduced following the treatment of sets in classical set theory. The operator of fuzzy union takes two sets and returns a single set representing their union. Given the truth values

µA(x) and µB(x)

to represent the degrees of truth that an element x belongs to the respective sets A and B, the truth value

µA∪B (x)

of the statement x belongs to A, to B, or to both cannot be smaller than the maximum of the two original truth values.

In fuzzy-set theory, for each element in U the classic fuzzy union of A and B (Fig. 8.10), denoted A ∪ B, is defined as the smallest fuzzy set containing both A and B.

The membership function for A ∪B is

µA∪B (x) = max [µA(x),µB(x)] x ∈ X (8.4)

When E denotes the empty set and U the universe of discourse one has

µA∪E (x) = max [µA(x),0] = µA(x)

µA∪U (x) = max [µA(x),1] = 1


Figure 8.10. Classic fuzzy union

The property of idempotency is

µA∪A (x) = max [µA(x),µA(x)] = µA(x)

It is easy to verify that the commutativity law holds, because

µA∪B(x) = µB∪A(x)

Similarly, the associative law is

µ(A∪B)∪C(x) = µA∪(B∪C)(x)

and the distributive law is

µA∪(B∩C)(x) = µ(A∪B)∩(A∪C)(x)

The maximum operator for the union and the minimum operator for the intersection of two fuzzy sets are not necessarily interactive. The value

max [µA(x),µB(x)]

remains unchanged under small perturbations of µB(x) when

µA(x) > µB(x)

and a similar thing may happen to

min [µA(x), µB(x)]

when the inequality

µA(x) < µB(x)

holds. It is easy to verify that the above rules also hold when the sets under consideration are crisp. The maximum operator, for instance, also gives the correct answer for the union when µA(x) and µB(x) are 0 or 1 only. This is one of the boundary conditions of fuzzy logic: in the crisp case the operators must coincide with the classical operators.

The notion of a union is like the inflexible connective 'or'. Thus, if A = {fast ships} and B = {long ships}, then A ∪ B = {fast 'or' long ships}. This rigid 'or' may be softened by forming the algebraic sum of A and B, which, denoted as A + B, is defined as

µA+B(x) = [µA(x) + µB(x)]− µA(x)·µB(x) for each x ∈ X


Fuzzy Intersection

The operation of fuzzy intersection takes two sets and returns a single set representing their intersection. By analogy with the concept of union of fuzzy sets, the truth value

µA∩B (x)

of the statement x belongs to A and to B cannot be greater than the minimum of the original truth values.

Therefore, in fuzzy-set theory, the intersection of A and B (Fig. 8.11), denoted as A ∩ B, is defined as the largest fuzzy set contained in both A and B. The membership function for A ∩ B is

µA∩B (x) = min [µA(x),µB(x)] (8.5)

When E denotes the empty set and U the universe of discourse one has

µA∩E (x) = min [µA(x),0] = 0

µA∩U (x) = min [µA(x),1] = µA(x)

Figure 8.11. Classic fuzzy intersection

The property of idempotency is

µA∩A (x) = min [µA(x),µA(x)] = µA(x)

It is easy to verify that the commutativity law holds, because

µA∩B (x) = µB∩A(x)

Similarly, the associative law gives

µ(A∩B)∩C (x) = µA∩(B∩C) (x)

whereas the distributive law provides

µA∩(B∪C)(x) = µ(A∩B)∪(A∩C)(x)


The minimum operator for the intersection of two fuzzy sets is not necessarily interactive, so the value

max [µA(x),µB(x)]

remains unchanged under small perturbations of µB(x) when

µA(x) > µB(x)

and the same may happen to

min [µA(x),µB(x)]

when the inequality

µA(x) < µB(x)

holds.

The notion of an intersection is like the inflexible connective 'and'. Thus, if A is a set of fast ships and B is a set of long ships, then A ∩ B is the set of ships which are both fast 'and' long. This inflexible 'and' may be softened by forming the algebraic product of the fuzzy sets A and B. The membership function for this algebraic product, denoted as AB, is defined as

µAB(x) = µA(x)·µB(x) for each x ∈ X.
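
A compact Python sketch of the operations introduced in this subsection is given below; the two discrete sets and their interpretation as 'fast' and 'long' ships are illustrative assumptions.

A = {'s1': 0.4, 's2': 0.7, 's3': 0.2}    # e.g. degree of being a fast ship
B = {'s1': 0.9, 's2': 0.3, 's3': 0.5}    # e.g. degree of being a long ship

complement        = {x: 1.0 - A[x] for x in A}                   # formula (8.3)
union             = {x: max(A[x], B[x]) for x in A}              # formula (8.4)
intersection      = {x: min(A[x], B[x]) for x in A}              # formula (8.5)
algebraic_sum     = {x: A[x] + B[x] - A[x] * B[x] for x in A}    # softened 'or'
algebraic_product = {x: A[x] * B[x] for x in A}                  # softened 'and'

print(union)          # {'s1': 0.9, 's2': 0.7, 's3': 0.5}
print(intersection)   # {'s1': 0.4, 's2': 0.3, 's3': 0.2}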

To conclude, consider the following example (Fig. 8.12). The fuzzy set A is taken to represent the set of comfortable velocities and B the set of high velocities, so that the truth values µA(x) and µB(x) stand for the degree that the velocity x is felt to be comfortable and high respectively.

Figure 8.12. Comfortable and high velocities

Obviously, the expression

µA∩B(x)

is the degree that the velocity x is felt to be comfortable and high at the same time. The driver might aim at a velocity which is as comfortable as it is high (x = 160 km/h), so that he/she aims at the value of x which maximizes the expression

min [µA(x),µB(x)]

but this is not necessary. He/she may aim at a velocity which is higher than it is comfortable, depending on his/her preference for comfort and high-speed driving. The choice of a particular velocity, in fact a compromise between two conflicting objectives, is usually referred to as defuzzification.


Other Union and Intersection Operators

Although the maximum operator for the union of two fuzzy sets (see formula (8.4)) and the minimum operator for the intersection (see formula (8.5)) are still the most popular ones, there are other operators which satisfy certain desirable properties.

In order to introduce some of these operators, µA(x) is often interpreted as the fraction of a group of decision makers agreeing with the statement that x belongs to A. Similarly, the symbol µB(x) designates the fraction of the referees who agree with the statement that x belongs to B. Now, the truth value

µA∪B(x)

with the maximum operator for the union of A and B, can accordingly be seen to represent the fraction of the decision makers who agree with the statement that x belongs to A, to B, or to both when the referees try to agree as much as possible. This is also shown in Figure 8.13 where the fractions with the respective sizes µA(x) and µB(x) are both positioned at the left-hand side of the interval [0, 1].

Figure 8.13. Maximum agreement between referees

Similarly, the truth value

µA∩B(x)

with the minimum operator for the intersection of A and B, stands for the fraction of the decision makers who agree with the statement x belongs to A and to B, when agreement is pursued as much as possible.

Consider now the case where the decision makers disagree to the maximum extent. Then the truth value of the statement x belongs to A, to B, or to both is given by

min(1,µA(x) + µB(x))

This is illustrated in Figure 8.14, where the fractions with the sizes µA(x) and µB(x) are positioned at the left-hand side and the right-hand side of the interval [0, 1] respectively.

Figure 8.14. Maximum disagreement between referees

By a similar argument the truth value of the statement x belongs to A and to B can be written as

max [0,µA(x) + µB(x)− 1]

In the literature, the operators corresponding to the largest possible disagreement are usually referred to as the bounded-sum operators.


Finally, one can also imagine the decision makers to be independent as much as possible. Then the truth values of the union and the intersection are simply given respectively by

µA(x) + µB(x)− µA(x)·µB(x)

and

µA(x) µB(x)

and it is easy to verify that

max [µA(x),µB(x)] ≤ µA(x) + µB(x)− µA(x)·µB(x) ≤ min [1,µA(x) + µB(x)]

and

min [µA(x),µB(x)] ≥ µA(x)·µB(x) ≥ max [0,µA(x) + µB(x)− 1]

Note that the above operators coincide with the classical operators for union and intersection in the crisp case, when the truth values are 0 or 1 only.
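
The two chains of inequalities just stated can be verified numerically; the short Python check below samples truth values on a coarse grid (an illustrative sketch only, with a small tolerance for floating-point rounding).

from itertools import product

eps = 1e-12
for a, b in product([i / 10 for i in range(11)], repeat=2):
    prob_or   = a + b - a * b           # independent referees
    bound_or  = min(1.0, a + b)         # maximum disagreement
    prob_and  = a * b
    bound_and = max(0.0, a + b - 1.0)
    assert max(a, b) <= prob_or + eps and prob_or <= bound_or + eps
    assert min(a, b) + eps >= prob_and and prob_and + eps >= bound_and

print("orderings hold on the sampled grid")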

In the applications of fuzzy-set theory there is a clear but insufficiently motivated preference for the maximum and the minimum operator (see the formulas (8.4) and (8.5)) to model the union and the intersection of two fuzzy sets. In general, the question of when to apply which operator has not been solved at all. Although this is an unsatisfactory situation, in what follows the popular maximum and minimum operators will be used and the other ones will be ignored.

8.2.7 Elementhood and Subsethood

When the universe of discourse U is the one-dimensional space, the support of a fuzzy set A can be an interval, a collection of disjoint intervals, an infinite sequence, or a finite grid. For ease of exposition, attention here will be limited to fuzzy sets with a finite support. The cardinality M(A) of the fuzzy set A can now be defined by

M(A) = ∑ᵢ µA(xi)

When A happens to be a crisp set, the cardinality so defined stands for the number of elements of A so that it coincides with the classical concept of cardinality.

In the previous subsections the reader was concerned with the degree of truth that a given element x belongs to a given set A or, in other words, with the so-called elementhood E(x,A) of x with respect to A. This concept has been generalized by Kosko (1992) who introduced the so-called subsethood S(B,A) of a set B with respect to A. The subsethood stands for the degree that a given set B is a subset of another set A. Defined as the fraction of B which is contained in A, the subsethood can be written as

S(B, A) = M(B ∩ A) / M(B)


an expression which has the typical ratio form of a conditional probability, namely

P(A|B) = P(B ∩ A) / P(B)

Rewriting the subsethood of B with respect to A in the equivalent form

S(B, A) = ∑ᵢ min [µB(xi), µA(xi)] / ∑ᵢ µB(xi)

one can readily see that

S(B,A) = 0 if, and only if µB(xi)·µA(xi) = 0 ∀ i

S(B,A) = 1 if, and only if µB(xi) ≤ µA(xi) ∀ i

Thus, the subsethood of B with respect to A is 0 if, and only if, the two sets are disjoint, and the subsethood is 1 if, and only if, the set B is fully contained in A. In all other cases, it must be true that

0 < S(B,A) < 1

To a certain degree, the universe of discourse is also contained in any of its subsets. This may lead to an interesting interpretation of the concept of subsethood. Consider the case that A is a crisp subset of the universe of discourse U; then the subsethood of U with respect to A is

S(U, A) = M(U ∩ A) / M(U) = M(A) / M(U) = nA / nU

where nA and nU represent the cardinalities of A and U. Now, if U stands for a set of identical and independent probabilistic experiments and A stands for the subset of successful experiments in U, then the subsethood of U with respect to A represents the relative frequency of the successes.

Using the concept of cardinality for fuzzy sets with a finite support, the degree of fuzziness of the set A can be defined as the ratio

M(A ∩ Ā) / M(A ∪ Ā) = [∑ᵢ min (µA(xi), 1 − µA(xi))] / [∑ᵢ max (µA(xi), 1 − µA(xi))]

This expression equals 0 if, and only if, the set A is crisp. It equals 1 if, and only if,

µA(xi) = µĀ(xi) = 1/2 ∀ i

and it must have a value between 0 and 1 in all other cases. The set A clearly has the maximum degree of fuzziness in the case that each element attains the maximum degree of fuzziness.

Consider now two sets in the universe of discourse with five rank-ordered elements: the fuzzy set A and the crisp set B defined as


A = {0.4,0.7,0.2,0.9,0.3} , B = {1,0,1,1,0}

It is evident that

Ā = {0.6, 0.3, 0.8, 0.1, 0.7}          B̄ = {0, 1, 0, 0, 1}
A ∩ Ā = {0.4, 0.3, 0.2, 0.1, 0.3}      A ∪ Ā = {0.6, 0.7, 0.8, 0.9, 0.7}
B ∩ B̄ = the empty set E = {0, 0, 0, 0, 0}
B ∪ B̄ = the universe of discourse U = {1, 1, 1, 1, 1}
A ∩ B = {0.4, 0, 0.2, 0.9, 0}          A ∪ B = {1, 0.7, 1, 1, 0.3}

Subsethoods can now easily be calculated as

S(B, A) = M(B ∩ A) / M(B) = 1.5/3

S(U, A) = M(U ∩ A) / M(U) = M(A) / M(U) = 2.5/5

The degree of fuzziness of the set A is given by

M(A ∩ Ā) / M(A ∪ Ā) = 1.3/3.7

whereas the set B has the degree of fuzziness 0.
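
The values above are easy to reproduce; the short Python sketch below (an illustration, with arbitrary helper names) computes the scalar cardinality, the subsethoods and the degree of fuzziness for the two example sets.

A = [0.4, 0.7, 0.2, 0.9, 0.3]
B = [1, 0, 1, 1, 0]

def card(F):                       # scalar cardinality M(F)
    return sum(F)

def subsethood(B_, A_):            # S(B, A) = M(B ∩ A) / M(B)
    return sum(min(b, a) for b, a in zip(B_, A_)) / card(B_)

U = [1] * len(A)
fuzziness = (sum(min(a, 1 - a) for a in A) /
             sum(max(a, 1 - a) for a in A))

print(subsethood(B, A))   # ≈ 0.5  (= 1.5/3)
print(subsethood(U, A))   # ≈ 0.5  (= 2.5/5)
print(fuzziness)          # ≈ 0.35 (= 1.3/3.7)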

8.2.8 Fuzzy Numbers

In daily conversation qualitative terms are frequently used with a quantitative connotation. Imprecise numbers are also used: the distance between Trieste and Koper is roughly twenty kilometers, the annual number of road victims is roughly seven thousand, etc. Vagueness or imprecision can be observed. One of the interesting features of fuzzy logic is that it can model such imprecise information via the concept of fuzzy numbers and that it can process these numbers via the convenient arithmetic operations introduced by Dubois and Prade (1980).

Fuzzy numbers are one of various ways to express imprecision of design variables, parameters and behavior variables (objective functions and constraints). Indeed, imprecision in the design process is the imprecision of given values which are thought to have deterministic characters rather than stochastic ones. Thus, it is more rational to treat imprecision of design variables and parameters via fuzzy numbers.

There are several types of fuzzy numbers. Among them are the very simple LR-fuzzy numbers, which are formed by three real numbers and two shape functions as follows

x = (xl,x,xu)


Triangular fuzzy numbers

Triangular fuzzy numbers are the subclass of fuzzy numbers with a triangular membership function. A triangular fuzzy number a is characterized by three parameters: the lower value al, the modal value am, and the upper value au. The interval (al, au) constitutes the basis of the triangle, and am is the position of the top (Fig. 8.15), with al < am < au. The modal value am coincides with the crisp value a.

The length au − al of the basis (the width of the fuzzy number) depends on the actual circumstances. Thus, if one talks about the triangular fuzzy number roughly equal to twenty to designate the distance between Trieste and Koper in kilometers, the width may be ten or twenty percent of the modal value, but if one talks about the result of a scientific experiment with highly accurate equipment, the width may be a few per mille of the modal value only.

Figure 8.15. Triangular fuzzy number

The membership function of the triangular fuzzy number a is defined by

µA(x) = (x − al)/(am − al)     if al ≤ x ≤ am

on the left-hand side, and by

µA(x) = (au − x)/(au − am)     if am ≤ x ≤ au

on the right–hand side, whereas it is 0 elsewhere.

From now on triangular fuzzy numbers will be denoted as ordered triples, so that they will simply be written as

a = (al,am,au)

Trapezoidal fuzzy numbers

Triangular fuzzy numbers are easy to use, but sometimes a more sophisticated model is neededto work with imprecise quantities. Then one can also resort to trapezoidal fuzzy numbers which,having a plateau at the top value 1, are characterized by four parameters. Using the behaviorof the α-level sets one can easily define arithmetic operations which (sometimes approximately)preserve the trapezoidal shape of the membership function. Although the class of trapezoidalfuzzy numbers is more general than the class of triangular fuzzy numbers, cognitive economy isan incentive to use triangular fuzzy numbers in real–life applications.


8.2.9 Operations on Fuzzy Numbers

Fuzzy operations are defined using Zadeh's extension principle, which provides a method for extending non-fuzzy mathematical operations to deal with fuzzy sets and fuzzy numbers.

Addition of triangular fuzzy numbers

A heuristic argument is first presented to make it plausible that the sum of two triangular fuzzy numbers

a = (al,am,au) and b = (bl,bm,bu)

is given by the triangular fuzzy number

a + b = (al + bl,am + bm,au + bu) (8.6)

Let x be a point in the α–level set

[al + α(am − al),au − α(au − am)]

of a. This means that the statement x is roughly equal to a has at least the truth value α. Similarly, let y be a point in the α-level subset

[bl + α(bm − bl), bu − α(bu − bm)]

of b. This means that the statement y is roughly equal to b also has at least the truth value α. Then the statement z = x + y is roughly equal to a + b has at least the truth value α, so that z must be in the α–level set of the sum of a and b. In other words, when x and y vary over the respective α–level sets just mentioned, then z varies over the interval

[(al + bl) + α ((am + bm)− (al + bl)),(au + bu)− α ((au + bu)− (am + bm))]

This must be the α–level set of the sum of a and b. On the other hand, this is precisely the α–level set of the triangular fuzzy number

(al + bl,am + bm,au + bu)

Hence, the membership function of the sum of a and b has a triangular shape. In other words, the addition is exact in the behavior of the parameters and exact in the shape of the membership function.

Multiplication of triangular fuzzy numbers

The product of two fuzzy numbers is not exactly triangular. If the fuzzy numbers a and b are considered again, now under the additional hypothesis that the lower values (and hence the other parameters as well) are positive, then the α–level set of their product is the interval

[(al + α(am − al))× (bl + α(bm − bl)),(au − α(au − am))× (bu − α(bu − bm))]


However, the α–level set of the triangular fuzzy number

(al bl,am bm,au bu)

is given by the interval

[(albl + α(ambm − albl)),(aubu − α(aubu − ambm))]

On the left-hand side (the right-hand side can be analyzed in a similar way) the deviation from the triangular shape is given by

α(ambl + albm − albl − ambm) + α²(am − al)(bm − bl)

and the maximum deviation −0.25 (am − al) (bm − bl) is found at α = 0.5.

In what follows, however, the deviation will be ignored, and the product of a and b will simply be written as

a× b = a b = (albl,ambm,aubu) (8.7)
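
A parameter-level Python sketch of these two rules is given below (illustrative only; the function and variable names are not from the text).

def tfn_add(a, b):
    al, am, au = a
    bl, bm, bu = b
    return (al + bl, am + bm, au + bu)      # exact, formula (8.6)

def tfn_mul(a, b):                          # requires positive lower values
    al, am, au = a
    bl, bm, bu = b
    return (al * bl, am * bm, au * bu)      # approximate, formula (8.7)

a = (9, 10, 18)
b = (1, 3, 7)
print(tfn_add(a, b))   # (10, 13, 25)
print(tfn_mul(a, b))   # (9, 30, 126)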

Addition of arbitrary fuzzy numbers

Consider again two fuzzy numbers a and b, not necessarily with a triangular membership function. Using the analogy with the convolution in probability theory, the membership function of the sum of the two fuzzy numbers is defined by

µa+b(z) = max_(x+y=z) [min (µa(x), µb(y))]     (8.8)

Thus, in order to evaluate the membership function of the sum of a and b at the point z, all pairs (x, y) adding up to z are considered, such that x is roughly equal to a and y roughly equal to b. This idea leads to the so-called extension principle.

Addition of triangular fuzzy numbers

One can now use the extension principle underlying (8.8) to demonstrate that the sum of two triangular fuzzy numbers a and b is also triangular and that it satisfies formula (8.6). The left-hand sides of the respective membership functions are considered and the points

xα ∈ (al,am) and yα ∈ (bl,bm)

are chosen such that

µa(xα) = µb(yα) = α     with 0 < α < 1

as well as a point

zα ∈ (al + bl, am + bm)

selected such that

µa+b(zα) = α


The points xα, yα and zα are unique because of the strictly monotonic behavior of the membership functions in question. Let

z∗ = xα + yα

Two arbitrary points x ∈ (al, am) and y ∈ (bl, bm) are now considered such that x + y = z*. Obviously, if x < xα, then y > yα, which implies

min [µa(x),µb(y)] < α

This inequality also holds when x > xα and y < yα. Moreover,

min [µa(xα),µb(yα)] = α

whence

max_(x+y=z*) [min (µa(x), µb(y))] = α

In other words

µa+b(z*) = α

so that one can write zα = z* = xα + yα. The observation that zα is now clearly a linear function of α completes the proof that addition preserves the triangular shape of the membership function.

Multiplication of arbitrary fuzzy numbers

For the product of two arbitrary fuzzy numbers a and b with positive support, the membership function is defined by

µab(z) = max_(xy=z) [min (µa(x), µb(y))]     (8.9)

Multiplication does not preserve the triangular shape of the membership function if the factors are triangular, as was seen in formula (8.7).

Functions of arbitrary fuzzy numbers

In general, the membership function of any function f(a,b) of two fuzzy numbers a and b can now be defined by

µf(a,b)(z) = max_(f(x,y)=z) [min (µa(x), µb(y))]

This illustrates the extension principle: the union of the pairs (x,y) is considered such that f(x,y) = z, where x is roughly equal to a and y roughly equal to b, in order to evaluate the membership function of f(a,b) at the point z.


In what follows, a simple set of arithmetic rules will be introduced to operate with triangular fuzzy numbers. In fact, the reader will only operate on the lower, the modal, and the upper values characterizing them.

Subtraction of triangular fuzzy numbers

The difference between two triangular fuzzy numbers a = (al, am, au) and b = (bl, bm, bu) is given by the triangular fuzzy number

a− b = (al − bu,am − bm,au − bl) (8.10)

an assertion which can easily be verified by inspection of the behavior of the α–level sets. Note that the solution of the equation

b + x = a

is given by

(al − bl,am − bm,au − bu)

which only represents a triangular fuzzy number if the three parameters have an increasing order. In fuzzy arithmetic, there is clearly a distinction between the implicit and the explicit solution of an equation. In order to clarify the issue, consider the fuzzy numbers a = (9,10,18) and b = (1,3,7). Then

a− b = (2,7,17)

and the solution x to the above equation would be given by (8, 7, 11). The last-named triple does not represent a fuzzy number, however. This has important implications.

The sum of a triangular fuzzy number and its fuzzy opposite number is given by

(al − au,am − am,au − al) = (al − au,0,au − al)

which can be taken to represent roughly zero. In general, two triangular fuzzy numbers with opposite modal values could be opposite fuzzy numbers if their sum is roughly zero in the actual context. The equation

(al,am,au) + (xl,xm,xu) = (0,0,0)

with exactly zero on the right-hand side, has no solution, however, because the parameters of the triple

(−al,− am,− au)

do not have the increasing order which is required for fuzzy numbers. In general, a triangular fuzzy number does not therefore have a proper opposite number.

Division of triangular fuzzy numbers

Although division does not preserve triangularity, the ratio of two triangular fuzzy numbers a and b can be written in the form of the triangular fuzzy number


a/b = (al/bu, am/bm, au/bl)

provided that the lower values of a and b are positive.

The solution of the equation

b x = a

is approximately given by

(al/bl, am/bm, au/bu)

which indeed stands for a triangular fuzzy number if the three parameters have an increasing order. Again, there is a distinction between the implicit and the explicit solution of an equation, and there are important implications.

The product of a triangular fuzzy number and its fuzzy inverse is given by

(al, am, au) × (1/au, 1/am, 1/al) = (al/au, am/am, au/al) = (al/au, 1, au/al)

which can be taken to stand for roughly one. In general, two triangular fuzzy numbers with inverse modal values could be fuzzy inverse numbers if their product is roughly 1 in the actual context. The equation

(al,am,au)× (xl,xm,xu) = (1,1,1)

with exactly one on the right-hand side, does not have a fuzzy solution because the parameters of the triple

(1/au, 1/am, 1/al)

do not have the increasing order which is required for fuzzy numbers. In general, a triangular fuzzy number does not therefore have a proper inverse.

Maximum of triangular fuzzy numbers

The maximum of two triangular fuzzy numbers is not necessarily triangular, but one can write

max [(al,am,au),(bl,bm,bu)] = [max (al,bl), max (am,bm), max (au,bu)] (8.11)

The membership function of the maximum (the thick line) is shown in Figure 8.16. The correctness can be verified via inspection of the behavior of the α–level sets.


Figure 8.16. The maximum of two triangular fuzzy numbers
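
Continuing the sketch given after formula (8.7), the remaining parameter-level rules can be written as follows (again an illustration under the stated assumptions, with the deviations from triangularity ignored).

def tfn_sub(a, b):                  # formula (8.10)
    al, am, au = a
    bl, bm, bu = b
    return (al - bu, am - bm, au - bl)

def tfn_div(a, b):                  # requires positive lower values
    al, am, au = a
    bl, bm, bu = b
    return (al / bu, am / bm, au / bl)

def tfn_max(a, b):                  # formula (8.11)
    return tuple(max(x, y) for x, y in zip(a, b))

a = (9, 10, 18)
b = (1, 3, 7)
print(tfn_sub(a, b))   # (2, 7, 17), as in the worked example above
print(tfn_div(a, b))   # (9/7, 10/3, 18/1) ≈ (1.29, 3.33, 18.0)
print(tfn_max(a, b))   # (9, 10, 18)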

Exponentials, logarithms, and inverses

Functions of a single triangular fuzzy variable can also be defined via the extension principle. This leads to simple operations like the calculation of exponentials which can be written in the form

exp [(al,am,au)] = [exp(al), exp(am), exp(au)] (8.12)

whereby it is tacitly assumed that the deviations from triangularity may be ignored. Under the condition al > 0 one can also write

ln (al, am, au) = [ln(al), ln(am), ln(au)]     (8.13)

and

1/(al, am, au) = (1/au, 1/am, 1/al)

whereby it is also tacitly ignored that triangularity is not exactly preserved.

8.3 Fuzzy SMART

The simple multiattribute rating technique (SMART, Von Winterfeldt and Edwards, 1986) is amethod for multicriterial decision making (MCDM) whereby the decision maker evaluates a finitenumber of decision alternatives under a finite number of performance criteria. The purpose ofthe analysis is to rank the alternatives in a subjective order of preference and, if possible, torate the overall performance of the alternatives via the proper assignment of numerical grades orscores. SMART is first presented in its deterministic form, regardless of the vagueness of humanpreferential judgement, and thereafter a fuzzy variant is discussed which can easily be used for asensitivity analysis of the results. As a vehicle for discussion an example is employed to illustratethe applications of MCDM: the evaluation and the selection of a class of vessels.

8.3.1 Screening Phase

MADM starts with the so-called screening phase which proceeds via several categorizations. What is the objective of the decision process? Who is the decision maker or what is the composition of the decision-making group? What are the performance criteria to be used in order to judge the alternatives? Which alternatives are feasible or not totally unfeasible?


Throughout the decision process new alternatives may appear, new criteria may emerge, old ones may be dropped, and the decision-making group may change. Many decision problems are not clear-cut, and the decision makers have to find their way in the jungle of conflicting objectives.

The result of the screening phase is the so–called design matrix which exhibits the performanceof the alternatives. Under the so–called quantitative or measurable criteria the performance isrecorded in the original physical or monetary units. Under the qualitative criteria it can only beexpressed in verbal terms. Table 8.3 shows such a possible performance tableau for the vesselselection example. The tacit assumption is that the alternatives are in principle acceptable for thedecision makers and that a weak performance under some attributes (criteria) can be compensatedby an excellent performance under some of the remaining ones. In other words, the decisionmakers are in principle prepared to trade–off possible deficiencies of the alternatives under someattributes against possible benefits elsewhere in the performance matrix. The alternatives whichdo not appear in the matrix have been dropped from consideration because their performanceunder at least one of the attributes was beyond certain limits (crisp constraints).

Table 8.3. Decision matrix of four vessels (A1, A2, A3, A4) under seven criteria

The importance of the tableau cannot be overestimated. In many situations, once the data are on the table, the preferred alternative clearly emerges and the decision problem can easily be solved. It is left to the decision makers to arrive at a compromise solution.

Given the performance matrix, the next question is how to select the attributes which are reallyrelevant. The number of attributes might be too large, and they are not independent. For exam-ple, the acquisition cost, the fuel consumption, and the costs for maintenance are closely related.Hence, the decision maker could take the estimated annual expenditures or just the buildingcost to represent the costs in the selection problem. Similarly, low accelerations and the absenceof noise and vibrations contribute to the comfort on board, which could be the real attribute.Nevertheless, measurable criteria usually help the decision makers to remain down to earth sothat they are not swept away by the nice design of a general arrangement plan, for instance.

Finally, the decision makers have to convert the data of the design matrix into subjective values expressing their preferential judgement. For the qualitative criteria they usually have an arithmetic scale only to express their assessment of the performance. The seven-point scale 1, . . . , 7, which is well-known in the behavioral sciences, and the scale 4, . . . , 10, which can easily be used for the same purposes, will extensively be discussed in the subsections to follow. Under the quantitative criteria the conversion is also non-trivial. To this end, a simple and straightforward conversion procedure is proposed below, which derives many arguments from the behavioral sciences and from psycho-physics.

8.3.2 Categorization of a Range

Consider the subjective evaluation of vessels in the vessel selection problem, first under the build-ing cost criterion, thereafter under the operability criterion and the maximum speed criterion.This will enable the decision maker to illustrate not only the subdivision of the ranges of accept-able performance data but also the generation of judgemental categories (cost categories, . . .).For the time being, the problem is considered from the viewpoint of a single decision maker only.

Vessels under the cost criterion

Usually, low costs are important for a decision maker so that he/she should carefully consider the building cost and possibly the annual manning costs. The building cost as such, however, cannot tell whether a given vessel would be more or less acceptable. That depends on the context of the decision problem, that is, on the spending power of the decision maker and on the alternative vessels which he/she seriously has at disposal. In what follows it will be assumed that the acceptable costs are anchored between a minimum cost Cmin to be paid anyway for the type or class of vessels which the decision maker seriously considers and a maximum cost Cmax which he/she cannot or does not really want to exceed. Furthermore, it is assumed that the decision maker will intuitively subdivide the cost range (Cmin, Cmax) into a number of subintervals which are felt to be subjectively equal. The grid points Cmin, Cmin + e0, Cmin + e1, . . . are taken to denote the cost levels which demarcate these subintervals. The cost increments e0, e1, e2, . . . represent the echelons of the so-called category scale under construction. In order to model the requirement that the subintervals must subjectively be equal, one should recall Weber's law stating that the just noticeable difference ∆s in stimulus intensity must be proportional to the actual stimulus intensity itself. The just noticeable difference is the smallest possible step when the decision maker moves from Cmin to Cmax, which is assumed to be practically the step carried out in the construction of the model. Thus, taking the cost increment above Cmin as the stimulus intensity, i.e. assuming that the decision maker is not really sensitive to the cost as such but to the excess above the minimum cost to be paid anyway for the vessels under consideration, it is set

represent the echelons of the so–called category scale under construction. In order to model therequirement that the subintervals must subjectively be equal, one should recall Weber’s statingthat the just noticeable difference ∆s in stimulus intensity must be proportional to the actualstimulus intensity itself. The just noticeable difference is the smallest possible step when thedecision maker moves from Cmin to Cmax, which is assumed to be practically the step carried outin the construction of the model. Thus, taking the cost increment Cmin as the stimulus intensity,i.e. assuming that the decision maker is not really sensitive to the cost as such but to the excessabove the minimum cost to be paid anyway for the vessels under consideration, it is set

eν − eν−1 = ε·eν−1 ν = 1,2, . . .

which yields

eν = (1 + ε) eν−1 = (1 + ε)² eν−2 = . . . = (1 + ε)^ν e0

Obviously, the echelons constitute a sequence with geometric progression. The initial step is e0 and (1 + ε) is the progression factor. The integer-valued parameter ν is chosen to designate the order of magnitude of the echelons.


The number of subintervals is rather small because human beings have the linguistic ability to use a small number of verbal terms or labels in order to categorize the costs (cognitive economy). The following qualifications are commonly used as category labels here: 'cheap', 'cheap - somewhat more expensive', 'somewhat more expensive', 'somewhat more - more expensive', 'more expensive', 'more - much more expensive', 'much more expensive'.

Thus, there are four major, linguistically distinct categories: cheap, somewhat more, more, and much more expensive vessels. Moreover, there are three so-called threshold categories between them which can be used when the decision maker hesitates between the neighboring qualifications. Now it is necessary to link the cost categories with the cost levels Cmin + e0, Cmin + e1, . . .

The next subsection will show that human beings follow a uniform pattern in many unrelated areas when they subdivide a particular range into subjectively equal subintervals. They demarcate the subintervals by a geometric sequence of six to nine grid points corresponding to major and threshold echelons, and the progression factor is roughly 2. Sometimes there is a geometric sequence with grid points corresponding to major echelons only, and the progression factor is roughly 4.

Take, for instance, the range between MU 20,000 and MU 40,000, where MU denotes the monetary unit, for small to mid-size vessels. The length of the range is MU 20,000. Hence, setting the cost level Cmin + e6 at Cmax one has

e6 = Cmax − Cmin

e0 (1 + ε)⁶ = 20,000 and (1 + ε) = 2 ⇒ e0 = 20,000/64 ≈ 300

Now, the cost levels are associated with the cost categories as follows:

C0 = Cmin + e0   MU 20,300   cheap vessels
C1 = Cmin + e1   MU 20,600   cheap - somewhat more expensive vessels
C2 = Cmin + e2   MU 21,200   somewhat more expensive vessels
C3 = Cmin + e3   MU 22,500   somewhat more - more expensive vessels
C4 = Cmin + e4   MU 25,000   more expensive vessels
C5 = Cmin + e5   MU 30,000   more - much more expensive vessels
C6 = Cmin + e6   MU 40,000   much more expensive vessels

Thus, the cost range (Cmin, Cmax) has been 'covered' by the grid with the geometric sequence of points

Cν = Cmin + (Cmax − Cmin) × 2^ν/64     ν = 1, 2, . . . , 6     (8.14)

In what follows Cν is taken to stand for the ν-th cost category and the integer-valued parameter ν for its order of magnitude, which is given by

ν = log2 [(Cν − Cmin)/(Cmax − Cmin) × 64]     (8.15)


Categorization of the costs means that each cost in or slightly outside the range (Cmin, Cmax)is supposed to ‘belong’ to a particular category, namely the category represented by the nearestCν . Of course, categorization can more appropriately be modelled via fuzzy–set theory. This willbe considered in the subsection 8.3.5. The vessels of the category C0 are referred as the cheapones within the given context, and the vessels of the categories C2, C4, and C6 as the somewhatmore, more, and much more expensive ones. At the odd–numbered grid points C1, C3, and C5,the decision maker hesitates between two adjacent gradations of expensiveness. If necessary, onecan also introduce the category C8 of vastly more expensive vessels which are situated beyondthe range, as well as the category C7 if the decision maker hesitates between much more andvastly more expensiveness. The even–numbered grid points are the so-called major grid pointsdesignating the major gradations of expensiveness. They constitute a geometric sequence in therange (Cmin, Cmax) with progression factor 4. If the decision maker also takes into account theodd-numbered grid points corresponding to hesitations, he/she has a geometric sequence of majorand threshold gradations with progression factor 2.

Figure 8.17. Categorization of a cost range

The crucial assumption here is that the decision maker considers the costs from the so–calleddesired target Cmin at the lower end of the range of acceptable costs. From this viewpoint he/shelooks at less favorable alternatives. That is the reason why the above categorization, in principlean asymmetric subdivision of the range under consideration, has an orientation from the lowerend. The upward direction is typically the line of sight of the decision maker under the costcriterion. Figure 8.17 shows the concave form of the relationship between the cost echelons onthe interval (Cmin, Cmax) and their order of magnitude.

Suppose that the costs of the vessels Aj and Ak belong to the categories represented by Cνj and Cνk, respectively. The relative preference for Aj with respect to Ak is expressed by the inverse ratio of the cost increments above Cmin, which can be written as

(Cνk − Cmin)/(Cνj − Cmin) = 2^(νk−νj)     (8.16)


By this definition, a vessel in the cost category C0 is 4 times more desirable than a vessel in the category C2. The first–named vessel is said to be somewhat cheaper, the last–named vessel somewhat more expensive. Hence, assuming that the decision maker also has a limited number of labels to express relative preference in comparative judgement, we identify the ratio 4:1 with weak preference. Similarly, he/she identifies the ratio 16:1 with definite preference (the first–named vessel is cheaper, the last–named vessel more expensive), and a ratio of 64:1 with strong preference (the first–named vessel is much cheaper, the last–named vessel much more expensive). The relative preference depends strongly on Cmin and weakly on Cmax. When Cmax increases, two costs which initially belong to different cost categories will tend to belong to the same one.

Vessels under the operability criterion

Numerical data to estimate the operability of vessels are usually available. Suppose that the decision maker only considers vessels with an operability of at least Omin = 95%, so that he/she is restricted to the interval (Omin, Omax) with Omax usually set to 100%. Following the mode of operation just described, the decision maker obtains the major grid points (the major categories of operability)

O0 = Omax − e0 = 99.9 %   operable vessels
O2 = Omax − e2 = 99.7 %   somewhat less operable vessels
O4 = Omax − e4 = 98.7 %   less operable vessels
O6 = Omax − e6 = 95.0 %   much less operable vessels

because e0 = (100 − 95)/64 ≈ 0.08. In general one can write

Oν = Omax − (Omax − Omin)×2^ν/64 ,   ν = 0,1,...,6

The alternatives are compared with respect to the desired target, which is here taken to be at the upper end Omax of the range of acceptable operabilities. The relative preference is inversely proportional to the distance from the target. If one takes the symbols Oνj and Oνk to denote the operability of the alternative vessels Aj and Ak respectively, then the inverse ratio

(Omax − Oνk)/(Omax − Oνj) = 2^(νk−νj)

represents the relative preference for Aj with respect to Ak under the operability criterion. The qualification 'somewhat more operable' implies that the inverse ratio of the distances to the respective target is 4:1; the qualification 'more operable' implies that the inverse ratio is 16:1, etc. The relationship between the order of magnitude ν and the operability category Oν takes the explicit form

ν = log2 [ (Omax − Oν)/(Omax − Omin) × 64 ]     (8.17)

The typical relationship between the echelons on the dimension of operability and their orders of magnitude is shown in Figure 8.18.


Figure 8.18. Categorization of an operability range

Vessels under the maximum speed criterion

It may happen that the categorization starts not from the desired target at one end of the range, but from the opposite end point, because the desired target is hazy. An example is given by the categorization of the maximum velocities. The range of acceptable maximum velocities has a clear lower end point Vmin at 14.0 kn. Even if the shipowner should consistently prefer higher maxima to lower ones, the desired target is difficult to specify. Set Vmax at 22.0 kn. It seems reasonable to choose the orientation from Vmin, so that the following major grid points exist:

V0 = Vmin + e0 = 14.1 kn   slow vessels
V2 = Vmin + e2 = 14.5 kn   somewhat faster vessels
V4 = Vmin + e4 = 16.0 kn   faster vessels
V6 = Vmin + e6 = 22.0 kn   much faster vessels

In general one has

Vν = Vmin + (Vmax − Vmin)×2^ν/64 ,   ν = 0,1,...,6

where the order of magnitude ν and the category Vν are connected by the relation

ν = log2 [ (Vν − Vmin)/(Vmax − Vmin) × 64 ]     (8.18)

The relative maximum speed of two alternative vessels Aj and Ak, with maximum speeds Vνj and Vνk respectively, is given by the ratio

eνj/eνk = (Vνj − Vmin)/(Vνk − Vmin) = 2^(νj−νk)     (8.19)

not by the inverse ratio of the echelons as in the previous cases. The choice of the orientation is left to the decision maker. What matters is his/her perspective on the decision problem.
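As a computational aside (added here for illustration, not part of the original text), the categorization formulas (8.15), (8.17) and (8.18) and the relative-preference ratios are easy to evaluate. The following Python sketch, with illustrative function names and the ranges quoted above, computes the order of magnitude ν of a cost, an operability and a maximum speed, and the relative preference of two alternatives under the cost criterion.

import math

def order_of_magnitude(distance_to_reference, range_length):
    # nu = log2(distance/range * 64); cf. eqs. (8.15), (8.17), (8.18)
    return math.log2(distance_to_reference / range_length * 64.0)

# Cost criterion: the desired target is the lower end C_min = MU 20,000
C_min, C_max = 20000.0, 40000.0
nu_cost = lambda c: order_of_magnitude(c - C_min, C_max - C_min)

# Operability criterion: the desired target is the upper end O_max = 100%
O_min, O_max = 95.0, 100.0
nu_oper = lambda o: order_of_magnitude(O_max - o, O_max - O_min)

# Maximum speed: orientation from the lower end V_min = 14 kn
V_min, V_max = 14.0, 22.0
nu_speed = lambda v: order_of_magnitude(v - V_min, V_max - V_min)

# Relative preference for A_j (MU 25,000) over A_k (MU 30,000), eq. (8.16)
nu_j, nu_k = nu_cost(25000.0), nu_cost(30000.0)
print(nu_j, nu_k)              # 4.0 and 5.0: 'more' and 'more/much more expensive'
print(2.0 ** (nu_k - nu_j))    # 2.0: a threshold gradation of preference for A_j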


8.3.3 Assessing the Alternatives: Direct Rating

When decision makers judge the performance of the alternatives, they frequently express their judgement by choosing an appropriate value between a predetermined lower limit for the worst alternative and a predetermined upper limit for the best alternative. In schools and universities this direct–rating procedure is known as the assignment of grades expressing the performance of the pupils or students on a category scale with equidistant steps, between 1 and 5, between 1 and 10, or between 1 and 100. The upper limit varies from country to country. Because everybody has once been subject to his/her teacher's judgement, the grades have a strong qualitative connotation which can successfully be used in MADM. Concentrating on the scale between 1 and 10, suppose that a unit step difference represents an order of magnitude difference in performance. A student who scores 9 is an order of magnitude better than a pupil scoring 8, etc. A unit step difference designates a performance ratio 2. Returning to the vessel selection problem, the following grades are assigned

10   excellent   order of magnitude ν = 0
 8   good        order of magnitude ν = 2
 6   fair        order of magnitude ν = 4
 4   poor        order of magnitude ν = 6

according to the major gradations of expensiveness and operability (in pass–or–fail decisions at schools the grades 1, 2, and 3 are normally used for a very poor performance that cannot be compensated elsewhere, so that they are mostly ignored here). Now, considering two alternative vessels Aj and Ak under the building cost criterion with the respective grades

gj = 10 − νj   and   gk = 10 − νk

assigned to them, the inverse ratio

eνk/eνj = (Cνk − Cmin)/(Cνj − Cmin) = 2^(gj−gk)

is taken to stand for their relative expensiveness. The relative operability of the two alternatives is scored in a similar way. Figure 8.19 and Figure 8.20 illustrate the relationship between judgemental categories, orders of magnitude, and grades.

For maximum speed a somewhat different assignment of grades is suitable. One takes

gj = 4 + νj   and   gk = 4 + νk

since the decision maker does not have an orientation from the desired target but from the opposite end of the range. Thus, the relative maximum speed of the two alternatives is expressed by the ratio

eνj/eνk = (Vνj − Vmin)/(Vνk − Vmin) = 2^(gj−gk)

not by the inverse ratio of maximum velocities above Vmin.


Figure 8.19. Categorization of a cost range

Figure 8.20. Categorization of an operability range

Because a decision maker works in fact with differences of grades only, there is an additive degree of freedom in the grades which enables him/her to replace the scale 4,...,10 by the scale 1,...,7, a scale which is well–known in the behavioral sciences. Similarly, one can convert the qualitative scale ranging from − − − to + + + into the quantitative scale 4,...,10. Thus, the decision maker has a variety of scales to express his judgement, but there is a uniform approach to analyze the responses.

The direct–rating procedure is illustrated via the vessel selection problem, under the assumption that the decision maker considers acquisition cost in the range between MU 20,000 and MU 40,000, operability between 95% and 100%, and maximum velocities between 14 kn and 22 kn; within these ranges the judgement is as shown in Table 8.4.


Cost (MU)   Operability (%)   Speed (kn)   Performance       Grade   Qual. scale
20,300          99.9             22.0      Excellent           10      + + +
20,600          99.8             18.0      Good/Excellent       9      + +
21,200          99.7             16.0      Good                 8      +
22,500          99.2             15.0      Fair/Good            7      0
25,000          98.7             14.5      Fair                 6      −
30,000          97.5             14.3      Poor/Fair            5      − −
40,000          95.0             14.1      Poor                 4      − − −

Table 8.4. Assignment of grades with predetermined range

Such an assignment of grades is feasible when the performance of the alternatives can be expressed in physical or monetary units on a one-dimensional scale. A direct–rating procedure is also used, however, when the performance can only be expressed in qualitative terms. In the evaluation of vessels under the criterion of comfort, for instance, the decision maker is asked to rate the vessels straightaway. First, he/she has to determine the endpoints, being requested to identify the worst and the best alternative and to assign proper grades to them. Thereafter he/she can interpolate the remaining alternatives between the endpoints. This procedure also explains the name of the considered method: the simple direct–rating technique SMART to assess a number of alternatives under a multiple set of attributes or criteria.
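A short sketch (again an illustration added here, not from the original text) shows how the grades of Table 8.4 follow from the orders of magnitude: g = 10 − ν for cost and operability, where the orientation is from the desired target, and g = 4 + ν for maximum speed, where the orientation is from the opposite end of the range.

import math

def nu(distance_to_reference, range_length):
    return math.log2(distance_to_reference / range_length * 64.0)

def grade_cost(c, c_min=20000.0, c_max=40000.0):
    return 10.0 - nu(c - c_min, c_max - c_min)          # g = 10 - nu

def grade_operability(o, o_min=95.0, o_max=100.0):
    return 10.0 - nu(o_max - o, o_max - o_min)          # g = 10 - nu

def grade_speed(v, v_min=14.0, v_max=22.0):
    return 4.0 + nu(v - v_min, v_max - v_min)           # g = 4 + nu

# Rounded to the nearest integer these reproduce the grade column of Table 8.4
print(round(grade_cost(25000.0)))        # 6  (fair)
print(round(grade_operability(98.7)))    # 6  (fair)
print(round(grade_speed(16.0)))          # 8  (good)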

8.3.4 Criterion Weights and Aggregation

Some notation and terminology are introduced first. Consider a finite number of alternatives Aj (j = 1,...,n), under a finite number of performance criteria Ci (i = 1,...,m), with the respective criterion weights ci (i = 1,...,m). Furthermore, it is assumed that the criterion weights are normalized so that they sum up to 1. The decision maker assessed the alternatives under each of the criteria separately. Moreover, the decision maker expressed his/her judgement of alternative Aj under criterion Ci by the assignment of the grade gij, which from now on will be called the impact grade. So far, the decision maker has been working on different dimensions: building cost, operability, and maximum speed. Judgemental statements like 'somewhat more expensive' and 'somewhat more operable' cannot be aggregated, however, unless a transition is made to the new, common dimension of desirability or preference intensity. That is the reason why the expression 'somewhat more operable', for instance, is taken to stand for 'somewhat more desirable' or 'weakly preferred' under the operability criterion. Similarly, it is assumed that the expression 'somewhat more expensive' stands for 'somewhat less desirable' under the building cost attribute, etc.

In order to judge the overall performance of the alternatives under all criteria simultaneously we calculate the final grades sj of the respective alternatives Aj according to the so–called arithmetic–mean aggregation rule

sj = Σ_{i=1}^{m} ci gij ,   j = 1,...,n     (8.20)


The highest final grade is supposed to designate the preferred alternative. Let us first discuss the significance of the criterion weights, however, as well as their elicitation.

We start from the assumption that criteria have particular weights in the mind of the decision maker. They could depend on the manner in which the performance of the alternatives under each of the criteria individually has been recorded, that is, on the units of performance measurement. They could also depend on the aggregation procedure generating the final grades which express the performance of the alternatives under all criteria simultaneously. Usually, however, decision makers ignore these issues. They are prepared to estimate the weights of the criteria or their relative importance (weight ratios), regardless of how the performance of the alternatives has been measured and regardless of the aggregation procedure, so that they seem to supply meaningless information. Many decision makers want to be consistent over a coherent collection of decision problems. Moreover, there is a good deal of distributed decision making in large organizations. The evaluation of a number of decision alternatives is entrusted to a committee, the criteria are suggested in vague verbal terms or firmly prescribed by those who established the committee, but the choice of the attribute weights and the final aggregation are felt to be the responsibility of administrators or design leaders at higher levels in the hierarchy.

In SMART the decision maker may ignore the units of performance measurement because the grades do not depend on them. Under the building cost criterion, for instance, he/she may replace US dollars by Euros or any other currency, but this does not affect the orders of magnitude of the cost categories. Lootsma (1993, 1996) presented a more detailed discussion of the issue in a study on SMART and the Multiplicative AHP, a multiplicative variant of the Analytic Hierarchy Process (Saaty, 1980). The relative importance of the criteria appeared to be a meaningful concept, even in isolation from immediate context (see also Section 5.3). In what follows we shall work with the arithmetic–mean aggregation rule without further discussion.

We now concern ourselves with the numerical scale to quantify the relative importance (the weight ratio) of any two criteria. The first thing we want to establish is the range of possible values for the relative importance. Equal importance of two criteria is expressed by the ratio 1:1 of the criterion weights, but how do we express much more or vastly more importance? In order to answer the question we carry out an imaginary experiment: we ask the decision maker to consider two alternatives Aj and Ak and two criteria such that his/her preference for Aj over Ak under the first criterion Cf is roughly equal to his/her preference for Ak over Aj under the second criterion Cs. Moreover, we suppose that the situation is extreme: the impact grades assigned to the two alternatives are 6 units apart under each of the two criteria. We can accordingly write the impact grades assigned to Aj as

gfj and gsj

and the impact grades assigned to Ak as

gfk = gfj − 6 and gsk = gsj + 6

This means that the decision maker has a strong preference for Aj over Ak under the first criterionCf and an equally strong but opposite preference under the second criterion Cs. If the two criteria


are felt to be equally important, the final grades of the two alternatives will be equal, so that the decision maker is indifferent between the two alternatives under the two criteria simultaneously. However, if the final grades are 5 units apart, the ratio ω of the corresponding criterion weights has to satisfy the relation

{ ω/(ω+1) gfj + 1/(ω+1) gsj } − { ω/(ω+1) (gfj − 6) + 1/(ω+1) (gsj + 6) } = 5     (8.21)

whence ω = 11. Such a ratio implies that the strong preference for Aj over Ak under Cf almost completely wipes out the equally strong but opposite preference under Cs.

If the impact grades of the two alternatives are 8 units apart under each of the two criteria and if the final grades are 7 units apart, the ratio ω has to satisfy the relation

{ ω/(ω+1) gfj + 1/(ω+1) gsj } − { ω/(ω+1) (gfj − 8) + 1/(ω+1) (gsj + 8) } = 7     (8.22)

which yields ω = 15.
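A one-line check (added here for illustration, not part of the original text) confirms these values: for a grade gap of k units under both criteria the left-hand side of (8.21) and (8.22) reduces to k(ω − 1)/(ω + 1), so the required ratio is ω = (k + f)/(k − f) when the final grades are f units apart.

def weight_ratio(grade_gap, final_gap):
    # solve grade_gap*(w - 1)/(w + 1) = final_gap for w; cf. eqs. (8.21), (8.22)
    return (grade_gap + final_gap) / (grade_gap - final_gap)

print(weight_ratio(6, 5))   # 11.0
print(weight_ratio(8, 7))   # 15.0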

In addition, two new assumptions are now introduced: (i) the number of gradations to express relative preference for the alternatives equals the number of gradations for the relative importance of the criteria, and (ii) the numerical values associated with the gradations of relative importance constitute a sequence with geometric progression. In the extreme case of formula (8.21), where a much higher preference under the first criterion is practically wiped out by a much higher preference under the second criterion, we accordingly refer to the relative importance of the first criterion with respect to the second one as much higher. Similarly, in the extreme case of formula (8.22), the relative importance of the first criterion with respect to the second one is vastly higher. So, a ratio of 16:1 may be taken to stand for vastly more importance. This is also confirmed by other imaginary experiments where a decision maker is supposed to have a very strong preference for Aj under the first of three criteria and a definite preference for Ak under the second and the third criterion.

A simple geometric sequence of values, with echelons corresponding to equal, somewhat more, more, much more, and vastly more importance, and 'covering' the range of values between 1/16 and 16, is given by the sequence 1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16 with progression factor 2. Hence, we obtain the following geometric scale for the major gradations of relative importance:

16 Cf vastly more important than Cs

8 Cf much more important than Cs

4 Cf more important than Cs

2 Cf somewhat more important than Cs

1 Cf as important as Cs

and if we allow for threshold gradations to express hesitations between two adjacent qualifications in the above list, we have a geometric sequence with progression factor √2.

We can now ask the decision maker to express the importance of the criteria in grades on the scale 4, 5,...,10 (grades lower than 4 are possible but they practically eliminate the corresponding criteria). A difference of 6 units has to represent the ratio 8. This can be achieved via the


progression factor √2. Taking hi to stand for the grade assigned to criterion Ci, we estimate the ratio of the weights of two criteria Ci1 and Ci2 by

(√2)^(hi1 − hi2)

A non–normalized weight of Ci is accordingly given by (√2)^hi.

The normalization thereafter, whereby we eliminate the additive degree of freedom in the grades, yields the desired weight

ci = (√2)^hi / Σ_i (√2)^hi     (8.23)

of criterion Ci. The criterion weights clearly sum up to 1.
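The weight formula (8.23) and the aggregation rule (8.20) can be combined in a few lines of Python. The sketch below is an added illustration; the importance grades (10, 8, 6) and the impact grades in the last call are hypothetical.

import math

SQRT2 = math.sqrt(2.0)

def criterion_weights(importance_grades):
    # c_i = (sqrt 2)^h_i / sum_i (sqrt 2)^h_i, eq. (8.23)
    raw = [SQRT2 ** h for h in importance_grades]
    total = sum(raw)
    return [r / total for r in raw]

def final_grade(weights, impact_grades):
    # arithmetic-mean aggregation rule, eq. (8.20)
    return sum(c * g for c, g in zip(weights, impact_grades))

w = criterion_weights([10, 8, 6])            # hypothetical importance grades
print([round(c, 3) for c in w])              # [0.571, 0.286, 0.143]; weight ratios 2 and 4
print(round(final_grade(w, [6, 8, 9]), 2))   # 7.0 for hypothetical impact grades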

8.3.5 Sensitivity Analysis via Fuzzy SMART

The SMART version described so far has a particular drawback: the decision maker is supposed to choose one grade only, for the assessment of an alternative under a given criterion and also for the assessment of the importance of the criteria. Usually, however, he/she realizes that several grades (not only the integer–valued ones) are more or less appropriate. There are several reasons for this.

1. Even if the performance of the alternatives can be expressed on a numerical scale, in physical or monetary units, the data may be imprecise: building cost is negotiable, maximum speed may be measured under ideal circumstances, and so forth.

2. The gradations of human judgement (excellent, good, fair, poor), to be used if the performance of the alternatives can only be expressed in verbal terms under qualitative criteria such as design, ambiance, comfort, ..., are vaguely defined.

3. The upper and the lower ends of the ranges of acceptable performance data urge the decision maker to concentrate his/her attention on the alternatives which are in principle acceptable, but they are only vaguely known.

4. The relative importance of the criteria, usually expressed in verbal terms only (equally important, somewhat more, more, much more important), has an imprecise meaning.

The decision maker could accordingly model his/her judgement by assigning truth values to several (integer and non–integer) grades in order to express how well they represent his/her preference. An alternative mode of operation would be to choose the most appropriate grade as well as right–hand and left–hand spreads indicating how far his/her judgement extends.

Let us further explore the last–named approach. Then we model the decision maker's judgement of the performance of alternative Aj under criterion Ci by a fuzzy number, in particular by the triangular fuzzy number

gij = (gijl, gijm, giju)


Of course, the decision maker is supposed to supply, not only the modal value gijm, but also the lower value gijl and the upper value giju. Similarly, we model the importance of criterion Ci by the triangular fuzzy number

hi = (hil, him, hiu)

again under the assumption that the decision maker will be prepared to supply, not only the modal value him, but also the lower value hil and the upper value hiu. Using the arithmetic operations of subsection 7.3.4 we find a non–normalized weight of criterion Ci of the form

( (√2)^hil , (√2)^him , (√2)^hiu )

and we obtain normalized weights of the criteria when we divide the lower, modal, and upper values by the sum of the modal values. This procedure (allowed because the grades hil, him, hiu, i = 1,...,m, are supposed to have an additive degree of freedom) guarantees that the modal values are properly normalized in the sense that they sum up to 1.

This approach does not seem to be practical. We ask the decision maker to supply much more information than in the crisp case, whereas the added value of the analysis does not proportionally increase. However, we can simplify the procedure considerably. In order to get a rough idea of how the decision maker's imprecision affects the final grades of the alternatives, he/she can be asked to specify a uniform right–hand and left–hand spread σ (= upper − modal value = modal − lower value) which he/she almost never exceeds in the actual decision problem. A reasonable value seems to be given by σ = 1, so that not only a given integer grade g but also the non–integer grades between g − 1 and g + 1 are more or less prototypical. We can now write

gij = (gijm − σ,gijm,gijm + σ)

hi = (him − σ,him,him + σ)

Normalized criterion weights are accordingly given by

ci = cim × ( (√2)^−σ , 1 , (√2)^σ )

where cim stands for the normalized weight of criterion Ci in the crisp case, written as

cim = (√2)^him / Σ_i (√2)^him     (8.24)

Ignoring the bounds on the impact grades we could write the fuzzy final grades of the alternatives in the form

sj = Σ_i ci gij = Σ_i cim × ( (√2)^−σ , 1 , (√2)^σ ) × (gijm − σ , gijm , gijm + σ)

but since the final grades must be between 4 and 10 we have the approximate result

sjm = Σ_i cim gijm     (8.25)


sjl = max( 4 , (√2)^−σ (sjm − σ) )     (8.26)

sju = min( 10 , (√2)^σ (sjm + σ) )     (8.27)

These formulas can be used for a sensitivity analysis of the results in Table 8.5.

Attribute        Weight   Des1   Des2   Des3   Des4

Building cost
Maximum speed
Acceleration
Cargo volume
Operability
Ambiance

Final scores

Table 8.5. Final design matrix for a vessel selection problem

With σ = 1 the fuzzy final grades of the alternative vessels are as follows:

Des1   (4.0, 6.6, 10)
Des2   (4.0, 7.0, 10)
Des3   (4.0, 7.0, 10)
Des4   (5.0, 6.0, 9)

The 50%–level sets of the fuzzy final grades of Des1 and Des2 are (5.3, 8.3) and (5.5, 8.5) respectively. The ratio of the overlap and the underlap of the two sets is 0.88, a rather high value which does not yet allow us to drop Des3 and Des4. In general, we take the above ratio of overlap and underlap as a measure for the difference between two triangular fuzzy numbers. It may be equal to one, even when the modal values do not coincide, and it is zero as soon as the area where the two triangles overlap is below the 50% level.
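The fuzzy sensitivity analysis is equally simple to program. The sketch below is an added illustration: it implements eqs. (8.25)-(8.27) and the overlap/underlap ratio of two 50%-level sets, reproducing the value 0.88 quoted above; the criterion weights and modal grades in the last call are hypothetical, since the numerical entries of Table 8.5 are not reproduced here.

import math

SQRT2 = math.sqrt(2.0)

def fuzzy_final_grade(weights, modal_grades, sigma=1.0):
    s_m = sum(c * g for c, g in zip(weights, modal_grades))   # eq. (8.25)
    s_l = max(4.0, SQRT2 ** (-sigma) * (s_m - sigma))         # eq. (8.26)
    s_u = min(10.0, SQRT2 ** sigma * (s_m + sigma))           # eq. (8.27)
    return (s_l, s_m, s_u)

def level_set(tri, level=0.5):
    l, m, u = tri
    return (l + level * (m - l), u - level * (u - m))

def overlap_underlap_ratio(a, b):
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    underlap = max(a[1], b[1]) - min(a[0], b[0])
    return overlap / underlap

des1, des2 = (4.0, 6.6, 10.0), (4.0, 7.0, 10.0)     # fuzzy final grades quoted above
a, b = level_set(des1), level_set(des2)
print([round(x, 1) for x in a], [round(x, 1) for x in b])   # [5.3, 8.3] [5.5, 8.5]
print(round(overlap_underlap_ratio(a, b), 2))               # 0.88
print(fuzzy_final_grade([0.4, 0.3, 0.3], [7, 6, 8]))        # hypothetical weights and grades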

In the crisp version of SMART described in the previous sections and in the present fuzzy extension, the calculations are very simple indeed, and they should be. The leading objective of MADM is to structure a decision problem. First, there is the screening phase where the decision makers, the alternatives, and the criteria are selected. Next, the performance matrix is drawn up. If the information so obtained does not readily bring up a preferred alternative, a full–scale MADM problem arises. By an appropriate choice of the ranges representing the context of the actual decision problem the decision maker can easily assign impact grades to the alternatives under the respective criteria, in order to express how well the alternatives perform within the corresponding ranges. The final grades designate a possibly preferred alternative. For a sensitivity analysis the decision maker may use a fuzzy extension of the method.


8.4 Additive and Multiplicative AHP

The Analytic Hierarchy Process (AHP) of Saaty (1980) is a widely used method for MADM, presumably because it elicits preference information from the decision makers in a manner which they find easy to understand. The basic step is the pairwise comparison of two so–called stimuli, two alternatives under a given criterion, for instance, or two criteria. The decision maker is requested to state whether he/she is indifferent between the two stimuli or whether he/she has a weak, strict, strong, or very strong preference for one of them. The original AHP has been criticized in the literature because the algorithmic steps do not properly take into account that the method is based upon ratio information. The shortcomings can easily be avoided in the Additive and the Multiplicative AHP to be discussed below. The Additive AHP is the SMART procedure with pairwise comparisons on the basis of difference information. The Multiplicative AHP with pairwise comparisons on the basis of ratio information is a variant of the original AHP. There is a logarithmic relationship between the Additive AHP (SMART) and the Multiplicative AHP. Both versions can easily be fuzzified. The reasons why we deviate from the original AHP will be explained at the end of this subsection.

8.4.1 Pairwise Comparisons

First, the assessment of the alternatives under the respective criteria is considered. In the basic pairwise–comparison step of the AHP, two alternatives Aj and Ak are presented to the decision maker, whereafter he/she is requested to judge them under a particular criterion. The underlying assumptions are: (i) under the given criterion the two alternatives have subjective values Vj and Vk for the decision maker, and (ii) the judgemental statement whereby he/she expresses his/her relative preference for Aj with respect to Ak provides an estimate of the ratio Vj/Vk. For reasons of simplicity we immediately illustrate the basic step via the subjective evaluation of vessels, first under the cost criterion and thereafter under the operability criterion. Finally, the subjective evaluation under qualitative criteria is briefly discussed.

Vessels under the cost criterion

We assume again that the decision maker is only prepared to consider alternative vessels with building costs between a lower bound Cmin, the cost to be paid anyway for the vessels which he/she seriously has in mind, and an upper bound Cmax, the cost that he/she cannot or does not really want to exceed. In order to model the relative preference for alternative Aj with respect to Ak we categorize the costs which are in principle acceptable. We first 'cover' the range (Cmin, Cmax) by the grid with the geometric sequence of points

Cν = Cmin + (Cmax − Cmin)×2^ν/64 ,   ν = 0,1,...,6

Just like in SMART we take Cν to stand for the ν–th cost category and the integer–valued parameter ν for its order of magnitude, which is given by


ν = log2 [ (Cν − Cmin)/(Cmax − Cmin) × 64 ]     (8.28)

The vessels of the category C0 are the cheap ones within the given context. The vessels of the categories C2, C4, and C6 are somewhat more, more, and much more expensive. At the odd–numbered grid points C1, C3, and C5 the decision maker hesitates between two adjacent gradations of expensiveness. Sometimes we also introduce the category C8 of the vastly more expensive vessels, which are situated beyond the range, as well as the category C7 if the decision maker hesitates between much more and vastly more expensiveness. The even–numbered grid points are the so–called major grid points designating the major gradations of expensiveness. They constitute a geometric sequence in the range (Cmin, Cmax) with progression factor 4. If we also take into account the odd–numbered grid points corresponding to hesitations, we have a geometric sequence of major and threshold gradations with progression factor 2.

Suppose that the costs of the vessels Aj and Ak belong to the categories represented by Cνj and Cνk respectively. We express the relative preference for Aj with respect to Ak by the inverse ratio of the cost increments above the desired target Cmin, so that it can be written as

Ojk = (Cνk − Cmin)/(Cνj − Cmin) = 2^(νk−νj)     (8.29)

A vessel of the category C2 is somewhat more expensive than a vessel in the category C0. In other words, there is a weak preference for the vessel in the category C0: it is 4 times more desirable than a vessel in the category C2. On the basis of such considerations weak preference is identified with the ratio 4:1. Similarly, definite preference is identified with the ratio 16:1, and strong preference with the ratio 64:1.

If the cost category around Cν is represented by the grade g = 10 − ν, the relative preference for the vessel Aj with respect to Ak can be expressed by the difference of grades

qjk = log2 Ojk = log2 [ (Cνk − Cmin)/(Cνj − Cmin) ] = νk − νj = gj − gk     (8.30)

The major gradations of the decision maker's comparative judgement are now put on a numerical scale in two different ways: we assign scale values either to the relative preferences themselves or to the logarithms of the relative preferences. The assignment is shown in Table 8.6, which lists the relative preferences with the scale values assigned to them, in real magnitudes as ratios of subjective values and in logarithmic form as differences of grades. The reader can easily complete the assignment of values to the threshold gradations between the major ones.


Comparative judgement of          Relative preference for          Rel. preference Ojk    Diff. of grades
Aj with respect to Ak             Aj w.r.t. Ak in words            in real magnitudes     qjk = log2 Ojk

Aj much less expensive            strong preference for Aj                 64                   6
Aj less expensive                 strict, definite pref. for Aj            16                   4
Aj somewhat less expensive        weak preference for Aj                    4                   2
Aj as expensive as Ak             indifference                              1                   0
Ak somewhat less expensive        weak preference for Ak                   1/4                 −2
Ak less expensive                 strict, definite pref. for Ak            1/16                −4
Ak much less expensive            strong preference for Ak                 1/64                −6

Table 8.6. Comparative judgement under the acquisition cost criterion

There are now two different ways to collect the preference information from the decision maker:

• He/she can be asked to consider the axis corresponding to the building cost criterion and to specify the endpoints of the range of acceptable costs. Next, we identify the judgemental categories on the range, the corresponding orders of magnitude, and the corresponding grades. Thereafter, we can immediately express his/her relative preference Ojk for Aj with respect to Ak under the cost criterion by

Ojk = 2^qjk = 2^(gj−gk)     (8.31)

• If the decision maker is unable or unwilling to specify the endpoints of the range of acceptable costs, we can ask him/her to express his/her comparative judgement directly in words, that is, to state whether he/she is indifferent between the two alternatives under the given criterion, or whether he/she has a weak, a definite, or a strong preference for one of the two. Thereafter, we set the numerical estimate Ojk of his/her relative preference for Aj with respect to Ak under the building cost criterion to the appropriate value as shown in Table 8.6.

To illustrate matters, it is supposed that a decision maker considers two alternative vessels Aj and Ak of MU 25,000 and MU 30,000, respectively. The decision maker is not prepared to specify the endpoints of the range of acceptable costs. Nevertheless, it remains necessary to keep a somewhat holistic view on the alternatives A1,...,An; the two alternatives Aj and Ak cannot reasonably be judged in isolation from the context of the selection problem. Hence, the decision maker first partitions the set of alternatives into three categories: the vessels which are 'good' because their costs are roughly below MU 22,000; the vessels which are 'bad' because their costs are roughly beyond MU 30,000; and the intermediate category with costs between the two thresholds just mentioned. Since the vessels Aj and Ak are both contained in the intermediate category and not very close, the decision maker first declares Ak to be somewhat more expensive than Aj. We model his/her relative preference for Aj with respect to Ak by setting Ojk = 4. The decision maker also feels that his/her relative preference could be expressed by a difference of grades which is equal to 1, so that we would obtain Ojk = 2. In order to solve the conflict the decision maker reconsiders his/her previous judgemental statements. He/she specifies the interval between MU 20,000 and MU 40,000 as the range of acceptable costs. Here, the two vessels have the respective grades 6 and 5, so that the relative preference for Aj with respect to Ak can now be modelled by setting Ojk = 2.


Vessels under the operability criterion

Let us again suppose that the decision maker only considers vessels with an operability of at least Omin, so that he/she is restricted to the interval (Omin, Omax), with Omax usually set to 100%. We cover the given range by the grid with the geometric sequence of points

Oν = Omax − (Omax − Omin)×2^ν/64 ,   ν = 0,1,...,6

The alternatives are again compared with respect to the desired target. If we take the symbols Oνj and Oνk to denote the operability of the alternative vessels Aj and Ak respectively, then the inverse ratio

Ojk = (Omax − Oνk)/(Omax − Oνj) = 2^(νk−νj)     (8.32)

of the distances with respect to the desired target Omax represents the relative preference for Aj with respect to Ak under the operability criterion. The qualification 'somewhat more operable' implies that the inverse ratio of the distances to the desired target is 4:1, etc. Representing the operability category around Oν by the grade g = 10 − ν, we can also express the relative preference for the vessel Aj with respect to Ak by the difference of grades

qjk = log2 Ojk = log2 [ (Omax − Oνk)/(Omax − Oνj) ] = νk − νj = gj − gk     (8.33)

The assignment of numerical values to the major gradations of comparative judgement is shown in Table 8.7.

Comparative judgement of       Linguistic preference for         Rel. preference Ojk    Diff. of grades
Aj with respect to Ak          Aj with respect to Ak             in real magnitudes     qjk = log2 Ojk

Aj much more operable          strong preference for Aj                  64                   6
Aj more operable               strict, definite pref. for Aj             16                   4
Aj somewhat more operable      weak preference for Aj                     4                   2
Aj as operable as Ak           indifference                               1                   0
Ak somewhat more operable      weak preference for Ak                    1/4                 −2
Ak more operable               strict, definite pref. for Ak             1/16                −4
Ak much more operable          strong preference for Ak                  1/64                −6

Table 8.7. Comparative judgement under the operability criterion

The elicitation of preference information can now again be carried out in two different ways because operability is expressed on a one–dimensional scale. We can ask the decision maker to specify the endpoints of the range of acceptable operabilities, whereafter we calculate the grades to be assigned to the alternative vessels under the operability criterion. This yields the corresponding difference of grades and the relative preference in its real magnitude. If the decision maker is unable or unwilling to specify the requested endpoints, however, we can use his/her


comparative judgement directly. Thus, somewhat more operability yields the relative preference 4:1 and the difference of grades 2, more operability yields the relative preference 16:1 and the difference of grades 4, etc.

The above procedure, whereby we assign numerical values to the relative preferences themselves or to the logarithms of the relative preferences, is similar to the mode of operation in acoustics where ratios of sound intensities are encoded, either in real magnitudes or logarithmically on the decibel scale. The elicitation of preferential information from the decision maker seems to proceed in two different ways. Ratio information is obtained on a scale with geometric progression if the decision maker is asked to formulate his/her relative preferences (Multiplicative AHP), and difference information is obtained on an arithmetic scale if one asks the decision maker to express his/her judgement via a difference of grades (Additive AHP, SMART with pairwise comparisons). For the decision maker these are alternative ways of saying the same thing, however.

8.4.2 Calculation of Impact Grades and Scores

In a method of pairwise comparisons we seem to collect much more information than we need. With n alternatives the decision maker may carry out n(n − 1)/2 basic experiments in order to fill the upper or the lower triangle in the matrix {Ojk} of pairwise comparisons, whereas (n − 1) properly chosen experiments would be sufficient (A1 versus A2, A2 versus A3, etc.). The redundancy is usually beneficial, however, since it enables us to smooth the results of the analysis. The Additive and the Multiplicative AHP can easily analyze incomplete pairwise comparisons, which occur when the decision maker does not carry out the maximum number of basic experiments. In addition, they can easily be used in groups of decision makers who individually do not even carry out the minimum number of experiments.

Incomplete pairwise comparisons in a group of decision makers

Let us first consider a group of decision makers who are requested to assess the alternatives Aj and Ak under a particular criterion. We shall be assuming that these alternatives have the same subjective values Vj and Vk for all decision makers. Moreover, the decision makers are supposed to estimate the ratio Vj/Vk via their judgemental statements. These are strong assumptions, but they are not unreasonable since many decisions are made within an organizational framework where the members have common values.

The verbal comparative judgement given by decision maker d is converted into the numerical value Ojkd according to the rules of the pairwise comparisons, so that

qjkd = log2 Ojkd

Next, the vector V of subjective values is approximated via logarithmic regression. Introducing the set Djk to denote the set of decision makers who actually expressed their opinion about the two alternatives under consideration, the approximating values vj are found via the unconstrained minimization of the sum of squares


Σ_{j<k} Σ_{d∈Djk} ( log2 Ojkd − log2 vj + log2 vk )²     (8.34)

Introducing the new variables wj = log2 vj, expression (8.34) can be rewritten as

Σ_{j<k} Σ_{d∈Djk} ( qjkd − wj + wk )²     (8.35)

So, it does not matter which type of information we collect from the decision makers. The sum of squares (8.35) is minimized regardless of whether one has ratio or difference information. Since (8.35) is a convex quadratic function we can easily find an optimal solution by solving the associated set of normal equations, which is obtained by setting the first–order derivatives of (8.35) to zero. Using the properties

qjkd = −qkjd for any j and k

one finds the associated set of normal equations from

Σ_{k≠j} Σ_{d∈Djk} ( qjkd − wj + wk ) = 0 ,   j = 1,...,n

so that the normal equations themselves take the form

wj Σ_{k=1,k≠j}^{n} Njk − Σ_{k=1,k≠j}^{n} Njk wk = Σ_{k=1,k≠j}^{n} Σ_{d∈Djk} qjkd ,   j = 1,...,n     (8.36)

where Njk denotes the cardinality of the set Djk. The normal equations are dependent: they sum up to the zero equation (see the example below). There is at least one additive degree of freedom in the unconstrained minima of the function (8.35) because there are only differences of variables in the sum of squares. Hence, there is at least one multiplicative degree of freedom in the unconstrained minima of the function (8.34). In other words, we can only draw conclusions from differences wj − wk and from ratios vj/vk. Note that the decision makers have to judge more pairs if there are two or more degrees of freedom.
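For readers who want to experiment, the normal equations (8.36) can be set up and solved mechanically. The following sketch is an added illustration, not the original procedure verbatim: it assumes numpy is available, the sample judgements are hypothetical, and it pins w1 = 0 to remove the additive degree of freedom before converting the solution into SMART impact grades and AHP impact scores.

import numpy as np

def smart_ahp_from_comparisons(n, judgements):
    """judgements: list of (j, k, q) with q = g_j - g_k as stated by one decision
    maker; indices run from 0 to n-1.  Builds and solves eq. (8.36), assuming the
    comparison graph is connected (exactly one degree of freedom)."""
    A, b = np.zeros((n, n)), np.zeros(n)
    for j, k, q in judgements:
        A[j, j] += 1.0; A[j, k] -= 1.0; b[j] += q    # contribution to equation j
        A[k, k] += 1.0; A[k, j] -= 1.0; b[k] -= q    # and, with q_kj = -q_jk, to equation k
    A[0, :] = 0.0                                    # drop the first (dependent) equation
    A[0, 0], b[0] = 1.0, 0.0                         # and pin w_0 = 0
    w = np.linalg.solve(A, b)
    grades = w + (7.0 - w.mean())                    # arbitrary shift into 4..10, eq. (8.37)
    scores = 2.0 ** w / np.sum(2.0 ** w)             # eqs. (8.38)-(8.40)
    return w, grades, scores

# hypothetical incomplete comparisons of three alternatives by two decision makers
data = [(0, 1, -2), (0, 1, -1), (1, 2, 3), (0, 2, 1)]
w, grades, scores = smart_ahp_from_comparisons(3, data)
print(np.round(w, 2), np.round(grades, 1), np.round(scores, 3))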

Let us illustrate the foregoing results with the pairwise comparisons of three Ro–Ro designs, Des001, Des002, and Des003, under the criterion of comfort. There are four decision makers, who do not necessarily carry out all possible pairwise comparisons. Table 8.8 shows their verbal judgemental statements in logarithmic form: equally comfortable 0, somewhat more comfortable ±2, more comfortable ±4, much more comfortable ±6. Obviously, the cells of the pairwise–comparison tableau are not completely filled with four entries each, and there is no information in the cells on the main diagonal. The solution of the associated normal equations yields SMART impact grades between 4 and 10 with an additive degree of freedom, and AHP impact scores summing up to 1 with a multiplicative degree of freedom.


Ship        Ro–Ro        Ro–Ro     Ro–Ro        Solution of    SMART    AHP
Design      Des001       Des002    Des003       normal eq.     grades   scores

Des001      empty        −1, −3    −2, +2, −1     0              6.0     0.203
Des002      +1, +3       empty     −3             1.18182        7.2     0.461
Des003      +2, −2, +1   +3        empty          0.72727        6.7     0.336

Table 8.8. Pairwise-comparison tableau under the criterion of comfort in logarithmic form

The normal equations corresponding to the pairwise comparisons in Table 8.8 can be written in the explicit form

 5w1 − 2w2 − 3w3 = −5
−2w1 + 3w2 −  w3 = +1
−3w1 −  w2 + 4w3 = +4

The first equation originates from the first row in the pairwise–comparison tableau; the coefficient 5 on the main diagonal stands for the total number of entries, the coefficients 2 and 3 for the number of elements in the second and the third cell, whereas the right–hand side element −5 represents the sum of the entries. The remaining equations are built up in a similar way. Since the equations are dependent (they sum to the zero equation) we drop one of them, and since the solutions have an additive degree of freedom we can arbitrarily choose one of the variables. Thus, setting w1 = 0 and dropping the first equation we obtain the solution exhibited in Table 8.8. The additive degree of freedom, designated by the symbol θ, is used to shift the solution so that SMART impact grades are obtained

gj = wj + θ j = 1, . . . ,n (8.37)

which are nicely but somewhat arbitrarily situated between 4 and 10. Next, we compute

vj = 2^wj ,   j = 1,...,n     (8.38)

and these values are normalized in order to obtain AHP impact scores aj by setting

aj = β vj j = 1, . . . ,n (8.39)

where β stands for the normalization factor which guarantees that the impact scores sum up to 1. Hence

aj = 2^wj / Σ_{j=1}^{n} 2^wj     (8.40)

Note that both the shift and the normalization are cosmetic operations carried out in order to present the results in a more or less attractive way. It will be clear from Table 8.8 that the AHP


impact scores (Multiplicative AHP) suggest more distinction between the alternatives than the SMART impact grades (Additive AHP). They are connected by the logarithmic relationship

aj/ak = 2^(gj−gk)     (8.41)

This relationship depends neither on the choice of the shift constant θ nor on the choice of the normalization factor β. Throughout this subsection we will see that the really interesting, uniquely determined information consists of ratios of scores and differences of grades.

Complete pairwise comparisons by a single decision maker

The above general result can be simplified in special cases. Let us consider one single decision maker who expressed his opinion about all possible pairs of alternatives under the given criterion, so that

Njk = 1   for all j ≠ k

The normal equations (8.36) take the simple form

(n − 1) wj − Σ_{k=1,k≠j}^{n} wk = Σ_{k=1,k≠j}^{n} qjk

or, equivalently

n wj − Σ_{k=1}^{n} wk = Σ_{k=1}^{n} qjk

if we take qjj = 0 for all j. The additive degree of freedom is used in the solutions of this set of equations to set the sum of the variables to zero, which yields

wj = (1/n) Σ_{k=1}^{n} qjk     (8.42)

This means that wj is the arithmetic mean of the j–th row in the matrix of pairwise comparisons in logarithmic form, at least under the tacit assumption that the elements on the main diagonal are set to 0, so that we have indeed n elements in each row. By a proper shift of the wj we can generate impact grades gj which are situated between 4 and 10.

The AHP impact scores can also directly be computed when the pairwise comparisons are recorded as ratio estimates Ojk. By equations (8.38) and (8.42) it must be true that

vj = ( Π_{k=1}^{n} Ojk )^{1/n}     (8.43)

at least under the assumption that the main diagonal elements are available. We set Ojj = 1 for any j. So, vj is the geometric mean of the j–th row in the matrix of pairwise comparisons in real magnitudes. The AHP impact scores aj are obtained by normalization, again a cosmetic operation to make sure that the scores add up to 1.


An illustrative example is given by the complete set of pairwise comparisons of three vessels under the criterion of operability. There is one single decision maker who judges all possible pairs, and the logarithms of his/her verbal judgement are shown in Table 8.9. The solution of the associated normal equations yields SMART impact grades between 4 and 10 with an additive degree of freedom, and AHP impact scores summing up to 1 with a multiplicative degree of freedom.

Ship        Ro–Ro     Ro–Ro     Ro–Ro     Arithmetic    SMART    AHP
Design      Des001    Des002    Des003    row means     grades   scores

Des001        0         −3        −1       −1.33333       4.7     0.091
Des002       +3          0        +2        1.66667       7.7     0.727
Des003       +1         −2         0       −0.33333       5.7     0.182

Table 8.9. Pairwise-comparison tableau under the criterion of operability in logarithmic form
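The row-mean calculation of Table 8.9 can be verified directly; the short sketch below (an added illustration assuming numpy) applies eq. (8.42), shifts by 6 to obtain the grades, and normalizes 2^wj to obtain the scores.

import numpy as np

# pairwise comparisons of Table 8.9 in logarithmic form, zeroes on the diagonal
Q = np.array([[0.0, -3.0, -1.0],
              [3.0,  0.0,  2.0],
              [1.0, -2.0,  0.0]])

w = Q.mean(axis=1)                       # eq. (8.42): arithmetic row means
grades = w + 6.0                         # arbitrary shift into the interval 4..10
scores = 2.0 ** w / np.sum(2.0 ** w)     # geometric-mean result, eqs. (8.38)-(8.43)

print(np.round(w, 5))          # [-1.33333  1.66667 -0.33333]
print(np.round(grades, 1))     # [4.7 7.7 5.7]
print(np.round(scores, 3))     # [0.091 0.727 0.182]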

Complete pairwise comparisons in a group of decision makers

When all decision makers in a group of size G assess all possible pairs of alternatives under the given criterion, we can simplify formula (8.36) because Njk = G for all j ≠ k. The normal equations are now given by

(n − 1) G wj − G Σ_{k=1,k≠j}^{n} wk = Σ_{k=1,k≠j}^{n} Σ_{d=1}^{G} qjkd ,   j = 1,...,n

so that they can be rewritten as

n G wj − G Σ_{k=1}^{n} wk = Σ_{k=1}^{n} Σ_{d=1}^{G} qjkd ,   j = 1,...,n

if we take qjjd = 0 for all j and d. We use again the additive degree of freedom in the solutions to set the sum of the variables to zero, whence

wj = (1/(nG)) Σ_{k=1}^{n} Σ_{d=1}^{G} qjkd     (8.44)

The wj can equivalently be calculated in two different ways:

• First, all entries in a cell are replaced by their arithmetic mean, so that we have a group opinion about each pair of alternatives. Thereafter we calculate the arithmetic row means of the matrix of group opinions. It is tacitly assumed that there are zeroes in the cells on the main diagonal.

• First, the arithmetic row means are calculated in the pairwise–comparison matrices of the individual group members separately, with zeroes on the main diagonals, so that one obtains the impact grades assigned to the alternatives by each group member. Thereafter we calculate the arithmetic means of the impact grades.


The AHP impact scores under the given criterion can be found in a similar way. On the basis of equation (8.44) a solution of the logarithmic regression problem (8.35) is given by

vj = ( Π_{k=1}^{n} Π_{d=1}^{G} Ojkd )^{1/(nG)}     (8.45)

a formula which shows that the vj can be obtained by geometric–mean calculations. The results do not depend on the order of the operations. We can first calculate the group opinions about each pair of alternatives and the scores thereafter, and vice versa, under the assumption that there are ones on the main diagonals of all matrices.

The SMART impact grades gj and the AHP impact scores aj can finally be obtained by a proper shift of the wj and a proper normalization of the vj respectively.

Power games in groups of decision makers

The results of this subsection have been generalized by Barzilai and Lootsma (1997) so that they can be used in groups of decision makers who have widely varying power positions. The crucial step is the assignment of power coefficients pd to the respective group members. These coefficients, normalized so that they sum up to 1, stand for the relative power of the decision makers. Impact grades and scores of the alternatives are obtained by the unconstrained minimization of

Σ_{j<k} Σ_{d∈Djk} pd ( qjkd − wj + wk )²     (8.46)

clearly a generalization of expression (8.35) in the sense that each term in the sum of squares is weighted with the relative power of the decision maker who expressed the corresponding comparative judgement. When the pairwise comparisons are complete, a solution is given by

wj = (1/n) Σ_{k=1}^{n} Σ_{d=1}^{G} pd qjkd     (8.47)
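The power-weighted solution (8.47) is simply a weighted arithmetic mean; the sketch below is an added illustration with hypothetical, antisymmetric judgements q[d, j, k] and hypothetical power coefficients pd, assuming numpy.

import numpy as np

# q[d, j, k]: difference of grades q_jkd stated by decision maker d (hypothetical data)
q = np.array([[[0, -2,  2], [ 2, 0,  4], [-2, -4, 0]],
              [[0, -4,  0], [ 4, 0,  2], [ 0, -2, 0]]], dtype=float)
p = np.array([0.7, 0.3])                           # power coefficients, summing to 1

n = q.shape[1]
w = np.einsum('d,djk->jk', p, q).sum(axis=1) / n   # eq. (8.47)
print(np.round(w, 2))                              # [-0.4  2.  -1.6]; the w_j sum to zero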

8.4.3 Criterion Weights and Aggregation

Let us first introduce some notation. There are m criteria C1,...,Cm. Suppose that we have obtained the SMART impact grades gij and the AHP impact scores aij of the respective alternatives Aj under criterion Ci via the solution (wi1,...,win) of a set of normal equations. Thus

gij = wij + θi ,   j = 1,...,n

aij = βi 2^wij ,   j = 1,...,n

where θi and βi stand for the shift constant and the normalization factor under the i–th criterion. Differences of impact grades and ratios of impact scores are connected by the logarithmic relationship

aij/aik = 2^(gij−gik)     (8.48)


We also assume that there are criterion weights ci expressing the relative importance of the respective criteria. They may have been obtained via the direct–rating procedure of subsection 7.3.4, but they may also be generated via the method of pairwise comparisons to be described in the present subsection.

Aggregation via arithmetic and geometric means

On the basis of equations (8.39) and (8.48) one can easily derive

Π_{i=1}^{m} ( aij/aik )^ci = 2^Δjk     (8.49)

where

Δjk = sj − sk = Σ_{i=1}^{m} ci gij − Σ_{i=1}^{m} ci gik     (8.50)

The symbols sj and sk, clearly obtained via a so–called arithmetic–mean aggregation rule, stand for the final SMART grades. They are not unique. There is an additive degree of freedom θi under each criterion as well as a general degree of freedom η, so that the final grade sj is generally given by

sj = η + Σ_{i=1}^{m} ci ( wij + θi )     (8.51)

with arbitrary shift constants η, θ1,...,θm. The formulas (8.49) and (8.50) suggest that the difference of the final grades is the logarithm of a ratio of final scores according to the Multiplicative AHP. We therefore take

tj = α Π_{i=1}^{m} aij^ci = α Π_{i=1}^{m} ( βi vij )^ci     (8.52)

to represent the final AHP score of Aj. The multiplicative factor α is used for cosmetic purposes only, to make sure that the final scores sum up to 1. The βi stand for arbitrary normalization factors under the respective criteria. Obviously, the final AHP scores are calculated via a geometric–mean aggregation rule and, moreover,

tj/tk = 2^(sj−sk)     (8.53)

regardless of the choice of the shift constants and the normalization factors. Hence, even the final SMART grades and the final AHP scores satisfy the logarithmic relationship that we found for the impact grades and the impact scores of the alternatives under each of the criteria separately (see equations (8.41) and (8.48)).
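A few lines of Python (an added illustration with hypothetical weights and impact grades) make the relationship explicit: the final SMART grades follow the arithmetic-mean rule, the final AHP scores the geometric-mean rule, and their ratio indeed equals 2 raised to the difference of the final grades, eq. (8.53).

import numpy as np

c = np.array([0.5, 0.3, 0.2])             # hypothetical criterion weights, sum to 1
g = np.array([[7.0, 6.0, 9.0],            # hypothetical impact grades g_ij
              [8.0, 5.0, 6.0]])           # rows: alternatives, columns: criteria

s = g @ c                                  # final SMART grades, arithmetic mean (8.50)
a = 2.0 ** g                               # impact scores a_ij proportional to 2^g_ij
t = np.prod(a ** c, axis=1)                # final AHP scores, geometric mean (8.52)
t = t / t.sum()                            # cosmetic normalization (alpha)

print(np.round(s, 2))                                          # [7.1 6.7]
print(round(t[0] / t[1], 4), round(2.0 ** (s[0] - s[1]), 4))   # identical values, eq. (8.53)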

Aggregation of complete pairwise comparisons

When the pairwise comparisons are complete, in the sense that each decision maker judged every pair of alternatives under each criterion, we can simplify the calculations considerably. The symbol qijkd is taken to stand for the difference of grades assigned to the alternatives Aj and Ak


under criterion Ci by decision maker d, and Oijkd for the corresponding ratio of subjective values. It follows easily from (8.44) and (8.45) that the final SMART grades and the final AHP scores, if one ignores the shift constants and the normalization factors, can be written as

sj = (1/(nG)) Σ_{i=1}^{m} Σ_{k=1}^{n} Σ_{d=1}^{G} ci qijkd     (8.54)

tj = ( Π_{i=1}^{m} Π_{k=1}^{n} Π_{d=1}^{G} Oijkd^ci )^{1/(nG)}     (8.55)

Computationally, this implies that we can operate in any order without affecting the final results of the analysis. One can aggregate, first, over the decision makers so that we obtain the group opinion about every pair of alternatives under each criterion, thereafter over the criteria so that we obtain an aggregate pairwise–comparison matrix, and finally over the row means of the aggregate pairwise–comparison matrix to obtain the final grades and scores. We can also change the order of the operations, however, in order to check the correctness of the computational results.

Interpretation of a ratio of criterion weights

Shift constants and normalization factors do not affect ratios of criterion weights. We show this for the normalization factors only. In doing so we also find an interpretation for ratios of criterion weights in terms of substitution rates, as one might expect since these ratios are traditionally linked with the trade–offs between gains and losses during a move along indifference curves. On the basis of formula (8.52) we can generally write the final AHP score of an alternative A as a function of the approximations to the subjective values of A under the respective criteria. Thus, the final AHP score is given by

t(A) = t(ν1,...,νm) = α Π_{i=1}^{m} ( βi νi )^ci     (8.56)

The arbitrary multiplicative factors α, β1,...,βm appear in formula (8.56) for normalization purposes, but one can equivalently say that they appear because the units of performance measurement were not specified. The first–order partial derivatives of t take the form

∂t/∂νi = (ci/νi) t

whence

[ (1/νi1) ∂t/∂νi2 ] / [ (1/νi2) ∂t/∂νi1 ] = ci2/ci1     (8.57)

for arbitrary i1 and i2, and regardless of the factors α, β1,...,βm. We can now study the behavior of t as a function of νi1 and νi2 along a contour or indifference curve. In a first–order approximation, when we make just a small step, such a move proceeds in a direction which is orthogonal to the gradient of t, that is, in the direction

( ∂t/∂νi2 , −∂t/∂νi1 )


This is shown in Figure 8.21. The reader has to remember that in a first–order approximation, a move along an indifference curve proceeds in a direction which is orthogonal to the gradient in the point where the move starts. The marginal substitution rate is the ratio of the components of that direction. The relative substitution rate is the ratio of the components divided by the coordinates of the point of departure.

Figure 8.21. Indifference curve of a final AHP score function

Traditionally (Keeney and Raiffa, 1976) the ratio

(∂t/∂νi2) / (∂t/∂νi1)

has been defined as the marginal trade–off or the marginal substitution rate between the two criteria under consideration. Under the geometric–mean aggregation rule, with the function t defined by formula (8.56), this ratio is not constant along an indifference curve. Lootsma and Schuyt (1997) therefore introduced the relative substitution rate, which is based upon the observation that human beings generally perceive relative gains and losses, that is, gains and losses in relation to the levels from which the move starts. Thus, when a small step is made along an indifference curve, the relative gain (or loss) in the νi1 direction and the corresponding relative loss (or gain) in the νi2 direction are proportional to

(1/νi1) ∂t/∂νi2   and   (1/νi2) ∂t/∂νi1

respectively. Under the geometric–mean aggregation rule (8.56) the substitution rate between relative gains and losses is a constant, not only along an indifference curve, but over the entire (νi1, νi2) space. It depends neither on the units of measurement nor on the values of the remaining variables νi, i ≠ i1, i2. Thus, we can meaningfully use the concept of the relative importance of the criteria, even in the absence of immediate context, when the alternatives are not yet available, for instance.

Pairwise comparison of the criteria

The pairwise comparison of two criteria proceeds almost in the same way as the pairwise comparison of two alternatives under a particular criterion. It is also closely related to the procedure described in subsection ...???. In the basic experiment a pair (Cf, Cs) of criteria is presented to


the decision maker, whereafter he/she is requested to state whether they are equally important for him/her or whether one of the two is somewhat more, more, or much more important than the other. By the information so obtained we estimate the relative importance of the first criterion Cf with respect to the second criterion Cs, that is, the ratio of the associated criterion weights. By the arguments just mentioned the geometric–mean aggregation rule enables us to avoid the pitfalls mentioned in subsection ...??...

The assignment of numerical values to the gradations of relative importance is shown in Table 8.10, where the relative importance of the criteria is expressed in the form of a difference of grades. The scale 1, 2, 4, 8, 16 is used, which has been derived in subsection ....???...., a scale with progression factor √2 if one considers the major as well as the threshold gradations of relative importance. For reasons of simplicity the difference of grades qfsd is usually recorded to represent the comparative judgement of decision maker d about the pair (Cf, Cs).

Comparative judgement of Cf with respect to Cs      Difference of grades qfsd

Cf vastly more important than Cs                     8
Cf much more important than Cs                       6
Cf more important than Cs                            4
Cf somewhat more important than Cs                   2
Cf as important as Cs                                0
Cf somewhat less important than Cs                  -2
Cf less important than Cs                           -4
Cf much less important than Cs                      -6
Cf vastly less important than Cs                    -8

Table 8.10. Comparative judgement of the criteria

The criterion weights are also computed, just like the impact scores of the alternatives, via the solution of a regression problem, i.e. the unconstrained minimization of the sum of squares

\[
\sum_{f<s} \sum_{d \in D_{fs}} (q_{fsd} - w_f + w_s)^2 \qquad (8.58)
\]

The associated normal equations do not have a unique solution. There is an additive degree of freedom which can be used to obtain the normalized criterion weights

\[
c_i = \frac{(\sqrt{2})^{w_i}}{\sum_{i=1}^{m} (\sqrt{2})^{w_i}} \qquad (8.59)
\]

In fact, the calculation of the impact scores of the alternatives and the calculation of the criterion weights differ with respect to the progression factor only. It is 2 for the alternatives (formula (8.36)) and √2 for the criteria (formula (8.59)), because human beings categorize long ranges on the dimensions of time, sound, and light, but evidently a short range on the dimension of the importance of the criteria.
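The following sketch illustrates how criterion weights could be obtained along these lines for a single decision maker: the least–squares problem (8.58) is solved for a set of hypothetical grade differences, the additive degree of freedom is fixed by letting the weights sum to zero, and the normalized weights then follow from (8.59). The judgement data are assumptions for illustration only.

```python
# Criterion weights from differences of grades, formulas (8.58)-(8.59).
import numpy as np

m = 3                                            # number of criteria
judgements = {(0, 1): 2, (0, 2): 4, (1, 2): 2}   # assumed q_fs on the scale of Table 8.10

# Build the least-squares system  q_fs ~ w_f - w_s.
rows, rhs = [], []
for (f, s), q in judgements.items():
    row = np.zeros(m); row[f], row[s] = 1.0, -1.0
    rows.append(row); rhs.append(q)
A, b = np.array(rows), np.array(rhs)

w, *_ = np.linalg.lstsq(A, b, rcond=None)        # any minimizer of (8.58)
w -= w.mean()                                    # fix the additive degree of freedom

c = np.sqrt(2.0) ** w
c /= c.sum()                                     # normalized weights, formula (8.59)
print(np.round(c, 3))
```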


8.4.4 Fuzzy Extension

A fuzzy extension of MADM methods is desirable because the problem data and the judgemental statements of the decision makers are usually imprecise. This has been explained in subsection 7.3.4, so that it is possible to immediately proceed to the design of a fuzzy extension of the Additive and the Multiplicative AHP. The method starts from the subjective evaluation of n alternatives under an unspecified criterion, typically the point of departure of subsection 7.3.2. It is supposed that both the estimated ratios Ojkd and the associated differences of grades qjkd can be modelled as fuzzy numbers with triangular membership functions. Thus, in what follows the estimated ratio

\[
r_{jkd} = (O_{jkdl},\, O_{jkdm},\, O_{jkdu})
\]

is estimated as well as the difference of grades

\[
q_{jkd} = (q_{jkdl},\, q_{jkdm},\, q_{jkdu}) = (\ln O_{jkdl},\, \ln O_{jkdm},\, \ln O_{jkdu})
\]

whereby the decision maker d expresses his/her comparative judgement of Vj/Vk, the ratio of the subjective values of the alternatives Aj and Ak. The crucial algorithmic step in the Additive and the Multiplicative AHP is the solution of the system of normal equations (8.36), and this system can easily be fuzzified. The right–hand side has fuzzy elements now, but the coefficient matrix remains crisp. The variables are also fuzzy, and it is assumed that they can be written as fuzzy numbers

\[
w_j = (w_{jl},\, w_{jm},\, w_{ju})
\]

with triangular membership functions. Hence, one now has to deal with the fuzzy normal equations

\[
w_j \sum_{k=1,\, k \neq j}^{n} N_{jk} \;-\; \sum_{k=1,\, k \neq j}^{n} N_{jk}\, w_k \;=\; \sum_{k=1,\, k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkd}\,, \qquad j = 1, \ldots, n
\]

It follows easily that the modal values wjm have to satisfy the equations

\[
w_{jm} \sum_{k=1,\, k \neq j}^{n} N_{jk} \;-\; \sum_{k=1,\, k \neq j}^{n} N_{jk}\, w_{km} \;=\; \sum_{k=1,\, k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdm}\,, \qquad j = 1, \ldots, n \qquad (8.60)
\]

These are precisely the normal equations (8.36). The lower values wjl and the upper values wju, however, cannot be solved separately. They jointly have to satisfy the equations

\[
w_{jl} \sum_{k=1,\, k \neq j}^{n} N_{jk} \;-\; \sum_{k=1,\, k \neq j}^{n} N_{jk}\, w_{ku} \;=\; \sum_{k=1,\, k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdl}\,, \qquad j = 1, \ldots, n \qquad (8.61)
\]

\[
w_{ju} \sum_{k=1,\, k \neq j}^{n} N_{jk} \;-\; \sum_{k=1,\, k \neq j}^{n} N_{jk}\, w_{kl} \;=\; \sum_{k=1,\, k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdu}\,, \qquad j = 1, \ldots, n \qquad (8.62)
\]

The equations in the system (8.60) sum up to the zero equation so that they have an additive degree of freedom. Because qjkdl = −qkjdu and Njk = Nkj, it can be shown that the equations (8.61) and (8.62) also sum to the zero equation.


In general, the right–hand side elements of equations (8.61) and (8.62) sum up to zero because

\[
\sum_{j=1}^{n} \sum_{k=1,\, k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdl} \;-\; \sum_{j=1}^{n} \sum_{k=1,\, k \neq j}^{n} \sum_{d \in D_{jk}} q_{kjdl} \;=\; 0
\]

This result is not satisfactory, however. The systems (8.60), (8.61) and (8.62) have at least one degree of freedom each, and these degrees are independent, so that one cannot identify a fuzzy solution unless some additional assumptions are introduced. For a fuzzy version the decision makers would have to supply much more information (the lower and the upper values of the differences of grades) than for a crisp version (the modal values only), but one has the impression that they are mostly unable or unwilling to provide the additional data. It is therefore necessary to simplify the procedure. In order to get a rough idea of how the imprecision of the decision makers affects the final grades and scores of the alternatives, one has to ask them to specify a uniform right–hand and left–hand spread σ which they almost never exceed in the actual decision problem. Now it can be written

\[
q_{jkdu} - q_{jkdm} = q_{jkdm} - q_{jkdl} = \sigma
\]

Thus, the triangular membership functions of the differences of grades are isosceles and they have the same basis length. If it is assumed that the variables have a similar form, albeit with spreads which still have to be determined, one can write the following equation

\[
w_{ju} - w_{jm} = w_{jm} - w_{jl} = \tau_j
\]

Subtracting the equations (8.61) from (8.62) we find that the spreads τj satisfy the system

\[
\tau_j \sum_{k=1,\, k \neq j}^{n} N_{jk} \;+\; \sum_{k=1,\, k \neq j}^{n} N_{jk}\, \tau_k \;=\; \sum_{k=1,\, k \neq j}^{n} \sum_{d \in D_{jk}} \sigma \;=\; \sigma \sum_{k=1,\, k \neq j}^{n} N_{jk}\,, \qquad j = 1, \ldots, n \qquad (8.63)
\]

If it is moreover assumed that the variables have equal spreads τ (a reasonable assumption because the differences of grades have equal spreads as well), the system (8.63) yields the surprisingly simple result τ = σ/2. This enables one to carry out a sensitivity analysis of the final grades and scores in a manner which parallels the procedure for SMART (see subsection 7.4.3). The fuzzy impact grades can be written as

\[
g_{ij} = [\,g_{ijm} - \sigma/2,\; g_{ijm},\; g_{ijm} + \sigma/2\,]
\]

and the fuzzy criterion weights have the form

\[
c_i = c_{im}\left[ (\sqrt{2})^{-\sigma/2},\; 1,\; (\sqrt{2})^{\sigma/2} \right]
\]

where cim is the normalized weight of criterion Ci in the crisp case. The fuzzy final grades of the alternatives can now be written as

\[
s_j = \sum_i c_i\, g_{ij} = \sum_i c_{im} \times \left( (\sqrt{2})^{-\sigma/2},\, 1,\, (\sqrt{2})^{\sigma/2} \right) \times [\,g_{ijm} - \sigma/2,\; g_{ijm},\; g_{ijm} + \sigma/2\,]
\]


Usually, all grades must be between 4 and 10, and this implies that the modal, the lower, and the upper values of the fuzzy final grades are approximately given by

\[
s_{jm} = \sum_i c_{im}\, g_{ijm} \qquad (8.64)
\]

\[
s_{jl} = \max\left[\,4,\; (\sqrt{2})^{-\sigma/2}\,(s_{jm} - \sigma/2)\,\right] \qquad (8.65)
\]

\[
s_{ju} = \min\left[\,10,\; (\sqrt{2})^{\sigma/2}\,(s_{jm} + \sigma/2)\,\right] \qquad (8.66)
\]
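A minimal sketch of the approximation (8.64)–(8.66); the crisp modal grades, the crisp weights and the spread below are hypothetical values used only to show the mechanics.

```python
# Fuzzy final grades from crisp modal data and a uniform spread sigma.
import numpy as np

c_m   = np.array([0.5, 0.3, 0.2])        # crisp criterion weights (assumed)
g_m   = np.array([[8.0, 6.5, 7.0],       # modal impact grades, one row per alternative
                  [6.0, 9.0, 5.5]])
sigma = 1.0                              # uniform spread supplied by the decision maker

s_m = g_m @ c_m                                                           # (8.64)
s_l = np.maximum(4.0, np.sqrt(2.0) ** (-sigma / 2) * (s_m - sigma / 2))   # (8.65)
s_u = np.minimum(10.0, np.sqrt(2.0) ** ( sigma / 2) * (s_m + sigma / 2))  # (8.66)

for j, (lo, mo, up) in enumerate(zip(s_l, s_m, s_u), start=1):
    print(f"A{j}: ({lo:.2f}, {mo:.2f}, {up:.2f})")
```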

8.4.5 Original AHP

The original AHP of Saaty (1977, 1980) has been criticized for various reasons:

• for the so–called fundamental scale to quantify verbal comparative judgement;

• because it estimates the impact scores of the alternatives by the Perron–Frobenius eigenvector of the pairwise–comparison matrix;

• because it calculates the final scores of the alternatives via the arithmetic–mean aggregation rule.

The original AHP is based upon ratio information, for which the proposed algorithmic operations are inappropriate. The critical comments and the supporting evidence are briefly summarized below.

The fundamental scale

For a decision maker there is no difference between the original AHP and the Additive or the Multiplicative AHP as far as the input of the judgemental statements is concerned. Two stimuli are presented to him/her, whereafter he/she is invited to answer the same questions regardless of the method to be used. The difference is in the subsequent quantification of the answers. Table 8.11 shows how the major gradations of the decision maker's comparative preferential judgement are encoded in numerical scales.

The scale of the Multiplicative AHP, based upon psycho–physical arguments, is clearly much longer than the fundamental scale of the original AHP. Several numerical studies, however, show that the final scores are almost insensitive to the length of the scale (Lootsma, 1993). The fundamental scale, neither arithmetic nor geometric, introduces fractions which are not necessarily present in the decision maker's mind. Consider three alternatives Aj, Ak, and Al, for instance, and suppose that the decision maker has a weak preference for Aj with respect to Ak (fundamental–scale value 3) and a weak preference for Ak with respect to Al (the same fundamental–scale value 3). He/she might logically have a strict preference for Aj with respect to Al (transitivity of preference, weak × weak = strict), certainly not the very strong or absolute preference which is suggested by the product of the scale values for weak preference (3 × 3 = 9). Frictions to such an extent do not occur under the Additive AHP (2 + 2 = 4, addition of logarithms) or the Multiplicative AHP (4 × 4 = 16).


Comparative preferential judgement           Original AHP           Additive AHP      Multiplicative AHP
of Aj with respect to Ak                     (estimated ratio of    (difference of    (estimated ratio of
                                             subjective values)     grades)           subjective values)

Very strong preference for Aj                 9                      8                 256
Strong preference for Aj                      7                      6                 64
Strict (definite) preference for Aj           5                      4                 16
Weak preference for Aj                        3                      2                 4
Indifference between Aj and Ak                1                      0                 1
Weak preference for Ak                        1/3                   -2                 1/4
Strict (definite) preference for Ak           1/5                   -4                 1/16
Strong preference for Ak                      1/7                   -6                 1/64
Very strong preference for Ak                 1/9                   -8                 1/256

Table 8.11. Scales of comparisons in the original, the additive and the multiplicative AHP

The Perron-Frobenius eigenvector

It is well–known that Saaty (1977, 1980), considering the positive and reciprocal matrix R with complete pairwise comparisons under a given criterion (one single decision maker), proposed to estimate the impact scores of the alternatives by the components of the eigenvector corresponding to the largest eigenvalue (real and positive by the theorem of Perron and Frobenius). At an early stage, this proposal has been criticized by Johnson et al. (1979), Cogger and Yu (1983), and Takeda et al. (1987). The key issue is the so–called right-left asymmetry: which eigenvector should be used to produce the impact scores of the alternatives under a given criterion, the left or the right eigenvector? Johnson et al. (1979) considered the pairwise–comparison matrix

(4×4 positive reciprocal pairwise–comparison matrix of Johnson et al. (1979); its entries are not legible in this transcript)

The Perron–Frobenius right eigenvector, positive and normalized in the sense that the components sum up to 1, is given by (0.184, 0.152, 0.436, 0.227), so that it provides the rank order A3 > A4 > A1 > A2. The element Ojk in R tells us how strongly the decision maker prefers alternative Aj over Ak, and the components of the right eigenvector stand for the 'degree of satisfaction' with the alternatives. If one rephrases the questions submitted to the decision maker, that is, if one asks how strongly he/she dislikes Aj with respect to Ak, we would logically obtain the transpose of R. The right eigenvector of the transpose of R is the left eigenvector of R itself, and it appears to be given by (0.248, 0.338, 0.105, 0.259). The components represent the 'degree of dissatisfaction' with the alternatives. This leads to the rank order A3 > A1 > A4 > A2, and accordingly to an interchange of the positions of A1 and A4. Note that such a rank reversal does not occur when one uses the geometric row means to compute the impact scores. The geometric column means are the inverses of the corresponding geometric row means.
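The quantities discussed above are easy to compute. The sketch below, for an arbitrary positive reciprocal matrix (a made-up illustration, not the matrix of Johnson et al. (1979), whose entries are not legible in this transcript), extracts the right and the left Perron–Frobenius eigenvectors and the geometric row means, so that the two eigenvector rankings can be compared with the ranking that is free of the right–left asymmetry.

```python
# Right vs. left Perron-Frobenius eigenvector and geometric row means
# for a (hypothetical) positive reciprocal pairwise-comparison matrix.
import numpy as np

R = np.array([[1.0, 3.0, 1/3, 1/2],
              [1/3, 1.0, 1/6, 2/3],
              [3.0, 6.0, 1.0, 2.0],
              [2.0, 3/2, 1/2, 1.0]])

def perron_vector(M):
    """Normalized eigenvector of the largest (Perron-Frobenius) eigenvalue."""
    vals, vecs = np.linalg.eig(M)
    v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return v / v.sum()

right = perron_vector(R)          # 'degrees of satisfaction'
left  = perron_vector(R.T)        # 'degrees of dissatisfaction'
geo   = np.prod(R, axis=1) ** (1.0 / len(R))
geo  /= geo.sum()                 # geometric row means: no right-left asymmetry

print("right eigenvector  :", np.round(right, 3))
print("left eigenvector   :", np.round(left, 3))
print("geometric row means:", np.round(geo, 3))
```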


Saaty and Vargas (1984) tried to settle the matter and to show that the right eigenvector shouldalways be used because it properly captures relative dominance. The evidence is not convincing,however. The basic step in the argumentation is that the (j,k)th element in the mth power of apairwise–comparison matrix represents the total intensity of all walks of length m from a node j

to a node k. The intensity of an m–walk from j to k is the product of the intensities of the arcs inthe walk. In terms of the AHP, the total intensity would be a sum of products of preference ratios.

Aggregation of ratio and difference information

In recent years Barzilai et al. (1994, 1997) analyzed the AHP under the requirement that theresults of the aggregation step should not depend on the order of the computations. The finalscores should be the same, regardless of whether one combines first the pairwise–comparison ma-trices into an aggregate matrix, or whether one computes first the impact scores from the separatepairwise–comparison matrices. The original AHP is based upon ratio information so that it has amultiplicative structure. The geometric–mean aggregation rule appears to be the only rule herewhich satisfies certain consistency axioms (see formula (8.45)). For a variant which is based upondifference information so that it has an additive structure the arithmetic–mean aggregation ruleappears to be the only rule which is compatible with the corresponding consistency axioms (seeformula (8.44)).

The original AHP follows an inappropriate sequence of operations, particularly when it is used in group decision making: geometric–mean computations to synthesize the pairwise comparisons expressed by the individual members into group pairwise comparisons, eigenvector calculations to compute the impact scores of the alternatives, and arithmetic–mean calculations to combine the impact scores into final scores. The dependence on the order of the computations is avoided: in the Multiplicative AHP by using a sequence of geometric–mean calculations (formula (8.55)) and in the Additive AHP by using a sequence of arithmetic–mean computations (formula (8.54)).

Numerical experiments

How do the shortcomings of the original AHP emerge? How do the decision makers notice thatthere is an inappropriate sequence of operations in it? Not by inspection of the final scores be-cause the AHP is remarkably robust: the final scores are almost insensitive to the choice of thealgorithmic operations.

The vivid discussions about the shortcomings of the original AHP were triggered by Belton andGear (1983) who studied the behavior of the method on an artificial numerical example. Theynoted that the addition of a copy of an alternative may change the rank order of the final scoresin a set of consistently assessed alternatives, even when the criteria and the criterion weightsremain the same. This prompted Dyer (1990) to state that the rankings provided by the originalAHP are arbitrary. It is easy to verify, however, that the rank reversal disappears as soon as thearithmetic–mean aggregation is replaced by the geometric–mean aggregation. In the Additiveand the Multiplicative AHP the addition or the deletion of an alternative, whether it is a copy


of another alternative or not, preserves the rank order of the remaining alternatives. These two variants have an even stronger property: the Additive AHP preserves the difference between any two final grades, and the Multiplicative AHP preserves the ratio of any two final scores.

Matters may be clarified by considering the example of Belton and Gear (1983). Table 8.12 shows the original data. There were initially three alternatives, A1, A2, and A3, and three criteria with equal weights. Under each criterion the pairwise comparisons were consistent in the multiplicative sense that Ojk × Okl = Ojl for any triple of elements in the pairwise–comparison matrix (hence the somewhat unusual entries 8/9 and 9/8). The final scores appear to be (0.45, 0.47, 0.08) so that A2 turns out to be the preferred alternative. When a copy A4 of A2 is added to the set, however, the original AHP yields the final scores (0.37, 0.29, 0.06, 0.29), so that it designates A1 to be the leading alternative.

Criterion   Design   A1     A2     A3     A4

C1          A1       1      1/9    1      1/9
            A2       9      1      9      1
            A3       1      1/9    1      1/9
            A4       9      1      9      1

C2          A1       1      9      9      9
            A2       1/9    1      1      1
            A3       1/9    1      1      1
            A4       1/9    1      1      1

C3          A1       1      8/9    8      8/9
            A2       9/8    1      9      1
            A3       1/8    1/9    1      1/9
            A4       9/8    1      9      1

Table 8.12. Comparison in AHP

Table 8.13 shows that the computations in the Additive and the Multiplicative AHP are much simpler because one can use the aggregate matrix to find the final grades and scores. First, note that the pairwise–comparison matrices are consistent in the additive sense, so that the aggregate matrix is also consistent. One therefore needs the top row only; the remaining rows do not provide any additional information. It follows easily now that the final grades of A1 and A2 have the difference -0.333, the final grades of A1 and A3 have the difference 5, etc. Hence, the alternatives A1, A2, and A3 have the normalized final scores (0.436, 0.550, 0.014). When A4 is added to the set, the normalized final scores are (0.281, 0.355, 0.009, 0.355). No rank reversal! The ratio of any two final scores is preserved.

The aggregate matrix is a powerful instrument for the analysis via the Additive and the Multi-plicative AHP, but it does not make sense in the original AHP. With the data of Table 8.12 itwould not even be reciprocal so that the theorem of Perron and Frobenius does not apply.


Criterion   Design   A1        A2        A3        A4

C1          A1        0        -8         0        -8
            A2        8         0         8         0
            A3        0        -8         0        -8
            A4        8         0         8         0

C2          A1        0         8         8         8
            A2       -8         0         0         0
            A3       -8         0         0         0
            A4       -8         0         0         0

C3          A1        0        -1         7        -1
            A2        1         0         8         0
            A3       -7        -8         0        -8
            A4        1         0         8         0

Aggregate   A1        0        -0.333     5        -0.333
            A2        0.333     0         5.333     0
            A3       -5        -5.333     0        -5.333
            A4        0.333     0         5.333     0

Table 8.13. Comparison with rescaled data in Additive AHP

The key question is the following one, however. Given the assumption that the criteria and the criterion weights do not change, could one legitimately expect rank reversal by the addition or the deletion of copies of an alternative? We do not think so. Rank reversal cannot logically be expected under such circumstances, so that the example of Belton and Gear (1983) provides the strong warning that the original AHP should be used with considerable caution.
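For the reader who wishes to verify the figures quoted above, the following sketch recomputes the Belton–Gear example: the original AHP from the matrices of Table 8.12 (Perron eigenvectors and arithmetic–mean aggregation with equal criterion weights), and the Multiplicative AHP from the rescaled grade differences of Table 8.13 (arithmetic aggregation of the grades, scores proportional to 2 raised to the grade). The two small helper functions are an illustrative implementation, not taken from the text.

```python
import numpy as np

def original_ahp(matrices):
    """Arithmetic mean of the normalized Perron eigenvectors (equal criterion weights)."""
    scores = []
    for R in matrices:
        vals, vecs = np.linalg.eig(R)
        v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
        scores.append(v / v.sum())
    return np.mean(scores, axis=0)

def multiplicative_ahp(grade_matrices):
    """Final grades from the aggregate grade-difference matrix; scores proportional to 2**grade."""
    Q = np.mean(grade_matrices, axis=0)   # equal criterion weights
    g = Q.mean(axis=1)                    # relative final grades (additively consistent data)
    s = 2.0 ** g
    return s / s.sum()

# Pairwise comparisons of Table 8.12 (original 1-9 scale) ...
C1 = np.array([[1, 1/9, 1, 1/9], [9, 1, 9, 1], [1, 1/9, 1, 1/9], [9, 1, 9, 1]])
C2 = np.array([[1, 9, 9, 9], [1/9, 1, 1, 1], [1/9, 1, 1, 1], [1/9, 1, 1, 1]])
C3 = np.array([[1, 8/9, 8, 8/9], [9/8, 1, 9, 1], [1/8, 1/9, 1, 1/9], [9/8, 1, 9, 1]])
# ... and of Table 8.13 (rescaled differences of grades).
Q1 = np.array([[0, -8, 0, -8], [8, 0, 8, 0], [0, -8, 0, -8], [8, 0, 8, 0]])
Q2 = np.array([[0, 8, 8, 8], [-8, 0, 0, 0], [-8, 0, 0, 0], [-8, 0, 0, 0]])
Q3 = np.array([[0, -1, 7, -1], [1, 0, 8, 0], [-7, -8, 0, -8], [1, 0, 8, 0]])

print(np.round(original_ahp([M[:3, :3] for M in (C1, C2, C3)]), 2))        # ~ (0.45, 0.47, 0.08)
print(np.round(original_ahp([C1, C2, C3]), 2))                             # A1 now leads: rank reversal
print(np.round(multiplicative_ahp([Q[:3, :3] for Q in (Q1, Q2, Q3)]), 3))  # ~ (0.436, 0.550, 0.014)
print(np.round(multiplicative_ahp([Q1, Q2, Q3]), 3))                       # no rank reversal
```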

8.5 ELECTRE Systems

The ELECTRE systems are central to the French school in MADM where a complete or in-complete rank order of the alternatives is built up via outranking relations under the individualcriteria. In the pairwise comparison step the outranking relation between two alternatives undera given criterion is established by inspection of the difference between the physical or monetaryvalues expressing the performance of the respective alternatives. The key question is to find cer-tain discrimination thresholds to categorize the differences. The indifference, preference, and vetothresholds in ELECTRE III constitute the basis for two fuzzy concepts: the degree of concordance(the degree of agreement or harmony with the statement that the first alternative in the pair isat least as good as the second), and the degree of discordance (the degree of disagreement withthe above statement). A so–called distillation procedure will eventually produce a not necessarilycomplete rank order of the alternatives. The French school is based upon the idea of construc-tivism which implies that a coherent system of preferences and values is not necessarily presentin the decision maker’s mind at the beginning of the decision process. It may be constructed,


however, by the decision maker and the analyst together in the course of the process. It will be seen that the elicitation of the discrimination thresholds can be simplified considerably when the performance of the alternatives is expressed in SMART grades.

8.5.1 Discrimination Thresholds

In the series of ELECTRE systems (the acronym stands for ELimination Et Choix Traduisant la REalité), ELECTRE III was the first with fuzzy concepts incorporated in it. Lootsma and Schuyt (1997) extensively used it for a comparative study involving the AHP and SMART as well. For the basic concepts of the French school, the reader can refer to Roy (1985) and Roy and Bouyssou (1993). The establishment of an outranking relation proceeds via a pairwise comparison of two alternatives Aj and Ak under some criterion. It is assumed that the performance of the alternatives is expressed in physical or monetary values φj = φ(Aj) and φk = φ(Ak). Increasing values are supposed to generate an increasing strength of preference. When φj < φk, φj is taken to stand for the reference value and the strength of preference for Ak is categorized with respect to Aj by inspection of certain inequalities. If

φk ≥ φj + νj(φj) (8.67)

where νj(φj) is the so–called veto threshold, then the preference for Ak over Aj is predominant. It cannot be reversed by an excellent performance of Aj under the remaining criteria. In general, the veto threshold is a positive and non–decreasing function of the physical or monetary value of the corresponding alternative. If

φj + pj(φj) ≤ φk ≤ φj + νj(φj) (8.68)

where pj(φj) is the so–called preference threshold , then Ak is strictly preferred over Aj . Onecan also say that Ak is situated in the strict–preference zone at the right–hand side of Aj . Ingeneral, the preference threshold is also a positive and non–decreasing function of the physical ormonetary value of the corresponding alternative. If

φj + qj(φj) ≤ φk ≤ φj + pj(φj) (8.69)

where qj(φj) is the so–called indifference threshold , then Ak is weakly preferred over Aj . In otherwords, Ak is situated in the weak–preference zone at the right–hand side of Aj . The indifferencethreshold is also positive and non–decreasing. Finally, if

φj ≤ φk ≤ φj + qj(φj) (8.70)

then the decision maker is indifferent between the two alternatives.

When φk ≤ φj the position of the two alternatives is interchanged: Ak is taken as the reference alternative and the strength of preference for Aj is judged on the basis of the veto, preference, and indifference thresholds at the right–hand side of Ak.
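A small sketch of the categorization (8.67)–(8.70); the threshold functions below are hypothetical (simply proportional to the reference value) and serve only to show the mechanics of the pairwise comparison.

```python
# Classify the preference relation between two alternatives under one criterion.
def thresholds(phi_ref, q=0.05, p=0.15, v=0.40):
    # assumed indifference, preference and veto thresholds, proportional to phi_ref
    return q * phi_ref, p * phi_ref, v * phi_ref

def compare(phi_j, phi_k):
    """Preference of the better-performing alternative over the reference one."""
    ref, other = (phi_j, phi_k) if phi_j <= phi_k else (phi_k, phi_j)
    q, p, v = thresholds(ref)
    diff = other - ref
    if diff >= v:
        return "predominant preference (veto threshold passed)"   # (8.67)
    if diff >= p:
        return "strict preference"                                # (8.68)
    if diff >= q:
        return "weak preference"                                  # (8.69)
    return "indifference"                                         # (8.70)

print(compare(20.0, 21.5))   # weak preference for these assumed values
print(compare(20.0, 24.0))   # strict preference
print(compare(30.0, 20.0))   # reference interchanged; here the veto threshold is passed
```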

Using this information the analyst can construct the degree of concordance or harmony h(Aj ,Ak)with the statement that Aj outranks Ak. This is a function of the difference φ(Ak) − φ(Aj).


Figure 8.22 shows that it has the form of a membership function. It illustrates the degree of concordance or harmony with the judgemental statement that the alternative Aj outranks (is at least as good as) Ak. The degree of concordance decreases linearly from 1 to 0 over the interval between the indifference threshold and the preference threshold: it leaves the top level as soon as φk has passed the indifference threshold and arrives at the bottom level as soon as φk has reached the preference threshold.

Figure 8.22. Degree of concordance or harmony

On the basis of similar arguments the analyst can construct the degree of discordance d(Aj ,Ak).The function is shown in graphical form in Figure 8.23 with the degree of discordance with thejudgemental statement that the alternative Aj outranks (is at least as good as) Ak. The degreeof discordance increases linearly from the bottom level as φk has passed the preference thresholdand it arrives at the top level as soon as φk has reached the veto threshold.

Figure 8.23. Degree of discordance

In the formulas (8.67) through (8.70) thresholds are found only on the right–hand side of the reference alternative Aj. If an indifference threshold is now considered on the left–hand side, it should clearly have the property that the decision maker is indifferent between Aj and Ak if

φj − q′j(φj) ≤ φk ≤ φj (8.71)

As soon as the variable φk coincides with the point where the transition from indifference to weakpreference occurs it has to satisfy the equality

φk = φj − q′j(φj) (8.72)

but at such a coincidence it must also be true that

φj = φk + qk(φk) (8.73)

This implies that the indifference thresholds on the right–hand side and the left–hand side arenot independent (Roy, 1985). The same conclusion can be drawn, on similar grounds, for thepreference and the veto thresholds.


If the indifference thresholds on the right–hand side can be written in the form

\[
q_j(\varphi_j) = q \times \varphi_j \qquad \mbox{for any } j = 1, \ldots, n
\]

where q is a proportionality factor which does not depend on the alternatives (it is typical for the criterion under consideration only), then relations (8.72) and (8.73) yield

\[
q'_j(\varphi_j) = \frac{q}{1 + q} \times \varphi_j \qquad \mbox{for any } j = 1, \ldots, n
\]

One does not have to specify the thresholds around φj in relation to φj only. If the indifference threshold on the right–hand side has the more general form

\[
q_j(\varphi_j) = q \times (\varphi_j - \varphi_{min}) \qquad (8.74)
\]

where φmin is the lower endpoint of the range of acceptable performance data under the given criterion, then

\[
q'_j(\varphi_j) = \frac{q}{1 + q} \times (\varphi_j - \varphi_{min}) \qquad (8.75)
\]

Similarly, if the indifference threshold on the right–hand side is proportional to the deviation from the upper endpoint φmax of the range of acceptable performance data, so that

\[
q_j(\varphi_j) = q \times (\varphi_{max} - \varphi_j) \qquad (8.76)
\]

then one can easily derive

\[
q'_j(\varphi_j) = \frac{q}{1 - q} \times (\varphi_{max} - \varphi_j) \qquad (8.77)
\]

In choosing the indifference thresholds with the above formulas, there is an opportunity to merge the ideas of the AHP, SMART, and ELECTRE. In order to anchor the discrimination thresholds in the model of the decision problem, they are linked to the endpoints of the range of acceptable performance data. Recall that the relative preference was identified with the ratio of the deviations from the non–desired endpoint φmin of the range of acceptable performance data (the maximum speed of vessels with φmin = 14 kn, for instance) or with the inverse ratio of the deviations from the desired target φmax (vessels under the operability criterion with φmax = 100%, for instance). Furthermore, the ratio 4:1 was identified with weak preference and the ratio 16:1 with strict or definite preference. There is a hesitation between indifference and weak preference when the ratio is roughly 2:1. This observation can now fruitfully be employed. Using the formulas (8.74) and (8.75) with q = 1, indifference thresholds are obtained at the points φj − (φj − φmin)/2 and φj + (φj − φmin). The position of these thresholds is shown in Figure 8.24.


Figure 8.24. Indifference thresholds around φj with respect to the non–desired endpoint of the range of acceptable performance data (maximum speed of vessels, for instance)

Similarly, using the formulas (8.76) and (8.77) with q = 1/2 the indifference thresholds are foundat φj − (φmax − φj) and φj + (φmax − φj)/2 (see Figure 8.25).

The preference thresholds may be found in the same way. Recall that the transition from weak to strict or definite preference occurs when the ratio of the deviations from the desired or the opposite endpoint of the range equals 8:1. In the next subsection the reader will see that the discrimination thresholds can be identified in a much simpler way as soon as the physical or monetary values of the alternatives have been converted into grades on an arithmetic scale.

Figure 8.25. Indifference thresholds around φj with respect to the desired target at the end of the range of acceptable performance data (operability of vessels, for instance)

Under a qualitative criterion the performance of the alternatives cannot be expressed in physical or monetary values, but it can usually be expressed on a numerical scale such as 1, 2, . . . , 7, or on the SMART scale 4, 5, . . . , 10. The subsequent identification of the discrimination thresholds will be explained in detail in the next subsection. The decision maker may also come to the conclusion that the two alternatives are incomparable under the given criterion, so that an outranking relation cannot be established.

The aggregation procedure of ELECTRE III is now briefly sketched. It is much more complicated than the aggregation procedure of the AHP and SMART, so that a complete description is omitted. First some additional notation is needed. The degree of concordance and the degree of discordance of alternative Aj versus Ak under the ith criterion will be denoted by hi(Aj, Ak) and di(Aj, Ak), respectively. There is also an importance factor (sometimes referred to as the voting power) ki associated with the ith criterion. The degree of global concordance or harmony with the statement that Aj outranks Ak is defined as

\[
H(A_j, A_k) = \frac{\sum_{i=1}^{m} k_i\, h_i(A_j, A_k)}{\sum_{i=1}^{m} k_i}
\]


The discordance information is used to weaken the degree of global concordance, but one only employs the degree of discordance under the criteria such that

\[
d_i(A_j, A_k) > H(A_j, A_k)
\]

The degree of credibility that alternative Aj outranks Ak is now given by

\[
H(A_j, A_k) \times \prod_i \frac{1 - d_i(A_j, A_k)}{1 - H(A_j, A_k)}
\]

where the product runs over the criteria just mentioned. If such criteria cannot be found, the degree of credibility is set to the degree of global concordance. Obviously, the degree of credibility equals zero if there is a criterion where the veto threshold at the right–hand side of Aj has been passed by Ak, so that the degree of discordance equals one. The matrix of the degrees of credibility is finally used to rank the alternatives. The rank order may be incomplete, which implies that for some pairs of alternatives there is not enough evidence to establish a preference relation.
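The aggregation step just described can be sketched as follows for a single ordered pair of alternatives; the importance factors and the per-criterion degrees of concordance and discordance below are made-up values used only for illustration.

```python
# Global concordance and degree of credibility for one ordered pair (A_j, A_k).
import numpy as np

k = np.array([3.0, 2.0, 1.0])     # importance factors ('voting power'), assumed
h = np.array([1.0, 0.6, 0.2])     # degrees of concordance h_i(A_j, A_k), assumed
d = np.array([0.0, 0.1, 0.8])     # degrees of discordance d_i(A_j, A_k), assumed

H = (k * h).sum() / k.sum()       # degree of global concordance

# Weaken H only by the criteria whose discordance exceeds the global concordance.
mask = d > H
credibility = H * np.prod((1.0 - d[mask]) / (1.0 - H)) if mask.any() else H

print(round(H, 3), round(credibility, 3))
```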

ELECTRE is clearly full of fuzzy information but it is difficult to analyze.

8.6 Fuzzy Multiobjective Optimization

Many features of real-life single-objective optimization problems are imprecise. The values ofthe coefficients are sometimes merely prototypical, the requirement that the constraints must besatisfied may be somewhat relaxed, and the decision makers are not always very satisfied with thevalue attained by the objective function. Multiobjective Optimization introduces a new feature:the degrees of satisfaction with the objective–function values play a major role because they en-able the decision makers to control the convergence towards an acceptable compromise solution.Since the objective functions have different weights for the decision maker we also have to controlthe computational process via weighted degrees of satisfaction.

Multiobjective optimization has two subfields: (i) the identification of the nondominated solu-tions, and (ii) the selection of a nondominated solution where the objective-function values arefelt to be in a proper balance. The first–named subfield can be studied in the splendid isolationof mathematical research. From the early days in multiobjective optimization it has attracted Yuand Zeleny (1975) who provided a substantial contribution to the characterization of nondomi-nated (efficient, Pareto–optimal) solutions. The second subfield, however, straddles the boundarybetween mathematics and other disciplines because human subjectivity is an integral part of theselection process. Certain parameters (weights, targets, desired levels) are adjusted on the ba-sis of new preference information, whereafter the computations proceed in a somewhat modifieddirection. Several methods optimize a so–called scalarizing function, that is, a particular combi-nation of the objective functions which also contains a set of weights to control the computationalprocess. It is not always clear, however, how the weights could be used in order to obtain a rapid‘convergence’ towards an acceptable compromise solution.


Therefore a fuzzy concept is introduced: the degree of satisfaction with an objective function, a quantity between 0 and 1 expressing the position of the objective–function value between the ideal and the nadir value. In order to arrive at a non-dominated solution where the objective functions are reasonably balanced, a weighted geometric mean of the degrees of satisfaction is maximized. Numerically, this is equivalent to the maximization of a weighted geometric mean of the deviations from the nadir values. The composite function so obtained has a particular property: the relative substitution rate between any two function values along an indifference curve is equal to the ratio of the corresponding weights, regardless of the performance of the alternatives under the remaining objectives and regardless of the units of performance measurement. Some numerical experiments, which are illustrated below, will demonstrate that the minimization of the weighted Chebyshev–norm distance and the maximization of the weighted degrees of satisfaction produce roughly the same nondominated solution, although deviations from the ideal values are minimized in the first–named approach whereas deviations from the nadir values are maximized in the second approach. Thus, there are at least two approaches which seem to process the concept of the relative importance of the objective functions in a usable manner.

8.6.1 Ideal and Nadir Values

The multiobjective optimization problem is concerned with maximizing the concave objective functions

fi(x) , i = 1, . . . , p

over the set C of points satisfying the constraints

gi(x) ≥ 0 , i = 1, . . . , m

with concave constraint functions gi defined on the n–dimensional vector space En, so that C is a convex subset of En. In addition, it is assumed that C is closed and bounded so that there is a maximum solution for each objective function separately. For ease of exposition it is assumed that each objective function has a unique maximum solution over C. Such a point is accordingly nondominated because a deviation from it will reduce at least one of the objective functions. The maximum solution of fi will be denoted by x^i.

It is customary in multiobjective optimization to consider not only the n–dimensional decisionspace En of x–vectors but also the p–dimensional objective space Ep of z–vectors which containsthe set f(C) of vectors z = f(x), x ∈ C. The symbol f obviously denotes the mapping fromEn into Ep with the components fi, i = 1, . . . ,p. The original problem can now equivalently berestated as the problem of maximizing the components of z subject to the constraints z ∈ f(C).In the linear case, when the problem functions f1, . . . ,fp, g1, . . . ,gm are linear, the sets C andf(C) are both simplices. When the problem functions are concave and non–linear, however, onecannot in general guarantee that f(C) is convex.

It is useful in multiobjective optimization to calculate the so–called single–objective maximum solutions x^i because the ideal values


\[
z_i^{max} = f_i(x^i)\,, \qquad i = 1, \ldots, p \qquad (8.78)
\]

show the decision maker how far one could go with each objective function separately. The decision maker may even decide, before the analysis is continued, to relax certain constraints when some of the ideal values are still rather low, or to introduce new constraints guaranteeing that some objective functions will remain above a certain level, whereafter the ideal values are recalculated. The ideal values are unique, even if the single–objective maximum solutions x^i are not. The ideal vector z^max with the components z_i^max, i = 1, . . . , p, is normally outside f(C). Otherwise there would not be a real problem for the decision maker. An indication of the worst possible outcome for the respective objective functions is given by the so–called nadir values

\[
z_i^{min} = \min_{j=1,\ldots,p}\, f_i(x^j)\,, \qquad i = 1, \ldots, p \qquad (8.79)
\]

By the assumed uniqueness of the single-objective maximum solutions it must be true that the nadir values are also unique and that

\[
z_i^{max} > z_i^{min}\,, \qquad i = 1, \ldots, p
\]

The nadir vector z^min with the components z_i^min, i = 1, . . . , p, does not necessarily belong to f(C), so that it may be too pessimistic about the possible variations of the corresponding objective function. Figure 8.26 sketches the position of the ideal and the nadir vector in a problem with two objective functions. The objective space is accordingly two–dimensional, and the directions of optimization are parallel to the coordinate directions.
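A minimal sketch of the computation of the ideal and the nadir values (8.78)–(8.79) for a small, assumed two-objective problem; the SciPy solver and the problem data are illustrative choices, not part of the text.

```python
# Ideal and nadir values from the single-objective maximum solutions.
import numpy as np
from scipy.optimize import minimize

objectives = [lambda x: x[0], lambda x: x[1]]                          # f_i to be maximized
constraints = [{"type": "ineq", "fun": lambda x: 1.0 - x[0] ** 2 - 2.0 * x[1] ** 2}]
x0 = np.array([0.1, 0.1])

# Single-objective maximum solutions x^i (maximize f_i = minimize -f_i).
solutions = [minimize(lambda x, f=f: -f(x), x0, constraints=constraints).x
             for f in objectives]

ideal = np.array([f(solutions[i]) for i, f in enumerate(objectives)])   # (8.78)
nadir = np.array([min(f(xj) for xj in solutions) for f in objectives])  # (8.79)
print("ideal:", np.round(ideal, 3), " nadir:", np.round(nadir, 3))
```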

Figure 8.26. Ideal and nadir vector in a two–dimensional objective space

8.6.2 Weighted Chebyshev–Norm Distance Functions

The leading idea in the original method of multiobjective optimization is to solve the problem of minimizing the weighted Chebyshev–norm distance function

\[
\max_{i=1,\ldots,p}\; w_i\,\{z_i^{max} - f_i(x)\} \qquad (8.80)
\]

over the constraint set C, with weight coefficients wi.


This problem can easily be rewritten as the problem of minimizing a new variable y subject to the constraints

\[
y \;\geq\; w_i\,\{z_i^{max} - f_i(x)\}\,, \qquad i = 1, \ldots, p\,, \quad x \in C \qquad (8.81)
\]

The idea to minimize the distance function has been generalized in the reference–point method of Wierzbicki (1980). If the jth and the kth constraint of (8.81) are active at a point (x, y) minimizing y over the set defined by (8.81), then

\[
w_j\,\{z_j^{max} - f_j(x)\} \;=\; w_k\,\{z_k^{max} - f_k(x)\} \;=\; y \qquad (8.82)
\]

and this leads to

\[
\frac{z_j^{max} - f_j(x)}{z_k^{max} - f_k(x)} = \frac{w_k}{w_j}
\]

Hence, the deviations of the jth and the kth objective–function values from the corresponding ideal values are inversely proportional to the weights. Practical experience has shown that this is an attractive property when attempts are made to control the approach towards an acceptable compromise solution, although a three–dimensional linear example is sufficient to show that a point x minimizing the distance function (8.80) over C is not necessarily efficient and not necessarily unique.
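The min–max reformulation (8.80)–(8.81) can be sketched as follows for the same assumed toy problem as in the previous sketch; the auxiliary variable y is simply appended to the vector of decision variables, and the weights are hypothetical.

```python
# Minimize y subject to y >= w_i (z_i^max - f_i(x)) and x in C.
import numpy as np
from scipy.optimize import minimize

objectives = [lambda x: x[0], lambda x: x[1]]
z_max = np.array([1.0, 1.0 / np.sqrt(2.0)])          # ideal values of the toy problem
w = np.array([2.0, 1.0])                              # assumed weight coefficients

def augmented(v):            # v = (x_1, x_2, y); the objective is the last component y
    return v[-1]

cons = [{"type": "ineq", "fun": lambda v: 1.0 - v[0] ** 2 - 2.0 * v[1] ** 2}]
cons += [{"type": "ineq",
          "fun": lambda v, i=i, f=f: v[-1] - w[i] * (z_max[i] - f(v[:-1]))}
         for i, f in enumerate(objectives)]

res = minimize(augmented, np.array([0.1, 0.1, 1.0]), constraints=cons)
x, y = res.x[:-1], res.x[-1]
print("compromise x:", np.round(x, 3), " y:", round(float(y), 3))
```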

In order to avoid the danger of generating solutions which are not efficient, Wierzbicki (1980) proposed to add a perturbative term of the form

\[
\sum_{i=1}^{p} \varepsilon_i\,\{z_i^{max} - f_i(x)\}
\]

to the distance function (8.80), with small positive numbers εi. It is beyond the scope of the present volume, however, to discuss the choice of these numbers and the usefulness of the above perturbation.

The decision maker can express his/her reluctance to deviate from the components of the ideal vector by a proper choice of the weights wi. This feature has extensively been explored by Kok et al. (1985) and Kok (1986) in experiments with a long–term energy–planning model for the national economy of The Netherlands. In order to explain the choice of the weights, we first rewrite them in the form

\[
w_i = \frac{\rho_i}{z_i^{max} - z_i^{min}}
\]

The decision maker is then requested to estimate the ρi via pairwise comparisons, in fact a procedure which is highly similar to the mode of operation described in Chapter 5. In the basic step, the jth and the kth objective function are presented to the decision maker, whereafter he/she is asked to specify the ratio which is acceptable for the deviations from the ideal vector. Thus, the decision maker is supposed to estimate the acceptable ratio of ρj and ρk. In principle, these questions can be rather precise. The analyst can ask him/her whether a 10% deviation from the


ideal value z_j^max in the direction of the nadir value z_j^min is equivalent to a 10% deviation from z_k^max in the direction of z_k^min. If the answer is 'yes' and if the decision maker declares that he/she is indifferent between 25% deviations in both directions and also between 50% deviations, then the ratio ρj/ρk can reasonably be estimated by the value of 1. The analyst can also vary the percentages: if the decision maker is indifferent between a 10% deviation from the ideal value z_j^max in the direction of the nadir value z_j^min and a 50% deviation from z_k^max in the direction of z_k^min, then the ratio ρj/ρk can be estimated by the value of 5. The analyst clearly takes the inverse ratio of the deviations because a higher weight corresponds to a higher reluctance to deviate from the ideal value. Finally, when all or almost all pairs of objective functions have been considered, the matrix R of ratio estimates provides the analyst with the information to calculate a set of values for the ρi. A detailed description of the procedure may be found in Section 5.2. When the pairwise comparisons are complete, the weights can be estimated by the geometric row means of R. The weights are not unique since the analyst collected ratio information only. There is a multiplicative degree of freedom which can be used to normalize them in the sense that they add up to 1 or to 100%.

Substituting these weights into (8.82), the analyst finds that at the minimizing point the relative deviations from the ideal values satisfy

\[
\frac{z_j^{max} - f_j(x)}{z_j^{max} - z_j^{min}} \,\Big/\, \frac{z_k^{max} - f_k(x)}{z_k^{max} - z_k^{min}} \;=\; \frac{\rho_k}{\rho_j} \qquad (8.83)
\]

8.6.3 Weighted Degrees of Satisfaction

For each feasible solution x there is a vector (f1(x), . . . , fp(x)) of objective-function values expressing the performance of x under the respective objectives. The degree of satisfaction µi(x) with the solution x under the ith objective is defined by

\[
\mu_i(x) = \frac{f_i(x) - z_i^{min}}{z_i^{max} - z_i^{min}}
\]

an expression which increases monotonically from zero to one when fi(x) increases from the nadir value to the ideal value. If the degree of satisfaction is defined to be zero below the nadir value and one above the ideal value, it has the form of a membership function. The global degree of satisfaction is now deemed to be given by the weighted geometric mean

\[
\prod_{i=1}^{p} \mu_i(x)^{c_i} = \prod_{i=1}^{p} \left( \frac{f_i(x) - z_i^{min}}{z_i^{max} - z_i^{min}} \right)^{c_i} \qquad (8.84)
\]

where the ci, i = 1, . . . , p, stand for normalized weights assigned to the objective functions. The problem is to maximize function (8.84) over the set

\[
\left\{\, x \;|\; f_i(x) \geq z_i^{min}\,, \; i = 1, \ldots, p\,, \; x \in C \,\right\} \qquad (8.85)
\]

The logarithm of (8.84) is concave over the set (8.85), so that any local maximum of (8.84) is also a global maximum. Moreover, the function (8.84) depends monotonically on the objective–function values, so that any maximum solution is non–dominated (efficient, Pareto–optimal).


In order to analyze the effect of the weights, the function (8.84) is somewhat generalized and the geometric mean of the deviations from the nadir values is considered, which is defined by

\[
F(z) = \prod_{i=1}^{p} \beta_i\, (z_i - z_i^{min})^{c_i} \qquad (8.86)
\]

where the βi represent arbitrary positive factors which are due to the choice of the units of performance measurement. The first–order partial derivatives of F are given by

\[
\frac{\partial F}{\partial z_i} = \frac{c_i}{z_i - z_i^{min}}\, F
\]

whence

\[
\frac{1}{z_j - z_j^{min}}\frac{\partial F}{\partial z_k} \,\Big/\, \frac{1}{z_k - z_k^{min}}\frac{\partial F}{\partial z_j} \;=\; \frac{c_k}{c_j} \qquad (8.87)
\]

for arbitrary j and k, and regardless of the factors β1, . . . , βp. It is now possible to study the behavior of F along a contour or indifference curve in the (zj, zk) space (see also subsection 5.3 where the behavior of the geometric–mean aggregation rule was studied). In a first–order approximation a move towards alternative points for which the decision maker is indifferent (in the sense that they have the same global degree of satisfaction) proceeds in a direction which is orthogonal to the gradient of F, that is, in the direction

\[
\left( \frac{\partial F}{\partial z_k}\,,\; -\frac{\partial F}{\partial z_j} \right)
\]

The left–hand side of (8.87) is now defined as the relative substitution rate, since it is based on the observation that human beings generally perceive relative gains and losses, that is, gains and losses in relation to the level from which the move starts. Thus, when a small step is made along the indifference curve, the relative gain (or loss) in the zj–direction is proportional to

\[
\frac{1}{z_j - z_j^{min}}\,\frac{\partial F}{\partial z_k}
\]

and the corresponding relative loss (or gain) in the zk–direction is proportional to

\[
\frac{1}{z_k - z_k^{min}}\,\frac{\partial F}{\partial z_j}
\]

For a function F of the form (8.86) the substitution rate between the relative gains and losses in the (zj, zk) space (the left–hand side of (8.87)) is a constant which does not depend on the values of the remaining variables. Since it is also a dimensionless quantity which does not depend on the units of measurement either, it can meaningfully be referred to as a model for the relative importance of the objective functions. In fact, formula (8.87) presents an inverse proportionality: if ck > cj then a larger step in the zj–direction is compensated by a smaller step in the zk–direction. This is just what one may expect when objective functions have different weights in the decision maker's mind.


It is easy to see that the maximization of a weighted arithmetic mean of the objective functions such as

\[
\sum_{i=1}^{p} c_i\, f_i(x)
\]

although it seems to be a popular method for solving multiobjective optimization problems, hasthe disadvantage that the maximum solution (a non–dominated solution) depends strongly onthe units of performance measurement. The decision makers who choose the weights ci are notalways aware of this. Hence, their information is meaningless if a weighted arithmetic mean ofthe objective functions is used as a scalarizing function. One could avoid the dependence on theunits of performance measurement by the maximization of the function

\[
\sum_{i=1}^{p} c_i\, \frac{f_i(x) - z_i^{min}}{z_i^{max} - z_i^{min}}
\]

but it is unclear why one should ever employ the ideal and the nadir values in this function, and not in the weighted Chebyshev–norm distance function which enables the user to control the deviations from the ideal values.

8.6.4 Numerical Example

In order to illustrate the two methods, minimization of the weighted Chebyshev–norm distance function and maximization of the weighted degrees of satisfaction, a simple numerical example is first considered: the multiobjective optimization problem of maximizing the objective functions x1, x2, and x3 subject to the constraint

\[
a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 \;\leq\; 1
\]

with positive coefficients in the left–hand side. The single–objective maximum solutions are given by the points

\[
\left( \frac{1}{\sqrt{a_1}},\, 0,\, 0 \right), \qquad \left( 0,\, \frac{1}{\sqrt{a_2}},\, 0 \right), \qquad \left( 0,\, 0,\, \frac{1}{\sqrt{a_3}} \right)
\]

respectively.

The ideal vector is given by

\[
\left( \frac{1}{\sqrt{a_1}},\, \frac{1}{\sqrt{a_2}},\, \frac{1}{\sqrt{a_3}} \right)
\]

and the nadir vector is the origin (0, 0, 0). The problem of finding a feasible solution where the weighted Chebyshev–norm distance from the ideal vector is minimized can be formulated here as the problem of minimizing the variable y subject to the constraints

\[
y \;\geq\; w_i \left( \frac{1}{\sqrt{a_i}} - x_i \right), \qquad i = 1, 2, 3
\]

\[
a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 \;\leq\; 1
\]


With the weights rewritten as wi = ρi √ai the problem is to minimize y subject to

\[
y \;\geq\; \rho_i\, (1 - x_i \sqrt{a_i})\,, \qquad i = 1, 2, 3
\]

\[
a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 \;\leq\; 1
\]

Using the Kuhn–Tucker conditions for optimality one can easily show that all constraints are active at an optimal solution (x, y) if the weights are not too small. Then

\[
x_i = \frac{1}{\sqrt{a_i}} \left( 1 - \frac{y}{\rho_i} \right)
\]

whereas y can be solved from the quadratic equation

\[
\sum_{i=1}^{3} \left( 1 - \frac{y}{\rho_i} \right)^{2} = 1 \qquad (8.88)
\]

The relative deviations from the ideal values (relative in the sense that they are given as fractions of the deviations between the corresponding ideal and nadir values) are given by

\[
\left[ \frac{1}{\sqrt{a_i}} - \frac{1}{\sqrt{a_i}} \left( 1 - \frac{y}{\rho_i} \right) \right] \Big/ \frac{1}{\sqrt{a_i}} \;=\; \frac{y}{\rho_i}\,, \qquad i = 1, 2, 3
\]

These deviations do not depend on the coefficients ai. If one of the weights is so small that equation (8.88) does not have a real solution, the corresponding component of x must be set to zero, whereafter the remaining components can be calculated from a reduced set of Kuhn–Tucker conditions.

The maximization of the weighted degrees of satisfaction

\[
\prod_{i=1}^{3} x_i^{c_i}
\]

subject to the constraint

\[
a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 \;\leq\; 1
\]

with normalized weights ci yields as the unique solution

\[
x_i = \sqrt{\frac{c_i}{a_i}}\,, \qquad i = 1, 2, 3
\]

so that the relative deviations from the ideal values are given by

\[
\left( \frac{1}{\sqrt{a_i}} - \frac{1}{\sqrt{a_i}}\,\sqrt{c_i} \right) \Big/ \frac{1}{\sqrt{a_i}} \;=\; 1 - \sqrt{c_i}\,, \qquad i = 1, 2, 3
\]

The relative deviations for a few sets of weights are now calculated in order to illustrate how the two methods control the computational process. Taking ρ1 = c1 = 0.6 and ρ2 = ρ3 = c2 = c3 = 0.2, the following results are obtained:


minimization of weighted Chebyshev–norm distance      0.19  0.58  0.58
maximization of weighted degrees of satisfaction      0.23  0.55  0.55

Note that formula (8.83) is satisfied by the relative deviations from the ideal values when the weighted Chebyshev–norm distance function is minimized: the desired ratios of the deviations (the inverted ratios of the corresponding weights) are indeed produced by the procedure. The maximization of the weighted degrees of satisfaction yields roughly the same ratios.
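These figures can be checked directly: for this example, equation (8.88) is a quadratic in y, the relative deviations of the distance method are y/ρi, and those of the satisfaction method are 1 − √ci. A small sketch (the choice of the relevant root is an assumption that holds for the weight sets used here):

```python
# Relative deviations from the ideal values for the two methods, example of 8.6.4.
import numpy as np

def chebyshev_deviations(rho):
    # (8.88): sum_i (1 - y/rho_i)^2 = 1, rewritten as a*y^2 + b*y + c = 0.
    a = np.sum(1.0 / rho ** 2)
    b = -2.0 * np.sum(1.0 / rho)
    c = len(rho) - 1.0
    y = min(np.roots([a, b, c]))       # the smaller root gives the feasible solution here
    return y / rho

rho = np.array([0.6, 0.2, 0.2])        # first set of weights used in the text
print(np.round(chebyshev_deviations(rho), 2))   # -> [0.19 0.58 0.58]
print(np.round(1.0 - np.sqrt(rho), 2))          # -> [0.23 0.55 0.55]
```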

A similar pattern of results is obtained for the two methods when ρ1 = ρ2 = c1 = c2 = 0.4 and ρ3 = c3 = 0.2:

minimization of weighted Chebyshev–norm distance      0.33  0.33  0.66
maximization of weighted degrees of satisfaction      0.37  0.37  0.55

By a more 'extreme' choice of the weights, however, larger discrepancies are found between the two methods. If ρ1 = c1 = 0.7, ρ2 = c2 = 0.2, and ρ3 = c3 = 0.1, for instance, one finds the following results:

minimization of weighted Chebyshev–norm distance      0.14  0.49  0.98
maximization of weighted degrees of satisfaction      0.16  0.55  0.68

Nevertheless, the results seem to be close enough to speculate that the two methods make theelusive concept of the relative importance of the objective functions operational. This is confirmedby the real–life example of the next subsection.

8.6.5 Design of a Gearbox

Multiobjective optimization has a firm place in mechanical engineering. Lootsma et al. (1995)used the experience of Athan (1994) with the gearbox design problem (Osyczka, 1984) in orderto analyze the behavior of the two methods which are considered in this subsection.

The key issues are briefly summarized. Multi-speed gearboxes are used in automobiles, machinetools, and other machines to provide a range of output rotational velocities for any input rota-tional velocity via combinations of mating gears mounted upon parallel shafts. Generally, eachshaft carries more than one gear. The operator ‘changes the gears’ by disengaging one set ofgears and engaging another. The design of a gearbox is, first, concerned with important layoutdecisions regarding the number of shafts, the distances between them, and the number as wellas the placement of the gears on the shafts. Next, the transmission ratios are determined bythe choice of the diameters of the gears, whereas strength considerations determine the numberof teeth on each gear, the modules of the gear wheels, and the tooth widths. Osyczka (1984)studied the layout design problem separately, and he proposed to model it as a multiobjectiveoptimization problem with four objectives, one of them pertaining to the dynamic performance


of the gearbox, the others to its weight and dimensions. Athan (1994) used the same separation,added some details to clarify Osyczka’ s problem formulation, and solved the problem with avariety of multiobjective optimization techniques.

Gearbox design generally lies within the realm of mixed continuous–discrete optimization. Gear teeth must be specified in integer numbers, and the related design variables are restricted to a finite set of integer values. Because of the difficulties encountered in mixed continuous–discrete optimization, designers have often solved the gearbox design problem as a problem with continuous variables, whereafter they rounded off the resulting values to the nearest allowable values. This mode of operation does not necessarily produce an optimal solution to the original mixed problem. In the present chapter, however, only the results of the continuous problem are shown and the integrality requirements are ignored. Following the model reduction principles of Papalambros and Wilde (1988), Athan (1994) reformulated the original problem as a non–linear optimization problem with 14 variables and 43 constraints. As mentioned before, there were four objectives, the minimization of

1. the volume of the material to be used for the gears,

2. the maximum peripheral velocity, pertinent to vibrations and noise,

3. the width of the gearbox,

4. the distance between the shafts.

The objective functions, not necessarily convex, are to be minimized (not maximized, so thatone has to use the formulas of the preceding subsections with some care) over a constraint setwhich is not necessarily convex either. The results to be reported here have been obtained withthe NLPQL subroutine which is based upon a method using quadratic approximations to thefunctions in the problem to be solved. A priori knowledge of the problem, common sense, andfour single–objective optimization runs generated the ideal vector

(9.6,10.5,226,284)

as well as the nadir vector

(13.3,27.6,473,471).


Run ρ1 ρ2 ρ3 ρ4 f1 f2 f3 f4

1 0.55 0.15 0.15 0.15 9.7 16.6 314 351

2 0.15 0.55 0.15 0.15 11.2 12.6 339 369

3 0.15 0.15 0.55 0.15 9.9 19.8 263 386

4 0.15 0.15 0.15 0.55 10.1 17.9 377 315

Table 8.14. Gearbox design problem: minimization of the weighted Chebyshev–norm distance

Table 8.14 exhibits the objective–function values obtained in four runs, where the weighted Chebyshev–norm distance from the ideal vector is minimized for given sets of weights.


Obviously, run i gives the highest priority to the ith objective function (ρi = 0.55) and equal priorities (ρj = 0.15, j ≠ i) to the remaining ones.

Maximization of the weighted degrees of satisfaction (the weighted geometric mean of the de-viations from the nadir vector) with the same weights produced the results displayed in Table8.15.

Run    c1     c2     c3     c4     f1     f2     f3     f4

1 0.55 0.15 0.15 0.15 9.7 15.7 313 355

2 0.15 0.55 0.15 0.15 10.0 12.6 355 343

3 0.15 0.15 0.55 0.15 9.7 16.8 261 393

4 0.15 0.15 0.15 0.55 9.8 14.4 385 315

Table 8.15. Gearbox-design problem: maximization of the weighted degrees of satisfaction

At first sight, the results of the two methods are highly similar. Let us examine them in more detail, however.

In run 1 the relative deviations from the ideal values generated by the two methods can be summarized as follows:

minimization of weighted Chebyshev–norm distance      ≈ 0.03  0.36  0.36  0.36
maximization of weighted degrees of satisfaction      ≈ 0.03  0.30  0.35  0.38

The three entries 0.36, for instance, related to the second, the third, and the fourth objective function respectively, represent the relative deviations

\[
\frac{16.6 - 10.5}{27.6 - 10.5}\,, \qquad \frac{314 - 226}{473 - 226}\,, \qquad \mbox{and} \qquad \frac{351 - 284}{471 - 284}
\]

with 16.6, 10.5, and 27.6 standing for the computed, the ideal, and the nadir value of the second objective function, etc. Obviously, the relations (8.83) are satisfied by these objective functions when the Chebyshev–norm distance is minimized (recall that ρ2 = ρ3 = ρ4 in run 1), but the ratio 0.36/0.03 is 'better' than the desired ratio 0.55/0.15, in the sense that the first objective is closer to the ideal value than was requested, possibly at the expense of the other objective functions. Maximization of the weighted degrees of satisfaction yields practically the same results.

The relative deviations generated by the two methods in run 2 are as follows:

minimization of weighted Chebyshev–norm distance      0.43  0.12  0.46  0.45
maximization of weighted degrees of satisfaction      0.10  0.13  0.52  0.62

Note that the relations (8.83) are indeed satisfied when the Chebyshev–norm distance is minimized (0.43/0.12 ≈ 0.55/0.15 and 0.43 ≈ 0.46 ≈ 0.45). Nevertheless, maximization of the weighted degrees of satisfaction gives a 'better' result, possibly at the expense of the fourth objective.


The relative deviations generated by the two methods in run 3 are

minimization of weighted Chebyshev–norm distance      0.09  0.54  0.15  0.55
maximization of weighted degrees of satisfaction      ≈ 0.03  0.37  0.14  0.58

They show that the maximization of the weighted degrees of satisfaction yields slightly better results than the minimization of the weighted Chebyshev–norm distance. The relations (8.83) are not satisfied by the first objective function.

Finally, run 4 generates the relative deviations:

minimization of weighted Chebyshev–norm distance      0.14  0.43  0.61  0.16
maximization of weighted degrees of satisfaction      ≈ 0.05  0.23  0.64  0.17

Maximization of the weighted degrees of satisfaction seems again to do somewhat better than minimization of the weighted Chebyshev–norm distance. The relations (8.83) are not satisfied by the first and the second objective functions.

Obviously, as soon as the ideal and the nadir vector are known, the choice of the weights enables the decision maker to home in on a compromise solution where the deviations from the ideal vector are simply related to what is intuitively known as the relative importance of the objectives.

The assumptions underlying the above analysis can be somewhat relaxed. So far, we have extensively used the ideal and the nadir vector because this is the initial information which the decision maker can (and normally will) collect when the problem under consideration is new. At a later stage he/she may choose a reference point on the basis of his/her intuition and knowledge. Minimization of the weighted Chebyshev–norm distance from an unfeasible reference point, as well as maximization of the weighted degrees of satisfaction from a feasible reference point, may lead to further improvements of the decision maker's guess at an acceptable compromise solution. Similarly, although convenient convexity conditions were assumed, the crucial requirements are differentiability (in order to optimize via gradient methods) and the ability to find global optima (in order to find the ideal and the nadir values). Moreover, the decision maker may need other scalarizing functions in order to find non–dominated solutions when the constraint set is non–convex.


Bibliography

[1] Athan, T.W.: A Quasi–Monte Carlo Method for Multicriteria Optimization, Ph.D. Thesis, Mechanical Engineering and Applied Mechanics, The University of Michigan, Ann Arbor, 1994.

[2] Bellman, R., and Giertz, M.: On the Analytic Formalism of the Theory of Fuzzy Sets, Information Science, Vol. 5, 1973, pp. 149–156.

[3] Cooke, R.W.: Experts in Uncertainty, Oxford University Press, New York, 1991.

[4] Dubois, D., and Prade, H.: Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.

[5] Dubois, D., and Prade, H.: Possibility Theory: An Approach to Computerized Processing of Uncertainty, Plenum Press, New York, 1988.

[6] Fodor, J., and Roubens, M.: Fuzzy Preference Modelling and Multi–Criteria Decision Support, Kluwer, Dordrecht, The Netherlands, 1994.

[7] Gaines, B.R.: Precise Past, Fuzzy Future, International Journal of Man–Machine Studies, Vol. 19, 1983, pp. 117–134.

[8] Kay, P., and McDaniel, C.K.: The Linguistic Significance of the Meaning of Basic Color Terms, Language, Vol. 54, 1978, pp. 610–646.

[9] Kok, M.: Conflict Analysis via Multiple Objective Programming, Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 1986.

[10] Kok, M., and Lootsma, F.A.: Pairwise–Comparison Methods in Multiple Objective Programming, with Applications in a Long–Term Energy–Planning Model, European Journal of Operational Research, Vol. 22, 1985, pp. 44–55.

[11] Kosko, B.: Fuzzy Thinking, the New Science of Fuzzy Logic, Hyperion, New York, 1994.

[12] Kosko, B.: Neural Networks and Fuzzy Systems, Prentice Hall, Englewood Cliffs, New Jersey, 1992.

[13] Lootsma, F.A.: Optimization with Multiple Objectives, in ‘Mathematical Programming, Recent Developments and Applications’, KTK Scientific Publishers, Tokyo, 1989, pp. 333–364.

[14] Lootsma, F.A., Athan, T.W., and Papalambros, P.Y.: Controlling the Search for a Compromise Solution in Multi–Objective Optimization, Engineering Optimization, Vol. 25, 1995, pp. 65–81.

[15] Lootsma, F.A., and Schuyt, H.: The Multiplicative AHP, SMART, and ELECTRE in a Common Context, Journal of Multi–Criteria Decision Analysis, Vol. 6, 1997.

[16] McNeill, D., and Freiberger, P.: Fuzzy Logic, Touchstone, New York, 1993.

[17] Osyczka, A.: Multicriterion Optimization in Engineering, Wiley, New York, 1984.

[18] Papalambros, P.Y., and Wilde, D.J.: Principles of Optimal Design, Cambridge University Press, Cambridge, UK, 1988.

[19] Rosch, E.: Principles of Categorization, in Rosch and Lloyd (eds.), ‘Cognition and Categorization’, Lawrence Erlbaum, Hillsdale, 1978, pp. 27–48.

[20] Roy, B.: Methodologie Multicritere d’Aide a la Decision, Economica, Collection Gestion, Paris, 1985.


[21] Roy, B., and Bouyssou, D.: Aide Multicritere a la Decision: Methodes et Cas, Economica, Collection Gestion, Paris, 1993.

[22] Roy, B., and Vanderpooten, D.: The European School of MCDA: Emergence, Basic Features and Current Works, Journal of Multi–Criteria Decision Analysis, Vol. 5, pp. 22–37.

[23] Saaty, T.L.: The Analytic Hierarchy Process, Planning, Priority Setting, Resource Allocation, McGraw–Hill, New York, 1980.

[24] Smithson, M.: Fuzzy Set Analysis for Behavioral and Social Sciences, Springer, New York, 1987.

[25] Wierzbicki, A.P.: A Mathematical Basis for Satisficing Decision Making, WP-80-90, IIASA, Laxenburg, 1980.

[26] Winterfeldt, D., and Edwards, W.: Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge, UK, 1986.

[27] Yu, P.L., and Zeleny, M.: The Set of All Nondominated Solutions in Linear Cases and a Multicriteria Simplex Method, Journal of Mathematical Analysis and Applications, Vol. 49, 1975, pp. 430–468.

[28] Zadeh, L.A.: Fuzzy Sets, Information and Control, Vol. 8, 1965, pp. 338–353.

[29] Zimmermann, H.J.: Fuzzy Set Theory and Its Applications, Third edition, Kluwer Academic Publishers, Boston/Dordrecht/London, 1996.


Chapter 9

Engineering Economics

Economics is an important aspect of human activities, and of engineering as well. It deals with the use of scarce resources: e.g. materials, human skill, energy, machinery, and last but not least capital. Thus economics is not just about money; nevertheless, money is necessary as a common monetary unit with which to prepare analyses of alternative designs.

This chapter is primarily concerned with the principles of engineering economics, which can be defined as the scientific tool that supports rational design decisions. Several available economic measures of merit are illustrated, and rational methods are emphasized for selecting criteria suitable for different economic scenarios. The importance of stipulating reasonable rates of interest is stressed and, in this respect, the influence of financial factors is explained.

Engineering economics is a powerful but often neglected tool in ship design. Every design decision should consider how the decision would affect the overall economics of the ship in question. Engineering economics provides a means to evaluate the economic merit of a large set of alternative designs from the conceptual design stage onwards, by studying the differences in cash flow that should result from each alternative solution. It offers a criterion which takes all aspects of the alternative designs into account and by which they may be ranked.

While there is no good reason to suppose that economists can design ships, this does not matter very much: they do not have to. The reverse argument, however, does not apply. Naval architects who are concerned with the design of merchant and offshore ships have to deal with economics in some form.

When discussing the sale of their designs with a prospective shipowner, naval architects should not dwell on technical issues. They should, rather, talk about economics, finding out what the owner's needs are, expressed in functional terms, and what economic measure of merit is most convincing in his/her eyes, and be ready to deal with certain necessary details (depreciation plan, tax rates, freight rates, interest rate, charter rates, etc.).

Engineering economics is closely akin to systems engineering, an organized approach to decision making. This is a systematic way of attacking a problem, using the following discrete steps:

• clearly define the objective in functional terms;

• indicate clearly under which constraints the system is to operate (e.g., flag of registry, classification society requirements, port and canal limits, labor union agreements, loading and unloading facilities, etc.);

• define the economic measure(s) of merit to be used in choosing among alternative designs, and which affecting values (e.g., tax rates) are to be assumed;

• predict the quantitative value(s) of the measure(s) of merit likely to be attained by each of the alternative strategies;

• append a summary of any important influencing considerations that cannot be reduced to monetary terms (e.g., political implications).

Until relatively recent years naval architects and marine engineers were not instructed in practical economics as part of their formal education. As a result, most of the big and important studies bearing on ship design, as well as on the design of offshore platforms, were (and often still are) unfortunately made by accountants. Accounting is an admirable and necessary profession. Those reared in its complexities are not, however, ideally suited to the task of analyzing alternative design proposals, at least not by themselves. Of course, accountants do not pretend to understand the technical matters involved in ship design. More than that, they are trained to look back, not ahead, and they allow the arbitrary strictures of book-keeping rules to distort their thinking. Three examples:

• accountants tend to ignore lost–opportunity costs because they are not entered in the books;

• accountants tend to treat imaginary depreciation costs as though they actually exist;

• accountants normally accept money at face value just as though inflation did not exist.

Wise ship design decisions require teamwork between engineers, business managers, and operating personnel. Three additional observations are pertinent at this point:

• Decisions are between alternative designs. In making comparisons, designers must concentrate their attention on assessing those cost factors that could be different between the alternatives.

• Since much guesswork is involved in predicting future conditions, cost projections are bound to be crude.

• Most engineering decisions should be made on the basis of simple economic analysis. Prudent business managers usually select their options on straightforward economics.


9.1 Engineering Economics and Ship Design

Ship design involves countless decisions where many factors must be weighed before reaching a decision. Naval architects and marine engineers have traditionally slighted or misused economics as a tool in ship design. When making a decision a designer should be sure that, in privileging one sub–system, others are not overly degraded. To avoid such sub–optimization the overall economics of the entire system should be analyzed. The aim of an optimal design is not only to generate an ideal hull form for lower resistance or minimum fuel consumption, but also to carry the desired cargo at minimum cost. The ‘best possible’ ship can only be identified by comparing alternative designs in economic terms.

The technical and economic capabilities of a ship are mutually dependent, and any attempt to design a ship without due recognition of this interdependence cannot be expected to provide the ‘best possible’ solution. Naval architects must aspire to produce designs which offer the ‘best possible’ solution to the shipowners' requirements. It follows, therefore, that they must be able to make valid economic analyses and estimates of both building and operating costs of ships.

Engineering success of a technical system depends substantially on economic success. Every design decision should consider how that decision would affect its overall economics. A constant guiding principle in decision making is the analysis of costs and economic benefits; these comprise the primary role of engineering economics in all engineering disciplines. In the design of a product, process, or system, the engineer makes a multitude of choices of configurations, subsystems, components, and materials; economic factors are central to engineering decision making in designing a product or process (Hazelberg, 1994). The quantitative understanding of economic implications in the design and operation of a product or process is indispensable.

Engineering economics should concentrate on the differences between alternatives. The corresponding differences in cash flows and in the selected economic criteria must be predicted as a result of decision making. Related to the above is the rule that lost opportunity costs must be given as much emphasis as real costs. This is one of the major points of difference between engineers and accountants. Lost opportunity costs never show up in the books, and so are ignored by accountants. Another difference to keep in mind is that accountants focus on past results whereas engineers should look ahead.

Hence ship designers have to reclaim their role in technical–economic analysis. “Engineers should provide the economic analyses that compare the profitability potential of each alternative ...” (Benford, 1970). Although this is the case in many countries, this statement sorrowfully remains a desideratum in the Italian shipping and shipbuilding industry. There are many reasons for this, of a cultural, political, and psychological nature. Nevertheless, it is of no minor importance that ship designers are not capable of combining decision making with engineering economics.

Since Napier (1865) tried to apply cost studies to the determination of ship characteristics, only slight progress was made in engineering economics for ship design. For decades neither the basic textbooks of the subject nor the periodical literature normally available to practising engineers, managers and accountants appeared to give clear guidelines. This is not to suggest that no one has ever thought or written about the economics of ship design. Many authors have tackled this problem, from Bergings (1871) to Marther (1963). But few of them provided or discussed, at least implicitly, what the criteria for comparing ship designs should be. The notable paper by Benford (1963) marked a turning point since it implies or advocates specific criteria.

So it was only a century after Napier that rigorous economic evaluations found serious application to ships, for three principal reasons:

• The risk of making wrong decisions in ship design has increased continuously with the expansion in ship sizes and types, together with the development of novel ship concepts. Until recently, the decision was more about whether to build rather than what to build, as each succeeding ship design was usually a modification of a baseline ship.

• It is axiomatic that a ship design must be the ‘best possible’ for her service, but optimization of single technical criteria is not enough. It is widely recognized that the main criterion must be of an economic nature, giving full weight to the simultaneous influence of technical factors in its evaluation. The optimal design is the one which is most profitable to the customer.

• There has been increasing complexity in the financial conditions surrounding ship procurement. Once, new ships were largely financed out of retained profits, but now cheap loans, accelerated depreciation, hidden subsidies and tax relief all add greatly to the difficulties of estimating ship profitability. However, most design decisions should be made on the basis of simple economic analysis. In short, particularly at the initial design stages, economic evaluation should not be adulterated with confusing financial intricacies.

The principles of engineering economics are straightforward, and designers should not find any difficulty in making the detailed calculations, not least because computer programs are available. Discussion will be substantially confined to the economic evaluations encompassing the decision-making process in conceptual and basic design of merchant ships. The related principles, however, are easily adaptable to offshore platforms and navy ships. While many of the techniques available from engineering economics may be used by shipowner management, here the primary purpose is to assist decisions in concept design. There are two fundamental principles that should guide every decision in ship design:

• a merchant ship is a capital investment that earns its returns as a socially useful instrument of transport;

• the best measure of engineering success is profitability; and the only meaningful measure of profitability is the returned profit, or required interest rate (after tax).

The required interest rate should be some logical measure of the decision–maker's time–value of money. In the case of a government–owned ship it might reflect the current rate of interest paid on government bonds.

9.1.1 Criteria for Optimizing Ship Design

In the context of ships, any criterion for determining the optimum investment involves the answers to a set of inquiries as to the extent to which economic outcomes will be different as a result of the investment. Thus, the following issues are of importance:


1. What will be the gross benefits over the ship's life? In the simplest case this is the gross earnings of the ship. But where the ship is operating as part of a liner service then there may be some effects on the earnings of other ships in the same ownership, and these must be taken into account. In other terms, the needed figure is the difference between what the revenue would be with the investment and what it would have been without it.

2. What is the cost of the ship? This can be divided into two parts: acquisition cost and operating costs. Acquisition cost is conventionally referred to as the capital cost, though it may include some elements which an accountant would not normally recognize as capital (e.g. any special training that may be required by the crew, or the cost of having senior officers standing by during the building period). Operating costs can be divided into the accounting headings of fuel, wages, stores, port fees, and so forth, though some, like the increase in management costs and commissions to brokers, may be external to the ship. Because capital cost has been included in acquisition cost, it would be double–counting to include any part of depreciation under this heading.

3. What is the life of the ship, either to scrapping or to sale second–hand? This will depend largely on the physical characteristics of the ship, the work she has to undertake and the policies of the owners. Because second–hand values are usually based on the estimated profitability of the remaining ship's life, policies preferring one to the other will not make very much difference to the final answer, except where the ship is highly specialized. But, since the second–hand sale of a highly specialized ship is unlikely, it may be assumed that all ships are retained until scrapping.

4. What is the distribution of estimated revenues over the estimated life? In the years of the quadrennial classification surveys the ship will be out of service for increasing periods. In those years, therefore, her earnings will be reduced. The distribution of earnings throughout the ship's life will not be constant. In addition, any rising or falling trend in the supply–demand position of the type of ship under consideration may affect the freight rate at which she trades and, if she is a liner, the load factor at which she operates.

5. What is the distribution of the estimated operating costs over the estimated life? Again, because of the quadrennial surveys these costs will be greater in some years than in others. Moreover, there may be rising or falling trends in the costs of operation. Methods are required by which such trends may be brought into the calculation.

6. What is the scrap (or second–hand) value at the end of the ship's life? Because scrap values are easier to estimate than second–hand values, this is a further reason for assuming that the ship will be retained in the same ownership until scrapping.

The answers to these questions can be stated in terms of time and money. To this end, a table can be drawn up in which each row represents a year, starting with the year of construction. In the first column one can place all the earnings of the ship; these may be regarded as positive components of cash flow. In the second column one can place all the costs of the ship, capital and operating, against the years in which they are paid. It is the cash movements that one is estimating; costs must, therefore, be entered in the years of payment. This is particularly important where tax payments are concerned. By subtracting column two from column one, one arrives at column three: the net cash flow for each year in which receipts and payments caused by the ship will take place. In some years, certainly the year of construction, this figure will be negative, until the break–even point is reached. In the rest the net cash flow should be positive if the ship has any chance of being an economic success.

Now it remains to relate the single net cash flows to one another. This can be done by recognizing that the present value of a sum of money accruing in the future is less than that of an equal sum of money accruing now. This leads to the necessity of ‘discounting the future’. Concepts such as compound interest will serve to calculate the discounted cash flow. Then, an economic criterion can be selected, e.g. either net present value, or required freight rate, or another.
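As an illustration of such a table, the following Python sketch (not from the original text) builds a year-by-year net cash flow series and discounts it to a net present value; all figures, including the acquisition cost, earnings, operating costs and discount rate, are invented.

# Illustrative sketch only: a year-by-year cash flow table for a ship project.
# Year 0 is the year of construction; all monetary figures are invented.
def net_present_value(net_cash_flows, i):
    """Discount year-end net cash flows (year 0 first) at rate i and sum them."""
    return sum(f / (1.0 + i) ** n for n, f in enumerate(net_cash_flows))

earnings = [0.0] + [5.5] * 20        # column 1: gross earnings, M$ per year
costs    = [30.0] + [2.0] * 20       # column 2: acquisition cost, then operating costs
net      = [e - c for e, c in zip(earnings, costs)]   # column 3: net cash flow

print(round(net_present_value(net, 0.08), 2))   # a positive NPV marks an attractive project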

The different ways of financing the design may include some with deferred payments and interest charges. All these payments are to be included as and when they are expected to occur. It may then happen that none of the early years of the ship's life has any negative cash flow at all.

9.1.2 Operating Economics

The shipowner's responsibilities for the various items of expenditure are illustrated in Figure 9.1. Capital charges cover items such as loan interest, repayments, and profit, all related to the capital investment in the ship. The full calculation of effective capital charges can be complex. Voyage costs cover fuel, port and canal dues, and sometimes cargo handling charges. Daily running costs are those incurred on a day–in, day–out basis whether the ship is at sea or in port; these include crew wages and benefits, victualling, ship upkeep, stores, insurance, equipment hire and administration. Voyage costs vary considerably from trade to trade, while daily running costs are largely a function of ship type, size and flag.

Figure 9.1. Division of responsibility for operating costs

The type of charter and the division of responsibility for cost and ship's time between shipowner and charterer can influence some features of the design of the ship and its equipment. With bareboat charters shorter than the life of the ship, the charterer has less incentive than an owner–operator to reduce fuel consumption, while time in port is more significant for owners of owner–operated or voyage–chartered ships than for time–chartered ships. Owner–operators may thus be expected to be more forward–looking in fitting fuel-saving devices or better equipment to keep port turn-rounds short, e.g. bow thrusters or more elaborate cargo handling equipment. Owner–operators often have the highest standards of equipment and maintenance.

From estimates of the components of ship operating costs and the corresponding transport performance, it is possible to calculate freighting costs for a variety of ships. For ships relying on shore gear with a constant handling rate, time in port is roughly proportional to size, unlike tankers, where time in port is almost independent of size. Thus big ships are only economic where handling rates are commensurate with the size of the ship. Shore costs per ton may increase with ship size, as deeper dredging, more powerful tugs, faster cargo handling gear, and bigger stockyards are required.

Item                        Liner Shipping                              Bulk Shipping

Ship Size (deadweight)      Small–Medium (5000–25000 multi-deck;        Medium–Large (15000–550000)
                            5000–50000 unit load)
Ship Speed                  Medium–Fast                                 Medium–Low (12–17 knots)
Area of Operation           Specific trade routes                       Worldwide
Type of Carrier             Common                                      Contract
Organisation/Ownership      Conference of liner members                 Independent or industrial carrier
Assignments                 Large number of small parcels               Small number of large parcels
Nature of Cargo             Heterogeneous (general)                     Homogeneous (bulk)
Freight Rates               Administered (level set to cover costs)     Negotiated (set by supply & demand)
Competition                 Market shares, quality of service,          Price and delivery
                            non–conference lines
Scheduled Service           Yes (constant speed ship)                   No (constant power ship)
Mass or Volume Limited      Volume                                      Usually mass, except certain cargoes
                                                                        and SBT tankers
Ports Serviced              Range of ports near major cities            Usually one port at each end, near
                                                                        producing/consuming plant
Days at Sea per Year        180–240 (multi-deck), 200–280 (unit load)   240–330
Own Cargo Handling Gear     Yes (multi-deck), sometimes (unit load)     Usually none, except tankers and
                                                                        smaller bulk carriers

Table 9.1. Some differences between liner and bulk shipping

Actual bulk cargo freight rates are regularly published in the shipping press, ship brokers' reports, etc. They vary with supply and demand, and can be regarded as oscillating about a level of freighting cost which gives the average efficient operator an acceptable rate of return in the long run. However, over–supply of ships leading to long periods of low freight rates can occur owing to, for example, very attractive shipbuilding loan terms. Table 9.1 indicates that economic factors apply differently to liner as opposed to bulk shipping.


9.2 Time–Value of Money

Money has not only a nominal value, expressed in some monetary unit, but also a time value. In its most familiar form the time value occurs on a bank account: the bank account has time value. Exactly the same holds for investments. If a return is yielded on an investment and is reinvested, a return is yielded on a return, which means that the value of a return depends on the year in which that return is earned.

If designers want to make proper decisions, they first have to recognize that a given amount of cash changing hands today is more important than the same amount of cash changing hands in the future. In simpler terms, a sum of money in the hand today need not be spent, but could be put to work and allowed to generate rent money (i.e., interest). A fundamental concept of economics can then be introduced: it must be considered not only how much money flows in or out of a company, but also when. Designers must assign some time–value to money. They also have to consider relative risks, recognizing that expectations may or may not be fulfilled. Riskier proposals naturally place greater emphasis on the time–value of money.

The quantitative recognition of the time–value of money is handled by means of standard compound interest formulae. Interest relationships make allowance for the time–value of money and the life of the investment, and may be used to convert an investment (e.g. the cost of a ship) into an annual amount which, when added to the annual operating costs, may be used to determine the necessary level of income to give any required rate of return.

Interest can be thought of in three distinct forms:

1. Simple interest, as on savings deposits in banks.

2. Compound interest, when a present sum is converted into a future sum and vice versa.

3. Returned interest, which is a measure of the gains from risk capital invested in a profitable company. This is called by various names, including internally generated interest, internal rate of return, profit or simply yield. It is one good measure of profitability, expressing the benefits of an investment as equivalent to returns from a bank at the derived rate of interest. Most countries impose a tax on business incomes, so the analyst must differentiate between rates before and after tax.

The interest is calculated by exactly the same mathematical expressions in all three cases. Even though compound interest is the form most suited to engineering economics in design decision–making, what matters most is that, in deciding between alternative designs, one must consider for each solution not only the cash flows, but also their timing.

Alternatively, where annual cash flows are known, the relationships can convert them into present worths, which may be added together to give the net present value (NPV) for comparison with the amount of the investment. The future cash flows are discounted (the inverse of 'compounded'), hence the common name of discounted cash flow (DCF) calculations. For an investment to be worthwhile, the present worth of the cash flows of income minus expenditure should be greater than the investment, taking inflows as positive and outflows as negative, i.e. NPV should be positive. Cash flow implies money moving in and out of the company's bank account.


Before considering how to integrate the related economic factors into the technical design of ships, the methods of making economic calculations, which can be used to evaluate alternative designs of freight-earning vessels, must be taken into consideration. Engineering economics calculations need to take account of performance over longer periods.

9.3 Cash Flows

An investment project can be described by the amount and timing of expected costs and benefits in the planning horizon. The terms costs and benefits represent disbursements and receipts, respectively. The term net cash flow is used to denote the receipts less the disbursements that occur at the same point in time. The stream of disbursements and receipts for an investment project over the planning horizon is said to be the cash flow profile of the project.

To facilitate the description of project cash flows, they are classified in two categories: (i) discrete–time cash flows, and (ii) continuous–time flows. The discrete–time cash flows are those in which the cash flow occurs at the end of, at the start of, or within discrete time periods. The continuous flows are those in which money flows at a given rate and continuously throughout a given time period. The following notation will be adopted:

• Fn = discrete payment occurring at period n;

• Ft = continuous payment occurring at time t.

If Fn < 0, Fn represents a net disbursement (cash outflow). If Fn > 0, Fn represents a net receipt (cash inflow). The same can be said for Ft.

9.3.1 Cash Flow Profile

Cash flow diagrams are an important convention that engineering economists and designers should use in decision–making. They are simple schematics showing how much money is being spent or earned year by year. In them, the horizontal scale represents future time, generally divided into years. The vertical scale shows annual amounts of cash inflows (upward pointing arrows) or outflows (downward pointing arrows). When cash flow estimation is repeated over the project life of a ship, the result is a series of net cash flows. If the sum of the series is positive, the flow is from the ship to the shipping company; if it is negative, the flow is from the shipping company to the ship. This series is often called the cash flow series for the ship project, and the shipowner decides whether to undertake the project on the basis of the estimated cash flow.

Part of the convention is that the definition of the cash flow series is simplified by assuming that all the cash flows occur on the last day of each year. This assumption simplifies the mathematical formulation. Although in fact cash may change hands almost continuously, any errors that result from this simplifying assumption are likely to be common to all alternatives under study, and so should have little effect on the decision.


To help visualize the amount and timing of cash entering or leaving the organization, the so-called cash flow diagram is frequently used (see Figure 9.2), where time is represented on the horizontal scale, whereas annual cash amounts are shown on the vertical scale.

Figure 9.2. Cash flow vs. time

Zero on the time scale can be arbitrarily selected. It may mean ‘now’, ‘time of decision’, ‘time when the ship goes into service’, etc. Cash flows may be represented by bars or by arrows. Figure 9.3 shows a typical irregular cash flow pattern, in which receipts during a period of time are shown by an upward arrow and disbursements during the period are shown by a downward arrow. The diagrams are drawn from the perspective of a lender or an investor. A borrower, on the other hand, would picture the arrows reversed, but the method of analysis would be exactly the same.

Figure 9.3. Representation of cash flows

Ships have long economic lives, usually at least twenty years. It is therefore justifiable to treat cash flows on an annual basis. For shorter-term studies, briefer time periods can be used, perhaps months. The basic principles and mathematics remain the same.

The basic relationships shown below use the following nomenclature (standard notation of the American Society for Engineering Education), where capital letters are used for absolute values, and lower case for fractional values:

P   present amount, principal, present worth, or present value
A   annual return (e.g. income minus expenditure) or annual repayment (e.g. principal plus interest)
F   future amount
N   number of years (e.g. life of ship or period of loan)
i   interest or discount rate per year, decimal fraction (percentage rate/100)

All of the following basic interest relationships apply to cash flow patterns illustrated below.

9.3.2 Interest Relationships

Single Series

The first basic interest relationship is the single–investment, single–payment pattern shown in Figure 9.4.

Figure 9.4. Cash flow diagram for a single payment

Knowing the initial amount, P, and wanting to find the future amount, F, multiply P by the so–called single payment compound amount factor, usually shortened to compound amount factor. If the time period is but a single year, the future amount, F, would equal the initial amount, P, plus the interest due, which would be i·P; in short

F = P + i·P = P (1 + i)

If the time period, N, is some integer greater than one, then the balance of the account will have compounded annually as a function of that number of years, leading to the general expression for the total amount by the end of N periods

F = P (1 + i)^N = (CA − i − N) P

The factor (1 + i)^N is called the single–payment compound amount factor and is available in tables indexed by i and N. It is abbreviated CA and, when associated with a given interest rate and number of years, the combination is indicated by the convention

(CA − i − N) = (1 + i)^N

The reciprocal of the compound amount factor is the single present worth factor. It is often shortened to present worth factor, indicated by convention as (PW − i − N). It is the multiplier to convert a future sum into a present sum. This being the case, the abbreviation PW can now be taken to mean present worth or present value. It is also called the discount factor. The terms are used interchangeably.


Reversing the process, if a single future amount F is desired from a deposit at interest i, compounded periodically, the equivalent present value can be found by multiplying the desired future amount by the reciprocal of the compound amount factor

P = F / (1 + i)^N = (PW − i − N) F

The ‘present worth’ of F, which includes accumulated interest, is exactly the same as P, i.e. they are effectively equivalent.
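A minimal Python sketch of the two single-payment factors, with an invented deposit as an example:

def compound_amount_factor(i, n):      # (CA - i - N) = (1 + i)^N
    return (1.0 + i) ** n

def present_worth_factor(i, n):        # (PW - i - N) = 1 / (1 + i)^N
    return 1.0 / compound_amount_factor(i, n)

P = 1000.0                                     # present amount (invented)
F = P * compound_amount_factor(0.08, 10)       # future amount, about 2158.9
P_back = F * present_worth_factor(0.08, 10)    # discounting F recovers the original 1000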

Uniform Series

In many economic projections, decision makers assume uniform annual cash flows, even though that uniformity will not really occur. Again, any errors that result from this assumption are likely to be the same for all design alternatives.

The interest relationship applies to a single initial amount, P, balanced against uniform annual amounts, A, as shown in Figure 9.5.

Figure 9.5. Single investment, uniform annual returns

If the uniform annual amounts, A, are known and the decision maker wants to find their present worth, P, he/she can use the expression

P = [(1 + i)^N − 1] / [i (1 + i)^N] · A = (SPW − i − N) A

The component (SPW − i − N) is called the series present worth factor, which is the multiplier to convert a number of regular annual payments into a present sum. It is also called the annuity factor.

This relationship is useful for situations in which the size of future uniform annual returns from an investment can be predicted and the decision maker wants to find out how much he/she can afford to put into that investment.

Note that the series present worth factor is numerically equal to the sum of the individual annual present worth factors over the life of the investment; it is therefore very useful for dealing with uniform cash flows, which can be assumed for many marine problems, at least in preliminary evaluations.

Again reversing the approach, suppose a present sum of money is to be converted into an equivalent amount repaid uniformly over a number of time periods, usually annual. Then the capital recovery factor, CR, enables an initial capital investment (say in a ship) to be recovered as an annual capital charge, which includes both principal and interest. CR is the ratio between this uniform annual amount, A, and the principal, P, i.e. A = CR·P. It can be shown from compound interest relationships and the sum of geometric progressions that

CR = A/P = i (1 + i)^N / [(1 + i)^N − 1]

When associated with a given interest rate per compounding period, i, and number of compounding periods, N, the capital recovery factor is written (CR − i − N).
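The uniform-series factors can be sketched in the same way; the ship cost and rate below are invented figures used only to show that (CR − i − N) and (SPW − i − N) are reciprocals of each other.

def series_present_worth_factor(i, n):     # (SPW - i - N)
    return ((1.0 + i) ** n - 1.0) / (i * (1.0 + i) ** n)

def capital_recovery_factor(i, n):         # (CR - i - N) = 1 / (SPW - i - N)
    return 1.0 / series_present_worth_factor(i, n)

P = 40.0e6                                          # invested in a ship (invented figure)
A = P * capital_recovery_factor(0.10, 20)           # annual capital charge, about 4.70 M$
P_back = A * series_present_worth_factor(0.10, 20)  # present worth of the 20 annual charges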

Uniform Annual Deposits, Single Withdrawal

The third pair of interest relationships applies to the cash flow pattern shown in Figure 9.6, in which annual amounts are matched against a single future amount, F.

A peculiarity of this pattern is that at the end of the final year there are arrows pointing in opposite directions. This is done to simplify the calculations. Of course, in real life the net amount paid would not be F, but F minus A. Another possibility is that within a business setting the annual amounts would actually comprise continual cash deposits during the year. Nevertheless, one may assume single year–end amounts.

Figure 9.6. Uniform annual deposits, single withdrawal

If the uniform annual amounts, A, are known, and it is desired to find the equivalent single future amount, F, that can be withdrawn, multiply A by the series compound amount factor, SCA

F = (SCA− i−N) A

Conversely, if the analyst wants to build up the future amount, F, and wants to find the corresponding uniform annual amounts to be deposited, A, he/she will multiply that future amount by what is called the sinking fund factor, SF

A = (SF − i−N) F

Of course, the sinking fund factor is the reciprocal of the series compound amount factor

(SF − i − N) = 1 / (SCA − i − N) = i / [(1 + i)^N − 1]


Gradient Series

Many engineering economic problems, particularly those related to equipment maintenance, involve cash flows that increase by a fixed amount, g, each year. The gradient factors can be used to convert such gradient series into present amounts and equal annual series.

The present worth of such a cash flow can be found with a year–by–year analysis, as shown in Figure 9.7.

Figure 9.7. Gradient series pattern

A more sophisticated way is to first find the equivalent uniform annual amount, A, by means of the following formula

A = A1 + g/i − (N·g/i) (SF − i − N)

Consider the series

Fn = (n− 1) g , n = 1,2, . . . ,N

The gradient g can be either positive or negative. If g > 0, the series is called an increasing gradient series. If g < 0, one has a decreasing gradient series. The single–payment present–worth factor can be applied to each term of the series to obtain the expression

P = Σ_{n=1}^{N} (n − 1) g / (1 + i)^n

Alternatively, the present value of the series can be found using the appropriate series present worth factor based on the same values of i and N

P = (SPW − i−N) A

If the pattern shows a uniform downward slope, then the equivalent uniform annual amount will be

A = A1 − g/i + (N·g/i) (SF − i − N)
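A short sketch of the gradient relations, comparing the direct year-by-year sum with the equivalent uniform annual amount; A1, g, i and N are invented figures.

def spw(i, n):                       # (SPW - i - N)
    return ((1 + i) ** n - 1) / (i * (1 + i) ** n)

def sf(i, n):                        # (SF - i - N)
    return i / ((1 + i) ** n - 1)

A1, g, i, N = 100.0, 20.0, 0.08, 10  # first-year amount, annual gradient, rate, years

# Direct sum: year n carries A1 + (n - 1) g, discounted to time zero.
P_direct = sum((A1 + (n - 1) * g) / (1 + i) ** n for n in range(1, N + 1))

# Equivalent uniform annual amount, then P = (SPW - i - N) * A.
A = A1 + g / i - (N * g / i) * sf(i, N)
P_formula = spw(i, N) * A            # agrees with P_direct within rounding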


Random Series

To find the present worth of an irregular cash flow (Fig. 9.8) the analyst must discount each amount individually to time zero and then find the cumulative present value. This cumulative amount can be converted to an equivalent uniform annual amount using the capital recovery factor.

Figure 9.8. Random series pattern

Stepped Cash Flows

Another common variation involves cash flows that remain uniform for some number of years (or other compounding periods) but then suddenly exhibit a step up or down, or perhaps several such steps. In real life this might come about because of the peculiarities of the tax laws, as one example.

One way to solve this problem would be to analyze the cash flow year by year in a table, but there are easier ways. Perhaps, with reference to Figure 9.9, the easiest way to understand it is to

- find the present worth of A2 for N years;

- add the present worth of ∆A for Q years.

Figure 9.9. Stepped patterns

In short

PW = (SPW − i−N) A2 + (SPW − i−Q) ∆A


The analytical technique developed above can be applied to cash flows that involve more than the two levels of income shown, as well as to negative cash flows or combinations of positive and negative flows.
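A sketch of the stepped-pattern shortcut, checked against a year-by-year sum; all amounts are invented.

def spw(i, n):
    return ((1 + i) ** n - 1) / (i * (1 + i) ** n)

i, N, Q = 0.08, 20, 8
A2, dA = 3.0, 1.5                    # lower level and step, M$ per year (invented)

PW = spw(i, N) * A2 + spw(i, Q) * dA     # present worth of the stepped series

# Year-by-year check: A2 + dA in years 1..Q, A2 alone in years Q+1..N.
PW_check = sum(((A2 + dA) if n <= Q else A2) / (1 + i) ** n for n in range(1, N + 1))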

9.4 Financial Factors

The overall profitability and the extent of own capital required are affected by loan finance and tax considerations. Clearly the availability of advantageous terms may be an important factor in the choice of a particular ship. However, within one particular design solution, these considerations do not usually affect the order of merit; that is, initial technical decisions on, for example, the type of main engine, can often be made without detailed fiscal considerations, using only, for example, a discount rate which takes account of typical conditions. Of course, once a project has reached the detailed investigation stage, such aspects will need to be considered explicitly, on a year–by–year basis, examining cash flow projections in more detail.

9.4.1 Taxes

Taxation represents one of the most important aspects of investment projects. To omit it would be to falsify the picture. Not only is tax important as such, but so are the various allowances which may be made against taxable income. Therefore, naval architects involved in the design of a merchant ship should have at least a rough idea about the applicable tax structure. In many cases a proper recognition of the tax law will have a major impact on design decisions. In other cases, taxes can be ignored. In any event, a naval architect should understand enough about the subject to discuss it intelligently with business managers. Tax laws are written by politicians, who are swayed by pressures coming from many directions, and are changed over time. As a result tax laws are almost always complex, and continually changing. Thus, most large companies employ experts whose careers are devoted to understanding the tax laws and finding ways to minimize their impact. No attempt is made here to explain all the complexity of current tax laws; but some simple tax concepts are outlined and their effects on cash flow explained.

When incomes are known, the impact of the corporate profits tax is usually neutral; that is, analyzing before–tax returns will point to the same optimum as that indicated by after–tax returns. This is fairly obvious from the equation relating the capital recovery factor before and after tax

CR′ = CR (1 − t) + t/N

where t is the tax rate.

Since the tax rate t and N are presumably the same for every alternative, the maximum values of CR′ and CR are of necessity tied to the same design. This might lead the designer to conclude that taxes have no influence on technical decisions. That would be true were the level of income independent of taxes. Such is seldom true, however, because free market conditions make freight rates sensitive to taxes. Shipowners base their prices on the attainment of a reasonable level of profitability. When taxes are raised, prices must also be raised to reflect the added burden. The true impact of the tax is perhaps best illustrated in conceptual design, which involves a proposal for a new ship, whose advantages must be weighed against competitive ships. In some cases one ship will have a radically greater first cost but lower operating costs (or more income) than the other. The time–value of money thus becomes supremely important in the comparison. Furthermore, to make the conclusions as general as possible (and in recognition of free market conditions), most conceptual designs use the RFR criterion or something equivalent. The decision–maker must therefore select his/her stipulated yield with great care; he/she must also recognize the effect of the taxes on the yearly revenue required to attain the stipulated yield.

The present tax is basically structured so that corporation tax is levied at a particular rate on the before–tax cash flow. This tax base is broadly: Income − Operating Expenses − Interest on Loans − Depreciation Allowances.

Cash Flows Before and After Tax

Tax is assessed after the shipping company's annual accounts are made up and is thus paid 1–2 years in arrears of the corresponding cash flows. Annual income can therefore be divided according to the bar diagram illustrated in Figure 9.10. It shows how annual revenues are treated when figuring corporate income taxes. It is assumed here that all factors remain constant over the N years of the design's economic life. This is what economists call a heroic assumption, but it is frequently good enough for design studies.

Figure 9.10. Distribution of annual income

The bar diagram shows that the annual cash flow after tax, A′, is related to the cash flow before tax, A, by this simple expression

A′ = A (1 − t) + t·P/N          (9.1)

or, turning it around

A = (A′ − t·P/N) / (1 − t)          (9.2)
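A minimal sketch of relations (9.1) and (9.2), with invented figures for the first cost, life and tax rate:

def after_tax(A, t, P, N):
    return A * (1.0 - t) + t * P / N           # Eq. (9.1)

def before_tax(A_after, t, P, N):
    return (A_after - t * P / N) / (1.0 - t)   # Eq. (9.2)

P, N, t = 30.0e6, 20, 0.35      # first cost, economic life, tax rate (invented)
A = 5.0e6                       # before-tax cash flow
A_prime = after_tax(A, t, P, N)                      # 3.775 M$ per year
assert abs(before_tax(A_prime, t, P, N) - A) < 1e-6  # (9.2) inverts (9.1)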


It is important to note that all rational measures of merit are based on after–tax cash flows, not profits. In short, decision makers should not use profits to measure profitability, but cash flows instead. Profits are misleading because they are polluted with depreciation, an expense that is misallocated in time.

The return after tax, which includes the depreciation provision, is the shipowner's disposable income, to be used for repayment of loan principal, dividends, fleet replacement or any other permissible use. Dividends, however, are paid to shareholders without further deduction of tax; allowance is made in the shareholder's own tax liability for the amount already paid as corporation tax (tax credit).

Depreciation allowances are usually based on historic costs (i.e., face–value units) rather than on replacement costs. Thus, the standard tax shield for depreciation (t·P/N) must be discounted twice to find its present worth in constant–value terms. If one assumes that before–tax cash flows, A, will remain uniform in constant–value monetary units, then one must recognize that after–tax cash flows will drop somewhat over the years. This is due to the diminishing value of the depreciation allowances. The constant–value present worth of the after–tax cash flow can be found as follows

PW = (SPW − r − N)·A (1 − t) + (SPW − i − N)·t·P/N

or, by subtracting the investment, one can find the net present value

NPV = (SPW − r − N)·A (1 − t) + (SPW − i − N)·t·P/N − P

where

A   annual cash flow before tax (in current–value monetary units)
t   tax rate
P   initial investment
N   economic life
r   discount rate applied to current–value flows (i.e., true time–value of money)
i   discount rate applied to face–value flows [i = (1 + d)(1 + r) − 1]
d   general rate of inflation
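A minimal sketch of this net present value, with invented figures; the after-tax operating flow is discounted at the rate r, while the depreciation allowance, fixed in face-value units, is discounted at i = (1 + d)(1 + r) − 1.

def spw(rate, n):
    return ((1 + rate) ** n - 1) / (rate * (1 + rate) ** n)

P, N, t = 30.0e6, 20, 0.35     # investment, economic life, tax rate (invented)
A = 5.0e6                      # before-tax cash flow
r, d = 0.06, 0.04              # true time-value rate and general inflation rate
i = (1 + d) * (1 + r) - 1      # face-value discount rate

NPV = spw(r, N) * A * (1 - t) + spw(i, N) * t * P / N - P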

Fast Write–off

So far, the assumption has been made that the ship's tax life coincides with its economic life. This is not always the case, because owners are sometimes permitted to base depreciation on a shorter period. This is called fast write–off and is advantageous to the investor, because it provides a more favorable after–tax cash flow pattern. Over the life of the ship the same total taxes must be paid, but their worst impact is delayed.

Some countries allow shipowners freedom to depreciate their ships as fast as they like. In that setting the owner can make the depreciation allocation equal to the cash flow before tax. That will reduce the tax base to zero, and no taxes need be paid during the early years of the ship's life. After that, of course, the depreciation tax shield will be gone, and higher taxes will come. Again, however, the total tax bill over the ship's life will remain the same, unless the ship is sold before the end of its expected life.

More typically, the owner will not be given a free hand in depreciating the ship. Rather, the tax life, that is, the depreciation period, will be set at some period appreciably shorter than the expected economic life. This will result in cash flow projections that feature uniform annual amounts with a step down after the depreciable life is reached.

To handle such a situation, first give separate attention to two distinct time periods. The first of these comprises the years during which depreciation allowances are in effect, the final such year being identified as Q. The second time period follows Q and extends to the final year of the ship's life, designated as N. Assuming straight–line depreciation, the cash flows before tax, A, and after tax, A′, will be related as shown in Figure 9.11.

Figure 9.11. Cash flow for fast write–off

Now, recalling how stepped cash flows were handled above, the present worth can be found as follows

PW = A (1 − t)·(SPW − i′ − N) + (t·P/Q)·(SPW − i′ − Q)
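A minimal sketch of the fast write-off present worth, assuming, as in the stepped-pattern treatment, that the depreciation shield t·P/Q runs only over the depreciable life Q; all figures are invented.

def spw(rate, n):
    return ((1 + rate) ** n - 1) / (rate * (1 + rate) ** n)

P, N, Q, t = 30.0e6, 20, 8, 0.35   # first cost, economic life, tax life, tax rate (invented)
A = 5.0e6                          # uniform before-tax cash flow
i_prime = 0.09                     # discount rate, written i' in the formula above

PW = A * (1 - t) * spw(i_prime, N) + (t * P / Q) * spw(i_prime, Q)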

9.4.2 Leverage

Various ways are examined in which a shipowner may go into debt in order to expand the scope of operations. It is noted that the interest payments incurred may reduce the tax base, and so they must be recognized in assessing after–tax cash flows.

Increasingly complicated loan arrangements can be considered. There are times when a naval architect will want to apply simple schemes, and there will be times in the design process when he/she will want to apply complex schemes. In general, in the conceptual design stage, when hundreds or thousands of alternatives are under consideration, the designers should be satisfied to use the simplest schemes. At the other end of the scale, when the choice has been narrowed down to half a dozen, the naval architect, the shipowner, or the business manager can apply many more realistic assumptions if considered necessary.

In general, the more realistic and complex assumptions will slightly reduce the impact of the income tax. In the early design stages, when assuming simple loan plans, the naval architect may recognize this effect by adding a small increment to the actual tax rate or to the interest rate.


The same thought applies to assumptions regarding tax depreciation plans. By using such adjustments, the ‘best possible’ design as indicated by the simple assumptions will closely approach the ‘best possible’ one as indicated by the more realistic and elaborate assumptions.

Many, if not most, business managers have ambitions beyond the reach of their equity capital. This leads them to leverage up their operation by obtaining a loan from a bank. The same is true of individuals who want to own a yacht. It is also often true of governments, which sell bonds so as to finance a share of current expenditures. In nearly every case the lender requires repayment of the loan within a given time at a given interest rate. Typically, the repayments are made in periodic bits and pieces comprising both interest and some reduction of the debt itself. In short, the periodic payments are determined by multiplying the amount of the loan, PB, by the capital recovery factor appropriate to the loan period, H, and the agreed–upon interest rate, iB. The typical repayment period is monthly, but for ship design studies one may generally assume annual repayments, AB; in short

AB = PB (CR− iB −H)
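For instance, with credit terms like those mentioned later in this section (typically 80% of the contract price for eight years at 5% interest), the annual repayment would be roughly as in the following sketch; the contract price is an invented figure.

def capital_recovery_factor(i, n):
    return i * (1 + i) ** n / ((1 + i) ** n - 1)

PB, iB, H = 24.0e6, 0.05, 8                  # loan (80% of an invented 30 M$ price), rate, years
AB = PB * capital_recovery_factor(iB, H)     # about 3.71 M$ per year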

As an alternative to applying to a bank, managers may choose to raise capital by selling bonds. As far as one need be concerned here, the effect is the same: the debt must be repaid at some agreed–upon rate of interest.

To stimulate their shipbuilding industries, many countries throughout the world offer loans for ship purchase, subsidized from central sources at below–market rates of interest. The loans reduce the effective cost of the ship, and encourage owners to place orders. Credit terms officially available are broadly similar in each major country: for ships, typically 80% of the contract price for eight years at 5% interest. For offshore mobile units, typically 85% loans are available, but only for five years. Loans for second–hand vessels are usually made on normal commercial terms. Interest payments can be deducted before tax liability is calculated on profits earned during the ship's life.

Various initial and legal fees are charged in addition to loan repayments and interest, usually about 1% of the total loan. Generally the credit is advanced to the owner as building instalments become due, so that interest becomes payable before the ship is delivered unless arrangements are made to defer it. Repayment is usually in equal amounts at six–monthly intervals after delivery, plus interest on the declining balance. Although favorable credit terms are an important marketing factor for shipbuilders (and have once contributed to the world over–supply of ships), they do not usually affect the order of merit between technical alternatives.

Figure 9.12 shows that as the credit proportion approaches 100 percent, the IRR on the shipowner's diminished equity capital approaches infinity, but NPV or RFR continue to give meaningful results. While this might suggest that shipowners should borrow nearly 100% of their capital needs, in fact this is risky, as in adverse market conditions prior charges such as loan servicing would be excessive, which could force the owner into liquidation through insufficient cash flow. An appropriate balance of own capital (e.g. shareholders' or equity funds) and debt (loans or credit) is necessary for financial stability.


Figure 9.12. Effect of borrowed capital on return

Some shipowners like to maintain a debt–equity ratio of about 60:40, buying in their own stock when investment opportunities are too limited to maintain that ratio. Another tempering factor is that the shipowner does not have to pay any tax on that part of his/her gross income that he/she turns over to the bank or bond-holder as interest. This, in effect, cuts his/her cost of borrowing roughly in half.

9.4.3 Practical Cash Flows

Although it is possible to make good use of the uniform cash flow relationships in preliminary calculations and obtain results of about the correct order of magnitude, cash flows in most practical cases of ship investment are not uniform.

The most important reasons for these irregular cash flows are:

- loans for less than the life of the ship;

- differing relative rates of growth in main items of income and expenditure;

- tax allowances for (capital) depreciation and loan interest;

- subsidies.

Other variations occur but, although altering the absolute values in the economic calculations, are unlikely to change significantly the relative values ('ranking') between alternative designs, as they tend to affect all designs in a similar manner. The variations would have to be taken into account where the differences in the designs affect one particular factor, e.g. different scrap values between steel, aluminium and GRP hulls. These variations are:

- scrap value;

- irregular pattern of building installments;

- special surveys or major overhauls involving appreciable cost and time out of service;

- general decrease of speed with increasing age;

- long–term charters less than ship’s life.

Although corrections may be applied to the uniform cash flow cases to cater for some of the items quoted, the more general procedure is to make complete year-by-year calculations. A table is constructed to show, for each year of life, the items of income and expenditure generating a before–tax cash flow. After making allowances for tax, the after–tax cash flows are multiplied by each year's present worth factor, and summed up to give the discounted cash flow over the ship's life and a resulting NPV.

Cash Flows for Equal Periods

The bar diagram shown in Figure 9.13 explains the cash flow before and after tax when the bank loan period is assumed to be the same as the ship's economic life (H = N). Straight–line depreciation is also assumed, with depreciation period equal to economic life (Q = N). A final assumption is that the before–tax cash flow, A, remains constant. For many design studies these assumptions are reasonable.

Figure 9.13. Cash flow for equal periods

In analyzing the cash–flow distribution shown in Figure 9.13 a further simplifying assumption is used, which involves substituting a uniform annual value of the interest payments, IB, for the actual, ever–diminishing values. Figure 9.14 shows the real distribution between principal and interest payments as well as the simplification of uniform IB.

Figure 9.14. Distribution between principal and interest payments

Cash Flows for Differing Periods

Shorter Pay–Back Period

Benford (1965) explains how to analyze returns before and after tax when the pay–back period to the bank differs from the economic life of the ship. It is assumed that the debt is repaid in equal annual installments, with straight-line depreciation and zero scrap value. The expressions relating conditions before and after tax now become

A′ = A (1 − t) + t·P/N + t·IB

and

CR′ = CR (1 − t) + t/N + t·IB/P                    (9.3)

where IB is the annual interest paid to the bank.

Further, the residual annual return to the owner, A◦, will be

A◦ = A′ −AB

where the annual return to the bank is found by means of the appropriate capital recovery factor as

AB = PB (CR − iB − H)

The annual interest paid to the bank will diminish from year to year, but for design purposes one can safely assume that it will be constant and equal to the annual return to the bank minus a uniform annual payback of the initial loan

IB = AB − PB/H

The annual interest paid to the bank can also be written as

IB = [(CR − iB − H) − 1/H]·PB
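A minimal sketch of these before- and after-tax relations, with purely hypothetical figures for a single design alternative, is the following:

```python
def capital_recovery_factor(i, n):
    """(CR - i - n)."""
    return i / (1.0 - (1.0 + i) ** (-n))

# Hypothetical inputs
P   = 30.0e6    # ship first cost
P_B = 0.6 * P   # bank loan
i_B = 0.06      # loan interest rate
H   = 8         # loan repayment period, years
N   = 20        # economic life = depreciation period, years
t   = 0.35      # corporate tax rate
A   = 5.0e6     # uniform annual cash flow before tax

A_B = P_B * capital_recovery_factor(i_B, H)   # annual return to the bank
I_B = A_B - P_B / H                           # assumed-uniform annual interest
A_prime = A * (1 - t) + t * P / N + t * I_B   # after-tax cash flow with tax shields
A_owner = A_prime - A_B                       # residual annual return to the owner
print(f"A' = {A_prime:,.0f}   A_B = {A_B:,.0f}   residual to owner = {A_owner:,.0f}")
```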

All Periods Differing

Finally, it is appropriate to analyze cash flow before and after tax when the period of the bank loan, the depreciation period, and the economic life of the ship are all different. Initially it is assumed that the loan period, H, is shorter than the depreciation period, Q, which in turn is shorter than the economic life, N. The cash flow diagram would then contain three segments, as shown in Figure 9.15.

During the loan period (0 − H) the cash flow before and after tax would be as developed before, except that care must be taken to identify the differing time periods H, Q, and N. The cash flow after tax will be

A′ = A (1 − t) + t·IB + t·P/Q


Figure 9.15. Cash flow for differing time periods

During the residual depreciation period (H − Q) the interest payments would no longer be a factor, so the only tax shield would be the depreciation allocation, i.e.

A′ = A (1 − t) + t·P/Q

During the remaining period (Q − N) there would be no tax shields at all, so

A′ = A (1− t)

Applying the techniques introduced when discussing cash flow diagrams, one can find the present worth of this cash flow as follows

PW = A (1 − t)·(SPW − i − N) + (t·P/Q)·(SPW − i − Q) + t·IB·(SPW − i − H)

Thus, if there are uniform cash flows before tax and a stepped pattern of cash flows after tax, the analyst can find the present worth of the after–tax cash flows by means of that relatively simple equation.
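A short numerical sketch of the three-segment present worth (all figures hypothetical, with H < Q < N as assumed above):

```python
def spw(i, n):
    """Series present worth factor (SPW - i - n): present worth of n equal end-of-year payments of 1."""
    return (1.0 - (1.0 + i) ** (-n)) / i

# Hypothetical inputs: loan period H < depreciation period Q < economic life N
A, P, I_B = 5.0e6, 30.0e6, 0.65e6
t, i = 0.35, 0.08
H, Q, N = 8, 15, 20

PW = (A * (1 - t) * spw(i, N)      # after-tax operating cash flow over the whole life
      + (t * P / Q) * spw(i, Q)    # depreciation tax shield during years 1..Q
      + t * I_B * spw(i, H))       # interest tax shield during the loan period
print(f"Present worth of after-tax cash flows: {PW:,.0f}")
```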

9.4.4 Depreciation

When a shipping company makes a major investment, it exchanges a large amount of cash for a physical asset of equal value. In its annual report it takes credit for that ship and shows no sudden drop in company net worth. Over the years, however, as the ship becomes less valuable, its contribution to the company's worth declines; that is, it depreciates.

Depreciation is of special significance when computing income taxes, for it is an accounting expense that reduces taxable income and hence reduces taxes, but it does not represent a cash flow during this accounting period. Instead, the cash flow may have occurred at time 0 when the ship was purchased, or it may be spread over a loan repayment period that is different from the lifetime used for depreciation.

Depreciation is not an actual cost or expenditure of cash, but a bookkeeping transaction used both for tax and for accounting purposes. For accounting purposes, depreciation is used to assess the 'profit' available for distribution to shareholders after applying a rate on fixed assets that maintains capital intact in money terms. The calculation of depreciation for tax purposes is nearly always different, and as it affects the actual cash flows and final net income, it is the aspect considered here.

Traditionally, depreciation (or capital) allowances have been calculated either as 'straight line' (annual allowance = ship cost/ship life) or 'declining (or reducing) balance' (annual allowance = percentage of residual value of ship each year), or other variants which, in effect, write off the initial cost over the expected life of the investment. In many cases, 'cost' may be acquisition cost minus expected residual value, e.g. assumed scrap value.

When using the basic interest relationships, e.g. CR, it is not necessary to add any further amounts for depreciation. The use of CR recovers the capital invested over the life of the ship, plus the required rate of return. However, depreciation affects the amount of tax payable by a shipping company. Regularly occurring expenses, such as operating costs, may be deducted in full before tax is levied, but purchase of a ship is treated on a different basis by means of depreciation allowances, strictly called 'capital allowances'.

Straight–Line Depreciation

In its simplest form, the ship is assumed to lose the same amount of value every year until the end of her economic life. This is called straight–line depreciation (Fig. 9.16).

Figure 9.16. Straight–line depreciation

The straight–line method spreads the amount evenly over the depreciation lifetime of N years. If the initial value or cost is P and the residual or salvage or scrap value is S, the annual depreciation Dn is found as

Dn = (P − S)/N ,     n = 1, 2, . . . , N

In most cases one is justified in ignoring the disposal value. Although it is hard to predict, the scrap value is typically less than 5% of the initial investment; and, being many years off, it has little impact on overall economics.


Declining Balance

The declining–balance method allocates each year a given fraction of the book balance at the end of the previous year. If the declining-balance rate is R per annum (expressed as a fraction), the N-th year depreciation allowance, as a percentage of the initial cost, is 100·R·(1 − R)^(N−1), where the declining balance rate R is given by

R = 1 − (S/P)^(1/N)

Such a method can be used for accounting purposes, and some countries' tax authorities use variants of it. For example, the declining balance method was instituted in 1984 for British shipowners for tax purposes. Following a transition period, the system adopted a declining balance rate of 25%. Thus the first year allowance is 25%, the second 18.75%, the third 14.06%, the fourth 10.55%, etc. It therefore takes about eight years to accumulate to 90%, a typical amount allowing for a 10% scrap value.
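The quoted schedule can be checked with a minimal sketch (normalized figures, 25% rate assumed as in the text):

```python
P, S, N = 1.0, 0.10, 8      # normalized first cost, scrap value, write-off horizon
R = 0.25                    # declining-balance rate (the 25% system quoted above)

book, schedule = P, []
for year in range(1, N + 1):
    allowance = R * book    # allowance = fixed fraction of the residual book value
    schedule.append(allowance)
    book -= allowance

print([f"{100*a:.2f}%" for a in schedule])              # 25.00, 18.75, 14.06, 10.55, ...
print(f"written off after {N} years: {100*(P - book):.1f}%")   # about 90%
```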

Until 1984 British shipowners were allowed to depreciate their ships for tax purposes at any rate they liked, with 100% first year allowances and 'free depreciation'. In practice, this meant writing the ship off as fast as profits permitted, i.e. extinguishing all liability for tax until the depreciation allowance had been exhausted. If there were profits from other ships in the fleet, or other activities of the business, it was possible to write off the entire cost of a new ship against tax liability on these other profits in the first year. From then on, tax was paid on the full profit. This could be called the 'full depreciation' or 'full tax' position. Any unused allowance (e.g., because of insufficient profits) can be carried forward and used in subsequent years.

A more general case for economic studies was to assume that depreciation could only be allowed against the profits of the particular ship being studied. This is equivalent to a newcomer to shipping, so it can be called the 'new entry' position. At typical freight rates, it then takes some 6 to 12 years before tax becomes payable; considering the time value of money, this is not worth as much as writing off in one year, but is better than writing off over, say, 20 years. In all cases, tax balancing charges are usually levied if the disposal value of a ship exceeds its written–down value for tax purposes, i.e. tax allowances have been granted on the full cost of the ship, but the disposal income needs to be set against this, so it is potentially taxable.

The 100% allowance system may encourage the leasing of expensive ships, whereby a financial institution such as a bank actually owns the ship. The ship may then be bareboat-chartered to a ship operator at a slightly lower rate than would otherwise be possible.

Accelerated depreciation and other complexities

Nearly every maritime country gives special tax treatment to ships, usually including some form of accelerated depreciation. Some tax laws recognize that straight–line depreciation is based on an unrealistic assessment of actual resale values of physical assets. This leads to various depreciation schemes that feature a large allocation during the first year of the asset's life and diminishing allocations thereafter. These declining amounts may continue over the entire economic life, or they may lead to complete write–off in some shorter period. One may thus find accelerated depreciation combined with fast write–off. In any event, the total taxes over the asset's life will once more be the same. The primary advantage of such schemes is to offer the company a more favorable earlier distribution of after–tax cash flows.

When dealing with shipowners, naval architects will likely have to talk to accountants who know all the tax rules and want to apply them to the design analysis. Naval architects must of course pay attention to the skills of these people. But they must also realize that they are usually safe in applying massive amounts of simplifying assumptions, at least in the initial design stages.

It should be known that some managers use the simplest sort of analysis in choosing projects and in deciding whether or not to go ahead with them. This is so even though they intend to use every possible tax–reducing trick if the project does indeed come to fruition. This suggests the wisdom of using simple methods, for example straight–line depreciation, in the conceptual design stage when hundreds or thousands of alternatives are under consideration, but then, having narrowed the choice down to half a dozen alternatives, letting the accountants adjust the chosen few to satisfy their needs.

Starting with gross simplifications enables looking ahead to the effect of the more elaborate tax schemes by recognizing that their net effect is to produce some modest increase in present values of future incomes. This may be taken into account by assuming a slightly lower tax rate. Alternatively, future cash flows can be discounted with a slightly lower interest rate.

9.4.5 Inflation

The aim here is to explain how to analyze monetary inflation, particularly how it may influence decision–making in ship design. In general, inflation has a trivial impact on rational design decisions. However, there may be special situations in which inflation should not be overlooked. Shipowners who expect more inflation than their bankers do are likely to favor going into debt, confident of their ability to pay off with the easier money of the future. Offsetting this, however, is the government's insistence on allowing tax–depreciation credits based only on historic costs rather than constant-value monetary units.

If it can be assumed that a shipowner is free to raise freight rates commensurate with any future inflation in operating costs, then all financial and economic factors will float upward on the same uniform tide. If that occurs, the 'best possible' ship based on no inflation will also be the 'best possible' ship in which inflation is taken into account.

Inflation need concern the design team only when it becomes apparent that rates of inflation are not the same for every factor in the economic structure. Long–term cash flows cannot be analyzed without first adjusting each year's figure according to the purchasing power of the monetary unit relative to any convenient base year.

Three basic approaches can be envisaged for calculating equivalence values in an inflationary environment that allow for the simultaneous consideration of earning power and changes in purchasing power. The three approaches are consistent and, if applied properly, should result in identical solutions. The first approach assumes that cash flow is estimated in terms of actual monetary units, whereas the second uses the concept of constant monetary units. The third approach uses a combination of actual and constant monetary units.

To develop the relationship between actual monetary unit analysis and constant monetary unit analysis, it is appropriate to give precise definitions of several inflation-related terms (Thesen and Fabrycky, 1989):

• Actual monetary units represent the out-of-pocket monetary units received or expended at any point in time. Other names for them are current, future, inflated, and nominal monetary units.

• Constant monetary units represent the hypothetical purchasing power of future receipts and disbursements in terms of the purchasing power of monetary units in some base year. It is assumed that the base year, the beginning of the investment, is always time zero unless specified otherwise. Other names are real, deflated, and today's monetary units.

• Market interest rate, i, represents the opportunity to earn as reflected by the actual rates of interest available in the financial market. The interest rates used previously are actually market interest rates. When the rate of inflation increases, there is a corresponding upward movement in market interest rates. Thus, the market interest rates include the effects of both the earning power and the purchasing power of money. Other names are combined interest rate, minimum attractive rate of return, and inflation–adjusted discount rate.

• Inflation-free interest rate, i′, represents the earning power of money isolated from the effect of inflation. This interest rate is not quoted by financial institutions and other investors and is therefore not generally known to the public. This rate can be computed, however, if the market interest rate and inflation rate are known. Naturally, if there is no inflation in an economy, i and i′ should be identical. Other names are real interest rate, true interest rate, and constant monetary unit interest rate.

• General inflation rate, d, represents the average annual percentage increase in prices of goods and services. The market interest rate is expected to respond to this general inflation rate.

The problem is then which is the best way to correct a misleading 'future value' into a reliable 'current value'. There are two alternative methods. Both are based on the same principles and, if correctly carried out, should produce the same final outcome and resulting design decision. One way is to prepare a year–by–year table in which all cash flows are entered in current values. The analyst is then in a position to apply standard interest relationships to find the present value or equivalent uniform annual cost of this current–value cash flow in the usual way.

The other approach, as might be guessed, is to start with face-value monetary units and apply a discount rate that has built into it adjustments for both inflation and time–value of money. This method can be handled by simple algebraic procedures and does not require the time–consuming, error-prone, year–by–year tabular approach described previously. It allows one to find the present worth (corrected for inflation) of a future cash flow that is subject to predictably changing monetary values.


The task, now, is to derive values of i for any given set of assumptions as to the rate of inflation and time-value of money. Note that i incorporates both time–value of money and inflation.

One way is to start with the simple case in which some cost factor is floating up right along with the general inflation rate, d. That being the case, although it appears to be increasing in face–value terms, it is really holding steady in real purchasing power. That is, it is always the same in current–value monetary units; so one can ignore inflation and say

i = r

Next, examine the case where a given cost factor remains fixed in face–value monetary units during a period of general inflation. One example might be straight–line depreciation. Another would be a fixed–level charter fee. In any given year

AFV = Ao

Correcting for inflation

ACV = AFV/(1 + d)^N = Ao/(1 + d)^N

and correcting to present worth

PW = ACV/(1 + r)^N = Ao/[(1 + r)^N·(1 + d)^N]

That is, double discounting is employed, once for the time-value of money, and again for the declining real value of the monetary unit. In short, where costs remain fixed in face–value terms one may use

i = (1 + r)·(1 + d)− 1

Finally, consider the case of a cost factor that changes at an annual rate, x, that differs from general inflation. In face–value terms

AFV = Ao (1 + x)^N

Correcting first for inflation and then for the time-value of money, as before, gives PW = Ao (1 + x)^N / [(1 + r)^N·(1 + d)^N], so the equivalent single discount rate is

i = (1 + r)·(1 + d)/(1 + x) − 1

This final expression may, in extreme cases, produce a negative interest rate (equivalent to paying the bank to guard cash). This will lead to a present worth exceeding the future amount.
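A minimal numerical check of this equivalence (the rates below are hypothetical) compares the explicit year-by-year tabulation with the single combined discount rate:

```python
# Year-by-year tabulation vs. combined discount rate for a cost escalating at rate x
r, d, x, N = 0.06, 0.04, 0.09, 10   # hypothetical real rate, inflation, escalation, years
A0 = 1.0e6                           # today's level of the cost item

# (a) explicit table: escalate at x, deflate by d, discount at r
pw_table = sum(A0 * (1 + x) ** n / ((1 + d) ** n * (1 + r) ** n) for n in range(1, N + 1))

# (b) single equivalent discount rate i = (1+r)(1+d)/(1+x) - 1
i = (1 + r) * (1 + d) / (1 + x) - 1
pw_rate = sum(A0 / (1 + i) ** n for n in range(1, N + 1))

print(f"i = {i:+.4f}   PW(table) = {pw_table:,.0f}   PW(rate) = {pw_rate:,.0f}")
```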

Non-Annual Compounding

In most ship design studies engineers usually assume annual compounding when weighting the time–value of money. There may be instances, however, when other compounding periods should be recognized. It may be recalled that the standard interest formulas are applicable to any combination of compounding periods and interest rate per compounding period.


Clearly, when changing the frequency of compounding the analyst also changes the weight given to the time–value of money. In order to make a valid comparison between debts involving differing compounding periods, the analyst needs an algebraic tool that will assign to each repayment plan a measure that is independent of frequency of compounding.

The usual approach to this operation is based on what is generally called the effective interest rate, abbreviated r1. This is an artificial interest rate per year that ascribes the same time–value to money as some nominal annual rate, rM, with M compounding periods per year.

For example, suppose one loan plan is based on quarterly compounding at one interest rate, whereas another is based on monthly compounding at a somewhat lower rate. It is not possible to tell, by looking at the numbers, which is more desirable. If both nominal annual rates are converted to their corresponding effective rates, however, those values will tell which is the better deal. To convert from a nominal annual rate, rM, to the effective rate, r1, the following equation is used:

r1 = (1 + rM/M)^M − 1
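A minimal sketch of the comparison described above (the two nominal rates are hypothetical):

```python
def effective_rate(r_nominal, m):
    """Effective annual rate for a nominal rate compounded m times per year."""
    return (1 + r_nominal / m) ** m - 1

# Hypothetical comparison: quarterly at 8.0% nominal vs. monthly at 7.9% nominal
print(f"quarterly: {effective_rate(0.080, 4):.4%}")   # about 8.24%
print(f"monthly  : {effective_rate(0.079, 12):.4%}")  # about 8.19%
```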

9.4.6 Escalation Rate

Escalation rate represents a specific inflation rate sometimes applicable in contracts. It is quite understandable how increasing costs reduce the profitability of an investment. During the years of industrial expansion of the post–war period, freight rates for a given ship and cargo have broadly followed the escalation in nearly every cost item concerned, due to rates of inflation and oil prices going up particularly in the 1970's, although the underlying trend has often been obscured by market fluctuations, increasing ships' efficiency and reductions arising from the economies of scale as larger ships have been introduced.

Voyage charter rates do not include escalation clauses, nor do the majority of time-charter rates, which cover short and medium periods, i.e. they remain fixed for the duration of the charter. However, sometimes escalation clauses covering increases in certain operating costs are included in the few long–term charters. Liner conference freight rates have been adjusted regularly over the years as elements of running costs have increased, particularly bunker costs.

In the majority of economic studies concerned with actual ships, it is suggested that money terms are used throughout (i.e. the actual cash amounts moving through the company's bank account, including escalation), as this is the form usually used by shipowners in evaluating projects, whose cash flows from charter income and loan repayments are expressed in money terms. Use of money terms also forces attention on differential escalation rates (if all costs and income rose at equal rates, it would be easy to work in real terms), on second–hand values (ships are often sold long before the end of their physical life), and on likely rates of return, both before and after tax. It also makes hindcasting easier, i.e. checking on the results of previous evaluations. General forecasts of inflation, plus analysis of past data, can be used to assist in estimating escalation rates.


It is equally possible to work in real terms, i.e. in money of constant purchasing power, but adjustments need to be made when some costs may be quoted in money terms, e.g. progress payments when building ships, while others may be estimated in real terms, e.g. crew costs.

9.5 Economic Criteria

So far, this chapter has dealt with the basic principles of engineering economics. It has shown how to assess the relative values of cash exchanges that occur at different times, and how to analyze the impact of taxes and interest payments on cash flows. Now comes the critical question of how to apply all of the foregoing to decision making in ship design.

It should be stressed that there is no universally accepted technique for weighing the relative merits of alternative designs. Business managers, for example, may agree that the aim in designing a merchant ship should be to maximize its profitability as an investment. But they may fail to agree on how to measure profitability. Likewise, officials who are responsible for designing non–commercial vessels, such as for military or service functions, have a hard time agreeing on how to go about deciding between alternative designs. The truth of the matter is that there are good arguments in favor of each of several economic measures of merit and the designer should understand how to handle each of them.

9.5.1 Set of Economic Criteria

The most widely used techniques are those associated with discounted cash flows. The magnitude and timing of cash payments in and out are estimated over the vessel's life for acquisition costs, operating costs, and revenue generating potential.

The time value of money is recognized by discounting future cash flows at the operator's cost of capital, i.e. multiplying each year's net cash flow by (1 + i)^−N, where N is the number of years from project start, and i is the discount (or interest) rate as a decimal fraction. Cash flow calculations may be made in 'money terms' (the actual amount in money of the day) or in 'real terms' (money of constant purchasing power). In the former case inflation has to be allowed for; in the latter case the interest rate will be in real terms, which is approximately the rate in money terms minus the rate of inflation.

Table 9.2 identifies thirteen measures of merit, each based on sound economic principles. Each is of potential value in marine design, and several have strong supporters. They are placed in three categories depending on whether the analyst wants to assign, versus derive, a level of income and assign, versus derive, an interest rate.

There are only three primary economic criteria: the other ten are each closely related to one of those three. Here the discussion is confined to the four most important measures of merit. These are the three primary criteria shown in the middle column of Table 9.2 (net present value, yield, and average annual cost) plus required freight rate. All assume uniform annual costs and revenues, although levels may vary between alternatives. The last is not stated explicitly in Table 9.2; but in structuring a problem to find its minimum value, it is generally implied that the present worths of income and expenditure are equal so that their 'net present value' is zero.

Required Assumptions             Primary Measure    Surrogates or
Revenue      Interest Rate       of Merit           Derivatives
------------------------------------------------------------------------
yes          yes                 NPV                NPVI, AAB, AABI
yes          no                  IRR                CR, CR′, PBP
no           yes                 AAC                LCC, CC, RFR, ECT

Table 9.2. Three major categories of economic criteria

The numerical results will be different in each case depending on what criterion is being calculated, but if used to compare alternative ship designs, all would indicate the same optimal design if data are consistent, e.g. rates of return are commensurate with freight rates.

Marine literature contains many studies based on questionable logic. Perhaps the most common variety tries to minimize the unit cost of service. That is, someone looks for the alternative that minimizes the cost to the shipowner. This is technically called the fully distributed cost. It is something like the required freight rate, but ignores corporate income taxes and applies a rock–bottom interest rate to total capital. By ignoring taxes and minimizing the time–value of money, this criterion is almost always misleading.

9.5.2 Definition of the Economic Criteria

The most popular economic criteria for marine problems, which are the most rational measures of economic merit to optimize the design from the shipowner's point of view, can first be considered under a set of simplifying assumptions:

- all annual incomes and expenses remain uniform in constant–value terms;

- the investment is made in a single lump-sum payment upon delivery of the ship;

- no bank loans and tax-credits are involved;

- the tax life equals the economic life;

- straight–line depreciation is used in figuring tax;

- the scrap value is zero.

Most initial ship design economic studies will probably not be afflicted with complex cash flow patterns, but will rather consist of a single investment, at year zero, and uniform after–tax returns. In the sequel the different economic criteria are illustrated.

Net Present Value

The net present value (NPV) criterion is by far the most popular and easily understood of all the economic measures of merit in use today among business managers.


It requires an estimate of future revenues and it assigns an interest rate for discounting future, usually after–tax, cash flows. The discount rate is usually taken as the minimum rate of return acceptable to the decision maker. As implied by its name, NPV is simply the present value of the projected cash flow including the investments.

If the building cost of a ship is known, together with the minimum required rate of return on the capital invested (discount rate), all the annual operating costs, the cargo quantity transported each year and the corresponding freight rate (i.e. annual revenues), one can calculate the present worth of each item of income and expenditure and add them to find NPV.

The general form of NPV for freight earning ships is defined by the difference between the present value of cash receipts over the project life and the present value of all cash expenses

NPV = Σ (from 0 to N) [ PW(annual payload quantity × freight rate) − PW(shipbuilding cost) − PW(annual operating costs) ]

In the simple cash–flow pattern shown in Figure 9.5, if the cash flows after tax have a uniform level, A′, over the ship's life, and P represents a single lump investment, the net present value is found by subtracting the investment from the present value of the future cash flows; in short

NPV = A′ (SPW − i′ −N)− P

where

A′ : uniform annual after–tax cash flow = A (1 − t) + t·P/N
A  : uniform annual cash flow before tax
t  : corporate income tax rate
(SPW − i′ − N) : series present worth factor for an owner's stipulated minimum acceptable after–tax rate of return i′ and a period of N years

If cash flows are not uniform, the present worth of each annual cash flow after tax can be calculated for each of the N years of the ship's life.
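For the uniform case, a minimal sketch of the NPV calculation (all figures hypothetical) is:

```python
def spw(i, n):
    """Series present worth factor (SPW - i - n)."""
    return (1.0 - (1.0 + i) ** (-n)) / i

# Hypothetical single-investment case with uniform after-tax cash flows
P, A, t, N, i_prime = 30.0e6, 5.0e6, 0.35, 20, 0.08
A_prime = A * (1 - t) + t * P / N          # uniform annual after-tax cash flow
NPV = A_prime * spw(i_prime, N) - P
print(f"A' = {A_prime:,.0f}   NPV = {NPV:,.0f}")
```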

The net present value may be regarded as an instantaneous capital gain if positive (or loss, if negative), or as a discounted profit, or the sum for which the total project could be sold at its start. Consequently designs with the highest NPVs are sought. Of course, when the NPV is negative, the project would be rejected.

The NPV economic criterion has two inherent weaknesses: it tends to favor massive investments and it can be misleading if alternatives have different lives. The first weakness may be overcome by using the net present value per monetary unit of investment. This normalized quantity is called the net present value index (NPVI) or profitability index (Benford, 1981)

NPVI = NPV/P


If alternatives have different lives, NPV tends to favor the longer lived. That distortion can be eliminated by multiplying each NPV by a capital recovery factor based on the same discount rate, but appropriate to the individual life expectancies. This criterion is called the average annual benefit

AAB = (CR′ − i′ −N) NPV

If one takes the AAB per monetary unit invested, that will eliminate both weaknesses in the use of NPV. This third variation of NPV is called the average annual benefit index

AABI = AAB/P

Internal Rate of Return

It is important to notice that NPV is found by discounting future cash flows at the decision maker's minimum acceptable interest rate. Because the predicted value of an acceptable project must always be positive, the actual expected interest rate will be something higher than the minimum rate used in computations. Instead of applying that minimum acceptable rate, the decision maker could look at the expected cash flow pattern and derive the interest rate implied.

There is some interest rate that will make the NPV of a cash flow equal to zero. It is the internal rate of return (IRR), which is another time–discounted measure of investment worth. It is feasible to use IRR in cases where the freight rate or income is known. Designs are preferred that offer the highest IRR, which also goes by various names, including discounted cash flow rate of return (DCF), yield, equivalent interest rate of return, profitability index, marginal efficiency of capital, equivalent return on investment, and others.

It is derived iteratively, or using a 'goal seek' function, by finding the interest rate that will make the present worth of the future after-tax cash flows equal to the present worth of the investment. In short, IRR is that interest rate that leads to an NPV of zero. In other words, it is the maximum rate of interest at which the shipowner could finance his/her ship on normal bank overdraft terms or any equivalent (i.e. where interest is charged only on the outstanding balance of the loan) in the absence of uncertainty.

In simple patterns, however, yield can be easily found. First find the expected after–tax capital recovery factor (i.e., the ratio of A′ to P), then go to interest tables and find the interest rate that corresponds to that combination of CR′ and ship's economic life, N. There are also various extensions to the basic method to cater for special situations.
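The iterative search mentioned above can be sketched with a simple bisection on the discount rate (hypothetical cash flows, assuming a single sign change):

```python
def npv(rate, cash_flows):
    """cash_flows[0] is the (negative) investment at year 0."""
    return sum(cf / (1 + rate) ** n for n, cf in enumerate(cash_flows))

def irr(cash_flows, lo=1e-6, hi=1.0, tol=1e-8):
    """Bisection on the discount rate; assumes NPV is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

flows = [-30.0e6] + [3.775e6] * 20   # hypothetical: P at year 0, uniform A' thereafter
print(f"IRR = {irr(flows):.2%}")
```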

IRR avoids the shortcomings of NPV in that it does not give unfair advantage to larger investments or those with longer lives. Nevertheless, some advocates of NPV point to cases where IRR may be misleading. This is particularly true where the attainable yield differs markedly from the company's actual value of money. Another of its shortcomings may show up if the analyst is faced with a cash flow pattern that shows a year–by–year mix of money coming in or out. That being the case, it may turn out that there is more than one interest rate that will bring the net present value down to zero. Fortunately, most ship economic studies involve simple cash–flow patterns in which that dilemma does not arise.

When revenues are predictable, decision makers should take advantage of that knowledge to estimate the equivalent interest rate of return. That frees them of the necessity to stipulate an interest rate, which is a tricky question. Nevertheless, the yield must be compared with some interest rate. The naval architect has to compare the estimated yield for every alternative design and select the highest. Whether the figure is high enough to justify the risk of investment is strictly the manager's concern.

When properly applied, IRR will usually produce exactly the same answers as the net present value method; and it requires exactly the same information. The choice between the two methods is, therefore, largely a matter of personal preference.

If all design alternatives have equal lives, then an examination of the relationship between capital recovery factor and interest rate will show that the alternative with the highest value of CR′ will automatically have the highest yield. Moreover, the alternative with the highest capital recovery factor before tax, CR, will normally enjoy the highest capital recovery factor after tax, CR′. This means that CR may be a surrogate for the yield.

A surrogate for IRR is the pay–back period, PBP, the number of years required to regain the initial investment:

PBP = P/A′

As is evident, PBP is the reciprocal of the capital recovery factor after tax, CR′. As such, it shares both the strengths and weaknesses of that criterion.

Average Annual Cost

The next economic criterion is useful in designing ships that are not expected to generate income: naval vessels, patrol vessels, dredgers, yachts, etc. Now the cash flow pattern will feature only money flowing out. When that is the case, a logical and popular measure of merit is the so–called average annual cost (AAC) criterion. The AAC measure of merit may be applied also to merchant ship designs where all alternatives would happen to have equal incomes, which includes the possibility of that being zero.

Whereas in using NPV or IRR the analyst seeks the alternative promising the highest values, in using AAC the lowest values are desired.

The simplest case ought to have a single initial investment, P, at time zero, and uniform annual operating expenses, Y, for N years thereafter. The problem reduces to finding, among a set of alternatives, the design solution with the lowest value of AAC, which would be found by converting the initial investment, P, to a uniform annual amount, which would be added to the annual operating costs, Y

AAC = CR·P + Y = (CR− i−N)·P + Y


where

Y annual operating costs

CR capital recovery factor corresponding to the life of investment, N, and the owner's stipulated before-tax interest rate of return, i;

The term (CR − i − N)·P is called the annual cost of capital recovery, ACCR. Note that it is based on before–tax interest rates in case revenues and taxes are both involved.

The interest rate should be some logical measure of the decision maker's time-value of money. In the case of a government–owned ship it might reflect the current rate of interest paid on government bonds. For more complex cash flows, simply discount everything back to year zero (including P), then multiply the total figure by the capital recovery factor based on the same interest rate for a number of years equal to the ship's life span. That will produce the average annual cost.
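A minimal sketch of the AAC calculation for the simple case (hypothetical figures for a non-revenue-earning vessel):

```python
def capital_recovery_factor(i, n):
    return i / (1.0 - (1.0 + i) ** (-n))

# Hypothetical non-revenue-earning vessel
P, Y, i, N = 45.0e6, 3.0e6, 0.07, 25      # first cost, annual operating cost, interest, life
ACCR = capital_recovery_factor(i, N) * P   # annual cost of capital recovery
AAC = ACCR + Y
print(f"ACCR = {ACCR:,.0f}   AAC = {AAC:,.0f}")
```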

Required Freight Rate

If two competitive designs promise the same average annual cost, but one promises to be more productive than the other, this difference is quantified by relating the AAC to productivity. In the case of merchant ships this is done by dividing the average annual cost by the annual transport capacity. This gives the required freight rate (RFR), which economists call a 'shadow price'. It represents the necessary minimum income per unit of capacity (e.g. passengers or cargo) to cover all operating costs while providing the required rate of return on capital invested in the ship. The same concept could be applied to other measures of productivity such as cars per year and/or passengers per year for a ro–ro vessel, tons of fish per year for a trawler, and so forth.

If the acquisition cost, P, of a ship is known, together with the required rate of return, i, all the operating costs, Y, and the annual cargo quantity transported, C, the level of freight rate can be found which produces equal present values of income and expenditure, i.e. zero NPV. This criterion is more suitable when revenues are unknown but will vary between alternatives because of differences in transport capability. In general

RFR = AAC/C = (ACCR + Y)/C = (CR·P + Y)/C

where the annual cargo capacity, C, can be expressed in any convenient unit, and Y = Yr + Yv, Yr and Yv being the annual running costs and the voyage costs, respectively.
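A minimal sketch of this formula (all figures hypothetical):

```python
def capital_recovery_factor(i, n):
    return i / (1.0 - (1.0 + i) ** (-n))

# Hypothetical cargo ship: the preferred design is the one with the lowest RFR
P, Y, i, N = 35.0e6, 6.5e6, 0.09, 20   # first cost, annual running + voyage costs, return, life
C = 450_000                             # annual cargo capacity, tonnes
RFR = (capital_recovery_factor(i, N) * P + Y) / C
print(f"RFR = {RFR:.2f} per tonne")
```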

So RFR can be regarded as a calculated freighting cost, which can then be compared with the actual freighting price, i.e. market freight rates. For service vessels (e.g. offshore vehicles like crane barges, pipe–laying ships, etc.), RFR may be calculated in the form of a necessary daily hire rate.

In other terms, it is the rate the shipowner must charge the customer if he/she is to earn a reasonable return on investment. The theory is that the owner who can enter a given trade route with a ship offering the lowest RFR will best be able to compete.


This is a valuable criterion, much used in the marine industry for studying the feasibility of new ship concepts or optimizing the details of any particular concept. It provides the money amount the shipowner must charge a customer if the shipowner wants to earn a reasonable after–tax return on his/her investment. The theory behind RFR is that the best ship for any given service is the one that provides that service at minimum cost to the customer. Implicit here are the assumptions that (i) free market forces predominate in the trade, and (ii) all competitors operate within the same frame of capital and operating costs.

A key step in finding RFR is to convert the initial investment to an equivalent uniform annual negative cash flow before tax. These annual amounts must be large enough to pay the income tax, and return the original investment to the owner at the specified level of interest. In short, a suitable value for the capital recovery factor before tax must be found.

To show the truth of the above assertion, recall the basic relationship (9.2) between cash flows before and after tax

A′ = A (1 − t) + t·P/N

To make this non–dimensional, divide through by the initial investment P

A′/P = A (1 − t)/P + t/N

But

A′/P = CR′     and     A/P = CR

which leads to

CR′ = CR (1 − t) + t/N

Then solving for CR

CR = (CR′ − t/N) / (1 − t)

Since t and N are the same for all design alternatives, CR will vary directly with CR′ which will, in turn, vary with the yield, i′. This is a simple way of converting an after–tax interest rate to a before–tax capital recovery factor. It assumes an all–equity investment and a tax depreciation period equal to the ship's economic life.
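A minimal sketch of this conversion (the stipulated yield, life and tax rate below are hypothetical):

```python
def capital_recovery_factor(i, n):
    return i / (1.0 - (1.0 + i) ** (-n))

# Hypothetical: owner stipulates an after-tax yield i' = 10% over N = 20 years, tax rate 35%
i_prime, N, t = 0.10, 20, 0.35
CR_after = capital_recovery_factor(i_prime, N)   # CR' corresponding to the after-tax yield
CR_before = (CR_after - t / N) / (1 - t)         # equation derived above
print(f"CR' = {CR_after:.4f}   CR (before tax) = {CR_before:.4f}")
```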

For non-uniform cash flows, an initial freight rate has to be assumed so that an initial NPV can be calculated as above. This NPV is unlikely to be zero, so an iterative procedure has to be used to find the exact freight rate which gives zero NPV.

If the ship is in a fixed, single–cargo trade, then the annual transport capacity, C, is fairly easy to estimate. If the ship is in, say, a round–the–world voyage with many ports of call and many classes of cargo, RFR may prove too cumbersome to be practical.


Payback Period

A popular rule-of-thumb for evaluating projects is to determine the number of periods needed to recover the original investment. The payback period (PBP) is defined as the number of periods it will take to recover the initial investment outlay.

Assuming uniform annual returns, the payback period is given by

PBP = P/A′

This is the reciprocal of CR′ and so incorporates all that criterion's strengths and weaknesses. Obviously, the most serious deficiencies of the payback period are that it fails to consider the time value of money and that it fails to consider the consequences of the investment after the payback period.

As a modification of the conventional payback period, one may incorporate the time value of money. The method is to determine the length of time required for the project's equivalent receipts to exceed the equivalent capital outlays.

Mathematically, the discounted payback period Q is the smallest value of Q that satisfies the expression

Σ (from n = 0 to Q) Fn/(1 + i)^n ≥ 0                    (9.4)

Clearly, the payback period analysis is simple to apply and, in some cases, may give answers approximately equivalent to those provided by more sophisticated methods. Many authors have tried to show an equivalence between the payback period and other criteria, such as IRR, under special circumstances. For example, the payback period may be interpreted as an indirect, though quick, measure of merit. With a uniform stream of receipts, the reciprocal of the payback period is the IRR for a project of infinite life and is a good approximation to this rate for a long–lived project.
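A minimal sketch of the discounted payback period of equation (9.4) (hypothetical cash flows):

```python
def discounted_payback(cash_flows, i):
    """Smallest Q such that the cumulative discounted cash flow (year 0 included) is non-negative."""
    cumulative = 0.0
    for n, cf in enumerate(cash_flows):
        cumulative += cf / (1 + i) ** n
        if cumulative >= 0:
            return n
    return None   # never recovered within the horizon

flows = [-30.0e6] + [3.775e6] * 20        # hypothetical investment and uniform after-tax returns
print(discounted_payback(flows, 0.08))    # compare with the simple payback P/A' of about 8 years
```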

There are many reasons why the payback period measure is so popular in business. One reason is that the payback period can function like many other rules-of-thumb to shortcut the process of generating information and then evaluating it. Payback reduces the information search by focusing on the time when the firm expects to 'be made whole again'. Hence, it allows the decision maker to judge whether the life of the project past the break even point is sufficient to make the undertaking worthwhile.

In summary, the payback period gives some measure of the rate at which a project will recover its initial outlay. This information is not available from either the NPV or the IRR. The payback period may not be used as a direct figure of merit, but as a constraint: no project may be accepted unless its payback period is shorter than some specified period of time.


9.5.3 Choice of the Economic Criteria in the Marine Field

The following are just general recommendations. When revenue is either unknown or zero, assume a reasonable interest rate and convert all costs to discounted cash flows. When revenue is predictable, use equivalent yield as economic criterion. If revenues are the same for all alternatives, seek the one with the lowest average annual cost. If transport capabilities vary, divide average annual cost by the annual tons of cargo (number of units, number of passengers) moved. This gives the freight rate required to return the stipulated yield; and the best ship for the trade is the one with the lowest required freight rate.

In finding average annual cost or required freight rate, one must not overlook the corporate profit tax. Where revenues are predictable, engineers need worry less about the tax. Under most normal circumstances the 'best possible' ship before tax will also be the optimum ship after tax, assuming all alternative designs charge the same freight rate.

The NPV economic criterion is widely used, especially where investment funds are limited, but it is best used in those cases in which income can be predicted reasonably confidently, e.g. long-term time–charters. It has the computational merit of being a single calculation not requiring an iterative solution. A drawback to its use is interpretation of the results. The differences between investments are absolute, not relative, and this can make comparison of widely different alternatives difficult. This may be partially overcome by the net present value index (NPVI) introduced by Benford (1970), which can be used to compare investments differing greatly in absolute size, e.g. coastal tankers versus very large crude oil carriers.

Alternatively a profitability index may be calculated as the ratio between the NPV of cash inflows and the NPV of cash outflows. There still remains the problem of comparison when NPVs are close to zero or negative, and of forecasting income in a fluctuating business like shipping. NPVI used as a measure of merit is analogous to IRR, since it is effectively a 'profit' divided by the first cost.

The RFR measure of merit is useful in the many cases where incomes are unknown. In an internationally competitive business like shipping, rates of return oscillate about a long-term trend, and over a ship's life it is not unreasonable to expect that freight rates will provide a return on an efficient ship tending to the average trend. If this did not occur, shipowners would not reinvest in new tonnage, demand would ultimately exceed supply and produce its own correction in the form of higher freight rates, unless there is too much non-commercially run tonnage available (e.g. state supported fleets). Freight rates do not remain permanently in peaks or troughs. RFR is particularly useful when comparing alternative ship sizes, as a single freight rate cannot be expected to apply to all sizes: the market ensures that economies of scale are eventually passed on to the consumer. RFR can be compared with predicted market rates to see if the results appear realistic. Low discount rates may lead to over-design, e.g. ships faster than is 'economic', since capital cost is being assessed more 'cheaply' than operating costs. High discount rates may result in required freight rates so high as to be unattainable under normal market conditions, so the design is likely to be uncompetitive in the sense of being unable to find business. The RFR has different units according to the type of vessel and duty, e.g.


passenger ships        cost per passenger-mile
ro-ro ships            cost per vehicle-mile
cargo carrier ships    cost per tonne-mile
container ships        cost per TEU-mile
patrol craft           cost per day

The AAC concept is analogous to RFR for comparing alternatives which have equal annual transport capability, or equal annual performance capability for those vessels which do not generate an income. In using AAC, all costs are discounted to year zero, to give a present value of costs which can be converted to an equivalent annual amount via the capital recovery factor. This criterion can also be used for items of equipment which do not affect a ship's earning potential.

The IRR economic criterion gives a more recognizable comparison between widely different alternatives, especially where funds available for investment are relatively unrestricted. It is a useful method for additional pieces of equipment, especially those not significantly affecting a ship's income, where it can be measured against some target rate of return for the degree of risk involved. Like NPV, there is the problem of forecasting income, but in addition, IRR is not related to the absolute amount of the investment. IRR is, however, not the same as the profit on historic capital shown in a company's accounts, but is more like the rate of return on a fixed interest rate investment like a government stock. In general, the design with maximum CR will be that with the highest IRR, if lives are equal. In theory, there will be multiple solutions to the calculation of IRR where cash flows alternate in sign, but this is not often a problem in marine work (Sloggett, 1984).

The incremental rate of return is a variant that calculates the IRR on an additional investment, e.g. an extra piece of equipment on a ship, or the difference between two projects' cash flows, to show whether the rate of return on this 'incremental' investment is at least as high as that on the basic ship. In this case, only the cash flows and extra first cost associated with the 'increment' are used in calculating the rate of return, so simplifying the appraisal, as δA′/δP → CR′ → i′.

Permissible cost can be used when assessing newbuilding prospects or the purchase of second-hand ships, comparing this price against current ship prices and expected freight rates. It can also be used to assess new items of machinery or equipment, whose operational costs and savings can be estimated.

Figure 9.17 shows the normal circumstances under which one of the criteria may be selected for ships, according to the amount of information known. The designer's task is primarily that of selecting the best alternative, leaving to management the problem of whether to invest at all and, if so, when. In the marine field, where it is not always possible to predict income over the life of a ship, the preference should be for required freight rate as the most useful long–run economic criterion in establishing the most economic vessel design. In the case of closely competing alternatives, a range of assumed freight rates may then be taken, so that NPVs and IRRs can be calculated to see whether the order of merit of the alternative designs indicated by RFR is changed. Where equipment, rather than the entire ship, is being considered, income may take the form of cost savings, and IRR is a useful criterion, especially where ship performance (speed, payload, port time, etc.) is not significantly affected.

Figure 9.17. Decision chart for selecting the economic criterion

The criterion of payback period, PBP, is still sometimes used in industry. This is the number of years it takes the net revenue (income − expenditure) to accumulate to the level where it equals ('pays back') the investment. While the payback period is numerically equal to SPW for uniform cash flows, P/A, the value of i should still be calculated for the appropriate N. A variant calculates the number of years before the discounted net revenue equals the investment. This is analogous to rate of return, but solving for N instead of i. Payback period should not be used for non-uniform cash flows, as all variation in income and expenditure for years beyond the payback period is completely ignored, taking little account of cost escalation or change in performance with time. Its use as a primary criterion is therefore not recommended, but it can be presented as a supplementary result or a simple shorthand for results derived more rigorously, especially if the result is attractively small.

Even if non–economic factors are the primary reason for purchasing a ship in the first place, e.g. national prestige, technical and economic criteria still have their place in assisting the selection of the best of the alternative ship designs, machinery and equipment.


9.6 Ship Costs

The cost of any ship is a function of different kinds of variables - technical, physical, managerial, political. Its complete estimation calls for professional guidance from a range of disciplines, some of which are quite remote from that of naval architecture - accountancy, planning and production control, trade union agreements, shipyard management, insurance and many others. But at initial design stages naval architects aim at nothing more than first approximations of ship costs, which can be obtained fairly quickly, but which are nevertheless associated to some extent with the physical features of the ships under consideration. The reasons for costing from the conceptual design stage onwards are to get an idea of the capital investment involved and to see how the cost might be affected by altering any of the principal variables.

Life–cycle costs, including both building and operating costs, are among the most important parameters influencing the choice between competing vessels. Cost is concerned with how much money the shipbuilder will pay for shipyard labor to build the ship, subcontractors to assist, all materials and equipment contained in the completed vessel, miscellaneous services and establishment charges. That is why estimates of costs with a good level of accuracy are desirable from the initial stages of the design process. This may not be easy, as the costing data necessary for the calculations are usually not readily available. Indeed, shipowners and shipbuilders are characteristically reluctant to share cost information with one another.

The design team is generally concerned with evaluating alternative solutions in conceptual design. The alternatives will usually differ not only in performance, but in their first costs and operating costs. It is, therefore, useful to obtain quickly an indication of relative costs, before developing more detailed studies which may involve work by other organizations. Cost estimates may be broadly divided into three main categories:

1. Conceptual design cost estimate for selection process.

2. Basic design cost estimate, associated with detailed exploration of robust alternatives.

3. Fully detailed cost estimate, usually for tendering purposes.

The expected level of accuracy increases with detail, as does the amount of data and effort required. Here only the first category is considered, because cost estimating is more likely to be applied at this level by ship operators, consultants, equipment suppliers, regulatory bodies, researchers, etc., rather than at the more detailed levels, which are largely the preserve of professional cost estimators, e.g. in shipbuilding companies.

At conceptual design stage it is not possible to suggest more than very simple cost estimating relationships for approximate estimates; nevertheless, these can still be useful in establishing the potential feasibility of a project, and in ranking the principal alternatives for more detailed study.

In the ship design context, the need to estimate the principal costs to carry out an economic evaluation concerns the following components of an economic model:

• Building cost

– Structural hull


– Outfitting

– Machinery

• Voyage costs

– Fuel consumed in transit

– Fuel consumed on duty

– Fuel price

– Other consumables, e.g. lube oil

– Port charges

– Payload handling charges

– Other shore/base costs

– Cost escalation

• Manning costs

– Crew

– Upkeep

– Insurance

– Stores

– Overheads/Administration

– Manning (per running hour and total time)

– Cost escalation

• Financial factors

– Internal rate of return

– Expected economic life

– Financing/loan terms

– Fiscal factors (tax, subsidy)

– Exchange rates

– Residual value

• Payload/Revenue/Effectiveness

– Sea time per transit (mission/duty)

– Port/base time

– Non-operational/off-hire time

– Voyages/missions per year (with seasonal variations if appropriate)

– Payload/duty performed per voyage/mission

– Freight rate

– Total revenue

– Revenue escalation

What follows is mainly an explanation of how one can structure a procedure for estimating the costs of alternative design concepts. Naval architects need to complement what is explained here with appropriate real–life data collected from various sources. Nevertheless, the following notes will assist in producing approximate estimates, although they are not a substitute for more detailed methods or more accurate data where these are available. Contributions by Erichsen (1972), Carreyette (1978) and Kerlen (1985) may also usefully be consulted for methods and data.

9.6.1 Building Cost

Analyses of engineering economic cost always involve an estimate of invested costs. Indeed, the construction costs are usually the single largest, hence most important, factor entering into the analysis. Although shipbuilding costs may be estimated for several different reasons, here the main scope is to help make rational decisions at the conceptual design stage.

Naval architects normally want to predict the economics to assist comparison of large numbers of alternative designs (Buxton, 1987) as well as to establish the design variables and parameters of the most efficient vessel (synthesis or optimization). This means that the estimating methods should be relatively simple, provided basic data are available. The alternatives under consideration usually exist only as virtual concepts about which few details have been established. This, too, suggests that the techniques must be relatively simple. Moreover, the estimating methods should strive to emphasize differences in costs between the competing alternatives.

At the simplest level, the first cost of a ship is influenced mainly by her type, size, speed, hence power. Where the range of possible specifications is small, e.g. in straightforward vessels such as tankers, size alone is often a fair guide to approximate first cost. Maritime journals such as Fairplay and Lloyd's Ship Manager include published prices of recent contracts, and graphs can be plotted to give an indication of expected prices, at least when market conditions are reasonably stable. Such graphs may indicate whether a simple cost relationship of the form

P = k·L^α·B^β·D^γ     or     P = k·(L·B·D)^x

may be derived. The slope of such a curve, if plotted on log–log graph paper, is given by x, typically about 0.7; that is, cost increases less rapidly than size, as would be expected. Regression analysis can be used where there are more variables, e.g. speed.
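
As an illustration only, the following sketch fits such a relationship by linear regression on logarithms; the four (L·B·D, price) pairs are hypothetical placeholders standing in for published contract data, not figures from this chapter.

import numpy as np

# Hypothetical contract data: L*B*D in cubic metres and reported price in euros.
lbd = np.array([30e3, 60e3, 120e3, 240e3])
price = np.array([18e6, 29e6, 47e6, 75e6])

# Fit log(P) = log(k) + x*log(L*B*D); the slope x is typically about 0.7.
x, log_k = np.polyfit(np.log(lbd), np.log(price), 1)
k = np.exp(log_k)
print(f"P = {k:.0f} (L*B*D)^{x:.2f}")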

Another simple technical characteristic to use as a basis for estimating building cost is the lightship weight, W_LS. An accurate mass estimate is fundamental, even at the initial stages of the design. Aeronautical engineers have concluded that the cost of almost any kind of vehicle could be approximated by means of the simple expression

P = k·(W_LS)^0.87

Again, such a rough approach has its limitations, but can be useful in situations where returned costs are rare, such as for newly developed types of marine vehicles.

Care needs to be taken to keep the data as consistent as possible: untypical ships must be eliminated, data from the same time periods should be used, and a relatively stable currency is preferable. Cost per ton lightship may also be used, with typical prices around 4200 to 5800 euros per ton for deep–sea container and ro–ro vessels. Bulk carriers would be about 80% of this, VLCCs 75%, and products/chemical tankers 110%.

Where the alternatives differ in other respects, e.g. speed, machinery type, hull material, number of decks, etc., a more detailed process is required, unless the cost of the differences can be easily identified and simply added to the basic price. When shipyard cost estimators prepare a bid for a proposed ship, they look at costs based on technical characteristics. But now, rather than basing their work on a single characteristic, they look at one part of the ship at a time and try to predict both material and labor costs for building each part. Typically, they may make individual estimates for about 200 physical components of the finished ship. Most of their unit costs are based on weights, which can be fairly accurately predicted during the bidding phase.

In conceptual and basic design work, however, not enough is known about the ship to go into such detail, so simplification is needed. Before the lines plan and any drawings have been prepared, the alternative designs are in the form of concepts about which very little is known: the principal characteristics, power, and a general weight breakdown. The total lightship mass can be divided into hull, machinery and outfitting.

An approximate first estimate of hull costs is possible through the cubic number (CN), of machinery costs through power (usually P_B), and of outfitting costs, including additional equipment, through a corrected cubic number (CN_c). This might lead to the following expression for first cost:

P = C1·(CN)^α + C2·(P_B)^β + C3·(CN_c)^γ     (9.5)

where C1, C2 and C3 are coefficients, and α, β and γ are exponents, all of which are derived from previous similar ships.
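
A minimal sketch of equation (9.5) is given below; the coefficients, exponents and inputs are hypothetical placeholders and would in practice be regressed from previous similar ships.

# Equation (9.5): P = C1*CN^alpha + C2*PB^beta + C3*CNc^gamma
def first_cost(cn, p_b, cn_c, c1=3500.0, alpha=0.85, c2=900.0, beta=0.80, c3=2500.0, gamma=0.90):
    """Approximate first cost from cubic number, brake power and corrected cubic number."""
    return c1 * cn**alpha + c2 * p_b**beta + c3 * cn_c**gamma

print(f"{first_cost(cn=25000.0, p_b=9000.0, cn_c=28000.0):,.0f}")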

Again, such simple methods become inaccurate unless narrowly confined. Confidence can be increased if one applies techniques that are considerably more accurate and yet require no more knowledge about the alternative ships than what is implied above: main dimensions, power, geometrical coefficients. To do this the naval architect could break the ship down into three major parts, namely, structural hull, outfitting plus hull engineering, and machinery. In addition, expenses can be divided between material, labor and overhead. Labor rate should include allowances for other indirect costs. Normally, material and labor costs for each of the three major components are estimated, to which overhead is applied as a single, overall cost.

The total building cost of a ship may be divided into about eight principal groups, as indicated in Table 9.3, which gives an indication of the breakdown of shipbuilding costs into those categories for four types of ship.

A division of building cost is carried out for hull, machinery, and outfitting costs into material and labor. For these estimates, detailed data are required from shipyards, or from machinery manufacturers for the relevant acquisition costs. Such data are not normally fully available from shipyards.

Item                                Cargo Liner     Bulk Carrier    Tanker           Ro-Ro Vessel
                                    12-20000 dwt    25-50000 dwt    200-300000 dwt   4500-6500 dwt

 1. Steel work materials                 10              13              20               12
 2. Steel work labor                     12              13              14               11
 3. Outfitting materials
    and subcontractors                   19              16              18               21
 4. Outfitting labor                      7               7               8               10
 5. Main propulsion machinery            13              12               8               10
 6. Other machinery                       7               7               8               10
 7. Machinery installation labor          3               3               3                3
 8. Overheads                            18              18              18               18
 9. Appended costs                       11              13              13                9
10. Total building cost                 100             100             100              100

 1. Plates, sections and welding materials.
 2. Direct labor only, excluding overheads.
 3. Semi–fabricated materials, e.g. timber and piping, items of equipment like hatch covers,
    winches, anchors, galley gear, and equipment subcontractors, such as insulation and
    ventilation. Electrical equipment outside machinery space.
 4. Shipyard outfitting trades only, including electrical, excluding overheads.
 5. Slow-speed diesel or equivalent, e.g. boilers, turbines, gearing, condenser.
 6. Auxiliary machinery, generators, shafting, pumps, piping, controls in machinery space.
 7. Shipyard trades only.
 8. Variable overheads, e.g. social security and holiday expenses, supervision and power
    supplies, and fixed overheads like plant maintenance.
 9. Classification society costs, design costs, towing tank costs.
10. Profits not included, so percentages of selling price should be slightly lower.

Table 9.3. Approximate percentage breakdown of shipbuilding costs

Hull Structure Material Cost

The floating steel mass is taken from the lightship estimate. More detailed calculations are possible through the application of classification societies' rules and the use of the midship section method. The scrap percentage (typically 10%, but 20% or more for small vessels) is added to give the steel mass in tons. The corresponding average price per ton of steel material can usually be obtained from a steel maker, e.g. the British Steel Corporation, which publishes a price list for each main type of steel, heavy plates, sections, etc. Current prices for mild steel are around 400 euros per ton. Extra may have to be added for high–tensile steel, or a preponderance of very thin or very thick plates, etc.

Hull Structure Labor Cost

Man–hours are the basis of all direct labor costs, and once estimated, it is only necessary to apply wage rates, overheads and profit to arrive at the total labor costs. At the simplest level, steelwork labor cost can be estimated from

C_h = (steelwork tons) × (man–hours per ton of steel) × (wage rate per man–hour)

Man–hours per ton depend not only on the general level of productivity in a particular shipyard, but also on the size and type of ship. Large vessels, such as tankers, have greater steel mass per unit area of structure, i.e. thicker plating, as well as more repetitive components, than smaller ships. Man–hours per ton for complex zones, e.g. hull ends and superstructures, can easily amount to two or three times that for parallel mid-body construction. As a first approximation, Carreyette (1978) suggests that steelwork man–hours may be estimated from:

R_h = 227 · W_s^0.85 · L^(1/3) / C_B

where W_s is the net steel mass in tonnes and C_B is the block coefficient at laden summer draught.

In labor–intensive activities such as shipbuilding, it seems to be a natural law that as the ship size or the number of ships being produced increases, the rate R_h of man–hours required per tonne decreases asymptotically to some fairly constant rate. This suggests that man–hours per ton can vary from below 50 for large ships to over 200 for small ships. Substantially higher figures are appropriate for warships and offshore marine vehicles.

Wage rates per man–hour excluding overheads vary from yard to yard and country to country. In Italy the rate at present is approximately 17 euros per man–hour; but allowance should be made for inflation if delivery dates are a long way ahead.
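
A minimal numerical sketch of the simplest estimate above follows; the steelwork tonnage and the man–hours-per-ton figure are illustrative assumptions within the ranges quoted in the text.

# Steelwork labor cost = steelwork tons x man-hours per ton x wage rate per man-hour.
steel_tons = 8000.0          # net steelwork tons (illustrative)
man_hours_per_ton = 60.0     # assumed productivity for a fairly large ship
wage_rate = 17.0             # euros per man-hour (Italian rate quoted above)

man_hours = steel_tons * man_hours_per_ton
labor_cost = man_hours * wage_rate
print(f"{man_hours:,.0f} man-hours -> {labor_cost:,.0f} euros direct steelwork labor")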

Outfitting Material and Labor Cost

The outfitting cost of a ship can vary markedly with ship type and specification; for example, with variations in cargo handling gear, accommodation and equipment. For passenger ships the most significant component of outfitting mass is accommodation weight. At the simplest level, a cost per ton of outfitting mass could be assumed for material plus labor, say around 9000 to 12000 euros for fairly straightforward ships.

At a slightly more detailed level, material and sub-contractors' costs could be separated into a small number of items where information can often be obtained from manufacturers, e.g. hatch covers and cargo handling equipment, plus an aggregation of other remaining items based on their total mass, say around 6000 to 9000 euros per ton.

Wage rates for outfitting workers are generally similar to those for steelworkers. Carreyette (1978) suggests that outfitting labor can be based on the outfitting mass W_o as follows:

H_o = 2980 · W_o^(2/3)

whereas outfitting material can be estimated from:

C_o = k_o · W_o^0.95

where k_o is about 11000–14000 euros per ton.
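
The sketch below evaluates the two outfitting relationships above for an illustrative outfit mass; H_o is read here as man–hours and k_o is taken in the middle of the quoted range, both assumptions rather than data from the text.

# Carreyette (1978) outfitting estimates: H_o = 2980*W_o^(2/3), C_o = k_o*W_o^0.95.
w_o = 1200.0                                  # outfitting mass in tons (illustrative)
outfit_hours = 2980.0 * w_o**(2.0 / 3.0)      # interpreted as outfitting man-hours
outfit_material = 12500.0 * w_o**0.95         # k_o taken as 12500 euros per ton
print(f"{outfit_hours:,.0f} man-hours, material about {outfit_material:,.0f} euros")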

Machinery and Labor Cost

The largest part of machinery mass is that of the main propulsion units, namely main engines, gearboxes and propulsors. It is calculated reliably as a function of installed power and engine speed, using data from relevant databases. Economic studies comparing alternative machinery are quite common. In general, each different type of machinery has a different first cost, both of the basic prime mover, and as installed as a complete system. Detailed estimates of the purchase costs of main engines, gearboxes and waterjets, based on installed power (P in kW), are summarized in Table 9.4, indicating that cost per unit power falls with increasing power.

Approximately 80% of total machinery costs is contributed by the ten most significant items of equipment. Derated versions (e.g. to reduce specific fuel consumption) are almost the same price as the maximum rating model, despite the lower output. It should be noted that these costs are per unit at 'Maximum Continuous Rating' (MCR).

Type of Machinery      Cost

Diesel engines         C_d  = 0.524 P
Gas turbines           C_gt = 0.70 P − 6·10^−6 P^2
Gearboxes              C_gb = 114 + 0.043 P − 6·10^−7 P^2
Waterjets              C_wj = 0.936 P^0.84

Table 9.4. Cost of propulsion units

Because different machinery types may require different installed powers to achieve the same ship speed (different transmission or propulsive efficiencies and service ratings), the ratios of absolute costs may not be the same as the relative costs. For twin–screw propulsion, about 15% can be added for diesel or gas turbine. For electric transmission, compared with gearing, 15 to 20% can be added.

Broad corrections may be applied for major changes, such as:

• ship type
• machinery aft or midships
• propeller type and revolutions
• steam conditions and number of boilers
• difference in major auxiliaries
• alternative fuels, e.g. marine diesel oil

Beyond this level, a more detailed specification and quotations from subcontractors would be required for a full cost estimate.

The purchase cost of the remaining items of machinery, such as generators, together with the overall labor cost for installation of machinery, is generally of the order of 40% of the propulsion machinery cost. Assuming no subcontracting, the total cost of machinery installation labor may be calculated through the following expression:

C_Ml = 1200 · P^0.82
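
A minimal sketch combining two of the Table 9.4 relationships with the installation–labor expression above is shown here; the installed power is an illustrative figure, and each result is in whatever units its source formula implies.

p = 12000.0                                    # installed power in kW at MCR (illustrative)
diesel = 0.524 * p                             # Table 9.4, diesel engine purchase cost
gearbox = 114.0 + 0.043 * p - 6e-7 * p**2      # Table 9.4, gearbox purchase cost
install_labor = 1200.0 * p**0.82               # C_Ml = 1200 * P^0.82
print(diesel, gearbox, install_labor)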

Overheads

Overhead costs (sometimes called establishment charges) are costs which are necessary to a shipyard, but which cannot be allocated to any particular ship under construction. They include salaries for administration staff, managers and watchmen, as well as bills for training, electricity and power supplies, capital charges on plant, insurance, real estate taxes, maintenance, research & development, and marketing.

The usual estimating technique is to express overheads as a percentage of total direct labor costs as calculated previously, typically about 10 to 25%.

Something else to note is that what is usually called material costs should more correctly be called costs for goods and outside services. Many shipyards, for example, use subcontractors to do the joiner work on the deck covering. Consulting service bills would come in this category, too.

Naval architects will seldom be called upon to perform detailed estimates of overhead costs to be assigned to a ship being bid. They should, nevertheless, have some understanding of the difficulties involved. To begin with, there are two basic kinds of overhead: fixed overheads, those that remain much the same regardless of how busy the yard may be; and variable overheads, those that vary with the level of activity within the yard.

This leads to the conclusion that overhead costs taken as a percentage of labor costs will require a prediction of what other work may be under way in the yard while the proposed ship is being built. Clearly, these estimates are outside the naval architect's knowledge, but are the project manager's responsibility. It is enough to know that overhead costs, as a fraction of labor cost, will drop if the shipbuilding company is in a period of prosperity, with several contracts on hand.

Profit

In a shipyard, it is the job of management and not the cost estimator to decide on an appropriate profit margin to add to the estimated building cost. The decision will be influenced by the experience of the shipbuilding company with the type of work in question (and the associated uncertainty of the cost estimate), the yard's order book, the state of the shipbuilding market and competition, and the standing of the customer. A figure of about 10% of estimated costs is aimed at, but rarely achieved in the present competitive world of shipbuilding.

In simple cost estimates, it is possible to aggregate overheads and profit together by adding about 30 to 35% to the sum of steelwork plus outfitting plus machinery costs.
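
A minimal roll-up along these lines is sketched below; the three component costs are illustrative figures, and the 30 to 35% addition for overheads plus profit follows the rule of thumb just stated.

# Simple first-cost roll-up with aggregated overheads and profit.
steel_cost = 6.0e6        # structural hull, material + labor (euros, illustrative)
outfit_cost = 11.0e6      # outfitting, material + labor (euros, illustrative)
machine_cost = 9.0e6      # machinery, purchased + installed (euros, illustrative)

direct = steel_cost + outfit_cost + machine_cost
price_low, price_high = 1.30 * direct, 1.35 * direct
print(f"Estimated price: {price_low:,.0f} to {price_high:,.0f} euros")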

Appended Costs

Appended costs include classification society fees and similar costs that the shipyard normally passes on to the owner without mark–up for profit. They also include tug and dry–dock charges based upon standard rates that already include profit.

Total First Cost and Selling Price

The total price estimated from summing the above items can then be compared with current market prices to assess whether the results appear to be reasonable. However, over recent years, many shipyards have quoted prices below cost (dumping) to obtain work, assisted in some cases by subsidies.

Duplicate Cost Savings

Small reductions are possible for production of multiple ships. It is estimated that by doubling the number of sister ships produced, their average cost can be reduced by about 3 to 5%, i.e. the slope parameter of the progress curve (or learning curve) is about 0.965. This means that the average cost of each of N ships is N^−0.035 times the cost of one ship (Couch, 1963).
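
As a small sketch of this progress-curve effect, the loop below applies the 0.965 slope per doubling of the series; the one-off ship price is an illustrative figure.

import math

# Average cost of a series of N sister ships with a 0.965 progress-curve slope.
single_ship_price = 30.0e6    # euros, illustrative
for n in (1, 2, 4, 8):
    avg = single_ship_price * 0.965 ** math.log2(n)
    print(f"{n} ships: average cost {avg:,.0f} euros")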

Indeed, there are two reasons for costs going down. The first is the matter of non–recurring costs, i.e. costs required to build the first ship but which need not be repeated for follow–on ships. Examples are basic design, engineering, plan approval, and preparation of numerical control for fabrication. The second category consists primarily of labor learning: the increased efficiency workers acquire through repetitive work. There are also savings in material costs because suppliers, too, may experience savings. The cost of labor on repeated ships falls faster than material costs.

9.6.2 Operating Costs

It is necessary to provide an understanding of the various components that go to make up the annual costs of operating a ship. Unfortunately, there is no practical way to present a tidy handbook of actual quantitative values, but there are a number of useful references (Benford, 1975; Reifs and Benford, 1975) that present some data, although such data quickly become outdated.

The various components of operating costs are divided into two main categories, namely, manning costs and voyage costs. The first includes costs that are constant, regardless of whether the ship operates or not. The second category includes those costs that occur only when the ship actually operates and therefore increase with increased ship operation. These are highly dependent on the route and the ship's operating profile. Some of these can be independent of the route on which the ship operates but highly dependent on the ship building cost.

For the prediction of operating costs, the main characteristics of the ship's operation must be defined. A basic step is to project the times involved in a typical round trip voyage, sometimes called a proforma voyage or cycle. Typically, such an imaginary, representative voyage would include distance(s) and operating speed(s), estimated times for proceeding through a harbor, down a river, and out to the open sea, perhaps some time in passing through a canal, then more time in the open sea, followed by time in sheltered waters, manoeuvring time, time to unload cargo, time to shift to another pier, time to load cargo, and then perhaps a mirror image of all of the foregoing until a complete round trip is completed and the ship is once more loaded at the first port and ready to leave. Factored into this must be some reasonable allowances for port and canal queuing delays and speed losses in fog or heavy weather. Time may also be lost in taking on bunkers or pumping out holding tanks. If the ship is not designed to be route–specific, some basic assumptions will have to be made.

The total time for the cycle voyage, when divided into the estimated operating days per year (typically 340–350), will give the estimated total number of round trips per year.
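
A minimal proforma-voyage sketch follows; the distance, speed, port times and weather margin are illustrative assumptions, while the 340–350 operating days per year come from the text above.

# Round trips per year from an assumed proforma round-trip cycle.
distance_nm = 2 * 4500.0                        # out and back, nautical miles
speed_knots = 15.0
sea_days = distance_nm / speed_knots / 24.0
port_days = 2 * 3.0                             # loading and discharging calls
margin_days = 0.10 * (sea_days + port_days)     # weather and queuing allowance

cycle_days = sea_days + port_days + margin_days
operating_days = 345.0                          # typical 340-350 days per year
round_trips = operating_days / cycle_days
print(f"Cycle of {cycle_days:.1f} days -> about {round_trips:.1f} round trips per year")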

These scheduling calculations serve other purposes as well. In bulk ships, where deadweight is critical, they are used to establish the weight of fuel that must be aboard when the ship reaches that point in her voyage where draft is most limited. In this phase of the work, one should give thought to the relative benefits of taking on bunkers for a round trip versus only enough for one leg. And one must of course add some prudent margin (often 20 to 25%) for bad weather or other kinds of delays.

The days per round trip estimate can also be used to establish the weight of other non–payload parts of total deadweight that are a function of days away from port: fresh water, stores, and supplies. Finally, all this may lead to that critical number: the annual cargo or passenger transport capacity. That estimate of actual annual transport achievement should be tempered by some realistic assumptions as to the probable amounts available to be carried on each leg of the voyage. In the bulk trades, that might amount to 100% use one way, and return in ballast. In the liner trades, one might typically assume 85% full outbound, 65% inbound; but this varies greatly depending on trade and route.

In more advanced studies, the naval architect may need to make adjustments for changes in minimum allowable freeboard brought on by geographic or seasonal requirements. Shallow water and ice operations may also be a factor.

Manning Costs

The breakdown of running costs discussed in the sequel represents standard accounting practice in an Italian shipping line. Perhaps the first thing that should be said about these accounting practices is that they can be misleading. As an example, the maintenance and repair category includes only money paid to outside entities, usually repair yards. Maintenance or repairs carried out by the ship's crew are charged to wages; and materials used are charged to stores and supplies.

Table 9.5 shows manning costs broken down by major category for four types of vessel.

Items                       Bulk Carrier    Container     Tanker        Ro-Ro
                            120000 DWT      2000 TEU      30000 DWT     1000 DWT

Crew                             47             48            46            23
Upkeep                           30             27            32            28
Insurance                        16             19            14             6
Stores & supplies                 7              6             8            10
Miscellaneous                     7              6             8            10
Total                           100            100           100           100

Approx. % of total cost,
including capital, fuel,         18             14            24            21
port & cargo handling

Table 9.5. Percentage breakdown of daily manning costs

Crew Costs

Crew costs are calculated directly using the required crew size, breakdown, and relevant charges. They include not only wages, but victualling, leave and reliefs, training, benefits, and travel.

Crew numbers usually vary between one and two dozen, depending on union agreements and the shipowner's willingness to invest in automated equipment, more reliable components, and minimum–maintenance equipment. Now, with rational schemes for reducing personnel, crew complements are nearly independent of ship size and power.

Crew costs include:

• direct costs (wages for the crew, paid vacation, travel, overtime, food, pension insurance);
• indirect costs (health insurance, employment agency fees, trade union fees, training and education, sick leave, working clothes, etc.).

In passenger ships average wage rates will decrease with increasing passenger capacity because most of the additional crew members will be at the lower end of the wage range. A default value may be used in conceptual design, as this does not vary significantly with ship size or capacity.

In addition to direct daily wages there are many benefits paid to seafarers. In some instances there may be rotation schemes, so that crew members are on year–round salary, with vacation times that may amount to as much as a day ashore for every day aboard. There are sick benefits, payroll taxes, and repatriation costs (travel between home and ship when rotating on or off). These are major increments that must not be overlooked.

For general studies, not specific to any owner, it is necessary to set up a wage and benefit equation that recognizes that total costs are not directly proportional to numbers, because automation and other crew reduction factors tend to eliminate people at the lower end of the pay scale. The general equation for crew cost, C_c, may take this form

C_c = f1 · N_c^0.8 + f2 · N_c

where N_c is the number in the crew, and f1 and f2 are coefficients that vary with time, flag, and labor contract. As a first estimate, the average annual crew cost may be assumed to be 40000 euros per crew member.
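
The sketch below evaluates the general crew cost relationship; the coefficients f1 and f2 are hypothetical placeholders, roughly calibrated so that an 18-person crew comes out near the 40000 euros per crew member quoted above, and would in practice be fitted to flag and labor contract data.

# Cc = f1 * Nc^0.8 + f2 * Nc, in euros per year.
def annual_crew_cost(n_crew, f1=36000.0, f2=20000.0):
    return f1 * n_crew**0.8 + f2 * n_crew

n = 18
cc = annual_crew_cost(n)
print(f"{cc:,.0f} euros/year, about {cc / n:,.0f} euros per crew member")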

The cost of victuals is a function of the number of people aboard and the operating days per year. Compared to wages, these costs are modest, and most owners consider the money well spent as a key element in attracting and retaining good seafarers.

Upkeep Costs

Upkeep includes maintenance, repair, stores and lubricating oil, while miscellaneous includes administration, equipment hire, etc.

Maintenance and repair (M & R) costs comprise direct and indirect costs. Direct costs include the price of work in ship maintenance interventions and the cost of the expended material. Indirect maintenance costs refer to all other expenses of the ship when it is not in operation.

Expenditure on maintenance and repairs depends on the class requirements, related to the quality of the ship arrangement, on the freight market, the ship age, its type and size, voyage patterns, and the shipowner's strategy. Prediction of costs depending upon some of these parameters is precarious due to the repair shipyard market; as a rough figure, they are taken as 4% of acquisition cost. Analyses of actual M & R costs suggest that they are roughly proportional to ship size and that they increase with age in real terms (i.e. before allowing for inflation) at about 3 to 5% per annum. Insurance depends on a number of factors: ship type, size and value, plus the shipowner's record. As a proportion of first cost, annual total premiums covering all categories of insurance carried vary between about 1 and 3%.

Annual costs for M & R can be estimated in two parts. Hull maintenance and repair will be roughly proportional to the cubic number raised to the two–thirds power. Machinery maintenance and repair costs will be roughly proportional to the horsepower, also raised to the two–thirds power. A refinement on this approach is embodied in the following approximation

C_M&R = k + f3 · (L·B·D)^0.685 + f4 · MCR + f5 · MCR^0.6

where MCR is the main engine's maximum continuous rating in kW, whereas f3, f4, and f5 are coefficients that vary with the kind of ship, owner's policy, and so forth, and k is a fixed amount regardless of hull size and engine power. Where data are available from different time periods, escalation rates may be used to adjust them to a common basis. Such rates may also be used to estimate future cash flows if calculations are being made in money terms.

A large share of upkeep costs is related to the costs for lubricating oils. The total consumption of lubricating oils, and consequently the expenditure on lubes, to a large extent depends on the size, age and technology of the ship equipment and machinery, as well as on the efficiency of maintenance.

Costs for provisions depend on the crew numbers, costs for operating supplies depend on the ship deadweight, while other expenses for management depend on the organization and the size of the shipping company.

Insurance Costs

Risk insurance costs are divided into two groups:

• insurance of hull, equipment and machinery (H & M);

• insurance of cargo, crew and indemnity (protection and indemnity, e.g. P & I).

H & M insurance covers the ship hull, equipment and machinery, in which the shipowner has a direct insurable interest.

P & I insurance covers claims from third parties in cases such as, for example, oil pollution of the sea, or claims in cases of injuries to employees. These risks do not tend to be covered by direct placement on the insurance market as, more usually, this business is covered by mutual insurance between shipowners under the auspices of the P & I clubs.

Annual insurance expenses are calculated directly as a percentage of the vessel's building cost. Protection and indemnity insurance, protecting the shipowner against lawsuits, is usually based on the 'gross tonnage' of the shipowner's fleet and may add an annual cost of about 1% of acquisition cost, although even higher values can occur in less developed markets.

The annual cost of hull and machinery insurance is based on the ship's insured value and size. Underwriters use a 'formula deadweight', which is effectively the 'cubic number'. Typical figures might be one percent of the first cost. First cost is a rather illogical basis for fixing insurance premiums, but the marine insurance business is marked by such irrational practices.

Further Costs

Other manning costs include the expenses for operating supplies (spares, paints, chemical substances, stores, ...), provisions, administration and other expenses related to the general management of the ship.

The annual cost of stores and supplies would consist of three parts. The first would be proportional to the ship's size (mooring lines, for example). The second would be proportional to the main engine power (machinery replacement parts, for example). The third would be proportional to the number of crew members aboard (paint and cleaning compound, for example).

A final annual cost category covers overhead and miscellaneous expenses. This would have to absorb a prorated share of the costs associated with maintaining one or more offices ashore. Shore staffs may number anywhere from what can be counted on one hand to bureaucracies bordering on civil service multitudes.

Voyage Costs

The features of the ship's exploitation define the voyage range, the time required for loading and unloading, and the annual rate of utilization of the ship. Calculation of these features is the basis for the calculation of the voyage costs and the cargo handling costs. Depending on the features of the ship's exploitation, the share of a particular group of costs varies.

Fuel Costs

Specific fuel consumption at each speed is used to calculate the annual expenditure on fuel. The effect of reduced power operation (manoeuvring, port restrictions, etc.) should not be neglected, due to the increase in specific fuel consumption, especially if gas turbines are used. These calculations should also account for auxiliary fuel consumption. Fuel costs are directly derived by using actual fuel prices.
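
A minimal annual main-engine fuel cost sketch follows; the service power, specific fuel consumption, sea days and fuel price are illustrative assumptions, not figures from the text.

# Annual fuel consumption and cost from power, specific fuel consumption and sea time.
p_service = 0.85 * 12000.0       # service power in kW (85% of MCR, illustrative)
sfc = 180.0                      # specific fuel consumption, g/kWh (illustrative)
sea_days = 250.0
fuel_price = 600.0               # dollars per ton (illustrative)

tons_per_year = p_service * sfc * 24.0 * sea_days / 1.0e6   # grams -> tons
print(f"About {tons_per_year:,.0f} tons/year, {tons_per_year * fuel_price:,.0f} dollars/year")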

Although fuel prices vary throughout the world, such differences are often small enough to ignore in feasibility studies. The prices published in journals such as 'The Motor Ship', 'Lloyd's Ship Manager' or 'Lloyd's List' may be taken as a good guide. After several years when heavy fuel prices were in the range 140 to 190 dollars per ton (diesel oil about 220 to 300), they fell dramatically in 1986 to about half those levels, and have grown dramatically once again during the last years, up to 750 dollars per ton. Until some degree of stability emerges in fuel prices, it would be wise to investigate a range of prices in economic evaluations.

Port Costs

Port charges can be significant yet at the same time difficult to model. They tend to be high in the case of high–speed vessels. Port costs include a mixture of expenses, such as charges for entering the port, lighthouse dues, pilotage fees, tug service, mooring costs, port agency fees, customs duty, etc. Some port costs are on a per–use basis; others are on a per–day basis. Port charges may be based on the size of the ship. Pilotage may be based on ship draft.

Actual charges per call vary enormously from one port to another around the world and are hard to compare. The lowest costs per gross or net (registered) ton are usually found for large ships in ports with few facilities, e.g. tanker loading jetty ports, while the highest tend to apply to small ships in ports with an extensive range of facilities. Many ports now charge on a gross ton basis rather than net. Some investigations indicate that quoted charges can be very high and may even lead to total expenses significantly higher than fuel cost. However, this may not be the case in reality, and operators will often make special arrangements with port authorities, leading to major reductions in the charges actually paid. This situation makes port charges difficult to model.

Canal dues must be added where applicable. They are standardized according to the ship tonnage and draughts, although the rules for measuring NT differ between the Suez and Panama Canals. Dues per transit per NT are approximately euro 1.83 laden and euro 1.46 in ballast for Panama. For Suez there is a sliding scale based on Suez net tons and Special Drawing Rights, which ranges for laden ships from about euro 6 for small ships up to 5000 tons, to about euro 4 at 20000 tons, to about euro 2 for the largest ships; ballast rates are 80% of laden.

Cargo Handling Costs

Another important cost is that of cargo handling, which may or may not be included in the contract, depending on trade. If it is included, it logically would be treated as a voyage cost. Associated with this may be brokerage fees and cargo damage claims, hold cleaning, rain delays and other miscellaneous cargo–related expenses. In some conceptual designs cargo handling costs will be the same for all alternatives, in which case they can be all but ignored.

Cargo handling costs vary widely between ports, especially for break–bulk general cargo. For the latter, loading or discharging in a port with low labor costs (e.g. in the Far East) may cost as little as euro 8-10 per ton of cargo ship-to-quay or vice versa, rising to as much as euro 60-80 in high cost areas such as North America. A realistic average to use for conceptual design will depend on the range of ports served, and also on the range of cargoes carried: low stowage factor cargoes such as steel cost less to handle than high stowage factor cargoes such as wool.

Unit load cargo handling costs are more uniform throughout the world. A container can vary between about euro 100 to euro 240 ship–to–quay, or vice versa, i.e. about euro 10 to euro 20 per ton of average cargo (multiplied by two for loading and for discharging). Stuffing and stripping the container itself will cost extra, but is not included in the sea freight charge.

Bulk cargo handling costs are not usually paid by the shipowner. However, loading costs are usually small for cargoes such as coal or grain (which are often sold f.o.b.), say 60 euro cents per ton, while discharging is more expensive, around euro 2 to euro 4 per ton for mineral or granular type cargoes. Liquid cargo handling costs are largely pumping costs which are absorbed by the ship (discharging) or the shore terminal (loading).

Capital Charges

Capital charges to cover the investment and a return on capital are normally the most significant component of running costs, due to factors such as high initial cost and the possibly high interest rate required to account for the risk of investing in unproven designs. They are around 30 to 50% of operating costs, excluding cargo costs, if a good rate of return is to be achieved. At their simplest, they are calculated as a direct proportion of first cost via the capital recovery factor, modified to account for taxation.

In more complex situations, where taxation and loans arise, the processes outlined above are required to incorporate the acquisition cost into the economic calculations. In poor markets, shipowners will accept freight rates making no contribution to capital charges; but this cannot be sustained indefinitely, especially if there are loans outstanding on the ship.

Freight Rates

All of the categories mentioned previously are items of expenditure. Income is generated from the product of the cargo carried per annum times the average freight rate. Freight rates, especially in the bulk trades, vary widely with supply and demand. Past and present rates for particular cargoes and trade routes are published in 'Shipping Statistics' and in the shipping press, from the trends of which an assessment can be made regarding possible future long term levels (unless RFR is the criterion). Some realistic escalation rate should also be applied as, in the long run, freight rates increase with inflation. Such references often also give freight rates dating back for several years, which can help in estimating possible escalation rates.

Cargo liner freight rates are not usually published, varying widely between routes and different types of cargo. However, shipowners and cargo agents are usually willing to provide some current freight rates for particular cargo liner services quay–to–quay. By selecting an 'average' cargo and allowing for stowage factor if weight/measurement rates are quoted, a reasonable estimate can be made. On some routes, especially short sea, 'freight all kinds' rates are quoted for containers, i.e. a rate per box irrespective of contents. Liner freight rates on a route do not fluctuate as widely as bulk rates, but remain constant for some months before any percentage change (overall or for special factors like bunker charges) is applied. The German cargo liner freight index can be used to give some guidance on escalation.

For all freight rates, the shipowner does not receive the full revenue. For bulk cargoes, brokers' fees will amount to typically 2.5 to 5% of the gross freight, while for liner cargoes within Conferences, rebates of typically 10% are granted to shippers who use only Conference ships.

9.6.3 Other Decision Factors

In addition to factors which can be quite readily incorporated in economic models, there are frequently other factors which influence both the decision as to whether to invest in a vessel or not, and the decision as to what are the characteristics of the 'best' vessel. Some factors may be more applicable to state–owned vessels than others:

• maintain market share;

• minimize risk to company survival;

• maximize foreign exchange earnings;

• enhance company image or prestige;

• utilize currently favorable tax allowances (although this can usually be evaluated directly in economic terms);

• maintain employment (may include operating staff or construction personnel).

However, even if the overall decision to acquire a ship is taken on such grounds, the selection of the 'most efficient' design features is still likely to be made on basically economic grounds, e.g. the choice of machinery. This raises the question of multicriterial decision–making, where techniques are being developed to weigh up attributes which can be quantified, but are measured in incommensurate units. Among other applications, the comparison of multi-role vessels can be addressed.

9.7 Ship Leg Economics

The overall economics of the sea leg simply involves adding the inventory cost of the goods in transit to the costs of the ship itself. This can best be understood by imagining that the shipowner buys the cargo as it is loaded on board and sells it as soon as it is discharged. The components of this inventory cost are the value of the cargo (what the owner paid for it), his time–value of money (expressed as an interest rate), the time the goods are in transit (during which the investment is tied up), and the corporate income tax.

9.7.1 Inventory Costs

Where both ship and cargo are owned by the same entity, the cost of the goods, both in transit and in storage, represents money tied up - an investment that should be earning returns. This cost is referred to as inventory cost.

The same principles can be applied to the case where ownership is divided. This can be justified on the basis of the cargo owner being willing to pay somewhat higher freight rates to the shipowner who provides the better service. Thus, if completely free market conditions obtain, the team of owners (ship and cargo) would tend to make the same design decisions as would be made by an individual who owned both.

The inventory cost passed along to the customer for one complete ship load can be designated as I. If the tax rate is t, the government takes t·I, leaving the shipowner I(1 − t). This amount must cover the owner's cost of capital tied up in inventory during the voyage, so

I (1 − t) = i · v · DWT_t · d / 365

where
i      annual interest rate appropriate to owner's time value of money
v      value of cargo per ton (or other units) as loaded
DWT_t  cargo deadweight in one trip
d      days in transit

Transposing (1− t) yields

I = i · v · DWT_t · d / [365 (1 − t)]

The inventory cost per voyage can be converted to an annual cost, I_a, by multiplying by the number of cargo–legs per year

I_a = I · RT = i · v · DWT_t · d · RT / [365 (1 − t)]

where RT is the number of round trips per year, assuming a one-way trade route.

If the annual transport capacity in tons (or other units) is

C = DWT_t · RT

the annual inventory cost is

I_a = i · v · d · C / [365 (1 − t)]

This can be converted to a unit inventory cost per ton of cargo delivered, C_I, as

C_I = I_a / C = i · v · d / [365 (1 − t)]

9.7.2 Economic Cost of Transport

The equation giving the unit inventory cost pertains to the economics of the cargo alone. The combined economics of ship and cargo can be considered by adding the unit inventory cost to the required freight rate, yielding the economic cost of transport:

ECT = (CR · P + Y) / C + i · v · d / [365 (1 − t)]

As common sense dictates, recognition of inventory costs will invariably tend to favor higher speeds. The days in transit, d, must of course vary inversely with sea speed, and so increasing speed will always decrease C_I. The net result is that the ship speed selected to minimize ECT will always be higher than that selected to minimize RFR.

In the bulk trades this impact of inventory costs is so trivial as to be safely ignored. In the liner trades high–speed ships have demonstrated a marked capacity to attract high–value, high–paying cargo. That correlation of high–value cargo and high–speed ships may be explained by the machinations of conference rate-setting practices as much as by the inventory value of the goods in transit.

9.7.3 Effects on NPV

To this point only inventory charges have been considered as they affect unit costs of transport, whereas other measures of merit, such as NPV, may be preferred.

If the shipowner were to buy the cargo as it comes aboard, the inventory cost could be treated more or less like working capital. I can, in short, be treated as a non–depreciable addition to the initial investment. Since NPV is generally found by determining the discounted present value of all the future cash flows and then subtracting the initial investment, the ship's net present value would thereby be reduced by the value of one ship load of cargo, that is, v · DWT_t. This would be tempered, however, by a factor equivalent to the fraction of time there is cargo in transit. In summary:

∆NPV = − v · DWT_t · d · RT / 365

where ∆NPV is the resulting reduction in net present value and d·RT/365 is the fraction of the year during which cargo is in transit.

Bibliography

[1] Benford, H.: Investment Returns Before and After Tax, The Engineering Economist, ASEE, 1965.

[2] Benford, H.: Measures of Merit for Ship Design, University of Michigan, Ann Arbor, 1969.

[3] Benford, H.: Bulk Cargo Inventory Costs and Their Effect on the Design of Ships and Terminals, Marine Technology, Vol. 18, no. 4, 1981, pp. 344–349.

[4] Buxton, I.L.: Engineering Economics Applied to Ship Design, Transactions RINA, Vol. 114, 1972, pp. 409–428.

[5] Buxton, I.L.: Engineering Economics and Ship Design, British Maritime Technology, Tyne and Wear, 1987.

[6] Carreyette, J.: Preliminary Ship Cost Estimation, Transactions RINA, Vol. 120, 1978, pp. 235–258.

[7] Couch, J.C.: The Cost Savings of Multiple Ship Production, International Shipbuilding Progress, 1963.

[8] Erichsen, S.: Optimising Containerships and their Terminals, SNAME Spring Meeting, 1972.

[9] Goss, R.O.: Economic Criteria for Optimal Ship Design, Transactions RINA, Vol. 107, 1965, pp. 581–600.

[10] Grant, E.L. and Ireson, W.G.: Principles of Engineering Economy, Ronald Press, New York, 1960.

[11] Herbert, R.N.: Design of the SCA Special Ships, Marine Technology, 1971.

[12] Kerlen, H.: How Does Hull Form Influence the Building Cost of Cargo Vessels, Proceedings, Second International Marine Systems Design Conference, Danish Technical University, Lyngby, 1985.

[13] Napier, J.R.: On the Most Profitable Speed for a Fully Laden Cargo Steamer for a Given Voyage, The Philosophical Society of Glasgow, Glasgow, 1865.

[14] Thuesen, G.J. and Fabrycky, W.J.: Engineering Economy, Prentice-Hall, Englewood Cliffs, N.J., 1989.

[15] Volker, H.: Economic Calculations in Ship Design, International Shipbuilding Progress, Vol. 14, no. 150, 1967.
