
RESEARCH PAPER

Low-fidelity scale factor improves Bayesian multi-fidelity prediction by reducing bumpiness of discrepancy function

Chanyoung Park¹ · Raphael T. Haftka¹ · Nam H. Kim¹

Received: 5 January 2018 / Revised: 5 June 2018 / Accepted: 12 June 2018 / Published online: 23 June 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
This study explores why the use of the low-fidelity scale factor can substantially improve the accuracy of the Bayesian multi-fidelity surrogate (MFS). It is shown analytically that the Bayesian MFS framework utilizes the scale factor to reduce the waviness and variation of the discrepancy function by maximizing the Gaussian process-based likelihood function. Less wavy functions are more accurately fitted, and variation reduction mitigates the effect of fitting error. Bumpiness is another way to combine waviness and variation. Two examples, Borehole3 and Hartmann6, illustrate that the Bayesian MFS indeed reduced bumpiness using the scale factor. The finding may be useful for MFS using surrogates lacking an uncertainty structure, for which likelihood is not an option but bumpiness may be.

Keywords Bayesian · multi-fidelity · surrogate · scale factor · bumpiness · Gaussian process

1 Introduction

Design optimization and uncertainty quantification often require numerous expensive simulations to find an optimum design and to propagate uncertainties. Instead, a surrogate fit to dozens of simulations is often employed as a cheap alternative. However, for computationally expensive high-fidelity simulations, even evaluating sufficient samples for building a surrogate is often unaffordable. To address this challenge, multi-fidelity surrogates (MFS) combine inexpensive low-fidelity models with a small number of high-fidelity simulations. For example, MFS can be employed to predict the response of an expensive finite element model with a few runs of the model and many runs from a less accurate model with a coarse mesh. MFS have been applied extensively in the literature (Fernández-Godino et al. 2016; Gano et al. 2006).

The regression-based MFS framework has been used extensively in design optimization, for example, by combining two- and three-dimensional finite element models (Mason et al. 1998) or coarse and fine finite element models (Balabanov et al. 1998). More recently, Bayesian MFS frameworks have become popular. Qian and Wu (2008) proposed the use of Markov chain Monte Carlo and a sample average approximation algorithm for hyperparameter estimation of the Bayesian MFS framework. Co-Kriging followed with better computational efficiency (Forrester et al. 2007; Le Gratiet 2013). A Gaussian process (GP) based Bayesian MFS framework was introduced by Kennedy and O'Hagan (2000). The use of GP models provides flexibility, and their prediction is not limited to a specific form of the trend function. Thus, the Bayesian MFS can also be useful when there is no prior information, which is also called a non-informative prior (Rasmussen 2006).

The Bayesian MFS framework can be expressed as

$$ \hat{y}_H(x) = \rho\, \hat{y}_L(x) + \hat{\delta}(x) \tag{1} $$

where ŷ_H(x) is the high-fidelity function prediction at x, ŷ_L(x) is the low-fidelity function prediction, δ̂(x) is the discrepancy function prediction, and ρ is a low-fidelity scale factor. This scale factor has rarely been used in the past. However, the combined use of a scale factor and a discrepancy function has become common in recently developed GP-based MFS frameworks (Fernández-Godino et al. 2016; Zhou et al. 2018).

The MFS frameworks can handle noisy data using (1) with a noise model, where the random noise follows a normal distribution with zero mean and a noise standard deviation that needs to be estimated. However, this paper focuses on MFS prediction with a few high-fidelity samples without random noise, because filtering noise based on a few high-fidelity samples is often not reliable (Matsumura et al. 2015).

Responsible Editor: Helder C. Rodrigues

* Nam H. Kim
[email protected]

1 Department of Mechanical and Aerospace Engineering, University of Florida, PO Box 116250, Gainesville, FL 32611-6250, USA

Structural and Multidisciplinary Optimization (2018) 58:399–414
https://doi.org/10.1007/s00158-018-2031-2

It was found that the Bayesian framework and co-Kriging often gave significantly more accurate predictions than other MFS frameworks with a discrepancy function only (Park et al. 2017). An interesting observation was the influence of the scale factor. The Bayesian framework gave much more accurate discrepancy predictions with the scale factor, and so did the MFS predictions, but it gave mediocre predictions without it. The objective of this paper is to discover why the use of the scale factor made the Bayesian MFS significantly more accurate. Understanding the reason behind the Bayesian MFS is likely to help extend the success to other non-Bayesian MFS frameworks that may have advantages for some applications.

This paper is organized as follows. Section 2 discusses the importance of using the scale factor for making MFS predictions. One-dimensional examples show how the scale factor improves the accuracy of the discrepancy function and thereby the MFS prediction. It is discussed that large waviness and variation of the discrepancy function tend to increase errors. Bumpiness is introduced to combine them. Section 3 explains how the Bayesian framework characterizes a discrepancy function with variation and waviness; it finds the scale factor by combining them through the likelihood function based on a Gaussian process model. Section 4 uses multi-dimensional examples to illustrate the correlation between bumpiness and error. These include the Borehole3 physical function and the Hartmann6 algebraic function. Concluding remarks are presented in Section 5.

2 The importance of having the scale factor to reduce the bumpiness of a discrepancy

In MFS frameworks, it is usually assumed that the low-fidelity function is well approximated due to a relatively large number of samples. With the model expressed in (1), the low-fidelity prediction is based on a sufficient number of low-fidelity samples. Once the low-fidelity model is determined, the differences between the high-fidelity samples and the low-fidelity predictions are used to fit the discrepancy function. Since the discrepancy samples are based on the high-fidelity samples, the discrepancy function prediction depends on a small number of high-fidelity samples. Therefore, if the discrepancy function is wavy or has a high amplitude of oscillation, a small number of high-fidelity samples may lead to large errors.

Fortunately, the discrepancy function depends on the scale factor ρ because it is defined as the difference between the high-fidelity prediction and the scaled low-fidelity prediction. Therefore, it is possible to manage the discrepancy function by changing the scale factor. Based on our study of MFS frameworks, the Bayesian MFS framework was particularly effective with ρ. We found that this was because the Bayesian MFS determines the scale factor to make the discrepancy simple. That is, it reduces the waviness and variation of the discrepancy, which tends to improve the prediction accuracy.

To show the above-mentioned characteristic, Fig. 1 shows two analytical examples where the Bayesian MFS framework was applied with and without ρ. In Fig. 1(a) and (b), the red and blue curves are the true high- and low-fidelity functions, and the crosses and the hollow circles are the high- and low-fidelity samples, respectively. Figure 1(c) and (d) show the true discrepancy function (dashed curve) without ρ (or with ρ = 1) and the corresponding predictions (red solid curve) using the Bayesian framework. The figures also show the two-sigma prediction uncertainty. Due to the large variation of the true discrepancy function, the four samples were not enough to predict the discrepancy accurately. Also, the 2σ-confidence intervals (blue areas) failed to cover the true discrepancy function.

On the other hand, when the scale factor is present, the Bayesian framework found ρ = 2.5 for the first example, whose discrepancy is shown in Fig. 1(e). Note that the discrepancy samples in Fig. 1(c) and (e) are different because they are the differences between the high-fidelity samples and the scaled low-fidelity samples. By choosing ρ = 2.5, the variation of the original discrepancy in Fig. 1(c) is drastically reduced, as shown in Fig. 1(e), and thus four samples were enough to predict it accurately. Due to the reduction of variation, the root-mean-squared error (RMSE) in the discrepancy is reduced by more than a factor of 8. Although the true discrepancy function is still wavy, the bumpiness is significantly reduced by reducing the variation in this case.

In the case of the second example, Fig. 1(f) shows that the Bayesian method found ρ = 2, which turns the wavy discrepancy function in Fig. 1(d) into a linear function, and the prediction becomes almost perfect. In this case, the Bayesian MFS framework reduces the waviness of the discrepancy while the variation is still large.

Note that the results also illustrate the flexibility of the GP-based Bayesian MFS framework. The trend functions of the GP models of the Bayesian MFS framework were set to a constant function. Figure 1(f) shows that the GP-based discrepancy prediction behaved like a linear function based on the data even though its trend function is a constant function. In addition to the two analytical examples, a cantilever beam example is included in Appendix B.

The concept of variation and waviness can be combined into the concept of bumpiness. The notion of bumpiness, which is also referred to as roughness, was introduced for measuring function roughness (Duchon 1977; Gutmann 2001; Cressie 2015). Salem and Tomaso (2018) use bumpiness to select surrogates and surrogate weights for an


ensemble. Bumpiness is the integral of the square of the second derivative of the associated function:

$$ B(f(x)) = \int \left| f''(x) \right|^2 dx \tag{2} $$
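As a concrete illustration (our sketch, not from the paper), the bumpiness (2) of a one-dimensional function can be approximated with central finite differences; doubling the amplitude quadruples B, while doubling the frequency multiplies it by 16:

```python
import numpy as np

def bumpiness(f, a, b, n=2001):
    """Approximate B(f) = integral of |f''(x)|^2 dx over [a, b]
    with central finite differences and the trapezoidal rule."""
    x = np.linspace(a, b, n)
    h = x[1] - x[0]
    y = f(x)
    d2 = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / h**2  # second derivative at interior points
    return np.trapz(d2**2, x[1:-1])

print(bumpiness(np.sin, 0.0, 2 * np.pi))                   # ~ pi
print(bumpiness(lambda x: 2 * np.sin(x), 0.0, 2 * np.pi))  # ~ 4*pi (amplitude doubled)
print(bumpiness(lambda x: np.sin(2 * x), 0.0, 2 * np.pi))  # ~ 16*pi (frequency doubled)
```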

In the following section, we describe how the Bayesian MFS framework combines the effect of variation and waviness through the likelihood function. In the Bayesian method, finding a scale factor value that maximizes the likelihood function is related to reducing variation and waviness, which also leads to the reduction of bumpiness in (2). However, maximizing the likelihood function does not mean exactly minimizing the bumpiness.

3 Bayesian MFS framework: Finding the scale factor that reduces bumpiness

The two examples in the previous section used the Bayesian framework to determine ρ, as shown in Fig. 1(e) and (f). The Bayesian framework finds ρ using the method of maximum


Fig. 1 Options to improve the accuracy by including ρ for MFS prediction (δ(x): true discrepancy function; δ̂(x): discrepancy prediction; δ: discrepancy function data)


likelihood estimation (MLE), which estimates ρ at the mode of the likelihood function. While the ρ obtained by MLE is not exactly the same as the ρ minimizing the bumpiness, for the examples we analyzed they were close, as will be seen in the next section. This section discusses why the Bayesian formulation tends to reduce the bumpiness of the discrepancy function through variation reduction and waviness reduction.

The likelihood function, which has the form of a multivariate normal distribution, can be reformulated so that ρ is found as the minimizer:

$$ \underset{\rho}{\operatorname{argmin}}\;\; \hat{\sigma}_\Delta^2(\rho)\, \left| \mathbf{R}_\Delta(\boldsymbol{\omega}_\Delta) \right|^{1/n_H} \tag{3} $$

where σ̂²_Δ(ρ) and |R_Δ(ω_Δ)| are, respectively, the process variance and the determinant of the correlation matrix obtained from the discrepancy data y_H − ρy_L^c. σ̂²_Δ(ρ) represents the variation, while |R_Δ(ω_Δ)| represents the waviness of the discrepancy data; it can be interpreted as a waviness measure, which is a function of the waviness vector ω_Δ. The detailed derivation of (3) from the likelihood function is given in Appendix A.

σ_Δ and ω_Δ are estimated from the discrepancy data for a given ρ using the auto-covariance, which was introduced to quantify the probabilistic similarity of two values in space (Ripley 1981). The auto-covariance can also be used to generate random functions based on its variation and waviness parameters. Its inverse use allows estimating the variation and waviness of the true function based on the data (Rasmussen 2004).

The Bayesian framework uses the Gaussian correlation function to model the auto-covariance of the uncertainties in discrepancy function predictions at different locations. Let Δ(x) and Δ(x′) be the discrepancy predictions at two data locations x and x′. The covariance between them is expressed using their distance as

$$ \operatorname{cov}\left( \Delta(x), \Delta(x') \right) = \sigma_\Delta^2 \exp\left( -(x - x')^T \operatorname{diag}(\boldsymbol{\omega}_\Delta)\, (x - x') \right) \tag{4} $$

where diag(ω_Δ) is a diagonal matrix with the waviness vector ω_Δ, which has the same dimension as x.

Figure 2 shows two sets of samples from (a) a wavy function with small variation and (b) a less wavy function with large variation. The process standard deviation and waviness were estimated based on the data sets using (4). It is clear that a wavy function with small variation has a small σ_Δ and a large ω_Δ. On the other hand, a less wavy function with large variation has a large σ_Δ and a small ω_Δ.

The effect of the process standard deviation on (3) is obvious because the objective function decreases with the process standard deviation. The influence of |R_Δ| on (3) is a function of the waviness parameter and the discrepancy data locations. Since the data locations remain the same while finding ρ, they have no influence.

The correlation matrix R_Δ is a symmetric square matrix whose size is the number of discrepancy data. The diagonal elements of R_Δ are one, and the off-diagonal elements are obtained by the exponential part of the auto-correlation in (4). The off-diagonal elements measure the correlations between discrepancy values at two different data locations. The correlation matrix is expressed as

$$ \mathbf{R}_\Delta = \begin{bmatrix} 1 & \cdots & \exp\left( -(x_{\delta,1} - x_{\delta,n_H})^T \operatorname{diag}(\boldsymbol{\omega}_\Delta)\, (x_{\delta,1} - x_{\delta,n_H}) \right) \\ & \ddots & \vdots \\ \text{symm} & & 1 \end{bmatrix}_{(n_H \times n_H)} \tag{5} $$

The properties of the determinant of a correlation matrix are well known (Reddon et al. 1985; Lophaven et al. 2002; Johnson and Wichern 2007). The determinant has a minimum value of zero when ω_Δ = 0, which makes all the off-diagonal elements one. On the other hand, the determinant has a maximum value of one when ω_Δ → ∞, which makes all the off-diagonal elements zero. As shown by Johnson and Wichern (2007), the determinant decreases monotonically with decreasing ω_Δ. Since waviness is proportional to ω_Δ, reducing the determinant is equivalent to reducing the waviness.
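This behavior is easy to verify numerically; the following minimal check (ours, not from the paper) evaluates |R| of the Gaussian correlation matrix (5) for one-dimensional sample locations as ω grows:

```python
import numpy as np

def gaussian_corr_matrix(x, omega):
    """Correlation matrix of form (5) for 1-D sample locations x and waviness omega."""
    d = x[:, None] - x[None, :]
    return np.exp(-omega * d**2)

x = np.linspace(0.0, 1.0, 5)  # five sample locations
for omega in [0.0, 0.1, 1.0, 10.0, 1e4]:
    R = gaussian_corr_matrix(x, omega)
    print(f"omega = {omega:8.1f}   |R| = {np.linalg.det(R):.3e}")
# |R| grows monotonically from 0 (omega = 0, fully correlated)
# toward 1 (omega -> infinity, uncorrelated), as stated in the text.
```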

In summary, the minimization of (3) is achieved by reducing the product of the terms representing variation and waviness. When there is no way to reduce the variation and waviness simultaneously using ρ, (3) trades off between them to minimize the objective function. Equation (3) drives bumpiness reduction through the reduction of variation and/or waviness. However, minimizing (3) is not theoretically equivalent to minimizing the bumpiness of the discrepancy function defined in (2).
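To make the mechanism concrete, the following is a minimal numerical sketch (not the authors' code) of the ρ-selection criterion in (3): for each candidate ρ, the discrepancy data y_H − ρy_L^c is fitted with a constant-trend Gaussian process, and the product σ̂²_Δ(ρ)|R_Δ|^(1/n_H) is evaluated. The waviness vector ω is held fixed here for brevity; a full implementation would optimize it jointly with ρ as in (37) of Appendix A. All function and variable names are our own.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def corr_matrix(X, omega):
    """Gaussian correlation (4): exp(-(x - x')^T diag(omega) (x - x'))."""
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-np.einsum('ijk,k,ijk->ij', diff, omega, diff))

def profile_objective(d, R):
    """sigma^2 * |R|^(1/n) for discrepancy data d with a constant trend,
    the quantity minimized in (3)."""
    n = len(d)
    Rj = R + 1e-10 * np.eye(n)                  # small jitter for conditioning
    Rinv = np.linalg.inv(Rj)
    ones = np.ones(n)
    beta = (ones @ Rinv @ d) / (ones @ Rinv @ ones)  # GLS constant trend, cf. (32)
    r = d - beta
    sigma2 = (r @ Rinv @ r) / n                      # process variance, cf. (33)
    sign, logdet = np.linalg.slogdet(Rj)
    return sigma2 * np.exp(logdet / n)               # sigma^2 |R|^(1/n), cf. (3)

def find_rho(XH, yH, yLc, omega, bounds=(0.5, 2.5)):
    """Scan rho for a fixed waviness omega (a simplification of (37))."""
    R = corr_matrix(XH, omega)
    obj = lambda rho: profile_objective(yH - rho * yLc, R)
    return minimize_scalar(obj, bounds=bounds, method='bounded').x
```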

The reader is referred to Section A.2 of the Appendix for the detailed formulas to measure variation and waviness using the auto-covariance model of the Bayesian framework.

4 Multi-dimensional examples

In this section, the influence of ρ on increasing MFS prediction accuracy by reducing bumpiness is presented through multi-dimensional examples: (a) the physical borehole function and (b) the numerical Hartmann 6 function.

Firstly, three different MFS frameworks are compared, along with two single-fidelity Kriging surrogates. Table 1 shows the framework descriptions with the corresponding abbreviations. "H" and "L" denote Kriging surrogates using only high- and low-fidelity samples, respectively. "B" is the Bayesian framework without ρ. "BR" is the Bayesian framework with ρ found by minimizing the


bumpiness, while ρ in BR2 is found by minimizing error. The comparison between BR and B shows the effect of including ρ on the prediction. The comparison between BR and BR2 shows the effect of different criteria for finding ρ: reducing bumpiness versus minimizing error.

Secondly, the influences of ρ on the bumpiness of the discrepancy function and on the accuracy of the MFS prediction were measured in the form of graphs by gradually changing ρ. However, since evaluating second-order derivatives of a multi-dimensional function in (2) is a computational challenge (Cressie 2015; Duchon 1977), one-dimensional bumpiness measures are used along N_line = 1000 randomly generated lines. Each line was generated by connecting two randomly generated points and extending the line to the boundary of the sampling domain. The average of the one-dimensional bumpiness measures is used as a representative bumpiness measure for a given value of ρ, as

$$ B(\rho) = \frac{1}{N_{line}} \sum_{i=1}^{N_{line}} \int \left| \delta''(s_i, \rho) \right|^2 ds_i \tag{6} $$

where s_i is the parameter along the ith line, and δ″(s_i, ρ) is the second-order derivative of the discrepancy function along the line for a given ρ.

The graph of bumpiness with respect to ρ explains the influence of ρ on the bumpiness, but it does not explain whether the change in bumpiness is caused by the variation and/or the waviness. To measure their individual contributions, graphs of variation and waviness are also obtained in terms of ρ. The discrepancy variation is measured using the variance of the discrepancy along all lines as

$$ V(\rho) = \frac{1}{N_{line}} \sum_{i=1}^{N_{line}} \sigma_i^2 \tag{7} $$

$$ \mu_i = \int \frac{\delta(s_i, \rho)}{L_i}\, ds_i \quad \text{and} \quad \sigma_i^2 = \int \frac{\left( \delta(s_i, \rho) - \mu_i \right)^2}{L_i}\, ds_i \tag{8} $$

where L_i is the length of the ith line.

To quantify the effect of waviness, a normalized bumpiness is used. For example, the variation of δ₁(x) = 2 sin(100x) is four times the variation of δ₂(x) = sin(100x), while their waviness is the same. If these two functions are normalized, then they have the same variation, so that only the waviness is measured. The waviness measure is defined as

$$ W(\rho) = \frac{1}{N_{line}} \sum_{i=1}^{N_{line}} \int \left| \bar{\delta}''(s_i, \rho) \right|^2 ds_i \tag{9} $$

where δ̄″(s_i, ρ) = δ″(s_i, ρ)/σ_i is the second-order derivative of the discrepancy function normalized by the standard deviation.
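A possible implementation (ours) of the line-based measures (6)-(9), using finite differences along random line segments; unlike the procedure described above, the segments are not extended to the domain boundary:

```python
import numpy as np

def line_measures(delta, lb, ub, rho, n_lines=1000, n_pts=201, seed=0):
    """Monte Carlo estimates of bumpiness B(rho) in (6), variation V(rho)
    in (7)-(8), and waviness W(rho) in (9) of a discrepancy function
    delta(x, rho), along random lines through the box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    B = V = W = 0.0
    for _ in range(n_lines):
        p, q = rng.uniform(lb, ub, (2, len(lb)))   # two random points define a line
        t = np.linspace(0.0, 1.0, n_pts)
        pts = p + t[:, None] * (q - p)             # points along the segment
        s = t * np.linalg.norm(q - p)              # arc-length parameter
        d = np.array([delta(x, rho) for x in pts])
        h = s[1] - s[0]
        d2 = (d[2:] - 2 * d[1:-1] + d[:-2]) / h**2         # second derivative
        mu = np.trapz(d, s) / s[-1]                         # line mean, (8)
        sig2 = np.trapz((d - mu)**2, s) / s[-1]             # line variance, (8)
        B += np.trapz(d2**2, s[1:-1])                       # (6)
        V += sig2                                           # (7)
        W += np.trapz((d2 / np.sqrt(sig2))**2, s[1:-1])     # (9)
    return B / n_lines, V / n_lines, W / n_lines
```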

The accuracy of the Bayesian framework is measured in terms of RMSE with respect to ρ. Since the MFS prediction depends on the samples, i.e., the design of experiments (DOE), 100 DOEs were randomly generated using the nearest neighbor sampling method (Forrester et al. 2007; Jin et al. 2005). Since the Bayesian framework is applied to each DOE to calculate the RMSE, this process yields 100 RMSEs. In the following examples, the median and the 25th and 75th percentiles of the RMSEs were obtained as a function of ρ.
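The accuracy statistics above can be reproduced with a loop of the following shape (a schematic sketch; `sample_doe`, `fit_mfs`, and `f_high` are hypothetical placeholders for the DOE generator, the MFS fit, and the true high-fidelity function, none of which are given in code form in the paper):

```python
import numpy as np

def rmse_quartiles(rho, n_doe=100, n_test=100_000, dim=3, seed=0):
    """25th/50th/75th percentiles of RMSE over repeated DOEs for a fixed rho."""
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(size=(n_test, dim))           # test points in the scaled domain
    y_true = np.array([f_high(x) for x in x_test])     # hypothetical true function
    rmses = []
    for _ in range(n_doe):
        doe = sample_doe(rng)                          # hypothetical DOE generator
        model = fit_mfs(doe, rho)                      # hypothetical Bayesian MFS fit
        y_hat = model.predict(x_test)
        rmses.append(np.sqrt(np.mean((y_hat - y_true) ** 2)))
    return np.percentile(rmses, [25, 50, 75])
```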

4.1 Borehole function example: Reducing discrepancy variation

The empirical borehole function calculates the water flow rate from an aquifer through a borehole. The function was obtained

Fig. 2 The variation and the waviness of a discrepancy function

Table 1 Frameworks and the corresponding labels

Label  Framework
BR     Bayesian discrepancy framework including ρ
BR2    Bayesian discrepancy framework, with ρ found by maximizing agreement
B      Bayesian framework excluding ρ (or BR with ρ = 1)
H      Kriging surrogate based on high-fidelity samples
L      Kriging surrogate based on low-fidelity samples


based on assumptions of steady-state flow from an upper aquifer to the borehole and from the borehole to the lower aquifer, no groundwater gradient, and laminar, isothermal flow through the borehole (Morris et al. 1993).

In this example, the borehole function is considered as the high-fidelity function, and an approximate function is used as the low-fidelity function. The high-fidelity function is defined as

$$ f_H(R_w, L, K_w) = \frac{2\pi T_u (H_u - H_l)}{\ln(R/R_w)\left( 1 + \dfrac{2 L T_u}{\ln(R/R_w)\, R_w^2 K_w} + \dfrac{T_u}{T_l} \right)} \tag{10} $$

The flow rate f_H(R_w, L, K_w) is a function of three input variables, R_w, L, and K_w, which are the borehole radius, the borehole length, and the hydraulic conductivity of the borehole, respectively. The ranges of the input variables and the other environmental parameters are presented in Table 2. The values of the parameters were determined as nominal values based on Morris et al. (1993).

A low-fidelity function of the borehole function is obtained from the literature (Xiong et al. 2013) as

$$ f_L(R_w, L, K_w) = \frac{5 T_u (H_u - H_l)}{\ln(R/R_w)\left( 1.5 + \dfrac{2 L T_u}{\ln(R/R_w)\, R_w^2 K_w} + \dfrac{T_u}{T_l} \right)} \tag{11} $$
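For reference, a direct transcription of (10) and (11) into code, using the parameter values of Table 2 (a sketch; variable names are ours):

```python
import numpy as np

# Fixed environmental parameters from Table 2
R, Tu, Hu, Tl, Hl = 25050.0, 89335.0, 1050.0, 89.55, 760.0

def f_high(Rw, L, Kw):
    """High-fidelity borehole flow rate, (10)."""
    lnr = np.log(R / Rw)
    return 2.0 * np.pi * Tu * (Hu - Hl) / (lnr * (1.0 + 2.0 * L * Tu / (lnr * Rw**2 * Kw) + Tu / Tl))

def f_low(Rw, L, Kw):
    """Low-fidelity approximation, (11): 2*pi -> 5 and 1 -> 1.5."""
    lnr = np.log(R / Rw)
    return 5.0 * Tu * (Hu - Hl) / (lnr * (1.5 + 2.0 * L * Tu / (lnr * Rw**2 * Kw) + Tu / Tl))

# Example evaluation at mid-range inputs
print(f_high(0.10, 1400.0, 8250.0), f_low(0.10, 1400.0, 8250.0))
```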

Note that bounds of [0.5, 1.5] were used for ρ, and constant trend functions were used for the Bayesian framework.

Since MFS are built with low- and high-fidelity samples, many different combinations of low- and high-fidelity samples are possible for the same total computational budget. The MFS performances for different combinations were measured, and the combination that showed the highest accuracy was selected and analyzed further.

All the frameworks in Table 1 were applied to different sample combinations for the same total budget. Table 3 shows sample size ratios for the total budget of 10H, which means the computational budget for evaluating ten high-fidelity samples. The sample cost ratio of 30 means that the cost of evaluating 30 low-fidelity samples is equivalent to that of evaluating a single high-fidelity sample. With the total budget of 10H, we can use either 10 high-fidelity samples, 300 low-fidelity samples, or any of the combinations shown in Table 3. These combinations are expressed with the numbers of high- (nH) and low- (nL) fidelity samples, such as 7/90.

For each sample size ratio, 100 DOEs were randomly generated using the nearest neighbor sampling method (Forrester et al. 2007), and the statistics of their RMSEs are used for evaluating the performance of each MFS framework. Note that the RMSE of each DOE was calculated based on 100,000 test points in the sampling domain. For each sample size ratio and MFS framework, the median and the 25th and 75th percentiles of the RMSEs were obtained.

Figure 3 shows the median RMSEs of all five frameworks for different sample size ratios. Since the low-fidelity Kriging surrogate used 300 samples, its prediction error against the true low-fidelity function is small, but its RMSE against the true high-fidelity function is high. On the other hand, the error in the high-fidelity Kriging surrogate comes from the prediction error because 10 high-fidelity samples are too

Table 2 Input variables and environmental parameters

Input  Bounds                Description
Rw     [0.05, 0.15] m        Borehole radius
L      [1120, 1680] m        Borehole length
Kw     [1500, 15,000] m/yr   Hydraulic conductivity of borehole

Parameters  Value          Description
R           25,050 m       Radius of aquifer influence
Tu          89,335 m²/yr   Transmissivity of upper aquifer
Hu          1050 m         Potentiometric head of upper aquifer
Tl          89.55 m²/yr    Transmissivity of lower aquifer
Hl          760 m          Potentiometric head of lower aquifer

Table 3 Cases of sample size combinations for a total computational budget of evaluating 10 high-fidelity samples (10H) and a ratio of 30 between the cost of high-fidelity and low-fidelity simulation (Borehole3 function)

Total budget  Sample cost ratio  High- and low-fidelity samples
10H           30                 8/60, 7/90, 6/120, 5/150, 4/180, 3/210, 2/240

Fig. 3 The median (of 100 DOEs) accuracy for different sample size ratios (note that the BR2 curve overlaps the BR curve) (borehole example)


few. In general, the RMSEs of the MFS frameworks were significantly lower than those of the single-fidelity surrogates.

The difference between these MFS frameworks comes from the contributions of ρ and from the criteria for finding ρ. BR and BR2 significantly outperformed B, which indicates that the inclusion of ρ is a key factor for the accuracy in this example. In addition, the difference in the way of finding ρ (BR versus BR2) did not lead to a significant difference, which means that the different criteria for finding ρ did not change the results. In the case of the sample size ratio of 7/90 (at which BR and BR2 were most accurate), the estimated ρ from 100 DOEs has a mean of 1.25 and a standard deviation of 9.3 × 10⁻⁷; that is, the effect of different DOEs is negligible. In this example, the directions of finding ρ for reducing bumpiness (BR) and maximizing agreement (BR2) are consistent, which is not always true. The reason will be discussed in a later section.

Figure 4(a), (b), and (c) show the graphs of bumpiness, variation, and waviness of the true discrepancy function with respect to ρ. In Fig. 4(d), the MFS accuracy was calculated based on 100 DOEs of 7/90, at which the prediction accuracy of BR is closest to the minimum median RMSE of BR shown in Fig. 3. From Fig. 4(a), ρ = 1.25 minimizes the bumpiness, which is consistent with the mean of the estimated ρ from BR; that is, BR found the ρ that minimizes the bumpiness. The bumpiness behavior is related to the variation and waviness behavior. Figure 4(b) and (c) show that ρ affects the variation but not the waviness of the discrepancy function, so all the changes in the bumpiness are due to variation. Figure 4(d) shows that the MFS accuracy, as well as its uncertainty, is best when the bumpiness is minimum. In summary, BR found ρ = 1.25 by minimizing the bumpiness or, equivalently, by minimizing variation. This behavior is related to the fact that both the high- and low-fidelity functions are convex.

Figure 5 illustrates a prediction comparison between BR and B along the line connecting x1 = {0.2, 1120, 1500} and x2 = {0, 1680, 15,000} for a chosen DOE. Note that the characteristics along other lines are similar to this one. Since the low-fidelity prediction was very accurate, the prediction accuracy was determined by the error in the discrepancy function. As the result shows, seven high-fidelity samples could not capture the curvature of the discrepancy in Fig. 5(b) without ρ (or with ρ = 1). ρ reduced the variation of the discrepancy function, and so the bumpiness, as shown in Fig. 5(a), and it increased the MFS prediction accuracy significantly. Note that the magnitudes of the discrepancy functions in Fig. 5(a) and (b) differ by a factor of about 100. Since the variation of the discrepancy function was reduced so much, the errors in fitting the discrepancy function have also been reduced by two orders of magnitude. The comparison between Fig. 5(c) and (e) shows that the high-fidelity response has a trend similar to the low-fidelity response. Therefore, magnifying

Fig. 4 The bumpiness, variation, and waviness graphs and the RMSE from BR in terms of ρ for the borehole example (a: bumpiness curve; b: variation curve; c: waviness curve; d: RMSE curve (BR))


the low-fidelity function reduces the variation of the discrepancy function.

It is recalled that the performances of BR and BR2 were almost identical. This is because, for this example, the reduction in variation of the discrepancy function is achieved by scaling down its magnitude. However, this is not always the case, as the Hartmann 6 example will show.

4.2 Hartmann 6 function example

The Hartmann 6 function example also shows that the Bayesian framework with ρ increases the prediction accuracy of MFS. However, in contrast to the borehole example, BR2 did not give as good a prediction as BR, which confirms that reducing bumpiness was more effective than minimizing error. For the high-fidelity function, the six-dimensional Hartmann 6 function is defined as

$$ f_H(x) = -\frac{1}{1.94}\left( 2.58 + \sum_{i=1}^{4} \alpha_i \exp\left( -\sum_{j=1}^{6} A_{ij} \left( x_j - P_{ij} \right)^2 \right) \right) \tag{12} $$

where the domain of the input variables is defined as {0.1, …, 0.1} ≤ x ≤ {1, …, 1}. For the model parameters, α = {1, 1.2, 3, 3.2}^T, and the following two constant matrices are used:

Fig. 5 Comparisons between the predictions based on BR and B on the line between x1 = {0.2, 1120, 1500} and x2 = {0, 1680, 15,000} and RMSEs along the line (borehole example). (a) Discrepancy prediction, ρ = 1.25: RMSE = 0.031 (BR); (b) discrepancy prediction without ρ (ρ = 1): RMSE = 4.449 (B); (c) MFS prediction, ρ = 1.25: RMSE = 0.060 (BR); (d) MFS prediction without ρ (ρ = 1): RMSE = 4.470 (B); (e) low-fidelity prediction, ρ = 1.25: RMSE = 0.028 (BR); (f) low-fidelity prediction without ρ (ρ = 1): RMSE = 0.028 (B)


$$ \mathbf{A} = \begin{pmatrix} 10 & 3 & 17 & 3.5 & 1.7 & 8 \\ 0.05 & 10 & 17 & 0.1 & 8 & 14 \\ 3 & 3.5 & 1.7 & 10 & 17 & 8 \\ 17 & 8 & 0.05 & 10 & 0.1 & 14 \end{pmatrix} \quad \text{and} \quad \mathbf{P} = 10^{-4} \begin{pmatrix} 1312 & 1696 & 5569 & 124 & 8283 & 5886 \\ 2329 & 4135 & 8307 & 3736 & 1004 & 9991 \\ 2348 & 1451 & 3522 & 2883 & 3047 & 6650 \\ 4047 & 8828 & 8732 & 5743 & 1091 & 381 \end{pmatrix} $$

The approximated Hartmann 6 function was constructed to be used as the low-fidelity function:

$$ f_L(x) = -\frac{1}{1.94}\left( 2.58 + \sum_{i=1}^{4} \alpha_i' f_{exp}\left( -\sum_{j=1}^{6} A_{ij} \left( x_j - P_{ij} \right)^2 \right) \right) \tag{13} $$

where α′ = {0.5, 0.5, 2.0, 4.0}^T and f_exp(x) is the approximated exponential function

$$ f_{exp}(x) = \left( \exp\left( -\frac{4}{9} \right) + \exp\left( -\frac{4}{9} \right) \frac{x + 4}{9} \right)^9 \tag{14} $$

Note that the total function variation of the Hartmann 6 function is 0.33 and the RMSE of the low-fidelity function is 0.11.
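A transcription of (12)-(14) into code (ours); since the equations above are recovered from a garbled extraction, the sign and scaling conventions should be checked against the original paper:

```python
import numpy as np

alpha = np.array([1.0, 1.2, 3.0, 3.2])
alpha_p = np.array([0.5, 0.5, 2.0, 4.0])   # alpha' for the low-fidelity model
A = np.array([[10, 3, 17, 3.5, 1.7, 8],
              [0.05, 10, 17, 0.1, 8, 14],
              [3, 3.5, 1.7, 10, 17, 8],
              [17, 8, 0.05, 10, 0.1, 14]])
P = 1e-4 * np.array([[1312, 1696, 5569, 124, 8283, 5886],
                     [2329, 4135, 8307, 3736, 1004, 9991],
                     [2348, 1451, 3522, 2883, 3047, 6650],
                     [4047, 8828, 8732, 5743, 1091, 381]])

def f_exp(x):
    """Approximated exponential, (14)."""
    return (np.exp(-4.0 / 9.0) * (1.0 + (x + 4.0) / 9.0))**9

def f_high(x):
    """Rescaled Hartmann 6 function, (12)."""
    inner = -np.sum(A * (x - P)**2, axis=1)
    return -(2.58 + np.sum(alpha * np.exp(inner))) / 1.94

def f_low(x):
    """Low-fidelity Hartmann 6, (13): alpha' and f_exp replace alpha and exp."""
    inner = -np.sum(A * (x - P)**2, axis=1)
    return -(2.58 + np.sum(alpha_p * f_exp(inner))) / 1.94

x = np.full(6, 0.5)
print(f_high(x), f_low(x))
```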

In this example, the total computational budget is the cost of evaluating 56 high-fidelity samples (56H), and the sample cost ratio between the high- and low-fidelity functions is 30. Table 4 shows the considered sample size ratios. The notation and the repetitions of DOE are the same as in the previous example.

Figure 6 shows the median of the RMSEs of all the frameworks for different sample size ratios. In this example, BR outperformed both B and BR2, which shows that not only the inclusion of ρ but also reducing the bumpiness is important for prediction accuracy. Finding ρ by reducing bumpiness yielded much more accurate predictions than by minimizing error.

Figure 7(a) and (b) show the histograms of ρ estimated from BR and BR2 for the sample size ratio of 42/420, at which BR is most accurate. The histograms clearly show that the two frameworks estimated significantly different ρ. The mode of the histogram was 1.49 for BR, while it was 1.03 for BR2. Since there is no difference in making low-fidelity predictions between BR and BR2, the difference between the two frameworks is caused by the difference in the ways of finding ρ.

The graphs of the MFS discrepancy bumpiness, variation, and waviness with respect to ρ are shown in Fig. 8, as well as the graph of prediction accuracy for the sample size ratio of 42/420. Figure 8(a) shows the bumpiness graph of the true discrepancy function, where the minimum bumpiness occurred at ρ = 1.41. The mode of ρ from BR (1.49) is close to the ρ at the minimum bumpiness, which indicates that BR found a ρ that reduces the bumpiness, as discussed in Section 3. The contributions of the variation and the waviness are shown in Fig. 8(b) and (c). The bumpiness is strongly correlated with the variation, while the waviness shows an opposite behavior, but its contribution is overwhelmed by that of the variation. Figure 8(d) shows the RMSE of BR for varying ρ. Note that the corresponding RMSE graph of BR2 is identical to that of BR; that is, BR and BR2 give identical predictions for the same ρ. The difference between BR and BR2 shown in Fig. 6 arose because they used different ρ estimates. The RMSE is closely correlated with the bumpiness, where the maximum accuracy occurred at ρ = 1.55. The ρ at the minimum variation (1.55 from Fig. 8(b)) is consistent with the ρ at the minimum RMSE (1.55 from Fig. 8(d)).

The result indicates that bumpiness reduction was effective in reducing prediction error. However, minimizing bumpiness is not equivalent to maximizing accuracy. An explanation of this observation is that the bumpiness of the true discrepancy function is compared with the accuracy of predictions based on samples. Since there are infinitely many true functions for a given set of samples, the bumpiness of the true function cannot be perfectly correlated with the error.

To visualize the effect of the different variation reductions of BR and BR2, a DOE was chosen for the sample size ratio of 42/420. Figure 9 shows predictions along a line in the sampling domain that has the maximum difference between BR and BR2. The low-fidelity prediction is reasonably accurate, with an RMSE of 0.048 for both frameworks (Fig. 9(e) and (f)). Therefore, the error in MFS mostly comes

Table 4 Cases of sample size combinations for a total computational budget of evaluating 56 high-fidelity samples (56H) and a sample cost ratio of 30 (Hartmann6 example)

Total budget  Sample size ratio nH/nL
56H           48/240, 46/300, 44/360, 42/420, 40/480, 38/540, 28/840, 18/1140

Fig. 6 The median (of 100 DOEs) RMSEs versus sample size ratio (Hartmann6 example)


from the error in the discrepancy function. This is because 42 high-fidelity samples are not sufficient to capture the bumpy behavior of the discrepancy in six dimensions. BR determined ρ = 1.48 by minimizing the bumpiness, while BR2 found ρ = 1.04 by minimizing the error.

As in the borehole example, Fig. 9(a) and (b) show that the fit along the line is poor for both BR and BR2. However, the improvement by BR is not accomplished by reducing the magnitude of the discrepancy but by reducing its variation. The RMSEs along the line are much higher than the median RMSEs of the whole sampling domain shown in Fig. 6, because the line is a tiny part of the domain. However, in terms of the RMSE reduction, they are consistent: a 23% reduction for the line and a 20% reduction for the whole domain.

5 Concluding remarks

This paper discussed how the Bayesian discrepancy framework uses the low-fidelity scale factor to reduce the variation and waviness of the discrepancy function through the likelihood based on the Gaussian process model. The variation and waviness reductions lead to a reduction of bumpiness, which combines the two without a Gaussian process model. The importance of including the low-fidelity scale factor is that it allows reducing the bumpiness of the discrepancy function, which tends to reduce the error, as the examples show. For the examples studied, the success of the Bayesian framework was largely based on the use of the scale factor. Without it, the Bayesian method gave mediocre predictions. The three-

Fig. 8 The bumpiness, variation, and waviness graphs and the RMSE from BR in terms of ρ for the Hartmann6 example (a: bumpiness curve; b: variation curve; c: waviness curve; d: RMSE curve (BR), with median and [25%, 75%] band)

Fig. 7 Histograms of ρ estimates (Hartmann6 example) (a: BR, mode of 1.49; b: BR2, mode of 1.03)


dimensional borehole and the six-dimensional Hartmann6 examples demonstrated that the accuracy of the MFS predictions was strongly correlated with the bumpiness of the discrepancy function. For the Borehole3 example, the minimum RMSE was achieved with the scale factor minimizing bumpiness. For the Hartmann6 example, the scale factor minimizing RMSE was not identical to the scale factor minimizing the bumpiness, but they were very close, and the behaviors of RMSE and bumpiness were strongly correlated. However, a perfect correlation cannot be expected between the bumpiness of the true function and the prediction error, due to the infinitely many possible true functions passing through a finite number of samples.

The Bayesian framework characterizes a discrepancy function with two factors, variation and waviness, through the Gaussian process model, and the maximum likelihood method combines variation and waviness reduction. Bumpiness is another way to combine them without using a Gaussian process model. For the Hartmann6 and Borehole3 examples, variation reduction dominated the bumpiness reduction. That can be interpreted as the low-fidelity function capturing the trend of the high-fidelity function but not its high-frequency behavior, whereas waviness reduction requires the scaled low-fidelity model to capture the high-frequency behavior of

Fig. 9 Comparisons between BR and BR2 on the hyper-line between {0.35, 0.32, 0.63, 0.14, 0.88, 0.10} and {0.30, 0.39, 0.31, 0.36, 0.10, 0.80} and RMSEs along the line (Hartmann6 example)


the high-fidelity model without necessarily capturing the low-frequency behavior. We suspect that such a case may be rare, so reducing variation would be more common. The lessons learned from the Bayesian framework can be utilized for other MFS predictions.

Acknowledgements This work is supported by the U.S. Department of Energy, National Nuclear Security Administration, Advanced Simulation and Computing Program, as a Cooperative Agreement under the Predictive Science Academic Alliance Program, under Contract No. DE-NA0002378.

Appendix A: Bayesian MFS framework

The discrepancy function based Bayesian framework (equivalent to co-Kriging) predicts a high-fidelity response based on a scaled low-fidelity prediction and the corresponding discrepancy prediction, ρŷ_L(x) and δ̂(x), respectively, as

$$ \hat{y}_H(x) = \rho\, \hat{y}_L(x) + \hat{\delta}(x) \tag{15} $$

where ρ is a scalar for the low-fidelity response. The two predictions are based on a low-fidelity Gaussian process (GP) model Y_L(x) and a discrepancy function GP model Δ(x), respectively. The MFS prediction is made with the high-fidelity GP model, which is the sum of the two GP models as

$$ Y_H(x) = \rho\, Y_L(x) + \Delta(x) \tag{16} $$

The MFS prediction is made in two steps: 1) estimating the hyperparameters of the GP models and ρ using Bayesian inference, and 2) updating the high-fidelity GP model defined with the estimates using the data. The MFS predictor is the mean of the updated high-fidelity GP model Y_H(x) | y_H, y_L, which is expressed as

$$ \hat{y}_H(x) = E\left( Y_H(x) \mid y_H, y_L \right) \tag{17} $$

where y_H and y_L are, respectively, the vectors of the high- and low-fidelity sample sets. The corresponding prediction uncertainty estimate is

$$ \hat{\sigma}_H^2(x) = \operatorname{Var}\left( Y_H(x) \mid y_H, y_L \right) \tag{18} $$

Gaussian process models

The GP models assume that the prediction uncertainties (epistemic uncertainty) follow normal distributions. The low-fidelity and discrepancy GP models are respectively defined at x as

$$ Y_L(x) \sim N\left( \xi_L(x)\beta_L,\; \sigma_L^2 \right), \qquad \Delta(x) \sim N\left( \xi_\Delta(x)\beta_\Delta,\; \sigma_\Delta^2 \right) \tag{19} $$

where the subscripts L and Δ denote the low-fidelity function and the discrepancy function, respectively. Since the GP models are functions of x, they are defined with mean functions, for which a polynomial regression model is often employed. ξ_L(x) and ξ_Δ(x) are the basis vectors of the mean functions at x, and β_L and β_Δ are coefficient vectors. Note that the unknown coefficient vectors are to be estimated based on data; (25) and (32) are the estimators of the coefficient vectors. The variances remain constant, but the GP models define the relation between responses at two different points with covariance functions based on the distance between the points, which are defined as

$$ \operatorname{cov}\left( Y_L(x), Y_L(x') \right) = \sigma_L^2 \operatorname{corr}\left( Y_L(x), Y_L(x') \right), \qquad \operatorname{cov}\left( \Delta(x), \Delta(x') \right) = \sigma_\Delta^2 \operatorname{corr}\left( \Delta(x), \Delta(x') \right) \tag{20} $$

where σ²_L and σ²_Δ are the process variances of the GP models of the low-fidelity and discrepancy functions, respectively, and corr(Y_L(x), Y_L(x′)) and corr(Δ(x), Δ(x′)) are correlation functions. A Gaussian kernel is used for the correlation functions as

$$ \operatorname{corr}\left( Y_L(x), Y_L(x') \right) = \exp\left( -(x - x')^T \operatorname{diag}(\boldsymbol{\omega}_L)\, (x - x') \right), \qquad \operatorname{corr}\left( \Delta(x), \Delta(x') \right) = \exp\left( -(x - x')^T \operatorname{diag}(\boldsymbol{\omega}_\Delta)\, (x - x') \right) \tag{21} $$

where diag(ω_L) and diag(ω_Δ) are diagonal matrices whose (i, i) components are the ith components of ω_L and ω_Δ, respectively.

Finally, the high-fidelity GP model is obtained as the sum of the GP models as

$$ Y_H(x) \sim N\left( \rho\, \xi_L^T(x)\beta_L + \xi_\Delta^T(x)\beta_\Delta,\; \rho^2 \sigma_L^2 + \sigma_\Delta^2 \right) \tag{22} $$

The covariance function of the GP model is

$$ \operatorname{cov}\left( Y_H(x), Y_H(x') \right) = \rho^2 \sigma_L^2 \operatorname{corr}\left( Y_L(x), Y_L(x') \right) + \sigma_\Delta^2 \operatorname{corr}\left( \Delta(x), \Delta(x') \right) \tag{23} $$

For making the MFS prediction, the parameters {β_L, ω_L, σ_L, β_Δ, ω_Δ, σ_Δ, ρ} of the high-fidelity GP model need to be estimated.

Estimating hyperparameters of the GP models and ρ

A common condition for MFS sampling is that the high-fidelity sampling locations are selected from the low-fidelity sampling locations. With this condition, the discrepancy can be directly obtained at the common locations, i.e., at the high-fidelity sample locations. For the Bayesian framework, the condition also provides a computational advantage in estimating the parameters.


The condition yields independent estimations of {β_L, ω_L, σ_L} and {β_Δ, ω_Δ, σ_Δ, ρ}. This is because the discrepancy function prediction does not depend on the low-fidelity prediction, which makes the {β_Δ, ω_Δ, σ_Δ, ρ} estimation independent of the {β_L, ω_L, σ_L} estimation.

Bayesian inference uses the GP model inversely to obtain the likelihood function with respect to {β_L, ω_L, σ_L}. Fortunately, {β_L, σ_L} can be analytically expressed as a function of ω_L. Thus, the likelihood function is reformulated as a function of ω_L only by substituting {β_L, σ_L}:

$$ p(\boldsymbol{\omega}_L \mid y_L) = \frac{1}{\sqrt{\left( 2\pi \hat{\sigma}_L^2(\boldsymbol{\omega}_L) \right)^{n_L} \left| \mathbf{R}_L(\boldsymbol{\omega}_L) \right|}} \exp\left( -\frac{\left( y_L - \mathbf{X}_L \hat{\beta}_L(\boldsymbol{\omega}_L) \right)^T \mathbf{R}_L^{-1}(\boldsymbol{\omega}_L) \left( y_L - \mathbf{X}_L \hat{\beta}_L(\boldsymbol{\omega}_L) \right)}{2 \hat{\sigma}_L^2(\boldsymbol{\omega}_L)} \right) \tag{24} $$

where β̂_L(ω_L) and σ̂²_L(ω_L) are the analytical estimates of {β_L, σ_L} for a given ω_L. These estimates can be expressed as functions of ω_L:

$$ \hat{\beta}_L(\boldsymbol{\omega}_L) = \left( \mathbf{X}_L^T \mathbf{R}_L^{-1}(\boldsymbol{\omega}_L) \mathbf{X}_L \right)^{-1} \mathbf{X}_L^T \mathbf{R}_L^{-1}(\boldsymbol{\omega}_L)\, y_L \tag{25} $$

$$ \hat{\sigma}_L^2(\boldsymbol{\omega}_L) = \frac{1}{n_L} \left( y_L - \mathbf{X}_L \hat{\beta}_L(\boldsymbol{\omega}_L) \right)^T \mathbf{R}_L^{-1}(\boldsymbol{\omega}_L) \left( y_L - \mathbf{X}_L \hat{\beta}_L(\boldsymbol{\omega}_L) \right) \tag{26} $$

The correlation matrix R_L(ω_L) is also a function of ω_L, which is defined as

$$ \mathbf{R}_L(\boldsymbol{\omega}_L) = \begin{bmatrix} 1 & \cdots & \exp\left( -(x_{L,n_L} - x_{L,1})^T \operatorname{diag}(\boldsymbol{\omega}_L)\, (x_{L,n_L} - x_{L,1}) \right) \\ & \ddots & \vdots \\ \text{symm} & & 1 \end{bmatrix}_{(n_L \times n_L)} \tag{27} $$

where x_L = {x_{L,1}, …, x_{L,n_L}} is the vector of the n_L low-fidelity sample locations. The moment matrix is defined by applying the sample locations to the basis vectors as

$$ \mathbf{X}_L = \begin{bmatrix} \xi_L^T(x_{L,1}) \\ \vdots \\ \xi_L^T(x_{L,n_L}) \end{bmatrix}_{(n_L \times p_1)} \tag{28} $$

where p₁ is the number of basis functions of the mean function. Using maximum likelihood estimation, the mode of the likelihood function is used to determine ω_L. Thus, the likelihood function can be reformulated as long as the mode remains the same. By substituting (26) into the likelihood function in (24) and taking the 2/n_L-th root, the likelihood function can be simplified as

$$ p(\boldsymbol{\omega}_L \mid y_L)^{2/n_L} \propto \left[ \hat{\sigma}_L^2(\boldsymbol{\omega}_L) \left| \mathbf{R}_L(\boldsymbol{\omega}_L) \right|^{1/n_L} \right]^{-1} \tag{29} $$

Finally, the ω_L estimate is obtained by solving the optimization problem

$$ \underset{\boldsymbol{\omega}_L}{\operatorname{argmin}}\;\; \hat{\sigma}_L^2(\boldsymbol{\omega}_L) \left| \mathbf{R}_L(\boldsymbol{\omega}_L) \right|^{1/n_L} \tag{30} $$

Note that Bayesian statistics uses the maximum a posteriori (MAP) estimation, which takes the mode of the posterior distribution as the parameter estimate. However, in the case of a non-informative prior, which is the case we used, the MAP estimation gives the same estimate as the method of maximum likelihood estimation. For this reason, we simply state that the hyperparameters were estimated with the method of maximum likelihood in this paper.

In the same sense, {β_Δ, ω_Δ, σ_Δ, ρ} are estimated using the discrepancy GP model. Since {β_Δ, σ_Δ} can be analytically obtained for given {ω_Δ, ρ}, the likelihood function is expressed as

$$ p(\boldsymbol{\omega}_\Delta, \rho \mid y_H, y_L^c) = \frac{1}{\sqrt{\left( 2\pi \hat{\sigma}_\Delta^2(\boldsymbol{\omega}_\Delta, \rho) \right)^{n_H} \left| \mathbf{R}_\Delta(\boldsymbol{\omega}_\Delta) \right|}} \exp\left( -\frac{\left( y_H - \rho y_L^c - \mathbf{X}_\Delta \hat{\beta}_\Delta(\boldsymbol{\omega}_\Delta, \rho) \right)^T \mathbf{R}_\Delta^{-1}(\boldsymbol{\omega}_\Delta) \left( y_H - \rho y_L^c - \mathbf{X}_\Delta \hat{\beta}_\Delta(\boldsymbol{\omega}_\Delta, \rho) \right)}{2 \hat{\sigma}_\Delta^2(\boldsymbol{\omega}_\Delta, \rho)} \right) \tag{31} $$

where y_L^c is the subset of the low-fidelity data at the common data locations. The analytical estimates of {β_Δ, σ_Δ} are

$$ \hat{\beta}_\Delta(\boldsymbol{\omega}_\Delta, \rho) = \left( \mathbf{X}_\Delta^T \mathbf{R}_\Delta^{-1}(\boldsymbol{\omega}_\Delta) \mathbf{X}_\Delta \right)^{-1} \mathbf{X}_\Delta^T \mathbf{R}_\Delta^{-1}(\boldsymbol{\omega}_\Delta) \left( y_H - \rho y_L^c \right) \tag{32} $$

$$ \hat{\sigma}_\Delta^2(\boldsymbol{\omega}_\Delta, \rho) = \frac{1}{n_H} \left( y_H - \rho y_L^c - \mathbf{X}_\Delta \hat{\beta}_\Delta \right)^T \mathbf{R}_\Delta^{-1}(\boldsymbol{\omega}_\Delta) \left( y_H - \rho y_L^c - \mathbf{X}_\Delta \hat{\beta}_\Delta \right) \tag{33} $$

RΔ(ωΔ) is defined as

$$ \mathbf{R}_\Delta(\boldsymbol{\omega}_\Delta) = \begin{bmatrix} 1 & \cdots & \exp\left( -(x_{H,n_H} - x_{H,1})^T \operatorname{diag}(\boldsymbol{\omega}_\Delta)\, (x_{H,n_H} - x_{H,1}) \right) \\ & \ddots & \vdots \\ \text{symm} & & 1 \end{bmatrix}_{(n_H \times n_H)} \tag{34} $$

where x_H = {x_{H,1}, …, x_{H,n_H}} is the vector of the n_H high-fidelity sample locations. The moment matrix is defined by applying the sample locations to the basis vectors as

$$ \mathbf{X}_\Delta = \begin{bmatrix} \xi_\Delta^T(x_{\Delta,1}) \\ \vdots \\ \xi_\Delta^T(x_{\Delta,n_H}) \end{bmatrix}_{(n_H \times p_2)} \tag{35} $$


By substituting (32) and (33) into the exponential term of the likelihood function in (31) and taking the 2/n_H-th root, the likelihood function is simplified as

$$ p(\boldsymbol{\omega}_\Delta, \rho \mid y_H, y_L^c)^{2/n_H} \propto \left[ \hat{\sigma}_\Delta^2(\boldsymbol{\omega}_\Delta, \rho) \left| \mathbf{R}_\Delta(\boldsymbol{\omega}_\Delta) \right|^{1/n_H} \right]^{-1} \tag{36} $$

Finally, the ω_Δ and ρ estimates are obtained by solving the optimization problem

$$ \underset{\boldsymbol{\omega}_\Delta, \rho}{\operatorname{argmin}}\;\; \hat{\sigma}_\Delta^2(\boldsymbol{\omega}_\Delta, \rho) \left| \mathbf{R}_\Delta(\boldsymbol{\omega}_\Delta) \right|^{1/n_H} \tag{37} $$

The objective function is the simplified negative likelihood function of {ω_Δ, ρ}. One interesting aspect of this problem is the presence of ρ. For a fixed ρ, (37) turns into the problem of finding ω_Δ for fitting y_H − ρy_L^c, which is the same as finding the hyperparameters of a Kriging surrogate for y_H − ρy_L^c (Lophaven et al. 2002).

Regarding σ̂²_Δ(ρ) and R_Δ as the process variance and correlation matrix for y_H − ρy_L^c, (37) can be turned into

$$ \underset{\rho}{\operatorname{argmin}}\;\; \hat{\sigma}_\Delta^2(\rho) \left| \mathbf{R}_\Delta \right|^{1/n_H} \tag{38} $$

The objective function is the minimized negative likelihood function of the Kriging surrogate for y_H − ρy_L^c. Equation (38) can thus be interpreted as the problem of finding the ρ that minimizes the negative Kriging likelihood function.

Equation (38) is identical to (3). Since σ̂²_Δ(ρ) and |R_Δ| represent the variance and waviness estimated based on y_H − ρy_L^c, (38) finds the ρ that minimizes the bumpiness of the discrepancy data y_H − ρy_L^c.

High fidelity function prediction

When all the hyperparameters have been estimated, the MFS prediction at a chosen point x is made by updating the GP model defined with the estimated hyperparameters. The posterior distribution of the prediction is obtained with (17).

The predictor and the prediction uncertainty estimate are the mean and variance of the posterior distribution, respectively, expressed as

$$ \hat{y}_H(x) = \xi(x)^T \hat{\beta} + t(x)^T \Sigma^{-1} \left( y - \mathbf{X} \hat{\beta} \right) $$

$$ \hat{\sigma}_{y_H}^2(x) = \rho^2 \hat{\sigma}_L^2(\boldsymbol{\omega}_L) + \hat{\sigma}_\Delta^2(\boldsymbol{\omega}_\Delta, \rho) - t(x)^T \Sigma^{-1} t(x) + \left( \xi(x) - \mathbf{X}^T \Sigma^{-1} t(x) \right)^T \left( \mathbf{X}^T \Sigma^{-1} \mathbf{X} \right)^{-1} \left( \xi(x) - \mathbf{X}^T \Sigma^{-1} t(x) \right) \tag{39} $$

where Σ⁻¹ is the inverse of the covariance matrix expressed in (42).

The vectors and matrices used in (39) are defined as follows.

$$ \hat{\beta}^T = \left\{ \hat{\beta}_L^T \;\; \hat{\beta}_\Delta^T \right\} \tag{40} $$

$$ \mathbf{X} = \begin{bmatrix} \mathbf{X}_L & 0 \\ \rho \mathbf{X}_L & \mathbf{X}_\Delta \end{bmatrix} \tag{41} $$

$$ \Sigma = \begin{bmatrix} \hat{\sigma}_L^2(\boldsymbol{\omega}_L)\, \mathbf{R}_L(\boldsymbol{\omega}_L) & \rho \hat{\sigma}_L^2(\boldsymbol{\omega}_L)\, \mathbf{R}_{LH}(\boldsymbol{\omega}_L) \\ \rho \hat{\sigma}_L^2(\boldsymbol{\omega}_L)\, \mathbf{R}_{LH}^T(\boldsymbol{\omega}_L) & \rho^2 \hat{\sigma}_L^2(\boldsymbol{\omega}_L)\, \mathbf{R}_L(\boldsymbol{\omega}_L) + \hat{\sigma}_\Delta^2(\boldsymbol{\omega}_\Delta, \rho)\, \mathbf{R}_\Delta(\boldsymbol{\omega}_\Delta) \end{bmatrix} \tag{42} $$

where the correlation matrix between the high- and low-fidelity GP models, R_LH(ω_L), is expressed as

$$ \mathbf{R}_{LH}(\boldsymbol{\omega}_L) = \begin{bmatrix} \operatorname{corr}\left( Y_L(x_{L,1}), Y_L(x_{H,1}) \right) & \cdots & \operatorname{corr}\left( Y_L(x_{L,1}), Y_L(x_{H,n_H}) \right) \\ \vdots & \ddots & \vdots \\ \operatorname{corr}\left( Y_L(x_{L,n_L}), Y_L(x_{H,1}) \right) & \cdots & \operatorname{corr}\left( Y_L(x_{L,n_L}), Y_L(x_{H,n_H}) \right) \end{bmatrix}_{(n_L \times n_H)} \tag{43} $$

$$ \xi(x) = \begin{Bmatrix} \xi_L(x) \\ \xi_\Delta(x) \end{Bmatrix} \tag{44} $$

t(x) is expressed as

$$ t(x) = \begin{bmatrix} \operatorname{cov}\left( Y_H(x), Y_L(x_{L,1}) \right) \\ \vdots \\ \operatorname{cov}\left( Y_H(x), Y_L(x_{L,n_L}) \right) \\ \operatorname{cov}\left( Y_H(x), Y_H(x_{H,1}) \right) \\ \vdots \\ \operatorname{cov}\left( Y_H(x), Y_H(x_{H,n_H}) \right) \end{bmatrix}_{(n_L + n_H) \times 1} \tag{45} $$

where cov(Y_H(x), Y_L(x′)) and cov(Y_H(x), Y_H(x′)) are defined in (20) and (21).
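The following sketch assembles the predictor in (39) numerically for constant trend functions, under the assumption that the hyperparameters have already been estimated by (30) and (37). It is an illustrative reading of (39)-(45), not the authors' implementation, and it computes the posterior mean only; all names are ours. Since the high-fidelity locations are a subset of the low-fidelity locations, Y_H(x) has mean ρβ_L + β_Δ per (22), so the basis of the prediction point is taken as {ρ, 1}.

```python
import numpy as np

def gauss_corr(X1, X2, omega):
    """Gaussian correlation (21) between the rows of X1 and X2."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.exp(-np.einsum('ijk,k,ijk->ij', diff, omega, diff))

def cokriging_mean(x, XL, yL, XH, yH, rho, omega_L, omega_D, s2_L, s2_D):
    """Posterior mean (39) of the high-fidelity GP at point x, constant trends."""
    nL, nH = len(yL), len(yH)
    RL  = s2_L * gauss_corr(XL, XL, omega_L)
    RLH = s2_L * gauss_corr(XL, XH, omega_L)
    RHH = s2_L * gauss_corr(XH, XH, omega_L)
    RD  = s2_D * gauss_corr(XH, XH, omega_D)
    # Joint covariance matrix, cf. (42)
    Sigma = np.block([[RL,          rho * RLH],
                      [rho * RLH.T, rho**2 * RHH + RD]])
    # Regression matrix, cf. (41), with constant trend bases
    X = np.block([[np.ones((nL, 1)),       np.zeros((nL, 1))],
                  [rho * np.ones((nH, 1)), np.ones((nH, 1))]])
    y = np.concatenate([yL, yH])
    Si = np.linalg.inv(Sigma + 1e-10 * np.eye(nL + nH))  # jitter for stability
    beta = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)   # GLS trend coefficients, cf. (40)
    # Covariance vector between Y_H(x) and the data, cf. (45)
    rL = rho * s2_L * gauss_corr(x[None, :], XL, omega_L).ravel()
    rH = (rho**2 * s2_L * gauss_corr(x[None, :], XH, omega_L)
          + s2_D * gauss_corr(x[None, :], XH, omega_D)).ravel()
    t = np.concatenate([rL, rH])
    xi = np.array([rho, 1.0])   # mean basis of Y_H(x), cf. (22)
    return xi @ beta + t @ Si @ (y - X @ beta)
```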

Appendix B: Cantilever beam example

A cantilever beam example is chosen as a structural example to show how ρ simplifies the discrepancy function to maximize the MFS prediction accuracy. Figure 10(a) shows the geometry of a cantilever beam with a rectangular section under a concentrated force and moment. By choosing the height of the beam as the variable, it is easy to visualize the discrepancy function. Since shear deformation is not negligible for a thick beam, the Timoshenko beam theory is used as the high-fidelity function to calculate the tip deflection. The high-fidelity function is expressed as

$$ f_H(h) = \frac{4FL^3}{Ebh^3} + \frac{6ML^2}{Ebh^3} + \frac{FL(4 + 5\nu)}{2Ebh} \tag{46} $$

The low-fidelity function is based on the Euler-Bernoulli beam theory, where the shear deformation is ignored, as

$$ f_L(h) = \frac{4FL^3}{Ebh^3} + \frac{6ML^2}{Ebh^3} \tag{47} $$
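A sketch of the two beam models in (46) and (47); the numerical values of F, M, L, E, b, and ν are illustrative placeholders, since the paper does not list them:

```python
import numpy as np

def tip_deflection_high(h, F=1.0, M=1.0, L=10.0, E=200e9, b=1.0, nu=0.3):
    """Timoshenko tip deflection, (46); parameter values are placeholders."""
    return 4*F*L**3/(E*b*h**3) + 6*M*L**2/(E*b*h**3) + F*L*(4 + 5*nu)/(2*E*b*h)

def tip_deflection_low(h, F=1.0, M=1.0, L=10.0, E=200e9, b=1.0):
    """Euler-Bernoulli tip deflection, (47): the shear term is dropped."""
    return 4*F*L**3/(E*b*h**3) + 6*M*L**2/(E*b*h**3)

h = np.linspace(7.1, 13.0, 50)   # section heights, as in Fig. 10(b)
disc = tip_deflection_high(h) - tip_deflection_low(h)
# The discrepancy F*L*(4+5*nu)/(2*E*b*h) is linear in 1/h, so a polynomial
# trend function cannot capture it exactly, as noted in the text below.
```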


The error in the low-fidelity function comes from the third term of (46). Since the error is linear in the reciprocal of h, it cannot be captured perfectly with a polynomial trend function.

Figure 10(b) compares the high- and low-fidelity functions. Two samples at x = {7.1, 10} are used from the high-fidelity function and five samples at x = {7.1, 8.5, 10, 11.5, 13} from the low-fidelity function.

Figure 11(a) and (b) show the predictions using the Bayesian MFS framework with and without ρ (or with ρ = 1). The red solid curves are the MFS predictions, while the black dashed curves are the true high-fidelity function. Figure 11(c) and (d) show the corresponding discrepancy predictions and the true discrepancy functions. Note that the true discrepancy functions are different, since one is determined by ρ and the other is the difference between the high- and low-fidelity functions. The figures also show the two-sigma prediction uncertainty. In the case of Fig. 11(a) and (c), the prediction uncertainty is too small to show in the figure.

Figure 11(c) shows that ρ is determined such that the true discrepancy function has a small variation, which allows an

Fig. 10 Cantilever beam example and the tip deflection of the high- and low-fidelity models with respect to the section height (a: cantilever beam under concentrated force and moment; b: tip deflections of the high- and low-fidelity models)

Fig. 11 The MFS predictions and the discrepancy predictions for the cantilever beam example


accurate prediction with two samples. On the other hand, the true discrepancy function without ρ has a much larger variation, which leads to larger bumpiness. The prediction error of the discrepancy function with smaller bumpiness is 0.0011 in terms of RMSE in Fig. 11(c), while its counterpart in Fig. 11(d) is 0.0040, almost four times larger.

References

Balabanov V, Haftka RT, Grossman B, Mason WH, Watson LT (1998) Multifidelity response surface model for HSCT wing bending material weight. In: Proceedings of 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pp 778-788

Ben Salem M, Tomaso L (2018) Automatic selection for general surrogate models. Struct Multidiscip Optim. https://doi.org/10.1007/s00158-018-1925-3

Cressie N (2015) Statistics for spatial data. John Wiley & Sons

Duchon J (1977) Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In: Schempp W, Zeller K (eds) Constructive theory of functions of several variables. Lecture Notes in Mathematics, No. 571. Springer, Berlin, pp 85-100

Fernández-Godino MG, Park C, Kim NH, Haftka RT (2016) Review of multi-fidelity models. arXiv preprint arXiv:1609.07196

Forrester AI, Sóbester A, Keane AJ (2007) Multi-fidelity optimization via surrogate modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science 463(2088):3251-3269

Gano SE, Renaud JE, Martin JD, Simpson TW (2006) Update strategies for kriging models used in variable fidelity optimization. Struct Multidiscip Optim 32(4):287-298

Gutmann HM (2001) A radial basis function method for global optimization. J Glob Optim 19(3):201-227

Jin R, Chen W, Sudjianto A (2005) An efficient algorithm for constructing optimal design of computer experiments. Journal of Statistical Planning and Inference 134(1):268-287

Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis, 6th edn. Prentice-Hall, Englewood Cliffs, New Jersey

Kennedy MC, O'Hagan A (2000) Predicting the output from a complex computer code when fast approximations are available. Biometrika 87(1):1-13

Le Gratiet L (2013) Multi-fidelity Gaussian process regression for computer experiments. Doctoral dissertation, Université Paris-Diderot - Paris VII

Lophaven SN, Nielsen HB, Søndergaard J (2002) DACE - A Matlab kriging toolbox, version 2.0

Mason BH, Haftka RT, Johnson ER, Farley GL (1998) Variable complexity design of composite fuselage frames by response surface techniques. Thin-Walled Struct 32(4):235-261

Matsumura T, Haftka RT, Kim NH (2015) Accurate predictions from noisy data: replication versus exploration with application to structural failure. Struct Multidiscip Optim 51(1):23-40

Morris MD, Mitchell TJ, Ylvisaker D (1993) Bayesian design and analysis of computer experiments: use of derivatives in surface prediction. Technometrics 35(3):243-255

Park C, Haftka RT, Kim NH (2017) Remarks on multi-fidelity surrogates. Struct Multidiscip Optim 55(3):1029-1050

Qian PZ, Wu CJ (2008) Bayesian hierarchical modeling for integrating low-accuracy and high-accuracy experiments. Technometrics 50(2):192-204

Rasmussen CE (2004) Gaussian processes in machine learning. In: Advanced lectures on machine learning. Springer, Berlin Heidelberg, pp 63-71

Reddon JR, Jackson DN, Schopflocher D (1985) Distribution of the determinant of the sample correlation matrix: Monte Carlo type one error rates. J Educ Stat 10(4):384-388

Ripley BD (1981) Spatial statistics. Wiley, New York. https://doi.org/10.1002/0471725218

Xiong S, Qian PZ, Wu CJ (2013) Sequential design and analysis of high-accuracy and low-accuracy computer codes. Technometrics 55(1):37-46

Zhou Q, Wang Y, Choi SK, Jiang P, Shao X, Hu J, Shu L (2018) A robust optimization approach based on multi-fidelity metamodel. Struct Multidiscip Optim 57(2):775-797
