H. Sarin, M. Kokkolaras, G. Hulbert, P. Papalambros
Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109-1316

S. Barbat, R.-J. Yang
Passive Safety, Research and Advanced Engineering, Ford Motor Company, Highland Park, MI 48203-3177

Comparing Time Histories for Validation of Simulation Models: Error Measures and Metrics

Computer modeling and simulation are the cornerstones of product design and development in the automotive industry. Computer-aided engineering tools have improved to the extent that virtual testing may lead to significant reduction in prototype building and testing of vehicle designs. In order to make this a reality, we need to assess our confidence in the predictive capabilities of simulation models. As a first step in this direction, this paper deals with developing measures and a metric to compare time histories obtained from simulation model outputs and experimental tests. The focus of the work is on vehicle safety applications. We restrict attention to quantifying discrepancy between time histories as the latter constitute the predominant form of responses of interest in vehicle safety considerations. First, we evaluate popular measures used to quantify discrepancy between time histories in fields such as statistics, computational mechanics, signal processing, and data mining. Three independent error measures are proposed for vehicle safety applications, associated with three physically meaningful characteristics (phase, magnitude, and slope), which utilize norms, cross-correlation measures, and algorithms such as dynamic time warping to quantify discrepancies. A combined use of these three measures can serve as a metric that encapsulates the important aspects of time history comparison. It is also shown how these measures can be used in conjunction with ratings from subject matter experts to build regression-based validation metrics.
DOI: 10.1115/1.4002478

1 Introduction

Automotive manufacturers have to meet several vehicle safety regulations and mandatory Federal Motor Vehicle Safety Standards (FMVSS). Additionally, consumer information programs such as the new car assessment program (NCAP) and the Insurance Institute for Highway Safety (IIHS) impose further requirements on vehicle safety. Currently, assessment of whether these requirements are satisfied is conducted through numerous, costly, and time-consuming physical experiments. Computer modeling and simulation-based methods for virtual vehicle safety analysis and design verification could make this process more time and cost efficient. Moreover, virtual testing (VT) can improve real-world vehicle safety beyond regulatory requirements, since computer predictions can be used to extend the range of protection to real-world crash conditions at speeds and configurations not addressed by current regulations.

To achieve the promises of VT, computer predictions need verification and validation (V&V), so that the designs obtained using simulation models can be cleared for production with minimal or reduced physical prototype testing. The American Institute of Aeronautics and Astronautics guide for verification and validation of computational fluid dynamics simulations defines verification and validation as follows [1]:

"Verification is the process of determining that a model implementation accurately represents the developer's conceptual description of the model and the solution to the model."

"Validation is the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model."

The American Society of Mechanical Engineers Standards Committee on verification and validation in computational solid mechanics describes model validation as a two-step process [2]:

1. quantitatively comparing the computational and experimental results for the response of interest
2.
determining whether there is an acceptable agreement between the model and the experiment for the intended use of the model

Oberkampf and Barone proposed in Ref. [3] six properties that a validation metric should satisfy. These six properties form a generic guideline and act as a set of requirements for the development of a new validation metric. Their third property dictates that an effective metric for measuring the discrepancy between simulation model responses represented by time histories is necessary to accomplish the first step of the validation process. In this paper, we review existing error measures and metrics and discuss their advantages and limitations. We then propose a combination of measures associated with three physically meaningful error characteristics: phase, magnitude, and slope. The proposed approach utilizes measures such as cross-correlation and the L1 norm and algorithms such as dynamic time warping (DTW) to quantify the discrepancy between time histories. We then show how these measures can be used to build regression-based validation metrics in cases where subject matter expert data are available.

It is important to note that four of the remaining five properties advocated by Oberkampf and Barone [3] for useful validation metrics involve the uncertainties related to numerical error, experimental error, experiment postprocessing, and the number of experiments conducted. While these are critical issues to be addressed, they are not considered in this paper, as the goal of the present work is to establish an appropriate set of error measures for vehicle safety applications and to assess combinations of these measures into an error metric. With an established set of error measures, the next step toward a fully developed validation metric is to use the error measures to provide the quantitative values for assessment under uncertainty. For example, the error measures proposed in this paper could be used in the Bayesian framework proposed by Rebba and Mahadevan [4].

Contributed by the Dynamic Systems Division of ASME for publication in the JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL.
Manuscript received September 15, 2008; final manuscript received May 12, 2010; published online October 28, 2010. Assoc. Editor: Jeffrey L. Stein.

Journal of Dynamic Systems, Measurement, and Control, NOVEMBER 2010, Vol. 132 / 061401-1. Copyright © 2010 by ASME.


2 Review of Error Measures, Metrics, and Algorithms

In this section, we review popular measures, metrics, and algorithms used currently to quantify discrepancies between time histories in various fields such as voice, signature, or pattern recognition, computational mechanics, data mining, and operations research. Of particular emphasis are their advantages and disadvantages, in order to identify a set of measures, metrics, and algorithms that are best suited for vehicle safety applications. We provide references only for the less commonly used metrics. In this paper, a distinction is made between error measures and error metrics. An error measure provides a quantitative value associated with differences in a particular feature of time series. An error metric provides an overall quantitative value of the discrepancy between time series; it can be a single error measure or a combination of error measures. Typically, an error measure does not provide a complete perspective of time series differences to be used reliably as an error metric.

We consider a simple example comprising time histories of the same physical measure obtained from three different tests. Time histories test 2 and test 3 are compared with time history test 1 to determine which one has the smallest discrepancy and is thus the better prediction of test 1 (Fig. 1). The reader should not be biased as to which of the time histories (test 2 or test 3) is closer to time history test 1. These time histories are used to demonstrate that there is a need for objective metrics and that existing measures must be used appropriately.

Fig. 1 Time history examples

2.1 Vector Norms. When time histories are discretized (i.e., finite-dimensional), the most popular measure for quantifying their difference is to use vector norms. Assuming two time history vectors A and B of equal size N, the L_p norm of the difference of the two is

||A − B||_p = ( Σ_{i=1}^{N} |a_i − b_i|^p )^{1/p}   (1)

The three most popular norms are L1, L2 (Euclidean), and L∞. The results obtained when using these three norms for measuring the discrepancy between test 1 and test 2 and between test 1 and test 3 are presented in Table 1 and confirm the known fact that norm choice may lead to different conclusions: one would conclude that test 2 is closer to test 1 when using the L1 and L∞ norms, while the use of the L2 norm would lead to the conclusion that test 3 is, in fact, closer to test 1. The major limitation of using norms (and the reason for the illustrated differences) is that they are not capable of distinguishing error due to phase from error due to magnitude. Even with this limitation, norms form the foundation for quantifying discrepancy between time histories.

Table 1 Results for the L1, L2, and L∞ norms

Norm   Test 1 and test 2   Test 1 and test 3
L1     0.3                 0.45
L2     0.6                 0.58
L∞     0.82                0.85

2.2 Average Residual and Its Standard Deviation. The average residual measures the mean difference between two time histories:

R̄ = ( Σ_{i=1}^{N} (a_i − b_i) ) / N   (2)

A distinct disadvantage is that positive and negative differences at various points may cancel each other out. The standard deviation of residuals is defined as the square root of the sample variance of the residuals:

S_{N−1} = sqrt( Σ_{i=1}^{N} (R_i − R̄)² / (N − 1) )   (3)

where R_i = a_i − b_i. The results for the time history examples shown in Fig. 1 are presented in Table 2. The results cannot lead to conclusive statements regarding which test (2 or 3) is closer to test 1, as the measures of average residual and its standard deviation are conflicting.

Table 2 Results for average residual and its standard deviation

Measure    Test 1 and test 2   Test 1 and test 3
R̄          0.8                 3.8
S_{N−1}    7.7                 6.4
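For concreteness, the measures of Eqs. (1)–(3) can be sketched in a few lines of Python. The arrays below are made-up illustrations, not the test data of Fig. 1:

```python
import numpy as np

def lp_norm(a, b, p):
    """L_p norm of the difference between two equal-length histories (Eq. (1))."""
    d = np.abs(a - b)
    if np.isinf(p):
        return d.max()
    return (d ** p).sum() ** (1.0 / p)

def residual_stats(a, b):
    """Average residual (Eq. (2)) and sample standard deviation of residuals (Eq. (3))."""
    r = a - b
    return r.mean(), r.std(ddof=1)  # ddof=1 gives the (N - 1) denominator

a = np.array([0.0, 1.0, 2.0, 1.0])
b = np.array([0.5, 1.5, 1.5, 1.0])

print(lp_norm(a, b, 1))        # 1.5
print(lp_norm(a, b, 2))        # sqrt(0.75), about 0.866
print(lp_norm(a, b, np.inf))   # 0.5
print(residual_stats(a, b))    # (-0.125, about 0.479)
```

As in Table 1, different choices of p can rank the same pair of histories differently, which is why the norms alone are not used as a metric here.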

2.3 Coefficient of Correlation and Cross-Correlation. The coefficient of correlation is a measure that indicates the extent of linear relationship between two time histories, i.e., to what extent A can be represented as mB + c. The coefficient of correlation can range from −1 to +1. The value of +1 represents a perfect positive linear relationship between the time histories, which implies that they are identical in shape. A value of −1 would indicate a perfect negative linear relation, which would indicate that the two time histories are mirror images of each other. The coefficient of correlation is computed as

ρ = Σ_{i=1}^{N} (a_i − ā)(b_i − b̄) / sqrt( Σ_{i=1}^{N} (a_i − ā)² · Σ_{i=1}^{N} (b_i − b̄)² )   (4)

The square of the coefficient of correlation is called the coefficient of determination and is commonly known as R-square.

The results of applying this measure to the previous time history examples are presented in Table 3 and indicate that test 3 is better correlated with test 1 than is test 2. However, the R-square values for tests 2 and 3 are very low, and hence neither seems to be close to test 1. This is mainly because these measures are sensitive to phase difference and cannot distinguish between error due to phase and error due to magnitude.

Table 3 Results for coefficient of correlation and R-square

Measure    Test 1 and test 2   Test 1 and test 3
ρ          0.5                 0.6
R-square   0.25                0.36

A modification to the concept of coefficient of correlation used in signal processing is called cross-correlation. It is sometimes called the sliding dot product and has applications in the fields of pattern recognition and cryptanalysis. It can be used to measure the phase lag between two time histories. Cross-correlation is a series defined as

ρ_n = [ (N − n) Σ_{i=1}^{N−n} a_i b_{i+n} − Σ_{i=1}^{N−n} a_i · Σ_{i=1}^{N−n} b_{i+n} ] / sqrt( [ (N − n) Σ_{i=1}^{N−n} a_i² − ( Σ_{i=1}^{N−n} a_i )² ] [ (N − n) Σ_{i=1}^{N−n} b_{i+n}² − ( Σ_{i=1}^{N−n} b_{i+n} )² ] )   (5)

where n = 0, 1, ..., N − 1. To compute the phase difference between two time histories, we determine the shift n* that maximizes ρ_n; n* is then a measure for phase lag. This concept has been used by Liu et al. [5] and Gu and Yang [6] and is also included as a metric in ADVISER, a commercial software package that contains a simulation model quality rating module [7,8] for vehicle safety applications.

2.4 Sprague and Geers (S&G) Metric. Geers [9] proposed an error measure for comparing time histories that combined the errors due to magnitude and phase differences. Recently, Sprague and Geers updated the phase error portion of the metric [10,11]. The errors in magnitude and phase are computed for the time histories by using Eqs. (6) and (7), respectively. The combined error C_S&G is then used to provide an overall error measure between the two time histories.

M_S&G = sqrt( ψ_AA / ψ_BB ) − 1   (6)

P_S&G = (1/π) cos⁻¹( ψ_AB / sqrt( ψ_AA ψ_BB ) )   (7)

C_S&G = sqrt( M_S&G² + P_S&G² )   (8)

where

ψ_AA = Σ_{i=1}^{N} a_i² / N,   ψ_BB = Σ_{i=1}^{N} b_i² / N,   ψ_AB = Σ_{i=1}^{N} a_i b_i / N

The results of applying the S&G metric to the time history examples are presented in Table 4. The S&G metric quantifies a lower magnitude error for test 2 and a lower phase error for test 3. The combined error is lower for test 2, indicating that test 2 is closer to test 1 than test 3. The limitation of the S&G metric is that it is not symmetric: the results depend on the time history that is used as a reference in Eq. (6).

Table 4 Results for S&G metric

         Test 1 and test 2   Test 2 and test 1   Test 1 and test 3   Test 3 and test 1
M_S&G    0.08                0.08                0.67                0.40
P_S&G    0.20                0.20                0.17                0.17
C_S&G    0.22                0.22                0.70                0.44

The separation of the error into magnitude and phase components is an advantage when more detailed investigation of the error sources is necessary. But the metric lumps the entire information of the time histories into ψ_AA, ψ_BB, and ψ_AB. Consequently, this metric cannot consider the shape of the time histories. This limitation is illustrated by the example in Fig. 2: the two simple time histories have the same value for ψ_AA and ψ_BB but differ from each other in magnitude, phase, and shape. Even though there exists an error in magnitude, the S&G metric quantifies it as zero.

2.5 Russell's Error Measure. Russell [12,13] proposed a set of magnitude, phase, and comprehensive error measures to provide a robust means for quantifying the difference between time histories. The metric is similar to the S&G metric with a modification in the magnitude error factor. The magnitude error factor is defined such that it has approximately the same scale as the phase error when there exists an order-of-magnitude difference in amplitude of the responses. These are then combined to form the comprehensive error factor, similar to the S&G metric. The magnitude error factor is given by

M_R = sign(ψ_AA − ψ_BB) · log10( 1 + |ψ_AA − ψ_BB| / sqrt( ψ_AA ψ_BB ) )   (9)

The results of applying the Russell metric to the time history examples are presented in Table 5.
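A short sketch of the S&G measures of Eqs. (6)–(8) and the Russell magnitude factor of Eq. (9), using illustrative synthetic signals rather than the paper's test data, makes the asymmetry contrast concrete: swapping A and B changes the S&G magnitude term but flips only the sign of the Russell term, while the phase term is symmetric in both.

```python
import numpy as np

def sg_metric(a, b):
    """Sprague & Geers magnitude, phase, and combined errors (Eqs. (6)-(8))."""
    psi_aa = np.mean(a * a)
    psi_bb = np.mean(b * b)
    psi_ab = np.mean(a * b)
    m = np.sqrt(psi_aa / psi_bb) - 1.0                        # Eq. (6)
    p = np.arccos(psi_ab / np.sqrt(psi_aa * psi_bb)) / np.pi  # Eq. (7)
    c = np.sqrt(m * m + p * p)                                # Eq. (8)
    return m, p, c

def russell_magnitude(a, b):
    """Russell magnitude error factor (Eq. (9)); antisymmetric in its arguments."""
    psi_aa = np.mean(a * a)
    psi_bb = np.mean(b * b)
    diff = psi_aa - psi_bb
    return np.sign(diff) * np.log10(1.0 + abs(diff) / np.sqrt(psi_aa * psi_bb))

t = np.linspace(0.0, 1.0, 200)
a = 1.2 * np.sin(2 * np.pi * t)       # "reference" history (amplitude 1.2)
b = np.sin(2 * np.pi * t + 0.2)       # scaled, slightly shifted "prediction"

m_ab, p_ab, _ = sg_metric(a, b)
m_ba, p_ba, _ = sg_metric(b, a)
print(m_ab, m_ba)   # roughly 0.20 and -0.17: not mere sign flips, hence asymmetric
print(p_ab, p_ba)   # identical: the phase term is symmetric
print(russell_magnitude(a, b), russell_magnitude(b, a))  # equal magnitude, opposite sign
```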

Even though Russell's error measure overcomes the limitation of asymmetry observed in the S&G metric, it still fails to identify and quantify the magnitude error of the example shown in Fig. 2 (i.e., the magnitude error between these two time histories is still computed as equal to zero).

Fig. 2 Failure of S&G metric to quantify error due to magnitude

Table 5 Results for Russell metric

Measure   Test 1 and test 2   Test 1 and test 3
M_R       0.064               0.32
P_R       0.20                0.17
C_R       0.21                0.36

2.6 Normalized Integral Square Error (NISE). The NISE is used to quantify the difference between time histories from repeated tests, e.g., see Ref. [14]. It measures the difference between two time histories and is related in principle to the concept of cross-correlation. It considers three aspects: phase shift, amplitude (magnitude) difference, and shape difference.

It uses the cross-correlation principle from Sec. 2.3 to compute the shift n* and the corresponding correlation ρ_{n*}. It then shifts one of the time histories (A or B) relative to the other by n* steps to compensate for the error in phase. The quantity ψ_AB(n*) is computed after this adjustment. The equations for the phase, magnitude, and shape error are given in Eqs. (10)–(12), respectively.

P_NISE = ( 2ψ_AB(n*) − 2ψ_AB ) / ( ψ_AA + ψ_BB )   (10)

M_NISE = ρ_{n*} − 2ψ_AB(n*) / ( ψ_AA + ψ_BB )   (11)

S_NISE = 1 − ρ_{n*}   (12)

The overall NISE for two time histories is given by

C_NISE = P_NISE + M_NISE + S_NISE = 1 − 2ψ_AB / ( ψ_AA + ψ_BB )   (13)

The results of applying the NISE metric to the time history examples are presented in Table 6. Even though NISE attempts to consider shape error, the overall measure C_NISE is independent of ρ_{n*}, as this term is cancelled out; hence, it does not account for shape error. An interesting observation is that the magnitude error contribution to the NISE error can be negative, i.e., the magnitude error can decrease the overall combined error.

Table 6 Results for NISE metric

          Test 1 and test 2   Test 1 and test 3
P_NISE    0.18                0.09
M_NISE    −0.045              0.014
S_NISE    0.06                0.15
C_NISE    0.20                0.25

2.7 Dynamic Time Warping (DTW). DTW is an algorithm for measuring discrepancy between time histories and was first used in the context of speech recognition in the 1960s [15]. Since then, it has been used in a variety of applications: computer vision (e.g., Ref. [16]), data mining (e.g., Ref. [17]), signature matching (e.g., Ref. [18]), and polygonal shape matching (e.g., Ref. [19]). The ability of DTW to identify that two time histories with time shifts are a match makes it an important similarity identification technique in speech recognition [20], since human speech consists of varying durations and paces. The time warping technique aligns peaks and valleys as much as possible by expanding and compressing the time axis according to a given cost (distance) function [21].

As an example, consider the cost function d(i, j) = (a_i − b_j)², in which a_i is the ith element of time history A (a_i = A(t_i)), b_j is the jth element of time history B (b_j = B(t_j)), and i, j = 1, 2, ..., N, where N is the total number of time samples (the lengths of A and B are assumed to be the same in this case). Let w_k = (i_k, j_k) denote the indices of an ordered pair of time samples from A and B. The DTW algorithm then finds a monotonically increasing sequence of ordered adjacent pairs such that the cumulative cost function (the sum of the cost functions over k = 1, 2, ..., N) is minimized. That is, a sequence w_1, w_2, ..., w_N is found that minimizes the cost function subject to the constraints that (i) the sequence must progress one step at a time (0 ≤ i_k − i_{k−1} ≤ 1 and 0 ≤ j_k − j_{k−1} ≤ 1, k = 2, 3, ..., N) and (ii) the sequence is monotonically increasing (w_k − w_{k−1} ≥ 0, k = 2, 3, ..., N).

The results of the DTW algorithm for the time history example are shown in Figs. 3 and 4. The DTW distance (the square root of the cumulative cost function) for test 2 is 768, while the DTW distance for test 3 is 5636. Consequently, test 2 is found to be a closer representation of test 1 than is test 3 with respect to the DTW distance.

3 Proposed Error Measures

Several measures used to quantify discrepancy (or error) between time histories have been discussed in the previous section. Each has its own advantages and limitations. The concepts of magnitude and phase measures were introduced, and different approaches to measuring and combining these measures into a single metric were articulated. In the signal processing literature, a third signal measure is given by frequency. That is, for a simple harmonic signal, the time history can be described by

y(t_i) = Y cos(ω t_i + φ)   (14)

in which Y is the amplitude, ω is the frequency, φ is the phase, and t_i is the value of time at time index i. The difficulty in quantifying the error associated with the features of phase, magnitude, and frequency separately is that they can be coupled strongly. For example, when quantifying the error associated with magnitude, the presence of a phase difference between the time histories may result in a misleading measurement. Thus, it is important to minimize the influence of the other two features when quantifying the error due to the third one. While this can be accomplished using standard signal processing techniques such as fast Fourier transforms (FFTs), transformation to the frequency domain is less useful for signals with richer content than pure harmonics, such as vehicle safety-related time histories.
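The DTW alignment of Sec. 2.7 can be sketched with a standard dynamic-programming recursion. This toy version uses the plain squared-distance cost d(i, j) = (a_i − b_j)² with the usual unit-step, monotone constraints (not the slope-augmented cost the paper proposes later), and synthetic signals rather than the data of Figs. 3 and 4:

```python
import numpy as np

def dtw_distance(a, b):
    """DTW with squared-distance cost and unit step constraints.

    Returns the square root of the minimum cumulative cost, analogous to
    the DTW distance the paper reports for its examples.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # each step advances i, j, or both by one (monotone, adjacent pairs)
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

t = np.linspace(0.0, 2 * np.pi, 50)
ref = np.sin(t)
shifted = np.sin(t - 0.4)    # same shape, shifted in time
scaled = 2.0 * np.sin(t)     # same phase, different magnitude

print(dtw_distance(ref, shifted))  # small: warping absorbs most of the time shift
print(dtw_distance(ref, scaled))   # larger: warping cannot compensate magnitude
```

This illustrates why the paper uses DTW specifically to remove local phase effects before measuring magnitude: time shifts are cheap for the warping, amplitude differences are not.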

Fig. 3 DTW results for time histories 1 and 2

Fig. 4 DTW results for time histories 1 and 3

In this section, we propose measures to quantify magnitude and phase error. To minimize the influence of phase on the magnitude error, we employ the DTW algorithm and a suitable error function. To capture the complex behavior of frequency content, we introduce a slope measure, which captures local frequency discrepancies.

3.1 Phase Error. To quantify the error due to phase, we considered the phase measure used by Sprague and Geers and by Russell in their metric (Eq. (7)) and the cross-correlation technique presented in Sec. 2.3. The cross-correlation based method for quantifying error in phase was used in Ref. [22], shifting one of the time histories to maximize the correlation coefficient. This shift is considered to be the measure for error in phase.

We compared the performances of the cross-correlation method and the S&G phase error and concluded that the cross-correlation method has greater sensitivity to phase differences. An example to illustrate this is presented in Fig. 5. It is evident that there exists a much larger phase difference between the computer-aided engineering (CAE)-1 and test time histories than between the CAE-2 and test time histories (note that the time history examples in this section are not related to the ones in the previous section). The S&G phase error quantification was identical for both cases, while the cross-correlation quantification provided different values. Thus, we use the cross-correlation technique to quantify phase error in our metric.

Fig. 5 Example to compare S&G phase measure to cross-correlation

The number of shifted time steps n* is a linear type of measure for phase error. In practical applications, small time step differences should be viewed as local, rather than global, phase error. Consequently, small time step differences should not be weighted as heavily as large time step differences in the total phase error measure. We chose a penalty function that can be tuned to suit a particular application:

Error_phase = e^{(n* − c)/r}   (15)

where c and r are parameters that define the rise start point and rate of increase for the function. That is, c provides a measure of the time shift value below which the phase error can be considered negligible, while r affects the rate of phase error increase above the critical value given by c. For our safety applications, c = 15 and r = 20, based on the subject matter experts' assessment that phase shifts less than 1.2 ms are negligible.

3.2 Magnitude Error. To quantify the error associated only with magnitude, we need to first minimize the discrepancy between the time histories caused by error in phase and frequency. We can compensate for global phase error by shifting the time history by the number of steps n* computed for the phase error. However, the time-shifted history may still exhibit local phase errors. To address the local effects, we apply DTW to the time-shifted time history and to the reference time history.

The cost function selected for DTW considers not only the distance but also the slope between two points:

    d(i, j) = \left( a_i^{ts} - b_j^{ts} \right)^2 + \left( t_i - t_j \right)^2 + \left( \left. \frac{dA^{ts}}{dt} \right|_{t=t_i} - \left. \frac{dB^{ts}}{dt} \right|_{t=t_j} \right)^2    (16)

    in which the superscript ts is used to denote time-shifted time histories, although only one time history is shifted in practice. This ensures the mapping of a point to the closest point having similar slope on the other time history and thus minimizes both local phase and local frequency differences between the two time histories.

    Figure 6 depicts two time history examples before and after DTW using Eq. (16). It is apparent that DTW minimizes the local phase and frequency effects. We then use the L1 norm on the warped time-shifted histories to isolate the relative magnitude error between the two time histories:

    \mathrm{Error}_{\mathrm{magnitude}} = \frac{\left\| A^{ts+w} - B^{ts+w} \right\|_1}{\left\| B^{ts+w} \right\|_1}    (17)

    in which the superscript ts+w denotes the phase-shifted, DTW-modified time histories.

    Fig. 6 Illustration of DTW effect on time histories: (top) time histories before DTW and (bottom) time histories after DTW

    3.3 Slope Error. As frequency is a global measure, we employ the slope of the time history at each time point, following the rationale that the time derivative of a harmonic time history, Eq. (14), provides a direct value for frequency. Therefore, the slope error is computed from the derivative of the time histories. Considering the derivative information ensures that the effect of magnitude is compensated for, as the derivative depends on the slope and not on the amplitude. To minimize the effect of global phase error, the slope is calculated for the time-shifted histories. Then, taking the derivative at each point, we obtain derivative time-shifted histories, represented by A^{ts+d} and B^{ts+d}. The effect of localized time shifts still exists, so the DTW algorithm is applied. The L1 norm of the DTW time-shifted histories is then used to quantify the isolated contribution of slope error:

    \mathrm{Error}_{\mathrm{slope}} = \frac{\left\| A^{ts+d+w} - B^{ts+d+w} \right\|_1}{\left\| B^{ts+d+w} \right\|_1}    (18)

    in which the superscript ts+d+w denotes that the time histories were processed by the sequence of time shifting for global phase effect, derivative computation, and finally, DTW.

    4 Example

    In this section, we present results from the application of the

    proposed error measures using data from a case study provided by an International Standards Organization (ISO) working group on virtual testing (ISO technical committee TC 22, subcommittees SC 10 and 12, and working group WG 4). An experimental test setup used available crash pulses to record acceleration time histories at different locations of a dummy during impact: head, thorax, and tibia. For the head impact case, three experiments were conducted. Eleven time history responses were recorded. Three CAE simulations were conducted, employing a different computer simulation code for each model. We present here the error measures for three responses of the head impact case: head impactor displacement, head acceleration in the x-direction, and neck force in the x-direction. Figure 7 provides plots of the time histories for these three physical responses from the experiments and the simulations. The complete set of results can be found in Ref. 23.

    We quantify error between the different tests and the computational models for each response individually. For each response, we compare tests among themselves to obtain error measures between test repetitions. We then compare the computational model predictions to each of these tests to obtain a measure for the discrepancy between test and computational data. If the error between tests is greater than or equal to the error between the computational model and the tests, we may infer that the computational model is adequate.

    To illustrate this idea, we consider the error measures relative to one test, test 1. In practice, the error measure quantification is performed using each available test data set as the baseline case. We compare the remaining two tests and the three computational models to test 1. We then have the following three cases:

    1. Looking at one response at a time, if the values of all three error measures for a computational model are less than or equal to the respective error values for the tests, we may conclude that the computational model is a good representation of reality.

    2. Looking at one response at a time, if all three error measure values for one computational model are less than all three error measure values for another computational model, we may conclude that the first model is better than the second model.

    3. Looking across all responses, if we find that one computational model performs well for all of the responses, we can conclude that it is better than the other models collectively.
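The first two acceptance cases above amount to componentwise comparisons of the (phase, magnitude, slope) error triples. The helper below is a sketch of that comparison logic only; the function names and the numeric values in the usage example are hypothetical.

```python
def model_adequate(model_errors, test_errors):
    """Case 1: a model is adequate for a response if each of its
    (phase, magnitude, slope) errors is <= the corresponding error
    observed between test repetitions."""
    return all(m <= t for m, t in zip(model_errors, test_errors))

def model_better(errors_1, errors_2):
    """Case 2: model 1 is better than model 2 if all three of its
    error measures are strictly smaller."""
    return all(e1 < e2 for e1, e2 in zip(errors_1, errors_2))

# usage with hypothetical (phase, magnitude, slope) values
test_to_test = (0.60, 0.25, 0.40)   # error between test repetitions
model_a = (0.50, 0.20, 0.30)
model_b = (0.70, 0.30, 0.45)
```

Case 3 is then a conjunction of such per-response checks across all responses of interest.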

    Figure 8 depicts the results for the three considered responses. For the head impactor response, all three error measure values for all three computational models are less than or equal to the error measure values for test 2 and test 3. Thus, we may conclude that all of the computational models are adequate for the head impactor response. As there are negligible differences in the error measures for the three computational models, they can be considered to have equally good representation of the head impactor response.


    For the head acceleration in the x-direction, the computational models have acceptable error only in the phase component; the models have larger magnitude and slope errors compared with the tests. In addition, the three computational models do not exhibit consistently better or worse errors, so no conclusive ranking of the models can be made. For the neck force response in the x-direction, only the phase error is acceptable for all three models. However, the computational models exhibit consistent magnitude and slope errors, with model 1 being the best and model 3 being the worst. In this case, we can rank the models. The values of the error measures shown in Fig. 8 are consistent with the qualitative visual differences that can be observed in Fig. 7 for phase, magnitude, and slope among the tests and computational time histories.

    Fig. 7 Computational results and test data for head impactor displacement (top), head acceleration in the x-direction (middle), and neck force in the x-direction (bottom)

    5 Building Regression-Based Validation Metrics Using Ratings of Subject Matter Experts

    In Sec. 4, we presented the results individually from the three proposed error measures. It is apparent that no single error measure can provide a quantitative metric regarding the match between time history responses. Instead, as was done to develop the S&G error metric, a combination of error measures is needed. Consequently, a rational procedure is required to develop such a combination of error measures. In this work, we rely on the opinions of subject matter experts (SMEs) to build and train a regression model for model validation. Subject matter experts are individuals with long experience in a particular discipline. They are thus trusted to evaluate and rank the predictive capability of computational models by mostly visual inspection of comparison plots. We use SME ratings of computational models and the three proposed error measures to build a regression-based validation metric that can validate and/or rank other computational models. Comparisons with other metrics in use commercially are made to assess the robustness of the developed regression model.

    Fig. 8 Sample of results for head impact case

    We consider a case previously reported in Ref. 24, where a deceleration time history from a crash is known by means of physical experiment. Fifteen computational models had been developed to predict the deceleration time history for this crash (these models are not necessarily different computational models but can include different substantiations of the same computational model due to different parameter values chosen for the models). Six SMEs were presented with the fifteen comparison plots (one for each model), and the average SME rating of the models was recorded. Ratings range from 1 (worst match) to 10 (excellent match). Figure 9 depicts a typical comparison plot that was shown to the SMEs.

    Fig. 9 A typical plot presented to the SMEs

    We used ten of the available fifteen data sets and SME ratings to build a regression-based validation metric. We then used the remaining five data sets to test our model. Many combinations are possible for choosing which ten data sets to use to build the regression model; a full discussion of the combinations is found in Ref. 23. Table 7 presents the individual and average SME ratings for the time histories associated with the training and test data sets for one particular training set selection. Each computational model (CAE) is identified with an ID number, and the data sets have been sorted in ascending order of average SME rating.

    Table 7 SME ratings for the fifteen CAE models; the first ten models were used to build the regression-based validation metric, and the last five were used to test it

    CAE ID  SME 1  SME 2  SME 3  SME 4  SME 5  SME 6  Average rating
    1188      5      3      1      2      3      4      3.00
    1189      3      4      4      3      3      3      3.33
    1130      4      4      4      3      4      5      4.00
    1047      5      4      5      4      5      5      4.67
    1020      6      4      5      6      7      4      5.33
    1041      6      5      5      6      6      5      5.50
    1028      7      6      5      5      7      6      6.00
    1005      8      7      7      6      8      6      7.00
    1083      7      7      7      9      9      7      7.67
    1052      8      7      8     10     10      8      8.50

    1042      4      3      4      3      4      4      3.67
    1100      5      4      3      3      4      6      4.17
    1009      7      6      6      7      7      6      6.50
    1016      7      7      7      8      9      5      7.17
    1022      8      9      8     10     10      8      8.83

    The error measures computed using the three proposed error measures are given in Table 8 for the 15 data sets. It is worth noting that the relatively large phase error for CAE 1189 is reflected by the low SME rating for this model.

    Table 8 Error measure values for the CAE models

    CAE ID  Phase  Magnitude  Slope
    1188     0.52    0.42     0.43
    1189    51.94    0.20     0.33
    1130     1.73    0.31     0.51
    1047     0.67    0.22     0.41
    1020     0.61    0.19     0.48
    1041     0.50    0.22     0.31
    1028     0.52    0.19     0.27
    1005     0.50    0.16     0.23
    1083     0.52    0.12     0.33
    1052     0.52    0.09     0.25
    1042     0.64    0.33     0.45
    1100     1.16    0.28     0.69
    1009     0.70    0.16     0.28
    1016     0.61    0.17     0.44
    1022     0.52    0.08     0.22

    The three error measures were combined using a regression model to predict the average SME ratings. We built a linear regression model using the following first-degree polynomial to fit the error measure values to the SME average ratings:

    R_p = 10 - \left( c_1\,\mathrm{Error}_{\mathrm{phase}} + c_2\,\mathrm{Error}_{\mathrm{magnitude}} + c_3\,\mathrm{Error}_{\mathrm{slope}} \right)    (19)

    where R_p denotes the predicted rating, recalling that a rating of 10 is an excellent match, implying no error.

    Figure 10 depicts the regression model rating predictions on the ten time histories used to build the model and on the five remaining time histories, relative to the average SME ratings (the bars for the SME ratings represent the range of the individual SME ratings). It can be seen that the validation metric assessments agree well with the SME ratings, and the predictions always fall within the range of the individual SME ratings. While the results presented are for only one training set, the same performance was observed for all regression models we built using different combinations of training and test time histories (Ref. 23).

    It is instructive to compare the rating predictions of our regression-based validation metric to the rating predictions of four existing metrics used currently for this particular application: the wavelet decomposition method, the step function, the ADVISER model evaluation criteria, and corridor violation plus area. A complete description of these metrics may be found in Ref. 24. It should be noted that a linear regression approach was used to combine
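Using the training rows of Tables 7 and 8, the fit of Eq. (19) can be sketched with ordinary least squares on the rating deficit (10 - rating). This is an illustrative reconstruction, not the authors' exact procedure, and the resulting coefficients are computed here rather than taken from the paper.

```python
import numpy as np

# (phase, magnitude, slope) error measures for the ten training
# models, in Table 7/8 order (CAE 1188 ... CAE 1052)
X_train = np.array([
    [0.52, 0.42, 0.43], [51.94, 0.20, 0.33], [1.73, 0.31, 0.51],
    [0.67, 0.22, 0.41], [0.61, 0.19, 0.48], [0.50, 0.22, 0.31],
    [0.52, 0.19, 0.27], [0.50, 0.16, 0.23], [0.52, 0.12, 0.33],
    [0.52, 0.09, 0.25],
])
avg_rating = np.array([3.00, 3.33, 4.00, 4.67, 5.33,
                       5.50, 6.00, 7.00, 7.67, 8.50])

# Eq. (19): R_p = 10 - (c1*phase + c2*magnitude + c3*slope),
# so fit the coefficients to the rating deficit (10 - rating)
c, *_ = np.linalg.lstsq(X_train, 10.0 - avg_rating, rcond=None)
R_p = 10.0 - X_train @ c                  # predicted ratings

mae = np.mean(np.abs(R_p - avg_rating))   # training fit quality
```

The five held-out rows of the tables can then be pushed through the same `10.0 - X_test @ c` expression to reproduce the test-set comparison described in the text.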


    the individual error measures in the existing metrics used for comparison.

    Figure 11 presents a comparison of the existing metrics and the metric proposed in this work, which is labeled as the error assessment of response time histories (EARTH) metric. The EARTH and wavelet decomposition metrics predict the SME ratings quite well. While different training sets yield slightly different absolute results, in aggregate, the EARTH metric provides a more robust measure of SME average rating across all training sets than the four commonly used metrics studied.

    In all the regression models we built, EARTH consistently predicted SME ratings well. This indicates that EARTH is capable of recognizing the key features associated with the time histories for this application and providing an overall error measure by combining them.

    Fig. 10 Regression-based validation metric: data fit and test

    Fig. 11 Comparison of EARTH to other metrics

    6 Conclusions

    The objective of the research presented in this paper was to evaluate existing measures for assessing the error between time histories and to propose a set of measures that can quantify error of complex time histories associated with vehicle safety applications.

    We adopted the idea of classifying error into phase and magnitude based on existing metrics. We enhanced this concept by using DTW to separate the effects of phase and magnitude. In addition, to provide a measure of error due to differences in shape, we introduced the concept of slope error, using the slope time history, to account for shape discrepancy. The DTW algorithm also was employed when assessing slope error.

    The applicability of the proposed error measures was demonstrated through two case studies pertaining to vehicle safety. The first case study illustrates how the proposed measures can be used

    to assess the predictive capability of computational models. The second case study showed how the measures can be used in conjunction with SME data to develop regression-based models to validate simulation models. A comparison with four existing metrics for model validation in vehicle safety applications demonstrated that the proposed metric agrees well with SME ratings.

    The methods presented are the first step toward developing a fully realized validation metric. Following the guidelines of Oberkampf and Barone [3], with effective error measures in place, the need exists to incorporate uncertainty related to experimental and numerical error and to incorporate information regarding the number of experiments available. This work is being conducted, following some of the methodologies proposed by Rebba and Mahadevan [4].

    Acknowledgment

    The authors would like to thank Dr. Guosong Li of Ford Motor Co. and Dr. Matt Reed of the University of Michigan Transportation Research Institute (UMTRI) for providing data and helpful feedback and suggestions. This work has been supported partially by Ford Motor Co. (University Research Project No. 20069038) and by the Automotive Research Center (ARC), a U.S. Army Center of Excellence in Modeling and Simulation of Ground Vehicles led by the University of Michigan. Such support does not constitute an endorsement by the sponsors of the opinions expressed in this paper.

    References

    [1] American Institute of Aeronautics and Astronautics, 1998, Guide for the Verification and Validation of Computational Fluid Dynamics Simulations.
    [2] American Society of Mechanical Engineers, 2003, Council on Codes and Standards, Board of Performance Test Codes: Committee on Verification and Validation in Computational Solid Mechanics.
    [3] Oberkampf, W. L., and Barone, M. F., 2006, "Measures of Agreement Between Computation and Experiment: Validation Metrics," J. Comput. Phys., 217(1), pp. 5–36.
    [4] Rebba, R., and Mahadevan, S., 2006, "Model Predictive Capability Assessment Under Uncertainty," AIAA J., 44, pp. 2376–2384.
    [5] Liu, X., Yan, F., Chen, W., and Paas, M., 2005, "Automated Occupant Model Evaluation and Correlation," Proceedings of the 2005 ASME International Mechanical Engineering Congress and Exposition, Orlando, FL.
    [6] Gu, L., and Yang, R. J., 2004, "CAE Model Validation in Vehicle Safety Design," SAE Technical Paper Series, Paper No. 2004-01-0455.
    [7] 2007, ADVISER Reference Guide, 2.5 ed.
    [8] Jacob, C., Charras, F., Trosseille, X., Hamon, J., Pajon, M., and Lecoz, J. Y., 2000, "Mathematical Models Integral Rating," Int. J. Crashworthiness, 5(4), pp. 417–432.
    [9] Geers, T. L., 1984, "Objective Error Measure for the Comparison of Calculated and Measured Transient Response Histories," Shock and Vibration Bulletin, 54, pp. 99–107.
    [10] Sprague, M. A., and Geers, T. L., 2004, "A Spectral-Element Method for Modelling Cavitation in Transient Fluid-Structure Interaction," Int. J. Numer. Methods Eng., 60(15), pp. 2467–2499.
    [11] Schwer, L. E., 2005, "Validation Metrics for Response Histories: A Review With Case Studies," Schwer Engineering & Consulting Services Technical Report.
    [12] Russell, D. M., 1997, "Error Measures for Comparing Transient Data: Part I, Development of a Comprehensive Error Measure," Proceedings of the 68th Shock and Vibration Symposium, Hunt Valley, MD.
    [13] Russell, D. M., 1997, "Error Measures for Comparing Transient Data: Part II, Error Measures Case Study," Proceedings of the 68th Shock and Vibration Symposium, Hunt Valley, MD.
    [14] Donnelly, B. R., Morgan, R. M., and Eppinger, R. H., 1983, "Durability, Repeatability, and Reproducibility of the NHTSA Side Impact Dummy," 27th Stapp Car Crash Conference.
    [15] Rabiner, L. R., and Juang, B. H., 1993, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ.
    [16] Munich, M., and Perona, P., 2003, "Visual Identification by Signature Tracking," IEEE Trans. Pattern Anal. Mach. Intell., 25(2), pp. 200–217.
    [17] Oates, T., Firoiu, L., and Cohen, P., 2000, "Using Dynamic Time Warping to Bootstrap HMM-Based Clustering of Time Series," Springer-Verlag, Berlin, pp. 35–52.
    [18] Faundez-Zanuy, M., 2007, "On-Line Signature Recognition Based on VQ-DTW," Pattern Recogn., 40, pp. 981–992.
    [19] Arkin, E. M., Chew, L. P., Huttenlocher, D. P., Kedem, K., and Mitchell, J. S. B., 1991, "An Efficiently Computable Metric for Comparing Polygonal Shapes," IEEE Trans. Pattern Anal. Mach. Intell., 13(3), pp. 209–216.
    [20] Efrat, A., Fan, Q., and Venkatasubramanian, S., 2007, "Curve Matching, Time Warping, and Light Fields: New Algorithms for Computing Similarity Between Curves," J. Math. Imaging Vision, 27(3), pp. 203–216.
    [21] Chan, F., Fu, A., and Yu, C., 2003, "Haar Wavelets for Efficient Similarity Search of Time-Series: With and Without Time Warping," IEEE Trans. Knowl. Data Eng., 15(3), pp. 686–705.
    [22] Chang, Y., and Seong, P., 2002, "A Signal Pattern Matching and Verification Method Using Interval Means Cross Correlation and Eigenvalues in the Nuclear Power Plant Monitoring Systems," Ann. Nucl. Energy, 29, pp. 1795–1807.
    [23] Sarin, H., 2008, "Error Assessment of Response Time Histories (EARTH): A Metric to Validate Simulation Models," MS thesis, University of Michigan, Ann Arbor, MI.
    [24] Yang, R. J., Li, G., and Fu, Y., 2007, "Development of Validation Metrics for Vehicle Frontal Impact Simulation," Proceedings of the ASME 2007 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Las Vegas, NV.