06 Simple Modeling

download 06 Simple Modeling

of 16

Transcript of 06 Simple Modeling

  • 7/28/2019 06 Simple Modeling

    1/16

    Linear fit of a theoretical relationship

    concentration

    abso

    rbance,

    550nm

    Were just trying to minimize the standard error (SE) for

    our model.

    The optimum value for b1 is the one that minimizes the

    standard error.

    We also must assume that the values are homoscedastic:

    (unweighted least squares)

  • 7/28/2019 06 Simple Modeling

    2/16

    homoscedasticnot homoscedastic

    Weighted least squares

    Used when the data is not homoscedastic.

    Each measurement is weighted by thereciprocal of its variance.

    v y12 ! v y2

    2 ! .... ! v yn2 !_ i

    b1=xi

    2!xi yi! =0.0204 soY=0.0204X

    An ANOVA can can be conducted to determin

    the goodness of fit for the model.

    Source of Variance Sum of Squares DF

    Model p

    Residual n-p-1

    Total n-1

    p = number of parameters.

    n = number of data points.

  • 7/28/2019 06 Simple Modeling

    3/16

    ppm absexp abscalc model residual

    1 0.020 0.020 0.0402 0.0004

    2 0.038 0.0408 0.0198 0.0028

    3 0.064 0.0612 -0.0006 -0.0028

    4 0.077 0.0816 -0.021 0.0046

    5 0.105 0.102 -0.003 -0.0030

    0.00416 0.000046

    Calculate absorbances based on Y = b1X

    b1=0.0204 Y=0.0608

    F test will show that the model accounts for asignificant amount of the variance comparedto the residual. F = 272

    Both examples show a linear regression fit of

    a data set. The one on the right indicates that

    the best model may not be a linear one.

    Y

    +r

    -r

    0 r

  • 7/28/2019 06 Simple Modeling

    4/16

    Correlation, |r| 100 r2

    0.10 1

    0.20 4

    0.50 25

    0.80 64

    0.90 81

    0.95 90

    0.99 99

    1.00 100

    Relationship between correlation coefficient (r)

    and proportion of variance (r2).

    |r| valuesbelow 0.9

    indicate a poor

    relationship.

    Y

    X

    Y

    X

    Y

    X

    Y

    X

    r = 0.90

    r = 1.00

    r = -1.00

    r = 0.60

  • 7/28/2019 06 Simple Modeling

    5/16

  • 7/28/2019 06 Simple Modeling

    6/16

    Regression of absorbance by ppm

    0

    0.02

    0.04

    0.06

    0.08

    0.1

    0.12

    0 1 2 3 4 5 6

    ppm

    absorbance

    95% CI

    95% CI of

    the mean

    Pred(absorbance) / absorbance

    0

    0.02

    0.04

    0.06

    0.08

    0.1

    0.12

    0 0.02 0.04 0.06 0.08 0.1 0.12

    Pred(absorbance)

    absorbance

    Standardized residuals / ppm

    -1.5

    -1

    -0.5

    0

    0.5

    1

    0 1 2 3 4 5 6

    ppm

    Standardizedr

    esiduals

  • 7/28/2019 06 Simple Modeling

    7/16

    In some cases, it is best to do a

    simple transformation of your data

    prior to attempting a linear

    regression fit.

    The goal of the transformation is tomake the resulting relationship

    linear.

    An ANOVA analysis can be conducted

    on the transformed data.

    Transform Equation of the lin

    Y X Y = b X + a

    Y 1/X Y = a + b / X

    1/Y X Y = 1 / ( a + b X )

    X/Y X Y = X / ( a + b X )

    log Y X Y = a bX

    log Y log X Y = a Xb

    Y Xn Y = a + b Xn

    b = slope, a = intercept.

    MLR assumes a linear relationship between

    Xi and y, with superimposed noise (e). It

    also assumes that there are no

    interactions between X1, X2, Xn.

    Y = b0 + b1X1 + b2X2 + . . . . bnXn + e

    We then fit a regression equation. The bn

    values are considered to be estimates of

    the true population parameters, n.

    Y=b1X1+b2X2

    b1 X12! +b2 X1X2= YX1!!

    b2 X22! +b1 X1X2= YX2!!

    R2 simply looks a how well all of the X

    account for Y. The adjusted R2 is weighted

    by the number of X used.

    R2=

    SSTotalSSTotal-SSresidual

    adjusted R2=MSTotal

    MSTotal-MSresidual

  • 7/28/2019 06 Simple Modeling

    8/16

    One advantage of an MLR approach is that you

    can obtain multiple measurements (X) for a

    single response. This can help eliminate noise.

    Well look at one example using MLR -Determination of Octane Number by NIR.

    Well revisit this example in a later unit when

    we compare it to other multivariate calibration

    methods.

    A 915 nm, CH2

    stretch B 1021 nm, CH2

    /CH3

    combination band

    C 1151 nm, aromatic and CH3

    stretch D 1194 nm, CH3

    stretch

    E 1394 nm, CH2

    combination bands F 1412 nm aromatic & CH2

    combination bands

    G 1435 nm aromatic & CH2 combination bands

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    900 1000 1100 1200 1300 1400 1500 1600

    Wavelength

    Absorbance

    A

    C

    B

    D

    E

    F

    G

    Single variable - 900 - just for comparison.

    80

    82

    84

    86

    88

    90

    92

    94

    96

    98

    0.21 0.215 0.22 0.225 0.23 0.235

    900

    Octane#

    Active

    Validation

    Model

    Conf. interv

    (Mean 95%

    Conf. interv

    (Obs. 95%

  • 7/28/2019 06 Simple Modeling

    9/16

    Standardized residuals / 900

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    0.21 0.215 0.22 0.225 0.23 0.235

    900

    Standardized

    residuals

    Active

    Validation

    Using all values results in a significant

    improvement in the fit..

    Sum of squares analysis shows which lines are

    the most significant for the fit.

    What happened to the other variables?

    Pred(Octane #) / Octane #

    83

    85

    87

    89

    91

    93

    83 85 87 89 91 93

    Pred(Octane #)

    Octane#

    Active Validation

    Octane # / Standardized residuals

    -2.5

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    83 85 87 89 91

    Octane #

    Standardizedr

    esiduals

    Active Validation

  • 7/28/2019 06 Simple Modeling

    10/16

    Pred(Octane #) / Octane #

    83

    85

    87

    89

    91

    93

    83 85 87 89 91 93

    Pred(Octane #)

    Octane#

    No significant change in the

    quality of the fit.

    Means you only need 4 lines

    to determine octane number.

    Pred(Octane #) / Standardized residuals

    -2.5

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    83 85 87 89 91 93

    Pred(Octane #)

    Standardizedr

    esiduals

    Number of X

    variablesQual Quant Mixed

    1Simple

    ANOVALR -

    2 or more ANOCA MLR ANCOVA

    Both treatment andcovariate variables

    are significant with

    the same model.

    Covariate variable

    is significant,treatment is not.

    Neither are

    significant.

    Treatment is

    significant but

    covariate isnt.

    Both treatment and

    covariate variables aresignificant but the

    models are different.

  • 7/28/2019 06 Simple Modeling

    11/16

    For simplicity, well look at a

    single near IR region. It has

    one of the highest correlations

    with octane number of those

    evaluated earlier.

    Start with a simple linear

    regression analysis, ignoring

    the fact there are both summer

    and winter blends included.

    80

    82

    84

    86

    88

    90

    92

    94

    96

    0.34 0.36 0.38 0.4 0.42

    a1360

    Octane

    #

    Active Model

    Conf. interval (Mean 95%) Conf. interval (Obs. 95%)

    Standardized residuals / a1360

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    0.34 0.36 0.38 0.4 0.42

    a1360

    Standardized

    residuals

    Octane # / Standardized residuals

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    83 85 87 89 91 93

    Octane #

    Sta

    ndardizedr

    esiduals

    It seems pretty clear that asimple linear regression has a

    problem. It is also obvious that

    there are two types of samples.

    Evaluating a simple

    scatterplot may help.y = 0.0023x + 0.159

    R2

    = 0.9072

    y = 0.0021x + 0.2249

    R2

    = 0.8963

    0.35

    0.36

    0.37

    0.38

    0.39

    0.40

    0.41

    0.42

    0.43

    83.0 84.0 85.0 86.0 87.0 88.0 89.0 90.0 91.0 92.0 93.0

    Summer

    Winter

    Linear (Summer)

    Linear (Winter)

    Lets try an ANCOVA

    using a1360 as thecovariate and the blen

    type as an additional

    factor.

    Pred(Octane #) / Octane #

    83

    85

    87

    89

    91

    93

    83 85 87 89 91 93

    Pred(Octane #)

    Octan

    e#

    Octane # / Standardized residuals

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    2.5

    3

    83 85 87 89 91 93

    Octane #

    Standardizedr

    esiduals

  • 7/28/2019 06 Simple Modeling

    12/16

    x y

    1 2

    2 43 8

    For our simple model, the program will

    attempt to find this minimum value.

    Yi

    f(x)

    error

  • 7/28/2019 06 Simple Modeling

    13/16

    Yi

    f(x)

    error

    This can be more difficult as the number of

    adjustable parameters increases. There can

    be several false minima that must be avoided.

    Typical options

    Initial estimate of parameters

    An initial guess of your values can not only

    save processing time but avoid false

    convergence as well.

    Scaling of parameters

    The magnitude of your parameters can varygreatly. Scaling will give them comparable

    weight during the fit.

    Maximum iterations

    How many guesses it should attemptbefore quitting.

    Limits

    Do you want to set any upper/lowerlimits for the parameters?

    Convergence

    How good does the estimate need to be.

  • 7/28/2019 06 Simple Modeling

    14/16

    For this model, the goal is to determine the

    minimum error solution for the following:

    We will be minimizinby tracking the sum

    of squares - that iswhat is held in thedelta2 column

    A A1

    Column B contains known constants for theexperiment

    Column E contains the values to beadjusted - Dp, Pp and Hp. SSE is the sumof squares that will be tested.

  • 7/28/2019 06 Simple Modeling

    15/16

  • 7/28/2019 06 Simple Modeling

    16/16

    Pred(A1) / A1

    0

    0.5

    1

    1.5

    2

    2.5

    0 0.5 1 1.5 2 2.5

    Pred(A1)

    A1