Post on 01-Feb-2018
10/29/2007
1
Using Pre-Whitening Techniques for Process Model Stabilization
Using Pre-Whitening Techniques for Process Model Stabilization
Jeremy M. Shaver, Barry M. Wise
Eigenvector Research, Inc.
Seattle/Wenatchee, WA
shaver@eigenvector.com
• Inverse linear regression (e.g. partial least squares)
likes random noise
• Non-random systematic interference common and
make it work harder for same accuracy.
• More components makes models harder to
interpret, introduces more noise
• Goal: Reduce presence of systematic information
not relevant to prediction
What is Pre-Whitening?
10/29/2007
2
Pre-Whitening Methods
• Add more PLS components… (brute force)
• Orthogonal Signal Correction (OSC)Wold et. al. Chemometrics Intell. Lab. Syst. 1998; 44: 175-185
• Orthogonal Partial Least Squares (OPLS)*Trygg, Wold, J. Chemometrics 2002; 16: 119-128
• Generalized Least Squares Weighting (GLS
/ GLSW) with y-block sort
* use requires patent license agreement
Orthogonal Signal Correction (OSC)
and GLS with y-block Sort
• Use y-block to identify spectral differences which are not apparently related to the “property of interest” (e.g. concentration)
• use y-block value to sort x-block (spectra)
• calculate row-wise differences
• Use those differences to create filter and “deweight” those features.
• y-block differences also calculated to use as weights in covariance calculation
10/29/2007
3
Concentration Spectra
analyte band
GLS with y-block Sort
Sort concentration
Concentration Spectra
and use order to sort spectra…
Spectra with similar y-block values are near each other and should contain similar features.
Sorting Spectra Using y-block
10/29/2007
4
Concentration Spectra
derivative down columns
(both conc. and spec.)
Only differences between spectra with similar y-values are left.
Create covariance from these differences.
Isoflavone in Corn
• Determine isoflavone content in corn
kernels (~ 600-7000 ppm)
• 820 samples from 2000 growing season
• NIR 400-2500 cm-1
• Goal: <400 ppm and <10 latent variables?!?
Steven Wright and James Janni
Pioneer Hi-Bred, Inc., A DuPont Business
(presented at CAC 2002 conference)
10/29/2007
5
5 10 15 20 25 30 35 40300
400
500
600
700
800
900
1000
1100
1200
Latent Variable Number
RM
SE
CV
5-splits contiguous block
(1)(2)
(3)AutoOPLS + AutoGLS + AutoAuto + OSC
(#) = number of components used in preprocessing.Dashed lines = Same preprocessing technique
with different numbers of components.
(6)
(8)
(4)
Cross-Validation Results for PLS Models
Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm
-1000
0
1000
2000
3000
4000
5000
Y C
V P
red
icte
d
Y Measured
Y C
V P
red
icte
d
1000 2000 3000 4000-1000
0
1000
2000
3000
4000
5000
GLS + Autoscale
RMSECV =384 (6 LVs)
AutoscaleRMSECV = 508 (11 LVs)
10/29/2007
6
Raman Analysis of Ionic Mixtures
from Salt Cake Dissolution
• Monitor ionic concentration during dissolution of
waste tank salt cakes
• Most ions by classical least squares successful
• OH- difficult due to significant overlap with water
OH envelope (changes with ionic strength and
other effects)
Samuel A. Bryan, Tatiana G.
Levitskaia, Serguei I. Sinkov
Pacific Northwest National Labs
Spectral Changes with [OH-]
2800 3000 3200 3400 3600 3800 4000-50
0
50
100
150
200
250
300
Frequency (cm-1)
Rela
tive Inte
nsity
Other ions change
envelope shape!
10/29/2007
7
Before we bother, what’s the
advantage?!?
• Does the model respond differently to
random noise?
• Does the model respond differently to
changes in interferences?
• Is the model differently sensitive to
Hotelling’s T2 and/or outliers?
100 200 300 400 500 600 700 800 900
Simulated Frequency
0 50 100 150 200 250 300-20
0
20
40
60
80
100
120
Sample Number
Sim
ula
ted
Co
nce
ntr
atio
n
Experiment with different
positions and amplitudes of
analytical band and different
interferences.
Simulated Process DataConcentrations
Spectra
10/29/2007
8
1 2 3 4 5 6 7 8 9 1012
14
16
18
20
22
24
26
28
30
# of LVs
RM
SE
CV
OSC (2 comp, 7 iter)
mean center
GLSW (alpha = 1)
GLSW (alpha = .01)
GLSW (alpha = 100)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
High Overlap Simulated Example(Very Low Signal to Noise Ratio)
Prediction at V. High Noise(peak to peak SNR = 1)
0 10 20 30 40 50 60 70 800
10
20
30
40
50
60
70
80
90
100
RMSEP
Num
ber
of
Replic
ate
s
noneOSCGLS
10/29/2007
9
Cross-Validation at Low Noise(peak to peak SNR = 5)
2 4 6 8 10 12 140
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Number of Latent Variables
RM
SE
CV
None
OSCGLS
Prediction at Low Noise(peak to peak SNR = 5)
1.7 1.8 1.9 2 2.1 2.2 2.3 2.4 2.5 2.60
5
10
15
20
25
30
35
RMSEP
Num
ber
of
Replic
ate
s
noneOSCGLS
10/29/2007
10
Cross-Validation at High Noise(peak to peak SNR = 2)
2 4 6 8 10 12 140
1
2
3
4
5
6
7
8
9
10
Number of Latent Variables
RM
SE
CV
meancenterOSC (2)OSC (3)GLS
Prediction Error at High Noise(peak to peak SNR = 2)
3.4 3.6 3.8 4 4.2 4.4 4.6 4.8 50
5
10
15
20
25
30
35
RMSEP
Num
ber
of
Replic
ate
s
noneOSCGLS
Changing Noise and Background
10/29/2007
11
Q Residuals With and Without GLS
50 100 150 200 250 30050
60
70
80
90
100
110
120
130
Sample
Q R
esid
uals
(0.0
3%
), Q
Resid
uals
(54.2
8%
)
Samples/Scores Plot of data
SNR = 3
Scores With and Without GLS
1000
-20 0 20 40 60 80 100-800
-600
-400
-200
0
200
400
600
800
Y Measured 1
Without GLS
-20 0 20 40 60 80 100-20
-15
-10
-5
0
5
10
15
Y Measured 1
With GLS
SNR = 3
10/29/2007
12
Spectral Changes with [OH-]
2800 3000 3200 3400 3600 3800 4000-50
0
50
100
150
200
250
300
Frequency (cm-1)
Rela
tive Inte
nsity
OH- Cross Validation
2 4 6 8 10 12 140
0.2
0.4
0.6
0.8
1
Latent Variables
none
OSC
GLS
RM
SE
CV
10/29/2007
13
Conclusions
• Sensitivity to Outliers and Noise not apparently changed
• GLS can achieve factor reduction in some scenarios in which OSC cannot
• GLS pre-processed data less sensitive to selection of number of factors
• Easier to interpret models may be primary benefit