Multivariate calibration of spectrophotometric data using a partial least squares with data fusion

Spectrochimica Acta Part A 76 (2010) 363–368
Contents lists available at ScienceDirect
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
journal homepage: www.elsevier.com/locate/saa

Ling Gao, Shouxin Ren*
Department of Chemistry, Inner Mongolia University, West University Road 235, Huhehot, 010021 Inner Mongolia, PR China
*Corresponding author. Tel.: +86 471 4992125; fax: +86 471 4992984.

Article history: Received 10 November 2009; received in revised form 9 March 2010; accepted 16 March 2010

Keywords: Data fusion; Wavelet multiscale; Partial least squares; Overlapping spectra; Chemometrics

1386-1425/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.saa.2010.03.024

Abstract

A novel method named DF–PLS, based on partial least squares (PLS) regression combined with data fusion (DF), was applied to the simultaneous spectrophotometric determination of Cu(II), Ni(II) and Cr(III). Its purpose is to improve both the extraction of characteristic information and the quality of regression. Data fusion is a technique that integrates information from disparate sources to produce a single model or decision. Wavelet representations of signals provide a local time–frequency description and are multiscale in nature. In the wavelet domain, noise can therefore be removed by a scale-dependent threshold method. Information from different wavelet scales can be treated as information from different sources, so integrating the information from different wavelet scales into one PLS model is a form of data fusion. PLS was applied for multivariate calibration and for noise reduction by eliminating the less important latent variables. After optimization, the settings for DF–PLS were Daubechies 4 for the wavelet function, 7 for the decomposition level, HYBRID thresholding for the thresholding method and 3 for the number of PLS factors. The relative standard errors of prediction (RSEP) for all compounds with DF–PLS and PLS were 3.13% and 10.3%, respectively. Experimental results showed that DF–PLS succeeded in simultaneous multicomponent determination even when the spectra overlapped severely, and that it performed better than PLS. DF–PLS is a hybrid technique that combines the best attributes of DF and PLS, which makes it a promising method.

1. Introduction

Multivariate calibration has been studied with many different techniques. Among them, spectrophotometry is used frequently because it is simple, rapid, sensitive and usually easy to interpret. It is also the most common type of measurement to which chemometric techniques are applied. Inductively coupled plasma atomic emission spectroscopy (ICP-AES), inductively coupled plasma mass spectrometry (ICP-MS) and some hyphenated techniques have also been used in this area, but they require expensive instrumentation and maintenance. Spectrophotometry is therefore the most widely applied technique for the simultaneous determination of multiple components by chemometrics-assisted multivariate calibration. The main drawback of ultraviolet–visible (UV–vis) spectrophotometry is its poor selectivity, because UV–vis spectra of complex systems often overlap strongly. In recent years, rapid-scanning spectrophotometers have made it possible to generate huge data sets quickly. However, the acquired data have complicated structures, are contaminated by noise and redundancy, and can suffer from collinearity. Chemometric methods have proven effective in overcoming this difficulty [1,2].

Chemometrics is a cross-discipline that applies methods and concepts of applied mathematics, statistics and artificial intelligence to chemistry. In analytical chemistry, its main objectives are to design or select optimal measurement procedures and experiments, and to extract the maximum relevant chemical information from chemical data [3–10]. Wavelet transform (WT) [11–13] and wavelet packet transform (WPT) [14,15] denoising methods have been used as a preprocessing step to eliminate noise and irrelevant information. WT is a relatively recent mathematical development. It offers a time–frequency representation of signals with good localization of information. Because WT and WPT provide information in both the time and frequency domains, they can convert data from the original domain into the wavelet domain. There the representation of a signal is sparse and denoising is easier to carry out. These characteristics make WT and WPT suitable for data reduction, feature extraction and denoising [16–18].

Wavelet functions are localized in both time and frequency and are multiscale in nature. Analytical signals are also inherently multiscale: noise usually lies in the high-frequency range, whereas background and drift often appear in the lowest frequency range. Measured data from most processes are likewise naturally multiscale. Because most data contain contributions at multiple scales, it is important to know which scales contribute most to a regression model with the highest prediction ability. Multiscale approaches to data analysis and modelling therefore depend on extracting information from different wavelet scales, which is a challenging task. Unfortunately, multiscale methods based on WT have been reported in only a few papers [19–21]. Integrating the information from different wavelet scales is in fact similar to processing a large amount of data from several sources by data fusion. Combining different sources of data should, in principle, yield better information.

Data fusion is a technique that integrates information from disparate sources to produce a single model or decision. It offers valuable perspectives and is one of the most important techniques in information science. Research on data fusion in chemistry is at a preliminary stage, and its theory has rarely been explored from the viewpoint of information science. Data fusion has been applied in military affairs, robotics, remote sensing, image analysis and computing. So far, however, very few reports [22,23] appear to describe its use in chemistry.

The novel approach tested here, named DF–PLS, combines DF with PLS for analyzing overlapping spectra. It joins data fusion and multiscale wavelet transforms with PLS regression to enhance the extraction of characteristic information and the quality of regression. On the one hand, the method benefits from data fusion, which yields a better calibration model; within that model, wavelet multiscale methods reduce noise and redundancy and optimize feature extraction. On the other hand, it benefits from the predictive ability of PLS. As a hybrid technique, DF–PLS combines the best properties of the two techniques, which greatly increases its problem-solving capacity. This appears to be the first application of a combined DF–PLS approach to the multicomponent spectrophotometric determination of Cu(II), Ni(II) and Cr(III). These three elements are common and occur in a variety of environmental, industrial and geological samples. Their simultaneous determination is important but difficult because their spectra overlap. The proposed method was applied to the simultaneous determination of Cu(II), Ni(II) and Cr(III) without prior separation.

2. Theory

2.1. The discrete wavelet transform (DWT) and wavelet thresholding denoising

The discrete wavelet transform (DWT) is the natural wavelet transform for discrete-time signals. DWT provides information in both the time and frequency domains. It is a basis transformation, i.e. it calculates the coordinates of a data vector in the so-called wavelet basis. The wavelet basis is generated by stretching the wavelet to fit different scales of the signal and by shifting it to cover all parts of the signal. DWT thus gives a time–frequency analysis of signals. It is a linear operation that decomposes a signal $f(t)$ into a weighted sum of basis functions $\psi_{j,k}(t)$:

$$f(t) = \sum_{j}\sum_{k} C_{j,k}\,\psi_{j,k}(t), \qquad j, k \in \mathbb{Z} \tag{1}$$

where $\mathbb{Z}$ denotes the set of integers. The functions $\psi_{j,k}(t)$ are generated from a single mother wavelet $\psi(t)$ by dilations and translations:

$$\psi_{j,k}(t) = 2^{-j/2}\,\psi\!\left(2^{-j}t - k\right) \tag{2}$$

where $j$ is the dilation (scale) index and $k$ is the translation index. The empirical wavelet coefficients $C_{j,k}$ are found by projecting the signal $f(t)$ onto the wavelet basis set $\psi_{j,k}(t)$, i.e.

$$C_{j,k} = \langle f(t), \psi_{j,k}(t) \rangle \tag{3}$$

where $\langle f(t), \psi_{j,k}(t) \rangle$ denotes the inner product, i.e. the projection of $f$ onto the wavelet function $\psi_{j,k}$. The fast discrete wavelet transform (FDWT) can be implemented by Mallat's pyramid algorithm [12], which is more efficient than computing a full set of inner products. FDWT can be viewed as filtering a signal by repeatedly applying a high-pass and a low-pass filter that form a pair of quadrature mirror filters (QMFs). The low-pass filter outputs 'approximation coefficients', whereas the high-pass filter outputs 'detail coefficients'. The approximation coefficients are then passed through the QMF pair again for the next level of decomposition. The theoretical background of FDWT is described in detail in a previous paper by the authors [7].
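The pyramid algorithm described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' PDFPLS code. It uses the four-tap Daubechies filter (often written D4, and called db2 in PyWavelets) with periodic signal extension. The helper names (`dwt_step`, `idwt_step`, `wavedec`, `waverec`) are hypothetical. Whether the paper's "Daubechies 4" means this filter depends on the naming convention of the software used, which is not stated here. The analysis operator is orthogonal, so its transpose is its inverse and reconstruction is exact.

```python
import numpy as np

# Four-tap Daubechies low-pass filter (D4; "db2" in PyWavelets naming).
S3 = np.sqrt(3.0)
H = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))
# Quadrature mirror high-pass filter: g[k] = (-1)**k * h[len(h) - 1 - k].
G = np.array([(-1) ** k * H[len(H) - 1 - k] for k in range(len(H))])


def _taps(n_out, n_in):
    """Periodic input indices read by each output coefficient (stride 2)."""
    return (2 * np.arange(n_out)[:, None] + np.arange(len(H))[None, :]) % n_in


def dwt_step(x):
    """One Mallat step: low-pass/high-pass filtering followed by downsampling by 2."""
    idx = _taps(len(x) // 2, len(x))
    return x[idx] @ H, x[idx] @ G  # (approximation, detail)


def idwt_step(a, d):
    """Inverse step: the transpose of the orthogonal analysis operator."""
    n = 2 * len(a)
    x = np.zeros(n)
    np.add.at(x, _taps(len(a), n), a[:, None] * H + d[:, None] * G)
    return x


def wavedec(x, level):
    """Pyramid decomposition, returning [A_level, D_level, ..., D_1]."""
    a, details = np.asarray(x, dtype=float), []
    for _ in range(level):
        a, d = dwt_step(a)
        details.append(d)
    return [a] + details[::-1]


def waverec(coeffs):
    """Rebuild the signal from [A_level, D_level, ..., D_1]."""
    a = coeffs[0]
    for d in coeffs[1:]:
        a = idwt_step(a, d)
    return a


# Example on a synthetic 128-point "spectrum" (length must be divisible by 2**level).
x = np.exp(-((np.arange(128) - 60) / 15.0) ** 2) + 0.01 * np.sin(np.arange(128) * 2.5)
coeffs = wavedec(x, 5)
print([len(c) for c in coeffs])           # [4, 4, 8, 16, 32, 64]
print(np.allclose(waverec(coeffs), x))    # True (perfect reconstruction)
```

Because the transform is orthogonal, the total energy of the coefficients equals the energy of the signal. This is why discarding small coefficients removes little of the signal.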

Original signals are always contaminated by noise. The main steps of signal denoising by DWT are:

• decompose the measured data on a selected wavelet basis;
• threshold the wavelet coefficients;
• reconstruct the signals.

Choosing the threshold without prior knowledge of the signal is very important. A low threshold retains more of the noise in the original data, whereas a high threshold removes more of the information. For wavelet denoising, Donoho developed two approaches: hard thresholding and soft thresholding [24]. Soft thresholding was selected as the denoising approach in this paper. A fundamental issue in wavelet threshold denoising is the choice of the threshold value [24,25]. Four methods were used in this work to select threshold values: MINIMAX, VISU, SURE and HYBRID. In our experiments, HYBRID proved the most suitable and convenient for real applications. These methods were proposed by Donoho [24], and their algorithmic procedures are well described by Alsberg et al. [25].
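The thresholding rules discussed above can be sketched as follows. This is our own minimal illustration, not the code used in the paper. Soft and hard thresholding follow Donoho's definitions, and the noise level is estimated from the median absolute deviation (MAD) of the detail coefficients. The HYBRID rule is written as we understand the SureShrink hybrid of Donoho and Johnstone: use the universal (VISU) threshold when the coefficients look sparse, otherwise the smaller of the SURE and universal thresholds. Details such as the sparsity test may differ from the implementation described in [25]. All function names are hypothetical.

```python
import numpy as np


def soft(x, t):
    """Soft thresholding: shrink every coefficient towards zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def hard(x, t):
    """Hard thresholding: keep coefficients with |x| > t unchanged, zero the rest."""
    return np.where(np.abs(x) > t, x, 0.0)


def mad_sigma(d):
    """Robust noise estimate from detail coefficients: median(|d|) / 0.6745."""
    return np.median(np.abs(d)) / 0.6745


def universal_threshold(n):
    """VISU (universal) threshold for unit noise level: sqrt(2 ln n)."""
    return np.sqrt(2.0 * np.log(n))


def sure_threshold(z):
    """Threshold minimising Stein's unbiased risk estimate for soft thresholding (unit noise)."""
    n = len(z)
    s2 = np.sort(np.abs(z)) ** 2
    k = np.arange(1, n + 1)
    # SURE(t_k) = n - 2k + sum_{i<=k} z_(i)^2 + (n - k) * t_k^2, with t_k = |z|_(k)
    risks = n - 2 * k + np.cumsum(s2) + (n - k) * s2
    return np.sqrt(s2[np.argmin(risks)])


def hybrid_threshold(d, sigma):
    """HYBRID (SureShrink) rule: universal threshold if d looks sparse, else min(SURE, universal)."""
    n = len(d)
    z = d / sigma
    t_univ = universal_threshold(n)
    s_d2 = (np.sum(z ** 2) - n) / n            # excess energy per coefficient
    gamma_d = np.log2(n) ** 1.5 / np.sqrt(n)   # sparsity cut-off
    if s_d2 <= gamma_d:
        return sigma * t_univ
    return sigma * min(sure_threshold(z), t_univ)


# Example: denoise one detail block (values are illustrative only).
rng = np.random.default_rng(1)
d = rng.normal(scale=0.05, size=64)
d[10] += 1.0  # one genuine feature among noise coefficients
t = hybrid_threshold(d, mad_sigma(d))
d_denoised = soft(d, t)
```

In a scale-dependent scheme such as the one used here, `hybrid_threshold` would be called separately on each detail block, with `sigma` estimated per scale, and `soft` would then be applied with the resulting threshold.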

2.2. Wavelet multiscale and data fusion

The basic idea of wavelet analysis is multiresolution, i.e. the simultaneous representation of a signal at multiple scales. The concept of multiresolution was introduced by Mallat [12] and provides a powerful framework for understanding wavelet decomposition. Multiresolution in WT means that an original signal is decomposed into multiple scales, and a scale can be interpreted as a measure of frequency. Analytical signals are inherently multiscale. For example, noise usually lies in the high-frequency ranges, whereas background and drift often appear at the lowest frequency ranges. The original signal (f) is decomposed into the first-level approximation (A1) and detail (D1). The first-level approximation A1 is then further decomposed into the second-level approximation (A2) and detail (D2), and so on. The original signal can thus be represented by specified frequency blocks (i.e. scales). For example, when the decomposition level is 7, f = A7 + D7 + D6 + D5 + D4 + D3 + D2 + D1. Threshold values were selected at each scale to remove irrelevant information and retain the pertinent chemical information. The thresholding operation omits coefficients with small magnitudes, which contribute little to the total energy of the signal. Applying a separate thresholding operation at each scale identifies the region of scales where the data come from different frequencies and capture different information. In this case, the highest-frequency components, D1, D2 and D3, were noise and were easily eliminated by the scale-dependent thresholding operation. Because the wavelet transform is a linear operation, the signal of each component obtained by wavelet decomposition retains its linearity.

In the multiscale operation, each scale is treated as a separate block. After denoising, Mallat's algorithm converts the wavelet coefficients of each scale back to the original domain, which preserves the original length of the signal. When the decomposition level is 7, the original signal can be expressed as f = Ra7 + Rd7 + Rd6 + Rd5 + Rd4 + Rd3 + Rd2 + Rd1. Here Ra7, Rd7, Rd6, Rd5, Rd4, Rd3, Rd2 and Rd1 are the reconstructed components corresponding to A7, D7, D6, D5, D4, D3, D2 and D1, respectively. These reconstructed components are mutually orthogonal and contribute to the original signal from different scales, and this decomposition is valid over the whole domain. The information from different wavelet scales can therefore be treated as information from different sources. Data fusion then integrates this information into a single PLS model. In principle, combining different sources of information should lead to better information. In this case, the data sources Rd1, Rd2 and Rd3 were eliminated. The data sources Rd4, Rd5, Rd6, Rd7 and Ra7 were selected and concatenated into an augmented data matrix for the PLS operation.
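The per-scale reconstruction and the fusion step can be sketched as below. For brevity, this minimal sketch uses the Haar wavelet instead of the Daubechies 4 wavelet selected in the paper, and it omits the thresholding step. The function names are hypothetical; the block labels (Ra7, Rd7, ..., Rd1) follow the text. Each scale is reconstructed by zeroing all other coefficient blocks and inverting the transform, so the components add back up to the signal. The retained blocks are then concatenated column-wise into the augmented matrix.

```python
import numpy as np

R2 = np.sqrt(2.0)


def haar_step(x):
    """One Haar analysis step along the last axis: (approximation, detail)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / R2, (even - odd) / R2


def haar_inverse_step(a, d):
    """Inverse of haar_step."""
    x = np.empty(a.shape[:-1] + (2 * a.shape[-1],))
    x[..., 0::2] = (a + d) / R2
    x[..., 1::2] = (a - d) / R2
    return x


def scale_components(X, level):
    """Return {'Ra<L>', 'Rd<L>', ..., 'Rd1'} components, each with the shape of X."""
    coeffs, a = [], np.asarray(X, dtype=float)
    for _ in range(level):
        a, d = haar_step(a)
        coeffs.append(d)
    coeffs = [a] + coeffs[::-1]  # [A_L, D_L, ..., D_1]
    names = [f"Ra{level}"] + [f"Rd{j}" for j in range(level, 0, -1)]
    comps = {}
    for i, name in enumerate(names):
        # Keep only block i, zero the others, and invert back to the wavelength domain.
        r = coeffs[0] if i == 0 else np.zeros_like(coeffs[0])
        for k, d in enumerate(coeffs[1:], start=1):
            r = haar_inverse_step(r, d if k == i else np.zeros_like(d))
        comps[name] = r
    return comps


def fuse(X, level, keep):
    """Data fusion step: concatenate the retained scale blocks column-wise."""
    comps = scale_components(X, level)
    return np.hstack([comps[name] for name in keep])


# Example: 9 synthetic spectra of 128 points each, fused with the blocks retained in the text.
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 128)).cumsum(axis=1)
X_aug = fuse(X, 7, ["Rd4", "Rd5", "Rd6", "Rd7", "Ra7"])
print(X_aug.shape)  # (9, 640)
```

Concatenating blocks rather than summing them lets the subsequent regression weight each scale separately, which is the point of treating scales as separate sources.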

2.3. Data fusion partial least squares algorithm

Details about the DF–PLS algorithm are presented below:

1. The whole set of spectra obtained from the standard mixtures is used to build the experimental data matrix D. Before the DF–PLS calculation, mean centering and data standardization are performed.

2. The matrix D is transformed to the WT domain by Mallat's algorithm. In the wavelet domain it is easier to perform feature extraction, data compression and denoising. Because of its multiscale nature, the original signal (in the wavelength domain) is decomposed into a set of scales, each covering a different frequency band. Each scale is treated as a separate source of information. In this paper a scale-dependent threshold method is used: the thresholding operation is applied at each scale by the HYBRID soft thresholding method. The denoised approximation and detail coefficients of each scale are then reconstructed separately. The reconstructed data at each scale can be treated as separate sources of information; these are the details at all levels and the approximation at the coarsest level. Blocks that contain only noise and irrelevant information, such as Rd1, Rd2 and Rd3, are removed. The other blocks, which are most important for forming the calibration model, are retained and integrated into an augmented matrix. In essence, this way of processing the data is a data fusion technique. It combines the large amount of data from different sources easily to obtain a better regression model.

3. The augmented matrix resulting from data fusion is used as the input data for the PLS operation. PLS calibration decomposes both the absorbance and the concentration data into latent variables, $D = TP^{T} + E$ and $C = UQ^{T} + F$. Here $T$ and $U$ are score matrices, $P$ and $Q$ are loading matrices, and $E$ and $F$ are residual matrices. The regression coefficients are expressed as $B = W(P^{T}W)^{-1}Q^{T}$, where $W$ is a weight matrix.

According to these algorithms, a program named PDFPLS was designed to perform DF–PLS calculations.
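Step 3 can be illustrated with a compact NIPALS PLS2 sketch that returns the regression matrix $B = W(P^{T}W)^{-1}Q^{T}$ quoted above. It is written for clarity rather than speed. It assumes that the rows of `X` are spectra (for DF–PLS, the rows of the augmented matrix) and that the rows of `Y` hold the matching concentrations. It performs only mean centering; the standardization mentioned in step 1 is left out. The function name `pls2_fit` is hypothetical, and this is not the authors' PDFPLS program.

```python
import numpy as np


def pls2_fit(X, Y, n_components, tol=1e-12, max_iter=500):
    """NIPALS PLS2. Returns (B, x_mean, y_mean) with Y_hat = (X - x_mean) @ B + y_mean."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean              # residual matrices, deflated below
    W = np.zeros((X.shape[1], n_components))
    P = np.zeros((X.shape[1], n_components))
    Q = np.zeros((Y.shape[1], n_components))
    for a in range(n_components):
        u = F[:, [np.argmax(F.var(axis=0))]]   # start from the most variable response
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)             # X weights
            t = E @ w                          # X scores
            q = F.T @ t / (t.T @ t)            # Y loadings
            u_new = F @ q / (q.T @ q)          # Y scores
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)                # X loadings
        E = E - t @ p.T                        # deflate X
        F = F - t @ q.T                        # deflate Y
        W[:, a], P[:, a], Q[:, a] = w[:, 0], p[:, 0], q[:, 0]
    B = W @ np.linalg.solve(P.T @ W, Q.T)      # B = W (P^T W)^-1 Q^T
    return B, x_mean, y_mean


# Usage with hypothetical data: 9 mixtures, 3 components, 75 wavelengths.
rng = np.random.default_rng(0)
C = rng.uniform(1, 10, size=(9, 3))                 # hypothetical concentrations
S = np.abs(rng.normal(size=(3, 75)))                # hypothetical pure-component spectra
D = C @ S + 0.01 * rng.normal(size=(9, 75))         # additive (Beer-Lambert-like) mixtures
B, d_mean, c_mean = pls2_fit(D, C, n_components=3)  # t = 3, as selected in the paper
C_hat = (D - d_mean) @ B + c_mean
```

Keeping fewer components than the rank of the data is what gives PLS its noise-reduction effect. With as many components as there are independent columns, the sketch reduces to ordinary least squares.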

Fig. 1. The absorption spectra of 100 µg ml⁻¹ Cr³⁺ (1), 400 µg ml⁻¹ Cu²⁺ (2), 1300 µg ml⁻¹ Ni²⁺ (3) and their mixture (4).

3. Experimental

3.1. Apparatus and reagents

A Shimadzu UV-240 spectrophotometer with the optional model OPI-2 was used for all experiments. A Lenovo Pentium 4 microcomputer was used for all calculations. All reagents were of analytical reagent grade, and doubly distilled, deionized water was used. Stock standard solutions of Cu(II), Ni(II) and Cr(III) were prepared from their respective nitrates and standardized titrimetrically with ethylenediaminetetraacetate (EDTA). These were diluted to the appropriate concentrations with 0.100 mol l⁻¹ EDTA solution previously adjusted to pH 6.0 with sodium hydroxide and hydrochloric acid solutions.

3.2. Procedures

A series of mixed standard solutions containing various ratios of the stock standard solutions of Cu(II), Ni(II) and Cr(III) was prepared at pH 6.0 with 0.100 mol l⁻¹ EDTA. The solutions were heated to boiling, cooled, transferred to 50 ml standard flasks and diluted to the mark with distilled water. Cuvettes with a path length of 1 cm were used, and the blank absorbance due to distilled water was subtracted. Spectra were recorded between 370 and 740 nm at 5 nm intervals using the "data print out at wavelength intervals" mode of the Shimadzu UV-240 spectrophotometer. An absorption matrix D was built from these data. Following the same procedure, an absorption matrix Du was built for the unknown mixtures. All experimental operations were performed at room temperature (19–20 °C).

4. Results and discussion

4.1. Spectral characteristics and optimization of experimental variables

Fig. 1 shows the absorption spectra of Cu(II), Ni(II) and Cr(III) with EDTA as reagent in pH 6.00 buffer solution. The experimental conditions were the same as those described in the experimental procedures. The absorption maxima of Cu(II), Ni(II) and Cr(III) with EDTA were at 740, 593 and 542 nm, respectively. As Fig. 1 shows, the bands overlap strongly across their absorbing regions, so quantitative evaluation by conventional methods without separation is impossible. The optimal experimental conditions for this system were selected experimentally. The effect of pH on the determination of Cu(II), Ni(II) and Cr(III) was investigated; the results are shown in Fig. 2. Fig. 2 shows that the optimum pH range for all investigated ions is 4.0–7.0. pH 6.00 was selected because in this medium the Cu(II), Ni(II) and Cr(III) mixture had maximum and stable absorbance. The amount of EDTA was also selected after investigation. The optimal experimental conditions selected were the same as those described in Section 3.2.

Fig. 2. Effect of pH on the determination of Ni(II) (1), Cr(III) (2) and Cu(II) (3).

4.2. Wavelet transform and wavelet multiscale properties

Wavelets are localized in time and frequency, and they possess the properties of multiresolution analysis. WT is a tool that can convert data into different frequency components (scales). In the wavelet domain, approximation signals concentrate most of the energy of the source signal and have large-magnitude coefficients located in the low-frequency ranges. Detail signals, in contrast, represent the changes in the source signal and have small-magnitude, high-frequency coefficients. WT represents the original signal as a series of components of specified energy, such as f = A7 + D7 + D6 + D5 + D4 + D3 + D2 + D1 when the decomposition level is 7. Noise can therefore be reduced by removing all insignificant detail signals without substantially affecting the information content. To inspect the multiscale property of WT visually, the scales (A7, D7, D6, D5, D4, D3, D2 and D1) were converted back to their original domain by Mallat's algorithm, which preserves the original length of the signal. The reconstructed components (Ra7, Rd7, Rd6, Rd5, Rd4, Rd3, Rd2 and Rd1) are shown in Fig. 3. These reconstructed components are mutually orthogonal and contribute to the original signal from different scales with different frequencies. Fig. 3 shows clearly that Rd1, Rd2 and Rd3 are centered in the high-frequency ranges and resemble noise, so they should be eliminated. The other scales (Rd7, Rd6, Rd5 and Rd4) contain pertinent information and should be retained, because analytical signals are usually found in the low-frequency ranges. The scale Ra7, concentrated in the lowest frequency range, may contain some background and drift, so it should be removed.

Fig. 3. The original signals and their reconstructed components from different wavelet scales.

4.3. DF–PLS

Four parameters must be optimized by trial and error to tune the DF–PLS method: the number of PLS factors (t), the wavelet function, the decomposition level (L) and the thresholding method. The first parameter is associated with PLS, and the last three are required by data fusion. Each wavelet function has different characteristics, and a wavelet function that is optimal for one type of signal is not necessarily the best for another. The choice of wavelet function is therefore essential for this technique. In this work, the wavelet functions tested were Coiflet 1, 2, ..., 5, Daubechies (Db) 2, 4, 6, ..., 20 and Symmlet 4, 5, ..., 10. The absolute and relative standard errors of prediction (SEP and RSEP) of total elements [7] can be used to find the optimum wavelet function. In a similar way, decomposition levels L from one to eight were also tested. The choice of threshold values is also very important for data fusion. Many algorithms based on statistical properties of the noise have been developed to estimate the best threshold value, including VISU, SURE, HYBRID, median absolute deviation (MAD) and MINIMAX [22]. The optimization described above selected t = 3, Daubechies 4, L = 7 and HYBRID thresholding as the optimal parameters.

Table 1. Comparison of the DF–PLS method and single-scale PLS methods.

Method       RSEP (%) (total elements)
PLS (Rd4)    5.71
PLS (Rd5)    4.25
PLS (Rd6)    8.25
PLS (Rd7)    5.94
PLS (Ra7)    8.49
DF–PLS       3.13

To evaluate the advantages of data fusion, DF–PLS regression was compared with single-scale PLS regression on different data sources. Five single-scale PLS models were built, one for each of the data sources Rd4, Rd5, Rd6, Rd7 and Ra7. The predictive parameter RSEP for total elements was computed for each model on the same test set and is shown in Table 1. DF–PLS (RSEP 3.13%) performed better than every single-scale PLS model (RSEP 4.25–8.49%). Clearly, by exploiting data fusion, DF–PLS regression gives better results than regression on any single scale.
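The text uses SEP and RSEP without restating their definitions, which are given in [7]. The sketch below assumes the commonly used definitions SEP = sqrt(Σ(ĉ − c)²/N) and RSEP (%) = 100·sqrt(Σ(ĉ − c)²/Σc²), with hypothetical function names `sep` and `rsep`. The predicted concentrations are taken as actual × recovery/100, using the actual concentrations and DF–PLS recoveries of Table 2. With these inputs, the definitions reproduce the DF–PLS RSEP values of Table 3 (4.17, 2.35 and 7.11%, and 3.13% for total elements) up to rounding of the recoveries. They also reproduce the total SEP of 0.307 × 10⁻³ mol l⁻¹.

```python
import numpy as np


def sep(c_pred, c_act):
    """Standard error of prediction: sqrt(sum((c_pred - c_act)**2) / N)."""
    c_pred, c_act = np.asarray(c_pred, float), np.asarray(c_act, float)
    return float(np.sqrt(np.sum((c_pred - c_act) ** 2) / c_act.size))


def rsep(c_pred, c_act):
    """Relative standard error of prediction (%): 100 * sqrt(sum((c_pred - c_act)**2) / sum(c_act**2))."""
    c_pred, c_act = np.asarray(c_pred, float), np.asarray(c_act, float)
    return float(100.0 * np.sqrt(np.sum((c_pred - c_act) ** 2) / np.sum(c_act ** 2)))


# Table 2: actual concentrations (10^-3 mol l^-1) and DF-PLS recoveries (%); columns Cu(II), Ni(II), Cr(III).
actual = np.array([
    [6.4, 12.4, 2.0], [6.4, 14.4, 2.8], [6.4, 16.4, 3.8],
    [8.4, 12.4, 2.8], [8.4, 14.4, 3.8], [8.4, 16.4, 2.0],
    [10.0, 12.4, 3.8], [10.0, 14.4, 2.0], [10.0, 16.4, 2.8],
])
recovery = np.array([
    [99.7, 102.0, 88.5], [101.1, 102.3, 85.5], [103.8, 99.4, 96.3],
    [100.0, 100.3, 98.7], [104.7, 98.4, 95.8], [91.3, 104.1, 103.2],
    [97.6, 101.2, 102.4], [100.5, 97.4, 116.7], [94.6, 102.7, 103.6],
])
predicted = actual * recovery / 100.0

per_element = [round(rsep(predicted[:, j], actual[:, j]), 2) for j in range(3)]
print(per_element)                          # close to the Table 3 values 4.17, 2.35, 7.11
print(round(rsep(predicted, actual), 2))    # 3.13
print(round(sep(predicted, actual), 3))     # 0.307
```

Because RSEP divides by the total squared concentration, it weights errors on Cr(III), which is present at low concentration, more heavily than the same absolute error on Ni(II).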

A training set of 9 mixtures of Cu(II), Ni(II) and Cr(III) was designed according to a three-level orthogonal array with the L9(3⁴) matrix. The experimental data obtained from the training set were arranged in matrix D. Each column of D corresponded to the absorbance of the different mixtures at a given wavelength, and each row represented the spectrum of a given mixture. Fig. 4 shows three-dimensional plots of the spectra of the standard samples.
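The L9(3⁴) design can be generated from the standard orthogonal array, as sketched below. The training-set concentration levels are not reported in this section. The example therefore uses the three levels of each element that appear in the test set of Table 2; the rows of Table 2 coincide with the first three columns of the standard L9 array. The array is the standard Taguchi L9, coded here with levels 0, 1 and 2, and `design_mixtures` is a hypothetical helper.

```python
import numpy as np

# Standard L9(3^4) orthogonal array, levels coded 0, 1, 2 (one row per run).
L9 = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 2, 2, 2],
    [1, 0, 1, 2],
    [1, 1, 2, 0],
    [1, 2, 0, 1],
    [2, 0, 2, 1],
    [2, 1, 0, 2],
    [2, 2, 1, 0],
])


def design_mixtures(levels, columns=(0, 1, 2)):
    """Assign each component to one column of L9 and map coded levels to concentrations."""
    return np.array([[levels[c][run[col]] for c, col in enumerate(columns)] for run in L9])


# Levels (10^-3 mol l^-1) taken from Table 2, in the order Cu(II), Ni(II), Cr(III).
levels = [[6.4, 8.4, 10.0], [12.4, 14.4, 16.4], [2.0, 2.8, 3.8]]
C = design_mixtures(levels)
print(C[5].tolist())  # [8.4, 16.4, 2.0], i.e. sample 6 of Table 2
```

Every pair of columns in the array contains each combination of levels exactly once. As a result, the nine runs vary the three concentrations independently, and the calibration concentrations are uncorrelated.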

Using program PDFPLS, the concentrations of Cu(II), Co(II) and

r(III) for a test set were calculated and their calculating results areisted in Table 2. The experimental results showed that the SEP andSEP for total elements were 1.01 × 10−3mol l−1and 3.13%.

able 2ctual concentration and percentage recovery of the unknowns.

Sample no. Actual concentration (10−3mol l−1) Recovery (%

Cu(II) Ni(II) Cr(III) Cu(II)

Sample  Concentration (10⁻³ mol l⁻¹)   DF–PLS recovery (%)        PLS recovery (%)
no.     Cu(II)   Ni(II)   Cr(III)      Cu(II)   Ni(II)   Cr(III)  Cu(II)   Ni(II)   Cr(III)
1       6.400    12.40    2.000        99.7     102.0    88.5     104.0    111.8    89.9
2       6.400    14.40    2.800        101.1    102.3    85.5     100.3    99.7     95.6
3       6.400    16.40    3.800        103.8    99.4     96.3     98.2     100.5    98.4
4       8.400    12.40    2.800        100.0    100.3    98.7     98.6     102.5    94.9
5       8.400    14.40    3.800        104.7    98.4     95.8     95.3     92.5     100.3
6       8.400    16.40    2.000        91.3     104.1    103.2    98.2     73.5     111.7
7       10.00    12.40    3.800        97.6     101.2    102.4    98.5     110.9    95.0
8       10.00    14.40    2.000        100.5    97.4     116.7    102.5    108.2    100.4
9       10.00    16.40    2.800        94.6     102.7    103.6    104.1    106.6    116.8

Table 3
SEP and RSEP values for the Cu(II), Ni(II) and Cr(III) system obtained by the two methods.

Method    SEP (10⁻³ mol l⁻¹)                        RSEP (%)
          Cu(II)  Ni(II)  Cr(III)  Total elements   Cu(II)  Ni(II)  Cr(III)  Total elements
DF–PLS    0.607   0.590   0.365    0.307            4.17    2.35    7.11     3.13
PLS       0.241   1.72    0.209    1.01             2.87    11.9    7.06     10.3

Table 4
Analysis of real water samples by applying the DF–PLS method.

Sample no.    Added concentration (10⁻³ mol l⁻¹)   Found concentration (10⁻³ mol l⁻¹)   Recovery (%)
              Cu(II)   Ni(II)   Cr(III)            Cu(II)   Ni(II)   Cr(III)            Cu(II)   Ni(II)   Cr(III)
Tap water 1   6.300    13.90    3.500              6.540    14.25    3.304              103.8    102.5    94.4
Tap water 2   8.500    15.50    2.500              9.098    15.78    2.095              107.0    101.8    83.8
Tap water 3   10.10    12.00    2.900              11.02    12.37    2.582              109.1    103.1    89.0

Fig. 4. Three-dimensional plots of the spectra of the standard samples.

4.4. A comparison of DF–PLS and PLS

A comparative study of the two methods (DF–PLS and PLS) was carried out on a set of synthetic unknown samples; the resulting SEP and RSEP values are given in Table 3. PLS decomposes both the absorbance and the concentration data and is commonly regarded as one of the best multivariate calibration methods for linear systems. The PLS algorithm is built on the nonlinear iterative partial least squares (NIPALS) algorithm, and it has an inherent ability to reduce noise by eliminating the less important latent variables. By combining data fusion with PLS, the DF–PLS method draws on the advantages of both techniques: it eliminates noise and unrelated information more effectively and improves the performance of the regression. The total RSEP values over all three elements were 3.13% for DF–PLS and 10.3% for PLS, a more than threefold reduction; the largest gain is for Ni(II), whose RSEP falls from 11.9% to 2.35%. The results demonstrate that the DF–PLS method is successful and performs considerably better overall than PLS.
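The RSEP values in Table 3 can be checked against the recoveries listed for the nine synthetic samples. The short Python sketch below assumes the usual definition RSEP = 100·√(Σ(ĉ − c)²/Σc²), rebuilds each predicted concentration ĉ as the added concentration times its recovery/100, and treats the "total elements" value as one pooled sum over all three analytes; the formulas are not spelled out in this section, so these are assumptions. Under them, the per-element and total RSEP values agree with Table 3 to within about 0.02 percentage points, the residual difference coming from the recoveries being rounded to 0.1%. The names ADDED, RECOVERY, rsep and rsep_table are illustrative only.

```python
import math

# Added concentrations (10^-3 mol/L) of the nine synthetic samples,
# ordered Cu(II), Ni(II), Cr(III), as listed in the synthetic-sample table.
ADDED = [
    (6.400, 12.40, 2.000), (6.400, 14.40, 2.800), (6.400, 16.40, 3.800),
    (8.400, 12.40, 2.800), (8.400, 14.40, 3.800), (8.400, 16.40, 2.000),
    (10.00, 12.40, 3.800), (10.00, 14.40, 2.000), (10.00, 16.40, 2.800),
]

# Recoveries (%) for Cu(II), Ni(II), Cr(III), in the same sample order.
RECOVERY = {
    "DF-PLS": [(99.7, 102.0, 88.5), (101.1, 102.3, 85.5), (103.8, 99.4, 96.3),
               (100.0, 100.3, 98.7), (104.7, 98.4, 95.8), (91.3, 104.1, 103.2),
               (97.6, 101.2, 102.4), (100.5, 97.4, 116.7), (94.6, 102.7, 103.6)],
    "PLS": [(104.0, 111.8, 89.9), (100.3, 99.7, 95.6), (98.2, 100.5, 98.4),
            (98.6, 102.5, 94.9), (95.3, 92.5, 100.3), (98.2, 73.5, 111.7),
            (98.5, 110.9, 95.0), (102.5, 108.2, 100.4), (104.1, 106.6, 116.8)],
}

ELEMENTS = ("Cu(II)", "Ni(II)", "Cr(III)")


def rsep(true, predicted):
    """RSEP in percent: 100 * sqrt(sum((pred - true)^2) / sum(true^2))."""
    num = sum((p - t) ** 2 for t, p in zip(true, predicted))
    den = sum(t ** 2 for t in true)
    return 100.0 * math.sqrt(num / den)


def rsep_table(method):
    """Per-element and pooled ("total elements") RSEP for one method."""
    # Rebuild each predicted concentration as added * recovery / 100.
    pred = [[c * r / 100.0 for c, r in zip(conc, rec)]
            for conc, rec in zip(ADDED, RECOVERY[method])]
    table = {name: rsep([row[k] for row in ADDED], [row[k] for row in pred])
             for k, name in enumerate(ELEMENTS)}
    # Pool all 27 (sample, element) pairs into a single sum.
    table["total"] = rsep([c for row in ADDED for c in row],
                          [p for row in pred for p in row])
    return table


for method in RECOVERY:
    print(method, {k: round(v, 2) for k, v in rsep_table(method).items()})
```

Because the pooled sum is dominated by the Ni(II) errors, which carry the largest concentrations, the total RSEP for PLS is pulled up mainly by sample 6 (73.5% Ni(II) recovery).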

4.5. Analysis of real water samples by applying the DF–PLS method

To test the applicability of the proposed method to real samples and its susceptibility to matrix interferences, known amounts of Cu(II), Ni(II) and Cr(III) were spiked into samples that initially contained none of these elements. Table 4 shows the results obtained for these real matrix samples. The DF–PLS method is able to predict the individual concentrations of Cu(II), Ni(II) and Cr(III) in the real matrix, with recoveries between 83.8% and 109.1%. Possible interference was investigated for several ions that might affect the determination of Cu(II), Ni(II) and Cr(III) in the real matrix samples. The experiments showed that Zn(II), Al(III), Pb(II), Fe(III), Ca(II), Mg(II), Ba(II), F⁻, NO₃⁻ and SO₄²⁻ caused no interference.
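Because the spiked samples initially contained none of the three elements, each recovery in Table 4 is simply the found concentration divided by the added concentration, times 100. A minimal Python sketch of that calculation, using the Table 4 values, reproduces the listed recoveries after rounding to one decimal place. The names SAMPLES, recovery and recovery_table are illustrative, not taken from the paper.

```python
# Added and found concentrations (10^-3 mol/L) for the spiked tap-water
# samples, ordered Cu(II), Ni(II), Cr(III), as listed in Table 4.
SAMPLES = {
    "Tap water 1": ((6.300, 13.90, 3.500), (6.540, 14.25, 3.304)),
    "Tap water 2": ((8.500, 15.50, 2.500), (9.098, 15.78, 2.095)),
    "Tap water 3": ((10.10, 12.00, 2.900), (11.02, 12.37, 2.582)),
}


def recovery(added, found):
    """Spike recovery (%) for a sample with no native analyte: found / added * 100."""
    return 100.0 * found / added


def recovery_table(samples):
    """Recoveries per sample, rounded to one decimal place as in Table 4."""
    return {name: tuple(round(recovery(a, f), 1) for a, f in zip(added, found))
            for name, (added, found) in samples.items()}


for name, rec in recovery_table(SAMPLES).items():
    print(name, rec)
```

If a sample did contain a native amount of an analyte, the found value would first have to be corrected by subtracting that native amount; the simple ratio above relies on the samples being analyte-free before spiking.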

5. Conclusion

A method named DF–PLS was developed for multicomponent spectrophotometric determination. The approach combines DF and PLS to improve both the removal of noise and unrelated information and the quality of the regression. It relies on fusing the sources of information obtained by reconstructing the denoised components at different wavelet scales; PLS regression is then performed on this fused matrix to produce a more accurate estimate of the concentration of each element. The approach exploits the multiscale nature of wavelets: noise and redundancy are reduced, and feature extraction is optimized, by a scale-dependent thresholding method. PLS provides the calibration model and reduces the dimensionality of the raw spectra. Experimental results demonstrated that the DF–PLS approach was successful and delivered more satisfactory results than PLS.
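To make the fusion step more concrete, the following is a minimal, self-contained Python sketch of the DF–PLS idea as summarized above; it is not the authors' implementation. Several simplifying assumptions are made: an orthonormal Haar wavelet stands in for Daubechies 4, three decomposition levels stand in for the seven used in the paper, soft thresholding with a per-scale universal threshold (noise level estimated from the median absolute coefficient of each scale) stands in for the HYBRID rule, and "fusion" is taken to mean concatenating the reconstructed component of every scale into one feature vector. PLS1 is fitted by NIPALS with deflation, one analyte at a time. All function names are hypothetical, and the spectrum length must be divisible by 2 to the power of the number of levels.

```python
import math
import statistics

SQRT_HALF = math.sqrt(0.5)


def haar_step(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    a = [SQRT_HALF * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    d = [SQRT_HALF * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return a, d


def haar_inverse_step(a, d):
    """Exact inverse of haar_step."""
    out = []
    for ai, di in zip(a, d):
        out.extend((SQRT_HALF * (ai + di), SQRT_HALF * (ai - di)))
    return out


def decompose(x, levels):
    """Return (coarsest approximation, [finest detail, ..., coarsest detail])."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details


def reconstruct(a, details):
    """Inverse of decompose."""
    for d in reversed(details):
        a = haar_inverse_step(a, d)
    return a


def soft_threshold(coeffs, thr):
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def scale_threshold(d):
    """Universal threshold with the noise level estimated separately per scale."""
    sigma = statistics.median(abs(c) for c in d) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(d))) if len(d) > 1 else 0.0


def fused_features(spectrum, levels=3, denoise=True):
    """Concatenate the per-scale reconstructed (optionally denoised) components."""
    a, details = decompose(spectrum, levels)
    if denoise:
        details = [soft_threshold(d, scale_threshold(d)) for d in details]
    zero_a = [0.0] * len(a)
    zero_d = [[0.0] * len(d) for d in details]
    parts = [reconstruct(a, zero_d)]  # approximation-only component
    for j in range(levels):  # one component per detail scale
        only_j = [d if k == j else zero_d[k] for k, d in enumerate(details)]
        parts.append(reconstruct(zero_a, only_j))
    return [v for part in parts for v in part]


def pls1_fit(X, y, n_factors):
    """PLS1 by NIPALS with deflation; n_factors must not exceed rank(X)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [yi - y_mean for yi in y]
    factors = []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]  # unit weight vector
        t = [sum(e * wj for e, wj in zip(row, w)) for row in E]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt  # y loading
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict one sample by deflating it with the stored weights and loadings."""
    x_mean, y_mean, factors = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in factors:
        t = sum(ei * wi for ei, wi in zip(e, w))
        e = [ei - t * li for ei, li in zip(e, load)]
        y_hat += q * t
    return y_hat
```

With denoise=False the per-scale components sum back exactly to the original spectrum, so the fused matrix carries the same information arranged by scale; with denoise=True each scale is shrunk by its own threshold before fusion, which is the scale-dependent step described in the text. As in the paper, which retains three PLS factors, the number of factors must not exceed the rank of the fused calibration matrix.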

Acknowledgements

The authors would like to thank the National Natural Science Foundation of China (20667002 and 60762003) and the Natural Science Foundation of Inner Mongolia (2009MS0209) for financial support of this project.
