Post on 19-Jan-2016

Grey modeling approaches to investigate chemical processes

Romà Tauler (1) and Anna de Juan (2)

(1) IIQAB-CSIC, Spain; (2) UB, Spain

E-mail: rtaqam@iiqab.csic.es

Grey modeling approaches to investigate chemical processes

• Introduction to chemical modeling: white (hard), black (soft) and grey modeling in chemistry

• Multivariate Curve Resolution as a grey modeling method

• Grey modeling applications using MCR-ALS

Modeling approaches

• Hard modeling (white modeling): models based on physical/chemical laws.

• Soft modeling (black modeling):
– Empirical models with no knowledge or assumptions about the physical/chemical laws of the system (usually non-linear).
– Models with no assumptions about the physical/chemical model but with assumptions about the measurement model (usually multivariate and linear).

• Soft + hard modeling (grey modeling): mixed models partially using information about physical/chemical laws.

Chemical multicomponent systems. Structure

Chemical model (variation of compound contribution)

• MIXTURE: non-existent

• PROCESS: known, too complex or unknown

Measurement model (variation of the instrumental signal)

Simple additive linear model (Factor Analysis tools)

$$D = D_1 + D_2 + \dots + D_n = c_1 s_1^T + c_2 s_2^T + \dots + c_n s_n^T = C S^T$$

where $c_1, \dots, c_n$ are the columns of $C$ and $s_1^T, \dots, s_n^T$ are the rows of $S^T$.

Hard (White) Modeling

• Data modeling and data fitting in the chemical sciences have traditionally been done by hard modeling techniques.

• They are based on physical/chemical models which are already known (or assumed, proposed, ...).

• The parameters of these models are not known and they are estimated by least squares curve fitting.

• This approach may also be called white modeling. It is valid for well known phenomena and laboratory data, where the variables of the model are under control during the experiments and only the phenomena under study affect the data.

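$$ssq = \sum_{i=1}^{I} \sum_{j=1}^{J} r_{i,j}^2, \qquad ssq = f(Y, \text{model}, \text{parameters})$$

$$r_{ij} = Y_{ij} - \hat{Y}_{ij}, \qquad Y = \hat{Y}(\text{model}, \text{parameters}) + e$$

Find the optimal parameters of the model:

$$\frac{\partial\, ssq}{\partial\, \text{parameters}} = 0$$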

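Hard (White) Modeling

Case 1 Kinetic Systems: $Y_{ij} = A_{ij}$, measured absorbances of sample/solution i at wavelength j

Measurement model assumptions:

$$A_{ij} = \sum_{k=1}^{K} A_{kij}, \qquad A_{kij} = C_{ki}\, \varepsilon_{kj}$$

where $\varepsilon_{kj}$ is the absorptivity of species k at wavelength j.

Chemical Model assumptions:

$$C_k = C_{k,0}\, e^{-kt}, \qquad T_l = \sum_k q_{lk}\, c_k$$

(in the exponent, k is the rate constant and t the time).

Defining the residuals:

$$r_{ij} = A_{ij} - \hat{A}_{ij}$$

Finding the best model and its parameters:

$$\frac{\partial\, ssq}{\partial\, \text{parameters}} = 0$$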

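Hard (White) Modeling

Case 2 Solution Equilibria: $Y_{ij} = A_{ij}$, measured absorbances of sample/solution i at wavelength j.

Measurement model assumptions:

$$A_{ij} = \sum_{k=1}^{K} A_{kij}, \qquad A_{kij} = C_{ki}\, \varepsilon_{kj}$$

Chemical Model assumptions:

$$C_k = \beta_k \prod_l c_l^{\,q_{lk}}, \qquad T_l = \sum_k q_{lk}\, C_k$$

Defining the residuals:

$$r_{ij} = A_{ij} - \hat{A}_{ij}, \qquad ssq = \sum_i \sum_j r_{ij}^2$$

Finding the best model and its parameters:

$$\frac{\partial\, ssq}{\partial\, \text{parameters}} = 0$$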

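Hard (White) Modeling

The Newton-Gauss-Levenberg/Marquardt (NGL/M) algorithm

1. Set mp = 0 and guess the parameters, k0.
2. Calculate the residuals, r(k0), and the sum of squares, ssq.
3. Compare ssq_old with ssq:
– ssq_old ≈ ssq: if mp = 0, end and display the results; if not, set mp = 0 and continue.
– ssq < ssq_old: mp = mp / 3, then calculate the Jacobian J.
– ssq > ssq_old: mp = mp × 5.
4. Calculate the shift vector δk, set k0 = k0 + δk and return to step 2.

$$\delta k = -\left(J^{t} J + mp \cdot I\right)^{-1} J^{t}\, r(k_0), \qquad J = \frac{\partial r}{\partial k}$$

Standard deviations of the residuals and of the fitted parameters:

$$\sigma_y = \sqrt{\frac{ssq}{nt \times n\lambda - nk - nc \times n\lambda}}, \qquad \sigma_{k_i} = \sigma_y \sqrt{d_{ii}}$$

where $d_{ii}$ is the i-th diagonal element of $(J^{t} J)^{-1}$.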
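The NGL/M loop above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the first-order decay model, the function names (`residuals`, `jacobian`, `nglm`), the forward-difference Jacobian and the tolerance `mu` are all hypothetical choices, and the sketch simply stops at convergence instead of resetting mp first.

```python
import numpy as np

def residuals(k, t, y):
    # Hypothetical model for illustration: first-order decay y = exp(-k*t)
    return y - np.exp(-k[0] * t)

def jacobian(k, t, y, h=1e-7):
    # Forward-difference approximation of J = d r / d k
    r0 = residuals(k, t, y)
    J = np.empty((r0.size, k.size))
    for i in range(k.size):
        kp = k.copy()
        kp[i] += h
        J[:, i] = (residuals(kp, t, y) - r0) / h
    return J

def nglm(k0, t, y, mu=1e-10, max_iter=200):
    k = np.asarray(k0, dtype=float).copy()
    mp, ssq_old = 0.0, np.inf
    for _ in range(max_iter):
        r = residuals(k, t, y)
        ssq = float(r @ r)
        if np.isfinite(ssq_old) and abs(ssq_old - ssq) <= mu * ssq_old:
            break                                  # ssq_old ~ ssq: converged
        if ssq < ssq_old:                          # improvement: accept the step
            mp /= 3.0
            ssq_old, r_acc, k_acc = ssq, r, k.copy()
            J = jacobian(k_acc, t, y)
        else:                                      # divergence: reject the step
            mp = 1.0 if mp == 0.0 else mp * 5.0
        # shift vector: dk = -(J'J + mp*I)^-1 J' r, taken from the last accepted point
        dk = -np.linalg.solve(J.T @ J + mp * np.eye(k.size), J.T @ r_acc)
        k = k_acc + dk
    return k_acc, ssq_old

# Synthetic, noise-free data generated with k = 0.5
t = np.linspace(0.0, 10.0, 50)
y = np.exp(-0.5 * t)
k_fit, ssq_fit = nglm([0.2], t, y)
```

With mp = 0 the shift vector is the plain Newton-Gauss step; a growing mp shortens the step and turns it towards the steepest-descent direction, which is why mp is raised only after a step that increases ssq.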

Soft (Black) Modeling

• In soft (black) modeling no physical model is assumed.

• In some cases a linear measurement model is assumed (factor analysis methods).

• In other cases dependencies among variables and sources of variation are considered to be non-linear (neural networks, genetic algorithms, …).

• The goal of these methods is to explain the data variance using minimal (softer) assumptions about the data.

Example of Soft (Black) Modeling: Factor Analysis/Principal Component Analysis

Bilinear Model

$$D = U V^T + E, \qquad d_{ij} = \sum_{n=1}^{N} u_{in}\, v_{nj} + e_{ij}$$

Dimensions: D (I × J) = U (I × N) V^T (N × J) + E (I × J), with N << I or J

Unique solutions but without physical meaning

Constraints:
• U orthogonal, V^T orthonormal
• V^T in the direction of maximum variance
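As a small illustration of the bilinear model D = U V^T + E, the decomposition can be obtained from a truncated singular value decomposition. The sketch below uses simulated data; the matrix sizes, the noise level and the variable names are assumptions made only for the example, and no mean-centering is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, N = 30, 40, 2
# Simulated rank-2 data plus a small amount of noise
D = rng.random((I, N)) @ rng.random((N, J)) + 1e-3 * rng.standard_normal((I, J))

U_full, s, Vt_full = np.linalg.svd(D, full_matrices=False)
U = U_full[:, :N] * s[:N]     # scores: mutually orthogonal columns
Vt = Vt_full[:N, :]           # loadings: orthonormal rows
E = D - U @ Vt                # residuals not explained by N components

explained = 1.0 - (E ** 2).sum() / (D ** 2).sum()
```

The scores and loadings are unique (up to sign) because of the orthogonality constraints, but they are abstract factors and do not correspond to concentration profiles or spectra of chemical species.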

Hard (white)- vs. Soft (black)-modelling

Pros HM
• Well defined behaviour model (useful chemical information).
• Unique solutions.
• Reduced number of parameters to be optimised (e.g., K, k, ...).

Cons HM
• The underlying model should be correct and completely known.
• No variations other than those related to the model should be present in the data set.

Pros SM
• No explicit model is required.
• Information on the process or signal may be used (constraints).
• May help to set or to validate a physicochemical model.

Cons SM
• Ambiguous solutions.
• Does not directly provide physicochemical (kinetic or thermodynamic, ...) information.

Hard (white)- vs. Soft (black)-modelling

Use HM
• The variation of the system is completely described by a reliable physicochemical model.
• Clean reaction systems (kinetic or thermodynamic processes).

Use SM
• The model describing the variation of the data is too complex, unknown or non-existent.
• Images.
• Chromatographic data.
• Macromolecular processes.

Grey (hard+soft) modeling

• Mixed systems with hard-modelable and soft-modelable parts are proposed:
– Hard-model: kinetic process, equilibrium reaction, ...
– Soft-model: interferent, background, drift, unknown, ...

• Introducing a hard-model part decreases the ambiguity related to pure soft-modeling methods and gives additional information (parameters).

• Introducing a soft-model part may help to clarify the nature of the physicochemical model and give more reliable results.

Multivariate Curve Resolution as a grey modeling method

Multivariate Curve Resolution (MCR)

A tool to analyse (resolve) changes in composition and response in multicomponent systems.

Goal

Knowing the identity and contribution of each pure compound (entity) in the process or in the mixture.

PROCESS
• The composition changes in a continuous evolutionary manner. E.g. chemical reactions, processes, HPLC-DAD.

MIXTURE
• The composition changes with a random pattern variation. E.g. series of independent samples.
• The composition changes with a non-random pattern variation. E.g. environmental data, spectroscopic images.

Multivariate Curve Resolution

[Figure: the data matrix D (mixed information; e.g. retention times tR × wavelengths) is resolved into C and S^T (pure component information).]

• C (columns c_1 ... c_n): pure concentration profiles. Chemical model, process evolution, compound contribution.

• S^T (rows s_1 ... s_n): pure signals. Compound identity.

Multivariate Curve Resolution methods

$$D = C S^T + E$$

• Investigation of chemical reactions (kinetics, equilibria, …) using multivariate measurements (spectrometric, ...).
• Industrial processes (blending, syntheses, …).
• Macromolecular processes.
• Biochemical processes (protein folding).
• Spectroscopic images.
• Mixture analysis (in general).
• Hyphenated separation techniques (HPLC-DAD, GC-MS, CE-DAD, ...).
• Environmental data (model of pollution sources).

Multivariate Curve Resolution Bilinear Model: Factor Analysis Model

$$D = C S^T + E, \qquad d_{ij} = \sum_{n=1}^{N} c_{in}\, s_{nj} + e_{ij}$$

Dimensions: D (I × J) = C (I × N) S^T (N × J) + E (I × J), with N << I or J

Non-unique solutions but with physical meaning (rotational/intensity ambiguities are present)

Constraints:
• C and S^T non-negative
• C or S^T scaled (normalization, closure)
• Other constraints (unimodality, local rank, selectivity, previous knowledge, ...)
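The rotational ambiguity mentioned above follows from D = C S^T = (C T)(T^-1 S^T) for any invertible matrix T. The short sketch below illustrates it with made-up numbers (the profiles and the matrix `T` are hypothetical); it also shows how a constraint such as non-negativity can reject a rotated solution.

```python
import numpy as np

# Hypothetical two-component system: 4 samples, 3 wavelengths
C = np.array([[1.0, 0.0], [0.6, 0.4], [0.2, 0.8], [0.0, 1.0]])
St = np.array([[1.0, 0.5, 0.1], [0.2, 0.6, 1.0]])
D = C @ St

# Any invertible T gives a different pair of factors with the same fit
T = np.array([[1.0, 0.2], [0.1, 1.0]])
C_rot = C @ T
St_rot = np.linalg.inv(T) @ St

same_fit = np.allclose(C_rot @ St_rot, D)         # identical data reproduction
violates_nonneg = bool((St_rot < 0).any())        # this rotation gives a negative "spectrum"
```

Here the rotated factors reproduce D exactly, yet one rotated spectrum contains a negative value, so a non-negativity constraint on S^T would exclude this particular T. Constraints narrow the set of feasible rotations in this way; they do not necessarily reduce it to a single solution.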

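Multivariate Curve Resolution Alternating Least Squares (MCR-ALS)

Extension to multiple data matrices

Column-wise augmented data matrix (NM = 3 matrices with NR1, NR2 and NR3 rows and NC common columns):

$$\begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} = \begin{bmatrix} C_1 \\ C_2 \\ C_3 \end{bmatrix} S^T$$

• Row-, concentration profiles: C1, C2, C3
• Column-, spectra profiles: S^T
• Quantitative information: Z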

Advantages of matrix augmentation (multiway data)

• Resolution local rank conditions are achieved in many situations for well designed experiments (unique solutions!)

• Rank deficiency problems can be more easily solved

• Unique decompositions are easily achieved for trilinear data (trilinear constraints)

• Constraints (local rank/selectivity and natural constraints) can be applied independently to each component and to each individual data matrix.

J. of Chemometrics, 1995, 9, 31-58; Chemometrics and Intell. Lab. Systems, 1995, 30, 133

Multivariate Curve Resolution – Alternating Least Squares (MCR-ALS)

Data exploration
• Determination of the number of components (e.g. by SVD)
• Building of initial estimates (C or S^T)

Input of external information as CONSTRAINTS
• Iterative optimisation of C and/or S^T by Alternating Least Squares (ALS) subject to constraints.

Fit and validation
• Check for satisfactory C S^T data reproduction.

The aim is the optimal description of the experimental data using chemically meaningful pure profiles.

An algorithm for Bilinear Multivariate Curve Resolution Models: Alternating Least Squares (MCR-ALS)

C and S^T are obtained by solving iteratively the two LS equations:

$$\min_{\hat{C}} \left\| \hat{D}_{PCA} - \hat{C}\, \hat{S}^T \right\|, \qquad \min_{\hat{S}^T} \left\| \hat{D}_{PCA} - \hat{C}\, \hat{S}^T \right\|$$

• Optional constraints (local rank, non-negativity, unimodality, closure, …) are applied at each iteration.
• Initial estimates of C or S are obtained from EFA or from pure variable detection methods.
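The two alternating least squares steps can be sketched as follows. This is a simplified illustration, not the MCR-ALS software: non-negativity is imposed by clipping negative values to zero (a crude stand-in for a proper non-negative least squares step), the spectra are normalised to unit length to fix the intensity ambiguity, and the function names, simulated Gaussian profiles and the choice of two rows of D as initial estimates are assumptions of the example.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def mcr_als(D, St0, n_iter=500, tol=1e-8):
    n = St0.shape[0]
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    D_pca = (U[:, :n] * s[:n]) @ Vt[:n]                  # PCA-reproduced data
    St = np.clip(St0, 0.0, None)
    C = np.clip(D_pca @ np.linalg.pinv(St), 0.0, None)   # first LS estimate of C
    lof_old = np.inf
    for _ in range(n_iter):
        # LS step for S^T given C, followed by non-negativity and normalisation
        St = np.clip(np.linalg.pinv(C) @ D_pca, 0.0, None)
        St /= np.maximum(np.linalg.norm(St, axis=1, keepdims=True), 1e-12)
        # LS step for C given S^T, followed by non-negativity
        C = np.clip(D_pca @ np.linalg.pinv(St), 0.0, None)
        # lack of fit (%) of the reproduced data
        lof = 100.0 * np.linalg.norm(D_pca - C @ St) / np.linalg.norm(D_pca)
        if abs(lof_old - lof) < tol:
            break
        lof_old = lof
    return C, St, lof

# Simulated two-component chromatographic-like data (noise-free)
t = np.linspace(0.0, 1.0, 50)
w = np.linspace(0.0, 1.0, 40)
C_true = np.column_stack([gauss(t, 0.35, 0.10), gauss(t, 0.60, 0.12)])
S_true = np.vstack([gauss(w, 0.30, 0.15), gauss(w, 0.70, 0.20)])
D = C_true @ S_true

# Initial estimates: two rows of D where one compound dominates
C_hat, St_hat, lof = mcr_als(D, St0=D[[5, 45]])
```

Further constraints (unimodality, closure, hard models) would be applied at the same two places in the loop, right after each least squares step.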

Constraints

Definition

Any chemical or mathematical feature obeyed by the profiles of the pure compounds in our data set.

• C and S^T can be constrained differently.
• The profiles within C and S can be constrained differently.

Constraints transform resolution algorithms into problem-oriented data analysis tools

Soft constraints

Non-negativity

[Figure: unconstrained profiles C* (with negative values) and constrained profiles Cc, plotted against retention times.]

Applies to: concentration profiles, spectra.

Unimodality

[Figure: unconstrained profiles C* and constrained profiles Cc, plotted against retention times.]

Applies to: reaction profiles, chromatographic peaks, voltammograms.
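The non-negativity and unimodality constraints can each be written as a small function that turns an unconstrained profile C* into a constrained one Cc. The sketch below is one simple way to do it, assuming a "clip to zero" rule for non-negativity and a "cut secondary maxima" rule for unimodality; the function names and the example profile are hypothetical.

```python
import numpy as np

def nonneg(profile):
    # Replace negative values by zero
    return np.clip(np.asarray(profile, dtype=float), 0.0, None)

def unimodal(profile):
    # Keep a single maximum: values may not rise again when moving away from the peak
    p = np.asarray(profile, dtype=float).copy()
    top = int(np.argmax(p))
    p[top:] = np.minimum.accumulate(p[top:])                     # right side: non-increasing
    p[:top + 1] = np.minimum.accumulate(p[top::-1])[::-1]        # left side: non-decreasing
    return p

c_star = np.array([-0.02, 0.10, 0.30, 0.20, 0.25, 0.05])   # unconstrained profile
c_c = unimodal(nonneg(c_star))                             # constrained profile
```

In this example the negative first value is set to zero and the secondary maximum at 0.25 is cut down to the level of its left neighbour (0.20).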

Soft constraints

Selectivity/local rank

Concentration selectivity/local rank constraint

[Figure: unconstrained profiles C* and constrained profiles Cc, plotted against retention times. In the selected region the constrained profile is forced to Cc < threshold values.]

We know that this region is not rank 3, but rank 2!

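Concentration correlation constraint (multivariate calibration)

[Diagram: within each ALS iteration of D = C S^T, the analyte profile c_ALS is selected from C, a local model is built with the calibration samples, and the updated profile is returned to C.]

• Select: the profile c_ALS from C, split into calibration samples (c_ALS^cal, with known reference values c_ref) and prediction samples (c_ALS^pred).

• Local model (calibration):

$$c_{ref} = b\, c_{ALS}^{cal} + b_0 + \text{Error}$$

• Prediction:

$$\hat{c}^{pred} = b\, c_{ALS}^{pred} + b_0$$

• Updated: the profile in C is replaced before the next estimation of S^T.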
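A minimal sketch of the correlation constraint for one analyte profile is given below. It assumes a univariate linear local model fitted with `np.polyfit` and assumes that the fitted b and b0 are applied to every sample of the profile; the function name, its arguments and the numbers are hypothetical.

```python
import numpy as np

def correlation_constraint(c_als, cal_idx, c_ref):
    """Rescale one ALS concentration profile with a local calibration model.

    c_als   : ALS profile of the analyte (all samples)
    cal_idx : indices of the calibration samples
    c_ref   : known reference concentrations of the calibration samples
    """
    c_als = np.asarray(c_als, dtype=float)
    # Local model: c_ref = b * c_als_cal + b0 + error
    b, b0 = np.polyfit(c_als[cal_idx], np.asarray(c_ref, dtype=float), 1)
    # Assumption: the same b, b0 are used to update calibration and prediction samples
    return b * c_als + b0, b, b0

c_als = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # arbitrary ALS units
cal_idx = [0, 1, 2]                              # first three samples are standards
c_ref = [0.1, 0.2, 0.3]                          # their reference concentrations
c_updated, b, b0 = correlation_constraint(c_als, cal_idx, c_ref)
```

After the update the profile is expressed in real concentration units, so the values of the prediction samples (the last two in this example) can be read directly as quantitative estimates.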

Trilinearity Constraint (flexible to every species)

Extension of MCR-ALS to multilinear systems

[Diagram: column-wise augmented model D = C S^T, with D formed by D1, D2 and D3.]

1. Selection of species profile (from the augmented C).
2. Folding species profile.
3. PCA, SVD of the folded profile: the 1st score gives the common shape; the loadings give the relative amounts!
4. Unfolding species profile.
5. Substitution of species profile (in C).

Unique Solutions!

R. Tauler, I. Marqués and E. Casassas. Journal of Chemometrics, 1998; 12, 55-75
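The folding, SVD and unfolding steps of the trilinearity constraint can be sketched for a single species as follows. The sketch assumes that all data matrices have the same number of rows, so that the augmented profile can be folded with a plain `reshape`; the function name and the example profile are hypothetical.

```python
import numpy as np

def trilinear_constraint(c_aug, n_matrices):
    """Force one species profile of the augmented C to share a common shape.

    c_aug : profile of one species in the column-wise augmented C,
            i.e. the profiles of all matrices stacked one after another.
    """
    c_aug = np.asarray(c_aug, dtype=float)
    folded = c_aug.reshape(n_matrices, -1).T          # rows x matrices
    U, s, Vt = np.linalg.svd(folded, full_matrices=False)
    # 1st score = common shape, 1st loadings = relative amounts in each matrix
    rank1 = s[0] * np.outer(U[:, 0], Vt[0])
    return rank1.T.reshape(-1)                        # unfold back to the augmented profile

shape = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
c_aug = np.concatenate([1.0 * shape, 2.0 * shape, 0.5 * shape])
c_aug_noisy = c_aug + 0.05 * np.sin(np.arange(c_aug.size))   # deviation from trilinearity
c_tri = trilinear_constraint(c_aug_noisy, n_matrices=3)
```

Because the constraint acts on one column of C at a time, it can be applied to some species and left out for others, which is the flexibility referred to in the slide title.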

Hard modeling constraints

Hard modeling: Mass balance or Closure constraint

[Figure: unconstrained profiles C* and constrained profiles Cc, plotted against pH (2 to 9). In Cc the profiles add up to ctotal at every pH.]

$$\sum_k c_k = c_{total}$$

Mass balance

Closed reaction systems
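A closure constraint can be sketched as a row-wise rescaling of the concentration matrix so that the species add up to the known total concentration. Proportional rescaling is only one possible way of imposing closure and is an assumption of this example, as are the function name and the numbers.

```python
import numpy as np

def closure(C, c_total):
    """Rescale each row of C so that the species concentrations sum to c_total."""
    C = np.clip(np.asarray(C, dtype=float), 0.0, None)
    row_sums = C.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)      # avoid division by zero
    # c_total may be a scalar or one value per row
    total = np.asarray(c_total, dtype=float).reshape(-1, 1)
    return C * (total / safe_sums)

C_star = np.array([[0.20, 0.10],
                   [0.15, 0.20],
                   [0.05, 0.30]])
C_c = closure(C_star, c_total=0.35)
```

The ratio between species within each row is preserved; only the overall scale of the row changes.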

Hard modeling constraints

Hard modeling: Mass action law and rate laws

[Figure: profiles C and constrained profiles Ccons, plotted against pH (2 to 9). Ccons is obtained by applying the physicochemical model.]

Physicochemical model
• Kinetic processes
• Equilibrium processes

Grey modeling using MCR-ALS: soft + hard modeling constraints

• The hard model is introduced as a new and essential constraint in the soft-modelling resolution process.

• It is applied in a flexible manner, like the soft-modelling constraints:
– To some or to all process profiles.
– To some or to all matrices in a three-way data set.
– Different hard models can be applied to different matrices in a three-way data set.

Grey modeling using MCR-ALS

[Figure: profiles C and constrained profiles Ccons, plotted against pH (2 to 9).]

• SM: soft model (non-negativity), giving CSM.
• HM: physicochemical model (mass action law, rate law) for kinetic processes and equilibrium processes, giving CHM.

1. Select the soft-modelled profiles to be constrained (CSM).

2. Non-linear fit of the selected profiles according to the hard model selected:

min(ssq(CSM-CHM))

ssq=f(CSM, model, parameters)

3. Update the soft-modelled profiles CSM by the fitted CHM.
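The three steps of the hard-modelling constraint can be sketched as a function that is called inside each ALS iteration. The example below is only illustrative: it assumes a first-order reaction A → B as the hard model, fits its single rate constant with an undamped Newton-Gauss iteration, and leaves a third, soft-modelled column (a drift) untouched. All names and numbers are hypothetical.

```python
import numpy as np

def hard_model(k, t):
    # Hypothetical hard model for illustration: first-order reaction A -> B
    a = np.exp(-k * t)
    return np.column_stack([a, 1.0 - a])

def hm_constraint(C, t, hm_cols, k0, n_iter=50, h=1e-7):
    C_sm = C[:, hm_cols]                          # 1. select the profiles to constrain
    k = float(k0)
    for _ in range(n_iter):                       # 2. non-linear fit: min ssq(C_SM - C_HM)
        r = (C_sm - hard_model(k, t)).ravel()
        J = -((hard_model(k + h, t) - hard_model(k, t)) / h).ravel()   # d r / d k
        dk = -(J @ r) / (J @ J)
        k += dk
        if abs(dk) < 1e-10:
            break
    C_new = C.copy()
    C_new[:, hm_cols] = hard_model(k, t)          # 3. replace C_SM by the fitted C_HM
    return C_new, k

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 40)
drift = 0.05 * t                                  # soft-modelled contribution
C = np.column_stack([hard_model(1.0, t) + 0.01 * rng.standard_normal((40, 2)), drift])
C_new, k_fit = hm_constraint(C, t, hm_cols=[0, 1], k0=0.3)
```

The fitted rate constant is obtained as a by-product of the constraint, which is the additional information that grey modeling adds to a pure soft-modelling resolution.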

Grey modeling applications using MCR-ALS

Examples:

1. Getting kinetic and analytical information from mixed systems (drift and interferents)

2. Using a physicochemical model to decrease resolution ambiguity and getting analytical information

3. pH induced transitions in hemoglobin

Grey modeling applications using MCR-ALS

Example 1 Getting kinetic information from mixed systems (drift and interferents)

Model: A → B → C (consecutive irreversible), k1 = k2 = 1

[Figure: simulated concentration profiles of A, B and C vs. time and pure spectra (absorbance vs. wavelengths). Two data sets are shown: kinetic process + drift and kinetic process + interferent, each with the concentration profile of the additional contribution.]

Kinetic model

$$[A] = [A]_0\, e^{-k_1 t}$$

$$[B] = [A]_0\, \frac{k_1}{k_2 - k_1} \left( e^{-k_1 t} - e^{-k_2 t} \right)$$

$$[C] = [A]_0 - [A] - [B]$$

CHM = f(k1, k2)

Kinetic process + drift/interferent
• A, B, C: HM
• Drift, interferent: SM

Anna de Juan, Marcel Maeder, Manuel Martínez, Romà Tauler. Analytica Chimica Acta 442 (2001) 337–350
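The hard model CHM = f(k1, k2) for the consecutive irreversible reaction A → B → C can be coded directly from the integrated rate laws. The sketch below is an illustration with hypothetical names; note that the general expression for [B] divides by (k2 - k1), so for equal rate constants, as in this example (k1 = k2 = 1), the limiting form [B] = [A]0 k1 t exp(-k1 t) is used instead.

```python
import numpy as np

def consecutive_abc(t, k1, k2, a0=1.0):
    """Concentration profiles of A, B, C for A -> B -> C (first-order steps)."""
    t = np.asarray(t, dtype=float)
    A = a0 * np.exp(-k1 * t)
    if np.isclose(k1, k2):
        # limit of the general expression when k2 -> k1
        B = a0 * k1 * t * np.exp(-k1 * t)
    else:
        B = a0 * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    C = a0 - A - B                                  # mass balance
    return np.column_stack([A, B, C])

t = np.linspace(0.0, 10.0, 101)
C_hm = consecutive_abc(t, k1=1.0, k2=1.0)           # the simulated case k1 = k2 = 1
C_near = consecutive_abc(t, k1=1.0, k2=1.0001)      # general formula, nearly equal constants
```

The two sets of profiles agree closely, which is a quick check that the special case is consistent with the general formula.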

[Figure: Kinetic process + drift. Resolved concentration profiles (a.u.) vs. time and absorptivities (a.u.) vs. wavelength channel, panels a) and b), for the HM and HSM treatments.]

[Figure: Kinetic process + interferent. Resolved concentration profiles vs. time and absorbance vs. wavelength channel, panels a) and b), for the HM and HSM treatments.]

Grey modeling applications using MCR-ALS

Example 1 Getting kinetic information from mixed systems (drift and interferents)

System                Algorithm   k1 (true value 1)   k2 (true value 1)
A,B,C (drift)         HM          1.40                0.83
                      HSM         1.16                0.90
A,B,C (interferent)   HM          1.16                0.89
                      HSM         0.95                1.05

Anna de Juan, Marcel Maeder, Manuel Martínez, Romà Tauler. Chemometrics and Intelligent Laboratory Systems 54 (2000) 123–141

Grey modeling applications using MCR-ALS

Example 2. Using a physicochemical model to decrease resolution ambiguity. Getting analytical information.

Chemical problem: multiequilibria systems

Quantitation of an analyte (H2A) in the presence of an interferent (H2B).

Measurements: FT-IR monitored pH titrations
• H2A (malic acid)
• H2B (tartaric acid)

[Figure: concentration vs. pH (pH 1 to 11). Highly overlapped concentration profiles.]

Data set
• Standard: H2A, monitored along pH
• Sample: H2A/H2B, monitored along pH

Difficulties for pure soft modelling
• Too correlated concentration profiles
• Too overlapped spectra
• Too ambiguous SM solutions
• Quantitation fails

Grey modeling applications using MCR-ALS

Example 3 Time effect on pH induced transitions in hemoglobin

Time effect on pH transitions (UV)

[Figure: SM-resolved concentration profiles vs. pH and spectra vs. wavelengths (350 to 700 nm) for the fresh solution, after 24 hours and after 48 hours.]

Species: 1, 2 Heme group unbound; 3 Native; 4 Heme bound (change in coordination)

Global description of the process: D = C S^T, with the time-dependent and the pH-dependent data matrices analysed together.

• Time-dependent acidic conformations evolve very similarly with pH (rank-deficiency).
• The kinetic matrix helps in the resolution of the acidic conformations in the pH-dependent process.
• The hard-modelling constraint applied to the kinetic process helps towards a less ambiguous recovery of the acidic conformations in the pH-dependent process.

Grey modeling applications using MCR-ALS

Example 3 Time effect on pH induced transitions in hemoglobin

Complete description SM + HM

[Figure: resolved spectra vs. wavelengths (350 to 700 nm), kinetic profiles vs. time (HM) and pH-dependent profiles vs. pH (SM).]

• All the pH-dependent conformations can be resolved, even the time-dependent ones.

• Additional kinetic information is obtained: k1 = 1.424e-5 ± 4e-8

Some References Soft+Hard (Grey) Modelling

• A. de Juan, M. Maeder, M. Martínez, R. Tauler. Chemom. Intell. Lab. Sys. 54 (2000) 123.

• A. de Juan, M. Maeder, M. Martínez, R. Tauler. Anal. Chim. Acta 442 (2001) 337.

• J. Diewok, A. de Juan, R. Tauler, B. Lendl. Applied Spectroscopy 56 (2002) 40-50.

• J. Diewok, A. de Juan, M. Maeder, R. Tauler, B. Lendl. Analytical Chemistry 75 (2003) 641-647.

Acknowledgements

• Chemometrics Group (UB and IIQAB-CSIC)
– Staff: Romà Tauler, Javier Saurina, Anna de Juan, Raimundo Gargallo
– Post-doc: Montse Vives, Mónica Felipe
– PhD: Susana Navea, Joaquim Jaumot, Emma Peré-Trepat, Elisabeth Teixido
– Master: Silvia Termes, Silvia Mas, Gloria Muñoz, Marta Terrado, Xavier Puig

• Manel Martínez (University of Barcelona, Spain)
• Marcel Maeder (University of Newcastle, Australia)
• Josef Diewok (University of Vienna, Austria)