Download - Joint Selection of Wavenumber Regions for MidIR and Raman … · 2015-05-29 · Explanatory variables are organized in a multitable in which intervals and individual variables are


Lidwine Grosmaire, Pedro Maldonado-Alvarado, Christelle Reynès, Robert Sabatier, Dominique Dufour, Thierry Tran and Jean-Louis Delarbre

Joint Selection of Wavenumber Regions for MidIR and Raman Spectra and Variables in PLS Regression using Genetic Algorithms

The Context

Empirical and small-scale processing

Good breadmaking ability

Interest

Increased production and consumption of cassava

From the crop to the starch

2012 EFFoST Annual Meeting, 20-23 November 2012 • Montpellier, France

Varietal and process impacts on breadmaking ability

Standardize and scale-up the process

Physicochemical parameters

Improve product quality

Industrial development of new gluten-free bread products

The Aim

The Data

Physicochemical parameters

Spectral data

Predicting the expansion ability

[Figure: data blocks used to model Y with PLS1: X1 (c, physicochemical parameters including RVA, columns 1 to 13), X2 (IR, columns 1 to 3351) and X3 (Ra, columns 1 to 4562), with rows 1 to 52; relevant parameters are selected by selGAmPLS. Inset: Raman spectrum, intensity (counts) vs. Raman shift (cm-1), 500 to 3 500.]

Breadmaking ability

Amylose content

RVA parameters (12 variables)

Mid-infrared spectra (3351 variables)

Raman spectra (4562 variables)

How can the breadmaking ability be explained from our data with a statistical regression method while selecting variables of different types: individual variables and intervals?

Explanatory variables are organized in a multitable in which intervals and individual variables are selected in order to predict one variable of interest: the breadmaking capacity.

A Genetic Algorithm (GA) was developed in a context of discrimination, jointly with the PLS1 method: this method is called selGAmPLS.
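As an illustration of the joint selection described above, the following Python sketch shows one possible way to encode a candidate solution: a bit mask for the individual physicochemical variables plus a few (start, width) intervals per spectral block, decoded into column indices of the concatenated multitable. The block sizes come from the poster; everything else (the names BLOCKS, random_chromosome, decode and mutate, the number and maximum width of intervals, the mutation rule) is an assumption made for illustration and is not the authors' selGAmPLS implementation.

```python
import random

# Block sizes as given on the poster; the names are hypothetical.
BLOCKS = {"physchem": 13, "midIR": 3351, "raman": 4562}
INDIVIDUAL = {"physchem"}          # selected variable by variable
SPECTRAL = ("midIR", "raman")      # selected as wavenumber intervals


def random_chromosome(rng, n_intervals=2, max_width=30, p_on=0.3):
    """Draw one candidate: a bit mask for the individual variables plus
    (start, width) intervals for each spectral block (assumed encoding)."""
    chrom = {"physchem": [rng.random() < p_on for _ in range(BLOCKS["physchem"])]}
    for name in SPECTRAL:
        size = BLOCKS[name]
        intervals = []
        for _ in range(n_intervals):
            width = rng.randint(1, max_width)
            start = rng.randint(0, size - width)   # interval stays inside the block
            intervals.append((start, width))
        chrom[name] = intervals
    return chrom


def decode(chrom):
    """Map a chromosome to sorted column indices of the concatenated table."""
    cols, offset = set(), 0
    for name, size in BLOCKS.items():
        if name in INDIVIDUAL:
            cols.update(offset + j for j, on in enumerate(chrom[name]) if on)
        else:
            for start, width in chrom[name]:
                cols.update(range(offset + start, offset + start + width))
        offset += size
    return sorted(cols)


def mutate(chrom, rng, shift=5):
    """Flip one individual bit and shift one interval per spectral block."""
    new = {"physchem": list(chrom["physchem"])}
    j = rng.randrange(len(new["physchem"]))
    new["physchem"][j] = not new["physchem"][j]
    for name in SPECTRAL:
        intervals = list(chrom[name])
        k = rng.randrange(len(intervals))
        start, width = intervals[k]
        start = min(max(0, start + rng.randint(-shift, shift)), BLOCKS[name] - width)
        intervals[k] = (start, width)
        new[name] = intervals
    return new
```

Here decode(chrom) returns the columns that would be passed to PLS1 to score a candidate; the scoring criterion itself is not specified on the poster.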

The Results

Fig 2: Final GA population characteristics: selected variables are indicated by black points

Table 1: Comparison with the results of other methods (number of selected variables, number of retained PLS components, R² and cross-validation R²).

The 10 final populations are quite close, indicating a global convergence of the GA.

Individual populations seem to have converged.

Method      # var   # comp   R²       R²CV
PLS         7926    7        0.7836   0.6605
PLS + VIP   4       3        0.7210   0.6650
selGAmPLS   311     12       0.9936   0.8273
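Table 1 reports R² and a cross-validation R², but the poster does not state the cross-validation scheme. As a hedged sketch, the pure-Python code below fits PLS1 with the NIPALS algorithm and computes a leave-one-out cross-validated R² as 1 - PRESS/SS_tot. The leave-one-out choice and all function names (pls1_fit, pls1_predict, r2, r2_loo) are assumptions for illustration, not the authors' code.

```python
def _center(X, y):
    """Column-center X and center y; return the means for later prediction."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    return Xc, yc, mx, my


def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS with deflation; keeps (weights, loadings, q) per component."""
    X, y, mx, my = _center(X, y)
    n, p = len(X), len(X[0])
    comps = []
    for _ in range(n_comp):
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break                                   # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]   # scores
        tt = sum(v * v for v in t)
        pl = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(y[i] * t[i] for i in range(n)) / tt
        X = [[X[i][j] - t[i] * pl[j] for j in range(p)] for i in range(n)]
        y = [y[i] - q * t[i] for i in range(n)]
        comps.append((w, pl, q))
    return mx, my, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    mx, my, comps = model
    x = [a - m for a, m in zip(x, mx)]
    yhat = my
    for w, pl, q in comps:
        t = sum(a * b for a, b in zip(x, w))
        yhat += q * t
        x = [a - t * b for a, b in zip(x, pl)]
    return yhat


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def r2_loo(X, y, n_comp):
    """Leave-one-out cross-validated R2: each sample is predicted by a
    model fitted without it (assumed scheme)."""
    preds = []
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        preds.append(pls1_predict(model, X[i]))
    return r2(y, preds)
```

In a GA-based selection, X would be restricted to the columns chosen by a candidate before calling r2_loo, so that the cross-validated R² can serve as one possible fitness value.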

Physicochemical parameters

4 RVA parameters

Peak Viscosity

Holding Strength

Breakdown

Relative Breakdown

Spectroscopic variables

4 spectral regions

Crystalline order of starch

Selected variables

Conclusion

Genetic Algorithms provide a very adaptable and efficient solution when dealing both with several kinds of variable selection (individual vs. intervals) and with multiway tables.

The results obtained are very interesting for predictive use: selGAmPLS reaches a cross-validation R² of 0.8273, against 0.6605 for PLS on all 7926 variables and 0.6650 for PLS + VIP (Table 1). In terms of interpretation, the method highlighted the importance of some physicochemical variables and selected a small number of short intervals in the spectroscopic data.

The data selected are related to the water absorptivity and the crystalline state of starch and play a key role in breadmaking ability.

7926 variables

The Problem

Fig 1: Selection of relevant intervals and variables in multitables with a GA for modelling Y with PLS