Supplementary Information
Quantifying the influence of surface physico-
chemical properties of biosorbents on heavy metal
adsorption
Chaamila Pathiranaa,b,c, Abdul M. Ziyathd, K.B.S.N. Jinadasac, Prasanna Egodawattab,
Sarina Sarinab, Ashantha Goonetillekeb*
aDepartment of Forestry and Environmental Science, University of Sri Jayewardenepura,
Nugegoda, Sri Lanka
bScience and Engineering Faculty, Queensland University of Technology (QUT), GPO Box
2434, Brisbane, 4001, Queensland, Australia
cDepartment of Civil Engineering, University of Peradeniya, Sri Lanka
dZedz Consultants Pty Ltd, Hillcrest, QLD 4118, Australia
[email protected]; [email protected]; [email protected];
[email protected]; [email protected]; [email protected]
*Corresponding author
Ashantha Goonetilleke
1. Selection of Biosorbents
For the selection of biosorbents, PROMETHEE (Preference Ranking Organisation METHod
for Enrichment Evaluations) which is a Multi Criteria Decision Making (MCDM) Technique
was employed. MCDM techniques are employed to help with the decision making process
when multi variable problems are involved. From the various MCDM methods available,
PROMETHEE is considered as a relatively sophisticated method compared to the others
(Brans, Vincke and Mareschal 1986; Keller, Massart and Brans 1991).
PROMETHEE is a non-parametric data analysis method used to rank the actions/objects on
the basis of a set of pre-determined criteria. For each of the variables in the data matrix, the
degree of preference of one object over another is assessed. The ranking order is developed
by calculating the net ranking flow (φ value) for the available objects/actions on the
basis of a range of criteria (Ayoko et al. 2007; Podvezko and Podviezko 2010). To calculate
the φ values, each criterion must be provided with three conditions: a preference function, a
preference order (maximise/minimise) and a weighting. The PROMETHEE algorithm then
employs a number of steps to calculate the φ values between objects as explained elsewhere
(Keller, Massart and Brans 1991; Kokot and Phuong 1999). Visual PROMETHEE software
was used for the analysis.
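The net ranking flow calculation outlined above can be sketched in base R. This is a minimal illustration using the usual (step) preference function, hypothetical criterion values and weights; it is not the Visual PROMETHEE implementation used in the study.

```r
# Hypothetical decision matrix: 3 actions (rows), 2 criteria (columns),
# both criteria to be maximised
X <- rbind(a1 = c(10, 2),
           a2 = c(8, 4),
           a3 = c(6, 1))
w <- c(0.6, 0.4)                  # criterion weights (sum to 1)

# Usual (step) preference function: P = 1 if the difference d > 0, else 0
pref <- function(d) as.numeric(d > 0)

n <- nrow(X)
phi <- numeric(n)
for (a in 1:n) {
  pos <- neg <- 0
  for (b in 1:n) {
    if (a == b) next
    # aggregated preference of a over b, and of b over a
    pos <- pos + sum(w * pref(X[a, ] - X[b, ]))
    neg <- neg + sum(w * pref(X[b, ] - X[a, ]))
  }
  # net ranking flow: positive flow minus negative flow
  phi[a] <- (pos - neg) / (n - 1)
}
names(phi) <- rownames(X)
phi                               # actions are ranked by decreasing phi
```

The net flows sum to zero over all actions, which is a useful sanity check on any PROMETHEE II implementation.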
Table S1 PROMETHEE II complete ranking results for the five biosorbents
Material               φ       Rank
Coconut shell biochar  0.0589  1
Coir pith              0.0342  2
Rice straw             0.0304  3
Rice husk              0.0228  4
Tea waste              0.0175  5
Tea waste (TW) and coconut shell biochar (CSB), ranked 5 and 1 respectively, were selected
for the preparation of material mixtures, as the variability of physico-chemical parameters
was highest between them.
Figure S1 Photographs of the two selected biosorbents: (a) tea factory waste; (b) coconut
shell biochar
Table S2 Weight percentage of TW and CSB used to generate biosorbent mixtures
Sample            1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
Weight % of CSB  100   95   90   85   80   75   70   65   60   55   50   45   40   35   30   25   20   15   10    5    0
Weight % of TW     0    5   10   15   20   25   30   35   40   45   50   55   60   65   70   75   80   85   90   95  100
2. ANOVA results of the material mixtures for different variables
Table S3 Significance of p values obtained by ANOVA for each variable in the material
mixtures
Variable   p value (ANOVA)
SSA        < 0.001
PS         < 0.005
PV         < 0.001
ZP         > 0.5
TAG        < 0.001
TBG        < 0.005
3. Statistical approach adopted
Figure S2 Statistical approach adopted to investigate the influence of biosorbent physico-
chemical properties on heavy metal adsorption (SSA specific surface area, PV pore volume,
PS pore size, ZP zeta potential, TAG total acidic group and TBG total basic group)
4. Pearson product-moment correlation coefficient (PPMCC)
The Pearson product-moment correlation coefficient (PPMCC) is a measure of the strength
and direction of association that exists between two variables measured at an interval scale.
PPMCC is defined as the covariance of the two variables divided by the product of their
standard deviations. It takes a value between +1 and −1, where +1 implies a total positive
linear correlation, 0 implies no linear correlation, and −1 a total negative linear
correlation (Bruce and Bruce 2017).
This test involves four assumptions: 1. Variables are measured at the interval or ratio level
(i.e., they are continuous); 2. There is a linear relationship between the two variables; 3.
There should be no significant outliers; and 4. The variables should be approximately
normally distributed. Statistical inference based on Pearson's correlation coefficient often
involves running a permutation (resampling) test.
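The definition above can be checked directly in base R against the built-in cor() function, and the permutation test can be sketched with sample(); the data vectors below are illustrative only, not the study's measurements.

```r
# Illustrative data vectors (hypothetical, not the study's measurements)
x <- c(2.1, 3.4, 4.0, 5.6, 7.2)
y <- c(1.0, 2.2, 2.8, 4.1, 5.9)

# PPMCC from its definition: covariance divided by the product of
# the standard deviations
r_def <- cov(x, y) / (sd(x) * sd(y))

# The built-in Pearson correlation gives the same value
r_cor <- cor(x, y, method = "pearson")

# Simple permutation (resampling) test: shuffle one variable and count
# how often the permuted |r| reaches the observed |r|
set.seed(1)
r_perm  <- replicate(10000, cor(x, sample(y)))
p_value <- mean(abs(r_perm) >= abs(r_cor))
```

The perm.cor.test() function from the RVAideMemoire package, used later in this document, performs essentially this resampling procedure.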
5. Principal components analysis (PCA)
PCA is a multivariate data analysis technique employed to assess and visualize the
interdependencies among variables and to reduce the significantly correlated (redundant)
variables which serve to measure the same construct. Matrices of data containing significant
proportions of interrelated variables are converted to a set of new hypothetical variables
known as principal components (PCs) which are orthogonal (uncorrelated) to one another.
They are ordered so that the first few PCs represent most of the variation in
the original data matrix. PCs reflect both common and unique variance of the original
variables (conversely, common factor analysis aims to exclude unique variance) and serve to
reduce the number of variables under assessment, allowing the identification and assessment of
groups of interrelated variables (Salkind 2010).
The importance of a component for a given observation can be quantified by the squared
cosine:

cos²(i,l) = f²(i,l) / d²(i,g)

where d²(i,g) is the squared distance of observation i to the origin, computed (via the
Pythagorean theorem) as the sum of the squared factor scores of that observation, and
f(i,l) is the factor score of observation i on component l. Components with a large
cos²(i,l) value contribute a relatively large portion of the total distance and are
therefore important for that observation.
The distance to the centre of gravity is also defined for supplementary observations, for
which the squared cosine can be computed and remains meaningful. Therefore, the cos² value
can help to identify the components that are important for interpreting both active and
supplementary observations.
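As a cross-check of the definition above, the squared cosines of the observations can be computed directly from a prcomp fit in base R; the data matrix below is random illustrative data, not the study's data matrix.

```r
set.seed(42)
# Illustrative data: 20 observations, 4 variables (random, not the study's data)
dat <- as.data.frame(matrix(rnorm(80), ncol = 4))

pca    <- prcomp(dat, center = TRUE, scale. = TRUE)
scores <- pca$x                  # factor scores f(i, l)

# Squared distance of each observation to the origin: the sum of its
# squared factor scores (Pythagorean theorem)
d2 <- rowSums(scores^2)

# Squared cosine of observation i on component l
cos2_obs <- scores^2 / d2

# Each observation's cos2 values sum to 1 across all components
rowSums(cos2_obs)
```

By construction every cos² value lies between 0 and 1, and the values for one observation sum to 1 over all components.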
Table S4 cos² values for PC1 and PC2
Variable  cos² for PC1    Variable  cos² for PC2
TAG       0.9068          SSA       0.8150
ZP        0.7091          PS        0.7765
SSA       0.4609          PV        0.7116
PS        0.2076          ZP        0.1140
PV        0.0884          TBG       0.0882
TBG       0.0017          TAG       0.0515
6. Penalized Regression: Ridge, Lasso and Elastic Net Regressions
In contrast to the standard linear model fitted by ordinary least squares, penalized
regression creates a linear regression model that is penalized for having too many
variables by adding a constraint to the equation. Penalized regression methods are also
known as shrinkage or regularization methods. Imposing this penalty shrinks the coefficient
values towards zero, allowing the less contributive variables to have coefficients close
to or equal to zero. The most common penalized regression methods are ridge regression,
lasso regression and elastic net regression.
Lasso regression employs L1 regularization (lasso penalization), which adds a penalty equal
to the sum of the absolute values of the coefficients. It can shrink some parameters exactly
to zero, so that some variables play no role in the model, making variable selection an
intrinsic feature of the method. L2 regularization, used in ridge regression (ridge
penalization), adds a penalty equal to the sum of the squared values of the coefficients;
it forces the parameters to be relatively small but never exactly zero, so all variables
are retained. Lambda is the penalization parameter shared by both methods.
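The contrasting shrinkage behaviour of the two penalties can be sketched in base R for the special case of an orthonormal design, where the lasso and ridge solutions have closed forms (soft-thresholding and proportional shrinkage, respectively). The coefficient values are hypothetical, and this is an illustration of the penalties, not the glmnet fitting procedure.

```r
# Hypothetical ordinary least squares coefficient estimates
b_ols  <- c(3.0, 1.2, 0.4, -0.8)
lambda <- 0.5                     # penalization parameter

# Lasso (L1) with an orthonormal design: soft-thresholding --
# coefficients smaller than lambda in magnitude become exactly zero
b_lasso <- sign(b_ols) * pmax(abs(b_ols) - lambda, 0)

# Ridge (L2) with an orthonormal design: proportional shrinkage --
# coefficients become smaller but are never exactly zero
b_ridge <- b_ols / (1 + lambda)

b_lasso   # 2.5  0.7  0.0  -0.3 (third coefficient dropped from the model)
b_ridge   # all four coefficients shrunken but retained
```

This illustrates the point made above: lasso performs variable selection, while ridge keeps every variable.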
Elastic net is a mix of both L1 and L2 regularization (James et al. 2013; Bruce and Bruce
2017). A penalty is applied to both the sum of the absolute values and the sum of the
squared values of the coefficients. The parameter alpha sets the ratio between L1 and L2
regularization, giving a hybrid behavior with variable selection as an intrinsic feature.
Optimal values for lambda and alpha are selected by the algorithm using the RMSE (root
mean squared error). The RMSE is the square root of the mean of the squared residuals and
is a measure of the absolute fit of the model to the data; in other words, it indicates how
close the observed measurements are to the model's predicted values (Bruce and Bruce 2017;
James et al. 2013; Friedman, Hastie and Tibshirani 2009).
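As a minimal sketch, the RMSE described above can be computed directly in base R; the observed and predicted values are hypothetical.

```r
# Hypothetical observed measurements and model predictions
observed  <- c(2.0, 3.5, 5.0, 6.5)
predicted <- c(2.2, 3.1, 5.3, 6.4)

# RMSE: the square root of the mean of the squared residuals
rmse <- sqrt(mean((observed - predicted)^2))
rmse
```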
7. Packages and libraries used for statistical analysis in RStudio.
The following packages and libraries were loaded.
library(devtools)
library(caret)
library(factoextra)
library(elasticnet)
library(glmnet)
library(VIF)
library(fmsb)
library(tidyverse)
library(plyr)
library(scales)
library(grid)
library(RVAideMemoire)
8. Code used for statistical analysis in RStudio.
# Loading Libraries for analysis
library(devtools)
library(caret)
library(factoextra)
library(elasticnet)
library(glmnet)
library(VIF)
library(fmsb)
library(tidyverse)
# datamatrix is the original data matrix.
# DataPbAll is a data matrix created by removing Cu and Cd adsorption data
# from the original data matrix.
# DataCuAll is a data matrix created by removing Pb and Cd adsorption data
# from the original data matrix.
# DataCdAll is a data matrix created by removing Pb and Cu adsorption data
# from the original data matrix.
# Prepare the correlation matrix
correlationMatrix <- cor(datamatrix)
# summarize the correlation matrix
print(correlationMatrix)
# Permutation test for PPMCC. Example: PPMCC between PV and SSA is tested
# with 10000 resamples. perm.cor.test() is provided by the RVAideMemoire package.
library(RVAideMemoire)
x <- datamatrix$PV
y <- datamatrix$SSA
perm.cor.test(x, y, nperm = 10000, progress = TRUE)
# Prepare PCA analysis
pcatest1 <- prcomp(datamatrix,
center = TRUE,
scale. = TRUE)
# Print and summarize output
print(pcatest1)
summary(pcatest1)
# Attributes of PCA test
res.var <-get_pca_var(pcatest1)
res.var$coord
res.var$contrib
# Obtain cos2 values for the variables
res.var$cos2
# Print the biplot for PCA
fviz_pca_biplot(pcatest1,
pointsize = 2,
col.var="dark blue",
repel = TRUE)
# Assessing VIF of variables was done using the following code. model1 is a
# linear model where Pb is the dependent variable. Cu or Cd can also be used as
# the dependent variable and the VIF values will be the same.
model1 <- lm(Pb~., data=DataPbAll)
car::vif(model1)
#Checking for aliased coefficients
alias(model1)
# Building models with Enet using glmnet package.
# DataCuSel = A data matrix created by removing the following from the original
# data matrix: Pb, Cd, Carboxylic, Phenolic, Lactonic, PV.
# DataPbSel = A data matrix created by removing the following from the original
# data matrix: Cu, Cd, Carboxylic, Phenolic, Lactonic, PV.
# DataCdSel = A data matrix created by removing the following from the original
# data matrix: Pb, Cu, Carboxylic, Phenolic, Lactonic, PV.
# Defining repeated k-fold cross validation (k=10, repeats=10).
fitControl <- trainControl(method = 'repeatedcv',
number = 10,
repeats=10,
search = "grid")
# Building model for Cu.
model.Cu <- caret::train(Cu~ .,
data = DataCuSel,
method="glmnet",
trControl = fitControl,
tuneLength = 20)
# Printing the model.
summary(model.Cu)
print(model.Cu)
# Attributes for final model
model.Cu$finalModel
model.Cu$bestTune
model.Cu$coefnames
# Coefficients when lambda is set to the optimal value
FinModCu <- model.Cu$finalModel
coef(FinModCu, s=model.Cu$finalModel$lambdaOpt)
model.Cu$finalModel$lambdaOpt
# Define and summarize the importance of variables
impCu <- varImp(model.Cu, scale=FALSE)
print(impCu)
# Building model for Cd.
model.Cd <- caret::train(Cd~ .,
data = DataCdSel,
method="glmnet",
trControl = fitControl,
tuneLength = 20)
# Printing the model.
summary(model.Cd)
print(model.Cd)
# Attributes for final model
model.Cd$finalModel
model.Cd$bestTune
model.Cd$coefnames
# Coefficients when lambda is set to the optimal value
FinModCd <- model.Cd$finalModel
coef(FinModCd, s=model.Cd$finalModel$lambdaOpt)
model.Cd$finalModel$lambdaOpt
# Define and summarize the importance of variables
impCd <- varImp(model.Cd, scale=FALSE)
print(impCd)
# Building model for Pb.
model.Pb <- caret::train(Pb~ .,
data = DataPbSel,
method="glmnet",
trControl = fitControl,
tuneLength = 20)
# Printing the model.
summary(model.Pb)
print(model.Pb)
# Attributes for final model
model.Pb$finalModel
model.Pb$bestTune
model.Pb$coefnames
# Coefficients when lambda is set to the optimal value
FinModPb <- model.Pb$finalModel
coef(FinModPb, s=model.Pb$finalModel$lambdaOpt)
model.Pb$finalModel$lambdaOpt
# Define and summarize the importance of variables
impPb <- varImp(model.Pb, scale=FALSE)
print(impPb)
References
Ayoko, Godwin A, Kirpal Singh, Steven Balerea and Serge Kokot. 2007. "Exploratory
multivariate modeling and prediction of the physico-chemical properties of surface
water and groundwater." Journal of Hydrology 336 (1-2): 115-124.
Brans, Jean-Pierre, Ph Vincke and Bertrand Mareschal. 1986. "How to select and how to rank
projects: The PROMETHEE method." European Journal of Operational Research 24
(2): 228-238.
Bruce, Peter and Andrew Bruce. 2017. Practical Statistics for Data Scientists: 50 Essential
Concepts. O'Reilly Media.
Friedman, Jerome, Trevor Hastie and Robert Tibshirani. 2009. "glmnet: Lasso and
elastic-net regularized generalized linear models." R package version 1 (4).
James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. 2013. An introduction
to statistical learning. Vol. 112: Springer.
Keller, HR, DL Massart and JP Brans. 1991. "Multicriteria decision making: a case study."
Chemometrics and Intelligent Laboratory Systems 11 (2): 175-189.
Kokot, S and Tran Dong Phuong. 1999. "Elemental content of Vietnamese rice. Part 2.
Multivariate data analysis." Analyst 124 (4): 561-569.
Podvezko, Valentinas and Askoldas Podviezko. 2010. "Use and choice of preference
functions for evaluation of characteristics of socio-economical processes."
Salkind, Neil J. 2010. Encyclopedia of research design. Vol. 1: Sage.