8/20/2019 A graph theoretical approach to SPR
Symposium on Graph Theory in Chemistry
Graph-Theoretical Approach
to Structure-Property Relationships
Zlatko Mihalić
Faculty of Science and Mathematics, The University of Zagreb, Strossmayerov trg 14, 41000 Zagreb, The Republic of Croatia
Nenad Trinajstić
The Rugjer Bošković Institute, P.O.B. 1016, 41001 Zagreb, The Republic of Croatia
A fundamental concept of chemistry is that the structural characteristics of a molecule are responsible for its properties (1). This was pointed out in the middle of the last century by Crum Brown and Fraser (2), who had also devised one of the first structure-property models. However, the earliest work in which this relationship was observed (the toxicity of methyl and amyl alcohols) was a thesis by Cros in 1863 (3).

A Topological Model of Matter

The origin of the structure-property concept can be traced (4) to the work of the Croatian Jesuit priest, scientist, and philosopher Rugjer Josip Bošković (5), who introduced the idea of representing atoms as points in space (6). (His major work was the theory of a single law of forces.) By allowing the point atoms to assume a variety of different arrangements, Bošković was able to account for the existence of different substances.

In this way the Bošković model may be considered as the forerunner of a topological model for the structure of matter. Bošković's fundamental idea, which is of the greatest importance in chemistry, was that substances have different properties because they have different structures. This idea was used, for example, by Davy to rationalize the difference between diamond and graphite (4, 7).
Table 1. List of Selected Topological Indices

Topological index   Standard symbol   Structural interpretation                                                         Author (Year)
Wiener number       W                 Sum of distances in a molecular graph                                             Wiener (1947)
Hosoya index        Z                 Sum of counts of nonadjacent edges in a molecular graph                           Hosoya (1971)
Randić index        χ                 Sum of weighted edges in a molecular graph                                        Randić (1975)
Balaban index       J                 Sum of weighted distances in a molecular graph                                    Balaban (1982)
Schultz index       MTI               Sum of elements of the structural row matrix v[A + D] of a molecular graph        Schultz (1989)
Harary number       H                 Sum of squares of reciprocal distances in a molecular graph                       Plavšić, Nikolić, Trinajstić (1991)
Table 2. List of Properties that Are Desirable for Topological Indices as Proposed by Randić (18)

1. Direct structural interpretation
2. Good correlation with at least one molecular property
3. Good discrimination of isomers
4. Locally defined
5. Generalizable
6. Linearly independent
7. Simplicity
8. Not based on physical or chemical properties
9. Not trivially related to other indices
10. Efficiency of construction
11. Based on familiar structural concepts
12. Correct size dependence
13. Gradual change with gradual change in structures
QSPR

The structure-property relationships quantify the connection between the structure and properties of ...

... proliferation will stop in the near future. Here we will review only several selected topological indices. Table 1 lists six topological indices that will be considered in this report. Table 2 gives a list of useful properties that are desirable for topological indices (18).
The desirable properties proposed by Randić (18) represent the very high level of sophistication that a topological index should achieve. All six indices listed in Table 1 approach this ideal. Their weakest point is the discrimination of isomers. This particular property is rather low for all topological indices considered here except the Balaban index (19-22). However, this is the weak point of most topological indices, except for molecular identification numbers (23). Nonetheless, the low discriminatory power of many indices does not prevent them from being useful descriptors in structure-property-activity modelling.

In the next section we will give a brief survey of elementary (chemical) graph-theoretical concepts. This section will be followed by a section containing definitions of the six selected topological indices. In the fourth section a design of the structure-property relationships will be delineated. Then a didactic example will be presented.
Elementary Graph Theoretical Concepts

We will cover only those graph-theoretical concepts that will be used in this report. In doing so, we will follow the book Graph Theory by Frank Harary (24) and both editions of our book Chemical Graph Theory (8, 16, 25).

Graph theory is a branch of discrete mathematics, related to topology and combinatorics. It deals with the way objects are connected and with all the consequences of the connectivity. The connectivity in a system is, thus, a fundamental quality of graph theory.

Chemical graph theory is a branch of mathematical chemistry, and consequently of theoretical chemistry. It is concerned with handling chemical graphs, that is, graphs that represent chemical systems. Hence, chemical graph theory deals with analyses of all consequences of connectivity in a chemical system. In other words, chemical graph theory is concerned with all aspects of the application of graph theory to chemistry.
The Concept of a Graph in Graph Theory

The central concept in graph theory is that of a graph. For a graph theorist, a graph is the application of a set on itself, that is, a collection of elements of the set and of binary relations between these elements. Graphs are one-dimensional objects, but they can be embedded or realized in spaces of higher dimensions.

For a chemist, the two-dimensional realization of a graph is more appealing, that is, a set of vertices (points) and of edges (lines) joining these vertices. A graph G can be visualized by a diagram when the vertices are drawn as small circles or dots, and the edges as lines or curves connecting the appropriate circles. Because a diagram of a graph completely describes the graph, it is customary and convenient to refer to the diagram of the graph as the graph itself.

Figure 1. A diagram of a labeled (numbered) graph, showing vertices as circles and edges as lines. The graph is actually a one-dimensional entity, but it can be realized in two dimensions, as shown here.

Mainly due to their diagrammatic representation, graphs have appeal as structural models in science, in general, and in chemistry, in particular (26, 27). As an example, Figure 1 shows a diagram of a labeled graph. A graph is called labeled when a specific numbering of its vertices is introduced.
The Concept of a Graph in Chemistry

In chemistry, graphs can be used to represent a variety of chemical objects such as molecules, reactions, crystals, polymers, and clusters. The common feature of chemical systems is the presence of sites and connections between them. Sites can be atoms, electrons, molecules, molecular fragments, intermediates, etc., while the connections between sites can represent bonds, reaction steps, van der Waals forces, etc. Chemical systems can be represented by chemical graphs using a simple conversion rule: Sites are replaced by vertices and connections by edges.

Molecular Graphs

A special class of chemical graphs are molecular graphs. Molecular graphs (or constitutional graphs) are chemical graphs that represent the constitution of molecules. In these graphs vertices correspond to individual atoms, and edges correspond to the bonds between them. An interesting historical detail is related to the concept of the molecular graph: The term graph was introduced by English mathematician Sylvester (28) in 1878 on the basis of the constitutional formulas used by the chemists of his day.

To simplify the manipulation of molecular graphs, hydrogen-depleted graphs are often used. Such graphs represent only the molecular skeletons, omitting hydrogen atoms and their bonds. As an example, Figure 2 gives a labeled molecular hydrogen-depleted graph that depicts the carbon skeleton of 2,3,4-trimethylhexane.

Figure 2. A labeled, hydrogen-depleted, molecular graph corresponding to the carbon skeleton of 2,3,4-trimethylhexane. The vertices correspond to atoms, and the edges correspond to chemical bonds.
Analyzing and Comparing Graphs

Two graphs G1 and G2 are isomorphic if there exists a one-to-one correspondence between their vertex sets V(G1) and V(G2), which induces a one-to-one correspondence between their edge sets E(G1) and E(G2). An invariant of a graph G is a quantity associated with G that has the same value for any graph that is isomorphic with G. It should be noted that topological indices are graph invariants.

Two vertices i and j of a graph G are adjacent if there is an edge joining them; the vertices i and j are then incident to such an edge. Similarly, two edges of G are adjacent if they have a vertex in common. The valency of a vertex i of G is the number of edges incident to i. This is denoted by d(i).
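The definitions of adjacency and valency can be illustrated with a minimal sketch in Python. The vertex numbering below (main chain 1 to 6, methyl carbons 7 to 9) is an assumption made for illustration and need not match the labeling of Figure 2; the helper names `valencies` and `are_adjacent` are hypothetical.

```python
# Hydrogen-depleted graph of 2,3,4-trimethylhexane as a list of edges.
# Assumed labeling: chain carbons 1..6, methyl carbons 7, 8, 9 on C2, C3, C4.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8), (4, 9)]


def valencies(edges):
    """Return {vertex: d(vertex)}, the number of edges incident to each vertex."""
    d = {}
    for i, j in edges:
        d[i] = d.get(i, 0) + 1
        d[j] = d.get(j, 0) + 1
    return d


def are_adjacent(edges, i, j):
    """Two vertices are adjacent if an edge joins them (in either orientation)."""
    return (i, j) in edges or (j, i) in edges


d = valencies(EDGES)
print(sorted(d.items()))
print(are_adjacent(EDGES, 3, 8), are_adjacent(EDGES, 1, 3))
```

Because valencies depend only on which vertices are joined, renumbering the vertices permutes the dictionary keys but leaves the multiset of valencies unchanged, which is what makes quantities built from them graph invariants.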
Journal of Chemical Education
A walk of a graph G is an alternating sequence of vertices and edges, beginning and ending with vertices, in which each edge is incident with the two vertices immediately preceding and following it. A path is a walk in which no vertex occurs more than once. The distance between two vertices is the number of edges in the shortest path that joins the two vertices. A graph G is connected if every pair of its vertices is joined by a path. Otherwise, a graph is considered disconnected.

A graph whose vertices all have the same valence is called a regular graph. If all vertices in a regular graph have a valence of 2, then the graph is called a cycle. A tree is a connected acyclic graph. The molecular graph in Figure 2 is an example of a tree. A graph is acyclic if it has no cycles.
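These definitions translate into short checks. The sketch below is illustrative only: the function names are hypothetical, vertices are assumed to be numbered 1 to N, the tree test uses the standard fact that a connected graph with N vertices is acyclic exactly when it has N - 1 edges, and the cycle test adds a connectivity requirement that the text leaves implicit.

```python
def is_connected(n, edges):
    """True if every pair of vertices 1..n is joined by a path."""
    adj = {v: set() for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {1}, [1]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def is_tree(n, edges):
    """A tree is a connected acyclic graph: connected with exactly n - 1 edges."""
    return is_connected(n, edges) and len(edges) == n - 1


def is_cycle(n, edges):
    """A connected regular graph in which every vertex has valency 2."""
    deg = {v: 0 for v in range(1, n + 1)}
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return is_connected(n, edges) and all(k == 2 for k in deg.values())


# Assumed labelings: 2,3,4-trimethylhexane skeleton and a six-membered ring.
TREE = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8), (4, 9)]
RING = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
print(is_tree(9, TREE), is_cycle(6, RING))
```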
Associating Graphs with Matrices

A labeled (chemical) graph may be associated with several matrices. Two very important graph-theoretical matrices are the vertex-adjacency matrix and the distance matrix.

The vertex-adjacency matrix, A = A(G), of a labeled connected graph G with N vertices, is a square symmetric matrix of order N. It is commonly called the adjacency matrix. It is defined below.

(A)_{ij} = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}    (1)

The distance matrix, D = D(G), of a labeled connected graph G with N vertices is a square symmetric matrix of order N. It is defined below.

(D)_{ij} = \begin{cases} l_{ij} & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases}    (2)

where l_{ij} is the length of the shortest path (i.e., the distance) between the vertices i and j in G.

Very often the distance matrix of a graph G can be generated using powers of the corresponding adjacency matrix of G (29). Table 3 gives the adjacency matrix and the distance matrix that correspond to the molecular graph in Figure 2.

Table 3. The Adjacency Matrix and the Distance Matrix of the Molecular Graph in Figure 2
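As a sketch of how the two matrices can be generated for the skeleton of 2,3,4-trimethylhexane, the following Python code builds A from an edge list and obtains D by breadth-first search. Breadth-first search is substituted here for the matrix-power route mentioned in the text, and the vertex numbering (chain 1 to 6, methyl carbons 7 to 9) is an assumption that may differ from the labeling in Figure 2.

```python
from collections import deque

# Assumed labeling of the 2,3,4-trimethylhexane skeleton (hypothetical numbering).
N = 9
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8), (4, 9)]


def adjacency_matrix(n, edges):
    """(A)ij = 1 if vertices i and j are adjacent, 0 otherwise (eq 1)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = a[j - 1][i - 1] = 1
    return a


def distance_matrix(a):
    """(D)ij = length of the shortest path between i and j (eq 2), by BFS."""
    n = len(a)
    d = [[0] * n for _ in range(n)]
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in range(n):
                if a[v][w] and w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for w, length in dist.items():
            d[s][w] = length
    return d


A = adjacency_matrix(N, EDGES)
D = distance_matrix(A)
for row in D:
    print(row)
```

The sketch assumes a connected graph; for a disconnected graph the unreachable entries would simply stay at zero, which is not a meaningful distance.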
Definitions of the Selected Topological Indices

Wiener Number

The Wiener number, W = W(G) of G, was introduced by Wiener in 1947 as the path number (30). This topological index is defined as the half-sum of the elements of the distance matrix (15).

W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (D)_{ij}    (3)

Table 4 gives an example for computing the Wiener number.

Table 4. The Computation of the Wiener Number for a Tree T Depicting the Carbon Skeleton of 2-Methylbutane

(a) A labeled tree T
(b) The distance matrix of T
(c) The Wiener number of T

W(T) = \frac{1}{2} (8 \cdot 1 + 8 \cdot 2 + 4 \cdot 3) = 18
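The half-sum in Table 4 can be checked with a few lines of Python. This is a sketch under an assumed labeling of 2-methylbutane (chain 1 to 4, methyl carbon 5 on C2); the helper `distance_matrix` is a hypothetical name and assumes a connected graph.

```python
from collections import Counter, deque


def distance_matrix(n, edges):
    """Shortest-path distances between vertices 1..n of a connected graph (BFS)."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    rows = []
    for s in range(1, n + 1):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        rows.append([dist[t] for t in range(1, n + 1)])
    return rows


def wiener(n, edges):
    """Half-sum of all elements of the distance matrix."""
    return sum(map(sum, distance_matrix(n, edges))) // 2


# Assumed labeling of 2-methylbutane: chain 1-2-3-4, methyl carbon 5 on C2.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]
D = distance_matrix(5, EDGES)
entry_counts = Counter(x for row in D for x in row if x)
W = wiener(5, EDGES)
print(dict(entry_counts), W)
```

The counts of off-diagonal entries (eight 1s, eight 2s, four 3s) correspond to the three terms in part (c) of Table 4.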
Table 5. The Computation of the Hosoya Index for a Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane

(a) A tree T
(b) The count of the p(T; i) quantities in T
    i) p(T; 0) = 1
    ii) p(T; 1) = 6
    iii) p(T; 2) = 8
    iv) p(T; 3) = 2
(c) The Hosoya index of T

Z(T) = p(T; 0) + p(T; 1) + p(T; 2) + p(T; 3) = 17
Volume 69 Number 9 September 1992
Table 6. The Edge Weights of 10 Edge Types Which Appear in Graphs Corresponding to the Carbon Skeletons of Hydrocarbons

Edge type [d(i), d(j)]    Edge weight [d(i) d(j)]^{-1/2}
1,1                       1
1,2                       0.7071
1,3                       0.5774
1,4                       0.5
2,2                       0.5
2,3                       0.4082
2,4                       0.3536
3,3                       0.3333
3,4                       0.2887
4,4                       0.25

Table 8. The Computation of the Balaban Index for a Labeled Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane

(a) A labeled tree T
Table 7. The Computation of the Randić Index for a Tree T Depicting the Carbon Skeleton of 4-Ethyl-2-methylheptane

(a) A tree T
(b) Count of the edge types (the numbers at the vertices represent their valencies)
    b_{12} = 2
    b_{13} = 2
    b_{22} = 1
    b_{23} = 4
(c) The Randić index of T

\chi(T) = 2 \cdot 0.7071 + 2 \cdot 0.5774 + 0.5 + 4 \cdot 0.4082 = 4.7018
Hosoya Index

The Hosoya index, Z = Z(G), was introduced by Hosoya in 1971 as the Z index (15). This index is defined below.

Z = \sum_{i=0}^{\lfloor N/2 \rfloor} p(G; i)    (4)

where p(G; i) is the number of selections of i mutually nonadjacent edges in G. By definition, p(G; 0) = 1, and p(G; 1) is the number of edges in G. Table 5 gives an example of computing the Hosoya index.
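The counts p(G; i) can be enumerated directly for small graphs. The sketch below uses a simple recursive enumeration of sets of mutually nonadjacent edges; it is meant only as an illustration (its cost grows quickly with the number of edges), the function names are hypothetical, and the labeling of 2,3-dimethylpentane (chain 1 to 5, methyl carbons 6 and 7) is an assumption.

```python
def matching_counts(edges):
    """Return [p(G;0), p(G;1), ...]: numbers of ways to pick k mutually
    nonadjacent edges (edges sharing no vertex)."""
    counts = {}

    def extend(start, used, k):
        # Every call corresponds to one distinct selection of k edges.
        counts[k] = counts.get(k, 0) + 1
        for idx in range(start, len(edges)):
            i, j = edges[idx]
            if i not in used and j not in used:
                extend(idx + 1, used | {i, j}, k + 1)

    extend(0, frozenset(), 0)
    return [counts[k] for k in sorted(counts)]


def hosoya(edges):
    """Hosoya index: the sum of all p(G; i)."""
    return sum(matching_counts(edges))


# Assumed labeling of 2,3-dimethylpentane: chain 1..5, methyls 6 (on C2), 7 (on C3).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
p = matching_counts(EDGES)
Z = hosoya(EDGES)
print(p, Z)
```

Edges are only ever added in increasing list order, so each selection is counted once; the result reproduces the counts listed in Table 5.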
Table 8 (continued)

The distance matrix of T

0 1 2 3 4 2 3
1 0 1 2 3 1 2
2 1 0 1 2 2 1
3 2 1 0 1 3 2
4 3 2 1 0 4 3
2 1 2 3 4 0 3
3 2 1 2 3 3 0

(b) The distance sums
(c) The Balaban index of T

Randić Index

The Randić index, χ = χ(G) of G, was introduced by Randić in 1975 as the connectivity index (31). This is one of the most widely used topological indices in QSPR (32-34) (and also in quantitative structure-activity relationships (QSAR) (35)).

The Randić index is defined as

\chi = \sum_{\text{edges } ij} [d(i)\, d(j)]^{-1/2}    (5)

where d(i) and d(j) are the valencies of the vertices i and j that define the edge ij.

For saturated hydrocarbons, eq 5 may be given in closed form. In molecular graphs that depict the carbon skeletons of hydrocarbons, only four types of vertices with respect to their valencies appear, that is, vertices with d = 1, 2, 3, 4. These give rise to 10 types of edges whose weights are given in Table 6.

If the number of each edge type is denoted by b_{ij}, where i = 1, ..., 4 and j = i, ..., 4, and if the edge weights from Table 6 are used, then eq 5 becomes the following.

\chi = b_{11} + 0.7071\, b_{12} + 0.5774\, b_{13} + 0.5\, b_{14} + 0.5\, b_{22} + 0.4082\, b_{23} + 0.3536\, b_{24} + 0.3333\, b_{33} + 0.2887\, b_{34} + 0.25\, b_{44}    (6)

This expression reveals that the Randić indices of hydrocarbons are fully determined by the counts of the edge types in the corresponding hydrogen-depleted graphs. Table 7 gives an example of computing the Randić index by means of eq 6.
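Both routes to the Randić index, the edge sum and the edge-type counts, can be sketched in Python for the tree of Table 7. The vertex numbering of 4-ethyl-2-methylheptane below (chain 1 to 7, methyl carbon 8, ethyl carbons 9 and 10) is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Assumed labeling of 4-ethyl-2-methylheptane: chain 1..7, methyl 8 on C2,
# ethyl carbons 9 (attached to C4) and 10.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (2, 8), (4, 9), (9, 10)]


def valencies(edges):
    """Return {vertex: number of incident edges}."""
    d = Counter()
    for i, j in edges:
        d[i] += 1
        d[j] += 1
    return d


def edge_type_counts(edges):
    """Count edges by the (smaller, larger) valencies of their end vertices."""
    d = valencies(edges)
    return Counter(tuple(sorted((d[i], d[j]))) for i, j in edges)


def randic(edges):
    """Sum of [d(i) d(j)]^(-1/2) over all edges ij."""
    d = valencies(edges)
    return sum(1.0 / sqrt(d[i] * d[j]) for i, j in edges)


b = edge_type_counts(EDGES)
chi = randic(EDGES)
print(dict(b), round(chi, 4))
```

With unrounded edge weights the sum is about 4.7019; Table 7 obtains 4.7018 because it adds weights already rounded to four decimals.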
Balaban lndex
The ~a l a banndex,
J
= J(G) of G, was introduced by
Balaban in 1982 as the average-distance sum connectivity
(36). It is defined as
704 Journal of Chemical Education
8/20/2019 A graph theoretical approach to SPR
5/12
Table
10.
The Com~utation f the Haraw Number for a
able 9. The Computationof the Schultz lndex for a
Tree TDe~ictinahe Carbon Skeleton
a)A labeled tree
T
b)The adjacency matrix of
T
ic) The distance matrix of
d) The adjacency-plus-distance matrix of T
I l
e)The valence row matrix of T
v T ) = [ 1 3 2 2 1 11
1)
The v[A Dl row matrix
v[A D] T) [22 15 16 16 25 221
(g)
The Schukz index of T
MTZ T)
= 2.22 15 16 18 25= 118
where
M
is the number of edges in G; v is the eyelomatic
number of G;and D) i s the distance sum where
i
= 1,2,
...,
N
The cyclomatic number
= p G)
of a polycydic graph
G
is equal to the minimum number of edges that must be
removed from
G
to transform it to the related acyclic
graph. For trees,
= 0;
for monocycles,
v =
1
The distance sum Dlifor
a
vertex i of G represents a sum
of all entries
in
the corresponding row of the distance ma-
trix.
Clearly the Wiener number can also be expressed in
terms of the distance sums.
Table 8 gives an example of computing the Balaban index.
Tree T~eljictin~he Carbon skeleton
of 2,3-Dimethylhexane
a)
A
labeled tree T
b)
The distance matrix
of 7
c)The D- matrix of T
I
.2 0.25
0.33 0.5 1
0
0.2 0.25
0.5 1
0.5 0.33 0.25
0.2
0 0.33
0.33 0.5 1
0.5 0.33 0.25
0.33
0
d)The
D-
matrix o
T
I
1 0.25 0.11 0.06
0.04 0.25
0.11
1
0 1 0.25
0.11 0.06 1
0.25
0.25
1
0
1
0.25 0.11 0.25
1
e)The Harary number of T
H T)
= 14.1 16.0.25 14.0.11
8.0.06 +40.04) 10.10
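For the Balaban index of eq 7, a minimal Python sketch is given here for the tree of Table 8 (2,3-dimethylpentane). The labeling (chain 1 to 5, methyl carbons 6 and 7) is chosen to agree with the distance matrix printed for that table; the function names are hypothetical, and the cyclomatic number is computed as M - N + 1, which holds for connected graphs.

```python
from collections import deque
from math import sqrt


def distance_matrix(n, edges):
    """Shortest-path distances between vertices 1..n of a connected graph (BFS)."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    rows = []
    for s in range(1, n + 1):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        rows.append([dist[t] for t in range(1, n + 1)])
    return rows


def distance_sums(n, edges):
    """Row sums of the distance matrix (eq 8)."""
    return [sum(row) for row in distance_matrix(n, edges)]


def balaban(n, edges):
    """J = M / (mu + 1) * sum over edges of (Di * Dj)^(-1/2) (eq 7)."""
    s = distance_sums(n, edges)
    m = len(edges)
    mu = m - n + 1  # cyclomatic number of a connected graph
    return m / (mu + 1) * sum(1.0 / sqrt(s[i - 1] * s[j - 1]) for i, j in edges)


# 2,3-Dimethylpentane: chain 1..5, methyl 6 on C2, methyl 7 on C3 (assumed labeling).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
sums = distance_sums(7, EDGES)
W = sum(sums) // 2  # Wiener number from the distance sums (eq 9)
J = balaban(7, EDGES)
print(sums, W, round(J, 4))
```

The half-sum of the distance sums gives W = 46, the value listed for 2,3-dimethylpentane in Table 11, which is a convenient consistency check on the labeling.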
Schultz Index

The Schultz index, MTI = MTI(G) of G, was introduced by Schultz in 1989 as the molecular topological index (37). This index is defined below (21, 37).

MTI = \sum_{i=1}^{N} e_i    (10)

where the e_i's (i = 1, 2, ..., N) represent the elements of the following row matrix of order N

v[A + D] = [e_1 \; e_2 \; \ldots \; e_N]    (11)

where v is the valency row matrix, A is the adjacency matrix, and D is the distance matrix. Table 9 gives an example of computing the Schultz index.
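The row-matrix product of eq 11 can be sketched as follows. The six-vertex tree used here (chain 1 to 5 with vertex 6 attached to vertex 2) is an assumption chosen because it reproduces the valence row matrix [1 3 2 2 1 1] shown in Table 9; the function names are hypothetical.

```python
from collections import deque


def distance_matrix(n, edges):
    """Shortest-path distances between vertices 1..n of a connected graph (BFS)."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    rows = []
    for s in range(1, n + 1):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        rows.append([dist[t] for t in range(1, n + 1)])
    return rows


def schultz(n, edges):
    """Return (v, v[A + D], MTI) for a connected graph with vertices 1..n."""
    d = distance_matrix(n, edges)
    # Adjacent vertices are exactly those at distance 1.
    a = [[1 if d[i][j] == 1 else 0 for j in range(n)] for i in range(n)]
    v = [sum(row) for row in a]  # valency row matrix
    e = [sum(v[i] * (a[i][j] + d[i][j]) for i in range(n)) for j in range(n)]
    return v, e, sum(e)


# Assumed tree: chain 1-2-3-4-5 with vertex 6 attached to vertex 2.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
v, e, mti = schultz(6, EDGES)
print(v, e, mti)
```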
Harary Number

The Harary number, H = H(G) of G, was introduced by Plavšić et al. (38) in 1991 in honor of Professor Frank Harary on his 70th birthday. He greatly influenced the development of graph theory and chemical graph theory. This index is defined below.
Table 11. The Wiener Numbers (W), Hosoya Indices (Z), Randić Indices (χ), Balaban Indices (J), Schultz Indices (MTI), Harary Numbers (H), and Boiling Points (bp in °C) of Alkanes with Up to 10 Carbon Atoms

Alkane                          W
methane                         0
ethane                          1
propane                         4
2-methylpropane                 9
butane                          10
2,2-dimethylpropane             16
2-methylbutane                  18
pentane                         20
2,2-dimethylbutane              28
2,3-dimethylbutane              29
2-methylpentane                 32
3-methylpentane                 31
hexane                          35
2,2,3-trimethylbutane           42
2,2-dimethylpentane             46
3,3-dimethylpentane             44
2,3-dimethylpentane             46
2,4-dimethylpentane             48
2-methylhexane                  52
3-methylhexane                  50
3-ethylpentane                  48
heptane                         56
2,2,3,3-tetramethylbutane       58
2,2,3-trimethylpentane          63
2,3,3-trimethylpentane          62
2,2,4-trimethylpentane          66
2,2-dimethylhexane              71
3,3-dimethylhexane              67
3-ethyl-3-methylpentane         64
2,3,4-trimethylpentane          65
2,3-dimethylhexane              70
3-ethyl-2-methylpentane         67
3,4-dimethylhexane              68
2,4-dimethylhexane              71
2,5-dimethylhexane              74
2-methylheptane                 79
3-methylheptane                 76
4-methylheptane                 75
3-ethylhexane                   72
octane                          84
2,2,3,3-tetramethylpentane      82
2,2,3,4-tetramethylpentane      86
2,2,3-trimethylhexane           92
2,2-dimethyl-3-ethylpentane     88
3,3,4-trimethylhexane           88
2,3,3,4-tetramethylpentane      84
2,3,3-trimethylhexane           90
2,3-dimethyl-3-ethylpentane     86
2,2,4,4-tetramethylpentane      88
H = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (D^{-2})_{ij}    (12)

where D^{-2} is the matrix whose elements are the squares of the reciprocal distances in G. The D^{-2} matrix may be considered as the distance matrix of a class of specially weighted graphs in which weights between vertices in G mimic the Coulomb law between the sites in the corresponding structure. Table 10 gives an example of computing the Harary number.

Table 11 gives the Wiener and Harary numbers, and the Hosoya, Randić, Balaban, and Schultz indices for alkanes with up to 10 carbon atoms.
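The Harary number, taken as the half-sum of the squared reciprocal distances, can be sketched in Python for the tree of Table 10. The labeling of 2,3-dimethylhexane (chain 1 to 6, methyl carbons 7 and 8) is an assumption, and the function names are hypothetical.

```python
from collections import deque


def distance_matrix(n, edges):
    """Shortest-path distances between vertices 1..n of a connected graph (BFS)."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    rows = []
    for s in range(1, n + 1):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        rows.append([dist[t] for t in range(1, n + 1)])
    return rows


def harary(n, edges):
    """Half-sum of the squares of the reciprocal distances (off-diagonal only)."""
    d = distance_matrix(n, edges)
    return 0.5 * sum(1.0 / (x * x) for row in d for x in row if x)


# Assumed labeling of 2,3-dimethylhexane: chain 1..6, methyls 7 (on C2), 8 (on C3).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8)]
H = harary(8, EDGES)
W = sum(map(sum, distance_matrix(8, EDGES))) // 2
print(round(H, 4), W)
```

With unrounded reciprocal squares the result is about 10.108; the value in Table 10 is computed from entries rounded to two decimals. The same distance matrix gives W = 70, as listed for 2,3-dimethylhexane in Table 11.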
Table 11 (continued)

Alkane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane
2,2,3,3,4-pentamethylpentane
2,2,3,3-tetramethylhexane
3-ethyl-2,2,3-trimethylpentane
3,3,4,4-tetramethylhexane
2,2,3,4,4-pentamethylpentane
2,2,3,4-tetramethylhexane
3-ethyl-2,2,4-trimethylpentane
2,3,4,4-tetramethylhexane
2,2,3,5-tetramethylhexane
2,2,3-trimethylheptane
2,2-dimethyl-3-ethylhexane
3,3,4-trimethylheptane
3,3-dimethyl-4-ethylhexane
2,3,3,4-tetramethylhexane
3,4,4-trimethylheptane
3,4-dimethyl-3-ethylhexane
3-ethyl-2,3,4-trimethylpentane
2,3,3,5-tetramethylhexane
2,3,3-trimethylheptane
2,3-dimethyl-3-ethylhexane
3,3-diethyl-2-methylpentane
2,2,4,4-tetramethylhexane
2,2,5-trimethylheptane
2,5,5-trimethylheptane
2,2,6-trimethylheptane
2,2-dimethyloctane
3,3-dimethyloctane
4,4-dimethyloctane
3-ethyl-3-methylheptane
4-ethyl-4-methylheptane
3,3-diethylhexane
2,3,4,5-tetramethylhexane (W = 121, Z = 58, χ = 4.4641, J = 3.8140, MTI = 436, H = 13.9933, bp = 161)
2,3,4-trimethylheptane
2,3-dimethyl-4-ethylhexane
2,3-dimethyl-4-ethylhexane
2,4-dimethyl-3-ethylhexane
3,4,5-trimethylheptane
2,4-dimethyl-3-isopropylpentane
3-isopropyl-2-methylhexane
2,3,5-trimethylheptane
2,5-dimethyl-3-ethylhexane
2,4,5-trimethylheptane
2,3,6-trimethylheptane
2,3-dimethyloctane
3-ethyl-2-methylheptane
3,4-dimethyloctane
4-isopropylheptane
4-ethyl-3-methylheptane
4,5-dimethyloctane
3-ethyl-4-methylheptane
3,4-diethylhexane
2,4,6-trimethylheptane
2,4-dimethyloctane
4-ethyl-2-methylheptane
3,5-dimethyloctane
3-ethyl-5-methylheptane
2,5-dimethyloctane
5-ethyl-2-methylheptane
3,6-dimethyloctane
2,6-dimethyloctane
2,7-dimethyloctane
2-methylnonane
3-methylnonane
4-methylnonane
3-ethyloctane
5-methylnonane
4-ethyloctane
4-propylheptane
decane (W = 165, Z = 89, χ = 4.9142)

Designing QSPR Models

There are several ways to design QSPR models (39-44). Here we outline one possible strategy. Figure 3 contains a flow diagram of the steps involved in the design of a QSPR model. This is an iterative approach.

Step 1. Get a reliable source of experimental data for a given set of molecules. This initial set of molecules is sometimes called the training set (45). The data in this set must be reliable and accurate. The quality of the selected data is important because it will affect all the following steps.

Step 2. The topological index is selected and computed. This is also an important step because selecting the appropriate topological index (or indices) can facilitate finding the most accurate model.

Step 3. The two sets of numbers are then statistically analyzed using a suitable algebraic expression. The QSPR model is thus a regression model, and one must be careful about its statistical stability. Chance factors could yield spuriously accurate correlations (46-48). The quality of the QSPR models can be conveniently measured by the correlation coefficient r and the standard deviation s. A good QSPR model must have r > 0.99, while s depends on the property. For example, for boiling points, s < 5 °C. Therefore, Step 3 is a central step in the design of the structure-property models.

Step 4. Predictions are made for the values of the molecular property for species that are not part of the training set via the obtained initial QSPR model. The unknown molecules are structurally related to the initial set of compounds.

Step 5. The predictions are tested with unknown molecules by experimental determination of the predicted properties. This step is rather involved because it requires acquiring or preparing the test molecules.

Step 6. If the tests support the predictions, one presents the QSPR model in its final form with all necessary statistical characteristics.

If the tests do not support the initial QSPR model, it must be revised and the procedure repeated. The QSPR model thus established, even for a narrow class of compounds, is a very useful tool for predicting the properties of hypothetical compounds and for the search for new compounds with programmed properties (12).
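The statistical part of the procedure (Step 3) can be sketched with an ordinary least-squares line and the two quality measures named above, r and s. Everything in this example is an illustrative assumption rather than one of the models of this report: the linear form, the function name `fit_line`, the choice of the unbranched alkanes ethane to octane, and the boiling points, which are approximate handbook values supplied here and not taken from Table 11.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x.
    Returns (slope, intercept, r, s): r is the correlation coefficient and
    s the standard deviation of the residuals with n - 2 degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s = sqrt(sum(e * e for e in residuals) / (n - 2))
    return slope, intercept, r, s


# Randic indices of the unbranched alkanes C2..C8: 1 for ethane, and
# 2/sqrt(2) + (N - 3)/2 for N >= 3 (two end edges plus N - 3 inner edges).
chi = [1.0] + [sqrt(2) + 0.5 * (n - 3) for n in range(3, 9)]
# Approximate boiling points in deg C (assumed handbook values, ethane..octane).
bp = [-88.6, -42.1, -0.5, 36.1, 68.7, 98.4, 125.7]

slope, intercept, r, s = fit_line(chi, bp)
print(f"bp = {intercept:.1f} + {slope:.1f} chi   r = {r:.4f}   s = {s:.1f}")
```

Running the sketch shows why both measures are reported: a straight line through curved data can reach a high r while its standard deviation remains large, so r alone is not a sufficient criterion.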
An Instructive Example

We will apply the procedure from the preceding section to give an instructive example of the design of the QSPR model for predicting the boiling points of alkanes. As the initial set we will consider alkanes with up to 8 carbon atoms (40 molecules).

Step 1
The boiling points (°C) of the alkanes are taken from the CRC Handbook of Chemistry and Physics (49) and Beilstein (50).

Step 2
We will consider at this stage all six topological indices discussed in this report.
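Table 11 (continued). Balaban indices (J) of the last 37 alkanes of the table (2,3,4-trimethylheptane through decane), in table order:
3.5833, 3.7561, 3.7561, 3.7979, 3.6854, 3.9835, 3.7280, 3.4617, 3.6033, 3.5027,
3.3014, 3.1296, 3.3978, 3.3088, 3.4999, 3.5637, 3.3759, 3.5299, 3.6982, 3.3374,
3.1600, 3.3908, 3.2686, 3.4123, 3.1244, 3.2555, 3.1682, 3.0333, 2.9095, 2.7732,
2.8862, 2.9680, 3.0869, 2.9984, 3.2055, 3.2951, 2.6476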
Step 3
The following structure-property models are the most successful for each index considered:
p 77.93 (M.97) ~30899 0 0137 - (3.35 .02)10~
-164.24 (i4.99) (13)
Figure 3. A flow diagram of the steps involved in the design of a QSPR model. 1: Source of experimental data. 2: Selection of the topological index. 3: Statistical work and setting up the QSPR model. 4: Predictions. 5: Testing the predictions. 6: The final form of the QSPR model. S: Tests confirmed the initial model. The model appears to be satisfactory for further work. NS: Tests rejected the initial model as not satisfactory. The model must be revised and the procedure repeated until the satisfactory model is obtained.

The most accurate models are those based on ln Z (eq 14) and χ (eq 15). They will be used in the next step.

Step 4
We use eqs 14 and 15 to predict the boiling points of nonanes (35 molecules) (see Table 12).

Step 5
We compare the predicted and experimental values of the nonane boiling points (see Table 13).

Both models have problems with some members of the nonane series. However, when Step 3 is repeated using the boiling points of all alkanes with up to 9 carbon atoms, the QSPR models based on ln Z and χ did not improve. The slight improvement happened only when a biparametric model with χ and N (N is the number of carbon atoms in the alkane) was used. This model is given by eq 19.

Table 12. The Predicted Values of Boiling Points (°C) of Nonanes

Nonane                          Predicted boiling point
                                eq 14      eq 15
2,2,3,3-tetramethylpentane      119.26     119.40
2,2,3,4-tetramethylpentane
2,2,3-trimethylhexane
2,2-dimethyl-3-ethylpentane
3,3,4-trimethylhexane
2,3,3,4-tetramethylpentane
2,3,3-trimethylhexane
2,3-dimethyl-3-ethylpentane
2,2,4,4-tetramethylpentane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane

The procedure may be repeated, and we will eventually arrive at the best possible QSPR model for predicting the boiling points of alkanes.
Step 6
All three models expressed as eqs 14, 15, and 19 may serve as reliable models for predicting the alkane boiling points. Plots of bp vs ln Z, bp vs χ, and bp vs χ and the accompanying statistical data are given, respectively, in Figures 4-6.

The boiling points of alkanes have been predicted many times (8, 13, 15, 30-37, 40, 51). Although most of the QSPR models produced are very accurate (r > 0.998, s < 2 °C), they suffer from several shortcomings.

i. Methane was not considered. In some cases other lighter alkanes, such as ethane and propane, were also eliminated from the study.
ii. Models were built for a limited set of alkanes, usually for C4-C7 families.
iii. The complexity of some of the accurate QSPR models in the literature is forbidding.

For example, one of the most accurate QSPR models for predicting boiling points of alkanes is the following (40). (All alkanes with up to 9 carbon atoms have been considered but methane.)
Table 13. Comparison between Predicted (Two Models) and Experimental Values of Boiling Points (°C) of Nonanes

Nonane                          (bp)exp    Model (14)    Model (15)
2,2,3,3-tetramethylpentane
2,2,3,4-tetramethylpentane
2,2,3-trimethylhexane
2,2-dimethyl-3-ethylpentane
3,3,4-trimethylhexane
2,3,3,4-tetramethylpentane
2,3,3-trimethylhexane
2,3-dimethyl-3-ethylpentane
2,2,4,4-tetramethylpentane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane
Figure 4. A plot of bp vs ln Z for the first 40 alkanes.

Figure 5. A plot of bp vs χ for the first 40 alkanes.

Figure 6. A plot of bp vs χ for the first 75 alkanes.
The χ terms in eq 20 are defined as follows.

The extended connectivity index

{}^{m}\chi = \sum [d(i)\, d(j) \cdots d(m+1)]^{-0.5}    (21)

where m represents the order of possible fragments. When m = 1, fragments are edges, which lead to the first-order connectivity index χ.

The zero-connectivity index

{}^{0}\chi = n_1 + 2^{-0.5}\, n_2 + 3^{-0.5}\, n_3 + 4^{-0.5}\, n_4    (22)

where n_1, n_2, n_3, and n_4 are the numbers of vertices with valencies 1, 2, 3, and 4, respectively.

Figure 7. Examples of a path (3rd order), a cluster (3rd order) and a path-cluster (4th order) for a tree T corresponding to 3-methylpentane.

Connectivity indices {}^{m}\chi_t of order m and type t can be obtained by summing analogous terms over subgraphs involving paths (t = p), clusters (t = c), or path-cluster (t = pc) combinations of m edges. Examples of a path, a cluster and a path-cluster are given in Figure 7.
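For the path type only, the connectivity indices of eq 21 can be sketched by enumerating all paths with m edges. Cluster and path-cluster subgraphs are not handled in this sketch; the function name `path_connectivity` is hypothetical, and the labeling of 3-methylpentane (chain 1 to 5, methyl carbon 6 on C3) is an assumption.

```python
def path_connectivity(n, edges, m):
    """Sum of [d(v0) d(v1) ... d(vm)]^(-0.5) over all paths with m edges.
    m = 0 gives the zero-order index, m = 1 the first-order (Randic) index."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    d = {v: len(adj[v]) for v in adj}
    total = 0.0

    def walk(path):
        nonlocal total
        if len(path) == m + 1:
            # Each path is found from both ends; keep one orientation only.
            if m == 0 or path[0] < path[-1]:
                product = 1
                for v in path:
                    product *= d[v]
                total += product ** -0.5
            return
        for w in adj[path[-1]]:
            if w not in path:
                walk(path + [w])

    for v in adj:
        walk([v])
    return total


# Assumed labeling of 3-methylpentane: chain 1..5, methyl carbon 6 on C3.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]
chi0 = path_connectivity(6, EDGES, 0)
chi1 = path_connectivity(6, EDGES, 1)
chi2 = path_connectivity(6, EDGES, 2)
print(round(chi0, 4), round(chi1, 4), round(chi2, 4))
```

For m = 0 the sum runs over single vertices, so it reduces to counting vertices by valency, which is the form given for the zero-connectivity index.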
To conclude this section we stress that there is no simple QSPR model for predicting boiling points over a wide range of alkanes. However, if we limit ourselves to a simple family of alkanes (especially with fewer than 10 carbon atoms), then simple accurate models are possible (34).

Conclusions

In this report we presented a strategy for designing the quantitative structure-property relationships based on topological indices. The instructive example was directed to the design of the structure-property model for predicting the boiling points of alkanes. Six selected topological indices were tested. The most accurate QSPR models for alkane boiling points are based on ln Z, χ, and N together with χ. The accuracy of the model was judged according to the correlation coefficient and the standard error. The upper limits for the accurate models were set at r > 0.995 and s < 5 °C.

We conclude that there is no simple single-parameter QSPR model for predicting the boiling points over a wide range of alkanes due to the great diversity among experimental values. Multivariate regression models appear to be very accurate due to the variety of parameters involved in the correlation. Each of these parameters takes care of a certain structural detail of a large alkane. When all diverse structural features of alkanes are considered, the model usually gives extremely good agreement between the experimental and calculated boiling points.

Acknowledgment

We are thankful to the Ministry of Science, Technology, and Informatics of the Republic of Croatia for support.
3. Lipnick, R. L. Environ. Toxicol. Chem. 1989, 8, 1.
4. hon ey, R. J. Chem. Educ. 1985, 62, 846.
5. Dadić, Ž. Ruđer Bošković; Školska knjiga: Zagreb, 1987. This is a bilingual edition: Croatian and English.
6. Boscovich, R. J. Theoria philosophiae naturalis redacta ad unicam legem virium in natura existentium; Remondini: Venice, 1763. The English translation is also available: The Theory of Natural Philosophy; MIT: Cambridge, MA, 1966.
7. Davy, H. Elements of Chemical Philosophy; London, 1812.
8. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. II, Chapter ...
11. Rouvray, D. H. In Chemical Applications of Topology and Graph Theory; King, R. B., Ed.; Elsevier: Amsterdam, 1983; p 159.
12. Stankevich, M. I.; Stankevich, I. V.; Zefirov, N. S. Russ. Chem. Rev. 1988, 57, 337.
13. Hansen, P. J.; Jurs, P. C. J. Chem. Educ. 1988, 65, 575.
14. Randić, M. J. Math. Chem. 1990, 4, 337.
15. Hosoya, H. Bull. Chem. Soc. Jpn. 1971, 44, 2332.
16. Trinajstić, N. Chemical Graph Theory, 2nd revised ed.; CRC: Boca Raton, FL, 1992; Chapter 3.
17. Rouvray, D. H. J. Mol. Struct. (Theochem) 1989, 185, 187.
18. Randić, M. J. Math. Chem. 1991, 7, 155.
19. Bonchev, D.; Trinajstić, N. J. Chem. Phys. 1977, 67, 4517.
20. Balaban, A. T.; Quintas, L. V. Math. Chem. (Mülheim/Ruhr) 1983, 14, 213.
21. Müller, W. R.; Szymanski, K.; Knop, J. V.; Trinajstić, N. J. Chem. Inf. Comput. Sci. 1990, 30, 160.
22. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., in press.
23. Szymanski, K.; Müller, W. R.; Knop, J. V.; Trinajstić, N. Int. J. Quantum Chem.: Quantum Chem. Symp. 1986, 20, 173.
24. Harary, F. Graph Theory; Addison-Wesley: Reading, MA, 1971; 2nd printing.
25. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. I.
26. Chartrand, G. Graphs as Mathematical Models; Prindle, Weber, and Schmidt: Boston, MA, 1977.
27. Trinajstić, N. In MATH/CHEM/COMP 1987; Lacher, R. C., Ed.; Elsevier: Amsterdam, 1988; p 83.
28. Sylvester, J. J. Nature 1878, 17, 284.
29. Roberts, F. S. Discrete Mathematical Models; Prentice-Hall: Englewood Cliffs, NJ, 1976; p 56.
30. Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
31. Randić, M. J. Am. Chem. Soc. 1975, 97, 6609.
32. Razinger, M.; Chrétien, J. R.; Dubois, J. E. J. Chem. Inf. Comput. Sci. 1985, 25, 23.
33. Rouvray, D. H. Sci. Am. 1986, 254, 40.
34. Seybold, P. G.; May, M.; Bagal, U. A. J. Chem. Educ. 1987, 64, 575.
35. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Wiley: New York, 1986.
36. Balaban, A. T. Chem. Phys. Lett. 1982, 89, 399.
37. Schultz, H. P. J. Chem. Inf. Comput. Sci. 1989, 29, 227.
38. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., submitted for publication.
39. Randić, M.; Jerman-Blažič, B.; Grossman, S. C.; Rouvray, D. H. Math. Comput. Modelling 1986, 6, 571.
40. Needham, D. E.; Wei, I.-C.; Seybold, P. G. J. Am. Chem. Soc. 1988, 110, 4186.
41. Nizhnii, S. V.; Epshtein, N. A. Russ. Chem. Rev. 1978, 47, 363.
42. Hol, W. G. J. Angew. Chem. Int. Ed. Engl. 1986, 25, 767.
43. Basak, S. C.; Niemi, G. J.; Veith, G. D. In Computational Chemical Graph Theory; Rouvray, D. H., Ed.; Nova: New York, 1990; p 235.
44. Testa, B.; Mayer, J. M. Acta Pharm. Jugosl. 1990, 40, 315.
45. Karcher, W.; Devillers, J. In Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology; Karcher, W.; Devillers, J., Eds.; Kluwer: Dordrecht, 1990; p 1.
46. Topliss, J. G.; Costello, R. J. J. Med. Chem. 1972, 15, 1066.
47. Topliss, J. G.; Edwards, R. P. J. Med. Chem. 1979, 22, 1238.
48. Bonchev, D.; Mekenyan, O. J. Math. Chem., in press.
49. Weast, R. C. CRC Handbook of Chemistry and Physics, 67th ed., 3rd printing; CRC: Boca Raton, FL, 1987.
50. Beilsteins Handbuch der Organischen Chemie.
51. Filip, P. A.; Balaban, T.-S.; Balaban, A. T. J. Math. Chem. 1987, 1, 61.