Multifractal Characterisation and Analysis of Complex Networks

Danling Wang

Bachelor of Science (Information and Computation Sciences)

Master of Science (Applied Mathematics)

Thesis submitted for the degree of Doctor of Philosophy in the Discipline of Mathematical Sciences

Faculty of Science and Technology

Queensland University of Technology

2011

Principal supervisor: Professor Vo Anh

Associate supervisor: Professor Zuguo Yu


Keywords

Complex networks; weighted networks; protein-protein interactions; fractal dimension; self-similarity; iterative scoring method; multifractal analysis; generalized fractal dimension; scale-free networks; small-world networks; random networks; gene networks; correlation coefficient; time series; fractional Brownian motion; Hurst index; binomial multifractal measure; measure representation of DNA sequence; degree distribution; resilience.



Abstract

Complex networks have been studied extensively due to their relevance to many real-world systems such as the World Wide Web, the Internet, and biological and social systems. During the past two decades, studies of such networks in different fields have produced many significant results concerning their structures, topological properties and dynamics. Three well-known properties of complex networks are the scale-free degree distribution, the small-world effect and self-similarity. The search for additional meaningful properties, and for the relationships among these properties, is an active area of current research. This thesis investigates a relatively new aspect of complex networks, namely their multifractality, which is an extension of the concept of self-similarity.

The first part of the thesis aims to confirm that the study of properties of complex networks can be extended to a wider class that includes more complex weighted networks. The real networks that have been shown to possess the self-similarity property in the existing literature are all unweighted networks. We use protein-protein interaction (PPI) networks as a key example to show that their weighted networks inherit the self-similarity of the original unweighted networks. Firstly, we confirm that the random sequential box-covering algorithm is an effective tool to compute the fractal dimension of complex networks. This is demonstrated on the Homo sapiens and E. coli PPI networks as well as their skeletons. Our results verify that the fractal dimension of the skeleton is smaller than that of the original network: because shortest distances between nodes are larger in the skeleton, more boxes are needed to cover the skeleton for a fixed box size. Then we adopt the iterative scoring method to generate weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis thaliana. Using the random sequential box-covering algorithm, we calculate the fractal dimensions of both the original unweighted PPI networks and the generated weighted networks. The results show that self-similarity is still present in the generated weighted PPI networks. This finding will be useful for our treatment of the networks in the third part of the thesis.
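The random sequential (RS) box-covering step and the resulting estimate of the fractal dimension can be sketched as follows. This is a minimal, simplified sketch, not the implementation used in this thesis or by Kim et al. (2007a): it assumes an unweighted, connected network stored as an adjacency dictionary, grows each box to all still-unassigned nodes within a hypothetical radius `r` of a randomly ordered seed, takes the box size as $l_B = 2r + 1$ (an assumed convention), and estimates $d_B$ as minus the least-squares slope of $\ln N_B$ against $\ln l_B$, following the box-counting scaling $N_B(l_B) \sim l_B^{-d_B}$.

```python
import math
import random
from collections import deque


def nodes_within(adj, seed, radius):
    """Breadth-first search: all nodes at shortest-path distance <= radius from seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the box radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def rs_box_count(adj, radius, rng):
    """Number of boxes N_B used to cover the network at one (assumed) box radius."""
    unassigned = set(adj)
    seeds = list(adj)
    rng.shuffle(seeds)  # random sequential order of candidate seeds
    n_boxes = 0
    for seed in seeds:
        if seed in unassigned:
            # the new box holds every still-unassigned node near the seed
            unassigned -= nodes_within(adj, seed, radius)
            n_boxes += 1
    return n_boxes


def box_dimension(adj, radii, seed=0):
    """Estimate d_B as minus the slope of ln N_B versus ln l_B, with l_B = 2r + 1."""
    radii = list(radii)
    rng = random.Random(seed)
    xs = [math.log(2 * r + 1) for r in radii]
    ys = [math.log(rs_box_count(adj, r, rng)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


# Usage: a path graph is one-dimensional, so the estimate should be close to 1.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 2000] for i in range(2000)}
print(box_dimension(path, radii=range(1, 9)))
```

Because seeds are visited in random order, repeated runs give slightly different $N_B$ values; in practice one would average over several orderings before fitting.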

The second part of the thesis aims to explore the multifractal behavior of different complex networks. Fractals such as the Cantor set, the Koch curve and the Sierpinski gasket are homogeneous, since each consists of a geometric figure that repeats on an ever-reduced scale, and fractal analysis is a useful method for their study. However, real-world fractals are not homogeneous; there is rarely an identical motif repeated on all scales. Their singularity may vary on different subsets, implying that these objects are multifractal. Multifractal analysis is a useful way to systematically characterize the spatial heterogeneity of both theoretical and experimental fractal patterns. However, the tools for multifractal analysis of objects in Euclidean space are not suitable for complex networks. In this thesis, we propose a new box-covering algorithm for multifractal analysis of complex networks. This algorithm is demonstrated in the computation of the generalized fractal dimensions of some theoretical networks, namely scale-free networks, small-world networks and random networks, and of one class of real networks, namely the PPI networks of different species. Our main finding is the existence of multifractality in scale-free networks and PPI networks, while multifractal behaviour is not confirmed for small-world networks and random networks. As another application, we generate gene interaction networks for patients and healthy people using the correlation coefficients between microarray data of different genes. Our results confirm the existence of multifractality in gene interaction networks. This multifractal analysis therefore provides a potentially useful tool for gene clustering and identification.
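The gene interaction networks mentioned above are built from correlation coefficients between the microarray data of different genes. The sketch below shows one simple way to do this and is only an illustration under stated assumptions: each gene's expression values across arrays form a profile, the Pearson correlation is computed for every pair of genes, and an edge is drawn when the absolute correlation reaches a hypothetical cut-off `threshold`. The thesis may use a different threshold rule or sign convention.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlation_network(expression, threshold=0.8):
    """Link genes g1 and g2 when |r(g1, g2)| >= threshold (hypothetical cut-off).

    `expression` maps a gene name to its expression values across arrays.
    Returns an adjacency dictionary {gene: set of neighbouring genes}.
    """
    adj = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


# Usage with made-up profiles (illustrative data only).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # mirror image of geneA
    "geneD": [1.0, 3.0, 1.0, 3.0],  # unrelated pattern
}
net = correlation_network(profiles, threshold=0.9)
print(net)
```

Using the absolute value treats strong negative correlation as an interaction too; dropping `abs` would keep only co-expressed genes.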

The third part of the thesis aims to investigate the topological properties of networks constructed from time series. Characterizing complicated dynamics from time series is a fundamental problem of continuing interest in a wide variety of fields. Recent work indicates that complex network theory can be a powerful tool for analysing time series. Many existing methods for transforming time series into complex networks share a common feature: they define the connectivity of a complex network by the mutual proximity of different parts (e.g., individual states, state vectors, or cycles) of a single trajectory. In this thesis, we propose a new method to construct networks from time series: nodes are defined by vectors of a certain length taken from the time series, and the weight of the edge between any two nodes is the Euclidean distance between the corresponding two vectors. We apply this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We verify the validity of this method by showing that time series with stronger correlation, hence a larger Hurst exponent, tend to have a smaller fractal dimension, hence smoother sample paths. We then construct networks via the horizontal visibility graph (HVG) technique, which has been widely used recently, and confirm a known linear relationship between the Hurst exponent of fractional Brownian motion and the fractal dimension of the corresponding HVG network. In a first application, we apply our newly developed box-covering algorithm to calculate the generalized fractal dimensions of the HVG networks of fractional Brownian motions, binomial cascades and five bacterial genomes. The results confirm the monofractality of fractional Brownian motion and the multifractality of the binomial cascades and the bacterial genomes. As a second application, we discuss the resilience of networks constructed from time series via two different approaches: the visibility graph (VG) and the horizontal visibility graph. We find that the degree distribution of the VG networks of fractional Brownian motions is scale-free (i.e., it follows a power law), meaning that a large percentage of nodes must be destroyed before the network collapses into isolated parts, whereas for the HVG networks of fractional Brownian motions the degree distribution has exponential tails, implying that HVG networks would not survive the same kind of attack.
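The horizontal visibility criterion of Luque et al. (2009) links two time points $i < j$ when both values exceed every value strictly between them, that is, $x_i, x_j > x_n$ for all $i < n < j$. The following Python sketch builds an HVG and its empirical degree distribution; it is illustrative only and does not reproduce the multifractal or resilience computations of this thesis. It uses the strict inequality, so an intermediate value equal to an endpoint blocks visibility. For an uncorrelated random series the HVG degree distribution is known to be exponential, $P(k) = \frac{1}{3}\left(\frac{2}{3}\right)^{k-2}$, with mean degree approaching 4, which gives a quick sanity check.

```python
import random
from collections import Counter


def horizontal_visibility_graph(series):
    """Edges (i, j) of the HVG: x_i, x_j > x_n for every n strictly between i and j."""
    edges = []
    for i in range(len(series) - 1):
        blocker = float("-inf")  # largest value seen strictly between i and j
        for j in range(i + 1, len(series)):
            if series[i] > blocker and series[j] > blocker:
                edges.append((i, j))
            blocker = max(blocker, series[j])
            if blocker >= series[i]:
                break  # this value hides every later point from i
    return edges


def degree_distribution(n_nodes, edges):
    """Empirical P(k): fraction of nodes with degree k."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    counts = Counter(degree[v] for v in range(n_nodes))
    return {k: c / n_nodes for k, c in sorted(counts.items())}


# Usage: an i.i.d. uniform series, for which P(2) should be near 1/3.
rng = random.Random(42)
x = [rng.random() for _ in range(5000)]
pk = degree_distribution(len(x), horizontal_visibility_graph(x))
print(pk.get(2), sum(k * p for k, p in pk.items()))
```

The early `break` keeps the construction fast: once a value at least as high as $x_i$ appears, no later point can see $x_i$ horizontally.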



Declaration of Original Authorship

The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher educational institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signed:

Date:



List of Papers

1. Dan-ling Wang, Zu-Guo Yu, Vo Anh. Self-similarity in weighted PPI networks. IACSIT International Conference on Bioscience, Biochemistry and Bioinformatics, 2011, Singapore, Singapore.
2. Dan-ling Wang, Zu-Guo Yu, Vo Anh. Multifractality in complex networks. Submitted to Chaos, Solitons & Fractals.
3. Dan-ling Wang, Zu-Guo Yu, Vo Anh. Multifractal analysis of complex networks constructed from time series. (To be submitted)
4. Dan-ling Wang, Zu-Guo Yu, Vo Anh. Degree distributions of constructed horizontal visibility graphs. (In preparation)



Acknowledgements

I would like to acknowledge and thank Professor Vo Anh, my principal supervisor, for his encouragement, support, valuable discussions and guidance, which enabled me to complete this research. I am extremely grateful for his assistance with this thesis.

I would also like to thank Professor Zu-guo Yu, my associate supervisor, for his assistance, time, valuable suggestions and discussions, which also enabled my research to continue.

My thanks also go to Queensland University of Technology and the School of Mathematical Sciences for their financial support during this study. This research has also benefited from the fellow PhD students and staff in the School.

I would especially like to thank my parents and my husband Yanfei Wang.



Contents

Chapter 1 Introduction
1.1 The research problems
1.2 An introduction of complex networks
1.2.1 Definitions and notations
1.2.2 Properties of complex networks
1.2.3 Introduction of PPI networks
1.3 Fractal scaling of complex networks
1.3.1 Literature review
1.3.2 Methods
1.4 Multifractal analysis of complex networks
1.4.2 Methods
1.5 Analysis of networks constructed from time series
1.5.2 Methods
1.6 Contributions of this thesis
Chapter 2 Fractal scaling of complex networks
2.1 Introduction
2.2 Theoretical background
2.2.1 Mathematical concepts and definitions
2.2.2 Calculation of fractal dimension
2.3 Self-similarity of complex networks
2.3.1 Methods
2.3.2 Numerical results and discussion
2.4 Fractal dimensions of weighted PPI networks
2.4.1 Methods
2.4.2 Results and discussion
2.5 Conclusion
Chapter 3 Multifractal analysis of complex networks
3.1 Introduction
3.2 Theoretical background
3.3 Methods
3.3.1 Box-covering algorithm
3.3.2 Sand-box algorithm
3.3.3 Algorithms for multifractal analysis of networks
3.4 Multifractality of theoretical networks
3.4.1 Multifractality of scale-free networks
3.4.2 Multifractality of small-world networks
3.4.3 Multifractality of random networks
3.5 Multifractality of PPI networks
3.6 Multifractality of gene networks
3.6.1 Methods
3.6.2 Results and discussion
3.7 Conclusion
Chapter 4 Analysis of networks generated from time series
4.1 Introduction
4.2 Theoretical background
4.3 Fractal scaling of weighted networks generated from FBM
4.3.1 Methods
4.3.2 Numerical results and discussion
4.4 Multifractal analysis of horizontal visibility graphs
4.4.1 Introduction of VG and HVG
4.4.2 Multifractal analysis of FBM HVG
4.4.3 Multifractal analysis of HVG of binomial measure
4.4.4 Multifractal analysis of measure representation of genome HVG
4.5 Resilience of visibility graphs and horizontal visibility graphs
4.5.1 Resilience of visibility graphs
4.5.2 Resilience of horizontal visibility graphs
4.6 Conclusion
Chapter 5 Summary and future research
5.1 Research innovations and contributions
5.2 Possible future research
References



List of Figures

Fig 1.1 Euler’s graph for the Königsberg bridge problem
Fig 1.2 E.coli protein-protein interaction network
Fig 2.1 The mass dimension method
Fig 2.2 Illustration of the solution for the network covering problem via mapping to the graph colouring problem
Fig 2.3 Schematic illustration of the RS box-covering algorithm introduced by Kim et al. (2007a, b)
Fig 2.4 Protein-protein interaction network of the Homo sapiens
Fig 2.5 The largest connected sub-network of Homo sapiens (Hsapi20041003)
Fig 2.6 Fractal scaling for the sub-network G of Hsapi20041003
Fig 2.7 Skeleton of the network showed in Fig 2.5
Fig 2.8 Fractal scaling of the skeleton of subnetwork G of Hsapi20041003
Fig 2.9 Fractal scaling of the network and its skeleton
Fig 2.10 Fractal scaling of the largest connected part of Hsapi20081014CR
Fig 2.11 Fractal scaling of the skeleton of the largest connected part of Hsapi20081014CR
Fig 2.12 PPI network of E. coli (version of E.coli20081014CR)
Fig 2.13 The largest connected sub-network G of E. coli
Fig 2.14 Fractal scaling for the sub-network of E. coli
Fig 2.15 Fractal scaling of the skeleton of the sub-network of E. coli
Fig 2.16 Fractal scaling of the Homo sapiens (Hsapi20041003) PPI
Fig 2.17 Fractal scaling of the Arabidopsis thaliana PPI
Fig 2.18 Fractal scaling of the E.coli PPI
Fig 2.19 Fractal scaling of the C.elegans PPI
Fig 2.20 Fractal scaling of the Yeast PPI
Fig 3.1 Construction of Cantor multifractal in Example (ii)
Fig 3.2 Multifractal spectra for Cantor multifractal sets of Examples (i) and (ii)
Fig 3.3 Illustration of Sand-box algorithm
Fig 3.4 Box-covering algorithm could result in different numbers of boxes needed to cover the entire network
Fig 3.5 Linear regression
Fig 3.6 Arabidopsis thaliana PPI network
Fig 3.7 Node degree sequence of Arabidopsis thaliana PPI network
Fig 3.8 Node degree frequency of Arabidopsis thaliana PPI network
Fig 3.9 Degree distribution of Arabidopsis thaliana PPI network
Fig 3.10 Generating scale-free networks
Fig 3.11 The Dq curves for theoretically generated scale-free networks
Fig 3.12 Generating small-world networks
Fig 3.13 The Dq curves for theoretically generated small-world networks
Fig 3.14 Generating random networks
Fig 3.15 The Dq curves for theoretically generated random networks
Fig 3.16 The Dq curves for protein-protein interaction networks
Fig 3.17 The Dq curves for sub-networks of human PPI networks
Fig 3.18 The Dq curves for gene networks of colorectal cancer microarray data
Fig 3.19 The Dq curves for gene networks of diabetes microarray data
Fig 3.20 The Dq curves for gene networks of type I diabetes microarray data
Fig 3.21 The Dq curves for gene networks of lung cancer microarray data
Fig 4.1 Examples of fractional Brownian motions
Fig 4.2 Fractal scaling of weighted network of FBM
Fig 4.3 Illustration of the visibility graph algorithm
Fig 4.4 Illustration of the horizontal visibility algorithm (Luque et al. 2009)
Fig 4.5 Fractal scaling for FBM HVG
Fig 4.6 The Dq curves for FBM with error bar
Fig 4.7 The Dq curves for FBM HVG
Fig 4.8 Illustration of a binomial measure
Fig 4.9 Binomial multifractal when p1 = 0.3 and p2 = 0.7
Fig 4.10 Dq of binomial measure when p1 = 0.3 and p2 = 0.7
Fig 4.11 Cq of binomial measure when p1 = 0.3 and p2 = 0.7
Fig 4.12 Illustration of HVG for binomial measure
Fig 4.13 Dq of Binomial multifractal network
Fig 4.14 Cq of binomial multifractal network
Fig 4.15 Measure representation of B. burgdorferi
Fig 4.16 Dq of measure representations of bacteria
Fig 4.17 Cq of measure representations of bacteria
Fig 4.18 Dq of measure representation of bacteria HVG
Fig 4.19 Cq of measure representation of bacteria HVG
Fig 4.20 FBM VG after random breakdown (H = 0.9)
Fig 4.21 Illustration of exponential fit of HVG degree distribution
Fig 4.22 Hurst index with degree distribution exponent
Fig 4.23 FBM HVG log P (k) vs. k
Fig 4.24 Exponential fits for HVG


List of Tables

Table 2.1 Numerical results of fractal scaling of PPI networks
Table 2.2 Numerical results of fractal scaling for the original PPI networks
Table 2.3 Numerical results of fractal scaling for the weighted PPI networks
Table 3.1 Numerical results of generated scale-free networks
Table 3.2 Numerical results of generated small-world networks
Table 3.3 Numerical results of generated random networks
Table 3.4 Numerical results of Protein-protein interaction networks
Table 3.5 Numerical results of sub-networks of Human PPI
Table 3.6 Numerical results of gene networks
Table 4.1 Fractal dimensions of FBM weighted networks
Table 4.2 Numerical results of FBM HVG
Table 4.3 Numerical results of measure representations of genome HVG
Table 4.4 Exponential exponent for degree distribution of FBM HVG
Table 4.5 Comparison of logP(k) vs. k between numerical linear regressions and Eq. 4.35
Table 4.6 Exponential exponents for degree distribution of multifractal HVG
Table 4.7 Random breakdown test for FBM VG and HVG



Chapter 1

Introduction

1.1 The research problems

A great majority of real-world networks, including the World Wide Web, the Internet, cellular networks, social networks and many others, are complex networks. During the past two decades, comparative analyses of such networks from different fields have produced a series of significant results concerning their structures, topological properties, dynamics, etc. It has been shown that many real complex networks share distinctive characteristic properties that differ in many ways from those of random and regular networks (Lee and Jung 2006, Guo and Cai 2009). Two fundamental properties of real complex networks have attracted much attention recently: the scale-free degree distribution (Albert et al. 1999, Albert and Barabasi 2002, Faloutsos 1999) and the small-world effect (Erdos and Renyi 1960, Milgram 1967). The small-world effect is characterized mathematically by an average shortest path length $\bar{l}$ that grows at most logarithmically with the network size $N$ (the number of nodes in the network), $\bar{l} \approx \ln N$; here the shortest path length between two nodes defines the distance metric of the network. Equivalently, $N \approx e^{\bar{l}/l_0}$, where $l_0$ is a characteristic length. This exponential relation led to a belief that complex networks are not self-similar, since self-similarity requires a power-law relation between $N$ and $\bar{l}$. To uncover the self-similarity of complex networks, Song et al. (2005) calculated their fractal dimension and found that the box-covering method is a powerful tool for further investigations of network properties. Their work, published in Nature, opened a new line of research on complex networks, and this property has since drawn much attention.
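The contrast drawn above between the small-world relation and self-similarity can be made explicit by setting the relations side by side. In the third row, $N_B(l_B)$ is the minimum number of boxes of size $l_B$ needed to cover the network, as in the box-covering formulation of Song et al. (2005); the exponent $d$ in the second row is a generic power-law exponent introduced here only for comparison.

```latex
\begin{aligned}
\text{small-world:}\quad & N \approx e^{\bar{l}/l_0}
  && \Longleftrightarrow\quad \ln N \approx \bar{l}/l_0,\\
\text{self-similar (power law):}\quad & N \sim \bar{l}^{\,d}
  && \Longleftrightarrow\quad \ln N \approx d \ln \bar{l} + \text{const},\\
\text{box covering:}\quad & N_B(l_B) \sim N\, l_B^{-d_B}
  && \Longleftrightarrow\quad \ln N_B(l_B) \approx -d_B \ln l_B + \text{const}.
\end{aligned}
```

In the first two rows $\ln N$ is linear in $\bar{l}$ and in $\ln \bar{l}$ respectively, which is why the small-world relation was read as incompatible with self-similarity. The third row replaces the global relation between $N$ and $\bar{l}$ by a scaling relation between the number of boxes and the box size, which is the quantity the box-covering method measures.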

Several algorithms have since been proposed to improve the box-covering method for fractal scaling of complex networks. Song et al. (2007) developed several algorithms to calculate the fractal dimension of complex networks. Kim et al. (2007a) then proposed an improved algorithm by considering the skeleton of networks. Zhou et al. (2007) proposed an alternative algorithm, based on edge-covering box counting, to explore the self-similarity of complex cellular networks. Later, a ball-covering approach (Gao et al. 2008) and an approach based on the scaling property of the volume (Shanker 2007, Guo and Cai 2009) were proposed for calculating the fractal dimension of complex networks.

This thesis aims to study the multifractality of complex networks in three related aspects: (i) fractal scaling of weighted complex networks; (ii) multifractal analysis of complex networks; (iii) analysis of complex networks constructed from time series.

(i) Fractal scaling of weighted complex networks.

Fractal scaling of complex networks is important for two basic reasons. First, it shows that real-world networks can be self-similar (with a finite fractal dimension) and still exhibit the small-world effect, as heralded by Song et al. (2005). Second, it successfully extends the box-covering method to complex networks, which are not defined on a Euclidean space (Song et al. 2007, Kim et al. 2007a, Zhou et al. 2007, Gao et al. 2008). Because a metric on graphs is not the same as the Euclidean metric on Euclidean spaces, the box-covering algorithms for calculating the fractal dimension of networks are much more complicated than the traditional box-covering algorithm for fractal sets in Euclidean spaces. The real networks that have been shown to possess the self-similarity property in the existing literature are all unweighted networks. We aim to confirm that the study of properties of complex networks can be extended to a wider class that includes more complex weighted networks. In this thesis, the protein-protein interaction (PPI) networks are chosen as a key example to investigate whether their weighted networks inherit the self-similarity of the original unweighted ones. Moreover, the idea of weighted networks will be useful for our study in the third part of the thesis.
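As a concrete instance of a weighted network, the time-series construction summarised in the Abstract (nodes defined by vectors of a certain length taken from the series, edge weights given by the Euclidean distance between vectors) can be sketched as below. The use of non-overlapping segments of a hypothetical length `m`, the dropping of an incomplete final segment, and the complete graph linking every pair of vectors are assumptions made for illustration; the construction used in this thesis may differ in these details.

```python
import math
from itertools import combinations


def segment_vectors(series, m):
    """Split the series into consecutive, non-overlapping vectors of length m (assumed)."""
    return [series[s:s + m] for s in range(0, len(series) - m + 1, m)]


def weighted_network(series, m):
    """Complete weighted graph: node i is the i-th vector, weight = Euclidean distance."""
    vectors = segment_vectors(series, m)
    weights = {}
    for i, j in combinations(range(len(vectors)), 2):
        weights[(i, j)] = math.dist(vectors[i], vectors[j])
    return vectors, weights


# Usage: six values and m = 2 give the vectors [0, 1], [3, 5] and [3, 1].
vectors, weights = weighted_network([0.0, 1.0, 3.0, 5.0, 3.0, 1.0], m=2)
print(vectors, weights)
```

Here the three pairwise distances are 5, 3 and 4; smoother series give smaller distances between successive vectors, which is the kind of structure the weighted network is meant to retain.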

(ii) Multifractal analysis of complex networks.



In Song et al. (2005), self-similarity was shown to exist in the WWW, E. coli and human protein-protein interaction (PPI) networks, with fractal dimensions $d_B = 4.1$, $d_B = 2.3$ and $d_B = 2.3$, respectively. The two PPI networks therefore have the same fractal dimension. Real-world fractals, however, are not homogeneous: there is rarely an identical motif repeated on all scales, and two objects may have the same fractal dimension and yet look completely different. This motivates a closer study of the fractal properties of complex networks: are they simple fractals or multifractals, and how different can two networks be even when they have the same fractal dimension? To answer these questions, multifractal analysis is needed. Multifractal analyses have been used to study and characterize a wide range of objects. Fernández et al. (1999) investigated the theoretical and practical aspects of studying and measuring the multifractal dimensions of neurons; Saa et al. (2007) used multifractal analysis to estimate the Rényi dimensions of river basins; Yu et al. (2001b) discussed the multifractal property of the measure representation and the classification of bacteria; and Lee and Jung (2006) found that the probability distribution of the clustering coefficient is best characterized by a multifractal.

(iii) Analysis of complex networks constructed from time series.

Time series analysis attracts special attention because of its practical and theoretical importance in physics, physiology, biology and social systems. Reconstructing networks from time series is a common problem in diverse research fields. Network theory provides a new viewpoint and an effective tool for understanding a complex system globally, through the relations between its elements. However, how to construct a network from a time series is still an essential open problem, and networks constructed with different methods have distinct characteristics. The key question is whether the associated graph inherits some of the structure of the time series, and consequently whether the process that generated the time series can be characterized using graph theory. Several approaches have been proposed to investigate properties of time series via their corresponding networks (Donner et al. 2011). Zhang and Small (2006), Zhang et al. (2008) and Small et al. (2009) showed that the cycle networks of linear and periodic systems appear random, while chaotic and nonlinear systems generate highly structured networks.



Donner et al. (2010) discussed the geometric interpretation of a variety of global network properties, as well as vertex and edge properties, of ε-recurrence networks, including graphical representations of the spatial distributions of different vertex properties for the Lorenz system in the standard parameter setting; Lacasa et al. (2008) used the visibility graph to estimate the Hurst exponent h of fractional Brownian motion. In this thesis we aim to investigate the topological properties of networks constructed from time series. More specifically, we investigate whether the corresponding complex networks inherit the fractal and multifractal behaviour of the time series. In addition, other topological properties of these networks, such as degree distribution and resilience, will be discussed.
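As an illustration of how a time series can be mapped to a network, the following Python sketch builds a natural visibility graph of the kind used by Lacasa et al.: each observation becomes a node, and two observations are linked if the straight line between them passes above every intermediate observation. It is a minimal O(n³) sketch that assumes equally spaced observations; the function name `visibility_graph` is hypothetical, and this is not necessarily the construction used later in this thesis.

```python
def visibility_graph(series):
    """Return the edge set of the natural visibility graph of `series`.

    Observation t (t = 0, ..., n-1) becomes node t. Nodes a < b are linked
    when every intermediate point c lies strictly below the straight line
    joining (a, series[a]) and (b, series[b]).
    """
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Height of the line a-b at time c is yb + (ya - yb) * (b - c) / (b - a).
            # For consecutive points there is no c, so they are always linked.
            if all(series[c] < yb + (ya - yb) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges


print(sorted(visibility_graph([3, 1, 2])))  # [(0, 1), (0, 2), (1, 2)]
```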

1.2 An introduction to complex networks

A network is a set of items, which we will call vertices or nodes, with connections

between them, called edges. Systems taking the form of networks (also called

“graphs” in much of the mathematical literature) abound in practice. Examples

include the Internet, the World Wide Web, social networks of acquaintance or other

connections between individuals, organizational networks and networks of business

relations between companies, neural networks, metabolic networks, food webs,

distribution networks such as blood vessels or postal delivery routes, networks of

citations between papers, and many others.

The study of networks, in the form of mathematical graph theory, is one of the fundamental pillars of discrete mathematics. Euler's solution of the Königsberg bridge problem in 1735 is now regarded as the birth of graph theory. As shown in Fig. 1.1, the nodes of the graph are the separate land masses of old Königsberg, and its links are the bridges between these pieces of land. The problem asks whether a pedestrian could walk around Königsberg crossing each bridge exactly once; in other words, is it possible to walk along this graph passing through each link exactly once? Euler proved that such a walk is impossible.
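Euler's argument rests on vertex degrees: a connected multigraph has a walk that uses every edge exactly once if and only if zero or two of its vertices have odd degree, and in the Königsberg graph all four land masses touch an odd number of bridges (5, 3, 3 and 3). The Python sketch below checks this parity condition. The land-mass labels A to D are hypothetical names standing in for the nodes of Fig. 1.1, and the check assumes the graph is connected.

```python
from collections import Counter

# Hypothetical labels: "A" is the central island, "B" and "C" the two river
# banks, "D" the eastern land mass. Each tuple is one of the seven bridges;
# repeated pairs are parallel bridges.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def odd_degree_vertices(edges):
    """Return the vertices of a multigraph whose degree is odd."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, k in degree.items() if k % 2 == 1)


def has_euler_walk(edges):
    """Euler's condition for a connected multigraph: a walk using every edge
    exactly once exists iff zero or two vertices have odd degree."""
    return len(odd_degree_vertices(edges)) in (0, 2)


print(odd_degree_vertices(BRIDGES))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))       # False
```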



Fig 1.1 Euler’s graph for the Königsberg bridge problem

In recent years, with the development of technology, more and more networks have turned out to be quite complex. They have huge numbers of nodes and edges, and features that do not occur in simple graphs. Research on networks has therefore shifted away from the analysis of single small graphs and the properties of individual vertices or edges within such graphs towards the large-scale statistical properties of complex networks. For networks of tens or hundreds of vertices, it is relatively straightforward to draw a picture of the network with actual points and lines and to answer specific questions about network structure by examining this picture.

Fig 1.2 E.coli protein-protein interaction network

An example, the E. coli protein-protein interaction network, is shown in Fig. 1.2. From this figure we can see that most proteins belong to the largest connected part, while some others are isolated. We expect that, as experimental technology and analysis methods develop, more protein interactions will be found, so that this network will become more connected and more complex. For large networks such visual inspection is no longer practical, and the recent development of statistical methods for quantifying large networks is to a large extent an attempt to gain an understanding of their structure by other means.

1.2.1 Definitions and notations

A network is a system that admits an abstract mathematical representation as a graph.

In a network, nodes usually identify the elements of the system, and links or edges

identify the relations or interactions among all the elements. In this subsection we

will introduce some basic definitions of networks in graph theory.

Graph G

A graph is an ordered pair $G = (V, E)$, where

(i) $V = \{v_1, v_2, \ldots, v_n\}$, $V \neq \emptyset$, is called the vertex or node set of $G$;

(ii) $E = \{w_1, w_2, \ldots, w_m\}$ is the edge set of $G$, in which $w_i = \{v_j, v_t\}$ (undirected) or $\langle v_j, v_t \rangle$ (directed) is the edge linking the two nodes $v_j$ and $v_t$.

In what follows we use the following notation.

Node set

$N = \{n_1, n_2, \ldots, n_I\}$, $N \neq \emptyset$, represents the nodes (vertices, points) of the graph $G$.

Edge set

$E = \{e_1, e_2, \ldots, e_J\}$ represents the links (edges, lines) among the elements of $N$.

The numbers of elements in $N$ and $E$ are denoted by $I$ and $J$, respectively, so $G(I, J)$, or $G_{I,J}$, denotes a graph or network with $I$ nodes and $J$ edges. A node is usually referred to by its index $i$ in the set $N$. In an undirected graph, each edge is defined by a pair of nodes $n_i$ and $n_j$; $e_{ij}$ denotes the edge between nodes $n_i$ and $n_j$, and $n_i$ and $n_j$ are called the two ends of the edge.



Directed/undirected graph

An edge is directed if it runs in only one direction (such as a one-way road between

two points), and undirected if it runs in both directions. Directed edges, which are

sometimes called arcs, can be thought of as sporting arrows indicating their

orientation. A graph is directed if all of its edges are directed. An undirected graph

can be represented by a directed one having two edges between each pair of

connected vertices, one in each direction.

Adjacency matrix

Networks are naturally represented in matrix form. A graph of N nodes is described by an N×N adjacency matrix $A$ whose non-zero elements $a_{ij}$ indicate connections between nodes. For undirected networks, a non-diagonal element $a_{ij}$ of the adjacency matrix is equal to the number of edges between nodes $i$ and $j$, so the matrix is symmetric. A diagonal element $a_{ii}$ is twice the number of loops of length one (self-loops) attached to node $i$.

Node degree

The degree $k_i$ of a node $i$ is the number of edges connected to the node, and is defined in terms of the adjacency matrix $A$ as

$$k_i = \sum_{j \in N} a_{ij}. \tag{1.1}$$

If the graph is directed, with $a_{ij} = 1$ when there is an edge from $i$ to $j$, the degree of the node has two components:

$$k_i^{\mathrm{out}} = \sum_j a_{ij} \quad \text{(the out-degree of the node)},$$

$$k_i^{\mathrm{in}} = \sum_j a_{ji} \quad \text{(the in-degree of the node)}.$$

The total degree is then defined as

$$k_i = k_i^{\mathrm{in}} + k_i^{\mathrm{out}}. \tag{1.2}$$

A list of the node degrees of a graph is called the degree sequence.
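Equations (1.1) and (1.2) translate directly into row and column sums of the adjacency matrix. The sketch below assumes a plain list-of-lists matrix and the convention $a_{ij} = 1$ for a directed edge from $i$ to $j$; the function names are illustrative only.

```python
def degrees(adj):
    """Degree k_i = sum_j a_ij (Eq. 1.1) of every node of an undirected graph."""
    return [sum(row) for row in adj]


def in_out_degrees(adj):
    """Return (k_in, k_out) for a directed graph with a_ij = 1 for an edge i -> j.

    k_out_i is the sum of row i; k_in_i is the sum of column i.
    """
    n = len(adj)
    k_out = [sum(adj[i]) for i in range(n)]
    k_in = [sum(adj[j][i] for j in range(n)) for i in range(n)]
    return k_in, k_out


path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]                 # undirected path 0 - 1 - 2
print(degrees(path))               # [1, 2, 1]

dag = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]                  # edges 0->1, 0->2, 1->2
k_in, k_out = in_out_degrees(dag)
print(k_in, k_out)                 # [0, 1, 2] [2, 1, 0]
print([i + o for i, o in zip(k_in, k_out)])  # total degree (Eq. 1.2): [2, 2, 2]
```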

Degree distribution

The degree distribution $P(k)$ is the most basic topological characterization of a graph $G$. It is defined as the probability that a randomly chosen node has degree $k$ or, equivalently, as the fraction of nodes in the graph having degree $k$. The degree distribution is also denoted $P_k$ or $p_k$, to indicate that the variable $k$ takes non-negative integer values. For directed networks one needs to consider two distributions, $P(k^{\mathrm{in}})$ and $P(k^{\mathrm{out}})$. Information on how the degree is distributed among the nodes of an undirected network can be obtained from a plot of $P_k$.
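The second definition, the fraction of nodes with degree $k$, is what one computes in practice from a degree sequence. A minimal Python sketch (the function name is illustrative):

```python
from collections import Counter


def degree_distribution(degree_sequence):
    """Return {k: P(k)}: the fraction of nodes that have degree k."""
    n = len(degree_sequence)
    counts = Counter(degree_sequence)
    return {k: counts[k] / n for k in sorted(counts)}


# Star graph: one hub joined to four leaves.
print(degree_distribution([4, 1, 1, 1, 1]))  # {1: 0.8, 4: 0.2}
```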

Shortest path and distance between nodes

In graph theory, a shortest path is a path between two vertices (or nodes) such that the sum of the weights of its constituent edges is minimized. The distance between two nodes in an unweighted graph is the number of edges in a shortest path connecting them. This is also known as the geodesic distance, because it is the length of the graph geodesic between those two vertices. If there is no path connecting the two vertices, i.e., if they belong to different connected components, the distance is conventionally defined as infinite. Standard algorithms for finding shortest paths, such as Dijkstra's algorithm and breadth-first search, are described in Cormen et al. 2001, Sedgewick 1988 and Ahuja et al. 1993.
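For unweighted graphs, breadth-first search, one of the standard algorithms cited above, computes all geodesic distances from a source node in time linear in the number of nodes and edges. A minimal Python sketch, assuming the graph is stored as a dictionary that maps each node to its neighbours (a representation chosen here for illustration):

```python
import math
from collections import deque


def bfs_distances(graph, source):
    """Geodesic distances from `source` in an unweighted graph.

    `graph` maps each node to an iterable of its neighbours. Nodes in other
    connected components keep distance math.inf, following the convention above.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if dist[w] == math.inf:      # first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


g = {0: [1], 1: [0, 2], 2: [1], 3: []}   # path 0 - 1 - 2 and an isolated node 3
print(bfs_distances(g, 0))               # {0: 0, 1: 1, 2: 2, 3: inf}
```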

Network diameter and radius

Shortest paths play an important role in transport and communication within a network (Pastor-Satorras and Vespignani 2004, Wasserman and Faust 1994, Scott 2000). It is useful to represent all the shortest path lengths of a graph $G$ as a matrix $D$ whose element $d_{ij}$ is the length of the geodesic from node $i$ to node $j$. The maximum value of $d_{ij}$ is called the diameter of the graph; it is the longest geodesic between any two nodes in the graph. For a connected graph, let the length of the longest geodesic from a node $i$ to any other node be called the radius of node $i$, denoted radius($i$) (also known as the eccentricity of $i$). The nodes with the smallest radius then form the centre of the graph, and the largest radius over all nodes is the graph's diameter.
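The definitions of radius($i$), centre and diameter can be computed with one breadth-first search per node. The Python sketch below assumes a connected, unweighted graph stored as a node-to-neighbours dictionary; the function names are illustrative.

```python
from collections import deque


def radius_of_node(graph, i):
    """radius(i) in the text: the longest geodesic from node i to any other node."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(graph):
        raise ValueError("radius(i) is defined here only for connected graphs")
    return max(dist.values())


def centre_and_diameter(graph):
    """Return (nodes with the smallest radius, largest radius = diameter)."""
    radius = {v: radius_of_node(graph, v) for v in graph}
    smallest = min(radius.values())
    centre = [v for v in graph if radius[v] == smallest]
    return centre, max(radius.values())


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(centre_and_diameter(path))  # ([2], 4)
```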

Characteristic path length

A measure of the typical separation between two nodes in the graph is given by the average shortest path length, also known as the characteristic path length, defined as the mean of geodesic lengths over all pairs of nodes (Watts and Strogatz 1998, Watts 1999):

$$L = \frac{1}{N(N-1)} \sum_{i, j \in N,\; i \neq j} d_{ij}. \tag{1.3}$$

A problem with this definition is that $L$ diverges if there are disconnected components in the graph, since $d_{ij} = \infty$ for nodes in different components. One way to avoid the divergence is to restrict the summation in formula (1.3) to pairs of nodes belonging to the largest connected component (Watts and Strogatz 1998).
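The restriction to the largest connected component can be sketched in Python as follows, assuming an unweighted graph stored as a node-to-neighbours dictionary (function names are illustrative). The largest component is found by repeated breadth-first searches, and formula (1.3) is then applied with $N$ replaced by the size of that component.

```python
from collections import deque


def _geodesics(graph, source):
    """BFS distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def characteristic_path_length(graph):
    """Eq. (1.3) with the sum restricted to the largest connected component."""
    seen, largest = set(), set()
    for v in graph:
        if v not in seen:
            component = set(_geodesics(graph, v))
            seen |= component
            if len(component) > len(largest):
                largest = component
    n = len(largest)
    if n < 2:
        raise ValueError("need at least two connected nodes")
    # BFS from a node of the largest component reaches exactly that component;
    # each node's zero distance to itself adds nothing to the sum.
    total = sum(sum(_geodesics(graph, v).values()) for v in largest)
    return total / (n * (n - 1))


g = {0: [1], 1: [0, 2], 2: [1], 3: []}   # path 0 - 1 - 2 plus an isolated node
print(characteristic_path_length(g))     # 8 / 6 = 1.333...
```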

Efficiency of a graph

The efficiency of a graph is defined as

$$E = \frac{1}{N(N-1)} \sum_{i, j \in N,\; i \neq j} \frac{1}{d_{ij}}. \tag{1.4}$$

This quantity is an indicator of the traffic capacity of a network, and it avoids the divergence of formula (1.3): any pair of nodes belonging to disconnected components of the graph has $d_{ij} = \infty$ and therefore contributes $1/d_{ij} = 0$ to the summation in formula (1.4). The mathematical properties of the efficiency have been investigated (Criado et al. 2005a, Buhl et al. 2004, Latora and Marchiori 2005, Criado et al. 2005b).
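Formula (1.4) can be evaluated on a disconnected graph without special treatment, because unreachable pairs simply never appear in the breadth-first search. A minimal Python sketch under the same node-to-neighbours representation (the function name is illustrative):

```python
from collections import deque


def efficiency(graph):
    """Eq. (1.4): mean of 1/d_ij over ordered pairs i != j of all N nodes.

    Pairs in different components are never reached by the BFS, so they
    contribute 1/inf = 0, exactly as in the text.
    """
    n = len(graph)
    total = 0.0
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


g = {0: [1], 1: [0, 2], 2: [1], 3: []}   # path 0 - 1 - 2 plus an isolated node
print(efficiency(g))                     # 5 / 12 = 0.41666...
```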

Betweenness and closeness

Betweenness of a node $i$ is the number of paths from all nodes except $i$ to all other nodes that must pass through node $i$. Closeness of node $i$ is the number of direct paths from all nodes to all other nodes that must pass through node $i$. Thus betweenness considers all paths, while closeness considers only the direct paths. The number of paths between pairs of nodes can be very large if both directions along all paths in an undirected graph are counted. For this reason, some researchers prefer to use closeness, which counts only the direct paths; this number may be much smaller than the number of all paths, but it is still large. The communication between two non-adjacent nodes, say $j$ and $k$, depends on the nodes belonging to the paths connecting $j$ and $k$. Consequently, together with degree and closeness, betweenness is one of the standard measures of node (or edge) centrality; it was originally introduced to quantify the importance of an individual in a social network. More precisely, the betweenness $b_i$ of a node (or edge) $i$, sometimes also referred to as the load, is defined as

$$b_i = \sum_{j, k \in N,\; j \neq k} \frac{n_{jk}(i)}{n_{jk}}, \tag{1.5}$$

where njk is the number of shortest paths connecting j and k, while njk(i) is the

number of shortest paths connecting j and k and passing through i. Betweenness

distributions, betweenness-betweenness correlations and betweenness-degree

correlations have been studied recently (Goh et al. 2001 and Goh et al. 2003).
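Counting $n_{jk}$ and $n_{jk}(i)$ pair by pair is expensive. A widely used alternative, swapped in here, is Brandes' algorithm, which accumulates the terms of Eq. (1.5) with one breadth-first search per source node. The Python sketch below assumes an unweighted graph stored as a node-to-neighbours dictionary. It sums over ordered pairs $(j, k)$ with $i$ excluded as an endpoint, so for an undirected graph each unordered pair is counted twice, as in Eq. (1.5).

```python
from collections import deque


def betweenness(graph):
    """Node betweenness b_i of Eq. (1.5) via Brandes' algorithm (unweighted graph).

    Ordered pairs (j, k), j != k, are summed; node i is never counted as an
    endpoint of its own pairs.
    """
    b = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting geodesics: sigma[v] = n_sv.
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v precedes w on a geodesic
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate the dependency of s on every node, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    return b


path = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(path))  # {0: 0.0, 1: 2.0, 2: 0.0}
```

In a 4-cycle, for example, each node lies on one of the two geodesics between its two neighbours, so every node receives $1/2 + 1/2 = 1$.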

Clustering

Clustering, also known as transitivity, is a typical property of acquaintance networks,

where two individuals with a common friend are likely to know each other

(Wasserman and Faust 1994). In terms of a generic graph G, transitivity means the

presence of a high number of triangles. This can be quantified by defining the

transitivity T of the graph as the relative number of transitive triples, i.e. the fraction

of connected triples of nodes which also form triangles (Newman 2003, Newman

2001a, Barrat and Weigt 2000):

$$T = \frac{3 \times \text{number of triangles in } G}{\text{number of connected triples of vertices in } G}. \tag{1.6}$$

The factor 3 in the numerator compensates for the fact that each complete triangle of three nodes contributes three connected triples, one centred on each of the three nodes, and ensures that $0 \le T \le 1$.
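Equation (1.6) can be evaluated by checking, for every node, which pairs of its neighbours are themselves linked; each triangle is then seen once from each of its three corners, which is exactly the factor 3 in the numerator. A short Python sketch, assuming a simple undirected graph stored as a dictionary of neighbour sets (the function name is illustrative):

```python
from itertools import combinations


def transitivity(graph):
    """T of Eq. (1.6) for a simple undirected graph given as node -> set of neighbours."""
    closed = 0    # connected triples whose two end nodes are also linked
    triples = 0   # all connected triples (paths of length two), by centre node
    for v, nbrs in graph.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    # A triangle is closed at each of its three corners, so `closed` already
    # equals 3 x (number of triangles).
    return closed / triples if triples else 0.0


g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}  # triangle 0-1-2 plus pendant 3
print(transitivity(g))  # 0.6
```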

Clustering coefficient

The clustering coefficient C is a measure introduced by Watts and Strogatz (1998). A

quantity ci (the local clustering coefficient of node i) is first introduced, expressing

how likely ajm = 1 for two neighbours j and m of node i. Its value is obtained by

counting the actual number of edges (denoted by ei) in Gi (the subgraph of

neighbours of node i). The local clustering coefficient is defined as

$$c_i = \frac{2 e_i}{k_i (k_i - 1)}, \tag{1.7}$$

where $k_i(k_i - 1)/2$ is the maximum possible number of edges in $G_i$. The clustering coefficient of the graph is then given by the average of $c_i$ over all the nodes in $G$:

$$C = \langle c \rangle = \frac{1}{N} \sum_{i \in N} c_i. \tag{1.8}$$

By definition, $0 \le c_i \le 1$ and $0 \le C \le 1$.
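Equations (1.7) and (1.8) can be sketched in Python as follows, assuming a simple undirected graph stored as a dictionary of neighbour sets. Nodes with fewer than two neighbours have no possible edges in $G_i$; setting $c_i = 0$ for them is an assumed convention, not one stated in the text.

```python
from itertools import combinations


def local_clustering(graph, i):
    """c_i = 2 e_i / (k_i (k_i - 1)), Eq. (1.7); c_i = 0 if k_i < 2 (assumed convention)."""
    k = len(graph[i])
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbours of i.
    e = sum(1 for a, b in combinations(graph[i], 2) if b in graph[a])
    return 2.0 * e / (k * (k - 1))


def clustering_coefficient(graph):
    """C = <c>, the average of c_i over all nodes, Eq. (1.8)."""
    return sum(local_clustering(graph, i) for i in graph) / len(graph)


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print([local_clustering(g, i) for i in g])  # [0.333..., 1.0, 1.0, 0.0]
print(clustering_coefficient(g))            # 7/12 = 0.5833...
```

For this graph the transitivity of Eq. (1.6) equals 3/5, so $T$ and $C$ are different averages and need not coincide.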

1.2.2 Properties of complex networks

There are many complex systems in nature and in technology such as neural

networks, coupled biological and chemical systems, social interacting species and the

Internet. The first approach to capture the global properties of such systems is to

model them as graphs whose nodes represent the elements in the system and links

represent the interactions between elements. These networks can be regarded as approximations of the true systems, because the original interaction between elements, which usually depends on time, space and many other details, is simply translated into the presence or absence of a link between the two corresponding nodes. Nevertheless, in many cases of practical interest, such an approximation provides a simple but still very informative representation of the entire system.

During the past decade, the growing availability of large databases and computing facilities, together with the development of powerful and reliable data analysis tools, has made it possible to explore the topological properties of many networked systems from the real world. This has allowed the study of the topology of interactions in systems as diverse as communication, social and biological systems. Most of the features of real-world networks that have attracted researchers' attention concern the ways in which these networks differ from theoretical networks such as random graphs.

Most real networks share the same topological properties, for instance relatively small characteristic path lengths, high clustering coefficients, heavy tails in the degree distributions, degree correlations, and the presence of motifs and community structures. This has drawn considerable attention to understanding the evolution mechanisms that have shaped the topology of networks, and to the design of new models that retain the most significant empirically observed properties. In this section we describe some features that appear to be common to networks of many different types.

Small-world effect

In most real networks, despite their large size, there is a relatively short path between any two nodes. This property was first investigated in the social context by Milgram in the 1960s, in a series of experiments to estimate the actual number of steps in a chain of acquaintances. In his first experiment, Milgram asked randomly

selected people in Nebraska to send letters to a distant target individual in Boston,

identified only by his name, occupation and rough location. The letters could only be

sent to someone whom the current holder knew by first name, and who was

presumably closer to the final recipient. Milgram kept track of the paths followed by

the letters and of the demographic characteristics of their handlers. Although the

common guess was that it might take hundreds of these steps for the letters to reach

their final destination, Milgram’s surprising result was that the number of links

needed to reach the target person had an average value of just six. The experiments

are often associated with the phrase "six degrees of separation", although Milgram

did not use this term himself.

The small-world effect is characterized mathematically by an average shortest path length $l$ that grows at most logarithmically with the network size $N$ (the number of nodes in the network), $l \sim \ln N$, where the shortest distance between two nodes defines the distance metric in complex networks. Equivalently,

$$N \approx e^{l / l_0}, \tag{1.9}$$

where $l_0$ is a characteristic length.

The small-world property has been observed in a variety of other real networks, including biological and technological networks (Watts and Strogatz 1998, Watts 1999, Newman 2001a, Newman 2001b). Watts and Strogatz (1998) proposed to define small-world networks as sparse networks with a high clustering coefficient and a relatively short average path length. In simple terms, the small-world effect is the rapid decline in average path length as a small number of links are added at random to a network.
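The last statement can be checked numerically. The Python sketch below starts from a ring lattice in which every node is linked to its two nearest neighbours on each side, and then adds a few random shortcuts without removing any edge. This is a shortcut-adding variant of the Watts–Strogatz construction, chosen here because it cannot disconnect the graph; the sizes 200 and 10 and the seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours on either side."""
    return {v: {(v + d) % n for d in range(-k, k + 1) if d != 0} for v in range(n)}


def average_path_length(graph):
    """Mean geodesic length over ordered pairs (Eq. 1.3); assumes a connected graph."""
    n = len(graph)
    total = 0
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))


def add_random_shortcuts(graph, m, seed=0):
    """Copy of `graph` with m extra edges between random non-adjacent pairs.

    No edge is removed, so the graph stays connected and no distance can grow.
    """
    rng = random.Random(seed)
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    nodes = list(g)
    added = 0
    while added < m:
        u, v = rng.sample(nodes, 2)
        if v not in g[u]:
            g[u].add(v)
            g[v].add(u)
            added += 1
    return g


lattice = ring_lattice(200, 2)
print(average_path_length(lattice))                            # 5050/199, about 25.38
print(average_path_length(add_random_shortcuts(lattice, 10)))  # smaller
```

Each shortcut joins two nodes that were at distance at least 2, so the average path length must strictly decrease; how much it falls for only ten added links is what the small-world effect refers to.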

Scale-free degree distribution

In regular lattices all nodes have the same degree, while in a random graph each of the N(N − 1)/2 possible links is present with equal probability, so the degree distribution is binomial, or Poisson in the limit of large graph size. It is therefore not surprising that, when scientists began to study real networks from the available databases, they expected to find degree distributions localized around an average value, with a well-defined average of quadratic fluctuations. Contrary to these expectations, some real networks were found to display a power-law degree distribution:

$$P(k) \sim k^{-\gamma}, \tag{1.10}$$

with exponents in the range $2 < \gamma < 3$. The average degree $\langle k \rangle$ of such a network is therefore well defined, while the second moment $\langle k^2 \rangle$ diverges as $N \to \infty$. Such networks have been named scale-free networks (Albert and Barabasi 2002, Barabasi and Albert 1999). Power laws play a particular role in statistical physics because of their connections to phase transitions and fractals. Networks with a power-law degree distribution have a small number of high-degree nodes and a large number of low-degree nodes.

In nonmathematical terms, a scale-free network is one with a few nodes linked to

many other nodes, and a large number of poorly connected nodes. The rare nodes

with high degree are called hubs. Scale-free networks have been the focus of a great

deal of attention (Albert and Barabasi 2002, Dorogovtsev and Mendes 2002, Strogatz

2001).
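A standard way to generate a scale-free network is growth with linear preferential attachment, as in the Barabási–Albert model cited above. The Python sketch below is a minimal version under illustrative assumptions (a complete seed graph on m + 1 nodes, m distinct targets per new node, a fixed random seed); it returns only the degree sequence, which is enough to see the hubs described in the text. The function name is hypothetical.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Degree sequence of a graph grown by linear preferential attachment.

    Start from a complete graph on m + 1 nodes; each new node links to m
    distinct existing nodes chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    # `targets` lists each node once per unit of degree, so uniform sampling
    # from it is degree-proportional sampling.
    targets = []
    degree = [0] * n
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            targets += [u, v]
            degree[u] += 1
            degree[v] += 1
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for v in chosen:
            targets += [new, v]
            degree[new] += 1
            degree[v] += 1
    return degree


deg = preferential_attachment(5000, 2)
print(sum(deg) / len(deg), max(deg))  # mean degree close to 4, maximum far larger
```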

Other properties of networks

It is widely assumed that most social networks show “community structures”, i.e.,

groups of vertices that have a high density of edges within them, with a lower density

of edges between groups (Scott 2000, Wasserman 1994). The traditional method for

extracting community structure from a network is cluster analysis (Everitt 1974), which is sometimes also called hierarchical clustering. Clustering is possible

according to many different definitions of the connection strength. Results appear to

show that, for social and biological networks at least, community structure is a

common network property. Goh et al. (2002) made a statistical study of the

distribution of the “betweenness centrality” of vertices in networks. They showed

that betweenness appears to follow a power law for many networks and proposed a

classification of networks into two kinds based on the exponent of this power law.

1.2.3 Introduction to PPI networks

Here we introduce protein-protein interaction (PPI) networks specifically, because the PPI network is a typical real-world network with properties such as the small-world effect and a scale-free degree distribution. In our work, we apply fractal scaling and multifractal analysis to some real networks, and PPI networks are among the best choices.

In the “post-genome” era, proteomics (Palzkill 2002, Waksman 2005) has become an

essential field and drawn much attention. Proteomics is the systematic study of the

many and diverse properties of proteins with the aim of providing detailed

descriptions of the structure, function, and control of biological systems in health and

diseases. A particular focus of the field of proteomics is the nature and role of

interactions between proteins. Protein-protein interactions (Palzkill 2002, Park et al.

2009, Peink and Alber 1998, Pellegrini et al. 1999, Qi et al. 2007, Rao and Srinivas

2003 and Rumelhart et al. 1986) play different roles in biology depending on the

composition, affinity, and lifetime of the association. It has been observed that

proteins seldom act as single isolated species while performing their functions in

vivo.

The study of protein interactions is fundamental to understand how proteins function

within a cell. The knowledge of protein-protein interaction can provide important

information on the possible biological function of a protein. Much effort has been devoted to detecting and analysing PPIs using experimental methods such as the well-known yeast two-hybrid system. Recently, several algorithms have been developed to identify functional interactions between proteins computationally; these can provide clues for experimental methods and simplify the task of protein interaction mapping. As the prediction task becomes harder, the need increases for methods that can accommodate high levels of missing values and are directly interpretable by biologists.

Protein-protein interaction plays a key role in the cellular processes of an organism.

An accurate and efficient identification of protein-protein interaction is fundamental

for us to understand the physiology, cellular functions, and complexity of an

organism. Before 2000, most theoretical methods for predicting protein-protein interactions were based on available complete genomes, for example the phylogenetic profile, domain fusion (Rosetta stone) and gene neighbour methods. The phylogenetic profile method (Cubellis et al. 2005, Hoskins et al. 2006 and Karimpour-Fard et al. 2007), also called the co-conservation method, is a computational method that has been used to predict functional interactions between pairs of proteins in a target organism by determining whether both proteins are consistently present or absent across a set of reference genomes. This method was first introduced by Pellegrini et al. (1999); it has been applied successfully to the prediction of protein function by several groups and has proved more powerful than sequence similarity alone in predicting protein function.
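The co-conservation idea can be made concrete with binary profiles: for each protein, record a 1 for each reference genome that contains a homologue and a 0 otherwise, and predict an interaction when two profiles coincide or nearly coincide. The Python sketch below uses the Hamming distance with a user-chosen threshold, which is only the simplest variant; the genome and protein names are hypothetical, and the homology search that produces the presence sets is assumed to have been done already.

```python
def phylogenetic_profile(genomes_with_homologue, reference_genomes):
    """Binary profile: 1 if the protein has a homologue in a reference genome, else 0."""
    return tuple(1 if g in genomes_with_homologue else 0 for g in reference_genomes)


def profile_distance(p, q):
    """Hamming distance: genomes in which exactly one of the two proteins is present."""
    return sum(a != b for a, b in zip(p, q))


def predicted_links(profiles, max_distance=0):
    """Protein pairs whose profiles differ in at most `max_distance` genomes."""
    names = sorted(profiles)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if profile_distance(profiles[a], profiles[b]) <= max_distance]


# Hypothetical reference genomes and presence data.
genomes = ["G1", "G2", "G3", "G4"]
profiles = {
    "P1": phylogenetic_profile({"G1", "G3"}, genomes),        # (1, 0, 1, 0)
    "P2": phylogenetic_profile({"G1", "G3"}, genomes),        # (1, 0, 1, 0)
    "P3": phylogenetic_profile({"G2", "G3", "G4"}, genomes),  # (0, 1, 1, 1)
}
print(predicted_links(profiles))  # [('P1', 'P2')]
```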

Some basic definitions:

The simplest representation of PPI networks (Dandekar et al. 1998, Enright et al.

1999 and Rual et al. 2005) takes the form of a mathematical graph consisting of

nodes and edges (or links). Proteins are represented as nodes and an edge represents

a pair of proteins which physically interact. The degree of a node is the number of

other nodes with which it is connected. It is the most elementary characteristic of a

node.

Properties of PPI Networks

A protein-protein interaction network has three main properties (Hu and Pan 2007):

scale invariance, disassortativity and small-world effect. Much work has been done

to study these properties and to find new ones.



(i) Scale invariance: in scale-free networks, most proteins participate in only a

few interactions, while a few participate in dozens of interactions.

(ii) The small-world effect means that any two nodes can be connected via a short path of a few links. The small-world phenomenon was first investigated as a concept in sociology, and it is a feature of a range of networks arising in nature and technology, the most familiar being the Internet.

(iii) Disassortativity: in protein-protein interaction networks the nodes which are

highly connected seldom link directly to each other. This is very different

from social networks in which well-connected people tend to have direct

connections to each other. Biological and technological networks generally tend to be disassortative.
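Disassortativity is commonly quantified by the degree assortativity coefficient, the Pearson correlation between the degrees at the two ends of an edge; negative values mean that hubs tend to attach to low-degree nodes. This measure is not defined in the text above and is brought in here as a common choice. A minimal Python sketch for an undirected edge list (the function name is illustrative):

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    `edges` is a list of undirected edges (u, v); each is used in both
    directions so that the measure is symmetric. r < 0 means disassortative.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)   # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    if var == 0:
        raise ValueError("undefined when all nodes have the same degree")
    return cov / var


star = [(0, i) for i in range(1, 5)]  # hub 0 linked only to leaves
print(degree_assortativity(star))     # -1.0: perfectly disassortative
```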

Sol and O’Meara (2005) introduced an approach to identify key residues in protein-protein interactions. They showed that protein complexes (Altaf-Ul-Amin et al. 2006) form small-world networks, and that a high percentage of highly connected residues correspond to, or are in contact with, experimentally validated hot spots. Their study showed that in at least 50% of the complexes, statistically significant high-betweenness residues occur at the protein-protein interfaces, which play an important role in the structures of the protein complexes.

1.3 Fractal scaling of complex networks

1.3.1 Literature review

Watts (1999) is one of the pioneering books on the subject; it deals with the structure and dynamics of small-world networks as well as small-world modelling. Bornholdt and Schuster (2003) defined the field of complex interacting networks in its infancy and presented the dynamics of networks and their structure as a key concept across disciplines. Newman et al. (2006) is a convenient sourcebook that brings together, for the first time, a set of key research articles in this fast-growing field. There are also review articles, such as Strogatz (2001) in Nature's special issue on complex systems, which contains a discussion of networks of dynamical units;

Newman (2003) reviewed some recent works on the structure and function of

networked systems. Boccaletti et al. (2006) reviewed the major concepts and results

in the study of the structure and dynamics of complex networks and summarized the

relevant applications of these ideas in many different disciplines, ranging from

nonlinear science to biology, from statistical mechanics to medicine and engineering.

Much work on the mathematics of networks has been driven largely by observations of the properties of actual networks and by attempts to model them. Social networks have been studied with respect to different patterns, such as friendships between individuals (Scott 2000, Wasserman and Faust 1994) and business relationships between companies (Moreno 1934, Rapoport and Horvath 1961). Technological networks are typically designed for the distribution of some commodity or resource, such as electricity or information; the electric power grid is a good example. Statistical studies of power grids have been carried out by, for example, Watts and Strogatz (1998), Watts (1999) and Amaral (2000). River networks can be regarded as a naturally occurring form of distribution network (Dodds and Rothman 2001). The metabolic network is a classic example of a biological network: it represents metabolic substrates and products as nodes, with a directed edge from a substrate to a product when a known metabolic reaction acts on the substrate and produces the product. Studies of the statistical properties of metabolic networks have been performed by, for example, Jeong et al. (2000) and Fell and Wagner (2000).

Two well-known properties of complex networks are scale-free degree distribution

and small-world effect. With the aim of providing a deeper understanding of the

underlying mechanism that leads to these common features, we need to probe the

patterns within the network structure in more detail. The question of connectivity between groups of interconnected nodes on different length scales has received less attention. However, many examples show the importance of collective behaviour, such as interactions between communities within social networks, links between clusters of websites on similar subjects, and the highly modular manner in which molecules interact to keep a cell alive. Song et al. (2005) used a box-covering algorithm to demonstrate the existence of self-similarity in many real networks. More specifically, they showed that networks such as the world-wide web (WWW), social networks, protein–protein interaction (PPI) networks and cellular networks are invariant, or self-similar, under a length-scale transformation.

Since then, the study of complex networks has entered a new era. With the help of box-covering algorithms, which are basic tools for measuring the fractal dimension of conventional fractal objects (Feder 1998, Brown and Liebovitch 2009), the fractal and self-similarity properties of complex networks have been studied extensively in a variety of systems (Rozenfeld 2008, Song et al. 2006).

1.3.2 Methods

The box-covering method is a basic tool for measuring the fractal dimension of conventional fractal objects embedded in Euclidean space (Feder 1998, Brown and Liebovitch 2009). However, this method cannot be applied directly to scale-free networks, which exhibit an inhomogeneous degree distribution and the small-world effect, because a Euclidean metric is not defined on such networks. Song et al. (2007) provided a detailed study of the algorithms used to calculate quantities characterizing the topology of such complex networks, such as the fractal dimension $d_B$. They studied and compared several possible box-covering algorithms applied to a number of model and real-world networks. They showed that the optimal network covering can be mapped directly to a vertex colouring problem, which is a well-studied problem in graph theory, and found that this approach leads to the most efficient solution of the optimal box-covering problem. Two other methods based on breadth-first search were also presented.

Let us recall the original definition of box covering by Hausdorff (Mandelbrot 1982, Feder 1998). For a given network $G$ and box size $l_B$, a box is a set of nodes in which the distance $l_{ij}$ between any two nodes $i$ and $j$ is smaller than $l_B$. The minimum number of boxes required to cover the entire network $G$ is denoted by $N_B$. For $l_B = 1$, each box contains a single node, so $N_B$ equals the size $N$ of the network, while $N_B = 1$ for $l_B \ge l_B^{\max}$, where $l_B^{\max}$ is the diameter of the network (i.e. the maximum distance in the network) plus one. The ultimate goal of all box-covering algorithms is to find the optimum solution, i.e. the minimum value $N_B(l_B)$ for any given box size $l_B$. This problem can be mapped to the graph colouring problem, which is known to be NP-hard. An exact vertex colouring can therefore be computed only for small networks, and in practice a greedy algorithm is widely adopted to obtain an approximate solution.

Greedy colouring algorithm

The greedy colouring algorithm (Song et al. 2007) combines the greedy algorithm (Cormen et al. 2001) with vertex colouring. It proceeds in two main steps: first, the nodes are ranked in a random sequence; second, following this sequence, each node is marked with the smallest free colour, i.e. a colour different from those of all previously coloured nodes lying at a distance $l_{ij} \ge l_B$ from it, since such nodes cannot share a box with it. This greedy algorithm is very efficient, since the network can be covered for a whole sequence of box sizes $l_B$ in only one network pass. Because the results may depend on the original colouring sequence, Song et al. (2007) investigated the quality of the algorithm by randomly reshuffling the colouring sequence and applying the greedy algorithm 10000 times to several model and real-world networks. Strictly speaking, the calculation of the fractal dimension $d_B$ through the relation $N_B \sim l_B^{-d_B}$ is valid only for the minimum possible value of $N_B$ for each given $l_B$, so an algorithm should aim to find this minimum $N_B$. For greedy colouring it is known (Cormen et al. 2001) that, for every graph, some colouring sequence yields an optimal colouring, i.e. the minimum value of the greedy algorithm over all sequences coincides with the optimal value.
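
As an illustration of the greedy colouring idea, the Python sketch below colours nodes in a given order for one box size $l_B$ and counts the colours used; nodes $i$ and $j$ with $l_{ij} \ge l_B$ cannot share a box, so they must receive different colours. It is a minimal sketch under our own assumptions, not the implementation of Song et al. (2007): the network is an adjacency dictionary, distances are breadth-first-search path lengths, each call handles a single $l_B$ rather than all box sizes in one pass, and the function names are hypothetical.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (chemical) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def greedy_box_count(dist, l_b, order):
    """Greedy colouring for one box size l_B; each colour class is one box.

    Nodes i and j may share a box only if l_ij < l_B, so node i must avoid
    the colours of every already-coloured node j with l_ij >= l_B.
    """
    colour = {}
    for i in order:
        forbidden = {colour[j] for j in colour
                     if dist[i].get(j, float("inf")) >= l_b}
        c = 0
        while c in forbidden:  # smallest free colour
            c += 1
        colour[i] = c
    return len(set(colour.values()))


def min_greedy_box_count(adj, l_b, trials=100, seed=0):
    """Smallest N_B over randomly reshuffled colouring sequences (hypothetical helper)."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    rng = random.Random(seed)
    order = list(adj)
    best = len(order)
    for _ in range(trials):
        rng.shuffle(order)
        best = min(best, greedy_box_count(dist, l_b, order))
    return best


# Usage: a path 0-1-2-3-4 (diameter 4, so l_B^max = 5)
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print([min_greedy_box_count(path, l_b) for l_b in (1, 2, 3, 5)])
```

Taking the minimum over reshuffled orderings mirrors the reshuffling experiment described above; the all-pairs distance table makes the sketch quadratic in memory, so it is only meant for small networks.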

Random sequential box-covering

As an alternative to the above algorithms, Kim et al. (2007a, b, c) introduced the random sequential box-covering method, which has the following steps. Start with all vertices labelled as not burned. Then:

(1) Select a vertex at random; this vertex serves as a seed.

(2) Search the network up to distance $r_B$ from the seed and burn all vertices found that are not yet burned. Assign the newly burned vertices to a new box. If no newly burned vertex is found, the box is discarded.

(3) Repeat (1) and (2) until all vertices are assigned to their respective boxes.

Here $r_B$ is the radius of a box, which is related to $l_B$ approximately by $l_B = 2r_B + 1$. Different Monte Carlo realizations of steps (1)–(3) may yield different numbers of boxes covering the network; in Kim et al. (2007a, b), for simplicity, the smallest number of boxes among all trials was chosen.
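
Steps (1)–(3) can be sketched in Python as follows. This is a minimal illustration assuming an unweighted network stored as an adjacency dictionary and boxes that contain all nodes within distance at most $r_B$ of the seed; the function names are our own, and the code is not Kim et al.'s implementation.

```python
import random
from collections import deque


def ball(adj, seed, r_b):
    """All nodes within chemical distance r_b of seed (BFS stopped at depth r_b)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == r_b:
            continue  # do not expand beyond the box radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def random_sequential_boxes(adj, r_b, rng):
    """One Monte Carlo realization of steps (1)-(3)."""
    nodes = list(adj)
    burned, boxes = set(), []
    while len(burned) < len(nodes):
        seed = rng.choice(nodes)             # (1) any vertex may be chosen as seed
        new = ball(adj, seed, r_b) - burned  # (2) burn only vertices not burned yet
        if new:
            burned |= new
            boxes.append(new)
        # otherwise the empty box is discarded
    return boxes                             # (3) stop once every vertex is burned


def min_box_count(adj, r_b, trials=50, seed=0):
    """Smallest number of boxes over several realizations."""
    rng = random.Random(seed)
    return min(len(random_sequential_boxes(adj, r_b, rng)) for _ in range(trials))


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(min_box_count(path, 1))
```

Because a seed may already be burned, some iterations produce no new box; this is why the number of boxes fluctuates between realizations and the minimum over trials is taken.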

Other methods

A traditional geometrical approach is the so-called 'burning' algorithm (breadth-first search) (Song et al. 2007). The basic idea is to grow each box from a randomly selected node towards its neighbourhood until the box is compact, i.e. until no other network node can be added to it, so that each box includes the maximum possible number of nodes. Although this algorithm is quite easy to implement, it requires a very long computational time. For this reason, another algorithm, called compact-box-burning (Song et al. 2007), has been introduced. The formal definition of boxes is based on the maximum separation $l_B$ between any two nodes in a box. However, the same fractal properties of a network can be recovered when a box is defined as the set of nodes within a radius $r_B$ of a central node. Based on this box definition and a newly defined 'excluded mass' of a node (the number of uncovered nodes within a chemical distance less than $r_B$), Song et al. (2007) introduced the maximum-excluded-mass-burning (MEMB) algorithm, which repeatedly selects as the next box centre the node with the largest excluded mass.
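
A sketch of the centre-selection stage of MEMB is given below. It assumes an unweighted adjacency dictionary and counts nodes up to and including distance $r_B$ (the text above says 'less than $r_B$'; for integer distances the two conventions differ only by shifting $r_B$ by one). The later stage of Song et al. (2007), which assigns the remaining nodes to boxes, is omitted here because the number of boxes equals the number of centres; the function names are hypothetical.

```python
from collections import deque


def ball(adj, centre, r_b):
    """Nodes within chemical distance r_b of centre (inclusive convention, assumed)."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        u = queue.popleft()
        if dist[u] == r_b:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def memb_centres(adj, r_b):
    """Box centres chosen greedily by maximum excluded mass; N_B = len(result)."""
    balls = {u: ball(adj, u, r_b) for u in adj}
    covered, centres = set(), []
    while len(covered) < len(adj):
        non_centres = [u for u in adj if u not in centres]
        # excluded mass: number of still-uncovered nodes inside u's ball
        p = max(non_centres, key=lambda u: len(balls[u] - covered))
        centres.append(p)
        covered |= balls[p]  # the new centre covers its whole ball
    return centres


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(memb_centres(path, 1))  # [1, 3]
print(memb_centres(star, 1))  # [0]
```

Ties in excluded mass are broken by dictionary order here; any tie-breaking rule would be consistent with the description above.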

The iterative scoring method

Many methods have been proposed to assess the reliability of protein interactions. These methods usually assign a score to each protein pair such that the higher the score, the more likely the two proteins are to interact. Among these methods, CD-distance (Brun et al. 2003) and FS-Weight (Chua et al. 2008) are measures calculated from the number of common neighbours of two proteins. They were initially proposed to predict protein functions and protein complexes, and have been shown to perform well in assessing the reliability of protein interactions.

The intuition behind the iterative scoring method is that if the score of an interaction reflects its reliability, then the scored interactions should represent the actual interaction network better than the initial binary ones, and the score computation can be further improved by re-computing the score of each protein pair from the scored interactions. Liu et al. (2009) used the Adjust CD-distance, a variant of CD-distance, to calculate the scores of protein pairs; CD-distance and FS-Weight can be iterated in a similar way. Liu et al. (2009) showed that the iterative scoring method can improve the functional homogeneity and localization coherence of top-ranked interactions, and that it performs best with k = 2 iterations, further iterations giving no additional improvement.
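
The iteration itself can be illustrated with a short Python sketch. The score used here is a hypothetical stand-in, not the Adjust CD-distance of Liu et al. (2009) or FS-Weight: it is a weighted Dice similarity of closed neighbourhoods (each protein counted as its own neighbour, which is our reading of the CD-distance definition), chosen because on a binary network it equals 1 − CD-distance. Each round re-scores every observed interaction from the scores of the previous round.

```python
def dice_score(w_u, w_v):
    """Weighted Dice similarity of two closed neighbourhoods.

    With 0/1 weights this equals 1 - |N_u ^ N_v| / (|N_u| + |N_v|),
    i.e. one minus the CD-distance of the two neighbourhoods.
    """
    shared = sum(min(w_u[x], w_v[x]) for x in w_u.keys() & w_v.keys())
    return 2.0 * shared / (sum(w_u.values()) + sum(w_v.values()))


def iterative_scores(edges, k=2):
    """Re-score every observed interaction k times (edges: pairs of distinct proteins)."""
    score = {frozenset(e): 1.0 for e in edges}  # round 0: the binary network
    for _ in range(k):
        # weighted closed neighbourhood of each protein under the current scores
        nbhd = {}
        for pair, s in score.items():
            u, v = tuple(pair)
            nbhd.setdefault(u, {u: 1.0})[v] = s
            nbhd.setdefault(v, {v: 1.0})[u] = s
        score = {pair: dice_score(*(nbhd[x] for x in pair)) for pair in score}
    return score


ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
for pair, s in iterative_scores(ppi, k=2).items():
    print(sorted(pair), round(s, 3))
```

Interactions embedded in densely connected neighbourhoods (the triangle A, B, C) keep high scores, while the pendant interaction C–D is down-weighted, which is the behaviour the iterative method exploits.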

1.4 Multifractal analysis of complex networks

Fractal and multifractal concepts have been increasingly applied in various fields of science to describe complexity and self-similarity in nature. The main attraction of fractal geometry stems from its ability to describe the irregular or fragmented shape of natural features, as well as other complex objects that traditional Euclidean geometry fails to analyse. The tools of fractal analysis provide a global description of the heterogeneity of an object. Classical fractals such as the Cantor set, the Koch curve and the Sierpiński triangle are homogeneous, since they consist of a geometrical motif repeated on an ever-reduced scale; for these objects, the fractal dimension is the same on all scales. Real-world fractals, however, are rarely homogeneous: an identical motif is seldom repeated on all scales. A single value, the fractal dimension, is therefore not sufficient to describe such objects, and multifractal analysis is needed. Multifractals can be seen as an extension of fractals. A multifractal object is more complex in the sense that it is still invariant under dilatation, although the dilatation factor needed to distinguish a detail from the whole object depends on the detail being observed. Multifractal analysis has been applied in a variety of fields such as biology, finance, physics and geography.


In recent years, fractal and multifractal analyses have been applied extensively in medical signal and image analysis, for example in image segmentation and characterization (Lopes and Betrouni 2009). Image segmentation is a key step in many procedures based on medical imaging, and fractal features are used in this field as additional texture parameters. Indeed, the fractal dimension has shown interesting results in some image modalities such as MR (magnetic resonance), CT (computed tomography) and US (ultrasound) imaging. Considered alone, however, it cannot provide a precise segmentation method, since it is calculated on windows of the image. Multifractal analysis appears better suited than fractal analysis to texture segmentation: it characterizes local scale properties in addition to global ones, and therefore makes it possible to quantify the distribution of local singularities.

Keller et al. (1989) were the first to propose a method for texture segmentation using fractal geometry. Subsequently, a number of works (Hsu et al. 2007, Kikuchi et al. 2005, Zhuang and Meng 2004) examined several fractal parameters. In some cases, fractal analysis does not segment an image correctly (Lopes and Betrouni 2009). Indeed, some images are too complex to study in this way because they present both irregular and more regular zones at all scales without following a clear law. To recover information from such singular images, the multifractal formalism suggests studying the way in which the image's singularities are distributed (Lopes and Betrouni 2009). Thus, a number of recent studies have focused on texture segmentation using multifractal analysis (Abadi and Grandchamp 2006, Xia et al. 2006), with application to MR and US images (Ezekiel 2003).

Fractal and multifractal analyses have been used to study and to characterize a wide

range of signals in biology and medicine such as electrocardiogram (ECG) and

electroencephalogram (EEG) signals (Jun et al. 1994, Hsu et al. 2007, Popivanov et

al. 2005, Shimizu et al. 2004), brain imaging (Kedzia et al. 2002, Mansury and

Deisboech 2004, Takahashi et al. 2006), mammography (Kestener et al. 2004,

Mavroforakis et al. 2006) and bone imaging (Yi et al. 2007).


The rapidly accumulating complete genome sequences of bacteria and archaea provide a new type of information resource for understanding gene functions and evolution. In Yu et al. (2001a), the coding and noncoding length sequences constructed from a complete genome were characterised by multifractal analysis. The dimension spectrum $D_q$ and its derivative, the 'analogous' specific heat $C_q$, were calculated for the coding and noncoding length sequences of bacteria. Bacteria can then be classified according to the type of $C_q$ curve of their noncoding length sequences. This new type of classification allows a better understanding of the relationships among bacteria at the global gene level instead of the nucleotide sequence level.

Yu et al. (2001b) discussed the multifractal property of the measure representation of complete genomes and the classification of bacteria. By using subintervals in one-dimensional space to represent substrings, one can directly obtain an accurate histogram of the substrings in the complete genome. Viewed as a measure, this histogram is called the measure representation of the complete genome. Treating the measure representation as a time series, spectral analysis and multifractal analysis were then performed on the measure representations of a large number of complete genomes. From the measure representations and the values of the $D_q$ spectra and related $C_q$ curves, it was concluded that these complete genomes are quite different from random sequences. A classification of bacterial genomes was obtained by assigning to each sequence a point in the two-dimensional space $(D_{-1}, D_1)$ and in the three-dimensional space $(D_{-1}, D_1, D_{-2})$.
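
The measure representation can be sketched in Python as follows. In this sketch each $K$-string $s_1 \ldots s_K$ is placed on the subinterval of $[0,1)$ of length $4^{-K}$ starting at $x_l = \sum_{i=1}^{K} s_i/4^i$, and the measure of that subinterval is the relative frequency of the string in the sequence. The letter-to-digit assignment (A = 0, C = 1, G = 2, T = 3), the skipping of windows containing other symbols, and the function name are our assumptions.

```python
from collections import Counter

DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed letter-to-digit order


def measure_representation(seq, k):
    """Relative frequency of every K-string, listed in the order of its subinterval.

    The K-string s_1...s_K starts at x_l = sum_i s_i / 4**i, so its position
    in the returned list is x_l * 4**k (the base-4 number s_1 s_2 ... s_K).
    """
    counts = Counter()
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k].upper()
        if all(ch in DIGIT for ch in word):   # skip windows containing N, etc.
            index = 0
            for ch in word:
                index = 4 * index + DIGIT[ch]  # base-4 digits locate the subinterval
            counts[index] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid K-string")
    return [counts[i] / total for i in range(4 ** k)]


print(measure_representation("ACGT", 1))  # [0.25, 0.25, 0.25, 0.25]
```

The returned list can be read as a measure on $4^K$ equal boxes of $[0,1)$, or as a 'time series' indexed by box position, which is how it is treated in the analyses described above.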

Yu et al. (2004) proposed a new chaos game representation (CGR) of protein sequences based on the detailed HP model, and performed multifractal and correlation analyses of the measures based on the CGR of protein sequences from complete genomes. The $D_q$ spectra of all organisms studied are multifractal-like and sufficiently smooth for the $C_q$ curves to be meaningful.

A measure of the strength of a magnetic storm is the Dst index, which reflects variations in the intensity of the symmetric part of the ring current at altitudes ranging from about 3 to 8 Earth radii. Yu et al. (2005) proposed a two-dimensional chaos game representation (CGR) for the Dst index, which provides an effective method to characterize the multifractality of the Dst time series. The probability measure of this representation is then modelled as a recurrent iterated function system in fractal theory, which leads to an algorithm for predicting storm events.

In Yu et al. (2006), the hydrophobic free energy and solvent accessibility of amino acids were used to study the relationship between the primary structure and the structural classification of large proteins, and a measure representation and a Z curve representation of protein sequences were proposed. Fractal analysis of the measure and Z curve representations of proteins, together with multifractal analysis of their hydrophobic free energy and solvent accessibility sequences, indicates that the protein sequences possess correlations and multifractal scaling.

There are many other studies on fractal and multifractal analyses of diverse systems, such as shape characterization and modelling in a dynamic evolution context (Backes and Bruno 2010), quantification of the pore spaces of carbonate samples based on two-dimensional images of thin sections at various magnification scales (Xie et al. 2010), and multifractal analysis of the Shanghai and Shenzhen stock markets (Chen and Wu 2011).

However, work on multifractal analysis of complex networks is rare. It is known that many complex networks share distinctive global features, including the small-world effect and the scale-free degree distribution. To gain further understanding of complex networks, Lee and Jung (2006) investigated local features of networks from the perspective of statistical self-similarity, which may provide not only deeper insight into complex networks but also a unified way of describing them. They found that the probability distribution of the clustering coefficient is best characterized by a multifractal technique.

1.4.2 Methods

Box-covering (box-counting) method


The most common algorithms of multifractal analysis are the fixed-size box-covering algorithms. In the one-dimensional case (Halsey et al. 1986), for a given measure $0 \le \mu \le 1$ with support $E \subset \mathbb{R}$ in a metric space, we consider the partition sum

$$Z_\varepsilon(q) = \sum_{\mu(B) \neq 0} [\mu(B)]^q,$$

where $q$ is a real number and the sum runs over all different non-overlapping boxes $B$ of a given size $\varepsilon$ in a grid covering of the support $E$. It follows that $Z_\varepsilon(q) \ge 0$ and $Z_\varepsilon(1) = 1$. The exponent $\tau(q)$ of the measure $\mu$ is defined by

$$\tau(q) = \lim_{\varepsilon \to 0} \frac{\ln Z_\varepsilon(q)}{\ln \varepsilon},$$

and the generalized fractal dimensions of the measure $\mu$ are defined as

$$D_q = \frac{\tau(q)}{q - 1} \quad (q \neq 1), \qquad D_1 = \lim_{\varepsilon \to 0} \frac{Z_{1,\varepsilon}}{\ln \varepsilon} \quad (q = 1),$$

where $Z_{1,\varepsilon} = \sum_{\mu(B) \neq 0} \mu(B) \ln \mu(B)$.
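
To make these definitions concrete, the Python sketch below estimates $D_q$ for a measure given on the $2^n$ dyadic cells of $[0,1]$. As an assumption of this illustration, the limit $\varepsilon \to 0$ is replaced by a least-squares slope over box sizes $\varepsilon = 2^{-j}$, a common practical choice. It is checked on the binomial multifractal measure, for which $Z_\varepsilon(q) = (p^q + (1-p)^q)^j$ at $\varepsilon = 2^{-j}$, so that $\tau(q) = -\log_2(p^q + (1-p)^q)$ exactly. The function names are hypothetical.

```python
import math


def binomial_measure(p, n):
    """Binomial multifractal measure on the 2**n dyadic cells of [0, 1]."""
    mu = [1.0]
    for _ in range(n):
        mu = [m * w for m in mu for w in (p, 1.0 - p)]  # left child p, right child 1-p
    return mu


def coarse_grain(mu, level):
    """Measures mu(B) of the 2**level boxes of size eps = 2**-level (level <= n)."""
    step = len(mu) >> level
    return [sum(mu[i:i + step]) for i in range(0, len(mu), step)]


def generalized_dimension(mu, q, levels):
    """Estimate D_q from the slope of ln Z_eps(q) (or Z_{1,eps}) against ln eps."""
    xs, ys = [], []
    for level in levels:
        boxes = [m for m in coarse_grain(mu, level) if m > 0]  # only mu(B) != 0
        xs.append(-level * math.log(2.0))                       # ln eps
        if q == 1:
            ys.append(sum(m * math.log(m) for m in boxes))      # Z_{1,eps}
        else:
            ys.append(math.log(sum(m ** q for m in boxes)))     # ln Z_eps(q)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope if q == 1 else slope / (q - 1)  # for q != 1 the slope estimates tau(q)


mu = binomial_measure(0.3, 12)
for q in (-2, 0, 1, 2):
    print(q, round(generalized_dimension(mu, q, range(2, 11)), 4))
```

For $p = 0.5$ the measure is uniform and every $D_q$ equals 1; for $p \ne 0.5$ the values decrease with $q$, which is the signature of multifractality.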

The box-covering method has been applied to various problems to investigate multifractal behaviour. For example, the dimension spectrum $D_q$ has been calculated for the coding and noncoding length sequences constructed from a complete genome (Yu et al. 2001a); the multifractal property of the measure representation and the classification of bacteria have also been studied via $D_q$ spectra (Yu et al. 2001b); and multifractal analyses of the measures based on the chaos game representations of protein sequences from complete genomes have been carried out (Yu et al. 2004).

Sand-box method

For feasible computation of the generalized dimensions on real data, Tél et al. (1989) introduced the sand-box method, in which the generalized fractal dimension $D_q$ ($q \neq 1$) is defined by

$$D_q = \lim_{r \to 0} D_q(r) = \lim_{r \to 0} \frac{\ln \left\langle [M(r)/M_0]^{q-1} \right\rangle}{\ln(r/L)} \cdot \frac{1}{q-1}.$$

It is derived from the box-covering method but has better convergence. The idea is to choose a point of the object A at random, place a sandbox (i.e. a ball of radius $r$) around it, and count the number $M(r)$ of points of A that fall in this sandbox. Here $L$ is the linear size of A, $M_0$ is the total number of points in A, and the brackets $\langle\cdot\rangle$ denote the statistical average over randomly chosen centres of the sandboxes.
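
The sandbox estimate can be sketched in Python for a one-dimensional point set. As assumptions of this illustration, the limit $r \to 0$ is replaced by a least-squares slope of $\ln\langle [M(r)/M_0]^{q-1}\rangle$ against $\ln(r/L)$ over a range of radii, sandbox centres are drawn at random from the points themselves, $L$ is taken as the extent of the point set, and only $q \neq 1$ is handled. The test object is the middle-thirds Cantor set, whose uniform measure has $D_q = \ln 2/\ln 3 \approx 0.631$ for every $q$; the function names are hypothetical.

```python
import bisect
import math
import random


def sandbox_dimension(points, q, radii, n_centres=500, seed=0):
    """Sandbox estimate of D_q (q != 1) for a sorted list of 1-D points."""
    if q == 1:
        raise ValueError("this sketch covers q != 1 only")
    rng = random.Random(seed)
    m0 = len(points)                        # M_0: total number of points
    size = points[-1] - points[0]           # L: linear size of the object
    centres = [rng.choice(points) for _ in range(n_centres)]
    xs, ys = [], []
    for r in radii:
        # M(r): number of points inside the sandbox [c - r, c + r]
        masses = [bisect.bisect_right(points, c + r) - bisect.bisect_left(points, c - r)
                  for c in centres]
        mean = sum((m / m0) ** (q - 1) for m in masses) / len(masses)  # <(M/M0)^(q-1)>
        xs.append(math.log(r / size))
        ys.append(math.log(mean))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope / (q - 1)


def cantor_points(n):
    """Left endpoints of the 2**n intervals of the n-th Cantor construction step."""
    pts = [0.0]
    for level in range(1, n + 1):
        pts = pts + [p + 2.0 * 3.0 ** -level for p in pts]
    return sorted(pts)


radii = [3.0 ** -j for j in range(2, 7)]
print(sandbox_dimension(cantor_points(10), 2, radii), math.log(2) / math.log(3))
```

Choosing radii that are powers of 1/3 aligns the sandboxes with the Cantor construction and avoids the log-periodic oscillations that arbitrary radii would produce on this particular object.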

The sand-box method has also been used in many applications. For example, Fernández et al. (1999) investigated the theoretical and practical aspects of studying and measuring the multifractal dimensions of neurons, emphasising the serious difficulties encountered in multifractal analysis of experimental objects, and Yu et al. (2005) used the same method to investigate the multifractality of the Dst index, a measure of the strength of magnetic storms.

1.5 Analysis of networks constructed from time series

There has been growing interest in the study of complex networks and their applications in a variety of fields, ranging from computer science and communications to sociology and biology. In parallel, there has also been much interest in time series, which are observed in many natural and man-made phenomena, from indicators of economic activity to velocity fluctuations in turbulent flows and heartbeat dynamics (Mantegna and Stanley 2000, Friedrich et al. 2009). It is therefore of both theoretical interest and practical importance to develop a framework that connects these two representations, complex networks and time series.


Reconstruction of networks from time series is a common problem in diverse

research fields. Several approaches have been proposed to investigate properties of

time series via their corresponding constructed networks (Donner et al. 2011).

Cycle networks

Zhang and Small (2006), Zhang et al. (2008) and Small et al. (2009) showed that pseudoperiodic time series can be investigated from a complex network perspective: the nodes of the network correspond directly to cycles in the time series, and network connectivity is determined by the strength of temporal correlation between cycles. This representation encodes the underlying time series dynamics in the network topology, which may then be quantified via the usual statistical properties of the network. With this framework, noisy periodic and chaotic time series have been studied. In particular, chaotic dynamics can be characterized through basic statistical properties of the network, such as the degree distribution, average path length and clustering coefficient. These statistical properties reflect and quantify the hierarchy of unstable periodic orbits embedded in the chaotic attractor, which leads to small-world characteristics. This approach therefore provides information that is not available from classical nonlinear time series analysis. Moreover, networks constructed with these methods have characteristic and distinct properties: linear and periodic systems yield cycle networks that appear random, while chaotic and nonlinear systems generate highly structured networks (Zhang and Small 2006, Zhang et al. 2008).
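
The cycle-network idea can be sketched in Python as follows. The way cycles are delimited (between consecutive local peaks), the sliding-window Pearson correlation used to compare cycles of unequal length, and the threshold value are all our assumptions for illustration, not details taken from Zhang and Small (2006); the function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if a variance is zero)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den > 0 else 0.0


def split_cycles(x):
    """Cut the series at interior local maxima; one cycle between consecutive peaks."""
    peaks = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    return [x[a:b] for a, b in zip(peaks, peaks[1:])]


def cycle_similarity(c1, c2):
    """Best correlation when the shorter cycle slides along the longer one."""
    short, long_ = sorted((c1, c2), key=len)
    return max(pearson(short, long_[s:s + len(short)])
               for s in range(len(long_) - len(short) + 1))


def cycle_network(x, threshold):
    """Undirected edges (i, j) between cycles whose similarity exceeds threshold."""
    cycles = split_cycles(x)
    return {(i, j)
            for i in range(len(cycles)) for j in range(i + 1, len(cycles))
            if cycle_similarity(cycles[i], cycles[j]) > threshold}


x = [math.sin(2 * math.pi * t / 20) for t in range(200)]
print(len(split_cycles(x)), len(cycle_network(x, 0.9)))  # 9 36
```

For a noise-free sine wave every cycle matches every other, so the network is complete; noise or chaotic modulation would remove edges and produce the structured topologies discussed above.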

Correlation network

Yang and Yang (2008) proposed a procedure for constructing complex networks from the correlation matrix of a time series. After an arbitrary time series has been embedded, the individual state vectors $\vec{x}_i$ in the $m$-dimensional phase space of the embedded variables are considered as vertices of an undirected complex network. Specifically, vertices $i$ and $j$ are considered to be connected if the Pearson correlation coefficient $r_{i,j} = \langle \vec{x}_i, \vec{x}_j \rangle$ of the two state vectors is larger than a given threshold $r$. An original stock time series, the corresponding return series and its amplitude series were converted into correlation networks. The degree distribution of the network of the original series can be well fitted by a power law, while that of the return series can be well fitted by a Gaussian function; the degree distribution of the amplitude series contains two asymmetric Gaussian branches. Considering correlation coefficients between two phase-space vectors usually requires a sufficiently large embedding dimension $m$ for a proper estimation of $r_{i,j}$, so information about the short-term dynamics might be lost. Moreover, since embedding is known to induce spurious correlations (Thiel et al. 2006), the results of the correlation method of network construction may suffer from related effects.
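
A correlation network in the spirit of Yang and Yang (2008) can be sketched in Python as below. The embedding delay of one step, the example threshold, and returning zero correlation for a constant state vector are our assumptions; the function names are hypothetical.

```python
import math


def embed(x, m, delay=1):
    """Delay embedding: state vectors (x_i, x_{i+delay}, ..., x_{i+(m-1)delay})."""
    n = len(x) - (m - 1) * delay
    return [[x[i + k * delay] for k in range(m)] for i in range(n)]


def pearson(u, v):
    """Pearson correlation of two vectors; 0.0 if either has zero variance (assumption)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den > 0 else 0.0


def correlation_network(x, m, threshold, delay=1):
    """Undirected edges (i, j) between state vectors with r_ij > threshold."""
    vecs = embed(x, m, delay)
    return {(i, j)
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))
            if pearson(vecs[i], vecs[j]) > threshold}


print(correlation_network([0, 1, 0, 1, 0, 1], 2, 0.5))
```

In this toy alternating series, state vectors that start on the same phase are perfectly correlated and those on opposite phases are anticorrelated, so the network splits into two cliques.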

Transition networks

Shirazi et al. (2009) proposed a method by which a given stochastic process is mapped onto a complex network with distinct geometrical properties. They then studied the relations among the statistical properties of the process (such as its intermittency and correlation length), its stochastic behaviour and the properties of the equivalent network. As examples, complex networks were constructed for several distinct time series, such as those of free-jet turbulence, financial markets and white noise. They described the physical interpretation of the networks' geometrical properties, such as their mean length, diameter, clustering and average number of connections per node, together with their stochastic interpretations.

1.5.2 Methods

Among the methods for transforming time series into networks, recurrence networks and visibility graphs (including horizontal visibility graphs) have been studied most extensively.

Recurrence networks

A recurrence network is a complex network whose adjacency matrix is given by the recurrence matrix of a time series; that is, the adjacency matrix of a recurrence network is defined as $A_{i,j} = R_{i,j} - \delta_{i,j}$ (Marwan et al. 2007b), where $R_{i,j}$ is the entry of the recurrence matrix for the phase-space vectors $i$ and $j$ (equal to 1 if the two states are considered close and 0 otherwise), and $\delta_{i,j}$ is the Kronecker delta ($\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ if $i \neq j$), which removes self-loops. Since the recurrence matrix can be defined in different ways, there are distinct sub-types of recurrence networks with somewhat different structural properties.

(i) k-nearest neighbour networks (Donner et al. 2011)

Following Eckmann et al. (1987), every (possibly embedded) observation vector is considered as a node $i$, which is then linked to the $k$ other vertices $j$ that have the shortest distances $d_{i,j}$ from $i$ in phase space (i.e. to its $k$ nearest neighbours). Unlike for cycle and correlation networks, the adjacency matrix of the $k$-nearest neighbour network defined in this way is generally asymmetric, so the resulting networks have directed edges.

(ii) Adaptive nearest neighbour networks

Unlike other approaches for transforming time series into complex networks, the $k$-nearest neighbour method leads to directed networks. In many cases, however, the properties of undirected networks are more directly interpretable. To define an undirected nearest neighbour network, Xu et al. (2008) and Small et al. (2009) suggested an alternative construction that considers nearest neighbours but ensures that a constant number of distinct edges is assigned to each node. Adaptive nearest neighbour networks differ from $k$-nearest neighbour networks in that the resulting adjacency matrix is symmetric, i.e. the edges are undirected from the beginning.

(iii) ε-recurrence networks

A disadvantage of both types of nearest neighbour networks is that neither their local nor their global properties are directly related to the invariant density of the system under study. Alternatively, the neighbourhood of a single point in phase space can be defined by a fixed phase-space distance $\varepsilon$ (Wu et al. 2008, Gao and Jin 2009a, b, Marwan et al. 2009, Donner et al. 2010). A detailed discussion of the geometric interpretation of a variety of global network properties, as well as vertex and edge properties, of $\varepsilon$-recurrence networks, including graphical representations of the spatial distributions of different vertex properties for the Lorenz system in the standard parameter setting, can be found in Donner et al. (2010).
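
The $\varepsilon$-recurrence and $k$-nearest neighbour constructions can be sketched in Python as follows (the adaptive variant is omitted). This is a minimal illustration assuming Euclidean distances between delay-embedded state vectors; counting distances exactly equal to $\varepsilon$ as recurrences and breaking distance ties by index are our conventions, and the function names are hypothetical.

```python
import math


def embed(x, m, delay=1):
    """Delay embedding: state vectors (x_i, x_{i+delay}, ..., x_{i+(m-1)delay})."""
    n = len(x) - (m - 1) * delay
    return [[x[i + k * delay] for k in range(m)] for i in range(n)]


def epsilon_recurrence_network(vectors, eps):
    """Undirected edges of A_ij = Theta(eps - ||x_i - x_j||) - delta_ij.

    Pairs at distance exactly eps are counted as recurrences (Theta(0) = 1).
    Only pairs i < j are listed, so self-loops are excluded by construction.
    """
    n = len(vectors)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(vectors[i], vectors[j]) <= eps}


def knn_network(vectors, k):
    """Directed edges i -> j to the k nearest other vectors j (ties broken by index)."""
    edges = set()
    for i, v in enumerate(vectors):
        nearest = sorted((math.dist(v, w), j) for j, w in enumerate(vectors) if j != i)
        edges.update((i, j) for _, j in nearest[:k])
    return edges


states = embed([0.0, 1.0, 3.0], 1)
print(epsilon_recurrence_network(states, 1.5))
print(knn_network(states, 1))
```

In this small example node 2 points to node 1 in the $k$-nearest neighbour network, but node 1's nearest neighbour is node 0, so the edge has no reverse: the asymmetry noted above.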

Visibility graphs (VG)

The visibility graph (VG), introduced by Lacasa et al. (2008), is a simple computational method for converting a time series into a graph (network). It is inspired by the concept of visibility in computational geometry (de Berg et al. 2008). The constructed graph inherits several properties of the series in its structure: periodic series are converted into regular graphs and random series into random graphs, while fractal series are converted into scale-free networks, supporting the view that power-law degree distributions are related to fractality. These findings suggest that the visibility graph may capture the dynamical fingerprints of the process that generates the series. Furthermore, it has recently been pointed out that this algorithm provides a method for estimating the Hurst exponent $h$ of fractional Brownian motion: a linear relation between $h$ and the exponent $\gamma$ of the power-law degree distribution of the associated scale-free visibility graph was shown to exist (Lacasa et al. 2009).

A visibility graph (Lacasa et al. 2008, 2009) is obtained by mapping a time series into a network according to the following visibility criterion: two arbitrary data points $(t_a, y_a)$ and $(t_b, y_b)$ in the time series have visibility, and consequently become two connected nodes in the associated graph, if every other data point $(t_c, y_c)$ with $t_a < t_c < t_b$ satisfies

$$y_c < y_a + (y_b - y_a)\,\frac{t_c - t_a}{t_b - t_a}.$$

In this way a connected, unweighted network is constructed from the time series; it is connected because each data point always sees its immediate neighbours.
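
The visibility criterion can be checked in $O(N^2)$ time with the Python sketch below (our own illustration, not the code of Lacasa et al.). Because $t_c > t_a$, the criterion is equivalent to the slope from $a$ to $c$ being smaller than the slope from $a$ to $b$, so it suffices to scan $b$ to the right of $a$ while tracking the largest slope seen so far. The times are assumed to be strictly increasing, and the function name is hypothetical.

```python
def visibility_graph(t, y):
    """Natural visibility graph of the series (t_k, y_k); returns edges (a, b), a < b.

    (t_c, y_c) lies below the line from a to b exactly when the slope from a
    to c is smaller than the slope from a to b, so b is visible from a iff
    its slope exceeds every slope seen so far from a.
    """
    edges = set()
    for a in range(len(y) - 1):
        max_slope = float("-inf")
        for b in range(a + 1, len(y)):
            s = (y[b] - y[a]) / (t[b] - t[a])
            if s > max_slope:
                edges.add((a, b))
                max_slope = s
    return edges


y = [3.0, 1.0, 2.0, 4.0, 1.0]
print(sorted(visibility_graph(range(len(y)), y)))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
```

The high value at index 3 hides the last point from everything to its left, which is how large values become hubs and how the degree distribution comes to reflect the structure of the series.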

Murk and Perc (2011) showed that time series of different complexities can be transformed into networks that host individuals playing evolutionary games. Ahmadlou et al. (2010) presented a new chaos-wavelet approach for electroencephalogram-based diagnosis of Alzheimer's disease that employs the visibility graph.

Horizontal visibility graph (HVG)

The horizontal visibility graph (HVG), proposed by Luque et al. (2009), is a simplified variant of the visibility graph with a geometrically simpler visibility criterion. It is defined as follows. For a time series of $N$ data $X = \{x_1, x_2, \ldots, x_N\}$, each datum of $X$ is assigned to a node of the network. Two nodes $x_i$ and $x_j$ are connected by an edge if one can draw a horizontal line in the time series joining $x_i$ and $x_j$ that does not intersect any data in between. Thus, nodes $i$ and $j$ are connected if the following geometrical criterion is fulfilled within the time series: $x_i, x_j > x_n$ for all $n$ such that $i < n < j$.

It has been shown that the degree distribution of the HVG of any random series without temporal correlations has the exponential form

$$P(k) = \frac{3}{4}\, e^{-k \ln(3/2)} = \frac{1}{3}\left(\frac{2}{3}\right)^{k-2}, \qquad k \ge 2,$$

which allows chaotic series to be distinguished from independent and identically distributed (i.i.d.) series (Luque et al. 2009). Furthermore, numerical simulations show that an HVG mapped from a chaotic or correlated time series has an exponential degree distribution $P(k) \sim e^{-\lambda k}$, where $\lambda < \ln(3/2)$ for chaotic processes and $\lambda > \ln(3/2)$ for correlated time series, the two cases being separated by the i.i.d. case with $\lambda = \ln(3/2)$ (Lacasa et al. 2010).
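
The horizontal visibility criterion can be sketched in Python as follows. This is our own illustration; it treats equal values as blocking, since the criterion uses strict inequalities, and the function names are hypothetical. The usage part compares the empirical degree distribution of the HVG of an i.i.d. uniform series with $P(k) = \frac{3}{4}e^{-k\ln(3/2)}$; the series length and random seed are arbitrary choices.

```python
import math
import random
from collections import Counter


def horizontal_visibility_graph(x):
    """Edges (i, j), i < j, such that x_i, x_j > x_n for all i < n < j."""
    edges = set()
    for i in range(len(x) - 1):
        blocker = float("-inf")        # largest value strictly between i and j
        for j in range(i + 1, len(x)):
            if x[j] > blocker:         # x[i] > blocker is guaranteed by the break below
                edges.add((i, j))
            blocker = max(blocker, x[j])
            if blocker >= x[i]:
                break                  # x[i] is now hidden from everything further right
    return edges


def degree_distribution(n, edges):
    """Empirical P(k) of a graph with nodes 0..n-1."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    counts = Counter(deg[v] for v in range(n))
    return {k: c / n for k, c in sorted(counts.items())}


rng = random.Random(1)
x = [rng.random() for _ in range(20000)]  # i.i.d. uniform series
p = degree_distribution(len(x), horizontal_visibility_graph(x))
for k in range(2, 7):
    print(k, round(p.get(k, 0.0), 4), round(0.75 * math.exp(-k * math.log(1.5)), 4))
```

The early `break` keeps the scan short: once a value at least as large as $x_i$ appears, no later point can see $x_i$, so for i.i.d. data the total work grows roughly like $N \ln N$ rather than $N^2$.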

Xie and Zhou (2011) investigated the topological properties of HVGs mapped from fractional Brownian motions with different Hurst indices, paying special attention to the influence of the Hurst index on the topological properties of the associated HVG. Namely, the degree distribution, clustering coefficient, mean length of the shortest paths, motif distribution, fractal nature and mixing behaviour of the HVGs were studied numerically. The most striking result is that the HVGs possess both fractal and assortative features, which differs from the usual conclusion that fractal networks are disassortative (Yook et al. 2005, Zhang et al. 2007). Gutin et al. (2011) proved that a graph is an HVG if and only if it is outerplanar and has a Hamilton path. Therefore, an HVG is a noncrossing graph, as defined in algebraic combinatorics.

1.6 Contributions of this thesis

In Chapter 2 we introduce the theoretical definition and calculation of the fractal dimension and address several practical issues that arise in implementing the box-counting (or box-covering) method. We then present several possible box-covering algorithms that could be used to compute the fractal dimension of networks. Among these methods, we choose the random sequential box-covering algorithm because it is easy to implement, and evaluate its performance in detecting possible fractal scaling of protein-protein interaction networks.

i. We first test our programs on several protein-protein interaction networks and compare the results with those in Song et al. (2005) and Kim et al. (2007); the results are quite similar.

ii. We then generate skeletons of these PPI networks (the spanning tree built from the edges of highest betweenness). In our results, the fractal dimension of the skeleton is smaller than that of the original network, in contrast with the finding of Kim et al. (2007). We give an explanation for this; the situation occurs mostly in small networks.

We then adopt the iterative scoring method to generate weighted PPI networks. Using the random sequential box-covering algorithm, we calculate the fractal dimensions of both the original unweighted PPI networks and the generated weighted PPI networks of five species, namely Homo sapiens, E. coli, Arabidopsis thaliana, C. elegans and baker's yeast S. cerevisiae. The results show that self-similarity is still present in the generated weighted PPI networks. This implies that it is viable to extend the study of properties of complex networks to a wider field that includes more complex weighted networks. We believe that we are the first to study the self-similarity of weighted complex networks generated from biological networks.

In Chapter 3, we first explain the necessity of multifractal analysis of complex networks. We then discuss why the box-covering methods for fractal analysis of complex networks cannot be used directly in multifractal analysis of such networks. Next, we propose a new box-covering algorithm for analysing the multifractal behaviour of complex networks. To our knowledge, our investigation of multifractal behaviours of complex networks via the box-covering method is the first attempt in the current literature.


This algorithm is applied to generated scale-free networks, small-world networks and random networks, as well as to protein-protein interaction networks. The numerical results indicate that multifractality exists in scale-free networks and PPI networks, while for small-world networks and random networks the multifractality is not clear-cut, particularly for small-world networks generated by the NW model. Furthermore, for scale-free networks the values of $D_q$ increase with network size, because larger scale-free networks usually have more hubs, which make the structure of the network more complex. For random networks, however, there is no clear relationship between $D_q$ and network size. The quantity $\Delta D_q = \max D_q - \lim D_q$ is used to investigate how $D_q$ changes: a larger $\Delta D_q$ means that the network's edge distribution is more uneven, while a smaller $\Delta D_q$ means that it is more symmetrical, which is the case for random networks.

We also apply our fixed-size box-covering method to gene networks reconstructed from gene microarrays. First, we use the fuzzy membership test to select the most important genes related to the disease; then we construct networks from the microarray data of the selected genes by calculating correlation coefficients. The results show that multifractality exists in gene networks as well. This multifractal analysis may provide a useful tool for gene clustering and identification.
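The summary does not state how the correlation coefficients are turned into network edges. A common choice, assumed in the minimal Python sketch below, is to link two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. The function name `correlation_network` and the threshold 0.8 are illustrative assumptions, not settings taken from the thesis.

```python
import numpy as np

def correlation_network(expr, threshold=0.8):
    """Adjacency matrix of a gene network from a genes x samples expression matrix.

    Genes i and j are linked when |Pearson r(i, j)| >= threshold
    (the threshold is a hypothetical choice, not the thesis's value).
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr)           # rows are genes, columns are samples
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)       # no self-loops
    return adj

# Toy data: gene 1 is a scaled copy of gene 0; gene 2 is only weakly correlated.
expr = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],
        [1.0, -1.0, 1.0, -1.0]]
adj = correlation_network(expr, threshold=0.8)
print(adj.astype(int))
```

Changing the threshold trades network density against noise; the resulting adjacency matrix can then be passed to any box-covering routine.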

These results indicate that the algorithm proposed in this thesis is a suitable and effective tool for multifractal analysis of complex networks. In particular, in conjunction with the quantities derived from $D_q$, the method and algorithm provide a needed tool to cluster and classify real networks such as the protein-protein interaction networks of organisms.

Complex networks, which offer a new, global viewpoint for understanding a complex system through the relations between its elements, can be a powerful tool for investigating nonlinear time series in practice. In Chapter 4, we transform typical time series into networks and investigate their properties.


(i) Fractional Brownian motion converted into weighted networks

Here we apply the false nearest neighbours (FNN) method to estimate the embedding dimension m for fractional Brownian motions (FBM) generated with Hurst index h = 0.05, 0.1, …, 0.95. We then construct weighted networks from FBM, with nodes corresponding to the m-dimensional vectors of the FBM series and edges weighted by the Euclidean distance between the vectors. The results confirm that the fractal dimension of weighted networks constructed from fractional Brownian motion decreases as its Hurst index increases.
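A minimal Python sketch of the construction described in (i), assuming a time-delay embedding with delay τ = 1 (the delay is not stated here) and assuming that every pair of delay vectors is joined by an edge weighted by their Euclidean distance. The embedding dimension m would come from the FNN test; here it is simply passed in, and a Gaussian random walk (ordinary Brownian motion, h = 0.5) stands in for an FBM sample. The function names are hypothetical.

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Rows are the delay vectors (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    if n_vec <= 0:
        raise ValueError("series too short for this embedding")
    return np.stack([x[i * tau : i * tau + n_vec] for i in range(m)], axis=1)

def distance_weighted_network(x, m, tau=1):
    """Weight matrix of a complete graph on the delay vectors:
    W[s, t] is the Euclidean distance between delay vectors s and t."""
    vecs = delay_embed(x, m, tau)
    diff = vecs[:, None, :] - vecs[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Stand-in series: Gaussian random walk (h = 0.5), not a general FBM sample.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(200))
W = distance_weighted_network(x, m=3)
print(W.shape)
```

Generating FBM with h ≠ 0.5 needs a dedicated method (for example circulant embedding), which is outside this sketch.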

(ii) Multifractal analysis of FBM HVG

From an FBM series, we build a horizontal visibility graph (HVG) and apply the random sequential box-covering algorithm to calculate its fractal dimension. A linear relationship is confirmed between the Hurst index and the fractal dimension of the HVG. We then apply our newly proposed fixed-size box-covering method to detect multifractal behaviour. The results show that FBM HVGs are monofractal.
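For reference, the standard horizontal visibility criterion links time points i < j when every intermediate value lies strictly below both endpoints. The Python sketch below is an illustrative O(n²) implementation of that criterion, not the code used in the thesis; it stops scanning to the right of i as soon as an intermediate value reaches x[i], because no later point can then see x[i].

```python
def horizontal_visibility_graph(x):
    """Edge list (i, j), i < j, of the horizontal visibility graph of series x.

    i and j are linked iff x[k] < min(x[i], x[j]) for every i < k < j.
    """
    n = len(x)
    edges = []
    for i in range(n - 1):
        blocker = float("-inf")        # largest value strictly between i and j
        for j in range(i + 1, n):
            if blocker < min(x[i], x[j]):
                edges.append((i, j))
            blocker = max(blocker, x[j])
            if blocker >= x[i]:        # nothing further right can see x[i]
                break
    return edges

print(horizontal_visibility_graph([3, 1, 2, 4]))
# [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```

Node degrees, and hence the degree distribution discussed in (iii), follow directly from the edge list.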

Additionally, we construct HVGs for binomial multifractal measures and for measure representations of DNA sequences, both of which are known to be multifractal. The results of our fixed-size box-covering method show that HVGs constructed from multifractal time series are also multifractal. Our study confirms that HVGs inherit the multifractality of the original time series.
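The binomial multifractal measure is usually generated by a deterministic cascade. The sketch below assumes the classical construction on [0, 1] with weights p and 1 − p (the value of p used in the thesis is not given here); reading the 2^k masses from left to right gives a series from which an HVG can be built. For this measure the partition sum over boxes of size 2^{-k} is exactly (p^q + (1 − p)^q)^k, which yields the closed form of $D_q$ in the hypothetical helper `generalized_dimension`.

```python
import numpy as np

def binomial_measure(p, k):
    """Masses of the binomial cascade on the 2**k dyadic subintervals of [0, 1].

    At every stage each interval sends a fraction p of its mass to its left
    half and 1 - p to its right half, so the masses always sum to 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    mu = np.array([1.0])
    for _ in range(k):
        # interleave children so both halves of an interval stay adjacent
        mu = np.column_stack((p * mu, (1.0 - p) * mu)).ravel()
    return mu

def generalized_dimension(p, q):
    """Theoretical D_q of the binomial measure for q != 1:
    D_q = log2(p**q + (1 - p)**q) / (1 - q)."""
    if q == 1:
        raise ValueError("q = 1 needs the limiting (information) formula")
    return np.log2(p ** q + (1.0 - p) ** q) / (1.0 - q)

series = binomial_measure(0.3, 10)   # 1024 values, usable as a time series
print(len(series), series.sum())
```

D_0 equals 1 for every p, and D_q decreases with q whenever p ≠ 0.5, which is the multifractal signature the HVG analysis is expected to inherit.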

(iii) Numerical study of the degree distribution of HVGs

The results show that the degree distributions of HVGs constructed from fractional Brownian motions, binomial multifractal measures and measure representations of DNA sequences have exponential tails. An approximately linear relationship is found between the exponent b of the exponential tail and the Hurst index h.
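One way to estimate the rate b of an exponential tail, assuming the form P(k) ∝ e^{−bk} for k ≥ k_min (the fitting procedure of the thesis is not described here), is the maximum-likelihood estimator for a discrete exponential (geometric) law, b = ln(1 + 1/m), where m is the mean of k − k_min over the tail. The sketch checks it against the known HVG result for uncorrelated series, P(k) = (1/3)(2/3)^{k−2}, whose rate is ln(3/2) ≈ 0.405. The helper name is hypothetical.

```python
import numpy as np

def exponential_tail_rate(degrees, k_min):
    """Maximum-likelihood rate b for P(k) proportional to exp(-b k), k >= k_min.

    For a geometric tail P(k) = (1 - r) r**(k - k_min) with r = exp(-b),
    the MLE is r = m / (1 + m), where m is the mean of k - k_min.
    """
    k = np.asarray(degrees)
    excess = k[k >= k_min] - k_min
    if excess.size == 0 or excess.mean() == 0:
        raise ValueError("no usable tail above k_min")
    m = excess.mean()
    return np.log1p(1.0 / m)           # b = -ln r = ln(1 + 1/m)

# Synthetic degrees with P(k) = (1/3)(2/3)**(k - 2), k >= 2.
rng = np.random.default_rng(42)
degrees = rng.geometric(1.0 / 3.0, size=100_000) + 1
b = exponential_tail_rate(degrees, k_min=2)
print(round(b, 3), round(np.log(1.5), 3))
```

Unlike a least-squares fit of ln P(k) against k, this estimator does not depend on how the sparse tail bins are handled.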

(iv) Resilience comparison between VG (visibility graph) and HVG (horizontal visibility graph) constructed from FBM.

The results show that the VG, which has a scale-free degree distribution, is more robust than the HVG, which has an exponential degree distribution.
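The natural visibility graph (VG) joins time points i < j when the straight line between (i, x_i) and (j, x_j) passes strictly above every intermediate point. For unit-spaced samples this is equivalent to the slope from i to j exceeding every slope from i to an intermediate point, which the Python sketch below exploits. It is an illustrative implementation of the standard criterion, not the thesis's code, and it does not reproduce the resilience experiments.

```python
def visibility_graph(x):
    """Edge list (i, j), i < j, of the natural visibility graph of series x.

    With unit time spacing, j is visible from i iff the slope from i to j is
    larger than the slope from i to every intermediate point.
    """
    n = len(x)
    edges = []
    for i in range(n - 1):
        max_slope = float("-inf")      # running maximum of intermediate slopes
        for j in range(i + 1, n):
            slope = (x[j] - x[i]) / (j - i)
            if slope > max_slope:
                edges.append((i, j))
                max_slope = slope
    return edges

print(visibility_graph([3, 1, 2, 4]))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

On the same series the VG contains every HVG edge plus extra long-range links (here (1, 3)), which is one intuition for the heavier-tailed VG degree distribution.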


Chapter 2

Fractal scaling of complex networks

2.1 Introduction

Mathematician Benoit Mandelbrot is the father of modern fractal analysis. He worked on a broad array of apparently unrelated problems in economics, linguistics and engineering, and it was precisely these intellectual wanderings that led him to his most famous insight: that a multitude of diverse phenomena in these and other fields exhibit similar patterns, which he came to call fractals. A few of these patterns had been known since the 19th century, but they were regarded merely as weird curiosities. Mandelbrot realized that these patterns were common, even ubiquitous, and significant in many fields of inquiry. He set out to study them as a unit, coined the term fractal, and explored the underlying nature of these patterns and the dynamical processes that create them. He succeeded in showing that many complex and irregular patterns traditionally believed to be random, bizarre, or too complex to describe are, in fact, strongly patterned and can be described by fairly simple algorithms. His book 'The Fractal Geometry of Nature' (Mandelbrot 1982) is the standard reference; it contains both the elementary concepts and an unusually broad range of new and rather advanced ideas, such as multifractals, that are currently under active study. In this book, he offers a tentative definition of a fractal: "A fractal is by definition a set for which the Hausdorff-Besicovitch dimension strictly exceeds the topological dimension." This definition relies on three notions: a set, the Hausdorff-Besicovitch dimension, and the topological dimension, which is always an integer. Because this definition is restrictive, Mandelbrot (1986) proposed another, less formal definition illustrated by simple examples: "A fractal is a shape made of parts similar to the whole in some way." Other, similar definitions exist: fractals are sets defined by the three related principles of self-similarity, scale invariance, and power-law relations (Brown and Liebovitch 2009); a fractal is an object that displays self-similarity under magnification and can be constructed using a simple motif (an image repeated on ever-reduced scales) (Lynch 2004); or, most simply, a fractal is an object that has a noninteger fractal dimension (Lynch 2004).

In this chapter we will pay attention to the scaling characteristics and especially the

possible multifractality of complex networks. Complex networks have been studied

extensively due to their relevance to many real-world systems such as the world-

wide web, the internet, biological and social systems. During the past two decades,

studies of such networks in different fields have produced many significant results

concerning their structures, topological properties, and dynamics. Three well-known

properties of complex networks are scale-free degree distribution, small-world effect

and self-similarity. The search for additional meaningful properties and the

relationships among these properties is an active area of current research.

This chapter aims to confirm that the study of properties of complex networks can be

expanded to a wider field including more complex weighted networks. Those real

networks that have been shown to possess the self-similarity property in the existing

literature are all unweighted networks. We use the protein-protein interaction (PPI)

networks as a key example to show that their weighted networks inherit the self-

similarity from the original unweighted networks.

First, we confirm that the random sequential box-covering algorithm is an effective tool for computing the fractal dimension of complex networks. This is demonstrated on the Homo sapiens and E. coli PPI networks as well as on their skeletons. Our results verify that the fractal dimension of the skeleton is smaller than that of the original network, because shortest distances between nodes are larger in the skeleton, so that for a fixed box size more boxes are needed to cover it.

Then we adopt the iterative scoring method to generate weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis thaliana.

By using the random sequential box-covering algorithm, we calculate the fractal

dimensions for both the original unweighted PPI networks and the generated weighted

networks. The results show that self-similarity is still present in generated weighted

PPI networks.


2.2 Theoretical background

2.2.1 Mathematical concepts and definitions

Before detailing our work in the next two sections, we collect here some definitions

relating to different concepts of fractal dimensions as well as the well-known

methods for their calculation.

Open cover

For a given set $E \subset \mathbb{R}^n$, we consider a collection $\alpha = \{U_i, i > 0\}$ of open sets in $\mathbb{R}^n$ which cover $E$, that is, for which $E \subset \bigcup_{i>0} U_i$. Such a collection is known as an open cover. We usually simply refer to a cover, with the implicit assumption that every element of the cover is an open set; one can imagine it as a collection of open balls. We denote the diameter of a set $U_i$ by $\operatorname{diam} U_i = \sup\{d(x, y) : x, y \in U_i\}$ and the diameter of a cover by $\operatorname{diam} \alpha = \sup_{U_i \in \alpha} \operatorname{diam} U_i$. Fix $\varepsilon > 0$. If $\operatorname{diam} \alpha \le \varepsilon$, we say that $\alpha$ is an ε-cover.

Hausdorff measure

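Fix $s \ge 0$ and $\varepsilon > 0$. Define

$$H^s_\varepsilon(E) = \inf\Big\{ \sum_{i=1}^{\infty} (\operatorname{diam} U_i)^s : \{U_i\} \text{ is an } \varepsilon\text{-cover of } E \Big\},$$

where inf denotes the greatest lower bound. This is a function of the set $E$ and the parameters $s$ and $\varepsilon$. Observe that, given $\varepsilon_1 > \varepsilon_2 > 0$, any $\varepsilon_2$-cover is also an $\varepsilon_1$-cover, so the infimum defining $H^s_{\varepsilon_1}(E)$ is taken over a larger class of covers. Hence $H^s_{\varepsilon_2}(E) \ge H^s_{\varepsilon_1}(E)$, that is, $H^s_\varepsilon(E)$ is monotonic (nonincreasing) in $\varepsilon$, and its limit (possibly infinite) as $\varepsilon \to 0$ exists. We then define

$$H^s(E) = \lim_{\varepsilon \to 0} \inf\Big\{ \sum_{i=1}^{\infty} (\operatorname{diam} U_i)^s : \bigcup_{i=1}^{\infty} U_i \supset E,\ \operatorname{diam} U_i < \varepsilon \Big\} \tag{2.4}$$

as the s-dimensional Hausdorff measure of $E$.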

Hausdorff dimension

From definition (2.4) we can see that $H^s(E)$ does not increase as $s$ increases from 0 to ∞. Indeed, if $s < t$, then for any ε-cover $\{U_i\}$ of $E$ we have $\sum_i (\operatorname{diam} U_i)^t \le \varepsilon^{t-s} \sum_i (\operatorname{diam} U_i)^s$, so that $H^s_\varepsilon(E) \ge \varepsilon^{s-t} H^t_\varepsilon(E)$. Letting $\varepsilon \to 0$ shows that if $H^t(E) > 0$, then $H^s(E) = \infty$. Therefore, there exists a critical value $\dim_H(E)$ satisfying

$$\dim_H(E) = \inf\{s : H^s(E) = 0\} = \sup\{s : H^s(E) = \infty\},$$

$$H^s(E) = \begin{cases} \infty, & 0 \le s < \dim_H(E), \\ 0, & s > \dim_H(E). \end{cases} \tag{2.5}$$

$\dim_H(E)$ is the Hausdorff dimension of the set $E$. In particular, the definition of Hausdorff dimension allows the set to be covered by 'balls' that are not all the same size but have diameters less than ε.

We write $D_H$ for the Hausdorff dimension. Familiar cases are $D_H = 1$ for lines, $D_H = 2$ for planes and surfaces, and $D_H = 3$ for solid spheres and other finite volumes. We shall see in numerous examples that there are sets whose Hausdorff dimension is noninteger; such sets are said to be fractal.

Definition (2.5) can be used in practice (Feder 1988). Take a coastline, for example: we cover it with squares of edge length ε and denote by N(ε) the number of squares needed to cover it. We then find N(ε) for the smallest attainable values of ε, since it follows from equation (2.5) that, asymptotically in the limit of small ε,

$$N(\varepsilon) \sim \frac{1}{\varepsilon^{D_H}}. \tag{2.6}$$

Accordingly,

$$D_H = -\frac{\ln N(\varepsilon)}{\ln \varepsilon}$$

holds in this limit. The number $D_H$, which need not be an integer, measures how the density of the fractal object varies with length scale.
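As a worked illustration of (2.6) (an example added here, not taken from the original text), consider the Koch curve. At construction stage k it consists of $4^k$ segments of length $3^{-k}$, so the number of squares of side $\varepsilon = 3^{-k}$ needed to cover it lies between $c_1 4^k$ and $c_2 4^k$ for constants $0 < c_1 \le c_2$. The constants vanish in the limit:

$$D_H = \lim_{k\to\infty} \frac{\ln N(3^{-k})}{-\ln 3^{-k}} = \lim_{k\to\infty} \frac{k\ln 4 + O(1)}{k\ln 3} = \frac{\ln 4}{\ln 3} \approx 1.262,$$

a noninteger value between the dimension of a line and that of a plane.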

Box dimension (box-counting dimension)

Box dimension, sometimes referred to as box-counting dimension, is another way of measuring fractal dimension. Its mathematical definition is as follows. For $F \subset \mathbb{R}^n$ and $\varepsilon > 0$, let $N_\varepsilon(F)$ be the smallest number of sets of diameter at most ε that cover $F$. We define

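$$\underline{\dim}_B F = \liminf_{\varepsilon \to 0} \frac{\ln N_\varepsilon(F)}{-\ln \varepsilon}, \qquad \overline{\dim}_B F = \limsup_{\varepsilon \to 0} \frac{\ln N_\varepsilon(F)}{-\ln \varepsilon} \tag{2.7}$$

as the lower box dimension and upper box dimension of $F$, respectively. If $\underline{\dim}_B F = \overline{\dim}_B F$, the common value $D_F = \underline{\dim}_B F = \overline{\dim}_B F$ is called the box dimension or box-counting dimension of $F$:

$$D_F = \lim_{\varepsilon \to 0} \frac{\ln N_\varepsilon(F)}{-\ln \varepsilon}. \tag{2.8}$$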

In most cases, we assume that ε > 0 is sufficiently small to ensure that −ln ε and similar quantities are strictly positive. To avoid problems with 'ln 0' or 'ln ∞', box dimension is generally considered only for non-empty bounded sets, and we assume throughout that the sets considered are non-empty and bounded. A more practical alternative is to superimpose a regular grid of pixels of side length ε on the object and count the number of occupied pixels. This procedure is repeated for different values of ε. The volume occupied by a curve is then estimated with a series of counting boxes spanning a range of volumes down to some small fraction of the entire volume. The number of occupied boxes increases with decreasing box size, leading to the power-law relationship

$$N(\varepsilon) = k\,\varepsilon^{-D_F}, \tag{2.9}$$

where ε is the box size, N(ε) is the number of boxes occupied by the curve, k is a

constant, and DF is the box fractal dimension.

Cluster dimension

Formally, the box dimension can be generalized to characterize the extent of self-

similar spatial clustering in point patterns. This is of salient importance in ecology,

where organisms can be regarded as discrete events distributed in two- and three-

dimensional spaces, for instance, the distribution of trees in a forest or cows and

sheep in a pasture that can be regarded as points presenting different degrees of


clustering. The cluster dimension, $D_C$, is conceptually equivalent to the box dimension $D_F$, and is defined by

$$N(\varepsilon) = k\,\varepsilon^{-D_C}, \tag{2.10}$$

where ε is still the box size, N(ε) is the number of boxes occupied by at least one point, k is a constant, and $D_C$ is the cluster dimension.

The cluster dimension can also be calculated using 'counting disks' instead of boxes (Frontier 1987). Robertson et al. (1995) used a three-dimensional 'cube-counting' version of the cluster dimension to study the distribution of earthquake hypocenters in space. Given the one-to-one correspondence between the box dimension and the cluster dimension, $D_F = D_C$.

Mass dimension

This method was initially developed to analyse point-pattern data (Voss 1988), but it can easily be applied to any object embedded in two- or three-dimensional space. Like the area-perimeter methods, it can be applied to digitized images, but it does not require discrete patterns. Formally, the method counts the number $N_O(\varepsilon)$ of pixels occupied by an object in square (ε×ε) sampling windows, or equivalently in circles of radius ε. The mass m(ε) of occupied pixels is then defined as

$$m(\varepsilon) = \frac{N_O(\varepsilon)}{N_T(\varepsilon)}, \tag{2.11}$$

where $N_O(\varepsilon)$ and $N_T(\varepsilon)$ are the number of occupied pixels and the total number of pixels in an observation window of size ε. These computations are repeated for various values of ε, and the mass dimension $D_M$ is defined by

$$m(\varepsilon) = k\,\varepsilon^{D_M}, \tag{2.12}$$

where k is also a constant.

In practice, the mass m(ε) can be estimated using squares or circles of increasing size ε starting from the centre of the domain. This approach is best suited to objects with some radial symmetry, such as diffusion-limited aggregates. We stress that increasing the window size ε may exclude a greater proportion of pixels along the periphery of the domain. Under an assumption of isotropy, a circle edge correction can nevertheless be applied to circumvent this problem. Alternatively, for point-pattern data sets, it is recommended to calculate m(ε) as the average mass over a number of squares or circles of radius ε.

Fig 2.1 The mass dimension method

Seuront (2010) (Fig 2.1) illustrates how to use squares of increasing size starting from the centre (left) or the side (right) of the domain of interest. One counts the number of occupied pixels (shown in black) and estimates the mass m(ε). The slope of the linear part of the log-log plot of m(ε) against ε provides an estimate of the mass dimension $D_M$.

Information dimension

The information dimension $D_I$ is conceptually related to the box dimension $D_F$ and the cluster dimension $D_C$, because it is based on a count of occupied boxes of varying size ε. However, in the box and cluster dimension estimates, a box is counted as occupied and enters the calculation of N(ε) regardless of whether it contains one point or a relatively large number of points. The information dimension provides more detail: the number of points $n_i$ within each occupied box is counted and expressed as the relative frequency $f_i$:

$$f_i = \frac{n_i}{N}, \tag{2.13}$$

where N is the total number of points in the set, so that $\sum_{i=1}^{N(\varepsilon)} f_i = 1$, and N(ε) is the number of occupied boxes of size ε. A weight is thus assigned to each box: boxes containing more points count more than boxes with fewer points. The information entropy, or Shannon entropy, is defined as

$$H(\varepsilon) = -\sum_{i=1}^{N(\varepsilon)} f_i \ln f_i. \tag{2.14}$$

For nonuniformly distributed point patterns, the information dimension $D_I$ is defined by

$$H(\varepsilon) = \ln k - D_I \ln \varepsilon, \tag{2.15}$$

where k is still a constant.
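The following Python sketch estimates $D_I$ by fixed-grid box counting, using the relative frequencies (2.13), the entropy (2.14) and a least-squares fit of (2.15). The grid anchored at the origin and the function name are illustrative assumptions; the check uses points spread evenly along a diagonal, for which $D_I = 1$.

```python
import numpy as np

def information_dimension(points, epsilons):
    """Estimate D_I from H(eps) = ln k - D_I ln eps (eq. 2.15).

    points: array of shape (N, d); epsilons: box sizes to use.
    """
    pts = np.asarray(points, dtype=float)
    entropies = []
    for eps in epsilons:
        boxes = np.floor(pts / eps).astype(np.int64)        # box index of each point
        _, counts = np.unique(boxes, axis=0, return_counts=True)
        f = counts / counts.sum()                            # f_i = n_i / N   (2.13)
        entropies.append(-(f * np.log(f)).sum())             # H(eps)          (2.14)
    slope, _ = np.polyfit(np.log(epsilons), entropies, 1)
    return -slope                                            # D_I = -dH / d(ln eps)

# Points spread evenly along the diagonal of the unit square.
t = (np.arange(2 ** 14) + 0.5) / 2 ** 14
diagonal = np.column_stack((t, t))
eps = [2.0 ** -j for j in range(2, 8)]
print(round(information_dimension(diagonal, eps), 6))   # 1.0
```

For uniformly scattered points in the plane the same function returns a value close to 2, provided the smallest boxes still contain many points each.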

Correlation dimension

The correlation dimension is well adapted to characterizing the spatial clustering of point patterns and was initially introduced to characterize the dimension of strange attractors. It is widely used in the empirical analysis of dynamical systems (Grassberger and Procaccia 1983) and in cosmology (McCauley 2001). The correlation function, usually referred to as the correlation integral C(ε), can be written simply as

$$C(\varepsilon) = \frac{1}{N^2} N_{i,j}, \tag{2.16}$$

where N is the total number of points in the set (so that $N^2$ approximates the number of pairs of points) and $N_{i,j}$ is the number of pairs whose Euclidean distance $d_{i,j} < \varepsilon$. In other words, the correlation integral C(ε) represents the probability that the distance between two randomly chosen points is less than ε. For nonuniform distributions, C(ε) is given by

$$C(\varepsilon) = k\,\varepsilon^{D_{cor}}, \tag{2.17}$$

where k is still a constant and $D_{cor}$ is the correlation dimension.
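A minimal Grassberger-Procaccia style sketch of (2.16) and (2.17): it counts ordered pairs i ≠ j closer than ε, normalises by $N^2$, and fits ln C(ε) against ln ε. The brute-force distance matrix needs O(N²) memory, so it is only meant for small point sets; the function name and the ε values are illustrative choices.

```python
import numpy as np

def correlation_dimension(points, epsilons):
    """Estimate D_cor from C(eps) = k * eps**D_cor (eq. 2.17).

    C(eps) counts ordered pairs (i, j), i != j, closer than eps, divided by N**2.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = dist[np.triu_indices(n, k=1)]          # each unordered pair once
    C = np.array([2.0 * np.count_nonzero(upper < eps) / n ** 2 for eps in epsilons])
    slope, _ = np.polyfit(np.log(epsilons), np.log(C), 1)
    return slope, C

rng = np.random.default_rng(1)
pts = rng.random((1500, 2))                        # uniform points in the unit square
eps = np.logspace(-2, -1, 6)
d_cor, C = correlation_dimension(pts, eps)
print(round(d_cor, 2))   # close to 2, slightly lower because of edge effects
```

The fit range matters: too large an ε runs into the boundary of the set, and too small an ε leaves too few pairs for a stable estimate.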

2.2.2 Calculation of fractal dimension

Box counting method


Box counting is probably the best-known and most commonly used method for analysing sets embedded in two dimensions. It is versatile because, unlike the divider method, it can be used with any kind of set embedded in two dimensions. It is also easily generalized to other embedding dimensions, which enhances its popularity. The box counting method allows us simultaneously to determine whether a pattern is fractal and, if it is, to estimate its fractal dimension. The procedure is as follows.

First, overlay a grid of squares on the object to be measured, and count the number of boxes that contain part of the object, that is, that intersect at least one point of the set. The number of boxes $N_B$ required to cover the set depends on the linear size $r_B$ of the squares, so $N_B$ is a function of $r_B$. Second, reduce the mesh size of the grid and again count the number of boxes occupied by at least some part of the object; the number of boxes increases as the box size decreases. Repeat these two steps, reducing the size of the boxes in the grid overlaid on the object and recording $N_B$ and $r_B$ each time. Finally, plot $\log N_B$ against $\log r_B$. If the relation between the two is linear, $N_B$ and $r_B$ follow a power law, which indicates that the object is fractal. The absolute value of the slope of the best-fit line is then taken as the numerical box-counting dimension, which is our estimate of the fractal dimension.
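A minimal sketch of the procedure above for point sets of any dimension, checked on the middle-thirds Cantor set, whose box-counting dimension is ln 2 / ln 3 ≈ 0.631. The grid is anchored at the origin and is neither offset nor rotated, so it does not implement the refinements discussed under the practical issues; the function names are illustrative.

```python
import numpy as np

def box_count(points, size):
    """Number of grid boxes of side `size` containing at least one point."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]                        # treat 1-D data as shape (N, 1)
    boxes = np.floor(pts / size).astype(np.int64)
    return len(np.unique(boxes, axis=0))

def box_counting_dimension(points, sizes):
    """Absolute slope of log N_B against log r_B (least-squares fit)."""
    counts = [box_count(points, r) for r in sizes]
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return abs(slope), counts

def cantor_points(level):
    """Midpoints of the 2**level intervals of the middle-thirds Cantor construction."""
    left = np.array([0.0])
    for _ in range(level):
        left = np.concatenate((left / 3.0, left / 3.0 + 2.0 / 3.0))
    return left + 0.5 * 3.0 ** -level

sizes = [3.0 ** -j for j in range(1, 9)]
dim, counts = box_counting_dimension(cantor_points(10), sizes)
print(counts)           # 2, 4, 8, ..., 256 occupied boxes
print(round(dim, 4))    # 0.6309
```

Using box sizes that are powers of 3 keeps the grid aligned with the Cantor construction, so the counts are exact; for an arbitrary set the offset and range issues listed in the practical issues become important.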

Applying the method in practice

To avoid simple mistakes, we should consider several practical issues when

implementing the box counting method (Brown and Liebovitch 2009).

i. To estimate the dimension accurately, we should minimize the number of occupied boxes, $N_B$, at each step of the calculation; deviations from this rule introduce a small error into the estimate of the dimension. One solution for random fractals is to offset the grid repeatedly, perhaps randomly, to minimize $N_B$ at each step of the analysis. An alternative is to rotate the grid.

ii. It is also important to fashion a grid that tightly matches the boundaries of the

object that we wish to measure. Spurious results could be obtained if we use a

mesh or grid that is arbitrarily larger than the fractal itself.


iii. We should also try to use as wide a range of box sizes as possible, but within the finite limits of the data set. More specifically, box sizes that are larger than the entire data set or smaller than a single data point do not contribute additional information to the analysis and should be avoided. We usually expect the range of fractal behaviour to extend over at least a couple of orders of magnitude. If the fractal property extends over only a very short range, its fractal dimension may not be a meaningful description of the data.

iv. The increment by which we reduce the box sizes will have an effect on the

analysis (Foroutan-pour et al. 1999) because each box size becomes a case in

the subsequent analysis of the power law relation. For ease of analysis, it may

be desirable to ensure that the logarithms of the box sizes are evenly spaced

on the log-log plot. It is also preferable to perform the analysis at many

different box sizes because each step of iteration of the process contributes an

additional data point to the analysis on log-log plot. If the increment of

reduction is large, then we cannot use many box sizes because we quickly

reach the finite limits of the fractal. On the other hand, if we choose too small

increments, then the number of occupied boxes may not change between

iterations, creating a series of undesirable plateaus in the graph.

2.3 Self-similarity of complex networks

As we noted above, self-similarity is a typical property of fractals. A self-similar

object is composed of smaller copies of itself and these smaller copies are also

composed of even smaller copies, and so on. The word “similar” carries its

geometrical meaning: objects that have the same form but may be different in size.

The result is an object composed of a single pattern that repeats itself many times at

many different sizes.

Complex networks are networks with more complex architectures than classical random graphs with simple Poissonian distributions of connections. Two well-known properties of complex networks are the scale-free degree distribution and the small-world effect.

Could small-world networks be self-similar?

The small-world effect is characterized mathematically by an average shortest path length $\langle l \rangle$ that depends at most logarithmically on the network size N, the number of nodes in the network: $\langle l \rangle \approx \ln N$, where l, the shortest distance between two nodes, defines the distance metric in complex networks. Equivalently, $N \approx e^{\langle l \rangle / l_0}$, where $l_0$ is a characteristic length. Self-similar objects with parameters N and s, by contrast, are described by a power law such as $N \sim (1/s)^{d}$. The exponent d is the dimension of the scaling law, known as the fractal dimension.

At first sight, the power-law relation required by self-similarity contradicts the exponential relation implied by the small-world property. However, some real-world complex networks that exhibit the small-world effect also show self-similarity under a renormalization process based on box-covering algorithms.

Recently, box-covering algorithms have been used (Song et al. 2005) to demonstrate the existence of self-similarity in many real networks. The fractal and self-similarity properties of complex networks were subsequently studied extensively in a variety of systems (Rozenfeld 2008, Song et al. 2006) through box-covering algorithms. Kim et al. (2007a, b) introduced another method called random sequential box-covering. Details of these box-covering algorithms are introduced and discussed in the following parts of this chapter.

2.3.1 Methods

Box-covering algorithms for fractal scaling in complex networks

A box-covering method is a basic tool for measuring the fractal dimension of conventional fractal objects embedded in Euclidean space (Feder 1988, Brown and Liebovitch 2009). However, such a method cannot be applied directly to scale-free networks, which exhibit an inhomogeneous degree distribution and the small-world effect, because no Euclidean metric is available in such networks. Song et al. (2007) provided a detailed study of the algorithms used to calculate quantities characterizing the topology of such complex networks, such as the fractal dimension $d_B$. They studied and compared several possible box-covering algorithms on a number of model and real-world networks, and showed that the optimal network covering can be mapped directly to a vertex colouring problem, which is a well-studied problem in graph theory. They found that this approach leads to the most efficient solution of the optimal box-covering problem.

Greedy colouring algorithm

Fig 2.2 Illustration of the solution for the network covering problem via mapping to the graph colouring problem.

The greedy colouring algorithm (Song et al. 2007) is based on the greedy algorithm (Cormen et al. 2001) for vertex colouring. Starting from G, for a given box size (here $l_B = 3$) we construct the dual network G′, in which two nodes are connected if the distance between them in G is no smaller than $l_B$. The greedy algorithm for vertex colouring in G′ is then used to determine the box covering in G, as shown in Fig 2.2.

We can proceed in two main steps: first, rank the nodes in a random sequence; second, mark each node with a free colour, that is, a colour different from the colours of its nearest neighbours in G′ (according to the fixed value of $l_B$). For this implementation, we need a two-dimensional matrix $c_{il}$ of size $N \times l_B^{\max}$, whose values represent the colour of node i for a given box size $l = l_B$.

(i) Assign a unique id from 1 to N to all network nodes, without assigning any colours yet.

(ii) For all $l_B$ values, assign the colour value 0 to the node with id = 1, i.e. $c_{1 l_B} = 0$.

(iii) Set the id value i = 2. Repeat the following while i ≤ N.

(a) Calculate the distance $l_{ij}$ from i to all the nodes in the network with id j less than i.

(b) Set $l_B = 1$.

(c) Select a colour that differs from the colours $c_{j l_B}$ of all nodes j < i for which $l_{ij} \ge l_B$ (the neighbours of i in G′). This is the colour $c_{i l_B}$ of node i for the given value of $l_B$.

(d) Increase $l_B$ by one and repeat (c) while $l_B \le l_B^{\max}$.

(e) Increase i by 1.

This greedy algorithm is very efficient, since we can cover the network with a

sequence of box sizes lB performing only one network pass. Because the results may

depend on the original colouring sequence, in order to investigate the quality of the

algorithm, Song et al. (2007) have randomly reshuffled the colouring sequence and

applied the greedy algorithm 10000 times on several different models and real-world

networks. Strictly speaking, calculating the fractal dimension $d_B$ through the relation $N_B \sim l_B^{-d_B}$ is valid only for the minimum possible value of $N_B$ for each given $l_B$, so an algorithm should aim to find this minimum $N_B$. For greedy colouring it has been shown (Cormen et al. 2001) that there exists a colouring sequence for which the greedy algorithm yields the optimal solution, so that the minimal value obtained from the greedy algorithm over many sequences can coincide with the optimal value.
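The steps above translate into the following Python sketch, assuming an unweighted network stored as an adjacency dictionary. It colours the nodes greedily in a given order for every $l_B$ at once and reports the number of colours, i.e. boxes; `min_boxes` mimics, on a much smaller scale, the random reshuffling of the colouring sequence described above. Function names are hypothetical, and the BFS from every node makes it suitable only for small networks.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (chemical) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_colouring_boxes(adj, lb_max, order=None):
    """N_B(l_B) for l_B = 1..lb_max from one greedy pass over the nodes.

    i and j are adjacent in the dual network G' when l_ij >= l_B, so nodes
    sharing a colour are pairwise closer than l_B and form one box.
    """
    nodes = list(adj) if order is None else list(order)
    inf = float("inf")
    colour = {lb: {} for lb in range(1, lb_max + 1)}
    for idx, i in enumerate(nodes):
        dist = bfs_distances(adj, i)
        earlier = nodes[:idx]
        for lb in range(1, lb_max + 1):
            forbidden = {colour[lb][j] for j in earlier if dist.get(j, inf) >= lb}
            c = 0
            while c in forbidden:          # smallest colour unused by G' neighbours
                c += 1
            colour[lb][i] = c
    return {lb: len(set(colour[lb].values())) for lb in colour}

def min_boxes(adj, lb_max, trials=50, seed=0):
    """Minimum N_B over several random colouring sequences."""
    rng = random.Random(seed)
    nodes = list(adj)
    best = None
    for _ in range(trials):
        rng.shuffle(nodes)
        nb = greedy_colouring_boxes(adj, lb_max, order=nodes)
        best = nb if best is None else {lb: min(best[lb], nb[lb]) for lb in nb}
    return best

# Path graph 0-1-2-...-8.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 9] for i in range(9)}
print(greedy_colouring_boxes(path, 3))   # {1: 9, 2: 5, 3: 3}
```

The fractal dimension would then follow from a log-log fit of $N_B$ against $l_B$ over a suitable range of box sizes.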

Burning with the diameter lB

A traditional geometrical approach is the so-called 'burning' algorithm (breadth-first search) (Song et al. 2007). The basic idea is to generate a box by growing it from one randomly selected node into its neighbourhood until the box is compact, i.e. until it includes the maximum possible number of nodes and no other network node can be added to it. This algorithm is quite simple and can be summarized as follows:

(i) Choose a random uncovered node as the seed for a new box.

(ii) All uncovered nodes connected to the current box are tested for being

within distance lB from all the nodes currently in the box. Nodes that obey

this criterion are included in the box.

(iii) Repeat (ii) until there are no more nodes that can be added into this box.

(iv) Repeat (i)-(iii) until all nodes are covered.

Although this algorithm is quite easy to implement, it requires a very long

computational time. For this reason, another algorithm which is called compact-box-

burning has been introduced.

Compact-box-burning (CBB)

For a complex network (Song et al. 2007), the compact-box-burning procedure consists of the following steps:

(i) Construct the set C of all yet uncovered nodes.

(ii) Choose a random node p from the candidate set C and remove it from C.

(iii) Remove from C all nodes i whose distance from p satisfies $l_{pi} \ge l_B$, since by definition they cannot belong to the same box.

(iv) Repeat steps (ii) and (iii) until the candidate set is empty.

The set of the chosen nodes p forms a compact box. We then repeat the above

procedure until the entire network is covered.
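A minimal Python sketch of compact-box-burning, assuming an adjacency-dictionary representation and distances measured in the whole network. Each chosen node discards every remaining candidate at distance $l_B$ or more, so the nodes of a box are pairwise closer than $l_B$. The function names and the seeded random generator are illustrative choices.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (chemical) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def compact_box_burning(adj, lb, seed=0):
    """Cover an unweighted network with compact boxes of size lb (CBB).

    Returns a list of boxes; nodes in the same box are pairwise closer than lb.
    """
    rng = random.Random(seed)
    inf = float("inf")
    uncovered = set(adj)
    boxes = []
    while uncovered:
        candidates = set(uncovered)                  # step (i)
        box = []
        while candidates:                            # step (iv)
            p = rng.choice(sorted(candidates))       # step (ii); sorted for reproducibility
            candidates.discard(p)
            box.append(p)
            dist = bfs_distances(adj, p)
            # step (iii): drop candidates at distance >= lb from p
            candidates = {i for i in candidates if dist.get(i, inf) < lb}
        boxes.append(box)
        uncovered.difference_update(box)
    return boxes

path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 9] for i in range(9)}
boxes = compact_box_burning(path, lb=3)
print(len(boxes), boxes)
```

As noted below, a box built this way need not be connected through its own nodes.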

Maximum-excluded-mass-burning (MEMB)

The formal definition of boxes includes the maximum separation lB between any two

nodes in a box. However, the same fractal properties of a network can be recovered if a box is instead defined as the set of nodes within a radius $r_B$ of a central node. Building on this box definition and on a newly defined 'excluded mass' of a node (the number of uncovered nodes within a chemical distance less than $r_B$ of it), the maximum-excluded-mass-burning (MEMB) algorithm (Song et al. 2007) proceeds as follows:

(i) Initially, all the nodes are marked as uncovered and non-centres.

(ii) For all non-centre nodes (including the already covered nodes) calculate the

excluded mass, and select the node p with the maximum excluded mass as

the next centre.

(iii) Mark all the nodes with chemical distance less than rB from p as covered.

(iv) Repeat steps (ii) and (iii) until all nodes are either covered or centres.

Note that the excluded mass has to be recomputed at each step, because covering nodes in the previous step may have changed it. A box centre can also be an already covered node, since choosing such a node may lead to a larger box mass.
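A minimal Python sketch of the MEMB steps, assuming an adjacency-dictionary representation and all-pairs BFS (small networks only). Ties in excluded mass are broken by node order, which is an assumption. The sketch returns only the centres, whose number gives $N_B$ for the chosen $r_B$; assigning the remaining nodes to boxes is not part of the steps above and is omitted.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (chemical) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def memb_centres(adj, rb):
    """Box centres chosen by maximum-excluded-mass burning.

    A node's excluded mass is the number of uncovered nodes at chemical
    distance less than rb from it; the number of centres is N_B(rb).
    """
    dists = {u: bfs_distances(adj, u) for u in adj}      # all pairs: small graphs only
    nodes = set(adj)
    covered, centres = set(), []
    while covered | set(centres) != nodes:               # step (iv)
        best, best_mass = None, -1
        for u in sorted(nodes - set(centres)):           # step (ii): every non-centre
            ball = {v for v, d in dists[u].items() if d < rb}
            mass = len(ball - covered)                   # recomputed at every step
            if mass > best_mass:
                best, best_mass = u, mass
        centres.append(best)
        covered |= {v for v, d in dists[best].items() if d < rb}   # step (iii)
    return centres

path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 9] for i in range(9)}
print(memb_centres(path, rb=2))   # [1, 4, 7]
```

Already covered nodes remain eligible as centres, in line with the remark above.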

For both the greedy colouring and the CBB algorithm the connectivity of boxes is

not guaranteed. That is, for some boxes there may not be a path inside the box that

connects two nodes belonging to the same box. The reason is that some boxes may

already include certain nodes that are crucial for the optimization of other boxes. By contrast, the MEMB algorithm always yields connected boxes, so it is the most appropriate method when this condition is required.

In Kim et al. (2007), a comparison of greedy colouring, CBB and MEMB showed that these three methods, unlike random burning with $r_B$, are not sensitive to the specific realization used. The calculation of the fractal dimension $d_B$ yields the same value for all the algorithms presented, indicating that the scaling of the number of boxes is quite stable in all cases.

Random sequential box-covering

Compared with the above algorithms, Kim et al. (2007a, b) introduced another method, called random sequential box-covering, which has the following steps. Start with all vertices labelled as not burned. Then,

(i) Select a vertex at random; this vertex serves as a seed.

(ii) Search the network up to distance rB from the seed and burn all vertices found that have not been burned yet. Assign the newly burned vertices to a new box. If no newly burned vertex is found, the box is discarded.

(iii) Repeat (i) and (ii) until all vertices are assigned to their respective boxes.

Here rB is the radius of a box, which is related to lB approximately as $l_B \approx 2r_B + 1$. Different Monte Carlo realizations of this procedure (steps (i)–(iii)) may yield different numbers of boxes covering the network. For simplicity, Kim et al. (2007a, b) chose the smallest number of boxes among all the trials.
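One Monte Carlo realization of steps (i)–(iii), together with the "smallest number of boxes over all trials" rule, can be sketched as follows. This is our own minimal illustration, not the code of Kim et al. (2007a, b). We assume an adjacency dictionary, hop distances at most rB from the seed, and seeds drawn from all vertices, so an already burned seed can produce an empty box, which is then discarded. The function names and the default of 1000 trials are illustrative choices.

```python
import random
from collections import deque


def burn(adj, seed, radius, burned):
    """BFS up to `radius` hops from `seed`; return reached nodes not yet burned."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {u for u in dist if u not in burned}


def rs_box_count(adj, r_b, rng):
    """One realization of random sequential box-covering; returns N_B."""
    nodes = list(adj)
    burned, n_boxes = set(), 0
    while len(burned) < len(nodes):
        seed = rng.choice(nodes)            # step (i): random seed
        new = burn(adj, seed, r_b, burned)  # step (ii): search within r_B
        if new:                             # a box with no new vertex is discarded
            burned |= new
            n_boxes += 1
    return n_boxes


def min_box_count(adj, r_b, trials=1000, seed=0):
    """Smallest N_B over `trials` independent realizations."""
    rng = random.Random(seed)
    return min(rs_box_count(adj, r_b, rng) for _ in range(trials))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(min_box_count(path, 2, trials=200))  # 1: the centre node covers the path
```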

In our study we chose the random sequential box-covering algorithm to calculate the fractal dimension of protein-protein interaction networks. This algorithm shares a common spirit with the other algorithms introduced by Song et al. (2006), but the details differ in the following respects. The random sequential (RS) box-covering method contains a random process for selecting the position of the centre of each box. A new box can overlap with preceding boxes. In this case, vertices in previously assigned boxes are excluded from the new box. As a result, vertices in the new box can be disconnected within the box but connected through one or more vertices in preceding boxes. Nevertheless, such a divided box is counted as a single box.

In Fig 2.3, vertices are selected randomly, for example from vertex 1 to vertex 4 in succession. Vertices within distance rB = 1 of vertex 1 are assigned to the box represented by the solid (red) circle. The vertices reached from vertex 2 that have not yet been assigned to a box are enclosed by the dashed-dotted-dotted (black) closed curve. Those reached from vertex 3 are enclosed by the dashed-dotted (green) circle, and those reached from vertex 4 by the dashed (blue) ellipse.

Fig 2.3 Schematic illustration of the RS box-covering algorithm introduced by

Kim et al. (2007a, b).

Applying random sequential box-covering in practice

To avoid simple mistakes, we should consider several practical issues when

implementing the RS box-covering algorithm.

i. To estimate the dimension accurately, we should minimize the number of occupied boxes, NB, for each box size. Because the seed node is chosen randomly at each step, the number of boxes obtained for a fixed rB varies from run to run. If a node with large degree (a hub) happens to be chosen first, many nodes are covered at once, which produces an efficient box cover. If a node with small degree (or an isolated node) is chosen first, only a few nodes are covered. To overcome this, we run the procedure with at least 1000 random node sequences and use the minimum value of NB for each rB when computing the fractal scaling exponent.

ii. The increment by which we change the box size affects the analysis (Foroutan-pour et al. 1999), because each box size becomes a data point in the subsequent fit of the power-law relation. In our work, rB starts from 1 and increases by 1 at each step, because in unweighted networks the shortest path length between two nodes is an integer.

iii. Choose an appropriate range of rB. If rB extends over only a very short range, the estimated fractal dimension may not be meaningful. The range of rB should also match the extent of the object.
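Given the minimum box counts for a chosen range of rB (points ii and iii), the fractal dimension is read off as minus the slope of ln(NB) against ln(rB). The sketch below uses an ordinary least-squares fit. The standard error of the slope is our assumption for the "±" values; the thesis does not state how its error ranges were obtained. The function name `fit_dimension` is hypothetical.

```python
import math


def fit_dimension(radii, box_counts):
    """Least-squares fit of ln(N_B) = c - d_B * ln(r_B).

    Returns (d_B, standard error of the slope); needs at least 3 points.
    """
    xs = [math.log(r) for r in radii]
    ys = [math.log(n) for n in box_counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    stderr = math.sqrt(rss / (n - 2) / sxx)
    return -slope, stderr


# Synthetic counts following N_B = 1000 * r_B^(-2.3) exactly (illustrative data).
radii = [1, 2, 3, 4, 5, 6]
counts = [1000 * r ** -2.3 for r in radii]
d_b, err = fit_dimension(radii, counts)
print(round(d_b, 6))  # 2.3
```

To respect point iii, pass only the sub-range of radii over which the log-log plot is straight, rather than every rB up to the network diameter.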

Skeleton of complex networks

The concept of the skeleton, introduced by Kim et al. (2004), refers to a special spanning tree. In scale-free networks, the distribution of vertex betweenness centrality (which is calculated in equation (3.4)) is known to follow a power law with an exponent of either 2.2 or 2.0 (Milkova 2007). Although the edge betweenness distribution does not follow a power law exactly, it is also very inhomogeneous in scale-free networks. This indicates that there exist essential edges with large edge betweenness that are used for communication very frequently. One can therefore imagine a sub-network built only from these essential edges while retaining global connectivity. Such a sub-network is regarded as a communication kernel because it handles most of the traffic on a network.

The skeleton is exactly this communication kernel. It is the spanning tree whose edges maximize the total edge betweenness measured on the original network (Goh et al. 2006). The construction is very similar to the minimum spanning tree algorithm: edges are taken in decreasing order of edge betweenness, and each edge is added to the tree if it does not create a loop, until the tree includes all vertices. The residual edges can be regarded as shortcuts, since they shorten paths on the spanning tree.
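The construction just described can be sketched with standard techniques, under our own assumptions. Edge betweenness of an unweighted, undirected graph is computed here with Brandes' accumulation algorithm, a standard method that is not necessarily the one used in this thesis. The skeleton is then grown by scanning edges in decreasing betweenness and rejecting any edge that would close a loop, using a union-find structure. The adjacency-dictionary input and the function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.

    Returns {frozenset({u, v}): score}, summed over unordered node pairs.
    """
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: number of shortest paths (sigma) and predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {u: 0.0 for u in order}
        for w in reversed(order):
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((u, w))] += c
                delta[u] += c
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2.0 for e, b in eb.items()}


def skeleton(adj):
    """Spanning tree of highest-betweenness edges that do not create loops."""
    eb = edge_betweenness(adj)
    parent = {u: u for u in adj}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for e in sorted(eb, key=eb.get, reverse=True):
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru != rv:  # adding (u, v) does not close a loop
            parent[ru] = rv
            tree.append((u, v))
    return tree


# A square 0-1-2-3 with a pendant node 4 attached to node 0.
g = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}, 4: {0}}
print(len(skeleton(g)))  # 4 edges for 5 nodes
```

In the example, the pendant edge (0, 4) carries betweenness 4, and edges (0, 1) and (0, 3) carry 3.5 each, so all three enter the skeleton before one of the two low-load edges (1, 2) and (2, 3), which carry 2.5 each.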

Here we use Prim's algorithm (Prim 1957) to calculate the skeleton of a network. Prim's algorithm is a graph-theoretic algorithm that finds a minimum spanning tree of a connected weighted graph. It finds a subset of the edges that forms a tree including every vertex, such that the total weight of all the edges in the tree is minimized. The algorithm was developed in 1930 by the Czech mathematician Vojtěch Jarník, later independently by the computer scientist Robert C. Prim (1957), and rediscovered by Edsger Dijkstra in 1959.
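A heap-based version of Prim's algorithm is sketched below. A minimum spanning tree depends only on the ordering of the edge weights. Therefore, giving each edge a weight that decreases as its betweenness increases makes Prim's minimum spanning tree coincide with the maximum-betweenness skeleton, up to ties. The reciprocal 1/betweenness used in the example is our assumption; the text only states that the betweenness is translated into an edge weight. The input format `{u: {v: weight}}` and the function name are hypothetical. Node labels must be mutually comparable, because heap entries break ties on them.

```python
import heapq


def prim_mst(adj_w, root):
    """Prim's algorithm on a connected, undirected weighted graph.

    adj_w maps u -> {v: weight}; returns the tree edges as (parent, child).
    """
    in_tree = {root}
    heap = [(w, root, v) for v, w in adj_w[root].items()]
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(adj_w):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already reached by a lighter edge
        in_tree.add(v)
        tree.append((u, v))
        for x, wx in adj_w[v].items():
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree


# Illustrative edge-betweenness values; weight = 1 / betweenness (assumption).
betweenness = {(0, 1): 3.5, (0, 3): 3.5, (0, 4): 4.0, (1, 2): 2.5, (2, 3): 2.5}
graph = {u: {} for u in range(5)}
for (u, v), b in betweenness.items():
    graph[u][v] = graph[v][u] = 1.0 / b
print(prim_mst(graph, 0))  # [(0, 4), (0, 1), (0, 3), (1, 2)]
```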



2.3.2 Numerical results and discussion

In this section, we present some typical results of fractal scaling tested on protein-

protein interaction networks using the random sequential box-covering algorithm.

The study of protein interactions is fundamental to understanding how proteins function

within a cell (Palzkill 2002, Pellegrini 1999). It has been observed that proteins

seldom act as single isolated species (Park et al. 2009, Qi 2007) while performing

their functions in vivo. Protein-protein interactions play different roles in biology

depending on the composition, affinity, and lifetime of the association. The simplest

representation of PPI networks (Dandekar 1998, Enright 1999, Rual et al. 2005)

takes the form of a mathematical graph consisting of nodes and edges (or links).

Proteins are represented as nodes and an edge represents a pair of proteins which

physically interact.

Properties of PPI Networks

A protein-protein interaction network has three main properties (Hu and Pan 2007):

scale-free degree distribution, small-world effect and disassortativity. Much work has

been done to study these properties and to find new ones.

1. Scale-free degree distribution: in scale-free networks, most proteins

participate in only a few interactions, while a few participate in dozens of

interactions.

2. Small-world effect means that any two nodes can be connected via a short

path of a few links.

3. Disassortativity: in protein-protein interaction networks, highly connected nodes (hubs) seldom link directly to each other. This is very different from social networks, in which well-connected people tend to have direct connections to each other. Most biological and technological networks have the property of disassortativity.
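Disassortativity is commonly quantified by Newman's degree assortativity coefficient r. This is the Pearson correlation between the degrees found at the two ends of an edge, and r < 0 indicates a disassortative network. The sketch below is our own illustration; Hu and Pan (2007) do not prescribe this code. Each undirected edge is counted in both orientations so that the measure is symmetric, and the function name is hypothetical. The coefficient is undefined for regular graphs, where every node has the same degree.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at either end of an edge (Newman's r)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        # Use both orientations (u, v) and (v, u) of each undirected edge.
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var = sum((x - mx) ** 2 for x in xs) / n  # equals the variance of ys by symmetry
    return cov / var


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_assortativity(star))  # -1.0: the hub links only to degree-1 nodes
```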



Databases

The PPI data that we used were downloaded from the Database of Interacting

Proteins (DIP: http://dip.doe-mbi.ucla.edu/). The DIP database is composed of nodes

and edges. Each protein participating in a DIP interaction is identified by a unique

identifier of the form <DIP:nnnN> and is cross-referenced to at least one of the major

protein databases (PIR, SWISSPROT and/or GENBANK). In addition, some basic

information about each protein, such as name, function, subcellular localization and

cross-references to other biological databases is stored locally (if available) in case

the cross-referenced databases are not accessible. The information about each DIP

interaction is identified by a unique identifier of the form <DIP:nnnE> that provides

access to information such as the region involved in the interaction, the dissociation

constant and the experimental methods used to identify and characterize the

interaction. The DIP database can be searched in a number of ways to retrieve the

information about specific protein or interaction. It is also possible to retrieve entire

groups of proteins or interactions fulfilling user-specified criteria. We chose two

species: Homo sapiens and Escherichia coli. The PPI networks of C. elegans and

Arabidopsis thaliana were downloaded from BioGRID for comparison.

Useful software

Cytoscape is a project dedicated to building open-source network visualization and

analysis software. Its roots are in systems biology, where it is used for integrating

biomolecular interaction networks with high-throughput expression data and other

molecular state information (Paul et al. 2003).

The purposes of using Cytoscape in this study are as follows:

i. Visualization of PPI networks. We import the PPI data into Cytoscape to get a global view of the PPI network. Fig. 2.4 and Fig. 2.12 show that, because the PPI data of Homo sapiens and E. coli are incomplete, neither PPI network is entirely connected. Instead, each consists of several separate subnetworks.

ii. We therefore take the largest connected component (the largest connected subnetwork) as the network on which we perform the fractal scaling analysis.

Matlab and Fortran were used for the calculations in this chapter. Matlab is convenient for computing the adjacency matrix (or interaction matrix) and the shortest path length matrix of a network, and for running the box-covering algorithm on networks of up to about 500 nodes. For larger networks we used Fortran for the fractal scaling, because it takes less time.
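For illustration only, the two matrices mentioned above can be built as follows. We assume nodes are relabelled 0..N−1, and unreachable pairs are marked with −1. One breadth-first search per node gives all hop distances of an unweighted network. The thesis used Matlab and Fortran for these steps; this Python version and its function names are hypothetical.

```python
from collections import deque


def adjacency_matrix(edges, n):
    """N x N symmetric 0/1 interaction matrix for nodes labelled 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def shortest_path_matrix(a):
    """Hop-count distances via one BFS per node; -1 marks unreachable pairs."""
    n = len(a)
    nbrs = [[v for v in range(n) if a[u][v]] for u in range(n)]
    d = [[-1] * n for _ in range(n)]
    for s in range(n):
        d[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if d[s][v] < 0:  # first visit is along a shortest path
                    d[s][v] = d[s][u] + 1
                    queue.append(v)
    return d


d = shortest_path_matrix(adjacency_matrix([(0, 1), (1, 2)], 4))
print(d[0])  # [0, 1, 2, -1]: node 3 is isolated
```

For a connected network, `max(max(row) for row in d)` gives the largest distance between any two nodes, the quantity compared later for a network and its skeleton.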

Flow-chart 2.1 shows the basic steps from the original PPI data to the fractal scaling of the network. We chose three typical PPI data sets (downloaded from DIP) and tested the fractal scaling process on them. Hsapi20041003 and Hsapi20081014CR are human PPI data sets from different years and versions, and E.coli20081014CR is an E. coli PPI data set. We first tested our programs on Hsapi20041003 and compared our results with those of Song et al. (2005) and Kim et al. (2007), which are based on the same data.

A summary of the numerical results is presented in Table 2.1. It includes the number of nodes (N), the number of edges (E), the number of nodes in the largest connected sub-network (V), the number of edges in the largest connected sub-network (L), the fractal dimension of the original network (dB) and the fractal dimension of the skeleton of the original network (db).

Table 2.1 Numerical results of fractal scaling of PPI networks

Version             N      E      V     L     dB            db
Hsapi20041003       1065   1369   563   894   2.32 ± 0.06   2.01 ± 0.03
Hsapi20081014CR     1389   1619   638   932   2.30 ± 0.08   2.20 ± 0.07
E.coli20081014CR    725    991    248   453   2.11 ± 0.09   1.95 ± 0.06



Flow-chart 2.1: Fractal scaling of complex networks

1. Import the PPI data into Cytoscape and take the largest connected component as network G for the following fractal analysis. Export the PPI data of G.

2. Translate the PPI data of G into an adjacency matrix of size N×N (N is the number of nodes in G).

3. Calculate the shortest path length matrix of G.

4. Calculate the fractal dimension of G.

5. Calculate the edge betweenness and translate it into edge weights.

6. Calculate the skeleton of G using Prim's algorithm.

7. Calculate the shortest path length matrix of the skeleton.

8. Calculate the fractal dimension of the skeleton.

9. Compare the fractal dimensions of G and its skeleton.



Fig 2.4 Protein-protein interaction network of the Homo sapiens version of Hsapi20041003

The original PPI data of Hsapi20041003 include 1065 proteins and 1396 interactions, which form several separate components, as shown in Fig. 2.4. We chose the largest connected part G, which includes 563 nodes and 894 edges and is shown in Fig 2.5. The fractal scaling analysis of G is shown in Fig 2.6. The blue straight line drawn for guidance has a slope of -2.32, which means the fractal dimension of this network is 2.32. This result is quite close to the values reported by Song et al. (2005), 2.28, and Kim et al. (2007), 2.3.



Fig 2.5 The largest connected sub-network of Homo sapiens (Hsapi20041003)

Fig 2.6 Fractal scaling for the sub-network G of Hsapi20041003

The skeleton of the largest connected sub-network (Fig 2.5) of Homo sapiens (Hsapi20041003) was calculated via Prim's algorithm. This skeleton contains the same 563 nodes and 562 edges, which have the highest edge betweenness and connect all 563 nodes. A graph of this skeleton is shown in Fig 2.7, and its fractal scaling is shown in Fig 2.8. The blue straight line drawn for guidance has a slope of -2.01, which means the fractal dimension of this skeleton is 2.01. This is smaller than the fractal dimension of the original network (2.32), for the following reasons.

Fig 2.7 Skeleton of the network showed in Fig 2.5

Fig 2.8 Fractal scaling of the skeleton of subnetwork G of Hsapi20041003

A skeleton is formed by the edges with the highest betweenness centralities, or loads. The remaining edges of the original network are referred to as shortcuts, which contribute to loop formation. Consequently, the distance between two nodes can be larger in the skeleton than in the original network. Take the previous network as an example: the largest distance between any two nodes is 21 in the original network but 27 in its skeleton. Therefore, when we run the box-covering program on the skeleton, more boxes are needed for each fixed box radius rB. How much the number of boxes (NB) increases depends on the box size rB. When rB is small, the number of boxes needed differs little between the original network and its skeleton. When rB is larger, many more boxes are needed to cover the skeleton than the original network. As a result, NB decreases more slowly with rB for the skeleton, so its fitted line in the ln(NB) versus ln(rB) plot is less steep and its estimated fractal dimension is smaller. This can be seen in Fig 2.9, which uses G (the largest connected part of Hsapi20041003) and its skeleton as an example. The blue lines represent the fractal scaling of the network and the upper red dots represent the fractal scaling of its skeleton.

Fig 2.9 Fractal scaling of the network and its skeleton

The PPI data of Hsapi20081014CR include 1389 proteins and 1619 interactions. We chose the largest connected part G, which includes 638 nodes and 932 edges. The fractal scaling of G is shown in Fig 2.10. The blue straight line drawn for guidance has a slope of -2.30, which means the fractal dimension of this network is 2.30. The skeleton of the largest connected sub-network of Hsapi20081014CR was calculated through Prim's algorithm. This skeleton contains the same 638 nodes and 637 edges, which have the highest edge betweenness and connect all 638 nodes. The fractal scaling of the skeleton is shown in Fig 2.11. The blue straight line drawn for guidance has a slope of -2.20, which means the fractal dimension of this skeleton is 2.20. The fractal dimension of the skeleton (2.20) is smaller than that of the original network (2.30).

Fig 2.10 Fractal scaling of the largest connected part of Hsapi20081014CR

Fig 2.11 Fractal scaling of the skeleton of the largest connected part of Hsapi20081014CR


Fig 2.12 PPI network of E. coli (version of E.coli20081014CR)

Fig 2.13 The largest connected sub-network G of E. coli



Fig 2.14 Fractal scaling for the sub-network of E. coli

Fig 2.15 Fractal scaling of the skeleton of the sub-network of E. coli

The protein-protein interaction data of E.coli20081014CR include 725 proteins and 991 interactions, shown in Fig 2.12. We chose the largest connected part G, which includes 248 nodes and 453 edges and is shown in Fig 2.13. The fractal scaling of network G is shown in Fig 2.14. The blue straight line drawn for guidance has a slope of -2.11, which means the fractal dimension of this network is 2.11. The skeleton of G was calculated through Prim's algorithm. This skeleton contains the same 248 nodes and 247 edges, which have the highest edge betweenness and connect all 248 nodes. The fractal scaling of the skeleton is shown in Fig 2.15. The blue straight line drawn for guidance has a slope of -1.95, which means the fractal dimension of this skeleton is 1.95. The fractal dimension of the skeleton (1.95) is smaller than that of the original network (2.11).

2.4 Fractal dimensions of weighted PPI networks

2.4.1 Methods

Protein-protein interactions (PPI) play a critical role in most cellular processes and

form the basis of biological mechanisms. In the post-genome era, the development of high-throughput methods, such as yeast two-hybrid screening and mass spectrometry, has produced vast amounts of PPI data, which makes it possible to study genes and proteins at the network level (Uetz et al. 2000). PPI networks are normally represented as unweighted graphs. However, given the varying reliability of interactions, these unweighted graphs are far from optimal in representing the data (Deane et al. 2002, Patil and Nakamura 2005). A more effective analysis can be achieved by considering weighted PPI networks, in which each edge is associated with a weight representing the probability of an interaction. To this end, many computational approaches have been proposed.

Pei and Zhang (2005) proposed a model to integrate different PPI data sets. They

construct this model using prior knowledge of data set reliability. Based on the

model, a topological measurement was proposed to select reliable interactions and to

quantify the similarity between two proteins' interaction profiles. The topological measurement exploits the small-world topological property of PPI networks, as well as some other properties. Li et al. (2009, 2010) proposed a method for weighting protein-protein interactions based on a combination of a logistic regression-based model and functional similarity. Many significant functional modules and essential proteins were discovered using the weighted PPI networks.



CD-distance

Brun et al. (2003) described a new computational method for the functional clustering of proteins on the basis of PPI data. In their clustering, only proteins involved in at least three binary interactions were considered, so proteins implicated in only one or two interactions were not classified. First, if two proteins interact with each other or have at least one common neighbour, a relation is established between them; that is, there is an edge between these two nodes in the PPI network. Second, the Czekanowski-Dice distance (CD-distance) between any two nodes is computed by

$$D(u,v) = \frac{|N_u \,\Delta\, N_v|}{|N_u \cup N_v| + |N_u \cap N_v|}, \tag{2.18}$$

where Nu is the node set consisting of node u and its nearest neighbours, and Nu ∆ Nv is the symmetric difference of the two sets.
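Equation (2.18) translates directly into set operations. In this minimal sketch, the adjacency dictionary and the function name are our assumptions, and each neighbourhood includes the node itself, as in the definition of Nu above. Because |Nu ∪ Nv| + |Nu ∩ Nv| = |Nu| + |Nv|, the distance ranges from 0 for identical neighbourhoods to 1 for disjoint ones.

```python
def cd_distance(adj, u, v):
    """Czekanowski-Dice distance (2.18); N_u holds u and its neighbours."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    # ^ is symmetric difference, | union, & intersection.
    return len(nu ^ nv) / (len(nu | nv) + len(nu & nv))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(cd_distance(path, 0, 1))  # 0.2: N_0 = {0, 1} and N_1 = {0, 1, 2}
print(cd_distance(path, 0, 3))  # 1.0: disjoint neighbourhoods
```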

FS-Weight

Protein complexes are fundamental for understanding principles of cellular

organizations. As the sizes of protein–protein interaction networks are increasing,

accurate and fast protein complex prediction from these PPI networks can serve as a

guide for biological experiments to discover novel protein complexes. However, it is

not easy to predict protein complexes from PPI networks, especially in situations

where the PPI network is noisy and still incomplete. Chua et al. (2008) proposed a method in which all direct and indirect protein interactions are first weighted using a topological weight (FS-Weight) that estimates the strength of functional association. Different clustering methods were then applied to the weighted PPI networks to predict protein complexes.

The functional similarity weight (FS-Weight) is formulated based on the underlying

hypothesis that proteins share functions as a result of two distinct ways of association:

direct functional association through interactions, and indirect functional association

through interactions with common proteins. If two proteins share many common

interaction partners, they are more likely to possess similar properties that allow

them to meet the constraints imposed by these common interaction partners. The FS-Weight

is a measure of overlap between the interaction partners of two proteins. The

higher the overlap between the interaction partners of two proteins is, the higher the

likelihood of them sharing common functions.

The FS-Weight between two proteins u, v is defined as

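$$S_{FS}(u,v) = \frac{2|N_u \cap N_v|}{|N_u - N_v| + 2|N_u \cap N_v| + \lambda_{u,v}} \times \frac{2|N_u \cap N_v|}{|N_v - N_u| + 2|N_u \cap N_v| + \lambda_{v,u}}, \tag{2.19}$$

$$\lambda_{u,v} = \max\bigl(0,\; n_{avg} - (|N_u - N_v| + |N_u \cap N_v|)\bigr). \tag{2.20}$$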

Here Nu is the node set consisting of node u and its nearest (level-1) neighbours, and Nv is defined similarly; navg is the average number of neighbours per node in the PPI network. The FS-Weight can also be extended to take into account the reliability of each individual interaction (Chua et al. 2008).
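Equations (2.19) and (2.20) can be evaluated for one protein pair as in the sketch below. This is our illustration of the basic, unweighted FS-Weight only. The adjacency dictionary and function name are hypothetical, and the caller is assumed to supply navg, for example the mean degree 2|E|/|V|. The penalty λ raises the denominator for proteins whose neighbourhood is smaller than average.

```python
def fs_weight(adj, u, v, n_avg):
    """Basic FS-Weight of (2.19)-(2.20); N_u holds u and its neighbours."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    common = len(nu & nv)

    def lam(a, b):
        # (2.20): penalize neighbourhoods smaller than the average size.
        return max(0.0, n_avg - (len(a - b) + len(a & b)))

    left = 2 * common / (len(nu - nv) + 2 * common + lam(nu, nv))
    right = 2 * common / (len(nv - nu) + 2 * common + lam(nv, nu))
    return left * right


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(fs_weight(path, 0, 1, n_avg=1.5))  # 0.8, i.e. (4/4) * (4/5)
```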

Many methods have been proposed to assess the reliability of protein interactions.

These methods usually assign a score to each protein pair such that the higher the

score is, the more likely the proteins interact with each other. Among these methods,

CD-distance (Brun et al. 2003) and FS-Weight (Chua et al. 2008) are measures

calculated from the number of common neighbours of two proteins. They were initially proposed to predict protein functions and protein complexes, and have been shown to perform well for assessing the reliability of protein interactions.

The intuition behind the iterative scoring method is that if the score of an interaction

reflects its reliability, then the scored interactions should better represent the actual

interaction network than the initial binary ones, and we should be able to further

improve score computation by re-computing the score of each protein pair using the

scored interactions. Liu et al. (2009) use the AdjustCD distance, a variant of the CD-distance, to calculate the scores of protein pairs.

A PPI network can be represented as an undirected network G = (V, E), where the node set V is the set of proteins and the edge set E is the set of interactions between proteins. We use u, v, x to denote individual nodes (proteins) and (u, v) to denote the edge between node u and node v. The neighbour set of a node u in G, denoted by Nu, is defined as Nu = {v | (u, v) ∈ E}. Unlike the sets used in (2.18)–(2.20), this set does not include u itself. For a given pair of proteins u and v, the AdjustCD distance (Liu et al. 2009) of edge (u, v) is defined as

$$\mathrm{AdjustCD}(u,v) = \frac{2|N_u \cap N_v|}{|N_u| + \lambda_u + |N_v| + \lambda_v}, \tag{2.21}$$

where λu and λv are used to penalize proteins with very few neighbours as in FS-

Weight (Chua et al. 2008). They are defined as

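$$\lambda_u = \max\left(0,\; \frac{\sum_{x \in V} |N_x|}{|V|} - |N_u|\right), \tag{2.22}$$

$$\lambda_v = \max\left(0,\; \frac{\sum_{x \in V} |N_x|}{|V|} - |N_v|\right). \tag{2.23}$$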

Based on this definition, if the degree of a node u is below the average degree, then it

is adjusted to the average degree. The iterative version of Adjust CD-distance is

defined as follows:

$$w^k(u,v) = \frac{\sum_{x \in N_u \cap N_v} \left( w^{k-1}(x,u) + w^{k-1}(x,v) \right)}{\sum_{x \in N_u} w^{k-1}(x,u) + \lambda_u^k + \sum_{x \in N_v} w^{k-1}(x,v) + \lambda_v^k}, \tag{2.24}$$

where $w^{k-1}(x,u)$ and $w^{k-1}(x,v)$ are the scores of (x, u) and (x, v), respectively, in the (k-1)-th iteration. Initially, $w^0(x,u) = 1$ if there is an edge between x and u in the original PPI network, and $w^0(x,u) = 0$ otherwise. The two terms $\lambda_u^k$ and $\lambda_v^k$ are also defined based on the weighted degree:

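$$\lambda_u^k = \max\left(0,\; \frac{\sum_{x \in V} \sum_{y \in N_x} w^{k-1}(x,y)}{|V|} - \sum_{x \in N_u} w^{k-1}(x,u)\right), \tag{2.25}$$

$$\lambda_v^k = \max\left(0,\; \frac{\sum_{x \in V} \sum_{y \in N_x} w^{k-1}(x,y)}{|V|} - \sum_{x \in N_v} w^{k-1}(x,v)\right). \tag{2.26}$$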

It is not difficult to see that $w^1(u,v) = \mathrm{AdjustCD}(u,v)$. With binary initial scores, the numerator of (2.24) reduces to $2|N_u \cap N_v|$, the weighted degrees reduce to $|N_u|$ and $|N_v|$, and (2.25)–(2.26) reduce to (2.22)–(2.23). The CD-distance and FS-Weight can be iterated in a similar way. Liu et al. (2009) showed three things: the iterative scoring method can improve the functional homogeneity and localization coherence of top-ranked interactions, it performs best when k = 2, and subsequent iterations do not improve the performance further. Using this method, we generate a weighted PPI network from the original unweighted PPI network, taking the score of each protein pair as the weight of the edge between them.
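The iteration (2.24)–(2.26) can be sketched as below, under stated assumptions. This is our illustration, not the code of Liu et al. (2009). In round k we take the neighbour set Nu to be {x : w^{k-1}(x, u) > 0}, which for k = 1 is exactly the original neighbour set. Only pairs with at least one common neighbour receive a score; all other pairs score 0 and are omitted from the result. The function name is hypothetical, and node labels are assumed to be sortable. The all-pairs loop is quadratic in the number of nodes and is meant only for small examples.

```python
from itertools import combinations


def iterative_adjust_cd(edges, k):
    """k rounds of iterative AdjustCD scoring following (2.24)-(2.26).

    Returns {frozenset({u, v}): score} for pairs with a non-zero score.
    Assumption: in each round, N_u = {x : w^{k-1}(x, u) > 0}.
    """
    w = {frozenset(e): 1.0 for e in edges}  # w^0: binary interactions
    nodes = sorted({x for e in edges for x in e})
    for _ in range(k):
        nbrs = {u: {} for u in nodes}  # u -> {x: w^{k-1}(x, u)}
        for pair, s in w.items():
            a, b = tuple(pair)
            nbrs[a][b] = s
            nbrs[b][a] = s
        strength = {u: sum(nbrs[u].values()) for u in nodes}  # weighted degree
        avg = sum(strength.values()) / len(nodes)
        lam = {u: max(0.0, avg - strength[u]) for u in nodes}  # (2.25), (2.26)
        new_w = {}
        for u, v in combinations(nodes, 2):
            common = nbrs[u].keys() & nbrs[v].keys()
            if not common:
                continue  # score 0: no common neighbour
            num = sum(nbrs[u][x] + nbrs[v][x] for x in common)
            den = strength[u] + lam[u] + strength[v] + lam[v]
            new_w[frozenset((u, v))] = num / den  # (2.24)
        w = new_w
    return w


# Triangle 0-1-2 with a pendant node 3 on node 0.
edges = [(0, 1), (1, 2), (0, 2), (0, 3)]
w1 = iterative_adjust_cd(edges, 1)
print(w1[frozenset((1, 2))])  # 0.5, equal to AdjustCD(1, 2) from (2.21)
```

In this example, the original interaction (0, 3) has no common neighbour and therefore receives no score. The pairs (1, 3) and (2, 3), which do not interact in the original data, do receive scores. The set of scored pairs can therefore differ from the original edge set.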

2.4.2 Results and discussion

In this part, we first use the iterative scoring method to generate weighted PPI

networks based on the original PPI networks and then calculate their fractal

dimensions. For five PPI networks (Homo sapiens, E. coli, Arabidopsis thaliana, C. elegans and baker’s yeast S. cerevisiae), we demonstrate that self-similarity exists in

both the PPI networks and their weighted networks. The fractal dimensions of

weighted PPI networks are slightly smaller than those of the original PPI networks.

Details of the numerical results are presented below.

PPI databases and software

The protein-protein interaction data used here are downloaded from two databases.

The PPI networks of C. elegans and Arabidopsis thaliana were downloaded from BioGRID, and those of baker’s yeast S. cerevisiae, E. coli and Homo sapiens were downloaded from DIP. Our fractal scaling analysis is based on connected networks: there are no isolated nodes, and every node in the network is reachable from every other node. Using Cytoscape, we extracted the largest connected part of each PPI data set and used it as the network in our fractal analysis.
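The thesis performs this extraction in Cytoscape. As a tool-independent illustration, the largest connected component can also be found by a breadth-first search over every component. The adjacency-dictionary format and the function names below are hypothetical.

```python
from collections import deque


def largest_component(adj):
    """Node set of the largest connected component of an undirected graph."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def induced_subgraph(adj, keep):
    """Restrict the adjacency dictionary to the nodes in `keep`."""
    return {u: adj[u] & keep for u in keep}


g = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
print(sorted(largest_component(g)))  # [0, 1, 2]
```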

Fractal scaling on weighted PPI networks

Among the original five PPI networks (human, Arabidopsis thaliana, E.coli, yeast and

C.elegans), self-similarity is apparent. By using the iterative scoring method, we then

transform the PPI networks into weighted networks and calculate their fractal

dimension using the same random sequential box-covering algorithm (Kim et al.

2007a, b).



Remark: In graph theory, the shortest path is a path between two vertices (or nodes)

such that the sum of the weights of its constituent edges is minimized. So for the

unweighted PPI networks the shortest distance between two nodes is the number of

edges in a shortest path connecting them, while for a weighted PPI network it is the minimum sum of the weights of the constituent edges.
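For the weighted case in the Remark, shortest distances can be computed with Dijkstra's algorithm, which requires non-negative weights. The sketch below assumes, as the Remark implies, that each edge's score is used directly as its length. It uses an adjacency dictionary `{u: {v: weight}}`, and the function name is hypothetical. Such distances are real-valued, so box radii on weighted networks would in general not be restricted to integer steps.

```python
import heapq


def dijkstra(adj_w, source):
    """Shortest weighted distances from `source` (non-negative edge weights)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj_w[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


g = {0: {1: 0.5, 2: 2.0}, 1: {0: 0.5, 2: 0.5}, 2: {0: 2.0, 1: 0.5}}
print(dijkstra(g, 0))  # {0: 0.0, 1: 0.5, 2: 1.0}
```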

The fractal scaling of each PPI network and its weighted PPI network is shown in Figs 2.16–2.20. Triangles (∆) denote the original network and circles (○) the weighted network, together with their fitted lines. The fractal dimension is the absolute value of the slope of each fitted line. The fractal dimensions of the weighted PPI networks are slightly smaller than those of the original PPI networks. The numerical results of fractal scaling for the original PPI networks and their weighted PPI networks are summarized in Tables 2.2 and 2.3, respectively. For each PPI network, N is the number of nodes in the largest connected part and dB is the fractal dimension with its error range.

[Plot: ln(NB) versus ln(rB); original PPI, fitted slope −2.20; weighted PPI, fitted slope −1.85]

Fig 2.16 Fractal scaling of the Homo sapiens (Hsapi20041003) PPI



Table 2.2 Numerical results of fractal scaling for the original PPI networks

PPI                    Database   N      dB ± error
Human                  DIP        503    2.20 ± 0.09
E. coli                DIP        642    2.37 ± 0.11
Yeast                  DIP        1922   2.90 ± 0.20
C. elegans             BioGRID    3343   3.48 ± 0.24
Arabidopsis thaliana   BioGRID    1298   2.26 ± 0.06

Table 2.3 Numerical results of fractal scaling for the weighted PPI networks

Weighted PPI           Database   N      dB ± error
Human                  DIP        417    1.85 ± 0.04
E. coli                DIP        451    1.98 ± 0.07
Yeast                  DIP        1713   2.06 ± 0.04
C. elegans             BioGRID    2444   2.04 ± 0.05
Arabidopsis thaliana   BioGRID    800    2.25 ± 0.10

[Plot: ln(NB) versus ln(rB); original PPI, fitted slope −2.26; weighted PPI, fitted slope −2.25]

Fig 2.17 Fractal scaling of the Arabidopsis thaliana PPI



[Plot: ln(NB) versus ln(rB) for the original and weighted PPI networks]


Fig 2.18 Fractal scaling of the E.coli PPI

[Plot: ln(NB) versus ln(rB) for the original and weighted PPI networks]


Fig 2.19 Fractal scaling of the C.elegans PPI



[Plot: ln(NB) versus ln(rB) for the original and weighted PPI networks]


Fig 2.20 Fractal scaling of the Yeast PPI

2.5 Conclusion

In this chapter we introduced the theoretical definition and calculation of fractal

dimension and discussed several practical issues that require attention when

implementing the box counting method. Then we presented several possible box

covering algorithms that could be used to compute the network’s fractal dimension.

Among these methods, we chose the random sequential box-covering algorithm due

to its easy implementation and tested it on protein-protein interaction networks for

fractal scaling.

i. We first tested our programs on Hsapi20041003. The fractal dimension we obtained, 2.32, is quite close to the results of Song et al. (2005) and Kim et al. (2007).

ii. In our results, the fractal dimension of the skeleton is smaller than that of the original network. An explanation was given; this situation occurs mostly with small networks.

iii. In the last part of this chapter, we adopted the iterative scoring method to

generate weighted PPI networks. Using the random sequential box-covering

algorithm, we calculated the fractal dimensions for both the

original unweighted PPI networks and the generated weighted PPI

networks. The results showed that self-similarity is still present in

generated weighted PPI networks. This implies that it is viable to expand

the study of properties of complex networks to a wider field including

more complex weighted networks.



Chapter 3

Multifractal analysis of complex networks

3.1 Introduction

It was shown in Chapter 2 that a fractal object can be characterized by its fractal dimension, which measures how the density varies with length scale. Many well-known fractals, such as the Cantor set, the Koch curve and the Sierpiński triangle, are homogeneous, since they consist of a geometrical figure repeated on an ever-reduced scale. For these objects, the fractal dimension is the same on all scales. Real-world fractals, however, are generally not homogeneous: they display a non-uniformity, with rich scaling and self-similarity properties that can change from point to point. Put plainly, different parts of the object can have different local dimensions. These more complicated objects are known as multifractals, and a continuous spectrum of dimensions is needed to classify them.

The concepts underlying multifractals were originally introduced by Mandelbrot in the discussion of turbulence and were extended to many other contexts by Mandelbrot (1982), Grassberger (1983), Hentschel and Procaccia (1983a, b) and others. Recently these concepts have been applied successfully in many different fields, including time series analysis (Canessa 2000), biological systems (Yu et al. 2001a,b,c, Anh et al. 2002, Yu et al. 2003, 2004, 2006, Zhou 2005) and geophysical systems (Kantelhardt 2006, Veneziano 2006, Venugopal 2006, Yu et al. 2007, 2009, 2010).

In this chapter we explore the multifractal behaviour of different complex networks. Multifractal analysis is a useful way to systematically characterize the spatial heterogeneity of both theoretical and experimental fractal patterns. However, the tools for multifractal analysis of objects in Euclidean space are not suitable for complex networks. We therefore propose a new box-covering algorithm for the multifractal analysis of complex networks. The algorithm is demonstrated by computing the generalized fractal dimensions of some theoretical networks, namely scale-free networks, small-world networks and random networks, and of one kind of real network, namely the PPI networks of different species. As another application, we generate gene interaction networks for patients and healthy people using the correlation coefficients between the microarray data of different genes. Our multifractal analysis yields a potentially useful tool for gene clustering and identification.


Following Feder (1998), we may distinguish a measure (of probability or of some

physical quantity such as mass, energy, or number of individuals) from its geometric

support, which might or might not have fractal geometry. Then, if a measure has

different fractal dimensions on different parts of the support, the measure is a

multifractal. Multifractal measures have been observed in many physical situations,

for example in fluid turbulence, rainfall distribution, mass distribution across the

universe, viscous fingering, neural networks, share prices and in many other

phenomena.

There are two basic approaches to multifractal analysis: fine theory, which examines the structure and dimensions of the fractal sets that arise, and coarse theory, which considers the irregularities in the distribution of the measure of balls (or boxes) of small but positive radius (size) ε and then takes the limit as ε → 0. In many ways, fine multifractal analysis parallels finding the Hausdorff dimension of sets, while coarse theory is related to the box-counting dimension. The fine theory may be more convenient for mathematical analysis, but the coarse approach is usually more practical for analysing physical examples or computer experiments.

Generalized correlation dimension function Dq

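We consider the support of a multifractal as a $D_E$-dimensional space which is divided into $D_E$-dimensional boxes of size ε. Writing $B_i(\varepsilon)$ for the i-th box, the quantity

$$f_i(\varepsilon) = \sum_{x \in B_i(\varepsilon)} \mu(x) \tag{3.1}$$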


denotes the integrated measure on the i-th cube of edge ε. One can then define the qth-order moment of the probability distribution, or partition function, $M_q(\varepsilon)$ (Halsey et al. 1986):

$$M_q(\varepsilon) = \sum_{i=1}^{N(\varepsilon)} f_i^q(\varepsilon), \tag{3.2}$$

where N(ε) is the number of boxes of size ε. The information dimension (Equation 2.15) and the correlation dimension (Equation 2.17) can then be generalized as

$$I_q(\varepsilon) = \frac{1}{1-q}\log M_q(\varepsilon) \tag{3.3}$$

and

$$D(q) = \lim_{\varepsilon\to 0}\frac{I_q(\varepsilon)}{\log(1/\varepsilon)} = \frac{1}{1-q}\lim_{\varepsilon\to 0}\frac{\log M_q(\varepsilon)}{\log(1/\varepsilon)}. \tag{3.4}$$

Here $I_q(\varepsilon)$ is the generalized information, that is, the Rényi information of order q (Rényi 1970), and D(q) is the generalized correlation dimension (Grassberger 1983).

From Equations 3.2 to 3.4, one readily recovers the box dimension, the information dimension and the correlation dimension defined in Chapter 2 as the special cases q = 0, 1 and 2. The box dimension $D_b$, the information dimension $D_{\text{infor}}$ and the correlation dimension $D_{\text{cor}}$ are respectively given by

$$D_b = \lim_{q\to 0} D(q) = D(0), \tag{3.5}$$

$$D_{\text{infor}} = \lim_{q\to 1} D(q) = D(1), \tag{3.6}$$

$$D_{\text{cor}} = \lim_{q\to 2} D(q) = D(2). \tag{3.7}$$

The generalized dimension function D(q) is defined for all real values of q and is estimated as the slope of $I_q(\varepsilon)$ plotted against $\log(1/\varepsilon)$. For monofractal sets, D(q) is constant (equivalently, the mass exponent τ(q) introduced below is a linear function of q); in other words, no additional information is gained by examining higher moments, that is, more extreme values of the measure µ. For multifractal sets, by contrast, D(q) is not constant: it decreases as q increases. Note that there are limiting dimensions $D_{-\infty}$ and $D_{+\infty}$ (the limits of D(q) as q → −∞ and q → +∞), which are related to the regions of the set where the measure µ is sparsest and densest, respectively. For positive values of q, D(q) reflects the scaling of the large fluctuations and strong singularities. In contrast, for negative values of q, D(q) reflects the scaling of the small fluctuations and weak singularities (Vicsek 1993, Takayasu 1997).

Mass exponent function τ(q)

For a given value of ε, the mass m(ε) is expressed as the first moment of the probability distribution, and the partition function of Equation 3.2 scales as

$$M_q(\varepsilon) \propto \varepsilon^{\tau(q)}, \tag{3.8}$$

where τ(q) is the mass exponent function. For monofractal sets, τ(q) is linear, τ(q) = qH − 1, where H is the Hurst exponent. In contrast, multifractal sets display a nonlinear function τ(q). Moreover, for self-similar multifractals, Equations 3.3, 3.4 and 3.8 lead to

$$\tau(q) = (q-1)D(q) = \lim_{\varepsilon\to 0}\frac{\log M_q(\varepsilon)}{\log\varepsilon}. \tag{3.9}$$

Note that τ(0) = −D(0) and τ(1) = 0.

Multifractal spectrum f(α)

The number $\alpha_i = \log\mu_i(\varepsilon)/\log\varepsilon$, also referred to as the Hölder exponent, is the singularity strength of the ith box. This exponent may be interpreted as a crowding index, that is, a measure of the concentration of the measure µ: the greater $\alpha_i$ is, the smaller the concentration of the measure, and vice versa. For every box size ε, the number $N_\alpha(\varepsilon)$ of cells in which the Hölder exponent $\alpha_i$ has a value within the range $[\alpha, \alpha + d\alpha]$ behaves like

$$N_\alpha(\varepsilon) \propto \varepsilon^{-f(\alpha)}. \tag{3.10}$$

The function f(α) is the Hausdorff dimension of the subset whose singularity $\alpha_i$ equals α; that is, f(α) characterizes the abundance of cells with Hölder exponent α and is called the singularity spectrum of the measure. The measure µ is said to be a multifractal measure if its singularity spectrum f(α) ≠ 0 for a range of values of α. The singularity spectrum f(α) and the mass exponent function τ(q) are connected via the Legendre transform (Evertsz and Mandelbrot 1992):

$$\alpha(q) = \frac{d\tau(q)}{dq} \tag{3.11}$$

and

$$f(\alpha(q)) = q\,\alpha(q) - \tau(q). \tag{3.12}$$

Considering the relationship between the mass exponent function τ (q) and the

generalized dimension function D (q), the singularity spectrum f (α) contains exactly

the same information as τ (q) and D (q).

Because calculating f(α(q)) from Equations 3.11 and 3.12 can be highly problematic in practice (it requires numerically differentiating an estimated τ(q)), Chhabra and Jensen (1989) developed a much simpler method for calculating f(α(q)) and α(q) directly for multifractal structures:

$$\alpha(q) = \frac{\sum_{i=1}^{N(\varepsilon)} \mu_i(q,\varepsilon)\log\mu_i(\varepsilon)}{\log\varepsilon}, \tag{3.13}$$

$$f(\alpha(q)) = \frac{\sum_{i=1}^{N(\varepsilon)} \mu_i(q,\varepsilon)\log\mu_i(q,\varepsilon)}{\log\varepsilon}, \tag{3.14}$$

where $\mu_i(q,\varepsilon) = \mu_i(\varepsilon)^q \big/ \sum_{j=1}^{N(\varepsilon)} \mu_j(\varepsilon)^q$. The parameter q provides a scanning tool to scrutinize the denser and rarer regions of the measure µ. For q > 1, regions where µ has a high degree of concentration are amplified, while for q < 1, regions with a small degree of concentration are magnified. For q = 1, the measure itself is replicated. The function f(α(q)) thus gives the entropy dimension of the distorted measure µ(q, ε) and characterizes the original measure µ by analysing its variation under successive distortions driven by the parameter q. The singularity spectrum f(α(q)) takes its maximum value at q = 0 and typically has a parabolic shape around this point. The number f(α(0)) = D(0) is the box dimension (or fractal dimension) of the measure µ, and the number f(α(1)) = α(1) = D(1) is the information dimension.
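As a rough illustration of Equations 3.13 and 3.14, the sketch below applies the direct Chhabra–Jensen formulas at a single box size to the binomial Cantor measure of the following example (weights p1 and 1 − p1, stage k, boxes of size 3^−k). The function names are hypothetical. Because this measure is exactly self-similar, one scale is enough here; for empirical data one would instead regress the two numerators against log ε over several box sizes.

```python
import math

def cantor_box_masses(p1, k):
    """(mass, count) pairs for the stage-k binomial Cantor measure with weights p1, 1 - p1."""
    p2 = 1.0 - p1
    return [(p1**s * p2**(k - s), math.comb(k, s)) for s in range(k + 1)]

def chhabra_jensen(masses, eps, q):
    """alpha(q) and f(alpha(q)) from Equations 3.13-3.14 at one box size eps.

    masses: list of (mu_i, multiplicity) pairs describing the nonempty boxes.
    """
    norm = sum(c * m**q for m, c in masses)    # sum_j mu_j(eps)^q
    alpha_num = f_num = 0.0
    for m, c in masses:
        w = m**q / norm                          # distorted measure mu_i(q, eps)
        alpha_num += c * w * math.log(m)
        f_num += c * w * math.log(w)
    return alpha_num / math.log(eps), f_num / math.log(eps)

k = 10
masses = cantor_box_masses(1/3, k)
for q in (0, 1, 2):
    a, f = chhabra_jensen(masses, 3.0**-k, q)
    print(f"q={q}: alpha={a:.4f}, f={f:.4f}")
```

For p1 = 1/3 the q = 0 line gives f = log 2 / log 3 ≈ 0.6309 = D(0) while α(0) is noticeably larger, and the q = 1 line gives α(1) = f(α(1)) = D(1), in line with the statements above.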

Examples of Cantor multifractal sets

A Cantor multifractal set is constructed by removing the middle third of each segment at each stage and distributing the weight so that the two remaining segments receive fractions p1 and p2 of it, respectively, with p1 + p2 = 1. In terms of the preceding discussion, p1 and p2 can be seen as the measures of the two segments. We consider two examples: (i) p1 = 1/9 and p2 = 8/9; (ii) p1 = 1/3 and p2 = 2/3. The generation of the Cantor multifractal set of Example (ii) is illustrated in Fig 3.1.

Fig 3.1 Construction of the Cantor multifractal in Example (ii), stages k = 0 to 4

At stage k, each segment has length $(1/3)^k$ and there are $N = 2^k$ segments. Assign a unit weight to the original line. Then for k = 1, one line segment has weight $p_1$ and the other has weight $p_2$. For k = 2, there are four segments: one with weight $p_1^2$, two with $p_1 p_2$ and one with $p_2^2$. At stage 3, there are eight segments: one with $p_1^3$, three with $p_1^2 p_2$, three with $p_1 p_2^2$ and one with $p_2^3$. It is not difficult to see that at stage k there are

$$N_s = \binom{k}{s}$$

segments of weight $p_1^s p_2^{k-s}$.
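The counting argument can be checked mechanically. The sketch below (function names are hypothetical, and giving the left child the fraction p1 is an arbitrary convention) records, for every segment at stage k, how many times it inherited p1, and confirms that the multiplicities are the binomial coefficients.

```python
from collections import Counter
from math import comb

def cascade_exponents(k):
    """For each of the 2**k segments at stage k (left to right), return s, the number
    of times the segment inherited the fraction p1, so its weight is p1**s * p2**(k - s)."""
    exponents = [0]                     # stage 0: the whole line, weight 1
    for _ in range(k):
        # split every segment: the left child keeps p1 (s + 1), the right child p2 (s)
        exponents = [s + left for s in exponents for left in (1, 0)]
    return exponents

def segment_weights(p1, p2, k):
    """Weights of the 2**k segments at stage k."""
    return [p1**s * p2**(k - s) for s in cascade_exponents(k)]

counts = Counter(cascade_exponents(3))
print(sorted(counts.items(), reverse=True))   # [(3, 1), (2, 3), (1, 3), (0, 1)]
```

The printed pairs (s, count) reproduce the stage-3 list in the text: one segment with $p_1^3$, three with $p_1^2 p_2$, three with $p_1 p_2^2$ and one with $p_2^3$.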


Fig 3.2 Multifractal spectra for the Cantor multifractal sets of Examples (i) and (ii).

From Equations 3.2, 3.4 and 3.9 we get

$$M_q(\varepsilon) = \left(p_1^q + p_2^q\right)^k, \qquad \tau(q) = -\frac{\log\left(p_1^q + p_2^q\right)}{\log 3}, \qquad D(q) = \frac{1}{1-q}\,\frac{\log\left(p_1^q + p_2^q\right)}{\log 3}$$

(the D(q) spectrum can be plotted using continuity at q = 1). To construct an f(α(q)) spectrum, for each s = 0, 1, ..., k the weight of a segment is $p_s = p_1^s p_2^{k-s}$ and its size is $\varepsilon = (1/3)^k = 3^{-k}$. Hence

$$\alpha_s = -\frac{s\log p_1 + (k-s)\log p_2}{k\log 3},$$

and the number of segments of weight $p_s$ at the kth step is $N_s = \binom{k}{s}$. Hence

$$f_s = \frac{\log\binom{k}{s}}{k\log 3}.$$

The multifractal curves are shown in Fig 3.2. In the left column, curves a, b and c are the multifractal spectra of Example (i); in the right column, curves d, e and f are the multifractal spectra of Example (ii).
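The closed-form expressions above are straightforward to evaluate. The sketch below (names are hypothetical) computes τ(q), D(q), taking the q = 1 value as the continuous limit −(p1 log p1 + p2 log p2)/log 3, and the points (α_s, f_s) for a finite stage k; for Example (ii) it gives D(0) = log 2 / log 3 ≈ 0.6309.

```python
import math

LOG3 = math.log(3)

def tau(q, p1, p2):
    """Mass exponent of the binomial Cantor measure: -log(p1^q + p2^q) / log 3."""
    return -math.log(p1**q + p2**q) / LOG3

def D(q, p1, p2):
    """Generalized dimension D(q) = tau(q)/(q - 1), using the continuous limit at q = 1."""
    if abs(q - 1) < 1e-12:
        return -(p1 * math.log(p1) + p2 * math.log(p2)) / LOG3
    return tau(q, p1, p2) / (q - 1)

def spectrum_points(p1, p2, k):
    """(alpha_s, f_s) for s = 0..k at stage k of the construction."""
    points = []
    for s in range(k + 1):
        alpha_s = -(s * math.log(p1) + (k - s) * math.log(p2)) / (k * LOG3)
        f_s = math.log(math.comb(k, s)) / (k * LOG3)
        points.append((alpha_s, f_s))
    return points

for name, (p1, p2) in {"(i)": (1/9, 8/9), "(ii)": (1/3, 2/3)}.items():
    print(name, [round(D(q, p1, p2), 4) for q in (-10, 0, 1, 2, 10)])
```

For finite k the maximum of f_s lies slightly below log 2 / log 3 and approaches it as k grows, while the α_s values span the interval between −log p2/log 3 and −log p1/log 3.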



3.3 Methods

3.3.1 Box-covering algorithm

The most common algorithm for multifractal analysis is the fixed-size box-covering algorithm. In the one-dimensional case (Halsey et al. 1986), for a given measure 0 ≤ µ ≤ 1 with support $E \subset \mathbb{R}$ in a metric space, we consider the partition sum

$$Z_\varepsilon(q) = \sum_{\mu(B)\neq 0} [\mu(B)]^q, \tag{3.15}$$

where q is a real number and the sum runs over all different non-overlapping boxes B of a given size ε in a grid covering of the support E. It follows that

$$Z_\varepsilon(q) \ge 0 \tag{3.16}$$

and $Z_\varepsilon(1) = 1$.

The mass exponent τ(q) of the measure µ is defined by

$$\tau(q) = \lim_{\varepsilon\to 0}\frac{\ln Z_\varepsilon(q)}{\ln\varepsilon}. \tag{3.17}$$

Proposition 1 The mass exponent τ(q) is an increasing function of q.

Proof. For q1 < q2, it follows from µ being a probability measure (so that $\mu(B_i) \le 1$) that $\mu(B_i)^{q_1} \ge \mu(B_i)^{q_2}$; thus $Z_\varepsilon(q_1) \ge Z_\varepsilon(q_2)$. Since ln ε < 0 as ε → 0, the increasing property of τ(q) follows.

The generalized fractal dimensions of the measure µ are defined as

$$D_q = \frac{\tau(q)}{q-1} \tag{3.18}$$

for q ≠ 1, and

$$D_1 = \lim_{\varepsilon\to 0}\frac{Z_{1,\varepsilon}}{\ln\varepsilon} \tag{3.19}$$

for q = 1, where $Z_{1,\varepsilon} = \sum_{\mu(B)\neq 0}\mu(B)\ln\mu(B)$. The generalized fractal dimensions are numerically estimated through a linear regression of $\ln Z_\varepsilon(q)/(q-1)$ against ln ε for q ≠ 1, and similarly through a linear regression of $Z_{1,\varepsilon}$ against ln ε for q = 1. The value $D_1$ is called the information dimension (introduced in Chapter 2, Section 2.1, Equation 2.15) and $D_2$ the correlation dimension (introduced in Chapter 2, Section 2.1, Equation 2.17).
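The regression recipe can be sketched in a few lines. Here boxes_by_eps maps each box size ε to the list of nonempty box measures µ(B); the function names and the helper that builds the stage-k Cantor boxes (grid boxes of size 3^−k, each holding one segment) are illustrative assumptions. Because the binomial Cantor measure is exactly self-similar, the fitted slopes reproduce the closed-form D(q) of that example.

```python
import math

def partition_sum(measures, q):
    """Z_eps(q): sum of mu(B)**q over the nonempty boxes (Equation 3.15)."""
    return sum(m**q for m in measures if m > 0)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def estimate_Dq(boxes_by_eps, q):
    """Slope of ln Z_eps(q)/(q - 1) (q != 1), or of Z_{1,eps} (q == 1), against ln eps."""
    xs, ys = [], []
    for eps, measures in boxes_by_eps.items():
        xs.append(math.log(eps))
        if q == 1:
            ys.append(sum(m * math.log(m) for m in measures if m > 0))
        else:
            ys.append(math.log(partition_sum(measures, q)) / (q - 1))
    return ols_slope(xs, ys)

def cantor_boxes(p1, p2, k):
    """Hypothetical helper: the 2**k box measures of the stage-k binomial Cantor measure."""
    return [p1**s * p2**(k - s) for s in range(k + 1) for _ in range(math.comb(k, s))]

boxes = {3.0**-k: cantor_boxes(1/3, 2/3, k) for k in range(1, 9)}
print({q: round(estimate_Dq(boxes, q), 4) for q in (-2, 0, 1, 2)})
```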

Proposition 2 Dq is a decreasing function of q for q ≠ 1.

Proof. Combining Equations 3.17 and 3.18 yields, for q ≠ 1,

$$D_q = \lim_{\varepsilon\to 0}\frac{\frac{1}{q-1}\ln Z_\varepsilon(q)}{\ln\varepsilon}. \tag{3.20}$$

We need to consider three cases:

(i) For 1 < q1 ≤ q2 < ∞, we have

$$0 < \frac{1}{q_2-1} \le \frac{1}{q_1-1} < \infty \tag{3.21}$$

and $0 < Z_\varepsilon(q_2) \le Z_\varepsilon(q_1) \le 1$, that is,

$$\ln Z_\varepsilon(q_2) \le \ln Z_\varepsilon(q_1) < 0. \tag{3.22}$$

From Equations 3.20 to 3.22, it is seen that $\frac{1}{q-1}\ln Z_\varepsilon(q)$ increases as a function of q. Thus Dq decreases as a function of q, since ln ε < 0 as ε → 0.

(ii) For 0 < q1 ≤ q2 < 1, we have

$$-\infty < \frac{1}{q_2-1} \le \frac{1}{q_1-1} < -1$$

and $\frac{1}{q_2-1}\ln Z_\varepsilon(q_2) \ge \frac{1}{q_1-1}\ln Z_\varepsilon(q_1)$. Thus Dq decreases as a function of q in this case.

(iii) For −∞ < q1 ≤ q2 < 0, we have

$$-1 < \frac{1}{q_2-1} \le \frac{1}{q_1-1} < 0$$

and also $\frac{1}{q_2-1}\ln Z_\varepsilon(q_2) \ge \frac{1}{q_1-1}\ln Z_\varepsilon(q_1)$. Thus Dq decreases as a function of q in this case.

Lau and Ngai showed in their Proposition 3.4 (page 57) that (i) $\lim_{q\to\infty} D_q = \alpha_{\min}$; (ii) $\lim_{q\to-\infty} D_q = \alpha_{\max}$. This result, together with Proposition 2 and the definition of a multifractal measure given above, leads to a method for determining the multifractality of a probability measure µ:

(i) When αmin = αmax, the function Dq is constant for q ≠ 1 and the measure µ is monofractal;

(ii) When αmin ≠ αmax, Dq is a decreasing function of q and the measure µ is multifractal.

This method is the key element in the next section, where we investigate the multifractality of a variety of networks.
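In practice Dq is only available on a finite grid of q values, so the criterion above has to be applied approximately. The sketch below is one hypothetical way to do this: Dq at the largest and smallest sampled q stand in for αmin and αmax, and the tolerance tol for calling them equal is an assumed parameter, not a value from this thesis.

```python
import math

def classify(q_values, Dq_values, tol=0.01):
    """Apply the monofractal/multifractal criterion to a sampled D_q curve.

    D_q at the smallest and largest sampled q approximate alpha_max and alpha_min;
    tol is a hypothetical threshold below which the two are treated as equal.
    """
    pairs = sorted(zip(q_values, Dq_values))
    spread = pairs[0][1] - pairs[-1][1]        # approx alpha_max - alpha_min
    label = "monofractal" if abs(spread) < tol else "multifractal"
    return label, spread

qs = list(range(-20, 21, 2))                   # even q only, so q = 1 never occurs
print(classify(qs, [1.5] * len(qs)))           # ('monofractal', 0.0)

# closed-form D(q) of the binomial Cantor measure with p1 = 1/3, p2 = 2/3
cantor = [math.log((1/3)**q + (2/3)**q) / ((1 - q) * math.log(3)) for q in qs]
print(classify(qs, cantor))
```

For the Cantor measure the spread is about 0.56 on this grid, somewhat below the exact αmax − αmin = 1 − log 1.5 / log 3 ≈ 0.63, because |q| = 20 is still far from the limits.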

3.3.2 Sand-box algorithm

For feasible computation of the generalized dimensions on real data, Tél et al. (1989) introduced the sandbox method, which is defined as

$$D_q = \lim_{r\to 0} D_q(r/L) = \lim_{r\to 0}\frac{\ln\left\langle [M(r)/M_0]^{q-1}\right\rangle}{\ln(r/L)}\,\frac{1}{q-1}, \quad q\in\mathbb{R}. \tag{3.23}$$

It is derived from the box-covering method but has better convergence. The idea is to choose a point of the object A at random, place a sandbox (i.e. a ball of radius r) around it, and count the number of points of A that fall in this sandbox; this number is M(r) in the above definition. Here L is the linear size of A, M0 is the total number of points in A, and the brackets ⟨·⟩ denote the statistical average over randomly chosen sandbox centres.

In practical use of the sandbox method, the generalized dimension Dq is obtained by performing a linear regression of the logarithm of the sampled data, $\ln\langle [M(r)/M_0]^{q-1}\rangle$, against $(q-1)\ln(r/L)$ and taking the slope as the multifractal dimension. The idea can be illustrated by rewriting the above equation as

$$\ln\left(\left\langle M(r)^{q-1}\right\rangle\right) = D_q(r/L) \times (q-1)\ln(r/L) + (q-1)\ln(M_0). \tag{3.24}$$

First we choose r in an appropriate range [rmin, rmax]. For each chosen r, we compute the statistical average $\langle [M(r)/M_0]^{q-1}\rangle$ over a large number of radius-r sandboxes randomly distributed on A, and plot the data on the $\ln\langle [M(r)/M_0]^{q-1}\rangle$ vs. $(q-1)\ln(r/L)$ plane. We next perform a linear regression and take the slope as an approximation of the multifractal dimension Dq. The value D1 is the information dimension and D2 the correlation dimension of the measure. The Dq values for positive q are associated with the regions where the points are dense, whereas the Dq values for negative q are associated with the structure and properties of the most rarefied regions.
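A minimal sketch of the sandbox estimate for a finite point set follows, assuming Euclidean distances, sandbox centres drawn from the points themselves, L taken as the largest coordinate extent of the set, and q ≠ 1 (at q = 1 both axes of the regression vanish). The function name and defaults are hypothetical.

```python
import math
import random

def sandbox_Dq(points, q, radii, n_centres=200, seed=0):
    """Sandbox estimate of D_q (Equations 3.23-3.24) for a finite point set, q != 1.

    points: list of coordinate tuples; radii: sandbox radii within [r_min, r_max].
    """
    if q == 1:
        raise ValueError("q = 1 requires the information-dimension limit")
    rng = random.Random(seed)
    M0 = len(points)
    # linear size L: largest extent along any coordinate (an assumption)
    L = max(max(p[d] for p in points) - min(p[d] for p in points)
            for d in range(len(points[0])))
    centres = [rng.choice(points) for _ in range(n_centres)]
    xs, ys = [], []
    for r in radii:
        # <[M(r)/M0]^(q-1)> averaged over the random sandboxes
        avg = sum((sum(1 for p in points if math.dist(p, c) <= r) / M0) ** (q - 1)
                  for c in centres) / n_centres
        xs.append((q - 1) * math.log(r / L))
        ys.append(math.log(avg))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

line = [(float(i),) for i in range(400)]       # a uniform 1-D point set
print(round(sandbox_Dq(line, 2, [4, 8, 16, 32]), 3))
```

For this uniform line the estimate comes out a little below 1, because M(r) = 2r + 1 rather than exactly proportional to r and because sandboxes near the ends are truncated.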

3.3.3 Algorithms for multifractal analysis of networks

Sand box algorithm applied on networks

As introduced above, the sandbox method is easy to use and can be applied directly to networks. For a given box size rB, we randomly choose a node as the centre of a box, cover all the nodes within distance rB of the centre node, and count the number of nodes covered by this box, denoted $M(r_B)$. For each rB, we compute the statistical average $\langle [M(r_B)/M_0]^{q-1}\rangle$ over a large number of randomly centred sandboxes of radius rB, and then plot the data on the $\ln\langle [M(r_B)/M_0]^{q-1}\rangle$ vs. $(q-1)\ln(r_B/L)$ plane.

Note that in each step we choose only one node as the centre of a box, and we obtain exactly one box no matter how many nodes it covers; moreover, a node may be chosen as a centre, or covered, more than once. We illustrate the sandbox process in Fig 3.3. For rB = 1, in step one node 1 is randomly chosen as the centre node, and the four other nodes covered by the same box are coloured red; in step two node 2 is randomly chosen as the centre node, and the two other nodes covered by its box are coloured green. Node 3, which is covered in both steps, is coloured yellow.
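The network version can be sketched with breadth-first search for the hop distances. This is an illustration under stated assumptions rather than the thesis code: the graph is connected and given as an adjacency dict, M0 is the number of nodes N, L is taken as the network diameter, q ≠ 1, and the function names are hypothetical. As described above, centres are drawn with replacement and boxes may overlap.

```python
import math
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as {node: neighbours}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_sandbox_Dq(adj, q, radii, n_centres=200, seed=0):
    """Sandbox estimate of D_q for a connected network, q != 1."""
    if q == 1:
        raise ValueError("q = 1 requires the information-dimension limit")
    rng = random.Random(seed)
    nodes = list(adj)
    N = len(nodes)                                       # M0 = N
    dists = {u: bfs_distances(adj, u) for u in nodes}    # all-pairs BFS: small graphs only
    L = max(max(d.values()) for d in dists.values())     # network diameter
    centres = [rng.choice(nodes) for _ in range(n_centres)]
    xs, ys = [], []
    for r in radii:
        # M(r_B): nodes within distance r_B of the centre, averaged as [M/N]^(q-1)
        avg = sum((sum(1 for d in dists[c].values() if d <= r) / N) ** (q - 1)
                  for c in centres) / n_centres
        xs.append((q - 1) * math.log(r / L))
        ys.append(math.log(avg))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

N = 400
ring = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}   # a cycle: M(r_B) = 2 r_B + 1
print(round(network_sandbox_Dq(ring, 2, [2, 4, 8, 16, 32]), 3))
```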

Fig 3.3 Illustration of Sand-box algorithm

Discussion of box-covering algorithm applied on networks

In Chapter 2, Section 2.2.1, we introduced five box-covering algorithms, namely the greedy colouring algorithm (Song et al. 2007), burning with the diameter lB (Song et al. 2007), compact-box-burning (CBB) (Song et al. 2007), maximum-excluded-mass-burning (MEMB) (Song et al. 2007) and random sequential box-covering (Kim et al. 2007a, b). All these box-covering algorithms can be used to calculate the fractal dimensions of complex networks. However, they cannot be used directly to analyse the multifractal behaviour of networks. The box-covering method involves a random choice of the centre of each box, which affects the number of boxes needed for a fixed box size. In particular, if a node with large degree (a hub) is chosen, many more nodes can be covered within one box, which makes the covering efficient; if a node with small degree is chosen first, only a few nodes can be covered within the same box. As a result, the partition sum defined in Equation 3.15 changes each time the box covering is performed.


We illustrate this situation in Fig 3.4. Consider a network of eight nodes. In Fig 3.4 A, for a fixed box size rB = 1, node a is first chosen as the centre of a box, and nodes a and b are covered by the same box, coloured black. Next, node f is chosen as a centre, and nodes b, c, d, e and g are all within distance rB = 1 of it. Since node b has already been covered in the previous step, nodes c, d, e, f and g are covered by this box, coloured blue. In the last step, node g is chosen as the centre of a box, and its neighbour h is the only node within distance rB = 1 not yet covered, so h alone is covered by a box coloured red. In summary, three boxes are needed to cover the entire network.

Fig 3.4 Box-covering algorithm could result in different numbers of boxes needed to cover the entire network.

[Panels A and B: the same eight-node network with nodes a to h]

In Fig 3.4 B, for the same fixed box size rB = 1, node h is first chosen as the centre of a box, and nodes h and g are covered by the same box, coloured red. Next, node f is chosen as a centre, and nodes e and g are within distance rB = 1 of it. Since node g has already been covered, nodes e and f are covered by a box coloured blue. Node d is then chosen as the centre of a box; since its two neighbours f and g have already been covered, d alone is covered by a box coloured brown. Likewise, node c is chosen and covered alone by a box coloured green. In the last step, node a is chosen as a centre, and nodes a and b are covered by one box coloured black. In summary, five boxes are needed to cover the entire network. The partition sums for the two cases in Fig 3.4 A and B are therefore different.
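The order dependence is easy to reproduce on a toy graph. The sketch below covers a star graph (a hypothetical example, not the network of Fig 3.4) with boxes of radius rB = 1, trying centres in two different orders; each box takes every still-uncovered node within distance rB of its centre, and empty boxes are discarded. The function names are illustrative.

```python
def cover_in_order(dist, order, r):
    """Fixed-size box covering that tries centres in the given order.

    dist: dict-of-dicts of shortest-path distances. Returns the box sizes N_B.
    """
    uncovered = set(dist)
    sizes = []
    for centre in order:
        box = {v for v in uncovered if dist[centre][v] <= r}
        if box:                       # boxes that would add no new node are discarded
            sizes.append(len(box))
            uncovered -= box
        if not uncovered:
            break
    return sizes

def partition_sum(sizes, q):
    """Z(q) with mu(B) = N_B / N."""
    N = sum(sizes)
    return sum((n / N) ** q for n in sizes)

# a star: hub 0 joined to leaves 1..4
dist = {0: {0: 0, 1: 1, 2: 1, 3: 1, 4: 1}}
for i in range(1, 5):
    dist[i] = {j: 0 if j == i else (1 if j == 0 else 2) for j in range(5)}

hub_first = cover_in_order(dist, [0, 1, 2, 3, 4], r=1)    # [5]
leaf_first = cover_in_order(dist, [1, 2, 3, 4, 0], r=1)   # [2, 1, 1, 1]
print(partition_sum(hub_first, 2), partition_sum(leaf_first, 2))  # 1.0 versus about 0.28
```

Starting from the hub needs one box, starting from a leaf needs four, and Z(2) changes accordingly, which is exactly the effect that the averaging below is meant to remove.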

Modified fixed-size box-counting algorithm

To avoid this effect, we propose to average the partition sum over a large number of random coverings, and we accordingly modify the original fixed-size box-covering algorithm into a new method. To our knowledge, this is the first time such an averaging approach has been introduced to analyse the multifractal behaviour of complex networks.

We need to calculate the shortest-path distance matrix for each network and these

matrices are the input data for fractal and multifractal analyses. We describe the

process as follows.

(i) Transform the list of edges (pairs of nodes) of the network into a matrix $A_{N\times N}$, where N is the number of nodes of the network. The matrix $A_{N\times N}$ is symmetric with elements $a_{ij} = 0$ or 1: $a_{ij} = 1$ when there is an edge between node i and node j, and $a_{ij} = 0$ otherwise. We assume that no node has an edge with itself, so $a_{ii} = 0$.

Remark: $A_{N\times N}$ can also be used as input for calculating the degree distribution and the characteristic path length, to check whether the network has a scale-free degree distribution and the small-world property.

(ii) Compute the shortest path lengths between all pairs of connected nodes and store them in another matrix $B_{N\times N}$.

Remark: In graph theory, computing shortest paths is a fundamental problem, and many algorithms exist for solving it. In our approach, we use Dijkstra's algorithm (Dijkstra 1959) from the Matlab toolbox.
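Steps (i) and (ii) can be sketched as follows. For an unweighted network, a breadth-first search from every node yields the same shortest-path lengths as Dijkstra's algorithm; it is used here only to keep the sketch free of dependencies. The function names and the 0-based node labels are assumptions.

```python
import math
from collections import deque

def adjacency_matrix(edges, n):
    """Step (i): symmetric 0/1 matrix A with a_ij = 1 iff i and j share an edge; a_ii = 0."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i != j:                      # self-connections are ignored
            A[i][j] = A[j][i] = 1
    return A

def shortest_path_matrix(A):
    """Step (ii): B[i][j] = shortest-path length (math.inf if j is unreachable from i)."""
    n = len(A)
    neighbours = [[j for j in range(n) if A[i][j]] for i in range(n)]
    B = [[math.inf] * n for _ in range(n)]
    for s in range(n):                  # one breadth-first search per source node
        B[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if B[s][v] == math.inf:
                    B[s][v] = B[s][u] + 1
                    queue.append(v)
    return B

A = adjacency_matrix([(0, 1), (1, 2), (2, 3), (3, 3)], 4)
B = shortest_path_matrix(A)
diameter = max(max(row) for row in B)
print(B[0], diameter)   # [0, 1, 2, 3] 3
```

The largest finite entry of B is the network diameter d used in step (iii) below.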

After the above steps, we use the matrix $B_{N\times N}$ as input data for multifractal analysis based on our modified fixed-size box-counting algorithm, as follows:

(i) Initially, all the nodes in the network are marked as uncovered, and no node has been chosen as a seed or centre of a box.

(ii) Choose the number T of random sequences appropriately according to the number of nodes in the network, and for t = 1, 2, …, T arrange the nodes in a random order. In each sequence, the nodes are taken in this random order as seeds (centres) of boxes.

Remark: T is the number of random sequences, and hence the number of coverings over which we average the partition sum $Z_r(q)$. In our study we set T = 200 for all the networks so that the results can be compared.

(iii) Set the size r of the box in the range r ∈ [1, d], where d is the diameter of the network.

Remark: When r = 1, all nodes covered by the same box must be directly connected to the centre of the box. When r = d, the entire network is covered by a single box, no matter which node is chosen as its centre.

(iv) For each box centre, search for all nodes within distance r and cover those that have not yet been covered.

(v) If no newly covered nodes are found, the box is discarded.

(vi) For each nonempty box B, define its measure as $\mu(B) = N_B/N$, where $N_B$ is the number of nodes covered by the box B and N is the number of nodes of the entire network.

(vii) Repeat steps (iv) to (vi) until all nodes are assigned to their respective boxes.

(viii) When the box covering is finished, calculate the partition sum $Z_r(q) = \sum_{\mu(B)\neq 0}[\mu(B)]^q$ for each value of r.

(ix) Repeat the covering, steps (i) and (iv) to (viii), for each of the T random sequences, take the average of the partition sums, $\bar{Z}_r(q) = \frac{1}{T}\sum_{t=1}^{T} Z_r^{(t)}(q)$, and use $\bar{Z}_r(q)$ for linear regression.
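The covering loop of steps (i) to (ix) might be sketched as follows for one box size r. This is a minimal illustration under stated assumptions: a connected network, distances supplied as a dict-of-dicts (the matrix B), a seeded pseudo-random generator, and averaging $Z_{1,r}$ over the T sequences in the same way as $Z_r(q)$. The function name is hypothetical.

```python
import math
import random

def averaged_partition_sums(dist, r, qs, T=200, seed=0):
    """Average Z_r(q) and Z_{1,r} over T random sequences of box centres.

    dist: dict-of-dicts of shortest-path lengths of a connected network.
    Returns ({q: mean Z_r(q)}, mean Z_{1,r}).
    """
    rng = random.Random(seed)
    nodes = list(dist)
    N = len(nodes)
    z_total = {q: 0.0 for q in qs}
    z1_total = 0.0
    for _ in range(T):
        order = nodes[:]
        rng.shuffle(order)                   # step (ii): one random sequence of centres
        uncovered = set(nodes)               # step (i): all nodes start uncovered
        measures = []
        for centre in order:                 # a centre may already be covered itself
            box = [v for v in uncovered if dist[centre][v] <= r]    # step (iv)
            if not box:                      # step (v): discard boxes with no new nodes
                continue
            uncovered.difference_update(box)
            measures.append(len(box) / N)    # step (vi): mu(B) = N_B / N
            if not uncovered:                # step (vii): every node assigned
                break
        for q in qs:                         # step (viii): partition sum of this covering
            z_total[q] += sum(m ** q for m in measures)
        z1_total += sum(m * math.log(m) for m in measures)
    return {q: z / T for q, z in z_total.items()}, z1_total / T    # step (ix)

N = 30
ring = {i: {j: min(abs(i - j), N - abs(i - j)) for j in range(N)} for i in range(N)}
Z, Z1 = averaged_partition_sums(ring, r=1, qs=[0, 2], T=50)
print(Z, Z1)
```

Repeating the call for every r in [1, d] gives the averaged partition sums used in the regression; Z_r(0) is simply the average number of boxes.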

Fig 3.5 Linear regression

Linear regression

Linear regression is an essential step both for finding the appropriate range $r \in [r_{\min}, r_{\max}]$ and for obtaining the generalized fractal dimensions Dq. In our approach, we regress $\ln\bar{Z}_r(q)/(q-1)$, where $\bar{Z}_r(q)$ is the averaged partition sum, against ln(r/d) for q ≠ 1, and similarly $Z_{1,r}$ against ln(r/d) for q = 1, where $Z_{1,r} = \sum_{\mu(B)\neq 0}\mu(B)\ln\mu(B)$ and d is the diameter of the network. An example of linear regression for the Arabidopsis thaliana PPI network is shown in Fig. 3.5. The numerical results show that the best fit occurs in the range r ∈ (1, 9); hence we select the range r ∈ [1, 9] to perform multifractal analysis and obtain the spectrum of generalized dimensions Dq.
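The regression step can be sketched as follows. The input layout (dicts keyed by r and q) and the function names are assumptions. The usage example feeds in made-up partition sums that follow an exact power law $\bar{Z}_r(q) = (r/d)^{(q-1)D(q)}$, not data from this thesis, so the fit should return the assumed D(q).

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def generalized_dimensions(avg_Z, avg_Z1, d, r_min, r_max):
    """D_q by linear regression against ln(r/d) over box sizes r in [r_min, r_max].

    avg_Z: {r: {q: averaged Z_r(q)}}; avg_Z1: {r: averaged Z_{1,r}}; d: network diameter.
    """
    rs = sorted(r for r in avg_Z if r_min <= r <= r_max)
    xs = [math.log(r / d) for r in rs]
    D = {}
    for q in avg_Z[rs[0]]:
        if q == 1:
            ys = [avg_Z1[r] for r in rs]                          # Z_{1,r} vs ln(r/d)
        else:
            ys = [math.log(avg_Z[r][q]) / (q - 1) for r in rs]    # ln Z_r(q)/(q-1) vs ln(r/d)
        D[q] = ols_slope(xs, ys)
    return D

# synthetic check with made-up exponents
d = 10
true_D = {-2: 2.4, 0: 2.0, 1: 1.8, 2: 1.6}
avg_Z = {r: {q: (r / d) ** ((q - 1) * Dq) for q, Dq in true_D.items()} for r in range(1, 11)}
avg_Z1 = {r: true_D[1] * math.log(r / d) for r in range(1, 11)}
print(generalized_dimensions(avg_Z, avg_Z1, d, r_min=1, r_max=9))
```

With real networks the choice of [r_min, r_max] matters, since boxes close to the diameter cover almost the whole network and flatten the curves.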

Flow-chart 3.1 shows the basic steps of our modified fixed-size box covering method

to calculate the generalized fractal dimension Dq of complex networks.

After the spectrum of generalized dimensions Dq has been obtained, we use

$$\Delta D_q = \max D_q - \lim_{q\to+\infty} D_q \tag{3.25}$$

to measure how much Dq changes along each curve. The quantity ∆Dq has been defined in the literature to describe the density of an object. In this work, based on our modified fixed-size box-covering method, ∆Dq helps to characterize how the edge density changes across a complex network: a larger value of ∆Dq means that the edge distribution is more uneven. More specifically, the edge distribution of a network can vary from areas of hubs, where edges are dense, to areas where nodes are connected by only a few links.

In the following sections, we calculate the generalized fractal dimensions Dq of complex networks. From the shape of the Dq curve we can infer their possible multifractality via the method described in Subsection 3.3.1. For a monofractal system, which has the same scaling behaviour everywhere, Dq is a constant independent of q. For multifractal objects, Dq is a nonconstant, decreasing curve. We can then calculate ∆Dq to measure how Dq changes along each curve.

Flow-chart 3.1: Multifractal analysis of complex networks

Choose network G (should be a connected graph) of size N

Calculate the adjacency matrix $A_{N\times N}$

Calculate the shortest path length matrix $B_{N\times N}$

Set the size of the box in the range r ∈ [1, d], where d is the diameter of G

For each fixed value of r, perform box covering T times and calculate the partition sums

Take the average of the partition sums over the T coverings for linear regression

Through linear regression, choose the appropriate range r ∈ [rmin, rmax]

Set the size of the box in the range r ∈ [rmin, rmax]

For each fixed value of r, perform box covering and calculate the generalized dimensions Dq

3.4 Multifractality of theoretical networks

In recent years, with the development of technology, research on networks has shifted away from the analysis of single small graphs and the properties of individual vertices or edges within such graphs towards the large-scale statistical properties of complex networks. Newman (2001 a, b and 2003) reviewed recent work on the structure and function of networked systems such as the Internet, the World Wide Web, social networks and a variety of biological networks. Besides reviewing empirical studies, the author also focused on a number of statistical properties of networks, including path lengths, degree distributions, clustering and resilience. In this section, we turn to another aspect of networks, namely their multifractality. We aim to develop a tool based on this property to characterize and classify real-world networks. It has been shown that many real complex networks share distinctive characteristics that differ in many ways from those of random and regular networks (Lee and Jung 2006, Guo and Cai 2009, Newman 2001 a, b, Newman 2003). Two fundamental properties of complex networks, the small-world effect and the scale-free degree distribution, have attracted much attention recently, and they have in fact been found in many naturally occurring networks. In Subsections 3.4.1, 3.4.2 and 3.4.3, we generate scale-free networks using the BA model of Barabasi and Albert (1999), small-world networks using the NW model of Newman and Watts (1999), and random networks using the ER model of Erdös and Rényi (1960), respectively. We then apply our modified fixed-size box-counting algorithm to analyse the multifractal behaviour of these networks.

3.4.1 Multifractality of scale-free networks

Some real networks display a power-law degree distribution $P(k) \sim k^{-\gamma}$, where the exponent varies in the range 2 < γ < 3. Such networks have been named scale-free networks (Albert and Barabasi 2002, Barabasi and Albert 1999). Power laws play a particular role in statistical physics because of their connections to phase transitions and fractals. Networks whose degree distribution follows a power law have a small number of high-degree nodes and a large number of low-degree nodes. In nonmathematical terms, a scale-free network has a few nodes linked to many other nodes and a large number of poorly connected nodes. The rare nodes with high degree are called hubs. Scale-free networks have been the focus of a great deal of attention (Albert and Barabasi 2002, Dorogovtsev and Mendes 2002, Strogatz 2001).

Fig 3.6 Arabidopsis thaliana PPI network

To illustrate the power-law degree distribution of scale-free networks, we use the Arabidopsis thaliana protein-protein interaction network, shown in Fig 3.6, as an example. The Arabidopsis thaliana PPI was downloaded from the BioGRID dataset. The original data include 1770 nodes and 3896 edges; we take the largest connected component, which includes 1298 nodes and 2767 edges (self-connections were not included). We then arrange the nodes in a random sequence and plot the degree of each node sequentially in Fig 3.7. Fig 3.8 shows the degree frequency $N_i/N$, where $N_i$ is the number of nodes with degree i and N is the total number of nodes. Most nodes have small degrees, which gives a large frequency for such degree values, while only a few nodes have large degrees. Fig 3.9 shows the power-law degree distribution; the blue straight line drawn for guidance has a slope corresponding to a power-law exponent of 1.71 ± 0.08.

Fig 3.7 Node degree sequence of Arabidopsis thaliana PPI network

Fig 3.8 Node degree frequency of Arabidopsis thaliana PPI network


Fig 3.9 Degree distribution of Arabidopsis thaliana PPI network

Generating scale-free networks

We use the BA model (Barabasi and Albert 1999) to generate scale-free networks as

follows:

(i) The network begins with a seed network of n nodes, where n ≥ 2, and each node in the seed network should have degree at least 1; otherwise it will always remain disconnected from the rest of the network.

(ii) Then we add one node at a time to this initial network. Each new node is connected to n0 < n existing nodes with a probability that is proportional to the number of links that the existing nodes already have. Formally, the probability pi that the new node is connected to node i is

$$p_i = \frac{k_i}{\sum_j k_j}, \tag{3.26}$$

where ki is the degree of node i. Numerical simulations and analytical results show that this network evolves into a scale-invariant state in which the probability that a node has k edges follows a power law with an exponent γ. This means that hubs (nodes with high degree) tend to accumulate even more links quickly, while nodes with only a few links are unlikely to be chosen as the destination for a new link.
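Steps (i) and (ii) can be sketched as follows. This is a minimal illustration, not the code used for the thesis; the function name `ba_network` and the "stub list" trick for sampling with probability proportional to degree are assumptions of the sketch. When n0 > 1, repeated draws of the same node are discarded so that the n0 targets are distinct, which is the usual way of implementing Eq. (3.26) in practice.

```python
import random

def ba_network(seed_edges, n_total, n0, rng=None):
    """Grow a scale-free network by preferential attachment (BA model).

    seed_edges: edges of the seed network on nodes 0..n-1, every node
    having degree >= 1. Each new node links to n0 distinct existing nodes,
    each drawn with probability p_i = k_i / sum_j k_j as in Eq. (3.26).
    """
    rng = rng or random.Random()
    edges = list(seed_edges)
    # Node i occurs k_i times in `stubs`, so a uniform draw from this list
    # picks node i with probability k_i / sum_j k_j.
    stubs = [v for edge in edges for v in edge]
    n = max(stubs) + 1
    if not 1 <= n0 < n:
        raise ValueError("need 1 <= n0 < n")
    for new in range(n, n_total):
        targets = set()
        while len(targets) < n0:
            # Repeated draws are discarded so that the n0 targets are distinct.
            targets.add(rng.choice(stubs))
        for old in targets:
            edges.append((new, old))
            stubs.extend((new, old))
    return edges
```

Because every new edge adds one entry for each endpoint to `stubs`, the degree bookkeeping stays exact without recomputing the sum in the denominator of Eq. (3.26).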



Analytical solution for the BA model

The model can be solved analytically by setting up a differential equation in which the rate at which a node acquires links equals n0 (the number of links added at each step) times the probability of acquiring a link:

$$\frac{dk_i}{dt} = n_0 \frac{k_i}{\sum_j k_j}, \tag{3.27}$$

where ki is the degree of node i. This equation can be simplified by noting that n0 links are added at each step, each adding two to the total degree, so that

$$\sum_j k_j = 2 n_0 t \tag{3.28}$$

and

$$\frac{dk_i}{dt} = \frac{k_i}{2t}. \tag{3.29}$$

Integrating gives

$$\ln k_i = \frac{1}{2}\ln t + C, \tag{3.30}$$

where C is an integration constant, which can be determined from the fact that the ith node, which arrives in the network at time ti, has degree n0. Thus

$$k_i(t) = n_0 \left(\frac{t}{t_i}\right)^{1/2}. \tag{3.31}$$

The degree distribution can be calculated by finding the probability that a node has a degree smaller than k:

$$P(k_i(t) < k) = P\left(t_i > \frac{n_0^2 t}{k^2}\right) = 1 - P\left(t_i \le \frac{n_0^2 t}{k^2}\right). \tag{3.32}$$

Without loss of generality we can assume that nodes are added at a constant rate, so that the arrival times ti are uniformly distributed with density

$$P(t_i) = \frac{1}{m + t}, \tag{3.33}$$

where m is the total degree of nodes in the initial network. Using this distribution,

$$P(k_i(t) < k) = 1 - \frac{n_0^2 t}{k^2 (m + t)}. \tag{3.34}$$

Finally, we obtain the degree distribution by differentiating with respect to k:

$$P(k_i = k) = \frac{\partial P(k_i(t) < k)}{\partial k} = \frac{2 n_0^2 t}{m + t}\,\frac{1}{k^3}. \tag{3.35}$$
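A quick numerical check of the last step is to compare a central-difference derivative of Eq. (3.34) with Eq. (3.35). The sketch below does this for n0 = 1 and m = 8, the total degree of the 5-node, 4-edge seed used in the example that follows; the time t = 1000 and the function names are arbitrary choices for illustration.

```python
def cdf_degree(k, n0, t, m):
    """P(k_i(t) < k) from Eq. (3.34)."""
    return 1.0 - n0 ** 2 * t / (k ** 2 * (m + t))

def pdf_degree(k, n0, t, m):
    """P(k_i = k) from Eq. (3.35)."""
    return 2.0 * n0 ** 2 * t / ((m + t) * k ** 3)

def central_difference(f, x, h=1e-5):
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Illustrative parameters: one link per new node, seed total degree m = 8.
n0, m, t = 1, 8, 1000
for k in (2.0, 5.0, 20.0):
    numeric = central_difference(lambda x: cdf_degree(x, n0, t, m), k)
    print(k, numeric, pdf_degree(k, n0, t, m))
```

The two columns agree to within the finite-difference error, and doubling k divides Eq. (3.35) by 8, which is the k^(−3) law. For large t the cumulative probability at k = n0 also tends to zero, consistent with every node having degree at least n0.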



This mechanism was first introduced by Yule in the early 20th century to explain the distribution of different taxa, and was later generalized by Price in the 1970s, who termed it cumulative advantage. The general version of Price's model can be found in the original paper as well as in Newman (2005). In any case, the lesson to be learned from this is that whenever we find a system in which the probability that a quantity increases is proportional to its current value, we should expect its distribution to follow a power law.

Example of BA model

(i) Initially, the network begins with a seed network of 5 nodes whose adjacency matrix is

$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1\\
1 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 1 & 1 & 0 & 0\\
1 & 0 & 0 & 0 & 0
\end{pmatrix}.$$

(ii) Then we add one node at a time to this initial network. Each new node is connected to only one existing node. Fig 3.10 shows four generated scale-free networks of 30 nodes (upper left), 60 nodes (upper right), 100 nodes (bottom left) and 500 nodes (bottom right), all generated from the same seed.
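The example can be reproduced in outline as below. The seed is the adjacency matrix above; each new node attaches by one link chosen with probability proportional to degree. Whether the four networks of Fig 3.10 are snapshots of a single growth run or independent runs from the same seed is not stated, so taking snapshots of one run is an assumption of this sketch, as are the names `SEED` and `grow_from_seed`.

```python
import random

# Seed adjacency matrix of the example (nodes 0..4).
SEED = [
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
]

def grow_from_seed(adj, sizes, rng):
    """Grow one network from `adj`, attaching each new node by a single link
    chosen with probability proportional to degree; return edge lists
    recorded when the network reaches each size in `sizes`."""
    n = len(adj)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]
    stubs = [v for edge in edges for v in edge]  # node i appears k_i times
    snapshots = {}
    for new in range(n, max(sizes)):
        old = rng.choice(stubs)
        edges.append((new, old))
        stubs.extend((new, old))
        if new + 1 in sizes:  # the network now has new + 1 nodes
            snapshots[new + 1] = list(edges)
    return snapshots

snapshots = grow_from_seed(SEED, {30, 60, 100, 500}, random.Random(2011))
```

Because the seed is a tree with 4 edges and every new node brings exactly one edge, a network of N nodes grown this way has N − 1 edges, which matches the E column of Table 3.1.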



Fig 3.10 Generating scale-free networks

Numerical results

In this chapter, all scale-free networks are generated from the same seed, namely the initial network of 5 nodes whose adjacency matrix is the same as in the example above (Fig 3.10). For better comparison, at each step one node is added to the network with one link. We then apply the modified fixed-size box-covering method to these networks to detect their multifractal behaviour.
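The modified fixed-size box-covering method is defined earlier in the thesis and is not restated here. Purely as a rough illustration of the general idea, the sketch below implements a plain fixed-size box-covering estimate of Dq: randomly ordered uncovered nodes become box centres, each box holds the uncovered nodes within shortest-path distance r, μ = (nodes in box)/N, Z(q, r) = Σ μ^q, and Dq = τ(q)/(q − 1) with τ(q) the least-squares slope of ln Z against ln r. The random ordering of centres, the averaging of ln Z over coverings and all function names are assumptions; the thesis's modified version may differ in these details.

```python
import math
import random
from collections import deque

def ball(adj, centre, r, allowed):
    """Nodes of `allowed` within shortest-path distance r of `centre`."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [v for v in dist if v in allowed]

def box_sizes(adj, r, rng):
    """One covering: visit nodes in random order; each still-uncovered node
    becomes a centre whose box holds the uncovered nodes within distance r."""
    order = list(adj)
    rng.shuffle(order)
    uncovered = set(order)
    sizes = []
    for centre in order:
        if centre in uncovered:
            box = ball(adj, centre, r, uncovered)
            uncovered.difference_update(box)
            sizes.append(len(box))
    return sizes

def mean_log_partition(adj, r, q, rng, repeats):
    """Average over coverings of ln Z(q, r), or of sum mu ln mu when q == 1."""
    n = len(adj)
    total = 0.0
    for _ in range(repeats):
        mus = [s / n for s in box_sizes(adj, r, rng)]
        if q == 1:
            total += sum(mu * math.log(mu) for mu in mus)
        else:
            total += math.log(sum(mu ** q for mu in mus))
    return total / repeats

def generalized_dimension(adj, q, radii, rng=None, repeats=10):
    """Dq = tau(q) / (q - 1), tau(q) being the least-squares slope against ln r.

    The slope against ln(r / d) is identical, since the diameter d is constant.
    """
    rng = rng or random.Random()
    xs = [math.log(r) for r in radii]
    ys = [mean_log_partition(adj, r, q, rng, repeats) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope if q == 1 else slope / (q - 1)
```

On a long cycle, which is one-dimensional, this estimator returns values of D0 and D2 close to one, in line with the statement below that regular networks have dimension one.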

We summarize the numerical results in Table 3.1, including the number of nodes (N), number of edges (E), diameter of the network (d), power-law exponent (γ), maximum value of Dq, limit of Dq, and ∆Dq. These results show that scale-free networks of larger size (more nodes and more edges) tend to have larger maximum and limit values of Dq. In other words, Dq tends to increase with the size of a scale-free network. A possible explanation is that larger scale-free networks usually have more hubs, which make the structure of the network more complex.



Table 3.1 Numerical results of generated scale-free networks

N E γ d rB max Dq lim Dq ∆Dq

500 499 1.939 ± 0.151 13 1 ~ 8 2.6678 1.3553 1.3125

1000 999 2.018 ± 0.069 16 1 ~ 7 2.9288 1.4710 1.4578

1500 1499 2.095 ± 0.044 17 1 ~ 10 2.9573 1.6539 1.3034

2000 1999 1.988 ± 0.076 20 1 ~ 9 3.0501 1.7562 1.2939

3000 2999 2.06 ± 0.038 20 1 ~ 8 3.2645 1.8268 1.4377

4000 3999 2.098 ± 0.033 23 1 ~ 9 3.3206 1.8015 1.5191

5000 4999 2.084 ± 0.037 23 1 ~ 12 3.259 1.7528 1.5061

6000 5999 2.06 ± 0.042 22 1 ~ 9 3.3927 1.8839 1.5088

7000 6999 2.077 ± 0.038 28 1 ~ 8 3.3894 2.1003 1.2891

8000 7999 1.9129 ± 0.1217 25 1 ~ 11 3.3297 2.1115 1.2183

Fig 3.11 The Dq curves for theoretically generated scale-free networks

The shapes of the Dq curves in Fig 3.11 show that scale-free networks are multifractal. The Dq functions of these networks decrease sharply after the peak. A possible explanation is that a scale-free network contains several nodes, known as hubs, that have a large number of edges connected to them, so the edge density in the areas near hubs is larger than in the remaining parts of the network.

Scale-free networks have a power-law degree distribution P(k) ∼ k^(−γ), where P(k) is the probability that a randomly chosen node has degree k. It was shown in Albert et al. (1999) and Albert and Barabasi (2002) that when γ < 2 the average degree diverges, while for γ < 3 the standard deviation of the degree diverges. The degree exponent γ has been found to lie in the range 2 < γ < 3 for most scale-free networks (Albert et al. 1999). The results in Table 3.1 show no clear relationship between the power-law exponent and the maximum of Dq, the limit of Dq or ∆Dq.
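Both divergence statements follow from a standard continuum estimate of the moments of the degree distribution. The sketch below assumes a lower cutoff kmin > 0 and no upper cutoff (that is, the infinite-size limit; in a finite network the largest degree provides a natural cutoff):

```latex
\langle k^n \rangle = \int_{k_{\min}}^{\infty} k^{n} P(k)\,dk
  \;\propto\; \int_{k_{\min}}^{\infty} k^{\,n-\gamma}\,dk ,
\qquad \text{finite only if } n - \gamma < -1, \text{ i.e. } \gamma > n + 1 .
% n = 1: the mean <k> diverges for gamma <= 2.
% n = 2: <k^2> diverges for gamma <= 3, so for 2 < gamma <= 3 the mean is
%        finite while the standard deviation sqrt(<k^2> - <k>^2) diverges.
```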

3.4.2 Multifractality of small-world networks

In most real networks, despite their large size, there is a relatively short path between any two nodes; this is known as the small-world effect. In Section 1.1.2 we described the famous experiments carried out by Stanley Milgram in the 1960s, in which letters passed from person to person were able to reach a designated target individual in only a small number of steps, around six in the published cases. This was one of the first direct demonstrations of the small-world effect: most pairs of vertices in most networks seem to be connected by a short path through the network.

The small-world effect is characterized mathematically by an average shortest path length l that depends at most logarithmically on the network size N (the number of nodes in the network), l ≈ ln N, where the shortest distance between two nodes defines the distance metric in complex networks. Equivalently, we obtain Equation 1.9,

$$N \approx e^{l/l_0},$$

where l0 is a characteristic length. The small-world property has been observed in a variety of other real networks, such as social, biological and technological networks (Watts and Strogatz 1998, Watts 1999, Newman 2001a, 2001b).
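The average shortest path length l can be computed exactly for a connected, unweighted network by running a breadth-first search from every node. The following is a minimal sketch assuming an adjacency-list representation {node: neighbours}; the helper names are hypothetical.

```python
from collections import deque

def average_shortest_path(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes of a
    connected, unweighted graph given as {node: iterable of neighbours}."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("the graph is not connected")
        total += sum(dist.values())
    n = len(adj)
    return total / (n * (n - 1))

def ring_lattice(n, half_k):
    """Regular ring: each node linked to half_k neighbours on each side."""
    return {i: [(i + s) % n for s in range(-half_k, half_k + 1) if s != 0]
            for i in range(n)}
```

For a ring of 20 nodes with two neighbours on each side (the p = 0 network of Fig 3.12), the mean path length is 55/19 ≈ 2.89. For a regular ring it grows linearly with N, and adding a few random shortcuts is expected to bring it down towards the logarithmic scaling above. The BFS approach costs O(N·E), which is adequate for networks of a few thousand nodes.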



Generating small-world networks by the WS model

Watts and Strogatz (1998) proposed a single-parameter small-world network model that bridges the gap between a regular network and a random graph. With the WS small-world model, one can interpolate between a regular lattice and a purely random network through semi-random networks with a high clustering coefficient and a short average path length. Later, Newman and Watts (1999) modified the original WS model. In the NW model, instead of links between nodes being rewired, extra links called shortcuts are added between pairs of nodes chosen at random, and no links are removed from the existing network. The NW model is equivalent to the WS model for small p and sufficiently large N, but it is easier to analyse.

The WS model is a random graph generation method that produces graphs with small-world properties. First, we select three parameters: the dimension n (the number of nodes in the graph); the mean degree k (assumed to be an even integer), which is the number of nearest neighbours to connect; and the rewiring probability p, where 0 ≤ p ≤ 1 and n ≫ k ≫ ln(n) ≫ 1. Secondly, we follow two steps:

(i) Construct a regular ring lattice. For example, if the nodes are named N0, …, Nn−1, there is an edge eij between nodes Ni and Nj if and only if i − j ≡ ±K (mod n) for some K ∈ {1, …, k/2};

(ii) For every node Ni, i = 0, …, n − 1, take every edge eij between Ni and Nj with i < j, and rewire it with probability p. Specifically, a node Nl is chosen with uniform probability from all the nodes, and the edge eij is replaced by eil.

Remark: the rewiring process avoids loops and duplicate edges.
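Steps (i) and (ii) of the WS model can be sketched as below. This is an illustration under stated assumptions, not the thesis code: edges are stored as frozensets, each lattice edge from Ni to its k/2 clockwise neighbours is visited once, and the new endpoint is drawn uniformly from the nodes that would create neither a self-loop nor a duplicate edge (the Remark above). The function names are hypothetical.

```python
import random

def ring_edges(n, k):
    """Edges of a regular ring lattice: node i joined to its k/2 nearest
    neighbours on each side (edges stored as frozensets)."""
    if k % 2:
        raise ValueError("k must be even")
    return {frozenset((i, (i + s) % n)) for i in range(n) for s in range(1, k // 2 + 1)}

def watts_strogatz(n, k, p, rng=None):
    """WS model: each lattice edge (N_i, N_{i+s}), s = 1..k/2, is rewired with
    probability p to (N_i, N_l), with N_l drawn uniformly among the nodes that
    create neither a self-loop nor a duplicate edge."""
    rng = rng or random.Random()
    edges = ring_edges(n, k)
    for i in range(n):
        for s in range(1, k // 2 + 1):
            old = frozenset((i, (i + s) % n))
            if old not in edges or rng.random() >= p:
                continue  # keep this edge unchanged
            candidates = [t for t in range(n) if t != i and frozenset((i, t)) not in edges]
            if candidates:
                edges.remove(old)
                edges.add(frozenset((i, rng.choice(candidates))))
    return edges
```

Rewiring removes one edge and adds one new edge, so the total number of edges stays at nk/2 for every p; p = 0 returns the regular lattice unchanged.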

NW model

First, we select three parameters: the dimension n, which is the number of nodes in the graph; the mean degree k (assumed to be an even integer), which is the number of nearest neighbours to connect; and the probability p of adding a shortcut in a given row, where 0 ≤ p ≤ 1 and n ≫ k ≫ ln(n) ≫ 1. Secondly, we follow two steps:

(i) Construct a regular ring lattice. For example, if the nodes are named N0, …, Nn−1, there is an edge eij between nodes Ni and Nj if and only if i − j ≡ ±K (mod n) for some K ∈ {1, …, k/2};

(ii) Add a new edge (shortcut) between nodes Ni and Nj with probability p, without removing any existing edge.
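A sketch of the NW construction follows. The phrase "the probability p of adding a shortcut in a given row" is read here as: for each node (each row of the adjacency matrix), add with probability p one shortcut to a uniformly chosen node that is not already a neighbour. That reading, and the function name, are assumptions; other formulations attach the probability to each lattice edge instead, which changes the expected number of shortcuts.

```python
import random

def newman_watts(n, k, p, rng=None):
    """NW variant: keep the ring lattice and add shortcuts, never removing edges.

    Assumption: for each node (row of the adjacency matrix), a shortcut to a
    uniformly chosen non-neighbour is added with probability p.
    Returns the adjacency sets and the list of added shortcuts.
    """
    rng = rng or random.Random()
    adj = {i: set() for i in range(n)}
    for i in range(n):  # step (i): regular ring lattice
        for s in range(1, k // 2 + 1):
            j = (i + s) % n
            adj[i].add(j)
            adj[j].add(i)
    shortcuts = []
    for i in range(n):  # step (ii): add shortcuts only
        if rng.random() < p:
            candidates = [j for j in range(n) if j != i and j not in adj[i]]
            if candidates:
                j = rng.choice(candidates)
                adj[i].add(j)
                adj[j].add(i)
                shortcuts.append((i, j))
    return adj, shortcuts
```

Under this reading the expected number of shortcuts is about p·n, and the lattice edges are always preserved, so the network stays connected for every p.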

Fig 3.12 Generating small-world networks

We use the NW model in our study. An illustration of the NW generating process is shown in Fig 3.12. The upper left panel corresponds to p = 0. It is a regular network containing 20 nodes in which each node has two neighbours on each side; in other words, each node has four edges. All the nodes and edges are shown in blue. We then generate small-world networks from this regular network. The upper right panel of Fig 3.12 corresponds to p = 0.1: one edge, coloured black, is added to the network, which then becomes a small-world network. The bottom left panel corresponds to p = 0.5: seven black edges are added to the original regular network, and it is also a small-world network. The bottom right panel corresponds to p = 1: 10 black edges are added to the original network, and this time it becomes a random network.

Numerical results

In this work, we first generated a regular network containing 5000 nodes and 250,000 edges, in which each node has 50 edges on each side. We then applied the modified fixed-size box-covering method to this regular network. The numerical results are shown in the last row of Table 3.2. Both the maximum value and the limit of Dq are equal to one, so ∆Dq = 0 (up to numerical error). This is because regular networks are not fractal and have dimension one.

Table 3.2 Numerical results of generated small-world networks

N m (k = 2m) p E d r max Dq lim Dq ∆Dq

5000 5 0.03 25159 33 1 to 20 2.3131 2.2783 0.03

5000 5 0.04 25207 29 1 to 18 2.4323 2.3719 0.06

5000 5 0.06 25290 23 1 to 15 2.562 2.5265 0.04

5000 5 0.08 25358 23 1 to 13 2.6584 2.6289 0.03

5000 5 0.1 25513 18 1 to 12 2.8181 2.7518 0.07

5000 5 0.13 25621 15 1 to 11 2.8927 2.8295 0.06

5000 5 0.15 25792 15 1 to 10 2.9884 2.9278 0.06

5000 5 0.2 26017 12 1 to 9 3.0845 3.0403 0.04

5000 50 0 250000 50 1 to 10 1 1 0.00002 (regular network)

Secondly, for better comparison, we generated ten small-world networks based on a regular network of 5000 nodes with 5 edges on each side of a node. During the generation, p = 0 yields a regular network, 0 < p < 1 yields a small-world network, and p = 1 yields a random network. More specifically, as p increases, more edges are added to the original regular network. We then applied the modified fixed-size box-covering method to these networks to detect their multifractal behaviour. We summarize the numerical results in Table 3.2, which includes the number of nodes (N), the number of edges (E), the diameter of the network (d), the generating probability p, the maximum value of Dq, the limit of Dq, and ∆Dq.

In Fig 3.13 we can see that the Dq curve of the regular network (generated with p = 0) is a straight line with value 1. The Dq curves of the small-world networks are also approximately straight lines, but at different Dq values, so these networks are not multifractal. Another interesting property is apparent for 0.03 ≤ p ≤ 0.2: Dq increases with p. More specifically, as p increases, more edges are added to the network, and both the maximum value and the limit of Dq increase. The values of ∆Dq are all within the error range, confirming that the Dq curves are essentially straight lines.

Fig 3.13 The Dq curves for theoretically generated small-world networks



3.4.3 Multifractality of random networks

In mathematics, a random network (random graph) is a network generated by some random process. Random networks were first introduced by P. Erdős and A. Rényi (1960). They defined a random network as one consisting of N nodes connected by n edges chosen randomly from the N(N − 1)/2 possible edges. In total there are precisely

$$\binom{N(N-1)/2}{n}$$

different networks with N nodes and n edges, each of which is equally probable.

Generating random networks

We generate random networks based on the ER (Erdős and Rényi 1960) model:

(i) Start with N isolated nodes;

(ii) For every pair of nodes, connect them by an edge with probability p.

Fig 3.14 Generating random networks



The generated network usually consists of several separate subnetworks. In this work, we consider only the largest connected part and apply the modified fixed-size box-covering method to it to detect its multifractal behaviour. An example with N = 1000 and p = 0.005 is given in Fig 3.14. The generated random network consists of a large connected component, pairs of interacting nodes and several isolated nodes. We chose the largest connected component as the random network and proceeded with the multifractal analysis.
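The generation steps (i) and (ii) and the extraction of the largest connected part can be sketched as follows; `count_graphs` evaluates the number of distinct networks with N nodes and n edges given above. The function names are hypothetical, and this is a plain-Python illustration rather than the code used for the thesis.

```python
import math
import random
from collections import deque

def erdos_renyi(n, p, rng=None):
    """Connect each of the N(N - 1)/2 node pairs independently with probability p."""
    rng = rng or random.Random()
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_component(adj):
    """Return the subgraph induced by the largest connected component (BFS)."""
    unseen, best = set(adj), set()
    while unseen:
        start = unseen.pop()
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        unseen -= comp
        if len(comp) > len(best):
            best = comp
    return {u: adj[u] & best for u in best}

def count_graphs(n_nodes, n_edges):
    """Number of distinct graphs with N labelled nodes and n edges."""
    return math.comb(n_nodes * (n_nodes - 1) // 2, n_edges)
```

With N = 1000 and p = 0.005 (mean degree about 5), the largest component typically contains almost all nodes, with a few isolated nodes and small pieces left over, as in Fig 3.14.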

Numerical results

We summarize the numerical results in Table 3.3, including the number of nodes (N), number of edges (E), diameter of the network (d), the generating probability p, maximum value of Dq, limit of Dq, and ∆Dq. These results indicate that there is no clear relationship between Dq and the size of a random network.

Table 3.3 Numerical results of generated random networks

N E d p r max Dq lim Dq ∆Dq

449 610 15 0.005 1 ~ 9 2.4247 2.144 0.28

994 2502 8 0.005 1 ~ 5 3.3235 2.8661 0.46

1991 5939 9 0.003 1 ~ 5 3.7288 3.4069 0.32

2484 6310 11 0.002 1 ~ 6 3.6956 3.3321 0.36

2790 4374 18 0.001 1 ~ 9 3.291 2.9549 0.34

3373 5978 15 0.001 1 ~ 8 3.4686 3.1546 0.31

3931 8125 13 0.001 1 ~ 7 3.6716 3.3511 0.32

4919 10179 13 0.0008 1 ~ 8 3.7755 3.3918 0.38

5620 8804 16 0.0005 1 ~ 10 3.5377 3.2078 0.33



Fig 3.15 The Dq curves for theoretically generated random networks

In Fig 3.15, we can see that the Dq curves of random networks decrease slowly after the peak, as reflected in the values of ∆Dq. This pattern may occur because, during the generating process, nodes are connected randomly with probability p, so that few hubs exist. Compared with scale-free networks, this slower decrease supports the view that edges are distributed more symmetrically in random networks.

Remark: In the present study, we consider the generalized fractal dimension Dq because the shape of Dq indicates whether an object is multifractal. For a monofractal system, which has the same scaling behaviour at every point, Dq is a constant independent of q, whereas for a multifractal object Dq is a non-increasing convex curve as q increases. However, our results show an anomalous behaviour: the Dq curves increase at the beginning, when q < 0. This anomalous behaviour has also been observed by others (Opheusden et al. 1996, Smith and Lange 1998, Fernández et al. 1999). Several reasons for it have been suggested, such as boxes containing few elements (Fernández et al. 1999), or a small scaling regime of less than a decade, so that the box-counting results for the partition function cannot be extrapolated to zero box size (Opheusden et al. 1996). Faced with these anomalous Dq spectra, we tried another method of multifractal analysis, the sandbox method, but its linear regression fits were not satisfactory. We therefore used the modified fixed-size box-counting algorithm in this research. For the purpose of detecting the multifractality of complex networks, we accept the anomalous Dq spectra and focus on their decreasing parts, which are presented in Figs 3.11, 3.13 and 3.15.

3.5 Multifractality of PPI networks

Introduction

Protein-protein interactions (PPI) play a critical role in most cellular processes and form the basis of biological mechanisms. In the post-genome era, the development of high-throughput methods, such as yeast two-hybrid screening and mass spectrometry, has produced vast amounts of PPI data, which makes it possible to study genes and proteins at the network level (Uetz et al. 2000). PPI networks are normally represented as unweighted graphs (Deane et al. 2002, Patil and Nakamura 2005).

Databases

The protein-protein interaction data used here were downloaded mainly from two databases: the PPI networks of Drosophila melanogaster (fruit fly), C. elegans and Arabidopsis thaliana (a plant) from BioGRID, and the PPI networks of S. cerevisiae (baker's yeast), E. coli and H. pylori from DIP. We also use the same human PPI network data as in Lee and Jung (2006).

Our fractal and multifractal analyses are based on connected networks, which have no separate parts or isolated nodes, so some preparation is needed before they can be applied to protein-protein interaction networks. First, we need to find the largest connected part of each data set. Many tools and methods could be used for this purpose; in our study we adopt Cytoscape, an open bioinformatics software platform for visualizing molecular interaction networks and analysing network graphs of any kind involving nodes and edges. Using Cytoscape, we obtain the largest connected part of each PPI data set, and this connected part is the network on which the fractal and multifractal analyses are performed.
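Cytoscape was used for this step in the thesis. For readers who prefer a scripted route, the following hypothetical sketch (not Cytoscape, and not the thesis code) performs the same preprocessing in plain Python, assuming the interactions are available as pairs of protein identifiers: self-interactions and duplicate pairs are dropped, and a union-find structure keeps only the edges of the largest connected part.

```python
from collections import Counter

def clean_edges(pairs):
    """Drop self-interactions and duplicate (including reversed) pairs."""
    seen = set()
    for a, b in pairs:
        if a != b:
            seen.add((a, b) if a < b else (b, a))
    return sorted(seen)

def largest_connected_part(edges):
    """Union-find: keep only the edges whose endpoints lie in the largest component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    sizes = Counter(find(x) for x in list(parent))
    root = max(sizes, key=sizes.get)
    return [(a, b) for a, b in edges if find(a) == root]
```

Union-find processes the edge list in a single pass, which is convenient when the interaction data are read from a file line by line.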

Numerical results

We calculated the Dq spectra for seven PPI networks of different organisms and summarize the corresponding numerical results in Table 3.4, including the number of nodes (N), number of edges (E), diameter of the network (d), maximum value of Dq, limit of Dq, and ∆Dq. These results show that multifractality exists in PPI networks.

Table 3.4 Numerical results of Protein-protein interaction networks

Networks N E d Max Dq Lim Dq ∆Dq

Human 8934 41341 14 4.89 2.65 2.24

D. melanogaster 7476 26534 11 4.84 2.87 1.97

S. cerevisiae 4976 21875 10 4.62 2.48 2.14

E.coli 2516 11465 12 4.15 2.10 2.05

H.pylori 686 1351 9 3.47 1.91 1.56

Arabidopsis thaliana 1298 2767 25 2.51 1.62 0.88

C.elegans 3343 6437 13 4.47 1.49 2.98

From Fig 3.16 we can see that all PPI networks are multifractal and that there are two clear groupings of organisms based on the peak values of their Dq curves. The first group includes human, Drosophila melanogaster, S. cerevisiae and C. elegans; the second group includes only the two bacteria E. coli and H. pylori. The Dq curves of the seven organisms also have a similar shape: they reach their peak values around q = 2, decrease sharply for q > 2 and finally reach their limit values when q > 10. We can therefore take lim Dq = D(20) and use ∆Dq = max Dq − lim Dq to measure how much the Dq function changes along each curve.
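The summary statistics used in Tables 3.1 to 3.6 follow directly from a sampled Dq curve. The sketch below computes max Dq, lim Dq = D(20) and ∆Dq, and also checks that the curve does not rise again after its peak (the decreasing part on which the analysis focuses). In the usage example only max Dq = 4.89 and lim Dq = 2.65 are taken from the human row of Table 3.4; the other curve values are made up for illustration.

```python
def delta_dq(qs, dqs, q_limit=20):
    """Return (max Dq, lim Dq, Delta Dq), taking lim Dq = D(q_limit)."""
    curve = dict(zip(qs, dqs))
    peak = max(dqs)
    lim = curve[q_limit]
    return peak, lim, peak - lim

def non_increasing_after_peak(dqs, tol=1e-12):
    """True if the Dq values (ordered by increasing q) never rise after the peak."""
    start = dqs.index(max(dqs))
    return all(b <= a + tol for a, b in zip(dqs[start:], dqs[start + 1:]))

# Hypothetical curve; only the peak 4.89 and D(20) = 2.65 come from Table 3.4.
qs = [-4, -2, 0, 2, 4, 10, 20]
dqs = [3.9, 4.3, 4.7, 4.89, 3.6, 2.7, 2.65]
peak, lim, delta = delta_dq(qs, dqs)
```

Here `delta` evaluates to 2.24 (up to floating-point rounding), the ∆Dq reported for the human PPI network.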



Fig 3.16 The Dq curves for protein-protein interaction networks

Table 3.5 Numerical results of sub-networks of Human PPI

Networks N E d Max Dq Lim Dq ∆Dq

Subnetwork of Human PPI 3505 4651 24 3.65 1.99 1.66

Subnetwork of Human PPI 3505 10652 10 4.02 2.47 1.55

Then we randomly chose several sub-networks from different parts of the human PPI network. These sub-networks all contain 3505 nodes but different numbers of edges. Since the sub-networks are chosen randomly, they are allowed to overlap. We then calculated the Dq spectra for these sub-networks of the human protein-protein interaction network (Lee and Jung 2006) and summarize the corresponding numerical results in Table 3.5, including the number of nodes (N), number of edges (E), diameter of the network (d), maximum value of Dq, limit of Dq, and ∆Dq. These results show that multifractality exists in PPI networks.

From Fig 3.17 we can see that not all parts of a PPI network have the same multifractal behaviour. More specifically, the ∆Dq values vary from one sub-network to another, which suggests that the edge distribution in some parts of a network is symmetric while that in other parts may not be. This may help to understand the diversity and complexity of protein-protein interactions.

Fig 3.17 The Dq curves for sub-networks of human PPI networks

3.6 Multifractality of gene networks

In recent years, bioinformatics has become an increasingly prominent research field, since it allows biologists to make full use of advances in computer science and computational statistics when analysing the data of an organism at the genomic, transcriptomic and proteomic levels (Keedwell and Narayanan 2005). DNA microarray technology, i.e. microscopic arrays of large sets of nucleotide sequences, is a modern tool used to obtain information about the expression levels of thousands of genes simultaneously. Important applications of gene expression microarray data include disease diagnosis (Chiang et al. 2008, Zhang et al. 2010), classification of samples into categories (Liu et al. 2006, Rapaport et al. 2007), drug discovery (Oostrom et al. 2009) and toxicological research (Wright and Simon 2003).

In the context of the human genome project, new technologies emerged that allow experiments on a large number of genes to be executed in parallel. Gene microarray technology (Stekel 2003, Hu and Pan 2007) aims at measuring mRNA levels in particular cells or tissues for many genes at once. A microarray experiment consists of the following components: a set of probes; an array on which these probes are immobilised at specified locations; a sample containing a complex mixture of labelled biomolecules that can bind to the probes; and a detector that is able to measure the spatially resolved distribution of label after it has bound to the array. After exposure of the array to the sample, the abundance of individual species of sample molecules can be quantified through the signal intensity at the matching probe sites.

In this section, we compare the multifractal behaviour of gene networks reconstructed from the microarray data of patients and of normal subjects. However, working with such high-dimensional microarray data is extremely difficult. Therefore, we first apply the fuzzy membership test (FM-test) (Liang 2006) to select the genes most closely related to the disease; we then construct networks from the microarray data of the selected genes by calculating correlation coefficients. Finally, we apply the modified fixed-size box-covering method to these networks to detect their multifractal behaviour.

3.6.1 Methods

Gene network reconstruction from microarray data



Two main approaches have been proposed in the literature for gene network reconstruction from microarray data, namely Bayesian networks and graphical Gaussian models (Werhli et al. 2006). Bayesian networks are directed acyclic graphs, i.e. no feedback loop is possible. Graphical Gaussian models, on the other hand, are undirected graphs and are computationally very efficient. In addition, correlations are often used as a measure of gene co-expression to extract gene networks from microarray data (Borate et al. 2009). Network and clustering analyses of microarray co-expression correlation data often require a threshold to discard small correlations, which reduces computational demands and the number of uninformative correlations. For example, let M be the n × t microarray matrix, where n is the number of genes and each gene has t values. Gene i is denoted by the vector Mi = (mi,1, mi,2, …, mi,t), i ∈ {1, …, n}, and the correlation between any two genes Mi and Mj is then defined by

$$C(M_i, M_j) = \frac{\sum_{k=1}^{t} m_{i,k}\, m_{j,k}}{\sqrt{\sum_{k=1}^{t} m_{i,k}^2 \sum_{k=1}^{t} m_{j,k}^2}}. \tag{3.36}$$

Next, by defining a threshold r, we can reconstruct a gene network from the microarray data using the following rules:

(i) if C(Mi, Mj) ≥ r, there is an edge between the two nodes corresponding to genes Mi and Mj;

(ii) if C(Mi, Mj) < r, there is no edge between the two nodes corresponding to genes Mi and Mj.

There are several methods for selecting an appropriate threshold r. For example, Sanoudou et al. (2003) used an arbitrary threshold correlation of 0.80; Moriyama et al. (2003) obtained random correlation distributions for gene pairs by permuting their expression values and justified their choice of threshold by statistical significance; Lee et al. (2004) used the top 1% of correlations (in absolute value) to build a co-expression network; and Voy et al. (2006) used the distribution of correlations of genes with buffer spots on the arrays to select a threshold correlation value of 0.875. In this section, we need to calculate the generalized fractal dimension Dq of a connected network, so we chose r subject to the condition that the resulting network has no isolated nodes.
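Eq. (3.36) and the two thresholding rules can be sketched as below. The correlation is implemented exactly as written in Eq. (3.36), without mean-centring. The threshold rule is interpreted here as taking the most restrictive (largest) r that still leaves every gene with at least one edge; that interpretation and all function names are assumptions. Under it, r equals the smallest, over genes, of each gene's best correlation with any other gene.

```python
import math

def correlation(mi, mj):
    """Eq. (3.36): sum_k m_ik m_jk / sqrt(sum_k m_ik^2 * sum_k m_jk^2).

    Genes whose values are all zero would give a zero denominator.
    """
    num = sum(a * b for a, b in zip(mi, mj))
    den = math.sqrt(sum(a * a for a in mi) * sum(b * b for b in mj))
    return num / den

def threshold_network(matrix, r):
    """Edges (i, j), i < j, between genes with C(M_i, M_j) >= r (rules (i)-(ii))."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if correlation(matrix[i], matrix[j]) >= r]

def largest_r_without_isolated(matrix):
    """Largest r leaving no isolated gene: every gene needs one partner with
    C >= r, so r is the minimum over genes of the gene's best correlation."""
    n = len(matrix)
    return min(max(correlation(matrix[i], matrix[j]) for j in range(n) if j != i)
               for i in range(n))
```

Any r above this value isolates at least one gene, while a smaller r only adds edges; note that "no isolated nodes" does not by itself guarantee a single connected component.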

Fuzzy membership test

The fuzzy membership test (FM-test) was proposed by Liang (2006). It is based on fuzzy set theory and identifies disease-associated genes from microarray gene expression profiles. Let S1 and S2 be the sets of gene expression values of patients and of normal people, respectively.

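(i) Calculate the sample means µ1, µ2 and standard deviations σ1, σ2 of S1 and S2, respectively. For S1,

$$\mu_1 = \frac{1}{N_1}\sum_{x_i \in S_1} x_i, \tag{3.37}$$

where N1 is the number of elements in S1, and

$$\sigma_1 = \sqrt{\frac{1}{N_1 - 1}\sum_{x_i \in S_1} (x_i - \mu_1)^2}. \tag{3.38}$$

µ2 and σ2 are calculated from S2 in the same way.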

(ii) Characterize S1 and S2 as two fuzzy sets FS1 and FS2 whose fuzzy membership functions, fFS1(x) and fFS2(x), are defined by the sample means and standard deviations:

$$f_{FS_1}(x) = e^{-(x-\mu_1)^2 / (2\sigma_1^2)}, \tag{3.39}$$

$$f_{FS_2}(x) = e^{-(x-\mu_2)^2 / (2\sigma_2^2)}. \tag{3.40}$$

The fuzzy membership function fFSi(x) (i = 1, 2) maps each value x to a fuzzy membership value that reflects the degree to which x belongs to FSi.

(iii) Using the two fuzzy membership functions fFS1(x) and fFS2(x), quantify the convergence degree c(S1, S2) of the two sets.

(iv) Define the divergence degree (FM d-value) between the two sets based on the convergence degree:

$$d(S_1, S_2) = 1 - c(S_1, S_2) = 1 - \frac{\sum_{e \in S_1} f_{FS_2}(e) + \sum_{e \in S_2} f_{FS_1}(e)}{|S_1| + |S_2|}. \tag{3.41}$$
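Steps (i) to (iv) of the FM-test translate almost line by line into code. The sketch below is an illustration, not Liang's reference implementation: it uses Python's `statistics.mean` and `statistics.stdev` (the latter divides by N − 1, matching Eq. (3.38)) and Gaussian memberships as in Eqs. (3.39) and (3.40). Each set must contain at least two values that are not all equal, so that σ > 0.

```python
import math
from statistics import mean, stdev

def membership(mu, sigma):
    """Gaussian fuzzy membership function of Eqs. (3.39)-(3.40)."""
    return lambda x: math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def fm_d_value(s1, s2):
    """FM d-value d(S1, S2) = 1 - c(S1, S2) of Eq. (3.41).

    Each set needs at least two values that are not all equal (sigma > 0).
    """
    f1 = membership(mean(s1), stdev(s1))  # stdev divides by N - 1, as in Eq. (3.38)
    f2 = membership(mean(s2), stdev(s2))
    c = (sum(f2(e) for e in s1) + sum(f1(e) for e in s2)) / (len(s1) + len(s2))
    return 1.0 - c
```

Identical sets do not give d = 0, because each value is compared with a membership function that peaks only at the mean; well-separated sets give d close to 1, and d grows as the two sets move apart.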

3.6.2 Results and discussion

Data description

Four different gene microarray data sets are used in our work.

(i) Colorectal cancer: A colorectal cancer dataset of microarray gene expression for a total of 15552 genes is downloaded from Collado et al. (2007). For each gene, there are twenty expression values, twelve from a group of colorectal cancer patients and eight from a group of healthy donors.

(ii) Type II diabetes: A diabetes dataset of microarray gene expression for a total of 10831 genes is downloaded from Yang et al. (2002). The microarray expression profiles are from 18 insulin-sensitive and 17 insulin-resistant non-diabetic Pima Indians. Insulin resistance increases the risk of type II diabetes. For each gene, there are ten expression values, five from a group of insulin-sensitive (IS) people and five from a group of insulin-resistant (IR) people. Only genes that have no null expression values and at least five expression values greater than 100 are included in the analysis; therefore 2304 genes are selected.

(iii) Type I diabetes: A type I diabetes dataset of microarray gene expression for a total of 24354 genes is downloaded from Oostrom et al. (2009). For each gene, there are 21 expression values: 11 from a group of type I diabetes patients and 10 from the same patients after one week of folic acid supplementation.



(iv) Lung cancer: A lung cancer dataset of microarray gene expression for a total of 22283 genes is downloaded from Wachi et al. (2005). Expression profiles were obtained for squamous lung cancer biopsy specimens and paired normal specimens from 5 patients. For each gene, there are ten expression values, five from a group of normal cells and five from a group of lung cancer tumour cells.

Gene networks construction

(i) The colorectal cancer microarray dataset includes 15552 genes in total. First, we use the FM-test to select the 2000 genes most strongly related to the disease. We then construct two gene networks on these 2000 selected genes, one for the colorectal cancer patients and the other for the healthy donors.

(ii) The type II diabetes microarray dataset includes 2304 genes. We construct two gene networks on these genes, one for the insulin-sensitive (IS) group and the other for the insulin-resistant (IR) group.

(iii) The type I diabetes microarray dataset includes 24354 genes. First, we use the FM-test to select the 2000 genes most strongly related to the disease. We then construct two gene networks on these 2000 selected genes, one for the type I diabetes patients and the other for the same patients after one week of folic acid supplementation.

(iv) The lung cancer microarray dataset includes 22283 genes. First, we use the FM-test to select the 2000 genes most strongly related to the disease. We then construct two gene networks on these 2000 selected genes, one for normal cells and the other for lung cancer tumour cells.
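The network construction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce Table 3.6: it assumes that two genes are joined by an edge when the Pearson correlation coefficient of their expression profiles reaches the threshold t listed in Table 3.6. The parameter `use_absolute` (whether |r| or r is compared with t) is hypothetical, because the text does not say which is used, and the FM-test gene selection is not reproduced.

```python
import numpy as np

def correlation_network(expr, t, use_absolute=True):
    """Adjacency matrix of a gene network built from a microarray matrix.

    expr: array of shape (n_genes, n_samples), one row of expression values per gene.
    t: correlation threshold (assumed to play the role of t in Table 3.6).
    use_absolute: hypothetical switch; compare |r| (True) or r (False) with t.
    """
    r = np.corrcoef(expr)                      # Pearson correlation between gene rows
    score = np.abs(r) if use_absolute else r
    adjacency = (score >= t).astype(int)       # edge when the correlation reaches t
    np.fill_diagonal(adjacency, 0)             # no self-loops
    return adjacency

# Toy data: genes 0 and 1 are perfectly correlated, gene 2 is anti-correlated
# with both, and gene 3 has correlation 0.8 in magnitude with the other three.
expr = np.array([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
], dtype=float)
adjacency = correlation_network(expr, t=0.9)
n_edges = adjacency.sum() // 2
print(adjacency)
print("edges:", n_edges)
```

With t = 0.9 the weakly correlated gene 3 stays isolated, which illustrates how raising t thins out the network.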

Numerical results and discussion

We calculated the Dq spectra for the gene networks of the different datasets and summarise the numerical results in Table 3.6, including the number of nodes (N), number of edges (E), diameter of the network (d), the threshold t, maximum value of Dq, limit of Dq, and ∆Dq. Fig 3.18 to Fig 3.21 show that the generalized fractal


dimensions Dq of these gene networks are decreasing functions of q, indicating that multifractality exists in these networks.

Table 3.6 Numerical results of gene networks

Networks                              N      E        d    t      Max Dq   Lim Dq   ∆Dq
Patients (colorectal cancer)          2000   124778   6    0.72   3.44     1.20     2.24
Healthy donors (colorectal cancer)    2000   27936    9    0.85   3.22     2.23     0.99
Insulin resistant (IR)                2304   87318    10   0.95   2.34     1.47     0.87
Insulin sensitive (IS)                2299   175165   13   0.95   2.14     1.08     1.06
Type I diabetes (11 patients)         2000   142294   6    0.72   3.39     1.34     2.05
Type I diabetes (after medication)    2000   17460    10   0.81   3.36     2.20     1.16
Lung cancer cells                     2000   15225    13   0.97   2.36     1.91     0.45
Normal cells                          2000   24825    11   0.96   2.43     1.90     0.53

Fig 3.18 The Dq curves for gene networks of colorectal cancer microarray data


Fig 3.19 The Dq curves for gene networks of diabetes microarray data

Fig 3.20 The Dq curves for gene networks of type I diabetes microarray data


Fig 3.21 The Dq curves for gene networks of lung cancer microarray data

In summary, multifractality exists in all the gene networks we generated, and the differences in the shape of the Dq curves between the two networks of each dataset are clear for the first three microarray datasets. Thus multifractal analysis could provide a useful tool for gene clustering and for distinguishing between healthy people and patients.

3.7 Conclusion

A modified algorithm for analysing the multifractal behaviour of complex networks is proposed in this chapter. This algorithm is applied to generated scale-free networks, small-world networks and random networks, as well as to protein-protein interaction (PPI) networks. The numerical results indicate that multifractality exists in scale-free networks and PPI networks, while for small-world networks and random networks the multifractality is not clear-cut, particularly for small-world networks generated by the NW model. Furthermore, for scale-free networks the values of Dq increase with the size of the network. This agrees with the fact that larger scale-free networks usually have more hubs, which make the structure of the network more complex. However, for random networks there is no clear relationship between Dq and the size of the network. The quantity ∆Dq = max Dq − lim Dq has been used to


investigate how Dq changes. A larger ∆Dq means that the network's edge distribution is more uneven, while a smaller ∆Dq means that it is more symmetrical, which is the case for random networks.

We also apply the modified fixed-size box-covering method to gene networks reconstructed from gene microarrays. First, we use the fuzzy membership (FM) test to select the genes most strongly related to the disease; we then construct networks from the microarray data of the selected genes by calculating correlation coefficients. The results show that multifractality exists in gene networks as well.

These results suggest that the algorithm proposed in this chapter is a suitable and effective tool for the multifractal analysis of complex networks. In particular, in conjunction with the quantities derived from Dq, the method and algorithm provide a useful tool to cluster and classify real networks such as the protein-protein interaction networks of organisms.


Chapter 4

Analysis of networks generated from time series

4.1 Introduction

Characterizing complicated dynamics from time series is a fundamental problem of continuing interest in a wide variety of fields. Recent work indicates that complex network theory can be a powerful tool for analysing time series. Many existing methods for transforming time series into complex networks share a common feature: they define the connectivity of a complex network by the mutual proximity of different parts (e.g., individual states, state vectors, or cycles) of a single trajectory.

In this chapter, we propose a new method to construct networks from time series and apply this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We then construct networks via the horizontal visibility graph (HVG) technique, which has recently been widely used. We confirm a known linear relationship between the Hurst exponent of fractional Brownian motion and the fractal dimension of the corresponding HVG network. In the first application, we apply our newly developed box-covering algorithm to calculate the generalized fractal dimensions of the HVG networks of fractional Brownian motions, as well as those of binomial cascades and five bacterial genomes. In a second application, we discuss the resilience of networks constructed from time series via two different approaches: the visibility graph and the horizontal visibility graph.


Scaling concepts

A stochastic process X(t) is said to have the scaling property if for each 0 < c < 1 there exists a non-random function M(c) such that

$$X(ct) \overset{d}{=} M(c)\,X(t), \quad t \ge 0. \tag{4.1}$$


Here, $X(at) \overset{d}{=} M(a)\,X(t)$ means equality in distribution. For instance, Brownian motion has the scaling property with $M(c) = c^{1/2}$, while fractional Brownian motion has the scaling property with $M(c) = c^{H}$, 0 < H < 1. The symmetric stable Lévy process L(t), t ≥ 0, with cumulant transform $\log E\{e^{i\varsigma L(1)}\} = -|\varsigma|^{\alpha}$, 0 < α ≤ 2, has the scaling property with $M(c) = c^{1/\alpha}$.

Self-similar processes

When there exists a unique H > 0 such that, for all a > 0, $X(at) \overset{d}{=} a^{H} X(t)$, t ≥ 0, the process X(t) is said to be self-similar. A self-similar process with parameter H is denoted H-ss. It is denoted H-sssi if it also has stationary increments. The parameter H is known as the Hurst exponent, named after Harold Hurst, the British engineer whose work on Nile river data led to the development of self-similar processes. An H-sssi process with finite mean has 0 < H ≤ 1 and X(0) = 0 a.s. The process with finite mean and H = 1 is the degenerate process X(t) = tX for some random variable X. If X(t) has finite mean and 0 < H < 1, then EX(t) = 0.

When a process X(t) is H-sssi and has finite variance, its covariance function is given, up to a constant, by

$$\begin{aligned} EX(t)X(s) &= \tfrac{1}{2}\left(EX(t)^2 + EX(s)^2 - E\left(X(t) - X(s)\right)^2\right) \\ &= \tfrac{1}{2}\left(EX(t)^2 + EX(s)^2 - EX(t-s)^2\right) \\ &= \tfrac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right) EX(1)^2. \end{aligned} \tag{4.2}$$

Since the mean and covariance functions completely determine the finite-dimensional distributions of a Gaussian process, there is only one Gaussian H-sssi process for each H (up to the constant EX(1)²). This is fractional Brownian motion B_H(t), whose covariance function is given by Equation 4.2.


A stationary sequence X_n with finite variance is said to be strongly dependent, or to have long-range dependence, if $\sum_{n=0}^{\infty} E X_0 X_n = \infty$. The concept of long-range dependence can be extended to random variables with infinite variance. Let Y(t) be a finite-variance H-sssi process with 1/2 < H < 1 and $X_n = Y(n) - Y(n-1)$, n ∈ ℤ; it follows that X_n is a sequence with long-range dependence. It is therefore usual to assume a self-similar model when long-range dependence is found.
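The long-range dependence of the increments of fractional Brownian motion can be checked directly from Equation 4.2. Taking EX(1)² = 1, the increments $X_n = B_H(n) - B_H(n-1)$ have autocovariance $\gamma(n) = \tfrac{1}{2}\left(|n+1|^{2H} - 2|n|^{2H} + |n-1|^{2H}\right)$, and the partial sums telescope to $\sum_{n=0}^{N} \gamma(n) = \tfrac{1}{2}\left[1 + (N+1)^{2H} - N^{2H}\right]$. This grows like $H N^{2H-1}$ and therefore diverges for 1/2 < H < 1, equals 1 for H = 1/2, and tends to 1/2 for H < 1/2. The short Python sketch below (function names are illustrative) compares the direct partial sums with this closed form.

```python
def fgn_autocovariance(n, H):
    """gamma(n) = E X_0 X_n for the unit-lag increments of B_H, E B_H(1)^2 = 1 (Eq. 4.2)."""
    n = abs(n)
    return 0.5 * ((n + 1) ** (2 * H) - 2 * n ** (2 * H) + abs(n - 1) ** (2 * H))

def partial_sum(N, H):
    """Direct sum of gamma(n) for n = 0, ..., N."""
    return sum(fgn_autocovariance(n, H) for n in range(N + 1))

def partial_sum_closed_form(N, H):
    """Telescoped value 0.5 * (1 + (N + 1)^(2H) - N^(2H))."""
    return 0.5 * (1 + (N + 1) ** (2 * H) - N ** (2 * H))

# Partial sums keep growing for H > 1/2 and settle down for H <= 1/2.
for H in (0.3, 0.5, 0.7):
    sums = [round(partial_sum(N, H), 4) for N in (10, 100, 1000, 10000)]
    print(f"H = {H}: {sums}")
```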

A typical feature of a self-similar process X(t) on [0, T] is that it has a non-integer degree of differentiability, characterised by its local Hölder exponent h(t), defined by $h(t) := \sup\{\, l : |X(\tau) - P_t(\tau)| < C\,|\tau - t|^{l} \,\}$ for τ sufficiently close to t, where $P_t(\cdot)$ is the Taylor polynomial of X at t. The sets $E_\alpha := \{ t : h(t) = \alpha \}$, which form a decomposition of the support of X according to its singularity exponents α, can be highly interwoven and dense on [0, T]. The singularity spectrum of X, also known as its multifractal spectrum, is then defined as d(α) = dim(E_α), where dim is the Hausdorff dimension (see Chapter 3). A process X is said to be multifractal if the support of its singularity spectrum has a non-empty interior. A classical example of a multifractal process is the binomial cascade on the interval [0, 1], which is introduced in Section 4.4.3. A multiplicative cascade does not possess stationary increments and is only defined on a finite interval. An example of a multifractal stochastic process with stationary increments defined on [0, ∞) is Lévy motion. Jaffard (1999) showed that all Lévy motions are multifractal with the exception of Brownian motion, compound Poisson processes, deterministic motion and their convolutions.

Fractional Brownian motion

The concept of fractional Brownian motion, introduced by Mandelbrot, is given above. It generalizes Brownian motion by allowing short-range or long-range dependence, captured by an exponent H in the range 0 < H < 1 (Mandelbrot 1982). For H = 1/2, fractional Brownian motion reduces to Brownian motion. Intuitively, the parameter H determines the ruggedness of the graph: for small H the graph looks extremely rugged, while for large H it is smoother. Examples of


fractional Brownian motions and Brownian motion of length 10⁵ are given in Fig 4.1 for H = 0.2, 0.5, 0.7 and 0.9 respectively.

Fig 4.1 Examples of fractional Brownian motions

The variance of the increments of fractional Brownian motion is given by

$$E\left[\left(X(t_2) - X(t_1)\right)^2\right] = \sigma^2 |t_2 - t_1|^{2H}, \tag{4.3}$$

where $t_1, t_2 \in [0, T]$. In contrast to ordinary Brownian motion, which has independent increments, fractional Brownian motion with parameter H, 0 < H < 1, has independent increments if and only if H = ½ (Crownover 1995, § 9.4, p. 247).
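Series such as those in Fig 4.1 can be simulated in several ways, and the text does not say which generator was used. As one standard exact option, assumed here purely for illustration, the Python sketch below draws a Gaussian vector whose covariance is Equation 4.2 (with EX(1)² = 1) by Cholesky factorisation. Its cost grows as n³, so faster methods such as circulant embedding are preferable for long series; the function names are illustrative.

```python
import numpy as np

def fbm_covariance(times, H):
    """Covariance matrix of B_H at the given times (Equation 4.2 with E B_H(1)^2 = 1)."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (np.abs(s) ** (2 * H) + np.abs(u) ** (2 * H) - np.abs(s - u) ** (2 * H))

def simulate_fbm(n, H, T=1.0, seed=None):
    """One fBm path on the grid T/n, 2T/n, ..., T via Cholesky factorisation."""
    # t = 0 is left out: B_H(0) = 0 would make the covariance matrix singular.
    times = np.linspace(T / n, T, n)
    L = np.linalg.cholesky(fbm_covariance(times, H))
    z = np.random.default_rng(seed).standard_normal(n)
    return times, L @ z      # L z has covariance L L^T = fbm_covariance(times, H)

for H in (0.2, 0.5, 0.7, 0.9):
    times, path = simulate_fbm(256, H, seed=1)
    print(f"H = {H}: first values {np.round(path[:3], 3)}")
```

For H = 1/2 the covariance reduces to min(s, t), the covariance of ordinary Brownian motion, which gives a quick sanity check of the implementation.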

Transforming time series into networks

Recently, several approaches have been proposed for transforming time series into complex network representations (Donner et al. 2011). Donner et al. classified the


methods related to the concept of recurrence into three classes: cycle networks (Zhang and Small 2006, Zhang et al. 2008, and Small et al. 2009), correlation networks (Yang and Yang 2008, Gao and Jin 2009) and recurrence networks (Marwan et al. 2007a,b, Shimada et al. 2008, Xu et al. 2008, Small et al. 2009, Marwan et al. 2009, Donner et al. 2010); other methods include visibility graphs (Lacasa et al. 2008 and Lacasa et al. 2009) and transition networks (Daw et al. 2003 and Donner et al. 2008). Recurrence-based complex networks provide a new approach to time series analysis and offer a promising, complementary view of dynamical systems. By applying well-established complex network measures, one can characterize and classify the dynamics of complex systems, detect dynamical transitions, or identify invariant substructures (Donner et al. 2011).

In this chapter, we transform time series of fractional Brownian motion (FBM) into networks and investigate properties such as their fractal scaling, multifractality and degree distributions.

4.3 Fractal scaling of weighted networks generated from FBM

There are several methods of constructing a network from a time series. Marwan et al.

(2009) demonstrated that the recurrence matrix can be considered as the adjacency

matrix of an undirected, unweighted network allowing us to study time series using a

complex network approach. A recurrence matrix is a binary symmetric square

matrix which encodes the times when two states are in close proximity (i.e.

neighbours in the phase space). Based on such a recurrence matrix, a large and

diverse amount of information on the dynamics of the system can be extracted.

Recurrence plot

A recurrence plot (RP) is a representation of the recurrent states of a dynamical system in its m-dimensional phase space. It is a pairwise test of all phase space vectors


$\vec{x}_i$ (i = 1, 2, ..., N, $\vec{x}_i \in \mathbb{R}^m$) against each other to determine whether or not they are close (Marwan et al. 2009):

$$R_{i,j} = \Theta\left(\varepsilon - d(\vec{x}_i, \vec{x}_j)\right), \tag{4.4}$$

where Θ(·) is the Heaviside function and ε is a threshold for proximity. The closeness $d(\vec{x}_i, \vec{x}_j)$ can be measured in different ways. Most commonly, a spatial distance in terms of the maximum or Euclidean norm is used:

$$d(\vec{x}_i, \vec{x}_j) = \lVert \vec{x}_i - \vec{x}_j \rVert. \tag{4.5}$$

The binary recurrence matrix R contains the value one for all close pairs, $\lVert \vec{x}_i - \vec{x}_j \rVert < \varepsilon$.

A phase space trajectory can be reconstructed from a time series $\{\mu_i\}_{i=1}^{N}$ by time delay embedding

$$\vec{x}_i = \left(\mu_i, \mu_{i+\tau}, \ldots, \mu_{i+(m-1)\tau}\right), \tag{4.6}$$

where m is the embedding dimension and τ is the delay.

Let us consider the phase space vectors as nodes of a network and identify recurrences with edges. An undirected and unweighted network is represented by the binary adjacency matrix A, where a connection between nodes i and j is marked as $A_{i,j} = 1$. Excluding self-loops, A can be obtained from the RP by subtracting the identity matrix (Marwan et al. 2009):

$$A_{i,j} = R_{i,j} - \delta_{i,j}, \tag{4.7}$$

where $\delta_{i,j}$ is the Kronecker delta. In this way, each state vector in phase space is represented by one distinct node. An important advantage of such methods is that, if the embedding is chosen appropriately, the topological distribution of the set of vector points in phase space accurately reflects the underlying dynamics of the original system.
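Equations 4.4 to 4.7 translate almost line by line into array code. The Python sketch below is a minimal illustration with hypothetical function names; it builds the delay vectors, the recurrence matrix for a threshold ε under the Euclidean or maximum norm, and the adjacency matrix with self-loops removed. It forms the full pairwise distance matrix, which needs O(N²) memory and so suits short series only.

```python
import numpy as np

def delay_embed(mu, m, tau):
    """Delay vectors x_i = (mu_i, mu_{i+tau}, ..., mu_{i+(m-1)tau}), Equation 4.6."""
    mu = np.asarray(mu, dtype=float)
    n_vec = len(mu) - (m - 1) * tau
    return np.column_stack([mu[k * tau : k * tau + n_vec] for k in range(m)])

def recurrence_network(mu, m, tau, eps, norm="euclidean"):
    """Adjacency matrix A = R - I of the recurrence network, Equations 4.4 to 4.7."""
    x = delay_embed(mu, m, tau)
    diff = x[:, None, :] - x[None, :, :]
    if norm == "max":
        dist = np.abs(diff).max(axis=2)              # maximum norm
    else:
        dist = np.sqrt((diff ** 2).sum(axis=2))      # Euclidean norm, Equation 4.5
    R = (dist < eps).astype(int)                     # Theta(eps - d): one for close pairs only
    return R - np.eye(len(x), dtype=int)             # subtract the Kronecker delta

rng = np.random.default_rng(0)
series = np.cumsum(rng.standard_normal(300))         # stand-in series for illustration
A = recurrence_network(series, m=3, tau=1, eps=2.0)
print("nodes:", A.shape[0], "edges:", A.sum() // 2)
```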

4.3.1 Methods

In our work, we apply the false nearest neighbours (FNN) method (Kennel et al. 1992) to estimate the embedding dimension. The FNN procedure is a method to


obtain the optimum embedding dimension for phase space reconstruction. By checking the neighbourhood of points embedded in projection manifolds of increasing dimension, the algorithm eliminates 'false neighbours', that is, points that appear close together only because of the projection and that separate in higher embedding dimensions.

FNN (False Nearest Neighbours)

It has become common in the analysis of observed time series from nonlinear systems to make a time-delay reconstruction of a phase space in which to view the dynamics (Kennel et al. 1992). This is accomplished by using time-delayed versions of an observed scalar quantity, $x(t_0 + n\Delta t) = x(n)$, as coordinates for the phase space. From the set of observations, multivariate vectors in d-dimensional space

$$y(n) = \left(x(n), x(n+T), \ldots, x(n+(d-1)T)\right) \tag{4.8}$$

are used to trace out the orbit of the system. The time evolution of the y's is given by $y(n) \to y(n+1)$. In practice, the natural questions of which time delay T and which embedding dimension d to use in this reconstruction have had a variety of answers.

Kennel et al. (1992) proposed the method of false nearest neighbours to estimate an appropriate value of d. The idea is as follows. In d dimensions, denote the r-th nearest neighbour of y(n) by $y^{(r)}(n)$. Then, from Equation 4.8, the square of the Euclidean distance between the point y(n) and this neighbour is

$$R_d^2(n,r) = \sum_{k=0}^{d-1} \left[x(n+kT) - x^{(r)}(n+kT)\right]^2. \tag{4.9}$$

Going from dimension d to dimension d + 1 by time delay embedding, we add a (d + 1)-th coordinate, x(n + Td), to each of the vectors y(n). We now ask: what is the Euclidean distance, measured in dimension d + 1, between y(n) and the same r-th neighbour as determined in dimension d? After the addition of the new (d + 1)-th coordinate, this distance is given by


$$R_{d+1}^2(n,r) = R_d^2(n,r) + \left[x(n+dT) - x^{(r)}(n+dT)\right]^2. \tag{4.10}$$

A natural criterion for catching embedding errors is that the increase in distance between y(n) and $y^{(r)}(n)$ is large when going from dimension d to dimension d + 1. The increase in distance follows directly from Equations 4.9 and 4.10. We state this criterion by designating as a false neighbour any neighbour for which

$$\left[\frac{R_{d+1}^2(n,r) - R_d^2(n,r)}{R_d^2(n,r)}\right]^{1/2} = \frac{\left|x(n+Td) - x^{(r)}(n+Td)\right|}{R_d(n,r)} > R_{tol}, \tag{4.11}$$

where $R_{tol}$ is a threshold. A problem arises when a point is the nearest neighbour of another without actually being close to it; as a result, the number of false nearest neighbours can increase again in higher dimensions. To handle this problem, a further criterion is implemented, the loneliness criterion, which is controlled by a loneliness tolerance threshold. The output of the procedure is the percentage of false nearest neighbours as a function of increasing embedding dimension, which decreases monotonically. The optimum embedding dimension can usually be found near the point where this curve crosses the 30% threshold.

We use the Matlab program by Mohammadi (http://ideas.repec.org/c/boc/bocode/t741510.html) to compute false nearest neighbours and find the proper embedding dimension for phase space reconstruction.
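For readers without Matlab, the two tests can be sketched in Python. This is not the Mohammadi program used here, and the default tolerances r_tol = 15 and a_tol = 2 are assumptions. The distance test is Equation 4.11 rewritten without division. The loneliness test is implemented as $R_{d+1}(n,r)/R_A > A_{tol}$, with $R_A$ taken as the standard deviation of the series, which is one common choice rather than something stated in the text. The brute-force nearest-neighbour search is O(N²) per dimension.

```python
import numpy as np

def fnn_fraction(x, d, T=1, r_tol=15.0, a_tol=2.0):
    """Fraction of false nearest neighbours when going from dimension d to d + 1.

    r_tol: threshold R_tol of Equation 4.11 (assumed default).
    a_tol: loneliness tolerance (assumed default); R_A is taken as the std of x.
    """
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - d * T                       # vectors that can gain a (d+1)-th coordinate
    Y = np.column_stack([x[k * T : k * T + n_vec] for k in range(d)])   # Equation 4.8
    extra = x[d * T : d * T + n_vec]             # new coordinate x(n + dT)
    r_a = np.std(x)
    n_false = 0
    for n in range(n_vec):
        dist = np.sqrt(((Y - Y[n]) ** 2).sum(axis=1))   # R_d(n, r) for every candidate, Eq. 4.9
        dist[n] = np.inf                                 # a point is not its own neighbour
        r = int(np.argmin(dist))                         # nearest neighbour in dimension d
        r_d = dist[r]
        gap = abs(extra[n] - extra[r])
        r_d1 = np.sqrt(r_d ** 2 + gap ** 2)              # R_{d+1}(n, r), Eq. 4.10
        false_by_distance = gap > r_tol * r_d            # Eq. 4.11 without division
        false_by_loneliness = r_d1 / r_a > a_tol
        if false_by_distance or false_by_loneliness:
            n_false += 1
    return n_false / n_vec

rng = np.random.default_rng(0)
series = np.cumsum(rng.standard_normal(500))     # stand-in series, not an FBM
fractions = [fnn_fraction(series, d) for d in range(1, 6)]
print([round(100 * f, 1) for f in fractions])    # percentage of FNN per dimension
```

The embedding dimension would then be read off where the percentage curve drops below the chosen threshold, as described above.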

4.3.2 Numerical results and discussion

We generated fractional Brownian motion series of length 3 × 10³ with Hurst indices H = 0.05, 0.1, 0.15, …, 0.95. For each series X = (x1, x2, …, x3000), assuming a time delay τ = 1, we use the false nearest neighbours method to find the proper embedding dimension m. In the m-dimensional embedding space we obtain a total of 3000 − (m − 1) vectors

$$y(n) = \left(x_n, x_{n+1}, \ldots, x_{n+(m-1)}\right), \quad n = 1, 2, \ldots, 3000 - (m-1).$$

Considering each vector as a node, we then calculate the Euclidean distance between every pair of nodes, which yields a weighted network. The network is


fully connected, and the weight of each edge is the Euclidean distance between the corresponding vectors. For these networks, we apply the random sequential box-covering method to calculate their fractal dimension. A typical fractal scaling result, for H = 0.8, is shown in Fig 4.2: there is a clear linear relationship (on logarithmic axes) between the box size r_B and the number N_B of boxes needed to cover the whole network.
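A rough Python sketch of this pipeline follows, with hypothetical function names. Because every edge weight is a Euclidean distance, the triangle inequality makes the direct edge the shortest weighted path between any two nodes, so the distance matrix itself gives the network distances. The covering rule is an assumption, since the box-covering details are given in Chapter 3 and may differ: nodes are visited in random order, and each visited node opens a box containing every still-uncovered node at distance less than r_B. The fractal dimension is then minus the slope of ln N_B against ln r_B. The radii and the stand-in series are illustrative only.

```python
import numpy as np

def weighted_network(x, m, tau=1):
    """Distance matrix of the fully connected network on the delay vectors y(n)."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    y = np.column_stack([x[k * tau : k * tau + n_vec] for k in range(m)])
    return np.sqrt(((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=2))

def count_boxes(dist, r_b, rng):
    """Random sequential covering: visit nodes in random order; each visited node
    opens a box holding every still-uncovered node at distance < r_b from it."""
    uncovered = np.ones(dist.shape[0], dtype=bool)
    n_boxes = 0
    for seed in rng.permutation(dist.shape[0]):
        new = uncovered & (dist[seed] < r_b)
        if new.any():
            n_boxes += 1
            uncovered &= ~new
        if not uncovered.any():
            break
    return n_boxes

def box_counting_dimension(dist, radii, seed=0):
    """Minus the slope of ln N_B against ln r_B, assuming N_B ~ r_B^(-d_B)."""
    rng = np.random.default_rng(seed)
    counts = np.array([count_boxes(dist, r, rng) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return -slope, counts

rng = np.random.default_rng(1)
series = np.cumsum(rng.standard_normal(400))     # stand-in series, not an FBM
dist = weighted_network(series, m=3)
radii = np.geomspace(0.5, 5.0, 8)
d_b, counts = box_counting_dimension(dist, radii)
print(counts, round(d_b, 2))
```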

Fig 4.2 Fractal scaling of weighted network of FBM

We then vary H from 0.05 to 0.95 in steps of 0.05, convert the resulting series into weighted networks and calculate their fractal dimensions. We summarise the numerical results in Table 4.1, where H is the Hurst index of each fractional Brownian motion, m is the embedding dimension, N is the number of nodes, d is the diameter of the network, fd is the fractal dimension and error is its estimation error. The results show that the fractal dimension of the weighted networks constructed from fractional Brownian motion decreases as the Hurst index increases, levelling off at about 1.1 for H ≥ 0.55.


Table 4.1 Fractal dimensions of FBM weighted networks

H     m   N      d        fd     error
0.05  4   2997   24.70    3.21   0.0727
0.1   4   2997   29.38    3.13   0.0932
0.15  4   2997   32.55    3.04   0.0524
0.2   4   2997   25.69    2.91   0.0602
0.25  4   2997   32.44    2.77   0.068
0.3   4   2997   36.99    2.54   0.0653
0.35  4   2997   53.12    2.32   0.0994
0.4   3   2998   40.55    2.08   0.0746
0.45  3   2998   72.35    1.5    0.0916
0.5   3   2998   94.52    1.27   0.0537
0.55  3   2998   127.90   1.11   0.0117
0.6   3   2998   121.53   1.12   0.0135
0.65  3   2998   148.16   1.10   0.0127
0.7   3   2998   114.21   1.12   0.0125
0.75  3   2998   103.57   1.14   0.0226
0.8   3   2998   174.99   1.07   0.0099
0.85  3   2998   161.37   1.08   0.0165
0.9   2   2999   221.75   1.07   0.0141
0.95  2   2999   290.75   1.06   0.0221

However, the weighted network is fully connected and may contain redundant information because of its large number of edges. We therefore consider reconstructing networks with fewer edges from time series. In the following sections, we adopt the horizontal visibility graph (HVG) method and investigate the multifractal behaviour of such networks.

4.4 Multifractal analysis of horizontal visibility graphs

4.4.1 Introduction to VG and HVG

Visibility graphs (VG)

A visibility graph (Lacasa et al. 2008 & 2009) is obtained by mapping a time series into a network according to the following visibility criterion: two arbitrary data points (ta, ya) and (tb, yb) in the time series have visibility, and consequently become two connected nodes in the associated graph, if every other data point (tc, yc) with ta < tc < tb fulfils


$$y_c < y_a + (y_b - y_a)\,\frac{t_c - t_a}{t_b - t_a}. \tag{4.12}$$

Properties:

(i) Connectedness: each node sees at least its nearest neighbours (left and right)

(ii) Undirectedness: the way the algorithm is built up, there is no direction

defined in the links

(iii) Invariant under affine transformations of the series data: the visibility

criterion is invariant under rescaling of both horizontal and vertical axes,

and under horizontal and vertical translations

Fig 4.3 Illustration of the visibility graph algorithm

An example of constructing a visibility graph from a time series is shown in Fig 4.3. A time series of 20 data points is depicted as vertical bars in the upper part, and the associated graph derived from the visibility algorithm is plotted in the bottom panel. Considering the bars as a landscape, we link every bar (every point of the time series) with all those that can be seen from the top of the considered one (grey lines), obtaining the associated graph (shown in the lower part of the figure). In this graph, every node corresponds, in the same order, to a point of the data series, and two nodes are


connected if visibility exists between the corresponding data, that is, if the straight line connecting the two data points (the “visibility line”) does not intersect any intermediate data height.
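Criterion 4.12 can be checked pair by pair. The Python sketch below is a deliberately simple brute-force version, O(N³) and therefore only suitable for short series; it illustrates the definition and is not the implementation of Lacasa et al. The function name is illustrative.

```python
def visibility_graph(t, y):
    """Edges (a, b), a < b, of the natural visibility graph (criterion 4.12)."""
    n = len(y)
    edges = set()
    for a in range(n - 1):
        for b in range(a + 1, n):
            # every intermediate point must lie strictly below the line from a to b
            if all(y[c] < y[a] + (y[b] - y[a]) * (t[c] - t[a]) / (t[b] - t[a])
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges

series = [3, 1, 2, 5, 1]
print(sorted(visibility_graph(range(len(series)), series)))
```

Neighbouring points are always linked (the `all` over an empty range is true), which is the connectedness property listed above.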

Horizontal visibility graph (HVG)

The horizontal visibility graph (HVG) proposed by Luque et al. (2009) also maps time series into networks (graphs). It is defined as follows. For a time series of N data points X = {x1, x2, ..., xN}, each data point of X is assigned to a node in the network. Two nodes xi and xj are connected by an edge if one can draw a horizontal line in the time series joining xi and xj that does not intersect any data point in between (illustrated in Fig 4.4). Thus nodes i and j are connected if the following geometrical criterion is fulfilled within the time series:

$$x_i, x_j > x_n \quad \text{for all } n \text{ such that } i < n < j. \tag{4.13}$$

Fig 4.4 Illustration of the horizontal visibility algorithm (Luque et al. 2009).

For the time series of 20 data points X = {x1, x2, ..., x20} plotted in the upper part, a horizontal visibility graph can be generated as shown in the bottom. Each datum in the series corresponds to a node in the graph, such that two nodes are connected if


their corresponding data heights are larger than all the data heights between them. The data heights are shown explicitly in the top part.

Properties of HVG

Based on the above definitions, HVGs have the following properties (Luque et al. 2009):

(i) Connectedness: each node in an HVG is connected at least to its nearest neighbours on both sides.

(ii) Invariance under affine transformations of the series data: the visibility criterion is invariant under rescaling of both horizontal and vertical axes, as well as under horizontal and vertical translations.

(iii) Undirectedness and unweightedness: the way the algorithm is built, there is no direction defined in the edges, and no weight is assigned to any node or edge.
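Because criterion 4.13 only compares heights, a direct implementation can stop scanning to the right of node i as soon as it meets a value at least as high as x_i: no later node can be seen past it. The Python sketch below uses this shortcut. It is an illustrative implementation with a hypothetical function name, not the code of Luque et al., and its worst case is O(N²).

```python
import random

def horizontal_visibility_graph(x):
    """Edges (i, j), i < j, of the horizontal visibility graph (criterion 4.13)."""
    edges = set()
    n = len(x)
    for i in range(n - 1):
        highest = float("-inf")          # tallest value strictly between i and j
        for j in range(i + 1, n):
            if highest < min(x[i], x[j]):
                edges.add((i, j))
            highest = max(highest, x[j])
            if highest >= x[i]:
                break                    # x_i is now blocked for every later j
    return edges

random.seed(0)
series = [random.random() for _ in range(1000)]
edges = horizontal_visibility_graph(series)
print(len(series), "nodes,", len(edges), "edges")
```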

4.4.2 Multifractal analysis of FBM HVG

In this section, we first generate nine fractional Brownian motions with Hurst index H = 0.1, 0.2, …, 0.9. Each of these time series contains 3000 values. For each time series, we build a horizontal visibility network. For these networks, we apply the random sequential box-covering algorithm to calculate their fractal dimension. Fig 4.5 shows the fractal scaling for three HVGs, built on Brownian motion (H = 0.5, represented by ‘°’) and on fractional Brownian motions with H = 0.3 and H = 0.8 (represented by ‘*’ and ‘·’ respectively).


Fig 4.5 Fractal scaling for FBM HVG

Table 4.2 Numerical results of FBM HVG

H     N      E      d     r         dB     error   max Dq   lim Dq   ∆Dq
0.1   3000   5957   45    2 ~ 10    2.04   0.09    2.3      1.93     0.37
0.2   3000   5960   48    2 ~ 9     1.96   0.10    2.08     2.01     0.07
0.3   3000   5924   78    1 ~ 13    1.74   0.09    1.84     1.6      0.24
0.4   3000   5912   104   1 ~ 14    1.65   0.06    1.67     1.57     0.1
0.5   3000   5828   170   2 ~ 15    1.58   0.02    1.59     1.48     0.11
0.6   3000   5758   272   2 ~ 19    1.43   0.02    1.52     1.4      0.12
0.7   3000   5743   329   1 ~ 20    1.33   0.02    1.38     1.23     0.15
0.8   3000   5579   525   1 ~ 20    1.21   0.02    1.23     1.18     0.15
0.9   3000   5340   810   1 ~ 20    1.13   0.01    1.19     1.06     0.13

This fractal scaling test confirms the roughly linear relation between the fractal dimension of the HVG and the Hurst index observed by Xie and Zhou (2011): dB ≈ 1.96 − 1.05H for 0 < H < 0.3 and dB ≈ 2 − H for 0.3 < H < 1. This implies that the fractal dimensions of FBM HVGs are just slightly less than the fractal dimensions of the associated FBMs. In other words, horizontal visibility graphs inherit the fractality of fractional Brownian motions.
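As a quick numerical check on the relation dB ≈ 2 − H, one can fit a least-squares line to the dB column of Table 4.2 for H ≥ 0.3. In the Python sketch below the numbers are copied from that table; the fit gives a slope close to −1 and an intercept close to 2, consistent with the relation quoted above.

```python
import numpy as np

# Hurst index H and box-covering dimension d_B of the FBM HVGs, H >= 0.3 (Table 4.2)
H = np.array([0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
d_B = np.array([1.74, 1.65, 1.58, 1.43, 1.33, 1.21, 1.13])

slope, intercept = np.polyfit(H, d_B, 1)   # least-squares line d_B = intercept + slope * H
print(f"fit: d_B = {intercept:.2f} + ({slope:.2f}) H")
```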


We next apply our newly proposed fixed-size box-covering method (detailed in Chapter 3) to detect their multifractal behaviour. We summarize the numerical results in Table 4.2, which includes the Hurst index (H), number of nodes (N), number of edges (E), diameter of the network (d), range of box sizes used for the fractal scaling fit (r), fractal dimension (dB) with its error, maximum value of Dq, limit of Dq, and ∆Dq. The generalized fractal dimensions Dq are shown in Fig 4.6 and Fig 4.7.

In Fig 4.6 we plot the Dq curves, with error bars, for three horizontal visibility networks: Brownian motion with Hurst index H = 0.5 (represented by ‘°’) and fractional Brownian motions with Hurst indices H = 0.3 and 0.8 (represented by ‘*’ and ‘·’ respectively). Their Dq curves are nearly flat and stay close to the fractal dimension dB of the corresponding HVG.

Fig 4.6 The Dq curves for FBM with error bars.

Fig 4.7 gives more typical examples, with H varying from 0.1 to 0.9 in steps of 0.1. The Dq curves of the FBM HVGs are approximately flat lines, so these networks are not multifractal. Another interesting property is that, for 0 < H < 1, Dq decreases as H increases. More specifically, as H increases, the fractional Brownian motion becomes smoother, resulting in


fewer edges in the corresponding HVG, and both the maximum value and the limit of Dq decrease. The values of ∆Dq are all within the error range, confirming that the Dq curves are essentially flat.

Fig 4.7 The Dq curves for FBM HVG

4.4.3 Multifractal analysis of HVG of binomial measure

Binomial measure

Mandelbrot's binomial measure is the simplest multifractal on the interval [0, 1]. It is the limit of an iterative procedure called a multiplicative cascade (Mandelbrot et al. 1997).

The generating process of the binomial measure is shown in Fig 4.8. Let p1 and p2 be two positive numbers with p1 + p2 = 1. At step k = 0, we start the construction with the uniform probability measure µ0 on [0, 1]. At step k = 1, the measure µ1 uniformly spreads mass equal to p1 on the subinterval [0, ½] and mass equal to p2 on [½, 1]; the figure shows the density of µ1 for p1 = 0.3 and p2 = 0.7. At step k = 2, the set [0, ½] is split into two subintervals [0, ¼] and [¼, ½], which respectively receive a


fraction p1 and p2 of the total mass µ1([0, ½]). We apply the same procedure to the dyadic set [½, 1] and obtain µ2([0, ¼]) = p1p1, µ2([¼, ½]) = p1p2, µ2([½, ¾]) = p2p1 and µ2([¾, 1]) = p2p2.

Fig 4.8 Illustration of a binomial measure

Fig 4.9 Binomial multifractal when p1 = 0.3 and p2 = 0.7


Iteration of this procedure generates an infinite sequence of measures (Mandelbrot et al. 1997). At step k + 1, we assume that the measure µk has been defined and construct µk+1 as follows. Consider an interval $[t, t + 2^{-k}]$, where the dyadic number t is of the form

$$t = 0.\eta_1 \eta_2 \ldots \eta_k = \sum_{i=1}^{k} \eta_i 2^{-i} \tag{4.14}$$

in the counting base b = 2. We uniformly spread fractions p1 and p2 of the mass $\mu_k([t, t + 2^{-k}])$ on the subintervals $[t, t + 2^{-k-1}]$ and $[t + 2^{-k-1}, t + 2^{-k}]$. Repeating this scheme on all subintervals determines µk+1, which is now well defined. Fig 4.9 represents the measure µ12 at step k = 12 of the recursion, where the series contains $2^{12}$ values.
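The recursion above takes only a few lines of array code. In the Python sketch below (the function name is illustrative), each step replaces every interval mass m by the pair (p1·m, p2·m), so the entry with binary address η1…ηk equals $p_1^{k\varphi_0} p_2^{k\varphi_1}$. For p1 = 0.2 and p1 = 0.4 with k = 3 it reproduces the series X and Y listed in the HVG discussion of this section.

```python
import numpy as np

def binomial_measure(p1, k):
    """Masses of the 2**k dyadic intervals at step k of the binomial cascade.

    Each interval passes a fraction p1 of its mass to its left half and
    p2 = 1 - p1 to its right half.
    """
    mu = np.array([1.0])                      # step 0: uniform measure, total mass 1
    for _ in range(k):
        # interleave left and right children: [p1*m0, p2*m0, p1*m1, p2*m1, ...]
        mu = np.column_stack((p1 * mu, (1.0 - p1) * mu)).ravel()
    return mu

mu12 = binomial_measure(0.3, 12)              # a series like that of Fig 4.9 (2**12 values)
print(len(mu12), mu12.sum())
```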

Properties of binomial measure

Consider the interval $[t, t + 2^{-k}]$, where $t = 0.\eta_1 \eta_2 \ldots \eta_k$ in the counting base b = 2. Let φ0 and φ1 denote the relative frequencies of 0's and 1's in the binary development of t. The measure of the dyadic interval then simplifies to

$$\mu\left([t, t + 2^{-k}]\right) = p_1^{k\varphi_0}\, p_2^{k\varphi_1}. \tag{4.15}$$

The binomial measure has important characteristics common to many multifractals.

It is a singular probability measure; it thus has no density and no point mass. We also

observe that since p1 + p2 = 1, each step of the construction preserves the mass of

split dyadic intervals (Mandelbrot et al. 1997).

Multifractal analysis of binomial measure

Viewing the binomial measure sequence as a time series, multifractal analysis can then be undertaken on the sequence. In the one-dimensional case, for a given measure 0 ≤ µ ≤ 1, we consider the partition sum as in Equation 3.15,

$$Z_\varepsilon(q) = \sum_{\mu(B) \neq 0} \left[\mu(B)\right]^q,$$

the exponent τ(q) of the measure µ is defined by Equation 3.17,

$$\tau(q) = \lim_{\varepsilon \to 0} \frac{\ln Z_\varepsilon(q)}{\ln \varepsilon},$$

and the generalized fractal dimensions of the measure µ are defined by Equation 3.18 and Equation 3.19:


$$D_q = \frac{\tau(q)}{q-1}, \quad q \neq 1, \qquad D_1 = \lim_{\varepsilon \to 0} \frac{Z_{1,\varepsilon}}{\ln \varepsilon}, \quad q = 1,$$

where $Z_{1,\varepsilon} = \sum_{\mu(B) \neq 0} \mu(B) \ln \mu(B)$. For p1 = 0.3 and p2 = 0.7, the Dq of the binomial measure are shown in Fig 4.10. Following the thermodynamic formulation of multifractal measures, Canessa (2000) derived an expression for the “analogous” specific heat,

$$C_q \equiv -\frac{\partial^2 \tau(q)}{\partial q^2} \approx 2\tau(q) - \tau(q+1) - \tau(q-1). \tag{4.16}$$

He showed that, for financial time series, the form of Cq resembles a classical phase transition at a critical point. The Cq curve of the binomial measure with p1 = 0.3 and p2 = 0.7 is shown in Fig 4.11.

Fig 4.10 Dq of binomial measure when p1 = 0.3 and p2 = 0.7



Fig 4.11 Cq of binomial measure when p1 = 0.3 and p2 = 0.7
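For a measure given on a dyadic grid, the definitions of τ(q), Dq and Cq above can be sketched as follows. This is a minimal fixed-size box-counting illustration under stated assumptions, not the thesis's own implementation:

- box sizes are taken as ε = 2^{-j};
- τ(q) and D1 are estimated as least-squares slopes against ln ε;
- Cq uses the finite-difference form of Equation 4.16 on a grid of consecutive integer q values;
- the helper names are hypothetical.

For the binomial measure the partition sum is exactly $Z_\varepsilon(q) = (p_1^q + p_2^q)^j$, so the estimate can be compared with $\tau(q) = -\log_2(p_1^q + p_2^q)$.

```python
import numpy as np

def binomial_measure(p1, k):
    """2**k box masses of the binomial cascade (p1 to the left half, 1 - p1 to the right)."""
    mu = np.array([1.0])
    for _ in range(k):
        mu = np.kron(mu, [p1, 1.0 - p1])
    return mu

def multifractal_spectrum(mu, qs, levels):
    """Estimate tau(q), D_q and C_q of a measure given on 2**k dyadic boxes.

    qs     : consecutive integer q values (q - 1 and q + 1 must be on the grid)
    levels : dyadic levels j, i.e. box sizes eps = 2**-j, with 2**j <= len(mu)
    """
    log_eps, log_z, z1 = [], [], []
    for j in levels:
        boxes = mu.reshape(2 ** j, -1).sum(axis=1)     # mass of each box of size 2**-j
        boxes = boxes[boxes > 0]                       # keep boxes with mu(B) != 0
        log_eps.append(-j * np.log(2.0))
        log_z.append([np.log(np.sum(boxes ** q)) for q in qs])   # ln Z_eps(q)
        z1.append(np.sum(boxes * np.log(boxes)))                  # Z_{1,eps}
    # tau(q) and D_1 are the least-squares slopes against ln(eps)
    tau = np.polyfit(log_eps, np.array(log_z), 1)[0]
    d1 = np.polyfit(log_eps, z1, 1)[0]
    qs = np.asarray(qs, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        dq = np.where(qs == 1, d1, tau / (qs - 1))
    # Equation 4.16: C_q ~ 2 tau(q) - tau(q+1) - tau(q-1), on interior grid points
    cq = 2 * tau[1:-1] - tau[2:] - tau[:-2]
    return tau, dq, cq

qs = np.arange(-10, 11)
mu = binomial_measure(0.3, 12)
tau, dq, cq = multifractal_spectrum(mu, qs, levels=range(1, 11))
```

Because τ(q) is concave, the finite difference in Equation 4.16 is non-negative. For p1 ≠ p2, Dq decreases strictly with q.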

Multifractal analysis of binomial measure HVG

Different values of p1 and p2 generate different binomial measure series.

(i) Take any pair p1 < p2. As above, we uniformly spread fractions p1 and p2 of the mass $\mu_k([t, t+2^{-k}])$ over the subintervals $[t, t+2^{-k-1}]$ and $[t+2^{-k-1}, t+2^{-k}]$, and repeating this scheme on all subintervals determines $\mu_{k+1}$. By Equation 4.15, each value at step k has the form $p_1^{k-n} p_2^{n}$, where n is the number of 1's in the binary expansion of t. For p1 < p2 this is strictly increasing in n, so the relative order of the values in the sequence is fixed and does not depend on the particular choice of p1 < p2. Since the HVG depends only on the relative order of the values, whether two nodes are linked by an edge is the same in every HVG with p1 < p2.

For example, for p1 = 0.2, p2 = 0.8 and for p1 = 0.4, p2 = 0.6 at step k = 3, we obtain the two binomial measure series X = {x1 = 0.008, x2 = 0.032, x3 = 0.032, x4 = 0.128, x5 = 0.032, x6 = 0.128, x7 = 0.128, x8 = 0.512} and Y = {y1 = 0.064, y2 = 0.096, y3 = 0.096, y4 = 0.144, y5 = 0.096, y6 = 0.144, y7 = 0.144, y8 = 0.216}, respectively. For all i, j = 1, 2, …, 8:

- xi < xj implies yi < yj;
- xi = xj implies yi = yj;
- xi > xj implies yi > yj.

Hence the horizontal visibility graphs of X and Y are identical, as shown in Fig 4.12 A.



Fig 4.12 Illustration of HVG for binomial measure

(ii) Swapping p1 and p2 leaves the HVG unchanged up to relabelling of the nodes. The reason is that the binomial measure series for (p2, p1) is the series for (p1, p2) in reverse order: replacing every binary digit of t by its complement reverses the order of the dyadic intervals and exchanges the roles of p1 and p2.

For example, for p1 = 0.1, p2 = 0.9 and for p1 = 0.9, p2 = 0.1 at step k = 4, we obtain the two binomial measure series

X = {0.0001, 0.0009, 0.0009, 0.0081, 0.0009, 0.0081, 0.0081, 0.0729, 0.0009, 0.0081, 0.0081, 0.0729, 0.0081, 0.0729, 0.0729, 0.6561}

Y = {0.6561, 0.0729, 0.0729, 0.0081, 0.0729, 0.0081, 0.0081, 0.0009, 0.0729, 0.0081, 0.0081, 0.0009, 0.0081, 0.0009, 0.0009, 0.0001}.

Here $x_i = y_{17-i}$ for i = 1, 2, …, 16. The horizontal visibility graph of Y is therefore the mirror image of that of X (node i of one corresponds to node 17 − i of the other), and the two graphs are isomorphic, as shown in Fig 4.12 B and C.
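Both observations rest on the fact that the horizontal visibility criterion only compares values, so the HVG depends only on the relative order of the series. The Python sketch below checks them for small k. It is an illustration under these assumptions:

- the HVG is built with an O(n) monotone-stack construction;
- the helper names are hypothetical;
- the series are generated directly from Equation 4.15, with p1 and p2 passed explicitly. This keeps tied values bit-for-bit equal in floating point.

```python
def binomial_series(p1, p2, k):
    """Binomial measure at step k via Equation 4.15: p1**(#0 digits) * p2**(#1 digits)."""
    series = []
    for j in range(2 ** k):
        ones = bin(j).count("1")
        series.append(p1 ** (k - ones) * p2 ** ones)
    return series

def hvg_edges(x):
    """Edges (i, j), i < j, of the horizontal visibility graph of x.

    i and j are linked iff every value strictly between them is smaller
    than min(x[i], x[j]); a monotone stack finds all edges in O(n).
    """
    edges, stack = set(), []
    for j, xj in enumerate(x):
        while stack and x[stack[-1]] < xj:     # lower values see j, then are hidden
            edges.add((stack.pop(), j))
        if stack:
            edges.add((stack[-1], j))          # first value >= xj to the left
            if x[stack[-1]] == xj:             # the equal earlier value is now hidden
                stack.pop()
        stack.append(j)
    return edges

# (i) any p1 < p2 gives the same HVG
X, Y = binomial_series(0.2, 0.8, 3), binomial_series(0.4, 0.6, 3)
print(hvg_edges(X) == hvg_edges(Y))

# (ii) swapping p1 and p2 reverses the series, so the HVG is mirrored: i -> n-1-i
n = 2 ** 4
A, B = binomial_series(0.1, 0.9, 4), binomial_series(0.9, 0.1, 4)
mirrored = {(n - 1 - j, n - 1 - i) for i, j in hvg_edges(A)}
print(A == B[::-1], mirrored == hvg_edges(B))
```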


As an example, we take p1 = 0.3, p2 = 0.7 and construct the horizontal visibility graph of the resulting binomial measure, which has 4096 nodes and 6131 edges. We then apply the modified fixed-size box-covering method proposed in Chapter 3 to detect its multifractal behaviour. Dq and Cq of the resulting network are shown in Fig 4.13 and Fig 4.14, respectively. The decrease of Dq as a function of q indicates clear multifractal behaviour.

Fig 4.13 Dq of Binomial multifractal network

Fig 4.14 Cq of binomial multifractal network



These results indicate that the horizontal visibility graphs inherit the multifractality of the binomial multifractal measure. More specifically, the horizontal visibility graphs of fractional Brownian motions, which are simple fractals, are fractal, while the horizontal visibility graphs of the generated binomial multifractal measures are multifractal.

4.4.4 Multifractal analysis of measure representation of genome HVG

Measure representation of complete genomes

We first outline the method of Yu et al. (2001) for deriving the measure representation of a DNA sequence. We call any string made up of K letters from the set {g, c, a, t} a K-string. For a given K there are in total $4^K$ different K-strings. In order to count the number of each kind of K-string in a given DNA sequence, $4^K$ counters are needed. We divide the interval [0, 1) into $4^K$ disjoint subintervals, and use each subinterval to represent a counter. Letting $s = s_1 \cdots s_K$, $s_i \in \{a, c, g, t\}$, $i = 1, 2, \ldots, K$, be a substring of length K, we define

$$x_l(s) = \sum_{i=1}^{K} \frac{x_i}{4^i}, \tag{4.17}$$

where

$$x_i = \begin{cases} 0, & \text{if } s_i = a,\\ 1, & \text{if } s_i = c,\\ 2, & \text{if } s_i = g,\\ 3, & \text{if } s_i = t, \end{cases} \tag{4.18}$$

and

$$x_r(s) = x_l(s) + \frac{1}{4^K}. \tag{4.19}$$

We then use the subinterval $[x_l(s), x_r(s))$ to represent substring s. Let $N_K(s)$ be the number of times that substring s of length K appears in the complete genome. If the number of bases in the complete genome is L, we define

$$F_K(s) = \frac{N_K(s)}{L - K + 1} \tag{4.20}$$

to be the frequency of substring s. It follows that $\sum_{\{s\}} F_K(s) = 1$. Now we can define a measure $\mu_K$ on [0, 1] by $d\mu_K(x) = Y_K(x)\,dx$, where

$$Y_K(x) = 4^K F_K(s), \quad \text{when } x \in [x_l(s), x_r(s)). \tag{4.21}$$

It is easy to see that $\int_0^1 d\mu_K(x) = 1$ and $\mu_K([x_l(s), x_r(s))) = F_K(s)$. Then $\mu_K$ is called the measure representation of the organism corresponding to the given K. As an example, the measure representation of substrings in the genome of B. burgdorferi for K = 6 is given in Fig 4.15. Self-similarity is apparent in the measure.

Fig 4.15 Measure representation of B. burgdorferi
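Equations 4.17 to 4.21 can be sketched in Python. The sketch also produces the sequence F(t), the frequencies ordered by increasing $x_l(s)$, that is used below. It is a minimal illustration with these assumptions:

- the helper names are hypothetical;
- overlapping windows are counted, following the denominator L − K + 1 of Equation 4.20;
- windows containing any symbol other than a, c, g, t are simply skipped. With such symbols present, the frequencies then sum to slightly less than 1.

```python
from collections import Counter

CODE = {"a": 0, "c": 1, "g": 2, "t": 3}          # x_i of Equation 4.18

def left_endpoint(s):
    """x_l(s) = sum_{i=1}^{K} x_i / 4**i (Equation 4.17)."""
    return sum(CODE[ch] / 4 ** i for i, ch in enumerate(s, start=1))

def measure_representation(genome, K):
    """Return [F_K(s)] ordered by increasing x_l(s): the sequence F(t), t = 1..4**K.

    F_K(s) = N_K(s) / (L - K + 1) (Equation 4.20). Windows containing a symbol
    other than a, c, g, t are skipped (an assumption of this sketch).
    """
    genome = genome.lower()
    n_windows = len(genome) - K + 1                 # L - K + 1 overlapping K-strings
    counts = Counter(genome[i:i + K] for i in range(n_windows))
    F = [0.0] * 4 ** K
    for s, n in counts.items():
        if all(ch in CODE for ch in s):
            # 4**K * x_l(s) is the integer position of [x_l(s), x_r(s)) in [0, 1)
            index = sum(CODE[ch] * 4 ** (K - i) for i, ch in enumerate(s, start=1))
            F[index] = n / n_windows
    return F

# Hypothetical random sequence, not a real genome
import random
rng = random.Random(0)
toy = "".join(rng.choice("acgt") for _ in range(10_000))
F = measure_representation(toy, K=6)                # 4**6 = 4096 values
```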

Remark: The ordering of a, c, g, t in Equation 4.18 follows the natural dictionary ordering of K-strings in one-dimensional space. A different ordering of K-strings would change the nature of the correlations. In our case, however, a different ordering of a, c, g, t in Equation 4.18 gives almost the same Dq curve (and therefore almost the same Cq curve) when |q| is relatively small, so it would not change our results. When we compare different bacteria using the measure representation, the ordering of a, c, g, t, once chosen, is kept fixed for all bacteria.



Fig 4.16 Dq of measure representations of bacteria

Fig 4.17 Cq of measure representations of bacteria

We can order the values F(s) by increasing $x_l(s)$. This gives a sequence of $4^K$ real numbers, which we denote by F(t), t = 1, 2, …, $4^K$. Viewing the sequence $\{F(t)\}_{t=1}^{4^K}$ as a time series, we use the fixed-size box-counting algorithm to calculate Dq for the measure representations of five bacteria, as shown in Fig 4.16. The Cq curves of these measure representations are shown in Fig 4.17.

Multifractal analysis of measure representations of genome HVG

Consider the measure representation of a DNA sequence as a time series. We construct the HVG of each series. For these networks, we apply the random sequential box-covering method to calculate their fractal dimension, and then apply our modified fixed-size box-covering method to detect their multifractal behaviour.

We consider five bacteria:

1. Archaeoglobus fulgidus (A. fulgidus)

2. Borrelia burgdorferi (B. burgdorferi)

3. Chlamydia trachomatis serovar (C. trachomatis)

4. Escherichia coli (E. coli)

5. Mycoplasma genitalium (M. genitalium)

The numerical results are summarised in Table 4.3 including organisms, length of

each measure representation series (L), number of nodes in the corresponding HVG

(N), number of edges in the corresponding HVG (E), diameter of the corresponding

HVG (d), fractal dimension of the corresponding HVG (fd) with error, maximum of

Dq, limit of Dq and ∆ Dq.

Table 4.3 Numerical results of measure representations of genome HVG

organism A. fulgidus B. burgdorferi C. trachomatis E. coli M. genitalium

L 4096 4096 4096 4096 4096

N 4096 4096 4096 4096 4096

E 8164 8139 8134 8163 8093

d 25 27 27 33 24

fd 1.7806 1.775 1.8898 1.873 1.8013

error 0.1079 0.0871 0.0852 0.14 0.1126

max Dq 2.77 2.65 2.68 2.57 2.72

lim Dq 2.04 2.24 2.08 1.92 2.22

∆Dq 0.73 0.41 0.6 0.65 0.5



Dq and Cq of the measure representation HVGs are shown in Fig 4.18 and Fig 4.19, respectively. The decrease of Dq as a function of q indicates clear multifractal behaviour.

Fig 4.18 Dq of measure representation of bacteria HVG

Fig 4.19 Cq of measure representation of bacteria HVG



4.5 Resilience of visibility graphs and horizontal visibility graphs

4.5.1 Resilience of visibility graphs

Resilience can be understood as the ability of a system to return to a stable state following a strong perturbation caused by failure, disaster or attack. Ip and Wang (2011) represented transportation networks by an undirected graph with cities as nodes and traffic roads as edges. They then evaluated and analysed the resilience and friability of the railway network within the Chinese mainland. Rodrigues et al. (2011) investigated how resilience is determined by the large-scale features of protein-protein interaction networks. Their results revealed that the fraction of hubs and the average neighbour degree contribute significantly to the resilience of the networks.

Many other research studies have discussed the resilience of different systems, such

as the resilience of ecosystems (Vergano and Nunes 2007), strategies to enhance

resilience in hospitals (Simonis 2006) and resilient topology structures of computer

networks (Miller and Xiao 2007).

Resilience of scale-free networks

It is commonly believed that scale-free networks are robust to massive numbers of random node deletions (Cormen et al. 2001 a, b). Such networks may be subject to random breakdowns, in which a fraction p of the nodes and their connections are removed at random. Their integrity may then be compromised: when p exceeds a certain threshold, p > pc, the network disintegrates into smaller, disconnected parts. Below that critical threshold, there still exists a connected cluster that spans the entire system (its size is proportional to that of the entire system).

Random breakdown in networks can be seen as a case of infinite-dimensional

percolation (Cormen et al. 2001 a). Cormen et al. considered random breakdown in

the scale-free networks (such as the Internet) and introduced an analytical approach

to finding the critical point. They introduced a new general criterion for the

percolation critical threshold of randomly connected networks. Using this criterion,

they showed analytically that the Internet undergoes no transition under random



breakdown of its nodes. In other words, a connected cluster of sites that spans the

Internet survives even for arbitrarily large fractions of crashed sites.

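Consider a node with initial connectivity $k_0$, chosen from an initial distribution $P(k_0)$. After the random breakdown, the new connectivity k of the node follows the distribution $\binom{k_0}{k}(1-p)^k p^{k_0-k}$, and the new degree distribution is

$$P'(k) = \sum_{k_0=k}^{\infty} P(k_0) \binom{k_0}{k} (1-p)^k p^{k_0-k}. \tag{4.22}$$

Using this new distribution, we obtain $\langle k \rangle' = \langle k_0 \rangle (1-p)$ and $\langle k^2 \rangle' = \langle k_0^2 \rangle (1-p)^2 + \langle k_0 \rangle p(1-p)$. The critical point is reached when $\langle k^2 \rangle'/\langle k \rangle' = 2$, so we have

$$\frac{\langle k_0^2 \rangle}{\langle k_0 \rangle}(1-p_c) + p_c = 2 \quad \text{or} \quad 1 - p_c = \frac{1}{\kappa_0 - 1}, \tag{4.23}$$

where $\kappa_0 = \langle k_0^2 \rangle / \langle k_0 \rangle$ is computed from the original distribution before the random breakdown.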
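The moment relations and the threshold of Equation 4.23 can be checked exactly on a small example. The sketch assumes that each of a node's k0 neighbours is removed independently with probability p, which gives the binomial kernel of Equation 4.22. The toy degree distribution and the helper names are hypothetical and do not come from the thesis.

```python
from math import comb

def thinned_distribution(P0, p):
    """P'(k) of Equation 4.22 after random breakdown with removal probability p.

    P0 maps an initial degree k0 to P(k0); each of a node's k0 neighbours is
    assumed to be removed independently with probability p.
    """
    kmax = max(P0)
    return {k: sum(P0.get(k0, 0.0) * comb(k0, k) * (1 - p) ** k * p ** (k0 - k)
                   for k0 in range(k, kmax + 1))
            for k in range(kmax + 1)}

def first_two_moments(P):
    """Return (<k>, <k^2>) of a degree distribution given as a dict."""
    return (sum(k * w for k, w in P.items()),
            sum(k * k * w for k, w in P.items()))

def critical_fraction(P0):
    """p_c from 1 - p_c = 1 / (kappa_0 - 1) (Equation 4.23); needs kappa_0 > 2."""
    m1, m2 = first_two_moments(P0)
    return 1 - 1 / (m2 / m1 - 1)

# Hypothetical toy distribution, not data from the thesis
P0 = {1: 0.2, 2: 0.3, 5: 0.5}
pc = critical_fraction(P0)
m1, m2 = first_two_moments(thinned_distribution(P0, pc))
print(pc, m2 / m1)    # at p = p_c the ratio <k^2>'/<k>' is 2 up to rounding
```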

For example, in the case of the Internet, it is widely believed that, to a good approximation, the degree distribution of the Internet nodes follows a power law (Cormen et al. 2001 a, b):

$$P(k) = c k^{-\alpha}, \quad k = m, m+1, \ldots, K, \tag{4.24}$$

where α ≈ 5/2, c is an appropriate normalization constant, and m is the smallest possible degree. In a finite network, the largest degree K can be estimated from

$$\int_K^{\infty} P(k)\, dk = \frac{1}{N}, \tag{4.25}$$

yielding $K \approx m N^{1/(\alpha-1)}$. For the Internet, m = 1 and $K \approx N^{2/3}$.

For the sake of generality, consider a range of variables, α > 1 and $1 \le m \ll K$. According to Equation 4.23, the key parameter is the ratio of the second to the first moment, $\kappa_0$. We compute it by approximating the distribution (Equation 4.24) by a continuum. This approximation becomes exact for $1 \ll m \le K$, and it preserves the essential features of the transition even for small m:



$$\kappa_0 = \frac{2-\alpha}{3-\alpha}\,\frac{K^{3-\alpha} - m^{3-\alpha}}{K^{2-\alpha} - m^{2-\alpha}}. \tag{4.26}$$

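When $K \gg m$, this may be approximated as

$$\kappa_0 \to \left|\frac{2-\alpha}{3-\alpha}\right| \times \begin{cases} m, & \text{if } \alpha > 3,\\ m^{\alpha-2} K^{3-\alpha}, & \text{if } 2 < \alpha < 3,\\ K, & \text{if } 1 < \alpha < 2. \end{cases} \tag{4.27}$$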

We see that for α > 3 the ratio κ0 is finite and there is a percolation transition at $1 - p_c = \left(\frac{\alpha-2}{\alpha-3}\, m - 1\right)^{-1}$. For p > pc, the spanning cluster is fragmented and the network is destroyed.

However, for α < 3 the ratio κ0 diverges with K, and so pc → 1 when K → ∞ (or N → ∞). The percolation transition does not take place: a spanning cluster exists for arbitrarily large fractions of breakdown, p < 1. In finite systems a transition is always observed, though for α < 3 the transition threshold is exceedingly high. For the case of the Internet (α ≈ 2.5), $\kappa_0 \approx K^{1/2} \approx N^{1/3}$. Considering the enormous size of the Internet, $N > 10^6$, one needs to destroy over 99% of the nodes before the spanning cluster collapses.
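The Internet estimate can be reproduced numerically from Equations 4.23, 4.25 and 4.26. This is a minimal sketch with hypothetical helper names. It uses the continuum expression of Equation 4.26, which is 0/0 at α = 2 and α = 3, so those two values are rejected.

```python
def kappa0_power_law(alpha, m, K):
    """kappa_0 = <k^2>/<k> for P(k) ~ k**-alpha on [m, K], continuum form of Equation 4.26.

    The expression is 0/0 at alpha = 2 and alpha = 3, so those values are excluded.
    """
    if alpha in (2, 3):
        raise ValueError("Equation 4.26 needs alpha != 2, 3")
    return ((2 - alpha) / (3 - alpha)
            * (K ** (3 - alpha) - m ** (3 - alpha))
            / (K ** (2 - alpha) - m ** (2 - alpha)))

def critical_fraction(kappa0):
    """p_c from 1 - p_c = 1 / (kappa_0 - 1) (Equation 4.23)."""
    return 1 - 1 / (kappa0 - 1)

# Internet-like parameters quoted above: alpha = 5/2, m = 1, K = m * N**(1/(alpha - 1))
for N in (1e6, 1e7, 1e8):
    K = N ** (1 / (2.5 - 1))
    kappa0 = kappa0_power_law(2.5, 1, K)
    print(f"N = {N:.0e}: K = {K:.0f}, kappa0 = {kappa0:.1f}, p_c = {critical_fraction(kappa0):.4f}")
```

For α > 3 the same function tends to the finite limit $\frac{\alpha-2}{\alpha-3} m$ as K grows, which reproduces the finite threshold given above.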

Degree distribution of FBM VG

Lacasa et al. (2009) showed that the visibility graphs derived from generic fractional Brownian motion are also scale-free. That is, their degree distribution follows a power law $P(k) \sim k^{-\alpha}$, where k stands for the degree of a given node. They further found a linear relation between the exponent α of the power-law degree distribution of the visibility graph and the Hurst exponent H of the associated FBM series.

In order to compare α and H appropriately, Lacasa et al. (2009) calculated the exponent of the scale-free visibility graphs associated with FBM series of $10^4$ data points with 0 < H < 1, generated by a wavelet-based algorithm (Abry and Sellan 1996). For each value of the Hurst parameter they took the average over 10 realizations of the FBM. The power-law exponent α was estimated by maximum likelihood,

$$\alpha = 1 + n \left[ \sum_{i=1}^{n} \log \frac{k_i}{k_{\min}} \right]^{-1}, \tag{4.28}$$

where $k_i$ (i = 1, 2, …, n) are the observed values of k such that $k_i \ge k_{\min}$, and $k_{\min} = 10$ was chosen (Lacasa et al. 2009). They inferred a roughly linear relation of

α = 3 – 2H. (4.29)
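Equation 4.28 translates into a one-line estimator. The sketch below is hedged in three ways. It assumes the natural logarithm. It treats degrees as continuous, so for integer degrees it is only an approximation to a discrete power-law fit. The check uses synthetic Pareto samples generated by inverse-CDF sampling, not visibility-graph data. The function name is hypothetical.

```python
import numpy as np

def powerlaw_alpha_mle(degrees, k_min=10):
    """Maximum-likelihood exponent of Equation 4.28 (natural logarithm assumed).

    Only observations k_i >= k_min enter the estimate.
    """
    k = np.asarray(degrees, dtype=float)
    tail = k[k >= k_min]
    return 1.0 + tail.size / np.sum(np.log(tail / k_min))

# Check on synthetic continuous power-law samples with alpha = 2.5 (hypothetical data)
rng = np.random.default_rng(0)
u = rng.random(200_000)
samples = 10.0 * (1.0 - u) ** (-1.0 / (2.5 - 1.0))   # inverse-CDF sampling, k >= 10
print(powerlaw_alpha_mle(samples))
```

Under Equation 4.29, an estimate α̂ would correspond to H ≈ (3 − α̂)/2.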

A similar result was obtained by Ni et al. (2009). They investigated FBM with the Hurst index ranging from 0.1 to 0.9 in steps of 0.1. For each H, they repeated the simulation 100 times, each simulation giving an FBM time series of size N = 50 000. For each FBM time series, a visibility graph was constructed and its empirical degree distribution was determined.

For a given H, the 100 distributions almost collapse onto a single curve. This enabled them to construct the empirical degree distribution by treating all the data of the 100 graphs as one sample, to obtain better estimates. Good power-law behaviour is observed in the degree distributions, followed by a faster decay. The power-law exponent α of the distributions is calculated in the scaling ranges. The exponent for each sample is also estimated by the maximum-likelihood estimator of Equation 4.28, typically with kmin = 10. A linear relation α = a − bH was obtained, with a = 3.15 ± 0.07 and b = 2.15 ± 0.13.

That FBM yields scale-free visibility graphs is not surprising (Lacasa et al. 2009). The most highly connected nodes (hubs) are responsible for the heavy-tailed degree distributions. Within FBM series, hubs are related to extreme values in the series, since a data point with a very large value typically has a large connectivity, according to the definition of the visibility graph.

Resilience of FBM VG

For the FBM visibility graph, the Hurst index H ∈ (0, 1), and thus α = 3 − 2H ∈ (1, 3). Combining Equations 4.27 and 4.29, we get

$$\kappa_0 \to \left|\frac{2H-1}{2H}\right| \times \begin{cases} m^{1-2H} K^{2H}, & \text{if } 0 < H < \tfrac{1}{2},\\ K, & \text{if } \tfrac{1}{2} < H < 1. \end{cases} \tag{4.30}$$

According to Equation 4.23, $1 - p_c = 1/(\kappa_0 - 1)$. For visibility graphs constructed from fractional Brownian motions with Hurst index 0 < H < 1, the ratio κ0 diverges with K, and so pc → 1 when K → ∞ (or N → ∞). This means that, in this limit, a spanning cluster of the VG of FBM survives the random breakdown of an arbitrarily large fraction p < 1 of its nodes.

Random breakdown test on FBM VG

Based on the above analysis, we took the visibility graph of a fractional Brownian motion as an example. We randomly removed a certain number of nodes to test whether the VG could survive. We selected a fractional Brownian motion with Hurst index H = 0.9 and size $3 \times 10^3$, and constructed its visibility graph, which has 3000 nodes and 176160 edges.

According to Equations 4.30 and 4.23, when H = 0.9, $\kappa_0 = \frac{2H-1}{2H} \times K$, where K = 612 is the maximum degree in the VG. Thus κ0 = 272 and pc ≈ 0.996. This predicts that a giant component should survive the random removal of up to about 99.6% of the nodes.
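The arithmetic behind κ0 = 272 and pc ≈ 0.996 is short enough to check directly. This sketch implements the two branches of Equation 4.30 with m = 1 by default; the function name is hypothetical.

```python
def kappa0_fbm_vg(H, K, m=1):
    """Asymptotic kappa_0 of Equation 4.30 for an FBM visibility graph (alpha = 3 - 2H)."""
    prefactor = abs((2 * H - 1) / (2 * H))
    if 0 < H < 0.5:
        return prefactor * m ** (1 - 2 * H) * K ** (2 * H)
    if 0.5 < H < 1:
        return prefactor * K
    raise ValueError("Equation 4.30 covers 0 < H < 1/2 and 1/2 < H < 1")

kappa0 = kappa0_fbm_vg(0.9, K=612)     # K = 612: maximum degree of the H = 0.9 VG
p_c = 1 - 1 / (kappa0 - 1)             # Equation 4.23
print(round(kappa0, 1), round(p_c, 4))
```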

However, the random breakdown test shows that FBM VGs are not robust to massive numbers of random node deletions. Two examples of random breakdown are shown in Fig 4.20. The original VG constructed from FBM with H = 0.9 has 3000 nodes and 176160 edges.

- In Fig 4.20 A, we randomly deleted 2617 nodes (approximately 87.23% of the nodes), and the remaining 383 nodes break up into separate components and isolated nodes.
- In Fig 4.20 B, we randomly deleted 2089 nodes (approximately 69.63% of the nodes), and the remaining 911 nodes also break up into separate components and isolated nodes.



Fig 4.20 FBM VG after random breakdown (H = 0.9)

Therefore, under random breakdown, visibility graphs of fractional Brownian motion are not as robust as other scale-free networks (such as the Internet), and more detailed work is needed on this question.
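A random breakdown test of this kind can be sketched as follows. The sketch makes these assumptions:

- the natural visibility criterion is implemented by an O(n²) slope sweep;
- connected components are counted by breadth-first search;
- the input series is a Gaussian random walk (H = 1/2), used only as a stand-in because generating fractional Brownian motion is outside the scope of this sketch;
- the seed, sizes and helper names are hypothetical.

```python
import random
from collections import deque

def visibility_graph(x):
    """Natural visibility graph of series x as an adjacency dict.

    Nodes a < b are linked iff every intermediate point c lies strictly below
    the straight line from (a, x[a]) to (b, x[b]), i.e. iff the slope from a
    to b exceeds the slope from a to every c in between. O(n^2).
    """
    n = len(x)
    adj = {i: set() for i in range(n)}
    for a in range(n):
        max_slope = float("-inf")
        for b in range(a + 1, n):
            slope = (x[b] - x[a]) / (b - a)
            if slope > max_slope:
                adj[a].add(b)
                adj[b].add(a)
            max_slope = max(max_slope, slope)
    return adj

def count_components(adj, removed):
    """Number of connected components among the nodes that survive removal."""
    alive = set(adj) - set(removed)
    seen, components = set(), 0
    for s in alive:
        if s in seen:
            continue
        components += 1
        seen.add(s)
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return components

# Hypothetical stand-in series: a Gaussian random walk (H = 1/2), not the thesis's FBM
rng = random.Random(1)
walk = [0.0]
for _ in range(999):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
adj = visibility_graph(walk)
removed = rng.sample(range(len(walk)), 700)          # random breakdown of 70% of nodes
print(count_components(adj, removed))
```

A count greater than 1 means the surviving nodes have split into separate parts, as in Fig 4.20.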

4.5.2 Resilience of horizontal visibility graphs

Degree distribution of HVG

The degree distributions of the HVGs of uncorrelated random series, correlated stochastic series and chaotic series are known to be exponential (Luque et al. 2009, Lacasa and Toral 2010):

$$P(k) \approx a e^{-bk}. \tag{4.31}$$


A natural conjecture is that the HVGs constructed from fractional Brownian motions also have exponential degree distributions. We study this problem numerically. For a given value of the Hurst index, we generated ten FBMs of size $7 \times 10^3$ and converted them into HVGs. Next, for each HVG's degree distribution we chose the best exponential fit, and then averaged the exponential exponents over the ten sample paths. Table 4.4 summarizes the average exponential exponent (b) of the FBM HVG degree distributions for each value of the Hurst index (H).

Fig 4.21 Illustration of exponential fit of HVG degree distribution

Remark: By the best exponential fit we mean that, for each degree distribution, we choose the range of k over which the exponential fit of P(k) versus k has an error less than 0.1. An illustration is shown in Fig 4.21. When H = 0.3, we start the exponential fit from the second point, and the error is about 0.1. Typically, the degree distributions of the HVGs have exponential tails.
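The exponential fit of Equation 4.31 can be sketched as a straight-line fit of ln P(k) against k. This is only an approximation of the procedure in the Remark above. The sketch uses a fixed k range instead of the error-based range selection, and its helper names are hypothetical. As a sanity check it uses i.i.d. uniform noise. For such noise, Luque et al. (2009) showed that the HVG degree distribution is exactly $P(k) = \frac{1}{3}\left(\frac{2}{3}\right)^{k-2}$, that is, $a = 3/4$ and $b = \ln(3/2) \approx 0.405$ in Equation 4.31.

```python
import numpy as np

def hvg_degrees(x):
    """Degree sequence of the horizontal visibility graph of x (monotone stack, O(n))."""
    deg = [0] * len(x)
    stack = []
    for j, xj in enumerate(x):
        while stack and x[stack[-1]] < xj:     # lower values see j, then are hidden
            i = stack.pop()
            deg[i] += 1
            deg[j] += 1
        if stack:                              # first value >= xj to the left sees j
            i = stack[-1]
            deg[i] += 1
            deg[j] += 1
            if x[i] == xj:                     # the equal earlier value is now hidden
                stack.pop()
        stack.append(j)
    return deg

def exponential_fit(degrees, k_lo, k_hi):
    """Least-squares fit of ln P(k) = ln a - b k over k_lo <= k <= k_hi; returns (a, b)."""
    ks = np.arange(k_lo, k_hi + 1)
    counts = np.bincount(degrees, minlength=k_hi + 1)[k_lo:k_hi + 1]
    P = counts / len(degrees)
    keep = P > 0                               # the log of an empty bin is undefined
    slope, intercept = np.polyfit(ks[keep], np.log(P[keep]), 1)
    return np.exp(intercept), -slope

# Sanity check on i.i.d. uniform noise (hypothetical data)
rng = np.random.default_rng(0)
noise = rng.random(200_000).tolist()
deg = hvg_degrees(noise)
a, b = exponential_fit(deg, 2, 10)
print(a, b, np.log(1.5))
```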


____________________________________________________________________

169

Table 4.4 Exponential exponent for degree distribution of FBM HVG

H 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45

b 0.39 0.38 0.38 0.38 0.37 0.40 0.42 0.41 0.43

H 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95

b 0.47 0.48 0.52 0.55 0.58 0.61 0.66 0.67 0.71 0.76

These results show approximately linear relationships between the exponential exponent b and the Hurst index H. As shown in Fig 4.22, for 0 < H ≤ 0.4, a linear regression gives

$$b \approx 0.37 + 0.1H. \tag{4.32}$$

The error bar of the slope of the linear term is 0.04. For 0.4 ≤ H < 1, another linear regression yields

$$b \approx 0.15 + 0.6H. \tag{4.33}$$

The error bar of the slope of the linear term is 0.02.

Fig 4.22 Hurst index with degree distribution exponent
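The two regressions can be reproduced approximately from the averaged values in Table 4.4 with an ordinary least-squares fit. This is a hedged re-fit: the thesis does not state its exact fitting procedure (for example, any weighting), so the coefficients need not match Equations 4.32 and 4.33 digit for digit. H = 0.4 is included in both ranges, as in the equations.

```python
import numpy as np

# Average exponential exponents b from Table 4.4
H = np.array([0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45,
              0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95])
b = np.array([0.39, 0.38, 0.38, 0.38, 0.37, 0.40, 0.42, 0.41, 0.43,
              0.47, 0.48, 0.52, 0.55, 0.58, 0.61, 0.66, 0.67, 0.71, 0.76])

# H = 0.4 is included in both ranges, as in Equations 4.32 and 4.33
low, high = H <= 0.4 + 1e-9, H >= 0.4 - 1e-9
slope_low, icpt_low = np.polyfit(H[low], b[low], 1)
slope_high, icpt_high = np.polyfit(H[high], b[high], 1)
print(f"0 < H <= 0.4: b = {icpt_low:.3f} + {slope_low:.3f} H")
print(f"0.4 <= H < 1: b = {icpt_high:.3f} + {slope_high:.3f} H")
```

With these data the least-squares lines come out near b ≈ 0.370 + 0.093H and b ≈ 0.147 + 0.627H, consistent with Equations 4.32 and 4.33 after rounding.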

Combining Equations 4.31-4.33, we get

$$\log P(k) \approx -bk + \log a, \tag{4.34}$$

or more specifically

$$\log P(k) \approx \begin{cases} -(0.37 + 0.1H)\,k + \log a, & 0 < H \le 0.4,\\ -(0.15 + 0.6H)\,k + \log a, & 0.4 \le H < 1. \end{cases} \tag{4.35}$$

Hence, for a given value of the Hurst index H, log P(k) should be linear in k with slope approximately equal to −b (the exponential exponent). We tested this numerically (Fig 4.23) for HVGs constructed from fractional Brownian motions with H = 0.1, 0.2, …, 0.8. Linear fits give slopes of log P(k) versus k close to −b (summarized in Table 4.5), which confirms the above statement.

Fig 4.23 FBM HVG log P (k) vs. k



Table 4.5 Comparison of logP(k) vs. k between numerical linear regressions and Eq. 4.35

H 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

-b -0.38 -0.39 -0.40 -0.41 -0.47 -0.53 -0.59 -0.65

slope -0.40 -0.40 -0.42 -0.44 -0.49 -0.53 -0.64 -0.66

error 0.01 0.01 0.02 0.03 0.05 0.06 0.07 0.06

Additionally, the best exponential fits for horizontal visibility graphs constructed

based on measure representations of DNA sequences of five bacteria as well as

binomial multifractal measure (with p1 = 0.3, p2 = 0.7) are shown in Fig 4.24 and the

numerical results are summarized in Table 4.6.

Fig 4.24 Exponential fits for HVG



Table 4.6 Exponential exponents for degree distribution of multifractal HVG

network N E a b error

A. fulgidus 4096 8164 0.74 0.40 0.04

B. burgdorferi 4096 8139 0.90 0.43 0.05

C. trachomatis 4096 8134 0.84 0.42 0.05

E. coli 4096 8163 0.69 0.38 0.06

M. genitalium 4096 8093 1.05 0.46 0.07

Binomial 4096 6131 2.00 0.69 0.00

Resilience comparison between FBM VG and HVG

In Section 4.5.1 we discussed the resilience of visibility graphs constructed from fractional Brownian motions, which have scale-free degree distributions. The most notable characteristic of a scale-free network is the presence of a few very high-degree nodes, often called "hubs". The scale-free property strongly correlates with the network's robustness to failure. The major hubs are closely followed by smaller ones, which are in turn followed by other nodes with even smaller degree, and so on. This hierarchy allows for fault-tolerant behaviour.

If failures occur at random, the likelihood that a hub is affected is almost negligible, since the vast majority of nodes have small degree. Even if a hub fails, the network will generally not lose its connectedness, owing to the remaining hubs. The test on the FBM VG with Hurst index H = 0.9 shows that this network is not as robust as the Internet, which could remain connected after 99% breakdown. Nevertheless, the VG can still survive the random deletion of a large number of nodes.

Here we use a breakdown test to compare the resilience of the visibility graph and the horizontal visibility graph constructed from the same fractional Brownian motion. More specifically, we generated a fractional Brownian motion with Hurst index H = 0.9 and size $3 \times 10^3$, and converted it into a visibility graph and a horizontal visibility graph. We then removed the same nodes from the VG and the HVG to see whether the remaining nodes stay connected or separate into isolated parts.



Table 4.7 Random breakdown test for FBM VG and HVG

Network N E d D n N’ E’ d’

VG 3000 176160 13 1:100:3000 30 2970 172713 13

VG 3000 176160 13 1:50:3000 60 2940 170153 13

VG 3000 176160 13 1:30:3000 100 2900 164408 13

VG 3000 176160 13 1:10:3000 300 2700 142076 13

VG 3000 176160 13 1:5:3000 600 2400 111467 14

VG 3000 176160 13 1:3:3000 1000 2000 78430 ∞

HVG 3000 5340 810 1:100:3000 30 2970 5239 ∞

Table 4.7 summarizes the results. Its columns are:

- N, E and d: the number of nodes, the number of edges and the diameter of the original network;
- D: the deleted range, written as start:step:end over the node labels;
- n: the number of deleted nodes;
- N', E' and d': the number of remaining nodes, the number of remaining edges and the diameter of the remaining part.

In both the VG and the HVG, each node represents one value of the same FBM series. In the first test (first data row of Table 4.7), we deleted the 30 nodes 1, 101, 201, …, 2901, that is, every 100th node starting from the first. A diameter of ∞ means that at least two nodes are not joined by any path, so the network has split into disconnected parts instead of forming a single connected whole.
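The deleted ranges D in Table 4.7 read naturally as MATLAB-style start:step:end ranges over 1-based node labels. This interpretation is consistent with n = 30 for 1:100:3000. The sketch below reproduces the n column under that reading. It also shows one way to obtain d', returning ∞ when the surviving graph is disconnected. The helper names are hypothetical. The all-sources breadth-first search costs O(N·E), a simple choice that is not necessarily the method used for Table 4.7.

```python
import math
from collections import deque

def matlab_range(start, step, stop):
    """Node labels in a MATLAB-style range start:step:stop (inclusive, 1-based)."""
    return list(range(start, stop + 1, step))

def diameter_after_removal(adj, removed_labels):
    """Diameter of the surviving graph, or math.inf if it is disconnected.

    adj uses 0-based node indices; removed_labels are 1-based as in Table 4.7.
    Runs one BFS per surviving node, so it is O(N * E).
    """
    removed = {label - 1 for label in removed_labels}
    alive = [u for u in adj if u not in removed]
    diameter = 0
    for source in alive:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(alive):
            return math.inf          # some surviving node cannot be reached
        diameter = max(diameter, max(dist.values()))
    return diameter

# Deleted ranges D of Table 4.7 and the resulting counts n
for step in (100, 50, 30, 10, 5, 3):
    print(f"1:{step}:3000 ->", len(matlab_range(1, step, 3000)), "nodes deleted")
```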

This test shows that, for the same fractional Brownian motion (H = 0.9), the visibility graph is more robust than the corresponding horizontal visibility graph. The VG remains connected when 20% of the nodes (600 nodes) have been removed, while the HVG breaks apart when only 1% of the nodes (30 nodes) are removed. The scale-free degree distribution of the VG reflects the fact that many edges are concentrated around hubs. In contrast, the exponential degree distribution of the HVG means that its edges are distributed much more evenly among the nodes. Therefore, the HVG is more easily damaged than the VG.



4.6 Conclusion

In this chapter, we transformed some typical time series into networks and investigated their properties.

(i) The fractal dimension of weighted networks constructed from fractional Brownian motion decreases as its Hurst index increases.

(ii) Applying our fixed-size box-covering method, we found that HVGs constructed from FBM are monofractal, while those constructed from binomial measures and from measure representations of DNA sequences are multifractal. This suggests that the HVG inherits the multifractality of the original time series.

(iii) The degree distributions of HVGs constructed from fractional Brownian motions, binomial measures and measure representations of DNA sequences have exponential tails. For FBM, approximately linear relationships were found between the exponential exponent (b) and the Hurst index (H).

We compared the resilience of VGs and HVGs constructed from FBM. The VG, which has a scale-free degree distribution, is more robust than the HVG, which has an exponential degree distribution.



Chapter 5

Summary and future research

This thesis proposed several new approaches and obtained new results for three related problems: (i) fractal scaling of weighted complex networks; (ii) multifractal analysis of complex networks; and (iii) analysis of complex networks constructed from time series.

5.1 Research innovations and contributions

In Chapter 2 we confirmed that a box-covering method, with a suitable distance metric, can be developed to extend the study of the properties of complex networks to a wider class that includes weighted networks. We showed that the weighted networks of protein-protein interactions inherit their self-similarity from the original unweighted networks. This was demonstrated on the basis of the fractal dimension of the weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis thaliana. The weighted networks were generated by the iterative scoring method, and the fractal dimension was computed with the random sequential box-covering algorithm.

Chapter 3 established the multifractal behaviour of different complex networks. In this chapter, we proposed a new box-covering algorithm for multifractal analysis of complex networks. This algorithm was demonstrated in the computation of the generalized fractal dimensions of some theoretical networks (scale-free, small-world and random networks) and of a class of real networks (PPI networks of different species). Our main finding is the existence of multifractality in scale-free networks and PPI networks, while multifractal behaviour is not confirmed for small-world networks and random networks. As another application, we generated gene interaction networks for patients and healthy people using the correlation coefficients between microarrays of different genes. Our results confirm the existence of multifractality in gene interaction networks.



Chapter 4 of the thesis concentrated on the topological properties of networks constructed from time series. In this chapter, we proposed a new method to construct networks from time series. We applied this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We verified that time series with a larger Hurst exponent have weighted networks with a smaller fractal dimension.

Our study of networks constructed via the technique of horizontal visibility graph

confirmed a linear relationship between the Hurst exponent of fractional Brownian

motion and the fractal dimension of the corresponding HVG network. Another

application of our newly developed box-covering algorithm to calculate the

generalized fractal dimensions of the HVG networks confirmed the monoscaling of

fractional Brownian motion and the multifractality of the binomial cascades and the

measure representations of five bacterial genomes.

As an additional application, we discussed the resilience of networks constructed

from time series via visibility graph and horizontal visibility graph. Our finding is

that the degree distribution of VG networks of fractional Brownian motions is scale-

free, while the degree distribution of their HVG networks has exponential tails. An

implication is that VG networks are more robust than HVG networks.

5.2 Possible future research

(1) Improvement of HVG for multifractal time series

By exploiting the duality between the recurrence matrix and the adjacency matrix of

a complex network in the study of dynamical systems, information about dynamical

recurrences can be used to construct complex networks from time series. These

recurrence-based complex networks provide a new approach for time series analysis

and offer a promising complementary view for the study of dynamical systems.

Applying well established complex network measures, it is feasible to characterize



and classify the dynamics of complex systems, to detect dynamical transitions and

identify invariant substructures.

In §4.3.3, based on the idea of the recurrence plot, we established weighted networks for fractional Brownian motions. However, a weighted network is fully connected and may contain redundant information, so we need to construct networks with fewer edges from time series. There are many open questions concerning the specific features and applicability of this approach. To name a few:

- how to define a threshold that removes as many redundant edges as possible while still producing a connected network;
- how to investigate theoretically the relations between network topological properties (such as degree distribution, fractal scaling and multifractality) and time series recurrence quantification analysis.

We have noted that different values of p1 and p2 yield different binomial cascades; however, every binomial cascade of a given length yields the same horizontal visibility graph. The HVG depends only on the relative order of the values in a series, and the network structure is completely determined by its (binary) adjacency matrix. Information about the magnitudes of the values in the time series is therefore inevitably lost in the mapping to the HVG.
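This loss of information can be checked directly. The sketch below makes three assumptions. First, p2 = 1 - p1 with p1 < 0.5. Second, at each stage the left half of a cell receives the fraction p1 of its mass, so the cell with k-bit index i has mass p1^(number of zeros) · p2^(number of ones). Third, the HVG links i < j when every intermediate value is strictly smaller than min(x_i, x_j). Under these assumptions the ordering of the masses, including ties, is the same for every p1 < 0.5, so the HVG edge sets coincide. Swapping p1 and p2 merely reverses the series. The function names are illustrative.

```python
def binomial_cascade(p1, k):
    """Masses of the 2**k cells of a binomial cascade at level k (p2 = 1 - p1 assumed).

    Cell i gets p1**(number of 0 bits) * p2**(number of 1 bits) of its k-bit index.
    Computing each mass directly from the bit counts makes equal masses bit-for-bit
    equal, so ties are not broken by rounding.
    """
    p2 = 1.0 - p1
    masses = []
    for i in range(2 ** k):
        ones = bin(i).count("1")
        masses.append(p1 ** (k - ones) * p2 ** ones)
    return masses


def hvg_edges(x):
    """Horizontal visibility graph: i < j linked iff x[m] < min(x[i], x[j]) for all i < m < j."""
    edges = set()
    n = len(x)
    for i in range(n - 1):
        blocker = float("-inf")  # largest value strictly between i and j
        for j in range(i + 1, n):
            if blocker < min(x[i], x[j]):
                edges.add((i, j))
            blocker = max(blocker, x[j])
            if x[j] >= x[i]:
                break  # x[j] hides every later point from i
    return edges


series_a = binomial_cascade(0.3, 8)
series_b = binomial_cascade(0.2, 8)
print(series_a == series_b)                        # False: the measures differ
print(hvg_edges(series_a) == hvg_edges(series_b))  # True: the HVGs coincide
```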

We applied the box-covering method proposed in Chapter 3 to detect multifractal behaviour by calculating Dq and Cq of horizontal visibility graphs constructed from binomial measures. Clear multifractal behaviour can be seen in the shape of Dq, which decreases as a function of q. However, the HVG cannot distinguish between binomial measures generated from different p1 and p2.
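By contrast, the generalized dimensions of the binomial measure itself do depend on p1. The following sketch estimates Dq by ordinary box counting on dyadic intervals of size ε = 2^(-k). It uses the partition function Z(q, ε) = Σ μ_i^q ∝ ε^τ(q) and Dq = τ(q)/(q - 1), with the information-dimension limit at q = 1. The estimate is compared with the closed form Dq = log2(p1^q + p2^q)/(1 - q). This is not the network box-covering method of Chapter 3. It assumes p2 = 1 - p1, and the function names and the range of levels are illustrative.

```python
import numpy as np


def binomial_measure(p1, k):
    """Masses of the 2**k dyadic cells of a binomial cascade (p2 = 1 - p1)."""
    mu = np.array([1.0])
    for _ in range(k):
        # split every cell: left child gets p1 of its mass, right child p2
        mu = np.column_stack([p1 * mu, (1.0 - p1) * mu]).ravel()
    return mu


def generalized_dimension(p1, q, levels=range(4, 13)):
    """Estimate D_q from the slope of ln Z(q, eps) against ln eps, eps = 2**-k."""
    log_eps, log_z = [], []
    for k in levels:
        mu = binomial_measure(p1, k)
        mu = mu[mu > 0]
        log_eps.append(-k * np.log(2.0))
        if q == 1:
            log_z.append(np.sum(mu * np.log(mu)))  # information-dimension limit
        else:
            log_z.append(np.log(np.sum(mu ** q)))
    slope = np.polyfit(log_eps, log_z, 1)[0]
    return slope if q == 1 else slope / (q - 1)


def analytic_dq(p1, q):
    """Closed-form D_q of the binomial measure."""
    p2 = 1.0 - p1
    if q == 1:
        return -(p1 * np.log2(p1) + p2 * np.log2(p2))
    return np.log2(p1 ** q + p2 ** q) / (1.0 - q)
```

For p1 = 0.3 and p1 = 0.2 the estimated D2 values differ clearly. As noted above, however, the HVGs of the two cascades are identical.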

How to construct a network from a multifractal time series without losing this information therefore remains an essential open problem. We plan to address it by bringing in the weighted-network idea.

(2) Resilience comparison between VG and HVG of multifractal time series


____________________________________________________________________

178

In §4.4.2 we used a random test to compare the resilience of the visibility graph and the horizontal visibility graph constructed from the same fractional Brownian motion. The test indicated that the visibility graph of a given fractional Brownian motion is more robust than its horizontal visibility graph.
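A random-failure test of this kind can be sketched as follows. The sketch assumes resilience is measured by the relative size S(f) of the largest connected component after a randomly chosen fraction f of the nodes has been removed, averaged over several trials. The exact protocol used in §4.4.2 may differ, and the function names are hypothetical.

```python
import random
from collections import deque


def largest_component_size(n, adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def random_failure_curve(n, edges, fractions, trials=20, seed=0):
    """Mean relative size S(f) of the largest component when a fraction f of nodes fails."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    rng = random.Random(seed)
    curve = []
    for f in fractions:
        total = 0.0
        for _ in range(trials):
            removed = rng.sample(range(n), int(round(f * n)))
            total += largest_component_size(n, adj, removed) / n
        curve.append(total / trials)
    return curve
```

Applying `random_failure_curve` to the VG and HVG edge sets of the same series and comparing the two curves gives a simple robustness comparison. Under this measure, the graph whose S(f) decays more slowly is the more resilient one.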

However, the relative resilience of VG and HVG may differ for other time series, so this problem needs more detailed study. The work should also be extended to multifractal time series.


____________________________________________________________________

179

References Abadi, M. and Grandchamp, E. (2006). Texture features and segmentation based on

multifractal approach. The 11th Iberoamerican Congress on Pattern Recognition,

4225, 297–305.

Abry P. and Sellan F. (1996). The wavelet-based synthesis for fractional Brownian

motion proposed by F. Sellan and Y. Meyer: Remarks and fast implementation. Appl.

Comput. Harmon. Anal., 3, 377.

Ahmadlou M., Adeli H. and Adeli A. (2010). New diagnostic EGG markers of the

Alzheimer’s disease using visibility graph. Journal of Neural Transmission, 117 (9),

1099-1109.

Ahuja R.K., Magnati T.L. and Orlin J.B. (1993). Network Flows: Theory,

Algorithms, and Applications, Prentice-Hall, Englewood Cliffs, NJ.

Albert R., Jeong H. and Barabasi A.L. (1999). Diameter of the World Wide Web.

Nature, 401, 130-131.

Albert R. and Barabasi A.-L. (2002). Statistical mechanics of complex networks, Rev.

Mod. Phys. 74, 47-94.

Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S.(2006).

Development and implementation of an algorithm for detection of protein

complexes in large interaction networks. BMC Bioinformatics, 7: 207

Amaral L.A.N., Scala A., Barthélémy M., and Stanley H.E. (2000). Classes of small-

world networks, Proc. Natl. Acad. Sci. USA, 97, 11149-11152.

Anh V.V., Tieng Q.M. and Tse Y.K. (2000). Cointegration of stochastic

multifractals with application to foreign exchange rates. Int. Trans. Opera. Res. 7,

349-363.


____________________________________________________________________

180

Anh V.V., Lau K.S. and Yu Z.G. (2002). Recognition of anorganism from fragments

of its complete genome. Phys. Rev. E. 66, 031910.

Backes A.R. and Bruno O.M. (2010). Shape classification using complex network

and multi-scale fractal dimension. Pattern Recognition Letter, 31, 44-51

Badii R. and Politi A. (1984). Hausdorff dimension and uniformity of strange

attractors. Phys. Rev. Lett. 52, 1661-1664.

Badii R. and Politi A. (1985). Statistical description of chaotic attractors: The

dimension function. J. Stat. Phys. 40, 725-750.

Barabasi A.-L. and Albert R. (1999). Emergence of Scaling in Random Networks,

Science 286, 509-512.

Barrat A. and Weigt M. (2000). On the properties of small-world network models,

Eur. Phys. J. B, 13, 547.

BioGRID: Http://thebiogrid.org/download.php.

Boccaletti S., Latora V., Moreno Y., Chavez M. and Hwang E.-U. (2006). Complex

networks: structure and dynamics. Physics Reports. 424, 175-308.

Borate B.R., Chesler E.J., Langston M.A., Saxton A.M. and Voy B.H. (2009).

Comparison of threshold selection methods for microarray gene co-expression

matrices. BMC Research Notes. 2,240

Bornholdt S. and Schuster H.G. (2003). Handbook of graphs and networks: from nets

to the Internet and WWW. Oxford, Oxford University Press.

Brown C.T. and Liebovitch L. S. (2009). Fractal analysis. Sara Miller McCune,

SAGE Publications, Inc.


____________________________________________________________________

181

Brun C., Chevenet F., Martin D., Wojcik J., Guenoche A.and Jacq B.(2003).

Functional classification of proteins for the prediction of cellular function from a

protein-protein interaction network, Genome Biol. 5, R6

Buhl J., Gautrais J., Solé R.V., Kuntz P., Valverde S., Deneubourg J.L. and

Theraulaz G. (2004). Topological patterns in street networks of self-organized urban

settlements, Eur. Phys. J. B 42, 123.

Canessa E. (2000). Multifractality in time series. J. Phys. A: Math. Gen. 33, 3637-

3651.

Chen H.T. and Wu C.F. (2011). Forecasting volatility in Shanghai and Shenzhen

markets based on multifractal analysis. Physica A-Statistical Mechanics and Its

Applications, 390 (16) 2926-2935.

Chiang Y.M., Chiang H.M. and Lin S.Y. (2008). The application of ant colony

optimization for gene selection in microarray-based cancer classification.

Proceedings of the Seventh International Conference on Machine Learning and

Cybernetics, Kunming, 12-15 July.

Chua H.N., Ning K., Sung W.K., Leong H.W.and Wong L.(2008). Using indirect

protein-protein interactions for protein complex predication, J. Bioinform. Comput.

Biol., 6, 435-466.

Collado M., Garcia V., Garcia J.M., Alonso I., Lombardia L., Diaz-Uriarte R.,

Fernández L.A., Zaballos A., Bonilla F.and Serrano M.(2007). Genomic profiling of

circulating plasma RNA for the analysis of cancer. Clin Chem. 53 (10), 1860-1863.

Cormen T.H., Leiserson C.E., Rivest R.L. and Stein C. (2001a). Introduction to

Algorithms, MIT University Press, Cambridge.


____________________________________________________________________

182

Cormen T.H., Leiserson C.E., Rivest R.L. and Stein C. (2001b). Breakdown of the

Internet under intentional attack. Phys. Rev. L. 86,16

Criado R., García del Amo A., Hernández-Bermejo B. and Romance M. (2005a).

New Results on computable efficiency and its stability for complex networks, J.

Comput. Appl. Math. 192, 59-74.

Criado R., Flores J., Hernández-Bermejo B., Pello J., Romance M. (2005b). Effective

measurement of network vulnerability under random and intentional attacks, J. Math.

Model. Alg. 4, 307-316.

Crownover R.M. 1995. Introduction to fractals and chaos. Jones and Bartlett

Publishers. London.

Cubellis M.V., Caillez F., Blundell T.L., Lovell S.C. (2005). Properties of

polyproline Ⅱ, a secondary structure element implicated in protein-protein

interactions. Proteins, 58: 880-892

Cytoscape software: http://cytoscapeweb.cytoscape.ory/

Dandekar T., Snel B., Huynen M., Bork P.(1998). Conservation of gene order: a

fingerprint of proteins that physically interact. Trends. Biochem. Sci. 23: 324-328

Daw C., Finney C. and Tracy E. (2003) A review of symbolic analysis of

experimental data. Review of Scientific Instruments 74, 915-930

Deane C. M., SlwinskiL., Xenarios L. and Eisenberg D.(2002). Protein interactions:

Two methods for assessment of the reliability of high throughput observations,

Molecular & Cellular Proteomics, 1, 349-356.


____________________________________________________________________

183

de Berg M., van Kreveld M., Overmans M. and Schwarzkopf O. (2008).

Computational Geometry: Algorithms and Applications (Third Edition). Springer-

Verlag, Berlin.

Dijkstra E.W. (1959). A note on two problems in connexion with graphs.

Numerische Mathematik, 1, 269-271.

DIP: http://dip.doe-mbi.ucla.edu/

Dodds P.S. and Rothman D.H. (2001). Geometry of river networks, Phys. Rev.E, 63,

016115, 016116& 016117.

Donner R. V., Zou Y., Donges J. F., Marwan N. and Kurths J. (2010). Recurrence

networks - A novel paradigm for nonlinear time series analysis," New Journal of

Physics 12, 033025

Donner R.V., Small M., Donges J.F., Marwan N., Zou Y., Xiang R.X. and Kurths J.

(2011). Recurrence-based time series analysis by means of complex network

methods. International Journal of Bifurcation and Chaos. 21 (4) 1019-1046.

Dorogovtsev S.N. and Mendes J.F.F. (2002). Evolution of networks, Advances in

Physis, 51, 1079-1187

Eckmann J.-P., Kamphorst S. O. and Ruelle, D. (1987). Recurrence plots of

dynamical systems. Europhysics Letters 4, 973-977.

Erdös P. and Rényi A. (1960). On the evolution of random graphs. Publ. Math. Inst.

Hung. Acad. Sci. 5, 17-61.

Enright A.J., Iliopoulos I., Kyrpides N.C., Ouzounis C.A. (1999). Protein interaction

maps for complete genomes based on gene fusion events. Nature, 402: 86-90


____________________________________________________________________

184

Everitt B. (1974). Cluster analysis, John Wiley, New York.

Ezekiel, S. (2003). Medical Image Segmentation using multifractal analysis.

Proceedings of the Applied Informatics, 378, 220–224.

Falconer K. (1997).Techniques in Fractal Geometry. Wiley, New York.

Faloutsos M., Faloutsos P. and Faloutsos C. (1999). On power-law relationships of

the internet topology. Comput. Commun. Rev. 29, 251-262.

Feder J. (1988). Fractals. Plenum, New York,.

Fell D.A. and Wagner A. (2000). The small world of metabolism, Nature

Biotechnology. 18, 1121-1122.

Fernández E., Bolea J.A., Ortega G. and Louis E. (1999). Are neurons multifractals?

Journal of Neuroscience Methods, 89, 151-157.

Mohammadi S. FNN: MATLAB function to calculate corrected false nearest

neighbors: http://ideas.repec.org/c/boc/bocode/t741510.html

Foroutan-pour K., Dutilleul P.and Simth D.L. (1999). Advances in the

implementation of the box-counting method of fractal dimension estimation, Applied

Mathematics and Comuputation, 105, 195-210.

Friedrich R., Peinke J., and Rahimi T.M. R.(2009). In Encyclopedia of Complexity

and System Science, edited by R. Meyers, Springer, Berlin.

Frontier S. (1987). Applications of fractal theory to ecology. In Developments in

Numerical Ecology, Legendre, P. and Legendre, L., Eds. Springer Verlag, Berlin,

335-378.


____________________________________________________________________

185

Gao L., Hu Y. and Di Z. (2008). Accuracy of the ball-covering approach for fractal

dimensions of complex networks and a rank-driven algorithm. Phys. Rev. E, 78,

046109.

Gao Z. and Jin N. (2009). Complex network from time series based on phase space

reconstruction. Chaos, 19, 033137

Goh K.I., Kahng B. and Kim D. (2001). Universal behaviour of load distribution in

scale-free networks, Phys. Rev. Lett. 87, 278701.

Goh K.I., Oh E., Jeong H., Kahng B. and Kim D. (2002). Classification of scale-free

networks, Proc. Natl. Acad. Sci. USA, 99, 12583-12588.

Goh K.I., Kahng B. and Kim D. (2003). Betweenness centrality correlation in social

networks, Phys. Rev. E 67, 017101.

Goh K-I, Salvi G., Kahng B., Kim D.(2006). Skeleton and fractal scaling in complex

networks. Physical Review Letters , 96(1): 018701.

Grassberger P.(1983). Generalized dimension of strange attractors. Phs. Lett. A, 97,

227-230.

Grassberger P. and Procaccia I. (1983). Characterization of strange attractors. Phys.

Rev. lett. 50, 346-349.

Guo L. and Cai X. (2009). The fractal dimensions of complex networks. Chin. Phys.

Lett. 26(8), 088901.

Gutin G., Mansour T. and Severini S. (2011). A characterization of horizontal

visibility graphs and combinatories on words. Physica A. 390, 2421-2428.


____________________________________________________________________

186

Halsey T.C., Jensen M.H., Kadanoff L.P., Procaccia I., and Shraiman B.I. (1986).

Fractal measures and their singularities: the characterization of strange sets. Phys.

Rev. A.33, 1141-1151.

Hentschel H.G.E. and Procaccia I. (1983a). Fractal nature of turbulence as

manifested in turbulent diffusion. Phys. Rev.A, 27, 1266-1269.

Hentschel H.G.E. and Procaccia I. (1983b). The infinite number of generalized

dimensions of fractals and strange attractors. Physica 8, 435-444.

Hoskins J., Lovell S.C., Blundell T.L. (2006). An algorithm for predicting protein-

protein interaction sites: abnormally exposed amino acid residues and secondary

structure elements. Protein Sci., 15(5): 1017-1029

Hsu, W.Y., Lin, C.C., Ju, M.S. and Sun, Y.N. (2007). Wavelet-based fractal features

with active segment selection: application to single-trial EEG data. Journal of

Neuroscience Methods. 163, 145–160.

Hu X.H. and Pan Y. (2007). Knowledge Discovery in Bioinformatics: Techniques,

Methods, and Applications. WILEY.

Ip W.H. and Wang D.W. (2011). Resilience and friability of transportation networks:

Evaluation, analysis and optimizaion. Systems Journal, IEEE, 5(2):189-198

Jaffard S. (1999). The multifractal nature of Lévy process. Probab. Theory Relat.

Fields, 114, 207-227.

Jeong H., Tombor B., Albert R., Oltvai Z.N. and Barabάsi A.L. (2000). The large-

scale organization of metabolic networks, Nature. 407, 651-654


____________________________________________________________________

187

Jun, Y., Yoon, Y. and Yoon, H. (1994). ECG data compression using fractal

interpolation. Proceedings of the 16th Annual International Conference of the IEEE

Engineering in Medicine and Biology Society, 1 (3-6), 161-162.

Kantelhardt J.W., Koscielny-Bunde E., Rybski D., Braun P., Bunde A. and Havlin S.

(2006). Long-term persistence and multifractality of precipitation and river runoff

records. J. Geophys. Res. 111, D01106.

Karimpour-Fard A., Hunter L., and Gill R.T. (2007). Investigation of factors

affecting prediction of protein-protein interaction networks by phylogenetic profiling.

BMC Genomics, 8:393

Kedzia, A., Rybaczuk, M. and Andrzejak, R. (2002). Fractal dimensions of human

brain cortex vessels during the fetal period. Medical Science Monitor, 8 (3), 46–51.

Keedwell E. and Narayanan A. (2005). Intelligent bioinformatics, the application of

artificial intelligence techniques to bioinformatics problems. John Wiley & Sons Ltd,

England.

Keller, J., Crownover, R., Chen, S. (1989). Texture description and segmentation

through fractal geometry. Computer Vision Graphics and Image Processing, 45 (2),

150–160.

Kestener, P., Arneodo, A., 2003. A three-dimensional wavelet based multifractal

method : about the need of revisiting the multifractal description of turbulence

dissipation data. Physical Review Letter, 91 (19), 194501.1–194501.4.

Kikuchi, A., Unno, N., Horikoshi, T., Shimizu, T., Kozuma, S. and Taketani, Y.

(2005). Changes in fractal features of fetal heart rate during pregnancy. Early Human

Development, 81, 655–661.


____________________________________________________________________

188

Kim D. H., Noh J. D, Jeong H. (2004). Scale-free trees: the skeleton of complex

networks. Physical Review E. 70(4): 046126.

Kim J. S., Goh K-I, Kahng B., Kim D.(2006). Skeleton and fractal scaling in

complex networks. Phys. Rev. Lett. 96, 018701.

Kim J. S., Goh K. I., Kahng B. and Kim D. (2007a). A box-covering algorithm for

fractal scaling in scale-free networks. Chaos, 17(2): 026116.

Kim J. S., Goh K-I, Kahng B., Kim D. (2007b). Fractality and self-similarity in

scale-free networks. New Journal of Physics, 9(6): 177

Kim J. S., Goh K-I, Salvi G., Oh E., Kahng B. and Kim D.(2007c). Fractality in

complex networks: critical and supercritical skeletons. Physical Review E. 75(1):

016110

Kim J. S., Kahng B., Kim D. and Goh K. I. (2008). Self-similarity in fractal and non-

fractal networks. Journal of Korean Physical Society 52, 350.

Kim J. S., Kahng B. and Kim D. (2009). Disassortativety of random critical

branching trees. . Physical Review E. 79(6): 067103

Opheusden J.H.H., Bos M.T.A. and Kaaden G.(1996). Anomalous multifractal

spectrum of aggregating Lennard-Jones particles with Brownian dynamics. Physica

A, 227, 183-196

Palzkill T.(2002). Proteomics. Kluwer Academic Publishers.

Park S. H., Reyes J. A., Gilbert D. R., Kim J. K., Kim S.(2009). Prediction of

protein-protein interaction types using association rule based classification. BMC

Bioinformatics, 10-36


____________________________________________________________________

189

Pastor-Satorras R. and Vespignani A. (2004). Evolution and structure of the Internet:

A statistical physics approach, Cambridge University Press, Cambridge.

Patil A. and Nakamura H. (2005). Filtering high-throughput protein-protein

interaction data using a combination of genomic features, BMC Bioinformation, 6,

100

Pei P. J. and Zhang A.D. (2005). A topological measurement for weighted protein

interaction networks, Computational system bioinformatics conference proceedings.

IEEE, 268-278.

Peink J., Alber M.(1998). Improved multifractal box-counting algorithm, virtual

phase transitions, and negative dimensions. Physical Review E. 57(5): 5489-5493

Pellegrini M., Marcotte E.M., Thompson M.J., Eisenberg D., Yeates T.O.(1999)

Assigning protein functions by comparative genome analysis: protein phylogenetic

porgies. Proc Natl Acad Sic USA , 96(8): 4285-4288

Prim R. C. (1957). Shortest connection networks and some generalizations. In: Bell

System Technical Journal, 36, 1389–1401.

Lacasa L., Luque B., Ballesteros F., Luque J. and Nuno J. C. (2008). From time

series to complex networks: The visibility graph. Proceedings of the National

Academy of Sciences USA 105, 4972-4975.

Lacasa L., Luque B., Luque J. and Nuno J. C. (2009). The visibility graph: A new

method for estimating the Hurst exponent of fractional Brownian motion.

Europhysics Letters 86, 30001.

Lacasa L. and Toral R.(2010). Description of stochastic and chaotic series using

visibility graphs. Phys. Rev. E 82, 036120.


____________________________________________________________________

190

Latora V., Marchiori M. (2005). Vulnerability and protection of infrastructure

networks, Phys. Rev. E 71, 015103 (R).

Lazaros K.G., Song C.M. and Makse H.A.(2007). A review of fractality and self-

similarity in complex networks. Phys. A. 386, 686-691.

Lee C.Y. and Jung S. (2006). Statistical self-similar properties of complex networks.

Phys. Rev. E. 73(6),066102.

Lee H.K., Hsu A.K., Sajdak J. Qin J. and Pavlidis P. (2004). Coexpression analysis

of human genes across many microarray data sets. Genome Res. 14 (6), 1085-1094.

Lee C.Y. and Jung S. (2006). Statistical self-similar properties of complex networks.

Physical Review E. 73, 066102.

Li M., Wang J.X., Chen J.E. and Pan Y. (2009). Hierarchical organization of

functional modules in weighted protein interaction networks using clustering

coefficient, LNCS, Bioinformatics research and applications, 5542, 75-86.

Li M., Wang J.X., Wang H. and Pan Y. (2010). Essential proteins discovery from

weighted protein interaction networks, LNCS, Bioinformatics research and

applications, 6503, 89-100.

Liang L.R., Lu S., Wang X., Lu Y., Mandal V., Patacsil D. and Kumar D. (2006).

FM-test: a fuzzy-set-theory-based approach to differential expression data analysis.

BMC Bioinformatics. 7(s4), s7.

Liu C.C., Chen W.E., Lin C.C., Liu H.C., Chen H.Y., Yang P.C., Chang P.C. and

Chen J.J.W. (2006). Topology-based cancer classification and related pathway

mining using microarray data. Nucleic Acids Research. 34 (14), 4069-4080.


____________________________________________________________________

191

Liu G., Wong L. and Chua H.N. (2009). Complex discovery from weighted PPI

networks, Systems biology, 25, 1891-1897.

Lopes R. and Betrouni N. (2009). Fractal and multifractal analysis: A review.

Medical Image Analysis. 13 (4), 634-649

Luque B, Lacasa L., Ballesteros F. and Luque J. (2009). Horizontal visibility graphs:

Exact results for random time series. Physical Rev. E. 80, 046103.

Lynch S. (2004). Dynamical systems with applications using MATLAB. Birkhäuser

Boston.

Mandelbrot B.B. (1982). The Fractal Geometry of Nature. W.H. Freeman and

Company.

Mandelbrot B.B. (1986). Self-affine fractal sets, I: The basic fractal dimensions.

Mandelbrot B.B., Fisher A. and Calvet L. (1997).A Multifractal Model of Asset

Returns.

Mansury, Y. and Deisboeck, T. (2004). Simulating ‘‘structure-function” patterns of

malignant brain tumors. Physica A., 331 (1–2), 219–232.

Mantegan R. and Stanley H. E. (2000). An Introduction to Econophysics:

Correlations and Complexities in Finance, Cambridge University Press, New York.

Marwan N., Kurths J. and Saparin P. (2007a). Generalised Recurrence Plot Analysis

for Spatial Data. Physics Letters A 360, 545-551.

Marwan N., Romano M. C., Thiel M. and Kurths J. (2007b). Recurrence Plots for the

Analysis of Complex Systems. Physics Reports 438, 237-329.


____________________________________________________________________

192

Marwan N., Donges J. F., Zou Y., Donner R. V. and Kurths J. (2009). Complex

network approach for recurrence analysis of time series. Physics Letters A 373,

4246-4254

Mavroforakis M., Georgiou H., Dimitropoulos N., Cavouras D. and Theodoridis. S.,

(2006). Mammographic masses characterization based on localized texture and

dataset fractal analysis using linear, neural and support vector machine classifiers.

Artificial Intelligence In Medicine, 37, 145–162.

Milgram S. (1967). The small-world problem. Psychol. Today. 2, 60-67.

Milkova E. (2007). The minimum spanning tree problem: Jarnik’s solution in

historical and present context. Electronic notes in discrete mathematics. 28: 309-316

Miller A. and Xiao Y. (2007). Multi-level strategies to achieve resilience for an

organization operating at capacity: A case study at a trauma centre,” Cogn. Tech.

Work., 9, 51–66.

Moreno, J.L. (1934). Who shall survive? : A new approach to the problem of human

interrelations. Beacon House, Beacon, NY.

Moriyama M., Hoshida Y., Otsuka M., Nishimura S., Kato N., Goto T., Taniguchi H.,

Shiratori Y., Seki N. and Omata M. (2003). Relevance network between

chemosensitivity and transcriptome in human hepatoma cells. Mol Cancer Ther. 2 (2),

199-205.

Murks A, and Perc M. (2011). Evolutionary games on visibility graphs. Advances in

Complex Systems, 14 (3), 307-315

Newman M.E.J and Watts D.J. (1999). Renormalization group analysis of the small-

world network model. Phys. Lett., 263, 341-346.


____________________________________________________________________

193

Newman M.E.J. (2001a). The structure and function of complex networks, SIAM

Review, 45, 167-256.

Newman M.E.J. (2001b). Scientific collaboration networks: I. Network construction

and fundamental results, Phys. Rev. E 64, 016131.

Newman M.E.J. (2003). The structure and function of complex networks. SIAM

Review, 45,(2), 167-256.

Neman M.E.J., Barabάsi A.L. and Watts D.J. (2006). The structure and dynamics of

networks. Princeton University Press.

Ni X.H., Jiang Z.Q. and Zhou W.X. (2009). Degree distributions of the visibility

graphs mapped from fractional Brownian motions and multifractal random walks.

Physics Letters A. 373, (42): 3822-3826.

Oostrom O. V., Kleijn D., Fledderus J.O., Pescatori M., Stubbs A., Tui nenburg A.,

Lim S.K. and Verhaar M.C. (2009). Folic acid supplementation normalizes the

endothelial progenitor cell transcriptome of patients with type I diabetes: a case-

control pilot study. Cardiovascular Diabetology. 8, 47.

Palzkill T. (2002): Proteomics. Kluwer Academic Publishers.

Park S. H., Reyes J. A., Gilbert D. R., Kim J. K., Kim S. (2009). Prediction of

protein-protein interaction types using association rule based classification. BMC

Bioinformatics, 10, 36.

Peink J., Alber M. (1998). Improved multifractal box-counting algorithm, virtual

phase transitions, and negative dimensions. Physical Review E., 57(5): 5489-5493


____________________________________________________________________

194

Pellegrini M., Marcotte E.M., Thompson M.J., Eisenberg D. (1999). Yeates TO:

Assigning protein functions by comparative genome analysis: protein phylogenetic

porgies. Proc Natl Acad Sic USA, 96(8): 4285-4288

Popivanov, D., Jivkova, S., Stomonyakov, V. and Nicolova, G. (2005). Effect of

independent component analysis on multifractality of EEG during visual-motor task.

Signal Processing, 85 (11), 2112–2123.

Qi Y., Klein-Seetharaman J., Bar-Joseph Z. (2007). A mixture of feature experts

approach for protein-protein interaction prediction. BMC Bioinformatics, 8 (Suppl

10), S6.

Rao M.A. and Srinivas J. (2003). Neural Networks: Algorithms and Applications.

Alpha Science International Lrd.

Rapaport F., Zinovyev A., Dutreix M., Barillot E. and Vert J.P. (2007). Classification

of microarray data using gene networks. BMC Bioinformatics. 8, 35.

Rapoport A. and Horvath W.J. (1961). A study of a large sociogram. Behavioral

Science, 6, 279-291.

Roberson M.C., Sammis C.G., Sahimi M. and Martin A.J. (1995). Fractal analysis of

three-dimensional spatial distributions of earthquakes with a percolation

interpretation. J.Geophys. Res., 100, 609-620.

Rodrigues F. A., Costab L.da F. and Barbierib A. L. (2011). Resilience of protein-

protein interaction networks as determined by their large-scale topological features.

Molecular Biosystems. 4, 1263-1269.

Rozenfeld H. D., Gallos L. K., Song C., Makse H. A.(2008) Fractal and transfractal

scale-free networks. Physics. 16


____________________________________________________________________

195

Rual J. F. et al.(2005). Towards a proteome-scale map of the human protein-protein

interaction network. Nature, 473(20): 1173-1178

Rumelhart D.E., Hinton G.E., Williams R.J. (1986). Learning representations by

back-propagating errors. Nature, 323:533-536

Saa A., Gascό G., Grau J.B., Antόn J.M. and Tarquis A.M. (2007). Comparison of

gliding box and box-counting methods in river network analysis. Nonlin. Processes

Geophys., 14, 603-613.

Sanoudou D., Haslett J.N., Kho A.T., Guo S., Gazda H.T., Greenberg S.A., Lidov

H.G., Kohane I.S., Kunkel L.M. and Beggs A.H. (2003). Expression profiling reveals

altered satellite cell numbers and glycolytic enzyme transcription in nemaline

myopathy muscle. Proc Natl Acad Sci USA. 100 (8), 4666-4671.

Scott J. (2000). Social network analysis: A handbook, Sage Publications, London.

Sedgewick R. (1988). Algorithms in C++. Part 5: Graph Algorithms, Addison-

Wesley, Boston MA.

Seuront L. (2010). Fractals and multifractals in ecology and aquatic science. CRC

Press.

Shanker O. (2007). Defining dimension of a complex network. Mod.Phys.Lett.B, 21,

321-326.

Shimada, Y., Kimura, T. and Ikeguchi, T. (2008). Analysis of chaotic dynamics

using measures of the complex network theory. Arti_cial Neural Networks - ICANN

2008, Pt. I, eds. Kurkova, V., Neruda, R. and Koutnik, J. (Springer, New York), pp.

61-70


____________________________________________________________________

196

Shimizu, Y., Barth, M., Windischberger, C., Moser, E. and Thurner, S. (2004).

Waveletbased multifractal analysis of fMRI time series. NeuroImage, 22 (3), 1195–

1202.

Simonis H. (2006). Constraint based resilience analysis. Lecture Notes in Computer

Science, 4204, 16–28.

Simth T.G. and Lange G.D. (1998). Biological cellular morphometry-fractal

dimensions, lacunarity and multifractals. Fractal in Biology and Medicine (Birkauser,

Basel, 1998)

Shirazi A. H., Jafari G. R., Davoudi J., Peinke J., Tabar M. R. R. and Sahimi M.

(2009). Mapping stochastic processes onto complex networks, J. Stat. Mech. P07046.

Small, M., Zhang, J. and Xu, X. (2009). Transforming time series into complex

networks. Complex Sciences.First International Conference, Complex 2009.

Shanghai, China, February 2009. Revised Papers, Part 2, ed. Zhou, J. (Springer,

Berlin), 2078-2089

Sol A.D. and O’Meara P. (2005). Small-world network approach to identify key

residues in protein-protein interaction. Protein, 58: 672-682

Song C., Havlin S., and Makse H. A. (2005). Self-similarity of complex networks.

Nature (London), 433, 392 -395.

Song C., Gallos L. K., Havlin S., Makse H. A. (2006). Origins of fractality in the

growth of complex networks. Nature Physics 2, 275-281.

Song, C., Lazaros K. G., Havlin S., Makes H. A. (2007). How to calculate the fractal

dimension of a complex network: the box covering algorithm. Journal of Statistical

Mechanics: Theory and Experiment, 3, P03006


____________________________________________________________________

197

Stekel D. (2003). Microarray Bioinformatics. Cambridge.

Strogatz S.H. (2001). Exploring complex networks, Nature, 410,268-276

Takahashi, T., Murata, T., Narita, K., Hamada, T., Kosaka, H., Omori, M., Takahashi,

K., Kimura, H., Yoshida, H. and Wada, Y. (2006). Multifractal analysis of deep

white matter microstructural changes on MRI in relation to early-stage

atherosclerosis. NeuroImage, 32 (3), 1158–1166.

Thiel M., Romano M. and Kurths J. (2006). Spurious structures in recurrence plots

induced by embedding. Nonlinear Dynamics, 44.

Uetz P., et. al. (2000). A comprehensive analysis of protein-protein interactions in

Saccharomyces cerevisiae, Nature, 403,623-627

Veneziano D., Langousis A. and Furcolo P. (2006). Multifractality and rainfall

extremes: A review. Water Resour. Res.42, W06D15.

Venugopal V., Roux S.G, Foufoula-Georgiou E. and Arneodo A. (2006). Revisiting

multifractality of high-resolution temporal rainfall using a wavelet-based formalism.

Water Resour. Res. 42, W06D14.

Vergano L. and Nunes P. A. L. D. (2007). Analysis and evaluation of ecosystem

resilience: An economic perspective with an application to the Venice lagoon.

Biodivers. Conserv., 16, 3385–3408.

Voss R.F. (1988). Fractals in nature: From characterization to simulation. In The

Science of Fractal Images, Peiten H.O. and Saupe D., Eds. Springer, New York, 21-

70


____________________________________________________________________

198

Voy B.H., Scharff J.A., Perkins A.G., Saxton A.M., Borate B., Chesler E.J.,

Branstetter L.K. and Langston M.A. (2006). Extracting gene networks for low-dose

radiation using graph theoretical algorithms. PLos Comput Biol. 2 (7), e89.

Wachi S. Yoneda K. and Wu R. (2005). Interactome-transcriptome analysis reveals

the high centrality of genes differentially expressed in lung cancer tissues.

Bioinformatics. 21 (23), 4205-4208

Waksman G. (2005): Protein Reviews. Springer.

Wasserman S. and Faust K. (1994). Social networks analysis, Cambridge University

Press, Cambridge.

Watts D.J. (1999). Small worlds: the dynamics of networks between order and

randomness, Princeton University Press, Princeton, NJ.

Watts D.J. and Strogatz S.H. (1998). Collective dynamics of ‘small-world’ networks,

Nature 393, 440-442.

Werhli A.V., Grzegorczyk M., Husmeier D. (2006). Comparative evaluation of

reverse engineering gene regulatory networks with relevance networks, graphical

Gaussian models and Bayesian models. Bioinformatics. 22, 2523-2531.

Wright G.W. and Simon R.M. (2003). Arandom variance model for detection of

differential gene expression in small microarray experiments. Bioinformatics. 19 (18),

2448-2455.

Wu J., Sun H. and Gao Z. (2008). Mapping to complex networks from chaos time

series in the car following model. Traffic and Transportation Studies: Proceedings of

the Sixth International Conference on Traffic and Transportation Studies, eds. Mao,

B., Tian, Z., Huang, H. & Gao, Z. (ASCE & T&DI, Reston, VA), 397-407


____________________________________________________________________

199

Xia, Y., Feng, D. and Zhao, R. (2006). Morphology-based multifractal estimation for

texture segmentation. IEEE Transactions on Image Processing, 15 (3), 614–624.

Xie S.Y., Cheng Q.M., Ling Q.C., Li B., Bao Z.Y. and Fan P. (2010). Fractal and

multifractal analysis of carbonate pore-scale digital images of petroleum reservoirs.

Marine and Petroleum Geology, 27 (2), 476-485.

Xie W.J. and Zhou W.X. (2011). Horizontal visibility graphs transformed from fractional Brownian motions: Topological properties versus the Hurst index. Physica A 390 (20), 3592-3601.

Xu X., Zhang J. and Small M. (2008). Superfamily phenomena and motifs of networks induced from time series. Proceedings of the National Academy of Sciences USA 105, 19601-19605.

Yang X., Pratley R.E., Tokraks S., Bogardus C. and Permana P.A. (2002). Microarray profiling of skeletal muscle tissues from equally obese, non-diabetic insulin-sensitive and insulin-resistant Pima Indians. Diabetologia 45, 1584-1593.

Yang Y. and Yang H. (2008). Complex network-based time series analysis. Physica A 387, 1381-1386.

Yi W.J., Heo M.S., Lee S.S., Choi S.C., Huh K.H. and Lee S.P. (2007). Direct measurement of trabecular bone anisotropy using directional fractal dimension and principal axes of inertia. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontics 104 (1), 110-116.

Yook S.-H., Radicchi F. and Meyer-Ortmanns H. (2005). Self-similar scale-free networks and disassortativity. Phys. Rev. E 72, 045105.

Yu Z.G., Anh V. and Lau K.S. (2001a). Multifractal characterisation of length sequences of coding and noncoding segments in a complete genome. Physica A 301, 351-361.

Yu Z.G., Anh V. and Lau K.S. (2001b). Measure representation and multifractal analysis of complete genome. Phys. Rev. E 64, 031903.

Yu Z.G., Anh V. and Wang B. (2001c). Correlation property of length sequences based on global structure of complete genome. Phys. Rev. E 63, 011903.

Yu Z.G., Anh V. and Lau K.S. (2003). Multifractal and correlation analysis of protein sequences from complete genome. Phys. Rev. E 68, 021913.

Yu Z.G., Anh V. and Lau K.S. (2004). Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses. J. Theor. Biol. 226, 341-348.

Yu Z.G., Anh V., Wanliss J.A. and Watson S.M. (2005). Chaos game representation of the Dst index and prediction of geomagnetic storm events. Chaos, Solitons and Fractals 31, 736-746.

Yu Z.G., Anh V.V., Lau K.S. and Zhou L.Q. (2006). Fractal and multifractal analysis of hydrophobic free energies and solvent accessibilities in proteins. Phys. Rev. E 73, 031920.

Yu Z.G., Anh V.V. and Zhou Y. (2007). Cluster protein structures using recurrence quantification analysis on coordinates of alpha-carbon atoms of proteins. Physics Letters A 368, 314-319.

Yu Z.G., Anh V. and Eastes R. (2009). Multifractal analysis of geomagnetic storm and solar flare indices and their class dependence. J. Geophys. Res. 114, A05214.

Yu Z.G., Anh V., Wang Y., Mao D. and Wanliss J. (2010). Modeling and simulation of the horizontal component of the geomagnetic field by fractional stochastic differential equations in conjunction with empirical mode decomposition. J. Geophys. Res. 115, A10219.

Zhang J. and Small M. (2006). Complex network from pseudoperiodic time series: Topology versus dynamics. Physical Review Letters 96, 238701.

Zhang J., Sun J., Luo X., Zhang K., Nakamura T. and Small M. (2008). Characterizing pseudoperiodic time series through the complex network approach. Physica D 237, 2856-2865.

Zhang L., Hu K. and Tang Y. (2010). Predicting disease-related genes by topological similarity in human protein-protein interaction network. Cent. Eur. J. Phys. 8 (4), 672-682.

Zhang Z.-Z., Zhou S.-G. and Zou T. (2007). Self-similarity, small-world, scale-free scaling, disassortativity, and robustness in hierarchical lattices. Eur. Phys. J. B 56, 259-271.

Zhou L.Q., Yu Z.G., Deng J.Q., Anh V. and Long S.C. (2005). A fractal method to distinguish coding and noncoding sequences in a complete genome based on a number sequence representation. J. Theor. Biol. 232, 559-567.

Zhou W.X., Jiang Z.Q. and Sornette D. (2007). Exploring self-similarity of complex cellular networks: The edge-covering method with simulated annealing and log-periodic sampling. Physica A 375, 741-752.

Zhuang X. and Meng Q. (2004). Local fuzzy fractal dimension and its application in medical image processing. Artificial Intelligence in Medicine 32 (1), 29-36.