Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.

Content

How to characterize a biological network?
- Graph theory, topological parameters (node degree, average path length, clustering coefficient, and node degree correlation)
- Random graph, scale-free network, hierarchical network

Search algorithms – breadth-first search, depth-first search

Biological Networks - metabolic networks

Metabolism is the most basic network of biochemical reactions, which generate energy for driving various cell processes, and degrade and synthesize many different bio-molecules.

Biological Networks - Protein-protein interaction network (PIN)

Proteins perform distinct and well-defined functions, but little is known about how interactions among them are structured at the cellular level. Protein-protein interactions account for binding interactions and the formation of protein complexes.

- Experiment – yeast two-hybrid method, or co-immunoprecipitation

www.utoronto.ca/boonelab/proteomics.htm

Limitation: no subcellular location or temporal information.

Cliques – protein complexes?

Biological Networks - PIN

Yeast protein-protein interaction network
- protein-protein interactions are not random
- highly connected proteins are unlikely to interact with each other

Not a random network
- Data from the high-throughput two-hybrid experiment (T. Ito et al., PNAS (2001))
- The full set contains 4549 interactions among 3278 yeast proteins
- 87% of the nodes are in the largest component
- kmax ~ 285!
- Figure shows nuclear proteins only

Biological Networks – Gene regulation networks

Example of a genetic regulatory network of two genes (a and b), each coding for a regulatory protein (A and B).

In a gene regulatory network, the protein encoded by a gene can regulate the expression of other genes, for instance, by activating or inhibiting DNA transcription. These genes in turn produce new regulatory proteins that control other genes.

Biological Networks – Gene regulation networks

Transcription regulatory network in yeast
- From the YPD database: 1276 regulations among 682 proteins by 125 transcription factors (~10 regulated genes per TF)
- Part of a bigger genetic regulatory network of 1772 regulations among 908 proteins

Transcription regulatory network in H. sapiens
- Data courtesy of Ariadne Genomics, obtained from the literature search: 1449 regulations among 689 proteins

Transcription regulatory network in E. coli
- Data (courtesy of Uri Alon) was curated from the Regulon database: 606 interactions between 424 operons (by 116 TFs)

Graph Theory – Basic concepts

Graphs
- $G = (N, E)$, $N = \{n_1, n_2, \ldots, n_N\}$, $E = \{e_1, e_2, \ldots, e_M\}$, $e_k = \{n_i, n_j\}$
- Nodes: proteins
- Edges: protein interactions

Multigraph
- $e_k = \{n_i, n_j\}$ + duplicate edges, i.e. $e_m = \{n_i, n_j\}$
- Nodes: proteins
- Edges: interactions of different sorts: binding and similarity

Hypergraphs
- Hyperedge: $e_x = \{n_i, n_j, n_k, \ldots\}$
- Nodes: proteins
- Edges: protein complexes

Directed hypergraph
- Hyperedge: $e_x = \{n_i, n_j, \ldots \mid n_k, n_l, \ldots\}$
- Nodes: substances
- Edges: chemical reactions, A + B → C + D, $e_X = \{A, B, \ldots \mid C, D, \ldots\}$

Directed graph
- $e_k = \{n_i, n_j\}$, directed from $n_i$ to $n_j$
- Nodes: genes and their products
- Edges from A to B: gene regulation, gene A regulates expression of gene B

Different systems, different graphs

Sum of node degrees: $\sum_{n \in N} d(n) = 2|E|$
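As a rough illustration of the definitions above, the sketch below stores a small undirected graph as a node set and an edge list, then checks that the node degrees sum to twice the number of edges. The protein names `P1` to `P4` and the helper `node_degrees` are made up for this example and do not come from any dataset.

```python
from collections import Counter

# Hypothetical toy interaction graph (names are illustrative only).
nodes = {"P1", "P2", "P3", "P4"}
edges = [("P1", "P2"), ("P2", "P3"), ("P2", "P4"), ("P3", "P4")]


def node_degrees(nodes, edges):
    """Count how many edges touch each node of an undirected graph."""
    degree = Counter({n: 0 for n in nodes})
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


degree = node_degrees(nodes, edges)
# Each edge contributes to the degree of two nodes, so the sum is 2|E|.
total_degree = sum(degree.values())
print(dict(degree), total_degree, 2 * len(edges))
```

A multigraph could reuse the same edge list with repeated pairs, while a directed graph would treat each pair as ordered and count in-degree and out-degree separately.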

Graph Theory – Basic concepts

- Node degree
- Components
- Complete graph (clique)
- Shortest path length
- Clustering coefficient $C_i$: if A-B and B-C, then it is highly probable that A-C

$$C_i = \frac{2E_i}{k_i(k_i - 1)}$$

$$C_A = \frac{2 \times 1}{5(5 - 1)} = 0.1$$

Two ways to compute $C_i$:
- $E_i$ actual connections out of $C^{k_i}_{2} = k_i(k_i - 1)/2$ possible connections among the neighbors of node i
- twice the number of triangles that include node i, divided by $k_i(k_i - 1)$

Average clustering coefficient

$$C = \frac{1}{N} \sum_{i=1}^{N} C_i$$
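The clustering coefficient can be sketched in a few lines. The example below is a minimal illustration, assuming the graph is stored as a dictionary of neighbor sets. The toy graph is hypothetical: node `A` has five neighbors with a single link among them, which gives the value 0.1 from the worked example. Assigning 0 to nodes with fewer than two neighbors is a convention chosen here, not something the slides state.

```python
from itertools import combinations

# Hypothetical graph: A has 5 neighbors, and only B-C are linked to each other.
adj = {
    "A": {"B", "C", "D", "E", "F"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A"},
    "F": {"A"},
}


def clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)), with E_i the links among neighbors of i."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than 2 neighbors
    e_i = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * e_i / (k * (k - 1))


def average_clustering(adj):
    """Mean of C_i over all N nodes."""
    return sum(clustering(adj, i) for i in adj) / len(adj)


c_a = clustering(adj, "A")
c_avg = average_clustering(adj)
print(c_a, c_avg)
```

Each link among the neighbors of node i closes exactly one triangle through i, so counting neighbor links and counting triangles give the same $E_i$.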

Graph Theory – Vertex adjacency matrix

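[Figure: 4-node undirected graph (nodes 1 to 4) with its vertex adjacency matrix A]

- ∞ means not directly connected
- node i connectivity, $k_i = \mathrm{count}_j(m_{ij} = 1)$
- node degrees $k_i$ listed for the example: 1, 3, 1, 1

Undirected graph

Bipartite graph

$$A = \begin{pmatrix} 0 & B \\ B^{T} & 0 \end{pmatrix}$$

symmetric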
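The degree count $k_i = \mathrm{count}_j(m_{ij} = 1)$ can be tried on a small matrix. The sketch below assumes a hypothetical 4-node star graph with node 2 as the hub, chosen only because it reproduces the degree list 1, 3, 1, 1. It is not necessarily the graph drawn on the slide. `math.inf` stands in for the ∞ entries.

```python
import math

INF = math.inf  # marks pairs of nodes that are not directly connected

# Assumed example: star graph with edges 1-2, 2-3, 2-4 (rows/columns = nodes 1..4).
A = [
    [0,   1,   INF, INF],
    [1,   0,   1,   1],
    [INF, 1,   0,   INF],
    [INF, 1,   INF, 0],
]


def node_degrees(matrix):
    """k_i = number of entries equal to 1 in row i."""
    return [sum(1 for m_ij in row if m_ij == 1) for row in matrix]


def is_symmetric(matrix):
    """An undirected graph has m_ij == m_ji for every pair (i, j)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


k = node_degrees(A)
symmetric = is_symmetric(A)
print(k, symmetric)
```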

Graph Theory – Edge adjacency matrix

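Edge adjacency matrix of graph G, with rows and columns labeled by the edges a, b, c, d (symmetric):

$$E(G) = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}$$

[Figure: graph G with nodes 1, 2, 3, 4 and edges a, b, c, d]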

The edge adjacency matrix (E) of a graph G is identical to the vertex adjacency matrix (A) of the line graph of G, L(G). That is, the edges in G are replaced by vertices in L(G). Two vertices in L(G) are connected whenever the corresponding edges in G are adjacent.

[Figure: line graph L(G) with vertices a, b, c, d]

$$A(L(G)) = E(G)$$

Different labelings of the same graph G are related by a similarity transformation, $P^{-1} A(G_1) P = A(G_2)$, where P is a permutation matrix.
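The relation between edge adjacency and the line graph can be checked on a small case. The sketch below assumes one possible assignment of the edge labels a to d to node pairs. The assignment is hypothetical: it was picked because it yields the 4×4 matrix with rows 0111, 1010, 1101, 1010, and the slide's actual labeling may differ. Two edges count as adjacent when they share an endpoint.

```python
# Assumed labeling of the four edges of a 4-node graph (illustrative only).
edges = {"a": (1, 2), "b": (1, 3), "c": (2, 3), "d": (2, 4)}


def edge_adjacency(edges):
    """Entry (x, y) is 1 when distinct edges x and y share an endpoint."""
    labels = sorted(edges)
    return [
        [1 if x != y and set(edges[x]) & set(edges[y]) else 0 for y in labels]
        for x in labels
    ]


E = edge_adjacency(edges)
for label, row in zip(sorted(edges), E):
    print(label, row)
```

Read as a vertex adjacency matrix, `E` describes a graph whose vertices are a, b, c, d, which is the line graph L(G) of this example.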

Graph Theory – average network distance

Interaction path length or average network distance, d

- the average of the distances between all pairs of nodes
- frequency of the shortest interaction path length, f(L)
- determined by using Floyd's algorithm

The average network distance d is given by

$$d = \frac{\sum_{L} L \, f(L)}{\sum_{L} f(L)}$$

where L is the shortest path length between two nodes.

Network diameter (global), average network distance (local)
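To make the average concrete, the sketch below computes f(L) and d from a matrix of shortest path lengths. It is a minimal illustration under two assumptions: the distance matrix is already known (here, a hypothetical 4-node star graph with node 2 as the hub), and each unordered pair of connected nodes is counted once. The function name `average_distance` is invented for this example.

```python
import math
from collections import Counter

# Assumed shortest-path lengths for a star graph with edges 1-2, 2-3, 2-4.
dist = [
    [0, 1, 2, 2],
    [1, 0, 1, 1],
    [2, 1, 0, 2],
    [2, 1, 2, 0],
]


def average_distance(dist):
    """Return (d, f) where f[L] counts node pairs at shortest distance L."""
    f = Counter()
    n = len(dist)
    for i in range(n):
        for j in range(i + 1, n):      # each unordered pair once
            if dist[i][j] != math.inf:  # skip disconnected pairs
                f[dist[i][j]] += 1
    d = sum(L * count for L, count in f.items()) / sum(f.values())
    return d, dict(f)


d, f = average_distance(dist)
print(d, f)
```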

Graph Theory – the shortest path

The shortest path
- Floyd algorithm, an $O(N^3)$ algorithm.

For iteration n
- given three nodes i, j and k, check whether it is shorter to reach j from i by passing through k:

$$M^{n}_{ij} = \min\{M^{n-1}_{ij},\; M^{n-1}_{ik} + M^{n-1}_{kj}\}$$

- search for all possible paths, e.g. 1-2, 1-2-3, 1-2-4, 2-3, 2-4

[Figure: 4-node graph with nodes 1, 2, 3, 4, and a path from i to j passing through k]
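The update rule can be sketched as a triple loop. The example below is a minimal version that updates one matrix in place rather than keeping separate copies for iterations n-1 and n, a standard simplification that gives the same result. The input is an assumed 4-node graph with edges 1-2, 2-3 and 2-4, inferred from the path examples listed above, and `math.inf` marks pairs with no direct edge.

```python
import math

INF = math.inf


def floyd(matrix):
    """All-pairs shortest path lengths; O(N^3) time."""
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy of the input
    for k in range(n):              # intermediate node
        for i in range(n):
            for j in range(n):
                # Is it shorter to reach j from i by passing through k?
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m


# Assumed example graph: edges 1-2, 2-3, 2-4, each of length 1.
M0 = [
    [0,   1,   INF, INF],
    [1,   0,   1,   1],
    [INF, 1,   0,   INF],
    [INF, 1,   INF, 0],
]

M = floyd(M0)
for row in M:
    print(row)
```

In this example every pair that is not directly connected ends up at distance 2, via node 2.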

Random Graph Theory = Graph Theory + Probability

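Random graph (Erdős and Rényi, 1960)

- N nodes, labeled and connected by n edges
- $C^{N}_{2} = N(N-1)/2$ possible edges
- $C^{C^{N}_{2}}_{n}$ possible graphs with N nodes and n edges

For N = 4 ($C^{4}_{2} = 6$ possible edges):

n    Number of possible graphs, $C^{6}_{n}$
1    6
2    15
3    20
4    15
5    6
6    1

[Figure: example graphs with N = 4 and n = 3, 3, 4, 4, 5, 6]

Degree distribution, where p is the probability that a pair of nodes is connected:

$$P(k_i = k) = C^{N-1}_{k} \, p^{k} (1 - p)^{N-1-k}$$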
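The counts for N = 4 can be reproduced with binomial coefficients. The sketch below is a small check, assuming Python 3.8 or later for `math.comb`. The function names are invented for this example, and `p` denotes the probability that any given pair of nodes is connected.

```python
from math import comb


def possible_edges(N):
    """Number of node pairs, N(N-1)/2."""
    return comb(N, 2)


def possible_graphs(N, n):
    """Number of labeled graphs with N nodes and exactly n edges."""
    return comb(possible_edges(N), n)


def degree_probability(N, p, k):
    """Binomial probability that a given node has exactly k neighbors."""
    return comb(N - 1, k) * p**k * (1 - p) ** (N - 1 - k)


counts = [possible_graphs(4, n) for n in range(1, 7)]
print(counts)

# The degree probabilities over k = 0 .. N-1 should sum to 1.
total = sum(degree_probability(4, 0.5, k) for k in range(4))
print(total)
```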

Search Algorithms

Find the shortest route, in terms of distance, between nodes S and G.

A matrix representation of the graph in Figure 3.1

Search Algorithms – Breadth-first search (BFS)

- Nodes are expanded in the order in which they are generated.
- S is expanded into A, B, and C, which are generated in the order 1, 2, and 3.
- A is expanded first, to B, C and D, which have generation order 4, 5 and 6.
- BFS goes back to node B and expands that next, to A, C and E (generation order 7, 8 and 9).
- BFS then goes back to node 3 (C) and expands that to A, B, D, E and F (generation order 10, 11, 12, 13 and 14).
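The expansion order described above can be sketched with a queue. The example below is a minimal illustration under stated assumptions. The neighbors of S, A, B and C follow the description, while the links of D, E, F and G are invented because the figure is not available. Unlike the walk-through, this version keeps track of nodes already generated and does not generate them again. It returns a route with the fewest edges, which need not be the shortest one when edges have different lengths.

```python
from collections import deque

# Neighbors of S, A, B, C follow the text; those of D, E, F, G are assumed.
graph = {
    "S": ["A", "B", "C"],
    "A": ["S", "B", "C", "D"],
    "B": ["S", "A", "C", "E"],
    "C": ["S", "A", "B", "D", "E", "F"],
    "D": ["A", "C", "G"],
    "E": ["B", "C", "G"],
    "F": ["C", "G"],
    "G": ["D", "E", "F"],
}


def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    parent = {start: None}   # also serves as the set of generated nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()           # expand in generation order
        if node == goal:
            path = []
            while node is not None:      # walk back through the parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


path = bfs_path(graph, "S", "G")
print(path)
```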

Search Algorithms – Depth-first search (DFS)

- Begin from the root node of the tree.
- Visit the first unvisited node, then mark this node.
- Then find the next unvisited node, and mark it as well.
- When, on proceeding, all neighboring nodes are already visited, go back to the parent node.
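These steps map naturally onto a recursive function. The sketch below is one possible reading of them, not a definitive implementation. The graph is a hypothetical example with nodes S to G, and `dfs_order` is a name invented here. Returning from the recursive call plays the role of going back to the parent node.

```python
# Hypothetical example graph (adjacency lists); not taken from the slides.
graph = {
    "S": ["A", "B", "C"],
    "A": ["S", "B", "C", "D"],
    "B": ["S", "A", "C", "E"],
    "C": ["S", "A", "B", "D", "E", "F"],
    "D": ["A", "C", "G"],
    "E": ["B", "C", "G"],
    "F": ["C", "G"],
    "G": ["D", "E", "F"],
}


def dfs_order(graph, root):
    """Return the nodes in the order depth-first search first visits them."""
    order = []
    marked = {root}

    def visit(node):
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in marked:   # first unvisited neighbor
                marked.add(neighbor)     # mark it, then go deeper
                visit(neighbor)
        # All neighbors are marked: returning goes back to the parent.

    visit(root)
    return order


order = dfs_order(graph, "S")
print(order)
```

For very deep graphs an explicit stack would avoid Python's recursion limit. The recursive form is used here because it mirrors the steps most directly.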
