Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
-
date post
21-Dec-2015 -
Category
Documents
-
view
212 -
download
0
Transcript of Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
Graph, Search Algorithms
Ka-Lok NgDepartment of Bioinformatics
Asia University
2
Content
How to characterize a biology network ?– Graph theory, topological parameters (node degrees, average path length, clustering coefficient, and node degree correlation.)– Random graph, Scale-free network, Hierarchical network
Search algorithm – Breadth-first Search, Depth-first Search
3
Biological Networks - metabolic networks
Metabolism is the most basic network of biochemical reactions, which generate energy for driving various cell processes, and degrade and synthesize many different bio-molecules.
4
Biological Networks - Protein-protein interaction network (PIN)
Proteins perform distinct and well-defined functions, but little is known about how
interactions among them are structured at the cellular level. Protein-protein interaction account for binding interactions and formation of protein complex. - Experiment – Yeast two-hybrid method, or co-immunoprecipitation
www.utoronto.ca/boonelab/proteomics.htm
Limitation: No subcellular location, and temporalinformation.
Cliques – protein complexes ?
5
Biological Networks - PIN
Yeast Protein-protein interaction network - protein-protein interactions are not random - highly connected proteins are unlikely to interact with each other.
Not a random network - Data from the high- throughput two-hybrid experiment (T. Ito, et al. PNAS (2001) )- The full set containing 4549 interactions among 3278 yeast proteins 87% nodes in the largest component- kmax ~ 285 !- Figure shows nuclear proteins only
6
Biological Networks – Gene regulation networks
Example of a genetic regulatory network of two genes (a and b), each coding for a regulatory protein (A and B).
In a gene regulatory network, the protein encoded by a gene can regulatethe expression of other genes, for instance, by activating or inhibiting DNA transcription. These genes in turn produce new regulatory proteins that control other genes.
7
Biological Networks – Gene regulation networks
Transcription regulatory network in Yeast- From the YPD database:
1276 regulations among 682 proteins
by 125 transcription factors (~10 regulated genes per TF)
- Part of a bigger genetic regulatory network of 1772 regulations among 908 proteins
Transcription regulatory network in H. sapiensData courtesy of Ariadne Genomics
obtained from the literature search:
1449 regulations among 689 proteins
Transcription regulatory network in E. coliData (courtesy of Uri Alon)
was curated from the Regulon
database:
606 interactions between
424 operons (by 116 TFs)
8
Graph Theory – Basic conceptsGraphsG=(N,E)N={n1 n2,... nN}E={e1 e2,... eM}ek={ni nj}Nodes: proteinsEdges: protein interactions
Mutligraphek={ni nj}+ duplicate edgesi.e. em={ni nj}Nodes: proteinsEdges: interactions of different sort: binding and similarity
HypergraphsHyperedge: ex={ni, nj, nk ...}Nodes: proteinsEdges: protein complexes
Directed hypergraphHyperedge: ex={ni, nj .. | nk nl ...}Nodes: substancesEdges: chemical reactions A + B C +DeX={A, B .. | C, D ...}
Directed graphek={ni nj}Nodes: genes and their productsEdges from A to B: gene regulation gene A regulates expression of gene B
Different systems Different graphs
NnEnd ||2)(
9
Graph Theory – Basic conceptsNode degree
Components
Complete graph (Clique)
Shortest path length
Clustering coefficient Ci
if A-B, B-C, then it is highly probable that A-C
)1(
2
ii
ii kk
EC
1.0)15(5
12
AC
Two ways to compute Ci
-Ei actual connections out of Ck2 possible
connections-number of triangles that included i/ki(ki-1)
Average clustering coefficient
N
iiC
NC
1
1
10
Graph Theory – Vertex adjacency matrix
01
01
1101
10
A
1
2 3
4- ∞ means not directly connected
- node i connectivity, ki = countj(mij = 1)
ki
1
3
1
1
Undirected graph
Bipartite graph
0
0TB
BA
symmetric
11
Graph Theory – Edge adjacency matrix
c
0111
1010
1101
1010
)(GE
a b c d
a
b
c
d
symmetric
1
2 3
4
ab
d
G
The edge adjacency matrix (E) of a graph G is identical to vertex adjacency matrix (A) of the line graph of G, L(G). That is the edge in G are replaced by vertices in L(G). Two vertices in L(G) are connected whenever the corresponding edges in G are adjacent. a b
cd
A(L(G)) = E(G)L(G)
The labeling of the same graph G are related by a similarity transformation, P -1A(G1)P=A(G2).
12
Graph Theory – average network distance
Interaction path length or average network distance, d
- the average of the distances between all pairs of nodes - frequency of the shortest interaction path length, f(L) - determined by using the Floyd’s algorithm The average network diameter d is given by
where L is the shortest path length between two nodes.
Network diameter (global) Average network distance (local)
L
L
Lf
LLfd
)(
)(
13
Graph Theory – the shortest path
The shortest path- Floyd algorithm, an O(N3) algorithm.
For iteration n,- given three nodes i, j and k, it is
shorter to reach j from i by passing through k
Mnij=min{Mn-1
ij, Mn-1ik+Mn-1
kj}
- search for all possible paths,
e.g. 1-2, 1-2-3, 1-2-4, 2-3, 2-4
1
2 3
4
i
k
j
14
Random Graph Theory = Graph Theory +Probability
15
Random Graph Theory = Graph Theory +Probability
16
Random Graph Theory= Graph Theory + Probability
Random graph (Erdos and Renyi, 1960)
N nodes labeled and connected by n edges CN
2 = N(N-1)/2 possible edges
possible graphs with N nodes and n edges
2
N
nC
n Number of possible graphs, C6n
1 6
2 15
3 20
4 15
5 6
6 1
N = 4 C6n
n 3 3 4 4 5 6
N = 4
kNkNki ppCkkP 11 )1()(
Search Algorithms Find the shortest route, in terms of distance between
nodes S and G. A matrix representation of the graph in Figure 3.1
17
Search Algorithms – Breadth-first search (BFS)
Nodes are expanded in the order in which they are generated. S is expanded into A, B, and C, which are generated in the order 1,2,and 3.
A is expanded first to B, C and D, which has generation order 4, 5 and 6 BFS goes back to node B and expands that next to A, C and E (generation order 7,
8 and 9) and then goes back to node 3 (C) and expands that to A, B, D, E and F (generation order 10, 11, 12, 13 and 14).
18
Search Algorithms – Depth-first search (DFS)
Begin from the root node of the tree Visited the first unvisit node, then marked this
node Then find the next unvisit node, then marked
this node When proceed, all the nodes are already
visited, go back to the parent node
19
Search Algorithms – Depth-first search (DFS)
20
E