On finding clusters in undirected simple graphs: application to protein complex detection
![Page 1: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/1.jpg)
1. On finding clusters in undirected simple graphs: application to protein complex detection
2. DPClus software tool
Today’s lecture will cover the following four topics
Comparative Genomics
(Network Biology)
![Page 2: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/2.jpg)
Outline
•Introduction
•Some basic concepts
•The proposed algorithm
•The DPClus software
•Results & Discussion
•Conclusions
![Page 3: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/3.jpg)
Introduction
•There is no universal definition of a cluster.
•But clustering is an important issue.
•Consequently, there are diverse definitions and various methods.
•The major purpose of clustering is finding cohesive groups.
•Here, we are going to discuss a graph clustering algorithm.
![Page 4: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/4.jpg)
In a graph, a cluster is a subgraph whose nodes are more densely connected to each other than to the other nodes of the graph.
This is a flexible definition of a cluster.
Intuitively, we can recognize two clusters in this arbitrary graph.
However, it is difficult to draw a large graph in a way that reveals its clusters.
![Page 5: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/5.jpg)
An E. coli protein-protein interaction network consisting of 3007 proteins and 11531 interactions (from Mori Lab, NAIST, Japan).
An algorithm is needed to detect locally dense regions.
![Page 6: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/6.jpg)
Md. Altaf-Ul-Amin, Yoko Shinbo, Kenji Mihara, Ken Kurokawa and Shigehiko Kanaya, “Development and implementation of an algorithm for detection of protein complexes in large interaction networks”, BMC Bioinformatics 7:207, April 2006.
![Page 7: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/7.jpg)
Some basic concepts
Two nodes that belong to the same cluster are likely to have more common neighbors than two nodes that do not.
![Page 8: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/8.jpg)
![Page 9: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/9.jpg)
•The density d of a cluster is the ratio of the number of edges present in it to the maximum possible number of edges in it.
•It is easy to see that $d = \frac{|E|}{|E|_{max}} = \frac{2|E|}{|N|(|N|-1)}$, where |E| is the number of edges and |N| the number of nodes of the cluster.
•d is a real number ranging from 0 to 1.
![Page 10: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/10.jpg)
Density of the total graph = 0.241
Densities of the two complexes: d = 0.9 and d = 1.0
The densities of the complexes are considerably higher than that of the whole graph.
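As a quick check of the density formula, here is a minimal Python sketch. It assumes that the graph on this slide is the 14-node example graph whose adjacency matrix M is given later in this lecture (node i stands for Pi); the helper names `build_adjacency` and `density` are illustrative, and treating a lone node as having density 1 is an assumption that matches the worked examples.

```python
from itertools import combinations

# 14-node example graph (node i stands for Pi), transcribed from the
# adjacency matrix M given later in this lecture.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 5), (3, 7), (4, 5), (5, 6), (6, 10), (7, 8), (8, 9), (8, 12),
         (8, 13), (9, 10), (9, 12), (9, 13), (11, 12), (12, 13)]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected simple graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(nodes, adj):
    """d = 2|E| / (|N|(|N| - 1)) for the subgraph induced by `nodes`."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 1.0  # assumption: a lone node counts as density 1, as on the slides
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return 2 * edges / (n * (n - 1))


adj = build_adjacency(EDGES)
print(round(density(list(adj), adj), 4))  # whole graph
print(density([1, 2, 3, 4, 5], adj))      # first complex
print(density([8, 9, 12, 13], adj))       # second complex
```

The whole graph has 22 edges among 14 nodes, so d = 44/182 ≈ 0.2418, which the slide reports as 0.241; the two complexes give 0.9 and 1.0.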
![Page 11: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/11.jpg)
Considering density alone is not enough.
Such situations can be handled by keeping track of the periphery of a cluster.
•Both graphs consist of 8 nodes, and both have density 0.5.
•But one of them appears to be a single cluster, while the other is divided into two clusters.
[Figure: two 8-node graphs with nodes a to h]
![Page 12: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/12.jpg)
Some basic concepts
The cluster property cp_nk of a node n with respect to a cluster k of density d_k and size |N_k| is defined as follows:
$cp_{nk} = \frac{|E_{nk}|}{d_k \times |N_k|}$
Here, |E_nk| is the total number of edges between node n and the nodes of cluster k.
[Figure: the same two 8-node graphs, with node f highlighted]
Cluster property of node f = 0.57
Cluster property of node f = 0.2
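A minimal sketch of the cluster property in Python, assuming the 14-node example graph used later in this lecture (node i stands for Pi) and hypothetical helper names. Following the flowchart and the worked example, the cluster passed in does not contain the candidate node n, so d_k and |N_k| describe the cluster before n is added.

```python
from itertools import combinations

# 14-node example graph (node i stands for Pi), from the adjacency matrix M.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 5), (3, 7), (4, 5), (5, 6), (6, 10), (7, 8), (8, 9), (8, 12),
         (8, 13), (9, 10), (9, 12), (9, 13), (11, 12), (12, 13)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(nodes, adj):
    n = len(nodes)
    if n < 2:
        return 1.0  # assumption: a lone node counts as density 1
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return 2 * edges / (n * (n - 1))


def cluster_property(n, cluster, adj):
    """cp_nk = |E_nk| / (d_k * |N_k|); `cluster` must not contain n."""
    e_nk = sum(1 for v in cluster if v in adj[n])
    return e_nk / (density(cluster, adj) * len(cluster))


adj = build_adjacency(EDGES)
print(cluster_property(4, [1, 2, 3, 5], adj))               # 3 / (1.0 * 4)
print(round(cluster_property(0, [1, 2, 3, 4, 5], adj), 2))  # 1 / (0.9 * 5)
```

These are the values reported for P4 (0.75) and P0 (about 0.22) in the worked example later in this lecture.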
![Page 13: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/13.jpg)
The proposed algorithm is a sequential constructive algorithm:
It initializes a complex (cluster) by choosing a seed node.
It then repeatedly adds other nodes on the basis of priority, subject to certain conditions.
The major methods of the algorithm are:
•Choosing a seed node.
•Selecting a priority node.
•Checking the necessary conditions before adding a node to a complex.
The proposed Algorithm
![Page 14: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/14.jpg)
Inputs to the algorithm are:
•The adjacency matrix of the network.
•A minimum threshold density (d_in) for the generated clusters.
•A threshold on the cluster property (cp_in) that determines how a complex is separated from its periphery.
Outputs of the algorithm are:
•Overlapping or non-overlapping complexes whose densities are greater than or equal to the given threshold density.
![Page 15: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/15.jpg)
Flowchart of the proposed algorithm
Input & initialization
1. Input an undirected simple graph G. Set the thresholds d_in and cp_in, and initialize the cluster ID k = 1.
Termination check
2. Generate the degrees of the nodes of G and determine the highest node degree D_h. If D_h = 0, end.
Seed selection
3. Generate the weight of each node of G. If the highest node weight is 0, start with the highest-degree node of G as the k-th cluster; otherwise, start with the highest-weight node of G as the k-th cluster.
Cluster formation
4. Generate the neighbors of the k-th cluster in G and sort them according to priority. Add the highest-priority neighbor p to the k-th cluster.
5. Check whether d_k > d_in and cp_p(k-p) > cp_in, where cp_p(k-p) is the cluster property of p with respect to the k-th cluster without p. If both hold, go back to step 4. Otherwise, deduct the last added node p from the k-th cluster.
6. If not all neighbors of the k-th cluster have been checked, add the next-priority neighbor p to the k-th cluster and go to step 5.
Output & update
7. Print the k-th cluster, set G ← G − (k-th cluster) and k ← k + 1, and go back to step 2.
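The flowchart can be sketched in Python as follows. This is an illustrative sketch, not the DPClus implementation: all function names are hypothetical; the neighbor priority (number of edges to the cluster, then the sum of their edge weights, then the smallest node id) and the seed tie-break (smallest node id) are assumptions consistent with the tables in the worked example; and the thresholds d_in = 0.7 and cp_in = 0.5 are chosen only for illustration. It runs on the 14-node example graph whose adjacency matrix M is shown next (node i stands for Pi).

```python
from itertools import combinations

EDGES = [(0, 1), (1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 5), (3, 7), (4, 5), (5, 6), (6, 10), (7, 8), (8, 9), (8, 12),
         (8, 13), (9, 10), (9, 12), (9, 13), (11, 12), (12, 13)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(nodes, g):
    n = len(nodes)
    if n < 2:
        return 1.0  # a lone seed is treated as density 1, as on the slides
    edges = sum(1 for u, v in combinations(nodes, 2) if v in g[u])
    return 2 * edges / (n * (n - 1))


def cluster_property(p, cluster, g):
    """cp of p with respect to `cluster`, which does not yet contain p."""
    links = sum(1 for u in cluster if u in g[p])
    return links / (density(cluster, g) * len(cluster))


def edge_weight(u, v, g):
    """Number of common neighbors of u and v, i.e. (M^2)_uv."""
    return len(g[u] & g[v])


def ranked_neighbors(cluster, g):
    """Neighbors of the cluster, highest priority first (assumed ranking key)."""
    outside = {v for u in cluster for v in g[u]} - set(cluster)

    def key(v):
        links = [u for u in cluster if u in g[v]]
        return (-len(links), -sum(edge_weight(u, v, g) for u in links), v)

    return sorted(outside, key=key)


def dpclus_sketch(edges, d_in, cp_in):
    g = build_adjacency(edges)
    clusters = []
    # Termination check: stop when the highest node degree D_h is 0.
    while g and max(len(nbrs) for nbrs in g.values()) > 0:
        # Seed selection: highest node weight, or highest degree if all weights are 0.
        weight = {u: sum(edge_weight(u, v, g) for v in g[u]) for u in g}
        if max(weight.values()) > 0:
            seed = max(g, key=lambda u: (weight[u], -u))
        else:
            seed = max(g, key=lambda u: (len(g[u]), -u))
        cluster = [seed]
        # Cluster formation: try neighbors in priority order.
        while True:
            for p in ranked_neighbors(cluster, g):
                if (density(cluster + [p], g) > d_in
                        and cluster_property(p, cluster, g) > cp_in):
                    cluster.append(p)
                    break  # accepted: regenerate the neighbors of the larger cluster
            else:
                break  # all neighbors checked and rejected: the cluster is final
        clusters.append(sorted(cluster))
        # Output & update: remove the cluster's nodes and edges from G.
        for u in cluster:
            for v in g.pop(u):
                if v in g:
                    g[v].discard(u)
    return clusters


print(dpclus_sketch(EDGES, d_in=0.7, cp_in=0.5))
```

Checking a candidate before appending it is equivalent to the flowchart's add-then-deduct step. With these assumed thresholds the sketch returns P1 to P5, then P8, P9, P12 and P13, and finally the pair P6 and P10, because no minimum cluster size is applied.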
![Page 16: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/16.jpg)
M =
0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 0 1 1 1 0 0 0 0 0 0 0 0
0 1 1 0 1 1 0 1 0 0 0 0 0 0
0 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 1 1 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 1 0 0 1 1
0 0 0 0 0 0 0 0 1 0 1 0 1 1
0 0 0 0 0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 0 1 1 0 0 1 0
M_uv = 1 if there is an edge between nodes u and v, and 0 otherwise.
![Page 17: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/17.jpg)
M² =
1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 4 2 2 3 2 1 1 0 0 0 0 0 0
1 2 4 3 2 3 1 1 0 0 0 0 0 0
1 2 3 5 2 3 1 0 1 0 0 0 0 0
0 3 2 2 3 2 1 1 0 0 0 0 0 0
1 2 3 3 2 5 0 1 0 0 1 0 0 0
0 1 1 1 1 0 2 0 0 1 0 0 0 0
0 1 1 0 1 1 0 2 0 1 0 0 1 1
0 0 0 1 0 0 0 0 4 2 1 1 2 2
0 0 0 0 0 0 1 1 2 4 0 1 2 2
0 0 0 0 0 1 0 0 1 0 2 0 1 1
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 1 2 2 1 0 4 2
0 0 0 0 0 0 0 1 2 2 1 1 2 3
For u ≠ v, (M²)_uv is the number of common neighbors of nodes u and v; the diagonal entry (M²)_uu is the degree of u.
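The squaring step can be checked directly. The sketch below uses plain Python without external libraries: it copies M from the slide and multiplies it by itself. `mat_square` is a hypothetical helper name.

```python
# Adjacency matrix M of the 14-node example graph, copied from the slide.
M = [
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0],
]


def mat_square(a):
    """Return the matrix product A*A for a square matrix given as lists."""
    n = len(a)
    return [[sum(a[u][w] * a[w][v] for w in range(n)) for v in range(n)]
            for u in range(n)]


M2 = mat_square(M)
print(M2[1])     # common-neighbor counts of node 1 with every node
print(M2[3][3])  # diagonal entry: the degree of node 3
```

Row 1 of the result matches the second row of M² on the slide, and every diagonal entry equals the degree of the corresponding node.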
![Page 18: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/18.jpg)
![Page 19: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/19.jpg)
[Figure: the example graph with each edge labelled by its weight]
The weights of edges are derived by squaring the adjacency matrix of the graph: the weight of an edge (u, v) is (M²)_uv, the number of common neighbors of u and v.
![Page 20: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/20.jpg)
[Figure: the example graph with edge weights and node weights]
The weight of a node is the sum of the weights of the edges connected to it.
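A minimal sketch of the two weighting steps on the same 14-node example graph (node i stands for Pi), using hypothetical helper names. The seed is the node with the highest weight; the slides do not say how ties are broken, so choosing the smallest node id is an assumption (it reproduces the seed P2 used on the following slides).

```python
EDGES = [(0, 1), (1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 5), (3, 7), (4, 5), (5, 6), (6, 10), (7, 8), (8, 9), (8, 12),
         (8, 13), (9, 10), (9, 12), (9, 13), (11, 12), (12, 13)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_weights(adj):
    """Weight of each edge (u, v), u < v: the number of common neighbors."""
    return {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}


def node_weights(adj, ew):
    """Weight of each node: the sum of the weights of its incident edges."""
    w = {u: 0 for u in adj}
    for (u, v), x in ew.items():
        w[u] += x
        w[v] += x
    return w


adj = build_adjacency(EDGES)
ew = edge_weights(adj)
nw = node_weights(adj, ew)
seed = max(nw, key=lambda u: (nw[u], -u))  # tie-break by smallest id (assumption)
print(nw)
print("seed:", seed)
```

Nodes P2, P3 and P5 share the highest weight, 10.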
![Page 21: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/21.jpg)
[Figure: the example graph with edge and node weights; the seed and its neighbors are highlighted]
Seed: P2
Neighbor | Sum of edge weights | # of edges
P1 | 2 | 1
P3 | 3 | 1
P4 | 2 | 1
P5 | 3 | 1
![Page 22: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/22.jpg)
[Figure: the example graph; neighbors of the seed P2 sorted by priority]
Neighbor | Sum of edge weights | # of edges
P3 | 3 | 1
P5 | 3 | 1
P1 | 2 | 1
P4 | 2 | 1
cp of P3 = 1
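The neighbor tables on these slides can be regenerated with the sketch below. `rank_neighbors` is a hypothetical name, and its ranking key (number of edges to the cluster first, then the sum of edge weights, then the smallest node id) is an assumption that reproduces the sorted tables shown here.

```python
EDGES = [(0, 1), (1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 5), (3, 7), (4, 5), (5, 6), (6, 10), (7, 8), (8, 9), (8, 12),
         (8, 13), (9, 10), (9, 12), (9, 13), (11, 12), (12, 13)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def rank_neighbors(cluster, adj):
    """Rows (node, sum of edge weights to cluster, # of edges to cluster),
    highest priority first. Edge weight = number of common neighbors."""
    members = set(cluster)
    rows = []
    for v in {x for u in cluster for x in adj[u]} - members:
        links = [u for u in cluster if u in adj[v]]
        wsum = sum(len(adj[u] & adj[v]) for u in links)
        rows.append((v, wsum, len(links)))
    # Assumed priority: more edges to the cluster, then larger weight sum, then smaller id.
    rows.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rows


adj = build_adjacency(EDGES)
for cluster in ([2], [2, 3], [1, 2, 3, 5]):
    print(cluster, rank_neighbors(cluster, adj))
```

For the seed alone the order is P3, P5, P1, P4, as on this slide; for the cluster P1, P2, P3, P5 the top candidate is P4 with 3 edges and a weight sum of 6.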
![Page 23: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/23.jpg)
[Figure: the example graph; the current cluster is highlighted]
Cluster: P2, P3 (d = 1.0)
Neighbor | Sum of edge weights | # of edges
P1 | 4 | 2
P4 | 4 | 2
P5 | 6 | 2
P7 | 0 | 1
![Page 24: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/24.jpg)
[Figure: the example graph; neighbors of the cluster sorted by priority]
Cluster: P2, P3 (d = 1.0)
Neighbor | Sum of edge weights | # of edges
P5 | 6 | 2
P1 | 4 | 2
P4 | 4 | 2
P7 | 0 | 1
cp of P5 = 1
![Page 25: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/25.jpg)
[Figure: the example graph; the current cluster is highlighted]
Cluster: P2, P3, P5 (d = 1.0)
Neighbor | Sum of edge weights | # of edges
P1 | 6 | 3
P4 | 6 | 3
P6 | 0 | 1
P7 | 0 | 1
cp of P1 = 1
![Page 26: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/26.jpg)
[Figure: the example graph; the current cluster is highlighted]
Cluster: P1, P2, P3, P5 (d = 1.0)
Neighbor | Sum of edge weights | # of edges
P0 | 0 | 1
P4 | 6 | 3
P6 | 0 | 1
P7 | 0 | 1
![Page 27: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/27.jpg)
[Figure: the example graph; neighbors of the cluster sorted by priority]
Cluster: P1, P2, P3, P5 (d = 1.0)
Neighbor | Sum of edge weights | # of edges
P4 | 6 | 3
P0 | 0 | 1
P6 | 0 | 1
P7 | 0 | 1
cp of P4 = 3/(1.0 × 4) = 0.75
![Page 28: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/28.jpg)
[Figure: the example graph; the current cluster is highlighted]
Cluster: P1, P2, P3, P4, P5 (d = 0.9)
Neighbor | Sum of edge weights | # of edges | cp-value
P0 | 0 | 1 | ~0.22
P6 | 0 | 1 | ~0.22
P7 | 0 | 1 | ~0.22
![Page 29: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/29.jpg)
The remaining graph
[Figure: the remaining graph after the first cluster is removed, with recomputed edge and node weights; the new seed is highlighted]
![Page 30: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/30.jpg)
[Figure: formation of the next cluster in the remaining graph]
d = 1.0
![Page 31: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/31.jpg)
![Page 32: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/32.jpg)
![Page 33: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/33.jpg)
The proposed Algorithm
The remaining graph
![Page 34: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/34.jpg)
The proposed Algorithm
Clustering by the proposed algorithm
![Page 35: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/35.jpg)
Example
[Figure (i): an example graph with twelve nodes, A to L]
![Page 36: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/36.jpg)
1. Input and initialization: cp_in = 0.4, d_in = 0.6
[Figure (i): the example graph]
![Page 37: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/37.jpg)
1. Seed selection-1: calculation of the weights of edges
[Figure: the example graph with each edge labelled by its weight]
![Page 38: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/38.jpg)
1. Seed selection-2: calculation of the weights of nodes
[Figure (iii): seed selection for cluster 1; each node is labelled with its weight, and the node with the highest weight (10) is marked as the selected seed]
![Page 39: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/39.jpg)
2. Cluster formation-1: calculation of the weights of nodes
[Figure (iv): formation of cluster 1; the seed alone forms Cluster 1, with d1 = 1]
Candidate merged into Cluster 1
![Page 40: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/40.jpg)
2. Cluster formation-2
[Figure (v): formation of cluster 1]
Check thresholds: OK
d1 = 1/1 = 1 > 0.6 (d_in)
cp_C1 = 1/(1 × 1) = 1 > 0.4 (cp_in)
Candidate merged into Cluster 1
![Page 41: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/41.jpg)
2. Cluster formation-3
[Figure (vi): formation of cluster 1]
cp_A1 = 2/(1 × 2) = 1 > 0.4 (cp_in)
Cluster 1: d1 = 3/3 = 1
![Page 42: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/42.jpg)
2. Cluster formation-4
[Figure (vii): formation of cluster 1]
Check thresholds: OK
d1 = 6/6 = 1 > 0.6 (d_in)
cp_B1 = 3/(1 × 3) = 1 > 0.4 (cp_in)
Candidate merged into Cluster 1
![Page 43: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/43.jpg)
2. Cluster formation-5
[Figure (viii): formation of cluster 1]
Check thresholds: OK
d1 = 8/10 = 0.8 > 0.6 (d_in)
cp_L1 = 2/(1 × 4) = 0.5 > 0.4 (cp_in)
Candidate merged into Cluster 1
![Page 44: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/44.jpg)
2. Cluster formation-6
[Figure (ix): search for cluster 1]
Check thresholds: OK
d1 = 10/15 = 0.67 > 0.6 (d_in)
cp_E1 = 2/(0.8 × 5) = 0.5 > 0.4 (cp_in)
Candidate merged into Cluster 1
![Page 45: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/45.jpg)
2. Cluster formation-7
[Figure (ix): search for cluster 1]
Check thresholds: out
d1 = 11/21 = 0.52 < 0.6 (d_in)
cp_E1 = 1/(0.67 × 6) = 0.25 < 0.4 (cp_in)
![Page 46: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/46.jpg)
2. Cluster formation-8
[Figure (ix): search for cluster 1]
Check thresholds: out
d1 = 11/21 = 0.52 < 0.6 (d_in)
cp_F1 = 1/(0.67 × 6) = 0.25 < 0.4 (cp_in)
![Page 47: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/47.jpg)
2. Cluster formation-8
[Figure (ix): search for cluster 1]
Check thresholds: out
d1 = 11/21 = 0.52 < 0.6 (d_in)
cp_F1 = 1/(0.67 × 6) = 0.25 < 0.4 (cp_in)
![Page 48: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/48.jpg)
2. Cluster formation-9: remove the edges and nodes belonging to Cluster 1
[Figure (x): cluster 1 deleted; the remaining nodes are F, G, H, I, J and K]
![Page 49: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/49.jpg)
Results of Density Periphery Clustering
[Figure (x): end; the clusters found in the example graph]
Cluster 1: d1 = 10/15 = 0.67
Cluster 2: d2 = 3/3 = 1
Cluster 3: d3 = 3/3 = 1
![Page 50: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/50.jpg)
Results: Complexes in the E. coli PPI Network
The network of E. coli proteins consists of 363 interactions involving a total of 336 proteins
DIP:339N GroEL DIP:1081N PrnP
DIP:1025N CarB DIP:1026N CarA
DIP:539N MalG DIP:508N MalE
DIP:124N XerD DIP:726N XerC
DIP:367N PntB DIP:366N PntA
DIP:342N SbcC DIP:572N Gam
-------------- --------- -------------- ---------
-------------- --------- -------------- ---------
http://dip.mbi.ucla.edu/
![Page 51: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/51.jpg)
Components of RNA polymerase (RpoA, RpoB, RpoC, Rsd, RpoZ, RpoD, RpoN, FliA)
Results: Complexes in the E. coli PPI Network
![Page 52: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/52.jpg)
Components of ATP synthase (AtpA, AtpB, AtpE, AtpF, AtpG, AtpH, AtpL)
Results: Complexes in the E. coli PPI Network
![Page 53: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/53.jpg)
Proteins involved in cell division (FtsQ, FtsI, FtsW, FtsN, FtsK and FtsL)
Results: Complexes in the E. coli PPI Network
![Page 54: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/54.jpg)
Components of DNA polymerase (DnaX, HolA, HolB, HolD, and HolC)
Results: Complexes in the E. coli PPI Network
![Page 55: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/55.jpg)
We extracted a set of 12487 unique binary interactions involving 4648 proteins from the PPI data obtained from ftp://ftpmips.gsf.de/yeast/PPI/, after discarding self-interactions.
Results: Complexes in the S. cerevisiae PPI Network
![Page 56: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/56.jpg)
Results: Details of a Group of Predicted Complexes
Information on the complexes of size 6 or more in the set generated using d_in = 0.7, cp_in = 0.50 and the non-overlapping mode.
(a) Predicted complexes. Columns: ID, N (number of proteins), d (density), function class (shown graphically against classes 1 to 15), gene names and corrected p-value.
ID: 1 to 29
N: 28, 17, 13, 14, 14, 12, 12, 11, 9, 8, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
d: 0.71, 0.72, 1.00, 0.83, 0.71, 0.94, 0.71, 0.98, 0.72, 0.93, 0.72, 0.71, 0.71, 0.71, 0.95, 0.76, 0.71, 0.71, 0.80, 0.80, 0.73, 0.73, 0.73, 0.73, 0.73, 0.73, 0.73, 0.73, 0.73
Gene names (complexes 1 to 29, one per line):
CTF4,CTF8,CTF18,CTF19,CIN1,CIN2,CIN8,GIM3,GIM4,GIM5,MAD1,MAD2,MAD3,BUB1,BUB3,PAC2,PAC10,ARP6,BIK1,BIM1,CHL1,CSM3,DCC1,HTZ1,KAR3,SCC1-73,TUB3,YKE2
CHS3,CHS5,CHS7,BNI1,BNI4,RVS161,RVS167,ARC40,ARP2,BCK1,CLA4,FKS1,KRE1,SKT5,SLT2,SMI1,SWI4
TAF17,TAF25,TAF60,TAF61,TAF90,SPT3,SPT7,SPT8,SPT20,ADA2,GCN5,HFI1,NGG1,TRA1
LSM1,LSM2,LSM3,LSM4,LSM5,LSM6,LSM7,LSM8,DCP1,KEM1,MRNa,PAT1,SNRNa,U6
RAD27,RAD50,CDC45-1,ELG1,ESC2,HPR5,MMS4,MRC1,POL32,RRM3,SGS1,TOF1,TOP3
TRS20,TRS23,TRS31,TRS33,TRS65,TRS85,TRS120,TRS130,BET3,BET5,GSG1,KRE11
COG5,COG6,COG7,COG8,ARL1,ARL3,GOS1,GYP1,RIC1,SWF1,TLG2,YPT6
APC1,APC2,APC4,APC5,APC9,APC11,CDC16,CDC23,CDC26,CDC27,DOC1
CDC73,CTI6,DEP1,LEO1,SAP30,SET2,SIF2,SWR1,VPS71
CFT1,CFT2,FIP1,PAP1,PFS2,PTA1,YSH1,YTH1
MED2,MED4,MED7,MED8,PGD1,RPB3,SOH1,SRB4
BEM1,BEM2,BOI1,BOI2,CDC24,CDC42,MSB1,STE20
ARP1,ASE1,CLB4,JNM1,KAR9,KIP3,NIP100,PAC11
CDC4,CDC34,CDC53,CLN1,CLN2,CLN3,SIC1,SKP1
CDC3,CDC10,CDC11,CDC12,GIN4,SEP7,SHS1
CKA1,CKA2,CKB1,CKB2,CDC7-1,RHO3,TOP2
SNR3,SNR10,SNR11,SNR189,GAR1,NHP2,NOP10
SPC19,SPC24,NNF1,NUF2,SMC1,TID3,YDR295c
YGL161c,YGL198w,GCS1,YDR425w,YIP1,YPL095c
PRP5,PRP9,PRP11,PRP21,NOG2,YNR053c
NUP49,NUP57,APG17,NIC96,NSP1,SEC35
KTR3,LAS17,SLA1,YFR024c,YOR284w,YSC84
ECM31,GCD7,NIP29,TEM1,YJL199c,YPL070w
ERB1,HAS1,NIP7,NOP7,NUG1,SSF1
SEC2,SEC4,SEC10,SEC15,MYO2,SMY1
MYO3,MYO5,BBC1,BZZ1,UBP7,VRP1
DBF2,DBF20,CDC15,LTE1,MOB1,SPO12
HHF1,HHF2,HHT1,HHT2,SPT6,STH1
CBF1,CEP3,CHL4,CTF13,MCM21,MIF2
Corrected p-value: 3.9x10^-17, 9.0x10^-13, 1.7x10^-11, 1.1x10^-6, 3.7x10^-4, 3.4x10^-11, 4.0x10^-6, 2.1x10^-10, 1.9x10^-5, 4.8x10^-7, 3.4x10^-5, 3.1x10^-9, 4.5x10^-7, 6.8x10^-7, 3.5x10^-6, 5.4x10^-3, 1.3x10^-4, 3.5x10^-6, 9.5x10^-4, 1.3x10^-7, 6.3x10^-10, 1.0x10^-4, 4.8x10^-1, 2.3x10^-3, 2.4x10^-5, 1.0x10^-4, 1.2x10^-3, 1.8x10^-5, 2.3x10^-5
(b) Complex 19: YIP1, GCS1, YGL161c, YPL095c, YGL198w, YDR425w
We considered 15 functional classes: (1) Cell cycle and DNA processing, (2) Protein with binding function or cofactor requirement (structural or catalytic), (3) Protein fate (folding, modification, destination), (4) Biogenesis of cellular components, (5) Cellular transport, transport facilitation and transport routes, (6) Metabolism, (7) Interaction with the cellular environment, (8) Transcription, (9) Energy, (10) Cell rescue, defense and virulence, (11) Cell type differentiation, (12) Cellular communication/signal transduction mechanism, (13) Protein activity regulation, (14) Protein synthesis, and (15) Transposable elements, viral and plasmid proteins
![Page 57: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/57.jpg)
$$P = 1 - \sum_{i=0}^{k-1} \frac{\binom{F}{i}\binom{N-F}{C-i}}{\binom{N}{C}}$$
Results: Hypergeometric distribution
N= Total number of proteins in the network
F= Number of proteins of a functional group in the network
C= Number of proteins in a cluster
k= Number of proteins of a functional group in a cluster
The p-value of a cluster is the probability that at least k proteins of the functional group would appear in a cluster of C proteins selected at random from the network.
The lower the p-value, the higher the statistical significance.
![Page 58: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/58.jpg)
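P-value & hypergeometric distribution
3 green and 4 red balls are put in a box, and any 3 balls are chosen at random.
$P_0(\text{number of red balls} = 0) = \frac{\binom{4}{0}\binom{3}{3}}{\binom{7}{3}} = \frac{1}{35}$
$P_1(\text{number of red balls} = 1) = \frac{\binom{4}{1}\binom{3}{2}}{\binom{7}{3}} = \frac{12}{35}$
$P_2(\text{number of red balls} = 2) = \frac{\binom{4}{2}\binom{3}{1}}{\binom{7}{3}} = \frac{18}{35}$
$P_3(\text{number of red balls} = 3) = \frac{\binom{4}{3}\binom{3}{0}}{\binom{7}{3}} = \frac{4}{35}$
Notice that P0 + P1 + P2 + P3 = 1.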
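The ball example is a hypergeometric distribution and can be checked exactly with Python's `fractions` module and `math.comb` (Python 3.8 or later). `hypergeom_pmf` is a hypothetical helper name; N, F and C follow the definitions on the previous slide (here N = 7 balls, F = 4 red, C = 3 drawn).

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(i, N, F, C):
    """P(exactly i of the C drawn items come from the F marked items among N)."""
    return Fraction(comb(F, i) * comb(N - F, C - i), comb(N, C))


# 7 balls in the box (4 red, 3 green); draw 3 and count the red ones.
probs = [hypergeom_pmf(i, N=7, F=4, C=3) for i in range(4)]
print(probs)
print(sum(probs))
```

Exact fractions make it easy to confirm the four values 1/35, 12/35, 18/35 and 4/35 and that they sum to 1.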
![Page 59: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/59.jpg)
[Figure: bar chart of the probabilities P0, P1, P2 and P3 (0 to 0.6) against the number of red balls (0 to 3)]
![Page 60: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/60.jpg)
P(number of red balls ≤ 1) = P0 + P1
P(number of red balls ≥ 2) = 1 - (P0 + P1)
P(number of red balls ≥ k) = 1 - (P0 + P1 + … + P_{k-1})
In general,
$$P = 1 - \sum_{i=0}^{k-1} \frac{\binom{F}{i}\binom{N-F}{C-i}}{\binom{N}{C}}$$
with N = 7, F = 4, C = 3 in this example.
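The tail formula translates directly into code. A minimal sketch with a hypothetical helper `p_value`, using exact fractions so that the example can be compared without rounding; for very large networks floating-point arithmetic may be preferable.

```python
from fractions import Fraction
from math import comb


def p_value(k, N, F, C):
    """P(X >= k) = 1 - sum_{i=0}^{k-1} C(F,i) C(N-F,C-i) / C(N,C)."""
    below = sum(Fraction(comb(F, i) * comb(N - F, C - i), comb(N, C))
                for i in range(k))
    return 1 - below


# Probability of drawing at least 2 red balls in the box example.
print(p_value(2, N=7, F=4, C=3))
```

For the box example this gives 1 - (1/35 + 12/35) = 22/35.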
![Page 61: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/61.jpg)
Results: Details of a Group of Predicted Complexes
Protein YDR425w of complex 19 is related to cellular transport, and YIP1, YGL198w, YGL161c and GCS1 are related to vesicular transport. Hence, we predict that the function-unknown protein YPL095c of this complex is a transport-related protein, most likely involved in vesicular transport.
![Page 62: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/62.jpg)
Conclusions
•In this work, we present an algorithm to detect locally dense regions in undirected simple graphs.
•The algorithm can be used to detect protein complexes in large protein-protein interaction networks or co-expressed gene clusters based on microarray data.
•It can also be used to predict protein/gene function by finding complexes/clusters in networks that contain proteins of both known and unknown function.
•DPClus can also be applied to other networks in which finding cohesive groups is the goal.
The DPClus software is available at http://kanaya.naist.jp/DPClus/
![Page 63: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/63.jpg)
Md. Altaf-Ul-Amin, Hisashi Tsuji, Ken Kurokawa, Hiroko Asahi, Yoko Shinbo, Shigehiko Kanaya, “DPClus: A Density-periphery Based Graph Clustering Software Mainly Focused on Detection of Protein Complexes in Interaction Networks”, Journal of Computer Aided Chemistry, Vol. 7, 150-156, 2006.
2. The DPClus Software
The DPClus software is available at http://kanaya.naist.jp/DPClus/
The DPClus software has been developed based on the proposed algorithm.
![Page 64: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/64.jpg)
The main window of DPClus
The DPClus Software
![Page 65: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/65.jpg)
The input file format
List of edges:
AtpB AtpA
AtpG AtpE
AtpA AtpH
AtpB AtpH
AtpG AtpH
AtpE AtpH
Adjacency matrix:
0 0 1 0 1
0 0 0 1 1
1 0 0 0 1
0 1 0 0 1
1 1 1 1 0
[Figure: corresponding network]
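A minimal sketch of reading the edge-list format into an adjacency structure and matrix. It assumes whitespace-separated node names, one edge per line; the exact delimiter DPClus expects is not stated here, and `parse_edge_list` and `to_matrix` are hypothetical names. Nodes are sorted alphabetically, so the rows may come out in a different order from the matrix on the slide, which does not label its rows.

```python
def parse_edge_list(text):
    """Parse lines of 'NodeA NodeB' into an undirected adjacency dict."""
    adj = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0] == parts[1]:
            continue  # skip blank or malformed lines and self-loops (assumption)
        u, v = parts
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def to_matrix(adj):
    """Return (sorted node names, 0/1 adjacency matrix in that order)."""
    nodes = sorted(adj)
    index = {u: i for i, u in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for u in nodes:
        for v in adj[u]:
            m[index[u]][index[v]] = 1
    return nodes, m


EDGE_LIST = """AtpB AtpA
AtpG AtpE
AtpA AtpH
AtpB AtpH
AtpG AtpH
AtpE AtpH"""

adj = parse_edge_list(EDGE_LIST)
nodes, m = to_matrix(adj)
print(nodes)
for row in m:
    print(*row)
```

As on the slide, six edges among five proteins give a symmetric 5 × 5 matrix in which AtpH is connected to all other nodes.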
![Page 66: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/66.jpg)
Cluster
Length of cluster 1 is: 8
RpoA
RpoB
RpoC
Rsd
RpoZ
RpoD
RpoN
FliA
Cluster
Length of cluster 2 is: 8
AtpH
AtpG
AtpB
AtpA
AtpF
AtpL
AtpE
AtpB(A)
Cluster
Length of cluster 3 is: 5
----------------------------------------------------------------------------
Output file format
The DPClus Software
![Page 67: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/67.jpg)
Click!
Intra-cluster edges are green and inter-cluster edges are red.
Nodes have been arranged by dragging
The DPClus Software
![Page 68: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/68.jpg)
Click
Click
Click
Hierarchical graph of the clusters
The DPClus Software
![Page 69: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/69.jpg)
Clustering of microarray data
Sample microarray data
To apply DPClus, we first need to convert these data into a network.
The DPClus Software
![Page 70: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/70.jpg)
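[Figure: expression data matrix; rows are genes, columns are experiment IDs]
Gene-gene correlation between genes i and j over m experiments, where x_ik is the expression value of gene i in experiment k and x̄_i is its mean over the m experiments:
$$R_{ij} = \frac{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j)}{\sqrt{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)^2}\sqrt{\sum_{k=1}^{m}(x_{jk}-\bar{x}_j)^2}}$$
Select highly correlated gene pairs as the edges of a network:
At3g10060 At3g54150
At3g10060 At3g63140
At3g10060 At5g07020
-------------- --------------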
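A minimal sketch of turning expression profiles into a network as described above: compute R_ij for every gene pair and keep the pairs at or above a threshold. The toy profiles are hypothetical, the helper names are illustrative, and keeping only positive correlations is an assumption (the slides say only "highly correlated"). A gene with a constant profile would make the denominator zero, so such genes would need to be filtered out first.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """R_ij for two equal-length expression profiles."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den


def correlation_edges(profiles, threshold):
    """Gene pairs whose correlation is at least `threshold` become edges."""
    return [(g1, g2) for g1, g2 in combinations(sorted(profiles), 2)
            if pearson(profiles[g1], profiles[g2]) >= threshold]


# Hypothetical toy profiles over m = 4 experiments (not data from the slides).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(correlation_edges(profiles, threshold=0.95))
```

The resulting pairs can be written out in the edge-list input format, one pair per line.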
![Page 71: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/71.jpg)
Number of experiments: 626
Threshold correlation: 0.95
cp value: 0.5
Density value: 0.9
Minimum cluster size: 3
The DPClus Software
![Page 72: On finding clusters in undirected simple graphs: application to protein complex detection](https://reader036.fdocuments.us/reader036/viewer/2022062803/56814650550346895db3627e/html5/thumbnails/72.jpg)
Ribosomal protein clusters
Electron transport clusters
Photosynthesis clusters
The DPClus Software