Efficient Identification of Overlapping Communities
Jeffrey Baumes
Mark Goldberg
Malik Magdon-Ismail
Rensselaer Polytechnic Institute, Troy, NY
Outline
• Communities as clusters
• What is a cluster?
• Cluster seed procedure (LA)
• Cluster refinement procedure (IS2)
• Experimental results
• Conclusions and future work
Communities as clusters
• Malicious groups use large communication networks for planning and coordination
• Their goal: remain undetected
• Our goal: sift through communications for suspicious patterns, using structure only, not content
Communities as clusters
• Detecting all social groups (malicious or not) will aid in searching for “hidden” groups
• Social groups tend to communicate densely
• Approach: Find social groups by finding clusters in the graph of the communication network
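[Figure: legend showing actor A, actor B, and an edge meaning "A communicates with B". One panel is labeled "likely a social group"; a second panel, labeled "Add external edges", is marked "likely not a social group".]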
What is a cluster?
• Many partitioning algorithms exist
• Social groups often overlap
• Instead define clusters as locally optimal with respect to density
[Figure: partitioning vs. overlapping clustering]
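The slides do not state which density measure is used. As a minimal sketch, assume density is the fraction of edges touching a cluster that stay inside it, and read "locally optimal" as "no single node can be added or removed to raise the density". Both function names and the measure itself are illustrative assumptions, not taken from the talk.

```python
def density(adj, cluster):
    """Fraction of the edges touching `cluster` that stay inside it.

    `adj` maps each node to the set of its neighbours (undirected graph).
    This particular measure is an assumption; the slides do not define one.
    """
    cluster = set(cluster)
    internal = 0  # counts each internal edge twice (once from each end)
    external = 0  # edges with exactly one endpoint in the cluster
    for u in cluster:
        for v in adj[u]:
            if v in cluster:
                internal += 1
            else:
                external += 1
    internal //= 2
    total = internal + external
    return internal / total if total else 0.0


def is_locally_optimal(adj, cluster):
    """True if no single-node addition or removal raises the density."""
    cluster = set(cluster)
    base = density(adj, cluster)
    for v in adj:
        candidate = cluster ^ {v}  # toggle membership of v
        if candidate and density(adj, candidate) > base:
            return False
    return True


# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(density(adj, {0, 1, 2}))
print(is_locally_optimal(adj, {0, 1, 2}))
```

Under this assumed measure a whole connected component always scores 1, so a practical measure would need some extra term to avoid that degenerate answer.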
Two-stage process
communication network → seed procedure → seed clusters → refinement procedure → final clusters
Original procedures
communication network → Rank Removal (RaRe) → seed clusters → Iterative Scan (IS) → final clusters
Jeffrey Baumes, Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail,
Nathan Preston. "Finding Communities by Clustering a Graph into
Overlapping Subgraphs", International Conference on Applied Computing (IADIS
2005), Feb 22-25, Algarve, Portugal.
Proposed new procedures
communication network → Link Aggregate (LA) → seed clusters → Iterative Scan 2 (IS2) → final clusters
Link Aggregate (LA)
• Order the nodes (two routines are used)
• Pass through the nodes
– For each node, add it to the clusters it improves, or start a new cluster
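The two bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the density measure (internal edges over all edges touching the cluster) and the default ordering (decreasing degree) are both assumed, and the names `density` and `link_aggregate` are hypothetical.

```python
def density(adj, cluster):
    """Assumed measure: internal edges / all edges touching the cluster."""
    internal = external = 0
    for u in cluster:
        for v in adj[u]:
            if v in cluster:
                internal += 1  # each internal edge is counted twice
            else:
                external += 1
    internal //= 2
    total = internal + external
    return internal / total if total else 0.0


def link_aggregate(adj, order=None):
    """One pass over the nodes, building possibly overlapping seed clusters."""
    if order is None:
        # Assumed ordering: decreasing degree, ties broken by node id.
        order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    clusters = []
    for v in order:
        added = False
        for c in clusters:
            # Add v to every cluster whose density it improves.
            if density(adj, c | {v}) > density(adj, c):
                c.add(v)
                added = True
        if not added:
            clusters.append({v})  # v improves nothing: start a new cluster
    return clusters


# Two disjoint triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
       3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds = link_aggregate(adj)
print(seeds)
```

Because a node may join several clusters in the same pass, the seeds can overlap. They are only starting points for the refinement stage.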
LA procedure
[Figures, slides 10 to 16: a sequence of diagrams illustrating the LA procedure on an example graph with nodes labeled 1 to 35; images not preserved in this transcript]
Iterative Scan (IS)
• Old refinement procedure
– Traverses entire node list, adding / removing nodes which increase the density
– Repeats the process until no improvements are possible
• May be inefficient in sparse networks
• Guaranteed to be locally optimal
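A minimal sketch of the scan described above, assuming density means internal edges over all edges touching the cluster (the slides do not define it). The function names are hypothetical.

```python
def density(adj, cluster):
    """Assumed measure: internal edges / all edges touching the cluster."""
    internal = external = 0
    for u in cluster:
        for v in adj[u]:
            if v in cluster:
                internal += 1  # each internal edge is counted twice
            else:
                external += 1
    internal //= 2
    total = internal + external
    return internal / total if total else 0.0


def iterative_scan(adj, seed):
    """Refine `seed` by repeated full passes over the node list."""
    cluster = set(seed)
    best = density(adj, cluster)
    improved = True
    while improved:
        improved = False
        for v in adj:  # every node in the network is examined
            candidate = cluster ^ {v}  # add v if absent, remove it if present
            if not candidate:
                continue  # never empty the cluster
            d = density(adj, candidate)
            if d > best:
                cluster, best = candidate, d
                improved = True
    return cluster


# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(iterative_scan(adj, {0, 1}))
```

The loop stops only after a full pass makes no change, so the result is locally optimal with respect to single-node moves. Each pass touches every node, which is where the cost in sparse networks comes from.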
Iterative Scan 2 (IS2)
• New refinement procedure
– Traverses neighborhood of cluster only, adding / removing nodes which increase the density
– Repeats the process until no improvements are possible
• More efficient in sparse networks in spite of overhead, less efficient in dense networks
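As a rough sketch, the neighborhood-only scan might look like the following. The density measure is assumed (internal edges over all edges touching the cluster), the names are hypothetical, and the neighborhood is simply rebuilt on every pass. An efficient version would presumably maintain it incrementally, which may be the overhead the slide refers to.

```python
def density(adj, cluster):
    """Assumed measure: internal edges / all edges touching the cluster."""
    internal = external = 0
    for u in cluster:
        for v in adj[u]:
            if v in cluster:
                internal += 1  # each internal edge is counted twice
            else:
                external += 1
    internal //= 2
    total = internal + external
    return internal / total if total else 0.0


def iterative_scan_2(adj, seed):
    """Refine `seed`, examining only the cluster and its neighbours."""
    cluster = set(seed)
    best = density(adj, cluster)
    improved = True
    while improved:
        improved = False
        # Candidates: current members plus nodes adjacent to a member.
        frontier = set(cluster)
        for u in cluster:
            frontier.update(adj[u])
        # Sorting only makes the scan order reproducible; it assumes
        # node ids can be compared with each other.
        for v in sorted(frontier):
            candidate = cluster ^ {v}
            if not candidate:
                continue  # never empty the cluster
            d = density(adj, candidate)
            if d > best:
                cluster, best = candidate, d
                improved = True
    return cluster


# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(iterative_scan_2(adj, {0, 1}))
```

Nodes far from the cluster are never visited, so the work per pass depends on the size of the neighborhood rather than on the size of the network.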
IS2 procedure
[Figures, slides 19 to 23: a sequence of diagrams illustrating the IS2 procedure; images not preserved in this transcript]
Experimental results
• Compare run time of new vs. old
• Compare cluster quality of new vs. old
• Compare on different network types
– Random
– Preferential attachment
– Real-world
• Compare possible actor orderings for LA
RaRe vs. LA run time
[Figures: two run-time plots. Legend of the first: New RaRe, LA. Legend of the second: Original RaRe, New RaRe, LA.]
IS vs. IS2 run time
Define IS* = IS for dense graphs, IS2 for sparse graphs
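One way to read this definition is as a dispatch on the graph's edge density. The sketch below assumes a cut-off value, which the slides do not give: the `threshold` parameter and both function names are hypothetical.

```python
def edge_density(adj):
    """Edges present divided by edges possible in an undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def choose_refinement(adj, threshold=0.1):
    """Return "IS" for dense graphs and "IS2" for sparse ones.

    The threshold of 0.1 is a placeholder, not a value from the talk.
    """
    return "IS" if edge_density(adj) >= threshold else "IS2"


# A complete graph on 4 nodes and a path on 30 nodes.
complete = {i: {j for j in range(4) if j != i} for i in range(4)}
path = {i: set() for i in range(30)}
for i in range(29):
    path[i].add(i + 1)
    path[i + 1].add(i)

print(choose_refinement(complete), choose_refinement(path))
```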
Old vs. new quality
[Figures: two quality plots, each comparing New RaRe → IS with LA → IS2]
Preferential attachment
[Figures: two plots for preferential attachment networks, each comparing New RaRe → IS with LA → IS2]
Real-World Networks
Ratio = new/old = (LA→IS*)/(RaRe→IS)
[Figure: two bar charts, Quality Ratio and Run-time Ratio, each over the networks E-mail, Web, Newsgroup, Fortune 500]
IS* = IS2, IS, IS2, IS2 (one entry per network, in the order above)
LA ordering
Conclusions and future work
• Overlapping clustering may be used to discover social groups in communication networks
• The new algorithm is more efficient in many cases, while keeping the same or better quality
• A unified algorithm should choose strategies and parameters based on network properties
Questions
Rank Removal
• Existing seed procedure
– Removes highly connected nodes until network is broken into small clusters
– Adds each removed node back into the clusters it is well connected to
• Two main inefficiencies
– Computed Page Rank at each iteration
– Computed connected components at each iteration
• Page Rank could be computed once, but reprocessing connected components is crucial
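The remove-then-add-back idea can be sketched as below. This is a simplified illustration, not the RaRe procedure itself: node degree stands in for Page Rank, "small" means at most `max_size` nodes, and "well connected" means at least `min_links` neighbours in the cluster. All three choices, and the function names, are assumptions.

```python
from collections import deque


def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes` (BFS)."""
    nodes = set(nodes)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def rank_removal(adj, max_size, min_links=2):
    """Seed clusters by removing hubs, then re-attaching them."""
    remaining, removed = set(adj), []
    while True:
        # Components are recomputed on every pass.
        comps = components(adj, remaining)
        large = [c for c in comps if len(c) > max_size]
        if not large:
            break
        for comp in large:
            # Degree is a stand-in for Page Rank (an assumption);
            # the node id breaks ties, so ids must be comparable.
            hub = max(comp, key=lambda v: (len(adj[v]), v))
            remaining.discard(hub)
            removed.append(hub)
    clusters = [set(c) for c in comps]
    for v in removed:
        for c in clusters:
            if len(adj[v] & c) >= min_links:
                c.add(v)  # v may rejoin several clusters: overlap
    return clusters


# Two triangles that are connected only through hub node 6.
adj = {0: {1, 2, 6}, 1: {0, 2, 6}, 2: {0, 1, 6},
       3: {4, 5, 6}, 4: {3, 5, 6}, 5: {3, 4, 6},
       6: {0, 1, 2, 3, 4, 5}}
seeds = rank_removal(adj, max_size=3)
print(seeds)
```

In this example the hub is removed once, the network falls apart into the two triangles, and the hub is then added back to both, which yields two overlapping seed clusters.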
[The remaining slides contain only a title and figures; the images are not preserved in this transcript]
LA procedure detail
IS2 procedure detail
RaRe vs. LA (3 slides)
IS vs. IS2 (3 slides)
Run time RaRe vs. LA
Run time IS vs. IS2
Cluster quality (2 slides)
Preferential attachment run time
Preferential attachment quality
LA ordering run time
LA ordering quality