Graph Data Management Lab, School of Computer Science
GDM@FUDAN
www.gdm.fudan.edu.cn
Put conference information here: The 12-th International Conference of Date Engineering
Version 1(2012-3-25)张俊骏
A Large-Scale Community Structure Analysis in Facebook
Email:[email protected]
Outline
•Introduction
•Data Collection Algorithm
• (1) BFS sampling
• (2) Uniform sampling
•Detecting Communities
• (1) LPA algorithm
• (2) FNCA algorithm
•Experimentation
• (1) Community structure similarity
• (2) Out-of-scale community
Introduction
•Large-scale: by 2011, over 500 million users had registered on Facebook.
•Community structure:
• (1) Relationships are very tight within certain areas of social life, such as family, colleagues, and friends.
• (2) Connections that reach outside these categories are less likely to occur.
Introduction(3)
•Community: a substructure of the overall graph in which the density of relationships inside the community is much greater than the density of relationships between communities.
•Clustering: finding the communities within a given graph (the whole graph or a subgraph of it). In mathematical terms, find a partition
• V = V1 ∪ V2 ∪ ... ∪ Vn, where V1, ..., Vn are vertex sets and, for any x ≠ y,
• Vx ∩ Vy = Ø
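The partition condition can be checked mechanically. The following is a minimal sketch; the function name `is_partition` and the toy vertex sets are illustrative assumptions, not part of the original slides.

```python
def is_partition(vertices, communities):
    """Return True if the communities are pairwise disjoint and cover all vertices."""
    seen = set()
    for community in communities:
        community = set(community)
        if seen & community:      # some vertex already belongs to another Vx
            return False
        seen |= community
    return seen == set(vertices)  # V = V1 ∪ V2 ∪ ... ∪ Vn


# Toy example (hypothetical data): six vertices split into two communities.
print(is_partition(range(6), [{0, 1, 2}, {3, 4, 5}]))
```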
Introduction(4)
•Datasets:
• (1) Two different samples of the graph of relationships among the users of the social network.
• (2) Each sample contains millions of entities and is analyzed with two fast and efficient community detection algorithms.
• (3) The analysis works with no a priori knowledge.
Data Collection Algorithm
•BFS Sampling
Data Collection Algorithm (2)
•BFS Sampling
• (1) Starts from a single node.
• (2) Ends when the required level or number of nodes is reached.
• (3) Easy to implement and efficient.
• (4) The result depends on the node selected at the start.
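The four points above can be sketched as a short breadth-first crawler. This is only an illustration: `get_friends` is a hypothetical callback standing in for whatever returns a user's friend list, and the parameter names `max_nodes` and `max_depth` are assumptions.

```python
from collections import deque


def bfs_sample(get_friends, seed, max_nodes=None, max_depth=None):
    """Breadth-first sample starting at `seed`; returns {node: depth}."""
    depth_of = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        depth = depth_of[node]
        if max_depth is not None and depth >= max_depth:
            continue  # required level reached: do not expand further
        for friend in get_friends(node):
            if friend in depth_of:
                continue
            if max_nodes is not None and len(depth_of) >= max_nodes:
                continue  # required number of nodes reached
            depth_of[friend] = depth + 1
            queue.append(friend)
    return depth_of


# Toy friendship graph (hypothetical data): a path 0-1-2-3-4.
toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(bfs_sample(toy.__getitem__, seed=0, max_depth=2))
```

Changing `seed` changes which part of the graph is reached first, which is the dependence on the start node noted in point (4).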
Data Collection Algorithm (3)
•Uniform Sampling
Data Collection Algorithm(4)
•Uniform Sampling
• Legal ID numbers in Facebook: about 2^32
• Existing ID numbers in Facebook: about 500 million (2011)
• Thus, in theory, to mine a dataset of 1 million existing IDs we need to test:
• S = 1,000,000 / (500,000,000 / 2^32) ≈ 8,590,000 legal IDs
• So we generate 8,590,000 legal IDs at random and check whether each ID exists. If it does, we mine the information of that node; otherwise we drop it.
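The arithmetic and the accept/reject loop can be sketched as follows. The function names and the `exists` callback are hypothetical; the callback stands in for whatever check tells us that an ID belongs to a real user.

```python
import random

ID_SPACE = 2 ** 32       # legal IDs (figure from the slide)
EXISTING = 500_000_000   # existing IDs in 2011 (figure from the slide)


def ids_to_test(target, existing=EXISTING, id_space=ID_SPACE):
    """Expected number of random legal IDs to test to hit `target` existing ones."""
    hit_rate = existing / id_space
    return round(target / hit_rate)


def uniform_sample(exists, n_trials, id_space=ID_SPACE, rng=None):
    """Draw `n_trials` random legal IDs and keep those for which `exists` is true."""
    rng = rng or random.Random()
    found = set()
    for _ in range(n_trials):
        candidate = rng.randrange(id_space)
        if exists(candidate):
            found.add(candidate)
    return found


print(ids_to_test(1_000_000))  # about 8.59 million
```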
Data Collection Algorithm(5)
•Uniform Sampling
• The clear advantage of uniform sampling is that the social network around the nodes does not bias the result.
• In the actual experiment, the resulting dataset is slightly smaller than the BFS dataset, because some users hide themselves from random search.
Data Collection Algorithm(6)
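•Dataset Description
•The average clustering coefficient is the mean of the local clustering coefficients of all nodes Vi.
•The local clustering coefficient Ci of a node Vi is the ratio of the number of links among its neighbors to the number of links that could possibly exist among them.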
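Both definitions translate directly into code. The sketch below assumes an undirected graph stored as a dict of neighbor sets, and it assigns a coefficient of 0 to nodes with fewer than two neighbors; that convention is an assumption, since the slide does not cover this case.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Ci: links among the neighbors of v divided by all possible such links."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Toy graph (hypothetical data): a triangle 0-1-2 with a pendant node 3 on node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(toy))
```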
Detecting Communities
•LPA algorithm
Detecting Communities(2)
•LPA algorithm
• (1) Under specific conditions the algorithm may fail to converge. To avoid deadlocks and guarantee efficient network clustering, an "asynchronous" update of the labels is adopted, which uses the values of some neighbors from the previous iteration and of others from the current one.
• (2) About 5 iterations are sufficient to correctly classify 95% of the vertices of the network.
Detecting Communities(3)
•LPA algorithm
• (3) A path connecting a pair of vertices in the same group may pass through vertices belonging to different groups. A final step is devised to split such groups into one or more contiguous communities.
• (4) Near-linear cost
• (5) Not stable in some cases
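The asynchronous update and the final splitting step can be sketched as below. This is a generic label propagation sketch under stated assumptions (dict-of-sets graph, random tie-breaking, a fixed `seed`), not the exact implementation behind the slides.

```python
import random
from collections import Counter, deque


def label_propagation(adj, max_iters=20, seed=0):
    """Asynchronous LPA: each node adopts the most frequent label among its neighbors."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}   # every node starts with its own label
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)         # random visiting order in each sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            # Labels are read in place, so some neighbors already carry
            # this sweep's value and others still carry the previous one.
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[v] not in best:
                labels[v] = rng.choice(best)
                changed = True
        if not changed:
            break
    return labels


def split_contiguous(adj, labels):
    """Final step: split each label group into its connected pieces."""
    community = {}
    next_id = 0
    for start in adj:
        if start in community:
            continue
        community[start] = next_id
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in community and labels[u] == labels[start]:
                    community[u] = next_id
                    queue.append(u)
        next_id += 1
    return community
```

The random visiting order and random tie-breaking are one source of the instability mentioned in point (5): different seeds can give different partitions on the same graph.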
Detecting Communities(4)
•FNCA algorithm (Pre)
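Detecting Communities(5)
•FNCA algorithm
•Aij = 1 if and only if vertices i and j are connected to each other.
•δ(u,v) = 1 if and only if u = v.
•ki is the sum of Aij over all other vertices j (that is, the degree of vertex i).
•m is half the sum of the k values of all vertices (that is, the total number of edges in the graph).
•r(i) is the community to which vertex i belongs.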
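These symbols match the standard modularity function Q = (1/2m) Σij (Aij − ki·kj / 2m) · δ(r(i), r(j)). The formula itself is not present in the extracted text, so treating it as the one on the slide is an assumption. A minimal sketch that evaluates Q on a small graph (quadratic in the number of vertices, so only suitable for toy inputs):

```python
def modularity(adj, r):
    """Q = (1/2m) * sum_ij (Aij - ki*kj/2m) * delta(r(i), r(j))."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of all ki equals 2m
    q = 0.0
    for i in adj:
        for j in adj:
            if r[i] == r[j]:                          # delta(r(i), r(j)) = 1
                a_ij = 1 if j in adj[i] else 0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m


# Toy graph (hypothetical data): two triangles joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
r = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(toy, r))
```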
Detecting Communities(6)
Detecting Communities(7)
•FNCA algorithm
• (1) Experimental results show that the clustering solution of FNCA is good enough before the iteration number reaches 50 for most networks, even large-scale ones.
• (2) Generally speaking, the community structure of a network is evident when its Q-value is greater than 0.3.
• (3) The time complexity of the FNCA algorithm cannot be worse than O(T * n * k * c).
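To make the idea of iterating local decisions concrete, here is a generic greedy local-move heuristic that raises the Q-value one vertex at a time. It is an illustration written for this note, not the FNCA algorithm from the slides; the function name, the deterministic visiting order, and the default of 50 sweeps (echoing point (1)) are all assumptions.

```python
def local_move_clustering(adj, max_iters=50):
    """Greedy sweeps: each vertex joins the neighboring label with the best local gain."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    labels = {v: v for v in adj}
    if two_m == 0:
        return labels                         # no edges, nothing to merge
    deg_sum = {v: len(adj[v]) for v in adj}   # total degree carried by each label

    def gain(i, c):
        # Contribution of vertex i to Q (up to a constant factor) if it joins c.
        links = sum(1 for j in adj[i] if labels[j] == c)
        others = deg_sum[c] - (len(adj[i]) if labels[i] == c else 0)
        return links - len(adj[i]) * others / two_m

    for _ in range(max_iters):
        changed = False
        for i in sorted(adj):
            candidates = sorted({labels[i]} | {labels[j] for j in adj[i]})
            best = max(candidates, key=lambda c: gain(i, c))
            if best != labels[i] and gain(i, best) > gain(i, labels[i]):
                deg_sum[labels[i]] -= len(adj[i])
                deg_sum[best] += len(adj[i])
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy graph (hypothetical data): two triangles joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(local_move_clustering(toy))
```

On this toy graph the heuristic recovers the two triangles, whose Q-value of 5/14 ≈ 0.357 is above the 0.3 threshold from point (2).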
Detecting Communities(8)
•Experimental results
Detecting Communities(9)
•Experimental results
Experimentation
•Community structure similarity
Experimentation(2)
•Community structure similarity
• Rough method:
• Improved method:
• M11 is the total number of elements shared by v and w (v ∩ w), M01 stands for w − v, and M10 stands for v − w. The value of J equals 1 if and only if v = w.
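The definitions of M11, M01, and M10 are those of the Jaccard coefficient, J = M11 / (M01 + M10 + M11). The formula is not present in the extracted text, so using it here is an assumption, as is the convention of returning 1 for two empty sets. A minimal sketch:

```python
def jaccard(v, w):
    """J = M11 / (M01 + M10 + M11) for two sets v and w."""
    v, w = set(v), set(w)
    m11 = len(v & w)   # elements shared by v and w
    m01 = len(w - v)   # elements only in w
    m10 = len(v - w)   # elements only in v
    total = m01 + m10 + m11
    return m11 / total if total else 1.0  # assumed convention: two empty sets are equal


# Toy communities (hypothetical data).
print(jaccard({1, 2, 3}, {2, 3, 4}))
```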
Experimentation(3)
•Experimental results
Experimentation(4)
•Out-of-scale community
•This may be a shortcoming of the algorithms, or such a community may really exist. Either way, it will be studied in the future.
Thank you!