
1/52

Overlapping Community Search

Graph Data Management Lab, School of Computer Science

GDM@FUDAN

www.gdm.fudan.edu.cn

Email: [email protected]

Online Search of Overlapping Communities

Wanyun Cui, Fudan University

Yanghua Xiao, Fudan University

Haixun Wang, Microsoft Research Asia

Yiqi Lu, Fudan University

Wei Wang, Fudan University

Presenter: Wanyun Cui

2/52

Outline

• Motivation
• Model
• Algorithm
• Experiments
• Applications


4/52

Complex network

Complex networks are everywhere.

Social Network

5/52

Complex network

Complex networks are everywhere.

Internet

6/52

Complex network

Complex networks are everywhere.

Protein Network

7/52

Complex network

Complex networks are everywhere.

Internet, Social Network, Protein Network

8/52

Community structures

Complex networks are everywhere.

Most real-life networks have community structures.
• The graph can be divided into groups such that the vertices within each group are closely connected and the vertices in different groups are sparsely connected.

Internet, Social Network, Protein Network

9/52

Overlapping community structure

Overlapping community: a vertex may belong to multiple communities.

10/52

Overlapping community structure

Overlapping community: a vertex may belong to multiple communities.

C1: small boat
C2: meaning of bucket
C3: big boat
C4: tableware

11/52

Finding community structures

Two possible ways to find the community structure:
• OCD: overlapping community detection
• OCS: overlapping community search

12/52

OCD vs. OCS

OCD: divides the entire network to find communities.

13/52

OCD vs. OCS

Disadvantages of OCD
• Too costly
• Global criterion
• Unfriendly to dynamic graphs

Facebook network: over 800 million nodes and 100 billion links

Algorithm                  Complexity
Girvan–Newman algorithm    O(|E|^3)
LPA                        Almost linear
LA                         O(|C||E|+|V|)

14/52

OCD vs. OCS

Disadvantages of OCD
• Too costly
• Global criterion
• Unfriendly to dynamic graphs

A fixed parameter or criterion is not appropriate for all vertices and queries.
• Communities of a student
• Communities of Barack Obama

15/52

OCD vs. OCS

Disadvantages of OCD
• Too costly
• Global criterion
• Unfriendly to dynamic graphs

Graphs in real life are always evolving over time.

We cannot afford to run OCD very frequently.

OCD loses its freshness and effectiveness

16/52

OCD vs. OCS

Disadvantages of OCD
• Too costly
• Global criterion
• Unfriendly to dynamic graphs

Usually performed in an offline fashion

17/52

OCS: problem definition

OCS:
• Given: a graph G and a query vertex v
• Return: all communities that v belongs to

[Figure panels: Given, Return]

18/52

OCD vs. OCS

Advantages of OCS:
• More efficient
• Personalized criterion
• Light weight

We only need to find communities within the local neighborhood of the query vertex.

Our OCS solution needs only several milliseconds to find the answer.

19/52

OCD vs. OCS

Advantages of OCS:
• More efficient
• Personalized criterion
• Friendly to dynamic graphs

20/52

OCD vs. OCS

Advantages of OCS:
• More efficient
• Personalized criterion
• Light weight

A good choice to find communities in an online fashion

21/52

Applications of OCS

• Friend recommendation on Facebook
• Semantic expansion
• Infectious disease control
• Etc.

22/52

Challenges of OCS

• Modeling
• Complexity and scalability

A community should be dense enough

Overlapping aware

Generality

23/52

Challenges of OCS

• Modeling
• Complexity and scalability

In the worst case, OCS may need to enumerate an exponential number of valid communities.
• Computationally hard

Approximate approach

24/52

Outline

• Introduction
• Model
• Algorithm
• Experiments
• Applications

25/52

Model

• Community structure awareness
• Overlapping awareness
• Generality

The inner edges of a community should be dense

Clique as the unit of community

A clique of 6 vertices

26/52

Model

• Community structure awareness
• Overlapping awareness
• Generality

Two k-cliques are adjacent if they share k-1 vertices.

A community is a component in the k-clique graph.

[Figure panels: Original graph, Clique graph (k=4)]
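
The two statements above can be sketched in a few lines of Python. This is a brute-force illustration for toy graphs only, not the authors' implementation: the function names, the adjacency-dictionary representation, and the example graph are all assumptions made for this sketch.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency dictionary {vertex: set_of_neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques(adj, k):
    """Enumerate all k-cliques by testing every k-subset (exponential; toy graphs only)."""
    cliques = []
    for nodes in combinations(sorted(adj), k):
        if all(v in adj[u] for u, v in combinations(nodes, 2)):
            cliques.append(frozenset(nodes))
    return cliques

def k_clique_communities(adj, k):
    """Return the vertex set of each connected component of the k-clique graph."""
    cliques = k_cliques(adj, k)
    seen = set()
    communities = []
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = set()
        while stack:
            current = stack.pop()
            members |= current
            for other in cliques:
                # Two k-cliques are adjacent if they share k-1 vertices.
                if other not in seen and len(current & other) == k - 1:
                    seen.add(other)
                    stack.append(other)
        communities.append(members)
    return communities

# Hypothetical toy graph: triangles a-b-c and b-c-d share an edge; d-e-f is separate.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
graph = make_graph(edges)
communities = k_clique_communities(graph, 3)
print(communities)
```

With k=3, vertex d ends up in both communities, which is the overlapping behaviour the model is meant to capture.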

27/52

Model

• Community structure awareness
• Overlapping awareness
• Generality

Weaken the strict constraint on clique density and clique adjacency

quasi-clique

adjacency

28/52

Model

• Community structure awareness
• Overlapping awareness
• Generality

Weaken the strict constraint on clique density and clique adjacency

quasi-clique

adjacency

It’s ok if a few edges are missing in the clique

29/52

Model

• Community structure awareness
• Overlapping awareness
• Generality

Loosen the strict constraints on clique density and clique adjacency

quasi-clique

α-adjacency

If two cliques share at least α vertices, they are α-adjacent.
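
The two relaxations can be written as simple predicates. The sketch below is illustrative only: the slides do not give the exact density threshold, so it assumes that a γ-quasi-clique on k vertices must contain at least γ·k(k-1)/2 edges. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def is_gamma_quasi_clique(adj, nodes, gamma):
    """Assumed definition: at least gamma * k(k-1)/2 of the possible edges are present."""
    nodes = list(nodes)
    k = len(nodes)
    possible = k * (k - 1) // 2
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, ()))
    return present >= gamma * possible

def are_alpha_adjacent(clique_a, clique_b, alpha):
    """Two cliques are alpha-adjacent if they share at least alpha vertices."""
    return len(set(clique_a) & set(clique_b)) >= alpha

# Hypothetical 4-vertex graph with 5 of the 6 possible edges (c-d is missing).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}
print(is_gamma_quasi_clique(adj, ["a", "b", "c", "d"], 0.8))  # one missing edge tolerated
print(is_gamma_quasi_clique(adj, ["a", "b", "c", "d"], 1.0))  # gamma = 1 demands a full clique
print(are_alpha_adjacent({"a", "b", "c", "d"}, {"b", "c", "d", "e"}, 3))
```

Setting gamma to 1 and alpha to k-1 recovers the strict k-clique model from the earlier slides.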

30/52

Model

• Community structure awareness
• Overlapping awareness
• Generality

Loosen the strict constraints on clique density and clique adjacency

quasi-clique

α-adjacency

Original graph Clique graph (=1)

31/52

(α, γ)-OCS

Given a graph G, a query vertex v, and parameters k, α, and γ, find all connected quasi-clique components containing v.

k=4

32/52

(α, γ)-OCS

Given a graph G, a query vertex v, and parameters k, α, and γ, find all connected quasi-clique components containing v.

k=3

33/52

Parameter selection

and k
• In general, larger k leads to larger
• Has an upper bound and a lower bound corresponding to and k

34/52

Outline

• Introduction
• Model
• Algorithm
• Experiments
• Applications

35/52

Algorithm

• Exact algorithm
• Approximate algorithm

36/52

Exact Algorithm

Example
• k=4, (3,1)-OCS
• Query vertex = Bob

37/52

Exact Algorithm

Example
• k=4, (3,1)-OCS
• Query vertex = Bob

Drawback
• Exponential enumerations

38/52

Approximate Algorithm

Example
• k=4, (3,1)-OCS
• Query vertex = Bob

Approximate
• The new clique contains at least one new vertex
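
The difference between the exact and the approximate search can be illustrated with the sketch below. It is not the authors' algorithm: it enumerates all quasi-cliques by brute force (feasible only on toy graphs), assumes a quasi-clique needs at least γ·k(k-1)/2 edges, and uses a hypothetical graph in which every name except Bob is invented. The `approximate` flag applies the rule from the slide: a clique is only expanded into if it contributes at least one vertex not yet in the community.

```python
from itertools import combinations

def add_clique(adj, names):
    """Add all pairwise edges among `names` to the adjacency dictionary."""
    for u, v in combinations(names, 2):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

def quasi_cliques(adj, k, gamma):
    """Brute-force enumeration of k-vertex sets with enough edges (assumed threshold)."""
    need = gamma * k * (k - 1) / 2
    found = []
    for nodes in combinations(sorted(adj), k):
        edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
        if edges >= need:
            found.append(frozenset(nodes))
    return found

def search_communities(adj, query, k, alpha, gamma, approximate=False):
    """Return the communities of `query` as vertex sets."""
    cliques = quasi_cliques(adj, k, gamma)
    visited = set()
    communities = []
    for seed in cliques:
        if query not in seed or seed in visited:
            continue
        # In approximate mode, a seed already covered by a community adds nothing new.
        if approximate and any(seed <= c for c in communities):
            continue
        visited.add(seed)
        members = set(seed)
        stack = [seed]
        while stack:
            current = stack.pop()
            for nxt in cliques:
                if nxt in visited or len(current & nxt) < alpha:
                    continue
                if approximate and nxt <= members:
                    continue  # no new vertex: the approximation skips this clique
                visited.add(nxt)
                members |= nxt
                stack.append(nxt)
        communities.append(members)
    return communities

# Hypothetical graph: Bob belongs to two groups that overlap only in Bob.
graph = {}
add_clique(graph, ["Bob", "Alice", "Carol", "Dave"])
add_clique(graph, ["Alice", "Carol", "Dave", "Eve"])
add_clique(graph, ["Bob", "Frank", "Grace", "Heidi"])

exact = search_communities(graph, "Bob", k=4, alpha=3, gamma=1.0)
approx = search_communities(graph, "Bob", k=4, alpha=3, gamma=1.0, approximate=True)
print(exact)
print(approx)
```

On this small graph both modes return the same two communities. On larger graphs, skipping cliques that add no new vertex can miss bridges between cliques, which is why the approximate result may differ from the exact one.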


40/52

Outline

• Introduction
• Model
• Algorithm
• Experiments
• Applications

41/52

Experiments

• Setup
• Dataset

Intel Core2 2.13 GHz

4 GB memory

64-bit Windows 7

42/52

Experiments

• Setup
• Dataset

Dataset        |V|        |E|
WordNet        82676      133445
DBLP           560851     1816613
Google         916427     4322051
Livejournal    4847572    42851237

43/52

Effectiveness

It successfully unveils multiple research interests.

Example
• Jiawei Han
• k=6

C1: multimedia data mining
C2: stream data mining
C3: information network

44/52

Effectiveness

Our model is flexible enough to support different parameters.

Example
• Jiawei Han
• k=9

45/52

Effectiveness

For most vertices, the OCS model can find non-trivial results.

46/52

Performance

OCS is more efficient than OCD.

Competitors:
• LA
  • <Identification of overlapping community structure in complex networks using fuzzy c-means clustering>
• OSLOM
  • <Finding statistically significant communities in networks>

Amortized time
• (Total time of OCD)/n

47/52

Performance: influence of parameters

• For the same k and α, a smaller γ costs more time
• For the same k and γ, a smaller α costs more time

48/52

Accuracy of approximate algorithm

More than 70% accuracy is consistently achieved; in some cases, accuracy reaches almost 90%.

49/52

Outline

• Introduction
• Model
• Algorithm
• Experiments
• Applications

50/52

Diversity-based Social Network Analysis

• What is the distribution of diversity?
• Can we find people with really large diversity?

51/52

Name disambiguation

Ambiguous names with a significant number of entities also have a large number of communities.

A real person has fewer communities than these ambiguous names.

52/52

Contributions

• Problem definition
• Model
• Guide for parameter selection
• Algorithms
• Extensive experiments and applications

53/52

Q&A

Thank you!