A framework For Community Identification in Dynamic Social Networks Chayant, Tanya Berger-Wolf,...

A Framework for Community Identification in Dynamic Social Networks
Chayant, Tanya Berger-Wolf, David Kempe [KDD’07]
Advisor: Dr. Koh Jia-Ling
Speaker: Che-Wei Liang
Date: 2008.1.8



Outline

• Introduction
• Problem formulation
• Finding optimal colorings
• Group Coloring Heuristics
• Experiment
• Conclusion


Introduction

• Social networks
– Graphs of interactions between individuals.


Introduction

• What is a community?
– A collection of individuals who interact unusually frequently.
– Communities reveal interesting properties shared by their members, such as common hobbies or occupations.

• Why dynamic communities?
– Communities observed over time may have more interesting properties.


History of Interactions

[Figure: interactions among individuals 1-5 at time step t=1]

Assume discrete time and interactions in the form of complete subgraphs.


Approach: Graph Model

[Figure: graph model showing individuals 1-5 repeated across five time steps]


Preliminaries


Problem Formulation

• Assumptions about the behavior of individuals:
– Individuals and groups represent exactly one community at a time.
– Concurrent groups represent distinct communities.


Problem Formulation

• Assumptions about the behavior of individuals (cont.):
– Conservatism: community affiliation changes are rare.
– Group Loyalty: individuals observed in a group belong to the same community.
– Parsimony: few affiliations overall for each individual.


Approach: Color = Community

Valid coloring: groups observed in the same time step receive distinct colors.
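
The validity condition can be checked mechanically. The sketch below is only an illustration: the function name `is_valid_coloring` and the data layout (a list of `(time_step, group_id)` pairs plus a `group_id -> color` dictionary) are assumptions made for this example, not notation from the paper.

```python
from collections import defaultdict


def is_valid_coloring(groups, color_of):
    """Return True if no two groups in the same time step share a color.

    groups:   list of (time_step, group_id) pairs (hypothetical layout)
    color_of: dict mapping group_id -> color
    """
    seen = defaultdict(set)  # time step -> colors already used in that step
    for t, g in groups:
        c = color_of[g]
        if c in seen[t]:
            return False  # two concurrent groups would share a community
        seen[t].add(c)
    return True


groups = [(1, "g1"), (1, "g2"), (2, "g3")]
print(is_valid_coloring(groups, {"g1": "red", "g2": "blue", "g3": "red"}))  # True
print(is_valid_coloring(groups, {"g1": "red", "g2": "red", "g3": "blue"}))  # False
```

Reusing a color across different time steps is allowed; that reuse is what lets a community persist over time.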


i-cost

• Conservatism: switching cost (α)
• Group loyalty:
– Being absent (β1)
– Being different (β2)
• Parsimony: number of colors (γ)


g-cost

• Conservatism: switching cost (α)
• Group loyalty:
– Being absent (β1)
– Being different (β2)
• Parsimony: number of colors (γ)


c-cost

• Conservatism: switching cost (α)
• Group loyalty:
– Being absent (β1)
– Being different (β2)
• Parsimony: number of colors (γ)


• Minimum Community Interpretation: for a given cost setting (α,β1,β2,γ), find a vertex coloring that minimizes the total cost.
– Color of group vertices = Community structure
– Color of individual vertices = Affiliation sequences

• The problem is NP-Complete and APX-Hard.
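
To make the objective concrete, here is a minimal sketch of how the total cost of one coloring might be computed. The slides give only the cost names, so the exact charging rules are assumptions made for this example: α is charged for each color change between consecutive time steps, β1 when an individual carries a color x but is not in a concurrent group colored x, β2 when an individual is in a group whose color differs from their own, and γ for each distinct color an individual uses. The function name and data layout are also hypothetical.

```python
def total_cost(ind_colors, group_colors, alpha, beta1, beta2, gamma):
    """Total cost of a coloring under the assumed charging rules.

    ind_colors:   dict individual -> list of colors, one per time step
    group_colors: list indexed by time step; each entry is a list of
                  (members, color) pairs for the groups seen in that step
    """
    cost = 0
    for person, colors in ind_colors.items():
        # i-cost (conservatism): one alpha per switch between adjacent steps
        cost += alpha * sum(1 for a, b in zip(colors, colors[1:]) if a != b)
        # g-cost (group loyalty): compare the individual with each group
        for t, x in enumerate(colors):
            for members, gc in group_colors[t]:
                if person in members and gc != x:
                    cost += beta2  # "being different" from one's own group
                elif person not in members and gc == x:
                    cost += beta1  # "being absent" from one's community
        # c-cost (parsimony): assumed gamma per distinct color used
        cost += gamma * len(set(colors))
    return cost


group_colors = [
    [({"a", "b"}, "red"), ({"c"}, "blue")],  # time step 0
    [({"a", "c"}, "red"), ({"b"}, "blue")],  # time step 1
]
switching = {"a": ["red", "red"], "b": ["red", "blue"], "c": ["blue", "red"]}
loyal_b = {"a": ["red", "red"], "b": ["red", "red"], "c": ["blue", "red"]}
print(total_cost(switching, group_colors, 1, 1, 3, 1))  # 7
print(total_cost(loyal_b, group_colors, 1, 1, 3, 1))    # 9
```

With β2 = 3, keeping individual b red in step 1 costs more than letting b switch to blue, because b would then be both absent from the red group and different from the blue group they were observed in.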


Finding Optimal Colorings

• Individual Coloring
– G(t, x): g-cost of coloring i at time step t with color x.
– I(t, x, y): i-cost of coloring i at time steps t and t-1 with colors x and y.
– C(x, R): c-cost of using color x when R is the set of colors used in prior steps.
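
The three terms suggest a recursion over time steps for a single individual once the group coloring is fixed. The sketch below is one possible memoized formulation, not the authors' algorithm: the state `(t, previous color, set of colors used so far)`, the function name, the data layout, and the concrete charging rules inside `G`, `I`, and `C` are all assumptions made for illustration.

```python
from functools import lru_cache


def best_individual_coloring(person, group_colors, palette,
                             alpha, beta1, beta2, gamma):
    """Return (min cost, color sequence) for one individual.

    group_colors: list indexed by time step of (members, color) pairs
    palette:      tuple of candidate colors (hypothetical parameter)
    """
    T = len(group_colors)

    def G(t, x):  # g-cost of giving the individual color x at step t
        cost = 0
        for members, gc in group_colors[t]:
            if person in members and gc != x:
                cost += beta2  # assumed "being different" rule
            elif person not in members and gc == x:
                cost += beta1  # assumed "being absent" rule
        return cost

    def I(t, x, y):  # i-cost of color x at step t and color y at step t-1
        return alpha if x != y else 0

    def C(x, R):  # c-cost of using x when R holds the colors used before
        return gamma if x not in R else 0

    @lru_cache(maxsize=None)
    def solve(t, prev, used):
        # Cheapest way to color steps t..T-1 given the previous color
        if t == T:
            return 0, ()
        best = None
        for x in palette:
            step = G(t, x) + C(x, used)
            if prev is not None:
                step += I(t, x, prev)
            rest_cost, rest = solve(t + 1, x, used | frozenset([x]))
            cand = (step + rest_cost, (x,) + rest)
            if best is None or cand[0] < best[0]:
                best = cand
        return best

    return solve(0, None, frozenset())


group_colors = [
    [({"a", "b"}, "red"), ({"c"}, "blue")],  # time step 0
    [({"a", "c"}, "red"), ({"b"}, "blue")],  # time step 1
]
print(best_individual_coloring("b", group_colors, ("red", "blue"), 1, 1, 3, 1))
# (3, ('red', 'blue'))
```

Because the state includes the set of colors already used, the table can grow exponentially in the palette size; the sketch is meant for small examples only.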


Finding Optimal Colorings

• Group Coloring
– Using exhaustive search over all group colorings.
– Speed up by Branch-and-Bound techniques.
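
As a rough illustration of the exhaustive part, the sketch below enumerates every valid group coloring for a small palette by choosing, in each time step, an ordered selection of distinct colors for the groups in that step. The function name, the `palette` parameter, and the data layout are assumptions for this example, and the branch-and-bound pruning mentioned on the slide is not implemented.

```python
from itertools import permutations, product


def valid_group_colorings(groups_per_step, palette):
    """Yield every assignment of colors to groups in which the groups of
    one time step all receive distinct colors.

    groups_per_step: list indexed by time step of lists of member sets
    palette:         tuple of candidate colors (hypothetical parameter)
    """
    # For each step, all ways to pick distinct colors for its groups
    per_step = [list(permutations(palette, len(groups)))
                for groups in groups_per_step]
    # Combine the per-step choices across all time steps
    for choice in product(*per_step):
        yield [list(zip(groups, colors))
               for groups, colors in zip(groups_per_step, choice)]


groups_per_step = [
    [{"a", "b"}, {"c"}],  # time step 0
    [{"a", "c"}, {"b"}],  # time step 1
]
colorings = list(valid_group_colorings(groups_per_step, ("red", "blue")))
print(len(colorings))  # 4
```

The number of colorings is the product of the per-step counts, so it grows quickly with the number of time steps; this is why pruning or heuristics are needed beyond toy inputs.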


Group Coloring Heuristics

• Bipartite Matching Heuristic
– Using standard flow techniques.

• Greedy Heuristics
– Maximize “similarity”.
– Jaccard’s index: Jac(g, g’) = |g ∩ g’| / |g ∪ g’|
– Repeatedly select the pair (g, g’) with the highest similarity and decide that g and g’ should have the same color.
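
The greedy idea can be sketched in a few lines. The Jaccard index below is the standard definition; everything else is an assumption made for illustration: the function names, the `name -> (time_step, members)` layout, the rule that a merge is skipped when it would put the same color on two groups in one time step, and the choice to stop at zero similarity.

```python
from itertools import combinations


def jaccard(g, h):
    """Jaccard index |g ∩ h| / |g ∪ h| of two member sets."""
    union = g | h
    return len(g & h) / len(union) if union else 0.0


def greedy_group_coloring(groups):
    """groups: dict name -> (time_step, frozenset of members).

    Returns dict name -> color label; every group starts with its own color.
    """
    color = {name: name for name in groups}
    # Time steps already covered by each color class
    steps = {name: {groups[name][0]} for name in groups}

    def sim(pair):
        return jaccard(groups[pair[0]][1], groups[pair[1]][1])

    # Visit pairs from most to least similar
    for a, b in sorted(combinations(groups, 2), key=sim, reverse=True):
        if sim((a, b)) == 0:
            break  # remaining pairs share no members
        ca, cb = color[a], color[b]
        if ca == cb or steps[ca] & steps[cb]:
            continue  # same class already, or merge would be invalid
        for name in color:  # merge class cb into class ca
            if color[name] == cb:
                color[name] = ca
        steps[ca] |= steps.pop(cb)
    return color


groups = {
    "g1": (0, frozenset({"a", "b"})),
    "g2": (0, frozenset({"c"})),
    "g3": (1, frozenset({"a", "c"})),
    "g4": (1, frozenset({"b"})),
}
print(jaccard(groups["g1"][1], groups["g4"][1]))  # 0.5
print(greedy_group_coloring(groups))
# {'g1': 'g1', 'g2': 'g2', 'g3': 'g2', 'g4': 'g1'}
```

Here g1 and g4 end up in one community and g2 and g3 in another; the weaker g1-g3 match is rejected because both classes already contain a group from each time step.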


Experiment

• Synthetic Data sets


Southern Women Data Set by Davis, Gardner, and Gardner, 1941

Photograph by Ben Shahn, Natchez, MS, October 1935

[Figures: aggregated network; event participation]


An Optimal Coloring: (α,β1,β2,γ)=(1,1,3,1)

[Figure: optimal coloring with regions labeled Core, Periphery, Periphery, Core]


An Optimal Coloring: (α,β1,β2,γ)=(1,1,1,1)

[Figure: optimal coloring with regions labeled Core, Periphery, Core]


Conclusion

• An optimization-based framework for finding communities in dynamic social networks.

• Finding an optimal solution is NP-Complete and APX-Hard.

• Model evaluation by exhaustive search.

• Heuristic algorithms for larger data sets.
– Heuristic results are comparable to optimal.