Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Community Extracting using Intersection Graph and Content Analysis in Complex Network. Toshiya Kuramochi, Naoki Okada, Kyohei Tanikawa, Yoshinori Hijikata, Shogo Nishida. Graduate School of Engineering Science, Osaka University, Japan. The 2012 IEEE/WIC/ACM International Conference on Web Intelligence.


Presentation at the IEEE/WIC/ACM International Conference on Web Intelligence 2012

Transcript of Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Page 1: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

ENGINEERING SCIENCE

Community Extracting using Intersection Graph and Content Analysis in Complex Network

Toshiya Kuramochi, Naoki Okada, Kyohei Tanikawa, Yoshinori Hijikata, Shogo Nishida

Graduate School of Engineering Science, Osaka University, Japan

The 2012 IEEE/WIC/ACM International Conference on Web Intelligence

Page 2: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Overview

1. Background and Problems of Community Detection

2. Our Proposed Method

3. Experimentation in Real SNS Networks

4. Results and Discussions

5. Conclusion

Page 3: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Background

Many researchers have studied complex networks and have found "community structure".

Community structure
• connections within groups are dense
• connections between groups are sparse

Example: communities in the WWW are sets of web pages related to a certain topic (business, science, sport).

Community structure is a key characteristic of complex networks.

Page 4: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Problem (1) – overlap of communities

Some nodes belong to several communities in real networks.

Example (communities in the WWW): a page on the economic effect of the Olympic Games lies in the overlap of the community of sports pages and the community of business pages.

Most ordinary clustering methods allocate each node to one cluster. They CANNOT represent the overlap of communities.

A community detection method should be able to allocate nodes to several clusters.

Page 5: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Problem (2) – edge inhomogeneity

Edges are not homogeneous in real networks.

Example (edges in an SNS network): same hobby, family, same university, workplace

Many community detection methods assume that all edges are the same. They CANNOT represent edge inhomogeneity.

Weights of edges should be set individually.

Page 6: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Problem (3) – appropriate number of communities

The number of real communities is often unknown.

How many communities are in this network? 2? 3? 4?

Most hierarchical clustering methods require manual input of the appropriate number of communities.

The number of communities should be determined automatically.

Page 7: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Purpose of this work

We solve these three problems by proposing a new community detection method.

• A node may belong to several communities
  → Using the idea of the intersection graph [Everett & Borgatti, 1998]
• Weights of edges are set individually
  → Content information analysis
• The number of communities is determined automatically
  → Clustering based on modularity [Newman, 2003]

Page 8: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Overview

1. Background and Problems of Community Detection

2. Our Proposed Method

3. Experimentation in Real SNS Networks

4. Results and Discussions

5. Conclusion

Page 9: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Summary of our proposed method

Input: graph & content information

• Step 1: Enumeration of dense subgraphs

• Step 2: Conversion to the intersection graph

• Step 3: Calculation of the weights of edges

• Step 4: Clustering based on modularity

Output: clusters (communities)

Page 10: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Step 1: Enumeration of dense subgraphs

dense subgraph: clique, n-clique, n-clan, etc.

clique (example of a dense subgraph): a complete subgraph

clique threshold: threshold on the size of the cliques to enumerate (values shown: 3, 4, 5)

example of clique enumeration (figure: input graph with nodes A to K)

{ B, C, D, J, K } (size = 5)

{ D, E, I, J } (size = 4)

{ E, G, H } (size = 3)
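Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses the basic Bron-Kerbosch recursion to list maximal cliques and keeps those whose size is at least the clique threshold (reading the threshold as "at least" is an assumption). The edges of the slide's figure are not recoverable, so the graph below is hypothetical: it connects the three cliques listed above and attaches A and F by single assumed edges.

```python
from itertools import combinations


def build_graph(groups, extra_edges=()):
    """Build an undirected adjacency map from fully connected node groups."""
    adj = {}
    for group in groups:
        for u, v in combinations(sorted(group), 2):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    for u, v in extra_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already explored nodes
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def enumerate_cliques(adj, clique_threshold):
    """Keep maximal cliques with at least clique_threshold members (assumed reading)."""
    kept = [c for c in maximal_cliques(adj) if len(c) >= clique_threshold]
    return sorted(kept, key=len, reverse=True)


# Hypothetical graph: the slide's three cliques plus assumed edges A-B and F-G.
adj = build_graph(["BCDJK", "DEIJ", "EGH"], extra_edges=[("A", "B"), ("F", "G")])
cliques_3 = enumerate_cliques(adj, 3)
for clique in cliques_3:
    print(sorted(clique), "size =", len(clique))
```

Raising the threshold to 4 or 5 drops the smaller cliques, which is how the threshold controls how dense a subgraph must be before it is passed to Step 2.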

Page 11: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

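Step 2: Conversion to the intersection graph

(figure: the cliques in the input graph become the nodes X, Y, Z of the intersection graph)

cliques in the input graph
X = { B, C, D, J, K }
Y = { D, E, I, J }
Z = { E, G, H }

intersection graph
• node: a dense subgraph in the original graph
• edge: common members shared by two dense subgraphs

X ∩ Y = { D, J } (common members)
Y ∩ Z = { E }
X ∩ Z = ∅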

overlap threshold: threshold of number of common members
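The conversion can be sketched as follows. This is an illustrative reading, not the authors' code: each clique becomes one node, and two cliques are joined when they share at least `overlap_threshold` members (the "at least" comparison and the function name are assumptions). The clique names X, Y, Z follow the slide.

```python
from itertools import combinations


def intersection_graph(cliques, overlap_threshold):
    """Return {(name_a, name_b): common members} for sufficiently overlapping cliques."""
    edges = {}
    for (a, members_a), (b, members_b) in combinations(sorted(cliques.items()), 2):
        common = members_a & members_b
        if len(common) >= overlap_threshold:
            edges[(a, b)] = common
    return edges


cliques = {
    "X": frozenset("BCDJK"),
    "Y": frozenset("DEIJ"),
    "Z": frozenset("EGH"),
}

# With threshold 1, X-Y (sharing D, J) and Y-Z (sharing E) are linked.
edges_1 = intersection_graph(cliques, 1)
# With threshold 2, only X-Y remains.
edges_2 = intersection_graph(cliques, 2)
```

X and Z never get an edge because they have no common member, whatever the threshold.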

Page 12: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

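Step 3: Calculation of weights of edges

(figure: cliques X = { B, C, D, J, K } and Y = { D, E, I, J }, which share D and J)

• degree of overlap

$d(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$ (Jaccard coefficient)

• similarity of content information

1. treat each set ($X$ and $Y$) as one vector ($\mathbf{x}$ and $\mathbf{y}$) (tf-idf score)

2. $\mathrm{sim}(X, Y) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}$ (cosine similarity)

• weight of the edge

$w(X, Y) = \frac{d(X, Y)}{1 + \epsilon - \mathrm{sim}(X, Y)}$, where $0 < \epsilon < 1$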
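The weight calculation can be illustrated with a short sketch. It reads the slide's formula as the fraction w = d / (1 + ε − sim), with d the Jaccard coefficient of the two cliques and sim the cosine similarity of their tf-idf vectors. The smoothed idf, the token lists, and the value ε = 0.5 are assumptions made only for this example; the slides do not specify them.

```python
import math
from collections import Counter


def jaccard(x, y):
    """Degree of overlap d(X, Y) = |X ∩ Y| / |X ∪ Y|."""
    return len(x & y) / len(x | y)


def tfidf_vectors(docs):
    """docs: name -> list of tokens. Returns name -> {term: tf-idf score}.

    Uses a smoothed idf, log((1 + N) / (1 + df)) + 1 (an assumed variant).
    """
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def edge_weight(d, sim, eps=0.5):
    """w = d / (1 + eps - sim), with 0 < eps < 1 keeping the denominator positive."""
    if not 0 < eps < 1:
        raise ValueError("eps must satisfy 0 < eps < 1")
    return d / (1 + eps - sim)


X, Y = frozenset("BCDJK"), frozenset("DEIJ")
# Hypothetical content: all text of each clique's members merged into one token list.
docs = {
    "X": ["tennis", "club", "osaka", "tennis"],
    "Y": ["tennis", "club", "university"],
}
vectors = tfidf_vectors(docs)
d_xy = jaccard(X, Y)
sim_xy = cosine(vectors["X"], vectors["Y"])
w_xy = edge_weight(d_xy, sim_xy)
```

For a fixed overlap d, the weight grows as the content similarity approaches 1, so two cliques with similar content are pulled together more strongly than two that merely share members.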

Page 13: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Step 4: Clustering based on modularity

Modularity Q

• Modularity Q is an indicator for evaluating a division of a network

• A clustering method based on modularity optimizes Q directly

• The division with the highest Q is the best division

→ automatic detection of the best number of clusters
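To make the role of modularity concrete, the sketch below scores candidate divisions of a small weighted graph with the standard Newman modularity, Q = Σ_c [ in_c / m − (deg_c / 2m)² ], and keeps the one with the highest Q. It only evaluates divisions supplied by hand; it does not implement the search over divisions, and the example graph (two triangles joined by one edge) is hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Weighted modularity Q of a division.

    edges: {(u, v): weight} for an undirected graph, each edge listed once.
    membership: node -> cluster id.
    """
    m = sum(edges.values())        # total edge weight
    inside = defaultdict(float)    # weight of edges inside each cluster
    degree = defaultdict(float)    # sum of weighted degrees per cluster
    for (u, v), w in edges.items():
        degree[membership[u]] += w
        degree[membership[v]] += w
        if membership[u] == membership[v]:
            inside[membership[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: triangles a-b-c and d-e-f joined by the edge c-d.
edges = {
    ("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0,
    ("d", "e"): 1.0, ("d", "f"): 1.0, ("e", "f"): 1.0,
    ("c", "d"): 1.0,
}
candidates = {
    "one cluster": {n: 0 for n in "abcdef"},
    "two clusters": {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1},
    "six clusters": {n: i for i, n in enumerate("abcdef")},
}
scores = {name: modularity(edges, part) for name, part in candidates.items()}
best = max(scores, key=scores.get)
```

Because the division with the highest Q is selected, the number of clusters falls out of the optimization instead of being supplied by hand.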

Page 14: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Summary of our proposed method

• Step 1: Enumeration of cliques

• Step 2: Conversion to the intersection graph

• Step 3: Calculation of weights of edges

• Step 4: Clustering based on modularity

(figure: the input graph with nodes A to K is converted to the intersection graph on X, Y, Z with edge weights w(X, Y), w(Y, Z) and w(X, Z) = 0, which is then divided into cluster 1 and cluster 2)
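One way to see how clustering the intersection graph yields overlapping communities is to map each cluster of cliques back to the nodes of the input graph. The sketch below assumes that a community is the union of the cliques in one cluster and that X and Y fall in cluster 1 while Z falls in cluster 2; neither detail is stated on the slide, so both are labeled assumptions.

```python
def node_communities(cliques, clique_cluster):
    """Union the members of all cliques assigned to the same cluster."""
    communities = {}
    for name, members in cliques.items():
        communities.setdefault(clique_cluster[name], set()).update(members)
    return communities


cliques = {
    "X": frozenset("BCDJK"),
    "Y": frozenset("DEIJ"),
    "Z": frozenset("EGH"),
}
# Assumed clustering result on the intersection graph.
clique_cluster = {"X": 1, "Y": 1, "Z": 2}

communities = node_communities(cliques, clique_cluster)
overlap = communities[1] & communities[2]
```

Under these assumptions node E ends up in both communities, because it is a member of clique Y (cluster 1) and of clique Z (cluster 2).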

Page 15: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Overview

1. Background and Problems of Community Detection

2. Our Proposed Method

3. Experimentation in Real SNS Networks

4. Results and Discussions

5. Conclusion

Page 16: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Dataset

mixi: one of the most popular SNSs in Japan
• test subjects: 20 mixi users
• link structure: within a radius of two from each test subject
• content information: self-introduction, friend introduction, attributes (gender, address, birthday, etc.)

ground truth: relation names between a test subject and each person in the dataset, enumerated by the test subject (e.g., 'same university', 'hobby friend', 'coworker')

evaluation: we evaluate communities that include the test subject
• numerical evaluation
  • precision
  • recall
  • F-measure
• visual evaluation
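The three numerical measures can be computed per community as in the sketch below. It compares one extracted community with the set of people who share one relation name in the ground truth; how the authors matched communities to relation names is not stated in the slides, and the user ids are hypothetical.

```python
def precision_recall_f(extracted, truth):
    """Precision, recall and F-measure of an extracted community against ground truth."""
    true_positives = len(extracted & truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: a community of 4 users vs. 5 users labeled 'same university'.
extracted = {"u1", "u2", "u3", "u4"}
truth = {"u2", "u3", "u4", "u5", "u6"}
precision, recall, f_measure = precision_recall_f(extracted, truth)
```

Here three of the four extracted users are correct (precision 0.75) and three of the five labeled users are found (recall 0.6); the F-measure is their harmonic mean.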

Page 17: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

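Implementation

parameter setting: (clique threshold, overlap threshold) = (3, 2), (4, 2), (4, 3), (5, 2), (5, 3) and (5, 4)

implementation:

• WithCA
  content analysis: friend introduction analysis
  clustering method: clustering based on modularity
  # output clusters: automatically determined

• NonCA
  content analysis: none
  clustering method: clustering based on modularity
  # output clusters: automatically determined

• Everett's method (conventional method)
  content analysis: none
  clustering method: simple hierarchical clustering
  # output clusters: correct data*

• Everett's method* (conventional method)
  content analysis: none
  clustering method: simple hierarchical clustering
  # output clusters: equal to NonCA

* correct data: the number of relation names enumerated by the test subject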

Page 18: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Overview

1. Background and Problems of Community Detection

2. Our Proposed Method

3. Experimentation in Real SNS Networks

4. Results and Discussions

5. Conclusion

Page 19: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Numerical evaluation (F-measure)

(figure: chart of F-measure, axis from 0 to 0.5, for each (clique threshold, overlap threshold) setting (3, 2), (4, 2), (4, 3), (5, 2), (5, 3), (5, 4); series: Everett's method, Everett's method*, NonCA, WithCA)

pairwise comparison (row method vs. column method):

         | Everett | Everett* | NonCA | WithCA
Everett  | -       | bad      | bad   | bad
Everett* | good    | -        | bad   | bad
NonCA    | good    | good     | -     |
WithCA   | good    | good     |       | -

• contribution of clustering based on modularity

• superiority of our method

Page 20: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Numerical evaluation (precision)

(figure: chart of precision, axis from 0 to 0.8, for each (clique threshold, overlap threshold) setting (3, 2), (4, 2), (4, 3), (5, 2), (5, 3), (5, 4); series: Everett's method, Everett's method*, NonCA, WithCA)

pairwise comparison (row method vs. column method):

         | Everett | Everett* | NonCA | WithCA
Everett  | -       |          |       | bad
Everett* |         | -        |       | bad
NonCA    |         |          | -     | bad
WithCA   | good    | good     | good  | -

• the precision becomes better with content analysis

Page 21: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Numerical evaluation (recall)

(figure: chart of recall, axis from 0 to 0.6, for each (clique threshold, overlap threshold) setting (3, 2), (4, 2), (4, 3), (5, 2), (5, 3), (5, 4); series: Everett's method, Everett's method*, NonCA, WithCA)

pairwise comparison (row method vs. column method):

         | Everett | Everett* | NonCA | WithCA
Everett  | -       | bad      | bad   | bad
Everett* | good    | -        | bad   | bad
NonCA    | good    | good     | -     | good
WithCA   | good    | good     | bad   | -

• the recall becomes better by using the clustering method based on modularity

• our methods outperform the conventional method

Page 22: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Visual evaluation (Everett's method vs. WithCA)

(figure panels: Everett's method, WithCA)

Page 23: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Visual evaluation (NonCA vs. WithCA)

(figure panels: NonCA, WithCA)

Page 24: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Overview

1. Background and Problems of Community Detection

2. Our Proposed Method

3. Experimentation in Real SNS Networks

4. Results and Discussions

5. Conclusion

Page 25: Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Conclusion

• Features of our proposed method
  – Our method can allocate nodes to several clusters
  – Our method can represent edge inhomogeneity
  – Our method can automatically detect the number of clusters

• Evaluation on real SNS networks
  – Our method outperforms the conventional method in F-measure
  – The recall becomes better by using the clustering method based on modularity
  – The precision becomes better with content analysis