Community Extracting Using Intersection Graph and Content Analysis in Complex Network


Toshiya Kuramochi, Naoki Okada, Kyohei Tanikawa, Yoshinori Hijikata, Shogo Nishida

Graduate School of Engineering Science, Osaka University, Japan

The 2012 IEEE/WIC/ACM International Conference on Web Intelligence

Overview

1. Background and Problems of Community Detection

2. Our Proposed Method

3. Experimentation in Real SNS Networks

4. Results and Discussions

5. Conclusion

Background

Community structure
• connections within groups are dense
• connections between groups are sparse

communities in the WWW (e.g., business, science): sets of web pages related to a certain topic

Many researchers have studied complex networks and have found the “community structure”

Community structure is a key characteristic of complex networks
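The definition of community structure (dense connections within groups, sparse connections between groups) can be checked numerically. A minimal sketch, assuming an unweighted undirected graph given as an edge list and a non-overlapping partition; the function name `densities` and the toy graph are hypothetical and not part of the authors' method.

```python
from itertools import combinations

def densities(edges, communities):
    """Return (edge density within communities, edge density between communities).

    edges: iterable of (u, v) pairs of an undirected, unweighted graph
    communities: list of disjoint node sets that together cover every node
    """
    edge_set = {frozenset(e) for e in edges}
    label = {n: i for i, com in enumerate(communities) for n in com}
    linked = {True: 0, False: 0}  # observed edges, keyed by "same community?"
    pairs = {True: 0, False: 0}   # possible node pairs, keyed the same way
    for u, v in combinations(label, 2):
        same = label[u] == label[v]
        pairs[same] += 1
        linked[same] += frozenset((u, v)) in edge_set
    inside = linked[True] / pairs[True] if pairs[True] else 0.0
    between = linked[False] / pairs[False] if pairs[False] else 0.0
    return inside, between

# Hypothetical toy graph: two triangles joined by a single edge
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(densities(edges, [{1, 2, 3}, {4, 5, 6}]))  # (1.0, 0.1111111111111111)
```

A partition with high inside density and low between density is what "community structure" describes; the methods below try to find such groups without knowing them in advance.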

Problem (1) – overlap of communities

Some nodes belong to several communities in real networks

communities in the WWW: a community of sports pages and a community of business pages

overlap of communities (e.g., pages on the economic effect of the Olympic Games)

Most ordinary clustering methods allocate each node to only one cluster, so they CANNOT represent the overlap of communities

A community detection method should be able to allocate a node to several clusters

Problem (2) – edge inhomogeneity

Edges are not homogeneous in real networks

edges in an SNS network (e.g., family, same hobby, same university, workplace)

Many community detection methods assume that all edges are the same

They CANNOT represent the edge inhomogeneity

Weights of edges should be set individually

Problem (3) – appropriate number of communities

The number of real communities is often unknown

How many communities are in this network?

Most hierarchical clustering methods require the appropriate number of communities as manual input

The number of communities should be determined automatically

Purpose of this work

• A node may belong to several communities: using the idea of the intersection graph [Everett & Borgatti, 1998]

• Weights of edges are set individually: content information analysis

• The number of communities is determined automatically: clustering based on modularity [Newman, 2003]

We solve these three problems by proposing a new community detection method


Summary of our proposed method

Input: graph and content information

• Step 1: Enumeration of dense subgraphs

• Step 2: Conversion to the intersection graph

• Step 3: Calculation of the weights of edges

• Step 4: Clustering based on modularity

Output: clusters (communities)

Step 1: Enumeration of dense subgraphs

Dense subgraphs whose size is at least a threshold are enumerated from the input graph

• dense subgraph: clique, n-clique, n-clan, etc. (a clique is a complete subgraph)

• clique threshold: the minimum size of a clique to be enumerated

Example of clique enumeration:

{ B, C, D, J, K } (size = 5)

{ D, E, I, J } (size = 4)

{ E, G, H } (size = 3)
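The clique case of Step 1 can be sketched as follows. A minimal sketch under stated assumptions: "cliques" is taken to mean maximal cliques, the clique threshold is treated as a minimum size, and enumeration uses the standard Bron-Kerbosch algorithm with pivoting, which the slides do not name. The toy graph is hypothetical: it is built so that its maximal cliques are exactly the three cliques in the example.

```python
from itertools import combinations

def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps each node to the set of its neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def enumerate_cliques(adj, clique_threshold):
    """Keep only maximal cliques with at least clique_threshold members."""
    return [c for c in maximal_cliques(adj) if len(c) >= clique_threshold]

def build_adjacency(groups):
    """Hypothetical toy input: connect every pair of nodes inside each group."""
    adj = {}
    for group in groups:
        for u, v in combinations(group, 2):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

adj = build_adjacency(["BCDJK", "DEIJ", "EGH"])
for c in sorted(enumerate_cliques(adj, 3), key=len, reverse=True):
    print(sorted(c), len(c))
```

With a clique threshold of 3 this prints the three cliques of sizes 5, 4 and 3; raising the threshold to 4 drops { E, G, H }.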

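Step 2: Conversion to the intersection graph

cliques in the input graph → intersection graph

• nodes of the intersection graph: the dense subgraphs (cliques) in the original graph

• edges of the intersection graph: two cliques are linked when their number of common members is at least the overlap threshold

• overlap threshold: the minimum number of common members

Example: { B, C, D, J, K } and { D, E, I, J } have { D, J } as common members; { D, E, I, J } and { E, G, H } have { E }; { B, C, D, J, K } and { E, G, H } have none (∅)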
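Step 2 can be sketched directly from the definition. A minimal sketch, assuming cliques are given as sets and the overlap threshold is a minimum number of shared members; the function name `intersection_graph` is hypothetical.

```python
from itertools import combinations

def intersection_graph(cliques, overlap_threshold):
    """Nodes are clique indices; i and j are linked when the cliques share
    at least overlap_threshold members. Returns {(i, j): common members}."""
    cliques = [frozenset(c) for c in cliques]
    edges = {}
    for i, j in combinations(range(len(cliques)), 2):
        common = cliques[i] & cliques[j]
        if len(common) >= overlap_threshold:
            edges[(i, j)] = common
    return edges

cliques = [{"B", "C", "D", "J", "K"}, {"D", "E", "I", "J"}, {"E", "G", "H"}]
print(intersection_graph(cliques, 2))  # only cliques 0 and 1 (sharing D and J) are linked
print(intersection_graph(cliques, 1))  # cliques 1 and 2 (sharing E) are linked as well
```

The overlap threshold therefore controls how sparse the intersection graph is: a higher value keeps only strongly overlapping cliques connected.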

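Step 3: Calculation of weights of edges

• degree of overlap: $d(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}$ (Jaccard coefficient)

• similarity of content information:

1. represent the content information of each set (X and Y) as one vector of tf-idf scores

2. $\mathrm{sim}(X,Y)$ = cosine similarity of the two vectors

• weight of the edge:

$w(X,Y) = \frac{d(X,Y)}{1 + \epsilon - \mathrm{sim}(X,Y)}, \quad (0 < \epsilon < 1)$

The weight grows with both the overlap and the content similarity; since $\mathrm{sim}(X,Y) \le 1$, $\epsilon$ keeps the denominator positive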
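The Step 3 formulas can be sketched in Python. Assumptions not stated on the slides: the content of a clique is one bag of tokens pooled from its members' texts, tf is a raw term count, idf is log(N / df) computed over the clique documents, and ε = 0.1. The token lists are hypothetical.

```python
import math
from collections import Counter

def jaccard(x, y):
    """Degree of overlap d(X, Y) = |X & Y| / |X | Y|."""
    return len(x & y) / len(x | y)

def tfidf_vectors(docs):
    """One sparse tf-idf vector per document (raw term count * log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def edge_weight(x, y, vec_x, vec_y, eps=0.1):
    """w(X, Y) = d(X, Y) / (1 + eps - sim(X, Y)), with 0 < eps < 1."""
    return jaccard(x, y) / (1 + eps - cosine(vec_x, vec_y))

# Hypothetical content: tokens pooled from the members' friend introductions
X, Y = {"B", "C", "D", "J", "K"}, {"D", "E", "I", "J"}
docs = [["tennis", "osaka", "tennis"], ["tennis", "osaka", "kyoto"], ["piano"]]
vx, vy, vz = tfidf_vectors(docs)
print(jaccard(X, Y), cosine(vx, vy), edge_weight(X, Y, vx, vy))
```

For the same overlap d(X, Y) = 2/7, the weight is smallest when the contents are unrelated (sim = 0, weight = d / (1 + ε)) and largest when they are identical (sim = 1, weight = d / ε).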

Step 4: Clustering based on modularity

Modularity

• Modularity is an index for evaluating a division of a network

• A clustering method based on modularity optimizes modularity directly

• The division with the highest modularity is the best division

→ The best number of clusters is detected automatically
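For reference, the standard weighted form of Newman's modularity for a division into clusters $c$ is $Q = \sum_c \left[ \frac{W_c}{m} - \left( \frac{S_c}{2m} \right)^2 \right]$, where $m$ is the total edge weight, $W_c$ the weight of edges inside $c$, and $S_c$ the sum of weighted degrees of the nodes in $c$. The slides do not say which optimizer was used, so this minimal sketch assumes a simple greedy agglomerative search (merge the linked pair of clusters giving the highest Q, keep the best division seen), in the spirit of Newman's fast greedy algorithm but without its efficient updates. Function names and the toy graph are hypothetical.

```python
from itertools import combinations

def modularity(weights, communities):
    """Weighted modularity Q = sum_c [W_c / m - (S_c / 2m)^2].

    weights: {frozenset({u, v}): weight} for an undirected graph without self-loops
    communities: list of disjoint node sets covering every node
    """
    strength = {}
    for edge, w in weights.items():
        for node in edge:
            strength[node] = strength.get(node, 0.0) + w
    two_m = sum(strength.values())  # each edge weight is counted at both ends
    if two_m == 0:
        return 0.0
    q = 0.0
    for com in communities:
        w_c = sum(w for edge, w in weights.items() if edge <= com)  # weight inside com
        s_c = sum(strength.get(n, 0.0) for n in com)
        q += w_c / (two_m / 2) - (s_c / two_m) ** 2
    return q

def greedy_modularity(nodes, weights):
    """Start from singletons, repeatedly merge the linked pair of clusters that
    gives the highest Q, and return the best division seen (and its Q)."""
    communities = [{n} for n in nodes]
    best, best_q = [set(c) for c in communities], modularity(weights, communities)
    while True:
        candidates = []
        for i, j in combinations(range(len(communities)), 2):
            a, b = communities[i], communities[j]
            if any(frozenset((u, v)) in weights for u in a for v in b):
                merged = [c for k, c in enumerate(communities) if k not in (i, j)]
                merged.append(a | b)
                candidates.append((modularity(weights, merged), merged))
        if not candidates:
            break
        q, communities = max(candidates, key=lambda t: t[0])
        if q > best_q:
            best, best_q = [set(c) for c in communities], q
    return best, best_q

# Hypothetical intersection graph: two tightly linked groups of cliques, one weak link
weights = {frozenset(e): 1.0 for e in [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]}
weights[frozenset((3, 4))] = 0.1
division, q = greedy_modularity(range(1, 7), weights)
print(sorted(sorted(c) for c in division), round(q, 3))
```

On this toy input the sketch returns the two groups {1, 2, 3} and {4, 5, 6} with Q of about 0.484; merging them further would lower Q, so the number of clusters comes out of the optimization rather than being supplied by hand.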

Summary of our proposed method

• Step 1: Enumeration of cliques

• Step 2: Conversion to the intersection graph

• Step 3: Calculation of weights of edges

• Step 4: Clustering based on modularity

(figure: cliques X, Y and Z in the intersection graph, with edge weights w(X,Y) and w(Y,Z) and w(X,Z) = 0, divided into cluster 1 and cluster 2)
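Clustering the intersection graph assigns each clique to one cluster, but a node can still end up in several communities. A minimal sketch, assuming that each community is the union of the members of the cliques in one cluster (the slides only show the cluster assignment of cliques); function names and the cluster assignment are hypothetical.

```python
def communities_from_clusters(cliques, clusters):
    """Each community is the union of the members of the cliques in one cluster.
    A node whose cliques fall into different clusters belongs to several communities."""
    return [set().union(*(cliques[i] for i in cluster)) for cluster in clusters]

def memberships(communities):
    """Map each node to the indices of all communities that contain it."""
    result = {}
    for k, com in enumerate(communities):
        for node in com:
            result.setdefault(node, []).append(k)
    return result

cliques = [{"B", "C", "D", "J", "K"}, {"D", "E", "I", "J"}, {"E", "G", "H"}]
clusters = [{0, 1}, {2}]  # hypothetical output of Step 4
communities = communities_from_clusters(cliques, clusters)
print(memberships(communities)["E"])  # [0, 1]: E belongs to both communities
```

This is how the intersection-graph approach addresses Problem (1): overlap arises naturally from cliques that share members.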


Dataset

• mixi: one of the most popular SNS in Japan

• test subjects: 20 mixi users

• link structure: users within two hops (radius two) of each test subject

• content information: self-introduction, friend introduction, attributes (gender, address, birthday, etc.)

• ground truth: the relation names between a test subject and each person in the dataset, enumerated by the test subject (e.g., ‘same university’, ‘hobby friend’, ‘coworker’)

• evaluation: we evaluate the communities that include the test subject

– numerical evaluation: precision, recall, F-measure

– visual evaluation
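The numerical evaluation uses precision, recall and F-measure. The slides do not state how extracted communities are matched to relation names, so this minimal sketch only scores one extracted community against one ground-truth group; the names are hypothetical.

```python
def precision_recall_f(extracted, truth):
    """Set-based precision, recall and F-measure of one extracted community
    against one ground-truth group (e.g. all friends labelled 'same university')."""
    hits = len(extracted & truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

extracted = {"a", "b", "c", "d"}  # hypothetical community containing the test subject
truth = {"a", "b", "e"}           # hypothetical people sharing one relation name
print(precision_recall_f(extracted, truth))
```

Here precision is 2/4 and recall is 2/3, so the F-measure (their harmonic mean) is 4/7.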

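Implementation

parameter setting: (clique threshold, overlap threshold) = (3, 2), (4, 2), (4, 3), (5, 2), (5, 3) and (5, 4)

implementation:

• WithCA (proposed, with content analysis): content analysis = friend introduction analysis; clustering method = clustering based on modularity; number of output clusters = automatically determined

• NonCA (proposed, without content analysis): content analysis = none; clustering method = clustering based on modularity; number of output clusters = automatically determined

• conventional method (Everett's method and Everett's method*): content analysis = none; clustering method = simple hierarchical clustering; number of output clusters = equal to that of NonCA, or the correct data*

* the number of relation names enumerated by the test subject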


Numerical evaluation (F-measure)

(bar chart: F-measure of Everett's method, Everett's method*, NonCA and WithCA for each (clique threshold, overlap threshold) setting)

Pairwise comparison (row method vs. column method; blank = no entry on the slide):

            Everett   Everett*   NonCA   WithCA
Everett        -        bad       bad     bad
Everett*      good       -        bad     bad
NonCA         good      good       -
WithCA        good      good               -

→ contribution of clustering based on modularity

→ superiority of our method

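Numerical evaluation (precision)

(bar chart: precision of Everett's method, Everett's method*, NonCA and WithCA for each (clique threshold, overlap threshold) setting)

Pairwise comparison (row method vs. column method; blank = no entry on the slide):

            Everett   Everett*   NonCA   WithCA
Everett        -                          bad
Everett*                 -                bad
NonCA                              -      bad
WithCA        good      good      good     -

→ The precision becomes better with content analysis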

Numerical evaluation (recall)

(bar chart: recall of Everett's method, Everett's method*, NonCA and WithCA for each (clique threshold, overlap threshold) setting)

Pairwise comparison (row method vs. column method):

            Everett   Everett*   NonCA   WithCA
Everett        -        bad       bad     bad
Everett*      good       -        bad     bad
NonCA         good      good       -      good
WithCA        good      good      bad      -

→ The recall becomes better by using the clustering method based on modularity

→ Our methods outperform the ordinary method

Visual evaluation (Everett’s method vs. WithCA)

(figure: communities extracted by Everett’s method and by WithCA)

Visual evaluation (NonCA vs. WithCA)

(figure: communities extracted by NonCA and by WithCA)


Conclusion

• Features of our proposed method

– Our method can allocate a node to several clusters

– Our method can represent edge inhomogeneity

– Our method can automatically detect the number of clusters

• Evaluation on real SNS networks

– Our method outperforms the conventional method in F-measure

– The recall becomes better by using the clustering method based on modularity

– The precision becomes better with content analysis
