Compression-based Graph Mining Exploiting Structure Primitives
Seminar: Exploratory Data Analysis, Werner Hoffmann
19.06.2015
Jing Feng, Xiao He, Nina Hubig, Christian Böhm and Claudia Plant
Outline
- What?
- Why?
- How?
- Conclusion!
Context [1]
Graphs:
- unweighted
- undirected
- modelled as adjacency matrix
- sparse
Social Media Data
Is Facebook sparse?
-> 1.4 x 10^9 nodes ¹
-> on average 340 friends² per node
-> about 238 x 10^9 edges (1.4 x 10^9 x 340 / 2, because each undirected friendship joins two nodes)
-> possible edges: n(n-1)/2 ≈ 0.98 x 10^18
=> only about 0.0000243% of all possible edges exist
Yes, Facebook is very sparse
¹https://en.wikipedia.org/wiki/Facebook
²http://www.statista.com/statistics/232499/americans-who-use-social-networking-sites-several-times-per-day/
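
A rough way to check the sparsity estimate is to compute the density directly. The sketch below assumes the node count and average number of friends quoted on the slide, and it counts each undirected friendship once; the variable names are illustrative only.

```python
# Assumed inputs, taken from the slide: node count and average friends per node.
n = 1.4e9          # nodes (users)
avg_degree = 340   # average number of friends per node

# In an undirected graph every friendship is one edge shared by two nodes.
edges = n * avg_degree / 2
# Number of unordered node pairs, i.e. the maximum possible number of edges.
possible_edges = n * (n - 1) / 2
# Fraction of possible edges that exist; algebraically avg_degree / (n - 1).
density = edges / possible_edges

print(f"edges:          {edges:.3g}")
print(f"possible edges: {possible_edges:.3g}")
print(f"density:        {density:.3g} ({density * 100:.3g} %)")
```

The density depends only on the average degree and the number of nodes, so a network with a bounded number of friends per user becomes sparser as it grows.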
Why sparse?
http://www.twolfanger.de/wp-content/uploads/2013/06/Degree-Network.png
Instagram network Werner
- 51 friends
- 13 edges [4]
Goal
Find values for the transitivity [2] and the hubness of a graph
What is the benefit of knowing the structure of a graph?
- deeper insights into the graph
- lossless compression is possible
- link prediction
- number of clusters
- graph partitioning
Basic regular substructures
- Triangles: transitivity
- Stars: hubness
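
To make the link between triangles and transitivity concrete, here is a minimal sketch that uses the common definition of transitivity as three times the number of triangles divided by the number of connected triples. The function name and the two test graphs are assumptions for illustration and are not taken from the paper.

```python
from itertools import combinations

def transitivity(edges):
    """Return 3 * triangles / connected triples for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A connected triple is a centre node together with two of its neighbours.
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # A triple is closed if its two outer nodes are adjacent as well.
    # Every triangle is counted once per corner, i.e. three times in total.
    closed = sum(1 for nb in adj.values()
                 for a, b in combinations(nb, 2) if b in adj[a])
    return closed / triples if triples else 0.0

# A pure star has no triangle; a complete graph closes every triple.
star = [("A", x) for x in "BCDEF"]
mesh = list(combinations("ABCDE", 2))
print(transitivity(star), transitivity(mesh))
```

A star-shaped graph yields the lowest possible value and a fully connected one the highest, which is why the two primitives pull a graph's transitivity in opposite directions.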
Characteristics of CXprime (Compression-based eXploiting Primitives)
- based on Minimum Description Length (MDL) [3]¹
- no input parameters (unsupervised)
- clustering is k-means-like
¹https://en.wikipedia.org/wiki/Minimum_description_length
Three different ways of coding
- edge
- hub (or star)
- mesh (or triangle)
Coding Example Hub
G={(A,B);(A,C);(A,D);(A,E);(A,F)}
G={HUB(A|B,C,D,E,F)}
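
The rewriting from an edge list to the HUB notation can be sketched with a simple greedy rule: repeatedly pick the node that touches the most uncovered edges and emit one HUB term for it. This is only an illustration of the notation on the slide, assuming a simple graph without self-loops; it is not the coding scheme of the paper, and `hub_code` is a hypothetical helper name.

```python
def hub_code(edges):
    """Greedily rewrite an undirected edge list as HUB(centre|leaves) terms."""
    remaining = {frozenset(e) for e in edges}
    terms = []
    while remaining:
        # Count how many uncovered edges touch each node.
        degree = {}
        for e in remaining:
            for node in e:
                degree[node] = degree.get(node, 0) + 1
        # Highest degree wins; ties go to the alphabetically first node.
        centre = min(degree, key=lambda n: (-degree[n], n))
        covered = {e for e in remaining if centre in e}
        leaves = sorted(next(iter(e - {centre})) for e in covered)
        terms.append(f"HUB({centre}|{','.join(leaves)})")
        remaining -= covered
    return "G={" + ";".join(terms) + "}"

star = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")]
print(hub_code(star))  # G={HUB(A|B,C,D,E,F)}
```

For a pure star a single term covers all edges, which is the case where hub coding is most compact.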
Coding Example Mesh
G={(A,B);(A,C);(A,D);(A,E);(B,C);(B,D);(B,E);(C,D);(C,E);(D,E)}
G={HUB(A|B,C,D,E);HUB(B|C,D,E);HUB(C|D,E);HUB(D|E)}
G={M(A,B,C,D,E)}
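
The three notations for this example can be compared by counting how many node symbols each one writes down. The sketch below uses that symbol count as a crude stand-in for description length; the paper's actual MDL-based cost is not reproduced here, and `is_mesh` is a hypothetical helper name.

```python
from itertools import combinations

def is_mesh(nodes, edges):
    """True if every pair of the given nodes is joined by an edge (a clique)."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(p) in edge_set for p in combinations(nodes, 2))

nodes = ["A", "B", "C", "D", "E"]
edges = list(combinations(nodes, 2))      # the 10 edges of the example

# Edge list: two node symbols per edge, e.g. (A,B);(A,C);...
edge_symbols = 2 * len(edges)
# Hub coding: one centre plus its k leaves per term, for k = 4, 3, 2, 1.
hub_symbols = sum(1 + k for k in range(len(nodes) - 1, 0, -1))
# Mesh coding: each node is named once, valid only if the nodes form a clique.
mesh_symbols = len(nodes) if is_mesh(nodes, edges) else None

print(edge_symbols, hub_symbols, mesh_symbols)  # 20 14 5
```

On a fully connected group of nodes the mesh notation is the shortest of the three, mirroring the progression shown on the slide.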
Coding Example Hub
G={(A,B);(A,C);(A,D);(A,E);(A,F)}
G={HUB(A|B,C,D,E,F)}
G={M(A,B);M(A,C);M(A,D);M(A,E);M(A,F)}
Outcomes 1
After coding the graph once with a star coding and once with a
triangle coding, you can compare the two code lengths. The
shorter coding shows which basic structure is more common
in the graph.
All possible connections of three nodes
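
The figure for this slide is not preserved in the transcript. As a hedged sketch of what "all possible connections" amounts to, the following enumerates every way to connect three labelled nodes and groups the results by shape; the shape names are chosen here for illustration.

```python
from itertools import combinations, product

nodes = ("A", "B", "C")
pairs = list(combinations(nodes, 2))          # (A,B), (A,C), (B,C)

# Each pair is either connected or not: 2**3 labelled configurations.
configs = [tuple(p for p, present in zip(pairs, mask) if present)
           for mask in product((False, True), repeat=len(pairs))]

# On three nodes, graphs with the same number of edges have the same shape.
names = {0: "empty", 1: "single edge", 2: "open triple (star)", 3: "triangle"}
by_shape = {}
for cfg in configs:
    by_shape.setdefault(names[len(cfg)], []).append(cfg)

for shape, members in by_shape.items():
    print(f"{shape}: {len(members)}")
```

The open triple and the triangle are the two cases that matter for the star and triangle primitives: the first is a minimal hub, the second a minimal mesh.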
Outcomes 2
If you always choose the shortest of the three
possible codings, you get an overall minimal coding
of the graph. The graph is thereby clustered into areas of
hubs and areas of triangles.
Criticism
- No example of what the coding
actually looks like
- The given probabilities are not
reproducible
Summary
The results reported in the paper are very good. The compression rate is much higher than that of the other graph compression algorithms it is compared with, and the clustering results look convincing.
Thanks for your attention
[1] J. Feng, X. He, N. Hubig, C. Böhm and C. Plant, “Compression-based graph mining exploiting structure primitives,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 181–190, IEEE, 2013.
[2] T. Schank and D. Wagner, “Approximating clustering coefficient and transitivity,” J. Graph Algorithms Appl., vol. 9, no. 2, pp. 265–275, 2005.
[3] J. Rissanen, “An introduction to the mdl principle,” Helsinki Institute for Information Technology, Tech. Rep., 2005.
[4] Python, Pyplot, Instagram API