Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
-
Upload
sotiris-beis -
Category
Technology
-
view
137 -
download
2
description
Transcript of Benchmarking graph databases on the problem of community detection
![Page 1: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/1.jpg)
ADBIS 2014 Conference
Ohrid, Republic of Macedonia, September 7-10, 2014
Benchmarking graph databases on the problemof community detectionSotirios Beis, Symeon Papadopoulos, Yiannis Kompatsiaris
Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI)
![Page 2: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/2.jpg)
Important Note
• This is an update of the actual presentation given in September 2014. After the presentation, we received comments about the implementation of our benchmark, which led us to update the implementation and rerun the experiments. We would like to thank @lgarulli and @OrientDB for their contribution.
• The updated paper is now available on:http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf
while the original is still available on the Springer site:http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1
#2
![Page 3: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/3.jpg)
#3
Overview
• Problem formulation
• Related work
• Workload description
• Evaluation
• Conclusions
![Page 4: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/4.jpg)
Motivation
• The rapid growth of Online Social Networks contributes to the creation of high-volume and velocity graph data
– Twitter had 145M users in 2010 and 300M in 2011
• The need for massive graph data management and processing systems is constantly increasing
– Many methods and technologies available (RDBMS, OODBMS and graph databases)
• Benchmarks to evaluate candidate solutions are absolutely essential!
– We had to choose one in the context of the SocialSensorresearch project
#4
http://dstevenwhite.com/2011/12/29/social-media-growth-2006-2011/
![Page 5: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/5.jpg)
Problem Formulation
• Given a set of database management systems assess the performance of each system with respect to a common mining task: community detection
• Many DBMS available to use
– Graph databases selected, as they are designed to store and manage efficiently big graph data
– Benchmarked systems: Titan, OrientDB and Neo4j
• Many operations used in databases to benchmark
– Workloads that simulate common operations in OSNs employed
#5
![Page 6: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/6.jpg)
Related Work (1)
• Evaluation between different DBMS
– Giatsoglou et al. focused on the Social Tagging System use case scenario
– Angles et al. and Armstrong et al. proposed a synthetic graph generator with OSN characteristics and use this data for evaluation
– Grossniklaus et al. used a workloads that cover a wide variety of graph data use case
– Vicknair et al. utilized query workload that simulates typical operations performed in provenance system
#6
![Page 7: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/7.jpg)
Related Work (2)
• Evaluation between graph DBMS
– Bader et al. proposed a benchmark with four operations: insert, h-hops node and edge selection, while Dominguez et al. reports the implementation.
– A similar workload is proposed by Jouili et al. emphasizing the effects of increasing multiple concurrent users.
– Ciglan et al. proposed a benchmark focusing on graph traversal operations.
– Dayarathna et al. used similar workload, focusing on graph databases server and cloud environments.
#7
![Page 8: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/8.jpg)
Workloads Description (1)
Our Contribution
• Clustering Workload (CW)
– Very important due to its numerous applications, such as topic detection, photo clustering and event detection.
– Until now most community detection algorithms used main memory to store the graph and perform the required operations fast execution time, unable to manage big graphs.
– We propose an implementation of Louvain Method on top of graph databases employing cache techniques take advantage of both graph database capabilities and in-memory execution speed.
#8
![Page 9: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/9.jpg)
Workload Description (2)
Supplementary Workloads
• Massive Insertion Workload (MIW)– Populates a graph with bulk load operations.
– Simulates batch creation of a graph.
• Single Insertion Workload (SIW)– Every object insertion is committed directly.
– Simulates the growth of a OSN.
• Query Workload (QW)– FindNeighbors query (FN) Finds the neighbors of all nodes
• simulates the friends retrieval a Facebook user
– FindAdjacentNodes query (FA) Finds the adjacent nodes of all edges
• used to find whether two user joined the same Facebook group
– FindShortestPath (FS) Finds the shortest paths between 100 nodes
• used to find the level two LinkedIn users are connected
#9
![Page 10: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/10.jpg)
Experimental Study
• Datasets
– Synthetic for CW, generated with LFR generator.
– Real for MIW, SIW & QW, from SNAP dataset collection.
• Evaluation Measure: execution time (second)
#10
![Page 11: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/11.jpg)
Results: Comparison wrt CW
#11
![Page 12: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/12.jpg)
Results: Cache size impact
#12
Titan
Neo4j
OrientDB
![Page 13: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/13.jpg)
Results: comparison wrt MIW and QW
#13
![Page 14: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/14.jpg)
Results: Comparison wrt SIW
#14
YouTube LiveJournal
![Page 15: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/15.jpg)
#15
Conclusions
SUMMARY• OrientDB is the most efficient solution for CW, while Titan is
the winner in SIW
• Neo4j is the fastest graph database for MIW and QW
• Titan and OrientDB have competitive performance in MIW and QW respectively don't scale as good as Neo4j
FUTURE WORK• Test with bigger graphs.
• Run the benchmark employing the distributed version of Titan and OrientDB and other graph databases (Sparksee already added).
• Improve performance of the implemented community detection method.
![Page 16: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/16.jpg)
References (1)
Evaluation between different DBMSGiatsoglou, M., Papadopoulos, S., Vakali, A.: Massive graph management for the web
and web 2.0. In Vakali, A., Jain, L., eds.: New Directions in Web Data Management 1. Volume 331 of Studies in Computational Intelligence. Springer Berlin Heidelberg (2011) 19-58
Angles, R., Prat-Perez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: Benchmarking database systems for social network applications. In: First International Workshop on Graph Data Management Experiences and Systems. GRADES '13, New York, NY, USA, ACM (2013) 15:1-15:7
Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database benchmark based on the facebook social graph. (2013)
Grossniklaus, M., Leone, S., Zaschke, T.: Towards a benchmark for graph data management and processing. (2013)
Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a graph database and a relational database: A data provenance perspective. In: Proceedings of the 48th Annual Southeast Regional Conference. ACM SE '10, New York, NY, USA, ACM (2010) 42:1-42:6
#16
![Page 17: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/17.jpg)
References (2)
Evaluation between graph DBMSBader, D.A., Feo, J., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann,
B., Meuse, T., Robinson, E.: Hpc scalable graph analysis benchmark (2009)
Dominguez-Sal, D., Urbn-Bayes, P., Gimnez-Va, A., Gmez-Villamor, S., Martnez-Bazn, N., Larriba-Pey, J.: Survey of graph database performance on the hpc scalable graph analysis benchmark. In Shen, H., Pei, J., zsu, M., Zou, L., Lu, J., Ling, T.W., Yu, G., Zhuang, Y., Shao, J., eds.:Web-Age Information Management. Volume 6185 of Lecture Notes in Computer Science. Springer Berlin Heidelberg (2010) 37-48
Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph databases. In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th
International Conference on. (April 2012) 186-189
Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In: Social Computing (SocialCom), 2013 International Conference on. (Sept 2013) 708-715
Dayarathna, M., Suzumura, T.: Xgdbench: A benchmarking platform for graph stores in exascale clouds. In: Cloud Computing Technology and Science (Cloud-Com), 2012 IEEE 4th International Conference on. (Dec 2012) 363-370
#17
![Page 18: Benchmarking graph databases on the problem of community detection](https://reader034.fdocuments.us/reader034/viewer/2022042614/559445811a28ab13738b4596/html5/thumbnails/18.jpg)
#18
Questions
Further contact:
[email protected] / [email protected]
Acknowledgement:
www.socialsensor.eu
Source: https://github.com/socialsensor/graphdb-benchmarks