Shengqi Yang, Xifeng Yan, Bo Zong and Arijit Khan Computer...

Shengqi Yang, Xifeng Yan, Bo Zong and Arijit Khan

Computer Science, UC Santa Barbara {sqyang, xyan, bzong, arijitkhan}@cs.ucsb.edu

5/25/2012 1

ICB UCSB

5/25/2012j 2

Web graph Social network

Biological graph Linked data: RDF graphs

Google: > 1 trillion indexed pages

Facebook: >800 million active users

De Bruijn: billions of vertices

20 billion RDF tuples

Memory-resident solution Running on single server.

Difficult/Impossible to accommodate the content of an extremely large graph.

Low concurrency.

Simple distributed solution (e.g., MapReduce) Running on commodity cluster.

High concurrency and enough memory space.

Chained MapReduce

• Communication and serialization overhead.

• Programming complexity

5/25/2012 3 UCSB

Distribution model: graph partitioning. Computation model: run on each partition/vertex

simultaneously. Communication model: message passing

5/25/2012 4 UCSB

Pros Designed for iterative jobs

▪ Graph algorithms: shortest distance, PageRank, etc.

▪ DM/ML tasks: K-Means, belief propagation, Parallel Gibbs sampling, etc.

Scalable, robust, simple.

Cons Graph partitioning: one partition can not fit all !

Query on graph • Unbalanced workload

• Inter-machine communication

5/25/2012 5 UCSB

Unbalanced workload

5/25/2012 6

Inter-machine communication

Graph query pattern

Graph query processing

Solving the problems facing graph partitioning (Pregel)

• Workload balancing (replication)

• Communication reduction

Graph partition management strategy

• Evolving query workload.

Sedge: a Self Evolving Distributed Graph Processing

Environment

5/25/2012 7 UCSB

Partitioning techniques

Complementary partitioning

On demand partitioning

Two-level partition management

System architecture

Experiments

Conclusions

5/25/2012 8 UCSB

Complementary partitioning : repartition the graph with region constraint.

These two sets of partitions will run independently.

5/25/2012 9 UCSB

… …

Iteratively repartition the graph Pros

Effective communication reduction Workload balancing

Cons Space limitation Can not adapt to dynamic workload

5/25/2012 10 UCSB

C1 C2 C3

5/25/2012 11

Query profiling Blocking: use blocks to track cross-partition queries. Advantages: •Query generalization. •Profiling with fewer features.

Blocks Goal: coarsen a graph Method: disjoint 1-hop region

C1 C2 C3

Envelope: a set of blocks that covers a cross partition query.

Envelope Collection: put the maximized number of envelopes into a new partition wrt. space constraint.

5/25/2012 12 UCSB

Intention: combine similar envelopes sharing many common blocks.

Algorithm:

1) Similarity search (nearest neighbor search).

Locality Sensitive Hashing (LSH): Min-Hash, in O(n)

2) Envelope combining

1. Cluster the envelopes in the same bucket produced by Min-Hash.

2. Combine the clusters with highest benefit.

5/25/2012 13 UCSB

Two-level partition architecture Primary partitions. e.g., A, B, C and D. They are inter-

connected in two-way

Secondary partitions. e.g., B’ and E. They are connected with primary partitions in one-way

5/25/2012 14 UCSB

5/25/2012 UCSB 15

SP2Bench (Schmidt et. al., ICDE ‘09) Employ the DBLP library as its simulation

basis.

100M tuples (11.24GB): 6 partitions.

Query templates:

• 5 of 12: Q2, Q4, Q6, Q7, Q8.

• E.g. Q4: Given a journal (J), select all distinct pairs of article author names for authors (?A) that have published papers (?P) in the journal.

Cluster environment 31 nodes (connected by a gigabit Ethernet).

• 4 GB RAM and quad-core 2:60GHz Xeon Processors

• 1 master, 30 workers.

5/25/2012 16

5/25/2012 UCSB 17

1 part. set 5 comp. part. set

Query workload: 104 queries with random start

Log scale !

5/25/2012 UCSB 18

5 Static replication

5 Comp. part.

4 Comp. part. + On demand part.

Datasets UK web graph: 30M vertices, 956M edges.

Twitter: 15M users, 57M edges.

Bio graph: 50M vertices, 51M edges.

Synthetic graph: 0.5B vertices, 2 Billion edges (by R-MAT,

Chakrabarti et. al., SDM’04).

Query workload Mixture of general graph queries

• neighbor search (2~3 hops).

• random walk (5~10 steps).

• random walk with restart (5~10 steps, 10% restart).

5/25/2012 19 UCSB

5/25/2012 UCSB 20

Twitter

Conclusions Partitioning techniques

• Complementary partitioning

• On demand partitioning

Two-level partition management Project home:

http://grafia.cs.ucsb.edu/sedge Documents and sample datasets

Source code

5/25/2012 21 UCSB

5/25/2012 22 UCSB

Shengqi Yang, Xifeng Yan, Bo Zong and Arijit Khan Computer...

Documents

Transcript of Shengqi Yang, Xifeng Yan, Bo Zong and Arijit Khan Computer...

“Good Business Practices for Improved Market …...Sedge Mat Business Practice Our Background • Khmer Sedge Designs Co.Ltd is a jointed Venture company between member of Cambodia

Existing Parking Lot (1.50 ac) - Banning Ranch Conservancy€¦ · Sagebrush Dwarf Coyote Brush Dwarf Coyote Brush Berkeley Sedge Dune Sedge Foothill Sedge Natal Plum Coast Sun ...

BLACK FENSOIL, SCREENED SEDGE PEAT TOP DRESSINGS …

2019–2020 Florida Citrus Production Guide: WeedsWeeds in Citrus, HS-955/HS175, Identification of Grass Weeds in Florida Citrus, HS-962/HS205, Identification of Sedge and Sedge-Like

Towards Effective Partition Management for Large Graphs › ~tozsu › courses › CS848 › W15 › presentatio… · Towards Effective Partition Management for Large Graphs Shengqi

AS Trembling aspen - Slough sedge 00 - a100.gov.bc.caa100.gov.bc.ca/appsdata/acat/documents/r17734/Appendix5bTEM... · AS Trembling aspen - Slough sedge CDFmm 00 ... Rocky mountain

SEDGE-28-2016 - DEF

Kobresia simpliciuscula (Wahlenberg) Mackenzie (simple bog sedge

Great Bulrush Schoenoplectus tabernaemontani Cyperaceaeâ€”Sedge family

Efficient By Designccag-eh.ucanr.edu/files/241427.pdf · Carex pansa •Clustered field sedge Carex praegracilis •Santa Barbara Sedge Carex barbarae •San Diego Sedge Carex spissa

Scots pine litter decomposition along drainage succession ... · oligotrophic tall sedge fen (VSN sensu Laine and Vasander, 1906) and an oligotrophic tall sedge pine fen (VSR) were

· 2013-05-24 · Cindy & Robert Sedge 736 Talbot Street Urban Design Brief 1 . Cindy & Robert Sedge . Affordable Housing Apartments . 736 Talbot Street, London . Urban Design Brief

ARTICLE IN PRESS - epic.awi.de · northern tree line was located near the Oyogos Yar region during the Eemian thermal optimum. Grass-sedge Grass-sedge dominated tundra vegetation

Turfgrass Science - EDIS · Sedge Biology and Management in Turf1 Darcy E. P. Telenko, Ramon Leon, J. Bryan Unruh, and Barry J. Brecke2 Turfgrass Science Members of the sedge family

St. John River - Seven Islands and White Pond Fen Sundew Livid Sedge Low Spike-moss Marsh Valerian Mistassini Primrose Moor Rush Northern Bog Sedge Northern Comandra Showy Lady’s-slipper

Advanced Partitioning Techniques for Massively Distributed Computationjrzhou/pub/Partitioning-SIGMOD12.pdf · Advanced Partitioning Techniques for Massively Distributed Computation

RED OAK RAIN GARDEN...RED OAK RAIN GARDEN PLANT COLLECTION FALL 2019 GROUNDCOVERS AND FORBS: CELL 2 EMORY’S SEDGE Carex emoryi PENNSYLVANIA SEDGE Carex pensylvanica WILD GINGER Asarum

New RED OAK RAIN GARDEN · 2020. 8. 20. · RED OAK RAIN GARDEN. PLANT COLLECTION SUMMER 2020. GROUNDCOVERS AND FORBS: CELL 2. EMORY’S SEDGE Carex emoryi PENNSYLVANIA SEDGE Carex

Strzelecki Ranges bioregion - Environment...MTG Lepidosperma gladiatum Coast Sword-sedge MTG Isolepis inundata Swamp Club-sedge MNG Ficinia nodosa Knobby Club-sedge GF Pteridium SC

Ontology-based Subgraph Querying Yinghui Wu Shengqi Yang Xifeng Yan University of California, Santa Barbara.