Gossip based partitioning and replication for Online Social Networks

Gossip-based Partitioning and Replication Middle-ware for Online Social Networks Muhammad Anis Uddin Nasir (EMDC/ICT/LCN) Supervisor: Šarūnas Girdzijauskas Examiner: Johan Montelius

Transcript of Gossip based partitioning and replication for Online Social Networks

Page 1: Gossip based partitioning and replication for Online Social Networks

Gossip-based Partitioning and Replication Middle-ware for Online Social Networks

Muhammad Anis Uddin Nasir (EMDC/ICT/LCN)

Supervisor: Šarūnas Girdzijauskas
Examiner: Johan Montelius

Page 2: Gossip based partitioning and replication for Online Social Networks

Online Social Networks

04/18/2023 Muhammad Anis Uddin Nasir- Gossip-based Partitioning and Replication Middle-ware

• Vertices

• Edges

• Metadata

[Figure: example social graph whose vertices are the users Ioanna, Antonio, Vaidas, Aras, Vasia, Anis, Mudit, Manos, Leandro and Johan]

Page 3: Gossip based partitioning and replication for Online Social Networks

Existing Solutions

• Relational databases: MySQL Cluster

• Key-value stores: Cassandra, Amazon Dynamo

• Document databases: MongoDB, CouchDB

• Graph databases: Neo4j, Titan


Page 4: Gossip based partitioning and replication for Online Social Networks

Why Are Existing Solutions Not Enough?

[Figure: example social graph with ten users, numbered 1 to 10]

Page 5: Gossip based partitioning and replication for Online Social Networks

Why Are Existing Solutions Not Enough?

• Random partitioning

• Social requests
  - E.g., gather new feeds from all the friends

• Enforcing data locality

• Random partitioning can lead to full replication!
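
The full-replication effect can be checked with a small experiment. The sketch below is only an illustration under stated assumptions, not the code behind the slides: it builds a hypothetical random graph, places each user's master on a server chosen uniformly at random, and counts the replicas needed so that all friends of a user are present on that user's server. The function names, the graph size and the seed are made up for the example.

```python
import random

def random_graph(n, m, rng):
    """Hypothetical test graph: n users and m distinct undirected friendships."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges

def replicas_needed(edges, master):
    """Count (server, user) replica pairs required for data locality.

    A replica of v is needed on server s when v's master lives elsewhere
    but at least one friend of v has its master on s.
    """
    replicas = set()
    for u, v in edges:
        if master[u] != master[v]:
            replicas.add((master[u], v))  # v must be readable next to u
            replicas.add((master[v], u))  # u must be readable next to v
    return len(replicas)

rng = random.Random(7)
n, m, servers = 200, 2000, 2
edges = random_graph(n, m, rng)
# Random partitioning: every master goes to a uniformly chosen server.
master = {u: rng.randrange(servers) for u in range(n)}
needed = replicas_needed(edges, master)
full = n * (servers - 1)  # every user replicated on every other server
print(f"{needed} of {full} possible replicas")
```

With an average of 20 friends per user, almost every user has at least one friend on the other server, so the count lands at or very near full replication.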

[Figure: the ten-user graph partitioned across two servers, with masters 1 to 10 and replicas 1’ to 10’, i.e. every node is replicated]

Page 6: Gossip based partitioning and replication for Online Social Networks

Social Graphs are not Random

Graphs with small-world properties

Page 7: Gossip based partitioning and replication for Online Social Networks

Graph Partitioning


Page 8: Gossip based partitioning and replication for Online Social Networks

JA-BE-JA: Edge-Cut

[Figure: nodes 1 to 7 placed on Server A and Server B, with replicas 6’, 3’, 1’, 4’ and 7’]

• Edge cut = 3 links; 3 + 2 = 5 replicas to maintain

Page 9: Gossip based partitioning and replication for Online Social Networks

SPAR: Minimizing Replicas

[Figure: nodes 1 to 7 placed on Server A and Server B, with replicas 6’, 3’, 2’ and 5’]

• Edge cut = 4 links; 2 + 2 = 4 replicas to maintain
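
The two slides above contrast the edge-cut objective with the replica-count objective. The following sketch shows how the two numbers can diverge. The edge list is hypothetical (the edges of the slide figure are not recoverable from the transcript); it was chosen so that one placement gives 3 cut links with 5 replicas and another gives 4 cut links with 4 replicas, matching the counts quoted on the slides. The names `cut_oriented` and `replica_oriented` are illustrative only.

```python
def edge_cut(edges, master):
    """Number of links whose endpoints have masters on different servers."""
    return sum(1 for u, v in edges if master[u] != master[v])

def replica_count(edges, master):
    """Number of (server, node) replicas needed so all neighbours are local."""
    replicas = set()
    for u, v in edges:
        if master[u] != master[v]:
            replicas.add((master[u], v))
            replicas.add((master[v], u))
    return len(replicas)

def assign(server_a, server_b):
    """Build a node -> server map from two sets of node ids."""
    master = {u: "A" for u in server_a}
    master.update({u: "B" for u in server_b})
    return master

# Hypothetical 7-node graph, not the one drawn on the slides.
EDGES = [(1, 3), (1, 4), (2, 3), (2, 4), (5, 6), (3, 6), (1, 7)]

cut_oriented = assign({1, 5, 7}, {2, 3, 4, 6})
replica_oriented = assign({1, 2, 7}, {3, 4, 5, 6})

for name, master in [("cut-oriented", cut_oriented),
                     ("replica-oriented", replica_oriented)]:
    print(name, "edge cut =", edge_cut(EDGES, master),
          "replicas =", replica_count(EDGES, master))
```

In the second placement the four cut links all run between the same two pairs of nodes, so they share replicas; this is why a larger edge cut can still need fewer replicas.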

Page 10: Gossip based partitioning and replication for Online Social Networks

Initialization

[Figure: the ten-user graph partitioned across two servers, with masters 1 to 10 and replicas 1’ to 10’]

• Node addition
  - Assign the node to the server with the fewest masters

• Edge addition
  - Check whether both nodes are local
  - Otherwise, create replicas to maintain locality
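
The two initialization rules can be sketched as follows. This is a minimal in-memory model written for illustration; the class name `Placement` and its fields are assumptions, not part of the middle-ware described in the slides.

```python
class Placement:
    """Hypothetical model of masters and replicas per server."""

    def __init__(self, num_servers):
        self.masters = [set() for _ in range(num_servers)]
        self.replicas = [set() for _ in range(num_servers)]
        self.home = {}  # node -> index of the server holding its master

    def add_node(self, node):
        # Node addition: pick the server that currently holds the fewest masters
        # (ties go to the lowest server index).
        server = min(range(len(self.masters)),
                     key=lambda s: len(self.masters[s]))
        self.masters[server].add(node)
        self.home[node] = server
        return server

    def add_edge(self, u, v):
        # Edge addition: nothing to do when both masters are co-located.
        if self.home[u] == self.home[v]:
            return
        # Otherwise each endpoint gets a replica next to the other's master.
        self.replicas[self.home[u]].add(v)
        self.replicas[self.home[v]].add(u)

p = Placement(num_servers=2)
for node in range(1, 5):
    p.add_node(node)
p.add_edge(1, 2)  # masters on different servers: two replicas are created
p.add_edge(1, 3)  # masters on the same server: no replica needed
print(p.home, p.replicas)
```

Because a set is used per server, adding a second edge towards an already replicated node does not create a duplicate replica.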

Page 11: Gossip based partitioning and replication for Online Social Networks

Gossip Phase

• Cost function
  - Count the number of replicas
  - For the current and the new server

• Peer selection: local, random, hybrid
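
A cost function of this kind can be sketched as below, assuming the cost of a move is the change in replica count on the node's current server and on the candidate server. Only those two servers need to be recounted, because the replicas held by any third server depend solely on the masters it hosts. The helper names `replicas_on` and `move_gain` are hypothetical.

```python
def replicas_on(server, adj, master):
    """Nodes mastered elsewhere that need a copy on `server` for locality."""
    return {
        v
        for u, friends in adj.items() if master[u] == server
        for v in friends if master[v] != server
    }

def move_gain(node, target, adj, master):
    """Replicas saved on the current and target servers if `node` migrates."""
    current = master[node]
    before = (len(replicas_on(current, adj, master))
              + len(replicas_on(target, adj, master)))
    moved = dict(master)          # evaluate the move on a copy
    moved[node] = target
    after = (len(replicas_on(current, adj, moved))
             + len(replicas_on(target, adj, moved)))
    return before - after         # positive means fewer replicas

# Hypothetical example: a triangle 1-2-3 split across servers, plus a pair 4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
master = {1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
print(move_gain(1, 1, adj, master))  # reuniting the triangle
print(move_gain(4, 1, adj, master))  # splitting the pair
```

A positive gain means the move removes replicas; a negative gain means it would create new ones.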

[Figure: the ten-user graph on two servers during the gossip phase; replicas 1’, 4’, 7’, 8’, 9’, 5’ and 10’ are shown in one group and 2’, 3’ and 6’ in another]

Page 12: Gossip based partitioning and replication for Online Social Networks

Gossip Phase

• Cost function
  - Count the number of replicas
  - For the existing and the new server

• Peer selection: local, random, hybrid

• Simulated annealing
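
The slides name the peer-selection policies and simulated annealing but do not spell out their rules, so the sketch below fills them in with common choices and should be read as an assumption throughout. Peer selection follows the JA-BE-JA convention (local = neighbours, random = a uniform sample, hybrid = neighbours first, then the sample). The acceptance rule is the textbook Metropolis criterion with geometric cooling, which may differ from the rule used in the thesis. All function names and default parameters are hypothetical.

```python
import math
import random

def candidate_peers(node, adj, all_nodes, policy, rng, sample_size=3):
    """Return gossip partners to try, in order, under the given policy."""
    friends = sorted(adj.get(node, ()))
    others = [n for n in all_nodes if n != node]
    sample = rng.sample(others, min(sample_size, len(others)))
    if policy == "local":
        return friends
    if policy == "random":
        return sample
    if policy == "hybrid":
        # Neighbours first, then random nodes not already listed.
        return friends + [n for n in sample if n not in friends]
    raise ValueError(f"unknown policy: {policy}")

def accept(gain, temperature, rng):
    """Metropolis rule: always take improvements, sometimes take setbacks."""
    if gain >= 0:
        return True
    return rng.random() < math.exp(gain / temperature)

def cool(temperature, rate=0.95, floor=1e-3):
    """Geometric cooling; the floor keeps the division in accept() safe."""
    return max(floor, temperature * rate)

rng = random.Random(1)
adj = {1: {2, 3}, 2: {1}, 3: {1}, 4: set(), 5: set()}
print(candidate_peers(1, adj, list(adj), "hybrid", rng))
print(accept(-1, temperature=2.0, rng=rng), cool(2.0))
```

While the temperature is high, moves that add replicas are accepted fairly often, which lets the search escape local optima; as it cools, the rule approaches a purely greedy one.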

[Figure: the same graph after nodes 1 and 6 exchange servers; the remaining replicas are 4’, 8’, 9’, 3’, 5’, 10’, 6’ and 1’]

Page 13: Gossip based partitioning and replication for Online Social Networks

Simulated Annealing


Page 14: Gossip based partitioning and replication for Online Social Networks

Algorithms

Algorithms compared: Random, SPAR, JA-BE-JA, Gossip-based

Criteria: data locality, decentralization, load balancing, fault tolerance, avoiding local optima, availability

[Table: one row per criterion and one column per algorithm; the per-cell marks are not recoverable]

Page 15: Gossip based partitioning and replication for Online Social Networks

Datasets

Dataset          Vertices      Edges
Synth-C             2,000     20,000
Synth-HC            2,000     20,000
Synth-PL            2,000     20,000
SNAP-Facebook       4,039     88,234
WSON-Facebook      60,290  1,545,686
SNAP-Twitter       81,306  1,768,149

Page 16: Gossip based partitioning and replication for Online Social Networks

Evaluation: Datasets

• Number of servers = 16, replication factor = 2

[Figure: bar chart of replication overhead (y-axis, 0 to 12) for Random, SPAR, JA-BE-JA and Gossip-based on Synth-C, Synth-HC, Synth-PL, SNAP-Facebook, WSON-Facebook and SNAP-Twitter]

>3x gain compared to Random Partitioning

≈2x gain compared to SPAR

Page 17: Gossip based partitioning and replication for Online Social Networks

Evaluation: Replication Factor

• Number of servers = 16

[Figure: bar chart of replication overhead (y-axis, 0 to 10) for f=0 and f=2 on Synth-LC, Synth-LHC, Synth-PL, Synth-C, Synth-HC, SNAP-Facebook, WSON-Facebook and SNAP-Twitter]

Random graphs generate the maximum replication overhead

Real graphs generate the minimum replication overhead

Data locality is achieved by the fault-tolerance replicas
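
The slides do not define how replication overhead is computed with a replication factor f. One plausible reading, borrowed from the way SPAR counts replicas and labeled here as an assumption, is that replicas created for locality also count toward the fault-tolerance requirement, and extra replicas are added only when a node has fewer than f. The function name and the example graph are hypothetical, and the sketch assumes there are enough servers to host f replicas.

```python
def replication_overhead(adj, master, f):
    """Average replicas per node when each node needs at least f replicas."""
    total = 0
    for node, friends in adj.items():
        # Servers that need a copy of `node` so its friends can read it locally.
        locality = {master[v] for v in friends} - {master[node]}
        # Locality replicas double as fault-tolerance replicas; top up to f.
        total += max(len(locality), f)
    return total / len(adj)

# Hypothetical 4-node graph spread over three servers.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
master = {1: 0, 2: 1, 3: 2, 4: 0}
print(replication_overhead(adj, master, f=0))
print(replication_overhead(adj, master, f=2))
```

Under this reading, a node that already has two locality replicas costs nothing extra at f=2, which is one way to interpret the statement that data locality is achieved by the fault-tolerance replicas.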

Page 18: Gossip based partitioning and replication for Online Social Networks

Evaluation: Number of Servers

• Replication factor = 2

[Figure: replication overhead (y-axis, 0 to 20) against number of servers (8, 16, 32, 64) on WSON-Facebook; one panel compares Random, SPAR, JA-BE-JA and Gossip-based, and a second panel shows Gossip-based alone]

Gossip-based generates the minimum replication overhead

Replication overhead increases non-linearly

>4x gain compared to Random Partitioning

Page 19: Gossip based partitioning and replication for Online Social Networks

Evaluation: Dynamicity

• Number of servers = 16, replication factor = 2

[Figure: two panels, SNAP-Twitter and SNAP-Facebook, plotting replication overhead (y-axis, 0.2 to 0.45) against number of cycles (x-axis, 1 to roughly 1,600)]

Spikes show bulk edge addition

Algorithm stabilization

Transition state, i.e., reducing the number of replicas after new edge additions

Page 20: Gossip based partitioning and replication for Online Social Networks

Conclusion

• Random partitioning does not provide an efficient solution for Online Social Networks

• Minimizing replicas can help achieve better partitioning

• A gossip-based heuristic was proposed to solve the minimization problem while aiming for the global optimum

• The algorithm handles different datasets and adapts to the dynamic nature of OSNs

Page 21: Gossip based partitioning and replication for Online Social Networks

Gossip-based Partitioning and Replication Middle-ware for Online Social Networks

Muhammad Anis Uddin Nasir (EMDC/ICT/LCN)

Supervisor: Šarūnas Girdzijauskas
Examiner: Johan Montelius

Page 22: Gossip based partitioning and replication for Online Social Networks

Future Work

• Execution of the algorithm on large datasets using parallel graph processing frameworks such as GraphLab and Apache Giraph

• Load balancing using both masters and replicas, and providing different consistency levels

• Smart replication to provide data locality for highly interactive nodes

• Implementing different consistency strategies based on access patterns