Distributed Load Balancing for Key-Value Storage Systems

Imranul Hoque, Michael Spreitzer, Malgorzata Steinder

2

3

Key-Value Storage Systems

• Usage:
  – Session state, tags, comments, etc.

• Requirements:
  – Scalability
  – Fast response time
  – High availability & fault tolerance
  – Relaxed consistency guarantee

• Examples: Cassandra, Dynamo, PNUTS, etc.

4

Load Balancing in K-V Storage

• Hash partitioned vs. range partitioned
  – Range-partitioned data ensures efficient range scan/search
  – Hash-partitioned data helps even distribution

[Figure: a table with keys MON through SUN is split into tablets, which are spread across Servers 1 to 4 (SAT, TUE; SUN, MON; WED, THU; FRI).]
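The trade-off between the two partitioning schemes can be illustrated with a minimal Python sketch. The split keys, the tablet count, and all function names here are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import bisect
import hashlib

# Hypothetical split keys: tablet i holds keys below SPLITS[i];
# the last tablet holds everything else.
SPLITS = ["G", "N", "T"]
NUM_TABLETS = len(SPLITS) + 1

def range_tablet(key):
    """Range partitioning: keys that sort together are stored together."""
    return bisect.bisect_right(SPLITS, key)

def hash_tablet(key):
    """Hash partitioning: a stable hash scatters keys regardless of order."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_TABLETS

def tablets_for_scan(lo, hi):
    """Tablets that a scan over [lo, hi] touches under range partitioning."""
    return list(range(range_tablet(lo), range_tablet(hi) + 1))

days = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
print({d: range_tablet(d) for d in days})  # TUE, WED, THU share one tablet
print({d: hash_tablet(d) for d in days})
print(tablets_for_scan("MON", "SAT"))
```

With these split keys the range scheme answers a scan from a contiguous run of tablets but places three of the seven keys in the last tablet, which is the kind of skew that motivates load balancing. Under the hash scheme a range scan would have to consult every tablet.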

5

Issues with Load Balancing

• Uneven space distribution due to range partitioning
  – Solution: partition the tablets and move them around
• A small number of very popular records

[Figure: tablets placed on Servers 1 to 4 (SAT, TUE; SUN, MON; WED, THU; FRI).]

6

Contribution

• Algorithms for solving the load balancing problem
  – Load = space, bandwidth
  – Evenly distribute the spare capacity
  – Distributed algorithm, not a centralized one
  – Reduce the number of moves

• Previous solutions:
  – One-dimensional/key-space redistribution/bulk loading

7

Outline

• Motivation
• System modeling and assumptions
• Algorithms
  – One-to-one
  – One-to-n
  – Move suppression
• Design decisions
• Experimental results
  – Emulation of proposed distributed algorithms
• Future work

8

System Modeling and Assumptions

[Figure: a table is split into tablets. Each tablet i carries a load (Bi, Si), shown for (B1, S1) through (B5, S5), and Servers A, B, and C carry the loads (BA, SA), (BB, SB), and (BC, SC).]

Assumptions:
1. [...] <= 0.01 in both dimensions
2. # of tablets >> # of nodes

9

System State

[Figure: servers plotted as points in the (S, B) plane, with a target zone around the target point.]

Target zone: helps achieve convergence

Goal: Move tablets around so that every server is within the target zone
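The goal condition can be written down as a small check. This is a sketch under assumptions that the slides do not state explicitly: the target point is taken to be the mean load over all servers, the target zone is a square of a given half-length around it, and loads are (B, S) pairs. The function names are hypothetical.

```python
def target_point(servers):
    """Assumed target: the mean (B, S) load over all servers."""
    n = len(servers)
    return (sum(b for b, _ in servers.values()) / n,
            sum(s for _, s in servers.values()) / n)

def in_target_zone(load, target, half_length):
    """True if the load is within half_length of the target in both dimensions."""
    return (abs(load[0] - target[0]) <= half_length
            and abs(load[1] - target[1]) <= half_length)

def balanced(servers, half_length):
    """The stated goal: every server lies inside the target zone."""
    target = target_point(servers)
    return all(in_target_zone(load, target, half_length)
               for load in servers.values())

servers = {"A": (0.50, 0.50), "B": (0.52, 0.49)}
print(balanced(servers, half_length=0.02))
print(balanced(servers, half_length=0.005))
```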

10

Load Balancing Algorithms

• Phase 1:
  – Global averaging scheme
  – Variance of the approximation of the average decreases exponentially fast
• Phase 2:
  – One-to-one gossip
  – One-to-n gossip
  – Move suppression

[Figure: timeline along t in which Phase 1 and Phase 2 alternate.]
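Phase 1 can be pictured as a pairwise averaging gossip. The sketch below is not the authors' implementation; it assumes the common scheme in which each node repeatedly picks a random peer and both replace their estimates with the pair's average. The names `gossip_average` and `spread` are made up for this example.

```python
import random

def gossip_average(loads, rounds, seed=0):
    """Each round, every node averages its estimate with one random peer."""
    rng = random.Random(seed)
    est = [tuple(p) for p in loads]
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip self so that j != i
            avg = ((est[i][0] + est[j][0]) / 2, (est[i][1] + est[j][1]) / 2)
            est[i] = est[j] = avg
    return est

def spread(points):
    """Mean squared distance of the points from their mean."""
    n = len(points)
    mb = sum(p[0] for p in points) / n
    ms = sum(p[1] for p in points) / n
    return sum((p[0] - mb) ** 2 + (p[1] - ms) ** 2 for p in points) / n

loads = [(i / 10, 1 - i / 10) for i in range(10)]
estimates = gossip_average(loads, rounds=20)
print(spread(loads), spread(estimates))
```

Each exchange keeps the sum of the estimates unchanged and never increases their spread, so every node's estimate approaches the global average, which Phase 2 can then use as the target point.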

11

One-to-One Gossip

• Point selection strategy
  – Midpoint strategy
  – Greedy strategy

• Tablet transfer strategy
  – Move to the selected point with minimum cost (space transferred)

12

Tablet Transfer Strategy

[Figure: Server 1 and Server 2 in the (S, B) plane, with the target for Server 1 marked.]

13

Tablet Transfer Strategy (2)

[Figure: Server 1's tablets, with Left and Right markers.]

• Start with an empty bag
• Goal: take vectors from the servers so that they add up to the target vector
• If slope(bag + left + right) < slope(target):
  – Add right to bag, move right
  – Otherwise, add left to bag, move left
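One possible reading of the bag procedure is sketched below in Python. Several details are assumptions, because the slide does not spell them out: tablets are (space, bandwidth) vectors with positive space, they are ordered by slope (bandwidth divided by space) so that "left" is the flattest remaining tablet and "right" the steepest, and the loop stops once the bag holds at least the target's space. The name `fill_bag` is hypothetical.

```python
def fill_bag(tablets, target):
    """Pick tablets whose (space, bandwidth) vectors roughly sum to target."""
    # Assumption: order by slope, flattest first, steepest last.
    order = sorted(tablets, key=lambda v: v[1] / v[0])
    left, right = 0, len(order) - 1
    bag, bag_s, bag_b = [], 0.0, 0.0
    t_s, t_b = target
    # Assumption: stop once the bag holds at least the target's space.
    while left <= right and bag_s < t_s:
        s = bag_s + order[left][0]
        b = bag_b + order[left][1]
        if right != left:
            s += order[right][0]
            b += order[right][1]
        # slope(bag + left + right) < slope(target), cross-multiplied
        # so that no division is needed.
        if b * t_s < t_b * s:
            pick = order[right]   # too flat: take the steepest tablet
            right -= 1
        else:
            pick = order[left]    # steep enough: take the flattest tablet
            left += 1
        bag.append(pick)
        bag_s += pick[0]
        bag_b += pick[1]
    return bag

print(fill_bag([(1, 1), (1, 3), (2, 1), (1, 2)], target=(3, 4)))
```

In this example the bag ends up with the tablets (2, 1) and (1, 3), whose sum equals the target vector (3, 4). In general the result is only an approximation of the target.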

14

Initial Configurations

[Figure: initial configurations: Uniform, Two Extreme, Mid Quadrant.]

15

Point Selection Strategy

• Midpoint Strategy
  + Guaranteed convergence
  + No need to run phase 1
  – Lots of extra movement

• Visualization Demo
  – Uniform
  – Two extreme
  – Mid quadrant

[Figure: Server 1 and Server 2 plotted in the (S, B) plane, both axes from 0 to 1.]

16

Point Selection Strategy (2)

• Greedy Strategy
  – Take the point closer to the target
  – Move it to the target, if this
    • improves the position of the other point
    • does not worsen it by more than δ

• Reduces movement

[Figure: Server 1 and Server 2 plotted on axes from 0 to 1.]

Takes a long time to converge in some cases
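The greedy rule can be sketched for a single pair of servers as follows. This is an interpretation, not the authors' code: loads are treated as continuous (B, S) points, the combined load of the pair is conserved, and the two conditions are merged into one test, namely that the other server's distance to the target grows by at most δ. The name `greedy_step` is hypothetical.

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def greedy_step(p1, p2, target, delta):
    """Return the new (p1, p2) after a greedy exchange, or None if it fails."""
    first_closer = _dist(p1, target) <= _dist(p2, target)
    closer, other = (p1, p2) if first_closer else (p2, p1)
    # Moving the closer server onto the target shifts the difference
    # onto the other server, because the pair's total load is conserved.
    moved = (other[0] + closer[0] - target[0],
             other[1] + closer[1] - target[1])
    if _dist(moved, target) > _dist(other, target) + delta:
        return None  # the other server would get worse by more than delta
    return (target, moved) if first_closer else (moved, target)

target = (0.5, 0.5)
print(greedy_step((0.6, 0.6), (0.2, 0.3), target, delta=0.05))
print(greedy_step((0.6, 0.6), (0.8, 0.9), target, delta=0.05))
```

The first call succeeds because the second server also ends up closer to the target. The second call fails because both servers are overloaded, so pushing the excess onto the other server would move it further away than δ allows.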

17

DHT-based Location Directory

18

DHT + Midpoint

• Greedy + fallback to DHT:
  – Convergence problem exists for some configurations
  – Visualization Demo

• Solution:
  – Greedy + fallback to DHT with Midpoint
  – Demo: uniform, two extreme, mid quadrant

• Alternate approach:
  – Greedy + fallback to Midpoint
  – Trade-off: movement cost vs. DHT overhead
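The alternate approach (greedy with a fallback to midpoint, no DHT) might look like the sketch below. It assumes that a server counts its consecutive failed greedy gossips and switches to the midpoint strategy once a threshold is reached; the slides do not give the exact rule. The greedy step is passed in as a callable that returns None on failure, and all names are hypothetical.

```python
def midpoint(p1, p2):
    """Midpoint strategy: both servers move to the middle of their loads."""
    mid = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
    return mid, mid

def gossip_with_fallback(p1, p2, try_greedy, failed, max_failed):
    """One exchange. `failed` counts consecutive failed greedy gossips.

    Returns (new_p1, new_p2, new_failed).
    """
    result = try_greedy(p1, p2)
    if result is not None:
        return result[0], result[1], 0      # success resets the counter
    failed += 1
    if failed >= max_failed:
        m1, m2 = midpoint(p1, p2)           # assumed fallback rule
        return m1, m2, 0
    return p1, p2, failed                   # nothing moves this time

always_fail = lambda a, b: None  # stand-in for a greedy step that fails
print(gossip_with_fallback((0.6, 0.6), (0.8, 0.9), always_fail, 0, 2))
print(gossip_with_fallback((0.6, 0.6), (0.8, 0.9), always_fail, 1, 2))
```

A larger `max_failed` keeps servers in greedy mode longer, which saves movement but spends more gossip rounds without progress.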

19

Experimental Evaluation

• Uniform configuration
  – Greedy + DHT (Midpoint)
  – Midpoint
  – Greedy + Midpoint (No DHT)

• Effect of varying target zone
• Effect of failed gossip count
• Metrics
  – Amount of space moved
  – # of gossip rounds
  – Multiple tablet move

20

Uniform Configuration: Results

[Figure: space moved for greedy, midpoint, and greedy+midpoint.]

[Figure: % of moved tablets vs. # of movements (1 to >10) for greedy, midpoint, and greedy+midpoint.]

[Figure: # of rounds, split into failed and successful gossip, for greedy, midpoint, and greedy+midpoint.]

21

Effect of Varying Target Zone

[Figure: space moved vs. half-length of target zone (0.01 to 0.05) for Greedy and Midpoint.]

[Figure: # of rounds vs. half-length of target zone (0.01 to 0.05), showing average failed and successful gossip for Midpoint and Greedy.]

Larger target zone = fast convergence, less accuracy

Target zone width should depend on the target point value

22

Effect of Failed Gossip Count (Greedy)

[Figure: # of rounds, split into failed and successful gossip, vs. failed gossip count (5 to 25).]

[Figure: space moved vs. failed gossip count (5 to 25).]

Large failed gossip count = More time in greedy mode, more unproductive gossip at the end

23

One-to-N Gossip

• Contact a few random nodes
  – Locked/unlocked mode

• Pick the most profitable one
  – Distance from the target is minimized

• Advantage
  – Better choices

• Initial results
  – Locked mode: may lead to deadlock
  – Unlocked mode: in most cases, other nodes start a transfer
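The partner selection in one-to-n gossip can be sketched as below. The slides do not define "profitable" beyond minimizing the distance from the target, so this example assumes a midpoint exchange and scores each contacted peer by how close the resulting midpoint lies to the target. Locking is left out, and the names are hypothetical.

```python
import math
import random

def contact_peers(peers, k, rng):
    """Contact k random peers (assumed: uniform sampling, no repeats)."""
    names = sorted(peers)
    chosen = rng.sample(names, min(k, len(names)))
    return {name: peers[name] for name in chosen}

def most_profitable(me, contacted, target):
    """Pick the peer whose midpoint with `me` lies closest to the target."""
    def distance_after(load):
        mid = ((me[0] + load[0]) / 2, (me[1] + load[1]) / 2)
        return math.hypot(mid[0] - target[0], mid[1] - target[1])
    return min(contacted, key=lambda name: distance_after(contacted[name]))

peers = {"a": (0.3, 0.3), "b": (0.1, 0.1), "c": (0.8, 0.8)}
contacted = contact_peers(peers, k=3, rng=random.Random(0))
print(most_profitable((0.9, 0.9), contacted, target=(0.5, 0.5)))
```

Here peer "b" wins because averaging (0.9, 0.9) with (0.1, 0.1) lands exactly on the target.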

24

Move Suppression

• Two global stages
• Stage 1:
  – One-to-one gossip, but moves are hypothetical

• Stage 2:
  – Change to chosen placement

• Advantage
  – Tablet not moved multiple times

• Challenges
  – When to switch to Stage 2 from Stage 1
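The two stages can be illustrated with a small bookkeeping sketch. It assumes that Stage 1 records each hypothetical move as a (tablet, source, destination) triple and that Stage 2 ships every tablet at most once, from its original server to its final one. The decision of when to switch stages is not modelled, and `plan_moves` is a hypothetical name.

```python
def plan_moves(initial, hypothetical_moves):
    """Collapse Stage 1's hypothetical moves into one physical move per tablet.

    initial: dict mapping tablet -> server before balancing.
    hypothetical_moves: list of (tablet, src, dst) in the order they occurred.
    Returns a dict mapping tablet -> (original server, final server).
    """
    placement = dict(initial)
    for tablet, src, dst in hypothetical_moves:
        if placement[tablet] != src:
            raise ValueError(f"{tablet} is not on {src}")
        placement[tablet] = dst  # Stage 1: only the bookkeeping changes
    # Stage 2: move only the tablets whose final server differs.
    return {t: (initial[t], placement[t])
            for t in initial if placement[t] != initial[t]}

initial = {"t1": "A", "t2": "A", "t3": "B"}
moves = [("t1", "A", "B"), ("t1", "B", "C"), ("t2", "A", "B"), ("t2", "B", "A")]
print(plan_moves(initial, moves))
```

In this example four hypothetical moves collapse into a single transfer of t1 from A to C, and t2 is not moved at all because it ends up where it started.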

25

Future Work

• Handling initial placement
• Frequency of running the placement algorithm
• Considering the network hierarchy
• Handling failures
• Extending to heterogeneous resources

Questions?