Database Replication in Tashkent

Transcript
Page 1: Database Replication in Tashkent

Database Replication in Tashkent

CSEP 545 Transaction Processing
Sameh Elnikety

Page 2: Database Replication in Tashkent

Replication for Performance
• Expensive
• Limited scalability

Page 3: Database Replication in Tashkent

DB Replication is Challenging
• Single database system
  – Large, persistent state
  – Transactions
  – Complex software
• Replication challenges
  – Maintain consistency
  – Middleware replication

Page 4: Database Replication in Tashkent

Background

[Diagram: a standalone DBMS serves as Replica 1]

Page 5: Database Replication in Tashkent

Background

[Diagram: a load balancer distributes transactions across Replicas 1, 2, and 3]

Page 6: Database Replication in Tashkent

Read Tx

[Diagram: the load balancer sends read transaction T to a single replica]

A read tx does not change DB state.

Page 7: Database Replication in Tashkent

Update Tx 1/2

[Diagram: the load balancer sends update transaction T to one replica, which executes it and produces a writeset (ws)]

An update tx changes DB state.

Page 8: Database Replication in Tashkent

Update Tx 1/2

[Diagram: the writeset (ws) of T is propagated to all replicas]

Apply (or commit) T everywhere.

Example: T1: { set x = 1 }

Page 9: Database Replication in Tashkent

Update Tx 2/2: Ordering

[Diagram: two update transactions produce writesets, each of which must be applied at every replica]

Page 10: Database Replication in Tashkent

Update Tx 2/2: Ordering

[Diagram: writesets from both update transactions are delivered to all replicas]

Example:
T1: { set x = 1 }
T2: { set x = 7 }

Commit updates in order; otherwise replicas that apply T1 and T2 in different orders end up with different values of x.
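To make the ordering requirement concrete, here is a minimal sketch (illustrative, not from the talk) showing that replicas which apply the same writesets in different orders diverge:

# Python sketch. Each writeset maps keys to new values; applying is last-writer-wins.
def apply_writesets(state: dict, writesets: list) -> dict:
    for ws in writesets:
        state.update(ws)
    return state

t1 = {"x": 1}   # T1: { set x = 1 }
t2 = {"x": 7}   # T2: { set x = 7 }

# A replica committing T1 then T2 ends with x = 7 ...
assert apply_writesets({}, [t1, t2])["x"] == 7
# ... while one committing T2 then T1 ends with x = 1: the replicas diverge.
assert apply_writesets({}, [t2, t1])["x"] == 1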

Page 11: Database Replication in Tashkent

Sub-linear Scalability Wall

[Diagram: a fourth replica is added, yet every replica must still apply every writeset in order, so throughput scales sub-linearly]

Page 12: Database Replication in Tashkent

This Talk
• General scaling techniques
  – Address fundamental bottlenecks
  – Synergistic, implemented in middleware
  – Evaluated experimentally

Page 13: Database Replication in Tashkent

Super-linear Scalability

[Bar chart: throughput (TPS) by configuration. Single: 1x, Base: 7x, United: 12x, MALB: 25x, UF: 37x]

Page 14: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: a standalone DBMS divides its work between reading (R) and updating + logging (U)]

Page 15: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: for a system-wide load of N·R reads and N·U updates, each traditional replica (1 of N) performs R reads, U updates + logging, and applies (N-1) writesets (ws)]
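To see the wall in this model, assume (hypothetically) that applying a writeset costs about half a full update, ws ≈ U/2. Each replica's update-side work is then U + (N-1)·U/2: at N = 16 that is 8.5·U per replica, versus U standalone, so most of the capacity added by new replicas is consumed by writeset application.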

Page 16: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: an optimized replica (1 of N) performs the same work at reduced cost: R*, U*, and (N-1)·ws*]

Page 17: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: the three techniques each reduce one term: MALB cuts reading (R → R*), uniting ordering & durability cuts update + logging (U → U*), and update filtering cuts writeset application ((N-1)·ws → (N-1)·ws*)]

Page 18: Database Replication in Tashkent

Key Points
1. Commit updates in order
   – Problem: serial synchronous disk writes
   – Solution: unite ordering and durability
2. Load balancing
   – Problem: optimizing for equal load causes memory contention
   – MALB: optimize for in-memory execution
3. Update propagation
   – Problem: propagating updates everywhere
   – Update filtering: propagate updates only where needed

Page 19: Database Replication in Tashkent

Roadmap

[Diagram: the replicated system, annotated with the three topics: 1. ordering (commit updates in order), 2. load balancing, 3. update propagation]

Page 20: Database Replication in Tashkent

Key Idea
• Traditionally: commit ordering and durability are separated
• Key idea: unite commit ordering and durability

Page 21: Database Replication in Tashkent

All Replicas Must Agree
• All replicas agree on
  – which update txs commit
  – their commit order
• Total order
  – determined by middleware
  – followed by each replica

[Diagram: Tx A and Tx B must reach durability at Replicas 1, 2, and 3]

Page 22: Database Replication in Tashkent

Order Outside DBMS

[Diagram: the middleware orders Tx A before Tx B; each replica must make both durable]

Page 23: Database Replication in Tashkent

Order Outside DBMS

[Diagram: the agreed order "A, B" is delivered to every replica; each must commit A before B]

Page 24: Database Replication in Tashkent

Enforce External Commit Order

[Diagram: at a replica, a proxy submits Task A and Task B through the SQL interface; the DBMS must commit them in the external order A, B]

Page 25: Database Replication in Tashkent

Enforce External Commit Order

[Diagram: if both commits are submitted concurrently, the DBMS may commit them as B, A, violating the external order]

Page 26: Database Replication in Tashkent

Enforce External Commit Order

[Diagram: the DBMS alone decides the internal commit order of concurrent transactions]

Cannot commit A & B concurrently!

Page 27: Database Replication in Tashkent

Enforce Order = Serial Commit

[Diagram: the proxy submits the commit of A alone and waits for it to complete]

Page 28: Database Replication in Tashkent

Enforce Order = Serial Commit

[Diagram: only after A's commit returns does the proxy submit the commit of B]
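A minimal sketch of that rule (illustrative; execute_commit is a hypothetical stand-in for issuing COMMIT through the SQL interface): the proxy issues commits strictly one at a time, in the agreed total order.

import queue, threading

class CommitSerializer:
    # Submit COMMITs to the DBMS one at a time, in the order fixed
    # by the middleware, so the DBMS cannot reorder them.
    def __init__(self, execute_commit):
        self.execute_commit = execute_commit
        self.order = queue.Queue()      # tx ids in the agreed total order
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, tx_id):
        self.order.put(tx_id)

    def _run(self):
        while True:
            tx_id = self.order.get()
            # The next COMMIT is issued only after the previous one
            # returns; the DBMS never sees two commits concurrently.
            self.execute_commit(tx_id)

This is exactly why the next slides call serial commit slow: every execute_commit call waits for a synchronous disk write before the next can start.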

Page 29: Database Replication in Tashkent

Commit Serialization is Slow

[Timeline: ordering fixes A, B, C; the proxy issues Commit A, waits for its durability (a synchronous disk write) and Ack A, then Commit B, then Commit C]

Page 30: Database Replication in Tashkent

Commit Serialization is Slow

[Timeline: as before, commits A, B, C each pay a full disk write, one after another]

Problem: durability & ordering separated → serial disk writes

Page 31: Database Replication in Tashkent

Unite D. & O. in Middleware

[Timeline: with durability turned off at the DBMS and performed in the middleware, commits A, B, C need only CPU work at the database]

Page 32: Database Replication in Tashkent

Unite D. & O. in Middleware

[Timeline: as before]

Solution: move durability to the middleware. Durability & ordering united in middleware → group commit

Page 33: Database Replication in Tashkent

Implementation: Uniting D & O in MW
• Middleware logs tx effects (see the sketch below)
  – Durability of update txs is guaranteed in middleware
  – Turn durability off at the database
• Middleware performs durability & ordering
  – United → group commit → fast
• Database commits update txs serially
  – Commit = quick main-memory operation
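A minimal group-commit sketch under those assumptions (illustrative, not the talk's implementation): the middleware appends writesets to its log in the agreed total order and makes a whole batch durable with a single fsync, acknowledging every transaction in the batch at once.

import os
import threading

class MiddlewareLog:
    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
        self.cond = threading.Condition()
        self.pending = []              # (writeset, done_event) awaiting fsync
        threading.Thread(target=self._flusher, daemon=True).start()

    def log_commit(self, writeset: bytes):
        # Called once per update tx; blocks until the writeset is durable.
        done = threading.Event()
        with self.cond:
            self.pending.append((writeset, done))
            self.cond.notify()
        done.wait()                    # durable: safe to ack and commit at the DBMS

    def _flusher(self):
        # Drain the current batch, write it, fsync once, then release
        # every transaction in the batch together (group commit).
        while True:
            with self.cond:
                while not self.pending:
                    self.cond.wait()
                batch, self.pending = self.pending, []
            os.write(self.fd, b"".join(ws for ws, _ in batch))
            os.fsync(self.fd)          # one disk write covers the whole batch
            for _, done in batch:
                done.set()

Because the log is appended in the agreed order, durability and commit order are recorded by the same write.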

Page 34: Database Replication in Tashkent

Uniting Improves Throughput
• Metric
  – Throughput
• Workload
  – TPC-W Ordering (50% updates)
• System
  – Linux cluster
  – PostgreSQL
  – 16 replicas
  – Serializable execution

[Bar chart: TPS on 16 replicas. Single: 1x, Base: 7x, United: 12x]

Page 35: Database Replication in Tashkent

Roadmap

[Diagram: topic 1 (ordering) covered; next, 2 (load balancing) and 3 (update propagation)]

Page 36: Database Replication in Tashkent

Key Idea

[Diagram: the load balancer places equal load on Replica 1 and Replica 2, each with memory and disk]

Page 37: Database Replication in Tashkent

Key Idea

[Diagram: as before, equal load on replicas]

MALB (Memory-Aware Load Balancing): optimize for in-memory execution

Page 38: Database Replication in Tashkent

How Does MALB Work?

[Diagram: the database has tables 1, 2, 3; transaction A accesses tables 1 and 2, transaction B accesses tables 2 and 3; a replica's memory holds only two tables]

Page 39: Database Replication in Tashkent

Read Data From Disk

[Diagram: least-loaded balancing sends the mixed stream A, B, A, B to both replicas; each replica then needs tables 1, 2, and 3, which do not all fit in memory]

Page 40: Database Replication in Tashkent

Read Data From Disk

[Diagram: as before; both replicas keep reading data from disk, so execution is slow]

Page 41: Database Replication in Tashkent

Data Fits in Memory

[Diagram: MALB splits the stream A, B, A, B, sending A, A, A, A to Replica 1 and B, B, B, B to Replica 2]

Page 42: Database Replication in Tashkent

Data Fits in Memory

[Diagram: Replica 1 caches tables 1 and 2, Replica 2 caches tables 2 and 3; each working set fits in memory, so execution is fast]

But where does memory information come from, and what about many txs and replicas?

Page 43: Database Replication in Tashkent

Estimate Tx Memory Needs
• Exploit the tx execution plan (see the sketch below)
  – Which tables & indices are accessed
  – Their access pattern (linear scan, direct access)
• Metadata from the database
  – Sizes of tables and indices
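A sketch of such an estimate (hypothetical names; in PostgreSQL the sizes could come from, e.g., pg_relation_size, and the plan from EXPLAIN): sum full relation sizes for linear scans and a small fraction for direct access.

def estimate_tx_memory(plan, rel_sizes):
    # plan: list of (relation, access_pattern) pairs from the execution plan
    # rel_sizes: relation -> size in bytes, from database metadata
    DIRECT_ACCESS_FRACTION = 0.05      # assumed, not a figure from the talk
    need = 0
    for relation, pattern in plan:
        size = rel_sizes[relation]
        need += size if pattern == "linear_scan" else int(size * DIRECT_ACCESS_FRACTION)
    return need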

Page 44: Database Replication in Tashkent

Grouping Transactions
• Objective
  – Construct tx groups that fit together in memory
• Bin packing (sketched below)
  – Item: tx memory needs
  – Bin: memory of a replica
  – Heuristic: Best Fit Decreasing
• Allocate replicas to tx groups
  – Adjust for group loads
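A compact sketch of that packing step (illustrative sizes and names):

def best_fit_decreasing(tx_needs, bin_capacity):
    # Items: per-tx memory needs. Bins: replica memory. Place each item,
    # largest first, into the fullest bin that still fits it.
    bins, free = [], []                # tx groups and their remaining capacity
    for tx, need in sorted(tx_needs.items(), key=lambda kv: -kv[1]):
        best = min((i for i in range(len(bins)) if free[i] >= need),
                   key=lambda i: free[i], default=None)
        if best is None:               # nothing fits: open a new replica-sized bin
            bins.append([tx])
            free.append(bin_capacity - need)
        else:
            bins[best].append(tx)
            free[best] -= need
    return bins

# Hypothetical example: six tx types, replicas with 4096 MB of memory
groups = best_fit_decreasing(
    {"A": 3000, "B": 1500, "C": 2000, "D": 800, "E": 700, "F": 900}, 4096)
# -> [['A', 'F'], ['C', 'B'], ['D', 'E']]: three groups, each fits in memory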

Page 45: Database Replication in Tashkent

MALB in Action

[Diagram: transaction types A, B, C, D, E, F arrive at MALB]

Page 46: Database Replication in Tashkent

MALB in Action

[Diagram: MALB estimates the memory needs of A, B, C, D, E, F]

Page 47: Database Replication in Tashkent

MALB in Action

[Diagram: MALB packs the txs into groups that fit in memory: {A}, {B, C}, {D, E, F}]

Page 48: Database Replication in Tashkent

MALB in Action

[Diagram: each group is allocated to a replica: {A}, {B, C}, and {D, E, F} each run on their own replica]

Page 49: Database Replication in Tashkent

MALB Summary
• Objective
  – Optimize for in-memory execution
• Method
  – Estimate tx memory needs
  – Construct tx groups
  – Allocate replicas to tx groups

Page 50: Database Replication in Tashkent

Experimental Evaluation
• Implementation
  – No change in consistency
  – Still middleware
• Compare
  – United: efficient baseline system
  – MALB: exploits working-set information
• Same environment
  – Linux cluster running PostgreSQL
  – Workload: TPC-W Ordering (50% update txs)

Page 51: Database Replication in Tashkent

MALB Doubles Throughput

[Bar chart: TPS, TPC-W Ordering, 16 replicas. Single: 1x, Base: 7x, United: 12x, MALB: 25x, a 105% gain over United]

Page 52: Database Replication in Tashkent

MALB Doubles Throughput

[Charts: TPS as before (MALB: 25x, +105% over United); read I/O, normalized to United, drops sharply under MALB]

Page 53: Database Replication in Tashkent

Big Gains with MALB

[Table: MALB's throughput gain over United for combinations of memory size (MemSize: Big ↔ Small) and database size (DBSize: Big ↔ Small):
    4%     0%    29%
   48%   105%    45%
  182%    75%    12% ]

Page 54: Database Replication in Tashkent

Big Gains with MALB

[Table as before, annotated: the cells where all data runs from memory and the cells where it runs from disk show small gains; MALB helps most in between]

Page 55: Database Replication in Tashkent

Roadmap

[Diagram: topics 1 (ordering) and 2 (load balancing) covered; next, 3 (update propagation)]

Page 56: Database Replication in Tashkent

Key Idea
• Traditional: propagate updates everywhere
• Update filtering: propagate updates to where they are needed, as sketched below
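A minimal routing sketch for that idea (assumed structure, not the talk's code): a writeset is forwarded only to replicas that host one of the tables it touches.

replica_tables = {                     # which tables each replica hosts (from MALB)
    "replica1": {"table1", "table2"},  # group A
    "replica2": {"table2", "table3"},  # group B
}

def targets(modified_tables):
    # Replicas that must apply the writeset: those hosting any modified table.
    return [r for r, hosted in replica_tables.items() if hosted & set(modified_tables)]

assert targets(["table1"]) == ["replica1"]             # filtered away from replica2
assert targets(["table3"]) == ["replica2"]             # filtered away from replica1
assert targets(["table2"]) == ["replica1", "replica2"] # needed everywhere

This mirrors the example on the next slides: updates to table 1 go only to the replica running group A, and updates to table 3 only to the replica running group B.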

Page 57: Database Replication in Tashkent

Update Filtering Example

[Diagram: MALB+UF; transaction A accesses tables 1 and 2, transaction B accesses tables 2 and 3; the stream A, B, A, B arrives]

Page 58: Database Replication in Tashkent

Update Filtering Example

[Diagram: Group A (tables 1, 2) is placed on Replica 1; Group B (tables 2, 3) is placed on Replica 2]

Page 59: Database Replication in Tashkent

Update Filtering Example

[Diagram: an update to table 1 arrives]

Page 60: Database Replication in Tashkent

Update Filtering Example

[Diagram: the update to table 1 is propagated only to Replica 1, which hosts table 1]

Page 61: Database Replication in Tashkent

Update Filtering Example

[Diagram: Replica 2 skips the update and no longer maintains table 1]

Page 62: Database Replication in Tashkent

Update Filtering Example

[Diagram: an update to table 3 arrives]

Page 63: Database Replication in Tashkent

Update Filtering Example

[Diagram: the update to table 3 is propagated only to Replica 2, which hosts table 3]

Page 64: Database Replication in Tashkent

Update Filtering Example

[Diagram: each replica applies only the writesets for the tables it hosts]

Page 65: Database Replication in Tashkent

Update Filtering in Action

[Diagram: UF routes writesets to groups of replicas]

Page 66: Database Replication in Tashkent

Update Filtering in Action

[Diagram: an update to the red table arrives at UF]

Page 67: Database Replication in Tashkent

Update Filtering in Action

[Diagram: an update to the green table arrives as well]

Page 68: Database Replication in Tashkent

Update Filtering in Action

[Diagram: each update is forwarded only to the replicas hosting its table]

Page 69: Database Replication in Tashkent

Update Filtering in Action

[Diagram: as before; no replica receives updates for tables it does not host]

Page 70: Database Replication in Tashkent

MALB+UF Triples Throughput

[Bar chart: TPS, TPC-W Ordering, 16 replicas. Single: 1x, Base: 7x, United: 12x, MALB: 25x, UF: 37x, a 49% gain over MALB]

Page 71: Database Replication in Tashkent

MALB+UF Triples Throughput

[Charts: TPS as before (UF: 37x, +49% over MALB); propagated updates drop from 15 under MALB to 7 under UF]

Page 72: Database Replication in Tashkent

Filtering Opportunities

[Charts: throughput ratio MALB+UF / MALB. Ordering mix (50% updates): 1.49. Browsing mix (5% updates): 1.02. Filtering pays off when updates are plentiful]

Page 73: Database Replication in Tashkent

Conclusions
1. Commit updates in order
   – Problem: serial synchronous disk writes
   – Solution: unite ordering and durability
2. Load balancing
   – Problem: optimizing for equal load causes memory contention
   – MALB: optimize for in-memory execution
3. Update propagation
   – Problem: propagating updates everywhere
   – Update filtering: propagate updates only where needed