Geographically distributed Swift clusters


Page 1

Geographically distributed Swift clusters

Alistair Coles, Swift core developer

[email protected], irc: acoles

Page 2

Overview

• What is Swift?
• Geographically distributed clusters
  • What?
  • Why?
  • How?
• Erasure coded geographically distributed clusters
  • Swift now supports these!
  • …enabled by fragment duplication and composite rings
• Summary

Page 3

What is Swift?

• Object storage service
• REST API – Create, Read, Update, Delete
• Simple naming hierarchy
  • objects belong to containers
  • containers belong to accounts

Page 4

Swift is durable

• Multiple replicas of every object (or erasure coding)
• The Ring always tries to disperse replicas across different devices and servers

[Diagram: PUT a/c/o with a 3-replica policy; the proxy server consults the Ring and writes replicas 0, 1 and 2 to three different servers.]

Page 5

Swift is scalable

• The Ring always tries to balance load across all devices
• No centralized services

[Diagram: two proxy servers, each with its own copy of the Ring, handle PUT a/c/o and PUT a2/c2/o2 under a 3-replica policy; each object's replicas 0, 1 and 2 are spread across the servers.]
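"No centralized services" works because every proxy can compute an object's location by itself. The sketch below is a simplified, partly hypothetical model of that idea: it hashes the object path with MD5 and keeps the top `part_power` bits as the partition, which is the general shape of Swift's ring, but it omits the hash path prefix/suffix secret configured in swift.conf, and `build_table` is a toy stand-in for the ring builder.

```python
import hashlib
import struct
from typing import List

# Toy ring parameters (hypothetical values chosen for illustration).
PART_POWER = 4                      # 2**4 = 16 partitions
REPLICAS = 3
DEVICES = ["server0/sda", "server1/sda", "server2/sda",
           "server3/sda", "server4/sda"]


def get_partition(account: str, container: str, obj: str,
                  part_power: int = PART_POWER) -> int:
    """Hash the object path and keep its top `part_power` bits."""
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    digest = hashlib.md5(path).digest()
    top32 = struct.unpack_from(">I", digest)[0]   # first 4 bytes, big-endian
    return top32 >> (32 - part_power)


def build_table(num_parts: int, num_devs: int, replicas: int) -> List[List[int]]:
    """Toy replica -> partition -> device table.

    Replica r of partition p goes to device (p + r) % num_devs, so the
    replicas of a partition sit on distinct devices (as long as
    replicas <= num_devs) and partitions are spread evenly over devices.
    """
    return [[(p + r) % num_devs for p in range(num_parts)]
            for r in range(replicas)]


TABLE = build_table(2 ** PART_POWER, len(DEVICES), REPLICAS)


def get_nodes(account: str, container: str, obj: str) -> List[str]:
    """Any proxy can run this lookup locally: no central service is needed."""
    part = get_partition(account, container, obj)
    return [DEVICES[TABLE[r][part]] for r in range(REPLICAS)]


if __name__ == "__main__":
    for name in (("a", "c", "o"), ("a2", "c2", "o2")):
        print("/".join(name), get_partition(*name), get_nodes(*name))
```

Because the lookup is a pure function of the path and the ring data, two proxies holding the same ring always agree on where a/c/o lives.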

Page 6

Swift is highly available

• Write succeeds on a quorum of replicas
• Missing replicas are updated asynchronously

[Diagram: PUT a/c/o @ t1 ‘my old data’ with a 3-replica policy; two replicas are written at t1 and the write succeeds, and the third replica is brought up to date by an async update.]
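A minimal sketch of a quorum write, assuming a simple majority quorum (2 of 3 replicas). `Replica`, `put` and `run_async_updates` are hypothetical names; the queue only stands in for the background processes that Swift uses to bring missing replicas up to date.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[float, bytes]          # (timestamp, data)


@dataclass
class Replica:
    name: str
    up: bool = True
    stored: Optional[Version] = None   # None until the replica gets a write


def quorum_size(n: int) -> int:
    """Assumed simple-majority quorum: 2 of 3 replicas."""
    return n // 2 + 1


def put(replicas: List[Replica], timestamp: float, data: bytes,
        async_queue: list) -> bool:
    """Write to every reachable replica; succeed if a quorum acknowledged.

    Writes that could not be delivered are queued for a background pass.
    """
    acks = 0
    for rep in replicas:
        if rep.up:
            rep.stored = (timestamp, data)
            acks += 1
        else:
            async_queue.append((rep, timestamp, data))
    return acks >= quorum_size(len(replicas))


def run_async_updates(async_queue: list) -> None:
    """Deliver queued writes to replicas that are reachable again."""
    remaining = []
    for rep, ts, data in async_queue:
        if not rep.up:
            remaining.append((rep, ts, data))
        elif rep.stored is None or rep.stored[0] < ts:
            rep.stored = (ts, data)
    async_queue[:] = remaining


if __name__ == "__main__":
    replicas = [Replica("server0"), Replica("server1"), Replica("server2", up=False)]
    queue: list = []
    print("PUT succeeded:", put(replicas, 1.0, b"my old data", queue))
    replicas[2].up = True
    run_async_updates(queue)
    print([r.stored for r in replicas])
```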

Page 7

Swift is eventually consistent

• Possibility to read stale data
• Async process makes data eventually consistent

[Diagram: PUT a/c/o @ t2 ‘my new data’ updates two of the three replicas to t2; a GET a/c/o @ t2 + ∂ through a second proxy server reads the replica still at t1 and returns ‘my old data’ until the async update catches up.]
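To see why a read can return stale data, and how an async process repairs it, the sketch below models each replica as a (timestamp, data) pair and resolves conflicts with last-write-wins on the timestamp, which is how Swift orders object versions. The dictionary layout and the `reconcile` helper are illustrative assumptions, not Swift code.

```python
import random
from typing import Dict, Tuple

Replicas = Dict[str, Tuple[float, str]]    # server -> (timestamp, data)


def read_random(replicas: Replicas, rng: random.Random) -> str:
    """A GET may be served by any replica, including a stale one."""
    server = rng.choice(sorted(replicas))
    return replicas[server][1]


def reconcile(replicas: Replicas) -> None:
    """Async repair: every replica adopts the newest version.

    Last write wins: the (timestamp, data) pair with the highest
    timestamp replaces older versions everywhere.
    """
    newest = max(replicas.values(), key=lambda version: version[0])
    for server in replicas:
        replicas[server] = newest


if __name__ == "__main__":
    # PUT @ t1 reached all replicas; PUT @ t2 has so far reached only two.
    replicas = {"server0": (2.0, "my new data"),
                "server1": (2.0, "my new data"),
                "server2": (1.0, "my old data")}
    rng = random.Random(0)
    print({read_random(replicas, rng) for _ in range(20)})  # may include stale data
    reconcile(replicas)
    print({read_random(replicas, rng) for _ in range(20)})  # only 'my new data'
```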

Page 8

Overview

• What is Swift?
• Geographically distributed clusters
  • What?
  • Why?
  • How?
• Erasure coded geographically distributed clusters
  • Swift now supports these!
  • … but there’s some new stuff to know about
• Deep dive: erasure coding, fragment duplication, composite rings
• Summary

Page 9

Geographically distributed clusters

• What?
  • Multiple physical locations
  • Typically connected by a high-latency/low-bandwidth WAN
  • Copies of data in each location
  • Single namespace
• Why?
  • Disaster recovery
  • Data locality
• Also known as “Global clusters”, “Multi-region Swift”

[Diagram: two locations connected by a WAN.]

Page 10

Geographically distributed Swift clusters

• The Ring always tries to disperse replicas across different devices and servers …and regions

[Diagram: region1 and region2, each with several servers, connected by a WAN; PUT a/c/o with a 4-replica policy, where the proxy server consults the Ring and places two of the replicas 0–3 in each region.]
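The dispersion goal can be illustrated with a toy placement function that spreads replicas over regions first and servers second. This is a hypothetical sketch, not Swift's ring builder, which also accounts for zones, device weights and partition balance; it assumes every region has enough devices for its share of replicas.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Device = Tuple[str, str, str]          # (region, server, device)


def disperse(devices: List[Device], replicas: int) -> List[Device]:
    """Choose devices for `replicas` replicas, spreading as widely as possible.

    Replica i goes to region i % len(regions) (round-robin), and within
    that region to the server holding the fewest replicas so far.
    """
    by_region: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for region, server, dev in devices:
        by_region[region][server].append(dev)
    regions = sorted(by_region)
    used: Dict[Tuple[str, str], int] = defaultdict(int)   # replicas per server
    chosen: List[Device] = []
    for i in range(replicas):
        region = regions[i % len(regions)]
        servers = by_region[region]
        # Servers in this region that still have an unused device.
        candidates = [s for s in sorted(servers)
                      if used[(region, s)] < len(servers[s])]
        server = min(candidates, key=lambda s: used[(region, s)])
        chosen.append((region, server, servers[server][used[(region, server)]]))
        used[(region, server)] += 1
    return chosen


if __name__ == "__main__":
    cluster = ([("region1", f"s{n}", "sda") for n in range(1, 6)]
               + [("region2", f"s{n}", "sda") for n in range(6, 11)])
    for placement in disperse(cluster, 4):
        print(placement)
```

With four replicas and two regions this places two replicas in each region, on different servers, which is the layout the 4-replica examples in these slides rely on.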

Page 11

Geographically distributed clusters - disaster recovery

• A 4-replica policy makes each region independently robust to a single device failure

[Diagram: region1 and region2 connected by a WAN, 4-replica policy with replicas 0–3; GET a/c/o through the proxy server.]

Page 12

Geographically distributed Swift clusters - data locality

• By default the ring will try to balance read load by choosing random replicas

[Diagram: region1 and region2 connected by a WAN, 4-replica policy; successive GET a/c/o requests at t1, t2 and t3 through one proxy server are served by randomly chosen replicas (random choice for reads).]

Page 13

Read affinity – trade off load balancing for read performance

[Diagram: region1 and region2 connected by a WAN, with a proxy server in each region; one proxy has read_affinity -> region1 and the other read_affinity -> region2, so GET a/c/o requests at t1 to t4 always read a replica in the proxy's local region.]
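Read affinity changes only the order in which the proxy tries replicas. In proxy-server.conf it is enabled with `sorting_method = affinity` plus a `read_affinity` value such as `r1=100`, where a lower number means a higher priority. The sketch below is a simplified model: `parse_read_affinity` handles only region terms, and the node dictionaries are hypothetical.

```python
import random
from typing import Dict, List, Optional


def parse_read_affinity(value: str) -> Dict[int, int]:
    """Parse a simplified 'r1=100, r2=200' string into {region: priority}.

    Only region-level terms are handled; Swift's own option also accepts
    zone-level terms such as 'r1z2=150'.
    """
    prefs = {}
    for term in value.split(","):
        key, _, priority = term.strip().partition("=")
        prefs[int(key.lstrip("r"))] = int(priority)
    return prefs


def sort_nodes(nodes: List[dict], read_affinity: Optional[str],
               rng: random.Random) -> List[dict]:
    """Order candidate nodes for a GET.

    Without affinity the nodes are shuffled, spreading read load over all
    replicas. With affinity they are ordered by priority (lower first);
    regions not listed go last. The shuffle also breaks ties randomly,
    because list.sort is stable.
    """
    ordered = list(nodes)
    rng.shuffle(ordered)
    if read_affinity:
        prefs = parse_read_affinity(read_affinity)
        ordered.sort(key=lambda n: prefs.get(n["region"], float("inf")))
    return ordered


if __name__ == "__main__":
    nodes = [{"server": "s1", "region": 1}, {"server": "s2", "region": 1},
             {"server": "s6", "region": 2}, {"server": "s7", "region": 2}]
    rng = random.Random()
    print([n["server"] for n in sort_nodes(nodes, None, rng)])       # random order
    print([n["server"] for n in sort_nodes(nodes, "r2=100", rng)])   # region2 first
```

Because remote nodes stay in the list, just after the local ones, a GET that finds nothing locally can still fall back to the remote region.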

Page 14

What if remote writes fail?

• Remote replicas are written to temporary locations.
• An async process moves them when the remote region is available.

[Diagram: PUT a/c/o @ t1 through the proxy server while the WAN to region2 is unavailable; the replicas meant for region2 are written to temporary locations in region1 (temporarily misplaced replicas, all at t1), and an async process later moves them to the remote region.]
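A minimal sketch of how writes can land in temporary locations when remote primaries are unreachable. Swift calls these temporary locations handoff nodes; the function names and the (region, server) tuples below are hypothetical, and real placement also depends on the ring's handoff ordering.

```python
from typing import Callable, Dict, List, Optional, Tuple

Node = Tuple[str, str]                 # (region, server)


def choose_write_nodes(primaries: List[Node], handoffs: List[Node],
                       is_up: Callable[[Node], bool]) -> Dict[Node, Optional[Node]]:
    """Map each primary to the node that actually takes its replica.

    Reachable primaries take their own replica. Each unreachable primary
    is replaced by the next reachable handoff; that replica is misplaced
    until a background pass moves it home.
    """
    spare = (h for h in handoffs if is_up(h))
    placement: Dict[Node, Optional[Node]] = {}
    for primary in primaries:
        if is_up(primary):
            placement[primary] = primary
        else:
            # Next reachable handoff, or None if none are left.
            placement[primary] = next(spare, None)
    return placement


def move_misplaced(placement: Dict[Node, Optional[Node]],
                   is_up: Callable[[Node], bool]) -> Dict[Node, Optional[Node]]:
    """Background pass: once a primary is reachable, its replica moves home."""
    return {p: (p if is_up(p) else n) for p, n in placement.items()}


if __name__ == "__main__":
    primaries = [("region1", "s1"), ("region1", "s2"),
                 ("region2", "s4"), ("region2", "s5")]
    handoffs = [("region2", "s6"), ("region1", "s3"), ("region1", "s7")]
    wan_down = lambda node: node[0] == "region1"   # region2 unreachable
    placed = choose_write_nodes(primaries, handoffs, wan_down)
    print(placed)
    print(move_misplaced(placed, lambda node: True))  # WAN restored
```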

Page 15

Overwrite - replicas are eventually consistent

[Diagram: PUT a/c/o @ t1 ‘my old data’ reaches all four replicas. The WAN fails at t2. PUT a/c/o @ t3 ‘my new data’ through the region1 proxy writes four t3 replicas in region1, two of them misplaced. GET a/c/o @ t4 through the region2 proxy reads the temporarily stale t1 replicas and returns ‘my old data’. An async process later moves the misplaced replicas to the remote region.]

Page 16

What if remote writes are slow?

[Diagram: PUT a/c/o with a 4-replica policy; replicas 0–3 are spread across region1 and region2, connected by a WAN.]

“But isn’t this a terrible idea? All my writes will be slowed down by requests to the remote region!”

Page 17

Effect of remote region write time on PUTs (remote region servers artificially slowed)

[Chart: 10K object PUT time (s), 0 to 0.7, against Time (secs), 0 to 100, for remote write times of 100 ms, 200 ms, 400 ms, 600 ms and 800 ms, with post_quorum_timeout = 0.5.]

post_quorum_timeout puts an upper bound on the extra latency
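A simplified latency model for the chart: the proxy waits for a quorum of backend responses and then at most `post_quorum_timeout` longer for the rest. Times are integer milliseconds to avoid float rounding; the two 20 ms local writes and a quorum of 2 for the 4-replica policy are assumed example values, not measurements from the chart.

```python
from typing import List


def client_put_latency_ms(backend_ms: List[int], quorum: int,
                          post_quorum_timeout_ms: int) -> int:
    """When does the proxy answer the client, in this simplified model?

    The proxy needs `quorum` backend responses; once it has them it waits
    at most `post_quorum_timeout_ms` longer for the remaining ones.
    """
    times = sorted(backend_ms)
    quorum_at = times[quorum - 1]     # arrival of the quorum-th response
    all_done = times[-1]              # arrival of the slowest response
    return min(all_done, quorum_at + post_quorum_timeout_ms)


if __name__ == "__main__":
    for remote in (100, 200, 400, 600, 800):
        latency = client_put_latency_ms([20, 20, remote, remote], quorum=2,
                                        post_quorum_timeout_ms=500)
        print(f"remote write {remote} ms -> client sees {latency} ms")
```

Slow remote writes add latency only until the timeout expires; beyond that, the extra latency is capped at `post_quorum_timeout`.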

Page 18

Write affinity – temporarily trade off dispersion for write performance

• Remote replicas are initially written to the local region.
• An async process moves them to the remote region.

[Diagram: PUT a/c/o through a proxy server with write_affinity = region1; all four replicas are first written to servers in region1, and an async process later moves the remote replicas across the WAN to region2.]
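Write affinity is configured in proxy-server.conf with options such as `write_affinity = r1` and `write_affinity_node_count` (for example `2 * replicas`). The sketch below is a simplified, hypothetical model of how those settings reorder candidate nodes: local nodes from the primary and handoff lists go first, capped at `node_count`, and the first len(primaries) nodes take the replicas.

```python
from typing import List, Tuple

Node = Tuple[str, str]                 # (region, server)


def write_targets(primaries: List[Node], handoffs: List[Node],
                  write_affinity: str, node_count: int) -> List[Node]:
    """Choose where a PUT goes when write affinity is enabled.

    Local nodes (primaries first, then handoffs) are tried first, up to
    `node_count` of them; all other nodes follow. Replicas whose primary
    is remote end up on local handoffs and are moved across the WAN later.
    """
    candidates = primaries + handoffs
    local = [n for n in candidates if n[0] == write_affinity][:node_count]
    rest = [n for n in candidates if n not in local]
    return (local + rest)[:len(primaries)]


if __name__ == "__main__":
    primaries = [("region1", "s1"), ("region2", "s4"),
                 ("region1", "s2"), ("region2", "s5")]
    handoffs = [("region2", "s6"), ("region1", "s3"), ("region1", "s7")]
    targets = write_targets(primaries, handoffs, "region1", node_count=8)
    print(targets)
    print("misplaced:", [n for n in targets if n not in primaries])
```

Capping the local nodes at `node_count` bounds how much a single write can pile up in the local region.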

Page 19

Write affinity performance improvement (remote region servers artificially slowed)

[Chart: 10K object PUT time (s), 0 to 0.7, against Time (secs), 0 to 100, with write affinity enabled, for remote write times of 100 ms, 200 ms, 400 ms, 600 ms and 800 ms.]

Page 20

Write affinity – data is always available

[Diagram: PUT a/c/o @ t1 ‘my data’ through the region1 proxy with write_affinity = region1 stores the replicas in region1; GET a/c/o @ t1 + ∂ through the region2 proxy still returns ‘my data’, because reads fall back to the remote region across the WAN.]

Page 21

Write affinity – trade off consistency for write performance

[Diagram: PUT a/c/o @ t1 ‘my data’ reaches all four replicas. PUT a/c/o @ t2 ‘my new data’ through the region1 proxy with write_affinity = region1 writes four t2 replicas in region1, pending an async move to the remote region. GET a/c/o @ t2 + ∂ through the region2 proxy reads the temporarily stale t1 replicas and returns ‘my data’.]

Page 22

Workload considerations

• Use write affinity with care!
  • “No free lunch”: replicas have to be copied across the WAN eventually
• Suitable workloads:
  • Moderate or bursty write rates
  • Non-immediate remote reads
  • E.g. replicating archive data across sites
• Unsuitable workloads:
  • Continuous high write rate
    • misplaced replicas will back up in the local region
  • Immediate remote reads
    • reads will fetch data over the WAN before the async move has happened
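The “no free lunch” point can be made concrete with a back-of-envelope model. Assume, hypothetically, that data written under write affinity must later cross the WAN at a fixed drain rate; the backlog of misplaced replicas then grows whenever writes outpace the WAN. The rates are made-up numbers in MB/s, and real replication throughput varies.

```python
from typing import Iterable, List


def backlog_over_time(local_write_mb_s: Iterable[int], wan_mb_s: int) -> List[int]:
    """Misplaced data (MB) waiting in the local region after each second.

    Each second, newly written data that belongs in the remote region is
    added to the backlog, and the async process drains at most `wan_mb_s`.
    """
    backlog, history = 0, []
    for written in local_write_mb_s:
        backlog = max(0, backlog + written - wan_mb_s)
        history.append(backlog)
    return history


if __name__ == "__main__":
    print("bursty:    ", backlog_over_time([300, 0, 0, 0], wan_mb_s=100))
    print("continuous:", backlog_over_time([150] * 4, wan_mb_s=100))
```

A burst drains once writes pause, while a sustained rate above the WAN rate makes the backlog grow without bound.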

Page 23

Overview

• What is Swift?
• Geographically distributed clusters
  • What?
  • Why?
  • How?
• Erasure coded geographically distributed clusters
  • Swift now supports these!
  • … but there’s some new stuff to know about
• Deep dive: erasure coding, fragment duplication, composite rings
• Summary

Page 24

Erasure coding – same durability, less storage

Erasure Code (EC)

Example: 4 data fragments + 2 parity fragments

[Diagram: the data is encoded into fragments 0, 1, 2, 3, 4 and 5.]

Any subset of 4 unique fragments is sufficient to reconstruct the data:

[Diagram: fragments 0, 1, 3 and 5 are decoded back into the data.]

Requires only 1.5x the size of the data to store all fragments
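The reconstruction property can be illustrated with the simplest erasure code: k data fragments plus a single XOR parity fragment, which survives the loss of any one fragment. This is a teaching sketch, not what Swift uses; the Reed-Solomon backends available through liberasurecode generalize the idea to m parity fragments, as in the 4+2 example above.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero padded) plus one XOR parity."""
    size = -(-len(data) // k)                    # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def decode(available: Dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the original data from any k of the k + 1 fragments.

    `available` maps fragment index (0..k) to bytes; index k is parity.
    With one parity fragment at most one data fragment can be missing,
    and it equals the XOR of all the other fragments.
    """
    if len(available) < k:
        raise ValueError("need at least k unique fragments")
    fragments = dict(available)
    missing = [i for i in range(k) if i not in fragments]
    if missing:
        fragments[missing[0]] = reduce(xor_bytes, fragments.values())
    return b"".join(fragments[i] for i in range(k))[:length]


if __name__ == "__main__":
    data = b"same durability, less storage"
    fragments = encode(data, 4)
    survivors = {i: f for i, f in enumerate(fragments) if i != 2}   # lose fragment 2
    print(decode(survivors, 4, len(data)))
```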

Page 25

Erasure coding in Swift

[Diagram: with an EC 4+2 policy, the proxy server encodes the data into fragments 0–5 and the Ring places each fragment on a different server.]

https://github.com/openstack/pyeclib – Python interface to liberasurecode

https://github.com/openstack/liberasurecode – C library with pluggable Erasure Code backends

4 + 2 erasure coding requires approx. 50% of the storage of 3 replicas, with similar durability
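The storage figures in these slides follow from simple arithmetic: each EC fragment is 1/k of the object, so k+m fragments cost (k+m)/k, and duplication multiplies that again. The sketch ignores per-fragment metadata and padding, which is why the slides say “approx.”; the same functions reproduce the 2.5x figures quoted later for EC 4+6 and for EC 4+1 with duplication.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Bytes stored per byte of object data with full replicas."""
    return Fraction(replicas)


def ec_overhead(k: int, m: int, duplication: int = 1) -> Fraction:
    """Bytes stored per byte of object data for EC k+m.

    Each fragment is 1/k of the object, there are k + m fragments, and
    duplication stores each fragment `duplication` times. Fragment
    metadata and padding are ignored.
    """
    return Fraction(duplication * (k + m), k)


if __name__ == "__main__":
    print("EC 4+2:", float(ec_overhead(4, 2)))                                    # 1.5
    print("EC 4+2 / 3 replicas:", float(ec_overhead(4, 2) / replication_overhead(3)))  # 0.5
    print("EC 4+6:", float(ec_overhead(4, 6)))                                    # 2.5
    print("EC 4+1, duplicated x2:", float(ec_overhead(4, 1, 2)))                  # 2.5
    print("4 replicas:", float(replication_overhead(4)))                          # 4.0
```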

Page 26

Erasure coding across regions

• The Ring always tries to disperse fragments across different devices and servers …and regions

[Diagram: EC 4+2 policy; the proxy server encodes the data into fragments 0–5 and the Ring places them on servers in region1 and region2, connected by a WAN.]

Page 27

Erasure coding across regions

[Diagram: the EC 4+2 fragments 0–5 split between region1 and region2, connected by a WAN.]

With EC 4+2 we don’t have enough fragments in one region to reconstruct the data!
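A region can serve reads on its own only if it holds at least k distinct fragment indexes. The helpers below, a hypothetical sketch, check that for a list of (fragment index, region) placements; two copies of the same index count only once.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Placement = List[Tuple[int, str]]      # (fragment index, region) per stored fragment


def unique_fragments_per_region(placement: Placement) -> Dict[str, Set[int]]:
    per_region: Dict[str, Set[int]] = defaultdict(set)
    for frag_index, region in placement:
        per_region[region].add(frag_index)       # duplicates collapse here
    return dict(per_region)


def regions_that_can_decode(placement: Placement, k: int) -> List[str]:
    """Regions holding at least k distinct fragment indexes."""
    return sorted(region
                  for region, frags in unique_fragments_per_region(placement).items()
                  if len(frags) >= k)


if __name__ == "__main__":
    ec_4_2 = [(0, "region1"), (1, "region1"), (2, "region1"),
              (3, "region2"), (4, "region2"), (5, "region2")]
    print(regions_that_can_decode(ec_4_2, k=4))   # []: neither region can decode alone
```

The same check explains the later slides: with EC 4+1 plus duplication, each region passes only if it receives a full set of fragments 0–4 rather than duplicates.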

Page 28

Erasure coding across regions requires more fragments

[Diagram: EC 4+6 policy; the proxy server encodes the data into fragments 0–9, which the Ring spreads across region1 and region2, connected by a WAN.]

How about EC 4+6? It requires 2.5x the size of the data, vs replication, which requires 4x the size of the data for similar durability.

Page 29

Erasure coding time increases with number of parity fragments

[Chart: relative compute time (0 to 0.06) against number of parity fragments (2 to 7); 4 data fragments, isa_l_rs_cauchy backend, 40MB object.]

Page 30

EC duplication: more fragments, less compute

[Diagram: EC 4+1 policy plus duplication; the proxy server encodes the data into fragments 0–4 and each fragment is stored twice, so region1 and region2 each hold fragments 0–4, connected by a WAN.]

Each region has EC 4+1 fragments: this requires 2.5x the size of the data, vs replication, which requires 4x the size of the data for similar durability

Page 31

But duplicates must be correctly dispersed…

[Diagram: EC 4+1 policy plus duplication, but the Ring has placed both copies of some fragments in the same region, so a region holds duplicates instead of a full set of fragments 0–4.]

We don’t have enough unique fragments in this region to reconstruct the data!

Page 32

Solution: EC duplication + composite rings

[Diagram: PUT a/c/o with an EC 4+1 policy plus duplication; the proxy server encodes fragments 0–4 and writes one full set of fragments to region1 and another to region2, across the WAN.]

Each region has its own ‘component’ ring; this guarantees a set of unique fragments in each region.
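A sketch of why per-region component rings guarantee a full set of fragments. Swift configures EC duplication with `ec_duplication_factor` in a storage policy section of swift.conf. The sketch assumes, as Swift's EC duplication does, that node index i stores fragment index i mod (k+m), and it assumes the composite ring concatenates its component rings' node lists for each partition. Node names are hypothetical.

```python
from typing import List, Tuple


def compose(component_nodes: List[List[str]]) -> List[str]:
    """Concatenate per-region node lists for one partition.

    Each component ring assigns k + m nodes within its own region, so
    component r occupies node indexes r*(k+m) .. (r+1)*(k+m) - 1.
    """
    return [node for ring in component_nodes for node in ring]


def fragment_index(node_index: int, k: int, m: int) -> int:
    """Assumed EC duplication rule: node i stores fragment i mod (k + m)."""
    return node_index % (k + m)


def fragments_by_node(component_nodes: List[List[str]], k: int,
                      m: int) -> List[Tuple[str, int]]:
    nodes = compose(component_nodes)
    return [(node, fragment_index(i, k, m)) for i, node in enumerate(nodes)]


if __name__ == "__main__":
    region1 = [f"region1/s{n}" for n in range(1, 6)]
    region2 = [f"region2/s{n}" for n in range(6, 11)]
    for node, frag in fragments_by_node([region1, region2], k=4, m=1):
        print(node, "holds fragment", frag)
```

Because each component ring contributes exactly k+m consecutive node indexes, every region receives each fragment index once, whatever devices its own ring chose.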

Page 33

Summary

• Swift enables you to build geographically distributed clusters
  • Good for disaster recovery and data locality
  • Tuning via options in the Swift proxy server
    • write_affinity, read_affinity
  • Understand your workloads!
• Erasure coded storage policies can also be geographically distributed¹
  • Uses new features: EC fragment duplication and composite rings

¹ new in Swift 2.15.0

Page 34

Swift welcomes new users and contributors

• You can find us on freenode in #openstack-swift
• Project: https://launchpad.net/swift
• Docs: https://docs.openstack.org/swift
• Code: https://github.com/openstack/swift