Synchronous Multi-Master Clusters in WAN
Building Synchronous MySQL clusters in Cloud and WAN
Alexey Yurchenko, Codership Oy
www.codership.com
A Very Dirrrty Word
Sssssssssss...
Synchronous.
what is it good for???
Data Safety
Asynchronous Replication: potential data loss
[Diagram: Client, Master, Slave. The client sends COMMIT and the master replies OK; Replicate to the slave is not waited for.]
Data Safety
Synchronous Replication: additional latency
[Diagram: Client, Master, Slave. The client sends COMMIT; the master replicates to the slave and waits for the ACK before replying OK.]
Data Safety
Disaster Recovery:
[Diagram: replication between DC1 and DC2]
#1
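Multi-Master
[Diagram: Client1, Master1, Master2, Client2. Client1 sends COMMIT to Master1 while Client2 sends COMMIT to Master2, and the transactions replicate to the other master. Each master runs CONFLICT DETECTION and CONFLICT RESOLUTION on the resulting DEADLOCK: one transaction ends in COMMIT (OK), the other in ROLLBACK.]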
Access Latency Elimination
#2
Benchmark Setup (Amazon EC2)
[Diagram: us-east and eu-west regions, ~6000 km apart, ~90 ms RTT]
Access Latency Elimination
client location   us-east server   US-EU cluster   change
us-east           28.03 ms         119.80 ms       ~4.3x
eu-west           1953.89 ms       122.92 ms       ~0.06x
What Happened?
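[Diagram: with a single server, SQL traffic (reads, writes, etc.) crosses the ~6000 km, ~90 ms RTT link. With the cluster, SQL traffic stays local and only replication traffic (commits only) crosses the link.]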
To Sync or Semi-sync?
Look, Ma! No 2-phase commit!
[Diagram: Client, Master, Slave. COMMIT, Replicate, ACK, OK. Slave didn't commit!]
To Sync or Semi-sync?
Synchronous (master rolls back and stops):
● Data redundancy preserved (sort of: slave is dead)
● Availability compromised (!!!)
Semi-synchronous (master continues):
● Data redundancy compromised
● Availability preserved
[Diagram: Master, Slave, Replicate]
To Sync or Semi-sync?
For all practical purposes (in production), replication is supposed to protect against master loss, not slave loss (slave loss is mitigated by adding more slaves), in order to increase the availability of the service.
Ironically, fully synchronous replication is not only impractically slow, it is detrimental to the availability goal.
Synchronous Replication in WAN
The Latency And How To Deal With It.
The Latency And How to Deal With It
Latency: 1 RTT – 1.5 RTT (100 – 500 ms)
(<200 ms should be practically possible)
Trx rate <= 1/Latency
(10 – 2 transactions per second? Blast!)
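The bound above can be checked with a little arithmetic. This is only a sketch: the function name `max_commit_rate` is made up for illustration, and it assumes a single client session that waits for each commit to finish replicating before it issues the next one.

```python
def max_commit_rate(latency_ms):
    """Upper bound on commits per second for ONE session that commits
    strictly one transaction at a time (Trx rate <= 1/Latency)."""
    return 1000.0 / latency_ms


# Latencies taken from the slide's 100 - 500 ms range.
for latency_ms in (100, 200, 500):
    print(f"{latency_ms} ms latency -> at most {max_commit_rate(latency_ms):.0f} trx/s")
```

The bound is per session, which is why the slide's 10 and 2 transactions per second correspond to the two ends of the latency range.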
The Latency And How to Deal With It
The way to deal with any latency:
1) Buffering:
AUTOCOMMIT UPDATEs → multi-statement transactions
2) Parallelization:
1 client session → 10 client sessions
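A rough model of why buffering and parallelization help, under idealized assumptions: commit latency dominates, sessions never conflict with each other, and neither the link nor the nodes saturate. The function name and parameters are hypothetical and exist only for this illustration.

```python
def updates_per_second(latency_ms, statements_per_trx=1, sessions=1):
    """Idealized throughput bound: every session commits once per latency
    period, and every commit carries `statements_per_trx` updates."""
    commits_per_session = 1000.0 / latency_ms
    return commits_per_session * statements_per_trx * sessions


latency_ms = 100  # lower end of the latency range on the previous slide

# Autocommit updates from a single session: one update per commit.
print(updates_per_second(latency_ms))

# Buffering: 100 updates grouped into each multi-statement transaction.
print(updates_per_second(latency_ms, statements_per_trx=100))

# Parallelization on top of buffering: 10 client sessions.
print(updates_per_second(latency_ms, statements_per_trx=100, sessions=10))
```

Real workloads will fall below these figures, since larger transactions and more sessions both raise the chance of conflicts.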
Synchronous Replication in WAN
Galera Cluster for MySQL variants
Galera Cluster for MySQL variants
[Diagram: mysqld consists of MySQL with the wsrep patch, which talks through the wsrep API to Galera, a dynamic library. Galera handles synchronous communication with the cluster (other nodes).]
Galera Cluster for MySQL variants
[Diagram: three variants, each on top of Galera: MySQL-wsrep, MariaDB Galera Cluster, Percona XtraDB Cluster]
Galera Cluster and CAP Theorem
[Diagram: CAP triangle with corners Consistency, Availability, Partition Tolerance; labels "Fixed:" and "timeouts"]
Synchronous Replication in WAN
Goals:
● Disaster Recovery
● Performance
● Service Availability
DO's and DONT's
Synchronous Replication in WAN: DO's
Invest in a good WAN link.
(You invest in nodes. The link is as much a part of the cluster as the nodes are.)
Synchronous Replication in WAN: DO's
Categorize your data:
1) Rare, small writes, frequent reads, global data – good.
2) Heavy writes, few reads, local data – bad.
Synchronous Replication in WAN: DO's
Categorize your data (OpenStack):
1) Keystone identity data, Glance image metadata:
mostly reads, small writes, data of global interest.
2) Ceilometer monitoring data:
almost write-only, no need to share globally – store in MongoDB.
Jay Pipes, “Tales from the Field: Backend Data Storage in OpenStack Clouds”
Synchronous Replication in WAN: DO's
Configure timeouts:
● All Galera timeouts and periods should be no less than WAN round trip times.
● Defaults should be suitable for networks with up to 500 ms RTTs.
● The higher the timeouts, the more partition tolerant and the less available the cluster is (CAP theorem).
● Timeouts relation: RTT <= evs.suspect_timeout <= evs.inactive_timeout <= evs.install_timeout
● evs.suspect_timeout is the timeout to detect single node partition/failure
● Further info: http://galeracluster.com/documentation-webpages/configurationtips.html#wan-replication
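The timeouts relation above can be sanity-checked before a configuration is rolled out. The helper below is a hypothetical sketch, not part of Galera: it takes values already converted to seconds, and the sample numbers are illustrative only, not recommended settings.

```python
def check_timeouts(rtt_s, suspect_s, inactive_s, install_s):
    """Return a list of violated constraints from the chain
    RTT <= evs.suspect_timeout <= evs.inactive_timeout <= evs.install_timeout.
    An empty list means the ordering holds."""
    chain = [
        ("RTT", rtt_s),
        ("evs.suspect_timeout", suspect_s),
        ("evs.inactive_timeout", inactive_s),
        ("evs.install_timeout", install_s),
    ]
    problems = []
    # Compare each value with its right-hand neighbour in the chain.
    for (left_name, left), (right_name, right) in zip(chain, chain[1:]):
        if left > right:
            problems.append(f"{left_name} ({left}s) > {right_name} ({right}s)")
    return problems


# Illustrative values only: ~90 ms RTT as in the benchmark setup.
print(check_timeouts(rtt_s=0.09, suspect_s=5, inactive_s=15, install_s=15))

# A suspect timeout shorter than the round trip is reported as a violation.
print(check_timeouts(rtt_s=0.5, suspect_s=0.2, inactive_s=15, install_s=15))
```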
Synchronous Replication in WAN: DO's
Configure cluster segments:
[Diagram: three datacenters with three nodes each. DC1 nodes are in segment 1, DC2 nodes in segment 2, DC3 nodes in segment 3.]
Synchronous Replication in WAN: DO's
Choose odd number of nodes and odd number of datacenters:
● Most popular choice: 3x3
● Also observed in the field: 5x3 and 3x5
Synchronous Replication in WAN: DO's
3 is better than 2!
[Diagram: DC1, DC2, DC3]
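One way to see why three datacenters beat two is to count what remains after a whole datacenter is lost. The sketch below assumes every node has equal weight and that the surviving part needs a strict majority of the original nodes to keep working; the function names are hypothetical.

```python
def has_quorum(surviving_nodes, total_nodes):
    """Strict majority: more than half of the original nodes remain."""
    return 2 * surviving_nodes > total_nodes


def survives_any_dc_loss(nodes_per_dc):
    """True if the cluster keeps quorum no matter which single
    datacenter (with all of its nodes) disappears."""
    total = sum(nodes_per_dc)
    return all(has_quorum(total - lost, total) for lost in nodes_per_dc)


print(survives_any_dc_loss([3, 3]))     # two DCs, evenly split
print(survives_any_dc_loss([3, 3, 3]))  # the "3x3" layout
print(survives_any_dc_loss([5, 5, 5]))  # five nodes in each of three DCs
```

With two datacenters, an even split leaves exactly half the nodes after a failure, which is not a majority; an uneven split only survives the loss of the smaller side.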
Synchronous Replication in WAN: DONT's
1) Hot Spots
Synchronous Replication in WAN: DONT's
[Diagram: hotspot, 1/RTT]
Synchronous Replication in WAN: DONT's
1) Hot Spots
2) Poor Links
Synchronous Replication in WAN: DONT's
Full packet loss
→ the node is not with us
No packet loss
→ the node is with us ???
Synchronous – with who?
Synchronous Replication in WAN: DONT's
1) Hot Spots
2) Poor Links
3) Huge Transactions
Synchronous Replication in WAN: DONT's
Huge transactions kill concurrency:
a) Long to replicate
b) Long to certify
c) Long to apply on slave
→ SLAVE LAG
Synchronous Replication in WAN: DONT's
1) Hot Spots
2) Poor Links
3) Huge Transactions
4) No Primary Keys
Synchronous Replication in WAN: DONT's
No PRIMARY KEY:
mysql> DELETE FROM 10M_rows_no_PK_table;
=> 50 000 000 000 000 rows scan.
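The figure above is consistent with one plausible reading, stated here as an assumption: each deleted row is applied individually on the receiving node, and without a primary key each row has to be located by a table scan that covers, on average, half of the table. The function name is made up for this illustration.

```python
def rows_scanned_without_pk(table_rows):
    """Estimated rows examined when every row of the table is looked up
    by a scan that covers, on average, half of the table (assumption)."""
    average_scan_per_row = table_rows // 2
    return table_rows * average_scan_per_row


total = rows_scanned_without_pk(10_000_000)
print(f"{total:,}".replace(",", " "))  # digits grouped with spaces, as on the slide
```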
If Synchronous Doesn't Work Out
Native MySQL Asynchronous Replication Between Galera Clusters
(log_slave_updates = ON)
[Diagram: cluster Galera1 (nodes 1, 2, 3) and cluster Galera2 (nodes A, B, C), with an async Master → Slave link between them]
If Synchronous Doesn't Work Out
Native MySQL Asynchronous Replication Between Galera Clusters
[Diagram: cluster Galera1 (nodes 1, 2, 3) and cluster Galera2 (nodes A, B, C), with an async Master → Slave link between them]
If Synchronous Doesn't Work Out
Native MySQL Asynchronous Replication Between Galera Clusters
[Diagram: cluster Galera1 (nodes 1 and 3; node 2 is absent) and cluster Galera2 (nodes A, B, C), with an async Master → Slave link between them]
If Synchronous Doesn't Work Out
Native MySQL Asynchronous Replication Between MariaDB Galera Clusters
(log_slave_updates = OFF)
[Diagram: cluster Galera1 (nodes 1, 2, 3) and cluster Galera2 (nodes A, B, C), with an async Master → Slave link between them]
Synchronous Replication in WAN
Q & A