HA Reloaded

IVAN ZORATTI Chief Technology Officer High Availability Reloaded 1201.01.01 Oracle, MySQL and InnoDB are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Tuesday, 24 January 12


Transcript of HA Reloaded

Page 1: HA Reloaded

IVAN ZORATTI, Chief Technology Officer

High Availability Reloaded

1201.01.01

Oracle, MySQL and InnoDB are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Tuesday, 24 January 12

Page 2: HA Reloaded

Agenda

• SkySQL - 3 (+1) slides!

• A bit of theory

• High availability solutions

• ...and the famous last words!


Page 3: HA Reloaded

SkySQL Ab

• Funded by:
– MySQL® AB founders Monty Widenius and David Axmark
– US investment group OnCorps.org
• A team of 40, operating in 14 countries, 90% from MySQL® AB
• Backed by:
– Product Engineering Monty Program Ab
– Top community contributors, commercial partners and end users


Page 4: HA Reloaded

SkySQL Offering

• SkySQL Enterprise Subscriptions
– Monitoring, Administration and End User tools
– Specialised modules for High Availability and performance improvements (1)
• SkySQL Enterprise Cluster and SkySQL Enterprise HA
– Up to L3 Technical and Consultative Support for the most used MySQL® distributions and branches
• SkySQL Consulting
– Top class team for MySQL® technology
– Extended service offering from Health Check to continuous administration
• SkySQL Training
– MySQL® Training and Certification

(1) Option


Page 5: HA Reloaded

The SkySQL Reference Architecture

Components

[Diagram: reference architecture components; recoverable labels: Migration Tools, Integration Tools]


Page 6: HA Reloaded

High Availability... a bit of theory


Page 7: HA Reloaded

High Availability

“High availability is a system design protocol and associated implementation that ensures a certain degree of operational continuity during a given measurement period.”


Page 8: HA Reloaded

Fault-tolerant?

“Fault-tolerant design enables a system to continue operation, possibly at a reduced level (also known as graceful degradation), rather than failing completely, when some part of the system fails.”


Page 9: HA Reloaded

Switchover / Failover

• Switchover
– “Switchover is the capability to manually switch over from one system to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network.”
• Failover
– “Failover is the capability to switch over automatically to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active application, server, system, or network.”
• Aided Switchover?
• Failback?
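The difference between the two definitions, and the "aided" variant in between, can be sketched as a small state machine. This is only an illustration under stated assumptions: the class `HeartbeatMonitor`, its fields and the threshold of three missed heartbeats are hypothetical, and "aided switchover" is read here as automatic detection followed by an operator-triggered switch.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy monitor for one active server with one standby (hypothetical names)."""

    max_missed: int = 3      # consecutive failed checks before reacting (assumed value)
    automatic: bool = True   # True: failover, False: aided switchover
    missed: int = 0
    active: str = "primary"
    events: list = field(default_factory=list)

    def record(self, heartbeat_ok: bool) -> None:
        """Register the result of one health check."""
        if self.active != "primary":
            return  # the standby has already taken over
        self.missed = 0 if heartbeat_ok else self.missed + 1
        if self.missed == self.max_missed:
            if self.automatic:
                # Failover: the switch happens without human intervention.
                self.active = "standby"
                self.events.append("failover: standby promoted automatically")
            else:
                # Aided switchover: detection is automatic, the decision is not.
                self.events.append("alert: operator should switch over")

    def switch_over(self) -> None:
        """Manual action taken by an operator."""
        self.active = "standby"
        self.events.append("switchover: standby promoted by operator")


auto = HeartbeatMonitor(automatic=True)
aided = HeartbeatMonitor(automatic=False)
for ok in (True, False, False, False):
    auto.record(ok)
    aided.record(ok)
aided.switch_over()  # the operator reacts to the alert

print(auto.events)
print(aided.events)
```

Both monitors end with the standby active; the only difference is who made the decision.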


Page 10: HA Reloaded

Downtime

• Planned/Scheduled

• Unplanned/Unscheduled

• “Downtime or outage duration refers to a period of time that a system fails to provide or perform its primary function.”
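As a rough illustration of what a downtime budget looks like, the sketch below converts an availability target into the minutes of downtime it allows per year. The function name is hypothetical and the calculation assumes a 365-day year with no distinction between planned and unplanned downtime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, "five nines" leaves a little over five minutes per year, which is worth comparing with the time a single failover takes.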


Page 11: HA Reloaded

Single Point Of Failure - SPOF

“A Single Point of Failure (SPOF) is a part of a system which, if it fails, will stop the entire system from working.”
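The effect of a SPOF on overall availability can be approximated with basic probability. The sketch below is a simplification, assuming that components fail independently (which real systems rarely do); the function names and the example figures are hypothetical.

```python
from math import prod


def serial_availability(availabilities):
    """All components are required: each one is a single point of failure."""
    return prod(availabilities)


def redundant_availability(availabilities):
    """Any one component is enough: the group fails only if all of them fail."""
    return 1 - prod(1 - a for a in availabilities)


app, db = 0.999, 0.99  # assumed figures, for illustration only

single_db = serial_availability([app, db])
mirrored_db = serial_availability([app, redundant_availability([db, db])])

print(f"application + single database:    {single_db:.5f}")
print(f"application + redundant database: {mirrored_db:.5f}")
```

In this model a chain of required components is always less available than its weakest link, while duplicating the weakest link raises the total.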


Page 12: HA Reloaded

Disaster Recovery and Business Continuity

“Disaster recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.”

“Disaster recovery planning is a subset of a larger process known as business continuity planning and should include planning for resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure.”


Page 14: HA Reloaded

Designing a Highly Available System

• Which level of High Availability do I need?

• Do I require no loss of data?

• Do I need failover or is switchover enough?

• Can I provide a reasonable service when a component is down?


Page 15: HA Reloaded

Something to clarify...

• Availability vs Scalability

• HA Costs

• HA for your systems, not only for your database

• Review your SLAs


Page 16: HA Reloaded

High Availability Solutions


Page 17: HA Reloaded

High Availability with MySQL

Higher Availability

• Combined solutions
• Shared nothing distributed cluster with MySQL Cluster
• Geographical Replication for disaster recovery
• Virtualised Environments
• Active/Passive Clusters through shared storage
• MySQL synchronous replication
• Generic synchronous replication
• MySQL Replication with agents and failover
• MySQL Replication


Page 18: HA Reloaded

MySQL Replication

• Something you may have missed...
– Asynchronous or Semi-synchronous
– Pros and Cons of RBR vs SBR
– Mono-thread pull from the slaves
– sync_binlog = 0/1
– Antelope vs Barracuda
– Group Commit
– Multi-engines
– Rolling upgrades

[Diagram: a Read-Write master with its binlog, replicating to Read-Only slaves, each with a relaylog]

Page 19: HA Reloaded

MySQL Replication with MMM (Multi-Master replication Manager)

• Master-Master features:
– Monitoring
– Automatic failover
– Data backup
– Resynch

• http://code.openark.org/blog/mysql/problems-with-mmm-for-mysql

• http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/

[Diagram: mmm_mond monitoring two nodes that each run mmm_agentd and write a binlog; Read-Write and Read-Only roles; relaylogs on the replicas]

http://mysql-mmm.org


Page 20: HA Reloaded

MySQL Replication with MHA

• Something to consider...
– read-only=1 and log-bin on slaves
– Master IP failover
– Filtering rules
– Multi-tier replication
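When a master fails, a replication manager has to decide which slave to promote. The sketch below shows the general idea of picking the slave that has read furthest into the master's binlog; it is not taken from MHA's code. The function name and sample hosts are hypothetical. `Master_Log_File` and `Read_Master_Log_Pos` are columns of `SHOW SLAVE STATUS`, and the comparison assumes binlog file names share a base name with a zero-padded numeric suffix.

```python
def most_advanced_slave(statuses):
    """Return the host whose I/O thread has read furthest into the master binlog.

    `statuses` maps a host name to a dict holding two SHOW SLAVE STATUS columns.
    File names are compared as strings, which only works for zero-padded
    suffixes such as mysql-bin.000012 (assumption).
    """
    return max(
        statuses,
        key=lambda host: (
            statuses[host]["Master_Log_File"],
            statuses[host]["Read_Master_Log_Pos"],
        ),
    )


# Hypothetical sample data, as it might be collected from three slaves.
statuses = {
    "slave1": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 4096},
    "slave2": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 8192},
    "slave3": {"Master_Log_File": "mysql-bin.000011", "Read_Master_Log_Pos": 99999},
}

candidate = most_advanced_slave(statuses)
print("promote:", candidate)
```

A real tool would also need to bring the other slaves up to the same position before repointing them to the new master.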

http://code.google.com/p/mysql-master-ha/


Page 21: HA Reloaded

Tungsten Replicator

• Open Source, heterogeneous replication
• Truly multi-master and fan-in with Global ID
• Per-schema multi-thread

http://code.google.com/p/tungsten-replicator/

[Diagram: four nodes, each with a Replicator agent; Read-Write]


Page 22: HA Reloaded

Tungsten Enterprise

• Tungsten Replicator +
– Client Connector with R/W split and load balancing
– Replication Monitoring
– Integrated backup

http://www.continuent.com/solutions/overview

[Diagram: five Connectors in front of four nodes, each running Replicator + Monitor; Read-Write]


Page 23: HA Reloaded

Synchronous Replication with DRBD

• Typical Active/Standby
• Cross active/active servers implementations
• Possible issues:
– Dependencies
– Infrastructure SPOFs
– Write performance impact
– InnoDB only
• DRBD in a virtualized environment

[Diagram: Active/Hot Server and Passive/Std-by Server, each with its own Block Device]


Page 24: HA Reloaded

Synchronous Replication through DRBD

Configuration

[Diagram: Active/Hot Server and Passive/Std-by Server, each with a Block Device labelled /dev/sdb and /mysqldata. Network labels: 192.168.1.1; Gateway; 192.168.1.X; VIP 192.168.1.2; HB1: 10.0.3.X; HB2: 10.0.4.X; DRBD: 10.0.5.X; 15 and 16]


Page 25: HA Reloaded

Synchronous Replication with Galera

• Synchronous replication for InnoDB
• Multi-master, no SPOF
• Application failover must be managed
• Conflict resolution
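In a multi-master setup, two nodes can accept conflicting writes, and one common outcome is that the losing transaction is rolled back at commit time and the application has to retry it. The following sketch shows such a retry loop on the application side. It is an assumption-laden illustration: `CommitAborted` is a hypothetical stand-in for whatever error a real driver raises, and the retry count and delay are arbitrary.

```python
import time


class CommitAborted(Exception):
    """Hypothetical stand-in for the error raised when a commit loses a conflict."""


def run_with_retry(transaction, attempts=3, base_delay=0.0):
    """Run `transaction` (a callable), retrying when its commit is aborted."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except CommitAborted:
            if attempt == attempts:
                raise  # give up and let the caller handle it
            time.sleep(base_delay * attempt)  # simple linear back-off


# Demo: a fake transaction that loses the conflict twice, then succeeds.
calls = {"count": 0}


def flaky_transaction():
    calls["count"] += 1
    if calls["count"] < 3:
        raise CommitAborted("conflicting write on another node")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, "after", calls["count"], "attempts")
```

The whole transaction is passed as a callable because all of its statements, not only the COMMIT, must be replayed after an abort.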

http://www.codership.com

[Diagram: three nodes connected through wsrep; Read-Write]


Page 26: HA Reloaded

Percona XtraDB Cluster

• Alpha version of Galera + XtraDB
• Multi-master, no SPOF
• Application failover must be managed
• Conflict resolution with aborted COMMITs
• Auto Increment
• No XA TXN
• Issues with operations on tables without a primary key
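The auto-increment point matters because several nodes accept inserts at the same time. One common way to keep generated keys from colliding is to interleave them with MySQL's `auto_increment_increment` and `auto_increment_offset` variables. The sketch below only simulates the resulting sequences; the helper name and the three-node layout are assumptions, not something stated on the slide.

```python
def generated_ids(offset, increment, count):
    """Simulate the keys one node generates with the given offset and increment."""
    return [offset + i * increment for i in range(count)]


NODES = 3  # assumed cluster size

# Each node gets the same increment (the number of nodes) and a distinct offset.
ids_per_node = {
    f"node{n}": generated_ids(offset=n, increment=NODES, count=4)
    for n in range(1, NODES + 1)
}

for node, ids in ids_per_node.items():
    print(node, ids)

all_ids = [i for ids in ids_per_node.values() for i in ids]
print("collisions:", len(all_ids) - len(set(all_ids)))
```

The price of this scheme is that keys are no longer consecutive on any single node, so applications should not rely on gap-free sequences.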

http://www.percona.com/doc/percona-xtradb-cluster

[Diagram: three nodes connected through wsrep; Read-Write]


Page 27: HA Reloaded

SchoonerSQL

• Synchronous master-slave replication for InnoDB
• Retrieve/Inject in the transaction log and buffer pool
• Monitoring/Administration tool
• Closed source


Page 28: HA Reloaded

Active/Passive Clusters using Shared Storage

• Points to consider:
– Redundancy and replication must be guaranteed by the shared storage (and this is not trivial)
– InnoDB only
– File Systems

[Diagram: Active/Hot Server and Passive/Std-by Server attached to Shared Storage]


Page 29: HA Reloaded

Active/Passive Clusters using Shared Storage

Large Deployments

[Diagram: Shared Storage with labels in01 to in08, 01 to 08 and VIP01 to VIP08]


Page 30: HA Reloaded

Active/Passive Clusters using Shared Storage

Failover in Large Deployments

[Diagram: the same deployment during a failover; Shared Storage with labels in01 to in08, 01 to 08 and VIP01 to VIP08]


Page 31: HA Reloaded

Virtualised Environments

• Data storage, high availability and load balancing are provided and managed by the virtualisation software
• In case of fault, the virtualisation software restarts the virtual machine on any other physical server
• MySQL Replication for disaster recovery
• InnoDB only

[Diagram: Shared Storage with two sets of labels 01 to 08]


Page 32: HA Reloaded

Geographical Replication for Disaster Recovery

• Master-Master Asynchronous Replication is used to update the backup data centre

• In case of fault, the network traffic is redirected to the backup data centre. Failback must be executed manually

• Cross-platform and cross-engine

[Diagram: Active Data Centre and Backup Data Centre]


Page 33: HA Reloaded

Storage Snapshots for Disaster Recovery

• Snapshots are managed by the NAS and SAN firmware. There is usually a short read-only freeze

• Snapshots can be used as run-time backup

• InnoDB only; NetApp NASs and firmware are certified using Snapshot and SnapMirror

[Diagram: Active Data Centre and Backup Data Centre]


Page 34: HA Reloaded

MySQL Cluster

• Shared-nothing, fully transactional and distributed architecture used for high volume and small transactions.
• MySQL Cluster is based on the NDB (Network DataBase) Storage Engine
• Data is distributed for scalability and performance, and it is replicated for redundancy on multiple data nodes.
• Nodes in a cluster:
– SQL Nodes: provide the SQL interface to NDB
– API Nodes: provide the native NDB API
– Data Nodes: store and retrieve data, manage transactions
– Management Nodes: manage the Cluster
• Load balanced
• Memory or disk-based
• Geographically replicated for disaster recovery with conflict resolution
• Full online operation for maintenance and administration
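The combination of distribution and replication described above can be sketched as follows: a row's key is hashed to pick a group of data nodes, and every node in that group holds a copy. This is a simplified model, not NDB's actual algorithm; the CRC32 hash, the function names and the two-groups-of-two layout are assumptions chosen for illustration.

```python
import zlib


def node_group_for(key: str, group_count: int) -> int:
    """Pick a node group from a hash of the row key (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % group_count


def replicas_for(key: str, node_groups):
    """Return the data nodes that would store a copy of the row."""
    return node_groups[node_group_for(key, len(node_groups))]


# Assumed layout: four data nodes arranged as two groups of two replicas each.
node_groups = [["data1", "data2"], ["data3", "data4"]]

for key in ("customer:1", "customer:2", "customer:3", "customer:4"):
    print(key, "->", replicas_for(key, node_groups))
```

In this model, losing one node of a group leaves every row available on the surviving replica, while losing a whole group makes that share of the data unavailable.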

[Diagram: Application Nodes (NDB API, ClusterJ/JPA), SQL Nodes, Data Nodes and Management Nodes]


Page 35: HA Reloaded

Client-based Failover and Proxies

• Connector/J
– jdbc:mysql://[host][,failoverhost...][:port]
• mysqlnd_ms for PHP
– connection pooling for mysqli, mysql and PDO_MYSQL

• ScaleBase
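The Connector/J URL above lists a primary host followed by failover hosts. The same idea, trying each host in order until one accepts the connection, can be sketched in a few lines. This is a generic illustration rather than the behaviour of any specific connector: `connect_with_failover` and the fake `connect` callable are hypothetical, and a real client would pass its driver's connect function instead.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(hosts: Sequence[str], connect: Callable[[str], T]) -> T:
    """Try each host in order and return the first successful connection."""
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next host
    raise ConnectionError(f"all hosts failed: {list(hosts)}") from last_error


# Demo with a fake connect function: the first host is down.
attempted = []


def fake_connect(host: str) -> str:
    attempted.append(host)
    if host == "db1":
        raise ConnectionError(f"{host} is down")
    return f"connected to {host}"


connection = connect_with_failover(["db1", "db2", "db3"], fake_connect)
print(connection)
print("hosts tried:", attempted)
```

Client-side failover of this kind only redirects connections; it does not decide which server is allowed to accept writes.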


Page 36: HA Reloaded

The absolutely necessary comparison chart...

Solution                  100% Data   All Storage   Automatic   Performance   Easy admin/   Scalability
                          Safe        Engines       Failover    Overhead      config        (*** = best)
                                                                (* = best)    (* = best)

MySQL Replication         ✘           ✔             ✘!          *             *             **
MHA                       ✘           ✔             ✔           *             *             **
Tungsten                  ✔           ✔             ✔           **            *             ***
DRBD                      ✔           ✘             ✔           ***           ***           *
Galera / XtraDB Cluster   ✘           ✘             ✘!          **            *             **
SchoonerSQL               ✔           ✘             ✔           **            *             **
Shared Cluster            ✔           ✘             ✘!          -             *             *
VM                        ✔           ✘             ✔           **            *             *
Geo Replication           ✘           ✔             ✘           *             **            *
Storage Snapshots         ✘           ✘             ✘           **            **            *
MySQL Cluster             ✔           ✘             ✔           *             ***           **


Page 37: HA Reloaded

The famous last words...

• I need 5 nines
• Everything must be automatic
• I want to migrate to MySQL Cluster
• I can’t afford to lose any data
• I need a sub-second failover


Page 38: HA Reloaded

The famous last words...

• I need 5 nines
– Implement what you really need
• Everything must be automatic
– Aided switchover is sometimes more effective, inexpensive and easy to implement/administer
• I want to migrate to MySQL Cluster
– Is your application designed for Cluster?
• I can’t afford to lose any data
– People lose data every day. Is the drop in performance worth it?
• I need a sub-second failover
– Check the timeout periods and the caching warm-ups
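The remark about timeout periods can be turned into simple arithmetic: the time before service resumes is at least the detection time plus the promotion time plus the time clients need to reconnect. The sketch below adds these up; the function name and all figures are hypothetical and only show why a sub-second target is hard to reach.

```python
def failover_seconds(check_interval, missed_checks, promotion, client_reconnect):
    """Lower bound on the time between a failure and resumed service."""
    detection = check_interval * missed_checks  # time until the failure is declared
    return detection + promotion + client_reconnect


# Assumed figures: a check every second, three misses required,
# 2 s to promote the standby and 5 s before clients time out and reconnect.
total = failover_seconds(
    check_interval=1.0, missed_checks=3, promotion=2.0, client_reconnect=5.0
)
print(f"service resumes after at least {total:.1f} s")
```

Cache warm-up is not included in this sum: the service is back at that point, but it may run slower until the caches of the new active server are populated.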


Page 39: HA Reloaded

SkySQL Enterprise HA

• Full HA solution, supported on
– Platforms: Linux, Windows, Solaris x86
– DB Servers: Oracle MySQL, MariaDB, Percona Server
– 2 to 3 days implementation guaranteed with acceptance tests
• Technologies:
– MySQL Replication
– DRBD Active/Passive or Cross Active
– MHA Tool with/without Multi-tier Replication
– Linux or Windows Shared Storage
– MySQL Cluster
– Tungsten Enterprise
