HA Reloaded

IVAN ZORATTI, Chief Technology Officer

High Availability Reloaded

Oracle, MySQL and InnoDB are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Tuesday, 24 January 12

Agenda

• SkySQL - 3 (+1) slides!

• A bit of theory

• High availability solutions

• ...and the famous last words!


SkySQL Ab

• Funded by:
–MySQL® AB founders Monty Widenius and David Axmark
–US Investment group OnCorps.org

• A team of 40, operating in 14 countries, 90% from MySQL® AB

• Backed by:
–Product Engineering MontyProgram Ab
–Top Community contributors, commercial partners and end users


SkySQL Offering

• SkySQL Enterprise Subscriptions
– Monitoring, Administration and End User tools
– Specialised modules for High Availability and performance improvements (1)

• SkySQL Enterprise Cluster and SkySQL Enterprise HA
– Up to L3 Technical and Consultative Support for the most used MySQL® distributions and branches

• SkySQL Consulting
– Top class team for MySQL® technology
– Extended service offering from Health Check to continuous administration

• SkySQL Training
– MySQL® Training and Certification

(1) Option


The SkySQL Reference Architecture: Components

[Diagram labels: Migration Tools; Integration Tools]


High Availability... a bit of theory


High Availability

“High availability is a system design protocol and associated implementation that ensures a certain degree of operational continuity during a given measurement period.”


Fault-tolerant?

“Fault-tolerant design enables a system to continue operation, possibly at a reduced level (also known as graceful degradation), rather than failing completely, when some part of the system fails.”



Switchover / Failover

• Switchover
– “Switchover is the capability to manually switch over from one system to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network.”

• Failover
– “Failover is the capability to switch over automatically to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active application, server, system, or network.”

• Aided Switchover?

• Failback?
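The difference between the two definitions can be sketched in a few lines of Python. This is only an illustration and does not come from the slides: the class name, the `failure_threshold` parameter and the role names "primary" and "standby" are assumptions. Failover is triggered automatically after a number of consecutive failed health checks, while switchover is an explicit operator call. Failback is deliberately left as a manual step.

```python
class FailoverMonitor:
    """Toy model of automatic failover versus manual switchover (hypothetical)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active = "primary"

    def record_check(self, primary_healthy):
        """Record one health check of the primary and return the active role."""
        if self.active != "primary":
            # Already failed over: going back (failback) is not automatic here.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Failover: the switch happens without human intervention.
                self.active = "standby"
        return self.active

    def switchover(self, target):
        """Switchover: an operator decides which server becomes active."""
        if target not in ("primary", "standby"):
            raise ValueError("target must be 'primary' or 'standby'")
        self.active = target
        self.consecutive_failures = 0
        return self.active


monitor = FailoverMonitor(failure_threshold=2)
for healthy in (False, True, False, False):
    print(healthy, "->", monitor.record_check(healthy))
```

An "aided switchover" would sit between the two: the monitor detects the fault and prepares the change, but a person confirms it by calling `switchover`.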



Downtime

• Planned/Scheduled

• Unplanned/Unscheduled

• “Downtime or outage duration refers to a period of time that a system fails to provide or perform its primary function.”





Single Point Of Failure - SPOF

“A Single Point of Failure (SPOF) is a part of a system which, if it fails, will stop the entire system from working.”


Disaster Recovery and Business Continuity

“Disaster recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.”

“Disaster recovery planning is a subset of a larger process known as business continuity planning and should include planning for resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure.”












Designing a Highly Available System

• Which level of High Availability do I need?

• Do I require no loss of data?

• Do I need failover or is switchover enough?

• Can I provide a reasonable service when a component is down?
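To put a number on "which level of High Availability do I need", it helps to turn an availability target into a downtime budget. The sketch below is an illustration added here, not part of the original slides. The function name and the 365-day measurement period are assumptions.

```python
def allowed_downtime_minutes(availability_percent, period_days=365.0):
    """Return the downtime budget, in minutes, for an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1.0 - availability_percent / 100.0)


# Two, three, four and five nines over one year.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

Five nines leaves a little over five minutes of downtime per year, planned and unplanned together, which is why the target should follow from what the service really needs.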



Something to clarify...

• Availability vs Scalability

• HA Costs

• HA for your systems, not only for your database

• Review your SLAs

High Availability Solutions


High Availability with MySQL

[Arrow label: Higher Availability]

• Combined solutions

• Shared nothing distributed cluster with MySQL Cluster

• Geographical Replication for disaster recovery

• Virtualised Environments

• Active/Passive Clusters through shared storage

• MySQL synchronous replication

• Generic synchronous replication

• MySQL Replication with agents and failover

• MySQL Replication


MySQL Replication

• Something you may have missed...
–Asynchronous or Semi-synchronous
–Pros and Cons of RBR vs SBR
–Mono-thread pull from the slaves
–sync_binlog = 0/1
–Antelope vs Barracuda
–Group Commit
–Multi-engines
–Rolling upgrades

[Diagram labels: Read-Write; Read-Only (x2); binlog; relaylog (x4)]

MySQL Replication with MMM (Multi-Master replication Manager)

• Master-Master features:
–Monitoring
–Automatic failover
–Data backup
–Resynch

• http://code.openark.org/blog/mysql/problems-with-mmm-for-mysql

• http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/

[Diagram labels: mmm_mond; mmm_agentd (x2); Read-Write; Read-Only (x2); binlog (x2); relaylog (x4)]

http://mysql-mmm.org

http://tinyurl.com/yzdg9tb


MySQL Replication with MHA

• Something to consider...
–read-only=1 and log-bin on slaves
–Master IP failover
–Filtering rules
–multi-tier replication

http://code.google.com/p/mysql-master-ha/




Tungsten Replicator

• Open Source, heterogeneous replication

• Truly multi-master and fan-in with Global ID

• Per-schema multi-thread

http://code.google.com/p/tungsten-replicator/

[Diagram labels: Replicator agent (x4); Read-Write]




Tungsten Enterprise

• Tungsten Replicator +
–Client Connector with R/W split and load balancing
–Replication Monitoring
–Integrated backup

http://www.continuent.com/solutions/overview

[Diagram labels: Connector (x5); Replicator + Monitor (x4); Read-Write]




Synchronous Replication with DRBD

• Typical Active/Standby

• Cross active/active servers implementations

• Possible issues:
–Dependencies
–Infrastructure SPOFs
–Write performance impact
–InnoDB only

• DRBD in a virtualized environment

[Diagram labels: Active/Hot Server; Passive/Std-by Server; Block Device (x2)]


Synchronous Replication through DRBD: Configuration

[Diagram labels: Active/Hot Server; Passive/Std-by Server; Block Device (x2); /dev/sdb and /mysqldata (on each server); Gateway; 192.168.1.1; 192.168.1.X; VIP 192.168.1.2; HB1: 10.0.3.X; HB2: 10.0.4.X; DRBD: 10.0.5.X; 15; 16]


Synchronous Replication with Galera

• Synchronous replication for InnoDB

• Multi-master, no SPOF

• Application failover must be managed

• Conflict resolution

http://www.codership.com

[Diagram labels: wsrep (x3); Read-Write (x2)]




Percona XtraDB Cluster

• Alpha version of Galera + XtraDB

• Multi-master, no SPOF

• Application failover must be managed

• Conflict resolution with aborted COMMITs

• Auto Increment

• No XA TXN

• NoPK operations issues

http://www.percona.com/doc/percona-xtradb-cluster

[Diagram labels: wsrep (x3); Read-Write (x2)]
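Because conflicts are resolved by aborting a COMMIT, an application writing to several nodes has to be ready to run a transaction again. The following is a minimal sketch of such a retry loop, added for illustration and not taken from the slides. `CommitAborted` is a hypothetical stand-in for whatever error the database driver raises (Galera-based clusters typically report the losing transaction as a deadlock error), and `run_with_retry` and its parameters are assumptions.

```python
import time


class CommitAborted(Exception):
    """Hypothetical stand-in for the driver error raised on an aborted COMMIT."""


def run_with_retry(transaction, max_attempts=3, backoff_seconds=0.0):
    """Call `transaction` until it commits or `max_attempts` is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except CommitAborted:
            if attempt == max_attempts:
                raise  # give up and let the caller handle the conflict
            # Wait a little longer after each failed attempt.
            time.sleep(backoff_seconds * attempt)


# Usage: a transaction that loses the conflict twice before it commits.
attempts = {"count": 0}


def transfer():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise CommitAborted()
    return "committed"


print(run_with_retry(transfer))
```

The transaction passed in must be safe to run again from the beginning, since everything it did before the aborted COMMIT has been rolled back.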





SchoonerSQL

• Synchronous master-slave replication for InnoDB

• Retrieve/Inject in the transaction log and buffer pool

• Monitoring/Administration tool

• Closed source


Active/Passive Clusters using Shared Storage

• Points to consider:
–Redundancy and replication must be guaranteed by the shared storage (and this is not trivial)
–InnoDB only
–File Systems

[Diagram labels: Active/Hot Server; Passive/Std-by Server; Shared Storage]


Active/Passive Clusters using Shared Storage: Large Deployments

[Diagram labels: in01 to in08; VIP01 to VIP08; 01 to 08; Shared Storage]


Active/Passive Clusters using Shared Storage: Failover in Large Deployments

[Diagram labels: in01 to in08; VIP01 to VIP08; 01 to 08; Shared Storage]


Virtualised Environments

• Data storage, high availability and load balancing are provided and managed by the virtualised software

• In case of fault, the virtualised software restarts on any other physical server

• MySQL Replication for disaster recovery

• InnoDB only

[Diagram labels: 01 to 08 (shown twice); Shared Storage]


Geographical Replication for Disaster Recovery

• Master-Master Asynchronous Replication is used to update the backup data centre

• In case of fault, the network traffic is redirected to the backup data centre. Failback must be executed manually

• Cross-platform and cross-engine

[Diagram labels: Active Data Centre; Backup Data Centre]


Storage Snapshots for Disaster Recovery

• Snapshots are managed by the NAS and SAN firmware. There is usually a short read-only freeze

• Snapshots can be used as run-time backup

• InnoDB only, NetApp NASs and firmware are certified using Snapshot and Snapmirror

[Diagram labels: Active Data Centre; Backup Data Centre]


MySQL Cluster

• Shared-nothing, fully transactional and distributed architecture used for high volume and small transactions.

• MySQL Cluster is based on the NDB (Network DataBase) Storage Engine

• Data is distributed for scalability and performance, and it is replicated for redundancy on multiple data nodes.

• Nodes in a cluster:
–SQL Nodes: provide the SQL interface to NDB
–API Nodes: provide the native NDB API
–Data Nodes: store and retrieve data, manage transactions
–Management Nodes: manage the Cluster

• Load balanced

• Memory or disk-based

• Geographically replicated for disaster recovery with conflict resolution

• Full online operation for maintenance and administration

[Diagram labels: Application Nodes; SQL Nodes; Data Nodes; Management Nodes; NDB API, ClusterJ/JPA]


Client-based Failover and Proxies

• Connector/J
–jdbc:mysql://[host][,failoverhost...][:port]

• mysqlnd_ms for PHP
–connection pooling for mysqli, mysql and PDO_MYSQL

• ScaleBase
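The Connector/J URL format above lists a first host followed by failover hosts. The idea behind client-based failover can be sketched in Python as follows. This is an illustration only: `connect_with_failover` and the `connect` callable are hypothetical and are not the API of Connector/J, mysqlnd_ms or any other driver.

```python
def connect_with_failover(hosts, connect):
    """Try each host in order and return the first successful connection.

    `connect` is any callable that takes a host name and either returns a
    connection object or raises ConnectionError.
    """
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next host
    raise ConnectionError(f"no reachable host in {list(hosts)}") from last_error


# Usage with a fake connect function: db1 is down, db2 answers.
def fake_connect(host):
    if host == "db1":
        raise ConnectionError("db1 is down")
    return "connected to " + host


print(connect_with_failover("db1,db2".split(","), fake_connect))
```

Trying hosts in a fixed order keeps the first host as the preferred server. It does not tell the client whether the host it reached accepts writes, which is one reason proxies and connectors add their own monitoring.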


The absolutely necessary comparison chart...

Solution                  100% Data   All Storage   Automatic   Performance   Easy admin/   Scalability
                          Safe        Engines       Failover    Overhead      config        (*** = best)
                                                                (* = best)    (* = best)

MySQL Replication         ✘           ✔             ✘!          *             *             **
MHA                       ✘           ✔             ✔           *             *             **
Tungsten                  ✔           ✔             ✔           **            *             ***
DRBD                      ✔           ✘             ✔           ***           ***           *
Galera / XtraDB Cluster   ✘           ✘             ✘!          **            *             **
SchoonerSQL               ✔           ✘             ✔           **            *             **
Shared Cluster            ✔           ✘             ✘!          -             *             *
VM                        ✔           ✘             ✔           **            *             *
Geo Replication           ✘           ✔             ✘           *             **            *
Storage Snapshots         ✘           ✘             ✘           **            **            *
MySQL Cluster             ✔           ✘             ✔           *             ***           **




The famous last words...

• I need 5 nines
–Implement what you really need

• Everything must be automatic
–Aided switchover is sometimes more effective, inexpensive and easy to implement/administer

• I want to migrate to MySQL Cluster
–Is your application designed for Cluster?

• I can’t afford to lose any data
–People lose data every day. Is the drop in performance worth it?

• I need a sub-second failover
–Check the timeout periods and the caching warm-ups
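The point about timeout periods can be made concrete with simple arithmetic. The sketch below is not from the slides. The function, its parameters and the sample values are illustrative assumptions, and cache warm-up time comes on top of the result.

```python
def estimated_failover_seconds(check_interval, failed_checks,
                               promotion, client_reconnect):
    """Rough time before clients are served again after the active server fails.

    detection = check_interval * failed_checks (consecutive missed checks),
    then the standby is promoted and clients reconnect.
    """
    detection = check_interval * failed_checks
    return detection + promotion + client_reconnect


# Hypothetical values: checks every second, 3 missed checks before acting,
# 2 seconds to promote the standby, 5 seconds client reconnect timeout.
total = estimated_failover_seconds(1.0, 3, 2.0, 5.0)
print(f"estimated failover time: {total} seconds")
```

Even with a one-second check interval, the detection step alone takes several seconds in this example, so a sub-second target usually requires revisiting every timeout in the chain.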


SkySQL Enterprise HA

• Full HA solution, supported on
–Platforms: Linux, Windows, Solaris x86
–DB Servers: Oracle MySQL, MariaDB, Percona Server
–2 to 3 days implementation guaranteed with acceptance tests

• Technologies:
–MySQL Replication
–DRBD Active/Passive or Cross Active
–MHA Tool with/without Multi-tier Replication
–Linux or Windows Shared Storage
–MySQL Cluster
–Tungsten Enterprise



Thank You!

[email protected]

