HA Reloaded
Ivan Zoratti, Chief Technology Officer
High Availability Reloaded
Oracle, MySQL and InnoDB are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Tuesday, 24 January 2012
Agenda
• SkySQL - 3 (+1) slides!
• A bit of theory
• High availability solutions
• ...and the famous last words!
SkySQL Ab
• Funded by:
  – MySQL® AB founders Monty Widenius and David Axmark
  – US investment group OnCorps.org
• A team of 40, operating in 14 countries, 90% from MySQL® AB
• Backed by:
  – Product engineering: Monty Program Ab
  – Top community contributors, commercial partners and end users
SkySQL Offering
• SkySQL Enterprise Subscriptions
  – Monitoring, administration and end-user tools
  – Specialised modules for high availability and performance improvements (1)
• SkySQL Enterprise Cluster and SkySQL Enterprise HA
  – Up to L3 technical and consultative support for the most widely used MySQL® distributions and branches
• SkySQL Consulting
  – Top-class team for MySQL® technology
  – Extended service offering, from health checks to continuous administration
• SkySQL Training
  – MySQL® training and certification
(1) Optional
The SkySQL Reference Architecture: Components
[Diagram: reference architecture components, including Migration Tools and Integration Tools]
High Availability... a bit of theory
High Availability
“High availability is a system design protocol and associated implementation that ensures a
certain degree of operational continuity during a given measurement period.”
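Availability is usually expressed as the fraction of the measurement period during which the system is operational, and it is often quoted in "nines". The short Python sketch below converts a number of nines into the maximum downtime it allows, assuming a 365-day year. Five nines, for example, leave about 5.3 minutes per year.

```python
# Availability over a measurement period = uptime / period.
# Sketch: convert a target number of "nines" into the downtime it allows.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year, no leap seconds


def availability_from_nines(nines: int) -> float:
    """Availability fraction for a number of nines, e.g. 3 -> 0.999."""
    return 1 - 10 ** (-nines)


def max_downtime_seconds(availability: float, period_s: float = SECONDS_PER_YEAR) -> float:
    """Maximum downtime, in seconds, allowed over `period_s` at this availability."""
    return (1 - availability) * period_s


for n in range(2, 6):
    minutes = max_downtime_seconds(availability_from_nines(n)) / 60
    print(f"{n} nines: {minutes:,.1f} minutes of downtime per year")
```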
Fault-tolerant?
“Fault-tolerant design enables a system to continue operation, possibly at a reduced level (also known as graceful degradation), rather than failing completely, when some part of the system fails.”
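The sketch below shows graceful degradation for a hypothetical primary/replica pair. Each node's health is reduced to a boolean `up` flag. When the primary fails, the service keeps answering reads from the replica instead of failing completely, and only writes are refused.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical database node; `up` stands in for a real health check."""
    name: str
    up: bool


class DegradableService:
    """Keeps serving reads from a read-only replica when the primary is down."""

    def __init__(self, primary: Node, replica: Node):
        self.primary = primary
        self.replica = replica

    def mode(self) -> str:
        if self.primary.up:
            return "full"        # reads and writes available
        if self.replica.up:
            return "read-only"   # reduced level of service
        return "down"            # no node left: complete failure

    def handle(self, is_write: bool) -> str:
        mode = self.mode()
        if mode == "full":
            return f"served by {self.primary.name}"
        if mode == "read-only" and not is_write:
            return f"served by {self.replica.name} (degraded)"
        return "rejected"
```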
Switchover / Failover
• Switchover
  – “Switchover is the capability to manually switch over from one system to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network.”
• Failover
  – “Failover is the capability to switch over automatically to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active application, server, system, or network.”
• Aided switchover?
• Failback?
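The sketch below contrasts failover, switchover and aided switchover. `FailoverManager`, its heartbeat counter and the threshold of missed heartbeats are hypothetical. A real cluster manager would also fence the failed server and move the service IP. Calling `switchover()` again after a failover is a manual failback.

```python
class FailoverManager:
    """Hypothetical sketch: failover (automatic) vs aided switchover (operator confirms)."""

    def __init__(self, active: str, standby: str, max_missed: int = 3, automatic: bool = True):
        self.active = active
        self.standby = standby
        self.max_missed = max_missed   # consecutive missed heartbeats before declaring failure
        self.automatic = automatic     # True: failover; False: aided switchover
        self.missed = 0
        self.alert = None

    def _swap(self) -> None:
        self.active, self.standby = self.standby, self.active
        self.missed = 0

    def switchover(self) -> None:
        """Manual switchover (or failback): an operator moves service to the standby."""
        self._swap()
        self.alert = None

    def record_heartbeat(self, ok: bool) -> None:
        """Feed one heartbeat result from a (hypothetical) monitor."""
        self.missed = 0 if ok else self.missed + 1
        if self.missed >= self.max_missed:
            if self.automatic:
                self._swap()   # failover: no human involved
            else:
                self.alert = f"{self.active} unreachable: confirm switchover"
```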
Downtime
• Planned/Scheduled
• Unplanned/Unscheduled
• “Downtime or outage duration refers to a period of time that a system fails to provide or perform its primary function.”
Single Point Of Failure - SPOF
“A Single Point of Failure (SPOF) is a part of a system which, if it fails, will stop the entire system from working.”
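A SPOF caps the availability of everything that depends on it. Assuming independent failures, components in series multiply their availabilities. Redundant components, by contrast, fail only when all of them fail. In the sketch below, two 99.9% servers in parallel reach 99.9999%. Putting them behind a single 99.9% switch brings the whole system back below 99.9%.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component is required (each one is a SPOF): multiply availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant components: the group fails only if all of them fail.
    Assumes failures are independent."""
    return 1 - prod(1 - a for a in availabilities)


servers = parallel(0.999, 0.999)     # two redundant servers
with_switch = series(0.999, servers) # ...behind one non-redundant switch
```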
Disaster Recovery and Business Continuity
“Disaster recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.”
“Disaster recovery planning is a subset of a larger process known as business continuity planning and should include planning for resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure.”
Designing a Highly Available System
• Which level of High Availability do I need?
• Do I require no loss of data?
• Do I need failover or is switchover enough?
• Can I provide a reasonable service when a component is down?
Something to clarify...
• Availability vs Scalability
• HA Costs
• HA for your systems, not only for your database
• Review your SLAs
High Availability Solutions
High Availability with MySQL
Solutions listed from higher to lower availability:
• Combined solutions
• Shared-nothing distributed cluster with MySQL Cluster
• Geographical replication for disaster recovery
• Virtualised environments
• Active/passive clusters through shared storage
• MySQL synchronous replication
• Generic synchronous replication
• MySQL Replication with agents and failover
• MySQL Replication
MySQL Replication
• Something you may have missed...
  – Asynchronous or semi-synchronous
  – Pros and cons of row-based (RBR) vs statement-based (SBR) replication
  – Single-threaded pull from the slaves
  – sync_binlog = 0/1
  – Antelope vs Barracuda
  – Group commit
  – Multiple engines
  – Rolling upgrades
[Diagram: a Read-Write master writing its binlog, replicated to four Read-Only slaves, each with its own relay log]
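The difference between asynchronous and semi-synchronous replication shows up at failover time. The function below is a simplified model, not MySQL's implementation:

- With asynchronous replication, a COMMIT is acknowledged as soon as the master has written it. Anything the slave has not yet received is lost when the slave is promoted.
- With semi-synchronous replication, the client waits until at least one slave has received the transaction.

The model ignores the semi-synchronous timeout, after which MySQL falls back to asynchronous replication.

```python
def lost_after_failover(committed: list[int], received_by_slave: set[int],
                        semi_sync: bool) -> list[int]:
    """Transactions the client was told were committed but the promoted slave lacks.

    Simplified model: async acknowledges every local commit; semi-sync acknowledges
    only transactions that at least one slave has received (timeouts ignored).
    """
    if semi_sync:
        acknowledged = [t for t in committed if t in received_by_slave]
    else:
        acknowledged = list(committed)
    return [t for t in acknowledged if t not in received_by_slave]
```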
MySQL Replication with MMM (Multi-Master Replication Manager)
• Master-master features:
  – Monitoring
  – Automatic failover
  – Data backup
  – Resync
• http://code.openark.org/blog/mysql/problems-with-mmm-for-mysql
• http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/
[Diagram: two masters with binlogs, each running mmm_agentd, one of them Read-Write; Read-Only nodes with relay logs; the mmm_mond monitor]
http://mysql-mmm.org
MySQL Replication with MHA (Master High Availability Manager)
• Something to consider...
  – read-only=1 and log-bin on slaves
  – Master IP failover
  – Filtering rules
  – Multi-tier replication
http://code.google.com/p/mysql-master-ha/
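When the master fails, a tool such as MHA has to pick the slave that has received the most of the master's binlog. The sketch below is a simplified version of that choice. It uses the `Master_Log_File` and `Read_Master_Log_Pos` values reported by `SHOW SLAVE STATUS`. The slave names and positions are made up. A real tool also has to bring the other slaves up to the same point before reconnecting them.

```python
def binlog_coordinates(master_log_file: str, read_master_log_pos: int) -> tuple[int, int]:
    """Turn ('mysql-bin.000042', 1207) into (42, 1207) so coordinates compare correctly."""
    return int(master_log_file.rsplit(".", 1)[1]), read_master_log_pos


def most_advanced_slave(slaves: dict[str, tuple[str, int]]) -> str:
    """Name of the slave that has read furthest into the failed master's binlog."""
    return max(slaves, key=lambda name: binlog_coordinates(*slaves[name]))


# Hypothetical SHOW SLAVE STATUS values: (Master_Log_File, Read_Master_Log_Pos)
slaves = {
    "s1": ("mysql-bin.000041", 9000),
    "s2": ("mysql-bin.000042", 120),
    "s3": ("mysql-bin.000042", 98),
}
```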
Tungsten Replicator
• Open source, heterogeneous replication
• Truly multi-master and fan-in with Global ID
• Per-schema multi-threading
http://code.google.com/p/tungsten-replicator/
[Diagram: four servers, each running a Replicator agent, with one Read-Write master]
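Per-schema multi-threading works because transactions on different schemas are independent. Events can be split into one queue per schema, each queue applied strictly in order, and different queues applied in parallel. The sketch below shows the idea with Python threads; recording the sequence number stands in for executing the event. It assumes no transaction spans more than one schema, a case a real replicator has to detect and serialise.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_schema(events):
    """Group (seqno, schema, statement) events into per-schema queues, keeping order."""
    queues = defaultdict(list)
    for seqno, schema, statement in events:
        queues[schema].append((seqno, schema, statement))
    return dict(queues)


def apply_queue(queue, applied):
    """Apply one schema's events strictly in order (here: just record the seqno)."""
    for seqno, schema, _statement in queue:
        applied[schema].append(seqno)


def parallel_apply(events, workers=4):
    queues = partition_by_schema(events)
    applied = {schema: [] for schema in queues}   # one list per schema, one writer each
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(apply_queue, q, applied) for q in queues.values()]
        for f in futures:
            f.result()                            # re-raise any apply error
    return applied
```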
Tungsten Enterprise
• Tungsten Replicator +
  – Client Connector with R/W split and load balancing
  – Replication monitoring
  – Integrated backup
http://www.continuent.com/solutions/overview
[Diagram: five Connectors in front of four servers, each running Replicator + Monitor, with one Read-Write master]
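The sketch below shows the idea of a connector with read/write split and load balancing. It is hypothetical and not Tungsten's implementation. Writes and locking reads go to the master, and plain `SELECT`s are spread round-robin over the slaves.

```python
from itertools import cycle


class ReadWriteRouter:
    """Hypothetical connector-style router: writes to the master,
    plain reads round-robin over the slaves."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        text = sql.lstrip().upper()
        # Locking reads must see the latest data, so they stay on the master.
        is_plain_read = text.startswith("SELECT") and "FOR UPDATE" not in text
        if is_plain_read and self._slaves is not None:
            return next(self._slaves)
        return self.master
```

Reads that must see the session's own recent writes, or that cannot tolerate slave lag, would also need to go to the master. This sketch does not handle those cases.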
Synchronous Replication with DRBD
• Typical Active/Standby
• Cross active/active server implementations
• Possible issues:
  – Dependencies
  – Infrastructure SPOFs
  – Write performance impact
  – InnoDB only
• DRBD in a virtualised environment
[Diagram: an Active/Hot server and a Passive/Stand-by server, each with its own block device]
Synchronous Replication through DRBD
Configuration
[Diagram: an Active/Hot server and a Passive/Stand-by server, each with a block device /dev/sdb mounted on /mysqldata]
• Gateway: 192.168.1.1
• Servers on 192.168.1.X, with the virtual IP (VIP) 192.168.1.2
• Heartbeat link 1 (HB1): 10.0.3.X
• Heartbeat link 2 (HB2): 10.0.4.X
• DRBD replication link: 10.0.5.X
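Two heartbeat links, HB1 and HB2, protect against declaring a healthy peer dead because of a single network fault. A takeover in that situation could leave both servers active on the same data, a split brain. The sketch below shows the decision rule under the assumption that each link reports the time of its last heartbeat; the function and the `deadtime` parameter are hypothetical.

```python
def peer_state(last_seen: dict[str, float], now: float, deadtime: float) -> str:
    """Decide on the peer's state from per-link heartbeat timestamps.

    `last_seen` maps a heartbeat link (e.g. 'HB1', 'HB2') to the time its last
    heartbeat arrived. The peer is declared dead only when every link has been
    silent for longer than `deadtime`.
    """
    silent = [link for link, t in last_seen.items() if now - t > deadtime]
    if len(silent) == len(last_seen):
        return "dead"       # take over: promote DRBD, mount /mysqldata, bring up the VIP
    if silent:
        return "degraded"   # one link is down: alert, but do not take over
    return "alive"
```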
Synchronous Replication with Galera
• Synchronous replication for InnoDB
• Multi-master, no SPOF
• Application failover must be managed
• Conflict resolution
http://www.codership.com
[Diagram: three nodes replicating through wsrep, with Read-Write traffic on more than one node]
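Galera resolves conflicts through certification rather than locks across nodes. The class below is a simplified model of certification-based replication, not Galera's actual code:

- Write sets are delivered to every node in the same global order.
- A write set fails certification if a write set certified after its snapshot touched the same keys. The loser's COMMIT is aborted.

Because every node runs the same deterministic test on the same stream, all nodes reach the same verdict.

```python
class Certifier:
    """Simplified certification test; `keys` identify the rows a transaction wrote."""

    def __init__(self):
        self.seqno = 0
        self.last_write = {}   # key -> seqno of the last certified write set touching it

    def certify(self, keys: set[str], depends_on: int) -> bool:
        """`depends_on` is the last seqno the transaction had seen when it started."""
        if any(self.last_write.get(k, 0) > depends_on for k in keys):
            return False       # conflict: a newer write set touched the same rows
        self.seqno += 1
        for k in keys:
            self.last_write[k] = self.seqno
        return True
```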
Percona XtraDB Cluster
• Alpha version of Galera + XtraDB
• Multi-master, no SPOF
• Application failover must be managed
• Conflict resolution with aborted COMMITs
• Auto-increment handling
• No XA transactions
• Issues with operations on tables without a primary key
http://www.percona.com/doc/percona-xtradb-cluster
[Diagram: three nodes replicating through wsrep, with Read-Write traffic on more than one node]
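With several masters accepting writes, each node must generate AUTO_INCREMENT values that cannot collide. MySQL's `auto_increment_increment` and `auto_increment_offset` variables do this by giving each node its own arithmetic series: offset, offset + increment, and so on. Galera can set them automatically through `wsrep_auto_increment_control`. The sketch below shows the resulting sequences for a hypothetical three-node cluster, assuming empty tables.

```python
from itertools import count, islice


def node_ids(offset: int, increment: int, n: int) -> list[int]:
    """First `n` AUTO_INCREMENT values a node would generate with the given
    auto_increment_offset and auto_increment_increment (empty table assumed)."""
    if not 1 <= offset <= increment:
        raise ValueError("expected 1 <= offset <= increment")
    return list(islice(count(offset, increment), n))


# Three nodes: increment = cluster size, offset = node position
cluster = [node_ids(offset, 3, 3) for offset in (1, 2, 3)]
```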
SchoonerSQL
• Synchronous master-slave replication for InnoDB
• Retrieves/injects changes in the transaction log and buffer pool
• Monitoring/administration tool
• Closed source
Active/Passive Clusters using Shared Storage
• Points to consider:
  – Redundancy and replication must be guaranteed by the shared storage (and this is not trivial)
  – InnoDB only
  – File systems
[Diagram: an Active/Hot server and a Passive/Stand-by server attached to the same shared storage]
Active/Passive Clusters using Shared Storage
Large Deployments
[Diagram: eight MySQL instances (in01 to in08), each with its own virtual IP (VIP01 to VIP08), running on eight servers (01 to 08) attached to a common shared storage]
Active/Passive Clusters using Shared Storage
Failover in Large Deployments
[Diagram: the same layout after a server failure; the instance and VIP of the failed server are restarted on a surviving server, which reaches the data through the shared storage]
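Every server sees the shared storage, so any survivor can mount a failed server's data and bring up its VIP. The planning sketch below moves each instance of the failed server to the least-loaded survivor. The function and the node and instance names are hypothetical; ties go to the first node in sorted order.

```python
def redistribute(placement: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Move every instance (and its VIP) from `failed` to the least-loaded survivors.

    `placement` maps node -> instances; the input is not modified.
    """
    survivors = {node: list(insts) for node, insts in placement.items() if node != failed}
    if not survivors:
        raise RuntimeError("no surviving node")
    for inst in placement.get(failed, []):
        target = min(sorted(survivors), key=lambda node: len(survivors[node]))
        survivors[target].append(inst)
    return survivors
```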
Virtualised Environments
• Data storage, high availability and load balancing are provided and managed by the virtualisation software
• In case of fault, the virtualisation software restarts the virtual machine on any other physical server
• MySQL Replication for disaster recovery
• InnoDB only
[Diagram: eight virtual machines (01 to 08) running on physical servers attached to a common shared storage]
Geographical Replication for Disaster Recovery
• Master-master asynchronous replication is used to update the backup data centre
• In case of fault, network traffic is redirected to the backup data centre; failback must be executed manually
• Cross-platform and cross-engine
[Diagram: an Active Data Centre replicating to a Backup Data Centre]
Storage Snapshots for Disaster Recovery
• Snapshots are managed by the NAS and SAN firmware; there is usually a short read-only freeze
• Snapshots can be used as run-time backups
• InnoDB only; NetApp NASs and firmware are certified for use with Snapshot and SnapMirror
[Diagram: storage snapshots of the Active Data Centre copied to the Backup Data Centre]
MySQL Cluster
• Shared-nothing, fully transactional and distributed architecture, used for high volumes of small transactions
• MySQL Cluster is based on the NDB (Network DataBase) storage engine
• Data is distributed for scalability and performance, and replicated on multiple data nodes for redundancy
• Nodes in a cluster:
  – SQL nodes: provide the SQL interface to NDB
  – API nodes: provide the native NDB API
  – Data nodes: store and retrieve data, manage transactions
  – Management nodes: manage the cluster
• Load balanced
• Memory- or disk-based
• Geographically replicated for disaster recovery, with conflict resolution
• Full online operation for maintenance and administration
[Diagram: application nodes (NDB API, ClusterJ/JPA) and SQL nodes connected to the data nodes, with management nodes]
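The sketch below is a simplified picture of how MySQL Cluster places data. Each row is assigned to a partition by hashing its primary key. Each partition is stored on every data node of a node group, so the group stays available as long as one of its nodes is up. Two details are assumptions made for illustration: one partition per node group, and `zlib.crc32` standing in for NDB's own hash function. The group size corresponds to the `NoOfReplicas` setting.

```python
import zlib


def replicas_for(primary_key: str, node_groups: list[list[str]]) -> list[str]:
    """Data nodes that would hold the row with this primary key.

    Simplified model: one partition per node group, chosen by hashing the key;
    every data node in the group keeps a replica.
    """
    index = zlib.crc32(primary_key.encode("utf-8")) % len(node_groups)
    return node_groups[index]


# Two node groups with NoOfReplicas = 2 (hypothetical node names)
groups = [["dn1", "dn2"], ["dn3", "dn4"]]
```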
Client-based Failover and Proxies
• Connector/J
  – jdbc:mysql://[host][,failoverhost...][:port]
• mysqlnd_ms for PHP
  – Connection pooling for mysqli, mysql and PDO_MYSQL
• ScaleBase
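Client-side failover moves the decision into the driver: the application lists several hosts and the driver tries them in order. The Python sketch below imitates that behaviour; it is not Connector/J's code. Several details are assumptions:

- `connect` is a caller-supplied function.
- Each host may carry its own `:port`, with 3306 as the default.
- IPv6 literals are not handled.

```python
def parse_hosts(url: str, default_port: int = 3306) -> list[tuple[str, int]]:
    """Extract the host list from 'jdbc:mysql://host1[:port],host2[:port]/db?props'."""
    prefix = "jdbc:mysql://"
    if not url.startswith(prefix):
        raise ValueError("not a jdbc:mysql URL")
    authority = url[len(prefix):].split("/", 1)[0]
    hosts = []
    for part in authority.split(","):
        host, _, port = part.partition(":")
        hosts.append((host, int(port) if port else default_port))
    return hosts


def connect_with_failover(hosts, connect):
    """Try each host in order; `connect(host, port)` raises ConnectionError on failure."""
    errors = []
    for host, port in hosts:
        try:
            return connect(host, port)
        except ConnectionError as exc:
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("all hosts failed: " + "; ".join(errors))
```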
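The absolutely necessary comparison chart...
Solution                | 100% data safe | All storage engines | Automatic failover | Performance overhead (* = best) | Easy admin/config (* = best) | Scalability (*** = best)
------------------------|----------------|---------------------|--------------------|---------------------------------|------------------------------|-------------------------
MySQL Replication       | ✘              | ✔                   | ✘!                 | *                               | *                            | **
MHA                     | ✘              | ✔                   | ✔                  | *                               | *                            | **
Tungsten                | ✔              | ✔                   | ✔                  | **                              | *                            | ***
DRBD                    | ✔              | ✘                   | ✔                  | ***                             | ***                          | *
Galera / XtraDB Cluster | ✘              | ✘                   | ✘!                 | **                              | *                            | **
SchoonerSQL             | ✔              | ✘                   | ✔                  | **                              | *                            | **
Shared Storage Cluster  | ✔              | ✘                   | ✘!                 | -                               | *                            | *
VM                      | ✔              | ✘                   | ✔                  | **                              | *                            | *
Geo Replication         | ✘              | ✔                   | ✘                  | *                               | **                           | *
Storage Snapshots       | ✘              | ✘                   | ✘                  | **                              | **                           | *
MySQL Cluster           | ✔              | ✘                   | ✔                  | *                               | ***                          | **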
The famous last words...
• I need 5 nines
  – Implement what you really need
• Everything must be automatic
  – Aided switchover is sometimes more effective, inexpensive and easy to implement and administer
• I want to migrate to MySQL Cluster
  – Is your application designed for Cluster?
• I can’t afford to lose any data
  – People lose data every day. Is the drop in performance worth it?
• I need a sub-second failover
  – Check the timeout periods and the cache warm-ups
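“Sub-second” has to hold for the whole chain, not only for the switch itself. The sketch below is a back-of-the-envelope calculation with illustrative parameter values. With a 1 s heartbeat, 3 missed beats, 2 s to promote and a 10 s client connection timeout, the worst case is 3 + 2 + 10 = 15 s, before any cache warm-up.

```python
def worst_case_failover_s(heartbeat_interval_s: float, missed_beats: int,
                          promotion_s: float, client_timeout_s: float) -> float:
    """Rough upper bound on the outage seen by a client, in seconds.

    detection: failure declared after `missed_beats` missed heartbeats;
    promotion: time to promote the standby and move the VIP;
    client timeout: time for clients to drop dead connections and reconnect.
    Cache warm-up on the new server is not included.
    """
    detection_s = heartbeat_interval_s * missed_beats
    return detection_s + promotion_s + client_timeout_s
```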
SkySQL Enterprise HA
• Full HA solution, supported on:
  – Platforms: Linux, Windows, Solaris x86
  – DB servers: Oracle MySQL, MariaDB, Percona Server
  – Implementation in 2 to 3 days guaranteed, with acceptance tests
• Technologies:
  – MySQL Replication
  – DRBD Active/Passive or Cross Active
  – MHA tool with/without multi-tier replication
  – Linux or Windows shared storage
  – MySQL Cluster
  – Tungsten Enterprise
Thank You!