Building MySQL DBaaS on OpenStack with XtraDB Cluster

Who We Are

Paddy Power Betfair is a leading international sports betting and gaming operator

FTSE100, Market Cap ~£7Bn

We operate six leading brands: Paddy Power, Betfair, Sportsbet, FanDuel, TVG, DRAFT

Over five million customers worldwide

We run some of the world’s most exciting online sports betting and gaming brands

We employ over 7000 people from Los Angeles to Melbourne, via Dublin and London

Where We Started

Merger of Paddy Power and Betfair

Ageing native Infrastructure

Lack of cross DC DR for MySQL

Reduce TTM for new database systems

S/W and H/W inconsistencies across Dev, QA and Prod

Our Vision

DB as a service

Always-On, Highly Available, Disaster Proof architecture

Rapid provisioning

Ability to quickly patch systems with little to no disruption for applications

Free up staff for more valuable work

DBaaS at Paddy Power Betfair

XtraDB Cluster on OpenStack…

MySQL HA Options?

• MySQL Master-Master cross DC replication

• XtraDB cluster with arbitrator node in cloud/3-DC

• Asymmetric cross DC XtraDB Cluster (3-node)

Why Not Master-Master Cross DC Replication?

Limitations:

• Handling replication lag in case of unplanned failovers

• Handling split brain scenarios

• Operational overhead of keeping replication working across 160+ environments

• Conflict resolution

Why Not XtraDB Cluster with Arbitrator in 3rd DC?

Limitations:

• Additional round trip network latency

• SST with just 2 active nodes will cause service disruption

• Handling split brain scenarios

Why Asymmetric Cross DC XtraDB Cluster?

Limitations:

• Unplanned DC outage on majority node DC
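The limitation above follows from quorum arithmetic. The Python sketch below works through it for an asymmetric 3-node cluster, assuming two nodes in one DC, one in the other, and equal node weights; the names `dc_a`, `dc_b`, `has_quorum` and `survives_outage` are hypothetical and not taken from the slides.

```python
def has_quorum(surviving_nodes, total_nodes):
    """A partition stays primary only with strictly more than half the nodes."""
    return 2 * surviving_nodes > total_nodes


# Assumed asymmetric layout: two nodes in DC A, one node in DC B.
LAYOUT = {"dc_a": 2, "dc_b": 1}


def survives_outage(failed_dc, layout=LAYOUT):
    """Return True if the cluster keeps quorum after losing failed_dc."""
    total = sum(layout.values())
    return has_quorum(total - layout[failed_dc], total)


if __name__ == "__main__":
    for dc in LAYOUT:
        print(dc, "lost -> cluster keeps quorum:", survives_outage(dc))
```

Losing the minority DC leaves two of three nodes, so the cluster keeps serving. Losing the majority DC leaves one of three, so the surviving node cannot form a primary component without intervention.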

Why Percona XtraDB?

Cross DC resiliency

Transparent/seamless failover for planned maintenance

Cross DC deployment pipeline

Fast recovery from DC outages

Less operational overhead

Improving customer experience

Why XtraBackup, PMM, pt-online-schema-change?

• Percona XtraBackup allows us to recover individual nodes without having to do an SST on 1 TB databases

• Percona XtraBackup allows us to do point-in-time and partition-level recovery

• PMM allows us to monitor XtraDB Cluster, MySQL and O/S metrics in a centralized fashion

• PMM allows us to add PMM agents as part of our deployment pipeline

• pt-online-schema-change lets us run schema upgrades on an OLTP platform

PMM Dashboard

Why NetScaler?

• MaxScale and ProxySQL did not support values returned from DB procedure calls (at the time of testing)

• NetScaler checks the DB state before routing connections, which works better than connection managers that check only the port state

• The DB state check has helped reduce failover times from 10 seconds to 2-3 seconds

• NetScaler allows us to implement read/write split rules, which we plan to use in future

• Existing framework code to provision NetScalers

What Did We Build?

IaC /1 - Automation Tools

Our toolset includes:

• GitLab (code repository)

• Artifactory (artifacts, external repos proxy)

• Jenkins (CI build jobs)

• GoCD (Pipeline configuration and templates)

IaC /2 - Ansible Framework

We have a number of Git repositories that describe our infrastructure requirements.

They all feed our Ansible Framework, which calls APIs to provision what’s required.

IaC /3 - Our repos

• OpenStack VM provisioning specs

• SDN (Nuage network and firewall design)

• Load Balancer (Citrix NetScaler VIPs, AVI GSLB)

• Monitoring (Sensu, Splunk, Tsdb)

IaC – PPB Cloud /3a

Percona XtraDB Cluster Configuration

Percona XtraDB Cluster is configured by an Ansible role included in our Framework

• We use jinja2 templates

• Default values for all MySQL parameters

• Override values for each environment

e.g. the memory parameter is calculated dynamically as a percentage of the total memory allocated to the VM.
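As a rough illustration of the defaults, per-environment overrides and dynamic memory calculation described above, the Python sketch below derives a buffer pool size from the VM's memory and renders a config fragment. The 70% ratio, the choice of `innodb_buffer_pool_size` as the memory parameter, the `max_connections` values and all function names are assumptions for illustration; the slides do not give them. `string.Template` from the standard library stands in for Jinja2 here.

```python
from string import Template
from typing import Optional

# Hypothetical defaults; a per-environment override dict replaces entries here.
DEFAULTS = {"buffer_pool_pct": 70, "max_connections": 500}

# string.Template stands in for the Jinja2 template used by the Ansible role.
MY_CNF = Template(
    "[mysqld]\n"
    "innodb_buffer_pool_size = ${buffer_pool_mb}M\n"
    "max_connections = ${max_connections}\n"
)


def buffer_pool_mb(vm_memory_mb: int, pct: int) -> int:
    """Return pct percent of the VM's memory, rounded down to whole megabytes."""
    if not 0 < pct < 100:
        raise ValueError("pct must be between 1 and 99")
    return vm_memory_mb * pct // 100


def render_my_cnf(vm_memory_mb: int, overrides: Optional[dict] = None) -> str:
    """Merge environment overrides over the defaults and render the fragment."""
    params = {**DEFAULTS, **(overrides or {})}
    return MY_CNF.substitute(
        buffer_pool_mb=buffer_pool_mb(vm_memory_mb, params["buffer_pool_pct"]),
        max_connections=params["max_connections"],
    )


if __name__ == "__main__":
    # A 32 GB VM in an environment that overrides only max_connections.
    print(render_my_cnf(32 * 1024, {"max_connections": 1000}))
```

Keeping the ratio rather than an absolute size in the defaults means the same role can be reused when a VM flavour changes, since only the memory figure differs between environments.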

IaC – PPB Cloud /3b

Percona XtraDB Cluster Configuration

IaC - PPB Cloud /5 Jenkins wraps it up

IaC - PPB Cloud /6 GoCD Pipelines

Provisioning the desired infrastructure with the same process for each Environment (QA/Pre-Prod/Perf/Prod)

CI/CD Workflow in a picture

Challenges

• Hosting stateful applications on PPBF OpenStack.

• Reducing Service Disruption.

• Hosting highly concurrent OLTP applications on XtraDB Cluster.

• Developing a mechanism for fast recovery from full unplanned DC outages.

Stateful Apps on PPBF OpenStack

• Rolling update is the process we use to redeploy our environment(s); the challenge was how to minimize service disruption

• A rolling update deploys a new VM with the new changes and moves the DB instance onto the new VM (A/B deployments)

Rolling Update Explained

[Diagram sequence over five slides; surviving labels, in order: volume clone, volume snapshot, volume clone]

Reducing Service Disruption

• Reducing the time to failover.

sqlquery: "show global status like 'wsrep_local_state_comment'"

evalrule: "MYSQL.RES.ROW(0).TEXT_ELEM(1).CONTAINS(\"Synced\")||MYSQL.RES.ROW(0).TEXT_ELEM(1).CONTAINS(\"Donor\")"
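The monitor above can be read as: run the status query, take column 1 of row 0, and treat the node as healthy if the value contains "Synced" or "Donor". A minimal Python sketch of the same decision follows, assuming the query result arrives as a list of (Variable_name, Value) tuples; the function name and the choice to treat an empty result as unhealthy are assumptions, not NetScaler behaviour taken from the slides.

```python
# Substrings the evalrule accepts in wsrep_local_state_comment.
HEALTHY_MARKERS = ("Synced", "Donor")


def node_is_routable(rows):
    """Mirror the evalrule: inspect column 1 of row 0 of the query result.

    `rows` is the result of
    SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'
    as a list of (Variable_name, Value) tuples.
    """
    if not rows:
        return False  # assumption: no result means the node is not usable
    state = rows[0][1]
    return any(marker in state for marker in HEALTHY_MARKERS)


if __name__ == "__main__":
    for value in ("Synced", "Donor/Desynced", "Joining", "Joined"):
        print(value, node_is_routable([("wsrep_local_state_comment", value)]))
```

Because the check is a substring match, a node reporting "Donor/Desynced" still receives traffic, while a node that is still joining does not, even though its port is already open.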

Hosting Highly Concurrent OLTP Applications

• Route all write connections to a single XtraDB cluster node

• Reads are scaled across nodes

• DB design strategy
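The routing policy in the first two bullets can be sketched as follows. This is an illustrative Python model only: the class name, the node names and the round-robin choice for reads are assumptions, and the real routing described in the deck is done by the load balancer, not by application code.

```python
from itertools import cycle


class ClusterRouter:
    """Send writes to one designated node and rotate reads over all nodes."""

    def __init__(self, nodes, writer):
        if writer not in nodes:
            raise ValueError("writer must be one of the cluster nodes")
        self.writer = writer
        self._readers = cycle(nodes)  # simple round-robin over every node

    def route(self, is_write):
        """Return the node that should receive the next connection."""
        return self.writer if is_write else next(self._readers)


if __name__ == "__main__":
    router = ClusterRouter(["db1", "db2", "db3"], writer="db1")  # hypothetical names
    print([router.route(is_write=True) for _ in range(3)])
    print([router.route(is_write=False) for _ in range(4)])
```

Pinning writes to one node avoids concurrent writers on different nodes touching the same rows, which in a synchronous multi-master cluster surfaces as certification conflicts at commit time.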

Recovery from Unplanned DC Outages

Fast recovery from unplanned DC outages:

• Trade-off between transaction latency with an arbitrator and fast-enough recovery from infrequent DC failures

• Created a workflow to recover from unplanned DC outages

• The process works through the MySQL environments, identifies those that have minority nodes in the surviving DC, and bootstraps them

• This process takes 5-10 minutes for the entire DC

• set global wsrep_provider_options="pc.bootstrap=1";
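The recovery workflow above might be sketched in Python as below. The `NodeStatus` type, the cluster and host names, and the rule of picking the node with the highest `wsrep_last_committed` are assumptions for illustration; the slides state only that environments left with minority nodes are identified and bootstrapped with the statement shown. The sketch plans the work and does not connect to any database.

```python
from dataclasses import dataclass

BOOTSTRAP_SQL = 'SET GLOBAL wsrep_provider_options="pc.bootstrap=1";'


@dataclass
class NodeStatus:
    cluster: str          # hypothetical environment/cluster name
    host: str             # hypothetical host name in the surviving DC
    cluster_status: str   # value of wsrep_cluster_status on that node
    last_committed: int   # value of wsrep_last_committed on that node


def plan_bootstrap(survivors):
    """Pick one node per non-primary cluster to bootstrap.

    Clusters whose nodes still report 'Primary' kept quorum and are skipped.
    Within a cluster, the node with the highest committed sequence number
    is chosen (an assumed tie-break rule, not taken from the slides).
    """
    best = {}
    for node in survivors:
        if node.cluster_status == "Primary":
            continue
        current = best.get(node.cluster)
        if current is None or node.last_committed > current.last_committed:
            best[node.cluster] = node
    return {name: (node.host, BOOTSTRAP_SQL) for name, node in sorted(best.items())}


if __name__ == "__main__":
    survivors = [
        NodeStatus("bets", "bets-db3", "non-Primary", 120),
        NodeStatus("wallet", "wallet-db1", "Primary", 55),
        NodeStatus("wallet", "wallet-db2", "Primary", 55),
    ]
    for cluster, (host, sql) in plan_bootstrap(survivors).items():
        print(cluster, host, sql)
```

Only one node per cluster is bootstrapped; any other surviving nodes in the same component become primary along with it, and nodes in the failed DC rejoin once it is back.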

How Has Percona XtraDB Cluster on OpenStack Benefited PPBF?

✓ Time-to-Market

✓ Operational Support and Consistency

✓ HA and DR for our business critical services

✓ Minimal service disruption for planned maintenance and upgrades

✓ Better manageability

✓ Standardised and improved monitoring

✓ Security

Current Status…

• We host 40+ applications on XtraDB clusters

• Biggest single database is about 1 TB in size

• Max transaction rate 6k/sec

• 480 VMs used

• On average we deploy/migrate 3 applications onto MySQL XtraDB cluster every month

• About half a day on average to build a full set of environments for a new application

• 2 major planned XtraDB cluster and OpenStack version upgrades completed with close to zero downtime

What’s Next for PPBF?

• Standard deployment pipelines for other SQL and NoSQL technologies

• Integrating DB releases into the pipeline and making it self-service for development teams

• Full DBaaS offering for Dev teams to test different DB technologies

Thank You

Any Questions?