Lessons from database failures - Percona · Lessons from database failures - January 2017 Created...

Lessons from database failures
Colin Charles, Chief Evangelist, Percona Inc.
[email protected] / [email protected]
http://www.bytebot.net/blog/ | @bytebot on Twitter
Percona Webinar, 18 January 2017


Page 1: Lessons from database failures - Percona · Lessons from database failures - January 2017 Created Date: 1/18/2017 2:49:41 PM ...


whoami
• Chief Evangelist (in the CTO office), Percona Inc.
• Focusing on the MySQL ecosystem (MySQL, Percona Server, MariaDB Server), as well as the MongoDB ecosystem (Percona Server for MongoDB) + 100% open source tools from Percona like Percona Monitoring & Management, Percona XtraBackup, Percona Toolkit, etc.
• Founding team of MariaDB Server (2009-2016), previously at Monty Program Ab, merged with SkySQL Ab, now MariaDB Corporation
• Formerly MySQL AB (exit: Sun Microsystems)
• Past lives include Fedora Project (FESCo), OpenOffice.org
• MySQL Community Contributor of the Year Award winner 2014


Agenda
• Backups (and verification)
• Replication (and failover)
• Security (and encryption)


ma.gnolia.com


ma.gnolia.com’s failure
• January 30 2009: complete outage
• February 17 2009: data corruption in the UDB, essentially dead
• What happened?
  • Ruby on Rails on four self-hosted Mac Minis, a couple of Xserves, 500GB+ MySQL 5 DB
  • Filesystem corruption, corrupted database backup
  • No versioning, didn’t check if the backups worked, used rsync to back up the database over a FireWire network
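The ma.gnolia backups were never checked. As a minimal sketch of one cheap check (not a substitute for a real test restore), the Python below records a SHA-256 digest when a backup is written and compares it later, so a copy that was corrupted in transit or on disk gets noticed. The function names `sha256_of` and `backup_is_intact` are hypothetical, chosen for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """True if the backup exists, is non-empty, and matches the recorded digest.

    Assumption: the digest was recorded right after the backup was taken,
    while the source was still known to be good.
    """
    if not path.is_file() or path.stat().st_size == 0:
        return False
    return sha256_of(path) == expected_sha256
```

A matching digest only shows that the file is unchanged since it was written; whether it actually restores still has to be tested by restoring it somewhere.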


ma.gnolia.com today?
• EC2 for the app with EBS snapshots, RDS with snapshots, Multi-AZ deployment
• Self-hosted?
  • xtrabackup
  • START TRANSACTION WITH CONSISTENT SNAPSHOT + mysqldump --single-transaction --master-data
• Back up a replica
• Replication event checksums
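As a rough sketch of the self-hosted option above, the helper below assembles a `mysqldump` invocation with the two flags named on the slide. The function name `mysqldump_command`, the `=2` value for `--master-data` (which writes the binary log coordinates as a comment), and the use of `--result-file` are assumptions added for illustration; connection options are left out.

```python
import shlex
from typing import List


def mysqldump_command(database: str, output_file: str) -> List[str]:
    """Build an argv list for a consistent InnoDB dump.

    --single-transaction dumps from one consistent snapshot without
    locking InnoDB tables; --master-data records the binary log position
    so the dump can seed a replica or a point-in-time recovery.
    """
    return [
        "mysqldump",
        "--single-transaction",
        "--master-data=2",  # assumption: coordinates as a comment
        f"--result-file={output_file}",
        database,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) on a host that has mysqldump installed.
    print(shlex.join(mysqldump_command("app", "/backups/app.sql")))
```

Building the command as a list rather than a string avoids shell quoting problems if a database name or path ever contains spaces.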


Couchsurfing, 2006


Couchsurfing problems
1. major, avoidable hard drive crash
2. incremental backups weren’t executed in the correct manner, and twelve of our most important data files didn’t survive


Time-delayed replication
• MySQL 5.6+ has time-delayed replication. Stop replication when you know a mistake has happened, before it propagates to all the slaves.
• Feature suggestion since 2001! Bug reported August 2006 (mysql#21639). Pushed June 2010 (WL#344). GA February 2013.
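In MySQL 5.6+ the delay is configured on the replica with `CHANGE MASTER TO MASTER_DELAY = N` (N in seconds). The toy model below is only a sketch of why that helps: it is not how the server is implemented, and the `Event` type and `applied_on_replica` function are hypothetical names. It shows which events a replica lagging by a fixed delay has applied at a given moment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    committed_at: int  # seconds since an arbitrary epoch, on the master
    statement: str


def applied_on_replica(events: List[Event], delay_s: int, now: int) -> List[Event]:
    """Events that a replica delayed by delay_s has applied at time `now`."""
    return [e for e in events if e.committed_at + delay_s <= now]


if __name__ == "__main__":
    history = [
        Event(0, "INSERT INTO users ..."),
        Event(4000, "DROP TABLE users"),  # the mistake
    ]
    # Mistake noticed 300 s later; the replica runs one hour behind.
    for event in applied_on_replica(history, delay_s=3600, now=4300):
        print(event.statement)
```

With a one-hour delay the mistaken statement has not reached the replica when it is noticed five minutes later, which leaves time to stop replication and recover from that copy.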


Why replicate?
• Scale out
• [automatic] (master) failover
• Geographical redundancy across multiple data centres
• Online schema changes


Replication
• Asynchronous (default)
• (Enhanced loss-less) Semi-synchronous (plugin)
• Synchronous (Galera, group replication, NDBCLUSTER)
• DRBD


Frameworks
• MySQL-MMM
• Severalnines ClusterControl
• Orchestrator
• MySQL MHA
• Tungsten Replicator
• 5.6+ utilities: mysqlfailover, mysqlrpladmin
• Percona Replication Manager (https://github.com/percona/percona-pacemaker-agents/)
• Replication Manager (github.com/tanji/replication-manager)


GitHub


https://github.com/blog/1261-github-availability-this-week


Fully automated failover a good idea?
• False alarms
• Repeated failover
  • Overloaded master? MHA doesn’t allow another failover within 8h of the previous one, unless --last_failover_min=n is set
• Data loss
  • master’s latest transaction is id=103, the replica’s relay logs end at id=101 => 102 and 103 are lost
  • group commit in the binary log
• Split brain
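Two of the risks above reduce to simple arithmetic. The sketch below is hypothetical helper code, not part of MHA or any other framework: `lost_transactions` lists the transaction ids that never reached the replica's relay log, and `failover_allowed` mimics a cooldown such as the 8-hour window mentioned for MHA.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def lost_transactions(master_last_id: int, relay_log_last_id: int) -> List[int]:
    """Ids committed on the master but absent from the replica's relay log."""
    return list(range(relay_log_last_id + 1, master_last_id + 1))


def failover_allowed(
    last_failover: Optional[datetime],
    now: datetime,
    cooldown: timedelta = timedelta(hours=8),  # assumption: 8h, as on the slide
) -> bool:
    """Refuse a new failover if the previous one was too recent."""
    return last_failover is None or now - last_failover >= cooldown


if __name__ == "__main__":
    print(lost_transactions(103, 101))
    now = datetime(2017, 1, 18, 12, 0)
    print(failover_allowed(now - timedelta(hours=2), now))
```

A cooldown like this is a blunt guard against flapping: it trades a longer outage in the rare case of two genuine failures for protection against repeatedly promoting replicas under an overloaded master.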


Proxies
• MariaDB MaxScale
  • Popular use: load balancing Galera clusters
• MySQL Router + MySQL Fabric
• ProxySQL
  • Used alongside Galera clusters too
  • Included with Percona XtraDB Cluster 5.7


Sharding
• SPIDER
• Tungsten Replicator
• Tumblr JetPants


Vitess
• Servers & tools, written in Go, to scale MySQL for the web
• Has MariaDB support too (*)
• Python client interface
• DML annotation, connection pooling, shard management, workflow management, zero downtime restarts
• Has become super easy to use: http://vitess.io/ (with the help of Kubernetes)


Failwhales
• Twitter started on MySQL, and is still MySQL - you just need to “evolve”
  • Gizzard (sharding), Mesos + Apache Cotton
• Digg started on MySQL, migrated to Cassandra, and came back to MySQL


Security
• Philippines voter data leak leaves 55m at risk: 338GB MySQL dump
• Ashley Madison: 6.9GB compressed dump, 36m email addresses leaked, 9.6m credit card transactions
• Patreon: 13.7GB MySQL dump, 99 tables


Mossack Fonseca: Panama Papers


Prevent SQL injections
• MariaDB MaxScale database firewall filter
  • Configurable filter actions on rule match (allow the query, block the query or ignore the match), logging of matching and/or non-matching queries
• MySQL Enterprise firewall
• ProxySQL
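The tools above filter queries at the proxy or server. A complementary application-side habit, not one of the products on this slide, is to use parameterized queries so that user input is never spliced into the SQL text. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a MySQL driver; the `users` table and the `make_demo_db` and `find_user` helpers are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, name: str) -> List[Tuple[int, str]]:
    """Look up users by name; the value is bound as a parameter, not concatenated."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(find_user(db, "alice"))
    # A classic injection string is treated as a literal name and matches nothing.
    print(find_user(db, "' OR '1'='1"))
```

MySQL drivers for Python follow the same DB-API pattern, though the placeholder style differs between drivers.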


Encryption at rest
• MariaDB Server 10.1: table or tablespace encryption
  • design goal: encrypt all user data that may touch the disk: InnoDB data, InnoDB logs, binary logs, temporary tables, temporary files
  • key management on the filesystem? [no key rotation] Amazon KMS?
  • caveats: mysqlbinlog needs work with encrypted binlogs; Galera Cluster gcache isn’t encrypted
• MySQL 5.7: only encrypts InnoDB tablespaces (innodb_file_per_table; logs unencrypted)


In conclusion…
• Use semi-sync replication with a failover solution that ensures you don’t fail over too often
• Make good backups. Test them. Save them.
• You’ll most definitely need to shard your data, use proven frameworks and get a proxy involved. Complete backups with multi-source replication when needed.
• Use mysqldump and xtrabackup together (and mydumper for parallel backup/restore; mysqlpump)
• Security is key: prevent SQL injections, encrypt your data at rest


It’s 2016, you don’t want this…


Percona Monitoring and Management (PMM)

• http://pmmdemo.percona.com/


Join Us at Percona Live

When: April 24-27, 2017 Where: Santa Clara, CA USA

Percona Live is a very popular conference: last year’s Percona Live Europe sold out, and we’re looking to do the same for Percona Live 2017. Don’t miss your chance to get your ticket at its most affordable price. https://www.percona.com/live/17/registration

Percona Live 2017 sponsorship opportunities are available now. https://www.percona.com/live/17/sponsors


Thank you. Q&A
[email protected] / [email protected]
@bytebot on Twitter | http://www.bytebot.net/blog/
slides: slideshare.net/bytebot
