Online Schema Changes for Maximizing Uptime

34
Online Schema Changes for Maximizing Uptime PALOMINODB OPERATIONAL EXCELLENCE FOR DATABASES David Turner Ben Black www.palominodb.com

description

PalominoDB's David Turner and Ben Black cover common operations implemented in production and how you can minimize downtime and customer impact

Transcript of Online Schema Changes for Maximizing Uptime

Page 1: Online Schema Changes for Maximizing Uptime

Online Schema Changes

for Maximizing Uptime

PALOMINODB OPERATIONAL EXCELLENCE

FOR DATABASES

David Turner

Ben Black

www.palominodb.com

Page 2: Online Schema Changes for Maximizing Uptime

Palominodb: Who are we and what do we do?

• Proactive database awesomeness so you

can focus on other things

• Systems and Performance Optimization

• Operational Best Practices

• Devops

• Ask us how we like working from anywhere

Page 3: Online Schema Changes for Maximizing Uptime

Online Schema Changes

• Best practices

• Online Schema Tools

• PT Online Schema Change

• Innodb Online DDL

• Case Study

Page 4: Online Schema Changes for Maximizing Uptime

Best Practices

• Backups

• Benchmark o Start small

o Step up

o Time

o Performance_schema and Information_schema

Page 5: Online Schema Changes for Maximizing Uptime

Repository

• Prevent

o Broken Replication

o Data loss

o Broken applications

Page 6: Online Schema Changes for Maximizing Uptime

Creating the repository

• Dump

• Script

Page 7: Online Schema Changes for Maximizing Uptime

Syncing the repository

• Migrations

• Schema Changes

• Scheduled

Page 8: Online Schema Changes for Maximizing Uptime

Problems

• Schema changes

• Fragmentation

Page 9: Online Schema Changes for Maximizing Uptime

You want to do what to that 40G

table?

• Down for Maintenance/blocking DDL

• Disable writes

• Remaster

• Facebook OSC

• Large Hadron Migrator

• Openark-kit

• Pt Online Schema Change

• MySQL Innodb DDL - IOD

Page 10: Online Schema Changes for Maximizing Uptime

OSC Tools or how I learned

to love the alter...

• copy original table structure to new table

• alter new table

• create triggers to copy dml from original

table to new table

• copy data in original table to new table in

chunks

• swap table names and drop original table

Page 11: Online Schema Changes for Maximizing Uptime

pt-online-schema-change

Percona toolkit is your friend.

Mature product

Pay attention to version you are using!!!

FK issues, log_bin with 2.0

screen - use it along with tee and time

Page 12: Online Schema Changes for Maximizing Uptime

It's all ball bearings nowadays

--progress time,10 (default 30 seconds)

--max-lag

--recursion-method (how to find slaves)

show processlist

show hosts

dsn table (great for ignoring some slaves)

Page 13: Online Schema Changes for Maximizing Uptime

What could possibly go wrong?

• PK/UK required

• FK names will change on altered table

• FK's reference table to alter

--alter-foreign-keys-method drop_swap

sets foreign_key_checks=0

drops original table (hardlink!!!)

renames new table

Page 14: Online Schema Changes for Maximizing Uptime

What could possibly go wrong?

• non xfs table drops (create the hardlink!)

o Can even cause innodb to crash mysqld

• PKs with gaps

• Largest table to alter vs free disk space

• Disk space (2x for RBR)

• Global mutexes (table drops)

• table metadata locks (triggers)

Page 15: Online Schema Changes for Maximizing Uptime

NOOOOOO!!

• Running PT-OSC against a slave with RBR

Replication started erroring after pt-osc

why???

• And how about syncing a table using pt-osc

with RBR?

Page 16: Online Schema Changes for Maximizing Uptime

Set it and forget it???

• watch

o w

o df -h

o ls -alh ibd files

o mysql -e"show processlist;"|egrep -v "Sleep|repl"

o slave lag

• How is it affecting the application?

• max-lag, max-load,

Page 17: Online Schema Changes for Maximizing Uptime

Features we like

• Throttling

• Nibbling

• Reporting

• Replicating

• Reorgs

Page 18: Online Schema Changes for Maximizing Uptime

Innodb Online DDL - IOD

• Looking back 5.1 and 5.5

o Fast index creation

• Enhancements

o Online DDL

o Inplace

Page 19: Online Schema Changes for Maximizing Uptime

Silver Bullets

http://dev.mysql.com/doc/refman/5.6/en/innodb-

create-index-overview.html#innodb-online-

ddl-summary-grid

• Drop and add Index together

• Temporary table index creation

Page 20: Online Schema Changes for Maximizing Uptime

Alter Online lock modes

• Exclusive

• Shared

• Default

• None

* Note

Page 21: Online Schema Changes for Maximizing Uptime

Alter Online Algorithms

• Inplace

• Copy

Page 22: Online Schema Changes for Maximizing Uptime

Grouping DDL

• Copy

• Sequence

• Lock

Page 23: Online Schema Changes for Maximizing Uptime

IOD Behavior

• Metadata

• Data volume

• Type of index

• Foreign keys

• Partitioning

o Add partition, drop partition

o Truncate partition

o Add partition and coalesce partition

• Algorithm

• Autoincrement

• Locking

Page 24: Online Schema Changes for Maximizing Uptime

IOD Errors

• Locking

• Timeouts

• Tmpdir

• DML

Page 25: Online Schema Changes for Maximizing Uptime

IOD Crash Recovery

• Secondary Indexes

• Clustered Indexes (don't crash!)

Page 26: Online Schema Changes for Maximizing Uptime

Of Interest

• Tmpdir

• Dropping Indexes with Foreign keys

• Cascade on

• Inconsistent .frm

Page 27: Online Schema Changes for Maximizing Uptime

Monitoring

• Rows Affected

• Performance_schema and

Information_schema

• Progress

• Replication lag

Page 28: Online Schema Changes for Maximizing Uptime

IOD Server parms

• innodb_online_alter_log_max_size

• sort_buffer_size

Page 29: Online Schema Changes for Maximizing Uptime

Case Study

• Ticketing system

Page 30: Online Schema Changes for Maximizing Uptime

Architecture

• 1 master

• 1 active slave

• 1 analytics/indexing slave

• 1 backup/reorg slave

Page 31: Online Schema Changes for Maximizing Uptime

Issues

• Growth

• Rebalancing Shards

• Deletions

• Reorgs

Page 32: Online Schema Changes for Maximizing Uptime

Timeline

• Difficult failovers

• Ticket Growth

• FB OSC

• PT OSC

• Reorgs

Page 33: Online Schema Changes for Maximizing Uptime
Page 34: Online Schema Changes for Maximizing Uptime

Questions

David Turner

[email protected]

Ben Black

[email protected]

www.palominodb.com