MongoDB Backup and Recovery Field Guide

Tim Vaillancourt, Sr Technical Operations Architect, Percona



`whoami`

{
  name: "tim",
  lastname: "vaillancourt",
  employer: "percona",
  techs: ["mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr", "python", "golang"]
}


Agenda

● History
● Methods
  ○ Logical
  ○ Binary
    ■ Cold
    ■ LVM
    ■ Hot Backup
● Integrity / Consistency
  ○ mongodb_consistent_backup
● Architecture
● Restore and Validation


History

● 3000-4000 BC: Culturally significant data backed up in a universal format
● 1400: The Printing Press
● 1600-1800: Chapultepec Aqueduct
● 1990s: Floppy and Zip Disks
● 2000s: No more Floppy/Zip Disks
● Present: All my data is on Google Drive and I have 7 days of hourly Time Machine backups!
● Future: ?


Replication != Backup

● Replication is not a backup!
  ○ Replication is High Availability
  ○ Including
    ■ Binary/Statement-based Replication of any type
      ● Delayed Replication***
    ■ RAID Arrays
● <EOF>


Backup Methods


Logical Backups

● Tools
  ○ mongodump
    ■ Uses find() queries with $snapshot to back up all collections
    ■ Supports Gzip and Threading in 3.2+
    ■ Outputs a directory containing bson files in various subdirectories
  ○ Custom Queries
    ■ The client API could be used similarly to mongodump to perform logical backups
● Benefits
  ○ Reduced storage footprint
  ○ Replication awareness
  ○ Compatibility
● Drawbacks
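
As a rough illustration of the mongodump options mentioned on this slide, the Python sketch below only assembles a command line; it does not run anything. The host, port, output directory and parallelism value are placeholders chosen for the example. `--gzip` and `--numParallelCollections` are the mongodump 3.2+ options for compression and threading.

```python
import shlex


def build_mongodump_cmd(host, port, out_dir, gzip=True, parallel_collections=4):
    """Assemble (but do not run) a mongodump command line."""
    cmd = ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]
    if gzip:
        cmd.append("--gzip")  # compress the dumped .bson files (3.2+)
    if parallel_collections:
        # number of collections dumped in parallel (3.2+)
        cmd += ["--numParallelCollections", str(parallel_collections)]
    return cmd


# Placeholder values, for illustration only.
cmd = build_mongodump_cmd("localhost", 27017, "/backup/dump")
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be handed to `subprocess.run` on a host where mongodump is installed.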


Binary Backups: Cold Backup

● Very simple process
● Causes full outage to MongoDB instance!
● Process
  ○ Stop mongod
  ○ Copy and archive dbPath
  ○ Start mongod
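
The three-step process above can be sketched in Python as follows. This is a minimal sketch, not a production script: `stop_mongod` and `start_mongod` are hypothetical callables (in practice they might wrap a service manager), and the demo runs against a throwaway directory standing in for dbPath.

```python
import os
import tarfile
import tempfile


def cold_backup(db_path, archive_path, stop_mongod, start_mongod):
    """Stop mongod, archive dbPath, then start mongod again."""
    stop_mongod()
    try:
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(db_path, arcname=os.path.basename(db_path))
    finally:
        start_mongod()  # restart even if archiving failed
    return archive_path


# Demo with a temporary directory instead of a real dbPath.
events = []
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "db")
    os.mkdir(db_path)
    with open(os.path.join(db_path, "collection-0.wt"), "w") as f:
        f.write("fake data file")
    archive = cold_backup(
        db_path,
        os.path.join(tmp, "db.tar.gz"),
        stop_mongod=lambda: events.append("stop"),
        start_mongod=lambda: events.append("start"),
    )
    with tarfile.open(archive) as tar:
        archived_names = sorted(tar.getnames())
print(events, archived_names)
```

The `finally` clause is there to keep the outage as short as possible: mongod is started again whether or not the archive step succeeded.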


Binary Backups: LVM / Filer / Cloud Disk

● Process
  ○ If Non-Journalled
    ■ db.fsyncLock()
    ■ Keep session open
  ○ Create block-device snapshot
  ○ Unlock the database
    ■ db.fsyncUnlock()
  ○ Copy or archive the snapshot directory
  ○ Remove block-device snapshot (as quickly as possible!)
● LVM
  ○ Snapshots have been demonstrated to cause up to 30%* write latency impact to disk due to COW
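
The ordering of the snapshot steps matters more than the tooling, so here is a small Python sketch of the sequence only. All six callables are hypothetical hooks supplied by the caller (for example wrappers around db.fsyncLock(), db.fsyncUnlock() and the snapshot commands of your LVM, filer or cloud provider); none of them is part of MongoDB or LVM itself.

```python
def snapshot_backup(journaled, fsync_lock, fsync_unlock,
                    create_snapshot, copy_snapshot, remove_snapshot):
    """Run the snapshot steps in order, always unlocking and cleaning up."""
    locked = False
    if not journaled:
        fsync_lock()        # db.fsyncLock(); the session must stay open
        locked = True
    try:
        create_snapshot()   # block-device snapshot (LVM, filer, cloud disk)
    finally:
        if locked:
            fsync_unlock()  # db.fsyncUnlock(), even if the snapshot failed
    try:
        copy_snapshot()     # copy or archive the snapshot contents
    finally:
        remove_snapshot()   # drop the snapshot quickly to limit COW overhead


# Demo: record the order in which the hooks are called.
steps = []
snapshot_backup(
    journaled=False,
    fsync_lock=lambda: steps.append("fsyncLock"),
    fsync_unlock=lambda: steps.append("fsyncUnlock"),
    create_snapshot=lambda: steps.append("create snapshot"),
    copy_snapshot=lambda: steps.append("copy snapshot"),
    remove_snapshot=lambda: steps.append("remove snapshot"),
)
print(steps)
```

The database is unlocked as soon as the snapshot exists, before the slow copy step, so writes are blocked only for the duration of the snapshot creation.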


Binary Backups: Hot Backup

● PSMDB or MongoDB Enterprise
  ○ Pay $$$ for MongoDB Enterprise or download PSMDB for free(!)
  ○ db.adminCommand({
        createBackup: 1,
        backupDir: "/data/mongodb/backup"
    })
  ○ Copy/archive the output path
  ○ Delete the backup output path
  ○ NOTE:
    ■ RocksDB-based createBackup creates filesystem hardlinks whenever possible!
    ■ Delete RocksDB backupDir as soon as possible to reduce bloom filter overhead!
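
A minimal Python sketch of the create, archive and delete sequence above. `run_admin_command` is a hypothetical callable that sends a command document to the admin database; with a driver such as pymongo it could plausibly be `client.admin.command`, but that wiring is an assumption and is not exercised here. The demo uses a stand-in that only creates a fake backup file.

```python
import os
import shutil
import tarfile
import tempfile


def hot_backup(run_admin_command, backup_dir, archive_path):
    """Run createBackup into backup_dir, archive it, then delete backup_dir."""
    run_admin_command({"createBackup": 1, "backupDir": backup_dir})
    try:
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(backup_dir, arcname="backup")
    finally:
        shutil.rmtree(backup_dir)  # remove the output path as soon as possible
    return archive_path


commands = []


def fake_admin_command(cmd):
    # Stand-in for the server: record the command and create a backup file.
    commands.append(cmd)
    os.makedirs(cmd["backupDir"])
    with open(os.path.join(cmd["backupDir"], "data.wt"), "w") as f:
        f.write("fake backup file")


with tempfile.TemporaryDirectory() as tmp:
    backup_dir = os.path.join(tmp, "backup")
    archive = hot_backup(fake_admin_command, backup_dir,
                         os.path.join(tmp, "backup.tar.gz"))
    backup_dir_removed = not os.path.exists(backup_dir)
    with tarfile.open(archive) as tar:
        archived = sorted(tar.getnames())
print(commands[0], backup_dir_removed, archived)
```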


Backup Integrity / Consistency


The “Distributed Cluster Backup Problem”

● mongodump is consistent on a single node only!
● Common to most or all database techs in a sharded environment
● Problems:
  ○ Backup tools consider single-instance integrity only
  ○ Backups of different shards may complete at different times
  ○ Changes replicate asynchronously
  ○ Data may be balancing / moving in the cluster
● Risks:
  ○ Orphaned documents / references
  ○ Holes in data
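
To make the "different completion times" problem concrete, the sketch below picks a single timestamp that every shard's backup could be rolled forward to. It assumes each shard has a dump that is consistent as of its finish time, plus oplog entries captured up to some later time. The integer timestamps and shard names are made up, and this illustrates the general idea only; it is not the implementation of any particular tool.

```python
def consistent_point(shards):
    """Pick one timestamp every shard backup can be rolled forward to.

    `shards` maps a shard name to (dump_finished_ts, last_oplog_ts).
    """
    # No shard can be restored to a point before its own dump finished.
    earliest_common = max(done for done, _ in shards.values())
    # No shard can be rolled forward past the oplog captured for it.
    latest_common = min(last for _, last in shards.values())
    if latest_common < earliest_common:
        raise ValueError("captured oplogs do not overlap; no consistent point")
    return latest_common


# Hypothetical timestamps (seconds) for three shards.
shards = {
    "shard0": (100, 130),
    "shard1": (115, 125),
    "shard2": (108, 140),
}
print(consistent_point(shards))  # -> 125
```

Restoring every shard's dump and replaying its oplog up to the returned timestamp would give all shards the same point in time. Restoring the raw dumps alone would leave them 15 seconds apart in this example.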


Backups: mongodb_consistent_backup

● Python project by Percona-Lab for consistent backups
● URL: https://github.com/Percona-Lab/mongodb_consistent_backup
● Best-effort support, not a “Percona Product”
● Created to solve limitations in MongoDB backup tools:
  ○ Replica Set and Sharded Cluster awareness
  ○ Cluster-wide Point-in-time consistency
  ○ In-line Oplog backup (vs post-backup)
  ○ Notifications of success / failure
● Extra Features
  ○ Remote Upload (AWS S3, Google Cloud Storage and Rsync)
  ○ Archiving (Tar or ZBackup deduplication and optional AES-at-rest)
  ○ CentOS/RHEL7 RPMs and Docker-based releases (.deb soon!)


Backups: mongodb_consistent_backup

● 1.2.0
  ○ Multi-threaded Rsync Upload
  ○ Replica Set Tags support
  ○ Support for MongoDB SSL / TLS connections and client auth
  ○ Rotation / Expiry of old backups (locally-stored only)
● Future
  ○ Incremental Backups
  ○ Binary-level Backups (Hot Backup, Cold Backup, LVM, Cloud-based, etc)
  ○ More Notification Methods (PagerDuty, Email, etc)
  ○ Restore Helper Tool
  ○ Instrumentation / Metrics
  ○ <YOUR AWESOME IDEA HERE> we take GitHub PRs (and it’s Python)!


Backup Architecture


Architecture: Simple Example

● Method
  ○ Run mongodump (with --oplog) using a plain secondary
  ○ Store backups with on-site remote storage (filer, rsync, etc)
● Potential Issues
  ○ Application Impact
    ■ I/O and CPU impact due to backups may affect application
    ■ Storage-engine and FS caches will become dirty
    ■ Primary Failure
      ● A failure of the Primary may cause the Secondary backing-up to become Primary
      ● This can be avoided by using a Read Preference of ‘secondary’ (supported in recent mongodump versions)
  ○ No Disaster Recovery
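
A small sketch of the simple setup, assuming a replica set named `rs0` with two placeholder hosts. The first function checks an `isMaster` reply (the `ismaster` and `secondary` fields) so that a backup is not started on a node that has just become Primary. The second assembles, without running, a mongodump command that uses `--oplog` and a read preference of `secondary`.

```python
def is_safe_backup_source(is_master_reply):
    """True only when the node reports itself as a SECONDARY."""
    return (bool(is_master_reply.get("secondary"))
            and not is_master_reply.get("ismaster"))


def build_secondary_dump_cmd(replset, hosts, out_dir):
    """Assemble (but do not run) a mongodump command that reads from a secondary."""
    seed = "{}/{}".format(replset, ",".join(hosts))  # replSetName/host1,host2
    return [
        "mongodump",
        "--host", seed,
        "--readPreference", "secondary",
        "--oplog",  # also capture oplog entries written during the dump
        "--out", out_dir,
    ]


# Placeholder replica set name, hosts and path.
dump_cmd = build_secondary_dump_cmd(
    "rs0", ["db1.example.com:27017", "db2.example.com:27017"], "/backup/dump")
print(is_safe_backup_source({"ismaster": False, "secondary": True}))
print(" ".join(dump_cmd))
```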


Architecture: Tag-Based Example

● Replica Set Tags
  ○ Allow selection of MongoDB nodes using key/value pairs
  ○ Represented in JSON/single document
  ○ Many key/value pairs are possible
● Example Backup from “west” Only
  ○ Specify a single node with a tag such as { location: "west" }
  ○ Use Read Preference Tag in mongodump/mongodb_consistent_backup to target a specific node.
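
The tag matching described above can be illustrated in a few lines of Python. The member list and host names are invented for the example. The JSON document produced by `read_preference_doc` follows the `mode` / `tagSets` form that recent mongodump versions accept for `--readPreference`; treat that as an assumption and check it against the version you run.

```python
import json


def members_matching(members, tag_set):
    """Return hosts whose tags contain every key/value pair in tag_set."""
    return [
        m["host"] for m in members
        if all(m.get("tags", {}).get(k) == v for k, v in tag_set.items())
    ]


def read_preference_doc(tag_set, mode="secondary"):
    """Build a read preference document restricted to one tag set."""
    return json.dumps({"mode": mode, "tagSets": [tag_set]})


# Hypothetical replica set members.
members = [
    {"host": "east1.example.com:27017", "tags": {"location": "east"}},
    {"host": "west1.example.com:27017", "tags": {"location": "west"}},
    {"host": "east2.example.com:27017"},  # no tags at all
]
west_hosts = members_matching(members, {"location": "west"})
read_pref = read_preference_doc({"location": "west"})
print(west_hosts)
print(read_pref)
```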


Architecture: Offsite Backup Example

● Example
  ○ Create backup within local datacenter
  ○ Upload completed backups to other datacenter, cloud, etc
    ■ mongodb_consistent_backup supports Amazon S3, Google Cloud Storage and Rsync for remote upload!
● Benefits
  ○ Fast backup time due to in-datacenter latency
● Drawbacks
  ○ The full backup data is uploaded on every backup job


Architecture: Disaster Recovery Example

● Example
  ○ Place a SECONDARY node in another location
    ■ Dedicated node is recommended to reduce impact
    ■ hidden:true recommended
  ○ Run backup from off-site SECONDARY member
  ○ Optionally upload to Cloud Storage
● Benefits
  ○ Only changes (replication) replicated to offsite location
  ○ Potentially faster uploads to Cloud Storage
● Drawbacks
  ○ Bootstrap / Initial Sync may use high bandwidth (if not seeded by backup)
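
As a sketch of what the dedicated offsite member might look like in the replica set configuration, the function below appends a hidden member to a config document. The host names and the `location` tag are placeholders. `priority: 0` is included because hidden members must have priority 0, and the config version is incremented because a reconfiguration needs a newer version than the current one.

```python
import copy


def add_offsite_backup_member(config, host, location):
    """Return a new replica set config with a hidden offsite member added."""
    new_config = copy.deepcopy(config)
    next_id = max(m["_id"] for m in new_config["members"]) + 1
    new_config["members"].append({
        "_id": next_id,
        "host": host,
        "priority": 0,    # hidden members must have priority 0
        "hidden": True,   # not advertised to application clients
        "tags": {"location": location},
    })
    new_config["version"] += 1  # a reconfig needs a higher config version
    return new_config


# Hypothetical two-member replica set in the local datacenter.
config = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "db1.example.com:27017"},
        {"_id": 1, "host": "db2.example.com:27017"},
    ],
}
new_config = add_offsite_backup_member(config, "dr1.example.com:27017", "offsite")
print(new_config["version"], new_config["members"][-1])
```

The original `config` is left untouched, so the returned document can be reviewed before it is applied with a replica set reconfiguration.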


Restore and Validation

“It’s not a backup system, it’s a restore system” ~ Raymond Blum, Google SRE


Restoring and Validation

● Methodology
  ○ Optimise restore time, not backup run time
    ■ Users and business care how fast their data is back, not how long it takes to back up
    ■ Binary-level backups are much faster to restore in MongoDB
● Validation
  ○ This is very application specific
  ○ Randomly sample restored data and validate it
    ■ Example: Compare to Production
      ● Compare real Production item, user, article, etc to backup
      ● Ensure backup age doesn’t cause false alarms, ie: test data older than backup
    ■ Example: Integration Test / QA
      ● Run code integration tests or QA on restored data
    ■ Example: Production Backup as Test Data
      ● Copy Production Data to Test periodically using backups
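
The "Compare to Production" example can be sketched as below. Plain dictionaries keyed by document id stand in for the production and restored collections, and the `updated_at` field is a hypothetical application-level timestamp. Documents changed in production after the backup was taken are skipped, which is one way to keep backup age from causing false alarms.

```python
import random


def validate_sample(production, restored, backup_time, sample_size, seed=None):
    """Compare a random sample of restored documents with production.

    Returns the sorted ids of sampled documents that differ even though
    production has not changed them since the backup.
    """
    rng = random.Random(seed)
    ids = sorted(restored)
    mismatches = []
    for doc_id in rng.sample(ids, min(sample_size, len(ids))):
        live = production.get(doc_id)
        if live is None or live["updated_at"] > backup_time:
            continue  # deleted or modified since the backup: not comparable
        if live != restored[doc_id]:
            mismatches.append(doc_id)
    return sorted(mismatches)


# Hypothetical data: the backup was taken at time 100.
production = {
    1: {"name": "a", "updated_at": 50},
    2: {"name": "b2", "updated_at": 200},  # changed after the backup
    3: {"name": "c", "updated_at": 60},
}
restored = {
    1: {"name": "a", "updated_at": 50},
    2: {"name": "b", "updated_at": 90},
    3: {"name": "X", "updated_at": 60},    # differs from production
}
mismatches = validate_sample(production, restored, backup_time=100, sample_size=3)
print(mismatches)  # -> [3]
```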


Thank You Sponsors!


SAVE THE DATE!

CALL FOR PAPERS OPENING SOON!

www.perconalive.com

April 23-25, 2018, Santa Clara Convention Center


Questions?