DataStax: Backup and Restore in Cassandra and OpsCenter


Transcript of DataStax: Backup and Restore in Cassandra and OpsCenter

Page 1: DataStax: Backup and Restore in Cassandra and OpsCenter

Backup and Restore in Cassandra and OpsCenter

Page 2: DataStax: Backup and Restore in Cassandra and OpsCenter

Overview
Snapshot Operations
Restore Operations
Commit Log Archiving/Point in Time Restore
Remote backup
From both Cassandra and OpsCenter perspectives

Page 3: DataStax: Backup and Restore in Cassandra and OpsCenter

Snapshots
Nodetool Snapshot Basics
Performs a flush, then hard links SSTables to
<data_file_directories>/<ks>/<table>/snapshots/<snapshot-name>/
Under the hood, mbeans:
org.apache.cassandra.db -> StorageService -> takeSnapshot

More at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSnapShot.html
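
The hard-link step can be illustrated with a small Python sketch. It is not Cassandra's implementation and it skips the flush. The function name snapshot_table and the file name in the demo are made up for illustration. It only shows why a snapshot is cheap: each snapshot entry is a second directory entry for the same immutable file, so no data is copied.

```python
import os
import tempfile


def snapshot_table(table_dir, snapshot_name):
    """Hard-link every regular file in table_dir into snapshots/<snapshot_name>/."""
    snap_dir = os.path.join(table_dir, "snapshots", snapshot_name)
    os.makedirs(snap_dir)
    linked = []
    for entry in sorted(os.listdir(table_dir)):
        src = os.path.join(table_dir, entry)
        if os.path.isfile(src):  # skips the snapshots/ directory itself
            os.link(src, os.path.join(snap_dir, entry))
            linked.append(entry)
    return snap_dir, linked


# Demo on a throwaway directory; the SSTable file name is illustrative only.
demo_root = tempfile.mkdtemp()
demo_table = os.path.join(demo_root, "ks", "tbl")
os.makedirs(demo_table)
with open(os.path.join(demo_table, "ks-tbl-ka-1-Data.db"), "wb") as f:
    f.write(b"immutable sstable bytes")

demo_snap_dir, demo_linked = snapshot_table(demo_table, "nightly")
```

On a real node the equivalent operation would be requested with nodetool snapshot, for example with -t to name the snapshot, rather than by linking files by hand.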

Page 4: DataStax: Backup and Restore in Cassandra and OpsCenter

Snapshots in OpsCenter
Under Services -> Backup
Displays backup history, allows backup and restore.
Advanced settings we'll cover later
Backup Service is an Enterprise Feature

More at
http://docs.datastax.com/en/opscenter/5.2/opsc/online_help/services/opscBackupService.html

Page 5: DataStax: Backup and Restore in Cassandra and OpsCenter

Snapshots in OpsCenter
Schedule repeated backups or create ad hoc backup
Select keyspaces
Set location (on server vs S3)
Uses the mbean to perform the snapshot rather than shelling out.
Coordinates the snapshot on all nodes.
Backs up the schema to schema.json
Keeps a log for audit

Page 6: DataStax: Backup and Restore in Cassandra and OpsCenter

Auditable Records

Page 7: DataStax: Backup and Restore in Cassandra and OpsCenter

Remote Snapshots
OpsCenter can also back up to S3
Specify S3 bucket name, AWS credentials
Optional transfer throttle and compression
Not all SSTables need to be backed up each time: because they are immutable, only part of the data (files not already backed up) may require it.
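
One reading of the last point is that a backup only has to transfer files the remote location does not hold yet. A minimal sketch, assuming that comparing by file name is enough because an SSTable is never modified after it is written. The function name and file names are hypothetical, and this is not OpsCenter's code.

```python
def sstables_to_upload(local_files, already_uploaded):
    """Return the local SSTable files that are not yet in the remote backup."""
    uploaded = set(already_uploaded)
    return sorted(name for name in local_files if name not in uploaded)


# Two of the three local files were uploaded by an earlier backup.
pending = sstables_to_upload(
    ["ks-tbl-ka-1-Data.db", "ks-tbl-ka-2-Data.db", "ks-tbl-ka-3-Data.db"],
    ["ks-tbl-ka-1-Data.db", "ks-tbl-ka-2-Data.db"],
)
```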

Page 8: DataStax: Backup and Restore in Cassandra and OpsCenter

SSTables need to be stored per node to avoid name collisions.
However, dropping and recreating a table can lead to a naming collision as well; OpsCenter can attach a timestamp.
If your data is encrypted, make sure that the encryption key is also put somewhere safe.
OpsCenter backs up schemas.
Topologies change over time (more on this in restore).
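
The two collision cases can be sketched with a hypothetical remote key layout. The layout, the function name and the identifiers below are assumptions for illustration, not the layout OpsCenter actually uses. The node ID separates files with the same name from different nodes, and a timestamp on the table separates a dropped table from its recreated namesake.

```python
def remote_key(node_id, keyspace, table, table_created_at, filename):
    """Build a hypothetical object key unique per node and per table incarnation."""
    return "/".join([node_id, keyspace, f"{table}-{table_created_at}", filename])


# Same file name on two nodes, and on a table that was dropped and recreated.
key_a = remote_key("node-a", "ks", "users", 1430000000, "ks-users-ka-1-Data.db")
key_b = remote_key("node-b", "ks", "users", 1430000000, "ks-users-ka-1-Data.db")
key_a_recreated = remote_key("node-a", "ks", "users", 1440000000, "ks-users-ka-1-Data.db")
```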

Page 9: DataStax: Backup and Restore in Cassandra and OpsCenter

Restore Operations
sstableloader Basics
Expects the schema to already exist for the SSTables.
Expects a directory structure different from that created by the snapshot, specifically <Keyspace>/<Table>/<files>
Can stream data to other nodes; it doesn't just move files into place.
Leaves the source files in place as they are restored, so there is a possible disk space penalty.

More at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html

Page 10: DataStax: Backup and Restore in Cassandra and OpsCenter

Restore Operations
Select a backup from a list of available snapshots.
Point in Time restores (more on this later)
Restore from other location

Page 11: DataStax: Backup and Restore in Cassandra and OpsCenter

Restore Operations
Attempts to recreate the schema or do a schema comparison. The latter is extremely difficult with Thrift.
Creates symbolic links in a temporary directory to match what sstableloader expects.
Logs/audit trail to follow.
Uses sstableloader
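
The symbolic-link step can be sketched as follows. This is an illustration under stated assumptions, not OpsCenter's code: the function names, host addresses and file name are hypothetical. The snapshot files are linked into a temporary <Keyspace>/<Table>/ directory, and an sstableloader command line is built (but not run) with the -d option for the initial hosts.

```python
import os
import tempfile


def stage_for_sstableloader(snapshot_dir, keyspace, table):
    """Symlink snapshot files into <tmp>/<keyspace>/<table>/ and return that path."""
    staging_root = tempfile.mkdtemp(prefix="restore-")
    target_dir = os.path.join(staging_root, keyspace, table)
    os.makedirs(target_dir)
    for entry in sorted(os.listdir(snapshot_dir)):
        src = os.path.abspath(os.path.join(snapshot_dir, entry))
        if os.path.isfile(src):
            os.symlink(src, os.path.join(target_dir, entry))
    return target_dir


def sstableloader_command(target_dir, initial_hosts):
    """Build (but do not run) an sstableloader invocation for target_dir."""
    return ["sstableloader", "-d", ",".join(initial_hosts), target_dir]


# Demo with a throwaway snapshot directory and made-up host addresses.
demo_snapshot = tempfile.mkdtemp()
with open(os.path.join(demo_snapshot, "ks-tbl-ka-1-Data.db"), "wb") as f:
    f.write(b"sstable bytes")

staged = stage_for_sstableloader(demo_snapshot, "ks", "tbl")
cmd = sstableloader_command(staged, ["10.0.0.1", "10.0.0.2"])
```

Linking instead of copying avoids holding a second copy of the snapshot on disk while it is staged.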

Page 12: DataStax: Backup and Restore in Cassandra and OpsCenter

Remote Restore
Topologies change over time.
When topologies shrink, multiple nodes' worth of data will have to be sent to a single node (SSTable naming collisions).

Page 13: DataStax: Backup and Restore in Cassandra and OpsCenter

Remote Restore
When topologies grow, some nodes may be idle during a restore.
Replacement nodes will have a different host ID and will need to be matched to the host ID of the snapshot.
OpsCenter handles all of these cases.
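
The shrink, grow and replacement cases can be made concrete with a toy planner. The matching rule here (same host ID first, then the least loaded current node) is an assumption chosen for illustration and is not OpsCenter's algorithm. Host IDs are shortened to single letters, and at least one current node is assumed.

```python
def plan_restore(snapshot_host_ids, current_host_ids):
    """Map each current node to the snapshot hosts whose data it should load."""
    plan = {host: [] for host in current_host_ids}
    unmatched = []
    for snap in snapshot_host_ids:
        if snap in plan:
            plan[snap].append(snap)  # same host ID: restore onto itself
        else:
            unmatched.append(snap)
    for snap in unmatched:
        # Hypothetical rule: give the data to the least loaded current node.
        target = min(current_host_ids, key=lambda h: len(plan[h]))
        plan[target].append(snap)
    return plan


shrunk = plan_restore(["a", "b", "c"], ["a"])   # one node takes three nodes' data
grown = plan_restore(["a"], ["a", "x", "y"])    # x and y have nothing to load
replaced = plan_restore(["a", "b"], ["a", "z"])  # z stands in for old host b
```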

Page 14: DataStax: Backup and Restore in Cassandra and OpsCenter

Commit Log Archiving
Cassandra can execute a script when writing commit log segments
Set in commitlog_archiving.properties

http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLogArchive_t.html
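
A minimal sketch of what such a script might do, assuming it is wired in through the archive_command setting, which substitutes %path (full path of the segment) and %name (its file name). The function name and the segment name in the demo are hypothetical. A line such as archive_command=/usr/bin/python3 /opt/backup/archive_segment.py %path %name would be one way to call it, with that script path being an assumption.

```python
import os
import shutil
import tempfile


def archive_segment(segment_path, segment_name, archive_dir):
    """Copy one commit log segment into archive_dir, keeping its name."""
    os.makedirs(archive_dir, exist_ok=True)
    destination = os.path.join(archive_dir, segment_name)
    shutil.copy2(segment_path, destination)  # copy2 also preserves timestamps
    return destination


# Demo with a fake segment in a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_segment = os.path.join(demo_dir, "CommitLog-4-1430000000000.log")
with open(demo_segment, "wb") as f:
    f.write(b"mutations")

archived = archive_segment(
    demo_segment, "CommitLog-4-1430000000000.log", os.path.join(demo_dir, "archive")
)
```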

Page 15: DataStax: Backup and Restore in Cassandra and OpsCenter

Commit Log Archiving
OpsCenter can also enable that, under Services -> Backup Service -> Settings
OpsCenter can also send these to S3.

http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLogArchive_t.html

Page 16: DataStax: Backup and Restore in Cassandra and OpsCenter

Point in Time Restore
2 step operation: restore snapshot, then replay commit logs.
Find the nearest snapshot that happens prior to the point in time desired, perform a restore.
Update commitlog_archiving.properties with the location of the commit logs as well as the point in time to restore.
Restart Cassandra.

More at
http://docs.datastax.com/en//cassandra/2.0/cassandra/configuration/configLogArchive_t.html
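
The first and third steps can be sketched in Python. The helper names, the /backup/commitlogs directory and the sample times are assumptions for illustration. The property names restore_command, restore_directories and restore_point_in_time, and the yyyy:MM:dd HH:mm:ss timestamp format in GMT, are those used by commitlog_archiving.properties.

```python
from datetime import datetime, timezone


def nearest_snapshot_before(snapshot_times, target):
    """Return the latest snapshot time that is not after target, or None."""
    candidates = [t for t in snapshot_times if t <= target]
    return max(candidates) if candidates else None


def restore_properties(commitlog_dir, target):
    """Render the restore settings for commitlog_archiving.properties."""
    stamp = target.astimezone(timezone.utc).strftime("%Y:%m:%d %H:%M:%S")
    return "\n".join([
        "restore_command=cp -f %from %to",
        f"restore_directories={commitlog_dir}",
        f"restore_point_in_time={stamp}",
    ])


# Three daily snapshots; restore to the afternoon of the second day.
snapshots = [datetime(2015, 6, day, tzinfo=timezone.utc) for day in (1, 2, 3)]
target_time = datetime(2015, 6, 2, 13, 30, 0, tzinfo=timezone.utc)

chosen = nearest_snapshot_before(snapshots, target_time)
properties_text = restore_properties("/backup/commitlogs", target_time)
```

After restoring the chosen snapshot and writing these settings, the node is restarted so the archived commit logs are replayed up to the target time.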

Page 17: DataStax: Backup and Restore in Cassandra and OpsCenter

PiT in OpsCenter
OpsCenter can automate the PiT restore process
Set time (in UTC)
OpsCenter will verify that it is capable of restoring to that point in time.
Commit logs or snapshots can be local or on S3

Page 18: DataStax: Backup and Restore in Cassandra and OpsCenter

PiT Restore Challenges
Commit log replays don't stream data around the ring; this makes topology changes difficult to handle.
Comparing schemas can be tricky if the replay contains schema changes.

Page 19: DataStax: Backup and Restore in Cassandra and OpsCenter

Questions?

Feel free to reach out: https://www.linkedin.com/in/philipsdoctor