Ingres Backup and Recovery -...
2
Abstract
Proper backup is crucial in any production DBMS installation, and Ingres is no exception. Backups, however, are useless unless you can recover from them. This session explains how Ingres backup and recovery work. We will also cover some ideas on how best to take regular backups and how to carry out a safe recovery.
3
Agenda
• Why backup and recovery?
• Disaster scenarios
• Ingres features
• Housekeeping
• Customisation
• Issues to Consider
• Tips and cautions
4
Why backup and recovery?
• Insurance
• What if?
• Cost to business
• Critical functionality
• One part of overall process
6
System Crash
• Automated Recovery
• After a crash, Ingres will:
  • scan the transaction log file
  • roll back uncompleted transactions
  • apply completed transactions
• Databases will be consistent
  • but this depends on the nature of the crash
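The restart sequence above (roll back what never committed, reapply what did) can be illustrated with a toy model. This is a minimal Python sketch, not Ingres's recovery code: the `LogRecord` layout and the `recover` function are hypothetical, the "disk" is a dict holding whatever happened to reach disk before the crash, and strict locking is assumed so that no two unfinished transactions touched the same key.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogRecord:              # hypothetical, simplified log record
    txn: int
    action: str               # "update" or "commit" in this toy model
    key: str = ""
    before: object = None     # None means "key did not exist before"
    after: object = None

def recover(disk: dict, log: list) -> dict:
    """Toy crash recovery: reapply committed work, roll back uncommitted work."""
    db = dict(disk)  # start from whatever reached disk before the crash
    committed = {r.txn for r in log if r.action == "commit"}
    # Forward pass: reapply every update made by a committed transaction.
    for r in log:
        if r.action == "update" and r.txn in committed:
            db[r.key] = r.after
    # Backward pass: undo updates of transactions that never committed.
    for r in reversed(log):
        if r.action == "update" and r.txn not in committed:
            if r.before is None:
                db.pop(r.key, None)   # the update created the key; remove it
            else:
                db[r.key] = r.before
    return db
```

Reapplying committed work in log order and undoing uncommitted work in reverse order is the classic redo/undo pattern; a real DBMS applies it to pages and log records rather than dictionary keys.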
7
Database Corruption
• Databases can be recovered
• Only if a valid Ingres backup is available!
• ckpdb command to back up
• rollforwarddb to recover
8
Backup Mechanisms
• OS backup
  • invalid unless done with Ingres shut down cleanly
  • important for backing up Ingres installation, journals, checkpoints, dumps
  • useless for backing up databases unless you can guarantee a clean shutdown
• unloaddb
  • an archiving or porting tool, not a backup tool
  • no way to ensure a consistent snapshot without locking out all users (an "offline" archive)
9
Backup Mechanisms
• In order to get the most out of a backup mechanism, two things are needed:
  • a way to take a static snapshot of the database without interfering too greatly with active users
  • a way to record incremental changes since that static snapshot
• Ingres does both via checkpoints and journals
  • a checkpoint is the static backup or snapshot
  • the journals are the ongoing change records
10
Backup Mechanisms
• Terminology note! Ingres differs from other DBMSs in its use of the word "checkpoint"
• Ingres:
  • a checkpoint is a backup snapshot
  • a consistency point (CP) is a buffer and log flush
• Other DBMSs:
  • a checkpoint means a buffer flush
  • a backup is just called a backup
11
Database Checkpoints
• Back up the whole database
• Online or Offline
• Enable / Disable journaling
• Can be performed in parallel
• Written to
  • Tape
  • Disk
• Don’t forget iidbdb!!
12
Online versus Offline
• Offline
  • Requires exclusive access to database
• Online
  • Users carry on working
  • No DDL statements
  • Slower than offline
  • Can cause transaction log file to fill
13
Online Checkpointing
• An online checkpoint (the ckpdb command) has three phases:
  • quiescing the database
  • file copying with change logging
  • completion recording
16
Online Checkpointing
• File copying is controlled by the checkpoint template (cktmpl.def)
  • can be modified by the Ingres administrator
  • change copy command, add file compression, etc.
  • amazing things are possible
• DML allowed during file copying
  • but not DDL: no file creation/deletion
• Changes during file copying are specially logged
  • before-images sent to dump files
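Why the before-images matter can be shown with a toy model. In this hypothetical Python sketch the database is a dict of pages copied one at a time while updates keep arriving, and the first before-image of any page changed during the copy goes into a "dump". Restoring the fuzzy copy and then applying the dump gives the database exactly as it was when copying began, which is the consistent point from which journals are replayed. The names and structure are assumptions for illustration only and say nothing about Ingres file formats.

```python
def online_checkpoint(pages: dict, updates: dict):
    """Copy pages one at a time while concurrent updates arrive.

    `updates` maps a copy step to a list of (page, new_value) changes
    applied just before that step's page is copied. Only existing pages
    change: no DDL, so no pages are created during the copy.
    Returns (checkpoint_copy, dump_before_images).
    """
    live = dict(pages)
    checkpoint, dump = {}, {}
    for step, page in enumerate(sorted(live)):
        for changed_page, new_value in updates.get(step, []):
            # Keep only the FIRST before-image of each page: the value
            # it had when the checkpoint started.
            dump.setdefault(changed_page, live[changed_page])
            live[changed_page] = new_value
        checkpoint[page] = live[page]   # may already include newer changes
    return checkpoint, dump

def restore(checkpoint: dict, dump: dict) -> dict:
    """Restore the fuzzy copy, then apply before-images to make it consistent."""
    db = dict(checkpoint)
    db.update(dump)   # roll changed pages back to their checkpoint-start value
    return db
```

The changes that arrived during the copy are not lost in a real installation: they go to the fresh set of journal files started at the end of the quiescent phase and are reapplied during rollforward.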
17
Checkpointing
• After copying is complete, the checkpoint success or failure is recorded in the database config file
  • aaaaaaaa.cnf
  • another copy left in cnnnnnnn.dmp in dump location
  • note that the checkpoint itself does not contain a record of the checkpoint completion
• Config file records last N checkpoint attempts
  • successful or not
  • N = 99 for recent releases of Ingres
  • N = 16 for older versions (2.0 and older)
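A bounded history like this behaves as a fixed-size queue. The sketch below is a hypothetical Python model, not the .cnf format: once the limit is reached, recording a new attempt silently drops the oldest one, so a checkpoint that has fallen out of the history is no longer recorded in the config file.

```python
from collections import deque
from typing import Optional

HISTORY_LIMIT = 99  # 16 for Ingres 2.0 and older, per the slide

class CheckpointHistory:
    """Hypothetical model of the checkpoint history kept in the config file."""

    def __init__(self, limit: int = HISTORY_LIMIT):
        self._entries = deque(maxlen=limit)  # oldest entry drops off automatically
        self._next_seq = 1

    def record(self, succeeded: bool) -> int:
        """Record one checkpoint attempt and return its sequence number."""
        seq = self._next_seq
        self._entries.append((seq, succeeded))
        self._next_seq += 1
        return seq

    def last_valid(self) -> Optional[int]:
        """Sequence number of the newest successful checkpoint, or None."""
        for seq, ok in reversed(self._entries):
            if ok:
                return seq
        return None

    def known(self) -> list:
        """Sequence numbers still present in the history, oldest first."""
        return [seq for seq, _ in self._entries]
```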
18
Online Checkpointing
• When it's all over, you have
  • one or more checkpoint files (one for each data location)
    – in disk checkpoint area, or on tape
  • zero or more dump files containing changes made while file-copying
  • an updated database config file
    – plus an updated copy in the dump location
  • a new set of journal files
    – a fresh journal file is started at the end of the database quiescent phase
19
Checkpointing
• What to save after the checkpoint completes:
  • the checkpoint and dump locations
    – you need both
  • infodb output (human-readable listing of the database config file)
  • output of: select * from iifile_info
    – for manual table-level recovery and emergencies
    – optional but recommended
20
Journals
• Audit trail of all changes made to selected tables
  • written in batches by the archiver (dmfacp)
• Default for tables is journaling ON
  • journaling also needs to be enabled for the database using ckpdb +j
  • this is an offline checkpoint; no users allowed
• Journal files grow to a target size, then a new one is started
  • current expected size and sequence number are stored in the database config file
  • each checkpoint starts a fresh set of journal files
21
Database Checkpoint - Examples
• Command line
  • Online checkpoint
    ckpdb dbname
  • Offline checkpoint – enabling journaling
    ckpdb +j dbname '#m3'
  • Offline checkpoint – disabling journaling
    ckpdb -j dbname
23
Recovery
• Recovery is a two-step process
  • one command (rollforwarddb) with two distinct phases
• First, restore the database to a point in time (a checkpoint)
• Second, replay journals
  • optional
  • all journals, or stop at a given time
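The two phases can be modelled in a few lines. In this Python sketch the checkpoint is a dict and the journal is a hypothetical list of (commit_time, key, new_value) entries in commit order; `rollforward` and its `stop_at` parameter are illustrative names, not rollforwarddb options.

```python
from datetime import datetime
from typing import Optional

def rollforward(checkpoint: dict, journal: list,
                stop_at: Optional[datetime] = None) -> dict:
    """Toy two-phase recovery (hypothetical model, not rollforwarddb itself).

    Phase 1: start from the checkpoint snapshot.
    Phase 2: replay journal entries (commit_time, key, new_value) in commit
    order, skipping everything committed after stop_at if it is given.
    """
    db = dict(checkpoint)                  # phase 1: restore the snapshot
    for when, key, value in journal:       # entries are in commit order
        if stop_at is not None and when > stop_at:
            break                          # later entries are not replayed
        db[key] = value                    # phase 2: reapply the change
    return db
```

Passing an empty journal corresponds to restoring the checkpoint alone; passing `stop_at` corresponds to stopping the replay at a given time.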
27
Recovery
• The database must exist before it can be recovered
• All required data locations must exist
• A valid config file must be available
  • recovery looks in the data location first, then the dump location
  • config file is renamed to aaaaaaaa.rfc
• The last checkpoint must be valid
  • can ask for an earlier checkpoint with #cn option
28
When Recovery Is Needed
• Stay calm!
  • you have practiced recovery, right?
  • haste makes mistakes
  • turn off the mobile phone, pager, etc.
  • the database will be ready when it's ready
• Save your current database config
  • ideally, make a copy of the dump location and of the data location's aaaaaaaa.cnf
  • as a minimum save aaaaaaaa.cnf
  • allows you to try again if something goes wrong
  • if you have time, save everything in sight
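Saving the config is easy to script in advance, so that it still happens under pressure. A minimal Python sketch, assuming you already know this database's directories in its data and dump locations (the paths and the `save_recovery_state` name are hypothetical): it copies aaaaaaaa.cnf from the data location and, where present, the whole dump location into a new timestamped directory.

```python
import shutil
from datetime import datetime
from pathlib import Path

def save_recovery_state(data_loc: Path, dump_loc: Path, safe_root: Path) -> Path:
    """Copy the config file (and, ideally, the dump location) aside before rollforwarddb.

    data_loc and dump_loc are this database's directories in its data and
    dump locations (hypothetical paths; check them before you start).
    """
    target = safe_root / datetime.now().strftime("pre-rollforward-%Y%m%d-%H%M%S")
    target.mkdir(parents=True)            # fails rather than overwrite an earlier save
    # As a minimum: the primary config file from the data location.
    shutil.copy2(data_loc / "aaaaaaaa.cnf", target / "aaaaaaaa.cnf")
    # Ideally: the whole dump location too (config copy plus dump files).
    if dump_loc.is_dir():
        shutil.copytree(dump_loc, target / "dump")
    return target
```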
29
Database Recovery
• Point-in-time recovery
  • Last checkpoint only
  • Last checkpoint + 10 hours' work
  • 5 checkpoints ago
• Based on available files
30
Database Recovery - Examples
• Command Line
  • Last checkpoint only, no journals
    rollforwarddb +c -j dbname
  • Last checkpoint, journals to 12:32 on 10 May 2002
    rollforwarddb +c +j dbname -e10-may-2002:12:32:00
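When scripting recovery, building the -e argument programmatically avoids typing errors under pressure. This Python sketch reproduces the dd-mon-yyyy:hh:mm:ss form shown in the example above; it assumes zero-padded fields are accepted, and the helper names are hypothetical. Returning an argument list (for use with `subprocess.run`) also sidesteps shell quoting problems.

```python
from datetime import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def end_time_arg(when: datetime) -> str:
    """Build the -e argument in the dd-mon-yyyy:hh:mm:ss form used above."""
    # Month names are spelled out by hand so the result ignores the OS locale.
    return "-e{:02d}-{}-{:04d}:{:02d}:{:02d}:{:02d}".format(
        when.day, MONTHS[when.month - 1], when.year,
        when.hour, when.minute, when.second)

def rollforward_command(dbname: str, stop_at: datetime) -> list:
    """Argument list for 'last checkpoint plus journals up to stop_at'."""
    return ["rollforwarddb", "+c", "+j", dbname, end_time_arg(stop_at)]
```

For example, `end_time_arg(datetime(2002, 5, 10, 12, 32))` gives `-e10-may-2002:12:32:00`, matching the command line above.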
33
Recovery Scenarios
• Data area is lost
  • shut down Ingres if it's not down
  • restore data directories with db config file
  • restart Ingres
    – transaction log contents can be moved to journals only if a valid config file is available!
  • rollforwarddb
    – up-to-the-minute recovery should be possible
34
Recovery Scenarios
• Transaction log is lost
  • wasn't it mirrored?
  • recreate transaction log
  • rollforwarddb
  • most recent transactions not moved to journals will be lost
35
Recovery Scenarios
• Checkpoint or dump location is lost
  • recreate location directories
  • take fresh checkpoint
  • loss of checkpoint area should not affect running database
36
Recovery Scenarios
• Journal location is lost
  • installation will continue to run until transaction log fills up
  • recreate journal directory
  • alterdb -disable_journaling to halt journaling
  • restart archiver which will have stopped due to inability to write journals
  • ckpdb +j to restart journaling
37
Recovery Scenarios
• Software or human error is discovered
• If the mistake is discovered immediately:
  • crash/restart Ingres, or remove all user sessions
  • rollforwarddb with -e option to replay journals, stopping short of the time of the mistake
• If the mistake isn't discovered until later, recovery is more complicated
  • Ingres Journal Analyzer (IJA) can help
38
Accidental Transaction
• AuditDB
  • Filter against
    – Table
    – Users
    – Time
  • Scan Journal files
  • Generate SQL
  • Execute
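The filter-then-generate idea can be sketched with a toy model. The `JournalRecord` layout below is hypothetical and is not auditdb's output format; the sketch selects records by table, user and time window, then writes SQL that undoes inserts and deletes, newest first. Updates and NULL values are not handled, and values are limited to strings and numbers.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class JournalRecord:          # hypothetical, simplified record layout
    when: datetime
    user: str
    table: str
    op: str                   # "insert" or "delete" in this toy model
    row: tuple                # ((column, value), ...)

def sql_literal(value) -> str:
    """Render a string or number as an SQL literal (no NULLs, dates, etc.)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"   # double embedded quotes
    return str(value)

def select_records(records, table=None, user=None, start=None, end=None):
    """Keep records matching the given table, user and time window, in order."""
    return [r for r in records
            if (table is None or r.table == table)
            and (user is None or r.user == user)
            and (start is None or r.when >= start)
            and (end is None or r.when <= end)]

def compensating_sql(records) -> list:
    """SQL that undoes the selected changes; records must be in time order."""
    statements = []
    for r in reversed(records):    # undo the newest change first
        cols = [col for col, _ in r.row]
        vals = [sql_literal(v) for _, v in r.row]
        if r.op == "insert":       # undo an insert by deleting that row
            cond = " AND ".join(f"{c} = {v}" for c, v in zip(cols, vals))
            statements.append(f"DELETE FROM {r.table} WHERE {cond};")
        elif r.op == "delete":     # undo a delete by inserting the row again
            statements.append(f"INSERT INTO {r.table} ({', '.join(cols)}) "
                              f"VALUES ({', '.join(vals)});")
    return statements
```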
39
Accidental Transaction
• Ingres Journal Analyzer
  • AuditDB with knobs on…
  • Connect to remote servers
  • Force log flush
  • Point and click
43
Recovery Scenarios
• Disaster
• Use OS backups to restore the Ingres system directories and all data, work, checkpoint, dump and journal directories
• rollforwarddb iidbdb
  • you have been checkpointing iidbdb, right?
  • restores users, locations, database privileges, etc.
• rollforwarddb databases
44
Recovery Scenarios
• rollforwarddb failure
  • restore the config or dump info you saved before attempting rollforwarddb
  • rename aaaaaaaa.rfc back to aaaaaaaa.cnf if it exists
  • cure any other rollforwarddb complaints
  • try again
• Last checkpoint didn't work
  • use rollforwarddb #cn to restore an older one
  • you do have more than one checkpoint around, right?
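The config-file steps before a retry can be scripted. This Python sketch follows one reading of the slide, assuming you saved a copy of aaaaaaaa.cnf before the first attempt: it restores that copy if one is given, and otherwise renames aaaaaaaa.rfc back to aaaaaaaa.cnf. The function name and paths are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional

def reset_config_for_retry(data_loc: Path, saved_copy: Optional[Path] = None) -> str:
    """Put aaaaaaaa.cnf back in the data location before retrying rollforwarddb."""
    cnf = data_loc / "aaaaaaaa.cnf"
    rfc = data_loc / "aaaaaaaa.rfc"
    if saved_copy is not None and saved_copy.is_file():
        shutil.copy2(saved_copy, cnf)     # the copy taken before the first attempt
        return "restored saved copy"
    if rfc.is_file():
        rfc.replace(cnf)                  # rename, overwriting any aaaaaaaa.cnf present
        return "renamed .rfc to .cnf"
    return "nothing to do"
```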
45
Lost Table
• Table can be recovered
• From table checkpoint only
• Enforce logical consistency
• Journaling must be enabled
46
Table Checkpoints - Examples
• Command line
  • Checkpoint table t1
    ckpdb dbname -table=t1
  • Checkpoint tables t1 and t2
    ckpdb dbname -table=t1,t2
47
Table Recovery - Examples
• From table checkpoint only
• Command line
  • Recover table t1
    rollforwarddb dbname -table=t1
  • Recover tables t1 and t2
    rollforwarddb dbname -table=t1,t2
49
Infodb / aaaaaaaa.cnf
• Shows meta-data about database
  • Locations
  • Checkpoint sequence
    – Valid / Invalid
  • Dump / Journal sequence
  • Counters
    – Last table id
    – Last valid checkpoint
50
Infodb / aaaaaaaa.cnf
• Info stored in aaaaaaaa.cnf
• Three copies
  • Primary database location
  • Dump location as aaaaaaaa.cnf
  • Dump location as cxxxx.dmp
• Infodb reads CNF file in database area
• Copied to dump area with every change
  • II_DUMP, or the database's own dump area
51
Checkpoint files
• Stored in one location
  • II_CHECKPOINT, or the database-defined checkpoint area
• One file for each location
• Format depends on archiver used
52
Dump files
• Changes during ONLINE checkpoint
• Required for recovery
• Single location
  • II_DUMP, or the database-defined dump area
53
Journal Files
• Record of changes
  • Table configuration
• Facilitates point-in-time recovery
• Files stored in a single location
  • II_JOURNAL, or the database-defined journal area
54
Backing up the backup files
• OFFLINE Checkpoint
  • Database aaaaaaaa.cnf
  • Dump aaaaaaaa.cnf
  • Output from infodb
  • Checkpoint
  • Journals
• ONLINE Checkpoint
  • All above
  • Dump files
55
Cleaning up
• ckpdb -d
  • Deletes all but the latest checkpoint
  • Dump, journal files deleted as well
• alterdb -delete_oldest_ckp
  • Oldest checkpoint only
  • Maintain set of checkpoints
  • Dump, journal files deleted as well
56
Customisation
• cktmpl.def
  • $II_SYSTEM/ingres/files
  • Defines actions
    – Before / During / After
    – Tape
    – Disk
• II_CKTMPL_FILE
  • ingsetenv only
• Most common entries to change:
  • WSDD: work phase of regular checkpoint
  • WRDD: work phase of regular rollforward
• Some things you can do:
  • add compression/decompression
  • use a different utility (e.g. star instead of tar)
  • wild and crazy stuff
• Test both checkpoint and restore after modifying the template
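The round trip that a modified template must survive can also be tested independently of Ingres. Rather than guess at cktmpl.def syntax, this Python sketch archives a directory with gzip compression, restores it elsewhere and compares every file byte for byte. The directory names are hypothetical, and a real test should still run ckpdb and rollforwarddb through the modified template.

```python
import filecmp
import tarfile
from pathlib import Path

def same_tree(a: Path, b: Path) -> bool:
    """True if both directory trees hold the same names and identical bytes."""
    cmp = filecmp.dircmp(a, b)
    if cmp.left_only or cmp.right_only or cmp.common_funny or cmp.funny_files:
        return False
    # shallow=False compares file contents, not just size and mtime
    _, mismatch, errors = filecmp.cmpfiles(a, b, cmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    return all(same_tree(a / d, b / d) for d in cmp.common_dirs)

def archive_roundtrip_ok(src: Path, work: Path) -> bool:
    """Archive src as .tar.gz in work, extract it again, and compare the trees.

    work must be outside src, otherwise the archive would include itself.
    """
    archive = work / "roundtrip.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for child in sorted(src.iterdir()):
            tar.add(child, arcname=child.name)   # recurses into directories
    restored = work / "restored"
    restored.mkdir()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(restored)
    return same_tree(src, restored)
```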
57
Issues To Consider
• Files
  • Ingres supports large files
  • OS archiver utility may not
• POSIX standard
  • tar
  • cpio
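A quick pre-flight check can flag files too large for the archive format. In the POSIX ustar format the size field holds 11 octal digits, so a single member is limited to 8^11 - 1 bytes (just under 8 GiB); pax and GNU extensions lift this limit, but only if both the archiving and the restoring tools support them. A minimal Python sketch; which directories you scan is up to you.

```python
import os
from pathlib import Path

# A ustar header stores the member size in 11 octal digits,
# so one member can hold at most 8**11 - 1 bytes (just under 8 GiB).
USTAR_MAX_BYTES = 8**11 - 1

def too_big_for_ustar(root: Path, limit: int = USTAR_MAX_BYTES) -> list:
    """List files under root whose size exceeds what a ustar member can hold."""
    oversized = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_size > limit:
                oversized.append(path)
    return sorted(oversized)
```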
58
Tips and Cautions
• Hardware "solutions" aren't solutions• "I don’t need to backup, I have magic solution of the
moment"• RAID 5, mirroring, whatever• you aren't protected against software failures• you aren't protected against human failures• you aren't protected against disasters• you may not be protected against multiple hardware
failures• you are putting all your eggs in one basket
59
Tips and Cautions
• Backups are no good if they don't work
  • make sure that ckpdb works
  • automatic verification is better than manual verification
    – not ensuring that checkpoints are working may be the #1 cause of recovery failure
• Automate as much as possible
  • error checking
  • disk space checking
  • old-checkpoint deletion
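A minimal Python sketch of an automated checkpoint with error and disk-space checking. It assumes ckpdb reports failure through a non-zero exit status (confirm this on your platform, and consider also scanning its output for error messages); the free-space threshold, the `command` parameter and the function name are hypothetical. It only catches failed runs: it does not prove that the checkpoint can be restored, which still needs regular recovery practice.

```python
import shutil
import subprocess
from pathlib import Path

def run_checkpoint(dbname: str, ckp_dir: Path, min_free_bytes: int,
                   command: tuple = ("ckpdb",)) -> None:
    """Run an online checkpoint with basic pre- and post-checks.

    Raises RuntimeError so the calling scheduler can alert someone.
    Assumes a failed ckpdb exits with a non-zero status.
    """
    # Pre-check: refuse to start if the checkpoint area is short of space.
    free = shutil.disk_usage(ckp_dir).free
    if free < min_free_bytes:
        raise RuntimeError(f"only {free} bytes free in {ckp_dir}; checkpoint not started")
    # Run "ckpdb dbname" (online checkpoint) and capture its output for the alert.
    result = subprocess.run([*command, dbname], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"checkpoint of {dbname} failed (exit {result.returncode}):\n"
                           f"{result.stdout}{result.stderr}")
```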
60
Tips and Cautions
• A choice of checkpoints is better than just one
  • avoid ckpdb -d (delete all prior checkpoints)
  • alterdb -delete_oldest_ckp is better
  • manual (or scripted) deletion of old checkpoints is often best
    – maintains checkpoint history in the config file
• Keep as many checkpoints as you can
  • gives you more recovery options
  • don't skimp on checkpoint disk space (disks are cheap!)
  • you can delete checkpoints but keep journals
  • it's all on OS backups, right??
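Scripted deletion needs a rule for which checkpoints to drop. This Python sketch (hypothetical function name) only selects the oldest checkpoint sequence numbers beyond the newest `keep`; the deletion itself is deliberately left to your own procedure, so that journals can be kept if you want them.

```python
def checkpoints_to_delete(checkpoints: list, keep: int) -> list:
    """Return the oldest checkpoint sequence numbers beyond the newest `keep`.

    Only checkpoints are selected; whether journals go too is a separate decision.
    """
    if keep < 1:
        raise ValueError("always keep at least one checkpoint")
    ordered = sorted(checkpoints)
    return ordered[:-keep] if len(ordered) > keep else []
```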
61
Tips and Cautions
• Be wary of checkpointing to tape
  • nasty, unreliable devices they are
  • "oops, there wasn't a tape in the drive"
  • if you must use tape, verify your backups regularly
    – tape drives have been known to write unreadable tapes
• Keep checkpoint and dump locations together
  • on the same file system or drive
  • keep them on the same OS backup schedule
  • checkpoints are worthless without the dump info
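Verifying a backup can be automated with checksums. In this Python sketch (function names hypothetical) a manifest of SHA-256 digests is built for the original checkpoint and dump directories and compared with a copy read back from tape or any other backup medium; any missing or altered file is reported.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read 1 MiB at a time to handle large checkpoints."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

def manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify_copy(original: Path, copy: Path) -> list:
    """Return the relative paths that are missing from, or differ in, the copy."""
    want, have = manifest(original), manifest(copy)
    return sorted(name for name, digest in want.items() if have.get(name) != digest)
```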
62
Tips and Cautions
• Practice is essential
  • not just once, but regularly
  • practice on a look-alike installation if production is not available
  • practice on production at least occasionally
    – clean Ingres shutdown
    – OS backup everything in sight
    – verify the OS backup, then run your recovery tests
  • you need hardware resources to support your recovery practice
63
Tips and Cautions
• Document your recovery procedures
  • let someone else do a trial recovery
  • keep the procedures up to date
  • make sure that more than one person knows how to do a recovery
  • make sure that more than one person knows where to find the documentation
    – keep a copy offsite or in a safe place
64
Tips and Cautions
• Backing up and archiving are different
  • a backup has a short useful lifetime
  • an archive (unload) is good indefinitely
• Backup planning and disaster recovery planning are different
  • recoverable backups are just one aspect of a complete disaster recovery plan
65
More Information
• Ingres DBA guide
  • Chapter 15 (2.6)
• Ingres Command Reference Guide
• Compressed Checkpoints
  • Servicedesk Doc ID 409751
66
Summary
• Backups deserve more than lip service
• Ensuring 100% recoverable backups takes time, effort, and money
• Ingres checkpoint and rollforward capabilities are simple yet powerful and customisable
• With proper practice and procedures, a recovery is nothing to be afraid of