Post on 26-Dec-2015
Principles of Incident Response and
Disaster Recovery
Chapter 6: Contingency Strategies for Business Resumption Planning
Objectives
• Know and understand the relationships between the overall use of contingency planning and the subordinate elements of incident response, business resumption, disaster recovery, and business continuity planning
• Become familiar with the techniques used for data and application backup and recovery
• Know the strategies employed for resumption of critical business processes at alternate and recovered sites
Introduction
• Contingency planning addresses everything done by an organization to prepare for the unexpected
• IR process focuses on detecting, evaluating, and reacting to an incident
• Later phases focus on keeping the business functioning even if the physical plant is destroyed or unavailable
• Business resumption (BR) plan: takes over when the IR process cannot contain and resolve an incident
Introduction (continued)
• Business resumption (BR) plan major elements:
– Disaster recovery (DR) plan: lists and describes the efforts to resume normal operations at the primary places of business
– Business continuity (BC) plan: contains steps for implementing critical business functions using alternative mechanisms until normal operations can be resumed at the primary site or elsewhere
• Primary site: location(s) at which the organization executes its functions
• BC plan operates concurrently with the DR plan when damage is major or long-term
Introduction (continued)
• Each component of CP (IRP, DRP, and BCP) comes into play at specific times in the life of an event
• 5 key procedural mechanisms for restoring critical information and facilitating continuation of operations:
– Delayed protection
– Real-time protection
– Server recovery
– Application recovery
– Site recovery
Data and Application Resumption
• Backup methods must be used according to an established policy:
– How often to back up
– How long to retain the backups
– What must be backed up
• Data files and critical system files should be backed up daily, with one copy on-site and one copy off-site
• Nonessential files should be backed up weekly
• Full backups: keep at least one copy in a secure location off-site
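The three policy questions above (how often, how long, and what) can be written down as data. The following is a minimal sketch in Python, not a definitive implementation: the names `BackupPolicy`, `is_due`, and `is_expired`, the example paths, and the retention values are all assumptions made for illustration and do not come from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy record: what to back up, how often, how long to keep it."""
    name: str
    paths: tuple          # what must be backed up (example paths are assumptions)
    interval_days: int    # how often to back up
    retention_days: int   # how long to retain the backups (illustrative value)

    def is_due(self, last_backup: date, today: date) -> bool:
        # A backup is due once the interval has elapsed since the last one.
        return today - last_backup >= timedelta(days=self.interval_days)

    def is_expired(self, backup_date: date, today: date) -> bool:
        # A backup may be retired once it is older than the retention period.
        return today - backup_date > timedelta(days=self.retention_days)


# Daily for data and critical system files, weekly for nonessential files.
critical = BackupPolicy("critical", ("/data", "/etc"), interval_days=1, retention_days=30)
nonessential = BackupPolicy("nonessential", ("/home/shared",), interval_days=7, retention_days=90)
```

Keeping the policy as a frozen record makes it easy to review and version alongside the written plan.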
Disk-to-Disk-to-Tape: Delayed Protection
• Decreasing costs of storage media, especially hard drives and removable drives, make disk-based backup a practical alternative to time-consuming tape backup
• Storage area networks provide online backups
• Because there is no redundancy if both the online and backup versions fail or are attacked, tape backup is still required periodically
• Disk-to-disk initial copies are efficient and can run simultaneously with other processes
• Secondary disk-to-tape copies do not affect production processing
Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Types of backups:
– Full backup
– Differential backup
– Incremental backup
• Full backup:
– Includes entire system, including applications, OS components, and data
– Pro: provides a comprehensive snapshot
– Con: requires large media; time consuming
Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Differential backup:
– Includes all files that have changed or been added since the last full backup
– Pro: faster and uses less storage space than a full backup; only one differential is needed, together with the last full backup, to restore
– Con: gets larger each day and takes longer; a corrupt differential loses every change since the full backup
• Incremental backup:
– Includes only files that were modified that day
– Pro: requires less space and time than the differential
– Con: multiple incremental backups are required to restore from the last full backup
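The restore trade-off between differential and incremental backups can be sketched as a small function that lists which backups a restore needs. This is an illustrative sketch only: the name `restore_chain` and the `(label, kind)` tuple format are assumptions, and the function assumes a schedule that uses either differentials or incrementals after a full backup, not a mix of both.

```python
def restore_chain(history):
    """Return the backups needed to restore to the most recent backup.

    `history` is a chronological list of (label, kind) tuples, where kind is
    "full", "differential", or "incremental". Raises ValueError if the
    history contains no full backup.
    """
    # Start from the most recent full backup; anything older is not needed.
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    chain = [history[last_full]]
    later = history[last_full + 1:]
    differentials = [b for b in later if b[1] == "differential"]
    incrementals = [b for b in later if b[1] == "incremental"]
    if differentials:
        # Only the latest differential matters: it holds every change since the full.
        chain.append(differentials[-1])
    # Every incremental since the full backup must be applied, in order.
    chain.extend(incrementals)
    return chain


differential_week = [("Fri", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]
incremental_week = [("Fri", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
```

With these sample weeks, the differential schedule restores from two backups (the full plus Wednesday's differential), while the incremental schedule needs all four.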
Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Fastest backup method: incremental backups
• Fastest recovery time: differential backups
• All on-site and off-site storage must be secured and must have a controlled environment (temperature and humidity)
• Media should be clearly labeled and write-protected
• Tape media types:
– Digital audio tape (DAT)
– Quarter-inch cartridge (QIC)
– 8 mm tape
– Digital linear tape (DLT)
Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Typical backup scheduling:
– Daily: on-site incremental or differential backup
– Weekly: off-site full backup
• Tape media should be retired and replaced periodically
• Popular strategies for selecting the files to back up:
– Six-tape rotation
– Grandfather-Father-Son
– Towers of Hanoi
Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Six-tape rotation:
– Uses a rotation of six sets of media
– Five media sets per week are used with one extra labeled Friday2
– Friday full backup is taken off-site
– Friday1 and Friday2 are rotated off-site every week
– Provides roughly 2 weeks of recovery capability
– Variation: keep a copy of each off-site Friday tape on-site for faster recovery
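The six-tape rotation can be sketched as a function that maps a calendar date to a media set. This is a minimal illustration under stated assumptions: the function name `six_tape_set` is hypothetical, backups are assumed to run Monday through Friday only, and which calendar week counts as a "Friday1" week is an arbitrary choice.

```python
from datetime import date

WEEKDAY_SETS = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def six_tape_set(day: date):
    """Return the media set label for `day`, or None on weekends."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday > 4:
        return None  # assumption: no backups on Saturday or Sunday
    if weekday < 4:
        return WEEKDAY_SETS[weekday]  # on-site incremental or differential
    # Friday full backup: alternate Friday1 and Friday2 week by week.
    # Ordinal day 1 is a Monday, so this counts Monday-based weeks.
    week_index = (day.toordinal() - 1) // 7
    return "Friday1" if week_index % 2 == 0 else "Friday2"
```

Because the two Friday sets alternate, the older Friday tape is always at least one week old when it is overwritten, which is where the roughly two weeks of recovery capability comes from.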
Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Grandfather-Father-Son (GFS):
– Uses five media sets per week
– Allows recovery for previous 3 weeks
– First week uses first set, second week uses second set, third week uses third set
– Following week starts with first set
– Every 2nd or 3rd month, a group of media sets is taken out of the cycle for permanent storage and replaced with a new set
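The three-week cycle described above can be sketched as follows. This is an illustrative sketch, not a definitive GFS implementation: the name `gfs_media` is hypothetical, backups are assumed to run Monday through Friday, and the periodic retirement of a group to permanent storage is not modeled.

```python
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def gfs_media(day: date, groups: int = 3):
    """Return (group_number, day_label) for `day`, or None on weekends."""
    weekday = day.weekday()  # Monday == 0
    if weekday > 4:
        return None  # assumption: no weekend backups
    # Count Monday-based weeks and cycle through the groups: 1, 2, 3, 1, ...
    week_index = (day.toordinal() - 1) // 7
    return (week_index % groups + 1, DAYS[weekday])
```

With three groups of five daily sets, the media written on any given weekday is not reused until three weeks later.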
Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Towers of Hanoi:
– More complex approach
– Based on statistical principles to optimize media wear
– 16-step strategy assumes that 5 media sets are used per week on a daily basis
– First media set is used more often and must be monitored for wear
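The slide does not spell out the 16 steps. One common formulation of the Towers of Hanoi rotation picks the media set from the number of trailing zero bits in the session number (the "ruler" sequence). The sketch below assumes that formulation; the function name `hanoi_set` and the set labels A through E are hypothetical.

```python
def hanoi_set(session: int, sets: int = 5) -> str:
    """Return the media set label for backup session `session` (numbered from 1)."""
    if session < 1:
        raise ValueError("sessions are numbered from 1")
    # Trailing zero bits: 0 for odd sessions, 1 for 2, 6, 10, ..., 2 for 4, 12, ...
    trailing_zeros = (session & -session).bit_length() - 1
    # The last set absorbs every higher level, which closes the cycle.
    index = min(trailing_zeros, sets - 1)
    return chr(ord("A") + index)


# One full cycle with five sets: A B A C A B A D A B A C A B A E
cycle = "".join(hanoi_set(s) for s in range(1, 17))
```

In this formulation, set A is used in 8 of every 16 sessions, which matches the note that the first media set wears fastest.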
Redundancy-Based Backup and Recovery Using RAID
• Redundant array of independent disks (RAID): uses online disk drives for redundancy
• RAID spreads out data across multiple units, and offers recovery from hard drive failure
• 9 established RAID configurations: RAID Levels 0 through 7, plus RAID 10
• RAID Level 0 (disk striping without parity):
– Not redundant
– Spreads data across several drives in segments called stripes
– Failure of one drive may make all data inaccessible
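Striping can be pictured as dealing blocks out to the drives in turn. The sketch below is a simplified model, not how a RAID controller is implemented; the function name `stripe` and the single-letter blocks are assumptions for illustration.

```python
def stripe(blocks, drives):
    """Distribute `blocks` round-robin across `drives` lists (RAID 0 style)."""
    layout = [[] for _ in range(drives)]
    for i, block in enumerate(blocks):
        layout[i % drives].append(block)
    return layout


# Six blocks of one file spread over three drives.
layout = stripe(list("ABCDEF"), 3)
```

Here each drive holds a third of the file, so losing any one drive leaves gaps throughout the data. That is why a single failure may make everything inaccessible.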
Redundancy-Based Backup and Recovery Using RAID (continued)
• RAID Level 1 (disk mirroring):
– Uses twin drives in a system
– All data written to one drive is written to the other simultaneously
– Is expensive and is an inefficient use of disk space
– Vulnerable to a disk controller failure
– Disk duplexing: mirroring with dual disk controllers
• RAID Level 2:
– Specialized form of disk striping with parity that is not widely used
– Uses the Hamming code for parity
– No commercial implementations exist
Redundancy-Based Backup and Recovery Using RAID (continued)
• RAID Levels 3 and 4:
– RAID 3 uses byte-level striping while RAID 4 uses block-level striping
– Parity information is stored on a separate drive and provides error recovery
• RAID Level 5:
– Balances safety and redundancy against costs
– Stripes data across multiple drives
– Parity is interleaved with data segments on all drives
– Hot-swappable: drives can be replaced without shutting down the system
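Parity-based recovery in RAID 3, 4, and 5 rests on XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block equals the XOR of the survivors and the parity. The sketch below shows this for one stripe only. The function names and sample blocks are assumptions for illustration, and the sketch does not model how RAID 5 rotates parity across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Reconstruct the one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"ABCD", b"EFGH", b"IJKL"]   # one stripe across three data drives
parity = xor_blocks(data)            # stored on a fourth drive
recovered = rebuild_missing([data[0], data[2]], parity)  # drive 2 has failed
```

After the simulated failure, `recovered` equals the lost block `b"EFGH"`. A second simultaneous failure would leave too little information, which is the limit of single-parity schemes.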
Redundancy-Based Backup and Recovery Using RAID (continued)
• RAID Level 6:
– Extends RAID 5 with a second parity block (dual parity), so the array survives two simultaneous drive failures
– Performs two different parity computations or the same computation on overlapping subsets of data
• RAID Level 7:
– Proprietary variation on RAID 5 in which the array works as a single virtual drive
– May be implemented via software running on RAID 5 hardware
• RAID Level 10:
– Combination of RAID 1 and RAID 0
Database Backups
• Databases require special considerations when planning backup and recovery procedures
– Are special utilities required to perform database backups?
– Can the database be backed up without interrupting its use?
– Are there additional journal files or database system files that are required in order to use backup tapes or disk images?
Application Backups
• Some applications use file systems and databases in unusual ways
• Members of the application development and support teams should be involved in the planning process
Backup and Recovery Plans
• Every backup and recovery setting should be accompanied by a complete recovery plan
• Plans need to be developed, tested, and rehearsed periodically
• Plans should include information about:
– How and when backups are created and verified
– Who is responsible for backup creation and verification
– Storage and retention of backup media
– Review cycle of the plan
– Rehearsal of the plan
Real-Time Protection, Server Recovery, and Application Recovery
• Entire servers can be mirrored to provide real-time protection and recovery in a strategy of hot, warm, and cold servers
– Hot server: the server in production
– Warm server: backup server that is running and may handle overflow work from hot server
– Cold server: offline, test server
• If hot server goes down, warm and cold servers are promoted while the hot server is being repaired
• Bare metal recovery: technologies designed to replace operating systems and services when they fail
Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Application recovery (or clustering plus replication):
– Applications are installed on multiple servers
– If one fails, the secondary systems take over the role
• Electronic vaulting:
– Bulk transfer of data in batches to an off-site facility
– Receiving server archives the data
– Can be more expensive than tape backup and slower than data mirroring
– Data must be encrypted for transfer over public infrastructure
Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Remote journaling (RJ):
– Transfer of live transactions to an off-site facility
– Only transactions are transferred in near real-time to a remote location
– Facilitates the recovery of key transactions in near real-time
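Recovery with remote journaling amounts to restoring the last backup and then replaying the journaled transactions on top of it. The sketch below illustrates the idea with account balances. It is a simplified model under stated assumptions: the function name `replay_journal`, the `(sequence_number, account, delta)` record format, and the sample data are all hypothetical.

```python
def replay_journal(snapshot, journal):
    """Apply journaled transactions to a copy of the last backup snapshot.

    snapshot: dict mapping account -> balance, as restored from backup.
    journal: iterable of (sequence_number, account, delta) tuples received
             at the remote site; sequence numbers are assumed unique.
    """
    state = dict(snapshot)  # leave the restored snapshot itself untouched
    # Replay in sequence order, since entries may arrive out of order.
    for _, account, delta in sorted(journal):
        state[account] = state.get(account, 0) + delta
    return state


snapshot = {"acct-1": 100, "acct-2": 50}
journal = [(2, "acct-2", -20), (1, "acct-1", 25), (3, "acct-3", 10)]
recovered = replay_journal(snapshot, journal)
```

Only the transactions cross the wire, so the journal stays small compared with the bulk transfers used in electronic vaulting.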
Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Database shadowing (or databank shadowing):
– Storage of duplicate online transaction data and duplication of databases at a remote site on a redundant server
– Both databases are updated, but only the primary responds to the user
– Combines electronic vaulting with remote journaling
– Used when immediate data recovery is a priority
– Also used for data warehousing, data mining, batch reporting, complex SQL queries, local access at the shadow site, and load balancing
Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Network-attached storage (NAS):
– Usually a single device or server attached to a network to provide online storage
– Not well suited for real-time applications due to latency
• Storage area networks (SANs):
– Online storage devices connected by Fibre Channel direct connections between the servers and the additional storage
Site Resumption Strategies
• If the primary business site is not available, alternative processing capability may be needed
• CPMT can choose from several strategies for business resumption planning
• Exclusive control options:
– Hot sites
– Warm sites
– Cold sites
• Shared-use options:
– Timeshare
– Service bureaus
– Mutual agreements
Exclusive Site Resumption Strategies
• Hot site:
– Fully configured computer facility
– Duplicates computing resources, peripherals, phone systems, applications, and workstations
– Can be 24/7 if desired
– Can be a mirrored site that is identical to the primary site
Exclusive Site Resumption Strategies (continued)
• Warm site:
– Provides some of the same services and options as a hot site
– May include computing equipment and peripherals but not workstations
– Has access to data backups or off-site storage
– Lower cost than a hot site, but takes more time to be fully functional
Exclusive Site Resumption Strategies (continued)
• Cold site:
– Provides only rudimentary services and facilities
– No computer hardware or software is provided
– Communications services must be installed when the site is occupied
– Often no quick recovery or data duplication functions on site
– Primary advantage is cost
Exclusive Site Resumption Strategies (continued)
• Other options:
– Rolling mobile site configured in the payload area of a tractor-trailer
– Rental storage area with duplicate or second-generation equipment
– Mobile temporary offices
Shared Site Resumption Strategies
• Timeshare:
– Leased site shared with other organizations
– Possibility that more than one organization might need the facility simultaneously
• Service bureaus:
– Service agency that provides physical facilities in the event of a disaster
– May provide off-site data storage
Shared Site Resumption Strategies (continued)
• Mutual agreement:
– Contract between two organizations to provide mutual assistance in the event of a disaster
– Each organization is obligated to provide facilities, resources, and services to the other
– Good for divisions of the same parent company, between business partners, or when both parties have similar capabilities and capacities
– A memorandum of agreement (MOA) should be drawn up with specific details
Service Agreements
• Service agreement:
– A contractual document guaranteeing certain minimum levels of service provided by a vendor
• Service agreement should specify:
– The parties in the agreement
– Services to be provided by the vendor
– Fees and payments for those services
– Statements of indemnification
– Nondisclosure agreements and intellectual property assurances
– Noncompetitive agreements
Summary
• Contingency planning includes everything done to prepare for the unexpected and recover from it
• BR plan includes the DR plan for resuming operations at the primary site and the BC plan for moving to an alternate site if needed
• 5 procedural mechanisms for restoration of critical data: delayed protection, real-time protection, server recovery, application recovery, and site recovery
• Backup plan is essential
• Retention period for backups must be specified
Summary (continued)
• 3 types of backups: full, differential, and incremental
• RAID systems provide online disk drives for redundancy
• Databases require special considerations for backup and recovery planning
• Mirroring and duplication of server data storage provide real-time protection
• Electronic vaulting, remote journaling, and database shadowing store data at remote locations
Summary (continued)
• Business resumption strategies include hot sites, warm sites, cold sites, timeshare, service bureaus, and mutual agreements
• Service agreements guarantee certain minimum levels of service by the vendor