Post on 26-Dec-2015

Principles of Incident Response and Disaster Recovery

Chapter 6: Contingency Strategies for Business Resumption Planning

Objectives

• Know and understand the relationships between the overall use of contingency planning and the subordinate elements of incident response, business resumption, disaster recovery, and business continuity planning

• Become familiar with the techniques used for data and application backup and recovery

• Know the strategies employed for resumption of critical business processes at alternate and recovered sites

Introduction

• Contingency planning addresses everything done by an organization to prepare for the unexpected

• IR process focuses on detecting, evaluating, and reacting to an incident

• Later phases focus on keeping the business functioning even if the physical plant is destroyed or unavailable

• Business resumption (BR) plan: takes over when the IR process cannot contain and resolve an incident

• Business resumption (BR) plan major elements:

– Disaster recovery (DR) plan: lists and describes the efforts to resume normal operations at the primary places of business

– Business continuity (BC) plan: contains steps for implementing critical business functions using alternative mechanisms until normal operations can be resumed at the primary site or elsewhere

• Primary site: location(s) at which the organization executes its functions

• BC plan operates concurrently with DR plan when damage is major or long-term

• Each component of CP (IRP, DRP, and BCP) comes into play at specific times in the life of an event

• Five key procedural mechanisms for restoring critical information and facilitating continuation of operations:
– Delayed protection
– Real-time protection
– Server recovery
– Application recovery
– Site recovery

Data and Application Resumption

• Backup methods must be used according to an established policy:
– How often to back up
– How long to retain the backups
– What must be backed up

• Data files and critical system files should be backed up daily, with one copy on-site and one copy off-site

• Nonessential files should be backed up weekly

• Full backups: keep at least one copy in a secure location off-site
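
The three policy questions above (how often, how long to retain, what to back up) can be captured in a small data structure. The following is a minimal sketch: the class name `BackupRule`, its fields, the 30- and 90-day retention periods, and the choice to keep nonessential files on-site only are illustrative assumptions, not values given in the chapter. Only the daily and weekly frequencies come from the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupRule:
    """One line of a backup policy (hypothetical structure)."""
    what: str           # category of files covered by the rule
    every_days: int     # how often to back up
    retain_days: int    # how long to keep each backup (assumed value)
    offsite_copy: bool  # whether a second copy is stored off-site


# Frequencies follow the slide; retention periods are placeholders.
POLICY = [
    BackupRule("data and critical system files", every_days=1,
               retain_days=30, offsite_copy=True),
    BackupRule("nonessential files", every_days=7,
               retain_days=90, offsite_copy=False),
]


def is_due(rule: BackupRule, last_backup: date, today: date) -> bool:
    """True when the rule's interval has elapsed since the last backup."""
    return today - last_backup >= timedelta(days=rule.every_days)


def is_expired(rule: BackupRule, backup_date: date, today: date) -> bool:
    """True when a backup is older than the rule's retention period."""
    return today - backup_date > timedelta(days=rule.retain_days)
```

Keeping the policy as data rather than scattering it through scripts makes it easier to review against the written policy document.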

Disk-to-Disk-to-Tape: Delayed Protection

• Decreasing costs of storage media, especially hard drives and removable drives, make disk an attractive alternative to time-consuming tape backup

• Storage area networks provide online backups

• Periodic tape backup is still required, because there is no redundancy if both the online and backup versions fail or are attacked

• Disk-to-disk initial copies are efficient and can run simultaneously with other processes

• Secondary disk-to-tape copies do not affect production processing

• Types of backups:
– Full backup
– Differential backup
– Incremental backup

• Full backup:
– Includes the entire system, including applications, OS components, and data
– Pro: provides a comprehensive snapshot
– Con: requires large media; time consuming

• Differential backup:
– Includes all files that have changed or been added since the last full backup
– Pro: faster and needs less storage space than a full backup; a restore needs only the last full backup plus the most recent differential
– Con: grows larger and slower each day; if that one differential file is corrupt, every change since the last full backup is lost

• Incremental backup:
– Includes only files that were modified that day
– Pro: requires less space and time than the differential
– Con: multiple incremental backups are required to restore from the last full backup
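
The difference between the three backup types comes down to which cutoff is used when selecting files, and which backups must be read back during a restore. The sketch below illustrates both. The names `Backup`, `files_to_copy`, and `restore_chain` are hypothetical, days are plain integers, and the restore logic assumes a schedule that uses either differentials or incrementals between full backups, not a mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on which the backup ran
    kind: str  # "full", "differential", or "incremental"


def files_to_copy(kind, mtimes, last_full_day, last_backup_day):
    """Return the files a backup of the given kind would include.

    mtimes maps file name -> day on which the file was last modified.
    """
    if kind == "full":
        return sorted(mtimes)
    if kind == "differential":
        cutoff = last_full_day    # everything changed since the last full
    elif kind == "incremental":
        cutoff = last_backup_day  # only changes since the previous backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, day in mtimes.items() if day > cutoff)


def restore_chain(history):
    """Return the backups needed for a restore, oldest first."""
    last_full = max(i for i, b in enumerate(history) if b.kind == "full")
    later = history[last_full + 1:]
    differentials = [b for b in later if b.kind == "differential"]
    if differentials:
        # One differential already holds every change since the full backup.
        return [history[last_full], differentials[-1]]
    # Otherwise every incremental since the full backup must be replayed.
    return [history[last_full]] + [b for b in later if b.kind == "incremental"]


mtimes = {"a.txt": 1, "b.txt": 3, "c.txt": 5}
differential_files = files_to_copy("differential", mtimes, 2, 4)
incremental_files = files_to_copy("incremental", mtimes, 2, 4)
```

With a full backup on day 2 and the previous backup on day 4, the differential picks up both `b.txt` and `c.txt`, while the incremental picks up only `c.txt`.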

• Fastest backup method: incremental backups

• Fastest recovery time: differential backups

• All on-site and off-site storage must be secured and must have a controlled environment (temperature and humidity)

• Media should be clearly labeled and write-protected

• Tape media types:
– Digital audio tape (DAT)
– Quarter-inch cartridge (QIC)
– 8 mm tape
– Digital linear tape (DLT)

• Typical backup scheduling:
– Daily: on-site incremental or differential backup
– Weekly: off-site full backup

• Tape media should be retired and replaced periodically

• Popular strategies for rotating backup media:
– Six-tape rotation
– Grandfather-Father-Son
– Towers of Hanoi

• Six-tape rotation:
– Uses a rotation of six sets of media
– Five media sets are used per week, with one extra set labeled Friday2
– Friday full backup is taken off-site
– Friday1 and Friday2 alternate off-site each week
– Provides roughly 2 weeks of recovery capability
– Variation: keep a copy of each off-site Friday tape on-site for faster recovery
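
The six-tape rotation above can be expressed as a lookup from calendar date to media label. This is a minimal sketch: the function name `six_tape_label`, the Monday-to-Thursday label names, and the `cycle_start` parameter (assumed to be a Monday) are illustrative assumptions. Only the Friday1 and Friday2 labels come from the slide.

```python
from datetime import date

WEEKDAY_TAPES = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def six_tape_label(day: date, cycle_start: date):
    """Return the media set to use on `day`, or None on weekends.

    cycle_start is assumed to be the Monday of the first week in the cycle.
    """
    weekday = day.weekday()  # Monday == 0, Sunday == 6
    if weekday > 4:
        return None  # no backup scheduled on Saturday or Sunday
    if weekday < 4:
        return WEEKDAY_TAPES[weekday]  # daily on-site backup
    # Friday: full backup, alternating between the two Friday sets.
    week_index = (day - cycle_start).days // 7
    return "Friday1" if week_index % 2 == 0 else "Friday2"


start = date(2024, 1, 1)  # a Monday
first_friday = six_tape_label(date(2024, 1, 5), start)
second_friday = six_tape_label(date(2024, 1, 12), start)
```

Because the two Friday sets alternate, the older full backup is always one week old when the newer one is written, which is where the roughly two weeks of recovery capability comes from.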

• Grandfather-Father-Son (GFS):
– Uses five media sets per week
– Allows recovery for previous 3 weeks
– First week uses first set, second week uses second set, third week uses third set
– Following week starts with first set
– Every 2nd or 3rd month, a group of media sets is taken out of the cycle for permanent storage and replaced with a new set
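
The three-week cycle described above is a simple modular schedule. The sketch below maps a backup day to a weekly set and a tape within that set. The function name `gfs_media` and its parameters are hypothetical, and the periodic retirement of media to permanent storage is not modeled.

```python
def gfs_media(day_index: int, days_per_week: int = 5, weekly_sets: int = 3):
    """Return (set_number, tape_number), both 1-based.

    day_index counts backup days from 0, skipping days with no backup.
    """
    week, day = divmod(day_index, days_per_week)
    # After the third week the rotation wraps back to the first set.
    return (week % weekly_sets) + 1, day + 1


first_day = gfs_media(0)
fourth_week_start = gfs_media(15)
```

Since the first set is not overwritten until the fourth week begins, the three sets together cover the previous three weeks.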

• Towers of Hanoi:
– More complex approach
– Based on statistical principles to optimize media wear
– 16-step strategy assumes that 5 media sets are used per week on a daily basis
– First media set is used more often and must be monitored for wear
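
One common way to generate a Towers of Hanoi rotation is the "ruler" sequence, in which the set for each step is chosen by the number of trailing zero bits in the step number. The sketch below uses that formulation with five sets labeled A to E. The labels and the function name `hanoi_media_set` are assumptions, and the chapter's own 16-step table may order the sets differently.

```python
def hanoi_media_set(step: int, num_sets: int = 5) -> str:
    """Return the media set label for a 1-based step in the rotation."""
    if step < 1:
        raise ValueError("step must be 1 or greater")
    # Count trailing zero bits: 0 for odd steps, 1 for 2, 6, 10, ...
    trailing_zeros = (step & -step).bit_length() - 1
    # The last set absorbs every step beyond the deepest level.
    index = min(trailing_zeros, num_sets - 1)
    return chr(ord("A") + index)


schedule = [hanoi_media_set(step) for step in range(1, 17)]
```

In this formulation set A is used on every other step, eight times in a 16-step cycle, which matches the slide's warning that the first media set wears fastest.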

Redundancy-Based Backup and Recovery Using RAID

• Redundant array of independent disks (RAID): uses online disk drives for redundancy

• RAID spreads out data across multiple units, and offers recovery from hard drive failure

• Nine established RAID configurations are covered here: RAID Levels 0 through 7 and Level 10

• RAID Level 0 (disk striping without parity):
– Not redundant
– Spreads data across several drives in segments called stripes
– Failure of one drive may make all data inaccessible
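
Striping without parity can be illustrated by dealing fixed-size segments out to drives in round-robin order. The following is a toy sketch in which each drive is a Python list; the function names and the stripe size are illustrative assumptions.

```python
def stripe(data: bytes, num_drives: int, stripe_size: int):
    """Split data into stripes and deal them out to drives in turn."""
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), stripe_size):
        drive = (offset // stripe_size) % num_drives
        drives[drive].append(data[offset:offset + stripe_size])
    return drives


def read_back(drives) -> bytes:
    """Reassemble the original data by reading one row of stripes at a time."""
    pieces = []
    for row in range(max(len(d) for d in drives)):
        for d in drives:
            if row < len(d):
                pieces.append(d[row])
    return b"".join(pieces)


drives = stripe(b"ABCDEFGHIJ", num_drives=3, stripe_size=2)
restored = read_back(drives)
```

Every drive holds a slice of the same data, so losing any one of them leaves gaps throughout it, which is why RAID 0 offers no redundancy.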

• RAID Level 1 (disk mirroring):
– Uses twin drives in a system
– All data written to one drive is written to the other simultaneously
– Is expensive and is an inefficient use of disk space
– Vulnerable to a disk controller failure
– Disk duplexing: mirroring with dual disk controllers

• RAID Level 2:
– Specialized form of disk striping with parity that is not widely used
– Uses the Hamming code for parity
– No commercial implementations

• RAID Levels 3 and 4:
– RAID 3 uses byte-level striping while RAID 4 uses block-level striping
– Parity information is stored on a separate drive and provides error recovery

• RAID Level 5:
– Balances safety and redundancy against costs
– Stripes data across multiple drives
– Parity is interleaved with data segments on all drives
– Hot-swappable: drives can be replaced without shutting down the system
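
The parity used by RAID Levels 3, 4, and 5 is typically a bytewise XOR of the data blocks in a stripe, which lets any single missing block be rebuilt from the survivors. The sketch below shows the arithmetic only; the block contents and the helper name `xor_blocks` are illustrative, and parity placement across drives is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_blocks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data_blocks)

# Simulate losing the second drive: XOR the survivors with the parity block.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

The rebuild works because XOR is its own inverse: combining the parity with every surviving block cancels them out and leaves the missing one.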

• RAID Level 6:
– Extends RAID 5 with a second parity block per stripe, so the array survives the loss of two drives
– Performs two different parity computations or the same computation on overlapping subsets of data

• RAID Level 7:
– Proprietary variation on RAID 5 in which the array works as a single virtual drive
– May be implemented via software running on RAID 5 hardware

• RAID Level 10:
– Combination of RAID 1 and RAID 0
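
The cost trade-off between RAID levels is easiest to see as usable capacity. The sketch below uses the standard capacity rules for equal-size drives; the function name `usable_capacity_tb` and the minimum drive counts it enforces are assumptions of this example, and Levels 2 and 7 are left out because they are rarely encountered or proprietary.

```python
def usable_capacity_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for an array of equal-size drives."""
    if level == 0:
        minimum, overhead = 2, 0           # striping only, nothing reserved
    elif level == 1:
        minimum, overhead = 2, drives - 1  # every drive holds the same data
    elif level in (3, 4, 5):
        minimum, overhead = 3, 1           # one drive's worth of parity
    elif level == 6:
        minimum, overhead = 4, 2           # two drives' worth of parity
    elif level == 10:
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        minimum, overhead = 4, drives // 2  # half the drives are mirrors
    else:
        raise ValueError(f"RAID level {level} is not modeled here")
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return (drives - overhead) * drive_tb


raid5_capacity = usable_capacity_tb(5, drives=4, drive_tb=2.0)
raid10_capacity = usable_capacity_tb(10, drives=4, drive_tb=2.0)
```

With four 2 TB drives, RAID 5 gives up one drive to parity while RAID 10 gives up two to mirroring.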

Database Backups

• Databases require special considerations when planning backup and recovery procedures:
– Are special utilities required to perform database backups?
– Can the database be backed up without interrupting its use?
– Are there additional journal files or database system files that are required in order to use backup tapes or disk images?
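
As one concrete answer to the first two questions, some database engines ship their own online backup utility. The sketch below uses SQLite as a stand-in (an assumption; the chapter names no particular database) and Python's standard `sqlite3.Connection.backup` method, which copies a database while the source connection stays usable. The table and row values are invented for the example.

```python
import sqlite3


def online_backup(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy a live database into a new in-memory database."""
    target = sqlite3.connect(":memory:")
    source.backup(target)  # engine-level copy; the source stays open
    return target


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
live.execute("INSERT INTO orders (item) VALUES ('widget')")
live.commit()

copy = online_backup(live)

# The live database keeps accepting work after the backup was taken.
live.execute("INSERT INTO orders (item) VALUES ('gadget')")
live.commit()

backup_rows = copy.execute("SELECT item FROM orders").fetchall()
live_rows = live.execute("SELECT item FROM orders").fetchall()
```

Copying the database file with ordinary file tools while it is in use risks capturing a half-written state, which is why the slide asks whether special utilities are needed.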

Application Backups

• Some applications use file systems and databases in unusual ways

• Members of the application development and support teams should be involved in the planning process

Backup and Recovery Plans

• Every backup and recovery environment should be supported by complete recovery plans

• Plans need to be developed, tested, and rehearsed periodically

• Plans should include information about:
– How and when backups are created and verified
– Who is responsible for backup creation and verification
– Storage and retention of backup media
– Review cycle of the plan
– Rehearsal of the plan

Real-Time Protection, Server Recovery, and Application Recovery

• Entire servers can be mirrored to provide real-time protection and recovery in a strategy of hot, warm, and cold servers:
– Hot server: the server in production
– Warm server: backup server that is running and may handle overflow work from hot server
– Cold server: offline, test server

• If the hot server goes down, the warm server is promoted to hot and the cold server to warm while the failed server is being repaired

• Bare metal recovery: technologies designed to replace operating systems and services when they fail
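
The hot, warm, and cold server promotion described above can be sketched as a small state change. The class name `ServerPool`, the server names, and the rule that a repaired server rejoins in the cold role are assumptions of this example; the slide says only that the remaining servers are promoted during the repair.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerPool:
    hot: Optional[str]   # server in production
    warm: Optional[str]  # running backup server
    cold: Optional[str]  # offline or test server

    def hot_failed(self) -> Optional[str]:
        """Promote warm to hot and cold to warm; return the failed server."""
        failed = self.hot
        self.hot, self.warm, self.cold = self.warm, self.cold, None
        return failed

    def repaired(self, name: str) -> None:
        """Return a repaired server to the pool in the cold role (assumed)."""
        if self.cold is not None:
            raise RuntimeError("cold slot is already occupied")
        self.cold = name


pool = ServerPool(hot="srv-a", warm="srv-b", cold="srv-c")
failed = pool.hot_failed()
```

After the call, `srv-b` is serving production and `srv-c` is standing by, with the cold slot empty until `srv-a` is repaired.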

• Application recovery (or clustering plus replication):
– Applications are installed on multiple servers
– If one fails, the secondary systems take over the role

• Electronic vaulting:
– Bulk transfer of data in batches to an off-site facility
– Receiving server archives the data
– Can be more expensive than tape backup and slower than data mirroring
– Data must be encrypted for transfer over public infrastructure

• Remote journaling (RJ):
– Transfer of live transactions to an off-site facility
– Only transactions are transferred in near real-time to a remote location
– Facilitates the recovery of key transactions in near real-time
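
Remote journaling can be pictured as forwarding each transaction to the remote site as it is applied, then replaying the journal during recovery. The sketch below is a toy, in-process model: the class names, the account data, and the assumption that the remote site already holds a baseline copy of the data are all illustrative, and no network transfer or encryption is shown.

```python
class RemoteSite:
    """Off-site facility holding a baseline copy plus a transaction journal."""

    def __init__(self, baseline):
        self.baseline = dict(baseline)  # assumed earlier bulk copy
        self.journal = []               # transactions received since then

    def receive(self, account, delta):
        self.journal.append((account, delta))

    def recover(self):
        """Rebuild current state by replaying the journal over the baseline."""
        state = dict(self.baseline)
        for account, delta in self.journal:
            state[account] = state.get(account, 0) + delta
        return state


class PrimarySite:
    def __init__(self, balances, remote):
        self.balances = dict(balances)
        self.remote = remote

    def apply(self, account, delta):
        self.balances[account] = self.balances.get(account, 0) + delta
        self.remote.receive(account, delta)  # forward only the transaction


remote = RemoteSite({"alice": 100})
primary = PrimarySite({"alice": 100}, remote)
primary.apply("alice", -30)
primary.apply("bob", 50)
recovered = remote.recover()
```

Only the two small transactions cross to the remote site, yet replaying them reproduces the primary's state exactly.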

• Database shadowing (or databank shadowing):
– Storage of duplicate online transaction data and duplication of databases at a remote site on a redundant server
– Both databases are updated, but only the primary responds to the user
– Combines electronic vaulting with remote journaling
– Used when immediate data recovery is a priority
– Also used for data warehousing, data mining, batch reporting, complex SQL queries, local access at the shadow site, and load balancing
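
The behavior described above, where both copies are updated but only the primary answers users, can be sketched with two dictionaries. The class name `ShadowedDatabase`, its methods, and the way a replacement shadow is seeded after failover are assumptions made for illustration.

```python
class ShadowedDatabase:
    """Toy model of a primary database with a remote shadow copy."""

    def __init__(self):
        self._primary = {}
        self._shadow = {}

    def write(self, key, value):
        # Every update is applied to both copies.
        self._primary[key] = value
        self._shadow[key] = value

    def read(self, key):
        # Only the primary responds to user requests.
        return self._primary[key]

    def report_total(self):
        # Batch reporting runs against the shadow, sparing the primary.
        return sum(self._shadow.values())

    def lose_primary(self):
        # The shadow takes over; a replacement shadow is seeded from it.
        self._primary = self._shadow
        self._shadow = dict(self._primary)


db = ShadowedDatabase()
db.write("order-1", 40)
db.write("order-2", 60)
total = db.report_total()
db.lose_primary()
after_failover = db.read("order-2")
```

Because the shadow is already current, failover needs no restore step, which is why the slide lists shadowing for cases where immediate recovery is a priority.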

• Network-attached storage (NAS):
– Usually a single device or server attached to a network to provide online storage
– Not well suited for real-time applications due to latency

• Storage area networks (SANs):
– Online storage devices connected by Fibre Channel direct connections between the servers and the additional storage

Site Resumption Strategies

• If the primary business site is not available, alternative processing capability may be needed

• The contingency planning management team (CPMT) can choose from several strategies for business resumption planning

• Exclusive control options:
– Hot sites
– Warm sites
– Cold sites

• Shared-use options:
– Timeshare
– Service bureaus
– Mutual agreements

Exclusive Site Resumption Strategies

• Hot site:
– Fully configured computer facility
– Duplicates computing resources, peripherals, phone systems, applications, and workstations
– Can be 24/7 if desired
– Can be a mirrored site that is identical to the primary site

• Warm site:
– Provides some of the same services and options as a hot site
– May include computing equipment and peripherals but not workstations
– Has access to data backups or off-site storage
– Lower cost than a hot site, but takes more time to be fully functional

• Cold site:
– Provides only rudimentary services and facilities
– No computer hardware or software is provided
– Communications services must be installed when the site is occupied
– Often no quick recovery or data duplication functions on site
– Primary advantage is cost

• Other options:
– Rolling mobile site configured in the payload area of a tractor-trailer
– Rental storage area with duplicate or second-generation equipment
– Mobile temporary offices

Shared Site Resumption Strategies

• Timeshare:
– Leased site shared with other organizations
– Possibility that more than one organization might need the facility simultaneously

• Service bureaus:
– Service agency that provides physical facilities in the event of a disaster
– May provide off-site data storage

• Mutual agreement:
– Contract between two organizations to provide mutual assistance in the event of a disaster
– Each organization is obligated to provide facilities, resources, and services to the other
– Good for divisions of the same parent company, between business partners, or when both parties have similar capabilities and capacities
– A memorandum of agreement (MOA) should be drawn up with specific details

Service Agreements

• Service agreement:
– A contractual document guaranteeing certain minimum levels of service provided by a vendor

• Service agreement should specify:
– The parties in the agreement
– Services to be provided by the vendor
– Fees and payments for those services
– Statements of indemnification
– Nondisclosure agreements and intellectual property assurances
– Noncompetitive agreements

Summary

• Contingency planning includes everything done to prepare for the unexpected and recover from it

• BR plan includes the DR plan for resuming operations at the primary site and the BC plan for moving to an alternate site if needed

• 5 procedural mechanisms for restoration of critical data: delayed protection, real-time protection, server recovery, application recovery, and site recovery

• Backup plan is essential

• Retention period for backups must be specified

• 3 types of backups: full, differential, and incremental

• RAID systems provide online disk drives for redundancy

• Databases require special considerations for backup and recovery planning

• Mirroring and duplication of server data storage provide real-time protection

• Electronic vaulting, remote journaling, and database shadowing store data at remote locations

• Business resumption strategies include hot sites, warm sites, cold sites, timeshare, service bureaus, and mutual agreements

• Service agreements guarantee certain minimum levels of service by the vendor