    BUSINESS SURVIVAL: A Guide to Business Continuity Planning and Disaster Recovery

  • Page 2 of 116

    CONTENTS

    1 WHAT IS BUSINESS CONTINUITY PLANNING? ......................................... 4

    1.1 Business Continuity Planning Defined ............................................................................................ 4

    1.2 Disaster Recovery Defined ............................................................................................................... 5

    1.3 Overall Steps .......... 5
        1) Get Board approval .......... 5
        2) Determine scope .......... 5
        3) Carry out risk analysis / management (Business Impact Analysis) .......... 5
        4) Create a project plan and budget .......... 5
        5) Create the plan (overall document) .......... 5
        6) Gather / Create supporting documentation .......... 5
        7) Test / review / audit the plan and the process .......... 5
        8) Change Manage any changes made to the plan / process / documentation .......... 5
        9) Formally approve the plan .......... 5
        10) Return to 6 .......... 5

    2 CONVINCING THE BOARD ........................................................................... 6

    2.1 The Importance of Support from the Top ...................................................................................... 6

    2.2 Explaining Why it's so Important .......... 7

    3 DEFINING SCOPE ....................................................................................... 10

    3.1 Which Sites? .....................................................................................................................................10

    3.2 Which Systems? ...............................................................................................................................10

    3.3 Which Departments/Business Functions? .....................................................................................11

    3.4 Which Personnel? ............................................................................................................................13

    3.5 Business Partner Relationships ......................................................................................................13

    3.6 Which Types of Disasters and Risks? ............................................................................................15

    3.7 Which Legislation/Standards need to be considered? ..................................................................15

    3.8 Interaction with Other Organizations ...........................................................................................15

    3.9 Gap Analysis ....................................................................................................................................15

    3.10 Questionnaires .................................................................................................................................16

    4 RISK MANAGEMENT .......... 21
        1) Identify Risks .......... 22
        2) Quantify Risks (Probability and Impact) .......... 22
        3) Risk Tolerance Levels .......... 22
        4) Allocate Risks to Appropriate Personnel .......... 22
        5) Risk Mitigation, Reduction and Response .......... 23
        6) Evaluation of Effectiveness .......... 23

    4.1 Benefits of Risk Assessment / Management .......... 23
        1) Cost Justification .......... 23
        2) Facilitation of Communication between all departments in the Business .......... 23
        3) Business Responsibility .......... 23
        4) Business Continuity Awareness .......... 23

    4.2 Risk Identification .......... 23
        1) Environmental Disasters .......... 26
        2) Equipment/System Failure .......... 26
        3) Serious Information Security Incidents .......... 26
        4) Organized/Deliberate Disruption .......... 26
        5) Loss of Utilities/Services .......... 26
        6) Business Partners .......... 26
        7) Other Emergency Situations .......... 27

    4.3 Risk Assessment .......... 29
        1) Cost Impact .......... 29
        2) Vulnerability Factors .......... 31
        3) Likely Loss .......... 32
        4) Probability .......... 34

    4.4 Calculations ......................................................................................................................................40

    4.5 Risk Mitigation / Risk Response .......... 42
        1) Controls .......... 43
        2) Risk Appetite .......... 48

    4.6 Risk Allocation .................................................................................................................................49

    4.7 Scenario Grouping of Risks ............................................................................................................49

    4.8 More on Risk Management.............................................................................................................50

    5 CREATING THE PLAN ................................................................................ 51

    5.1 Documents to use as Inputs to the Plan .........................................................................................53

    5.2 Purpose .............................................................................................................................................54

    5.3 Scope .................................................................................................................................................55

    5.4 Objectives .......... 55
        1) Category I - Critical Functions - Recovery Objective 2 hours .......... 56
        2) Category II - Essential Functions - Recovery Objective 5 hours .......... 56
        3) Category III - Necessary Functions - Recovery Objective 24 hours .......... 56
        4) Category IV - Desirable Functions - Recovery Objective 48 hours .......... 56

    5.5 Distribution List ...............................................................................................................................56

    5.6 Version Control ...............................................................................................................................57

    5.7 Review Process .................................................................................................................................57

    5.8 Strategies .......... 58
        1) Dual Site Method / Alternate Site Method .......... 58
        2) Bilateral Aid Agreement Method / Reciprocal Agreement Method .......... 59
        3) Dispersal Method .......... 59
        4) Deference Method .......... 59

    5.9 Functions, Responsibilities and Personnel Contact Info ..............................................................59

    5.10 Lists .......... 60
        1) IT Systems and Components .......... 60
        2) List of key Documents .......... 62
            A list of all key documents related to the BCP should be provided. This should include references to Procedures, Policies and Guidelines as well as SLAs, contracts, insurance documents etc. .......... 62
            A key set of procedures to be included is the full set of backup and recovery documents for all IT systems: .......... 62
        3) Info about all buildings/sites .......... 62
        4) Key Personnel During Emergencies .......... 62
        5) Emergency Services Contact information .......... 63
        6) Roles and Responsibilities .......... 63

    5.11 Policies and Procedures .......... 64
        1) Notification Procedures, to include .......... 64
            Description/diagram of the notification process (to include notification to external authorities) .......... 64
            Escalation Procedures .......... 64
            Script (Telephone Guidelines) for Notification .......... 64
            Organizational Structure for Notification .......... 64
            Recovery Team Personnel Notification .......... 64
        2) Emergency Procedures And Information, to include .......... 64
            Alarm Systems information .......... 64
            Evacuation Procedures .......... 64
            Local Emergency Telephone Numbers .......... 64
            Vital Records Retrieval Procedures .......... 64
            Vital Records Restoration Procedures .......... 64
            Documentation Recovery Procedures .......... 64
            Start-up Procedures .......... 64
            Network Control Center Restoration Procedures .......... 64
            Applications Software and Data Restoration Procedures .......... 64
            Other Mission Critical Procedures/Information .......... 64
            Management of security and logistics during emergency situations .......... 64
            Procedures for moving critical information to a secure site .......... 65
            Procedures for keeping outside agencies, local government agencies etc. informed .......... 65
            Procedures for determining whether or not recovery / restoration should be attempted .......... 65
            Procedures for training, implementing, testing and maintaining the plan .......... 65

    5.12 Contingency options/Redundancy ..................................................................................................66

    5.13 Key Timeframes .......... 66
        Category I - Critical Functions - Recovery Objective 2 hours .......... 66
        Category II - Essential Functions - Recovery Objective 5 hours .......... 66
        Category III - Necessary Functions - Recovery Objective 24 hours .......... 67
        Category IV - Desirable Functions - Recovery Objective 48 hours .......... 67

    5.14 Legal Requirements .........................................................................................................................67

    5.15 Best Business Practices (Standards) Requirements ......................................................................67

    5.16 Communications .......... 67
        1) Internal Communications .......... 67
        2) Communications Plan .......... 68
        3) Stakeholder communications .......... 69

    5.17 Action Task Lists .............................................................................................................................69

    5.18 Plan Testing and Maintenance .......................................................................................................69

    5.19 IT-specific Considerations .......... 69
        1) Perform backups regularly. Keep information central; this will help control information backup and help protect information integrity. Where information is decentralized (e.g. held on PCs), ensure that this information is also regularly backed up. .......... 69
        2) Increase physical security of the server room to prevent data loss. .......... 69
        3) Antivirus software should be in place on PCs, servers (data, file and mail servers) and, if possible, at network level also. .......... 69
        4) Patch update and management, including patch management of operating systems, application software, database software, middleware software, firewall software and other network management software. .......... 70
        5) Change Management and Configuration Management procedures to ensure that applications and system components can easily be restored to the most recent build and configuration (setup) if the live (production) system is destroyed due to an incident. This should include procedures for reapplying patches to software and components, recreating firewall rules and policies, operating system settings etc. .......... 70
        6) Internet-facing systems are secured and maximum security is applied. .......... 70
        7) Remote access to data servers is controlled and strictly monitored. .......... 70
        8) Verify backups to ensure that they are not corrupt. .......... 70
        9) Backups should be stored offsite. .......... 70
        10) Replicate critical data in as close to real-time as possible. .......... 70
        11) Use redundant hardware and software options wherever possible (e.g. RAID, hot failover servers, alternative ISPs, alternative firewalls, etc.) .......... 70
        12) Physical solutions like fire suppression, environmental monitoring and access control are implemented. .......... 70
        13) Standardize the setups / configurations of all hardware, software and network components, and where possible create scripts to recreate those setups / configurations. .......... 70
        14) Document all changes to system and application configurations, patches, versions etc. using proper change control. .......... 70
        15) Perform systematic scheduled restores that verify tape or backup media integrity.
        16) Ensure the whole process is documented and can be followed by non-technical personnel. .......... 70
        17) Use UPS backup power supply options for all servers, and any critical PCs. .......... 70
        18) Ensure that Intrusion Detection Software, Intrusion Prevention Software and / or good Firewall software is in place so that hackers are either prevented from accessing systems, or are detected as soon as they access them. Ensure that the alerts from such software are taken seriously, reviewed and followed up by key staff. .......... 70

    5.20 People-specific Considerations .......... 70
        1) Reducing Impact of Personnel Loss .......... 71
        2) Reducing Impact of Perceived Events .......... 71

    5.21 Third Party Considerations ............................................................................................................72

    5.22 Sample Plans ....................................................................................................................................73

    6 MAINTAINING, TESTING AND AUDITING YOUR PLAN ........................... 74

    6.1 Testing Plan .......... 74
        1) Planning .......... 74
        2) Test Execution .......... 75
        3) Evaluating testing .......... 75
        4) Frequency of testing .......... 75

    6.2 Proposed Testing Scenarios .......... 75
        1) Scenario 1 .......... 75
        2) Scenario 2 .......... 76
        3) Scenario 3 .......... 76
        4) Scenario 4 .......... 76

    6.3 Auditing/Testing Documentation .......... 76
        1) Evaluating Backup and Recovery Strategy Documentation .......... 77
        2) Evaluating SLAs .......... 78

    6.4 Training ............................................................................................................................................80

    6.5 Review / Maintenance Process ........................................................................................................80

    6.6 Change Control/Version Control ...................................................................................................82

    7 FRAMEWORKS, METHODOLOGIES, TOOLS AND SERVICES ................ 83

    7.1 Why use a Framework/Methodology? ...........................................................................................83

    7.2 Which Framework/Methodology? .......... 83
        1) ITIL .......... 83
        2) COBRA .......... 84
        3) NIST Risk Management Guide for IT Systems .......... 84
        4) OCTAVE .......... 84
        5) Six Sigma .......... 84
        6) FISCAM (Federal Information System Controls Audit Manual) .......... 84
            FISCAM offers guidance to auditors of Federal Agencies' systems in terms of integrity, confidentiality and availability. Section 3.6 goes into great detail about the requirements for service continuity and provides a framework for use by Auditors to assess compliance. .......... 84
        7) Other Methodologies and Frameworks .......... 84

    7.3 Why use Tools? ................................................................................................................................85

    7.4 Which Tools? .......... 85
        1) Risk Evaluation Tools .......... 85
        2) Self-Assessment Tools .......... 85
        3) Change Management Tools .......... 85
        4) Documentation Generators .......... 85
            a) Policy and Procedure Generators .......... 85
            b) SLA Generators .......... 85
            c) Questionnaire/Survey Generators .......... 85

    7.5 Which Services are Available? .......... 86
        1) Web-based Services .......... 86
        2) Consultancy Services .......... 86
        3) Audits .......... 86

    8 LEGISLATION, EXTERNAL STANDARDS AND THEIR EFFECTS ........... 87

    8.1 Legislation and Regulations in the US .......... 87
        1) Sarbanes-Oxley Act .......... 87
        2) HIPAA .......... 88
        3) NASD .......... 88
        4) GLBA .......... 88
        5) Federal Information Security Management Act 2002 (FISMA) .......... 88
        6) OSHA 1970 (Occupational Safety and Health Administration) .......... 88
        7) Other relevant US legislation and regulations .......... 89
            41 Code of Federal Regulations 101.20.103-4, Occupant Emergency Program, revised as of July 1, 2000 .......... 90
            36 Code of Federal Regulations, Part 1236, Management of Vital Records, revised as of July 1, 2000 .......... 90
            Presidential Decision Directive 67, Protection Against Unconventional Threats to the Homeland and Americans Overseas, dated May 22, 1998 .......... 90
            Homeland Security Presidential Directive 3, Homeland Security Advisory System, dated March 11, 2002 .......... 90
            Homeland Security Presidential Directive 5, Management of Domestic Incidents, dated February 28, 2003 .......... 90
            Homeland Security Presidential Directive 7, Critical Infrastructure Identification, Prioritization, and Protection, dated December 17, 2003 .......... 90
            Homeland Security Presidential Directive 8, National Preparedness, dated December 17, 2003 .......... 90
            Federal Preparedness Circular 60, Continuity of the Executive Branch of the Federal Government at the Headquarters Level During National Security Emergencies, dated November 20, 1990 .......... 90

    8.2 Legislation in the UK .......... 91
        1) The UK Civil Contingencies Bill .......... 91
        2) Data Protection Legislation .......... 92

    8.3 Other Legislation and Directives .......... 92
        1) EU Data Protection Directive 1995 .......... 92
        2) WTO Government Procurement Agreement .......... 92
        3) PIPEDA (Canada) .......... 92
        4) Singapore BC/DR Standard .......... 92

    8.4 External Standards ..........................................................................................................................93 1) ISO .................................................................................................................................................93 a) ISO17799 ( BS7799) standard mandates that in order to comply an organization must have solid Business Continuity Management, and must take measures to: ................................................................93 prevent loss, damage or compromise of assets and interruption of business .....................................93 prevent compromise or theft of information and information processing facilities ...........................93 prevent loss, modification or misuse of user data in application systems..........................................93

  • Page 8 of 116

    protect the confidentiality, authenticity and integrity of information ................................................93 reduce risks of human error, theft, fraud or misuse of facilities ........................................................93 b) ISO9001 standard mandates Quality Management requirements of IT systems, including dictating that the Business Continuity process is well-documented, personnel are trained effectively, etc. ............93 2) BSI PAS 56....................................................................................................................................93 3) BSI5000 .........................................................................................................................................93 4) FIPS-PUB-87 Guidelines for Automated Data Processing Contingency Planning .......................93 5) ISF Standard for Information Security ..........................................................................................94 6) Visa CISP (Cardholder Information Security Program) and PCI (Payment Card Industry) requirements ..............................................................................................................................................94 7) Other Standards .............................................................................................................................95

    9 USEFUL RESOURCES ................................................................................ 96

    9.1 Websites ............................................................................................................................................96 1) General ..........................................................................................................................................96 2) Guides and Templates ...................................................................................................................97 3) Risk Management/Impact Analysis ...............................................................................................98 4) Training and Certification..............................................................................................................98 5) Change Management .....................................................................................................................99 6) Methodologies ...............................................................................................................................99 7) Tools ..............................................................................................................................................99 8) Standards and Legislation ............................................................................................................100 9) Useful Other Sites ........................................................................................................................100

    9.2 Papers .............................................................................................................................................101

    9.3 Books...............................................................................................................................................101

    10 SPECIFIC REFERENCES ....................................................................... 103

    10.1 Retail / Supply Chain BCP ...........................................................................................................103

    10.2 Banking / Finance Industry BCP .................................................................................................103

    10.3 Human Security Issues ..................................................................................................................103

    10.4 IT Security Issues ..........................................................................................................................103

    10.5 Database Recovery ........................................................................................................................104

    Acknowledgements

    Michelle Sollicito would like to say thank you to all the friends and colleagues from Yahoo!, Earthlink, Schlumberger Sema, Accenture, and other large companies for providing the input, advice and support that helped her complete this book.

    About the Authors

    Michelle Sollicito is an Ebusiness Consultant with Exceptiona.com in Atlanta, Georgia. She has 16 years of IT and Ebusiness experience gained with many organizations across the world, having lived in the UK, New Zealand and now the USA.

    Who is This Book For?

    Business Survival: A Guide to Business Continuity Planning and Disaster Recovery is for experienced and inexperienced, technical and non-technical personnel who are interested in the need for Business Continuity Planning within their organizations.

    These personnel include:

    Senior and Executive management, the decision-makers who make budgetary decisions

    Business Continuity Managers and their teams

    Chief Information Officers, who ensure the implementation of the Disaster Recovery elements of the Business Continuity Plan and play a large role in (and perhaps even manage or oversee) the Business Continuity Process

    The IT security program manager, who implements the security program

    IT managers and system owners of system software and/or hardware used to support IT functions

    Information owners of data stored, processed, and transmitted by the IT systems

    Business Unit owners and managers who are responsible for the way in which their own unit fits into the overall Business Continuity Plan, but especially:

    o Facilities Managers, who are responsible for the way the buildings are evacuated and secured, providing floor plans and information to Emergency Services, etc.

    o Human Resources Managers, who are responsible for the people elements of the Business Continuity Plan

    o Communications and PR Managers, who are responsible for the communications policies that form part of the Business Continuity Plan

    Technical support personnel (e.g. network, system, application, and database administrators; computer specialists; data security analysts), who manage and administer security for the IT systems

    Information system auditors, who audit IT systems

    IT consultants, who support clients in developing, implementing and testing their Business Continuity Plans

    "The driver for Business Continuity Planning should take into consideration natural disasters and internal security breaches as well as Terrorism!" (Gartner, 2002)

    1 What is Business Continuity Planning?

    With the emergence of the internet as a place for doing business 24 hours per day, 365 days per year, it is

    more and more important that organizations are able to continue to operate when unexpected events occur.

    The events of 9/11 showed how deeply unexpected events can affect whole industries such as the Financial

    Industry and the Airline industry, and the knock-on effect this kind of event can have across the whole

    economy of the United States.

    However, even in the midst of such a catastrophic disaster, Dow Jones and Co, publisher of The Wall Street Journal, located very close to the World Trade Center, enacted its Business Continuity Plan so effectively that it was able to provide its readers with the newspaper the very next day, despite having to relocate all its editors, reporters and support personnel to alternate offices and install 100 PCs at the new location!

    The tsunami of Christmas 2004 (and before that, the power blackout in New York, and the rolling

    blackouts in California) showed the potential for disasters to affect whole regions at a time.

    The Enron and Worldcom scandals (amongst others) illustrated how deeply companies can be affected by

    bad PR events.

    Taking into account all of these facts, the US Federal Government, investors and many of the organizations governing industry standards and regulations recognized the need for improved Business Continuity measures and increased controls to help reduce the impact of events such as these.

    As a result, organizations (especially those in the United States) are coming under increasing pressure to

    produce effective Business Continuity Planning measures in order to reduce/mitigate risks.

    Many organizations had Disaster Recovery Plans already in place and assumed that these were sufficient to

    meet the requirements of new laws, regulations and conformance requirements. However, they have now

    discovered that Business Continuity Planning is about much more than simply ensuring that computer

    systems come back up quickly and effectively after a disaster.

    So, what exactly is the difference between Business Continuity Planning and Disaster Recovery Planning?

    1.1 Business Continuity Planning Defined

    Business continuity planning is concerned with optimizing organizational resilience.

    As such, it is a business function aimed at developing, documenting and integrating procedures, processes

    and technologies in order that in the event of a disaster, critical business functions can continue with

    minimal disruption or downtime, providing at least the minimum level of acceptable service, while the

    remainder of the organization is restored to business as usual status.

    Business Continuity Planning (or BCP) is all-encompassing. It is the responsibility of each department or

    business function to define the restoration requirements essential to continuing its operations as part of

    Business Continuity Planning, and thus BCP encompasses the complete restoration process required across

    the whole organization, not only the IT systems.

    Likewise, Business Continuity Planning should consider all kinds of disasters: natural ones (e.g. those caused by flood or earthquake), system failures (e.g. those caused by hardware or software failure) and all other types of disasters (e.g. those caused intentionally by viruses, hackers or terrorism, or those caused accidentally by fire, accidents, etc.).

    BCP does include IT recovery plans (Disaster Recovery), but also considers other aspects such as communications, buildings, stationery, office equipment, water supplies, electrical supplies, etc. Done properly, BCP is more about creating a Business Continuity culture than simply specifying a set of procedures to follow in the event of an emergency.

    In order to avoid some of the political issues commonly experienced within organizations having a Disaster

    Recovery team and a Business Continuity team, it should be made very clear that the Disaster Recovery

    team reports to the Business Continuity manager and is just one team making up the Business Continuity

    team.

    1.2 Disaster Recovery Defined

    Disaster Recovery, by contrast, is a subset of Business Continuity Management and is primarily an IT function, aimed at restoring the organization's IT systems to business as usual status in as efficient a manner as possible.

    Disaster Recovery Plans document the actions required to restore systems and data after a disaster or an

    outage in such a way as to prevent, or at the least minimize, the impact that the disaster or outage has on the

    organization.

    Disaster Recovery Plans typically also document any precautions taken to minimize the effects of a disaster

    or outage.

    The key difference between Business Continuity Planning and Disaster Recovery is that BCP is proactive

    (its aim is to avoid or mitigate the impact of a risk), whereas Disaster Recovery is reactive (it aims to

    restore the business after the risk occurs).

    However, Disaster Recovery is an integral component of a Business Continuity plan.

    1.3 Overall Steps

    In order to successfully create a Business Continuity Plan and Process, there are a number of steps that

    need to be taken.

    1) Get Board approval
    2) Determine scope
    3) Carry out risk analysis / management (Business Impact Analysis)
    4) Create a project plan and budget
        Establish a BCP group / team
        Establish a Steering Committee
    5) Create the plan (overall document)
    6) Gather / Create supporting documentation
    7) Test / review / audit the plan and the process
    8) Change Manage any changes made to the plan / process / documentation
    9) Formally approve the plan
    10) Return to 6

    2 Convincing the Board

    2.1 The Importance of Support from the Top

    It is absolutely essential to gain top management approval and commitment to the development of a

    Business Continuity Plan and Process. The aim of the BCP process is to create a Business Continuity

    culture, to change the way that everyone in the organization thinks, and for a change of that magnitude, it

    has to be perceived as being driven from the top downwards.

    Without it, it will be almost impossible to motivate other players, who may see no direct financial return

    from work carried out on such a plan. It will also be difficult to obtain resources and finances required to

    make the Business Continuity Plan effective.

    Luckily, corporate governance and preparedness are hot topics in the Board Rooms of most organizations these days, thanks to 9/11, Sarbanes Oxley audits and public trials of corporate executives, so this should make it much easier to gain the top-level approval required. Many companies are also already under pressure to comply with international standards that require Business Continuity Plans as a key component in the path to compliance.

    The reason for the pressure to conform to corporate governance standards is, in essence, purely because

    Business Continuity is good practice. Knowing that a BCP is in place reassures investors and potential

    investors, employees and potential employees, customers and potential customers.

    Some executives, however, will provide endless excuses for why they cannot commit resources to BCP work, with lack of time, resources and/or money being the most common.¹

    The best way to gain commitment from such executives is to ask probing questions about their level of confidence in the event of a disaster. Ask them how confident they are that the company's vital records are well protected, or how confident they are that a determined hacker could not get into the company's systems.

    Ask them how much it would cost their organization (in terms of financial costs, as well as in terms of

    consumer and investor confidence) if a key system went down for two weeks, and ask how sure they are

    that this could never happen.

    Ask them how confident they would be about producing the Year End Accounts if their CFO suffered an unfortunate accident a week before Accounts Close.

    Point out that these days, an organization that does not provide at least a minimum level of service to its

    clients (and /or business partners) following a disaster may not have a business worth recovering!

    Customers are just one click away from a competitor in many cases, and, of course, if business partners

    find out that they can function without your organization for one day, they may decide they can function

    without you for longer!

    This approach is likely to increase their attention level dramatically!

    1 http://www.exceptiona.com/displaycategoryitems.asp?ArticleId=155 Ostrich Syndrome

    For more information on how the Board can be convinced of the necessity of a BCP, see "What's Wrong With BCP? CPM's Advisory Board Sounds Off On the State of the Industry", Paul Kirvan:

    http://www.contingencyplanning.com/archives/2003/janfeb/1.aspx

    2.2 Explaining Why it's so Important

    The US Federal government recognizes the impact upon the economy of organizations having insufficient

    Business Continuity measures in place, and hence is putting increased pressure on organizations to put into

    place (and to test) business continuity plans to reduce the impact on the economy, industry and the public

    of major disasters. Government Agencies must now have BCPs in place, and most large organizations are affected by at least one law or regulation, such as HIPAA² or the Sarbanes Oxley Act³, requiring that a BCP be in place now or in the near future.

    Increasing emphasis on Corporate Governance means that key stakeholders are insisting upon effective Disaster Recovery / Business Continuity Plans. This is because suppliers, employees, business partners, shareholders and potential investors are acutely aware of the potential financial impact of not having such plans: not only real financial losses, but also the potential loss of customers, poor public image, falling share prices, etc.

    Not only is it a good idea to have an effective BCP to keep stakeholders' minds at rest, but Business Continuity Plans can also be shown to be an excellent return on investment. Research has shown that organizations have a much better chance of remaining in business, and of suffering significantly lower costs as a result of a disruption, if they have a Business Continuity Plan. Further, the alternative to having a BCP can be financial ruin.

    One study suggests that any organization that suffers a computer outage lasting more than 10 days never fully recovers, and that 50% of such companies go out of business within 5 years of such an incident.⁴

    Disaster Recovery Institute International (www.drii.org) research shows that:

    More than 75% of US businesses have experienced some type of interruption

    More than 80% of small businesses experiencing business interruptions go out of business within 5 years

    93% of all organizations that experienced a disaster with no recovery plan in place closed within five years

    50% of companies that lost critical business functions for more than ten days never recovered

    The average cost of business and system downtime for Fortune 500 companies is $96,000 per minute
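
    To put the last of these figures in perspective when presenting to the Board, the arithmetic can be sketched as below. This is a rough illustration only: it assumes the quoted $96,000 per minute stays constant for the whole outage, which real incidents rarely do, and the function name and outage durations are hypothetical.

```python
# Figure quoted above for Fortune 500 companies (US dollars per minute).
COST_PER_MINUTE_USD = 96_000


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Estimate the cost of an outage, assuming a constant per-minute cost."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Hypothetical outage durations, chosen only for illustration.
for label, minutes in [("1 hour", 60), ("8 hours", 8 * 60), ("24 hours", 24 * 60)]:
    print(f"{label}: ${downtime_cost(minutes):,.0f}")
```

    Substituting your own organization's estimated cost per minute for the default gives a figure that is usually more persuasive than an industry average.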

    Despite these figures, IDC found that although 80% of large companies have a BCP plan or process

    underway, only 40-45% of small companies do.

    Part of the reason that more companies do not have BCPs in place is that there is clearly a mismatch in communication between the IT function and Business functions within many organizations. A Roper study showed that while 52% of IT executives believe that their organizations are very vulnerable to the possibility of losing critical data, only 14% of business executives in the same organizations were aware of this vulnerability.

    There are many other compelling reasons for creating an effective Business Continuity Plan:

    2 http://www.hipaadvisory.com/action/notes/vol3/may03.htm HIPAA and Business Continuity/Disaster Recovery Planning
    3 http://www.itpapers.com/abstract.aspx?cid=66&docid=83814 Sarbanes Oxley and BCP
    4 Jon Toigo, Disaster Recovery Planning: Managing Risk and Catastrophe in Information Systems (Yourdon Press, 1989)

    Disasters are, by their very nature, unexpected. They tend to occur at the most inconvenient times: the one week in the year when all the experts on the critical systems are away at a seminar, the day after the most recent system backup failed, just before close for End of Year Accounting functions, or at the end of a long day when all the network engineers are tired and confused.

    Effective Business Continuity Planning enables all the key players to think carefully through all the

    possible scenarios and determine the best way to tackle each, and then to document these in such a way that

    the most tired, confused, inexperienced team member knows what is expected of him or her in each

    situation.

    Having an effective Business Continuity Plan ensures that a great deal of the critical employee knowledge

    and expertise is captured on paper in policies, procedures and plans. This protects the organization in the

    event of employee absenteeism and resignation, and reduces training costs when replacing critical

    employees.

    Compliance with many internationally recognized standards (e.g. ISO17799) requires appropriate business continuity management and planning. Such compliance is increasingly important to organizations throughout all sectors of trade, industry and government; many government agencies and organizations will not enter into contracts with companies that are not compliant with one or other of the required standards.

    Furthermore, many organizations decide to implement BCP measures independent of any external pressures from trade, industry and government, because of factors such as:

    Increasing dependency over recent years on computerization, leading to an increased risk of loss of normal business operations

    Increased security threats to IT systems (including viruses, hackers, trojan horses, etc.)

    Increased recognition of the impact that a serious incident could have on the business (and even the whole industry/economy) in the light of events such as Y2K, 9/11, the New York power outage (August 2003), the rolling blackouts in California (2001), etc.

    If the Board still doesn't get it once you have listed all these reasons why Business Continuity Planning is so important, it is time to get out your list of issues that you have found within the organization that could cause potentially huge losses or embarrassment.

    For example:

    A critical business function located on the ground floor of a building that is built within a flood plain

    Firewalls that are being bypassed, meaning that hackers could potentially gain direct access to the systems containing credit card data

    UPS backup power systems that are not maintained and therefore will not operate in the event of power outages

    Change management procedures being bypassed so that it would be impossible to recreate the software set up on key web servers if one of them was lost due to hardware failure

    A failover site that uses the same ISP as the main sites, so that in the event of ISP failure neither site would be operational

    SLAs that are out of date or insufficient to protect the business against loss of service

    64-70% of businesses that have major fires never recover. The primary reason for failure is the loss of vital business records. (Broder 1999)

    Also, point out that research shows that:

    The Power Grid in the US, built in the 19th Century, is not able to support 21st Century load effectively

    Demand for premium power will grow from 10% to 50% by 2010

    The New York blackout and the Rolling Blackouts in California may be symptoms of this problem

    More and more organizations are looking at backup power options

    Software bugs cost business $60 billion per year

    Ecommerce/ebusiness increasingly requires 24 x 7 x 365 availability of many systems

    In 1999, a large US candy manufacturer missed candy deliveries worth approximately $200 million

    because of system glitches in its new $112 million computer system.

    An online trading company lost $2.5 billion in market value when its system crashed.

    3 Defining Scope

    Assuming that the Board approves the development of a Business Continuity Process / Plan, the first stage is to define the scope. Many BCP planners have skipped this step at their peril, perhaps assuming that it is obvious what is in scope and what is not. However, it is often not until the detail is examined that it becomes clear how important scoping is within any BCP exercise.

    An essential for the scoping phase of any BCP process is a database to contain all the lists that will be

    obtained during this phase and then expanded upon and updated/maintained later in the process. Ensure

    that this database is secure, is regularly backed up, its contents are available to authorized staff only within

    the intranet, and that the BCP team and other authorized parties get an updated paper copy of all relevant

    lists from that database regularly. Also, ensure that the backup of that database as well as the paper copies

    are stored at multiple sites, including in a fireproof safe offsite at a document storage facility. It would be

    most embarrassing for the BCP team not to be able to function during a crisis because they did not have

    access to their own database, or information stored in it!

    As a minimum, the BCP should cover all business processes within the organization: not only IT systems, but also communications, business information, production, sales, accounts, customer service, public relations, etc. However, consideration should be given to which systems and sites span across organizational boundaries, and which of these should be taken into account during the scoping exercise.

    This chapter provides a guide to ascertaining which sites, systems etc. to consider when making scoping

    decisions.

    3.1 Which Sites?

    It is important to get a complete list of all sites that contain systems or personnel owned by or related to the organization in any way. Start by assuming that all are in scope, and only eliminate a site when you are sure that it is not in scope.

    While it may be obvious that sites such as the organizational Headquarters, the Data Center Site and any

    offices, warehouses and retail outlets owned by the organization should be included, sites that some

    organizations might overlook if this approach were not taken include:

    Homes of employees who work from home some or all of the time

    Business Partner sites (the sites of Suppliers, Wholesalers, even some Customers)

    Offshore Outsourcing centers

    Failover sites (it is amazing how many organizations assume that Failover sites will always be operational and available, no matter when an emergency might materialize, and therefore eliminate these from the scope of their Business Continuity Plans!)

    Banks/Financial Institutions (how would your organization operate if its main source of finance were not operational for a few days?)

    The sites of Utilities on which the business is dependent

    Offsite storage facilities for backup media

    Once a list of all sites exists within your BCP database, the sites should be prioritized according to

    criticality - e.g. the Data Center may be a criticality level 1 site, whereas the Bonded Warehouse may be a

    criticality level 5 site.
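
    As a minimal sketch of how such a prioritized site list might be held, the example below sorts in-scope sites by criticality level. Only the Data Center (level 1) and Bonded Warehouse (level 5) ratings come from the text above; the other sites' levels, the field names and the structure are hypothetical, and a real BCP database would hold far more detail (addresses, contacts, dependencies).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    criticality: int       # 1 = most critical, 5 = least critical
    in_scope: bool = True  # start by assuming every site is in scope


# Illustrative entries; levels other than the Data Center and the
# Bonded Warehouse are made up for this example.
sites = [
    Site("Bonded Warehouse", 5),
    Site("Headquarters", 2),
    Site("Data Center", 1),
    Site("Failover Site", 1),
    Site("Offsite Backup Storage", 3),
]


def prioritized(sites):
    """Return in-scope sites, most critical first (ties ordered by name)."""
    in_scope = (site for site in sites if site.in_scope)
    return sorted(in_scope, key=lambda site: (site.criticality, site.name))


for site in prioritized(sites):
    print(f"Level {site.criticality}: {site.name}")
```

    Keeping the in_scope flag on every record, rather than deleting sites from the list, preserves the evidence that a site was considered and deliberately excluded.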

    3.2 Which Systems?

    A clear definition of which systems are covered within the Business Continuity Process is needed because

    so many owners of smaller systems will otherwise assume that their system is not significant enough to

    warrant inclusion.

    It is often obvious that the larger IT systems should be included within scope; however, the unavailability of the smallest, single user spreadsheet just before the Accounts are due to close can potentially cause big problems for an organization, so overlook nothing!

    It is also vital to consider systems that are not owned by the organization (and possibly located offsite), especially nowadays, with so many organizations reliant upon external systems for their own survival, e.g. credit card software/shopping carts, business to business marketplaces/exchanges, bank automated clearance systems, etc. The impact on an organization of losing access to these kinds of systems can be huge, even for a short period of time, so examining SLAs and Failover options for those systems is crucial to BCP planning.

    Interdependencies between all systems (both internal and external) need to be clearly understood before it

    can be determined that any system is not within scope.

    It is also important to take into account that systems consist of many components, and each component is within scope:

    Hardware (servers, network cables, routers, hubs, firewalls, etc.)

    Software (operating system software, system administration software, networking software, database software, application software, office automation software, web server software, custom software, packaged software, all single user software, spreadsheets, all development and test code used to create any of these in house, etc.)

    Interfaces with other systems

    Data and information

    It is easy to forget (or not realize) how interdependent these components often are, so it is key to identify interdependencies clearly as well. For example, in some organizations, if a key firewall, proxy or router goes down, access to most of the organization's systems is no longer available.

    As input to the process of determining which systems and components are in scope, use network topology

    diagrams, system architecture diagrams, etc. and be sure they are kept up-to-date with all the latest changes.

    A wealth of automated scanning tools is available these days to enable you to double-check that the list of systems you have identified is truly the full list; it is very easy to miss systems when carrying out a system audit. Some tools are aimed at scanning routers and firewalls to identify all that are operational, while some are aimed at scanning each PC or Server to find out which applications are installed (remember that in many organizations, PC owners can install software on their own PCs without informing the Technical Support personnel, especially by downloading from the internet).
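
    Whatever tool produces the scan, the double-check itself is a simple comparison of two lists. The sketch below assumes the scan results have already been exported as a plain list of system names; it does not call any real scanner, and all system names are hypothetical.

```python
def reconcile(documented, discovered):
    """Compare the documented inventory with what a scan actually found."""
    documented, discovered = set(documented), set(discovered)
    return {
        # Found by the scan but missing from the BCP database.
        "undocumented": sorted(discovered - documented),
        # Listed in the BCP database but not seen by the scan.
        "not_found": sorted(documented - discovered),
    }


# Hypothetical example data.
documented = {"payroll", "crm", "web-shop"}
discovered = {"payroll", "web-shop", "sales-forecast.xls", "ftp-server"}

report = reconcile(documented, discovered)
print("Add to inventory:", report["undocumented"])
print("Investigate:", report["not_found"])
```

    Both differences matter: undocumented systems are candidates for inclusion in scope, while documented systems that the scan did not find may have been retired, renamed or simply unreachable at scan time.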

    List all systems in terms of priority/criticality within your BCP database, remembering that some systems

    that may not seem to be critical may need to be brought up before critical systems can become operational,

    due to interdependencies.
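
    The point about interdependencies can be made concrete with a dependency-ordered start-up list. The sketch below assumes each system in the BCP database records which other systems it depends on; the system names and criticality ratings are hypothetical. It uses the standard library's graphlib module (Python 3.9 or later) to produce an order in which every system comes after the systems it needs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical systems: each key maps to the set of systems it depends on.
depends_on = {
    "order-entry": {"database", "core-router"},
    "database": {"core-router", "dns"},
    "dns": {"core-router"},
    "core-router": set(),
    "intranet-wiki": {"dns"},
}

# Hypothetical criticality ratings (1 = most critical to the business).
criticality = {
    "order-entry": 1,
    "database": 1,
    "core-router": 2,
    "dns": 3,
    "intranet-wiki": 4,
}


def recovery_order(depends_on):
    """Return a start-up order in which every system follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = recovery_order(depends_on)
for position, name in enumerate(order, start=1):
    print(position, name, "criticality", criticality[name])
```

    In this made-up data, dns is rated only 3 for criticality, yet it has to be restored before the two systems rated 1. A list sorted purely by criticality would miss that.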

    3.3 Which Departments/Business Functions?

    Of course it is easy to say that all departments and business functions should be part of the Business

    Continuity Planning Process, and so they should. However, it is important to recognize that some

    departments will be more involved than others, while some will simply provide input to the process in

    terms of which systems are essential to their continued operations.

  • Page 12 of 116

    Recognize that some parts of the business may have a lot of the information needed for the BCP process

    already at their fingertips - many organizations nowadays have an Internal Audit department (especially in

    the US, since the Sarbanes Oxley laws have become operational) - and this department may have already

    done a great deal of the legwork involved in obtaining lists of key systems, personnel and contact

    information, as well as lots of useful documentation - procedures, guidelines and standards.

    They may also have already identified some areas of BCP/DR which are weak within the company. All

    this information will be very useful in reducing the workload of the BCP/DR team, and having the

    information to hand will also help to prevent bad feeling within the company arising from two different

    departments asking for exactly the same information!

    Encourage team working, and lots of cross-communication and sharing of information with such a

    department if your organization has one.

    Of course, the IT department is another key ally essential to the success of the BCP effort. In most

    organizations, a great number of the concerns of the BCP team are already concerns that have been

    considered by the IT department in some detail, and it is likely that the IT department will already have

    established Disaster Recovery plans and procedures, Failover sites for the IT department and Failover

    hardware and software options for key IT systems. Learn everything you can from the IT department and

    be sure to foster a close relationship - there is nothing more likely to make a BCP effort fail than a poor relationship between the BCP team and the IT department.

    List all departments and business functions and order them by priority (in order of importance to the

    organization) within your BCP database.

    Areas to be considered could include:

    Ecommerce processes

    Email-based communications

    Other online real-time customer services

    Production line / processes

    Quality control mechanisms

    Customer service handling

    Sales / sales admin

    Finance / treasury

    Research / development

    Maintenance and support services

    Information technology services

    Premises (Head Office and branches)

    Marketing

    Public Relations

    Accounting and reporting

    Strategic and business planning activities

    Internal audit

    Human resources management

  • Page 13 of 116

    3.4 Which Personnel?

    It is often best to allow the Business Functions or Departments to elect representatives to be involved in the

    BCP effort from amongst themselves. However, there are some key players who really need to play some

    part in the team.

    At least one Board Member should have day-to-day knowledge about the work on the BCP effort, so that

    the team has the full authority it needs in order to carry out its mission. In addition, at least one legal expert

    is required in order to advise on laws which must be conformed to, and to help when examining contracts

    and SLAs to explain the implications of different clauses within them.

    The Emergency Operations Center Management team (or Help Desk in a smaller organization) should be

    fully involved with all aspects of the BCP as it is their staff who will be implementing many of the

    procedures in an emergency, and who will be the first point of contact in a disaster.

    Ensure that for each department/business function listed within the BCP database, there are at least two

    contact names/numbers (preferably each from a different site if possible) who are aware of the BCP

    process.
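
    The two-contact rule above is easy to check automatically once contacts are held in the BCP database. The following is a minimal sketch, assuming contacts are stored per department as (name, site) pairs; the departments, names and sites are invented for illustration. Since the text only says a second site is preferable, the same-site finding should be read as a warning rather than a failure.

```python
# Hypothetical BCP database extract: department -> list of (name, site) contacts.
contacts = {
    "Finance": [("A. Jones", "Head Office"), ("B. Patel", "Branch 2")],
    "Sales": [("C. Smith", "Head Office"), ("D. Lee", "Head Office")],
    "IT": [("E. Chen", "Head Office")],
}


def contact_gaps(contacts):
    """Return {department: problem} for departments that fall short of the rule."""
    gaps = {}
    for department, people in contacts.items():
        sites = {site for _name, site in people}
        if len(people) < 2:
            gaps[department] = "fewer than two contacts"
        elif len(sites) < 2:
            gaps[department] = "all contacts at the same site"
    return gaps


gaps = contact_gaps(contacts)
for department, problem in gaps.items():
    print(f"{department}: {problem}")
```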

    It is imperative that one person is elected to be responsible overall for the BCP team, plan and process, and

    is given full authority as BCP Manager. However, this person should have a strong backup partner

    (preferably based at a different location) who is fully aware of all aspects of the BCP in case the BCP

    Manager is unavailable.

    To be most effective, the BCP Manager must have full-time responsibility for the BCP plan and process,

    be free from other responsibilities, and have the authority to confront other managers when

    necessary.

    3.5 Business Partner Relationships

    From time to time it will be necessary to involve business partners in the BCP process. In some cases, this

    will be purely to inform them of alternative contact numbers and locations in case of emergency or disaster.

    However, in other cases, business partners may play a key role in getting the organization back up and

    running - for example, if the business partner provides or shares a key system.

    In the Ebusiness world, the boundaries between organizations are becoming more and more blurred, with many companies involved in Joint Ventures, many organizations opening up a great deal of their IT

    Infrastructure to one another via Extranets and public websites, and with many business processes crossing

    organization boundaries. As organizations depend more and more upon the extended value chain of all

    their business partners in order to produce a product or service, business partners become more crucial to

    the overall Business Continuity of each organization. The BCP team should be aware of this, and be sure

    to include Business Partners in whichever aspects of the BCP process are relevant to them.

    A good way to determine which business partners the BCP team may need to focus on is to send out a

    questionnaire to all Business Partners/Vendors, such as the one below.

  • Page 14 of 116

    For more ideas on third party questionnaires see the sample questionnaire provided at DRJ.com

    http://www.drj.com/eab/q&a/bcpvendorquestions.doc

    It should be mandatory that critical business partners, such as Banks, Financial Institutions, etc. have their

    own real BCP plan - the organization should seriously consider the option of changing business partners if one does not exist. In the US, public companies under the scrutiny of Sarbanes Oxley Auditors often

    require SAS70s or other documentation (e.g. Systrust certification) to reduce the risk of critical business

    partners having inadequate Business Continuity and Systems Management.

    List all the organization's business partners. Take into account Banks and Financial Institutions, Corporate Customers, Suppliers/Vendors, ISPs, Wholesalers, any Business to Business (EBusiness) partners,

    Auditors, Consultants etc. Ensure that their contact details are kept fully up to date in your BCP database.

    Recognize that your organization may be required to provide reassurance to its business partners also.

    More and more organizations are becoming acutely aware of how dependent they are upon their business

    partners and are requiring periodic testing of business continuity or disaster recovery exercises. For

    example, in the Financial Sector, the Nasdaq Stock Market (and SIA) currently requires members to

    participate in such tests; see http://www.continuitycentral.com/news0894.htm.

    Third Party Questionnaire on BCP

    1. Has your organization recently (in the past year) been audited by an External

    Auditor for any of the following standards? SAS70? Visa CISP? Sarbanes Oxley?

    Yes/No

    If yes, please provide the Auditor's report or a copy of the Certificate.

    2. Does your organization have a fully documented Business Continuity Plan or

    Disaster Recovery Plan? Yes/No

    If you answered No, please go to question 5.

    3. Does your organization have a process to support the Business Continuity Plan /

    Disaster Recovery Plan, which ensures that changes to the business or systems are

    constantly assessed to determine whether or not the BCP needs to change?

    4. Does your Business Continuity Plan / Disaster Recovery Plan cover:

    all sites?

    all business functions?

    all IT systems?

    emergency evacuation procedures?

    alternate site arrangements?

    data backup and recovery policy (including offsite storage of key data)?

    a number of different types of incident / disaster scenarios?

    5. Please provide key contact information in the event of an emergency or incident.

    6. Please provide the name, position and contact info of the person responsible for

    BCP within your organization

  • Page 15 of 116

    3.6 Which Types of Disasters and Risks?

    At this point most disasters and disaster types should be considered to be in scope. It may be that your

    organization determines that some potential disasters are excluded because they are so unlikely (e.g.

    nuclear war, terrorist bomb), but if that decision is made, it is important to get sign-off at Board level for any

    threats that are excluded from scope.

    3.7 Which Legislation/Standards need to be considered?

    An initial meeting with the Board, IT Management and the Legal Department should identify most

    legislative requirements, and standards the organization wishes to comply with, the most common being

    HIPAA, Sarbanes Oxley, Visa CISP and Data Protection Act laws. However, anticipate others cropping up

    during your investigations, especially if your organization is a government agency, financial institution or

    in the Health industry.

    3.8 Interaction with Other Organizations

    Before determining the scope of your own incident response plans, it is important to know how local

    authorities and government agencies will respond to incidents and how your organization should fit into the

    overall picture.

    There are a number of resources on the web to help you to assess how to respond to various different types

    of incident, including bioterrorist incidents, disease outbreaks, natural disasters etc. Some useful resources

    are listed here:

    http://www.riskinstitute.org/ptrdocs/LocalGovernmentPreparationforBioterroristActs.pdf

    Local Government Preparation for Bioterrorist Acts

    http://www.bt.cdc.gov/Planning/

    Public Health Emergency Preparedness and Response, CDC

    http://www.fema.gov/library/bizindex.shtm

    Emergency Management Guide for Business and Industry, FEMA

    http://www.disastercenter.com/terror.htm

    Counter-Terrorism - Terrorism and Security Information

    http://www.cj.msu.edu/%7Eoutreach/CIP/CIP.pdf

    Critical Incident Protocol - a Public and Private Partnership, Michigan State University

    http://europa.eu.int/comm/environment/civil/pdfdocs/commission.pdf

    European Union Strategy on Prevention, Preparedness, and Response to Natural, Man-made and other risks

    http://www.dhs.gov/dhspublic/interapp/editorial/editorial_0566.xml

    National Response Plan (Dept Homeland Security)

    3.9 Gap Analysis

    Which parts of the BCP are already in place, are not in place, and which parts need defining more clearly?

    Review current policies and procedures and meet with key personnel to determine what needs to be done.

    Work closely with the Internal Audit department in ascertaining the current situation - Internal Audit

    departments should have a fairly clear idea of what is already in place, and what is missing.
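
    The outcome of such a review can be recorded against the three questions posed above: what is in place, what is not, and what needs defining more clearly. A minimal sketch, assuming each plan component is given exactly one of those three statuses; the component names and their statuses are illustrative only.

```python
from collections import defaultdict

# Hypothetical review results: plan component -> status found during the review.
# The three statuses mirror the three questions asked in the text.
review = {
    "Disaster Recovery procedures": "in place",
    "Business Continuity Plan": "not in place",
    "Emergency contact lists": "needs clearer definition",
    "Alternate site arrangements": "not in place",
}


def group_by_status(review):
    """Group plan components by status, preserving the order they were reviewed."""
    grouped = defaultdict(list)
    for component, status in review.items():
        grouped[status].append(component)
    return dict(grouped)


gap_report = group_by_status(review)
for status, components in gap_report.items():
    print(f"{status}: {', '.join(components)}")
```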

  • Page 16 of 116

    In many organizations, Disaster Recovery procedures are already defined but Business Continuity Plans are

    not.

    Knowing what already exists ahead of starting the BCP process can reduce workload and frustration!

    A good way of determining the current position is by using questionnaires to gather initial information.

    Some sample questionnaires are given below.

    3.10 Questionnaires

    Some useful fact-gathering questionnaires to use when trying to identify scope, and attempting to determine

    the current position within the company are provided below.

  • Page 17 of 116

    Questionnaire for Top Executives

    1. Does your organization have a Business Continuity Plan in place? Yes/No

    (if you answered No, go to question 5)

    2. When was the Business Continuity Plan last tested?

    3. Who is responsible for the Business Continuity Plan?

    4. What is the most recent date on which the BCP was updated, reviewed, approved

    and released?

    a) Most recent six months

    b) Most recent year

    c) More than one year ago

    5. If a 9/11 type event wiped out your whole data center today, what is your

    confidence level that the organization would survive?

    a) Low

    b) Medium

    c) High

    6. When was the most recent Business Impact analysis/Risk analysis exercise

    carried out?

    a) Most recent six months

    b) Most recent year

    c) More than one year ago

    7. Has your organization quantified the risks you face in financial terms? If yes,

    please give details of documentation or contacts for further questions in this area

    Y/N _______________________________________________________________

    8. Has your organization prioritized systems and business functions according to

    their criticality to business continuity? If yes, please give details of documentation

    or contacts for further questions in this area Y/N

    ___________________________________________________________________

  • Page 18 of 116

    Questionnaire for Top Executives Pt 2

    9. Has your organization determined maximum tolerable outage times for each of

    the INTERNAL systems used by your organization? If yes, please give details of

    documentation or contacts for further questions in this area Y/N

    ___________________________________________________________________

    10. Has your organization determined maximum tolerable outage times for each of

    the EXTERNAL systems used by your organization? If yes, please give details of

    documentation or contacts for further questions in this area Y/N

    ___________________________________________________________________

    11. Does your organization have SLAs with third parties documenting these

    maximum tolerable outage times? If yes, please give details of documentation or

    contacts for further questions in this area Y/N

    ___________________________________________________________________

    12. Does your organization have an alternate site to use for systems and / or business

    functions in the event of an incident (a hot site, cold site, or reciprocal agreement)?

    If yes, please give details of documentation or contacts for further questions in this

    area Y/N

    ___________________________________________________________________

    13. Are names and numbers in contact lists updated regularly and redistributed in

    paper form? Y/N

    14. Is there a published Security Policy which is given to every new member of staff

    whether permanent or contract?

  • Page 19 of 116

    Questionnaire For Data Center Management

    1. Which of the following physical security measures exist to protect the Data

    Center?

    No windows / no windows that allow viewing of the inside of the Data Center?

    No signs advertising the fact that the Data Center is a Data Center?

    No access to the building through any door apart from the main door without an access card and pin number?

    Security Guards on duty 24 hours per day?

    Closed circuit TV, monitored by Security 24 hours per day?

    Access control keypads requiring a pin to be entered and a security card to be swiped before gaining access to secure areas?

    Access control lists indicating staff who are allowed to access different parts of the building?

    All Emergency exits are always clear and open?

    Alarm systems both automated (fire detection, smoke detection, flood detection) and manual?

    Are sprinkler systems in place where appropriate?

    Are air conditioning systems checked regularly?

    ID badges must be worn by all employees at all times?

    Are clear evacuation routes posted on all notice boards and at key locations?

    Are all visitors required to sign in and out and to be accompanied at all times by authorized personnel?

    Located away from rail lines, airports, chemical plants, and other hazardous locations?

    2. Which of the following security procedures are in place to protect the Data

    Center?

    Are fire drills / evacuation drills carried out regularly?

    Is the fire detection and extinguishing equipment tested / inspected regularly (past 6 months)?

    Are all occupants trained in emergency procedures and security procedures?

    Is there a written termination procedure that includes a checklist of items to be returned to the company, such as keys, ID badges, card access, etc.?

    Is this procedure followed by termination of all system access for the terminated employee?

    Is there a policy to challenge visitors who are unknown, not accompanied, or not wearing a badge?

    Is there a "no tailgating" policy ensuring that employees do not hold the door open for anyone they do not know is authorized to enter the building?

    Is there a no smoking, eating or drinking policy in effect in the most sensitive areas of the Data Center?

    3. Which of these further security measures are in place for protection of the Data

    Center?

    Is one person responsible for Security of the Data Center?

    Is there a document / booklet of which all occupants are aware, providing information about how to respond to different types of incidents (bomb threats, fire, security violations, power failures, etc.)?

    Are backup power generation facilities and UPS facilities on hot standby at all times and available in case of power failure?

    Is backup air conditioning equipment available at all times?

    Is all computing equipment and all network components clearly marked for identification purposes?

    Do all changes to the Data Center equipment, software, configuration and layout require a full change management request and go through full review before being implemented?

  • Page 20 of 116

    A good source of further questionnaire questions for different groups involved in Business Continuity is

    provided at http://www.drj.com/articles/drpall.html within the sample Disaster Recovery Plan.

  • Page 21 of 116

    4 Risk Management [5]

    Risk Management is defined as the process of identifying risk, assessing risk and taking steps to reduce risk to an acceptable level [6]. This is an ongoing process, and aims to continually identify, assess and handle risks across the organization.

    All organizations have to take risks in order to survive. In fact, the original definition of the term "Risk" is simply "uncertainty of outcome", and should imply neither positive nor negative impact. However, in day-to-day use, the term has become synonymous with adverse outcomes and hence will be used within that

    context here.

    Risk Management is about reducing the degree of uncertainty and reducing the number of surprises involved in potential risks through more effective risk identification, mitigation and response as well as

    more effective management of change, more efficient use of resources, and improved reporting and

    communication within the organization.

    Risk Management requires that:

    All risks are identified, especially the key risks to critical business functions

    Risks are quantified (in terms of probability and impact), and prioritized

    Risk tolerance levels are clearly defined

    Risks are allocated to the appropriate group / person

    Appropriate mitigation or risk responses are identified

    These responses and mitigation measures are reviewed for effectiveness

    [5] Some BCP practitioners (e.g. The Business Continuity Institute) prefer to call this phase of BCP

    "Business Impact Analysis" and differentiate it from more traditional risk management and analysis. The author believes the name is less important than understanding the process behind it.

    [6] as above

  • Page 22 of 116

    [Figure: Risk Management steps. 1. Identify Risks; 2. Quantify Risks (probability, impact); 3. Define Risk Tolerance levels; 4. Allocate Risks to Appropriate Personnel; 5. Identify Risk Mitigation, Reduction and Response Measures; 6. Evaluate Effectiveness of Measures]

    1) Identify Risks

    The first stage in Risk Management is to identify which risks exist that could possibly affect the

    organization. The initial risk assessment process should include a thorough review of all possible risks, but

    once the Risk Management process is ongoing, this identification process can be the work of a "Risk Council" which meets regularly (for example, monthly) with the remit of looking at all recent changes to the organization, the systems used by the organization or the environment within which the organization

    operates (e.g. new legislation, changes to the supply chain, etc.) and identifying any new risks that should

    be considered. The Risk Council should gain its input from the Change Management processes within the

    organization, but also from the head of each business function, whose responsibility it should be to identify

    changes and report them to the Risk Council. All risks identified should be recorded in a central risk log

    and monitored by the Risk Council.

    2) Quantify Risks (Probability and Impact)

    Once risks have been identified, they need to be given a priority. Two factors determine the priority

    assigned to a risk: the probability that it will occur and the impact it will have on the organization if it materializes.
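
    One simple way to turn the two factors into a priority is to multiply them and sort by the result. This is a common convention rather than a rule laid down in this guide, and the sketch below rests on assumptions: probability is expressed as a yearly likelihood between 0 and 1, impact is rated on a 1 to 5 scale, and the risks and figures shown are invented.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # assumed scale: estimated yearly likelihood, 0.0 to 1.0
    impact: int         # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> float:
        # Probability times impact: one common convention, not the guide's rule.
        return self.probability * self.impact


# Hypothetical entries from a risk log.
risks = [
    Risk("Virus infection", probability=0.6, impact=2),
    Risk("Data center fire", probability=0.02, impact=5),
    Risk("Key supplier failure", probability=0.2, impact=4),
]

# Highest score first gives the priority order.
prioritized = sorted(risks, key=lambda r: r.score, reverse=True)
for r in prioritized:
    print(f"{r.name}: {r.score:.2f}")
```

    A single score hides the difference between a frequent minor risk and a rare catastrophic one, so many teams keep both factors visible in the risk log alongside the score.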

    3) Risk Tolerance Levels

    Once risks are identified and quantified, it can be determined what the organization's risk tolerance level is with regard to each risk. For example, the organization may determine that it can tolerate up to 25 PCs

    being infected with a virus. After that point, it may be necessary to invoke incident management

    procedures such as shutting off access to the internet from the internal network.
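
    A tolerance level of this kind is simply a threshold with an agreed response once it is crossed. The sketch below uses the 25-PC figure from the example above; the metric name, function names and response wording are assumptions made for illustration.

```python
# Hypothetical tolerance table; the 25-PC figure is the example from the text.
TOLERANCE = {"virus_infected_pcs": 25}


def exceeds_tolerance(metric, observed, tolerance=TOLERANCE):
    """Return True once the observed count goes beyond the tolerated level."""
    return observed > tolerance[metric]


def respond(infected_pcs):
    """Pick the response for the current number of infected PCs."""
    if exceeds_tolerance("virus_infected_pcs", infected_pcs):
        return "invoke incident management (e.g. shut off internet access)"
    return "within tolerance: continue monitoring"


for count in (10, 25, 26):
    print(count, "->", respond(count))
```

    Because the text says "up to 25", the sketch treats exactly 25 infected PCs as still within tolerance and escalates from 26 onwards.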

    4) Allocate Risks to Appropriate Personnel

    It is important to make a team or an individual ultimately responsible for each risk: monitoring for it,

    responding to it, and so on. For example, the PC system administrator might be the appropriate person to monitor

    for virus attacks and to respond appropriately to such attacks.

  • Page 23 of 116

    5) Risk Mitigation, Reduction and Response

    Risk mitigation measures should be identified to determine to what extent the risk itself can be reduced or

    eliminated, or to what extent the impact of the risk can be reduced or eliminated. For example, the risk of

    a Virus Infection can be mitigated using Anti-virus software and Firewall software. In some cases, this

    software can also provide automatic responses to infections, such as automatically quarantining infected files, or preventing mail attachments from being opened if they are infected.

    6) Evaluation of Effectiveness

    The Risk Council should