CWDS DR Plan Development - PA - eMarketplace Web viewIn addition to the archival storage of the...

Appendix LL Comprehensive Workforce Development System (CWDS) Disaster Recovery Plan Development Deliverable Jan 2013



Appendix LL

Comprehensive Workforce Development System

(CWDS)

Disaster Recovery Plan Development

Deliverable

Jan 2013


Comprehensive Workforce Development System Project 1.3

Document History

Version  Date  Author  Status  Notes

1.0  03/17/2006  DRAFT  Initial Creation
1.1  04/11/2006  DRAFT  Reviewed with DLI
1.2  04/12/2006  DRAFT  Changed Section Header Name
1.3  04/29/2006  FINAL  Incorporated Comments from IV&V Review
2.0  05/30/2009  FINAL  Minor changes made to reflect the selection of Scranton PA as the DLI DRA recovery site. The environment diagrams were updated to reflect current as built status for UAT and Production.
3.0  1/3/2013  CWOPA Update

CWDS Disaster Recovery Plan Development Deliverable


Table of Contents

1 Introduction
2 Purpose
3 Overview
4 CWDS Requirements
    4.1 UAT Needs to Mirror Production
    4.2 Network Interdependency Connectivity
    4.3 Storage Area Network (SAN)
5 Roles and Responsibilities
6 DR Plan (DRP)
    6.1 DR Project Planning
    6.2 DR Recovery Strategy Formulation
        6.2.1 CWDS Recovery Assumptions:
        6.2.2 CWDS Recovery Strategy Definition:
        6.2.3 IT Process Documentation
        6.2.4 DR Plan Documentation
7 Single Point of Failure (SPOF) Analysis of CWDS IT Components
        7.1.1 Facility Interdependencies
        7.1.2 Connectivity Interdependencies
        7.1.3 Network Interdependency
        7.1.4 System Redundancy
        7.1.5 CWDS Recovery Data Validation
8 Tape Backup and Restore
        8.1.1 Archival
        8.1.2 Backup and Recovery Approach
        8.1.3 Database Recovery
        8.1.4 Functional Components
        8.1.5 Testing
9 Single Point of Failure (SPOF) Analysis of CWDS Business Components
        9.1.1 Detailed Site Assessment
        9.1.2 Leveraging CTC relationships for recovery site
        9.1.3 Real Time Backups
        9.1.4 Data Validation
        9.1.5 Enterprise Business Continuity Planning
10 Summary
APPENDIX A – Interview / Meeting Attendees
APPENDIX B – Production Environment
APPENDIX C – UAT Environment
APPENDIX D – Roles and Responsibilities
APPENDIX E – Six Phase DR Methodology
APPENDIX F – PA Labor & Industry High-Level Network Diagram
APPENDIX G – DR Architecture for CWDS


1 Introduction

Business interruptions ranging from catastrophic natural disasters to acts of terrorism to technical outages mandate that organizations develop and cultivate business continuity and recovery resources, plans, and management. As part of the CWDS project, a Disaster Recovery Plan (DRP) is required to maximize system availability, in order to sustain both internal and external business operations with minimum interruption or delay. This document outlines the disaster recovery plan, activities associated with its development, and its components.

2 Purpose

The purpose of the CWDS Disaster Recovery Development Plan outline is to provide a structure for the activities associated with the development of the CWDS Disaster Recovery Plan (DRP). In this capacity, the document sets expectations for the plan and guides DLI in the development of all of its components.

3 Overview

The document outlines the components necessary for restoration of the CWDS system in terms of hardware, software, application systems, databases and workforce-related technology components. In addition to the requirements and assumptions for developing the plan, the contents of the document are outlined in terms of:

- Disruption scenarios
- Roles and responsibilities
- Disaster recovery plan details
- A single point of failure analysis
- Business continuity considerations

4 CWDS Requirements

The requirements section outlines the needs related to disaster recovery for the Workforce systems supporting CWDS collected through interviews with DLI personnel, CWDS team members, as well as those specifically outlined in the RFP. For the purposes of this project, CWDS requirements are only identified for CWDS IT components (hardware, software, connectivity) as they relate to the recovery of the CWDS infrastructure at DLI server operations. Requirements stated in the RFP include the following:

- The CWDS User Acceptance Testing environment will be leveraged to recover Production at an alternate site.

- The Commonwealth will provide a “warm” off-site location with redundant power sources, UPS capabilities, building security access and communications in place. This facility is now operational in Scranton, PA.

- DLI continues to evolve its work on a Disaster Recovery Plan which will allow operations to resume within a Recovery Time Objective (RTO) of 5 days of a declared disaster, with a Recovery Point Objective (RPO) of 72 hours.

- The CWDS system must be capable of archiving data that is 4 years old and must provide recovery and roll back procedures for the full production system.

- Requirements identified for CWDS focused only on CWDS systems hosted at the DLI server farm.
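As a minimal sketch of how the stated objectives could be checked against the timeline of a recovery exercise, the Python below compares elapsed downtime with the 5-day RTO and the age of the last usable backup with the 72-hour RPO. The function name, its parameters and the sample dates are hypothetical and are not part of the plan. The sketch also assumes both clocks start at the same moment (`clock_start`); whether that moment is the declaration or the disruption itself is a choice the plan owner would need to confirm.

```python
from datetime import datetime, timedelta

# Objectives taken from the requirements listed above.
RTO = timedelta(days=5)    # Recovery Time Objective
RPO = timedelta(hours=72)  # Recovery Point Objective


def meets_objectives(clock_start, service_restored, last_good_backup):
    """Return (rto_met, rpo_met) for one recovery timeline.

    clock_start is the moment both objectives are measured from
    (an assumption of this sketch). All arguments are datetimes.
    """
    downtime = service_restored - clock_start
    data_loss_window = clock_start - last_good_backup
    return downtime <= RTO, data_loss_window <= RPO


# Hypothetical timeline, for illustration only.
start = datetime(2013, 1, 7, 8, 0)
restored = datetime(2013, 1, 11, 17, 0)  # 4 days 9 hours after start
backup = datetime(2013, 1, 5, 2, 0)      # 54 hours before start

print(meets_objectives(start, restored, backup))  # (True, True)
```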


4.1 UAT Needs to Mirror Production

In order for the UAT environment to mirror production, some items have to be addressed from a hardware component perspective. As it has been proposed, hardware components in UAT mirror Production one for one, with 2 exceptions. The comparisons are provided below:

CWDS Production & UAT Hardware Component Comparison Chart

CWDS Production                        Qty  CWDS User Acceptance Testing (UAT)     Qty
Intranet Web/App                       6    Intranet Web/App                       6
Correspondence/Print Results           2    Correspondence/Print Results           2
Identity Minder User Mgmt              2    Identity Minder User Mgmt              2
SiteMinder Policy                      2    SiteMinder Policy                      2
Staging Server                         1    Staging Server                         1
webMethods Shared Infrastructure       6    webMethods Shared Infrastructure       6
CWOPA User Repository                  2    CWOPA User Repository                  2
Enterprise Reporting Cluster (BOE)     2    Enterprise Reporting Cluster (BOE)     2
DW Enterprise Reporting Cluster (BOE)  3    DW Enterprise Reporting Cluster (BOE)  3
Monitoring Server                      1    Monitoring Server                      1
Systems Management Server              1    To Be Determined                       ?
GIS Application Batch                  2    GIS Application Batch                  2
WebSphere Java Application             2    WebSphere Java Application             2
AD Authorization Repository            2    AD Authorization Repository            2
AD User Repository                     2    AD User Repository                     2
OLTP DB/ODS/DW                         2    OLTP DB/ODS/DW                         2
Reporting (ODS) / DW DB Server         2    Reporting (ODS) / DW DB Server         2
CSS Switch                             2    CSS Switch                             2
Checkpoint Firewall                    2    Checkpoint Firewall                    2
Sorry Server                           1    Sorry Server                           1

Figure 4.1 - A: Production & UAT Component Comparison Chart
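The one-for-one comparison in the chart can also be expressed as a simple inventory check. The following is only an illustrative sketch: the dictionaries hold a few rows copied from the chart, `None` stands for the quantity the chart lists as to be determined, and the function name `mirror_gaps` is hypothetical.

```python
# A few rows from the comparison chart above.
# None marks a quantity the chart lists as "To Be Determined".
production = {
    "Intranet Web/App": 6,
    "Staging Server": 1,
    "Systems Management Server": 1,
    "Sorry Server": 1,
}
uat = {
    "Intranet Web/App": 6,
    "Staging Server": 1,
    "Systems Management Server": None,
    "Sorry Server": 1,
}


def mirror_gaps(prod, test):
    """Return {component: (prod_qty, test_qty)} wherever test does not match prod."""
    gaps = {}
    for component, prod_qty in prod.items():
        test_qty = test.get(component)  # None if the component is missing entirely
        if test_qty != prod_qty:
            gaps[component] = (prod_qty, test_qty)
    return gaps


print(mirror_gaps(production, uat))
# {'Systems Management Server': (1, None)}
```

A check of this kind only compares counts; it says nothing about whether the UAT hardware has the same capacity or configuration as its Production counterpart.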

Another point of concern is shared components. Currently, components for the UAT infrastructure that need to mirror production share responsibilities either at the enterprise level or across CWDS environments. A list of these is provided below:

Enterprise Shared Components:
    o SMTP Mail Server
    o FTP Server
    o DART (WOTC) Reporting Server / LIHBG000DB61
    o Enterprise Sorry Server

CWDS Shared Components:
    o Active Directory Authorization Repository (Training, UAT)
    o Active Directory User Repository (Development, Training, CIT, UAT)
    o $U
    o Web Methods
    o TFS
    o GIS

DLI has to decide what approach it will take to ensure that these components are available for CWDS Production recovery at the Scranton site. The CWDS project suggests that replicas of all production Active Directory servers be pre-staged in the DRA site to ensure the shortest time to recovery, as nearly all other recovery activities will be dependent on their availability.

For diagrams of the Production and UAT environments, please reference APPENDIX B – PRODUCTION ENVIRONMENT and APPENDIX C – UAT ENVIRONMENT.

4.2 Network Interdependency Connectivity

The existing DLI network infrastructure must be accessible in order to implement our recovery strategy. Connectivity to the following networks is a must:

- The Metropolitan Area Network (MAN)
- The Storage Area Network (SAN) including the tape drive system
- DLI Network
- Commonwealth Technology Center
- DPW Data Center

4.3 Storage Area Network (SAN)

Version one of the DRA plan identified a requirement for independent SAN storage to support the UAT environment in the same manner as the SAN is used at the headquarters site in Harrisburg. DLI has executed on that requirement, and adequate SAN and backup facilities are available in the Scranton DRA facility. A separate DRA plan/process for the SAN is under development by DLI.

5 Roles and Responsibilities

In this phase of the DRP development project, we identify key personnel responsible for the existing recovery of the Workforce Development systems and personnel who are responsible for the recovery of the CWDS system at the designated DLI recovery facility. Individuals identified are responsible for initial assessment of the infrastructure (hardware, software, interdependencies) as well as the CWDS recovery team (employee, contractor, and vendor) in terms of safety, administration, recovery process execution, and communication.

Based on interviews and meetings conducted with DLI employees and assessments of existing Office of Information Technology (OIT) and departmental DR documentation, we created a Roles and Responsibility diagram to reflect the DLI Disaster Recovery (DR) organization. This process helped us identify key players, ensure communication flow, and provide a visual representation of the organization (see APPENDIX D – ROLES AND RESPONSIBILITIES).

In our documentation assessment, we found that OIT’s existing DR documentation reflected good communication procedures. We were able to identify a recovery team role that was not documented, the “Disaster Recovery Coordinator”.

The Disaster Recovery Coordinator (DRC) provides the Business Recovery Coordination Team (BRCT) a much-needed “hub” for information for disaster recovery efforts in terms of communication, assessment, and plan execution between other DR teams.


Gerry Bucko is the DRC for OIT and will be a strategic part of any future recovery efforts.

Furthermore, the table below lists critical staff that are part of the CWDS Recovery Team:

CWDS Recovery Team

Team Member Name                    Function
CWDS L&I Project Manager            Recovery Team Lead
CWDS L&I Project Team Member        Recovery Team Alternate
CWDS L&I Project Team Member        Team Member
CWDS L&I Project Team Member        Team Member
CWDS L&I Project Team Member        Team Member
CWDS L&I Project Team Member        Team Member
Contractor Staff/Non-CWOPA Staff    Application Support/Recovery

The Roles and Responsibility information, along with the CWDS Recovery Team table, was published as an attachment to the April 29, 2006 submission of the Disaster Recovery Plan Development Deliverable (v1.3).

6 DR Plan (DRP)

The DRP defines the resources, actions, tasks, and data required to manage the business recovery process. It is designed to assist in restoring the business process within the stated CWDS disaster recovery goals: a Recovery Time Objective (RTO) of 5 days and a Recovery Point Objective (RPO) of 72 hours.

Part of the DRP development requires the identification of the recovery strategy, disruption scenarios, recovery assumptions, and recovery roles and responsibilities. This process facilitates development of processes surrounding communication, responsibilities, and execution of the plan. The main sections of plan development include the following:

- DR Project Planning
- Disaster Recovery Strategy Formulation
- IT Process Documentation
- DR Plan Documentation

Activities associated with each component have been provided below:

6.1 DR Project Planning

During the initial development of this outline, the project work plan was developed and finalized. A review of best practices and deliverables from similar engagements was also completed. At that time, meetings were held with CWDS project sponsors and staff from the DLI Security Division to request documentation and identify candidates for interviews.

6.2 DR Recovery Strategy Formulation

This component has two elements associated with it: Assumptions and Strategy. Each item is explained below:


6.2.1 CWDS Recovery Assumptions:

As stated in the proposal (II-1.5_Technical Architecture Approach_final.doc / System and User Acceptance Test (UAT) Environment Configuration), CWDS is to be able to recover the Production environment by leveraging the UAT environment. Therefore, the UAT environment must “mirror” production in order for full recovery to be successful. Also, in order for UAT to be a “true” recovery solution, it needs to be located at an alternate site that must not fall within the scope of a disruption of the production facility, as outlined in the “Recovery Strategy Definitions” below.

Also, as stated in the RFP (Section IV-5.7.2), it is assumed “that in addition to the primary site location, a warm off-site location will be designated and provided by the Commonwealth and that redundant power sources, UPS capabilities, building security access and communications will be in place”. In an effort to define this in detail, the requirements for our DRP are based on the following assumptions:

o The DLI Recovery Facility is assumed to be intact, operable, and available to support the recovery of the CWDS system. (Note: This requirement has been met with the implementation of UAT in Scranton.)

o DLI’s OIT has implemented recovery capabilities that will permit redirection of critical data communications networks to the DLI Recovery Facility. This redirection capability addresses requirements for both its internal DLI network and any external connections, specifically:

    - The Metropolitan Area Network (MAN)
    - The Storage Area Network (SAN)
    - DLI Network
    - OA CTC
    - DPW Mainframe Farm

Note: The physical requirements for network connectivity have been met inasmuch as the Scranton DRA site has its own network capable of handling the CWDS workload. Implementation strategies are still being worked on as DLI develops its enterprise DRA strategy.

o The Recovery Time Objective of 5 days is established from the point of disruption, and gives consideration to the time required for damage assessment, staff mobilization to alternate facilities, and recovery of the CWDS system at the DLI Recovery Facility

o DLI’s OIT has implemented electronic data backup systems and procedures, including tape backup and restore, to facilitate recovery and resynchronization of data to pre-defined application data recovery point objectives

o All recovery team leaders, alternates, and members have ready access to copies of their respective DRPs, including emergency contact information, at time of disruption

o A majority of the Workforce Development Information Technology Services Office employees and CWDS employees with the expertise required to implement the DRP are available following a disruption of operations

o The DRP will be modified as appropriate depending upon the magnitude and nature of the disruption

o Materials and records required to immediately begin damage assessment and recovery activities have been stored in an offsite location that, while easily accessed, is not likely to be affected by the disruption

o The CWDS UAT environment, which mirrors Production, will be used to recover the Production environment at the recovery site. All other environments that share components or services (User Acceptance Testing, Development, Component Integration & Testing, and Training) will be considered low priority and will be disabled to rebuild Production. These environments will be recovered when normal business resumes, at a time established by the BRCT

o DLI needs to ensure all shared services that the CWDS application relies on can also be recovered in Scranton within the defined time limits for CWDS.

6.2.2 CWDS Recovery Strategy Definition:

Meetings were conducted with DLI employees and CWDS team members. DR documentation from OIT, CareerLink, and OVR was also assessed.

One of the documents provided by OIT was the “Office of Information Technology Enterprise IT Threat and Risk Assessment”. It provides excellent information on potential risks and threats surrounding the DLI facility as it exists today. However, it addresses DLI network recovery at an enterprise level. For our scope, we needed our strategies to focus solely on the CWDS infrastructure, with the assumption that major network dependencies (i.e. MAN, SAN, DLI Network, OA CTC, etc.) and connectivity to these networks would exist at the recovery site.

DLI has defined a recovery strategy for the CWDS disruption scenarios supported in the DRP. The recovery strategy addresses requirements for immediate response to a disruption, DRP activation, employee notification, vendor and third-party notification, and required steps to recover the CWDS system at an alternate recovery site. Based on our disruption scenarios, the key elements of DLI’s recovery strategy include the following:

- The CWDS system is partially inaccessible / inoperable at the DLI facility – All functionality of the CWDS system located at DLI is maintained at DLI, assuming that the disruption is minor, can be dealt with locally, or that the loss of operational capacity plus the time required to recover critical support systems does not exceed 24 hours. Additionally, the event must not exceed the established RTO and RPO.

- The DLI data center or facility is inaccessible / inoperable – Within the established RTO of 5 days, the BRCT will declare a disaster and key employees from the DLI Data Center are dispersed to the DLI recovery facility for recovery of the CWDS Production environment. Declaration occurs when the loss of operational capacity, plus the estimated time required to recover the critical support systems, exceeds 24 hours. This situation can occur even if there is no physical damage to the facility.

The distinction between these two scenarios is determined by whether the production environment at DLI remains intact, accessible, and operational from the perspective of the DLI network, MAN, and OA CTC. Therefore, DLI needs to determine whether this boundary router is intact during a damage assessment.
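The 24-hour declaration rule described in the two scenarios can be written down as a one-line test. This is a sketch under a stated assumption: it reads "loss of operational capacity" as the length of time capacity has already been lost, which the plan does not spell out. The constant and function names are hypothetical.

```python
from datetime import timedelta

# Threshold taken from the scenarios above.
DECLARATION_THRESHOLD = timedelta(hours=24)


def should_declare(capacity_lost_so_far, estimated_recovery_time):
    """Return True when outage time so far plus estimated recovery time exceeds 24 hours.

    Interpreting 'loss of operational capacity' as elapsed outage time
    is an assumption of this sketch, not a statement from the plan.
    """
    return capacity_lost_so_far + estimated_recovery_time > DECLARATION_THRESHOLD


# Handled locally: 6 hours down, 12 more hours expected.
print(should_declare(timedelta(hours=6), timedelta(hours=12)))   # False
# Declare: 10 hours down, 20 more hours expected.
print(should_declare(timedelta(hours=10), timedelta(hours=20)))  # True
```

As the text notes, the scenarios are a guideline only; management's decision also weighs factors that a threshold test like this does not capture.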

The scenarios serve as a “guideline” for assessing the criticality of the event. Management’s decision to declare will be based on many factors, such as overall damage, time frame of recovery efforts, availability, potential loss of business functions, etc. DLI needs to determine how much time can go by in a disruption before “declaration” becomes inevitable. Some activities surrounding the determination of this critical point or timeframe include analyzing:

- set up or restoration times for hardware and software
- response time by external dependencies and vendors for recovery purposes based on DR SLAs, which could include:
    o Vendors (IBM, Dell, HP, etc.)
    o Service agencies:
        - Communication (Verizon, AT&T)
        - Power (National Grid, KeySpan)
        - Water
    o Emergency authorities
        - FEMA
        - OSHA
- average response time to call tree execution
- staff mobilization times to recovery site
- displacement times of staff at reciprocal site (if using an existing facility such as Scranton)
- overall recovery site preparation
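One way to reason about the point at which declaration becomes inevitable is to subtract the estimated post-declaration work from the RTO. The sketch below does that arithmetic. Every duration in it is a made-up placeholder, the step names are loosely borrowed from the list above, and it assumes the steps run one after another and that the RTO clock starts at the point of disruption.

```python
from datetime import timedelta

RTO = timedelta(days=5)

# Hypothetical estimates; real values would come from vendor SLAs and drills.
estimates = {
    "call tree execution": timedelta(hours=4),
    "staff mobilization to recovery site": timedelta(hours=12),
    "recovery site preparation": timedelta(hours=24),
    "hardware and software restoration": timedelta(hours=48),
}


def latest_declaration_point(rto, step_estimates):
    """Time after disruption by which a disaster must be declared to still meet the RTO.

    Assumes the steps run sequentially; overlapping work would push the point later.
    """
    return rto - sum(step_estimates.values(), timedelta())


print(latest_declaration_point(RTO, estimates))  # 1 day, 8:00:00
```

With these placeholder figures, 88 hours of recovery work leave 32 hours after the disruption in which to decide.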

6.2.3 IT Process Documentation

This section mainly focuses on the gathering of process documentation surrounding the CWDS Production and UAT environments that either exists, or needs to be documented, for recovery purposes.

Meetings were conducted with DLI and CWDS team members. As it stands, there are no documented installation and restoration processes and procedures for the UAT and Production environments because not all of the requirements for the CWDS Production environment have been finalized yet. (Note: Documentation on setup and installation of the production system was included in the CWDS Hardware Systems Installation Manual and the CWDS Software Systems Installation Manual.) The only documented process for restoration that can be leveraged is documentation surrounding the CWDS Development infrastructure and any Tape Backup and Restore processes that might exist at DLI. Processes to complete the DR Plan will need to be documented in the future, once the production environment configurations have been developed. This should be re-addressed by DLI once the SAN recovery plan has been completed.

6.2.4 DR Plan Documentation

Meetings were conducted with some of the OIT and CareerLink DR Team members (AIMS and OVR were not present). Based on this meeting, existing DR Plan information for OIT, OVR and CareerLink was provided, reviewed, assessed and leveraged where applicable in the development of the new CWDS DR Plan. AIMS plans were not received for assessment.

The business continuity plans for OVR and CareerLink were documented in Recovery PAC (RPAC) software, a business continuity planning tool. The plans included good information, but no real process behind the execution of the plan. We concluded that the existing RPAC template would not provide the needed structure to develop a recovery plan for CWDS. A new CWDS DR Plan methodology was developed, presented, and authorized for use by representatives of OIT, based on “ability to execute”, using the six-phase approach to disaster recovery outlined below:

- Purpose and Scope
- Recovery Team responsibilities
- Disaster Recovery Objectives and Strategy
- Plan Assumptions
- Recovery Plan Organization
- Phase I – Emergency Response to a DLI Data Center Interruption
- Phase II – Mobilize to recovery site
- Phase III – Restore CWDS system at recovery site
- Phase IV – Application Restoration on the CWDS system
- Phase V – Validate CWDS Application Restoration
- Phase VI – Resumption of Business Operations

A visual representation of this methodology has been provided in APPENDIX E – SIX PHASE DR METHODOLOGY.
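For teams that track recovery progress in a script or tool, the six phases can be modeled as an ordered enumeration. The sketch below is illustrative only: the member names are shortened paraphrases of the phase titles, and it assumes the phases run strictly in order, which the plan implies but does not state outright.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six recovery phases, numbered as in the methodology above."""
    EMERGENCY_RESPONSE = 1    # Phase I
    MOBILIZE = 2              # Phase II
    RESTORE_SYSTEM = 3        # Phase III
    RESTORE_APPLICATIONS = 4  # Phase IV
    VALIDATE = 5              # Phase V
    RESUME_BUSINESS = 6       # Phase VI


def next_phase(completed):
    """Return the phase that follows `completed`, or None after Phase VI."""
    if completed is Phase.RESUME_BUSINESS:
        return None
    return Phase(completed + 1)


print(next_phase(Phase.RESTORE_SYSTEM).name)  # RESTORE_APPLICATIONS
```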

The plan is documented to reflect the recovery of all proposed components of the CWDS Production environment. Most of the documentation was acquired through meetings with CWDS and DLI employees. However, because not all of the requirements for the CWDS Production environment have been finalized yet, documentation does not exist. Therefore, the DRP has been set up using the CWDS Production component environment, leveraging existing CWDS Development Hardware and Software restoration processes. At a minimum, it provides CWDS a foundation on which to build.

7 Single Point of Failure (SPOF) Analysis of CWDS IT Components

The SPOF section identifies the key resources which, if unavailable, would prevent CWDS from conducting business operations. We performed an SPOF assessment of the IT component infrastructure on the two major environments (User Acceptance Testing and Production) for redundancy.

Our approach was to first assess the interdependencies surrounding the assumed network infrastructure outlined in Section 6 of this document. Next, we analyzed our proposed UAT and Production infrastructures.

7.1.1 Facility Interdependencies

A High-Level Site Assessment was conducted to surface recovery site possibilities, but no decision was made at that time. Since then, DLI has taken responsibility for the selection of the disaster recovery site. It is assumed that, prior to the selection, due diligence was done to identify and minimize any potential single points of failure and that, after the selection, those that were fiscally feasible to correct were corrected. A new SPOF analysis was not within the scope of this deliverable.


7.1.2 Connectivity Interdependencies

DLI maintains network diagrams which detail all pertinent information regarding the production and user acceptance test environments (see APPENDIX F – PA LABOR & INDUSTRY HI-LEVEL NETWORK DIAGRAM and APPENDIX G – DR ARCHITECTURE FOR CWDS). Upon review of this documentation, along with other DLI network infrastructure information provided, we concluded that the connectivity for CWDS recovery requirements for all of the networks will be sufficiently redundant. Since implementing a facility in Scranton, DLI has worked diligently to improve connectivity between there and the Harrisburg data center. Additionally, all the CWDS-specific hardware and critical subsystems have been moved to the Scranton facility.

7.1.3 Network Interdependency

A detailed assessment of the DLI infrastructure was not conducted. However, based on information gathered through interviews and documentation assessments, we concluded that the CWDS Production environment has sufficient redundancy.

7.1.4 System Redundancy

7.1.4.1 General Components

Discussions were held with the CWDS project team and proposed CWDS UAT and Production environment infrastructure diagrams were assessed for redundancy. Based on the analysis, there is partial redundancy across both environments. Components that have redundancy deficiencies are provided below for each environment:

Production Environment:

CWDS Production & UAT Hardware Component Comparison Chart

CWDS Production                        Qty
Intranet Web/App                       6
Identity Minder User Mgmt              2
SiteMinder Policy                      2
Staging Server                         1
CWOPA User Repository                  2
Enterprise Reporting Cluster           2
DW Enterprise Reporting Cluster        2
Monitoring Server                      1
Systems Management Server              1
GIS Batch Server                       2
SMTP Mail Server                       1
WebSphere Java Application             2
Batch Cluster                          2
AD Authorization Repository            2
AD User Repository                     2
OLTP DB/ODS/DW                         2
Reporting (ODS) / DW DB Server         2
State Management DB Server             2
CSS Switch                             2
CheckPoint Firewall                    2
Sorry Server                           1

Figure 7-A: CWDS Production Component Redundancy Chart

The items in bold signify components that do not have redundancy across the CWDS Production environment. While none of these servers are critical to the availability of CWDS as an application, they are important to ongoing operations, and DLI should consider additional hardware in order to provide redundancy for these components.
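A quantity-based screen is one simple way to surface candidate single points of failure from a chart like Figure 7-A. The sketch below uses a few rows from that chart and flags anything with fewer than two instances. Treating a quantity of 1 as "no redundancy" is an assumption of this example, not a rule stated in the plan, and a quantity of 2 does not by itself prove that failover works. The function name is hypothetical.

```python
# A few rows from the Production chart above (component -> quantity).
production_qty = {
    "Intranet Web/App": 6,
    "Staging Server": 1,
    "Monitoring Server": 1,
    "Systems Management Server": 1,
    "SMTP Mail Server": 1,
    "OLTP DB/ODS/DW": 2,
    "Sorry Server": 1,
}


def single_instance_components(quantities):
    """Return the sorted names of components deployed with fewer than two instances."""
    return sorted(name for name, qty in quantities.items() if qty < 2)


for name in single_instance_components(production_qty):
    print(name)
```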


User Acceptance Testing (UAT) Environment

CWDS Production & UAT Hardware Component Comparison Chart

CWDS User Acceptance Testing (UAT) | Qty
Intranet Web/App | 6
Identity Minder User Mgmt | 2
SiteMinder Policy | 2
Staging Server | 1
CWOPA User Repository | 2
Enterprise Reporting Cluster | 2
DW Enterprise Reporting Cluster | 2
Monitoring Server | 1
Systems Management Server | TBD
SMTP Mail Server | 1
GIS Batch Server | 2
WebSphere Java Application | 2
Batch Server | 2
AD Authorization Repository | 2
AD User Repository | 2
OLTP DB/ODS/DW | 2
State Management DB Server | 2
Reporting (ODS) / DW DB Server | 2
CSS Switch | 2
CheckPoint Firewall | 2
Sorry Server | 1

Figure 7-B: CWDS User Acceptance Testing Redundancy Chart

Please reference APPENDIX A – PRODUCTION ENVIRONMENT and APPENDIX B – UAT ENVIRONMENT for diagrams of the two environments.

7.1.5 CWDS Recovery Data Validation

Currently, there is no documented process for data validation. Procedures are needed that define how the CWDS team interacts with the business side to assure that recovered data is accurate, and what must be done to re-create data that is not accurate.

The Internal Customer Support Team (ICS) is currently responsible for coordinating the validation processes of DLI systems. CWDS needs to work with the ICS team to document validation procedures.

Roles and Responsibility information can be found in APPENDIX C – ROLES AND RESPONSIBILITIES.

8 Tape Backup and Restore

In this section, we provide the storage and restoration requirements needed for the recovery of CWDS, broken down by component as follows:

8.1.1 Archival

Our approach addresses the requirement to archive 4 years of data, including offline and online archival and retrieval. Based on the business requirement, the new CWDS application will host data for the current fiscal year and the three previous fiscal years. The approach will also allow users to archive records on their own through the web interface.
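The online-hosting rule above (current fiscal year plus the three previous fiscal years) can be expressed as a short check. This is a sketch only: the July 1 fiscal year start and the names `fiscal_year` and `is_hosted_online` are assumptions for illustration and do not describe how the CWDS application implements archival.

```python
from datetime import date

FISCAL_YEAR_START_MONTH = 7  # assumption: the fiscal year begins on July 1
ONLINE_FISCAL_YEARS = 4      # current fiscal year plus three previous ones


def fiscal_year(d):
    """Label a fiscal year by the calendar year in which it ends."""
    return d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year


def is_hosted_online(record_date, today):
    """True if the record falls in the current or three previous fiscal years."""
    return fiscal_year(today) - fiscal_year(record_date) < ONLINE_FISCAL_YEARS


if __name__ == "__main__":
    today = date(2013, 1, 3)
    print(is_hosted_online(date(2009, 7, 1), today))   # inside the window
    print(is_hosted_online(date(2009, 6, 30), today))  # candidate for archival
```

Records for which the check returns False would be candidates for offline archival under this reading of the requirement.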

8.1.2 Backup and Recovery Approach

Our approach provides offline data storage on tape as a backup and recovery solution that meets the CWDS system requirements. The tape backup strategy provides a failsafe for data protection. It addresses the Disaster Recovery requirements and allows for full recovery of information within 72 hours. With full or incremental/differential tapes cut each night and sent to secure storage, copies are preserved off-site (potentially at the secondary site) in the event of a primary site disaster.

We have assessed the existing DLI tape backup and restore policies and processes and agree that they comply with the CWDS RTO and RPO requirements. However, we will coordinate DR testing activities with DLI to assure compliance with the 72-hour RPO and 5-day RTO.
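A DR test result can be compared against the two objectives named above with simple date arithmetic. The sketch below is illustrative only: the function name `meets_objectives` and the sample timestamps are hypothetical, and the targets are taken from the figures stated in this section.

```python
from datetime import datetime, timedelta

# Targets as stated in this section; adjust if the requirements change.
RPO = timedelta(hours=72)  # maximum tolerable window of lost data
RTO = timedelta(days=5)    # maximum tolerable time to restore service


def meets_objectives(last_backup, disaster, service_restored):
    """Return (rpo_met, rto_met) for one DR test or real event."""
    data_loss_window = disaster - last_backup
    downtime = service_restored - disaster
    return data_loss_window <= RPO, downtime <= RTO


if __name__ == "__main__":
    # Hypothetical test timeline.
    rpo_met, rto_met = meets_objectives(
        last_backup=datetime(2013, 1, 1, 23, 0),
        disaster=datetime(2013, 1, 2, 14, 0),
        service_restored=datetime(2013, 1, 5, 9, 0),
    )
    print(f"RPO met: {rpo_met}, RTO met: {rto_met}")
```

Recording these three timestamps during each DR test would give DLI a repeatable way to document compliance.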

8.1.3 Database Recovery

The Microsoft SQL Server database selected for the CWDS system allows for easy data extraction. “Data dumps” will be taken from the Microsoft SQL Server database and written to files. This keeps the time that the SQL database is offline to a minimum. The CWDS data dump files are then sent to VRI for disaster recovery purposes. As part of the second release of CWDS, the project team identified, and CWDS procured, additional database backup and recovery software compatible with SQL 2005, SQL 2008 and Tivoli Storage Manager. This was done in response to the continued growth of the CWDS database and the need to compress the data in the backup files so that the copies require less SAN space and consume less network bandwidth when written to tape. Lightspeed from Quest Software was chosen because it meets both of these requirements and provides the added capability of component-level restoration, which SQL Server does not natively provide.

In addition to the data in each of the databases, it is critical to restore the data model should a disaster occur. The Microsoft SQL Server database designer repository will be backed up, with the tapes rotated to offsite storage as well. Backups will be created both on a periodic basis and after changes to the designer repository, to help ensure that the design of the SQL databases and the data dump files match. Additionally, periodic daily transaction dumps will enable the restoration of the database to a point more recent than the data dumps.
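Restoring to a point more recent than the last data dump means choosing the latest full dump taken before the target time and then applying the transaction dumps taken after it, in order. The sketch below shows only that selection logic. It is not a SQL Server or Lightspeed API; the function name `restore_chain` and the timestamps are hypothetical.

```python
from datetime import datetime


def restore_chain(full_dumps, transaction_dumps, target):
    """Select the backups needed to restore to `target`.

    Returns the latest full dump at or before `target`, plus the
    transaction dumps taken after that dump and up to `target`,
    sorted in the order they would be applied.
    """
    candidates = [t for t in full_dumps if t <= target]
    if not candidates:
        raise ValueError("no full data dump exists before the target time")
    base = max(candidates)
    logs = sorted(t for t in transaction_dumps if base < t <= target)
    return base, logs


if __name__ == "__main__":
    fulls = [datetime(2013, 1, 1), datetime(2013, 1, 2)]
    txs = [
        datetime(2013, 1, 1, 6),
        datetime(2013, 1, 2, 6),
        datetime(2013, 1, 2, 12),
        datetime(2013, 1, 2, 18),
    ]
    base, logs = restore_chain(fulls, txs, datetime(2013, 1, 2, 13))
    print("Restore full dump:", base)
    for t in logs:
        print("Apply transaction dump:", t)
```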

8.1.4 Functional Components

CWDS encompasses a wide variety of hardware and software to meet the demanding needs of its clients and customers. Table 8.1 identifies the functional components that are part of the CWDS system. For each component, the table also identifies the operating system and supporting software required to provide the function it was designed for. The last two columns identify the days on which backup is required and the expected retention cycle for both local and offsite storage. In the “Data Retention” column, “DEFAULT” refers to the existing SFO/OIT default retention policy, which states that the default retention requirement is 4 cycles. A cycle consists of 1 full image copy and daily incremental copies that cover a 1-week period:

Server Role | Software Utilized: Operating System | Software Utilized: CWDS Application Software Installed | Backup Requirements: Days of Week Backup Required | Backup Requirements: Data Retention
Intranet Web/App | Windows 2008 Enterprise Edition | IIS 7, ASP.NET (D) / VeriSign 128-bit Digital Certificates | Sunday - Saturday | DEFAULT
Identity Minder User Mgmt/ JAS | Windows 2008 Enterprise Edition | Identity Minder, WebSphere | Sunday - Saturday | DEFAULT
SiteMinder Policy | Windows 2008 Enterprise Edition | IIS 6, ASP.NET (D) / Netegrity / SiteMinder | Sunday - Saturday | DEFAULT
Staging Server | Windows 2008 Enterprise Edition | Application Center 2000 | Sunday - Saturday | DEFAULT
EAI App Cluster (webMethods) | Windows 2008 Enterprise Edition | webMethods | Sunday - Saturday | DEFAULT
CWOPA User Repository | Windows 2008 Enterprise Edition | Active Directory | Sunday - Saturday | DEFAULT
CWDS BOE Report Server | Windows 2003 Enterprise Edition | Business Objects | Sunday - Saturday | DEFAULT
DW BOE Report Server | Windows 2008 Enterprise Edition | Business Objects | Sunday - Saturday | DEFAULT
Enterprise Reporting Cluster | Windows 2008 Enterprise Edition | Crystal Reports Pro 11 | Sunday - Saturday | DEFAULT
GIS Application/ Database | Windows 2008 Enterprise Edition | Batch Software and CWDS batch applications | Sunday - Saturday | DEFAULT
Batch Server | Windows 2008 Enterprise Edition | Dollar Universe | Sunday - Saturday | DEFAULT
AD Authorization Repository | Windows 2008 Enterprise Edition | Active Directory | Sunday - Saturday | DEFAULT
AD User Repository | Windows 2008 Enterprise Edition | Active Directory / ADAM | Sunday - Saturday | DEFAULT
OLTP DB/ODS/DW | Windows 2008 Enterprise Edition | Quest LightSpeed | Sunday - Saturday | 3 years or business requirements of the data
Reporting (ODS) / DW DB Server | Windows 2008 Enterprise Edition | Quest LightSpeed | Sunday - Saturday | 3 years or business requirements of the data

Table 8.1 CWDS Functional Components
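For servers marked DEFAULT, the 4-cycle policy described before Table 8.1 translates into a rolling window of weekly cycles. The sketch below works through that arithmetic. It assumes that the current cycle counts as one of the four retained cycles and that the caller supplies the start date of the current cycle; both are assumptions for illustration, and the function names are hypothetical.

```python
from datetime import date, timedelta

CYCLES_RETAINED = 4  # DEFAULT retention: 4 cycles
DAYS_PER_CYCLE = 7   # 1 full image copy plus daily incrementals for 1 week


def oldest_retained_cycle_start(current_cycle_start):
    """Start date of the oldest cycle still kept.

    Assumption: the current cycle is one of the retained cycles.
    """
    span = timedelta(days=DAYS_PER_CYCLE * (CYCLES_RETAINED - 1))
    return current_cycle_start - span


def is_retained(backup_day, current_cycle_start):
    """True if a backup taken on `backup_day` is still within retention."""
    return backup_day >= oldest_retained_cycle_start(current_cycle_start)


if __name__ == "__main__":
    current = date(2013, 1, 27)  # hypothetical start of the current cycle
    print("Oldest retained cycle starts:", oldest_retained_cycle_start(current))
    print("Backup from 2013-01-05 retained:", is_retained(date(2013, 1, 5), current))
```

Under these assumptions, the retained window covers at most 28 days of daily restore points.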


CWDS has completed the DLI Storage Subsystem service level agreement, which identifies in detail each server that is to be backed up, how often it should be backed up, and what data on the server will be included in the backup.

Additionally, for each server providing the functions in Table 8.1, CWDS will complete the Server Backup Request Form. The Server Backup Request Form, created by SFO/OIT, is used to identify the specifics for each data storage device that the servers utilize, by drive letter and data type. This provides significant flexibility in data backup frequency and in data retention, depending on how the online storage device is utilized.

Part 2 of the Server Backup Request form is used to identify how the data on each storage area is to be archived and for what period of time. DLI employs IBM’s Tivoli Storage Manager to identify and manage all data which will be managed in the active archive.

CWDS will follow the standard implementation for drive allocation currently in use at DLI. Table 8.2 Sample Partition Allocation identifies the drive mappings and expected utilization for the CWDS servers:

Table 8.2 Sample Partition Allocation

Drive | Directories/Files | File Type | File Size GB | % Yr. Growth | Backup Type | Retention
C: | C:\ (all directories and subdirectories) | Operating System Files | 10 GB | <1% | Incremental | DEFAULT
D: | None | | <10 GB | <1% | Incremental |
E: | E:\ (all directories and subdirectories) | Application Software | 10 GB | <1% | Incremental | DEFAULT
F: | F:\ (all directories and subdirectories) | Log Files | 30 GB | TBD | Incremental | TBD
G: | G:\ (all directories and subdirectories) | CWDS Application files | Various | TBD | Incremental | TBD

As part of the standard implementation, drives on the server will be utilized as defined in the table above. This data will be collected in greater detail for each of the servers and submitted on the form. It is important to note the Tivoli Storage Manager treats an incremental backup differently than what is traditionally understood by the term. The current SFO/OIT default retention schedule which allows for 4 inactive and 1 active copy currently meets all retention requirements for the CareerLink system. The same retention schedule will support the CWDS application as well.

Due to the nature of the type of data on a SQL Server database, these files need to be backed up differently. A separate form for database servers is currently utilized at DLI, which will also be used for CWDS. In addition to the archival storage of the database files created by the SQL Server DBA, CWDS will employ a Tivoli agent to interactively back up critical SQL Server database files. It is also expected that CWDS will incorporate the current retention practices utilized by the CareerLink system, unless the business requirements of the new system make it necessary to alter them in any way.
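The default retention schedule described above (4 inactive copies plus 1 active copy) can be modeled as keeping the newest five versions of each file. The sketch below is a simplified model for illustration only; it is not Tivoli Storage Manager configuration or an API, and the function name `prune_versions` is hypothetical.

```python
MAX_INACTIVE = 4  # default schedule: 4 inactive copies plus 1 active copy


def prune_versions(versions, max_inactive=MAX_INACTIVE):
    """Split backup versions into (kept, expired).

    `versions` is ordered oldest to newest; the newest entry is treated
    as the active copy and the ones before it as inactive copies.
    """
    keep_count = 1 + max_inactive
    kept = versions[-keep_count:]
    expired = versions[:-keep_count]
    return kept, expired


if __name__ == "__main__":
    history = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
    kept, expired = prune_versions(history)
    print("Kept:", kept)
    print("Expired:", expired)
```

A file that changes daily would therefore have about five days of history under this model, which is one reason database files are handled with a separate form and retention period.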

8.1.5 Testing

Since most of the requirements for archival and retrieval are being developed, a formal test of the backup and restore process cannot be completed at this time. However, a partial test will be done using servers in the Development environment in the near future.

Now that the CWDS Production environment is running, daily restores of the production data are executed in TFP using the Lightspeed software. The tape backup/restore system using Tivoli Storage Manager (TSM) has been used to restore many of the services, components, and systems that support CWDS. As requirements and components change, DLI will need to develop the necessary documentation for continued testing of the backup and recovery systems.

9 Single Point of Failure (SPOF) Analysis of CWDS Business Components

Business Continuity Planning as a whole focuses on assuring continuous business processes and is a major factor in an organization's survival during and after a disruption. This planning should focus not only on the IT infrastructure, but also on the critical business functions that depend on the CWDS system. This section elaborates on future considerations to be assessed, not just for disaster recovery, but also for the DLI business continuity program as a whole. The following should be points of interest for the future:

9.1.1 Detailed Site Assessment

(With the selection of the Scranton facility this section was no longer needed and so it was removed.)

9.1.2 Leveraging CTC relationships for recovery site

As it stands today, the Commonwealth of PA has an existing relationship with SunGard and IBM BCRS recovery services. A further assessment should be done of how DLI can leverage this relationship for a CWDS recovery solution. SunGard's technology recovery services provide an infrastructure focused on the recovery of servers, with an extensive range of server and mainframe platforms. The cost of recovering CWDS at SunGard could potentially be manageable for DLI.

9.1.3 Real Time Backups

With the SAN equipment in place, DLI may want to investigate whether it could be leveraged to perform real-time backups in addition to storing tapes off-site. This would facilitate and dramatically improve the recovery process. Some products capture many versions of a file, not just the versions that existed during a backup cycle. Essentially, if a file is lost between backup cycles, the software is able to restore it. Therefore, if a file becomes corrupt, is accidentally overwritten, or is infected with a virus, users can go back to a moment before the interruption to recover data. However, this can be a costly solution, so DLI would need to do a cost-benefit analysis.


Additionally, with the selection and implementation of the Scranton site, DLI could evaluate the possibility of pushing daily copies of backup data across the MAN or a dedicated circuit to ensure that the data is available on site in the event it is needed.

9.1.4 Data Validation

Although our proposed CWDS DR plan addresses data validation, DLI needs to develop recovery processes with the program areas that are ultimately responsible for the data. The CWDS DR Plan has established protocols to recover the CWDS IT Production environment. Once the environment is recovered, it is the function of the business unit to validate all data in the CWDS system to assure that it has been recovered correctly. Section 10 (Phase V) of the CWDS DR Plan references the coordination of data validation activities on the restored CWDS application with program areas. The plan reflects high-level activities associated with this phase. However, on the business side, current business continuity plans do not reflect data validation activities. This needs to be addressed before both business and IT restoration plans are complete for CWDS.

9.1.5 Enterprise Business Continuity Planning

The existing scope of the project is to create IT DR plans for CWDS. However, a high-level review of the existing business continuity plans was done, and although they possess excellent information, they lack “executability”. The six-phase methodology used for the development of the IT DR plan could be considered for the enterprise plans as well, leveraging existing CoG documentation.

The business continuity plan is a “living” document. Processes and protocols might not vary much, but the people responsible for its execution might change. Therefore, the plan will need to be reviewed at least semi-annually, and preferably quarterly.

Ultimately, DLI would benefit from an in-depth business continuity program assessment to assure that the business, and not just the IT infrastructure, would be recovered in the event of a disaster. This would involve an enterprise approach to Business Continuity Management, which brings together the IT and business recovery requirements into one defined recovery strategy.

10 Summary

Since the initial release of this document, DLI has undertaken a number of initiatives to fulfill many of the requirements identified in it. Site improvements to the Scranton facility include, but are not limited to, additional power and telecommunications capabilities as well as independent SAN storage and backup capabilities. The CWDS computing environment was relocated to and implemented in the facility, and served as the testing platform for functional testing as well as load testing.


Appendices


APPENDIX A – Production Environment


APPENDIX B – UAT Environment


APPENDIX C – Roles and Responsibilities


APPENDIX D – Six Phase DR Methodology


APPENDIX E – PA Labor & Industry High-Level Network Diagram

(Note: This drawing is outdated and may no longer be useful. We recommend that it be removed or updated in the next revision.)


APPENDIX F – DR Architecture for CWDS
