How New Data Center Technologies Impact Recoverability
Presented by:
Damian Walch, CISA, CISSP, CBCP
Stressors that Test Your Vulnerability
Stressor categories: Environmental, Social, Political, Economic, Technological
• Terrorism, Cyber Attacks, Biological Threats, Employee Sabotage, Industrial Espionage
• Regulation, Deregulation, Incentives, Legal
• Global Marketplace, Partners/Suppliers, Demand Elasticity
• IT Infrastructure, Technology Adoption, Innovation and Trends, 24x7 Expectations, Denial of Service Attack, Virus
• Natural Disasters, Workplace Issues, National Programs
Business & IT Processes
Technology
Organization
Facilities & Security
Strategy
Applications & Data
The Problem is Viewed “Narrowly”
9/11 Lessons
• Business not linked to IT Strategy
• Roles poorly defined… no ownership
• Outdated, overly complicated processes
• Processes didn’t cross LOBs
• “Shared Services” forgotten
• Lack of standardization
• No true redundancy
• Supply Chain not covered
• B/U components not maintained
• Little geographic spread
Enterprise Business Continuity Framework
• Corporate Culture: Position the corporate mission and values within the continuity and recovery program to ensure that the EBCP can adapt to business change
• Technology Solutions: Identify and implement technology solutions to support business integration and availability to protect against interruptions and/or outages
• Governance: Provide clarity, definition, and guidance for the EBCP at the Enterprise level to ensure that the initiatives are carried out
• Enterprise Risk Management: Identify, mitigate, and control threats to the business in order to protect the enterprise in a consistent manner
• Business Integration: Integrate all lines of business into the EBCP to provide end-to-end availability and protection of business process across the organization
• Value Assurance: Quantify, track, and communicate the continuity and recovery value to the organization and ensure the EBCP investment is managed
• Program Execution: Manage the execution of the EBCP to ensure that the program is executing as designed and is providing a consistent approach throughout the enterprise
Evolution of Service Delivery
[Chart axes: Time (horizontal), Productivity/Value (vertical); chart label: Resiliency]
Individual Data Centers
Consolidated Delivery Centers
• Consolidation
• Economies of Scale
• Common Processes
• H/W & S/W Standards
Grid
• Virtual Consolidation
• Further Economies
• Dynamic Allocation
• Collaboration/Alliances
e-Utility
• Commoditization
• Resource on demand
• Standardize Measures/billing
• Expand ASP Model
Evolution of Business Resilience
Disaster Recovery ('60s - Early '80s, Centralized Computing)
1. Mainframe model: centralized control, standardization, batch reporting
2. Focus: data center, internal stresses, very localized disruptions
3. IT: reactive; Business: none
4. Recovery time in weeks
5. Mindset: insurance
Business Recovery (Mid - Late '80s, Distributed Computing)
1. Midrange & client-server model: departmental computing, creativity, independence
2. Focus: satellite hubs, internal stresses, very localized disruptions
3. IT: reactive/none; Business: reactive
4. Recovery time in days
5. Mindset: insurance
Business Continuity (The '90s - 2000, Network Centric Computing)
1. Hybrid model: connectivity, data sharing cross-BU, re-standardization
2. Focus: enterprise I/S, internal/external stress, localized disruptions
3. IT: reactive; Business: reactive
4. Recovery time in hours
5. Mindset: insurance
Business Resiliency (Year 2001 - today, On-Demand Computing)
1. Virtualized model: extended supply chain, mobility, direct customer access
2. Focus: extended global I/S, internal/external stress, broad disruptions
3. IT: proactive; Business: proactive
4. Always up
5. Mindset: survival
Service Level Agreement Management
[Diagram: monitoring architecture with four layers: Element Monitoring, Event Detection, Event Correlation, Service Level Management]
Tools shown in the diagram:
• Micromuse Netcool
• Micromuse SLAM (Service Level Agreement Manager)
• Quallaby: Network Performance
• HP Firehunter: Internet Services Fault/Performance
• OpenView NNM: Network Fault
• CiscoWorks2000
• OpenView VPO: Server Fault
• OpenView VPO SPI: Application Fault/Performance
• Storage Manager (SNMP), Robot Manager (SNMP), Fabric Manager (SNMP)
• Automated Call Dispatch: Apropros
• Trouble Ticketing Systems: Remedy ARS
Flow labels: event, root cause, reports, exceptions, topology view, mgmt apps, actions
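The event correlation layer in this monitoring stack is where many raw element events are reduced to a few likely root causes. The sketch below illustrates one simple way to do that, assuming a dependency topology is available: a failed element is suppressed if anything upstream of it has also failed. The topology, the element names, and the `root_causes` function are all hypothetical and are not taken from Netcool, OpenView, or any other product named on the slide.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical topology: each element maps to the upstream element it
# depends on (None for the top of the tree). Assumed to contain no cycles.
TOPOLOGY: Dict[str, Optional[str]] = {
    "core-router": None,
    "san-fabric": "core-router",
    "db-server": "san-fabric",
    "erp-app": "db-server",
    "mail-server": "core-router",
}


def root_causes(failed: Iterable[str],
                topology: Dict[str, Optional[str]]) -> List[str]:
    """Return the failed elements that have no failed upstream dependency."""
    failed_set: Set[str] = set(failed)
    roots: List[str] = []
    for element in sorted(failed_set):
        parent = topology.get(element)
        suppressed = False
        # Walk upstream; any failed ancestor explains this element's failure.
        while parent is not None:
            if parent in failed_set:
                suppressed = True
                break
            parent = topology.get(parent)
        if not suppressed:
            roots.append(element)
    return roots


if __name__ == "__main__":
    events = ["erp-app", "db-server", "san-fabric", "mail-server"]
    print(root_causes(events, TOPOLOGY))
```

In this example the application and database events are treated as symptoms of the storage fabric fault, so only the fabric and the unrelated mail server would be passed on to trouble ticketing.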
Emergency Messaging Services
EMRS performs multichannel device notification
• Notification messages, directions, and critical information sent to cell phones (SMS), pagers, RIM, alternate email addresses, etc.
Employees access e-mail from any web browser
• Home, temporary offices, Kinko’s
• Transparent failover to rest of world
• Use original e-mail addresses
• 128-bit SSL encrypted
• Users can be authenticated with SecurID or passwords
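Multichannel notification means the same message is attempted on every device a person has registered, so that one dead channel does not silence the alert. A minimal sketch of that fan-out follows. The `Contact` class, the `notify_all` function, the channel names, and the stub sender are assumptions made for illustration; a real service would plug in its own SMS, pager, and e-mail gateways.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A sender takes (address, message) and returns True on success.
Sender = Callable[[str, str], bool]


@dataclass
class Contact:
    name: str
    # Channel name -> address, e.g. {"sms": "+1-555-0100"} (hypothetical).
    channels: Dict[str, str] = field(default_factory=dict)


def notify_all(contacts: List[Contact], message: str,
               senders: Dict[str, Sender]) -> List[Tuple[str, str, bool]]:
    """Try every channel of every contact and report each attempt."""
    report: List[Tuple[str, str, bool]] = []
    for contact in contacts:
        for channel, address in contact.channels.items():
            sender = senders.get(channel)
            ok = False
            if sender is not None:
                try:
                    ok = sender(address, message)
                except Exception:
                    ok = False  # a failed channel must not block the others
            report.append((contact.name, channel, ok))
    return report


if __name__ == "__main__":
    def stub_send(address: str, message: str) -> bool:
        # Stand-in for a real gateway call.
        print(f"to {address}: {message}")
        return True

    senders = {"sms": stub_send, "pager": stub_send, "alt_email": stub_send}
    staff = [Contact("A. Example",
                     {"sms": "+1-555-0100", "alt_email": "a@example.org"})]
    for row in notify_all(staff, "Site closed; use the alternate office.",
                          senders):
        print(row)
```

Returning a per-channel report rather than a single success flag makes it possible to see who was not reached on any device and follow up manually.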
[Diagram: service mediation and transaction management]
Diagram labels:
• Services, Mediation, Transaction Management
• Equipment: Switches, Routers, Probes
• Business Process Applications: ERP, CRM, SFA, E-mail
• Corporate Finance ERP (e.g. SAP, PeopleSoft)
• Consolidation Engine
• 3rd Party Partner
• Manual
• Process Automation Tools
• Reporting, Invoicing
• IT Mediation
• SingleView Mediation
• Existing Solution, 3rd Party Solutions
onDemand or Utility Computing
IT Infrastructure
Grid Computing
[Diagram: grid architecture with Grid Management Nodes and Grid Compute Nodes]
Diagram labels:
• IBM 9-dot
• UMI Tools
• Tools Servers, Build Servers
• WebSphere Application Server
• ...
• Tools Transit (50)
• Information Service
• Certificate Authority
• Grid Demo Application
• AXIS SOAP Server
• WebSphere Administration
• LiveCluster Administration
• Globus Core: GRAM, GridFTP
• LiveCluster Driver / Server
• MetaProcessor Server / Scheduler
• MetaProcessor System Management
• MetaProcessor Database
• RLS / RFT
• Application Database
Grid Resource (shown three times in the diagram), each containing:
• Globus Core: GRAM, GridFTP
• LiveCluster: Engine Daemon, Engine Instance (1 per CPU)
• UD Agent
• MP Device
Characteristics of a Resilience
Column headings: BUSINESS CONTINUITY, IT RECOVERY, AVAILABILITY, SECURITY
• knowledge of which business processes are supported by which applications
• monitor the backups of all applications and platforms across the enterprise
• all applications are properly assigned a "recovery tier"
• firewalls, virus protection and intrusion detection are implemented and kept up-to-date
• clear incident response and crisis management procedures tested
• application design process that is integrated with the business continuity process
• disaster recovery process integrated with problem management and help desk processes
• patch management team
• knowledge of risks and regulations that are required of functions
• change management process that considers disaster recovery (each checkpoint)
• SLA management (SLAM) tool implemented
• 24x7 monitoring of IDS logs
• automated process for restoring OS footprint on recovery platforms
• storage mirroring established for the highest priority (tier 1) applications
• SPAM engine
• E-mail recovery or replication solution is in place
• physical security -- possibly biometrics in place
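Two of the characteristics above fit together: knowing which business processes depend on which applications, and assigning every application a recovery tier. The sketch below shows one way the first could drive the second, with each application inheriting the tier of its most urgent business process. The process names, recovery time objectives, and tier boundaries are invented for illustration and are not from the presentation.

```python
from typing import Dict, List, Tuple

# Hypothetical recovery time objectives (hours) per business process.
PROCESS_RTO_HOURS: Dict[str, float] = {
    "order entry": 2,
    "invoicing": 24,
    "payroll": 72,
}

# Hypothetical map of business processes to supporting applications.
PROCESS_APPS: Dict[str, List[str]] = {
    "order entry": ["ERP", "CRM"],
    "invoicing": ["billing"],
    "payroll": ["ERP", "HR system"],
}

# Assumed tier boundaries as (maximum RTO in hours, tier).
TIER_LIMITS: List[Tuple[float, int]] = [(4, 1), (24, 2), (72, 3)]


def tier_for_rto(rto_hours: float) -> int:
    """Map a recovery time objective to a tier; tier 1 is most urgent."""
    for limit, tier in TIER_LIMITS:
        if rto_hours <= limit:
            return tier
    return 4  # slower than every listed boundary


def assign_tiers(process_rto: Dict[str, float],
                 process_apps: Dict[str, List[str]]) -> Dict[str, int]:
    """Give each application the tier of its most urgent process."""
    tiers: Dict[str, int] = {}
    for process, apps in process_apps.items():
        tier = tier_for_rto(process_rto[process])
        for app in apps:
            tiers[app] = min(tier, tiers.get(app, tier))
    return tiers


if __name__ == "__main__":
    tiers = assign_tiers(PROCESS_RTO_HOURS, PROCESS_APPS)
    for app, tier in sorted(tiers.items()):
        note = " (candidate for storage mirroring)" if tier == 1 else ""
        print(f"{app}: tier {tier}{note}")
```

Here the ERP system lands in tier 1 because order entry depends on it, even though payroll alone would have placed it in tier 3.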
8 Pragmatic Approaches to Resilience
1. Make executives aware of program (and risks)
2. Understand the most critical business processes
3. Create “commitments” (i.e. policies for corporations)
4. Implement call trees and exercises
5. Explain objectives for the year and measure results
6. Ensure backup and offsite storage - audit
7. Back up workstations and laptops
8. Conduct desktop exercises for operations staff
Closing Comments
• “Resilience” should be our goal and will ultimately be achieved by most organizations, but it’s not here today
• Resilience is the integration of DR, BC, physical security, information security and operational availability… aligned with business processes
• Poor results in the BC industry are our fault for not simplifying messages, measuring results and providing a clear roadmap
• Great strides can be achieved by focusing on 8 to 10 reasonable principles for increasing recovery and “resilience”
• By integrating the disciplines and processes for DR, BC, physical security and information security, you can reduce overall effort, increase results and in many cases address regulatory requirements