Protecting Email Communication by Preventing Downtime
Continuity Insights
Paul D’Arcy, VP Marketing, MessageOne
© 2007 MessageOne – Confidential. Wednesday, June 09, 2010
Agenda
• Introduction
• The Problem with Email
• Protecting Communications: Approaches to Email Availability
• Outage Scenarios
• Email Continuity – Key Conclusions
Protecting Email Communications by Preventing Downtime
The Problem With Email
Why Focus on Email?
Email has become the most important business application
Email is:
• The most commonly used business and communication application
• Primary channel for sales, customer support, and business communications
• Growing rapidly: per-user email volumes will double in the next 3 years
• Considered a business record by the courts
Key figures:
• 75%: percent of business information stored in email
• 3: hours per employee per day spent on email
• 10: number of minutes before employees find email downtime painful
• 80 MB / month: email sent and received by a typical user
Email is a Tier One Application
Given its growth and importance, email is typically classified as a Tier One application
Application Criticality (least critical to most critical):
Tier   | RTO        | RPO
Tier 4 | < 96 hours | < 24 hours
Tier 3 | < 48 hours | < 24 hours
Tier 2 | < 24 hours | < 4 hours
Tier 1 | < 2 hours  | No data loss
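The tier thresholds on this slide can be written as a small lookup. The following is a minimal sketch, not part of the original deck: the names `Tier`, `meets`, and `most_critical_tier_met` are hypothetical, and "No Data Loss" is modelled as an RPO of zero hours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rto_hours: float  # recovery time objective: downtime must stay below this
    rpo_hours: float  # recovery point objective: 0 means "no data loss"


# Thresholds as listed on the slide, most critical tier first.
TIERS = [
    Tier("Tier 1", 2, 0),
    Tier("Tier 2", 24, 4),
    Tier("Tier 3", 48, 24),
    Tier("Tier 4", 96, 24),
]


def meets(tier: Tier, downtime_hours: float, data_loss_hours: float) -> bool:
    """Return True if a measured outage stays within the tier's RTO and RPO."""
    rto_ok = downtime_hours < tier.rto_hours
    # A zero RPO tolerates no data loss at all; otherwise loss must be under the limit.
    rpo_ok = data_loss_hours == 0 if tier.rpo_hours == 0 else data_loss_hours < tier.rpo_hours
    return rto_ok and rpo_ok


def most_critical_tier_met(downtime_hours: float, data_loss_hours: float) -> Optional[Tier]:
    """Return the most critical tier whose objectives the outage still satisfies."""
    for tier in TIERS:
        if meets(tier, downtime_hours, data_loss_hours):
            return tier
    return None
```

For example, an outage of 32 hours with no data loss passes only the Tier 3 and Tier 4 objectives, so `most_critical_tier_met(32, 0)` returns the Tier 3 entry.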
The Problem with Email: Complexity
• Typical email system deployment has independent servers for:
- Email application
- Web mail
- DNS
- Directory
- Wireless device access
- Anti-spam / perimeter protection
- Clustering services
- Mail storage
• All components must function
• IT must fail over / restore many components correctly
Email is a complex ecosystem
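Because all components must function, the availability of the whole chain is lower than that of any single part. The sketch below illustrates this with the standard serial-availability product, assuming independent failures. The 99.9% per-component figure is an assumed value for illustration, not a number from this presentation.

```python
import math

# The eight component types listed on the slide.
COMPONENTS = [
    "Email application",
    "Web mail",
    "DNS",
    "Directory",
    "Wireless device access",
    "Anti-spam / perimeter protection",
    "Clustering services",
    "Mail storage",
]

# Assumed availability of each component (illustrative only).
ASSUMED_AVAILABILITY = 0.999


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up (independent failures)."""
    return math.prod(availabilities)


def expected_downtime_hours_per_year(availability, hours_per_year=8760):
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * hours_per_year


overall = serial_availability([ASSUMED_AVAILABILITY] * len(COMPONENTS))
print(f"Overall availability: {overall:.4%}")
print(f"Expected downtime: {expected_downtime_hours_per_year(overall):.1f} hours per year")
```

Under these assumptions, eight components at 99.9% each give roughly 99.2% overall, which is about 70 hours of expected downtime per year compared with under 9 hours for any one component alone.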
The Problem with Email: Single Point Of Failure
There are many SPOFs, any of which can take down email
Common Single Points of Failure
Exchange:
• Active Directory
• DNS
• Windows Authentication
• Exchange non-clustered server
• Operating System malware
• Software / Hardware
• Network
• Single-site data center
• Power
• Storage / SAN
• AC
Lotus Notes:
• Domino non-clustered server
• Operating System malware
• DNS
• Software / Hardware
• Network
• Single-site data center
• Power
• Storage / SAN
• AC
Exchange corruption on a single mail store can affect hundreds if not thousands of users.
The Problem with Email: Human Error
The hardest failure cause to prevent
• Everyone plans for natural disasters, but they are infrequent
• Technical failures are a given
• But the amount of human error is staggering and is very difficult to avoid
Human Error Based Outages Are Common
[Chart: Email Failures by Event Type. Segments: Technical Failure, Natural Disaster, Human Error. Values shown: 14%, 46%, 40%.]
Source: MessageOne Activation Data
The Problem with Email: Reliability
Email remains fragile and prone to downtime
Email Will Fail
• 75% of companies experience a major email outage each year
• 14% will have a major planned outage each year
• 55% of outages will last more than 6 hours
[Chart: Email Failures by Cause. Infrastructure 31%, Hardware 27%, Storage/Database 24%, Software 14%, Directory 4%.]
Source: MessageOne Activation Data
Cause #1: Local Infrastructure Outages
Infrastructure Failure Accounts for 31% of All Outages
• Includes network downtime
• Real-world infrastructure failures include:
- Datacenter power outage
- Termite infestation
- Cold weather caused pipes to burst and flood a 5th-floor data center
- Construction work destroyed fiber lines, causing an extended network outage
- Security guard hit the wrong button
Email infrastructure availability is unpredictable
A 5,000-person health care provider lost $3 million during an 8-hour outage when IT staff accidentally shut down datacenter power.
Cause #2: Hardware Failure
Hardware Failure Accounts for 27% of All Outages
• Includes servers, drives, cables, RAM, cache, etc.
• Real-world examples include:
- RAM defect corrupting database
- Catastrophic hard drive failure knocking out server OS
- Power/UPS/HVAC failure taking down server
- Logic board defect causing clusters to fail to communicate
- Routers dropping traffic and corrupting archives
Hardware will eventually fail
The CIO of a 2,000-person national law firm estimates that a recent email hardware outage cost the firm $100,000/hr in lost revenue and productivity.
Cause #3: Storage and Database Failure
Storage and Database Failures Account for 24% of All Outages
• Complex storage systems are difficult to operate and maintain
• Real-world examples include:
- Database corruption causing a 6-day outage
- SAN device failure taking out local data stores
- SAN configuration errors causing data loss windows
- Loss of retention policy compliance due to storage system failures
- Costly recovery operations from tape backups
Result in loss of productivity and inaccurate archives
Five financial services firms were fined a total of $8.25 million for failure to protect and preserve email communications.
Causes #4 & #5: Software & Directory Problems
Software and Directory Problems Account for 18% of All Outages
• Email software systems are complex ecosystems with many interwoven parts: server software, directories, authentication systems
• Real-world examples include:
- Configuration errors and software corruption
- Directory failures and directory corruption
- Migrations, faulty patches, upgrades
- Out-of-date drivers
- Security threats including viruses, worms and malware
Occur when working within complex IT environments
A national financial services firm lost $6 million from a virus-related email outage, plus further damage because its financial planners lost access to the email calendar they used to track customer appointments.
How Long Do Outages Last?
Over one-quarter of email outages last over two days
72% of email outages last 4 hours or more:
Downtime is Almost Unavoidable
Gaps in security, availability, and disaster recovery
Few companies are able to meet their stated RTO for email
• Tier 1 RTO: 2.0 hours
• Average “real-world” downtime per incident: 32.0 hours
Why?
• Existing DR & HA options are incomplete
- Database corruption
- Directory failures
- Windows viruses
• Email is built on large corruptible databases
• Frequent target of malware and viruses
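The gap between the 2-hour Tier 1 target and the 32-hour real-world average can be put in rough cost terms. The sketch below is illustrative arithmetic only: it reuses the $100,000/hr law firm estimate and the $3 million, 8-hour health care outage quoted earlier in the deck, and pairing those figures with the 32-hour average is an assumption made here for the example, not a claim from the source. The function names are hypothetical.

```python
TIER1_RTO_HOURS = 2.0            # stated Tier 1 recovery time objective
AVERAGE_DOWNTIME_HOURS = 32.0    # average real-world downtime per incident
EXAMPLE_COST_PER_HOUR = 100_000  # law firm CIO estimate quoted in the deck (USD/hour)


def rto_overrun_factor(actual_hours, target_hours):
    """How many times longer the actual outage is than the stated objective."""
    return actual_hours / target_hours


def outage_cost(hours_down, cost_per_hour):
    """Simple linear cost model: hours of downtime times cost per hour."""
    return hours_down * cost_per_hour


def implied_hourly_cost(total_loss, hours_down):
    """Hourly cost implied by a reported total loss over a known outage length."""
    return total_loss / hours_down


print(rto_overrun_factor(AVERAGE_DOWNTIME_HOURS, TIER1_RTO_HOURS))  # 16.0
print(outage_cost(TIER1_RTO_HOURS, EXAMPLE_COST_PER_HOUR))          # 200000.0
print(outage_cost(AVERAGE_DOWNTIME_HOURS, EXAMPLE_COST_PER_HOUR))   # 3200000.0
print(implied_hourly_cost(3_000_000, 8))                            # 375000.0
```

The linear model is a simplification; real outage costs rarely scale evenly with time, so the output is best read as an order-of-magnitude comparison.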
Protecting Communications: Typical Approaches to Email Recovery and Continuity
Email Recovery and Continuity
Both are necessary for complete email availability
Email Continuity
• Goal: Avoid interruption of end-user email services during an outage or failure
• Seamlessly transitions end users to a backup system when problems occur
• Backup system may be structurally different from the primary email system
• Continuity services, log shipping, and some replication solutions provide email continuity
Email Recovery
• Goal: Restore primary environment to normal state after an outage or failure
• Recreate application, data, and configurations with as little data loss as possible
• May take hours or days
• Tape, SANs, vaulting, replication, and log-shipping solutions provide email recovery
Replication: A Common Approach for High Availability
• Utilize two geographically separated datacenters
• Purchase 2x hardware
• Purchase 2x software
• Build a large network pipe between datacenters
• Deploy, configure and maintain dual environments identically
• Hire staff to ensure 24/7 failover when problems occur
• Still susceptible to same worms/viruses as primary environment
Replication is typically the 1st DR approach that CIOs evaluate
How Much Should it Cost to Protect Email?
Insurance should not cost more than your initial investment
[Illustration labels: New Truck; Special Cable; Second Truck… just in case; Spare Driver; Maintenance crew]
Speed to Recovery Determines Cost
Recovery speed & breadth of coverage are the primary determinants of cost
Gartner: Technologies to Reduce Recovery Time
Five Approaches to Email Availability
But none provide full protection from email downtime
“Organizations do not apply the same operational rigor to email that they apply to other mission critical business systems…and we believe they must…to ensure maximum reliability.” - Gartner 2007
Increasing Levels of Protection:
• SAN: Provides data availability via a point-in-time snapshot stored in redundant local data stores. Data only.
• Clustering: Multiple hardware nodes in the same data center. Immediate local failover.
• Replication: Asynchronous replication of data to a remote data center. 2-hour failover.
• Log-Shipping: Transaction-level replication of data to a remote data center. 2-hour failover.
• Continuity: A remote SaaS email continuity system. 60-second activation.
Storage Area Network (SAN)
An option for protecting data
Definition:
• Solution for storing redundant copies of data to prevent data loss
Strengths:
• If database becomes corrupt, can return to last valid snapshot
• Enables disaster recovery
Weaknesses:
• Expensive
• Doesn’t protect against local power, network or data center outages
• Doesn’t protect against hardware/software problems
• Complex to set up and maintain
SANs provide a good means to protect data from loss or corruption but do not provide true high availability
[Diagram: SAN Architecture (Data and Application layers; Local and Remote sites)]
Clustering
Protection against local server hardware failures
Definition:
• Hot or cold standby servers co-located within the primary datacenter that can take over for disabled servers when problems arise
Strengths:
• Fast failover
• No interruption to users if primary fails
• Planned maintenance can occur without downtime
Weaknesses:
• Expensive
• Doesn’t protect against power, network, or local infrastructure outages
• Database corruption and viruses can impact all nodes
• Setup and administration require highly skilled staff
Clusters provide protection against local server hardware failures but do not ensure high availability
[Diagram: Clustered Architecture (Data and Application layers; Local and Remote sites)]
Replication
Protection from site-level failures
Definition:
• An identical second mail system in a redundant data center keeps a binary replica of the data
Strengths:
• Fastest recovery-focused option
• Immediate access in case of failure
• Copy is insurance against threats to the primary data store
• Enables continued business operations
Weaknesses:
• Very complex and expensive
• Replicates corruption and viruses
• Failover and failback require highly skilled staff
Replication provides quick recovery but is not immune to database corruption, directory problems and viruses
[Diagram: Replicated Architecture (Data and Application layers; Local and Remote sites; binary data stream between sites)]
Log Shipping
Corruption-resistant protection from site-level failures
Definition:
• An identical second mail system in a redundant data center is populated from transaction logs
Strengths:
• Fastest recovery-focused option
• Immediate access in case of failure
• Copy is insurance against threats to the primary data store
• Enables continued business operations
Weaknesses:
• Log shipping creates enormous data stores
• Very complex and expensive
• Replicates configuration errors and viruses
• Requires highly skilled staff
Log shipping provides quick recovery and better immunity to database corruption, but is susceptible to directory problems and viruses
[Diagram: Log Shipping Architecture (Data and Application layers; Local and Remote sites)]
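The idea of populating a standby from transaction logs can be shown with a toy model. This is a simplified sketch and does not represent how Exchange or any specific product implements log shipping: `LogRecord`, `make_record`, `is_intact`, and `Standby` are hypothetical names, and the CRC check stands in for whatever integrity validation a real system performs before replaying a log.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    sequence: int   # position of the record in the transaction log
    key: str
    value: str
    checksum: int   # CRC32 computed by the primary when the record was written


def _payload(sequence, key, value):
    return f"{sequence}:{key}:{value}".encode()


def make_record(sequence, key, value):
    """Create a log record on the primary, stamping it with a checksum."""
    return LogRecord(sequence, key, value, zlib.crc32(_payload(sequence, key, value)))


def is_intact(record):
    """Check that the record still matches the checksum written by the primary."""
    return zlib.crc32(_payload(record.sequence, record.key, record.value)) == record.checksum


class Standby:
    """Standby store that is populated only by replaying shipped log records."""

    def __init__(self):
        self.store = {}
        self.last_applied = 0

    def replay(self, records):
        """Apply records in sequence order; stop at the first gap or damaged record."""
        for record in sorted(records, key=lambda r: r.sequence):
            if record.sequence != self.last_applied + 1 or not is_intact(record):
                break
            self.store[record.key] = record.value
            self.last_applied = record.sequence
        return self.last_applied
```

In this model a record damaged in transit fails the check and is never applied, which is the sense in which log replay resists corruption. A record that is intact but carries a bad change (for example a configuration error made on the primary) passes the check and is replayed, matching the weakness listed on the slide.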
Exchange 2007 Uses Log Shipping - 3 Flavors
Exchange 2007 high availability options do not provide complete protection
1. Local Continuous Replication (LCR)
Single-server solution utilizing log shipping to provide data redundancy.
Limitation: Does not protect against server failures.*
2. Cluster Continuous Replication (CCR)
Two-node cluster/log shipping that relies on Microsoft Clustering Services for continuity.
Limitation: No manual failover for planned activations, must use same AD site; locations need to be relatively close.*
3. Standby Continuous Replication (SCR)
New with SP1, SCR utilizes log shipping for replication of databases to other servers located anywhere on the Intranet.
Limitation: Failover should take less than 30 minutes but could be longer, and there will be some data loss.*
* Source: Gartner Research “Exchange Server 2007 HA/DR: Options, Benefits and Limitations”, 10/27/07
Continuity
Low-cost, maximum protection from all outage types
Definition:
• A hosted standby email system that can be activated in 60 seconds, with data stored in remote disaster-recovery-class datacenters
Strengths:
• Outages invisible to outside world
• Protects wireless communications devices
• Email functionality restored in less than one minute
• Only solution to work through all types of outages including database corruption, viruses, hardware/software failure & connectivity
• Inexpensive, rapid deployment
Weaknesses:
• Does not rebuild the server
• Recovery operations can be complex
Continuity services are low cost and the option most likely to ensure email availability during outages, but they do not fully replace recovery options
[Diagram: Continuity Architecture (Data and Application layers; Local and Remote sites)]
High-Availability Technology Threat Coverage
Solutions to keep email up and running
Level of Protection by Solution Type
Failure Type             | SAN | Clustering | Replication | Log Shipping | Continuity
Infrastructure (31%)     |     |            | √           | √            | √
Storage / Database (24%) | √   |            |             | √            | √
Hardware (27%)           |     | √          | √           | √            | √
Software (14%)           |     |            |             |              | √
Directory (4%)           |     |            |             |              | √
Threats Covered (%)      | 24% | 27%        | 59%         | 83%          | 100%
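The "Threats Covered" figures can be approximated by summing the share of outages for each failure type a solution covers. The sketch below assumes a particular reading of the check marks (which failure types each solution covers), inferred from the slide's totals and the strengths and weaknesses listed earlier; that mapping is an assumption, as are the names `FAILURE_SHARE`, `COVERS`, and `threats_covered`.

```python
# Share of outages by cause, in percent, as listed on the slide.
FAILURE_SHARE = {
    "Infrastructure": 31,
    "Storage / Database": 24,
    "Hardware": 27,
    "Software": 14,
    "Directory": 4,
}

# Assumed reading of the check marks: failure types each solution covers.
COVERS = {
    "SAN": {"Storage / Database"},
    "Clustering": {"Hardware"},
    "Replication": {"Infrastructure", "Hardware"},
    "Log Shipping": {"Infrastructure", "Storage / Database", "Hardware"},
    "Continuity": set(FAILURE_SHARE),
}


def threats_covered(solution):
    """Sum the outage share (percent) of every failure type the solution covers."""
    return sum(FAILURE_SHARE[cause] for cause in COVERS[solution])


for solution in COVERS:
    print(f"{solution}: {threats_covered(solution)}%")
```

With the rounded per-cause percentages this gives 24, 27, 58, 82 and 100. The slide shows 59% for Replication and 83% for Log Shipping; the one-point differences are presumably due to rounding in the published shares.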
Summary: High-Availability Technologies
Only continuity services truly eliminate email downtime and data loss
                         | SAN    | Clustering | Replication | Log Shipping | Continuity
Cost                     | $$     | $$$        | $$$$        | $$$$         | $
Recovery Time (minutes)  | N/A    | 0          | 60-180      | 60-180       | 1
Designed for Continuity  | No     | Yes        | Yes         | Yes          | Yes
Designed for Recovery    | Yes    | No         | Yes         | Yes          | No
Operational Difficulty   | Medium | Medium     | High        | High         | Low
Requires dedicated staff | Yes    | Yes        | Yes         | Yes          | No
Supports wireless devices| No     | No         | No          | No           | Yes
Protecting Communications: Outage Scenarios
Case Study: Katrina Causes Outage
Infrastructure goes down in wake of flooding and power loss
Company Background:
• Large law firm located in TX, LA, MS and AL
• >700 employees
• Data center in downtown New Orleans
Email Environment:
• Multiple Exchange Servers
• New Orleans Data Center
Outage Scenario:
• Data center loses Internet connectivity and power for weeks
• Intermittent Internet and power follow for months
• FEMA took generator fuel
Case Study: Hurricane Causes Outage
Lengthy local outage, intermittent network / power create availability challenges
Option: Clustering, Replication, Log Shipping, SAN, Continuity
Likely Scenario:
• Local infrastructure out, data lost
• Local infrastructure out, local cluster lost
• Local infrastructure out, failover in two hours
• Local infrastructure out, high risk of data corruption
• Email access available from web outside New Orleans
Case Study: AD Corruption Knocks Out Email
Corruption propagates across email environments
Company Background:
• Large Internet Company
• 26,000 employees
• Global Redundancy
Email Environment:
• Exchange environment
• Multiple European and US datacenters
• SAN, Cluster, Replication & Continuity
Outage Scenario:
• Centralized Active Directory server fails
• Corruption propagated to European AD
• Email down globally 14 hours while AD Server rebuilt
Case Study: AD Corruption
The Directory is an important single point of failure to protect against
Option       | Likely Scenario
SAN          | Data protected but unavailable
Clustering   | Data protected but unavailable
Replication  | Data protected but unavailable
Log Shipping | Data protected but unavailable
Continuity   | Email access available from web or Outlook
Case Study: Regional Power Outage
Global hotel chain has 400+ email servers
Company Background:
• Geographically dispersed hotel chain
• > 40,000 Global Users
• Staff trained and well prepared
Email Environment:
• Microsoft Exchange
• Servers in 400+ locations
Outage Scenario:
• Regional power outage in Pakistan cuts email access for 1,000 users
Case Study: Regional Power Outage
Local infrastructure outage affects 1,000 remote users
Option       | Likely Scenario
SAN          | Data protected but unavailable
Clustering   | Data protected but unavailable
Replication  | Failure contained to primary site, clean standby failover
Log Shipping | Failure contained to primary site, clean standby failover
Continuity   | Email access available from web outside
Email Continuity – Key Conclusions
Key HA/DR Capabilities
Ensure your solution meets your needs economically
Increasing Levels of Protection:
• SAN: Data only
• Clustering: Immediate local failover
• Replication: 2-hour failover
• Log-Shipping: 2-hour failover
• Continuity: 60-second activation
Email Continuity
• Purpose is to restore or continue email access when an outage occurs
• Includes web-based email continuity solutions which are synchronized with your primary email system
Email Recovery
• Purpose is to restore the Exchange, Notes, or GroupWise environment quickly when problems occur
• Includes storage-based solutions, clustering, replication, and log shipping solutions
Tips for Ensuring Email Continuity
Eliminate email downtime and data loss
• Maintain geographically isolated primary and backup systems
• Pay for protection, not for redundancy
• Utilize an on-demand service to protect against local infrastructure outages
• Use integrated services to minimize maintenance and manual operations
• Fail over for planned and unplanned events
• Ensure that archiving is part of your continuity solution
• Protect wireless device communications
• Carefully assess system complexity and staff training requirements
• Consult with industry peers on actual deployment times and costs
Thank You
For more information on email continuity:
www.messageone.com
Call and speak to one of our representatives at: 888-367-0777
Paul D’Arcy
Tel: +1 512 652 4500