Protecting Email Communication by Preventing Downtime... · The Problem with Email: Single Point Of...


Protecting Email Communication by Preventing Downtime

Continuity Insights

Paul D’Arcy, VP Marketing, MessageOne

© 2007 MessageOne – Confidential, Wednesday, June 09, 2010

Agenda

• Introduction
• The Problem with Email
• Protecting Communications: Approaches to Email Availability
• Outage Scenarios
• Email Continuity – Key Conclusions

Protecting Email Communications by Preventing Downtime

The Problem With Email


Why Focus on Email?
Email has become the most important business application

Email is:
• The most commonly used business and communication application
• The primary channel for sales, customer support, and business communications
• Growing rapidly: per-user email volumes will double in the next 3 years
• Considered a business record by the courts

Key figures:
• 75%: percent of business information stored in email
• 3: hours per employee per day spent on email
• 10: number of minutes before employees find email downtime painful
• 80 MB: email sent and received per month by a typical user

Email is a Tier One Application
Given its growth and importance, email is typically classified as a Tier One application

Application Criticality (least critical to most critical)

Tier      RTO          RPO
Tier 4    < 96 hours   < 24 hours
Tier 3    < 48 hours   < 24 hours
Tier 2    < 24 hours   < 4 hours
Tier 1    < 2 hours    No data loss   (Email)
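The tier boundaries on this slide can be expressed as a small lookup. The sketch below is illustrative only: the `Tier` class and `strictest_tier_met` function are hypothetical names, and it assumes the RTO/RPO pairs map to Tier 4 through Tier 1 in the order they appear on the slide. It returns the most critical tier whose targets a measured recovery time and data-loss window both satisfy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rto_hours: float  # recovery time must be strictly below this
    rpo_hours: float  # data-loss window must be below this; 0 means no data loss


# Ordered from most critical to least critical (assumed mapping from the slide).
TIERS = [
    Tier("Tier 1", rto_hours=2, rpo_hours=0),
    Tier("Tier 2", rto_hours=24, rpo_hours=4),
    Tier("Tier 3", rto_hours=48, rpo_hours=24),
    Tier("Tier 4", rto_hours=96, rpo_hours=24),
]


def strictest_tier_met(recovery_hours: float, data_loss_hours: float) -> Optional[str]:
    """Return the most critical tier whose RTO and RPO targets are both met."""
    for tier in TIERS:
        rto_ok = recovery_hours < tier.rto_hours
        # Tier 1 allows no data loss at all; other tiers allow a window below the RPO.
        if tier.rpo_hours == 0:
            rpo_ok = data_loss_hours == 0
        else:
            rpo_ok = data_loss_hours < tier.rpo_hours
        if rto_ok and rpo_ok:
            return tier.name
    return None  # misses even the Tier 4 targets
```

For example, `strictest_tier_met(1, 0)` returns `"Tier 1"`, while a one-hour recovery that loses an hour of data only qualifies as `"Tier 2"`.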

The Problem with Email: Complexity
Email is a complex ecosystem

• A typical email system deployment has independent servers for:
- Email application
- Web mail
- DNS
- Directory
- Wireless device access
- Anti-spam / perimeter protection
- Clustering services
- Mail storage
• All components must function
• IT must fail over / restore many components correctly
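Because all components must function, the availability of the whole email service is at most the product of the individual component availabilities, assuming their failures are independent. The sketch below works this through for the eight server roles listed on the slide. The 99.9% availability per component is a hypothetical value chosen for illustration, not a figure from the presentation.

```python
import math

# The eight independent server roles listed on the slide.
COMPONENTS = [
    "Email application",
    "Web mail",
    "DNS",
    "Directory",
    "Wireless device access",
    "Anti-spam / perimeter protection",
    "Clustering services",
    "Mail storage",
]

HOURS_PER_YEAR = 24 * 365


def series_availability(availabilities):
    """Availability of a chain in which every component must be up (independent failures)."""
    return math.prod(availabilities)


def expected_downtime_hours(availability, hours=HOURS_PER_YEAR):
    """Expected hours of downtime per period for a given availability."""
    return (1.0 - availability) * hours


# Hypothetical assumption: each component is available 99.9% of the time.
per_component = [0.999] * len(COMPONENTS)
overall = series_availability(per_component)
print(f"Overall availability: {overall:.4%}")
print(f"Expected downtime: {expected_downtime_hours(overall):.1f} hours/year")
```

Under that assumption, eight components in series come out at roughly 99.2% availability, or about 70 hours of downtime a year, compared with under 9 hours for any single component.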


The Problem with Email: Single Point of Failure
There are many single points of failure (SPOF), any of which can take down email

Common Single Points of Failure

Exchange:
• Active Directory
• DNS
• Windows Authentication
• Non-clustered Exchange server
• Operating system malware
• Software / Hardware
• Network
• Single-site data center
• Power
• Storage / SAN
• AC

Lotus Notes:
• Non-clustered Domino server
• Operating system malware
• DNS
• Software / Hardware
• Network
• Single-site data center
• Power
• Storage / SAN
• AC

Exchange corruption on a single mail store can affect hundreds if not thousands of users.

The Problem with Email: Human Error
The hardest failure cause to prevent

Human Error Based Outages Are Common
• Everyone plans for natural disasters, but they are infrequent
• Technical failures are a given
• But the amount of human error is staggering and is very difficult to avoid

[Chart: Email Failures by Event Type. Categories: Technical Failure, Natural Disaster, Human Error. Values shown: 14%, 46%, 40%]

Source: MessageOne Activation Data

The Problem with Email: Reliability
Email remains fragile and prone to downtime

Email Will Fail
• 75% of companies experience a major email outage each year
• 14% will have a major planned outage each year
• 55% of outages will last more than 6 hours

[Chart: Email Failures by Cause. Infrastructure 31%, Hardware 27%, Storage/Database 24%, Software 14%, Directory 4%]

Source: MessageOne Activation Data


Cause #1: Local Infrastructure Outages
Email infrastructure availability is unpredictable

Infrastructure Failure Accounts for 31% of All Outages
• Includes network downtime
• Real-world infrastructure failures include:
- Datacenter power outage
- Termite infestation
- Cold weather that caused pipes to burst and flood a 5th-floor data center
- Construction work that destroyed fiber lines, causing an extended network outage
- A security guard who hit the wrong button

A 5,000-person health care provider lost $3 million during an 8-hour outage when IT staff accidentally shut down datacenter power.

Cause #2: Hardware Failure
Hardware will eventually fail

Hardware Failure Accounts for 27% of All Outages
• Includes servers, drives, cables, RAM, cache, etc.
• Real-world examples include:
- RAM defect corrupting a database
- Catastrophic hard drive failure knocking out the server OS
- Power/UPS/HVAC failure taking down a server
- Logic board defect causing clusters to fail to communicate
- Routers dropping traffic and corrupting archives

The CIO of a 2,000-person national law firm estimates that a recent email hardware outage cost the firm $100,000/hr in lost revenue and productivity.

Cause #3: Storage and Database Failure
Results in loss of productivity and inaccurate archives

Storage and Database Failures Account for 24% of All Outages
• Complex storage systems are difficult to operate and maintain
• Real-world examples include:
- Database corruption causing a 6-day outage
- SAN device failure taking out local data stores
- SAN configuration errors causing data loss windows
- Loss of retention policy compliance due to storage system failures
- Costly recovery operations from tape backups

Five financial services firms were fined a total of $8.25 million for failure to protect and preserve email communications.


Causes #4 & #5: Software and Directory Problems
Occur when working within complex IT environments

Software and Directory Problems Account for 18% of All Outages
• Email software systems are complex ecosystems with many interwoven parts: server software, directories, and authentication systems
• Real-world examples include:
- Configuration errors and software corruption
- Directory failures and directory corruption
- Migrations, faulty patches, and upgrades
- Out-of-date drivers
- Security threats including viruses, worms and malware

A national financial services firm lost $6 million from a virus-related email outage, plus further damage because its financial planners lost access to the email calendar they used to track customer appointments.

How Long Do Outages Last?
Over one-quarter of email outages last over two days

72% of email outages last 4 hours or more.

Downtime is Almost Unavoidable
Gaps in security, availability, and disaster recovery

Few companies are able to meet their stated RTO for email
• Tier 1 RTO: 2.0 hours
• Average “real-world” downtime per incident: 32.0 hours

Why?
• Existing DR & HA options are incomplete
- Database corruption
- Directory failures
- Windows viruses
• Email is built on large corruptible databases
• Frequent target of malware and viruses
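The gap between the stated objective and observed downtime can be quantified directly. The short sketch below uses the two figures from this slide (a Tier 1 RTO of 2.0 hours and 32.0 hours of average real-world downtime per incident). The function name `rto_gap` is hypothetical.

```python
def rto_gap(rto_hours: float, actual_hours: float) -> dict:
    """Compare actual downtime per incident with the recovery time objective."""
    return {
        "met": actual_hours <= rto_hours,
        "overrun_hours": max(0.0, actual_hours - rto_hours),
        "multiple_of_rto": actual_hours / rto_hours,
    }


# Figures from the slide: 2.0-hour Tier 1 RTO vs. 32.0 hours average downtime.
gap = rto_gap(rto_hours=2.0, actual_hours=32.0)
print(gap)
```

With the slide's figures, average downtime is 16 times the Tier 1 objective, an overrun of 30 hours per incident.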


Protecting Communications: Typical Approaches to Email Recovery and Continuity

Email Recovery and Continuity
Both are necessary for complete email availability

Email Continuity
• Goal: avoid interruption of end-user email services during an outage or failure
• Seamlessly transitions end users to a backup system when problems occur
• Backup system may be structurally different from the primary email system
• Continuity services, log shipping, and some replication solutions provide email continuity

Email Recovery
• Goal: restore the primary environment to its normal state after an outage or failure
• Recreates application, data, and configurations with as little data loss as possible
• May take hours or days
• Tape, SANs, vaulting, replication, and log-shipping solutions provide email recovery

Replication: A Common Approach for High Availability
Replication is typically the first DR approach that CIOs evaluate

• Utilize two geographically separated datacenters
• Purchase 2x hardware
• Purchase 2x software
• Build a large network pipe between datacenters
• Deploy, configure and maintain dual environments identically
• Hire staff to ensure 24/7 failover when problems occur
• Still susceptible to the same worms/viruses as the primary environment


How Much Should it Cost to Protect Email?
Insurance should not cost more than your initial investment

[Illustration labels: New Truck; Special Cable; Second Truck… just in case; Spare Driver; Maintenance Crew]

Speed to Recovery Determines Cost
Recovery speed & breadth of coverage are the primary determinants of cost

[Figure: Gartner, Technologies to Reduce Recovery Time]


Five Approaches to Email Availability
But none provide full protection from email downtime

“Organizations do not apply the same operational rigor to email that they apply to other mission critical business systems…and we believe they must…to ensure maximum reliability.” - Gartner 2007

Increasing Levels of Protection
• SAN: provides data availability via a point-in-time snapshot stored in redundant local data stores. Data only.
• Clustering: multiple hardware nodes in the same data center. Immediate local failover.
• Replication: asynchronous replication of data to a remote data center. 2-hour failover.
• Log Shipping: transaction-level replication of data to a remote data center. 2-hour failover.
• Continuity: a remote SaaS email continuity system. 60-second activation.

Storage Area Network (SAN)
An option for protecting data

Definition:
• Solution for storing redundant copies of data to prevent data loss

Strengths:
• If the database becomes corrupt, can return to the last valid snapshot
• Enables disaster recovery

Weaknesses:
• Expensive
• Doesn’t protect against local power, network or data center outages
• Doesn’t protect against hardware/software problems
• Complex to set up and maintain

SANs provide a good means to protect data from loss or corruption but do not provide true high availability

[Diagram: SAN Architecture; Application and Data layers at Local and Remote sites]

Clustering
Protection against local server hardware failures

Definition:
• Hot or cold standby servers co-located within the primary datacenter that can take over for disabled servers when problems arise

Strengths:
• Fast failover
• No interruption to users if the primary fails
• Planned maintenance can occur without downtime

Weaknesses:
• Expensive
• Doesn’t protect against power outages, network or local infrastructure failures
• Database corruption and viruses can impact all nodes
• Setup and administration require highly skilled staff

Clusters provide protection against local server hardware failures but do not ensure high availability

[Diagram: Clustered Architecture; Application and Data layers at Local and Remote sites]


Replication
Protection from site-level failures

Definition:
• An identical second mail system in a redundant data center keeps a binary replica of the primary

Strengths:
• Fastest recovery-focused option
• Immediate access in case of failure
• Copy is insurance against threats to the primary data store
• Enables continued business operations

Weaknesses:
• Very complex and expensive
• Replicates corruption and viruses
• Failover and failback require highly skilled staff

Replication provides quick recovery but is not immune to database corruption, directory problems and viruses

[Diagram: Replicated Architecture; Application and Data layers at Local and Remote sites, with a binary data stream (…101011101…)]

Log Shipping
Corruption-resistant protection from site-level failures

Definition:
• An identical second mail system in a redundant data center is populated from transaction logs

Strengths:
• Fastest recovery-focused option
• Immediate access in case of failure
• Copy is insurance against threats to the primary data store
• Enables continued business operations

Weaknesses:
• Log shipping creates enormous data stores
• Very complex and expensive
• Replicates configuration errors and viruses
• Requires highly skilled staff

Log shipping provides quick recovery and better immunity to database corruption, but is susceptible to directory problems and viruses

[Diagram: Log Shipping Architecture; Application and Data layers at Local and Remote sites]
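To make the contrast with binary replication concrete, the toy sketch below models a mail store as a dictionary. It is a simplified illustration, not how Exchange or Domino implement either technique: `replicate_binary` copies the primary's current state as-is, so corruption on the primary reaches the copy, while `ship_logs` replays individual transaction records on a standby and skips any record whose checksum does not verify. All function names and the checksum scheme are assumptions made for this example.

```python
import hashlib
import json


def checksum(payload: dict) -> str:
    """Stable digest of a transaction payload."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def make_log_record(seq: int, mailbox: str, message: str) -> dict:
    """Build a transaction log record with a checksum taken at write time."""
    payload = {"seq": seq, "mailbox": mailbox, "message": message}
    return {"payload": payload, "checksum": checksum(payload)}


def replicate_binary(primary: dict) -> dict:
    """Block-style replication: copy whatever the primary holds, valid or not."""
    return {mailbox: list(messages) for mailbox, messages in primary.items()}


def ship_logs(standby: dict, log_records: list) -> int:
    """Replay verified log records on the standby; return how many were rejected."""
    rejected = 0
    for record in log_records:
        payload = record["payload"]
        if checksum(payload) != record["checksum"]:
            rejected += 1  # corrupted record: do not apply it to the standby
            continue
        standby.setdefault(payload["mailbox"], []).append(payload["message"])
    return rejected


# Three transactions; the third is damaged after it was written.
logs = [
    make_log_record(1, "alice", "Q3 forecast"),
    make_log_record(2, "bob", "Contract draft"),
    make_log_record(3, "alice", "Meeting notes"),
]
logs[2]["payload"]["message"] = "\x00\x00garbage"  # simulate corruption

# The primary store has already applied the damaged write.
primary = {"alice": ["Q3 forecast", "\x00\x00garbage"], "bob": ["Contract draft"]}

replica = replicate_binary(primary)
standby = {}
rejected = ship_logs(standby, logs)

print("replica:", replica)    # carries the corrupted message
print("standby:", standby)    # holds only verified transactions
print("rejected:", rejected)
```

This kind of check only addresses data corruption. Configuration errors and viruses that arrive as valid transactions would still be replayed on the standby, which matches the weaknesses listed on the slide.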

Exchange 2007 Uses Log Shipping - 3 Flavors
Exchange 2007 high availability options do not provide complete protection

1. Local Continuous Replication (LCR)
Single-server solution utilizing log shipping to provide data redundancy.
Limitation: Does not protect against server failures.*

2. Cluster Continuous Replication (CCR)
Two-node cluster/log shipping that relies on Microsoft Clustering Services for continuity.
Limitation: No manual failover for planned activations; must use the same AD site; locations need to be relatively close.*

3. Standby Continuous Replication (SCR)
New with SP1, SCR utilizes log shipping for replication of databases to other servers located anywhere on the intranet.
Limitation: Failover should take less than 30 minutes but could be longer, and there will be some data loss.*

* Source: Gartner Research, “Exchange Server 2007 HA/DR: Options, Benefits and Limitations”, 10/27/07


Continuity
Low-cost, maximum protection from all outage types

Definition:
• A hosted standby email system that can be activated in 60 seconds, with data stored in remote disaster-recovery-class datacenters

Strengths:
• Outages invisible to the outside world
• Protects wireless communications devices
• Email functionality restored in less than one minute
• Only solution to work through all types of outages, including database corruption, viruses, hardware/software failure and connectivity
• Inexpensive, rapid deployment

Weaknesses:
• Does not rebuild the server
• Recovery operations can be complex

Continuity services are low cost and the most likely option to ensure email availability during outages, but do not fully replace recovery options

[Diagram: Continuity Architecture; Application and Data layers at Local and Remote sites]

High-Availability Technology Threat Coverage
Solutions to keep email up and running

Level of Protection by Solution Type

Failure Type               SAN    Clustering   Replication   Log Shipping   Continuity
Infrastructure (31%)                           √             √              √
Hardware (27%)                    √            √             √              √
Storage / Database (24%)   √                                 √              √
Software (14%)                                                              √
Directory (4%)                                                              √
Threats Covered (%)        24%    27%          59%           83%            100%
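The "Threats Covered" figures can be approximated by summing the share of outages for each failure type a solution covers. The sketch below does this under an assumed mapping of check marks to solutions, inferred from the column totals; treat the `COVERS` dictionary as an assumption rather than a statement from the presentation. Because the per-cause shares are rounded, the sums for Replication (58) and Log Shipping (82) land one point below the slide's stated 59% and 83%.

```python
# Share of outages by cause, in percent (from the slide).
FAILURE_SHARE = {
    "Infrastructure": 31,
    "Hardware": 27,
    "Storage / Database": 24,
    "Software": 14,
    "Directory": 4,
}

# Assumed mapping of solutions to the failure types they cover.
COVERS = {
    "SAN": {"Storage / Database"},
    "Clustering": {"Hardware"},
    "Replication": {"Infrastructure", "Hardware"},
    "Log Shipping": {"Infrastructure", "Hardware", "Storage / Database"},
    "Continuity": set(FAILURE_SHARE),
}


def threats_covered(solution: str) -> int:
    """Percentage of outages whose cause the given solution protects against."""
    return sum(FAILURE_SHARE[cause] for cause in COVERS[solution])


for solution in COVERS:
    print(f"{solution:<13}{threats_covered(solution):>4}%")
```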

Summary: High-Availability Technologies
Only continuity services truly eliminate email downtime and data loss

                            SAN      Clustering   Replication   Log Shipping   Continuity
Cost                        $$       $$$          $$$$          $$$$           $
Recovery Time (minutes)     N/A      0            60-180        60-180         1
Designed for Continuity     No       Yes          Yes           Yes            Yes
Designed for Recovery       Yes      No           Yes           Yes            No
Operational Difficulty      Medium   Medium       High          High           Low
Requires Dedicated Staff    Yes      Yes          Yes           Yes            No
Supports Wireless Devices   No       No           No            No             Yes
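The summary table lends itself to a simple filter. The sketch below encodes a few of its rows as data and selects the options that are designed for continuity and whose worst-case recovery time fits within a target. The `Option` class and `meets_rto` function are hypothetical names, and the "60-180" ranges are represented by their upper bound.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Option:
    name: str
    cost: str                        # relative cost, as shown on the slide
    recovery_minutes: Optional[int]  # worst case; None where the slide says N/A
    for_continuity: bool
    for_recovery: bool
    dedicated_staff: bool


OPTIONS = [
    Option("SAN", "$$", None, False, True, True),
    Option("Clustering", "$$$", 0, True, False, True),
    Option("Replication", "$$$$", 180, True, True, True),
    Option("Log Shipping", "$$$$", 180, True, True, True),
    Option("Continuity", "$", 1, True, False, False),
]


def meets_rto(rto_minutes: int, options: List[Option] = OPTIONS) -> List[str]:
    """Names of continuity-capable options whose recovery time fits the RTO."""
    return [
        o.name
        for o in options
        if o.for_continuity
        and o.recovery_minutes is not None
        and o.recovery_minutes <= rto_minutes
    ]


print(meets_rto(120))  # a 2-hour RTO, expressed in minutes
```

This considers recovery time alone. Threat coverage still has to be weighed separately, since clustering's immediate failover applies only to local server hardware failures.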

Protecting Communications: Outage Scenarios

Case Study: Katrina Causes Outage
Infrastructure goes down in wake of flooding and power loss

Company Background:
• Large law firm located in TX, LA, MS and AL
• > 700 employees
• Data center in downtown New Orleans

Email Environment:
• Multiple Exchange servers
• New Orleans data center

Outage Scenario:
• Data center loses Internet connectivity and power for weeks
• Intermittent Internet and power follow for months
• FEMA took generator fuel

Case Study: Hurricane Causes Outage
Lengthy local outage, intermittent network / power create availability challenges

Options: Clustering, Replication, Log Shipping, SAN, Continuity

Likely scenarios:
• Local infrastructure out, data lost
• Local infrastructure out, local cluster lost
• Local infrastructure out, failover in two hours
• Local infrastructure out, high risk of data corruption
• Email access available from web outside New Orleans


Case Study: AD Corruption Knocks Out Email
Corruption propagates across email environments

Company Background:
• Large Internet company
• 26,000 employees
• Global redundancy

Email Environment:
• Exchange environment
• Multiple European and US datacenters
• SAN, Cluster, Replication & Continuity

Outage Scenario:
• Centralized Active Directory server fails
• Corruption propagated to European AD
• Email down globally for 14 hours while AD server rebuilt

Case Study: AD Corruption
The Directory is an important single point of failure to protect against

Option         Likely Scenario
Clustering     Data protected but unavailable
Replication    Data protected but unavailable
Log Shipping   Data protected but unavailable
SAN            Data protected but unavailable
Continuity     Email access available from web or Outlook

Case Study: Regional Power Outage
Global hotel chain has 400+ email servers

Company Background:
• Geographically dispersed hotel chain
• > 40,000 global users
• Staff trained and well prepared

Email Environment:
• Microsoft Exchange
• Servers in 400+ locations

Outage Scenario:
• Regional power outage in Pakistan cuts email access for 1,000 users


Case Study: Regional Power Outage
Local infrastructure outage affects 1,000 remote users

Option         Likely Scenario
SAN            Data protected but unavailable
Clustering     Data protected but unavailable
Replication    Failure contained to primary site, clean standby failover
Log Shipping   Failure contained to primary site, clean standby failover
Continuity     Email access available from web outside

Email Continuity – Key Conclusions

Key HA/DR Capabilities
Ensure your solution meets your needs economically

Increasing Levels of Protection
• SAN: data only
• Clustering: immediate local failover
• Replication: 2-hour failover
• Log Shipping: 2-hour failover
• Continuity: 60-second activation

Email Continuity
• Purpose is to restore or continue email access when an outage occurs
• Includes web-based email continuity solutions that are synchronized with your primary email system

Email Recovery
• Purpose is to restore the Exchange, Notes, or GroupWise environment quickly when problems occur
• Includes storage-based solutions, clustering, replication, and log shipping solutions


Tips for Ensuring Email Continuity
Eliminate email downtime and data loss

• Maintain geographically isolated primary and backup systems
• Pay for protection, not for redundancy
• Utilize an on-demand service to protect against local infrastructure outages
• Use integrated services to minimize maintenance and manual operations
• Fail over for planned and unplanned events
• Ensure that archiving is part of your continuity solution
• Protect wireless device communications
• Carefully assess system complexity and staff training requirements
• Consult with industry peers on actual deployment times and costs

Thank You

For more information on email continuity:
www.messageone.com
[email protected]

Call and speak to one of our representatives at:
888-367-0777

Paul D’Arcy
Tel: +1 512 652 4500
[email protected]