© 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential Presentation_ID 1
Disaster Recovery
KwaiSeng Consulting Systems Engineer
Asia
Agenda
Active-Active DC Design Overview
Site Selection Basic Mechanisms
Design Considerations for Site Selection
The Middle Tier for DR Design
The Backend Tier for DR Design
Conclusions
Active-Active DC Design Overview
Business Continuance – Disaster Recovery
Web Users
X Primary Data Center
Secondary Data Center
Internet SP-A SP-B
Primary with a Secondary Backup Site: Recovering Service Availability after Failure
• Active-passive design with two data centers
• Highly available data center infrastructure
• Network fail-over can happen within tens of seconds
• Application/server recovery time depends on the time it takes to complete data synchronization of the back-end database, application servers, and Web servers
Data Synchronization
after Failure
Primary Data Center
Secondary Data Center
App A App B App A App C
FC FC
Active-Active Data Center Design
Data Replication
Required by disaster recovery and business continuance
Avoid a single, concentrated data depository
High availability of applications and data access
Load balancing together with performance scalability
Better response and optimal content routing: proximity to clients
Disaster Recovery Failure Scenarios and Mitigations
Types of Failure    Mitigation
Network failure     Routing Enhancement (NSF, BFD, EtherChannel)
Device failure      Device Level HA (SSO/NSF, GOLD)
Storage failure     Device Redundancy, Data Mirroring, Site to Site
Site failure        Site Selection
[Diagram labels: Internet; Service Provider A; Service Provider B]
Primary Data Center
Secondary Data Center
App A App B App A App C
FC FC
Front-End IP Access Layer
“Content Routing” Site Selection
Primary Data Center
Secondary Data Center
App A App B App A App C
FC FC
Application and Database Layer
“Content Switching” Load Balancing
“Server Clustering” High Availability
Primary Data Center
Secondary Data Center
App A App B App A App C
FC FC
Backend SAN Extension
“Storage” and “Optical” Data Replication and Transporting
Site Selection Basic Mechanisms
HTTP Redirection—Traffic Flow
http://www2.cisco.com/
http://www1.cisco.com/
http://www.cisco.com/
3. GET / HTTP/1.1 Host: www2.cisco.com (response: HTTP/1.1 200 OK)
Keepalives
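The redirection flow on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the behavior of any particular product: the site table, the "healthy" flags (standing in for keepalive results), and the use of a 302 status code are assumptions; only the host names come from the slide.

```python
# Hypothetical site table; the "healthy" flags stand in for keepalive results.
SITES = [
    {"host": "www1.cisco.com", "healthy": False},
    {"host": "www2.cisco.com", "healthy": True},
]


def build_redirect(path, sites):
    """Return (status, headers) that send the client to the first healthy site."""
    for site in sites:
        if site["healthy"]:
            location = "http://" + site["host"] + path
            return 302, {"Location": location}
    # No site passed its keepalive, so no redirect target exists.
    return 503, {}


status, headers = build_redirect("/", SITES)
```

With the table above, a request to www.cisco.com would be redirected to www2.cisco.com, which matches step 3 of the slide.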
DNS-Based Site Selection—Traffic Flow
Client
DNS Proxy
Data Center 1
http://www.cisco.com/
Root Name Server / Authoritative Name Server for .com
Authoritative Name Server
www.cisco.com
Authoritative Name Server
www.cisco.com
Keepalives
[Diagram: the flow is numbered in steps 1 through 10]
Data Center 2
UDP:53 TCP:80
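A rough sketch of the decision an authoritative name server makes in DNS-based site selection: answer the query for www.cisco.com with the address of a data center whose keepalive is passing. The class name, the VIP addresses (taken from documentation address ranges), the 20-second TTL, and the round-robin policy are all assumptions made for illustration.

```python
class SiteSelector:
    """Answers A queries with the VIP of a live data center, in rotation."""

    def __init__(self, data_centers, ttl=20):
        self.data_centers = data_centers
        self.names = sorted(data_centers)
        self.ttl = ttl  # a short TTL limits how long a stale answer is cached
        self.next_index = 0

    def resolve(self, name):
        # Try each data center at most once, starting where we left off.
        for _ in range(len(self.names)):
            dc = self.names[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.names)
            if self.data_centers[dc]["alive"]:
                return {"name": name, "type": "A", "ttl": self.ttl,
                        "address": self.data_centers[dc]["vip"]}
        return None  # no live site to hand out


# Hypothetical VIPs and keepalive state for the two data centers.
centers = {
    "dc1": {"vip": "192.0.2.10", "alive": True},
    "dc2": {"vip": "198.51.100.10", "alive": True},
}
selector = SiteSelector(centers)
first = selector.resolve("www.cisco.com")
second = selector.resolve("www.cisco.com")
centers["dc1"]["alive"] = False  # keepalive to Data Center 1 fails
after_failure = selector.resolve("www.cisco.com")
```

Clients that cached the earlier answer keep using it until the TTL expires, which is why the summary slide lists DNS cache as the convergence factor for this method.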
Route Health Injection—Implementation
Client B Client A Router 13
Router 11
Router 12
Router 10
Location B Preferred Location for
VIP x.y.w.z
Location A Backup Location for
VIP x.y.w.z
Very High Cost
Low Cost
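The idea behind Route Health Injection can be modeled as follows: each location injects a route for the VIP only while the VIP is healthy, and the routers prefer the lowest-cost route that is still present. The numeric costs and the dictionary layout are assumptions; the slide only says that the preferred location advertises a low cost and the backup a very high cost.

```python
def best_location(advertisements):
    """Pick the lowest-cost location among those still injecting the VIP route."""
    injected = {
        location: ad["cost"]
        for location, ad in advertisements.items()
        if ad["vip_healthy"]  # an unhealthy VIP means the route is withdrawn
    }
    if not injected:
        return None
    return min(injected, key=injected.get)


# Hypothetical costs: Location B is preferred, Location A is the backup.
ads = {
    "Location B": {"cost": 10, "vip_healthy": True},
    "Location A": {"cost": 10000, "vip_healthy": True},
}
normal = best_location(ads)
ads["Location B"]["vip_healthy"] = False  # health check fails at Location B
after_failure = best_location(ads)
```

In normal operation traffic for the VIP flows to Location B; once its route is withdrawn, the remaining high-cost route to Location A becomes the best path.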
Site Selection Summary
Mechanism        Redundancy Mode   Convergence   App Health Visibility   Site Persistence
HTTP Re-Direct   Active/Active     No            No                      Yes
DNS              Active/Active     DNS Cache     Yes                     No
RHI              Active/Standby    Within Secs   Yes                     No
Design Considerations for Site Selection
Dedicated Disaster Recovery Solution
Enables highly available, globally distributed data center infrastructure
[Diagram labels: DNS Global Control Plane; IP Control/Forwarding Plane; Resolver; DNS Name Servers; GSS Cluster; Chicago Data Center #1; Tokyo Data Center #2; NJ Back-up Data Center #3]
Keepalives – Universal Mechanism to Track Global Load and Availability
• KALs: a back-end process gathers state and load information from devices within the data center, such as local server load balancers and origin servers
• KALs (up to 5) can be grouped and logically ANDed together
• V2.0 adds a new SNMP-based KAL type that can monitor any MIB value and use it for load-balancing feedback
• Keepalive types: TCP, ICMP, HTTP-Head, SNMP
[Diagram labels: Site 1 (CSS-A, CSS-B, Servers); Site 2 (CSS-A, CSS-B, Servers)]
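The grouping rule can be illustrated with a short sketch: a group of keepalives reports "alive" only if every member passes. The function name and the dictionary of results are hypothetical; the limit of five keepalives per group is the figure given on the slide.

```python
MAX_KALS_PER_GROUP = 5  # limit stated on the slide


def group_is_alive(kal_results):
    """Logical AND over a group of keepalive results (name -> passed?)."""
    if not 1 <= len(kal_results) <= MAX_KALS_PER_GROUP:
        raise ValueError("a group holds between 1 and 5 keepalives")
    return all(kal_results.values())


# Hypothetical probe results for one site.
site1_ok = group_is_alive({"tcp": True, "icmp": True, "http-head": True})
site2_ok = group_is_alive({"tcp": True, "http-head": False})
```

A site whose TCP probe succeeds but whose HTTP-Head probe fails is treated as down, so a reachable but non-serving application is not handed out to clients.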
Route Health Injection
Disaster Recovery within Seconds Achieved!
Primary Site
To Internal Network (IGP)
Enterprise Edge Router
I-BGP
E-BGP E-BGP
Backup Site
To Internal Network (IGP)
E-BGP
E-BGP Internet
ISP2 (AS 2)
141.41.248.x
30.30.30.x
72kSecEdge (AS 3)
151.41.248.x
160.41.248.x
Route Advertised
Conditional Advertisement
ISP1 (AS 1)
72kPriEdge (AS 3)
Self-Protecting DNS Infrastructure
• Provides a security-focused, highly available DNS/DHCP/TFTP infrastructure for one or more data centers
• Automatically identifies DNS-based DDoS attacks and mitigates them
• GSS Appliance Cluster with full DNS and IP management services
• Compatible with BIND (all record types and Zone TX) and Dynamic DNS
• Compromised DNS name servers or DNS bots: rate limits these specific DNS requests
[Diagram labels: DNS Global Control Plane; IP Control/Forwarding Plane; Resolver; Chicago Data Center #1; Tokyo Data Center #2; NJ Back-up Data Center #3]
Improving DNS Survivability
• Detects and mitigates DNS-focused Distributed Denial of Service (DDoS) attacks
• Multiple defenses, including source verification
• Granularity and accuracy to provide new levels of business continuity by processing only legitimate DNS requests
• Performance and architecture suitable for the largest enterprises and providers
• Addresses DDoS attacks today; its network-based behavioral anomaly capability will be extended to additional DNS-focused threats
DDoS Mitigation
The DDoS prevention module will primarily have 3 functions:
1. Filters
2. Rate-limiting – per D-proxy with peacetime learning
3. Anti-spoofing – cookie insert
It will typically thwart the following kinds of attacks:
• Rapid DNS queries for the same domain (“replay attack”) (DoS) from a specific source IP
• Broadcast IP address as source IP
• Multicast IP address as source IP
• Empty IP address as source IP
• GSS’s IP address as source IP
• Invalid IP range (209.165.202.128 209.165.202.159)
• Malformed DNS packets
• Rapid DNS queries for domains not configured on the GSS
• DNS queries from different source IPs globally exceeding the packet processing rate of the GSS
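The source-address checks in the list above might look like this as a simple filter. It is a sketch only: the GSS address is a made-up documentation address, "empty" is interpreted here as 0.0.0.0, only the limited broadcast address is checked, and the invalid range is the example range from the slide.

```python
import ipaddress

GSS_ADDRESS = ipaddress.IPv4Address("192.0.2.53")  # hypothetical GSS address
BROADCAST = ipaddress.IPv4Address("255.255.255.255")
INVALID_LOW = ipaddress.IPv4Address("209.165.202.128")
INVALID_HIGH = ipaddress.IPv4Address("209.165.202.159")


def drop_reason(source):
    """Return why a DNS query from `source` is dropped, or None to accept it."""
    ip = ipaddress.IPv4Address(source)
    if ip == BROADCAST:
        return "broadcast source"
    if ip.is_multicast:
        return "multicast source"
    if ip.is_unspecified:  # 0.0.0.0, taken here to mean an "empty" source
        return "empty source"
    if ip == GSS_ADDRESS:
        return "GSS address as source"
    if INVALID_LOW <= ip <= INVALID_HIGH:
        return "invalid range"
    return None
```

Queries that pass these static checks would still be subject to the rate-limiting and anti-spoofing functions, which this filter does not model.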
GSS Peacetime Learning and Rate Limiting
Normal traffic rates: DNS requests per second (D-RPS)
[Diagram: D-Proxy 1, D-Proxy 2, D-Proxy 3, D-Proxy 4; rates shown: 100, 50, 500, 500, 10,000, and 10,000 D-RPS; two D-proxies marked Compromised]
Rate limit these requests
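One way to picture peacetime learning is a limiter that records each D-proxy's normal request rate and later caps that D-proxy at a multiple of its baseline. The class, the tolerance factor of 2, the default limit for unknown D-proxies, and the proxy names are assumptions; the 500 and 10,000 D-RPS figures echo the slide.

```python
class PeacetimeRateLimiter:
    """Caps each D-proxy at a multiple of the rate it showed in peacetime."""

    def __init__(self, tolerance=2.0, default_limit=100.0):
        self.tolerance = tolerance          # assumed headroom over the baseline
        self.default_limit = default_limit  # assumed cap for unseen D-proxies
        self.baseline = {}                  # D-proxy -> highest peacetime rate

    def learn(self, d_proxy, requests_per_second):
        current = self.baseline.get(d_proxy, 0)
        self.baseline[d_proxy] = max(current, requests_per_second)

    def allowed_rate(self, d_proxy, requests_per_second):
        if d_proxy in self.baseline:
            limit = self.baseline[d_proxy] * self.tolerance
        else:
            limit = self.default_limit
        return min(requests_per_second, limit)


limiter = PeacetimeRateLimiter()
limiter.learn("proxy-a", 100)  # peacetime observations
limiter.learn("proxy-b", 500)
passed_normal = limiter.allowed_rate("proxy-a", 80)
passed_attack = limiter.allowed_rate("proxy-b", 10000)  # compromised D-proxy
```

Because the limit is per D-proxy, a compromised resolver is throttled without penalizing the well-behaved ones.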
Dedicated Traffic Manager Solution
Enables highly available, optimized, globally distributed data center infrastructure
[Diagram labels: DNS Global Control Plane; IP Control/Forwarding Plane; Resolver; DNS Name Servers; GSS Cluster; Chicago Data Center #1; Tokyo Data Center #2; NJ Back-up Data Center #3; North American Content Request; Asia PAC Content Request]
The Middle Tier Considerations for DR
Cluster Overview
Load Balancing Cluster : multiple copies of the same application against the same data set, usually read only
High Availability Cluster : multiple copies of an application that require access to a common data depository, usually read and write
Clustering provides benefits for availability, reliability, scalability, and manageability
Application Servers
Web Servers
Database Servers
High Availability Cluster Design
Public Network : Client /Application requests
Private Network : Interconnection between nodes
Storage Disk : Shared storage array, NAS or SAN
OS
Cluster Enabler
Cluster Software
APP
HA Cluster Application View
Active/standby
– Standby takes over when active fails
– Two-node or multi-node
Active/active
– Database requests load balanced across all nodes
– Lock mechanism ensures data integrity
Shared everything
– Each node mounts all storage resources
– Provides a single layout reference system for all nodes
Shared nothing
– Each node mounts only its “semi-private” storage
– Data stored on the peer system’s storage is accessed via peer-to-peer communication
Node1 Node2
Geo-Cluster: Clusters That Span Multiple Data Centers
Geo-Cluster Considerations
• Challenges: split brain, L2 heartbeats, storage
[Diagram labels: Node1; Node2; Local Datacenter; Remote Datacenter; WAN; Disk Replication: Synchronous or Asynchronous; 2 x RTT]
HA Cluster Challenges : Split-Brain
Split-brain : Active nodes concurrently accessing the same disk, leads to data corruption
Resolution : Use a Quorum, a tie breaker for gaining access to the disk
Node1 Node2
Data Corruption
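The quorum idea can be shown with a toy tie breaker: when the heartbeat between nodes is lost, each node must win the quorum reservation before it may write to the shared disk, so at most one node stays active. The class and method names are hypothetical, and the sketch is single-threaded; it assumes the real reservation mechanism is atomic.

```python
class QuorumDisk:
    """Tie breaker: at most one node can hold the reservation at a time."""

    def __init__(self):
        self.owner = None

    def try_reserve(self, node):
        # The first node to ask wins; later requests from other nodes fail.
        if self.owner is None:
            self.owner = node
        return self.owner == node

    def release(self, node):
        if self.owner == node:
            self.owner = None


quorum = QuorumDisk()
# Heartbeat is lost and both nodes believe the other one has failed.
node1_may_write = quorum.try_reserve("Node1")
node2_may_write = quorum.try_reserve("Node2")
```

Node2 loses the tie break and must stay passive, which prevents the concurrent disk access that causes corruption.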
Layer 2 Heartbeats
Extended L2 Network : L2 adjacency is required for the nodes’ heartbeat.
Extending a VLAN across sites is hazardous
Resolution : L3 capability for the cluster heartbeat. EoMPLS to carry L2 heartbeats across DR sites.
Node1 Node2
Local Datacenter
Remote Datacenter
WAN
Disk Replication: Synchronous or Asynchronous
Public Layer 2 Network
Private Layer 2 Network
Storage Disk Zoning
Storage Zoning : Taking over the storage disk array when the active node fails.
Resolution : Cluster Enabler to communicate with the Cluster Enabler. Instructs the disk array to perform a failover when a failure is detected.
Node1 Node2
Extended SAN
sym1320 sym1291
Standby Active
WD
WD RW
RW
The Backend Tier for DR
Synchronous vs. Asynchronous Trade-Off
Synchronous
• Impact to application performance
• Distance limited (are both sites within the same threat radius?)
• No data loss
Asynchronous
• No application performance impact
• Unlimited distance (second site outside threat radius)
• Exposure to possible data loss
Enterprises Must Evaluate the Trade-Offs
• Maximum tolerable distance ascertained by assessing each application
• Cost of data loss
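A back-of-the-envelope way to estimate the maximum tolerable distance for synchronous replication is to convert an application's write-latency budget into kilometers of fiber. The sketch assumes light travels roughly 200 km per millisecond in optical fiber, that each write costs two WAN round trips by default, and that equipment and queuing delays are ignored; the function names are illustrative.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed of light in fiber


def round_trip_ms(distance_km):
    """Propagation delay for one round trip over `distance_km` of fiber."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_sync_distance_km(latency_budget_ms, round_trips_per_write=2):
    """Longest distance whose added write latency fits the budget."""
    return latency_budget_ms * FIBER_KM_PER_MS / (2 * round_trips_per_write)


rtt_100km = round_trip_ms(100)               # 1.0 ms per round trip
limit_two_trips = max_sync_distance_km(2.0)  # 100.0 km for a 2 ms budget
limit_one_trip = max_sync_distance_km(2.0, round_trips_per_write=1)  # 200.0 km
```

Halving the number of round trips per write doubles the distance that fits the same latency budget, which is the motivation for write acceleration on the SAN extension.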
Data Replication with DB Example
Mixture of Sync and Async Replication Technologies Commonly Used
• Usually only redo logs are synchronously replicated to the remote site
• Archive logs are created from the redo log and copied when the redo log switches
• Point in Time (PiT) copies of datafiles and control files are copied periodically (e.g., nightly)
Redo Logs (Cyclic) Redo Logs (Cyclic) Copy of Every
Committed Transaction
Archive Logs
Synchronously Replicated
for Zero Loss
Replicated/Copied
Primary Site Secondary Site
Replicated/Copied
Point in Time
Copy Taken When DB Quiescent
Database
Database Copy at Time t0
Database Copy at Time t0
Earlier DB Backups
Archive Logs
SAN Extension Transport
FCIP Write Acceleration
• Extends the effective distance capabilities for remote replication
• Improves replication performance at a given distance
• Reduces each write I/O to one round trip over the WAN (from two or more)
• Reduction in I/O latency equal to one round-trip time (RTT)
• Local FCIP end-point “proxies” XFER_RDY
• Suitable for sync or async replication – “Status” not proxied
• Built into IP Services module (IPS), MDS 9216i, and MPS-14/2
[Diagram: FCIP – Normal Flow: WRITE, XFER_RDY, DATA, and STATUS cross the IP (FCIP) network, with the WAN round trips marked]
[Diagram: FCIP with Write Acceleration (WA): the local MDS proxies XFER_RDY; WRITE, DATA, and STATUS cross the IP (FCIP) network]
FCIP Write Acceleration – Performance Benefits
• Up to 2:1 performance gain under most circumstances
• 3:1 or more with large I/Os involving three or more round trips
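The quoted ratios follow from counting WAN round trips per write. A simplified model, assuming the WAN round-trip time dominates and that write acceleration reduces every write to a single round trip, is sketched here; the function names and the optional local service time are illustrative.

```python
def write_time_ms(rtt_ms, round_trips, local_service_ms=0.0):
    """Time for one write: WAN round trips plus any local service time."""
    return round_trips * rtt_ms + local_service_ms


def acceleration_gain(rtt_ms, normal_round_trips, local_service_ms=0.0):
    """Ratio of normal write time to accelerated (single round trip) time."""
    normal = write_time_ms(rtt_ms, normal_round_trips, local_service_ms)
    accelerated = write_time_ms(rtt_ms, 1, local_service_ms)
    return normal / accelerated


gain_two_trips = acceleration_gain(10.0, 2)    # 2.0
gain_three_trips = acceleration_gain(10.0, 3)  # 3.0
gain_with_overhead = acceleration_gain(10.0, 2, local_service_ms=10.0)  # 1.5
```

The last case shows why the slide says "up to" 2:1: any time spent outside the WAN is not accelerated, so the measured gain falls below the round-trip ratio.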
FCIP Tape Acceleration
Tape backup over WAN has issues
– Single outstanding I/O reduces throughput over distance (latency)
– Variable latency reduces the life of tape (shoe-shine effect)
FCIP Tape Acceleration overcomes the above limitations
– Local MDS IPS module proxies as a tape library
– Remote MDS IPS module proxies as a backup server
– Status proxied
– Write Filemarks checkpoints process
Enabler for tape backup over WAN
– Use centralized tape library over long distances
– Ubiquity and economics of IP
[Chart: throughput (MB/s) versus RTT (ms). Test setup: (a) Legato Networker 7.0; (b) Windows Advanced Server 2000, dual Xeon CPUs; (c) IBM Ultrium TD2 LTO-2 tape drive]
Appliance Primary Target
Primary Hosts
FCIP
= SANTap Service
Replicated IO
Copy of I/O
Remote Target
DR Hosts
SANTap
Conclusion
Recovery Architectures Redundancy at Many Levels
Web Servers
App Servers
DB Servers
Storage Network
N-Tier Applications
Web Servers
App Servers
DB Servers
Storage Network Front End
Network
IP Layer 2/3
Front End Network
IP Layer 2/3
Remote Disk-Disk and Disk-Tape Copy
Server Load Balancing
Transaction Replication
Data Center 1 Data Center 2
Internet
Rapid Spanning Tree and Routing Convergence
Intranet
Site Selection
N-Tier Applications
Database Replication
Conclusion: Active-Active Data Center Design
• Cisco’s Global Site Selector offers a solution to address active-active DC design needs
• Active-active DC design must be approached from a total picture
• Resiliency is significantly enhanced via network integration
• Mitigation against DDoS will be vital
• Importance of an integrated solution: point products promise lots of capabilities, but are they really usable in the real world?
• Leverage Cisco technology to meet the data center challenges ahead
Q & A