Overview and Technical Walkthru Site Recovery Manager 6 · Site Recovery Manager 6.1 Technical...
Site Recovery Manager 6.1 Overview and Technical Walkthru
GS Khalsa
Technical Marketing, Storage & Availability
About This Presentation
Author(s)
Technical Marketing – GS Khalsa & Ken Werneburg
Title and General description
Site Recovery Manager 6.1 Technical Overview
Distribution and Audience type
Internal, External, Partner, Customer. This deck is designed to be a deep dive on SRM 6.1. It is considerably longer than most presentations require because it is meant to be fairly exhaustive. For most sales conversations it should be shortened dramatically; use it in its entirety only for a longer session exploring the technology of Site Recovery Manager. For further SRM information, visit the Vault and look for the SRM Overview deck as well as the SRM offline demo.
Primary target audience
Technical: Server team (managers, directors, VI Admin), BC/DR teams.
Technical level
High
Time required to present
90+ minutes
Date updated
September 17, 2015
Agenda
• Overview
• Architecture
• Topologies
• Deployment and Configuration
• Replication
• Protection
• Recovery
• Workflows
Overview
VMware Site Recovery Manager
[Diagram: Production Site and Recovery Site, each with servers, vSphere, vCenter Server, and Site Recovery Manager, connected by array-based replication or vSphere Replication]
• Centralized recovery plans for thousands of VMs
• Non-disruptive recovery testing
• Automated DR workflows
• Integrated with the VMware product stack
• Eliminates complexity and risk of manual processes
• Enables fast and highly predictable RTOs
• Provides policy-driven DR control for any virtualized app
Transforms Management of Recovery And Migration Plans
From Complex Runbooks…
• Weeks or months to set up recovery plans
• Unstructured and manual, which makes them error-prone
• Quickly fall out of sync with infrastructure changes
…to Simple Recovery Plans
• Simple setup in minutes
• Software-defined workflows eliminate errors
• Simple to update and keep in sync with changes
Terminology
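• RPO - Recovery Point Objective: measured from the last viable restore point to the moment disaster strikes
• RTO - Recovery Time Objective: measured from the moment disaster strikes to the point where all functionality is recovered
[Diagram: timeline running from Last Viable Restore Point, through Disaster Strikes, to All Functionality Recovered]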
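The two terms can be made concrete with a small calculation. The sketch below is illustrative only: the timestamps, the function names, and the 15-minute and 4-hour objectives are assumptions, not values from this deck.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_restore_point: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: last viable restore point up to the disaster."""
    return disaster - last_restore_point


def achieved_rto(disaster: datetime, recovered: datetime) -> timedelta:
    """Outage window: disaster up to all functionality recovered."""
    return recovered - disaster


# Hypothetical timeline for one incident.
last_restore_point = datetime(2015, 9, 17, 9, 45)
disaster = datetime(2015, 9, 17, 10, 0)
recovered = datetime(2015, 9, 17, 12, 30)

rpo = achieved_rpo(last_restore_point, disaster)
rto = achieved_rto(disaster, recovered)

# Compare against hypothetical business objectives.
rpo_met = rpo <= timedelta(minutes=15)
rto_met = rto <= timedelta(hours=4)
```

Here the achieved RPO is 15 minutes of lost data and the achieved RTO is 2.5 hours of downtime, so both assumed objectives are met.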
Architecture
Replication
• SRM is not a replication solution
• SRM monitors and interacts with replication solutions
• Choice of replication options
  – vSphere Replication/Host Based Replication
  – Array Based Replication
[Diagram: vSphere Replication between VMs (OS, App, Data) on SAN and Virtual SAN storage; only changes are replicated]
Site Recovery Manager Architecture
[Diagram: Protected Site and Recovery Site. Each site has a vSphere Web Client with the SRM Plugin, an SRM Server with an SRA, a vCenter Server with SSO/PSC, a VR Appliance, vSphere, and storage. The vCenter Servers are in Linked Mode, and the sites are connected by array replication and vSphere Replication.]
Use Cases and Topologies
Disaster Recovery
• Least frequent but most critical use case
• Ensures fastest RTO
[Diagram: protected site and recovery site]
Disaster Avoidance
• Ensures app-consistency and zero data loss
• Zero downtime if used with stretched storage
• Proactive, controlled workflow
[Diagram: protected site and recovery site]
Planned Migration
• Most common use case
• Frequent on-ramp for SRM
• Enables data center maintenance and global load balancing
[Diagram: Site A and Site B]
Active-Passive Failover
[Diagram: Production site and Recovery site]
• Dedicated resources for recovery
• Most common
• Paying for idle resources
Active-Active Failover
[Diagram: Production site and Recovery site]
• Run low-priority apps on recovery infrastructure
• Shut down low-priority apps as part of recovery
Bi-directional Failover
[Diagram: two Production sites]
• Production applications at both sites
• Each site acts as the recovery site for the other
Multi-Site Failover
[Diagram: two multi-site layouts in which every site has its own SRM server and vCenter Server (VC): a Main Data Center with Remote Offices A, B, and C, and Sites A, B, and C]
• One to One pairing of SRM servers
• Each VM only protected once
• Each VM only replicated once
• Utilize enhanced linked mode
Stretched Storage & Orchestrated vMotion
[Diagram: two Production sites on stretched storage]
• Production apps at both sites with seamless mobility across sites
• Zero downtime for planned events
• Typically limited to a metro distance (less than 100 km)
Deployment
Site Recovery Manager Concepts
[Diagram: Protected Site and Recovery Site, each with an SRM Server and a vCenter Server]
• Site Pairing
• Mapping
  – Resources: Networks, Folders, Resource Pools, Storage Policies, Placeholder Datastores
• Protection Groups
  – Groups of VMs Recovered Together
• Recovery Plans
  – One or more Protection Groups
Replication
Storage Array Integration
Storage Replication Adapters (SRAs):
• Discover arrays
• Determine which LUNs are replicated
• Assist in initiating tests, recovery
• Other SRA capabilities
  – Reprotect
  – Synchronization
  – Planned Migration
• SRM Compatibility Matrix: http://www.vmware.com/pdf/srm_storage_partners.pdf
[Diagram: SRM Server with Array Manager and SRA, vendor management interface, and array at each site, with a Replication Manager]
vSphere Replication Overview
• Per-VM, host-based replication
• Network-efficient by replicating only changed data
• Included with vSphere Essentials Plus and higher editions
[Diagram: vSphere Replication, managed through vCenter Server, between VMs on SAN and Virtual SAN storage; only changes are replicated]
vSphere Replication Features and Benefits
• Easy virtual appliance deployment
  – Minimal time investment, no hardware procurement
• Integration with vSphere Web Client
  – Ease of administration and monitoring
• Protect any VM regardless of OS and apps
  – One solution reduces complexity and cost
vSphere Replication Features and Benefits
• Flexible recovery point objective (RPO) policies
  – Supports a wide variety of business requirements
• Compatible with Virtual SAN, SAN, NAS, local storage
  – One solution reduces complexity and cost
• Quick recovery for individual VMs
  – Reduces downtime, minimizes resource requirements
vSphere Replication Features and Benefits
• End-to-end network compression
  – Further reduces bandwidth requirements
• Network traffic isolation
  – Control bandwidth, improve performance, security
• Windows VSS and Linux file system quiescing
  – Increased reliability when recovering VMs
[Diagram: network traffic isolation, with labels Management, Replication, WAN, and LAN]
vSphere Replication Overview
• Reliable: Protecting thousands of VMs since 2011
• Efficient: WAN-friendly replication with compression
• Value: Included with vSphere Essentials Plus Kit and higher
• Easy: Virtual appliance deployment, vSphere Web Client management
Protection
Protection Groups
• Group of VMs that will be recovered together
  – Application
  – Department
  – System type
  – Or ?
• Different depending on replication type
• A VM can only belong to one Protection Group
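The rule that a VM can belong to only one Protection Group is easy to check before groups are created. The following is a minimal sketch, assuming a plain dictionary of hypothetical group and VM names; it does not use any SRM API.

```python
from collections import defaultdict


def find_membership_conflicts(protection_groups):
    """Return {vm: [groups]} for every VM listed in more than one group."""
    membership = defaultdict(list)
    for group, vms in protection_groups.items():
        for vm in set(vms):  # ignore accidental repeats inside one group
            membership[vm].append(group)
    return {vm: sorted(groups) for vm, groups in membership.items() if len(groups) > 1}


# Hypothetical layout: "db01" is wrongly placed in two groups.
planned_groups = {
    "Web App": ["web01", "web02", "db01"],
    "Email": ["mail01", "db01"],
    "SharePoint": ["sp01"],
}

conflicts = find_membership_conflicts(planned_groups)
```

In this example `conflicts` reports `db01` against both the Email and Web App groups, so one of the two memberships has to be dropped.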
vSphere Replication Protection Groups
• Group VMs as desired into Protection Groups
• What storage they are located on doesn’t matter
[Diagram: Protection Group 1 – Web App, Protection Group 2 – Email, Protection Group 3 – SharePoint]
Array Based Protection Groups
[Diagram: Protection Group 1 – Web App, Protection Group 2 – Email, and Protection Group 3 – SharePoint built on LUNs 1 to 5, which back Datastores A, B, C, D, and F; one set of LUNs forms a consistency group]
Storage Policy-Based Protection Groups
• Policy Driven Protection
• New style of Protection Group that leverages storage profiles
• High level of automation compared to traditional protection groups
• Policy-based approach reduces OpEx
• Simpler integration of VM provisioning, migration, and decommissioning
[Diagram: Profile Driven Protection Group tied to a Storage Policy]
Recovery
Protection Groups fit into Recovery Plans
• Recovery Plan 1 – Web App
  – Protection Group 1 – Web App
• Recovery Plan 2 – Email
  – Protection Group 2 – Email
• Recovery Plan 3 – Whole Site
  – Protection Group 1 – Web App
  – Protection Group 2 – Email
  – Protection Group 3 – SharePoint
Priorities and Dependencies UI
Priorities and Dependencies
[Diagram: VMs arranged in Priority Groups 1 through 5, with dependency links between VMs. VMs shown: Master Database, Database, App Server 1, App Server 2, Exchange, Mail Sync, Apache (two), and Desktop (four).]
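One way to picture how priority groups and dependencies combine is as a two-level ordering: groups run in ascending order, and inside a group a VM waits for the VMs it depends on. The sketch below illustrates that idea with Python's standard `graphlib` module. It is not SRM's implementation; the VM placement and the rule that dependencies are only honored inside the same group are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def startup_order(priority_of, depends_on):
    """Order VMs by priority group, then by dependencies inside each group.

    priority_of: {vm: priority group number, 1 starts first}
    depends_on:  {vm: set of VMs that must be started before it}
    """
    order = []
    for group in sorted(set(priority_of.values())):
        members = {vm for vm, prio in priority_of.items() if prio == group}
        # Assumption: a dependency on a VM outside the group is ignored here.
        graph = {
            vm: {dep for dep in depends_on.get(vm, ()) if dep in members}
            for vm in sorted(members)
        }
        order.extend(TopologicalSorter(graph).static_order())
    return order


# Hypothetical placement of some of the VMs named on the slide.
priority_of = {
    "Master Database": 1,
    "Database": 1,
    "App Server 1": 2,
    "Apache": 3,
    "Desktop": 5,
}
depends_on = {"Database": {"Master Database"}}

order = startup_order(priority_of, depends_on)
```

With this input, Master Database starts before Database, and everything in group 1 starts before App Server 1, Apache, and Desktop.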
VM IP Customization
• IP Subnet Mapping
  – Ability to map entire subnets rather than individual addresses
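Mapping a whole subnet means each VM keeps its host portion and only the network portion changes. A minimal sketch of that arithmetic, using Python's standard `ipaddress` module, follows. The function name and the example subnets are hypothetical, and this shows only the address translation, not how SRM applies it to a guest.

```python
import ipaddress


def map_ip(address, protected_subnet, recovery_subnet):
    """Translate an address into the recovery subnet, keeping the host part."""
    src = ipaddress.ip_network(protected_subnet)
    dst = ipaddress.ip_network(recovery_subnet)
    ip = ipaddress.ip_address(address)
    if src.version != dst.version or src.prefixlen != dst.prefixlen:
        raise ValueError("subnets must have the same IP version and prefix length")
    if ip not in src:
        raise ValueError(f"{ip} is not inside {src}")
    host_part = int(ip) - int(src.network_address)
    return str(dst.network_address + host_part)


# Hypothetical rule: 192.168.10.0/24 at the protected site
# becomes 10.20.30.0/24 at the recovery site.
recovered_ip = map_ip("192.168.10.25", "192.168.10.0/24", "10.20.30.0/24")
```

A single rule like this covers every VM in the subnet, which is the advantage over listing addresses one by one.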
Shutdown & Startup Actions
• Can be customized for each VM
Pre and post power on steps
• Script or Prompt
• Can be run on
  – Recovered VM
  – SRM server
Workflows
Workflows for Recovery Plans
• Recovery
  – Planned Migration
  – Disaster Recovery
• Reprotect
• Test
• Cleanup
Running a Recovery Plan – Planned Migration or Disaster Recovery
[Diagram: Protected Site and Recovery Site]
• Synchronize storage
• Power off VMs
• Synchronize storage again
• Break replication
• Mount datastores to hosts at Recovery Site
• Power off non-critical VMs at Recovery Site (optional)
• Power on VMs
Differences between Planned Migration & Disaster Recovery
• Planned Migration Mode
  – Allows for a data synchronization as part of the process
  – Will stop on errors and allow you to resolve them before continuing
  – Because it shuts down the virtual machines being migrated, application-consistent VMs are recovered at the recovery site
• Disaster Recovery Mode
  – Allows for a data synchronization as part of the process
  – Will not stop on errors
  – If the protected site is available, the virtual machines being migrated will be application consistent at the recovery site
  – If the protected site is not available, the consistency state will be whatever was designed into the solution
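The key behavioral difference between the two modes is error handling: planned migration halts at the first failed step, while disaster recovery records the failure and carries on. The sketch below models only that difference. The step names come from the recovery workflow in this deck; the `Mode` enum, `run_plan` function, and the simulated failure are hypothetical and are not part of any SRM interface.

```python
from enum import Enum


class Mode(Enum):
    PLANNED_MIGRATION = "planned migration"
    DISASTER_RECOVERY = "disaster recovery"


def run_plan(steps, mode):
    """Run (name, action) steps in order and return a list of (name, status)."""
    results = []
    for name, action in steps:
        try:
            action()
            results.append((name, "ok"))
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            if mode is Mode.PLANNED_MIGRATION:
                break  # stop so the operator can fix the problem and re-run
    return results


def succeed():
    pass


def protected_site_down():
    raise RuntimeError("protected site unreachable")


steps = [
    ("Synchronize storage", succeed),
    ("Power off VMs", protected_site_down),  # simulated failure
    ("Synchronize storage again", succeed),
    ("Break replication", succeed),
    ("Mount datastores to hosts at Recovery Site", succeed),
    ("Power off non-critical VMs at Recovery Site", succeed),
    ("Power on VMs", succeed),
]

planned = run_plan(steps, Mode.PLANNED_MIGRATION)
disaster = run_plan(steps, Mode.DISASTER_RECOVERY)
```

In this run `planned` ends at the failed "Power off VMs" step, while `disaster` contains all seven steps with one marked as failed.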
Failback is a process of “Reverse Recovery”
• Reliable and automated for both array-based replication (ABR) and vSphere Replication (VR)
• Easily return environments to the primary production site
Failback – continued
• After a reprotect, replication runs in reverse, back toward the original protected site
Testing a Recovery Plan
[Diagram: Protected Site and Recovery Site; replication is not impacted, and the test uses a snapshot on an isolated test network]
• Entirely non-disruptive to production VMs and replication
• Allows for data synchronization as part of the process
• Supports a recovery that uses a different network
• Uses a clone or snapshot
Steps:
• Replicate storage (optional)
• Snapshot VM or Storage
• Mount snapshot to hosts
• Power on VMs
Cleaning up a Test Recovery
• Run after testing is complete
• Steps:
• Power off VMs
• Remove VMs from inventory
• Delete snapshot
• Following cleanup, no test resources are in use at the recovery site
• Test or recovery is now ready to be run
History Reports
• Each workflow operation has an associated history report
History Reports - continued
Additional Resources
• Hands on Lab
• SRM Technical Overview
• SRM Evaluation Guide
• Product Documentation
• Trial Licenses
• VMTN Community Forums
• SRM FAQ
Supplemental
Recovery Plan Steps
• All of the steps that need to be taken to recover
• Pre-sync storage – to reduce downtime
• Shutdown VMs – to ensure no data loss
• Sync storage – to get latest data
• Power On VMs - in desired sequence
• VMs can be left shutdown as part of recovery
Multi-site UI
Advanced – IP Customization
Forced Recovery (Introduced in 5.0.1)
[Diagram: Site A (Primary) and Site B (Recovery), each with servers, VMware vSphere, VMware vCenter Server, and Site Recovery Manager; question marks indicate an unknown state]
Avoid delays to RTO when protected site is inconsistent
Testing a Recovery Plan
VMs are ready to be used now
All Paths Down Handling
[Diagram: vSphere hosts attached to VMFS datastores]
Limits
Item                                                       Maximum
Protected virtual machines total                           5000
Simultaneously recoverable VMs                             2000
Protected virtual machines in a single protection group    500
Protection groups                                          250
Simultaneous running recovery plans                        10
vSphere Replicated virtual machines                        2000
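These maximums lend themselves to a quick sanity check when sizing a deployment. The sketch below copies the figures from the limits slide into a dictionary and compares a planned design against them. The key names, the `check_limits` function, and the sample design are hypothetical.

```python
# Maximums as listed on the limits slide (SRM 6.1).
SRM_LIMITS = {
    "protected_vms_total": 5000,
    "simultaneously_recoverable_vms": 2000,
    "protected_vms_per_protection_group": 500,
    "protection_groups": 250,
    "simultaneous_recovery_plans": 10,
    "vsphere_replicated_vms": 2000,
}


def check_limits(planned):
    """Return a message for every planned value that exceeds its maximum."""
    violations = []
    for key, value in planned.items():
        if key not in SRM_LIMITS:
            raise KeyError(f"unknown limit: {key}")
        if value > SRM_LIMITS[key]:
            violations.append(f"{key}: {value} exceeds maximum {SRM_LIMITS[key]}")
    return violations


# Hypothetical design to validate.
design = {
    "protected_vms_total": 3200,
    "protected_vms_per_protection_group": 650,
    "protection_groups": 40,
    "vsphere_replicated_vms": 2000,
}

violations = check_limits(design)
```

Only the 650-VM protection group is flagged here; a value equal to the maximum, such as 2000 replicated VMs, is treated as within limits.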
SDRS, Storage vMotion & Array Based Replication
• SRM + SDRS supported with heterogeneous datastore clusters
  – replicated and non-replicated
  – mix of consistency groups
• Protected VM state maintained
[Diagram: SDRS datastore cluster containing consistency groups CG 1 and CG 2]
SDRS, Storage vMotion & vSphere Replication
[Diagram: LUN 1 and LUN 2]
• SDRS or Storage vMotion moves between devices are supported at the protected or recovery site
• Protected VM state maintained
• Full sync resumes (not restarts) if interrupted by a Storage vMotion or SDRS move
Embedded vPostgres Database
• Provides alternate integrated and simplified installation option
• Supports any size SRM environment
Enhanced topology support
• Shared recovery site and shared protected site support
[Diagrams: four topology layouts, each site shown with its SRM server and vCenter Server (VC): (1) sites paired with a Shared Site; (2) a Main Data Center with Remote Offices A, B, and C; (3) remote offices and Site A paired with a Shared DR Site; (4) Sites A, B, and C]
SRM and VR Interop resolution
• Point-in-time recovery is available in SRM when using vSphere Replication
• The SRM Advanced Settings dialog has a setting that instructs SRM to preserve or collapse multiple point-in-time (MPIT) images: vrReplication.preserveMpitImagesAsSnapshots
• MPIT images are preserved by default; set the value at both sites
Running a Recovery Plan
Recovery Plan