Post on 03-Jan-2016
Overview of high availability in Microsoft SQL Server
Szymon Wójcik
Agenda
• Introduction
• What is availability?
• What is high availability and why choose it?
• Key factors to consider for a high availability scenario
• High availability techniques in Microsoft SQL Server
  • Replication
  • Log shipping
  • Mirroring
  • Failover clustering
• Discussion
PLSSUG Cracow Partners
Introduction
• Szymon Wójcik
• Experience with MS SQL Server since 2000 (dev/admin)
• MCITP: DBA SQL Server 2005
• Interests:
  • Performance tuning
  • High availability
• Blog: sqlphobosq.wordpress.com
• Twitter: @phobosq
Availability [1/5]
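• One of the concepts defined within ITIL
• The ability of a service to perform its agreed function when required
• Determined by:
  • Reliability – how long the service runs without failure (MTBF, mean time between failures)
  • Maintainability – how quickly it is restored (MTRS, mean time to restore service)
  • Serviceability – contract conditions
  • Performance
  • Security
    • Confidentiality
    • Integrity
    • Availability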
Availability [2/5]
• Best practice: express availability as the percentage of Agreed Service Time that was not lost to downtime
• Agreed Service Time – defined in the SLA (Service Level Agreement)
• Downtime – duration of service unavailability during Agreed Service Time
• Understanding the availability concept is important when planning or deploying a service
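The percentage measure can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual ITIL-style calculation, (Agreed Service Time - Downtime) / Agreed Service Time x 100; the function and parameter names are made up for this example and do not come from the slides.

```python
def availability_percent(agreed_service_time_h: float, downtime_h: float) -> float:
    """Return availability in % for one reporting period (both inputs in hours)."""
    if agreed_service_time_h <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_h <= agreed_service_time_h:
        raise ValueError("downtime must lie within the agreed service time")
    # Assumed formula: share of the agreed service time that was not downtime.
    return (agreed_service_time_h - downtime_h) / agreed_service_time_h * 100.0


# A 24x7 service (168 h per week) that was down for 1 h 40 min 48 s in one week.
weekly = availability_percent(168.0, 1 + 40 / 60 + 48 / 3600)
print(f"{weekly:.2f}%")
```

Only downtime that falls inside the Agreed Service Time should be passed in; an outage outside the agreed hours does not reduce the figure.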
Availability [3/5] – figures for one week
Allowed downtime per week [hh:mm:ss format]

| Availability level | 8x5 (40 hours/week) | 24x7 (168 hours/week) |
|---|---|---|
| 80% | 08:00:00 | 33:36:00 |
| 90% | 04:00:00 | 16:48:00 |
| 95% | 02:00:00 | 08:24:00 |
| 98% | 00:48:00 | 03:21:36 |
| 99% | 00:24:00 | 01:40:48 |
| 99.9% | 00:02:24 | 00:10:05 |
| 99.99% | 00:00:14.4 | 00:01:00 |
| 99.999% | 00:00:01.44 | 00:00:06 |
Availability [4/5] – figures for one year
Allowed downtime per year [DD.hh:mm:ss format, calendar days; 8x5 year = 52 weeks = 2,080 hours, 24x7 year = 365 days = 8,760 hours]

| Availability level | 8x5 (40 hours/week) | 24x7 (168 hours/week) |
|---|---|---|
| 80% | 17.08:00:00 | 73.00:00:00 |
| 90% | 8.16:00:00 | 36.12:00:00 |
| 95% | 4.08:00:00 | 18.06:00:00 |
| 98% | 1.17:36:00 | 7.07:12:00 |
| 99% | 0.20:48:00 | 3.15:36:00 |
| 99.9% | 0.02:04:48 | 0.08:45:36 |
| 99.99% | 0.00:12:29 | 0.00:52:34 |
| 99.999% | 0.00:01:15 | 0.00:05:15 |
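Figures like the ones in these tables can be recomputed with a short Python sketch. It assumes that the allowed downtime is simply the unavailable fraction of the service hours in the period, rounded to the nearest second; `allowed_downtime` is a name invented for this illustration.

```python
from datetime import timedelta
from decimal import Decimal


def allowed_downtime(availability_percent: str, service_hours: int) -> timedelta:
    """Downtime budget for one period, rounded to the nearest second."""
    # Decimal keeps levels such as 99.999 exact instead of binary-rounded.
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    seconds = unavailable * service_hours * 3600
    return timedelta(seconds=round(seconds))


# Weekly budgets use 40 or 168 service hours.
print(allowed_downtime("99.9", 168))
# Yearly budgets here assume 52 x 40 = 2080 hours (8x5) or 365 x 24 = 8760 hours (24x7).
print(allowed_downtime("99", 8760))
```

The yearly result depends on how many service hours are counted in a year, so that assumption should be agreed in the SLA before the numbers are quoted.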
Availability [5/5] – important notes
• Availability != uptime (a service may be up but unavailable)
• Scheduled downtime does not have to count as unavailability (this depends on the definition in the SLA)
High availability - definition
• A system design approach and service implementation that ensures a certain level of operational performance (Wikipedia)
• Masks the effects of hardware or software failure
• Maintains availability of applications so that perceived downtime is minimized (Microsoft)
High availability != disaster recovery
• High availability is about meeting the Service Level Target for availability
• Disaster recovery is about ensuring operational continuity
• They complement each other: HA can minimize the need to invoke DR, but can never replace it
Why to choose high availability
• For users:
  • Minimizes the probability of downtime
  • Allows the service to survive a failure, if properly designed
• For administrators:
  • Simplifies migration effort
  • Minimizes risk to continuity
Single point of failure
• A whole system is only as strong as its weakest link
[Diagram labels: User, Server, LAN, Switch, Router, Server on the Web]
Hardware redundancy
• Introduce additional hardware to minimize the risk of failure
[Diagram labels: User, Server, LAN, Switch (x2), Router (x2), Server on the Web]
Hardware redundancy
• Not only whole machines can be duplicated for fault tolerance
• Individual components can be, too:
  • Power supplies
  • CPUs
  • Hard disks
  • Network interface cards
  • Storage controllers
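The effect of a single point of failure and of redundancy can be put into numbers with a small Python sketch. It is not from the slides: it assumes that components fail independently, which real hardware only approximates, and the availability figures are invented for illustration.

```python
from math import prod


def serial_availability(parts):
    """Every part in the chain must work, so the availabilities multiply."""
    return prod(parts)


def redundant_availability(copies):
    """One working copy is enough, so the unavailabilities multiply."""
    return 1.0 - prod(1.0 - a for a in copies)


# Hypothetical chain: server, switch and router with no spare parts.
chain = serial_availability([0.999, 0.99, 0.99])

# The same chain with the switch and the router each duplicated.
with_spares = serial_availability([
    0.999,
    redundant_availability([0.99, 0.99]),
    redundant_availability([0.99, 0.99]),
])

print(f"{chain:.4f} -> {with_spares:.4f}")
```

Under these assumptions the unprotected chain is less available than its weakest part, and duplicating the two weakest parts moves the total close to the availability of the single server that is left without a spare.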
Standby node
• A standby node is a machine in an HA system that takes over if the primary server fails
• Three types:
  • Cold standby – unplugged, needs to be prepared before use
  • Warm standby – ready to use, but requires a manual switch
  • Hot standby – ready to use, takes over automatically
• Failover = switching from primary to standby
• Failback = return to primary
• There may be more than one standby in an HA scenario!
Load balancing vs failover
• Load balancing – distribution of workload between several peer servers
  • If one goes down, the others take over
  • Workload is distributed by a load balancer
• Failover – automatic switch to standby
  • Standby is not active
  • Switch is initiated upon loss of heartbeat
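A heartbeat-triggered failover can be sketched as follows. This is a toy model rather than how any particular product implements it: the class name, the timeout value, and the explicit `now` timestamps (passed in so the example is deterministic) are all assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Tracks the primary's heartbeats and picks the node that should serve."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_beat = now
        self.active = "primary"

    def beat(self, now: float) -> None:
        """Record a heartbeat received from the primary."""
        self.last_beat = now

    def check(self, now: float) -> str:
        """Fail over to the standby once heartbeats have been missing too long."""
        if self.active == "primary" and now - self.last_beat > self.timeout_s:
            self.active = "standby"
        return self.active


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat(now=1.0)
first = monitor.check(now=2.0)    # last heartbeat is 1.0 s old, within the timeout
second = monitor.check(now=5.5)   # last heartbeat is 4.5 s old, standby takes over
print(first, second)
```

The sketch never fails back on its own: once the standby is active, a late heartbeat from the primary changes nothing, which mirrors the idea that failback is a separate, deliberate step.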
Other points
• High availability costs more: multiple components must be present, according to the design, to meet requirements
• It may become complex to maintain: additional CIs (configuration items) in the environment need to be kept up to date
• Hardware design must be matched by software design to fully benefit from HA
• KISS – Keep It Simple, Stupid
High availability in Microsoft SQL Server
• SQL Server, as an RDBMS, provides the means for failover scenarios
• Load balancing is difficult and must be properly designed in order to work
• High availability in SQL Server does not prevent logical data corruption; periodic DBCC checks are advised
HA methods overview in SQL Server

| Method | What it does | Standby type | # of standby nodes | Remark |
|---|---|---|---|---|
| Replication | Transfers completed transactions to standby nodes | Cold/warm | Any | Standby is accessible; may allow for updates; conflicts may appear |
| Log shipping | Performs regular log backups, copies them to the standby and restores them | Cold | Any | Database unavailable during restore; standby may be accessible |
| Database mirroring | Replays transactions as they are logged | Warm/hot | 1 | Standby unavailable; requires third server to allow hot standby |
| Failover cluster | Monitors Windows service status and transfers execution | Hot | Any | Requires shared (or replicated) storage; requires identical hardware; failover = downtime |
Replication
• Three server roles in replication:
  • Publisher
  • Distributor
  • Subscriber
• Three types:
  • Snapshot
  • Transactional
  • Merge
• Two subscription methods:
  • Push – Distributor pushes articles to Subscribers
  • Pull – Subscribers download from Distributor
Replication topology
[Diagram labels: Publisher, Distributor, Subscriber; Publisher, Subscriber (x3)]
Possible applications of replication
• Create a second copy of data to be used in case of emergency (DR)
• Create a copy of data to offload the server (load balancing)
• Allow offline users to work with data and upload their changes later (high availability)
Replication agents
• External programs that implement replication:
  • Snapshot Agent:
    • Creates snapshots
  • Log Reader Agent:
    • Reads the transaction log
    • Marks transactions for replication
  • Distribution Agent:
    • Dispatches transactions to Subscriber
  • Merge Agent:
    • Downloads remote changes and uploads local changes
    • Resolves conflicts in merge replication
Snapshot replication
• Publisher makes a copy of a database, which is applied at the Subscriber
• Good for small, static data:
  • The whole snapshot is applied every time; changes made after a snapshot arrive with the next snapshot
  • Requires sufficient bandwidth
Transactional replication
• Starts with a snapshot
• Transactions are recorded at Publisher and replayed at Subscriber
• May allow for updatable subscriptions
• If Subscriber is offline, records are stored at the Distributor
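The store-and-forward role of the Distributor can be pictured with a toy Python model. It says nothing about how SQL Server actually stores replicated commands; the class, its methods, and the sample transactions are invented purely to show the queueing idea.

```python
from collections import deque


class Distributor:
    """Holds replicated transactions until the Subscriber can take them."""

    def __init__(self):
        self.pending = deque()

    def record(self, transaction: str) -> None:
        """Called for each transaction committed at the Publisher."""
        self.pending.append(transaction)

    def deliver(self, subscriber: list, online: bool) -> int:
        """Replay pending transactions in commit order; return how many were sent."""
        sent = 0
        while online and self.pending:
            subscriber.append(self.pending.popleft())
            sent += 1
        return sent


distributor = Distributor()
subscriber_rows = []

distributor.record("INSERT order 1")
distributor.deliver(subscriber_rows, online=False)  # Subscriber offline: nothing is sent
distributor.record("UPDATE order 1")
distributor.deliver(subscriber_rows, online=True)   # backlog is replayed in order
print(subscriber_rows)
```

Because the queue is first in, first out, the Subscriber sees the changes in the order they were committed, even after a period offline.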
Merge replication
• Starts with a snapshot
• Merges changes between Publisher and Subscribers
• Allows synchronization via HTTPS (since SQL Server 2008)
• Allows the most autonomous design, e.g. mobile users or multiple branch offices working on the same data
Replication how-to
• Configure Distributor
• Configure Publisher:
  • Select replication type
  • Select articles to be published
  • [Optional] Set up article filtering
  • Set up security
• Configure Subscribers:
  • Connect to Distributor
  • Select subscription method
• Apply snapshot
• [Transactional/merge] Synchronize changes
Failover in replication
• Stop subscription
• Redirect all traffic from Publisher to Subscriber:
  • Change application connection strings
  • Change DNS aliases, if required, or
  • Change IP addresses
Failback in replication
• After restoring the Publisher, restore a copy of the database from the Subscriber
• Redirect all traffic from Subscriber back to Publisher
• Re-establish replication
Log shipping
• Keeps a standby by automating the backup, copy and restore process
• Three server roles in log shipping:
  • Primary
  • Secondary
  • Monitor
How it works? [1/2]
• Restore a full backup from Primary to Secondary and then:
  • A job on Primary backs up the transaction log
  • A second job copies the log backup to Secondary
  • A third job on Secondary restores the log after it is copied
• [Optional] Monitor server tracks performance and incidents
How it works? [2/2]
[Diagram labels: Primary, Secondary, Monitor]
Failover in log shipping
• Copy transaction log backups from primary to secondary
• Back up the tail of the log on primary
• Restore all backups except the tail-log with NORECOVERY
• Restore the tail-log with RECOVERY
• Disable log shipping jobs
• Redirect client traffic to secondary
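The backup and restore part of these steps can be sketched as a small Python helper that only builds the T-SQL text, in the order it would be run. The database name and file paths in the usage lines are hypothetical, and taking the tail-log backup WITH NORECOVERY (which leaves the old primary in the restoring state) is an assumption of this sketch, not something the slide specifies.

```python
def failover_script(database: str, log_backups: list, tail_backup: str) -> list:
    """Return T-SQL statements for a log shipping failover, in execution order."""
    statements = [
        # On the primary: back up the tail of the log.
        # Assumption: NORECOVERY is used so the old primary stops accepting changes.
        f"BACKUP LOG [{database}] TO DISK = N'{tail_backup}' WITH NORECOVERY;"
    ]
    # On the secondary: apply the outstanding log backups without recovering.
    for path in log_backups:
        statements.append(
            f"RESTORE LOG [{database}] FROM DISK = N'{path}' WITH NORECOVERY;"
        )
    # On the secondary: the tail-log backup goes last and brings the database online.
    statements.append(
        f"RESTORE LOG [{database}] FROM DISK = N'{tail_backup}' WITH RECOVERY;"
    )
    return statements


# Hypothetical database name and backup paths.
script = failover_script(
    "Sales",
    [r"\\share\logs\Sales_0900.trn", r"\\share\logs\Sales_0915.trn"],
    r"\\share\logs\Sales_tail.trn",
)
for statement in script:
    print(statement)
```

Disabling the log shipping jobs and redirecting clients are left out because they depend on the environment; the helper only makes the NORECOVERY/RECOVERY ordering explicit.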
Drawbacks of log shipping
• You cannot skip a transaction log backup: the restore chain must be unbroken
• The network traffic generated has to be considered
• Secondary is always behind Primary
• Secondary is read-only
Database mirroring
• Keeps your standby up to date
• Allows automatic failover
• Cost-effective alternative to clustering
• Available in Standard Edition (2005 – 2008 R2)
• Does not require cluster-capable hardware
• Can be implemented when Windows Authentication mode is not possible (using certificates)
How it works?
[Diagram labels: Principal, Mirror, Witness]
Database mirroring modes
• High availability (with witness)
  • Automatic failover
  • Synchronous transaction commit (principal commits after mirror confirms its commit)
• High protection (without witness)
  • Manual failover
  • Synchronous transaction commit
• High performance (without witness)
  • Manual failover
  • Asynchronous transaction commit
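The three modes can be captured in a small lookup, sketched here in Python. The table structure and the helper name are invented for this example; the generated statement uses the `ALTER DATABASE ... SET PARTNER SAFETY` option, on the assumption that synchronous commit corresponds to SAFETY FULL and asynchronous commit to SAFETY OFF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirroringMode:
    witness: bool
    automatic_failover: bool
    synchronous: bool


# Mode names and properties as listed on the slide.
MODES = {
    "high availability": MirroringMode(witness=True, automatic_failover=True, synchronous=True),
    "high protection": MirroringMode(witness=False, automatic_failover=False, synchronous=True),
    "high performance": MirroringMode(witness=False, automatic_failover=False, synchronous=False),
}


def safety_statement(database: str, mode: str) -> str:
    """Build the T-SQL that sets the transaction safety level for a mode."""
    level = "FULL" if MODES[mode].synchronous else "OFF"
    return f"ALTER DATABASE [{database}] SET PARTNER SAFETY {level};"


# "Sales" is a hypothetical database name.
print(safety_statement("Sales", "high performance"))
```

The lookup makes the trade-off visible: only the mode with a witness offers automatic failover, and only the asynchronous mode avoids waiting for the mirror on every commit.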
Manual failover in database mirroring
• Can be done with one mouse click in SSMS
• Requires client traffic redirection:
  • Possible within the connection string, using the Failover Partner keyword
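A connection string that names a failover partner might be assembled as in the Python sketch below. It uses the ADO.NET-style keywords `Data Source`, `Failover Partner`, `Initial Catalog` and `Integrated Security`; other drivers spell these keywords differently, and the server and database names are hypothetical.

```python
def mirrored_connection_string(principal: str, mirror: str, database: str) -> str:
    """Build an ADO.NET-style connection string that names the mirror server."""
    parts = {
        "Data Source": principal,
        "Failover Partner": mirror,
        # The mirrored database is named explicitly in this sketch.
        "Initial Catalog": database,
        "Integrated Security": "True",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical server and database names.
connection_string = mirrored_connection_string("sql-a", "sql-b", "Sales")
print(connection_string)
```

With such a string the client library can try the partner when the first server does not answer, so the application itself does not need to be reconfigured after a failover.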
Automatic failover in database mirroring
• Initiated automatically when the mirror and the witness agree that the principal is lost:
  • If principal is unavailable, fails over to mirror
  • Does nothing if mirror becomes unavailable
  • Also fails over if principal is up but unreachable over the network!
• Requires client traffic redirection:
  • Possible within the connection string, using the Failover Partner keyword
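The failover decision can be approximated with a small Python predicate. This is a simplified model under stated assumptions, not the actual algorithm: it assumes failover happens only when the mirror and the witness can still reach each other, neither can reach the principal, and the database was synchronized; all parameter names are invented.

```python
def should_fail_over(mirror_sees_principal: bool,
                     witness_sees_principal: bool,
                     mirror_sees_witness: bool,
                     synchronized: bool = True) -> bool:
    """Decide whether the mirror should take over the principal role."""
    # The principal counts as lost only if neither partner can reach it.
    principal_lost = not mirror_sees_principal and not witness_sees_principal
    # The mirror needs the witness on its side before it may take over.
    return principal_lost and mirror_sees_witness and synchronized


# Principal down or cut off from the network: mirror and witness agree.
print(should_fail_over(False, False, True))
# Mirror isolated while the witness still sees the principal: no failover.
print(should_fail_over(False, True, False))
```

The model cannot tell a crashed principal from one that is merely unreachable, which is why a network partition around a healthy principal also leads to failover.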
Failover clustering
• Provides protection at the server level:
  • Automatic failover in case of server failure
  • Fails over logins, endpoints and jobs
• Combines multiple machines (nodes) into a single virtual server
• Requires cluster-capable hardware:
  • Shared or common storage
  • Certified server hardware
Clustering
[Diagram labels: Node A, Node B, User, Storage, Cluster]
Failover in a cluster
[Diagram sequence of three frames, each labeled: Node A, Node B, User, Storage, Cluster]
Summary

| Method | What it does | Standby type | # of standby nodes | Remark |
|---|---|---|---|---|
| Replication | Transfers completed transactions to standby nodes | Cold/warm | Any | Standby is accessible; may allow for updates; conflicts may appear |
| Log shipping | Performs regular log backups, copies them to the standby and restores them | Cold | Any | Database unavailable during restore; standby may be accessible |
| Database mirroring | Replays transactions as they are logged | Warm/hot | 1 | Standby unavailable; requires third server to allow hot standby |
| Failover cluster | Monitors Windows service status and transfers execution | Hot | Any | Requires shared (or replicated) storage; requires identical hardware; failover = downtime |
Discussion
THANK YOU!