Module 1 Application HA With SQL, Exchange and Other Servers


Transcript of Module 1 Application HA With SQL, Exchange and Other Servers

  • Slide 1

    This video is part of the Microsoft Virtual Academy.

  • Slide 2

    In this session we are going to be diving deeper into understanding Microsoft's high availability solutions. Part one of this series looked at the application infrastructure, meaning failover clustering, virtualization and some of the other key infrastructure components. Part two is going to look at the applications which run on top of this infrastructure. We're going to spend most of our time looking at SQL Server and Exchange Server, and then briefly cover some of the other server high availability solutions.

    I'm Symon Perriman, and I'm going to be joined by SQL program manager Justin Erickson and Exchange technical writer Scott Schnoll in this session.

  • Slide 3

    Learn about Microsoft's different high availability technologies and when to use each of them.

    High availability is important because it keeps our applications up and running, not only for availability but also to make sure that our customers are happy; by maintaining continual service we can keep our customers connected in a 24/7 marketplace. This session will specifically focus on the application layer. We've covered the core infrastructure in part one of this video series, and part three will look at management, focusing on System Center.

  • Slide 4

    I'm now going to turn it over to Justin Erickson, Senior Program Manager with the SQL team. Justin.

    Justin: Hello everyone, I'm Justin Erickson. I'm a program manager on the SQL Server database engine team.

  • Slide 5

    So let's quickly go through the introduction to each one of the technologies; if you have questions, there are sessions throughout the SQL Server track that go into more detail on a lot of the high availability technologies.

    The key thing that I'd want to point out is that when you look at what comprises database downtime, there are two big portions. You have unplanned downtime, where I actually have a failure or a user caused an issue and I have to move to a different system, and there's also planned downtime, where I'm doing an application upgrade, I'm doing a patch, or I'm just trying to maintain the SLAs that I need for my system throughput. So we look at all of these drivers as we look at what makes up SQL Server availability technology.

  • Slide 6

    And so the gamut of technologies that we have, looking at existing releases as well as what's coming up in the SQL Server Denali release with AlwaysOn, is listed over here. I'll walk through each one of these and talk about what each technology builds on top of the previous one, and you'll see that there's a sequence running through backup and restore, log shipping and database mirroring, which is sort of the same technology being built up incrementally to give you better SLAs. There are also technologies like replication, which sort of fit into this space, and failover cluster instances, which use a lower level of data protection with SANs and shared storage, with SQL Server failing over along with that shared storage. And then we'll end by talking about some of the ways to manage downtime as well, which is the majority of the downtime that you'll see.

    So backup and restore is the most basic technology. Regardless of what technology you're using on top of it, it's always a good idea to have a physical backup of your database so you can go and recreate the entire system from scratch should your high availability system go down, your entire data center go down, or whatever other issue requires you to go back to a point in time; backup and restore is your base set of technologies there.
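    As a rough illustration of that base layer (not something shown in the video), here is a minimal backup and restore sketch; the instance names, database name and backup path are hypothetical, and it assumes the SQLPS module's Invoke-Sqlcmd cmdlet is available.

```powershell
# Minimal sketch: full backup of a database, then a restore onto a standby server.
# Assumes the SQLPS module is available; server names, database name and paths are examples only.
Import-Module SQLPS -DisableNameChecking

# Take a full backup on the primary instance.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query @"
BACKUP DATABASE [SalesDB]
TO DISK = N'\\backupshare\sql\SalesDB_full.bak'
WITH INIT, CHECKSUM;
"@

# Restore it on a standby instance, leaving it ready to accept further log restores.
Invoke-Sqlcmd -ServerInstance "STANDBY\SQL1" -Query @"
RESTORE DATABASE [SalesDB]
FROM DISK = N'\\backupshare\sql\SalesDB_full.bak'
WITH NORECOVERY, REPLACE;
"@
```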

  • Slide 7

    When you look at the downtime of the backup and restore solution, though, if something goes down you need to use the backup and restore process to get your system back up and running. You're now doing a full installation of that system and applying the restore, or maybe you have the system there but you are restoring it from scratch, which, if you're looking at terabyte-sized databases, could take you a good amount of time.

    And that's where a system like SQL log shipping comes into place. This is basically an automated backup and restore process: you have transactions that are coming into your primary system, you have log backups that happen on a periodic basis through the agent job schedule, and that job is just copying backups out. There's another job that will copy the backups over to the secondary system, and finally a third job that goes through the restore process. So this is basically doing the backup and restore without waiting for a failure; it says I'm going to have the system ready to go, constantly running the backup job, the copy job and the restore job, so when I have that failure the system is ready and I just have to apply whatever logs I haven't applied at that time.

    And there's a nice wizard in SQL Server Management Studio to help you set this up and determine what intervals you want to configure this on, based on your needs.
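    To make the three jobs concrete, here is a hand-run sketch of what the backup, copy and restore steps do. In a real deployment the wizard creates SQL Server Agent jobs for these steps; the server names, share paths and database name below are hypothetical.

```powershell
# Sketch of the three log shipping jobs (backup, copy, restore), run by hand for illustration.
# Assumes the SQLPS module; names and paths are examples only.
Import-Module SQLPS -DisableNameChecking

# 1. Backup job on the primary: take a transaction log backup to a share.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query @"
BACKUP LOG [SalesDB] TO DISK = N'\\primary\logship\SalesDB_001.trn' WITH INIT;
"@

# 2. Copy job: move the log backup to the secondary server's folder.
Copy-Item "\\primary\logship\SalesDB_001.trn" "\\standby\logship\"

# 3. Restore job on the secondary: apply the log, keeping the database ready for more restores.
Invoke-Sqlcmd -ServerInstance "STANDBY\SQL1" -Query @"
RESTORE LOG [SalesDB] FROM DISK = N'\\standby\logship\SalesDB_001.trn' WITH NORECOVERY;
"@
```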

  • Slide 8

    The next technology takes log shipping, which was the automated backup and restore process, and builds it into the engine. This looks at how to stream the log records directly, and now, because it's built into the engine, we can go and do things like provide synchronous commit, where I can make sure my secondary system is fully up to date with the primary, so when I'm failing over there's zero data loss. The way that this works is your application is coming in and committing a set of transactions; at the time that we write the transactions locally to our log file, we're sending them over to the secondary. And if you're in synchronous mode, we'll write them to the log file on the secondary side, send back an ack, and only then will we go and tell the application that, hey, your transaction has been committed. This means that at any point in time when we execute that failover, my secondary is fully up to date with the primary. Of course you don't have to run in synchronous mode; you can always run in asynchronous mode, where the primary is just sending log records over as fast as it can without waiting for the acknowledgement, so the primary continues to go ahead and I'm not slowing down the workload. That ends up being a choice depending on what your SLA needs are.
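    As a small illustrative sketch (not from the session), here is how a database mirroring session might be flipped between synchronous and asynchronous commit once the partnership is already established; the instance and database names are hypothetical.

```powershell
# Sketch: switch an existing database mirroring session between synchronous and asynchronous commit.
# Assumes the mirroring session (endpoints, restored mirror copy, SET PARTNER) already exists.
Import-Module SQLPS -DisableNameChecking

# Synchronous mode: commits wait for the mirror's acknowledgement, so failover loses no data.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query "ALTER DATABASE [SalesDB] SET PARTNER SAFETY FULL;"

# Asynchronous mode: the principal does not wait for the mirror, trading possible data loss for throughput.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query "ALTER DATABASE [SalesDB] SET PARTNER SAFETY OFF;"
```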

  • Slide 9

    Another technology which isn't really built to be a high availability technology, but is often used in high availability scenarios, is replication, and the reason it is typically used is the desire to get extra utilization out of that hardware. When I'm using a system like log shipping or database mirroring, I have a mirror or secondary that's sitting there waiting in the event of a failure, and sometimes we hear from a customer, well, if I have that hardware I also want to do something with it. There are scenarios where customers have used replication in the past because it not only allows you to send the data to your secondary but also to read the data from the secondary, for doing reporting or offloading other workloads. And I'll talk in a second about how AlwaysOn availability groups take away this need, so as we go forward we're simplifying our technology stack.

  • Slide 10

    So for SQL Server AlwaysOn, in the upcoming release of SQL Server, we took a holistic look at what we do with high availability and figured out how to build an integrated, flexible, efficient, single solution to meet your high availability needs, rather than the previous set of technologies that you put together to build a solution. And from that we came up with two main feature areas. We have AlwaysOn availability groups, which provide database protection where SQL Server is doing the data replication, similar to database mirroring and log shipping. And we have AlwaysOn failover cluster instances, which allow customers to use their existing infrastructure and provide data protection at the lower layer of the hardware stack, using the SAN and the shared storage for the data protection, with SQL Server failing over between the nodes. Failover cluster instances is a technology that existed in previous releases, but it was enhanced in Denali with multi-site clustering, a flexible failover policy that provides a better set of health detection and diagnostic infrastructure, and improved failover times with indirect checkpoints.

    AlwaysOn availability groups is a new feature in Denali that replaces database mirroring. It provides a multi-database failover unit and multiple secondaries, so I don't need to combine database mirroring and log shipping, as well as active secondaries that I can now read from and take backups from, so replication doesn't end up being in the high availability mix.

    As we look at the additional feature set we provided here, we also looked into providing an integrated HA management solution.
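    For a rough idea of what an availability group definition can look like in Denali, here is a hedged sketch with one synchronous and one asynchronous replica; the replica names, endpoint URLs and database name are hypothetical, and it assumes the AlwaysOn feature is enabled on both instances and the database has already been seeded on the secondary.

```powershell
# Sketch: create an availability group with a synchronous and an asynchronous secondary (names are examples).
# Assumes HADR is enabled on both instances and SalesDB is restored WITH NORECOVERY on the secondary.
Import-Module SQLPS -DisableNameChecking

Invoke-Sqlcmd -ServerInstance "NODE1" -Query @"
CREATE AVAILABILITY GROUP [SalesAG]
FOR DATABASE [SalesDB]
REPLICA ON
  N'NODE1' WITH (ENDPOINT_URL = N'TCP://node1.contoso.com:5022',
                 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC),
  N'NODE2' WITH (ENDPOINT_URL = N'TCP://node2.contoso.com:5022',
                 AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL);
"@

# On the secondary, join the availability group and the seeded database.
Invoke-Sqlcmd -ServerInstance "NODE2" -Query @"
ALTER AVAILABILITY GROUP [SalesAG] JOIN;
ALTER DATABASE [SalesDB] SET HADR AVAILABILITY GROUP = [SalesAG];
"@
```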

  • Slide 11

    So what is a SQL Server failover cluster instance? This is similar to what Symon went through with the other clustering technologies. With AlwaysOn failover cluster instances we use a shared disk to do the data protection, so each one of the machines is accessing the same set of files, and when we're failing over we're moving access to that same data file over to another machine and having SQL Server start up on that side. So on the SQL side we're providing protection of the binaries and processes between machines, while relying on external SAN technologies to provide protection of the database files themselves.

  • Slide 12

    WSFC = Windows Server Failover Clustering

    WSFC vs. FCI

    Scoping: No replicas on same node (Hyper-V)

    WSFC for:

    1. Primary selection and coordination

    2. Primary health detection

    3. Distributed changes and truth

    Secondary health driven from primary (no impact to primary)

    SQL Server AlwaysOn availability groups use SQL Server to provide the data protection. It's still built on top of Windows clustering, which helps us with inter-node health detection and state and configuration changes across the system, but it does not rely on any SAN or shared storage infrastructure; SQL Server is providing the data protection. We have collections of databases moving between the nodes rather than the binaries and services moving between them.

  • Slide 13

    When we look at AlwaysOn as a comprehensive solution, it's built to be able to meet combinations of needs. In some cases you're looking at using shared storage and SANs for your data protection within your data center, and availability groups between data centers, which is like the picture on the right. In other cases you don't have any investment in shared storage and you want a cheaper solution that provides a faster failover, and that's where AlwaysOn availability groups come in. And so you can mix and match these technologies to meet your needs, whatever they are.

  • Slide 14

    Another common question that comes up is, well, what about virtualization, how does that fit into the mix? Virtualization is often used in consolidation scenarios with SQL Server, and virtualization on its own does provide some high availability guarantees as well. So when you look into virtualization, you need to look at both planned and unplanned downtime, at the host as well as at the guest level. For planned downtime at the host, virtualization provides live migration, where you can move VMs between hosts with zero downtime, and that's the best solution there. If you have an unplanned event at the host level, that's when you're failing over the entire VM and doing an OS restart, so it provides some protection there, but you'll have a slower recovery time. If you have failures at the guest level, that's where virtualization doesn't provide any protection: if I have database file corruption, or the binaries themselves within that OS get corrupted for whatever reason, just using virtualization is not providing protection there. That's where you're falling back to backup and restore, or you can use an additional technology at the guest level, and together they give you the best of both worlds in a solution. Similarly, at the planned level, when I'm patching the guest OS you're having downtime during the patch unless you have another technology within the guest OS to protect you. So when you look at when to use virtualization alone and when to add a high availability technology: if you look at these sets of requirements and your customer looks at their requirements and says this isn't enough, that's when it's worth going and investing in that complexity. If virtualization meets your SLAs against these sets of requirements, then it's good to stick with virtualization as your core technology rather than biting off the additional complexity of adding another solution into the guest. And all of our technologies will work through virtualization.

  • Slide 15

    So that gives you a quick introduction to our unplanned downtime features.

    When we look into planned downtime there are other things to consider: how do I handle OS as well as SQL Server upgrades? That's where each one of the technologies has a rolling upgrade story, where I can upgrade the mirror, fail over to the secondary, patch the old primary and then fail back if I need to. Online operations are another key thing: if I'm doing an application change where I'm actually changing the database structures or adding new data to the system, online operations, which are enhanced in Denali, will allow you to make these changes without impacting the currently running workloads. There are new enhancements in SQL Server Denali where we can do more online index builds on indexes with LOB and large data types, as well as adding non-nullable columns online, which was not available in previous releases. Along with this, a lot of times you look at what other sources impact your SLAs. If I'm building my SLAs as a business, I'm looking not only at what happens in the event of a failover when the system goes down, but also at whether my system is able to respond at the throughput I need. That's where Resource Governor is a great technology; it allows you to throttle workloads to reserve capacity for your core workloads. So I can say that I want to reserve 80% of my CPU for my core workload, allowing lower priority workloads to still run on that same box but restricting them to a certain set of resources. That's where Resource Governor is a great technology to keep your lower priority workloads from impacting the SLAs of your most critical workloads.
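    As an illustrative sketch of that idea (not taken from the session), here is one way Resource Governor could be configured to reserve CPU for a core workload; the pool, group, function and login names are hypothetical.

```powershell
# Sketch: reserve CPU for the core workload and cap a reporting workload with Resource Governor.
# Pool, group, function and login names are examples only; assumes the SQLPS module.
Import-Module SQLPS -DisableNameChecking

# Create the resource pools and workload groups.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Database "master" -Query @"
CREATE RESOURCE POOL CorePool    WITH (MIN_CPU_PERCENT = 80);
CREATE RESOURCE POOL ReportPool  WITH (MAX_CPU_PERCENT = 20);
CREATE WORKLOAD GROUP CoreGroup   USING CorePool;
CREATE WORKLOAD GROUP ReportGroup USING ReportPool;
"@

# Classifier function (lives in master): route the reporting login to the capped group.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Database "master" -Query @"
CREATE FUNCTION dbo.rgClassifier() RETURNS sysname WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'report_user' THEN N'ReportGroup' ELSE N'CoreGroup' END;
END;
"@

# Point Resource Governor at the classifier and apply the configuration.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Database "master" -Query @"
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rgClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
"@
```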

  • Slide 16

    That was a bit of a flash introduction to SQL Server; now I'll hand it over to Scott to talk about Exchange.

    Scott: My name is Scott Schnoll. I'm a Principal Technical Writer on the Exchange team; among other things I write all the product documentation around high availability, site resilience, disaster recovery and a few other areas, so I'm really excited to talk to you about it.

  • Slide 17

    I do want to tell you, though, that Exchange does things a little differently from what you've heard until now. We do use failover clustering technologies, but we don't use any shared storage, we don't use the resource model, and in fact we're just more of a consumer of cluster technologies, as you'll see in a minute. We also have in Exchange a very specific definition of high availability. To have true high availability for an Exchange server you must meet three criteria: you must have service availability, data availability and automatic recovery from most failures. We say most failures because you're not going to get automatic recovery from all failures; for a data center level event, for example, you wouldn't get automatic recovery. We have mechanisms to do manual recovery for that, but that's not an automatic solution, and that would be a DR process, not a high availability process.

    The other thing I want to tell you about is that we use this acronym called *overs a lot, and that really is just our shorthand notation for switchovers and failovers. Failovers we've been talking about a lot; a failover is simply when the system takes the automatic corrective action for you, and a switchover is when an administrator manually activates, for instance, a passive copy of an Exchange database.

    And then we have site resilience as well. Site resilience and HA are unified into a single platform inside of Exchange 2010, for example, but they are different operations with different configurations, as you'll see here in a minute. Site resilience is that DR-type configuration that you do to protect yourself when you have multiple data centers and you want redundancy across those data centers.

  • Slide 18

    Now, we actually introduced both service availability and *over capabilities way, way back in Exchange 5.5, but back in those days we were using Microsoft Cluster Server and NT 4, we were using the cluster resource model, and many of our core components were cluster aware; Exchange knew it was being installed in a cluster and it did something a little different from an unclustered Exchange server.

    We also, back at that time, relied very heavily on third party partner products. We didn't have any built-in data replication whatsoever, so we had no native data availability in Exchange and instead relied on hardware vendors, storage vendors and replication vendors to make copies of our data for us.

    In Exchange 2007 we took a very revolutionary leap forward; we started breaking away from the old legacy way of doing Exchange clustering. We still supported the old style of Exchange clustering where you use shared storage, but we gave it a different name; we called it a single copy cluster, to reflect that in that cluster you only had one single copy of your data. In 2007 we also introduced a second form of Exchange clustering called cluster continuous replication, and that in fact is when we introduced our continuous replication, or what we call log shipping, technology.

  • Slide 19

    We actually have three different forms of continuous replication in Exchange 2007. One is local, where you're just shipping a copy of the logs to a database that's connected to the same server as your active copy. We also have cluster continuous replication, where every database you had on an active node was being replicated to a passive node, and you always had them in pairs. And then we had one called standby continuous replication, which we introduced in Service Pack 1 for Exchange 2007, and what that did was allow you to replicate data pretty much anywhere: from a standalone mailbox server to another standalone server, from a cluster to a standby cluster, and so forth. In fact, as Exchange 2007 evolved and matured, it became pretty much the de facto configuration or architecture to use a combination of cluster continuous replication for high availability within the data center and standby continuous replication to get you site resilience for that data center as well.

    And so this is basically what it looks like. The information store in Exchange is doing what it's done since day one: it generates log files, and as those log files are closed they're copied over to the other copy of the database, they're inspected by the other copy of the database, and assuming they pass inspection they then get replayed into that copy, thereby making that copy pretty much an up to date, bit for bit duplicate of the original active copy.

  • Slide 20

    Now, this is typically what it would look like when you see it in an organization topology. Here I've got two separate CCR clusters; remember, CCR was always a pair of two, an active and a passive, so I've got two separate clusters. I've got some Outlook, Outlook Web App and ActiveSync clients out there; they're going through our front end component called a Client Access server in 2007 and later, or in the case of Outlook going directly to the information store and talking to it, and basically we would replicate one for one within these pairs. If you wanted to extend that solution to another data center you used a separate technology, standby continuous replication, and that actually worked really well, but it had some challenges. It still worked, it got the data over there, you had a standby server, maybe a standby cluster, and so if you had any problem with your primary site, in this case San Jose, you could go ahead and activate the Dallas site, get your clustered mailbox server up and running, and life was good.

    There were some challenges, though. When you clustered the mailbox role in 2007 it couldn't co-exist with any other server roles, such as the Client Access role, the transport role or the Unified Messaging role; it only allowed you to use the mailbox role in the cluster, so you had to buy extra hardware for the other roles. That meant, at a minimum, if you wanted high availability for Exchange 2007 you had to buy at least four servers. Another challenge was that you had to have some clustering knowledge, and that might not seem like a big deal if you've been doing it for a long time, but most of the administrators who manage Exchange solutions are Exchange pros, not cluster pros. And so sometimes it was challenging for them to build the underlying cluster correctly before they would deploy Exchange. This wasn't so much true in the CCR paradigm, but it was especially true in the other type of cluster we had in 2007, the single copy cluster, where you also had to deal with the shared storage and the interconnects and getting all of that just right. Another challenge was that even though in 2007 we supported 50 databases per server, if you had a problem with just a single database on that server you had to fail over the whole clustered mailbox server; the entire Exchange server's network identity had to be moved to another server, even if you only had one problematic database out of 50, so that wasn't very optimal.

    We did introduce SCR, so finally people had a built-in way to get data replicated outside of the cluster and offsite to a different data center, but we introduced it in a service pack, and typically when we introduce major features in a service pack we don't put GUI around them. That meant if you wanted to manage SCR you had to do it all from the Exchange Management Shell, which is the PowerShell-based console; you couldn't use the Exchange Management Console, which is an MMC snap-in where you click on pictures and so forth. So administrators had to learn to manage CCR one way and manage SCR a completely different way. And then the last challenge was that even after you got the data over there, it was a pretty complex activation process that you had to go through: there were many, many steps that involved moving the clustered mailbox server itself and forklifting it over to the recovery server. That took time, and for some administrators it was confusing because of the different technologies. So we looked at all of this and came up with a whole new solution in Exchange 2010.

  • Slide 21

    And in fact Exchange 2010 is very different from anything that we've done in the past. First of all, there's no more clustered mailbox server. We don't use the cluster resource model anymore or, put slightly differently, the cluster has no idea that we're even there. But we know the cluster is there, because we use it: we use the cluster's node and membership APIs so that we can join the servers together in a group; we also use the cluster's heartbeating technology, which is very mature, proven technology and allows us to find out when servers are dropping off the network. And of course we use the cluster database, because there's data that we need to share between the members of the solution, and we need to share it very quickly, much more quickly than if we were to store it in Active Directory and wait for it to be replicated across.

    So what you're seeing now is a representation of a new construct that we call a database availability group, or DAG for short. A DAG is simply a collection of mailbox servers, in this case five mailbox servers, that host replicated databases. For example, if you look at DB1, you can see that DB1 is shown in green on mailbox server 1; green means in this case it's the active copy. Then we've got DB1 on mailbox server 2 and DB1 on mailbox server 4 in blue; those represent passive copies of the database, copies that the system keeps up to date and maintains itself and that are waiting to become active in the case of some sort of failure affecting the active database. We also made another architectural change where all clients, including Outlook MAPI clients, no longer connect directly to the information store. Instead they now connect to a set of services on the Client Access server: one is the Address Book service, where they get their directory information, and the other is the RPC Client Access service, where they get their MAPI endpoint. So all Outlook knows is that it's got its MAPI and directory endpoints; it has no idea it's talking to a Client Access server and not a mailbox server anymore.

    So you can see here I have the option to replicate databases as I see fit. It's not like CCR, where every database you have on the active node gets replicated to the passive node. It's more like SCR in this case, in that the administrator gets to choose which databases get replicated and to where. In this case the administrator only wanted three copies of DB1, so we spread them across mailbox servers 1, 2 and 4. Similarly, you can see on mailbox server 1 that DB1 and DB3 are both green; those are the active copies, but mailbox server 1 also hosts a passive copy of DB2. Again, this is another departure from our previous model, where you had only active instances on one server and only passive instances on another server. Now you can have active and passive copies of multiple databases on multiple servers, as you see here.
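    As a hedged sketch of how a layout like this might be built from the Exchange Management Shell, here are the basic cmdlets; the DAG name, witness settings and exact server and database names are hypothetical examples.

```powershell
# Sketch: build a DAG and add database copies, from the Exchange 2010 Management Shell.
# DAG, server, witness and database names are examples only.
New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer HUB1 -WitnessDirectory C:\DAG1

# Add the mailbox servers that will participate in the DAG.
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX2
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX4

# Seed passive copies of DB1 on two other servers, as in the slide's example layout.
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2 -ActivationPreference 2
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX4 -ActivationPreference 3
```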

  • Slide 22

    Changing to this model changed everything from a failover perspective, because we don't have a clustered mailbox server anymore and we don't have a network identity to move anymore; we now only have to move the designation of the active copy. And I say move the designation of the active copy because we're not really picking up a database and moving it. All we're saying is: you're active now, you're passive now; you had a problem, so you're active now and you're passive now. It's that simple. Failover is now managed completely within Exchange because there is no cluster resource model; if you open up Failover Cluster Manager on a mailbox server that's a member of a DAG and you look under services and applications, you're not going to see anything. There's no Exchange group, there are no Exchange resources, no IP addresses, no storage groups, no databases, no information store, no system attendant, nothing; we don't use the cluster resource model anymore. That means, though, that we had to have some mechanism to handle failover within Exchange; in previous versions, if we had a problem, the cluster moved the resource over to another node for us. Now we have a brand new component inside of Exchange called Active Manager, and Active Manager runs in a key service on these mailbox servers called the Microsoft Exchange Replication service. It's the same service we introduced in 2007 to do log shipping and CCR and SCR, but now there's a new component that runs inside that service called Active Manager, and that's the brain of the Exchange solution. Active Manager is not only responsible for managing everything, but it's also responsible for initiating the corrective action when some sort of failure occurs. So say for instance the disk hosting DB1 just dies; we're not using RAID in this case, so the disk dies and the database is gone with it. Active Manager detects that and will automatically fail over the active copy to one of the other passive copies, whichever one it believes to be the best, most up to date, healthy copy.

    I mentioned all clients connect via CAS, so the system works somewhat like this. I've got a client out there; it might be Outlook, it might be Outlook Web App, it might be ActiveSync, we don't know, it's just a client accessing the system and getting messages in. There's an Active Manager client that also runs inside the Client Access server, and that knows where the user's database is located, so users only connect to CAS, and it's CAS that talks RPC/MAPI to the information store. Users don't talk to the information store directly anymore; we've abstracted the user connection away from the information store so that we can get fast failover when one of these databases has to fail over.

  • Slide 23

    So messages come in, they go to the appropriate database, and then the log files representing those messages get replicated to the copies of that database. The diagram uses message icons, but it's not the actual messages that we replicate; it's the transaction log files generated by the Exchange database engine itself that get replicated.

    So say we have a failure affecting database 1: database 1 disappears for whatever reason, maybe it's the storage, maybe it's some sort of corruption, we don't know, database 1 is gone. What happens, in 30 seconds or less, is that a new active copy replaces the failed active copy by choosing the best available passive copy to activate, and you can see that under mailbox server 2 the copy of DB1 has now gone green. In this case the system decided that the best copy was on mailbox server 2. Notice the client still stays connected to CAS even though their underlying database went away; CAS understands what's going on because of the Active Manager client, so Active Manager says, no, your database is now over here, I'm going to connect CAS to mailbox server 2, and the client is back in business.

    And because it happens so quickly, it's quite possible, and more often than not the case, that clients don't even notice any of this happened. They're abstracted away from it, so as their database goes away they don't get disconnected. They might get disconnected if the CAS server goes away, but we'll talk about load balancing and how to deal with that in a minute. Assuming CAS doesn't go away and it's just a failure of the mailbox server, the mailbox server's network or the mailbox server's disks, that's going to be a transparent failover to the client; they're probably not going to notice anything. Now, if they happen to be in the middle of Outlook Web App, they're composing something and they go to hit send, and in the middle of hitting send a failover occurs, they will get a message saying that their mailbox is temporarily unavailable. But if they just wait a few seconds and press F5 to refresh the browser, it will bring them right back into their mailbox and they won't even have to log on again; it's that fast. And of course mail flow will continue, because Active Manager knows where the database is located, and replication will continue as long as you have multiple copies left.
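    Failovers like this are automatic, but the switchover case Scott described earlier is administrator driven. Here is a small sketch of checking copy health and manually moving an active copy from the Exchange Management Shell; the database and server names are hypothetical.

```powershell
# Sketch: check copy health and perform a manual switchover (the administrator-driven *over).
# Database and server names are examples only; failovers themselves are initiated automatically by Active Manager.
Get-MailboxDatabaseCopyStatus -Identity DB1 | Format-Table Name, Status, CopyQueueLength, ReplayQueueLength

# Move the active copy of DB1 to mailbox server 2.
Move-ActiveMailboxDatabase DB1 -ActivateOnServer MBX2
```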

  • Slide 24

    Now, each DAG (I'm showing a five member DAG here) can have up to sixteen members, so you can have up to sixteen copies of your databases, and each Exchange server itself supports 100 databases, so you can have 1,600 databases inside a single DAG. Of course those would be non-replicated databases, but you could have 800 databases with two copies of each, 533 databases with three copies of each, and so forth. Also consider that our maximum recommended database size is now 2 terabytes per database, so you can grow this very, very large; it scales incredibly well. And in case you're wondering how well, this solution is what's running Outlook.com, Office 365 and so forth, so we're talking 75 million, almost 80 million mailboxes on this solution. The beauty of it is that the same exact commands you use to create the DAG inside a data center are the same exact ones you would use to extend it to another data center to put yourself in a site resilience configuration; it's that easy.

  • Slide 25

    Now, I mentioned the DAG, and this is basically, again, what the architecture looks like. Clients are talking to the Client Access server, specifically to the RPC Client Access service and the Address Book service, and it's the Active Manager component that tells those services where the user's mailbox is located so that CAS can talk to the mailbox server for them.

  • Slide 26

    So a DAG is simply a set of up to sixteen servers that host a set of replicated databases. You can have multiple DAGs in a single org; obviously if you need more than sixteen members you have to use a second DAG, and so forth. We do leverage the Windows failover clustering technologies, but we're not cluster aware and we don't use the cluster resource model, and the DAG itself defines the boundary for replication, so you won't be replicating outside the DAG.

  • Slide 27

    And as I mentioned, we added a second form of continuous replication in Service Pack 1, so let me briefly talk about that.

  • Slide 28

    So now we have two forms. In 2007 and in 2010 RTM we had one form of continuous replication, one form of log shipping, where we ship closed transaction log files. The active copy in green creates the log files, and then the passive copy says, hey, send me your latest log files, I've got these so far; the latest log files go over, and if the passive copy is able to keep up and catch up with the log generation activity on the active copy, which in this case it would because it now has log five, the last one generated, the system says, you know what, the database copy is up to date. The system then switches into block mode, and instead of shipping those transaction log files we actually ship blocks of ESE transactions as they're being written to the log buffer. So we write to the log buffer on the active side, and at the same time we send that information over to a corresponding buffer on the passive side and keep things up to date. Now, all continuous replication is asynchronous, so we don't wait for acknowledgement from the other side; there is potential for data loss, but we've got other mechanisms built into the system to get that data back. But again, now we have the ability to replicate blocks; we don't have to wait for a transaction log file to be closed in order to externalize that data, which means the amount of losable data has substantially decreased with Service Pack 1 as a result of databases being able to leverage block mode. And of course, once the buffer is full, it generates the corresponding log file on each side; the log is built and inspected separately and then, of course, replayed into the copy of the database. We also have a mechanism whereby, if we only get a partial buffer and then the active copy goes away, we'll actually use that: we take what we call a log fragment and convert it into a full log, and if there are usable transactions in there, we'll play those transactions against the database and at least recover the data that we were able to get over.

  • Slide 29

    We also have this concept of a lagged database copy, which is a database copy for which you can delay the replay of log files for up to 14 days. So think of it as a point in time backup of your database going back up to 14 days. Don't go beyond 14 days; we have a hard coded limit of 14 days for the lag, but it's basically there to provide you with a maximum of 14 days of protection against things like logical corruption. If you have physical corruption in your store, that's not going to be a problem, because continuous replication will detect it and will block physical corruption from being replicated to another database. For logical corruption there's no way for the system to tell, and so as a fallback mechanism you have the ability to delay replay into a passive copy, so that if you do detect logical corruption reported by an end user you can go and activate a copy at a point in time before that corruption took place. And of course lagged copies will affect storage design, since you're holding on to those log files, so you are going to need to size the storage appropriately, but that's something else we have as a protection.
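    As a small hedged sketch, a lagged copy might be added like this; the server name, database name and lag values are hypothetical examples.

```powershell
# Sketch: add a lagged copy of DB1 with replay delayed by 14 days (the maximum) and a short truncation lag.
# Server, database and lag values are examples only.
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX5 -ReplayLagTime 14.00:00:00 -TruncationLagTime 1.00:00:00
```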

  • Slide 30

    Now, load balancing has changed a little bit with Exchange 2010 as well. In 2007 most customers were used to doing load balancing for the reverse proxies, so that traffic coming from the internet would get load balanced and not overwhelm a single reverse proxy. As a result of the architectural change we made, where Outlook now connects to the Client Access server instead of the information store, you need a form of RPC load balancing for your Outlook clients so that all of your Outlook clients aren't going to a single CAS server. So you will need a load balancer now, and it will have to be an RPC load balancer, which means something like Windows Network Load Balancing won't be able to handle that for you; the load balancer has to not only support RPC but also support affinity. This catches some customers off guard because it's a new requirement we never had in Exchange before, so be aware of it when you talk to customers who are migrating from 2003 or 2007 to 2010.

    And the last thing we have is that Exchange does support backup and recovery; obviously that's disaster recovery, not high availability. We used to support both the ESE streaming backup APIs and the VSS APIs, but because we're dealing with much larger data sets now, the ESE streaming APIs just weren't going to do the job, and so we cut them from Exchange 2010.

  • Slide 31

    We now support only VSS-based backups, but the good news is we ship a plug-in for Windows Server Backup in the box, so if you just want a basic VSS backup of your databases you get that in the box with Exchange; you don't have to buy other products. If you want something more full featured, that's when DPM or any other Exchange-aware third party VSS solution would work for you.

    We also have some other DR technologies. One is called a recovery database; it's basically an object into which you can restore a database and then extract data out of it. We also have the concept of database portability, where you can take any Exchange 2010 database and move it to any other Exchange 2010 server inside the org, so even if you didn't replicate it you can pick it up and forklift it over somewhere else.

    The last thing we have is dial tone portability. If you have a failure affecting your only database copy, you at least have the ability to spin up what we call a dial tone database. It's an empty database that just allows users to send and receive mail; it doesn't have all their historical data in it, it's empty, but it gives them dial tone so they can at least send and receive messages while you're in the background restoring their data.
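    As an illustrative sketch of the recovery database workflow (assuming Exchange 2010 SP1, where New-MailboxRestoreRequest exists), it might look roughly like this; the names and paths are hypothetical.

```powershell
# Sketch: create a recovery database, restore a backup into its paths, then pull one mailbox's data out.
# Names and paths are examples only; assumes Exchange 2010 SP1.
New-MailboxDatabase -Recovery -Name RDB1 -Server MBX1 `
    -EdbFilePath "D:\Recovery\RDB1\RDB1.edb" -LogFolderPath "D:\Recovery\RDB1"

# (Restore the database files into those paths with your VSS backup product, then mount RDB1.)
Mount-Database RDB1

# Extract a single user's data from the recovery database into their live mailbox.
New-MailboxRestoreRequest -SourceDatabase RDB1 -SourceStoreMailbox "Kim Akers" -TargetMailbox kim.akers
```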

  • Slide 32

    Thank you, Scott. Now I'm going to continue and talk about some of the other mission critical servers from Microsoft and their high availability solutions.

  • Slide 33

    http://support.microsoft.com/kb/957006

    First of all, think about virtualization. Virtualization is one of Microsoft's key investments with the Hyper-V platform, and all teams now test their products on Hyper-V. Microsoft has what's called the common engineering criteria, a series of guidelines that each engineering team must follow to ensure their applications are enterprise ready. One of the key tenets of this guide is to test on Hyper-V to make sure the application has equivalent performance and equivalent resiliency. You can actually go online and check out KB article 957006, which is kept up to date on which versions of the various Microsoft products are supported in a Hyper-V environment. And when we think of Hyper-V, think Hyper-V with failover clustering, meaning that we can run all of these application services inside a VM guest and the VM itself is clustered.

  • Slide 34

    The next major application is the file server, which of course manages your storage, your shares, replication, and search and indexing. Traditionally file servers use failover clustering; this is the default configuration, and you can have multiple file servers on a failover cluster.

    DFS Replication is another technology which is part of the file server role, and DFS Replication can be used as a high availability technology in the sense that it allows you to push information from one server to another server, so the information is in multiple locations. If a primary server crashes or becomes unavailable, you can recover the information from a secondary location. Now, you can do this within a single site, within a data center, within a group of servers, or you could do this across multiple sites, so this can build up a disaster recovery solution if you have multiple data centers. Replication also gives you the ability to access offline files, so if you use the offline files feature you're actually using some DFS replication on the backend to push out and keep updated your local copy of all of these versions. Now, one of the key things to keep in mind is that replication only happens when a file is closed, so this can give you pretty good availability if you're working on something such as a Word document or an Excel spreadsheet, but if we extend this concept to the enterprise it doesn't do a great job of replicating things which keep their files open. For example, a virtual machine's VHD file or a SQL database: these types of resources are kept open indefinitely and really are only closed when they're taken offline. Now, if these are kept open and replication hasn't happened, then potentially you could lose all of the data, all the information which has been collected since the last replication. And for this reason failover clustering does not support DFSR as a replication technology, since it is possible that some data could be lost. Additionally, you have to keep in mind that there could be replication conflicts: if you have multiple people working on the same document simultaneously in two different locations, when replication happens there could be some synching and configuration conflicts which need to be resolved. Nevertheless, it is a great, truly in-box solution to give you some level of high availability in your data center.
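    As a hedged sketch of that default clustered file server configuration, creating the role on an existing failover cluster might look like this; the role name, IP address and disk name are hypothetical.

```powershell
# Sketch: create a highly available file server role on an existing Windows failover cluster.
# Role name, IP address and cluster disk name are examples only; assumes the FailoverClusters module (2008 R2+).
Import-Module FailoverClusters

Add-ClusterFileServerRole -Name FS1 -Storage "Cluster Disk 2" -StaticAddress "192.168.1.50"

# The clustered file server (its name, IP and storage) can now move between nodes on failure.
Get-ClusterGroup FS1
```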

  • Slide 35

    Lync Server is an extension of Microsoft's unified communications server platform; it basically covers all types of messaging, including IM, voice and video, and content sharing over live streaming mediums. Lync Server has a high availability architecture which is relatively flexible. The core is using load balancers to connect people to a registrar, so when a user wants to connect to the Lync server they're going to get sent to a registrar. Now, there is a requirement to use hardware load balancers for this registrar, and NLB, Microsoft's Network Load Balancing, is explicitly not supported. The registrars themselves have access to what's called a backup registrar pool, so if the primary registrar is unavailable when a client connects, or it crashes, the client will get sent to the backup registrar. From their perspective they may be disconnected temporarily, but their transaction will be recovered and they can stay online. There is also DNS load balancing available for other types of network traffic from Lync Server, so this gives you the basic forms of high availability just by distributing incoming clients to different registrars or different Lync Server components, to spread the traffic and make sure that one specific server or one specific component isn't overloaded with too many client connections.

    Now, there are partners which deliver what are called SBAs, or survivable branch appliances. These are essentially customized unified communications appliances which contain a subset of all of the Lync functionality, so by connecting to these SBAs, which are generally in a branch office, we can keep a client connected; we can keep them using some of the basic communication and collaboration tools, although they have limited ability to use the full functionality of Lync Server. By having an SBA in a branch office we can keep our client up and running even if they cannot connect to their primary data center.

  • Slide 36

    Lync also has two multi-site high availability solutions, called Data Center Resiliency and Metropolitan Data Center Resiliency. Data Center Resiliency allows us to spread our Lync servers across multiple physical locations, and we can even have high availability for the voice communication, so if somebody is on a phone call and the primary data center crashes, we can actually fail over to the secondary without dropping the call. Specifically, the high availability is built for voice failover, so it is possible that other types of transactions, such as IM communication, could be temporarily lost if a failover happens.

    The more advanced version of this is what's called Metropolitan Data Center Resiliency, and with this you have an active/active configuration with continual replication between the sites at the hardware level. The reason why this is called metropolitan is that it's generally going to be deployed within a specific city, meaning that the distance over which you can stretch the sites is limited to a few miles or a few dozen miles. This does give you higher availability, since it is an active/active configuration, so you'll have better resilience, but the distance between the data centers can be limited.

    The final Lync Server high availability solution is simply backup and restore. If you lose information or the server crashes, you can pull it back using an expedited service restoration process, which can be a workflow that is pre-programmed or pre-orchestrated to help you recover as quickly as possible.

  • Slide 37

    http://blogs.msdn.com/b/joelo/archive/2007/03/09/sharepoint-backup-restore-high-availability-and-disaster-recovery.aspx

    SharePoint Server is Microsoft's web platform for all types of collaboration and document management. It's primarily built around a database where all of this shared content is stored, and this database can be made highly available using SQL: it can use SQL backup and restore, database mirroring or log shipping, and it can be protected using System Center Data Protection Manager, or DPM. DPM has some nice integration points with SharePoint because it gives you the ability to granularly restore specific objects, so if you lost a specific document you could go and recover just that document rather than having to restore the whole database.

    Two of the additional SharePoint servers, the Crawl or Index server and the Search or Query server, are deployed in a redundant topology, meaning that there are multiple instances of them available throughout the infrastructure, and if one of them is unavailable clients will simply be reconnected to another one, to help speed up the indexing or to help speed up their search queries.

    SharePoint does have a rich, front end, web based interaction experience, where clients or users will go and browse documents or collaborate on the site using a web front end, and this is made highly available using Windows Network Load Balancing; NLB will distribute the traffic across these multiple front end servers to ensure that a single server is not overloaded.

    An additional high availability feature which is unique or customized for SharePoint is the recycle bin, which gives you the ability to simply recover items that were accidentally deleted. If you lose any type of file, list or application, by default they are still saved for 30 days before they're permanently deleted, to give people the opportunity to recover any documents that were accidentally removed.

  • Slide 38

    As we talked about the web server for SharePoint, a lot of this is built on Microsoft's web server, known as IIS. IIS has a rich series of clients and a rich topology to handle all the different kinds of web services, from file transfers to actually serving websites.

    Network Load Balancing is used for most of the web server roles with IIS; this means that when a client tries to connect to anything, they can go through NLB and be load balanced across multiple servers. Additionally, hardware load balancing can be used. Now, Network Load Balancing does its load balancing at Layer 2-3 in the networking stack. However, with IIS and the web server there are often load balancing requirements at Layer 7, which is where the HTTP traffic lives, and a load balancer at this level actually looks at the URL, http://microsoft.com for example, and will load balance the HTTP traffic based on what is contained within that URL. This is done through what's called an Application Request Routing server, or ARR. ARR essentially contains the logic to do the load balancing for this Layer 7 traffic. However, the ARR server itself needs to be made highly available so that it's not a single point of failure, and the ARR servers can use Network Load Balancing to be deployed in redundant arrays. So at the front end you're going to have ARR with Network Load Balancing; this is going to give you traffic load balancing at Layers 2 and 3, and then ARR is going to figure out at Layer 7 where it should redirect the clients among the content servers, and then you have the middle tier which will actually go and serve the content up.

    Additionally, IIS has high availability for two of its roles using failover clustering: the FTP role and the WWW role. There are white papers out there that will show you how to explicitly configure these roles on a Windows Server failover cluster, so that there is a client access point for anyone trying to connect to either of these roles that is always available and can move between different nodes in the failover cluster.
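    As a rough sketch of the NLB piece (not from the session), creating an NLB cluster for a web or ARR tier and adding a second host might look like this; the interface, host and address values are hypothetical.

```powershell
# Sketch: create an NLB cluster for the web tier (for example, in front of ARR servers) and add a second host.
# Interface, host and IP values are examples only; assumes the NetworkLoadBalancingClusters module.
Import-Module NetworkLoadBalancingClusters

New-NlbCluster -InterfaceName "Ethernet" -ClusterName "WebFarm" -ClusterPrimaryIP "192.168.1.100"

# Join another host to the cluster so HTTP traffic is spread across both nodes.
Add-NlbClusterNode -InterfaceName "Ethernet" -NewNodeName "WEB2" -NewNodeInterface "Ethernet"
```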

  • Slide 39

    As we review this section, we've covered quite a lot of the core servers and core applications from Microsoft. As we know, downtime is inevitable, so not only is it important to keep our infrastructure up and running, it's even more important to keep our applications up and running. While most of these technologies and servers can use virtualization or network load balancing, many of them have unique and specific dependencies on failover clustering. And as we've seen with Exchange Server as well as SQL Server, they both use failover clustering as one of the underlying technologies, yet they abstract a lot of the management and the functions specific to SQL and Exchange away from clustering, so while they might use the cluster for membership or for health checking, the rest of the functionality is unique to Exchange and to SQL.

    We hope that you found this module on application high availability useful, and check out part three of this series, which will go and look at management high availability.

  • Slide 40

    This video is a part of the Microsoft Virtual Academy.

    Thank you.