Module 1 Application HA With SQL, Exchange and Other Servers


Transcript of Module 1 Application HA With SQL, Exchange and Other Servers

  • Slide 1

    This video is part of the Microsoft Virtual Academy.

  • Slide 2

    In this session we are going to be diving deeper into understanding Microsoft's high availability solutions. Part one of this series looked at the application infrastructure, meaning failover clustering, virtualization and some of the other key infrastructure components. Part two is going to look at the applications which run on top of this infrastructure. We're going to spend most of our time looking at SQL Server and Exchange Server, and then briefly cover some of the other server high availability solutions.

    I'm Symon Perriman, and I'm going to be joined by SQL program manager Justin Erickson and Exchange technical writer Scott Schnoll in this session.

  • Slide 3

    Learn about Microsoft's different high availability technologies and when to use each of them.

    High availability is important because it keeps our applications up and running, not only for availability but also to make sure that our customers are happy; by maintaining continual service we can keep our customers connected in a 24/7 marketplace. This session will specifically focus on the application layer. We've covered the core infrastructure in part one of this video series, and part three will look at management, focusing on System Center.

  • Slide 4

    I'm now going to turn it over to Justin Erickson, Senior Program Manager with the SQL team. Justin.

    Justin: Hello everyone, I'm Justin Erickson. I'm a program manager on the SQL Server database engine team.

  • Slide 5

    So let's quickly go through the introduction to each one of the technologies; if you have questions, there are sessions throughout the SQL Server track that go into more detail on a lot of the high availability technologies.

    The key thing that I'd want to point out is that when you look at what comprises database downtime, there are two big portions. You have unplanned downtime, where I actually have a failure or a user caused an issue and I have to move to a different system, and there's also planned downtime, where I'm doing an application upgrade, I'm doing a patch, or I'm just trying to maintain the SLAs that I need for my system throughput. So we look at all of these drivers as we look at what makes up SQL Server availability technology.

  • Slide 6

    And so the gamut of technologies that we have, looking at existing releases as well as what's coming up in the SQL Server Denali release with AlwaysOn, is listed over here. I'll walk through each one of these and talk about what each technology builds on top of the previous one, and you'll see that there's a sequence running through backup and restore, log shipping and database mirroring, which is sort of the same technology being built up incrementally to give you better SLAs. There are also technologies like replication, which sort of fit into this space, and failover cluster instances, which use a lower level of data protection with SANs and shared storage, with SQL Server failing over along with that shared storage. And then we'll end by talking about some of the ways to manage downtime as well, which is the majority of the downtime that you'll see.

    So backup and restore is the most basic technology. Regardless of what technology you're using on top of it, it's always a good idea to have a physical backup of your database so you can go and recreate the entire system from scratch should your high availability system go down, your entire data center go down, or whatever other issue requires you to go back to a point in time; backup and restore is your base set of technologies there.
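    As a rough illustration of that base layer (not something shown in the video), here is a minimal backup and restore sketch; the instance names, database name and backup path are hypothetical, and it assumes the SQLPS module's Invoke-Sqlcmd cmdlet is available.

```powershell
# Minimal sketch: full backup of a database, then a restore onto a standby server.
# Assumes the SQLPS module is available; server names, database name and paths are examples only.
Import-Module SQLPS -DisableNameChecking

# Take a full backup on the primary instance.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query @"
BACKUP DATABASE [SalesDB]
TO DISK = N'\\backupshare\sql\SalesDB_full.bak'
WITH INIT, CHECKSUM;
"@

# Restore it on a standby instance, leaving it ready to accept further log restores.
Invoke-Sqlcmd -ServerInstance "STANDBY\SQL1" -Query @"
RESTORE DATABASE [SalesDB]
FROM DISK = N'\\backupshare\sql\SalesDB_full.bak'
WITH NORECOVERY, REPLACE;
"@
```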

  • Slide 7

    When you look at the downtime of the backup and restore solution, though, if something goes down you need to use the backup and restore process to get your system back up and running. You're now doing a full installation of that system and applying the restore, or maybe you have the system there but you are restoring it from scratch, which, if you're looking at terabyte-sized databases, could take you a good amount of time.

    And that's where a system like SQL log shipping comes into place. This is basically an automated backup and restore process: you have transactions that are coming into your primary system, you have log backups that happen on a periodic basis through the agent job schedule, and that job is just copying backups out. There's another job that will copy the backups over to the secondary system, and finally a third job that goes through the restore process. So this is basically doing the backup and restore without waiting for a failure; it says I'm going to have the system ready to go, constantly running the backup job, the copy job and the restore job, so when I have that failure the system is ready and I just have to apply whatever logs I haven't applied at that time.

    And there's a nice wizard in SQL Server Management Studio to help you set this up and determine what intervals you want to configure this on, based on your needs.
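    To make the three jobs concrete, here is a hand-run sketch of what the backup, copy and restore steps do. In a real deployment the wizard creates SQL Server Agent jobs for these steps; the server names, share paths and database name below are hypothetical.

```powershell
# Sketch of the three log shipping jobs (backup, copy, restore), run by hand for illustration.
# Assumes the SQLPS module; names and paths are examples only.
Import-Module SQLPS -DisableNameChecking

# 1. Backup job on the primary: take a transaction log backup to a share.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query @"
BACKUP LOG [SalesDB] TO DISK = N'\\primary\logship\SalesDB_001.trn' WITH INIT;
"@

# 2. Copy job: move the log backup to the secondary server's folder.
Copy-Item "\\primary\logship\SalesDB_001.trn" "\\standby\logship\"

# 3. Restore job on the secondary: apply the log, keeping the database ready for more restores.
Invoke-Sqlcmd -ServerInstance "STANDBY\SQL1" -Query @"
RESTORE LOG [SalesDB] FROM DISK = N'\\standby\logship\SalesDB_001.trn' WITH NORECOVERY;
"@
```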

  • Slide 8

    The next technology takes log shipping, which was the automated backup and restore process, and builds it into the engine. This looks at how to stream the log records directly, and now, because it's built into the engine, we can go and do things like provide synchronous commit, where I can make sure my secondary system is fully up to date with the primary, so when I'm failing over there's zero data loss. The way that this works is your application is coming in and committing a set of transactions; at the time that we write the transactions locally to our log file, we're sending them over to the secondary. And if you're in synchronous mode, we'll write them to the log file on the secondary side, send back an ack, and only then will we go and tell the application that, hey, your transaction has been committed. This means that at any point in time when we execute that failover, my secondary is fully up to date with the primary. Of course you don't have to run in synchronous mode; you can always run in asynchronous mode, where the primary is just sending log records over as fast as it can without waiting for the acknowledgement, so the primary continues to go ahead and I'm not slowing down the workload. That ends up being a choice depending on what your SLA needs are.
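    As a small illustrative sketch (not from the session), here is how a database mirroring session might be flipped between synchronous and asynchronous commit once the partnership is already established; the instance and database names are hypothetical.

```powershell
# Sketch: switch an existing database mirroring session between synchronous and asynchronous commit.
# Assumes the mirroring session (endpoints, restored mirror copy, SET PARTNER) already exists.
Import-Module SQLPS -DisableNameChecking

# Synchronous mode: commits wait for the mirror's acknowledgement, so failover loses no data.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query "ALTER DATABASE [SalesDB] SET PARTNER SAFETY FULL;"

# Asynchronous mode: the principal does not wait for the mirror, trading possible data loss for throughput.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Query "ALTER DATABASE [SalesDB] SET PARTNER SAFETY OFF;"
```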

  • Slide 9

    Another technology which isn't really built to be a high availability technology, but is often used in high availability scenarios, is replication, and the reason it is typically used is the desire to get extra utilization out of that hardware. When I'm using a system like log shipping or database mirroring, I have a mirror or secondary that's sitting there waiting in the event of a failure, and sometimes we hear from a customer, well, if I have that hardware I also want to do something with it. There are scenarios where customers have used replication in the past because it not only allows you to send the data to your secondary but also to read the data from the secondary, for doing reporting or offloading other workloads. And I'll talk in a second about how AlwaysOn availability groups take away this need, so as we go forward we're simplifying our technology stack.

  • Slide 10

    So for SQL Server AlwaysOn, in the upcoming release of SQL Server, we took a holistic look at what we do with high availability and figured out how to build an integrated, flexible, efficient, single solution to meet your high availability needs, rather than the previous set of technologies that you put together to build a solution. And from that we came up with two main feature areas. We have AlwaysOn availability groups, which provide database protection where SQL Server is doing the data replication, similar to database mirroring and log shipping. And we have AlwaysOn failover cluster instances, which allow customers to use their existing infrastructure and provide data protection at the lower layer of the hardware stack, using the SAN and the shared storage for the data protection, with SQL Server failing over between the nodes. Failover cluster instances is a technology that existed in previous releases, but it was enhanced in Denali with multi-site clustering, a flexible failover policy that provides a better set of health detection and diagnostic infrastructure, and improved failover times with indirect checkpoints.

    AlwaysOn availability groups is a new feature in Denali that replaces database mirroring. It provides a multi-database failover unit and multiple secondaries, so I don't need to combine database mirroring and log shipping, as well as active secondaries that I can now read from and take backups from, so replication doesn't end up being in the high availability mix.

    As we look at the additional feature set we provided here, we also looked into providing an integrated HA management solution.
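    For a rough idea of what an availability group definition can look like in Denali, here is a hedged sketch with one synchronous and one asynchronous replica; the replica names, endpoint URLs and database name are hypothetical, and it assumes the AlwaysOn feature is enabled on both instances and the database has already been seeded on the secondary.

```powershell
# Sketch: create an availability group with a synchronous and an asynchronous secondary (names are examples).
# Assumes HADR is enabled on both instances and SalesDB is restored WITH NORECOVERY on the secondary.
Import-Module SQLPS -DisableNameChecking

Invoke-Sqlcmd -ServerInstance "NODE1" -Query @"
CREATE AVAILABILITY GROUP [SalesAG]
FOR DATABASE [SalesDB]
REPLICA ON
  N'NODE1' WITH (ENDPOINT_URL = N'TCP://node1.contoso.com:5022',
                 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC),
  N'NODE2' WITH (ENDPOINT_URL = N'TCP://node2.contoso.com:5022',
                 AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL);
"@

# On the secondary, join the availability group and the seeded database.
Invoke-Sqlcmd -ServerInstance "NODE2" -Query @"
ALTER AVAILABILITY GROUP [SalesAG] JOIN;
ALTER DATABASE [SalesDB] SET HADR AVAILABILITY GROUP = [SalesAG];
"@
```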

  • Slide 11

    So what is a SQL Server failover cluster instance? This is similar to what Symon went through with the other clustering technologies. With AlwaysOn failover cluster instances we use a shared disk to do the data protection, so each one of the machines is accessing the same set of files, and when we're failing over we're moving access to that same data file over to another machine and having SQL Server start up on that side. So on the SQL side we're providing protection of the binaries and processes between machines, while relying on external SAN technologies to provide protection of the database files themselves.

  • Slide 12

    WSFC = Windows Server Failover Clustering

    WSFC vs. FCI

    Scoping: No replicas on same node (Hyper-V)

    WSFC for:

    1. Primary selection and coordination

    2. Primary health detection

    3. Distributed changes and truth

    Secondary health driven from primary (no impact to primary)

    SQL Server AlwaysOn availability groups use SQL Server to provide the data protection. It's still built on top of Windows clustering, which helps us with inter-node health detection and state and configuration changes across the system, but it does not rely on any SAN or shared storage infrastructure; SQL Server is providing the data protection. We have collections of databases moving between the nodes rather than the binaries and services moving between them.

  • Slide 13

    When we look at AlwaysOn as a comprehensive solution, it's built to be able to meet combinations of needs. In some cases you're looking at using shared storage and SANs for your data protection within your data center, and availability groups between data centers, which is like the picture on the right. In other cases you don't have any investment in shared storage and you want a cheaper solution that provides a faster failover, and that's where AlwaysOn availability groups come in. And so you can mix and match these technologies to meet your needs, whatever they are.

  • Slide 14

    Another common question that comes up is, well, what about virtualization, how does that fit into the mix? Virtualization is often used in consolidation scenarios with SQL Server, and virtualization on its own does provide some high availability guarantees as well. So when you look into virtualization, you need to look at both planned and unplanned downtime, at the host as well as at the guest level. For planned downtime at the host, virtualization provides live migration, where you can move VMs between hosts with zero downtime, and that's the best solution there. If you have an unplanned event at the host level, that's when you're failing over the entire VM and doing an OS restart, so it provides some protection there, but you'll have a slower recovery time. If you have failures at the guest level, that's where virtualization doesn't provide any protection: if I have database file corruption, or the binaries themselves within that OS get corrupted for whatever reason, just using virtualization is not providing protection there. That's where you're falling back to backup and restore, or you can use an additional technology at the guest level, and together they give you the best of both worlds in a solution. Similarly, at the planned level, when I'm patching the guest OS you're having downtime during the patch unless you have another technology within the guest OS to protect you. So when you look at when to use virtualization alone and when to add a high availability technology: if you look at these sets of requirements and your customer looks at their requirements and says this isn't enough, that's when it's worth going and investing in that complexity. If virtualization meets your SLAs against these sets of requirements, then it's good to stick with virtualization as your core technology rather than biting off the additional complexity of adding another solution into the guest. And all of our technologies will work through virtualization.

  • Slide 15

    So that gives you a quick introduction to our unplanned downtime features.

    When we look into planned downtime there are other things to consider: how do I handle OS as well as SQL Server upgrades? That's where each one of the technologies has a rolling upgrade story, where I can upgrade the mirror, fail over to the secondary, patch the old primary and then fail back if I need to. Online operations are another key thing: if I'm doing an application change where I'm actually changing the database structures or adding new data to the system, online operations, which are enhanced in Denali, will allow you to make these changes without impacting the currently running workloads. There are new enhancements in SQL Server Denali where we can do more online index builds on indexes with LOB and large data types, as well as adding non-nullable columns online, which was not available in previous releases. Along with this, a lot of times you look at what other sources impact your SLAs. If I'm building my SLAs as a business, I'm looking not only at what happens in the event of a failover when the system goes down, but also at whether my system is able to respond at the throughput I need. That's where Resource Governor is a great technology; it allows you to throttle workloads to reserve capacity for your core workloads. So I can say that I want to reserve 80% of my CPU for my core workload, allowing lower priority workloads to still run on that same box but restricting them to a certain set of resources. That's where Resource Governor is a great technology to keep your lower priority workloads from impacting the SLAs of your most critical workloads.
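    As an illustrative sketch of that idea (not taken from the session), here is one way Resource Governor could be configured to reserve CPU for a core workload; the pool, group, function and login names are hypothetical.

```powershell
# Sketch: reserve CPU for the core workload and cap a reporting workload with Resource Governor.
# Pool, group, function and login names are examples only; assumes the SQLPS module.
Import-Module SQLPS -DisableNameChecking

# Create the resource pools and workload groups.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Database "master" -Query @"
CREATE RESOURCE POOL CorePool    WITH (MIN_CPU_PERCENT = 80);
CREATE RESOURCE POOL ReportPool  WITH (MAX_CPU_PERCENT = 20);
CREATE WORKLOAD GROUP CoreGroup   USING CorePool;
CREATE WORKLOAD GROUP ReportGroup USING ReportPool;
"@

# Classifier function (lives in master): route the reporting login to the capped group.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Database "master" -Query @"
CREATE FUNCTION dbo.rgClassifier() RETURNS sysname WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'report_user' THEN N'ReportGroup' ELSE N'CoreGroup' END;
END;
"@

# Point Resource Governor at the classifier and apply the configuration.
Invoke-Sqlcmd -ServerInstance "PRIMARY\SQL1" -Database "master" -Query @"
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rgClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
"@
```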

  • Slide 16

    That was a bit of a flash introduction to SQL Server; now I'll hand it over to Scott to talk about Exchange.

    Scott: My name is Scott Schnoll. I'm a Principal Technical Writer on the Exchange team; among other things I write all the product documentation around high availability, site resilience, disaster recovery and a few other areas, so I'm really excited to talk to you about it.

  • Slide 17

    I do want to tell you, though, that Exchange does things a little differently from what you've heard until now. We do use failover clustering technologies, but we don't use any shared storage, we don't use the resource model, and in fact we're just more of a consumer of cluster technologies, as you'll see in a minute. We also have in Exchange a very specific definition of high availability. To have true high availability for an Exchange server you must meet three criteria: you must have service availability, data availability and automatic recovery from most failures. We say most failures because you're not going to get automatic recovery from all failures; for a data center level event, for example, you wouldn't get automatic recovery. We have mechanisms to do manual recovery for that, but that's not an automatic solution, and that would be a DR process, not a high availability process.

    The other thing I want to tell you about is that we use this acronym called *overs a lot, and that really is just our shorthand notation for switchovers and failovers. Failovers we've been talking about a lot; a failover is simply when the system takes the automatic corrective action for you, and a switchover is when an administrator manually activates, for instance, a passive copy of an Exchange database.

    And then we have site resilience as well. Site resilience and HA are unified into a single platform inside of Exchange 2010, for example, but they are different operations with different configurations, as you'll see here in a minute. Site resilience is that DR-type configuration that you do to protect yourself when you have multiple data centers and you want redundancy across those data centers.

  • Slide 18

    Now, we actually introduced both service availability and *over capabilities way, way back in Exchange 5.5, but back in those days we were using Microsoft Cluster Server and NT 4, we were using the cluster resource model, and many of our core components were cluster aware; Exchange knew it was being installed in a cluster and it did something a little different from an unclustered Exchange server.

    We also, back at that time, relied very heavily on third party partner products. We didn't have any built-in data replication whatsoever, so we had no native data availability in Exchange and instead relied on hardware vendors, storage vendors and replication vendors to make copies of our data for us.

    In Exchange 2007 we took a very revolutionary leap forward; we started breaking away from the old legacy way of doing Exchange clustering. We still supported the old style of Exchange clustering where you use shared storage, but we gave it a different name; we called it a single copy cluster, to reflect that in that cluster you only had one single copy of your data. In 2007 we also introduced a second form of Exchange clustering called cluster continuous replication, and that in fact is when we introduced our continuous replication, or what we call log shipping, technology.

  • Slide 19

    We actually have three different forms of continuous replication in Exchange 2007. One is local, where you're just shipping a copy of the logs to a database that's connected to the same server as your active copy. We also have cluster continuous replication, where every database you had on an active node was being replicated to a passive node, and you always had them in pairs. And then we had one called standby continuous replication, which we introduced in Service Pack 1 for Exchange 2007, and what that did was allow you to replicate data pretty much anywhere: from a standalone mailbox server to another standalone server, from a cluster to a standby cluster, and so forth. In fact, as Exchange 2007 evolved and matured, it became pretty much the de facto configuration or architecture to use a combination of cluster continuous replication for high availability within the data center and standby continuous replication to get you site resilience for that data center as well.

    And so this is basically what it looks like. The information store in Exchange is doing what it's done since day one: it generates log files, and as those log files are closed they're copied over to the other copy of the database, they're inspected by the other copy of the database, and assuming they pass inspection they then get replayed into that copy, thereby making that copy pretty much an up to date, bit for bit duplicate of the original active copy.

  • Slide 20

    Now, this is typically what it would look like when you see it in an organization topology. Here I've got two separate CCR clusters; remember, CCR was always a pair of two, an active and a passive, so I've got two separate clusters. I've got some Outlook, Outlook Web App and ActiveSync clients out there; they're going through our front end component called a Client Access server in 2007 and later, or in the case of Outlook going directly to the information store and talking to it, and basically we would replicate one for one within these pairs. If you wanted to extend that solution to another data center you used a separate technology, standby continuous replication, and that actually worked really well, but it had some challenges. It still worked, it got the data over there, you had a standby server, maybe a standby cluster, and so if you had any problem with your primary site, in this case San Jose, you could go ahead and activate the Dallas site, get your clustered mailbox server up and running, and life was good.

    There were some challenges, though. When you clustered the mailbox role in 2007 it couldn't co-exist with any other server roles, such as the Client Access role, the transport role or the Unified Messaging role; it only allowed you to use the mailbox role in the cluster, so you had to buy extra hardware for the other roles. That meant, at a minimum, if you wanted high availability for Exchange 2007 you had to buy at least four servers. Another challenge was that you had to have some clustering knowledge, and that might not seem like a big deal if you've been doing it for a long time, but most of the administrators who manage Exchange solutions are Exchange pros, not cluster pros. And so sometimes it was challenging for them to build the underlying cluster correctly before they would deploy Exchange. This wasn't so much true in the CCR paradigm, but it was especially true in the other type of cluster we had in 2007, the single copy cluster, where you also had to deal with the shared storage and the interconnects and getting all of that just right. Another challenge was that even though in 2007 we supported 50 databases per server, if you had a problem with just a single database on that server you had to fail over the whole clustered mailbox server; the entire Exchange server's network identity had to be moved to another server, even if you only had one problematic database out of 50, so that wasn't very optimal.

    We did introduce SCR, so finally people had a built-in way to get data replicated outside of the cluster and offsite to a different data center, but we introduced it in a service pack, and typically when we introduce major features in a service pack we don't put GUI around them. That meant if you wanted to manage SCR you had to do it all from the Exchange Management Shell, which is the PowerShell-based console; you couldn't use the Exchange Management Console, which is an MMC snap-in where you click on pictures and so forth. So administrators had to learn to manage CCR one way and manage SCR a completely different way. And then the last challenge was that even after you got the data over there, it was a pretty complex activation process that you had to go through: there were many, many steps that involved moving the clustered mailbox server itself and forklifting it over to the recovery server. That took time, and for some administrators it was confusing because of the different technologies. So we looked at all of this and came up with a whole new solution in Exchange 2010.

  • Slide 21

    And in fact Exchange 2010 is very different from anything that we've done in the past. First of all, there's no more clustered mailbox server. We don't use the cluster resource model anymore or, put slightly differently, the cluster has no idea that we're even there. But we know the cluster is there, because we use it: we use the cluster's node and membership APIs so that we can join the servers together in a group; we also use the cluster's heartbeating technology, which is very mature, proven technology and allows us to find out when servers are dropping off the network. And of course we use the cluster database, because there's data that we need to share between the members of the solution, and we need to share it very quickly, much more quickly than if we were to store it in Active Directory and wait for it to be replicated across.

    So what you're seeing now is a representation of a new construct that we call a database availability group, or DAG for short. A DAG is simply a collection of mailbox servers, in this case five mailbox servers, that host replicated databases. For example, if you look at DB1, you can see that DB1 is shown in green on mailbox server 1; green means in this case it's the active copy. Then we've got DB1 on mailbox server 2 and DB1 on mailbox server 4 in blue; those represent passive copies of the database, copies that the system keeps up to date and maintains itself and that are waiting to become active in the case of some sort of failure affecting the active database. We also made another architectural change where all clients, including Outlook MAPI clients, no longer connect directly to the information store. Instead they now connect to a set of services on the Client Access server: one is the Address Book service, where they get their directory information, and the other is the RPC Client Access service, where they get their MAPI endpoint. So all Outlook knows is that it's got its MAPI and directory endpoints; it has no idea it's talking to a Client Access server and not a mailbox server anymore.

    So you can see here I have the option to replicate databases as I see fit. It's not like CCR, where every database you have on the active node gets replicated to the passive node. It's more like SCR in this case, in that the administrator gets to choose which databases get replicated and to where. In this case the administrator only wanted three copies of DB1, so we spread them across mailbox servers 1, 2 and 4. Similarly, you can see on mailbox server 1 that DB1 and DB3 are both green; those are the active copies, but mailbox server 1 also hosts a passive copy of DB2. Again, this is another departure from our previous model, where you had only active instances on one server and only passive instances on another server. Now you can have active and passive copies of multiple databases on multiple servers, as you see here.
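    As a hedged sketch of how a layout like this might be built from the Exchange Management Shell, here are the basic cmdlets; the DAG name, witness settings and exact server and database names are hypothetical examples.

```powershell
# Sketch: build a DAG and add database copies, from the Exchange 2010 Management Shell.
# DAG, server, witness and database names are examples only.
New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer HUB1 -WitnessDirectory C:\DAG1

# Add the mailbox servers that will participate in the DAG.
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX2
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX4

# Seed passive copies of DB1 on two other servers, as in the slide's example layout.
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2 -ActivationPreference 2
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX4 -ActivationPreference 3
```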

  • Slide 22

    Changing to this model changed everything from a failover perspective, because we don't have a clustered mailbox server anymore and we don't have a network identity to move anymore; we now only have to move the designation of the active copy. And I say move the designation of the active copy because we're not really picking up a database and moving it. All we're saying is: you're active now, you're passive now; you had a problem, so you're active now and you're passive now. It's that simple. Failover is now managed completely within Exchange because there is no cluster resource model; if you open up Failover Cluster Manager on a mailbox server that's a member of a DAG and you look under services and applications, you're not going to see anything. There's no Exchange group, there are no Exchange resources, no IP addresses, no storage groups, no databases, no information store, no system attendant, nothing; we don't use the cluster resource model anymore. That means, though, that we had to have some mechanism to handle failover within Exchange; in previous versions, if we had a problem, the cluster moved the resource over to another node for us. Now we have a brand new component inside of Exchange called Active Manager, and Active Manager runs in a key service on these mailbox servers called the Microsoft Exchange Replication service. It's the same service we introduced in 2007 to do log shipping and CCR and SCR, but now there's a new component that runs inside that service called Active Manager, and that's the brain of the Exchange solution. Active Manager is not only responsible for managing everything, but it's also responsible for initiating the corrective action when some sort of failure occurs. So say for instance the disk hosting DB1 just dies; we're not using RAID in this case, so the disk dies and the database is gone with it. Active Manager detects that and will automatically fail over the active copy to one of the other passive copies, whichever one it believes to be the best, most up to date, healthy copy.

    I mentioned all clients connect via CAS, so the system works somewhat like this. I've got a client out there; it might be Outlook, it might be Outlook Web App, it might be ActiveSync, we don't know, it's just a client accessing the system and getting messages in. There's an Active Manager client that also runs inside the Client Access server, and that knows where the user's database is located, so users only connect to CAS, and it's CAS that talks RPC/MAPI to the information store. Users don't talk to the information store directly anymore; we've abstracted the user connection away from the information store so that we can get fast failover when one of these databases has to fail over.

  • Slide 23

    So messages come in, they go to the appropriate database, and then the log files representing those messages get replicated to the copies of that database. The diagram uses message icons, but it's not the actual messages that we replicate; it's the transaction log files generated by the Exchange database engine itself that get replicated.

    So say we have a failure affecting database 1: database 1 disappears for whatever reason, maybe it's the storage, maybe it's some sort of corruption, we don't know, database 1 is gone. What happens, in 30 seconds or less, is that a new active copy replaces the failed active copy by choosing the best available passive copy to activate, and you can see that under mailbox server 2 the copy of DB1 has now gone green. In this case the system decided that the best copy was on mailbox server 2. Notice the client still stays connected to CAS even though their underlying database went away; CAS understands what's going on because of the Active Manager client, so Active Manager says, no, your database is now over here, I'm going to connect CAS to mailbox server 2, and the client is back in business.

    And because it happens so quickly, it's quite possible, and more often than not the case, that clients don't even notice any of this happened. They're abstracted away from it, so as their database goes away they don't get disconnected. They might get disconnected if the CAS server goes away, but we'll talk about load balancing and how to deal with that in a minute. Assuming CAS doesn't go away and it's just a failure of the mailbox server, the mailbox server's network or the mailbox server's disks, that's going to be a transparent failover to the client; they're probably not going to notice anything. Now, if they happen to be in the middle of Outlook Web App, they're composing something and they go to hit send, and in the middle of hitting send a failover occurs, they will get a message saying that their mailbox is temporarily unavailable. But if they just wait a few seconds and press F5 to refresh the browser, it will bring them right back into their mailbox and they won't even have to log on again; it's that fast. And of course mail flow will continue, because Active Manager knows where the database is located, and replication will continue as long as you have multiple copies left.
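    Failovers like this are automatic, but the switchover case Scott described earlier is administrator driven. Here is a small sketch of checking copy health and manually moving an active copy from the Exchange Management Shell; the database and server names are hypothetical.

```powershell
# Sketch: check copy health and perform a manual switchover (the administrator-driven *over).
# Database and server names are examples only; failovers themselves are initiated automatically by Active Manager.
Get-MailboxDatabaseCopyStatus -Identity DB1 | Format-Table Name, Status, CopyQueueLength, ReplayQueueLength

# Move the active copy of DB1 to mailbox server 2.
Move-ActiveMailboxDatabase DB1 -ActivateOnServer MBX2
```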

  • Slide 24

    Now, each DAG (I'm showing a five member DAG here) can have up to sixteen members, so you can have up to sixteen copies of your databases, and each Exchange server itself supports 100 databases, so you can have 1,600 databases inside a single DAG. Of course those would be non-replicated databases, but you could have 800 databases with two copies of each, 533 databases with three copies of each, and so forth. Also consider that our maximum recommended database size is now 2 terabytes per database, so you can grow this very, very large; it scales incredibly well. And in case you're wondering how well, this solution is what's running Outlook.com, Office 365 and so forth, so we're talking 75 million, almost 80 million mailboxes on this solution. The beauty of it is that the same exact commands you use to create the DAG inside a data center are the same exact ones you would use to extend it to another data center to put yourself in a site resilience configuration; it's that easy.

  • Slide 25

    Now, I mentioned the DAG, and this is basically, again, what the architecture looks like. Clients are talking to the Client Access server, specifically to the RPC Client Access service and the Address Book service, and it's the Active Manager component that tells those services where the user's mailbox is located so that CAS can talk to the mailbox server for them.

  • Slide 26

    So a DAG is simply a set of up to sixteen servers that host a set of replicated databases. You can have multiple DAGs in a single org; obviously if you need more than sixteen members you have to use a second DAG, and so forth. We do leverage the Windows failover clustering technologies, but we're not cluster aware and we don't use the cluster resource model, and the DAG itself defines the boundary for replication, so you won't be replicating outside the DAG.

  • Slide 27

    And as I mentioned, we added a second form of continuous replication in Service Pack 1, so let me briefly talk about that.

  • Slide 28

    So now we have two forms. In 2007 and in 2010 RTM we had one form of continuous replication, one form of log shipping, where we ship closed transaction log files. The active copy in green creates the log files, and then the passive copy says, hey, send me your latest log files, I've got these so far; the latest log files go over, and if the passive copy is able to keep up and catch up with the log generation activity on the active copy, which in this case it would because it now has log five, the last one generated, the system says, you know what, the database copy is up to date. The system then switches into block mode, and instead of shipping those transaction log files we actually ship blocks of ESE transactions as they're being written to the log buffer. So we write to the log buffer on the active side, and at the same time we send that information over to a corresponding buffer on the passive side and keep things up to date. Now, all continuous replication is asynchronous, so we don't wait for acknowledgement from the other side; there is potential for data loss, but we've got other mechanisms built into the system to get that data back. But again, now we have the ability to replicate blocks; we don't have to wait for a transaction log file to be closed in order to externalize that data, which means the amount of losable data has substantially decreased with Service Pack 1 as a result of databases being able to leverage block mode. And of course, once the buffer is full, it generates the corresponding log file on each side; the log is built and inspected separately and then, of course, replayed into the copy of the database. We also have a mechanism whereby, if we only get a partial buffer and then the active copy goes away, we'll actually use that: we take what we call a log fragment and convert it into a full log, and if there are usable transactions in there, we'll play those transactions against the database and at least recover the data that we were able to get over.

  • Slide 29

    We also have this concept of a lagged database copy, which is a database copy for which you can delay the replay of log files for up to 14 days. So think of it as a point in time backup of your database going back up to 14 days. Don't go beyond 14 days; we have a hard coded limit of 14 days for the lag, but it's basically there to provide you with a maximum of 14 days of protection against things like logical corruption. If you have physical corruption in your store, that's not going to be a problem, because continuous replication will detect it and will block physical corruption from being replicated to another database. For logical corruption there's no way for the system to tell, and so as a fallback mechanism you have the ability to delay replay into a passive copy, so that if you do detect logical corruption reported by an end user you can go and activate a copy at a point in time before that corruption took place. And of course lagged copies will affect storage design, since you're holding on to those log files, so you are going to need to size the storage appropriately, but that's something else we have as a protection.
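    As a small hedged sketch, a lagged copy might be added like this; the server name, database name and lag values are hypothetical examples.

```powershell
# Sketch: add a lagged copy of DB1 with replay delayed by 14 days (the maximum) and a short truncation lag.
# Server, database and lag values are examples only.
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX5 -ReplayLagTime 14.00:00:00 -TruncationLagTime 1.00:00:00
```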

  • Slide 30

    Now, load balancing has changed a little bit with Exchange 2010 as well. In 2007 most customers were used to doing load balancing for the reverse proxies, so that traffic coming from the internet would get load balanced and not overwhelm a single reverse proxy. As a result of the architectural change we made, where Outlook now connects to the Client Access server instead of the information store, you need a form of RPC load balancing for your Outlook clients so that all of your Outlook clients aren't going to a single CAS server. So you will need a load balancer now, and it will have to be an RPC load balancer, which means something like Windows Network Load Balancing won't be able to handle that for you; the load balancer has to not only support RPC but also support affinity. This catches some customers off guard because it's a new requirement we never had in Exchange before, so be aware of it when you talk to customers who are migrating from 2003 or 2007 to 2010.

    And the last thing we have is that Exchange does support backup and recovery; obviously that's disaster recovery, not high availability. We used to support both the ESE streaming backup APIs and the VSS APIs, but because we're dealing with much larger data sets now, the ESE streaming APIs just weren't going to do the job, and so we cut them from Exchange 2010.

  • Slide 31

    We now support only VSS-based backups, but the good news is we ship a plug-in for Windows Server Backup in the box, so if you just want a basic VSS backup of your databases you get that in the box with Exchange; you don't have to buy other products. If you want something more full featured, that's when DPM or any other Exchange-aware third party VSS solution would work for you.

    We also have some other DR technologies. One is called a recovery database; it's basically an object into which you can restore a database and then extract data out of it. We also have the concept of database portability, where you can take any Exchange 2010 database and move it to any other Exchange 2010 server inside the org, so even if you didn't replicate it you can pick it up and forklift it over somewhere else.

    The last thing we have is dial tone portability. If you have a failure affecting your only database copy, you at least have the ability to spin up what we call a dial tone database. It's an empty database that just allows users to send and receive mail; it doesn't have all their historical data in it, it's empty, but it gives them dial tone so they can at least send and receive messages while you're in the background restoring their data.
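    As an illustrative sketch of the recovery database workflow (assuming Exchange 2010 SP1, where New-MailboxRestoreRequest exists), it might look roughly like this; the names and paths are hypothetical.

```powershell
# Sketch: create a recovery database, restore a backup into its paths, then pull one mailbox's data out.
# Names and paths are examples only; assumes Exchange 2010 SP1.
New-MailboxDatabase -Recovery -Name RDB1 -Server MBX1 `
    -EdbFilePath "D:\Recovery\RDB1\RDB1.edb" -LogFolderPath "D:\Recovery\RDB1"

# (Restore the database files into those paths with your VSS backup product, then mount RDB1.)
Mount-Database RDB1

# Extract a single user's data from the recovery database into their live mailbox.
New-MailboxRestoreRequest -SourceDatabase RDB1 -SourceStoreMailbox "Kim Akers" -TargetMailbox kim.akers
```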

  • Slide 32

    Thank you, Scott. Now I'm going to continue and talk about some of the other mission critical servers from Microsoft and their high availability solutions.

  • Slide 33

    http://support.microsoft.com/kb/957006

    First of all, think about virtualization. Virtualization is one of Microsoft's key investments with the Hyper-V platform, and all teams now test their products on Hyper-V. Microsoft has what's called the common engineering criteria, a series of guidelines that each engineering team must follow to ensure their applications are enterprise ready. One of the key tenets of this guide is to test on Hyper-V to make sure the application has equivalent performance and equivalent resiliency. You can actually go online and check out KB article 957006, which is kept up to date on which versions of the various Microsoft products are supported in a Hyper-V environment. And when we think of Hyper-V, think Hyper-V with failover clustering, meaning that we can run all of these application services inside a VM guest and the VM itself is clustered.

  • Slide 34

    The next major application is the file server, which of course manages your storage, your shares, replication, and search and indexing. Traditionally file servers use failover clustering; this is the default configuration, and you can have multiple file servers on a failover cluster.

    DFS Replication is another technology which is part of the file server role, and DFS Replication can be used as a high availability technology in the sense that it allows you to push information from one server to another server, so the information is in multiple locations. If a primary server crashes or becomes unavailable, you can recover the information from a secondary location. Now, you can do this within a single site, within a data center, within a group of servers, or you could do this across multiple sites, so this can build up a disaster recovery solution if you have multiple data centers. Replication also gives you the ability to access offline files, so if you use the offline files feature you're actually using some DFS replication on the backend to push out and keep updated your local copy of all of these versions. Now, one of the key things to keep in mind is that replication only happens when a file is closed, so this can give you pretty good availability if you're working on something such as a Word document or an Excel spreadsheet, but if we extend this concept to the enterprise it doesn't do a great job of replicating things which keep their files open. For example, a virtual machine's VHD file or a SQL database: these types of resources are kept open indefinitely and really are only closed when they're taken offline. Now, if these are kept open and replication hasn't happened, then potentially you could lose all of the data, all the information which has been collected since the last replication. And for this reason failover clustering does not support DFSR as a replication technology, since it is possible that some data could be lost. Additionally, you have to keep in mind that there could be replication conflicts: if you have multiple people working on the same document simultaneously in two different locations, when replication happens there could be some synching and configuration conflicts which need to be resolved. Nevertheless, it is a great, truly in-box solution to give you some level of high availability in your data center.
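    As a hedged sketch of that default clustered file server configuration, creating the role on an existing failover cluster might look like this; the role name, IP address and disk name are hypothetical.

```powershell
# Sketch: create a highly available file server role on an existing Windows failover cluster.
# Role name, IP address and cluster disk name are examples only; assumes the FailoverClusters module (2008 R2+).
Import-Module FailoverClusters

Add-ClusterFileServerRole -Name FS1 -Storage "Cluster Disk 2" -StaticAddress "192.168.1.50"

# The clustered file server (its name, IP and storage) can now move between nodes on failure.
Get-ClusterGroup FS1
```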

  • Slide 35

    Lync Server is an extension of Microsoft's unified communications server platform; it basically covers all types of messaging, including IM, voice and video, and content sharing over live streaming mediums. Lync Server has a high availability architecture which is relatively flexible. The core is using load balancers to connect people to a registrar, so when a user wants to connect to the Lync server they're going to get sent to a registrar. Now, there is a requirement to use hardware load balancers for this registrar, and NLB, Microsoft's Network Load Balancing, is explicitly not supported. The registrars themselves have access to what's called a backup registrar pool, so if the primary registrar is unavailable when a client connects, or it crashes, the client will get sent to the backup registrar. From their perspective they may be disconnected temporarily, but their transaction will be recovered and they can stay online. There is also DNS load balancing available for other types of network traffic from Lync Server, so this gives you the basic forms of high availability just by distributing incoming clients to different registrars or different Lync Server components, to spread the traffic and make sure that one specific server or one specific component isn't overloaded with too many client connections.

    Now, there are partners which deliver what are called SBAs, or survivable branch appliances. These are essentially customized unified communications appliances which contain a subset of all of the Lync functionality, so by connecting to these SBAs, which are generally in a branch office, we can keep a client connected; we can keep them using some of the basic communication and collaboration tools, although they have limited ability to use the full functionality of Lync Server. By having an SBA in a branch office we can keep our client up and running even if they cannot connect to their primary data center.

  • Slide 36

    Lync also has two multi-site high availability solutions, called Data Center Resiliency and Metropolitan Data Center Resiliency. Data Center Resiliency allows us to spread our Lync servers across multiple physical locations, and we can even have high availability for the voice communication, so if somebody is on a phone call and the primary data center crashes, we can actually fail over to the secondary without dropping the call. Specifically, the high availability is built for voice failover, so it is possible that other types of transactions, such as IM communication, could be temporarily lost if a failover happens.

    The more advanced version of this is what's called Metropolitan Data Center Resiliency, and with this you have an active/active configuration with continual replication between the sites at the hardware level. The reason why this is called metropolitan is that it's generally going to be deployed within a specific city, meaning that the distance over which you can stretch the sites is limited to a few miles or a few dozen miles. This does give you higher availability, since it is an active/active configuration, so you'll have better resilience, but the distance between the data centers can be limited.

    The final Lync Server high availability solution is simply backup and restore. If you lose information or the server crashes, you can pull it back using an expedited service restoration process, which can be a workflow that is pre-programmed or pre-orchestrated to help you recover as quickly as possible.

  • Slide 37

    http://blogs.msdn.com/b/joelo/archive/2007/03/09/sharepoint-backup-restore-high-availability-and-disaster-recovery.aspx

    SharePoint Server is Microsoft's web platform for all types of collaboration and document management. It's primarily built around a database where all of this shared content is stored, and this database can be made highly available using SQL: it can use SQL backup and restore, database mirroring or log shipping, and it can be protected using System Center Data Protection Manager, or DPM. DPM has some nice integration points with SharePoint because it gives you the ability to granularly restore specific objects, so if you lost a specific document you could go and recover just that document rather than having to restore the whole database.

    Two of the additional SharePoint servers, the Crawl or Index server and the Search or Query server, are deployed in a redundant topology, meaning that there are multiple instances of them available throughout the infrastructure, and if one of them is unavailable clients will simply be reconnected to another one, to help speed up the indexing or to help speed up their search queries.

    SharePoint does have a rich, front end, web based interaction experience, where clients or users will go and browse documents or collaborate on the site using a web front end, and this is made highly available using Windows Network Load Balancing; NLB will distribute the traffic across these multiple front end servers to ensure that a single server is not overloaded.

    An additional high availability feature which is unique or customized for SharePoint is the recycle bin, which gives you the ability to simply recover items that were accidentally deleted. If you lose any type of file, list or application, by default they are still saved for 30 days before they're permanently deleted, to give people the opportunity to recover any documents that were accidentally removed.

  • Slide 38

    As we talked about the web server for SharePoint, a lot of this is built on Microsoft's web server, known as IIS. IIS has a rich series of clients and a rich topology to handle all the different kinds of web services, from file transfers to actually serving websites.

    Network Load Balancing is used for most of the web server roles with IIS; this means that when a client tries to connect to anything, they can go through NLB and be load balanced across multiple servers. Additionally, hardware load balancing can be used. Now, Network Load Balancing does its load balancing at Layer 2-3 in the networking stack. However, with IIS and the web server there are often load balancing requirements at Layer 7, which is where the HTTP traffic lives, and a load balancer at this level actually looks at the URL, http://microsoft.com for example, and will load balance the HTTP traffic based on what is contained within that URL. This is done through what's called an Application Request Routing server, or ARR. ARR essentially contains the logic to do the load balancing for this Layer 7 traffic. However, the ARR server itself needs to be made highly available so that it's not a single point of failure, and the ARR servers can use Network Load Balancing to be deployed in redundant arrays. So at the front end you're going to have ARR with Network Load Balancing; this is going to give you traffic load balancing at Layers 2 and 3, and then ARR is going to figure out at Layer 7 where it should redirect the clients among the content servers, and then you have the middle tier which will actually go and serve the content up.

    Additionally, IIS has high availability for two of its roles using failover clustering: the FTP role and the WWW role. There are white papers out there that will show you how to explicitly configure these roles on a Windows Server failover cluster, so that there is a client access point for anyone trying to connect to either of these roles that is always available and can move between different nodes in the failover cluster.
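    As a rough sketch of the NLB piece (not from the session), creating an NLB cluster for a web or ARR tier and adding a second host might look like this; the interface, host and address values are hypothetical.

```powershell
# Sketch: create an NLB cluster for the web tier (for example, in front of ARR servers) and add a second host.
# Interface, host and IP values are examples only; assumes the NetworkLoadBalancingClusters module.
Import-Module NetworkLoadBalancingClusters

New-NlbCluster -InterfaceName "Ethernet" -ClusterName "WebFarm" -ClusterPrimaryIP "192.168.1.100"

# Join another host to the cluster so HTTP traffic is spread across both nodes.
Add-NlbClusterNode -InterfaceName "Ethernet" -NewNodeName "WEB2" -NewNodeInterface "Ethernet"
```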

  • Slide 39

    As we review this section, we've covered quite a lot of the core servers and core applications from Microsoft. As we know, downtime is inevitable, so not only is it important to keep our infrastructure up and running, it's even more important to keep our applications up and running. While most of these technologies and servers can use virtualization or network load balancing, many of them have unique and specific dependencies on failover clustering. And as we've seen with Exchange Server as well as SQL Server, they both use failover clustering as one of the underlying technologies, yet they abstract a lot of the management and the functions specific to SQL and Exchange away from clustering, so while they might use the cluster for membership or for health checking, the rest of the functionality is unique to Exchange and to SQL.

    We hope that you found this module on application high availability useful, and check out part three of this series, which will go and look at management high availability.

  • Slide 40

    This video is a part of the Microsoft Virtual Academy.

    Thank you.