High Availability of Azure Applications (PaaS)

Himanshu Sahu, Mindfire Solutions, himanshus@mindfiresolutions.com


Agenda

Introduction

Windows Azure Role Architecture

Fault Domains in Windows Azure

Update Domains in Windows Azure

Windows Azure Host OS Updates

Windows Azure Guest OS Updates

Techniques for High Availability

High Availability in Azure

Introduction

ALWAYS ON

Reliability and Scalability

Design for failure

Implement separation of function

Use a service-oriented architecture

Windows Azure Role Architecture

Fault Domains in Windows Azure

Fault Domains

A fault domain is a physical unit of failure and is closely related to the physical infrastructure in the data centers. In Windows Azure a rack can be considered a fault domain; however, there is no 1:1 mapping between fault domains and racks.

The Windows Azure Fabric is responsible for deploying the instances of your application in different fault domains. Right now the Fabric makes sure that your application uses at least 2 (two) fault domains.

As a developer, you have no direct control over how many fault domains your application will use.
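To make the effect of fault domains concrete, here is a minimal Python sketch. It assumes a simple round-robin placement, which is only an illustration; the real allocation is decided by the Fabric and is not described in this deck. The names `place_instances` and `survivors_after_failure` are hypothetical.

```python
from collections import defaultdict


def place_instances(instance_count, fault_domain_count=2):
    """Assign role instances to fault domains round-robin (assumed policy)."""
    placement = defaultdict(list)
    for i in range(instance_count):
        placement[i % fault_domain_count].append(f"instance_{i}")
    return dict(placement)


def survivors_after_failure(placement, failed_domain):
    """Return the instances still running when one fault domain fails."""
    return [
        name
        for domain, names in placement.items()
        if domain != failed_domain
        for name in names
    ]


placement = place_instances(instance_count=3)
print(placement)
print(survivors_after_failure(placement, failed_domain=0))
```

With a single instance there is nothing left after its fault domain fails, which is why running at least two instances per role matters.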

Update Domains in Windows Azure

Update Domains

An upgrade domain (also called an update domain) is a logical unit that determines how a particular service will be upgraded.

The default number of upgrade domains configured for your application is 5 (five). You can control how many upgrade domains your application uses through the upgradeDomainCount attribute in your service definition file (CSDEF).
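The following Python sketch shows why the number of upgrade domains matters for capacity during an upgrade. It assumes instances are spread round-robin over the domains (instance i goes to domain i modulo the domain count); that policy and the function names are assumptions for illustration, not the Fabric's documented behavior.

```python
def upgrade_domain_sizes(instance_count, upgrade_domain_count=5):
    """Count instances per upgrade domain under round-robin assignment."""
    sizes = [0] * upgrade_domain_count
    for i in range(instance_count):
        sizes[i % upgrade_domain_count] += 1
    return sizes


def min_online_during_upgrade(instance_count, upgrade_domain_count=5):
    """Fewest instances online while the largest domain is being upgraded."""
    sizes = upgrade_domain_sizes(instance_count, upgrade_domain_count)
    return instance_count - max(sizes)


for count in (1, 2, 10):
    print(count, "instances ->", min_online_during_upgrade(count), "stay online")
```

Under these assumptions, ten instances in five domains keep eight online at any time, while the same ten instances in two domains keep only five.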

Windows Azure Host Updates

When and Why

Windows Azure deploys updates to the host OS approximately once per month. This ensures that Windows Azure provides a reliable, efficient and secure platform for hosting your applications.

The Host Agent (HA) consists of multiple subcomponents, such as the Network Agent (NA) that manages virtual machine VLANs and the virtual disk driver that connects virtual machine disks to the blobs containing their data in Windows Azure Storage. Azure therefore updates the HA and its subcomponents at different intervals, depending on when a fix or new functionality is ready.

Windows Azure Host Updates

Windows Azure Host Updates

How

A host OS update reboots the instances running on the server, and the fabric controller ensures that only instances from one upgrade domain at a time are rebooted.

Virtual machines on the server that have an Input Endpoint in their role’s service model are removed from the load balancer rotation. No new requests reach them; new requests are sent to other instances of that role according to the Azure load-balancing policies.

Each virtual machine hosting a Web or Worker Role receives a Stopping event, whereas VM Roles receive a standard Windows shutdown event.

Web, Worker, and VM roles are allowed five minutes to respond to the Stopping or shutdown event before they are forcibly stopped.

Windows Azure Host Updates

How

After all guest virtual machines are stopped, the root partition OS shuts down and the server reboots.

The updated root partition OS starts.

The virtual machines hosted on the server boot and start their application code.

Virtual machines hosting service roles with Input Endpoints reconnect to the load balancer, enabling them to receive client requests.
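The five-minute window in the sequence above means a role should finish or hand off in-flight work quickly once it is told to stop. The Python sketch below illustrates that idea only; in a real Web or Worker Role this logic would live in the role's own stop handling, and `drain_on_stop`, `STOP_GRACE_SECONDS` and the `process` callback are hypothetical names.

```python
import time

STOP_GRACE_SECONDS = 5 * 60  # the five-minute window mentioned in the slides


def drain_on_stop(pending_jobs, process,
                  grace_seconds=STOP_GRACE_SECONDS, clock=time.monotonic):
    """Process queued jobs after a stop signal until the grace period ends.

    Returns the jobs that were not finished in time so the caller can
    persist them (for example to a queue) before the instance is stopped.
    """
    deadline = clock() + grace_seconds
    remaining = list(pending_jobs)
    while remaining and clock() < deadline:
        process(remaining.pop(0))
    return remaining


done = []
leftover = drain_on_stop(["job-1", "job-2"], done.append)
print("finished:", done, "left over:", leftover)
```

Returning the unfinished jobs rather than dropping them keeps the decision about what to do with them in the caller's hands.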

Windows Azure Guest Updates

Once the Host OS has finished upgrading across the datacenter, the Guest OS is upgraded for services that are configured to use automatic Guest OS versions. This upgrade proceeds using the standard upgrade domain rules for your service.

Your VM will be rebooted and the Windows Partition (the D drive) will be reimaged with the upgraded OS.

The Guest OS update process is much faster than the Host OS update since the fabric only has to coordinate the update within your hosted service and your upgrade domains.

Availability

An available application considers the availability of its underlying infrastructure and dependent services. Available applications remove single points of failure through redundancy and resilient design.

Azure SLA

More Instances in Azure

Make Guest OS Update Manual
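As a rough illustration of why more instances help, the Python sketch below computes the chance that at least one instance is up. It assumes instance failures are independent and uses a made-up per-instance availability of 0.99; neither the figure nor the function name comes from the Azure SLA.

```python
def combined_availability(per_instance, instance_count):
    """Probability that at least one of several independent instances is up."""
    return 1 - (1 - per_instance) ** instance_count


for n in (1, 2, 3):
    print(n, "instance(s):", round(combined_availability(0.99, n), 6))
```

Independence is the key assumption here. Instances that share a fault domain can fail together, which is the reason instances are spread over different fault domains.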

Availability

Scalability directly affects availability—an application that fails under increased load is no longer available. Scalable applications are able to meet increased demand with consistent results in acceptable time windows.

Auto Scaling in Azure
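A scaling rule can be as simple as sizing the role in proportion to its load. The Python sketch below is one possible rule under stated assumptions (average CPU as the metric, a 60% target, and limits of 2 to 10 instances); it is not how Azure's auto scaling is implemented, and `desired_instances` is a hypothetical name.

```python
import math


def desired_instances(current, avg_cpu_percent, target_cpu_percent=60.0,
                      min_instances=2, max_instances=10):
    """Scale the instance count in proportion to CPU load, within limits."""
    if current <= 0:
        raise ValueError("current must be a positive instance count")
    wanted = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(current=4, avg_cpu_percent=90))
print(desired_instances(current=4, avg_cpu_percent=15))
```

The lower limit of two instances in this sketch mirrors the earlier point that an application should span at least two fault domains.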

Availability

Protection against hardware failures

Because every application is made up of multiple instances of each role, hardware failures—a disk crash, a network fault, or the death of a server machine—won’t take down the application. To help with this, the fabric controller doesn’t choose machines for an application’s instances at random. Instead, different instances of the same role are placed in different fault domains. A fault domain is a set of hardware—computers, switches, and more—that share a single point of failure. (For example, all of the computers in a single fault domain might rely on the same switch to connect to the network.) Because of this, a single hardware failure can’t take down an entire application. The application might temporarily lose some instances, but it will continue to behave correctly.

Availability

Protection against software failures

The fabric controller can also detect failures caused by software. If the code in an instance crashes or the VM in which it’s running goes down, the fabric controller will start either just the code or, if necessary, a new VM for that role. While any work the instance was doing when it failed will be lost, the new instance will become part of the application as soon as it starts running.
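The restart behavior described above can be pictured as a small supervisor loop. This Python sketch is an analogy only and does not reflect the fabric controller's actual logic; `run_with_restarts`, `make_flaky` and the restart limit are assumptions for illustration.

```python
def run_with_restarts(start_instance, max_restarts=3):
    """Start an instance; if it crashes, start a fresh one, up to a limit.

    Returns the instance's result and the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            return start_instance(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1


def make_flaky(failures):
    """Build a demo instance that crashes a given number of times, then works."""
    state = {"left": failures}

    def instance():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("instance crashed")
        return "ok"

    return instance


print(run_with_restarts(make_flaky(2)))
```

As in the text, the work of a crashed attempt is lost; only the fresh instance's result is returned.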

Availability

The ability to update applications with no application downtime

When a new version of the application needs to be deployed, the fabric controller can shut down the instances in just one update domain, update the code for these, then create new instances from that new code. Once those instances are running, it can do the same thing to instances in the next update domain, and so on. While users might see different versions of the application during this process, depending on which instance they happen to interact with, the application as a whole remains continuously available.
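The rolling update just described can be simulated in a few lines of Python. The sketch assumes one domain is offline at a time and that its instances come back on the new version; the instance names and the function `rolling_update_steps` are hypothetical.

```python
def rolling_update_steps(domains, old="v1", new="v2"):
    """Simulate updating one update domain at a time.

    domains is a list of lists of instance names, one inner list per update
    domain. The result has one entry per step: a dict mapping each instance
    that stays online during that step to the version it is serving.
    """
    version = {name: old for group in domains for name in group}
    steps = []
    for offline in domains:
        online = {n: v for n, v in version.items() if n not in offline}
        steps.append(online)
        for name in offline:
            version[name] = new  # the domain comes back on the new code
    return steps


for step in rolling_update_steps([["a"], ["b"], ["c"]]):
    print(step)
```

In the middle step the online instances serve different versions, which matches the note that users may see either version while the application as a whole stays available.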

Availability

The ability to update Windows and other supporting software with no application downtime.

Answer is Update Domain. :)

Resources

https://msdn.microsoft.com/en-us/library/azure/dn251004.aspx

http://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx

http://blog.toddysm.com/2010/04/upgrade-domains-and-fault-domains-in-windows-azure.html

http://blogs.msdn.com/b/kwill/archive/2012/09/19/role-instance-restarts-due-to-os-upgrades.aspx

Questions?

Thank you!