ATIF MEHMOOD MALIK KASHIF SIDDIQUE Improving dependability of Cloud Computing with Fault Tolerance and High Availability

Dependability

In systems engineering, dependability is a measure of a system’s availability, reliability, and maintainability

It is the ability of a system to deliver services that can justifiably be trusted

Often considered the third axis of system quality
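
As a rough illustration of the availability component, steady-state availability is commonly estimated as MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to repair. The sketch below assumes that standard formula; the function names and the sample figures are illustrative and do not come from the slides.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime over a 365-day year for a given availability."""
    return (1.0 - avail) * 365 * 24 * 60


if __name__ == "__main__":
    # Assumed sample values: fails every 999 hours on average, 1 hour to repair.
    a = availability(mttf_hours=999.0, mttr_hours=1.0)
    print(f"availability = {a:.3f}")
    print(f"downtime per year = {downtime_minutes_per_year(a):.1f} minutes")
```

Shortening the repair time raises availability just as lengthening the time between failures does, which is why maintainability sits alongside reliability in the definition above.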

Dependability ontology

Dependability challenges in cloud computing

Lack of trust in shared virtualized infrastructures

Management of a cloud computing service by a single provider or vendor is in effect a single point of failure

APIs are proprietary

Virtualization increases complexity

Higher resource utilization

Common mode outages

Multiple administrative domains

Legal and privacy implications

Threats to dependability

Faults, Errors and Failures

A fault in a system is a deviation from its expected behavior

Faults may arise due to hardware failure, software bugs, user error and network problems

Fault Tolerance

Ability of a system to continue providing services to its users when some of its components fail

Faults can be introduced at:

Application level

Virtual machine level

Physical resource level

Fault Tolerance

Application Fault Tolerance:

Application health is continuously monitored by special software components called sensors

A sensor may trigger specific procedures to start repairing a malfunctioning application

Example: VMware App HA
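
The sensor idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how VMware App HA works: the `Sensor` class, the `probe` and `repair` callbacks, and the `max_failures` threshold are all hypothetical names introduced here.

```python
from typing import Callable


class Sensor:
    """Hypothetical application sensor: probes health and triggers a repair."""

    def __init__(self, probe: Callable[[], bool], repair: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe                # returns True when the app is healthy
        self.repair = repair              # repair procedure, e.g. a restart
        self.max_failures = max_failures  # consecutive failures before repair
        self.failures = 0

    def poll(self) -> bool:
        """Run one health check; return True if the repair was triggered."""
        try:
            healthy = self.probe()
        except Exception:
            healthy = False               # a crashing probe counts as unhealthy
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.repair()
            self.failures = 0
            return True
        return False


if __name__ == "__main__":
    app = {"up": False, "restarts": 0}    # stand-in for a real application

    def probe() -> bool:
        return app["up"]

    def repair() -> None:
        app["restarts"] += 1
        app["up"] = True

    sensor = Sensor(probe, repair, max_failures=2)
    for _ in range(4):
        sensor.poll()
    print(app)
```

Requiring several consecutive failed probes before repairing is a design choice that avoids restarting an application because of a single transient glitch.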

Fault Tolerance

Virtual Machine Fault Tolerance:

Virtual machine failures can be detected by both the customer and the service provider

Customers can detect a virtual machine failure by monitoring its state with the help of sensors deployed in the cloud

The cloud service provider can provide VM fault tolerance by installing a single sensor per physical server that monitors all virtual machines hosted on that server

Fault Tolerance

Physical Machine Fault Tolerance:

Can be implemented by the cloud service provider by monitoring the state of physical servers and, in case of hardware failure, resuming all affected virtual machines on a new server
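
One way to picture this is a heartbeat-based monitor that moves virtual machines off a server that has gone silent. The sketch below is an assumption-laden illustration: the `HostMonitor` class, the heartbeat timeout, and the least-loaded placement rule are hypothetical and are not taken from any particular cloud platform.

```python
from typing import Dict, List


class HostMonitor:
    """Hypothetical monitor: tracks host heartbeats and re-places VMs."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                   # seconds of silence tolerated
        self.last_seen: Dict[str, float] = {}    # host -> last heartbeat time
        self.placement: Dict[str, str] = {}      # vm -> host

    def heartbeat(self, host: str, now: float) -> None:
        self.last_seen[host] = now

    def place(self, vm: str, host: str) -> None:
        self.placement[vm] = host

    def failed_hosts(self, now: float) -> List[str]:
        """Hosts whose last heartbeat is older than the timeout."""
        return sorted(h for h, t in self.last_seen.items()
                      if now - t > self.timeout)

    def failover(self, now: float) -> Dict[str, str]:
        """Move VMs from failed hosts to the least-loaded healthy host."""
        failed = set(self.failed_hosts(now))
        healthy = sorted(set(self.last_seen) - failed)
        moved: Dict[str, str] = {}
        if not healthy:
            return moved                         # nowhere to resume the VMs
        for vm in sorted(self.placement):
            if self.placement[vm] in failed:
                target = min(healthy, key=lambda h: (self._load(h), h))
                self.placement[vm] = target
                moved[vm] = target
        return moved

    def _load(self, host: str) -> int:
        return sum(1 for h in self.placement.values() if h == host)


if __name__ == "__main__":
    m = HostMonitor(timeout=5.0)
    for host in ("h1", "h2", "h3"):
        m.heartbeat(host, now=0.0)
    m.place("vm1", "h1")
    m.place("vm2", "h1")
    m.place("vm3", "h2")
    m.heartbeat("h2", now=10.0)   # h1 stays silent and is treated as failed
    m.heartbeat("h3", now=10.0)
    print(m.failover(now=12.0))
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to test.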

Fault Tolerance Techniques

Reactive Fault Tolerance:

In case of failure, these techniques reduce the effect of the failure on application execution

Proactive Fault Tolerance:

These techniques work by predicting faults and proactively replacing the suspected components with working ones

Reactive Fault Tolerance

Checkpointing

Replication

Job migration

SGuard

Retry

Task resubmission

User-defined exception handling

Rescue workflow
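
Checkpointing, the first technique in the list, can be sketched as follows: a job saves its progress after every step, and a restarted job resumes from the last saved state instead of starting over. The `run_job` and `save_checkpoint` functions, the JSON file format, and the `fail_at` parameter (used only to simulate a crash) are assumptions made for this illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # rename, so a crash never leaves a half-written file


def run_job(items, checkpoint_path, fail_at=None):
    """Sum the items, checkpointing after each one; resume if a checkpoint exists."""
    state = {"next": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    for i in range(state["next"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated crash at item {i}")
        state["total"] += items[i]
        state["next"] = i + 1
        save_checkpoint(checkpoint_path, state)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "job.ckpt")
        try:
            run_job([1, 2, 3, 4, 5], ckpt, fail_at=3)
        except RuntimeError as exc:
            print("first run failed:", exc)
        # The second run picks up at item 3 rather than item 0.
        print("resumed total:", run_job([1, 2, 3, 4, 5], ckpt))
```

Retry and task resubmission build on the same idea: the failed work is run again, and a checkpoint simply limits how much of it has to be repeated.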

Proactive Fault Tolerance

Software rejuvenation

Self-healing

Pre-emptive migration
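
Proactive techniques depend on some form of fault prediction. A minimal sketch, assuming a simple linear trend on a resource metric such as memory use: if the metric is predicted to reach its limit within a chosen horizon, the component is rejuvenated (or its workload migrated) before it fails. The function names, the linear model, and the sample numbers are illustrative assumptions, not a method described in the slides.

```python
from typing import List, Optional


def steps_until_limit(samples: List[float], limit: float) -> Optional[float]:
    """Linearly extrapolate how many sampling intervals remain before the
    metric reaches `limit`. Returns None if the metric is not growing."""
    if len(samples) < 2:
        return None
    slope = (samples[-1] - samples[0]) / (len(samples) - 1)
    if slope <= 0:
        return None
    return max(0.0, (limit - samples[-1]) / slope)


def should_rejuvenate(samples: List[float], limit: float, horizon: float) -> bool:
    """True when the limit is predicted to be hit within `horizon` intervals."""
    eta = steps_until_limit(samples, limit)
    return eta is not None and eta <= horizon


if __name__ == "__main__":
    memory_mb = [100.0, 120.0, 140.0, 160.0]   # assumed leak of 20 MB per interval
    print(steps_until_limit(memory_mb, limit=200.0))
    print(should_rejuvenate(memory_mb, limit=200.0, horizon=3))
```

Real predictors are usually more elaborate, but the control flow is the same: measure, predict, and act before the failure rather than after it.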

Tools for implementing fault tolerance

HAProxy:

Open source high availability and load balancing solution for TCP and HTTP based applications

De facto standard open source load balancer

ASSURE:

Automatic Software Self-healing Using REscue points

Uses rescue points to detect, tolerate and recover from software faults
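
The rescue point idea can be approximated in a high-level language. The sketch below is only an analogy, assuming that a rescue point is a place where the program already handles an error return: when an unexpected fault occurs, the state is rolled back to the snapshot taken at the rescue point and the function is forced to return that error value. The `rescue_point` decorator and the `handle_request` example are hypothetical and are not ASSURE's actual mechanism.

```python
import copy
from functools import wraps


def rescue_point(error_value):
    """Decorator: snapshot the state dict on entry; on an unexpected fault,
    restore the snapshot and return an error value the caller already handles."""
    def decorate(func):
        @wraps(func)
        def wrapper(state, *args, **kwargs):
            snapshot = copy.deepcopy(state)
            try:
                return func(state, *args, **kwargs)
            except Exception:
                state.clear()
                state.update(snapshot)   # roll back partial changes
                return error_value
        return wrapper
    return decorate


@rescue_point(error_value=False)
def handle_request(state, payload):
    state["handled"] += 1
    state["log"].append(payload["id"])   # raises KeyError on a malformed payload
    return True


if __name__ == "__main__":
    server_state = {"handled": 0, "log": []}
    print(handle_request(server_state, {"id": 7}))   # normal request
    print(handle_request(server_state, {}))          # faulty request is absorbed
    print(server_state)
```

Because the caller already knows what to do with a `False` result, the fault is mapped onto an error path the program was written to survive.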

Tools for implementing fault tolerance

SHelp:

Upgraded version of ASSURE

Applies weighted values to rescue points and uses error virtualization techniques so that applications bypass the faulty path

Tools for implementing fault tolerance

High Availability

Can be achieved by having redundant failover servers

Can be achieved at the application level, infrastructure level, and data center level

Types of Virtual Machine High Availability

Load sharing:

Both replicas are active

Service requests are equally distributed between both of them

Updated dedicated hot standby:

Two identical virtual machines execute on two different physical servers

Both virtual machines are fully synchronized with state information

VMware Fault Tolerance is an example
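
The difference between the two modes shows up in how requests are routed. The following sketch assumes a pair of replicas named `vm-a` and `vm-b`; the `ReplicaPair` class and its methods are hypothetical and model only the routing decision, not state synchronization between the replicas.

```python
class ReplicaPair:
    """Hypothetical router for two VM replicas."""

    NAMES = ("vm-a", "vm-b")

    def __init__(self, mode: str) -> None:
        if mode not in ("load_sharing", "hot_standby"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.healthy = {name: True for name in self.NAMES}
        self._turn = 0

    def mark_failed(self, name: str) -> None:
        self.healthy[name] = False

    def route(self) -> str:
        """Return the replica that should serve the next request."""
        alive = [n for n in self.NAMES if self.healthy[n]]
        if not alive:
            raise RuntimeError("no replica available")
        if self.mode == "hot_standby":
            # vm-a is active; vm-b serves requests only after vm-a fails.
            return alive[0]
        # Load sharing: alternate between the replicas that are still alive.
        choice = alive[self._turn % len(alive)]
        self._turn += 1
        return choice


if __name__ == "__main__":
    shared = ReplicaPair("load_sharing")
    print([shared.route() for _ in range(4)])
    standby = ReplicaPair("hot_standby")
    print(standby.route())
    standby.mark_failed("vm-a")
    print(standby.route())
```

In both modes a single replica failure leaves the service reachable; what differs is whether the second replica carries traffic during normal operation.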

Types of Virtual Machine High Availability

Not dedicated hot standby:

Standby VM runs in parallel with the active VM

Standby is not fully synchronized

VMware HA and Symantec’s Veritas Cluster Server are examples

Types of Virtual Machine High Availability

Shared hot standby:

Uses a checkpointing mechanism to update the standby replica

Requires fewer resources for the standby replica

Cold standby:

Standby replica is powered off and resides on storage media

Brought into service when the active VM fails

Useful for situations where availability requirements are low

Conclusion

Dependability is one of the major challenges in cloud computing

Adoption of cloud computing can be increased by addressing the dependability challenges