Making Services Fault Tolerant


Page 1: Making Services Fault Tolerant

Pat Chan, Michael R. Lyu
Department of Computer Science and Engineering
The Chinese University of Hong Kong

Miroslaw Malek
Department of Computer Science and Engineering
Humboldt University Berlin

Page 2: Making Services Fault Tolerant

Outline
- Introduction
- Problem Statement
- Methodologies for Web Service Reliability
- New Reliable Web Service Paradigm
- Road Map for Experiment
- Experimental Results and Discussion
- Conclusion

Page 3: Making Services Fault Tolerant

Introduction
- Service-oriented computing is becoming a reality.
- Service-oriented architectures (SOA) are based on a simple model of roles.
- The problems of service dependability, security and timeliness are becoming critical.
- We propose experimental settings and offer a roadmap to dependable Web services.

Page 4: Making Services Fault Tolerant

Problem Statement
- Fault-tolerant techniques: replication and diversity.
- Replication is an efficient way to build reliable systems through time or space redundancy.
  - It increases the availability of distributed systems.
  - Key components are re-executed or replicated.
  - It protects against hardware malfunctions and transient system faults.
- Design diversity is another efficient technique.
  - Software systems or services are designed independently by different programming teams.
  - This helps defend against permanent software design faults.
- We focus on the analysis of replication techniques applied to Web services.
- A generic Web service system with spatial as well as temporal replication is proposed and investigated.

Page 5: Making Services Fault Tolerant

Methodologies for reliable Web services -- Redundancy
- Spatial redundancy
  - Static redundancy: all replicas are active at the same time, and voting takes place to obtain a correct result.
  - Dynamic redundancy: one active replica is engaged at a time, while the others are kept in an active or standby state.
- Temporal redundancy
  - Redundancy in time
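The three forms of redundancy named on this slide can be contrasted in a small sketch. It is only an illustration: the function names, the use of `ConnectionError` to signal a failed replica, and the toy services at the bottom are all assumptions, not part of the original system.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Service = Callable[[int], int]


def majority_vote(replicas: Sequence[Service], request: int) -> Optional[int]:
    """Static redundancy: every replica runs, a strict majority decides."""
    answers = [replica(request) for replica in replicas]  # replicas must be non-empty
    value, count = Counter(answers).most_common(1)[0]
    return value if count > len(answers) / 2 else None


def failover_call(replicas: Sequence[Service], request: int) -> int:
    """Dynamic redundancy: one replica serves at a time, the next takes over on failure."""
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError:
            continue  # this replica is down, try the next one
    raise ConnectionError("all replicas failed")


def retry_call(service: Service, request: int, attempts: int = 3) -> int:
    """Temporal redundancy: repeat the same request on the same service."""
    last_error = None
    for _ in range(attempts):
        try:
            return service(request)
        except ConnectionError as error:
            last_error = error
    raise ConnectionError("service failed on every attempt") from last_error


# Toy services (hypothetical) used to exercise the three helpers.
def good(x: int) -> int:
    return x * 2


def faulty(x: int) -> int:
    return x * 2 + 1  # returns a wrong value


def crashed(x: int) -> int:
    raise ConnectionError("replica down")


def make_flaky(failures: int) -> Service:
    """Build a service that fails `failures` times, then answers correctly."""
    remaining = {"count": failures}

    def service(x: int) -> int:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("transient fault")
        return x * 2

    return service


print(majority_vote([good, good, faulty], 21))
print(failover_call([crashed, good], 21))
print(retry_call(make_flaky(2), 21))
```

Voting masks a wrong answer but costs one call per replica on every request; failover and retry cost extra calls only when something fails, but they cannot detect a replica that replies with a wrong value.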

Page 6: Making Services Fault Tolerant

Methodologies for reliable Web services -- Diversity
- Protects redundant systems against common-mode failures.
- With different designs and implementations, common failure modes will probably cause different error effects.
- N-version programming, recovery blocks…
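As an illustration of one of the techniques named above, the following is a minimal sketch of a recovery block: independently written variants are tried in order, and the first result that passes an acceptance test is returned. The integer square root task, both variants, and all names are assumptions chosen for the example.

```python
from typing import Callable, Sequence


def recovery_block(
    variants: Sequence[Callable[[int], int]],
    acceptance_test: Callable[[int, int], bool],
    request: int,
) -> int:
    """Try each variant in turn and return the first accepted result."""
    for variant in variants:
        try:
            result = variant(request)
        except Exception:
            continue  # a crash counts as a failed variant
        if acceptance_test(request, result):
            return result
    raise RuntimeError("no variant passed the acceptance test")


def isqrt_buggy(n: int) -> int:
    """Hypothetical variant with a design fault."""
    return n // 2


def isqrt_binary_search(n: int) -> int:
    """Hypothetical second variant, written independently of the first."""
    low, high = 0, n + 1  # invariant: low*low <= n < high*high
    while high - low > 1:
        mid = (low + high) // 2
        if mid * mid <= n:
            low = mid
        else:
            high = mid
    return low


def accepts(n: int, r: int) -> bool:
    """Acceptance test: r is the integer square root of n."""
    return r * r <= n < (r + 1) * (r + 1)


print(recovery_block([isqrt_buggy, isqrt_binary_search], accepts, 17))
```

The design fault in the first variant is permanent, so retrying it would not help; the diverse second variant is what lets the block recover.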

Page 7: Making Services Fault Tolerant

Failure Response Stages of Web Services
- Fault confinement
- Fault detection
- Diagnosis
- Fail-over
- Reconfiguration
- Recovery
- Restart
- Repair
- Reintegration

Page 8: Making Services Fault Tolerant

[Figure: flow of the failure response stages. Boxes: fault confinement, fault detection, failover, diagnosis (online/offline), reconfiguration, recovery, restart, repair, reintegration.]

Page 9: Making Services Fault Tolerant

Proposed Paradigm

[Figure: architecture of the proposed paradigm. Components: Replication Manager (Web service selection algorithm, WatchDog); UDDI registry with the WSDL; replicated Web services, each with IIS, application and database; client with port, application and database.]

1. Create Web services.
2. Select the primary Web service (PWS).
3. Register.
4. Look up.
5. Get the WSDL.
6. Invoke the Web service.
7. Keep checking the availability of the PWS.
8. If the PWS fails, reselect the PWS.
9. Update the WSDL.
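The client-facing part of the paradigm (register, look up, get the address, invoke) can be sketched with an in-memory registry. This is only an illustration under stated assumptions: the `Registry` class is a hypothetical stand-in for UDDI and the WSDL, the `example` addresses are made up, and the endpoints are plain Python functions rather than real Web services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Registry:
    """Hypothetical stand-in for the UDDI registry: service name -> published address."""
    entries: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, address: str) -> None:
        self.entries[name] = address  # step 3 (and step 9 when called again)

    def look_up(self, name: str) -> str:
        return self.entries[name]  # steps 4-5


def invoke(registry: Registry, endpoints: Dict[str, Callable[[str], str]],
           name: str, request: str) -> str:
    """Resolve the current primary address, then call it (step 6)."""
    address = registry.look_up(name)
    return endpoints[address](request)


# Two replicas of the same service, reachable under made-up addresses.
endpoints = {
    "http://replica-a.example/quote": lambda request: "A:" + request,
    "http://replica-b.example/quote": lambda request: "B:" + request,
}

registry = Registry()
registry.register("quote", "http://replica-a.example/quote")
print(invoke(registry, endpoints, "quote", "ping"))

# After a failover the published address changes; the client code does not.
registry.register("quote", "http://replica-b.example/quote")
print(invoke(registry, endpoints, "quote", "ping"))
```

Because the client always resolves the address through the registry, replacing the primary only requires updating the published entry.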

Page 10: Making Services Fault Tolerant

Work Flow of the Replication Manager

[Figure: flowchart. Steps: the RM sends a message to the Web service; reselect a primary Web service; map the new address to the WSDL; system fail. Transitions: "get reply", "do not get reply", "all services failed".]
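One cycle of the Replication Manager work flow might look like the sketch below. It rests on several assumptions: the health probe `is_alive` is a hypothetical callable, the selection rule (first live replica in list order) is a placeholder because the slides do not specify the selection algorithm, and `published_address` stands in for the address mapped into the WSDL.

```python
from typing import Callable, Dict, List


class ReplicationManager:
    def __init__(self, replicas: List[str], is_alive: Callable[[str], bool]):
        self.replicas = list(replicas)
        self.is_alive = is_alive                # hypothetical health probe
        self.primary = self.replicas[0]
        self.published_address = self.primary   # stands in for the WSDL address

    def check_once(self) -> str:
        """Run one watchdog cycle and return the current primary."""
        if self.is_alive(self.primary):
            return self.primary                 # got a reply: nothing to do
        for candidate in self.replicas:         # no reply: reselect a primary
            if candidate != self.primary and self.is_alive(candidate):
                self.primary = candidate
                self.published_address = candidate  # map the new address
                return candidate
        raise RuntimeError("system failure: all services failed")


# Simulated replica health, keyed by made-up replica names.
alive: Dict[str, bool] = {"ws-a": True, "ws-b": True, "ws-c": True}
rm = ReplicationManager(list(alive), lambda address: alive[address])

print(rm.check_once())   # primary still answers
alive["ws-a"] = False
print(rm.check_once())   # primary is replaced by the next live replica
```

In a running system this check would be repeated at the polling interval; the loop and timing are left out to keep the sketch deterministic.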

Page 11: Making Services Fault Tolerant

Road Map for Experiment Research
- Redundancy in time
- Redundancy in space
- Sequentially
- Parallel
- Majority voting using N-modular redundancy
- Diversified versions of different services

Page 12: Making Services Fault Tolerant

Experiments
- A series of experiments is designed and performed to evaluate the reliability of the Web service: a single service without replication, a single service with retry or reboot, and a service with spatial replication.
- We will also perform retry or failover when the Web service is down.

Page 13: Making Services Fault Tolerant

Summary of the experiments

                              None   Retry/Reboot   Failover   Both (hybrid)
Single service, no retry      0      --             --         --
Single service with retry     --     1              --         --
Single service with reboot    --     2              --         --
Spatial replication           --     --             3          4

Page 14: Making Services Fault Tolerant

Parameters of the Experiments

Parameter                          Current setting/metric
Request frequency                  1 req/min
Polling frequency                  5 ms
Number of replicas                 5
Client timeout period for retry    10 s
Failure rate λ                     # failures/hour
Load (profile of the program)      % or load function
Reboot time                        10 min
Failover time                      1 s

Page 15: Making Services Fault Tolerant

Experimental Results

Experiments over 360-hour periods (43200 reqs)

Number of failures:

         Normal   Server busy   Server reboots periodically
Exp 0    4928     6130          6492
Exp 1    2210     2327          2658
Exp 2    2561     3160          3323
Exp 3    1324     1711          1658
Exp 4    1089     1148          1325

Retry: 11.97% to 4.93%
Reboot: 11.97% to 6.44%
Failover: 11.97% to 3.56%
Retry and failover: 11.97% to 2.59%
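To compare the techniques on one scale, the before and after percentages quoted on this slide can be turned into relative reductions. The sketch below uses only those quoted figures; the slide does not state exactly how the percentages were derived, so the code takes them as given, and the names are illustrative.

```python
BASELINE_PERCENT = 11.97  # "before" value quoted on the slide

AFTER_PERCENT = {
    "retry": 4.93,
    "reboot": 6.44,
    "failover": 3.56,
    "retry and failover": 2.59,
}


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original value that the technique removes."""
    return (before - after) / before


for technique, after in AFTER_PERCENT.items():
    print(f"{technique}: {relative_reduction(BASELINE_PERCENT, after):.1%}")
```

On these figures, the hybrid of retry and failover removes the largest share of the baseline, and reboot the smallest.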

Page 16: Making Services Fault Tolerant

[Figure: Number of failures when the server is in a normal situation]

Page 17: Making Services Fault Tolerant

[Figure: Number of failures when the server is busy]

Page 18: Making Services Fault Tolerant

[Figure: Number of failures when the server reboots periodically]

Page 19: Making Services Fault Tolerant

Reliability of the system over time

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = 0.025$$

$$R(t) = e^{-\lambda(t)\, t}$$
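With a constant failure rate, the reliability expression above reduces to a plain exponential, which is easy to evaluate. The sketch below assumes the rate 0.025 is constant over time and expressed per hour; the slide does not state the unit, and the function names are illustrative.

```python
import math

FAILURE_RATE = 0.025  # value from the slide; per-hour unit is an assumption


def reliability(t: float, rate: float = FAILURE_RATE) -> float:
    """Probability of running without failure up to time t, for a constant rate."""
    return math.exp(-rate * t)


def mean_time_to_failure(rate: float = FAILURE_RATE) -> float:
    """For a constant failure rate the mean time to failure is 1 / rate."""
    return 1.0 / rate


for hours in (1, 10, 40, 100):
    print(f"R({hours}) = {reliability(hours):.4f}")
print(f"MTTF = {mean_time_to_failure():.1f}")
```

At t equal to the mean time to failure, the reliability is 1/e, about 0.37.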

Page 20: Making Services Fault Tolerant


Reliability Model

Page 21: Making Services Fault Tolerant

Reliability Model Parameters

ID    Description                                        Value
λn    Network failure rate                               0.02
λ*    Web service failure rate                           0.228
λ1    Resource problem rate                              0.142
λ2    Entry point failure rate                           0.150
μ*    Web service repair rate                            0.286
μ1    Resource problem repair rate                       0.979
μ2    Entry point failure repair rate                    0.979
C1    Probability that the RM responds on time           0.9
C2    Probability that the server reboots successfully   0.9
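To give a feel for how a failure rate and a repair rate combine, the sketch below evaluates the textbook two-state (up/down) Markov model. This is not the model from the slides, whose diagram is not reproduced in the transcript; it ignores the network, resource and entry point rates as well as the coverage factors C1 and C2, and is shown only as a simplified illustration using λ* and μ* from the table.

```python
WEB_SERVICE_FAILURE_RATE = 0.228  # λ* from the table
WEB_SERVICE_REPAIR_RATE = 0.286   # μ* from the table


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time spent in the 'up' state of a two-state model."""
    return repair_rate / (failure_rate + repair_rate)


availability = steady_state_availability(
    WEB_SERVICE_FAILURE_RATE, WEB_SERVICE_REPAIR_RATE
)
print(f"two-state availability of a single service: {availability:.3f}")
```

The figure describes one unreplicated service under this simplification only; replication and failover are exactly what the full model adds on top of it.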

Page 22: Making Services Fault Tolerant

Outcome (SHARPE)

[Figure: Reliability of the proposed system, plotted for failure rates 0.228, 0.114 and 0.057]

Page 23: Making Services Fault Tolerant

Conclusion
- Surveyed replication and design diversity techniques for reliable services.
- Proposed a hybrid approach to improving the availability of Web services.
- Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system.
- N-version programming may finally become commercially viable in a service environment.