Wide-Area Service Composition: Evaluation of Availability and Scalability


Bhaskaran Raman, SAHARA, EECS, U.C. Berkeley

[Figure: example composed services spanning Providers A, B, Q and R: Video-on-demand server, Transcoder, Thin Client; Email repository, Text-to-audio, Cellular Phone]

Problem Statement and Goals

Goals
– Performance: choose the set of service instances
– Availability: detect and handle failures quickly
– Scalability: Internet-scale operation

Problem Statement
– Path could stretch across
  • multiple service providers
  • multiple network domains
– Inter-domain Internet paths:
  • Poor availability [Labovitz’99]
  • Poor time-to-recovery [Labovitz’00]
– Take advantage of service replicas

[Figure: Video-on-demand server, Transcoder and Thin Client, with service instances at Provider A and Provider B]

Related Work
– TACC: composition within cluster
– Web-server choice: SPAND, Harvest
– Routing around failures: Tapestry, RON

We address: wide-area network performance and failure issues for long-lived composed sessions

Is “quick” failure detection possible?

• What is a “failure” on an Internet path?
  – Outage periods happen for varying durations
• Study outage periods using traces
  – 12 pairs of hosts
    • Berkeley, Stanford, UIUC, UNSW (Aus), TU-Berlin (Germany)
    • Results could be skewed due to Internet2 backbone?
  – Periodic UDP heart-beat, every 300 ms
  – Study “gaps” between receive-times
• Results:
  – Short outage (1.2-1.8 sec) ⇒ Long outage (> 30 sec)
    • Sometimes this is true over 50% of the time
  – False-positives are rare:
    • O(once an hour) at most
  – Similar results with ping-based study using ping-servers
  – Take-away: it is okay to react to short outage periods by switching the service-level path
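The trace methodology (periodic heart-beats, then counting “gaps” between receive-times) can be sketched as below. This is a minimal illustration, not the study's analysis code: it assumes a detector that times out at 1.8 s (the upper end of the quoted range) and assumes that a gap counts as a real failure only if it exceeds 30 s, with any shorter gap that still trips the timeout counted as a false positive. The function names are hypothetical.

```python
from typing import List, Tuple

HEARTBEAT_PERIOD_S = 0.3  # heart-beat period used in the trace study (300 ms)
TIMEOUT_S = 1.8           # ASSUMED detection timeout: upper end of the 1.2-1.8 s range
LONG_OUTAGE_S = 30.0      # ASSUMED: a gap longer than this counts as a failure


def receive_gaps(recv_times: List[float]) -> List[float]:
    """Gaps, in seconds, between successive heart-beat receive times."""
    return [later - earlier for earlier, later in zip(recv_times, recv_times[1:])]


def classify_gaps(recv_times: List[float],
                  timeout: float = TIMEOUT_S,
                  long_outage: float = LONG_OUTAGE_S) -> Tuple[int, int]:
    """Return (false_positives, failures) for one heart-beat trace.

    Any gap longer than `timeout` would make a detector declare failure.
    If the gap also exceeds `long_outage` it is counted as a real failure;
    otherwise the path healed by itself and the detection was a false positive.
    """
    false_positives = failures = 0
    for gap in receive_gaps(recv_times):
        if gap > long_outage:
            failures += 1
        elif gap > timeout:
            false_positives += 1
    return false_positives, failures


if __name__ == "__main__":
    # Synthetic trace: steady beats, a 2.5 s gap that heals, then a 40 s outage.
    trace = [i * HEARTBEAT_PERIOD_S for i in range(10)]
    trace += [trace[-1] + 2.5 + i * HEARTBEAT_PERIOD_S for i in range(10)]
    trace += [trace[-1] + 40.0]
    print(classify_gaps(trace))  # (1, 1)
```

Lowering `timeout` toward 1.2 s detects failures sooner but lets more short, self-healing gaps through as false positives; that trade-off is what the 1.2-1.8 s range captures.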

UDP-based keep-alive stream

HB destination   HB source    Total time (hrs:min:sec)   Num. false positives   Num. failures
Berkeley         UNSW         130:48:45                  135                    55
UNSW             Berkeley     130:51:45                  9                      8
Berkeley         TU-Berlin    130:49:46                  27                     8
TU-Berlin        Berkeley     130:50:11                  174                    8
TU-Berlin        UNSW         130:48:11                  218                    7
UNSW             TU-Berlin    130:46:38                  24                     5
Berkeley         Stanford     124:21:55                  258                    7
Stanford         Berkeley     124:21:19                  2                      6
Stanford         UIUC         89:53:17                   4                      1
UIUC             Stanford     76:39:10                   74                     1
Berkeley         UIUC         89:54:11                   6                      5
UIUC             Berkeley     76:39:40                   3                      5

Acknowledgements: Mary Baker, Mema Roussopoulos, Jayant Mysore, Roberto Barnes, Venkatesh Pranesh, Vijaykumar Krishnaswamy, Holger Karl, Yun-Shen Chang, Sebastien Ardon, Binh Thai
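The “O(once an hour) at most” claim about false positives can be checked directly against the keep-alive table by dividing each false-positive count by the trace duration. The short script below does only that arithmetic on the transcribed rows; the helper names are illustrative.

```python
# Rows transcribed from the keep-alive table:
# (HB destination, HB source, total time "hrs:min:sec", false positives, failures)
ROWS = [
    ("Berkeley", "UNSW", "130:48:45", 135, 55),
    ("UNSW", "Berkeley", "130:51:45", 9, 8),
    ("Berkeley", "TU-Berlin", "130:49:46", 27, 8),
    ("TU-Berlin", "Berkeley", "130:50:11", 174, 8),
    ("TU-Berlin", "UNSW", "130:48:11", 218, 7),
    ("UNSW", "TU-Berlin", "130:46:38", 24, 5),
    ("Berkeley", "Stanford", "124:21:55", 258, 7),
    ("Stanford", "Berkeley", "124:21:19", 2, 6),
    ("Stanford", "UIUC", "89:53:17", 4, 1),
    ("UIUC", "Stanford", "76:39:10", 74, 1),
    ("Berkeley", "UIUC", "89:54:11", 6, 5),
    ("UIUC", "Berkeley", "76:39:40", 3, 5),
]


def to_hours(hms: str) -> float:
    """Convert 'hrs:min:sec' to fractional hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600


def false_positives_per_hour(row) -> float:
    """False-positive count of one row divided by its trace duration in hours."""
    _dest, _src, total, false_positives, _failures = row
    return false_positives / to_hours(total)


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row[0]:>9} / {row[1]:<9} {false_positives_per_hour(row):5.2f} per hour")
    print(f"worst case: {max(map(false_positives_per_hour, ROWS)):.2f} per hour")
```

The worst row (the Berkeley/Stanford pair, 258 false positives over about 124.4 hours) works out to roughly two per hour, which is consistent with the order-of-magnitude bound on the slide.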

Architecture

[Figure: layered architecture: Application plane (composed services), Logical platform (peering relations, overlay network), Hardware platform (service clusters on the Internet); a composed session runs from Source to Destination]

Service cluster: compute cluster capable of running services

Peering: exchange performance information
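Peering clusters exchange performance information, and link-state propagation is listed among the cluster-manager functions. The talk does not spell out the flooding protocol, so the sketch below assumes a classic link-state scheme: each cluster manager advertises its measured peering-link costs with a sequence number, and every receiver stores and re-floods only advertisements newer than what it already holds. The class names and the latency metric are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LinkStateAd:
    origin: str              # cluster whose peering links were measured
    seq: int                 # bumped by the origin on every new measurement round
    links: Dict[str, float]  # peer name -> measured cost (HYPOTHETICAL metric, e.g. latency in ms)


@dataclass(eq=False)
class ClusterManager:
    name: str
    peers: List["ClusterManager"] = field(default_factory=list, repr=False)
    db: Dict[str, LinkStateAd] = field(default_factory=dict)  # origin -> newest ad seen

    def receive(self, ad: LinkStateAd) -> None:
        """Store a newer advertisement and flood it to all peers; drop stale copies."""
        known = self.db.get(ad.origin)
        if known is not None and known.seq >= ad.seq:
            return  # duplicate or older: stop flooding here
        self.db[ad.origin] = ad
        for peer in self.peers:
            peer.receive(ad)


def connect(a: ClusterManager, b: ClusterManager) -> None:
    """Make a and b peers of each other (one overlay link)."""
    a.peers.append(b)
    b.peers.append(a)


if __name__ == "__main__":
    a, b, c, d = (ClusterManager(n) for n in "ABCD")
    connect(a, b); connect(b, c); connect(c, d); connect(d, a)
    a.receive(LinkStateAd("A", 1, {"B": 20.0, "D": 35.0}))
    print({m.name: m.db["A"].seq for m in (a, b, c, d)})  # every cluster holds seq 1
```

The sequence-number check is what bounds a flood: each advertisement crosses each overlay link a small, fixed number of times, which is the cost the “Link-state floods?” scaling question refers to.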

Functionalities at the Cluster-Manager
– Finding Overlay Entry/Exit
– Location of Service Replicas
– Service-Level Path Creation, Maintenance, and Recovery
– Link-State Propagation
– At-least-once UDP
– Performance Measurement
– Liveness Detection
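Service-level path creation has to pick one replica per service so that the whole path (source, each service in order, destination) is cheap over the overlay. The slides do not give the algorithm; the sketch below uses a substitute technique: a shortest path through a layered graph, solved stage by stage with dynamic programming, assuming pairwise overlay costs are already known (for example from link-state data). All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cost = Dict[Tuple[str, str], float]


def best_service_path(source: str, dest: str,
                      replicas: Sequence[Sequence[str]],
                      cost: Cost) -> Tuple[float, List[str]]:
    """Choose one replica per service, in order, minimizing total overlay cost.

    replicas[i] lists the clusters hosting the i-th service of the composition;
    cost[(u, v)] is the ASSUMED overlay cost from cluster u to cluster v.
    Shortest path through a layered graph, computed with dynamic programming.
    """
    # best[node] = (cost of cheapest partial path ending at node, that path)
    best: Dict[str, Tuple[float, List[str]]] = {source: (0.0, [source])}
    for stage in [*replicas, [dest]]:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for v in stage:
            options = [(c + cost[(u, v)], path + [v])
                       for u, (c, path) in best.items() if (u, v) in cost]
            if options:
                nxt[v] = min(options)
        if not nxt:
            raise ValueError("no reachable replica for some service")
        best = nxt
    return best[dest]


if __name__ == "__main__":
    cost = {("S", "A1"): 10, ("S", "A2"): 30,
            ("A1", "B1"): 50, ("A1", "B2"): 10,
            ("A2", "B1"): 5, ("A2", "B2"): 40,
            ("B1", "D"): 10, ("B2", "D"): 20}
    print(best_service_path("S", "D", [["A1", "A2"], ["B1", "B2"]], cost))
    # (40.0, ['S', 'A1', 'B2', 'D'])
    del cost[("A1", "B2")]  # emulate failure of the A1 -> B2 overlay link
    print(best_service_path("S", "D", [["A1", "A2"], ["B1", "B2"]], cost))
    # (45.0, ['S', 'A2', 'B1', 'D'])
```

Recomputing from scratch after a failure, as in the usage, is only the simplest form of recovery; it shows why having two replicas per service gives an alternate path, not how the cluster managers actually signal the switch.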

Evaluation
• What is the effect of the recovery mechanism on the application?
  – Text-to-Speech application
  – Two possible places of failure
  – 20-node overlay network
  – One service instance for each service
  – Deterministic failure for 10 sec during session
  – Metric: gap between arrival of successive audio packets at the client
• What is the scaling bottleneck?
  – Parameter: # client sessions across peering clusters
    • Measure of instantaneous load when failure occurs
  – 5,000 client sessions in 20-node overlay network
  – Deterministic failure of 12 different links (12 data-points in graph)
  – Metric: average time-to-recovery

[Figure: Text Source → Text-to-audio → End-Client, with Leg-1 and Leg-2 marking the two possible places of failure; legend: request-response protocol, data (text, or RTP audio), keep-alive soft-state refresh, application soft-state (for restart on failure)]

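Recovery of Application Session: CDF of gaps > 100 ms

[Figure: CDF of gaps > 100 ms between successive audio packets at the client]
– Failure on leg-1: recovery time 822 ms (quicker than leg-2 due to buffer at text-to-audio service)
– Failure on leg-2: recovery time 2,963 ms
– Without recovery: 10,000 ms
– Jump at 350-400 ms: due to synchronous text-to-audio processing (implementation artefact)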
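The metric behind this plot (gaps between successive audio-packet arrivals, keeping only gaps above 100 ms, drawn as a CDF) can be computed as follows. This is a minimal sketch; the function names and the millisecond timestamps are illustrative assumptions.

```python
from typing import List, Tuple


def packet_gaps_ms(arrivals_ms: List[float]) -> List[float]:
    """Gaps between arrival times of successive audio packets at the client."""
    return [later - earlier for earlier, later in zip(arrivals_ms, arrivals_ms[1:])]


def cdf_of_large_gaps(arrivals_ms: List[float],
                      threshold_ms: float = 100.0) -> List[Tuple[float, float]]:
    """Points of the empirical CDF over gaps longer than threshold_ms.

    Each point is (gap, cumulative fraction of the large gaps up to this one),
    with gaps sorted in increasing order.
    """
    large = sorted(g for g in packet_gaps_ms(arrivals_ms) if g > threshold_ms)
    n = len(large)
    return [(gap, (i + 1) / n) for i, gap in enumerate(large)]


if __name__ == "__main__":
    arrivals = [0, 20, 40, 240, 260, 560, 580]
    print(cdf_of_large_gaps(arrivals))  # [(200, 0.5), (300, 1.0)]
```

Filtering at 100 ms hides the normal inter-packet spacing, so the tail of the CDF is dominated by the few long gaps that a failure and its recovery produce.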

Average Time-to-Recovery vs. Instantaneous Load
• Two services in each path
• Two replicas per service
• Each data-point is a separate run

[Figure: average time-to-recovery vs. instantaneous load, end-to-end recovery algorithm]
– High variance due to varying path length
– Load: 1,480 paths on failed link; avg. path recovery time: 614 ms

Results: Discussion
• Recovery after failure (leg-2): 2,963 ms = 1,800 + O(700) + O(450)
  – 1,800 ms: timeout to conclude failure
  – 700 ms: signaling to set up alternate path
  – 450 ms: recovery of application soft-state: re-process current sentence
• Without recovery algorithm: takes as long as failure duration
• O(3 sec) recovery
  – Can be completely masked with buffering
  – Interactive apps: still much better than without recovery
• Quick recovery possible since failure information does not have to propagate across network
• 12th data point (instantaneous load of 1,480) stresses emulator limits
  – 1,480 translates to about 700 simultaneous paths per cluster-manager
  – In comparison, our text-to-speech implementation can support O(15) clients per machine
• Other scaling limits? Link-state floods? Graph computation?
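The leg-2 decomposition above is a sum of three sequential phases. The snippet below simply adds them up and compares the total with the measured value; the only inputs are the numbers on the slide, and the “buffer needed” line is an illustration of the masking argument rather than a measured result.

```python
# Components of the leg-2 recovery time, as listed on the discussion slide (ms).
TIMEOUT_MS = 1_800       # timeout to conclude failure
SIGNALING_MS = 700       # O(700): signaling to set up the alternate path
SOFT_STATE_MS = 450      # O(450): recover application soft-state (re-process sentence)
MEASURED_LEG2_MS = 2_963
NO_RECOVERY_MS = 10_000  # without recovery the gap lasts as long as the 10 s failure


def recovery_estimate_ms(timeout_ms: int = TIMEOUT_MS,
                         signaling_ms: int = SIGNALING_MS,
                         soft_state_ms: int = SOFT_STATE_MS) -> int:
    """Sum of the three sequential recovery phases."""
    return timeout_ms + signaling_ms + soft_state_ms


if __name__ == "__main__":
    est = recovery_estimate_ms()
    print(est, MEASURED_LEG2_MS - est)  # 2950 13
    # Illustration: a playout buffer of roughly this much audio would hide the gap.
    print(f"buffer needed: >= {MEASURED_LEG2_MS} ms of audio")
```

The timeout is the largest single term, so reusing the lower end of the 1.2-1.8 sec detection range (`recovery_estimate_ms(timeout_ms=1_200)` gives 2,350 ms) shortens recovery more than any other change, at the cost of more false positives.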


Summary
• Service Composition: flexible service creation
• We address performance, availability, scalability
• Initial analysis of failure detection: meaningful to time out in O(1.2-1.8 sec)
• Design: Overlay network of service clusters
• Evaluation: results so far
  – Good recovery time for real-time applications: O(3 sec)
  – Good scalability: minimal additional provisioning for cluster managers
• Ongoing work:
  – Overlay topology issues: how many nodes, peering
  – Stability issues

Feedback, Questions?

Presentation made using VMware

[Figure: Analysis, Design, Evaluation]

Emulation Testbed

[Figure: Nodes 1-4 (App, Lib) connected through the Emulator, which applies per-link rules: Rule for 12, Rule for 13, Rule for 34, Rule for 43]

Operational limits of emulator: 20,000 pkts/sec, for up to 500-byte pkts, 1.5 GHz Pentium-4
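The testbed figure appears to route traffic between nodes through a central emulator holding one rule per directed link (Rule for 12, Rule for 34, Rule for 43, and so on). The talk does not describe the rule format, so the sketch below assumes each rule holds a fixed delay, a loss probability and a failure flag, the last being enough to emulate the deterministic link failures used in the evaluation. Class and field names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class LinkRule:
    """HYPOTHETICAL per-link rule: fixed delay, random loss, and a failure switch."""
    delay_ms: float = 0.0
    loss: float = 0.0     # probability of dropping each packet
    failed: bool = False  # set to True to emulate a deterministic link failure


class Emulator:
    """Forwards packets between nodes according to per-(src, dst) rules."""

    def __init__(self, seed: int = 0) -> None:
        self.rules: Dict[Tuple[int, int], LinkRule] = {}
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def set_rule(self, src: int, dst: int, rule: LinkRule) -> None:
        self.rules[(src, dst)] = rule

    def forward(self, src: int, dst: int, now_ms: float) -> Optional[float]:
        """Return the delivery time of a packet sent at now_ms, or None if dropped.

        Pairs without a rule are treated as unconnected (an assumption).
        """
        rule = self.rules.get((src, dst))
        if rule is None or rule.failed or self.rng.random() < rule.loss:
            return None
        return now_ms + rule.delay_ms


if __name__ == "__main__":
    em = Emulator()
    em.set_rule(1, 2, LinkRule(delay_ms=40.0))
    em.set_rule(4, 3, LinkRule(delay_ms=15.0, loss=0.01))
    print(em.forward(1, 2, now_ms=0.0))    # 40.0
    em.rules[(1, 2)].failed = True         # start a failure of link 1 -> 2
    print(em.forward(1, 2, now_ms=100.0))  # None
```

Keeping rules per directed pair (as the separate “Rule for 34” and “Rule for 43” suggest) lets an experiment fail or degrade one direction of a link independently of the other.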
