Integrated Resource Management for Cluster-based Internet Services

Kai Shen, Dept. of Computer Science, Univ. of Rochester
Hong Tang, Tao Yang*, Lingkun Chu, Dept. of Computer Science, Univ. of California, Santa Barbara
*: Ask Jeeves, Inc.

Transcript of Integrated Resource Management for Cluster-based Internet Services

Page 1: Integrated Resource Management for Cluster-based Internet Services

Integrated Resource Management for Cluster-based Internet Services

Hong Tang, Tao Yang*, Lingkun Chu

Dept. of Computer Science, Univ. of California, Santa Barbara

*: Ask Jeeves, Inc.

Kai Shen

Dept. of Computer Science, Univ. of Rochester

Page 2: Integrated Resource Management for Cluster-based Internet Services

04/19/23 OSDI 2002 2

Background

Large-scale resource-intensive Internet services hosted on server clusters.

Yahoo, MSN, Google, Teoma/Ask Jeeves …

Challenges/requirements for resource management:

Scalability and robustness;
Online users require interactive responses;
Resource (CPU, I/O)-hungry service processing and large user traffic require efficient resource utilization;
Fluctuating user traffic requires adaptive management;
Supporting differentiated services for different types of user requests.

Page 3: Integrated Resource Management for Cluster-based Internet Services

Architecture of Targeted Services: Document Search Engine

[Figure: components shown are a firewall/Web switch, Web servers/query handlers, query caches, index servers (partitions 1, 2, and 3), and doc servers, connected by a local-area network.]

Page 4: Integrated Resource Management for Cluster-based Internet Services

“Neptune” Project Overview

Programming and runtime support to aggregate and replicate stand-alone service components.

Building blocks for scalable and robust service construction:
1. Functionally-symmetric clustering architecture;
2. Integrated resource management: quality, efficiency, and differentiation;
3. Replication management.

Page 5: Integrated Resource Management for Cluster-based Internet Services

Architecture of Targeted Services: Document Search Engine

[Figure: the search engine components (firewall/Web switch, Web servers/query handlers, query cache, index servers in partitions 1, 2, and 3, doc servers, local-area network), annotated with "Neptune runtime" and "SAP" labels.]

Page 6: Integrated Resource Management for Cluster-based Internet Services

Neptune Deployments

Service deployments:
Web document searching;
BLAST: protein sequence similarity matching;
Prototype database services: online discussion group, auction.

Production system at search engines Teoma/Ask Jeeves since 2000:
search indexes of more than 450M Web documents;
over 800 multiprocessor servers;
tens of millions of search queries per day.

Page 7: Integrated Resource Management for Cluster-based Internet Services

Outline

Project Overview
Integrated Resource Management
  Multiple Resource Management Objectives
  Two-level Mechanism
  Trace-driven Performance Evaluation on a Linux Cluster
Related Work and the Conclusion

Page 8: Integrated Resource Management for Cluster-based Internet Services

Quality-aware Resource Utilization Efficiency

Throughput: measures resource utilization efficiency.

Service response time: measures client-perceived service quality.

Aggregate service yield: measures quality-aware resource utilization efficiency.

Fulfillment of each service request generates a quality-aware service yield $Y(r)$, a function of the service response time $r$.

Service yield function: specified by service providers (flexibility).

System goal: maximizing the aggregate service yield, summed over all fulfilled requests:

$$\sum Y(r)$$
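As a rough illustration of the aggregate-yield objective, the sketch below sums a provider-supplied yield function over the response times of fulfilled requests. The names `aggregate_yield` and `sample_yield`, and the 2-second step function, are illustrative assumptions rather than anything defined in the slides.

```python
from typing import Callable, Iterable


def aggregate_yield(response_times: Iterable[float],
                    yield_fn: Callable[[float], float]) -> float:
    """Sum the yield Y(r) over the response times r of fulfilled requests."""
    return sum(yield_fn(r) for r in response_times)


# Hypothetical provider-specified yield function (an assumption for this
# example): full yield of 1.0 if served within 2 seconds, otherwise 0.
def sample_yield(r: float) -> float:
    return 1.0 if 0 <= r <= 2.0 else 0.0


# Two of the three requests finish within 2 seconds and contribute yield.
total = aggregate_yield([0.5, 1.5, 3.0], sample_yield)
```

Passing the yield function as a parameter mirrors the point that the function is specified by the service provider rather than fixed by the system.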

Page 9: Integrated Resource Management for Cluster-based Internet Services

Sample Service Yield Functions

[Figure: three sample service yield functions, each plotting service yield against response time.
<A> Maximizing throughput (with a deadline): axis labels are a constant yield and a deadline.
<B> Minimizing mean response time (with a deadline): axis labels are the full yield and a deadline.
<C> A hybrid metric: axis labels are the full yield, a full-yield deadline, a deadline, and a drop penalty.]

$$Y_{throughput}(r) = \begin{cases} C & \text{if } 0 \le r \le D, \\ 0 & \text{if } r > D. \end{cases}$$
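The three sample yield functions could be written along the following lines. Only the throughput function follows the formula on the slide; the linear decay used for the response-time and hybrid functions, the parameter names (`D_full`, `drop_penalty`), and all default values are assumptions made for this sketch.

```python
def y_throughput(r: float, C: float = 1.0, D: float = 2.0) -> float:
    """Constant yield C up to the deadline D, zero afterwards (as on the slide)."""
    return C if 0 <= r <= D else 0.0


def y_resptime(r: float, C: float = 1.0, D: float = 2.0) -> float:
    """Assumed shape: full yield C at r = 0, decaying linearly to 0 at D."""
    return C * (1 - r / D) if 0 <= r <= D else 0.0


def y_hybrid(r: float, C: float = 1.0, D_full: float = 1.0,
             D: float = 2.0, drop_penalty: float = 0.5) -> float:
    """Assumed shape: full yield until D_full, then a linear loss that
    reaches drop_penalty at the deadline D, and zero yield after D."""
    if r < 0 or r > D:
        return 0.0
    if r <= D_full:
        return C
    return C - drop_penalty * (r - D_full) / (D - D_full)
```

With these definitions, a request answered in 1.5 seconds earns the full yield under the throughput metric but only part of it under the other two, which is the trade-off the three panels contrast.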

Page 10: Integrated Resource Management for Cluster-based Internet Services

Service Differentiation

Service class – a category of service accesses that enjoy the same level of QoS support.

Client identities: paid vs unpaid, consumers vs corporate partners.

Service types or data partitions: order placement vs catalog browsing.

Service differentiation in Neptune:
Differentiated service yield functions.
Proportional resource allocation guarantee.

Page 11: Integrated Resource Management for Cluster-based Internet Services

Two-level Resource Management

[Figure: service clients send requests to the service cluster; cluster-level request distribution spreads them over the service nodes hosting the requested service. The cluster also contains other nodes.]

Page 12: Integrated Resource Management for Cluster-based Internet Services

Cluster-level: Partitioning or Not?

Periodic Server Partitioning [Zhu2001]:
Determine resource allocation at each epoch.
Partition the server pool among service classes.

Neptune does not partition servers at the cluster level:
Random polling-based load balancing evenly distributes requests for each service class to all nodes; service differentiation is then performed inside each node.

Advantages:
Functional symmetry and decentralization, giving robustness and scalability.
Better handling of system state changes: demand spikes and node failures.

Disadvantage:
Less isolation for misbehaved service classes.
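A minimal sketch of random polling-based load balancing is given below. The slides do not say how many nodes are polled or how the polled loads are compared, so the `poll_size` parameter, the "pick the least loaded of the polled nodes" rule, and the node names are all assumptions.

```python
import random
from typing import Dict, Optional


def pick_node(load: Dict[str, int], poll_size: int = 2,
              rng: Optional[random.Random] = None) -> str:
    """Poll `poll_size` randomly chosen nodes and return the least loaded one."""
    rng = rng or random.Random()
    # Sample without replacement from the nodes hosting the service.
    polled = rng.sample(sorted(load), min(poll_size, len(load)))
    return min(polled, key=lambda node: load[node])


# Hypothetical per-node queue lengths for one service.
load = {"node-a": 3, "node-b": 0, "node-c": 5}
chosen = pick_node(load, poll_size=3)  # polling every node finds the idle one
load[chosen] += 1
```

Because every client polls independently, no central dispatcher is needed, which fits the functional symmetry and decentralization listed as advantages.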

Page 13: Integrated Resource Management for Cluster-based Internet Services

Node-level Request Scheduling

[Figure: requests of Class 1 … Class N queue at the request scheduler, which feeds the worker threads. Scheduler flow:
1. Drop requests likely to generate zero yield.
2. Search for an under-allocated service class.
3. Found? Yes: schedule the under-allocated service class. No: schedule for high aggregate yield.]
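The scheduler flow can be sketched roughly as follows. Treating "likely to generate zero yield" as "deadline already passed", measuring allocation as each class's share of consumed resource, and all class and field names are assumptions of this sketch, not details given in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    deadline: float        # absolute time after which yield is assumed zero
    expected_yield: float
    expected_cost: float   # expected resource consumption


@dataclass
class ServiceClass:
    name: str
    guaranteed_share: float          # e.g. 0.2 for a 20% guarantee
    consumed: float = 0.0            # resource consumed so far
    queue: List[Request] = field(default_factory=list)


def schedule_next(classes: List[ServiceClass], now: float,
                  priority: Callable[[Request, float], float]) -> Optional[Request]:
    # Step 1: drop requests that would likely generate zero yield.
    for c in classes:
        c.queue = [q for q in c.queue if q.deadline > now]
    ready = [c for c in classes if c.queue]
    if not ready:
        return None
    total = sum(c.consumed for c in classes)
    # Step 2: search for a class whose share is below its guarantee.
    under = [c for c in ready
             if total > 0 and c.consumed / total < c.guaranteed_share]
    if under:
        # Serve the class furthest below its guaranteed share.
        chosen = min(under, key=lambda c: c.consumed / total - c.guaranteed_share)
    else:
        # Step 3: otherwise serve the class holding the best-priority request
        # (smaller priority value means higher priority).
        chosen = min(ready, key=lambda c: min(priority(q, now) for q in c.queue))
    request = min(chosen.queue, key=lambda q: priority(q, now))
    chosen.queue.remove(request)
    chosen.consumed += request.expected_cost
    return request
```

The `priority` callable is where a policy such as EDF would plug in, for example `lambda q, now: q.deadline - now`.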

Page 14: Integrated Resource Management for Cluster-based Internet Services

Scheduling for High Aggregate Yield

Policy: priority (the smaller the higher)

EDF: relative deadline;

YID: relative deadline divided by expected yield;

Greedy: expected resource consumption divided by expected yield;

Adaptive: dynamically switches between YID (in under-load) and Greedy (in overload).

Offline optimal scheduling is NP-complete.
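The four priority rules translate almost directly into code. In the sketch below the function names and the `overloaded` flag are assumptions; in particular, the slides do not say how the Adaptive policy detects overload, so that decision is left to the caller.

```python
def edf_priority(relative_deadline: float, expected_yield: float,
                 expected_cost: float) -> float:
    """EDF: the relative deadline itself (smaller value = higher priority)."""
    return relative_deadline


def yid_priority(relative_deadline: float, expected_yield: float,
                 expected_cost: float) -> float:
    """YID: relative deadline divided by expected yield."""
    return relative_deadline / expected_yield


def greedy_priority(relative_deadline: float, expected_yield: float,
                    expected_cost: float) -> float:
    """Greedy: expected resource consumption divided by expected yield."""
    return expected_cost / expected_yield


def adaptive_priority(relative_deadline: float, expected_yield: float,
                      expected_cost: float, overloaded: bool) -> float:
    """Adaptive: Greedy during overload, YID otherwise.

    How overload is detected is not stated in the slides; the caller
    supplies the `overloaded` flag here.
    """
    rule = greedy_priority if overloaded else yid_priority
    return rule(relative_deadline, expected_yield, expected_cost)
```

All four functions assume a positive expected yield; requests expected to yield nothing would be dropped before reaching this step.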

Page 15: Integrated Resource Management for Cluster-based Internet Services

Evaluation Settings

Evaluation platform:
A cluster of Linux servers connected by switched Ethernet.

Workload I: trace-driven.
Document search on a 2.5GB memory-mapped search index.
Based on 1.5M search queries selected from a one-week access trace at Ask Jeeves search in January 2002.
“Service yield”-based priority order: Gold > Silver > Bronze.

Workload II: CPU-spinning micro-benchmark.
Poisson process arrival; exponentially-distributed service processing time.
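A workload like Workload II could be generated as sketched below: Poisson arrivals correspond to exponentially distributed inter-arrival gaps, and each request draws an exponentially distributed service time. The function name, parameters, and the numbers in the usage line are assumptions, not the settings used in the evaluation.

```python
import random
from typing import List, Tuple


def generate_workload(n: int, arrival_rate: float, mean_service_time: float,
                      seed: int = 0) -> List[Tuple[float, float]]:
    """Return n (arrival_time, service_time) pairs.

    Inter-arrival gaps are exponential with rate `arrival_rate` (a Poisson
    arrival process); service times are exponential with the given mean.
    """
    rng = random.Random(seed)
    now = 0.0
    requests = []
    for _ in range(n):
        now += rng.expovariate(arrival_rate)
        service_time = rng.expovariate(1.0 / mean_service_time)
        requests.append((now, service_time))
    return requests


# Hypothetical setting: 100 requests/second, 8 ms mean service time.
workload = generate_workload(1000, arrival_rate=100.0, mean_service_time=0.008)
```

Scaling `arrival_rate` relative to the node's service capacity is one way to produce the under-load and overload demand levels used in the evaluation.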

Page 16: Integrated Resource Management for Cluster-based Internet Services

Evaluation on Scheduling Policies (16 nodes aggregate)

EDF and YID perform better than Greedy during system under-load; Greedy performs better during system overload.

Adaptive dynamically switches between YID and Greedy to achieve good performance under both situations.

Performance metric:

$$\text{LossPercent} = \frac{\text{OfferedYield} - \text{RealizedYield}}{\text{OfferedYield}} \times 100\%$$

[Figure: loss percent versus arrival demand for EDF, YID, Greedy, and Adaptive. (A) Underload: arrival demand from 0% to 100%, loss percent axis from 0% to 6%. (B) Overload: arrival demand from 100% to 200%, loss percent axis from 0% to 60%.]
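The loss-percent metric is the fraction of offered yield that was not realized, expressed as a percentage. A small helper, with a hypothetical function name and made-up numbers in the usage line, might look like this:

```python
def loss_percent(offered_yield: float, realized_yield: float) -> float:
    """Percentage of the offered yield that the system failed to realize."""
    if offered_yield <= 0:
        raise ValueError("offered_yield must be positive")
    return (offered_yield - realized_yield) / offered_yield * 100.0


# Illustrative numbers only: 150 of 200 units of offered yield were realized.
loss = loss_percent(200.0, 150.0)
```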

Page 17: Integrated Resource Management for Cluster-based Internet Services

Service Differentiation during a Demand Spike and a Node Failure (8 nodes)

“Service yield”-based priority order: Gold > Silver > Bronze. 20% proportional resource guarantee for low-priority Bronze class. Demand spike for the Silver class between time 50 and 150. One node fails at time 200 and recovers at 250.

[Figure: CPU demand and acquisition of the Gold, Silver, and Bronze classes, as a percentage of total system resource (0% to 100%), over a timeline of 0 to 300 seconds.]

Page 18: Integrated Resource Management for Cluster-based Internet Services

Performance Scalability

[Figure: aggregated yield (normalized, 0 to 20) versus number of service nodes (0 to 20) at demand levels of 75%, 125%, and 200%. <A> Differentiated Search. <B> Micro-benchmark.]

Page 19: Integrated Resource Management for Cluster-based Internet Services

Related Work

Software infrastructure for cluster-based Internet services – TACC [Fox1997], MultiSpace [Gribble1999], Porcupine [Saito1999], Ninja [von Behren2002].

QoS and service differentiation in computer networks – Weighted Fair Queuing [Demers1990; Parekh1993], Leaky Bucket, LIRA [Stoica1998], [Dovrolis1999].

QoS or real-time scheduling at the single host level – [Huang1989], [Haritsa1993], [Waldspurger1994], [Mogul1996], LRP [Druschel96], [Jones97], Eclipse [Bruno1998], Resource Container [Banga1999], [Steere1999].

Resource management and QoS for Web servers – [Almeida1998], [Pandey1998], [Abdelzaher1999], [Bhatti1999], [Chandra2000], [Li2000], [Voigt2001].

Resource management for clustered servers – LARD [Pai1998], Cluster Reserves [Aron2000], [Sullivan2000], DDSD [Zhu2001], [Chase2001].

Page 20: Integrated Resource Management for Cluster-based Internet Services

Conclusion

Multiple resource management objectives:
quality-aware resource utilization efficiency;
service differentiation.

Two-level resource management mechanism:
non-partitioning at the cluster level;
adaptive scheduling at the node level.

Trace-driven evaluations.

Future work: other types of service qualities.