DONAR: Decentralized Server Selection for Cloud Services

Jennifer Rexford, Princeton University

Joint work with Joe Wenjie Jiang, Michael J. Freedman, and Patrick Wendell

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

User-Facing Services are Geo-Replicated

Reasoning About Server Selection

[Figure labels: Client Requests, Mapping Nodes, Service Replicas]

Example: Distributed DNS

• Clients: DNS Resolvers (Client 1, Client 2, …, Client C)

• Mapping Nodes: Auth. Nameservers (DNS 1, DNS 2, …, DNS 10)

• Service Replicas: Servers

Example: HTTP Redirection/Proxying

• Clients: HTTP Clients (Client 1, Client 2, …, Client C)

• Mapping Nodes: HTTP Proxies (Proxy 1, Proxy 2, …, Proxy 500)

• Service Replicas: Datacenters

Reasoning About Server Selection

[Figure labels: Client Requests, Mapping Nodes, Service Replicas. Annotation: Outsource to DONAR]

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

Naïve Policy Choices: “Round Robin”

[Figure labels: Client Requests, Mapping Nodes, Service Replicas]

Naïve Policy Choices: “Closest Node”

[Figure labels: Client Requests, Mapping Nodes, Service Replicas]

Goal: support complex policies across many nodes.
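As a rough illustration of the two naïve policies above, the following Python sketch writes each one in a few lines. The function names, the `cost` callback, and the replica labels are assumptions made for this example and are not part of DONAR's interface.

```python
import itertools


def round_robin(replicas):
    """Return an endless iterator that hands out replicas in a fixed rotation.

    The client's location and the replica's load are both ignored.
    """
    return itertools.cycle(replicas)


def closest_node(client, replicas, cost):
    """Return the replica with the lowest cost(client, replica).

    `cost` is a hypothetical callback, e.g. a network-distance estimate.
    Replica load is ignored.
    """
    return min(replicas, key=lambda replica: cost(client, replica))


if __name__ == "__main__":
    rotation = round_robin(["replica-a", "replica-b"])
    print([next(rotation) for _ in range(3)])

    distances = {"replica-a": 5, "replica-b": 2}  # made-up distances
    print(closest_node("client-1", list(distances), lambda c, r: distances[r]))
```

Round robin spreads load evenly but ignores proximity, while closest node optimizes proximity but ignores load. Neither can express a policy that mixes the two.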

Policies as Constraints

[Figure labels: DONAR Nodes, Replicas]

bandwidth_cap = 10,000 req/m

split_ratio = 10%

allowed_dev = ± 5%
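One way to picture these per-replica constraints is as a small record attached to each replica. The sketch below is only an assumption about how such a record could look in Python: the class name `ReplicaPolicy` and the helper `share_bounds` are hypothetical, and only the three field names come from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ReplicaPolicy:
    """Hypothetical container for one replica's constraints."""
    bandwidth_cap: Optional[float] = None  # e.g. 10_000 requests per minute
    split_ratio: Optional[float] = None    # target share of traffic, e.g. 0.10
    allowed_dev: float = 0.0               # permitted deviation, e.g. 0.05

    def share_bounds(self) -> Optional[Tuple[float, float]]:
        """Return the (low, high) traffic share implied by split_ratio ± allowed_dev.

        Returns None when no split is configured, i.e. the share is unconstrained.
        """
        if self.split_ratio is None:
            return None
        low = max(0.0, self.split_ratio - self.allowed_dev)
        high = min(1.0, self.split_ratio + self.allowed_dev)
        return (low, high)


if __name__ == "__main__":
    policy = ReplicaPolicy(bandwidth_cap=10_000, split_ratio=0.10, allowed_dev=0.05)
    print(policy.share_bounds())
```

With the values on the slide, the replica may receive roughly between 5% and 15% of all requests.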

E.g., 10-Server Deployment

How to describe policy with constraints?

No Constraints: Equivalent to “Closest Node”

[Figure: Requests per Replica across the 10 replicas: 2%, 6%, 10%, 1%, 1%, 7%, 2%, 28%, 9%, 35%]

Impose 20% Cap

Cap as Overload Protection

[Figure: Requests per Replica across the 10 replicas: 2%, 6%, 10%, 1%, 1%, 7%, 14%, 20%, 20%, 20%]

12 Hours Later…

[Figure: Requests per Replica across the 10 replicas: 5%, 16%, 29%, 4%, 3%, 16%, 3%, 10%, 12%, 3%]

“Load Balance” (split = 10%, tolerance = 5%)

[Figure: Requests per Replica across the 10 replicas: 5%, 5%, 5%, 5%, 5%, 15%, 15%, 15%, 15%, 15%]

Trade-off network proximity & load distribution
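To make the split/tolerance constraint concrete, here is a small Python check, offered as a sketch rather than as anything DONAR itself runs. The function name `within_tolerance` is an assumption. The two share lists are the per-replica percentages shown in the “Load Balance” and “No Constraints” figures.

```python
def within_tolerance(shares, split, tolerance, eps=1e-9):
    """Return True if every replica's share lies in [split - tolerance, split + tolerance].

    `eps` absorbs floating-point rounding at the boundaries.
    """
    low, high = split - tolerance, split + tolerance
    return all(low - eps <= share <= high + eps for share in shares)


# Per-replica shares read off the slides (fractions of all requests).
load_balanced = [0.05] * 5 + [0.15] * 5
closest_node = [0.02, 0.06, 0.10, 0.01, 0.01, 0.07, 0.02, 0.28, 0.09, 0.35]

if __name__ == "__main__":
    print(within_tolerance(load_balanced, split=0.10, tolerance=0.05))
    print(within_tolerance(closest_node, split=0.10, tolerance=0.05))
```

The load-balanced distribution sits exactly on the edges of the allowed band, which suggests the constraint is binding. The closest-node distribution violates it at both ends (1% and 35%).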

12 Hours Later…

[Figure: Requests per Replica across the 10 replicas: 7%, 15%, 15%, 15%, 5%, 13%, 5%, 10%, 10%, 5%]

Large range of policies by varying cap/weight

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

Optimization: Policy Realization

• Global LP describing “optimal” pairing

Clients: $c \in C$

Nodes: $n \in N$

Replica Instances: $i \in I$

Minimize network cost

s.t. server loads within tolerance

s.t. bandwidth caps met
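The slide states the LP only in words. One plausible way to write it out is sketched below. The objective reuses the symbols $\alpha_c$, $R_{ci}$, and $\mathrm{cost}(c, i)$ from the “Decomposing Objective Function” slide. The remaining symbols are assumptions introduced here for illustration: $P_i$ (share of traffic sent to replica $i$), $w_i$ (its split weight), $\varepsilon_i$ (its tolerance), $B$ (total request rate), and $B_i$ (its bandwidth cap).

```latex
\begin{aligned}
\min_{R} \quad & \sum_{c \in C} \sum_{i \in I} \alpha_c \cdot R_{ci} \cdot \mathrm{cost}(c, i)
  && \text{(network cost)} \\
\text{s.t.} \quad & P_i = \sum_{c \in C} \alpha_c \cdot R_{ci}
  && \forall i \in I \\
& \lvert P_i - w_i \rvert \le \varepsilon_i
  && \forall i \in I \quad \text{(server loads within tolerance)} \\
& B \cdot P_i \le B_i
  && \forall i \in I \quad \text{(bandwidth caps met)} \\
& \sum_{i \in I} R_{ci} = 1, \quad R_{ci} \ge 0
  && \forall c \in C
\end{aligned}
```

The last line only says that each client's mapping probabilities form a valid distribution. In practice a replica would presumably carry either a split constraint or a cap, matching the two kinds of policy in the earlier examples.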

Optimization Workflow

Per-customer!

Continuously! (respond to underlying traffic)

By The Numbers

[Figure: log scale with ticks $10^1$, $10^2$, $10^3$, $10^4$ for DONAR Nodes, Customers, replicas/customer, and client groups/customer]

Problem for each customer: $10^2 \times 10^4 = 10^6$

Measure Traffic & Optimize Locally?

[Figure labels: Mapping Nodes, Service Replicas]

Not Accurate!

[Figure labels: Client Requests, Mapping Nodes, Service Replicas]

No one node sees entire client population

Aggregate at Central Coordinator?

[Figure labels: Mapping Nodes, Service Replicas]

• Share traffic measurements ($10^6$)

• Optimize

• Return assignments ($10^6$)

So Far

                      Accurate   Efficient   Reliable
Local only            No         Yes         Yes
Central Coordinator   Yes        No          No

Decomposing Objective Function

$$
\min \; \sum_{c \in C} \sum_{i \in I} \alpha_c \cdot R_{ci} \cdot \mathrm{cost}(c, i)
\;=\;
\sum_{n \in N} s_n \sum_{c \in C} \sum_{i \in I} \alpha_{cn} \cdot R_{nci} \cdot \mathrm{cost}(c, i)
$$

• $\sum_{c \in C}$: ∀ clients; $\sum_{i \in I}$: ∀ instances; $\sum_{n \in N}$: ∀ nodes

• $\alpha_c$: traffic from c

• $R_{ci}$: prob of mapping c to i

• $\mathrm{cost}(c, i)$: cost of mapping c to i

• $s_n$: traffic to this node
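The identity can be checked numerically on a toy instance. The Python sketch below assumes the global quantities are traffic-weighted aggregates of the per-node ones, i.e. $\alpha_c = \sum_n s_n \alpha_{cn}$ and $\alpha_c R_{ci} = \sum_n s_n \alpha_{cn} R_{nci}$. That relation is an assumption made for this example, and all function names and numbers are made up.

```python
def global_cost(alpha, R, cost):
    """Left-hand side: sum over clients c and instances i of alpha_c * R_ci * cost(c, i)."""
    return sum(alpha[c] * R[c][i] * cost[c][i]
               for c in range(len(alpha)) for i in range(len(cost[0])))


def decomposed_cost(s, alpha_n, R_n, cost):
    """Right-hand side: per-node costs, each weighted by that node's traffic share s_n."""
    return sum(
        s[n] * sum(alpha_n[n][c] * R_n[n][c][i] * cost[c][i]
                   for c in range(len(cost)) for i in range(len(cost[0])))
        for n in range(len(s)))


def aggregate(s, alpha_n, R_n):
    """Build global alpha_c and R_ci from per-node values (assumed relation, see lead-in).

    Requires every client to send some traffic, so that alpha_c > 0.
    """
    nodes, clients, instances = range(len(s)), range(len(alpha_n[0])), range(len(R_n[0][0]))
    alpha = [sum(s[n] * alpha_n[n][c] for n in nodes) for c in clients]
    R = [[sum(s[n] * alpha_n[n][c] * R_n[n][c][i] for n in nodes) / alpha[c]
          for i in instances] for c in clients]
    return alpha, R


if __name__ == "__main__":
    s = [0.6, 0.4]                                  # share of traffic at each node
    alpha_n = [[0.5, 0.5], [0.25, 0.75]]            # per node: share from each client
    R_n = [[[1.0, 0.0], [0.5, 0.5]],                # per node: mapping probabilities
           [[0.0, 1.0], [0.2, 0.8]]]
    cost = [[1.0, 3.0], [2.0, 1.0]]                 # cost(c, i)
    alpha, R = aggregate(s, alpha_n, R_n)
    print(global_cost(alpha, R, cost), decomposed_cost(s, alpha_n, R_n, cost))
```

Both sides agree up to floating-point rounding, which is what lets each node own one term of the sum.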

Decomposed Local Problem For Some Node (n*)

$$
\min \quad \forall i \;\, load_i \;+\; s_{n^*} \sum_{c \in C} \sum_{i \in I} \alpha_{c n^*} \cdot R_{n^* c i} \cdot \mathrm{cost}(c, i)
$$

• $load_i$ = f(prevailing load on each server + load I will impose on each server): global load information

• $s_{n^*} \sum_{c \in C} \sum_{i \in I} \ldots$: local distance minimization

DONAR Algorithm

[Figure labels: Mapping Nodes, Service Replicas]

• Solve local problem

• Share summary data w/ others ($10^2$)
• Provably converges to global optimum

• Requires no coordination

• Reduces message passing by $10^4$
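The communication pattern (solve locally, then share a small per-replica summary) can be sketched in Python as follows. This is not the algorithm from the talk: a simple greedy heuristic stands in for each node's local optimization, so the convergence guarantee above does not apply to it. All names (`local_solve`, `summarize`, `run_rounds`) and the data layout are assumptions made for this example.

```python
def local_solve(demand, cost, others_load, cap):
    """Greedy stand-in for one node's local problem.

    demand:      {client: traffic arriving at this node}
    cost:        {(client, replica): cost of mapping client to replica}
    others_load: {replica: load already imposed by all other nodes}
    cap:         {replica: maximum total load}
    Returns {(client, replica): traffic}. Demand that fits nowhere is dropped.
    """
    remaining = {i: cap[i] - others_load.get(i, 0.0) for i in cap}
    assignment = {}
    for client, left in demand.items():
        # Try replicas from cheapest to most expensive for this client.
        for replica in sorted(cap, key=lambda i: cost[(client, i)]):
            if left <= 1e-12:
                break
            take = min(left, max(remaining[replica], 0.0))
            if take > 0:
                assignment[(client, replica)] = take
                remaining[replica] -= take
                left -= take
    return assignment


def summarize(assignment, replicas):
    """Summary data shared with other nodes: one load number per replica."""
    load = {i: 0.0 for i in replicas}
    for (_, replica), traffic in assignment.items():
        load[replica] += traffic
    return load


def run_rounds(node_demands, cost, cap, rounds=3):
    """Nodes take turns: read others' summaries, solve locally, publish own summary."""
    replicas = list(cap)
    summaries = {n: {i: 0.0 for i in replicas} for n in node_demands}
    assignments = {}
    for _ in range(rounds):
        for node, demand in node_demands.items():
            others = {i: sum(summaries[m][i] for m in summaries if m != node)
                      for i in replicas}
            assignments[node] = local_solve(demand, cost, others, cap)
            summaries[node] = summarize(assignments[node], replicas)
    return assignments, summaries


if __name__ == "__main__":
    cost = {("c1", "near"): 1.0, ("c1", "far"): 2.0,
            ("c2", "near"): 1.0, ("c2", "far"): 2.0}
    cap = {"near": 10.0, "far": 10.0}
    _, summaries = run_rounds({"A": {"c1": 8.0}, "B": {"c2": 8.0}}, cost, cap)
    print(summaries)
```

Each summary holds one number per replica (on the order of $10^2$ values) rather than one per client-replica pair ($10^6$), which is where the $10^4$ reduction in message size comes from.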

Better!

                      Accurate   Efficient   Reliable
Local only            No         Yes         Yes
Central Coordinator   Yes        No          No
DONAR                 Yes        Yes         Yes

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

Production and Deployment

• Publicly deployed 24/7 since November 2009

• IP2Geo data from Quova Inc.

• Production use:

  – All MeasurementLab Services (incl. FCC Broadband Testing)

  – CoralCDN

• Services around 1M DNS requests per day

Systems Challenges

• Network availability: Anycast with BGP

• Reliable data storage: Chain-Replication with Apportioned Queries

• Secure, reliable updates: Self-Certifying Update Protocol

CoralCDN Experimental Setup

[Figure labels: Client Requests, DONAR Nodes, CoralCDN Replicas]

split_weight = .1

tolerance = .02

Results: DONAR Curbs Volatility

[Figure panels: “Closest Node” policy; DONAR “Equal Split” Policy]

Results: DONAR Minimizes Distance

[Figure: Requests per Replica vs. Ranked Order from Closest (1 to 10), comparing Minimal (Closest Node), DONAR, and Round-Robin]

Conclusions

• Dynamic server selection is difficult

  – Global constraints

  – Distributed decision-making

• Services reap benefit of outsourcing to DONAR

  – Flexible policies

  – General: Supports DNS & HTTP Proxying

  – Efficient distributed constraint optimization

• Interested in using?

  – Visit http://www.donardns.org.

Questions?

Related Work (Academic and Industry)

• Academic

  – Improving network measurement

    • iPlane: An information plane for distributed services. H. V. Madhyastha, T. Isdal, M. Piatek, C. Dixon, T. Anderson, A. Krishnamurthy, and A. Venkataramani, in OSDI, Nov. 2006.

  – “Application Layer Anycast”

    • OASIS: Anycast for Any Service. Michael J. Freedman, Karthik Lakshminarayanan, and David Mazières, Proc. 3rd USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI '06), San Jose, CA, May 2006.

• Proprietary

  – Amazon Elastic Load Balancing

  – UltraDNS

  – Akamai Global Traffic Management

Doesn’t [Akamai/UltraDNS/etc] Already Do This?

• Existing approaches use alternative, centralized formulations.

• Often restrict the set of nodes per-service.

• Lose benefit of large number of nodes (proxies/DNS servers/etc).