Cross-Layer Scheduling in Cloud Systems

Hilfi Alkaff, Indranil Gupta, Luke Leslie

Department of Computer Science

University of Illinois at Urbana-Champaign

Distributed Protocols Research Group: http://dprg.cs.uiuc.edu

Inside a Datacenter: Networks Connecting Servers

Tree

Fat Tree [Leiserson 85]

Jellyfish [Singla 12]

Clos [Dally 04]

VL2 [Greenberg 09]

Inside a Datacenter: Networks Connecting Servers

Figure: Tree, Fat Tree [Leiserson 85], Jellyfish [Singla 12], Clos [Dally 04], and VL2 [Greenberg 09], shown under the labels “Structured Networks” and “Unstructured Networks and/or routing”.

SDN

• Software Defined Networking

• For any end-host pair, multiple routes available

• SDN Controller helps to choose one of these routes

– Configures switches accordingly

• Which route is the “best”?

SDNs and Applications

• Which route is the “best”?

• Our approach

– Best network routes should really be decided based on the application that is using the network

  • To minimize interference (and thus congestion) and to optimize bandwidth use

  • Today: SDN routes are selected in an application-agnostic way

– But the application itself can help, by placing tasks at servers

  • Today: Applications schedule tasks in a network-agnostic way, leading to bad bandwidth utilization

– SDN Controller and Application Scheduler should coordinate with each other

  • This is our cross-layer scheduling approach

Applications: Short Real-Time Analytics Jobs

Batch Processing: MapReduce, Hadoop

Stream Processing: Storm

Tasks

Figure: tasks in Storm and in Hadoop.

Tasks and Flows

Figure: tasks and flows in Storm and in Hadoop.

Challenges

• Two large state spaces to explore

1. Set of Possible Routes for each end-to-end flow

– Large numbers of flows and possible routes

2. Set of Possible Task to Server Placements

– Large numbers of servers and tasks

Our Strategy

• To explore state space, use simulated annealing

– At application level scheduler

– And separately at routing (SDN) level

• Simulated Annealing

– Probabilistic approach

– Avoids getting stuck in local optima by keeping a non-zero probability of jumping away

– Probability of jumping away decreases quickly over time (annealing process for steel)

Pre-computation

• For all pairs of servers, pre-compute the k shortest paths

– Store them in a hash table, indexed by server pair

– Compact storage by merging overlapping routes (for a server pair) into a tree

• Small in size and quick to compute

– 1000 servers, k=10

– 50 M entries

– After compaction, 6 MB

– 3 minutes to generate
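
The pre-computation step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it enumerates loop-free paths in order of hop count with a best-first search, stores them in a dictionary keyed by server pair, and skips the tree compaction entirely. The function names and the toy topology are assumptions made for the example.

```python
import heapq
from itertools import combinations

def k_shortest_paths(graph, src, dst, k):
    """Return up to k loop-free paths from src to dst, fewest hops first."""
    paths = []
    # Best-first search over partial paths, ordered by hop count.
    frontier = [(0, [src])]
    while frontier and len(paths) < k:
        hops, path = heapq.heappop(frontier)
        node = path[-1]
        if node == dst:
            paths.append(path)
            continue
        for nxt in graph[node]:
            if nxt not in path:  # keep paths simple (no repeated nodes)
                heapq.heappush(frontier, (hops + 1, path + [nxt]))
    return paths

def precompute_routes(graph, servers, k):
    """Build a hash table: (server_a, server_b) -> list of up to k paths."""
    table = {}
    for a, b in combinations(sorted(servers), 2):
        table[(a, b)] = k_shortest_paths(graph, a, b, k)
    return table

# Hypothetical toy topology: two servers, each attached to two switches.
graph = {
    "h1": ["s1", "s2"],
    "h2": ["s1", "s2"],
    "s1": ["h1", "h2", "s2"],
    "s2": ["h1", "h2", "s1"],
}
routes = precompute_routes(graph, ["h1", "h2"], k=3)
```

The exhaustive search here is only practical for small graphs; a production version would likely use a dedicated k-shortest-paths algorithm and the compaction described on the slide.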

When a Job Arrives

• Don’t change the allocations or routes of existing jobs

– Non-intrusive

– Reduces state space to explore

• Simulated Annealing is run offline, and the resultant schedule is used to schedule the new job’s tasks and flows

• Primary Simulated Annealing (SA) runs at Application level

– Calls Routing level SA

Simulated Annealing Steps

• Start from an arbitrary state

– Tasks to servers, and routes to flows

• Generate next-state S’ (At Application Level)

1. De-allocate one task

• Prefer tasks that affect computation more, e.g., closer to beginning or end of topology

2. Allocate this task to random server

3. Call Routing level SA
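
A possible shape for the application-level move (steps 1 and 2) is sketched below. The slides do not say how the preference for tasks near the beginning or end of the topology is computed, so the weighting formula, the `depth` map, and all names here are assumptions for illustration only.

```python
import random

def next_placement(placement, depth, max_depth, servers, rng):
    """Return (moved_task, new_placement) with one task moved to a random server."""
    tasks = list(placement)
    # Hypothetical weighting: tasks near the beginning (depth 0) or the end
    # (depth max_depth) of the topology are more likely to be picked.
    weights = [1 + abs(2 * depth[t] - max_depth) for t in tasks]
    task = rng.choices(tasks, weights=weights, k=1)[0]
    # Copy so the current state survives if the move is later rejected.
    new_placement = dict(placement)
    new_placement[task] = rng.choice(servers)
    return task, new_placement

# Toy example: a three-stage pipeline placed on three servers.
placement = {"source": "h1", "mid": "h2", "sink": "h3"}
depth = {"source": 0, "mid": 1, "sink": 2}
servers = ["h1", "h2", "h3", "h4"]
moved, candidate = next_placement(placement, depth, 2, servers, random.Random(0))
```

Returning a fresh dictionary keeps the current state intact, which matters because the annealing loop may decide not to transition to the candidate.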

Simulated Annealing Steps (2)

…

3. Call Routing level SA

(At Routing Level)

4. De-path one route

• Select random server pair

• Remove its worst path

– Prefer higher number of hops, and break ties by lower bandwidth

5. Allocate Path: Change this route to a better path

– Prefer lower number of hops, and break ties by higher bandwidth
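
One reading of the routing-level move is sketched below. It assumes each server pair keeps a list of paths currently in use, drawn from a pre-computed candidate list, and that a per-path bandwidth figure is available. The names `in_use`, `candidates`, and `bandwidth` are hypothetical, and the slides may intend a different bookkeeping.

```python
import random

def hops(path):
    return len(path) - 1

def routing_step(in_use, candidates, bandwidth, rng):
    """Swap the worst in-use path of one random server pair for an unused one."""
    pair = rng.choice(sorted(in_use))
    current = in_use[pair]
    # De-path: worst = most hops, ties broken by lower bandwidth.
    worst = max(current, key=lambda p: (hops(p), -bandwidth[p]))
    # Allocate: best unused candidate = fewest hops, ties by higher bandwidth.
    unused = [p for p in candidates[pair] if p not in current]
    if not unused:
        return in_use  # nothing to swap in
    best = min(unused, key=lambda p: (hops(p), -bandwidth[p]))
    new_in_use = dict(in_use)
    new_in_use[pair] = [p for p in current if p != worst] + [best]
    return new_in_use

# Toy example with paths as tuples so they can be dictionary keys.
short_a = ("h1", "s1", "h2")
short_b = ("h1", "s2", "h2")
long_c = ("h1", "s1", "s2", "h2")
candidates = {("h1", "h2"): [short_a, short_b, long_c]}
bandwidth = {short_a: 40, short_b: 90, long_c: 100}
in_use = {("h1", "h2"): [short_a, long_c]}
after = routing_step(in_use, candidates, bandwidth, random.Random(0))
```

In the toy example the three-hop path is dropped in favor of the unused two-hop path. Whether the swap is kept is left to the utility comparison in the annealing loop.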

Simulated Annealing Steps (3)

• After generating next-state S’

– Calculate utility(S’)

– Utility function considers all jobs in cluster (not just new job)

– Utility function accounts for bottlenecked paths from source tasks to sink tasks

• If utility(S’) > utility(current state)

– Transition from current state to S’

• If utility(S’) ≤ utility(current state)

– Transition with probability e^((utility(S’) − utility(current state)) / t)

– Non-zero probability of transitioning even if S’ is a worse state

– Probability decreases over time as t is lowered

• Wait until convergence

• Re-run entire simulated annealing 5 times, and take best result
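
The acceptance rule and the repeated runs can be sketched as below. This is a generic simulated annealing loop under stated assumptions: a geometric cooling schedule, a fixed step count in place of a convergence test, and a toy one-dimensional utility. None of the parameter names or values come from the slides.

```python
import math
import random

def accept_probability(u_next, u_current, t):
    """Probability of moving to the next state at temperature t."""
    if u_next > u_current:
        return 1.0
    return math.exp((u_next - u_current) / t)

def anneal(initial, neighbor, utility, rng, t0=1.0, cooling=0.9, steps=200):
    """Run one annealing pass and return the best state seen."""
    current, best = initial, initial
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        p = accept_probability(utility(candidate), utility(current), t)
        if rng.random() < p:
            current = candidate
        if utility(current) > utility(best):
            best = current
        t *= cooling  # assumed geometric cooling: t shrinks every step
    return best

def anneal_with_restarts(initial, neighbor, utility, seed=0, restarts=5):
    """Re-run annealing several times and keep the best result."""
    runs = [anneal(initial, neighbor, utility, random.Random(seed + i))
            for i in range(restarts)]
    return max(runs, key=utility)

# Toy problem: walk over the integers to maximize -(x - 3)^2.
toy_utility = lambda x: -(x - 3) ** 2
toy_neighbor = lambda x, rng: x + rng.choice([-1, 1])
result = anneal_with_restarts(0, toy_neighbor, toy_utility)
```

In the cross-layer setting, `neighbor` would correspond to the application-level move followed by the routing-level move, and `utility` to the cluster-wide function described on the slide.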

Experiments

• Implemented in Apache Hadoop (YARN)

• Implemented in Apache Storm

• Deployment experiments on Emulab: up to 30 hosts

– Emulated network using ZeroMQ and Thrift

– Emulated Fat-Tree and Jellyfish

• Larger scale simulation experiments

– Up to 1000 hosts

Experimental Settings

• 10 hosts, 100 Mbps, 5 links per router, #links selected via scaling rules

– 3 GHz, 2 GB RAM

• Hadoop cluster workload

– Facebook’s SWIM benchmark

– Shuffle ranges from 100 B to 10 GB

– 1 job per second

• Storm cluster workload: Random tree topologies

– Topologies constructed randomly, with number of children selected by Gaussian (mean = sd = 2)

– 100 B tuples

– Each source generates 1 MB to 100 MB of data

– 10 jobs per minute

• Each experimental run is 10 minutes
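
A random tree topology of the kind described for the Storm workload could be generated along these lines. The slides give only the Gaussian parameters (mean = sd = 2), so the rounding, the clamping of negative draws to zero, and the `max_nodes` cap are assumptions added to make the sketch terminate.

```python
import random

def random_tree(rng, max_nodes=20, mean=2.0, sd=2.0):
    """Return {node: [children]} for a random tree rooted at node 0."""
    children = {0: []}
    frontier = [0]
    next_id = 1
    while frontier and next_id < max_nodes:
        node = frontier.pop(0)  # expand nodes breadth-first
        # Assumption: round the Gaussian draw and clamp negatives to zero.
        n = max(0, round(rng.gauss(mean, sd)))
        for _ in range(n):
            if next_id >= max_nodes:
                break
            children[node].append(next_id)
            children[next_id] = []
            frontier.append(next_id)
            next_id += 1
    return children

tree = random_tree(random.Random(1))
```

Because a draw of zero children is fairly likely with these parameters, some generated trees stop after only a few nodes.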

Storm on Jellyfish Topology

App+Routing SA: 34.1% improvement in throughput at 30 hosts

Application-only SA: 21.2%

Routing-only SA: 23.2%

Performance improves with scale

Hadoop on Fat-Tree Topology

App+Routing SA: 26% improvement in throughput at 30 hosts

Application-only SA & Routing-only SA: smaller than combining both

Performance improves with scale

Other Experimental Results

• Similar results for other combinations

• Hadoop on Jellyfish

– App+Routing SA: 31.9% improvement in throughput at 30 hosts

– Performance improves with scale

– Application-only SA: 18.8%

– Routing-only SA: 25.5%

• Storm on Fat-Tree

– App+Routing SA: 30% improvement in throughput at 30 hosts

– Performance improves with scale

– Application-only SA: 21.1%

– Routing-only SA: 22.7%

Other Experimental Results (2)

• Scheduling time is small

– Time to schedule a new job in a 1000 server cluster

– Fat-Tree: 0.48 s (Hadoop) to 0.53 s (Storm)

– Jellyfish: 0.67 s (Hadoop) to 0.74 s (Storm)

• No starvation

– Worst case degradation in completion time for any job is 20% in Hadoop, 30% in Storm

– Outliers are large jobs (rare in real-time analytics with short jobs)

• Fault-recovery is fast

– Upon failure, re-run simulated annealing once

– Recovery occurs within 0.35 s to 0.4 s

Takeaways

• Today: Application schedulers and SDN scheduler are disjoint

– Leads to suboptimal placement and routing

• Our approach: coordinated cross-layer scheduling

– Explore small state spaces

– Use simulated annealing

• At 30 hosts, gives between 26% and 34% improvement in Hadoop and Storm for both structured/unstructured networks

– Other networks will fall between these two numbers

• Overheads are small, and improvement gets better with scale

Ongoing/Future Work

Our work opens the door:

• Explore other heuristics, e.g., data affinity for tasks, congestion

• Explore other non-SA approaches

• Available bandwidth estimation

• OpenFlow integration

• Batching multiple jobs into scheduling
