Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement

On Effectively Exploiting Multiple Wireless Interfaces in Mobile Hosts

Xiaoqiao Meng, Vasileios Pappas, Li ZhangIBM T.J. Watson Research CenterImproving the Scalability of Data Center Networkswith Traffic-aware Virtual Machine PlacementIEEE INFOCOM 20101INTRODUCTIONThe scalability of modern data centers has become apractical concern and has attracted significant attention in recent years.

existing solutions that require changes inthe network architecture and the routing protocols to balance traffic load

With an increasing trend towards more communicationintensive applications in data centers, the bandwidth usage between virtual machines (VMs) is rapidly growing.

INTRODUCTIONThis paper proposes using traffic-aware virtual machine (VM) placement to improve the network scalability

Many VM placement solutions seek to consolidate VMs for CPU, physical memory and power consumption savings, yet without considering consumption of network resources

This paper tackling the scalability issue by optimizing the placement of VMs on host machines.

INTRODUCTIONe.g. VMs with large mutual bandwidth usage are assigned to host machines in close proximity

design a two-tier approximate algorithm that efficiently solves the VM placement problem for very large problem sizes

INTRODUCTIONContributions

1.We address the scalability issue of data center networkswith network-aware VM placement. We formulate it asan optimization problem, prove its hardness and propose a novel two-tier algorithm.INTRODUCTIONContributions

2.We analyze the impact of data center network architectures and traffic patterns on the scalability gains attained by network-aware VM placement.INTRODUCTIONContributions

3.We measure traffic patterns in production data centerenvironments, and use the data to evaluate the proposedalgorithm as well as the impact analysis.INTRODUCTIONProblem definition:the Traffic-aware VM Placement Problem (TVMPP) as an optimization problem.

Input: traffic matrix, cost matrixOutput: where VMs should be placedGoal: minimize the total cost

Data Center Traffic PatternsExamine traces from two data-center-like systems:

1. a data warehouse hosted by IBM Global Services the incoming and outgoing traffic rates for 17 thousand VMs

2. the incoming and outgoing TCP connections for 68 VMs

10 days measurement

Data Center Traffic PatternsUneven distribution of traffic volumes from VMs

Data Center Traffic PatternsStable per-VM traffic at large timescale:

Data Center Traffic PatternsWeak correlation between traffic rate and latency:

Data Center Traffic PatternsThe potential benefit : increased network scalability and reduced average traffic latency.

The observed traffic stability over large timescale suggests that it is feasible to find good placements based on past traffic statisticsData Center Network Architectures

Data Center Network Architectures

Problem FormulationCij: the communication cost from slot i to jDij: traffic rate from VM i to jei: external traffic rate for VMigi:communication cost between VMi and the gateway

Problem FormulationThe above objective function is equivalent to

X: permutation matrix

Problem Formulationwe define Cij as the number of switches on the routing path from VM i to j.

Accordingly, optimizing TVMPP is equivalent to minimizing average traffic latency caused by network infrastructure.

Offline scenario & Online scenarioComplexity AnalysisTVMPP falls into the category of Quadratic Assignment Problem (QAP)

QAP: n things are put into n location, with distance & flows

NP-hard, finding the optimality of QAP problems with size > 15 is practically impossibleComplexity AnalysisTheorem 1: For a TVMPP problem defined on a data center that takes one of the topology in Figure 4, finding the TVMPP optimality is NP-hardTheorem 1This can be proved by a reduction from the Balanced Minimum K-cut Problem (BMKP)

BMKP: G = (V,E) undirected, weighted graph with n vertices

A k-cut on G is defined as a subset of E that partition G into k components

BMKP is NP-hardTheorem 1Now considering a data center network, regardless of whichtopology being used, we can always create a network topology that satisfy:

n slots that are partitioned into k slot-clusters of equal size n/kEvery two slots have a connection with certain costwithin the same cluster : ciacross clusters cost: co co > ci

Theorem 1Suppose there are n VMs with traffic matrix D. By assigning these n VMs to the n slots, we obtain a TVMPP problem

if we define a graph with the n VMs as nodes and D as edge weights, we obtain a BMKP problem

It can be shown that when the TVMPP is optimal, the associated BMKP is also optimal.

And vise versaTheorem 1when the TVMPP is optimal, if we swap any two VMs i, j that have been assigned to two slot-cluster r1, r2 respectively, the k-cut weight will increase.

Let s1 denote the set of VMs assigned to r1Let s2 denote the set of VMs assigned to r2

the TVMPP objective value increases:

The amount of change for the k-cut weight is:

ALGORITHMSProposition 1: Suppose 0 a1 a2 . . . an and 0 b1 b2 . . . bn, the following inequalities hold for any permutation on [1, . . . , n]

ALGORITHMSFirst, according to Proposition 1, solving TVMPP is intuitively equivalent to finding a mapping of VMs to slots such that VM pairs with heavy mutual traffic be assigned to slot pairs with low-cost connections.

ALGORITHMSThe second design principle is divide-and-conquer:we partition VMs into VM-clusters and partition slots into slot clusters.

VM-clusters are obtained via classical min-cut graph algorithm which ensures that VM pairs with high mutual traffic rate are within the same VM cluster

Slot-clusters are obtained via standard clustering techniques which ensures slot pairs with low-cost connections belong to the same slot-clusterALGORITHMSSlotClustering: Minimum k-clustering ,NP-hard. ( an approximation ratio 2 ) O(nk)

VMMinKcut: minimum k-cut algorithmO(n4)

Assign VMs to slotsRecursive call

ALGORITHMS

IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNSGlobal Traffic Model:each VM sends traffic to every other VM at equal and constant rateFor any permutation, matrix X, holds

This simplifies the TVMPP problem to the following:

IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNSwhich is the classical Linear Sum Assignment Problem (LSAP) . The complexity for LSAP is O(n3)

Random placement:

IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS

IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNSPartitioned Traffic Model:Under the partitioned traffic model, each VM belongs to agroup of VMs and it sends traffic only to other VMs in thesame group

The GLB is a lower bound for the optimal objective value of a QAP problem



observation

EVALUATION

DISCUSSIONCombining VM migration with dynamic routing protocols

VM placement by joint network and server resource optimization:Amos Brocco, Apostolos Malatras, Ye Huang, Beat HirsbrunnerDepartment of InformaticsUniversity of Fribourg, SwitzerlandARiA: A Protocol for Dynamic Fully DistributedGrid Meta-SchedulingICDCS 2010

40INTRODUCTIONAn advantage of grid systems is their ability to guarantee efficient meta-scheduling (optimal allocation of jobs across a pool of sites with diverse local scheduling policies )

The centralized nature of current meta-scheduling solutions is not well suited for the envisioned increasing scale and dynamicity

INTRODUCTIONThis paper focuses on grid task meta-scheduling, by presenting a fully distributed protocol named ARiA to achieve efficient global dynamic scheduling across multiple sites

The meta-scheduling process is performed online, and takes into account the availability of new resources as well as changes in actual allocation policies

ARiA PROTOCOLJob Submission PhaseJob Acceptance PhaseDynamic Rescheduling Phase

Job Submission PhaseJobs are assigned a universal unique identifier (UUID)

Nodes receiving job submissions are referred to as initiators for these jobs

Initiators issue resource discovery queries across the grid peer-to-peer overlay by broadcasting REQUEST messages

Job Acceptance PhaseIf the request cannot be satisfied, the message is further forwarded on the peer-to-peer overlay

otherwise a cost value for the job based on actual resources and current scheduling is computed and sent back to the jobs initiator by means of an ACCEPT message

The initiator evaluates incoming ACCEPT responses, and selects the best qualified node, and sends an ASSIGN message

Dynamic Rescheduling Phasethe assignee attempts to find candidates for rescheduling of jobs in its queue while their execution has not yet started by the INFORM messagges

The structure of INFORM messages relates to that of REQUEST messages

EVALATIONFor the evaluation of ARiA, an overlay of 500 nodes with a target average path length of 9 hops was deployed in a custom simulator

The average nodes degree attained during simulations was 4, resulting in about 2000 overlay linksEVALATIONIn all scenarios a total of 1000 jobs is submitted to randomnodes on the grid. Unless otherwise specified, jobs aresubmitted at 10 seconds intervals

when dynamic rescheduling is enabled, INFORM messages are sent for at most 2 scheduled jobs every 5 minutes

EVALATIONREQUEST messages are forwarded on the overlay for at most 9 hops; at each step, at most 4 random neighbors of the current node are contacted

INFORM messages a more lightweight approach is followed, with at most 8 hops and up to 2 neighbors

EVALATION

EVALATION

EVALATION

EVALATION

EVALATION

EVALATION

EVALATION

EVALATION

EVALATION

EVALATION

EVALATION

Amir Epstein Dean H. Lorenz Ezra Silvera Inbar ShapiraVirtualization Technologies, System Technologies & ServicesIBM Haifa Research Lab, Haifa, IsraelVirtual Appliance Content Distributionfor a Global Infrastructure Cloud ServiceIEEE INFOCOM 201061INTRODUCTIONAn emerging cloud service is a virtual server shop, that allows cloud customers to order virtual appliances to be delivered virtually on the cloud

Global cloud providers need to create customized virtual-server disk images and deliver them on time to meet the customer reservations and service level

In order to reduce provisioning time and meet reservation deadlines, one approach is to stage images on storage near the customer

INTRODUCTIONThis introduces an optimization problem of finding an optimal staging schedule, according to network bandwidth, pending reservations schedule, and customer value

Continuous model vs integral model

Problem Definitiona staging storage space with capacity C

n appliance deployment requests

Each request i is for an appliance that consume Ci staging capacity and has a desired due date di

propagation time pi (assume pi = kCi)Continuous model

Continuous modelLemma 4.1: Any feasible schedule S can be turned intoright-tight feasible schedule by right-shifting job executionintervals.

Lemma 4.2: There exists an optimal schedule in which thejobs are processed in EDD (Earliest Due Date) order

Continuous modelAlgorithm 1 is a dynamic program that finds an optimal schedule that is right-tight and in EDD orderO(nW ) time and space

THE INTEGRAL MODELp1=3, d1=5p2=3, d2=8P3=2, d3=9Capacity = 5

s1=2, s2=5, s3=0But can not be an EDD order

SOLUTIONOur algorithms for the integral model have two steps

In the first step, we solve the problem for the continuous model

In the second step, we discard jobs from this schedule, without losing too much weight, to obtain a feasible schedule for the integral modelTHE INTEGRAL MODELLemma 5.1: For unweighted jobs, any feasible schedule Sfor the continuous model can be transformed to a feasibleschedule S for the integral model with w(S)>=1/2w(S).

Lemma 5.2: For weighted jobs, any feasible schedule S forthe continuous model can be transformed to a feasible schedule S* for the integral model with w(S*)>=1/2w(S)

SOLUTION

IMPLEMENTATION AND SIMULATION RESULTS

Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement

Documents

Transcript of Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement