On Effectively Exploiting Multiple Wireless Interfaces in Mobile Hosts
Xiaoqiao Meng, Vasileios Pappas, Li Zhang
IBM T.J. Watson Research Center
Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement
IEEE INFOCOM 2010
INTRODUCTION
The scalability of modern data centers has become a practical concern and has attracted significant attention in recent years.
Existing solutions require changes in the network architecture and the routing protocols to balance traffic load.
With an increasing trend towards more communication-intensive applications in data centers, the bandwidth usage between virtual machines (VMs) is rapidly growing.
This paper proposes using traffic-aware virtual machine (VM) placement to improve network scalability.
Many VM placement solutions seek to consolidate VMs for CPU, physical memory and power consumption savings, but without considering the consumption of network resources.
This paper tackles the scalability issue by optimizing the placement of VMs on host machines.
For example, VMs with large mutual bandwidth usage are assigned to host machines in close proximity.
The paper designs a two-tier approximate algorithm that efficiently solves the VM placement problem for very large problem sizes.
Contributions
1. We address the scalability issue of data center networks with network-aware VM placement. We formulate it as an optimization problem, prove its hardness, and propose a novel two-tier algorithm.
2. We analyze the impact of data center network architectures and traffic patterns on the scalability gains attained by network-aware VM placement.
3. We measure traffic patterns in production data center environments, and use the data to evaluate the proposed algorithm as well as the impact analysis.
Problem definition: the Traffic-aware VM Placement Problem (TVMPP) is formulated as an optimization problem.
Input: traffic matrix, cost matrix
Output: where VMs should be placed
Goal: minimize the total cost
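To make the input, output and goal concrete, here is a brute-force sketch on a hypothetical 4-VM instance. The matrices D and C are made up for illustration, external traffic is ignored, and the function names are our own. Enumerating all n! placements is only viable for tiny n, which is why the paper turns to an approximate algorithm.

```python
from itertools import permutations

def tvmpp_cost(D, C, placement):
    """Sum over VM pairs of traffic rate times slot-to-slot cost.

    placement[i] is the slot assigned to VM i (a permutation of range(n)).
    """
    n = len(D)
    return sum(D[i][j] * C[placement[i]][placement[j]]
               for i in range(n) for j in range(n))

def brute_force_tvmpp(D, C):
    """Try every VM-to-slot permutation; feasible only for very small n."""
    n = len(D)
    best = min(permutations(range(n)), key=lambda p: tvmpp_cost(D, C, p))
    return list(best), tvmpp_cost(D, C, best)

# Hypothetical traffic: VMs 0-1 and VMs 2-3 exchange heavy traffic.
D = [[0, 9, 1, 0],
     [9, 0, 0, 1],
     [1, 0, 0, 9],
     [0, 1, 9, 0]]
# Hypothetical costs: slot pairs (0, 1) and (2, 3) are close (cost 1).
C = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 1],
     [3, 3, 1, 0]]

placement, cost = brute_force_tvmpp(D, C)
print(placement, cost)  # [0, 1, 2, 3] 48
```

The optimum puts each heavily communicating VM pair on a low-cost slot pair, which is the intuition the algorithm section builds on.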
Data Center Traffic Patterns
Traces are examined from two data-center-like systems:
1. A data warehouse hosted by IBM Global Services: the incoming and outgoing traffic rates for 17 thousand VMs
2. The incoming and outgoing TCP connections for 68 VMs
10 days of measurement
Uneven distribution of traffic volumes from VMs
Stable per-VM traffic at large timescales
Weak correlation between traffic rate and latency
The potential benefits: increased network scalability and reduced average traffic latency.
The observed traffic stability over large timescales suggests that it is feasible to find good placements based on past traffic statistics.
Data Center Network Architectures
Problem Formulation
C_ij: the communication cost from slot i to slot j
D_ij: the traffic rate from VM i to VM j
e_i: the external traffic rate for VM i
g_i: the communication cost between VM i and the gateway
Problem FormulationThe above objective function is equivalent to
X: permutation matrix
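The objective itself did not survive extraction. Assuming it takes the usual QAP form Σ_{i,j} D_ij C_{π(i)π(j)} + Σ_i e_i g_{π(i)}, where π(i) is the slot of VM i (treating g as indexed by slot, since the gateway cost depends on where a VM sits), the matrix form would be tr(D X C^T X^T) + e X g^T with X[i][k] = 1 iff VM i is placed in slot k. This sketch checks numerically that the two forms agree; all function names are our own.

```python
import random

def matmul(A, B):
    """Plain-list matrix product (rows of A times columns of B)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def perm_matrix(pi):
    """X[i][k] = 1 iff VM i is placed in slot k = pi[i]."""
    n = len(pi)
    return [[1 if pi[i] == k else 0 for k in range(n)] for i in range(n)]

def objective_sum_form(D, C, e, g, pi):
    n = len(pi)
    internal = sum(D[i][j] * C[pi[i]][pi[j]] for i in range(n) for j in range(n))
    external = sum(e[i] * g[pi[i]] for i in range(n))
    return internal + external

def objective_matrix_form(D, C, e, g, pi):
    X = perm_matrix(pi)
    internal = trace(matmul(matmul(matmul(D, X), transpose(C)), transpose(X)))
    external = matmul(matmul([e], X), transpose([g]))[0][0]  # e X g^T
    return internal + external

# Random integer instance, so both forms can be compared exactly.
random.seed(1)
n = 5
D = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
C = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
e = [random.randint(0, 9) for _ in range(n)]
g = [random.randint(0, 9) for _ in range(n)]
pi = random.sample(range(n), n)
print(objective_sum_form(D, C, e, g, pi) == objective_matrix_form(D, C, e, g, pi))
```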
We define C_ij as the number of switches on the routing path from slot i to slot j.
Since every switch on a path adds latency, optimizing TVMPP under this definition is equivalent to minimizing the average traffic latency caused by the network infrastructure.
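A small sketch of how such a cost matrix could be derived from a topology graph: run BFS between two host slots and count the switch nodes on the path. The three-switch tree below (two top-of-rack switches under one aggregation switch, one slot per host) is a hypothetical example, not one of the paper's topologies.

```python
from collections import deque

def switches_on_path(adj, switches, src, dst):
    """Count switch nodes on a BFS shortest path from src to dst.

    Assumes the topology is connected, so dst is always reached.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    count, node = 0, dst
    while node is not None:          # walk back from dst to src
        if node in switches:
            count += 1
        node = parent[node]
    return count

def cost_matrix(adj, switches, slots):
    return [[0 if a == b else switches_on_path(adj, switches, a, b)
             for b in slots] for a in slots]

# Hypothetical tree: hosts h0, h1 under tor0; h2, h3 under tor1; both ToRs under agg.
edges = [("h0", "tor0"), ("h1", "tor0"), ("h2", "tor1"), ("h3", "tor1"),
         ("tor0", "agg"), ("tor1", "agg")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)
switches = {"tor0", "tor1", "agg"}
C = cost_matrix(adj, switches, ["h0", "h1", "h2", "h3"])
for row in C:
    print(row)
```

Slots under the same top-of-rack switch cost 1; slots in different racks cost 3 (ToR, aggregation, ToR).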
Offline scenario & online scenario
Complexity Analysis
TVMPP falls into the category of the Quadratic Assignment Problem (QAP).
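QAP: n facilities are assigned to n locations, given the distance between each pair of locations and the flow between each pair of facilities; the goal is to minimize the total flow-weighted distance.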
QAP is NP-hard; finding the optimum of QAP instances with size > 15 is practically impossible.
Theorem 1: For a TVMPP problem defined on a data center that takes one of the topologies in Figure 4, finding the TVMPP optimum is NP-hard.
This can be proved by a reduction from the Balanced Minimum K-cut Problem (BMKP).
BMKP: G = (V, E) is an undirected, weighted graph with n vertices.
A k-cut on G is a subset of E whose removal partitions G into k components; BMKP asks for a minimum-weight k-cut whose k components all have the same size n/k.
BMKP is NP-hard.
Now consider a data center network: regardless of which topology is used, we can always create a network topology that satisfies:
n slots that are partitioned into k slot-clusters of equal size n/k
Every two slots have a connection with a certain cost: c_i within the same cluster and c_o across clusters, where c_o > c_i
Suppose there are n VMs with traffic matrix D. By assigning these n VMs to the n slots, we obtain a TVMPP problem.
If we define a graph with the n VMs as nodes and D as edge weights, we obtain a BMKP problem.
It can be shown that when the TVMPP is optimal, the associated BMKP is also optimal.
And vice versa. When the TVMPP is optimal, if we swap any two VMs i, j that have been assigned to two slot-clusters r1, r2 respectively, the k-cut weight will increase.
Let s1 denote the set of VMs assigned to r1, and let s2 denote the set of VMs assigned to r2.
The TVMPP objective value increases:
The amount of change for the k-cut weight is:
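The swap equations were lost, but the equivalence rests on one identity, derived here under our own assumptions: no external traffic (e = 0), D_uu = 0, and u, v used as VM indices to avoid clashing with the within-cluster cost c_i. For any placement π on the clustered topology above:

```latex
\begin{aligned}
\sum_{u \neq v} D_{uv}\, C_{\pi(u)\pi(v)}
  &= c_i \sum_{u \neq v} D_{uv}
   + (c_o - c_i) \sum_{\substack{u \neq v \\ \pi(u),\, \pi(v)\ \text{in different slot-clusters}}} D_{uv} \\
  &= c_i\, W + (c_o - c_i)\, \mathrm{cut}(\pi)
\end{aligned}
```

Here W is the total traffic, which does not depend on π, and cut(π) is the traffic crossing slot-clusters (twice the k-cut weight when D is symmetric). Because c_o - c_i > 0, a placement minimizes the TVMPP objective exactly when its induced balanced partition minimizes the k-cut weight, and any swap changes the objective by (c_o - c_i) times the change in cut weight.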
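ALGORITHMS
Proposition 1: Suppose 0 ≤ a_1 ≤ a_2 ≤ ... ≤ a_n and 0 ≤ b_1 ≤ b_2 ≤ ... ≤ b_n. Then the following inequalities hold for any permutation π on [1, ..., n]:
$$\sum_{i=1}^{n} a_i b_{n-i+1} \le \sum_{i=1}^{n} a_i b_{\pi(i)} \le \sum_{i=1}^{n} a_i b_i$$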
First, according to Proposition 1, solving TVMPP is intuitively equivalent to finding a mapping of VMs to slots such that VM pairs with heavy mutual traffic are assigned to slot pairs with low-cost connections.
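Proposition 1 is the rearrangement inequality. The sketch below checks it by brute force on small sorted, non-negative sequences and shows the reading used in the design: if a holds traffic rates and b holds connection costs, the smallest total comes from pairing the largest traffic with the cheapest cost. The numbers are hypothetical; real slot-pair costs cannot be permuted freely, which is why the equivalence is only intuitive.

```python
from itertools import permutations

def paired_sum(a, b, perm):
    """sum_i a[i] * b[perm[i]]"""
    return sum(a[i] * b[perm[i]] for i in range(len(a)))

def check_rearrangement(a, b):
    """Brute-force check of Proposition 1 for sorted, non-negative a and b."""
    n = len(a)
    sums = [paired_sum(a, b, p) for p in permutations(range(n))]
    opposite = paired_sum(a, b, list(range(n - 1, -1, -1)))  # a_i with b_{n-i+1}
    same = paired_sum(a, b, list(range(n)))                  # a_i with b_i
    return min(sums) == opposite and max(sums) == same

traffic = [1, 4, 9]   # hypothetical VM-pair traffic rates, ascending
cost = [1, 2, 5]      # hypothetical slot-pair costs, ascending
print(check_rearrangement(traffic, cost))     # True
print(paired_sum(traffic, cost, [2, 1, 0]))   # heaviest traffic on cheapest link: 22
```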
The second design principle is divide-and-conquer: we partition VMs into VM-clusters and partition slots into slot-clusters.
VM-clusters are obtained via a classical min-cut graph algorithm, which ensures that VM pairs with high mutual traffic rates are within the same VM-cluster.
Slot-clusters are obtained via standard clustering techniques, which ensure that slot pairs with low-cost connections belong to the same slot-cluster.
SlotClustering: minimum k-clustering is NP-hard; an approximation algorithm with ratio 2 runs in O(nk) time.
VMMinKcut: minimum k-cut algorithm, O(n^4).
Assign VMs to slots.
Recursive call.
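The slide does not name the 2-approximation behind SlotClustering. One well-known algorithm with exactly these properties (ratio 2 for minimum k-clustering under a metric, O(nk) time) is Gonzalez's farthest-point clustering; this sketch assumes that choice and uses the slot cost matrix C as the metric. It covers only the slot side, not VMMinKcut or the recursive assignment.

```python
def farthest_first_clusters(C, k):
    """Gonzalez-style farthest-point clustering of slots under cost matrix C.

    Assumes C is symmetric with zero diagonal and 1 <= k <= number of slots.
    Uses O(n*k) cost lookups.
    """
    n = len(C)
    centers = [0]                              # arbitrary first center
    nearest = [C[0][s] for s in range(n)]      # cost to closest chosen center
    for _ in range(1, k):
        far = max(range(n), key=lambda s: nearest[s])   # farthest slot so far
        centers.append(far)
        nearest = [min(nearest[s], C[far][s]) for s in range(n)]
    clusters = {c: [] for c in centers}
    for s in range(n):
        clusters[min(centers, key=lambda c: C[c][s])].append(s)
    return list(clusters.values())

# Hypothetical switch-count costs: slots 0-1 share a rack, as do slots 2-3.
C = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 1],
     [3, 3, 1, 0]]
print(farthest_first_clusters(C, 2))  # [[0, 1], [2, 3]]
```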
IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS
Global Traffic Model: each VM sends traffic to every other VM at an equal and constant rate. For any permutation matrix X, the following holds:
This simplifies the TVMPP problem to the following:
which is the classical Linear Sum Assignment Problem (LSAP). The complexity of LSAP is O(n^3).
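A hedged illustration of why the global traffic model reduces TVMPP to an LSAP, assuming D_ij = λ for all i ≠ j: the internal term λ Σ_{i≠j} C_{π(i)π(j)} sums every off-diagonal cost exactly once, so it is the same for every placement and only Σ e_i g_{π(i)} remains. The O(n^3) bound quoted above is for general LSAP (e.g. the Hungarian algorithm); for this product-form cost with non-negative e and g, Proposition 1 suggests that sorting already suffices, which the sketch checks by brute force. All values are hypothetical.

```python
from itertools import permutations

def internal_cost(D, C, pi):
    n = len(pi)
    return sum(D[i][j] * C[pi[i]][pi[j]]
               for i in range(n) for j in range(n) if i != j)

def external_cost(e, g, pi):
    return sum(e[i] * g[pi[i]] for i in range(len(pi)))

def sorted_assignment(e, g):
    """Largest external rate gets the cheapest gateway slot (Proposition 1)."""
    vms = sorted(range(len(e)), key=lambda i: -e[i])
    slots = sorted(range(len(g)), key=lambda k: g[k])
    pi = [0] * len(e)
    for vm, slot in zip(vms, slots):
        pi[vm] = slot
    return pi

lam, n = 2, 4                                   # hypothetical global rate
D = [[0 if i == j else lam for j in range(n)] for i in range(n)]
C = [[0, 1, 3, 3], [1, 0, 3, 3], [3, 3, 0, 1], [3, 3, 1, 0]]
e = [5, 1, 3, 0]                                # hypothetical external rates
g = [2, 2, 1, 1]                                # hypothetical gateway cost per slot

perms = list(permutations(range(n)))
print({internal_cost(D, C, p) for p in perms})  # a single value: {56}
pi = sorted_assignment(e, g)
print(external_cost(e, g, pi), min(external_cost(e, g, p) for p in perms))  # 10 10
```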
Random placement:
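The random-placement formula did not survive extraction. As a derivation of our own (not necessarily the slide's expression): if π is drawn uniformly at random, every ordered pair of distinct VMs (i, j) lands on a uniformly random ordered pair of distinct slots, so E[Σ_{i≠j} D_ij C_{π(i)π(j)}] = (Σ_{i≠j} D_ij)(Σ_{k≠l} C_kl) / (n(n-1)). The sketch checks this closed form against an exact average over all placements of a made-up 4-VM instance.

```python
from fractions import Fraction
from itertools import permutations

def internal_cost(D, C, pi):
    n = len(pi)
    return sum(D[i][j] * C[pi[i]][pi[j]]
               for i in range(n) for j in range(n) if i != j)

def expected_random_cost(D, C):
    """(sum of off-diagonal D) * (sum of off-diagonal C) / (n(n-1))."""
    n = len(D)
    sd = sum(D[i][j] for i in range(n) for j in range(n) if i != j)
    sc = sum(C[k][l] for k in range(n) for l in range(n) if k != l)
    return Fraction(sd * sc, n * (n - 1))

def average_over_all_placements(D, C):
    perms = list(permutations(range(len(D))))
    return Fraction(sum(internal_cost(D, C, p) for p in perms), len(perms))

D = [[0, 9, 1, 0], [9, 0, 0, 1], [1, 0, 0, 9], [0, 1, 9, 0]]   # hypothetical
C = [[0, 1, 3, 3], [1, 0, 3, 3], [3, 3, 0, 1], [3, 3, 1, 0]]   # hypothetical
print(expected_random_cost(D, C), average_over_all_placements(D, C))  # 280/3 280/3
```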
Partitioned Traffic Model: each VM belongs to a group of VMs and sends traffic only to other VMs in the same group.
The Gilmore-Lawler Bound (GLB) is a lower bound for the optimal objective value of a QAP problem.
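A toy sketch of the Gilmore-Lawler bound, assuming zero diagonals in D and C. For each VM i and slot k, l_ik is the smallest possible value of VM i's row cost if it sits in slot k; by Proposition 1 that is the ascending off-diagonal row of D dotted with the descending off-diagonal row of C. Solving the LSAP over l_ik then gives a lower bound on the QAP optimum. The LSAP is brute-forced here for brevity (a real implementation would use an O(n^3) solver), and the matrices are hypothetical.

```python
from itertools import permutations

def qap_cost(D, C, pi):
    n = len(pi)
    return sum(D[i][j] * C[pi[i]][pi[j]]
               for i in range(n) for j in range(n) if i != j)

def gilmore_lawler_bound(D, C):
    """GLB for min sum D_ij C_{pi(i)pi(j)}, assuming zero diagonals."""
    n = len(D)
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        d = sorted(D[i][j] for j in range(n) if j != i)            # ascending
        for k in range(n):
            c = sorted((C[k][l] for l in range(n) if l != k), reverse=True)
            L[i][k] = sum(x * y for x, y in zip(d, c))  # cheapest pairing (Prop. 1)
    # Brute-force LSAP over L (toy sizes only).
    return min(sum(L[i][p[i]] for i in range(n)) for p in permutations(range(n)))

D = [[0, 9, 1, 0], [9, 0, 0, 1], [1, 0, 0, 9], [0, 1, 9, 0]]
C = [[0, 1, 3, 3], [1, 0, 3, 3], [3, 3, 0, 1], [3, 3, 1, 0]]
optimum = min(qap_cost(D, C, p) for p in permutations(range(len(D))))
print(gilmore_lawler_bound(D, C), optimum)  # 48 48
```

On this symmetric toy instance the bound happens to be tight; in general it only satisfies GLB ≤ optimum.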
observation
EVALUATION
DISCUSSION
Combining VM migration with dynamic routing protocols
VM placement by joint network and server resource optimization
Amos Brocco, Apostolos Malatras, Ye Huang, Beat Hirsbrunner
Department of Informatics, University of Fribourg, Switzerland
ARiA: A Protocol for Dynamic Fully Distributed Grid Meta-Scheduling
ICDCS 2010
INTRODUCTION
An advantage of grid systems is their ability to guarantee efficient meta-scheduling (optimal allocation of jobs across a pool of sites with diverse local scheduling policies).
The centralized nature of current meta-scheduling solutions is not well suited for the envisioned increasing scale and dynamicity
This paper focuses on grid task meta-scheduling, by presenting a fully distributed protocol named ARiA to achieve efficient global dynamic scheduling across multiple sites
The meta-scheduling process is performed online, and takes into account the availability of new resources as well as changes in actual allocation policies
ARiA PROTOCOL
1. Job Submission Phase
2. Job Acceptance Phase
3. Dynamic Rescheduling Phase
Job Submission Phase
Jobs are assigned a universally unique identifier (UUID).
Nodes receiving job submissions are referred to as initiators for these jobs
Initiators issue resource discovery queries across the grid peer-to-peer overlay by broadcasting REQUEST messages
Job Acceptance Phase
If the request cannot be satisfied, the message is further forwarded on the peer-to-peer overlay.
Otherwise, a cost value for the job, based on actual resources and current scheduling, is computed and sent back to the job's initiator by means of an ACCEPT message.
The initiator evaluates incoming ACCEPT responses, selects the best-qualified node, and sends it an ASSIGN message.
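A toy sketch of the initiator's selection step. The slide says only that the best-qualified node is chosen; here we assume that a lower cost means a better-qualified node, and we model messages as plain Python objects. The Accept class and select_assignee function are hypothetical, not part of ARiA's specification.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Accept:
    """Hypothetical in-memory form of an ACCEPT reply."""
    job_id: str
    node: str
    cost: float   # assumption: lower cost means a better-qualified node

def select_assignee(job_id, replies):
    """Return an ASSIGN message for the cheapest ACCEPT, or None if none arrived."""
    candidates = [r for r in replies if r.job_id == job_id]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: r.cost)
    return {"type": "ASSIGN", "job_id": job_id, "to": best.node}

job = str(uuid.uuid4())          # each job is identified by a UUID
replies = [Accept(job, "n7", 12.5), Accept(job, "n3", 4.0),
           Accept("some-other-job", "n9", 1.0)]
print(select_assignee(job, replies)["to"])   # n3
```

Filtering by job UUID matters because an initiator may have several jobs in flight and receive interleaved ACCEPT replies.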
Dynamic Rescheduling Phase
Using INFORM messages, the assignee attempts to find rescheduling candidates for jobs in its queue whose execution has not yet started.
The structure of INFORM messages relates to that of REQUEST messages
EVALUATION
For the evaluation of ARiA, an overlay of 500 nodes with a target average path length of 9 hops was deployed in a custom simulator.
The average node degree attained during simulations was 4, resulting in about 2000 overlay links.
In all scenarios a total of 1000 jobs is submitted to random nodes on the grid. Unless otherwise specified, jobs are submitted at 10-second intervals.
When dynamic rescheduling is enabled, INFORM messages are sent for at most 2 scheduled jobs every 5 minutes.
REQUEST messages are forwarded on the overlay for at most 9 hops; at each step, at most 4 random neighbors of the current node are contacted.
For INFORM messages, a more lightweight approach is followed, with at most 8 hops and up to 2 neighbors.
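The forwarding limits above can be sketched as a hop-limited, random-fanout walk over the overlay. Defaults follow the REQUEST settings (9 hops, 4 neighbors); INFORM would use max_hops=8, fanout=2. A node that can satisfy the request replies ACCEPT instead of forwarding, following the acceptance phase. The global duplicate-suppression set and the function name are our own simplifications for a single-process simulation.

```python
import random

def forward_request(adj, origin, can_satisfy, max_hops=9, fanout=4, rng=random):
    """Simulate hop-limited REQUEST forwarding; return nodes that reply ACCEPT.

    Simplification: a global `seen` set suppresses duplicates (a real overlay
    would track already-seen job UUIDs at each node).
    """
    seen, frontier, accepts = {origin}, [origin], set()
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            fresh = [v for v in adj[node] if v not in seen]
            for v in rng.sample(fresh, min(fanout, len(fresh))):
                seen.add(v)
                if can_satisfy(v):
                    accepts.add(v)            # replies ACCEPT to the initiator
                else:
                    next_frontier.append(v)   # cannot satisfy: forward further
        frontier = next_frontier
        if not frontier:
            break
    return accepts

# Hypothetical line-shaped overlay 0 - 1 - ... - 20.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 20] for i in range(21)}
print(forward_request(path, 0, lambda v: v == 5))    # {5}
print(forward_request(path, 0, lambda v: v == 15))   # set(): beyond 9 hops
```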
Amir Epstein, Dean H. Lorenz, Ezra Silvera, Inbar Shapira
Virtualization Technologies, System Technologies & Services
IBM Haifa Research Lab, Haifa, Israel
Virtual Appliance Content Distribution for a Global Infrastructure Cloud Service
IEEE INFOCOM 2010
INTRODUCTION
An emerging cloud service is a virtual server shop that allows cloud customers to order virtual appliances to be delivered virtually on the cloud.
Global cloud providers need to create customized virtual-server disk images and deliver them on time to meet customer reservations and service levels.
In order to reduce provisioning time and meet reservation deadlines, one approach is to stage images on storage near the customer
This introduces an optimization problem: finding an optimal staging schedule according to network bandwidth, the pending reservations schedule, and customer value.
Continuous model vs integral model
Problem Definition
A staging storage space with capacity C
n appliance deployment requests
Each request i is for an appliance that consumes C_i staging capacity and has a desired due date d_i.
Propagation time p_i (assume p_i = kC_i)
Continuous model
Lemma 4.1: Any feasible schedule S can be turned into a right-tight feasible schedule by right-shifting job execution intervals.
Lemma 4.2: There exists an optimal schedule in which the jobs are processed in EDD (Earliest Due Date) order.
Algorithm 1 is a dynamic program that finds an optimal schedule that is right-tight and in EDD order, in O(nW) time and space.
THE INTEGRAL MODEL
Example: p1 = 3, d1 = 5; p2 = 3, d2 = 8; p3 = 2, d3 = 9; capacity = 5
A feasible schedule: s1 = 2, s2 = 5, s3 = 0. However, it cannot be in EDD order.
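A small checker for the example above, under assumptions of our own that the slides do not spell out: k = 1 (so each image's size equals its propagation time), transfers share one link and may not overlap, and in the integral model a staged image occupies its full size from its start time until its due date. Under these assumptions the given start times are feasible, while no integer schedule that processes the jobs in EDD order is.

```python
from itertools import product

def feasible(jobs, starts, capacity):
    """jobs: list of (p, d, size); starts: integer start time of each transfer.

    Assumptions (ours): one shared link, so transfers may not overlap, and a
    staged image occupies `size` storage from its start time until its due date.
    """
    # 1) every transfer must finish by its due date
    if any(s < 0 or s + p > d for (p, d, _), s in zip(jobs, starts)):
        return False
    # 2) transfers on the single link must not overlap
    intervals = sorted((s, s + p) for (p, _, _), s in zip(jobs, starts))
    if any(end > nxt for (_, end), (nxt, _) in zip(intervals, intervals[1:])):
        return False
    # 3) storage in use on [s, d) must never exceed capacity
    horizon = max(d for _, d, _ in jobs)
    for t in range(horizon):
        used = sum(size for (_, d, size), s in zip(jobs, starts) if s <= t < d)
        if used > capacity:
            return False
    return True

# The slide's example, with k = 1 so size = p (assumption).
JOBS = [(3, 5, 3), (3, 8, 3), (2, 9, 2)]
CAPACITY = 5
print(feasible(JOBS, [2, 5, 0], CAPACITY))      # True
edd_ok = any(feasible(JOBS, list(s), CAPACITY)
             for s in product(range(10), repeat=3) if s[0] < s[1] < s[2])
print(edd_ok)                                   # False
```

In EDD order, job 2 must start by time 4 to leave room for job 3, so it overlaps job 1's storage interval, which lasts until time 5, and the two together need 6 units of capacity.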
SOLUTION
Our algorithms for the integral model have two steps.
In the first step, we solve the problem for the continuous model
In the second step, we discard jobs from this schedule, without losing too much weight, to obtain a feasible schedule for the integral model.
Lemma 5.1: For unweighted jobs, any feasible schedule S for the continuous model can be transformed into a feasible schedule S* for the integral model with w(S*) ≥ (1/2) w(S).
Lemma 5.2: For weighted jobs, any feasible schedule S for the continuous model can be transformed into a feasible schedule S* for the integral model with w(S*) ≥ (1/2) w(S).
IMPLEMENTATION AND SIMULATION RESULTS