Synchronous Parallel Processing of Big-Data Analytics Services to Optimize Performance in Federated Clouds
Gueyoung Jung, Nathan Gnanasambandam, and Tridib Mukherjee
International Conference on Cloud Computing 2012
2
Introduction
Related Work
Problem Statement
Maximally Overlapped Cloud-Bursting (MOBB) approach
Experimental Evaluation
Conclusion
Outline
4
Collected data can exceed hundreds of terabytes and are generated continuously
◦ sensors, social media, click-streams, log files, and mobile devices
The solution: Cloud Computing
◦ Analyze big data by leveraging vast amounts of computing resources available on demand at low resource usage cost
Big Data
5
Parallel data mining
◦ topic mining, pattern mining
◦ analyze large amounts of unstructured data
◦ time constraint
Part of the big data is analyzed on local private resources while the rest is transferred to external computing nodes
◦ more flexible, with obvious cost benefits
Parallel Data Mining on Cloud
6
Considerations for optimizing parallel data mining
◦ Node determination
◦ Synchronized completion
◦ Data partition determination
Maximally Overlapped Bin-packing driven Bursting (MOBB)
Optimization of Parallel Data Mining
7
Goals of the MOBB algorithm
◦ Load balancing across computing nodes
◦ Time overlap between data transfer delay and computation time on each computing node
Goals
9
Load distribution
◦ the overhead of data transfer
Maximum overlap between data transfer and computation
◦ determine the order in which data chunks of different sizes are transferred to each node
Task scheduling among computing nodes
◦ load balancing (CometCloud)
◦ heterogeneous clouds
Related Work
16
Estimation of computation time
◦ Response surface model
◦ Queueing model
Estimation of data transfer delay
◦ more dynamic than computation time
◦ Auto-regressive moving average (ARMA) model
Estimation of Data Computation and Transfer Delay
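To illustrate how an ARMA model can track a fluctuating transfer delay, here is a minimal sketch of a one-step-ahead forecast. It assumes an ARMA(1,1) model whose coefficients have already been fitted; the slides do not state the model order or the fitting procedure, so the function name `arma11_forecast` and the parameters `c`, `phi`, and `theta` are hypothetical.

```python
def arma11_forecast(observations, c, phi, theta):
    """One-step-ahead forecast from an ARMA(1,1) model with known coefficients.

    Assumed model (illustrative, not taken from the slides):
        x_t = c + phi * x_{t-1} + theta * e_{t-1} + e_t
    where e_t is the forecast error at time t.
    """
    if not observations:
        raise ValueError("need at least one observation")
    # With no history, treat the first observation as perfectly predicted,
    # so the first residual is zero.
    prediction = observations[0]
    for x in observations:
        residual = x - prediction          # error of the previous forecast
        prediction = c + phi * x + theta * residual
    return prediction


# Example: two measured per-unit transfer delays (arbitrary units).
next_delay = arma11_forecast([10.0, 12.0], c=2.0, phi=0.5, theta=0.5)
```

Each new measurement corrects the forecast through the residual term, which is why this family of models suits a quantity that changes faster than computation time.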
18
Determination of the bucket size of each node
Sorting of data chunks in descending order
Sorting of node bucket sizes in descending order (high delay = lower bucket size)
1. Pre-processing
19
The size of the data given to a particular node depends on the average delay of a task on that node
For ideal parallelization, $s_1 d_1 = s_2 d_2 = \dots = s_n d_n$
where $s_i$ denotes the total data size assigned to node $i$, and $d_i$ denotes the delay for a unit of data on node $i$
Determination of Bucket Size
21
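Thus, the data assigned to each node is limited by an upper bound, given as follows:
$s_i \le \dfrac{S}{d_i \sum_{j=1}^{n} 1/d_j}$
where $s_i$ is the data size assigned to node $i$, $d_i$ is the delay for a unit of data on node $i$, and $S$ is the total data size over all $n$ nodes
Determination of Bucket Size (Cont.)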
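A minimal sketch of the bucket-size calculation, assuming every node should finish at the same time, so that the data size times the per-unit delay is equal on all nodes. Under that assumption each node's share of the total is proportional to the inverse of its delay. The function name `bucket_sizes` and the sample numbers are illustrative only.

```python
def bucket_sizes(total_size, unit_delays):
    """Split total_size so that size_i * delay_i is the same on every node.

    unit_delays[i] is the (estimated) delay per unit of data on node i.
    """
    if any(d <= 0 for d in unit_delays):
        raise ValueError("delays must be positive")
    inverse_sum = sum(1.0 / d for d in unit_delays)
    # Node i receives a share proportional to 1 / delay_i.
    return [total_size / (d * inverse_sum) for d in unit_delays]


# Three nodes: the second is twice as slow as the first, the third four times.
sizes = bucket_sizes(70.0, [1.0, 2.0, 4.0])
# sizes is [40.0, 20.0, 10.0]; every node needs 40 time units.
```

A node with a high delay gets a small bucket, which matches the "high delay = lower bucket size" rule used in pre-processing.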
22
Weighted load distribution
Delay-based preference
Buckets are completely filled one at a time
◦ reduce fragmentation of buckets
2. Greedy Bin-packing
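One plausible reading of this step, sketched under stated assumptions: chunks and buckets are both sorted in descending order (as in pre-processing), and each bucket is filled as far as possible before moving to the next. The function name `greedy_pack`, the node labels, and the handling of leftover chunks are assumptions of this sketch, not details given in the slides.

```python
def greedy_pack(chunk_sizes, bucket_capacities):
    """Fill buckets one at a time, largest bucket and largest chunks first.

    bucket_capacities maps a node name to its bucket size.
    Returns (assignment, leftover): the chunks placed on each node and
    the chunks that fit in no bucket.
    """
    chunks = sorted(chunk_sizes, reverse=True)
    order = sorted(bucket_capacities, key=bucket_capacities.get, reverse=True)
    assignment = {node: [] for node in order}
    for node in order:
        remaining = bucket_capacities[node]
        unplaced = []
        for size in chunks:
            if size <= remaining:
                assignment[node].append(size)   # chunk fits: place it
                remaining -= size
            else:
                unplaced.append(size)           # try it on a later bucket
        chunks = unplaced
    return assignment, chunks


assignment, leftover = greedy_pack(
    [8, 6, 5, 4, 3, 2, 2],
    {"fast": 16, "mid": 10, "slow": 4},
)
# assignment: fast -> [8, 6, 2], mid -> [5, 4], slow -> [3]
# leftover:   [2]
```

In this example the total chunk size equals the total capacity, yet one chunk is left over because the last two buckets each keep one unused unit. That is the fragmentation the slide refers to; a complete implementation would need a policy for such leftovers.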
23
Order the chunks assigned to each node to maximize the overlap between data transfer and computation
3. Post-processing
24
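Let $t_i$ and $c_i$ be the transfer delay and the computation time per unit of data on node $i$, respectively, and let $s_{i,j}$ and $s_{i,j+1}$ be the sizes of consecutive data chunks $j$ and $j+1$ assigned to node $i$
◦ Ideal (complete parallelization): $s_{i,j}\,c_i = s_{i,j+1}\,t_i$
◦ Type 1: unavailability of data for computation ($s_{i,j}\,c_i < s_{i,j+1}\,t_i$)
◦ Type 2: delay incurred by queueing ($s_{i,j}\,c_i > s_{i,j+1}\,t_i$)
Maximizing the Overlap between Data Transfer and Computation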
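The effect of chunk ordering can be checked with a small simulation of a single node. This is a sketch under simplifying assumptions: chunks are transferred one after another, a chunk is computed only once it has fully arrived and the previous computation has finished, and per-unit delays are constant. The function name `simulate_node` and the sample values are illustrative.

```python
def simulate_node(chunk_sizes, transfer_per_unit, compute_per_unit):
    """Simulate pipelined transfer and computation on one node.

    Returns (finish_time, idle, queued):
      idle   - time the node waits for data after the first chunk (Type 1)
      queued - time arrived chunks wait for the node to be free (Type 2)
    """
    arrival = 0.0   # time at which the current chunk has fully arrived
    free_at = 0.0   # time at which the node finishes its previous chunk
    idle = 0.0
    queued = 0.0
    for index, size in enumerate(chunk_sizes):
        arrival += size * transfer_per_unit
        if arrival > free_at:
            if index > 0:                 # the first transfer cannot overlap
                idle += arrival - free_at
            start = arrival
        else:
            queued += free_at - arrival
            start = free_at
        free_at = start + size * compute_per_unit
    return free_at, idle, queued


# Transfer costs 1 time unit and computation 2 time units per unit of data.
balanced = simulate_node([1, 2, 4], 1.0, 2.0)   # (15.0, 0.0, 0.0)
reversed_order = simulate_node([4, 2, 1], 1.0, 2.0)   # (18.0, 0.0, 15.0)
```

With the order 1, 2, 4, computing each chunk takes exactly as long as transferring the next one, so there is no idle or queueing time. The same chunks in the opposite order finish three time units later.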
27
Frequent Pattern Mining (FPM)
◦ Data: a phone call log obtained from a call center and a web access log
◦ Size: 200 GB (collected over one year)
◦ Objective: obtain patterns of each user's activities on human resource information systems
Experimental Setup - FPM
28
Four computing nodes
◦ Low-end Local Central node (LLC): 5 VMs, each with two 2.8 GHz cores, 1 GB memory, and a 1 TB hard drive
◦ Low-end Local Worker (LLW): similar to LLC
◦ High-end Local Worker (HLW): 6 non-virtualized servers, each with 24 2.6 GHz cores, 48 GB memory, and a 10 TB hard drive; shared by other applications
◦ Mid-end Remote Worker (MRW): 9 VMs, each with two 2.8 GHz cores, 4 GB memory, and a 1 TB hard drive
Experimental Setup – Computing Nodes
33
Ideal optimal data allocation
◦ The slack time must be 0
Comparison of Different Algorithms (Cont.)
35
A cloud-bursting approach based on a maximally overlapped load-balancing algorithm is proposed to optimize the performance of big-data analytics
Results show that performance can be improved by 20% to 60% compared with other approaches
Conclusion