DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters
Shanjiang Tang, Bu-Sung Lee, Bingsheng He
School of Computer Engineering
Nanyang Technological University
23/4/21
Outline
• Background and Motivation
• DynamicMR Overview
• Experimental Evaluation
• Conclusion
Big Data is Everywhere
• Lots of data is being collected and warehoused.
– Web data, e-commerce
– Purchases at department/grocery stores
– Bank/credit card transactions
– Social networks
– Astronomical image processing
– Bioinformatics
MapReduce is a Promising Choice
• A popular parallel programming model
[Figure: MapReduce data flow. Input data is processed by map tasks (map-phase computation) into intermediate results; reduce tasks (reduce-phase computation) turn these into output results, which together form the final result.]
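The two phases can be illustrated with a word count. This is a minimal single-process sketch, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and each input string stands in for one input split handled by one map task.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single output result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all input splits."""
    intermediate = []
    for doc in documents:  # each document stands in for one map task
        intermediate.extend(map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data over the network; the sketch only shows the data flow.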
Hadoop
• Apache Hadoop is an open-source framework for reliable, scalable, distributed computing. It implements the computational paradigm named MapReduce.
– Scales up to 6,000-10,000 machines
– Supports multi-tenancy
• Useful links:
– http://hadoop.apache.org/
– http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html
– http://apache.panu.it/hadoop/common/stable/
Challenges in Distributed Environment
• Node failures and stragglers (slow nodes)
– Mean time between failures for 1000 nodes = 1 day
→ Affects performance.
• Commodity network = low bandwidth
– Push computation to the data (data locality optimization)
→ Affects performance.
• Resource contention in a shared cluster environment
– Performance isolation and fair resource sharing
→ Affects performance and fairness.
Performance and fairness optimization are important!
Our Work
• Challenge: How can we improve the performance of Hadoop while guaranteeing fairness?
• Our Solution: DynamicMR: A Dynamic Resource Allocation System for Hadoop.
– Improve the resource utilization as much as possible.
– Improve the utilization efficiency as much as possible.
Outline
• Background and Motivation
• DynamicMR Overview
• Experimental Evaluation
• Conclusion
Observation #1: Poor Resource Utilization
• Hadoop abstracts resources into map slots and reduce slots.
– Configured statically by the Hadoop administrator.
– Resource constraint: map tasks can only use map slots, and reduce tasks can only use reduce slots.
[Figure: slot occupancy over time (axis 0 to 44) for the map (M) and reduce (R) tasks of jobs J1 to J4 under the static slot configuration.]
Slot resources are wasted during computation!
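A small arithmetic sketch shows how this waste arises under the static constraint. The function name and its parameters are hypothetical; it simply counts how many slots stay idle when each task type may only use its own slot type.

```python
def static_slot_usage(pending_maps, pending_reduces, map_slots, reduce_slots):
    """Return (busy, idle) slot counts when slot types are fixed."""
    # Map tasks may only occupy map slots, reduce tasks only reduce slots.
    busy_map = min(pending_maps, map_slots)
    busy_reduce = min(pending_reduces, reduce_slots)
    busy = busy_map + busy_reduce
    idle = (map_slots + reduce_slots) - busy
    return busy, idle
```

For example, with 4 map slots, 4 reduce slots, 10 pending map tasks and no reduce tasks yet, `static_slot_usage(10, 0, 4, 4)` reports 4 busy and 4 idle slots, even though 6 map tasks are still waiting.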
Technique #1: Dynamic Hadoop Slot Allocation (DHSA)
• Core idea of DHSA:
– Slots are generic and can be used by either map or reduce tasks, although there is a pre-configuration for the number of map and reduce slots.
– Map tasks prefer to use map slots, and likewise reduce tasks prefer to use reduce slots.
[Figure: slot occupancy over time (axis 0 to 44) for the map (M) and reduce (R) tasks of jobs J1 to J4 with DHSA.]
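The core idea above can be sketched as a two-step allocation: fill slots of the preferred type first, then lend leftover slots to the other task type. This is a simplified illustration, not the DynamicMR implementation; the function name and return keys are hypothetical, and fairness between users or pools is deliberately ignored.

```python
def dhsa_allocate(pending_maps, pending_reduces, map_slots, reduce_slots):
    """Place pending tasks on slots, letting each task type borrow unused slots.

    Returns a dict mapping "<task type>_on_<slot type>" to a task count.
    """
    # Step 1: each task type first fills slots of its own (preferred) kind.
    maps_on_map = min(pending_maps, map_slots)
    reduces_on_reduce = min(pending_reduces, reduce_slots)

    # Step 2: slots still free are lent to tasks of the other kind.
    free_map = map_slots - maps_on_map
    free_reduce = reduce_slots - reduces_on_reduce
    maps_on_reduce = min(pending_maps - maps_on_map, free_reduce)
    reduces_on_map = min(pending_reduces - reduces_on_reduce, free_map)

    return {
        "maps_on_map": maps_on_map,
        "maps_on_reduce": maps_on_reduce,
        "reduces_on_reduce": reduces_on_reduce,
        "reduces_on_map": reduces_on_map,
    }
```

With 4 map slots, 4 reduce slots, and 10 pending map tasks, all 8 slots are used (4 map tasks run on borrowed reduce slots), whereas a static configuration would leave the 4 reduce slots idle.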
Observation #2: Speculative Execution is a Double-edged Sword
• Speculative scheduling
– Runs a backup task for a straggling task.
– Pros: can improve the performance of a single job.
– Cons: reduces resource utilization efficiency, especially when there are other pending tasks.
[Figure: two task schedules with the labels "straggler" and "Backup task", contrasting "Benefit J1" with "Benefit the whole workload".]
A performance tradeoff between a single job and batch jobs!
Technique #2: Speculative Execution Performance Balancing (SEPB)
• Key idea of SEPB:
– Instead of running a speculative task immediately when a straggler of a job is detected, we first check a subset of jobs (maxNumOfJobsCheckedForPendingTasks) for pending tasks.
– If there are pending tasks, allocate the slot to a pending task. Otherwise, allocate it to a speculative task.
[Figure: job queue J1 to J6, with a window of size maxNumOfJobsCheckedForPendingTasks marking the jobs checked for pending tasks.]
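The decision rule can be sketched as follows. Only the parameter name maxNumOfJobsCheckedForPendingTasks comes from the slide; the `Job` class, its fields, and `sepb_pick` are hypothetical, and the job list is assumed to already be in scheduling order.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending_tasks: int = 0       # tasks that have not started yet
    has_straggler: bool = False  # a running task was detected as a straggler


def sepb_pick(jobs, max_num_of_jobs_checked_for_pending_tasks):
    """Decide what an idle slot should run.

    Returns ("pending", job name), ("speculative", job name), or None
    when there is nothing to run.
    """
    # Check only the first few jobs in the queue for pending tasks.
    window = jobs[:max_num_of_jobs_checked_for_pending_tasks]
    for job in window:
        if job.pending_tasks > 0:
            return ("pending", job.name)
    # No pending task in the window: fall back to a speculative (backup) task.
    for job in jobs:
        if job.has_straggler:
            return ("speculative", job.name)
    return None
```

A larger window favors the whole workload, because pending tasks win the slot more often; a smaller window favors the job that has the straggler.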
Observation #3: Load Balance Requirement Harms Data Locality
• Load balancing is adopted by Hadoop.
– Hadoop tries to keep the load (i.e., the number of running tasks) on each node as even as possible.
Load balancing causes J1 to fail to achieve data locality!
Technique #3: Slot PreScheduling
• Key idea: improve data locality at the expense of load balance.
– When a machine has idle slots and local data for a task, we preschedule the task on that machine first.
– Otherwise, we keep the load balance constraint.
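The rule above can be sketched as a node-selection function. This is a simplification under stated assumptions: the function and parameter names are hypothetical, and Hadoop's load balance rule is approximated here as "pick the least-loaded node that has an idle slot".

```python
def pick_node(task_block_hosts, idle_slots, running_tasks):
    """Choose a node for one task, preferring data-local nodes.

    task_block_hosts: set of nodes that hold the task's input data.
    idle_slots: dict of node -> number of idle slots.
    running_tasks: dict of node -> number of running tasks (the load).
    Returns a node name, or None if no node has an idle slot.
    """
    def load(node):
        # Tie-break on the node name so the choice is deterministic.
        return (running_tasks.get(node, 0), node)

    # Preschedule: a node with local data and an idle slot wins,
    # even if it is not the least-loaded node in the cluster.
    local = [n for n in task_block_hosts if idle_slots.get(n, 0) > 0]
    if local:
        return min(local, key=load)

    # Otherwise keep the load balance rule (approximated as least-loaded).
    candidates = [n for n, free in idle_slots.items() if free > 0]
    if not candidates:
        return None
    return min(candidates, key=load)
```

In the first branch a busier node can be chosen over an idle one, which is the load-balance cost paid for data locality.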
DynamicMR
• A combination of the three aforementioned techniques.
– DHSA: slot utilization optimization.
– SEPB, Slot PreScheduling: utilization efficiency optimization.
[Figure: DynamicMR overview. An idle slot is handled by three numbered components, Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB), and Slot PreScheduling, grouped into (1) slot utilization optimization and (2) utilization efficiency optimization, and is then given to a map task or a reduce task.]
Outline
• Background and Motivation
• DynamicMR Overview
• Experimental Evaluation
• Conclusion
Experimental Setup
• Hadoop cluster
– 10 nodes, each with two Intel X5675 CPUs (6 cores per CPU at 3.07 GHz), 24GB DDR3 memory, and 56GB hard disks.
• Benchmark and data sets.
DynamicMR Performance Evaluation
DynamicMR vs. YARN
• DynamicMR achieves better performance than YARN.
– DynamicMR benefits from controlling the ratio of concurrently running map and reduce tasks, whereas YARN has no such control.
Outline
• Background and Motivation
• DynamicMR Overview
• Experimental Evaluation
• Conclusion
Conclusion
• We propose the DynamicMR framework to improve the performance of MapReduce workloads while maintaining fairness.
– It consists of three techniques: DHSA, SEPB, and Slot PreScheduling.
• Experimental results show that:
– It improves the performance of Hadoop by 46%~115% for single jobs and 49%~112% for batch jobs.
– It outperforms YARN by about 2%~9% for multiple jobs.
DHSA Evaluation
• DHSA achieves better performance than Hadoop.
• Hadoop is sensitive to the slot configuration, whereas DHSA is not.
SEPB Evaluation
• SEPB improves the performance of the whole set of jobs (Figure a).
• With SEPB, there is a performance tradeoff between an individual job and the whole set of jobs (Figure b).
Slot PreScheduling Evaluation
• Data Locality and Performance Improvement