Matchmaking: A New MapReduce Scheduling Technique

Chen He, Dr. Ying Lu, Dr. David Swanson

Problem Statement

• MapReduce cluster scheduling algorithms are becoming increasingly important

• An efficient MapReduce scheduler must avoid unnecessary data transmission

• We will focus on decreasing data transmission in a MapReduce cluster

Contributions

• Build a matchmaking algorithm to improve data locality of Hadoop MapReduce jobs

• The MatchMaking algorithm leads to a higher data locality rate and shorter map task response time

• Substituting the MatchMaking algorithm for the Delay algorithm in the Fair-sharing scheduler also yields better performance

Outline

• Background
• Delay Algorithm
• MatchMaking algorithm
• Evaluation
• Conclusion
• Questions

Background

• Hadoop FIFO scheduler
  – The scheduler searches for local tasks in the first job and assigns them
  – If the first job has no local task, a non-local task of the first job is assigned
  – Strict FIFO job order is followed
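The FIFO policy described above can be sketched as follows. This is a minimal illustration, not Hadoop's actual scheduler code: the `Task` and `Job` classes and the `fifo_assign` function are hypothetical names introduced here, and a job with no pending tasks is assumed to drop out of consideration so that the next job becomes "first".

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    input_nodes: set  # nodes holding a replica of this task's input block


@dataclass
class Job:
    job_id: str
    pending: list = field(default_factory=list)


def fifo_assign(node, job_queue):
    """Pick a task for one free slot on `node` under strict FIFO job order.

    Returns (job_id, task_id, is_local), or None if no task is pending.
    """
    for job in job_queue:
        if not job.pending:
            continue  # assumption: an exhausted job no longer counts as "first"
        # Prefer a local task of the first job that still has work.
        for task in job.pending:
            if node in task.input_nodes:
                job.pending.remove(task)
                return job.job_id, task.task_id, True
        # No local task in that job: hand out one of its non-local tasks.
        task = job.pending.pop(0)
        return job.job_id, task.task_id, False
    return None
```

Note that a node never looks past the first job with pending work, even when a later job has a task whose input is stored on that node.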

Background

• Hadoop FIFO scheduler

Background

• Hadoop FIFO scheduler deficiencies
  – On the node side, strict FIFO job order reduces data locality
  – On the job side, FIFO cannot provide a fair opportunity for each worker node

Delay Algorithm

• Motivated by Facebook event logs saved in its Hadoop data warehouse

• The default Hadoop FIFO scheduler results in unnecessarily long job response times and a lack of fairness in resource sharing

• Focuses on two points: fair sharing and data locality

Delay Algorithm

• Workload*

Bin  #Maps     %Jobs at Facebook  #Maps in Benchmark  # of jobs in Benchmark
1    1         39%                1                   38
2    2         16%                2                   16
3    3-20      14%                10                  14
4    21-60     9%                 50                  8
5    61-150    6%                 100                 6
6    151-300   6%                 200                 6
7    301-500   4%                 400                 4
8    501-1500  4%                 800                 4
9    >1501     3%                 4800                4

*Matei Zaharia et al., “Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling”

Delay Algorithm

• Fairness:
  – Task execution percentage between jobs
  – groups
  – users
• Data locality
  – For Map stage, a map task is running on a node that contains its input data
  – For Reduce stage?

Delay Scheduling

Fairness VS. Data locality

Delay Algorithm

• Fair-sharing principle-hierarchical principle

Delay Scheduling-including rack locality

Delay Algorithm

• Relax the strict job order
  – The scheduler can search other jobs in the job queue to find a local task
• Maximum Delay Time (MDT) for a job to avoid starvation
  – MDT is a user-defined maximum time for which the scheduler can hold back a job's non-local map tasks
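The two ideas above (searching later jobs for a local task, and bounding each job's wait with MDT) can be sketched like this. It is a simplified illustration rather than the Fair Scheduler implementation: `Task`, `Job`, `delay_assign`, and the `waiting_since` field are hypothetical names, and the sketch assumes a job's wait clock starts the first time it is skipped and resets only when the job launches a local task.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    input_nodes: frozenset  # nodes that store a replica of the input block


@dataclass
class Job:
    job_id: str
    pending: list = field(default_factory=list)
    waiting_since: Optional[float] = None  # time the job was first skipped


def delay_assign(node, job_queue, now, mdt):
    """Pick a task for one free slot on `node`; None means leave the slot idle."""
    for job in job_queue:
        if not job.pending:
            continue
        local = next((t for t in job.pending if node in t.input_nodes), None)
        if local is not None:
            job.pending.remove(local)
            job.waiting_since = None  # assumption: a local launch resets the wait
            return job.job_id, local.task_id, True
        if job.waiting_since is None:
            job.waiting_since = now
        if now - job.waiting_since >= mdt:
            # The job has been delayed for MDT: allow a non-local task.
            task = job.pending.pop(0)
            return job.job_id, task.task_id, False
        # Otherwise skip this job and try the next one in the queue.
    return None
```

With `mdt=0` the sketch degenerates to the FIFO behavior, since a job without a local task is allowed a non-local one immediately.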

Delay Algorithm

Delay algorithm

Delay Algorithm Properties

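• MDT decides the data locality rate $R_l$

$R_l = \frac{\text{No. of data local tasks}}{\text{No. of total tasks}} \times 100\%$

  – $R_l$ is an increasing function of MDT but with a ceiling value “1”

• However, the average response time is

$t_{avg} = R_l \, t_{avg}^{l} + (1 - R_l) \, t_{avg}^{nl}$

  – $t_{avg}^{l}$ and $t_{avg}^{nl}$, the average response times of local and non-local tasks, are themselves affected by MDT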
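As a worked example of the two formulas on this slide, the snippet below computes the locality rate and the weighted average response time. The function names and the numbers are illustrative assumptions, not measurements from the evaluation.

```python
def locality_rate(local_tasks, total_tasks):
    """Fraction of tasks that ran on a node holding their input data."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return local_tasks / total_tasks


def average_response_time(r_l, t_local, t_nonlocal):
    """Average response time weighted by the locality rate r_l (0 to 1)."""
    return r_l * t_local + (1 - r_l) * t_nonlocal


# Hypothetical numbers: 75 of 100 tasks are local, local tasks average
# 10 s and non-local tasks average 18 s.
r_l = locality_rate(75, 100)                      # 0.75
t_avg = average_response_time(r_l, 10.0, 18.0)    # 0.75*10 + 0.25*18 = 12.0
```

Raising MDT pushes `r_l` toward 1, but if the added waiting also increases the per-task averages, `t_avg` does not necessarily fall.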

Delay Algorithm Deficiency

To achieve the best response time, the MDT value must be varied for

– different types of jobs
– different cluster sizes
– different job execution orders

Outline


MatchMaking Algorithm

– Relax strict job order
  • search all jobs in the queue for local tasks
– Give every node a fair chance to grab its local tasks
  • when a node fails to find a local task for the first time, no non-local task is assigned to it
  • when a node fails to find a local task for the second time in a row, a non-local task is assigned to it
– A node can be assigned at most one non-local task in every heartbeat interval
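The rules above can be sketched as a per-heartbeat routine. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Task`, `Job`, `find_local_task`, `matchmaking_heartbeat`, and the `missed_last_time` map are hypothetical; a "miss" is counted per heartbeat; the miss flag is cleared only when the node obtains a local task; and the non-local task is taken from the first job that still has pending work.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    input_nodes: frozenset  # nodes that store a replica of the input block


@dataclass
class Job:
    job_id: str
    pending: list = field(default_factory=list)


def find_local_task(node, job_queue):
    """Search every job in the queue for a task whose input is on `node`."""
    for job in job_queue:
        for task in job.pending:
            if node in task.input_nodes:
                return job, task
    return None


def matchmaking_heartbeat(node, free_slots, job_queue, missed_last_time):
    """Assign tasks to `node` for one heartbeat.

    `missed_last_time` maps node -> True if the node's previous search
    found no local task. Returns a list of (job_id, task_id, is_local).
    """
    assignments = []
    for _ in range(free_slots):
        match = find_local_task(node, job_queue)
        if match is not None:
            job, task = match
            job.pending.remove(task)
            missed_last_time[node] = False
            assignments.append((job.job_id, task.task_id, True))
            continue
        if not missed_last_time.get(node, False):
            # First miss: remember it and leave the remaining slots idle.
            missed_last_time[node] = True
            break
        # Second miss in a row: give the node exactly one non-local task.
        job = next((j for j in job_queue if j.pending), None)
        if job is not None:
            task = job.pending.pop(0)
            assignments.append((job.job_id, task.task_id, False))
        break  # at most one non-local task per heartbeat
    return assignments
```

Unlike the Delay algorithm, the sketch has no time parameter to tune: the only state is one flag per node.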

MatchMaking Algorithm

Outline


Evaluation

• Environment
  – Hardware
    • 1 head node with 2 AMD Opteron 2.2GHz 64-bit CPUs, 8GB memory, 1Gbps Ethernet
    • 30 worker nodes with the same CPUs and network but 4GB memory
  – Software
    • Hadoop 0.21
    • Redhat Linux CentOS 5.5
• Test cases
  – Loadgen
  – Wordcount
• Metrics
  – Locality Rate
  – Average Response Time

Evaluation

• Hadoop Configuration
  – HDFS
    • Block size is 128MB
    • 100 blocks evenly distributed across 30 worker nodes
    • Replication number is 2
  – MapReduce
    • 2 map slots and 1 reduce slot for each worker node
    • Facebook production workload*

*Matei Zaharia et al., “Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling”

Evaluation

• FIFO Scheduler
  – Default locality policy
  – Delay policy
  – Matchmaking policy
• Fair-sharing Scheduler
  – Delay policy
  – Matchmaking policy

Evaluation

• FIFO scheduler locality rate (figure panels: loadgen, wordcount)

Evaluation

• FIFO scheduler MTART (figure panels: loadgen, wordcount)

Evaluation

• Fair sharing scheduler locality rate

Evaluation

• Fair sharing scheduler response time

Conclusion

• We created the MatchMaking algorithm to improve a MapReduce scheduler’s data locality without parameter tuning

• It obtains good performance in a medium-size cluster with the Facebook production workload

• It can be easily integrated with other schedulers, such as the FIFO or Fair-sharing scheduler

Discussion

• Data locality in the Reduce stage

Discussion

• Performance in a large cluster and an unevenly distributed environment
  – A large cluster may have a long heartbeat interval
  – Large block size
    • ResponseTime = QueuingTime + DataLoadingTime + DataProcessTime
  – More replicas
  – Data blocks may not be evenly distributed
    • Hotspot

Discussion

• If the job queue is very long
  – Set a parameter MaxJobConsidered
  – Priorities
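One way to read the MaxJobConsidered idea is as a cap on how many queued jobs the local-task search inspects. The sketch below is an assumption about how such a parameter could work, not a description of an existing Hadoop setting; the function name, the `max_jobs_considered` argument, and the tuple-based queue layout are all hypothetical.

```python
from itertools import islice


def find_local_task(node, job_queue, max_jobs_considered):
    """Search only the first `max_jobs_considered` jobs for a local task.

    `job_queue` is a list of (job_id, tasks) pairs, where `tasks` is a list
    of (task_id, input_nodes) pairs. Returns (job_id, task_id) or None.
    """
    for job_id, tasks in islice(job_queue, max_jobs_considered):
        for task_id, input_nodes in tasks:
            if node in input_nodes:
                return job_id, task_id
    return None
```

A small cap bounds the search cost per heartbeat at the price of missing local tasks that sit deeper in the queue.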

Discussion

• Anything else?

Back Page Questions

