"HFSP: Size-based Scheduling for Hadoop" presentation for BigData 2014
HFSP: Size-based Scheduling for Hadoop
Mario Pastorelli∗ Antonio Barbuzzi∗ Matteo Dell’Amico∗
Damiano Carra† Pietro Michiardi∗
∗EURECOM, France
†University of Verona, Italy
IEEE BigData 2013
Mario Pastorelli et al. (EURECOM) HFSP: Size-based Scheduling for Hadoop IEEE BigData 2013 1 / 15
Why a new scheduler?
Focus on short system response times: heterogeneous workloads [VLDB12, VLDB13, SOCC13]
Big differences in job sizes: data exploration, preliminary analyses, algorithm tuning, orchestration jobs . . .
Current schedulers need manual setup: fine-tuning of scheduler parameters, configuration of pools of jobs; this is complex, error-prone, and difficult to adapt to workload/cluster changes
Size-based schedulers
Size-based schedulers are more efficient than other schedulers
Job priority is based on the job size
Resources are focused on a few jobs instead of being split among many jobs
. . . but the job size is required
MapReduce is suitable for size-based scheduling
We don't have the job size, but we have the time to estimate it
No perfect estimation is required . . .
. . . as long as jobs with very different sizes are sorted correctly
Size-based schedulers: example
Job    Arrival Time   Size
job1   0s             30s
job2   10s            10s
job3   15s            10s

Scheduler            AVG sojourn time
Processor Sharing    35s
SRPT                 25s

[Gantt charts of the Processor Sharing and SRPT schedules]
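The numbers in the table above can be checked with a small event-driven simulation; this is a sketch, and the function names and job representation are ours, not from the slides:

```python
def ps_sojourn(jobs):
    """Sojourn times under Processor Sharing: one unit of service rate
    split equally among all pending jobs.  jobs: list of (arrival, size)."""
    pending, done, t, i = {}, {}, 0.0, 0
    order = sorted(range(len(jobs)), key=lambda k: jobs[k][0])
    while len(done) < len(jobs):
        next_arr = jobs[order[i]][0] if i < len(jobs) else float("inf")
        if not pending:                          # idle: jump to next arrival
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
            continue
        n = len(pending)
        finish = t + min(pending.values()) * n   # earliest completion time
        if next_arr < finish:                    # an arrival happens first
            for j in pending:
                pending[j] -= (next_arr - t) / n
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
        else:                                    # a job completes first
            for j in pending:
                pending[j] -= (finish - t) / n
            t = finish
            for j in [j for j, r in pending.items() if r <= 1e-9]:
                done[j] = t - jobs[j][0]
                del pending[j]
    return done


def srpt_sojourn(jobs):
    """Sojourn times under SRPT: always run the job with the shortest
    remaining processing time, preempting on arrivals."""
    pending, done, t, i = {}, {}, 0.0, 0
    order = sorted(range(len(jobs)), key=lambda k: jobs[k][0])
    while len(done) < len(jobs):
        next_arr = jobs[order[i]][0] if i < len(jobs) else float("inf")
        if not pending:
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
            continue
        j = min(pending, key=pending.get)        # shortest remaining job
        finish = t + pending[j]
        if next_arr < finish:                    # preempt on arrival
            pending[j] -= next_arr - t
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
        else:
            t = finish
            done[j] = t - jobs[j][0]
            del pending[j]
    return done


# the three jobs from the slide: (arrival time, size) in seconds
jobs = [(0, 30), (10, 10), (15, 10)]
print(sum(ps_sojourn(jobs).values()) / 3)    # 35.0
print(sum(srpt_sojourn(jobs).values()) / 3)  # 25.0
```

Under SRPT, job2 and job3 preempt the long job1, so the short jobs finish quickly and only job1 pays the price, which is what drives the mean down from 35s to 25s.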
Hadoop Fair Sojourn Protocol
Like SRPT, HFSP aims for efficiency, but it avoids starvation
How: Shortest Remaining Virtual Time first (SRVT)
Each job has a virtual size based on the real one
The virtual size decreases with time
Jobs are scheduled by ascending virtual size
Hadoop Fair Sojourn Protocol: challenges
Job size estimation
Virtual size and aging
Task scheduling policy
Job size estimation (1/2)
Two ways to estimate a job size:
Offline: based on the information available a priori (number of tasks, block size, past history . . . ):
  available since job submission
  not very precise
Online: based on the performance of a subset of tasks:
  needs time for training
  more precise
We need both:
Offline estimation for the initial size, because jobs need a size from the moment they are submitted
Online estimation because it is more precise: when it completes, the job size is updated
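The offline/online combination can be sketched roughly as follows; this is our own toy model, not the paper's implementation, and the prior constant and parameter names are assumptions:

```python
PRIOR_TASK_TIME = 30.0  # assumed prior: average task duration in seconds


def estimate_size(num_tasks, sampled_times=()):
    """Job size estimate as (average task time) x (number of tasks).

    Offline: before any task of the job has run, fall back on an a
    priori average, so the job has a size from the moment it is
    submitted.  Online: once some tasks have been timed, use a
    first-order statistic (their mean) instead, which is more precise.
    """
    if sampled_times:                            # online refinement
        avg = sum(sampled_times) / len(sampled_times)
    else:                                        # offline a priori guess
        avg = PRIOR_TASK_TIME
    return avg * num_tasks


print(estimate_size(100))                  # offline estimate: 3000.0
print(estimate_size(100, (12.0, 18.0)))    # online update:    1500.0
```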
Job size estimation (2/2)
Implementation details:
Online estimation is done while the job progresses, so no work is wasted
Estimation technique: first-order statistics are good enough
The Map and Reduce phases of a job are treated as independent
Further details in the paper . . .
Virtual size and aging
Like SRPT, HFSP aims for efficiency, but it avoids starvation
How:
Each job has a "virtual" size
A "virtual" Fair Scheduler lets each job make virtual progress
We use virtual job sizes to take scheduling decisions in the real cluster
→ Priority goes to small jobs
→ Every job eventually becomes small, hence no starvation
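The aging idea above can be sketched as a toy loop; this is a simplified model under our own assumptions (one unit of virtual service rate, discrete time steps), not the HFSP code:

```python
def age_and_pick(virtual, rate=1.0, dt=1.0):
    """One step of a simplified SRVT loop.

    `virtual` maps job -> remaining virtual size.  A virtual Fair
    Scheduler gives every pending job an equal share of virtual
    progress, so even a big job's virtual size keeps shrinking; the
    real cluster then serves the job with the smallest virtual size.
    """
    if not virtual:
        return None
    share = rate * dt / len(virtual)              # virtual fair share
    for j in virtual:
        virtual[j] = max(0.0, virtual[j] - share)
    return min(virtual, key=virtual.get)          # highest-priority job


# a big job ages while it waits or runs ...
v = {"big": 100.0}
for _ in range(95):
    age_and_pick(v)                               # big: 100 -> 5
# ... so a job that arrives much later cannot starve it
v["late"] = 20.0
print(age_and_pick(v))  # "big": its aged virtual size beats the newcomer's
```

This is the no-starvation argument in miniature: aging guarantees that every job's virtual size eventually drops below that of freshly arrived jobs.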
Task scheduling policy
When a task slot becomes free:
Schedule a task for online estimation, if any
otherwise, schedule a task from the highest priority job
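The two-rule policy might look like the sketch below; the `Job` fields are our assumptions, not the HFSP API:

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    virtual_size: float          # remaining virtual size (priority key)
    needs_sample: bool = False   # still needs tasks run for online estimation?
    pending_tasks: int = 0


def assign_slot(jobs):
    """Rule 1: feed the online size estimator first.
    Rule 2: otherwise serve the highest-priority job, i.e. the one
    with the smallest virtual size that still has pending tasks."""
    for job in jobs:
        if job.needs_sample and job.pending_tasks > 0:
            return job
    ready = [j for j in jobs if j.pending_tasks > 0]
    return min(ready, key=lambda j: j.virtual_size) if ready else None


jobs = [Job("a", 50.0, pending_tasks=3),
        Job("b", 5.0, pending_tasks=1),
        Job("c", 80.0, needs_sample=True, pending_tasks=2)]
print(assign_slot(jobs).name)  # "c": estimation tasks come first
```

Giving estimation tasks priority keeps the size estimates fresh; once every job has been sampled, the slot goes to the smallest-virtual-size job, as in the SRVT discipline.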
Experimental Setup
Task Trackers            36
CPUs per Task Tracker    4
RAM per Task Tracker     8 GB
Map slots                72
Reduce slots             36

Network speed: 1 Gbps

Using PigMix jobs

Two kinds of workloads, inspired by existing traces:

Dataset size   Map tasks   SMALL   LARGE
1 GB           < 5         65%     0%
10 GB          10-50       20%     10%
40 GB          50-150      10%     60%
100 GB         > 150       5%      30%
Results
SMALL
[ECDF of sojourn time (s), SMALL workload: HFSP vs FAIR]
Same performance for tiny jobs
Large difference for other jobs
Mean sojourn time decreased by 16% using HFSP

LARGE
[ECDF of sojourn time (s), LARGE workload: HFSP vs FAIR]
Jobs completed after 100 seconds: Fair: 2% of jobs, HFSP: 30% of jobs
Jobs completed after 1000 seconds: Fair: 15% of jobs, HFSP: 90% of jobs
Experiments: task times and estimation errors
Task times are skewed
10% of the Reducers run much longer than the other tasks
[ECDF of task time: MAP vs REDUCE]
[ECDF of the estimation error: MAP vs REDUCE]   error = estimated size / real size
∼60% of jobs are over-estimated
The impact of over-estimation is mitigated by the aging function
Conclusions
HFSP strives for efficiency and avoids starvation
Particularly suitable for loaded clusters
Requires no manual, per-job priorities
→ heterogeneous workloads can coexist in the same cluster
HFSP developed within the BigFoot project
Available at: https://github.com/bigfootproject/HFSP
Thank you!
@mariopastorelli @BigFoot project