"HFSP: Size-based Scheduling for Hadoop" presentation for BigData 2014
HFSP: Size-based Scheduling for Hadoop
Mario Pastorelli∗ Antonio Barbuzzi∗ Matteo Dell’Amico∗
Damiano Carra† Pietro Michiardi∗
∗EURECOM, France
†University of Verona, Italy
IEEE BigData 2013
Mario Pastorelli et al. (EURECOM) HFSP: Size-based Scheduling for Hadoop IEEE BigData 2013 1 / 15
Why a new scheduler?
Focus on short system response times: heterogeneous workloads [VLDB12, VLDB13, SOCC13]
Big differences in job sizes: data exploration, preliminary analyses, algorithm tuning, orchestration jobs . . .
Current schedulers need manual setup: fine-tuning of scheduler parameters, configuration of pools of jobs; this is complex, error-prone, and difficult to adapt to workload/cluster changes
Size-based schedulers
Size-based schedulers are more efficient than other schedulers
Job priority is based on the job size
Resources are focused on a few jobs instead of being split among many jobs
. . . but the job size is required
MapReduce is suitable for size-based scheduling
We don't have the job size, but we have the time to estimate it
No perfect estimation is required . . .
. . . as long as jobs with very different sizes are sorted correctly
Size-based schedulers: example
Job    Arrival Time   Size
job1   0s             30s
job2   10s            10s
job3   15s            10s

Scheduler            AVG sojourn time
Processor Sharing    35s
SRPT                 25s

[Gantt charts of the Processor Sharing and SRPT schedules]
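The numbers in the table above can be checked with a small event-driven simulation; this is a sketch, and the function names and job representation are ours, not from the slides:

```python
def ps_sojourn(jobs):
    """Sojourn times under Processor Sharing: one unit of service rate
    split equally among all pending jobs.  jobs: list of (arrival, size)."""
    pending, done, t, i = {}, {}, 0.0, 0
    order = sorted(range(len(jobs)), key=lambda k: jobs[k][0])
    while len(done) < len(jobs):
        next_arr = jobs[order[i]][0] if i < len(jobs) else float("inf")
        if not pending:                          # idle: jump to next arrival
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
            continue
        n = len(pending)
        finish = t + min(pending.values()) * n   # earliest completion time
        if next_arr < finish:                    # an arrival happens first
            for j in pending:
                pending[j] -= (next_arr - t) / n
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
        else:                                    # a job completes first
            for j in pending:
                pending[j] -= (finish - t) / n
            t = finish
            for j in [j for j, r in pending.items() if r <= 1e-9]:
                done[j] = t - jobs[j][0]
                del pending[j]
    return done


def srpt_sojourn(jobs):
    """Sojourn times under SRPT: always run the job with the shortest
    remaining processing time, preempting on arrivals."""
    pending, done, t, i = {}, {}, 0.0, 0
    order = sorted(range(len(jobs)), key=lambda k: jobs[k][0])
    while len(done) < len(jobs):
        next_arr = jobs[order[i]][0] if i < len(jobs) else float("inf")
        if not pending:
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
            continue
        j = min(pending, key=pending.get)        # shortest remaining job
        finish = t + pending[j]
        if next_arr < finish:                    # preempt on arrival
            pending[j] -= next_arr - t
            t = next_arr
            pending[order[i]] = jobs[order[i]][1]
            i += 1
        else:
            t = finish
            done[j] = t - jobs[j][0]
            del pending[j]
    return done


# the three jobs from the slide: (arrival time, size) in seconds
jobs = [(0, 30), (10, 10), (15, 10)]
print(sum(ps_sojourn(jobs).values()) / 3)    # 35.0
print(sum(srpt_sojourn(jobs).values()) / 3)  # 25.0
```

Under SRPT, job2 and job3 preempt the long job1, so the short jobs finish quickly and only job1 pays the price, which is what drives the mean down from 35s to 25s.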
Hadoop Fair Sojourn Protocol
Like SRPT, HFSP aims for efficiency, but it avoids starvation
How: Shortest Remaining Virtual Time first (SRVT)
Each job has a virtual size based on the real one
The virtual size decreases with time
Jobs are scheduled by ascending virtual size
Hadoop Fair Sojourn Protocol: challenges
Job size estimation
Virtual size and aging
Task scheduling policy
Job size estimation (1/2)
Two ways to estimate a job size:
Offline: based on the information available a priori (number of tasks, block size, past history . . . ):
  available since job submission
  not very precise
Online: based on the performance of a subset of tasks:
  needs time for training
  more precise
We need both:
Offline estimation for the initial size, because jobs need a size from the moment they are submitted
Online estimation because it is more precise: when it completes, the job size is updated
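The offline/online combination can be sketched roughly as follows; this is our own toy model, not the paper's implementation, and the prior constant and parameter names are assumptions:

```python
PRIOR_TASK_TIME = 30.0  # assumed prior: average task duration in seconds


def estimate_size(num_tasks, sampled_times=()):
    """Job size estimate as (average task time) x (number of tasks).

    Offline: before any task of the job has run, fall back on an a
    priori average, so the job has a size from the moment it is
    submitted.  Online: once some tasks have been timed, use a
    first-order statistic (their mean) instead, which is more precise.
    """
    if sampled_times:                            # online refinement
        avg = sum(sampled_times) / len(sampled_times)
    else:                                        # offline a priori guess
        avg = PRIOR_TASK_TIME
    return avg * num_tasks


print(estimate_size(100))                  # offline estimate: 3000.0
print(estimate_size(100, (12.0, 18.0)))    # online update:    1500.0
```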
Job size estimation (2/2)
Implementation details:
Online estimation is done while the job progresses, so no work is wasted
Estimation technique: first-order statistics are good enough
The Map and Reduce phases of a job are treated as independent
Further details in the paper . . .
Virtual size and aging
Like SRPT, HFSP aims for efficiency, but it avoids starvation
How:
Each job has a "virtual" size
A "virtual" Fair Scheduler lets each job make virtual progress
We use virtual job sizes to take scheduling decisions in the real cluster
→ Priority goes to small jobs
→ Every job eventually becomes small, hence no starvation
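The aging idea above can be sketched as a toy loop; this is a simplified model under our own assumptions (one unit of virtual service rate, discrete time steps), not the HFSP code:

```python
def age_and_pick(virtual, rate=1.0, dt=1.0):
    """One step of a simplified SRVT loop.

    `virtual` maps job -> remaining virtual size.  A virtual Fair
    Scheduler gives every pending job an equal share of virtual
    progress, so even a big job's virtual size keeps shrinking; the
    real cluster then serves the job with the smallest virtual size.
    """
    if not virtual:
        return None
    share = rate * dt / len(virtual)              # virtual fair share
    for j in virtual:
        virtual[j] = max(0.0, virtual[j] - share)
    return min(virtual, key=virtual.get)          # highest-priority job


# a big job ages while it waits or runs ...
v = {"big": 100.0}
for _ in range(95):
    age_and_pick(v)                               # big: 100 -> 5
# ... so a job that arrives much later cannot starve it
v["late"] = 20.0
print(age_and_pick(v))  # "big": its aged virtual size beats the newcomer's
```

This is the no-starvation argument in miniature: aging guarantees that every job's virtual size eventually drops below that of freshly arrived jobs.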
Task scheduling policy
When a task slot becomes free:
Schedule a task for online estimation, if any
otherwise, schedule a task from the highest priority job
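The two-rule policy might look like the sketch below; the `Job` fields are our assumptions, not the HFSP API:

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    virtual_size: float          # remaining virtual size (priority key)
    needs_sample: bool = False   # still needs tasks run for online estimation?
    pending_tasks: int = 0


def assign_slot(jobs):
    """Rule 1: feed the online size estimator first.
    Rule 2: otherwise serve the highest-priority job, i.e. the one
    with the smallest virtual size that still has pending tasks."""
    for job in jobs:
        if job.needs_sample and job.pending_tasks > 0:
            return job
    ready = [j for j in jobs if j.pending_tasks > 0]
    return min(ready, key=lambda j: j.virtual_size) if ready else None


jobs = [Job("a", 50.0, pending_tasks=3),
        Job("b", 5.0, pending_tasks=1),
        Job("c", 80.0, needs_sample=True, pending_tasks=2)]
print(assign_slot(jobs).name)  # "c": estimation tasks come first
```

Giving estimation tasks priority keeps the size estimates fresh; once every job has been sampled, the slot goes to the smallest-virtual-size job, as in the SRVT discipline.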
Experimental Setup
Task Trackers            36
CPUs per Task Tracker    4
RAM per Task Tracker     8 GB
Map slots                72
Reduce slots             36

Network speed: 1 Gbps

Using PigMix jobs

Two kinds of workloads, inspired by existing traces:

Dataset size   Map tasks   SMALL   LARGE
1 GB           < 5         65%     0%
10 GB          10-50       20%     10%
40 GB          50-150      10%     60%
100 GB         > 150       5%      30%
Results
SMALL
[ECDF of sojourn time (s), SMALL workload: HFSP vs FAIR]
Same performance for tiny jobs
Large difference for other jobs
Mean sojourn time decreased by 16% using HFSP

LARGE
[ECDF of sojourn time (s), LARGE workload: HFSP vs FAIR]
Jobs completed after 100 seconds: Fair: 2% of jobs, HFSP: 30% of jobs
Jobs completed after 1000 seconds: Fair: 15% of jobs, HFSP: 90% of jobs
Experiments: task times and estimation errors
Task times are skewed
10% of the Reducers run much longer than the other tasks
[ECDF of task time: MAP vs REDUCE]
[ECDF of the estimation error: MAP vs REDUCE]   error = estimated size / real size
∼60% of jobs are over-estimated
The impact of over-estimation is mitigated by the aging function
Conclusions
HFSP strives for efficiency and avoids starvation
Particularly suitable for loaded clusters
Requires no manual, per-job priorities
→ heterogeneous workloads can coexist in the same cluster
HFSP developed within the BigFoot project
Available at: https://github.com/bigfootproject/HFSP
Thank you!
@mariopastorelli @BigFoot project