iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving...

25
iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT, UNIVERSITY OF COLORADO, COLORADO SPRINGS PRESENTED BY YANFEI GUO

Transcript of iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving...

Page 1: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

iShuffle: Improving Hadoop Performance with Shuffle-on-WriteYANFEI GUO, J IA RAO, XIAOBO ZHOUC S D E PA R T M E N T, U N I V E RS I T Y O F C O LO R A D O, C O LO R A D O S P R I N G S

P R E S E N TE D BY YA N F E I G U O

Page 2: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

MapReduceA framework for processing parallelizable problems across huge data sets using a large number of machines◦ Invented and used by Google [OSDI’04]◦ Many implementations

◦ Apache Hadoop, Dryad

◦ From interactive query to massive/batch computation◦ Nutch, Hive, HBase

7/3/2013 2ICAC'13 ISHUFFLE

Page 3: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

MapReduce Model

… If by your art, my dearest father, you havePut the wild waters in this roar, allay them. …

…If = 8By = 5Your = 7Art = 1…

…IfByYourArt…

7/3/2013 3

Apply a common function to the problem’s input

Generate intermediate data

Process intermediate data for answer

MapMa p ( k1, v1) l i s t ( k2, k2)

ReduceRed u ce( k2, l i s t ( v2) ) l i s t (v3)

ICAC'13 ISHUFFLE

Page 4: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

MapReduce

Programming and Execution Model

Map ReducePartition Combine Shuffle/Sort

Map(k1, v1) l i st(k2, k2) Reduce(k2, l i st(v2) ) l i st(v3)

Map Partition Combine

Map Partition Combine

……

ReduceShuffle/Sort

ReduceShuffle/Sort

……

7/3/2013 ICAC'13 ISHUFFLE 4

Page 5: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Hadoop ImplementationMap◦ Buffered output◦ Spill to disk

Reduce

7/3/2013 ICAC’13 ISHUFFLE 5

OutputCollectorCollect()

Map

per

Map

()

OutputBufferPartitioner Combiner

v1, v2, v3 ...k1

...k2

......

v1, v2, v3 ...k7

...k8

......

v1, v2, v3 ...k9

...k10

......

P1

P2

...

v1, v2, v3 ...k1

...k2

......

v1, v2, v3 ...k7

...k8

......

v1, v2, v3 ...k9

...k10

......

P1

P2

...

SpillThread

v1, v2, v3 ...k1

...k2

......

v1, v2, v3 ...k7

...k8

......

v1, v2, v3 ...k9

...k10

......

P1

P2

...

<k, v>

InputSplit

Map Task

Page 6: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Hadoop Key DesignsShuffle◦ All-to-all input data fetching phase in a reduce task◦ The reduce function will not be performed until its completion◦ Disk I/O and network intensive

Overlapping shuffle with map tasks◦ Hadoop allows an early start of the shuffle phase as soon as part of the

reduce input is available◦ By default, shuffle is started when 5% of map tasks finished

Fair sharing◦ Hadoop enforces fairness among users/jobs◦ Fair share of map and reduce slots

7/3/2013 ICAC'13 ISHUFFLE 6

Page 7: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

IssuesInput data skew among reduce tasks◦ Non-uniform key distribution Different partition size◦ Lead to disparity in reduce completion time

Inflexible Scheduling of Reduce Tasks◦ Reduce tasks are created during job initialization◦ Tasks are scheduled in ascending order of their ID◦ Reduce tasks can not start even if all their input partitions are available

Tight coupling of shuffle and reduce◦ Shuffle starts only when the corresponding reduce is scheduled◦ Leaving parallelism within and between jobs unexploited

7/3/2013 ICAC'13 ISHUFFLE 7

Page 8: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

A Motivating Example

7/3/2013 ICAC'13 ISHUFFLE 8

Workload: tera-sort with 4GB datasetPlatform: 10-node Hadoop cluster1 map and 1 reduce slots per node

Page 9: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Related WorkMap Scheduling in Hadoop◦ Accelerating straggler Task: [OSDI’08]◦ Enforcing Fairness: [Middleware’10], [EuroSys’10]

Improving reduce performance◦ Push-based shuffling: [NSDI’10]◦ RDMA-based acceleration: [SC’11]◦ Specially designed partitioner: [TPDS’12]

7/3/2013 ICAC'13 ISHUFFLE 9

Not applicable to reduce tasks

Requiring hardware support or not effective in multiplewave execution

Page 10: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Our ApproachDecouple shuffle phase from reduce tasks◦ Shuffle as a platform service provided by Hadoop◦ Pro-actively and deterministically push map output to different slave nodes

Balancing the partition placement◦ Predict partition sizes during task execution ◦ Determine which node should a partition been shuffled to◦ Mitigate data skew

Flexible reduce task scheduling◦ Assign partitions to reduce tasks only when scheduled

7/3/2013 ICAC'13 ISHUFFLE 10

Page 11: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

iShuffle Design

iShuffle◦ Shuffler◦ Shuffle Manager◦ Task Scheduler

Features◦ User-Transparent Shuffle Service◦ Shuffle-on-Write◦ Automated Map Output Placement◦ Flexible Reduce Task Scheduling

7/3/2013 ICAC'13 ISHUFFLE 11

Page 12: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Shuffle-on-Write“shuffle” when Hadoop stores intermediate results

Map output collection◦ MapOutputCollector◦ DataSpillHandler

Data shuffling◦ Queuing and Dispatching◦ Data Size Predictor◦ Shuffle Manager

Map output merging◦ Merger◦ Priority-Queue merge sort

7/3/2013 ICAC'13 ISHUFFLE 12

Page 13: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Partition PlacementPrediction of Partition Sizes◦ Task characteristics: input data size, map selectivity◦ Linear model between partition size and input data size◦ Metrics measured during the task execution

𝑝𝑝𝑖𝑖,𝑗𝑗 = 𝑎𝑎𝑗𝑗 + 𝑏𝑏𝑗𝑗𝐷𝐷𝑖𝑖

Partition Placement Problem◦ Minimizes the difference of total partition size on different nodes

◦ 𝜎𝜎 = 1𝑛𝑛∑𝑖𝑖=1𝑛𝑛 𝜇𝜇 − ∑𝑗𝑗∈𝑠𝑠𝑖𝑖 𝑝𝑝𝑗𝑗

7/3/2013 ICAC'13 ISHUFFLE 13

Page 14: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Heuristic Placement AlgorithmLargest Partition First (LPF)◦ Pick the largest partition first◦ Place it to node with the least total

partition size

7/3/2013 ICAC'13 ISHUFFLE 14

Page 15: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Flexible Reduce SchedulingAssign Partitions to Reduce Tasks at Runtime◦ Override the partition assignment at job initialization◦ Allow tasks to run on any node

Multiple Job Scheduling◦ Fair scheduling for map tasks◦ Disabled fair share for reduce tasks◦ Prevent wasted cluster cycles for waiting unfinished maps

7/3/2013 ICAC'13 ISHUFFLE 15

Page 16: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Experiments32-node Hadoop Cluster◦ 1 namenode, 1 jobtracker, 30 slave nodes◦ 4 map slots and 2 reduce slots per slave◦ HDFS Block size = 64 MB◦ Hadoop version 1.1.1

Hardware◦ Intel Xeon E5530, 4-core, 2.4 GHz◦ 4 GB Memory◦ 1 Gbps Ethernet

7/3/2013 ICAC'13 ISHUFFLE 16

Page 17: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

BenchmarkPurdue MapReduce Benchmark Suite (PUMA)◦ Real data set from Wikipedia, Netflix◦ Shuffle-heavy and shuffle-light

7/3/2013 ICAC'13 ISHUFFLE 17

Job Input Size (GB) # Map # Reduce Shuffle Vol (GB)

Self-join 250 4000 180 246

Tera-sort 300 4800 180 300

Ranked-inverted-index 220 3520 180 235

K-means 30 480 6 43

Inverted-index 250 4000 180 57

Term-vector 250 4000 180 59

wordcount 250 4000 180 49

Histogram-movies 200 3200 180 0.002

Histogram-ratings 200 3200 180 0.0012

Grep 250 4000 180 0.0013

Shuffle-heavy

Shuffle-light

Page 18: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

iShuffle PerformanceExecution Trace◦ Slow start of Hadoop does not eliminate

shuffle delay for multiple reduce wave◦ Overhead of remote disk access of

Hadoop-A [SC’11]◦ iShuffle has almost no shuffle delay

7/3/2013 ICAC'13 ISHUFFLE 18

Page 19: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

iShuffle Performance (cont’d)Reducing Job Completion Time◦ 30% and 21% less than vanilla Hadoop and Hadoop-A

Reducing Shuffle Delay◦ 10x less than vanilla Hadoop in job’s with large shuffle volume◦ 2x to 3x less than Hadoop-A

7/3/2013 ICAC'13 ISHUFFLE 19

0

0.2

0.4

0.6

0.8

1

1.2

Nor

mal

ized

Job

Exec

utio

n Ti

me

Hadoop

Hadoop-A

iShuffle 0

2

4

6

8

10

12

14

Nor

mal

ized

Shuf

fle D

elay

Hadoop

Hadoop-A

iShuffle

Page 20: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Balanced Partition PlacementPerformance improvement by a Balanced Partition Placement◦ 8-12% shorter job completion time

7/3/2013 ICAC'13 ISHUFFLE 20

0.75

0.8

0.85

0.9

0.95

1

1.05

Nor

mal

ized

Job

Exec

utio

n Ti

me

Hadoop

Hadoop-A

iShuffle

Page 21: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Multiple Job PerformanceShuffle-heavy + Shuffle-heavy◦ 8% and 23% improvement on

tera_sort and inverted-index

Shuffle-heavy + Shuffle-light◦ 16% and 25% improvement on

tera_sort and histogram-movies

7/3/2013 ICAC'13 ISHUFFLE 21

0

500

1000

1500

2000

2500

tera_sort inverted-index

Job

Exec

utio

n Ti

me

(s)

Separate iShuffle

iShuffle w/ HFS

iShuffle w/ HFS_mod

0

500

1000

1500

2000

2500

tera_sort histogram-movies

Job

Exec

utio

n Ti

me

(s)

Separate iShuffle

iShuffle w/ HFS

iShuffle w/ HFS_mod

Page 22: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

ConclusionsMotivationso Tight coupling of shuffle of reduceo Inefficient reduce schedulingo Parallelism unexploited

iShuffleo Proactively push shuffle datao Balancing map output to mitigate data skewo Flexible reduce scheduling

Resultso Significantly reducing completion time for shuffle-heavy jobs

7/3/2013 ICAC'13 ISHUFFLE 22

Page 23: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Questions?

7/3/2013 ICAC'13 ISHUFFLE 23

Page 24: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

Backup Slides

7/3/2013 ICAC'13 ISHUFFLE 24

Page 25: iShuffle: Improving Hadoop Performance with Shuffle-on-Write · 2019-12-18 · iShuffle: Improving Hadoop Performance with Shuffle-on-Write YANFEI GUO , JIA RAO, XIAOBO ZHOU CS DEPARTMENT,

iShuffle v.s. Random PlacementiShuffle outperforms randomization -based placement algorithms

7/3/2013 ICAC'13 ISHUFFLE 25

0.9

0.95

1

1.05

1.1

1.15

1.2

Nor

mal

ized

Job

Exec

utio

n Ti

me

GREEDY(2)

LPF-GREEDY(2)

iShuffle