
Optimizing Shuffle in Wide-Area Data Analytics
Shuhao Liu*, Hao Wang, Baochun Li
Department of Electrical & Computer Engineering, University of Toronto

Transcript of "Optimizing Shuffle in Wide-Area Data Analytics"

Page 1: Optimizing Shuffle in Wide-Area Data Analytics

Shuhao Liu*, Hao Wang, Baochun Li
Department of Electrical & Computer Engineering, University of Toronto

Page 2:

What is:
‣ Wide-Area Data Analytics?
‣ Shuffle?

Page 3: Wide-Area Data Analytics

[Figure: world map showing datacenters in N. California, Oregon, N. Virginia, Ireland, and Singapore, with data sets (data 1, data 2, data 3) residing in different locations.]

Large volumes of data are generated, stored and processed across geographically distributed DCs.

Page 4: Existing work focuses on Task Placement

Page 5: Rethink the root cause of inter-DC traffic: Shuffle

Page 6: Fetch-based Shuffle

[Figure: Mapper 1, Mapper 2, and Mapper 3 connected to Reducer 1, Reducer 2, and Reducer 3 by all-to-all communication, which starts at the beginning of the reduce tasks.]
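To make the figure concrete, here is a toy model of fetch-based shuffle (our own sketch, not Spark code): every reducer contacts every mapper for its partition, and none of this communication starts until the reduce tasks do.

```scala
// Toy model of fetch-based shuffle (illustration only, not Spark internals).
object FetchBasedShuffleSketch {
  // Hypothetical map output: mapperId -> (reducerId -> records for that reducer)
  type MapOutput = Map[Int, Map[Int, Seq[(String, Int)]]]

  // All-to-all fetch: reducer `reducerId` pulls its partition from every mapper,
  // only after the reduce stage has started.
  def fetch(mapOutputs: MapOutput, reducerId: Int): Seq[(String, Int)] =
    mapOutputs.values.flatMap(_.getOrElse(reducerId, Seq.empty)).toSeq

  def main(args: Array[String]): Unit = {
    val outputs: MapOutput = Map(
      1 -> Map(1 -> Seq(("a", 1)), 2 -> Seq(("b", 1))),
      2 -> Map(1 -> Seq(("a", 2)), 2 -> Seq(("c", 1)))
    )
    println(fetch(outputs, 1)) // records destined for Reducer 1, pulled from Mappers 1 and 2
  }
}
```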

Page 7: Problems with Fetch

‣ Under-utilizes the inter-datacenter bandwidth
  ‣ Transfers start late, at the beginning of the reduce tasks
  ‣ Transfers start concurrently and share the bandwidth
‣ Need for refetch
  ‣ A possible reduce task failure forces the shuffle data to be fetched again

Page 8: Push-based Shuffle

‣ Bandwidth Utilization

[Figure: execution timelines (time 0 to 16) for worker A and worker B across Stage N and Stage N+1, panels (a) and (b), comparing fetch-based shuffle (Map, Shuffle Write, then Shuffle Read before Reduce) with push-based shuffle (Data Transfer overlapping the Map phase, so Reduce can start earlier).]
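The contrast in the figure can be stated as a tiny sketch (ours, not Spark internals): fetch-based shuffle pays the whole transfer cost after the map output has been spilled, while push-based shuffle hands each block to the network as soon as the mapper produces it, overlapping transfer with map computation.

```scala
// Illustration only: how the two shuffle styles place data transfer in time.
object PushVsFetchSketch {
  final case class Block(reducerId: Int, bytes: Array[Byte])

  // Fetch-based: map output is first spilled ("shuffle write"); all transfer
  // happens later, when reducers start and read everything at once.
  def fetchBased(mapOutput: Iterator[Block], send: Block => Unit): Unit = {
    val spilled = mapOutput.toVector // local shuffle write
    spilled.foreach(send)            // shuffle read, only after reduce begins
  }

  // Push-based: each block goes to the network as soon as it is produced,
  // so inter-datacenter links are busy while the map tasks are still running.
  def pushBased(mapOutput: Iterator[Block], send: Block => Unit): Unit =
    mapOutput.foreach(send)
}
```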

Page 9: Push-based Shuffle

‣ Failure Recovery

[Figure: failure-recovery timelines (time 0 to 24) for worker A and worker B across Stage N and Stage N+1, panels (a) and (b): a Reduce task fails, and a Refetch of the shuffle data is needed before the Reduce can rerun.]

Page 10: Where to Push?

‣ One option: existing task placement algorithms
  ‣ The reducer placement is known beforehand
  ‣ But they require prior knowledge, e.g., predictable jobs and inter-DC available bandwidth
‣ Our solution: Push/Aggregate

Page 11: Aggregating Shuffle Input

‣ Send the shuffle input to a subset of datacenters that already hold a large portion of it
‣ Reduces inter-datacenter traffic in future shuffles
‣ Likely to reduce inter-datacenter traffic in the current shuffle as well

[Figure: Datacenter A, the receiver of the shuffle input, hosts Reducer 1 and Reducer 3; Datacenter B hosts Reducer 2, Reducer 4, and Reducer 5, whose shuffle reads require inter-DC transfers.]

Page 12:

For any partition of the shuffle input, the expected inter-datacenter traffic in the next shuffle is proportional to the number of non-colocated reducers.
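A rough justification, with notation of our own (not from the slides): suppose a partition of size S is placed in datacenter d and is consumed, for simplicity evenly, by R reducers in total, of which r_d are colocated in d. Each of the R - r_d non-colocated reducers must pull its S/R share across datacenter boundaries, so

```latex
\mathbb{E}[\text{inter-DC traffic}] \;\approx\; \frac{S}{R}\,(R - r_d) \;\propto\; R - r_d .
```

For instance, with the placement in the previous figure (R = 5 reducers, r_A = 2 of them colocated in Datacenter A), roughly 3/5 of that partition would cross datacenter boundaries.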


Page 13: Aggregating Shuffle Input

‣ Send the shuffle input to a subset of datacenters that already hold a large portion of it (a selection sketch follows below)
‣ A reducer is then likely to be placed close to its shuffle input
‣ More aggregated data -> less inter-datacenter traffic, given a reasonable task placement
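The slides do not spell out how the aggregator datacenters are chosen, so the following is only one plausible greedy rule, sketched for illustration (the 80% target fraction is a made-up parameter): keep picking the datacenters with the most local shuffle input until they cover most of the total.

```scala
// Illustration only: a greedy pick of "datacenters that already hold a large
// portion of the shuffle input". The actual selection policy may differ.
object AggregatorSelectionSketch {
  def pickAggregators(bytesPerDC: Map[String, Long],
                      targetFraction: Double = 0.8): Set[String] = {
    val total  = bytesPerDC.values.sum.toDouble
    val sorted = bytesPerDC.toSeq.sortBy { case (_, bytes) => -bytes }
    var covered = 0L
    sorted.takeWhile { case (_, bytes) =>
      val stillNeeded = covered < targetFraction * total
      covered += bytes
      stillNeeded
    }.map(_._1).toSet
  }

  def main(args: Array[String]): Unit = {
    val sizes = Map("N. Virginia" -> 60L, "Ireland" -> 25L, "Singapore" -> 15L)
    println(pickAggregators(sizes)) // Set(N. Virginia, Ireland): 85% of the input
  }
}
```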


Page 14: Implementation in Spark

‣ Requirements:
  ‣ Push the data before writing it to disk
  ‣ Destined for the aggregator datacenters
‣ transferTo() as an RDD transformation
  ‣ Allows implicit or explicit usage (an explicit-usage sketch follows below)
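Explicit usage might look like the sketch below, modeled on the example on the next slide. transferTo() is the paper's proposed extension, not a stock Spark API, so its exact signature here (a list of aggregator datacenter IDs) is an assumption, and the no-op stub exists only so the snippet compiles.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransferToUsageSketch {
  // Stand-in for the paper's transferTo() transformation: a no-op stub so the
  // usage below compiles against stock Spark. The real implementation pushes
  // map output toward the given aggregator datacenters before spilling to disk.
  implicit class TransferToOps[T](rdd: RDD[T]) {
    def transferTo(aggregatorDCs: Seq[String]): RDD[T] = rdd
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TransferToSketch").setMaster("local[*]"))

    val counts = sc.parallelize(Seq("a", "b", "a", "c"))
      .map(w => (w, 1))
      .transferTo(Seq("A"))  // explicit use: push map output toward aggregator datacenter "A"
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```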


Page 15: Implementation in Spark

[Figure: two lineage graphs over an InputRDD with partitions A1, A2, and B1.
(a) InputRDD.map(…).reduce(…) … : the map outputs of A1, A2, and B1 feed the reduce tasks directly.
(b) InputRDD.map(…).transferTo([A]).reduce(…) … : a transferTo step (A1, A2, A*) pushes each map output toward datacenter A before the reduce tasks, which then read A1, A2, and Ax.]

Page 16: Implementation in Spark

[Screenshots: the Spark 1.6.1 web UI showing the DAG of a running ScalaWordCount job ("Details for Job 0", Status: RUNNING, 1 active and 1 pending stage). (a) Without transferTo: Stage 0 contains sequenceFile, map, flatMap, and map; Stage 1 contains map and reduceByKey. (b) With transferTo: Stage 0 additionally shows transferTo as an embedded transformation.]

Page 17: Implementation in Spark

‣ transferTo() implicit insertion

Original code:

    val InRDD = In1 + In2
    InRDD
      .filter(…)
      .groupByKey(…)
      .collect()

Produced code (after processing by the DAGScheduler):

    val InRDD = In1 + In2
    InRDD
      .filter(…)
      .transferTo(…)
      .groupByKey(…)
      .collect()

[Figure: the corresponding lineage graphs across DC1 and DC2. In the original code, In1 and In2 are filtered and their outputs become the shuffle input to groupByKey, followed by collect; in the produced code, a transferTo step is inserted after each filter, before the shuffle input.]
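How the implicit insertion works is only hinted at here, so the following is our own simplified sketch (plain Scala, not the actual DAGScheduler patch): walk the lineage, and whenever a shuffle boundary is found, splice a transferTo toward the chosen aggregator datacenters in front of it.

```scala
// Simplified model of "implicit insertion": rewrite a lineage so that every
// shuffle is preceded by a transferTo. Not the real Spark DAGScheduler code.
sealed trait Plan
final case class Source(name: String)                       extends Plan
final case class Narrow(op: String, parent: Plan)           extends Plan // e.g. filter, map
final case class Shuffle(op: String, parent: Plan)          extends Plan // e.g. groupByKey
final case class TransferTo(dcs: Seq[String], parent: Plan) extends Plan

object ImplicitInsertionSketch {
  def insert(plan: Plan, dcs: Seq[String]): Plan = plan match {
    case Shuffle(op, parent)   => Shuffle(op, TransferTo(dcs, insert(parent, dcs)))
    case Narrow(op, parent)    => Narrow(op, insert(parent, dcs))
    case TransferTo(d, parent) => TransferTo(d, insert(parent, dcs)) // already explicit: keep it
    case s: Source             => s
  }

  def main(args: Array[String]): Unit = {
    val origin = Shuffle("groupByKey", Narrow("filter", Source("InRDD")))
    println(insert(origin, Seq("A")))
    // Shuffle(groupByKey,TransferTo(List(A),Narrow(filter,Source(InRDD))))
  }
}
```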

Page 18: Evaluation

‣ Amazon EC2, m3.large instances
‣ 26 nodes in 6 different locations

[Figure: world map of the deployment across N. Virginia, N. California, São Paulo, Frankfurt, Singapore, and Sydney, with 4 to 6 nodes per region (26 in total).]

Page 19: Performance

[Figure: performance results. The lower, the better.]

Page 20: Take-Away Messages

‣ A push-based shuffle mechanism is beneficial in wide-area data analytics
‣ Aggregating the shuffle input to a subset of datacenters is likely to help when you have no prior knowledge
‣ Implemented in Apache Spark as a data transformation
‣ Performance: reduced shuffle time and reduced variance

Page 21: Thanks! Q&A