On The [Ir]relevance of Network Performance for Data ... › researcher › files › zurich-ATR ›...

Post on 07-Jun-2020

3 views 0 download

Transcript of On The [Ir]relevance of Network Performance for Data ... › researcher › files › zurich-ATR ›...

On The [Ir]relevance of Network Performance for Data Processing

Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle,Radu Stoica, Bernard Metzler, Ioannis Koltsidas,

Nikolas Ioannou

IBM Research, Zurich

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 2

How [Ir]relevant is the Network?

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 3

How [Ir]relevant is the Network?

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 4

How [Ir]relevant is the Network?

TeraSort PageRank SQL WordCount GroupBy0

50

100

150

200

250

3001 Gbps 10 Gbps 40 Gbps

Ru

nti

me

in s

ecs

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 5

How [Ir]relevant is the Network?

TeraSort PageRank SQL WordCount GroupBy0

50

100

150

200

250

3001 Gbps 10 Gbps 40 Gbps

Ru

nti

me

in s

ecs

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 6

How [Ir]relevant is the Network?

TeraSort PageRank SQL WordCount GroupBy0

50

100

150

200

250

3001 Gbps 10 Gbps 40 Gbps

Network IO is very relevant - up to 64%

Ru

nti

me

in s

ecs

60%

64%

47% 33%

28%

1

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 7

How [Ir]relevant is the Network?

TeraSort PageRank SQL WordCount GroupBy0

50

100

150

200

250

3001 Gbps 10 Gbps 40 Gbps

Network IO is very relevant - up to 64% ??

Ru

nti

me

in s

ecs

1

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 8

Is It Spark Specific?

Flink-TS Flink-PR GraphLab Timely0

50

100

150

200

250

3001 Gbps 10 Gbps 40 Gbps

Ru

nti

me

in s

ecs

725s

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 9

Spark TeraSort: The Shuffle Story

outputinput

distributed sorting

- simple - shuffle data is input data- highest chance of improvements

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 10

Spark TeraSort: The Shuffle Story

Shuffledata

Reduce tasks

output

Map tasks

Cores

input

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 11

Spark TeraSort: The Shuffle Story

Shuffledata

Reduce tasks

output

net

net

net

reading in shuffle data

Cores

input

Map tasks

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 12

Spark TeraSort: The Shuffle Story

Shuffledata

Reduce tasks

output

net CPU

net CPU

net CPU

reading in shuffle data

sortingshuffle data

Cores

input

Map tasks

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 13

Spark TeraSort: The Shuffle Story

Shuffledata

Reduce tasks

output

net CPU

net CPU

net CPU

reading in shuffle data

sortingshuffle data

performance

Cores

input

Map tasks

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 14

How Important is the Network?

1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%

20%

40%

60%

80%

100%

CPUNetwork

Gains from the networks are shadowed by the high CPU footprint

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 15

How Important is the Network?

1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%

20%

40%

60%

80%

100%

CPUNetwork

Gains from the networks are shadowed by the high CPU footprint

52%

48%

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 16

How Important is the Network?

1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%

20%

40%

60%

80%

100%

CPUNetwork

Gains from the networks are shadowed by the high CPU footprint

52%

48%

8%

92%

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 17

How Important is the Network?

1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%

20%

40%

60%

80%

100%

CPUNetwork

Gains from the networks are shadowed by the high CPU footprint

52%

48%

8%

92% 97%

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 18

How Important is the Network?

1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%

20%

40%

60%

80%

100%

CPUNetwork

Gains from the networks are shadowed by the high CPU footprint

52%

48%

8%

92% 97% 99%

Network gains are shadowed by the CPU

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 19

What Exactly is the CPU Doing?

Map Reduce0%

20%

40%

60%

80%

100%

Misc.IteratorSerializationSortingIOJVMLinux

Spark

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 20

What Exactly is the CPU Doing?

Map Reduce0%

20%

40%

60%

80%

100%

Misc.IteratorSerializationSortingIOJVMLinux

Spark

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 21

What Exactly is the CPU Doing?

Map Reduce0%

20%

40%

60%

80%

100%

Misc.IteratorSerializationSortingIOJVMLinux

Overheads are spread across the entire stack - serialization, abstration, execution model etc.2

Spark

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 22

The Balancing Act: CPU vs Network

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 23

The Balancing Act: CPU vs Network

I.Balance out the CPU

with the network time

Sorting : O(nlog(n))Network: O(n)

use smaller 'n'

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 24

The Balancing Act: CPU vs Network

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 25

The Balancing Act: CPU vs Network

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 26

The Balancing Act: CPU vs Network

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 27

The Balancing Act: CPU vs Network

II.Use more cores to

scale up

if a single corecannot do 40 Gbps

then use more

Needs a more careful analysis of at the entire stack

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 28

The Balancing Act: CPU vs Network

II.Use more cores to

scale up

1 2 4 8 160

20

40

60

Number of cores

idealmeasured

Ban

dw

idth

(Gb

ps)

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 29

The Balancing Act: CPU vs Network

II.Use more cores to

scale up

1 2 4 8 160

20

40

60

Number of cores

idealmeasured

Ban

dw

idth

(Gb

ps)

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 30

The Balancing Act: CPU vs Network

II.Use more cores to

scale up

1 2 4 8 160

20

40

60

Number of cores

idealmeasured

Ban

dw

idth

(Gb

ps)

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 31

The Balancing Act: CPU vs Network

II.Use more cores to

scale up

Number of coresR

un

tim

e (s

ecs)

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

1 2 4 8 160

100

200

300reduce map

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 32

The Balancing Act: CPU vs Network

II.Use more cores to

scale up

Number of coresR

un

tim

e (s

ecs)

I.Balance out the CPU

with the network time

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

1 2 4 8 160

100

200

300reduce map

260

_____coresruntime = 9 +

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 33

The Balancing Act: CPU vs Network

II.Use more cores to

scale up

Classical techniques are ineffective

I.Balance out the CPU

with the network time

3

Smaller Partitions

Ru

nti

me

(se

cs)

020406080

100

Number of cores1 2 4 8 16

0

100

200

300reduce map

Ru

nti

me

(se

cs)

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 34

ConclusionFaster networks (IO) are very relevant – as long as you have CPU cycles – differentiate between user vs framework CPU usage

Framework's CPU usage is bad – CPU-network imbalance : sorting, serialization, volcano

execution model, etc. – scalability (serial vs parallel components)– ineffective classical balancing techniques

Knowing today's usec-era IO and CPU hardware, how would you re-design modern data processing framework?

1

2

3

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 35

ConclusionFaster networks (IO) are very relevant – as long as you have CPU cycles – differentiate between user vs framework CPU usage

Framework's CPU usage is bad – CPU-network imbalance : sorting, serialization, volcano

execution model, etc. – scalability (serial vs parallel components)– ineffective classical balancing techniques

Knowing today's usec-era IO and CPU hardware, how would you re-design modern data processing framework?

1

2

3

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 36

ConclusionFaster networks (IO) are very relevant – as long as you have CPU cycles – differentiate between user vs framework CPU usage

Framework's CPU usage is bad – CPU-network imbalance : sorting, serialization, volcano

execution model, etc. – scalability (serial vs parallel components)– ineffective classical balancing techniques

Knowing today's usec-era IO and CPU hardware, how would you re-design modern data processing framework?

1

2

3

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 37

Backup

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 38

Spark

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 39

Spark

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 40

Runtime

1 2 4 8 160

50

100

150

200

250

300reduce map

What Exactly is the CPU Doing?Sp

ark

Map Reduce Reduce/Count0%

20%

40%

60%

80%

100%

Misc.IteratorSerializationSortingIOJVMLinux