On The [Ir]relevance of Network Performance for Data Processing
Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle,Radu Stoica, Bernard Metzler, Ioannis Koltsidas,
Nikolas Ioannou
IBM Research, Zurich
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 2
How [Ir]relevant is the Network?
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 3
How [Ir]relevant is the Network?
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 4
How [Ir]relevant is the Network?
TeraSort PageRank SQL WordCount GroupBy0
50
100
150
200
250
3001 Gbps 10 Gbps 40 Gbps
Ru
nti
me
in s
ecs
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 5
How [Ir]relevant is the Network?
TeraSort PageRank SQL WordCount GroupBy0
50
100
150
200
250
3001 Gbps 10 Gbps 40 Gbps
Ru
nti
me
in s
ecs
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 6
How [Ir]relevant is the Network?
TeraSort PageRank SQL WordCount GroupBy0
50
100
150
200
250
3001 Gbps 10 Gbps 40 Gbps
Network IO is very relevant - up to 64%
Ru
nti
me
in s
ecs
60%
64%
47% 33%
28%
1
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 7
How [Ir]relevant is the Network?
TeraSort PageRank SQL WordCount GroupBy0
50
100
150
200
250
3001 Gbps 10 Gbps 40 Gbps
Network IO is very relevant - up to 64% ??
Ru
nti
me
in s
ecs
1
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 8
Is It Spark Specific?
Flink-TS Flink-PR GraphLab Timely0
50
100
150
200
250
3001 Gbps 10 Gbps 40 Gbps
Ru
nti
me
in s
ecs
725s
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 9
Spark TeraSort: The Shuffle Story
outputinput
distributed sorting
- simple - shuffle data is input data- highest chance of improvements
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 10
Spark TeraSort: The Shuffle Story
Shuffledata
Reduce tasks
output
Map tasks
Cores
input
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 11
Spark TeraSort: The Shuffle Story
Shuffledata
Reduce tasks
output
net
net
net
reading in shuffle data
Cores
input
Map tasks
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 12
Spark TeraSort: The Shuffle Story
Shuffledata
Reduce tasks
output
net CPU
net CPU
net CPU
reading in shuffle data
sortingshuffle data
Cores
input
Map tasks
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 13
Spark TeraSort: The Shuffle Story
Shuffledata
Reduce tasks
output
net CPU
net CPU
net CPU
reading in shuffle data
sortingshuffle data
performance
Cores
input
Map tasks
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 14
How Important is the Network?
1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%
20%
40%
60%
80%
100%
CPUNetwork
Gains from the networks are shadowed by the high CPU footprint
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 15
How Important is the Network?
1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%
20%
40%
60%
80%
100%
CPUNetwork
Gains from the networks are shadowed by the high CPU footprint
52%
48%
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 16
How Important is the Network?
1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%
20%
40%
60%
80%
100%
CPUNetwork
Gains from the networks are shadowed by the high CPU footprint
52%
48%
8%
92%
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 17
How Important is the Network?
1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%
20%
40%
60%
80%
100%
CPUNetwork
Gains from the networks are shadowed by the high CPU footprint
52%
48%
8%
92% 97%
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 18
How Important is the Network?
1 Gbps 10 Gbps 40 Gbps 100 Gbps*0%
20%
40%
60%
80%
100%
CPUNetwork
Gains from the networks are shadowed by the high CPU footprint
52%
48%
8%
92% 97% 99%
Network gains are shadowed by the CPU
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 19
What Exactly is the CPU Doing?
Map Reduce0%
20%
40%
60%
80%
100%
Misc.IteratorSerializationSortingIOJVMLinux
Spark
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 20
What Exactly is the CPU Doing?
Map Reduce0%
20%
40%
60%
80%
100%
Misc.IteratorSerializationSortingIOJVMLinux
Spark
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 21
What Exactly is the CPU Doing?
Map Reduce0%
20%
40%
60%
80%
100%
Misc.IteratorSerializationSortingIOJVMLinux
Overheads are spread across the entire stack - serialization, abstration, execution model etc.2
Spark
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 22
The Balancing Act: CPU vs Network
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 23
The Balancing Act: CPU vs Network
I.Balance out the CPU
with the network time
Sorting : O(nlog(n))Network: O(n)
use smaller 'n'
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 24
The Balancing Act: CPU vs Network
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 25
The Balancing Act: CPU vs Network
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 26
The Balancing Act: CPU vs Network
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 27
The Balancing Act: CPU vs Network
II.Use more cores to
scale up
if a single corecannot do 40 Gbps
then use more
Needs a more careful analysis of at the entire stack
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 28
The Balancing Act: CPU vs Network
II.Use more cores to
scale up
1 2 4 8 160
20
40
60
Number of cores
idealmeasured
Ban
dw
idth
(Gb
ps)
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 29
The Balancing Act: CPU vs Network
II.Use more cores to
scale up
1 2 4 8 160
20
40
60
Number of cores
idealmeasured
Ban
dw
idth
(Gb
ps)
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 30
The Balancing Act: CPU vs Network
II.Use more cores to
scale up
1 2 4 8 160
20
40
60
Number of cores
idealmeasured
Ban
dw
idth
(Gb
ps)
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 31
The Balancing Act: CPU vs Network
II.Use more cores to
scale up
Number of coresR
un
tim
e (s
ecs)
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
1 2 4 8 160
100
200
300reduce map
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 32
The Balancing Act: CPU vs Network
II.Use more cores to
scale up
Number of coresR
un
tim
e (s
ecs)
I.Balance out the CPU
with the network time
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
1 2 4 8 160
100
200
300reduce map
260
_____coresruntime = 9 +
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 33
The Balancing Act: CPU vs Network
II.Use more cores to
scale up
Classical techniques are ineffective
I.Balance out the CPU
with the network time
3
Smaller Partitions
Ru
nti
me
(se
cs)
020406080
100
Number of cores1 2 4 8 16
0
100
200
300reduce map
Ru
nti
me
(se
cs)
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 34
ConclusionFaster networks (IO) are very relevant – as long as you have CPU cycles – differentiate between user vs framework CPU usage
Framework's CPU usage is bad – CPU-network imbalance : sorting, serialization, volcano
execution model, etc. – scalability (serial vs parallel components)– ineffective classical balancing techniques
Knowing today's usec-era IO and CPU hardware, how would you re-design modern data processing framework?
1
2
3
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 35
ConclusionFaster networks (IO) are very relevant – as long as you have CPU cycles – differentiate between user vs framework CPU usage
Framework's CPU usage is bad – CPU-network imbalance : sorting, serialization, volcano
execution model, etc. – scalability (serial vs parallel components)– ineffective classical balancing techniques
Knowing today's usec-era IO and CPU hardware, how would you re-design modern data processing framework?
1
2
3
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 36
ConclusionFaster networks (IO) are very relevant – as long as you have CPU cycles – differentiate between user vs framework CPU usage
Framework's CPU usage is bad – CPU-network imbalance : sorting, serialization, volcano
execution model, etc. – scalability (serial vs parallel components)– ineffective classical balancing techniques
Knowing today's usec-era IO and CPU hardware, how would you re-design modern data processing framework?
1
2
3
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 37
Backup
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 38
Spark
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 39
Spark
The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 40
Runtime
1 2 4 8 160
50
100
150
200
250
300reduce map
What Exactly is the CPU Doing?Sp
ark
Map Reduce Reduce/Count0%
20%
40%
60%
80%
100%
Misc.IteratorSerializationSortingIOJVMLinux
Top Related