Transcript of MapReduce Online (USENIX)
MapReduce Online
Tyson Condie
UC Berkeley
Joint work with
Neil Conway, Peter Alvaro, and Joseph M. Hellerstein (UC Berkeley)
Khaled Elmeleegy and Russell Sears (Yahoo! Research)
MapReduce Programming Model
• Think data-centric – Apply a two-step transformation to data sets
• Map step: Map(k1, v1) -> list(k2, v2) – Apply the map function to input records – Assign output records to groups
• Reduce step: Reduce(k2, list(v2)) -> list(v3) – Consolidate groups from the map step – Apply the reduce function to each group
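The two-step model above can be sketched as a toy, single-process Python simulation (the data and functions below are invented for illustration; `map_fn` and `reduce_fn` stand in for user-supplied code):

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Toy, single-process simulation of the two-step model.

    map_fn(k1, v1)         -> iterable of (k2, v2) pairs
    reduce_fn(k2, [v2...]) -> iterable of v3 values
    """
    # Map step: apply the map function and assign outputs to groups by key.
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Reduce step: apply the reduce function to each consolidated group.
    return {k2: list(reduce_fn(k2, vs)) for k2, vs in groups.items()}

# Hypothetical example: total string length keyed by first letter.
result = run_mapreduce(
    [(0, "apple"), (1, "avocado"), (2, "banana")],
    map_fn=lambda k, v: [(v[0], len(v))],
    reduce_fn=lambda k, vs: [sum(vs)],
)
# result == {'a': [12], 'b': [6]}
```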
MapReduce System Model • Shared-nothing architecture – Tuned for massive data parallelism – Many maps operate on portions of the input – Many reduces, each assigned specific groups
• Batch-oriented computations over massive data – Runtimes range from minutes to hours – Execute on tens to thousands of machines – Failures are common (fault tolerance crucial)
• Fault tolerance via operator restart since … – Operators complete before producing any output – Atomic data exchange between operators
Life Beyond Batch
• MapReduce is often used for analytics on streams of data that arrive continuously – Click streams, network traffic, web crawl data, …
• Batch approach: buffer, load, process – High latency – Hard to scale for real-time analysis
• Online approach: run MR jobs continuously – Analyze data as it arrives
Online Query Processing • Two domains of interest (at massive scale):
1. Online aggregation • Interactive data analysis (watch the answer evolve)
2. Stream processing • Continuous (real-time) analysis on data streams
• Blocking operators are a poor fit – Final answers only – No infinite streams
• Operators need to pipeline – BUT we must retain fault tolerance
A Brave New MapReduce World
• Pipelined MapReduce – Maps can operate on infinite data (Stream processing)
– Reduces can export early answers (Online aggregation)
• Hadoop Online Prototype (HOP) – Preserves Hadoop interfaces and APIs – Pipelined fault tolerance model
Outline
1. Hadoop Background 2. Hadoop Online Prototype (HOP) 3. Performance (blocking vs. pipelining)
4. Future Work
Wordcount Job
• Map step – Parse input into a series of words – For each word, output <word, 1>
• Reduce step – For each word, receive a list of counts – Sum the counts and output <word, sum>
• Combine step (optional) – Pre-aggregate map output – Same as the reduce step in wordcount
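A minimal Python sketch of the wordcount steps above, including the optional combine step that pre-aggregates one map's output before it reaches the reducers (toy code, not Hadoop's API):

```python
from collections import Counter

def wc_map(block):
    # Parse input into words; for each word, output <word, 1>.
    return [(w, 1) for w in block.split()]

def wc_combine(pairs):
    # Optional combine step: pre-aggregate a single map's output.
    # For wordcount this is the same logic as the reduce step.
    agg = Counter()
    for w, c in pairs:
        agg[w] += c
    return list(agg.items())

def wc_reduce(word, counts):
    # Sum the list of counts and output <word, sum>.
    return (word, sum(counts))

combined = wc_combine(wc_map("cat rabbit cat"))
# sorted(combined) == [('cat', 2), ('rabbit', 1)]
```

The combiner shrinks the data shipped over the network: three input records become two intermediate records here.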
Master Client
Submit wordcount
Workers
map
reduce
reduce
schedule
Block 1
HDFS
Cat Rabbit
Cat
Rabbit
Dog
Turtle
Map step
Workers
map
reduce
reduce
Cat, 1
Rabbit, 1
Apply the map function to the input block; assign a group id (color) to output records: group id = hash(key) mod # reducers
Cat, 1 Turtle, 1
Rabbit, 1 Dog, 1
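The group-id assignment above is plain hash partitioning; a sketch in Python (using the built-in `hash` only for illustration; Hadoop uses the key's `hashCode`):

```python
def group_id(key, num_reducers):
    # group id = hash(key) mod # reducers, so every record with the
    # same key is routed to the same reducer.
    return hash(key) % num_reducers

# With 2 reducers, a given key always gets the same group id
# within one process, and the id is always 0 or 1.
assert group_id("Cat", 2) == group_id("Cat", 2)
assert group_id("Cat", 2) in (0, 1)
```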
Group step (optional): sort map output by group id and key
Workers
map
reduce
reduce
Cat, 1 Cat, 1 Dog, 1
Rabbit, 1 Rabbit, 1 Turtle, 1
Combine step (optional)
Workers
map
reduce
reduce
Apply the combiner function to map output – Usually reduces the output size
Cat, 2 Dog, 1
Rabbit, 2 Turtle, 1
Commit step
Workers
map
reduce
reduce
Cat, 2 Dog, 1
Rabbit, 2 Turtle, 1
Final output stored on the local file system; register the file location with the TaskTracker
Local FS
Master
Workers
reduce
reduce
Local FS
Map finished; map output location
Workers
reduce
reduce
Local FS
Shuffle step
HTTP get
Reduce tasks pull data from map output locations
Workers
reduce
reduce
Group step (required): when all sorted runs have been received, merge-sort the runs (optionally applying the combiner)
Cat, 5,1,3,4,… Dog, 1,4,2,5,…
Rabbit, 2,5,1,7,… Turtle, 4,2,3,3,…
Map 1, Map 2, . . ., Map k
Map 1, Map 2, . . ., Map k
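The required group step amounts to a k-way merge of the sorted runs received from each map, after which equal keys are adjacent; a sketch using Python's `heapq.merge` (toy data, invented values):

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_runs(sorted_runs):
    """Merge-sort sorted runs of (key, value) pairs, grouping values by key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    # After the merge, equal keys are adjacent, so a single pass
    # builds each <key, list of values> group for the reduce step.
    return [(k, [v for _, v in kvs])
            for k, kvs in groupby(merged, key=itemgetter(0))]

run1 = [("Cat", 5), ("Dog", 1)]   # sorted run from Map 1
run2 = [("Cat", 1), ("Dog", 4)]   # sorted run from Map 2
grouped = merge_runs([run1, run2])
# grouped == [("Cat", [5, 1]), ("Dog", [1, 4])]
```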
Workers
Reduce step: call the reduce function on each <key, list of values> pair; write final output to HDFS
reduce
reduce
HDFS
Cat, 25 Dog, 14
Rabbit, 23 Turtle, 16
Cat, 5,1,3,4,… Dog, 1,4,2,5,…
Rabbit, 2,5,1,7,… Turtle, 4,2,3,3,…
Outline
1. Hadoop MR Background 2. Hadoop Online Prototype (HOP) – Implementation
– Online Aggregation – Stream Processing (see paper)
3. Performance (blocking vs. pipelining)
4. Future Work
Hadoop Online Prototype (HOP)
• Pipelining between operators – Data pushed from producers to consumers – Data transfer scheduled concurrently with operator computation
• HOP API – No changes required to existing clients
• Pig, Hive, Jaql still work + Configuration for pipeline/block modes + JobTracker accepts a series of jobs
Master
Workers
map
reduce
reduce
Schedule; Schedule + map location (ASAP)
pipeline request
Pipelining Data Unit
• Initial design: pipeline eagerly (each record) – Prevents the map-side group and combine steps
– Map computation can block on network I/O
• Revised design: pipeline small sorted runs (spills) – Task thread: apply the (map/reduce) function, buffer output
– Spill thread: sort & combine buffer, spill to a file
– TaskTracker: service consumer requests
Simple Adaptive Policy
• Halt pipelining when … 1. Unserviced spill files back up, OR 2. The combiner is effective
• Resume pipelining by first … – merging & combining accumulated spill files into a single file
Map tasks adaptively take on more work
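The halt decision above can be sketched as a small predicate (the thresholds below are invented for illustration; the actual policy is driven by runtime dynamics, not fixed constants):

```python
def should_halt_pipeline(unserviced_spills, combiner_ratio,
                         max_spills=3, effective_ratio=0.5):
    # Halt when spill files back up faster than reducers fetch them,
    # or when the combiner shrinks output enough (low output/input
    # ratio) that merging + combining locally beats eager pipelining.
    return unserviced_spills > max_spills or combiner_ratio < effective_ratio

# Backed-up spills => halt; resume only after merging them into one file.
assert should_halt_pipeline(unserviced_spills=5, combiner_ratio=0.9)
assert not should_halt_pipeline(unserviced_spills=1, combiner_ratio=0.9)
```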
Workers
map
reduce
reduce
Pipelined shuffle step: each map task can send multiple sorted runs
Workers
map
reduce
reduce
Pipelined shuffle step
Merge and combine
Merge and combine
Each map task can send multiple sorted runs. Reducers perform early group + combine during the shuffle ➔ Also done in blocking mode, but more so when pipelining
Pipelined Fault Tolerance (PFT)
• Simple PFT design: – Reduce treats in-progress map output as tentative – If a map dies, throw away its output
– If a map succeeds, accept its output
• Revised PFT design: – Spill files have deterministic boundaries and are assigned a sequence number
– Correctness: reduce tasks ensure spill files are processed idempotently
– Optimization: map tasks avoid sending redundant spill files
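The revised design can be sketched as the reduce side tracking (map id, spill sequence number) pairs, so a spill re-sent by a restarted map is applied at most once (a simplified model for illustration, not HOP's actual code):

```python
class ReduceSide:
    """Accept each (map_id, spill_seq) at most once: spills are idempotent."""

    def __init__(self):
        self.seen = set()
        self.accepted = []

    def receive_spill(self, map_id, seq, records):
        if (map_id, seq) in self.seen:
            return False          # duplicate from a restarted map: drop it
        self.seen.add((map_id, seq))
        self.accepted.append(records)
        return True

r = ReduceSide()
assert r.receive_spill("map1", 0, [("Cat", 1)])
assert not r.receive_spill("map1", 0, [("Cat", 1)])  # restart re-sends spill 0
assert r.receive_spill("map1", 1, [("Dog", 1)])
```

Because spill boundaries are deterministic, a restarted map regenerates byte-identical spills, so dropping duplicates by sequence number is safe.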
HDFS
Write Snapshot Answer
HDFS
Block 1
Block 2
Read Input File map
map
reduce
reduce
• Execute reduce task on intermediate data – Intermediate results published to HDFS
Online Aggregation
Example Approximation Query • The data: – Wikipedia traffic statistics (1TB) – Webpage clicks/hour – 5066 compressed files (each file = 1 hour of click logs)
• The query: – group by language and hour – count clicks
• The approximation: – Final answer ≈ intermediate click count * scale-up factor 1. Job progress: 1.0 / fraction of input received by reducers 2. Sample fraction: total # of hours / (# hours sampled and fraction of hour)
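The scale-up arithmetic can be made concrete (the click counts below are invented for illustration; only the formulas come from the slide):

```python
def job_progress_estimate(intermediate_count, fraction_of_input_seen):
    # Scale up by 1.0 / fraction of input the reducers have received.
    return intermediate_count * (1.0 / fraction_of_input_seen)

def sample_fraction_estimate(intermediate_count, total_hours, hours_sampled):
    # Scale up by total # of hours / # of hours sampled.
    return intermediate_count * (total_hours / hours_sampled)

# Hypothetical: 1,000 clicks counted after reducers saw 2% of the input.
assert job_progress_estimate(1_000, 0.02) == 50_000
# Hypothetical: 1,000 clicks from 2 of the 5,066 hourly files.
assert sample_fraction_estimate(1_000, 5_066, 2) == 2_533_000
```

The two estimators differ in what they assume: job progress treats all input as uniformly sampled, while sample fraction scales by how many hours of logs have actually been observed.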
• Bar graph shows results for a single hour (1600) – Taken less than 2 minutes into a ~2 hour job!
[Bar chart: estimated clicks per language for hour 1600, comparing the final answer with the sample-fraction and job-progress estimates]
• Approximation error: |estimate – actual| / actual – Job progress assumes hours are uniformly sampled – Sample fraction ≈ sample distribution of each hour
[Line chart: approximation error vs. time for the job-progress and sample-fraction estimators]
Outline
1. Hadoop MR Background 2. Hadoop Online Prototype (HOP) 3. Performance (blocking vs. pipelining) – Does block size matter?
4. Future Work
Large vs. Small Block Size
• Map input is a single block (Hadoop default) – Increasing the block size => fewer maps with longer runtimes
• Wordcount on 100GB randomly generated words – 20 extra‐large EC2 nodes: 4 cores, 15GB RAM
• Slot capacity: 80 maps (4 per node), 60 reduces (3 per node)
– Two jobs: large vs. small block size • Job 1 (large): 512MB (240 maps/blocks) • Job 2 (small): 32MB (3120 maps/blocks)
– Both jobs hard coded to use 60 reduce tasks
• Poor CPU and I/O overlap – Especially in blocking mode
• Pipelining + the adaptive policy is less sensitive to block size – BUT incurs extra sorting between the shuffle and reduce steps
[Charts: map and reduce progress (%) vs. time (minutes), 100GB blocking vs. pipelining with large blocks]
Large Block Size – job completion time; reduce idle period; reduce idle period on the final merge-sort
4 minutes < 1 minute
Reduce step (75%‐100%)
Shuffle step (0%‐75%)
• Improves CPU and I/O overlap – BUT idle periods still exist in the blocking-mode shuffle step – AND increases scheduler overhead (3120 maps) – AND increases HDFS (NameNode) memory pressure
• The adaptive policy finds the right degree of pipelined parallelism – Based on runtime dynamics (reducer load, network capacity, etc.)
[Charts: map and reduce progress (%) vs. time (minutes), 100GB blocking vs. pipelining with small blocks]
Small Block Size – job completion time
<< 1 minute < 1 minute
Reduce idle period
Future Work
1. Blocking vs. Pipelining – Comprehensive performance study at scale – Hadoop opMmizer
2. Online Aggregation – Random sampling of the input – Better UI for approximate results
3. Stream Processing – Better interface for window management – Support for high-level query languages
Thank you!
More information: http://boom.cs.berkeley.edu
HOP code: http://code.google.com/p/hop/
• Simple wordcount on two (small) EC2 nodes 1. Map machine: 2 map slots 2. Reduce machine: 2 reduce slots
• Input 2GB data, 512MB block size – So job contains 4 maps and (a hard‐coded) 2 reduces
Blocking vs. Pipelining 2 maps sort and send output to
reducer
3rd map finishes, sorts, and sends
to reduce
Final map finishes, sorts and sends to reduce
• Simple wordcount on two (small) EC2 nodes 1. Map machine: 2 map slots 2. Reduce machine: 2 reduce slots
• Input 2GB data, 512MB block size – So job contains 4 maps and (a hard‐coded) 2 reduces
Blocking vs. Pipelining
Reduce task 6.5 minute idle period
Reduce task performing final
merge‐sort
Job completion when reduce finishes
1st map output received
2nd map output received
3rd map output received
4th map output received
No significant idle periods during the
shuffle phase
• Operators block – Poor CPU and I/O overlap – Reduce task idle periods
• Only the final answer is fetched – So more data is fetched, resulting in…
– Network traffic spikes – Especially when a group of maps finish
Recall in blocking mode …
CPU Utilization
Mapper CPU
Reducer CPU
Amazon Cloudwatch
Map tasks loading 2GB
of data
Reduce task 6.5 minute idle period
Pipelining: reduce tasks start working (presorting) early
Map step more I/O bound
13 min. 7 min.
• Operators block – Poor CPU and I/O overlap – Reduce task idle periods
• Only the final answer is fetched – So more data is fetched at once, resulting in…
– Network traffic spikes – Especially when a group of maps finish
Recall in blocking mode …
Network Traffic (map machine network out)
Amazon Cloudwatch
First 2 maps finish and send output
3rd map finishes and sends output
Last map finishes and sends output
Steady network traffic
Benefits of Pipelining
• Online aggregation – An early view of the result from a running computation – Interactive data analysis (you say when to stop)
• Stream processing – Tasks operate on infinite data streams – Real-time data analysis
• Performance? Pipelining can … – Improve CPU and I/O overlap – Keep network traffic steady (fewer load spikes) – Improve cluster utilization (reducers do more work)
Stream Processing
• Map and reduce tasks run continuously – Scheduler: wait for required slot capacity
• Map tasks stream spill files – Input taken from arbitrary source
• MapReduce job, TCP socket, log files, etc.
– Garbage collection handled by the system
• Window management done at reducer – Reduce output is an infinite series of windowed results – Window boundary based on Mme, record counts, etc.
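Reducer-side window management can be sketched as grouping an unbounded record stream into bounded windows; the boundary here is based on record count, though a time-based boundary works the same way (a toy model for illustration, not HOP's API):

```python
def count_windows(stream, window_size, reduce_fn):
    """Yield one reduced result per window of `window_size` records."""
    window = []
    for record in stream:
        window.append(record)
        if len(window) == window_size:   # window boundary: record count
            yield reduce_fn(window)      # emit one windowed result
            window = []                  # consumed input can be discarded

# Toy stream of click counts; each window's reduce emits a sum, so the
# reduce output is a (potentially infinite) series of windowed results.
results = list(count_windows([1, 2, 3, 4, 5, 6], window_size=3, reduce_fn=sum))
# results == [6, 15]
```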
Real-time Monitoring System
• Use MapReduce to monitor MapReduce – Economy of mechanism
• Agents monitor machines – Implemented as a continuous map task – Record statistics of interest (/proc, log files, etc.)
• Aggregators group agent-local statistics – Implemented as reduce tasks – Aggregate statistics along machine, rack, datacenter
– Reduce windows: 1, 5, and 15 second load averages
• Monitor /proc/vmstat for swapping – Alert triggered after some threshold
• Alert reported around a second after passing the threshold – Faster than the (~5 second) TaskTracker reporting interval ? Feedback loop to the JobTracker for better scheduling
[Chart: pages swapped vs. time (seconds), showing the outlier detection alert]
Workers
map
reduce
reduce
Pipelined shuffle step: each map task can send multiple sorted runs. Reducers perform early group + combine during the shuffle ➔ Also done in blocking mode, but more so when pipelining
Hadoop Architecture
• Hadoop MapReduce – Single master node (JobTracker), many worker nodes (TaskTrackers)
– Client submits a job to the JobTracker – JobTracker splits each job into tasks (map/reduce) – Assigns tasks to TaskTrackers on demand
• Hadoop Distributed File System (HDFS) – Single name node, many data nodes – Data is stored as fixed‐size (e.g., 64MB) blocks – HDFS typically holds map input and reduce output
Performance
• Why block? – Effective combiner
– Reduce step is a bottleneck
• Why pipeline? – Improve cluster utilization
– Smooth out network traffic