Reining in the Outliers in MapReduce Jobs using Mantri
![Page 1: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/1.jpg)
Reining in the Outliers in MapReduce Jobs using Mantri
Ganesh Ananthanarayanan†, Srikanth Kandula*, Albert Greenberg*, Ion Stoica†, Yi Lu*, Bikas Saha*, Ed Harris*
† UC Berkeley, * Microsoft
![Page 2: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/2.jpg)
MapReduce Jobs
Basis of analytics in modern Internet services
◦E.g., Dryad, Hadoop
Job → {Phase} → {Task} (a job is a set of phases, each a set of tasks)
Graph flow consists of pipelines as well as strict blocks
![Page 3: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/3.jpg)
Example Dryad Job Graph
[Figure: job graph with phases EXTRACT, AGGREGATE_PARTITION, FULL_AGGREGATE, PROCESS and COMBINE, reading from and writing to the distributed file system. Annotations: "Phase", "Pipeline", "Blocked until input is done". Node labels: Map.1, Reduce.1, Map.2, Reduce.2, Join.]
![Page 4: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/4.jpg)
Log Analysis from Production
Logs from production cluster with thousands of machines, sampled over six months
10,000+ jobs, 80PB of data, 4PB network transfers
◦Task-level details
◦Production and experimental jobs
![Page 5: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/5.jpg)
Outliers hurt!
Outliers: tasks that run longer than the rest in the phase
Median phase has 10% outliers, running for >10x longer
Slow down jobs by 35% at median
Operational inefficiency
◦Unpredictability in completion times affects SLAs
◦Hurts development productivity
◦Wastes compute cycles
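The definition of an outlier above can be made concrete with a small sketch. It flags the tasks of one phase whose runtime exceeds a multiple of the phase median. The function name `find_outliers` and the cutoff `factor=1.5` are illustrative assumptions, not values taken from the slides.

```python
from statistics import median


def find_outliers(runtimes, factor=1.5):
    """Return indices of tasks that run longer than factor x the phase median.

    `factor` is a hypothetical cutoff chosen for illustration only.
    """
    if not runtimes:
        return []
    cutoff = factor * median(runtimes)
    return [i for i, t in enumerate(runtimes) if t > cutoff]


# Task runtimes (seconds) for a single phase; task 5 is the straggler.
phase_runtimes = [10, 11, 9, 12, 10, 95, 10, 11, 10, 9]
outlier_ids = find_outliers(phase_runtimes)
```

Because a phase often blocks the next one, a single flagged task like this can set the completion time of the whole phase.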
![Page 6: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/6.jpg)
Why do outliers occur?
Mantri: A system that mitigates outliers based on root-cause analysis
[Figure: task stages "Read Input" and "Execute", annotated with outlier causes: Input Unavailable, Network Congestion, Local Contention, Workload Imbalance]
![Page 7: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/7.jpg)
Mantri’s Outlier Mitigation
Avoid Recomputation
Network-aware Task Placement
Duplicate Outliers
Cognizant of Workload Imbalance
![Page 8: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/8.jpg)
Recomputes: Illustration
[Figure: two panels comparing ideal and actual completion, with the gap marked as inflation: (a) Barrier phases, (b) Cascading recomputes. Legend: recompute task, normal task.]
![Page 9: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/9.jpg)
What causes recomputes? [1]
Faulty machines
◦Bad disks, non-persistent hardware quirks
(4%)
Set of faulty machines varies with time, not constant
![Page 10: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/10.jpg)
What causes recomputes? [2]
Transient machine load
◦Recomputes correlate with machine load
◦Requests for data access dropped
![Page 11: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/11.jpg)
Replicate costly outputs
[Figure: Task 1, Task 2 and Task 3, with recompute probabilities MR2 and MR3]
MR: recompute probability of a machine
Recompute only Task 3, or both Task 3 and Task 2:
$T_{Recomp} = MR_3 (1 - MR_2)\, T_3 + MR_3\, MR_2\, (T_3 + T_2)$
Replicate (cost $T_{Rep}$) when $T_{Rep} < T_{Recomp}$
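The replication rule on this slide can be written out as a short sketch. It computes the expected recompute time from the two cases on the slide (only Task 3 is lost, or both Task 3 and Task 2 are lost) and compares it with the replication cost. The function names and the sample numbers are assumptions made for illustration.

```python
def expected_recompute_time(mr2, mr3, t2, t3):
    """Expected time to recompute Task 3, following the slide's two cases.

    mr2, mr3: recompute probabilities of the machines for Task 2 and Task 3.
    t2, t3:   runtimes of Task 2 and Task 3.
    """
    only_task3 = mr3 * (1 - mr2) * t3   # Task 3 lost, Task 2 output intact
    both_tasks = mr3 * mr2 * (t3 + t2)  # Task 3 lost and Task 2 lost as well
    return only_task3 + both_tasks


def should_replicate(t_rep, mr2, mr3, t2, t3):
    """Replicate the output only when that is cheaper than the expected recompute."""
    return t_rep < expected_recompute_time(mr2, mr3, t2, t3)


# Illustrative numbers only: a cheap replica is worth it, an expensive one is not.
cheap = should_replicate(t_rep=5, mr2=0.1, mr3=0.2, t2=100, t3=50)
costly = should_replicate(t_rep=20, mr2=0.1, mr3=0.2, t2=100, t3=50)
```

With these numbers the expected recompute time is about 12 time units, so a replica costing 5 is made and one costing 20 is skipped.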
![Page 12: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/12.jpg)
Transient Failure Causes
Recomputes manifest in clutches
A machine is prone to cause recomputes until the problem is fixed
◦Load abates, critical process restarts, etc.
Clue: At least r recomputes within t time window on a machine
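The "at least r recomputes within a time window t" clue can be tracked with a per-machine sliding window. The sketch below is one possible way to do it. The class name `RecomputeMonitor`, the use of seconds, and the assumption that events for a machine arrive in time order are all illustrative choices, not details from the slides.

```python
from collections import defaultdict, deque


class RecomputeMonitor:
    """Flags machines that caused at least r recomputes within the last t seconds."""

    def __init__(self, r, t):
        self.r = r
        self.t = t
        self._events = defaultdict(deque)  # machine -> recent recompute timestamps

    def record(self, machine, timestamp):
        """Record one recompute and report whether the machine now looks suspect.

        Assumes timestamps for a given machine are non-decreasing.
        """
        window = self._events[machine]
        window.append(timestamp)
        # Drop events that fell out of the time window.
        while window and timestamp - window[0] > self.t:
            window.popleft()
        return len(window) >= self.r


monitor = RecomputeMonitor(r=3, t=60)
flags = [monitor.record("m1", ts) for ts in (0, 10, 100, 120, 150)]
```

A machine flagged this way is a candidate for the speculative recomputes described on the next slide.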
![Page 13: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/13.jpg)
Speculative Recomputes
Anticipatorily recompute tasks whose outputs are unread
[Figure labels: Speculative Recompute, (Read Fail), Unread Data, Task, Input Data]
![Page 14: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/14.jpg)
Mantri’s Outlier Mitigation
Avoid Recomputation
◦Preferential Replication + Speculative Recomp.
Network-aware Task Placement
Duplicate Outliers
Cognizant of Workload Imbalance
![Page 15: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/15.jpg)
Reduce Tasks
Tasks access output of tasks from previous phases
Reduce phase (74% of total traffic)
[Figure labels: Reduce, Map, Network, Local, Outlier!, Distr. File System]
![Page 16: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/16.jpg)
Variable Congestion
[Figure labels: Reduce task, Map output, Rack]
Smart placement smooths out hotspots
![Page 17: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/17.jpg)
Traffic-based Allotment
For every rack:
◦d : data
◦u : available uplink bandwidth
◦v : available downlink bandwidth
Goal: Minimize phase completion time
Solve for task allocation fractions, $a_i$
![Page 18: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/18.jpg)
Local Control is a good approx.
Let rack i have $a_i$ fraction of tasks
◦Time uploading, $T_u = d_i (1 - a_i) / u_i$
◦Time downloading, $T_d = (D - d_i)\, a_i / v_i$
$Time_i = \max\{T_u, T_d\}$
Goal: Minimize phase completion time
For every rack:
◦d : data, D : data over all racks
◦u : available uplink bandwidth
◦v : available downlink bandwidth
Link utilizations average out in the long term and are steady in the short term
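The per-rack model above can be turned into a small sketch. For a single rack, max{T_u, T_d} is smallest when the upload and download times are equal, which gives a_i = d_i v_i / (d_i v_i + (D - d_i) u_i). The code computes that value for each rack independently and then rescales the fractions to sum to 1. The rescaling step and the function names are assumptions made for illustration and are not stated on the slides.

```python
def rack_time(d, total, u, v, a):
    """Time_i = max(T_u, T_d) for a rack holding d of `total` data with task fraction a."""
    t_up = d * (1 - a) / u          # data this rack must send to tasks elsewhere
    t_down = (total - d) * a / v    # data this rack's tasks must fetch from elsewhere
    return max(t_up, t_down)


def rack_fractions(data, uplink, downlink):
    """Per-rack task fractions that balance upload and download time on each rack.

    The final normalization so that fractions sum to 1 is an assumption.
    """
    total = sum(data)
    raw = []
    for d, u, v in zip(data, uplink, downlink):
        denom = d * v + (total - d) * u
        raw.append(d * v / denom if denom > 0 else 0.0)
    scale = sum(raw)
    if scale == 0:
        return [1.0 / len(data)] * len(data) if data else []
    return [a / scale for a in raw]


# Two racks with equal bandwidth: tasks follow the data (75% / 25%).
fractions = rack_fractions(data=[300, 100], uplink=[10, 10], downlink=[10, 10])
```

With equal bandwidths the fractions simply track where the data sits. A rack with a congested downlink (small v) receives a smaller share of the tasks.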
![Page 19: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/19.jpg)
Mantri’s Outlier Mitigation
Avoid Recomputation
◦Preferential Replication + Speculative Recomp.
Network-aware Task Placement◦Traffic on link proportional to bandwidth
Duplicate Outliers
Cognizant of Workload Imbalance
![Page 20: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/20.jpg)
Contentions cause outliers
Tasks contend for local resources
◦Processor, memory etc.
Duplicate tasks elsewhere in the cluster
◦Current schemes duplicate towards end of the phase (e.g., LATE [OSDI 2008])
Duplicate outlier or schedule pending task?
![Page 21: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/21.jpg)
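Resource-Aware Restart
[Figure: timeline marked "now", showing the running task with remaining time $t_{rem}$ and a potential restart taking $t_{new}$]
Save time and resources: restart when $P(c \cdot t_{rem} > (c + 1) \cdot t_{new})$ is high enough, where c is the number of copies currently running (c copies each needing $t_{rem}$ cost more than c + 1 copies each needing $t_{new}$)
Continuously observe and kill wasteful copies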
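The restart test can be sketched as a resource comparison: c running copies that each need t_rem more time are weighed against c + 1 copies that each need t_new. The sketch below reads the slide's condition this way and estimates the probability from sampled runtime estimates. That reading, the threshold `delta`, and the function names are assumptions made for illustration.

```python
def restart_probability(t_rem_samples, t_new_samples, copies):
    """Estimate P(c * t_rem > (c + 1) * t_new) over all pairs of samples."""
    pairs = [(rem, new) for rem in t_rem_samples for new in t_new_samples]
    if not pairs:
        return 0.0
    wins = sum(1 for rem, new in pairs if copies * rem > (copies + 1) * new)
    return wins / len(pairs)


def should_restart(t_rem_samples, t_new_samples, copies=1, delta=0.25):
    """Start another copy only if it is likely enough to save resources.

    `delta` is a hypothetical threshold chosen for illustration.
    """
    return restart_probability(t_rem_samples, t_new_samples, copies) > delta


# One copy running: a restart pays off only if t_new is under half of t_rem.
likely = should_restart([100, 120], [40, 70], copies=1)
unlikely = should_restart([100, 120], [90], copies=1)
```

The more copies are already running, the closer t_new may be to t_rem before another copy stops being worthwhile, since the factor (c + 1) / c approaches 1.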
![Page 22: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/22.jpg)
Mantri’s Outlier Mitigation
Avoid Recomputation
◦Preferential Replication + Speculative Recomp.
Network-aware Task Placement◦Traffic on link proportional to bandwidth
Duplicate Outliers◦Resource-Aware Restart
Cognizant of Workload Imbalance
![Page 23: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/23.jpg)
Workload Imbalance
A quarter of the outlier tasks have more data to process
◦Unequal key partitions for reduce tasks
Ignoring these is better than duplicating them
Schedule tasks in descending order of data to process
◦Time ∝ (Data to Process)
◦[Graham ‘69] At worst, within 33% of optimal
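Scheduling in descending order of data size is the longest-processing-time-first heuristic analysed by Graham. A minimal sketch follows, assuming runtime is proportional to input size as the slide states. The function name and the notion of a fixed number of identical slots are illustrative assumptions.

```python
import heapq


def schedule_largest_first(task_sizes, num_slots):
    """Assign tasks to slots in descending order of size, always to the least-loaded slot.

    Returns (assignment, makespan), where assignment maps slot id -> list of task ids.
    Task runtime is taken to be proportional to its input size.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    slots = [(0, slot_id) for slot_id in range(num_slots)]  # (load, slot id)
    heapq.heapify(slots)
    assignment = {slot_id: [] for slot_id in range(num_slots)}
    by_size = sorted(enumerate(task_sizes), key=lambda pair: pair[1], reverse=True)
    for task_id, size in by_size:
        load, slot_id = heapq.heappop(slots)  # least-loaded slot so far
        assignment[slot_id].append(task_id)
        heapq.heappush(slots, (load + size, slot_id))
    makespan = max(load for load, _ in slots)
    return assignment, makespan


assignment, makespan = schedule_largest_first([7, 5, 4, 3, 3, 2], num_slots=2)
```

Starting the largest tasks first means the long ones overlap with many short ones instead of being left to run alone at the end of the phase.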
![Page 24: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/24.jpg)
Mantri’s Outlier Mitigation
Avoid Recomputation
◦Preferential Replication + Speculative Recomp.
Network-aware Task Placement◦Traffic on link proportional to bandwidth
Duplicate Outliers◦Resource-Aware Restart
Cognizant of Workload Imbalance◦Schedule in descending order of size
Proactive
Reactive
Predict to act early
Be resource-aware
Act based on the cause
![Page 25: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/25.jpg)
Results
Deployed in production Bing clusters
Trace-driven simulations
◦Mimic workflow, failures, data skew
◦Compare with existing and idealized schemes
![Page 26: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/26.jpg)
Jobs in the Wild
Act Early: Duplicates issued when task 42% done (77% for Dryad)
Light: Issues fewer copies (0.47x as many as Dryad)
Accurate: 2.8x higher success rate of copies
Jobs faster by 32% at median, consuming fewer resources
![Page 27: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/27.jpg)
Recomputation Avoidance
Eliminates most recomputes with minimal extra resources
(Replication + Speculation) work well in tandem
![Page 28: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/28.jpg)
Network-Aware Placement
Mantri approximates the ideal well
Bandwidth approximations
![Page 29: Reining in the Outliers in MapReduce Jobs using Mantri](https://reader036.fdocuments.us/reader036/viewer/2022062722/568139f9550346895da1beb4/html5/thumbnails/29.jpg)
Summary
From measurements in a production cluster,
◦Outliers are a significant problem
◦Are due to an interplay between storage, network and map-reduce
Mantri, a cause- and resource-aware mitigation
Deployment shows encouraging results
“Reining in the Outliers in MapReduce Clusters using Mantri”, USENIX OSDI 2010