Post on 21-Dec-2015
Minimal MapReduce Algorithms
Yufei Tao
Chinese University of Hong Kong, Hong Kong
outline
• INTRODUCTION
• PRELIMINARY AND RELATED WORK
• SORTING
• BASIC MINIMAL ALGORITHMS IN DATABASES
• SLIDING AGGREGATION
• EXPERIMENTS
• CONCLUSIONS
introduction
• Motivation
Although these principles have guided the design of MapReduce algorithms, the previous practices have mostly been on a best-effort basis, paying relatively less attention to enforcing serious constraints on different performance metrics.
introduction
• Minimal MapReduce Algorithms
Minimum footprint
Bounded net-traffic
Constant round
Optimal computation
introduction
• Contributions
The core of this work comprises neat minimal algorithms for two problems:
Sorting
Sliding Aggregation
introduction
Sorting
Sliding Aggregation
related work
MapReduce
TeraSort
Algorithms on MapReduce
Relevance to Minimal Algorithms
related work-MR
Statelessness for Fault Tolerance
Some MapReduce implementations (e.g., Hadoop) place the requirement that, at the end of a round, each machine should send all the data in its storage to a distributed file system.
related work-TS
What's TeraSort?
sorting-TS
sorting
Define S_i = S ∩ (b_{i−1}, b_i], for 1 ≤ i ≤ t. In Round 2, all the objects in S_i are gathered by M_i, which sorts them in the reduce phase. For TeraSort to be minimal, it must hold:
P1. s = O(m).
P2. |S_i| = O(m) for all 1 ≤ i ≤ t.
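The two rounds and the conditions P1 and P2 can be tried out with a small single-process simulation. This is only a sketch under stated assumptions, not the author's implementation: the function name terasort_sim, the even initial placement, the sampling rate ρ = ln(n·t)/m, and the rule of taking every ⌈s/t⌉-th sample as a boundary are not given on this slide and are assumed here, with m = n/t and s the number of samples.

```python
import bisect
import math
import random


def terasort_sim(objects, t, seed=0):
    """Simulate the two TeraSort rounds on t hypothetical machines.

    Returns (parts, s): parts[i] is the sorted S_{i+1}, s is the sample count.
    """
    rng = random.Random(seed)
    n = len(objects)
    m = n / t
    # Assumed sampling rate; the slide does not state it.
    rho = min(1.0, math.log(n * t) / m)

    # Assumed initial placement: objects spread evenly over the machines.
    machines = [objects[i::t] for i in range(t)]

    # Round 1: every machine samples its objects; one machine collects
    # and sorts the samples, then picks up to t-1 boundary objects.
    samples = sorted(x for part in machines for x in part if rng.random() < rho)
    s = len(samples)
    step = max(1, math.ceil(s / t))
    boundaries = [samples[i * step - 1] for i in range(1, t) if i * step - 1 < s]

    # Round 2: x goes to the machine whose interval (b_{i-1}, b_i] contains it.
    parts = [[] for _ in range(t)]
    for x in objects:
        parts[bisect.bisect_left(boundaries, x)].append(x)
    return [sorted(p) for p in parts], s
```

Concatenating the returned parts in machine order gives the sorted input. Comparing s and the largest len(parts[i]) against m gives a quick empirical feel for P1 and P2 on a given input.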
sortingPr
Discussion
Minimality
sorting
Removing the Broadcast Assumption
(by changing round 1)
in databases
Ranking & Skyline
Group by
Semi-Join
in databases
Group by
example
sliding aggregation
The window sum of o equals:
win-sum(o) = Σ_{o' ∈ window(o)} w(o')
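As a point of reference, the window sum can be computed sequentially with one sort and a prefix-sum array. The sketch below is not the MapReduce algorithm of the talk. It assumes that window(o) is the set of the l largest objects not exceeding o (including o itself) and that all objects are distinct. Neither assumption is stated on this slide, and the function name window_sums is hypothetical.

```python
from itertools import accumulate


def window_sums(objects, weights, l):
    """Return {o: win-sum(o)} for distinct, comparable objects.

    Assumes window(o) = the l largest objects not exceeding o.
    """
    # Positions of the objects in ascending order.
    order = sorted(range(len(objects)), key=lambda i: objects[i])
    # prefix[r] = total weight of the r smallest objects.
    prefix = [0] + list(accumulate(weights[i] for i in order))
    result = {}
    for rank, i in enumerate(order, start=1):
        lo = max(0, rank - l)  # the window covers ranks lo+1 .. rank
        result[objects[i]] = prefix[rank] - prefix[lo]
    return result


sums = window_sums([5, 1, 3, 2, 4], [50, 10, 30, 20, 40], l=2)
print(sums[3])  # 50: weights of objects 2 and 3
```

The prefix-sum form makes the dependence on sorted order explicit: each window sum needs only the rank of o and two prefix values.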
sliding aggregation
Sorting with Perfect Balance
sliding aggregation
Sliding Aggregate Computation
experiments-sorting
experiments-sorting
experiments-skyline
The main contribution of this paper is to fill a gap in the concept of minimal MapReduce algorithms.
thx @hh's