Efficient processing of Rank-aware queries in Map/Reduce
![Page 1: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/1.jpg)
EFFICIENT PROCESSING OF RANK-AWARE
QUERIES IN MAP/REDUCE
OIKONOMAKIS SPYRIDON
SOFTWARE ENGINEER AT PEOPLEPERHOUR
![Page 2: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/2.jpg)
Need for a new model
Exponential data growth
Need to analyze, exploit, and scale to ever larger volumes of data
Need for parallel processing
Need to reduce data read and retrieval time
Need for convenience for the programmer
Cost
![Page 3: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/3.jpg)
What is Map/Reduce?
A programming model and runtime environment for
distributed data processing, running in parallel on
large clusters of machines
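To make the model concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce). It is not Hadoop and runs nothing in parallel; `run_mapreduce`, `word_map`, and `word_reduce` are hypothetical names chosen for illustration, and word count is used only because it is the smallest complete example.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def run_mapreduce(records: Iterable, map_fn: Callable, reduce_fn: Callable) -> dict:
    """Single-process simulation of the map -> shuffle -> reduce data flow."""
    groups = defaultdict(list)
    # Map phase: each input record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Shuffle phase: collect all values that share the same key.
            groups[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_map(line: str) -> Iterator[tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


def word_reduce(word: str, counts: list[int]) -> int:
    return sum(counts)


if __name__ == "__main__":
    lines = ["map reduce map", "Reduce join"]
    print(run_mapreduce(lines, word_map, word_reduce))
    # {'map': 2, 'reduce': 2, 'join': 1}
```

On a real cluster the map calls run on different machines and the shuffle moves data over the network; the sketch only shows how the data flows between the phases.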
![Page 4: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/4.jpg)
Is the Map/Reduce model reliable?
![Page 5: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/5.jpg)
Map/Reduce
![Page 6: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/6.jpg)
Weaknesses in Top-K Join Queries
What is the Top-K Join?
Weaknesses
All the data must be read to retrieve only K results
Uneven distribution of the workload across Reducers
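A Top-K join joins two tables on a join attribute and returns only the K joined pairs with the highest combined score. The sketch below makes assumptions the slides do not state: records are `(record_id, join_key, score)` tuples and the combined score is the sum of the two scores. It shows the naive strategy and why it suffers from the first weakness: every record of both tables is read, including records that can never reach the top K. `naive_topk_join` is a hypothetical name.

```python
import heapq
from collections import defaultdict


def naive_topk_join(left, right, k):
    """Naive Top-K join: produce every joined pair, then keep the K best.

    left/right: lists of (record_id, join_key, score).
    Combined score = left score + right score (assumed scoring function).
    Returns up to k tuples (combined_score, left_id, right_id), best first.
    """
    # Build phase: index ALL right-hand records by join key.
    by_key = defaultdict(list)
    for rid, key, score in right:
        by_key[key].append((rid, score))
    # Probe phase: ALL left-hand records are read and joined as well.
    results = (
        (lscore + rscore, lid, rid)
        for lid, key, lscore in left
        for rid, rscore in by_key.get(key, [])
    )
    return heapq.nlargest(k, results)


if __name__ == "__main__":
    left = [("a1", "x", 9), ("a2", "y", 5), ("a3", "x", 1)]
    right = [("b1", "x", 8), ("b2", "y", 7), ("b3", "z", 10)]
    print(naive_topk_join(left, right, 2))
    # [(17, 'a1', 'b1'), (12, 'a2', 'b2')]
```

In the example, `b3` has no join partner but is still read; in Map/Reduce the equivalent cost is that the whole input of both tables is scanned even when K is small.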
![Page 7: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/7.jpg)
Goals of the experiment
Efficient implementation of Top-K Join queries in
the Map/Reduce model
Addressing these weaknesses of Map/Reduce with:
Early Termination
Load Balancing
![Page 8: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/8.jpg)
Design
Comparison of three algorithms (1 baseline and 2 new):
Naive
EarlyTermination (using bounds)
EarlyTermination & LoadBalancing (using bounds and Longest Processing Time)
Pre-processing:
Generation of two data tables with join attributes
Statistics on the data in the form of histograms
Processing:
Calculation of histogram bounds for each table
Execution of Map/Reduce
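The slides do not spell out how the histogram bounds are derived, so the following is only a plausible sketch under explicit assumptions: scores are non-negative, histograms are equi-width over `[0, max_value]`, each table is read in descending score order, and the combined score is the sum of the two scores. Given how many records have been read from a table, its histogram alone yields an upper bound on the score of the next unread record; two such bounds then bound any join result that still involves an unread record (the threshold used by rank-join algorithms such as HRJN). All function names are hypothetical.

```python
def equi_width_histogram(scores, num_bins, max_value):
    """Count scores per equi-width bin over [0, max_value]; max_value lands in the last bin."""
    width = max_value / num_bins
    counts = [0] * num_bins
    for s in scores:
        counts[min(int(s // width), num_bins - 1)] += 1
    return counts


def bin_upper_edges(num_bins, max_value):
    """Upper score bound of every bin."""
    width = max_value / num_bins
    return [(i + 1) * width for i in range(num_bins)]


def score_bound_after(counts, upper_edges, records_read):
    """Upper bound on the score of the next unread record of a table read in
    descending score order, computed from the histogram only."""
    remaining = records_read
    for i in range(len(counts) - 1, -1, -1):  # walk from the highest-score bin down
        if remaining < counts[i]:
            return upper_edges[i]
        remaining -= counts[i]
    return 0.0  # every record has been read (scores assumed non-negative)


def unseen_result_bound(left_next, right_next, left_top, right_top):
    """Upper bound on any join result involving at least one unread record,
    assuming combined score = left score + right score."""
    return max(left_next + right_top, left_top + right_next)


if __name__ == "__main__":
    counts = equi_width_histogram([0, 999, 1000, 9999, 10000], 10, 10000)
    edges = bin_upper_edges(10, 10000)
    print(counts)                                 # [2, 1, 0, 0, 0, 0, 0, 0, 0, 2]
    print(score_bound_after(counts, edges, 2))    # 2000.0
```

Because the bounds come from precomputed statistics, they can be checked before a record is actually fetched, which is what makes stopping early possible.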
![Page 9: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/9.jpg)
Design (2)
![Page 10: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/10.jpg)
Early Termination
[Figure: Early Termination pipeline. Components: generated sorted data and histograms stored in HDFS; EarlyTermInputFormat and EarlyTermRecordReader (check bounds); Mapper; Reducers (process). Arrows labeled "Send Data".]
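The bound check performed by EarlyTermRecordReader is not described in detail. As a stand-in, the sketch below uses the well-known HRJN (hash rank join) threshold computed from the scores actually read so far, rather than from histograms: the two sorted inputs are read alternately, each new record is joined with everything already read from the other input, and reading stops as soon as the K-th best result is at least the bound on anything still unseen. Sum scoring and the `(record_id, join_key, score)` layout are assumptions, and `early_termination_topk_join` is a hypothetical name; it runs in one process, not inside a Hadoop RecordReader.

```python
import heapq
from collections import defaultdict


def early_termination_topk_join(left, right, k):
    """Rank join over two inputs sorted by descending score (HRJN-style sketch).

    Records are (record_id, join_key, score); combined score = sum (assumed).
    Returns (top-k results best-first, number of records read).
    """
    inputs = (left, right)
    if not left or not right or k <= 0:
        return [], 0
    pos = [0, 0]                                   # records read per input
    seen = (defaultdict(list), defaultdict(list))  # per input: join_key -> [(id, score)]
    heap = []                                      # min-heap of the best k (score, left_id, right_id)
    side = 0
    while pos[0] < len(left) or pos[1] < len(right):
        if pos[side] == len(inputs[side]):         # this input is exhausted: read the other
            side = 1 - side
        rid, key, score = inputs[side][pos[side]]
        pos[side] += 1
        seen[side][key].append((rid, score))
        # Join the new record with every record already read from the other input.
        for oid, oscore in seen[1 - side][key]:
            pair = (rid, oid) if side == 0 else (oid, rid)
            item = (score + oscore, *pair)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
        # Bound check: can a result involving an unread record still beat the k-th best?
        if len(heap) == k and pos[0] > 0 and pos[1] > 0:
            bound = float("-inf")
            for s in (0, 1):
                if pos[s] < len(inputs[s]):        # input s still has unread records
                    last_read = inputs[s][pos[s] - 1][2]
                    bound = max(bound, last_read + inputs[1 - s][0][2])
            if heap[0][0] >= bound:
                break                              # early termination
        side = 1 - side
    return sorted(heap, reverse=True), pos[0] + pos[1]


if __name__ == "__main__":
    left = [("a1", "x", 10), ("a2", "y", 9), ("a3", "x", 2), ("a4", "z", 1)]
    right = [("b1", "x", 10), ("b2", "y", 8), ("b3", "y", 1), ("b4", "x", 1)]
    print(early_termination_topk_join(left, right, 2))
    # ([(20, 'a1', 'b1'), (17, 'a2', 'b2')], 6)
```

With the sample data, the top 2 results are confirmed after reading 6 of the 8 records; asking for more results than exist forces a full read, just as in the naive approach.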
![Page 11: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/11.jpg)
Early Termination & Load Balancing
[Figure: Early Termination & Load Balancing pipeline. Components: generated sorted data and histograms stored in HDFS; EarlyTermInputFormat and EarlyTermRecordReader (check bounds); Mapper; CustomPartitioner; multiple Reducers. Arrows labeled "Send Data".]
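Load balancing uses the Longest Processing Time (LPT) rule: sort jobs by decreasing cost and give each one to the currently least-loaded reducer. The sketch below treats each join key as one job. In the described design the per-key costs would presumably be estimated from the histograms; here they are made-up illustrative numbers, not measurements from the experiment. LPT is compared with key-modulo partitioning as a simple stand-in for default hash partitioning. `lpt_partition` and `modulo_partition` are hypothetical names.

```python
import heapq


def lpt_partition(costs, num_reducers):
    """Longest Processing Time first: assign the most expensive join keys first,
    each to the reducer with the smallest current load.

    costs: dict join_key -> estimated work. Returns (assignment, per-reducer loads).
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Decreasing cost; ties broken by key so the result is deterministic.
    for key, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)       # least-loaded reducer
        assignment[key] = reducer
        heapq.heappush(heap, (load + cost, reducer))
    loads = [0] * num_reducers
    for load, reducer in heap:
        loads[reducer] = load
    return assignment, loads


def modulo_partition(costs, num_reducers):
    """Stand-in for hash partitioning: integer key mod number of reducers, ignoring cost."""
    assignment = {key: key % num_reducers for key in costs}
    loads = [0] * num_reducers
    for key, cost in costs.items():
        loads[assignment[key]] += cost
    return assignment, loads


if __name__ == "__main__":
    # Illustrative, Zipf-like per-key costs (NOT data from the experiment).
    costs = {0: 100, 1: 50, 2: 33, 3: 25, 4: 20, 5: 17, 6: 14, 7: 12, 8: 11, 9: 10}
    print("modulo:", modulo_partition(costs, 6)[1])  # [114, 62, 44, 35, 20, 17]
    print("LPT:   ", lpt_partition(costs, 6)[1])     # [100, 50, 33, 36, 32, 41]
```

With these costs and 6 reducers, the heaviest reducer load drops from 114 to 100, which is the best possible here because key 0 alone costs 100. In Hadoop, a precomputed assignment like this could be returned from a custom `Partitioner`'s `getPartition(key, value, numPartitions)` method, which is plausibly the role of CustomPartitioner in the figure.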
![Page 12: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/12.jpg)
Experiment (1)
| Parameter | Value |
|---|---|
| Data distribution | Zipfian |
| Number of records | 1,000,000 per table |
| Number of reducers | 10, 6 |
| Number of results (K) | 10 |
| Data skew | 0, 0.5, 1 |
| Number of join attributes | 10 |
| Maximum data value | 10000 |
| Sorting | By score |
| Histograms | 10 bins |
| Cluster | 8 machines |
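The slides list the parameters but not the data generator. The following sketch assumes the Zipfian skew applies to the join attribute (rank i is drawn with probability proportional to 1/i^skew, so skew 0 is uniform) and that scores are uniform integers in `[0, max_value]`; both are assumptions. `zipf_weights` and `generate_table` are hypothetical names; tables are sorted by descending score, matching the "Sorting: By score" setting.

```python
import random


def zipf_weights(num_values, skew):
    """Probability of the value with rank i (1-based) is proportional to 1 / i**skew.
    skew = 0 gives a uniform distribution; larger skew concentrates mass on rank 1."""
    raw = [1.0 / (i ** skew) for i in range(1, num_values + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def generate_table(num_records, num_join_values, skew, max_value, seed=0):
    """Hypothetical generator: Zipfian join keys, uniform integer scores in
    [0, max_value], sorted by descending score. Records are (record_id, join_key, score)."""
    rng = random.Random(seed)  # seeded for reproducibility
    keys = rng.choices(range(num_join_values),
                       weights=zipf_weights(num_join_values, skew), k=num_records)
    table = [(f"r{i}", key, rng.randint(0, max_value)) for i, key in enumerate(keys)]
    table.sort(key=lambda rec: rec[2], reverse=True)
    return table


if __name__ == "__main__":
    for skew in (0, 0.5, 1):
        table = generate_table(10_000, 10, skew, 10000, seed=1)
        counts = [sum(1 for _, k, _ in table if k == v) for v in range(10)]
        print(skew, counts)  # per-key record counts grow more uneven as skew rises
```

For example, `generate_table(1_000_000, 10, 0.5, 10000, seed=1)` would produce one table at the experiment's size, though pure Python is slow at that scale.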
![Page 13: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/13.jpg)
Experiment Part – Comparison of algorithms (2)
[Chart: running time (h:mm:ss) vs. skew (0, 0.5, 1) for Naive, Early Termination, and Early Termination & Load Balancing; REDUCERS = 10]
![Page 14: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/14.jpg)
Experiment Part – Comparison of algorithms (3)
[Chart: number of records vs. skew (0, 0.5, 1) for Naive, Early termination, and Early termination & Load Balancing; REDUCERS = 10]
![Page 15: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/15.jpg)
Experiment Part – Comparison of algorithms (4)
[Chart: running time (h:mm:ss) vs. number of reducers (6, 10) for Early Termination and Early Termination & Load Balancing; panel label: REDUCERS = 6]
![Page 16: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/16.jpg)
Conclusion
Using the proposed techniques:
Early Termination
Load Balancing
it is possible to implement rank-aware (Top-K) queries
efficiently in Map/Reduce and overcome the weaknesses
of the Map/Reduce model
![Page 17: Efficient processing of Rank-aware queries in Map/Reduce](https://reader033.fdocuments.us/reader033/viewer/2022052907/559428401a28ab0f418b45f2/html5/thumbnails/17.jpg)
Questions
????
Thank you.