Paper Presentation
Medusa: An Efficient Cloud Fault-Tolerant
MapReduce [1]
Pedro A. R. S. Costa, Xiao Bai, Fernando M. V. Ramos, Miguel Correia
2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
Presented by: Muhammad Salman Aslam (G-00763290)
Department of Computer Science, VSE, George Mason University
Sequence of Presentation
Basics of Map Reduce
Distribute Map Reduce (Overview and Issues / Problem Definition)
Proposed Solution (Medusa)
Medusa Design
Medusa Scheduler Model
Q & A (Design and Model)
Experimental Evaluation and Results
Critique
Important Takeaways
Q & A (Final)
• Additional comments / observations by the presenter are in white font color
• Not included in the paper by the authors - open to disagreement / discussion
Map Reduce - Basics
MapReduce Basics (Architecture)
[Architecture diagram: a client submits work to the master node (Job Tracker); work is distributed across many worker/slave nodes (Task Trackers on slave nodes 1..N)]
MapReduce Basics (Operations / Workflow)
Client submits a job at the master, which acts as the manager (Resource Manager)
Map and Reduce tasks run at the nodes, monitored by the Node Manager
Map Reduce Functions
Map
Process a key/value pair to generate intermediate key/value pairs
Reduce
Merge all intermediate values associated with the same key
Partition
By default : hash(key) mod R
Goal is to keep the partitions well balanced (sketch below)
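A minimal word-count sketch of these three functions (illustrative Python; real Hadoop jobs implement Mapper/Reducer classes, typically in Java):

```python
# Illustrative word-count versions of the Map, Reduce and Partition functions.
from collections import defaultdict

R = 4  # number of reduce tasks (assumption for this example)

def map_fn(key, value):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce: merge all intermediate values associated with the same key."""
    return key, sum(values)

def partition(key, r):
    """Default partitioner: hash(key) mod R keeps the partitions balanced."""
    return hash(key) % r

# Tiny end-to-end run over two input "splits".
splits = ["the quick brown fox", "the lazy dog the end"]
buckets = defaultdict(lambda: defaultdict(list))
for i, line in enumerate(splits):
    for k, v in map_fn(i, line):
        buckets[partition(k, R)][k].append(v)
print([reduce_fn(k, vs) for part in buckets.values() for k, vs in part.items()])
```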
MapReduce Scaling and Distributed Deployments
• Review
• Issues
Revisiting Architecture
Think of the nodes as part of one cluster
Cluster sizes typically grow as larger jobs are designed
A cloud may be hosting one MapReduce deployment
How to Handle Faults
Typical Fault Handling
Monitor and restart tasks
Node Manager and Resource Manager crashes
Restart the node
Spin up a new machine and restart execution
Task crashes
Restart the task
Redistribute the task to a different node
Add checksums to files in HDFS to detect data corruption on disks
These techniques apply within a single cloud only
Distributed Map Reduce
Maybe use several clouds!
Due to:
Size of the problem
By design:
To build in efficiency
Redundancy
Fault tolerance
[Diagram: Cloud 1, Cloud 2 and Cloud 3 forming a "cloud of clouds"]
Map Reduce Fault Tolerance Needed
Distribution by design for fault tolerance
Build fault tolerance against:
Complete cloud Failure
Power outages
Calamities
Malicious Attacks
INSIDER THREAT
Proposed Solution
Medusa Design
Challenges Addressed
Transparent solution
Keep the MapReduce API (no changes)
No modification of MapReduce jobs necessary
No modification to the Hadoop framework
Tolerate additional faults (MapReduce only addresses machine crash faults)
Arbitrary faults
Cloud outages
Malicious faults
Minimum replication
Matching or better performance
Solution: an efficient middleware named Medusa
Medusa Design
Works as a middleware
Distributing and managing MapReduce jobs among clouds
Minimizing the amount of data replication
Ensuring efficient completion of the entire MapReduce job
Distribute jobs to ensure a lower workspan
Copy data between clouds with high pairwise bandwidth
Select clouds with high computational power
Ensuring Fault Tolerance
Traditional approach
If there are f faulty resource managers, we need at least f+1 correct results to outvote f identical wrong answers
There can be f faulty clouds; thus the traditional approach distributes the job as 2f+1 replicas of the map and aggregation jobs
Medusa needs only f+1 replicas initially (extra replicas are launched only when outputs disagree) and provides the same fault tolerance (sketch below)
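A minimal sketch of the f+1 idea, assuming a caller-supplied run_on_cloud function (a placeholder, not the authors' code): start with f+1 replicas, compare SHA-256 output digests, and launch an extra replica only when they disagree.

```python
# Sketch: run a job on f+1 clouds, accept the output once f+1 digests match,
# and add replicas (on clouds not used yet) only when digests disagree.
import hashlib
from collections import Counter

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def run_with_verification(run_on_cloud, clouds, f=1):
    """run_on_cloud(cloud) -> job output bytes; clouds are ordered by the scheduler."""
    outputs = {c: run_on_cloud(c) for c in clouds[:f + 1]}   # only f+1 replicas at first
    spare = list(clouds[f + 1:])
    while True:
        digests = Counter(sha256(o) for o in outputs.values())
        winner, votes = digests.most_common(1)[0]
        if votes >= f + 1:                     # f+1 matching results outvote f wrong ones
            return next(o for o in outputs.values() if sha256(o) == winner)
        if not spare:
            raise RuntimeError("not enough clouds left to reach f+1 matching results")
        extra = spare.pop(0)                   # disagreement detected: one more replica
        outputs[extra] = run_on_cloud(extra)
```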
MapReduce on a Federation
[Diagram: Medusa acts as a proxy between the client and a federation of clouds (Distributed Cloud 1 ... Distributed Cloud N)]
Medusa - 2-Phase Operation
1. Vanilla MapReduce
Rank clusters (scheduling algorithm)
Copy data (f=1 -> only 2 replicas)
Run job
Verify output
2. Global MapReduce
Rank clusters (minimize data transfer)
Copy job
Run job
Verify output
3. Return result (sketch below)
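A hedged sketch of this two-phase flow; the rank_*, copy_data and run_* callables are placeholders for the scheduler and cloud operations, not Medusa's actual API:

```python
# Two-phase sketch: phase 1 runs the vanilla job on each partition in f+1 ranked
# clouds; phase 2 runs the global (aggregation) job over the verified outputs.
import hashlib

def _verified(outputs, f=1):
    """Accept a result only when all f+1 replica digests agree (else re-run)."""
    if len({hashlib.sha256(o).hexdigest() for o in outputs}) != 1:
        raise RuntimeError("digests disagree: re-run the job on another cloud")
    return outputs[0]

def medusa_job(partitions, clouds, rank_vanilla, rank_global,
               copy_data, run_vanilla, run_global, f=1):
    # Phase 1: vanilla MapReduce per data partition
    phase1 = []
    for part in partitions:
        replicas = rank_vanilla(part, clouds)[:f + 1]        # rank clusters (scheduler)
        for c in replicas:
            copy_data(part, c)                                # copy data (f=1 -> 2 replicas)
        phase1.append(_verified([run_vanilla(c, part) for c in replicas], f))
    # Phase 2: global MapReduce over the verified phase-1 outputs
    targets = rank_global(phase1, clouds)[:f + 1]             # rank clusters (min transfer)
    for c in targets:
        copy_data(phase1, c)                                  # copy phase-1 outputs it lacks
    return _verified([run_global(c, phase1) for c in targets], f)   # verify, return result
```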
Medusa – Important Issues
Fault Tolerance
Build integrity
Data digest using a hash function (SHA-256)
Send the hash of the data when transmitting a job: ensures successful transmission
In the verification step, the hash of the result is compared between the replicated jobs
If the hashes match, good; else re-run the job
Efficiency
Ensure the best makespan (job ends in the least time)
Select clouds on which to replicate the job optimally
Estimate transmission time
Estimate processing time
Medusa – Scheduler Model
Estimation Model
Total Job Time (Workspan)
Transmission time
Processing time
The scheduler algorithm (pseudocode) is not included in the paper
But it is available in the open-source distribution
Medusa Scheduler: Model
Estimate workspan
Running time (Phase 1): transmission time + processing time
Running time (Phase 2): copy the phase-1 output to all clouds that do not yet have that result
Select the assignment with minimal t1 and t2 over all clouds i
Schedule job replicas to ensure the lowest workspan (sketch below)
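A minimal sketch of the selection step, assuming estimator functions built from the transmission-time and processing-time models on the next slides (the names est_trans / est_proc are placeholders, not the paper's algorithm, which is only published in the open-source release):

```python
# Pick the f+1 clouds with the smallest estimated phase time
# (transmission + processing), i.e. the assignment with the lowest workspan.
def select_clouds(clouds, data_size, source, est_trans, est_proc, f=1):
    t = {c: est_trans(source, c, data_size) + est_proc(c, data_size) for c in clouds}
    return sorted(t, key=t.get)[:f + 1]
```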
Medusa Scheduler: Estimating Transmission Time
Transmission time:
$$t_{trans}(i, j) = \frac{l(i, j)}{2} + \frac{S}{T(i, j)}$$
Terms
$t_{trans}(i, j)$ – transmission time between clouds i and j
$l(i, j)$ – distance between i and j (in terms of round-trip time)
$T(i, j)$ – network throughput (transmission rate)
$S$ – data size
Parameters tracking
Throughput: Medusa tracks inter-cloud throughputs using iperf
RTT: measured periodically
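A direct sketch of this estimate; the units are an assumption (RTT in seconds, data size in MB, throughput in MB/s):

```python
# t_trans(i, j) = l(i, j)/2 + S/T(i, j): one-way latency plus transfer time.
def t_trans(rtt_s: float, size_mb: float, throughput_mb_per_s: float) -> float:
    return rtt_s / 2.0 + size_mb / throughput_mb_per_s

# Example: 80 ms RTT, 3000 MB of data, 40 MB/s inter-cloud throughput.
print(t_trans(0.080, 3000, 40))   # ~75.04 seconds
```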
Medusa Scheduler: Estimating Processing Time
Scheduler model (linear regression):
$$y' = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \beta_0$$
Terms
$y'$ – predicted data processing time
$x_1, x_2, \ldots, x_n$ – n features
$\beta_1, \beta_2, \ldots, \beta_n$ – estimated parameters
Parameters estimated based on
Job configuration (input size, number of map tasks, number of reduce tasks)
Cloud capacity (clock, cores, memory)
Overhead (queued jobs, number of MapReduce jobs running, % completion)
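A hedged sketch of fitting and using this predictor with ordinary least squares; the feature set and the toy history below are illustrative, not the paper's data:

```python
# Fit y' = β0 + β1*x1 + ... + βn*xn over historical runs, then predict.
import numpy as np

# Features per historical run: [input_size_MB, num_map_tasks, queued_jobs].
X = np.array([
    [1500, 12, 0],
    [3000, 20, 1],
    [4500, 36, 0],
    [6000, 44, 2],
    [3000, 24, 0],
], dtype=float)
y = np.array([120.0, 215.0, 300.0, 440.0, 205.0])   # observed processing times (s)

A = np.hstack([np.ones((X.shape[0], 1)), X])         # prepend the intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)         # [β0, β1, ..., βn]

def predict_processing_time(features):
    """Predicted processing time y' for a new job/cloud feature vector."""
    return float(beta[0] + np.dot(beta[1:], features))

print(predict_processing_time([4500, 36, 1]))
```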
Q & A
Basics – Medusa Design - Model
Experiments
Evaluations/Results
Evaluation: Setup
Applications (from the GridMix benchmarks)
WordCount
WebDataScan
MonsterQuery
Up to 6 GB of problem size generated
Larger sizes also used in some experiments – confirming the trends
Evaluation platform: ExoGENI testbed
Geographical distribution
WordCount: Chicago, California and West Virginia
Others: Pittsburgh, Massachusetts and Texas
Each cloud: one Resource Manager + three Node Managers (4 hosts)
Each experiment repeated 40 times
Linear regression based on 30 historical points
Clouds stressed with external jobs with random workloads
Evaluations
Comparisons
Baseline: Round-Robin based scheduling
Medusa
Compare % cloud usage on the various clouds (varying input)
Compare the workspan
Fault tolerance
With No faults
With Faults
Results: RR vs Medusa Workspan [1]
Experiments
Run WordCount, WebDataScan and MonsterQuery
Varying input data size (1000 – 60000 MB)
Results
Medusa has a better workspan in all experiments
WordCount: almost 300% improvement
WebDataScan: better result, and Medusa is more stable
MonsterQuery: marginally better result
Observation:
Medusa outperforms RR heavily in the right conditions (cloud performance and restrictive throughputs)
Better performance and more stable under any arrangement
WordCount run on different clouds shows a marked (300%) improvement
Results: RR vs Medusa Workspan (contd.)
Experiments
Measure network throughput between clouds
Results
WordCount: throughput is lower and variable
WebDataScan / MonsterQuery: network throughput is higher and consistent
Cloud to cloud: throughput differs when the direction is reversed (Why?)
Observation:
WordCount's throughput demands are over-taxing the network
In restrictive-throughput scenarios Medusa outperforms RR
When throughput is high and less variable, Medusa's advantages are overshadowed – however, Medusa is still more stable
Results: RR vs Medusa Workspan (contd.)
Experiments
Measure cloud usage (work distribution)
Results
Round Robin: uniform work distribution
Medusa: distributes based on estimated workspan
More workload on Chicago II and WV
Observation:
RR assumes even cloud performance
Variation only because of high RTT
Medusa takes into account cloud performance and network throughput
End result: better workspan with Medusa
Medusa's workload distribution follows similar trends for Chicago I and CA, and similar trends for WV and Chicago II
[Chart: Medusa – % cloud usage vs. input data size (1500–6000 MB) for WV, ChgI, ChgII, CA]
[Chart: Round Robin – % cloud usage vs. input data size (1500–6000 MB) for WV, ChgI, ChgII, CA, with polynomial trend lines]
Workload Distribution Combined
How the workload is distributed on each cloud; how RR and Medusa select clouds for different data sizes
Observation:
Medusa favors distributing to the cloud with the highest network throughput
Some clouds provide more advantage for a problem of a particular data size, e.g. Chicago I
Some clouds are selected very consistently, e.g. WV (with Medusa)
[Chart: Combined RR and Medusa – processor usage on each cloud (WV, ChgI, ChgII, CA, with and without Medusa) for input sizes 1500–6000 MB]
[Chart: Combined RR and Medusa – % cloud usage vs. input data size (1500–6000 MB) for WV, ChgI, ChgII, CA]
Results: Fault Tolerance
Experiments
Medusa with no faults
Medusa with faults:
Malicious fault – digest corrupted after the vanilla job; the replacement cannot be scheduled to the same cloud
Arbitrary fault – digest corrupted after the vanilla job, but the replacement can be scheduled to the same cloud
Cloud outage – crash the resource manager
Results
Malicious fault: workspan doubles (the fault is detected and an extra replica is launched to arrive at the correct result before moving to the global MapReduce; the same cloud is not trusted again for the replica)
Arbitrary fault: smaller workspan increase (with high probability the replacement job is launched on the same cloud)
Cloud outage: causes the highest increase in workspan – unlike digest faults, a cloud outage takes longer to detect, and the copy of the job has to be looked up in another cloud
Observation:
Medusa is able to detect and correct arbitrary faults, malicious faults and cloud outages
With a large input size, Medusa's WordCount workspan even with a fault is better than the fault-free RR workspan
Critique
The model for estimated workspan is not detailed
Cloud processing-power estimates are not presented
Why is throughput different when the direction is reversed?
Stark difference in throughput between applications (were the jobs not normalized?)
Only one fault assumed (f=1)
A mix of faults (arbitrary + cloud outage) is not explored
Detection of malicious vs. arbitrary faults – what is the difference?
Full cloud statistics are only presented for the WordCount clouds
Maintaining the state of cloud-to-cloud throughput is cumbersome – how expensive is it?
Counter-critique
The source code is made available and can be explored by the community
Important Takeaways
Distributed MapReduce is effective
But it is prone to faults
Medusa is an efficient scheduler
Takes into account cloud conditions and network throughput
Experimental results show high gains with the right conditions and application
Always more efficient and more stable
Medusa is effective for fault tolerance and recovery
Effective at recovering from faults and detecting malicious/arbitrary faults
At a very slight cost overhead
Cloud conditions and transmission delays contribute significantly to workspan regardless of the technique used
Future Work
Determine the right mix of techniques for various application types and clouds
Could cloud classification be done offline periodically and learnt over time?
Perhaps an affinity-aware application deployment strategy
Apply hashes other than SHA-256 for the digest
More malicious fault types could be studied
Q & A
Final
References
1. P. A. R. S. Costa, X. Bai, F. M. V. Ramos, and M. Correia, "Medusa: an efficient
cloud fault-tolerant MapReduce," in Proceedings of the 16th IEEE/ACM International
Symposium on Cluster, Cloud, and Grid Computing, 2016.
2. J. Dean and S. Ghemawat, “MapReduce: a flexible data processing tool,”
Communications of the ACM, vol. 53, pp. 72–77, Jan. 2010.
3. T. Kurze, M. Klems, D. Bermbach, et al., “Cloud federation,” in Proceedings
of the 2nd International Conference on Cloud Computing, Grids, and
Virtualization, Sept. 2011.
4. M. Kandias, N. Virvilis, and D. Gritzalis, “The insider threat in cloud
computing,” in Critical Information Infrastructure Security, vol. 6983 of LNCS,
pp. 93–103, Springer Berlin Heidelberg, 2013.
Additional Notes: Application Basics
WordCount in a multi-cloud system can be considered as building the inverted indexes of a multi-site web search engine for each search site and then aggregating the results
WebDataScan is a benchmarking application that extracts samples from a large data set
MonsterQuery is a benchmarking application that queries part of the data from a large data set
---------------------------------------------------------------------------------------------------------------------------------------
The message queuing service (MQ) uses reliable channels so that no messages are lost, duplicated or corrupted. In practice, this is provided by establishing TCP/IP connections (a message is only lost if the cloud is unreachable); it is also used for cloud outage detection
-----------------------------------------------------------------------------------------------------------
Features for the processing time on each cloud are retrieved from filesystem information on the node masters: the jobs that are currently running, the percentage of completion of jobs, the number of MapReduce jobs queued, and the size of the input data