Post on 17-Jul-2015
OPTIMAL RESOURCE
PROVISIONING FOR RUNNING
MAPREDUCE PROGRAMS IN
THE CLOUD
Presented By:
Group Id: 29
Priyanka Sangtani
Anshul Aggarwal
Pooja Jain
PROBLEM STATEMENT
The problem at hand is defining a resource provisioning framework for MapReduce jobs running in a cloud, keeping in mind performance goals such as resource utilization with:
- an optimal number of map and reduce slots
- improvements in execution time
- a highly scalable solution
This is a design issue related to the software frameworks available in the cloud. Traditional provisioning frameworks provide users with defaults that do not lend themselves well to MapReduce jobs.
Such jobs are highly parallelizable, and our proposed algorithm aims to use this fact to provide highly optimized resource provisioning suitable for MapReduce.
MAPREDUCE OVERVIEW
In a typical MapReduce framework, data are divided into blocks and distributed across many nodes in a cluster and the MapReduce framework takes advantage of data locality by shipping computation to data rather than moving data to where it is processed.
Most input data blocks to MapReduce applications are located on the local node, so they can be loaded very fast and reading multiple blocks can be done on multiple nodes in parallel.
Therefore, MapReduce can achieve very high aggregate I/O bandwidth and data processing rate.
WHY MAPREDUCE OPTIMIZATION
The MapReduce programming paradigm lends itself well to most data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel.
Research has demonstrated that existing approaches to provisioning other applications in the cloud are not immediately applicable to MapReduce-based applications.
MapReduce jobs have over 180 configuration parameters. Setting too high a value can cause resource contention and degrade overall performance; setting too low a value might under-utilize the resources and, once again, reduce performance.
Each application has a different bottleneck resource (CPU:Disk:Network), and different bottleneck resource utilization, and thus needs to pick a different combination of these parameters such that the bottleneck resource is maximally utilized.
WORK FLOW OF PROPOSED SOLUTION
User Application
  → Signature Matching Algorithm (against a database of stored signatures)
      Yes: reuse the stored optimal configuration
      No: SLO-Based Provisioning
  → Priority Algorithm
  → Bottleneck Removal
  → Resource Provisioning Framework
  → Output: optimal number of map/reduce slots
PROPOSED ALGORITHM
1. Signature Matching
A sample of the input is run on the cloud to generate a resource consumption signature. This signature is matched against a database. If a match is found, we reuse the optimal configuration stored for the matched signature; otherwise we move to SLO-based provisioning.
2. SLO-Based Resource Provisioning
Based on the number of map and reduce tasks, the available slots, and the time constraints, we calculate the optimal number of map and reduce tasks to run in parallel.
3. Priority Assignment
To give users better control over provisioning, priorities are assigned in this stage.
4. Skew Mitigation
Managing parallel partitions so that they stay roughly equal in size.
5. Bottleneck Removal
Straggler tasks are the most common problem in parallel computation; this stage detects and removes such bottlenecks.
6. Deadlock Detection and Removal
This stage deals with deadlock removal to improve execution time.
1. SIGNATURE MATCHING
MATHEMATICAL MODEL
The entire job run is split into n (a pre-chosen number) intervals of equal duration.
For the ith interval, compute the average consumption of each rth resource. The resource types are (us, sy, wa, id, bi, bo, ni, no, sr), i.e. % of CPU in user time, system time, waiting time, idle time, disk blocks in, disk blocks out, network in, network out, and slow ratio, respectively.
Generate a resource consumption signature set Sr for every rth resource as
Srm = {Srm1, Srm2, ..., Srmn}
The signature distance between a generated signature and a signature in the database is computed as

χ²(S_R1m, S_R2m) = Σ_{i=1..n} (S_R1mi − S_R2mi)² / (S_R1mi + S_R2mi)

χ² represents the vector distance between two signatures for a particular resource r in the time-interval vector space. We compute the scalar sum of χ² over all resource types; a lower sum of χ² indicates more similar signatures. We choose the configuration of the application whose signature-distance sum is closest to the new application's.
ALGORITHM
1. Take a sample input IS of appropriate size from the actual input.
2. Take a resource set RS.
3. Take the signature database with average distance between signatures DAVG.
4. Split the entire job run into n (a pre-chosen number) intervals of equal duration.
5. For each resource type in (us, sy, wa, id, bi, bo, ni, no, sr):
6.     For the ith interval from 1 to n:
7.         Compute the average resource consumption, generating a resource consumption signature set Sr for every rth resource as Srm = {Srm1, Srm2, ..., Srmn}.
8. Set min_distance = 10000.
9. For every signature S in the database:
10.     Find the distance D between the calculated signature and S.
11.     If D < min_distance, set min_distance = D and Signature_matched = S.
12. Set precision value P.
13. If min_distance > P * DAVG, return "no match found".
14. Else return Signature_matched.
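The matching steps above can be sketched in Java. This is an illustrative sketch, not code from the framework: the class and method names (SignatureMatcher, chiSquareDistance, match) are our own, and the signature database is modeled as a simple in-memory map from job name to a per-resource signature matrix.

```java
import java.util.Map;

class SignatureMatcher {
    // Chi-square distance between two signatures for one resource,
    // each holding n per-interval averages.
    static double chiSquareDistance(double[] s1, double[] s2) {
        double d = 0.0;
        for (int i = 0; i < s1.length; i++) {
            double denom = s1[i] + s2[i];
            if (denom > 0) {
                double diff = s1[i] - s2[i];
                d += diff * diff / denom;
            }
        }
        return d;
    }

    // Scalar sum of chi-square distances over all resource types
    // (us, sy, wa, id, bi, bo, ni, no, sr), one row per resource.
    static double signatureDistance(double[][] a, double[][] b) {
        double sum = 0.0;
        for (int r = 0; r < a.length; r++) {
            sum += chiSquareDistance(a[r], b[r]);
        }
        return sum;
    }

    // Return the name of the closest stored signature, or null when the
    // best distance exceeds precision * avgDistance (no match found).
    static String match(double[][] probe, Map<String, double[][]> db,
                        double precision, double avgDistance) {
        String best = null;
        double bestD = Double.MAX_VALUE;
        for (Map.Entry<String, double[][]> e : db.entrySet()) {
            double d = signatureDistance(probe, e.getValue());
            if (d < bestD) { bestD = d; best = e.getKey(); }
        }
        return (bestD > precision * avgDistance) ? null : best;
    }
}
```

Identical signatures give distance 0, so a job already in the database matches itself exactly; the precision cutoff sends unseen workloads on to SLO-based provisioning.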
2. SLO-BASED PROVISIONING
Given a MapReduce job J with input dataset D, identify minimal combinations (S_M^J, S_R^J) of map and reduce slots that can be allocated to job J so that it finishes within time T.
Step I: Create a compact job profile that reflects all phases of a given job: the map, shuffle/sort, and reduce phases.
Map stage: (M_min, M_avg, M_max, AvgSize_M^input, Selectivity_M)
Shuffle stage: (Sh1_avg, Sh1_max, Shtyp_avg, Shtyp_max)
Reduce stage: (R_min, R_avg, Selectivity_R)
Step II: There are three design choices for targeting the completion time:
1) T is targeted as a lower bound of the job completion time. Typically, this leads to the least amount of resources allocated to the job for finishing within deadline T.
The lower bound corresponds to an ideal computation under allocated resources and is rarely achievable in real environments.
2) T is targeted as an upper bound of the job completion time. Typically, this leads to more aggressive resource allocation and might yield a job completion time much smaller than T, because worst-case scenarios are also rare in production settings.
3) Given time T is targeted as the average between lower and upper bounds on job completion time. This more balanced resource allocation might provide a solution that enables the job to complete within time T.
Mathematical Model

Makespan theorem: the makespan of a greedy assignment of n tasks to k slots is at least n * avg / k and at most (n − 1) * avg / k + max, where avg and max are the average and maximum task durations.

Suppose the dataset is partitioned into N_M^J map tasks and N_R^J reduce tasks, and let S_M^J and S_R^J be the number of allocated map and reduce slots.

By the makespan theorem, the lower and upper bounds on the duration of the entire map stage (denoted T_M^low and T_M^up respectively) are estimated as follows:

T_M^low = N_M^J * M_avg / S_M^J
T_M^up = (N_M^J − 1) * M_avg / S_M^J + M_max

T_Sh^low = (N_R^J / S_R^J − 1) * Shtyp_avg
T_Sh^up = ((N_R^J − 1) / S_R^J) * Shtyp_avg + Shtyp_max

with analogous bounds T_R^low and T_R^up for the reduce stage. The job-level bounds are:

T_J^low = T_M^low + Sh1_avg + T_Sh^low + T_R^low
T_J^up = T_M^up + Sh1_avg + T_Sh^up + T_R^up

Expanding the lower bound gives:

T_J^low = N_M^J * M_avg / S_M^J + N_R^J * (Shtyp_avg + R_avg) / S_R^J + Sh1_avg − Shtyp_avg

which can be rewritten as:

T_J^low = A_J^low * N_M^J / S_M^J + B_J^low * N_R^J / S_R^J + C_J^low

where
A_J^low = M_avg
B_J^low = Shtyp_avg + R_avg
C_J^low = Sh1_avg − Shtyp_avg

Taking T_J^low as T (the expected completion time):

T = A_J^low * N_M^J / S_M^J + B_J^low * N_R^J / S_R^J + C_J^low
In the algorithm, T is targeted as a lower bound of the job completion time. The algorithm sweeps through the entire range of map slot allocations and finds the corresponding values of reduce slots that are needed to complete the job within time T.
Resource allocation algorithm
Input:
Job profile of J
(N_M^J, N_R^J) ← number of map and reduce tasks of J
(S_M, S_R) ← total number of map and reduce slots in the cluster
T ← deadline by which the job must be completed
Output: P ← set of plausible resource allocations (S_M^J, S_R^J)
Algorithm:
for S_M^J ← MIN(N_M^J, S_M) down to 1 do
    Solve the equation A_J^low * N_M^J / S_M^J + B_J^low * N_R^J / S_R^J = T − C_J^low for S_R^J
    if 0 < S_R^J ≤ S_R then
        P ← P ∪ (S_M^J, S_R^J)
    else
        // Job cannot be completed within deadline T
        // with the allocated map slots
        Break out of the loop
    end if
end for
The complexity of the proposed algorithm is O(min(N_M^J, S_M)), and thus linear in the number of map slots.
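The sweep above can be sketched in Java. This is a hypothetical sketch, not the framework's code: the class name and the list-of-pairs output are our own, and at each map-slot count we solve the lower-bound equation directly for the required reduce slots.

```java
import java.util.ArrayList;
import java.util.List;

class SloPlanner {
    // Sweep map-slot allocations from min(N_M^J, S_M) down to 1 and, for each,
    // solve a * N_M/S_M + b * N_R/S_R + c = t for the reduce slots S_R.
    // a, b, c are the job-profile coefficients A_J^low, B_J^low, C_J^low.
    static List<int[]> plan(int nMap, int nRed, int sMapMax, int sRedMax,
                            double a, double b, double c, double t) {
        List<int[]> plausible = new ArrayList<>();
        for (int sM = Math.min(nMap, sMapMax); sM >= 1; sM--) {
            double remaining = t - c - a * nMap / sM; // time left after the map stage
            if (remaining <= 0) break;                // deadline unreachable
            int sR = (int) Math.ceil(b * nRed / remaining);
            if (sR > 0 && sR <= sRedMax) {
                plausible.add(new int[]{sM, sR});     // (S_M^J, S_R^J)
            } else {
                break; // fewer map slots would need even more reduce slots
            }
        }
        return plausible;
    }
}
```

Because shrinking the map-slot allocation only lengthens the map stage, the first infeasible point lets the loop break early, matching the O(min(N_M^J, S_M)) bound.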
3. PRIORITY ALGORITHM
Workflow Priority
o Prioritizes entire workflows.
o Increases spending on workflows that are more important and decreases spending on less important workflows.
o Importance may be implied by proximity to deadline, current demand for the anticipated output, or whether the application is in a test or production phase.
Stage Priority
o Prioritizes different stages of a single workflow.
o The system splits a budget according to user-defined weights.
o The budget is split within the workflow across the different stages.
o By spending more on phases where resources are more critical, the overall utility of the workflow may be increased.
MATHEMATICAL MODEL
Workflow priority
o Say we have n workflows with weight vector w, i.e.
  w = [w1, w2, ..., wn]
o The total weight of the job is
  W = w1 + w2 + ... + wn
o The budget for workflow i is
  bwi = bs * wi / W
  where bs is the total budget of the job.
Stage Priority
o Say we have m stages with weight vector sw, i.e.
  sw = [sw1, sw2, ..., swm]
o The total weight of the workflow is
  SW = sw1 + sw2 + ... + swm
o The budget for stage i is
  bswi = bw * swi / SW
  where bw is the total budget of the workflow.
ALGORITHM
1. Consider a job with n workflows, each workflow consisting of m stages.
2. The user is asked to input the total budget, workflow priorities, and stage priorities.
3. Low priority has value 1 and high priority has value 0.5, so that spending on high-priority work is doubled.
4. Calculate the budget for each workflow: bwi = bs * wi / W.
5. Use bwi to find the resource share for the workflow.
6. Calculate the budget for each stage: bswi = bw * swi / SW.
7. Use bswi to find the resource share for the stage.
8. A higher-priority workflow or stage is given more cost and time for execution, i.e. a higher spending rate (b/d ratio).
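The budget splits in steps 4 and 6 reduce to one proportional division, sketched below. Class and method names are illustrative; the same routine serves both the workflow split (bwi = bs * wi / W) and the stage split (bswi = bw * swi / SW).

```java
class PriorityBudget {
    // Proportional split from the model above: budget_i = total * w_i / W,
    // where W is the sum of all weights.
    static double[] split(double totalBudget, double[] weights) {
        double sum = 0.0;
        for (double w : weights) sum += w;
        double[] budgets = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            budgets[i] = totalBudget * weights[i] / sum;
        }
        return budgets;
    }
}
```

The shares always sum back to the total budget, so no spending capacity is lost when a budget is first divided over workflows and then again over each workflow's stages.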
SKEW MITIGATION
To support parallelism, partitions must be small enough that several partitions can be processed in parallel. To avoid record skew, the partitioning function is selected to keep each partition roughly the same size.
On each node, we apply the map operation to a prefix of the records in each input file stored on that node.
As the map function produces records, the node records information about the intermediate data, such as how much larger or smaller it is than the input and the number of records generated. It also stores information about each intermediate key and the associated record's size.
It sends that metadata to the coordinator. The coordinator merges the metadata from each of the nodes to estimate the intermediate data size. It then uses this size, and the desired partition size, to compute the number of partitions.
Then, it performs a streaming merge-sort on the samples from each node. Once all the sampled data is sorted, partition boundaries are calculated based on the desired partition sizes. The result is a list of “boundary keys" that define the edges of each partition.
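The coordinator's boundary-key computation can be sketched as follows. This is a simplified illustration assuming integer keys and an already-merged, globally sorted sample array; the names are our own.

```java
class PartitionBoundaries {
    // From globally sorted sample keys, pick "boundary keys" that split the
    // key space into roughly equal-sized partitions, as described above.
    static int[] boundaryKeys(int[] sortedSamples, int numPartitions) {
        int[] boundaries = new int[numPartitions - 1];
        for (int p = 1; p < numPartitions; p++) {
            // sample index where partition p would begin
            int idx = (int) ((long) p * sortedSamples.length / numPartitions);
            boundaries[p - 1] = sortedSamples[idx];
        }
        return boundaries;
    }
}
```

Because the samples approximate the key distribution of the full intermediate data, equal sample ranges translate into roughly equal partition sizes even when the raw key space is skewed.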
BOTTLENECK REMOVAL
A MapReduce system can simultaneously run multiple jobs competing for each node's resources and network bandwidth. These conflicts cause slowdowns in the execution of tasks. The duration of each phase, and hence the duration of the job, is determined by the slowest, or straggler, task.
The slowdowns of individual tasks are highly correlated with overall job latencies, and significant task slowdowns tend to indicate bottlenecks in job execution as well.
MATHEMATICAL MODEL
Bottleneck detection
Te_i is the expected execution time of task i; Tr_i is its running time.
Te_i > Tr_i means no bottleneck.
Tr_i − Te_i > t means a bottleneck is present, where t is a threshold derived from past data: if a task has been running for more than t beyond its expected time, a bottleneck is detected.
Bottleneck Elimination
ni = number of idle nodes, na = number of active nodes, f = boost factor.
To reduce the bottleneck, we distribute tasks such that the total spending equals the average spending, i.e. b/d:
Spending at an active node = b/d * (1 + (ni/na) * f)
Spending at an idle node = b/d * (1 − f)
The expected spending per node is then
E = na/(na+ni) * (b/d * (1 + (ni/na) * f)) + ni/(na+ni) * (b/d * (1 − f))
  = b / ((na+ni) * d) * (na + ni*f + ni − ni*f)
  = b / ((na+ni) * d) * (na + ni)
  = b/d
  = average spending
ALGORITHM
Bottleneck avoidance
Step 1: Compute task and node features
1. Run the task on the cloud.
2. Collect performance traces every 10 minutes and store the results in a file.
Step 2: Compute the slowdown factor
1. Compare the current job's trace with already completed jobs.
2. Calculate the slowdown factor, which is the ratio of the current job's parameters to those of a similar job.
Step 3: Give the slowdown factor of each job to the scheduler
1. The scheduler schedules high-slowdown jobs first.
2. The scheduler does not schedule a high-slowdown job to a congested hardware node.
Bottleneck detection
Step 1: Estimate the execution time of each job using historical data.
Step 2: Periodically compute the time for which the job has been running.
Step 3: Compare the expected execution time and the running time:
1. If Te_i > Tr_i, there is no bottleneck.
2. Else if Tr_i − Te_i > t, a bottleneck has occurred.
Bottleneck Elimination
To reduce execution time we can run an execution-bottleneck elimination algorithm that schedules redundant copies of the remaining tasks across nodes which do not have other work to perform.
Bottleneck elimination algorithm
1. idle ← GETIDLENODES(nodes)
2. active ← nodes − idle
3. ni ← SIZE(idle)
4. na ← SIZE(active)
5. for each node ∈ active:
       node.spending ← b/d * (1 + (ni/na) * f)
6. for each node ∈ idle:
       node.spending ← b/d * (1 − f)
where f is a boost factor between 0 and 1, set by the user; b is the budget and d is the duration.
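The spending rules in steps 5 and 6 can be sketched in Java, together with a check that the average spending across all nodes stays at b/d, as the derivation above shows. Class and method names are illustrative.

```java
class BottleneckSpending {
    // Boosted spending rate at an active node: b/d * (1 + (ni/na) * f).
    static double activeSpending(double b, double d, int ni, int na, double f) {
        return b / d * (1 + ((double) ni / na) * f);
    }

    // Throttled spending rate at an idle node: b/d * (1 - f).
    static double idleSpending(double b, double d, double f) {
        return b / d * (1 - f);
    }

    // Weighted average over all nodes; by construction this equals b/d,
    // because the boost at active nodes exactly offsets the idle throttle.
    static double averageSpending(double b, double d, int ni, int na, double f) {
        double total = na * activeSpending(b, d, ni, na, f)
                     + ni * idleSpending(b, d, f);
        return total / (na + ni);
    }
}
```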
DEADLOCK
A deadlock may occur between mappers and reducers, with no progress in the job, when:
o The initially available map/reduce slots were all allocated to mappers.
o Once a few mappers completed, reducers started occupying some of the slots.
o After a while, all slots were occupied by reducers.
Since there were still mapper tasks not yet assigned any slot, the map phase never completed. The system entered a deadlock state where reducers occupy all available slots but are waiting for mappers to complete; mappers cannot move forward because no slot is available.
Deadlock prevention:
Unlike existing MapReduce systems, which execute map and reduce tasks concurrently in waves, we can implement the MapReduce programming model in two phases of operation:
Phase 1: Map and shuffle
The Reader stage reads records from an input disk and sends them to the Mapper stage, which applies the map function to each record. As the map function produces intermediate records, each record's key is hashed to determine the node to which it should be sent, and the record is placed in a per-destination buffer that is given to the sender when it is full.
Phase 2: Sort and reduce
In phase two, each partition must be sorted by key, and the reduce function must be applied to groups of records with the same key.
Deadlock Detection:
The deadlock detector periodically probes workers to see if they are waiting for a memory allocation request to complete. If multiple probe cycles pass in which all workers are waiting for an allocation or are idle, the deadlock detector informs the memory allocator that a deadlock has occurred.
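The probe loop described above can be sketched as a small stateful detector. This is an illustrative sketch with our own names; it counts consecutive probe cycles in which every worker is waiting for an allocation or idle.

```java
import java.util.List;

class DeadlockDetector {
    enum WorkerState { RUNNING, WAITING_FOR_ALLOCATION, IDLE }

    private final int probeCyclesThreshold; // cycles of no progress before reporting
    private int stalledCycles = 0;

    DeadlockDetector(int probeCyclesThreshold) {
        this.probeCyclesThreshold = probeCyclesThreshold;
    }

    // Called once per probe cycle with the current state of every worker.
    // Returns true when a deadlock should be reported to the memory allocator.
    boolean probe(List<WorkerState> workers) {
        boolean allStalled = true;
        for (WorkerState s : workers) {
            if (s == WorkerState.RUNNING) { allStalled = false; break; }
        }
        stalledCycles = allStalled ? stalledCycles + 1 : 0;
        return stalledCycles >= probeCyclesThreshold;
    }
}
```

A single running worker resets the counter, so transient allocation waits do not trigger a false deadlock report.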
Deadlock Elimination
Process Termination: one or more processes involved in the deadlock may be aborted. We can choose to abort all processes involved in the deadlock; this ensures the deadlock is resolved with certainty and speed.
Resource Preemption: resources allocated to various processes may be successively preempted and allocated to other processes until the deadlock is broken.
IMPLEMENTATION FRAMEWORK
Apache Hadoop is an open-source implementation of the MapReduce programming model, supported by Yahoo and used by Google, Amazon, etc.
It also includes the underlying Hadoop Distributed File System (HDFS).
Hadoop has over 180 configuration parameters. Examples include number of replicas of input data, number of parallel map/reduce tasks to run, number of parallel connections for transferring data etc.
Hadoop installation comes with a default set of values for all the parameters in its configuration.
Scheduling in Hadoop is performed by a master node.
Hadoop has a variety of schedulers. The original one schedules all jobs using a FIFO queue in the master. Another, Hadoop on Demand (HOD), creates private MapReduce clusters dynamically and manages them using the Torque batch scheduler.
CHALLENGES IN MAPREDUCE SIMULATIONS
The right level of abstraction.
Data layout aware.
Resource contention aware.
Heterogeneity modeling.
Resource heterogeneity is common in large clusters.
Input dependence.
Workload aware.
Verification.
Performance
Comparison of MapReduce Simulators

Simulator      | Based on     | Language | GUI support | Workload-aware | Resource-contention-aware
MRPerf         | ns-2         | Java     | Yes         | Yes            | Yes
Cardona et al. | GridSim      | C        | No          | Yes            | No
Mumak          | Hadoop       | C        | No          | Yes            | No
SimMR          | From scratch | -        | -           | Yes            | No
HSim           | From scratch | -        | -           | No             | Yes
MRSim          | GridSim      | Java     | Yes         | No             | Yes
SimMapReduce   | GridSim      | Java     | Yes         | No             | Yes
Prior simulators for evaluating schedulers are trace-driven and aware of other jobs in a workload, but they are not aware of resource contention, so task execution times may not be accurate. Our algorithm optimizes resource provisioning, so we require a resource-contention-aware simulator.
It is almost impractical to set up a very large cluster consisting of hundreds or thousands of nodes to measure the scalability of an algorithm, and Hadoop set-up involves tuning a great number of parameters that are crucial for best performance. An obvious solution to both problems is a simulator of the Hadoop environment: it allows us to measure the scalability of MapReduce-based applications easily and quickly, and to determine how different Hadoop configurations affect the behavior of MapReduce-based applications in terms of speed.
MRPerf is implemented based on ns-2, a packet-level network simulator, and its performance is much worse than other simulators. It could not generate accurate results for jobs of different type of algorithms or different cluster configurations.
No existing implementation of HSim is available, so using it would require a lot of work from scratch.
Most current work in cloud computing is done on the CloudSim simulator, but since our problem entails the MapReduce model and CloudSim provides no implementation supporting MapReduce, we are not using it.
MRSim extends the discrete-event engine SimJava to accurately simulate the Hadoop environment. Using SimJava, we simulate interactions between different entities within the cluster; the GridSim package is also used for network simulation. MRSim is written in Java on top of SimJava.
MRSIM ARCHITECTURE
The MRSim model simulates network topology and traffic using GridSim, and models the rest of the system entities using the SimJava discrete-event engine. The system is designed using object-oriented models.
Each machine is part of the network topology model. Each machine can host a Job Tracker process and a Task Tracker process; however, there is only one Job Tracker per MapReduce cluster. Each Task Tracker model can launch several map and reduce tasks, up to the maximum allowed number in the configuration files.
WHAT IS SIMJAVA?
SimJava is a discrete-event, process-oriented simulation package. It is an API that augments Java with building blocks for defining and running simulations.
Each system is considered to be a set of interacting processes, or entities as they are referred to in SimJava. These entities communicate with each other by passing events, and the simulation time progresses on the basis of these events. Progress is recorded as trace messages and saved in a file.
As of version 2.0, SimJava has been augmented with considerable statistical and reporting support.
CONSTRUCTING A SIMULATION INVOLVES :
o Coding the behavior of simulation entities, done by extending the sim_entity class and using the body() method.
o Adding instances of these entities to a sim_system object using sim_system.add(entity).
o Linking entities' ports together using sim_system.link_ports().
o Finally, setting the simulation in motion using sim_system.run().
GRIDSIM
o Allows modelling and simulation of entities in parallel and distributed computing (PDC) systems (users, applications, resources, and resource brokers/schedulers) for the design and evaluation of scheduling algorithms.
o Provides a comprehensive facility for creating different classes of heterogeneous resources that can be aggregated using resource brokers for solving compute- and data-intensive applications. A resource can be a single processor or a multi-processor with shared or distributed memory, managed by time-shared or space-shared schedulers. The processing nodes within a resource can be heterogeneous in terms of processing capability, configuration, and availability. The resource brokers use scheduling algorithms or policies for mapping jobs to resources to optimize system or user objectives, depending on their goals.
JACKSON MODEL
The Jackson API contains a lot of functionality for reading and building JSON using Java.
It has very powerful data-binding capabilities and provides a framework to serialize custom Java objects to JSON strings and deserialize JSON strings back to Java objects.
JSON written with Jackson can contain embedded class information that helps in creating the complete object tree during deserialization.
JACKSON API
// 1. Convert a Java object to JSON
ObjectMapper mapper = new ObjectMapper();
mapper.writeValue(new File("c:\\user.json"), user);

// 2. Convert JSON to a Java object
ObjectMapper mapper = new ObjectMapper();
User user = mapper.readValue(new File("c:\\user.json"), User.class);
JOB TRACKER LAYOUT
The main component of the simulator is the Job Tracker, which controls generating map and reduce tasks, monitors when the different phases complete, and produces the final results.
A map task is started by the Job Tracker. The following processes take place:
• A Java VM is instantiated for the task.
• Data is read from the local disk or requested remotely.
• Map, sort, and spill operations are performed on the input data until all of it has been consumed.
• Background file-system mergers merge the output data to reduce the number of output files to one or a few files.
• A message indicating the completion of the map task is returned to the Job Tracker.
DEMO – MRSIM
COMPARISON PARAMETERS
Number of map and reduce slots
CPU Usage
Hard-disk Utilization
Average Mapper Time
Average Reducer Time
Execution Time
JOB PROFILES
Referred from "Resource Provisioning Framework for MapReduce Jobs with Performance Goals", Abhishek Verma, Ludmila Cherkasova, and Roy H. Campbell.
TIME DURATION FOR DIFFERENT PHASES
Profile                           | Maps, Reduces | Algorithm    | T1   | T2   | T3
Profile1                          | 7, 10         | SLO          | 1398 | 1344 | 1357
                                  |               | SIGN + PRIOR | 1209 | 1207 | 1217
Profile2                          | 7, 10         | SLO          | 1367 | 1368 | 1387
                                  |               | SIGN + PRIOR | 1276 | 1256 | 1273
Profile3                          | 3, 12         | SLO          | 1397 | 1380 | 1363
                                  |               | SIGN + PRIOR | 1245 | 1288 | 1253
Profile4                          | 12, 16        | SLO          | 1320 | 1402 | 1409
                                  |               | SIGN + PRIOR | 1263 | 1285 | 1207
Profile5                          | 46, 14        | SLO          | 1316 | 1368 | 1353
                                  |               | SIGN + PRIOR | 1208 | 1254 | 1256
Profile6                          | 12, 2         | SLO          | 1342 | 1376 | 1332
                                  |               | SIGN + PRIOR | 1267 | 1265 | 1287
Profile7 (job can't be completed) | 22, 33        | SLO          | 472  | 450  | 430
                                  |               | SIGN + PRIOR | 0    | 0    | 0
Profile8                          | 16, 12        | SLO          | 1327 | 1396 | 1376
                                  |               | SIGN + PRIOR | 1233 | 1265 | 1274
MEAN TIME OVERHEADS FOR VARIOUS PHASES
SLO failed (job can't be completed within deadline) | 420
SLO executed                                        | 1334
Signature not found                                 | 1337
Signature found                                     | 937
Priority                                            | 331
COMPARISON OF BASE ALGORITHM VS PROPOSED ALGORITHM

Profile   | Mappers | Reducers | Algorithm | CPU usage   | HDD utilization | Time | Avg mapper time | Avg reducer time
Profile 1 | 60      | 1        | Base      | 0.00001429  | 0.00105         | 1919 | 28.021          | 238.179
          |         |          | Proposed  | 0.0000020   | 0.00403         | 2372 | 25.313          | 853.76
Profile 2 | 7       | 10       | Base      | 0.000001653 | 0.001834        | 5200 | 291.21          | 316.163
          |         |          | Proposed  | 0.0002732   | 0.003917        | 4095 | 283.891         | 112.045
Profile 3 | 7       | 10       | Base      | 0.000003592 | 0.0031320       | 3044 | 314.459         | 84.322
          |         |          | Proposed  | 0.00913784  | 0.01550         | 4108 | 281.432         | 114.249
Profile 4 | 3       | 12       | Base      | 0.0000023   | 0.03093         | 4259 | 1143.458        | 69.098
          |         |          | Proposed  | 0.0008095   | 0.01197         | 4066 | 425.292         | 108.949
Profile 5 | 12      | 16       | Base      | 0.000015307 | 0.002802        | 5239 | 164.185         | 204.315
          |         |          | Proposed  | 0.001846    | 0.022107        | 4240 | 286.6           | 124.45
Profile 6 | 46      | 14       | Base      | 0.000036771 | 0.0024045       | 4163 | 426.536         | 117.796
          |         |          | Proposed  | 0.0010386   | 0.01082         | 3171 | 44.416          | 105.881
Profile 7 | 12      | 2        | Base      | 0.00021723  | 0.005321        | 3986 | 205.405         | 137.099
          |         |          | Proposed  | 0.0003971   | 0.007538        | 2739 | 426.411         | 100.124
Profile 8 | 16      | 12       | Base      | 0.00010813  | 0.0028452       | 4136 | 426.987         | 75.338
          |         |          | Proposed  | 0.00478452  | 0.0093604       | 2863 | 122.479         | 114.748
CPU UTILIZATION
(chart: Base Algorithm vs Proposed Algorithm)
HARD-DISK UTILIZATION
(chart: Base Algorithm vs Proposed Algorithm)
EXECUTION TIME
(chart: Base Algorithm vs Proposed Algorithm)
AVERAGE MAPPER TIME
(chart: Base Algorithm vs Proposed Algorithm)
AVERAGE REDUCER TIME
(chart: Base Algorithm vs Proposed Algorithm)
RESULTS FOR JOB PROFILE 1
GRAPHS FOR PROFILE 1
RESULTS FOR JOB PROFILE 2
GRAPHICAL COMPARISON FOR PROFILE 2
TRACE FOR EXECUTION
INFO GUISimulator:114 - <init>- done
Initialising...
INFO HTopology:112 - initGridSim- Initializing GridSim package
Initialising...
INFO HSimulator:64 - initSimulator- creat new Result dir /home/hadoop/workspace/work/hadoop.simulator/results/26-27-Apr-2010 19:57:55
INFO HJobTracker:311 - createEntities- create topology
INFO HJobTracker:314 - createEntities- config.Heartbeat:1.0, read topology.getName:rack 0
INFO HJobTracker:318 - createEntities- init NetEnd from rack
INFO GUISimulator:389 - mnuSimStartActionPerformed- simulator has started simulator
INFO HSimulator:106 - startSimulator- Starting simulator version
INFO HSimulator:117 - startSimulator- trace level200
INFO HSimulator:120 - startSimulator- graph file: /home/hadoop/workspace/work/hadoop.simulator/results/26-27-Apr-2010 19:57:55/graph.sjg
INFO HSimulator:125 - startSimulator- going to call Sim_system.run()
Entities started.
Entity huser has no body().
INFO HJobTracker:129 - body- start entity
INFO SimoTreeCollector:94 - body- add rack {m1=m1}
INFO GUISimulator:394 - mnuSimStopActionPerformed- going to stop simulator
INFO HTopology:252 - stopSimulation- Stopping NetEnd Simulation
TRACE CONTINUED…
INFO HJobTracker:622 - stopSimulation- send end of simualtion 10.0
INFO InMemFSMergeThread:71 - body- m1-reduce-0-inMemFSMergeThread END_OF_SIMULATION 10.0
INFO CPU:148 - body- cpu_m1 END_OF_SIMULATION 10.0
INFO InMemFSMergeThread:71 - body- m1-reduce-0-inMemFSMergeThread END_OF_SIMULATION 10.0
INFO HDD:148 - body- hdd_m1 END_OF_SIMULATION 10.0
INFO InMemFSMergeThread:71 - body- m1-reduce-1-inMemFSMergeThread END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-reduce-0 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-map-1 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-map-2 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-map-0 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-reduce-1 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-map-3 END_OF_SIMULATION 10.0
INFO NetEnd:100 - body- m1 end simulation at time 10.0
INFO HTask:166 - body- m1-map-0 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-map-1 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-map-2 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-map-3 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-reduce-0 END_OF_SIMULATION 10.0
INFO HTask:166 - body- m1-reduce-1 END_OF_SIMULATION 10.0
INFO SimoTreeCollector:78 - body- simotree END_OF_SIMULATION 10.0
INFO InMemFSMergeThread:71 - body- m1-reduce-1-inMemFSMergeThread END_OF_SIMULATION 10.0
OUTPUT SNAPSHOTS FOR PROPOSED
ALGORITHM
REFERENCES
[1] E. Bortnikov, A. Frank, E. Hillel, and S. Rao, “Predicting execution bottlenecks in map-reduce clusters” In Proc. of the 4th USENIX conference on Hot Topics in Cloud computing, 2012.
[2] R. Buyya, S. K. Garg, and R. N. Calheiros, “SLA-Oriented Resource Provisioning for Cloud Computing: Challenges, Architecture, and Solutions” In International Conference on Cloud and Service Computing, 2011.
[3] S. Chaisiri, Bu-Sung Lee, and D. Niyato, “Optimization of Resource Provisioning Cost in Cloud Computing” in Transactions On Service Computing, Vol. 5, No. 2, IEEE, April-June 2012
[4] L Cherkasova and R.H. Campbell, “Resource Provisioning Framework for MapReduce Jobs with Performance Goals”, in Middleware 2011, LNCS 7049, pp. 165–186, 2011
[5] J. Dean, and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Communications of the ACM, Jan 2008
[6] Y. Hu, J. Wong, G. Iszlai, and M. Litoiu, “Resource Provisioning for Cloud Computing” In Proc. of the 2009 Conference of the Center for Advanced Studies on Collaborative Research, 2009.
[7] K. Kambatla, A. Pathak, and H. Pucha, “Towards optimizing hadoop provisioning in the cloud”, in Proc. of the First Workshop on Hot Topics in Cloud Computing, 2009
[8] Kuyoro S. O., Ibikunle F. and Awodele O., “Cloud Computing Security Issues and Challenges” in International Journal of Computer Networks (IJCN), Vol. 3, Issue 5, 2011
[9] R. Lammel, “Google’s MapReduce programming model – Revisited” in Journal of Science of Computer Programming, Oct 2007
[10] R. P. Padhy, “Big Data Processing with Hadoop-MapReduce in Cloud Systems” In International Journal of Cloud Computing and Services Science, vol. 2, Feb 2013.
[11] B. Palanisamy, A. Singh, L. Liu and B. Langston, "Cura: A Cost-Optimized Model for MapReduce in a Cloud", Proc. of 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2013)
[12] A. Rasmussen, M. Conley, R. Kapoor, V. T. Lam, G. Porter, and A. Vahdat, “Themis: An I/O-Efficient MapReduce”, Communications of the ACM, Oct 2012
[13] V. K. Reddy, B. T. Rao, L. S. S. Reddy, and P. S. Kiran, “Research Issues in Cloud Computing” in Global Journal of Computer Science and Technology, vol. 11, Jul 2011
[14] T. Sandholm and K. Lai, “MapReduce Optimization Using Regulated Dynamic Prioritization” in Social Computing Laboratory, Hewlett-Packard Laboratories, 2011
[15] F. Tian, K. Chen,”Towards Optimal Resource Provisioning for Running MapReduce Programs in Public Clouds”, in 4th Intl. Conference on Cloud Computing, IEEE, 2011
[16] Hadoop. http://hadoop.apache.org.
[17] Amazon Elastic MapReduce, http://aws.amazon.com/elasticmapreduce/