Cloud schedulers and Scheduling in Hadoop

CLOUD SCHEDULERS
PALLAV JHA (10-1-5-023), PRABHAKAR BARUA (10-1-5-017), PRABODH HEND (10-1-5-053), JUGAL ASSUDANI (10-1-5-068), PREM CHANDRA (09-1-5-062)

Description

This presentation describes some of the features of different cloud schedulers

Transcript of Cloud schedulers and Scheduling in Hadoop

Page 1: Cloud schedulers and Scheduling in Hadoop

CLOUD SCHEDULERS

PALLAV JHA (10-1-5-023)
PRABHAKAR BARUA (10-1-5-017)
PRABODH HEND (10-1-5-053)
JUGAL ASSUDANI (10-1-5-068)
PREM CHANDRA (09-1-5-062)

Page 2: Cloud schedulers and Scheduling in Hadoop

SCHEDULING IN HADOOP

When a node has an empty task slot, Hadoop chooses a task for it from one of three categories. First, any failed task is given highest priority; this is done to detect when a task fails repeatedly due to a bug and to stop the job. Second, unscheduled tasks are considered; for maps, tasks with data local to the node are chosen first. Finally, Hadoop looks for a task to speculate on.

To select speculative tasks, Hadoop monitors task progress using a progress score, which is a number from 0 to 1. For a map, the score is the fraction of input data read. For a reduce task, the execution is divided into three phases, each of which accounts for 1/3 of the score:
1. The copy phase, when the task is copying the outputs of all maps. In this phase, the score is the percentage of maps whose output has been copied.
2. The sort phase, when map outputs are sorted by key. Here the score is the percentage of data merged.
3. The reduce phase, when a user-defined function is applied to the map outputs. Here the score is the percentage of data passed through the reduce function.
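As an illustration, here is a minimal sketch (not Hadoop's actual code; the type and field names are invented, and the weights follow the description above) of how the two progress scores could be computed:

// Sketch only: assumes each reduce phase reports its own completion fraction in [0, 1].
case class ReducePhaseProgress(copy: Double, sort: Double, reduce: Double)

def mapProgressScore(bytesRead: Long, totalInputBytes: Long): Double =
  bytesRead.toDouble / totalInputBytes            // fraction of input data read

def reduceProgressScore(p: ReducePhaseProgress): Double =
  (p.copy + p.sort + p.reduce) / 3.0              // each phase contributes 1/3 of the score

// Example: copy finished, merge half done, reduce not started => score = (1.0 + 0.5 + 0.0) / 3 = 0.5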

Page 3: Cloud schedulers and Scheduling in Hadoop

ASSUMPTIONS IN HADOOP’S SCHEDULER
1. Nodes can perform work at roughly the same rate.
2. Tasks progress at a constant rate throughout time.
3. There is no cost to launching a speculative task on a node that would otherwise have an idle slot.
4. A task’s progress score is roughly equal to the fraction of its total work that it has done. Specifically, in a reduce task, the copy, sort and reduce phases each take 1/3 of the total time.
5. Tasks tend to finish in waves, so a task with a low progress score is likely a slow task.
6. Different tasks of the same category (map or reduce) require roughly the same amount of work.

Page 4: Cloud schedulers and Scheduling in Hadoop

RDD
An RDD is a read-only, partitioned collection of records. RDDs can only be created through deterministic operations on either (1) data in stable storage or (2) other RDDs. We call these operations transformations to differentiate them from other operations on RDDs. Examples of transformations include map, filter, and join.
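For illustration, a minimal Spark sketch of these transformations; the file paths and field layout are made up for the example, and spark denotes the SparkContext, as in the log-mining example later in the deck:

// Hypothetical inputs: CSV files keyed by their first field.
val users  = spark.textFile("hdfs://.../users.csv")
                  .map(line => (line.split(",")(0), line))        // map: key each record by its first field
val visits = spark.textFile("hdfs://.../visits.csv")
                  .map(line => (line.split(",")(0), line))
val nonEmpty = visits.filter { case (_, line) => line.nonEmpty }  // filter: keep only non-empty records
val joined   = users.join(nonEmpty)                               // join: pair up records that share a key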

RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure.

Page 5: Cloud schedulers and Scheduling in Hadoop

RDD

RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure.

Page 6: Cloud schedulers and Scheduling in Hadoop

APPLICATION OF RDD: “LOG MINING”

val lines = spark.textFile("hdfs://...")             // load the log file as an RDD of lines
val errors = lines.filter(_.startsWith("ERROR"))     // keep only the ERROR lines
val messages = errors.map(_.split('\t')(2))          // extract the message field (third tab-separated column)
messages.cache()                                     // keep the filtered messages in memory for reuse

messages.filter(_.contains("foo")).count()           // count error messages mentioning "foo"
messages.filter(_.contains("bar")).count()           // count error messages mentioning "bar"

Page 7: Cloud schedulers and Scheduling in Hadoop

RDD FAULT TOLERANCE
RDDs track the series of transformations used to build them (their lineage) to recompute lost data.

E.g.:
messages = textFile(...).filter(_.contains("error"))
                        .map(_.split('\t')(2))

Page 8: Cloud schedulers and Scheduling in Hadoop

NAIVE FAIR SHARING ALGORITHM

1. when a heartbeat is received from node n:
2.   if n has a free slot then
3.     sort jobs in increasing order of number of running tasks
4.     for j in jobs do
5.       if j has unlaunched task t with data on n then
6.         launch t on n
7.       else if j has unlaunched task t then
8.         launch t on n
9.       end if
10.    end for
11.  end if
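A compact Scala rendering of the same policy; the Job and Node types here are stand-ins invented for the sketch, not Hadoop classes:

// Sketch only: hypothetical types standing in for Hadoop's job and node bookkeeping.
case class Node(id: String)
case class Job(id: String,
               runningTasks: Int,
               localPending: Node => Option[String],   // an unlaunched task with data on the node, if any
               anyPending:   () => Option[String])     // any unlaunched task, if any

def assignSlot(jobs: Seq[Job], n: Node): Option[(Job, String)] =
  jobs.sortBy(_.runningTasks)                          // fewest running tasks first (fair sharing)
      .view
      .flatMap(j => j.localPending(n).orElse(j.anyPending()).map(t => (j, t)))
      .headOption                                      // launch the first match on node n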

Page 9: Cloud schedulers and Scheduling in Hadoop

DELAY SCHEDULING

1. Initialize j.skipcount to 0 for all jobs j.
2. when a heartbeat is received from node n:
3.   if n has a free slot then
4.     sort jobs in increasing order of number of running tasks
5.     for j in jobs do
6.       if j has unlaunched task t with data on n then
7.         launch t on n
8.         set j.skipcount = 0
9.       else if j has unlaunched task t then
10.        if j.skipcount >= D then
11.          launch t on n
12.        else
13.          set j.skipcount = j.skipcount + 1
14.        end if
15.      end if
16.    end for
17.  end if
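The skipcount rule could be sketched in Scala as follows; DelayedJob and DelayedNode are hypothetical stand-in types, and D is the maximum number of times a job may be skipped before it is allowed a non-local task:

// Sketch only: mirrors the pseudocode above, not Hadoop's fair scheduler.
case class DelayedNode(id: String)
class DelayedJob(val id: String,
                 var runningTasks: Int,
                 val localPending: DelayedNode => Option[String],
                 val anyPending:   () => Option[String]) {
  var skipCount: Int = 0
}

def assignWithDelay(jobs: Seq[DelayedJob], n: DelayedNode, D: Int): Option[(DelayedJob, String)] = {
  for (j <- jobs.sortBy(_.runningTasks)) {
    j.localPending(n) match {
      case Some(t) =>
        j.skipCount = 0                    // local task found: reset the skip counter
        return Some((j, t))
      case None =>
        j.anyPending() match {
          case Some(t) if j.skipCount >= D =>
            return Some((j, t))            // skipped D times already: accept a non-local task
          case Some(_) =>
            j.skipCount += 1               // skip this job for now, hoping for locality later
          case None => ()                  // job has nothing to launch
        }
    }
  }
  None
}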

Page 10: Cloud schedulers and Scheduling in Hadoop

LATE (LONGEST APPROXIMATE TIME TO END) SCHEDULER

The LATE algorithm works as follows:
• If a task slot becomes available and there are fewer than SpeculativeCap speculative tasks running:
  – Ignore the request if the node’s total progress is below SlowNodeThreshold.
  – Rank currently running, non-speculatively executed tasks by estimated time left.
  – Launch a copy of the highest-ranked task with progress rate below SlowTaskThreshold.
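The ranking step relies on LATE's estimate of each task's remaining time from its progress score and elapsed time; a minimal sketch (field names are illustrative, not Hadoop's):

// Sketch only: progress rate and estimated time left as used by LATE's heuristic.
case class RunningTask(id: String, progressScore: Double, secondsRunning: Double)

def progressRate(t: RunningTask): Double =
  t.progressScore / t.secondsRunning                   // score gained per second so far

def estimatedTimeLeft(t: RunningTask): Double =
  (1.0 - t.progressScore) / progressRate(t)            // assumes the task keeps its current rate

// Candidates for speculation: slow tasks, ranked so the longest estimated time to end comes first.
def rankForSpeculation(tasks: Seq[RunningTask], slowTaskThreshold: Double): Seq[RunningTask] =
  tasks.filter(t => progressRate(t) < slowTaskThreshold)
       .sortBy(t => -estimatedTimeLeft(t))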

Page 11: Cloud schedulers and Scheduling in Hadoop

LIMITATIONS: FSS
Locality Problems with Fair Sharing
The main aspect of MapReduce that complicates scheduling is the need to place tasks near their input data. Locality increases throughput because network bandwidth in a large cluster is much lower than the total bandwidth of the cluster’s disks. Running on a node that contains the data (node locality) is most efficient, but when this is not possible, running on the same rack (rack locality) is faster than running off-rack. Fair share scheduling, however, only considers node locality.

Page 12: Cloud schedulers and Scheduling in Hadoop

LIMITATIONS: FSS
Head-of-line scheduling:
The first locality problem occurs in small jobs (jobs that have small input files and hence a small number of data blocks to read). The problem is that whenever a job reaches the head of the sorted list in the fair sharing algorithm (i.e. has the fewest running tasks), one of its tasks is launched on the next slot that becomes free, no matter which node this slot is on. If the head-of-line job is small, it is unlikely to have data on the node that is given to it. For example, a job with data on 10% of nodes will only achieve 10% locality.

Page 13: Cloud schedulers and Scheduling in Hadoop

LIMITATIONS: FSS
Sticky Slots:
The problem is that there is a tendency for a job to be assigned the same slot repeatedly.

Suppose that job j’s fractional share of the cluster is f. Then for any given block b, the probability that none of j’s slots are on a node with a copy of b is (1 − f)^(RL): there are R replicas of b, each replica is on a node with L slots, and the probability that a given slot does not belong to j is 1 − f. Therefore, j is expected to achieve at most 1 − (1 − f)^(RL) locality.
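As a quick numerical check of this bound (the numbers below are hypothetical, not taken from the slides):

// Upper bound on expected node locality from the sticky-slots argument above.
def localityBound(f: Double, replicas: Int, slotsPerNode: Int): Double =
  1.0 - math.pow(1.0 - f, replicas * slotsPerNode)

// Example: a job with a 10% share of the cluster, R = 3 replicas, L = 2 slots per node:
// localityBound(0.1, 3, 2) = 1 - 0.9^6 ≈ 0.47, i.e. at most about 47% node locality.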

Page 14: Cloud schedulers and Scheduling in Hadoop

LIMITATIONS: DELAY SCHEDULING
Long Task Balancing:
To lower the chance that a node fills with long tasks, we can spread long tasks throughout the cluster by changing the locality test in the algorithm to prevent jobs with long tasks from launching tasks on nodes that are running a higher-than-average number of long tasks. Although we do not know which jobs have long tasks in advance, we can treat new jobs as long-task jobs and mark them as short-task jobs if their tasks finish quickly.

Page 15: Cloud schedulers and Scheduling in Hadoop

LIMITATIONS: DELAY SCHEDULING

Hotspots are only likely to occur if multiple jobs need to read the same data file, and that file is small enough that copies of its blocks are only present on a small fraction of nodes. In this case, no scheduling algorithm can achieve high locality without excessive queueing delays.

Page 16: Cloud schedulers and Scheduling in Hadoop

LIMITATIONS: LATE SCHEDULER
If a commodity machine is running behind its peers, this scheduler simply marks it as a straggler instead of trying to find out why it is behaving that way. The complications are significant: because the scheduler does not check whether the defect is temporary or permanently crippling, the node is given no further tasks for the entire duration of the computation.

Page 17: Cloud schedulers and Scheduling in Hadoop

PROPOSITION
We have tried to create a scheduler which may be able to circumvent the limitations described earlier. In this scheduler, a task is enqueued into the priority queues of the nodes where the data for that task is available.

Algorithm
1. Retrieve the list of local nodes for the arriving task.
2. Set n = REPLICATION.FACTOR.
3. Create n instances of the task in the priority queues of the n task trackers where the data for that task is local, each instance with a different priority value.
4. Tasks are executed in accordance with their priority status: a task whose priority is other than 1 will have to skip that many tasks, if and only if those tasks have a higher priority and arrived later than it did. (A sketch of the enqueueing step follows.)
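A minimal sketch of the enqueueing step under these assumptions; the QueuedTask and TaskTrackerQueue types, and the rule that the i-th replica gets priority i, are invented for illustration rather than taken from the authors' implementation:

import scala.collection.mutable

// Hypothetical types for the sketch; a lower priority value is served earlier.
case class QueuedTask(taskId: String, priority: Int)

class TaskTrackerQueue(val id: String) {
  val queue: mutable.PriorityQueue[QueuedTask] =
    mutable.PriorityQueue.empty(Ordering.by[QueuedTask, Int](_.priority).reverse) // dequeue priority 1 first
}

// Enqueue one instance of the task on each tracker holding a replica of its input data,
// giving the i-th replica priority i, so the number of instances equals the replication factor.
def enqueueOnLocalNodes(taskId: String, localTrackers: Seq[TaskTrackerQueue]): Unit =
  localTrackers.zipWithIndex.foreach { case (tt, i) =>
    tt.queue.enqueue(QueuedTask(taskId, priority = i + 1))
  }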

Page 18: Cloud schedulers and Scheduling in Hadoop

PROPOSITIONS
Example: per-TaskTracker priority queues after enqueueing (entries Tj-k are the numbered instances of task Tj):
TT1: T1-1, T4-2, T5-1
TT2: T2-1, T5-2
TT3: T1-2, T2-3
TT4: T4-1, T2-2
TT5: T1-3, T4-3, T5-3

Page 19: Cloud schedulers and Scheduling in Hadoop

PROPOSITIONS
Example continued: the queues after the head task of each TaskTracker has been dequeued:
TT1: T4-2, T5-1
TT2: T5-2
TT3: T2-3
TT4: T2-2
TT5: T4-3, T5-3

Page 20: Cloud schedulers and Scheduling in Hadoop

EXPERIMENT AND RESULTS
FCFS Simulation: 50 Cloudlets; 8 VMs; total time to complete: 1165.96 ms
PQRST Simulation: 50 Cloudlets; 8 VMs; total time to complete: 1120.10 ms

Page 21: Cloud schedulers and Scheduling in Hadoop

THANK YOU