
SAMR: A Self-adaptive MapReduce Scheduling Algorithm

In Heterogeneous Environment

Authors

Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng
Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China

Song Guo
School of Computer Science and Engineering, The University of Aizu, Japan

Presented by Xiaoyu Sun

Table of Contents

Overview

Scheduling in Hadoop

Heterogeneity in Hadoop

The LATE Scheduler (Longest Approximate Time to End)

The SAMR Scheduler (A Self-adaptive MapReduce Scheduling Algorithm)

Experiment

Conclusion

Overview

[Figure: MapReduce execution overview. A user program forks a master and several workers. The master assigns map and reduce tasks. Map workers read input splits (Split 0, Split 1, Split 2), execute the map function, and write intermediate results to local disk. Reduce workers remote-read and sort the intermediate data, then write the final output files (Output File 0, Output File 1).]

The Map Step

[Figure: The map step. Each map invocation takes an input key-value pair (k, v) and emits intermediate key-value pairs.]

The Reduce Step

[Figure: The reduce step. Intermediate key-value pairs are grouped by key into key-value groups; each reduce invocation processes one group and emits output key-value pairs.]
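The two steps above can be sketched as a toy, single-process word count (an illustration only; the function names are ours, and real Hadoop distributes the work across workers and spills intermediate data to disk):

```python
from collections import defaultdict

def map_fn(_key, line):
    # Map step: one input pair (k, v) -> intermediate pairs (word, 1)
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce step: one key-value group -> one output pair
    return (word, sum(counts))

def mapreduce(inputs):
    groups = defaultdict(list)
    for key, value in inputs:              # map every input pair
        for ik, iv in map_fn(key, value):
            groups[ik].append(iv)          # group by intermediate key
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

print(mapreduce([(0, "a b a"), (1, "b c")]))  # {'a': 2, 'b': 2, 'c': 1}
```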

Overview

Google has noted that speculative execution improves response time by 44%

The paper shows an efficient way to do speculative execution in order to maximize performance

It also shows that Hadoop’s simple speculative algorithm, which compares each task’s progress to the average progress, breaks down in heterogeneous systems

Overview

The proposed scheduling algorithm decreases Hadoop’s response time

The paper addresses two important problems in speculative execution:

Choosing the best node to run the speculative task

Distinguishing between nodes slightly slower than the mean and stragglers

Scheduling in Hadoop

Assumptions made by Hadoop Scheduler:

Nodes can perform work at roughly the same rate

Tasks progress at a constant rate throughout time

Scheduling in Hadoop

Hadoop assigns fixed weights to the stages of each task when computing its progress score (PS):

Map task: M1 = 1 (execute map function), M2 = 0 (reorder intermediate results)

Reduce task: R1 = 1/3 (copy data), R2 = 1/3 (order), R3 = 1/3 (merge)
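With these weights, a task’s progress score is the weighted sum of its per-stage completion. A minimal sketch (the helper name is ours, not Hadoop’s API):

```python
# Fixed stage weights from the slide: map = (M1, M2), reduce = (R1, R2, R3).
MAP_WEIGHTS = (1.0, 0.0)
REDUCE_WEIGHTS = (1/3, 1/3, 1/3)

def progress_score(stage_completion, weights):
    """Weighted sum of per-stage completion fractions, each in [0, 1]."""
    return sum(w * c for w, c in zip(weights, stage_completion))

# Reduce task: copy done, order done, merge 75% complete
ps = progress_score((1.0, 1.0, 0.75), REDUCE_WEIGHTS)  # 11/12, about 0.9167
```

Note that the fractions on the following slides (1/3 + 1/3 + 1/4 = 11/12) are each stage’s weighted contribution, so a merge contribution of 1/4 means the merge phase is 3/4 complete.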

Scheduling in Hadoop

Example (each fraction is a stage’s contribution to the progress score):

Task1: copy 1/3 done, sort 1/3 done, merge 1/4 processing → PS = 11/12

Task2: copy 1/3 done, sort 1/3 done, merge 1/4 processing → PS = 11/12

Task3: copy 1/3 done, sort 1/5 processing → PS = 8/15

If the average PS is 10/15, Task3 falls below it and is marked (X) for speculative execution.
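Hadoop’s heuristic speculates tasks whose progress score falls below the average (Hadoop’s actual rule also subtracts a fixed margin of 0.2; the slide compares against the average directly). A sketch, with the average computed over the tasks themselves:

```python
def tasks_to_speculate(progress_scores, margin=0.0):
    """Return tasks whose progress score is below the average minus a margin."""
    avg = sum(progress_scores.values()) / len(progress_scores)
    return [t for t, ps in progress_scores.items() if ps < avg - margin]

scores = {"Task1": 11/12, "Task2": 11/12, "Task3": 8/15}
print(tasks_to_speculate(scores))  # ['Task3']
```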

Scheduling in Hadoop

The same example with elapsed times:

Task1: copy 1/3 done, sort 1/3 done, merge 1/4 processing → PS = 11/12 after 20 s

Task2: copy 1/3 done, sort 1/3 done, merge 1/4 processing → PS = 11/12 after 60 s

Task3: copy 1/3 done, sort 1/5 processing, merge waiting → PS = 8/15 after 40 s

Task3 (X) is still the one speculated: the progress-score comparison ignores how long each task has been running.

Scheduling in Hadoop

Task1: copy 1/3 done, sort 1/4 done, merge waiting → PS = 7/12 after 20 s

Task2: copy 1/3 done, sort 1/12 processing, merge waiting → PS = 5/12 after 40 s

Both tasks are marked (X), even though Task1 (7/12 in 20 s) is progressing far faster than Task2 (5/12 in 40 s).

Scheduling in Hadoop

Task1: copy 1/3 done, sort waiting, merge waiting → PS = 1/3 after 180 s (no data locality)

Task2: copy 1/3 done, sort 1/12 processing, merge waiting → PS = 5/12 after 20 s (data locality)

Task1 is marked (X) for speculation, but Hadoop takes no account of data locality when placing the backup task.

The LATE Scheduler

LATE (Longest Approximate Time to End) keeps Hadoop’s fixed stage weights:

Map task: M1 = 1 (execute map function), M2 = 0 (reorder intermediate results)

Reduce task: R1 = 1/3 (copy data), R2 = 1/3 (order), R3 = 1/3 (merge)

The LATE Scheduler

Task1: copy 1/3 done, sort 1/3 done, merge 1/4 processing → PS = 11/12 after 40 s

Task2: copy 1/3 done, sort 1/4 done, merge waiting → PS = 7/12 after 30 s

Instead of comparing progress scores, LATE estimates each task’s time to end from its progress rate: about 3.6 s remain for Task1 and about 21.4 s for Task2, so Task2 is the speculation candidate.
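LATE’s estimate for the example above can be reproduced numerically (the helper name is ours):

```python
def time_to_end(progress_score, elapsed_seconds):
    """LATE: progress rate = PS / T; estimated time left = (1 - PS) / rate."""
    rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / rate

t1 = time_to_end(11/12, 40)  # ~3.6 s remaining for Task1
t2 = time_to_end(7/12, 30)   # ~21.4 s remaining for Task2

# LATE speculates the task with the longest approximate time to end
print(max([("Task1", t1), ("Task2", t2)], key=lambda p: p[1])[0])  # Task2
```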

The LATE Scheduler

Task1: copy 1/3 done, sort waiting, merge waiting → PS = 1/3 after 180 s (no data locality)

Task2: copy 1/3 done, sort 1/12 processing, merge waiting → PS = 5/12 after 20 s (data locality)

By estimated time to end, Task1 (about 360 s remaining) is speculated (X) rather than Task2 (about 28 s remaining). Note the locality labels: Task1 lacks data locality, Task2 has it.

The LATE Scheduler

To give the backup task the best chance of beating the original, LATE launches speculative tasks only on fast nodes

It does this using a SlowNodeThreshold, a metric of the total work a node has performed

Because speculative tasks cost resources, LATE uses two additional heuristics:

A cap on the number of speculative tasks running at once (SpeculativeCap)

A SlowTaskThreshold that determines whether a task is slow enough to be speculated (compared by progress rate)
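The heuristics combine into a decision like the sketch below. All names are ours, and the SlowTaskThreshold is simplified to a fraction of the average progress rate (LATE’s actual thresholds are percentiles); this is an illustration, not the Hadoop/LATE implementation:

```python
def time_left(ps, elapsed):
    # LATE's estimate: (1 - PS) / (PS / T)
    return (1.0 - ps) / (ps / elapsed)

def late_pick_backup(tasks, node_is_fast, running_backups,
                     speculative_cap=2, slow_task_frac=0.5):
    """Pick one task to speculate, or None.

    tasks: dict name -> (progress_score, elapsed_seconds)
    node_is_fast: whether the requesting node passed the SlowNodeThreshold check
    """
    if not node_is_fast:                    # backups only on fast nodes
        return None
    if running_backups >= speculative_cap:  # cap concurrent speculation
        return None
    rates = {t: ps / sec for t, (ps, sec) in tasks.items()}
    avg = sum(rates.values()) / len(rates)
    # SlowTaskThreshold stand-in: progress rate well below the average rate
    slow = [t for t, r in rates.items() if r < slow_task_frac * avg]
    if not slow:
        return None
    # Among slow tasks, take the Longest Approximate Time to End
    return max(slow, key=lambda t: time_left(*tasks[t]))

tasks = {"Task1": (11/12, 40), "Task2": (7/12, 30), "Task3": (1/3, 180)}
print(late_pick_backup(tasks, node_is_fast=True, running_backups=0))  # Task3
```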

The SAMR Scheduler

In SAMR, the stage weights are no longer fixed; they are learned per node from historical information:

Map task: M1 = ? (execute map function), M2 = ? (reorder intermediate results)

Reduce task: R1 = ? (copy data), R2 = ? (order)

The SAMR Scheduler

How SAMR uses and updates historical information
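The experiments later tune a history proportion HP (0.2 in the experiments). One plausible update form, assumed here as a simple convex combination for illustration (the exact formula is in the paper):

```python
def update_weight(historical, measured, hp=0.2):
    """Blend a stored stage weight with a freshly measured value.

    hp is the history proportion: how much of the old value to keep.
    Assumed convex-combination form; illustrative only.
    """
    return hp * historical + (1.0 - hp) * measured

# Stored R1 = 1/3; on this node the copy stage measured 50% of the work
new_r1 = update_weight(1/3, 0.5, hp=0.2)  # about 0.467
```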

The SAMR Scheduler

SLOW_TASK_CAP (STaC)

The SAMR Scheduler

SLOW_TRACKER_CAP (STrC)

The SAMR Scheduler

SLOW_TRACKER_PRO (STrP)

SlowTrackerNum < STrP * TrackerNum (14)

The SAMR Scheduler

Launching backup tasks: the number of backup tasks is capped by

BackupNum < BP * TaskNum (15)

where BP is the backup proportion.
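Inequalities (14) and (15) act as admission checks before SAMR marks another tracker as slow or launches another backup task. A small sketch (function names are ours; the defaults use the parameter values from the experiments):

```python
def may_mark_tracker_slow(slow_tracker_num, tracker_num, strp=0.3):
    # Inequality (14): SlowTrackerNum < STrP * TrackerNum
    return slow_tracker_num < strp * tracker_num

def may_launch_backup(backup_num, task_num, bp=0.2):
    # Inequality (15): BackupNum < BP * TaskNum
    return backup_num < bp * task_num

print(may_mark_tracker_slow(2, 8))  # True:  2 < 0.3 * 8 = 2.4
print(may_launch_backup(2, 10))     # False: 2 < 0.2 * 10 = 2 fails
```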

The SAMR Scheduler

Experiment

Effect of “HP” on execution time

Experiment

Effect of “STaC”, “STrC”, and “STrP” on execution time

Experiment

Effect of “BP” on execution time

Experiment

Historical information versus real information on all 8 nodes

Experiment

HP = 0.2, STaC = 0.3, STrC = 0.2, STrP = 0.3, and BP = 0.2

Experiment

Execution results of “Sort” on the experiment platform.

Experiment

LATE decreases execution time by about 7%

LATE with historical information decreases execution time by about 15%

SAMR decreases execution time by about 24% compared to Hadoop

Conclusion

Identified the problems in Hadoop’s scheduler

Compared two schedulers that improve MapReduce performance in heterogeneous environments

Future work: how to improve the performance of SAMR

Thanks