POWER EFFICIENT RESOURCE
ALLOCATION IN HIGH PERFORMANCE
COMPUTING SYSTEMS
By
Hameed Hussain
CIIT/FA09-PCS-003/ISB
PhD thesis
In
Computer Science
COMSATS Institute of Information Technology,
Islamabad-Pakistan
Fall, 2016
COMSATS Institute of Information Technology
Power Efficient Resource Allocation in High
Performance Computing Systems
A Thesis Presented to
COMSATS Institute of Information Technology, Islamabad
In partial fulfillment
of the requirement for the degree of
Ph.D in Computer Science
By
Hameed Hussain
CIIT/FA09-PCS-003/ISB
Fall, 2016
Power Efficient Resource Allocation in High
Performance Computing Systems
________________________________________________________________________
A Post Graduate Thesis submitted to the Department of Computer Science as
partial fulfillment of the requirement for the award of Degree of Ph.D in Computer
Science.
Name: Hameed Hussain
Registration Number: CIIT/FA09-PCS-003/ISB
Supervisor
Dr. Nasro Min Allah
Associate Professor,
Department of Computer Science
COMSATS Institute of Information Technology (CIIT)
Islamabad Campus.
October, 2016
Certificate of Approval
This is to certify that the research work presented in this thesis, entitled "Power Efficient Resource Allocation in High Performance Computing Systems", was conducted by Hameed Hussain, registration number CIIT/FA09-PCS-003/ISB, under the supervision of Dr. Nasro Min-Allah. No part of this thesis has been submitted anywhere else for any other degree. This thesis is submitted to the Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, in partial fulfillment of the requirement for the degree of Doctor of Philosophy in the field of Computer Science.
Student Name: Hameed Hussain Signature: __________________
Examination Committee:
External Examiner 1: Prof. Dr. Bhawani Shankar Chowdhry
Dean, Faculty of Electrical, Electronics and Computer Engineering, MUET, Jamshoro, Pakistan

External Examiner 2: Prof. Dr. Malik Sikandar Hayat Khiyal
Department of Computer Science, Preston University, Islamabad, Pakistan

Supervisor: Dr. Nasro Min-Allah
Department of Computer Science, CIIT, Islamabad

Head of Department: Dr. Majid Iqbal Khan
Department of Computer Science, CIIT, Islamabad

Chairperson: Prof. Dr. Zulfiqar Habib
Computer Science, CIIT

Dean: Prof. Dr. Syed Asad Hussain
Faculty of Information Sciences and Technology, CIIT
Author’s Declaration
I, Hameed Hussain, reg. no. CIIT/FA09-PCS-003/ISB, hereby state that my PhD thesis titled "Power Efficient Resource Allocation in High Performance Computing Systems" is my own work and has not been submitted previously by me for obtaining any degree from this university, i.e., COMSATS Institute of Information Technology, or anywhere else in the country/world.
At any time, if my statement is found to be incorrect, even after I graduate, the University has the right to withdraw my PhD degree.
Date: _____________________ Signature of the student
(Thesis submission)
Hameed Hussain
CIIT/FA09-PCS-003/ISB
Plagiarism Undertaking
I solemnly declare that the research work presented in this thesis, titled "Power Efficient Resource Allocation in High Performance Computing Systems", is solely my research work with no significant contribution from any other person. Small contributions/help, wherever taken, have been duly acknowledged, and the complete thesis has been written by me.
I understand the zero tolerance policy of the HEC and COMSATS Institute of Information Technology towards plagiarism. Therefore, I, as an author of the above-titled thesis, declare that no portion of my thesis has been plagiarized and that any material used as reference is properly referred/cited.
I undertake that if I am found guilty of any formal plagiarism in the above-titled thesis, even after the award of the PhD degree, the University reserves the right to withdraw/revoke my PhD degree, and the HEC and the University have the right to publish my name on the HEC/University website on which the names of students who submitted plagiarized theses are placed.
Date: _____________________ Signature of the student
(Thesis submission)
Hameed Hussain CIIT/FA09-PCS-003/ISB
Certificate
It is certified that Mr. Hameed Hussain, reg. no. CIIT/FA09-PCS-003/ISB, has carried out all the work related to this thesis under my supervision at the Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, and that the work fulfills the requirements for the award of a PhD degree.
Date: __________________ Supervisor:
(Thesis submission)
_________________________
Dr. Nasro Min Allah
Associate Professor,
Department of Computer Science
CIIT, Islamabad
Head of Department:
______________________________
Dr. Majid Iqbal Khan
Associate Professor,
Department of Computer Science,
CIIT, Islamabad
DEDICATION
To my parents, who are like a cool shade in the noontide of my life and who sacrificed their today for my tomorrow; particularly to my late mother, whose hands grew weary praying for my success; and to those who pray for me and have encouraged me throughout my educational career.
ACKNOWLEDGEMENTS
I offer my heartiest "Drood-o-salam" to the Holy Prophet Muhammad (peace be upon him). I am grateful to Almighty Allah, who is merciful and beneficent, and who enabled me to carry out this research successfully. The accomplishment of a research thesis requires the help of many people who steer, guide, encourage, and assist you, and many people guided and supported me; they were always there to help me in times of need. First, I would like to express my sincere gratitude to my supervisors, Dr. Nasro Min Allah and Dr. Manzoor Illahi Tamimy, associate professors at COMSATS Institute of Information Technology (CIIT) Islamabad, for their esteemed supervision, encouragement, and guidance toward the successful completion of this research work. Secondly, I am grateful to all the faculty members of the Computer Science department at CIIT Islamabad for their timely and unconditional help. They always encouraged me, helped me understand my research problems, and guided me in coping with the issues faced during this research work. I am also thankful to my friends, especially Muhammad Bilal Qureshi, who encouraged me to complete this work. I am wholeheartedly grateful to my parents, brothers, sisters, and wife for their gracious, unconditional support, prayers, and encouragement throughout my educational career. I am thankful to Almighty Allah, who blessed me with a daughter, Anfal, and two sons, Muhammad Bilal and Muhammad Talha. The love of my children encouraged me and added new motivation to my work, which was the real source of energy for giving the final touches to this thesis.
Hameed Hussain,
CIIT/FA09-PCS-003/ISB
ABSTRACT
Power Efficient Resource Allocation in High Performance
Computing Systems
Efficient resource allocation is a fundamental requirement in High Performance Computing (HPC) systems. Many projects dedicated to large-scale distributed computing systems have designed and developed resource allocation mechanisms with a variety of architectures and services. Resource allocation mechanisms and strategies play a vital role in the performance improvement of all classes of high performance computing systems. Therefore, a comprehensive discussion of the widely used resource allocation strategies deployed in distributed high performance computing environments is required. The author classifies distributed high performance computing systems into three broad categories, namely: (a) cluster, (b) grid, and (c) cloud systems, and defines the characteristics of each class by extracting sets of common attributes. All of the aforementioned systems are cataloged into pure software and hybrid/hardware solutions. The system classification is used to identify the approaches followed by the implementations of existing resource allocation strategies that are widely presented in the literature.
High performance computing systems offer more computational power to cope with CPU-intensive applications. However, this facility comes at the price of higher energy consumption and, eventually, higher heat dissipation. As a remedy, these issues are being addressed by adjusting the system speed on the fly so that application deadlines are respected while the overall system energy consumption is reduced. In addition, the current state of the art in high performance computing, particularly multi-core technology, opens further research opportunities for energy reduction through power efficient scheduling. However, the multi-core front is relatively unexplored from the perspective of task scheduling. To the best of our knowledge, very little work has yet been done to integrate a power efficiency component into real-time scheduling theory tailored for high performance computing, particularly multi-core platforms. In these efforts, the author first proposes a technique to find the least feasible speed at which to schedule individual tasks. The proposed technique is experimentally evaluated, and the results show the superiority of our approach over the existing counterpart, called first feasible speed. However, this solution comes at the cost of a delayed response time. The experimental results are in accordance with the mathematical formulation established in this work. To further minimize power consumption, the author makes another attempt by applying a genetic algorithm on top of first feasible speed. The newly proposed approach is termed genetic algorithm with first feasible speed. The author compares the results obtained through the aforementioned approach with existing techniques. It is worth mentioning that the proposed technique outperforms first feasible speed and least feasible speed with respect to energy consumption and response time, respectively.
Load balancing is also vital for the efficient and equal utilization of computing units (systems or cores). To balance load among computing units, the author applies lightest task-migration (task-shifting) and task-splitting mechanisms in a multi-core environment. In task shifting, the task with the minimum load on a highly utilized computing unit is fully transferred to a lightly utilized computing unit. In task splitting, the load of a task from a highly utilized computing unit is shared between that computing unit and a lightly utilized computing unit. The given results show that the task splitting mechanism fully balances the load, but is more time consuming than the task shifting strategy.
Keywords: High performance computing, Real-time systems, Scheduling, Resource allocation, Resource management, Genetic algorithm, Task migration, Task splitting.
TABLE OF CONTENTS
Chapter 1: Introduction ..................................................................................1
1.1 Introduction ...................................................................................................2
1.2 Motivation ......................................................................................................6
1.3 Problem Statement .........................................................................................6
1.4 Research Issues ..............................................................................................7
1.5 Contributions of The Thesis ..........................................................................8
1.6 Organization of The Thesis ...........................................................................9
1.7 Summary ...................................................................................................... 10
Chapter 2: RA in High Performance Distributed Computing Systems ........ 11
2.1 Introduction ................................................................................................. 12
2.2 Overview of Distributed HPC Systems ...................................................... 13
2.2.1 Distributed HPC Systems Classes ........................................................... 15
2.2.2 Cluster Computer Systems: Features and Requirements ........................ 20
2.2.3 Grid Computer Systems: Features and Requirements ............................ 24
2.2.4 Cloud Computing Systems: Features and Requirements ........................ 28
2.3 Comparison and Survey of The Existing HPC Solutions ........................... 32
2.3.1 Cluster Computing Systems .................................................................... 33
2.3.2 Grid Computing Systems ......................................................................... 40
2.3.3 Cloud Computing Systems ...................................................................... 50
2.4 Classification of Systems ............................................................................ 54
2.4.1. Software Only Solutions .......................................................................... 55
2.4.2. Hardware/Hybrid Only Solutions ............................................................ 55
2.5 Conclusion of the Chapter ........................................................................... 56
Chapter 3: Power Efficient Resource Allocation Using LFS ........................ 57
3.1 Introduction .......................................................................................... 58
3.2 System Model and Background ............................................................. 59
3.3 Lowest Speed Calculations .................................................................... 62
3.4 Experimental Analysis .......................................................................... 65
3.4.1 Determining the Lowest Speed ................................................................ 65
3.4.2 Energy Savings ........................................................................................ 68
3.5 Task Partitioning in Multi-core Systems ................................................ 73
3.6 Task Mapping on Cores ......................................................................... 77
3.7 Conclusion of the Chapter ..................................................................... 80
Chapter 4: Power Efficient Resource Allocation Using GA-FFS ................. 82
4.1 Introduction ................................................................................................. 83
4.2 Proposed Work ............................................................................................ 83
4.2.1 Main Drivers of Genetic Algorithm ........................................................ 86
4.2.1.1 Cross Over ............................................................................................ 86
4.2.1.2 Mutation ................................................................................................ 87
4.2.2 Feasibility Checking Through GA-FFS Approach ................................. 88
4.3 Experimental Results and Analysis ............................................................ 89
4.4 Conclusion of the Chapter ........................................................................... 93
Chapter 5: Resource Allocation Using Load Balancing Mechanisms .......... 95
5.1 Introduction .......................................................................................... 96
5.2 Load Balancing Mechanisms ................................................................. 97
5.2.1 Task Migration or Task Shifting ............................................................. 98
5.2.2 Task Splitting ........................................................................................... 98
5.2.3 Explanation Through an Example ........................................................... 99
5.3 Results and Discussions ...................................................................... 100
5.4 Conclusion of the Chapter ................................................................... 104
Chapter 6: Conclusion, Recommendations and Future Directions ............ 105
6.1 Introduction ............................................................................................... 106
6.2 Conclusion ................................................................................................. 106
6.3 Recommendations...................................................................................... 107
6.4 Future Directions ....................................................................................... 108
LIST OF FIGURES
FIGURE 2.1: HPC SYSTEMS CATEGORIES AND ATTRIBUTES ................................................. 14
FIGURE 2.2: A CLUSTER COMPUTING SYSTEM ARCHITECTURE ........................................... 16
FIGURE 2.3: A MODEL OF GRID COMPUTING SYSTEM......................................................... 17
FIGURE 2.4: A LAYERED MODEL OF CLOUD COMPUTING SYSTEM ...................................... 19
FIGURE 2.5: A) SINGLE TASK JOB AND B) MULTIPLE TASK JOB ........................................... 22
FIGURE 2.6: RESOURCE MANAGEMENT (A) CENTRALIZED (B) DECENTRALIZED ................ 22
FIGURE 2.7: TAXONOMY OF RESOURCE ALLOCATION POLICIES .......................................... 27
FIGURE 3.1: GANTT CHART FOR τ1(1.30, 3), τ2(1.19, 5) AND τ3(1.19, 10) ...........................
FIGURE 3.2: GANTT CHART FOR τ1(1.57, 3), τ2(1.42, 5) AND τ3(1.42, 10) ...........................
FIGURE 3.3: EFFECT OF UTILIZATION ON SYSTEM SPEED. ..................................................... 67
FIGURE 3.4: POWER CONSUMPTION OF CRUSOE PROCESSOR AT RESPECTIVE VOLTAGE. ..... 69
FIGURE 3.5: NORMALIZED ENERGY CONSUMPTIONS ............................................................ 71
FIGURE 3.6: LFS AND FFS COMPARISON BASED ON REQUIRED EXECUTION TIME ............... 72
FIGURE 3.7: LOAD DISTRIBUTION ON SYSTEM WITH 8 CORES. .............................................. 78
FIGURE 3.8: LOAD DISTRIBUTION ON SYSTEM WITH 12 CORES. ........................................... 79
FIGURE 4.1: FLOW CHART OF GA-FFS ................................................................................. 84
FIGURE 4.2: PROCESS OF TOURNAMENT SELECTION ............................................................ 85
FIGURE 4.3: CROSS-OVER PROCESS ...................................................................................... 86
FIGURE 4.4: MUTATION PROCESS ......................................................................................... 87
FIGURE 4.5: TASK SET SIZE AGAINST REQUIRED SPEED (LFS, FFS, GA-FFS) .................... 90
FIGURE 4.6: ENERGY CONSUMPTION AGAINST TASK SET SIZE FOR LFS, FFS AND GA-FFS 91
FIGURE 4.7: EXECUTION TIME AGAINST TASK SET SIZE FOR LFS, FFS AND GA-FFS ......... 91
FIGURE 5.1: LOAD BALANCING MECHANISMS (TASK SHIFTING AND TASK SPLITTING) ........ 99
FIGURE 5.2: NUMBER OF TASKS ON CORES BEFORE LOAD BALANCING. ............................. 101
FIGURE 5.3: CORES UTILIZATION BEFORE LOAD BALANCING. ............................................ 101
FIGURE 5.4: NUMBER OF TASKS ON CORES AFTER TASK SHIFTING. .................................... 102
FIGURE 5.5: CORES UTILIZATION AFTER TASK SHIFTING. ................................................... 102
FIGURE 5.6: CORES UTILIZATION AFTER TASK SPLITTING .................................................. 103
LIST OF TABLES
TABLE 2.1: COMMONALITY BETWEEN CLUSTER, GRID, AND CLOUD SYSTEMS. ................... 20
TABLE 2.2: SURVEY OF THE EXISTING HPC SYSTEMS ...................................................... 32
TABLE 2.3: COMPARISON OF CLUSTER COMPUTING SYSTEMS .......................................... 34
TABLE 2.4: COMPARISON OF GRID COMPUTING SYSTEMS ................................................ 41
TABLE 2.5: COMPARISON OF CLOUD COMPUTING SYSTEMS ............................................. 52
TABLE 2.6: CLASSIFICATION OF GRID, CLOUD AND CLUSTER SYSTEMS ............................ 56
TABLE 3.1: OPERATIONAL LEVELS AND THE RESPECTIVE SPEED RANGES. ......................... 61
TABLE 5.1: OVERALL SIMULATION RESULTS ................................................................. 103
LIST OF ABBREVIATIONS
Amazon EC2: Amazon Elastic Compute Cloud
AMIs: Amazon Machine Images
AOP: Application Oriented Policy
BNS: Broker Name Service
CC: Cluster Controller
CGs: Computational Grids
CIIT: COMSATS Institute of Information Technology
CMOS: Complementary Metal Oxide Semiconductor
CPU: Central Processing Unit
CSPs: Cloud Service Providers
DAG: Directed Acyclic Graph
DM: Deadline Monotonic
DQS: Distributed Queuing System
DVS: Dynamic Voltage Scaling
FCFS: First Come First Serve
FFS: First Feasible Speed
FORTRAN: Formula Translation
FS: File System
GA: Genetic Algorithm
GAE: Google Application Engine
GA-FFS: Genetic Algorithm with First Feasible Speed
GENI: Global Environment for Network Innovations
GHS: Grid Harvest Service
GNQS: Generic Network Queuing System
GQoSM: Grid Quality of Services Management
GRACE: Grid Architecture for Computational Economy
GRB: Grid Resource Broker
GRIP: GRid Information Protocol
GRRP: GRid Registration Protocol
HB: Hyperbolic Bound
H-FSC: Hierarchical Fair Service Curve
HP: Hewlett Packard
HPC: High Performance Computing
HTTP: Hypertext Transfer Protocol
IaaS: Infrastructure as a Service
IBM: International Business Machines
ICDIM: International Conference on Digital Information Management
IP: Internet Protocol
IT: Information Technology
JPDC: Journal of Parallel and Distributed Computing
JSON: JavaScript Object Notation
LFS: Least Feasible Speed
LL-bound: Liu and Layland bound
LSF: Load Sharing Facility
MATLAB: Matrix Laboratory
MIPS: Millions of Instructions Per Second
MOL: Meta Computing Online
MPI: Message Passing Interface
MTTF: Mean Time To Failure
NC: Node Controller
Ninf: Network Infrastructure
NIST: National Institute of Standards and Technology
NWS: Network Weather Service
OGSA: Open Grid Service Architecture
OpenSSI: Open Single System Image
ORB: Object Request Broker
OS: Operating System
PaaS: Platform as a Service
PARCO: Parallel Computing Journal
PBS: Portable Batch System
PC: Personal Computer
PDAs: Personal Digital Assistants
PUNCH: Purdue University Network Computing Hub
PVM: Parallel Virtual Machine
QoS: Quality of Service
RAS: Reliability, Availability and Serviceability
REST: Representational State Transfer
RM: Rate Monotonic
RMS: Resource Management System
S3: Simple Storage Service
SaaS: Software as a Service
SC: Sufficient Conditions
SLAs: Service Level Agreements
SLURM: Simple Linux Utility for Resource Management
SMP: Symmetric Multi-Processing
SOAP: Simple Object Access Protocol
SP: System Provisioning
SSL: Secure Sockets Layer
STC: Storage Controller
TAO: The Ace ORB
TCP: Transmission Control Protocol
TSWJ: The Scientific World Journal
URL: Uniform Resource Locator
VGrADS: Virtual Grid Application Development Software
VLAN: Virtual Local Area Network
VM: Virtual Machine
WAN: Wide Area Network
Web API: Web Application Programming Interface
XML: Extensible Markup Language
Chapter 1
Introduction
1.1 Introduction
Improving computational power at minimum power (energy) consumption is the demand of the day. Computational power is enhanced by using HPC systems. HPC systems are categorized into distributed and non-distributed systems [1, 2, 3]. By distribution, we mean processors on different boards. Cluster, grid, and cloud computing systems fall under the umbrella of distributed HPC systems [1], while multi-core [4] technology comes under the category of non-distributed HPC systems. Clustering, or cluster computing, uses multiple storage and processing devices and the interconnections between them to present a single system image to the outside world [1, 5]. The prime goal of a cluster system is to integrate software, hardware, and network resources for availability, load balancing, and performance improvement [1, 5, 6, 7]. The grid computing concept is based on using the Internet as a medium [1]. In grid computing, powerful computing resources are connected through this medium for widespread availability [8]. Grid systems, as opposed to clusters, span different administrative domains and user privileges [1, 9, 10], and are heterogeneous, loosely coupled, and geographically spread [1, 11, 12]. Problem solving and resource sharing are the primal motivations behind grid computing. For more detail about the grid and its types, see ref [13]. Services in cloud computing are provided through the Internet. The services are dynamically scalable, and resources are virtualized over the Internet [1, 14, 15, 16, 17]. In cloud computing the services are software, platform, and infrastructure; the terms Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) are used accordingly. For more detail and a profound comparison of cluster, grid, and cloud, see ref [1].
The distributed computing paradigm endeavors to tie together the power of a large number of resources distributed across a network. Each user has requirements that are served within the network architecture through a proper communication channel [18]. The distributed computing paradigm is used for three major reasons. First, the nature of distributed applications suggests the use of a communication network that connects several computers. Such networks are necessary for producing the data required for the execution of tasks on remote resources. Second, most parallel applications have multiple processes that run concurrently on many nodes, communicating over a high-speed interconnect. The use of high performance distributed systems for parallel applications is beneficial compared to a single CPU machine for practical reasons. The ability to distribute services across a wide network is low-cost and makes the whole system scalable and adaptable to achieve the desired level of performance efficiency [14]. Third, the reliability of a distributed system is higher than that of a monolithic single processor machine. A single failure of one network node in a distributed environment does not stop the whole process, as it would on a single CPU resource. Some techniques for achieving reliability in a distributed environment are checkpointing and replication [14].
The birth of multi-core systems has significantly advanced the existing technologies in
the domain of computer architecture and HPCs. However, this advantage presents the
research community with enormous challenges, such as the efficient handling of thermal
dissipation and the lack of mature scheduling techniques.
Normally, all the cores of a chip operate in the same clock domain, clock frequency, and
operational voltage [19]. However, there exist systems in which the cores do not operate
at the same frequency. Therefore, maintaining performance symmetry among
asymmetrically operating cores is one of the most critical issues that the researchers are
dealing with today [20, 21]. There are two possible solutions to the abovementioned issues: (i) add dynamic voltage circuitry per core (a hardware solution), or (ii) schedule tasks among cores judiciously so that all the cores can operate at the same clock frequency (a software solution). The former compensation strategy exhibits power leakage at higher frequencies and undermines thermal throttling [20]. Being a promising alternative, the latter solution is relatively unexplored from the point of view of scheduling. Considering this gap, we partition a given workload among the cores with the intention that all the cores operate at the same clock frequency for maximum energy savings.
Newer processors provide an interface to dynamically adjust the voltage (or speed) for optimized power consumption. This voltage (speed) adjustment at run time is termed Dynamic Voltage Scaling (DVS), an effective methodology for reducing core power consumption. Dynamic clock and voltage adjustments represent the cutting edge of power reduction capabilities in Complementary Metal Oxide Semiconductor (CMOS) circuitry. The relation between frequency and voltage/power provides the foundation for dynamic voltage scaling in modern processors [20, 22, 23, 24]. Theoretically, an ideal processor would be one that supplies continuous voltage levels. However, using continuously variable voltages is infeasible because of the switching overhead needed to support several operational levels. Therefore, the latest processors are capable of supporting a fixed number of discrete speed levels between predefined minimum and maximum levels. It has been reported in [25] that the energy–speed curve is convex in nature. Therefore, according to Jensen's inequality [26, 27, 28], as long as the deadline constraints are fulfilled, it is more energy efficient to execute tasks at a constant speed than at a variable speed for each of the individual tasks. We further extend this result by exploring the possibility of determining a uniform system speed for all the cores by considering a processor that supports a large number of discrete energy–voltage levels.
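To illustrate the convexity argument, the following is a minimal sketch assuming the common cubic power model P(s) = s^3 for normalized speed s (an assumption for exposition; the thesis's later experiments use the discrete levels of a Crusoe processor): a fixed workload executed at one constant speed consumes less energy than the same workload split between a lower and a higher speed with the same per-cycle average.

```python
# A minimal sketch of the convexity argument, assuming the cubic CMOS
# power model P(s) = s**3 with speed s normalized to the maximum (s in (0, 1]).
# Energy for `cycles` of work at constant speed s is P(s) * time = s**3 * (cycles / s).

def energy(cycles: float, speed: float) -> float:
    """Energy consumed executing `cycles` at constant `speed` (equals cycles * speed**2)."""
    return speed ** 3 * (cycles / speed)

work = 100.0                                # workload in normalized cycles
e_constant = energy(work, 0.6)              # all cycles at the average speed 0.6
e_variable = energy(work / 2, 0.4) + energy(work / 2, 0.8)   # same per-cycle average

print(e_constant, e_variable)               # 36.0 < 40.0: constant speed wins
```

As the comments note, the model and numbers are illustrative; the point is only that, by convexity, any speed schedule averaging to the same per-cycle speed costs at least as much energy as the constant one.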
In real-time systems, tasks are scheduled based on some predefined criteria, such as activation rates, deadlines, and priorities [29, 30, 31, 32, 33]. The higher the priority of a task, the more attention is devoted to it when a scheduling decision is made. Real-time systems are usually not utilized to the maximum extent. Therefore, such systems are a promising venue in which to apply DVS methodologies and DVS-enabled scheduling techniques. Applying DVS techniques requires careful consideration of task scheduling, and a number of results are available (primarily for uni-processor systems) [19, 27, 28, 34, 35, 36, 37, 38, 39].
Continuous speed levels are normally assumed to obtain optimality. However, the
aforesaid is inapplicable to practical systems that have processors with discrete voltage
regulators [40, 41]. Manufacturers are introducing processors that will operate on more
discrete levels than what we see today. For instance, the new Foxon technology is
expected to enable the Intel servers to operate on as many as 64 speed grades [40].
Therefore, an accurate model for reducing the energy consumption of the latest systems
must capture the discrete, rather than continuous, nature of the available speed scaling [25, 40, 41, 42]. However, the work we present here can easily be extended to systems that may operate on a continuous speed spectrum.
The most commonly used policy to schedule real-time tasks is "priority driven" scheduling, which can be classified into the following two types: (i) fixed priority and (ii) dynamic priority [43]. A fixed-priority algorithm assigns a fixed priority value to all jobs of each task, which must differ from the priorities assigned to jobs generated by other tasks within the system. In contrast, dynamic-priority scheduling algorithms place no restrictions on the manner in which priorities are assigned to individual jobs. Although dynamic algorithms are considered better theoretically, they become unpredictable when transient overload occurs [44]. Therefore, in this work, we only consider fixed-priority scheduling due to its applicability, reliability, and simplicity [29, 32, 33, 45, 46, 47].
The problem of scheduling periodic tasks under a fixed-priority scheme was first addressed by Liu and Layland [29] in 1973 under simplified assumptions. They derived the optimal static-priority scheduling algorithm for the implicit-deadline model (when deadlines coincide with the respective periods), termed the RM algorithm. The RM algorithm assigns static priorities based on the task activation rates (periods) such that for any two tasks τi and τj, priority(τi) > priority(τj) if and only if period(τi) < period(τj), wherein ties are broken arbitrarily. For a constrained-deadline system, where deadlines are not greater than periods, an optimal priority ordering has been reported in [48], termed Deadline Monotonic (DM) scheduling, wherein the assigned priorities are inversely proportional to the relative deadlines. The Rate Monotonic (RM) and DM methodologies are identical when the relative deadlines of tasks are proportional to their periods. In the remainder of this work, the task model refers to a constrained-deadline system, and both RM and DM will be used interchangeably to align with the terminologies used in the literature.
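As a concrete illustration of these fixed-priority notions, the sketch below (not the thesis's algorithm) orders a task set by RM priority and applies the Liu and Layland utilization bound; the example task set is the one from Figure 3.1, and since the bound is only a sufficient condition, failing it does not by itself prove infeasibility.

```python
# A sketch of the fixed-priority ideas above: RM ordering by period and the
# Liu and Layland sufficient utilization test U <= n * (2**(1/n) - 1).

def rm_priority_order(tasks):
    """Sort (C_i, T_i) pairs by period T_i: shorter period = higher priority."""
    return sorted(tasks, key=lambda task: task[1])

def ll_schedulable(tasks):
    """Sufficient (not necessary) utilization test for RM schedulability."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

# Task set from Figure 3.1, written as (execution time, period) pairs.
taskset = [(1.30, 3), (1.19, 5), (1.19, 10)]
print(rm_priority_order(taskset))
print(ll_schedulable(taskset))   # U ~ 0.790 exceeds the 0.780 bound for n = 3,
                                 # so this test fails; an exact test may still pass
```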
Scheduling policies developed for symmetric multiprocessors may also be applicable to
the multi-core counterpart. Recently, the fixed-priority scheduling theory for multi-core
environment was studied in [34]. We extend the abovementioned work to further explore
the necessary and sufficient condition [49] of the RM paradigm pertaining to multi-core
systems. In particular, more interesting results are revealed for multi-core systems where
all the cores operate at the same clock frequency [50]. Once the speed for a generic core
i is determined, the average system speed suitable for all the cores is calculated.
However, this average speed might potentially make the task set un-schedulable on some
cores. In this work, we address this anomaly to maintain system feasibility by shifting
tasks from a heavily utilized core to an underutilized core such that all the cores process
the same workload and the task set remains feasible at uniform system speed.
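To make the shifting idea concrete, here is a minimal sketch, assuming a simple additive utilization model, of repeatedly moving the lightest task from the most loaded core to the least loaded one; the data layout and stopping rule are illustrative assumptions rather than the exact procedure developed in Chapter 3.

```python
# A minimal sketch of lightest-task shifting: per-core loads are sums of
# (positive) task utilizations, and a move is accepted only while it does
# not push the lightly loaded core above the heavily loaded one.

def balance_by_lightest_shift(cores):
    """cores: list of per-core task-utilization lists; balanced in place."""
    while True:
        loads = [sum(core) for core in cores]
        hi, lo = loads.index(max(loads)), loads.index(min(loads))
        if hi == lo or not cores[hi]:
            return cores
        lightest = min(cores[hi])                 # lowest-utilization task
        if loads[hi] - lightest < loads[lo] + lightest:
            return cores                          # a further move would overshoot
        cores[hi].remove(lightest)                # migrate it to the idlest core
        cores[lo].append(lightest)

print(balance_by_lightest_shift([[0.5, 0.3, 0.1], [0.2]]))
# [[0.5, 0.3], [0.2, 0.1]] -- loads move from (0.9, 0.2) to (0.8, 0.3)
```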
The genetic algorithm (GA) [51] is an optimization algorithm. It is inspired by the notion of Darwin's theory [52] of "survival of the fittest" [53]. The algorithm retains the fittest genes. The optimization process of a GA is such that, initially, the fitness values of the offspring are calculated and some offspring are selected using a selection method. The selection method can be random, roulette wheel, or tournament. The selected offspring then pass through crossover [51] and mutation [51] phases. Finally, the fitness values of the new offspring are calculated. After crossover and mutation, only those genes whose new fitness value is better than the old fitness value are retained in the new population, and thus the process of optimization takes place.
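A toy sketch of this loop follows: tournament selection, single-point crossover, bit-flip mutation, and replacement of the weakest member only when the offspring is fitter. The bit-string encoding and one-max fitness function are illustrative placeholders, not the GA-FFS encoding of Chapter 4.

```python
import random

# A toy GA loop: select two parents by tournament, cross them over, mutate
# the child, and keep it only if it beats the current worst member.

def fitness(chromosome):
    return sum(chromosome)                      # placeholder: count of 1-bits

def tournament(population, k=3):
    return max(random.sample(population, k), key=fitness)

def crossover(parent_a, parent_b):
    point = random.randrange(1, len(parent_a))  # single crossover point
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate=0.05):
    return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]

def evolve(population, generations=200):
    for _ in range(generations):
        child = mutate(crossover(tournament(population), tournament(population)))
        worst = min(population, key=fitness)
        if fitness(child) > fitness(worst):     # retain only fitter offspring
            population[population.index(worst)] = child
    return max(population, key=fitness)

random.seed(1)
pop = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
print(evolve(pop))   # tends toward the all-ones chromosome
```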
The overall performance of an HPC system depends on resource management and load balancing among all computing units. Efficient resource management and load balancing is a key and fundamental requirement for the success of any HPC environment, such as cluster, grid, cloud, and multi-core systems [54]. For load balancing among computing units, the author focuses on only one dimension of HPC systems, namely multi-core, and applies load balancing strategies such as task migration and task splitting. The ideas behind the aforementioned strategies can easily be extended to distributed HPC systems (cluster, grid, and cloud).
1.2 Motivation
Normally, energy consumption is directly proportional to population, i.e., energy consumption increases as the population increases. As the use of computer technology increases, power (energy) consumption also increases. Less efficient computers consume more energy, which not only wastes precious energy but also increases pollution [55]. Some of the pollution due to the growth of computer technology comes in the form of toxic materials and of carbon dioxide from the power plants used in the production of computers [55]. The foremost motivation behind this research is the reduction of energy consumption, which saves money and energy for further usage and also serves as a step toward green computing.
Power efficient resource allocation in HPC systems is important for several reasons [195]. First, the cost of electricity for cooling and powering resources is exceeding the actual purchase cost of the resources. Second, the increase in energy usage and its associated carbon emissions have provoked environmental concerns. Finally, the increase in energy usage and heat dissipation has negative impacts on the reliability, density, and scalability of HPC systems [196].
1.3 Problem Statement
Resource allocation mechanisms play a vital role in the performance improvement of HPC systems. Therefore, a comprehensive discussion of widely used resource allocation strategies in distributed HPC systems is required. Moreover, maintaining system timing constraints on HPC systems is a challenge that is currently attracting attention from the research community. This research work addresses the problem of how power (energy) efficiency can be obtained by using task scheduling to adjust the system speed on the fly in a multi-core environment, enabling a uniform system speed. This work also deals with the problem of load balancing among cores.
1.4 Research Issues
The key issue addressed in Chapter 2 at the abstract level is "resource allocation in high performance distributed computing systems" [1]. There are three broad classes of distributed HPC systems: (a) cluster, (b) grid, and (c) cloud. Besides other factors, the performance of the abovementioned classes is directly related to the resource allocation mechanisms used in the system. Therefore, in the said perspective, a complete analysis of the resource allocation mechanisms used in the HPC classes is required. The features of the HPC categories (cluster, grid, and cloud) are conceptually similar [56]. Therefore, efforts are required to distinguish each of the categories by selecting relevant distinct features for all, and to catalog the systems into pure software and hybrid/hardware HPC solutions. The author believes that a comprehensive analysis of leading research and commercial projects in the HPC domain can provide readers with an understanding of the essential concepts behind the evolution of resource allocation mechanisms in HPC systems. Moreover, the solutions to the aforementioned research issues will help individuals and researchers identify important and outstanding issues for further investigation.
Conceptually, the research issue addressed in Chapter 3 and Chapter 4 is "how power (energy) efficiency is obtained by adjusting system speed in a multi-core environment" [2, 4]. As a known phenomenon, more computational power is offered by current real-time systems to cope with CPU-intensive applications. However, this facility comes at the price of more energy consumption and eventually higher heat dissipation. As a remedy, these issues are being addressed by adjusting the system speed on the fly so that application deadlines are met and the overall system energy consumption is reduced. In addition, the current state of the art in multi-core technology opens further research opportunities for energy reduction through power efficient scheduling. However, the multi-core front is relatively unexplored from the perspective of task scheduling. To the best of the author's knowledge, very little work has yet been done to integrate a power efficiency component into real-time scheduling theory tailored for multi-core platforms. In Chapter 3, the issue is addressed through a novel approach called LFS, while Chapter 4 addresses the issue by adding a GA to the existing FFS approach and terms the new approach GA-FFS.
The application of the proposed power minimization approaches (LFS and GA-FFS) can lead to unbalanced utilization of computing units, while load balancing among computing units (cores or systems) plays a vital role in the overall performance of HPC systems. Efficient results may not be obtained unless a given load is properly balanced among the systems or cores of an HPC system. The primal issue addressed in Chapter 5 is load balancing among cores in a multi-core environment using task shifting (migration) and task splitting mechanisms; a sketch contrasting splitting with shifting follows.
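In contrast to shifting a whole task, the following minimal sketch shows the splitting idea under a simple additive utilization model: part of the split task's load stays on the heavily loaded core and the remainder moves, so that both cores end up equally utilized. The function and data layout are illustrative assumptions, not the exact mechanism of Chapter 5.

```python
# An illustrative sketch of task splitting: divide one task's utilization
# between a heavily and a lightly loaded core so both reach the same load.
# Assumes utilizations are additive and the task is divisible.

def split_task(heavy_core, light_core, task_util):
    """Split task_util (a task leaving heavy_core) so both cores carry equal load."""
    target = (sum(heavy_core) + sum(light_core) + task_util) / 2
    stay = max(0.0, target - sum(heavy_core))    # share kept on the heavy core
    heavy_core.append(stay)
    light_core.append(task_util - stay)          # share moved to the light core
    return heavy_core, light_core

# Example: a 0.6-utilization task splits into 0.2 + 0.4, so both cores reach 0.6.
print(split_task([0.4], [0.2], 0.6))
```

Splitting achieves exact balance where whole-task shifting cannot, at the cost of coordinating one task across two cores, which matches the chapter's finding that splitting balances fully but takes longer.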
1.5 Contributions of The Thesis
The thesis contributes to the research community working in the field of HPC systems. The highlighted contributions of Chapter 2 of the thesis are as follows:
1. First, the said chapter analyses and differentiates distributed HPC systems (cluster, grid, and cloud) based on some predefined common features.
2. Further, Chapter 2 deeply analyses the resource allocation mechanisms of cluster, grid, and cloud systems [1].
3. In the said chapter, the author discusses the common features of each category and compares the resource allocation mechanisms of the systems based on the selected features [1].
4. Finally, Chapter 2 classifies the analysed cluster, grid, and cloud systems as software-only and hybrid/hardware systems [1].
Chapter 3 of the thesis advances the current state of the art of scheduling theory as
follows.
1. Identification of the lowest possible core speed. The work presented in Chapter 3 identifies the implicit disadvantage associated with the FFS approach that is often used in the literature. The author further investigates this issue and identifies properties and bounds that enable a procedure which can further reduce the core speed to the minimum possible level while also ensuring that the task set remains RM-schedulable [2].
2. Practical power savings with adjustable core speeds. Because of the practical
limitations of the available DVS-enabled processors, the tasks are mapped using a
finite number of discrete voltage levels. However, the work presented in Chapter 3 is equally applicable to future-generation processors that may support continuous voltage levels [2].
3. Presents a simple but practical core load balancing procedure. In Chapter 3, the author presents the lightest-task-shift procedure to load balance the system cores. The motivation behind this mechanism is the observation that shifting (or migrating) the lightest task (the one with the lowest utilization among all the tasks assigned to a core) from an over-utilized core to an under-utilized core decreases the core utilization by the minimum possible load [2].
4. Achieving uniform system power consumption and utilization. The focus of Chapter 3 is kept as general as possible to include heterogeneous system cores. The approach in Chapter 3 can fine-tune the system so that all the cores operate at the same clock rate and have equally proportionate core utilization. The abovementioned yields uniform system performance with predictable power consumption. The approach presented in Chapter 3 can be useful for designing applications that demand homogeneous performance over a heterogeneous system [2].
5. Presents a novel energy efficient approach, LFS. The novel LFS approach is the main contribution discussed in Chapter 3, and it greatly enhances energy minimization in a multi-core environment.
Chapter 4 of the thesis contributes to the research community by improving the power consumption of an existing technique (FFS) [46] through the application of a genetic algorithm on top of FFS; the new technique is termed GA-FFS [4].
Chapter 5 contributes to the research community by balancing the load among cores through task shifting and task splitting strategies. The said chapter concludes that if response time is important, the task shifting (migration) mechanism should be used; otherwise, task splitting should be used for full load balancing [3].
1.6 Organization of The Thesis
The rest of the thesis is organized as follows. In Chapter 2, a comprehensive "survey on resource allocation in high performance distributed computing systems" [1] is presented. Cluster, grid, and cloud computing systems fall into the category of distributed HPC systems, and Chapter 2 analyzes existing distributed HPC systems based on predefined features. To cover the multi-core side of HPC systems, "power efficiency through least feasible speed in a multi-core environment" [2] is the focus of Chapter 3. The foremost aim of Chapter 3 is to allocate computing resources in a way that optimizes the overall speed and power consumption in a multi-core environment. In Chapter 4, a GA is applied to FFS [4] to improve the speed and power efficiency of the existing FFS technique; the new technique introduced in Chapter 4 is termed GA-FFS. The primal focus of Chapter 5 is "load balancing through task shifting and task splitting strategies in a multi-core environment" [3]. Finally, Chapter 6 concludes the thesis and discusses some recommendations and future directions.
1.7 Summary
Obtaining maximum system utilization with minimum energy consumption has been the central focus for researchers working in the domain of power efficient resource allocation. Encouraging results have been presented recently by applying energy-aware scheduling techniques in HPC environments. The motivation is normally to extend battery life so that devices operate longer, and also to reduce heat dissipation. On the other hand, more computational power in HPC comes at the price of more energy consumption. In HPC, computational power increases along two dimensions: (i) borrowing the computational power (other resources may also be borrowed) of other computers, i.e., cluster, grid, and cloud systems; and (ii) increasing the processing power of a single system by incorporating more cores on the same chip. Both dimensions have their merits and demerits. Careful resource allocation plays a vital role in an energy efficient computing facility. In multi-core systems, when the cores run asymmetrically, the main problem is load balancing among the cores. In such systems, it is more energy efficient to execute tasks at a constant speed, as long as the deadline constraints are fulfilled [27, 28, 57].
In this research work, the author first carried out a survey of resource allocation in distributed HPC (cluster, grid, and cloud) systems, and later extended the work to the second dimension of HPC, i.e., resource allocation (scheduling) and load balancing in multi-core systems. The author shows that, for core speed calculation, the conventional FFS [46] is inferior, from an energy consumption point of view, to the author's proposed approaches, the LFS concept and GA-FFS.
Chapter 2
Resource Allocation in High Performance Distributed
Computing Systems
2.1 Introduction
The purpose of this chapter is to analyze the resource allocation mechanisms of three broad classes of HPC: (a) cluster, (b) grid, and (c) cloud. Besides other factors, the performance of the aforesaid classes is directly related to the resource allocation mechanisms used in the system. Therefore, in the said perspective, a complete analysis of the resource allocation mechanisms used in the HPC classes is required. In this chapter, the author presents a thorough analysis of the characteristics of the resource management and allocation strategies used in academic, industrial, and commercial systems.
The features of the HPC categories (cluster, grid, and cloud) are conceptually similar
[56]. Therefore, an effort has been made to distinguish each of the categories by selecting
relevant distinct features for all. The features are selected based on the information
present in the resource allocation domain, acquired from a plethora of literature. The author believes that a comprehensive analysis of leading research and commercial projects in the HPC domain can provide readers with an understanding of the essential concepts behind the evolution of resource allocation mechanisms in HPC systems. Moreover, this research will help individuals and researchers identify important and outstanding issues for further investigation. The highlighted aspects of this chapter are as follows:
1. Analysis of resource allocation mechanisms of cluster, grid, and cloud.
2. Identifying the common features of each category and comparing the resource
allocation mechanisms of the systems based on the selected features.
3. Classification of systems as software only and hybrid/hardware systems.
In contrast to other compact surveys and system taxonomies, such as [58, 59], the focus of this study is to demonstrate the resource allocation mechanisms. Note that the purpose of this survey is to demonstrate the resource allocation mechanisms, not to analyze the performance of the systems. Although the performance could be analyzed based on the resource allocation mechanism, that is beyond the scope of this chapter. The purpose of this chapter is to aggregate and analyze the existing solutions for HPC under the resource allocation policies. Moreover, an effort has been made to provide a broader view of the resource allocation mechanisms and strategies by discussing systems of different categories, such as obsolete systems (systems that were previously in use), academic systems (research projects proposed by institutes and universities), and established systems (well-known working systems). The projects are compared on the basis of the selected common features within the same category. For each category, the characteristics discussed are specific, and the list of features can be expanded further. Finally, the systems are cataloged into pure software and hybrid/hardware HPC solutions.
The rest of the chapter is organized as follows. In Section 2.2, the author presents the HPC system classification and highlights the key terms and basic characteristics of each class. In Section 2.3, the author surveys existing HPC system research projects and commercial approaches in each classification (cluster, grid, and cloud). The projects are cataloged into pure software and hybrid/hardware solutions in Section 2.4.
Energy efficient resource allocation in HPC systems is important for several reasons [195]. First, the cost of electricity for cooling and powering resources is exceeding the actual purchase cost of the resources. Second, the increase in energy usage and its associated carbon emissions have provoked environmental concerns. Finally, the increase in energy usage and heat dissipation has negative impacts on the reliability, density, and scalability of HPC system hardware [196].
In order to minimize the implementation and computational complexity, the original problem (power-efficient resource allocation in distributed HPCs) with multiple constraints is decomposed into multiple optimizable sub-problems (power-efficient resource allocation in multi-core systems) with simple constraints. For the latter problem, two power efficient resource allocation algorithms (LFS and GA-FFS) are proposed in Chapter 3 and Chapter 4, respectively. The simulation results of the said chapters reveal that the proposed algorithms outperform the existing counterpart (FFS) in terms of speed and hence power (energy). Therefore, the author's focus in this chapter is only to demonstrate the existing resource allocation mechanisms in distributed HPCs, which are given in the subsequent sections.
2.2 Overview of Distributed HPC Systems
This section discusses the three main categories of HPC systems, which are analyzed, evaluated, and compared based on the set of identified features. The author puts cloud computing under the category of HPC because it is now possible to deploy an HPC cloud, such as on Amazon EC2; clusters with 50,000 cores have been run on Amazon EC2 for scientific applications [60]. Moreover, HPC workloads are usually massively high scale and have to be run on many machines, which is naturally compatible with a cloud environment. The taxonomy representing the categories and the selected features used for the comparison within the same category is shown in Figure 2.1.
(Figure 2.1 depicts the taxonomy: the three HPC categories (cluster, grid, and cloud) and the comparison features: resource allocation, job processing type, QoS attributes, job composition, resource allocation control, platform support, evaluation method, scheduling organization, system type, resource description, resource allocation policy, breadth of scope, triggering info, system functionality, implementation structure, system focus, dynamic negotiation of QoS, Web APIs, virtualization, services, user access interface, value-added services, and process migration.)

Figure 2.1: HPC Systems categories and attributes [1]
Dong et al. [61] designed a taxonomy for the classification of scheduling algorithms in distributed systems. Moreover, Ref. [61] broadly categorized scheduling algorithms as: (a) local vs. global, (b) static vs. dynamic, (c) optimal vs. suboptimal, (d) distributed vs. centralized, and (e) application-centric vs. resource-centric. Apart from the above classification, different variants of scheduling, such as conservative, aggressive, and no reservation, can also be found in the literature [62, 63]. In conservative scheduling [38, 64], processes allocate the required resources before execution. Moreover, operations are delayed for serial execution of the tasks, which helps in process sequencing; the delay is also helpful in the rejection of processes. In aggressive (easy) scheduling [64], operations are immediately scheduled for execution to avoid delaying them. Moreover, the operations are reordered on the arrival of new operations. In some situations, when a task cannot be completed in a serial way, the operations are rejected. In aggressive scheduling, the operations are not delayed but carry a rejection risk at later stages; in conservative scheduling, the operations are not rejected but delayed. No reservation [65] is a dynamic scheduling technique in which resources are not reserved prior to execution but allocated at run time. Without reservation, resources can be wasted because, if a resource is not available at request time, the process has to wait until the resource becomes available.
2.2.1 Distributed HPC Systems Classes
This section focuses on distributed HPC systems. A brief introduction to each category of distributed HPC system (cluster, grid, and cloud) is given.
2.2.1.1 Cluster Computing Systems
Cluster computing, commonly referred to as clustering, is the use of multiple computers, multiple storage devices, and redundant interconnections to form a single highly available system [5]. Cluster computing can be used for high availability and load balancing. A common use of cluster computing is to provide load balancing on high-traffic websites. The concept of clustering was already present in DEC's VMS systems [66, 67]. IBM's Sysplex is a cluster-based approach for mainframe systems [68]. Microsoft, Sun Microsystems, and other leading hardware and software companies offer clustering packages for scalability and availability [69]. With increasing traffic or availability assurance requirements, all or some parts of the cluster can be increased in size or number.
The goal of cluster computing is to design an efficient computing platform that uses a
group of commodity computer resources integrated through hardware, networks, and
software to improve the performance and availability of a single computer resource [6,
7]. One of the main ideas of cluster computing is to portray a single system image to the outside world. Initially, cluster computing and HPC referred to the same type of computing systems. However, today's technology enables the extension of the cluster class by incorporating load balancing, parallel processing, multi-level system management, and scalability methodologies. Load balancing algorithms [70] are designed essentially to
equally spread the load on processors and maximize the utilization while minimizing the
total task execution time. To achieve the goals of cluster computing, the load-balancing
mechanism should be fair in distributing the load across the processors [71]. The
objective is to minimize the total execution and communication cost encountered by the
task assignment, subject to the resource constraints.
The extension of traditional clusters transforms them into user-demand systems (providing SLA-based performance) that deliver the RAS needed for HPC applications. A modern cluster is made up of a set of commodity computers that are usually restricted to a single switch or group of interconnected switches within a single VLAN [72]. Each compute
node (computer) may have different architecture specifications (single processor
machine, symmetric multiprocessor system, etc.) and access to various types of storage
devices. The underlying network is a dedicated network made up of high-speed and low-
latency system of switches with a single or multi-level hierarchical internal structure. In
addition to executing compute-intensive applications, cluster systems are also used for
replicated storage and backup servers that provide essential fault tolerance and reliability
for critical parallel applications. Figure 2.2 depicts a cluster computing system that consists of: (a) management servers (responsible for controlling the system by taking care of system installation, monitoring, maintenance, and other tasks), (b) storage servers, disks, and backup (the storage servers are connected to disks for storage, and the disks are connected to backup for data backup purposes; the storage server in Figure 2.2 provides shared file system access across the cluster), (c) user nodes (to which system users log in to run their workloads on the cluster), (d) scheduler nodes (to which users submit their work to run the workload), and (e) compute nodes (which run the workloads).
Figure 2.2: A Cluster Computing System Architecture
2.2.1.2 Grid Computing Systems
The concept of grid computing is based on using the Internet as a medium for the widespread availability of powerful computing resources as low-cost commodity components [8]. A computational grid can be thought of as a distributed system of logically coupled local clusters with non-interactive workloads that involve a large number of files [73, 74]. By non-interactive we mean that an assigned workload is treated as a single task. Logical coupling of clusters means that the output of one cluster may become the input of another cluster, whereas within a cluster the workload is interactive. In contrast with conventional HPC (cluster) systems, grids account for different administrative domains with their own access policies, such as user privileges [9, 10]. Figure 2.3 depicts a general model of a grid computing system.
The motivations behind grid computing were resource sharing and problem solving in multi-institutional and dynamic virtual organizations, as depicted in Figure 2.3. A group of individuals and institutions forms a virtual organization. In a virtual organization, the individuals and institutions define rules for resource sharing, such as what is shared, under what conditions, and with whom [75]. Moreover, the grid guarantees secure access through user identification. The aggregate throughput is more important than the price and overall performance of a grid system. What makes a grid different from conventional HPC systems, such as clusters, is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed [11, 12].
Figure 2.3: A Model of Grid Computing System
2.2.1.3 Cloud Computing Systems
Cloud computing describes a new model for IT services based on the Internet, and typically involves the provision of dynamically scalable and often virtualized resources over the Internet [14, 15, 16]. Moreover, cloud computing provides ease of access to remote computing sites using the Internet [76, 77]. Figure 2.4 shows a generic layered model of a cloud computing system. The user-level layer in Figure 2.4 is used by users to interact with the services provided by the cloud. The top layer also uses the services provided by the lower layers to deliver SaaS capabilities [78]. The tools and environment required to create interfaces and applications on the cloud are provided by the user-level middleware layer. The runtime environment that supplies cloud computing capabilities to the application services of the user-level middleware is provided by the core middleware layer, which also provides computing capabilities by implementing the platform-level services [78]. The computing and processing power of cloud computing is aggregated through data centers. At the system-level layer, physical resources, such as storage servers and application servers, power up the data center [78].
Current cloud systems, such as Amazon EC2 [79], Eucalyptus [80], and LEAD [81], are based on VGrADS, sponsored by NIST [82]. The term "cloud" is a metaphor for the Internet. The metaphor is based on the cloud drawing used in the past to represent the telephone network [83] and later to depict the Internet in computer network diagrams as an abstraction of the underlying infrastructure [84]. Typical cloud computing providers deliver common business applications online that are accessed through web services, with the data and software stored on the servers. Clouds often appear as a single point of access for meeting consumers' computing needs. Commercial offerings are generally expected to meet the QoS requirements of customers, and typically include SLAs [85].

As seen by the user, the cloud model requires minimal management and interaction with IT administrators and resource providers. Behind the scenes, however, self-monitoring and healing of a cloud computing system require complex networking, storage, and intelligent system configuration. Self-monitoring is necessary for automatic balancing of workloads across the physical network nodes to optimize the cost of system utilization. The failure of any individual physical software or hardware component of the cloud system is arbitrated swiftly for rapid system recovery.
Figure 2.4: A Layered Model of Cloud Computing System
Table 2.1 depicts common attributes across the HPC categories, such as size, network type, and coupling. The comparison is qualitative; no numeric data is involved in Table 2.1. For example, the size of a grid is large compared to a cluster. The grid network is usually private and spans a WAN, meaning that a grid spread over the Internet is owned by a single organization. Foster et al. [58] use various perspectives, such as architecture, security model, business model, programming model, virtualization, data model, and compute model, to compare grids and clouds. Sadashiv et al. [59] compare the three computing models (cluster, grid, and cloud) based on different characteristics, such as business model, SLA, virtualization, and reliability. A similar comparison can also be found in [86], and another comparison among the three computing models in [87].
Table 2.1: Commonality between cluster, grid, and cloud systems.

| Feature | Cluster | Grid | Cloud |
| Size | Small to medium | Large | Small to large |
| Network type | Private, LAN | Private, WAN | Public, WAN |
| Job management and scheduling | Centralized | Decentralized | Both |
| Coupling | Tight | Loose/tight | Loose |
| Resource reservation | Pre-reserved | Pre-reserved | On-demand |
| SLA constraint | Strict | High | High |
| Resource support | Homogeneous and heterogeneous (GPU) | Heterogeneous | Heterogeneous |
| Virtualization | Semi-virtualized | Semi-virtualized | Completely virtualized |
| Security type | Medium | High | Low |
| SOA and heterogeneity support | Not supported | Supported | Supported |
| User interface | Single system image | Diverse and dynamic | Single system image |
| Initial infrastructure cost | Very high | High | Low |
| Self service and elasticity | No | No | Yes |
| Administrative domain | Single | Multi | Both |
2.2.2 Cluster Computer Systems: Features and Requirements
The overall performance of a cluster computing system depends on the features of the system. Cluster systems provide a mature solution for different types of computation- and data-intensive parallel applications. Among the many system settings specific to a particular problem, a set of basic generic cluster properties can be extracted that is common to classical and modern cluster systems [88]. The extracted features shown in Figure 2.1 are defined in the following paragraphs.
2.2.2.1 Job Processing Type
Jobs submitted to the cluster system may be processed in parallel or sequentially. Jobs can be characterized as sequential or parallel based on the processing of the tasks involved. A job that consists of parallel tasks has to execute concurrently on different processors, where each task starts at the same time. (Readers are encouraged to see [89] for more details on HPC job scheduling in clusters.) Usually, sequential jobs are executed on a single processor as a queue of independent tasks. Parallel applications are mapped to a multi-processor parallel machine and executed simultaneously on the processors. The parallel processing mode speeds up the whole job execution and is the appropriate strategy for solving complex large-scale problems within a reasonable amount of time and cost. Many conventional market-based cluster resource management systems support the sequential processing mode. However, a number of compute-intensive applications must be executed within a feasible deadline. Therefore, the parallel processing mode of cluster jobs is implemented in high-level cluster systems, such as SLURM [90], Enhanced MOSIX [14], and REXEC [15].
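To illustrate the distinction, the sketch below (hypothetical structures, not the mechanism of SLURM or any other system named above) dispatches a sequential job as a queue of tasks on one processor, while a parallel job is admitted only when enough free processors exist for all of its tasks to start together.

```python
# Sketch: sequential vs. parallel (gang) dispatch -- illustrative only.
def dispatch(job, free_processors):
    if job["type"] == "sequential":
        # All tasks run one after another on a single processor.
        return {task: free_processors[0] for task in job["tasks"]}
    # Parallel job: every task must start at the same time, so one free
    # processor per task is required; otherwise the job keeps waiting.
    if len(free_processors) < len(job["tasks"]):
        return None  # not enough processors -- job stays queued
    return dict(zip(job["tasks"], free_processors))

job = {"type": "parallel", "tasks": ["t1", "t2", "t3"]}
print(dispatch(job, ["p1", "p2"]))        # None -> job must wait
print(dispatch(job, ["p1", "p2", "p3"]))  # all tasks start together
```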
2.2.2.2 QoS Attributes
QoS attributes describe the basic service requirements requested by consumers that the service provider is required to deliver. The consumer represents a business user that generates service requests at a given rate that need to be processed by the system. General QoS attributes are: (a) time, (b) cost, (c) efficiency, (d) reliability, (e) fairness, (f) throughput, (g) availability, (h) maintainability, and (i) security. QoS metrics can be estimated using various measurement techniques. However, such techniques are difficult to apply when solving a resource allocation problem with multiple constraints, which remains a critical problem in cluster computing. Some conventional cluster systems, such as REXEC [15], Cluster-on-Demand [76], and Libra SLA [77], consider the job deadline and user-defined budget constraints, such as: (a) fairness, (b) time, and (c) cost. Market-based cluster RMSs still lack efficient support for reliability or trust. Recent applications that manipulate huge volumes of distributed data must be provided guaranteed QoS during network access. Providing best-effort services while ignoring the network mechanism is not enough to meet customer requirements. (Readers are encouraged to read [91, 92] for more understanding of QoS attributes in clusters.)
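To make the multi-constraint difficulty concrete, the sketch below (with hypothetical field names not taken from any surveyed RMS) checks resource offers against several QoS attributes at once; each attribute is simple in isolation, but an allocation must satisfy all of them simultaneously.

```python
# Sketch: checking resource offers against multiple QoS constraints.
# All field names and values are illustrative assumptions.
def meets_qos(offer, request):
    return (offer["time"] <= request["deadline"] and
            offer["cost"] <= request["budget"] and
            offer["reliability"] >= request["min_reliability"])

request = {"deadline": 120, "budget": 50, "min_reliability": 0.99}
offers = [
    {"node": "n1", "time": 100, "cost": 60, "reliability": 0.999},  # too costly
    {"node": "n2", "time": 110, "cost": 45, "reliability": 0.995},  # feasible
]
print([o["node"] for o in offers if meets_qos(o, request)])  # ['n2']
```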
2.2.2.3 Job Composition
Job composition describes the number of tasks involved in a single job prescribed by the user. A single-task job is defined as a monolithic application, in which just a single task is specified, as depicted in Figure 2.5(a). Parallel (or multi-task) jobs are usually represented by a DAG, as shown in Figure 2.5(b), where the nodes express the particular tasks partitioned from an application and the edges represent the inter-task communication [93] (see [93] for more details). The tasks can be independent or dependent. Independent tasks can be executed simultaneously to minimize the processing time. Dependent tasks are cumbersome and must be processed in a pre-defined order to ensure that all dependencies are satisfied. Market-based cluster RMSs must support all three types of job composition, namely: (a) single task, (b) independent multiple-task, and (c) dependent multiple-task [93].
Figure 2.5: a) Single Task Job and b) Multiple Task Job
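A multi-task job of the kind in Figure 2.5(b) is naturally represented as a DAG. The sketch below (a hypothetical job, ordered with the standard library's topological sorter) derives an execution order in which every task runs only after its dependencies have completed.

```python
# Sketch: ordering the tasks of a DAG-structured job (Figure 2.5(b) style).
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical multi-task job: task -> set of tasks it depends on.
job = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
    "taskN": {"task2", "task3"},
}
order = list(TopologicalSorter(job).static_order())
print(order)  # e.g. ['task1', 'task2', 'task3', 'taskN']
```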
2.2.2.4 Resource Allocation Control
Resource allocation control is the mechanism that manages and controls resources in a cluster system. A resource allocation control system can be centralized or decentralized [86]. In a centralized system, jobs are administered centrally by a single resource manager that has complete knowledge of the system. In a decentralized RMS, several resource managers and providers communicate with one another to keep the load balanced across all resources and satisfy specific user requirements [86]. Figure 2.6 depicts centralized and decentralized resource management systems (see [94] for more details).
Figure 2.6: Resource Management: (a) Centralized Resource Management, (b) Decentralized Resource Management
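The two control structures in Figure 2.6 can be contrasted with a minimal sketch (illustrative classes, not a real RMS): a centralized manager sees every resource, whereas each decentralized manager owns only a partition and forwards requests it cannot satisfy to a peer.

```python
# Sketch: centralized vs. decentralized resource control (illustrative).
class CentralManager:
    """Single manager with complete knowledge of all resources."""
    def __init__(self, resources):
        self.free = set(resources)
    def allocate(self):
        return self.free.pop() if self.free else None

class LocalManager:
    """Owns a partition; asks peer managers when its own pool is empty."""
    def __init__(self, resources):
        self.free = set(resources)
        self.peers = []
    def allocate(self):
        if self.free:
            return self.free.pop()
        for peer in self.peers:  # cooperate to keep the load balanced
            if peer.free:
                return peer.free.pop()
        return None

m1, m2 = LocalManager({"R1", "R2", "R3"}), LocalManager({"R4", "R5", "R6"})
m1.peers, m2.peers = [m2], [m1]
```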
2.2.2.5 Platform Support
The two main categories of cluster infrastructure supporting the execution of cluster applications are homogeneous and heterogeneous platforms. In a homogeneous platform, the system runs on a number of computers with similar architectures and the same OS. In a heterogeneous platform, the architectures and OSs of the nodes differ.
2.2.2.6 Evaluation Method
The performance of a cluster system can be evaluated through several metrics to determine the effectiveness of different cluster RMSs. The performance metrics are divided into two main categories, namely system-centric and user-centric evaluation criteria [95]. System-centric evaluation criteria depict the overall operational performance of the cluster, while user-centric evaluation criteria portray the utility achieved by the participants. To assess the effectiveness of an RMS against system-centric and user-centric criteria, evaluation factors are required. System-centric factors guarantee that system performance is not compromised, and user-centric factors assure that the desired utility of the RMS is achieved from the participants' perspective [95]. System-centric factors can include disk space, access interval, and computing power; user-centric factors can include the cost and execution time of the system. Moreover, a combination of the system-centric and user-centric approaches can be used to form another metric that draws on features from both to evaluate the system more effectively [95, 96].
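Such a combined metric can be sketched as a weighted score over normalized system-centric and user-centric factors. The factor names, the normalization to [0, 1], and the weight below are all hypothetical choices.

```python
# Sketch: combining system-centric and user-centric factors (illustrative).
# Factors are assumed normalized to [0, 1]; alpha weights the two sides.
def combined_score(system_factors, user_factors, alpha=0.5):
    sys_score = sum(system_factors.values()) / len(system_factors)
    usr_score = sum(user_factors.values()) / len(user_factors)
    return alpha * sys_score + (1 - alpha) * usr_score

score = combined_score(
    {"utilization": 0.8, "throughput": 0.7},  # system-centric factors
    {"cost": 0.9, "response_time": 0.6},      # user-centric factors
)
print(round(score, 3))  # 0.75
```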
2.2.2.7 Process Migration
In a cluster, the transfer of a job from one computer to another without restarting it is known as process migration. A standard cluster RMS usually provides process migration in homogeneous systems. Migration in heterogeneous systems is much more complex because of the numerous conversion processes required between source and destination access points.
2.2.2.8 Correlation of Cluster Features and Resource Allocation
All the cluster computing features defined in the previous paragraphs are crucial for efficient resource management and allocation in the system. The job scheduling policy strictly depends on the type of job processed in a cluster: scheduling for batch (group) processing is completely different from scheduling for sequentially or simultaneously processed applications. The job processing scheme, together with the job structure, determines the speed of the cluster system. A monolithic single-task job processed in sequential mode is the main cause of potentially ineffective system utilization, because some cluster nodes are kept idle for a long time.
Cluster RMSs are defined as system middleware that provides a single interface for user-level applications to be executed on the cluster. This allows the complexities of the underlying distributed nature of clusters to be hidden from the users. For effective management, the RMS in cluster computing requires some knowledge of how users value the cluster resources. Moreover, the RMS provides support for users to define QoS requirements for job execution. In such scenarios, system-centric approaches have a limited ability to achieve the users' desired utility; their focus is instead to increase system throughput and maximize resource utilization.
The administration of a centralized RMS is easier than that of decentralized structures because a single entity in the cluster has complete knowledge of the system, and no communication protocols between different local job dispatchers need to be defined. However, the reliability of centralized systems may be low because the failure of the central cluster node causes a complete system outage, whereas distributed administration can tolerate the loss of any node detached from the cluster. Another important factor of resource allocation in cluster systems is platform support. In homogeneous systems, the resource types map directly onto the scheduling constraints and service requirements defined by the users, and analysis of the requests sent to the system can help in managing the resource allocation process. In a heterogeneous platform, the range of resources required may vary, which increases the complexity of resource management. A phenomenon where a process, task, or request is permanently denied resources is known as resource starvation. The probability of resource starvation is lower in a heterogeneous platform than in a homogeneous one.

If any node is disconnected from the cluster, the workload of that node can be migrated to other nodes in the same cluster. Migration adds reliability and balances resource allocation across the cluster. A single node can request the migration of resources when a received request is difficult to handle; the cluster as a whole is responsible for process migration.
2.2.3 Grid Computer Systems: Features and Requirements
Grid systems are composed of resources that are distributed across various organizations and administrative domains [56]. A grid environment needs to dynamically address the issues involved in sharing a wide range of resources. Various types of grid systems, such as computational, desktop, enterprise, and data grids, can be designed [97, 56]. For each type of grid, a set of features can be defined and analyzed. The author presents multiple grid properties that can be extracted from the different grid classes to form a generic model of a grid system. A generic model could have all or some of the following extracted properties.
2.2.3.1 System Type
The ultimately large scale of a grid system requires an appropriate architectural model that allows efficient management of geographically distributed resources over multiple administrative domains [83]. The system type can be categorized as computational, data, or service grid, based on the focus of the grid. Computational grids can be subdivided into high-throughput and distributed computing; service grids can be subdivided into on-demand, collaborative, and multimedia. In hierarchical models, the scheduling is hybrid, with centralized scheduling at the top level and decentralized scheduling at the lower levels. The author therefore categorizes system types into three categories: (a) data, (b) computational, and (c) service.
2.2.3.2 Scheduling Organization
Scheduling organization refers to the mechanism by which resources are allocated. We consider three main scheduling organizations, namely: (a) centralized, (b) decentralized, and (c) hierarchical. In the centralized model, a central authority has full knowledge of the system. The disadvantages of the centralized model are limited scalability, lack of fault tolerance, and difficulty in accommodating multiple local policies imposed by the resource owners. In the decentralized model, local schedulers interact with each other to manage the task pool, and no central authority is responsible for resource allocation. The model therefore naturally addresses issues such as fault tolerance, scalability, site autonomy, and multi-policy scheduling. The decentralized model suits large-scale networks, but the scheduling controllers need to coordinate with each other continually for smooth scheduling; this coordination can be achieved through resource discovery or resource trading protocols. Finally, in the hierarchical model, a central meta-scheduler (or meta-broker) interacts with local job dispatchers to define the optimal schedules. The higher-level scheduler manages large sets of resources while the lower-level job managers control small sets of resources. The local schedulers have knowledge about resource clusters but cannot monitor the whole system. The advantage of hierarchical scheduling is the ability to incorporate scalability and fault tolerance while retaining some of the advantages of the centralized scheme, such as co-allocation (readers are encouraged to see [98] for more details).
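The hierarchical model can be sketched as a meta-scheduler that only routes jobs while the local dispatchers keep the actual resource state. The class and site names below are illustrative, not those of any specific grid scheduler.

```python
# Sketch: hierarchical scheduling -- a meta-scheduler routes each job to
# the local dispatcher with the most free capacity (illustrative only).
class LocalDispatcher:
    def __init__(self, name, capacity):
        self.name, self.free = name, capacity
    def run(self, job):
        self.free -= 1          # local manager controls its own resources
        return f"{job} -> {self.name}"

class MetaScheduler:
    """Sees only aggregate capacities, never individual resources."""
    def __init__(self, dispatchers):
        self.dispatchers = dispatchers
    def submit(self, job):
        target = max(self.dispatchers, key=lambda d: d.free)
        return target.run(job) if target.free > 0 else None  # else: job waits

meta = MetaScheduler([LocalDispatcher("siteA", 2), LocalDispatcher("siteB", 1)])
print(meta.submit("job1"))  # job1 -> siteA
```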
2.2.3.3 Resource Description
Grid resources are spread across a wide-area global network with different local resource allocation policies. The resource characteristics should include the specific parameters needed to express the resource heterogeneity, structure, and availability in the system. In the NWS project [85], the availability of CPU, TCP connection establishment time, end-to-end latency, and available bandwidth are needed for resource description. Similarly, Cactus Worm (see Section 2.3.2) is an on-demand grid computing system that needs an independent service responsible for resource discovery and selection based on application-supplied criteria, using GRRP and GRIP [99].
2.2.3.4 Resource Allocation Policies
A scheduling policy has to be defined for the ordering of jobs and requests whenever rescheduling is required. Different resource utilization policies exist for different systems because of their different administrative domains. Figure 2.7 presents a taxonomy of resource allocation policies. Resource allocation policies are necessary for ordering the jobs and requests in all types of grid models. In the fixed resource allocation approach, the resource manager implements a predefined policy.

The fixed resource allocation approach is further classified into two categories, namely system-oriented and application-oriented. A system-oriented allocation policy focuses on maximizing the throughput of the system [100]. The aim of an application-oriented allocation strategy is to optimize specific scheduling attributes, such as time and cost (storage capacity).

Many systems use application-oriented resource allocation policies, such as PUNCH [101], WREN [102], and CONDOR [103]. A resource allocation strategy that allows external agents or entities to change the scheduling policy is called an extensible scheduling policy. This can be implemented using ad-hoc extensible schemes that define an interface used by an agent for the modification of the scheduling policy [100], as sketched after Figure 2.7.
Figure 2.7: Taxonomy of Resource Allocation Policies
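An extensible policy of the kind shown in Figure 2.7 can be sketched as a scheduler exposing an interface through which an external agent swaps the job-ordering function at run time. The policy names and job fields below are hypothetical.

```python
# Sketch: an extensible scheduling policy -- an external agent can replace
# the ordering function through a defined interface (illustrative).
def system_oriented(jobs):       # maximize throughput: shortest job first
    return sorted(jobs, key=lambda j: j["runtime"])

def application_oriented(jobs):  # optimize user attributes: earliest deadline
    return sorted(jobs, key=lambda j: j["deadline"])

class Scheduler:
    def __init__(self, policy=system_oriented):
        self.policy = policy
    def set_policy(self, policy):  # the extension point used by agents
        self.policy = policy
    def order(self, jobs):
        return [j["id"] for j in self.policy(jobs)]

jobs = [{"id": "a", "runtime": 9, "deadline": 5},
        {"id": "b", "runtime": 2, "deadline": 8}]
s = Scheduler()
print(s.order(jobs))                # ['b', 'a'] under the fixed policy
s.set_policy(application_oriented)  # an external agent changes the policy
print(s.order(jobs))                # ['a', 'b']
```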
2.2.3.5 Breadth of Scope
The breadth of scope expresses the scalability and self-adaptation levels of grid systems. If the system or grid-enabled application is designed only for a specific platform or application, then the breadth of scope is low. Systems that are highly scalable and self-adaptive can be characterized as having medium or high breadth of scope. Adopting self-adaptive mechanisms oriented towards a specific type of application can lead to poor performance for applications not covered by those mechanisms. For example, a scheduler that treats applications as if there were no data dependencies between tasks may perform poorly on an application that does have task dependencies [66].
2.2.3.6 Triggering Information
Triggering information refers to an aggregator service that collects information and checks the collected data against a set of conditions defined in a configuration file [104]. If the conditions are met, then the specified action takes place. The service plays an important role in notifying the administrator or controller whenever any service fails. One example of such an action is sending an email notification to the system administrator when the disk space on a server reaches a certain threshold. Triggering information can be used by schedulers while allocating resources.
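A minimal sketch of such a trigger follows. The metric names, threshold, and action are hypothetical, and a real deployment would send mail rather than print.

```python
# Sketch: a trigger that checks collected data against configured
# conditions and fires an action (names and thresholds are hypothetical).
triggers = [
    {"metric": "disk_free_gb", "below": 10,
     "action": lambda v: print(f"notify admin: disk space low ({v} GB left)")},
]

def evaluate(sample):
    for t in triggers:
        value = sample.get(t["metric"])
        if value is not None and value < t["below"]:
            t["action"](value)  # e.g. e-mail the system administrator

evaluate({"disk_free_gb": 7})   # condition met -> notification fires
```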
2.2.3.7 System Functionality
System functionality is an attribute used to define the core aspect of a system; for example, Javelin is a system for Internet-wide parallel computing based on Java.
2.2.3.8 Correlation of Grid Features and Resource Allocation
Grid systems are categorized into various types based on the characteristics of their resources, such as resource type and allocation policies [105]. The primary focus of computational grids is on processing capabilities. Computational grids are suitable for the execution of compute-intensive and high-throughput applications that usually need more computing power than a single resource can supply [105]. The scheduling organization determines the priorities in the resource allocation process. In centralized systems, single or multiple resources located in single or multiple domains can be managed [97]. In the decentralized scheduling model, the schedulers interact with each other to select resources appropriate for job execution. In case of conflicts among resource providers over a global policy for resource management, either system (centralized or decentralized) can be difficult to implement as a grid system. The hierarchical system allows remote resource providers to enforce local allocation policies [97]. Several policies are available for resource allocation: the fixed policy is generally used for sequentially processed jobs, while extensible policies are used if the application priorities can be set by external agents.
2.2.4 Cloud Computing Systems: Features and Requirements
Cloud computing systems are difficult to model under resource contention (competing access to shared resources). Many factors, such as the number of machines, types of applications, and overall workload characteristics, can vary widely and affect the performance of the system. A comprehensive study of the existing cloud technologies is presented in Section 2.3, based on a set of generic features of cloud systems.
2.2.4.1 System Focus
Each cloud system is designed to focus on certain aspects. For example, Amazon EC2 is designed to provide the best infrastructure for cloud computing systems with every possible feature available to the user [106]. Similarly, the GENI system [107] focuses on providing a virtual laboratory for exploring future internets in a cloud. Google Nimbus [108] focuses on extending and experimenting with a set of capabilities, such as resources as infrastructure and ease of use. OpenNebula [109] provides complete organization of data centers for on-premise IaaS cloud infrastructure.
2.2.4.2 Services
Cloud computing is usually considered the next step from the grid-utility model [56]. However, the cloud system not only realizes the service but also utilizes resource sharing. The cloud system guarantees the delivery of consistent services through advanced data centers that are built on compute and storage virtualization technologies [56, 110]. The type of services that a system provides to a user is an important parameter for evaluating the system [111]. Cloud computing is all about providing services to the users, whether in the form of SaaS, PaaS, or IaaS. Cloud computing architectures can be categorized based on the type of services: for example, Amazon EC2 provides computational and storage services, whereas Sun Network.com (Sun Grid) provides only computational services. Similarly, Microsoft Live Mesh and GRIDS Lab Aneka can be categorized as infrastructure-based and software-based clouds, respectively.
2.2.4.3 Virtualization
Cloud resources are modeled as virtual computational nodes connected through a large-scale network conforming to a specified topology. A peer-to-peer ring topology is a commonly used example for organizing a cloud resource system and user community [110]. Based on virtualization, the cloud computing paradigm allows workloads to be deployed and scaled out quickly through the rapid provisioning of VMs on physical machines. The author evaluated a number of systems based on the entities or processes responsible for performing virtualization.
2.2.4.4 Dynamic QoS Negotiation
Real-time middleware services must guarantee predictable performance under specified load and failure conditions [80]. The provision of QoS attributes dynamically at run time, based on specific conditions, is termed dynamic QoS negotiation; an example is the renegotiable variable bit rate [112]. Moreover, dynamic QoS negotiation ensures graceful degradation when the aforementioned conditions are violated. QoS requirements may vary during the execution of the system workflow to allow the best adaptation to customer expectations. Dynamic QoS negotiation in cloud systems is performed by either a dedicated process or an entity; self-managing algorithms [96] can also be implemented to perform it. In Eucalyptus group managers [80], the dynamic QoS operations are performed by the resource services. Dynamic QoS negotiation provides more flexibility to the cloud, which can make the difference when selecting between two different clouds based on the requirements of the user. That is the reason dynamic QoS is used as a comparison attribute for clouds.
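The renegotiation idea can be sketched as stepping down through QoS levels until the provider can honour one, which is one simple way to realize graceful degradation. The levels, capacities, and acceptance rule below are hypothetical.

```python
# Sketch: dynamic QoS negotiation with graceful degradation (illustrative).
LEVELS = ["guaranteed", "controlled", "best_effort"]  # strongest first

def negotiate(provider_capacity, demand):
    """Step down through QoS levels until the provider can honour one."""
    for level in LEVELS:
        if provider_capacity >= demand[level]:
            return level, demand[level]
    return None  # no agreement possible at any level

demand = {"guaranteed": 100, "controlled": 60, "best_effort": 20}
print(negotiate(70, demand))  # ('controlled', 60) -- degraded gracefully
```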
2.2.4.5 User Access Interface
The user access interface defines the communication protocol of the cloud system with the general user. Access interfaces must be equipped with the tools necessary for good performance of the system, and can be designed as a command-line, query-based, console-based, or graphical interface. Although access interfaces also exist for clusters and grids, in the cloud the access interface is especially important: if the interface provided to the user is not user-friendly, the user might not use the service. Suppose two CSPs provide the same services but only one has a user-friendly interface; users will generally prefer the one with the user-friendly interface. In such scenarios the access interface plays an important role, which is why it is used as a comparison feature for clouds.
2.2.4.6 Web APIs
In the cloud, a Web API is a web service dedicated to combining multiple web services into new applications [111]. A set of HTTP request messages, together with a description of the schema of the response messages in JSON or XML format, makes up a Web API [95]. Current Web APIs are built on REST-style communication, whereas earlier APIs were developed using SOAP-based services [111].
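For illustration, a REST-style Web API call of the kind described might look like the following sketch. The endpoint, path, and response schema are entirely hypothetical; only standard-library HTTP and JSON handling is used.

```python
# Sketch: a REST-style Web API exchange -- HTTP request in, JSON out.
# The base URL, path, and token below are hypothetical placeholders.
import json
import urllib.request

def list_instances(base_url, token):
    req = urllib.request.Request(
        f"{base_url}/v1/instances",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # JSON-described response body

# instances = list_instances("https://cloud.example.com", "API_TOKEN")
```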
2.2.4.7 Value Added Services
Value added services are defined as additional services beyond the standard services provided by the system. Value added services are available for a modest additional fee (or free) as an attractive and low-cost form of system support. The purposes of value added services are to: (a) promote the cloud system, (b) attract new service users, and (c) retain existing service users. The services mentioned in the SLA are the standard services; the services provided to end-users to promote the standard services fall under the category of value added services. Value added services are important for promoting the cloud and providing an edge over competitors: if one cloud offers only SLA-based services while another offers SLA-based services plus value added services, end-users will generally prefer the cloud that provides both.
2.2.4.8 Implementation Structure
Different programming languages and environments are used to implement a cloud system. The implementation package can be monolithic and consist of a single specific programming language; Google App Engine is an example of a cloud system that has been implemented in the Python scripting language. Another class is the high-level universal cloud systems, such as Sun Network.com (Sun Grid), which can be implemented using the Solaris OS and programming languages like Java, C, C++, and FORTRAN. The implementation structure is an important aspect for comparing different clouds, because if a cloud is implemented in an obsolete language, users will hesitate to adopt it.
2.2.4.9 VM Migration
VM technology has emerged as a building block of data centers, as it provides isolation, consolidation, and migration of workloads. The purpose of migrating VMs is to seek improvement in performance, fault tolerance, and manageability of the systems over the cloud. Moreover, in large-scale systems, VM migration can also be used to balance the system by migrating workload from overloaded or overheated machines to underutilized ones. Some hypervisors, such as VMware [113] and Xen, provide "live" migration, where the OS continues to run while the migration is performed. VM migration is an important aspect of the cloud for achieving high performance and fault tolerance.
2.2.4.10 Pricing Model in Cloud
The pricing model implemented in the cloud is the pay-as-you-go model, where services are charged according to the QoS requirements of the users. Resources in the cloud, such as network bandwidth and storage, are charged at specific rates. For example, the standard price for block storage on the HP cloud is $0.10 per GB/month [114]. Prices may vary between clouds depending on the types of services they provide.
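Under pay-as-you-go, the bill is simply usage multiplied by the unit rate: at the block-storage price quoted above, a 250 GB volume kept for one month costs 250 x $0.10 = $25. A minimal sketch follows; the bandwidth rate shown is a hypothetical example, not a quoted price.

```python
# Sketch: pay-as-you-go charging -- usage times unit rate (illustrative).
RATES = {
    "block_storage_gb_month": 0.10,  # $0.10 per GB/month, as quoted in [114]
    "bandwidth_gb": 0.08,            # hypothetical rate for illustration
}

def monthly_bill(usage):
    """Sum the metered usage of each resource times its unit rate."""
    return sum(RATES[item] * amount for item, amount in usage.items())

print(monthly_bill({"block_storage_gb_month": 250}))  # 25.0
```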
2.2.4.11 Correlation of Cloud Features and Resource Allocation
The focus of a cloud system is an important factor in the selection of appropriate resources and services for cloud users. Some resources may require a specific type of infrastructure or platform. However, cloud computing is more service-oriented than resource-oriented [62]: cloud users do not care much about the resources themselves, but are more concerned with the services being provided. Virtualization is used to hide the complexity of the underlying system and resources. User satisfaction is one of the main concerns in provisioning cloud computing web services. Dynamic QoS negotiations can only be made if the resources are available.
2.3 Comparison and Survey of the Existing HPC Solutions
In Table 2.1, a comparison of the three HPC categories (cluster, grid, and cloud) is provided. The author classifies various HPC research projects and commercial products according to the HPC systems classification developed in Section 2.1. The list of systems discussed is not exhaustive but is representative of the classes. The projects in each category have been chosen in Table 2.2 based on the factors specified for each HPC class reported in Section 2.1.
Table 2.2: Survey of the Existing HPC Systems

Cluster: Enhanced MOSIX, Gluster, Faucets, DQS, Tycoon, Cluster-on-Demand, Kerrighed, OpenSSI, Libra, PVM, CONDOR, REXEC, GNQS, LoadLeveler, LSF, SLURM, PBS

Grid: GRACE, Ninf, G-QoSM, Javelin, NWS, GHS, Stanford Peer Initiatives, 2K, AppLeS, Darwin, Cactus, PUNCH, Nimrod/G, NetSolve, MOL, Legion, Wren, Globus

Cloud: Amazon EC2, Eucalyptus, Google App Engine, GENI, Microsoft Live Mesh, Sun Network.com (Sun Grid), E-Learning Ecosystem, GRIDS Lab Aneka, OpenStack
2.3.1 Cluster Computing Systems
A list of representative cluster projects with a brief summary is provided in Table 2.3. The systems discussed below are characterized based on the generic cluster system features highlighted in Section 2.1.
2.3.1.1 Enhanced MOSIX
Enhanced MOSIX (E-MOSIX) is a tailored version of the MOSIX project [115], geared to achieve efficient resource utilization among the nodes of a distributed environment. Users create multiple processes to run their applications. MOSIX then discovers the resources and automatically migrates processes among the nodes for performance improvement, without changing the run-time environment of the processes. E-MOSIX uses a cost-based policy for process migration. Every node in the cluster makes resource allocation decisions independently. Different resources are accounted for, and the overall system performance measure is defined as the total cost of resource utilization. E-MOSIX supports the parallel job processing mode. Migration is used to decrease the overall cost of job execution on the different machines in the cluster. Furthermore, decentralized resource control is implemented, and each cluster node in the system is supplied with an autonomous resource assignment policy.
2.3.1.2 Gluster
Gluster defines a uniform computing and storage platform for developing applications inclined towards specific tasks, such as storage, database clustering, and enterprise provisioning [116]. Gluster is distribution-independent and has been tested on a number of Linux distributions. Gluster is an open-source and scalable platform whose distributed file system, GlusterFS, is capable of scaling up to thousands of clients. Commodity servers are combined with Gluster and storage to form massive storage networks. Gluster SP and Gluster HPC are bundled cluster applications associated with Gluster. The system can be extended using Python scripts [116].
Table 2.3: Comparison of Cluster Computing Systems

| System | Job Processing Type | QoS Attributes | Job Composition | Resource Allocation Control | Platform Support | Evaluation Method | Process Migration |
| Enhanced MOSIX [115] (1999) | Parallel | Cost | Single task | Decentralized | Heterogeneous | User-centric | Yes |
| Gluster [116] (2007) | Parallel | Reliability (no point of failure) | Parallel task | Decentralized | Heterogeneous | N/A | Yes |
| Faucets [117] (2003) | Parallel | Time, cost | Parallel task | Centralized | Heterogeneous | System-centric | Yes |
| DQS [120] (1998) | Batch | CPU memory sizes, hardware architecture, and OS versions | Parallel task | Decentralized | Heterogeneous | System-centric | No |
| Tycoon [121] (2004) | Sequential | Time, cost | Multiple task | Decentralized | Heterogeneous | User-centric | No |
| Cluster-on-Demand [123] (2002) | Sequential | Cost in terms of time | Independent | Decentralized | Heterogeneous | User-centric | No |
| Kerrighed [124,125] (1999) | Sequential | Ease of use, high performance, high availability, efficient resource management, and high customizability of the OS | Multiple task | Decentralized | Homogeneous | System-centric | Yes |
| OpenSSI [126] (2004) | Parallel | Availability, scalability, and manageability | Multiple task | Centralized | Heterogeneous | System-centric | Yes |
| Libra [82] (2004) | Batch, sequential | Time, cost | Parallel | Centralized | Heterogeneous | System-centric, user-centric | Yes |
| PVM [128] (2001) | Parallel, concurrent | Cost | Multiple task | Centralized | Heterogeneous | User-centric | Yes |
| Condor [103,134] (1998) | Parallel | Throughput, productivity of computing environment | Multiple task | Centralized | Heterogeneous | System-centric | Yes |
| REXEC [129] (1999) | Parallel, sequential | Cost | Independent, single task | Decentralized | Homogeneous | User-centric | No |
| GNQS [130] (2001) | Batch, parallel | Computing power | Parallel processing | Centralized | Heterogeneous | System-centric | No |
| LoadLeveler [131] (2001) | Parallel | Time, high availability | Multiple task | Centralized | Heterogeneous | System-centric | Yes |
| LSF [132] (2007) | Parallel, batch | Job submission simplification, setup time reduction, and operation errors | Multiple task | Centralized | Heterogeneous | System-centric | Yes |
| SLURM [90] (2003) | Parallel | Simplicity, scalability, portability, and fault tolerance | Multiple task | Centralized | Homogeneous | System-centric, user-centric | No |
| PBS [133] (1991) | Batch | Time, job queuing | Multiple task | Centralized | Heterogeneous | System-centric | Yes |
2.3.1.3 Faucets
Faucets [117] is designed for processing parallel applications and offers an internal adaptation framework for parallel applications based on Adaptive MPI [118] and Charm++ [69]. The number of applications executed on Faucets can vary [119], which allows the utilization of all resources currently available in the system. For each parallel task submitted to the system, the user has to specify the required software environment, expected completion time, number of processors needed for task completion, and budget limits. The privileged scheduling criterion in Faucets is the completion time of a job. The total cost of resource utilization calculated for a particular user is specified based on the bids received from the resource providers. Faucets supports time-shared scheduling that simultaneously executes adaptive jobs with dissimilar percentages of allocated processors. Faucets supports the parallel job processing type, and the constraints on the requirements of a parallel task remain constant throughout task execution. Jobs are submitted to Faucets with a QoS requirement, subscribing clusters return bids, and the best bid that meets all criteria is selected. Bartering is an important element of Faucets that permits cluster maintainers to exchange computational power with each other: units are awarded when the bidding cluster successfully runs an application, and users can later trade the bartering units to use the resources of other subscribing clusters.
2.3.1.4 Distributed Queuing System
DQS is used for scheduling background tasks over a number of workstations. The tasks are presented to the system as a queue of applications, which the DQS system automatically organizes based on the current resource status [120]. Jobs in the queue are sorted by priority, then by an internal sub-priority reflecting the subsequent submission pattern, and then by the job identifier. The sub-priority is calculated each time the master node scans the queued jobs for scheduling. The calculations are relative to each user and reflect the total number of that user's jobs ahead of each job in the queue, counting jobs in either the "Running" or the "Queued" state.
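This ordering rule can be sketched directly with illustrative job records (this is not DQS code): jobs sort by priority, then by a per-user sub-priority counting how many of the same user's jobs are already ahead, then by job identifier.

```python
# Sketch: DQS-style queue ordering (illustrative records, not DQS code).
def order_queue(jobs):
    ahead = {}  # user -> number of that user's jobs already placed ahead
    for job in jobs:                      # jobs arrive in submission order
        job["subpriority"] = ahead.get(job["user"], 0)
        ahead[job["user"]] = job["subpriority"] + 1
    # Higher priority first, then fewer of the user's own jobs ahead, then id.
    return sorted(jobs, key=lambda j: (-j["priority"], j["subpriority"], j["id"]))

queue = [{"id": 3, "user": "u1", "priority": 5},
         {"id": 1, "user": "u1", "priority": 5},
         {"id": 2, "user": "u2", "priority": 5}]
print([j["id"] for j in order_queue(queue)])  # [2, 3, 1]
```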
2.3.1.5 Tycoon
Tycoon allocates cluster resources along different system performance factors, such as CPU cycles, memory, and bandwidth [121, 122]. Tycoon is based on the principle of proportional resource sharing, and its major advantage is the ability to differentiate the values of jobs. Communication delay is a major factor in resource acquisition latency, and no manual bidding process is available in Tycoon, although manual bidding would support proficient use of different resources when no precise bids are present at all. Tycoon is composed of four main components, namely: (a) bank, (b) auctioneers, (c) location service, and (d) agents. Tycoon [122] uses a two-tier architecture for the allocation of resources and distinguishes between the allocation mechanism and the user strategy: the allocation mechanism offers the means to seek user assessments for efficient execution, while the user strategy captures high-level preferences that vary across users and are largely application-dependent. The separation of allocation mechanism and user strategy keeps the two sets of requirements independent and unrestricted.
2.3.1.6 Cluster on Demand
Cluster-on-Demand allocates servers from a common pool to multiple partitions called virtual clusters, each with an independently configured software environment [123]. The jobs executed in the system are implicitly single-task applications and are ordered on the basis of arrival time. For each job submitted to the system, the user specifies a value function containing a constant reduction factor for the required level of service [123]. The value function remains static throughout the execution once the agreement has been approved by the user. A cluster manager is responsible for scheduling the tasks to resources from different administrative domains, and adaptive reallocation of resources is supported. Moreover, the cluster manager forces less costly dedicated jobs to wait for more costly new tasks that may arrive later. From this it can be deduced that no hard constraints are supported, because many accepted jobs can take more time to complete than anticipated. The cost measure of Cluster-on-Demand is the cost of node configuration for a full wipe-and-clean install; the system uses a user-centric evaluation of this cost measure, and the major cost factor is the type of hardware devices used.
2.3.1.7 Kerrighed
Kerrighed is a cluster system with a Linux kernel patch as the main module controlling the whole system's behavior [124, 125]. For fair load balancing of the cluster, the schedulers use sockets, pipes, and character devices. The use of these devices does not affect the cluster communication mechanisms, thanks to seamless migration of applications across the system. The migration of both single-threaded and multi-threaded applications is supported: a running process at one node can be paused and restarted at another node. The Kerrighed system provides the view of a single SMP machine.
2.3.1.8 Open Single System Image
OpenSSI is an open-source uniform (single system image) clustering system that allows a collection of computers to serve as one large cluster [126]. Contrary to Kerrighed, in OpenSSI the number of available resources may vary during task execution. OpenSSI is based on the Linux OS and uses the concept of bit-variation process migration derived from MOSIX, which dynamically balances the CPU load on the cluster by migrating threaded processes. Process management in OpenSSI is demanding: a single process ID is assigned to each process on a cluster, and inter-process communication is handled cluster-wide. The limitation of OpenSSI is that it supports a maximum of 125 nodes per cluster.
2.3.1.9 Libra
Libra schedules a number of jobs based on system and user requirements [82]. Different resources are allocated based on the budget and deadline constraints of each job. Libra communicates with the federal resource manager, which is responsible for collecting information about the different resources present in the cluster. In the case of a mixed composition of resources, the estimated execution time is calculated across the diverse worker nodes. Libra assigns resources to executing jobs based on their deadlines. A centralized accounting mechanism records the resource utilization of current jobs, periodically relocating time partitions for each critical job so as to meet the deadlines. Libra assumes that each submitted job is sequential and composed of a single task. Libra schedules tasks to the internal worker nodes available in the cluster [127]. Each internal node has a task control component that relocates and reassigns processor time, performing partitioning periodically based on the actual execution and deadline of each active job. The evaluation factors used to assess overall system performance are system-centric, with average waiting and response time as the parameters. Nevertheless, Libra performs better than the traditional FCFS scheduling approach on both user-centric and system-centric evaluation factors.
2.3.1.10 Parallel Virtual Machine
PVM is a portable software package that combines a heterogeneous collection of networked computers to provide the view of a single large parallel computer. The aim of PVM is to aggregate the memory and power of many computers to solve large computational problems cost-efficiently. To solve much larger problems, PVM can accommodate existing computer hardware at minimal extra cost. A PVM user outside the cluster can view the cluster as a single terminal: all cluster details are hidden from the end user, irrespective of how the cluster places tasks on individual nodes. PVM is currently used by a number of sites across the globe for solving medical, scientific, and industrial problems [128], and is also employed as an educational tool for teaching parallel programming courses.
2.3.1.11 REXEC
REXEC is a resource sharing mechanism in which users compete for the shared resources of a cluster [129]. The computational loads of the resources are balanced according to the total allocation cost, such as the credits per minute that users agree to pay for resource utilization. Multiple daemons, the key components of the decentralized resource management control system, select the best node to execute particular tasks. A number of jobs are mapped to the distributed resources in the same time intervals according to time-shared scheduling rules. REXEC supports parallel and sequential job processing. Users specify cost restrictions that remain fixed after task submission. The resource assignments already present in the system are recomputed whenever a new task execution is initialized or finished. REXEC uses an aggregate utility function as a user-centric evaluation factor that represents the cost of all tasks on the cluster, and end-users are charged based on the completion times of their tasks.
2.3.1.12 Generic Network Queuing System
GNQS is an open-source batch processing system. Networks of computers, or applications on a single machine, are scheduled through GNQS, which does not allow tasks to execute simultaneously [130]. GNQS is not a shareware application and is maintained by a large community across the Internet. An ANSI C compiler and root privileges are required to compile the code and successfully run GNQS on a single local computer.
2.3.1.13 Load Leveler
Load Leveler [131] is a parallel scheduling system developed by IBM that works by matching the processing needs of each task with the priorities of available resources. A number of end users can execute jobs within a limited time interval using Load Leveler. For high availability and workload management, Load Leveler provides a single point of control. In a multi-user production environment, Load Leveler yields an aggregate improvement in system performance and turnaround time with equal distribution of the resources. In Load Leveler, every participating machine must run one or more daemons [131].
2.3.1.14 Load Sharing Facility
LSF [132] has a complete set of workload management capabilities for managing workload in distributed, demanding, and critical HPC environments. LSF executes batch jobs. Its set of workload management and intelligent scheduling features fully utilizes the computing resources. LSF schedules complex workloads and provides a highly available and scalable architecture. Ref. [132] provides HPC components, such as an HPC data center for managing workload, along with vendor support.
2.3.1.15 Simple Linux Utility for Resource Management
SLURM [90] is an open-source, scalable, and fault-tolerant cluster management and job scheduling system used in small and large Linux cluster environments. SLURM provides users with exclusive and non-exclusive access to computing resources. It then executes and monitors the work on the allocated computing resources. Finally, pending requests awaiting free resources are arbitrated.
2.3.1.16 Portable Batch System
PBS [133] provides job resource management in a batch cluster environment. In an HPC environment, PBS provides job information to Moab, a job scheduler used with PBS that decides the selection of jobs for execution. PBS selects and dispatches jobs from the queue to cluster nodes for execution. PBS supports both non-interactive and interactive batch jobs, the non-interactive ones being more common. The essential execution commands and resource requests are written in the form of a job script that is submitted for execution.
2.3.1.17 Condor (HTCondor)
A large collection of heterogeneous machines and networks is managed by the Condor high-throughput computing environment [103, 134]. Condor shares and combines idle computing resources, and preserves the specifications of the originating machine through its remote system call capabilities. Remote system calls track the originating machine when the file system or scheme is not shared among the users. A Condor matchmaker is used to determine compatible resource requests: the matchmaker queries the Condor collector, where resource information is stored, for resource discovery.
2.3.2 Grid Computing Systems
A diverse range of applications is employed in computational grids. Scientists and engineers rely on grid computing to solve challenging problems in engineering, manufacturing, finance, risk analysis, data processing, and science [135]. Table 2.4 shows the representative grid systems, analyzed according to the grid features specified in Section 2.1. Most of the feature values are straightforward. For some features, such as breadth of scope and triggering information, we made a comparative judgment and the values are high, medium, or low; no threshold values for this categorization are provided.
Table 2.4: Comparison of Grid Computing Systems

| System | Sys Type | Scheduling Organization | Resource Description | Resource Allocation Policy | Breadth of Scope | Triggering Info | Sys Functionality |
| GRACE [135,136] (2005) | Computational | Not specified; can be decentralized/ad-hoc | CPU processing power, memory, storage capacity, and network bandwidth | Fixed AOP | High | High | Resources are allocated on demand and supply |
| Ninf [137] (1996) | Computational | Decentralized | No QoS, periodic push dissemination, centralized queries discovery | Fixed AOP | Medium | Low | A global computing client-server based system |
| G-QoSM [139,140,141] (2002) | On-demand | Requirements matching | Processing power and bandwidth | Fixed AOP | High | High | SLA-based resources are allocated in the system |
| Javelin [142] (2000) | Computational | Decentralized | Soft QoS, distributed queries discovery, other network directory store, periodic push dissemination | Fixed AOP | Low | Medium | A system for Internet-wide parallel computing based on Java |
| NWS [143] (1997) | Hierarchical | Host capacity | End-to-end latency, available bandwidth, availability of CPU | N/A | Low | Low | Used for short-term prediction |
| GHS [147] (2005) | Hierarchical | Heuristics based on host capacity | Availability of CPU, TCP connection establishment time, end-to-end latency | N/A | Medium | Medium | Scalability and precision in prediction at a higher level than NWS [47] |
| Stanford Peer Initiatives | Computational | Hierarchical/decentralized | CPU cycles, disk space, network bandwidth | N/A | High | Medium | Distribution of the main costs of sharing data: disk space for storing files and bandwidth for transfer |
| 2K [149,150] (1999) | On-demand | Hierarchical/decentralized | Online dissemination, soft network QoS, agent discovery | Fixed AOP | High | Medium | Flexible and adaptable distributed OS used for a wide variety of platforms |
| AppLeS [153,138] (1997) | High-throughput | Hierarchical/decentralized | Models for resources provided by Globus, Legion, or NetSolve | Fixed AOP | Low | Medium | Produces scheduling agents for computational grids |
| Darwin [154] (2001) | Multimedia | Hierarchical/decentralized | Hard QoS, graph namespace | Fixed system-oriented policy (SOP) | Low | High | Manages resources for network services |
| Cactus Worm [155] (2001) | On-demand | Requirements matching | N/A | Fixed AOP | High | Medium | Allows applications to adapt when the required performance is not achieved |
| PUNCH [101] (1999) | Computational | Hierarchical/decentralized | Soft QoS, periodic push dissemination, distributed queries discovery | Fixed AOP | Medium | Medium | A middleware that provides transparent access to remote programs and resources |
| Nimrod/G [158,159] (2000) | High-throughput | Hierarchical/decentralized | Relational network directory data store, soft QoS, distributed queries discovery | Fixed AOP | Medium | Medium | Provides brokering services for task-farming applications |
| NetSolve [162] (1997) | Computational | Decentralized | Soft QoS, periodic push dissemination, distributed queries discovery | Fixed AOP | Medium | Medium | A network-enabled application server for solving computational problems in distributed environments |
| MOL [164] (2000) | Computational | Decentralized | Distributed queries discovery, periodic push dissemination | Extensible ad-hoc scheduling policies (ASP) | Low | Low | Provides resource management for dynamic communication, fault management, and access provision |
| Legion [100,166] (1999) | Computational | Hierarchical/decentralized | Soft QoS, periodic pull dissemination, distributed queries discovery | Extensible structured scheduling policy (SSP) | Medium | Medium | Provides an infrastructure for grids based on an object meta-system |
| Wren [102] (2003) | Grid | No mechanism for initial scheduling | N/A | Fixed AOP | Low | Low | Provides active probing with low overhead |
| Globus [161] (1996) | Hierarchical | Decentralized | Soft QoS, network directory store, distributed queries discovery | Extensible ad-hoc scheduling policy (ASP) | High | Medium | Provides basic services for modular deployment of grids in the Globus Metacomputing Toolkit |
2.3.2.1 Grid Architecture for Computational Economy
GRACE is a generic infrastructure for the market-based grid approach that co-exists with other grid systems, such as Globus. Grid users interact with the system through a GRB. GRACE employs the Nimrod-G grid scheduler [135], which is responsible for: (a) resource discovery, (b) selection, (c) scheduling, and (d) allocation. The resource brokers fulfill user demands by simultaneously optimizing the execution time of the jobs and the user's budget expenses [135]. The GRACE architecture allocates resources on a supply-and-demand basis [136]. The resources monitored by GRACE are software applications or hardware devices; GRACE enables control of CPU power, memory, storage capacity, and network bandwidth. The resources are allocated according to a fixed application-oriented policy.
2.3.2.2 Network Infrastructure
Ninf [137] is an example of a computational grid based on a client-server infrastructure. Ninf clients are connected with the servers through local area networks. The server machine and the client machines may be heterogeneous, which is why the data to be communicated is translated into a common network data format [137]. The components of the Ninf system are client interfaces, remote libraries, and a meta-server. Ninf applications invoke Ninf libraries, and the request is forwarded to the Ninf meta-server, which maintains the directory of Ninf servers and forwards each library request to the appropriate server. Ninf uses a centralized resource discovery mechanism; the computational resources are registered with the meta-server through library services [138]. The scheduling mechanism in Ninf is decentralized, and the server performs the actual scheduling of client requests. Ninf uses a fixed application-oriented policy, has a medium breadth of scope, and provides no QoS. The triggering information in Ninf is low.
2.3.2.3 Grid-Quality of Services Management
The G-QoSM [136, 139, 140] system works under an OGSA [141]. G-QoSM provides
resource and service discovery support based on QoS features. Moreover, a guarantee of
supporting QoS at the application, network, and middle grid levels is also provided. G-QoSM
provides three levels of QoS, namely: (a) best effort, (b) controlled, and (c) guaranteed.
The resources are allocated on the basis of SLAs between the users and providers.
Because G-QoSM utilizes an SLA mechanism, the triggering information as well as the
breadth of scope is high. The scheduling organization in G-QoSM can be centralized or
decentralized. However, the main focus of G-QoSM is managing the QoS. The resources
focused on by G-QoSM management are bandwidth and processing power. G-QoSM
uses a fixed application-oriented policy for resource allocation.
2.3.2.4 Javelin
Javelin is a Java-based infrastructure [142] that may be used as an Internet-wide parallel
computing system. Javelin is composed of three main components: (a) clients, (b) hosts,
and (c) brokers. Hosts provide computational resources, clients seek computational
resources, and resource brokers support the allocation of the resources. In Javelin, hosts
can be attached to a broker and are considered as resources. Javelin uses hierarchical
resource management [138]. If a client or host wants to connect to Javelin, then a connection
has to be made with a Javelin broker that agrees to support the client or host. The backbone of
Javelin is the BNS, an information system that keeps the information about available
brokers [142]. Javelin has a decentralized scheduling organization with a fixed
application-oriented resource allocation policy. The breadth of scope of Javelin is low, as it
only supports Java-based applications. The triggering information of Javelin is medium.
2.3.2.5 Network Weather Service
NWS [143] is a distributed prediction system for network (and resource) dynamics.
The prediction mechanism in NWS is based on adaptation strategies that analyze the
previous system states. In NWS, system features and network performance factors, such
as bandwidth, CPU speed, TCP connection establishment time, and latency, are
considered the main criteria of resource description measurements [136]. Systems such
as NWS have been used successfully to choose between replicated web pages [144] and
to implement dynamic scheduling agents for meta-computing applications [145, 146].
An extensible system architecture, distributed fault-tolerant control algorithms, and adaptive
programming techniques have been demonstrated by the implementation of NWS, which
operates in a variety of meta-computing and distributed environments with changing
performance characteristics. NWS uses a host-capacity-based scheduling organization.
Moreover, NWS works well for short-time processes in restricted-area grid clusters.
Therefore, the breadth of scope and triggering information of NWS are low.
2.3.2.6 Grid Harvest Service
The goal of GHS is to achieve high scalability and precision in a network [147]. The
GHS system comprises five subsystems, namely: (a) a task allocation module, (b) an
execution management system, (c) performance measurement, (d) performance
evaluation, and (e) task scheduling modules. GHS enhances application performance by
task re-scheduling and by utilizing two scheduling algorithms. The first algorithm minimizes
task execution time, and the second algorithm assigns the tasks to an individual
resource. GHS, like NWS [143], uses host-capacity-based heuristics as its scheduling
organization. The breadth of scope and the triggering information are of medium level in
GHS [136].
2.3.2.7 Stanford Peers Initiative
The Stanford Peers Initiative utilizes a peer-to-peer data trading framework to create a digital
archiving system. It uses a unique bid trading auction method that
seeks bids from distant web services to replicate the collection. In response, each remote
web service replies with a bid that reflects the amount of total disk storage space required [148].
The local web service selects the lowest bid to maximize the benefit. Because the system
focuses on preserving the data for the longest possible period, the major system
performance factor is reliability. The reliability measure is the MTTF of each local
web service. Each web service tries to minimize the total cost of trading, which is usually
measured in terms of the disk space provided. For replicating a data collection, a decentralized
management control is implemented in the system: the web service makes the decision to
select the suitable remote services. Each web service is represented as an independent
system entity. The storage space remains fixed throughout, even if a remote service is
selected.
2.3.2.8 2K
The 2K grid system provides distributed services for multiple flexible and adaptable platforms
[149, 150]. The supported platforms range from PDAs to large-scale computers for
application processing. 2K is a reflective OS built on top of a reflective ORB, dynamicTAO
[151], a dynamically configurable version of TAO [152]. The key features of 2K
are: (a) distribution, (b) user-centrism, (c) adaptation, and (d) architectural awareness. 2K
is an example of an on-demand grid system that uses agents for resource discovery and
mobile agents for resource dissemination [138]. 2K uses a decentralized and hierarchical
scheduling organization and a fixed application-oriented resource allocation policy. In the
2K system, no mechanism for rescheduling is supported. The breadth of scope of the 2K
system is high due to the wide range of supported platforms. Moreover, the triggering
information is medium due to soft QoS provisioning.
2.3.2.9 AppLeS
AppLeS is an application-level grid scheduler that operates as an agent in a dynamic
environment. AppLeS assists an application developer by enhancing the scheduling
activity. For each application, an individual AppLeS agent is designed for resource
selection. AppLeS agents are not utilized as a resource management system; instead, they
rely on the resource management systems of projects such as the Legion and Globus grid
packages [153]. AppLeS is used for computational purposes and provides templates that
can be used in structurally similar applications. AppLeS utilizes hierarchical or decentralized
schedulers and a fixed application-oriented policy for resource allocation [138]. The
triggering information in AppLeS is medium and the breadth of scope is low.
2.3.2.10 Darwin
Darwin is a grid resource management system that provides value-added network
services electronically [154]. The main features of the system are: (a) high-level resource
selection, (b) run-time resource management, (c) hierarchical scheduling, and (d) low-level
resource allocation mechanisms [154]. Darwin utilizes hierarchical schedulers and
online rescheduling mechanisms. The resource allocation policy in Darwin is fixed
system-oriented. To allocate resources globally in the grid, the system employs a request
broker called Xena [154]. For resource allocation at a higher level, Darwin uses an H-FSC
scheduling algorithm. Darwin runs in routers and provides hard network QoS.
2.3.2.11 Cactus Worm
Cactus Worm [139, 155] is an on-demand grid computing system. Cactus Worm supports
an adaptive application structure and can be characterized as an experimental framework
that can handle dynamic resource features. Cactus supports dynamic resource selection
for resource interchange through migration. The migration mechanism is performed only
when the performance is not adequate [155]. Cactus Worm supports different
architectures, namely: (a) uni-processors, (b) clusters, and (c) supercomputers [156]. The
scheduling organization in Cactus Worm is based on requirements matching (Condor),
and a fixed AOP is used for resource allocation. The functionality of the Cactus
Worm is expressed in the adaptation of the resource allocation policy if the required
performance level is not achieved. Moreover, the breadth of scope is high and the triggering
information is at the middle level in Cactus Worm.
2.3.2.12 PUNCH
PUNCH is a network-based computing middleware test-bed that provides OS services in
a distributed computing environment [101]. PUNCH is a multi-user and multi-process
environment that allows: (a) transparent remote access to applications and resources,
(b) access control, and (c) job control functionality. PUNCH supports a virtual grid
organization by fully decentralized and autonomous management of resources [157]. The
key concept is to design and implement a platform that provides independence between
the applications and the computing infrastructure. PUNCH possesses hierarchical
decentralized resource management and predictive machine-learning methodologies for
mapping the jobs to resources. PUNCH uses: (a) an extensible schema model, (b) a
hybrid namespace, (c) soft QoS, (d) distributed queries discovery, and (e) periodic push
dissemination for resource description. The resources are allocated according to a fixed
application-oriented policy. The functionality of PUNCH is expressed in terms of flexible
remote access to the user and the computing infrastructure of the application. The
breadth of scope and triggering information of PUNCH are at the middle level.
2.3.2.13 Nimrod/G
Nimrod/G is designed to seamlessly execute large-scale parameter study simulations,
such as parameter sweep applications, through a simple declarative language and GUI on
computational grids [158, 159]. Nimrod/G is a grid resource broker for managing and
steering task farming applications and follows a computational market-based model for
resource management. Nimrod/G strives for low-cost access to computational resources
using GRACE services. Moreover, the user-defined constraint cost is minimized using
adaptive scheduling algorithms. Besides the parameter studies, Nimrod/G also provides
a single window to: (a) manage and control experiments, (b) discover
resources, (c) trade resources, and (d) perform scheduling [160, 161]. Nimrod/G uses a
task farming engine that supports user-defined scheduling policies. Nimrod uses a
hierarchical decentralized scheduler and predictive pricing models as its scheduling
organization. Moreover, Nimrod/G uses resource descriptions such as: (a) a relational
network directory data store, (b) soft QoS, (c) distributed queries discovery, and (d)
periodic dissemination. A fixed application-oriented policy driven by user-defined
requirements, such as deadline and budget limitations, is used in Nimrod/G for resource
allocation. ActiveSheets, which executes Microsoft Excel computations/cells on the grid,
is an example application of Nimrod/G [162].
2.3.2.14 NetSolve
NetSolve [163] is an application server based on a client-agent-server environment.
NetSolve integrates distributed resources into a desktop application. NetSolve resources
include hardware, software, and computational software packages. TCP/IP sockets are
used for the interaction among the users, agents, and servers. The server can be
implemented in any scientific package, while the clients can be implemented in C,
FORTRAN, MATLAB, or web pages. The agents are responsible for locating the best
possible resources available in the network. Once the resource is selected, the agents
execute the client request and return the answer back to the user. NetSolve is a
computational grid with a decentralized scheduler. NetSolve uses soft QoS, distributed
queries discovery, and periodic push dissemination for resource description.
Moreover, a fixed application-oriented policy is used for resource allocation. The breadth of
scope and triggering information of NetSolve are medium, as scalability is limited to
certain applications.
2.3.2.15 Meta Computing Online
The MOL system consists of a kernel as the core component of the system and provides the
basic infrastructure for interconnecting resources, users, and third-party meta-computer
components [164]. MOL supports dynamic communications, fault management, and
access provisions. The key aspects of the MOL kernel are reliability and flexibility.
Moreover, MOL is the first meta-computer infrastructure that does not exhibit a single
point of failure [165]. MOL has a decentralized scheduler and uses: (a) a hierarchical
namespace, (b) an object model store, and (c) distributed queries discovery as its resource
description. MOL uses extensible ad-hoc scheduling policies for resource allocation. No
QoS support is available, so the triggering information and breadth of scope are low.
2.3.2.16 Legion
Legion is a software infrastructure that aims to connect multiple hosts ranging from PCs
to massively parallel computers [100]. The most important features that motivate the use of
Legion include: (a) site autonomy, (b) support for heterogeneity, (c) usability, (d) parallel
processing to achieve high system performance, (e) extensibility, (f) fault tolerance, (g)
scalability, (h) security, (i) multi-language implementation support, and (j) global naming
[166]. Legion appears as a vertical system and follows a hierarchical scheduling model.
Legion uses distributed queries for resource discovery and periodic pull for
dissemination. Moreover, Legion uses an extensible structured scheduling policy for
resource allocation. Among the main objectives of Legion are system scalability
and high performance. The breadth of scope and triggering information are high in
Legion.
2.3.2.17 Wren
Wren is a topology-based steering approach for providing network measurement [100].
The networks range from clusters to WANs, and Wren uses information from topologies
about the possible bottlenecks that may occur in the networks. This information is useful in
steering the measurement techniques to probe the channels where bottlenecks may
occur. Passive and active measurement systems are combined by Wren to minimize the
measurement load [100]. Topology-based steering is used to achieve the load
measurement task. Moreover, no mechanism for initial scheduling is available, and a fixed
application-oriented policy is used for resource allocation. Furthermore, Wren has limited
scalability, which results in a low breadth of scope. No QoS attributes are considered in
Wren, and the triggering information is also low.
2.3.2.18 Globus
The Globus system achieves a vertically integrated treatment of applications, networks,
and middleware [161]. The low-level toolkit performs: (a) communication, (b)
authentication, and (c) access. Meta-computing systems face the problems of configuration
and performance optimization.
2.3.3 Cloud Computing Systems
Numerous cloud approaches tackle complex resource provisioning and programming
problems for users with different priorities and requirements. Nine examples of cloud
computing solutions are summarized and characterized under the key system features in
Table 2.5.
2.3.3.1 Amazon Elastic Compute Cloud
Amazon EC2 [167] is a virtual computing environment that enables a user to run Linux-based
applications. Amazon EC2 provides a rental service of VMs on the Internet [79].
Amazon's EC2 service has become a standard-bearer for IaaS providers and provides
many different service levels to the users [168]. Depending on the individual user's
choices, a new 'Machine Image' based on: (a) application types, (b) structures, (c)
libraries, (d) data, and (e) associated configuration settings can be specified. The user can
also choose among the available AMIs in the network and upload an AMI to S3. The machine
can be reloaded in a shorter period of time, allowing flexible system operations, although
the whole system load time increases significantly. Virtualization is achieved by running
the machines on Xen [169] at the OS level. The users interact with the system through the
Amazon EC2 command-line tools. Amazon EC2 is built on a customizable Linux-based
AMI environment.
2.3.3.2 Eucalyptus
Eucalyptus [80] is a Linux-based open-source software framework dedicated to cloud
computing. Eucalyptus allows the users to execute and control entire VM instances
deployed across a variety of physical resources. Eucalyptus is composed of an NC that
controls the: (a) execution, (b) inspection, and (c) termination of VM instances on the
host where it runs [110]. The CC gathers information about VMs and schedules VM execution
on specific NCs; moreover, the CC manages virtual instance networks. An STC called
"Walrus", a storage service, provides a mechanism for storing and accessing VM images
and user data [110]. The Cloud Controller is the web service entry point for users and
administrators and makes the high-level scheduling decisions. Eucalyptus's high-level system
components are implemented as web services [110]. The Instance Manager is
responsible for virtualization in Eucalyptus. Moreover, Amazon EC2's SOAP and Query
interfaces provide system access to the user. Dynamic QoS negotiation is performed
in Eucalyptus by Group Managers, which collect information through resource services.
2.3.3.3 Google Application Engine
GAE [170] is a freeware platform designed for the execution of web applications. The
applications are managed through a web-based administration console [171]. GAE is
implemented using Java and Python. GAE provides users with authorization and
authentication facilities as a web service, which lifts the burden from the developers. Other
than supporting the Python standard library and Java, GAE also supports APIs for: (a) the
data store, (b) Google accounts, (c) URL fetch, (d) image manipulation, and (e) email services.
2.3.3.4 Global Environment for Network Innovations
GENI provides a collaborative environment for academia, industry, and the public to
catalyze revolutionary discoveries and innovation in the emerging field of global
networks [107]. The project is sponsored by the National Science Foundation and is open
source and broadly inclusive. GENI is a "virtual laboratory" for exploring future internets
at scale [156]. The virtualization is achieved through network-accessible APIs. GENI
creates major opportunities to: (a) understand, (b) innovate in, and (c) transform global
networks and (d) their interactions with society. GENI enables researchers to experiment with
different network structures by running experimental systems within private, isolated
slices of a shared test-bed [172]. The user can interact with the GENI interface through the
Slice Federation Architecture 2.0 [173]. Dynamic QoS negotiation is also incorporated
through clearing-house-based resource allocation. GENI can be implemented in: (a) SFA
(PlanetLab), (b) ProtoGENI, and (c) GCF-based environments.
Table 2.5: Comparison of Cloud Computing Systems

System (Year) | System Focus | Services | Virtualization | Dynamic QoS Negotiation | User Access Interface | Web APIs | Value Added Services | Implementation Structure
Amazon Elastic Compute Cloud (EC2) (2006) | Infrastructure | Compute, Storage (Amazon S3) | OS level, running on a Xen hypervisor | None | EC2 command-line tools | Yes | Yes | Customizable Linux-based AMI
Eucalyptus (2009) | Infrastructure | Compute, Storage | Instance Manager | Group Managers through resource services | EC2's SOAP and Query interfaces | Yes | Yes | Open source, Linux-based
Google App Engine (2008) | Platform | Web application | Application container | None | Web-based administration console | Yes | No | Python
GENI (2007) | Virtual laboratory | Compute | Network-accessible APIs | Clearing-house-based resource allocation | Slice Federation Architecture 2.0 | Network-accessible APIs | Yes | SFA (PlanetLab), ProtoGENI, and GCF
Microsoft Live Mesh (2005) | Infrastructure | Storage | OS level | None | Web-based Live Desktop and any device with Live Mesh installed | N/A | No | N/A
Sun Network.com (Sun Grid) (2007) | Infrastructure | Compute | Job management system (Sun Grid Engine) | None | Job submission scripts, Sun Grid web portal | Yes | Yes | Solaris OS, Java, C, C++, FORTRAN
E-learning Ecosystem (2007) | Infrastructure | Web application | Infrastructure layer | None | Web-based dynamic interfaces | Yes | Yes | Programming models available in ASP.Net for the front end and any database (e.g., SQL, Oracle) at the back end
GRIDS Lab Aneka (2008) | Software platform for enterprise clouds | Compute | Resource manager and scheduler | SLA-based resource reservation on the Aneka side | Workbench, web-based portal | Yes | No | APIs supporting different programming models in C# and .Net
OpenStack (2011) | Software platform | Compute, Storage, Web Image | Compute, Web Image Service | None | REST interface | Yes | Yes | N/A
2.3.3.5 Microsoft Live Mesh
Microsoft Live Mesh aims to provide remote access to applications and data that are
stored online. The user can access the uploaded applications and data through a web-based
Live Desktop or the Live Mesh software [74]. The Live Mesh software uses the Windows Live
login for password protection, and all file transfers are protected using SSL [52]. The
concept of virtualization is implemented at the OS level. Any machine with Live Mesh
installed, as well as the web-based Live Desktop, can access Microsoft Live Mesh.
2.3.3.6 Sun Network.Com (Sun Grid)
Sun Grid is a cloud that offers its services as a PaaS. Sun Grid [174, 175] is used to
execute Java, C, C++, and FORTRAN based applications on the cloud. To run an
application on Sun Grid, the user has to follow a certain sequence of steps. First, the user
has to build and debug the application and scripts in a local development environment
whose configuration is similar to that of the Sun Grid [175]. Secondly, a
bundled zip archive (containing all the related scripts, libraries, executable binaries, and
input data) must be created and then uploaded to Sun Grid. Virtualization is achieved
through a job management system commonly termed the Sun Grid Engine. Lastly, the Sun
Grid web portal or API can be used to execute and monitor the application. After the
completion of the application execution, the results can be downloaded to the local
development environment for viewing [174, 175].
2.3.3.7 E-Learning Ecosystem
The E-Learning Ecosystem is a cloud-computing-based infrastructure used for the
specification of all the components needed for the implementation of e-learning solutions
[176, 177, 178]. A fully developed e-learning ecosystem may include: (a) a web-based
portal, (b) an access learning program, and (c) personal career aspirations. The purpose is to
enable the users or employees to: (a) check their benefits, (b) make changes to medical
plans, and (c) learn competencies that tie to the business objectives [177]. The focus of
an e-learning ecosystem is to provide an infrastructure that applies business discipline to
manage the learning assets and activity of the entire enterprise. Virtualization is
implemented at the infrastructure layer [179]. Web-based dynamic interfaces are used to
interact with the users. Moreover, some value-added services are also provided on
demand to exclusive users.
2.3.3.8 Grids Lab Aneka
GRIDS Lab Aneka [179] is a service-oriented architecture used in enterprise grids. The aim
of Aneka is to support the development of dynamic communication protocols, where the
preferred selection may be changed at any time. GRIDS Lab Aneka supports multiple
application models, persistence, and security solutions [179, 180]. Virtualization is an
integral part of Aneka and is achieved through the resource manager and scheduler. The
dynamic QoS negotiation mechanism is specified based on the SLA resource
requirements. Moreover, Aneka addresses deadline (the maximum time period within which
the application needs to be completed) and budget (the maximum cost that the user is willing
to pay for meeting the deadline) constraints. User access is provided through a
workbench or a web-based portal, along with value-added services.
2.3.3.9 OpenStack
OpenStack is a large-scale, open-source, community-maintained software effort, built
through the collaboration of programmers to produce an open-standard operating system
that runs clouds for virtual computing or storage, for both public and private clouds.
OpenStack is composed of three software projects: (a) OpenStack Compute, (b) OpenStack
Object Storage, and (c) OpenStack Image Service [181]. OpenStack Compute produces a
redundant and scalable cloud computing platform by provisioning and managing large
networks of VMs. OpenStack Object Storage is a long-term storage system that stores
multiple petabytes of accessible data. OpenStack Image Service provides a standard REST
interface for querying information about virtual disk images. OpenStack is an open
industry standard with a massively scalable public cloud. Moreover, OpenStack avoids
proprietary vendor lock-in by supporting all available hypervisors under Apache 2.0
licensing.
2.4 Classification of Systems
The systems of each category (cluster, grid, and cloud) are classified in this section into
software-only and hardware/hybrid-only systems, as shown in Table 2.6. The software-only
classification is composed of tools, mechanisms, and policies. The hardware and hybrid
classification comprises infrastructures or hardware-oriented solutions. Any change in the
hardware design and OS extensions is done by the manufacturers. The hardware and OS
support can be cost prohibitive to the end-users. Moreover, programming for newly added
hardware and software can result in more time and computational cost, and such
programming can become a big burden to end-users. The cost of changing hardware and
software at the user level is the least amongst all the costs associated with the system.
2.4.1. Software Only Solutions
The software-only solutions are the projects that are distributed as software products,
components of a software package, or as middleware. The distinguishing feature of a
software-only solution is the controlling mechanism or job scheduler. As examples of
such systems, the DQS and GNQS cluster queuing systems can be considered, in which the
crucial component is the queue management module. Other examples include the
OSCAR and CONDOR grid software packages. For many grid approaches, the
middleware layer is the crucial layer [160]. In fact, for research purposes, the grid system
can be reduced to just the software-only layer. Therefore, most of the grid systems presented
in Table 2.6 are categorized as pure software solutions.
2.4.2. Hardware/Hybrid Only Solutions
The hardware/hybrid class of HPC systems usually refers to the multi-level cloud
systems. Because the clouds are used for business purposes, strict integration of the
service software application with the physical devices is needed. Moreover, the
intelligent software packages are specially designed and dedicated.
Table 2.6: Classification of Grid, Cloud and Cluster systems into Software and Hybrid/Hardware
approaches

Category | Software Only Systems | Hybrid and Hardware Only Systems
Cluster Systems | OpenMosix, Kerrighed, Gluster, Cluster-On-Demand, Enhanced MOSIX, Libra, Faucets, Nimrod/G, Tycoon, DQS, PVM, LoadLeveler, SLURM, PBS, LSF, GNQS | OpenSSI
Grid Systems | G-QoSM, 2K, Bond, Globus, Javelin, Legion, Netsolve, Nimrod/G, Ninja, PUNCH, MOL, AppLeS, Condor, Workflow Based Approach, Grid Harvest Service, Cactus Worm, Network Weather Service | GRACE, Ninf
Cloud Systems | OpenStack, Eucalyptus | Amazon EC2, Sun Grid, Google App Engine, GRIDS Lab Aneka, Microsoft Live Mesh, GENI, E-learning Ecosystem
2.5 Conclusion of the Chapter
The resource allocation mechanisms of distributed high performance computing systems
(cluster, grid, and cloud) were analyzed in this chapter. Firstly, the three categories of
distributed high performance computing were analyzed on the basis of the commonalities
between them. Secondly, the systems in each category were analyzed and compared based
on selected features for each category. Finally, the analyzed systems in each category were
classified into software and hybrid/hardware solutions, i.e., whether a particular system in a
category of high performance computing is a software-only, hardware-only, or hybrid
solution.
Chapter 3
Power Efficient Resource Allocation Using Least Feasible
Speed
3.1 Introduction
In the previous chapter, the author explored distributed HPC systems from the resource
allocation point of view. To do this, the author first extracted some common features from
the plethora of literature on distributed HPC systems and compared the distributed HPC
systems based on those features. Secondly, the author extracted common features for each
individual distributed HPC category and compared the systems of each category on those
features. Finally, the author classified all the distributed HPC systems into pure software-only
solutions or hardware/hybrid-only solutions. In order to incorporate power efficiency in
distributed HPC systems, the author turns to the other dimension of HPC, namely the
multi-core. While keeping distributed HPC systems in mind, the author selects the multi-core
end of HPC for the experimental evaluation of the proposed techniques, because the shorter
distance between cores makes cache coherency easier to maintain in multi-core systems than
in distributed HPC systems; signal degradation and data transfer times are also lower.
In this chapter, the author proposes a novel generic technique called LFS. The work
presented in this chapter identifies the implicit disadvantage associated with the existing
counterpart, i.e., the FFS approach. Furthermore, the author investigates properties and bounds
that enable the identification of a procedure which can further reduce the speed. The FFS
approach calculates the speed at the first feasible scheduling point, while the LFS approach
calculates the speed at all scheduling points and takes the scheduling point at which the task
is feasible and the speed is minimum. This chapter also presents a simple core load balancing
procedure, i.e., the lightest task shift procedure. The approach presented in this chapter can
fine-tune the system so that all the cores/processing units operate at the same clock rate and
have equally proportionate core utilization. The description of LFS is given below.
LFS: In this approach, firstly, all scheduling points for every task in a task set are
calculated. Secondly, the feasibility of every task is checked at all its corresponding
scheduling points. Thirdly, speeds for every task are calculated at all its
corresponding feasible scheduling points. Finally, the minimum speed amongst all the
calculated speeds at which the task remains schedulable is taken for that task.
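A minimal Python sketch of these four steps is given below (the function names and the tuple-based task representation are illustrative, not part of the thesis):

from math import ceil

def sched_points(tasks, i):
    # Step 1: S_i = { l * P_j : j = 1..i, l = 1..floor(P_i / P_j) }
    Pi = tasks[i][1]
    pts = set()
    for _, Pj in tasks[:i + 1]:
        pts.update(l * Pj for l in range(1, Pi // Pj + 1))
    return sorted(pts)

def lfs_speed(tasks, i):
    # Steps 2-4: at every scheduling point compute the speed that makes
    # task i (tasks sorted by period, i.e., RM priority order) just feasible,
    # and keep the minimum speed over the feasible points.
    best = None
    for t in sched_points(tasks, i):
        demand = tasks[i][0] + sum(ceil(t / P) * C for C, P in tasks[:i])
        speed = demand / t
        if speed <= 1.0 and (best is None or speed < best):
            best = speed
    return best   # None if the task is infeasible even at full speed

tasks = [(1.1, 3), (1.0, 5), (1.0, 10)]   # (C_i, P_i), highest priority first
print(round(lfs_speed(tasks, 2), 2))      # 0.7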
3.2 System Model and Background
In the periodic model of hard real-time systems, a task $\tau_i$ is described by: (i) a task period
$P_i$, the time between any two consecutive instances of $\tau_i$, (ii) a worst-case execution time
$C_i$ that is scalable with core speed, and (iii) a relative deadline $D_i$. A task $\tau_i$ must
receive $C_i$ units of CPU share by $D_i$. However, $C_i$ varies considerably at run time. All
the aforementioned parameters are integers. The task set $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$
consists of $n$ tasks and can be divided into subsets such that $\Gamma = \{\Gamma_1, \Gamma_2, \ldots, \Gamma_o\}$,
where $\Gamma_1 = \{\tau_1, \tau_2, \ldots, \tau_k\}$, $\Gamma_2 = \{\tau_{k+1}, \tau_{k+2}, \ldots, \tau_i\}$,
and so on. Moreover, a set of cores $\Omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$ $(m \leq n)$ is
available. The system speed $f_i$ lies within a predefined range $[0.1, 1.0]$, with a step size of $0.01$.
The unit of speed is taken as arbitrary (hertz, MIPS, or percentage (%)); in our case we measure
speed in terms of percentage (%). The task set size is measured in the number of tasks $n$, while
the step size is the incremented amount of a value: in the case of speed its unit is arbitrary, and in
the case of tasks the increment is one. The core utilization of an individual task $\tau_i$ is given as
$U_i(\tau_i) = C_i / P_i$ and the cumulative core utilization is denoted by
$U(\Gamma_i) = \sum_{i=1}^{k} C_i / P_i$. The problem that we are addressing here is to map $\Gamma$
over $\Omega$ under the fixed priority scheduling paradigm.
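To fix the notation in code, the following is a minimal sketch of the task model (Python; the `Task` record and helper names are illustrative, not part of the thesis):

from typing import NamedTuple, List

class Task(NamedTuple):
    C: float   # worst-case execution time (scalable with core speed)
    P: int     # period
    D: int     # relative deadline (constrained model: D <= P)

def utilization(t: Task) -> float:
    # U_i(tau_i) = C_i / P_i
    return t.C / t.P

def total_utilization(tasks: List[Task]) -> float:
    # cumulative core utilization of a task (sub)set
    return sum(utilization(t) for t in tasks)

tasks = [Task(1.1, 3, 3), Task(1.0, 5, 5), Task(1.0, 10, 10)]   # a small illustrative set
print(round(total_utilization(tasks), 3))   # 0.667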
Core utilization and energy consumption are measured in terms of percentage. The first
feasibility test for RM scheduling on a uniprocessor (also to be understood as a uni-core)
system was reported in [29] and is termed the LL-bound. The LL-bound states that a periodic
task system $\Gamma$ where $D_i = P_i$ is static priority feasible if

$$U(\Gamma) = \sum_{i} U_i \leq n\left(2^{1/n} - 1\right) \qquad (1)$$

where $n$ denotes the number of tasks in $\Gamma$. The term $n\left(2^{1/n} - 1\right)$ decreases
monotonically from $0.83$ (when $n = 2$) to $\ln(2)$ as $n \to \infty$. This result mandates that any
periodic task set of any size is static priority feasible on a preemptive uniprocessor under RM
scheduling if $U(\Gamma)$ is not greater than $0.693$. This result gives a simple $O(n)$
procedure to test task feasibility when tasks arrive at run time. However, the
abovementioned is only a sufficient condition. Therefore, it is quite possible for an
implicit-deadline synchronous periodic task system that exceeds the LL-bound to be
static-priority feasible. The LL-bound for the RM paradigm is quite pessimistic.
Therefore, it has been shown that for the average case [49]:

$$\sum_{i} U_i \leq 0.88 \qquad (2)$$
A better utilization-based test, termed the HB, was detailed in [182]. Using the HB test, a
periodic task set is deemed schedulable if

$$\prod_{i=1}^{n} \left(U_i + 1\right) \leq 2 \qquad (3)$$
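As a quick illustration (a sketch with hypothetical helper names, not thesis code), the following fragment evaluates the LL-bound of Eq. (1) and the HB test of Eq. (3) for a sample task set:

def ll_bound_ok(utils):
    # Eq. (1): sum of U_i <= n * (2^(1/n) - 1); sufficient only
    n = len(utils)
    return sum(utils) <= n * (2 ** (1 / n) - 1)

def hb_ok(utils):
    # Eq. (3): product of (U_i + 1) <= 2; sufficient only
    prod = 1.0
    for u in utils:
        prod *= u + 1.0
    return prod <= 2.0

utils = [0.25, 0.20, 0.15]                # U_i = C_i / P_i for a sample task set
print(ll_bound_ok(utils), hb_ok(utils))   # True True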
The classic work reported in [29] was later extended by modifying the task parameters in
[43]. However, all the aforementioned tests cover only the sufficient condition (SC) and trade
utilization for performance.
One possible solution to the aforesaid problem is to first distribute a given workload equally
among all the cores and then to test feasibility using the RM bounds, such as the LL-bound
[29] or the H-bound [182]. However, these bounds provide only sufficient conditions, and a
thick share of the core utilization is compromised for schedulability. To the best of our
knowledge, this work is the first to: (i) derive the exact RM scheduling conditions for a
multi-core system and (ii) determine a uniform lowest possible system speed for a given
workload that maintains system feasibility. Symmetric performance among cores is only
possible when all the cores operate at the same low speed. The disadvantage associated with
a higher core frequency is that of leakage power, i.e., a higher clock frequency increases the
system power leakage. Therefore, cores must operate at the same minimum possible
frequency for the following two reasons: (i) to avoid power leakage and (ii) to conserve
energy by allowing cores to execute tasks at a constant speed.
In our proposed model, we assume that a processor has 10 major operational levels, as
detailed in Table 3.1. Let $\bar{f}_i$ denote a speed level and let $\langle f_i \rangle$ be the
corresponding range, as per our processor specifications. If a particular speed $f_i^{0}$ is
unavailable within the range $\langle f_i \rangle$ (minor levels), then the next (higher) nearest
value from the range $\langle f_i \rangle$ is assigned to $f_i^{0}$. We must note that $\bar{f}_i$
is the highest possible speed within a level. Therefore, any task $\tau_i$ that is schedulable with
any speed in $\langle f_i \rangle$ is also schedulable with $\bar{f}_i$ (for any $i$, $f_i \leq f_{max}$).
However, the converse may not hold. Initially, we assume $\Gamma$ to be scheduled on a single
core. For our model to be as close as possible to real-world scenarios, we opt for a constrained
task model $D_i \leq P_i$. Let time $t = 0$ be the critical instant; the cumulative workload of a
task $\tau_i$ at any instance of time $t$, running at speed $f_i$ ($f_i \leq \bar{f}_i \leq 1$), is
represented by Eq. (4) below.
Table 3.1: Operational levels and the respective speed ranges.

Level $i$ | $\bar{f}_i$ | $\langle f_i \rangle$ (respective subranges/minor levels for speed $f_i$)
0 | 0.1 | 0.01, 0.02, ..., 0.09, 0.10
1 | 0.2 | 0.11, 0.12, ..., 0.19, 0.20
2 | 0.3 | 0.21, 0.22, ..., 0.29, 0.30
3 | 0.4 | 0.31, 0.32, ..., 0.39, 0.40
4 | 0.5 | 0.41, 0.42, ..., 0.49, 0.50
5 | 0.6 | 0.51, 0.52, ..., 0.59, 0.60
6 | 0.7 | 0.61, 0.62, ..., 0.69, 0.70
7 | 0.8 | 0.71, 0.72, ..., 0.79, 0.80
8 | 0.9 | 0.81, 0.82, ..., 0.89, 0.90
9 | 1.0 | 0.91, 0.92, ..., 0.99, 1.00
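As an illustration of this level-assignment rule (a sketch; the helper names are hypothetical and not from the thesis), the following fragment rounds a computed speed up to the next available minor level on the 0.01 grid of Table 3.1 and reports the top speed $\bar{f}_i$ of the enclosing major level:

import math

def snap_to_minor(f0):
    # next (higher) nearest value on the 0.01 grid (minor levels)
    return math.ceil(round(f0 * 100, 6)) / 100

def level_top(f0):
    # f-bar_i: the highest speed within the major level containing f0
    return math.ceil(round(f0 * 10, 6)) / 10

print(snap_to_minor(0.684), level_top(0.684))   # 0.69 0.7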
$$L_i(t) = \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{f_i} \qquad (4)$$
The classic work reported in [49] details a solution whereby a task $\tau_i$ is always feasible on
a generic core $\omega_i$ if and only if

$$\min_{t \in S_i} \frac{L_i(t)}{t} \leq 1 \qquad (5)$$

where $t$ is a scheduling point and $S_i$ denotes the set of all the scheduling points, constituted
by $S_i = \left\{ l \cdot P_j \;\middle|\; j = 1, \ldots, i;\; l = 1, \ldots, \left\lfloor \frac{P_i}{P_j} \right\rfloor \right\}$.
The whole task set becomes RM feasible when

$$\max_{i = 1, \ldots, k} \; \min_{t \in S_i} \frac{L_i(t)}{t} \leq 1 \qquad (6)$$
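For instance (an illustrative pair of tasks, not taken from the thesis), for two tasks with $P_1 = 3$ and $P_2 = 8$, the definition yields $S_2 = \{3, 6, 8\}$: the multiples $l P_1$ for $l = 1, \ldots, \lfloor 8/3 \rfloor = 2$, together with $P_2$ itself.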
3.3 Lowest Speed Calculations
In this section and in Section 3.4, we address the problem of scheduling hard-deadline
periodic tasks in a multi-core environment. Section 3.3 details the lowest-speed analysis for
uniprocessor systems, which is extended to encompass the multi-core counterpart in
Section 3.4.
E. Humenay et al. [20] report that the performance of the cores is asymmetric. Therefore,
tasks cannot be assigned to the cores with the implicit assumption that all the cores are
operating at the maximum clock frequency. Moreover, heat dissipation increases when
processors operate at higher clock rates. Because of the abovementioned issues pertaining
to higher clock rates, we must first determine the appropriate core performance; once that
is known, a uniform system speed can be calculated by distributing the workload among
the cores based on some schedulability test. It has been reported in [34] that the
bin-packing technique allows only half of the core utilization and trades utilization at the
cost of performance. To overcome the aforesaid gap of 50%, we derive and utilize the
necessary and sufficient condition. For our analysis, and for simplicity, we assume that
initially the system is a single-core entity. Once the average core speed is determined, we
relax the abovementioned assumption to accommodate multiple cores.
A task $\tau_i$ is schedulable on a generic core $\omega_i$ if and only if Eq. (5) holds true.
However, it is possible that the task may also be schedulable at a lower core speed. Therefore,
we add the speed component into the schedulability analysis to determine the required task
speed, which can be represented by

$$\frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{t} \leq f_i, \quad t \in S_i \qquad (7)$$
Any value of $t \in S_i$ that satisfies Eq. (7) ensures that $\tau_i$ is schedulable with speed
$f_i$. However, for different values of $t$, there can be a set of respective speed levels
guaranteeing the schedulability of $\tau_i$.
Ref. [46] reports a methodology, termed FFS, that for a given workload returns the speed
determined at the first feasible point in the scheduling point set. Therefore, as soon as the
schedulability is confirmed at the first true scheduling point (the earliest time at which
$\tau_i$ is schedulable), the value of $f_i$ is determined. From the aforementioned discussion,
an interesting observation can be made, which we state below.
Observation 1. The set of scheduling points $S_i$ for task $\tau_i$ is always in non-decreasing
order, and the first value of $t \in S_i$ that satisfies Eq. (7) does not guarantee the lowest
required system speed.
To further elaborate on Observation 1, we highlight the point with the help of an example
task set given below.
Example 1. Given three tasks $\tau_1 = (1.1, 3)$, $\tau_2 = (1, 5)$, $\tau_3 = (1, 10)$, where each
task $\tau_i$ is represented by its parameters $C_i$ and $P_i$ as an ordered pair $(C_i, P_i)$,
determine the lowest core speed to schedule the lowest priority task $\tau_3$, in addition to the
higher priority tasks $\tau_1$ and $\tau_2$.

According to the RM scheduling theory, task $\tau_3$ is schedulable if and only if it satisfies
Eq. (7). Task $\tau_3$ has the set of scheduling points $S_3 = \{3, 5, 6, 9, 10\}$.

List 1. Task $\tau_3$ is RM-schedulable if and only if at least one of the following holds:

$$C_1 + C_2 + C_3 \leq 3$$
$$2C_1 + C_2 + C_3 \leq 5$$
$$2C_1 + 2C_2 + C_3 \leq 6$$
$$3C_1 + 2C_2 + C_3 \leq 9$$
$$4C_1 + 2C_2 + C_3 \leq 10$$
It can be observed that, in the presence of the workload due to $\tau_1$ and $\tau_2$, task
$\tau_3$ is also schedulable at points $5$, $6$, $9$, and $10$. The speeds required at the
respective points are $0.84$, $0.86$, $0.70$, and $0.74$. The lowest speed is $0.70$, achieved
at the scheduling point $9$, which is the fourth element in the set $S_3$. Therefore, the first
element does not always guarantee the lowest system speed. From the aforementioned
discussion, we can conclude that all the values of $t \in S_i$ need to be tested to find the
lowest core speed for task $\tau_i$. That is, $f_i$ may be obtained by the following equation:

$$f_i = \min_{t \in S_i} \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{t} \qquad (8)$$
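To make the computation in Eq. (8) concrete, at the scheduling point $t = 9$ of Example 1 the required speed evaluates to

$$\frac{C_3 + \left\lceil \tfrac{9}{P_1} \right\rceil C_1 + \left\lceil \tfrac{9}{P_2} \right\rceil C_2}{9} = \frac{1 + 3(1.1) + 2(1)}{9} = \frac{6.3}{9} = 0.70,$$

which is the minimum among the feasible points of $S_3$ and hence the speed returned by LFS.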
Figure 3.1: Gantt chart for $\tau_1 = (1.30, 3)$, $\tau_2 = (1.19, 5)$, and $\tau_3 = (1.19, 10)$.

Figure 3.2: Gantt chart for $\tau_1 = (1.57, 3)$, $\tau_2 = (1.42, 5)$, and $\tau_3 = (1.42, 10)$.
Figures 3.1 and 3.2 depict the Gantt charts for the task set given in Example 1. The charts
are drawn for the task set at the speeds of $0.84$ and $0.70$, respectively. The values for
$C_i$ are rounded off to two decimal points to avoid cumbersome Gantt charts. We must note
that, irrespective of the representation of decimal fractions, Eq. (8) always results in the
exact same analysis and always respects the timing constraints of the task set. In both
cases, the entire task set is schedulable at the lower speeds. The task set, when executed at
the speed of $0.84$, becomes $\tau_1 = (1.30, 3)$, $\tau_2 = (1.19, 5)$, $\tau_3 = (1.19, 10)$,
as reflected in Figure 3.1. It can be observed from Figure 3.1 that, after scheduling all the
jobs of the tasks, there are still $1.53$ time units unused (the slots $[4.98, 5]$ and $[7.49, 9]$),
and these slots can be further utilized for lowering the system speed. Similarly, Figure 3.2
reflects the Gantt chart for the modified task set when executed at the lower speed of $0.70$;
the original task set (given in Example 1) is transformed into $\tau_1 = (1.57, 3)$,
$\tau_2 = (1.42, 5)$, $\tau_3 = (1.42, 10)$. In contrast to Figure 3.1, there are only $0.03$
unused time units ($[8.97, 9]$) in Figure 3.2, which is a clear advantage and results in
maximum system utilization.
3.4 Experimental Analysis
This section is devoted to the experimental analysis; the results can be read directly from
the figures discussed below.
3.4.1 Determining the Lowest Speed
In this section, we evaluate the performance of our proposed technique, LFS, by
comparing it with the previously mentioned FFS methodology. Both the abovementioned
methodologies are compared from the perspective of system speed. The lower the speed,
the better is the technique.
To compare both techniques, random task sets of sizes within the range [5, 50] were
generated, with a step size of 1. The plots reported in this chapter are the average values
of 300 runs for all the task set sizes 5 through 50. The task periods were randomly generated
from a uniformly distributed range [100, 10000]. To obtain the corresponding task
execution demands $C_i$ for $\tau_i$, random values were taken from within the range
$[1, P_i]$, also with uniform distribution. The priorities were assigned to the tasks as per the
RM scheduling rules. That is, the smaller the task period, the higher the task priority. To
have a feasible RM schedulable task set, initially we keep the system utilization at $0.69$
($\approx \ln 2$), which is quite low. This low system utilization ensures that all the tasks
within a given task set are RM feasible. Otherwise, it is very likely that some of the tasks
may not be RM feasible when the system utilization is kept high; moreover, this would also
make for an unfair comparison.
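A minimal sketch of this task-set generation (Python; the final scaling of the execution demands to hit the target utilization is an assumption, as the thesis does not spell out the normalization step):

import random

def gen_taskset(n, target_u=0.69, seed=None):
    rng = random.Random(seed)
    tasks = []
    for _ in range(n):
        P = rng.randint(100, 10_000)      # periods ~ uniform over [100, 10000]
        C = rng.uniform(1, P)             # raw execution demand ~ uniform over [1, P]
        tasks.append((C, P))
    u = sum(C / P for C, P in tasks)
    tasks = [(C * target_u / u, P) for C, P in tasks]   # scale to target utilization
    tasks.sort(key=lambda t: t[1])        # RM: smaller period = higher priority
    return tasks

print(round(sum(C / P for C, P in gen_taskset(5, seed=1)), 2))   # 0.69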
The author uses MATLAB as the simulation tool for the results reported in Figure 3.3. In
Figure 3.3, the FFS required speed is used as a benchmark for comparison, with varying
utilization of the computing unit, i.e., the core. Figure 3.3 depicts the advantage of the LFS
methodology over the FFS technique [46]. It can be observed that the LFS approach
continues until it finds the minimum possible speed for a given task set, while maintaining
task set schedulability. In contrast, the FFS procedure stops searching the scheduling points
as soon as it finds the first feasible point. The difference between the two techniques is quite
large. For instance, for the task set having only 5 tasks, the system speed required by the LFS
methodology is much lower than that required by the FFS procedure.
To further illustrate the effectiveness of the proposed methodology, we report several
simulation results with different system utilizations. Figure 3.3(a) shows that, with an RM
schedulable task set, the LFS procedure is always able to execute all the tasks at a lower
system speed; this includes the task set with 50 tasks. As shown in Figure 3.3(d), a higher
system speed is required for larger task sets. This is an understandable phenomenon: when
the workload increases, more computational cycles are needed to complete all the tasks by
their respective deadlines. From the plots, we can also see that this behavior is exhibited by
both techniques, as expected.
Although the FFS technique is based on the necessary and sufficient conditions of the
RM scheduling theory, system feasibility is a must, and it is maintained with the FFS
approach. However, when the task set size increases, the FFS procedure allows the
system speed to grow very rapidly to accommodate the resource requirements and maintain
the deadline constraints. On the other hand, our proposed methodology gradually increases
the system speed in accordance with the principal objective, which is to conserve as much
energy as possible by allowing the system to operate at the slowest clock rate while keeping
the task deadline constraints intact. Figures 3.3(a) through 3.3(d) reveal the performance of
both techniques with the system utilization kept at 70%, 80%, 90%, and 100%. It can be
observed that when the system utilization increases, the task computational demands also
increase and both techniques need more system speed to accommodate the presented
workload. Therefore, the system operates at a higher clock rate.
Figure 3.3: Effect of utilization on system speed; panels (a) $\sum U_i = 0.69$, (b) $\sum U_i = 0.8$, (c) $\sum U_i = 0.9$, (d) $\sum U_i = 1.0$ (x-axis: task set size $n$; y-axis: required system speed (%)).
The scheduling schemes studied here focus on distributed processing systems, particularly
clusters, and can also be implemented on multi-core systems, as the author is considering.
The proposed solution can be extended to scheduling across grids and clouds with variable
delays between computing units, with some implications and open challenges, such as:
(i) mechanisms for handling heterogeneity, multiple administrative domains, and user
privileges; (ii) a selection mechanism for the centralized server, amongst the local servers,
that runs a centralized dispatcher; (iii) maintenance mechanisms for the various queues on
the dispatcher, such as the request queue and the server record queue; and (iv) mechanisms
for meeting the QoS requirements of customers and strategies for the fulfillment of SLAs.
3.4.2 Energy Savings
As indicated in the introductory passage, DVS is a promising technique for lowering
the power consumption of CMOS circuitry. Before presenting our experimental analysis
using the DVS technique, we establish the necessary formulations for the DVS
methodology from the previous literature [22, 23, 24, 35].
The average power dissipation $P_{avg}$ of modern processors is composed of four parts:

$$P_{avg} = P_{leak} + P_{cap} + P_{std\text{-}by} + P_{short} \qquad (9)$$

where $P_{leak}$, $P_{cap}$, $P_{std\text{-}by}$, and $P_{short}$ denote the leakage, capacitive,
standby, and short-circuit power, respectively. The most critical component of Eq. (9) is the
term $P_{cap}$; therefore, we can ignore the rest of the terms, as in Ref. [24]. Being the
dominating term, $P_{cap}$ can be expressed as:

$$P_{cap} = \alpha\, C_L\, V_{dd}^{2}\, f \qquad (10)$$

where $\alpha$ represents a transition-activity-dependent parameter, $C_L$ the switched
capacitance, and $V_{dd}$ the supply voltage. Eq. (10) indicates the quadratic dependence
on $V_{dd}$ and the linear dependence on $f$. It can be concluded that lowering the supply
voltage is the most effective factor in lowering the dynamic power consumption. However,
lowering $V_{dd}$ increases the circuit delay, which may be represented by the following:

$$T_{delay} = k\, \frac{V_{dd}}{\left(V_{dd} - V_{th}\right)^{\gamma}} \qquad (11)$$

where $k$ is a constant specific to a given technology that depends on the gate size and
capacitance, $V_{th}$ is the threshold voltage (the minimum required voltage), and $\gamma$
is the velocity saturation index of a CMOS circuit, within the range $1 \leq \gamma \leq 2$.
Because $f$ and $T_{delay}$ are inversely related, we can say that

$$f = k'\, \frac{\left(V_{dd} - V_{th}\right)^{\gamma}}{V_{dd}} \qquad (12)$$

Eq. (12) reflects that $f$ is almost linearly related to the supply voltage. That is, the processor
speed is a direct consequence of the supplied voltage [35]. Therefore, by assuming that
$P_{avg} \approx P_{cap}$, Eq. (10) can be rewritten as

$$P_{avg} = \alpha\, C_L\, V_{dd}^{2}\, f \qquad (13)$$

It can also be observed that $P_{avg}$ is an increasing function of $f$. Let $E$ be the energy
consumed while running a task with an average power $P_{avg}$ at the processor speed $f$
for $T$ time units. This relationship can be represented mathematically by the following
equation:

$$E = P_{avg}(f) \cdot T \qquad (14)$$
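To make Eqs. (13) and (14) concrete, the following fragment (Python; `alpha_cl` folds $\alpha C_L$ into a single illustrative constant, and the numbers are hypothetical) shows why lowering $V_{dd}$ and $f$ together saves energy for a fixed cycle budget:

def avg_power(v_dd, f, alpha_cl=1.0):
    # Eq. (13): P_avg = alpha * C_L * V_dd^2 * f
    return alpha_cl * v_dd ** 2 * f

def energy(v_dd, f, cycles):
    # Eq. (14): E = P_avg(f) * T, with T = cycles / f for a fixed workload
    return avg_power(v_dd, f) * (cycles / f)

# Halving both V_dd and f doubles T but still cuts the energy by ~4x:
print(energy(1.0, 1.0, 1e6) / energy(0.5, 0.5, 1e6))   # 4.0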
Figure 3.4: Power consumption of the Crusoe processor at the respective voltage levels [19] (x-axis: voltage (V); y-axis: normalized power (W)).
From the aforementioned discussion, we can deduce that an ideal processor would be one
that can operate on continuous voltage levels. However, due to the switching
overhead, a continuous voltage spectrum is not provided for a CMOS circuit [183].
Therefore, only a discrete number of supply voltage levels are provided, which can be
controlled with a DVS technique [184]. In our work, we assume a processor that can
support multiple discrete frequency levels within the range [0.1, 1.0], with a step of
0.01, where 0.1 is the minimum speed needed to keep the peripherals and interrupts
powered and active. For our study, we operate within the bounds reported in [166, 19] for
a 70 nm Crusoe processor. The bounds and the discrete voltage levels are plotted for the
readers' convenience in Figure 3.4, which illustrates the relationship between the power
consumption and the supplied voltage per cycle.
To evaluate the two methodologies, namely LFS-energy and FFS-energy, from the point of
view of energy savings, we simulate the system under the same arrangement as previously
discussed in Section 3.4.1. For this set of simulations, the speed of the system was based on
Eq. (7) for the FFS-energy technique and on Eq. (8) for the LFS-energy methodology. The
workload of the system was the task set within the range [5, 50], with an increase of a single
task after every 1000th iteration. The author uses MATLAB as the simulation tool for the
results presented in Figure 3.5, where the FFS normalized energy consumption is used as a
benchmark for comparison.

The voltage and the clock rate required for a successful completion of all the tasks within
the task set are determined by the FFS-energy and LFS-energy techniques. The total energy
consumed within the interval $[T_0, T_1]$ is measured by

$$E = \int_{T_0}^{T_1} P_{avg}(t, f_i)\, dt$$

where $P_{avg}(t, f_i)$ is the power consumption of the core when executing a task $\tau_i$
at the speed $f_i$ for $t$ time units. The system utilization is again kept within the range
$[\ln 2, 1.0]$, with a step size of 0.1. That is, the energy values are measured for the task set
after a 10% increase in the system utilization. It can be observed from Figure 3.5 that when
all the tasks are schedulable, the savings in energy consumption of both techniques are very
encouraging, up to a certain level. The reason behind this low energy consumption is the
likelihood of the RM feasibility of all the tasks due to the low system utilization. Therefore,
the computational demands of the individual tasks are much lower than their respective
periods.
The only difference is that with the FFS-energy approach, the plot trend remains higher
as the task set grows. This is due to the possibility that some of the task sets contain tasks
that must be run at a higher speed. The system energy consumption therefore increases, as
the power function grows quadratically with the system speed.
Figure 3.5: Normalized energy consumptions for the task set under varying utilizations; panels (a) $\sum U_i = 0.69$, (b) $\sum U_i = 0.8$, (c) $\sum U_i = 0.9$, (d) $\sum U_i = 1.0$ (x-axis: task set size $n$; y-axis: normalized energy (%)).
Our proposed LFS-energy technique projects a lower system speed compared with the
FFS-energy approach [46]. This is due to its implicit characteristic of continuously searching
for the lowest feasible speed out of all the possible speeds, which is never higher than that
obtained through Eq. (7). With increased system utilization, the computational demands of
individual tasks increase; therefore, the workload within the allowable time window also
increases. It can also be observed from Figure 3.5 that the LFS-energy approach also
increases the system speed with the increase in utilization. This is because the scheduler
must respect all the task deadlines. However, the FFS-energy approach is a very reactive
procedure, as the first feasible scheduling point might demand a high system speed;
therefore, the cores run at a higher clock rate. In contrast, the LFS-energy approach
determines the point at which the minimum task speed is obtained; therefore, the speed
required for the same task set is much lower. However, the lower system speed identified by
Eq. (8) also means that the computation demands of the tasks are now prolonged.
Figure 3.6: LFS and FFS comparison based on required execution time (x-axis: task set size $n$; y-axis: required execution time (ms); curves: LFS-Time and FFS-Time).
Figure 3.6 compares the FFS and LFS approaches based on the required execution time as
the task set size increases. It is clear from the results in Figure 3.6 that the existing
technique, FFS [46], outperforms LFS in required execution time. This is because the FFS
approach executes tasks at the first feasible schedulable point and calculates the speed at
that point, while LFS checks all the feasible points and takes the point at which the speed is
minimum. This extra time taken by the LFS approach must be kept in mind; in this research
work, however, the author focuses more on power (energy) than on required execution time
or response time. Care must therefore be taken when applying the LFS approach to hard
real-time tasks. The LFS approach can easily be applied to soft real-time and firm real-time
tasks, where missing a deadline is not as hazardous as in hard real-time tasks. The effect of
lowering the speed on task execution times is illustrated through the example already
presented in this chapter, on which Figures 3.1 and 3.2 are based. The task deadlines are
also kept intact through the LL-bound and H-bound presented earlier in this chapter. The
author uses MATLAB as the simulation tool for the results obtained in Figure 3.6, where
the FFS required execution time is used as a benchmark for comparison. Based on the above
discussion, the time complexity of FFS is $O(m \log n)$, where $m$ accounts for the time
taken to calculate the scheduling points and $\log n$ for the time taken to find the first
feasible point at which the speed is calculated. The time complexity of LFS is $O(mn)$,
where $m$ accounts for the time taken to calculate the scheduling points and $n$ for the
time taken to calculate the speed at every feasible point; the minimum among the calculated
speeds is then selected. Since $f \propto 1/t$, and our concern is lowering the speed
(frequency), and hence the power (energy), the relationship between FFS and LFS may be
written as $f(LFS) = O(g(FFS))$. That is, if speed is taken as the comparison parameter,
FFS is an upper bound for LFS, i.e., $0 \leq f(LFS) \leq C \cdot g(FFS)$; if time is the
comparison parameter, these terms occur in reverse order. With speed as the comparison
parameter, the above relationship can also be written as $f(FFS) = \Omega(g(LFS))$, meaning
that LFS is a lower bound for FFS, i.e., $0 \leq C \cdot g(LFS) \leq f(FFS)$.
3.5 Task Partitioning in Multi-core Systems
To reduce the number of scheduling points at which the schedulability of a task is tested,
the authors in [185] introduced the concept of a false point.
Definition 1. Under fixed priority scheduling, a point $t$ is termed a false point for a
generic task $\tau_i$ if and only if it satisfies the inequality constraint $L_i(t) > t$.
The concept of the false point is plausible; however, it is inapplicable to DVS-enabled cores
for the following reason.

Theorem 1. Under fixed priority scheduling and multiple system speed levels, a false
point $t$ for $\tau_i$ is not necessarily a false point for the lower priority tasks
$\tau_{i+1}, \ldots, \tau_n$.
Proof. The proof is presented for 1i that can easily be extended to the case of 2 ,...i n .
As mentioned in Section 3.1, a scheduling point $t$ for $\tau_i$ is also a scheduling point for $\tau_{i+1}$. Let $t'$ be such a point that is present within the set $S_i$ and all the subsequent sets $S_{i+1}, \ldots, S_n$. If $t'$ is a false point for $\tau_i$ that is executing at the speed $f_i$, then

$$L_i(t') = \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t'}{P_j} \right\rceil C_j}{f_i} > t' \qquad (15)$$

Similarly, the workload for $\tau_{i+1}$ at the same point $t'$, running at the speed $f'_i$, can be expressed as

$$L_{i+1}(t') = \frac{C_{i+1} + \sum_{j=1}^{i} \left\lceil \frac{t'}{P_j} \right\rceil C_j}{f'_i} \qquad (16)$$
When $f'_i = f_i$, the false point for $\tau_i$ also remains a false point for $\tau_{i+1}$. However, if $f'_i > f_i$, then assuming $t'$ is still a false point for $\tau_{i+1}$ requires

$$\frac{C_{i+1} + \sum_{j=1}^{i} \left\lceil \frac{t'}{P_j} \right\rceil C_j}{f'_i} > \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t'}{P_j} \right\rceil C_j}{f_i} \qquad (17)$$

which is a contradiction because: (a) $f'_i > f_i$ and (b) the workload due to $\tau_{i+1}$ at $t'$ is lowered due to the higher value of $f'_i$. Therefore,

$$\frac{C_{i+1} + \sum_{j=1}^{i} \left\lceil \frac{t'}{P_j} \right\rceil C_j}{f'_i} \not> t' \qquad (18)$$

which contradicts that $t'$ is a false point for $\tau_{i+1}$.
First, determine the speed for the execution of each task $\tau_i$. Then calculate the core-specific minimum required speed so that all the tasks remain schedulable on the core $\theta_i$ with speed $f_{\theta_i}$. This relationship can be captured by the following expression:

$$f_{\theta_i} = \max_{0 < i \le n} \left( \min f_i \right) \qquad (19)$$

Eq. (19) ensures that the core operates at the appropriate speed to execute all the tasks $\tau_1, \ldots, \tau_k \in \Gamma_i$ successfully. Next, we must find the average system speed (a uniform speed for all the cores) to execute the tasks on the cores. As previously mentioned in Section 3.1, the aforementioned result (Eq. (19)) is applicable only to a single core system. Therefore, we must relax some of the assumptions of the multi-core model. The same technique as discussed in Section 3.1 can also be applied to the whole task set, and the tasks can be mapped onto cores, as we detail in the subsequent text.
Let the tuple $\langle \Gamma_i, \theta_i, f_{\theta_i} \rangle$ represent the task set $\Gamma_i$ assigned to core $\theta_i$ running at the speed $f_{\theta_i}$. Because it is preferred to execute all the cores at a uniform speed, the average system speed must be calculated. In other words,

$$\bar{f} = \frac{\sum_{j=1}^{m} f_{\theta_j}}{m} \qquad (20)$$
To achieve load balancing among the cores, we adopt a task shifting strategy that migrates a task from an un-schedulable core to a core with the smallest workload.

Theorem 2. If a task $\tau_i$ is shifted from an unschedulable core $\theta_i$ to another core $\theta_j$ (wherein both cores run at the same speed), then the schedulability of $\Gamma_i$ on $\theta_i$ increases by a factor of $C_i$.
Proof. If $\tau_i$ is un-schedulable on $\theta_i$ at every $t \in S_i = \left\{ l P_j \;\middle|\; j = 1, \ldots, i; \; l = 1, \ldots, \left\lfloor \frac{P_i}{P_j} \right\rfloor \right\}$, then

$$C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j > f_{\theta_i} \, t$$

$$\sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j > f_{\theta_i} \, t - C_i$$

The aforementioned is true because $f_{\theta_i} t - C_i < f_{\theta_i} t$; that is, $C_i$ time units of demand are removed from core $\theta_i$, assuming that both cores run at the same speed $f_{\theta_i}$.
Theorem 3. If all the cores run at the same speed, then adding a task $\tau_i$ to $\theta_i$ weakens the schedulability of $\theta_i$ by $C_i$.

Proof. Follows from Theorem 2.

Theorem 4. If all the cores run at the same speed, then no task can be added to a barely schedulable core.

Proof. Let $\tau_i$ be a task that, in addition to the already schedulable $i-1$ tasks, is schedulable on a barely schedulable core $\theta_i$. The term barely schedulable refers to a core $\theta_i$ on which the available slack at every scheduling point is smaller than $C_i$, i.e.,

$$\sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j \le f_{\theta_i} \, t$$

By adding $\tau_i$ to the aforementioned, schedulability would require

$$C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j \le f_{\theta_i} \, t$$

Because the remaining slack $f_{\theta_i} t - \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j$ is smaller than $C_i$, we get

$$C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j > f_{\theta_i} \, t \qquad (21)$$

which shows that the available slot is too small to accommodate the task $\tau_i$. This contradicts the assumption that $\tau_i$ is schedulable on the barely schedulable core $\theta_i$.
There are two possible cases to balance the load among all the cores of the underlying system, which we detail below.

Case 1. $f_{\theta_i} > \bar{f}$: Because all the computation times are proportionate to the core speed, a lower speed core prolongs the task computations. Therefore, the task set $\Gamma_i$ that was previously feasible at the speed $f_{\theta_i}$ on core $\theta_i$ now becomes infeasible at the speed $\bar{f}$. Care must be taken when migrating tasks from $\theta_i$ to another core $\theta_j$. We must find the most underutilized core among all the cores operating at the speed $\bar{f}$ that also offers space to accommodate more tasks. Once that core is identified, the task shifting (or migration) process begins. Let core $\theta_i$ be the task donor and core $\theta_j$ be the task acceptor. In our system, if a task $\tau_l \in \Gamma_i$ is to be shifted to $\theta_j$, then the task with the lowest utilization on $\theta_i$ is chosen as the candidate for shifting, i.e., $U_l \le U_q$ for all $q = 1, \ldots, k, \; q \ne l$. The process continues until the utilization of all the cores is leveled. This arrangement guarantees that all the task deadlines are met and that all the cores run at the same speed.
Case 2. $f_{\theta_i} < \bar{f}$: In this case, the task subset $\Gamma_i$ is feasible on $\theta_i$ and the core might be underutilized at the speed $\bar{f}$. As mentioned above, $\theta_i$ can accommodate more tasks that are currently assigned to other cores within the system.

Once the core utilization is balanced among all the cores under RM scheduling, a uniform speed $\bar{f}$ is recalculated. This uniform speed mandates that all the tasks are schedulable and allows the system to operate at the lowest possible speed. Therefore, the overall system power consumption is also reduced. In other words, it is the core utilization that decides the system speed and not the number of tasks. This is due to the fact that there may exist a core $\theta_i$ that has a higher number of tasks than those assigned to a core $\theta_j$, while $U_j > U_i$; in that case $\theta_j$ must operate at a higher speed than $\theta_i$.
3.6 Task Mapping on Cores
Applying the LFS strategy discussed earlier in this chapter does not ensure that the load on all computing units (cores or systems) will be the same. In order to equally utilize all the computing units, load balancing is required. In this section, we generate the task set with the same procedure as previously described in Sections 3.3.1 and 3.3.2. Initially, we start with a task set of size 120 to observe its mapping onto 8 cores. We plot these results in Figure 3.7, which are categorized into two domains: (i) Figure 3.7(a) and (b) depict the results before applying the proposed strategy and (ii) Figure 3.7(c) and (d) show the results after applying the proposed technique. The author uses MATLAB as the simulation tool for the results reported in Figure 3.7 and Figure 3.8. In Figure 3.7 and Figure 3.8, the author investigates the task mapping to the computing units and the utilization of each computing unit before and after applying the lightest task migration strategy; no benchmark is used.
[Four-panel figure: (a) utilization of cores (%) before balancing; (b) tasks mapping to cores (n) before task shifting; (c) utilization of cores (%) after balancing; (d) tasks mapping to cores (n) after balancing.]
Figure 3.7: Load distribution on system with 8 cores.
Figure 3.7(a) reflects the case where a task set of size 120 is distributed over 8 cores and some cores are more heavily utilized than others. For instance, the utilization of core $\theta_7$ is the highest among all the 8 cores, while $\theta_8$ has the lowest utilization. Similarly, Figure 3.7(b) shows the corresponding number of tasks, where core $\theta_2$ has the highest number of tasks, while core $\theta_8$ has the lowest. The load is balanced on the basis of core utilization; therefore, some of the tasks are shifted from core $\theta_7$ to the other cores. Because we have used the necessary and sufficient conditions in our work, the tasks are assigned to the cores based on the exact feasibility analysis. That is, when a core, say $\theta_1$, is assigned a certain number of tasks, the rest of the tasks are mapped onto the next core $\theta_2$. The same is observed from Figure 3.7, where the cores $\theta_1$ through $\theta_7$ are fully utilized while core $\theta_8$ remains underutilized, because fewer tasks are left for core $\theta_8$. It can also be deduced from Figure 3.7(a) and (b) that $\theta_3$ has a utilization of 80%, while the total number of tasks assigned to it is only 12 (the 2nd lowest after core $\theta_8$ in the system, see Figure 3.7(b)). After applying our proposed technique, the results are plotted in Figure 3.7(c) and (d). The core utilization is almost the same; however, the number of tasks assigned to the cores is not uniform (see Figure 3.7(d)).
[Four-panel figure: (a) utilization of cores (%) before balancing; (b) tasks mapping to cores (n) before task shifting; (c) utilization of cores (%) after balancing; (d) tasks mapping to cores (n) after balancing.]
Figure 3.8: Load distribution on system with 12 cores.
We further increase the number of cores to 12 and distribute the workload of a task set of size 190. The corresponding results are shown in Figure 3.8. Although we have applied a heavy system load, it can be seen from Figure 3.7(a) and (c) and Figure 3.8(a) and (c) that the utilization of a core never reaches 100%, which is due to the implicit characteristics of the RM scheduling algorithm. Figure 3.8(a) and (b) report the utilization and the task mapping of the cores before load balancing, while Figure 3.8(c) and (d) plot the results after applying the task shifting technique. From Figure 3.8(a), we can deduce that core $\theta_9$ and core $\theta_{10}$ are heavily utilized, while Figure 3.8(b) reports that core $\theta_{10}$ and core $\theta_{11}$ have the maximum number of tasks assigned. It is worth mentioning that the task shifting is performed in such a way that the lightest task among all the tasks assigned to the most utilized core is shifted to the least utilized core. This arrangement: (i) results in the minimum possible workload shifting from the higher to the lower utilized core and (ii) does not violate the timing constraints of the tasks already assigned to the cores. It can be observed from Figure 3.8(b) and (d) that 15 tasks from other cores are shifted to core $\theta_{12}$ under the load balancing mechanism. Interestingly, it can also be observed from Figure 3.8(d) that core $\theta_{12}$ has the highest number of tasks (26 in total), while cores $\theta_7$, $\theta_9$, and $\theta_{10}$ have the lowest number of tasks (12 each). However, the utilization of all the cores is almost the same, as reported in Figure 3.8(c). Moreover, core $\theta_{12}$ was initially underutilized, as can be observed from Figure 3.8(a), and had the lowest number of assigned tasks. Because of the shifting of the lightest tasks from other cores, the number of tasks assigned to core $\theta_{12}$ is now the highest (26, as can be seen from Figure 3.8(d)); however, its utilization is balanced with the remaining cores (see Figure 3.8(c)). Therefore, all the cores can now be run at a uniform speed, which was the intention of this work.

As reflected in Figures 3.7(c, d) and 3.8(c, d), no further task shifting is possible unless task splitting techniques are applied. Since we do not consider the task splitting case here, there might be situations where uniform utilization, and hence a uniform speed, is not possible. In such cases, the speed assigned to the system is the speed of the most utilized core.
3.7 Conclusion of the Chapter
Frequency (speed) is one of the factors through which power can be minimized. In this chapter, a new mechanism named Least Feasible Speed (LFS) was proposed for power reduction, and the obtained results were compared with the existing counterpart called First Feasible Speed (FFS) [46]. The results reveal that the speed obtained through LFS is lower than that of FFS, and hence so is the power (energy). The lowest single-core speed is calculated for each core, and then the average system speed is calculated. If all the cores run at the average speed, then tasks that were feasible at the higher speed of a core may become infeasible at the average speed. Therefore, a lightest task migration strategy is proposed in this chapter to equalize the load among the cores and hence run all the cores at the average speed for power reduction.
Chapter 4 Power Efficient Resource Allocation Using Genetic
Algorithm
4.1 Introduction
The mathematical formulation and results reported in the previous chapter confirmed that the speed obtained through LFS is lower than that obtained through FFS, and hence the power (energy) as well, i.e., $f_{LFS} \le f_{FFS}$ and $power_{LFS} \le power_{FFS}$.
Genetic algorithms are classified as meta-heuristics and are applied when more formal optimizations are intractable or difficult to solve. In this research, the genetic algorithm is applied to the FFS values to improve the speed and, thereby, the power. The process of a GA is as follows: initially, the offsprings' fitness values are calculated and some offsprings are selected using a selection method, which can be random, roulette wheel, or tournament selection. The selected offsprings then pass through the cross over and mutation phases. Finally, the fitness values of the new offsprings are calculated. After cross over and mutation, only those genes whose new fitness value is better than the old fitness value are retained in the new population, and thus the optimization takes place.
It is clear from the results obtained in the previous chapter that the speed, and hence the power, obtained through LFS is the optimal one. In this chapter, the author attempts to further investigate and identify mechanisms for power minimization. To do this, the author applies a genetic algorithm to FFS, which yields very interesting results not only in terms of speed but also in terms of time. This chapter not only presents a novel approach, i.e., GA-FFS, but also compares the author's two proposed algorithms (LFS and GA-FFS) with the existing counterpart, i.e., FFS, in terms of speed and time. The chapter presents the trade-off between time and speed: if time is the main consideration, use the FFS approach for a fast response; if power matters more, use LFS; and use GA-FFS when moderate power and time are required.
4.2 Proposed Work
The working process of GA-FFS is depicted in Figure 4.1. The upper red rectangle in Figure 4.1 represents the process of FFS. Initially, the scheduling points are calculated for a generic task $\tau_i$ through the mechanisms given in the previous chapter. Then the workload is calculated and it is checked whether the load (of this task plus the other higher priority tasks) is feasible or not. The feasibility of task $\tau_i$ is checked through Eq. (5). If the task is feasible at a scheduling point $t$, then the speed of task $\tau_i$ is calculated through Eq. (7) and the remaining scheduling points are discarded. This speed is called FFS. However, if the task is not feasible at the scheduling point, then another scheduling point is taken and the process continues until the first feasible point is reached.

Figure 4.1: Flow chart of GA-FFS
The FFS value for every task in the task set $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$ is calculated and becomes the initial population of the GA. The lower red rectangle in Figure 4.1 shows the process of the genetic algorithm. The FFS values behave as phenotypes in the genetic algorithm and must be converted into genotypes for further processing. In other words, the FFS values are in decimal form; therefore, they must be converted into binary form for the further steps of the genetic algorithm.
Fitness values of all the offsprings in the initial population are calculated. The fitness function in our case is

$$\min f_i \quad \text{such that} \quad f_i \ge \min_{t \in S_i} \left( \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{t} \right) \quad \text{and} \quad f_{i+1} \le f_i \qquad (22)$$

By $\min f_i$ we mean the minimization of FFS such that the given conditions are satisfied, where $f_i$ is the first feasible speed and can be calculated through Eq. (7). The first constraint keeps every candidate speed at or above the least feasible speed of Eq. (8), and the second constraint demands that the speed accepted in a later iteration never exceeds the one accepted before it.

[Figure: four offsprings, encoded as binary strings, are randomly selected from the population with fitness values 0.50, 0.55, 0.20, and 0.40; the tournaments between offsprings (1, 2) and (3, 4) select the offspring with the lower speed from each pair as a parent.]
Figure 4.2: Process of Tournament Selection

Once the fitness values are calculated, the next step in the genetic algorithm is the selection of offsprings. There are many selection methods, such as random selection, tournament selection, and roulette wheel selection [51]. Every selection method has its merits and
demerits. We use tournament selection, as the notion of the genetic algorithm is "survival of the fittest" [53]. In the selection phase, four offsprings are randomly selected from the initial population and then tournament selection is applied to select two offsprings for the cross over phase of the genetic algorithm; each tournament requires two offsprings. In tournament selection, the offspring having the minimum speed (as our problem is a minimization problem) is selected to become a parent. Figure 4.2 pictorially represents the tournament selection.
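As an illustration of this step, here is a minimal Python sketch of tournament selection (names hypothetical; the thesis's experiments used MATLAB):

    import random

    def tournament_select(population, fitness):
        # draw two pairs at random; in each pair the offspring with the
        # lower speed wins, because ours is a minimization problem
        parents = []
        for _ in range(2):
            a, b = random.sample(range(len(population)), 2)
            winner = a if fitness[a] <= fitness[b] else b  # ties broken arbitrarily
            parents.append(population[winner])
        return parents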
4.2.1 Main Drivers of Genetic Algorithm
The main drivers of a genetic algorithm are cross over and mutation. Cross over always occurs between two offsprings, i.e., the cross over process requires two offsprings, while the mutation process occurs in a single offspring.
4.2.1.1 Cross Over
Cross over is one of the main drivers of the genetic algorithm. Just as in the natural phenomenon, parents meet and produce offsprings or children. The offsprings carry some features of the parents while also possessing some features of their own; the features transferred from parents to offsprings are due to the cross over process. Cross over occurs between two offsprings and can take place either at a single point or at multiple points, and the cross over point may be fixed or may vary. In our case, we use single point cross over, but the cross over point is not fixed. The cross over process is pictorially represented in Figure 4.3.
[Figure: (a) one point cross over: parents 0000000000 and 1111111111 exchange tails after bit 5, producing 0000011111 and 1111100000; (b) multi point cross over: the same parents exchange the middle segment between bits 3 and 7, producing 0001111000 and 1110000111.]
Figure 4.3: Cross-over Process
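A minimal Python sketch of the single point cross over used here (cut point random, not fixed; the function name is hypothetical):

    import random

    def one_point_crossover(parent_a, parent_b):
        # swap the tails of two equal-length bit strings at a random cut
        point = random.randint(1, len(parent_a) - 1)
        child_a = parent_a[:point] + parent_b[point:]
        child_b = parent_b[:point] + parent_a[point:]
        return child_a, child_b

For example, one_point_crossover("0000000000", "1111111111") yields ("0000011111", "1111100000") when the cut point falls after bit 5, as in Figure 4.3(a).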
4.2.1.2 Mutation
Mutation is another driver of the genetic algorithm. Through cross over, traits are transferred from parents to offsprings, while mutation is responsible for the offspring's own traits. Mutation means a slight change in an offspring and occurs after cross over. Mutation occurs in a single offspring, so there is no need for offspring pairs. Mutation can be bitwise (multi point) or single bit (just one point). Sometimes mutation does not occur in an offspring at all; normally the mutation probability is kept very low, because if the mutation probability is high, the search becomes too random and the chances of convergence towards the objective function are reduced. Mutation is used to escape from local minima. In our case, we use single point mutation and the mutation probability is 0.1. For mutation, we randomly select a mutation point and then check the probability to decide whether mutation occurs at this point or not. If mutation occurs, the bit is changed accordingly; otherwise, the offspring is left as it is. Mutation is pictorially represented in Figure 4.4.
[Figure: (a) random single point mutation: offspring 0000100001 becomes 0000000001 when the randomly chosen bit 5 is flipped; (b) bitwise mutation: the probability is checked at every bit.]
Figure 4.4: Mutation Process
In random single point mutation, a point is selected randomly and the probability is checked only once to decide whether mutation will occur or not, while in bitwise mutation the probability is checked at each bit to decide whether to mutate the bit or retain it as it is.
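A minimal Python sketch of the random single point mutation described above, with the mutation probability kept at 0.1 as in our setting (the function name is hypothetical):

    import random

    MUTATION_PROB = 0.1  # the mutation probability is kept very low

    def single_point_mutation(offspring):
        # check the probability once; if mutation occurs, flip one
        # randomly chosen bit, otherwise leave the offspring as it is
        if random.random() >= MUTATION_PROB:
            return offspring
        p = random.randrange(len(offspring))
        flipped = '1' if offspring[p] == '0' else '0'
        return offspring[:p] + flipped + offspring[p + 1:]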
After the cross over and mutation processes, the fitness values of the new offsprings are calculated through Eq. (9). Old offsprings are replaced with new offsprings if and only if

$$New(f_i) < Old(f_i) \qquad (23)$$

that is, if the new fitness value of an offspring is less than the old fitness value of the same offspring and also fulfils both constraints of Eq. (22).
The offsprings whose new fitness value is not less than the old fitness value are retained in the population for further iterations of the GA.

Summing up the above discussion: the FFS approach calculates $f_i$ through Eq. (7) and LFS through Eq. (8). GA-FFS improves the $f_i$ obtained through FFS by using the above drivers of GA, such that the new speed is never less than that obtained through LFS. Hence, we can easily conclude that the obtained speeds satisfy $f_{LFS} \le f_{GA\text{-}FFS} \le f_{FFS}$.
4.2.2 Feasibility Checking Through GA-FFS Approach
By rearranging Eq. (7) and Eq. (8), a task becomes feasible at the FFS speed if and only if

$$\min_{t \in S_i} \left( \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{f_i \, t} \right) \le 1 \qquad (24)$$

and a task becomes feasible at the LFS speed if and only if the same condition holds with $f_i$ set to the least feasible speed of Eq. (8), i.e.,

$$\min_{t \in S_i} \left( \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{f_i \, t} \right) \le 1, \quad f_i = \min_{t \in S_i} \left( \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{t} \right) \qquad (25)$$

We know that the speeds obtained through LFS, FFS, and GA-FFS satisfy $f_{LFS} \le f_{GA\text{-}FFS} \le f_{FFS}$. We also know that as the $f_i$ value decreases, the required execution time increases. It is also clear from the discussion in the previous chapter that if a task is feasible at FFS, it is also feasible at LFS. Therefore, from the above discussion we can conclude that $FFS_{time} \le GA\text{-}FFS_{time} \le LFS_{time}$. Hence, if a task is feasible at LFS, it must be feasible at GA-FFS as well.
Algorithm 1: GA-FFS
Input: FFS values $f_i$
Output: Set of $\min f_i$ values
Steps:
1. Calculate the individual $f_i$ of all the offsprings $Off_i$ in the initial population (FFS)
2. FOR epoch := 1 TO total number of epochs
   i. Tournament selection
      Select four $Off_i$ randomly and arrange tournaments between offsprings (1, 2) and (3, 4).
      Select as parents those $Off_i$ whose $f_i \le$ the opponent's $f_i$  // Handle ties arbitrarily
   ii. Cross over
      Select $p$, where $1 \le p \le size(Off)$.
      Cross over the two parents at $p$.
   iii. Mutation
      Select $p$, where $1 \le p \le size(Off)$.
      IF $mut_{prob}$ lies within the range of $mut_{threshold}$
         mutate the bit at $p$
      ELSE
         retain the bit as it is
      END IF
   iv. Calculate $f_i$ of the two new $Off$
      IF $New(f_i) < Old(f_i)$  // Handle ties arbitrarily
         Replace the old $Off$ with the new $Off$
      ELSE
         Retain the old $Off$ as it was (before cross over and mutation)
      END IF
   epoch = epoch + 1
END FOR
EXIT
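Putting the pieces together, the following is a minimal Python sketch of the GA-FFS loop of Algorithm 1 (the thesis's implementation was in MATLAB). It reuses the hypothetical one_point_crossover and single_point_mutation helpers sketched above; the fixed-point binary encoding is an assumption, since the text only states that the decimal FFS values are converted to binary, and the population is assumed to contain at least four offsprings.

    import random

    BITS = 10     # genotype length (assumed)
    EPOCHS = 150  # e.g. 150 or 400 epochs as in the experiments

    def encode(speed):
        # phenotype (speed in (0, 1]) -> genotype (fixed-point bit string)
        return format(int(round(speed * (2**BITS - 1))), '0{}b'.format(BITS))

    def decode(bits):
        return int(bits, 2) / (2**BITS - 1)

    def ga_ffs(ffs_speeds, lfs_speeds):
        # improve the FFS speeds, never dropping below LFS (Eq. (22))
        pop = [encode(f) for f in ffs_speeds]
        for _ in range(EPOCHS):
            idx = random.sample(range(len(pop)), 4)
            # tournaments between (1, 2) and (3, 4); lower speed wins
            pa = min(idx[0:2], key=lambda i: decode(pop[i]))
            pb = min(idx[2:4], key=lambda i: decode(pop[i]))
            ca, cb = one_point_crossover(pop[pa], pop[pb])
            ca, cb = single_point_mutation(ca), single_point_mutation(cb)
            for parent, child in ((pa, ca), (pb, cb)):
                new_f = decode(child)
                # Eq. (23): accept only a strict improvement, bounded by LFS
                if lfs_speeds[parent] <= new_f < decode(pop[parent]):
                    pop[parent] = child
        return [decode(b) for b in pop]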
4.3 Experimental Results and Analysis
For the experimental results and analysis, the three techniques FFS [46], GA-FFS, and LFS (proposed in Chapter 3) were compared. To compare the three techniques, random task sets of sizes within the range [5, 50] were generated with a step size of 1. The plots reported in this chapter are the average values of 300 runs for each task set size from 5 through 50. The task periods were randomly generated from a uniformly distributed range of [100, 10,000]. To obtain the corresponding task execution demands $C_i$ for $\tau_i$, random values were taken from within the range $[1, P_i]$, also with uniform distribution. The priorities were assigned to the tasks as per the RM scheduling rules; that is, the smaller the task period, the higher the task priority. To have a feasible RM schedulable task set, the system utilization is initially kept at 0.69, or $\ln 2$, which is quite low.
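For illustration, a random task set of the kind described above can be generated with the following minimal Python sketch; scaling the execution demands to the target utilization is an assumption about how the 0.69 bound is enforced, since the text only states that the system utilization is kept at ln 2:

    import random

    def generate_task_set(n, u_target=0.69):
        # uniform periods in [100, 10000]; RM priorities = sorted by period
        P = sorted(random.uniform(100, 10000) for _ in range(n))
        # execution demands uniform in [1, P_i], then rescaled so that
        # the total utilization equals the RM bound ln 2 (assumed)
        C = [random.uniform(1, p) for p in P]
        u = sum(c / p for c, p in zip(C, P))
        return [c * u_target / u for c in C], P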
[Two plots: Required Speed (%) versus Task Set Size (n); series: FFS, GA-FFS, LFS. (a) at 150 epochs; (b) at 400 epochs.]
Figure 4.5: Task set size against Required Speed (LFS, FFS, GA-FFS)
In this section, FFS, LFS, and GA-FFS are evaluated experimentally. Initially, we plot the speed results of FFS, LFS, and GA-FFS in Figure 4.5. It is clear from Figure 4.5 that FFS runs at a higher speed than GA-FFS; therefore, GA-FFS performs better than FFS. The Figure 4.5(a) results are based on 150 epochs and the Figure 4.5(b) results on 400 epochs for GA-FFS. Some major findings based on Figure 4.5(a) and Figure 4.5(b) are noted below.

GA-FFS performs better than FFS when speed is taken as the testing attribute. FFS uses Eq. (7) for the speed calculation, and the output of this equation is the input for GA-FFS. As the genetic algorithm is an optimization algorithm, it applies cross over and mutation, and the objective (fitness) function of GA-FFS is $\min f_i$; therefore, GA-FFS improves the results of FFS and outperforms FFS when speed is taken into consideration. Figure 4.5 also shows the supremacy of LFS over FFS and GA-FFS when speed is the testing criterion; in other words, the LFS algorithm runs a system at a lower speed than FFS and GA-FFS. It should be noted that LFS is efficient when speed is under consideration [2]. Therefore, the author set a constraint on GA-FFS that the speed obtained through GA-FFS must not be less than that of LFS, as shown in Eq. (22):

$$f_i \ge \min_{t \in S_i} \left( \frac{C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j}{t} \right) \qquad (26)$$
[Two plots: Normalized Energy Consumption (%) versus Task Set Size (n); series: FFS-energy, GA-FFS-energy, LFS-energy. (a) at 150 epochs; (b) at 400 epochs.]
Figure 4.6: Energy Consumption against Task set size for LFS, FFS and GA-FFS
Figure 4.6 shows the energy consumption of the FFS, GA-FFS, and LFS algorithms. The Figure 4.6(a) results are based on 150 epochs and the Figure 4.6(b) results on 400 epochs for GA-FFS. Some major findings based on Figure 4.6(a) and Figure 4.6(b) are noted below.

GA-FFS consumes less power (energy) than FFS. It is clear from the previous results that the speed obtained through GA-FFS is less than that of FFS. We also know that $E = P \cdot T$ and $P \propto V^2 \cdot f$. When we put the speed values of FFS and GA-FFS into these two equations, the power (energy) value of GA-FFS outperforms that of FFS. It is also clear from the results that LFS consumes less power (energy) than FFS and GA-FFS, in line with the findings of Figure 4.5(a) and Figure 4.5(b).
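As an illustration of this energy argument, a normalized energy figure can be derived from a required speed under the common DVS assumptions that $V \propto f$ (so $P \propto f^3$) and $T \propto 1/f$, which together give $E \propto f^2$; this simplification is ours, not a formula stated in the thesis:

    def normalized_energy(speed):
        # E = P * T with P ~ V^2 * f, V ~ f and T ~ 1/f  =>  E ~ f^2,
        # normalized so that full speed (f = 1) gives energy 1
        return speed ** 2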
[Two plots: Required Execution Time (ms) versus Task Set Size (n); series: LFS-Time, GA-FFS-Time, FFS-Time. (a) at 150 epochs; (b) at 400 epochs.]
Figure 4.7: Execution Time against Task set size for LFS, FFS and GA-FFS
Figure 4.7 depicts the required execution time of FFS, GA-FFS, and LFS. It is clear from Figure 4.7 that GA-FFS beats LFS when the required execution time is the testing parameter. The Figure 4.7(a) results are based on 150 epochs and the Figure 4.7(b) results on 400 epochs for GA-FFS. Some major findings based on Figure 4.7(a) and Figure 4.7(b) are noted below.

As is clear from the results obtained in Figure 4.7, GA-FFS beats LFS in required execution time. LFS checks all the scheduling points to find the least feasible speed of every task and hence consumes more execution time, while GA-FFS uses the first feasible scheduling point for every task and then applies the genetic algorithm to improve the results. As we assume that the genetic portion of GA-FFS takes a constant amount of time equal to the number of iterations (epochs/population), GA-FFS beats LFS in required execution time. The results also show that FFS beats both GA-FFS and LFS in required execution (response) time: the output of FFS becomes the input of GA-FFS for the further operations of the genetic algorithm, which, as assumed, take a constant amount of time equal to the number of iterations (epochs/population). Therefore, FFS beats GA-FFS and LFS in required execution time.

A task is feasible using GA-FFS, i.e., a task completes within its deadline if GA-FFS is used. As is clear from the algorithm itself, and as can be deduced from the previous findings, the required execution time of GA-FFS is less than that of LFS, i.e., $GA\text{-}FFS_{time} \le LFS_{time}$; the results obtained in the previous chapter also reveal that if a task is feasible at FFS, it is also feasible at LFS. Therefore, from the above discussion we can conclude that $FFS_{time} \le GA\text{-}FFS_{time} \le LFS_{time}$. Hence, if a task is feasible at LFS, it is also feasible at the speed obtained through GA-FFS. The results further show that the genetic algorithm proves the "survival of the fittest" notion, which underlies both Darwin's theory and the genetic algorithm [53]. As is clear from the algorithm itself and from Figures 4.5, 4.6, and 4.7, when the number of epochs is small, GA-FFS behaves like FFS, and as the number of epochs increases, GA-FFS behaves like LFS; i.e., as the number of iterations increases, fitter values are obtained. It is also clear from the obtained results that the speed values obtained through GA-FFS are non-increasing over the epochs, as guaranteed by the second constraint of the fitness function, $f_{i+1} \le f_i$.
Based on the above discussion, the time complexity of FFS is $O(m \log n)$, where $m$ is the time taken to calculate the scheduling points and $\log n$ is the time taken to find the first feasible point at which the speed is calculated. The time complexity of LFS is $O(mn)$, where $m$ is the time taken to calculate the scheduling points and $n$ is the time taken to calculate the speed at every feasible point, after which the minimum among the calculated speeds is selected. The time complexity of GA-FFS is $O(m \log n + k)$, where $m \log n$ is the time taken by FFS and $k$ is the time taken by the genetic-algorithm portion of GA-FFS. The value of $k$ depends on the number of iterations (epochs/population) of the genetic algorithm: if $k$ is large relative to $n$, then GA-FFS takes more time than LFS, and if $k$ is small, then GA-FFS takes less time than LFS. However, the existing counterpart, FFS [46], beats both of our proposed approaches if time is the comparison parameter. Because $t \propto 1/f$, and our concern is lowering the speed (frequency) and hence the power (energy), the relationship between FFS and LFS can be written as $f(LFS) = O(g(FFS))$. That is, if speed is taken as the comparison parameter, FFS is an upper bound for LFS, i.e., $0 \le f(LFS) \le C \cdot g(FFS)$, and if time is the comparison parameter, these terms occur in the reverse order. With speed as the comparison parameter, the relationship can also be written as $f(FFS) = \Omega(g(LFS))$, meaning that LFS is a lower bound for FFS, i.e., $0 \le C \cdot g(LFS) \le f(FFS)$.
4.4 Conclusion of the Chapter
In this chapter, a new mechanism for lowering system speeds, and hence energy, was proposed. The new mechanism is termed GA-FFS; as its name suggests, it takes the values obtained through FFS [46] as input and applies a genetic algorithm to further improve the speed of single tasks and hence of the whole computing
unit. The results obtained through GA-FFS are compared with our own proposed mechanism (LFS, discussed in Chapter 3) and with the existing counterpart, FFS [46]. The results reveal that GA-FFS improves the results of FFS. LFS is the optimal solution, so as the number of epochs (new populations) increases, GA-FFS behaves like LFS, while as the number of epochs decreases, GA-FFS behaves like FFS.
Chapter 5 Resource Allocation Using Load Balancing Mechanisms
5.1 Introduction
Applying the LFS and GA-FFS strategies discussed in Chapter 3 and Chapter 4 does not ensure that the load on all computing units will be the same. "The load imbalance (especially in the many-core processors) is a major source of energy (power) drainage." [62]. In order to equally utilize all the computing units, load balancing strategies are applied. There is a huge amount of literature on load balancing. Load balancing strategies are broadly categorized into two types: static [186] load balancing and dynamic [187, 188] load balancing. Static load balancing has statistical information about the application and uses it for load balancing; D. Grosu et al. [189] formulated the problem of static load balancing. Dynamic load balancing mechanisms use only the current state of the system for load balancing. There are three approaches to solving the dynamic load balancing problem: global, cooperative, and non-cooperative. In the global method of load balancing, there is only one dedicated machine for load balancing. In the cooperative approach, a few dedicated machines work cooperatively for overall system load balancing. In the non-cooperative method, every system balances its load individually. Kameda et al. [190] developed algorithms for load balancing in non-cooperative games. Our approach is a hybrid of the cooperative and non-cooperative approaches: it acts like the non-cooperative approach on a single core, while for overall load balancing it uses the cooperative method of the dynamic approach, as we are interested in the overall load balancing of the system.
Andrey G. et al. [191] present a gradient descent algorithm for load balancing. In the gradient descent method of load balancing, a specific load is gradually balanced between systems or cores. The gradient descent method is like moving down a hill, i.e., load from a highly utilized system slowly transfers to a low utilized system until the overall load balances. Some variations of the gradient descent algorithm are found in [192, 193, 194]. Our strategy of lightest task migration and task splitting resembles the gradient descent algorithm.

In this chapter of the research work, the author proposes and experimentally evaluates two mechanisms for load balancing among cores or computing units. The first is called the lightest task migration strategy and the second is called the task splitting strategy. In the lightest task migration strategy, the task having the minimum utilization on the most highly utilized core is transferred to a low utilized core, while in the task splitting strategy a task is split among cores in such a way that the core utilizations become equal. The abovementioned strategies are discussed in the subsequent section.
5.2 Load Balancing Mechanisms
In this section, two mechanisms are discussed for load balancing. The first is task migration or task shifting, and the second is the task splitting strategy. Before applying these load balancing strategies, the tasks from a task set are first assigned to different cores such that the tasks are feasible on those cores. The task set $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$ consists of $n$ tasks and can be divided into subsets $\Gamma_1, \Gamma_2, \ldots$, where $\Gamma_1 = \{\tau_1, \tau_2, \ldots, \tau_k\}$, $\Gamma_2 = \{\tau_{k+1}, \ldots, \tau_i\}$, and so on. Moreover, a set of cores $\Theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$, $m \le n$, is available. The utilization of the tasks assigned to a specific core is given as $U = \sum_{i=1}^{k} \frac{C_i}{P_i}$, where $C_i$ is the execution time needed by a task and $P_i$ is the period of the task.

The problem that we are addressing is to map $\Gamma$ over $\Theta$ such that the tasks are feasible on each core, and then to balance the load among all the cores through the task shifting and task splitting strategies. For feasibility checking, the cumulative workload of task $\tau_i$ at any time instance $t$ is first calculated through Eq. (27); the feasibility is then checked through Eq. (5).

$$L_i(t) = C_i + \sum_{j=1}^{i-1} \left\lceil \frac{t}{P_j} \right\rceil C_j \qquad (27)$$
A task $\tau_i$ is always feasible on a generic core $\theta_i$ at any instance of time $t$ if and only if Eq. (5) holds true, where $t$ is a scheduling point and $S_i$ is the set of scheduling points, calculated through:

$$S_i = \left\{ l P_j \;\middle|\; j = 1, \ldots, i; \; l = 1, \ldots, \left\lfloor \frac{P_i}{P_j} \right\rfloor \right\} \qquad (28)$$

At each scheduling point the cumulative workload is calculated; if the workload is feasible, the task is assigned to the core, and if it is not feasible, the task is assigned to
another core (a minimal sketch of this feasibility check and assignment loop is given after the assumptions below). The following assumptions should be kept in mind for the explanation of the task splitting strategy:
i. A task can be split into any number of parts.
ii. The splitting of a task does not affect the overall results.
iii. All tasks are independent.
There are two types of task dependencies: inter-task dependency and intra-task dependency. The primary focus of the author in this chapter is the load balancing mechanisms, not task dependencies. The authors in refs. [16, 97] also do not consider task dependency. Therefore, for the sake of simplicity, the author of this research work considers neither intra-task dependency (assumption ii) nor inter-task dependency (assumption iii).
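As referenced above, here is a minimal Python sketch of the exact-feasibility check and the resulting first-fit assignment (names hypothetical; tasks that fit on no core are simply skipped here, which is a simplification):

    import math

    def feasible_on_core(C, P, new_C, new_P):
        # tentatively add the task, keep the set period-sorted (RM), and
        # test every task at its scheduling points (Eq. (5), (27), (28))
        tasks = sorted(zip(C + [new_C], P + [new_P]), key=lambda x: x[1])
        Cs = [c for c, _ in tasks]
        Ps = [p for _, p in tasks]
        for i in range(len(Cs)):
            pts = {l * Ps[j] for j in range(i + 1)
                   for l in range(1, int(Ps[i] // Ps[j]) + 1)}  # Eq. (28)
            if all(Cs[i] + sum(math.ceil(t / Ps[j]) * Cs[j]
                               for j in range(i)) > t           # Eq. (27)
                   for t in pts):
                return False  # no point satisfies Eq. (5)
        return True

    def assign_tasks(tasks, m):
        # first-fit mapping of (C, P) tasks onto m cores
        cores = [([], []) for _ in range(m)]
        for c, p in tasks:
            for C, P in cores:
                if feasible_on_core(C, P, c, p):
                    C.append(c)
                    P.append(p)
                    break
        return cores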
5.2.1 Task Migration or Task Shifting
After assigning the tasks to the cores, the next step is to balance the load among the cores. Task migration or task shifting is one strategy used for load balancing. In this strategy, the core having the maximum utilization, i.e., the maximum load, is selected, and on that core the task having the minimum utilization is selected for shifting. The task utilization is calculated through:

$$U_i = \frac{C_i}{P_i} \qquad (29)$$

A task having low utilization is shifted from a highly utilized core to a low utilized core, and the process is repeated until the utilization of all the cores becomes approximately equal to the average utilization over the cores. Nevertheless, the utilizations of the cores under this strategy are not necessarily equal, because the utilizations of the tasks also differ, and this leads to unequal utilization among the cores even after shifting the lightest tasks. The lightest task is selected for shifting so as to balance the load among the cores gradually: if the task having the maximum utilization were selected for shifting, there would be greater fluctuation in balancing the load among the cores, i.e., the load on the cores would quickly increase and decrease around the average utilization.
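A minimal Python sketch of this lightest-task migration loop (names hypothetical; for brevity it balances on utilization only and does not re-run the feasibility test on the receiving core):

    def shift_lightest(cores):
        # cores: list of (C, P) list pairs; move the lightest task from
        # the most utilized core to the least utilized one while doing
        # so still reduces the imbalance
        util = lambda core: sum(c / p for c, p in zip(core[0], core[1]))
        while True:
            hi = max(cores, key=util)
            lo = min(cores, key=util)
            if not hi[0]:
                return cores
            k = min(range(len(hi[0])), key=lambda i: hi[0][i] / hi[1][i])
            u = hi[0][k] / hi[1][k]          # lightest task, Eq. (29)
            if u >= util(hi) - util(lo):     # a move would overshoot: stop
                return cores
            c, p = hi[0].pop(k), hi[1].pop(k)
            lo[0].append(c)
            lo[1].append(p)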
5.2.2 Task Splitting
This is another strategy used for load balancing among cores. The task shifting strategy does not guarantee an equal load among all the cores, as the task splitting strategy does. In task splitting, $C_i$ is the only parameter to play with, because the parameter $P_i$ is constant and cannot be changed. The task splitting strategy balances the load among all the cores by splitting the $C_i$ parameter into two parts in such a way that the core utilizations become equal after assigning one part to one core and the second part to another core. The task splitting strategy is more time consuming than the task shifting policy, because in task splitting extra time is required to split a task into two parts and then transfer one part of the split task to another core to balance the load.

The task splitting strategy can be implemented in two ways: i) assign the tasks to the cores and apply the task splitting strategy directly; ii) first apply the task shifting strategy and then apply task splitting. The second way of implementation is less time consuming, because after applying the task shifting strategy the cores are already approximately balanced. We apply the second way of implementation. The results of task shifting and task splitting are discussed in Section 5.3.
5.2.3 Explanation Through an Example
Let us take three tasks $\tau_1 = (1.2, 6)$, $\tau_2 = (4, 8)$, and $\tau_3 = (1.2, 6)$. Here our focus is the load balancing and not the feasibility of the tasks; therefore, we assume that tasks $\tau_1$ and $\tau_2$ are feasible on core 1 ($\theta_1$), that $\tau_3$ is feasible on core 2 ($\theta_2$), and that the tasks are assigned to their respective cores. The individual task utilizations, calculated through Eq. (29), are 0.2, 0.5, and 0.2 for $\tau_1$, $\tau_2$, and $\tau_3$, respectively. The overall core utilizations, 0.7 (0.2 + 0.5) for $\theta_1$ and 0.2 for $\theta_2$, are given at the top of each core in Figure 5.1.

Figure 5.1: Load balancing mechanisms (task shifting and task splitting) among two cores.
Under the task shifting policy, the highly utilized core is $\theta_1$ and the lightest task (minimum utilization) on $\theta_1$ is $\tau_1$. Therefore, $\tau_1$ is shifted from $\theta_1$ to $\theta_2$. After task shifting, the total core utilizations, depicted in Figure 5.1, are 0.5 and 0.4 for $\theta_1$ and $\theta_2$, respectively. The core utilizations are now close to balanced but not fully balanced. In order to fully balance the core utilizations, the task splitting policy is applied. For task splitting, the only factor we can play with is $C_i$, which can be split through the following procedure.

The average utilization of a core is equal to the total utilization of all the tasks on the cores divided by the number of cores, i.e., $Avg = \frac{U_{tot}}{n}$. So the average utilization of a core is $Avg = \frac{0.5 + 0.4}{2} = 0.45$. The difference between the average core utilization and the actual core utilization is $0.5 - 0.45 = 0.05$ (equivalently, $0.45 - 0.40 = 0.05$). Now divide the utilization $U_i$ of the task to be split ($\tau_2$, with $U_2 = 0.5$) by the difference value (0.05) to calculate into how many parts $C_i$ has to be divided for a full balancing of the load among the cores: $Parts_{No} = \frac{U_i}{Diff_{val}} = \frac{0.5}{0.05} = 10$. Next, divide the $C_i$ of that task by the number of parts in order to determine the portion of $C_i$ that has to be transferred to the other core: $\frac{C_i}{Parts_{No}} = \frac{4}{10} = 0.4$. So $C_i$ is split into two portions: one portion of 0.4 and another of 3.6.

The utilizations of the split portions are $\frac{0.4}{8} = 0.05$ (this portion is transferred from one core to the other) and $\frac{3.6}{8} = 0.45$ (this portion is retained on the same core). In this way the load is fully balanced among the cores. The abovementioned policies are pictorially depicted in Figure 5.1.
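The splitting arithmetic of this example can be sketched as a small Python routine (names hypothetical):

    def split_for_balance(C_i, P_i, u_donor, u_acceptor):
        # split C_i of the chosen task so that the donor and acceptor
        # cores both end up at the average utilization (Section 5.2.3)
        avg = (u_donor + u_acceptor) / 2.0
        diff = u_donor - avg               # utilization to move across
        parts = (C_i / P_i) / diff         # Parts_No = U_i / Diff_val
        moved = C_i / parts                # execution time handed over
        return moved, C_i - moved

    # the example from the text: task (C = 4, P = 8) on cores at 0.5 and
    # 0.4 yields (0.4, 3.6), i.e. utilization 0.05 moved, 0.45 retained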
5.3 Results and Discussions
This section presents the results of the strategies discussed in the previous section. MATLAB is used for the simulation of these strategies. To compare both techniques, random task sets of sizes within the range [60, 80] were generated. The task periods were randomly generated from a uniformly distributed range of [100, 10,000]. To obtain the corresponding task execution demands $C_i$ for $\tau_i$, random values were taken from within the range $[1, P_i]$, also with uniform distribution. The priorities were assigned to the tasks as per the RM scheduling rules; that is, the smaller the task period, the higher the task priority. After the task set generation, the tasks are assigned to the cores and then the discussed strategies are applied.
[Bar chart: Number of Tasks Before (n) versus Cores (n), for cores 1 through 4.]
Figure 5.2: Number of tasks on cores before load balancing.
Figure 5.2 shows the tasks assigned to the cores before applying any of the load balancing strategies. It shows that before load balancing, core 3 has the maximum number of tasks, which is 19; core 2 and core 4 have 13 tasks each; and the remaining 15 tasks are assigned to core 1, as depicted in Figure 5.2.
[Bar chart: Utilization Before (%) versus Cores (n), for cores 1 through 4.]
Figure 5.3: Cores utilization before load balancing.
Figure 5.3 depicts the utilization of the cores corresponding to the tasks assigned in Figure 5.2. It is clear from Figure 5.3 that, before load balancing, core 3 has the maximum utilization, which is 0.8149, and core 4 has the minimum utilization, which is 0.6704.
[Bar chart: Number of Tasks After Shifting (n) versus Cores (n), for cores 1 through 4.]
Figure 5.4: Number of tasks on cores after task shifting.
Figure 5.4 shows the number of tasks on the four cores after applying the task migration or task shifting strategy. It depicts that 7 tasks in total are shifted to core 4 from core 1 and core 3, so core 4 now has the maximum number of tasks, which is 20. It should be kept in mind that in task shifting the lightest task is shifted from a highly utilized core to a low utilized core. Core 2 also gains 2 tasks from core 1 and core 3. After shifting, core 3 has the minimum number of tasks, which is 12, as depicted in Figure 5.4.
[Bar chart: Utilization After Shifting (%) versus Cores (n), for cores 1 through 4.]
Figure 5.5: Cores utilization after task shifting.
Figure 5.5 depicts the utilization of the cores corresponding to the number of tasks in Figure 5.4. It is clear from Figure 5.5 that after applying the task shifting strategy the core utilizations are not fully equal but approximately equal. In Figure 5.5, core 1 has the minimum utilization compared to the other cores, which is 0.7382, and core 2 has the maximum utilization, which is 0.7550.
[Bar chart: Utilization After Splitting (%) versus Cores (n), for cores 1 through 4.]
Figure 5.6: Cores utilization after task splitting
In order to make the utilization of all the cores equal, the task splitting strategy is applied. The result of task splitting is depicted in Figure 5.6, and it is clear from the figure that all the cores now have an equal utilization of 0.7453. Although task splitting leads to equal core utilizations, it is more time consuming than the task shifting strategy.
Table 5.1: Overall simulation results

Core | Tasks before shifting/splitting | Load before shifting/splitting | Tasks after shifting | Load after shifting | Load after splitting

Task set size 60; 4 cores needed for full feasibility of the task set:
C1 | 15 | 0.7834 | 13 | 0.7382 | 0.7453
C2 | 13 | 0.7125 | 15 | 0.7550 | 0.7453
C3 | 19 | 0.8149 | 12 | 0.7442 | 0.7453
C4 | 13 | 0.6704 | 20 | 0.7438 | 0.7453

Task set size 70; 5 cores needed for full feasibility of the task set:
C1 | 13 | 0.7516 | 11 | 0.7321 | 0.7330
C2 | 13 | 0.7429 | 13 | 0.7429 | 0.7330
C3 | 15 | 0.7999 | 11 | 0.7403 | 0.7330
C4 | 18 | 0.8404 | 10 | 0.7260 | 0.7330
C5 | 11 | 0.5326 | 25 | 0.7261 | 0.7330

Task set size 80; 5 cores needed for full feasibility of the task set:
C1 | 20 | 0.7437 | 20 | 0.7437 | 0.7360
C2 | 17 | 0.8179 | 11 | 0.7410 | 0.7360
C3 | 18 | 0.8503 | 12 | 0.7100 | 0.7360
C4 | 19 | 0.8169 | 12 | 0.7443 | 0.7360
C5 | 06 | 0.4519 | 25 | 0.7417 | 0.7360
Table 5.1 depicts the overall simulation results. The simulations are run on an Intel Core 2 Duo machine with Windows 7 as the operating system. All the above figures, from Figure 5.2 to Figure 5.6, are based on the task set of size 60 in Table 5.1.
It is clear from the results given in Table 5.1 that task shifting takes less time than the task splitting mechanism.

Another observation is that the simulation time taken by task shifting increases as the task set size increases. In task splitting, an increase in the task set size does not always increase the simulation time; this is due to the fact that in task splitting we do not know in advance into how many parts a single task will be split. The overall task splitting time is based on two factors: the number of tasks to be split, and the number of parts into which a single task will be split. Therefore, in the task splitting mechanism, an increase in the task set size does not always increase the total simulation time.
5.4 Conclusion of the Chapter
The load balancing mechanisms discussed in this chapter are task shifting and task splitting. In the task shifting mechanism, a task having low utilization is selected from a highly utilized core and shifted (transferred) to a low utilized core. The splitting strategy is applied after task shifting so as to equally utilize all the computing units/cores. In the task splitting strategy, a maximally utilized task from a highly utilized core is selected and its execution time is split in such a way that, if the split portion is assigned to a low utilized core, the utilizations of the high and low utilized cores become equal. Although the splitting of a task equates the utilizations of the cores, care must be taken when splitting a task among cores, because task splitting takes more time than the task shifting mechanism.
Chapter 6 Conclusion, Recommendations and Future Directions
6.1 Introduction
This research thesis focuses on power efficient resource allocation in HPC systems. The overall scope of the work is presented in Chapter 1, including the problem statement, research issues, and the overall contribution of the research. An analysis of existing distributed HPC systems based on predefined features was presented in Chapter 2. To cover HPC systems from the power efficient resource allocation perspective, a new approach called LFS was presented in Chapter 3. In order to further investigate power efficiency in HPC systems, the author presented another approach, called GA-FFS, in Chapter 4 of the research work. In Chapter 5, the author presented load balancing mechanisms in order to balance the loads among computing units that may become unbalanced due to the proposed techniques of Chapter 3 and Chapter 4. Finally, this chapter concludes the research thesis with some recommendations and future directions.
6.2 Conclusion
Chapter 2 of the thesis provides a detailed comparison and description of the three broad categories of HPC systems, namely Cluster, Grid, and Cloud. The said categories have been investigated and analyzed in terms of resource allocation. Moreover, the well-known projects and applications from each category are briefly discussed and highlighted. Furthermore, the aforementioned projects in Chapter 2 are compared on the basis of selected common features belonging to the same category. For each category, more specific characteristics are discussed. The features list can be expanded further for Cluster, Grid, and Cloud; however, because the scope of Chapter 2 was resource allocation only, the selected characteristics allow clearer distinctions at each level of the classification. Chapter 2 will help readers analyze the gap between what is already available in existing systems and what is still required, so that outstanding research issues can be identified. Moreover, the features of cluster, grid, and cloud are closely related to each other, and the said chapter will help in understanding the differences. Furthermore, the systems of each category have been classified as software only, hardware only, or hybrid. Hardware and OS support can be cost prohibitive for end-users, while programming-level support places a big burden on them. Among the three HPC categories, grid and cloud computing appear promising, and a lot of research has been conducted in each category. The focus of future HPC systems is to reduce the operational cost of data centers and to increase the resilience to failure, adaptability, and graceful recovery. Chapter 2 of this thesis can help new researchers address the open areas in the research. Moreover, it also provides the basic information along with descriptions of the projects in the broad domain of cluster, grid, and cloud.
In Chapter 3, the author integrated dynamic voltage scaling with the fixed priority scheduling paradigm. A solution was proposed to find the lowest possible core speed for a single task. The proposed technique was then applied to the multi-core system to identify a uniform system speed that conserves energy while maintaining the system timing requirements. The proposed methodology was compared with existing techniques, and the simulation results presented in Chapter 3 revealed superior performance.

The proposed work in Chapter 4 addressed and improved the speed and power of FFS by using a genetic algorithm. This modified version of FFS is termed GA-FFS. The author concluded from the experimental evaluation that GA-FFS is more efficient in speed and power consumption than FFS. GA-FFS also has better results than LFS when the required execution time is considered as the testing parameter.

In Chapter 5, the author presented two strategies for load balancing among cores or systems in an HPC environment. The first strategy is task migration: the lightest task is transferred from a highly utilized core to a low utilized core, and the process is repeated until the load among the cores is approximately balanced. The other strategy is task splitting: the cores are fully balanced by splitting a task, i.e., the execution time of a single task is divided between a highly utilized core and a low utilized core in such a way that the core utilizations become fully balanced. Compared to task migration, the task splitting strategy fully balances a specific load among the cores, but it is more time consuming, because it takes extra time to split a task in such a way as to balance the load among the cores.
6.3 Recommendations
A recommendation for new researchers is to read Chapter 2 of the thesis; it will help them address the open areas in the research. It is further suggested to use the FFS technique in situations where the response time is of greater importance than the energy consumption; in other words, the FFS technique responds more quickly than the LFS technique. Another recommendation is to use the LFS technique in situations where power is more important than response time, as established by the results obtained in Chapter 3. Use the GA-FFS mechanism in situations where response time and power consumption are of moderate importance. In the case of load balancing among cores or systems, use the task shifting strategy whenever time is important; otherwise, use the task splitting strategy, which fully balances the load, as is clear from the results obtained in Chapter 5.
Overall, the main objective of this research work has been to devise intelligent resource allocation strategies that improve the power (energy) consumption in HPC systems. Indeed, this is a wide-ranging field with several existing prior works and findings. The prime focus of this research is to allocate the computing resources in a high performance computing environment in such a way as to minimize the power (energy). This research effort contributes two novel approaches to the HPC systems community from the energy perspective. These efforts also open up some novel directions for future research, which are detailed next.
6.4 Future Directions
For future research, one option is to conduct a survey of research issues in resource allocation mechanisms in the HPC environment and, where possible, to improve an existing mechanism or develop a new mechanism for resource allocation. Initially, the new mechanism would target multi-core systems; if promising results were obtained, the new resource allocation mechanism would then be extended to distributed HPC systems.

Another future direction is to apply a naturally inspired algorithm to the task assignment problem instead of rate monotonic scheduling and check the system speed for energy consumption. After that, a speed minimization technique would be applied and the energy consumption checked again. Both energy results (with and without the speed minimization technique) would be compared to verify the energy reduction.

The research work presented in Chapter 5 can also be extended. A good option for future work is to extend the task splitting and task shifting strategies to distributed HPC systems, where many new issues arise, such as the delay time in transferring a task. More interesting results could be obtained by incorporating the delay time into these concepts in a distributed HPC environment. A distributed HPC environment is more challenging than a multi-core one, and while implementing these concepts in a distributed HPC environment, more interesting research topics and issues may appear. Another future direction is to study the behavior of the task splitting strategy when considering the intra-task and inter-task dependencies.
REFERENCES:
[1] Hameed Hussain, Saif-Ur-Rahman Malik, Abdul Hameed, Samee Ullah Khan, et
al. "A survey on resource allocation in high performance distributed computing
systems." Parallel Computing 39.11 (2013): 709-736.
[2] Nasro Min-Allah, Hameed Hussain, Samee Ullah Khan, and Albert Y. Zomaya.
"Power efficient rate monotonic scheduling for multi-core systems." Journal of
Parallel and Distributed Computing 72, no. 1 (2012): 48-57.
[3] Hameed Hussain, Muhammad Bilal Qureshi, Muhammad Shoaib and Sadiq Shah, "Load balancing through task shifting and task splitting strategies in multi-core environment." IEEE Eighth International Conference on Digital Information Management (ICDIM), 2013: pp. 385-390.
[4] Hameed Hussain, Muhammad Bilal Qureshi and Manzor Illahi Tamimy,
“Minimizing Power Consumption through System Speed using Genetic
Algorithm”, Submitted to The Scientific World Journal (TSWJ), a Hindawi
Journal.
[5] G.L. Valentini, W. Lassonde, S.U. Khan, N. Min-Allah, S.A. Madani, J. Li, L.
Zhang, L. Wang, N. Ghani, J. Kolodziej, H. Li, A.Y. Zomaya, C.-Z. Xu, P. Balaji,
A. Vishnu, F. Pinel, J.E. Pecero, D. Kliazovich, P. Bouvry, “An overview of
energy efficiency techniques in cluster computing systems”, Cluster Computing 16
(1) (2013) 3–15.
[6] F. Pinel, J.E. Pecero, S.U. Khan, P. Bouvry, “Energy-efficient scheduling on
milliclusters with performance constraints”, in: ACM/IEEE International
Conference on Green Computing and Communications (GreenCom) , Chengdu,
Sichuan, China, August 2011, pp. 44–49.
[7] L. Wang, S.U. Khan, D. Chen, J. Kolodziej, R. Ranjan, C.-Z. Xu, A.Y. Zomaya,
“Energy-aware parallel task scheduling in a cluster”, Future Generation Computer
Systems 29 (7) (2013) 1661–1670.
[8] J. Kołodziej, S.U. Khan, L. Wang, M. Kisiel-Dorohinicki, S.A. Madani, E.
Niewiadomska-Szynkiewicz, A.Y. Zomaya, C. Xu, “Security, energy, and
performance-aware resource allocation mechanisms for computational grids”,
Future Generation Computer Systems, October 2012, ISSN 0167-739X,
http://dx.doi.org/10.1016/j.future.2012.09.009.
[9] Grid Computing, http://www.adarshpatil.com/newsite/images/grid-computing.gif,
accessed Feb. 20, 2012
[10] S.U. Khan, “A goal programming approach for the joint optimization of energy
consumption and response time in computational grids”, in: 28th IEEE
International Performance Computing and Communications Conference (IPCCC),
Phoenix, AZ, USA, December 2009, pp. 410–417.
[11] D. Chen, L. Wang, X. Wu, J. Chen, S.U. Khan, J. Kolodziej, M. Tian, F. Huang,
W. Liu, “Hybrid modelling and simulation of huge crowd over a hierarchical grid
architecture”, Future Generation Computer Systems 29 (5) (2013) 1309–1317.
[12] J. Kolodziej, S.U. Khan, “Multi-level hierarchical genetic-based scheduling of
independent jobs in dynamic heterogeneous grid environment”, Information
Sciences 214 (2012) 1–19.
[13] Qureshi, Muhammad Bilal, Maryam Mehri Dehnavi, Nasro Min-Allah,
Muhammad Shuaib Qureshi, Hameed Hussain, Ilias Rentifis, Nikos Tziritas et al.
"Survey on Grid Resource Allocation Mechanisms." Journal of Grid Computing
(2014): 1-43.
[14] Y. Amir, B. Awerbuch, A. Barak, R. S. Borgstrom, and A. Keren. “An
Opportunity Cost Approach for Job Assignment in a Scalable Computing Cluster”,
IEEE Transactions on Parallel and Distributed Systems, 11(7):760–768, July
2000.
[15] C. Yeo, and R. Buyya, “A Taxonomy of Market-Based Resource Management
Systems for Utility-Driven Cluster Computing”, Software: Practice and
Experience, Vol. 36, No. 13, Nov. 2006, pp. 1381-1419.
[16] C. Diaz, M. Guzek, J. Pecero, P. Bouvry, and S. Khan, “Scalable and Energy-
efficient Scheduling Techniques for Large-scale Systems”, 11th IEEE
International Conference on Computer and Information Technology (CIT), Sep.
2011, pp. 641-647.
[17] J. Kolodziej, S.U. Khan, E. Gelenbe, E.-G. Talbi, “Scalable optimization in grid,
cloud, and intelligent network computing”, Concurrency and Computation:
Practice and Experience 25 (12) (2013) 1719–1721.
[18] G. Andrews, Foundations of Multithreaded, Parallel, and Distributed
Programming, Addison–Wesley, Boston, MA, USA, 2000.
[19] H. Xin, L. KenLi, L. RenFa, “An energy efficient scheduling base on dynamic voltage and frequency scaling for multi-core embedded real-time system”, in: Algorithms and Architectures for Parallel Processing, LNCS, vol. 5574, 2009, pp. 137–145.
[20] E. Humenay, D. Tarjan, K. Skadron, “Impact of process variations on multicore
performance symmetry”, In: Proceedings of the Conference on Design,
Automation and Test in Europe, 2007, pp. 1653-1658
[21] J. Sartori, A. Pant, R. Kumar, P. Gupta, “Variation aware speed binning of
multicore processors”, in: Proceedings of the 11th IEEE International
Symposium on Quality Electronic Design, 2010, pp. 307–314.
[22] A.P. Chandrakasan, S. Sheng, R.W. Brodersen, “Low power CMOS digital
design”, IEEE J. Solid State Circuits (1992) 472–484.
[23] T.D. Burd, T.A. Pering, A.J. Stratakos, R.W. Brodersen, “A dynamic voltage
scaled microprocessor system”, IEEE J. Solid State Circuits 35 (11) (2000) 1571–
1580.
[24] T. Glökler, H. Meyr, “Design of Energy-Efficient Application-Specific Instruction Set Processors”, Kluwer Academic Publishers, Dordrecht, 2004.
[25] T. Ishihara, H. Yasuura, “Voltage scheduling problem for dynamically variable
voltage processors”, in: International Symposium on Low Power Electronics and
Design, 1998, pp. 197–202.
[26] J.L.W.V. Jensen, “Sur les fonctions convexes et les inégalités entre les valeurs moyennes”, Acta Math. 30 (1) (1906) 175–193.
[27] V. Raghunathan, C. Pereira, M. Srivastava, R. Gupta, “Energy aware wireless
systems with adaptive power-fidelity tradeoffs”, IEEE Trans. Very Large Scale
Integr. (VLSI) Syst. 13 (2) (2005).
[28] W. Lee, H. Kim, H. Lee, “Maximum-utility scheduling of operation modes with
probabilistic task execution times under energy constraints”, IEEE Trans. Comput.
Aided Des. Integr. Circuits Syst. 28 (10) (2009) 1531.
[29] C.L. Liu, J.W. Layland, “Scheduling algorithms for multiprogramming in a hard
real-time environment”, Journal of the ACM 20 (1) (1973) 40–61.
[30] N. Min-Allah, S.U. Khan, “A hybrid test for faster feasibility analysis of periodic
tasks”, IJICIC 7 (10) (2011) 5689–5698.
[31] R.I. Davis, T. Rothvoß, S.K. Baruah, A. Burns, “Exact quantification of the
suboptimality of uniprocessor fixed priority pre-emptive scheduling”, Real Time
Syst. 43 (3) (2009) 211–258.
[32] C.M. Krishna, Kang G. Shin, “Real-time Systems”, Tsinghua University Press,
McGraw-Hill, 2001.
[33] A. Burns, A.J. Wellings, “Real-Time Systems and Programming Languages”, 4th
ed., Addison Wesley, 2009, 602 pages.
[34] K. Lakshmanan, R. Rajkumar, J.P. Lehoczky, “Partitioned fixed-priority
preemptive scheduling for multi-core processors”, in: Proceedings of the 21st
Euromicro Conference on Real-Time Systems, 2009, pp. 239–248.
[35] S. Saewong, R. Rajkumar, “Practical voltage-scaling for fixed priority rt-
systems”, in: Proceedings of the 9th IEEE Real-Time and Embedded Technology
and Applications Symposium, RTAS03, 2003, pp. 106–115.
[36] N. AbouGhazaleh, B. Childers, D. Mosse, R. Melhem, M. Craven, “Energy
management for real-time embedded applications with compiler support”, in:
ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded
Systems, 2003, pp. 284–293.
[37] H. Aydin, R. Melhem, D. Mosse, P. Alvarez, “Dynamic and aggressive scheduling
techniques for power-aware real-time systems”, in: Proc. IEEE Real-Time Syst.
Symp., 2001, p. 95.
[38] J. Anderson, S. Baruah, “Energy-efficient synthesis of periodic task systems upon
identical multiprocessor platforms”, in: Proc. Distributed Computing Systems,
24th International Conference, 2004, pp. 428–435.
[39] P. Pillai, K.G. Shin, “Real-time dynamic voltage scaling for low-power embedded
operating systems”, in: Proceedings of the 18th ACM Symposium on Operating
Systems Principles, 2001, pp. 21–24.
[40] F. Li, F.F. Yao, “An efficient algorithm for computing optimal discrete voltage
schedules”, SIAM J. Comput. 35 (2005) 658–671.
[41] F. Zhang, S. Chanson, “Processor voltage scheduling for real-time tasks with non-
preemptible sections”, in: Real-Time System Symposium, Austin, TX, Dec. 2002.
[42] J. Brateman, C. Xian, Y. Lu, “Frequency and speed setting for energy
conservation in autonomous mobile robots”, in: Proceedings of the IFIP
International Federation for Information Processing, vol. 249/2008, 2008, pp.
197–216.
[43] J.W.S. Liu, “Real Time Systems”, Prentice Hall, 2000.
[44] L. George, N. Rivierre, M. Spuri, “Preemptive and Non-Preemptive Real-Time
Uniprocessor Scheduling”, Research Report 2966, INRIA, France, 1996.
[45] N. Min-Allah, S.U. Khan, Y. Wang, “Optimal task execution times for periodic
tasks using nonlinear constrained optimization”, J. Supercomput. (2010)
doi:10.1007/s11227-010-0506-z.
[46] E. Bini, G.C. Buttazzo, G. Lipari, “Minimizing CPU energy in real time systems
with discrete speed management”, ACM Trans. Embedded Comput. Syst. 8 (4)
(2009).
[47] N. Min-Allah, I. Ali, J. Xing, Y. Wang, “Utilization bound for periodic task set
with composite deadline”, J. Comput. Electr. Eng. 36 (6) (2010) 1101–1109.
[48] J.Y.T. Leung, J. Whitehead, “On the complexity of fixed-priority scheduling of periodic, real-time tasks”, Performance Evaluation 2 (1982) 237–250.
[49] J.P. Lehoczky, “Fixed priority scheduling of periodic task sets with arbitrary
deadline”, in: Proceedings of the 11th IEEE Real-Time System Symposium, 1990,
pp. 201–209.
[50] E. Seo, Y. Koo, J. Lee, “Dynamic repartitioning of real-time schedule on a
multicore processor for energy efficiency”, in: LNCS, vol. 4096/2006, 2006, pp.
69–78.
[51] Sastry, Kumara, David Goldberg, and Graham Kendall. "Genetic algorithms." In
Search methodologies, pp. 97-125. Springer US, 2005.
[52] Hull, David L. "Darwin and his critics: The reception of Darwin's theory of
evolution by the scientific community." (1973).
[53] Paul, Diane B. "The selection of the “Survival of the Fittest”." Journal of the
History of Biology 21.3 (1988): 411-424.
[54] P. Kokkinos et al., “A framework for providing hard delay guarantees and user fairness in Grid Computing”, June 2009.
[55] V. Chauhan et al., “Motivation for Green Computer, Methods Used in Computer Science Program”, in: National Postgraduate Conference (NPC), 19-20 September, 2011, pp. 1-5, doi: 10.1109/NatPC.2011.6136287.
[56] N. Sadashiv, and S. Kumar, “Cluster, Grid and Cloud Computing: A Detailed
Comparison,” 6th International Conference on Computer Science & Education
(ICCSE), Sep. 2011, pp. 477-482.
[57] J.L.W.V. Jensen, “Sur les fonctions convexes et les inégalités entre les valeurs moyennes”, Acta Math. 30 (1) (1906) 175–193.
[58] I. Foster, Y. Zhao, I. Raicu, and S. Lu, “Cloud Computing and Grid Computing
360-Degree Compared”, Grid Computing Environments Workshop
2008 (GCE’08), Nov. 2008, pp. 1-10.
[59] “What is the difference between Cloud, Cluster and Grid Computing?”,
http://www.cloud-competence-center.de/understanding/difference-cloud-cluster-
grid/, accessed Feb. 5, 2011.
[60] Amazon’s HPC cloud: supercomputing for the 99%,
http://arstechnica.com/business/2012/05/amazons-hpc-cloud-supercomputing-for-
the-99/, accessed 11 Sep 2012.
[61] F. Dong, and S.G. Akl, “Scheduling Algorithms for Grid Computing: State of the Art and Open Problems”, Queen’s University, Technical Report, http://www.cs.queensu.ca/TechReports/Reports/2006-504.pdf.
[62] G. Valentini, S. Khan, and P. Bouvry, “Energy-efficient Resource Utilization in
Cloud Computing,” Large Scale Network-centric Computing Systems, A. Y.
Zomaya and H. Sarbazi-Azad, eds., John Wiley & Sons, Hoboken, NJ, USA. 2013,
ISBN: 978-0-470-93688-7, Chapter 16.
[63] K. Ramamritham, and J. Stankovic, “Scheduling Algorithms and Operating System Support for Real-time Systems,” Proceedings of the IEEE, Vol. 82, No. 1, Jan. 1994, pp. 55-67.
[64] P. Bernstein, http://research.microsoft.com/en-us/people/philbe/chapter3.pdf,
accessed July. 25, 2011.
[65] P. Wieder, O. Waldrich, W. Ziegler, “Advanced Techniques for Scheduling,
Reservation and Access Management for Remote Laboratories and Instruments,”
2nd IEEE International Conference on e-Science and Grid Computing (e-
Science’06), Dec. 2006, pp. 128-128.
[66] “Continuous Availability for Enterprise Messaging: Reducing Operational Risk
and Administration Complexity”,
http://www.progress.com/docs/whitepapers/public/sonic/sonic_caa.pdf, accessed
Feb. 12, 2011.
[67] “Cluster Computing”, http://searchdatacenter.techtarget.com/definition/cluster-
computing, accessed Feb. 03, 2011.
[68] “Parallel Sysplex”, http://www-03.ibm.com/systems/z/advantages/pso/, accessed
Feb.22, 2011.
[69] L. Kalé and S. Krishnan, “Charm++: Parallel Programming with Message-driven
Objects,” Parallel Programming Using C++, G. V. Wilson and P. Lu, eds., MIT
Press, Cambridge, MA, USA, 1996, pp. 175-213.
[70] P. Brucker, “Scheduling Algorithms”, 4th edition, Springer-Verlag, Guildford,
Surrey, UK, 2004.
[71] L. Wang, J. Tao, H. Marten, A. Streit, S.U. Khan, J. Kolodziej, D. Chen, “Map
Reduce across distributed clusters for data-intensive applications”, in: 26th IEEE
International Parallel and Distributed Processing Symposium (IPDPS), Shanghai,
China, May 2012, pp. 2004–2011.
[72] G. White, and M. Quartly,
http://www.ibm.com/developerworks/systems/library/es-linuxclusterintro,
accessed Feb. 15, 2012.
[73] P. Lindberg, J. Leingang, D. Lysaker, K. Bilal, S. Khan, P. Bouvry, N. Ghani, N.
Min-Allah, and J. Li, “Comparison and Analysis of Greedy Energy-Efficient
Scheduling Algorithms for Computational Grids,” Energy Aware Distributed
Computing Systems, A. Y. Zomaya and Y.-C. Lee, eds., John Wiley & Sons,
Hoboken, NJ, USA.
[74] Microsoft Live Mesh, http://www.mesh.com, accessed Feb. 12, 2011.
[75] I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid,” International
Journal of Supercomputer Applications, Vol. 15, No. 3, Aug. 2001, pp. 200-222.
[76] D. Irwin, L. Grit, and J. Chas, “Balancing Risk and Reward in a Market-based
Task Service,” 13th International Symposium on High Performance Distributed
Computing (HPDC13), June 2004, pp. 160-169.
[77] C. Yeo and R. Buyya, “Service Level Agreement based Allocation of Cluster
Resources: Handling Penalty to Enhance Utility,” 7th IEEE International
Conference on Cluster Computing (Cluster 2005), Sep. 2005.
[78] R. Buyya, R. Ranjan and R. N. Calheiros, “Modeling and Simulation of Scalable
Cloud Computing Environments and the CloudSim Toolkit: Challenges and
Opportunities”. Proceedings of the 7th High Performance Computing and Simulation Conference (HPCS 2009), IEEE Press, New York, USA, Leipzig, Germany, June 21-24, 2009.
[79] S. Toyoshima, S. Yamaguchi, and M. Oguchi, “Storage Access Optimization with Virtual Machine Migration and Basic Performance Analysis of Amazon
EC2,” IEEE 24th International Conference on Advanced Information Networking
and Applications Workshops, Apr. 2010, pp. 905-910.
[80] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, “The Eucalyptus Open-Source Cloud Computing System,” 9th
IEEE/ACM International Symposium on Cluster Computing and the Grid
(CCGRID ’09), May 2009, pp. 124-31.
[81] K. K. Droegemeier, D. Gannon, D. Reed, B. Plale, J. Alameda, T. Baltzer, K.
Brewster, R. Clark, B. Domenico, S. Graves, E. Joseph, D. Murray, R.
Ramachandran, M. Ramamurthy, L. Ramakrishnan, J. A. Rushing, D. Weber, R.
Wilhelmson, A. Wilson, M. Sue, and S. Yalda, “Service-Oriented Environments
for Dynamically Interacting with Mesoscale Weather”, Computing in Science and
Engg., Vol. 7, No. 6, 2005, pp.12–29.
[82] J. Sherwani, N. Ali, N. Lotia, Z. Hayat, and R. Buyya, “Libra: A Computational
Economy-based Job Scheduling System for Clusters,” Software: Practice and
Experience, Vol. 34, No. 6, May 2004, pp. 573-590.
[83] T. Casavant, and J. Kuhl, “A Taxonomy of Scheduling in General-purpose
Distributed Computing Systems,” IEEE Transactions on Software Engineering,
Vol. 14, No.2, Jan. 1988, pp. 141-154.
[84] J. Regeh, J. Stankovic, and M. Humphrey, “The Case for Hierarchical Schedulers
with Performance Guarantees”, TR-CS 2000-07, Department of Computer
Science, University of Virginia, Mar. 2000, 9 pp.
[85] R. Wolski, N. Spring, and J. Hayes, “Predicting the CPU Availability of Time-
shared Unix Systems on the Computational Grid,” Proceedings of the 8th High-
Performance Distributed Computing Conference, Aug.1999.
[86] N. Arora, R. Blumofe, C. Plaxton, “Thread Scheduling for Multi-programmed
Multi-processors,” Theory of Computing Systems, Vol. 34, No. 2, 2001, pp. 115-
144.
[87] T. Xie, A. Sung, X. Qin, M. Lin, and L. Yang, “Real-time Scheduling with Quality
of Security Constraints,” International Journal of High Performance Computing
and Networking, Vol. 4, No. 3, 2006, pp. 188-197.
[88] L. Wang, J. Tao, H. Marten, A. Streit, S. Khan, J. Kolodziej, and D. Chen, “Map
Reduce across Distributed Clusters for Data-intensive Applications,” 26th IEEE
International Parallel and Distributed Processing Symposium (IPDPS), May
2012.
[89] S. Iqbal, R. Gupta and Y. Lang, “Job Scheduling in HPC Clusters”, Power
Solutions, Feb. 2005, pp. 133-135.
[90] M. Jette, A. Yoo, and M. Grondona “SLURM: Simple Linux Utility for Resource
Management”, D. G. Feitelson and L. Rudolph eds., Job Scheduling Strategies for
Parallel Processing, 2003, pp. 37-51.
[91] S. Senapathi, D. K. Panda, D. Stredney, and H.-W. Shen, “A QoS Framework for
Clusters to support Applications with Resource Adaptivity and Predictable
Performance,” Proceedings of the IEEE International Workshop on Quality of
Service (IWQoS), May.
[92] K. H. Yum, E. J. Kim, and C. Das, “QoS provisioning in clusters: an
investigation of router and NIC design”, In ISCA-28, 2001.
[93] J. Leung, “Handbook of Scheduling: Algorithms, Models, and Performance
Analysis, First Edition”, CRC Press, Inc., Boca Raton, FL, USA, 2004.
[94] S. Ali, T.D. Braun, H.J. Siegel, A.A. Maciejewski, N. Beck, L. Boloni, M.
Maheswaran, A.I. Reuther, J.P. Robertson, M.D. Theys, B. Yao, “Characterizing
resource allocation heuristics for heterogeneous computing systems,” in: A.R.
Hurson (Ed.), Advances in Computers, vol. 63: Parallel, Distributed, and
Pervasive Computing, Elsevier, Amsterdam, The Netherlands, 2005, pp. 91–128.
[95] P. Dutot, L. Eyraud, G. Mounie, and D. Trystram, “Bi-criteria Algorithm for
Scheduling Jobs on Cluster Platforms,” 16th ACM Symposium on Parallelism in
Algorithms and Architectures (SPAA), July 2004, pp. 125-132.
[96] F. Pinel, J. Pecero, P. Bouvry, and S. Khan, “A Two-Phase Heuristic for the
Scheduling of Independent Tasks on Computational Grids,” ACM/IEEE/IFIP
International Conference on High Performance Computing and Simulation
(HPCS), July 2011, pp. 471-477.
[97] J. Kolodziej, S. Khan, and F. Xhafa, “Genetic Algorithms for Energy-aware
Scheduling in Computational Grids,” 6th IEEE International Conference on P2P,
Parallel, Grid, Cloud, and Internet Computing (3PGCIC), Oct. 2011, pp. 17-24.
[98] K. Rzadca, “Scheduling in multi-organization grids: Measuring the inefficiency of
decentralization,” 7th International Conference on Parallel Processing and Applied Mathematics, Gdansk, Poland, 2007, pp. 1048-1058.
[99] E. Huedo, R. Montero, and I. Llorente, “A Framework for Adaptive Execution in
Grids,” Software-Practice and Experience, Vol. 34, No. 07, June 2004, pp.631-
651.
[100] S. Chapin, J. Karpovich, and A. Grimshaw, “The Legion Resource Management
System,” 5th Workshop on Job Scheduling Strategies for Parallel Processing, Apr. 1999, pp. 162-178.
[101] N. Kapadia, and J. Fortes, “PUNCH: An Architecture for Web-enabled Wide-area
Network-computing,” The Journal of Networks, Software Tools and Applications,
Special Issue on High Performance Distributed Computing, Vol. 2, No. 2, Sep. 1999, pp. 153-164.
[102] B. Lowekamp, “Combining Active and Passive Network Measurements to Build
Scalable Monitoring Systems on the Grid,” ACM SIGMETRICS Performance
Evaluation Review, Vol. 30, No. 4, 2003, pp.19-26.
[103] M. Litzkow, M. Livny, and M. Mutka, “Condor - A Hunter of Idle Workstations,” 8th International Conference of Distributed Computing Systems, June 1988, pp. 104-111.
[104] L. Wang, W. Jie, and J. Chen, “Grid Computing: Infrastructure, Service, and
Applications, Kindle Edition”, CRC Press 2009, pp. 338.
[105] S. Khan and I. Ahmad, “A Cooperative Game Theoretical Technique for Joint
Optimization of Energy Consumption and Response Time in Computational
Grids,” IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 3,
2009, pp. 346-360.
[106] L. Wang and S. Khan, “Review of Performance Metrics for Green Data Centers: A
Taxonomy Study,” Journal of Supercomputing.
[107] GENI, http://www.geni.net, accessed Feb. 02, 2011.
[108] Nimbus, http://www.nimbusproject.org/doc/nimbus/faq/, accessed Apr.
05, 2012.
[109] Open Nebula, http://opennebula.org/, accessed Apr. 05, 2012.
[110] F. Lombardi, and R. Di Pietro, “Secure Virtualization for Cloud Computing,” Journal of Network and Computer Applications, Vol. 34, No. 4, July 2011, pp. 1113-1122.
[111] D. Benslimane, S. Dustdar, and A. Sheth, “Services Mashups: The New
Generation of Web Applications,” IEEE Internet Computing, Vol. 12, No. 5, Feb.
2008, pp. 13-15.
[112] L. Skorin-Kapov, M. Matijasevic, “Dynamic QoS Negotiation and Adaptation for
Networked Virtual Reality Services,” IEEE WoWMoM ’05, Taormina, Italy,
June 2005, pp. 344–51.
[113] Vmware, Inc.,
http://www.vmware.com/pdf/vsphere4/r41/vsp_41_resource_mgmt.pdf, accessed
July 23, 2011.
[114] HP Cloud, https://www.hpcloud.com/pricing, accessed August 08, 2013.
[115] A. Barak and O. La’adan, “The MOSIX Multicomputer Operating System for
High Performance Cluster Computing,” Future Generation Computer Systems,
Vol. 13, No. 4-5, Mar. 1998, pp. 361–372.
[116] Gluster, www.gluster.org, accessed Feb. 16, 2012.
[117] L. Kalé, S. Kumar, M. Potnuru, J. DeSouza, and S. Bandhakavi, “Faucets:
Efficient Resource Allocation on the Computational Grid,” 33rd International
Conference on Parallel Processing (ICPP 2004) , Aug. 2004.
[118] M. Bhandarkar, L. Kalé, E. Sturler, and J. Hoeflinger, “Adaptive Load Balancing
for MPI Programs,” Lecture Notes in Computer Science (LNCS), Vol. 2074, May
2001, pp. 108-117.
[119] L. Kalé, S. Kumar, and J. DeSouza, “A Malleable-job System for Timeshared
Parallel Machines,” 2nd International Symposium on Cluster Computing and the
Grid (CCGrid 2002), pp. 215-222, May 2002.
[120] DQS, http://www.msi.umn.edu/sdvl/info/dqs/dqs-intro.html, accessed Mar. 20,
2011
[121] K. Lai, L. Rasmusson, E. Adar, L. Zhang, and B. Huberman, “Tycoon: An
Implementation of a Distributed, Market-based Resource Allocation System,”
Multiagent Grid System, Vol. 1, No. 3, Aug. 2005, pp. 169-182.
[122] A. Bernardo, H. Lai, and L. Fine, “Tycoon: A Distributed Market-based Resource
Allocation System”, TR-arXiv:cs.DC/0404013, HP Labs, Palo Alto, CA, USA,
Feb. 2008, 8 pp.
[123] J. Chase, D. Irwin, L. Grit, J. Moore, and S. Sprenkle, “Dynamic Virtual Clusters
in a Grid Site Manager,” 12th International Symposium on High Performance
Distributed Computing (HPDC12), June 2003, pp. 90-100.
[124] C. Morin, R. Lottiaux, G. Vallee, P. Gallard, G. Utard, R. Badrinath, and L.
Rilling, “Kerrighed: A Single System Image Cluster Operating System for High
Performance Computing,” Proceedings of Europar 2003 Parallel Processing,
Lecture Notes in Computer Science, Vol. 2790, Aug. 2003, pp. 1291-1294.
[125] Kerrighed, http://kerrighed.org/wiki/index.php/Main_Page, accessed Mar. 15,
2011.
[126] OpenSSI, http://openssi.org/cgi-bin/view?page=openssi.html, accessed Mar. 12,
2011.
[127] C. Yeo and R. Buyya, “Pricing for Utility-driven Resource Management and
Allocation in Clusters,” International Journal of High Performance Computing
Applications, Vol. 21, No. 4, Nov. 2007, pp. 405-418.
[128] P. Springer, “PVM Support for Clusters,” 3rd IEEE International Conference on
Cluster Computing (CLUSTER’01), Oct. 2001.
[129] B. Chun, and D. Culler, “Market-based Proportional Resource Sharing for
Clusters”, TR-CSD-1092, Computer Science Division, University of California,
Berkeley, USA, Jan. 2000, 19 pp.
[130] GNQS, http://gnqs.sourceforge.net/docs/starter_pack/introducing/index.html,
accessed Jan. 20, 2011
[131] Workload Management with Load Leveler,
http://www.redbooks.ibm.com/abstracts/sg246038.html, accessed Feb. 10, 2011
[132] http://www.platform.com/workload-management/high-performance-computing,
accessed Aug. 07, 2011.
[133] Research Computing and Cyber Infrastructure,
http://rcc.its.psu.edu/user_guides/system_utilities/pbs/, accessed Aug. 07, 2011.
[134] J. Basney, and M. Livny, “Deploying a High Throughput Computing Cluster,”
High Performance Cluster Computing, Vol. 1, R. Buyya, eds., Prentice Hall, pp.
116-134, 1999.
[135] R. Buyya, D. Abramson, and J. Giddy, “A Case for Economy Grid Architecture
for Service Oriented Grid Computing,” 15th International Parallel and
Distributed Processing Symposium, Apr. 2001, pp. 776-790.
[136] D. Batista, and N. Fonseca, “A Brief Survey on Resource Allocation in Service
Oriented Grids,” IEEE Globecom Workshops, Nov. 2007, pp. 1-5.
[137] H. Nakada, M. Sato, and S. Sekiguchi, “Design and Implementation of Ninf:
Towards a Global Computing Infrastructure,” Future Generation Computing
Systems (Meta-computing Special Issue), Vol. 15, No. 5-6, Oct. 1999, pp. 649-
658.
[138] K. Krauter, R. Buyya, and M. Maheswaran, “A Taxonomy and Survey of Grid
Resource Management Systems for Distributed Computing,” Journal of Software
Practice and Experience, 2002, pp. 135-164, (DOI: 10.1002/spe.432).
[139] R. Al-Ali, A. Hafid, O. Rana, and D. Walker, “QoS Adaptation in Service-
Oriented Grids,” Performance Evaluation, Vol. 64, No.7-8, Aug. 2007, pp. 646-
663.
[140] R. Al-Ali, O. Rana, D. Walker, S. Jha, and S. Sohail, “G-QoSM: Grid Service Discovery using QoS Properties,” Computing and Informatics Journal, Special Issue on Grid Computing, Vol. 21, No. 4, Aug. 2002, pp. 363-382.
[141] I. Foster, C. Kesselman, J. Nick, S. Tuecke, “The physiology of the grid: an open grid services architecture for distributed systems integration”, Argonne National
Laboratory, Mathematics and Computer Science Division Chicago, Jan. 2002, 37
pp., www.globus.org/research/papers/ogsa.pdf.
[142] M. Neary, A. Phipps, S. Richman, and P. Cappello, “Javelin 2.0: Java-based
Parallel Computing on the Internet,” European Parallel Computing Conference
(Euro-Par 2000), Aug. 2000, pp. 1231-1238.
[143] R. Wolski, N. Spring, and J. Hayes, “The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing,” Future
Generation Computer Systems, Vol. 15, No. 5, 1999, pp. 757-768.
[144] D. Andresen and T. McCune, “Towards a hierarchical scheduling system for
distributed WWW server clusters,” Proceedings of the Seventh IEEE International
Symposium on High Performance Distributed Computing (HPDC).
[145] F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, “Application level
scheduling on distributed heterogeneous networks,” Proceedings of
Supercomputing 1996.
[146] N. Spring and R. Wolski, “Application level scheduling: Gene sequence library
comparison,” Proceedings of ACM International Conference on Supercomputing,
July 1998.
[147] X. Sun, and M. Wu, “GHS: A Performance System of Grid Computing,” 19th
IEEE International Parallel and Distributed Processing Symposium, Apr. 2005.
[148] B. Cooper, and H. Garcia-Molina, “Bidding for Storage Space in a Peer-to-Peer
Data Preservation System,” 22nd International Conference on Distributed
Computing Systems (ICDCS 2002), July 2002, pp. 372-381.
[149] D. Carvalho, F. Kon, F. Ballesteros, M. Román, R. Campbell, and D. Mickunas, “Management of Execution Environments in 2K,” 7th International Conference on Parallel and Distributed Systems (ICPADS ’00), July 2000, pp. 479-485.
[150] F. Kon, R. Campbell, M. Mickunas, and K. Nahrstedt, “2K: A Distributed Operating System for Dynamic Heterogeneous Environments,” 9th IEEE
International Symposium on High Performance Distributed Computing (HPDC
’00), Aug. 2000, pp.201-210.
[151] M. Roman, F. Kon, and R. H. Campbell, “Design and Implementation of Runtime
Reflection in Communication Middleware the Dynamic Use Case,” Workshop on
Middleware (ICDCS’99), May 1999.
[152] D. Schmidt, Distributed Object Computing with CORBA Middleware,
http://www.cs.wustl.edu/~schmidt/corba.html, accessed Feb. 4, 2011.
[153] F. Berman, and R. Wolski, “The AppLeS Project: A Status Report,” 8th NEC
Research Symposium, May 1997.
[154] P. Chandra, A. Fisher, C. Kosak, T. S. E. Ng, P. Steenkiste, E. Takahashi, and H.
Zhang, “Darwin: Customizable Resource Management for Value-added Network
Services,” 6th IEEE International Conference on Network Protocols, Oct. 1998.
[155] G. Allen, D. Angulo, I. Foster, G. Lanfermann, C. Liu, T. Radke, E. Seidel, and J.
Shalf, “The Cactus Worm: Experiments with Dynamic Resource Discovery and
Allocation in a Grid Environment,” International Journal of High Performance
Computing Applications, Vol. 15, No. 4, 2001, pp. 345-358.
[156] F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson,
K. Kennedy, C. Kesselman, J. Mellor-Crummey, D.Reed, L. Torczon, and R.
Wolski, “The GrADS Project: Software Support for High-level Grid Application
Development,” International Journal of Supercomputer Applications, Vol. 15, No.
04, 2001, pp.327-344.
[157] N. Kapadia, R. Figueiredo, and J. Fortes, “PUNCH: Web portal for running tools,”
IEEE Micro, Vol. 20, No. 3, June 2000, pp. 38-47.
[158] D. Abramson, J. Giddy, and L. Kotler, “High Performance Parametric Modeling
with Nimrod/G: Killer Application for the Global Grid?” International Parallel
and Distributed Processing Symposium (IPDPS 2000) , May 2000, pp. 520-528.
[159] R. Buyya, D. Abramson, and J. Giddy, “Nimrod/G: An Architecture for a
Resource Management and Scheduling System in a Global Computational Grid,”
International Conference on High Performance Computing in Asia–Pacific Region
(HPC Asia 2000), May 2000, Vol. 1, pp. 283-289.
[160] R. Buyya, J. Giddy, and D. Abramson, “An Evaluation of Economy-based
Resource Trading and Scheduling on Computational Power Grids for Parameter
Sweep Applications,” 2nd International Workshop on Active Middleware Services
(AMS’00), Aug. 2000.
[161] G. Valentini, W. Lassonde, S. Khan, N. Min-Allah, S. Madani, J. Li, L. Zhang, L.
Wang, N. Ghani, J. Kolodziej, H. Li, A. Zomaya, C. Xu, P. Balaji, A. Vishnu, F.
Pinel , J. Pecero , D. Kliazovich, and P. Bouvry, “An Overview of Energy
Efficiency Techniques in Cluster Computing Systems,” Cluster Computing, pp. 1-
13, Sep. 2011, DOI: 10.1007/s10586-011-0171-x.
[162] L. Kotler, D. Abramson, P. Roe, and D. Mather, “Activesheets: Super-computing
with Spreadsheets,” Advanced Simulation Technologies Conference High
Performance Computing Symposium (HPC’01), Apr. 2001.
[163] H. Casanova, and J. Dongarra, “Netsolve: A Network-enabled Server for Solving
Computational Science Problems,” International Journal of Supercomputer
Applications and High Performance Computing, Vol. 11, No. 3, 1997, pp. 212-
223.
[164] J. Gehring, and A. Streit, “Robust Resource Management for Metacomputers,” 9th
IEEE International Symposium on High Performance Distributed Computing,
Aug. 2000.
[165] I. Foster and C. Kesselman. “Globus: A Metacomputing Infrastructure Toolkit,”
International Journal of Supercomputer Applications, Vol. 11, No. 2, 1996, pp.
115-128.
[166] A. Grimshaw, and W. Wulf, “The Legion Vision of a Worldwide Virtual Computer,”
Communications of ACM, Vol. 40, No. 1, Jan. 1997, pp. 39-45.
[167] Amazon Elastic Compute Cloud (EC2), http://www.amazon.com/ec2/, accessed
Feb. 10, 2011.
[168] Z. Hill, and M. Humphrey, “A Quantitative Analysis of High Performance
Computing with Amazon’s EC2 Infrastructure: The Death of the Local Cluster?,”
10th IEEE/ACM International Conference on Grid Computing , Oct. 2009, pp. 26-
33.
[169] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I.
Pratt, and A. Warfield, “Xen and the art of Virtualization,” ACM Symposium on
Operating Systems Principles (SOSP), Vol. 37, No. 5, Oct. 2003, pp.164-177.
[170] A. Bedra, “Getting Started with Google App Engine and Clojure,” IEEE Internet Computing, Vol. 14, No. 4, 2010, pp. 85-88.
[171] Google App Engine, http://appengine.google.com, accessed Feb. 02, 2011.
[172] I. Baldine, Y. Xin, A. Mandal, and C. Heermann, “Networked Cloud Orchestration:
A GENI Perspective,” IEEE GLOBECOM Workshops, Dec. 2010, pp. 573-578.
[173] Slice Federation Architecture 2.0, GENI,
http://groups.geni.net/geni/wiki/SliceFedArch, accessed Sep 17, 2012.
[174] Sun Network.com (Sun Grid), http://www.network.com, accessed Feb. 08, 2011.
[175] W. Gentzsch, “Sun Grid Engine: Towards Creating a Compute Power Grid,” 1st
IEEE/ACM International Symposium on Cluster Computing and the Grid, May 2001, pp. 35-36.
[176] L. Uden, and E. Damiani, “The Future of E-learning: E-learning Ecosystem,” 1st
IEEE International Conference on Digital Ecosystems and Technologies, June 2007, pp. 113-117.
[177] V. Chang, and C. Guetl, “E-Learning Ecosystem (ELES)-A Holistic Approach for
the Development of More Effective Learning Environment for Small-and-Medium
Sized Enterprises,” 1st IEEE International Conference on Digital Ecosystems and
Technologies, Feb. 2007, pp. 420-425.
[178] B. Dong, Q. Zheng, J. Yang, H. Li, and M. Qiao, “An E-learning Ecosystem Based
on Cloud Computing Infrastructure,” 9th IEEE International Conference on
Advanced Learning Technologies, July 2009, pp. 125 – 127.
[179] X. Chu, K. Nadiminti, C. Jin, S. Venugopal, and R. Buyya, “Aneka: Next-
Generation Enterprise Grid Platform for e-Science and e-Business Applications,”
3rd IEEE International Conference on e-Science and Grid Computing, Dec. 2007,
pp. 151-159.
[180] A. Chien, B. Calder, S. Elbert, and K. Bhatia, “Entropia: Architecture and
Performance of an Enterprise Desktop Grid System,” Journal of Parallel and
Distributed Computing, Vol. 63, No. 5, May 2003, pp.597-610.
[181] OpenStack, http://openstack.org/downloads/openstack-overview-datasheet.pdf,
accessed Apr. 04, 2012.
[182] E. Bini, G.C. Buttazzo, G. Buttazzo, “Rate monotonic analysis: the hyperbolic
bound”, IEEE Trans. Comput. 7 (52) (2003) 933–942.
[183] A.P. Chandrakasan, R.W. Brodersen, Low Power Design, Kluwer Academic
Publishers, Dordrecht, 1995.
[184] Crusoe Processor Model TM5800 Specifications, http://www.charmed.com/PDF/TM5800.pdf, 2011.
[185] N. Min-Allah, Y. Wang, X. Jian-Sheng, J. Liu, “Revisiting fixed priority
techniques”, in: Proceedings of Embedded and Ubiquitous Computing, EUC07,
in: LNCS, vol. 4808, 2007, pp. 134–145.
[186] D. Grosu et al., “Noncooperative load balancing in distributed systems”, J. Parallel Distrib. Comput. 65 (2005) 1022–1034.
[187] R. Mirchandaney, D. Towsley, J. Stankovic, “Adaptive load sharing in
heterogeneous systems”, in: Proceedings of the Ninth IEEE International
Conference on Distributed Computing Systems, June 1989, pp. 298–306.
[188] M.H. Willebeek-LeMair, A.P. Reeves, “Strategies for dynamic load balancing on
highly parallel computers”, IEEE Trans. Parallel Distributed Systems 4 (9)
(September 1993) 979–993.
[189] D. Grosu, A.T. Chronopoulos, M.Y. Leung, “Load balancing in distributed
systems: an approach using cooperative games”, in: Proceedings of the
International Parallel and Distributed Processing Symposium , April 2002, pp. 52–
61.
[190] H. Kameda, J. Li, C. Kim, Y. Zhang, “Optimal Load Balancing in Distributed
Computer Systems”, Springer, London, 1997.
[191] Andrey G. et al., “Load balancing algorithms based on gradient methods and their analysis through algebraic graph theory”, J. Parallel Distrib. Comput. 68 (2008) 209–220.
[192] F. Lin, R. Keller, “The gradient model load balancing method”, IEEE Trans.
Software Engrg. 1 (1987) 32–38.
[193] R. Lüling, B. Monien, F. Ramme, “A study on load balancing algorithms”, Technical Report, Universität-GH Paderborn, 1992.
[194] F. Muniz, E. Zaluska, “Parallel load-balancing: an extension to the gradient
model”, Parallel Comput. 21 (1995) 287–301.
[195] Y.C. Lee, A.Y. Zomaya, “Energy efficient utilization of resources in cloud computing systems”, Journal of Supercomputing, 60(2):268–280, doi:10.1007/s11227-010-0421-3, 2012.
[196] A. Hameed et al., “A survey and taxonomy on energy efficient resource allocation techniques for cloud computing systems”, Computing, Springer, doi: 10.1007/s00607-014-0407-8, June 2014.
Author’s Publications
(Published)
1. N. Min-Allah, Hameed Hussain, S. U. Khan, and A. Y. Zomaya, "Power Efficient
Rate Monotonic Scheduling for Multi-core Systems", Journal of Parallel and
Distributed Computing, vol. 72, no. 1, pp. 48-57, 2012. Elsevier Journal. Impact
Factor: 1.078 (2010).
2. Hameed Hussain, S. U. R. Malik, A. Hameed, S. U. Khan, G. Bickler, N. Min-Allah,
M. B. Qureshi, L. Zhang, W. Yongji, N. Ghani, J. Kolodziej, A. Y. Zomaya, C.-Z.
Xu, P. Balaji, A. Vishnu, F. Pinel, J. E. Pecero, D. Kliazovich, P. Bouvry, H. Li, L.
Wang, D. Chen, and A. Rayes, "A Survey on Resource Allocation in High
Performance Distributed Computing Systems", Parallel Computing, vol. 39, no. 11,
pp. 709-736, 2013. Elsevier Journal. Impact Factor: 1.214 (2013).
3. Hameed Hussain et al., “Load Balancing through Task Shifting and Task Splitting
Strategies in Multi-core environment”, Journal of Electronic Systems, vol 4, no 2,
June 2014, pp. 61-67.
4. Hameed Hussain, Muhammad Bilal Qureshi, Manzoor Illahi Tamimy, “Minimizing
Power Consumption through System Speed using Genetic Algorithm”, Accepted in
The Scientific World Journal (TSWJ). (Hindawi Journal) (2016)
5. Muhammad Zakarya, Syed Bilal Hussain Shah, Aftab Alam, Ateeq ur Rahman, Arsh
ur Rahman, Izaz ur Rahman, Ayaz Ali Khan, Hameed Hussain, Nazar Abbas, “An
Overview of New Ultra Lightweight RFID Authentication Protocol SASI”,
International Journal of Computer Science Issues (IJCSI), Vol. 8, Issue 2, pp. 518-524, March 2011, USA, ISSN (Online): 1694-0814.
6. Muhammad Bilal Qureshi, Maryam Mehri Dehnavi, Nasro Min-Allah, Muhammad
Shuaib Qureshi, Hameed Hussain, Ilias Rentifis, Nikos Tziritas et al. “Survey on
Grid Resource Allocation Mechanisms”, Journal of Grid Computing (2014): 1-43.
Springer, Impact Factor: 1.667 (2013)
7. Hameed Hussain, Maqbool Uddin Shaikh, Saif Ur Rehman Malik, “Proposed Text
Mining Framework to Explore Issues from Text in a Certain Domain”, IEEE
International Conference on Computer Engineering and Applications (ICCEA), 19-
21 March, 2010, pp. 16-21, Bali Island, Indonesia, DOI: 10.1109/ICCEA.2010.11.
8. Hameed Hussain, Muhammad Bilal Qureshi, Muhammad Shoaib, Sadiq Shah, “Load
Balancing through Task Shifting and Task Splitting Strategies in Multi-core
environment”, 8th IEEE International Conference on Digital Information
Management (ICDIM), 10-12 September, 2013, pp. 385-390, Marriot, Islamabad,
Pakistan. DOI: 10.1109/ICDIM.2013.6694040
9. Azra Shamim, Hameed Hussain, Maqbool Uddin Shaikh, “A Framework for Generation of
Rules from Decision Tree and Decision Table”, IEEE International Conference on Education
and Information Technology (ICEIT), 17-19 September, 2010, pp. 1-6, FAST University
Karachi, Pakistan. DOI: 10.1109/ICIET.2010.5625700
10. Sadiq Shah, Hameed Hussain, Muhammad Shoaib, “Minimizing Non-coordinated
Interference in Multi-Radio Multi-Channel Wireless Mesh Networks (MRMC-
WMNs)”, 8th IEEE International Conference on Digital Information Management
(ICDIM), 10-12 September, 2013, pp. 24-28, Marriot, Islamabad, Pakistan. DOI:
10.1109/ICDIM.2013.6694017
11. Muhammad Shoaib, Nasru Minallah, Shahzad Rizwan, Sadiq Shah, Hameed Hussain,
“Investigating the impact of Group Mobility Models over the On-Demand Routing
Protocol in MANETs”, 8th IEEE International Conference on Digital Information
Management (ICDIM), 10-12 September, 2013, pp. 29-34, Marriot, Islamabad,
Pakistan. DOI: 10.1109/ICDIM.2013.6694016