Military Technical Academy Bucharest, 2004

GRID CONSTRUCTION SOFTWARE COMPONENTS

ADINA RIPOSAN
Applied Information Technology
Department of Computer Engineering



GRID SOFTWARE COMPONENTS

• Management components
• Donor software
• Submission software
• Distributed grid management
• Schedulers
• Reservation system
• Communications system
• Observation & measurement
• Monitoring & Recovery
• Grid Applications


Management components


Any grid system has some management components:

1st: A component used primarily to decide where grid jobs should be assigned.

It keeps track of:
• the resources available to the grid, and
• which users are members of the grid.
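A minimal sketch of this first component, assuming an in-memory registry and a simple "most free CPUs" placement rule. The names `GridRegistry`, `Resource`, and `assign` are hypothetical and not taken from any particular grid product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Resource:
    name: str
    free_cpus: int


@dataclass
class GridRegistry:
    """Hypothetical management component: tracks grid resources and grid members."""
    resources: Dict[str, Resource] = field(default_factory=dict)
    members: Set[str] = field(default_factory=set)

    def add_resource(self, name: str, free_cpus: int) -> None:
        self.resources[name] = Resource(name, free_cpus)

    def enroll_user(self, user: str) -> None:
        self.members.add(user)

    def assign(self, user: str, cpus_needed: int) -> Optional[str]:
        """Return the resource chosen for a job, or None if nothing fits right now."""
        if user not in self.members:
            raise PermissionError(f"{user} is not a member of the grid")
        fits = [r for r in self.resources.values() if r.free_cpus >= cpus_needed]
        if not fits:
            return None
        best = max(fits, key=lambda r: r.free_cpus)  # illustrative rule: most free CPUs
        best.free_cpus -= cpus_needed                 # book the capacity for this job
        return best.name
```

Keeping membership and resource tracking in one place lets the assignment step refuse non-members before it even looks for capacity.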


2nd: Measurement components that determine:
• the capacities of the nodes on the grid, and
• their current utilization rate at any given time.

This information is used:
• to schedule jobs in the grid;
• to determine the health of the grid, alerting personnel to problems such as outages, congestion, or overcommitment;
• to determine overall usage patterns and statistics;
• to log and account for usage of grid resources.
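As a rough illustration of how the same measurements can feed both health alerts and usage statistics, the sketch below classifies nodes from their capacity and current demand. The `NodeReading` type and the 85% congestion threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeReading:
    node: str
    capacity: float         # e.g. CPU cores reported by the measurement component
    in_use: float           # current demand on the node, in the same unit
    reachable: bool = True  # False if the node did not answer the last probe


def health_report(readings: List[NodeReading], congested_at: float = 0.85) -> Dict[str, str]:
    """Label each node; the 85% congestion threshold is an illustrative assumption."""
    report = {}
    for r in readings:
        if not r.reachable:
            report[r.node] = "outage"
        elif r.in_use > r.capacity:
            report[r.node] = "overcommitted"
        elif r.in_use >= congested_at * r.capacity:
            report[r.node] = "congested"
        else:
            report[r.node] = "ok"
    return report


def grid_utilization(readings: List[NodeReading]) -> float:
    """Overall usage statistic: share of reachable capacity currently in use."""
    up = [r for r in readings if r.reachable]
    total = sum(r.capacity for r in up)
    used = sum(min(r.in_use, r.capacity) for r in up)
    return used / total if total else 0.0
```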


3rd: Advanced grid management software, known as “autonomic computing” or “recovery oriented computing”, which:
• can automatically manage many aspects of the grid, and
• automates the various complicated tasks involved in managing a grid.

This software would:
• identify problems in real time;
• quickly initiate corrective actions before problems seriously impair the grid;
• automatically recover from various kinds of grid failures and outages, finding alternative ways to get the workload processed.


Donor software


A user first enrolls as a Grid USER and installs the provided grid software on his own machine.

He may optionally enroll his machine as a DONOR on the Grid.

Each machine contributing resources typically needs to:
• enroll as a member of the grid, and
• install some software that manages the grid’s use of its resources.

The grid system makes information about the newly added resources available throughout the grid.


=> The need for:
• an Identification and Authentication procedure
• a Certificate Authority

=> Optionally, grid systems may:
• provide their own login system to the grid, or
• depend on the native operating systems for user authentication

(in the latter case, a user ID mapping system may be needed to match the user’s rights properly on different machines, but this is typically maintained manually by a grid administrator)
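A user ID mapping can be as simple as an administrator-maintained lookup table. The sketch below is hypothetical (account and host names are invented); it also shows why manual maintenance is a burden, since every new user or machine needs its own entry.

```python
from typing import Dict, Tuple

# Hypothetical table kept by a grid administrator:
# (grid user, machine) -> local operating-system account on that machine.
ID_MAP: Dict[Tuple[str, str], str] = {
    ("alice", "node-a.example.org"): "asmith",
    ("alice", "node-b.example.org"): "alice01",
    ("bob", "node-a.example.org"): "bjones",
}


def local_account(grid_user: str, machine: str) -> str:
    """Return the account whose rights a grid user's job runs with on `machine`."""
    try:
        return ID_MAP[(grid_user, machine)]
    except KeyError:
        # No entry: the administrator has not mapped this user on this machine yet.
        raise PermissionError(f"{grid_user} has no mapped account on {machine}") from None
```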


=> To use the grid:

Most grid systems require the user to log on to a system using a user ID that is enrolled in the grid.

Other grid systems may have their own Grid login ID, separate from the one on the operating system.

A Grid login is usually more convenient for Grid users:
• It eliminates the ID matching problems among different machines;
• To the user, it makes the Grid look more like one large virtual computer rather than a collection of individual machines.


Some grid systems implement a protective “SANDBOX” around the program, so that:
• It cannot cause any disruption to the donating machine if it encounters a problem during execution.
• Rights to access files and other resources on the Grid machine may be restricted.
• The protection is ensured BOTH for the donating machine and for the Grid system (two-way protection).
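One facet of a sandbox, restricting file access to a job's own directory, can be sketched in a few lines. This is a hypothetical illustration and not a security boundary: real sandboxes rely on operating-system isolation (separate accounts, resource limits, virtual machines), which a path check cannot replace. `FileSandbox` and its methods are invented names.

```python
from pathlib import Path


class FileSandbox:
    """Hypothetical policy: a grid job may only touch files under its work directory."""

    def __init__(self, root: str):
        self.root = Path(root).resolve()

    def check(self, requested: str) -> Path:
        # Join to the root and normalise "..", symlinks, etc. An absolute
        # `requested` path replaces the root entirely, so it is caught below too.
        target = (self.root / requested).resolve()
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"access outside the sandbox denied: {requested}")
        return target

    def open_file(self, requested: str, mode: str = "r"):
        return open(self.check(requested), mode)
```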


Submission software


Usually any member machine of a Grid can be used to:
• submit jobs to the grid, and
• initiate grid queries.

In some Grid systems, this function is implemented as a separate component installed on:
• “submission nodes” or
• “submission clients”.

When a Grid is built using dedicated resources rather than scavenged resources,
=> separate submission software is usually installed on the user’s desktop or workstation.


The user usually performs QUERIES:
• to check how busy the grid is,
• to see how his submitted jobs are progressing, and
• to look for resources on the grid.

For queries, Grid systems usually provide:
• command line tools, especially useful when the user wants to write a script that automates a sequence of actions
(e.g. the user might write a script to look for an available resource, submit a job to it, watch the progress of the job, and present the results when the job has finished);
• graphical user interfaces (GUIs).
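The scripted sequence described above might look like the sketch below. `FakeGridClient` and its methods are hypothetical stand-ins for a grid's command line tools (a real script would typically invoke those tools, for instance through `subprocess`); only the control flow is the point here.

```python
import time


class FakeGridClient:
    """Stand-in for a grid's command line tools; every method here is hypothetical."""

    def __init__(self):
        self._polls = 0

    def find_resource(self, min_cpus):
        return "node-b" if min_cpus <= 8 else None

    def submit(self, resource, command):
        return f"job-1@{resource}"

    def status(self, job_id):
        self._polls += 1
        return "DONE" if self._polls >= 3 else "RUNNING"

    def result(self, job_id):
        return "42\n"


def run_job(client, command, min_cpus=1, poll_seconds=0.0):
    """Look for a resource, submit the job, watch its progress, return its output."""
    resource = client.find_resource(min_cpus)
    if resource is None:
        raise RuntimeError("no suitable resource available")
    job_id = client.submit(resource, command)
    while (state := client.status(job_id)) != "DONE":
        print(f"{job_id}: {state}")
        time.sleep(poll_seconds)
    return client.result(job_id)


if __name__ == "__main__":
    print(run_job(FakeGridClient(), "./simulate --steps 1000"), end="")
```

With the fake client, the script prints two `RUNNING` status lines before the result; against a real grid, `poll_seconds` would be set to something much larger than zero.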


Distributed grid management


LARGE GRIDS => a hierarchical or other type of organizational topology, usually matching the connectivity topology
(machines locally connected together with a LAN form a “cluster” of machines).

=> The grid may be organized in a hierarchy consisting of clusters of clusters.

The work involved in managing the grid is distributed to increase the scalability of the grid.

The collection of grid operation and resource data, as well as job scheduling, is distributed to match the topology of the grid.


A central job scheduler will NOT schedule a submitted job directly to the machine which is to execute it.

The job is sent to a lower level scheduler, which handles a set of machines (or further clusters).

The lower level scheduler handles the assignment to the specific machine.

Similarly, the collection of statistical information is distributed.

• Lower level clusters receive activity information from the individual machines, aggregate it, and send it to higher level management nodes in the hierarchy.
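The two-way flow described above (jobs passed down, statistics passed up) can be sketched as a tree of schedulers. The `Scheduler` class and the "delegate to the child with the most spare capacity" rule are assumptions for illustration, not a description of a specific product.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Machine:
    name: str
    free_cpus: int
    busy_cpus: int = 0


@dataclass
class Scheduler:
    """One level of a hypothetical scheduling hierarchy.

    A scheduler tries its own machines first (leaf level) and otherwise
    delegates to child schedulers; a top-level scheduler with no machines
    therefore never places a job on a machine directly.
    """
    name: str
    machines: List[Machine] = field(default_factory=list)
    children: List["Scheduler"] = field(default_factory=list)

    def free_cpus(self) -> int:
        return sum(m.free_cpus for m in self.machines) + sum(c.free_cpus() for c in self.children)

    def schedule(self, cpus: int) -> Optional[str]:
        # Leaf level: assign the job to a specific machine.
        for m in sorted(self.machines, key=lambda x: x.free_cpus, reverse=True):
            if m.free_cpus >= cpus:
                m.free_cpus -= cpus
                m.busy_cpus += cpus
                return m.name
        # Higher level: hand the job to the child with the most spare capacity.
        for child in sorted(self.children, key=lambda c: c.free_cpus(), reverse=True):
            placed = child.schedule(cpus)
            if placed is not None:
                return placed
        return None

    def stats(self) -> Tuple[int, int]:
        """(busy, total) CPUs, aggregated bottom-up and passed to the parent."""
        busy = sum(m.busy_cpus for m in self.machines)
        total = sum(m.busy_cpus + m.free_cpus for m in self.machines)
        for child in self.children:
            b, t = child.stats()
            busy, total = busy + b, total + t
        return busy, total
```

Each level only sees one summary per child, which is what keeps the central node's work bounded as the grid grows.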


The work required:
• to collect the lower level results, and
• to produce the final result

is usually accomplished:
• by a single program, usually running on the machine at the point of job submission, or
• in a distributed way, if a very large number of subjobs is required for an application

(the subjob that submits more subjobs to the grid would be responsible for collecting and aggregating the results of the subjobs it spawned).
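The distributed variant can be illustrated with a subjob that, when its input is too large, splits it and aggregates only its own children's results. Summing numbers stands in for real work and recursion stands in for submitting subjobs to the grid; `run_subjob`, `fan_out`, and `leaf_size` are hypothetical names.

```python
def run_subjob(data, fan_out=4, leaf_size=1000):
    """Hypothetical subjob: computes directly if small, otherwise spawns child subjobs.

    The 'work' is just summing numbers. Each parent collects and aggregates
    only the results of the children it spawned, so no single program has
    to gather every leaf result.
    """
    if fan_out < 2 or leaf_size < 1:
        raise ValueError("need fan_out >= 2 and leaf_size >= 1")
    if len(data) <= leaf_size:
        return sum(data)                      # leaf: do the actual work
    step = -(-len(data) // fan_out)           # ceiling division: each child's share
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    # On a real grid each chunk would be submitted as a separate subjob;
    # recursion stands in for that here.
    return sum(run_subjob(c, fan_out, leaf_size) for c in chunks)
```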


Schedulers


The job scheduling software locates a machine on which to run a grid job submitted by a user.

Schedulers can be organized in a hierarchy:

As grids span wider areas => a need for more meta-schedulers
• that can manage variously configured collections of clusters and smaller grids.

A meta-scheduler may submit a job to a cluster scheduler or other lower level scheduler rather than to an individual machine.


These schedulers will evolve to better schedule jobs, considering multiple resources rather than just CPU utilization.

They will also extend their reach to implement better QoS (quality of service), using:
• reservations,
• redundancy, and
• history profiles of jobs and grid performance.


ADVANCED SCHEDULERS monitor the progress of scheduled jobs, managing the overall work-flow.
• If jobs are lost due to system or network outages, a good scheduler will automatically resubmit them elsewhere.
• If a job appears to be in an infinite loop and reaches a maximum timeout, then it should not be rescheduled.
• Typically, jobs have different kinds of completion codes, some of which are suitable for re-submission and some of which are not.
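A resubmission rule based on completion codes might look like this sketch. The outcome codes, the retryable set, and the attempt limit are all assumptions; the attempt limit in particular is an extra safeguard not mentioned above.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = 0
    NODE_LOST = 1        # machine or network outage: the job itself may be fine
    PREEMPTED = 2        # a scavenged machine was reclaimed by its owner
    TIMEOUT = 3          # hit the maximum run time; possibly an infinite loop
    PROGRAM_FAULT = 4    # the program itself crashed


# Codes worth resubmitting elsewhere (an assumption for this sketch).
RETRYABLE = {Outcome.NODE_LOST, Outcome.PREEMPTED}


def next_action(outcome: Outcome, attempts: int, max_attempts: int = 3) -> str:
    """Decide what an advanced scheduler does after a job finishes or disappears."""
    if outcome is Outcome.SUCCESS:
        return "done"
    if outcome in RETRYABLE and attempts < max_attempts:
        return "resubmit elsewhere"
    # Timeouts and program faults would most likely fail again, so stop here.
    return "report failure"
```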


Some schedulers implement a JOB PRIORITY system:
• sometimes done by using job queues, each with a different priority;
• as grid machines become available to execute jobs, the jobs are taken from the highest priority queues first.

POLICIES of various kinds are also implemented using schedulers:
• Policies can include various kinds of constraints on jobs, users, and resources
(e.g., there may be a policy that restricts grid jobs from executing at certain times of the day).
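A minimal sketch of both ideas, assuming a single heap keyed by priority in place of separate per-priority queues (the dispatch order is the same) and a hypothetical 09:00 to 17:00 blackout as the time-of-day policy.

```python
import heapq
import itertools
from datetime import time


class PriorityScheduler:
    """Hypothetical scheduler: lower number = higher priority, FIFO within a priority."""

    def __init__(self, blackout=(time(9, 0), time(17, 0))):
        self._heap = []
        self._order = itertools.count()  # tie-breaker preserving submission order
        self.blackout = blackout         # example policy window (must not span midnight)

    def submit(self, job, priority):
        heapq.heappush(self._heap, (priority, next(self._order), job))

    def next_job(self, now):
        """Called when a grid machine becomes free at wall-clock time `now`."""
        start, end = self.blackout
        if start <= now < end or not self._heap:
            return None  # policy forbids dispatch now, or nothing is waiting
        return heapq.heappop(self._heap)[2]
```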


Reservation System


Reserving resources on the grid in advance is accomplished with a “RESERVATION SYSTEM”.

It is more than a scheduler.

It is first a calendar-based system for reserving resources for specific time periods and preventing any others from reserving the same resource at the same time.

It helps improve the Quality of Service (QoS) by letting the user reserve a set of resources in advance for his exclusive or high priority use.

It must also be able to remove or suspend jobs that may be running on any machine or resource when the reservation period is reached.
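The calendar part of a reservation system can be sketched as an overlap check per resource. `ReservationCalendar` is hypothetical; intervals are treated as half-open, so one reservation may start exactly when another ends.

```python
class ReservationCalendar:
    """Hypothetical calendar: at most one reservation per resource at any moment."""

    def __init__(self):
        self._bookings = {}  # resource -> list of (start, end, owner)

    def reserve(self, resource, start, end, owner):
        if start >= end:
            raise ValueError("a reservation must end after it starts")
        for s, e, who in self._bookings.get(resource, []):
            if start < e and s < end:  # half-open intervals [start, end) overlap
                raise ValueError(f"{resource} is already reserved by {who} from {s} to {e}")
        self._bookings.setdefault(resource, []).append((start, end, owner))

    def holder(self, resource, when):
        """Owner of the resource at `when`; other jobs there would be suspended or removed."""
        for s, e, who in self._bookings.get(resource, []):
            if s <= when < e:
                return who
        return None
```

Any comparable time values work, for example `datetime` objects.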


A reservation system can be used in conjunction with planned hardware or software maintenance events, when the affected resource might not be available for Grid use.

In a scavenging grid, it may not be possible to reserve specific machines in advance.
• Instead, the Grid Management system may allocate a larger fraction of its capacity for a given reservation, to allow for the likelihood of some of the resources becoming unavailable.
• This must be done in conjunction with tools that have profiled the grid’s workload capacity sufficiently to have reliable statistics about the grid’s ability to serve the reservation.
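How much extra capacity to allocate can be estimated if the profiling tools supply an availability probability. The sketch below assumes, as a simplification, that each scavenged machine is available independently with probability `p`, and finds the smallest number of machines `n` that yields at least `k` usable ones with the target confidence, where P(at least k of n available) = sum over i = k..n of C(n, i) p^i (1 - p)^(n - i). The function names are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n machines are available), each independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def machines_to_allocate(k, p, target=0.95, limit=10_000):
    """Smallest n whose chance of yielding >= k usable machines meets `target`."""
    for n in range(k, limit + 1):
        if prob_at_least(k, n, p) >= target:
            return n
    raise ValueError("target not reachable within limit")
```

For example, `machines_to_allocate(10, 0.8)` asks how many machines to set aside when 10 are needed and each is up 80% of the time; the answer is necessarily more than 10. Real availability is rarely independent (machines in one office tend to be reclaimed together), which is why the source stresses reliable workload statistics.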


Communications System

inter-job / subjob communication


A Grid system may include software to help JOBS communicate with each other.

An Application may split itself into a large number of SUBJOBS => each of these subjobs is a separate job in the Grid.

The application may implement an ALGORITHM that requires the subjobs to communicate some information among themselves.

The SUBJOBS need to be able to:
• locate other specific SUBJOBS,
• establish a communications connection with them, and
• send the appropriate data.
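The three steps (locate, connect, send) can be illustrated in a single process, with threads standing in for subjobs and in-memory queues standing in for network connections. `SubjobDirectory`, `subjob`, and `exchange_demo` are hypothetical; on a real grid the directory would be a network service and the transport would be sockets or a message-passing library.

```python
import queue
import threading


class SubjobDirectory:
    """Hypothetical name service that lets one subjob locate another's inbox."""

    def __init__(self):
        self._inboxes = {}
        self._lock = threading.Lock()

    def register(self, name):
        with self._lock:
            self._inboxes[name] = queue.Queue()

    def locate(self, name):
        with self._lock:
            return self._inboxes[name]  # KeyError if no such subjob is registered


def subjob(directory, name, peer, value, results):
    """Send our partial value to `peer`, wait for theirs, and combine the two."""
    directory.locate(peer).put((name, value))            # locate + "connect" + send
    _sender, received = directory.locate(name).get(timeout=5)
    results[name] = value + received


def exchange_demo():
    directory = SubjobDirectory()
    for n in ("sub-0", "sub-1"):
        directory.register(n)                            # register before any subjob starts
    results = {}
    threads = [
        threading.Thread(target=subjob, args=(directory, "sub-0", "sub-1", 10, results)),
        threading.Thread(target=subjob, args=(directory, "sub-1", "sub-0", 32, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each subjob sends before it waits, so the exchange cannot deadlock; `exchange_demo()` returns `{"sub-0": 42, "sub-1": 42}`.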


There are many considerations in efficiently planning the distribution and sharing of data on a Grid.

Some thought is usually given to how to arrange the data configuration so that data movement on the grid is kept to a minimum.

If there will be large jobs with a very large number of sub-jobs running on most of the grid systems for an application that will be repeatedly run,
=> the data they use may be copied to each machine and reside there until the next time the application runs.


• This is preferable to using a networked file system to share this data, because in such a file system, the data would effectively be moved from a central location every time the application is run.
• This is true unless the file system implements a caching feature or replicates the data automatically.
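A sketch of the copy-once approach, assuming a hypothetical per-machine cache directory and a size-and-modification-time check to decide when the central copy has changed. The `transfers` counter makes the saving visible: repeated runs reuse the local copy instead of moving the data again.

```python
import shutil
from pathlib import Path


class LocalDataCache:
    """Hypothetical per-machine cache: fetch input data once, reuse it on later runs."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.transfers = 0  # how many times data was copied from the central location

    def get(self, central_path):
        central = Path(central_path)
        local = self.cache_dir / central.name
        src = central.stat()
        stale = (
            not local.exists()
            or local.stat().st_size != src.st_size
            or local.stat().st_mtime_ns < src.st_mtime_ns
        )
        if stale:
            shutil.copy2(central, local)  # copy2 also preserves the modification time
            self.transfers += 1
        return local
```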


Observation & measurement


Usually, the donor software will include some tools that measure the current load & activity on a given machine using:
• operating system facilities, or
• direct measurement.

=> This software is sometimes referred to as a “LOAD SENSOR”.

Some Grid systems provide the means for implementing custom load sensors for resources other than CPU or storage.
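A pluggable load-sensor interface might look like the sketch below. The registry and sensor names are hypothetical; `os.getloadavg` is an operating-system facility available on Unix-like systems only, and the license-count sensor returns a placeholder value where a real custom sensor would query the resource it measures.

```python
import os
import shutil
from typing import Callable, Dict, Optional

# Hypothetical sensor registry: sensor name -> function returning a reading.
SENSORS: Dict[str, Callable[[], float]] = {}


def load_sensor(name: str):
    """Decorator registering a load sensor under `name`."""
    def register(fn):
        SENSORS[name] = fn
        return fn
    return register


@load_sensor("cpu_load_1min")
def cpu_load() -> float:
    return os.getloadavg()[0]  # OS facility; Unix-like systems only


@load_sensor("disk_free_fraction")
def disk_free() -> float:
    usage = shutil.disk_usage("/")
    return usage.free / usage.total


@load_sensor("licenses_free")
def licenses_free() -> float:
    # Custom sensor for a non-CPU, non-storage resource. Placeholder value:
    # a real sensor would query whatever system manages that resource.
    return 3.0


def read_all() -> Dict[str, Optional[float]]:
    """Sample every registered sensor; one failing sensor does not stop the others."""
    readings: Dict[str, Optional[float]] = {}
    for name, fn in SENSORS.items():
        try:
            readings[name] = fn()
        except (OSError, AttributeError):  # e.g. os.getloadavg missing on Windows
            readings[name] = None
    return readings
```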


Such measurement information is useful not only for scheduling, but also for discovering overall usage patterns in the grid.

The statistics can show trends which may signal the need for additional hardware.

Measurement information about specific jobs can be collected and used to better predict the resource requirements of that job the next time it is run.

better prediction => more efficient grid workload management
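One simple way to turn past measurements into a prediction is an exponentially weighted moving average plus a safety margin. This technique is a stand-in chosen for illustration, not one prescribed above; `JobProfile`, `alpha`, and `margin` are hypothetical.

```python
class JobProfile:
    """Hypothetical per-job history used to predict the next run's needs."""

    def __init__(self, alpha=0.5, margin=1.2):
        self.alpha = alpha      # weight given to the newest measurement
        self.margin = margin    # headroom so most runs fit within the request
        self.estimate = None    # smoothed CPU-seconds per run

    def record(self, cpu_seconds):
        if self.estimate is None:
            self.estimate = cpu_seconds
        else:
            self.estimate = self.alpha * cpu_seconds + (1 - self.alpha) * self.estimate

    def predict(self, default=3600.0):
        """CPU-seconds to request next time; fall back to `default` with no history."""
        if self.estimate is None:
            return default
        return self.estimate * self.margin
```

A higher `alpha` reacts faster when a job's behaviour changes; a lower one smooths out noisy runs.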

The measurement information can also be saved for accounting purposes:
• to form the basis for grid resource brokering, or
• to manage priorities more fairly.


The information can also be displayed in various forms to better visualize Grid activity and utilization.
• The user can query the Grid system to see how his application and its subjobs are progressing (monitoring).
• When the number of subjobs becomes large, it becomes too difficult to list them all in a graphical window.
• Instead, there may simply be one large bar graph showing some averaged progress metric.
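Collapsing many subjobs into one averaged bar can be sketched as follows, assuming each subjob reports its progress as a fraction between 0 and 1 (a hypothetical convention); `progress_bar` is an invented name.

```python
def progress_bar(fractions, width=40):
    """Collapse many subjobs' progress (each 0.0 to 1.0) into one averaged text bar."""
    clamped = [min(max(f, 0.0), 1.0) for f in fractions]
    avg = sum(clamped) / len(clamped) if clamped else 0.0
    filled = round(avg * width)
    return f"[{'#' * filled}{' ' * (width - filled)}] {avg:6.1%} ({len(clamped)} subjobs)"
```

For example, `progress_bar([0.0, 0.5, 1.0, 1.0], width=8)` returns `"[#####   ]  62.5% (4 subjobs)"`; the output stays one line long whether there are four subjobs or ten thousand.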


Monitoring & Recovery


A Grid system, in conjunction with its Job Scheduler, often provides some degree of recovery for subjobs that fail.

• Grid Applications can be designed to automate the monitoring & recovery of their own subjobs, using functions provided by the Grid system software's application programming interfaces (APIs).
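The sketch below shows how an application might automate monitoring and recovery of its own subjobs. Real grid middleware exposes its own, differently named calls, so `GridClient`, `status`, and `resubmit` are hypothetical placeholders for whatever the grid system's API provides. `FakeGridClient` is an in-memory stand-in that replays scripted states so the loop can be exercised without a grid.

```python
import time
from enum import Enum
from typing import Dict, Iterable, List, Protocol, Set, Tuple


class JobState(Enum):
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class GridClient(Protocol):
    """Hypothetical job API; a real grid middleware names and shapes these differently."""

    def status(self, job_id: str) -> JobState: ...

    def resubmit(self, job_id: str) -> str: ...  # returns the id of the new attempt


def monitor_and_recover(client: GridClient, job_ids: Iterable[str],
                        max_retries: int = 2,
                        poll_seconds: float = 30.0) -> Tuple[Dict[str, str], Set[str]]:
    """Poll subjobs until each is done or has exhausted its retries.

    Returns (latest attempt id per original subjob, set of subjobs given up on).
    """
    ids = list(job_ids)
    current = {j: j for j in ids}  # original subjob id -> id of its latest attempt
    retries = {j: 0 for j in ids}
    pending = set(ids)
    abandoned: Set[str] = set()
    while pending:
        for orig in sorted(pending):  # sorted() copies, so discarding below is safe
            state = client.status(current[orig])
            if state is JobState.DONE:
                pending.discard(orig)
            elif state is JobState.FAILED:
                if retries[orig] < max_retries:
                    retries[orig] += 1
                    current[orig] = client.resubmit(current[orig])
                else:
                    pending.discard(orig)
                    abandoned.add(orig)
        if pending:
            time.sleep(poll_seconds)
    return current, abandoned


class FakeGridClient:
    """In-memory stand-in for testing: each job id replays a scripted list of states."""

    def __init__(self, script: Dict[str, List[JobState]]):
        self.script = script
        self.submissions = 0

    def status(self, job_id: str) -> JobState:
        states = self.script[job_id]
        return states.pop(0) if len(states) > 1 else states[0]  # last state repeats

    def resubmit(self, job_id: str) -> str:
        self.submissions += 1
        return f"{job_id}-r{self.submissions}"


# Subjob "a" finishes on the second poll; "b" fails once and its retry succeeds.
client = FakeGridClient({
    "a": [JobState.RUNNING, JobState.DONE],
    "b": [JobState.FAILED],
    "b-r1": [JobState.DONE],
})
latest, gave_up = monitor_and_recover(client, ["a", "b"], poll_seconds=0.0)
```

As written, a subjob that stays in `RUNNING` forever keeps the loop alive. A production version would also need a timeout, which is the "excessive slowness" case below.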

A job may fail due to:

• Programming error: The job stops partway through because of a program fault.

• Hardware or power failure: The machine or devices being used stop working in some way.

• Communications interruption: A communication path to the machine has failed or is overloaded with other data traffic.

• Excessive slowness: The job might be in an infinite loop, or normal job progress may be limited by another process running at a higher priority or by some other form of contention.
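A monitor could map what it observes about a subjob onto these four causes. The heuristic sketch below assumes it knows the job's exit status (if any) and whether the host is up, as reported by an agent on the machine's own network. It also assumes it knows how long ago the job last sent a heartbeat and its recent progress rate. All of these fields and the thresholds (300 s, 1% per hour) are hypothetical, and the rules can misattribute a failure. A hardware fault can also produce a non-zero exit code, for example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureCause(Enum):
    PROGRAMMING_ERROR = "programming error"
    HARDWARE_OR_POWER = "hardware or power failure"
    COMMUNICATIONS = "communications interruption"
    EXCESSIVE_SLOWNESS = "excessive slowness"
    NONE = "no failure detected"


@dataclass
class JobObservation:
    """Hypothetical monitoring data for one subjob."""
    exit_code: Optional[int]   # None while no exit status has been reported
    host_up: bool              # e.g. as reported by an agent on the machine's own network
    heartbeat_age_s: float     # seconds since the job last reported in
    progress_per_hour: float   # fraction of the job's work completed per hour


def classify_failure(obs: JobObservation,
                     heartbeat_timeout_s: float = 300.0,
                     min_progress_per_hour: float = 0.01) -> FailureCause:
    """Heuristic mapping from observations to the four failure causes."""
    if obs.exit_code is not None:
        # The job ended on its own: a non-zero status points at the program itself.
        return FailureCause.NONE if obs.exit_code == 0 else FailureCause.PROGRAMMING_ERROR
    if obs.heartbeat_age_s > heartbeat_timeout_s:
        # Silence from the job: either the host is down or the path to it is.
        return FailureCause.COMMUNICATIONS if obs.host_up else FailureCause.HARDWARE_OR_POWER
    if obs.progress_per_hour < min_progress_per_hour:
        # Still alive and reporting, but barely advancing: a loop or contention.
        return FailureCause.EXCESSIVE_SLOWNESS
    return FailureCause.NONE
```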

It is not always possible to automatically determine whether a job failed because of a problem with the design of the application or because of failures of various kinds in the grid system infrastructure.

Schedulers are often designed to categorize job failures in some way and to automatically resubmit jobs so that they are likely to succeed, running elsewhere on the grid.

In some systems, the user is informed about any job failures and must decide whether to issue a command to rerun the failed jobs.
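The two policies described above can be combined in one small decision function. In this sketch, failures the scheduler attributes to infrastructure are resubmitted on a machine where the subjob has not yet failed. Anything else, or a subjob out of attempts or machines, is reported for the user to decide. The category strings, the `SubjobRecord` type, and the three-attempt limit are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Failure categories this sketch treats as infrastructure problems (an assumption).
INFRASTRUCTURE_FAILURES = {"hardware", "communications", "slowness"}


@dataclass
class SubjobRecord:
    name: str
    machine: str
    attempts: int = 1
    excluded: Set[str] = field(default_factory=set)  # machines where it already failed


def choose_machine(available: List[str], excluded: Set[str]) -> Optional[str]:
    """Pick the first available machine the subjob has not failed on yet."""
    for machine in available:
        if machine not in excluded:
            return machine
    return None


def handle_failure(job: SubjobRecord, category: str, available: List[str],
                   max_attempts: int = 3) -> Tuple[str, str]:
    """Return ('resubmitted', machine) or ('report_to_user', reason)."""
    if category not in INFRASTRUCTURE_FAILURES:
        # Likely a problem in the application itself: rerunning elsewhere won't help.
        return ("report_to_user", f"{job.name}: {category} failure needs a user decision")
    job.excluded.add(job.machine)
    if job.attempts >= max_attempts:
        return ("report_to_user", f"{job.name}: gave up after {job.attempts} attempts")
    target = choose_machine(available, job.excluded)
    if target is None:
        return ("report_to_user", f"{job.name}: no untried machine available")
    job.machine = target
    job.attempts += 1
    return ("resubmitted", target)
```

Excluding every machine a subjob has already failed on is a deliberately simple way to "run elsewhere". A real scheduler would also weigh load, reservations, and whether a machine's fault has since been repaired.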

Grid Applications

Grid Applications fall into one of the following three categories:

• Applications that are not enabled for using multiple processors but can be executed on different machines.

• Applications that are already designed to use the multiple processors of a Grid setting.

• Applications that need to be modified or rewritten to better exploit a Grid.

The last category is of interest to Grid Application developers.

They will need tools for debugging and for measuring the behavior of Grid Applications.

Such grid-based tools are still in their infancy.

It may be useful for developers to configure a small Grid of their own, so that they can use debuggers on each machine to control and watch the detailed workings of the applications.

Since the debugging process can bypass certain security precautions, it may not always be wise to allow such debugging on a production Grid.

Factors to consider in Grid-enabling an Application:

New computation-intensive applications written today are being designed for parallel execution.

• These will be easy to grid-enable, if they do not already follow emerging grid protocols and standards.

There are some practical tools that skilled application designers can use to write a parallel grid application.

There are NO practical tools for transforming arbitrary applications to exploit the parallel capabilities of a grid.

• Automatic transformation of applications is a science in its infancy.
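Since no tool performs the transformation automatically, a designer rewriting an application by hand typically looks for pieces of work that do not depend on each other. Each such piece can then run as a separate subjob. The sketch below uses a hypothetical workload (counting primes in a range) and splits it into independent subranges whose partial results are summed. `ThreadPoolExecutor` is only a local stand-in for shipping subjobs to grid machines. In CPython, threads will not speed up this CPU-bound work, so the point is the shape of the decomposition, not performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def is_prime(n: int) -> bool:
    """Trial division; deliberately simple so each subjob does real work."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def count_primes(lo: int, hi: int) -> int:
    """The original serial computation: number of primes in [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Cut [lo, hi) into `parts` contiguous, non-overlapping, near-equal subranges."""
    step, extra = divmod(hi - lo, parts)
    bounds, start = [], lo
    for k in range(parts):
        end = start + step + (1 if k < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def count_primes_as_subjobs(lo: int, hi: int, parts: int) -> int:
    """Run each subrange as an independent subjob, then combine the partial counts."""
    chunks = split_range(lo, hi, parts)
    # Threads stand in for grid machines here; a grid would ship each chunk elsewhere.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_counts = list(pool.map(lambda c: count_primes(*c), chunks))
    return sum(partial_counts)
```

The rewrite is correct only because the subranges share no state and the partial results combine with a simple sum. Applications whose steps depend on each other need much more restructuring, which is why the transformation cannot yet be automated in general.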